Mathematical Physical Chemistry: Practical and Intuitive Methodology [3 ed.]
 9819925118, 9789819925117, 9789819925124, 9789819925148

  • Publisher PDF | Published: 05 October 2023

Table of contents :
Preface to the Third Edition
Preface to the Second Edition
Preface to the First Edition
Contents
Part I: Quantum Mechanics
Chapter 1: Schrödinger Equation and Its Application
1.1 Early-Stage Quantum Theory
1.2 Schrödinger Equation
1.3 Simple Applications of Schrödinger Equation
1.4 Quantum-Mechanical Operators and Matrices
1.5 Commutator and Canonical Commutation Relation
Reference
Chapter 2: Quantum-Mechanical Harmonic Oscillator
2.1 Classical Harmonic Oscillator
2.2 Formulation Based on an Operator Method
2.3 Matrix Representation of Physical Quantities
2.4 Coordinate Representation of Schrödinger Equation
2.5 Variance and Uncertainty Principle
References
Chapter 3: Hydrogen-Like Atoms
3.1 Introductory Remarks
3.2 Constitution of Hamiltonian
3.3 Separation of Variables
3.4 Generalized Angular Momentum
3.5 Orbital Angular Momentum: Operator Approach
3.6 Orbital Angular Momentum: Analytic Approach
3.6.1 Spherical Surface Harmonics and Associated Legendre Differential Equation
3.6.2 Orthogonality of Associated Legendre Functions
3.7 Radial Wave Functions of Hydrogen-Like Atoms
3.7.1 Operator Approach to Radial Wave Functions [3]
3.7.2 Normalization of Radial Wave Functions [10]
3.7.3 Associated Laguerre Polynomials
3.8 Total Wave Functions
References
Chapter 4: Optical Transition and Selection Rules
4.1 Electric Dipole Transition
4.2 One-Dimensional System
4.3 Three-Dimensional System
4.4 Selection Rules
4.5 Angular Momentum of Radiation [6]
References
Chapter 5: Approximation Methods of Quantum Mechanics
5.1 Perturbation Method
5.1.1 Quantum State and Energy Level Shift Caused by Perturbation
5.1.2 Several Examples
5.2 Variational Method
References
Chapter 6: Theory of Analytic Functions
6.1 Set and Topology
6.1.1 Basic Notions and Notations
6.1.2 Topological Spaces and Their Building Blocks
(a) Neighborhoods [4]
(b) Interior and Closure [4]
(c) Boundary [4]
(d) Accumulation Points and Isolated Points
(e) Connectedness
6.1.3 T1-Space
6.1.4 Complex Numbers and Complex Plane
6.2 Analytic Functions of a Complex Variable
6.3 Integration of Analytic Functions: Cauchy's Integral Formula
6.4 Taylor's Series and Laurent's Series
6.5 Zeros and Singular Points
6.6 Analytic Continuation
6.7 Calculus of Residues
6.8 Examples of Real Definite Integrals
6.9 Multivalued Functions and Riemann Surfaces
6.9.1 Brief Outline
6.9.2 Examples of Multivalued Functions
References
Part II: Electromagnetism
Chapter 7: Maxwell's Equations
7.1 Maxwell's Equations and Their Characteristics
7.2 Equation of Wave Motion
7.3 Polarized Characteristics of Electromagnetic Waves
7.4 Superposition of Two Electromagnetic Waves
References
Chapter 8: Reflection and Transmission of Electromagnetic Waves in Dielectric Media
8.1 Electromagnetic Fields at an Interface
8.2 Basic Concepts Underlying Phenomena
8.3 Transverse Electric (TE) Waves and Transverse Magnetic (TM) Waves
8.4 Energy Transport by Electromagnetic Waves
8.5 Brewster Angles and Critical Angles
8.6 Total Reflection
8.7 Waveguide Applications
8.7.1 TE and TM Waves in a Waveguide
8.7.2 Total Internal Reflection and Evanescent Waves
8.8 Stationary Waves
References
Chapter 9: Light Quanta: Radiation and Absorption
9.1 Blackbody Radiation
9.2 Planck's Law of Radiation and Mode Density of Electromagnetic Waves
9.3 Two-Level Atoms
9.4 Dipole Radiation
9.5 Lasers
9.5.1 Brief Outlook
9.5.2 Organic Lasers
9.6 Mechanical System
References
Chapter 10: Introductory Green's Functions
10.1 Second-Order Linear Differential Equations (SOLDEs)
10.2 First-Order Linear Differential Equations (FOLDEs)
10.3 Second-Order Differential Operators
10.4 Green's Functions
10.5 Construction of Green's Functions
10.6 Initial-Value Problems (IVPs)
10.6.1 General Remarks
10.6.2 Green's Functions for IVPs
10.6.3 Estimation of Surface Terms
10.6.4 Examples
10.7 Eigenvalue Problems
References
Part III: Linear Vector Spaces
Chapter 11: Vectors and Their Transformation
11.1 Vectors
11.2 Linear Transformations of Vectors
11.3 Inverse Matrices and Determinants
11.4 Basis Vectors and Their Transformations
References
Chapter 12: Canonical Forms of Matrices
12.1 Eigenvalues and Eigenvectors
12.2 Eigenspaces and Invariant Subspaces
12.3 Generalized Eigenvectors and Nilpotent Matrices
12.4 Idempotent Matrices and Generalized Eigenspaces
12.5 Decomposition of Matrix
12.6 Jordan Canonical Form
12.6.1 Canonical Form of Nilpotent Matrix
12.6.2 Jordan Blocks
12.6.3 Example of Jordan Canonical Form
12.7 Diagonalizable Matrices
References
Chapter 13: Inner Product Space
13.1 Inner Product and Metric
13.2 Gram Matrices
13.3 Adjoint Operators
13.4 Orthonormal Basis
References
Chapter 14: Hermitian Operators and Unitary Operators
14.1 Projection Operators
14.2 Normal Operators
14.3 Unitary Diagonalization of Matrices
14.4 Hermitian Matrices and Unitary Matrices
14.5 Hermitian Quadratic Forms
14.6 Simultaneous Eigenstates and Diagonalization
References
Chapter 15: Exponential Functions of Matrices
15.1 Functions of Matrices
15.2 Exponential Functions of Matrices and Their Manipulations
15.3 System of Differential Equations
15.3.1 Introduction
15.3.2 System of Differential Equations in a Matrix Form: Resolvent Matrix
15.3.3 Several Examples
15.4 Motion of a Charged Particle in Polarized Electromagnetic Wave
References
Part IV: Group Theory and Its Chemical Applications
Chapter 16: Introductory Group Theory
16.1 Definition of Groups
16.2 Subgroups
16.3 Classes
16.4 Isomorphism and Homomorphism
16.5 Direct-Product Groups
Reference
Chapter 17: Symmetry Groups
17.1 A Variety of Symmetry Operations
17.2 Successive Symmetry Operations
17.3 O and Td Groups
17.4 Special Orthogonal Group SO(3)
17.4.1 Rotation Axis and Rotation Matrix
17.4.2 Euler Angles and Related Topics
References
Chapter 18: Representation Theory of Groups
18.1 Definition of Representation
18.2 Basis Functions of Representation
18.3 Schur's Lemmas and Grand Orthogonality Theorem (GOT)
18.4 Characters
18.5 Regular Representation and Group Algebra
18.6 Classes and Irreducible Representations
18.7 Projection Operators: Revisited
18.8 Direct-Product Representation
18.9 Symmetric Representation and Antisymmetric Representation
References
Chapter 19: Applications of Group Theory to Physical Chemistry
19.1 Transformation of Functions
19.2 Method of Molecular Orbitals (MOs)
19.3 Calculation Procedures of Molecular Orbitals (MOs)
19.4 MO Calculations Based on π-Electron Approximation
19.4.1 Ethylene
19.4.2 Cyclopropenyl Radical [1]
19.4.3 Benzene
19.4.4 Allyl Radical [1]
19.5 MO Calculations of Methane
References
Chapter 20: Theory of Continuous Groups
20.1 Introduction: Operators of Rotation and Infinitesimal Rotation
20.2 Rotation Groups: SU(2) and SO(3)
20.2.1 Construction of SU(2) Matrices
20.2.2 SU(2) Representation Matrices: Wigner Formula
20.2.3 SO(3) Representation Matrices and Spherical Surface Harmonics
20.2.4 Irreducible Representations of SU(2) and SO(3)
20.2.5 Parameter Space of SO(3)
20.2.6 Irreducible Characters of SO(3) and Their Orthogonality
20.3 Clebsch-Gordan Coefficients of Rotation Groups
20.3.1 Direct-Product of SU(2) and Clebsch-Gordan Coefficients
20.3.2 Calculation Procedures of Clebsch-Gordan Coefficients
20.3.3 Examples of Calculation of Clebsch-Gordan Coefficients
20.4 Lie Groups and Lie Algebras
20.4.1 Definition of Lie Groups and Lie Algebras: One-Parameter Groups
20.4.2 Properties of Lie Algebras
20.4.3 Adjoint Representation of Lie Groups
20.5 Connectedness of Lie Groups
20.5.1 Several Definitions and Examples
20.5.2 O(3) and SO(3)
20.5.3 Simply Connected Lie Groups: Local Properties and Global Properties
References
Part V: Introduction to the Quantum Theory of Fields
Chapter 21: The Dirac Equation
21.1 Historical Background
21.2 Several Remarks on the Special Theory of Relativity
21.2.1 Minkowski Space and Lorentz Transformation
21.2.2 Event and World Interval
21.3 Constitution and Solutions of the Dirac Equation
21.3.1 General Form of the Dirac Equation
21.3.2 Plane Wave Solutions of the Dirac Equation
21.3.3 Negative-Energy Solution of the Dirac Equation
21.3.4 One-Particle Hamiltonian of the Dirac Equation
21.4 Normalization of the Solutions of the Dirac Equation
21.5 Charge Conjugation
21.6 Characteristics of the Gamma Matrices
References
Chapter 22: Quantization of Fields
22.1 Lagrangian Formalism of the Fields [1]
22.2 Introductory Fourier Analysis [3]
22.2.1 Fourier Series Expansion
22.2.2 Fourier Integral Transforms: Fourier Transform and Inverse Fourier Transform
22.3 Quantization of the Scalar Field [5, 6]
22.3.1 Lagrangian Density and Action Integral
22.3.2 Equal-Time Commutation Relation and Field Quantization
22.3.3 Hamiltonian and Fock Space
22.3.4 Invariant Delta Functions of the Scalar Field
22.3.5 Feynman Propagator of the Scalar Field
22.3.6 General Consideration on the Field Quantization
22.4 Quantization of the Dirac Field [5, 6]
22.4.1 Lagrangian Density and Hamiltonian Density of the Dirac Field
22.4.2 Quantization Procedure of the Dirac Field
22.4.3 Antiparticle: Positron
22.4.4 Invariant Delta Functions of the Dirac Field
22.4.5 Feynman Propagator of the Dirac Field
22.5 Quantization of the Electromagnetic Field [5-7]
22.5.1 Relativistic Formulation of the Electromagnetic Field
22.5.2 Lagrangian Density and Hamiltonian Density of the Electromagnetic Field
22.5.3 Polarization Vectors of the Electromagnetic Field [5]
22.5.4 Canonical Quantization of the Electromagnetic Field
22.5.5 Hamiltonian and Indefinite Metric
22.5.6 Feynman Propagator of Photon Field
References
Chapter 23: Interaction Between Fields
23.1 Lorentz Force and Minimal Coupling
23.2 Lagrangian and Field Equation of the Interacting Fields
23.3 Local Phase Transformation and U(1) Gauge Field
23.4 Interaction Picture
23.5 S-Matrix and S-Matrix Expansion
23.6 N-Product and T-Product
23.6.1 Example of Two Field Operators
23.6.2 Calculations Including Both the N-Products and T-Products
23.7 Feynman Rules and Feynman Diagrams in QED
23.7.1 Zeroth- and First-Order S-Matrix Elements
23.7.2 Second-Order S-Matrix Elements and Feynman Diagrams
23.8 Example: Compton Scattering [1]
23.8.1 Introductory Remarks
23.8.2 Feynman Amplitude and Feynman Diagrams of Compton Scattering
23.8.3 Scattering Cross-Section [1]
23.8.4 Spin and Photon Polarization Sums
23.8.5 Detailed Calculation Procedures of Feynman Amplitude [1]
23.8.6 Experimental Observations
23.9 Summary
References
Chapter 24: Basic Formalism
24.1 Extended Concepts of Vector Spaces
24.1.1 Bilinear Mapping
24.1.2 Tensor Product
24.1.3 Bilinear Form
24.1.4 Dual Vector Space
24.1.5 Invariants
24.1.6 Tensor Space and Tensors
24.1.7 Euclidean Space and Minkowski Space
24.2 Lorentz Group and Lorentz Transformations
24.2.1 Lie Algebra of the Lorentz Group
24.2.2 Successive Lorentz Transformations [9]
24.3 Covariant Properties of the Dirac Equation
24.3.1 General Consideration on the Physical Equation
24.3.2 The Klein-Gordon Equation and the Dirac Equation
24.4 Determination of General Form of Matrix S(Λ)
24.5 Transformation Properties of the Dirac Spinors [9]
24.6 Transformation Properties of the Dirac Operators [9]
24.6.1 Case I: Single Lorentz Boost
24.6.2 Case II: Non-Coaxial Lorentz Boosts
24.7 Projection Operators and Related Operators for the Dirac Equation
24.8 Spectral Decomposition of the Dirac Operators
24.9 Connectedness of the Lorentz Group
24.9.1 Polar Decomposition of a Non-Singular Matrix [14, 15]
24.9.2 Special Linear Group: SL(2, ℂ)
24.10 Representation of the Proper Orthochronous Lorentz Group SO0(3,1) [6, 8]
References
Chapter 25: Advanced Topics of Lie Algebra
25.1 Differential Representation of Lie Algebra [1]
25.1.1 Overview
25.1.2 Adjoint Representation of Lie Algebra [1]
25.1.3 Differential Representation of Lorentz Algebra [1]
25.2 Cartan-Weyl Basis of Lie Algebra [1]
25.2.1 Complexification
25.2.2 Coupling of Angular Momenta: Revisited
25.3 Decomposition of Lie Algebra
25.4 Further Topics of Lie Algebra
25.5 Closing Remarks
References
Index


Shu Hotta

Mathematical Physical Chemistry Practical and Intuitive Methodology Third Edition


Shu Hotta Takatsuki, Osaka, Japan

ISBN 978-981-99-2511-7    ISBN 978-981-99-2512-4 (eBook)
https://doi.org/10.1007/978-981-99-2512-4

© Springer Nature Singapore Pte Ltd. 2018, 2020, 2023 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

To the memory of my brother Kei Hotta and Emeritus Professor Yûsaku Hamada

Preface to the Third Edition

This book is the third edition of Mathematical Physical Chemistry. Although the main concept and main theme remain unchanged, this third edition contains the introductory quantum theory of fields. The major motivation for including the relevant topics in the present edition is as follows: The Schrödinger equation has prevailed as a fundamental quantum-mechanical equation up to the present date in many fields of natural science. The Dirac equation was formulated as a relativistically covariant expression soon after the Schrödinger equation was discovered. In contrast to the Schrödinger equation, however, the Dirac equation is less prevalent even nowadays. One of the major reasons for this is that the Dirac equation has been utilized almost exclusively by elementary particle physicists and related researchers. What is more, a textbook of elementary particle physics or the quantum theory of fields cannot help but be thick, because it has to deal with so many topics that require detailed explanation, starting with, e.g., renormalization and regularization as well as the underlying gauge theory. Such a situation inevitably leads to a significant reduction in the description of the fundamental properties of the Dirac equation. Under these circumstances, the author has aimed to explain the fundamental properties of the Dirac equation in a plain style so that scientific audiences in various fields of specialization can readily understand the essence. In particular, the author has described in detail the transformation properties of the Dirac equation (or Dirac operator) and its plane wave solutions in terms of standard matrix algebra.

Another motivation for writing the present edition lies in extending the concepts of vector spaces. The theory of linear vector spaces was developed in the previous editions, in which the inner product space was the vector space of primary interest.
The Dirac equation was constructed so as to be consistent with the requirement of the special theory of relativity, which is closely connected to the Minkowski space, a type of vector space. In relation to the Minkowski space, moreover, various kinds of vector spaces play a role in the quantum theory of fields. Examples include tensor space and spinor space. In particular, tensor spaces are dealt with widely in different fields of natural science. In this book, therefore, the associated theory of tensor spaces has been treated rather systematically along with related concepts.

As is the case with the previous editions of this book, the author has placed the mathematical topics in the last chapter of the individual parts (Parts I–V). In the present edition, we have added a chapter dealing with advanced topics of Lie algebra at the end of the book. The description has been improved in several parts so that readers can gain a clear understanding. Errors found in the previous editions have been corrected. As in the case of the previous editions, readers benefit from going freely back and forth across the whole range of topics in this book.

The author owes sincere gratitude to the late Emeritus Professor Yûsaku Hamada of Kyoto Institute of Technology for a course of excellent lectures on applied mathematics delivered during the author's undergraduate days. The author is deeply grateful as well to Emeritus Professor Chiaki Tsukamoto of Kyoto Institute of Technology for his helpful discussions and suggestions on exponential functions of matrices (Chap. 15). Once again, the author wishes to thank many students for valuable discussions and Dr. Shin'ichi Koizumi, Springer, for giving him an opportunity to write this book. Finally, the author is most grateful to his wife, Kazue Hotta, for continually encouraging him to write the book.

Takatsuki, Osaka, Japan
March 2023

Shu Hotta

Preface to the Second Edition

This book is the second edition of Mathematical Physical Chemistry. Mathematics is a common language of natural science including physics, chemistry, and biology. Although the words mathematical physics and physical chemistry (or chemical physics) are commonly used, mathematical physical chemistry sounds rather uncommon. Therefore, it might well be reworded as mathematical physics for chemists. The book title could accordingly have been, for instance, "The Mathematics of Physics and Chemistry," in tribute to the famous book written three-quarters of a century ago by H. Margenau and G. M. Murphy. Yet, the word mathematical physical chemistry is expected to be granted citizenship, considering that chemistry and related interdisciplinary fields such as materials science and molecular science are becoming increasingly mathematical.

The main concept and main theme remain unchanged, but this second edition contains the theory of analytic functions and the theory of continuous groups. Both theories are counted among the most elegant theories of mathematics. The mathematics of these topics is of a somewhat advanced level and something like a "sufficient condition" for chemists, whereas that of the first edition may be a prerequisite (or a necessary condition) for them. Therefore, chemists (and maybe physicists as well) can creatively use the two editions. In association with these major additions to the second edition, the author has placed the mathematical topics (the theory of analytic functions, Green's functions, exponential functions of matrices, and the theory of continuous groups) in the last chapter of the individual parts (Parts I–IV). At the same time, the author has also made several specific revisions, including the introductory discussion on the perturbation method and the variational method, both of which can be effectively used for gaining approximate solutions of various quantum-mechanical problems.
As another topic, the author has presented recent progress on organic lasers. This topic is expected to help develop high-performance light-emitting devices, an important field of materials science. As in the case of the first edition, readers benefit from going freely back and forth across the whole range of topics in this book.


Once again, the author wishes to thank many students for valuable discussions and Dr. Shin'ichi Koizumi, Springer, for giving him an opportunity to write this book.

Takatsuki, Osaka, Japan
October 2019

Shu Hotta

Preface to the First Edition

The contents of this book are based upon manuscripts prepared for the author's undergraduate courses at Kyoto Institute of Technology entitled "Polymer Nanomaterials Engineering" and "Photonics Physical Chemistry" and the author's master's course lecture at Kyoto Institute of Technology entitled "Solid-State Polymers Engineering." This book is intended for graduate and undergraduate students, especially those who major in chemistry and, at the same time, wish to study mathematical physics. Readers are supposed to have basic knowledge of analysis and linear algebra. However, they are not supposed to be familiar with the theory of analytic functions (i.e., complex analysis), even though it is desirable to have relevant knowledge about it. At the beginning, mathematical physics looks daunting to chemists, as used to be the case with myself as a chemist. This book introduces the basic concepts of mathematical physics to chemists. Unlike other books related to mathematical physics, this book makes a reasonable selection of material so that students majoring in chemistry can readily and naturally understand the contents. In particular, we stress the importance of practical and intuitive methodology. We also expect engineers and physicists to benefit from reading this book.

In Parts I and II, the book describes quantum mechanics and electromagnetism. The relevance between the two is carefully considered. Although quantum mechanics covers a broad field of modern physics, in Part I we focus on a harmonic oscillator and a hydrogen(-like) atom. This is because we can study and deal with many of the fundamental concepts of quantum mechanics within these restricted topics. Moreover, knowledge acquired from the study of these topics can readily be extended to practical investigation of, e.g., electronic states and vibration (or vibronic) states of molecular systems.
We describe these topics by both the analytic method (using differential equations) and the operator approach (using matrix calculations). We believe that the basic concepts of quantum mechanics can be best understood by contrasting the analytical and algebraic approaches. For this reason, we give matrix representations of physical quantities whenever possible. Examples include energy eigenvalues of a quantum-mechanical harmonic oscillator and angular momenta of a hydrogen-like atom. At the same time, these two physical systems supply us with a good opportunity to study classical polynomials, e.g., Hermite polynomials, (associated) Legendre polynomials, Laguerre polynomials, and Gegenbauer polynomials, and, more generally, special functions. These topics constitute one of the important branches of mathematical physics. One of the basic concepts of quantum mechanics is that a physical quantity is represented by a Hermitian operator or matrix. In this respect, the algebraic approach gives a good opportunity to become familiar with this concept. We present tangible examples of this. We also emphasize the importance of the notion of Hermiticity of a differential operator. We often encounter unitary operators or unitary transformations alongside the notion of a Hermitian operator. We show several examples of unitary operators in connection with transformations of vectors and coordinates.

Part II describes Maxwell's equations and their applications to various phenomena of electromagnetic waves. These include their propagation, reflection, and transmission in dielectric media. We restrict ourselves to treating those phenomena in dielectrics without charge. Yet, we cover a wide range of important topics. In particular, when two (or more) dielectrics are in contact with each other at a plane interface, reflection and transmission of light are characterized by various important parameters such as reflection and transmission coefficients, Brewster angles, and critical angles. We should gain a proper understanding of them not only from the point of view of basic study, but also to make use of the relevant knowledge in optical device applications such as a waveguide. In contrast to the concept of electromagnetic waves, light possesses the characteristics of light quanta.
We present a semiclassical and statistical approach to blackbody radiation occurring in a simplified system in relation to Part I. The physical processes are well characterized by the notion of two-level atoms. In this context, we outline the dipole radiation within the framework of the classical theory. We briefly describe how the optical processes occurring in a confined dielectric medium are related to a laser, which is of great importance in fundamental science and its applications.

Many of the basic equations of physics are described as second-order linear differential equations (SOLDEs). Different methods have been developed and proposed to seek their solutions. One of the most important methods is that of Green's functions. We present the introductory theory of Green's functions accordingly. In this connection, we rethink the Hermiticity of a differential operator.

In Parts III and IV, we describe algebraic structures of mathematical physics. Understanding them is useful for the studies of quantum mechanics and electromagnetism whose topics are presented in Parts I and II. Part III deals with theories of linear vector spaces. We focus on vectors and their transformations in finite-dimensional vector spaces. Generally, vector transformations can be considered among vector spaces of different dimensions. In this book, however, we restrict ourselves to the case of transformations between vector spaces of the same dimension, i.e., endomorphisms of the space (Vn → Vn). This is not only because this is most often the case with many physical applications, but also because the relevant operator is represented by a square matrix. Canonical forms of square matrices hold an important position in algebra. These include triangular matrices and diagonalizable matrices as well as nilpotent matrices and idempotent matrices. The most general form is the Jordan canonical form. We present its essential parts in detail, taking a tangible example. Next to the general discussion, we deal with an inner product space. Once an inner product is defined between any pair of vectors, the vector space is given a fruitful structure. An example is the norm (i.e., "length") of a vector. Also, we gain a clear relationship between Parts III and I. We define various operators or matrices that are important in physical applications. Examples include normal operators (or matrices) such as Hermitian operators, projection operators, and unitary operators. Once again, we emphasize the importance of Hermitian operators. In particular, two commuting Hermitian matrices share simultaneous eigenvectors (or eigenstates) and, in this respect, such matrices occupy a special position in quantum mechanics.

Finally, Part IV describes the essence of group theory and its chemical applications. Group theory has a broad range of applications in solid-state physics, solid-state chemistry, molecular science, etc. Nonetheless, knowledge of group theory does not seem to have fully prevailed among chemists. We can discover an adequate reason for this in the preface to the first edition of "Chemical Applications of Group Theory" written by F. A. Cotton. It might well be natural that the definitions and statements of abstract algebra, especially group theory, sound somewhat pretentious to chemists, even though the definition of a group is quite simple. Therefore, we present various examples for readers to get used to the notions of group theory. The notion of mapping is important, as in the case of linear vector spaces.
Aside from the operation being additive for a vector space and multiplicative for a group, the fundamental rules of calculation are pretty much the same for the two. We describe the characteristics of symmetry groups in detail, partly because the related knowledge is useful for the molecular orbital (MO) calculations presented in the last section of the book.

Representation theory is probably one of the most daunting notions for chemists. Practically, however, a representation is just a homomorphism that corresponds to a linear transformation in a vector space. In this context, a representation is merely denoted by a number or a matrix. Basis functions of a representation correspond to basis vectors in a vector space. The grand orthogonality theorem (GOT) is a "nursery bed" of representation theory. Therefore, readers are encouraged to understand its essence apart from the rigorous proof of the theorem. In conjunction with Part III, we present a variety of projection operators. These are very useful in practical applications to, e.g., quantum mechanics and molecular science. The final parts of the book are devoted to applications of group theory to problems of physical chemistry, especially those of quantum chemistry, more specifically molecular orbital calculations. We see how symmetry considerations, particularly the use of projection operators, save us a lot of labor. Examples include aromatic hydrocarbons and methane.

The previous sections sum up the contents of this book. Readers may start with any part and go freely back and forth. This is because the contents of many parts are interrelated. For example, we stress the importance of the Hermiticity of differential operators and matrices. Also, projection operators and nilpotent matrices appear in many parts along with their tangible applications to individual topics. Hence, readers are recommended to carefully examine and compare the related contents throughout the book. We believe that readers, especially chemists, benefit from the writing style of this book, since it is suited to chemists who are good at intuitive understanding.

The author would like to thank many students for their valuable suggestions and discussions at the lectures. The author also wishes to thank Dr. Shin'ichi Koizumi, Springer, for giving him an opportunity to write this book.

Kyoto, Japan
October 2017

Shu Hotta

Contents

Part I Quantum Mechanics

1 Schrödinger Equation and Its Application 3
  1.1 Early-Stage Quantum Theory 3
  1.2 Schrödinger Equation 8
  1.3 Simple Applications of Schrödinger Equation 14
  1.4 Quantum-Mechanical Operators and Matrices 21
  1.5 Commutator and Canonical Commutation Relation 28
  Reference 30

2 Quantum-Mechanical Harmonic Oscillator 31
  2.1 Classical Harmonic Oscillator 31
  2.2 Formulation Based on an Operator Method 33
  2.3 Matrix Representation of Physical Quantities 42
  2.4 Coordinate Representation of Schrödinger Equation 45
  2.5 Variance and Uncertainty Principle 53
  References 58

3 Hydrogen-Like Atoms 59
  3.1 Introductory Remarks 59
  3.2 Constitution of Hamiltonian 60
  3.3 Separation of Variables 69
  3.4 Generalized Angular Momentum 75
  3.5 Orbital Angular Momentum: Operator Approach 79
  3.6 Orbital Angular Momentum: Analytic Approach 94
    3.6.1 Spherical Surface Harmonics and Associated Legendre Differential Equation 94
    3.6.2 Orthogonality of Associated Legendre Functions 105
  3.7 Radial Wave Functions of Hydrogen-Like Atoms 109
    3.7.1 Operator Approach to Radial Wave Functions 110
    3.7.2 Normalization of Radial Wave Functions 114
    3.7.3 Associated Laguerre Polynomials 118
  3.8 Total Wave Functions 124
  References 125

4 Optical Transition and Selection Rules 127
  4.1 Electric Dipole Transition 127
  4.2 One-Dimensional System 130
  4.3 Three-Dimensional System 134
  4.4 Selection Rules 144
  4.5 Angular Momentum of Radiation 150
  References 152

5 Approximation Methods of Quantum Mechanics 155
  5.1 Perturbation Method 155
    5.1.1 Quantum State and Energy Level Shift Caused by Perturbation 157
    5.1.2 Several Examples 160
  5.2 Variational Method 176
  References 183

6 Theory of Analytic Functions 185
  6.1 Set and Topology 185
    6.1.1 Basic Notions and Notations 186
    6.1.2 Topological Spaces and Their Building Blocks 189
    6.1.3 T1-Space 199
    6.1.4 Complex Numbers and Complex Plane 202
  6.2 Analytic Functions of a Complex Variable 205
  6.3 Integration of Analytic Functions: Cauchy’s Integral Formula 213
  6.4 Taylor’s Series and Laurent’s Series 223
  6.5 Zeros and Singular Points 229
  6.6 Analytic Continuation 232
  6.7 Calculus of Residues 234
  6.8 Examples of Real Definite Integrals 236
  6.9 Multivalued Functions and Riemann Surfaces 256
    6.9.1 Brief Outline 256
    6.9.2 Examples of Multivalued Functions 264
  References 274

Part II Electromagnetism

7 Maxwell’s Equations 277
  7.1 Maxwell’s Equations and Their Characteristics 277
  7.2 Equation of Wave Motion 284
  7.3 Polarized Characteristics of Electromagnetic Waves 289
  7.4 Superposition of Two Electromagnetic Waves 294
  References 303

8 Reflection and Transmission of Electromagnetic Waves in Dielectric Media 305
  8.1 Electromagnetic Fields at an Interface 305
  8.2 Basic Concepts Underlying Phenomena 307
  8.3 Transverse Electric (TE) Waves and Transverse Magnetic (TM) Waves 313
  8.4 Energy Transport by Electromagnetic Waves 319
  8.5 Brewster Angles and Critical Angles 322
  8.6 Total Reflection 326
  8.7 Waveguide Applications 331
    8.7.1 TE and TM Waves in a Waveguide 331
    8.7.2 Total Internal Reflection and Evanescent Waves 338
  8.8 Stationary Waves 343
  References 350

9 Light Quanta: Radiation and Absorption 351
  9.1 Blackbody Radiation 351
  9.2 Planck’s Law of Radiation and Mode Density of Electromagnetic Waves 353
  9.3 Two-Level Atoms 358
  9.4 Dipole Radiation 361
  9.5 Lasers 367
    9.5.1 Brief Outlook 367
    9.5.2 Organic Lasers 372
  9.6 Mechanical System 396
  References 399

10 Introductory Green’s Functions 401
  10.1 Second-Order Linear Differential Equations (SOLDEs) 401
  10.2 First-Order Linear Differential Equations (FOLDEs) 406
  10.3 Second-Order Differential Operators 411
  10.4 Green’s Functions 417
  10.5 Construction of Green’s Functions 424
  10.6 Initial-Value Problems (IVPs) 431
    10.6.1 General Remarks 431
    10.6.2 Green’s Functions for IVPs 433
    10.6.3 Estimation of Surface Terms 436
    10.6.4 Examples 440
  10.7 Eigenvalue Problems 448
  References 453

Part III Linear Vector Spaces

11 Vectors and Their Transformation 457
  11.1 Vectors 457
  11.2 Linear Transformations of Vectors 463
  11.3 Inverse Matrices and Determinants 473
  11.4 Basis Vectors and Their Transformations 478
  References 484

12 Canonical Forms of Matrices 485
  12.1 Eigenvalues and Eigenvectors 485
  12.2 Eigenspaces and Invariant Subspaces 495
  12.3 Generalized Eigenvectors and Nilpotent Matrices 500
  12.4 Idempotent Matrices and Generalized Eigenspaces 505
  12.5 Decomposition of Matrix 512
  12.6 Jordan Canonical Form 515
    12.6.1 Canonical Form of Nilpotent Matrix 515
    12.6.2 Jordan Blocks 521
    12.6.3 Example of Jordan Canonical Form 528
  12.7 Diagonalizable Matrices 539
  References 548

13 Inner Product Space 549
  13.1 Inner Product and Metric 549
  13.2 Gram Matrices 552
  13.3 Adjoint Operators 561
  13.4 Orthonormal Basis 566
  References 570

14 Hermitian Operators and Unitary Operators 571
  14.1 Projection Operators 571
  14.2 Normal Operators 578
  14.3 Unitary Diagonalization of Matrices 580
  14.4 Hermitian Matrices and Unitary Matrices 588
  14.5 Hermitian Quadratic Forms 592
  14.6 Simultaneous Eigenstates and Diagonalization 599
  References 605

15 Exponential Functions of Matrices 607
  15.1 Functions of Matrices 607
  15.2 Exponential Functions of Matrices and Their Manipulations 611
  15.3 System of Differential Equations 618
    15.3.1 Introduction 618
    15.3.2 System of Differential Equations in a Matrix Form: Resolvent Matrix 621
    15.3.3 Several Examples 626
  15.4 Motion of a Charged Particle in Polarized Electromagnetic Wave 637
  References 644

Part IV Group Theory and Its Chemical Applications

16 Introductory Group Theory 647
  16.1 Definition of Groups 647
  16.2 Subgroups 649
  16.3 Classes 651
  16.4 Isomorphism and Homomorphism 653
  16.5 Direct-Product Groups 657
  Reference 659

17 Symmetry Groups 661
  17.1 A Variety of Symmetry Operations 661
  17.2 Successive Symmetry Operations 669
  17.3 O and Td Groups 680
  17.4 Special Orthogonal Group SO(3) 689
    17.4.1 Rotation Axis and Rotation Matrix 690
    17.4.2 Euler Angles and Related Topics 695
  References 704

18 Representation Theory of Groups 705
  18.1 Definition of Representation 705
  18.2 Basis Functions of Representation 709
  18.3 Schur’s Lemmas and Grand Orthogonality Theorem (GOT) 717
  18.4 Characters 723
  18.5 Regular Representation and Group Algebra 726
  18.6 Classes and Irreducible Representations 734
  18.7 Projection Operators: Revisited 737
  18.8 Direct-Product Representation 746
  18.9 Symmetric Representation and Antisymmetric Representation 750
  References 753

19 Applications of Group Theory to Physical Chemistry 755
  19.1 Transformation of Functions 755
  19.2 Method of Molecular Orbitals (MOs) 761
  19.3 Calculation Procedures of Molecular Orbitals (MOs) 768
  19.4 MO Calculations Based on π-Electron Approximation 773
    19.4.1 Ethylene 773
    19.4.2 Cyclopropenyl Radical 783
    19.4.3 Benzene 791
    19.4.4 Allyl Radical 797
  19.5 MO Calculations of Methane 805
  References 826

20 Theory of Continuous Groups 827
  20.1 Introduction: Operators of Rotation and Infinitesimal Rotation 827
  20.2 Rotation Groups: SU(2) and SO(3) 834
    20.2.1 Construction of SU(2) Matrices 837
    20.2.2 SU(2) Representation Matrices: Wigner Formula 840
    20.2.3 SO(3) Representation Matrices and Spherical Surface Harmonics 843
    20.2.4 Irreducible Representations of SU(2) and SO(3) 853
    20.2.5 Parameter Space of SO(3) 861
    20.2.6 Irreducible Characters of SO(3) and Their Orthogonality 868
  20.3 Clebsch-Gordan Coefficients of Rotation Groups 873
    20.3.1 Direct-Product of SU(2) and Clebsch-Gordan Coefficients 874
    20.3.2 Calculation Procedures of Clebsch-Gordan Coefficients 880
    20.3.3 Examples of Calculation of Clebsch-Gordan Coefficients 890
  20.4 Lie Groups and Lie Algebras 896
    20.4.1 Definition of Lie Groups and Lie Algebras: One-Parameter Groups 896
    20.4.2 Properties of Lie Algebras 899
    20.4.3 Adjoint Representation of Lie Groups 905
  20.5 Connectedness of Lie Groups 915
    20.5.1 Several Definitions and Examples 915
    20.5.2 O(3) and SO(3) 918
    20.5.3 Simply Connected Lie Groups: Local Properties and Global Properties 920
  References 927

Part V Introduction to the Quantum Theory of Fields

21 The Dirac Equation 931
  21.1 Historical Background 931
  21.2 Several Remarks on the Special Theory of Relativity 934
    21.2.1 Minkowski Space and Lorentz Transformation 934
    21.2.2 Event and World Interval 939
  21.3 Constitution and Solutions of the Dirac Equation 944
    21.3.1 General Form of the Dirac Equation 944
    21.3.2 Plane Wave Solutions of the Dirac Equation 949
    21.3.3 Negative-Energy Solution of the Dirac Equation 955
    21.3.4 One-Particle Hamiltonian of the Dirac Equation 960
  21.4 Normalization of the Solutions of the Dirac Equation 964
  21.5 Charge Conjugation 967
  21.6 Characteristics of the Gamma Matrices 971
  References 976

22 Quantization of Fields 977
  22.1 Lagrangian Formalism of the Fields 977
  22.2 Introductory Fourier Analysis 985
    22.2.1 Fourier Series Expansion 986
    22.2.2 Fourier Integral Transforms: Fourier Transform and Inverse Fourier Transform 987
  22.3 Quantization of the Scalar Field 991
    22.3.1 Lagrangian Density and Action Integral 991
    22.3.2 Equal-Time Commutation Relation and Field Quantization 993
    22.3.3 Hamiltonian and Fock Space 999
    22.3.4 Invariant Delta Functions of the Scalar Field 1005
    22.3.5 Feynman Propagator of the Scalar Field 1015
    22.3.6 General Consideration on the Field Quantization 1023
  22.4 Quantization of the Dirac Field 1028
    22.4.1 Lagrangian Density and Hamiltonian Density of the Dirac Field 1028
    22.4.2 Quantization Procedure of the Dirac Field 1035
    22.4.3 Antiparticle: Positron 1039
    22.4.4 Invariant Delta Functions of the Dirac Field 1040
    22.4.5 Feynman Propagator of the Dirac Field 1043
  22.5 Quantization of the Electromagnetic Field 1049
    22.5.1 Relativistic Formulation of the Electromagnetic Field 1049
    22.5.2 Lagrangian Density and Hamiltonian Density of the Electromagnetic Field 1055
    22.5.3 Polarization Vectors of the Electromagnetic Field 1059
    22.5.4 Canonical Quantization of the Electromagnetic Field 1061
    22.5.5 Hamiltonian and Indefinite Metric 1065
    22.5.6 Feynman Propagator of Photon Field 1071
  References 1074

23 Interaction Between Fields 1075
  23.1 Lorentz Force and Minimal Coupling 1075
  23.2 Lagrangian and Field Equation of the Interacting Fields 1079
  23.3 Local Phase Transformation and U(1) Gauge Field 1082
  23.4 Interaction Picture 1085
  23.5 S-Matrix and S-Matrix Expansion 1091
  23.6 N-Product and T-Product 1095
    23.6.1 Example of Two Field Operators 1095
    23.6.2 Calculations Including Both the N-Products and T-Products 1099
  23.7 Feynman Rules and Feynman Diagrams in QED 1103
    23.7.1 Zeroth- and First-Order S-Matrix Elements 1104
    23.7.2 Second-Order S-Matrix Elements and Feynman Diagrams 1108
  23.8 Example: Compton Scattering 1113
    23.8.1 Introductory Remarks 1113
    23.8.2 Feynman Amplitude and Feynman Diagrams of Compton Scattering 1115
    23.8.3 Scattering Cross-Section 1120
    23.8.4 Spin and Photon Polarization Sums 1124
    23.8.5 Detailed Calculation Procedures of Feynman Amplitude 1126
    23.8.6 Experimental Observations 1136
  23.9 Summary 1139
  References 1140

24 Basic Formalism 1141
  24.1 Extended Concepts of Vector Spaces 1141
    24.1.1 Bilinear Mapping 1142
    24.1.2 Tensor Product 1149
    24.1.3 Bilinear Form 1155
    24.1.4 Dual Vector Space 1155
    24.1.5 Invariants 1165
    24.1.6 Tensor Space and Tensors 1171
    24.1.7 Euclidean Space and Minkowski Space 1177
  24.2 Lorentz Group and Lorentz Transformations 1187
    24.2.1 Lie Algebra of the Lorentz Group 1189
    24.2.2 Successive Lorentz Transformations 1193
  24.3 Covariant Properties of the Dirac Equation 1197
    24.3.1 General Consideration on the Physical Equation 1197
    24.3.2 The Klein-Gordon Equation and the Dirac Equation 1200
  24.4 Determination of General Form of Matrix S(Λ) 1204
  24.5 Transformation Properties of the Dirac Spinors 1207
  24.6 Transformation Properties of the Dirac Operators 1212
    24.6.1 Case I: Single Lorentz Boost 1212
    24.6.2 Case II: Non-Coaxial Lorentz Boosts 1221
  24.7 Projection Operators and Related Operators for the Dirac Equation 1229
  24.8 Spectral Decomposition of the Dirac Operators 1235
  24.9 Connectedness of the Lorentz Group 1241
    24.9.1 Polar Decomposition of a Non-Singular Matrix 1242
    24.9.2 Special Linear Group: SL(2, ℂ) 1250
  24.10 Representation of the Proper Orthochronous Lorentz Group SO0(3, 1) 1258
  References 1269

25 Advanced Topics of Lie Algebra 1271
  25.1 Differential Representation of Lie Algebra 1271
    25.1.1 Overview 1271
    25.1.2 Adjoint Representation of Lie Algebra 1273
    25.1.3 Differential Representation of Lorentz Algebra 1278
  25.2 Cartan-Weyl Basis of Lie Algebra 1287
    25.2.1 Complexification 1288
    25.2.2 Coupling of Angular Momenta: Revisited 1294
  25.3 Decomposition of Lie Algebra 1298
  25.4 Further Topics of Lie Algebra 1301
  25.5 Closing Remarks 1304
  References 1304

Index 1305

Part I

Quantum Mechanics

Quantum mechanics is clearly distinguished from classical physics, whose major pillars are Newtonian mechanics and the electromagnetism established by Maxwell. Quantum mechanics was first established as a theory of atomic physics that handled the microscopic world. Later on, it was also applied to the macroscopic world, i.e., the cosmos. The questions of how exactly quantum mechanics describes the natural world and of how far the theory can go remain unsettled and are in dispute to this day. Such ultimate questions are beyond the scope of this monograph. Our major aim is to study a standard approach to applying the Schrödinger equation to selected topics: a particle confined within a potential well, a harmonic oscillator, and hydrogen-like atoms. Our major task rests on solving the eigenvalue problems of these topics. To this end, we describe both an analytical method and an algebraic (or operator) method. Focusing on these topics, we will be able to acquire various methods to tackle a wide range of quantum-mechanical problems. These problems are usually posed as an analytical equation (i.e., a differential equation) or an algebraic equation, and a Hamiltonian is constructed analytically or algebraically accordingly. Besides the Hamiltonian, physical quantities are expressed as differential operators or matrix operators. In both the analytical and algebraic approaches, the Hermitian property (or Hermiticity) of an operator or matrix is of crucial importance. This feature will, therefore, be highlighted not only in this part but also throughout this book, along with unitary operators and matrices.

Optical transitions and the associated selection rules are dealt with in relation to the above topics. These subjects are closely related to the electromagnetic phenomena considered in Part II. Unlike the eigenvalue problems of the above-mentioned topics, it is difficult to obtain exact analytical solutions in most quantum-mechanical problems. For this reason, we need appropriate methods to obtain approximate solutions to various problems, including eigenvalue problems. In this context, we deal with two approximation techniques: the perturbation method and the variational method.

In the last chapter of this part, we study the theory of analytic functions, one of the most elegant theories of mathematics. This approach not only helps cultivate a broad view of pure mathematics, but also leads to the acquisition of practical methodology. That chapter also covers introductory set theory and topology.

Chapter 1

Schrödinger Equation and Its Application

Quantum mechanics is an indispensable research tool of modern natural science, covering cosmology, atomic physics, molecular science, materials science, and so forth. The basic concept underlying quantum mechanics rests upon the Schrödinger equation, which is described as a second-order linear differential equation (SOLDE) and can accordingly be solved analytically. Alternatively, the equations of quantum mechanics are often described in terms of operators and matrices, with physical quantities represented by those operators and matrices; normally, they are non-commutative. In particular, the quantum-mechanical formalism requires the canonical commutation relation between the position and momentum operators. One of the great characteristics of quantum mechanics is that operators representing physical quantities must be Hermitian. This aspect is deeply related to the requirement that the measured values of these quantities be real numbers. We deal with Hermiticity both from an analytical point of view (or coordinate representation) relevant to differential equations and from an algebraic viewpoint (or matrix representation) associated with operators and matrices. Including these topics, we briefly survey the origin of the Schrödinger equation and consider its implications. To become acquainted with the quantum-mechanical formalism, we deal with simple examples of the Schrödinger equation.
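The canonical commutation relation mentioned above can be made concrete with a small numerical sketch (an illustration added here, not part of the original text): discretizing the position operator x̂ and the momentum operator p̂ = -iħ d/dx on a grid, the commutator [x̂, p̂] applied to a smooth wave function reproduces iħψ up to discretization error.

```python
import math

hbar = 1.054_571_8e-34  # reduced Planck constant (J s); assumed CODATA-style value

# A smooth test wave function (Gaussian) sampled on a uniform grid
sigma = 1.0e-9                                   # Gaussian width (m)
dx = 1.0e-12                                     # grid spacing (m)
xs = [i * dx for i in range(-2000, 2001)]
psi = [math.exp(-(x / sigma) ** 2) for x in xs]

def deriv(f):
    """Central-difference derivative (one-sided at the two edges)."""
    inner = [(f[i + 1] - f[i - 1]) / (2 * dx) for i in range(1, len(f) - 1)]
    return [(f[1] - f[0]) / dx] + inner + [(f[-1] - f[-2]) / dx]

# For a real psi, p psi = -i*hbar*psi' is purely imaginary, so it suffices
# to track imaginary parts: Im(p f) = -hbar * f'
im_p_psi = [-hbar * g for g in deriv(psi)]                                # Im(p psi)
im_p_xpsi = [-hbar * g for g in deriv([x * f for x, f in zip(xs, psi)])]  # Im(p (x psi))

# Im([x, p] psi) = x*Im(p psi) - Im(p (x psi)); canonically this equals hbar*psi
im_comm = [x * a - b for x, a, b in zip(xs, im_p_psi, im_p_xpsi)]
err = max(abs(c - hbar * f) for c, f in zip(im_comm[10:-10], psi[10:-10]))
print(err / hbar < 1e-4)  # True: [x, p] acts as i*hbar within grid error
```

The residual error is O(dx²) from the central-difference stencil, which is why refining the grid drives the discretized commutator toward iħ.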

1.1 Early-Stage Quantum Theory

The Schrödinger equation is a direct consequence of the discovery of quanta. It stemmed from the hypothesis of energy quanta propounded by Max Planck (1900), which was followed by the photon (light-quantum) hypothesis of Albert Einstein (1905). Einstein claimed that light is an aggregation of light quanta and that individual quanta carry an energy E expressed as the Planck constant h multiplied by the frequency ν of light, i.e.,

© Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_1

E = hν = ħω,  (1.1)

where ħ ≡ h/2π and ω = 2πν. The quantity ω is called the angular frequency, with ν being the frequency. The quantity ħ is said to be the reduced Planck constant. Also, Einstein (1917) concluded that the momentum p of a light quantum is identical to its energy divided by the light velocity in vacuum c. That is, we have

p = E/c = ħω/c = ħk,  (1.2)

where k ≡ 2π/λ (λ is the wavelength of light in vacuum) and k is called the wavenumber. Using vector notation, we have

p = ħk,  (1.3)

where k ≡ (2π/λ)n (n: a unit vector in the direction of propagation of light) is said to be a wavenumber vector. Meanwhile, Arthur Compton (1923) conducted various experiments in which he investigated how an incident X-ray beam was scattered by matter (e.g., graphite, copper, etc.). As a result, Compton found a systematic redshift in the X-ray wavelengths as a function of the scattering angle of the X-ray beam (Compton effect). Moreover, he found that the shift in wavelength depended only on the scattering angle, regardless of the material of the scatterer. The results can be summarized in a simple equation described as

Δλ = (h/me c)(1 - cos θ),  (1.4)

where Δλ denotes the shift in wavelength of the scattered beam; me is the rest mass of an electron; θ is the scattering angle of the X-ray beam (see Fig. 1.1). The quantity h/me c has the dimension of length and is denoted by λe. That is,

λe ≡ h/me c.  (1.5)
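The numbers appearing in (1.4) and (1.5) are easy to check directly. The sketch below (added for illustration; the physical constants are assumed CODATA-style values, not taken from the text) evaluates λe and the wavelength shift Δλ at a few scattering angles.

```python
import math

# CODATA-style constants; assumed values, not taken from the text
h = 6.626_070_15e-34     # Planck constant (J s)
m_e = 9.109_383_7e-31    # electron rest mass (kg)
c = 2.997_924_58e8       # speed of light in vacuum (m/s)

# Electron Compton wavelength, Eq. (1.5): lambda_e = h / (m_e c)
lambda_e = h / (m_e * c)
print(lambda_e)          # ≈ 2.426e-12 m, the value quoted in the text

# Wavelength shift of Eq. (1.4) for several scattering angles
for theta in (0.0, math.pi / 2, math.pi):
    d_lambda = lambda_e * (1.0 - math.cos(theta))
    print(round(math.degrees(theta)), d_lambda)
# theta = pi/2 reproduces lambda_e itself; theta = pi gives the largest shift, 2*lambda_e
```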

In other words, λe is equal to the shift in the wavelength of the scattered beam obtained when θ = π/2 (the largest shift, 2λe, occurs for backscattering, θ = π). The quantity λe is called the electron Compton wavelength and has an approximate value of 2.426 × 10⁻¹² m.

Let us derive (1.4) on the basis of conservation of energy and momentum. To this end, in Fig. 1.1 we assume that an electron is originally at rest. An X-ray beam is incident on the electron; the X-ray is then scattered and the electron recoils, as shown. Energy conservation reads as

ħω + me c² = ħω′ + √(p²c² + me²c⁴),  (1.6)


Fig. 1.1 Scattering of an X-ray beam by an electron. (a) θ denotes a scattering angle of the X-ray beam. (b) Conservation of momentum

where ω and ω′ are the initial and final angular frequencies of the X-ray; the second term on the RHS is the energy of the electron, in which p is the magnitude of its momentum after recoil. Meanwhile, conservation of momentum as a vector quantity reads as

ħk = ħk′ + p,    (1.7)

where k and k′ are the wavenumber vectors of the X-ray before and after being scattered; p is the momentum of the electron after recoil. Note that the initial momentum of the electron is zero, since the electron is originally at rest. Here p is defined as

p ≡ mu,    (1.8)

where u is the velocity of the electron and m is given by [1]

m = me/√(1 - |u|²/c²).    (1.9)

Figure 1.1 shows that -ħk, ħk′, and p form a closed triangle. From (1.6), we have

[mec² + ħ(ω - ω′)]² = p²c² + me²c⁴.    (1.10)

Hence, we get

2mec²ħ(ω - ω′) + ħ²(ω - ω′)² = p²c².    (1.11)

From (1.7), we have

p² = ħ²(k - k′)² = ħ²(k² + k′² - 2kk′ cos θ) = (ħ²/c²)(ω² + ω′² - 2ωω′ cos θ),    (1.12)

where we used the relations ω = ck and ω′ = ck′ in the third equality. Therefore, we get

p²c² = ħ²(ω² + ω′² - 2ωω′ cos θ).    (1.13)

From (1.11) and (1.13), we have

2mec²ħ(ω - ω′) + ħ²(ω - ω′)² = ħ²(ω² + ω′² - 2ωω′ cos θ).    (1.14)

Equation (1.14) is simplified to the following:

2mec²ħ(ω - ω′) - 2ħ²ωω′ = -2ħ²ωω′ cos θ.

That is,

mec²(ω - ω′) = ħωω′(1 - cos θ).    (1.15)

Thus, we get

(ω - ω′)/ωω′ = 1/ω′ - 1/ω = (1/2πc)(λ′ - λ) = (ħ/mec²)(1 - cos θ),    (1.16)

where λ and λ′ are the wavelengths of the initial and final X-ray beams, respectively. Since λ′ - λ = Δλ, we obtain (1.4) from (1.16) accordingly. We also have to mention another important figure in the development of quantum mechanics, Louis-Victor de Broglie (1924). Encouraged by the success of Einstein and Compton, he propounded the concept of the matter wave, which was afterward referred to as the de Broglie wave. Namely, de Broglie reversed the relationship of (1.1) and (1.2) such that

ω = E/ħ,    (1.17)

and

k = p/ħ  or  λ = h/p,    (1.18)

where p equals |p| and λ is the wavelength of a corpuscular beam; λ is said to be the de Broglie wavelength. In (1.18), de Broglie thought that a particle carrying an energy E and momentum p is accompanied by a wave characterized by an angular frequency ω and wavenumber k (or a wavelength λ = 2π/k). Equation (1.18) implies that if we can determine the wavelength of the corpuscular beam experimentally, we can determine the magnitude of its momentum accordingly. In turn, from the squares of both sides of (1.8) and (1.9) we get

u = p/[me√(1 + (p/mec)²)].    (1.19)

This relation represents the velocity of the particles of the corpuscular beam. If we are dealing with an electron beam, (1.19) gives the velocity of the electron beam. As a non-relativistic approximation (i.e., p/mec ≪ 1), we have p ≈ meu. We used a relativistic relation in the second term on the RHS of (1.6), where the energy Ee of an electron is expressed by

Ee = √(p²c² + me²c⁴).    (1.20)

In the meantime, eliminating u² from (1.8) and (1.9), we have

mc² = √(p²c² + me²c⁴).

Namely, we get [1]

Ee = mc².    (1.21)

The relation (1.21) is due to Einstein (1905, 1907) and is known as the equivalence theorem of mass and energy. If an electron is accompanied by a matter wave, that wave should propagate with a certain phase velocity vp and group velocity vg. Thus, using (1.17) and (1.18), we have

vp = ω/k = Ee/p = √(p²c² + me²c⁴)/p > c,
vg = ∂ω/∂k = ∂Ee/∂p = c²p/√(p²c² + me²c⁴) < c,    (1.22)
vpvg = c².

Notice that in the above expressions we replaced E of (1.17) with Ee of (1.20). The group velocity is considered the velocity of a wave packet and, hence, the propagation velocity of a matter wave should be identical to vg. Thus, vg is regarded as the particle velocity as well. In fact, vg given by (1.22) is identical to u expressed in (1.19). Therefore, a particle velocity must not exceed c. As for


photons (or light quanta), vp = vg = c and, hence, once again we get vpvg = c². We will encounter the last relation of (1.22) in Part II as well. The above discussion is a brief historical outlook of early-stage quantum theory before Erwin Schrödinger (1926) propounded his equation.
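The numbers above are easy to verify. The following sketch (plain Python; the CODATA constant values and the choice of a 100 eV electron are assumptions for illustration, not part of the text) checks the Compton wavelength (1.5), the de Broglie wavelength (1.18), and the identity vpvg = c² of (1.22).

```python
import math

# Physical constants (approximate CODATA values)
h = 6.62607015e-34      # Planck constant [J s]
c = 2.99792458e8        # speed of light in vacuum [m/s]
m_e = 9.1093837015e-31  # electron rest mass [kg]

# Electron Compton wavelength, Eq. (1.5)
lam_e = h / (m_e * c)
print(f"lambda_e = {lam_e:.4e} m")   # ~2.426e-12 m

# Compton shift, Eq. (1.4): at theta = pi/2 the shift equals lambda_e
theta = math.pi / 2
d_lam = lam_e * (1 - math.cos(theta))
assert abs(d_lam - lam_e) < 1e-20

# de Broglie wavelength, Eq. (1.18), for a 100 eV electron (non-relativistic)
E_kin = 100 * 1.602176634e-19        # [J]
p = math.sqrt(2 * m_e * E_kin)
print(f"lambda_dB = {h / p:.4e} m")

# Phase and group velocities, Eq. (1.22): v_p * v_g = c^2
E_e = math.sqrt((p * c)**2 + (m_e * c**2)**2)
v_p = E_e / p
v_g = c**2 * p / E_e
assert abs(v_p * v_g - c**2) / c**2 < 1e-9
```

Note that vg computed this way also reproduces u of (1.19) to numerical precision.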

1.2 Schrödinger Equation

First we introduce a wave equation expressed by

∇²ψ = (1/v²) ∂²ψ/∂t²,    (1.23)

where ψ is an arbitrary function of a physical quantity relevant to the propagation of a wave; v is the phase velocity of the wave; ∇², called the Laplacian (or Laplace operator), is defined by

∇² ≡ ∂²/∂x² + ∂²/∂y² + ∂²/∂z².    (1.24)

One of the special solutions of (1.23), called a plane wave, is well studied and expressed as

ψ = ψ₀ e^{i(k·x - ωt)}.    (1.25)

In (1.25), x denotes a position vector in three-dimensional Cartesian coordinates and is described as

x = (e₁ e₂ e₃)( x ; y ; z ),    (1.26)

where e₁, e₂, and e₃ denote basis vectors of an orthonormal basis pointing in the positive directions of the x-, y-, and z-axes, respectively. Here we make it a rule to represent basis vectors by a row vector and to represent the coordinates or components of a vector by a column vector; see Sect. 11.1. The other way around, we now wish to seek a basic equation whose solution is described by (1.25). Taking account of (1.1)–(1.3) as well as (1.17) and (1.18), we rewrite (1.25) as


ψ = ψ₀ e^{i(p·x - Et)/ħ},    (1.27)

where we redefine p = (e₁ e₂ e₃)( px ; py ; pz ) and E as quantities associated with those of the matter (electron) wave. Taking the partial differentiation of (1.27) with respect to x, we obtain

∂ψ/∂x = (i/ħ) px ψ₀ e^{i(p·x - Et)/ħ} = (i/ħ) px ψ.    (1.28)

Rewriting (1.28), we have

(ħ/i) ∂ψ/∂x = px ψ.    (1.29)

Similarly, we have

(ħ/i) ∂ψ/∂y = py ψ  and  (ħ/i) ∂ψ/∂z = pz ψ.    (1.30)

Comparing both sides of (1.29), we notice that we may relate the differential operator (ħ/i) ∂/∂x to px. From (1.30), similar relationships hold for the y and z components. That is, we have the following correspondences:

(ħ/i) ∂/∂x ↔ px,  (ħ/i) ∂/∂y ↔ py,  (ħ/i) ∂/∂z ↔ pz.    (1.31)

Taking the partial differentiation of (1.28) once more, we get

∂²ψ/∂x² = (i/ħ)² px² ψ₀ e^{i(p·x - Et)/ħ} = -(1/ħ²) px² ψ.    (1.32)

Hence,

-ħ² ∂²ψ/∂x² = px² ψ.    (1.33)

Similarly, we have

-ħ² ∂²ψ/∂y² = py² ψ  and  -ħ² ∂²ψ/∂z² = pz² ψ.    (1.34)

As in the above cases, we have

-ħ² ∂²/∂x² ↔ px²,  -ħ² ∂²/∂y² ↔ py²,  -ħ² ∂²/∂z² ↔ pz².    (1.35)

Summing both sides of (1.33) and (1.34) and then dividing by 2m, we have

-(ħ²/2m) ∇²ψ = (p²/2m) ψ    (1.36)

and the following correspondence:

-(ħ²/2m) ∇² ↔ p²/2m,    (1.37)

where m is the mass of a particle. Meanwhile, taking the partial differentiation of (1.27) with respect to t, we obtain

∂ψ/∂t = -(i/ħ) E ψ₀ e^{i(p·x - Et)/ħ} = -(i/ħ) E ψ.    (1.38)

That is,

iħ ∂ψ/∂t = Eψ.    (1.39)

As above, we get the following correspondence:

iħ ∂/∂t ↔ E.    (1.40)

Thus, we have relationships between c-numbers (classical numbers) and q-numbers (quantum numbers, namely, operators) in (1.35) and (1.40). Subtracting (1.36) from (1.39), we get

iħ ∂ψ/∂t + (ħ²/2m) ∇²ψ = Eψ - (p²/2m) ψ.    (1.41)

Invoking the relationship on energy


(Total energy) = (Kinetic energy) + (Potential energy),    (1.42)

we have

E = p²/2m + V,    (1.43)

where V is a potential energy. Thus, (1.41) reads as

iħ ∂ψ/∂t + (ħ²/2m) ∇²ψ = Vψ.    (1.44)

Rearranging (1.44), we finally get

[-(ħ²/2m) ∇² + V] ψ = iħ ∂ψ/∂t.    (1.45)

This is the Schrödinger equation, a fundamental equation of quantum mechanics. In (1.45), we define the following Hamiltonian operator H as

H ≡ -(ħ²/2m) ∇² + V.    (1.46)

Then we have a shorthand representation such that

Hψ = iħ ∂ψ/∂t.    (1.47)
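As a quick consistency check, one can verify symbolically that the one-dimensional plane wave (1.27), with the free-particle energy E = p²/2m, satisfies (1.45) for V = 0. A minimal sketch using sympy (the use of sympy is an assumption for illustration, not part of the text):

```python
import sympy as sp

# Symbols: x, t real; p (momentum), m (mass), hbar positive; V = 0
x, t = sp.symbols('x t', real=True)
p, m, hbar = sp.symbols('p m hbar', positive=True)
E = p**2 / (2 * m)   # free-particle energy, Eq. (1.43) with V = 0

# One-dimensional plane wave, Eq. (1.27)
psi = sp.exp(sp.I * (p * x - E * t) / hbar)

# LHS of (1.45) with V = 0, and RHS i*hbar*dpsi/dt
lhs = -hbar**2 / (2 * m) * sp.diff(psi, x, 2)
rhs = sp.I * hbar * sp.diff(psi, t)
assert sp.simplify(lhs - rhs) == 0
print("plane wave satisfies the free Schrödinger equation")
```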

On going from (1.25) to (1.27), we realize that the quantities k and ω pertinent to a field have been converted to the quantities p and E related to a particle. At the same time, whereas x and t represent the whole space-time in (1.25), in (1.27) they are characterized as localized quantities. From a historical point of view, we have to mention a great achievement accomplished by Werner Heisenberg (1925), who propounded matrix mechanics. Matrix mechanics is often contrasted with the wave mechanics Schrödinger initiated. Schrödinger and Paul Dirac (1926) demonstrated that wave mechanics and matrix mechanics are mathematically equivalent. Note that the Schrödinger equation is a non-relativistic expression based on (1.43). In fact, the kinetic energy K of a particle is given by [1]

K = mec²/√(1 - (u/c)²) - mec².

As a non-relativistic approximation, we get

K ≈ mec²[1 + (1/2)(u/c)²] - mec² = (1/2)meu² ≈ p²/2me,

where we used p ≈ meu again as a non-relativistic approximation; also, we used

1/√(1 - x) ≈ 1 + x/2

when x (>0), corresponding to (u/c)², is sufficiently small compared with 1. This implies that in the above case the group velocity u of the particle is supposed to be well below the light velocity c. Dirac (1928), however, formulated an equation that describes the relativistic quantum mechanics of the electron (the Dirac equation). A detailed description of the Dirac equation will be found in Part V. In (1.45), ψ varies as a function of x and t. Suppose, however, that the potential V depends only upon x. Then we have

[-(ħ²/2m) ∇² + V(x)] ψ(x, t) = iħ ∂ψ(x, t)/∂t.    (1.48)

Now, let us assume that separation of variables can be done with (1.48) such that

ψ(x, t) = ϕ(x)ξ(t).    (1.49)

Then, we have

[-(ħ²/2m) ∇² + V(x)] ϕ(x)ξ(t) = iħ ∂[ϕ(x)ξ(t)]/∂t.    (1.50)

Accordingly, (1.50) can be recast as

{[-(ħ²/2m) ∇² + V(x)] ϕ(x)}/ϕ(x) = iħ [∂ξ(t)/∂t]/ξ(t).    (1.51)

For (1.51) to hold, we must equate both sides to a constant E. That is, for a certain fixed point x₀ we have

{[-(ħ²/2m) ∇² + V(x₀)] ϕ(x₀)}/ϕ(x₀) = iħ [∂ξ(t)/∂t]/ξ(t),    (1.52)

where ϕ(x₀) in the numerator should be evaluated after operating with ∇², while ϕ(x₀) in the denominator is evaluated by simply replacing x in ϕ(x) with x₀. Now, let us define a function Φ(x) such that


Φ(x) ≡ {[-(ħ²/2m) ∇² + V(x)] ϕ(x)}/ϕ(x).    (1.53)

Then, we have

Φ(x₀) = iħ [∂ξ(t)/∂t]/ξ(t).    (1.54)

If the RHS of (1.54) varied depending on t, Φ(x₀) would be allowed to take various values, but this must not be the case with our present investigation. Thus, the RHS of (1.54) should take a constant value E. For the same reason, the LHS of (1.51) should be a constant. Thus, (1.48) or (1.51) should be separated into the following equations:

Hϕ(x) = Eϕ(x),    (1.55)

iħ ∂ξ(t)/∂t = Eξ(t).    (1.56)

Equation (1.56) can readily be solved. Since (1.56) depends on the sole variable t, we have

dξ(t)/ξ(t) = (E/iħ) dt  or  d ln ξ(t) = (E/iħ) dt.    (1.57)

Integrating (1.57) from zero to t, we get

ln [ξ(t)/ξ(0)] = Et/iħ.    (1.58)

That is,

ξ(t) = ξ(0) exp(-iEt/ħ).    (1.59)

Comparing (1.59) with (1.38), we find that the constant E in (1.55) and (1.56) represents the energy of a particle (electron). Thus, the next task is to solve the eigenvalue equation (1.55). After solving the problem, we get a solution

ψ(x, t) = ϕ(x) exp(-iEt/ħ),    (1.60)

where the constant ξ(0) has been absorbed into ϕ(x). Normally, ϕ(x) is to be normalized after determining its functional form (vide infra).
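The solution (1.59) of the time equation (1.56) can also be confirmed with a computer algebra system; a minimal sympy sketch (the use of sympy is an assumption for illustration):

```python
import sympy as sp

t = sp.symbols('t', real=True)
E, hbar = sp.symbols('E hbar', positive=True)
xi = sp.Function('xi')

# Eq. (1.56): i*hbar * dxi/dt = E * xi
ode = sp.Eq(sp.I * hbar * xi(t).diff(t), E * xi(t))
sol = sp.dsolve(ode, xi(t))   # mathematically, xi(t) = C1 * exp(-i*E*t/hbar)
print(sol)

# Verify that xi(0)*exp(-i*E*t/hbar), Eq. (1.59), indeed satisfies (1.56)
xi0 = sp.symbols('xi0')
cand = xi0 * sp.exp(-sp.I * E * t / hbar)
assert sp.simplify(sp.I * hbar * cand.diff(t) - E * cand) == 0
```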

1.3 Simple Applications of Schrödinger Equation

The Schrödinger equation has been expressed as (1.48). The equation is a second-order linear differential equation (SOLDE). In particular, our major interest lies in solving the eigenvalue problem (1.55). Eigenvalues consist of points in a complex plane. Those points sometimes form a continuous domain, but we focus on eigenvalues that comprise discrete points in the complex plane. Therefore, in our studies the eigenvalues are countable and numbered as, e.g., λn (n = 1, 2, 3, ⋯). An example is depicted in Fig. 1.2. With this common understanding as a background, let us first think of a simple form of SOLDE.

Example 1.1 Let us think of the following differential equation:

d²y(x)/dx² + λy(x) = 0,    (1.61)

where x is a real variable; y may be a complex function of x, with λ possibly being a complex constant as well. Suppose that y(x) is defined within a domain [-L, L] (L > 0). We set boundary conditions (BCs) for (1.61) such that

y(L) = 0  and  y(-L) = 0  (L > 0).    (1.62)

Fig. 1.2 Eigenvalues λn (n = 1, 2, 3, ⋯) on a complex plane


The BCs of (1.62) are called Dirichlet conditions. We define the following differential operator D:

D ≡ -d²/dx².    (1.63)

Then, rewriting (1.61), we have

Dy(x) = λy(x).    (1.64)

According to a general principle of SOLDEs, (1.61) has two linearly independent solutions. In the case of (1.61), we choose exponential functions for those solutions, described by

e^{ikx}  and  e^{-ikx}  (k ≠ 0).    (1.65)

This is because the above functions do not change their functional form upon differentiation, and we thereby reduce solving a differential equation to solving an algebraic equation among constants (or parameters). In the present case, λ and k are such constants. The parameter k could be a complex variable, because λ is allowed to take a complex value as well. Linear independence of these functions is ensured by a non-vanishing Wronskian W. That is,

W = e^{ikx}(e^{-ikx})′ - e^{-ikx}(e^{ikx})′ = -ik - ik = -2ik.

If k ≠ 0, then W ≠ 0. Therefore, as a general solution, we get

y(x) = ae^{ikx} + be^{-ikx}  (k ≠ 0),    (1.66)

where a and b are (complex) constants. We call the two linearly independent solutions e^{ikx} and e^{-ikx} (k ≠ 0) a fundamental set of solutions of the SOLDE. Inserting (1.66) into (1.61), we have

(λ - k²)(ae^{ikx} + be^{-ikx}) = 0.    (1.67)

For (1.67) to hold for any x, we must have

λ - k² = 0,  i.e.,  λ = k².    (1.68)

Using BCs (1.62), we have

ae^{ikL} + be^{-ikL} = 0  and  ae^{-ikL} + be^{ikL} = 0.    (1.69)

Rewriting (1.69) in matrix form, we have

( e^{ikL}  e^{-ikL} ; e^{-ikL}  e^{ikL} )( a ; b ) = ( 0 ; 0 ).    (1.70)

For a and b in (1.70) to have non-vanishing solutions, the determinant of the matrix must vanish:

det( e^{ikL}  e^{-ikL} ; e^{-ikL}  e^{ikL} ) = 0,  i.e.,  e^{2ikL} - e^{-2ikL} = 0.    (1.71)

This is because if the determinant were not zero, we would have a = b = 0 and y(x) ≡ 0. Note that with an eigenvalue problem we must avoid a solution that is identically zero. Rewriting (1.71), we get

(e^{ikL} + e^{-ikL})(e^{ikL} - e^{-ikL}) = 0.    (1.72)

That is, we have either

e^{ikL} + e^{-ikL} = 0    (1.73)

or

e^{ikL} - e^{-ikL} = 0.    (1.74)

In the case of (1.73), inserting it into (1.69) we have

e^{ikL}(a - b) = 0.    (1.75)

Therefore,

a = b,    (1.76)

where we used the fact that e^{ikL} is non-vanishing for any ikL (either real or complex). Similarly, in the case of (1.74), we have

a = -b.    (1.77)

For (1.76), from (1.66) we have

y(x) = a(e^{ikx} + e^{-ikx}) = 2a cos kx.    (1.78)

With (1.77), in turn, we get

y(x) = a(e^{ikx} - e^{-ikx}) = 2ia sin kx.    (1.79)

Thus, we get the two linearly independent solutions (1.78) and (1.79). Inserting BCs (1.62) into (1.78), we have

cos kL = 0.    (1.80)

Hence,

kL = π/2 + mπ  (m = 0, ±1, ±2, ⋯).    (1.81)

In (1.81), for instance, we have k = π/2L for m = 0 and k = -π/2L for m = -1. Also, we have k = 3π/2L for m = 1 and k = -3π/2L for m = -2. These cases, however, individually give linearly dependent solutions for (1.78). Therefore, to get a set of linearly independent eigenfunctions we may take k as positive. Correspondingly, from (1.68) we get eigenvalues

λ = (2m + 1)²π²/4L²  (m = 0, 1, 2, ⋯).    (1.82)

Also, inserting BCs (1.62) into (1.79), we have

sin kL = 0.    (1.83)

Hence,

kL = nπ  (n = 1, 2, 3, ⋯).    (1.84)

From (1.68) we get

λ = n²π²/L² = (2n)²π²/4L²  (n = 1, 2, 3, ⋯),    (1.85)

where we chose positive numbers n for the same reason as above. With the second equality of (1.85), we made the eigenvalues easily comparable to those of (1.82). Figure 1.3 shows the eigenvalues given in both (1.82) and (1.85) in units of π²/4L². From (1.82) and (1.85), we find that λ is positive definite (or strictly positive), and so from (1.68) we have


Fig. 1.3 Eigenvalues of a differential equation (1.61) under boundary conditions given by (1.62). The eigenvalues are given in a unit of π 2/4L2 on a real axis

k = √λ.    (1.86)
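Taken together, (1.82) and (1.85) give λj = (jπ/2L)² (j = 1, 2, 3, ⋯). This can be checked numerically by discretizing D = -d²/dx² with the Dirichlet BCs (1.62); a small numpy sketch (the finite-difference approach and the grid size are assumptions for illustration):

```python
import math
import numpy as np

# Discretize D = -d^2/dx^2 on [-L, L] with Dirichlet BCs y(+-L) = 0,
# using the 3-point finite-difference stencil on N interior grid points.
L, N = 1.0, 1000
h = 2 * L / (N + 1)
main = np.full(N, 2.0) / h**2
off = np.full(N - 1, -1.0) / h**2
D = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

# Lowest numerical eigenvalues of D
num = np.sort(np.linalg.eigvalsh(D))[:4]

# Exact eigenvalues (1.82) and (1.85) merged: lambda_j = (j*pi/2L)^2
exact = [(j * math.pi / (2 * L))**2 for j in (1, 2, 3, 4)]
for n_val, e_val in zip(num, exact):
    assert abs(n_val - e_val) / e_val < 1e-3
print("lowest eigenvalues match (1.82)/(1.85)")
```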

The next step is to normalize the eigenfunctions. This step corresponds to an appropriate choice of the constant a in (1.78) and (1.79) so that we have

I = ∫_{-L}^{L} y(x)* y(x) dx = ∫_{-L}^{L} |y(x)|² dx = 1.    (1.87)

That is, for the cosine solutions,

I = 4|a|² ∫_{-L}^{L} cos² kx dx = 4|a|² ∫_{-L}^{L} (1/2)(1 + cos 2kx) dx = 2|a|² [x + (1/2k) sin 2kx]_{-L}^{L} = 4L|a|²,    (1.88)

where sin 2kL = 0 by virtue of (1.81). Combining (1.87) and (1.88), we get

|a| = (1/2)√(1/L).    (1.89)

Thus, we have

a = (1/2)√(1/L) e^{iθ},    (1.90)

where θ is any real number and e^{iθ} is said to be a phase factor. We usually set e^{iθ} = 1. Then, we have a = (1/2)√(1/L). Thus, for the normalized cosine eigenfunctions, we get

y(x) = √(1/L) cos kx  [kL = π/2 + mπ (m = 0, 1, 2, ⋯)]    (1.91)

that correspond to the eigenvalues λ = (2m + 1)²π²/4L² (m = 0, 1, 2, ⋯). For the other series, normalized sine functions, similarly we get

y(x) = √(1/L) sin kx  [kL = nπ (n = 1, 2, 3, ⋯)]    (1.92)

that correspond to the eigenvalues λ = (2n)²π²/4L² (n = 1, 2, 3, ⋯). Notice that, arranging λ in ascending order, we have even functions and odd functions alternately as the eigenfunctions corresponding to λ. Such a property is said to be parity. We often encounter it in quantum mechanics and related fields. From (1.61) we find that if y(x) is an eigenfunction, so is cy(x). That is, we should bear in mind that an eigenvalue problem is always accompanied by an indeterminate constant and that normalization of an eigenfunction does not mean uniqueness of the solution (see Chap. 10).

Strictly speaking, we should be careful to ensure that (1.81) holds on the basis of (1.80), because we have not yet excluded the possibility that k is a complex number. To see this, we examine the zeros of a cosine function defined in a complex domain. Here the zeros are the (complex) numbers at which the function takes the value zero. That is, if f(z₀) = 0, z₀ is called a zero (i.e., one of the zeros) of f(z). Now we have

cos z ≡ (1/2)(e^{iz} + e^{-iz});  z = x + iy (x, y: real).    (1.93)

Inserting z = x + iy into cos z and rearranging terms, we get

cos z = (1/2)[cos x (e^y + e^{-y}) + i sin x (e^{-y} - e^y)].    (1.94)

For cos z to vanish, both its real and imaginary parts must be zero. Since e^y + e^{-y} > 0 for all real numbers y, we must have cos x = 0 for the real part to vanish; i.e.,

x = π/2 + mπ  (m = 0, ±1, ±2, ⋯).    (1.95)

Note that in this case sin x = ±1 (≠0). Therefore, for the imaginary part to vanish, we need e^{-y} - e^y = 0; that is, we must have y = 0. Consequently, the zeros of cos z are real numbers. In other words, for z₀ satisfying cos z₀ = 0 we have

z₀ = π/2 + mπ  (m = 0, ±1, ±2, ⋯).    (1.96)

The above discussion applies equally to a sine function. Thus, we have ensured that k is a non-zero real number, and the eigenvalues λ are positive definite from (1.68) accordingly. This conclusion is not fortuitous but a direct consequence of the form of the differential equation we have dealt with, in combination with the BCs we imposed, i.e., the Dirichlet conditions. A detailed discussion will follow in Sects. 1.4, 10.3, and 10.4 in relation to the Hermiticity of a differential operator.
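The normalization and mutual orthogonality of the eigenfunctions (1.91) and (1.92) can be spot-checked numerically; a short sketch using Simpson integration (plain Python, with L = 1 chosen arbitrarily):

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson quadrature (n must be even)."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

L = 1.0
# Normalized eigenfunctions (1.91) and (1.92)
cos_m = lambda m: (lambda x: math.sqrt(1 / L) * math.cos((math.pi / 2 + m * math.pi) * x / L))
sin_n = lambda n: (lambda x: math.sqrt(1 / L) * math.sin(n * math.pi * x / L))

y0, y1, s1 = cos_m(0), cos_m(1), sin_n(1)

# Each eigenfunction is normalized ...
assert abs(simpson(lambda x: y0(x)**2, -L, L) - 1) < 1e-8
assert abs(simpson(lambda x: s1(x)**2, -L, L) - 1) < 1e-8
# ... and distinct eigenfunctions are mutually orthogonal
assert abs(simpson(lambda x: y0(x) * y1(x), -L, L)) < 1e-8
assert abs(simpson(lambda x: y0(x) * s1(x), -L, L)) < 1e-8
print("eigenfunctions orthonormal")
```

The cosine–sine orthogonality also follows immediately from parity: the integrand is odd.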

Example 1.2 A particle confined within a potential well. The results obtained in Example 1.1 can immediately be applied to a particle (electron) in a one-dimensional infinite potential well. In this case, (1.55) reads as

(ħ²/2m) d²ψ(x)/dx² + Eψ(x) = 0,    (1.97)

where m is the mass of the particle and E is its energy. The potential V is expressed as

V(x) = 0  (-L ≤ x ≤ L),  V(x) = ∞  (x < -L; x > L).

Rewriting (1.97), we have

d²ψ(x)/dx² + (2mE/ħ²) ψ(x) = 0    (1.98)

with BCs

ψ(L) = ψ(-L) = 0.    (1.99)

If we replace λ of (1.61) with 2mE/ħ², we can follow the procedure of Example 1.1. That is, we put

E = ħ²λ/2m    (1.100)

with λ = k² as in (1.68). For k we use the values of (1.81) and (1.84). Therefore, for the energy eigenvalues we get either

E = (ħ²/2m)·(2l + 1)²π²/4L²  (l = 0, 1, 2, ⋯),

to which

ψ(x) = √(1/L) cos kx  [kL = π/2 + lπ (l = 0, 1, 2, ⋯)]    (1.101)

corresponds, or

E = (ħ²/2m)·(2n)²π²/4L²  (n = 1, 2, 3, ⋯),

to which

ψ(x) = √(1/L) sin kx  [kL = nπ (n = 1, 2, 3, ⋯)]    (1.102)

corresponds. Since the particle behaves as a free particle within the potential well (-L ≤ x ≤ L) and p = ħk, we obtain

E = p²/2m = (ħ²/2m) k²,

where

k = (2l + 1)π/2L  (l = 0, 1, 2, ⋯)  or  k = 2nπ/2L  (n = 1, 2, 3, ⋯).

The energy E is the kinetic energy of the particle. Although ψ(x) ≡ 0 trivially satisfies (1.97), such a function may not be regarded as a solution of the eigenvalue problem. In fact, considering that |ψ(x)|² represents the existence probability of the particle, ψ(x) ≡ 0 corresponds to a situation where the particle in question does not exist. Consequently, such a trivial case has no physical meaning.
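The two families of k above merge into k = jπ/2L (j = 1, 2, 3, ⋯), with odd j corresponding to the cosine solutions (1.101) and even j to the sine solutions (1.102). The following sketch evaluates the resulting energies for an electron in a well of width 1 nm (the well width and constant values are assumptions for illustration):

```python
import math

hbar = 1.054571817e-34   # reduced Planck constant [J s]
m_e = 9.1093837015e-31   # electron mass [kg]
eV = 1.602176634e-19     # electron volt [J]

def energy(j, L):
    """j-th level (j = 1, 2, 3, ...) of a well spanning [-L, L]:
    E = (hbar^2 / 2m) * (j*pi / 2L)^2, combining (1.101) and (1.102)."""
    k = j * math.pi / (2 * L)
    return hbar**2 * k**2 / (2 * m_e)

L = 0.5e-9  # half-width 0.5 nm, i.e., a 1 nm wide well (assumed)
for j in (1, 2, 3):
    print(f"E_{j} = {energy(j, L) / eV:.3f} eV")
```

Note the characteristic j² scaling: E₂ = 4E₁, E₃ = 9E₁, and so on.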

1.4 Quantum-Mechanical Operators and Matrices

As represented by (1.55), a quantum-mechanical operator corresponds to a physical quantity. In (1.55), we connect the Hamiltonian operator to an energy (eigenvalue). Let us rephrase the situation as follows:

PΨ = pΨ.    (1.103)

In (1.103), we view P as an operation or measurement on a physical system that is characterized by the quantum state Ψ. Operating with P on the physical system (or state), we obtain a physical quantity p relevant to P as a result of the operation (or measurement). A way to achieve the above effectively is to use a matrix and a vector to represent the operation and the physical state, respectively. Let us glance at a little matrix calculation to get used to the quantum-mechanical concepts and, hence, to obtain


a clear understanding of them. In Part III, we will deal with matrix calculations in detail from the point of view of general principles. At present, a (2, 2) matrix suffices. Let A be a (2, 2) matrix expressed as

A = ( a  b ; c  d ).    (1.104)

Let |ψ⟩ be a (2, 1) matrix, i.e., a column vector such that

|ψ⟩ = ( e ; f ).    (1.105)

Note that operating with a (2, 2) matrix on a (2, 1) matrix produces another (2, 1) matrix. Furthermore, we define an adjoint matrix A† such that

A† = ( a*  c* ; b*  d* ),    (1.106)

where a* is the complex conjugate of a. That is, A† is the complex-conjugate transposed matrix of A. Also, we define an adjoint vector ⟨ψ| or |ψ⟩† such that

⟨ψ| ≡ |ψ⟩† = (e*  f*).    (1.107)

In this case, |ψ⟩† also denotes the complex-conjugate transpose of |ψ⟩. The notations |ψ⟩ and ⟨ψ| are due to Dirac. He named ⟨ψ| and |φ⟩ a bra vector and a ket vector, respectively. This naming, or equivoque, comes from the fact that ⟨ψ| · |φ⟩ = ⟨ψ|φ⟩ forms a bracket. This is a (1, 2) × (2, 1) = (1, 1) matrix, i.e., a c-number (including a complex number), and ⟨ψ|φ⟩ represents an inner product. These notations are nowadays widely used in mathematics and physics. Taking another vector |ξ⟩ = ( g ; h ) and using the matrix calculation rules, we have

A†|ψ⟩ = |A†ψ⟩ = ( a*  c* ; b*  d* )( e ; f ) = ( a*e + c*f ; b*e + d*f ).    (1.108)

According to the definition (1.107), we have

|A†ψ⟩† = ⟨A†ψ| = (ae* + cf*  be* + df*).    (1.109)

Thus, we get

⟨A†ψ|ξ⟩ = (ae* + cf*  be* + df*)( g ; h ) = (ag + bh)e* + (cg + dh)f*.    (1.110)

Similarly, we have

⟨ψ|Aξ⟩ = (e*  f*)( a  b ; c  d )( g ; h ) = (ag + bh)e* + (cg + dh)f*.    (1.111)

Comparing (1.110) and (1.111), we get

⟨A†ψ|ξ⟩ = ⟨ψ|Aξ⟩.    (1.112)

Also, we have

⟨ψ|Aξ⟩ = ⟨Aξ|ψ⟩*.    (1.113)

Replacing A with A† in (1.112), we get

⟨(A†)†ψ|ξ⟩ = ⟨ψ|A†ξ⟩.    (1.114)

From (1.104) and (1.106), obviously we have

(A†)† = A.    (1.115)

Then, from (1.114) and (1.115) we have

⟨Aψ|ξ⟩ = ⟨ψ|A†ξ⟩ = ⟨ξ|Aψ⟩*,    (1.116)

where the second equality comes from (1.113), obtained by exchanging ψ and ξ there. Moreover, we have the following relation:

(AB)† = B†A†.    (1.117)
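The adjoint identities (1.112), (1.116), and (1.117) are easy to spot-check numerically for random (2, 2) matrices; a small numpy sketch (numpy and the random test data are assumptions for illustration, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_c(*shape):
    # Random complex array for testing
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

A, B = rand_c(2, 2), rand_c(2, 2)
psi, xi = rand_c(2), rand_c(2)

dag = lambda M: M.conj().T           # adjoint, Eq. (1.106)
inner = lambda u, v: np.vdot(u, v)   # <u|v> = sum(u* v)

# (AB)^dagger = B^dagger A^dagger, Eq. (1.117)
assert np.allclose(dag(A @ B), dag(B) @ dag(A))
# <A^dagger psi | xi> = <psi | A xi>, Eq. (1.112)
assert np.allclose(inner(dag(A) @ psi, xi), inner(psi, A @ xi))
# <A psi | xi> = <xi | A psi>*, Eq. (1.116)
assert np.allclose(inner(A @ psi, xi), np.conj(inner(xi, A @ psi)))
print("adjoint identities verified")
```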

The proof is left to the reader. Using this relation, we have

⟨Aψ| = |Aψ⟩† = [A|ψ⟩]† = |ψ⟩†A† = ⟨ψ|A† = ⟨ψA†|.    (1.118)

Taking an inner product by multiplying |ξ⟩ from the right on the leftmost and rightmost sides of (1.118) and using (1.116), we get

⟨Aψ|ξ⟩ = ⟨ψA†|ξ⟩ = ⟨ψ|A†ξ⟩.

This relation may be regarded as the associative law with regard to the symbol "|" of the inner product. This is equivalent to the associative law of matrix multiplication. The results obtained above can readily be extended to the general case where (n, n) matrices are dealt with. Now, let us introduce an Hermitian operator (or matrix) H. When we have

H† = H,    (1.119)

H is called an Hermitian matrix. Then, applying (1.112) to the Hermitian matrix H, we have

⟨H†ψ|ξ⟩ = ⟨ψ|Hξ⟩ = ⟨ψ|H†ξ⟩  or  ⟨Hψ|ξ⟩ = ⟨ψ|H†ξ⟩ = ⟨ψ|Hξ⟩.    (1.120)

Also, let us introduce the norm of a vector |ψ⟩ such that

||ψ|| = √⟨ψ|ψ⟩.    (1.121)

A norm is a natural extension of the notion of the "length" of a vector. The norm ||ψ|| is zero if and only if |ψ⟩ = 0 (the zero vector). Indeed, from (1.105) and (1.107), we have ⟨ψ|ψ⟩ = |e|² + |f|². Therefore, ⟨ψ|ψ⟩ = 0 ⟺ e = f = 0 in (1.105), i.e., |ψ⟩ = 0. Let us further consider an eigenvalue problem represented by our newly introduced notation. The eigenvalue equation is symbolically written as

H|ψ⟩ = λ|ψ⟩,    (1.122)

where H represents an Hermitian operator and |ψ⟩ is an eigenfunction that belongs to an eigenvalue λ. Operating with ⟨ψ| on (1.122) from the left, we have

⟨ψ|H|ψ⟩ = ⟨ψ|λ|ψ⟩ = λ⟨ψ|ψ⟩ = λ,    (1.123)

where we assume that |ψ⟩ is normalized; namely ⟨ψ|ψ⟩ = 1 or ||ψ|| = 1. Notice that the symbol "|" in an inner product is of secondary importance. We may disregard this notation, as in the case where the product notation "×" is omitted by writing ab instead of a × b. Taking the complex conjugate of (1.123), we have

⟨ψ|Hψ⟩* = λ*.    (1.124)

Using (1.116) and (1.124), we have

λ* = ⟨ψ|Hψ⟩* = ⟨ψ|H†ψ⟩ = ⟨ψ|Hψ⟩ = λ,    (1.125)

where in the third equality we used the definition (1.119) of an Hermitian operator. Hence, the relation λ* = λ obviously shows that any eigenvalue λ is real if H is Hermitian. The relation (1.125) immediately tells us that even if |ψ⟩ is not an eigenfunction, ⟨ψ|Hψ⟩ is real as well, provided H is Hermitian. The quantity ⟨ψ|Hψ⟩ is said to be an expectation value. This value is interpreted as the most probable or averaged value of H obtained as a result of operating with H on the physical state |ψ⟩. We sometimes denote the expectation value as

⟨H⟩ ≡ ⟨ψ|Hψ⟩,    (1.126)

where |ψ⟩ is normalized. Unless |ψ⟩ is normalized, it can be normalized on the basis of (1.121) by choosing |Φ⟩ such that

|Φ⟩ = |ψ⟩/||ψ||.    (1.127)

Thus, we have an important consequence: if an Hermitian operator has an eigenvalue, it must be real. An expectation value of an Hermitian operator is real as well. A real eigenvalue and expectation value are prerequisites for a physical quantity. As discussed above, Hermitian matrices play a central role in quantum physics. Taking a further step, let us extend the notion of Hermiticity to a function space. In Example 1.1, we remarked that we finally reached a solution where λ is a real (and positive) number, even though at the beginning we set no restriction on λ. This is because the SOLDE form (1.61), accompanied by the BCs (1.62), is Hermitian, and so the eigenvalues λ are real. In this context, we give a little further consideration. We define an inner product between two functions as follows:

⟨g|f⟩ ≡ ∫_a^b g(x)* f(x) dx,    (1.128)

where g(x)* is the complex conjugate of g(x); x is a real variable and the integration range can be either bounded or unbounded. If a and b are real definite numbers, [a, b] is the bounded case. For the unbounded case, we have, e.g., (-∞, ∞), (-∞, c), and (c, ∞), etc., where c is a definite number. This notation will appear again in Chap. 10. In (1.128) we view the functions f and g as vectors in a function space, often

referred to as a Hilbert space. We assume that any function f is square-integrable; i.e., the integral of |f|² is finite. That is,

∫_a^b |f(x)|² dx < ∞.    (1.129)

Using the above definition, let us calculate ⟨g|Df⟩, where D was defined in (1.63). Then, using integration by parts, we have

⟨g|Df⟩ = ∫_a^b g(x)* [-d²f(x)/dx²] dx = -[g* f′]_a^b + ∫_a^b g*′ f′ dx
       = -[g* f′]_a^b + [g*′ f]_a^b - ∫_a^b g*″ f dx = [g*′ f - g* f′]_a^b + ⟨Dg|f⟩.    (1.130)

If we have BCs such that

f(b) = f(a) = 0  and  g(b)* = g(a)* = 0,  i.e.,  g(b) = g(a) = 0,    (1.131)

we get

⟨g|Df⟩ = ⟨Dg|f⟩.    (1.132)

In light of (1.120), (1.132) implies that D is Hermitian. In (1.131), notice that the functions f and g satisfy the same BCs. Normally, an Hermitian operator has this property. Thus, the Hermiticity of a differential operator is closely related to the BCs of the differential equation. Next, we consider the following inner product:

⟨f|Df⟩ = -∫_a^b f* f″ dx = -[f* f′]_a^b + ∫_a^b f*′ f′ dx = -[f* f′]_a^b + ∫_a^b |f′|² dx.    (1.133)

Note that the definite integral in (1.133) cannot be negative. There are two possibilities for D to be Hermitian according to the different BCs.

1. Dirichlet conditions: f(b) = f(a) = 0. If we could have f′ ≡ 0, ⟨f|Df⟩ would be zero. But in that case f would have to be constant and, by the BCs, f(x) ≡ 0. We must exclude this trivial case. Consequently, to avoid this situation we must have

∫_a^b |f′|² dx > 0  or  ⟨f|Df⟩ > 0.    (1.134)

In this case, the operator D is said to be positive definite. Suppose that such a positive-definite operator has an eigenvalue λ. Then, for a corresponding eigenfunction y(x) we have

Dy(x) = λy(x).    (1.135)

In this case, we say that y(x) is an eigenfunction or eigenvector that corresponds (or belongs) to the eigenvalue λ. Taking the inner product of both sides, we have

⟨y|Dy⟩ = ⟨y|λy⟩ = λ⟨y|y⟩ = λ||y||²  or  λ = ⟨y|Dy⟩/||y||².    (1.136)

Both ⟨y|Dy⟩ and ||y||² are positive and, hence, we have λ > 0. Thus, if D has an eigenvalue, it must be positive. In this case, λ is said to be positive definite as well; see Example 1.1.

2. Neumann conditions: f′(b) = f′(a) = 0. From (1.130), D is Hermitian as well. Unlike condition (1), however, f may be a non-zero constant in this case. Therefore, we are allowed to have

∫_a^b |f′|² dx = 0  or  ⟨f|Df⟩ = 0.    (1.137)

For any function, we have

⟨f|Df⟩ ≥ 0.    (1.138)

In this case, the operator D is said to be non-negative (or positive semi-definite). The eigenvalue may be zero from (1.136) and, hence, is called non-negative accordingly.

3. Periodic conditions: f(b) = f(a) and f′(b) = f′(a). We are allowed to have ⟨f|Df⟩ ≥ 0, as in the case of condition (2). Then, the operator and its eigenvalues are non-negative.

Thus, in spite of being formally the same operator, the operator behaves differently according to the different BCs. In particular, a differential operator associated with an eigenvalue of zero is of special interest. We will encounter another illustration in Chap. 3.

1.5 Commutator and Canonical Commutation Relation

In quantum mechanics, it is important whether two operators A and B commute. In this context, the commutator of A and B is defined such that

[A, B] ≡ AB - BA.    (1.139)

If [A, B] = 0 (the zero matrix), A and B are said to be commutable (or commutative). If [A, B] ≠ 0, A and B are non-commutative. Such relationships between two operators are called commutation relations. We have the canonical commutation relation as an underlying concept of quantum mechanics. It is defined between a (canonical) coordinate q and a (canonical) momentum p such that

[q, p] = iħ,    (1.140)

where the presence of a unit matrix E is implied. Writing it explicitly, we have

[q, p] = iħE.    (1.141)

The relations (1.140) and (1.141) are called the canonical commutation relation. On the basis of the relation p = (ħ/i) ∂/∂q, a brief proof is as follows:

[q, p]|ψ⟩ = (qp - pq)|ψ⟩ = q (ħ/i) ∂|ψ⟩/∂q - (ħ/i) ∂(q|ψ⟩)/∂q
          = q (ħ/i) ∂|ψ⟩/∂q - (ħ/i) (∂q/∂q)|ψ⟩ - q (ħ/i) ∂|ψ⟩/∂q = -(ħ/i)|ψ⟩ = iħ|ψ⟩.    (1.142)

Since |ψ⟩ is an arbitrarily chosen vector, we have (1.140). Using (1.117), we have

$$[A, B]^\dagger = (AB - BA)^\dagger = B^\dagger A^\dagger - A^\dagger B^\dagger. \tag{1.143}$$

If in (1.143) A and B are both Hermitian, we have

$$[A, B]^\dagger = BA - AB = -[A, B]. \tag{1.144}$$

If we have an operator G such that

$$G^\dagger = -G, \tag{1.145}$$


G is said to be anti-Hermitian. Therefore, [A, B] is anti-Hermitian if both A and B are Hermitian. If an anti-Hermitian operator has an eigenvalue, that eigenvalue is zero or pure imaginary. To show this, suppose that

$$G|\psi\rangle = \lambda|\psi\rangle, \tag{1.146}$$

where G is an anti-Hermitian operator and |ψ⟩ has been normalized. As in the case of (1.123), we have

$$\langle\psi|G|\psi\rangle = \lambda\langle\psi|\psi\rangle = \lambda. \tag{1.147}$$

Taking a complex conjugate of (1.147), we have

$$\langle\psi|G\psi\rangle^* = \lambda^*. \tag{1.148}$$

Using (1.116) and (1.145) again, we have

$$\lambda^* = \langle\psi|G\psi\rangle^* = \langle\psi|G^\dagger\psi\rangle = -\langle\psi|G\psi\rangle = -\lambda. \tag{1.149}$$

This shows that λ is zero or pure imaginary. Therefore, (1.142) can be viewed as an eigenvalue equation in which any physical state |ψ⟩ has a pure imaginary eigenvalue iħ with respect to [q, p]. Note that both q and p are Hermitian (see Sect. 10.2, Example 10.3), and so [q, p] is anti-Hermitian as mentioned above. The canonical commutation relation given by (1.140) is believed to underpin the uncertainty principle.

In quantum mechanics, it is of great importance whether a quantum operator is Hermitian or not. A position operator and a momentum operator, along with an angular momentum operator, are particularly important when we constitute the Hamiltonian. Let f and g be arbitrary functions. Let us consider, e.g., the following inner product with the momentum operator:

$$\langle g|pf\rangle = \int_a^b g(x)^* \frac{\hbar}{i}\frac{\partial}{\partial x}[f(x)]\, dx, \tag{1.150}$$

where the domain [a, b] depends on the physical system; it can be either bounded or unbounded. Performing integration by parts, we have

$$\langle g|pf\rangle = \frac{\hbar}{i}\left[g(x)^* f(x)\right]_a^b - \int_a^b \frac{\hbar}{i}\frac{\partial g(x)^*}{\partial x}\, f(x)\, dx = \frac{\hbar}{i}\left[g(b)^* f(b) - g(a)^* f(a)\right] + \int_a^b \left[\frac{\hbar}{i}\frac{\partial g(x)}{\partial x}\right]^* f(x)\, dx. \tag{1.151}$$

If we require f(b) = f(a) and g(b) = g(a), the first term vanishes and we get

$$\langle g|pf\rangle = \int_a^b \left[\frac{\hbar}{i}\frac{\partial g(x)}{\partial x}\right]^* f(x)\, dx = \langle pg|f\rangle. \tag{1.152}$$

Thus, as in the case of (1.120), the momentum operator p is Hermitian. Note that the position operator q of (1.142) is Hermitian as an a priori assumption. Meanwhile, the z-component of the angular momentum operator Lz is described in polar coordinates as follows:

$$L_z = \frac{\hbar}{i}\frac{\partial}{\partial\phi}, \tag{1.153}$$

where ϕ is an azimuthal angle varying from 0 to 2π. The notation and implication of Lz will be mentioned in Chap. 3. Similarly to the above, we have

$$\langle g|L_z f\rangle = \frac{\hbar}{i}\left[g(2\pi)^* f(2\pi) - g(0)^* f(0)\right] + \int_0^{2\pi}\left[\frac{\hbar}{i}\frac{\partial g(\phi)}{\partial\phi}\right]^* f(\phi)\, d\phi. \tag{1.154}$$

Requiring an arbitrary function f to satisfy the BC f(2π) = f(0), we reach

$$\langle g|L_z f\rangle = \langle L_z g|f\rangle. \tag{1.155}$$

Note that we must have the above BC, because ϕ = 0 and ϕ = 2π correspond to the same spatial point. Thus, we find that Lz is Hermitian as well under this condition. On the basis of the aforementioned argument, let us proceed to quantum-mechanical studies of a harmonic oscillator. Regarding angular momentum, we will study its basic properties in Chap. 3.
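The one-line proof (1.142) can be checked symbolically. This sketch (assuming SymPy is available; the function name `psi` is illustrative) applies qp − pq, with p represented as (ħ/i)∂/∂q, to an arbitrary function ψ(q):

```python
import sympy as sp

q, hbar = sp.symbols('q hbar', positive=True)
psi = sp.Function('psi')(q)

# Momentum acts as p = (hbar/i) d/dq in the coordinate representation
p = lambda f: (hbar / sp.I) * sp.diff(f, q)

# Apply (qp - pq) to an arbitrary function psi(q), as in (1.142)
commutator_psi = q * p(psi) - p(q * psi)
result = sp.simplify(commutator_psi - sp.I * hbar * psi)
print(result)  # 0, i.e., (qp - pq) psi = i*hbar*psi
```

Since ψ(q) is arbitrary, this reproduces [q, p] = iħ of (1.140).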

Reference

1. Møller C (1952) The theory of relativity. Oxford University Press, London

Chapter 2

Quantum-Mechanical Harmonic Oscillator

Quantum-mechanical treatment of a harmonic oscillator has been a well-studied topic from the beginning of the history of quantum mechanics. This topic is a standard subject in classical mechanics as well. In this chapter, first we briefly survey characteristics of a classical harmonic oscillator. From a quantum-mechanical point of view, we deal with features of a harmonic oscillator through matrix representation. We define creation and annihilation operators using position and momentum operators. A Hamiltonian of the oscillator is described in terms of the creation and annihilation operators. This enables us to easily determine energy eigenvalues of the oscillator. As a result, energy eigenvalues are found to be positive definite. Meanwhile, we express the Schrödinger equation by the coordinate representation. We compare the results with those of the matrix representation and show that the two representations are mathematically equivalent. Thus, the treatment of the quantum-mechanical harmonic oscillator supplies us with a firm ground for studying basic concepts of the quantum mechanics.

2.1 Classical Harmonic Oscillator

The classical Newtonian equation of motion of a one-dimensional harmonic oscillator is expressed as

$$m\frac{d^2 x(t)}{dt^2} = -s\, x(t), \tag{2.1}$$

where m is the mass of the oscillator and s is a spring constant. Putting s/m = ω², we have

© Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_2


$$\frac{d^2 x(t)}{dt^2} + \omega^2 x(t) = 0. \tag{2.2}$$

In (2.2) we set ω positive, namely,

$$\omega = \sqrt{s/m}, \tag{2.3}$$

where ω is called the angular frequency of the oscillator. If we replace ω² with λ, we have formally the same equation as (1.61). Two linearly independent solutions of (2.2) are the same as before (see Example 1.1); we have $e^{i\omega t}$ and $e^{-i\omega t}$ (ω ≠ 0) as such. Note, however, that in Example 1.2 we were dealing with a quantum state related to the existence probability of a particle in a potential well, whereas in (2.2) we are examining the position of a harmonic oscillator subject to a spring force. We are thus considering a different situation. As a general solution we have

$$x(t) = a e^{i\omega t} + b e^{-i\omega t}, \tag{2.4}$$

where a and b are suitable constants. Let us consider BCs different from those of Example 1.1 or 1.2 this time. That is, we set BCs such that x ð 0Þ = 0

and

x0 ð0Þ = v0 ðv0 > 0Þ:

ð2:5Þ

Notice that (2.5) gives initial conditions (ICs). Mathematically, ICs are included in BCs (see Chap. 10). From (2.4) we have xð t Þ = a þ b = 0

and

x0 ð0Þ = iωða - bÞ = v0 :

ð2:6Þ

Then, we get a = −b = v₀/2iω. Thus, we get a simple harmonic motion as a solution, expressed as

$$x(t) = \frac{v_0}{2i\omega}\left(e^{i\omega t} - e^{-i\omega t}\right) = \frac{v_0}{\omega}\sin\omega t. \tag{2.7}$$

From this, we have

$$E = K + V = \frac{1}{2}m v_0^2. \tag{2.8}$$

In particular, if v₀ = 0, then x(t) ≡ 0. This is a solution of (2.1) meaning that the particle is eternally at rest, which is physically acceptable as well. Notice also that, unlike Examples 1.1 and 1.2, the solution has been determined uniquely. This is due to the different BCs.
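The classical solution (2.7) and the conserved energy (2.8) can be checked numerically; the sketch below assumes illustrative values for m, s, and v₀:

```python
import numpy as np

m, s, v0 = 1.0, 4.0, 2.0        # illustrative parameters
omega = np.sqrt(s / m)          # angular frequency (2.3)

t = np.linspace(0.0, 10.0, 2001)
x = (v0 / omega) * np.sin(omega * t)   # solution (2.7)
v = v0 * np.cos(omega * t)             # velocity x'(t)

# ICs (2.5): x(0) = 0 and x'(0) = v0
print(x[0], v[0])

# Total energy (2.8) is conserved and equals m*v0**2/2 at all times
E = 0.5 * m * v**2 + 0.5 * s * x**2
print(np.allclose(E, 0.5 * m * v0**2))  # True
```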


From the point of view of a mechanical system, the mathematical formulation of the classical harmonic oscillator resembles that of electromagnetic fields confined within a cavity. We will return to this point later in Sect. 9.6.

2.2 Formulation Based on an Operator Method

Now let us return to our task of finding quantum-mechanical solutions of a harmonic oscillator. The potential V is given by

$$V(q) = \frac{1}{2} s q^2 = \frac{1}{2} m\omega^2 q^2, \tag{2.9}$$

where q is used for a one-dimensional position coordinate. Then, we have a classical Hamiltonian H expressed as

$$H = \frac{p^2}{2m} + V(q) = \frac{p^2}{2m} + \frac{1}{2} m\omega^2 q^2. \tag{2.10}$$

Following the formulation of Sect. 1.2, the Schrödinger equation as an eigenvalue equation related to the energy E is described as

$$H\psi(q) = E\psi(q) \quad \text{or} \quad -\frac{\hbar^2}{2m}\nabla^2\psi(q) + \frac{1}{2}m\omega^2 q^2\psi(q) = E\psi(q). \tag{2.11}$$

This is a SOLDE, and it is well known that such a SOLDE can be solved by a power series expansion method. In the present studies, however, let us first use an operator method to solve the eigenvalue equation (2.11) of a one-dimensional oscillator. To this end, we use a quantum-mechanical Hamiltonian in which the momentum operator p is explicitly represented. Thus, the Hamiltonian reads as

$$H = \frac{p^2}{2m} + \frac{1}{2}m\omega^2 q^2. \tag{2.12}$$

Equation (2.12) is formally the same as (2.10). Note, however, that in (2.12) p and q are expressed as quantum-mechanical operators. As in (1.126), we first examine an expectation value ⟨H⟩ of H. It is given by

$$\langle H\rangle = \langle\psi|H\psi\rangle = \left\langle\psi\middle|\frac{p^2}{2m}\psi\right\rangle + \left\langle\psi\middle|\frac{1}{2}m\omega^2 q^2\psi\right\rangle = \frac{1}{2m}\langle p\psi|p\psi\rangle + \frac{1}{2}m\omega^2\langle q\psi|q\psi\rangle = \frac{1}{2m}\|p\psi\|^2 + \frac{1}{2}m\omega^2\|q\psi\|^2 \ge 0, \tag{2.13}$$

where again we assumed that |ψ⟩ has been normalized. In (2.13) we used the notation (1.126) and the fact that both q and p are Hermitian. In this situation, ⟨H⟩ takes a non-negative value. In (2.13), the equality holds if and only if |pψ⟩ = 0 and |qψ⟩ = 0. Let us specify a vector |ψ₀⟩ that satisfies these conditions such that

$$|p\psi_0\rangle = 0 \quad \text{and} \quad |q\psi_0\rangle = 0. \tag{2.14}$$

Multiplying the first equation of (2.14) by q from the left and the second equation by p from the left, we have

$$qp|\psi_0\rangle = 0 \quad \text{and} \quad pq|\psi_0\rangle = 0. \tag{2.15}$$

Subtracting the second equation of (2.15) from the first, we get

$$(qp - pq)|\psi_0\rangle = i\hbar|\psi_0\rangle = 0, \tag{2.16}$$

where with the first equality we used (1.140). Therefore, we would have |ψ₀(q)⟩ ≡ 0, which leads back to the relations (2.14). That is, ⟨H⟩ = 0 if and only if |ψ₀(q)⟩ ≡ 0. But, since it has no physical meaning, |ψ₀(q)⟩ ≡ 0 must be rejected as unsuitable for a solution of (2.11). For a physically acceptable solution of (2.13), ⟨H⟩ must accordingly take a positive definite value. Thus, on the basis of the canonical commutation relation, we have restricted the range of the expectation values. Instead of directly dealing with (2.12), it is well known to introduce the following operators [1]:

$$a \equiv \sqrt{\frac{m\omega}{2\hbar}}\, q + \frac{i}{\sqrt{2m\hbar\omega}}\, p \tag{2.17}$$

and its adjoint (Hermitian conjugate) operator

$$a^\dagger = \sqrt{\frac{m\omega}{2\hbar}}\, q - \frac{i}{\sqrt{2m\hbar\omega}}\, p. \tag{2.18}$$

Notice here again that both q and p are Hermitian. Using a matrix representation for (2.17) and (2.18), we have

$$\begin{pmatrix} a \\ a^\dagger \end{pmatrix} = \begin{pmatrix} \sqrt{\dfrac{m\omega}{2\hbar}} & \dfrac{i}{\sqrt{2m\hbar\omega}} \\[2mm] \sqrt{\dfrac{m\omega}{2\hbar}} & -\dfrac{i}{\sqrt{2m\hbar\omega}} \end{pmatrix} \begin{pmatrix} q \\ p \end{pmatrix}. \tag{2.19}$$

Then we have

$$a^\dagger a = (2m\hbar\omega)^{-1}(m\omega q - ip)(m\omega q + ip) = (2m\hbar\omega)^{-1}\left[m^2\omega^2 q^2 + p^2 + im\omega(qp - pq)\right] = (\hbar\omega)^{-1}\left[\frac{1}{2}m\omega^2 q^2 + \frac{1}{2m}p^2 + \frac{1}{2}i\omega\cdot i\hbar\right] = (\hbar\omega)^{-1}\left(H - \frac{1}{2}\hbar\omega\right), \tag{2.20}$$

where the second last equality comes from (1.140). Rewriting (2.20), we get

$$H = \hbar\omega\, a^\dagger a + \frac{1}{2}\hbar\omega. \tag{2.21}$$

Similarly we get

$$H = \hbar\omega\, a a^\dagger - \frac{1}{2}\hbar\omega. \tag{2.22}$$

Subtracting (2.22) from (2.21), we have

$$0 = \hbar\omega\, a^\dagger a - \hbar\omega\, a a^\dagger + \hbar\omega. \tag{2.23}$$

That is,

$$[a, a^\dagger] = 1 \quad \text{or} \quad [a, a^\dagger] = E, \tag{2.24}$$

where E represents an identity operator. Furthermore, using (2.21) we have

$$[H, a^\dagger] = \hbar\omega\left[a^\dagger a + \frac{1}{2},\ a^\dagger\right] = \hbar\omega\left(a^\dagger a a^\dagger - a^\dagger a^\dagger a\right) = \hbar\omega\, a^\dagger[a, a^\dagger] = \hbar\omega\, a^\dagger. \tag{2.25}$$

Similarly, we get

$$[H, a] = -\hbar\omega\, a. \tag{2.26}$$
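The relations (2.21), (2.24), and (2.25) can be checked with finite matrices. The sketch below (an illustration, not the book's method) builds a and a† truncated to N levels; truncation spoils [a, a†] = E only in the last diagonal entry, while (2.25) survives exactly:

```python
import numpy as np

N = 12
n = np.arange(N)
a = np.diag(np.sqrt(n[1:]), k=1)   # annihilation operator: a|n> = sqrt(n)|n-1>
ad = a.T                            # creation operator a† (real matrices here)
hbar = omega = 1.0                  # units with hbar = omega = 1 for simplicity

H = hbar * omega * (ad @ a + 0.5 * np.eye(N))   # Hamiltonian (2.21)
comm = lambda X, Y: X @ Y - Y @ X

C = comm(a, ad)
print(np.allclose(C[:-1, :-1], np.eye(N - 1)))  # True: [a, a†] = E away from the edge

# (2.25): [H, a†] = hbar*omega*a† holds exactly for these truncated matrices
print(np.allclose(comm(H, ad), hbar * omega * ad))  # True
```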

Next, let us calculate an expectation value of H. Using a normalized function |ψ⟩, from (2.21) we have

$$\langle\psi|H|\psi\rangle = \left\langle\psi\middle|\hbar\omega a^\dagger a + \frac{1}{2}\hbar\omega\middle|\psi\right\rangle = \hbar\omega\langle\psi|a^\dagger a|\psi\rangle + \frac{1}{2}\hbar\omega\langle\psi|\psi\rangle = \hbar\omega\langle a\psi|a\psi\rangle + \frac{1}{2}\hbar\omega = \hbar\omega\|a\psi\|^2 + \frac{1}{2}\hbar\omega \ge \frac{1}{2}\hbar\omega. \tag{2.27}$$

Thus, the expectation value is equal to or larger than ½ħω. This is consistent with the fact that an energy eigenvalue is positive definite, as mentioned above. Equation (2.27) also tells us that if we have

$$|a\psi_0\rangle = 0, \tag{2.28}$$

we get

$$\langle\psi_0|H|\psi_0\rangle = \frac{1}{2}\hbar\omega. \tag{2.29}$$

Equation (2.29) means that the smallest expectation value is ½ħω on the condition of (2.28). On the same condition, using (2.21) we have

$$H|\psi_0\rangle = \hbar\omega a^\dagger a|\psi_0\rangle + \frac{1}{2}\hbar\omega|\psi_0\rangle = \frac{1}{2}\hbar\omega|\psi_0\rangle. \tag{2.30}$$

Thus, |ψ₀⟩ is an eigenfunction corresponding to the eigenvalue ½ħω ≡ E₀, which is identical with the smallest expectation value of (2.29). Since this is the lowest eigenvalue, |ψ₀⟩ is said to be the ground state. We will make sure later that |ψ₀⟩ is certainly an eligible function for the ground state. The above method is consistent with the variational principle [2], which stipulates that, under appropriate BCs, an expectation value of the Hamiltonian estimated with an arbitrary function is always larger than or equal to the smallest eigenvalue, which corresponds to the ground state. Next, let us evaluate energy eigenvalues of the oscillator. First we have

$$H|\psi_0\rangle = \frac{1}{2}\hbar\omega|\psi_0\rangle = E_0|\psi_0\rangle. \tag{2.31}$$

Operating a† on both sides of (2.31) from the left, we have

$$a^\dagger H|\psi_0\rangle = a^\dagger E_0|\psi_0\rangle. \tag{2.32}$$

Meanwhile, using (2.25), we have

$$a^\dagger H|\psi_0\rangle = \left(H a^\dagger - \hbar\omega a^\dagger\right)|\psi_0\rangle. \tag{2.33}$$

Equating the RHSs of (2.32) and (2.33), we get

[Fig. 2.1  Energy eigenvalues of a quantum-mechanical harmonic oscillator on a real axis]

$$H a^\dagger|\psi_0\rangle = (E_0 + \hbar\omega)\, a^\dagger|\psi_0\rangle. \tag{2.34}$$

This implies that a†|ψ₀⟩ belongs to an eigenvalue (E₀ + ħω), which is larger than E₀ as expected. Again multiplying both sides of (2.34) by a† from the left and using (2.25), we get

$$H (a^\dagger)^2|\psi_0\rangle = (E_0 + 2\hbar\omega)\,(a^\dagger)^2|\psi_0\rangle. \tag{2.35}$$

This implies that (a†)²|ψ₀⟩ belongs to an eigenvalue (E₀ + 2ħω). Thus, repeating the above procedures, we get

$$H (a^\dagger)^n|\psi_0\rangle = (E_0 + n\hbar\omega)\,(a^\dagger)^n|\psi_0\rangle. \tag{2.36}$$

Thus, (a†)ⁿ|ψ₀⟩ belongs to an eigenvalue

$$E_n \equiv E_0 + n\hbar\omega = \left(n + \frac{1}{2}\right)\hbar\omega, \tag{2.37}$$

where Eₙ denotes an energy eigenvalue of the n-th excited state. The energy eigenvalues are plotted in Fig. 2.1. Our next task is to seek normalized eigenvectors of the n-th excited state. Let cₙ be a normalization constant of that state. That is, we have

$$|\psi_n\rangle = c_n (a^\dagger)^n|\psi_0\rangle, \tag{2.38}$$

where |ψₙ⟩ is a normalized eigenfunction of the n-th excited state. To determine cₙ, let us calculate a|ψₙ⟩. This includes a factor a(a†)ⁿ. We have

$$\begin{aligned}
a(a^\dagger)^n &= (aa^\dagger - a^\dagger a)(a^\dagger)^{n-1} + a^\dagger a(a^\dagger)^{n-1} = [a, a^\dagger](a^\dagger)^{n-1} + a^\dagger a(a^\dagger)^{n-1}\\
&= (a^\dagger)^{n-1} + a^\dagger a(a^\dagger)^{n-1} = (a^\dagger)^{n-1} + a^\dagger[a, a^\dagger](a^\dagger)^{n-2} + (a^\dagger)^2 a(a^\dagger)^{n-2}\\
&= 2(a^\dagger)^{n-1} + (a^\dagger)^2 a(a^\dagger)^{n-2} = 2(a^\dagger)^{n-1} + (a^\dagger)^2[a, a^\dagger](a^\dagger)^{n-3} + (a^\dagger)^3 a(a^\dagger)^{n-3}\\
&= 3(a^\dagger)^{n-1} + (a^\dagger)^3 a(a^\dagger)^{n-3} = \cdots.
\end{aligned} \tag{2.39}$$

In the above procedures, we used [a, a†] = 1. What is implied in (2.39) is that the coefficient of (a†)ⁿ⁻¹ increases one by one as a is transferred toward the right, one position at a time, in the second term of the RHS. Notice that in the second term a is sandwiched such that (a†)ᵐ a (a†)ⁿ⁻ᵐ (m = 1, 2, ⋯). Finally, we have

$$a(a^\dagger)^n = n(a^\dagger)^{n-1} + (a^\dagger)^n a. \tag{2.40}$$
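Relation (2.40) can be spot-checked with truncated ladder matrices, applied to a low-lying basis vector so that truncation effects do not enter; the dimensions and indices below are illustrative:

```python
import numpy as np

N = 20
a = np.diag(np.sqrt(np.arange(1, N)), k=1)   # a|m> = sqrt(m)|m-1>
ad = a.T

n, k = 4, 2                        # test a(a†)^4 on the basis vector |2>
e_k = np.zeros(N); e_k[k] = 1.0

An = np.linalg.matrix_power(ad, n)
lhs = a @ An @ e_k                                          # a(a†)^n |k>
rhs = n * np.linalg.matrix_power(ad, n - 1) @ e_k + An @ (a @ e_k)
print(np.allclose(lhs, rhs))       # True: a(a†)^n = n(a†)^(n-1) + (a†)^n a
```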

Meanwhile, operating a on both sides of (2.38) from the left and using (2.40), we get a ψ n i ¼ c n a a{ ¼n

cn cn - 1

n

ψ 0 i ¼ c n n a{

c n - 1 a{

n-1

n-1

j ψ 0i ¼ n

n

þ a{ a j ψ 0 i ¼ c n n a{ cn cn - 1

n-1

j ψ n - 1 i,

j ψ 0i

ð2:41Þ

where the third equality comes from (2.28). Next, operating a on (2.40), we obtain

$$\begin{aligned}
a^2(a^\dagger)^n &= n\, a(a^\dagger)^{n-1} + a(a^\dagger)^n a\\
&= n(n-1)(a^\dagger)^{n-2} + n(a^\dagger)^{n-1}a + a(a^\dagger)^n a.
\end{aligned} \tag{2.42}$$

Operating another a on (2.42), we get

$$\begin{aligned}
a^3(a^\dagger)^n &= n(n-1)\, a(a^\dagger)^{n-2} + n\, a(a^\dagger)^{n-1}a + a^2(a^\dagger)^n a\\
&= n(n-1)(n-2)(a^\dagger)^{n-3} + n(n-1)(a^\dagger)^{n-2}a + n\, a(a^\dagger)^{n-1}a + a^2(a^\dagger)^n a\\
&= \cdots.
\end{aligned} \tag{2.43}$$

ð2:43Þ To generalize the above procedures, operating a on (2.40) m ( 0Þ, c

ð2:84Þ

πħ : mω

ð2:85Þ

we have 1 -1

To get (2.84), putting I 

e-

mω 2 ħ q

dq =

1 - cq2 dq, - 1e

we have

$$I^2 = \int_{-\infty}^{\infty} e^{-cq^2}\, dq \int_{-\infty}^{\infty} e^{-cs^2}\, ds = \iint e^{-c(q^2+s^2)}\, dq\, ds = \int_0^{2\pi}\!\!\int_0^{\infty} e^{-cr^2}\, r\, dr\, d\theta = \frac{1}{2}\int_0^{2\pi}\!\!\int_0^{\infty} e^{-cR}\, dR\, d\theta = \frac{\pi}{c}, \tag{2.86}$$

where with the third equality we converted the two-dimensional Cartesian coordinates to polar coordinates; take q = r cos θ, s = r sin θ and convert the infinitesimal area element dq ds to dr · r dθ. With the second last equality of (2.86), we used the variable transformation r² → R. Hence, we get $I = \sqrt{\pi/c}$. Thus, we get

$$N_0 = \left(\frac{m\omega}{\pi\hbar}\right)^{1/4} \quad \text{and} \quad \psi_0(q) = \left(\frac{m\omega}{\pi\hbar}\right)^{1/4} e^{-\frac{m\omega}{2\hbar}q^2}. \tag{2.87}$$
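The normalization (2.87) and the Gaussian integral (2.84) behind it can be confirmed numerically; this sketch uses units with ħ = m = ω = 1 and a simple Riemann sum on a fine grid:

```python
import numpy as np

q = np.linspace(-10.0, 10.0, 20001)
dq = q[1] - q[0]

# Ground state (2.87) with hbar = m = omega = 1: psi_0(q) = pi**(-1/4) exp(-q^2/2)
psi0 = np.pi ** (-0.25) * np.exp(-q**2 / 2.0)
norm = np.sum(psi0**2) * dq
print(norm)                        # ~1.0

# General Gaussian integral (2.84) for a sample c > 0
c = 2.5
I = np.sum(np.exp(-c * q**2)) * dq
print(np.isclose(I, np.sqrt(np.pi / c)))   # True
```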

Also, we have

$$a^\dagger = \sqrt{\frac{m\omega}{2\hbar}}\, q - \frac{i}{\sqrt{2m\hbar\omega}}\, p = \sqrt{\frac{m\omega}{2\hbar}}\, q - \frac{i}{\sqrt{2m\hbar\omega}}\frac{\hbar}{i}\frac{\partial}{\partial q} = \sqrt{\frac{m\omega}{2\hbar}}\, q - \sqrt{\frac{\hbar}{2m\omega}}\frac{\partial}{\partial q}. \tag{2.88}$$

From (2.52), we get

$$\psi_n(q) = \frac{1}{\sqrt{n!}}(a^\dagger)^n|\psi_0\rangle = \frac{1}{\sqrt{n!}}\left[\sqrt{\frac{m\omega}{2\hbar}}\, q - \sqrt{\frac{\hbar}{2m\omega}}\frac{\partial}{\partial q}\right]^n \psi_0(q) = \frac{1}{\sqrt{n!}}\left(\frac{m\omega}{2\hbar}\right)^{n/2}\left[q - \frac{\hbar}{m\omega}\frac{\partial}{\partial q}\right]^n \psi_0(q). \tag{2.89}$$

Putting

$$\beta \equiv \sqrt{m\omega/\hbar} \quad \text{and} \quad \xi = \beta q, \tag{2.90}$$

we rewrite (2.89) as

$$\psi_n(q) = \psi_n(\xi/\beta) = \frac{1}{\sqrt{n!}}\left(\frac{m\omega}{2\hbar\beta^2}\right)^{n/2}\left(\xi - \frac{\partial}{\partial\xi}\right)^n \psi_0(q) = \frac{1}{\sqrt{n!}}\left(\frac{1}{2}\right)^{n/2}\left(\xi - \frac{\partial}{\partial\xi}\right)^n \psi_0(q) = \sqrt{\frac{1}{2^n n!}}\left(\frac{m\omega}{\pi\hbar}\right)^{1/4}\left(\xi - \frac{\partial}{\partial\xi}\right)^n e^{-\frac12\xi^2}. \tag{2.91}$$

Comparing (2.81) and (2.90), we have

$$\alpha = \frac{\beta^2}{2}. \tag{2.92}$$

Moreover, putting

$$N_n \equiv \sqrt{\frac{1}{2^n n!}}\left(\frac{m\omega}{\pi\hbar}\right)^{1/4} = \sqrt{\frac{\beta}{\pi^{1/2}\, 2^n\, n!}}, \tag{2.93}$$

we get

$$\psi_n(\xi/\beta) = N_n\left(\xi - \frac{\partial}{\partial\xi}\right)^n e^{-\frac12\xi^2}. \tag{2.94}$$

We have to normalize (2.94) with respect to the variable ξ. Since ψn(q) has already been normalized as in (2.53), we have

$$\int_{-\infty}^{\infty}|\psi_n(q)|^2\, dq = 1. \tag{2.95}$$

Changing the variable from q to ξ, we have

$$\frac{1}{\beta}\int_{-\infty}^{\infty}|\psi_n(\xi/\beta)|^2\, d\xi = 1. \tag{2.96}$$

Let us define $\bar\psi_n(\xi)$ as being normalized with respect to ξ. In other words, ψn(q) is converted to $\bar\psi_n(\xi)$ by means of the variable transformation and a concomitant change in the normalization condition. Then, we have

$$\int_{-\infty}^{\infty}|\bar\psi_n(\xi)|^2\, d\xi = 1. \tag{2.97}$$

Comparing (2.96) and (2.97), if we define $\bar\psi_n(\xi)$ as

$$\bar\psi_n(\xi) \equiv \frac{1}{\sqrt{\beta}}\,\psi_n(\xi/\beta), \tag{2.98}$$

then $\bar\psi_n(\xi)$ is a properly normalized function. Thus, we get

$$\bar\psi_n(\xi) = N_n\left(\xi - \frac{\partial}{\partial\xi}\right)^n e^{-\frac12\xi^2} \quad \text{with} \quad N_n \equiv \frac{1}{\sqrt{\pi^{1/2}\, 2^n\, n!}}. \tag{2.99}$$

Meanwhile, according to the theory of classical orthogonal polynomials, the Hermite polynomials Hn(x) are defined as [3]

$$H_n(x) \equiv (-1)^n e^{x^2}\frac{d^n e^{-x^2}}{dx^n} \quad (n \ge 0), \tag{2.100}$$

where Hn(x) is an n-th order polynomial. We wish to show the following relation on the basis of mathematical induction:

$$\bar\psi_n(\xi) = N_n H_n(\xi)\, e^{-\frac12\xi^2}. \tag{2.101}$$

Comparing (2.87), (2.98), and (2.99), we make sure that (2.101) holds with n = 0. When n = 1, from (2.99) we have

$$\bar\psi_1(\xi) = N_1\left(\xi - \frac{\partial}{\partial\xi}\right)e^{-\frac12\xi^2} = N_1\left[\xi e^{-\frac12\xi^2} - (-\xi)e^{-\frac12\xi^2}\right] = N_1\cdot 2\xi\, e^{-\frac12\xi^2} = N_1\left[(-1)\, e^{\xi^2}\frac{d e^{-\xi^2}}{d\xi}\right]e^{-\frac12\xi^2} = N_1 H_1(\xi)\, e^{-\frac12\xi^2}. \tag{2.102}$$

Then, (2.101) holds with n = 1 as well. Next, from supposition of mathematical induction we assume that (2.101) holds with n. Then, we have

$$\begin{aligned}
\bar\psi_{n+1}(\xi) &= N_{n+1}\left(\xi - \frac{\partial}{\partial\xi}\right)^{n+1} e^{-\frac12\xi^2} = \frac{1}{\sqrt{2(n+1)}}\, N_n\left(\xi - \frac{\partial}{\partial\xi}\right)\left(\xi - \frac{\partial}{\partial\xi}\right)^n e^{-\frac12\xi^2}\\
&= \frac{1}{\sqrt{2(n+1)}}\left(\xi - \frac{\partial}{\partial\xi}\right)N_n H_n(\xi)\, e^{-\frac12\xi^2}\\
&= \frac{1}{\sqrt{2(n+1)}}\, N_n(-1)^n\left(\xi - \frac{\partial}{\partial\xi}\right)\left[e^{\frac12\xi^2}\frac{d^n e^{-\xi^2}}{d\xi^n}\right]\\
&= \frac{1}{\sqrt{2(n+1)}}\, N_n(-1)^n\left[\xi e^{\frac12\xi^2}\frac{d^n e^{-\xi^2}}{d\xi^n} - \xi e^{\frac12\xi^2}\frac{d^n e^{-\xi^2}}{d\xi^n} - e^{\frac12\xi^2}\frac{d^{n+1} e^{-\xi^2}}{d\xi^{n+1}}\right]\\
&= \frac{1}{\sqrt{2(n+1)}}\, N_n(-1)^{n+1} e^{\frac12\xi^2}\frac{d^{n+1} e^{-\xi^2}}{d\xi^{n+1}} = N_{n+1}(-1)^{n+1} e^{\xi^2}\frac{d^{n+1} e^{-\xi^2}}{d\xi^{n+1}}\, e^{-\frac12\xi^2}\\
&= N_{n+1} H_{n+1}(\xi)\, e^{-\frac12\xi^2}.
\end{aligned}$$

This means that (2.101) holds with n + 1 as well. Thus, it follows that (2.101) is true for n being zero or any positive integer. The orthogonality relation reads as

$$\int_{-\infty}^{\infty}\bar\psi_m(\xi)^*\,\bar\psi_n(\xi)\, d\xi = \delta_{mn}, \tag{2.103}$$

where δmn is the Kronecker delta, for which

$$\delta_{mn} = \begin{cases} 1 & (m = n),\\ 0 & (m \ne n). \end{cases} \tag{2.104}$$

Placing (2.98) back into the function form ψn(q), we have

$$\psi_n(q) = \sqrt{\beta}\,\bar\psi_n(\beta q). \tag{2.105}$$

Using (2.101) and explicitly rewriting (2.105), we get

Table 2.1  First six Hermite polynomials

H₀(x) = 1
H₁(x) = 2x
H₂(x) = 4x² − 2
H₃(x) = 8x³ − 12x
H₄(x) = 16x⁴ − 48x² + 12
H₅(x) = 32x⁵ − 160x³ + 120x

$$\psi_n(q) = \left(\frac{m\omega}{\hbar}\right)^{1/4}\sqrt{\frac{1}{\pi^{1/2}\, 2^n\, n!}}\; H_n\!\left(\sqrt{\frac{m\omega}{\hbar}}\, q\right) e^{-\frac{m\omega}{2\hbar}q^2} \quad (n = 0, 1, 2, \cdots). \tag{2.106}$$

We tabulate the first several Hermite polynomials Hn(x) in Table 2.1, where the index n represents the highest order of the polynomial. In Table 2.1 we see that even functions and odd functions appear alternately (i.e., parity). This is the case with ψn(q) as well, because ψn(q) is a product of Hn(x) and the even function $e^{-\frac{m\omega}{2\hbar}q^2}$. Combining (2.101) and (2.103) as well as (2.99), the orthogonality relation between $\bar\psi_n(\xi)$ (n = 0, 1, 2, ⋯) can be described alternatively as [3]

$$\int_{-\infty}^{\infty} e^{-\xi^2} H_m(\xi) H_n(\xi)\, d\xi = \sqrt{\pi}\, 2^n\, n!\, \delta_{mn}. \tag{2.107}$$
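The orthogonality (2.107) can be verified numerically with Gauss–Hermite quadrature, which integrates $e^{-\xi^2}$ times a polynomial exactly; the sketch below uses NumPy's physicists'-convention Hermite routines:

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss, hermval

x, w = hermgauss(40)   # nodes/weights: sum(w*f(x)) = integral of exp(-xi^2)*f(xi)

def H(n, xi):
    c = np.zeros(n + 1); c[n] = 1.0
    return hermval(xi, c)          # physicists' Hermite polynomial H_n(xi)

ok = True
for m in range(6):
    for n in range(6):
        integral = np.sum(w * H(m, x) * H(n, x))
        expected = math.sqrt(math.pi) * 2**n * math.factorial(n) if m == n else 0.0
        ok = ok and abs(integral - expected) <= 1e-8 * max(1.0, abs(expected))
print(ok)   # True: matches sqrt(pi) * 2^n * n! * delta_mn of (2.107)
```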

Note that Hm(ξ) is a real function, and so Hm(ξ)* = Hm(ξ). The relation (2.107) is well known as the orthogonality of the Hermite polynomials with $e^{-\xi^2}$ taken as a weight function [3]. Here the weight function is a real and non-negative function within the domain considered [e.g., (−∞, +∞) in the present case] and is independent of the indices m and n. We will deal with it again in Sect. 10.4. The relation (2.101) and the orthogonality relation (2.107) can be understood more explicitly as follows: From (2.11) we have the Schrödinger equation of a one-dimensional quantum-mechanical harmonic oscillator such that

$$-\frac{\hbar^2}{2m}\frac{d^2 u(q)}{dq^2} + \frac{1}{2}m\omega^2 q^2 u(q) = E u(q). \tag{2.108}$$

Changing the variable as in (2.90), we have

$$-\frac{d^2 u(\xi)}{d\xi^2} + \xi^2 u(\xi) = \frac{2E}{\hbar\omega}\, u(\xi). \tag{2.109}$$

Defining a dimensionless parameter

$$\lambda \equiv \frac{2E}{\hbar\omega} \tag{2.110}$$

and also defining a differential operator D such that

$$D \equiv -\frac{d^2}{d\xi^2} + \xi^2, \tag{2.111}$$

we have the following eigenvalue equation:

$$D u(\xi) = \lambda u(\xi). \tag{2.112}$$

We further consider a function v(ξ) such that

$$u(\xi) = v(\xi)\, e^{-\xi^2/2}. \tag{2.113}$$

Then, (2.109) is converted as follows:

$$\left[-\frac{d^2 v(\xi)}{d\xi^2} + 2\xi\frac{dv(\xi)}{d\xi}\right]e^{-\frac{\xi^2}{2}} = (\lambda - 1)\, v(\xi)\, e^{-\frac{\xi^2}{2}}. \tag{2.114}$$

Since $e^{-\frac{\xi^2}{2}}$ does not vanish for any ξ, we have

$$-\frac{d^2 v(\xi)}{d\xi^2} + 2\xi\frac{dv(\xi)}{d\xi} = (\lambda - 1)\, v(\xi). \tag{2.115}$$

If we define another differential operator D such that

$$D \equiv -\frac{d^2}{d\xi^2} + 2\xi\frac{d}{d\xi}, \tag{2.116}$$

we have another eigenvalue equation

$$D v(\xi) = (\lambda - 1)\, v(\xi). \tag{2.117}$$

Meanwhile, we have the following well-known differential equation:

$$\frac{d^2 H_n(\xi)}{d\xi^2} - 2\xi\frac{dH_n(\xi)}{d\xi} + 2n H_n(\xi) = 0. \tag{2.118}$$

This equation is called the Hermite differential equation. Using (2.116), (2.118) can be recast as an eigenvalue equation such that

$$D H_n(\xi) = 2n H_n(\xi). \tag{2.119}$$

Therefore, comparing (2.115) and (2.118) and putting

$$\lambda = 2n + 1, \tag{2.120}$$

we get

$$v(\xi) = c H_n(\xi), \tag{2.121}$$

where c is an arbitrary constant. Thus, using (2.113), for a solution of (2.109) we get

$$u_n(\xi) = c H_n(\xi)\, e^{-\xi^2/2}, \tag{2.122}$$

where the solution u(ξ) is indexed with n. From (2.110), as an energy eigenvalue we have

$$E_n = \left(n + \frac{1}{2}\right)\hbar\omega.$$

Thus, (2.37) is recovered. A normalization constant c of (2.122) can be decided as in (2.106). As discussed above, the operator representation and the coordinate representation are fully consistent.
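The equivalence can also be seen numerically: discretizing D = −d²/dξ² + ξ² from (2.111) with finite differences and diagonalizing reproduces the eigenvalues λ = 2n + 1 of (2.120). A minimal sketch (grid size and half-width are illustrative):

```python
import numpy as np

L, M = 8.0, 1000                  # grid half-width and number of points
xi = np.linspace(-L, L, M)
h = xi[1] - xi[0]

# Central-difference matrix for D = -d^2/dxi^2 + xi^2 (u -> 0 at the ends)
D = (np.diag(2.0 / h**2 + xi**2)
     + np.diag(-np.ones(M - 1) / h**2, 1)
     + np.diag(-np.ones(M - 1) / h**2, -1))

lam = np.linalg.eigvalsh(D)[:5]
print(np.round(lam, 3))           # close to [1, 3, 5, 7, 9], i.e., lambda = 2n + 1
```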

2.5 Variance and Uncertainty Principle

The uncertainty principle is one of the most fundamental concepts of quantum mechanics. To examine this concept on the basis of a quantum harmonic oscillator, let us introduce a variance operator [4]. Let A be a physical quantity and let ⟨A⟩ be an expectation value as defined in (1.126). We define a variance operator as ⟨(ΔA)²⟩, where we have

$$\Delta A \equiv A - \langle A\rangle. \tag{2.123}$$

In (2.123), we assume that ⟨A⟩ is obtained by operating A on a certain physical state |ψ⟩. Then, we have

$$\langle(\Delta A)^2\rangle = \langle(A - \langle A\rangle)^2\rangle = \langle A^2 - 2\langle A\rangle A + \langle A\rangle^2\rangle = \langle A^2\rangle - \langle A\rangle^2. \tag{2.124}$$

If A is Hermitian, ΔA is Hermitian as well. This is because

$$(\Delta A)^\dagger = A^\dagger - \langle A\rangle = A - \langle A\rangle = \Delta A, \tag{2.125}$$

where we used the fact that an expectation value of a Hermitian operator is real. Then, ⟨(ΔA)²⟩ is non-negative, as in the case of (2.13). Moreover, if |ψ⟩ is an eigenstate of A, ⟨(ΔA)²⟩ = 0. Therefore, ⟨(ΔA)²⟩ represents a measure of how widely measured values are dispersed when A is measured in reference to a quantum state |ψ⟩. Also, we define a standard deviation δA as

$$\delta A \equiv \sqrt{\langle(\Delta A)^2\rangle}. \tag{2.126}$$

We have the following important theorem on the standard deviation δA [4].

Theorem 2.1  Let A and B be Hermitian operators. If A and B satisfy

$$[A, B] = ik \quad (k:\ \text{non-zero real number}), \tag{2.127}$$

then we have

$$\delta A \cdot \delta B \ge |k|/2 \tag{2.128}$$

in reference to any quantum state |ψ⟩.

Proof  We have

$$[\Delta A, \Delta B] = [A - \langle\psi|A|\psi\rangle,\ B - \langle\psi|B|\psi\rangle] = [A, B] = ik. \tag{2.129}$$

In (2.129), we used the fact that ⟨ψ|A|ψ⟩ and ⟨ψ|B|ψ⟩ are just real numbers, which commute with any operator. Next, we calculate the following quantity in relation to a real number λ:

$$\|(\Delta A + i\lambda\Delta B)|\psi\rangle\|^2 = \langle\psi|(\Delta A - i\lambda\Delta B)(\Delta A + i\lambda\Delta B)|\psi\rangle = \langle\psi|(\Delta A)^2|\psi\rangle - k\lambda + \langle\psi|(\Delta B)^2|\psi\rangle\,\lambda^2, \tag{2.130}$$

where we used the fact that ΔA and ΔB are Hermitian. For the above quadratic form in λ to be non-negative for any real number λ, we must have

$$(-k)^2 - 4\,\langle\psi|(\Delta A)^2|\psi\rangle\,\langle\psi|(\Delta B)^2|\psi\rangle \le 0. \tag{2.131}$$

Thus, (2.128) follows. This completes the proof.

On the basis of Theorem 2.1, we find that both δA and δB are positive on condition that (2.127) holds. We have another important theorem.
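Theorem 2.1 can be probed numerically: the bound δA·δB ≥ |⟨[A, B]⟩|/2 holds for any Hermitian matrices and any state, so a random state with truncated q and p (a sketch in units ħ = m = ω = 1, not the book's construction) must satisfy it:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 30
a = np.diag(np.sqrt(np.arange(1, N)), k=1)
q = (a + a.T) / np.sqrt(2.0)               # Hermitian position operator
p = (a - a.T) / (1j * np.sqrt(2.0))        # Hermitian momentum operator

psi = rng.normal(size=N) + 1j * rng.normal(size=N)
psi /= np.linalg.norm(psi)                 # random normalized state

def sd(A):
    """Standard deviation (2.126): sqrt(<A^2> - <A>^2) in the state psi."""
    mean = np.vdot(psi, A @ psi).real
    mean2 = np.vdot(psi, A @ (A @ psi)).real
    return np.sqrt(mean2 - mean**2)

k = np.vdot(psi, (q @ p - p @ q) @ psi)    # expectation of [q, p]
print(sd(q) * sd(p) >= abs(k) / 2)         # True: the bound (2.128) holds
```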

Theorem 2.2  Let A be a Hermitian operator. The necessary and sufficient condition for a physical state |ψ₀⟩ to be an eigenstate of A is δA = 0.

Proof  Suppose that |ψ₀⟩ is a normalized eigenstate of A that belongs to an eigenvalue a. Then, we have

$$\langle\psi_0|A^2\psi_0\rangle = a\langle\psi_0|A\psi_0\rangle = a^2\langle\psi_0|\psi_0\rangle = a^2, \quad \langle\psi_0|A\psi_0\rangle^2 = \left[a\langle\psi_0|\psi_0\rangle\right]^2 = a^2. \tag{2.132}$$

From (2.124) and (2.126), we have

$$\langle\psi_0|(\Delta A)^2\psi_0\rangle = 0, \quad \text{i.e.,} \quad \delta A = 0. \tag{2.133}$$

Note that δA is measured in reference to |ψ₀⟩. Conversely, suppose that δA = 0. Then,

$$\delta A = \sqrt{\langle\psi_0|(\Delta A)^2\psi_0\rangle} = \sqrt{\langle\Delta A\psi_0|\Delta A\psi_0\rangle} = \|\Delta A\psi_0\|, \tag{2.134}$$

where we used the fact that ΔA is Hermitian. From the definition of the norm in (1.121), for δA = 0 to hold, we must have

$$\Delta A\,\psi_0 = (A - \langle A\rangle)\psi_0 = 0, \quad \text{i.e.,} \quad A\psi_0 = \langle A\rangle\psi_0. \tag{2.135}$$

This indicates that ψ₀ is an eigenstate of A that belongs to the eigenvalue ⟨A⟩. This completes the proof.

Theorem 2.1 implies that (2.127) holds with any physical state |ψ⟩. That is, we must have δA > 0 and δB > 0, if δA and δB are evaluated in reference to any |ψ⟩ on condition that (2.127) holds. From Theorem 2.2, in turn, it follows that eigenstates cannot exist for A or B under the condition of (2.127). To show this explicitly, we take an inner product using (2.127). That is, with Hermitian operators A and B, consider the following inner product:

$$\langle\psi|[A, B]|\psi\rangle = \langle\psi|ik|\psi\rangle, \quad \text{i.e.,} \quad \langle\psi|AB - BA|\psi\rangle = ik, \tag{2.136}$$

where we assumed that |ψ⟩ is an arbitrarily chosen normalized vector. Suppose now that |ψ₀⟩ is an eigenstate of A that belongs to an eigenvalue a. Then, we have

$$A|\psi_0\rangle = a|\psi_0\rangle. \tag{2.137}$$

Taking an adjoint of (2.137), we get

$$\langle\psi_0|A^\dagger = \langle\psi_0|A = a^*\langle\psi_0| = a\langle\psi_0|, \tag{2.138}$$

where the last equality comes from the fact that A is Hermitian (so that a is real). From (2.138), we would have

$$\langle\psi_0|AB - BA|\psi_0\rangle = \langle\psi_0|AB|\psi_0\rangle - \langle\psi_0|BA|\psi_0\rangle = a\langle\psi_0|B|\psi_0\rangle - a\langle\psi_0|B|\psi_0\rangle = 0.$$

This would imply that (2.136) does not hold with |ψ₀⟩, in contradiction to (2.127), where ik ≠ 0. Namely, we conclude that no physical state can be an eigenstate of A on condition that (2.127) holds. Equation (2.127) is rewritten as

$$\langle\psi|BA - AB|\psi\rangle = -ik. \tag{2.139}$$

Suppose now that |φ₀⟩ is an eigenstate of B that belongs to an eigenvalue b. Then, we can similarly show that no physical state can be an eigenstate of B. Summarizing the above, we restate that once we have the relation [A, B] = ik (k ≠ 0), no representation matrix diagonalizes A or B. In other words, once we postulate [A, B] = ik (k ≠ 0), we must abandon any effort to find a representation matrix that diagonalizes A and B. In the quantum-mechanical formulation of a harmonic oscillator, we introduced the canonical commutation relation (see Sect. 2.3) described by [q, p] = iħ; see (1.140) and (2.73). Indeed, neither q nor p is diagonalized, as shown in (2.69) and (2.70).

Example 2.1  Taking a quantum harmonic oscillator as an example, we consider the variance of q and p in reference to |ψₙ⟩ (n = 0, 1, ⋯). We have

$$\langle(\Delta q)^2\rangle = \langle\psi_n|q^2|\psi_n\rangle - \langle\psi_n|q|\psi_n\rangle^2. \tag{2.140}$$

Using (2.55) and (2.62) as well as (2.68), we get

$$\langle\psi_n|q|\psi_n\rangle = \sqrt{\frac{\hbar}{2m\omega}}\langle\psi_n|a + a^\dagger|\psi_n\rangle = \sqrt{\frac{\hbar n}{2m\omega}}\langle\psi_n|\psi_{n-1}\rangle + \sqrt{\frac{\hbar(n+1)}{2m\omega}}\langle\psi_n|\psi_{n+1}\rangle = 0, \tag{2.141}$$

where the last equality comes from (2.53). We have

$$q^2 = \frac{\hbar}{2m\omega}\left(a + a^\dagger\right)^2 = \frac{\hbar}{2m\omega}\left[a^2 + E + 2a^\dagger a + (a^\dagger)^2\right], \tag{2.142}$$

where E denotes an identity operator and we used (2.24) along with the following relation:

where E denotes an identity operator and we used (2.24) along with the following relation:

$$aa^\dagger = aa^\dagger - a^\dagger a + a^\dagger a = [a, a^\dagger] + a^\dagger a = E + a^\dagger a. \tag{2.143}$$

Using (2.55) and (2.62), we have

$$\langle\psi_n|q^2|\psi_n\rangle = \frac{\hbar}{2m\omega}\left[\langle\psi_n|\psi_n\rangle + 2\langle\psi_n|a^\dagger a\psi_n\rangle\right] = \frac{\hbar}{2m\omega}(2n + 1), \tag{2.144}$$

where we used (2.60) with the last equality. Thus, we get

$$\langle(\Delta q)^2\rangle = \langle\psi_n|q^2|\psi_n\rangle - \langle\psi_n|q|\psi_n\rangle^2 = \frac{\hbar}{2m\omega}(2n + 1).$$

Following procedures similar to those above, we get

$$\langle\psi_n|p|\psi_n\rangle = 0 \quad \text{and} \quad \langle\psi_n|p^2|\psi_n\rangle = \frac{m\hbar\omega}{2}(2n + 1). \tag{2.145}$$

Thus, we get

$$\langle(\Delta p)^2\rangle = \langle\psi_n|p^2|\psi_n\rangle = \frac{m\hbar\omega}{2}(2n + 1).$$

Accordingly, we have

$$\delta q \cdot \delta p = \sqrt{\langle(\Delta q)^2\rangle}\,\sqrt{\langle(\Delta p)^2\rangle} = \frac{\hbar}{2}(2n + 1) \ge \frac{\hbar}{2}. \tag{2.146}$$

The quantity δq·δp is equal to ħ/2 for n = 0 and becomes larger with increasing n. The above example gives a good illustration of Theorem 2.1. Note that, putting A = q and B = p along with k = ħ in Theorem 2.1, from (1.140) we should have

$$\delta q \cdot \delta p \ge \frac{\hbar}{2}. \tag{2.147}$$

Thus, the quantum-mechanical harmonic oscillator gives us an interesting example with respect to the standard deviations of the position and momentum operators. That is, (2.146) is a special case of (2.147). The general relation represented by (2.147) is well known as the uncertainty principle.

In relation to the aforementioned argument, we might well wonder whether Examples 1.1 and 1.2 admit an eigenstate of fixed momentum. Suppose that we chose as an eigenstate y(x) = ce^{ikx}, where c is a constant. Then, we would have $\frac{\hbar}{i}\frac{\partial y(x)}{\partial x} = \hbar k\, y(x)$ and would get an eigenvalue ħk for the momentum. Nonetheless, such a y(x) does not satisfy the proper BCs, i.e., y(L) = y(−L) = 0, because e^{ikx} never vanishes for any real k or x (or, more generally, for any complex k or x). Thus, we cannot obtain a proper solution that is an eigenstate with a fixed momentum in a confined physical system.
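Example 2.1 can be reproduced with truncated ladder matrices: for the number states |ψₙ⟩ one finds ⟨(Δq)²⟩ = ⟨(Δp)²⟩ = (2n + 1)/2 in units ħ = m = ω = 1, so δq·δp = (2n + 1)/2 as in (2.146). A sketch:

```python
import numpy as np

N = 40
a = np.diag(np.sqrt(np.arange(1, N)), k=1)
q = (a + a.T) / np.sqrt(2.0)               # hbar = m = omega = 1
p = (a - a.T) / (1j * np.sqrt(2.0))

products = []
for n in range(5):
    e_n = np.zeros(N); e_n[n] = 1.0        # number state |psi_n>
    var_q = np.vdot(e_n, q @ (q @ e_n)).real   # <q> = 0 by (2.141)
    var_p = np.vdot(e_n, p @ (p @ e_n)).real   # <p> = 0 by (2.145)
    products.append(np.sqrt(var_q * var_p))

print(products)   # ~[0.5, 1.5, 2.5, 3.5, 4.5] = (2n+1)/2, each >= 1/2
```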

References

1. Messiah A (1999) Quantum mechanics. Dover, New York, NY
2. Stakgold I (1998) Green's functions and boundary value problems, 2nd edn. Wiley, New York, NY
3. Lebedev NN (1972) Special functions and their applications. Dover, New York, NY
4. Shimizu A (2003) Upgrade foundation of quantum theory. Saiensu-sha, Tokyo (in Japanese)

Chapter 3

Hydrogen-Like Atoms

In the history of quantum mechanics, the theory was first successfully applied to the motion of an electron in a hydrogen atom, along with the harmonic oscillator. Unlike the case of the one-dimensional harmonic oscillator we dealt with in Chap. 2, however, with a hydrogen atom we have to consider the three-dimensional motion of an electron. Accordingly, it takes somewhat elaborate calculations to constitute the Hamiltonian. The calculation procedures themselves, however, are worth following to understand the underlying basic concepts of quantum mechanics. At the same time, this chapter is a treasure house of special functions. In Chap. 2, we already encountered one of them, i.e., the Hermite polynomials. Here we will deal with the Legendre polynomials, associated Legendre polynomials, etc. These special functions arise when we deal with a physical system having, e.g., spherical symmetry. In a hydrogen atom, an electron moves in a spherically symmetric Coulomb potential field produced by a proton. This topic provides us with a good opportunity to study various special functions. The related Schrödinger equation can be separated into an angular part and a radial part. The solutions of the angular part are characterized by spherical (surface) harmonics, to which the (associated) Legendre functions are related. The solutions of the radial part are connected to the (associated) Laguerre polynomials. The exact solutions are obtained as products of the (associated) Legendre functions and the (associated) Laguerre polynomials accordingly. Thus, studying the characteristics of hydrogen-like atoms from the quantum-mechanical perspective is of fundamental importance.

3.1

Introductory Remarks

The motion of the electron in hydrogen is well-known as a two-particle problem (or two-body problem) in a central force field. In that case, the coordinate system of the physical system is separated into the relative coordinates and center-of-mass coordinates. To be more specific, the coordinate separation is true of the case where © Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_3


two particles are moving under control only by a force field between the two particles, without other external force fields [1]. In classical mechanics, the equation of motion is accordingly separated into two equations related to the relative coordinates and the center-of-mass coordinates. Of these, the potential term appears only in the equation of motion for the relative coordinates. The situation is the same in quantum mechanics. Namely, the Schrödinger equation of motion for the relative coordinates is expressed as an eigenvalue equation that reads

\[
\left[-\frac{\hbar^2}{2\mu}\nabla^2 + V(r)\right]\psi = E\psi, \tag{3.1}
\]

where μ is the reduced mass of the two particles [1], i.e., an electron and a proton; V(r) is a potential with r being the distance between the electron and the proton. In (3.1), we assume a spherically symmetric potential; i.e., the potential is expressed only as a function of the distance r. Moreover, if the potential is coulombic,

\[
\left[-\frac{\hbar^2}{2\mu}\nabla^2 - \frac{e^2}{4\pi\varepsilon_0 r}\right]\psi = E\psi, \tag{3.2}
\]

where ε0 is permittivity of vacuum and e is an elementary charge. If we think of hydrogen-like atoms such as He+, Li2+, Be3+, etc., we have an equation described as -

Ze2 ħ2 2 — ψ = Eψ, 2μ 4πε0 r

ð3:3Þ

where Z is an atomic number and μ is a reduced mass of an electron and a nucleus pertinent to the atomic (or ionic) species. We start with (3.3) in this chapter.

3.2 Constitution of Hamiltonian

As explicitly described in (3.3), the coulombic potential has spherical symmetry. In such a case, it is convenient to recast (3.3) in spherical coordinates (or polar coordinates). As the physical system is three-dimensional, we have to consider the orbital angular momentum L in the Hamiltonian. We have

\[
\mathbf{L} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} L_x \\ L_y \\ L_z \end{pmatrix}, \tag{3.4}
\]

where e1, e2, and e3 denote orthonormal basis vectors in a three-dimensional Cartesian space (ℝ3); Lx, Ly, and Lz represent the components of L. The angular momentum L is expressed in the form of a determinant as

\[
\mathbf{L} = \mathbf{x}\times\mathbf{p} = \begin{vmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \mathbf{e}_3 \\ x & y & z \\ p_x & p_y & p_z \end{vmatrix},
\]

where x denotes the position vector with respect to the relative coordinates x, y, and z. That is,

\[
\mathbf{x} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{3.5}
\]

The quantity p denotes the momentum of an electron (as a particle carrying a reduced mass μ) with px, py, and pz being its components; p is denoted similarly to the above. As for each component of L, we have, e.g.,

\[
L_x = y p_z - z p_y. \tag{3.6}
\]

To calculate L², we evaluate Lx², Ly², and Lz² separately. We have

\[
\begin{aligned}
L_x{}^2 &= (y p_z - z p_y)(y p_z - z p_y) = y p_z y p_z - y p_z z p_y - z p_y y p_z + z p_y z p_y \\
&= y^2 p_z{}^2 - y p_z z p_y - z p_y y p_z + z^2 p_y{}^2 \\
&= y^2 p_z{}^2 - y (z p_z - i\hbar) p_y - z (y p_y - i\hbar) p_z + z^2 p_y{}^2 \\
&= y^2 p_z{}^2 + z^2 p_y{}^2 - y z p_z p_y - z y p_y p_z + i\hbar (y p_y + z p_z),
\end{aligned}
\]

where we have used the canonical commutation relation (1.140) in the second to last equality. In the above calculations, we also used the commutability of, e.g., y and pz, and of z and py. For example, we have

\[
[p_z, y]\,|\psi\rangle = \frac{\hbar}{i}\left(\frac{\partial}{\partial z} y - y \frac{\partial}{\partial z}\right)|\psi\rangle = \frac{\hbar}{i}\left(y\frac{\partial |\psi\rangle}{\partial z} - y\frac{\partial |\psi\rangle}{\partial z}\right) = 0.
\]

Since |ψ⟩ is arbitrarily chosen, this relation implies that pz and y commute. We obtain similar relations regarding Ly² and Lz² as well. Thus, we have

\[
\begin{aligned}
L^2 &= L_x{}^2 + L_y{}^2 + L_z{}^2 \\
&= y^2 p_z{}^2 + z^2 p_y{}^2 + z^2 p_x{}^2 + x^2 p_z{}^2 + x^2 p_y{}^2 + y^2 p_x{}^2 \\
&\quad + \left(x^2 p_x{}^2 - x^2 p_x{}^2 + y^2 p_y{}^2 - y^2 p_y{}^2 + z^2 p_z{}^2 - z^2 p_z{}^2\right) \\
&\quad - \left(y z p_z p_y + z y p_y p_z + z x p_x p_z + x z p_z p_x + x y p_y p_x + y x p_x p_y\right) \\
&\quad + i\hbar \left(y p_y + z p_z + z p_z + x p_x + x p_x + y p_y\right) \\
&= \left(y^2 p_z{}^2 + z^2 p_y{}^2 + z^2 p_x{}^2 + x^2 p_z{}^2 + x^2 p_y{}^2 + y^2 p_x{}^2 + x^2 p_x{}^2 + y^2 p_y{}^2 + z^2 p_z{}^2\right) \\
&\quad - \left[\left(x^2 p_x{}^2 + y^2 p_y{}^2 + z^2 p_z{}^2\right) + y z p_z p_y + z y p_y p_z + z x p_x p_z + x z p_z p_x + x y p_y p_x + y x p_x p_y\right] \\
&\quad + i\hbar \left(y p_y + z p_z + z p_z + x p_x + x p_x + y p_y\right) \\
&= \mathbf{r}^2 \mathbf{p}^2 - \mathbf{r}(\mathbf{r}\cdot\mathbf{p})\cdot\mathbf{p} + 2 i\hbar (\mathbf{r}\cdot\mathbf{p}). \tag{3.7}
\end{aligned}
\]

In (3.7), we are able to ease the calculations by inserting the term (x²px² - x²px² + y²py² - y²py² + z²pz² - z²pz²). As a result, for the second term after the second to last equality we have

\[
\begin{aligned}
&-\left[\left(x^2 p_x{}^2 + y^2 p_y{}^2 + z^2 p_z{}^2\right) + y z p_z p_y + z y p_y p_z + z x p_x p_z + x z p_z p_x + x y p_y p_x + y x p_x p_y\right] \\
&= -\left[x (x p_x + y p_y + z p_z) p_x + y (x p_x + y p_y + z p_z) p_y + z (x p_x + y p_y + z p_z) p_z\right] \\
&= -\mathbf{r}(\mathbf{r}\cdot\mathbf{p})\cdot\mathbf{p}.
\end{aligned}
\]

The calculations of r²·p² [the first term of (3.7)] and r·p (in the third term) are straightforward. In spherical coordinates, the momentum p is expressed as


Fig. 3.1 Spherical coordinate system and orthonormal basis set. (a) Orthonormal basis vectors e(r), e(θ), and e(ϕ) in ℝ3. (b) The basis vector e(ϕ) is perpendicular to the plane shaped by the z-axis and a straight line of y = x tan ϕ

\[
\mathbf{p} = p_r \mathbf{e}^{(r)} + p_\theta \mathbf{e}^{(\theta)} + p_\phi \mathbf{e}^{(\phi)}, \tag{3.8}
\]

where pr, pθ, and pϕ are components of p; e(r), e(θ), and e(ϕ) are orthonormal basis vectors of ℝ3 in the direction of increasing r, θ, and ϕ, respectively (see Fig. 3.1). In Fig. 3.1b, e(ϕ) is perpendicular to the plane shaped by the z-axis and a straight line of y = x tan ϕ. Notice that the said plane is spanned by e(r) and e(θ). Meanwhile, the momentum operator is expressed as [2]

\[
\mathbf{p} = \frac{\hbar}{i}\nabla = \frac{\hbar}{i}\left[\mathbf{e}^{(r)}\frac{\partial}{\partial r} + \mathbf{e}^{(\theta)}\frac{1}{r}\frac{\partial}{\partial \theta} + \mathbf{e}^{(\phi)}\frac{1}{r\sin\theta}\frac{\partial}{\partial \phi}\right]. \tag{3.9}
\]

The vector notation of (3.9) corresponds to (1.31). That is, in Cartesian coordinates we have

\[
\mathbf{p} = \frac{\hbar}{i}\nabla = \frac{\hbar}{i}\left(\mathbf{e}_1\frac{\partial}{\partial x} + \mathbf{e}_2\frac{\partial}{\partial y} + \mathbf{e}_3\frac{\partial}{\partial z}\right), \tag{3.10}
\]

where ∇ is said to be nabla (or del), a kind of differential vector operator. Noting that r = r e^{(r)} and using (3.9), we have

\[
\mathbf{r}\cdot\mathbf{p} = \mathbf{r}\cdot\frac{\hbar}{i}\nabla = r\frac{\hbar}{i}\frac{\partial}{\partial r}. \tag{3.11}
\]

Hence,

\[
\mathbf{r}(\mathbf{r}\cdot\mathbf{p})\cdot\mathbf{p} = r\left(r\frac{\hbar}{i}\frac{\partial}{\partial r}\right)\frac{\hbar}{i}\frac{\partial}{\partial r} = -\hbar^2 r^2 \frac{\partial^2}{\partial r^2}. \tag{3.12}
\]

Thus, we have

\[
L^2 = r^2 p^2 + \hbar^2 r^2 \frac{\partial^2}{\partial r^2} + 2\hbar^2 r \frac{\partial}{\partial r} = r^2 p^2 + \hbar^2 \frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right). \tag{3.13}
\]

Therefore,

\[
p^2 = -\frac{\hbar^2}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + \frac{L^2}{r^2}. \tag{3.14}
\]

Notice here that L2 does not contain r (vide infra); i.e., L2 commutes with r2, and so it can freely be divided by r2. Thus, the Hamiltonian H is represented by

\[
H = \frac{\mathbf{p}^2}{2\mu} + V(r) = -\frac{\hbar^2}{2\mu}\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + \frac{L^2}{2\mu r^2} - \frac{Ze^2}{4\pi\varepsilon_0 r}. \tag{3.15}
\]

Thus, the Schrödinger equation can be expressed as

\[
\left[-\frac{\hbar^2}{2\mu}\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + \frac{L^2}{2\mu r^2} - \frac{Ze^2}{4\pi\varepsilon_0 r}\right]\psi = E\psi. \tag{3.16}
\]

Now, let us describe L² in polar coordinates. The calculation procedures are somewhat lengthy, but straightforward. First we have

\[
x = r\sin\theta\cos\phi, \quad y = r\sin\theta\sin\phi, \quad z = r\cos\theta, \tag{3.17}
\]

where we have 0 ≤ θ ≤ π and 0 ≤ ϕ ≤ 2π. Rewriting (3.17) with respect to r, θ, and ϕ, we get

\[
r = \left(x^2 + y^2 + z^2\right)^{1/2}, \quad \theta = \tan^{-1}\frac{\left(x^2 + y^2\right)^{1/2}}{z}, \quad \phi = \tan^{-1}\frac{y}{x}. \tag{3.18}
\]

Thus, we have

\[
L_z = x p_y - y p_x = -i\hbar\left(x\frac{\partial}{\partial y} - y\frac{\partial}{\partial x}\right), \tag{3.19}
\]

\[
\frac{\partial}{\partial x} = \frac{\partial r}{\partial x}\frac{\partial}{\partial r} + \frac{\partial \theta}{\partial x}\frac{\partial}{\partial \theta} + \frac{\partial \phi}{\partial x}\frac{\partial}{\partial \phi}. \tag{3.20}
\]

In turn, we have

\[
\begin{aligned}
\frac{\partial r}{\partial x} &= \frac{x}{r} = \sin\theta\cos\phi, \\
\frac{\partial \theta}{\partial x} &= \frac{1}{1 + (x^2 + y^2)/z^2}\cdot\frac{1}{2z}\left(x^2 + y^2\right)^{-1/2}\cdot 2x = \frac{z}{x^2 + y^2 + z^2}\cdot\frac{x}{\left(x^2 + y^2\right)^{1/2}} = \frac{\cos\theta\cos\phi}{r}, \\
\frac{\partial \phi}{\partial x} &= \frac{1}{1 + \left(y^2/x^2\right)}\left(-\frac{y}{x^2}\right) = -\frac{\sin\phi}{r\sin\theta}. \tag{3.21}
\end{aligned}
\]

In calculating the last two equations of (3.21), we used the differentiation of an arctangent function along with that of a composite function. Namely,

\[
\left(\tan^{-1}x\right)' = \frac{1}{1 + x^2}.
\]

Inserting (3.21) into (3.20), we get

\[
\frac{\partial}{\partial x} = \sin\theta\cos\phi\frac{\partial}{\partial r} + \frac{\cos\theta\cos\phi}{r}\frac{\partial}{\partial \theta} - \frac{\sin\phi}{r\sin\theta}\frac{\partial}{\partial \phi}. \tag{3.22}
\]

Similarly, we have

\[
\frac{\partial}{\partial y} = \sin\theta\sin\phi\frac{\partial}{\partial r} + \frac{\cos\theta\sin\phi}{r}\frac{\partial}{\partial \theta} + \frac{\cos\phi}{r\sin\theta}\frac{\partial}{\partial \phi}. \tag{3.23}
\]

Inserting (3.22) and (3.23) together with (3.17) into (3.19), we get

\[
L_z = -i\hbar\frac{\partial}{\partial \phi}. \tag{3.24}
\]
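The reduction behind (3.24) can be cross-checked symbolically. The sketch below (a supplementary check, not part of the original text; the sympy library is assumed to be available) applies the chain-rule forms (3.22) and (3.23) to x ∂/∂y - y ∂/∂x acting on a generic function and confirms that only ∂/∂ϕ survives:

```python
import sympy as sp

r, theta, phi = sp.symbols('r theta phi', positive=True)
f = sp.Function('f')(r, theta, phi)

# Spherical expressions of x and y, Eq. (3.17)
x = r*sp.sin(theta)*sp.cos(phi)
y = r*sp.sin(theta)*sp.sin(phi)

# Chain-rule forms of d/dx and d/dy, Eqs. (3.22) and (3.23)
def ddx(g):
    return (sp.sin(theta)*sp.cos(phi)*sp.diff(g, r)
            + sp.cos(theta)*sp.cos(phi)/r*sp.diff(g, theta)
            - sp.sin(phi)/(r*sp.sin(theta))*sp.diff(g, phi))

def ddy(g):
    return (sp.sin(theta)*sp.sin(phi)*sp.diff(g, r)
            + sp.cos(theta)*sp.sin(phi)/r*sp.diff(g, theta)
            + sp.cos(phi)/(r*sp.sin(theta))*sp.diff(g, phi))

# L_z / (-i*hbar) = x d/dy - y d/dx should reduce to d/dphi, Eq. (3.24)
residual = sp.simplify(x*ddy(f) - y*ddx(f) - sp.diff(f, phi))
print(residual)  # 0
```

The ∂/∂r and ∂/∂θ coefficients cancel identically, and the ∂/∂ϕ coefficient reduces to cos²ϕ + sin²ϕ = 1.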

In a similar manner, we have

\[
\frac{\partial}{\partial z} = \frac{\partial r}{\partial z}\frac{\partial}{\partial r} + \frac{\partial \theta}{\partial z}\frac{\partial}{\partial \theta} = \cos\theta\frac{\partial}{\partial r} - \frac{\sin\theta}{r}\frac{\partial}{\partial \theta}.
\]

Combining this relation with either (3.23) or (3.22), we get

\[
L_x = i\hbar\left(\sin\phi\frac{\partial}{\partial \theta} + \cot\theta\cos\phi\frac{\partial}{\partial \phi}\right), \tag{3.25}
\]

\[
L_y = -i\hbar\left(\cos\phi\frac{\partial}{\partial \theta} - \cot\theta\sin\phi\frac{\partial}{\partial \phi}\right). \tag{3.26}
\]

Now, we introduce the following operators:

\[
L^{(+)} \equiv L_x + i L_y \quad \text{and} \quad L^{(-)} \equiv L_x - i L_y. \tag{3.27}
\]

Then, we have

\[
L^{(+)} = \hbar e^{i\phi}\left(\frac{\partial}{\partial \theta} + i\cot\theta\frac{\partial}{\partial \phi}\right) \quad \text{and} \quad L^{(-)} = \hbar e^{-i\phi}\left(-\frac{\partial}{\partial \theta} + i\cot\theta\frac{\partial}{\partial \phi}\right). \tag{3.28}
\]

Thus, we get

\[
\begin{aligned}
L^{(+)}L^{(-)} &= \hbar^2 e^{i\phi}\left(\frac{\partial}{\partial \theta} + i\cot\theta\frac{\partial}{\partial \phi}\right) e^{-i\phi}\left(-\frac{\partial}{\partial \theta} + i\cot\theta\frac{\partial}{\partial \phi}\right) \\
&= \hbar^2 e^{i\phi}\left[ e^{-i\phi}\left(-\frac{\partial^2}{\partial \theta^2} - \frac{i}{\sin^2\theta}\frac{\partial}{\partial \phi} + i\cot\theta\frac{\partial^2}{\partial \theta \partial \phi}\right) + e^{-i\phi}\cot\theta\left(-\frac{\partial}{\partial \theta} + i\cot\theta\frac{\partial}{\partial \phi}\right) + i e^{-i\phi}\cot\theta\left(-\frac{\partial^2}{\partial \theta \partial \phi} + i\cot\theta\frac{\partial^2}{\partial \phi^2}\right)\right] \\
&= -\hbar^2\left(\frac{\partial^2}{\partial \theta^2} + \cot\theta\frac{\partial}{\partial \theta} + i\frac{\partial}{\partial \phi} + \cot^2\theta\frac{\partial^2}{\partial \phi^2}\right). \tag{3.29}
\end{aligned}
\]

In the above calculation procedure, we used the differentiation of a product function. For instance, we have

\[
\frac{\partial}{\partial \theta}\left(i\cot\theta\frac{\partial}{\partial \phi}\right) = i\frac{\partial \cot\theta}{\partial \theta}\frac{\partial}{\partial \phi} + i\cot\theta\frac{\partial^2}{\partial \theta \partial \phi} = i\left(-\frac{1}{\sin^2\theta}\frac{\partial}{\partial \phi} + \cot\theta\frac{\partial^2}{\partial \theta \partial \phi}\right).
\]

Note also that ∂²/∂θ∂ϕ = ∂²/∂ϕ∂θ. This is because we are dealing with continuous and differentiable functions. Meanwhile, we have the following commutation relations:

\[
[L_x, L_y] = i\hbar L_z, \quad [L_y, L_z] = i\hbar L_x, \quad [L_z, L_x] = i\hbar L_y. \tag{3.30}
\]

This can easily be confirmed by requiring canonical commutation relations. The derivation can routinely be performed, but we show it because the procedures include several important points. For instance, we have

\[
\begin{aligned}
[L_x, L_y] &= L_x L_y - L_y L_x = (y p_z - z p_y)(z p_x - x p_z) - (z p_x - x p_z)(y p_z - z p_y) \\
&= y p_z z p_x - y p_z x p_z - z p_y z p_x + z p_y x p_z - z p_x y p_z + z p_x z p_y + x p_z y p_z - x p_z z p_y \\
&= (y p_x p_z z - z p_x y p_z) + (z p_y x p_z - x p_z z p_y) + (x p_z y p_z - y p_z x p_z) + (z p_x z p_y - z p_y z p_x) \\
&= -y p_x (z p_z - p_z z) + x p_y (z p_z - p_z z) = i\hbar (x p_y - y p_x) = i\hbar L_z.
\end{aligned}
\]

In the above calculations, we used the canonical commutation relation as well as the commutability of, e.g., y and px; y and z; px and py. For example, we get

\[
[p_x, p_y]\,|\psi\rangle = -\hbar^2\left(\frac{\partial}{\partial x}\frac{\partial}{\partial y} - \frac{\partial}{\partial y}\frac{\partial}{\partial x}\right)|\psi\rangle = -\hbar^2\left(\frac{\partial^2 |\psi\rangle}{\partial x \partial y} - \frac{\partial^2 |\psi\rangle}{\partial y \partial x}\right) = 0.
\]

In the above equation, we assumed that the order of differentiation with respect to x and y can be switched. This is because we are dealing with continuous and differentiable functions. Thus, px and py commute. For other important commutation relations, we have

\[
[L_x, L^2] = 0, \quad [L_y, L^2] = 0, \quad [L_z, L^2] = 0. \tag{3.31}
\]
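The first relation of (3.30) can also be cross-checked symbolically with the differential forms (3.24)-(3.26). A minimal sketch (not part of the original text; sympy assumed), setting ħ = 1 for brevity:

```python
import sympy as sp

theta, phi = sp.symbols('theta phi', positive=True)
f = sp.Function('f')(theta, phi)

# Differential forms of the angular momentum components with hbar = 1,
# Eqs. (3.24)-(3.26)
def Lx(g):
    return sp.I*(sp.sin(phi)*sp.diff(g, theta)
                 + sp.cot(theta)*sp.cos(phi)*sp.diff(g, phi))

def Ly(g):
    return -sp.I*(sp.cos(phi)*sp.diff(g, theta)
                  - sp.cot(theta)*sp.sin(phi)*sp.diff(g, phi))

def Lz(g):
    return -sp.I*sp.diff(g, phi)

# [Lx, Ly] f - i Lz f should vanish, the first relation of (3.30)
comm = sp.simplify(Lx(Ly(f)) - Ly(Lx(f)) - sp.I*Lz(f))
print(comm)  # 0
```

The mixed second derivatives cancel, and the remaining first-derivative coefficient reduces via cot²θ - 1/sin²θ = -1.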

For the derivation, use

\[
[A, B + C] = [A, B] + [A, C].
\]

The derivation is straightforward and is left for readers. The relations (3.30) and (3.31) imply that a simultaneous eigenstate exists for L² and one of Lx, Ly, and Lz. This is because L² commutes with them from (3.31), whereas Lz does not commute with Lx or Ly. The detailed argument about the simultaneous eigenstate can be seen in Part III. Thus, we have

\[
L^{(+)}L^{(-)} = L_x{}^2 + L_y{}^2 + i\left(L_y L_x - L_x L_y\right) = L_x{}^2 + L_y{}^2 + i\left[L_y, L_x\right] = L_x{}^2 + L_y{}^2 + \hbar L_z.
\]

Notice here that [Ly, Lx] = -[Lx, Ly] = -iħLz. Hence,

\[
L^2 = L^{(+)}L^{(-)} + L_z{}^2 - \hbar L_z. \tag{3.32}
\]

From (3.24), we have

\[
L_z{}^2 = -\hbar^2\frac{\partial^2}{\partial \phi^2}. \tag{3.33}
\]

Finally, we get

\[
L^2 = -\hbar^2\left(\frac{\partial^2}{\partial \theta^2} + \cot\theta\frac{\partial}{\partial \theta} + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial \phi^2}\right)
\]

or

\[
L^2 = -\hbar^2\left[\frac{1}{\sin\theta}\frac{\partial}{\partial \theta}\left(\sin\theta\frac{\partial}{\partial \theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial \phi^2}\right]. \tag{3.34}
\]

Replacing L² in (3.15) with that of (3.34), we have

\[
H = -\frac{\hbar^2}{2\mu r^2}\left[\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + \frac{1}{\sin\theta}\frac{\partial}{\partial \theta}\left(\sin\theta\frac{\partial}{\partial \theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial \phi^2}\right] - \frac{Ze^2}{4\pi\varepsilon_0 r}. \tag{3.35}
\]

Thus, the Schrödinger equation of (3.3) takes the following form:

\[
\left\{-\frac{\hbar^2}{2\mu r^2}\left[\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + \frac{1}{\sin\theta}\frac{\partial}{\partial \theta}\left(\sin\theta\frac{\partial}{\partial \theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial \phi^2}\right] - \frac{Ze^2}{4\pi\varepsilon_0 r}\right\}\psi = E\psi. \tag{3.36}
\]
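As a quick sanity check of the differential operator appearing in (3.35) and (3.36), one can apply its bracketed part, divided by r², to simple functions whose Cartesian Laplacians are known: ∇²(xy) = 0 and ∇²(x² + y² + z²) = 6. A supplementary sketch (not part of the original text; sympy assumed):

```python
import sympy as sp

r, theta, phi = sp.symbols('r theta phi', positive=True)

def laplacian_spherical(f):
    # The bracketed operator of Eq. (3.36) divided by r^2,
    # i.e., the Laplacian in spherical coordinates
    return (sp.diff(r**2*sp.diff(f, r), r)
            + sp.diff(sp.sin(theta)*sp.diff(f, theta), theta)/sp.sin(theta)
            + sp.diff(f, phi, 2)/sp.sin(theta)**2)/r**2

# x*y written in spherical coordinates via Eq. (3.17); its Laplacian is 0
xy = r**2*sp.sin(theta)**2*sp.sin(phi)*sp.cos(phi)
lap_xy = sp.simplify(laplacian_spherical(xy))
print(lap_xy)  # 0

# r^2 = x^2 + y^2 + z^2; its Laplacian is 6
lap_r2 = sp.simplify(laplacian_spherical(r**2))
print(lap_r2)  # 6
```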

3.3 Separation of Variables

If the potential is spherically symmetric (e.g., a Coulomb potential), it is well-known that the Schrödinger equations of (3.1)-(3.3) can be solved by the method of separation of variables. More specifically, (3.36) can be separated into two differential equations, one of which depends only on the radial component r and the other of which depends only upon the angular components θ and ϕ. To apply the method of separation of variables to (3.36), let us first return to (3.15). Considering that L² is expressed as (3.34), we assume that L² has eigenvalues γ (at any rate if any) and takes eigenfunctions Y(θ, ϕ) (again, if any as well) corresponding to γ. That is,

\[
L^2 Y(\theta, \phi) = \gamma Y(\theta, \phi), \tag{3.37}
\]

where Y(θ, ϕ) is assumed to be normalized. Meanwhile,

\[
L^2 = L_x{}^2 + L_y{}^2 + L_z{}^2. \tag{3.38}
\]

From (3.6), we have

\[
L_x{}^\dagger = \left(y p_z - z p_y\right)^\dagger = p_z{}^\dagger y^\dagger - p_y{}^\dagger z^\dagger = p_z y - p_y z = y p_z - z p_y = L_x. \tag{3.39}
\]

Note that pz and y commute, and so do py and z. Therefore, Lx is Hermitian, and so is Lx². More generally, if an operator A is Hermitian, so is Aⁿ (n: a positive integer); readers, please show it. Likewise, Ly and Lz are Hermitian as well. Thus, L² is Hermitian, too. Next, we consider an expectation value of L², i.e., ⟨L²⟩. Let |ψ⟩ be an arbitrary normalized non-zero vector (or function). Then,

\[
\begin{aligned}
\langle L^2 \rangle \equiv \langle \psi | L^2 \psi \rangle &= \langle \psi | L_x{}^2 \psi \rangle + \langle \psi | L_y{}^2 \psi \rangle + \langle \psi | L_z{}^2 \psi \rangle \\
&= \langle L_x{}^\dagger \psi | L_x \psi \rangle + \langle L_y{}^\dagger \psi | L_y \psi \rangle + \langle L_z{}^\dagger \psi | L_z \psi \rangle \\
&= \langle L_x \psi | L_x \psi \rangle + \langle L_y \psi | L_y \psi \rangle + \langle L_z \psi | L_z \psi \rangle \\
&= \|L_x \psi\|^2 + \|L_y \psi\|^2 + \|L_z \psi\|^2 \ge 0. \tag{3.40}
\end{aligned}
\]

Notice that the second to last equality comes from the fact that Lx, Ly, and Lz are Hermitian. An operator that satisfies (3.40) is said to be non-negative (see Sects. 1.4, 2.2, etc., where we saw the calculation routines). Note also that in (3.40) the equality holds only when the following relations hold:

\[
|L_x \psi\rangle = |L_y \psi\rangle = |L_z \psi\rangle = 0. \tag{3.41}
\]

On this condition, we have

\[
|L^2 \psi\rangle = |\left(L_x{}^2 + L_y{}^2 + L_z{}^2\right)\psi\rangle = |L_x{}^2 \psi\rangle + |L_y{}^2 \psi\rangle + |L_z{}^2 \psi\rangle = L_x |L_x \psi\rangle + L_y |L_y \psi\rangle + L_z |L_z \psi\rangle = 0. \tag{3.42}
\]

The eigenfunction that satisfies (3.42) and the next relation (3.43) is a simultaneous eigenstate of Lx, Ly, Lz, and L². This could seem to contradict the fact that Lz does not commute with Lx or Ly. However, this is an exceptional case. Let |ψ₀⟩ be the eigenfunction that satisfies both (3.41) and (3.42). Then, we have

\[
|L_x \psi_0\rangle = |L_y \psi_0\rangle = |L_z \psi_0\rangle = |L^2 \psi_0\rangle = 0. \tag{3.43}
\]


As can be seen from (3.24) to (3.26) along with (3.34), the operators Lx, Ly, Lz, and L² are differential operators. Therefore, (3.43) implies that |ψ₀⟩ is a constant. We will come back to this point later. In spite of this exceptional situation, it is impossible that all of Lx, Ly, and Lz as well as L² take a whole set of eigenfunctions as simultaneous eigenstates. We briefly show this as follows. In Chap. 2, we mentioned that if [A, B] = ik, no physical state can be an eigenstate of A or B. The situation is different, on the other hand, if we have the following case:

\[
[A, B] = iC, \tag{3.44}
\]

where A, B, and C are Hermitian operators. The relation (3.30) is a typical example of this. If C|ψ⟩ = 0 in (3.44), |ψ⟩ might well be an eigenstate of A and/or B. However, if C|ψ⟩ = c|ψ⟩ (c ≠ 0), |ψ⟩ cannot be an eigenstate of A or B. This can readily be shown in a fashion similar to that described in Sect. 2.5. Let us think of, e.g., [Lx, Ly] = iħLz. Suppose that for some ψ₀ we have Lz|ψ₀⟩ = 0. Taking an inner product using |ψ₀⟩, from (3.30) we have

\[
\langle \psi_0 | \left(L_x L_y - L_y L_x\right) \psi_0 \rangle = 0.
\]

In this case, moreover, even if we have |Lxψ₀⟩ = 0 and |Lyψ₀⟩ = 0, we have no inconsistency. If, on the other hand, Lz|ψ⟩ = m|ψ⟩ (m ≠ 0), |ψ⟩ cannot be an eigenstate of Lx or Ly, as mentioned above. Thus, we should be careful in dealing with a general situation where we have [A, B] = iC. In the case where [A, B] = 0, i.e., AB = BA, namely A and B commute, we have a different situation. This relation is equivalent to saying that the operator AB - BA has an eigenvalue zero for any physical state |ψ⟩. Yet, this statement is of less practical use. Again, regarding details we wish to make a discussion in Sect. 14.6 of Part III. Returning to (3.40), let us replace ψ with a particular eigenfunction Y(θ, ϕ). Then, we have

\[
\langle Y | L^2 Y \rangle = \langle Y | \gamma Y \rangle = \gamma \langle Y | Y \rangle = \gamma \ge 0. \tag{3.45}
\]

Again, if L² has an eigenvalue, the eigenvalue should be non-negative. Taking account of the coefficient ħ² in (3.34), it is convenient to put

\[
\gamma = \hbar^2 \lambda \quad (\lambda \ge 0). \tag{3.46}
\]

On the grounds that the solution of (3.36) can be described as

\[
\psi(r, \theta, \phi) = R(r) Y(\theta, \phi), \tag{3.47}
\]

the Schrödinger equation (3.16) can be rewritten as

\[
\left[-\frac{\hbar^2}{2\mu}\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + \frac{L^2}{2\mu r^2} - \frac{Ze^2}{4\pi\varepsilon_0 r}\right] R(r) Y(\theta, \phi) = E R(r) Y(\theta, \phi). \tag{3.48}
\]

That is,

\[
-\frac{\hbar^2}{2\mu}\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial R(r)}{\partial r}\right) Y(\theta, \phi) + \frac{L^2 Y(\theta, \phi)}{2\mu r^2} R(r) - \frac{Ze^2}{4\pi\varepsilon_0 r} R(r) Y(\theta, \phi) = E R(r) Y(\theta, \phi). \tag{3.49}
\]

Recalling (3.37) and (3.46), we have

\[
-\frac{\hbar^2}{2\mu}\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial R(r)}{\partial r}\right) Y(\theta, \phi) + \frac{\hbar^2 \lambda Y(\theta, \phi)}{2\mu r^2} R(r) - \frac{Ze^2}{4\pi\varepsilon_0 r} R(r) Y(\theta, \phi) = E R(r) Y(\theta, \phi). \tag{3.50}
\]

Dividing both sides by Y(θ, ϕ), we get a SOLDE for the radial component:

\[
-\frac{\hbar^2}{2\mu}\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial R(r)}{\partial r}\right) + \frac{\hbar^2 \lambda}{2\mu r^2} R(r) - \frac{Ze^2}{4\pi\varepsilon_0 r} R(r) = E R(r). \tag{3.51}
\]

Regarding the angular components θ and ϕ, using (3.34), (3.37), and (3.46), we have

\[
L^2 Y(\theta, \phi) = -\hbar^2\left[\frac{1}{\sin\theta}\frac{\partial}{\partial \theta}\left(\sin\theta\frac{\partial}{\partial \theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial \phi^2}\right] Y(\theta, \phi) = \hbar^2 \lambda Y(\theta, \phi). \tag{3.52}
\]

Dividing both sides by ħ², we get

\[
-\left[\frac{1}{\sin\theta}\frac{\partial}{\partial \theta}\left(\sin\theta\frac{\partial}{\partial \theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial \phi^2}\right] Y(\theta, \phi) = \lambda Y(\theta, \phi). \tag{3.53}
\]
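Anticipating the result of Sects. 3.4 and 3.5 that the eigenvalues in (3.53) are λ = l(l + 1) with spherical harmonics as eigenfunctions, the angular equation can be checked directly for low orders. A supplementary sketch (not part of the original text; sympy and its built-in spherical harmonics `Ynm` are assumed; only m ≥ 0 is sampled):

```python
import sympy as sp

theta, phi = sp.symbols('theta phi', positive=True)

def angular_operator(Y):
    # LHS operator of Eq. (3.53)
    return -(sp.diff(sp.sin(theta)*sp.diff(Y, theta), theta)/sp.sin(theta)
             + sp.diff(Y, phi, 2)/sp.sin(theta)**2)

residuals = []
for l in range(3):
    for m in range(l + 1):
        Y = sp.Ynm(l, m, theta, phi).expand(func=True)  # explicit form
        residuals.append(sp.simplify(angular_operator(Y) - l*(l + 1)*Y))
print(residuals)  # all entries are 0
```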

Notice in (3.53) that the angular part of the SOLDE does not depend on a specific form of the potential. Now, we further assume that (3.53) can be separated into a zenithal angle part θ and an azimuthal angle part ϕ such that

\[
Y(\theta, \phi) = \Theta(\theta) \Phi(\phi). \tag{3.54}
\]

Then we have

\[
-\frac{1}{\sin\theta}\frac{\partial}{\partial \theta}\left(\sin\theta\frac{\partial \Theta(\theta)}{\partial \theta}\right)\Phi(\phi) - \frac{1}{\sin^2\theta}\frac{\partial^2 \Phi(\phi)}{\partial \phi^2}\Theta(\theta) = \lambda \Theta(\theta) \Phi(\phi). \tag{3.55}
\]

Multiplying both sides by sin²θ/Θ(θ)Φ(ϕ) and rearranging both sides, we get

\[
-\frac{1}{\Phi(\phi)}\frac{\partial^2 \Phi(\phi)}{\partial \phi^2} = \frac{\sin^2\theta}{\Theta(\theta)}\left[\frac{1}{\sin\theta}\frac{\partial}{\partial \theta}\left(\sin\theta\frac{\partial \Theta(\theta)}{\partial \theta}\right) + \lambda \Theta(\theta)\right]. \tag{3.56}
\]

Since the LHS of (3.56) depends only upon ϕ and the RHS depends only on θ, we must have

\[
\text{LHS of (3.56)} = \text{RHS of (3.56)} = \eta \ (\text{constant}). \tag{3.57}
\]

Thus, we have the following relation for the LHS of (3.56):

\[
-\frac{1}{\Phi(\phi)}\frac{d^2 \Phi(\phi)}{d\phi^2} = \eta. \tag{3.58}
\]

Putting

\[
D \equiv -\frac{d^2}{d\phi^2},
\]

we get

\[
D \Phi(\phi) = \eta \Phi(\phi). \tag{3.59}
\]

The SOLDEs of (3.58) and (3.59) are formally the same as (1.61) of Sect. 1.3, where the boundary conditions (BCs) were Dirichlet conditions. Unlike (1.61), however, we have to consider different BCs, i.e., the periodic BCs. As in Example 1.1, we adopt two linearly independent solutions. That is, we have

\[
e^{im\phi} \quad \text{and} \quad e^{-im\phi} \ (m \ne 0).
\]

As their linear combination, we have

\[
\Phi(\phi) = a e^{im\phi} + b e^{-im\phi}. \tag{3.60}
\]

As BCs, we consider Φ(0) = Φ(2π) and Φ′(0) = Φ′(2π); i.e., we have

\[
a + b = a e^{i2\pi m} + b e^{-i2\pi m}. \tag{3.61}
\]

Meanwhile, we have

\[
\Phi'(\phi) = aim e^{im\phi} - bim e^{-im\phi}. \tag{3.62}
\]

Therefore, from the BCs we have

\[
aim - bim = aim\, e^{i2\pi m} - bim\, e^{-i2\pi m}.
\]

Then,

\[
a - b = a e^{i2\pi m} - b e^{-i2\pi m}. \tag{3.63}
\]

From (3.61) and (3.63), we have

\[
2a\left(1 - e^{i2\pi m}\right) = 0 \quad \text{and} \quad 2b\left(1 - e^{-i2\pi m}\right) = 0.
\]

If a ≠ 0, we must have m = 0, ±1, ±2, ⋯. If a = 0, we must have b ≠ 0 to avoid having Φ(ϕ) ≡ 0 as a solution. In that case, we have m = 0, ±1, ±2, ⋯ as well. Thus, it suffices to put

\[
\Phi(\phi) = c e^{im\phi} \quad (m = 0, \pm 1, \pm 2, \cdots).
\]

Therefore, as a normalized function Φ(ϕ), we get

\[
\Phi(\phi) = \frac{1}{\sqrt{2\pi}} e^{im\phi} \quad (m = 0, \pm 1, \pm 2, \cdots). \tag{3.64}
\]
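The normalization and the periodic BCs behind (3.61)-(3.64) can be confirmed in a few lines. A supplementary sketch (not part of the original text; sympy assumed), sampling a few integer values of m and one half-odd value:

```python
import sympy as sp

phi = sp.symbols('phi', real=True)

norms, jumps = [], []
for m in range(-2, 3):
    Phi = sp.exp(sp.I*m*phi)/sp.sqrt(2*sp.pi)          # Eq. (3.64)
    # Normalization over one period
    norms.append(sp.integrate(Phi*sp.conjugate(Phi), (phi, 0, 2*sp.pi)))
    # Periodic BC: Phi(2*pi) - Phi(0) vanishes for integer m
    jumps.append(sp.simplify(Phi.subs(phi, 2*sp.pi) - Phi.subs(phi, 0)))
print(norms, jumps)  # norms are all 1, jumps are all 0

# A half-odd-integer m would violate the periodic BC:
print(sp.exp(sp.I*phi/2).subs(phi, 2*sp.pi))  # -1
```

The last line illustrates why half-odd-integer m is excluded for the orbital case, a point taken up again in Sect. 3.5.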

Inserting it into (3.58), we have

\[
m^2 e^{im\phi} = \eta\, e^{im\phi}.
\]

Therefore, we get

\[
\eta = m^2 \quad (m = 0, \pm 1, \pm 2, \cdots). \tag{3.65}
\]

From (3.56) and (3.65), we have

\[
-\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta(\theta)}{d\theta}\right) + \frac{m^2 \Theta(\theta)}{\sin^2\theta} = \lambda \Theta(\theta) \quad (m = 0, \pm 1, \pm 2, \cdots). \tag{3.66}
\]

In (3.64), putting m = 0 as an eigenvalue, we have Φ(ϕ) = 1/√(2π) as a corresponding eigenfunction. Unlike Examples 1.1 and 1.2, this reflects the fact that the differential operator -d²/dϕ² accompanied by the periodic BCs is a non-negative operator that allows an eigenvalue of zero. Yet, we are uncertain of the range of m. To clarify this point, we consider generalized angular momentum in the next section.

3.4 Generalized Angular Momentum

We obtained the commutation relations (3.30) among the individual angular momentum components Lx, Ly, and Lz. Conversely, we may start with (3.30) to define angular momentum. Such a quantity is called generalized angular momentum. Let J be a generalized angular momentum, as in the case of (3.4), such that

\[
\mathbf{J} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} J_x \\ J_y \\ J_z \end{pmatrix}. \tag{3.67}
\]

For the sake of simple notation, let us redefine J as follows so that we can eliminate ħ and deal with dimensionless quantities in the present discussion:

\[
\mathbf{J} \equiv \mathbf{J}/\hbar = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} J_x/\hbar \\ J_y/\hbar \\ J_z/\hbar \end{pmatrix} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} J_x \\ J_y \\ J_z \end{pmatrix}, \tag{3.68}
\]

\[
\mathbf{J}^2 = J_x{}^2 + J_y{}^2 + J_z{}^2.
\]

Then, we require the following commutation relations:

\[
[J_x, J_y] = i J_z, \quad [J_y, J_z] = i J_x, \quad [J_z, J_x] = i J_y. \tag{3.69}
\]

Also, we require Jx, Jy, and Jz to be Hermitian. The operator J² is Hermitian accordingly. The relations (3.69) lead to

\[
[J_x, \mathbf{J}^2] = 0, \quad [J_y, \mathbf{J}^2] = 0, \quad [J_z, \mathbf{J}^2] = 0. \tag{3.70}
\]

This can be confirmed as in the case of (3.30). As noted above, again a simultaneous eigenstate exists for J² and one of Jx, Jy, and Jz. According to convention, we choose J² and Jz for the simultaneous eigenstate. Then, designating the eigenstate by |ζ, μ⟩, we have

\[
\mathbf{J}^2 |\zeta, \mu\rangle = \zeta |\zeta, \mu\rangle \quad \text{and} \quad J_z |\zeta, \mu\rangle = \mu |\zeta, \mu\rangle. \tag{3.71}
\]

The implication of (3.71) is that |ζ, μ⟩ is the simultaneous eigenstate, that μ is the eigenvalue of Jz to which |ζ, μ⟩ belongs, and that ζ is the eigenvalue of J² to which |ζ, μ⟩ belongs as well. Since Jz and J² are Hermitian, both μ and ζ are real (see Sect. 1.4). Of these, ζ ≥ 0 as in the case of (3.45). We define the following operators J(+) and J(-) as in the case of (3.27):

\[
J^{(+)} \equiv J_x + i J_y \quad \text{and} \quad J^{(-)} \equiv J_x - i J_y. \tag{3.72}
\]

Then, from (3.69) and (3.70), we get

\[
[J^{(+)}, \mathbf{J}^2] = [J^{(-)}, \mathbf{J}^2] = 0. \tag{3.73}
\]

Also, we obtain the following commutation relations:

\[
[J_z, J^{(+)}] = J^{(+)}, \quad [J_z, J^{(-)}] = -J^{(-)}, \quad [J^{(+)}, J^{(-)}] = 2 J_z. \tag{3.74}
\]

From (3.70) to (3.72), we get

\[
\mathbf{J}^2 J^{(+)} |\zeta, \mu\rangle = J^{(+)} \mathbf{J}^2 |\zeta, \mu\rangle = \zeta J^{(+)} |\zeta, \mu\rangle, \quad \mathbf{J}^2 J^{(-)} |\zeta, \mu\rangle = J^{(-)} \mathbf{J}^2 |\zeta, \mu\rangle = \zeta J^{(-)} |\zeta, \mu\rangle. \tag{3.75}
\]

Equation (3.75) indicates that both J(+)|ζ, μ⟩ and J(-)|ζ, μ⟩ are eigenvectors of J² that correspond to an eigenvalue ζ. Meanwhile, from (3.74) we get

\[
J_z J^{(+)} |\zeta, \mu\rangle = J^{(+)} (J_z + 1) |\zeta, \mu\rangle = (\mu + 1) J^{(+)} |\zeta, \mu\rangle, \quad J_z J^{(-)} |\zeta, \mu\rangle = J^{(-)} (J_z - 1) |\zeta, \mu\rangle = (\mu - 1) J^{(-)} |\zeta, \mu\rangle. \tag{3.76}
\]

The relation (3.76) means that J(+)|ζ, μ⟩ is an eigenvector of Jz corresponding to an eigenvalue (μ + 1), while J(-)|ζ, μ⟩ is an eigenvector of Jz corresponding to an eigenvalue (μ - 1). This implies that J(+) and J(-) function as raising and lowering operators (or ladder operators) such as those that have been introduced in this chapter. Thus, using undetermined constants (or phase factors) aμ(+) and aμ(-), we describe

\[
J^{(+)} |\zeta, \mu\rangle = a_\mu{}^{(+)} |\zeta, \mu + 1\rangle \quad \text{and} \quad J^{(-)} |\zeta, \mu\rangle = a_\mu{}^{(-)} |\zeta, \mu - 1\rangle. \tag{3.77}
\]

Next, let us characterize the eigenvalues μ. We have

\[
J_x{}^2 + J_y{}^2 = \mathbf{J}^2 - J_z{}^2. \tag{3.78}
\]

Therefore,

\[
\left(J_x{}^2 + J_y{}^2\right) |\zeta, \mu\rangle = \left(\mathbf{J}^2 - J_z{}^2\right) |\zeta, \mu\rangle = \left(\zeta - \mu^2\right) |\zeta, \mu\rangle. \tag{3.79}
\]

Since (Jx² + Jy²) is a non-negative operator, its eigenvalues are non-negative as well, as can be seen from (3.40) and (3.45). Then, we have

\[
\zeta - \mu^2 \ge 0. \tag{3.80}
\]

Thus, for a fixed value of non-negative ζ, μ is bounded both upwards and downwards. We then define the maximum of μ as j and the minimum of μ as j′. Consequently, on the basis of (3.77), we have

\[
J^{(+)} |\zeta, j\rangle = 0 \quad \text{and} \quad J^{(-)} |\zeta, j'\rangle = 0. \tag{3.81}
\]

This is because we have no quantum state corresponding to |ζ, j + 1⟩ or |ζ, j′ - 1⟩. From (3.75) and (3.81), the possible values of μ are

\[
j, \ j - 1, \ j - 2, \ \cdots, \ j'. \tag{3.82}
\]

From (3.69) and (3.72), we get

\[
J^{(-)} J^{(+)} = \mathbf{J}^2 - J_z{}^2 - J_z, \quad J^{(+)} J^{(-)} = \mathbf{J}^2 - J_z{}^2 + J_z. \tag{3.83}
\]

Operating these operators on |ζ, j⟩ or |ζ, j′⟩ and using (3.81), we get

\[
\begin{aligned}
J^{(-)} J^{(+)} |\zeta, j\rangle &= \left(\mathbf{J}^2 - J_z{}^2 - J_z\right) |\zeta, j\rangle = \left(\zeta - j^2 - j\right) |\zeta, j\rangle = 0, \\
J^{(+)} J^{(-)} |\zeta, j'\rangle &= \left(\mathbf{J}^2 - J_z{}^2 + J_z\right) |\zeta, j'\rangle = \left(\zeta - j'^2 + j'\right) |\zeta, j'\rangle = 0. \tag{3.84}
\end{aligned}
\]

Since |ζ, j⟩ ≠ 0 and |ζ, j′⟩ ≠ 0, we have

\[
\zeta - j^2 - j = \zeta - j'^2 + j' = 0. \tag{3.85}
\]

This means that

\[
\zeta = j(j + 1) = j'(j' - 1). \tag{3.86}
\]

Moreover, from (3.86) we get

\[
j(j + 1) - j'(j' - 1) = (j + j')(j - j' + 1) = 0. \tag{3.87}
\]

As j ≥ j′, j - j′ + 1 > 0. From (3.87), therefore, we get

\[
j + j' = 0 \quad \text{or} \quad j = -j'. \tag{3.88}
\]

Then, we conclude that the minimum of μ is -j. Accordingly, the possible values of μ are

\[
\mu = j, \ j - 1, \ j - 2, \ \cdots, \ -j + 1, \ -j. \tag{3.89}
\]

That is, the number of values μ can take is (2j + 1). The relation (3.89) implies that, taking a positive integer k,

\[
j - k = -j \quad \text{or} \quad j = k/2. \tag{3.90}
\]

In other words, j is permitted to take a number zero, a positive integer, or a positive half-integer (or more precisely, half-odd-integer). For instance, if j = 1/2, μ can be 1/2 or -1/2. When j = 1, μ can be 1, 0, or -1. Finally, we have to determine the undetermined constants aμ(+) and aμ(-). To this end, multiplying ⟨ζ, μ - 1| on both sides of the second equation of (3.77) from the left, we have

ð3:91Þ

where the second equality comes from that jζ, μ - 1i has been normalized; i.e., ||| ζ, μ - 1i|| = 1. Meanwhile, taking adjoint of both sides of the first equation of (3.77), we have hζ, μ j J ðþÞ

{



= aμ ðþÞ hζ, μ þ 1 j :

ð3:92Þ

However, from (3.72) and the fact that Jx and Jy are Hermitian, J ð þÞ

{

= J ð- Þ :

ð3:93Þ

Using (3.93) and replacing μ in (3.92) with μ - 1, we get 

hζ, μ - 1 j J ð- Þ = aμ - 1 ðþÞ hζ, μ j :

ð3:94Þ

Furthermore, multiplying jζ, μi on (3.94) from the right, we have 



ζ, μ - 1jJ ð- Þ jζ, μ = aμ - 1 ðþÞ hζ, μjζ, μi = aμ - 1 ðþÞ ,

ð3:95Þ

where again jζ, μi is assumed to be normalized. Comparing (3.91) and (3.95), we get 

aμ ð- Þ = aμ - 1 ðþÞ :

ð3:96Þ

Taking an inner product regarding the first equation of (3.77) and its adjoint,

3.5

Orbital Angular Momentum: Operator Approach 

79 2

ζ, μjJ ð- Þ J ðþÞ jζ, μ = aμ ðþÞ aμ ðþÞ hζ, μ þ 1jζ, μ þ 1i = aμ ðþÞ :

ð3:97Þ

Once again, the second equality of (3.97) results from the normalization of the vector. Using (3.83) as well as (3.71) and (3.86), (3.97) can be rewritten as ζ, μjJ 2 - J z 2 - J z jζ, μ 2

= ζ, μjjðj þ 1Þ - μ2 - μjζ, μ = hζ, μjζ, μiðj - μÞðj þ μ þ 1Þ = aμ ðþÞ : ð3:98Þ Thus, we get aμ ðþÞ = eiδ

ðj - μÞðj þ μ þ 1Þ ðδ : an arbitrary real numberÞ,

ð3:99Þ

where eiδ is a phase factor. From (3.96) we also get aμ ð- Þ = e - iδ

ðj - μ þ 1Þðj þ μÞ:

ð3:100Þ

In (3.99) and (3.100), we routinely put δ = 0 so that aμ(+) and aμ(-) can be positive numbers. Explicitly rewriting (3.77), we get J ðþÞ j ζ, μi = J

ð- Þ

j ζ, μi =

ðj - μÞðj þ μ þ 1Þ j ζ, μ þ 1i, ðj - μ þ 1Þðj þ μÞ j ζ, μ - 1i,

ð3:101Þ

where j is a fixed given number chosen from among zero, positive integers, and positive half-integers (or half-odd-integers). As discussed above, we have derived various properties and relations with respect to the generalized angular momentum on the basis of (1) the relation (3.30) or (3.69) and (2) the fact that Jx, Jy, and Jz are Hermitian operators. This notion is very useful in dealing with various angular momenta of different origins (e.g., orbital angular momentum, spin angular momentum, etc.) from a unified point of view. In Chap. 20, we will revisit this issue in more detail.
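The results (3.89) and (3.101) can be made concrete by building matrix representations of Jz and J(±) in the (2j + 1)-dimensional basis {|ζ, μ⟩} and checking the requirements (3.69) numerically. A supplementary sketch (not part of the original text; numpy assumed):

```python
import numpy as np

def angular_momentum_matrices(j):
    """Jz, Jx, Jy in the basis |zeta, mu>, mu = j, j-1, ..., -j (hbar = 1)."""
    dim = int(round(2*j + 1))
    mu = j - np.arange(dim)                  # Jz eigenvalues, Eq. (3.89)
    Jz = np.diag(mu).astype(complex)
    Jp = np.zeros((dim, dim), dtype=complex)
    for k in range(1, dim):
        m = mu[k]                            # J(+)|zeta,m> coefficient, Eq. (3.101)
        Jp[k - 1, k] = np.sqrt((j - m)*(j + m + 1))
    Jm = Jp.conj().T                         # J(-) = (J(+))†, Eq. (3.93)
    return Jz, (Jp + Jm)/2, (Jp - Jm)/(2*1j)

for j in (0.5, 1, 1.5):
    Jz, Jx, Jy = angular_momentum_matrices(j)
    assert np.allclose(Jx @ Jy - Jy @ Jx, 1j*Jz)       # Eq. (3.69)
    J2 = Jx @ Jx + Jy @ Jy + Jz @ Jz
    assert np.allclose(J2, j*(j + 1)*np.eye(len(Jz)))  # zeta = j(j+1), Eq. (3.86)
print("(3.69) and zeta = j(j+1) hold for j = 1/2, 1, 3/2")
```

Note that the half-odd-integer values of j, excluded for orbital angular momentum in Sect. 3.5, are perfectly admissible at this purely algebraic level.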

3.5 Orbital Angular Momentum: Operator Approach

In Sect. 3.4 we derived various important results on angular momenta on the basis of the commutation relations (3.69) and the assumption that Jx, Jy, and Jz are Hermitian. Now, let us return to the discussion of the orbital angular momenta we dealt with in Sects. 3.2 and 3.3. First, we treat the orbital angular momenta via the operator approach. This approach enables us to understand why the quantity j introduced in

Sect. 3.4 takes a value of zero or positive integers for the orbital angular momenta. In the next section (Sect. 3.6) we will deal with the related issues by an analytical method. In (3.28) we introduced the differential operators L(+) and L(-). Following Sect. 3.4, we define the following operators to eliminate ħ so that we can deal with dimensionless quantities:

\[
\mathbf{M} \equiv \mathbf{L}/\hbar = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} M_x \\ M_y \\ M_z \end{pmatrix}, \tag{3.102}
\]

\[
\mathbf{M}^2 = L^2/\hbar^2 = M_x{}^2 + M_y{}^2 + M_z{}^2.
\]

Hence, we have

\[
M_x = L_x/\hbar, \quad M_y = L_y/\hbar, \quad M_z = L_z/\hbar. \tag{3.103}
\]

Moreover, we define the following operators [3]:

\[
M^{(+)} \equiv M_x + i M_y = L^{(+)}/\hbar = e^{i\phi}\left(\frac{\partial}{\partial \theta} + i\cot\theta\frac{\partial}{\partial \phi}\right), \tag{3.104}
\]

\[
M^{(-)} \equiv M_x - i M_y = L^{(-)}/\hbar = e^{-i\phi}\left(-\frac{\partial}{\partial \theta} + i\cot\theta\frac{\partial}{\partial \phi}\right). \tag{3.105}
\]

Then we have

\[
\mathbf{M}^2 = -\left[\frac{1}{\sin\theta}\frac{\partial}{\partial \theta}\left(\sin\theta\frac{\partial}{\partial \theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial \phi^2}\right]. \tag{3.106}
\]

Here we execute a variable transformation such that

\[
\xi = \cos\theta \ (0 \le \theta \le \pi) \quad \text{or} \quad \sin\theta = \sqrt{1 - \xi^2}. \tag{3.107}
\]

Noting, e.g., that

\[
\frac{\partial}{\partial \theta} = \frac{\partial \xi}{\partial \theta}\frac{\partial}{\partial \xi} = -\sin\theta\frac{\partial}{\partial \xi} = -\sqrt{1 - \xi^2}\,\frac{\partial}{\partial \xi}, \quad \sin\theta\frac{\partial}{\partial \theta} = -\sin^2\theta\frac{\partial}{\partial \xi} = -\left(1 - \xi^2\right)\frac{\partial}{\partial \xi}, \tag{3.108}
\]

we get

\[
M^{(+)} = e^{i\phi}\left(-\sqrt{1 - \xi^2}\,\frac{\partial}{\partial \xi} + i\frac{\xi}{\sqrt{1 - \xi^2}}\frac{\partial}{\partial \phi}\right),
\]
\[
M^{(-)} = e^{-i\phi}\left(\sqrt{1 - \xi^2}\,\frac{\partial}{\partial \xi} + i\frac{\xi}{\sqrt{1 - \xi^2}}\frac{\partial}{\partial \phi}\right), \tag{3.109}
\]
\[
\mathbf{M}^2 = -\left[\frac{\partial}{\partial \xi}\left(\left(1 - \xi^2\right)\frac{\partial}{\partial \xi}\right) + \frac{1}{1 - \xi^2}\frac{\partial^2}{\partial \phi^2}\right].
\]

Although we showed in Sect. 3.3 that m = 0, ±1, ±2, ⋯, the range of m was left unclear, and so far the relationship between m and λ in (3.66) has remained unclear as well. On the basis of the general approach developed in Sect. 3.4, however, we now know that the eigenvalue μ of the dimensionless z-component angular momentum Jz is bounded, with its maximum and minimum being j and -j, respectively [see (3.89)], where j can be zero, a positive integer, or a positive half-odd-integer. Concomitantly, the eigenvalue ζ of J² equals j(j + 1). In the present section, let us reconsider the relationship between m and λ in (3.66) in light of the knowledge obtained in Sect. 3.4. According to custom, we replace μ in (3.89) with m to have

\[
m = j, \ j - 1, \ j - 2, \ \cdots, \ -j + 1, \ -j. \tag{3.110}
\]

At the moment, we assume that m can be a half-odd-integer besides zero or an integer [3]. Now, let us fix the notation of Y(θ, ϕ) that appeared in (3.37). This function is eligible for a simultaneous eigenstate of M² and Mz and can be indexed with j and m as in (3.110). Then, let Y(θ, ϕ) be described accordingly as

\[
Y_j^m(\theta, \phi) \equiv Y(\theta, \phi). \tag{3.111}
\]

From (3.54) and (3.64), we have

\[
Y_j^m(\theta, \phi) \propto e^{im\phi}.
\]

Therefore, we get

\[
M^{(+)} Y_j^m(\theta, \phi) = -e^{i\phi}\left(\sqrt{1 - \xi^2}\,\frac{\partial}{\partial \xi} + \frac{m\xi}{\sqrt{1 - \xi^2}}\right) Y_j^m(\theta, \phi) = -e^{i\phi}\left(1 - \xi^2\right)^{\frac{m+1}{2}}\frac{\partial}{\partial \xi}\left[\left(1 - \xi^2\right)^{-\frac{m}{2}} Y_j^m(\theta, \phi)\right], \tag{3.112}
\]

where we used the following equations:

\[
\frac{\partial}{\partial \xi}\left(1 - \xi^2\right)^{-\frac{m}{2}} = \left(-\frac{m}{2}\right)\left(1 - \xi^2\right)^{-\frac{m}{2}-1}(-2\xi) = m\xi\left(1 - \xi^2\right)^{-\frac{m}{2}-1}, \tag{3.113}
\]

\[
\frac{\partial}{\partial \xi}\left[\left(1 - \xi^2\right)^{-\frac{m}{2}} Y_j^m(\theta, \phi)\right] = m\xi\left(1 - \xi^2\right)^{-\frac{m}{2}-1} Y_j^m(\theta, \phi) + \left(1 - \xi^2\right)^{-\frac{m}{2}}\frac{\partial Y_j^m(\theta, \phi)}{\partial \xi}.
\]

Similarly, we get

\[
M^{(-)} Y_j^m(\theta, \phi) = e^{-i\phi}\left(\sqrt{1 - \xi^2}\,\frac{\partial}{\partial \xi} - \frac{m\xi}{\sqrt{1 - \xi^2}}\right) Y_j^m(\theta, \phi) = e^{-i\phi}\left(1 - \xi^2\right)^{\frac{-m+1}{2}}\frac{\partial}{\partial \xi}\left[\left(1 - \xi^2\right)^{\frac{m}{2}} Y_j^m(\theta, \phi)\right]. \tag{3.114}
\]

Let us derive the relations where M(+) or M(-) is successively operated on Y_j^m(θ, ϕ). In the case of M(+), using (3.109) we have

\[
\left[M^{(+)}\right]^n Y_j^m(\theta, \phi) = (-1)^n e^{in\phi}\left(1 - \xi^2\right)^{\frac{m+n}{2}}\frac{\partial^n}{\partial \xi^n}\left[\left(1 - \xi^2\right)^{-\frac{m}{2}} Y_j^m(\theta, \phi)\right]. \tag{3.115}
\]

We confirm this relation by mathematical induction. We have (3.112) by replacing n with 1 in (3.115). Namely, (3.115) holds when n = 1. Next, suppose that (3.115) holds with n. Then, using the first equation of (3.109) and noting (3.64), we have

\[
\begin{aligned}
\left[M^{(+)}\right]^{n+1} Y_j^m(\theta, \phi) &= M^{(+)}\left\{\left[M^{(+)}\right]^n Y_j^m(\theta, \phi)\right\} \\
&= e^{i\phi}\left(-\sqrt{1 - \xi^2}\,\frac{\partial}{\partial \xi} + i\frac{\xi}{\sqrt{1 - \xi^2}}\frac{\partial}{\partial \phi}\right)\left\{(-1)^n e^{in\phi}\left(1 - \xi^2\right)^{\frac{m+n}{2}}\frac{\partial^n}{\partial \xi^n}\left[\left(1 - \xi^2\right)^{-\frac{m}{2}} Y_j^m(\theta, \phi)\right]\right\} \\
&= (-1)^n e^{i(n+1)\phi}\left\{\xi(n + m)\left(1 - \xi^2\right)^{\frac{m+n-1}{2}}\frac{\partial^n}{\partial \xi^n}\left[\left(1 - \xi^2\right)^{-\frac{m}{2}} Y_j^m(\theta, \phi)\right] - \left(1 - \xi^2\right)^{\frac{m+n+1}{2}}\frac{\partial^{n+1}}{\partial \xi^{n+1}}\left[\left(1 - \xi^2\right)^{-\frac{m}{2}} Y_j^m(\theta, \phi)\right] \right. \\
&\qquad\quad \left. - \xi(n + m)\left(1 - \xi^2\right)^{\frac{m+n-1}{2}}\frac{\partial^n}{\partial \xi^n}\left[\left(1 - \xi^2\right)^{-\frac{m}{2}} Y_j^m(\theta, \phi)\right]\right\} \\
&= (-1)^{n+1} e^{i(n+1)\phi}\left(1 - \xi^2\right)^{\frac{m+(n+1)}{2}}\frac{\partial^{n+1}}{\partial \xi^{n+1}}\left[\left(1 - \xi^2\right)^{-\frac{m}{2}} Y_j^m(\theta, \phi)\right]. \tag{3.116}
\end{aligned}
\]

Notice that the first and third terms in the second to last equality cancel each other. Thus, (3.115) certainly holds with (n + 1). Similarly, we have [3]

\[
\left[M^{(-)}\right]^n Y_j^m(\theta, \phi) = e^{-in\phi}\left(1 - \xi^2\right)^{\frac{-m+n}{2}}\frac{\partial^n}{\partial \xi^n}\left[\left(1 - \xi^2\right)^{\frac{m}{2}} Y_j^m(\theta, \phi)\right]. \tag{3.117}
\]

Proof of (3.117) is left for readers. From the second equation of (3.81) and (3.114) where m is replaced with -j, we have

\[
M^{(-)} Y_j^{-j}(\theta, \phi) = e^{-i\phi}\left(1 - \xi^2\right)^{\frac{j+1}{2}}\frac{\partial}{\partial \xi}\left[\left(1 - \xi^2\right)^{-\frac{j}{2}} Y_j^{-j}(\theta, \phi)\right] = 0. \tag{3.118}
\]

This implies that (1 - ξ²)^{-j/2} Y_j^{-j}(θ, ϕ) is a constant with respect to ξ. We describe this as

\[
\left(1 - \xi^2\right)^{-\frac{j}{2}} Y_j^{-j}(\theta, \phi) = c \quad (c: \text{constant with respect to } \xi). \tag{3.119}
\]

Meanwhile, putting m = -j and n = 2j + 1 in (3.115) and taking account of the first equations of (3.77) and (3.81), we get

\[
\left[M^{(+)}\right]^{2j+1} Y_j^{-j}(\theta, \phi) = (-1)^{2j+1} e^{i(2j+1)\phi}\left(1 - \xi^2\right)^{\frac{j+1}{2}}\frac{\partial^{2j+1}}{\partial \xi^{2j+1}}\left[\left(1 - \xi^2\right)^{\frac{j}{2}} Y_j^{-j}(\theta, \phi)\right] = 0. \tag{3.120}
\]

This means that

\[
\left(1 - \xi^2\right)^{\frac{j}{2}} Y_j^{-j}(\theta, \phi) = (\text{at most a } 2j\text{-degree polynomial in } \xi). \tag{3.121}
\]

Replacing Y_j^{-j}(θ, ϕ) in (3.121) with that of (3.119), we get

\[
c\left(1 - \xi^2\right)^{\frac{j}{2}}\left(1 - \xi^2\right)^{\frac{j}{2}} = c\left(1 - \xi^2\right)^j = (\text{at most a } 2j\text{-degree polynomial in } \xi). \tag{3.122}
\]

Here, if j is a half-odd-integer, c(1 - ξ²)^j of (3.122) cannot be a polynomial. If on the other hand j is zero or a positive integer, c(1 - ξ²)^j is certainly a polynomial and, moreover, a 2j-degree polynomial with respect to ξ; so is (1 - ξ²)^{j/2} Y_j^{-j}(θ, ϕ).

1 - ξ2 Y j- j ðθ, ϕÞ.

According to custom, we henceforth use l (zero or a positive integer) instead of j. That is,

Y(θ, ϕ) ≡ Y_l^m(θ, ϕ)   (l: zero or a positive integer).   (3.123)

At the same time, so far as the orbital angular momentum is concerned, from (3.71) and (3.86) we can identify ζ in (3.71) with l(l + 1). Namely, we have


ζ = l(l + 1).

Concomitantly, m in (3.110) is determined as

m = l, l - 1, l - 2, ⋯, 1, 0, -1, ⋯, -l + 1, -l.   (3.124)

Thus, as expected, m is zero or a positive or negative integer. Considering (3.37) and (3.46), ζ is identical with λ in (3.46). Finally, we rewrite (3.66) as

-(1/sin θ) d/dθ [ sin θ dΘ(θ)/dθ ] + m²Θ(θ)/sin²θ = l(l + 1)Θ(θ),   (3.125)

where l is zero or a positive integer and m is given by (3.124). On condition of ξ = cos θ (3.107), defining the function

P_l^m(ξ) ≡ Θ(θ)   (3.126)

and considering (3.109) along with (3.54), we arrive at the next SOLDE, described as

d/dξ [ (1 - ξ²) dP_l^m(ξ)/dξ ] + [ l(l + 1) - m²/(1 - ξ²) ] P_l^m(ξ) = 0.   (3.127)

The SOLDE (3.127) is well-known as the associated Legendre differential equation. The solutions P_l^m(ξ) are called associated Legendre functions. In the next section, we characterize the said equation and functions by an analytical method. Before going into details, however, we further seek characteristics of P_l^m(ξ) by the operator approach. Adopting the notation of (3.123) and putting m = l in (3.112), we have

M^(+) Y_l^l(θ, ϕ) = -e^{iϕ} (1 - ξ²)^{(l+1)/2} ∂/∂ξ [ (1 - ξ²)^{-l/2} Y_l^l(θ, ϕ) ].   (3.128)

Corresponding to (3.81), we have M^(+) Y_l^l(θ, ϕ) = 0. This implies that

(1 - ξ²)^{-l/2} Y_l^l(θ, ϕ) = c   (c: constant with respect to ξ).   (3.129)

From (3.64) and (3.107), we get

Y_l^l(θ, ϕ) = κ_l sin^l θ e^{ilϕ},   (3.130)

where κ_l is another constant that depends on l but is independent of θ and ϕ. Let us seek κ_l by the normalization condition, i.e.,

∫₀^{2π} dϕ ∫₀^π sin θ dθ |Y_l^l(θ, ϕ)|² = 2π |κ_l|² ∫₀^π sin^{2l+1} θ dθ = 1,   (3.131)

where the integration is performed over the unit sphere. Note that an infinitesimal area element on the unit sphere is represented by sin θ dθ dϕ. We denote the above integral by

I ≡ ∫₀^π sin^{2l+1} θ dθ.   (3.132)

Using integration by parts,

I = ∫₀^π (-cos θ)′ sin^{2l} θ dθ = [-cos θ sin^{2l} θ]₀^π + ∫₀^π cos θ · 2l sin^{2l-1} θ cos θ dθ
  = 2l ∫₀^π sin^{2l-1} θ dθ - 2l ∫₀^π sin^{2l+1} θ dθ.   (3.133)

Thus, we get a recurrence relation for I of (3.132):

I = [2l/(2l + 1)] ∫₀^π sin^{2l-1} θ dθ.   (3.134)

Repeating the above process, we get

I = [2l/(2l + 1)] · [(2l - 2)/(2l - 1)] ⋯ (2/3) ∫₀^π sin θ dθ = 2^{2l+1} (l!)² / (2l + 1)!.   (3.135)
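The recurrence (3.134) and the closed form (3.135) are easy to check numerically. The following is a minimal stdlib-Python sketch (the function names `wallis_odd` and `integral_numeric` are ours, not from the text) comparing (3.135) with a midpoint-rule quadrature of I in (3.132):

```python
import math

def wallis_odd(l):
    """Closed form of I = ∫_0^π sin^{2l+1}θ dθ from (3.135)."""
    return 2**(2*l + 1) * math.factorial(l)**2 / math.factorial(2*l + 1)

def integral_numeric(l, n=200000):
    """Midpoint-rule evaluation of ∫_0^π sin^{2l+1}θ dθ."""
    h = math.pi / n
    return h * sum(math.sin((i + 0.5) * h)**(2*l + 1) for i in range(n))

for l in range(6):
    assert abs(wallis_odd(l) - integral_numeric(l)) < 1e-6
```

For l = 0 both sides give 2, and for l = 1 both give 4/3, in agreement with the elementary integrals of sin θ and sin³ θ.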

Then,

|κ_l| = [1/(2^l l!)] √[(2l + 1)!/4π]   or   κ_l = [e^{iχ}/(2^l l!)] √[(2l + 1)!/4π]   (χ: real),   (3.136)

where e^{iχ} is an undetermined constant (phase factor) that is to be determined below. Thus, we get

Y_l^l(θ, ϕ) = [e^{iχ}/(2^l l!)] √[(2l + 1)!/4π] sin^l θ e^{ilϕ}.   (3.137)

Meanwhile, in the second equation of (3.101), replacing J^(-), j, and μ with M^(-), l, and m, respectively, and using Y_l^m(θ, ϕ) instead of |ζ, μ⟩, we get

Y_l^{m-1}(θ, ϕ) = [1/√((l - m + 1)(l + m))] M^(-) Y_l^m(θ, ϕ).   (3.138)

Replacing m with l in (3.138), we have

Y_l^{l-1}(θ, ϕ) = [1/√(2l)] M^(-) Y_l^l(θ, ϕ).   (3.139)

Operating with M^(-) (l - m) times in total on Y_l^l(θ, ϕ) of (3.139), we have

Y_l^m(θ, ϕ) = [1/√(2l(2l - 1)⋯(l + m + 1))] · [1/√(1 · 2 ⋯ (l - m))] (M^(-))^{l-m} Y_l^l(θ, ϕ)
            = √[(l + m)!/((2l)!(l - m)!)] (M^(-))^{l-m} Y_l^l(θ, ϕ).   (3.140)

Putting m = l, n = l - m, and j = l in (3.117), we have

(M^(-))^{l-m} Y_l^l(θ, ϕ) = e^{-i(l-m)ϕ} (1 - ξ²)^{-m/2} ∂^{l-m}/∂ξ^{l-m} [ (1 - ξ²)^{l/2} Y_l^l(θ, ϕ) ].   (3.141)

Further replacing (M^(-))^{l-m} Y_l^l(θ, ϕ) in (3.140) with that of (3.141), we get

Y_l^m(θ, ϕ) = √[(l + m)!/((2l)!(l - m)!)] e^{-i(l-m)ϕ} (1 - ξ²)^{-m/2} ∂^{l-m}/∂ξ^{l-m} [ (1 - ξ²)^{l/2} Y_l^l(θ, ϕ) ].   (3.142)

Finally, replacing Y_l^l(θ, ϕ) in (3.142) with that of (3.137) and converting θ to ξ, we arrive at the following equation:

Y_l^m(θ, ϕ) = [e^{iχ}/(2^l l!)] √[(2l + 1)(l + m)!/(4π (l - m)!)] e^{imϕ} (1 - ξ²)^{-m/2} ∂^{l-m}/∂ξ^{l-m} (1 - ξ²)^l.   (3.143)

Now, let us determine e^{iχ}. Putting m = 0 in (3.143), we have

Y_l^0(θ, ϕ) = [e^{iχ}/(-1)^l] √[(2l + 1)/4π] [(-1)^l/(2^l l!)] ∂^l/∂ξ^l (1 - ξ²)^l,   (3.144)

where we put (-1)^l in both the numerator and the denominator. In the RHS of (3.144),

[(-1)^l/(2^l l!)] ∂^l/∂ξ^l (1 - ξ²)^l ≡ P_l(ξ).   (3.145)

Equation (3.145) is well-known as the Rodrigues formula of the Legendre polynomials. We mention characteristics of the Legendre polynomials in the next section. Thus,

Equation (3.145) is well-known as Rodrigues formula of Legendre polynomials. We mention characteristics of Legendre polynomials in the next section. Thus, Y 0l ðθ, ϕÞ =

eiχ ð- 1Þl

2l þ 1 P ðξÞ: 4π l

ð3:146Þ

According to the custom [2], we require Y 0l ð0, ϕÞ to be positive. Noting that θ = 0 corresponds to ξ = 1, we have Y 0l ð0, ϕÞ =

eiχ ð- 1Þl

2l þ 1 eiχ Pl ð1Þ = 4π ð- 1Þl

2l þ 1 , 4π

ð3:147Þ

where we used Pl(1) = 1. For this important relation, see Sect. 3.6.1. Also noting that eiχ = 1, we must have ð- 1Þl eiχ =1 ð- 1Þl

or

eiχ = ð- 1Þl

ð3:148Þ

so that Y 0l ð0, ϕÞ can be positive. Thus, (3.143) is rewritten as Ym l ðθ, ϕÞ = 

Y_l^m(θ, ϕ) = [(-1)^l/(2^l l!)] √[(2l + 1)(l + m)!/(4π (l - m)!)] e^{imϕ} (1 - ξ²)^{-m/2} ∂^{l-m}/∂ξ^{l-m} (1 - ξ²)^l.   (3.149)

In Sect. 3.3, we mentioned that |ψ₀⟩ in (3.43) is a constant. In fact, putting l = m = 0 in (3.149), we have

Y_0^0(θ, ϕ) = √(1/4π).   (3.150)

Thus, as a simultaneous eigenstate of all of Lx, Ly, Lz, and L² corresponding to l = 0 and m = 0, we have

|ψ₀⟩ ≡ Y_0^0(θ, ϕ).

The normalized functions Y_l^m(θ, ϕ) described as (3.149) define simultaneous eigenfunctions of L² (or M²) and Lz (or Mz). These functions are called spherical surface harmonics and frequently appear in various fields of mathematical physics.
As in Sect. 2.3, matrix representation enables us to grasp intuitively the relationship between the angular momentum operators and their eigenfunctions (or eigenvectors). Rewriting the relations of (3.101) so that they meet the present purpose, we have

M^(-) |l, m⟩ = √[(l - m + 1)(l + m)] |l, m - 1⟩,
M^(+) |l, m⟩ = √[(l - m)(l + m + 1)] |l, m + 1⟩,   (3.151)

where we used l instead of ζ [or λ in (3.46)] to designate the eigenstate. Now, we know that m takes (2l + 1) different values for each l. This implies that the operators can be expressed as (2l + 1, 2l + 1) matrices. As implied in (3.151), M^(-) takes the following form:

M^(-) =
[ 0  √(2l·1)                              ]
[    0  √((2l-1)·2)                       ]
[        ⋱   ⋱                            ]
[            0  √((2l-k+1)·k)             ]
[                ⋱   ⋱                    ]
[                    0  √(1·2l)           ]
[                         0               ],   (3.152)

where the diagonal elements are zero and the (k, k + 1) element is √((2l - k + 1)·k). That is, the non-zero elements are positioned just above the zero diagonal elements. Correspondingly, we have

M^(+) =
[ 0                                       ]
[ √(2l·1)  0                              ]
[          √((2l-1)·2)  0                 ]
[              ⋱   ⋱                      ]
[                  √((2l-k+1)·k)  0       ]
[                      ⋱   ⋱              ]
[                          √(1·2l)  0     ],   (3.153)

where again the diagonal elements are zero and the (k + 1, k) element is √((2l - k + 1)·k). In this case, the non-zero elements are positioned just below the zero diagonal elements. Notice also that M^(-) and M^(+) are adjoint to each other and that these notations correspond to (2.65) and (2.66). The basis functions Y_l^m(θ, ϕ) can be represented by column vectors, as in Sect. 2.3. These are denoted as follows:

|l, -l⟩ = (1, 0, 0, ⋯, 0)^T,   |l, -l + 1⟩ = (0, 1, 0, ⋯, 0)^T,   ⋯,
|l, l - 1⟩ = (0, ⋯, 0, 1, 0)^T,   |l, l⟩ = (0, ⋯, 0, 0, 1)^T,   (3.154)

where the first number l in |l, -l⟩, |l, -l + 1⟩, etc. denotes the quantum number associated with λ = l(l + 1) given in (3.46) and is kept constant; the latter number denotes m. Note from (3.154) that the column vector whose k-th row is 1 corresponds to m such that

m = -l + k - 1.   (3.155)

For instance, if k = 1, m = -l; if k = 2l + 1, m = l, etc.

The operator M^(-) converts the column vector whose (k + 1)-th row is 1 into that whose k-th row is 1. The former column vector corresponds to |l, m + 1⟩ and the latter to |l, m⟩. Therefore, using (3.152), we get the following representation:

M^(-) |l, m + 1⟩ = √((2l - k + 1)·k) |l, m⟩ = √((l - m)(l + m + 1)) |l, m⟩,   (3.156)

where the second equality is obtained by replacing k with that of (3.155), i.e., k = l + m + 1. Changing m to (m - 1), we get the first equation of (3.151). Similarly, we obtain the second equation of (3.151) as well. That is, we have

M^(+) |l, m⟩ = √((2l - k + 1)·k) |l, m + 1⟩ = √((l - m)(l + m + 1)) |l, m + 1⟩.   (3.157)

From (3.32), we have

M² = M^(+) M^(-) + M_z² - M_z.

In the above, M^(+)M^(-) and M_z are diagonal matrices; hence, M_z² and M² are diagonal matrices as well, such that

M_z = diag(-l, -l + 1, -l + 2, ⋯, k - l, ⋯, l),
M^(+) M^(-) = diag(0, 2l·1, (2l - 1)·2, ⋯, (2l - k + 1)·k, ⋯, 1·2l),   (3.158)

where k - l and (2l - k + 1)·k represent the (k + 1, k + 1) elements of M_z and M^(+)M^(-), respectively. Therefore, the (k + 1, k + 1) element of M² is calculated as

(2l - k + 1)·k + (k - l)² - (k - l) = l(l + 1).

As expected, M² takes a constant value l(l + 1). Its matrix representation is shown in the following equation:

M² = diag(l(l + 1), l(l + 1), ⋯, l(l + 1)).   (3.159)

These expressions are useful to understand how the vectors of (3.154) constitute simultaneous eigenstates of M² and M_z. In this situation, the matrix representation is

said to diagonalize both M² and M_z. In other words, the quantum states represented by (3.154) are simultaneous eigenstates of M² and M_z.
The matrices (3.152) and (3.153) that represent M^(-) and M^(+), respectively, are called ladder operators, or lowering and raising operators, because operating on the column vectors they convert |m⟩ into |m ∓ 1⟩, as mentioned above. The operators M^(-) and M^(+) correspond to a and a† given in (2.65) and (2.66), respectively. All these operators are characterized by the fact that the corresponding matrices have zero diagonal elements and that the non-vanishing elements are positioned only right above or right below the diagonal. These matrices are a kind of triangle matrix whose diagonal elements are all zero; such matrices are nilpotent. That is, if a suitable power of a matrix is zero as a matrix, that matrix is said to be a nilpotent matrix (see Part III). In the present case, the (2l + 1)-th powers of M^(-) and M^(+) become zero as matrices. The operators M^(-) and M^(+) can be described by the following shorthand representations:

[M^(-)]_{kj} = √((2l - k + 1)·k) δ_{k+1, j}   (1 ≤ k ≤ 2l).   (3.160)

If l = 0, M_z = M^(+)M^(-) = M² = 0. This case corresponds to Y_0^0(θ, ϕ) = √(1/4π), and we do not need the matrix representation. Defining

a_k ≡ √((2l - k + 1)·k),

we have, for instance,

[(M^(-))²]_{kj} = Σ_p a_k δ_{k+1, p} a_p δ_{p+1, j} = a_k a_{k+1} δ_{k+2, j}
               = √((2l - k + 1)·k) √([2l - (k + 1) + 1]·(k + 1)) δ_{k+2, j},   (3.161)

where the summation is non-vanishing only if p = k + 1. The factor δ_{k+2, j} implies that the elements are shifted by one toward the upper right upon squaring. Similarly, we have

[M^(+)]_{kj} = a_{k-1} δ_{k, j+1}   (1 ≤ k ≤ 2l + 1).   (3.162)

In (3.158), M^(+)M^(-) is represented as follows:

[M^(+) M^(-)]_{kj} = Σ_p a_{k-1} δ_{k, p+1} a_p δ_{p+1, j} = a_{k-1} a_{j-1} δ_{k, j} = (a_{k-1})² δ_{k, j}
                   = [2l - (k - 1) + 1]·(k - 1) δ_{k, j} = (2l - k + 2)(k - 1) δ_{k, j}.   (3.163)

Notice that although a₀ is not defined, δ_{1, j+1} = 0 for any j, and so this causes no inconvenience. Hence, [M^(+)M^(-)]_{kj} of (3.163) is well-defined for 1 ≤ k ≤ 2l + 1.
The important properties of the angular momentum operators examined above rest upon the fact that those operators are ladder operators represented by nilpotent matrices. These characteristics will be studied further in Parts III–V.

3.6 Orbital Angular Momentum: Analytic Approach

In this section, our central task is to solve the associated Legendre differential equation (3.127) by an analytical method. Putting m = 0 in (3.127), we have

d/dx [ (1 - x²) dP_l^0(x)/dx ] + l(l + 1) P_l^0(x) = 0,   (3.164)

where we use a variable x instead of ξ. Equation (3.164) is called the Legendre differential equation, and its characteristics and solutions have been widely investigated. Hence, we put

P_l^0(x) ≡ P_l(x),   (3.165)

where the P_l(x) are said to be the Legendre polynomials. We first start with the Legendre differential equation and the Legendre polynomials.

3.6.1 Spherical Surface Harmonics and Associated Legendre Differential Equation

Let us think of the following identity according to Byron and Fuller [4]:

(1 - x²) d/dx (1 - x²)^l = -2lx (1 - x²)^l,   (3.166)

where l is a positive integer. We differentiate both sides of (3.166) (l + 1) times. Here we use the Leibniz rule for the differentiation of a product function, described by

d^n(uv) = Σ_{m=0}^n [n!/(m!(n - m)!)] (d^m u)(d^{n-m} v),   (3.167)

where d^m u ≡ d^m u/dx^m. This shorthand notation is due to Byron and Fuller [4]; we use it for simplicity from place to place. Noting that the third- and higher-order differentiations of (1 - x²) vanish in the LHS of (3.166), we have

LHS = d^{l+1} [ (1 - x²) d(1 - x²)^l ]
    = (1 - x²) d^{l+2} (1 - x²)^l - 2(l + 1)x d^{l+1} (1 - x²)^l - l(l + 1) d^l (1 - x²)^l.

Also noting that the second- and higher-order differentiations of 2lx vanish in the LHS of (3.166), we have

RHS = -d^{l+1} [ 2lx (1 - x²)^l ] = -2lx d^{l+1} (1 - x²)^l - 2l(l + 1) d^l (1 - x²)^l.

Therefore,

LHS - RHS = (1 - x²) d^{l+2} (1 - x²)^l - 2x d^{l+1} (1 - x²)^l + l(l + 1) d^l (1 - x²)^l = 0.

We define P_l(x) as

P_l(x) ≡ [(-1)^l/(2^l l!)] d^l/dx^l (1 - x²)^l,   (3.168)

where the constant (-1)^l/(2^l l!) is multiplied according to custom so that we can explicitly represent the Rodrigues formula of the Legendre polynomials. Thus, from (3.164), P_l(x) defined above satisfies the Legendre differential equation. Rewriting it, we get

(1 - x²) d²P_l(x)/dx² - 2x dP_l(x)/dx + l(l + 1) P_l(x) = 0.   (3.169)

Or equivalently, we have

d/dx [ (1 - x²) dP_l(x)/dx ] + l(l + 1) P_l(x) = 0.   (3.170)

Returning to (3.127) and using x as a variable, we rewrite (3.127) as

d/dx [ (1 - x²) dP_l^m(x)/dx ] + [ l(l + 1) - m²/(1 - x²) ] P_l^m(x) = 0,   (3.171)

where l is a non-negative integer and m is an integer that takes the following values:

m = l, l - 1, l - 2, ⋯, 1, 0, -1, ⋯, -l + 1, -l.

Differential equations expressed as

d/dx [ p(x) dy(x)/dx ] + c(x) y(x) = 0

are of particular importance. We will come back to this point in Sect. 10.3.
Since m can be either positive or negative, from (3.171) we notice that P_l^m(x) and P_l^{-m}(x) must satisfy the same differential equation (3.171). This implies that P_l^m(x) and P_l^{-m}(x) are connected, i.e., linearly dependent. First, let us assume that m ≥ 0; we will examine the case m < 0 soon afterward. According to Dennery and Krzywicki [5], we assume

P_l^m(x) = κ (1 - x²)^{m/2} C(x),   (3.172)

where κ is a constant. Inserting (3.172) into (3.171) and rearranging the terms, we obtain

(1 - x²) d²C/dx² - 2(m + 1)x dC/dx + (l - m)(l + m + 1) C = 0   (0 ≤ m ≤ l).   (3.173)

Recall once again that if m = 0, the associated Legendre differential equation given by (3.127) and (3.171) is exactly identical to the Legendre differential equation (3.170). Differentiating (3.170) m times, we get

(1 - x²) d²/dx² (d^m P_l/dx^m) - 2(m + 1)x d/dx (d^m P_l/dx^m) + (l - m)(l + m + 1) (d^m P_l/dx^m) = 0,   (3.174)

where we used the Leibniz rule (3.167) for the differentiation of a product. Comparing (3.173) and (3.174), we find that

C(x) = κ′ d^m P_l/dx^m,

where κ′ is a constant. Inserting this relation into (3.172) and setting κκ′ = 1, we get

P_l^m(x) = (1 - x²)^{m/2} d^m P_l(x)/dx^m   (0 ≤ m ≤ l).   (3.175)

Using the Rodrigues formula (3.168), we have

P_l^m(x) ≡ [(-1)^l/(2^l l!)] (1 - x²)^{m/2} d^{l+m}/dx^{l+m} (1 - x²)^l.   (3.176)

Equations (3.175) and (3.176) define the associated Legendre functions. Note, however, that their functional form differs from literature to literature [2, 5, 6]. Among the classical orthogonal polynomials, the Gegenbauer polynomials C_n^λ(x) often appear in the literature. The relevant differential equation is defined by

(1 - x²) d²C_n^λ(x)/dx² - (2λ + 1)x dC_n^λ(x)/dx + n(n + 2λ) C_n^λ(x) = 0   (λ > -1/2).   (3.177)

Setting n = l - m and λ = m + 1/2 in (3.177) [5], we have

(1 - x²) d²C_{l-m}^{m+1/2}(x)/dx² - 2(m + 1)x dC_{l-m}^{m+1/2}(x)/dx + (l - m)(l + m + 1) C_{l-m}^{m+1/2}(x) = 0.   (3.178)

Once again comparing (3.174) and (3.178), we obtain

d^m P_l(x)/dx^m = constant × C_{l-m}^{m+1/2}(x)   (0 ≤ m ≤ l).   (3.179)

Next, let us determine the constant appearing in (3.179). To this end, we consider the following generating function of the polynomials C_n^λ(x), defined by [7, 8]

(1 - 2tx + t²)^{-λ} = Σ_{n=0}^∞ C_n^λ(x) t^n   (λ > -1/2).   (3.180)

To calculate (3.180), let us think of the following expression for x and λ:

(1 + x)^{-λ} = Σ_{m=0}^∞ \binom{-λ}{m} x^m,   (3.181)

where λ is an arbitrary real number and we define \binom{-λ}{m} as

\binom{-λ}{m} ≡ (-λ)(-λ - 1)(-λ - 2) ⋯ (-λ - m + 1)/m!   and   \binom{-λ}{0} ≡ 1.   (3.182)

Notice that (3.181) with (3.182) is an extension of the binomial theorem (the generalized binomial theorem). Putting -λ = n, we have

\binom{n}{m} = n!/[(n - m)! m!]   and   \binom{n}{0} = 1.

We rewrite (3.181) using gamma functions Γ(z) such that

(1 + x)^{-λ} = Σ_{m=0}^∞ [Γ(-λ + 1)/(m! Γ(-λ - m + 1))] x^m,   (3.183)

where Γ(z) is defined by the integral representation

Γ(z) = ∫₀^∞ e^{-t} t^{z-1} dt   (ℜe z > 0).   (3.184)

In (3.184), ℜe denotes the real part of the number z. Changing variables as t = u², we have

Γ(z) = 2 ∫₀^∞ e^{-u²} u^{2z-1} du   (ℜe z > 0).   (3.185)

Note that the above expression is associated with the following fundamental property of the gamma functions:

Γ(z + 1) = z Γ(z),   (3.186)

where z is any complex number. Replacing x with -t(2x - t) and rewriting (3.183), we have

(1 - 2tx + t²)^{-λ} = Σ_{m=0}^∞ [Γ(-λ + 1)/(m! Γ(-λ - m + 1))] (-t)^m (2x - t)^m.   (3.187)

Assuming that x is a real number belonging to the interval [-1, 1], (3.187) holds for t satisfying |t| < 1 [8]. The discussion is as follows: When x satisfies the above condition, solving 1 - 2tx + t² = 0 we get the solutions

t± = x ± i√(1 - x²).

Defining r ≡ min{|t₊|, |t₋|}, the function (1 - 2tx + t²)^{-λ}, regarded as a function of t, is analytic in the disk |t| < r. However, we have |t±| = 1. Thus, (1 - 2tx + t²)^{-λ} is analytic within the disk |t| < 1 and, hence, it can be expanded in a Taylor series (see Chap. 6). Continuing the calculation of (3.187), we have



1

¼

Γð-λ þ 1Þ

m¼0 m!Γð-λ -m þ 1Þ 1

¼

m¼0 1

¼

m¼0

ð-1Þm t m

m

m!

k¼0 k!ðm -k Þ!

2m -k xm - k ð-1Þk tk

ð-1Þmþk Γð-λ þ 1Þ m- k m -k mþk x t 2 k¼0 k!ðm -k Þ! Γð-λ- m þ 1Þ m

ð-1Þmþk ð- 1Þm Γðλ þ mÞ m -k m -k mþk x t , 2 k¼0 k!ðm -k Þ! ΓðλÞ m

ð3:188Þ

where the last equality results from that we rewrote gamma functions using (3.186). Replacing (m + k) with n, we get 1 - 2tx þ t 2



=

1

ð- 1Þk 2n - 2k Γðλ þ n - kÞ n - 2k n t , x k = 0 k!ðn - 2k Þ! Γ ðλÞ ½n=2

n=0

ð3:189Þ

where [n/2] represents the greatest integer that does not exceed n/2. This expression comes from the requirement that the order of x must satisfy

n - 2k ≥ 0   or   k ≤ n/2.   (3.190)

That is, if n is even, the maximum of k is n/2; if n is odd, the maximum of k is (n - 1)/2. Comparing (3.180) and (3.189), we get [8]

C_n^λ(x) = Σ_{k=0}^{[n/2]} [(-1)^k 2^{n-2k}/(k!(n - 2k)!)] [Γ(λ + n - k)/Γ(λ)] x^{n-2k}.   (3.191)
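As a numerical cross-check of (3.191), the sketch below (stdlib Python; function names are ours) compares the explicit sum with the standard three-term recurrence n C_n^λ = 2(n + λ - 1)x C_{n-1}^λ - (n + 2λ - 2) C_{n-2}^λ, a well-known property of the Gegenbauer polynomials that is not derived in the text:

```python
import math

def gegenbauer_sum(n, lam, x):
    """C_n^λ(x) from the explicit sum (3.191)."""
    return sum((-1)**k * 2**(n - 2*k) / (math.factorial(k) * math.factorial(n - 2*k))
               * math.gamma(lam + n - k) / math.gamma(lam) * x**(n - 2*k)
               for k in range(n//2 + 1))

def gegenbauer_rec(n, lam, x):
    """Three-term recurrence: C_0 = 1, C_1 = 2λx,
    n C_n = 2(n+λ-1)x C_{n-1} - (n+2λ-2) C_{n-2}."""
    c0, c1 = 1.0, 2*lam*x
    if n == 0:
        return c0
    for k in range(2, n + 1):
        c0, c1 = c1, (2*(k + lam - 1)*x*c1 - (k + 2*lam - 2)*c0) / k
    return c1

for lam in (0.5, 1.0, 2.5):
    for n in range(7):
        for x in (-0.8, 0.2, 0.9):
            assert abs(gegenbauer_sum(n, lam, x) - gegenbauer_rec(n, lam, x)) < 1e-9
```

For λ = 1 the recurrence reproduces the Chebyshev polynomials of the second kind, e.g., C_2^1(x) = 4x² - 1.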

Comparing (3.164) and (3.177) and putting λ = 1/2, we immediately find that the two differential equations are identical [7]. That is,

C_n^{1/2}(x) = P_n(x).   (3.192)

Hence, we further have

P_n(x) = Σ_{k=0}^{[n/2]} [(-1)^k 2^{n-2k}/(k!(n - 2k)!)] [Γ(1/2 + n - k)/Γ(1/2)] x^{n-2k}.   (3.193)

Using (3.186) once again, we get [8]

P_n(x) = Σ_{k=0}^{[n/2]} [(-1)^k (2n - 2k)!/(2^n k!(n - k)!(n - 2k)!)] x^{n-2k}.   (3.194)

It is convenient to make a formula for the gamma function. In (3.193), n - k > 0, and so let us think of Γ(1/2 + m) (m: positive integer). Using (3.186), we have

Γ(1/2 + m) = (m - 1/2) Γ(m - 1/2) = (m - 1/2)(m - 3/2) Γ(m - 3/2) = ⋯
           = (m - 1/2)(m - 3/2) ⋯ (1/2) Γ(1/2)
           = 2^{-m} (2m - 1)(2m - 3) ⋯ 3·1·Γ(1/2)
           = 2^{-m} [(2m - 1)!/(2^{m-1} (m - 1)!)] Γ(1/2) = 2^{-2m} [(2m)!/m!] Γ(1/2).   (3.195)

Notice that (3.195) still holds for m = 0. Inserting n - k into m of (3.195), we get

Γ(1/2 + n - k) = 2^{-2(n-k)} [(2n - 2k)!/(n - k)!] Γ(1/2).   (3.196)

Replacing Γ(1/2 + n - k) in (3.193) with the RHS of the above equation, (3.194) follows. The gamma function Γ(1/2) often appears in mathematical physics. According to (3.185), we have

Γ(1/2) = 2 ∫₀^∞ e^{-u²} du = √π.

For the derivation of this definite integral, see (2.86) of Sect. 2.4. From (3.184), we also have

Γ(1) = 1.

In relation to the discussion of Sect. 3.5, let us derive an important formula about the Legendre polynomials. From (3.180) and (3.192), we get

(1 - 2tx + t²)^{-1/2} = Σ_{n=0}^∞ P_n(x) t^n.   (3.197)

Assuming |t| < 1, when we put x = 1 in (3.197), we have

(1 - 2t + t²)^{-1/2} = 1/(1 - t) = Σ_{n=0}^∞ t^n = Σ_{n=0}^∞ P_n(1) t^n.   (3.198)

Comparing the individual coefficients of t^n in (3.198), we get

P_n(1) = 1.

See the related parts of Sect. 3.5. Now, we are in a position to determine the constant in (3.179). Differentiating (3.194) m times, we have

d^m P_l(x)/dx^m = Σ_{k=0}^{[(l-m)/2]} [(-1)^k (2l - 2k)!(l - 2k)(l - 2k - 1) ⋯ (l - 2k - m + 1)/(2^l k!(l - k)!(l - 2k)!)] x^{l-2k-m}
               = Σ_{k=0}^{[(l-m)/2]} [(-1)^k (2l - 2k)!/(2^l k!(l - k)!(l - 2k - m)!)] x^{l-2k-m}.   (3.199)

Meanwhile, we have

C_{l-m}^{m+1/2}(x) = Σ_{k=0}^{[(l-m)/2]} [(-1)^k 2^{l-2k-m}/(k!(l - 2k - m)!)] [Γ(l + 1/2 - k)/Γ(m + 1/2)] x^{l-2k-m}.   (3.200)

Using (3.195) and (3.196), we have

Γ(l + 1/2 - k)/Γ(m + 1/2) = 2^{-2(l-k-m)} [(2l - 2k)!/(l - k)!] [m!/(2m)!].

Therefore, we get

C_{l-m}^{m+1/2}(x) = Σ_{k=0}^{[(l-m)/2]} [(-1)^k (2l - 2k)!/(2^l k!(l - k)!(l - 2k - m)!)] [2^m Γ(m + 1)/Γ(2m + 1)] x^{l-2k-m},   (3.201)

where we used m! = Γ(m + 1) and (2m)! = Γ(2m + 1). Comparing (3.199) and (3.201), we get

d^m P_l(x)/dx^m = [Γ(2m + 1)/(2^m Γ(m + 1))] C_{l-m}^{m+1/2}(x).   (3.202)

Thus, we find that the constant appearing in (3.179) is Γ(2m + 1)/(2^m Γ(m + 1)). Putting m = 0 in (3.202), we have P_l(x) = C_l^{1/2}(x); therefore, (3.192) is certainly recovered. This gives an easy check of (3.202). Meanwhile, the Rodrigues formula of the Gegenbauer polynomials [5] is given by

C_n^λ(x) = [(-1)^n Γ(n + 2λ) Γ(λ + 1/2)/(2^n n! Γ(n + λ + 1/2) Γ(2λ))] (1 - x²)^{-λ+1/2} d^n/dx^n (1 - x²)^{n+λ-1/2}.   (3.203)

Hence, we have

C_{l-m}^{m+1/2}(x) = [(-1)^{l-m} Γ(l + m + 1) Γ(m + 1)/(2^{l-m} (l - m)! Γ(l + 1) Γ(2m + 1))] (1 - x²)^{-m} d^{l-m}/dx^{l-m} (1 - x²)^l.   (3.204)

Inserting (3.204) into (3.202), we have

d^m P_l(x)/dx^m = [(-1)^{l-m} Γ(l + m + 1)/(2^l (l - m)! Γ(l + 1))] (1 - x²)^{-m} d^{l-m}/dx^{l-m} (1 - x²)^l
               = [(-1)^{l-m} (l + m)!/(2^l l!(l - m)!)] (1 - x²)^{-m} d^{l-m}/dx^{l-m} (1 - x²)^l.   (3.205)

Further inserting this into (3.175), we finally get

P_l^m(x) = [(-1)^{l-m} (l + m)!/(2^l l!(l - m)!)] (1 - x²)^{-m/2} d^{l-m}/dx^{l-m} (1 - x²)^l.   (3.206)

When m = 0, we have

P_l^0(x) = [(-1)^l/(2^l l!)] d^l/dx^l (1 - x²)^l = P_l(x).   (3.207)

Thus, we recover the functional form of the Legendre polynomials. The expression (3.206) is also meaningful for negative m, provided |m| ≤ l, and permits an extension of the definition of P_l^m(x) given by (3.175) to negative values of m [5]. Changing m to -m in (3.206), we have

ð- 1Þlþm ðl - mÞ! 1 - x2 2l l!ðl þ mÞ!

m=2

dlþm dxlþm

1 - x2

l

:

ð3:208Þ

ð0 ≤ m ≤ lÞ:

ð3:209Þ

Meanwhile, from (3.168) and (3.175), Pm l ðxÞ =

ð- 1Þl 1 - x2 2l l!

m=2

dlþm dxlþm

1 - x2

l

Comparing (3.208) and (3.209), we get Pl- m ðxÞ =

ð- 1Þm ðl - mÞ! m Pl ðxÞ ð- l ≤ m ≤ lÞ: ðl þ mÞ!

ð3:210Þ
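The connection formula (3.210) lends itself to a direct numerical check. The sketch below (stdlib Python; function names are ours, not from the text) implements P_l^m from (3.206) by differentiating the coefficient list of (1 - x²)^l and verifies (3.210) for several l, m, and x:

```python
import math

def poly_one_minus_x2_pow(l):
    """Coefficient list of (1 - x^2)^l; index = power of x."""
    c = [0.0]*(2*l + 1)
    for k in range(l + 1):
        c[2*k] = (-1)**k * math.comb(l, k)
    return c

def deriv(c, times):
    """Differentiate a coefficient list `times` times."""
    for _ in range(times):
        c = [i*c[i] for i in range(1, len(c))] or [0.0]
    return c

def assoc_legendre(l, m, x):
    """P_l^m(x) from (3.206); valid for -l <= m <= l and |x| < 1."""
    c = deriv(poly_one_minus_x2_pow(l), l - m)
    val = sum(ci * x**i for i, ci in enumerate(c))
    pref = ((-1)**(l - m) * math.factorial(l + m)
            / (2**l * math.factorial(l) * math.factorial(l - m)))
    return pref * (1 - x*x)**(-m/2) * val

# relation (3.210): P_l^{-m} = (-1)^m (l-m)!/(l+m)! P_l^m
for l in range(1, 5):
    for m in range(0, l + 1):
        for x in (-0.6, 0.1, 0.8):
            lhs = assoc_legendre(l, -m, x)
            rhs = ((-1)**m * math.factorial(l - m)/math.factorial(l + m)
                   * assoc_legendre(l, m, x))
            assert abs(lhs - rhs) < 1e-9
```

For example, this gives P_1^1(x) = √(1 - x²) and P_1^{-1}(x) = -√(1 - x²)/2, consistent with (3.175) and (3.210).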

Thus, as expected earlier, P_l^m(x) and P_l^{-m}(x) are linearly dependent. Now, we return to (3.149). From (3.206), we have

(1 - x²)^{-m/2} d^{l-m}/dx^{l-m} (1 - x²)^l = [(-1)^{l-m} 2^l l!(l - m)!/(l + m)!] P_l^m(x).   (3.211)

Inserting (3.211) into (3.149) and changing the variable x to ξ, we have

Y_l^m(θ, ϕ) = (-1)^m √[(2l + 1)(l - m)!/(4π (l + m)!)] P_l^m(ξ) e^{imϕ}   (ξ = cos θ; 0 ≤ θ ≤ π).   (3.212)

The coefficient (-1)^m appearing in (3.212) is well-known as the Condon–Shortley phase [7]. Another important expression, obtained from (3.210) and (3.212), is

Y_l^{-m}(θ, ϕ) = (-1)^m [Y_l^m(θ, ϕ)]*.   (3.213)

Since (3.208) or (3.209) involves higher-order differentiations, it would be somewhat inconvenient to find the functional forms from them. Here we try to seek a convenient representation of the spherical harmonics using the familiar cosine and sine functions. Starting with (3.206) and applying the Leibniz rule there, we have

P_l^m(x) = [(-1)^{l-m} (l + m)!/(2^l l!(l - m)!)] (1 + x)^{-m/2} (1 - x)^{-m/2}
           × Σ_{r=0}^{l-m} [(l - m)!/(r!(l - m - r)!)] [d^r (1 + x)^l][d^{l-m-r} (1 - x)^l]
        = [(-1)^{l-m} (l + m)!/(2^l l!(l - m)!)] Σ_{r=0}^{l-m} [(l - m)!/(r!(l - m - r)!)] [l!/(l - r)!] (1 + x)^{l-r-m/2} (-1)^{l-m-r} [l!/(m + r)!] (1 - x)^{r+m/2}
        = 2^{-l} l!(l + m)! Σ_{r=0}^{l-m} [(-1)^r/(r!(l - m - r)!(l - r)!(m + r)!)] (1 + x)^{l-r-m/2} (1 - x)^{r+m/2}.   (3.214)

Putting x = cos θ in (3.214) and using a trigonometric formula, we have

P_l^m(cos θ) = l!(l + m)! Σ_{r=0}^{l-m} [(-1)^r/(r!(l - m - r)!(l - r)!(m + r)!)] cos^{2l-2r-m}(θ/2) sin^{2r+m}(θ/2).   (3.215)

Inserting this into (3.212), we get

Y_l^m(θ, ϕ) = l! √[(2l + 1)(l + m)!(l - m)!/4π] e^{imϕ} Σ_r [(-1)^{r+m}/(r!(l - m - r)!(l - r)!(m + r)!)] cos^{2l-2r-m}(θ/2) sin^{2r+m}(θ/2).   (3.216)

The summation domain of r must be determined so that factorials of negative integers are avoided [6]. That is,

1. If m ≥ 0, 0 ≤ r ≤ l - m; (l - m + 1) terms.
2. If m < 0, |m| ≤ r ≤ l; (l - |m| + 1) terms.

For example, if we choose l for m, putting r = 0 in (3.216) we have

Y_l^l(θ, ϕ) = (-1)^l √[(2l + 1)!/4π] (1/l!) cos^l(θ/2) sin^l(θ/2) e^{ilϕ} = [(-1)^l/(2^l l!)] √[(2l + 1)!/4π] sin^l θ e^{ilϕ}.   (3.217)

In particular, we have Y_0^0(θ, ϕ) = √(1/4π), which recovers (3.150). When m = -l, putting

r = l in (3.216) we get

Y_l^{-l}(θ, ϕ) = √[(2l + 1)!/4π] (1/l!) cos^l(θ/2) sin^l(θ/2) e^{-ilϕ} = [1/(2^l l!)] √[(2l + 1)!/4π] sin^l θ e^{-ilϕ}.   (3.218)

For instance, choosing l = 3 and m = ±3 and using (3.217) or (3.218), we have

Y_3^3(θ, ϕ) = -√(35/64π) e^{i3ϕ} sin³θ,   Y_3^{-3}(θ, ϕ) = √(35/64π) e^{-i3ϕ} sin³θ.

The minus sign appearing in Y_3^3(θ, ϕ) is due to the Condon–Shortley phase. For l = 3 and m = 0, moreover, we have

Y_3^0(θ, ϕ) = 3! √(7·3!·3!/4π) Σ_{r=0}^3 [(-1)^r/(r!(3 - r)!(3 - r)!r!)] cos^{6-2r}(θ/2) sin^{2r}(θ/2)
 = 18 √(7/π) [ cos⁶(θ/2)/(0!3!3!0!) - cos⁴(θ/2) sin²(θ/2)/(1!2!2!1!) + cos²(θ/2) sin⁴(θ/2)/(2!1!1!2!) - sin⁶(θ/2)/(3!0!0!3!) ]
 = 18 √(7/π) { [cos⁶(θ/2) - sin⁶(θ/2)]/36 - cos²(θ/2) sin²(θ/2) [cos²(θ/2) - sin²(θ/2)]/4 }
 = (1/4) √(7/π) (5 cos³θ - 3 cos θ),

where in the last equality we used formulae of elementary algebra and trigonometric functions. At the same time, we get

Y_3^0(0, ϕ) = √(7/4π).

This is consistent with (3.147) in that Y_3^0(0, ϕ) is positive.

3.6.2 Orthogonality of Associated Legendre Functions

The orthogonality relations of functions are important. Here we deal with them for the associated Legendre functions.

Replacing m with (m - 1) in (3.174) and using the notation introduced before, we have

(1 - x²) d^{m+1} P_l - 2mx d^m P_l + (l + m)(l - m + 1) d^{m-1} P_l = 0.   (3.219)

Multiplying both sides by (1 - x²)^{m-1}, we have

(1 - x²)^m d^{m+1} P_l - 2mx (1 - x²)^{m-1} d^m P_l + (l + m)(l - m + 1)(1 - x²)^{m-1} d^{m-1} P_l = 0.

Rewriting the above equation, we get

d[(1 - x²)^m d^m P_l] = -(l + m)(l - m + 1)(1 - x²)^{m-1} d^{m-1} P_l.   (3.220)

Now, let us define f(m) as follows:

f(m) ≡ ∫_{-1}^1 (1 - x²)^m (d^m P_l)(d^m P_{l′}) dx   (0 ≤ m ≤ l, l′).   (3.221)

Rewriting (3.221) as follows and integrating by parts, we have

f(m) = ∫_{-1}^1 d[d^{m-1} P_l] (1 - x²)^m d^m P_{l′} dx
     = [d^{m-1} P_l (1 - x²)^m d^m P_{l′}]_{-1}^1 - ∫_{-1}^1 d^{m-1} P_l d[(1 - x²)^m d^m P_{l′}] dx
     = ∫_{-1}^1 d^{m-1} P_l (l′ + m)(l′ - m + 1)(1 - x²)^{m-1} d^{m-1} P_{l′} dx
     = (l′ + m)(l′ - m + 1) f(m - 1),   (3.222)

where in the second equality the first (boundary) term vanishes, and in the second last equality we used (3.220). Equation (3.222) gives a recurrence formula for f(m). Carrying the calculation further, we get

f(m) = (l′ + m)(l′ + m - 1)·(l′ - m + 2)(l′ - m + 1) f(m - 2) = ⋯
     = (l′ + m)(l′ + m - 1) ⋯ (l′ + 1)·l′ ⋯ (l′ - m + 2)(l′ - m + 1) f(0)
     = [(l′ + m)!/(l′ - m)!] f(0),   (3.223)

where

f(0) = ∫_{-1}^1 P_l(x) P_{l′}(x) dx.   (3.224)

Note that in (3.223) the coefficient of f(0) comprises 2m factors. In (3.224), P_l(x) and P_{l′}(x) are the Legendre polynomials defined in (3.165). Then, using (3.168), we have

f(0) = [(-1)^l (-1)^{l′}/(2^l l! 2^{l′} l′!)] ∫_{-1}^1 [d^l (1 - x²)^l][d^{l′} (1 - x²)^{l′}] dx.   (3.225)

To evaluate (3.224), we have two cases: (1) l ≠ l′ and (2) l = l′. In the first case, assuming that l > l′ and integrating by parts, we have

I = ∫_{-1}^1 [d^l (1 - x²)^l][d^{l′} (1 - x²)^{l′}] dx
  = [d^{l-1} (1 - x²)^l d^{l′} (1 - x²)^{l′}]_{-1}^1 - ∫_{-1}^1 [d^{l-1} (1 - x²)^l][d^{l′+1} (1 - x²)^{l′}] dx.   (3.226)

In the above, we find that the first term vanishes because it contains (1 - x²) as a factor. Integrating (3.226) another l′ times as before, we get

I = (-1)^{l′+1} ∫_{-1}^1 [d^{l-l′-1} (1 - x²)^l][d^{2l′+1} (1 - x²)^{l′}] dx.   (3.227)

In (3.227), we have 0 ≤ l - l′ - 1 ≤ 2l, and so d^{l-l′-1}(1 - x²)^l does not vanish; but (1 - x²)^{l′} is an at most 2l′-degree polynomial, and hence d^{2l′+1}(1 - x²)^{l′} vanishes. Therefore,

f(0) = 0.   (3.228)

If l < l′, exchanging P_l(x) and P_{l′}(x) in (3.224), we get f(0) = 0 as well.

In the second case of l = l′, we evaluate the following integral:

I = ∫_{-1}^1 [d^l (1 - x²)^l]² dx.   (3.229)

Similarly integrating (3.229) by parts l times, we have

I = (-1)^l ∫_{-1}^1 (1 - x²)^l d^{2l} (1 - x²)^l dx = (-1)^{2l} (2l)! ∫_{-1}^1 (1 - x²)^l dx.   (3.230)

In (3.230), changing x to cos θ, we have

∫_{-1}^1 (1 - x²)^l dx = ∫₀^π sin^{2l+1} θ dθ.   (3.231)

We have already estimated this integral in (3.132) to be 2^{2l+1}(l!)²/(2l + 1)! in (3.135). Therefore,

f(0) = [(-1)^{2l} (2l)!/(2^{2l} (l!)²)] · 2^{2l+1}(l!)²/(2l + 1)! = 2/(2l + 1).   (3.232)

Thus, we get

f(m) = [(l + m)!/(l - m)!] f(0) = [(l + m)!/(l - m)!] · 2/(2l + 1).   (3.233)

From (3.228) and (3.233), we have

∫_{-1}^1 P_l^m(x) P_{l′}^m(x) dx = [(l + m)!/(l - m)!] [2/(2l + 1)] δ_{ll′}.   (3.234)

Accordingly, putting

𝒫_l^m(x) ≡ √[(2l + 1)(l - m)!/(2(l + m)!)] P_l^m(x),   (3.235)

we get

∫_{-1}^1 𝒫_l^m(x) 𝒫_{l′}^m(x) dx = δ_{ll′}.   (3.236)

The normalized Legendre polynomials immediately follow. They are given by

𝒫_l(x) ≡ √[(2l + 1)/2] P_l(x) = [(-1)^l/(2^l l!)] √[(2l + 1)/2] d^l/dx^l (1 - x²)^l.   (3.237)

Combining the normalized function (3.235) with (1/√(2π)) e^{imϕ}, we recover

Y_l^m(θ, ϕ) = √[(2l + 1)(l - m)!/(4π (l + m)!)] P_l^m(x) e^{imϕ}   (x = cos θ; 0 ≤ θ ≤ π).   (3.238)

Notice, however, that in (3.238) we could not determine the Condon–Shortley phase (-1)^m; see (3.212).
Since P_l^m(x) and P_l^{-m}(x) are linearly dependent, as noted in (3.210), the set of the associated Legendre functions cannot by itself define a complete orthonormal system. In fact, we have

∫_{-1}^1 P_l^m(x) P_l^{-m}(x) dx = (-1)^m [(l - m)!/(l + m)!] [(l + m)!/(l - m)!] [2/(2l + 1)] = 2(-1)^m/(2l + 1).   (3.239)

This means that P_l^m(x) and P_l^{-m}(x) are not orthogonal. Thus, we need e^{imϕ} to constitute the complete orthonormal system. In other words,

∫₀^{2π} dϕ ∫_{-1}^1 d(cos θ) [Y_{l′}^{m′}(θ, ϕ)]* Y_l^m(θ, ϕ) = δ_{ll′} δ_{mm′}.   (3.240)
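The orthogonality relation (3.234) — and hence (3.240) after attaching e^{imϕ}/√(2π) — can be checked by direct quadrature. A stdlib-Python sketch using the Rodrigues-type form (3.176) (the function names `p_lm` and `overlap` are ours, not from the text):

```python
import math

def p_lm(l, m, x):
    """P_l^m(x) for 0 <= m <= l via (3.176)."""
    c = [0.0]*(2*l + 1)
    for k in range(l + 1):
        c[2*k] = (-1)**k * math.comb(l, k)       # (1 - x^2)^l
    for _ in range(l + m):                        # differentiate l + m times
        c = [i*c[i] for i in range(1, len(c))] or [0.0]
    val = sum(ci * x**i for i, ci in enumerate(c))
    return (-1)**l / (2**l * math.factorial(l)) * (1 - x*x)**(m/2) * val

def overlap(l, lp, m, n=20000):
    """Midpoint-rule integral of P_l^m P_l'^m over [-1, 1], cf. (3.234)."""
    h = 2.0 / n
    return h * sum(p_lm(l, m, -1 + (i + 0.5)*h) * p_lm(lp, m, -1 + (i + 0.5)*h)
                   for i in range(n))

for (l, lp, m) in [(2, 3, 1), (3, 3, 2), (4, 2, 0), (4, 4, 3)]:
    exact = (math.factorial(l + m)/math.factorial(l - m) * 2/(2*l + 1)
             if l == lp else 0.0)
    assert abs(overlap(l, lp, m) - exact) < 1e-3
```

The l ≠ l′ cases reproduce (3.228), while the l = l′ cases reproduce the normalization (l + m)!/(l - m)! · 2/(2l + 1).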

3.7 Radial Wave Functions of Hydrogen-Like Atoms

In Sect. 3.1, we constructed the Hamiltonian of hydrogen-like atoms. If the physical system is characterized by a central force field, the method of separation of variables into the angular part (θ, ϕ) and the radial part (r) is successfully applied to the problem, and that method allows us to deal with the Schrödinger equation separately. The spherical surface harmonics play a central role in dealing with the differential equations related to the angular part. We studied important properties of special functions such as the Legendre polynomials and associated Legendre functions, independent of the nature of the specific central force field, such as the Coulomb potential or the Yukawa potential. With the Schrödinger equation pertinent to the radial part, on the other hand, the characteristics differ depending on the nature of the individual force field. Of these, the differential equation associated with the Coulomb potential gives exact (or analytical) solutions. It is well-known that second-order differential equations are often solved by an operator representation method. Examples include its application to a quantum-mechanical harmonic oscillator and the angular momenta of a particle placed in a central force field. Nonetheless, the


corresponding approach to the radial equation for the electron has been less common to date. An initial approach, however, was made by Sunakawa [3]. The purpose of this section rests upon a further improvement of that approach.

3.7.1 Operator Approach to Radial Wave Functions [3]

In Sect. 3.2, the separation of variables led to the radial part of the Schrödinger equation, described as

-\frac{\hbar^2}{2\mu}\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2 \frac{\partial R(r)}{\partial r}\right) + \frac{\hbar^2 \lambda}{2\mu r^2} R(r) - \frac{Ze^2}{4\pi\varepsilon_0 r} R(r) = E R(r). \qquad (3.51)

We identified λ with l(l + 1); see Sect. 3.5. Thus, rewriting (3.51) and indexing R(r) with l, we have

-\frac{\hbar^2}{2\mu r^2}\frac{d}{dr}\left(r^2 \frac{dR_l(r)}{dr}\right) + \frac{\hbar^2 l(l+1)}{2\mu r^2} R_l(r) - \frac{Ze^2}{4\pi\varepsilon_0 r} R_l(r) = E R_l(r), \qquad (3.241)

where R_l(r) is a radial wave function parametrized with l; μ, Z, ε₀, and E denote the reduced mass of the hydrogen-like atom, the atomic number, the permittivity of vacuum, and the energy eigenvalue, respectively. Otherwise we follow conventions. Now we are in a position to solve (3.241). As in the cases of the quantum-mechanical harmonic oscillator of Chap. 2 and the angular momentum operator of the previous section, we present the operator formalism for the radial wave functions of hydrogen-like atoms. The essential point is that the radial wave functions can be derived by successively operating lowering operators on a radial wave function having the maximum allowed orbital angular momentum quantum number. The results agree with the conventional coordinate representation method based upon a power series expansion related to associated Laguerre polynomials. Sunakawa [3] introduced the following differential equation by suitable transformations of variable, parameter, and function:

-\frac{d^2 \psi_l(\rho)}{d\rho^2} + \left[\frac{l(l+1)}{\rho^2} - \frac{2}{\rho}\right] \psi_l(\rho) = \mathcal{E}\, \psi_l(\rho), \qquad (3.242)

where \rho = Zr/a, \mathcal{E} = \frac{2\mu}{\hbar^2}\left(\frac{a}{Z}\right)^2 E, and \psi_l(\rho) = \rho R_l(r), with a \equiv 4\pi\varepsilon_0 \hbar^2 / \mu e^2 being the Bohr radius of a hydrogen-like atom. Note that ρ and \mathcal{E} are dimensionless quantities. The related calculations are as follows: We have


\frac{dR_l}{dr} = \frac{d(\psi_l/\rho)}{d\rho}\frac{d\rho}{dr} = \frac{Z}{a}\left(\frac{1}{\rho}\frac{d\psi_l}{d\rho} - \frac{\psi_l}{\rho^2}\right).

Thus, we get

r^2 \frac{dR_l}{dr} = \frac{a}{Z}\left(\rho \frac{d\psi_l}{d\rho} - \psi_l\right), \qquad \frac{d}{dr}\left(r^2 \frac{dR_l}{dr}\right) = \rho\, \frac{d^2 \psi_l(\rho)}{d\rho^2}.

Using the above relations, we arrive at (3.242). Here we define the following operators:

b_l \equiv \frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l}. \qquad (3.243)

Hence,

b_l^{\dagger} = -\frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l}, \qquad (3.244)

where the operator b_l^{\dagger} is the adjoint of b_l. Notice that these definitions differ from those of Sunakawa [3]. The operator d/d\rho (\equiv A) is formally an anti-Hermitian operator. We have mentioned such an operator in Sect. 1.5. The remaining terms of (3.243) and (3.244), l/\rho - 1/l, constitute a Hermitian operator, which we define as H. Thus, we foresee that b_l and b_l^{\dagger} can be denoted as follows:

b_l = A + H \quad \text{and} \quad b_l^{\dagger} = -A + H.

These representations are analogous to those appearing in the operator formalism of a quantum-mechanical harmonic oscillator. Special care, however, should be taken in dealing with the operators b_l and b_l^{\dagger}. First, we should carefully examine whether d/d\rho is in fact an anti-Hermitian operator. This is because, for d/d\rho to be anti-Hermitian, the solution ψ_l(ρ) must satisfy boundary conditions such that ψ_l(ρ) vanishes or takes the same value at the endpoints ρ → 0 and ρ → ∞. Second, the coordinate system we have chosen is not Cartesian but polar (spherical), and so ρ is defined only on the domain ρ > 0. We will come back to this point later. Let us proceed with the calculations. We have

b_l b_l^{\dagger} = \left(\frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l}\right)\left(-\frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l}\right) = -\frac{d^2}{d\rho^2} + \frac{l(l-1)}{\rho^2} - \frac{2}{\rho} + \frac{1}{l^2}. \qquad (3.245)

Also, we have

b_l^{\dagger} b_l = -\frac{d^2}{d\rho^2} + \frac{l(l+1)}{\rho^2} - \frac{2}{\rho} + \frac{1}{l^2}. \qquad (3.246)

We further define an operator H^{(l)} as follows:

H^{(l)} \equiv -\frac{d^2}{d\rho^2} + \frac{l(l+1)}{\rho^2} - \frac{2}{\rho}.

Then, from (3.243) and (3.244) as well as (3.245) and (3.246), we have

H^{(l)} = b_{l+1} b_{l+1}^{\dagger} + \varepsilon^{(l)} \quad (l \ge 0), \qquad (3.247)

where \varepsilon^{(l)} \equiv -\frac{1}{(l+1)^2}. Alternatively,

H^{(l)} = b_l^{\dagger} b_l + \varepsilon^{(l-1)} \quad (l \ge 1). \qquad (3.248)
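As a quick consistency check (an illustrative sketch, not part of the text), one can verify (3.247) and (3.248) by applying both sides to a trial function in SymPy:

```python
import sympy as sp

rho = sp.symbols('rho', positive=True)

def b(l, g):      # lowering operator (3.243) applied to g
    return sp.diff(g, rho) + (sp.Integer(l)/rho - sp.Rational(1, l))*g

def bdag(l, g):   # adjoint operator (3.244) applied to g
    return -sp.diff(g, rho) + (sp.Integer(l)/rho - sp.Rational(1, l))*g

def H(l, g):      # H^{(l)} applied to g
    return -sp.diff(g, rho, 2) + (sp.Integer(l*(l + 1))/rho**2 - 2/rho)*g

eps = lambda l: -sp.Rational(1, (l + 1)**2)
trial = rho**3 * sp.exp(-rho)          # arbitrary smooth trial function

for l in (1, 2, 3):
    # (3.247): H^{(l)} = b_{l+1} b_{l+1}^dagger + eps^{(l)}
    assert sp.simplify(H(l, trial) - b(l + 1, bdag(l + 1, trial))
                       - eps(l)*trial) == 0
    # (3.248): H^{(l)} = b_l^dagger b_l + eps^{(l-1)}
    assert sp.simplify(H(l, trial) - bdag(l, b(l, trial))
                       - eps(l - 1)*trial) == 0
print("(3.247) and (3.248) hold on the trial function")
```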

If we put l = n − 1 in (3.247), with n being a fixed given integer larger than l, we obtain

H^{(n-1)} = b_n b_n^{\dagger} + \varepsilon^{(n-1)}. \qquad (3.249)

We evaluate the following inner product formed from both sides of (3.249):

\langle \chi | H^{(n-1)} | \chi \rangle = \langle \chi | b_n b_n^{\dagger} | \chi \rangle + \varepsilon^{(n-1)} \langle \chi | \chi \rangle = \langle b_n^{\dagger}\chi | b_n^{\dagger}\chi \rangle + \varepsilon^{(n-1)} \langle \chi | \chi \rangle = \left\| b_n^{\dagger} |\chi\rangle \right\|^2 + \varepsilon^{(n-1)} \langle \chi | \chi \rangle \ge \varepsilon^{(n-1)}. \qquad (3.250)

Here we assume that χ is normalized (i.e., ⟨χ|χ⟩ = 1). On the basis of the variational principle [9], the above expectation value must take the minimum ε^{(n−1)} for χ to be an eigenfunction. To satisfy this condition, we have


| b_n^{\dagger} \chi \rangle = 0. \qquad (3.251)

In fact, if (3.251) holds, from (3.249) we have

H^{(n-1)} \chi = \varepsilon^{(n-1)} \chi. \qquad (3.252)

We define such a function as follows:

\psi_{n-1}^{(n)} \equiv \chi. \qquad (3.253)

From (3.247) and (3.248), we have the following relationship:

H^{(l)} b_{l+1} = b_{l+1} H^{(l+1)} \quad (l \ge 0). \qquad (3.254)

Meanwhile, we define the functions shown in the following:

\psi_{n-s}^{(n)} \equiv b_{n-s+1} b_{n-s+2} \cdots b_{n-1} \psi_{n-1}^{(n)} \quad (2 \le s \le n). \qquad (3.255)

In these functions, (s − 1) operators have been applied to \psi_{n-1}^{(n)}. Note that if s were 1, no operation of b_l would take place. Thus, we find that b_l acts upon the l-state to produce the (l − 1)-state. That is, b_l acts as an annihilation operator. For the sake of convenience we write

H^{(n,s)} \equiv H^{(n-s)}. \qquad (3.256)

Using this notation and (3.254), we have

H^{(n,s)} \psi_{n-s}^{(n)} = H^{(n,s)} b_{n-s+1} b_{n-s+2} \cdots b_{n-1} \psi_{n-1}^{(n)}
= b_{n-s+1} H^{(n,s-1)} b_{n-s+2} \cdots b_{n-1} \psi_{n-1}^{(n)}
= b_{n-s+1} b_{n-s+2} H^{(n,s-2)} \cdots b_{n-1} \psi_{n-1}^{(n)}
= \cdots
= b_{n-s+1} b_{n-s+2} \cdots H^{(n,2)} b_{n-1} \psi_{n-1}^{(n)}
= b_{n-s+1} b_{n-s+2} \cdots b_{n-1} H^{(n,1)} \psi_{n-1}^{(n)}
= b_{n-s+1} b_{n-s+2} \cdots b_{n-1} \varepsilon^{(n-1)} \psi_{n-1}^{(n)}
= \varepsilon^{(n-1)} \psi_{n-s}^{(n)}. \qquad (3.257)

Thus, all n functions \psi_{n-s}^{(n)} (1 \le s \le n) belong to the same eigenvalue \varepsilon^{(n-1)}. Notice that the eigenenergy E_n corresponding to \varepsilon^{(n-1)} is given by

E_n = -\frac{\hbar^2}{2\mu}\left(\frac{Z}{a}\right)^2 \frac{1}{n^2}. \qquad (3.258)

If we define l \equiv n - s and take account of (3.252), all n functions \psi_l^{(n)} (l = 0, 1, 2, \cdots, n - 1) belong to the same eigenvalue \varepsilon^{(n-1)}. The quantum state \psi_l^{(n)} is associated with the operator H^{(l)}. Thus, the solutions of (3.242) are given by the functions \psi_l^{(n)}, parametrized with n and l, on condition that (3.251) holds. As explicitly indicated in (3.255) and (3.257), b_l lowers the parameter l by one, from l to l − 1, when it operates on \psi_l^{(n)}. The operator b_0 cannot be defined, as indicated in (3.243), and so the lowest value of l is zero. Operators such as b_l are known as ladder operators (lowering operators, or annihilation operators in the present case). The implication is that successive operations of b_l on \psi_{n-1}^{(n)} produce the various values of l as a subscript down to zero, while retaining the same integer parameter n as a superscript.
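Numerically, (3.258) reproduces the familiar Bohr energies. Below is a minimal sketch (an illustration assuming SciPy's physical-constants table; the reduced-mass correction is ignored, so the prefactor is the Rydberg energy):

```python
from scipy.constants import physical_constants

Ry = physical_constants["Rydberg constant times hc in eV"][0]  # about 13.6057 eV

def E_n(n, Z=1):
    """E_n = -(hbar^2 / 2 mu)(Z/a)^2 / n^2, expressed via the Rydberg energy (eV)."""
    return -Ry * Z**2 / n**2

# each level E_n is shared by all l = 0, ..., n-1
for n in (1, 2, 3):
    print(n, round(E_n(n), 4))
```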

3.7.2 Normalization of Radial Wave Functions [10]

Next we seek the normalized eigenfunctions. The coordinate representation of (3.251) takes the form

-\frac{d\psi_{n-1}^{(n)}}{d\rho} + \left(\frac{n}{\rho} - \frac{1}{n}\right)\psi_{n-1}^{(n)} = 0. \qquad (3.259)

The solution can be obtained as

\psi_{n-1}^{(n)} = c_n \rho^n e^{-\rho/n}, \qquad (3.260)

where c_n is a normalization constant. This can be determined from

\int_0^{\infty} \left|\psi_{n-1}^{(n)}\right|^2 d\rho = 1. \qquad (3.261)

Namely,

|c_n|^2 \int_0^{\infty} \rho^{2n} e^{-2\rho/n}\, d\rho = 1. \qquad (3.262)

Consider the following definite integral:

\int_0^{\infty} e^{-2\rho\xi}\, d\rho = \frac{1}{2\xi}.

Differentiating the above integral 2n times with respect to ξ gives

\int_0^{\infty} \rho^{2n} e^{-2\rho\xi}\, d\rho = \left(\frac{1}{2}\right)^{2n+1} (2n)!\, \xi^{-(2n+1)}. \qquad (3.263)

Substituting 1/n for ξ, we obtain

\int_0^{\infty} \rho^{2n} e^{-2\rho/n}\, d\rho = \left(\frac{1}{2}\right)^{2n+1} (2n)!\, n^{2n+1}. \qquad (3.264)

Hence,

c_n = \left(\frac{2}{n}\right)^{n+\frac{1}{2}} \Big/ \sqrt{(2n)!}. \qquad (3.265)
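The definite integral (3.264) and hence the constant (3.265) are easy to confirm with a computer-algebra sketch (illustrative, assuming SymPy):

```python
import sympy as sp

rho = sp.symbols('rho', positive=True)

for n in (1, 2, 3):
    # (3.264): integral of rho^{2n} e^{-2 rho/n} over (0, oo)
    lhs = sp.integrate(rho**(2*n) * sp.exp(-2*rho/n), (rho, 0, sp.oo))
    rhs = sp.Rational(1, 2)**(2*n + 1) * sp.factorial(2*n) * n**(2*n + 1)
    assert sp.simplify(lhs - rhs) == 0
    # (3.265): |c_n|^2 times the integral equals 1
    cn = sp.Rational(2, n)**(n + sp.Rational(1, 2)) / sp.sqrt(sp.factorial(2*n))
    assert sp.simplify(cn**2 * lhs - 1) == 0
print("(3.264) and (3.265) verified for n = 1, 2, 3")
```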

To further normalize the other wave functions, we calculate the following inner product:

\langle \psi_l^{(n)} | \psi_l^{(n)} \rangle = \langle \psi_{n-1}^{(n)} | b_{n-1}^{\dagger} \cdots b_{l+2}^{\dagger} b_{l+1}^{\dagger}\, b_{l+1} b_{l+2} \cdots b_{n-1} | \psi_{n-1}^{(n)} \rangle. \qquad (3.266)

From (3.247) and (3.248), we have

b_l^{\dagger} b_l + \varepsilon^{(l-1)} = b_{l+1} b_{l+1}^{\dagger} + \varepsilon^{(l)} \quad (l \ge 1). \qquad (3.267)

Applying (3.267) to (3.266) repeatedly and considering (3.251), we reach the following relationship:

\langle \psi_l^{(n)} | \psi_l^{(n)} \rangle = \left[\varepsilon^{(n-1)} - \varepsilon^{(n-2)}\right]\left[\varepsilon^{(n-1)} - \varepsilon^{(n-3)}\right] \cdots \left[\varepsilon^{(n-1)} - \varepsilon^{(l)}\right] \langle \psi_{n-1}^{(n)} | \psi_{n-1}^{(n)} \rangle. \qquad (3.268)

To show this, we use mathematical induction. We have already normalized \psi_{n-1}^{(n)} in (3.261). Next, we calculate \langle \psi_{n-2}^{(n)} | \psi_{n-2}^{(n)} \rangle such that

\langle \psi_{n-2}^{(n)} | \psi_{n-2}^{(n)} \rangle = \langle b_{n-1}\psi_{n-1}^{(n)} | b_{n-1}\psi_{n-1}^{(n)} \rangle = \langle \psi_{n-1}^{(n)} | b_{n-1}^{\dagger} b_{n-1} | \psi_{n-1}^{(n)} \rangle
= \langle \psi_{n-1}^{(n)} | \left[ b_n b_n^{\dagger} + \varepsilon^{(n-1)} - \varepsilon^{(n-2)} \right] | \psi_{n-1}^{(n)} \rangle
= \langle \psi_{n-1}^{(n)} | b_n b_n^{\dagger} | \psi_{n-1}^{(n)} \rangle + \left[\varepsilon^{(n-1)} - \varepsilon^{(n-2)}\right] \langle \psi_{n-1}^{(n)} | \psi_{n-1}^{(n)} \rangle
= \left[\varepsilon^{(n-1)} - \varepsilon^{(n-2)}\right] \langle \psi_{n-1}^{(n)} | \psi_{n-1}^{(n)} \rangle, \qquad (3.269)

where in the third equality we used (3.267) with l = n − 1, and in the last equality we used (3.251). Therefore, (3.268) holds with l = n − 2. Then it suffices to show that, assuming (3.268) holds for \langle \psi_{l+1}^{(n)} | \psi_{l+1}^{(n)} \rangle, it holds for \langle \psi_l^{(n)} | \psi_l^{(n)} \rangle as well. Let us calculate \langle \psi_l^{(n)} | \psi_l^{(n)} \rangle, starting with (3.266), as follows:

\langle \psi_l^{(n)} | \psi_l^{(n)} \rangle = \langle \psi_{n-1}^{(n)} | b_{n-1}^{\dagger} \cdots b_{l+2}^{\dagger} b_{l+1}^{\dagger}\, b_{l+1} b_{l+2} \cdots b_{n-1} | \psi_{n-1}^{(n)} \rangle
= \langle \psi_{n-1}^{(n)} | b_{n-1}^{\dagger} \cdots b_{l+2}^{\dagger} \left[ b_{l+2} b_{l+2}^{\dagger} + \varepsilon^{(l+1)} - \varepsilon^{(l)} \right] b_{l+2} \cdots b_{n-1} | \psi_{n-1}^{(n)} \rangle
= \langle \psi_{n-1}^{(n)} | b_{n-1}^{\dagger} \cdots b_{l+2}^{\dagger} b_{l+2}\, b_{l+2}^{\dagger} b_{l+2} \cdots b_{n-1} | \psi_{n-1}^{(n)} \rangle + \left[ \varepsilon^{(l+1)} - \varepsilon^{(l)} \right] \langle \psi_{l+1}^{(n)} | \psi_{l+1}^{(n)} \rangle. \qquad (3.270)

In the next step, using b_{l+2}^{\dagger} b_{l+2} = b_{l+3} b_{l+3}^{\dagger} + \varepsilon^{(l+2)} - \varepsilon^{(l+1)}, we have

\langle \psi_l^{(n)} | \psi_l^{(n)} \rangle = \langle \psi_{n-1}^{(n)} | b_{n-1}^{\dagger} \cdots b_{l+2}^{\dagger} b_{l+2}\, b_{l+3} b_{l+3}^{\dagger}\, b_{l+3} \cdots b_{n-1} | \psi_{n-1}^{(n)} \rangle + \left[ \varepsilon^{(l+1)} - \varepsilon^{(l)} \right] \langle \psi_{l+1}^{(n)} | \psi_{l+1}^{(n)} \rangle + \left[ \varepsilon^{(l+2)} - \varepsilon^{(l+1)} \right] \langle \psi_{l+1}^{(n)} | \psi_{l+1}^{(n)} \rangle. \qquad (3.271)

Thus, we find that in the first term the index of b_{l+3}^{\dagger} has been increased by one, with the operator itself transferred toward the right side. On the other hand, we notice that between the second and third terms, \varepsilon^{(l+1)} \langle \psi_{l+1}^{(n)} | \psi_{l+1}^{(n)} \rangle cancels out. Repeating the above processes, we reach the following expression:

\langle \psi_l^{(n)} | \psi_l^{(n)} \rangle = \langle \psi_{n-1}^{(n)} | b_{n-1}^{\dagger} \cdots b_{l+2}^{\dagger} b_{l+2} \cdots b_{n-1}\, b_n b_n^{\dagger} | \psi_{n-1}^{(n)} \rangle
+ \left[ \varepsilon^{(n-1)} - \varepsilon^{(n-2)} \right] \langle \psi_{l+1}^{(n)} | \psi_{l+1}^{(n)} \rangle + \left[ \varepsilon^{(n-2)} - \varepsilon^{(n-3)} \right] \langle \psi_{l+1}^{(n)} | \psi_{l+1}^{(n)} \rangle + \cdots
+ \left[ \varepsilon^{(l+2)} - \varepsilon^{(l+1)} \right] \langle \psi_{l+1}^{(n)} | \psi_{l+1}^{(n)} \rangle + \left[ \varepsilon^{(l+1)} - \varepsilon^{(l)} \right] \langle \psi_{l+1}^{(n)} | \psi_{l+1}^{(n)} \rangle
= \left[ \varepsilon^{(n-1)} - \varepsilon^{(l)} \right] \langle \psi_{l+1}^{(n)} | \psi_{l+1}^{(n)} \rangle. \qquad (3.272)

ð3:272Þ In (3.272), the first term of RHS vanishes because of (3.251); the subsequent ðnÞ ðnÞ terms produce ψ lþ1 jψ lþ1 whose coefficients have canceled out one another except for [ε(n - 1) - ε(l )]. Meanwhile, from assumption of the mathematical induction we have ðnÞ

ðnÞ

ψ lþ1 jψ lþ1 = εðn - 1Þ - εðn - 2Þ εðn - 1Þ - εðn - 3Þ ⋯ εðn - 1Þ - εðlþ1Þ ðnÞ

ðnÞ

 ψ n - 1 jψ n - 1 : Inserting this equation into (3.272), we arrive at (3.268). In other words, we have shown that if (3.268) holds with l = l + 1, (3.268) holds with l = l as well. This ðnÞ ðnÞ with l down to 0. completes the proof to show that (3.268) is true of ψ l jψ l ðnÞ

The normalized wave functions ψ l ðnÞ

are expressed from (3.255) as ðnÞ

ψ l = κ ðn, lÞ - 2 blþ1 blþ2 ⋯bn - 1 ψ n - 1 , 1

ð3:273Þ

where κ(n, l ) is defined such that κ ðn, lÞ  εðn - 1Þ - εðn - 2Þ  εðn - 1Þ - εðn - 3Þ ⋯ εðn - 1Þ - εðlÞ ,

ð3:274Þ

with l ≤ n - 2. More explicitly, we get κðn, lÞ =

ð2n - 1Þ!ðn - l - 1Þ!ðl!Þ2 : ðn þ lÞ!ðn!Þ2 ðnn - l - 2 Þ2

ð3:275Þ

In particular, from (3.265) we have

\psi_{n-1}^{(n)} = \left(\frac{2}{n}\right)^{n+\frac{1}{2}} \frac{1}{\sqrt{(2n)!}}\, \rho^n e^{-\rho/n}. \qquad (3.276)

From (3.272), we define the following operator:

\tilde{b}_l \equiv \left[\varepsilon^{(n-1)} - \varepsilon^{(l-1)}\right]^{-\frac{1}{2}} b_l. \qquad (3.277)

Then (3.273) becomes

\psi_l^{(n)} = \tilde{b}_{l+1} \tilde{b}_{l+2} \cdots \tilde{b}_{n-1} \psi_{n-1}^{(n)}. \qquad (3.278)

3.7.3 Associated Laguerre Polynomials

It will be of great importance to compare the functions \psi_l^{(n)} with the conventional wave functions that are expressed using associated Laguerre polynomials. For this purpose we define the following functions \Phi_l^{(n)}(\rho) such that

\Phi_l^{(n)}(\rho) \equiv \left(\frac{2}{n}\right)^{l+\frac{3}{2}} \sqrt{\frac{(n-l-1)!}{2n(n+l)!}}\, e^{-\frac{\rho}{n}}\, \rho^{l+1}\, L_{n-l-1}^{2l+1}\!\left(\frac{2\rho}{n}\right). \qquad (3.279)

The associated Laguerre polynomials are described as

L_n^{\nu}(x) = \frac{1}{n!}\, x^{-\nu} e^x \frac{d^n}{dx^n}\left(x^{n+\nu} e^{-x}\right) \quad (\nu > -1). \qquad (3.280)

In the form of a power series expansion, the polynomials are expressed for integer k \ge 0 as

L_n^{k}(x) = \sum_{m=0}^{n} \frac{(-1)^m (n+k)!}{(n-m)!\,(k+m)!\,m!}\, x^m. \qquad (3.281)

Notice that the "Laguerre polynomials" L_n(x) are defined as L_n(x) \equiv L_n^0(x). Hence, corresponding to (3.280) and (3.281), the Rodrigues formula and power series expansion of L_n(x) are given by [2, 5]

L_n(x) = \frac{1}{n!}\, e^x \frac{d^n}{dx^n}\left(x^n e^{-x}\right), \qquad L_n(x) = \sum_{m=0}^{n} \frac{(-1)^m\, n!}{(n-m)!\,(m!)^2}\, x^m.
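The Rodrigues-type formula (3.280) and the series (3.281) define the same polynomials; a short sketch (illustrative, assuming SymPy, whose `assoc_laguerre` follows the same convention) checks this:

```python
import sympy as sp

x = sp.symbols('x')

def L_rodrigues(n, nu):
    # (3.280): L_n^nu(x) = (1/n!) x^{-nu} e^x d^n/dx^n (x^{n+nu} e^{-x})
    return sp.expand(sp.exp(x) * x**(-nu)
                     * sp.diff(x**(n + nu) * sp.exp(-x), x, n)
                     / sp.factorial(n))

def L_series(n, k):
    # (3.281): power series form with integer k >= 0
    return sum((-1)**m * sp.factorial(n + k) * x**m
               / (sp.factorial(n - m) * sp.factorial(k + m) * sp.factorial(m))
               for m in range(n + 1))

for n in range(4):
    for k in range(3):
        assert sp.simplify(L_rodrigues(n, k) - L_series(n, k)) == 0
        assert sp.simplify(L_series(n, k) - sp.assoc_laguerre(n, k, x)) == 0
print("(3.280) and (3.281) agree with assoc_laguerre")
```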


The function \Phi_l^{(n)}(\rho) contains the multiplicative factors e^{-\rho/n} and \rho^{l+1}. The function L_{n-l-1}^{2l+1}(2\rho/n) is a polynomial in ρ of highest order \rho^{n-l-1}. Therefore, \Phi_l^{(n)}(\rho) consists of a sum of terms containing e^{-\rho/n}\rho^t, where t is an integer equal to 1 or larger. Consequently, \Phi_l^{(n)}(\rho) \to 0 when ρ → 0 and ρ → ∞ (vide supra). Thus, we have confirmed that \Phi_l^{(n)}(\rho) certainly satisfies the proper boundary conditions mentioned earlier and, hence, that the operator d/dρ is indeed anti-Hermitian. To show this more explicitly, we define D \equiv d/d\rho. An inner product between arbitrarily chosen functions f and g is

\langle f | Dg \rangle \equiv \int_0^{\infty} f^* (Dg)\, d\rho = \left[f^* g\right]_0^{\infty} - \int_0^{\infty} (Df^*)\, g\, d\rho = \left[f^* g\right]_0^{\infty} + \langle -D^* f | g \rangle, \qquad (3.282)

where f^* is the complex conjugate of f. Meanwhile, from (1.112) we have

\langle f | Dg \rangle = \langle D^{\dagger} f | g \rangle. \qquad (3.283)

Therefore, if the functions f and g vanish at ρ → 0 and ρ → ∞, we obtain D^{\dagger} = -D^* by equating (3.282) and (3.283). However, since D is a real operator, D^* = D. Thus we get

D^{\dagger} = -D.

This means that D is anti-Hermitian. The functions \Phi_l^{(n)}(\rho) we are dealing with certainly satisfy the required boundary conditions. The operator H^{(l)} appearing in (3.247) and (3.248) is accordingly Hermitian. This is because

b_l^{\dagger} b_l = (-A + H)(A + H) = H^2 - A^2 - AH + HA, \qquad (3.284)

\left(b_l^{\dagger} b_l\right)^{\dagger} = \left(H^{\dagger}\right)^2 - \left(A^{\dagger}\right)^2 - H^{\dagger} A^{\dagger} + A^{\dagger} H^{\dagger} = H^2 - (-A)^2 - H(-A) + (-A)H = H^2 - A^2 + HA - AH = b_l^{\dagger} b_l. \qquad (3.285)

The Hermiticity is true of b_l b_l^{\dagger} as well. Thus, the eigenvalues and the eigenstates (wave functions) belonging to them are physically meaningful. Next, consider the following operation:

\tilde{b}_l \Phi_l^{(n)}(\rho) = \left[\varepsilon^{(n-1)} - \varepsilon^{(l-1)}\right]^{-\frac{1}{2}} \left(\frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l}\right) \left(\frac{2}{n}\right)^{l+\frac{3}{2}} \sqrt{\frac{(n-l-1)!}{2n(n+l)!}}\, e^{-\frac{\rho}{n}}\, \rho^{l+1}\, L_{n-l-1}^{2l+1}\!\left(\frac{2\rho}{n}\right), \qquad (3.286)

where

\left[\varepsilon^{(n-1)} - \varepsilon^{(l-1)}\right]^{-\frac{1}{2}} = \frac{nl}{\sqrt{(n+l)(n-l)}}. \qquad (3.287)

Rewriting L_{n-l-1}^{2l+1}(2\rho/n) in a power series expansion form using (3.281), or equivalently in closed form using the Rodrigues formula (3.280), and rearranging the result, we obtain

\tilde{b}_l \Phi_l^{(n)}(\rho) = \left(\frac{2}{n}\right)^{l+\frac{3}{2}} \frac{nl}{\sqrt{(n+l)(n-l)}}\, \frac{1}{\sqrt{2n(n+l)!\,(n-l-1)!}} \left(\frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l}\right) e^{-\rho/n} \sum_{m=0}^{n-l-1} \frac{(-1)^m (n+l)!\,(n-l-1)!}{m!\,(n-l-m-1)!\,(2l+m+1)!} \left(\frac{2}{n}\right)^m \rho^{l+m+1}

= \left(\frac{2}{n}\right)^{l+\frac{3}{2}} \frac{nl}{\sqrt{(n+l)(n-l)}}\, \frac{1}{\sqrt{2n(n+l)!\,(n-l-1)!}} \left(\frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l}\right) e^{\rho/n} \rho^{-l}\, \frac{d^{n-l-1}}{d\rho^{n-l-1}}\left(\rho^{n+l} e^{-\frac{2\rho}{n}}\right), \qquad (3.288)

where we used the well-known Leibniz rule for higher-order differentiation of a product function, i.e., for \frac{d^{n-l-1}}{d\rho^{n-l-1}}\left(\rho^{n+l} e^{-\frac{2\rho}{n}}\right). To perform the further calculation, notice that d/dρ does not change the functional form of e^{-\rho/n}, whereas it lowers the order of \rho^{l+m+1} by one. Meanwhile, the operation of l/\rho lowers the order of \rho^{l+m+1} by one as well. The factor \frac{n+l}{2l} in the following equation (3.289) results from these calculation processes. Considering these characteristics of the operator, we get

\tilde{b}_l \Phi_l^{(n)}(\rho) = \left(\frac{2}{n}\right)^{l+\frac{3}{2}} \frac{nl}{\sqrt{(n+l)(n-l)}}\, \frac{1}{\sqrt{2n(n+l)!\,(n-l-1)!}}\, e^{-\rho/n} \rho^l\, \frac{n+l}{2l} \times \left\{ \sum_{m=0}^{n-l-1} \frac{(-1)^{m+1} (n+l)!\,(n-l-1)!}{(2l+m+1)!\,m!\,(n-l-m-1)!} \left(\frac{2}{n}\right)^{m+1} \rho^{m+1} + 2l \sum_{m=0}^{n-l-1} \frac{(-1)^{m} (n+l-1)!\,(n-l-1)!}{(2l+m)!\,m!\,(n-l-m-1)!} \left(\frac{2}{n}\right)^{m} \rho^{m} \right\}. \qquad (3.289)

In (3.289), the calculation of the part {⋯} on the RHS is somewhat complicated, so we outline the calculation procedure below. We have


\{\cdots\}\ \text{of the RHS of (3.289)} = \sum_{m=1}^{n-l} \frac{(-1)^{m} (n+l)!\,(n-l-1)!}{(2l+m)!\,(m-1)!\,(n-l-m)!} \left(\frac{2\rho}{n}\right)^{m} + 2l \sum_{m=0}^{n-l-1} \frac{(-1)^{m} (n+l-1)!\,(n-l-1)!}{(2l+m)!\,m!\,(n-l-m-1)!} \left(\frac{2\rho}{n}\right)^{m}

= \sum_{m=1}^{n-l-1} \frac{(-1)^{m} (n-l-1)!\,(n+l-1)!}{(2l+m)!} \left[ \frac{n+l}{(m-1)!\,(n-l-m)!} + \frac{2l}{m!\,(n-l-m-1)!} \right] \left(\frac{2\rho}{n}\right)^{m} + (-1)^{n-l} \left(\frac{2\rho}{n}\right)^{n-l} + \frac{(n+l-1)!}{(2l-1)!}

= \sum_{m=1}^{n-l-1} \frac{(-1)^{m} (n-l-1)!\,(n+l-1)!\,(2l+m)(n-l)}{(2l+m)!\,m!\,(n-l-m)!} \left(\frac{2\rho}{n}\right)^{m} + (-1)^{n-l} \left(\frac{2\rho}{n}\right)^{n-l} + \frac{(n+l-1)!}{(2l-1)!}

= \sum_{m=1}^{n-l-1} \frac{(-1)^{m} (n-l)!\,(n+l-1)!}{(2l+m-1)!\,m!\,(n-l-m)!} \left(\frac{2\rho}{n}\right)^{m} + (-1)^{n-l} \left(\frac{2\rho}{n}\right)^{n-l} + \frac{(n+l-1)!}{(2l-1)!}

= (n-l)! \sum_{m=0}^{n-l} \frac{(-1)^{m} (n+l-1)!}{(2l+m-1)!\,(n-l-m)!\,m!} \left(\frac{2\rho}{n}\right)^{m} = (n-l)!\, L_{n-l}^{2l-1}\!\left(\frac{2\rho}{n}\right). \qquad (3.290)

Notice that in the second equality of (3.290), the summation has been divided into three parts, i.e., 1 ≤ m ≤ n − l − 1, m = n − l (the highest-order term), and m = 0 (the lowest-order term). In the second-to-last equality, the highest-order term and the lowest-order term (a constant) have been absorbed into a single expression, namely an associated Laguerre polynomial; correspondingly, the summation range has been extended to 0 ≤ m ≤ n − l. Summarizing the above results, we get

\tilde{b}_l \Phi_l^{(n)}(\rho) = \left(\frac{2}{n}\right)^{l+\frac{3}{2}} \frac{nl\,(n-l)!}{\sqrt{(n+l)(n-l)}}\, \frac{n+l}{2l}\, \frac{1}{\sqrt{2n(n+l)!\,(n-l-1)!}}\, e^{-\rho/n} \rho^{l}\, L_{n-l}^{2l-1}\!\left(\frac{2\rho}{n}\right)

= \left(\frac{2}{n}\right)^{l+\frac{1}{2}} \sqrt{\frac{(n-l)!}{2n(n+l-1)!}}\, e^{-\rho/n} \rho^{l}\, L_{n-l}^{2l-1}\!\left(\frac{2\rho}{n}\right)

= \left(\frac{2}{n}\right)^{(l-1)+\frac{3}{2}} \sqrt{\frac{[n-(l-1)-1]!}{2n\,[n+(l-1)]!}}\, e^{-\frac{\rho}{n}}\, \rho^{(l-1)+1}\, L_{n-(l-1)-1}^{2(l-1)+1}\!\left(\frac{2\rho}{n}\right) \equiv \Phi_{l-1}^{(n)}(\rho). \qquad (3.291)

Thus, we find that \Phi_l^{(n)}(\rho) behaves exactly like \psi_l^{(n)}. Moreover, if we replace l in (3.279) with n − 1, we find

\Phi_{n-1}^{(n)}(\rho) = \psi_{n-1}^{(n)}. \qquad (3.292)

Operating \tilde{b}_{n-1} on both sides of (3.292),

\Phi_{n-2}^{(n)}(\rho) = \psi_{n-2}^{(n)}. \qquad (3.293)

Likewise, successively operating \tilde{b}_l (1 \le l \le n - 1), we get

\Phi_l^{(n)}(\rho) = \psi_l^{(n)}(\rho) \qquad (3.294)

with all allowed values of l (i.e., 0 \le l \le n - 1). This permits us to identify

\Phi_l^{(n)}(\rho) \equiv \psi_l^{(n)}(\rho). \qquad (3.295)
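The identification (3.295) can be spot-checked by computer algebra. The sketch below (an illustration assuming SymPy, not part of the text) builds \Phi_l^{(n)} from (3.279) and verifies the lowering relation (3.291) for a few (n, l):

```python
import sympy as sp

rho = sp.symbols('rho', positive=True)

def Phi(n, l):
    # (3.279): Phi_l^{(n)}(rho)
    pref = sp.Rational(2, n)**(l + sp.Rational(3, 2)) * sp.sqrt(
        sp.factorial(n - l - 1) / (2*n*sp.factorial(n + l)))
    return pref * sp.exp(-rho/n) * rho**(l + 1) \
        * sp.assoc_laguerre(n - l - 1, 2*l + 1, 2*rho/n)

def eps(l):
    return -sp.Rational(1, (l + 1)**2)

def b_tilde(n, l, f):
    # (3.277): normalized lowering operator applied to f
    bf = sp.diff(f, rho) + (sp.Integer(l)/rho - sp.Rational(1, l))*f
    return bf / sp.sqrt(eps(n - 1) - eps(l - 1))

for n, l in [(2, 1), (3, 1), (3, 2), (4, 2)]:
    assert sp.simplify(b_tilde(n, l, Phi(n, l)) - Phi(n, l - 1)) == 0
print("lowering relation (3.291) verified")
```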

Consequently, it is clear that the parameter n introduced in (3.249) is identical to the principal quantum number and that the parameter l (0 \le l \le n - 1) is the orbital angular momentum quantum number (or azimuthal quantum number). The functions \Phi_l^{(n)}(\rho) and \psi_l^{(n)}(\rho) are identical up to the constant c_n expressed in (3.265). Note, however, that a complex constant with an absolute value of 1 (a phase factor) remains undetermined, as is always the case with eigenvalue problems. The radial wave functions are derived from the following relationship, as described earlier:

R_l^{(n)}(r) = \psi_l^{(n)}/\rho. \qquad (3.296)

To normalize R_l^{(n)}(r), we have to calculate the following integral:

\int_0^{\infty} \left[R_l^{(n)}(r)\right]^2 r^2\, dr = \int_0^{\infty} \frac{1}{\rho^2}\left[\psi_l^{(n)}\right]^2 \left(\frac{a}{Z}\right)^2 \rho^2\, \frac{a}{Z}\, d\rho = \left(\frac{a}{Z}\right)^3 \int_0^{\infty} \left[\psi_l^{(n)}\right]^2 d\rho = \left(\frac{a}{Z}\right)^3. \qquad (3.297)

Accordingly, we choose the following functions \tilde{R}_l^{(n)}(r) as the normalized radial wave functions:

\tilde{R}_l^{(n)}(r) = \sqrt{(Z/a)^3}\, R_l^{(n)}(r). \qquad (3.298)

Substituting (3.296) into (3.298) and taking account of (3.279) and (3.280), we obtain

\tilde{R}_l^{(n)}(r) = \sqrt{\left(\frac{2Z}{an}\right)^3 \frac{(n-l-1)!}{2n(n+l)!}}\, \left(\frac{2Zr}{an}\right)^l \exp\left(-\frac{Zr}{an}\right) L_{n-l-1}^{2l+1}\!\left(\frac{2Zr}{an}\right). \qquad (3.299)

Equation (3.299) is exactly the normalized radial wave function that can also be obtained as the solution of (3.241) through a power series expansion. All these functions belong to the same eigenenergy E_n = -\frac{\hbar^2}{2\mu}\left(\frac{Z}{a}\right)^2 \frac{1}{n^2}; see (3.258). Returning to the quantum states sharing the same eigenenergy, we have n states that belong to the principal quantum number n with varying azimuthal quantum number l (0 \le l \le n - 1). Meanwhile, from (3.158), (3.159), and (3.171), 2l + 1 quantum states possess the eigenvalue l(l + 1) of the quantity M². As already mentioned in Sects. 3.5 and 3.6, the integer m takes the 2l + 1 different values

m = l,\ l - 1,\ l - 2,\ \cdots,\ 1,\ 0,\ -1,\ \cdots,\ -l + 1,\ -l.

The quantum states of a hydrogen-like atom are characterized by a set of integers (m, l, n). On the basis of the above discussion, the number of states sharing the same energy (i.e., the same principal quantum number n) is

\sum_{l=0}^{n-1} (2l + 1) = n^2.

Regarding this situation, we say that the quantum states belonging to the principal quantum number n, or to the energy eigenvalue E_n = -\frac{\hbar^2}{2\mu}\left(\frac{Z}{a}\right)^2 \frac{1}{n^2}, are degenerate n²-fold. Note that we have ignored the freedom of the spin state.

In summary of this section, we have developed the operator formalism for the radial wave functions of hydrogen-like atoms and seen how that formalism features them. The essential point is that the radial wave functions can be derived by successively operating the lowering


operators \tilde{b}_l on \psi_{n-1}^{(n)}, which is parametrized with a principal quantum number n and an orbital angular momentum quantum number l = n − 1. This is clearly represented by (3.278). The results agree with the conventional coordinate representation method based upon the power series expansion that leads to associated Laguerre polynomials. Thus, the operator formalism is again found to be powerful in explicitly representing the mathematical constitution of quantum-mechanical systems.
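Both closing claims, the normalization of (3.299) and the n²-fold degeneracy, can be checked directly. A sketch (illustrative, assuming SymPy, in units with Z = a = 1):

```python
import sympy as sp

r = sp.symbols('r', positive=True)

def R_nl(n, l):
    # Normalized radial function (3.299) with Z = a = 1
    x = 2*r/n
    pref = sp.sqrt(sp.Rational(2, n)**3
                   * sp.factorial(n - l - 1) / (2*n*sp.factorial(n + l)))
    return pref * x**l * sp.exp(-r/n) * sp.assoc_laguerre(n - l - 1, 2*l + 1, x)

# normalization: integral of R^2 r^2 dr over (0, oo) equals 1
for n, l in [(1, 0), (2, 0), (2, 1), (3, 1)]:
    assert sp.simplify(sp.integrate(R_nl(n, l)**2 * r**2, (r, 0, sp.oo)) - 1) == 0

# degeneracy: sum over l of (2l + 1) equals n^2 (spin ignored)
for n in range(1, 8):
    assert sum(2*l + 1 for l in range(n)) == n**2
print("normalization of (3.299) and n^2 degeneracy confirmed")
```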

3.8 Total Wave Functions

Since we have obtained the angular and the radial wave functions, we describe the normalized total wave functions \Lambda_{l,m}^{(n)} of hydrogen-like atoms as a product of the angular part and the radial part such that

\Lambda_{l,m}^{(n)} = Y_l^m(\theta, \phi)\, \tilde{R}_l^{(n)}(r). \qquad (3.300)

Let us seek several tangible functional forms for hydrogen (Z = 1), including the angular and radial parts. For example, we have

\phi(1s) \equiv Y_0^0(\theta, \phi)\, \tilde{R}_0^{(1)}(r) = \sqrt{\frac{1}{4\pi}}\, a^{-3/2}\, \frac{\psi_0^{(1)}}{\rho} = \frac{1}{\sqrt{\pi}}\, a^{-3/2}\, e^{-r/a}, \qquad (3.301)

where we used (3.276) and (3.295). For ϕ(2s), using (3.277) and (3.278) we have

\phi(2s) \equiv Y_0^0(\theta, \phi)\, \tilde{R}_0^{(2)}(r) = \frac{1}{4\sqrt{2\pi}}\, a^{-3/2}\, e^{-\frac{r}{2a}} \left(2 - \frac{r}{a}\right). \qquad (3.302)

For ϕ(2p_z), in turn, we express it as

\phi(2p_z) \equiv Y_1^0(\theta, \phi)\, \tilde{R}_1^{(2)}(r) = \sqrt{\frac{3}{4\pi}}\,(\cos\theta)\, \frac{1}{2\sqrt{6}}\, a^{-3/2}\, \frac{r}{a}\, e^{-\frac{r}{2a}} = \frac{1}{4\sqrt{2\pi}}\, a^{-3/2}\, \frac{r}{a}\, e^{-\frac{r}{2a}} \cos\theta = \frac{1}{4\sqrt{2\pi}}\, a^{-5/2}\, e^{-\frac{r}{2a}}\, z. \qquad (3.303)

For ϕ(2p_{x+iy}), using (3.217) we get

\phi(2p_{x+iy}) \equiv Y_1^1(\theta, \phi)\, \tilde{R}_1^{(2)}(r) = -\frac{1}{8\sqrt{\pi}}\, a^{-3/2}\, \frac{r}{a}\, e^{-\frac{r}{2a}} \sin\theta\, e^{i\phi} = -\frac{1}{8\sqrt{\pi}}\, a^{-5/2}\, e^{-\frac{r}{2a}}\, (x + iy). \qquad (3.304)

In (3.304), the minus sign comes from the Condon–Shortley phase. Furthermore, we have

\phi(2p_{x-iy}) \equiv Y_1^{-1}(\theta, \phi)\, \tilde{R}_1^{(2)}(r) = \frac{1}{8\sqrt{\pi}}\, a^{-3/2}\, \frac{r}{a}\, e^{-\frac{r}{2a}} \sin\theta\, e^{-i\phi} = \frac{1}{8\sqrt{\pi}}\, a^{-5/2}\, e^{-\frac{r}{2a}}\, (x - iy). \qquad (3.305)

Notice that the above notations ϕ(2px + iy) and ϕ(2px - iy) differ from the custom that uses, e.g., ϕ(2px) and ϕ(2py). We will come back to this point in Sect. 4.3.

References

1. Schiff LI (1955) Quantum mechanics, 2nd edn. McGraw-Hill, New York, NY
2. Arfken GB (1970) Mathematical methods for physicists, 2nd edn. Academic Press, Waltham, MA
3. Sunakawa S (1991) Quantum mechanics. Iwanami, Tokyo (in Japanese)
4. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover, New York, NY
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York, NY
6. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering, 3rd edn. Cambridge University Press, Cambridge
7. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham, MA
8. Lebedev NN (1972) Special functions and their applications. Dover, New York, NY
9. Stakgold I (1998) Green's functions and boundary value problems, 2nd edn. Wiley, New York, NY
10. Hotta S (2017) Operator representations for radial wave functions of hydrogen-like atoms. Bull Kyoto Inst Technol 9:1–12

Chapter 4

Optical Transition and Selection Rules

In Sect. 1.2 we showed the Schrödinger equation as a function of the space coordinates and time. In subsequent sections, we dealt with the time-independent eigenvalue problems of a harmonic oscillator and hydrogen-like atoms. This implies that the physical system is isolated from the outside world and that there is no interaction between the outside world and the physical system we are considering. By virtue of such an interaction, however, the system may acquire or lose energy, momentum, angular momentum, etc. As a consequence of the interaction, the system changes its quantum state as well. Such a change is called a transition. If the interaction takes place as an optical process, we deal with an optical transition. Of the various optical transitions, the electric dipole transition is the most common and important. In this chapter, we study the optical transitions of a particle confined in a potential well, a harmonic oscillator, and a hydrogen atom using a semiclassical approach. The question of whether a transition is allowed or forbidden is of great importance. We have selection rules to judge this.

4.1 Electric Dipole Transition

We have a time-dependent Schrödinger equation described as

H\psi = i\hbar \frac{\partial \psi}{\partial t}. \qquad (1.47)

Using the method of separation of variables, we obtained the two equations expressed below:

H\phi(x) = E\phi(x), \qquad (1.55)

© Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_4


i\hbar \frac{\partial \xi(t)}{\partial t} = E\xi(t). \qquad (1.56)

Equation (1.55) is an eigenvalue equation for the energy, and (1.56) is an equation in time. So far we have focused our attention upon (1.55), taking as examples a particle confined within a potential well, a one-dimensional harmonic oscillator, and hydrogen-like atoms. In this chapter we deal with the time-evolved Schrödinger equation and its relevance to optical transitions. Optical transitions take place according to selection rules; we mention their significance as well. We showed that after solving the eigenvalue equation, the solution of the Schrödinger equation is expressed as

ð1:60Þ

The probability density of the system (i.e., normally a particle such as an electron, a harmonic oscillator, etc.) residing at a certain place x at a certain time t is expressed as ψ  ðx, t Þψ ðx, t Þ: If the Schrödinger equation is described as a form of separated variables as in the case of (1.60), the exponential factors including t cancel out and we have ψ  ðx, t Þψ ðx, t Þ = ϕ ðxÞϕðxÞ:

ð4:1Þ

This means that the probability density of the system depends only on spatial coordinate and is constant in time. Such a state is said to be a stationary state. That is, the system continues residing in a quantum state described by ϕ(x) and remains unchanged independent of time. Next, we consider a linear combination of functions described by (1.60). That is, we have ψ ðx, t Þ = c1 ϕ1 ðxÞ expð- iE 1 t=ħÞ þ c2 ϕ2 ðxÞ expð- iE 2 t=ħÞ,

ð4:2Þ

where the first term is pertinent to the state 1 and second term to the state 2 and c1 and c2 are complex constants with respect to the spatial coordinates but may be weakly time-dependent. The state described by (4.2) is called a coherent state. The probability distribution of that state is described as 2

ψ  ðx, t Þψ ðx, t Þ = jc1 j2 ϕ1 j2 þ c2 jϕ2 j2 þ c1 c2 ϕ1 ϕ2 e - iωt þ c2 c1 ϕ2 ϕ1 eiωt , where ω is expressed as

ð4:3Þ

4.1 Electric Dipole Transition

129

ω = ðE2 - E 1 Þ=ħ:

ð4:4Þ

This equation shows that the probability density of the system undergoes a sinusoidal oscillation with time. The angular frequency equals the energy difference between the two states divided by the reduced Planck constant. If the system is a charged particle such as an electron and proton, the sinusoidal oscillation is accompanied by an oscillating electromagnetic field. Thus, the coherent state is associated with the optical transition from one state to another, when the transition is related to the charged particle. The optical transitions result from various causes. Of these, the electric dipole transition yields the largest transition probability and the dipole approximation is often chosen to represent the transition probability. From the point of view of optical measurements, the electric dipole transition gives the strongest absorption or emission spectral lines. The matrix element of the electric dipole, more specifically a square of an absolute value of the matrix element, is a measure of the optical transition probability. Labeling the quantum states as a, b, etc. and describing the corresponding state vector as jai, jbi, etc., the matrix element Pba is given by Pba  hbjεe  Pjai,

ð4:5Þ

where εe is a unit polarization vector of the electric field of an electromagnetic wave (i.e., light). Equation (4.5) describes the optical transition that takes place as a result of the interaction between electrons and radiation field in such a way that the interaction causes electrons in the system to change the state from jai to jbi. That interaction is represented by εe  P. The quantum states jai and jbi are referred to as an initial state and final state, respectively. The quantity P is the electric dipole moment of the system, which is defined as Pe

x, j j

ð4:6Þ

where e is an elementary charge (e < 0) and xj is a position vector of the j-th electron. Detailed description of εe and P can be seen in Chap. 7. The quantity Pba is said to be transition dipole moment, or more precisely, transition electric dipole moment with respect to the states jai and jbi. We assume that the optical transition occurs from a quantum state jai to another state jbi. Since Pba is generally a complex number, | Pba|2 represents the transition probability. If we adopt the coordinate representation, (4.5) is expressed by Pba =

ϕb εe  Pϕa dτ,

where τ denotes an integral range of a space.

ð4:7Þ

4.2 One-Dimensional System

Let us apply the aforementioned general description to the individual cases of Chaps. 1–3.

Example 4.1 A particle confined in a square-well potential. This example was treated in Chap. 1. As before, we assume that a particle (an electron) is confined in a one-dimensional system [−L ≤ x ≤ L (L > 0)]. We consider the optical transition from the ground state ϕ₁(x) to the first excited state ϕ₂(x). Here we put L = π/2 for convenience. Then the normalized coherent state ψ(x, t) is described as

\psi(x, t) = \frac{1}{\sqrt{2}}\left[\phi_1(x)\exp(-iE_1 t/\hbar) + \phi_2(x)\exp(-iE_2 t/\hbar)\right],

p1 2

ð4:8Þ

in (4.2). In (4.8), we have

ϕ1 ð x Þ =

2 cos x and π

ϕ2 ðxÞ =

2 sin 2x: π

ð4:9Þ

Following (4.3), we have a following real function called a probability distribution density: ψ  ðx, t Þψ ðx, t Þ =

1 cos 2 x þ sin 2 2x þ ðsin 3x þ sin xÞ cos ωt , π

ð4:10Þ

where ω is given by (4.4) as ω = 3ħ=2m,

ð4:11Þ

where m is a mass of an electron. Rewriting (4.10), we have ψ  ðx, t Þψ ðx, t Þ =

1 1 1 þ ðcos 2x - cos 4xÞ þ ðsin 3x þ sin xÞ cos ωt : π 2

ð4:12Þ

Integrating (4.12) over - π2, π2 , a contribution from only the first term is non-vanishing to give 1, as anticipated (because of the normalization). Putting t = 0 and integrating (4.12) over a positive domain 0, π2 , we have π=2 0

ψ  ðx, 0Þψ ðx, 0Þdx =

1 4 þ ≈ 0:924: 2 3π

Similarly, integrating (4.12) over a negative domain - π2, 0 , we have

ð4:13Þ

,

Fig. 4.1 Probability distribution density ψ*(x, t)ψ(x, t) of a particle confined in a square-well potential. The solid curve and the broken curve represent the density at t = 0 and at t = π/ω (i.e., a half period), respectively. [Figure omitted; the abscissa runs from −π/2 to π/2.]

\int_{-\pi/2}^{0} \psi^*(x, 0)\psi(x, 0)\, dx = \frac{1}{2} - \frac{4}{3\pi} \approx 0.076. \qquad (4.14)
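The numbers quoted in (4.13) and (4.14), and the transition moment later evaluated in (4.17), are easy to reproduce numerically. A sketch (illustrative, assuming SciPy; the elementary charge is set to 1):

```python
import numpy as np
from scipy.integrate import quad

phi1 = lambda x: np.sqrt(2/np.pi) * np.cos(x)     # ground state, (4.9)
phi2 = lambda x: np.sqrt(2/np.pi) * np.sin(2*x)   # first excited state, (4.9)

dens0 = lambda x: 0.5 * (phi1(x) + phi2(x))**2    # psi* psi at t = 0
pos = quad(dens0, 0, np.pi/2)[0]                  # (4.13): 1/2 + 4/(3 pi)
neg = quad(dens0, -np.pi/2, 0)[0]                 # (4.14): 1/2 - 4/(3 pi)

# transition moment <phi2|x|phi1>, analytically 16/(9 pi) as in (4.17)
P21 = quad(lambda x: phi2(x) * x * phi1(x), -np.pi/2, np.pi/2)[0]
print(round(pos, 3), round(neg, 3))
```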

Thus, 92% of the total charge (as a probability density) is concentrated in the positive domain. Differentiation of ψ*(x, 0)ψ(x, 0) gives five extremals, including both edges. Of these, the major maximum is located at 0.635 radian, which corresponds to about 40% of π/2. This can serve as a measure of the transition moment. Figure 4.1 demonstrates these results (see the solid curve). Meanwhile, putting t = π/ω (i.e., half period), we plot ψ*(x, π/ω)ψ(x, π/ω). The result shows that the graph is obtained by folding back the solid curve of Fig. 4.1 with respect to the ordinate axis. Thus, we find that the charge (or the probability density) executes a sinusoidal oscillation with angular frequency 3ħ/2m along the x-axis around the origin. Let e₁ be the unit vector in the positive direction of the x-axis. Then, the electric dipole P of the system is

$$\mathbf{P} = e\mathbf{x} = ex\,\mathbf{e}_1, \tag{4.15}$$

where **x** is the position vector of the electron. Let us define the matrix element of the electric dipole transition as

$$P_{21} \equiv \langle\phi_2(x)|\mathbf{e}_1\cdot\mathbf{P}|\phi_1(x)\rangle = \langle\phi_2(x)|ex|\phi_1(x)\rangle. \tag{4.16}$$

Notice that we only have to consider the case where the polarization of light is parallel to the x-axis. With the coordinate representation, we have

$$
\begin{aligned}
P_{21} &= \int_{-\pi/2}^{\pi/2}\phi_2(x)\,ex\,\phi_1(x)\,dx
= \int_{-\pi/2}^{\pi/2}\sqrt{\frac{2}{\pi}}(\sin 2x)\,ex\,\sqrt{\frac{2}{\pi}}\cos x\,dx \\
&= \frac{2e}{\pi}\int_{-\pi/2}^{\pi/2}x\cos x\sin 2x\,dx
= \frac{e}{\pi}\int_{-\pi/2}^{\pi/2}x(\sin x + \sin 3x)\,dx \\
&= \frac{e}{\pi}\left[x(-\cos x) + \sin x - \frac{x}{3}\cos 3x + \frac{1}{9}\sin 3x\right]_{-\pi/2}^{\pi/2}
= \frac{16e}{9\pi},
\end{aligned}
\tag{4.17}
$$
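The value 16/9π in (4.17) can be confirmed with a short symbolic computation (a check added here, not in the original text):

```python
import sympy as sp

x = sp.symbols('x')
phi1 = sp.sqrt(2/sp.pi)*sp.cos(x)      # ground state, Eq. (4.9)
phi2 = sp.sqrt(2/sp.pi)*sp.sin(2*x)    # first excited state, Eq. (4.9)

# P21 / e = <phi2| x |phi1> over the well (-pi/2, pi/2)
P21_over_e = sp.integrate(phi2*x*phi1, (x, -sp.pi/2, sp.pi/2))
print(sp.simplify(P21_over_e))   # 16/(9*pi)
```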

where we used a trigonometric formula and integration by parts. The factor 16/9π in (4.17) is about 36% of π/2. This number is in pretty good agreement with the 40% estimated above from the major maximum of ψ*(x, 0)ψ(x, 0). Note that the transition moment vanishes if the two states associated with the transition have the same parity. In other words, if both states are described by sine functions, or both by cosine functions, the integral vanishes.

Example 4.2 One-dimensional harmonic oscillator. Second, let us think of an optical transition in the harmonic oscillator that we dealt with in Chap. 2. We denote the state of the oscillator by |n⟩ in place of |ψₙ⟩ (n = 0, 1, 2, ⋯) of Chap. 2. Then, the general expression (4.5) can be written as

$$P_{kl} = \langle k|\varepsilon_e\cdot\mathbf{P}|l\rangle. \tag{4.18}$$

Since we are considering a single one-dimensional oscillator, we have

$$\varepsilon_e = \mathbf{q} \quad \text{and} \quad \mathbf{P} = eq\,\mathbf{q}, \tag{4.19}$$

where **q** is the unit vector in the positive direction of the coordinate q. Therefore, similarly to the above, we have

$$\varepsilon_e\cdot\mathbf{P} = eq. \tag{4.20}$$

That is,

$$P_{kl} = e\langle k|q|l\rangle. \tag{4.21}$$

Since q is an Hermitian operator, we have

$$P_{kl} = e\langle l|q^\dagger|k\rangle^* = e\langle l|q|k\rangle^* = P_{lk}^*, \tag{4.22}$$

where we used (1.116). Using (2.68), we have

$$P_{kl} = e\sqrt{\frac{\hbar}{2m\omega}}\,\langle k|a + a^\dagger|l\rangle = e\sqrt{\frac{\hbar}{2m\omega}}\left[\langle k|a|l\rangle + \langle k|a^\dagger|l\rangle\right]. \tag{4.23}$$

Taking the adjoint of (2.62) and modifying the notation, we have

$$\langle k|a = \sqrt{k+1}\,\langle k+1|. \tag{4.24}$$

Using (2.62) once again, we get

$$P_{kl} = e\sqrt{\frac{\hbar}{2m\omega}}\left[\sqrt{k+1}\,\langle k+1|l\rangle + \sqrt{l+1}\,\langle k|l+1\rangle\right]. \tag{4.25}$$

Using the orthonormal conditions between the state vectors, we have

$$P_{kl} = e\sqrt{\frac{\hbar}{2m\omega}}\left[\sqrt{k+1}\,\delta_{k+1,l} + \sqrt{l+1}\,\delta_{k,l+1}\right]. \tag{4.26}$$

Exchanging k and l in the above, we get $P_{kl} = P_{lk}$: the matrix element $P_{kl}$ is symmetric with respect to the indices k and l. Notice that the first term does not vanish only when k + 1 = l, and the second term does not vanish only when k = l + 1. Therefore, we get

$$P_{k,k+1} = e\sqrt{\frac{\hbar(k+1)}{2m\omega}} \quad \text{and} \quad P_{l+1,l} = e\sqrt{\frac{\hbar(l+1)}{2m\omega}}. \tag{4.27}$$

Meanwhile, we find that the transition matrix P is expressed as

$$
P = eq = e\sqrt{\frac{\hbar}{2m\omega}}\,(a + a^\dagger)
= e\sqrt{\frac{\hbar}{2m\omega}}
\begin{pmatrix}
0 & 1 & 0 & 0 & 0 & \cdots \\
1 & 0 & \sqrt{2} & 0 & 0 & \cdots \\
0 & \sqrt{2} & 0 & \sqrt{3} & 0 & \cdots \\
0 & 0 & \sqrt{3} & 0 & 2 & \cdots \\
0 & 0 & 0 & 2 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix},
\tag{4.28}
$$

where we used (2.68). Note that a real Hermitian matrix is a symmetric matrix. Practically, a fast way to construct the transition matrix (4.28) is to use (2.65) and (2.66); it is an intuitively obvious and straightforward task. A glance at the matrix form immediately tells us that the transition matrix elements are


non-vanishing only at the (k, k + 1) and (k + 1, k) positions. Whereas the (k, k + 1) element represents the transition from the k-th excited state to the (k − 1)-th excited state accompanied by photoemission, the (k + 1, k) element implies the transition from the (k − 1)-th excited state to the k-th excited state accompanied by photoabsorption. The two transitions give the same transition moment. Note that the zeroth excited state means the ground state; see (2.64) for the basis vector representations. We should be careful about the "addresses" of the matrix accordingly. For example, P₀,₁ in (4.27) represents the (1, 2) element of the matrix (4.28); P₂,₁ stands for the (3, 2) element. Suppose that we seek the transition dipole moments using the coordinate representation. Then, we need to use (2.106) and perform definite integration. For instance, we have

$$e\int_{-\infty}^{\infty}\psi_0(q)\,q\,\psi_1(q)\,dq,$$

which corresponds to the (1, 2) element of (4.28). Indeed, the above integral gives $e\sqrt{\hbar/2m\omega}$. The confirmation is left to the reader. Nonetheless, evaluating the definite integral of a product of higher excited-state wave functions becomes increasingly troublesome. In this respect, the operator method described above provides us with much better insight into complicated calculations. Equations (4.26)–(4.28) imply that the electric dipole transition is allowed to occur only when the quantum number changes by one. Notice also that the transition takes place between an even function and an odd function; see Table 2.1 and (2.101). Such a condition or restriction on the optical transition is called a selection rule. The former equation of (4.27) shows that the transition takes place from the upper state to the lower state accompanied by photon emission. The latter equation, on the other hand, shows that the transition takes place from the lower state to the upper state accompanied by photon absorption.
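The banded structure of (4.28) and the selection rule Δn = ±1 can be reproduced in a few lines. The sketch below (a numerical illustration added here, not from the text) builds the truncated matrix of a + a† and checks that only the (k, k±1) elements survive; the matrix is q in units of √(ħ/2mω):

```python
import numpy as np

N = 6  # truncation size of the oscillator basis |0>, |1>, ..., |N-1>
# Annihilation operator: <k|a|l> = sqrt(l) delta_{k,l-1}  (superdiagonal)
a = np.diag(np.sqrt(np.arange(1, N)), k=1)
q_matrix = a + a.T  # a + a^dagger; real, hence symmetric

# Only elements with |k - l| = 1 are non-zero
for k in range(N):
    for l in range(N):
        if abs(k - l) != 1:
            assert q_matrix[k, l] == 0
assert np.allclose(q_matrix, q_matrix.T)
print(np.round(q_matrix, 3))
```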

4.3 Three-Dimensional System

The hydrogen-like atoms give us a typical example. Since we have fully investigated the quantum states of those atoms, we make the most of the related results.

Example 4.3 An electron in a hydrogen atom. Unlike the one-dimensional system, we have to take account of the angular momentum in the three-dimensional system. We have already obtained explicit wave functions. Here we focus on the 1s and 2p states of hydrogen. For their normalized states, we have

$$\phi(1s) = \sqrt{\frac{1}{\pi a^3}}\,e^{-r/a}, \tag{4.29}$$

$$\phi(2p_z) = \frac{1}{4}\sqrt{\frac{1}{2\pi a^3}}\,\frac{r}{a}\,e^{-r/2a}\cos\theta, \tag{4.30}$$

$$\phi(2p_{x-iy}) = \frac{1}{8}\sqrt{\frac{1}{\pi a^3}}\,\frac{r}{a}\,e^{-r/2a}\sin\theta\,e^{-i\phi}, \tag{4.31}$$

$$\phi(2p_{x+iy}) = -\frac{1}{8}\sqrt{\frac{1}{\pi a^3}}\,\frac{r}{a}\,e^{-r/2a}\sin\theta\,e^{i\phi}, \tag{4.32}$$

where a denotes the Bohr radius of hydrogen. Note that the minus sign of ϕ(2p_{x+iy}) is due to the Condon–Shortley phase. Even though the transition probability is proportional to the square of the matrix element, so that the phase factor cancels out, we describe the state vector faithfully. The energy eigenvalues are

$$E(1s) = -\frac{\hbar^2}{2\mu a^2}, \qquad E(2p_z) = E(2p_{x+iy}) = E(2p_{x-iy}) = -\frac{\hbar^2}{8\mu a^2}, \tag{4.33}$$

where μ is the reduced mass of hydrogen. Note that the latter three states are degenerate.

First, we consider a transition between the ϕ(1s) and ϕ(2p_z) states. Suppose that the normalized coherent state is described as

$$\psi(\mathbf{x},t) = \frac{1}{\sqrt{2}}\left[\phi(1s)\exp\left(\frac{-iE(1s)t}{\hbar}\right) + \phi(2p_z)\exp\left(\frac{-iE(2p_z)t}{\hbar}\right)\right]. \tag{4.34}$$

As before, we have

$$\psi^*(\mathbf{x},t)\psi(\mathbf{x},t) = |\psi(\mathbf{x},t)|^2 = \frac{1}{2}\left\{[\phi(1s)]^2 + [\phi(2p_z)]^2 + 2\phi(1s)\phi(2p_z)\cos\omega t\right\}, \tag{4.35}$$

where ω is given by

$$\omega = [E(2p_z) - E(1s)]/\hbar = 3\hbar/8\mu a^2. \tag{4.36}$$

By virtue of the third term of (4.35), which contains the cos ωt factor, the charge distribution undergoes a sinusoidal oscillation along the z-axis with the angular frequency described by (4.36). For instance, cos ωt gives a factor +1 to (4.35) when t = 0 (ωt = 0), whereas it gives a factor −1 when ωt = π, i.e., t = 8πμa²/3ħ. Integrating (4.35), we have

$$
\int\psi^*(\mathbf{x},t)\psi(\mathbf{x},t)\,d\tau = \int|\psi(\mathbf{x},t)|^2\,d\tau
= \frac{1}{2}\int\left\{[\phi(1s)]^2 + [\phi(2p_z)]^2\right\}d\tau + \cos\omega t\int\phi(1s)\phi(2p_z)\,d\tau
= \frac{1}{2} + \frac{1}{2} = 1,
$$

where we used the normalized functional forms of ϕ(1s) and ϕ(2p_z) together with their orthogonality. Note that both of the functions are real. Next, we calculate the matrix element. For simplicity, we denote the matrix element simply as P(ε_e), designating only the unit polarization vector ε_e. Then, we have

$$P(\varepsilon_e) = \langle\phi(1s)|\varepsilon_e\cdot\mathbf{P}|\phi(2p_z)\rangle, \tag{4.37}$$

where

$$\mathbf{P} = e\mathbf{x} = e\,(\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{4.38}$$

We have three possibilities for choosing ε_e out of e₁, e₂, and e₃. Choosing e₃, we have

$$
P_{z,|p_z\rangle}^{(\mathbf{e}_3)} = e\langle\phi(1s)|z|\phi(2p_z)\rangle
= \frac{e}{4\sqrt{2}\,\pi a^4}\int_0^\infty r^4 e^{-3r/2a}\,dr\int_0^\pi\cos^2\theta\sin\theta\,d\theta\int_0^{2\pi}d\phi
= \frac{2^7\sqrt{2}}{3^5}\,ea \approx 0.745\,ea.
\tag{4.39}
$$

In (4.39), we express the matrix element as $P_{z,|p_z\rangle}^{(\mathbf{e}_3)}$ to indicate the z-component of the position vector and to show explicitly that the ϕ(2p_z) state is responsible for the transition. In (4.39), we used z = r cos θ. We also used the radial integral

$$\int_0^\infty r^4 e^{-3r/2a}\,dr = 24\left(\frac{2a}{3}\right)^5.$$
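The radial and angular integrals of (4.39) can be checked symbolically (a verification sketch added here, not in the original); the result should be 2⁷√2/3⁵ ≈ 0.745 in units of ea:

```python
import sympy as sp

r, theta, phi, a = sp.symbols('r theta phi a', positive=True)

phi_1s  = sp.sqrt(1/(sp.pi*a**3))*sp.exp(-r/a)                      # Eq. (4.29)
phi_2pz = sp.Rational(1, 4)*sp.sqrt(1/(2*sp.pi*a**3))*(r/a) \
          * sp.exp(-r/(2*a))*sp.cos(theta)                          # Eq. (4.30)

z = r*sp.cos(theta)
# <phi(1s)| z |phi(2pz)> with volume element r^2 sin(theta) dr dtheta dphi
moment = sp.integrate(phi_1s*z*phi_2pz*r**2*sp.sin(theta),
                      (r, 0, sp.oo), (theta, 0, sp.pi), (phi, 0, 2*sp.pi))
print(sp.simplify(moment/a))   # 128*sqrt(2)/243, about 0.745
```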


Also, we changed the variable cos θ → t to perform the integration with respect to θ. We see that the "leverage" length of the transition moment is comparable to the Bohr radius a.

The notation $P_{z,|p_z\rangle}^{(\mathbf{e}_3)}$ needs some explanation for consistency with the later description. Equation (4.39) represents the transition from |ϕ(2p_z)⟩ to |ϕ(1s)⟩ that is accompanied by photon emission. Thus, |p_z⟩ in the notation means that |ϕ(2p_z)⟩ is the initial state. In the notation, in turn, (e₃) denotes the polarization vector and z represents the electric dipole. In the case of photon absorption, where the transition occurs from |ϕ(1s)⟩ to |ϕ(2p_z)⟩, we use the following notation:

$$P_{z,\langle p_z|}^{(\mathbf{e}_3)} = e\langle\phi(2p_z)|z|\phi(1s)\rangle. \tag{4.40}$$

Since all the functions related to the integration are real, we have

$$P_{z,|p_z\rangle}^{(\mathbf{e}_3)} = P_{z,\langle p_z|}^{(\mathbf{e}_3)},$$

where ⟨p_z| means that ϕ(2p_z) is designated as the final state. Meanwhile, if we choose e₁ for ε_e to evaluate the matrix element P_x, we have

Px,jp1 i z

= e ϕð1sÞjxjϕ 2pz e = p 4 2πa4

1

π

r 4 e - 3r=2a dr

0



sin 2 θ cos θdθ

0

cos ϕdϕ = 0,

ð4:41Þ

0

where cos ϕ comes from x = r sin θ cos ϕ and an integration of cos ϕ gives zero. In a similar manner, we have ðe Þ

Py,jp2 i = e ϕð1sÞjyjϕ 2pz z

= 0:

ð4:42Þ

Next, we estimate the matrix elements associated with 2p_x and 2p_y. For this purpose, it is convenient to introduce the following complex coordinates by a unitary transformation:

$$
(\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)
\begin{pmatrix}
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\
-\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\
0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
\frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\
\frac{1}{\sqrt{2}} & -\frac{i}{\sqrt{2}} & 0 \\
0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \left(\frac{1}{\sqrt{2}}(\mathbf{e}_1 - i\mathbf{e}_2)\ \ \frac{1}{\sqrt{2}}(\mathbf{e}_1 + i\mathbf{e}_2)\ \ \mathbf{e}_3\right)
\begin{pmatrix}
\frac{1}{\sqrt{2}}(x + iy) \\[1mm]
\frac{1}{\sqrt{2}}(x - iy) \\[1mm]
z
\end{pmatrix},
\tag{4.43}
$$

where the unitary transformation is represented by a unitary matrix, defined by

$$U^\dagger U = UU^\dagger = E. \tag{4.44}$$
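A quick numerical check (added here, not in the original) confirms that the matrix appearing in (4.43) satisfies the unitarity condition (4.44), and that the complex vectors e₊ = (e₁ + ie₂)/√2 and e₋ = (e₁ − ie₂)/√2 introduced just below are orthonormal:

```python
import numpy as np

s = 1/np.sqrt(2)
# Matrix of (4.43); its columns send (e1, e2, e3) to (e-, e+, e3)
U = np.array([[ s,     s,    0],
              [-1j*s,  1j*s, 0],
              [ 0,     0,    1]])
assert np.allclose(U.conj().T @ U, np.eye(3))   # U^dagger U = E
assert np.allclose(U @ U.conj().T, np.eye(3))   # U U^dagger = E

e1, e2, e3 = np.eye(3)
e_plus  = (e1 + 1j*e2)/np.sqrt(2)
e_minus = (e1 - 1j*e2)/np.sqrt(2)
assert np.isclose(np.vdot(e_plus, e_plus), 1)   # <e+|e+> = 1
assert np.isclose(np.vdot(e_plus, e_minus), 0)  # <e+|e-> = 0
print("unitarity and orthonormality verified")
```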

We will investigate the details of the unitary transformation and matrix in Parts III and IV. We define e₊ and e₋ as follows [1]:

$$\mathbf{e}_+ \equiv \frac{1}{\sqrt{2}}(\mathbf{e}_1 + i\mathbf{e}_2) \quad \text{and} \quad \mathbf{e}_- \equiv \frac{1}{\sqrt{2}}(\mathbf{e}_1 - i\mathbf{e}_2), \tag{4.45}$$

where the complex vectors e₊ and e₋ represent left-circularly polarized light and right-circularly polarized light, which carry angular momentum ħ and −ħ, respectively. We will revisit the characteristics and implications of these complex vectors in Sect. 7.4. We have

$$(\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} x \\ y \\ z \end{pmatrix} = (\mathbf{e}_-\ \mathbf{e}_+\ \mathbf{e}_3)\begin{pmatrix}\frac{1}{\sqrt{2}}(x+iy) \\[1mm] \frac{1}{\sqrt{2}}(x-iy) \\[1mm] z\end{pmatrix}. \tag{4.46}$$

Note that e₊, e₋, and e₃ are orthonormal. That is,

$$\langle\mathbf{e}_+|\mathbf{e}_+\rangle = 1, \quad \langle\mathbf{e}_+|\mathbf{e}_-\rangle = 0, \quad \text{etc.} \tag{4.47}$$

In this situation, e₊, e₋, and e₃ are said to form an orthonormal basis in a three-dimensional complex vector space (see Sect. 11.4). Now, choosing e₊ for ε_e, we have [2]

$$P_{x-iy,|p_+\rangle}^{(\mathbf{e}_+)} \equiv e\left\langle\phi(1s)\left|\frac{1}{\sqrt{2}}(x - iy)\right|\phi(2p_{x+iy})\right\rangle, \tag{4.48}$$

where |p₊⟩ is a shorthand notation for ϕ(2p_{x+iy}) designated as the initial state; x − iy represents a complex electric dipole. Equation (4.48) represents an optical process in which an electron undergoes a transition from ϕ(2p_{x+iy}) to ϕ(1s), losing angular momentum ħ, whereas the radiation field gains that angular momentum so as to conserve the total angular momentum ħ. The notation $P_{x-iy,|p_+\rangle}^{(\mathbf{e}_+)}$ reflects this situation. Using the coordinate representation, we rewrite (4.48) as

$$
P_{x-iy,|p_+\rangle}^{(\mathbf{e}_+)} = -\frac{e}{8\sqrt{2}\,\pi a^4}\int_0^\infty r^4 e^{-3r/2a}\,dr\int_0^\pi\sin^3\theta\,d\theta\int_0^{2\pi}e^{-i\phi}e^{i\phi}\,d\phi
= -\frac{2^7\sqrt{2}}{3^5}\,ea,
\tag{4.49}
$$

where we used

$$x - iy = r\sin\theta\,e^{-i\phi}. \tag{4.50}$$

In the definite integral of (4.49), e^(−iϕ) comes from x − iy, while e^(iϕ) comes from ϕ(2p_{x+iy}). Note that from (3.24), e^(iϕ) is an eigenfunction corresponding to the angular momentum eigenvalue ħ. Notice that in (4.49) the exponentials e^(−iϕ) and e^(iϕ) cancel out, so that the azimuthal integral is non-vanishing. If we choose e₋ for ε_e, we have

$$P_{x+iy,|p_+\rangle}^{(\mathbf{e}_-)} = e\left\langle\phi(1s)\left|\frac{1}{\sqrt{2}}(x + iy)\right|\phi(2p_{x+iy})\right\rangle \tag{4.51}$$

$$= -\frac{e}{8\sqrt{2}\,\pi a^4}\int_0^\infty r^4 e^{-3r/2a}\,dr\int_0^\pi\sin^3\theta\,d\theta\int_0^{2\pi}e^{2i\phi}\,d\phi = 0, \tag{4.52}$$

where we used x + iy = r sin θ e^(iϕ). In (4.52), the factor e^(2iϕ) results from the product ϕ(2p_{x+iy})(x + iy), which renders the integral (4.51) vanishing. Note that the only difference between (4.49) and (4.52) is the integration over the ϕ factor. For the same reason, if we choose e₃ for ε_e, the matrix element vanishes. Thus, among the ϕ(2p_{x+iy})-related matrix elements, only $P_{x-iy,|p_+\rangle}^{(\mathbf{e}_+)}$ survives. Similarly, among the ϕ(2p_{x-iy})-related matrix elements, only $P_{x+iy,|p_-\rangle}^{(\mathbf{e}_-)}$ survives. Notice that |p₋⟩ is a shorthand notation for ϕ(2p_{x−iy}) designated as the initial state. That is, we have, e.g.,

$$
\begin{aligned}
P_{x+iy,|p_-\rangle}^{(\mathbf{e}_-)} &= e\left\langle\phi(1s)\left|\frac{1}{\sqrt{2}}(x+iy)\right|\phi(2p_{x-iy})\right\rangle = \frac{2^7\sqrt{2}}{3^5}\,ea, \\
P_{x-iy,|p_-\rangle}^{(\mathbf{e}_+)} &= e\left\langle\phi(1s)\left|\frac{1}{\sqrt{2}}(x-iy)\right|\phi(2p_{x-iy})\right\rangle = 0.
\end{aligned}
\tag{4.53}
$$

Taking the complex conjugate of (4.48), we have

$$\left[P_{x-iy,|p_+\rangle}^{(\mathbf{e}_+)}\right]^* = e\left\langle\phi(2p_{x+iy})\left|\frac{1}{\sqrt{2}}(x+iy)\right|\phi(1s)\right\rangle = -\frac{2^7\sqrt{2}}{3^5}\,ea. \tag{4.54}$$

Here recall (1.116) and (x − iy)† = x + iy. Also note that since $P_{x-iy,|p_+\rangle}^{(\mathbf{e}_+)}$ is real, $\left[P_{x-iy,|p_+\rangle}^{(\mathbf{e}_+)}\right]^*$ is real as well, so that we have

$$\left[P_{x-iy,|p_+\rangle}^{(\mathbf{e}_+)}\right]^* = P_{x-iy,|p_+\rangle}^{(\mathbf{e}_+)} = P_{x+iy,\langle p_+|}^{(\mathbf{e}_-)}, \tag{4.55}$$

where ⟨p₊| means that ϕ(2p_{x+iy}) is designated as the final state. Comparing (4.48) and (4.55), we notice that the polarization vector has been switched from e₊ to e₋ in the allowed transition, even though the matrix element remains the same. This can be explained as follows. In (4.48), photon emission occurs while the electron undergoes a transition from ϕ(2p_{x+iy}) to ϕ(1s). As a result, the radiation field has gained angular momentum ħ during the process in which the electron has lost angular momentum ħ. In other words, ħ is transferred from the electron to the radiation field, and this process results in the generation of left-circularly polarized light in the radiation field. In (4.54), on the other hand, the reversed process takes place. That is, photon absorption occurs in such a way that the electron is excited from ϕ(1s) to ϕ(2p_{x+iy}). After this process has been completed, the electron has gained angular momentum ħ, whereas the radiation field has lost angular momentum ħ. As a result, the positive angular momentum ħ is transferred to the electron from the radiation field that involves the left-circularly polarized light. This can be translated into the statement that the radiation field has gained angular momentum −ħ, which is equivalent to the generation of right-circularly polarized light (characterized by e₋) in the radiation field. In other words, the electron gains the angular momentum ħ to compensate for the change in the radiation field. The implication of the first equation of (4.53) can be interpreted in a similar manner. Also, we have

$$\left[P_{x+iy,|p_-\rangle}^{(\mathbf{e}_-)}\right]^* = P_{x+iy,|p_-\rangle}^{(\mathbf{e}_-)} = P_{x-iy,\langle p_-|}^{(\mathbf{e}_+)} = e\left\langle\phi(2p_{x-iy})\left|\frac{1}{\sqrt{2}}(x-iy)\right|\phi(1s)\right\rangle = \frac{2^7\sqrt{2}}{3^5}\,ea.$$

In the above relation, ⟨p₋| implies that ϕ(2p_{x−iy}) is designated as the final state. Notice that the inner products of (4.49) and (4.53) are real, even though neither x + iy of (4.53) nor x − iy of (4.49) is Hermitian. Also note that $P_{x-iy,|p_+\rangle}^{(\mathbf{e}_+)}$ of (4.49) and $P_{x+iy,|p_-\rangle}^{(\mathbf{e}_-)}$ of (4.53) have the same absolute value, with minus and plus signs, respectively. The minus sign of (4.49) comes from the Condon–Shortley phase. The difference, however, is not essential, because the transition probability is proportional to $\left|P_{x+iy,|p_-\rangle}^{(\mathbf{e}_-)}\right|^2$ or $\left|P_{x-iy,|p_+\rangle}^{(\mathbf{e}_+)}\right|^2$. Some literature [3, 4] uses −(x + iy) instead of x + iy. This is simply because of the inclusion of the Condon–Shortley phase; see (3.304).

Let us think of the coherent state composed of ϕ(1s) and either ϕ(2p_{x+iy}) or ϕ(2p_{x−iy}). Choosing ϕ(2p_{x+iy}), the state ψ(x, t) can be given by

$$\psi(\mathbf{x},t) = \frac{1}{\sqrt{2}}\left\{\phi(1s)\exp[-iE(1s)t/\hbar] + \phi(2p_{x+iy})\exp[-iE(2p_{x+iy})t/\hbar]\right\}, \tag{4.56}$$

where ϕ(1s) is described by (3.301) and ϕ(2p_{x+iy}) is expressed as (3.304). Then we have

$$
\begin{aligned}
\psi^*(\mathbf{x},t)\psi(\mathbf{x},t) &= |\psi(\mathbf{x},t)|^2 \\
&= \frac{1}{2}\left\{|\phi(1s)|^2 + |\phi(2p_{x+iy})|^2 + \phi(1s)\,\Re e(2p_{x+iy})\left[e^{i(\phi-\omega t)} + e^{-i(\phi-\omega t)}\right]\right\} \\
&= \frac{1}{2}\left\{|\phi(1s)|^2 + |\phi(2p_{x+iy})|^2 + 2\phi(1s)\,\Re e(2p_{x+iy})\cos(\phi-\omega t)\right\},
\end{aligned}
\tag{4.57}
$$

where, using ℜe(2p_{x+iy}), we denote ϕ(2p_{x+iy}) as follows:

$$\phi(2p_{x+iy}) \equiv \Re e(2p_{x+iy})\,e^{i\phi}. \tag{4.58}$$

That is, ℜe(2p_{x+iy}) represents the real part of ϕ(2p_{x+iy}), which depends only on r and θ. For example, from (3.304) we have

$$\Re e(2p_{x+iy}) = -\frac{1}{8\sqrt{\pi}}\,a^{-\frac{3}{2}}\,\frac{r}{a}\,e^{-\frac{r}{2a}}\sin\theta.$$


The third term of (4.57) implies that the existence probability density of the electron, represented by |ψ(x, t)|², rotates counterclockwise around the z-axis with angular frequency ω. Similarly, in the case of ϕ(2p_{x−iy}), the existence probability density rotates clockwise around the z-axis with angular frequency ω. Integrating (4.57), we have

$$
\begin{aligned}
\int\psi^*(\mathbf{x},t)\psi(\mathbf{x},t)\,d\tau &= \int|\psi(\mathbf{x},t)|^2\,d\tau \\
&= \frac{1}{2}\int\left\{|\phi(1s)|^2 + |\phi(2p_{x+iy})|^2\right\}d\tau
+ \int_0^\infty r^2\,dr\int_0^\pi\sin\theta\,d\theta\,\phi(1s)\,\Re e(2p_{x+iy})\int_0^{2\pi}d\phi\,\cos(\phi-\omega t) \\
&= \frac{1}{2} + \frac{1}{2} = 1,
\end{aligned}
$$

where we used the normalized functional forms of ϕ(1s) and ϕ(2p_{x+iy}); the last term vanishes because

$$\int_0^{2\pi}d\phi\,\cos(\phi-\omega t) = 0.$$

This is easily shown by a suitable change of variable. In relation to the above discussion, we often use real numbers to describe wave functions. For this purpose, we use the following unitary transformation to transform the orthonormal basis e^(±imϕ) into cos mϕ and sin mϕ. That is, we have

$$
\left(\frac{1}{\sqrt{\pi}}\cos m\phi\ \ \frac{1}{\sqrt{\pi}}\sin m\phi\right)
= \left(\frac{(-1)^m}{\sqrt{2\pi}}e^{im\phi}\ \ \frac{1}{\sqrt{2\pi}}e^{-im\phi}\right)
\begin{pmatrix}
\dfrac{(-1)^m}{\sqrt{2}} & -\dfrac{(-1)^m i}{\sqrt{2}} \\[2mm]
\dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}}
\end{pmatrix},
\tag{4.59}
$$

where we assume that m is positive so that the Condon–Shortley phase can be taken into account appropriately. Alternatively, we describe it via the unitary transformation as follows:

$$
\left(\frac{(-1)^m}{\sqrt{2\pi}}e^{im\phi}\ \ \frac{1}{\sqrt{2\pi}}e^{-im\phi}\right)
= \left(\frac{1}{\sqrt{\pi}}\cos m\phi\ \ \frac{1}{\sqrt{\pi}}\sin m\phi\right)
\begin{pmatrix}
\dfrac{(-1)^m}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\[2mm]
\dfrac{(-1)^m i}{\sqrt{2}} & -\dfrac{i}{\sqrt{2}}
\end{pmatrix}.
\tag{4.60}
$$

In this regard, we have to be careful about normalization constants: for the trigonometric functions the constant should be $1/\sqrt{\pi}$, whereas for the exponential representation the constant is $1/\sqrt{2\pi}$. At the same time, the trigonometric functions are expressed as linear combinations of e^(imϕ) and e^(−imϕ), and so if we use the trigonometric functions, the information about the magnetic quantum number is lost.

In Sect. 3.7, we showed the normalized functions $\Lambda_{l,m}^{(n)} = Y_l^m(\theta,\phi)R_l^{(n)}(r)$ of the hydrogen-like atom. Noting that $Y_l^m(\theta,\phi)$ is proportional to $e^{\pm im\phi}$, $\Lambda_{l,m}^{(n)}$ can be described using cos mϕ and sin mϕ for the basis vectors. We denote the two linearly independent vectors by $\Lambda_{l,\cos m\phi}^{(n)}$ and $\Lambda_{l,\sin m\phi}^{(n)}$. Then, these vectors are expressed as

$$
\left(\Lambda_{l,\cos m\phi}^{(n)}\ \ \Lambda_{l,\sin m\phi}^{(n)}\right)
= \left(\Lambda_{l,m}^{(n)}\ \ \Lambda_{l,-m}^{(n)}\right)
\begin{pmatrix}
\dfrac{(-1)^m}{\sqrt{2}} & -\dfrac{(-1)^m i}{\sqrt{2}} \\[2mm]
\dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}}
\end{pmatrix},
\tag{4.61}
$$

where we again assume that m is positive. In chemistry and materials science, we normally use the real functions $\Lambda_{l,\cos m\phi}^{(n)}$ and $\Lambda_{l,\sin m\phi}^{(n)}$. In particular, we use the notations, e.g., ϕ(2p_x) and ϕ(2p_y) instead of $\Lambda_{1,\cos\phi}^{(2)}$ and $\Lambda_{1,\sin\phi}^{(2)}$, respectively. In that case, we explicitly have the following form:

$$
\begin{aligned}
\left(\phi(2p_x)\ \ \phi(2p_y)\right)
&= \left(\Lambda_{1,1}^{(2)}\ \ \Lambda_{1,-1}^{(2)}\right)
\begin{pmatrix}
-\dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \\[2mm]
\dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}}
\end{pmatrix}
= \left(\phi(2p_{x+iy})\ \ \phi(2p_{x-iy})\right)
\begin{pmatrix}
-\dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \\[2mm]
\dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}}
\end{pmatrix} \\
&= \left(\frac{1}{4\sqrt{2\pi}}\,a^{-\frac{3}{2}}\,\frac{r}{a}\,e^{-\frac{r}{2a}}\sin\theta\cos\phi\ \ \ \frac{1}{4\sqrt{2\pi}}\,a^{-\frac{3}{2}}\,\frac{r}{a}\,e^{-\frac{r}{2a}}\sin\theta\sin\phi\right).
\end{aligned}
\tag{4.62}
$$

Thus, the Condon–Shortley phase factor has been removed.


Using this expression, we calculate the matrix elements of the electric dipole transition. We have

$$
P_{x,|p_x\rangle}^{(\mathbf{e}_1)} = e\langle\phi(1s)|x|\phi(2p_x)\rangle
= \frac{e}{4\sqrt{2}\,\pi a^4}\int_0^\infty r^4 e^{-3r/2a}\,dr\int_0^\pi\sin^3\theta\,d\theta\int_0^{2\pi}\cos^2\phi\,d\phi
= \frac{2^7\sqrt{2}}{3^5}\,ea.
\tag{4.63}
$$

Thus, we obtained the same result as (4.49) apart from the minus sign. Since it is the square of the absolute value of the transition moment that matters, the minus sign is again of secondary importance. With $P_{y,|p_y\rangle}^{(\mathbf{e}_2)}$, we similarly have

$$
P_{y,|p_y\rangle}^{(\mathbf{e}_2)} = e\langle\phi(1s)|y|\phi(2p_y)\rangle
= \frac{e}{4\sqrt{2}\,\pi a^4}\int_0^\infty r^4 e^{-3r/2a}\,dr\int_0^\pi\sin^3\theta\,d\theta\int_0^{2\pi}\sin^2\phi\,d\phi
= \frac{2^7\sqrt{2}}{3^5}\,ea.
\tag{4.64}
$$

Comparing (4.39), (4.63), and (4.64), we have

$$P_{z,|p_z\rangle}^{(\mathbf{e}_3)} = P_{x,|p_x\rangle}^{(\mathbf{e}_1)} = P_{y,|p_y\rangle}^{(\mathbf{e}_2)} = \frac{2^7\sqrt{2}}{3^5}\,ea.$$

In the case of $P_{z,|p_z\rangle}^{(\mathbf{e}_3)}$, $P_{x,|p_x\rangle}^{(\mathbf{e}_1)}$, and $P_{y,|p_y\rangle}^{(\mathbf{e}_2)}$, the optical transition is said to be polarized along the z-, x-, and y-axis, respectively, and so linearly polarized light is relevant. Note, moreover, that the operators z, x, and y in (4.39), (4.63), and (4.64) are Hermitian and that ϕ(2p_z), ϕ(2p_x), and ϕ(2p_y) are real functions.

4.4 Selection Rules

In a three-dimensional system such as a hydrogen-like atom, the quantum states of particles (i.e., electrons) are characterized by three quantum numbers: the principal quantum number, the orbital angular momentum quantum number (or azimuthal quantum number), and the magnetic quantum number. In this section, we examine the selection rules within the electric dipole approximation. Of the three quantum numbers mentioned above, the angular momentum quantum number is denoted by l and the magnetic quantum number by m. First, we examine the conditions on m. With the angular momentum operator L and its corresponding operator M, we have the following commutation relations:


$$
\begin{aligned}
&[M_z, x] = iy, \quad [M_y, z] = ix, \quad [M_x, y] = iz; \\
&[M_z, iy] = x, \quad [M_y, ix] = z, \quad [M_x, iz] = y; \quad \text{etc.}
\end{aligned}
\tag{4.65}
$$

Notice that in the upper line the indices change cyclically, like (z, x, y), whereas in the lower line they change anti-cyclically, like (z, y, x). The proof of (4.65) is left to the reader. Thus, we have, e.g.,

$$[M_z, x + iy] = x + iy, \qquad [M_z, x - iy] = -(x - iy), \quad \text{etc.} \tag{4.66}$$

Putting

$$Q_+ \equiv x + iy \quad \text{and} \quad Q_- \equiv x - iy,$$

we have

$$[M_z, Q_+] = Q_+, \qquad [M_z, Q_-] = -Q_-. \tag{4.67}$$

Taking the inner product of both sides of (4.67), we have

$$\langle m'|[M_z, Q_+]|m\rangle = \langle m'|M_z Q_+ - Q_+ M_z|m\rangle = m'\langle m'|Q_+|m\rangle - m\langle m'|Q_+|m\rangle = \langle m'|Q_+|m\rangle, \tag{4.68}$$

where the quantum state |m⟩ is identical to |l, m⟩ in (3.151); here we need no information about l, and so it is omitted. Thus, we have, e.g., M_z|m⟩ = m|m⟩. Taking its adjoint, we have ⟨m|M_z† = ⟨m|M_z = m⟨m|, since M_z is Hermitian. These results lead to (4.68). From (4.68), we get

$$(m' - m - 1)\langle m'|Q_+|m\rangle = 0. \tag{4.69}$$

Therefore, for the matrix element ⟨m′|Q₊|m⟩ not to vanish, we must have

$$m' - m - 1 = 0 \quad \text{or} \quad \Delta m = 1 \quad (\Delta m \equiv m' - m).$$

This represents the selection rule with respect to the coordinate Q₊. Similarly, we get

$$(m' - m + 1)\langle m'|Q_-|m\rangle = 0. \tag{4.70}$$

In this case, for the matrix element ⟨m′|Q₋|m⟩ not to vanish, we must have

$$m' - m + 1 = 0 \quad \text{or} \quad \Delta m = -1.$$

To derive (4.70), we can alternatively proceed as follows. Taking the adjoint of (4.69), we have

$$(m' - m - 1)\langle m|Q_-|m'\rangle = 0.$$

Exchanging m′ and m, we have

$$(m - m' - 1)\langle m'|Q_-|m\rangle = 0 \quad \text{or} \quad (m' - m + 1)\langle m'|Q_-|m\rangle = 0.$$

Thus, (4.70) is recovered. Meanwhile, we have the commutation relation

$$[M_z, z] = 0. \tag{4.71}$$

Similarly, taking the inner product of both sides of (4.71), we have

$$(m' - m)\langle m'|z|m\rangle = 0.$$

Therefore, for the matrix element ⟨m′|z|m⟩ not to vanish, we must have

$$m' - m = 0 \quad \text{or} \quad \Delta m = 0. \tag{4.72}$$

These results are fully consistent with Example 4.3 of Sect. 4.3. That is, if circularly polarized light takes part in the optical transition, Δm = ±1. For instance, using the present notation, we rewrite (4.48) as

$$\left\langle\phi(1s)\left|\frac{1}{\sqrt{2}}(x-iy)\right|\phi(2p_{x+iy})\right\rangle = \frac{1}{\sqrt{2}}\langle 0|Q_-|1\rangle = -\frac{2^7\sqrt{2}}{3^5}\,a.$$

If linearly polarized light is involved in the optical transition, we have Δm = 0. Next, we examine the conditions on l. To this end, we calculate the following commutator [5]:

$$
\begin{aligned}
\left[M^2, z\right] &= \left[M_x^2 + M_y^2 + M_z^2,\, z\right] = \left[M_x^2, z\right] + \left[M_y^2, z\right] \\
&= M_x(M_x z - zM_x) + M_x zM_x - zM_x^2 + M_y(M_y z - zM_y) + M_y zM_y - zM_y^2 \\
&= M_x[M_x, z] + [M_x, z]M_x + M_y[M_y, z] + [M_y, z]M_y \\
&= i\left(M_y x + xM_y - M_x y - yM_x\right) \\
&= i\left(M_x y - yM_x - M_y x + xM_y + 2M_y x - 2M_x y\right) \\
&= i\left(2iz + 2M_y x - 2M_x y\right) = 2i\left(M_y x - M_x y + iz\right).
\end{aligned}
\tag{4.73}
$$

In the above calculations, (1) we used [M_z, z] = 0 (with the second equality); (2) the RHS was modified so that the commutation relations could be used (with the third equality); (3) we used −M_x y = M_x y − 2M_x y and M_y x = −M_y x + 2M_y x so that we could use (4.65) (with the second-to-last equality). Moreover, using (4.65), (4.73) can be written as

$$\left[M^2, z\right] = 2i(xM_y - M_x y) = 2i(M_y x - yM_x).$$

Similar results for the commutator are obtained with [M², x] and [M², y]. For further use, we give the alternative relations

$$\left[M^2, x\right] = 2i(yM_z - M_y z) = 2i(M_z y - zM_y), \qquad \left[M^2, y\right] = 2i(zM_x - M_z x) = 2i(M_x z - xM_z). \tag{4.74}$$

Using (4.73), we calculate another commutator:

$$
\begin{aligned}
\left[M^2,\left[M^2, z\right]\right]
&= 2i\left(\left[M^2, M_y x\right] - \left[M^2, M_x y\right] + i\left[M^2, z\right]\right) \\
&= 2i\left(M_y\left[M^2, x\right] - M_x\left[M^2, y\right] + i\left[M^2, z\right]\right) \\
&= 2i\left(2iM_y(yM_z - M_y z) - 2iM_x(M_x z - xM_z) + i\left[M^2, z\right]\right) \\
&= -2\left[2(M_x x + M_y y + M_z z)M_z - 2\left(M_x^2 + M_y^2 + M_z^2\right)z + M^2 z - zM^2\right] \\
&= 2\left(M^2 z + zM^2\right).
\end{aligned}
\tag{4.75}
$$

In the above calculations, (1) we used [M², M_y] = [M², M_x] = 0 (with the second equality); (2) we used (4.74) (with the third equality); (3) the RHS was modified so that we could use M ⊥ x, which follows from the definition of the angular momentum operator, i.e., M_x x + M_y y + M_z z = 0 (with the second-to-last equality); we used [M_z, z] = 0 as well. Similar results are obtained with x and y. That is, we have

$$\left[M^2,\left[M^2, x\right]\right] = 2\left(M^2 x + xM^2\right), \tag{4.76}$$

$$\left[M^2,\left[M^2, y\right]\right] = 2\left(M^2 y + yM^2\right). \tag{4.77}$$

Rewriting, e.g., (4.75), we have

$$M^4 z - 2M^2 zM^2 + zM^4 = 2\left(M^2 z + zM^2\right). \tag{4.78}$$

Using the relation (4.78) and taking the inner product of both sides, we get, e.g.,

$$\langle l'|\,M^4 z - 2M^2 zM^2 + zM^4\,|l\rangle = \langle l'|\,2\left(M^2 z + zM^2\right)|l\rangle. \tag{4.79}$$

That is,

$$\langle l'|\,M^4 z - 2M^2 zM^2 + zM^4\,|l\rangle - \langle l'|\,2\left(M^2 z + zM^2\right)|l\rangle = 0.$$

Considering that both terms of the LHS contain the factor ⟨l′|z|l⟩ in common, we have

$$\left\{[l'(l'+1)]^2 - 2l'l(l'+1)(l+1) + [l(l+1)]^2 - 2l'(l'+1) - 2l(l+1)\right\}\langle l'|z|l\rangle = 0, \tag{4.80}$$

where the quantum state |l⟩ is identical to |l, m⟩ in (3.151) with m omitted. To factorize the first factor of the LHS of (4.80), we view it as a quartic equation with respect to l′. Replacing l′ with −l, we find that this factor vanishes, and so it must contain a factor (l′ + l). Carrying out the factorization, we get

$$
\begin{aligned}
&[l'(l'+1)]^2 - 2l'l(l'+1)(l+1) + [l(l+1)]^2 - 2l'(l'+1) - 2l(l+1) \\
&\quad = (l'-l)^2(l'+l+1)^2 - 2\left[l'(l'+1) + l(l+1)\right] \\
&\quad = (l'-l)^2\left[(l'+l)^2 + 2(l'+l) + 1\right] - (l'+l)^2 - 2(l'+l) - (l'-l)^2 \\
&\quad = \left[(l'-l)^2 - 1\right]\left[(l'+l)^2 + 2(l'+l)\right] \\
&\quad = (l'+l)(l'+l+2)(l'-l+1)(l'-l-1),
\end{aligned}
\tag{4.81}
$$

where we used $2[l'(l'+1) + l(l+1)] = (l'+l)^2 + (l'-l)^2 + 2(l'+l)$. Thus, rewriting (4.80), we get

$$(l'+l)(l'+l+2)(l'-l+1)(l'-l-1)\,\langle l'|z|l\rangle = 0. \tag{4.82}$$
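The factorization in (4.81) can be confirmed with a one-line symbolic computation (a check added here, not part of the text):

```python
import sympy as sp

l, lp = sp.symbols('l lp')
L  = l*(l + 1)        # eigenvalue of M^2 on |l>
Lp = lp*(lp + 1)      # eigenvalue of M^2 on |l'>
# First factor of the LHS of (4.80)
expr = Lp**2 - 2*Lp*L + L**2 - 2*Lp - 2*L
factored = sp.factor(expr)
print(factored)   # product of (l+lp), (l+lp+2), (lp-l-1), (lp-l+1)
```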

We have similar relations with respect to ⟨l′|x|l⟩ and ⟨l′|y|l⟩ because of (4.76) and (4.77). For the electric dipole transition to be allowed, at least one of ⟨l′|x|l⟩, ⟨l′|y|l⟩, and ⟨l′|z|l⟩ must be non-vanishing. For this, at least one of the four factors in (4.81) should be zero. Since l′ + l + 2 > 0, this factor is excluded. For l′ + l to vanish, we must have l′ = l = 0; notice that both l′ and l are non-negative integers. We must then examine this condition. It is equivalent to the statement that the spherical harmonics related to the angular variables θ and ϕ take the form $Y_0^0(\theta,\phi) = 1/\sqrt{4\pi}$, i.e., a constant. Therefore, the θ-related integral for the matrix element ⟨l′|z|l⟩ consists only of the factor

$$\int_0^\pi\cos\theta\sin\theta\,d\theta = \frac{1}{2}\int_0^\pi\sin 2\theta\,d\theta = -\frac{1}{4}\left[\cos 2\theta\right]_0^\pi = 0,$$

where cos θ comes from the polar coordinate z = r cos θ, and sin θ is due to the infinitesimal volume element r² sin θ dr dθ dϕ. Thus, we find that ⟨l′|z|l⟩ vanishes under the condition l′ = l = 0. In the polar coordinate representation, x = r sin θ cos ϕ and y = r sin θ sin ϕ, and so the ϕ-related integrals in ⟨l′|x|l⟩ and ⟨l′|y|l⟩ vanish as well. That is,

$$\int_0^{2\pi}\cos\phi\,d\phi = \int_0^{2\pi}\sin\phi\,d\phi = 0.$$

Therefore, the matrix elements relevant to l′ = l = 0 vanish for all the coordinates, i.e., we have

$$\langle l'|x|l\rangle = \langle l'|y|l\rangle = \langle l'|z|l\rangle = 0. \tag{4.83}$$

Consequently, we exclude the (l′ + l) factor as well when we consider the condition for the allowed transition. Thus, regarding the condition that must be satisfied for the allowed transition, from (4.82) we get

$$l' - l + 1 = 0 \quad \text{or} \quad l' - l - 1 = 0. \tag{4.84}$$

Or, defining Δl ≡ l′ − l, we get

$$\Delta l = \pm 1. \tag{4.85}$$

Thus, for the transition to be allowed, the azimuthal quantum number must change by one.

4.5 Angular Momentum of Radiation [6]

In Sect. 4.3 we mentioned circularly polarized light. If circularly polarized light acts on an electron, what can we anticipate? Here we deal with this problem within the framework of a semiclassical theory. Let E and H be the electric and magnetic fields of left-circularly polarized light, respectively. They are expressed as

$$\mathbf{E} = \frac{1}{\sqrt{2}}E_0(\mathbf{e}_1 + i\mathbf{e}_2)\exp i(kz - \omega t), \tag{4.86}$$

$$\mathbf{H} = \frac{1}{\sqrt{2}}H_0(\mathbf{e}_2 - i\mathbf{e}_1)\exp i(kz - \omega t) = \frac{1}{\sqrt{2}}\frac{E_0}{\mu v}(\mathbf{e}_2 - i\mathbf{e}_1)\exp i(kz - \omega t). \tag{4.87}$$

Here we assume that the light is propagating in the direction of the positive z-axis. The electric and magnetic fields described by (4.86) and (4.87) represent the left-circularly polarized light. A synchronized motion of the electron is expected if the electron executes a circular motion such that its direction of motion is always perpendicular to the electric field and parallel to the magnetic field (see Fig. 4.2). In this situation, the magnetic Lorentz force does not affect the electron motion. Here, the Lorentz force F(t) is described by



$$\mathbf{F}(t) = e\mathbf{E}(\mathbf{x}(t)) + e\,\dot{\mathbf{x}}(t)\times\mathbf{B}(\mathbf{x}(t)), \tag{4.88}$$

where the first term is the electric Lorentz force and the second term represents the magnetic Lorentz force.

Fig. 4.2 Synchronized motion of an electron under a left-circularly polarized light


The quantity B, called the magnetic flux density, is related to H by B = μ₀H, where μ₀ is the permeability of vacuum; see (7.10) and (7.11) of Sect. 7.1. In (4.88), E and B are measured at the position where the electron is situated at a given time t. Equation (4.88) universally describes the motion of a charged particle in the presence of electromagnetic fields. We consider another related example in Chap. 15. Equation (4.86) can be rewritten as

$$\mathbf{E} = \frac{1}{\sqrt{2}}E_0\left[\mathbf{e}_1\cos(kz-\omega t) - \mathbf{e}_2\sin(kz-\omega t)\right] + \frac{i}{\sqrt{2}}E_0\left[\mathbf{e}_2\cos(kz-\omega t) + \mathbf{e}_1\sin(kz-\omega t)\right]. \tag{4.89}$$

Suppose that the electron executes its circular motion in a region narrow enough around the origin and that this motion is confined within the xy-plane, perpendicular to the light propagation direction. Then, we can assume z ≈ 0 in (4.89). Ignoring kz in (4.89) accordingly and taking the real part, we have

$$\mathbf{E} = \frac{1}{\sqrt{2}}E_0(\mathbf{e}_1\cos\omega t + \mathbf{e}_2\sin\omega t).$$

Thus, the force F exerted on the electron is described by

$$\mathbf{F} = e\mathbf{E}, \tag{4.90}$$

where e is the elementary charge (e < 0). Accordingly, the equation of motion of the electron is approximated by

$$m\ddot{\mathbf{x}} = e\mathbf{E}, \tag{4.91}$$

where m is the mass of the electron and x is its position vector. For the individual coordinate components, we have

$$m\ddot{x} = \frac{1}{\sqrt{2}}eE_0\cos\omega t \quad \text{and} \quad m\ddot{y} = \frac{1}{\sqrt{2}}eE_0\sin\omega t. \tag{4.92}$$

Integrating (4.92) twice, we get

$$mx = -\frac{eE_0}{\sqrt{2}\,\omega^2}\cos\omega t + Ct + D, \tag{4.93}$$

where C and D are integration constants. Setting $x(0) = -\frac{eE_0}{\sqrt{2}\,m\omega^2}$ and x′(0) = 0, we have C = D = 0. Similarly, we have

152

4

Optical Transition and Selection Rules

eE my = - p 0 sin ωt þ C 0 t þ D0 , 2ω2 where C′ and D′ are integration constants. Setting y(0) = 0 and y0 ð0Þ = have C′ = D′ = 0. Thus, making t a parameter, we get x2 þ y 2 = p

eE 0 2mω2

ð4:94Þ peE 0 , 2mω

we

2

:

ð4:95Þ

This implies that the electron is executing a counterclockwise circular motion with a radius −eE_0/(√2 mω^2) under the influence of the electric field. This is consistent with the motion of an electron in the coherent state of ϕ(1s) and ϕ(2p_{x+iy}) as expressed in (4.57). The angular momentum the electron has acquired is

  L = x × p = x p_y − y p_x = [−eE_0/(√2 mω^2)]·[−eE_0/(√2 ω)] = e^2 E_0^2/(2mω^3).   (4.96)

Identifying this with ħ, we have

  e^2 E_0^2/(2mω^3) = ħ.   (4.97)

In terms of energy, we have

  e^2 E_0^2/(2mω^2) = ħω.   (4.98)

Assuming that the wavelength of the light is 600 nm, we need left-circularly polarized light whose electric field is about 1.5 × 10^10 [V/m]. The radius α of the circular motion of the electron is given by

  α = −eE_0/(√2 mω^2).   (4.99)

Under the same condition as the above, α is estimated to be ~2 Å.
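These estimates are easy to reproduce. The following sketch (standard CODATA constants hard-coded in SI units; an illustration, not part of the text) evaluates E_0 from (4.98) and the radius α from (4.99) at λ = 600 nm:

```python
import math

# Physical constants (SI, standard CODATA values; not taken from the text).
hbar = 1.054571817e-34   # J s
m_e  = 9.1093837015e-31  # kg
q    = 1.602176634e-19   # C, magnitude of the elementary charge
c    = 2.99792458e8      # m/s

lam   = 600e-9                  # wavelength, m
omega = 2 * math.pi * c / lam   # angular frequency, rad/s

# (4.98): e^2 E0^2 / (2 m omega^2) = hbar * omega  ->  E0 = sqrt(2 m omega^3 hbar) / |e|
E0 = math.sqrt(2 * m_e * omega**3 * hbar) / q

# (4.99): radius alpha = |e| E0 / (sqrt(2) m omega^2)
alpha = q * E0 / (math.sqrt(2) * m_e * omega**2)

print(f"E0    ≈ {E0:.3e} V/m")       # about 1.5e10 V/m
print(f"alpha ≈ {alpha * 1e10:.2f} Å")  # about 2 Å
```

Both figures agree with the estimates quoted in the text.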

References

1. Jackson JD (1999) Classical electrodynamics, 3rd edn. Wiley, New York, NY
2. Fowles GR (1989) Introduction to modern optics, 2nd edn. Dover, New York, NY
3. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham, MA
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World Scientific, Singapore
5. Sunakawa S (1991) Quantum mechanics. Iwanami, Tokyo (in Japanese)
6. Rossi B (1957) Optics. Addison-Wesley, Reading, MA

Chapter 5

Approximation Methods of Quantum Mechanics

In Chaps. 1–3 we focused on solving eigenvalue equations for a particle confined within a one-dimensional potential well, a harmonic oscillator, and an electron of a hydrogen-like atom. In each example we obtained exact analytical solutions for the quantum-mechanical states and corresponding eigenvalues (energy, angular momentum, etc.). In most quantum-mechanical problems, however, we are not able to get such analytical solutions or accurately determine the corresponding eigenvalues. Under these circumstances, we need appropriate approximation methods for those problems. Among such methods, the perturbation method and the variational method are widely used. To illustrate their usefulness, we provide several examples concerning physical systems that have already appeared in Chaps. 1–3. In this chapter, we examine how these physical systems change their quantum states and corresponding energy eigenvalues as a result of undergoing the influence of an external field. We assume that the change results from the application of an external electric field. For simplicity, we focus on the change in eigenenergy and corresponding eigenstate with respect to non-degenerate quantum states. In some of these examples, we happen to be able to obtain the perturbed physical quantities exactly. Including such cases, for later purposes we introduce in advance the important concepts of a complete orthonormal system (CONS) and the projection operator.

5.1 Perturbation Method

© Springer Nature Singapore Pte Ltd. 2023
S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_5

In Chaps. 1–3, we considered a situation where no external field is exerted on a physical system. In Chap. 4, however, we studied the optical transition that takes place as a consequence of the interaction between the physical system and an electromagnetic wave. In this chapter, we wish to examine how the physical system changes its quantum state when an external electric field is applied to the system. We usually assume that the external field is weak and the corresponding change in the quantum state or energy is small. The applied external field causes a change in the Hamiltonian of the quantum system and results in a change in its quantum state, usually accompanied by an energy change of the system. We describe this process by the following eigenvalue equation:

  H |Ψ_i⟩ = E_i |Ψ_i⟩,   (5.1)

where H represents the total Hamiltonian, E_i is an energy eigenvalue, and Ψ_i is an eigenstate corresponding to E_i. Equation (5.1) implies that there should be a series of eigenvalues and corresponding eigenvectors belonging to those eigenvalues. Strictly speaking, however, we can get such combinations of the eigenstate and eigenvalue only by solving (5.1) analytically. This is some kind of circular argument. Yet, at present we do not have to go into detail about that issue. Instead we advance to a discussion of the approximation methods from a practical point of view. We assume that H is divided into two terms such that

  H = H_0 + λV,   (5.2)

where H_0 is the Hamiltonian of the unperturbed system without the external field; V is an additional Hamiltonian due to the applied external field, or the interaction between the physical system and the external field. The second term includes a parameter λ that can be modulated according to the change in the strength of the applied field. The parameter λ should be real so that H of (5.2) can be Hermitian. The parameter λ can be the applied field itself or can merely be a dimensionless parameter that can be set at λ = 1 after finishing the calculation. We will come back to this point later in the examples. In (5.2) we assume that the following equation holds:

  H_0 |i⟩ = E_i^(0) |i⟩,   (5.3)

where |i⟩ is the i-th quantum state. We designate |0⟩ as the ground state. We assume the excited states |i⟩ (i = 1, 2, ⋯) to be numbered in order of increasing energy. These functions (or vectors) |i⟩ (including |0⟩) can be any of the quantum states that appeared as eigenfunctions in the previous chapters. Of these, two or more functions may have the same eigenenergy (i.e., be degenerate). If we were to deal with such degenerate states, those states |i⟩ would have to be distinguished by, e.g., |i_d⟩ (d = 1, 2, ⋯, s), where s denotes the degeneracy (or degree of degeneracy). For simplicity, however, in this chapter we are going to examine only the change in energy and physical states with respect to non-degenerate states. We disregard the issue of notation for the degenerate states accordingly. We pay attention to each individual case later, when necessary. Regarding a one-dimensional physical system, e.g., a particle confined within a potential well or a quantum-mechanical harmonic oscillator, all the quantum states are non-degenerate as we have already seen in Chaps. 1 and 2. As a special case we also consider the ground state of a hydrogen-like atom. This is because that state is non-degenerate, and so the problem can be dealt with in parallel to the cases of the one-dimensional physical systems. We assume that the quantum states |i⟩ (i = 0, 1, 2, ⋯) constitute orthonormal eigenvectors such that

  ⟨j|i⟩ = δ_{ji}.   (5.4)

Remember that the notation has already appeared in (2.53); see Chap. 13 as well.

5.1.1 Quantum State and Energy Level Shift Caused by Perturbation

One of the most important applications of the perturbation method is to evaluate the shift in quantum state and energy level caused by the applied external field. To this end, we expand both the quantum state and the energy as a power series in λ. That is, in (5.1) we expand |Ψ_i⟩ and E_i such that

  |Ψ_i⟩ = |i⟩ + λ|ϕ_i^(1)⟩ + λ^2|ϕ_i^(2)⟩ + λ^3|ϕ_i^(3)⟩ + ⋯,   (5.5)

  E_i = E_i^(0) + λE_i^(1) + λ^2 E_i^(2) + λ^3 E_i^(3) + ⋯,   (5.6)

where |ϕ_i^(1)⟩, |ϕ_i^(2)⟩, |ϕ_i^(3)⟩, etc. are chosen as correction terms for the state |i⟩ that is associated with the unperturbed system. Once again, we assume that the state |i⟩ (i = 0, 1, 2, ⋯) represents the non-degenerate normalized eigenvector that belongs to the eigenenergy E_i^(0) of the unperturbed state. The unknown state vectors |ϕ_i^(1)⟩, |ϕ_i^(2)⟩, |ϕ_i^(3)⟩, etc. as well as E_i^(1), E_i^(2), E_i^(3), etc. are to be determined by the calculation procedures carried out from now on. The states |Ψ_i⟩ and energies E_i result from the perturbation term λV and represent the deviation from |i⟩ and E_i^(0) of the unperturbed system. With the normalization condition we impose the following condition upon |Ψ_i⟩:

  ⟨i|Ψ_i⟩ = 1.   (5.7)

This condition, however, does not necessarily mean that |Ψ_i⟩ has been normalized (vide infra). From (5.4) and (5.7) we have

  ⟨i|ϕ_i^(1)⟩ = ⟨i|ϕ_i^(2)⟩ = ⋯ = 0.   (5.8)

Let us calculate ⟨Ψ_i|Ψ_i⟩ on condition of (5.7) and (5.8). From (5.5) we have

  ⟨Ψ_i|Ψ_i⟩ = ⟨i|i⟩ + λ(⟨i|ϕ_i^(1)⟩ + ⟨ϕ_i^(1)|i⟩) + λ^2(⟨i|ϕ_i^(2)⟩ + ⟨ϕ_i^(2)|i⟩ + ⟨ϕ_i^(1)|ϕ_i^(1)⟩) + ⋯
     = ⟨i|i⟩ + λ^2⟨ϕ_i^(1)|ϕ_i^(1)⟩ + ⋯ ≈ 1 + λ^2⟨ϕ_i^(1)|ϕ_i^(1)⟩,

where we used (5.4) and (5.8). Thus, ⟨Ψ_i|Ψ_i⟩ is normalized if we ignore the terms of order λ^2. Now, inserting (5.5) and (5.6) into (5.1) and comparing the same powers of λ, we obtain

  (E_i^(0) − H_0)|i⟩ = 0,   (5.9)

  (E_i^(0) − H_0)|ϕ_i^(1)⟩ + E_i^(1)|i⟩ = V|i⟩,   (5.10)

  (E_i^(0) − H_0)|ϕ_i^(2)⟩ + E_i^(1)|ϕ_i^(1)⟩ + E_i^(2)|i⟩ = V|ϕ_i^(1)⟩.   (5.11)

Equation (5.9) is identical with (5.3). Operating ⟨j| on both sides of (5.10) from the left, we get

  ⟨j|(E_i^(0) − H_0)|ϕ_i^(1)⟩ + E_i^(1)⟨j|i⟩ = (E_i^(0) − E_j^(0))⟨j|ϕ_i^(1)⟩ + E_i^(1)δ_{ji} = ⟨j|V|i⟩.   (5.12)

Putting j = i in (5.12), we have

  E_i^(1) = ⟨i|V|i⟩.   (5.13)

This represents the first-order correction to the energy eigenvalue. Meanwhile, assuming j ≠ i in (5.12), we have

  ⟨j|ϕ_i^(1)⟩ = ⟨j|V|i⟩/(E_i^(0) − E_j^(0))   (j ≠ i),   (5.14)

where we used E_i^(0) ≠ E_j^(0) on the assumption that the quantum state |i⟩ that belongs to the eigenenergy E_i^(0) is non-degenerate. Here, we emphasize that the eigenstates that correspond to E_j^(0) may or may not be degenerate. In other words, only the quantum state whose perturbation we are considering needs to be non-degenerate. In this context, we do not question whether or not the other quantum states [represented by |j⟩ in (5.14)] are degenerate.


Here we postulate that the eigenvectors |i⟩ (i = 0, 1, 2, ⋯) of H_0 form a complete orthonormal system (CONS) such that

  Σ_k |k⟩⟨k| = E,   (5.15)

where |k⟩⟨k| is said to be a projection operator and E is an identity operator. As remarked just above, (5.15) holds regardless of whether the eigenstates are degenerate. The term "complete system" implies that any vector (or function) can be expanded as a linear combination of the basis vectors of the said system. The formal definition of the projection operator can be seen in Chaps. 14 and 18. Thus, we have

  |ϕ_i^(1)⟩ = E|ϕ_i^(1)⟩ = Σ_k |k⟩⟨k|ϕ_i^(1)⟩ = Σ_{k≠i} |k⟩⟨k|ϕ_i^(1)⟩ = Σ_{k≠i} |k⟩ ⟨k|V|i⟩/(E_i^(0) − E_k^(0)),   (5.16)

where with the third equality we used (5.8) and with the last equality we used (5.14). Operating ⟨i| from the left on (5.16) and using (5.4), we recover (5.8). Hence, using (5.5), the quantum state approximated to the first order in λ is described by

  |Ψ_i⟩ ≈ |i⟩ + λ|ϕ_i^(1)⟩ = |i⟩ + λ Σ_{k≠i} |k⟩ ⟨k|V|i⟩/(E_i^(0) − E_k^(0)).   (5.17)

For a while, let us think of a case where we deal with the change in the eigenenergy and corresponding eigenstate with respect to a degenerate quantum state. In that case, regarding the state |i⟩ in question in (5.12), we have ∃|j⟩ (j ≠ i) that satisfies E_i^(0) = E_j^(0). Therefore, we would have ⟨j|V|i⟩ = 0 from (5.12). Generally, this is not the case, however, and so we need a relation different from (5.12) to deal with the degenerate case. However, once again we do not get into details about this issue in this book.

Next let us seek the second-order correction term E_i^(2) of the eigenenergy. Operating ⟨j| on both sides of (5.11) from the left, we get

  (E_i^(0) − E_j^(0))⟨j|ϕ_i^(2)⟩ + E_i^(1)⟨j|ϕ_i^(1)⟩ + E_i^(2)δ_{ji} = ⟨j|V|ϕ_i^(1)⟩.

Putting j = i in the above relation as before, we have

  E_i^(2) = ⟨i|V|ϕ_i^(1)⟩.   (5.18)

Notice that we used (5.8) to derive (5.18). Using (5.16) furthermore, we get

  E_i^(2) = Σ_{k≠i} ⟨i|V|k⟩ ⟨k|V|i⟩/(E_i^(0) − E_k^(0)) = Σ_{k≠i} ⟨k|V†|i⟩* ⟨k|V|i⟩/(E_i^(0) − E_k^(0))
     = Σ_{k≠i} ⟨k|V|i⟩* ⟨k|V|i⟩/(E_i^(0) − E_k^(0)) = Σ_{k≠i} |⟨k|V|i⟩|^2/(E_i^(0) − E_k^(0)),   (5.19)

where with the second equality we used (1.116) in combination with (1.118) and with the third equality we used the fact that V is an Hermitian operator, i.e., V† = V. The state |k⟩ is sometimes called an intermediate state. Considering the expression of (5.16), we find that the approximated state |i⟩ + λ|ϕ_i^(1)⟩ in (5.5) is not an eigenstate of energy, because the approximated state contains a linear combination of different eigenstates |k⟩ that have eigenenergies different from E_i^(0) of |i⟩. Substituting (5.13) and (5.19) into (5.6), with the energy correction terms up to the second order we have

ð1Þ

ð2Þ

1

ð0Þ

E i ≈ Ei þ λE i þ λ2 Ei = Ei þ λhijVjii þ λ2

k≠i

ð0Þ Ei

ð0Þ - Ek

j hk j V jiij2 : ð5:20Þ

If we think of the ground state |0⟩ for |i⟩, we get

  E_0 ≈ E_0^(0) + λE_0^(1) + λ^2 E_0^(2) = E_0^(0) + λ⟨0|V|0⟩ + λ^2 Σ_{k≠0} |⟨k|V|0⟩|^2/(E_0^(0) − E_k^(0)).   (5.21)

Since E_0^(0) < E_k^(0) (k = 1, 2, ⋯) for any |k⟩, the second-order correction term is always negative. This contributes to the energy stabilization.
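The machinery of (5.13), (5.19), and (5.21) can be checked against exact diagonalization. The following sketch uses a two-level toy model (my own illustration, not an example from the book), for which the exact ground-state eigenvalue is available in closed form:

```python
import math

# Two-level toy model (an illustration, not from the text):
# H0 = diag(0, Delta); V couples the two levels with strength v.
Delta, v, lam = 1.0, 0.3, 0.05

# Perturbative corrections for the ground state |0>, per (5.13) and (5.19):
E1 = 0.0                      # <0|V|0> = 0 in this model
E2 = v**2 / (0.0 - Delta)     # |<1|V|0>|^2 / (E0 - E1); negative, as (5.21) predicts
E_pert = 0.0 + lam * E1 + lam**2 * E2

# Exact ground-state eigenvalue of the 2x2 matrix H0 + lam*V:
E_exact = 0.5 * (Delta - math.sqrt(Delta**2 + 4 * (lam * v)**2))

print(E_pert, E_exact)        # differ only at order lambda^4
```

The second-order term is negative and reproduces the exact energy up to O(λ^4), consistent with the discussion above.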

5.1.2 Several Examples

To deepen the understanding of the perturbation method, we show several examples. We focus on the change in the eigenenergy and corresponding eigenstate with respect to specific quantum states. We assume that the change is caused by the applied electric field as a perturbation.

Example 5.1 Suppose that a charged particle is confined within a one-dimensional potential well. This problem has already appeared in Example 1.2. Now let us consider how the energy of a particle carrying a charge e is shifted by the applied electric field. This situation could experimentally be achieved using a


"nano capacitor" where, e.g., an electron is confined within the capacitor while a voltage is applied between the two electrodes. We adopt the same coordinate system as that of Example 1.2. That is, suppose that we apply an electric field F between the electrodes that are positioned at x = ±L and that the electron is confined between the two electrodes. Then, the Hamiltonian of the system is described as

  H = −(ħ^2/2m) d^2/dx^2 − eFx,   (5.22)

where m is the mass of the particle. Note that we may choose −eF for the parameter λ of Sect. 5.1.1. Then, the coordinate representation of the Schrödinger equation is given by

  −(ħ^2/2m) d^2ψ_i(x)/dx^2 − eFx ψ_i(x) = E_i ψ_i(x),   (5.23)

where E_i is the energy of the i-th state from the bottom (the lowest being the ground state) and ψ_i is its corresponding eigenstate. We wish to seek the perturbation energy up to the second order and the quantum state up to the first order. We are particularly interested in the change in the ground state |0⟩. Notice that the ground state is obtained in (1.101) by putting l = 0. According to (5.21) we have

  E_0 ≈ E_0^(0) + λ⟨0|V|0⟩ + λ^2 Σ_{k≠0} |⟨k|V|0⟩|^2/(E_0^(0) − E_k^(0)),   (5.24)

where V = −eFx. Since the field F is thought of as an adjustable parameter, in (5.24) we may replace λ with −eF (vide supra). Then, in (5.24) V is replaced by x in turn. In this way, (5.24) can be rewritten as

  E_0 ≈ E_0^(0) − eF⟨0|x|0⟩ + (−eF)^2 Σ_{k≠0} |⟨k|x|0⟩|^2/(E_0^(0) − E_k^(0)).   (5.25)

Considering that |0⟩ is an even function [i.e., a cosine function in (1.101)] with respect to x and that x is an odd function, we find that the second term vanishes. Notice that the explicit coordinate representation of |0⟩ is

  |0⟩ = √(1/L) cos(πx/2L),   (5.26)

[Fig. 5.1 Notation of the quantum states distinguishing cosine and sine functions: |2_cos⟩ ≡ |4⟩, |2_sin⟩ ≡ |3⟩, |1_cos⟩ ≡ |2⟩, |1_sin⟩ ≡ |1⟩, |0_cos⟩ ≡ |0⟩. E_k^(0) (k = 0, 1, 2, ⋯) represents the energy eigenvalues of the unperturbed states.]

which is obtained by putting l = 0 in |l_cos⟩ ≡ √(1/L) cos[(π/2L + lπ/L)x] (l = 0, 1, 2, ⋯) in (1.101). By the same token, ⟨k|x|0⟩ in the third term of RHS of (5.25) vanishes if |k⟩ denotes a cosine function in (1.101). If, however, |k⟩ denotes a sine function, ⟨k|x|0⟩ does not vanish. To distinguish the sine functions from the cosine functions, we denote

  |n_sin⟩ ≡ √(1/L) sin(nπx/L)   (n = 1, 2, 3, ⋯),   (5.27)

which is identical with (1.102); see Fig. 5.1 for the notation of the quantum states.

Now, putting E_0^(0) = (ħ^2/2m)(π/2L)^2 and E_{2n−1}^(0) = (ħ^2/2m)(2nπ/2L)^2 (n = 1, 2, 3, ⋯) in (5.25), we rewrite it as

  E_0 ≈ E_0^(0) − eF⟨0|x|0⟩ + (−eF)^2 Σ_{k=1}^{∞} |⟨k|x|0⟩|^2/(E_0^(0) − E_k^(0))
     = E_0^(0) − eF⟨0|x|0⟩ + (−eF)^2 Σ_{n=1}^{∞} |⟨n_sin|x|0⟩|^2/(E_0^(0) − E_{2n−1}^(0)),

where with the second equality n denotes the number appearing in (5.27) and |n_sin⟩ belongs to E_{2n−1}^(0). Referring to, e.g., (4.17) with regard to the calculation of ⟨k|x|0⟩, we arrive at the following equation:

  E_0 ≈ (ħ^2/2m)(π/2L)^2 − (−eF)^2 (32^2·8·mL^4/π^6 ħ^2) Σ_{n=1}^{∞} n^2/(4n^2 − 1)^5.   (5.28)
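The matrix element behind (5.28) is easy to verify by direct numerical integration. In the sketch below, L = 1 is an illustrative choice, and the closed form 32L(−1)^(n−1)n/[π^2(4n^2 − 1)^2] used in the comparison is my own evaluation of the integral, consistent with the 32^2·8 coefficient appearing in (5.28):

```python
import math

L = 1.0  # half-width of the well, chosen for illustration

def matrix_element(n, steps=20000):
    # <n_sin|x|0> = (1/L) ∫_{-L}^{L} x sin(n pi x / L) cos(pi x / 2L) dx, midpoint rule
    h = 2 * L / steps
    total = 0.0
    for i in range(steps):
        x = -L + (i + 0.5) * h
        total += x * math.sin(n * math.pi * x / L) * math.cos(math.pi * x / (2 * L))
    return total * h / L

for n in (1, 2, 3):
    closed = 32 * L * (-1)**(n - 1) * n / (math.pi**2 * (4 * n**2 - 1)**2)
    print(n, matrix_element(n), closed)   # numerical and closed-form values agree
```

The rapid fall-off with n (as (4n^2 − 1)^−2) is what makes the series in (5.28) converge so quickly.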

The series in the second term of (5.28) converges rapidly. Defining the N-th partial sum of the series as

  S(N) ≡ Σ_{n=1}^{N} n^2/(4n^2 − 1)^5,

we obtain

  S(1) = 1/243 ≈ 0.004115226   and   S(6) ≈ S(1000) ≈ 0.004120685,

as well as

  [S(1000) − S(1)]/S(1000) ≈ 0.001324653.

That is, the first term alone accounts for ~99.9% of the infinite series. Thus, the stabilization energy λ^2 E_0^(2) due to the perturbation is satisfactorily given by

  λ^2 E_0^(2) ≈ −(eF)^2 (32^2·8·mL^4)/(243π^6 ħ^2),   (5.29)

where the notation λ^2 E_0^(2) is due to (5.6). Meanwhile, the corrected term of the quantum state is given by

  |Ψ_0⟩ ≈ |0⟩ + λ|ϕ_0^(1)⟩ = |0⟩ − eF Σ_{k≠0} |k⟩ ⟨k|x|0⟩/(E_0^(0) − E_k^(0))
     = |0⟩ + eF (32·8·mL^3/π^4 ħ^2) Σ_{n=1}^{∞} [(−1)^(n−1) n/(4n^2 − 1)^3] |n_sin⟩.   (5.30)

For a reason similar to the above, the approximation is satisfactory if we adopt only n = 1 in the second term of RHS of (5.30). That is, we obtain

  |Ψ_0⟩ ≈ |0⟩ + eF (32·8·mL^3/27π^4 ħ^2) |1_sin⟩.   (5.31)

Thus, we have roughly estimated the stabilization energy and the corresponding quantum state as in (5.29) and (5.31), respectively. It may well be worth estimating rough numbers of L and F in the actual situation (or perhaps in an actual “nano device”). The estimation is left for readers. Example 5.2 [1] In Chap. 2 we dealt with a quantum-mechanical harmonic oscillator. Here let us suppose that a charged particle (with its charge e) is performing harmonic oscillation under an applied electric field. First we consider the coordinate representation of Schrödinger equation. Without the external field, the Schrödinger equation as an eigenvalue equation reads as -
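The convergence figures quoted for S(N) above can be confirmed in a few lines:

```python
# Partial sums S(N) of the series n^2/(4n^2 - 1)^5 appearing in (5.28).
def S(N):
    return sum(n**2 / (4 * n**2 - 1)**5 for n in range(1, N + 1))

print(S(1))                          # 1/243 ≈ 0.004115226...
print(S(1000))                       # ≈ 0.004120685...
print((S(1000) - S(1)) / S(1000))    # ≈ 0.0013247, i.e., the n = 1 term gives ~99.9%
```

This reproduces the numbers used to justify keeping only the n = 1 term in (5.29) and (5.31).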

ħ2 d 2 uð qÞ 1 þ mω2 q2 uðqÞ = EuðqÞ: 2m dq2 2

ð2:108Þ

Under the applied electric field, the perturbation is expressed as -eFq and, hence, the equation can be written as -

ħ2 d 2 u ð q Þ 1 þ mω2 q2 uðqÞ - eF ðqÞquðqÞ = EuðqÞ: 2m dq2 2

ð5:32Þ

For simplicity we assume that the electric field is uniform independent of q so that we can deal with it as a constant. Then, (5.32) can be rewritten as -

ħ2 d 2 uð qÞ 1 eF þ mω2 q 2m dq2 2 mω2

2

eF 1 uðqÞ = E þ mω2 2 mω2

2

uðqÞ:

ð5:33Þ

Changing the variable such that Q  qwe have

eF , mω2

ð5:34Þ

5.1

Perturbation Method

-

165

eF ħ2 d 2 u ð q Þ 1 1 þ mω2 Q2 uðqÞ = E þ mω2 2m dQ2 2 2 mω2

2

uðqÞ:

ð5:35Þ

Taking account of the change in functional form that results from the variable transformation, we rewrite (5.35) as -

1 ħ2 d 2 uð Q Þ 1 eF þ mω2 Q2 uðQÞ = E þ mω2 2 2 2m dQ2 mω2

2

uðQÞ,

ð5:36Þ

where uðqÞ = u Q þ

eF  uðQÞ: mω2

ð5:37Þ

Defining E as eF 1 E  E þ mω2 2 mω2

2

=E þ

e2 F 2 , 2mω2

ð5:38Þ

we get -

ħ2 d 2 u ð Q Þ 1 þ mω2 Q2 uðQÞ = EuðQÞ: 2m dQ2 2

ð5:39Þ

Thus, we recover exactly the same form as (2.108). Equation (5.39) can be solved analytically as already shown in Sect. 2.4. From (2.110) and (2.120), we have λ

2E , ħω

λ = 2n þ 1:

Then, we have E=

ħωð2n þ 1Þ ħω : λ= 2 2

ð5:40Þ

From (5.38), we get E=E-

-

1 e2 F 2 e2 F 2 = ħω n þ : 2 2 2mω 2mω2

ð5:41Þ

This implies that the energy stabilization due to the applied electric field is e2 F 2 2mω2 . Note that the system is always stabilized regardless of the sign of e.

166

5

Approximation Methods of Quantum Mechanics

Next, we wish to consider the problem on the basis of the matrix (or operator) representation. Let us evaluate the perturbation energy using (5.20). Conforming the notation of (5.20) to the present case, we have 1

E n ≈ Eðn0Þ - eF hnjqjni þ ð- eF Þ2

k≠n

Eðn0Þ

ð0Þ - Ek

j hk j qjnij2 ,

ð5:42Þ

where Eðn0Þ is given by E ðn0Þ = ħω n þ

1 : 2

ð5:43Þ

Taking account of (2.55), (2.68), and (2.62), the second term of (5.42) vanishes. With the third term, using (2.68) as well as (2.55) and (2.61) we have jhkjqjnij2 = = = = =

 ħ ħ kja þ a{ jn kja þ a{ jn 2mω 2mω p 2 ħ p nhkjn - 1i þ n þ 1hkjn þ 1i 2mω p 2 ħ p nδk,n - 1 þ n þ 1δk,nþ1 2mω ħ nδk,n - 1 þ ðn þ 1Þδk,nþ1 þ 2δk,n - 1 δk,nþ1 2mω ħ ½nδ þ ðn þ 1Þδk,nþ1 : 2mω k,n - 1

ð5:44Þ nð n þ 1 Þ

Notice that with the last equality of (5.44) there is no k that satisfies k = n 1 = n + 1 at once. Also note that hk| q| ni is a real number. Hence, as the third term of (5.42) we get 1 k≠n

ð0Þ

Enð0Þ - E k

jhkjqjnij2 = =

k≠n

1 ħ  ½nδ þ ðn þ 1Þδk,nþ1  ħωðn - k Þ 2mω k,n - 1

n nþ1 1 1 þ : =2mω2 2mω2 n - ðn - 1Þ n - ðn þ 1Þ

ð5:45Þ Thus, from (5.42) we obtain E n ≈ ħω n þ

1 e2 F 2 : 2 2mω2

ð5:46Þ

Consequently, we find that (5.46) obtained by the perturbation method is consistent with (5.41) that was obtained as the exact solution. As already pointed out in

5.1

Perturbation Method

167

Sect. 5.1.1, however, (5.46) does not represent an eigenenergy. In fact, using (5.17) together with (4.26) and (4.28), we have ð1Þ

jΨ0 i ≈ j0i þ λ j ϕ0

¼ j0i þ λ

k≠0

j ki

hkjV j0i ð0Þ E0

ð0Þ

- Ek

h1jV j0i

¼ j0i þ λ j 1i

ð0Þ

ð5:47Þ

ð0Þ

E0 - E1

¼ j0i - eF j 1i

h1jxj0i ð0Þ

ð0Þ

E0 - E1

e2 F 2 j 1i: 2mω3 ħ

¼ j0i þ

Thus, jΨ0i is not an eigenstate because jΨ0i contains both j0i and j1i that have different eigenenergies. Hence, jΨ0i does not possess a fixed eigenenergy. Notice also that in (5.47) the factors hk| x| 0i vanish except for h1| x| 0i; see Chaps. 2 and 4. To think of this point further, let us come back to the coordinate representation of Schrödinger equation. From (5.39) and (2.106), we have un ðQÞ = where H n

mω ħ Q

mω ħ

1 4

1 Hn π 2n n! 1 2

=

mω ħ

1 4

ðn = 0, 1, 2, ⋯Þ,

is the Hermite polynomial of the n-th order. Using (5.34) and

(5.37), we replace Q with q un ðqÞ = un q -

mω 2 mω Q e - 2ħ Q ħ

eF mω2 1 Hn 1 π 2 2n n!

eF mω2

to obtain the following form described by

mω eF qħ mω2

e - 2ħ



q-

eF mω2

2

ðn = 0, 1, 2, ⋯Þ: ð5:48Þ

Since ψ n(q) of (2.106) forms the (CONS) [2], un(q) should be expanded using ψ n(q). That is, as a solution of (5.32) we get u n ð qÞ =

c ψ ðqÞ, k nk k

ð5:49Þ

where a set of cnk are appropriate coefficients. Since the functions ψ k(q) are non-degenerate, un(q) expressed by (5.49) lacks a definite eigenenergy. Example 5.3 [1] In the previous examples we studied the perturbation in one-dimensional systems, where all the energy eigenstates are non-degenerate. Here we deal with the change in the energy and quantum states of a hydrogen atom (i.e., a three-dimensional system). Since the energy eigenstates are generally degenerate (see Chap. 3), for simplicity we consider only the ground state that is

168

5 Approximation Methods of Quantum Mechanics

non-degenerate. As in the previous two cases, we assume that the perturbation is caused by the applied electric field. As the simplest example, we deal with the properties including the polarizability of the hydrogen atom in its ground state. The polarizability of atom, molecule, etc. is one of the most fundamental properties in materials science. We assume that the electric field is applied along the direction of the z-axis; in this case the perturbation term V is expressed as V = - eFz,

ð5:50Þ

where e is a charge of an electron (e < 0). Then, the total Hamiltonian H is described by H = H 0 þ V,

ð5:51Þ 2

H0 = -

ħ2 ∂ ∂ 1 ∂ ∂ 1 ∂ r2 þ sin θ þ sin θ ∂θ 2μr 2 ∂r ∂r ∂θ sin 2 θ ∂ϕ2

e2 , 4πε0 r

ð5:52Þ

where H0 is identical with the Hamiltonian H of (3.35), in which Z = 1. In this example, we discuss topics on the polarization and energy shift (Stark effect) caused by the applied electric field [1, 3]. 1. Polarizability of a hydrogen atom in the ground state As in (5.5) and (5.16), the first-order perturbed state is described by j Ψ0 i ≈ j 0i þ j ki

k≠0

j ki

hkjzj0i ð0Þ E0

ð0Þ

- Ek

,

hkjVj0i ð0Þ E0

ð0Þ

- Ek

= j 0i - eF

k≠0

ð5:53Þ

where j0i denotes the ground state expressed as (3.301). Since the quantum states jki represent all the quantum states of the hydrogen atom as implied in (5.3), in the second term of RHS in (5.53) we are considering all those states except for ð0Þ ħ2 j0i. The eigenenergy E 0 is identical with E 1 = - 2μa 2 obtained from (3.258), where n = 1. As already noted, we simply numbered the states jki in order of ð0Þ ð0Þ increasing energies. In the case of k ≠ j (k, j ≠ 0), we may have either Ek = E j ð0Þ

ð0Þ

or E k ≠ E j accordingly. Using (5.51), let us calculate the polarizability of a hydrogen atom in a ground state. In Sect. 4.1 we gave a definition of the electric dipole moment such that

  P ≡ Σ_j e x_j,   (4.6)

where x_j is the position vector of the j-th charged particle. Placing the proton of hydrogen at the origin, by use of the notation (3.5), P is expressed as

  P = (e_1 e_2 e_3)(P_x P_y P_z)^T = ex = (e_1 e_2 e_3)(ex ey ez)^T.

Hence, the z-component of the electric dipole moment of the electron is given by

  P_z = ez.   (5.54)

The quantum-mechanical analogue of (5.54) is given by

  ⟨P_z⟩ = e⟨Ψ_0|z|Ψ_0⟩,   (5.55)

where |Ψ_0⟩ is taken from (5.53). Substituting (5.53) into (5.55), we get

  ⟨Ψ_0|z|Ψ_0⟩ = [⟨0| − eF Σ_{k≠0} ⟨k| ⟨k|z|0⟩/(E_0^(0) − E_k^(0))] z [|0⟩ − eF Σ_{k≠0} |k⟩ ⟨k|z|0⟩/(E_0^(0) − E_k^(0))]
     = ⟨0|z|0⟩ − eF Σ_{k≠0} ⟨0|z|k⟩ ⟨k|z|0⟩/(E_0^(0) − E_k^(0)) − eF Σ_{k≠0} ⟨0|z†|k⟩ ⟨k|z|0⟩/(E_0^(0) − E_k^(0))
       + (eF)^2 Σ_{k≠0} ⟨k|z|k⟩ |⟨k|z|0⟩|^2/(E_0^(0) − E_k^(0))^2.   (5.56)

In (5.56) readers are referred to the computation rules of inner products described in Chap. 1 or in Chap. 13 for further details. Since z is Hermitian (i.e., z† = z), the second and third terms of RHS of (5.56) are equal. Neglecting the second-order perturbation factor in the fourth term, we have

  ⟨Ψ_0|z|Ψ_0⟩ ≈ ⟨0|z|0⟩ − 2eF Σ_{k≠0} |⟨k|z|0⟩|^2/(E_0^(0) − E_k^(0)).

Thus, from (5.55) we obtain

  ⟨P_z⟩ ≈ e⟨0|z|0⟩ − 2e^2 F Σ_{k≠0} |⟨k|z|0⟩|^2/(E_0^(0) − E_k^(0)).   (5.57)

Using (3.301), the coordinate representation of ⟨0|z|0⟩ is described by

  ⟨0|z|0⟩ = (a^−3/π) ∫ z e^(−2r/a) dx dy dz
     = (a^−3/π) (∫_0^∞ r^3 e^(−2r/a) dr) (∫_0^π cos θ sin θ dθ) (∫_0^{2π} dϕ)
     = a^−3 (∫_0^∞ r^3 e^(−2r/a) dr) ∫_0^π sin 2θ dθ = 0,   (5.58)

where we used ∫_0^π sin 2θ dθ = 0. Then, we have

  ⟨P_z⟩ ≈ −2e^2 F Σ_{k≠0} |⟨k|z|0⟩|^2/(E_0^(0) − E_k^(0)).   (5.59)

On the basis of classical electromagnetism, we define the polarizability α as [4]

  α ≡ ⟨P_z⟩/F.   (5.60)

Here we have an important relationship between the polarizability and the electric dipole moment. That is,

  (polarizability) × (electric field) = (electric dipole moment).

From (5.59) and (5.60) we have

  α = −2e^2 Σ_{k≠0} |⟨k|z|0⟩|^2/(E_0^(0) − E_k^(0)).   (5.61)

At first glance, evaluating (5.61) appears formidable, but making the most of the fact that the total wave functions of a hydrogen atom form a CONS, the evaluation is straightforward. The discussion is as follows: First, let us seek an operator G that satisfies [1]

  z|0⟩ = (GH_0 − H_0G)|0⟩.   (5.62)

To this end, we take account of all the quantum states |k⟩ of a hydrogen atom (including |0⟩) that are solutions (or eigenstates of energy) of (3.36). In Sect. 3.8 we obtained the total wave functions described by

  Λ_{l,m}^(n) = Y_l^m(θ, ϕ) R_l^(n)(r),   (3.300)

j jihj j = E,

ð5:63Þ

where E is an identity operator. The implications of (5.63) are as follows: Take any function jf(r)i and operate (5.63) from the left. Then, we get Ef ðrÞ = f ðrÞ =

k

j kihkjf ðrÞi =

f k k

j ki:

ð5:64Þ

In other words, (5.64) implies that any function f(r) can be expanded into a series of |k⟩. The coefficients f_k are defined as

  f_k ≡ ⟨k|f(r)⟩.   (5.65)

These are the so-called Fourier coefficients. Related discussion is given in Sect. 10.4 as well as Chaps. 14, 18, 20, and 22. We further impose the following conditions on the operator G defined in (5.62): (1) G commutes with r (=|r|); (2) G has a functional form described by G = G(r, θ); (3) G does not contain a differential operator. On those conditions, we have ∂G/∂ϕ = 0. From (3.301), we have ∂|0⟩/∂ϕ = 0. Thus, we get

  (GH_0 − H_0G)|0⟩ = −(ħ^2/2μ)[G (1/r^2)(∂/∂r)(r^2 ∂/∂r) − (1/r^2)(∂/∂r)(r^2 ∂/∂r) G − (1/(r^2 sin θ))(∂/∂θ)(sin θ ∂/∂θ) G] |0⟩.   (5.66)

This calculation is somewhat complicated, and so we calculate (5.66) termwise. With the second term of (5.66), we have

  −(1/r^2)(∂/∂r)[r^2 (∂/∂r)(G|0⟩)]
     = −(1/r^2)(∂/∂r)(r^2 ∂G/∂r)|0⟩ − (1/r^2)(∂/∂r)(r^2 G ∂|0⟩/∂r)
     = −(2/r)(∂G/∂r)|0⟩ − (∂^2G/∂r^2)|0⟩ − 2(∂G/∂r)(∂|0⟩/∂r) − G (1/r^2)(∂/∂r)(r^2 ∂|0⟩/∂r)
     = −(2/r)(∂G/∂r)|0⟩ + (2/a)(∂G/∂r)|0⟩ − (∂^2G/∂r^2)|0⟩ − G (1/r^2)(∂/∂r)(r^2 ∂|0⟩/∂r),   (5.67)

where with the last equality we used (3.301), namely |0⟩ ∝ e^(−r/a), where a is the Bohr radius (Sect. 3.7.1). The last term of (5.67) cancels the first term of (5.66). Then, we get

  (GH_0 − H_0G)|0⟩ = (ħ^2/2μ)[2(1/r − 1/a)(∂G/∂r) + ∂^2G/∂r^2 + (1/(r^2 sin θ))(∂/∂θ)(sin θ ∂G/∂θ)] |0⟩.   (5.68)

From (5.62) we obtain

  ⟨r|z|0⟩ = ⟨r|(GH_0 − H_0G)|0⟩,   (5.69)

so that we can have the coordinate representation. As a result, we get

  r cos θ ϕ(1s) = (ħ^2/2μ)[2(1/r − 1/a)(∂G/∂r) + ∂^2G/∂r^2 + (1/(r^2 sin θ))(∂/∂θ)(sin θ ∂G/∂θ)] ϕ(1s),   (5.70)

where ϕ(1s) is given by (3.301). Dividing (5.70) by ϕ(1s) and rearranging it, we have

  2(1/r − 1/a)(∂G/∂r) + ∂^2G/∂r^2 + (1/(r^2 sin θ))(∂/∂θ)(sin θ ∂G/∂θ) = (2μ/ħ^2) r cos θ.   (5.71)

Now, we assume that G has a functional form of

  G(r, θ) = g(r) cos θ.   (5.72)

Inserting (5.72) into (5.71), we have

  d^2g/dr^2 + 2(1/r − 1/a) dg/dr − 2g/r^2 = (2μ/ħ^2) r.   (5.73)

Equation (5.73) has a regular singular point at the origin and resembles an Euler equation, whose general homogeneous form is described by [5]

  d^2g/dr^2 + (a/r) dg/dr + (b/r^2) g = 0,   (5.74)

where a and b are arbitrary constants (not to be confused with the Bohr radius a appearing in (5.73)).

where a and b are arbitrary constants. In particular, if we have a following differential equation described by a d2 g a dg þ - 2 g = 0, r dr 2 r dr

ð5:75Þ

we immediately see that one of the particular solutions is g = r. However, (5.73) differs from the general second-order Euler equation by the presence of the term −(2/a) dg/dr. Then, let us assume that a particular solution g has the form

  g = pr^2 + qr.   (5.76)

Inserting (5.76) into (5.73), we get

  4p − 4pr/a − 2q/a = (2μ/ħ^2) r.   (5.77)

Comparing the coefficients of r and the constant terms, we obtain

  p = −aμ/(2ħ^2)   and   q = −a^2μ/ħ^2.   (5.78)

Hence, we get

  g(r) = −(aμ/ħ^2)(r/2 + a) r.   (5.79)
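As a quick sanity check, in units where a = μ = ħ = 1 (an assumption made purely for illustration), the particular solution (5.79) reduces to g(r) = −(r/2 + 1)r, and substituting it into (5.73) gives a vanishing residual:

```python
# Check that g(r) = -(a*mu/hbar^2)(r/2 + a) r solves (5.73):
#   g'' + 2 (1/r - 1/a) g' - 2 g / r^2 = (2 mu / hbar^2) r,
# in illustrative units a = mu = hbar = 1, where the RHS is 2r.
def g(r):   return -(r / 2 + 1.0) * r
def dg(r):  return -(r + 1.0)      # g'(r)
def d2g(r): return -1.0            # g''(r)

for r in (0.3, 1.0, 2.7, 10.0):
    lhs = d2g(r) + 2 * (1 / r - 1.0) * dg(r) - 2 * g(r) / r**2
    print(r, lhs - 2 * r)          # residual, zero up to rounding
```

This confirms that (5.76) with the coefficients (5.78) indeed solves (5.73).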

Accordingly, we have

  G(r, θ) = g(r) cos θ = −(aμ/ħ^2)(r/2 + a) r cos θ = −(aμ/ħ^2)(r/2 + a) z.   (5.80)

This is a coordinate representation of G(r, θ). Returning to (5.62) and operating ⟨j| from the left on both sides, we have

  ⟨j|z|0⟩ = ⟨j|(GH_0 − H_0G)|0⟩ = ⟨j|GH_0|0⟩ − ⟨j|H_0G|0⟩ = (E_0^(0) − E_j^(0)) ⟨j|G|0⟩.   (5.81)

Notice here that

  H_0|0⟩ = E_0^(0)|0⟩   (5.82)

and that

  ⟨j|H_0 = (H_0†|j⟩)† = (H_0|j⟩)† = (E_j^(0)|j⟩)† = E_j^(0)⟨j|,   (5.83)

where we used the computation rule of (1.118) and with the second equality we used the fact that H_0 is Hermitian, i.e., H_0† = H_0. Rewriting (5.81), we have

  ⟨j|G|0⟩ = ⟨j|z|0⟩/(E_0^(0) − E_j^(0)).   (5.84)

Multiplying both sides of (5.84) by ⟨0|z|j⟩ and summing over j (≠ 0), we obtain

∑_{j≠0} ⟨0|z|j⟩⟨j|G|0⟩ = ∑_{j≠0} ⟨0|z|j⟩⟨j|z|0⟩ / (E₀⁽⁰⁾ − Eⱼ⁽⁰⁾).   (5.85)

Adding ⟨0|z|0⟩⟨0|G|0⟩ to both sides of (5.85), we have

∑_j ⟨0|z|j⟩⟨j|G|0⟩ = ∑_{j≠0} ⟨0|z|j⟩⟨j|z|0⟩ / (E₀⁽⁰⁾ − Eⱼ⁽⁰⁾) + ⟨0|z|0⟩⟨0|G|0⟩.   (5.86)

But from (5.58), ⟨0|z|0⟩ = 0. Moreover, using the completeness of the |j⟩ described by (5.63) for the LHS of (5.86), we get

⟨0|zG|0⟩ = ∑_j ⟨0|z|j⟩⟨j|G|0⟩ = ∑_{j≠0} ⟨0|z|j⟩⟨j|z|0⟩ / (E₀⁽⁰⁾ − Eⱼ⁽⁰⁾) = ∑_{j≠0} |⟨j|z|0⟩|² / (E₀⁽⁰⁾ − Eⱼ⁽⁰⁾) = −α/2e²,   (5.87)

where with the second equality we used (5.84) and with the last equality we used (5.61); also, we used the fact that z is Hermitian. Meanwhile, using (5.80), the LHS of (5.87) is expressed as

⟨0|zG|0⟩ = −(aμ/ħ²)⟨0|z(r/2 + a)z|0⟩ = −(aμ/2ħ²)⟨0|zrz|0⟩ − (a²μ/ħ²)⟨0|z²|0⟩.   (5.88)

Noting that |0⟩ is spherically symmetric and that z and r commute, (5.88) can be rewritten as

⟨0|zG|0⟩ = −(aμ/6ħ²)⟨0|r³|0⟩ − (a²μ/3ħ²)⟨0|r²|0⟩,   (5.89)

where we used ⟨0|x²|0⟩ = ⟨0|y²|0⟩ = ⟨0|z²|0⟩ = ⟨0|r²|0⟩/3. Returning to the coordinate representation and taking account of the change of variables from Cartesian to polar coordinates as in (5.58), we get

⟨0|zG|0⟩ = −(aμ/6ħ²)(4π/πa³)(120a⁶/64) − (a²μ/3ħ²)(4π/πa³)(3a⁵/4) = −(aμ/ħ²)(5a³/4) − (aμ/ħ²)(4a³/4) = −(aμ/ħ²)(9a³/4).   (5.90)

To perform the definite integral calculations of (5.89) and obtain the result of (5.90), modify (3.263) and use the following formula: differentiate

∫₀^∞ e^(−rξ) dr = 1/ξ

four (or five) times with respect to ξ and then replace ξ with 2/a. Using (5.87) and (5.90), as the polarizability α we finally get

α = −2e²⟨0|zG|0⟩ = −2e² [−(aμ/ħ²)(9a³/4)] = (9a³/2)(e²aμ/ħ²) = (9a³/2)(4πε₀) = 18πε₀a³,   (5.91)

where we used

a = 4πε₀ħ²/μe².   (5.92)

See Sect. 3.7.1 for this relation.

2. Stark effect of a hydrogen atom in the ground state

The energy shift of the ground state of a hydrogen atom up to the second order is given by

E₀ ≈ E₀⁽⁰⁾ − eF⟨0|z|0⟩ + (−eF)² ∑_{j≠0} |⟨j|z|0⟩|² / (E₀⁽⁰⁾ − Eⱼ⁽⁰⁾).   (5.93)

Using (5.58) and (5.87) obtained above, we readily get

E₀ ≈ E₀⁽⁰⁾ − e²F² α/(2e²) = E₀⁽⁰⁾ − αF²/2 = E₀⁽⁰⁾ − 9πε₀a³F².   (5.94)

The energy shift −9πε₀a³F² is well known experimentally as the Stark shift. In the above examples, we have seen how energy levels and related properties are changed by an applied electric field. The energies of the system that result from the applied external field should not be regarded as energy eigenvalues, but rather as expectation values. Perturbation theory has wide application in physical problems; examples include the evaluation of transition probabilities between quantum states and the scattering of particles. For these topics, including the treatment of the degenerate case, interested readers are referred to the appropriate literature [1, 3].
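The chain (5.88)-(5.94) can be cross-checked symbolically. The snippet below is only a verification aid added here (it assumes the sympy library; the symbol names simply mirror the text): it reproduces ⟨0|r²|0⟩ = 3a², ⟨0|r³|0⟩ = 15a³/2, the polarizability (5.91), and the Stark shift (5.94).

```python
# Symbolic cross-check of (5.89)-(5.94) for the hydrogen 1s state.
# Assumes the sympy library; symbol names mirror the text.
import sympy as sp

r, a, mu, hbar, e, eps0, F = sp.symbols('r a mu hbar e epsilon0 F', positive=True)

# Radial probability density of phi(1s): |phi|^2 * 4*pi*r^2
rho = (1/(sp.pi*a**3)) * sp.exp(-2*r/a) * 4*sp.pi*r**2

r2 = sp.integrate(rho*r**2, (r, 0, sp.oo))      # <0|r^2|0> = 3a^2
r3 = sp.integrate(rho*r**3, (r, 0, sp.oo))      # <0|r^3|0> = 15a^3/2
assert sp.simplify(r2 - 3*a**2) == 0
assert sp.simplify(r3 - sp.Rational(15, 2)*a**3) == 0

# (5.89)/(5.90): <0|zG|0> = -(a*mu/(6*hbar^2))<r^3> - (a^2*mu/(3*hbar^2))<r^2>
zG = -(a*mu/(6*hbar**2))*r3 - (a**2*mu/(3*hbar**2))*r2
assert sp.simplify(zG + (a*mu/hbar**2)*sp.Rational(9, 4)*a**3) == 0

# (5.91) with (5.92): alpha = -2 e^2 <0|zG|0>, a = 4*pi*eps0*hbar^2/(mu*e^2)
alpha = (-2*e**2*zG).subs(mu, 4*sp.pi*eps0*hbar**2/(e**2*a))
assert sp.simplify(alpha - 18*sp.pi*eps0*a**3) == 0

# (5.94): Stark shift -alpha*F^2/2 = -9*pi*eps0*a^3*F^2
assert sp.simplify(-alpha*F**2/2 + 9*sp.pi*eps0*a**3*F**2) == 0
print("alpha =", sp.simplify(alpha))
```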

5.2

Variational Method

Another approximation method is the variational method, which also has a wide range of applications in mathematical physics. A major use of the variational method lies in finding a suitably approximated eigenvalue. Suppose that we have an Hermitian differential operator D that satisfies an eigenvalue equation


Dyₙ = λₙyₙ (n = 1, 2, 3, ⋯),   (5.95)

where λₙ is a real eigenvalue and yₙ is a corresponding eigenfunction. We have already encountered a typical differential operator and eigenvalue equation of this kind in (1.63) and (1.64). In (5.95) we assume that

λ₁ ≤ λ₂ ≤ λ₃ ≤ ⋯,   (5.96)

where the corresponding eigenstates may be degenerate. We also assume that the collection {yₙ; n = 1, 2, 3, ⋯} constitutes a CONS. Now suppose that we have a function u that satisfies the same boundary conditions (BCs) as the yₙ that solve (5.95). Then, we have

⟨yₙ|Du⟩ = ⟨Dyₙ|u⟩ = ⟨λₙyₙ|u⟩ = λₙ*⟨yₙ|u⟩ = λₙ⟨yₙ|u⟩.   (5.97)

In (5.97) we used (1.132) and the computation rule of the inner product (see Sect. 13.1); also, we used the fact that any eigenvalue of an Hermitian operator is real (Sect. 1.4). From the assumption that the functions {yₙ} constitute a CONS, u can be expanded in a series described by

u = ∑ⱼ cⱼ|yⱼ⟩   (5.98)

with

cⱼ ≡ ⟨yⱼ|u⟩.   (5.99)

This relation parallels (5.64). Similarly, Du can be expanded such that

Du = ∑ⱼ dⱼ|yⱼ⟩ = ∑ⱼ ⟨yⱼ|Du⟩|yⱼ⟩ = ∑ⱼ λⱼ⟨yⱼ|u⟩|yⱼ⟩ = ∑ⱼ λⱼcⱼ|yⱼ⟩,   (5.100)

where dⱼ is an expansion coefficient; with the third equality we used (5.95) and with the last equality we used (5.99). Comparing the individual coefficients of |yⱼ⟩ in (5.100), we get

dⱼ = λⱼcⱼ.   (5.101)

Thus, we obtain

Du = ∑ⱼ λⱼcⱼ|yⱼ⟩.   (5.102)

Comparing (5.98) and (5.102), we find that we may operate D termwise on (5.98). Meanwhile, taking the inner product ⟨u|Du⟩ again termwise with (5.100), we get

⟨u|Du⟩ = ⟨u|∑ⱼ λⱼcⱼyⱼ⟩ = ∑ⱼ λⱼcⱼ⟨u|yⱼ⟩ = ∑ⱼ λⱼcⱼcⱼ* = ∑ⱼ λⱼ|cⱼ|² ≥ ∑ⱼ λ₁|cⱼ|² = λ₁ ∑ⱼ |cⱼ|²,   (5.103)

where with the third equality we used (5.99) in combination with ⟨u|yⱼ⟩ = ⟨yⱼ|u⟩* = cⱼ*. For ⟨u|u⟩, we have

⟨u|u⟩ = ⟨∑ⱼ cⱼyⱼ | ∑ₖ cₖyₖ⟩ = ∑ⱼ,ₖ cⱼ*cₖ⟨yⱼ|yₖ⟩ = ∑ⱼ |cⱼ|²,   (5.104)

where with the last equality we used the fact that the functions {yₙ} are orthonormal. Comparing once again (5.103) and (5.104), we finally get

⟨u|Du⟩ ≥ λ₁⟨u|u⟩, i.e., λ₁ ≤ ⟨u|Du⟩/⟨u|u⟩.   (5.105)

If we choose y₁ for u, the equality holds in (5.105). Thus, we reach the following important theorem.

Theorem 5.1 Suppose that we have a linear differential equation described by

Dy = λy,   (5.106)

where D is a suitable Hermitian differential operator. Suppose also that, under appropriate boundary conditions (BCs), (5.106) has a series of (real) eigenvalues. Then, the smallest eigenvalue is equal to the minimum of ⟨u|Du⟩/⟨u|u⟩.

Using this powerful theorem, we can find a suitably approximated smallest eigenvalue; simple examples follow. As before, let us focus on the change in the energies and quantum states caused by an applied electric field for a particle in a one-dimensional potential well and for a harmonic oscillator. The latter problem will be left for readers as an exercise. For simplicity, we consider only the ground state and the first excited state in choosing a trial function. First, we deal with the part of the problem common to both cases; then, we discuss their individual features. As a general setting, suppose that H is the Hamiltonian of the system (either a particle in a one-dimensional potential well or a harmonic oscillator). We calculate the expectation value of energy ⟨H⟩ when the system is subjected to the electric field. The Hamiltonian is described by

H = H₀ − eFx,   (5.107)

where H₀ is the Hamiltonian without the external field F. We suppose that the quantum state |u⟩ is expressed as


|u⟩ ≈ |0⟩ + a|1⟩,   (5.108)

where |0⟩ and |1⟩ denote the ground state and the first excited state, respectively; a is an unknown parameter to be determined by the variational analysis. In the present case, |u⟩ is a trial function. Then, we have

⟨H⟩ ≡ ⟨u|Hu⟩ / ⟨u|u⟩.   (5.109)

With the numerator, we have

⟨u|Hu⟩ = (⟨0| + a*⟨1|)(H₀ − eFx)(|0⟩ + a|1⟩)
= ⟨0|H₀|0⟩ + a⟨0|H₀|1⟩ − eF⟨0|x|0⟩ − eFa⟨0|x|1⟩ + a*⟨1|H₀|0⟩ + a*a⟨1|H₀|1⟩ − eFa*⟨1|x|0⟩ − eFa*a⟨1|x|1⟩
= ⟨0|H₀|0⟩ − eFa⟨0|x|1⟩ + a*a⟨1|H₀|1⟩ − eFa*⟨1|x|0⟩.   (5.110)

Note that in (5.110) four terms vanish because of the symmetry of the potential well and the harmonic oscillator with respect to the origin, as well as the orthogonality between the states |0⟩ and |1⟩. Here x is Hermitian, and we assume that the coordinate representations of |0⟩ and |1⟩ are real, as in (1.101) and (1.102) or in (2.106). Then, we have

⟨1|x|0⟩ = ⟨1|x|0⟩* = ⟨0|x†|1⟩ = ⟨0|x|1⟩.

Also, assuming that a is a real number, we rewrite (5.110) as

⟨u|Hu⟩ = ⟨0|H₀|0⟩ − 2eFa⟨0|x|1⟩ + a²⟨1|H₀|1⟩.   (5.111)

As for the denominator of (5.109), we have

⟨u|u⟩ = (⟨0| + a⟨1|)(|0⟩ + a|1⟩) = 1 + a²,   (5.112)

where we used the normalization conditions ⟨0|0⟩ = ⟨1|1⟩ = 1 and the orthogonality conditions ⟨0|1⟩ = ⟨1|0⟩ = 0. Thus, we get

⟨H⟩ = [⟨0|H₀|0⟩ − 2eFa⟨0|x|1⟩ + a²⟨1|H₀|1⟩] / (1 + a²).   (5.113)

Here, defining the ground-state energy and the first-excited-state energy as E₀ and E₁, respectively, we put

H₀|0⟩ = E₀|0⟩ and H₀|1⟩ = E₁|1⟩,   (5.114)

where E₀ ≠ E₁. As already shown in Chaps. 1 and 2, both systems have non-degenerate energy eigenstates. Then, we have

⟨H⟩ = [E₀ − 2eFa⟨0|x|1⟩ + a²E₁] / (1 + a²).   (5.115)

Now we wish to determine the condition under which ⟨H⟩ has an extremum. That is, we seek the a that satisfies

∂⟨H⟩/∂a = 0.   (5.116)

Calculating the LHS of (5.116) by use of (5.115), we have

∂⟨H⟩/∂a = [2eF⟨0|x|1⟩a² + 2(E₁ − E₀)a − 2eF⟨0|x|1⟩] / (1 + a²)².   (5.117)

For ∂⟨H⟩/∂a to be zero, we must have

eF⟨0|x|1⟩a² + (E₁ − E₀)a − eF⟨0|x|1⟩ = 0.   (5.118)

Then, solving the quadratic equation (5.118), we obtain

a = { E₀ − E₁ ± (E₁ − E₀) √(1 + [2eF⟨0|x|1⟩/(E₁ − E₀)]²) } / (2eF⟨0|x|1⟩).   (5.119)

Taking the plus sign in the numerator of (5.119) and using

√(1 + Δ) ≈ 1 + Δ/2   (5.120)

for small Δ, we obtain

a ≈ eF⟨0|x|1⟩ / (E₁ − E₀).   (5.121)

Thus, with the optimized state of (5.108) we get


|u⟩ ≈ |0⟩ + [eF⟨0|x|1⟩/(E₁ − E₀)] |1⟩.   (5.122)

If one compares (5.122) with (5.17), one recognizes that the two equations are the same within the first-order approximation. Notice that putting i = 0 and k = 1 in (5.17) and replacing V with −eFx, we obtain an expression similar to (5.122). Notice once again that ⟨0|x|1⟩ = ⟨1|x|0⟩, as x is an Hermitian operator and we are considering real inner products. Inserting (5.121) into (5.115) and approximating

1/(1 + a²) ≈ 1 − a²,   (5.123)

we can estimate ⟨H⟩. The estimate depends upon the nature of the system we choose; the next examples deal with these specific characteristics.

Example 5.4 We adopt the same problem as Example 5.1. That is, we consider how the energy of a particle carrying a charge e confined within a one-dimensional potential well is changed by an applied electric field; here we treat it using the variational method. The Hamiltonian is the same as in Example 5.1:

H = −(ħ²/2m)(d²/dx²) − eFx.   (5.22)

By putting

H₀ ≡ −(ħ²/2m)(d²/dx²),   (5.124)

we rewrite the Hamiltonian as H = H₀ − eFx. Expressing the ground state as |0⟩ and the first excited state as |1⟩, we adopt a trial function |u⟩ as

|u⟩ = |0⟩ + a|1⟩.   (5.125)

In the coordinate representation, we have

|0⟩ = √(1/L) cos(πx/2L)   (5.26)

and

|1⟩ = √(1/L) sin(πx/L).   (5.126)

From Example 1.2 of Sect. 1.3, we have

⟨0|H₀|0⟩ = E₀ = (ħ²/2m)(π²/4L²),
⟨1|H₀|1⟩ = E₁ = (ħ²/2m)(4π²/4L²) = 4E₀.   (5.127)

Inserting these results into (5.121), we immediately get

a ≈ eF⟨0|x|1⟩/(3E₀).   (5.128)

As in Example 5.1 of Sect. 5.1, we have

⟨0|x|1⟩ = 32L/9π².   (5.129)

For this calculation, use the coordinate representations (5.26) and (5.126). Thus, we get

a ≈ eF (32·8·mL³)/(27π⁴ħ²).   (5.130)

Meanwhile, from (5.125) we obtain

|u⟩ ≈ |0⟩ + eF [⟨0|x|1⟩/(3E₀)] |1⟩.   (5.131)

The resulting |u⟩ in (5.131) is the same as (5.31), which was obtained by the first-order approximation of (5.30). Inserting (5.130) into (5.113), approximating the denominator as before such that

1/(1 + a²) ≈ 1 − a²,   (5.123)

and further using (5.130) for a, we get

⟨H⟩ ≈ E₀ − (eF)² (32²·8·mL⁴)/(243π⁶ħ²).   (5.132)

Once again, this is the same result as that of (5.28) and (5.29), where only n = 1 is taken into account.

Example 5.5 We adopt the same problem as Example 5.2; the calculation procedures are almost the same. We use

⟨0|q|1⟩ = √(ħ/2mω) and E₁ − E₀ = ħω.   (5.133)

The detailed procedures are left for readers as an exercise; we should get the same results as those obtained in Example 5.2.

As discussed in the above five simple examples, we have shown the calculation procedures of the perturbation method and the variational method. These methods not only supply us with suitable approximation techniques but also provide physical and mathematical insight in many fields of natural science.
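Theorem 5.1 and Example 5.4 lend themselves to a direct numerical illustration. The sketch below is an added check, not part of the text; the grid size and the unit choice ħ = m = e = L = 1 are arbitrary. It discretizes the Hamiltonian of Example 5.4 by finite differences and verifies that the Rayleigh quotient of the trial state with a from (5.130) stays above the exact ground-state energy while reproducing the second-order shift (5.132).

```python
# Numerical illustration of Theorem 5.1 and Example 5.4: a charged particle in
# a one-dimensional box [-L, L] with an applied field F, discretized as
# H = -(hbar^2/2m) d^2/dx^2 - e F x (hbar = m = e = L = 1; N is arbitrary).
import numpy as np

hbar = m = e = L = 1.0
F = 0.05
N = 2000
x = np.linspace(-L, L, N + 2)[1:-1]          # interior points (Dirichlet BC)
h = x[1] - x[0]

# Kinetic part: standard three-point Laplacian
T = (hbar**2 / (2*m*h**2)) * (2*np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1))
H = T - e*F*np.diag(x)

E_exact = np.linalg.eigvalsh(H)[0]           # exact ground energy with field

# Field-free states (5.26) and (5.126) and the optimized mixing (5.130)
psi0 = np.sqrt(1/L) * np.cos(np.pi*x/(2*L))
psi1 = np.sqrt(1/L) * np.sin(np.pi*x/L)
a = e*F * 32*8*m*L**3 / (27*np.pi**4*hbar**2)
u = psi0 + a*psi1

rq = (u @ H @ u) / (u @ u)                   # Rayleigh quotient <u|Hu>/<u|u>
shift_est = -(e*F)**2 * 32**2*8*m*L**4 / (243*np.pi**6*hbar**2)   # (5.132)

print(f"exact E0 = {E_exact:.8f}")
print(f"trial <H> = {rq:.8f}  (upper bound by Theorem 5.1)")
print(f"E0 + shift (5.132) = {np.pi**2/8 + shift_est:.8f}")
```

Tightening the grid or enlarging the trial basis shrinks the already small gap between the trial ⟨H⟩ and the exact ground energy, in line with the variational bound.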

References

1. Sunakawa S (1991) Quantum mechanics. Iwanami, Tokyo (in Japanese)
2. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover, New York, NY
3. Schiff LI (1955) Quantum mechanics, 2nd edn. McGraw-Hill, New York, NY
4. Jackson JD (1999) Classical electrodynamics, 3rd edn. Wiley, New York, NY
5. Coddington EA (1989) An introduction to ordinary differential equations. Dover, New York, NY

Chapter 6

Theory of Analytic Functions

Theory of analytic functions is one of the major fields of modern mathematics. Its application covers a broad range of topics of natural science. A complex function f(z), or a function that takes a complex number z as a variable, has various properties that often differ from those of functions that take a real number x as a variable. In particular, the analytic functions hold a paramount position in the complex analysis. In this chapter we explore various features of the analytic functions accordingly. From a practical point of view, the theory of analytic functions is very frequently utilized for the calculation of real definite integrals. For this reason, we describe the related topics together with tangible examples. The complex plane (or Gaussian plane) can be dealt with as a topological space where the metric (or distance function) is defined. Since the complex plane has a two-dimensional extension, we can readily imagine and investigate its topological feature. Set theory allows us to make an axiomatic approach along with the topology. Therefore, we introduce basic notions and building blocks of the set theory and topology.

6.1

Set and Topology

A complex number z is usually expressed as

z = x + iy,   (6.1)

where x and y are real numbers and i is the imaginary unit. Graphically, the number z is indicated as a point in the complex plane, where the real axis is drawn as abscissa and the imaginary axis as ordinate (see Fig. 6.1). Since the complex plane has a two-dimensional extension, we can readily imagine the domain of variability of z and draw a graphical object for it on the complex plane. A disk-like diagram enclosed by a closed curve C is frequently dealt with in the theory of analytic functions. Such a diagram provides a good subject for the study of set theory and topology. Before developing the theory of complex numbers and analytic functions, we briefly mention the basic notions, notations, and meanings of sets and topology.

Fig. 6.1 Complex plane where a complex number z is depicted

© Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_6

6.1.1

Basic Notions and Notations

A set comprises numbers (either real or complex) or, more generally, mathematical objects. The latter include, e.g., vectors, their transformations, etc., as we shall see in various illustrations in Parts III and IV. A set may be described such that

A = {1, 2, 3, ⋯, 10} or B = {x; a < x < b, a, b ∈ ℝ},   (6.2)

where A contains ten elements, whereas the set B contains uncountably many elements. In the latter case (sometimes in the former as well), the elements are usually called points. When we speak of sets, a universal set [1] is implied in the background. This set contains all the sets under consideration and is usually clearly defined in the context of the discussion. For example, in real analysis the universal set is ℝ, and in complex analysis it is ℂ (the entire complex plane). In the latter case the two-dimensional complex plane (see Fig. 6.1) is frequently used as the universal set. We study various characteristics of sets (or subsets) contained in the universal set and use Venn diagrams to represent them. Figure 6.2a illustrates an example of a Venn diagram that shows a subset A. Naturally, every set contained in the universal set is its subset. As is often the case, the universal set U is accordingly omitted in Fig. 6.2a.


Fig. 6.2 Venn diagrams. (a) An example of the Venn diagram. (b) A subset A and its complement Aᶜ. U shows the universal set

To show a (sub)set A we usually depict it with a closed curve. If an element a is contained in A, we write

a ∈ A

and depict it as a point inside the closed curve. To show that b is not contained in A, we write

b ∉ A.

In that case, we depict b outside the closed curve of Fig. 6.2a. To show that a set A is contained in another set B, we write

A ⊂ B.   (6.3)

Fig. 6.3 Two different subsets A and B in a universal set U. The diagram shows the sum (A ∪ B), intersection (A ∩ B), and differences (A − B and B − A)

If (6.3) holds, A is called a subset of B. If, however, the set B is contained in A, B is a subset of A, and we write B ⊂ A. When both relations A ⊂ B and B ⊂ A hold simultaneously, we write

A = B.   (6.4)

Equation (6.4) indicates the equality of A and B as (sub)sets. The simultaneous establishment of A ⊂ B and B ⊂ A is often used to prove (6.4). We also need to denote explicitly a set that has no element. Such a set is called an empty set and denoted by ∅. Examples are as follows:

{x; x² < 0, x ∈ ℝ} = ∅, {x; x ≠ x} = ∅, x ∉ ∅, etc.

The subset A may have different cardinal numbers (or cardinalities) depending on its nature. To show this explicitly, elements are sometimes indexed as, e.g., aᵢ (i = 1, 2, ⋯) or a_λ (λ ∈ Λ), where Λ can be a finite set or an infinite set of different cardinality; (6.2) gives examples. A complement (or complementary set) of A is defined as the difference U − A and denoted by Aᶜ (≡ U − A). That is, the set Aᶜ is said to be the complement of A with respect to U; it is indicated with a shaded area in Fig. 6.2b.

Let A and B be two different subsets of U (Fig. 6.3). Figure 6.3 represents a sum (or union) A ∪ B, an intersection A ∩ B, and the differences A − B and B − A. More specifically, A − B is defined as the subset of A obtained by subtracting B from A and is referred to as the set difference of A relative to B. We have

A ∪ B = (A − B) ∪ (B − A) ∪ (A ∩ B).   (6.5)

If A has no element in common with B, then A − B = A and B − A = B with A ∩ B = ∅, and (6.5) trivially holds. Notice also that A ∪ B is the smallest set that contains both A and B. In (6.5) the elements must not be doubly counted. Thus, we have


A ∪ A = A.

The well-known de Morgan's laws can be understood graphically using Fig. 6.3:

(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ and (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ.   (6.6)

They can be shown as follows. For ∀u ∈ U we have

u ∈ (A ∪ B)ᶜ ⟺ u ∉ A ∪ B ⟺ u ∉ A and u ∉ B ⟺ u ∈ Aᶜ and u ∈ Bᶜ ⟺ u ∈ Aᶜ ∩ Bᶜ.

Furthermore, we have

u ∈ (A ∩ B)ᶜ ⟺ u ∉ A ∩ B ⟺ u ∉ A or u ∉ B ⟺ u ∈ Aᶜ or u ∈ Bᶜ ⟺ u ∈ Aᶜ ∪ Bᶜ.

We have other important relations, such as the distributive laws described by

(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C),   (6.7)

(A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C).   (6.8)

The confirmation of (6.7) and (6.8) is left for readers. The Venn diagrams are intuitively appealing, but we need a more rigorous discussion to characterize and analyze various aspects of sets (vide infra).
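Because (6.5)-(6.8) are statements about finite sets, they can also be checked mechanically. The snippet below is an illustration added here, not part of the text; the particular sets are arbitrary examples, and U plays the role of the universal set.

```python
# Finite-set check of (6.5), de Morgan's laws (6.6), and the distributive
# laws (6.7) and (6.8). U plays the role of the universal set.
U = frozenset(range(10))
A = frozenset({1, 2, 3, 4})
B = frozenset({3, 4, 5, 6})
C = frozenset({4, 6, 8})

def comp(S):            # complement with respect to U
    return U - S

assert A | B == (A - B) | (B - A) | (A & B)          # (6.5)
assert comp(A | B) == comp(A) & comp(B)              # (6.6), first law
assert comp(A & B) == comp(A) | comp(B)              # (6.6), second law
assert (A | B) & C == (A & C) | (B & C)              # (6.7)
assert (A & B) | C == (A | C) & (B | C)              # (6.8)
assert A | A == A                                    # idempotency
print("all set identities hold")
```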

6.1.2

Topological Spaces and Their Building Blocks

On the basis of the aforementioned discussion, we next define a topology and a topological space. Let T be a universal set and let τ be a collection of subsets of T. If the collection τ satisfies the following axioms, τ is called a topology on T:

(O1) T ∈ τ and ∅ ∈ τ, where ∅ is the empty set.
(O2) If O₁, O₂, ⋯, Oₙ ∈ τ, then O₁ ∩ O₂ ∩ ⋯ ∩ Oₙ ∈ τ.
(O3) If O_λ (λ ∈ Λ) ∈ τ, then ∪_{λ∈Λ} O_λ ∈ τ.

In the above axioms, T is called an underlying set. The coupled object of T and τ is said to be a topological space and is denoted by (T, τ). The members (i.e., subsets) of τ are defined as open sets. (If we do not assume the topological space, the underlying set is equivalent to the universal set. Henceforth, however, we need not be too strict about this difference in terminology.)

Let (T, τ) be a topological space with a subset O′ ⊂ T. Let τ′ be defined such that

τ′ ≡ {O′ ∩ O; O ∈ τ}.

Then, (O′, τ′) can be regarded as a topological space. In this case τ′ is called a relative topology for O′ [2, 3], and O′ is referred to as a subspace of T. The topological space (O′, τ′) shares the same topological features as (T, τ). Notice that O′ does not necessarily need to be an open set of T. If, however, O′ is an open set of T, then any open set belonging to O′ is an open set of T as well. This is evident from the definition of the relative topology and Axiom (O2).

The above-mentioned Axioms (O1) to (O3) may sound somewhat pretentious, and the definition of the open sets may seem to "descend from heaven." Nonetheless, these axioms and this terminology soon turn out to be useful. Once a topological space (T, τ) is given, subsets of various types arise, and a variety of relationships among them ensue as well. Note that the above axioms mention nothing about a set that is different from an open set. Let S be a subset (maybe an open set or maybe not) of T, and let us think of the properties of S.

Definition 6.1 Let (T, τ) be a topological space and S be an open set of τ, i.e., S ∈ τ. Then, the subset defined as the complement of S in (T, τ), namely Sᶜ = T − S, is called a closed set in (T, τ).

Replacing S with Sᶜ in Definition 6.1, we have (Sᶜ)ᶜ = S = T − Sᶜ. The above definition may be rephrased as follows: Let (T, τ) be a topological space and let A be a subset of T. Then, if Aᶜ = T − A ∈ τ, A is a closed set in (T, τ).

In parallel with the axioms (O1) to (O3), we have the following axioms related to the closed sets of (T, τ). Let τ̃ be the collection of all the closed sets of (T, τ).

(C1) T ∈ τ̃ and ∅ ∈ τ̃, where ∅ is the empty set.
(C2) If C₁, C₂, ⋯, Cₙ ∈ τ̃, then C₁ ∪ C₂ ∪ ⋯ ∪ Cₙ ∈ τ̃.
(C3) If C_λ (λ ∈ Λ) ∈ τ̃, then ∩_{λ∈Λ} C_λ ∈ τ̃.


These axioms are obvious from those of (O1) to (O3) along with de Morgan’s law (6.6). Next, we classify and characterize a variety of elements (or points) and subsets as well as relationships among them in terms of the open sets and closed sets. We make the most of these notions and properties to study various aspects of topological spaces and their structures. In what follows, we assume the presence of a topological space (T, τ).
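For a finite underlying set, Axioms (O1)-(O3) and (C1)-(C3) can be verified exhaustively. The following sketch is an added illustration; T and τ are arbitrary small examples chosen here.

```python
# Exhaustive check of the topology axioms (O1)-(O3) for a small finite
# example, plus (C1)-(C3) for the induced closed sets.
from itertools import combinations

T = frozenset({1, 2, 3})
tau = {frozenset(), frozenset({1}), frozenset({1, 2}), T}

def is_topology(T, tau):
    if T not in tau or frozenset() not in tau:                       # (O1)
        return False
    if any(O1 & O2 not in tau for O1, O2 in combinations(tau, 2)):   # (O2)
        return False
    if any(O1 | O2 not in tau for O1, O2 in combinations(tau, 2)):   # (O3)
        return False
    return True   # pairwise closure suffices for a finite tau

assert is_topology(T, tau)

closed = {T - O for O in tau}   # closed sets: complements of open sets
assert T in closed and frozenset() in closed                          # (C1)
assert all(C1 | C2 in closed for C1, C2 in combinations(closed, 2))   # (C2)
assert all(C1 & C2 in closed for C1, C2 in combinations(closed, 2))   # (C3)
print("tau is a topology; its closed sets satisfy (C1)-(C3)")
```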

(a) Neighborhoods [4]

Definition 6.2 Let (T, τ) be a topological space and S be a subset of T. If S contains an open set ∃O that contains an element (or point) a ∈ T, i.e., a ∈ O ⊂ S, then S is called a neighborhood of a.

Figure 6.4 shows a Venn diagram that represents a neighborhood of a. This simple definition occupies an important position in the study of topological spaces and the theory of analytic functions.

Fig. 6.4 Venn diagram that represents a neighborhood S of a. S contains an open set O that contains a

(b) Interior and Closure [4]

With the notion of neighborhoods at the core, we shall see further characterization of sets and elements of topological spaces.

Definition 6.3 Let x be a point of T and S be a subset of T. If S is a neighborhood of x, x is said to be in the interior of S. In this case, x is called an interior point of S. The interior of S is denoted by S°.

The term interior can be understood as follows. Suppose that x ∉ S. Then, by Definition 6.2, S is not a neighborhood of x. By Definition 6.3, this leads to x ∉ S°. This statement can be translated into x ∈ Sᶜ ⇒ x ∈ (S°)ᶜ; that is, Sᶜ ⊂ (S°)ᶜ. This is equivalent to

S° ⊂ S.   (6.9)

Definition 6.4 Let x be a point of T and S be a subset of T. If for any neighborhood N of x we have N ∩ S ≠ ∅, x is said to be in the closure of S. In this case, x is called an adherent point of S. The closure of S is denoted by S̄.

According to the above definition, for an adherent point x of S we explicitly write x ∈ S̄. Think of the negation of the statement of Definition 6.4: for some neighborhood ∃N of x we have (i) N ∩ S = ∅ ⟺ x ∉ S̄. Meanwhile, (ii) N ∩ S = ∅ ⇒ x ∉ S. Combining (i) and (ii), we have x ∉ S̄ ⇒ x ∉ S. Similarly to the above argument about the interior, we get

S ⊂ S̄.   (6.10)

Combining (6.9) and (6.10), we get

S° ⊂ S ⊂ S̄.   (6.11)

In relation to the interior and closure, we have the following important lemmas.

Lemma 6.1 Let S be a subset of T and O be any open set contained in S. Then, we have O ⊂ S°.

Proof Let x be an arbitrary point with x ∈ O. Since x ∈ O ⊂ S, S is a neighborhood of x (by Definition 6.2). Then, from Definition 6.3 we have x ∈ S°. Thus, x ∈ O ⇒ x ∈ S°, which implies O ⊂ S°. This completes the proof.

Lemma 6.2 Let S be a subset of T. If x ∈ S°, then there is some open set ∃O that satisfies x ∈ O ⊂ S.

Proof Suppose that x ∈ S°. Then, by Definition 6.3, S is a neighborhood of x. Meanwhile, by Definition 6.2 there is some open set ∃O that satisfies x ∈ O ⊂ S. This completes the proof.

Lemmas 6.1 and 6.2 have further implications. From Lemma 6.1 and (6.11) we have

O ⊂ S° ⊂ S   (6.12)

with an arbitrary open set ∀O contained in S. From Lemma 6.2 we have x ∈ S° ⇒ x ∈ O, i.e., S° ⊂ O, with some open set ∃O containing S°. Expressing this specific O as Õ, we have S° ⊂ Õ. Meanwhile, using Lemma 6.1 once again, we must get

Õ ⊂ S° ⊂ S.   (6.13)

Set and Topology

193

(a)

(b)

̅

°

Fig. 6.5 (a) Largest open set S∘ contained in S. O denotes an open set. (b) Smallest closed set S containing S. C denotes a closed set

~ and O ~ ⊂ S∘ at once. This implies that with this specific That is, we have S∘ ⊂ O ∘ ~ ~ open set O, we get O = S . That is, ~ = S∘ ⊂ S: O

ð6:14Þ

Relation (6.14) obviously shows that S∘ is an open set and moreover that S∘ is the largest open set contained in S. Figure 6.5a depicts this relation. In this context we have a following important theorem. Theorem 6.1 Let S be a subset of T. The subset S is open if and only if S = S∘. Proof From (6.14) S∘ is open. Therefore, if S = S∘, then S is open. Conversely, suppose that S is open. In that event S itself is an open set contained in S. Meanwhile, S∘ has been characterized as the largest open set contained in S. Consequently, we have S ⊂ S∘. At the same time, from (6.11) we have S∘ ⊂ S. Thus, we get S = S∘. This completes the proof. In contrast to Lemmas 6.1 and 6.2, we have the following lemmas with respect to closures. Lemma 6.3 Let S be a subset of T and C be any closed set containing S. Then, we have S ⊂ C. Proof Since C is a closed set, Cc is an open set. Suppose that with a point x, x 2 = C. Then, x 2 Cc. Since C ⊃ S, Cc ⊂ Sc. This implies Cc \ S = ∅. As x 2 Cc ⊂ Cc, by Definition 6.2 Cc is a neighborhood of x. Then, by Definition 6.4 x is not in S; i.e., c c x2 = S. This statement can be translated into x 2 C c ) x 2 S ; i.e., C c ⊂ S . That is, we get S ⊂ C. Lemma 6.4 Let S be a subset of T and suppose that a point x does not belong to S; i.e., x 2 = S. Then, x 2 = C for some closed set ∃C containing S. Proof If x 2 = S, then by Definition 6.4 we have some neighborhood ∃N and some ∃ open set O contained in that N such that x 2 O ⊂ N and N \ S = ∅. Therefore, we

194

6 Theory of Analytic Functions

must have O \ S = ∅. Let C = Oc. Then C is a closed set with C ⊃ S. Since x 2 O, x 2 = C. This completes the proof. From Lemmas 6.3 and 6.4 we obtain further implications. From Lemma 6.3 and (6.11) we have S⊂S⊂C

ð6:15Þ

with an arbitrary closed set 8C containing S. From Lemma 6.4 we have x 2 c ~ S ) x 2 C c ; i.e., C ⊂ S with ∃C containing S. Expressing this specific C as C, ~ we have C ⊂ S. Meanwhile, using Lemma 6.3 once again we must get ~ S ⊂ S ⊂ C:

ð6:16Þ

~ ⊂ S and S ⊂ C ~ at once. This implies that with this specific That is, we have C ~ we get C ~ = S. Hence, we get closed set C, ~ S ⊂ S = C:

ð6:17Þ

Relation (6.17) obviously shows that S is a closed set and moreover that S is the smallest closed set containing S. Figure 6.5b depicts this relation. In this context we have a following important theorem. Theorem 6.2 Let S be a subset of T. The subset S is closed if and only if S = S. Proof The proof can be done analogously to that of Theorem 6.1. From (6.17) S is closed. Therefore, if S = S, then S is closed. Conversely, suppose that S is closed. In that event S itself is a closed set containing S. Meanwhile, S has been characterized as the smallest closed set containing S. Then, we have S ⊃ S. At the same time, from (6.11) we have S ⊃ S. Thus, we get S = S. This completes the proof.

(c) Boundary [4] Definition 6.5 Let (T, τ) be a topological space and S be a subset of T. If a point x 2 T is in both the closure of S and the closure of the complement of S, i.e., Sc , x is said to be in the boundary of S. In this case, x is called a boundary point of S. The boundary of S is denoted by Sb. By this definition Sb can be expressed as Sb = S \ Sc : Replacing S with Sc, we get

ð6:18Þ

6.1

Set and Topology

195

ðSc Þb = Sc \ ðSc Þc = Sc \ S:

ð6:19Þ

Sb = ð S c Þ b :

ð6:20Þ

Thus, we have

This means that S and Sc have the same boundary. Since both S and Sc are closed sets and the boundary is defined as their intersection, from Axiom (C3) the boundary is a closed set. From Definition 6.1 a closed set is defined as a complement of an open set, and vice versa. Thus, the open set and closed set do not make a confrontation concept but a complementation concept. In this respect we have the following lemma and theorem. Lemma 6.5 Let S be an arbitrary subset of T. Then, we have S c = ð S∘ Þ c : Proof Since S∘ ⊂ S, (S∘)c ⊃ Sc. As S∘ is open, (S∘)c is closed. Hence, ðS∘ Þc ⊃ Sc , because Sc is the smallest closed set containing Sc. Next let C be an arbitrary closed set that contains Sc. Then Cc is an open set that is contained in S, and so we have Cc ⊂ S∘, because S∘ is the largest open set contained in S. Therefore, C ⊃ (S∘)c. Then, if we choose Sc for C, we must have Sc ⊃ (S∘)c. Combining this relation with ðS∘ Þc ⊃ Sc obtained above, we get Sc = ðS∘ Þc . Theorem 6.3[4] A necessary and sufficient condition for S to be both open and closed at once is Sb = ∅. Proof 1. Necessary condition: Let S be open and closed. Then, S = S and S = S∘. Suppose that Sb ≠ ∅. Then, from Definition 6.5 we would have ∃ a 2 S \ Sc for some a. Meanwhile, from Lemma 6.5 we have Sc = ðS∘ Þc . This leads to a 2 S \ ðS∘ Þc . However, from the assumption we would have a 2 S \ Sc. We have no such a, however. Thus, using the proof by contradiction we must have Sb = ∅. 2. Sufficient condition: Suppose that Sb = ∅. Starting with S ⊃ S∘ and S ≠ S∘, let us assume that there is a such that a 2 S and a 2 = S∘. That is, we have ∘ c ∘ c ∘ c c b a 2 ðS Þ ) a 2 S \ ðS Þ ⊂ S \ ðS Þ = S \ S = S , in contradiction to the supposition. Thus, we must not have such a, implying that S = S∘. Next, starting with S ⊃ S and S ≠ S, we assume that there is a such that a 2 S and a 2 = S. That is, we have a 2 Sc ) a 2 S \ Sc ⊂ S \ Sc = Sb , in contradiction to the supposition. 
Thus, we must not have such a, implying that S = S. Combining this relation with S = S∘ obtained above, we get S = S = S∘ . In other words, S is both open and closed at once. These complete the proof.

196

6

Theory of Analytic Functions

The above-mentioned set is sometimes referred to as a closed–open set or a clopen set as a portmanteau word. An interesting example can be seen in Chap. 20 in relation to topological groups (or continuous groups).

(d) Accumulation Points and Isolated Points
We have another important class of elements and sets that includes accumulation points and isolated points.

Definition 6.6 Let (T, τ) be a topological space and S be a subset of T. Suppose that we have a point $p \in T$. If for any neighborhood N of p we have $N \cap (S - \{p\}) \neq \varnothing$, then p is called an accumulation point of S. The set comprising all the accumulation points of S is said to be the derived set of S and denoted by $S^d$. Note that from Definition 6.4 we have
$$N \cap (S - \{p\}) \neq \varnothing \iff p \in \overline{S - \{p\}}, \quad (6.21)$$
where N is an arbitrary neighborhood of p.

Definition 6.7 Let S be a subset of T. Suppose that we have a point $p \in S$. If for some neighborhood N of p we have $N \cap (S - \{p\}) = \varnothing$, then p is called an isolated point of S. The set comprising all the isolated points of S is said to be the discrete set (or subset) of S and is denoted by $S^{dsc}$.

Note that if p is an isolated point, $N \cap (S - \{p\}) = (N \cap S) - (N \cap \{p\}) = (N \cap S) - \{p\} = \varnothing$. That is, $\{p\} = N \cap S$. Therefore, any isolated point of S is contained in S. Contrary to this, the accumulation points are not necessarily contained in S. Comparing Definitions 6.6 and 6.7, we notice that the latter definition is obtained by negating the statement of the former. This implies that the points of a set are classified into two mutually exclusive alternatives: accumulation points or isolated points. We divide $S^d$ into a direct sum of two sets such that
$$S^d = (S \cup S^c) \cap S^d = (S^d \cap S) \cup (S^d \cap S^c), \quad (6.22)$$
where with the second equality we used (6.7). Now, we define
$$(S^d)_+ \equiv S^d \cap S \quad \text{and} \quad (S^d)_- \equiv S^d \cap S^c.$$
Then, (6.22) means that $S^d$ is divided into two sets of accumulation points: $(S^d)_+$, which belongs to S, and $(S^d)_-$, which does not belong to S. Thus, as another direct sum we have


$$S = (S^d \cap S) \cup S^{dsc} = (S^d)_+ \cup S^{dsc}. \quad (6.23)$$

Equation (6.23) represents the direct sum consisting of a part of the derived set and the whole discrete set. Here we have the following theorem.

Theorem 6.4 Let S be a subset of T. Then we have the following equation:
$$\bar{S} = S^d \cup S^{dsc} \quad (S^d \cap S^{dsc} = \varnothing). \quad (6.24)$$

Proof Let us assume that p is an accumulation point of S. Then, we have $p \in \overline{S - \{p\}} \subset \bar{S}$. Meanwhile, $S^{dsc} \subset S \subset \bar{S}$. This implies that $\bar{S}$ contains both $S^d$ and $S^{dsc}$. That is, we have
$$\bar{S} \supset S^d \cup S^{dsc}.$$
It is natural to ask how to deal with the other points of $\bar{S}$ that do not belong to $S^d \cup S^{dsc}$. Taking account of the above discussion on accumulation points and isolated points, however, such remaining points should again be classified into accumulation and isolated points. Since all the isolated points are contained in S, they can be taken into $\bar{S}$. Suppose that among the accumulation points, some point p is not contained in S, i.e., $p \notin S$. In that case, we have $S - \{p\} = S$, namely $\overline{S - \{p\}} = \bar{S}$. Thus, $p \in \overline{S - \{p\}} \iff p \in \bar{S}$. From (6.21), in turn, this implies that in case p is not contained in S, for p to be an accumulation point of S is equivalent to p being an adherent point of S. Then, those points p can be taken into $\bar{S}$ as well. Thus, finally, we get (6.24). From Definitions 6.6 and 6.7, obviously we have $S^d \cap S^{dsc} = \varnothing$. This completes the proof.

Rewriting (6.24), we get
$$\bar{S} = (S^d)_- \cup (S^d)_+ \cup S^{dsc} = (S^d)_- \cup S. \quad (6.25)$$

We have other important theorems.

Theorem 6.5 Let S be a subset of T. Then we have the following relation:
$$\bar{S} = S^d \cup S. \quad (6.26)$$

Proof Let us assume that $p \in \bar{S}$. Then, we have the following two cases: (i) if $p \in S$, we trivially have $p \in S \subset S^d \cup S$. (ii) Suppose $p \notin S$. Since $p \in \bar{S}$, from Definition 6.4 any neighborhood of p contains a point of S (other than p). Then, from Definition 6.6, p is an accumulation point of S. Thus, we have $p \in S^d \subset S^d \cup S$ and, hence, get $\bar{S} \subset S^d \cup S$. Conversely, suppose that $p \in S^d \cup S$. Then, (i) if $p \in S$, obviously we have $p \in \bar{S}$. (ii) Suppose $p \in S^d$. Then, from Definition 6.6 any neighborhood of



Fig. 6.6 Relationship among S, $\bar{S}$, $S^d$, and $S^{dsc}$. Regarding the symbols and notations, see text

p contains a point of S (other than p). Thus, from Definition 6.4, $p \in \bar{S}$. Taking into account the above two cases, we get $S^d \cup S \subset \bar{S}$. Combining this with $\bar{S} \subset S^d \cup S$ obtained above, we get (6.26). This completes the proof.

Theorem 6.6 Let S be a subset of T. A necessary and sufficient condition for S to be a closed set is that S contains all the accumulation points of S.

Proof From Theorem 6.2, that S is a closed set is equivalent to $\bar{S} = S$. From Theorem 6.5 this is equivalent to $S = S^d \cup S$. Obviously, this is equivalent to $S^d \subset S$. In other words, S contains all the accumulation points of S. This completes the proof.

As another proof, taking account of (6.25), we have
$$(S^d)_- = \varnothing \iff \bar{S} = S \iff S \text{ is a closed set (from Theorem 6.2)}.$$
This relation is equivalent to the statement that S contains all the accumulation points of S. In Fig. 6.6 we show the relationship among S, $\bar{S}$, $S^d$, and $S^{dsc}$. From Fig. 6.6, we can readily show that
$$S^{dsc} = S - S^d = \bar{S} - S^d.$$
We have a further example of direct sum decomposition. Using Lemma 6.5, we have
$$S^b = \bar{S} \cap \overline{S^c} = \bar{S} \cap (S^\circ)^c = \bar{S} - S^\circ. \quad (6.27)$$
Since $\bar{S} \supset S^\circ$, we get


$$\bar{S} = S^\circ \cup S^b; \quad S^\circ \cap S^b = \varnothing.$$
Notice that (6.27) is a succinct expression of Theorem 6.3. That is,
$$S^b = \varnothing \iff \bar{S} = S^\circ \iff S \text{ is a clopen set}.$$

(e) Connectedness
In the preceding paragraphs we studied various building blocks of the topological space and the properties of sets contained in the space. In this paragraph, we briefly mention the relationship between such sets. Connectedness is an important concept in topology, and it is associated with the subspaces defined earlier in this section. In terms of connectedness, subspaces of a topological space fall into two main types: connected and disconnected. Let A be a subspace of T. Intuitively, for A to be connected is defined as follows: let z1 and z2 be any two points of A. If z1 and z2 can be joined by a continuous line that is contained entirely within A (see Fig. 6.7a), then A is said to be connected. Of the connected subspaces, suppose that the subspace A has the property that the inside of any closed path (i.e., continuous closed line) C within A contains only points that belong to A (Fig. 6.7b). Then, the subspace A is said to be simply connected. A subspace that is connected but not simply connected is said to be multiply connected. A typical example of the latter is a torus. If a subspace is given as a union of two (or more) disjoint non-empty (open) sets, that subspace is said to be disconnected. Notice that if two (or more) sets have no element in common, those sets are said to be disjoint. Figure 6.7c shows an example in which a disconnected subspace A is a union of two disjoint sets B and C. Interesting examples can be seen in Chap. 20 in relation to connectedness.

6.1.3 T1-Space

We can introduce a variety of topologies τ into a set T to get a topological space (T, τ). Among these we have two extremes. One is the indiscrete topology (or trivial topology) and the other is the discrete topology [2]. For the former the topology is characterized as
$$\tau = \{\varnothing, T\} \quad (6.28)$$
and for the latter case it is
$$\tau = \{\varnothing, \text{all the subsets of } T, T\}. \quad (6.29)$$

Fig. 6.7 Connectedness of subspaces. (a) Connected subspace A. Any two points z1 and z2 of A can be connected by a continuous line that is contained entirely within A. (b) Simply connected subspace A. The inside of any closed path C within A contains only points that belong to A. (c) Disconnected subspace A that comprises two disjoint sets B and C

With the discrete topology all the subsets are open, and the complements of the individual subsets are closed by Definition 6.1. However, such closed subsets are contained in τ, and so they are again open. Thus, all the subsets are clopen sets. Practically speaking, the aforementioned two extremes are of less interest and value. For this reason, we need some moderate separation conditions on topological spaces. These conditions are well-established as the separation axioms. We will not get into the details of the discussion [2], but we mention the first separation axiom (Fréchet axiom), which produces the T1-space.

Definition 6.8 Let (T, τ) be a topological space. Suppose that with respect to $\forall x, y \ (x \neq y) \in T$ there is a neighborhood N of x such that
$$x \in N \quad \text{and} \quad y \notin N.$$
The topological space that satisfies the above separation condition is called a T1-space. We mention two important theorems on the T1-space.


Theorem 6.7 A necessary and sufficient condition for (T, τ) to be a T1-space is that for each point $x \in T$, {x} is a closed set of T.

Proof
1. Necessary condition: Let (T, τ) be a T1-space. Choose any pair of elements $x, y \ (x \neq y) \in T$. Then, from Definition 6.8 there is a neighborhood $\exists N$ of $y \ (\neq x)$ that does not contain x. Then, we have $y \in N$ and $N \cap \{x\} = \varnothing$. From Definition 6.4 this implies that $y \notin \overline{\{x\}}$. Thus, we have $y \neq x \Rightarrow y \notin \overline{\{x\}}$. This means that $y \in \overline{\{x\}} \Rightarrow y = x$. Namely, $\overline{\{x\}} \subset \{x\}$, but from (6.11) we have $\{x\} \subset \overline{\{x\}}$. Thus, $\overline{\{x\}} = \{x\}$. From Theorem 6.2 this shows that {x} is a closed set of T.
2. Sufficient condition: Suppose that {x} is a closed set of T. Choose any x and y such that $y \neq x$. Since {x} is a closed set, $N = T - \{x\}$ is an open set that contains y; i.e., $y \in T - \{x\}$ and $x \notin T - \{x\}$. Since $y \in T - \{x\}$, where $T - \{x\}$ is an open set, from Definition 6.2 $N = T - \{x\}$ is a neighborhood of y, and moreover N does not contain x. Then, from Definition 6.8 (T, τ) is a T1-space. These complete the proof.

A set comprising a single element is called a singleton or a unit set.

Theorem 6.8 Let S be a subset of a T1-space. Then $S^d$ is a closed set in the T1-space.

Proof Suppose $p \in \overline{S^d}$. Suppose, furthermore, $p \notin S^d$. Then, from Definition 6.6 there is some neighborhood $\exists N$ of p such that $N \cap (S - \{p\}) = \varnothing$. Since $p \in \overline{S^d}$, from Definition 6.4 we must have $N \cap S^d \neq \varnothing$ at once for this special neighborhood N of p. Then, as we have $p \notin S^d$, we can take a point q such that $q \in N \cap S^d$ and $q \neq p$. Meanwhile, we may choose an open set N for the above neighborhood of p (i.e., an open neighborhood), because $p \in N \ (= N^\circ) \subset N$. Since $N \cap (T - \{p\}) = N - \{p\}$ is an open set from Theorem 6.7 as well as Axiom (O2) and Definition 6.1, $N - \{p\}$ is an open neighborhood of q which does not contain p. Namely, we have $q \in N - \{p\} \ [= (N - \{p\})^\circ] \subset N - \{p\}$. Note that $q \in N - \{p\}$ and $p \notin N - \{p\}$, in agreement with Definition 6.8. As $q \in S^d$, again from Definition 6.6 we have
$$(N - \{p\}) \cap (S - \{q\}) \neq \varnothing. \quad (6.30)$$
This implies that $N \cap (S - \{p\}) \neq \varnothing$ as well, because any point of the LHS of (6.30) belongs to N and to S and differs from p. However, this is in contradiction to the original relation $N \cap (S - \{p\}) = \varnothing$, which was obtained from $p \notin S^d$. Then, we must have $p \in S^d$. Thus, from the original supposition we get $\overline{S^d} \subset S^d$. Meanwhile, we have $\overline{S^d} \supset S^d$ from (6.11). Accordingly we have $\overline{S^d} = S^d$. From Theorem 6.2, this means that $S^d$ is a closed set. Hence, we have proven the theorem.

The above two theorems are intuitively acceptable. For, in the one-dimensional Euclidean space ℝ we express a singleton {x} as [x, x] and a closed interval as [x, y] (x ≠ y). With the latter, [x, y] might well be expressed as $(x, y)^d$. Such subsets are well-known as closed sets. According to the separation axioms, various topological


spaces can be obtained by imposing stronger constraints upon the T1-space [2]. A typical example is a metric space. In this sense, the T1-space already carries primitive notions of metric that it shares with metric spaces. Definitions of the metric (or distance function) and of the metric space will be summarized briefly in Chap. 13 in reference to an inner product space. In the above discussion, we have given a brief outline of set theory and topology. We use these results both in this chapter and in Part IV.
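Theorem 6.7's equivalence (T1 separation holds exactly when every singleton is closed) can be checked mechanically on finite examples. In the Python sketch below the two sample topologies are my own; open sets stand in for neighborhoods, which suffices because every neighborhood contains an open neighborhood of its point:

```python
def is_T1(T, opens):
    # Definition 6.8: for each pair x != y there is an open set
    # containing x but not y.
    return all(any(x in O and y not in O for O in opens)
               for x in T for y in T if x != y)

def singletons_closed(T, opens):
    # Theorem 6.7: every {x} is closed, i.e., T - {x} is open.
    return all(T - {x} in opens for x in T)

T = frozenset({1, 2, 3})
discrete = {frozenset(s) for s in
            [(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]}
indiscrete = {frozenset(), T}

# The two characterizations agree on both extreme topologies (6.28)/(6.29)
for tau in (discrete, indiscrete):
    assert is_T1(T, tau) == singletons_closed(T, tau)
print(is_T1(T, discrete), is_T1(T, indiscrete))  # -> True False
```

The discrete topology separates every pair of points, while the indiscrete topology separates none, matching the remark that these two extremes are of little practical value.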

6.1.4 Complex Numbers and Complex Plane

Returning to (6.1), we further investigate how to represent a complex number. In Fig. 6.8 we redraw the complex plane and a complex number on it. The complex plane is a natural extension of the real two-dimensional orthogonal coordinate plane (Cartesian coordinate plane) that graphically represents a pair of real numbers x and y as (x, y). In the complex plane we represent a complex number as
$$z = x + iy \quad (6.1)$$
by designating x as the abscissa on the real axis and y as the ordinate on the imaginary axis. The two axes are orthogonal to each other. The absolute value (or modulus) of z is defined as the real non-negative number
$$|z| = \sqrt{x^2 + y^2}. \quad (6.31)$$

Analogously to a real two-dimensional polar coordinate, we can introduce in the complex plane a non-negative radial coordinate r and an angular coordinate θ

Fig. 6.8 Complex plane where z is expressed in a polar form using a non-negative radial coordinate r and an angular coordinate θ



(Fig. 6.8). In complex analysis the angular coordinate is called the argument and denoted by arg z, so that
$$\arg z = \theta + 2\pi n; \quad n = 0, \pm 1, \pm 2, \cdots.$$
In Fig. 6.8, x and y are given by
$$x = r\cos\theta \quad \text{and} \quad y = r\sin\theta.$$
Thus, we have the polar form of z expressed as
$$z = r(\cos\theta + i\sin\theta). \quad (6.32)$$

Using the well-known Euler's formula (or Euler's identity)
$$e^{i\theta} = \cos\theta + i\sin\theta, \quad (6.33)$$
we get
$$z = re^{i\theta}. \quad (6.34)$$
From (6.32) and (6.34), we have
$$z^n = r^n e^{in\theta} = r^n(\cos\theta + i\sin\theta)^n = r^n(\cos n\theta + i\sin n\theta), \quad (6.35)$$
where the last equality comes from replacing θ with nθ in (6.33). Comparing both sides of the last equality of (6.35), we have
$$(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta. \quad (6.36)$$

Equation (6.36) is called de Moivre's theorem. This relation holds for n being any integer (including zero) as well as any rational number (including negative ones). Euler's formula (6.33) immediately leads to the following important formulae:
$$\cos\theta = \frac{1}{2}\left(e^{i\theta} + e^{-i\theta}\right), \quad \sin\theta = \frac{1}{2i}\left(e^{i\theta} - e^{-i\theta}\right). \quad (6.37)$$
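De Moivre's theorem (6.36) and the exponential expressions (6.37) are easy to verify numerically with Python's standard cmath module; in this sketch the sample angle θ = 0.73 and power n = 5 are arbitrary choices:

```python
import cmath, math

theta, n = 0.73, 5

# (6.36): (cos θ + i sin θ)^n = cos nθ + i sin nθ
lhs = (math.cos(theta) + 1j * math.sin(theta)) ** n
rhs = math.cos(n * theta) + 1j * math.sin(n * theta)
print(abs(lhs - rhs) < 1e-12)                     # -> True

# (6.37): cos and sin recovered from complex exponentials
cos_t = (cmath.exp(1j * theta) + cmath.exp(-1j * theta)) / 2
sin_t = (cmath.exp(1j * theta) - cmath.exp(-1j * theta)) / (2 * 1j)
print(abs(cos_t - math.cos(theta)) < 1e-12,
      abs(sin_t - math.sin(theta)) < 1e-12)       # -> True True
```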

Notice that although in (6.32) and (6.33) we assumed that θ is a real number, (6.33) and (6.37) hold for any complex number θ. Note also that (6.33) results from (6.37) and the following definition of the power series expansion of the exponential function:
$$e^z \equiv \sum_{k=0}^{\infty} \frac{z^k}{k!}, \quad (6.38)$$

where z is any complex number [5]. In fact, from (6.37) and (6.38) we have the following familiar power series expansions of the cosine and sine functions:
$$\cos z = \sum_{k=0}^{\infty} (-1)^k \frac{z^{2k}}{(2k)!}, \quad \sin z = \sum_{k=0}^{\infty} (-1)^k \frac{z^{2k+1}}{(2k+1)!}.$$
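The series (6.38) and the cosine/sine expansions converge for every complex argument, so truncated partial sums can be compared against cmath; in this sketch the truncation orders are arbitrary choices of mine:

```python
import cmath
from math import factorial

def exp_series(z, N=40):
    # Partial sum of (6.38): e^z = sum over k of z^k / k!
    return sum(z**k / factorial(k) for k in range(N))

def cos_series(z, N=20):
    # cos z = sum over k of (-1)^k z^(2k) / (2k)!
    return sum((-1)**k * z**(2*k) / factorial(2*k) for k in range(N))

z = 0.6 + 1.1j
print(abs(exp_series(z) - cmath.exp(z)) < 1e-12)   # -> True
print(abs(cos_series(z) - cmath.cos(z)) < 1e-12)   # -> True
# Euler's formula (6.33) also holds for a complex "angle" z:
print(abs(cmath.exp(1j * z) - (cmath.cos(z) + 1j * cmath.sin(z))) < 1e-12)  # -> True
```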

These expressions frequently appear in this chapter. Next, let us think of a pair of complex numbers z and w, with w expressed as w = u + iv. Then, we have
$$z - w = (x - u) + i(y - v).$$
Using (6.31), we get
$$|z - w| = \sqrt{(x - u)^2 + (y - v)^2}. \quad (6.39)$$

Equation (6.39) represents the “distance” between z and w. If we define a function ρ(z, w) such that
$$\rho(z, w) \equiv |z - w|,$$
then ρ(z, w) defines a metric (or distance function); see Chap. 13. Thus, the metric gives a distance between any arbitrarily chosen pair of elements $z, w \in \mathbb{C}$, and (ℂ, ρ) can be dealt with as a metric space. This makes it easy to view the complex plane as a topological space. The discussions about set theory and topology developed earlier in this chapter basically hold. In the theory of analytic functions, a subset A as depicted in Fig. 6.7 can be regarded as a part of the complex plane. Since the complex plane ℂ itself represents a topological space, the subset A may be viewed as a subspace of ℂ. A complex function f(z) of the complex variable z of (6.1) is usually defined in a connected open set called a region [5]. Such a region is of practical use among the various subspaces and can be the entire ℂ or a subset of ℂ. The connectedness of a region can be considered in a manner similar to that described in Sect. 6.1.2.
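That ρ(z, w) = |z − w| satisfies the metric axioms (non-negativity, identity of indiscernibles, symmetry, and the triangle inequality) can be spot-checked over random sample points — a numerical sketch, not a proof:

```python
import random

def rho(z, w):
    # The metric built from (6.39): rho(z, w) = |z - w|
    return abs(z - w)

random.seed(0)
pts = [complex(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(30)]

for z in pts:
    for w in pts:
        assert rho(z, w) >= 0                      # non-negativity
        assert (rho(z, w) == 0) == (z == w)        # rho = 0 iff z = w
        assert rho(z, w) == rho(w, z)              # symmetry
        for v in pts[:10]:
            assert rho(z, w) <= rho(z, v) + rho(v, w) + 1e-12  # triangle
print("metric axioms hold on the sample")
```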

6.2 Analytic Functions of a Complex Variable

Since the complex number and the complex plane were well-characterized in the previous section, we now deal with complex functions of a complex variable in the complex plane. Taking the complex conjugate of (6.1), we have
$$z^* = x - iy. \quad (6.40)$$
Combining (6.1) and (6.40) in a matrix form, we get
$$(z \ \ z^*) = (x \ \ y)\begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}. \quad (6.41)$$
The matrix $\begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}$ is non-singular, and so the inverse matrix exists (see Sect. 11.3) and (x y) can be described as
$$(x \ \ y) = \frac{1}{2}(z \ \ z^*)\begin{pmatrix} 1 & -i \\ 1 & i \end{pmatrix} \quad (6.42)$$
or, with individual components,
$$x = \frac{1}{2}(z + z^*) \quad \text{and} \quad y = -\frac{i}{2}(z - z^*), \quad (6.43)$$
which can be understood as a “vector synthesis” (see Fig. 6.9). Equations (6.41) and (6.42) imply that the two sets of variables (x y) and (z z*) can be dealt with on an equal basis. According to custom, however, we prioritize the notation using (x y) over that of (z z*). Hence, the complex function f is described as
$$f(x, y) = u(x, y) + iv(x, y), \quad (6.44)$$
where both u(x, y) and v(x, y) are real functions of the real variables x and y.

Here, let us pause for a moment to consider the differentiation of a function. Suppose that a function is defined on a real domain (i.e., the real number line). Consider whether that function is differentiable at a certain point x0 of the real number line. On this occasion, we can approach x0 only from two directions, i.e., from the side x < x0 (from the left) or from the side x0 < x (from the right); see Fig. 6.10a. Meanwhile, suppose that a function is defined on a complex domain (i.e., the complex plane). Consider likewise whether the function is differentiable at a certain point z0 of the complex plane. In this case, we can approach z0 from continuously varying directions; see Fig. 6.10b, where only four directions are depicted. In this context let us think of a simple example.

Fig. 6.9 Vector synthesis of z and z*

Example 6.1 Let f(x, y) be a function described by
$$f(x, y) = 2x + iy = z + x. \quad (6.45)$$
As remarked above, substituting (6.43) into (6.45), we could have
$$h(z, z^*) = 2 \cdot \frac{1}{2}(z + z^*) + i\left[-\frac{i}{2}(z - z^*)\right] = z + z^* + \frac{1}{2}(z - z^*) = \frac{1}{2}(3z + z^*),$$
where f(x, y) and h(z, z*) denote different functional forms. The derivative of (6.45) varies depending on the way z = 0 is approached. For instance, think of
$$\left.\frac{df}{dz}\right|_0 = \lim_{z \to 0} \frac{f(0 + z) - f(0)}{z} = \lim_{x \to 0, y \to 0} \frac{2x + iy}{x + iy} = \lim_{x \to 0, y \to 0} \frac{2x^2 + y^2 - ixy}{x^2 + y^2}.$$
Suppose that the differentiation is taken along a straight line in the complex plane represented by iy = (ik)x (k, x, y: real). Then, we have

Fig. 6.10 Ways a number (real or complex) is approached. (a) Two ways a number x0 is approached on the real number line. (b) Various ways a number z0 is approached in the complex plane. Here, only four ways are indicated

$$\left.\frac{df}{dz}\right|_0 = \lim_{x \to 0, y \to 0} \frac{2x^2 + k^2x^2 - ikx^2}{x^2 + k^2x^2} = \frac{2 + k^2 - ik}{1 + k^2}. \quad (6.46)$$
However, this means that $\left.\frac{df}{dz}\right|_0$ takes varying values depending upon k. Namely, $\left.\frac{df}{dz}\right|_0$ cannot be uniquely defined but depends on the way the origin of the complex plane is approached. Thus, we find that the derivative takes different values depending on the straight line along which the differentiation is taken. This means that f(x, y) is not differentiable, or analytic, at z = 0. Meanwhile, think of g(z) expressed as
$$g(z) = x + iy = z. \quad (6.47)$$
In this case, we get
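The direction dependence expressed by (6.46) can be reproduced numerically: approach the origin along the line y = kx for several slopes k and form the difference quotient of f(x, y) = 2x + iy. For comparison, the quotient for g(z) = z of (6.47) comes out the same in every direction. A Python sketch (the step size and the sample slopes are arbitrary):

```python
def f(z):
    # f(x, y) = 2x + iy of (6.45)
    return 2 * z.real + 1j * z.imag

def quotient(func, direction, t=1e-8):
    # Difference quotient at the origin, approaching along `direction`
    dz = t * direction
    return (func(dz) - func(0)) / dz

for k in (0.0, 1.0, 2.0):
    approx = quotient(f, complex(1.0, k))         # along the line y = kx
    exact = (2 + k**2 - 1j * k) / (1 + k**2)      # the k-dependent value (6.46)
    print(k, abs(approx - exact) < 1e-9)          # -> True for each k

g = lambda z: z                                   # g(z) = z of (6.47)
print(abs(quotient(g, complex(1.0, 0.0)) - 1) < 1e-12,
      abs(quotient(g, 1j) - 1) < 1e-12)           # -> True True
```

The quotient of f depends on k exactly as (6.46) predicts, while the quotient of g equals 1 along every direction.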


$$\left.\frac{dg}{dz}\right|_0 = \lim_{z \to 0} \frac{g(0 + z) - g(0)}{z} = \lim_{z \to 0} \frac{z - 0}{z} = 1.$$
As a result, the derivative takes the same value 1, regardless of which straight line the differentiation is taken along.

Though simple, the above example gives us a heuristic method. As before, let f(z) be a function described as in (6.44) such that
$$f(z) = f(x, y) = u(x, y) + iv(x, y), \quad (6.48)$$
where both u(x, y) and v(x, y) possess first-order partial derivatives with respect to x and y. Then we have a derivative df(z)/dz such that
$$\frac{df(z)}{dz} = \lim_{\Delta z \to 0} \frac{f(z + \Delta z) - f(z)}{\Delta z} = \lim_{\Delta x \to 0, \Delta y \to 0} \frac{u(x + \Delta x, y + \Delta y) - u(x, y) + i[v(x + \Delta x, y + \Delta y) - v(x, y)]}{\Delta x + i\Delta y}. \quad (6.49)$$

We wish to seek the condition under which $\frac{df(z)}{dz}$ gives the same result regardless of the order of taking the limits Δx → 0 and Δy → 0. That is, taking the limit Δy → 0 first, we have
$$\frac{df(z)}{dz} = \lim_{\Delta x \to 0} \frac{u(x + \Delta x, y) - u(x, y) + i[v(x + \Delta x, y) - v(x, y)]}{\Delta x} = \frac{\partial u(x, y)}{\partial x} + i\frac{\partial v(x, y)}{\partial x}. \quad (6.50)$$
Next, taking the limit Δx → 0 first, we get
$$\frac{df(z)}{dz} = \lim_{\Delta y \to 0} \frac{u(x, y + \Delta y) - u(x, y) + i[v(x, y + \Delta y) - v(x, y)]}{i\Delta y} = -i\frac{\partial u(x, y)}{\partial y} + \frac{\partial v(x, y)}{\partial y}. \quad (6.51)$$
Consequently, by equating the real and imaginary parts of (6.50) and (6.51) we must have


$$\frac{\partial u(x, y)}{\partial x} = \frac{\partial v(x, y)}{\partial y}, \quad (6.52)$$
$$\frac{\partial v(x, y)}{\partial x} = -\frac{\partial u(x, y)}{\partial y}. \quad (6.53)$$
The relationships (6.52) and (6.53) are called the Cauchy–Riemann conditions. Differentiating (6.52) with respect to x and (6.53) with respect to y and further subtracting one from the other, we get
$$\frac{\partial^2 u(x, y)}{\partial x^2} + \frac{\partial^2 u(x, y)}{\partial y^2} = 0. \quad (6.54)$$
Similarly, we have
$$\frac{\partial^2 v(x, y)}{\partial x^2} + \frac{\partial^2 v(x, y)}{\partial y^2} = 0. \quad (6.55)$$
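The conditions (6.52)–(6.55) can be probed with central finite differences. In the sketch below (the test point, step size, and tolerances are arbitrary choices of mine), the analytic example w = z², i.e., u = x² − y² and v = 2xy, passes the Cauchy–Riemann conditions and the Laplace equation (6.54), while u = 2x, v = y from Example 6.1 violates (6.52):

```python
h = 1e-5

def d_dx(F, x, y):
    # Central-difference partial derivative with respect to x
    return (F(x + h, y) - F(x - h, y)) / (2 * h)

def d_dy(F, x, y):
    # Central-difference partial derivative with respect to y
    return (F(x, y + h) - F(x, y - h)) / (2 * h)

# w = z^2 = (x^2 - y^2) + i(2xy): real and imaginary parts
u = lambda x, y: x**2 - y**2
v = lambda x, y: 2 * x * y

x0, y0 = 1.3, -0.7
print(abs(d_dx(u, x0, y0) - d_dy(v, x0, y0)) < 1e-8)   # (6.52) -> True
print(abs(d_dx(v, x0, y0) + d_dy(u, x0, y0)) < 1e-8)   # (6.53) -> True

# (6.54): u is harmonic (second central differences)
lap_u = ((u(x0 + h, y0) - 2*u(x0, y0) + u(x0 - h, y0))
         + (u(x0, y0 + h) - 2*u(x0, y0) + u(x0, y0 - h))) / h**2
print(abs(lap_u) < 1e-4)                               # -> True

# Example 6.1 (u = 2x, v = y) fails (6.52): u_x = 2 but v_y = 1
print(abs(d_dx(lambda x, y: 2*x, x0, y0)
          - d_dy(lambda x, y: y, x0, y0)) > 0.5)       # -> True
```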

From the above discussion, we draw several important implications.
1. From (6.50), we have
$$\frac{df(z)}{dz} = \frac{\partial f}{\partial x}. \quad (6.56)$$
2. Also from (6.51) we obtain
$$\frac{df(z)}{dz} = -i\frac{\partial f}{\partial y}. \quad (6.57)$$
Meanwhile, we have
$$\frac{\partial}{\partial x} = \frac{\partial z}{\partial x}\frac{\partial}{\partial z} + \frac{\partial z^*}{\partial x}\frac{\partial}{\partial z^*} = \frac{\partial}{\partial z} + \frac{\partial}{\partial z^*}. \quad (6.58)$$
Also, we get
$$\frac{\partial}{\partial y} = \frac{\partial z}{\partial y}\frac{\partial}{\partial z} + \frac{\partial z^*}{\partial y}\frac{\partial}{\partial z^*} = i\left(\frac{\partial}{\partial z} - \frac{\partial}{\partial z^*}\right). \quad (6.59)$$
As in the case of Example 6.1, we rewrite f(z) as


$$f(z) \equiv F(z, z^*), \quad (6.60)$$
where the change in the functional form is due to the transformation of variables. Hence, from (6.56) and using (6.58) we have
$$\frac{df(z)}{dz} = \frac{\partial F(z, z^*)}{\partial x} = \left(\frac{\partial}{\partial z} + \frac{\partial}{\partial z^*}\right)F(z, z^*) = \frac{\partial F(z, z^*)}{\partial z} + \frac{\partial F(z, z^*)}{\partial z^*}. \quad (6.61)$$
Similarly, from (6.57) and (6.59) we get
$$\frac{df(z)}{dz} = -i\frac{\partial F(z, z^*)}{\partial y} = \left(\frac{\partial}{\partial z} - \frac{\partial}{\partial z^*}\right)F(z, z^*) = \frac{\partial F(z, z^*)}{\partial z} - \frac{\partial F(z, z^*)}{\partial z^*}. \quad (6.62)$$
Equating (6.61) and (6.62), we obtain
$$\frac{\partial F(z, z^*)}{\partial z^*} = 0. \quad (6.63)$$
This clearly shows that if F(z, z*) is differentiable with respect to z, F(z, z*) does not depend on z*, but only depends upon z. Thus, we may write the differentiable function as
$$F(z, z^*) \equiv f(z).$$
This is an outstanding characteristic of a complex function that is differentiable with respect to the complex variable z. Taking the contraposition of the above statement, we say that if F(z, z*) depends on z*, it is not differentiable with respect to z. Example 6.1 is one of the illustrations. Conversely, we now consider what happens if the Cauchy–Riemann conditions are satisfied [5]. Let f(z) be a complex function described by
$$f(z) = u(x, y) + iv(x, y), \quad (6.48)$$

where u(x, y) and v(x, y) satisfy the Cauchy–Riemann conditions and possess continuous first-order partial derivatives with respect to x and y in some region of ℂ. Then, we have
$$u(x + \Delta x, y + \Delta y) - u(x, y) = \frac{\partial u(x, y)}{\partial x}\Delta x + \frac{\partial u(x, y)}{\partial y}\Delta y + \varepsilon_1 \Delta x + \delta_1 \Delta y, \quad (6.64)$$
$$v(x + \Delta x, y + \Delta y) - v(x, y) = \frac{\partial v(x, y)}{\partial x}\Delta x + \frac{\partial v(x, y)}{\partial y}\Delta y + \varepsilon_2 \Delta x + \delta_2 \Delta y, \quad (6.65)$$
where the four quantities ε1, ε2, δ1, and δ2 can be made arbitrarily small as Δx and Δy tend to zero. The relations (6.64) and (6.65) result from the continuity of the first-order partial derivatives of u(x, y) and v(x, y). Then, we have

$$\frac{f(z + \Delta z) - f(z)}{\Delta z} = \frac{u(x + \Delta x, y + \Delta y) - u(x, y)}{\Delta z} + \frac{i[v(x + \Delta x, y + \Delta y) - v(x, y)]}{\Delta z}$$
$$= \frac{\partial u(x, y)}{\partial x}\frac{\Delta x}{\Delta z} + \frac{\partial u(x, y)}{\partial y}\frac{\Delta y}{\Delta z} + i\frac{\partial v(x, y)}{\partial x}\frac{\Delta x}{\Delta z} + i\frac{\partial v(x, y)}{\partial y}\frac{\Delta y}{\Delta z} + \frac{\Delta x}{\Delta z}(\varepsilon_1 + i\varepsilon_2) + \frac{\Delta y}{\Delta z}(\delta_1 + i\delta_2)$$
$$= \left[\frac{\partial u(x, y)}{\partial x} + i\frac{\partial v(x, y)}{\partial x}\right]\frac{\Delta x + i\Delta y}{\Delta z} + \frac{\Delta x}{\Delta z}(\varepsilon_1 + i\varepsilon_2) + \frac{\Delta y}{\Delta z}(\delta_1 + i\delta_2)$$
$$= \frac{\partial u(x, y)}{\partial x} + i\frac{\partial v(x, y)}{\partial x} + \frac{\Delta x}{\Delta z}(\varepsilon_1 + i\varepsilon_2) + \frac{\Delta y}{\Delta z}(\delta_1 + i\delta_2), \quad (6.66)$$
where with the third equality we used the Cauchy–Riemann conditions (6.52) and (6.53). Accordingly, we have
$$\frac{f(z + \Delta z) - f(z)}{\Delta z} - \left[\frac{\partial u(x, y)}{\partial x} + i\frac{\partial v(x, y)}{\partial x}\right] = \frac{\Delta x}{\Delta z}(\varepsilon_1 + i\varepsilon_2) + \frac{\Delta y}{\Delta z}(\delta_1 + i\delta_2). \quad (6.67)$$
Taking the absolute values of both sides of (6.67), we get

$$\left|\frac{f(z + \Delta z) - f(z)}{\Delta z} - \left[\frac{\partial u(x, y)}{\partial x} + i\frac{\partial v(x, y)}{\partial x}\right]\right| \leq \left|\frac{\Delta x}{\Delta z}\right| |\varepsilon_1 + i\varepsilon_2| + \left|\frac{\Delta y}{\Delta z}\right| |\delta_1 + i\delta_2|, \quad (6.68)$$
where $\left|\frac{\Delta x}{\Delta z}\right| = \frac{|\Delta x|}{\sqrt{(\Delta x)^2 + (\Delta y)^2}} \leq 1$ and $\left|\frac{\Delta y}{\Delta z}\right| = \frac{|\Delta y|}{\sqrt{(\Delta x)^2 + (\Delta y)^2}} \leq 1$. Taking the limit Δz → 0, by assumption both terms on the RHS of (6.68) approach zero. Thus,
$$\lim_{\Delta z \to 0} \frac{f(z + \Delta z) - f(z)}{\Delta z} = \frac{\partial u(x, y)}{\partial x} + i\frac{\partial v(x, y)}{\partial x} = \frac{\partial f}{\partial x}. \quad (6.69)$$

Alternatively, using the partial derivatives of u(x, y) and v(x, y) with respect to y, we can rewrite (6.69) as
$$\lim_{\Delta z \to 0} \frac{f(z + \Delta z) - f(z)}{\Delta z} = -i\frac{\partial u(x, y)}{\partial y} + \frac{\partial v(x, y)}{\partial y} = -i\frac{\partial f}{\partial y}. \quad (6.70)$$

Notice that (6.69) and (6.70) are equivalent to the Cauchy–Riemann conditions. From (6.56) and (6.57), the relations (6.69) and (6.70) imply that for f(z) to be differentiable, the first-order partial derivatives of u(x, y) and v(x, y) with respect to x and y must exist. Meanwhile, once f(z) is found to be differentiable (or analytic), its higher-order derivatives must be analytic as well (vide infra). This requires, in turn, that the above first-order partial derivatives be continuous. (Note that analyticity naturally leads to continuity.) Thus, the following theorem follows.

Theorem 6.9 Let f(z) be a complex function of the complex variable z such that f(z) = u(x, y) + iv(x, y), where x and y are real variables. Then, a necessary and sufficient condition for f(z) to be differentiable is that the first-order partial derivatives of u(x, y) and v(x, y) with respect to x and y exist and are continuous and that u(x, y) and v(x, y) satisfy the Cauchy–Riemann conditions.

Now, we formally give the definitions of differentiability and analyticity of a complex function of a complex variable.

Definition 6.9 Let z0 be a given point of the complex plane ℂ. Let f(z) be defined in a neighborhood containing z0. If f(z) is single-valued and differentiable at all points of this neighborhood, f(z) is said to be analytic at z0. A point at which f(z) is analytic is called a regular point of f(z). A point at which f(z) is not analytic is called a singular point of f(z).

We emphasize that the above definition of analyticity at some point z0 requires single-valuedness and differentiability at all the points of a neighborhood containing z0. This can be understood from Fig. 6.10b and Example 6.1. Analyticity at a point is deeply connected to how, and from which direction, we take the limiting process.


Thus, we need detailed information about the neighborhood of the point in question to determine whether the function is analytic at that point. The next definition is associated with a global characteristic of an analytic function.

Definition 6.10 Let R be a region contained in the complex plane ℂ. Let f(z) be a complex function defined in R. If f(z) is analytic at all points of R, f(z) is said to be analytic in R. In this case R is called a domain of analyticity. If the domain of analyticity is the entire complex plane ℂ, the function is called an entire function.

To understand the various characteristics of analytic functions, it is indispensable to introduce Cauchy's integral formula. In the next section we deal with the integration of complex functions.

6.3 Integration of Analytic Functions: Cauchy's Integral Formula

We can define complex integration as a natural extension of the Riemann integral with respect to a real variable. Suppose that there is a curve C in the complex plane. Both ends of the curve are fixed and located at za and zb. Suppose also that the individual points of the curve C are described using a parameter t such that
$$z = z(t) = x(t) + iy(t) \quad (t_0 \equiv t_a \leq t \leq t_n \equiv t_b), \quad (6.71)$$
where $z_0 \equiv z_a = z(t_a)$ and $z_n \equiv z_b = z(t_b)$ together with $z_i = z(t_i)$ (0 ≤ i ≤ n). For this parametrization, we assume that C is subdivided into n pieces designated by $z_0, z_1, \cdots, z_n$. Let f(z) be a complex function defined in a region containing C. In this situation, let us consider the following summation Sn:
$$S_n = \sum_{i=1}^{n} f(\zeta_i)(z_i - z_{i-1}), \quad (6.72)$$
where ζi lies between zi−1 and zi (1 ≤ i ≤ n). Taking the limit n → ∞ and concomitantly $|z_i - z_{i-1}| \to 0$, we have
$$\lim_{n \to \infty} S_n = \lim_{n \to \infty} \sum_{i=1}^{n} f(\zeta_i)(z_i - z_{i-1}). \quad (6.73)$$

$$I = \lim_{n \to \infty} S_n \equiv \int_C f(z)dz = \int_{z_a}^{z_b} f(z)dz. \quad (6.74)$$

Using (6.48) and $z_i = x_i + iy_i$, we rewrite (6.72) as
$$S_n = \sum_{i=1}^{n} [u(\zeta_i)(x_i - x_{i-1}) - v(\zeta_i)(y_i - y_{i-1})] + \sum_{i=1}^{n} i[v(\zeta_i)(x_i - x_{i-1}) + u(\zeta_i)(y_i - y_{i-1})].$$
Taking the limit n → ∞ as well as $|x_i - x_{i-1}| \to 0$ and $|y_i - y_{i-1}| \to 0$ (0 ≤ i ≤ n), we have
$$I = \int_C [u(x, y)dx - v(x, y)dy] + i\int_C [v(x, y)dx + u(x, y)dy]. \quad (6.75)$$

Further rewriting (6.75), we get
$$I = \int_C \left[u(x, y)\frac{dx}{dt} - v(x, y)\frac{dy}{dt}\right]dt + i\int_C \left[v(x, y)\frac{dx}{dt} + u(x, y)\frac{dy}{dt}\right]dt$$
$$= \int_C [u(x, y) + iv(x, y)]\frac{dx}{dt}dt + i\int_C [u(x, y) + iv(x, y)]\frac{dy}{dt}dt$$
$$= \int_C [u(x, y) + iv(x, y)]\left[\frac{dx}{dt} + i\frac{dy}{dt}\right]dt = \int_C [u(x, y) + iv(x, y)]\frac{dz}{dt}dt$$
$$= \int_C f(z)\frac{dz}{dt}dt = \int_{t_a}^{t_b} f(z)\frac{dz}{dt}dt = \int_{z_a}^{z_b} f(z)dz. \quad (6.76)$$
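The defining sum (6.72) translates directly into a numerical contour integrator. In the Python sketch below (the paths and sample integrands are my own choices), ∫ z dz from 0 to 1 + i agrees on two different paths, consistent with the primitive z²/2, whereas the path integral of the non-analytic integrand z* differs between the paths:

```python
def contour_integral(f, path, n=20000):
    # Approximates (6.72): sum of f(zeta_i) (z_i - z_{i-1}), with
    # zeta_i taken as the midpoint of each small chord.
    zs = [path(k / n) for k in range(n + 1)]
    return sum(f((zs[k] + zs[k + 1]) / 2) * (zs[k + 1] - zs[k])
               for k in range(n))

straight = lambda t: (1 + 1j) * t                        # 0 -> 1+i directly
bent = lambda t: 2*t if t <= 0.5 else 1 + 1j*(2*t - 1)   # 0 -> 1 -> 1+i

# f(z) = z has the primitive z^2/2, so both paths give (1+i)^2/2 = i
for path in (straight, bent):
    print(abs(contour_integral(lambda z: z, path) - 1j) < 1e-6)   # -> True

# f(z) = z* is nowhere analytic; the two path integrals differ
a = contour_integral(lambda z: z.conjugate(), straight)
b = contour_integral(lambda z: z.conjugate(), bent)
print(abs(a - b) > 0.5)                                  # -> True
```

The agreement for f(z) = z anticipates the path independence established next via the primitive function, and the disagreement for z* shows why analyticity is essential there.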

Notice that the contour integral I of (6.74), (6.75), or (6.76) depends in general on the path connecting the points za and zb. If so, (6.76) is merely of secondary importance. However, if f(z) can be described as the derivative of another function, the situation becomes totally different. Suppose that we have
$$f(z) = \frac{d\tilde{F}(z)}{dz}. \quad (6.77)$$
If $\tilde{F}(z)$ is analytic, f(z) is analytic as well, because the derivative of an analytic function is again analytic (vide infra). Then, we have
$$f(z)\frac{dz}{dt} = \frac{d\tilde{F}(z)}{dz}\frac{dz}{dt} = \frac{d\tilde{F}[z(t)]}{dt}. \quad (6.78)$$
Inserting (6.78) into (6.76), we get

Fig. 6.11 Region R and several contours for integration. Regarding the symbols and notations, see text

$$\int_C f(z)\frac{dz}{dt}dt = \int_{z_a}^{z_b} f(z)dz = \int_{t_a}^{t_b} \frac{d\tilde{F}[z(t)]}{dt}dt = \tilde{F}(z_b) - \tilde{F}(z_a). \quad (6.79)$$

In this case $\tilde{F}(z)$ is called a primitive function of f(z). It is obvious from (6.79) that
$$\int_{z_a}^{z_b} f(z)dz = -\int_{z_b}^{z_a} f(z)dz. \quad (6.80)$$

This implies that the contour integral I does not depend on the path C connecting the points za and zb. We must be careful, however, about a situation in which there is a singular point zs of f(z) in the region R: then f(z) is not defined at zs. If one chooses C′′ as the contour of the return path in such a way that zs lies on C′′ (see Fig. 6.11), the RHS of (6.80) cannot exist, nor can (6.80) hold. To avoid that situation, we have to “expel” such a singularity from the region R so that R is wholly contained in the domain of analyticity of f(z) and f(z) is analytic in the entire region R. Moreover, we must choose R so that the whole region inside any closed path within R contains only points that belong to the domain of analyticity of f(z). That is, the region R must be simply connected in terms of the analyticity of f(z) (see Sect. 6.1). For a region R to be simply connected ensures that (6.80) holds for any pair of points za and zb in R, so far as the integration path for (6.80) is contained in R. From (6.80), we further get

$$\int_{z_a}^{z_b} f(z)dz + \int_{z_b}^{z_a} f(z)dz = 0. \quad (6.81)$$

Since the integral does not depend on the path, we can take a path from zb to za along a curve C′ (see Fig. 6.11). Thus, we have
$$\int_C f(z)dz + \int_{C'} f(z)dz = \int_{C + C'} f(z)dz = 0, \quad (6.82)$$
where C + C′ constitutes a closed curve $\tilde{C}$ as shown. In this way, in accordance with (6.81) we get
$$\oint_{\tilde{C}} f(z)dz = 0, \quad (6.83)$$

where C̃ is followed counter-clockwise according to custom. In (6.83) any closed path C̃ can be chosen for the contour within the simply connected region R. Hence, we reach the following important theorem.

Theorem 6.10: Cauchy's Integral Theorem [5] Let R be a simply connected region and let C be an arbitrary closed curve within R. Let f(z) be analytic on C and within the whole region enclosed by C. Then, we have

$$\oint_C f(z)\,dz = 0. \qquad (6.84)$$
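Theorem 6.10 can be illustrated numerically. The sketch below is our own illustration (the contour, test function, and step count are arbitrary choices, not from the text): it integrates the entire function f(z) = z² around the unit circle with a midpoint Riemann sum, so by (6.84) the result should be numerically zero.

```python
import cmath

def contour_integral(f, n=2000):
    # Midpoint Riemann sum for the closed contour integral of f along
    # the unit circle z = e^{i*theta}, 0 <= theta <= 2*pi.
    total = 0j
    for k in range(n):
        t0 = 2 * cmath.pi * k / n
        t1 = 2 * cmath.pi * (k + 1) / n
        zm = cmath.exp(1j * (t0 + t1) / 2)            # midpoint of the arc
        dz = cmath.exp(1j * t1) - cmath.exp(1j * t0)  # chord increment
        total += f(zm) * dz
    return total

# f(z) = z**2 is entire, so by (6.84) the integral vanishes (numerically).
print(abs(contour_integral(lambda z: z ** 2)))
```

Replacing the integrand by a function with a singularity inside the contour (e.g., 1/z) makes the integral non-zero, which is the situation discussed next.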

There is a variation in the description of Cauchy's integral theorem. For instance, let f(z) be analytic in R except for a singular point at z_s; see Fig. 6.12a. Then, the domain of analyticity of f(z) is not identical to R, but is defined as R − {z_s}. That is, the domain of analyticity of f(z) is no longer simply connected and, hence, (6.84) is not generally

Fig. 6.12 Region R and contour C for integration. (a) A singular point at z_s is contained within R. (b) The singular point at z_s is not contained within R̃ so that f(z) can be analytic in a simply connected region R̃. Regarding the symbols and notations, see text


true. In such a case, we may deform the integration path of f(z) so that its domain of analyticity can be simply connected and (6.84) can hold. For example, we take R̃ so that it is surrounded by the curves C and Γ̃ as well as the lines L₁ and L₂ (Fig. 6.12b). As a result, z_s has been expelled from R̃ and, at the same time, R̃ becomes simply connected. Then, as a tangible form of (6.84) we get

$$\oint_{C+\tilde{\Gamma}+L_1+L_2} f(z)\,dz = 0, \qquad (6.85)$$

where Γ̃ denotes the clockwise integration (see Fig. 6.12b) and the lines L₁ and L₂ are rendered as close to each other as possible. In (6.85) the integrations along L₁ and L₂ cancel, because they are taken in opposite directions. Thus, we have

$$\oint_{C+\tilde{\Gamma}} f(z)\,dz = 0 \quad \text{or} \quad \oint_C f(z)\,dz = \oint_{\Gamma} f(z)\,dz, \qquad (6.86)$$

where Γ stands for the counter-clockwise integration. Equation (6.86) is valid as well when there are more singular points. In that case, instead of (6.86) we have

$$\oint_C f(z)\,dz = \sum_i \oint_{\Gamma_i} f(z)\,dz, \qquad (6.87)$$

where Γ_i is taken so that it encircles an individual singular point. Reflecting the nature of the singularities, we get different results with (6.87). We will come back to this point later.

On the basis of Cauchy's integral theorem, we show an integral representation of an analytic function that is well known as Cauchy's integral formula.

Theorem 6.11: Cauchy's Integral Formula Let f(z) be analytic in a simply connected region R. Let C be an arbitrary closed curve within R that encircles z. Then, we have

$$f(z) = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta, \qquad (6.88)$$

where the contour integration along C is taken in the counter-clockwise direction.

Proof Let Γ be a circle of radius ρ in the complex plane centered at z (see Fig. 6.13). Then, f(ζ)/(ζ − z) has a singularity at ζ = z. Therefore, we must evaluate (6.88) using (6.86). From (6.86), we have

$$\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta = \oint_{\Gamma} \frac{f(\zeta)}{\zeta - z}\,d\zeta = f(z)\oint_{\Gamma} \frac{d\zeta}{\zeta - z} + \oint_{\Gamma} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta. \qquad (6.89)$$

An arbitrary point ζ on the circle Γ is described as

Fig. 6.13 Simply connected region R encircled by a closed contour C in a complex plane ζ. A circle Γ of radius ρ centered at z is contained within R. The contour integration is taken in the counter-clockwise direction along C or Γ

$$\zeta = z + \rho e^{i\theta}. \qquad (6.90)$$

Then, taking infinitesimal quantities of (6.90), we have

$$d\zeta = \rho e^{i\theta}\,i\,d\theta = (\zeta - z)\,i\,d\theta. \qquad (6.91)$$

Inserting (6.91) into (6.89), we get

$$\oint_{\Gamma} \frac{d\zeta}{\zeta - z} = \int_0^{2\pi} i\,d\theta = 2\pi i. \qquad (6.92)$$

Rewriting (6.89), we obtain

$$\oint_{\Gamma} \frac{f(\zeta)}{\zeta - z}\,d\zeta = 2\pi i\,f(z) + \oint_{\Gamma} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta. \qquad (6.93)$$

Since f(z) is analytic, f(z) is uniformly continuous [6]. Therefore, if we make ρ small enough, for an arbitrary positive number ε we have |f(ζ) − f(z)| < ε when |ζ − z| = ρ. Considering this situation, we have

$$\left|\frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta - f(z)\right| = \left|\frac{1}{2\pi i}\oint_{\Gamma} \frac{f(\zeta)}{\zeta - z}\,d\zeta - f(z)\right| = \left|\frac{1}{2\pi i}\oint_{\Gamma} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta\right|$$
$$\le \frac{1}{2\pi}\oint_{\Gamma} \frac{|f(\zeta) - f(z)|}{|\zeta - z|}\,|d\zeta| < \frac{\varepsilon}{2\pi}\oint_{\Gamma} \frac{|d\zeta|}{|\zeta - z|} = \frac{\varepsilon}{2\pi}\int_0^{2\pi} d\theta = \varepsilon. \qquad (6.94)$$

Therefore, we get (6.88) in the limit of ρ → 0. This completes the proof.
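Cauchy's integral formula (6.88) lends itself to a direct numerical check. The sketch below is our own illustration (the unit-circle contour, the test function exp, and the interior point are arbitrary assumptions): the recovered value should agree with f(z) itself.

```python
import cmath

def cauchy_formula(f, z, n=4000):
    # (1/(2*pi*i)) * closed integral of f(zeta)/(zeta - z) over the
    # unit circle |zeta| = 1, by a midpoint Riemann sum; z must lie
    # inside the circle, cf. (6.88).
    total = 0j
    for k in range(n):
        t0 = 2 * cmath.pi * k / n
        t1 = 2 * cmath.pi * (k + 1) / n
        zm = cmath.exp(1j * (t0 + t1) / 2)
        dz = cmath.exp(1j * t1) - cmath.exp(1j * t0)
        total += f(zm) / (zm - z) * dz
    return total / (2j * cmath.pi)

z0 = 0.3 + 0.2j
print(cauchy_formula(cmath.exp, z0))  # close to exp(z0)
print(cmath.exp(z0))
```

The boundary values of an analytic function thus determine its interior values, which is the content of the theorem.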


In the above proof, the domain of analyticity R′ of f(ζ)/(ζ − z) (with respect to ζ) is given by R′ = R ∩ [ℂ − {z}] = R − {z}. Since both R and ℂ − {z} are open sets (i.e., ℂ − {z} = {z}ᶜ is the complementary set of the closed set {z} and, hence, an open set), R′ is an open set as well; see Sect. 6.1. Notice that R′ is not simply connected. For this reason, we had to evaluate (6.88) using (6.89). Theorem 6.10 (Cauchy's integral theorem) and Theorem 6.11 (Cauchy's integral formula) play a crucial role in the theory of analytic functions.

To derive (6.94), we can equally use Darboux's inequality [5]. This inequality is intuitively obvious and frequently used in the theory of analytic functions.

Theorem 6.12: Darboux's Inequality [5] Let f(z) be a function for which |f(z)| is bounded on C. Here C is a piecewise continuous path in the complex plane. Then, for the integral I described by

$$I = \int_C f(z)\,dz,$$

we have

$$\left|\int_C f(z)\,dz\right| \le \max|f| \cdot L, \qquad (6.95)$$

where L represents the arc length of the curve C for the contour integration.

Proof As discussed earlier in this section, the integral is the limit as n → ∞ of the sum described by

$$S_n = \sum_{i=1}^{n} f(\zeta_i)(z_i - z_{i-1}) \qquad (6.72)$$

with

$$I = \lim_{n\to\infty} S_n = \lim_{n\to\infty} \sum_{i=1}^{n} f(\zeta_i)(z_i - z_{i-1}). \qquad (6.73)$$

Denoting the maximum modulus of f(z) on C by max|f|, we have

$$|S_n| \le \sum_{i=1}^{n} |f(\zeta_i)|\,|z_i - z_{i-1}| \le \max|f| \sum_{i=1}^{n} |z_i - z_{i-1}|. \qquad (6.96)$$

The sum Σᵢ₌₁ⁿ |zᵢ − zᵢ₋₁| on the RHS of inequality (6.96) is the length of a polygon inscribed in the curve C. It is shorter than the arc length L of the curve C. Hence, for all n we have

$$|S_n| \le \max|f| \cdot L.$$

As n → ∞,

$$|S_n| = |I| = \left|\int_C f(z)\,dz\right| \le \max|f| \cdot L.$$

This completes the proof.

Applying Darboux's inequality to (6.94), we get

$$\left|\oint_{\Gamma} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta\right| \le \max\left|\frac{f(\zeta) - f(z)}{\rho e^{i\theta}}\right| \cdot 2\pi\rho < 2\pi\varepsilon.$$

That is,

$$\left|\frac{1}{2\pi i}\oint_{\Gamma} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta\right| < \varepsilon. \qquad (6.97)$$
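Darboux's inequality (6.95) is easy to illustrate numerically. In the sketch below (our own choice of arc and integrand, not from the text) the upper half of the unit circle has arc length L = π and max|f| = 1 for f(z) = z², so the bound is π; the actual integral has modulus |(−1)³/3 − 1³/3| = 2/3.

```python
import cmath

def arc_integral(f, n=2000):
    # Integral of f along the upper half of the unit circle,
    # from z = 1 to z = -1, by a midpoint Riemann sum.
    total = 0j
    for k in range(n):
        t0 = cmath.pi * k / n
        t1 = cmath.pi * (k + 1) / n
        zm = cmath.exp(1j * (t0 + t1) / 2)
        dz = cmath.exp(1j * t1) - cmath.exp(1j * t0)
        total += f(zm) * dz
    return total

value = abs(arc_integral(lambda z: z ** 2))  # |I| = 2/3 here
bound = 1.0 * cmath.pi                       # max|f| * L = 1 * pi, cf. (6.95)
print(value, "<=", bound)
```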

So far, we have dealt with differentiation and integration as different mathematical manipulations. However, Cauchy's integral formula (or Cauchy's integral expression) enables us to relate and unify these two manipulations. We have the following proposition for this.

Proposition 6.1 [5] Let C be a piecewise continuous curve of finite length. (The curve C may or may not be closed.) Let f(z) be a continuous function. The contour integration of f(ζ)/(ζ − z) gives f̃(z) such that

$$\tilde{f}(z) = \frac{1}{2\pi i}\int_C \frac{f(\zeta)}{\zeta - z}\,d\zeta. \qquad (6.98)$$

Then, f̃(z) is analytic at any point z that does not lie on C.

Proof We consider the following expression:

$$\Delta \equiv \left|\frac{\tilde{f}(z + \Delta z) - \tilde{f}(z)}{\Delta z} - \frac{1}{2\pi i}\int_C \frac{f(\zeta)}{(\zeta - z)^2}\,d\zeta\right|, \qquad (6.99)$$

where Δ is a real non-negative number. Describing (6.99) by use of (6.98) for f̃(z + Δz) and f̃(z), we get

$$\Delta = \frac{|\Delta z|}{2\pi}\left|\int_C \frac{f(\zeta)}{(\zeta - z - \Delta z)(\zeta - z)^2}\,d\zeta\right|. \qquad (6.100)$$

To obtain (6.100), in combination with (6.98) we have calculated (6.99) as follows:

$$\Delta = \frac{1}{2\pi|\Delta z|}\left|\int_C \frac{f(\zeta)(\zeta - z)^2 - f(\zeta)(\zeta - z - \Delta z)(\zeta - z) - f(\zeta)\Delta z(\zeta - z - \Delta z)}{(\zeta - z - \Delta z)(\zeta - z)^2}\,d\zeta\right|$$
$$= \frac{1}{2\pi|\Delta z|}\left|\int_C \frac{f(\zeta)(\Delta z)^2}{(\zeta - z - \Delta z)(\zeta - z)^2}\,d\zeta\right|.$$

Since z is not on C, the integrand of (6.100) is bounded. Then, as Δz → 0, |Δz| → 0 and Δ → 0 accordingly. From (6.99), this ensures the differentiability of f̃(z). Then, by definition of the differentiation, df̃(z)/dz is given by

$$\frac{d\tilde{f}(z)}{dz} = \frac{1}{2\pi i}\int_C \frac{f(\zeta)}{(\zeta - z)^2}\,d\zeta. \qquad (6.101)$$

As f(ζ) is continuous, f̃(z) is single-valued. Thus, f̃(z) is found to be analytic at any point z that does not lie on C. This completes the proof.

If in (6.101) we take C as a closed contour that encircles z, from (6.98) we have

$$\tilde{f}(z) = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta. \qquad (6.102)$$

Moreover, if f(z) is analytic in a simply connected region that contains C, from Theorem 6.11 we must have

$$f(z) = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta. \qquad (6.103)$$

Comparing (6.102) with (6.103) and taking account of the single-valuedness of f(z), we must have

$$f(z) \equiv \tilde{f}(z). \qquad (6.104)$$

Then, from (6.101) we get

$$\frac{df(z)}{dz} = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - z)^2}\,d\zeta. \qquad (6.105)$$

An analogous result holds for the n-th derivative of f(z) such that [5]

$$\frac{d^n f(z)}{dz^n} = \frac{n!}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - z)^{n+1}}\,d\zeta \quad (n: \text{zero or positive integers}). \qquad (6.106)$$
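Equation (6.106) can be verified numerically as well. The sketch below (our own illustration; the function exp, the unit-circle contour around z, and the step count are arbitrary choices) recovers the third derivative of exp at 0, which equals 1.

```python
import cmath

def nth_derivative(f, z, n, m=4000):
    # d^n f / dz^n via (6.106): (n!/(2*pi*i)) times the closed integral
    # of f(zeta)/(zeta - z)^(n+1) over a unit circle centered at z.
    fact = 1
    for k in range(2, n + 1):
        fact *= k
    total = 0j
    for k in range(m):
        t0 = 2 * cmath.pi * k / m
        t1 = 2 * cmath.pi * (k + 1) / m
        zm = z + cmath.exp(1j * (t0 + t1) / 2)
        dz = cmath.exp(1j * t1) - cmath.exp(1j * t0)
        total += f(zm) / (zm - z) ** (n + 1) * dz
    return fact * total / (2j * cmath.pi)

# Every derivative of exp equals exp; check the third derivative at 0.
print(nth_derivative(cmath.exp, 0.0, 3))  # close to 1
```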

Equation (6.106) implies that an analytic function is infinitely differentiable and that the derivatives of all orders of an analytic function are again analytic. These prominent properties arise partly from the aforementioned stringent requirement on the differentiability of a function of a complex variable. So far, we have not assumed the continuity of df(z)/dz. However, once we have established (6.106), it assures the existence of, e.g., d²f(z)/dz² and, hence, the continuity of df(z)/dz. The same holds for dⁿf(z)/dzⁿ with any n (zero or positive integers). The following theorem is important and intriguing in connection with analytic functions.

Theorem 6.13: Cauchy–Liouville Theorem A bounded entire function must be a constant.

Proof Using (6.106), we consider the first derivative of an entire function f(z) described as

$$\frac{df}{dz} = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - z)^2}\,d\zeta.$$

Since f(z) is an entire function, we can choose an arbitrarily large circle of radius R centered at z for the closed contour C. On the circle, we have

$$\zeta = z + Re^{i\theta},$$

where θ is a real number changing from 0 to 2π. Then, the above equation can be rewritten as

$$\frac{df}{dz} = \frac{1}{2\pi i}\int_0^{2\pi} \frac{f(\zeta)}{(Re^{i\theta})^2}\,iRe^{i\theta}\,d\theta = \frac{1}{2\pi R}\int_0^{2\pi} \frac{f(\zeta)}{e^{i\theta}}\,d\theta. \qquad (6.107)$$

Taking the absolute value of both sides, we have

$$\left|\frac{df}{dz}\right| \le \frac{1}{2\pi R}\int_0^{2\pi} |f(\zeta)|\,d\theta \le \frac{M}{2\pi R}\int_0^{2\pi} d\theta = \frac{M}{R}, \qquad (6.108)$$

where M is the maximum of |f(ζ)|. As R → ∞, |df/dz| tends to zero. This implies that df/dz tends to zero as well and, hence, that f(z) is constant. This completes the proof.

At first glance, the Cauchy–Liouville theorem looks astonishing from the viewpoint of real analysis. It is because we are familiar with, e.g., −1 ≤ sin x ≤ 1 for any real number x. Note that sin x is bounded in the real domain. In fact, in Sect. 8.6 we will show from (8.98) and (8.99) that, when a → ±∞ (a: real, a ≠ 0),

sin(π/2 + ia) takes real values with sin(π/2 + ia) → ∞. This simple example clearly shows that sin ϕ is an unbounded entire function in the complex domain. As a very familiar example of a bounded entire function, we show

$$f(z) = \cos^2 z + \sin^2 z \equiv 1, \qquad (6.109)$$

which is defined in the entire complex plane.
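The unboundedness of the sine function in the complex domain is easy to observe numerically; the sketch below (our own illustration) uses sin(π/2 + ia) = cosh a, which grows without bound as a increases, consistent with the Cauchy–Liouville theorem since sin is entire and not constant.

```python
import cmath

# |sin(pi/2 + i*a)| = cosh(a) grows without bound as a -> infinity,
# so sin is an entire function that is NOT bounded on the complex plane.
for a in (0.0, 5.0, 10.0, 20.0):
    w = cmath.sin(cmath.pi / 2 + 1j * a)
    print(a, abs(w))
```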

6.4 Taylor's Series and Laurent's Series

Using Cauchy's integral formula, we study Taylor's series and Laurent's series in relation to the power series expansion of analytic functions. Taylor's series and Laurent's series are fundamental tools for studying various properties of complex functions in the theory of analytic functions. First, let us examine the Taylor's series of an analytic function.

Theorem 6.14 [6] Any analytic function f(z) can be expressed by a uniformly convergent power series at an arbitrary regular point of the function in the domain of analyticity R.

Proof Let us assume that we have a circle C of radius r centered at a within R. Choose z inside C with |z − a| = ρ (ρ < r) and consider the contour integration of f(z) along C (Fig. 6.14). Then, we have

$$\frac{1}{\zeta - z} = \frac{1}{\zeta - a - (z - a)} = \frac{1}{\zeta - a}\cdot\frac{1}{1 - \frac{z-a}{\zeta-a}} = \frac{1}{\zeta - a} + \frac{z - a}{(\zeta - a)^2} + \frac{(z - a)^2}{(\zeta - a)^3} + \cdots, \qquad (6.110)$$

Fig. 6.14 Diagram to explain Taylor's expansion that is assumed to be performed around a. A circle C of radius r centered at a is inside R. Regarding other symbols and notations, see text

where ζ is an arbitrary point on the circle C. To derive (6.110), we used the following formula:

$$\frac{1}{1 - x} = \sum_{n=0}^{\infty} x^n,$$

where x = (z − a)/(ζ − a). Since |(z − a)/(ζ − a)| = ρ/r < 1, the geometric series of (6.110) converges.

Suppose that f(ζ) is analytic on C. Then, we have a finite positive number M on C such that

$$|f(\zeta)| < M. \qquad (6.111)$$

Using (6.110), we have

$$\frac{f(\zeta)}{\zeta - z} = \frac{f(\zeta)}{\zeta - a} + \frac{(z - a)f(\zeta)}{(\zeta - a)^2} + \frac{(z - a)^2 f(\zeta)}{(\zeta - a)^3} + \cdots. \qquad (6.112)$$

Hence, combining (6.111) and (6.112) we get

$$\left|\sum_{\nu=n}^{\infty} \frac{(z - a)^{\nu} f(\zeta)}{(\zeta - a)^{\nu+1}}\right| \le \frac{M}{r}\sum_{\nu=n}^{\infty}\left(\frac{\rho}{r}\right)^{\nu} = \frac{M}{r}\cdot\frac{(\rho/r)^n}{1 - \rho/r} = \frac{M}{r - \rho}\left(\frac{\rho}{r}\right)^n. \qquad (6.113)$$

The RHS of (6.113) tends to zero as n → ∞. Therefore, (6.112) is uniformly convergent with respect to ζ on C. In the above calculations, if we express the infinite series on the RHS of (6.112) as Σ and the partial sum of its first n terms as Σₙ, the LHS of (6.113) can be expressed as |Σ − Σₙ| (≡ |Ρₙ|). Since |Ρₙ| → 0 as n → ∞, this certainly shows that the series Σ is uniformly convergent [6]. Consequently, we can perform termwise integration [6] of (6.112) on C and subsequently divide the result by 2πi to get

$$\frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta = \sum_{n=0}^{\infty} \frac{(z - a)^n}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - a)^{n+1}}\,d\zeta. \qquad (6.114)$$

From Theorem 6.11, the LHS of (6.114) equals f(z). Putting

$$A_n \equiv \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - a)^{n+1}}\,d\zeta, \qquad (6.115)$$

we get

$$f(z) = \sum_{n=0}^{\infty} A_n (z - a)^n. \qquad (6.116)$$

This completes the proof.

In the above proof, from (6.106) we have

$$\frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - z)^{n+1}}\,d\zeta = \frac{1}{n!}\frac{d^n f(z)}{dz^n}. \qquad (6.117)$$

Hence, combining (6.117) with (6.115) we get

$$A_n = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - a)^{n+1}}\,d\zeta = \left.\frac{1}{n!}\frac{d^n f(z)}{dz^n}\right|_{z=a} = \frac{1}{n!}\frac{d^n f(a)}{dz^n}. \qquad (6.118)$$

Thus, (6.116) can be rewritten as

$$f(z) = \sum_{n=0}^{\infty} \frac{1}{n!}\frac{d^n f(a)}{dz^n}(z - a)^n.$$

This is the same form as that obtained with respect to a real variable. Equation (6.116) is a uniformly convergent power series called a Taylor's series or Taylor's expansion of f(z) with respect to z = a, with the Aₙ being its coefficients. In Theorem 6.14 we have assumed that the union of the circle C and its inside is simply connected. This topological environment is virtually the same as that of Fig. 6.13. In Theorem 6.11 (Cauchy's integral formula) we have shown the integral representation of an analytic function. On the basis of Cauchy's integral formula, Theorem 6.14 demonstrates that any analytic function is given a tangible functional form.

Example 6.2 Let us think of the function f(z) = 1/z. This function is analytic except for z = 0. Therefore, it can be expanded in a Taylor's series in the region ℂ − {0}. We consider the expansion around z = a (a ≠ 0). Using (6.115), we get the expansion coefficients expressed as

$$A_n = \frac{1}{2\pi i}\oint_C \frac{1}{z(z - a)^{n+1}}\,dz,$$

where C is a closed curve that does not contain z = 0 on C or in its inside. Therefore, 1/z is analytic in the simply connected region encircled by C, chosen so that the point z = 0 is not contained inside C (Fig. 6.15). Then, from (6.106) we get

Fig. 6.15 Diagram used for calculating the Taylor's series of f(z) = 1/z around z = a (a ≠ 0). A contour C encircles the point a

$$\frac{1}{2\pi i}\oint_C \frac{1}{z(z - a)^{n+1}}\,dz = \left.\frac{1}{n!}\frac{d^n(1/z)}{dz^n}\right|_{z=a} = \frac{1}{n!}\cdot\frac{(-1)^n n!}{a^{n+1}} = \frac{(-1)^n}{a^{n+1}}.$$

Hence, from (6.116) we have

$$f(z) = \frac{1}{z} = \sum_{n=0}^{\infty} \frac{(-1)^n}{a^{n+1}}(z - a)^n = \frac{1}{a}\sum_{n=0}^{\infty}\left(1 - \frac{z}{a}\right)^n. \qquad (6.119)$$

The series converges uniformly within a convergence circle, also called a "ball" [7] (vide infra), whose convergence radius r is estimated to be

$$\frac{1}{r} = \limsup_{n\to\infty}\sqrt[n]{\left|\frac{(-1)^n}{a^{n+1}}\right|} = \limsup_{n\to\infty}\sqrt[n]{\frac{1}{|a|^{n+1}}} = \frac{1}{|a|}. \qquad (6.120)$$

That is, r = |a|. In (6.120) "limsup" stands for the superior limit and is sometimes called "limes superior" [6]. The estimation is due to the Cauchy–Hadamard theorem. Readers interested in the theorem and related concepts are referred to appropriate literature [6].

The concept of the convergence circle and radius is very important for defining the analyticity of a function. Formally, the convergence radius is defined as the radius of a convergence circle in a metric space, typically ℝ² or ℂ. In ℂ a region R within the convergence circle of convergence radius r (i.e., a real positive number) centered at a fixed z₀ ∈ ℂ is denoted by

$$R = \{z;\ z \in \mathbb{C},\ \rho(z, z_0) < r\}.$$
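The coefficient formula (6.115) behind Example 6.2 can be tested numerically. The sketch below is our own illustration (the center a = 1, the contour radius 0.5, and the step count are arbitrary choices that keep the contour away from the singularity at z = 0); for f(z) = 1/z the coefficients should equal (−1)ⁿ/aⁿ⁺¹ = (−1)ⁿ.

```python
import cmath

def taylor_coeff(f, a, n, radius=0.5, m=4000):
    # A_n of (6.115): (1/(2*pi*i)) times the closed integral of
    # f(zeta)/(zeta - a)^(n+1) over a circle of the given radius
    # around a (midpoint Riemann sum).
    total = 0j
    for k in range(m):
        t0 = 2 * cmath.pi * k / m
        t1 = 2 * cmath.pi * (k + 1) / m
        zm = a + radius * cmath.exp(1j * (t0 + t1) / 2)
        dz = radius * (cmath.exp(1j * t1) - cmath.exp(1j * t0))
        total += f(zm) / (zm - a) ** (n + 1) * dz
    return total / (2j * cmath.pi)

# Coefficients of 1/z around a = 1 should be (-1)^n / 1^(n+1) = (-1)^n.
for n in range(4):
    print(n, taylor_coeff(lambda z: 1 / z, 1.0, n))
```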

Fig. 6.16 Diagram to explain Laurent's expansion that is performed around a. The circle Γ containing the point z lies in the annular region bounded by the circles C₁ and C₂

The region R is an open set by definition (see Sect. 6.1) and the Taylor's series defined in R is convergent. On the other hand, the Taylor's series may or may not be convergent at a point z that lies on the convergence circle. In Example 6.2, f(z) = 1/z is not defined at z = 0, even though z (=0) is on the convergence circle. At other points on the convergence circle, however, the Taylor's series is convergent. Note that a region Bₙ in ℝⁿ similarly defined as

$$B_n = \{x;\ x \in \mathbb{R}^n,\ \rho(x, x_0) < r\}$$

is sometimes called a ball or open ball. This concept can readily be extended to any metric space.

Next, we examine the Laurent's series of an analytic function.

Theorem 6.15 [6] Let f(z) be analytic in a region R except for a point a. Then, f(z) can be described by a uniformly convergent power series within the region R − {a}.

Proof Let us assume that we have a circle C₁ of radius r₁ centered at a and another circle C₂ of radius r₂ centered at a, both within R. Moreover, let another circle Γ be centered at z (≠ a) within R so that z is outside of C₂; see Fig. 6.16. In this situation we consider the contour integration along C₁, C₂, and Γ. Using (6.87), we have

$$\frac{1}{2\pi i}\oint_{\Gamma} \frac{f(\zeta)}{\zeta - z}\,d\zeta = \frac{1}{2\pi i}\oint_{C_1} \frac{f(\zeta)}{\zeta - z}\,d\zeta - \frac{1}{2\pi i}\oint_{C_2} \frac{f(\zeta)}{\zeta - z}\,d\zeta. \qquad (6.121)$$

From Theorem 6.11, we have

$$\text{LHS of } (6.121) = f(z).$$

Notice that the region containing Γ and its inside forms a simply connected region in terms of the analyticity of f(z). For the first term of the RHS of (6.121), from Theorem 6.14 we have

$$\frac{1}{2\pi i}\oint_{C_1} \frac{f(\zeta)}{\zeta - z}\,d\zeta = \sum_{n=0}^{\infty} A_n (z - a)^n \qquad (6.122)$$

with

$$A_n = \frac{1}{2\pi i}\oint_{C_1} \frac{f(\zeta)}{(\zeta - a)^{n+1}}\,d\zeta \quad (n = 0, 1, 2, \cdots). \qquad (6.123)$$

In fact, if ζ lies on the circle C₁, the Taylor's series is uniformly convergent, as in the proof of Theorem 6.14. For the second term of the RHS of (6.121), however, the situation is different in that z lies outside the circle C₂. In this case, from Fig. 6.16 we have

$$\left|\frac{\zeta - a}{z - a}\right| = \frac{r_2}{\rho} < 1. \qquad (6.124)$$

Then, the geometric series described by

$$\frac{1}{\zeta - z} = \frac{1}{\zeta - a - (z - a)} = \frac{-1}{z - a}\left[1 + \frac{\zeta - a}{z - a} + \frac{(\zeta - a)^2}{(z - a)^2} + \cdots\right] \qquad (6.125)$$

is uniformly convergent with respect to ζ on C₂. Accordingly, we can again perform termwise integration [6] of (6.125) on C₂. Hence, we get

$$-\oint_{C_2} \frac{f(\zeta)}{\zeta - z}\,d\zeta = \frac{1}{z - a}\oint_{C_2} f(\zeta)\,d\zeta + \frac{1}{(z - a)^2}\oint_{C_2} f(\zeta)(\zeta - a)\,d\zeta + \frac{1}{(z - a)^3}\oint_{C_2} f(\zeta)(\zeta - a)^2\,d\zeta + \cdots. \qquad (6.126)$$

That is, we have

$$-\frac{1}{2\pi i}\oint_{C_2} \frac{f(\zeta)}{\zeta - z}\,d\zeta = \sum_{n=1}^{\infty} A_{-n}(z - a)^{-n}, \qquad (6.127)$$

where

$$A_{-n} = \frac{1}{2\pi i}\oint_{C_2} f(\zeta)(\zeta - a)^{n-1}\,d\zeta \quad (n = 1, 2, 3, \cdots). \qquad (6.128)$$

1

A ð z - aÞ n= -1 n

n

:

ð6:129Þ

In (6.129), the coefficients An are given by (6.123) or (6.128) according to the plus (including zero) or minus sign of n. These complete the proof. Equation (6.129) is a uniformly convergent power series called a Laurent’s series or Laurent’s expansion of f(z) with respect to z = a with A-n being its coefficient. Note that the above annular region bounded by C1 and C2 is not simply connected, but multiply connected. ~ is taken in the annular region formed by C1 We add that if the integration path C and C2 so that it can be sandwiched in between them, the values of integral expressed as (6.123) and (6.128) remain unchanged in virtue of (6.87). That is, instead of (6.123) and (6.128) we may have a following unified formula An =

1 2πi

~ C

f ðζ Þ dζ ðn = 0, ± 1, ± 2, ⋯Þ ðζ - aÞnþ1

ð6:130Þ

that represents the coefficients An of the power series f ðzÞ =

6.5

1

A ð z - aÞ n= -1 n

n

:

ð6:129Þ

Zeros and Singular Points

We described the fundamental theorems of Taylor’s expansion and Laurent’s expansion. Next, we explain important concepts of zeros and singular points. Definition 6.11 Let f(z) be analytic in a region R . If f(z) vanishes at a point z = a, the point is called a zero of f(z). When we have

230

6

f ð aÞ =

df ðzÞ dz

z=a

=

d 2 f ðzÞ dz2

z=a

=⋯=

Theory of Analytic Functions

d n - 1 f ðzÞ dzn - 1

z=a

= 0,

ð6:131Þ

but d n f ðzÞ dzn

z=a

≠ 0,

ð6:132Þ

the function is said to have a zero of order n at z = a. If (6.131) and (6.132) hold, the first n coefficients in the Tailor’s series of the analytic function f(z) at z = a vanish. Hence, we have f ðzÞ = An ðz - aÞn þ Anþ1 ðz - aÞnþ1 þ ⋯

= ð z - aÞ n

1

A ð z - aÞ k = 0 nþk

k

= ðz - aÞn hðzÞ,

where we define h(z) as hðzÞ 

1

A ðz - aÞk : k = 0 nþk

ð6:133Þ

Then, h(z) is analytic and non-vanishing at z = a. From the analyticity h(z) must be continuous at z = a and differ from zero in some finite neighborhood of z = a. Consequently, it is also the case with f(z). Therefore, if the set of zeros had an accumulation point at z = a, any neighborhood of z = a would contain another zero, in contradiction to the above assumption. To avoid this contradiction, the analytic function f(z)  0 throughout the region R . Taking the contraposition of the above statement, if an analytic function is not identically zero [i.e., f(z) ≢ 0], the zeros of that function are isolated; see the relevant discussion of Sect. 6.1.2. Meanwhile, the Laurent’s series (6.129) can be rewritten as f ðzÞ =

1

A ðz - aÞk k=0 k

þ

1 k=1

A-k

1 ð z - aÞ k

ð6:134Þ

so that the constitution of the Laurent’s series can explicitly be recognized. The second term is called a singular part (or principal part) of f(z) at z = a. The singularity is classified as follows: (1) Removable singularity: If (6.134) lacks the singular part (i.e., A-k = 0 for k = 1, 2, ⋯), the singular point is said to be a removable singularity. In this case, we have

6.5

Zeros and Singular Points

231 1

f ðzÞ =

A ð z - aÞ k=0 k

k

:

ð6:135Þ

If in (6.135) we define suitably as f ðaÞ  A0 , f(z) can be regarded as analytic. Examples include the following case where a sin function is expanded as the Taylor’s series: sin z = z -

ð- 1Þn - 1 z2n - 1 z3 z 5 þ⋯ þ -⋯ þ 3! 5! ð2n - 1Þ!

ð- 1Þn - 1 z2n - 1 : n=1 ð2n - 1Þ! 1

=

ð6:136Þ

Hence, if we define f ðzÞ 

sin z z

f ð0Þ  lim

and

z→0

sin z = 1, z

ð6:137Þ

z = 0 is a removable singularity of f(z). (2) Poles: Suppose that in (6.134) we have a certain positive integer n (n ≥ 1) such that A-n ≠ 0

A - ðnþ1Þ = A - ðnþ2Þ = ⋯ = 0:

but

ð6:138Þ

Then, the function f(z) is said to have a pole of order n at z = a. If, in particular, n = 1 in (6.138), i.e., A-1 ≠ 0

but

A - 2 = A - 3 = ⋯ = 0,

ð6:139Þ

the function f(z) is said to have a simple pole. This special case is important in the calculation of various integrals (vide infra). In the general cases for (6.138), (6.134) can be rewritten as f ðzÞ =

gð z Þ , ð z - aÞ n

ð6:140Þ

where g(z) defined as gð z Þ 

1

A ð z - aÞ k=0 k-n

k

ð6:141Þ

is analytic and non-vanishing at z = a, i.e., A-n ≠ 0. Explicitly writing (6.140), we have

232

6

f ðzÞ =

1

A ð z - aÞ k=0 k

k

þ

n k=1

A-k

Theory of Analytic Functions

1 : ð z - aÞ k

(3) Essential singularity: If the singular part of (6.134) comprises an infinite series, the point z = a is called an essential singularity. In that case, f(z) performs a complicated behavior near or at z = a. Interested readers are referred to suitable literature [5]. A function f(z) that is analytic in a region within ℂ except for a set of points where the function has poles is called a meromorphic function in the said region. The above definition of meromorphic functions is true of the above Cases (1) and (2), but it is not true of Case (3). Henceforth, we will be dealing with the meromorphic functions. In the above discussion of Cases (1) and (2), we often deal with a function f(z) that has a single isolated pole at z = a. This implies that f(z) is analytic within a certain neighborhood Na of z = a but is not analytic at z = a. More specifically, f(z) is analytic in a region Na - {a}. Even though we consider more than one isolated pole, the situation is essentially the same. Suppose that there is another isolated pole at z = b. In that case, again take a certain neighborhood Nb of z = b (≠a) and f(z) is analytic in a region Nb - {b}. Readers may well wonder why we have to discuss this trifling issue. Nevertheless, think of the situation where a set of poles has an accumulation point. Any neighborhood of the accumulation point contains another pole where the function is not analytic. This is in contradiction to that f(z) is analytic in a region, e.g., Na - {a}. Thus, if such a function were present, it would be intractable to deal with.

6.6

Analytic Continuation

When we discussed the Cauchy’s integral formula (Theorem 6.11), we have known that if a function is analytic in a certain region of ℂ and on a curve C that encircles the region, the values of the function within the region are determined once the values of the function on C are given. We have the following theorem for this. Theorem 6.16 Let f1(z) and f2(z) be two functions that are analytic within a region R . Suppose that the two functions coincide in a set chosen from among (i) a neighborhood of a point z 2 R , or (ii) a segment of a curve lying in R , or (iii) a set of points containing an accumulation point belonging to R . Then, the two functions f1(z) and f2(z) coincide throughout R . Proof Suppose that f1(z) and f2(z) coincide on the above set chosen from the three. Then, since f1(z) - f2(z) = 0 on that set, the set comprises the zeros of f1(z) - f2(z). This implies that the zeros are not isolated in R . Thus, as evidenced from the discussion of Sect. 6.5, we conclude that f1(z) - f2(z)  0 throughout R . That is, f1(z)  f2(z). This completes the proof.

6.6

Analytic Continuation

233

Fig. 6.17 Four convergence circles C1, C2, C3, and C4 for 1/z. These convergence circles are centered at 1, i, - 1, or - i



−1

1



The theorem can be understood as follows. Two different analytic functions cannot coincide in the set chosen from the above three (or more succinctly, a set that contains an accumulation point). In other words, the behavior of an analytic function in R is uniquely determined by the limited information of the subset belonging to R . The above statement is reminiscent of the pre-established harmony and, hence, occasionally talked about mysteriously. Instead, we wish to give a simple example. Example 6.3 In Example 6.2 we described a tangible illustration of the Taylor’s expansion of a function 1/z as follows: 1 = z

ð- 1Þn 1 ð z - aÞ n = n = 0 anþ1 a 1

1 n=0

1-

z n : a

ð6:119Þ

If we choose 1, i, - 1, or - i for a, we can draw four convergence circles as depicted in Fig. 6.17. Let each function described by (6.119) be f1(z), fi(z), f-1(z), or f-i(z). Let R be a region that consists of an outer periphery and its inside of the four circles C1, C2, C3, and C4 (i.e., convergence circles of radius 1). The said region R is colored pale green in Fig. 6.17. There are four petals as shown. We can view Fig. 6.17 as follows. For instance, f1(z) and fi(z) are defined in C1 and its inside and C2 and its inside, respectively, excluding z = 0. We have f1(z) = fi(z) within the petal P1 (overlapped part as shown). Consequently, from the statement of Theorem 6.16, f1(z)  fi(z) throughout the region encircled by C1 and C2 (excluding z = 0). Repeating this procedure, we get

234

6

Theory of Analytic Functions

f 1 ðzÞ  f i ðzÞ  f - 1 ðzÞ  f - i ðzÞ: Thus, we obtain the same functional entity throughout R except for the origin (z = 0). Further continuing similar procedures, we finally reach the entire complex plane except for the origin, i.e., ℂ - {0} regarding the domain of analyticity of 1/z. We further compare the above results with those for analytic function defined in the real domain. The tailor’s expansion of f(x) = 1/x (x: real with x ≠ 0) around a (a ≠ 0) reads as f ðxÞ =

f ðnÞ ðaÞ 1 ð x - aÞ n þ ⋯ = f ðaÞ þ f 0 ðaÞðx - aÞ þ ⋯ þ n! x ð- 1Þn ðx - aÞn : n = 0 anþ1 1

=

ð6:142Þ

This is exactly the same form as that of (6.119) aside from notation of the argument. The aforementioned procedure that determines the behavior of an analytic function outside the region where that function has been originally defined is called analytic continuation or analytic prolongation. Comparing (6.119) and (6.142), we can see that (6.119) is a consequence of the analytic continuation of 1/x from the real domain to the complex domain.

6.7

Calculus of Residues

The theorem of residues and its application to calculus of various integrals are one of central themes of the theory of analytic functions. The Cauchy’s integral theorem (Theorem 6.10) tells us that if f(z) is analytic in a simply connected region R , we have f ðzÞdz = 0,

ð6:84Þ

C

where C is a closed curve within R . As we have already seen, this is not necessarily the case if f(z) has singular points in R . Meanwhile, if we look at (6.130), we are aware of a simple but important fact. That is, replacing n with 1, we have A-1 =

1 2πi

f ðζ Þdζ or C

f ðζ Þdζ = 2πiA - 1 :

ð6:143Þ

C

This implies that if we could obtain the Laurent’s series of f(z) with respect to a singularity located at z = a and encircled by C, we can immediately estimate Cf(ζ) dζ of (6.143) from the coefficient of the term (z - a)-1, i.e., A-1. This fact further

6.7

Calculus of Residues

235

implies that even though f(z) has singularities, Cf(ζ)dζ may happen to vanish. Thus, whether (6.84) vanishes depends on the nature of f(z) and its singularities. To examine it, we define a residue of f(z) at z = a (that is encircled by C) as Res f ðaÞ 

1 2πi

f ðzÞdz or C

f ðzÞdz = 2πi Res f ðaÞ,

ð6:144Þ

C

where Res f(a) denotes the residue of f(z) at z = a. From (6.143) and (6.144), we have A - 1 = Res f ðaÞ:

ð6:145Þ

In a formal sense, the point a does not have to be a singular point. If it is a regular point of f(a), we have Res f(a) = 0 trivially. If there is more than one singularity at z = aj, we have Res f aj :

f ðzÞdz = 2πi C

ð6:146Þ

j

Notice that we assume the isolated singularities with (6.144) and (6.146). These equations are associated with Cases (1) and (2) dealt with in Sect. 6.5. Within this framework, we wish to evaluate the residue of f(z) at a pole of order n located at z = a. In Sect. 6.5, we had f(z) described by f ðzÞ =

gð zÞ , ð z - aÞ n

ð6:140Þ

where g(z) is analytic and non-vanishing at z = a. Inserting (6.140) into (6.144), we have Res f ðaÞ =

1 2πi

=

gð ζ Þ d n - 1 gð z Þ 1 dζ = n ðn - 1Þ! dzn - 1 C ð ζ - aÞ

d n - 1 ½ f ð z Þ ð z - aÞ n  1 ðn - 1Þ! dzn - 1

z=a

, z=a

ð6:147Þ

where with the second equality we used (6.106). In the special but important case where f(z) has a simple pole at z = a, from (6.147) we get Res f ðaÞ = f ðzÞðz - aÞjz = a : Setting n = 1 in (6.147) in combination with (6.140), we obtain

ð6:148Þ

$$\operatorname{Res} f(a) = \frac{1}{2\pi i}\oint_C f(z)\,dz = \frac{1}{2\pi i}\oint_C \frac{g(z)}{z - a}\,dz = f(z)(z - a)\big|_{z=a} = g(z)\big|_{z=a}.$$

This is nothing but Cauchy's integral formula (6.88) applied to g(z).

Summarizing the above discussion, we can calculate the residue of f(z) at a pole of order n located at z = a by one of the following alternatives: (i) using (6.147), or (ii) picking up the coefficient A₋₁ in the Laurent's series described by (6.129). In Sect. 6.3 we mentioned that the closed contour integral (6.86) or (6.87) depends on the nature of the singularity. Here we have reached a simple criterion for evaluating that integral. In fact, suppose that we have a Laurent's expansion such that

$$f(z) = \sum_{n=-\infty}^{\infty} A_n(z - a)^n. \qquad (6.129)$$

Then, let us calculate the contour integral of f(z) on a closed circle Γ of radius ρ centered at z = a. We have

$$I = \oint_{\Gamma} f(\zeta)\,d\zeta = \sum_{n=-\infty}^{\infty} A_n \oint_{\Gamma} (\zeta - a)^n\,d\zeta = \sum_{n=-\infty}^{\infty} iA_n \int_0^{2\pi} \rho^{n+1} e^{i(n+1)\theta}\,d\theta. \qquad (6.149)$$

The above integral vanishes except for n = −1. With n = −1 we get

$$I = 2\pi i A_{-1}.$$

Thus, we recover (6.145) in combination with (6.144). To obtain the Laurent's series of f(z), however, is not necessarily easy. In that case, we estimate a residue by (6.147). Tangible examples can be seen in the next section.
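Both routes to a residue can be compared numerically. The sketch below is our own illustration (function, contour radius, and step count are arbitrary choices): it evaluates Res f(0) of f(z) = e^z/z³ by the defining contour integral (6.144), while formula (6.147) with n = 3 gives (1/2!)·d²(e^z)/dz²|₀ = 1/2 for comparison.

```python
import cmath

def residue(f, a, radius=1.0, m=4000):
    # (6.144): Res f(a) = (1/(2*pi*i)) times the closed integral of f
    # over a small circle around a (midpoint Riemann sum).
    total = 0j
    for k in range(m):
        t0 = 2 * cmath.pi * k / m
        t1 = 2 * cmath.pi * (k + 1) / m
        zm = a + radius * cmath.exp(1j * (t0 + t1) / 2)
        dz = radius * (cmath.exp(1j * t1) - cmath.exp(1j * t0))
        total += f(zm) * dz
    return total / (2j * cmath.pi)

f = lambda z: cmath.exp(z) / z ** 3  # pole of order 3 at z = 0
print(residue(f, 0.0))  # close to 1/2, as (6.147) predicts
```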

6.8 Examples of Real Definite Integrals

The calculus of residues is of great practical importance. For instance, it is directly associated with the calculation of definite integrals. In particular, even when a real integration is hard to perform directly, it is often easy to evaluate using complex integration, especially the calculus of residues. Here we study several examples.
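The first integral treated below, ∫dx/(1 + x²), can be previewed numerically: a brute-force quadrature over a long interval (a rough sketch of ours; the truncation limit and step count are arbitrary choices) already approaches the residue-theorem value π obtained in Example 6.4.

```python
import math

def integral_1_over_1px2(limit=10000.0, n=400000):
    # Midpoint-rule quadrature of 1/(1 + x^2) over [-limit, limit];
    # the neglected tails contribute about 2/limit.
    h = 2 * limit / n
    total = 0.0
    for k in range(n):
        x = -limit + (k + 0.5) * h
        total += h / (1 + x * x)
    return total

print(integral_1_over_1px2(), math.pi)
```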


Fig. 6.18 Contour for the integration of 1/(1 + z²) that appears in Example 6.4. One may equally choose the upper semicircle (denoted by Γ_R) or the lower semicircle (denoted by Γ̃_R) for contour integration

Example 6.4 Let us consider the following real definite integral:

I = ∫_{−∞}^{∞} dx/(1 + x²).   (6.150)

To this end, we convert the real variable x to the complex variable z and evaluate a contour integral I_C (Fig. 6.18) described by

I_C = ∫_{−R}^{R} dz/(1 + z²) + ∫_{Γ_R} dz/(1 + z²),   (6.151)

where R is a sufficiently large positive real number (R ≫ 1); Γ_R denotes the upper semicircle; I_C stands for the contour integral along the closed curve C that comprises the interval [−R, R] and the upper semicircle Γ_R. Simple poles are located at z = ±i as shown. There is only one simple pole at z = i within the upper semicircle of Fig. 6.18. From (6.146), therefore, we have

I_C = ∮_C f(z) dz = 2πi Res f(i),   (6.152)

where f(z) ≡ 1/(1 + z²). Using (6.148), the residue of f(z) at z = i can readily be estimated to be

Res f(i) = f(z)(z − i)|_{z=i} = 1/(2i).   (6.153)

To estimate the integral (6.150), we must calculate the second term of (6.151). To this end, we change the variable z such that

z = Re^{iθ},   (6.154)

where θ is an argument. Then, using the Darboux inequality (6.95), we get

|∫_{Γ_R} dz/(1 + z²)| = |∫₀^π iRe^{iθ} dθ/(R²e^{2iθ} + 1)| ≤ ∫₀^π |iRe^{iθ}/(R²e^{2iθ} + 1)| dθ
 = R ∫₀^π dθ/√[(R²e^{2iθ} + 1)(R²e^{−2iθ} + 1)] = R ∫₀^π dθ/√(R⁴ + 2R² cos 2θ + 1)
 ≤ R·π/[R²(1 − 1/R²)] = π/[R(1 − 1/R²)].

Taking R → ∞, we have

∫_{Γ_R} dz/(1 + z²) → 0.   (6.155)

Consequently, if R → ∞, we have

lim_{R→∞} I_C = ∫_{−∞}^{∞} dx/(1 + x²) + lim_{R→∞} ∫_{Γ_R} dz/(1 + z²) = ∫_{−∞}^{∞} dx/(1 + x²) = I = 2πi Res f(i) = π,   (6.156)

where with the first equality we placed the variable back to x; with the last equality we used (6.152) and (6.153). Equation (6.156) gives the answer to (6.150). Notice in general that if a meromorphic function is given as a quotient of two polynomials, an integral of the type of (6.155) tends to zero as R → ∞ whenever the degree of the denominator of that function is at least two units higher than the degree of the numerator [8].

We may equally choose another contour C̃ (the closed curve comprising the interval [−R, R] and the lower semicircle Γ̃_R) for the integration (Fig. 6.18). In that case, we have

lim_{R→∞} I_{C̃} = ∫_{∞}^{−∞} dx/(1 + x²) + lim_{R→∞} ∫_{Γ̃_R} dz/(1 + z²) = ∫_{∞}^{−∞} dx/(1 + x²) = 2πi Res f(−i) = −π.   (6.157)

Note that in the above equation the contour integral is taken in the counterclockwise direction and, hence, that the real definite integral has been taken from ∞ to −∞. Notice also that

Res f(−i) = f(z)(z + i)|_{z=−i} = −1/(2i),

because f(z) has a simple pole at z = −i in the lower semicircle (Fig. 6.18), for which the residue has been taken. From (6.157), we get the same result as before such that

∫_{−∞}^{∞} dx/(1 + x²) = π.

Alternatively, we can use the method of partial fraction decomposition. In that case, as the integrand we have

1/(1 + z²) = (1/2i)[1/(z − i) − 1/(z + i)].   (6.158)
ð6:158Þ

The residue of 1/(1 + z²) at z = i is 1/(2i) [i.e., the coefficient of 1/(z − i) in (6.158)], giving the same result as (6.153).

Example 6.5 Evaluate the following real definite integral:

I = ∫_{−∞}^{∞} dx/(1 + x²)³.   (6.159)

As in the case of Example 6.4, we evaluate a contour integral described by

I_C = ∮_C dz/(1 + z²)³ = ∫_{−R}^{R} dz/(1 + z²)³ + ∫_{Γ_R} dz/(1 + z²)³,   (6.160)

where C stands for the closed curve comprising the interval [−R, R] and the upper semicircle Γ_R (see Fig. 6.18). The function f(z) ≡ 1/(1 + z²)³ has two isolated poles at z = ±i. In the present case, however, the poles are of order 3. Hence, we use (6.147) to estimate the residue. The inside of the upper semicircle contains the pole of order 3 at z = i. Therefore, we have

Res f(i) = (1/2πi) ∮_C f(z) dz = (1/2πi) ∮_C g(z)/(z − i)³ dz = (1/2!) d²g(z)/dz²|_{z=i},   (6.161)

where

g(z) ≡ 1/(z + i)³ and f(z) = g(z)/(z − i)³.

Hence, we get

(1/2!) d²g(z)/dz²|_{z=i} = (1/2)(−3)(−4)(z + i)⁻⁵|_{z=i} = (1/2)(−3)(−4)(2i)⁻⁵ = 3/(16i) = Res f(i).   (6.162)

For a reason similar to that of Example 6.4, the second term of (6.160) vanishes when R → ∞. Then, we obtain

lim_{R→∞} I_C = ∫_{−∞}^{∞} dz/(1 + z²)³ + lim_{R→∞} ∫_{Γ_R} dz/(1 + z²)³ = ∮_C g(z)/(z − i)³ dz = 2πi Res f(i) = 2πi × 3/(16i) = 3π/8.

That is,

∫_{−∞}^{∞} dz/(1 + z²)³ = ∫_{−∞}^{∞} dx/(1 + x²)³ = I = 3π/8.

If we are able to perform the Laurent's expansion of f(z), we can immediately evaluate the residue using (6.145). To this end, let us utilize the binomial expansion formula (or generalized binomial theorem). We have

f(z) = 1/(1 + z²)³ = 1/[(z + i)³(z − i)³].

Changing the variable as z − i = ζ, we have

f(z) = f̃(ζ) = 1/[ζ³(ζ + 2i)³] = [1/(2i)³](1/ζ³)(1 + ζ/2i)⁻³
 = [1/(2i)³](1/ζ³)[1 + (−3)(ζ/2i) + ((−3)(−4)/2!)(ζ/2i)² + ((−3)(−4)(−5)/3!)(ζ/2i)³ + ⋯]
 = −(1/8i)[1/ζ³ − 3/(2iζ²) − 3/(2ζ) + 5/(4i) + ⋯],   (6.163)

where ⋯ of the above equation denotes a power series of ζ. Accompanied by the variable transformation from z to ζ, the functional form f has been changed to f̃. Getting back the original functional form, we have

f(z) = −(1/8i)[1/(z − i)³ − 3/(2i(z − i)²) − 3/(2(z − i)) + 5/(4i) + ⋯].   (6.164)

Equation (6.164) is a Laurent's expansion of f(z). From (6.164) we see that the coefficient of 1/(z − i) of f(z) is 3/(16i), in agreement with (6.145) and (6.162). To obtain the answer of (6.159), once again we may equally choose the contour C̃ (Fig. 6.18) and get the same result as in the case of Example 6.4. In that case, f(z) can be expanded into a Laurent's series around z = −i. Readers are encouraged to check it.

We make a few remarks about the (generalized) binomial expansion formula. We described it in (3.181) of Sect. 3.6.1 such that

(1 + x)^{−λ} = Σ_{m=0}^{∞} \binom{−λ}{m} xᵐ,   (3.181)

where λ is an arbitrary real number. In (3.181), we assumed x is a real number with |x| < 1. However, (6.163) suggests that (3.181) holds with x that can be a complex number with |x| < 1. Moreover, λ in (3.181) is allowed to be any complex number. Then, on those conditions (1 + x)^{−λ} is analytic because 1 + x ≠ 0, and so (1 + x)^{−λ} must have a convergent Taylor's series. By the same token, the factor (1 + ζ/2i)⁻³ of (6.163) yields a convergent Taylor's series around ζ = 0. In virtue of the factor 1/ζ³, (6.163) gives a convergent Laurent's series around ζ = 0.

Returning to the present issue, we rewrite (3.181) as a Taylor's series such that

(1 + z)^{−λ} = Σ_{m=0}^{∞} \binom{−λ}{m} zᵐ,   (6.165)

where λ is any complex number with z also being a complex number of |z| < 1. We may view (6.165) as a consequence of the analytic continuation, and (6.163) has been dealt with as such indeed.
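The residues and integrals of Examples 6.4 and 6.5 are easy to confirm with a computer algebra system. The sketch below assumes sympy is available:

```python
import sympy as sp

z, x = sp.symbols('z x')

# Example 6.4: simple pole of 1/(1 + z^2) at z = i; (6.153) gives 1/(2i) = -i/2
assert sp.simplify(sp.residue(1/(1 + z**2), z, sp.I) + sp.I/2) == 0
assert sp.integrate(1/(1 + x**2), (x, -sp.oo, sp.oo)) == sp.pi        # (6.156)

# Example 6.5: pole of order 3 at z = i; (6.162) and the Laurent
# coefficient of 1/(z - i) in (6.164) both give 3/(16i) = -3i/16
assert sp.simplify(sp.residue(1/(1 + z**2)**3, z, sp.I) + 3*sp.I/16) == 0
assert sp.simplify(sp.integrate(1/(1 + x**2)**3, (x, -sp.oo, sp.oo))
                   - 3*sp.pi/8) == 0
```

Here `sp.residue` extracts the Laurent coefficient A₋₁ directly, so the second pair of assertions checks alternative (ii) of the previous section against the derivative formula's result (6.162).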

Fig. 6.19 Graphical form of f(x) ≡ 1/(1 + x³). The modulus of f(x) tends to infinity at x = −1. Inflection points are present at x = 0 and x = (1/2)^{1/3} (≈ 0.7937)

Example 6.6 Evaluate the following real definite integral:

I = ∫_{−∞}^{∞} dx/(1 + x³).   (6.166)

We put

f(x) ≡ 1/(1 + x³).   (6.167)

Figure 6.19 depicts a graphical form of f(x). The modulus of f(x) tends to infinity at x = −1. Inflection points are present at x = 0 and x = (1/2)^{1/3} (≈ 0.7937). Now, in the former two examples, the isolated singular points (i.e., poles) existed only in the inside of a closed contour. In the present case, however, a pole is located on the real axis. Bearing in mind this situation, we wish to estimate the integral (6.166). Rewriting (6.166), we have

I = ∫_{−∞}^{∞} dx/[(1 + x)(x² − x + 1)] or I = ∫_{−∞}^{∞} dz/[(1 + z)(z² − z + 1)].

Fig. 6.20 Contour for the integration of 1/(1 + z³) that appears in Example 6.6

The polynomial of the denominator, i.e., (1 + z)(z² − z + 1), has a real root at z = −1 and complex roots at z = e^{±iπ/3} = (1 ± √3 i)/2. Therefore,

f(z) ≡ 1/[(1 + z)(z² − z + 1)]

has three simple poles at z = −1 along with z = e^{±iπ/3}. This time, we have a contour C depicted in Fig. 6.20, where the contour integration I_C is performed in such a way that

I_C = ∫_{−R}^{−1−r} dz/[(1 + z)(z² − z + 1)] + ∫_{Γ₋₁} dz/[(1 + z)(z² − z + 1)] + ∫_{−1+r}^{R} dz/[(1 + z)(z² − z + 1)] + ∫_{Γ_R} dz/[(1 + z)(z² − z + 1)],   (6.168)

where Γ₋₁ stands for a small semicircle of radius r around z = −1 as shown. Of the complex roots z = e^{±iπ/3}, only z = e^{iπ/3} is responsible for the contour integration (see Fig. 6.20). Here we define a principal value P of the integral such that

P ∫_{−∞}^{∞} dx/[(1 + x)(x² − x + 1)] ≡ lim_{r→0} { ∫_{−∞}^{−1−r} dx/[(1 + x)(x² − x + 1)] + ∫_{−1+r}^{∞} dx/[(1 + x)(x² − x + 1)] },   (6.169)

where r is an arbitrary real positive number. As shown in this example, the principal value is a convenient device to traverse the poles on the contour and gives a correct answer in the contour integral. At the same time, letting R go to infinity, we obtain

R→1

þ

1 -1

dx þ ð 1 þ x Þ ð x2 - x þ 1 Þ

Γ-1

dz ð1 þ zÞðz2 - z þ 1Þ

dz : 2 - z þ 1Þ ð 1 þ z Þ z ð Γ1

ð6:170Þ

Meanwhile, the contour integral I_C is given by (6.146) such that

lim_{R→∞} I_C = ∮_C f(z) dz = 2πi Σ_j Res f(a_j),   (6.146)

where in the present case f(z) ≡ 1/[(1 + z)(z² − z + 1)] and we have only one simple pole at a_j = e^{iπ/3} within the contour C. We are going to get the answer by combining (6.170) and (6.146) such that

I ≡ P ∫_{−∞}^{∞} dx/[(1 + x)(x² − x + 1)] = 2πi Res f(e^{iπ/3}) − ∫_{Γ₋₁} dz/[(1 + z)(z² − z + 1)] − ∫_{Γ_∞} dz/[(1 + z)(z² − z + 1)].   (6.171)

The third term of RHS of (6.171) tends to zero as in the case of Examples 6.4 and 6.5. Therefore, if we can estimate the second term as well as the residues properly, the integral (6.166) can be adequately determined. Note that in this case the integral is meant to be the principal value. To estimate the integral of the second term, we make use of the fact that, defining g(z) as

g(z) ≡ 1/(z² − z + 1),   (6.172)

g(z) is analytic at and around z = −1. Hence, g(z) can be expanded in the Taylor's series around z = −1 such that

g(z) = A₀ + Σ_{n=1}^{∞} A_n (z + 1)ⁿ,   (6.173)

where A₀ ≠ 0. In fact,

A₀ = g(−1) = 1/3.   (6.174)

Then, we have

∫_{Γ₋₁} dz/[(1 + z)(z² − z + 1)] = ∫_{Γ₋₁} g(z)/(1 + z) dz = A₀ ∫_{Γ₋₁} dz/(1 + z) + Σ_{n=1}^{∞} A_n ∫_{Γ₋₁} (z + 1)ⁿ⁻¹ dz.   (6.175)

Changing the variable z to the polar form in RHS such that

z = −1 + re^{iθ},   (6.176)

with the first term of RHS we have

A₀ ∫_{Γ₋₁} ire^{iθ} dθ/(re^{iθ}) = A₀ ∫_{Γ₋₁} i dθ = A₀ ∫_π^0 i dθ = −A₀ ∫₀^π i dθ = −A₀iπ.   (6.177)

Note that in (6.177) the contour integration along Γ₋₁ was performed in the decreasing direction of the argument θ, from π to 0. Thus, (6.175) is rewritten as

∫_{Γ₋₁} dz/[(1 + z)(z² − z + 1)] = −A₀iπ + Σ_{n=1}^{∞} A_n i rⁿ ∫_π^0 e^{inθ} dθ,   (6.178)

where with the last equality we exchanged the order of the summation and integration. The second term of RHS of (6.178) vanishes when r → 0. Thus, (6.171) is further rewritten as

I = lim_{r→0} I = 2πi Res f(e^{iπ/3}) + A₀iπ.   (6.179)

We have only one simple pole for f(z) at z = e^{iπ/3} within the semicircle contour C. Then, using (6.148), we have

Res f(e^{iπ/3}) = f(z)(z − e^{iπ/3})|_{z=e^{iπ/3}} = 1/[(1 + e^{iπ/3})(e^{iπ/3} − e^{−iπ/3})] = −(1 + √3 i)/6.   (6.180)

The above calculus is straightforward, but (6.37) can be conveniently used. Thus, finally we get

I = −(πi/3)(1 + √3 i) + πi/3 = (√3/3)π = π/√3.   (6.181)

That is,

I ≈ 1.8138.   (6.182)
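The chain (6.179)–(6.182) can be cross-checked symbolically. The sketch below (assuming sympy is available) computes the residue at the simple pole as 1/q′(z₀) for q(z) = 1 + z³, a standard shortcut equivalent to (6.148) when f = 1/q:

```python
import sympy as sp

z = sp.symbols('z')
z0 = sp.exp(sp.I*sp.pi/3)                 # the simple pole inside the contour C
# For f = 1/q with a simple zero of q at z0, Res f(z0) = 1/q'(z0),
# equivalent to the prescription (6.148).
res = 1/sp.diff(1 + z**3, z).subs(z, z0)
assert abs(complex(res) - complex(-(1 + sp.sqrt(3)*sp.I)/6)) < 1e-12   # (6.180)

A0 = sp.Rational(1, 3)                    # A0 = g(-1), see (6.174)
I_val = 2*sp.pi*sp.I*res + A0*sp.I*sp.pi  # (6.179)
assert abs(complex(I_val) - float(sp.pi/sp.sqrt(3))) < 1e-12           # (6.181)
```

The small-semicircle contribution A₀iπ is exactly what distinguishes a pole sitting on the contour from one enclosed by it: it contributes iπ times the residue-like coefficient A₀ rather than 2πi.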

To avoid any ambiguity of the principal value, we properly define it as [8]

P ∫_{−∞}^{∞} f(x) dx ≡ lim_{R→∞} ∫_{−R}^{R} f(x) dx.

The integral can also be calculated using a lower semicircle including the simple pole located at z = e^{−iπ/3} = (1 − √3 i)/2. That gives the same answer as (6.181); the calculations are left for readers as an exercise.

Example 6.7 [9, 10] It will be interesting to compare the result of Example 6.6 with that of the following real definite integral:

I = ∫₀^∞ dx/(1 + x³).   (6.183)

To evaluate (6.183), it is convenient to use a sector of circle for the contour (see Fig. 6.21a). In this case, the contour integral I_C estimated along the sector is described by

I_C = ∫₀^R dz/(1 + z³) + ∫_{Γ_R} dz/(1 + z³) + ∫_L dz/(1 + z³).   (6.184)

In the third integral of (6.184), we take arg z = θ (constant) so that with z = re^{iθ} we may have dz = dr e^{iθ}. Then, that integral is given by

Fig. 6.21 Sector of circle for contour integration of 1/(1 + z³) that appears in Example 6.7. (a) The integration range is [0, ∞). (b) The integration range is (−∞, 0]

∫_L dz/(1 + z³) = ∫_L e^{iθ} dr/(1 + r³e^{3iθ}).

Setting 3iθ = 2iπ, namely θ = 2π/3, we get

∫_L dz/(1 + z³) = e^{2πi/3} ∫_R^0 dr/(1 + r³) = −e^{2πi/3} ∫₀^R dr/(1 + r³).

Thus, taking R → ∞ in (6.184) we have

lim_{R→∞} I_C = ∫₀^∞ dx/(1 + x³) + ∫_{Γ_∞} dz/(1 + z³) − e^{2πi/3} ∫₀^∞ dr/(1 + r³).   (6.185)

The second term of (6.185) vanishes as before. Changing the variable r → x, we obtain

[RHS of (6.185)] = (1 − e^{2πi/3}) ∫₀^∞ dx/(1 + x³) = (1 − e^{2πi/3}) I.   (6.186)

Meanwhile, considering that f(z) has a simple pole at z = e^{iπ/3} within the contour C, i.e., inside the sector (see Fig. 6.21a), we have

[LHS of (6.185)] = 2πi Res f(e^{iπ/3}),   (6.187)

where f(x) was given in (6.167) of Example 6.6. Then, equating (6.186) with (6.187) we have

I = 2πi Res f(e^{iπ/3})/(1 − e^{2πi/3}) = 2πi/[(1 + e^{iπ/3})(e^{iπ/3} − e^{−iπ/3})(1 − e^{2πi/3})].

To calculate Res f(e^{iπ/3}), once again we used (6.148) at z = e^{iπ/3}; see (6.180). Noting that (1 + e^{iπ/3})(1 − e^{2πi/3}) = 3 and e^{iπ/3} − e^{−iπ/3} = 2i sin(π/3) = √3 i, we get

I = 2πi/(3√3 i) = 2π/(3√3).   (6.188)

That is, I ≈ 1.2092. This number is two-thirds of (6.182). We further extend the above calculation method to the evaluation of real definite integrals. For instance, we have the following integral described by

Ĩ = P ∫_{−∞}^0 dx/(1 + x³).   (6.189)

To evaluate this integral, we define the contour integral I C0 as shown in Fig. 6.21b, which is expressed as I C0 = P

0

dz þ 1 þ z3 -R

dz þ 2 - z þ 1Þ ð 1 þ z Þ z ð Γ-1

L

0

dz þ 1 þ z3

dz : ð6:190Þ 1 þ z3 ΓR

Notice that combining I_{C′} of (6.190) with I_C of (6.184) makes the contour integration of (6.168). Proceeding as before and noting that there is no singularity inside the contour C′, we have

lim_{R→∞} I_{C′} = 0 = P ∫_{−∞}^0 dz/(1 + z³) − πi/3 + e^{2πi/3} ∫₀^∞ dr/(1 + r³) = P ∫_{−∞}^0 dz/(1 + z³) − π/(3√3),

where we used (6.188) and (6.178) in combination with (6.174). Thus, we get

Ĩ = ∫_{−∞}^0 dx/(1 + x³) = π/(3√3).   (6.191)

As a matter of course, the result of (6.191) can immediately be obtained by subtracting (6.188) from (6.181).

We frequently encounter real definite integrals including trigonometric functions. In such cases, to make a valid estimation of the integrals it is desirable to use complex exponential functions. Jordan's lemma is a powerful tool for this; the following proof is based on the literature [5].

Lemma 6.6: Jordan's Lemma [5] Let Γ_R be a semicircle of radius R centered at the origin in the upper half of the complex plane. Let f(z) be an analytic function that tends uniformly to zero as |z| → ∞ when arg z lies in the interval 0 ≤ arg z ≤ π. Then, with a non-negative real number α, we have

lim_{R→∞} ∫_{Γ_R} e^{iαζ} f(ζ) dζ = 0.

Proof Using polar coordinates, ζ is expressed as

ζ = Re^{iθ} = R(cos θ + i sin θ).

Then, the integral is denoted by

I_R ≡ ∫_{Γ_R} e^{iαζ} f(ζ) dζ = iR ∫₀^π f(Re^{iθ}) e^{iαR(cos θ + i sin θ)} e^{iθ} dθ = iR ∫₀^π f(Re^{iθ}) e^{iαR cos θ − αR sin θ + iθ} dθ.

By assumption f(ζ) tends uniformly to zero as |ζ| → ∞, and so we have

|f(Re^{iθ})| < ε(R),

where ε(R) is a certain positive number that depends only on R and tends to zero as R → ∞. Therefore, we have

|I_R| < ε(R) R ∫₀^π e^{−αR sin θ} dθ = 2ε(R) R ∫₀^{π/2} e^{−αR sin θ} dθ,

where the last equality results from the symmetry of sin θ about π/2. Meanwhile, we have

sin θ ≥ 2θ/π when 0 ≤ θ ≤ π/2.

Hence, we have

|I_R| < 2ε(R) R ∫₀^{π/2} e^{−2αRθ/π} dθ = [πε(R)/α](1 − e^{−αR}).

Therefore, we get

lim_{R→∞} I_R = 0.

This completes the proof.

Alternatively, for f(z) that is analytic and tends uniformly to zero as |z| → ∞ when arg z lies in the interval π ≤ arg z ≤ 2π, we similarly have

lim_{R→∞} ∫_{Γ_R} e^{−iαζ} f(ζ) dζ = 0,

where α is a certain non-negative real number. In this case, we assume that the semicircle Γ_R of radius R centered at the origin lies in the lower half of the complex plane. Using the Jordan's lemma, we have several examples.

Example 6.8 Evaluate the following real definite integral:

I = ∫_{−∞}^{∞} (sin x/x) dx.   (6.192)

Rewriting (6.192) as

I = ∫_{−∞}^{∞} (sin z/z) dz = (1/2i) ∫_{−∞}^{∞} (e^{iz} − e^{−iz})/z dz,   (6.193)

we consider the following contour integral:

I_C = P ∫_{−R}^{R} (e^{iz}/z) dz + ∫_{Γ₀} (e^{iz}/z) dz + ∫_{Γ_R} (e^{iz}/z) dz,   (6.194)

where Γ₀ represents an infinitesimally small semicircle around the origin and Γ_R shows the outer semicircle of radius R centered at the origin (see Fig. 6.22). The contour C consists of the real axis, Γ₀, and Γ_R as shown. The third term of (6.194) vanishes when R → ∞ due to the Jordan's lemma. With the second term, e^{iz} is analytic in the whole complex plane (i.e., an entire function) and, hence, can be expanded in the Taylor's series around any complex number. Expanding it around the origin we have

e^{iz} = Σ_{n=0}^{∞} (iz)ⁿ/n! = 1 + Σ_{n=1}^{∞} (iz)ⁿ/n!.   (6.195)

As already mentioned in (6.173) through (6.178), among the terms of RHS in (6.195) only the first term, i.e., 1, contributes to the integral of the second term of RHS in (6.194). Thus, as in (6.178) we get

Fig. 6.22 Contour for the integration of e^{iz}/z that appears in Example 6.8. The same contour will appear in Example 6.9 as well

∫_{Γ₀} (e^{iz}/z) dz = −1 × iπ = −iπ.   (6.196)

Notice that in (6.196) the argument is decreasing from π → 0 during the contour integration. Within the contour C, there is no singular point, and so I_C = 0 due to Theorem 6.10 (Cauchy's integral theorem). Thus, from (6.194) we obtain

P ∫_{−∞}^{∞} (e^{iz}/z) dz = −∫_{Γ₀} (e^{iz}/z) dz = iπ.   (6.197)

Exchanging the variable z → −z in (6.197), we have

P ∫_{−∞}^{∞} e^{−iz}/(−z) dz = iπ.   (6.198)

Summing both sides of (6.197) and (6.198), we get

P ∫_{−∞}^{∞} (e^{iz}/z) dz + P ∫_{−∞}^{∞} e^{−iz}/(−z) dz = 2iπ.   (6.199)

Rewriting (6.199), we have

P ∫_{−∞}^{∞} (e^{iz} − e^{−iz})/z dz = 2i P ∫_{−∞}^{∞} (sin z/z) dz = 2iπ.   (6.200)

That is, the answer is

P ∫_{−∞}^{∞} (sin z/z) dz = P ∫_{−∞}^{∞} (sin x/x) dx = I = π.   (6.201)

Notice that, as mentioned in (6.137), z = 0 is a removable singularity of (sin z)/z, and so the symbol P is superfluous in (6.201).

Example 6.9 Evaluate the following real definite integral:

I = ∫_{−∞}^{∞} (sin²x/x²) dx.   (6.202)

From the trigonometric theorem, we have

sin²x = (1 − cos 2x)/2.   (6.203)

Then, the integrand of (6.202) with the variable changed is rewritten using (6.37) as

sin²z/z² = (1/4)[(1 − e^{2iz})/z² + (1 − e^{−2iz})/z²].   (6.204)

Expanding e^{2iz} as before, we have

e^{2iz} = Σ_{n=0}^{∞} (2iz)ⁿ/n! = 1 + 2iz + Σ_{n=2}^{∞} (2iz)ⁿ/n!.   (6.205)

Then, we get

1 − e^{2iz} = −2iz − Σ_{n=2}^{∞} (2iz)ⁿ/n!,
(1 − e^{2iz})/z² = −2i/z − Σ_{n=2}^{∞} (2i)ⁿ zⁿ⁻²/n!.   (6.206)

As in Example 6.8, we wish to use (6.206) to estimate the following contour integral I_C:

lim_{R→∞} I_C = P ∫_{−∞}^{∞} (1 − e^{2iz})/z² dz + ∫_{Γ₀} (1 − e^{2iz})/z² dz + ∫_{Γ_∞} (1 − e^{2iz})/z² dz,   (6.207)

where the contour for the integration is the same as that of Fig. 6.22. With the second integral of (6.207), as before, only the first term −2i/z of (6.206) contributes to the integral. The result is given as

∫_{Γ₀} (1 − e^{2iz})/z² dz = −2i ∫_{Γ₀} dz/z = −2i ∫_π^0 i dθ = −2π.   (6.208)

With the third integral of (6.207), we have

∫_{Γ_∞} (1 − e^{2iz})/z² dz = ∫_{Γ_∞} dz/z² − ∫_{Γ_∞} e^{2iz}/z² dz.   (6.209)

The first term of RHS of (6.209) vanishes similarly to the integral of (6.155) that appeared in Example 6.4. The second term of RHS vanishes as well because of the Jordan's lemma. Within the contour C of (6.207), again there is no singular point, and so lim_{R→∞} I_C = 0. Hence, from (6.207) we have

P ∫_{−∞}^{∞} (1 − e^{2iz})/z² dz = −∫_{Γ₀} (1 − e^{2iz})/z² dz = 2π.   (6.210)

Exchanging the variable z → −z in (6.210) as before, we have

P ∫_{−∞}^{∞} (1 − e^{−2iz})/(−z)² dz = P ∫_{−∞}^{∞} (1 − e^{−2iz})/z² dz = 2π.   (6.211)

Summing both sides of (6.210) and (6.211) in combination with (6.204), we get the answer described by

∫_{−∞}^{∞} (sin²z/z²) dz = π.   (6.212)

Again, the symbol P is superfluous in (6.210) and (6.211).

Example 6.10 Evaluate the following real definite integral:

I = ∫_{−∞}^{∞} cos x/[(x − a)(x − b)] dx (a ≠ b).   (6.213)

We rewrite the denominator of (6.213) using the method of partial fraction decomposition such that

1/[(x − a)(x − b)] = (1/(a − b))[1/(x − a) − 1/(x − b)].   (6.214)

Then, (6.213) is described by

Fig. 6.23 Contour for the integration of cos z/[(z − a)(z − b)] that appears in Example 6.10. (a) Along the upper contour C. (b) Along the lower contour C′

I = (1/(a − b)) ∫_{−∞}^{∞} [cos z/(z − a) − cos z/(z − b)] dz = (1/2(a − b)) ∫_{−∞}^{∞} [(e^{iz} + e^{−iz})/(z − a) − (e^{iz} + e^{−iz})/(z − b)] dz.   (6.215)

We wish to consider the integration by dividing the integrand and calculating each term individually. First, we estimate

∫_{−∞}^{∞} e^{iz}/(z − a) dz.   (6.216)

To apply the Jordan's lemma to the integration of (6.216), we use the upper contour C that consists of the real axis, Γ_a, and Γ_R; see Fig. 6.23a. As usual, we consider the following integral:

lim_{R→∞} I_C = P ∫_{−∞}^{∞} e^{iz}/(z − a) dz + ∫_{Γ_a} e^{iz}/(z − a) dz + ∫_{Γ_∞} e^{iz}/(z − a) dz,   (6.217)

where Γ_a represents an infinitesimally small semicircle around z = a. Notice that (6.217) is obtained when we consider R → ∞ in Fig. 6.23a. The third term of (6.217) vanishes due to the Jordan's lemma. With the second term of (6.217), we change the variable z − a → z and rewrite it as

∫_{Γ_a} e^{iz}/(z − a) dz = ∫_{Γ₀} e^{i(z+a)}/z dz = e^{ia} ∫_{Γ₀} e^{iz}/z dz = −e^{ia} iπ,   (6.218)

where with the last equality we used (6.196). There is no singular point within the contour C, and so lim_{R→∞} I_C = 0. Then, we have

P ∫_{−∞}^{∞} e^{iz}/(z − a) dz = −∫_{Γ_a} e^{iz}/(z − a) dz = e^{ia} iπ.   (6.219)

Next, we consider the following integral that appears in (6.215):

∫_{−∞}^{∞} e^{−iz}/(z − a) dz.   (6.220)

To apply the Jordan's lemma to the integration of (6.220), we use the lower contour C′ that consists of the real axis, Γ̃_a, and Γ̃_R (Fig. 6.23b). This time, the integral to be evaluated is given by

lim_{R→∞} I_{C′} = −P ∫_{−∞}^{∞} e^{−iz}/(z − a) dz + ∫_{Γ̃_a} e^{−iz}/(z − a) dz + ∫_{Γ̃_∞} e^{−iz}/(z − a) dz   (6.221)

so that we can trace the contour C′ counterclockwise. Notice that the minus sign is present in front of the principal value. The third term of (6.221) vanishes due to the Jordan's lemma. Evaluating the second term of (6.221) similarly to the above, we get

∫_{Γ̃_a} e^{−iz}/(z − a) dz = ∫_{Γ̃₀} e^{−i(z+a)}/z dz = e^{−ia} ∫_{Γ̃₀} e^{−iz}/z dz = −e^{−ia} iπ.   (6.222)

Hence, we obtain

P ∫_{−∞}^{∞} e^{−iz}/(z − a) dz = ∫_{Γ̃_a} e^{−iz}/(z − a) dz = −e^{−ia} iπ.   (6.223)

Notice that in (6.222) the argument is again decreasing, from 0 → −π, during the contour integration. Summing (6.219) and (6.223), we get

P ∫_{−∞}^{∞} e^{iz}/(z − a) dz + P ∫_{−∞}^{∞} e^{−iz}/(z − a) dz = P ∫_{−∞}^{∞} (e^{iz} + e^{−iz})/(z − a) dz
 = e^{ia} iπ − e^{−ia} iπ = iπ(e^{ia} − e^{−ia}) = iπ · 2i sin a = −2π sin a.   (6.224)

In a similar manner, we also get

P ∫_{−∞}^{∞} (e^{iz} + e^{−iz})/(z − b) dz = −2π sin b.

Thus, the final answer is

I = (1/2(a − b)) P ∫_{−∞}^{∞} [(e^{iz} + e^{−iz})/(z − a) − (e^{iz} + e^{−iz})/(z − b)] dz = 2π(sin b − sin a)/[2(a − b)] = π(sin b − sin a)/(a − b).

Example 6.11 Evaluate the following real definite integral:

I = ∫_{−∞}^{∞} cos x/(1 + x²) dx.   (6.225)

The calculation is left for readers. Hints: (i) use cos x = (e^{ix} + e^{−ix})/2 and Jordan's lemma; (ii) use the contour shown in Fig. 6.18. Besides the examples listed above, a variety of integral calculations can be seen in the literature [9–12].

6.9 Multivalued Functions and Riemann Surfaces

The last topics of the theory of analytic functions are related to multivalued functions. Riemann surfaces play a central role in developing the theory of multivalued functions.

6.9.1 Brief Outline

We give a brief outline of the notions of multivalued functions and Riemann surfaces. These notions are indispensable for investigating the properties of irrational functions and logarithmic functions and for performing related calculations, especially integrations with respect to those functions.


Definition 6.12 Let n be a positive integer and let z be an arbitrary complex number. Suppose z can be written as

z = wⁿ.   (6.226)

Then, with respect to w it is denoted by

w = z^{1/n}   (6.227)

and w is said to be an n-th root of z.

We have expressly shown this simple thing because, if we think of real z, z^{1/n} is uniquely determined only when n is odd, regardless of whether z is positive or negative. If we think of even n, (i) if z is positive, we have the two n-th roots ±z^{1/n}; (ii) if z is negative, however, we have no real n-th root. Meanwhile, if we think of complex z, we have a totally different situation. That is, regardless of whether n is odd or even, we always have n different values of z^{1/n} for any given complex number z. As a tangible example, let us think of the n-th roots of ±1 for n = 2 and 3. These are depicted graphically in Fig. 6.24, from which we can readily understand the statement of the above paragraph. In Fig. 6.24a, if z = −1 for n = 2, we have either w = (−1)^{1/2} = (e^{iπ})^{1/2} = i or w = (−1)^{1/2} = (e^{−iπ})^{1/2} = −i; these are the two square roots. In Fig. 6.24b, if z = −1 for n = 3, we have w = (−1)^{1/3} = −1, w = (−1)^{1/3} = (e^{iπ})^{1/3} = e^{iπ/3}, and w = (−1)^{1/3} = (e^{−iπ})^{1/3} = e^{−iπ/3}; these are the three cube roots. Moreover, we have 1^{1/2} = 1, −1. Also, we have 1^{1/3} = 1, e^{2iπ/3}, e^{−2iπ/3}.

Let us extend the above preliminary discussion to the analytic functions. Suppose we are given the following polar form of z such that

z = r(cos θ + i sin θ)

(6.32)

and

w = ρ(cos φ + i sin φ).

Then, we have

z = wⁿ = ρⁿ(cos φ + i sin φ)ⁿ = ρⁿ(cos nφ + i sin nφ),   (6.228)

where with the last equality we used de Moivre's theorem (6.36). Comparing the real and imaginary parts of (6.32) and (6.228), we have

r = ρⁿ or ρ = r^{1/n}   (6.229)

and

Fig. 6.24 Multiple roots in the complex plane. (a) Square roots of 1 and −1. (b) Cubic roots of 1 and −1

θ = nφ + 2kπ (k = 0, ±1, ±2, ⋯).   (6.230)

In (6.229), we define both r and ρ as being positive so that positive ρ can be uniquely determined for any positive integer n (i.e., either even or odd). Recall once again that if n is even, we have two n-th roots for ρ (i.e., both positive and negative) with given positive r. Of these two roots, we discard the negative ρ. Rewriting (6.230), we have

φ = (1/n)(θ − 2kπ) (k = 0, ±1, ±2, ⋯).

Further rewriting (6.227), we get

w_k(θ) = z^{1/n} = r^{1/n}[cos((θ − 2kπ)/n) + i sin((θ − 2kπ)/n)] (k = 0, 1, 2, ⋯, n − 1),   (6.231)

where w_k(θ) is said to be a branch. Note that we have

w_k(θ − 2nπ) = w_k(θ).   (6.232)

That is, w_k(θ) is a periodic function of period 2nπ. The number of the total branches is n; the index k of w_k(θ) denotes these different branches. In particular, when k = 0, we have

w₀(θ) = z^{1/n} = r^{1/n}[cos(θ/n) + i sin(θ/n)].   (6.233)

Comparing (6.231) and (6.233), we obtain

w_k(θ) = w₀(θ − 2kπ).   (6.234)

From (6.234), we find that w_k(θ) is obtained by shifting (or rotating) w₀(θ) by 2kπ toward the positive direction of θ. The branch w₀(θ) is called the principal branch of the n-th root of z, and the value of w₀(θ) is called the principal value of that branch. The implication of the presence of the branches is as follows. Suppose that the branches besides w₀(θ) were absent. Then, from (6.233) we have

w₀(2π) = r^{1/n}[cos(2π/n) + i sin(2π/n)] ≠ w₀(0) = r^{1/n}.

This causes inconvenience, because θ = 0 and θ = 2π are assumed to be identical in the complex plane. Hence, the single-valuedness would be broken with w₀(θ). However, in virtue of the presence of w₁(θ), we have

w₁(2π) = w₀(2π − 2π) = w₀(0).

This means that the value of w₀(0) is recovered and, hence, the single-valuedness remains intact. After another cycle around the origin, similarly we have

w₂(4π) = w₀(4π − 4π) = w₀(0).

Thus, in succession we get

w_k(2kπ) = w₀(2kπ − 2kπ) = w₀(0).

For k = n, from (6.231) we have

w_n(θ) = z^{1/n} = r^{1/n}[cos((θ − 2nπ)/n) + i sin((θ − 2nπ)/n)] = r^{1/n}[cos(θ/n − 2π) + i sin(θ/n − 2π)] = r^{1/n}[cos(θ/n) + i sin(θ/n)] = w₀(θ).

Thus, we have no more new functions. At the same time, the single-valuedness of w_k(θ) (k = 0, 1, 2, ⋯, n − 1) remains intact during these processes.

To summarize the above discussion, if we have n planes (or sheets) as the complex planes and allocate them to w₀(θ), w₁(θ), ⋯, w_{n−1}(θ) individually, we have a single-valued function w(θ) as a whole throughout these planes. In a word, the following relation represents the situation:

w(θ) = w₀(θ) for Plane 0, w₁(θ) for Plane 1, ⋯, w_{n−1}(θ) for Plane n − 1.

This is the essence of the Riemann surface. The superposition of these n planes is called a Riemann surface and each plane is said to be a Riemann sheet [of the function w(θ)]. Each single-valued function w₀(θ), w₁(θ), ⋯, w_{n−1}(θ) defined on each Riemann sheet is called a branch of w(θ). In the above discussion, the origin is called a branch point of w(z). For simplicity, let us think of the function w(z) described by

w(z) ≡ z^{1/2} = √z.

In this case, for z = re^{iθ} (0 ≤ θ ≤ 2π) we have two different values w₀ and w₁ given by

w₀(θ) = r^{1/2} e^{iθ/2}, w₁(θ) = r^{1/2} e^{i(θ−2π)/2} = w₀(θ − 2π) = −w₀(θ).   (6.235)

Then, we have

w(θ) = w₀(θ) for Plane 0, w₁(θ) for Plane 1.   (6.236)

In this case, w(z) is called a "two-valued" function of z and the functions w₀ and w₁ are said to be branches of w(z). Suppose that z makes a counterclockwise circuit around the origin, starting from, e.g., a real positive number z = z₀ and coming full circle to the original point z = z₀. Then, the argument of z has been increased by 2π. In this situation the arguments of the individual branches w₀ and w₁ are increased by π. Accordingly, w₀ is switched to w₁ and w₁ is switched to w₀. This situation can be understood more clearly from (6.232) and (6.234). That is, putting n = 2 in (6.232), we have

w_k(θ − 4π) = w_k(θ) (k = 0, 1).   (6.237)

Meanwhile, putting k = 1 in (6.234) we get

w₁(θ) = w₀(θ − 2π).   (6.238)

Replacing θ with θ + 2π in (6.238), we have w₁(θ + 2π) = w₀(θ). Also, replacing θ with θ + 4π in (6.237), we have w₁(θ + 4π) = w₁(θ) = w₀(θ − 2π) = w₀(θ + 2π), where with the last equality we replaced θ with θ + 2π in (6.237). Rewriting the above, we get

w₁(θ + 2π) = w₀(θ) and w₀(θ + 2π) = w₁(θ).

That is, adding 2π to θ, w₀ and w₁ are switched with each other.

Let us further think of the two-valued function w(z) = z^{1/2}. Strictly speaking, an analytic function cannot be a two-valued function: if it were, the continuity and differentiability would be lost and, hence, the analyticity would be broken. Then, we must make a suitable device to avoid this. Such a device is called a Riemann surface. Let us make a kit of the Riemann surface following Fig. 6.25. (i) Take a sheet of paper so that it can represent a complex plane and cut it with scissors along the real axis, starting from the origin, so that the cut (or slit) is made toward the real positive direction. This cut is called a branch cut, with the origin being a branch point. Let us call this sheet Plane I. (ii) Take another sheet of paper and call it Plane II. Also make a cut in Plane II in exactly the same way as that for Plane I; see

262

6

Theory of Analytic Functions

Fig. 6.25 Simple kit that helps visualize the Riemann surface ((a) Planes I and II with their slits; (b) Plane I placed on top of Plane II). To make it, follow the next procedures: (a) Take two sheets of paper (Planes I and II) and make slits (indicated by dashed lines) as shown. Next, put Plane I on top of Plane II so that the two slits (i.e., branch cuts) fit in line. (b) Tape together the downside of the cut of Plane I and the foreside of the cut of Plane II. Then, also tape together the downside of the cut of Plane II and the foreside of the cut of Plane I

Fig. 6.25a for the processes (i) and (ii). (iii) Next, put Plane I on top of Plane II so that the two branch cuts fit in line. (iv) Tape together the downside of the cut of Plane I and the foreside of the cut of Plane II (Fig. 6.25b). (v) Then, also tape together the downside of the cut of Plane II and the foreside of the cut of Plane I (see Fig. 6.25b once again). Thus, starting from, e.g., a real positive number z = z0 of Plane I and coming full circle to the original point z = z0, we cross the cut to enter Plane II located underneath Plane I. After another cycle within Plane II, we come back to Plane I again by crossing the cut. After all, we come back to the original Plane I after two cycles on the combined planes. This combined plane is called the Riemann surface. In this way, Planes I and II correspond to the different branches w0 and w1, respectively, so that each branch can be single-valued. In other words, w(z) = z^{1/2} is a single-valued function that is defined on the whole Riemann surface. There are a variety of Riemann surfaces according to the nature of the complex functions. Another example of a multivalued function is expressed as

w(z) = √((z − a)(z − b)),

where two branch points are located at z = a and z = b. As before, we assume that

z − a = ra e^{iθa} and z − b = rb e^{iθb}.

Then, we have

6.9

Multivalued Functions and Riemann Surfaces

263

Fig. 6.26 Graphically decided argument θa, which is the angle between a line connecting z and a and another line drawn in parallel with the real axis

w(θa, θb) = [(z − a)(z − b)]^{1/2} = √(ra rb) e^{i(θa + θb)/2}.  (6.239)

In the above, e.g., the argument θa can be decided graphically as shown in Fig. 6.26, where θa is the angle between a line connecting z and a and another line drawn in parallel with the real axis. We can choose w(θa, θb) of (6.239) for the principal branch and define it as

w0(θa, θb) ≡ √(ra rb) e^{i(θa + θb)/2}.  (6.240)

Also, we define w1(θa, θb) as

w1(θa, θb) ≡ √(ra rb) e^{i(θa + θb − 2π)/2}.  (6.241)

Then, as in (6.235) we have

w1(θa, θb) = w0(θa − 2π, θb) = w0(θa, θb − 2π) = −w0(θa, θb).  (6.242)

From (6.242), we find that adding 2π to θa or θb switches w0 and w1 with each other, as before. We also find that after the variable z has come full circle around one of a and b, w0(θa, θb) changes sign, as in the case of (6.235).
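This sign change can be checked numerically by tracking θa and θb continuously along closed paths (a small sketch; the branch points and the paths are arbitrary choices):

```python
import numpy as np

def w0(z, a, b):
    """Branch w0 of sqrt((z - a)(z - b)) with the arguments theta_a and
    theta_b tracked continuously along the sampled path z."""
    theta_a = np.unwrap(np.angle(z - a))
    theta_b = np.unwrap(np.angle(z - b))
    r = np.sqrt(np.abs(z - a) * np.abs(z - b))
    return r * np.exp(1j * (theta_a + theta_b) / 2)

a, b = -1.0, 1.0
t = np.linspace(0.0, 2.0 * np.pi, 20001)

# Circuit around only z = a: w0 comes back with the opposite sign.
za = a + 0.5 * np.exp(1j * t)
wa = w0(za, a, b)
print(np.allclose(wa[-1], -wa[0]))   # True: sign flip, case (i)

# Circuit around both branch points: w0 returns to its original value.
zab = 3.0 * np.exp(1j * t)
wab = w0(zab, a, b)
print(np.allclose(wab[-1], wab[0]))  # True: no sign change, case (ii)
```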


We also have

w0(θa − 2π, θb − 2π) = w0(θa, θb), w1(θa − 2π, θb − 2π) = w1(θa, θb).  (6.243)

From (6.243), however, after the variable z has come full circle around both a and b, both w0(θa, θb) and w1(θa, θb) keep their original values. This is also the case where the variable z comes full circle without encircling a or b. These behaviors imply that (i) if a contour encircles one of z = a and z = b, w0(θa, θb) and w1(θa, θb) change sign and switch to each other; thus, w0(θa, θb) and w1(θa, θb) form branches. On the other hand, (ii) if a contour encircles both z = a and z = b, w0(θa, θb) and w1(θa, θb) remain intact. (iii) If a contour encircles neither z = a nor z = b, w0(θa, θb) and w1(θa, θb) remain intact as well. These three cases (i), (ii), and (iii) are depicted in Fig. 6.27a, b, and c, respectively. In Fig. 6.27c, after z comes full circle, arg z relative to z = a returns to the original value, keeping θ1 ≤ arg z ≤ θ2. This is similarly the case with arg z relative to z = b. Correspondingly, the branch cut(s) are depicted with doubled broken line(s), e.g., as shown in Fig. 6.28. Note that in the cases (ii) and (iii) the branch cut can be chosen so that the circle (or contour) does not cross the branch cut. Although a bit complicated, the Riemann surface can be shaped accordingly. Other choices of the branch cuts are shown in Sect. 6.9.2 (vide infra). When we consider logarithm functions, we have to deal with a rather complicated situation. In the polar coordinate representation, we have z = re^{iθ}. That is,

ln z = ln r + i arg z = ln|z| + i arg z,  (6.244)

where

arg z = θ + 2πn  (n = 0, ±1, ±2, ⋯).  (6.245)

If n = 0 is chosen, ln z is said to be the principal value. With the logarithm functions, the Riemann surface comprises infinitely many planes, each of which corresponds to an individual branch whose argument is given by (6.245). The point z = 0 is called a logarithmic branch point. Meanwhile, the branch point of, e.g., w(z) = √z is said to be an algebraic branch point.
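The branch structure of ln z can also be seen numerically: one counter-clockwise circuit around z = 0 raises ln z by 2πi, moving it from the n = 0 branch of (6.245) to the n = 1 branch (a small sketch):

```python
import numpy as np

t = np.linspace(0.0, 2.0 * np.pi, 10001)
z = np.exp(1j * t)            # unit circle around the branch point z = 0

# Continuously tracked logarithm along the path: ln|z| + i * unwrapped arg z
log_z = np.log(np.abs(z)) + 1j * np.unwrap(np.angle(z))

# After one full counter-clockwise circuit, ln z has gained 2*pi*i, so no
# finite number of sheets closes the Riemann surface of ln z.
print(np.isclose(log_z[-1] - log_z[0], 2j * np.pi))  # True
```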

6.9.2 Examples of Multivalued Functions

When we deal with a function having a branch point, we must bear in mind that unless the contour crosses the branch cut (either algebraic or logarithmic), the function we are


Fig. 6.27 Geometrical relationship between contour C and two branch points (located at z = a and z = b) for √((z − a)(z − b)). (a) The contour C encircles only z = a. (b) C encircles both z = a and z = b. (c) C encircles neither z = a nor z = b

Fig. 6.28 Branch cut(s) for √((z − a)(z − b)). (a) A branch cut is shown by a line connecting z = a and z = b. (b) Branch cuts are shown by a line connecting z = a and z = ∞ and another line connecting z = b and z = ∞. In both (a) and (b) the branch cut(s) are depicted with doubled broken line(s)


Fig. 6.29 Branch cut (shown with a doubled broken line) and contour for the integration of z^{a−1}/(1 + z)

thinking of is held single-valued and analytic (if that function is originally analytic) and can be differentiated or integrated in a normal way. We give some examples for this.

Example 6.12 [6] Evaluate the following real definite integral:

I = ∫_0^∞ x^{a−1}/(1 + x) dx  (0 < a < 1).  (6.246)

We rewrite (6.246) as

I = ∫_0^∞ z^{a−1}/(1 + z) dz.  (6.247)

We define the integrand of (6.247) as

f(z) ≡ z^{a−1}/(1 + z),

where z^{a−1} can be further rewritten as z^{a−1} = e^{(a−1) ln z}. The function f(z) has a branch point at z = 0 and a simple pole at z = −1. Bearing this situation in mind, we consider a contour for the integration (see Fig. 6.29). In Fig. 6.29 we depict the branch cut with a doubled broken line. Lines PQ and Q′P′ are contour lines located over and under the branch cut, respectively. We assume that


the lines PQ and Q′P′ are infinitesimally close to the real axis. Thus, starting from the point P, the contour integration IC is described by

IC = ∫_PQ z^{a−1}/(1 + z) dz + ∫_ΓR z^{a−1}/(1 + z) dz + ∫_Q′P′ z^{a−1}/(1 + z) dz + ∫_Γ0 z^{a−1}/(1 + z) dz,  (6.248)

where ΓR and Γ0 denote the outer large circle and the inner small circle of radius R (≫1) and r (≪1), respectively. Note that in the contour integration ΓR is traced counter-clockwise but Γ0 is traced clockwise (see Fig. 6.29). Since the simple pole is present at z = −1, the related residue Res f(−1) is

Res f(−1) = z^{a−1}|_{z = −1} = (−1)^{a−1} = (e^{iπ})^{a−1} = −e^{aiπ}.

Then, we have

IC = 2πi Res f(−1) = −2πi e^{aiπ}.  (6.249)
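The residue −e^{aiπ} can be spot-checked numerically by evaluating (z + 1)f(z) = z^{a−1} just above the pole, where the branch 0 ≤ arg z ≤ 2π used on the contour agrees with the principal logarithm (a sketch with the arbitrary choice a = 0.3):

```python
import cmath

a = 0.3
z = complex(-1.0, 1e-9)          # a point just above the simple pole z = -1

# (z + 1) f(z) = z**(a - 1); near the upper side of the cut, arg z is close
# to pi, so the contour branch agrees with the principal logarithm here.
residue = cmath.exp((a - 1) * cmath.log(z))

print(abs(residue - (-cmath.exp(1j * a * cmath.pi))) < 1e-6)  # True
```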

When R → ∞ and r → 0, we have

|∫_ΓR z^{a−1}/(1 + z) dz| < [R^{a−1}/(R − 1)]·2πR → 0 and |∫_Γ0 z^{a−1}/(1 + z) dz| < [r^{a−1}/(1 − r)]·2πr → 0.  (6.250)

Thus, the second and fourth terms of (6.248) vanish. Meanwhile, we take the principal value of ln z in (6.244). Since the lines PQ and Q′P′ are very close to the real axis, using (6.245) with n = 0 we can put θ = 0 on PQ and θ = 2π on Q′P′. Then, we have

∫_PQ z^{a−1}/(1 + z) dz = ∫_PQ e^{(a−1) ln z}/(1 + z) dz = ∫_PQ e^{(a−1) ln|z|}/(1 + z) dz.  (6.251)

Moreover, we have

∫_Q′P′ z^{a−1}/(1 + z) dz = ∫_Q′P′ e^{(a−1) ln z}/(1 + z) dz = ∫_Q′P′ e^{(a−1)(ln|z| + 2πi)}/(1 + z) dz = e^{(a−1)2πi} ∫_Q′P′ e^{(a−1) ln|z|}/(1 + z) dz.  (6.252)

Notice that the argument underneath the branch cut (i.e., on the line Q′P′) is increased by 2π relative to that over the branch cut. Considering (6.249)–(6.252) and taking the limits R → ∞ and r → 0, (6.248) is rewritten as


Fig. 6.30 Branch cut (shown with a doubled broken line) and contour for the integration of 1/√((z − a)(z − b)) (a < b). Only the argument θ1 is depicted. With θ2, see text

−2πi e^{aπi} = ∫_0^∞ e^{(a−1) ln|z|}/(1 + z) dz + e^{(a−1)2πi} ∫_∞^0 e^{(a−1) ln|z|}/(1 + z) dz = (1 − e^{(a−1)2πi}) ∫_0^∞ e^{(a−1) ln|z|}/(1 + z) dz = (1 − e^{(a−1)2πi}) ∫_0^∞ z^{a−1}/(1 + z) dz.  (6.253)

In (6.253) we have assumed z to be real and positive (see Fig. 6.29), and so we have e^{(a−1) ln|z|} = e^{(a−1) ln z} = z^{a−1}. Therefore, from (6.253) we get

∫_0^∞ z^{a−1}/(1 + z) dz = −2πi e^{aπi}/(1 − e^{(a−1)2πi}) = −2πi e^{aπi}/(1 − e^{2πai}).  (6.254)

Using (6.37) for (6.254) and performing simple trigonometric calculations (after multiplying both the numerator and denominator by 1 − e^{−2πai}), we finally get

∫_0^∞ z^{a−1}/(1 + z) dz = ∫_0^∞ x^{a−1}/(1 + x) dx = I = π/sin aπ.
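This closed form is easy to verify numerically; substituting x = e^t turns (6.246) into an integral over the whole real line with an exponentially decaying integrand (a sketch with the arbitrary choice a = 0.3):

```python
import numpy as np

a = 0.3
# x = exp(t) maps (6.246) to the integral of exp(a*t)/(1 + exp(t)) over R;
# the integrand decays like exp(-(1 - a)|t|), so a plain Riemann sum on a
# wide, fine grid is already very accurate.
dt = 1e-3
t = np.arange(-60.0, 60.0, dt)
I = (np.exp(a * t) / (1.0 + np.exp(t))).sum() * dt

print(abs(I - np.pi / np.sin(a * np.pi)) < 1e-6)  # True
```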

Example 6.13 [13] Estimate the following real definite integral:

I = ∫_a^b 1/√((x − a)(b − x)) dx  (a, b: real with a < b).  (6.255)
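Before carrying out the contour analysis, the value of (6.255) can be checked numerically; the substitution x = (a + b)/2 + ((b − a)/2) cos θ removes the endpoint singularities, and the standard result is π irrespective of a and b (a sketch; the endpoints a = 1 and b = 4 are arbitrary choices):

```python
import numpy as np

a, b = 1.0, 4.0
n = 100000
theta = (np.arange(n) + 0.5) * np.pi / n          # midpoints on (0, pi)
x = (a + b) / 2 + (b - a) / 2 * np.cos(theta)
dxdtheta = (b - a) / 2 * np.sin(theta)            # |dx/dtheta|
integrand = dxdtheta / np.sqrt((x - a) * (b - x))

I = integrand.sum() * (np.pi / n)
print(abs(I - np.pi) < 1e-6)  # True: the value is pi, independent of a and b
```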

To this end, we wish to evaluate the following integral described by

IC = ∮_C 1/√((z − a)(z − b)) dz,  (6.256)

where we define the integrand of (6.256) as

f(z) ≡ 1/√((z − a)(z − b)).

The total contour C comprises Ca, PQ, Cb, and Q′P′ as depicted in Fig. 6.30. The function f(z) has two branch points at z = a and z = b; otherwise f(z) is analytic. We


draw a branch cut with a doubled broken line as shown. Since the contour C does not cross the branch cut but encircles it, we can evaluate the integral in a normal manner. Starting from the point P′, (6.256) can be expressed as

IC = ∫_Ca dz/√((z − a)(z − b)) + ∫_PQ dz/√((z − a)(z − b)) + ∫_Cb dz/√((z − a)(z − b)) + ∫_Q′P′ dz/√((z − a)(z − b)).  (6.257)

We assume that the lines PQ and Q′P′ are infinitesimally close to the real axis. This time, Ca and Cb are both traced counter-clockwise. Putting

z − a = r1 e^{iθ1} and z − b = r2 e^{iθ2},

we have

f(z) = (1/√(r1 r2)) e^{−i(θ1 + θ2)/2}.  (6.258)

Let Ca and Cb be small circles of radius ε centered at z = a and z = b, respectively. When we are evaluating the integral on Cb, we can put z − b = ε e^{iθ2} (−π ≤ θ2 ≤ π); i.e., r2 = ε. We also have r1 ≈ b − a; in Fig. 6.30 we assume b > a. Hence, we have

|∫_Cb dz/√((z − a)(z − b))| = |∫_{−π}^{π} [iε e^{iθ2}/√(r1 ε)] e^{−i(θ1 + θ2)/2} dθ2| ≤ √(ε/r1) ∫_{−π}^{π} dθ2.

Therefore, we get

lim_{ε→0} |∫_Cb dz/√((z − a)(z − b))| = 0, or lim_{ε→0} ∫_Cb dz/√((z − a)(z − b)) = 0.  (6.259)

In a similar manner, we have

lim_{ε→0} ∫_Ca dz/√((z − a)(z − b)) = 0.

Meanwhile, on the line Q′P′ we have


z − a = r1 e^{iθ1} and z − b = r2 e^{iθ2}.  (6.260)

In (6.260) we have r1 = x − a, θ1 = 0 and r2 = b − x, θ2 = π; see Fig. 6.30. Also, we have dz = dx. Hence, we have

∫_Q′P′ dz/√((z − a)(z − b)) = ∫_b^a e^{−iπ/2}/√((x − a)(b − x)) dx = ∫_b^a [−i/√((x − a)(b − x))] dx = i ∫_a^b dx/√((x − a)(b − x)).  (6.261)

On the line PQ, in turn, we have r1 = x − a and θ1 = 2π. This is because when going from P′ to P, the argument is increased by 2π; see Fig. 6.30 once again. Also, we have r2 = b − x, θ2 = π, and dz = dx. Notice that the argument θ2 remains unchanged. Then, we get

∫_PQ dz/√((z − a)(z − b)) = ∫_a^b e^{−3iπ/2}/√((x − a)(b − x)) dx = i ∫_a^b dx/√((x − a)(b − x)).  (6.262)

Inserting these results into (6.257), we obtain

IC = ∮_C dz/√((z − a)(z − b)) = 2i ∫_a^b dx/√((x − a)(b − x)).  (6.263)

The calculation is yet to be finished, however, because (6.263) is not a real definite integral. Then, let us try another contour integration. We choose a contour C of radius R centered at the origin. We assume that R is large enough this time. Meanwhile, putting

z = 1/w,  (6.264)

we have

(z − a)(z − b) = (1/w²)(1 − aw)(1 − bw) with dz = −dw/w².  (6.265)

Thus, we have

where E1 (>0) and E2 (>0) are amplitudes and e1 and e2 represent unit polarization vectors in the directions of the positive x-axis and y-axis; we assume that the two waves are

7.4 Superposition of Two Electromagnetic Waves  295

being propagated in the direction of the positive z-axis; δ is a phase difference. The total electric field E is described as the superposition of E1 and E2 such that

E = E1 + E2 = E1 e1 e^{i(kz − ωt)} + E2 e2 e^{i(kz − ωt + δ)}.  (7.69)

Note that we usually discuss the polarization characteristics of an electromagnetic wave by considering only the electric wave. We emphasize that the electric wave and the concomitant magnetic wave share the same phase in a uniform and infinite dielectric medium. A reason why the electric wave is taken to represent the electromagnetic wave is partly that optical applications are mostly made in non-magnetic substances such as glass, water, plastics, and most semiconductors. Let us view the temporal change of E at a fixed point x = y = z = 0. Then, taking the real part of (7.69), the x- and y-components of E, i.e., Ex and Ey, are expressed as

Ex = E1 cos(−ωt) and Ey = E2 cos(−ωt + δ).  (7.70)

First, let us briefly think of the case where δ = 0. Eliminating t, we have

Ey = (E2/E1) Ex.  (7.71)

This is an equation of a straight line. The resulting electric field E is accordingly called linearly polarized light. That is, when we observe the electric field of the relevant light at the origin, the field oscillates along the straight line described by (7.71), with the origin centrally located in the oscillating field. If δ = π, we have

Ey = −(E2/E1) Ex.

This gives a straight line as well. Therefore, if we wish to seek the relationship between Ex and Ey, it suffices to examine it as a function of δ in the region −π/2 ≤ δ ≤ π/2.

(i) Case I: E1 ≠ E2. Let us consider the case where δ ≠ 0 in (7.70). Rewriting the second equation of (7.70) and inserting the first equation into it so that we can eliminate t, we have

Ey = E2(cos ωt cos δ + sin ωt sin δ) = E2 cos δ (Ex/E1) ± E2 sin δ √(1 − Ex²/E1²).

Rearranging terms of the above equation, we have

296  7  Maxwell's Equations

Ey/E2 − (cos δ)(Ex/E1) = ±(sin δ) √(1 − Ex²/E1²).  (7.72)

Squaring both sides of (7.72) and arranging the equation, we get

Ex²/(E1² sin²δ) − 2(cos δ) Ex Ey/(E1 E2 sin²δ) + Ey²/(E2² sin²δ) = 1.  (7.73)

Using a matrix form, we have

(Ex  Ey) ( 1/(E1² sin²δ)         −cos δ/(E1 E2 sin²δ)
           −cos δ/(E1 E2 sin²δ)  1/(E2² sin²δ)       ) (Ex
                                                        Ey) = 1.  (7.74)

Note that the above matrix is real symmetric. In that case, to examine the properties of the matrix we calculate its determinant along with its principal minors. A principal minor means a minor with respect to a diagonal element. In this case, the two principal minors are 1/(E1² sin²δ) and 1/(E2² sin²δ). Also we have

| 1/(E1² sin²δ)         −cos δ/(E1 E2 sin²δ)
  −cos δ/(E1 E2 sin²δ)  1/(E2² sin²δ)       | = 1/(E1² E2² sin²δ).  (7.75)
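The minors, the determinant formula (7.75), and the positive definiteness claimed below can be spot-checked numerically (a sketch; E1, E2, and δ are arbitrary sample values):

```python
import numpy as np

E1, E2, delta = 2.0, 1.0, 0.7
s2 = np.sin(delta) ** 2
M = np.array([[1.0 / (E1**2 * s2), -np.cos(delta) / (E1 * E2 * s2)],
              [-np.cos(delta) / (E1 * E2 * s2), 1.0 / (E2**2 * s2)]])

# Principal minors and the determinant, cf. (7.75)
print(M[0, 0] > 0 and M[1, 1] > 0)                               # True
print(np.isclose(np.linalg.det(M), 1.0 / (E1**2 * E2**2 * s2)))  # True

# Positive definiteness: both eigenvalues are positive
print(np.all(np.linalg.eigvalsh(M) > 0))                         # True
```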

Evidently, the two principal minors as well as the determinant are all positive (δ ≠ 0). In this case, the (2, 2) matrix of (7.74) is said to be positive definite. The related discussion will be given in Part III. The positive definiteness means that in the quadratic form described by (7.74), the LHS takes a positive value for any real numbers Ex and Ey except the unique case Ex = Ey = 0, which renders the LHS zero. The positive definiteness of a matrix ensures the existence of positive eigenvalues of the said matrix. Let us consider a real symmetric (2, 2) matrix that has positive principal minors and a positive determinant in a general case. Let such a matrix M be

M = ( a  c
      c  b ),

where a, b > 0 and det M > 0; i.e., ab − c² > 0. Let a corresponding quadratic form be Q. Then, we have

Q = (x  y) ( a  c
             c  b ) (x
                     y) = ax² + 2cyx + by² = a(x + cy/a)² − c²y²/a + by² = a(x + cy/a)² + (ab − c²)y²/a.

Thus, Q ≥ 0 for any real numbers x and y. We seek a condition under which Q = 0. We readily find that with M that has the above properties, only x = y = 0 makes Q = 0. Thus, M is positive definite. We will deal with this issue from a more general standpoint in Part III. In general, it is pretty complicated to seek eigenvalues and corresponding eigenvectors in the above case. Yet, we can extract important information from (7.74). The eigenvalues λ of the matrix that appeared in (7.74) are estimated as follows:

λ = [E1² + E2² ± √((E1² + E2²)² − 4E1²E2² sin²δ)] / (2E1²E2² sin²δ).  (7.76)

Notice that λ in (7.76) represents two different positive eigenvalues. This is because the inside of the square root is rewritten as

(E1² − E2²)² + 4E1²E2² cos²δ > 0  (δ ≠ ±π/2).

Also we have

E1² + E2² > √((E1² + E2²)² − 4E1²E2² sin²δ).
These clearly show that the quadratic form of (7.74) gives an ellipse (i.e., elliptically polarized light). Because of the presence of the second term on the LHS of (7.73), both the major and minor axes of the ellipse are tilted and diverted from the x- and y-axes. Let us inspect the ellipse described by (7.74). Inserting Ex = E1, obtained at t = 0 in (7.70), into (7.73) and solving the resulting quadratic equation with respect to Ey, we get Ey as a double root such that Ey = E2 cos δ. Similarly, putting Ey = E2 in (7.73), we have Ex = E1 cos δ. These results show that the ellipse described by (7.73) or (7.74) is internally tangent to a rectangle, as depicted in Fig. 7.6a. Equation (7.69) shows that the electromagnetic wave is propagated in the positive direction of the z-axis. Therefore, in Fig. 7.6a we are peeking into the oncoming wave from the bottom of the plane of paper at a certain position of z = constant. We set the constant to 0. Then,

Fig. 7.6 Trace of the electric field of an elliptically polarized light. (a) The trace is internally tangent to a rectangle of 2E1 × 2E2. In the case of δ > 0, starting from P at t = 0, the coordinate point representing the electric field traces the ellipse counter-clockwise with time. (b) The trace of an elliptically polarized light for δ = π/2

we find that at t = 0 the electric field is represented by the point P (Ex = E1, Ey = E2 cos δ); see Fig. 7.6a. From (7.70), if δ > 0, P traces the ellipse counter-clockwise. It reaches the maximum point Ey = E2 at t = δ/ω. Since the trace of the electric field forms an ellipse as in Fig. 7.6, the associated light is said to be elliptically polarized. If δ < 0 in (7.70), however, P traces the ellipse clockwise. In the special case of δ = π/2, the second term of (7.73) vanishes and we have the simple form

Ex²/E1² + Ey²/E2² = 1.  (7.77)

Thus, the principal axes of the ellipse coincide with the x- and y-axes. On the basis of (7.70), we see from Fig. 7.6b that, starting from P at t = 0, the coordinate point representing the electric field again traces the ellipse counter-clockwise with time; see the curved arrow of Fig. 7.6b. If δ < 0, the coordinate point traces the ellipse clockwise with time.
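This rotation sense can be verified numerically from (7.70): for δ > 0 the z-component of E × dE/dt stays positive, i.e., the trace runs counter-clockwise (a sketch; the amplitudes and ω = 1 are arbitrary choices):

```python
import numpy as np

E1, E2, delta, omega = 2.0, 1.0, np.pi / 3, 1.0
t = np.linspace(0.0, 2.0 * np.pi, 2001)
Ex = E1 * np.cos(-omega * t)
Ey = E2 * np.cos(-omega * t + delta)

# z-component of E x dE/dt; analytically it equals omega*E1*E2*sin(delta)
cross = Ex * np.gradient(Ey, t) - Ey * np.gradient(Ex, t)

print(np.all(cross > 0))   # True: counter-clockwise trace for delta > 0
print(np.allclose(cross[1:-1], omega * E1 * E2 * np.sin(delta), rtol=1e-4))
```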


(ii) Case II: E1 = E2. Now, let us consider a simple but important case. When E1 = E2, (7.73) is simplified to

Ex² − 2 cos δ Ex Ey + Ey² = E1² sin²δ.  (7.78)

Using a matrix form, we have

(Ex  Ey) ( 1       −cos δ
           −cos δ  1      ) (Ex
                             Ey) = E1² sin²δ.  (7.79)

We obtain the eigenvalues λ of the matrix of (7.79) such that

λ = 1 ± |cos δ|.  (7.80)

Setting −π/2 ≤ δ ≤ π/2, we have

λ = 1 ± cos δ.  (7.81)

The corresponding normalized eigenvectors v1 and v2 (as column vectors) are

v1 = (1/√2  −1/√2)ᵀ and v2 = (1/√2  1/√2)ᵀ.  (7.82)

Thus, we have a diagonalizing unitary matrix P such that

P = ( 1/√2   1/√2
      −1/√2  1/√2 ).  (7.83)

Defining the matrix appearing in (7.79) as A such that

A = ( 1       −cos δ
      −cos δ  1      ),  (7.84)

we obtain

P⁻¹AP = ( 1 + cos δ  0
          0          1 − cos δ ).  (7.85)
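The diagonalization (7.85) is easy to confirm numerically (a minimal sketch for an arbitrary δ):

```python
import numpy as np

delta = 0.6                      # any -pi/2 <= delta <= pi/2 works
A = np.array([[1.0, -np.cos(delta)],
              [-np.cos(delta), 1.0]])
P = np.array([[1.0, 1.0],
              [-1.0, 1.0]]) / np.sqrt(2.0)   # columns are v1 and v2 of (7.82)

D = np.linalg.inv(P) @ A @ P
expected = np.diag([1.0 + np.cos(delta), 1.0 - np.cos(delta)])
print(np.allclose(D, expected))  # True
```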

Notice that the eigenvalues (1 + cos δ) and (1 − cos δ) are both positive, as expected. Rewriting (7.79), we have

(Ex  Ey) PP⁻¹ ( 1       −cos δ
                −cos δ  1      ) PP⁻¹ (Ex
                                       Ey) = (Ex  Ey) P ( 1 + cos δ  0
                                                          0          1 − cos δ ) P⁻¹ (Ex
                                                                                      Ey) = E1² sin²δ.  (7.86)

Fig. 7.7 Relationship between the basis vectors (e1 e2) and (e1′ e2′) in the case of E1 = E2; see text

Here, let us define new coordinates such that

(u
 v) ≡ P⁻¹ (Ex
           Ey).  (7.87)

This coordinate transformation corresponds to the transformation of the basis vectors (e1 e2) such that

(e1 e2) (Ex
         Ey) = (e1 e2) PP⁻¹ (Ex
                             Ey) = (e1′ e2′) (u
                                              v),  (7.88)

where the new basis vectors (e1′ e2′) are given by

(e1′ e2′) = (e1 e2) P = ( (1/√2)e1 − (1/√2)e2   (1/√2)e1 + (1/√2)e2 ).  (7.89)

The coordinate system is depicted in Fig. 7.7 along with the basis vectors. The relevant discussion will appear again in Part III.


Substituting (7.87) into (7.86) and rearranging terms, we get

u²/[E1²(1 − cos δ)] + v²/[E1²(1 + cos δ)] = 1.  (7.90)

Equation (7.90) indicates that the major axis and minor axis are E1√(1 + cos δ) and E1√(1 − cos δ), respectively. When δ = ±π/2, (7.90) becomes

u²/E1² + v²/E1² = 1.  (7.91)

This represents a circle. For this reason, the wave described by (7.91) is called circularly polarized light. In (7.90), where δ ≠ ±π/2, the wave is said to be elliptically polarized light. Thus, we have linearly, elliptically, and circularly polarized light depending on the magnitude of δ. Let us closely examine the characteristics of the elliptically and circularly polarized light in the case of E1 = E2. When t = 0, from (7.70) we have

Ex = E1 and Ey = E1 cos δ.  (7.92)

This coordinate point corresponds to A1, whose Ex coordinate is E1 (see Fig. 7.8a). In the case of Δt = δ/2ω, Ex = Ey = E1 cos(±δ/2). This point corresponds to A2 in Fig. 7.8a. We have

√(Ex² + Ey²) = √2 E1 cos(δ/2) = E1 √(1 + cos δ).  (7.93)

This is equal to the major axis, as anticipated. With t = Δt,

Ex = E1 cos(−ωΔt) and Ey = E1 cos(−ωΔt + δ).  (7.94)

Notice that Ey takes a maximum E1 when Δt = δ/ω. Consequently, if δ takes a positive value, Ey takes the maximum E1 at a positive Δt, as is similarly the case with Fig. 7.6a. At that time, Ex = E1 cos(−δ) < E1. This point corresponds to A3 in Fig. 7.8a. As a result, the electric field traces the ellipse counter-clockwise with time, as in the case of Fig. 7.6. If δ takes a negative value, however, the field traces the ellipse clockwise. If δ = ±π/2, in (7.94) we have

Ex = E1 cos(−ωt) and Ey = E1 cos(−ωt ± π/2).  (7.95)

We examine the case of δ = π/2 first. In this case, when t = 0, Ex = E1 and Ey = 0 (Point C1 in Fig. 7.8b). If ωt = π/4, Ex = Ey = E1/√2 (Point C2 in Fig. 7.8b). In turn, if ωt = π/2, Ex = 0 and Ey = E1 (Point C3). Again the electric field traces the circle

Fig. 7.8 Polarized features of light in the case of E1 = E2. (a) If δ > 0, the electric field traces an ellipse from A1 via A2 to A3 (see text). (b) If δ = π/2, the electric field traces a circle from C1 via C2 to C3 (left-circularly polarized light)

counter-clockwise. In this situation, we see the light from above the z-axis. In other words, we are viewing the light against the direction of its propagation. The wave is said to be left-circularly polarized and to have positive helicity. In contrast, when δ = −π/2, starting from Point C1 the electric field traces the circle clockwise. That light is said to be right-circularly polarized and to have negative helicity. With the left-circularly polarized light, (7.69) can be rewritten as

E = E1 + E2 = E1(e1 + ie2) e^{i(kz − ωt)}.  (7.96)

Therefore, a complex vector (e1 + ie2) characterizes the left-circular polarization. On the other hand, (e1 - ie2) characterizes the right-circular polarization. To


normalize them, it is convenient to use the following vectors, as in the case of Sect. 4.3 [3]:

e+ ≡ (1/√2)(e1 + ie2) and e− ≡ (1/√2)(e1 − ie2).  (4.45)

In the case of δ = 0, we have linearly polarized light. For this, the points A1, A2, and A3 coalesce into a single point on the straight line Ey = Ex.
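The correspondence between (e1 ± ie2) and the two circular polarizations can be checked numerically: the real part of (7.96) at z = 0 has constant magnitude and rotates counter-clockwise (a sketch with E1 = 1 and ω = 1 chosen arbitrarily):

```python
import numpy as np

E1, omega = 1.0, 1.0
t = np.linspace(0.0, 2.0 * np.pi, 1001)

# Real part of E1*(e1 + i*e2)*exp(-i*omega*t) at z = 0, componentwise
Ex = np.real(E1 * np.exp(-1j * omega * t))        # E1*cos(omega*t)
Ey = np.real(E1 * 1j * np.exp(-1j * omega * t))   # E1*sin(omega*t)

print(np.allclose(np.hypot(Ex, Ey), E1))          # True: circle of radius E1
cross = Ex * np.gradient(Ey, t) - Ey * np.gradient(Ex, t)
print(np.all(cross > 0))   # True: counter-clockwise, i.e., left-circular
```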

References

1. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
2. Pain HJ (2005) The physics of vibrations and waves, 6th edn. Wiley, Chichester
3. Jackson JD (1999) Classical electrodynamics, 3rd edn. Wiley, New York

Chapter 8

Reflection and Transmission of Electromagnetic Waves in Dielectric Media

In Chap. 7, we considered the propagation of electromagnetic waves in an infinite uniform dielectric medium. In this chapter, we think of a situation where two (or more) dielectrics are in contact with each other at a plane interface. When two dielectric media adjoin each other with an interface, propagating electromagnetic waves are partly reflected by the interface and partly transmitted beyond the interface. We deal with these phenomena in terms of the characteristic impedance of the dielectric media. In the case of an oblique incidence of a wave, we categorize it into a transverse electric (TE) wave and a transverse magnetic (TM) wave. If a thin plate of a dielectric is sandwiched by a couple of metal sheets, the electromagnetic wave is confined within the dielectric. In this case, the propagating mode of the wave differs from that of a wave propagating in a free space (i.e., a space filled by a three-dimensionally infinite dielectric medium). If a thin plate of a dielectric having a large refractive index is sandwiched by a couple of dielectrics with a smaller refractive index, the electromagnetic wave is also confined within the dielectric with the larger index. In this case, we have to take account of the total reflection that causes a phase change upon reflection. We deal with such specific modes of electromagnetic wave propagation. These phenomena are treated both from a basic aspect and from a point of view of device application. The relevant devices are called waveguides in optics.

8.1 Electromagnetic Fields at an Interface

We start by examining the conditions on an electromagnetic field at a plane interface. Suppose that two semi-infinite dielectric media D1 and D2 are in contact with each other at a plane interface. Let us take a small rectangle S that straddles the interface (see Fig. 8.1). Taking a surface integral of both sides of (7.28) over the strip, we have

© Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_8



Fig. 8.1 A small rectangle S that straddles an interface formed by two semi-infinite dielectric media D1 and D2. Let a curve C be a closed loop surrounding the rectangle S. A unit vector n is directed along a normal of S. Unit vectors t1 and t2 are directed along a tangential line of the interface plane

Fig. 8.2 Diagram that intuitively explains Stokes' theorem. In the diagram a surface S is encircled by a closed curve C. An infinitesimal portion of C is denoted by dl. The surface S is pertinent to the surface integration. A spiral vector field E is present on and near S


∫_S rot E · n dS + ∫_S (∂B/∂t) · n dS = 0,  (8.1)

where n is a unit vector directed along a normal of S, as shown. Applying Stokes' theorem to the first term of (8.1), we get

∮_C E · dl + (∂B/∂t) · n Δl Δh = 0.  (8.2)

With the line integral of the first term, C is a closed loop surrounding the rectangle S and dl = t dl, where t is a unit vector directed along the tangential direction of C (see t1 and t2 in Fig. 8.1). The line integration is performed such that C is followed counter-clockwise in the direction of t. Figure 8.2 gives an intuitive diagram that explains Stokes' theorem. The diagram shows an overview of a surface S encircled by a closed curve C. Suppose that we have a spiral vector field E represented by arrowed circles as shown. In that case, rot E is directed toward the upper side of the plane of paper in the individual fragments. A summation of rot E · n dS forms a surface integral covering S. Meanwhile, the arrows of adjacent fragments cancel each other out and only the components on the periphery (i.e., the curve C) are non-vanishing (see Fig. 8.2).


Thus, the surface integral of rot E is equivalent to the line integral of E. Accordingly, we get Stokes' theorem, described by [1]

∫_S rot E · n dS = ∮_C E · dl.  (8.3)

Returning to Fig. 8.1 and taking Δh → 0, we have (∂B/∂t) · n Δl Δh → 0. Then, the second term of (8.2) vanishes and we get

∮_C E · dl = 0.

This implies that

Δl (E1 · t1 + E2 · t2) = 0,

where E1 and E2 represent the electric field in the dielectrics D1 and D2 close to the interface, respectively. Considering t2 = −t1 and putting t1 = t, we get

(E1 − E2) · t = 0,  (8.4)

where t represents a unit vector in the direction of a tangential line of the interface plane. Equation (8.4) means that the tangential components of the electric field are continuous on both sides of the interface. We obtain a similar result with the magnetic field; this can be shown by taking a surface integral of both sides of (7.29) as well. As a result, we get

(H1 − H2) · t = 0,  (8.5)

where H1 and H2 represent the magnetic field in D1 and D2 close to the interface, respectively. Hence, from (8.5) the tangential components of the magnetic field are continuous on both sides of the interface as well.
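Stokes' theorem (8.3), which underlies these boundary conditions, can be verified numerically for a simple spiral field such as E = (−y, x, 0), for which rot E = (0, 0, 2); over the unit square with n = e_z the surface integral is exactly 2 (a sketch; the field and the square are arbitrary choices):

```python
import numpy as np

# E = (-y, x, 0); rot E = (0, 0, 2), so the surface integral over the
# unit square in the z = 0 plane (n = e_z) is 2 * area = 2.
n_pts = 100000
s = (np.arange(n_pts) + 0.5) / n_pts      # midpoint parameter on each edge

def edge(p0, p1):
    """Line integral of E over the straight segment p0 -> p1."""
    x = p0[0] + (p1[0] - p0[0]) * s
    y = p0[1] + (p1[1] - p0[1]) * s
    dx = (p1[0] - p0[0]) / n_pts
    dy = (p1[1] - p0[1]) / n_pts
    return np.sum(-y * dx + x * dy)

corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
line_integral = sum(edge(corners[i], corners[(i + 1) % 4]) for i in range(4))

print(abs(line_integral - 2.0) < 1e-9)    # True: matches the surface integral
```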

8.2 Basic Concepts Underlying Phenomena

When an electromagnetic wave is incident upon an interface of dielectrics, its reflection and transmission (refraction) take place at the interface. We address the question of how the nature of the dielectrics studied in the previous section is associated with these optical phenomena. When we deal with the problem, we assume non-absorbing media. Notice that a complex wavenumber vector, along with a complex index of refraction, is responsible for an absorbing medium. Nonetheless, our


approach is useful for discussing related problems in absorbing media. Characteristic impedance plays a key role in the reflection and transmission of light. We represent a field (either electric or magnetic) of the incident, reflected, and transmitted (or refracted) waves by Fi, Fr, and Ft, respectively. We call the dielectric on the incidence side (and, hence, the reflection side) D1 and the dielectric on the transmission side D2. The fields are described by

Fi = Fi εi e^{i(ki·x − ωt)},  (8.6)
Fr = Fr εr e^{i(kr·x − ωt)},  (8.7)
Ft = Ft εt e^{i(kt·x − ωt)},  (8.8)

where Fi, Fr, and Ft denote the amplitudes of the fields; εi, εr, and εt represent unit vectors in the polarization direction, i.e., the direction along which the field oscillates; and ki, kr, and kt are wavenumber vectors such that ki ⊥ εi, kr ⊥ εr, and kt ⊥ εt. These wavenumber vectors represent the propagation directions of the individual waves. In (8.6)–(8.8), the indices i, r, and t stand for incidence, reflection, and transmission, respectively. Let xs be an arbitrary position vector at the interface between the dielectrics. Also, let t be a unit vector parallel to the interface. Thus, the tangential components of the fields are described as

Fit = Fi (t · εi) e^{i(ki·xs − ωt)},  (8.9)
Frt = Fr (t · εr) e^{i(kr·xs − ωt)},  (8.10)
Ftt = Ft (t · εt) e^{i(kt·xs − ωt)}.  (8.11)

Note that Fit and Frt represent the field in D1 just close to the interface and that Ftt denotes the field in D2 just close to the interface. Thus, in light of (8.4) and (8.5), we have

Fit + Frt = Ftt.  (8.12)

Notice that (8.12) holds for any position xs and any time t. Let us think of an elementary calculation of exponential functions (exponential polynomials) and the relationship between individual coefficients and exponents. With respect to the two functions e^{ikx} and e^{ik′x}, we have two alternatives according to the value the Wronskian takes. Here, the Wronskian W is expressed as

W = | e^{ikx}      e^{ik′x}    |
    | (e^{ikx})′   (e^{ik′x})′ | = −i(k − k′) e^{i(k+k′)x}.  (8.13)

(i) W ≠ 0 if and only if k ≠ k′. In this case, e^{ikx} and e^{ik′x} are said to be linearly independent. That is, on the condition k ≠ k′, for any x we have

a e^{ikx} + b e^{ik′x} = 0 ⟺ a = b = 0.  (8.14)

(ii) W = 0 if k = k′. In that case, we have

a e^{ikx} + b e^{ik′x} = (a + b) e^{ikx} = 0 ⟺ a + b = 0.

Notice that e^{ikx} never vanishes for any x. To conclude, if we think of an equation of an exponential polynomial

a e^{ikx} + b e^{ik′x} = 0,

we have two alternatives regarding the coefficients: one is the trivial case a = b = 0 and the other is a + b = 0. Next, with respect to e^{ik1x}, e^{ik2x}, and e^{ik3x}, similarly we have

W = | e^{ik1x}      e^{ik2x}      e^{ik3x}    |
    | (e^{ik1x})′   (e^{ik2x})′   (e^{ik3x})′ |
    | (e^{ik1x})″   (e^{ik2x})″   (e^{ik3x})″ |
  = −i(k1 − k2)(k2 − k3)(k3 − k1) e^{i(k1+k2+k3)x},  (8.15)

where W ≠ 0 if and only if k1 ≠ k2, k2 ≠ k3, and k3 ≠ k1. That is, on this condition, for any x we have

a e^{ik1x} + b e^{ik2x} + c e^{ik3x} = 0 ⟺ a = b = c = 0.  (8.16)

If the three exponential functions are linearly dependent, at least two of k1, k2, and k3 are equal to each other, and vice versa. On this condition, again consider the following equation of an exponential polynomial:

a e^{ik1x} + b e^{ik2x} + c e^{ik3x} = 0.  (8.17)

Without loss of generality, we assume that k1 = k2. Then, we have

a e^{ik1x} + b e^{ik2x} + c e^{ik3x} = (a + b) e^{ik1x} + c e^{ik3x} = 0.

If k1 ≠ k3, we must have


a + b = 0 and c = 0.  (8.18)

If, on the other hand, k1 = k3, i.e., k1 = k2 = k3, we have

a e^{ik1x} + b e^{ik2x} + c e^{ik3x} = (a + b + c) e^{ik1x} = 0.

That is, we have

a + b + c = 0.  (8.19)

Consequently, we must have k1 = k2 = k3 for all three coefficients a, b, and c to be nonzero. Returning to (8.12), its full description is

Fi (t·εi) e^{i(ki·xs − ωt)} + Fr (t·εr) e^{i(kr·xs − ωt)} − Ft (t·εt) e^{i(kt·xs − ωt)} = 0.  (8.20)

Again, (8.20) must hold for any position xs and any time t. Meanwhile, for (8.20) to have a physical meaning, we should have

Fi (t·εi) ≠ 0,  Fr (t·εr) ≠ 0,  and  Ft (t·εt) ≠ 0.  (8.21)

On the basis of the above consideration, we must have the following two relations:

ki·xs − ωt = kr·xs − ωt = kt·xs − ωt, or ki·xs = kr·xs = kt·xs,  (8.22)

and

Fi (t·εi) + Fr (t·εr) − Ft (t·εt) = 0, or Fi (t·εi) + Fr (t·εr) = Ft (t·εt).  (8.23)

In this way, we are able to obtain a relation among amplitudes of the fields of incidence, reflection, and transmission. Notice that we get both the relations between exponents and coefficients at once. First, let us consider (8.22). Suppose that the incident light (ki) is propagated in a dielectric medium D1 in parallel to the zx-plane and that the interface is the xy-plane (see Fig. 8.3). Also suppose that at the interface the light is reflected partly back to D1 and transmitted (or refracted) partly into another dielectric medium D2. In

Fig. 8.3 Geometry of the incident, reflected, and transmitted lights. We assume that the light is incident from a dielectric medium D1 toward another medium D2. The wavenumber vectors ki, kr, and kt represent the incident, reflected, and transmitted (or refracted) lights with an angle θ, θ′, and ϕ, respectively. Note here that we did not assume the equality of θ and θ′ (see text)

Fig. 8.3, ki, kr, and kt represent the incident, reflected, and transmitted lights that make angles θ, θ′, and ϕ with the z-axis, respectively. Then we have

ki = (e1 e2 e3)(ki sin θ, 0, −ki cos θ)ᵀ,  (8.24)

xs = (e1 e2 e3)(x, y, 0)ᵀ,  (8.25)

where θ is said to be the incidence angle. A plane formed by ki and the normal to the interface is called the plane of incidence (or incidence plane). In Fig. 8.3, the zx-plane forms the incidence plane. From (8.24) and (8.25), we have

ki·xs = ki x sin θ,  (8.26)

kr·xs = krx x + kry y,  (8.27)

kt·xs = ktx x + kty y,  (8.28)

where ki = |ki|; krx and kry are the x and y components of kr; similarly, ktx and kty are the x and y components of kt. Since (8.22) holds for any x and y, we have


ki sin θ = krx = ktx,  (8.29)

kry = kty = 0.  (8.30)

From (8.30), neither kr nor kt has a y component. This means that ki, kr, and kt are coplanar; that is, the incident, reflected, and transmitted waves are all parallel to the zx-plane. Notice that at the beginning we did not assume the coplanarity of those waves, nor did we assume the equality of θ and θ′ (vide infra). From (8.29) and Fig. 8.3, however, we have

ki sin θ = kr sin θ′ = kt sin ϕ,  (8.31)

where kr = |kr| and kt = |kt|; θ′ and ϕ are said to be the reflection angle and the refraction angle, respectively. Thus, the end points of ki, kr, and kt are connected on a straight line that parallels the z-axis; Fig. 8.3 clearly shows it. Now, we suppose that the wavelength of the electromagnetic wave in D1 is λ1 and that in D2 is λ2. Since the incident and reflected light are propagated in D1, we have

ki = kr = 2π/λ1.  (8.32)

From (8.31) and (8.32), we get

sin θ = sin θ′.  (8.33)

Therefore, we have either θ = θ′ or θ′ = π − θ. Since 0 < θ, θ′ < π/2, we have

θ = θ′.  (8.34)

Then, returning to (8.31), we have

ki sin θ = kr sin θ = kt sin ϕ.  (8.35)

This implies that the components of ki, kr, and kt tangential to the interface are the same. Meanwhile, we have

kt = 2π/λ2.  (8.36)

Also we have

c = λ0 ν, v1 = λ1 ν, v2 = λ2 ν,  (8.37)

where v1 and v2 are the phase velocities of light in D1 and D2, respectively. Since ν is common to D1 and D2, we have

c/λ0 = v1/λ1 = v2/λ2, or c/v1 = λ0/λ1 = n1, c/v2 = λ0/λ2 = n2,  (8.38)

where λ0 is the wavelength in vacuum and n1 and n2 are the refractive indices of D1 and D2, respectively. Combining (8.35) with (8.32), (8.36), and (8.38), we have several relations such that

sin θ/sin ϕ = kt/ki = λ1/λ2 = n2/n1 (≡ n),  (8.39)

where n is said to be the relative refractive index of D2 relative to D1. The relation (8.39) is called Snell's law. Notice that (8.39) reflects the kinematic aspect of light and that this characteristic comes from the exponents of (8.20).
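Snell's law (8.39) is straightforward to evaluate numerically. Below is a minimal sketch; the helper name `refraction_angle` and the air/glass indices are assumed example values, not taken from the text.

```python
import math

def refraction_angle(theta, n1, n2):
    """Refraction angle phi from Snell's law (8.39):
    sin(theta)/sin(phi) = n2/n1.  Returns None when sin(phi)
    would exceed 1, i.e., beyond the critical angle (Sect. 8.6)."""
    s = n1 * math.sin(theta) / n2
    if s > 1.0:
        return None  # total reflection regime
    return math.asin(s)

# Light incident from air (n1 = 1.0) onto glass (n2 = 1.5) at 45 degrees:
phi = refraction_angle(math.radians(45.0), 1.0, 1.5)
print(math.degrees(phi))  # about 28.1 degrees
```

Note how the same routine signals the total reflection condition of Sect. 8.6: incidence from the denser medium beyond the critical angle makes sin ϕ exceed 1.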

8.3 Transverse Electric (TE) Waves and Transverse Magnetic (TM) Waves

On the basis of the above argument, we are now in the position to determine the relations among the amplitudes of the electromagnetic fields of the incident, reflected, and transmitted waves. Notice that since we are dealing with non-absorbing media, the relevant amplitudes are real (i.e., positive or negative). In other words, when the phase is retained upon reflection, we have a positive amplitude due to e^{i·0} = 1. When the phase is reversed upon reflection, on the other hand, we will be treating a negative amplitude due to e^{iπ} = −1. Nevertheless, when we consider the total reflection, we deal with a complex amplitude (vide infra). We start with the discussion of the vertical incidence of an electromagnetic wave before the general oblique incidence. In Fig. 8.4a, we depict the electric fields E and magnetic fields H obtained at a certain moment near the interface. We use an index, e.g., Ei, for the incident field. There we define the unit polarization vectors of the electric field εi, εr, and εt as identically e1 (a unit vector in the direction of the x-axis). In (8.6), we also define Fi (for both electric and magnetic fields) as positive. We have two cases for the geometry of the fields (see Fig. 8.4). The first case is that all Ei, Er, and Et are directed in the same direction (i.e., the positive direction of the x-axis); see Fig. 8.4a. Another case is that although Ei and Et are directed in the

Fig. 8.4 Geometry of the electromagnetic fields near the interface between dielectric media D1 and D2 in the case of vertical incidence. (a) All Ei, Er, and Et are directed in the same direction e1 (i.e., a unit vector in the positive direction of the x-axis). (b) Although Ei and Et are directed in the same direction, Er is reversed. In this case, we define Er as negative

same direction, Er is reversed (Fig. 8.4b). In this case, we define Er as negative. Notice that Ei and Et are always directed in the same direction and that Er is directed either in the same direction or in the opposite direction according to the nature of the dielectrics. The situation will be discussed soon. Meanwhile, the unit polarization vectors of the magnetic fields are determined by (7.67) for the incident, reflected, and transmitted waves. In Fig. 8.4, the magnetic fields are polarized along the y-axis (i.e., perpendicular to the plane of the paper). The magnetic fields Hi and Ht are always directed in the same direction, as in the case of the electric fields. On the other hand, if the phase of Er is conserved, the direction of Hr is reversed, and vice versa. This converse relationship between the electric and magnetic fields results solely from the requirement that E, H, and the propagation unit vector n of light must constitute a right-handed system in this order. Notice that n is reversed upon reflection. Next, let us consider an oblique incidence. With oblique incidence, electromagnetic waves are classified into two special categories: transverse electric (TE) waves (or modes) and transverse magnetic (TM) waves (or modes). The TE wave is characterized by an electric field perpendicular to the incidence plane, whereas the TM wave is characterized by a magnetic field perpendicular to the incidence plane. Here, the incidence plane is the plane formed by the propagation direction of the incident light and the normal to the interface of the two dielectrics. Since E, H, and n form a right-handed system, in the TE wave H lies on the incidence plane. For the same reason, in the TM wave E lies on the incidence plane. In the general case where a field is polarized in an arbitrary direction, that field can be formed by superimposing two fields corresponding to the TE and TM waves.
In other words, if we take an arbitrary field E, it can be decomposed into a component having a unit polarization vector directed perpendicular to the incidence plane and another component whose polarization vector lies on the incidence plane. These two components are orthogonal to each other.


Fig. 8.5 Geometry of the electromagnetic fields near the interface between dielectric media D1 and D2 in the case of oblique incidence of a TE wave. The electric field E is polarized along the y-axis (i.e., perpendicular to the plane of paper) with H polarized in the zx-plane. Unit polarization vectors εi, εr, and εt are given for H with red arrows. To avoid complication, neither Hi nor Ht is shown


Example 8.1 TE Wave In Fig. 8.5 we depict the geometry of oblique incidence of a TE wave. The xy-plane defines the interface of the two dielectrics, and t of (8.9) lies on that plane. The zx-plane defines the incidence plane. In this case, E is polarized along the y-axis with H polarized in the zx-plane. That is, regarding E, we choose the polarization directions εi, εr, and εt of the electric field as e2 (a unit vector in the positive direction of the y-axis, perpendicular to the plane of the paper). In Fig. 8.5, the polarization direction of the electric field is denoted by the symbol ⨂. Therefore, we have

e2·εi = e2·εr = e2·εt = 1.  (8.40)

For H we define the directions of the unit polarization vectors εi, εr, and εt so that their direction cosines relative to the x-axis are positive; see Fig. 8.5. Choosing e1 (a unit vector in the direction of the x-axis) for t of (8.9) with regard to H, we have

e1·εi = cos θ, e1·εr = cos θ, and e1·εt = cos ϕ.  (8.41)

Note that we have used the same symbols εi, εr, and εt to represent the unit polarization vectors of both the electric field E and the magnetic field H. In Fig. 8.5, the unit polarization vectors of H are depicted with red arrows. According as Hr is directed in the same direction as εr or in the opposite direction, its amplitude is defined as positive or negative, as in the case of vertical incidence. In Fig. 8.5, we depict the case where the amplitude Hr is negative. That is, the phase of the magnetic field is reversed upon reflection and, hence, Hr points in the direction opposite to εr in Fig. 8.5. Note that Hi and Ht are in the same directions as εi and εt, respectively. To avoid complication, neither Hi nor Ht is shown in Fig. 8.5.


Taking account of the above note of caution, we apply (8.23) to both E and H to get the following relations:

Ei + Er = Et,  (8.42)

Hi cos θ + Hr cos θ = Ht cos ϕ.  (8.43)

To derive the above equations, we chose t = e2 for E and t = e1 for H in (8.23). Because of the abovementioned converse relationship between E and H, we have

Er Hr < 0.  (8.44)

Suppose that we carry out an experiment to determine the six amplitudes in (8.42) and (8.43). Out of those quantities, we can freely choose and fix Ei. Then, we have five unknown amplitudes, i.e., Er, Et, Hi, Hr, and Ht. Thus, we need three more relations to determine them. Here, information about the characteristic impedance Z, defined in (7.64), is useful. From (8.6)–(8.8) as well as (8.44), we get

Z1 = √(μ1/ε1) = Ei/Hi = −Er/Hr,  (8.45)

Z2 = √(μ2/ε2) = Et/Ht,  (8.46)

where ε1 and μ1 are the permittivity and permeability of D1, respectively, and ε2 and μ2 are the permittivity and permeability of D2, respectively. As an example, we have

Hi = n × Ei/Z1 = n × Ei εi,e e^{i(ki·x − ωt)}/Z1 = Ei εi,m e^{i(ki·x − ωt)}/Z1 = Hi εi,m e^{i(ki·x − ωt)},  (8.47)

where we distinguish the polarization vectors of the electric field (εi,e) and the magnetic field (εi,m). For the notation and geometry of the fields Ei and Hi, see Fig. 7.4. Comparing coefficients in the last relation of (8.47), we get

Ei/Z1 = Hi.  (8.48)

On the basis of (8.42)–(8.46), we are able to determine Er, Et, Hi, Hr, and Ht. What we wish to determine, however, is the ratio among those quantities. To this end, dividing (8.42) and (8.43) by Ei (>0), we define the following quantities:

R⊥E ≡ Er/Ei and T⊥E ≡ Et/Ei,  (8.49)

where R⊥E and T⊥E are said to be the reflection coefficient and transmission coefficient of the electric field, respectively; the symbol ⊥ denotes a quantity of the TE wave (i.e., the electric field oscillating vertically with respect to the incidence plane). Thus, rewriting (8.42) and (8.43) using R⊥E and T⊥E, we have

R⊥E − T⊥E = −1,
(cos θ/Z1) R⊥E + (cos ϕ/Z2) T⊥E = cos θ/Z1.  (8.50)

Using Cramer's rule of matrix algebra, we have the solution

R⊥E = (Z2 cos θ − Z1 cos ϕ)/(Z2 cos θ + Z1 cos ϕ),  (8.51)

T⊥E = 2Z2 cos θ/(Z2 cos θ + Z1 cos ϕ).  (8.52)
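The pair of simultaneous equations (8.50) can also be solved numerically. The sketch below, with assumed angles and impedance values, checks that a direct linear solve reproduces the closed forms (8.51) and (8.52):

```python
import math
import numpy as np

# Assumed example values; any 0 < theta, phi < pi/2 and positive Z work.
theta, phi = math.radians(30.0), math.radians(20.0)
z1, z2 = 377.0, 251.0

# (8.50):  R - T = -1;  (cos(theta)/Z1) R + (cos(phi)/Z2) T = cos(theta)/Z1
a = np.array([[1.0, -1.0],
              [math.cos(theta) / z1, math.cos(phi) / z2]])
b = np.array([-1.0, math.cos(theta) / z1])
r_num, t_num = np.linalg.solve(a, b)

# Closed forms (8.51) and (8.52):
d = z2 * math.cos(theta) + z1 * math.cos(phi)
r_closed = (z2 * math.cos(theta) - z1 * math.cos(phi)) / d
t_closed = 2.0 * z2 * math.cos(theta) / d
print(abs(r_num - r_closed), abs(t_num - t_closed))  # both ~0
```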

Similarly, defining

R⊥H ≡ Hr/Hi and T⊥H ≡ Ht/Hi,  (8.53)

where R⊥H and T⊥H are said to be the reflection coefficient and transmission coefficient of the magnetic field, respectively, we get

R⊥H = (Z1 cos ϕ − Z2 cos θ)/(Z2 cos θ + Z1 cos ϕ),  (8.54)

T⊥H = 2Z1 cos θ/(Z2 cos θ + Z1 cos ϕ).  (8.55)

In this case, rewrite (8.42) as a relation among Hi, Hr, and Ht using (8.45) and (8.46). The derivation of (8.54) and (8.55) is left to readers. Notice also that


Table 8.1 Reflection and transmission coefficients of TE and TM waves

⊥ Incidence (TE):
  R⊥E = (Z2 cos θ − Z1 cos ϕ)/(Z2 cos θ + Z1 cos ϕ)
  R⊥H = (Z1 cos ϕ − Z2 cos θ)/(Z2 cos θ + Z1 cos ϕ)
  T⊥E = 2Z2 cos θ/(Z2 cos θ + Z1 cos ϕ)
  T⊥H = 2Z1 cos θ/(Z2 cos θ + Z1 cos ϕ)
  R⊥H = −R⊥E
  −R⊥E R⊥H + T⊥E T⊥H (cos ϕ/cos θ) = 1  (R⊥ + T⊥ = 1)

∥ Incidence (TM):
  R∥E = (Z2 cos ϕ − Z1 cos θ)/(Z1 cos θ + Z2 cos ϕ)
  R∥H = (Z1 cos θ − Z2 cos ϕ)/(Z1 cos θ + Z2 cos ϕ)
  T∥E = 2Z2 cos θ/(Z1 cos θ + Z2 cos ϕ)
  T∥H = 2Z1 cos θ/(Z1 cos θ + Z2 cos ϕ)
  R∥H = −R∥E
  −R∥E R∥H + T∥E T∥H (cos ϕ/cos θ) = 1  (R∥ + T∥ = 1)

R⊥H = −R⊥E.  (8.56)

This relation can easily be derived from (8.45).

Example 8.2 TM Wave In a manner similar to that described above, we obtain information about the TM wave. Switching the roles of E and H, we assume that H is polarized along the y-axis with E polarized in the zx-plane. Following the aforementioned procedures, we have

Ei cos θ + Er cos θ = Et cos ϕ,  (8.57)

Hi + Hr = Ht.  (8.58)

From (8.57) and (8.58), similarly we get

R∥E = (Z2 cos ϕ − Z1 cos θ)/(Z1 cos θ + Z2 cos ϕ),  (8.59)

T∥E = 2Z2 cos θ/(Z1 cos θ + Z2 cos ϕ).  (8.60)

Also, we get

R∥H = (Z1 cos θ − Z2 cos ϕ)/(Z1 cos θ + Z2 cos ϕ) = −R∥E,  (8.61)

T∥H = 2Z1 cos θ/(Z1 cos θ + Z2 cos ϕ).  (8.62)

The symbol ∥ in (8.59)–(8.62) denotes a quantity of the TM wave. In Table 8.1, we list the important coefficients related to the reflection and transmission of electromagnetic waves along with their mutual relationships.
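The coefficients of Table 8.1 can be collected into two small routines. The sketch below uses assumed impedance and angle values and verifies numerically the relations R_H = −R_E and the identities (8.79)/(8.80) derived later:

```python
import math

def fresnel_te(z1, z2, theta, phi):
    """TE-wave coefficients (8.51), (8.52), (8.54), (8.55):
    returns (R_E, T_E, R_H, T_H)."""
    d = z2 * math.cos(theta) + z1 * math.cos(phi)
    r_e = (z2 * math.cos(theta) - z1 * math.cos(phi)) / d
    t_e = 2.0 * z2 * math.cos(theta) / d
    r_h = (z1 * math.cos(phi) - z2 * math.cos(theta)) / d
    t_h = 2.0 * z1 * math.cos(theta) / d
    return r_e, t_e, r_h, t_h

def fresnel_tm(z1, z2, theta, phi):
    """TM-wave coefficients (8.59)-(8.62)."""
    d = z1 * math.cos(theta) + z2 * math.cos(phi)
    r_e = (z2 * math.cos(phi) - z1 * math.cos(theta)) / d
    t_e = 2.0 * z2 * math.cos(theta) / d
    r_h = (z1 * math.cos(theta) - z2 * math.cos(phi)) / d
    t_h = 2.0 * z1 * math.cos(theta) / d
    return r_e, t_e, r_h, t_h

# Assumed example: impedances of two media and arbitrary angles.
theta, phi = 0.5, 0.3
for coeffs in (fresnel_te(377.0, 251.0, theta, phi),
               fresnel_tm(377.0, 251.0, theta, phi)):
    r_e, t_e, r_h, t_h = coeffs
    print(r_h + r_e)  # ~0: R_H = -R_E, cf. (8.56) and (8.61)
    print(-r_e * r_h + t_e * t_h * math.cos(phi) / math.cos(theta))  # ~1
```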


In Examples 8.1 and 8.2, we have examined how the reflection and transmission coefficients vary as functions of the characteristic impedance as well as the incidence and refraction angles. Meanwhile, in a nonmagnetic substance, the refractive index n can be approximated as

n ≈ √εr,  (7.57)

assuming that μr ≈ 1. In this case, we have

Z = √(μ/ε) = √(μr μ0/εr ε0) ≈ Z0/√εr ≈ Z0/n.

Using this relation, we can readily rewrite the reflection and transmission coefficients as functions of the refractive indices of the dielectrics. The derivation is left to the readers.

8.4 Energy Transport by Electromagnetic Waves

Returning to (7.58) and (7.59), let us consider energy transport in a dielectric medium by electromagnetic waves. We describe the electric (E) and magnetic (H) fields of the electromagnetic waves in a uniform and infinite dielectric medium as

E = E εe e^{i(kn·x − ωt)},  (8.63)

H = H εm e^{i(kn·x − ωt)},  (8.64)

where εe and εm are unit polarization vectors; we assume that both E and H are positive. Notice again that εe, εm, and n constitute a right-handed system in this order. The energy transport is characterized by the Poynting vector S, described by

S = E × H.  (8.65)

Since E and H have the dimensions [V/m] and [A/m], respectively, S has the dimension [W/m²]. Hence, S represents an energy flow per unit time and per unit area with respect to the propagation direction. For simplicity, let us assume that the electromagnetic wave is propagating in the z-direction. Then we have

E = E εe e^{i(kz − ωt)},  (8.66)


H = H εm e^{i(kz − ωt)}.  (8.67)

To find the time-averaged energy flow in the z-direction, it suffices to multiply the real parts of (8.66) and (8.67) and integrate the product over one period T at the point z = 0. Thus, the time-averaged Poynting vector S̄ is given by

S̄ = e3 (EH/T) ∫₀ᵀ cos² ωt dt,  (8.68)

where T = 1/ν = 2π/ω. Using the trigonometric formula

cos² ωt = (1/2)(1 + cos 2ωt),  (8.69)

the integration can easily be performed. Thus, we get

S̄ = (1/2) EH e3.  (8.70)

Equivalently, we have

S̄ = (1/2) E × H.  (8.71)

Meanwhile, the energy density W is given by

W = (1/2)(E·D + H·B),  (8.72)

where the first and second terms pertain to the electric and magnetic fields, respectively. Note in (8.72) that the dimension of E·D is [(V/m)·(C/m²)] = [J/m³] and that the dimension of H·B is [(A/m)·(V·s/m²)] = [W·s/m³] = [J/m³]. Using (7.7) and (7.10), we have

W = (1/2)(εE² + μH²).  (8.73)

As in the above case, estimating the time-averaged energy density W̄, we get

W̄ = (1/2)·(1/2)εE² + (1/2)·(1/2)μH² = (1/4)εE² + (1/4)μH².  (8.74)

We also get this relation by integrating (8.73) over a wavelength λ at time t = 0. Using (7.60) and (7.61), we have


εE² = μH².  (8.75)

This implies that the energy density resulting from the electric field and that due to the magnetic field have the same value. Thus, rewriting (8.74), we have

W̄ = (1/2)εE² = (1/2)μH².  (8.76)

Moreover, using (7.43), we have for the impedance

Z = E/H = √(μ/ε) = μv, or E = μvH.  (8.77)

Using this relation along with (8.75), we get

S̄ = (1/2)vεE² e3 = (1/2)vμH² e3.  (8.78)
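The chain (8.70), (8.76), (8.78) implies that the time-averaged energy flux equals the phase velocity times the time-averaged energy density, S̄ = vW̄. A quick numerical check, using vacuum constants and an assumed field amplitude:

```python
import math

# Check of (8.70), (8.76), (8.78): time-averaged Poynting magnitude
# equals phase velocity times time-averaged energy density, S = v*W.
eps0 = 8.8541878128e-12     # permittivity [F/m]
mu0 = 4e-7 * math.pi        # permeability [H/m]
E = 100.0                   # field amplitude [V/m] (assumed value)

v = 1.0 / math.sqrt(eps0 * mu0)     # phase velocity (speed of light here)
Z = math.sqrt(mu0 / eps0)           # impedance; H = E/Z by (8.77)
H = E / Z

S = 0.5 * E * H                     # (8.70), magnitude [W/m^2]
W = 0.5 * eps0 * E ** 2             # (8.76), [J/m^3]
print(abs(S - v * W))  # ~0: energy flux = velocity x energy density
```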

Thus, we have various relations among the amplitudes of electromagnetic waves and related physical quantities together with the constants of the dielectrics. Returning to Examples 8.1 and 8.2, let us further investigate the reflection and transmission properties of the electromagnetic waves. From (8.51)–(8.55) as well as (8.59)–(8.62), we get in both the cases of TE and TM waves

−R⊥E R⊥H + T⊥E T⊥H (cos ϕ/cos θ) = 1,  (8.79)

−R∥E R∥H + T∥E T∥H (cos ϕ/cos θ) = 1.  (8.80)

In both the TE and TM cases, we define the reflectance R and transmittance T such that

R ≡ −RE RH = RE² = |S̄r|/|S̄i|,  (8.81)

where S̄r and S̄i are the time-averaged Poynting vectors of the reflected and incident waves, respectively. Also, we have

T ≡ TE TH (cos ϕ/cos θ) = (|S̄t|/|S̄i|)(cos ϕ/cos θ),  (8.82)

where S̄t is the time-averaged Poynting vector of the transmitted wave. Thus, we have


Fig. 8.6 Luminous flux near the interface

R + T = 1.  (8.83)

The relation (8.83) represents energy conservation. The factor cos ϕ/cos θ can be understood from Fig. 8.6, which depicts a luminous flux near the interface. Suppose that we have an incident wave of irradiance I [W/m²] whose incidence plane is the zx-plane. Notice that I has the same dimension as the Poynting vector. Here, let us think of the luminous flux that passes through a unit area (i.e., a unit square) perpendicular to the propagation direction of the light. This flux then illuminates an area on the interface of unit length (in the y-direction) multiplied by a length of cos ϕ/cos θ (in the x-direction). That is, the luminous flux has been widened (or squeezed) by cos ϕ/cos θ times after passing through the interface, and the irradiance is weakened (or strengthened) accordingly (see Fig. 8.6). Thus, to balance the income and outgo of the luminous flux before and after passing through the interface, the transmission irradiance must be multiplied by the factor cos ϕ/cos θ.
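The energy balance (8.83) can be verified numerically from the coefficients of Sect. 8.3. A self-contained sketch for a TE wave, with assumed refractive indices and impedances Z = Z0/n of nonmagnetic media (Z0 cancels, so it is set to 1):

```python
import math

# Numerical check of R + T = 1, (8.83), for a TE wave, using (8.51),
# (8.52), (8.54), (8.55) with Z = Z0/n for nonmagnetic media.
n1, n2 = 1.0, 1.5            # refractive indices of D1 and D2 (assumed)
theta = math.radians(35.0)   # incidence angle
phi = math.asin(n1 * math.sin(theta) / n2)   # Snell's law (8.39)
z1, z2 = 1.0 / n1, 1.0 / n2  # Z0 cancels in R and T, so set Z0 = 1

d = z2 * math.cos(theta) + z1 * math.cos(phi)
r_e = (z2 * math.cos(theta) - z1 * math.cos(phi)) / d   # (8.51)
t_e = 2.0 * z2 * math.cos(theta) / d                    # (8.52)
r_h = (z1 * math.cos(phi) - z2 * math.cos(theta)) / d   # (8.54)
t_h = 2.0 * z1 * math.cos(theta) / d                    # (8.55)

R = -r_e * r_h                                   # reflectance (8.81)
T = t_e * t_h * math.cos(phi) / math.cos(theta)  # transmittance (8.82)
print(R + T)  # 1.0 up to rounding error
```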

8.5 Brewster Angles and Critical Angles

In this section and subsequent sections, we deal with nonmagnetic substances as dielectrics; namely, we assume μr ≈ 1. In that case, as mentioned in Sect. 8.3, we rewrite, e.g., (8.51) and (8.59) as

R⊥E = (cos θ − n cos ϕ)/(cos θ + n cos ϕ),  (8.84)

R∥E = (cos ϕ − n cos θ)/(cos ϕ + n cos θ),  (8.85)

where n (= n2/n1) is the relative refractive index of D2 relative to D1. Let us think of the condition on which R⊥E = 0 or R∥E = 0. First, we consider (8.84). We have

[numerator of (8.84)] = cos θ − n cos ϕ = cos θ − (sin θ/sin ϕ) cos ϕ
  = (sin ϕ cos θ − sin θ cos ϕ)/sin ϕ = sin(ϕ − θ)/sin ϕ,  (8.86)

where with the second equality we used Snell's law; the last equality is due to a trigonometric formula. Since we assume 0 < θ < π/2 and 0 < ϕ < π/2, we have −π/2 < ϕ − θ < π/2. Therefore, sin(ϕ − θ) = 0 if and only if ϕ − θ = 0. Namely, only when ϕ = θ could R⊥E vanish. For different dielectrics having different refractive indices, we have ϕ = θ only for ϕ = θ = 0 (i.e., vertical incidence). But in that case we obtain

lim_{ϕ→0, θ→0} sin(ϕ − θ)/sin ϕ = 0/0.

This is a limit of indeterminate form. From (8.84), however, we have

R⊥E = (1 − n)/(1 + n),  (8.87)

for ϕ = θ = 0. This implies that R⊥E does not vanish at ϕ = θ = 0. Thus, R⊥E never vanishes for any θ or ϕ. Note that under this condition we naturally have

R∥E = (1 − n)/(1 + n).

This is because with ϕ = θ = 0 there is no physical difference between the TE and TM waves. In turn, let us examine (8.85) similarly for the case of the TM wave. We have

[numerator of (8.85)] = cos ϕ − n cos θ = cos ϕ − (sin θ/sin ϕ) cos θ
  = (sin ϕ cos ϕ − sin θ cos θ)/sin ϕ = sin(ϕ − θ) cos(ϕ + θ)/sin ϕ.  (8.88)

With the last equality of (8.88), we used a trigonometric formula. From (8.86) we know that sin(ϕ − θ)/sin ϕ does not vanish. Therefore, for R∥E to vanish, we need cos(ϕ + θ) = 0. Since 0 < ϕ + θ < π, cos(ϕ + θ) = 0 if and only if

ϕ + θ = π/2.  (8.89)

In other words, for the particular angles θ = θB and ϕ = ϕB that satisfy

ϕB + θB = π/2,  (8.90)

we have R∥E = 0, i.e., we do not observe a reflected wave. The particular angle θB is said to be the Brewster angle. For θB we have

sin ϕB = sin(π/2 − θB) = cos θB,

n = sin θB/sin ϕB = sin θB/cos θB = tan θB, or θB = tan⁻¹ n, ϕB = tan⁻¹ n⁻¹.  (8.91)

Suppose that we have a parallel plate consisting of a dielectric D2 of refractive index n2 sandwiched between another dielectric D1 of refractive index n1 (Fig. 8.7). Let θB be the Brewster angle when the TM wave is incident from D1 to D2. In the above discussion, we defined the relative refractive index n of D2 relative to D1 as n = n2/n1; recall (8.39). The other way around, suppose that the TM wave is incident from D2 to D1. Then, the relative refractive index of D1 relative to D2 is n1/n2 = n⁻¹. In this situation, the other Brewster angle (from D2 to D1), defined as θ̃B, is given by

θ̃B = tan⁻¹ n⁻¹.  (8.92)

This number is, however, identical to ϕB in (8.91). Thus, we have

θ̃B = ϕB.  (8.93)

Thus, regarding the TM wave that is propagating in D2 after getting through the interface and is about to get back into D1, θ̃B = ϕB is again the Brewster angle. In this way, the said TM wave propagates from D1 to D2 and then gets back from D2 to D1 without being reflected at either interface. This conspicuous feature is often utilized in optical devices.
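The Brewster-angle relations (8.90)–(8.93) are easy to confirm numerically. A short sketch with assumed air/glass indices:

```python
import math

# Brewster angles for a TM wave, per (8.91)-(8.93). The indices below
# (air/glass) are assumed example values, not taken from the text.
n1, n2 = 1.0, 1.5
n = n2 / n1                     # relative refractive index (8.39)

theta_b = math.atan(n)          # Brewster angle from D1 to D2, (8.91)
phi_b = math.atan(1.0 / n)      # refraction angle at Brewster incidence

print(math.degrees(theta_b))            # about 56.3 degrees
print(math.degrees(theta_b + phi_b))    # 90.0: confirms (8.90)

# Incidence from D2 back to D1: the Brewster angle equals phi_b, (8.93)
theta_b_tilde = math.atan(n1 / n2)
print(abs(theta_b_tilde - phi_b))       # 0: the two angles coincide
```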


Fig. 8.7 Diagram that explains the Brewster angle. Suppose that a parallel plate consisting of a dielectric D2 of a refractive index n2 is sandwiched with another dielectric D1 of a refractive index n1. The incidence angle θB represents the Brewster angle observed when the TM wave is incident from D1 to D2. ϕB is another Brewster angle that is observed when the TM wave is getting back from D2 to D1

If an electromagnetic wave is incident from a dielectric of a higher refractive index to one of a lower index, the total reflection takes place. This is equally the case with both TE and TM waves. For the total reflection to take place, θ should be larger than the critical angle θc defined by

θc = sin⁻¹ n.  (8.94)

This is because at θc, from Snell's law, we have

sin θc/sin(π/2) = sin θc = n2/n1 (= n).  (8.95)

From (8.95), we have

tan θc = n/√(1 − n²) > n = tan θB.

In the case of the TM wave, therefore, we find that

θc > θB.  (8.96)

The critical angle is always larger than the Brewster angle for TM waves.

8.6 Total Reflection

In Sect. 8.2 we saw that Snell's law results from a kinematic requirement. For this reason, we may consider it a universal relation that can be extended to complex refraction angles. In fact, for Snell's law to hold with θ > θc, we must have


sin ϕ > 1.  (8.97)

This requires us to extend ϕ to a complex domain. Putting

ϕ = π/2 + ia (a: real, a ≠ 0),  (8.98)

we have

sin ϕ ≡ (1/2i)(e^{iϕ} − e^{−iϕ}) = (1/2)(e^{−a} + e^{a}) > 1,  (8.99)

cos ϕ ≡ (1/2)(e^{iϕ} + e^{−iϕ}) = (i/2)(e^{−a} − e^{a}).  (8.100)

Thus, cos ϕ is purely imaginary. Now, let us consider a transmitted wave whose electric field is described as

Et = E εt e^{i(kt·x − ωt)},  (8.101)

where εt is the unit polarization vector and kt is the wavenumber vector of the transmitted wave. Suppose that the incidence plane is the zx-plane. Then, we have

kt·x = ktx x + ktz z = x kt sin ϕ + z kt cos ϕ,  (8.102)

where ktx and ktz are the x and z components of kt, respectively; kt = |kt|. Putting

cos ϕ = ib (b: real, b ≠ 0),  (8.103)

we have

Et = E εt e^{i(x kt sin ϕ + i b z kt − ωt)} = E εt e^{i(x kt sin ϕ − ωt)} e^{−b z kt}.  (8.104)

For the total reflection, we must have

z → ∞ ⟹ e^{−b z kt} → 0.  (8.105)

To meet this requirement, we must have

b > 0.  (8.106)

Meanwhile, we have


cos² ϕ = 1 − sin² ϕ = 1 − sin² θ/n² = (n² − sin² θ)/n²,  (8.107)

where notice that n < 1, because we are dealing with incidence of light from a medium of higher refractive index into one of lower index. When we consider the total reflection, the numerator of (8.107) is negative, and so we have two choices:

cos ϕ = ±i √(sin² θ − n²)/n.  (8.108)

From (8.103) and (8.106), we get

cos ϕ = i √(sin² θ − n²)/n.  (8.109)

Hence, inserting (8.109) into (8.84), we have for the TE wave

R⊥E = [cos θ − i√(sin² θ − n²)]/[cos θ + i√(sin² θ − n²)].  (8.110)

Then, we have

R⊥ = −R⊥E R⊥H = R⊥E (R⊥E)* = {[cos θ − i√(sin² θ − n²)]/[cos θ + i√(sin² θ − n²)]} · {[cos θ + i√(sin² θ − n²)]/[cos θ − i√(sin² θ − n²)]} = 1.  (8.111)

As for the TM wave, substituting (8.109) into (8.85), we have

R∥E = [−n² cos θ + i√(sin² θ − n²)]/[n² cos θ + i√(sin² θ − n²)].  (8.112)

In this case, we also get

R∥ = {[−n² cos θ + i√(sin² θ − n²)]/[n² cos θ + i√(sin² θ − n²)]} · {[−n² cos θ − i√(sin² θ − n²)]/[n² cos θ − i√(sin² θ − n²)]} = 1.  (8.113)

The relations (8.111) and (8.113) ensure that the energy flow gets back into the medium of higher refractive index. Thus, the total reflection is characterized by the complex reflection coefficients (8.110) and (8.112) as well as by a reflectance of 1. From (8.110) and
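The unit modulus of (8.110) is easy to verify numerically. A minimal sketch, assuming a glass-to-air relative index n = 1/1.5:

```python
import cmath
import math

# Total-reflection check for the TE wave, (8.110): beyond the critical
# angle the reflection coefficient is complex with unit modulus; its
# argument gives the phase shift alpha of (8.114).
n = 1.0 / 1.5                          # assumed relative index, n < 1
theta_c = math.asin(n)                 # critical angle (8.94)
theta = math.radians(60.0)             # > theta_c, total reflection
s = math.sqrt(math.sin(theta) ** 2 - n ** 2)
r_te = (math.cos(theta) - 1j * s) / (math.cos(theta) + 1j * s)   # (8.110)

print(abs(r_te))            # 1.0: reflectance is unity, (8.111)
print(cmath.phase(r_te))    # phase shift alpha (negative), cf. (8.120)
```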


(8.112), we can estimate the change in phase of the electromagnetic wave that takes place by virtue of the total reflection. For this purpose, we put

R⊥E ≡ e^{iα} and R∥E ≡ e^{iβ}.  (8.114)

Rewriting (8.110), we have

R⊥E = [cos² θ − (sin² θ − n²) − 2i cos θ √(sin² θ − n²)]/(1 − n²).  (8.115)

At the critical angle θc, from (8.95) we have

sin θc = n.  (8.116)

Therefore, we have

1 − n² = cos² θc.  (8.117)

Then, as expected, we get

R⊥E |_{θ=θc} = 1.  (8.118)

Note, however, that at θ = π/2 (i.e., grazing incidence), we have

R⊥E |_{θ=π/2} = −1.  (8.119)

From (8.115), the argument α in the complex plane is given by

tan α = −2 cos θ √(sin² θ − n²)/[cos² θ − (sin² θ − n²)].  (8.120)

The argument α defines the phase shift upon total reflection. Considering (8.115) and (8.118), we have

α|_{θ=θc} = 0.

Since 1 − n² > 0 (i.e., n < 1) and in the total reflection region sin² θ − n² > 0, the imaginary part of R⊥E is negative for any θ in this region. On the other hand, the real part of R⊥E varies from 1 to −1, as is evidenced from (8.118) and (8.119). At θ0 that satisfies the following condition:


Fig. 8.8 Phase shift α defined in a complex plane for the total reflection of TE wave. The number n denotes a relative refractive index of D2 relative to D1. At a critical angle θc, α = 0

sin θ0 = √[(1 + n²)/2],  (8.121)

the real part is zero. Thus, the phase shift α varies from 0 to −π, as indicated in Fig. 8.8. Comparing (8.121) with (8.116) and taking into account n < 1, we have

θc < θ0 < π/2.

Similarly, we estimate the phase change for a TM wave. Rewriting (8.112), we have

R∥E = [−n⁴ cos² θ + (sin² θ − n²) + 2in² cos θ √(sin² θ − n²)]/[n⁴ cos² θ + (sin² θ − n²)]
    = [−n⁴ cos² θ + (sin² θ − n²) + 2in² cos θ √(sin² θ − n²)]/[(1 − n²)(sin² θ − n² cos² θ)].  (8.122)

Then, we have
\[ \left. R_E^{\parallel} \right|_{\theta=\theta_c} = -1. \tag{8.123} \]

Also at θ = π/2 (i.e., grazing incidence), we have

\[ \left. R_E^{\parallel} \right|_{\theta=\pi/2} = 1. \tag{8.124} \]

From (8.122), the argument β is given by

\[ \tan\beta = \frac{2n^2\cos\theta\sqrt{\sin^2\theta - n^2}}{-n^4\cos^2\theta + (\sin^2\theta - n^2)}. \tag{8.125} \]

Considering (8.122) and (8.123), we have

\[ \beta|_{\theta=\theta_c} = \pi. \tag{8.126} \]

In the total reflection region, we have

\[ \sin^2\theta - n^2\cos^2\theta > n^2 - n^2\cos^2\theta = n^2(1 - \cos^2\theta) > 0. \tag{8.127} \]

Therefore, the denominator of (8.122) is positive and, hence, the imaginary part of R_E^∥ is positive as well for any θ in the total reflection region. From (8.123) and (8.124), on the other hand, the real part of R_E^∥ in (8.122) varies from -1 to 1. At θ̃0 that satisfies the following condition:

\[ \cos\tilde{\theta}_0 = \sqrt{\frac{1 - n^2}{1 + n^4}}, \tag{8.128} \]

the real part of R_E^∥ is zero. Once again, we have

\[ \theta_c < \tilde{\theta}_0 < \pi/2. \tag{8.129} \]

Thus, the phase β varies from π to 0 as depicted in Fig. 8.9. In this section, we mentioned somewhat peculiar features of complex trigonometric functions, such as sin ϕ > 1, viewed in light of real functions. As a matter of course, the complex angle ϕ should be determined experimentally from (8.109). In this context readers are referred to Chap. 6, which dealt with the theory of analytic functions [2].
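These phase relations are easy to verify numerically. The following sketch (the value of n and the tolerances are illustrative assumptions) implements R_E^⊥ in the form of (8.110) and R_E^∥ in a form consistent with (8.122)-(8.124); it checks the unit reflectance, the limiting phases (8.118), (8.119), (8.123), and (8.124), and the zero crossings of the real parts at θ0 of (8.121) and θ̃0 of (8.128):

```python
import numpy as np

def r_te(theta, n):
    # TE reflection coefficient in the total reflection region, cf. (8.110)
    s = np.sqrt(np.sin(theta)**2 - n**2)
    return (np.cos(theta) - 1j*s) / (np.cos(theta) + 1j*s)

def r_tm(theta, n):
    # TM reflection coefficient in a form consistent with (8.122)-(8.124)
    s = np.sqrt(np.sin(theta)**2 - n**2)
    return (-n**2*np.cos(theta) + 1j*s) / (n**2*np.cos(theta) + 1j*s)

n = 0.7                          # illustrative relative refractive index (n < 1)
theta_c = np.arcsin(n)           # critical angle, (8.116)
thetas = np.linspace(theta_c + 1e-9, np.pi/2 - 1e-9, 1001)

# total reflection: |R| = 1 throughout (theta_c, pi/2)
assert np.allclose(np.abs(r_te(thetas, n)), 1.0)
assert np.allclose(np.abs(r_tm(thetas, n)), 1.0)

# limiting phases: alpha runs from 0 to -pi, beta from pi to 0
assert abs(np.angle(r_te(thetas[0], n))) < 1e-3                      # (8.118)
assert np.isclose(np.angle(r_te(thetas[-1], n)), -np.pi, atol=1e-3)  # (8.119)
assert np.isclose(np.angle(r_tm(thetas[0], n)), np.pi, atol=1e-3)    # (8.123)
assert abs(np.angle(r_tm(thetas[-1], n))) < 1e-3                     # (8.124)

# the real parts vanish at theta_0 of (8.121) and theta~_0 of (8.128)
theta_0 = np.arcsin(np.sqrt((1 + n**2)/2))
theta_0t = np.arccos(np.sqrt((1 - n**2)/(1 + n**4)))
assert abs(r_te(theta_0, n).real) < 1e-9 and abs(r_tm(theta_0t, n).real) < 1e-9
```

Plotting `np.angle(r_te(thetas, n))` and `np.angle(r_tm(thetas, n))` against θ reproduces the monotonic phase curves of Figs. 8.8 and 8.9.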

Fig. 8.9 Phase shift β for the total reflection of TM wave. At a critical angle θc, β = π

8.7 Waveguide Applications

There are many optical devices based upon light propagation. Among them, waveguide devices utilize the total reflection. We explain their operation principle. Suppose that we have a thin plate (usually called a slab) comprising a dielectric medium that spreads infinitely in two dimensions and that the plate is sandwiched between another dielectric (or maybe air or vacuum) or metal. In this situation electromagnetic waves are confined within the slab. Moreover, only under restricted conditions are those waves allowed to propagate parallel to the slab plane. Such electromagnetic waves are usually called propagation modes or simply "modes." An optical device thus designed is called a waveguide. These modes are characterized by repeated total reflection during the propagation. Another mode is an evanescent mode. Because of the total reflection, the energy transport is not allowed to take place perpendicular to the interface of the two dielectrics. For this reason, the evanescent mode propagates parallel to the interface and very close to it.

8.7.1 TE and TM Waves in a Waveguide

In a waveguide configuration, propagating waves are classified into TE and TM modes. The materials that constitute a waveguide largely govern the propagation modes within it.

Fig. 8.10 Cross-section of a slab waveguide comprising a dielectric medium. (a) A waveguide is sandwiched between a couple of metal layers. (b) A waveguide is sandwiched between a couple of layers consisting of another dielectric called clad layer. The sandwiched layer is called core layer

Figure 8.10 depicts a cross-section of a slab waveguide. We assume that the electromagnetic wave propagates in the positive direction of the z-axis and that the waveguide spreads infinitely along the z- and x-axes. Suppose that the said waveguide is spatially confined along the y-axis. Let the thickness of the waveguide be d. From the point of view of the material that shapes a waveguide, waveguides are classified into two types. (i) Electromagnetic waves are completely confined within the waveguide layer. This case typically happens when a dielectric forming the waveguide is sandwiched between a couple of metal layers (Fig. 8.10a), because the electromagnetic wave is not allowed to exist or propagate inside the metal. (ii) Electromagnetic waves are not completely confined within the waveguide. This case happens when the dielectric of the waveguide is sandwiched between a couple of other dielectrics. We refer to this case as total internal reflection to distinguish it from case (i) above; we will further describe it in Sect. 8.7.2. For the total internal reflection to take place, the refractive index of the waveguide must be higher than those of the other dielectrics (Fig. 8.10b). The dielectric of the waveguide is called the core layer and the other dielectric is called the clad layer. In this case, electromagnetic waves are allowed to propagate inside the clad layer, even though the region is confined very close to the interface between the clad and core layers. Such electromagnetic waves are said to be evanescent waves. According to these two cases (i) and (ii), we have different conditions under which the allowed modes can exist.

Now, let us return to Maxwell's equations. We introduced the equations of wave motion (7.35) and (7.36) from Maxwell's equations (7.28) and (7.29) along with (7.7) and (7.10). One of their simplest solutions is a plane wave described by (7.53). The plane wave is characterized by the fact that the wave has the same phase on an infinitely spreading plane perpendicular to the propagation direction (characterized by a wavenumber vector k). In a waveguide, however, the electromagnetic field is confined with respect to the direction parallel to the normal to the slab plane (i.e., the direction of the y-axis in Fig. 8.10). Consequently, the electromagnetic field can no


longer have the same phase in that direction. Yet, as solutions of the equations of wave motion, we can have a solution that has the same phase along the direction of the x-axis. Bearing in mind such a situation, let us think of Maxwell's equations in relation to the equations of wave motion. Ignoring components related to partial differentiation with respect to x (i.e., the components related to ∂/∂x) and rewriting (7.28) and (7.29) for the individual Cartesian coordinates, we have [3]

\[ \frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z} + \frac{\partial B_x}{\partial t} = 0, \tag{8.130} \]

\[ \frac{\partial E_x}{\partial z} + \frac{\partial B_y}{\partial t} = 0, \tag{8.131} \]

\[ -\frac{\partial E_x}{\partial y} + \frac{\partial B_z}{\partial t} = 0, \tag{8.132} \]

\[ \frac{\partial H_z}{\partial y} - \frac{\partial H_y}{\partial z} - \frac{\partial D_x}{\partial t} = 0, \tag{8.133} \]

\[ \frac{\partial H_x}{\partial z} - \frac{\partial D_y}{\partial t} = 0, \tag{8.134} \]

\[ -\frac{\partial H_x}{\partial y} - \frac{\partial D_z}{\partial t} = 0. \tag{8.135} \]

Of the above equations, we collect those pertinent to E_x and differentiate (8.131), (8.132), and (8.133) with respect to z, y, and t, respectively, to get

\[ \frac{\partial^2 E_x}{\partial z^2} + \frac{\partial^2 B_y}{\partial z \partial t} = 0, \tag{8.136} \]

\[ \frac{\partial^2 E_x}{\partial y^2} - \frac{\partial^2 B_z}{\partial y \partial t} = 0, \tag{8.137} \]

\[ \frac{\partial^2 H_z}{\partial t \partial y} - \frac{\partial^2 H_y}{\partial t \partial z} - \frac{\partial^2 D_x}{\partial t^2} = 0. \tag{8.138} \]

Multiplying (8.138) by μ, adding (8.136) and (8.137) to it, and using (7.7), we get

\[ \frac{\partial^2 E_x}{\partial y^2} + \frac{\partial^2 E_x}{\partial z^2} = \mu\varepsilon \frac{\partial^2 E_x}{\partial t^2}. \tag{8.139} \]

This is a two-dimensional equation of wave motion. In a similar manner, from (8.130), (8.134), and (8.135), we have for the magnetic field

\[ \frac{\partial^2 H_x}{\partial y^2} + \frac{\partial^2 H_x}{\partial z^2} = \mu\varepsilon \frac{\partial^2 H_x}{\partial t^2}. \tag{8.140} \]

Equations (8.139) and (8.140) are two-dimensional wave equations with respect to the y- and z-coordinates. Along the direction of the x-axis, a propagating wave has the same phase. Suppose that we have plane wave solutions for them as in the case of (7.58) and (7.59). Then, we have

\[ \mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)} = \mathbf{E}_0 e^{i(k\mathbf{n}\cdot\mathbf{x} - \omega t)}, \quad \mathbf{H} = \mathbf{H}_0 e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)} = \mathbf{H}_0 e^{i(k\mathbf{n}\cdot\mathbf{x} - \omega t)}. \tag{8.141} \]

Note that a plane wave expressed by (7.58) and (7.59) propagates uniformly in a dielectric medium. In a waveguide, on the other hand, the electromagnetic waves undergo repeated (total) reflections from the two boundaries positioned on either side of the waveguide while being propagated. In a three-dimensional version, the wavenumber vector has three components k_x, k_y, and k_z as expressed in (7.48). In (8.141), in turn, k has y- and z-components such that

\[ k^2 = |\mathbf{k}|^2 = k_y^2 + k_z^2. \tag{8.142} \]
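As a quick symbolic check (a sketch added here, not part of the original text), substituting the plane-wave form of (8.141) into the two-dimensional wave equation (8.139) shows that it is a solution precisely when the dispersion relation k_y² + k_z² = μεω² = k² holds, which is the content of (8.142):

```python
import sympy as sp

y, z, t = sp.symbols('y z t', real=True)
k_y, k_z, omega, mu, eps = sp.symbols('k_y k_z omega mu epsilon', positive=True)

# plane-wave ansatz of (8.141), written out in its y- and z-dependence
E_x = sp.exp(sp.I*(k_y*y + k_z*z - omega*t))

# residual of the two-dimensional wave equation (8.139)
residual = sp.diff(E_x, y, 2) + sp.diff(E_x, z, 2) - mu*eps*sp.diff(E_x, t, 2)
condition = sp.simplify(residual/E_x)   # = mu*eps*omega**2 - k_y**2 - k_z**2

# the residual vanishes exactly on the dispersion relation k_y^2 + k_z^2 = mu*eps*omega^2
assert sp.simplify(condition.subs(eps, (k_y**2 + k_z**2)/(mu*omega**2))) == 0
```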

Equations (8.139) and (8.140) can be rewritten as

\[ \frac{\partial^2 E_x}{\partial(\pm y)^2} + \frac{\partial^2 E_x}{\partial(\pm z)^2} = \mu\varepsilon \frac{\partial^2 E_x}{\partial(\pm t)^2}, \qquad \frac{\partial^2 H_x}{\partial(\pm y)^2} + \frac{\partial^2 H_x}{\partial(\pm z)^2} = \mu\varepsilon \frac{\partial^2 H_x}{\partial(\pm t)^2}. \]

Accordingly, we have four wavenumber vector components

\[ k_y = \pm|k_y| \quad \text{and} \quad k_z = \pm|k_z|. \]

Fig. 8.11 Four possible propagation directions k of an electromagnetic wave in a slab waveguide

Fig. 8.12 Geometries and propagation of the electromagnetic waves in a slab waveguide

Figure 8.11 indicates this situation, where an electromagnetic wave can propagate within a slab waveguide in any one direction out of the four choices of k. In this section, we assume that the electromagnetic wave propagates in the positive direction of the z-axis, and so we define k_z as positive. On the other hand, k_y can be either positive or negative. Thus, we get

\[ k_z = k\sin\theta \quad \text{and} \quad k_y = \pm k\cos\theta. \tag{8.143} \]

Figure 8.12 shows the geometries of the electromagnetic waves within the slab waveguide. The slab plane is parallel to the zx-plane. Let the positions of the two interfaces of the slab waveguide be

\[ y = 0 \quad \text{and} \quad y = d. \tag{8.144} \]

That is, we assume that the thickness of the waveguide is d. Since (8.139) describes a wave equation for only one component E_x, (8.139) is suited for representing a TE wave. In (8.140), in turn, a wave equation is given only for H_x, and hence, it is suited for representing a TM wave. With the TE wave, the electric field oscillates parallel to the slab plane and perpendicular to the propagation direction. With the TM wave, in turn, the magnetic field oscillates parallel to the slab plane and perpendicular to the propagation direction. In a general case, electromagnetic waves in a slab waveguide are formed by superposition of TE and TM waves. Notice that Fig. 8.12 is applicable to both TE and TM waves.

Let us further proceed with the waveguide analysis. The electric field E within the waveguide is described by superposition of the incident and reflected waves. Using the first equation of (8.141) and (8.143), we have

\[ \mathbf{E}(z, y) = \boldsymbol{\varepsilon}_e E_+ e^{i(kz\sin\theta + ky\cos\theta - \omega t)} + \boldsymbol{\varepsilon}_e' E_- e^{i(kz\sin\theta - ky\cos\theta - \omega t)}, \tag{8.145} \]

where E_+ (E_-) and ε_e (ε_e′) represent the amplitude and unit polarization vector of the incident (reflected) wave, respectively. The vector ε_e (or ε_e′) is defined in (7.67). Equation (8.145) is common to both the cases of TE and TM waves.

From now on, we consider the TE mode case. Suppose that the slab waveguide is sandwiched between a couple of metal sheets of high conductance. Since the electric field must be absent inside the metal, the electric field at the interface must be zero owing to the continuity condition on the tangential component of the electric field. Thus, we require that the following condition be met with (8.145):

\[ \mathbf{t}\cdot\mathbf{E}(z, 0) = 0 = \mathbf{t}\cdot\boldsymbol{\varepsilon}_e E_+ e^{i(kz\sin\theta - \omega t)} + \mathbf{t}\cdot\boldsymbol{\varepsilon}_e' E_- e^{i(kz\sin\theta - \omega t)} = (\mathbf{t}\cdot\boldsymbol{\varepsilon}_e E_+ + \mathbf{t}\cdot\boldsymbol{\varepsilon}_e' E_-)\,e^{i(kz\sin\theta - \omega t)}. \tag{8.146} \]

Therefore, since e^{i(kz sin θ - ωt)} never vanishes, we have

\[ \mathbf{t}\cdot\boldsymbol{\varepsilon}_e E_+ + \mathbf{t}\cdot\boldsymbol{\varepsilon}_e' E_- = 0, \tag{8.147} \]

where t is a tangential unit vector at the interface. Since E is polarized along the x-axis, setting ε_e = ε_e′ = e₁ and taking t as e₁, we get

\[ E_+ + E_- = 0. \]

This means that the reflection coefficient of the electric field is -1. Denoting E_+ = -E_- ≡ E_0 (>0), we have

\[ \begin{aligned} \mathbf{E} &= \mathbf{e}_1 E_0 \left[ e^{i(kz\sin\theta + ky\cos\theta - \omega t)} - e^{i(kz\sin\theta - ky\cos\theta - \omega t)} \right] \\ &= \mathbf{e}_1 E_0 \left( e^{iky\cos\theta} - e^{-iky\cos\theta} \right) e^{i(kz\sin\theta - \omega t)} \\ &= \mathbf{e}_1 2iE_0 \sin(ky\cos\theta)\, e^{i(kz\sin\theta - \omega t)}. \end{aligned} \tag{8.148} \]

Requiring the electric field to vanish at the other interface y = d, we have

\[ \mathbf{E}(z, d) = 0 = \mathbf{e}_1 2iE_0 \sin(kd\cos\theta)\, e^{i(kz\sin\theta - \omega t)}. \]

Note that in terms of the boundary conditions, we are thinking of Dirichlet conditions (see Sects. 1.3 and 10.3). In this case, we have nodes for the electric field at the interface between the metal and the dielectric. For this condition to be satisfied, we must have

\[ kd\cos\theta = l\pi \quad (l = 1, 2, \cdots). \tag{8.149} \]

From (8.149), we have the following condition for l:

\[ l \le kd/\pi. \tag{8.150} \]

Meanwhile, we have

\[ k = nk_0, \tag{8.151} \]

where n is the refractive index of the dielectric that shapes the slab waveguide and the quantity k_0 is the wavenumber of the electromagnetic wave in vacuum. The index n is given by

\[ n = c/v, \tag{8.152} \]

where c and v are the light velocities in vacuum and in the dielectric medium, respectively. Here v is meant as the velocity in an infinitely spreading dielectric. Thus, θ is allowed to take several (or more) values depending upon k, d, and l. Since in the z-direction no specific boundary conditions are imposed, we have propagating modes in that direction characterized by a propagation constant (vide infra). Looking at (8.148), we notice that k sin θ plays the role of a wavenumber in free space. For this reason, a quantity β defined as

\[ \beta = k\sin\theta = nk_0\sin\theta \quad (\beta > 0) \tag{8.153} \]

is said to be the propagation constant. In (8.153), k_0 is the wavenumber in vacuum. From (8.149) and (8.153), we get

\[ \beta = \left( k^2 - \frac{l^2\pi^2}{d^2} \right)^{1/2}. \tag{8.154} \]

Thus, the allowed TE waves indexed by l are called TE modes and represented as TE_l. The phase velocity v_p is given by

\[ v_p = \omega/\beta. \tag{8.155} \]

Meanwhile, the group velocity v_g is given by

\[ v_g = \frac{d\omega}{d\beta} = \left( \frac{d\beta}{d\omega} \right)^{-1}. \tag{8.156} \]

Using (8.154) and noting that k² = ω²/v², we get

\[ v_g = v^2\beta/\omega. \tag{8.157} \]

Thus, we have

\[ v_p v_g = v^2. \tag{8.158} \]

Note that in (1.22) of Sect. 1.1, we saw a relationship similar to (8.158). The characteristics of TM waves can be analyzed in a similar manner by examining the magnetic field H_x. In that case, the reflection coefficient of the magnetic field is +1 and we have antinodes for the magnetic field at the interface. Concomitantly, we adopt Neumann conditions as the boundary conditions (see Sects. 1.3 and 8.3). Regardless of the difference in the boundary conditions, however, the discussion including (8.149)-(8.158) applies to the analysis of TM waves. Once H_x is determined, E_y and E_z can be determined as well from (8.134) and (8.135).
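The mode structure (8.149)-(8.158) of the metal-clad slab is easy to tabulate numerically. In the sketch below the wavelength, refractive index, and thickness are illustrative assumptions; the loop enumerates the allowed TE_l modes and verifies (8.153) and (8.158) along the way:

```python
import numpy as np

# Illustrative assumptions: vacuum wavelength, slab refractive index, thickness
lam0, n_slab, d = 1.55e-6, 1.5, 5e-6
c = 2.99792458e8

k0 = 2*np.pi/lam0
k = n_slab*k0              # (8.151)
v = c/n_slab               # (8.152)
omega = c*k0

l_max = int(np.floor(k*d/np.pi))             # (8.150): l <= kd/pi
modes = []
for l in range(1, l_max + 1):
    theta = np.arccos(l*np.pi/(k*d))         # (8.149): kd cos(theta) = l*pi
    beta = np.sqrt(k**2 - (l*np.pi/d)**2)    # (8.154)
    assert np.isclose(beta, k*np.sin(theta)) # (8.153): beta = k sin(theta)
    vp, vg = omega/beta, v**2*beta/omega     # (8.155), (8.157)
    assert np.isclose(vp*vg, v**2)           # (8.158)
    modes.append((l, round(np.degrees(theta), 2), beta))

print(len(modes), "allowed TE modes")
```

For these assumed parameters the slab supports TE_1 through TE_9; thicker slabs or shorter wavelengths raise l_max in accordance with (8.150).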

8.7.2 Total Internal Reflection and Evanescent Waves

If a slab waveguide shaped by a dielectric is sandwiched between a couple of layers of another dielectric (Fig. 8.10b), the situation differs from that of the metal waveguide (Fig. 8.10a) we encountered in Sect. 8.7.1. Suppose in Fig. 8.10b that the former dielectric D1 of refractive index n1 is sandwiched by the latter dielectric D2 of refractive index n2.

Fig. 8.13 Cross-section of the waveguide where the light is propagated in the direction of k. A dielectric fills a semi-infinite space situated below NXY. We suppose another virtual plane N′X′Y′ that is parallel with NXY. We need the plane N′X′Y′ to estimate an optical path difference (or phase difference)

Suppose that an electromagnetic wave is being propagated from D1 toward D2. Then, we must have

\[ n_1 > n_2 \tag{8.159} \]

so that the total internal reflection can take place at the interface of D1 and D2. In this case, the dielectrics D1 and D2 act as a core layer and a clad layer, respectively. The biggest difference from the previous waveguide is that the total internal reflection occurs in the present case. Concomitantly, an evanescent wave is present in the clad layer very close to the interface. First, let us estimate the conditions to be satisfied so that an electromagnetic wave can be propagated within a waveguide. Figure 8.13 depicts a cross-section of the waveguide where the light is propagated in the direction of k. In Fig. 8.13, suppose that we have a normal N to the plane of paper at P. Then, N and a straight line XY shape a plane NXY. Also suppose that a dielectric fills a semi-infinite space situated below NXY. Further suppose that there is another virtual plane N′X′Y′ that is parallel with NXY as shown. Here N′ is parallel to N. The separation of the two parallel planes is d. We need the virtual plane N′X′Y′ just to estimate an optical path difference (or phase difference, more specifically) between two waves, i.e., a propagating wave and a reflected wave. Let n be a unit vector in the direction of k, i.e.,

\[ \mathbf{n} = \mathbf{k}/|\mathbf{k}| = \mathbf{k}/k. \tag{8.160} \]

Then the electromagnetic wave is described as

\[ \mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)} = \mathbf{E}_0 e^{i(k\mathbf{n}\cdot\mathbf{x} - \omega t)}. \tag{8.161} \]

Suppose that we take a coordinate system such that

\[ \mathbf{x} = r\mathbf{n} + s\mathbf{u} + t\mathbf{v}, \tag{8.162} \]

where u and v represent unit vectors in the directions perpendicular to n. Then (8.161) can be expressed by

\[ \mathbf{E} = \mathbf{E}_0 e^{i(kr - \omega t)}. \tag{8.163} \]

Suppose that the wave is propagated starting from a point A to P and reflected at P. Then, the wave is further propagated to B and reflected again to reach Q. The wave front is originally at AB and finally at PQ. Thus, the Z-shaped optical path length APBQ is equal to the separation between A′B′ and PQ. Notice that the separation between A′B′ and P′Q′ is taken so that it is equal to that between AB and PQ. The geometry of Fig. 8.13 implies that two waves starting simultaneously from AB and A′B′ also reach PQ simultaneously. We find the separation between AB and A′B′ is

\[ 2d\cos\theta. \]

Let us tentatively call these waves Wave-AB and Wave-A′B′ and describe their electric fields as E_AB and E_{A′B′}, respectively. Then, we denote

\[ \mathbf{E}_{AB} = \mathbf{E}_0 e^{i(kr - \omega t)}, \tag{8.164} \]

\[ \mathbf{E}_{A'B'} = \mathbf{E}_0 e^{i[k(r + 2d\cos\theta) - \omega t]}, \tag{8.165} \]

where k is the wavenumber in the dielectric. Note that since E_{A′B′} lags behind E_AB, a plus sign appears in the first term of the exponent. Therefore, the phase difference between the two waves is

\[ 2kd\cos\theta. \tag{8.166} \]

Now, let us come back to the actual geometry of the waveguide. That is, the core layer of thickness d is sandwiched between a couple of clad layers (Fig. 8.10b). In this situation, the wave E_AB experiences the total internal reflection twice, an effect we ignored in the above discussion of the metal waveguide. Since the total internal


reflection causes a complex phase shift, we have to take this effect into account. The phase shift was defined as α of (8.114) for a TE mode and β for a TM mode. Notice that in Fig. 8.13 the electric field oscillates perpendicularly to the plane of paper with the TE mode, whereas it oscillates in parallel with the plane of paper with the TM mode. For both cases the electric field oscillates perpendicularly to n. Consequently, the phase shift due to these reflections has to be added to (8.166). Thus, for the phases to be commensurate, the following condition must be satisfied:

\[ 2kd\cos\theta + 2\delta = 2l\pi \quad (l = 0, 1, 2, \cdots), \tag{8.167} \]

where δ is either δ_TE or δ_TM defined below according to the case of the TE wave and TM wave, respectively. For practical purposes, (8.167) is solved numerically, e.g., to design an optical waveguide. Unlike (8.149), what is most important about (8.167) is that the condition l = 0 is permitted because of δ < 0 (see just below). For convenience and according to custom, we adopt a phase shift notation other than that defined in (8.114). With the TE mode, the phase is retained upon reflection at the critical angle, and so we identify α with an additional component δ_TE. In the TM case, on the other hand, the phase is reversed upon reflection at the critical angle (i.e., a π shift occurs). Since this π shift has been incorporated into β, it suffices to consider only an additional component δ_TM. That is, we have

\[ \delta_{TE} \equiv \alpha \quad \text{and} \quad \delta_{TM} \equiv \beta - \pi. \tag{8.168} \]

We rewrite (8.110) as

\[ R_E^{\perp} = e^{i\alpha} = e^{i\delta_{TE}} \equiv \frac{a e^{-i\sigma}}{a e^{i\sigma}} = e^{-2i\sigma} \quad \text{and} \quad \delta_{TE} = -2\sigma \quad (\sigma > 0), \tag{8.169} \]

where we have

\[ a e^{i\sigma} = \cos\theta + i\sqrt{\sin^2\theta - n^2}. \tag{8.170} \]

Therefore,

\[ \tan\sigma = -\tan\frac{\delta_{TE}}{2} = \frac{\sqrt{\sin^2\theta - n^2}}{\cos\theta}. \tag{8.171} \]

Meanwhile, rewriting (8.112) we have

\[ R_E^{\parallel} = e^{i\beta} = e^{i(\delta_{TM} + \pi)} \equiv -\frac{b e^{-i\tau}}{b e^{i\tau}} = -e^{-2i\tau} = e^{i(-2\tau + \pi)}, \tag{8.172} \]

where

\[ b e^{i\tau} = n^2\cos\theta + i\sqrt{\sin^2\theta - n^2}. \tag{8.173} \]

Note that the minus sign in the second last equality of (8.172) is due to the phase reversal upon reflection. From (8.172), we may put

\[ \beta = -2\tau + \pi. \]

Comparing this with the second equation of (8.168), we get

\[ \delta_{TM} = -2\tau \quad (\tau > 0). \tag{8.174} \]

Consequently, we get

\[ \tan\tau = -\tan\frac{\delta_{TM}}{2} = \frac{\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta}. \tag{8.175} \]

Finally, the additional phase changes δ_TE and δ_TM upon the total reflection are given by [4]

\[ \delta_{TE} = -2\tan^{-1}\frac{\sqrt{\sin^2\theta - n^2}}{\cos\theta} \quad \text{and} \quad \delta_{TM} = -2\tan^{-1}\frac{\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta}. \tag{8.176} \]

We emphasize that in (8.176) both δ_TE and δ_TM are negative quantities. This phase shift has to be included in (8.167) as the negative quantity δ. At first glance, (8.176) seems to differ largely from (8.120) and (8.125). Nevertheless, noting the trigonometric formula

\[ \tan 2x = \frac{2\tan x}{1 - \tan^2 x} \]

and remembering that δ_TM in (8.168) includes π arising from the phase reversal, we find that both relations are virtually identical.

Evanescent waves are drawing large attention in the fields of basic physics and applied device physics. If the total internal reflection is absent, ϕ is real. But, under the total internal reflection, ϕ is complex, with cos ϕ pure imaginary. The electric field of the evanescent wave is described as

\[ \mathbf{E}_t = E\boldsymbol{\varepsilon}_t e^{i(k_t z\sin\phi + k_t y\cos\phi - \omega t)} = E\boldsymbol{\varepsilon}_t e^{i(k_t z\sin\phi + k_t y \cdot ib - \omega t)} = E\boldsymbol{\varepsilon}_t e^{i(k_t z\sin\phi - \omega t)} e^{-k_t y b}. \tag{8.177} \]

In (8.177), the unit polarization vector ε_t is either perpendicular to the plane of paper of Fig. 8.13 (the TE case) or parallel to it (the TM case). Notice that the coordinate system is different from that of (8.104). The quantity k_t sin ϕ is the propagation constant. Let v_p^(s) and v_p^(e) be the phase velocities of the electromagnetic wave in the slab waveguide (i.e., core layer) and of the evanescent wave in the clad layer, respectively. Then, by virtue of Snell's law, we have

\[ v_1 < v_p^{(s)} = \frac{\omega}{k\sin\theta} = \frac{v_1}{\sin\theta} = \frac{\omega}{k_t\sin\phi} = v_p^{(e)} = \frac{v_2}{\sin\phi} < v_2, \tag{8.178} \]

where v_1 and v_2 are the light velocities in free spaces filled by the dielectrics D1 and D2, respectively. For this, we used the relation described as

\[ \omega = v_1 k = v_2 k_t. \tag{8.179} \]

We also used Snell's law with the third equality of (8.178). Notice that sin ϕ > 1 in the evanescent region and that k_t sin ϕ is the propagation constant in the clad layer. Also note that v_p^(s) is equal to v_p^(e) and that these phase velocities lie between the two free-space velocities. Thus, the evanescent waves must be present, accompanying propagating waves that undergo the total internal reflections in a slab waveguide. As remarked in (8.105), the electric field of evanescent waves decays exponentially with increasing distance from the interface (the factor e^{-k_t y b} in (8.177)). This implies that the evanescent waves exist only in the clad layer very close to the interface of the core and clad layers.
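As a concrete illustration of how (8.167) is handled numerically, the sketch below finds the guided TE_l ray angles of a symmetric slab by bisection, using δ_TE from (8.176). The core/clad indices, wavelength, and thickness are all illustrative assumptions, not values from the text; note that l = 0 is indeed allowed because δ_TE < 0:

```python
import numpy as np

# Illustrative assumptions: core/clad indices, vacuum wavelength, core thickness
n1, n2 = 1.50, 1.45            # n1 > n2, cf. (8.159)
lam0, d = 1.55e-6, 10e-6
n = n2/n1                      # relative refractive index appearing in (8.176)
k = n1*2*np.pi/lam0            # wavenumber in the core
theta_c = np.arcsin(n)         # critical angle, (8.116)

def delta_te(theta):           # additional TE phase shift, (8.176)
    return -2*np.arctan(np.sqrt(np.sin(theta)**2 - n**2)/np.cos(theta))

def resid(theta, l):           # (8.167) divided by 2: kd cos(theta) + delta - l*pi
    return k*d*np.cos(theta) + delta_te(theta) - l*np.pi

def bisect(f, a, b, it=200):   # simple bisection; f(a) > 0 > f(b) assumed
    for _ in range(it):
        m = 0.5*(a + b)
        a, b = (m, b) if f(m) > 0 else (a, m)
    return 0.5*(a + b)

modes, l = [], 0
while resid(theta_c + 1e-12, l) > 0:   # then a root exists in (theta_c, pi/2)
    modes.append((l, np.degrees(bisect(lambda th: resid(th, l),
                                       theta_c + 1e-12, np.pi/2 - 1e-12))))
    l += 1
```

Each θ_l lies between θc and π/2, and higher l means a steeper (smaller-θ) zigzag until the condition fails, which reproduces the mode cutoff.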

8.8 Stationary Waves

So far, we have been dealing with propagating waves either in a free space or in a waveguide. If the dielectric shaping the waveguide is confined in another direction, the propagating waves show specific properties. Examples include optical fibers. In this section we consider a situation where the electromagnetic wave is propagating in a dielectric medium and reflected by a “wall” formed by metal or another dielectric. In such a situation, the original wave (i.e., a forward wave) causes interference with the backward wave, and a stationary wave is formed as a consequence of the interference.

To approach this issue, we deal with the superposition of two waves that have different phases and different amplitudes. To generalize the problem, let ψ₁ and ψ₂ be two cosine functions described as

\[ \psi_1 = a_1\cos b_1 \quad \text{and} \quad \psi_2 = a_2\cos b_2. \tag{8.180} \]

Their addition is expressed as

\[ \psi = \psi_1 + \psi_2 = a_1\cos b_1 + a_2\cos b_2. \tag{8.181} \]

Here we wish to unify (8.181) into a single cosine (or sine) function. To this end, we modify the description of ψ₂ such that

\[ \psi_2 = a_2\cos[b_1 + (b_2 - b_1)] = a_2[\cos b_1 \cos(b_2 - b_1) - \sin b_1 \sin(b_2 - b_1)] = a_2[\cos b_1 \cos(b_1 - b_2) + \sin b_1 \sin(b_1 - b_2)]. \tag{8.182} \]

Then, the addition is described by

\[ \psi = \psi_1 + \psi_2 = [a_1 + a_2\cos(b_1 - b_2)]\cos b_1 + a_2\sin(b_1 - b_2)\sin b_1. \tag{8.183} \]

Putting R such that

\[ R = \sqrt{[a_1 + a_2\cos(b_1 - b_2)]^2 + a_2^2\sin^2(b_1 - b_2)} = \sqrt{a_1^2 + a_2^2 + 2a_1 a_2\cos(b_1 - b_2)}, \tag{8.184} \]

we get

\[ \psi = R\cos(b_1 - \theta), \tag{8.185} \]

where θ is expressed by

\[ \tan\theta = \frac{a_2\sin(b_1 - b_2)}{a_1 + a_2\cos(b_1 - b_2)}. \tag{8.186} \]

Fig. 8.14 Geometric diagram in relation to the superposition of two waves having different amplitudes (a1 and a2) and different phases (b1 and b2)
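The amplitude-phase reduction (8.184)-(8.186) can be checked numerically; a sketch with illustrative sample values follows. Using atan2 for θ (so that the quadrant matches the signs of R cos θ and R sin θ in the geometric construction) makes (8.185) hold identically:

```python
import numpy as np

a1, a2 = 1.0, 0.5                      # sample amplitudes (illustrative)
rng = np.random.default_rng(1)
b1 = rng.uniform(0, 2*np.pi, 1000)     # random sample phases
b2 = rng.uniform(0, 2*np.pi, 1000)

R = np.sqrt(a1**2 + a2**2 + 2*a1*a2*np.cos(b1 - b2))             # (8.184)
theta = np.arctan2(a2*np.sin(b1 - b2), a1 + a2*np.cos(b1 - b2))  # (8.186)

# (8.185): the two-cosine sum collapses to a single cosine of amplitude R
assert np.allclose(a1*np.cos(b1) + a2*np.cos(b2), R*np.cos(b1 - theta))
```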

Figure 8.14 represents a geometric diagram in relation to the superposition of two waves having different amplitudes (a1 and a2) and different phases (b1 and b2) [5]. To apply (8.185) to the superposition of two electromagnetic waves that are propagating forward and backward to collide head-on with each other, we change the variables such that

\[ b_1 = kx - \omega t \quad \text{and} \quad b_2 = -kx - \omega t, \tag{8.187} \]

where the former equation represents a forward wave, whereas the latter represents a backward wave. Then we have

\[ b_1 - b_2 = 2kx. \tag{8.188} \]

Equation (8.185) is rewritten as

\[ \psi(x, t) = R\cos(kx - \omega t - \theta), \tag{8.189} \]

with

\[ R = \sqrt{a_1^2 + a_2^2 + 2a_1 a_2\cos 2kx} \tag{8.190} \]

and

\[ \tan\theta = \frac{a_2\sin 2kx}{a_1 + a_2\cos 2kx}. \tag{8.191} \]

Fig. 8.15 Superposition of two sinusoidal waves. In (8.189) and (8.190), we put a1 = 1, a2 = 0.5, with (i) t = 0; (ii) t = T/4; (iii) t = T/2; (iv) t = 3T/4. ψ(x, t) is plotted as a function of phase kx

Equation (8.189) looks simple, but since both R and θ vary as functions of x, the situation is somewhat complicated unlike a simple sinusoidal wave. Nonetheless, when x takes special values, (8.189) is expressed in a simple functional form. For example, at t = 0,

\[ \psi(x, 0) = (a_1 + a_2)\cos kx. \]

This corresponds to (i) of Fig. 8.15. If t = T/2 (where T is a period, i.e., T = 2π/ω), we have

\[ \psi\left(x, \frac{T}{2}\right) = -(a_1 + a_2)\cos kx. \]

This corresponds to (iii) of Fig. 8.15. But the waves described by (ii) or (iv) do not have a simple functional form. We characterize Fig. 8.15 below. If we have

\[ 2kx = n\pi \quad \text{or} \quad x = n\lambda/4 \tag{8.192} \]

with λ being a wavelength, then θ = 0 or π, and so θ can be eliminated. This situation occurs at every quarter wavelength. Let us put t = 0 and examine how the superposed wave looks. For instance, putting x = 0, x = λ/4, and x = λ/2, we have

\[ \psi(0, 0) = |a_1 + a_2|, \quad \psi(\lambda/4, 0) = \psi(3\lambda/4, 0) = 0, \quad \psi(\lambda/2, 0) = -|a_1 + a_2|, \tag{8.193} \]

respectively. Notice that in Fig. 8.15 we took a1, a2 > 0. At another instant t = T/4, we have similarly

\[ \psi(0, T/4) = 0, \quad \psi(\lambda/4, T/4) = |a_1 - a_2|, \quad \psi(\lambda/2, T/4) = 0, \quad \psi(3\lambda/4, T/4) = -|a_1 - a_2|. \tag{8.194} \]
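The sample values (8.193) and (8.194) follow directly from evaluating the superposed forward and backward waves; a quick sketch with the same illustrative amplitudes as Fig. 8.15 (a1 = 1, a2 = 0.5) and arbitrarily chosen units:

```python
import numpy as np

a1, a2 = 1.0, 0.5
lam, T = 1.0, 1.0                 # illustrative wavelength and period
k, w = 2*np.pi/lam, 2*np.pi/T

def psi(x, t):                    # forward + backward wave, cf. (8.187)
    return a1*np.cos(k*x - w*t) + a2*np.cos(-k*x - w*t)

# (8.193): snapshot at t = 0
assert np.isclose(psi(0, 0), a1 + a2)
assert np.isclose(psi(lam/4, 0), 0, atol=1e-12) and np.isclose(psi(3*lam/4, 0), 0, atol=1e-12)
assert np.isclose(psi(lam/2, 0), -(a1 + a2))

# (8.194): snapshot at t = T/4
assert np.isclose(psi(0, T/4), 0, atol=1e-12)
assert np.isclose(psi(lam/4, T/4), a1 - a2)
assert np.isclose(psi(lam/2, T/4), 0, atol=1e-12)
assert np.isclose(psi(3*lam/4, T/4), -(a1 - a2))
```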

Thus, the waves that vary with time are characterized by two drum-shaped envelopes that have extremals |a1 + a2| and |a1 - a2| or those -|a1 + a2| and -|a1 - a2|. An important implication of Fig. 8.15 is that no node is present in the superposed wave. In other words, there is no position x0 for which ψ(x0, t) = 0 at all times t. From the aspect of energy transport of electromagnetic waves, if |a1| > |a2| (where a1 and a2 represent the amplitudes of the forward and backward waves, respectively), the net energy flow takes place in the travelling direction of the forward wave. If, on the other hand, |a1| < |a2|, the net energy flow takes place in the travelling direction of the backward wave. In this respect, think of the Poynting vectors. In the case |a1| = |a2|, the situation is particularly simple: no net energy flow takes place. Correspondingly, we observe nodes. Such waves are called stationary waves. Let us consider this simple situation for an electromagnetic wave that is incident perpendicularly on the interface between two dielectrics (one of them may be a metal). Returning to (7.58) and (7.66), we describe two electromagnetic waves that are propagating in the positive and negative directions of the z-axis such that

\[ \mathbf{E}_1 = E_1\boldsymbol{\varepsilon}_e e^{i(kz - \omega t)} \quad \text{and} \quad \mathbf{E}_2 = E_2\boldsymbol{\varepsilon}_e e^{i(-kz - \omega t)}, \tag{8.195} \]

where ε_e is a unit polarization vector arbitrarily fixed so that it is parallel to the interface, i.e., the wall (i.e., perpendicular to the z-axis). The situation is depicted in Fig. 8.16. Notice that in Fig. 8.16 E1 represents the forward wave (i.e., incident wave) and E2 the backward wave (i.e., the wave reflected at the interface). Thus, the superposed wave is described as

\[ \mathbf{E} = \mathbf{E}_1 + \mathbf{E}_2. \tag{8.196} \]

Taking account of the reflection of an electromagnetic wave perpendicularly incident on a wall, let us consider the following two cases:

(i) Syn-phase: The phase of the electric field is retained upon reflection. We assume that E1 = E2 (>0). Then, we have

\[ \mathbf{E} = E_1\boldsymbol{\varepsilon}_e \left[ e^{i(kz - \omega t)} + e^{i(-kz - \omega t)} \right] = E_1\boldsymbol{\varepsilon}_e e^{-i\omega t}\left( e^{ikz} + e^{-ikz} \right) = 2E_1\boldsymbol{\varepsilon}_e e^{-i\omega t}\cos kz. \tag{8.197} \]

Fig. 8.16 Superposition of electric fields of forward (or incident) wave E1 and backward (or reflected) wave E2

In (8.197), we put z = 0 at the interface for convenience. Taking the real part of (8.197), we have

\[ \mathbf{E} = 2E_1\boldsymbol{\varepsilon}_e \cos\omega t \cos kz. \tag{8.198} \]

Note that in (8.198) the variables z and t have been separated. This implies that we have a stationary wave. For this case to be realized, the characteristic impedance of the dielectric on the incident wave side should be sufficiently smaller than that of the other side; see (8.51) and (8.59). In other words, the dielectric constant of the incident side should be large enough. We have nodes at positions that satisfy

\[ -kz = \frac{\pi}{2} + m\pi \quad (m = 0, 1, 2, \cdots) \quad \text{or} \quad -z = \frac{1}{4}\lambda + \frac{m}{2}\lambda. \tag{8.199} \]

Note that we are thinking of the stationary wave in the region z < 0. Equation (8.199) indicates that nodes are formed at a quarter wavelength from the interface and at every half wavelength from there. The node means the position where no electric field is present. Meanwhile, antinodes are observed at positions

\[ -kz = m\pi \quad (m = 0, 1, 2, \cdots) \quad \text{or} \quad -z = \frac{m}{2}\lambda. \]

Thus, the nodes and antinodes alternate with every quarter wavelength.

(ii) Antiphase: The phase of the electric field is reversed. We assume that E1 = -E2 (>0). Then, we have

\[ \mathbf{E} = E_1\boldsymbol{\varepsilon}_e \left[ e^{i(kz - \omega t)} - e^{i(-kz - \omega t)} \right] = E_1\boldsymbol{\varepsilon}_e e^{-i\omega t}\left( e^{ikz} - e^{-ikz} \right) = 2iE_1\boldsymbol{\varepsilon}_e e^{-i\omega t}\sin kz. \tag{8.200} \]

Taking the real part of (8.200), we have

\[ \mathbf{E} = 2E_1\boldsymbol{\varepsilon}_e \sin\omega t \sin kz. \tag{8.201} \]

In (8.201) the variables z and t have been separated as well. For this case to be realized, the characteristic impedance of the dielectric on the incident wave side should be sufficiently larger than that of the other side. In other words, the dielectric constant of the incident side should be small enough. Practically, this situation can easily be attained by choosing a metal of high reflectance for the wall material. We have nodes at positions that satisfy

\[ -kz = m\pi \quad (m = 0, 1, 2, \cdots) \quad \text{or} \quad -z = \frac{m}{2}\lambda. \tag{8.202} \]

The nodes are formed at the interface and at every half wavelength from it. As in the syn-phase case, the antinodes occur at positions shifted by a quarter wavelength relative to the nodes. If there is another interface at, say, -z = L (>0), the wave goes back and forth many times. If the absolute value of the reflection coefficient at the interface is high enough (i.e., close to unity), attenuation of the wave is negligible. For both the syn-phase and antiphase cases, we must have

\[ kL = m\pi \quad (m = 1, 2, \cdots) \quad \text{or} \quad L = \frac{m}{2}\lambda \tag{8.203} \]

so that stationary waves can stand stable. The stationary waves indexed by a positive integer m in (8.203) are said to be longitudinal modes. Note that the index l that features the transverse modes in the waveguide can be zero or a positive integer. Unlike the previous case of (8.167) in Sect. 8.7, the index m of (8.203) is not allowed to be zero. If L in (8.203) is large compared to λ, m is a pretty large number as well, and what is more, many different values of m are allowed between the given interfaces. In other words, stationary waves of many different wavelengths are allowed to be present at once between the given interfaces. In such a case, the stationary waves are referred to as longitudinal multimodes. For practical purposes, an optical device having interfaces that stabilize the stationary waves is said to be a resonator. Various geometries and constitutions of the resonator have been proposed in combination with various dielectrics including semiconductors. Related discussion can be seen in Chap. 9.

References

1. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
2. Rudin W (1987) Real and complex analysis, 3rd edn. McGraw-Hill, New York
3. Smith FG, King TA, Wilkins D (2007) Optics and photonics, 2nd edn. Wiley, Chichester
4. Born M, Wolf E (2005) Principles of optics, 7th edn. Cambridge University Press, Cambridge
5. Pain HJ (2005) The physics of vibrations and waves, 6th edn. Wiley, Chichester

Chapter 9

Light Quanta: Radiation and Absorption

So far we have discussed the propagation of light and its reflection and transmission (or refraction) at an interface of dielectric media. We described the characteristics of light from the point of view of an electromagnetic wave. In this chapter, we describe properties of light in relation to quantum mechanics. To this end, we start with Planck's law of radiation, which successfully reproduced the experimental results related to blackbody radiation. Before this law was established, the Rayleigh–Jeans law failed to explain the experimental results in the high-frequency region of radiation (the ultraviolet catastrophe). Planck's law of radiation led to the discovery of light quanta. Einstein interpreted Planck's law of radiation on the basis of a model of two-level atoms. This model includes the so-called Einstein A and B coefficients that are important in optics applications, especially lasers. We derive these coefficients from a classical point of view based on a dipole oscillation. We also consider a close relationship between electromagnetic waves confined in a cavity and the motion of a harmonic oscillator.

9.1 Blackbody Radiation

Historically, the relevant theory was first propounded by Max Planck and then by Albert Einstein, as briefly discussed in Chap. 1. The theory was developed on the basis of the experiments called cavity radiation or blackbody radiation. Here, however, we wish to derive Planck's law of radiation on the assumption of the existence of quantum harmonic oscillators. As discussed in Chap. 2, the ground state of a quantum harmonic oscillator has an energy ℏω/2. Therefore, we measure the energies of the oscillators relative to that state. Let N_0 be the number of oscillators (i.e., light quanta) present in the ground state. Then, according to the Boltzmann distribution law, the number of oscillators in the first excited state, N_1, is

© Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_9


N_1 = N_0 e^{-ℏω/k_B T},   (9.1)

where k_B is the Boltzmann constant and T is the absolute temperature. Let N_j be the number of oscillators in the j-th excited state. Then we have

N_j = N_0 e^{-jℏω/k_B T}.   (9.2)

Let N be the total number of oscillators in the system. Then, we get

N = N_0 + N_0 e^{-ℏω/k_B T} + ⋯ + N_0 e^{-jℏω/k_B T} + ⋯ = N_0 Σ_{j=0}^{∞} e^{-jℏω/k_B T}.   (9.3)

Let E be the total energy of the oscillator system in reference to the ground state; that is, we put the ground-state energy at zero. Then we have

E = 0·N_0 + N_0 ℏω e^{-ℏω/k_B T} + ⋯ + N_0 jℏω e^{-jℏω/k_B T} + ⋯ = N_0 Σ_{j=0}^{∞} jℏω e^{-jℏω/k_B T}.   (9.4)

Therefore, the average energy of the oscillators Ē = E/N is given by

Ē = ℏω [Σ_{j=0}^{∞} j e^{-jℏω/k_B T}] / [Σ_{j=0}^{∞} e^{-jℏω/k_B T}].   (9.5)

Putting x ≡ e^{-ℏω/k_B T} [1], we have

Ē = ℏω [Σ_{j=0}^{∞} j x^j] / [Σ_{j=0}^{∞} x^j].   (9.6)

Since x < 1, we have

Σ_{j=0}^{∞} j x^j = [Σ_{j=0}^{∞} j x^{j-1}]·x = [d/dx Σ_{j=0}^{∞} x^j]·x = [d/dx (1/(1 - x))]·x = x/(1 - x)²,   (9.7)

Σ_{j=0}^{∞} x^j = 1/(1 - x).   (9.8)

Then, we get

Ē = ℏωx/(1 - x) = ℏω e^{-ℏω/k_B T}/(1 - e^{-ℏω/k_B T}) = ℏω/(e^{ℏω/k_B T} - 1).   (9.9)

The function

1/(e^{ℏω/k_B T} - 1)   (9.10)

is a form of the Bose–Einstein distribution function; more specifically, it is today called the Bose–Einstein distribution function for photons. If (ℏω/k_B T) ≪ 1, we have

e^{ℏω/k_B T} ≈ 1 + (ℏω/k_B T).

Therefore, we get

Ē ≈ k_B T.   (9.11)

ð9:11Þ

Thus, the relation (9.9) asymptotically agrees with the classical theory. In other words, according to the classical law of equipartition of energy, an energy of k_B T/2 is distributed to each of the two degrees of freedom of the motion, i.e., the kinetic energy and the potential energy of a harmonic oscillator.
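The low- and high-frequency limits of the average energy (9.9) can be checked numerically; the following is a minimal sketch (the two frequencies are illustrative choices):

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant [J s]
KB = 1.380649e-23       # Boltzmann constant [J/K]

def planck_average_energy(omega, T):
    """Average oscillator energy (9.9): hbar*omega / (exp(hbar*omega/(kB*T)) - 1)."""
    x = HBAR * omega / (KB * T)
    return HBAR * omega / math.expm1(x)  # expm1 avoids precision loss for small x

T = 300.0  # room temperature [K]

# Low-frequency (classical) limit hbar*omega << kB*T: average energy -> kB*T, as in (9.11)
omega_low = 1.0e9  # microwave region [rad/s]
print(planck_average_energy(omega_low, T) / (KB * T))  # ≈ 1

# High-frequency limit hbar*omega >> kB*T: average energy is exponentially suppressed
omega_high = 1.0e16  # ultraviolet region [rad/s]
print(planck_average_energy(omega_high, T) / (KB * T))  # ≈ 0
```

The suppression at high frequency is exactly what the Rayleigh–Jeans law, discussed below, fails to capture.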

9.2 Planck's Law of Radiation and Mode Density of Electromagnetic Waves

Researchers at the time tried to find the relationship between the energy density inside the cavity and the (angular) frequency of radiation. To reach that relationship, let us introduce the concept of the mode density of electromagnetic waves related to blackbody radiation. We define the mode density D(ω) as the number of modes of electromagnetic waves per unit volume per unit angular frequency. We refer to the electromagnetic waves having allowed specific angular frequencies and polarizations as modes. These modes must be described by linearly independent functions. Determination of the mode density is related to the boundary conditions (BCs) imposed on a physical system. We already dealt with this problem in Chaps. 2, 3, and 8. These BCs often appear when we find solutions of a differential equation. Let us consider the following wave equation:

∂²ψ/∂x² = (1/v²) ∂²ψ/∂t².   (9.12)

According to the method of separation of variables, we put


ψ(x, t) = X(x)T(t).   (9.13)

Substituting (9.13) into (9.12) and dividing both sides by X(x)T(t), we have

(1/X) d²X/dx² = (1/v²)(1/T) d²T/dt² = -k²,   (9.14)

where k is an undetermined (possibly complex) constant. For the x component, we get

d²X/dx² + k²X = 0.   (9.15)

Remember that k is supposed to be a complex number for the moment (see Example 1.1). Modifying Example 1.1 a little so that (9.15) is posed in a domain [0, L] and imposing the Dirichlet conditions

X(0) = X(L) = 0,   (9.16)

we find a solution of

X(x) = a sin kx,   (9.17)

where a is a constant. The constant k is determined so as to satisfy the BCs; i.e.,

kL = mπ  or  k = mπ/L (m = 1, 2, ⋯).   (9.18)

Thus, we get real numbers for k. Then, we have a solution

T(t) = b sin kvt = b sin ωt.   (9.19)

The overall solution is then

ψ(x, t) = c sin kx sin ωt.   (9.20)

This solution has already appeared in Chap. 8 as a stationary solution. The readers are encouraged to derive these results. In the three-dimensional case, we have the wave equation

∂²ψ/∂x² + ∂²ψ/∂y² + ∂²ψ/∂z² = (1/v²) ∂²ψ/∂t².   (9.21)

In this case, we also assume


ψ(x, t) = X(x)Y(y)Z(z)T(t).   (9.22)

Similarly, we get

(1/X) d²X/dx² + (1/Y) d²Y/dy² + (1/Z) d²Z/dz² = (1/v²)(1/T) d²T/dt² = -k².   (9.23)

Putting

(1/X) d²X/dx² = -k_x²,  (1/Y) d²Y/dy² = -k_y²,  (1/Z) d²Z/dz² = -k_z²,   (9.24)

we have

k_x² + k_y² + k_z² = k².   (9.25)

Then, we get a stationary wave solution as in the one-dimensional case such that

ψ(x, t) = c sin k_x x sin k_y y sin k_z z sin ωt.   (9.26)

The BCs to be satisfied by X(x), Y(y), and Z(z) are

k_x L = m_x π,  k_y L = m_y π,  k_z L = m_z π  (m_x, m_y, m_z = 1, 2, ⋯).   (9.27)

Returning to the main issue, let us deal with the mode density. Think of a cube of side L placed as shown in Fig. 9.1. Calculating k, we have

k = √(k_x² + k_y² + k_z²) = (π/L)√(m_x² + m_y² + m_z²) = ω/c,   (9.28)

where we assumed that the inside of the cavity is vacuum and, hence, the propagation velocity of light is c. Rewriting (9.28), we have

√(m_x² + m_y² + m_z²) = Lω/πc.   (9.29)

The numbers m_x, m_y, and m_z represent the allowable modes in the cavity; the set (m_x, m_y, m_z) specifies an individual mode. Note that m_x, m_y, and m_z are all positive integers. If, for instance, -m_x were allowed, this would produce sin(-k_x x) = -sin k_x x; but this function is linearly dependent on sin k_x x. Then, a mode indexed by -m_x should not be regarded as an independent mode. Given ω, each set (m_x, m_y, m_z) that satisfies (9.29) corresponds to one mode. Therefore, the number of modes that satisfy the following expression

Fig. 9.1 Cube of side L. We use this simple model to estimate the mode density.

√(m_x² + m_y² + m_z²) ≤ Lω/πc   (9.30)

represents the number of modes corresponding to angular frequencies equal to or less than the given ω. Each mode has a one-to-one correspondence with a lattice point indexed by (m_x, m_y, m_z). Accordingly, if m_x, m_y, m_z ≫ 1, the number of allowed modes approximately equals one-eighth of the volume of a sphere having a radius of Lω/πc. Let N_L be the number of modes whose angular frequencies are equal to or less than ω. Recalling that there are two independent modes having the same index (m_x, m_y, m_z) but mutually orthogonal polarizations, we have

N_L = (4π/3)(Lω/πc)³ · (1/8) · 2 = L³ω³/3π²c³.   (9.31)

Consequently, the mode density D(ω) is expressed as

D(ω)dω = (1/L³)(dN_L/dω)dω = (ω²/π²c³)dω,   (9.32)

where D(ω)dω represents the number of modes per unit volume whose angular frequencies range between ω and ω + dω. Now, we introduce a function ρ(ω) as the energy density per unit angular frequency. Then, combining D(ω) with (9.9), we get

ρ(ω) = D(ω) · ℏω/(e^{ℏω/k_B T} - 1) = (1/π²c³) · ℏω³/(e^{ℏω/k_B T} - 1).   (9.33)

The relation (9.33) is called Planck's law of radiation. Notice that ρ(ω) has the dimension [J m⁻³ s]. Solving (9.15) under the Dirichlet conditions (9.16) is pertinent to analyzing the electric field within a cavity surrounded by a metal husk, because the electric field must vanish at the interface between the cavity and the metal. The problem, however, can equivalently be solved using the magnetic field. This is because at the interface the reflection coefficients of the electric field and the magnetic field have reversed signs (see Chap. 8). Thus, given an equation for the magnetic field, we may use the Neumann condition. This condition requires the differential coefficients to vanish at the boundary (i.e., the interface between the cavity and the metal). In a similar manner to the above, we get

ψ(x, t) = c cos kx cos ωt.   (9.34)

By imposing the BCs, we again have (9.18), which leads to the same result as above. We may also impose periodic BCs. This type of equation has already been treated in Chap. 3. In that case we have solutions of

e^{ikx}  and  e^{-ikx}.

The BCs demand that e⁰ = 1 = e^{ikL}. That is,

kL = 2πm  (m = 0, ±1, ±2, ⋯).   (9.35)

Notice that e^{ikx} and e^{-ikx} are linearly independent and, hence, a minus sign for m is permitted. Correspondingly, we have

N_L = (4π/3)(Lω/2πc)³ · 2 = L³ω³/3π²c³.   (9.36)

In other words, here we have to consider the whole volume of a sphere of half the radius of the previous case. Thus, we reach the same conclusion as before. If the average energy of an oscillator were described by (9.11), we would obtain the following description of ρ(ω):

ρ(ω) = D(ω) k_B T = (ω²/π²c³) k_B T.   (9.37)

This relation is well known as the Rayleigh–Jeans law, but (9.37) disagreed with experimental results in that, according to the Rayleigh–Jeans law, ρ(ω) diverges toward infinity as ω goes to infinity. The discrepancy between the theory and the experimental results was referred to as the “ultraviolet catastrophe.” Planck's law of radiation described by (9.33), however, reproduces the experimental results well.
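The low-frequency agreement and the high-frequency divergence can be seen by evaluating (9.33) and (9.37) side by side; a short sketch (the temperature and frequencies are illustrative choices):

```python
import math

HBAR = 1.054571817e-34  # [J s]
KB = 1.380649e-23       # [J/K]
C = 2.99792458e8        # [m/s]

def rho_planck(omega, T):
    """Planck's law (9.33): energy density per unit angular frequency [J m^-3 s]."""
    return (HBAR * omega**3 / (math.pi**2 * C**3)) / math.expm1(HBAR * omega / (KB * T))

def rho_rayleigh_jeans(omega, T):
    """Rayleigh-Jeans law (9.37)."""
    return omega**2 * KB * T / (math.pi**2 * C**3)

T = 5800.0  # roughly the solar surface temperature [K]
for omega in (1e12, 1e14, 1e16):  # [rad/s]
    print(omega, rho_planck(omega, T), rho_rayleigh_jeans(omega, T))
```

At ω = 10¹² rad/s (ℏω ≪ k_B T) the two laws nearly coincide, whereas at ω = 10¹⁶ rad/s the Rayleigh–Jeans value vastly exceeds the exponentially suppressed Planck value — a numerical rendering of the ultraviolet catastrophe.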

9.3 Two-Level Atoms

Although Planck established his law of radiation, researchers at that time hesitated to profess the existence of light quanta. It was Einstein who derived Planck's law by assuming two-level atoms in which light quanta play a role. His assumption comprises the following three postulates: (1) The physical system to be addressed comprises the so-called hypothetical two-level atoms that have only two energy levels. If a two-level atom absorbs a light quantum, a ground-state electron is excited up to the higher level (the stimulated absorption). (2) The higher-level electron may spontaneously lose its energy and return to the ground state (the spontaneous emission). (3) The higher-level electron may also lose its energy and return to the ground state. Unlike (2), however, the excited electron has to be stimulated by irradiation with light quanta having an energy corresponding to the energy difference between the ground state and the excited state (the stimulated emission). Figure 9.2 schematically depicts the optical processes of the Einstein model. Under those postulates, Einstein dealt with the problem probabilistically. Suppose that the ground state and the excited state have energies E_1 and E_2, respectively. Einstein assumed that light quanta having an energy equal to E_2 - E_1 take part in all three of the above transition processes. He also propounded the idea that a light quantum has an energy proportional to its (angular) frequency. That is, he thought that the following relation should hold:



Fig. 9.2 Optical processes of the Einstein two-level atom model: stimulated absorption, stimulated emission, and spontaneous emission.


ℏω_21 = E_2 - E_1,   (9.38)

where ω_21 is the angular frequency of light that takes part in the optical transitions. For the time being, let us follow Einstein's postulates.

1. Stimulated absorption: This process is simply called “absorption.” Let W_a [s⁻¹] be the transition probability that the electron absorbs a light quantum to be excited to the excited state. W_a is described as

W_a = N_1 B_21 ρ(ω_21),   (9.39)

where N_1 is the number of atoms occupying the ground state; B_21 is a proportionality constant; ρ(ω_21) is due to (9.33). Note that in (9.39) we used ω_21 instead of ω in (9.33). The coefficient B_21 is called an Einstein B coefficient; more specifically, one of the Einstein B coefficients. Namely, B_21 is pertinent to the transition from the ground state to the excited state.

2. Emission processes: These processes include both the spontaneous and stimulated emissions. Let W_e [s⁻¹] be the transition probability that the electron emits a light quantum and returns to the ground state. W_e is described as

W_e = N_2 B_12 ρ(ω_21) + N_2 A_12,   (9.40)

where N_2 is the number of atoms occupying the excited state; B_12 and A_12 are proportionality constants. The coefficient A_12 is called the Einstein A coefficient, relevant to the spontaneous emission. The coefficient B_12 is associated with the stimulated emission and is also called an Einstein B coefficient together with B_21. Here, B_12 is pertinent to the transition from the excited state to the ground state. Now, we have

B_12 = B_21.   (9.41)

The reasoning for this is as follows: The coefficients B_12 and B_21 are proportional to the matrix elements pertinent to the optical transition. Let T be an operator associated with the transition. Then, a matrix element is described using the inner product notation of Chap. 1 by

B_21 = ⟨ψ_2|T|ψ_1⟩,   (9.42)

where ψ_1 and ψ_2 are the initial and final states of the system in relation to the optical transition. As a good approximation, we use er for T (the dipole approximation), where e is the elementary charge and r is the position operator (see Chap. 1). If (9.42) represents the absorption process (i.e., the transition from the ground state to the excited state), the corresponding emission process should be described as the reversed process by


B_12 = ⟨ψ_1|T|ψ_2⟩.   (9.43)

Notice that in (9.43) ψ_2 and ψ_1 are the initial and final states, respectively. Taking the complex conjugate of (9.42), we have

B_21* = ⟨ψ_1|T†|ψ_2⟩,   (9.44)

where T† is the operator adjoint to T (see Chap. 1). For an Hermitian operator H, from Sect. 1.4 we have

H† = H.   (1.119)

Since T is also Hermitian, we have

T† = T.   (9.45)

Thus, we get

B_21* = B_12.   (9.46)

But, as in the cases of Sects. 4.2 and 4.3, ψ_1 and ψ_2 can be represented as real functions. Then, we have

B_21* = B_21 = B_12.

That is, we assume that the matrix B is real symmetric. In the case of two-level atoms, as a matrix form we get

B = ( 0     B_12
      B_12  0   ).   (9.47)

Compare (9.47) with (4.28). Now, in the thermal equilibrium, we have

W_e = W_a.   (9.48)

That is,

N_2 B_21 ρ(ω_21) + N_2 A_12 = N_1 B_21 ρ(ω_21),   (9.49)

where we used (9.41) for the LHS. Assuming the Boltzmann distribution law, we get


N_2/N_1 = exp[-(E_2 - E_1)/k_B T].   (9.50)

Here, if we moreover assume (9.38), we get

N_2/N_1 = exp(-ℏω_21/k_B T).   (9.51)

Combining (9.49) and (9.51), we have

exp(-ℏω_21/k_B T) = B_21 ρ(ω_21)/[B_21 ρ(ω_21) + A_12].   (9.52)

Solving (9.52) with respect to ρ(ω_21), we finally get

ρ(ω_21) = (A_12/B_21) · exp(-ℏω_21/k_B T)/[1 - exp(-ℏω_21/k_B T)] = (A_12/B_21) · 1/[exp(ℏω_21/k_B T) - 1].   (9.53)

Assuming that

A_12/B_21 = ℏω_21³/π²c³,   (9.54)

we have

ρ(ω_21) = (ℏω_21³/π²c³) · 1/[exp(ℏω_21/k_B T) - 1].   (9.55)

This is none other than Planck's law of radiation.
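The balance (9.48) underlying this derivation can be verified numerically: with Boltzmann populations (9.51), B_12 = B_21, the ratio (9.54), and the Planck form of ρ(ω_21), the emission and absorption rates coincide. A minimal sketch (ω, T, and B_21 are arbitrary illustrative values; B_21 cancels in the balance):

```python
import math

HBAR, KB, C = 1.054571817e-34, 1.380649e-23, 2.99792458e8

def rho_planck(omega, T):
    """Planck's law (9.33)."""
    return (HBAR * omega**3 / (math.pi**2 * C**3)) / math.expm1(HBAR * omega / (KB * T))

omega, T, B21, N1 = 3.0e15, 1000.0, 1.0, 1.0
N2 = N1 * math.exp(-HBAR * omega / (KB * T))       # Boltzmann populations, (9.51)
A12 = B21 * HBAR * omega**3 / (math.pi**2 * C**3)  # ratio (9.54)
rho = rho_planck(omega, T)

W_a = N1 * B21 * rho          # stimulated absorption, (9.39)
W_e = N2 * (B21 * rho + A12)  # stimulated + spontaneous emission, (9.40) with B12 = B21
print(W_a, W_e)  # the two rates agree
```

Algebraically, W_e/W_a = e^{-ℏω/k_B T} · [1/(e^{ℏω/k_B T} - 1) + 1] · (e^{ℏω/k_B T} - 1) = 1 exactly, which is what (9.52) expresses.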

9.4 Dipole Radiation

In (9.54) we only know the ratio of A_12 to B_21. To gain a better knowledge of these Einstein coefficients, we briefly examine the mechanism of dipole radiation. Electromagnetic radiation results from an accelerated motion of a dipole. A dipole moment p(t) is defined as a function of time t by

p(t) = ∫ x′ ρ(x′, t) dx′,   (9.56)

where x′ is a position vector in a Cartesian coordinate system; the integral is taken over the whole three-dimensional space; ρ is the charge density appearing in (7.1). If we

Fig. 9.3 Dipole moment viewed from the frame O or O′.

consider a system comprising point charges, the integration can immediately be carried out to yield

p(t) = Σ_i q_i x_i,   (9.57)

where q_i is the charge of each point charge i and x_i is the position vector of the point charge i. From (9.56) and (9.57), we find that p(t) depends on how we set up the coordinate system. However, if the total charge of the system is zero, p(t) does not depend on the coordinate system. Let p(t) and p′(t) be the dipole moment viewed from the frame O and O′, respectively (see Fig. 9.3). Then we have

p′(t) = Σ_i q_i x′_i = Σ_i q_i (x_0 + x_i) = (Σ_i q_i) x_0 + Σ_i q_i x_i = Σ_i q_i x_i = p(t).   (9.58)

Notice that with the third equality the first term vanishes because the total charge is zero. The system comprising two point charges that have opposite charges (±q) is particularly simple but very important. In that case we have

p(t) = q x_1 + (-q) x_2 = q(x_1 - x_2) = q x̃.   (9.59)

Here we assume that q > 0 according to custom; hence, x̃ is a vector directed from the minus point charge to the plus charge. Figure 9.4 displays the geometry of an oscillating dipole and the electromagnetic radiation from it. Figure 9.4a depicts the dipole. It is placed at the origin of the coordinate

Fig. 9.4 Electromagnetic radiation from an accelerated motion of a dipole. (a) A dipole placed at the origin of the coordinate system executes harmonic oscillation along the z-direction around an equilibrium position. (b) Electromagnetic radiation from a dipole in the wave zone. ε_e and ε_m are unit polarization vectors of the electric field and magnetic field, respectively. ε_e, ε_m, and n form a right-handed system.

system and assumed to be of an atomic or molecular scale in extension; we regard the center of the dipole as the origin. Figure 9.4b represents the large-scale geometry of the dipole and its surrounding space. For the electromagnetic radiation, the accelerated motion of the dipole is of primary importance. The electromagnetic fields produced by p̈(t) vary as the inverse of r, where r is a macroscopic distance between the dipole and the observation point. Namely, r is much larger than the dipole size. There are other electromagnetic fields that result from the dipole moment, namely those produced by p(t) and ṗ(t). Strictly speaking, we have to include those quantities that are responsible for the electromagnetic fields associated with the dipole radiation. Nevertheless, the fields produced by p(t) and ṗ(t) vary as the inverse cube and inverse square of r, respectively. Therefore, the surface integral of the square of the fields associated with p(t) and ṗ(t) asymptotically approaches zero for large enough r with respect to a sphere enveloping the dipole. Regarding p̈(t), however, the surface integral of the square of the fields remains finite even for large enough r. For this reason, we refer to the spatial region where the fields produced by p̈(t) do not vanish as the wave zone. Suppose that a dipole placed at the origin of the coordinate system executes harmonic oscillation along the z-direction around an equilibrium position (see Fig. 9.4). The motion of the two charges having plus and minus signs is described by

z_+ = z_0 e_3 + a e^{iωt} e_3  (z_0, a > 0),   (9.60)

z_- = -z_0 e_3 - a e^{iωt} e_3,   (9.61)


where z_+ and z_- are the position vectors of the plus charge and minus charge, respectively; z_0 and -z_0 are the equilibrium positions of each charge; a is the amplitude of the harmonic oscillation; ω is the angular frequency of the oscillation. Then, the accelerations of the charges are given by

a_+ ≡ z̈_+ = -aω² e^{iωt} e_3,   (9.62)

a_- ≡ z̈_- = aω² e^{iωt} e_3.   (9.63)

Meanwhile, we have

p(t) = q z_+ + (-q) z_- = q(z_+ - z_-)  (q > 0).   (9.64)

Therefore,

p̈(t) = q(z̈_+ - z̈_-) = -2qaω² e^{iωt} e_3.   (9.65)

The quantity p̈(t), i.e., the second derivative of p(t) with respect to time, produces the electric field described by [2]

E = p̈/4πε_0 c²r = -qaω² e^{iωt} e_3/2πε_0 c²r,   (9.66)

where r is the distance between the dipole and the observation point. In (9.66) we ignored the terms proportional to the inverse square and cube of r for the aforementioned reason. As described in (9.66), the strength of the radiation electric field in the wave zone measured at a point away from the oscillating dipole is proportional to a component of the acceleration vector of the dipole [i.e., p̈(t)]. The radiation electric field lies in the direction perpendicular to the line connecting the observation point and the dipole (Fig. 9.4). Let ε_e be a unit polarization vector of the electric field in that direction and let E_⊥ be the radiation electric field. Then, we have

E_⊥ = -qaω² e^{iωt} (e_3 · ε_e) ε_e/2πε_0 c²r = -qaω² e^{iωt} ε_e sin θ/2πε_0 c²r.   (9.67)

As shown in Sect. 7.3, (ε_e · e_3)ε_e in (9.67) “extracts” from e_3 the vector component parallel to ε_e. Such an operation is called a projection of a vector; related discussion can be seen in Part III. It takes a time r/c for the light emitted from the charge to arrive at the observation point. Consequently, the acceleration of the charge has to be evaluated at the time when the radiation leaves the charge. Let t be the instant when the electric field is measured at the observation point. Then, it follows that the radiation left the charge at time t - r/c. Thus, the electric field relevant to the radiation observed far enough away from the oscillating charge is described as [2]


E_⊥(x, t) = -[qaω² e^{iω(t - r/c)} sin θ/2πε_0 c²r] ε_e.   (9.68)

The radiation electric field must necessarily be accompanied by a magnetic field. Writing the radiation magnetic field as H⊥(x, t), we have [3] r r 1 qaω2 eiωðt - cÞ sin θ qaω2 eiωðt - cÞ sin θ  = n × ε n × εe e 2πcr cμ0 2πε0 c2 r r qaω2 eiωðt - cÞ sin θ =εm , 2πcr ð9:69Þ

H ⊥ ðx, t Þ = -

where n represents a unit vector in the direction parallel to a line connecting the observation point and the dipole. The εm is a unit polarization vector of the magnetic field as defined by (7.67). From the above, we see that the radiation electromagnetic waves in the wave zone are transverse waves that show the properties the same as those of electromagnetic waves in a free space. Now, let us evaluate a time-averaged energy flux from an oscillating dipole. Using (8.71), we have r r 1 qaω2 eiωðt - cÞ sin θ qaω2 e - iωðt - cÞ sin θ 1  n E × H =  2πcr 2 2 2πε0 c2 r ω4 sin 2 θ = 2 3 2 ðqaÞ2 n: 8π ε0 c r

Sð θ Þ =

ð9:70Þ

If we are thinking of an atom or a molecule in which the dipole consists of an electron and a positive charge that compensates it, q is replaced with -e (e < 0). Then (9.70) reads as

S(θ) = [ω⁴ sin²θ/8π²ε_0 c³r²] (ea)² n.   (9.71)

Let us relate the above argument to the Einstein A and B coefficients. Since we are dealing with an isolated dipole, we may well suppose that the radiation comes from spontaneous emission. Let P be the total power of emission from the oscillating dipole that gets through a sphere of radius r. Then we have

P = ∮ S(θ) · n dS = ∫_0^{2π} dφ ∫_0^π S(θ) r² sin θ dθ = [ω⁴(ea)²/8π²ε_0 c³] · 2π ∫_0^π sin³θ dθ.   (9.72)

Changing cos θ to t, the integral I ≡ ∫_0^π sin³θ dθ can be converted into

I = ∫_{-1}^{1} (1 - t²) dt = 4/3.   (9.73)

Thus, we have

P = ω⁴(ea)²/3πε_0 c³.   (9.74)
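The angular integral (9.73) and the resulting power (9.74) are easy to check numerically; a short sketch (the dipole frequency and amplitude are illustrative assumptions, not values from the text):

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity [F/m]
E_CH = 1.602176634e-19   # elementary charge [C]
C = 2.99792458e8         # [m/s]

# Midpoint-rule check of the angular integral (9.73)
n = 100000
h = math.pi / n
I = sum(math.sin((i + 0.5) * h)**3 for i in range(n)) * h
print(I)  # ≈ 4/3

# Total radiated power (9.74) for an assumed dipole amplitude and frequency
omega, a = 3.0e15, 1.0e-10  # [rad/s], [m]; illustrative values only
P = omega**4 * (E_CH * a)**2 / (3 * math.pi * EPS0 * C**3)
print(P)  # total power in watts
```

For these assumed numbers, P comes out to roughly 10⁻¹¹ W, a plausible scale for a single radiating atomic-sized dipole.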

The probability of the spontaneous emission is given by N_2 A_12. Since we are dealing with a single dipole, N_2 can be put equal to 1. Accordingly, the expected power of emission is A_12 ℏω_21. Replacing ω in (9.74) with ω_21 in (9.55) and equating A_12 ℏω_21 to P, we get

A_12 = ω_21³(ea)²/3πε_0 c³ℏ.   (9.75)

From (9.54), we also get

B_12 = π(ea)²/3ε_0 ℏ².   (9.76)

In order to relate these results to those of quantum mechanics, we may replace a² in the above expressions with the square of the absolute value of the matrix element of the position operator r. That is, representing |1⟩ and |2⟩ as the quantum states of the ground and excited states of a two-level atom, we define ⟨1|r|2⟩ as the matrix element of the position operator. Relating |⟨1|r|2⟩|² to a², we get

A_12 = (ω_21³e²/3πε_0 c³ℏ)|⟨1|r|2⟩|²  and  B_12 = (πe²/3ε_0 ℏ²)|⟨1|r|2⟩|².   (9.77)

From (9.77), we have

B_12 = (πe²/3ε_0 ℏ²)⟨1|r|2⟩⟨1|r|2⟩* = (πe²/3ε_0 ℏ²)⟨1|r|2⟩⟨2|r†|1⟩ = (πe²/3ε_0 ℏ²)⟨1|r|2⟩ · ⟨2|r|1⟩,   (9.78)

where with the second equality we used (1.116); the last equality comes from the fact that r is Hermitian. Meanwhile, we have

B_21 = (πe²/3ε_0 ℏ²)|⟨2|r|1⟩|² = (πe²/3ε_0 ℏ²)⟨2|r|1⟩⟨1|r†|2⟩ = (πe²/3ε_0 ℏ²)⟨2|r|1⟩⟨1|r|2⟩.   (9.79)

Hence, we recover (9.41).
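Equation (9.77) also gives the order of magnitude of spontaneous-emission rates. A minimal sketch with assumed illustrative numbers (a visible transition near 630 nm and a matrix element of about one ångström — neither value is taken from the text):

```python
import math

HBAR = 1.054571817e-34   # [J s]
EPS0 = 8.8541878128e-12  # [F/m]
E_CH = 1.602176634e-19   # [C]
C = 2.99792458e8         # [m/s]

def einstein_A(omega21, r12):
    """Spontaneous-emission coefficient from (9.77); r12 = |<1|r|2>| in metres."""
    return omega21**3 * E_CH**2 * r12**2 / (3 * math.pi * EPS0 * C**3 * HBAR)

# Illustrative assumptions: visible transition, angstrom-scale matrix element
omega21 = 2 * math.pi * C / 630e-9  # [rad/s]
r12 = 1.0e-10                        # [m]
A12 = einstein_A(omega21, r12)
print(A12, 1 / A12)  # rate [s^-1] and radiative lifetime [s]
```

With these assumptions the rate comes out around 10⁷ s⁻¹, i.e., a radiative lifetime of tens of nanoseconds, which is the typical scale for allowed optical transitions in atoms.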

9.5 Lasers

9.5.1 Brief Outlook

The concept of the two-level atoms proposed by Einstein is universal, independent of materials, and can be utilized for particular purposes. Actually, in later years many researchers tried to validate that concept and verified its validity. After the basic research of the 1950s and 1960s, the fundamentals were put into practical use as various optical devices. Typical examples are masers and lasers, abbreviations for “microwave amplification by stimulated emission of radiation” and “light amplification by stimulated emission of radiation,” respectively. Of these, lasers are common and particularly important nowadays. On the basis of the universality of the concept, a lot of materials including semiconductors and dyes are used in gaseous, liquid, and solid states. Let us consider a rectangular parallelepiped of a laser medium with a length L and cross-sectional area S (Fig. 9.5). Suppose there are N two-level atoms in the rectangular parallelepiped such that

Fig. 9.5 Rectangular parallelepiped of a laser medium with a length L and a cross-sectional area S (not shown). I(x) denotes the irradiance at a point x from the origin.


N = N_1 + N_2,   (9.80)

where N_1 and N_2 represent the numbers of atoms occupying the ground and excited states, respectively. Suppose that light propagates from the left of the rectangular parallelepiped and enters it. Then, we expect three processes to occur simultaneously: stimulated absorption, stimulated emission, and spontaneous emission. After these processes, the increment dE in the photon energy of the total system (i.e., the rectangular parallelepiped) during dt is described by

dE = {N_2[B_21 ρ(ω_21) + A_12] - N_1 B_21 ρ(ω_21)} ℏω_21 dt.   (9.81)

In light of (9.39) and (9.40), the dimensionless quantity dE/ℏω_21 represents the number of effective photon-emission events that have occurred during dt. Since in lasers the stimulated emission is dominant, we shall ignore the spontaneous emission and rewrite (9.81) as

dE = {N_2 B_21 ρ(ω_21) - N_1 B_21 ρ(ω_21)} ℏω_21 dt = B_21 ρ(ω_21)(N_2 - N_1) ℏω_21 dt.   (9.82)

Under thermal equilibrium, we have N_2 < N_1 on the basis of the Boltzmann distribution law, and so dE < 0. In that case, therefore, the photon energy decreases. For light amplification to take place, we must instead have the following condition:

N_2 > N_1.   (9.83)

This energy distribution is called an inverted distribution or population inversion. Thus, laser oscillation is a typical non-equilibrium phenomenon. To produce the population inversion, we need an external exciting source, using an electrical or optical device. The essence of lasers rests upon the fact that stimulated emission produces a photon that possesses a wavenumber vector (k) and a polarization (ε) both the same as those of the original photon. For this reason, laser light is monochromatic and highly directional. To understand the fundamental mechanism underlying the related phenomena, interested readers are encouraged to seek appropriate literature on the quantum theory of light for further reading [4]. To make the discussion simple and straightforward, let us assume that the light is incident parallel to the long axis of the rectangular parallelepiped. Then, the stimulated emission produces light propagating in the same direction. As a result, the irradiance I measured in that direction is described as

I = (E/SL) c′.   (9.84)

Note that the light velocity in the laser medium c′ is given by

c′ = c/n,   (9.85)

where n is the refractive index of the laser medium. Taking an infinitesimal of both sides of (9.84), we have

dI = dE c′/SL = B_21 ρ(ω_21)(N_2 - N_1) ℏω_21 (c′/SL) dt = B_21 ρ(ω_21) Ñ ℏω_21 c′ dt,   (9.86)

where Ñ = (N_2 - N_1)/SL denotes the “net” density of atoms that occupy the excited state. The energy density ρ(ω_21) can be written as

ρ(ω_21) = I(ω_21) g(ω_21)/c′,   (9.87)

where I(ω_21) [J s⁻¹ m⁻²] represents the intensity of radiation and g(ω_21) is a gain function [s]. The gain function is a measure that shows how favorably (or unfavorably) the transition takes place at the angular frequency ω_21. It is normalized over the emission range such that

∫_0^∞ g(ω) dω = 1.

The quantity I(ω_21) is the energy flux that gets through per unit area per unit time. This flux corresponds to the energy contained in a long, thin rectangular parallelepiped of length c′ and unit cross-sectional area. To obtain ρ(ω_21), I(ω_21) should accordingly be divided by c′ in (9.87). Using (9.87) and replacing c′dt with the distance dx and I(ω_21) with I(x) as a function of x, we rewrite (9.86) as

dI(x) = [B_21 g(ω_21) Ñ ℏω_21/c′] I(x) dx.   (9.88)

Dividing (9.88) by I(x) and integrating both sides, we have

∫_{I_0}^{I} dI(x)/I(x) = ∫_{I_0}^{I} d ln I(x) = [B_21 g(ω_21) Ñ ℏω_21/c′] ∫_0^x dx,   (9.89)

where I_0 is the irradiance of the light at the instant when it enters the laser medium from the left. Thus, we get

I(x) = I_0 exp[B_21 g(ω_21) Ñ ℏω_21 x/c′].   (9.90)


Equation (9.90) shows that the irradiance of the laser light is augmented exponentially along the path of the laser light. Denoting the exponent in (9.90) as

G ≡ B_21 g(ω_21) Ñ ℏω_21/c′,   (9.91)

we get

I(x) = I_0 exp Gx.

The constant G is called the gain constant. This is an index that indicates the laser performance. Large values of B_21, g(ω_21), and Ñ yield a high performance of the laser.

we get I ðxÞ = I 0 exp Gx: The constant G is called a gain constant. This is an index that indicates the laser ~ yield a high performance of the performance. Large numbers B21, g(ω21), and N laser. In Sects. 8.8 and 9.2, we sought conditions for electromagnetic waves to cause constructive interference. In a one-dimensional dielectric medium, the condition is described as kL = mπ

or

mλ = 2L ðm = 1, 2, ⋯Þ,

ð9:92Þ

where k and λ denote a wavenumber and wavelength in the dielectric medium, respectively. Indexing k and λ with m, which represents a mode, we have

  k_m L = m\pi \quad \text{or} \quad m\lambda_m = 2L \;\; (m = 1, 2, \cdots).   (9.93)

This condition can be expressed in different manners such that

  \omega_m = 2\pi\nu_m = 2\pi c'/\lambda_m = 2\pi c/n\lambda_m = m\pi c/nL.   (9.94)
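As a rough numerical sketch of (9.93) and (9.94), the following code computes the longitudinal mode index and mode spacing of a short one-dimensional cavity. The cavity length and refractive index are assumed values chosen only for illustration, and dispersion is ignored here (with dispersion, n is effectively replaced by the group index introduced below).

```python
import math

# Sketch of (9.93)-(9.94): longitudinal resonance frequencies ω_m = mπc/(nL)
# of a one-dimensional cavity and their spacing. Cavity length and refractive
# index are hypothetical; a constant (dispersion-free) n is assumed.
c = 2.99792458e8        # speed of light in vacuum [m/s]
n = 3.0                 # refractive index of the medium (assumed)
L = 0.5e-3              # cavity length [m] (assumed)

def omega(m):           # (9.94): ω_m = mπc/(nL)
    return m * math.pi * c / (n * L)

lam = 530e-9                       # target vacuum wavelength [m]
m = round(2 * n * L / lam)         # from mλ_m = 2L with λ_m = λ/n, cf. (9.93)
spacing = omega(m + 1) - omega(m)  # longitudinal mode spacing [rad/s]
print(m, spacing)
```

For such a sub-millimeter cavity the mode index m is of order several thousand, and the spacing equals πc/(nL), the free spectral range in angular frequency.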

It is often the case that if the laser medium is a long and thin rod, rectangular parallelepiped, etc., sharply resolved and regularly spaced spectral lines are observed in the emission spectrum. These lines are said to form a longitudinal multimode. The separation between two neighboring emission lines is referred to as the free spectral range [2]. If adjacent emission lines are clearly resolved so that the free spectral range can easily be recognized, we can derive useful information from the laser oscillation spectra (vide infra). Rewriting (9.94) as, e.g.,

  \omega_m n = \frac{\pi c}{L}\, m   (9.95)

and taking the differential (or variation) of both sides, we get

9.5

Lasers

371

  n\,\delta\omega_m + \omega_m\,\delta n = n\,\delta\omega_m + \omega_m\frac{\delta n}{\delta\omega_m}\,\delta\omega_m = \left(n + \omega_m\frac{\delta n}{\delta\omega_m}\right)\delta\omega_m = \frac{\pi c}{L}\,\delta m.   (9.96)

Therefore, we get

  \delta\omega_m = \left(n + \omega_m\frac{\delta n}{\delta\omega_m}\right)^{-1}\frac{\pi c}{L}\,\delta m.   (9.97)

Equation (9.97) presupposes the wavelength dispersion of the refractive index of a laser medium. Here, the wavelength dispersion means that the refractive index varies as a function of the wavelength of light in a matter. Laser materials often have a considerably large dispersion, and the relevant information is indispensable. From (9.97), we find that

  n_g \equiv n + \omega_m \frac{\delta n}{\delta\omega_m}   (9.98)

plays the role of a refractive index when the laser material has a wavelength dispersion. The quantity n_g is said to be a group refractive index (or group index). Thus, (9.97) is rewritten as

  \delta\omega = \frac{\pi c}{L n_g}\,\delta m,   (9.99)

where we omitted the index m of ω_m. When we need to distinguish the refractive index n clearly from the group refractive index, we refer to n as a phase refractive index. Rewriting (9.98) as a relation of continuous quantities and using differentiation instead of variation, we have [2]

  n_g = n + \omega\frac{dn}{d\omega} \quad \text{or} \quad n_g = n - \lambda\frac{dn}{d\lambda}.   (9.100)

To derive the second equation of (9.100), we used the following relation: taking a variation of λω = 2πc, we have

  \omega\,d\lambda + \lambda\,d\omega = 0 \quad \text{or} \quad \frac{d\lambda}{\lambda} = -\frac{d\omega}{\omega}.
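The equivalence of the two expressions in (9.100) can be checked numerically with central differences; the model dispersion n(λ) below is a hypothetical Cauchy-type function, not a real material.

```python
import math

# Numerical check of (9.100): n_g = n + ω dn/dω coincides with n_g = n - λ dn/dλ.
# The model dispersion is hypothetical, chosen only for illustration.
c = 2.99792458e8                       # speed of light in vacuum [m/s]

def n_of_lam(lam):                     # hypothetical phase index vs wavelength [m]
    return 2.0 + 1.0e-14 / lam**2

def n_of_omega(omega):                 # the same dispersion expressed vs ω = 2πc/λ
    return n_of_lam(2 * math.pi * c / omega)

lam = 550e-9
omega = 2 * math.pi * c / lam
h_lam, h_om = 1e-12, 1e9               # central-difference steps

dn_dlam = (n_of_lam(lam + h_lam) - n_of_lam(lam - h_lam)) / (2 * h_lam)
dn_dom = (n_of_omega(omega + h_om) - n_of_omega(omega - h_om)) / (2 * h_om)

ng_from_lam = n_of_lam(lam) - lam * dn_dlam      # n_g = n - λ dn/dλ
ng_from_om = n_of_omega(omega) + omega * dn_dom  # n_g = n + ω dn/dω
print(ng_from_lam, ng_from_om)
```

For normal dispersion (dn/dλ < 0) the group index always exceeds the phase index, as the output shows.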

Several formulae or relational equations have been proposed to describe the wavelength dispersion. One of the famous and useful formulas among them is Sellmeier's dispersion formula [5]. As an example, Sellmeier's dispersion formula can be described as

  n = \sqrt{A + \frac{B}{1 - (C/\lambda)^2}},   (9.101)

where A, B, and C are appropriate constants, with A and B being dimensionless and C having a dimension [m]. In an actual case, it would be difficult to determine n analytically. However, if we are able to obtain well-resolved spectra, δm can be put as 1 and δω_m can be determined from the free spectral range. Expressing it as δω_FSR, from (9.99) we have

  \delta\omega_{\mathrm{FSR}} = \frac{\pi c}{L n_g} \quad \text{or} \quad n_g = \frac{\pi c}{L\,\delta\omega_{\mathrm{FSR}}}.   (9.102)

Thus, one can determine ng as a function of wavelengths.
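A minimal sketch of this determination via (9.102), with hypothetical values for the resonator length and the measured mode spacing:

```python
import math

# Sketch of (9.102): estimating the group index n_g from a measured free
# spectral range. The resonator length and mode spacing are hypothetical
# numbers, not data from the text.
c = 2.99792458e8       # speed of light in vacuum [m/s]
L = 0.5e-3             # resonator (crystal) length [m] (assumed)
lam = 530e-9           # emission wavelength [m] (assumed)
delta_lam = 0.1e-9     # measured mode spacing in wavelength [m] (assumed)

# Convert the wavelength spacing to an angular-frequency spacing:
# ω = 2πc/λ  =>  |δω| ≈ 2πc δλ/λ²
delta_omega_FSR = 2 * math.pi * c * delta_lam / lam**2

n_g = math.pi * c / (L * delta_omega_FSR)   # (9.102)
print(n_g)
```

Note that the speed of light cancels, leaving n_g = λ²/(2L δλ); for these assumed numbers n_g is close to 2.8.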

9.5.2 Organic Lasers

By virtue of the prominent features of lasers and related phenomena, researchers and engineers have been developing and proposing various device structures and operation principles in the research field of device physics. Among these are organic lasers, a newly emerging class of laser devices. Organic crystals possess well-defined structure-property relationships, and some of them exhibit peculiar light-emitting features. Therefore, those crystals are suited for studying their lasing properties. In this section we study the light-emitting properties (especially the lasing property) of organic crystals in relation to the wavelength dispersion of the refractive index of the materials. From the point of view of device physics, another key issue lies in designing an efficient diffraction grating or resonator (see Sect. 8.8). We address these fundamental issues from as broad a viewpoint as possible so that we may acquire systematic knowledge about optical device applications of materials of different compositions, either inorganic or organic.

In the following tangible examples, we further investigate specific aspects of the light-emitting properties of organic crystals to pursue fundamental properties of organic light-emitting materials. The emission properties are examined in device configurations of (1) a thin rectangular parallelepiped and (2) a slab waveguide equipped with a diffraction grating. In the former case, we investigate the longitudinal multimode of emissions that arises from an optical resonator shaped by a pair of parallel crystal faces (see Example 9.1). In the latter case, we study the transverse mode of the emissions in addition to the longitudinal mode (Examples 9.2 and 9.3).

Example 9.1 [6] We wish to show and interpret optical emission data of organic crystals within the framework of the interference theory of electromagnetic waves presented in the previous chapter. As an example, Fig. 9.6 [6] displays a broadband emission spectrum obtained under a weak optical excitation regime (using a mercury lamp) with a crystal consisting of the organic semiconductor AC′7. The optical


Fig. 9.6 Broadband emission spectra of an organic semiconductor crystal AC′7. (a) Full spectrum. (b) Enlarged profile of the spectrum around 530 nm. (Reproduced from Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission of AIP Publishing. https://doi.org/10.1063/1.3634117)

Fig. 9.7 Laser oscillation spectrum of an organic semiconductor crystal AC′7. (Reproduced from Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission of AIP Publishing. https://doi.org/10.1063/1.3634117)

measurements were carried out using an AC′7 crystal that had a pair of parallel crystal faces, which worked as an optical resonator called a Fabry-Pérot resonator. This caused the interference modulation of the emissions and, as a result, produced a periodic modulation superposed onto the broadband emission. This is clearly seen in Fig. 9.6b. As another example, Fig. 9.7 [6] displays a laser oscillation spectrum obtained under a strong excitation regime (using a laser beam) with the same AC′7 crystal as above. The structural formula of AC′7 is shown in Fig. 9.8 together with other


Fig. 9.8 Structural formulae of several organic semiconductors BP1T, AC5, and AC′7

related organic semiconductors. The spectrum again exhibits the periodic modulation accompanied by the longitudinal multimode. Sharply resolved emission lines with a regular spacing are clearly noted in the spectrum. If we compare this regular spacing with that of Fig. 9.6b, we immediately recognize that the spacing is virtually the same for the two spectra, in spite of a considerably large difference between their overall spectral profiles. In terms of the interference, both the broadband spectra and the laser emission lines gave the same free spectral range [6]. Thus, both the broadband emission spectra and the laser oscillation spectra are equally useful for studying the wavelength dispersion of the refractive index of the crystal. Choosing a suitable empirical formula for the wavelength dispersion, we can compare it with experimentally obtained results. For example, if we choose (9.101) as the empirical formula, what we do next is to determine the constants A, B, and C of (9.101) by comparing the computed results with the actual emission data. If well-resolved interference modulation spectra can be obtained, this enables one to immediately get a precise value of n_g from (9.102). In turn, it leads to information about the wavelength dispersion of n through (9.100) and (9.101). Bearing in mind this situation, Yamao et al. obtained the following expression [6]:


Table 9.1 Optimized constants A, B, and C for Sellmeier's dispersion formula (9.101) with several organic semiconductor crystals^a

  Material    A     B      C (nm)
  BP1T        5.7   1.04   397
  AC5         3.9   1.44   402
  AC′7        6.0   1.06   452

^a Reproduced from Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission of AIP Publishing. https://doi.org/10.1063/1.3634117

  n_g = \frac{A\left[1 - (C/\lambda)^2\right]^2 + B}{\left[1 - (C/\lambda)^2\right]^{3/2}\sqrt{A\left[1 - (C/\lambda)^2\right] + B}}.   (9.103)
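As a crude numerical illustration of fitting (9.103), the sketch below recovers A, B, and C from synthetic group-index data by a simple grid search; the "measured" points are generated from assumed constants, and the search merely stands in for a proper iterative least-squares refinement.

```python
import math

# Illustrative fit of the constants A, B, C by minimizing squared residuals
# between (9.103) and group-index data. The data are synthetic (generated from
# assumed constants), and wavelengths and C are in nm.
def ng_model(lam, A, B, C):
    u = 1.0 - (C / lam) ** 2                                  # 1 - (C/λ)²
    return (A * u**2 + B) / (u**1.5 * math.sqrt(A * u + B))   # (9.103)

A0, B0, C0 = 6.0, 1.06, 452.0          # assumed "true" constants
data = [(lam, ng_model(lam, A0, B0, C0)) for lam in (520.0, 560.0, 600.0, 640.0)]

best = None
for A in [5.0 + 0.1 * i for i in range(21)]:            # 5.0 ... 7.0
    for B in [0.9 + 0.02 * j for j in range(16)]:       # 0.90 ... 1.20
        for C in [430.0 + 2.0 * k for k in range(21)]:  # 430 ... 470
            err = sum((ng_model(lam, A, B, C) - ng) ** 2 for lam, ng in data)
            if best is None or err < best[0]:
                best = (err, A, B, C)

_, A_fit, B_fit, C_fit = best
print(A_fit, B_fit, C_fit)
```

With clean data the search returns the grid point closest to the assumed constants; with real data one would refine the grid (or use a standard nonlinear least-squares routine) exactly in the iterative spirit described in the text.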

Notice that (9.103) is obtained by inserting (9.101) into (9.100) and expressing n_g as a function of λ. Determining the optimum constants A, B, and C, this set of constants yields a reliable dispersion formula (9.101). Numerical calculations can be utilized effectively. The procedure is as follows: (1) Tentatively choosing probable numbers for A, B, and C in (9.103), n_g can be expressed as a function of λ. (2) The resulting fitting curve is then compared with the n_g data experimentally determined from (9.102). After this procedure, one chooses another set of A, B, and C and again compares the fitting curve with the experimental data. (3) This procedure is repeated many times through iterative numerical computations of (9.103) using different sets of A, B, and C. Thus, we can adjust and determine better and better combinations of A, B, and C so that the refined function (9.103) reproduces the experimental results as precisely as one pleases. At the same time, we can determine the most reliable combination of A, B, and C.

The optimized constants A, B, and C of (9.101) are listed in Table 9.1 [6]. Using these numbers, the dispersion of the phase refractive index n can be calculated from (9.101) as a function of the wavelength λ. Figure 9.9 [6] shows several examples of the wavelength dispersion of n_g and n for the organic semiconductor crystals. Once again, it is intriguing to see that the n_g associated with the laser oscillation line sits on the dispersion curve determined from the broadband emission lines (Fig. 9.9a). The formulae (9.101) and (9.103), along with the associated procedures to determine the constants A, B, and C, are expected to apply widely to various laser and light-emitting materials consisting of semiconducting inorganic and organic materials.

Example 9.2 [7] If we wish to construct a laser device, it is highly desirable to equip the laser material with a suitable diffraction grating or resonator [8, 9].
Suppose a situation where we choose a thin crystal for an optically active material equipped with the diffraction grating. In that case, we are to deal with a device consisting of a crystal slab waveguide. We have already encountered such a device configuration in Sect. 8.7 where the transverse modes must be considered on the basis of (8.167). We


Fig. 9.9 Examples of the wavelength dispersion of refractive indices for several organic semiconductor crystals. (a) Dispersion curves of the group indices (ng). The optimized solid curves are plotted using (9.103). The data plotted by filled circles (BP1T), filled triangles (AC5), and filled diamonds (AC′7) were obtained by the broadband interference modulation of emissions. An open circle, an open triangle, and open diamonds show the group indices estimated from the mode intervals of the multimode laser oscillations. (b) Dispersion curves of the phase refractive indices (n). These data are plotted using the Sellmeier's dispersion formula (9.101) with the optimized parameters A, B, and C (listed in Table 9.1). (Reproduced from Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission of AIP Publishing. https://doi.org/10.1063/1.3634117)

should also take account of the longitudinal modes that arise from the presence of the diffraction grating. As a result, allowed modes in the optical device are indexed by, e.g., TMml, TEml, where m and l denote the longitudinal modes and transverse modes, respectively (see Sects. 8.7 and 8.8). Besides the information about the dispersion of phase refractive index, we need numerical data of the propagation constant and its dispersion. The propagation constant β has appeared in Sect. 8.7.1 and is defined as


  \beta = k \sin\theta = n k_0 \sin\theta,   (8.153)

where θ is identical to the angle θ defined in Fig. 8.12. The propagation constant β (>0) characterizes the light propagation within a slab waveguide, for which the transverse modes of light are responsible. As the phase refractive index n has a dispersion, the propagation constant β has a dispersion as well. In optics, n sin θ is often referred to as an effective index. We denote it by

  n_{\mathrm{eff}} \equiv n \sin\theta \quad \text{or} \quad \beta = k_0\, n_{\mathrm{eff}}.   (9.104)

As 0 ≤ θ ≤ π/2, we have n sin θ ≤ n. In general, it is difficult to obtain an analytical description or solution for the dispersion of either the phase refractive index or the effective index. As in Example 9.1, we usually obtain the relevant data by numerical calculations.

Under these circumstances, to establish a design principle of high-performance organic light-emitting devices, Yamao et al. [7] developed a light-emitting device in which an organic crystal of P6T (see the structural formula in Fig. 9.10a) was placed onto a diffraction grating (Fig. 9.10b [7]). The compound P6T is known as a member of the family of thiophene/phenylene co-oligomers (TPCOs), together with BP1T, AC5, AC′7, etc., which appear in Fig. 9.8 [10, 11]. The diffraction grating was engraved using a focused ion beam (FIB) apparatus on an aluminum-doped zinc oxide (AZO) substrate. The resulting substrate was then laminated with the thin crystal of P6T (Fig. 9.10c). Using those devices, Yamao et al. excited the P6T crystal with the ultraviolet light of a mercury lamp, collected the emissions from the crystal, and analyzed the emission spectra (vide infra). Detailed analysis of those spectra has revealed a close relationship between the peak location of the emission lines and the emission direction. Thus, analyzing the angle-dependent emission spectra turns out to be a powerful tool to accurately determine the dispersion of the propagation constant of laser materials and, more generally, of light-emitting materials.

In a light-emitting device, we must consider the problem of out-coupling of light. Seeking efficient out-coupling of light is equivalent to solving the equations of electromagnetic wave motion inside and outside the slab crystal under appropriate boundary conditions. The boundary conditions are given such that the tangential components of the electromagnetic fields are continuous across the interface formed by the slab crystal and air or the device substrate.
Moreover, if a diffraction grating is present in an optical device (including the laser), the diffraction grating influences the emission characteristics of the device. In other words, regarding the out-coupling of light we must impose the following condition on the device [7]:

  \beta - m\mathbf{K} = (\mathbf{k}_0 \cdot \tilde{e})\,\tilde{e},   (9.105)


Fig. 9.10 Construction of an organic light-emitting device. (a) Structural formula of P6T. (b) Diffraction grating of an organic device. The purple arrow indicates the direction of the grating wavevector K. (Reproduced from Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y, Yamashita K, Ura S, Hotta S (2018) Design principle of high-performance organic single-crystal light-emitting devices. J Appl Phys 123(23): 235501/13 pages [7], with the permission of AIP Publishing. https://doi.org/10.1063/1.5030486). (c) Cross-section of the device consisting of P6T crystal/AZO substrate


Fig. 9.11 Geometrical relationship among the light propagation in crystal (indicated with β), the propagation outside the crystal (k0), and the grating wavevector (K). This geometry represents the phase matching between the emission inside the device (organic slab crystal) and out-coupled emission (i.e., the emission outside the device); see text



where β is the propagation vector with |β| ≡ β, which appeared in (8.153). The quantity β is called a propagation constant. In (9.105), furthermore, m is the diffraction order, which is a natural number (i.e., a positive integer); k0 denotes the wavenumber vector of an emission in vacuum with |k0| ≡ k0 = 2π/λp, where λp is the emission peak wavelength in vacuum. Note that k0 is almost identical to the wavenumber of the emission in air (i.e., the emission to be detected). The unit vector ẽ, which is oriented in the direction parallel to β − mK within the crystal plane, is defined as

  \tilde{e} = \frac{\beta - m\mathbf{K}}{|\beta - m\mathbf{K}|}.   (9.106)

In the above equation, K is the grating wavevector defined by |K| ≡ K = 2π/Λ, where Λ is the grating period; the direction of K is perpendicular to the grating grooves, as indicated in Fig. 9.10b with a purple arrow. Equation (9.105) is visualized in Fig. 9.11 as the geometrical relationship among the light propagation in the crystal (indicated with β), the propagation outside the crystal (k0), and the grating wavevector (K). If we are dealing with the emission parallel to the substrate plane, i.e., grazing emission, (9.105) can be reduced to


  \beta - m\mathbf{K} = \mathbf{k}_0.   (9.107)

Equations (9.105) and (9.107) represent the relationship between β and k0, i.e., the phase-matching conditions between the emission inside the device (the organic slab crystal) and the out-coupled emission, namely the emission outside the device (in air). Thus, the boundary conditions can be restated as (1) the tangential-component continuity of the electromagnetic fields at both planes of the interface (see Sect. 8.1) and (2) the phase matching of the electromagnetic fields inside and outside the laser medium. We examine these two factors below.

1. Tangential component continuity of electromagnetic fields (waveguide condition): Let us suppose that we are dealing with a dielectric medium with an anisotropic dielectric constant but an isotropic magnetic permeability. In this situation, Maxwell's equations must be formulated so that we can deal with the electromagnetic properties of an anisotropic medium. Such substances are widely available, and organic crystals are counted as typical examples. Among those crystals, P6T crystallizes in the monoclinic system, as do many other organic crystals [10, 11]. In this case, the permittivity tensor (or electric permittivity tensor) ε is written as [7]

  \varepsilon = \begin{pmatrix} \varepsilon_{aa} & 0 & \varepsilon_{ac} \\ 0 & \varepsilon_{bb} & 0 \\ \varepsilon_{ca} & 0 & \varepsilon_{cc} \end{pmatrix}.   (9.108)

Notice that in (9.108) the permittivity tensor is described in the orthogonal coordinates of the abc-system, where a and b coincide with the crystallographic a- and b-axes of P6T, with the c-axis being perpendicular to the ab-plane. Note that in many organic crystals the crystallographic c-axis is not perpendicular to the ab-plane. Meanwhile, we assume that the magnetic permeability μ of P6T is isotropic and identical to that of vacuum. That is, in a tensor form we have

  \mu = \begin{pmatrix} \mu_0 & 0 & 0 \\ 0 & \mu_0 & 0 \\ 0 & 0 & \mu_0 \end{pmatrix},   (9.109)

where μ0 is the magnetic permeability of vacuum. Then, Maxwell's equations in a matrix form read as

  \mathrm{rot}\,\mathbf{E} = -\mu_0 \frac{\partial \mathbf{H}}{\partial t},   (9.110)


Fig. 9.12 Laboratory coordinate system (the ξηζsystem) for the device experiments. Regarding the symbols and notations, see text


  \mathrm{rot}\,\mathbf{H} = \varepsilon_0 \begin{pmatrix} \varepsilon_{aa} & 0 & \varepsilon_{ac} \\ 0 & \varepsilon_{bb} & 0 \\ \varepsilon_{ca} & 0 & \varepsilon_{cc} \end{pmatrix} \frac{\partial \mathbf{E}}{\partial t}.   (9.111)

In (9.111) the equation is described in the abc-system. It is desirable, however, to describe Maxwell's equations in the laboratory coordinate system (see Fig. 9.12) so that we can readily visualize the light propagation in an anisotropic crystal such as P6T. Figure 9.12 depicts the ξηζ-system, where the ξ-axis points in the direction of the light propagation within the crystal; namely, the ξ-axis parallels β in (9.107). We assume that the ξηζ-system forms an orthogonal coordinate system. Here we define the dielectric constant ellipsoid ε̃ described by

  (\xi\ \eta\ \zeta)\,\tilde{\varepsilon} \begin{pmatrix} \xi \\ \eta \\ \zeta \end{pmatrix} = 1,   (9.112)

which represents an ellipsoid in the ξηζ-system. If one cuts the ellipsoid by a plane that includes the origin of the coordinate system and is perpendicular to the ξ-axis, the cross-section is an ellipse. In this situation, one can choose the η- and ζ-axes as the principal axes of the ellipse. This implies that when we put ξ = 0, we must have

  (0\ \eta\ \zeta)\,\tilde{\varepsilon} \begin{pmatrix} 0 \\ \eta \\ \zeta \end{pmatrix} = \varepsilon_{\eta\eta}\,\eta^2 + \varepsilon_{\zeta\zeta}\,\zeta^2 = 1.   (9.113)

In other words, we must have ε̃ in the form of

  \tilde{\varepsilon} = \begin{pmatrix} \varepsilon_{\xi\xi} & \varepsilon_{\xi\eta} & \varepsilon_{\xi\zeta} \\ \varepsilon_{\eta\xi} & \varepsilon_{\eta\eta} & 0 \\ \varepsilon_{\zeta\xi} & 0 & \varepsilon_{\zeta\zeta} \end{pmatrix},

where ε̃ is expressed in the ξηζ-system. In terms of matrix algebra, the principal submatrix with respect to the η- and ζ-components should be diagonalized (see Sect. 11.3). On this condition, the electric flux density of the electromagnetic wave is polarized in the η- and ζ-directions. The half-lengths of the principal axes, i.e., 1/√εηη and 1/√εζζ, give the reciprocals of the anisotropic refractive indices. Suppose that the ξηζ-system is reached by two successive rotations starting from the abc-system, in such a way that we first perform a rotation by α around the c-axis, followed by another rotation by δ around the ξ-axis (Fig. 9.12). Then we have [7]

  \begin{pmatrix} \varepsilon_{\xi\xi} & \varepsilon_{\xi\eta} & \varepsilon_{\xi\zeta} \\ \varepsilon_{\eta\xi} & \varepsilon_{\eta\eta} & 0 \\ \varepsilon_{\zeta\xi} & 0 & \varepsilon_{\zeta\zeta} \end{pmatrix} = R_\xi^{-1}(\delta)\, R_c^{-1}(\alpha) \begin{pmatrix} \varepsilon_{aa} & 0 & \varepsilon_{ac} \\ 0 & \varepsilon_{bb} & 0 \\ \varepsilon_{ca} & 0 & \varepsilon_{cc} \end{pmatrix} R_c(\alpha)\, R_\xi(\delta),   (9.114)

where R_c(α) and R_ξ(δ) stand for the first and second rotation matrices above, respectively. In (9.114) we have

  R_c(\alpha) = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad R_\xi(\delta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\delta & -\sin\delta \\ 0 & \sin\delta & \cos\delta \end{pmatrix}.   (9.115)
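The transformation (9.114)-(9.115) can be sketched numerically. Starting from an assumed tensor with the monoclinic pattern of (9.108) (hypothetical numbers, not P6T data), the code below applies the two rotations and picks δ, for a given α, as the Jacobi angle that zeroes the ηζ component, thereby diagonalizing the (η, ζ) principal submatrix as required by (9.113).

```python
import math

# Sketch of (9.114)-(9.115): rotate an assumed permittivity tensor from the
# abc-system into the ξηζ-system; δ is chosen so that ε_ηζ vanishes.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(X):
    return [[X[j][i] for j in range(3)] for i in range(3)]

def Rc(a):   # first matrix of (9.115): rotation by α around the c-axis
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a),  math.cos(a), 0.0],
            [0.0,          0.0,         1.0]]

def Rxi(d):  # second matrix of (9.115): rotation by δ around the ξ-axis
    return [[1.0, 0.0,          0.0],
            [0.0, math.cos(d), -math.sin(d)],
            [0.0, math.sin(d),  math.cos(d)]]

eps_abc = [[6.2, 0.0, 0.4],   # hypothetical tensor with the pattern of (9.108)
           [0.0, 5.1, 0.0],
           [0.4, 0.0, 7.3]]
alpha = 0.3                   # assumed azimuth of the propagation direction

# First rotation of (9.114) (R^{-1} = R^T for rotation matrices):
e1 = matmul(transpose(Rc(alpha)), matmul(eps_abc, Rc(alpha)))
# Jacobi angle that zeroes the ηζ component after the second rotation:
delta = 0.5 * math.atan2(2.0 * e1[1][2], e1[1][1] - e1[2][2])
eps = matmul(transpose(Rxi(delta)), matmul(e1, Rxi(delta)))
print(eps[1][2])              # ε_ηζ vanishes (up to rounding)
```

This makes concrete the remark that δ cannot be chosen independently of α: changing α changes e1 and hence the diagonalizing δ.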

Regarding the matrix representations and related coordinate transformations of (9.114) and (9.115), see Sects. 11.4, 17.1, and 17.4.2. Note that the rotation angle δ cannot be decided independently of α. In the literature [7], furthermore, the quantity εξη = εηξ was ignored as small compared with the other components. As a result, (9.111) is converted into the following equation:

  \mathrm{rot}\,\mathbf{H} = \varepsilon_0 \begin{pmatrix} \varepsilon_{\xi\xi} & 0 & \varepsilon_{\xi\zeta} \\ 0 & \varepsilon_{\eta\eta} & 0 \\ \varepsilon_{\zeta\xi} & 0 & \varepsilon_{\zeta\zeta} \end{pmatrix} \frac{\partial \mathbf{E}}{\partial t}.   (9.116)

Notice here that the electromagnetic quantities H and E are measured in the ξηζ-coordinate system. Meanwhile, the electromagnetic waves that are propagating within the slab crystal are described as


  \Phi_\nu \exp[i(\beta\xi - \omega t)],   (9.117)

where Φν stands for either an electric field or a magnetic field, with ν = 1, 2, 3 representing the components of the ξηζ-system. Inserting (9.117) into (9.116) and separating the resulting equation into the ξ, η, and ζ components, we obtain six equations with respect to the six components of H and E. Of these, we are particularly interested in Eξ and Eζ as well as Hη, because we assume that the electromagnetic wave propagates as a TM mode [7]. Using the set of the above six equations, for Hη in the P6T crystal we get the following second-order linear differential equation (SOLDE):

  \frac{d^2 H_\eta}{d\zeta^2} + 2i\, n_{\mathrm{eff}} k_0 \frac{\varepsilon_{\xi\zeta}}{\varepsilon_{\zeta\zeta}} \frac{dH_\eta}{d\zeta} + \left(\varepsilon_{\xi\xi} - \frac{\varepsilon_{\xi\zeta}^2}{\varepsilon_{\zeta\zeta}} - \frac{\varepsilon_{\xi\xi}}{\varepsilon_{\zeta\zeta}}\, n_{\mathrm{eff}}^2\right) k_0^2\, H_\eta = 0,   (9.118)

where n_eff = n sin θ of (8.153) is the effective index. Assuming as a solution of (9.118)

  H_\eta = H_{\mathrm{cryst}}\, e^{-i\kappa\zeta} \cos(k\zeta + \phi_s) \qquad (\kappa \neq 0,\; k \neq 0;\; H_{\mathrm{cryst}},\, \phi_s:\ \mathrm{constants})   (9.119)

and inserting (9.119) into (9.118), we get an equation of the following type:

  Z\, e^{-i\kappa\zeta} \cos(k\zeta + \phi_s) + \Omega\, e^{-i\kappa\zeta} \sin(k\zeta + \phi_s) = 0.   (9.120)

In (9.119), H_cryst represents the magnetic field within the P6T crystal. The constants Z and Ω in (9.120) can be described using κ and k together with the constant coefficients of dHη/dζ and Hη in the SOLDE (9.118). Meanwhile, we have

  \begin{vmatrix} e^{-i\kappa\zeta}\cos(k\zeta + \phi_s) & e^{-i\kappa\zeta}\sin(k\zeta + \phi_s) \\ \left[e^{-i\kappa\zeta}\cos(k\zeta + \phi_s)\right]' & \left[e^{-i\kappa\zeta}\sin(k\zeta + \phi_s)\right]' \end{vmatrix} = k\, e^{-2i\kappa\zeta} \neq 0,

where the differentiation is with respect to ζ. This shows that e^{−iκζ} cos(kζ + φs) and e^{−iκζ} sin(kζ + φs) are linearly independent. This in turn implies that

  Z = \Omega = 0   (9.121)

in (9.120). Thus, from (9.121) we can determine κ and k such that

  \kappa = \beta\, \frac{\varepsilon_{\xi\zeta}}{\varepsilon_{\zeta\zeta}} = n_{\mathrm{eff}} k_0\, \frac{\varepsilon_{\xi\zeta}}{\varepsilon_{\zeta\zeta}}, \qquad k^2 = \frac{1}{\varepsilon_{\zeta\zeta}^2} \left(\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^2\right) \left(\varepsilon_{\zeta\zeta} - n_{\mathrm{eff}}^2\right) k_0^2.   (9.122)
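That the trial solution (9.119) indeed satisfies (9.118) once κ and k are chosen according to (9.122) can be checked by direct numerical substitution; the tensor components, effective index, and wavelength below are hypothetical (note that ε_ζζ > n_eff² is needed for k to be real).

```python
import cmath, math

# Numerical check that (9.119) solves (9.118) with κ and k given by (9.122).
e_xx, e_zz, e_xz = 7.0, 6.5, 0.5     # assumed ε_ξξ, ε_ζζ, ε_ξζ
n_eff, k0 = 2.2, 2 * math.pi / 530e-9
phi_s, H_cryst = 0.4, 1.0

kappa = n_eff * k0 * e_xz / e_zz                                      # (9.122)
k = math.sqrt((e_xx * e_zz - e_xz**2) * (e_zz - n_eff**2)) / e_zz * k0

def H(z):        # (9.119) and its first two derivatives in closed form
    return H_cryst * cmath.exp(-1j * kappa * z) * cmath.cos(k * z + phi_s)

def dH(z):
    return H_cryst * cmath.exp(-1j * kappa * z) * (
        -1j * kappa * cmath.cos(k * z + phi_s) - k * cmath.sin(k * z + phi_s))

def d2H(z):
    return H_cryst * cmath.exp(-1j * kappa * z) * (
        (-kappa**2 - k**2) * cmath.cos(k * z + phi_s)
        + 2j * kappa * k * cmath.sin(k * z + phi_s))

z = 3.7e-7                            # arbitrary sample point inside the slab
residual = (d2H(z) + 2j * n_eff * k0 * (e_xz / e_zz) * dH(z)
            + (e_xx - e_xz**2 / e_zz - e_xx * n_eff**2 / e_zz) * k0**2 * H(z))
print(abs(residual) / (k0**2 * abs(H(z))))   # essentially zero
```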

Further using the aforementioned six equations to eliminate Eζ, we obtain

  E_\xi = i\, \frac{\varepsilon_{\zeta\zeta}\, k}{\omega\varepsilon_0 \left(\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^2\right)}\, H_{\mathrm{cryst}}\, e^{-i\kappa\zeta} \sin(k\zeta + \phi_s).   (9.123)

Equations (9.119) and (9.123) define the electromagnetic fields as functions of ζ within the P6T slab crystal. Meanwhile, the fields in the AZO substrate and in air can readily be determined. Assuming that both substances are isotropic, we have εξζ = 0 and εξξ = εζζ ≈ ñ², and hence we get

  \frac{d^2 H_\eta}{d\zeta^2} + \left(\tilde{n}^2 - n_{\mathrm{eff}}^2\right) k_0^2\, H_\eta = 0,   (9.124)

where ñ is the refractive index of either the substrate or air. Since

  1 < \tilde{n} \leq n_{\mathrm{eff}} \leq n,   (9.125)

where n is the phase refractive index of the P6T crystal, we have ñ² − n_eff² < 0. Defining a quantity

  \gamma \equiv \sqrt{n_{\mathrm{eff}}^2 - \tilde{n}^2}\; k_0   (9.126)

for the substrate (i.e., AZO) and air, from (9.124) we express the electromagnetic fields as

  H_\eta = \tilde{H}\, e^{\gamma\zeta},   (9.127)

where H̃ denotes the magnetic field relevant to the substrate or air. Considering that ζ < 0 for the substrate and ζ > d/cos δ in air (see Fig. 9.13), for the magnetic field we have

  H_\eta = \tilde{H}\, e^{\gamma\zeta} \quad \text{and} \quad H_\eta = \tilde{H}\, e^{-\gamma(\zeta - d/\cos\delta)}   (9.128)

for the substrate and air, respectively. Notice that Hη → 0 when ζ → −∞ for the substrate and ζ → ∞ for air. Correspondingly, for the electric field we get

  E_\xi = -i\, \frac{\gamma}{\omega\varepsilon_0 \tilde{n}^2}\, \tilde{H}\, e^{\gamma\zeta} \quad \text{and} \quad E_\xi = i\, \frac{\gamma}{\omega\varepsilon_0 \tilde{n}^2}\, \tilde{H}\, e^{-\gamma(\zeta - d/\cos\delta)}   (9.129)

for the substrate and air, respectively. The device geometry is characterized by the P6T slab crystal being sandwiched between air and the device substrate so as to form a three-layered (air/crystal/substrate) structure (Fig. 9.10c). Therefore, we impose boundary


Fig. 9.13 Geometry of the cross-section of air/P6T crystal/AZO substrate. The angle δ is identical with that of Fig. 9.12. The points P and P′ are located at the crystal/substrate interface and the air/crystal interface, respectively. The distance between P and P′ is d/cos δ

conditions on Hη in (9.119) and (9.128), along with Eξ in (9.123) and (9.129), in such a way that their tangential components are continuous across the interfaces between the crystal and the substrate and between the crystal and air. Figure 9.13 represents the geometry of the cross-section of air/P6T crystal/AZO substrate. From (9.119) and (9.128), (a) the tangential continuity condition for the magnetic field at the crystal/substrate interface (ζ = 0) is described as

  H_{\mathrm{cryst}}\, e^{-i\kappa\cdot 0} \cos(k\cdot 0 + \phi_s)(\cos\delta) = \tilde{H}\, e^{\gamma\cdot 0}(\cos\delta),   (9.130)

where the factor cos δ comes from the requirement of the tangential continuity condition. (b) The tangential continuity condition for the electric field at the same interface reads as

  i\, \frac{\varepsilon_{\zeta\zeta}\, k}{\omega\varepsilon_0 \left(\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^2\right)}\, H_{\mathrm{cryst}}\, e^{-i\kappa\cdot 0} \sin(k\cdot 0 + \phi_s) = -i\, \frac{\gamma}{\omega\varepsilon_0 \tilde{n}^2}\, \tilde{H}\, e^{\gamma\cdot 0}.   (9.131)

Thus, dividing both sides of (9.131) by both sides of (9.130), we get

  \frac{\varepsilon_{\zeta\zeta}}{\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^2}\, k \tan\phi_s = -\frac{\gamma}{\tilde{n}^2} \quad \text{or} \quad \frac{\varepsilon_{\zeta\zeta}}{\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^2}\, k \tan(-\phi_s) = \frac{\gamma}{\tilde{n}^2},   (9.132)


where γ and ñ are substrate-related quantities; see (9.125) and (9.126). Another boundary condition, at the air/crystal interface, can be obtained in a similar manner. Notice that in that case ζ = d/cos δ; see Fig. 9.13. As a result, we have

  \frac{\varepsilon_{\zeta\zeta}}{\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^2}\, k \tan\!\left(\frac{kd}{\cos\delta} + \phi_s\right) = \frac{\gamma}{\tilde{n}^2},   (9.133)

where γ and ñ are air-related quantities. The quantity φs can be eliminated by taking arctangents of (9.132) and (9.133). In combination with (9.122), we finally get the following eigenvalue equation for the TM modes:

  \frac{k_0 d \sqrt{\dfrac{1}{\varepsilon_{\zeta\zeta}^2}\left(\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^2\right)\left(\varepsilon_{\zeta\zeta} - n_{\mathrm{eff}}^2\right)}}{\cos\delta} = l\pi + \tan^{-1}\!\left[\frac{1}{n_{\mathrm{air}}^2}\sqrt{\frac{\left(\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^2\right)\left(n_{\mathrm{eff}}^2 - n_{\mathrm{air}}^2\right)}{\varepsilon_{\zeta\zeta} - n_{\mathrm{eff}}^2}}\right] + \tan^{-1}\!\left[\frac{1}{n_{\mathrm{sub}}^2}\sqrt{\frac{\left(\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^2\right)\left(n_{\mathrm{eff}}^2 - n_{\mathrm{sub}}^2\right)}{\varepsilon_{\zeta\zeta} - n_{\mathrm{eff}}^2}}\right],   (9.134)

where n_air (=1) is the refractive index of air, n_sub is that of the AZO substrate equipped with the diffraction grating, d is the crystal thickness, and l is the order of the transverse mode. The refractive index n_sub is estimated as the average of the refractive indices of air and AZO weighted by the volume fractions they occupy in the diffraction grating. Regarding the index l, the condition l ≥ 0 (i.e., zero or a positive integer) must be satisfied. Note that l = 0 is allowed because the total internal reflection (Sects. 8.5 and 8.6) is operative in the crystal waveguide. Thus, n_eff can be determined as a function of k0, with d and l as parameters. To solve (9.134), iterative numerical computation is needed. The detailed calculation procedures can be found in the literature [7] and the supplementary material therein.

Note that (9.134) includes the well-known eigenvalue equation for a waveguide that comprises an isotropic core dielectric medium sandwiched between a pair of clad layers having the same refractive index (a symmetric waveguide) [12]. In that case, in fact, the second and third terms on the RHS of (9.134) are the same, and their sum is identical with −δ_TM of (8.176) in Sect. 8.7.2. To confirm this, in (9.134) use εξξ = εζζ ≈ n² [see (7.57)] and εξζ = 0 together with the relative refractive index given by (8.39).

2. Phase matching of the electromagnetic fields (transverse mode requisition): Once we have obtained the eigenvalue equation (9.134), we will be able to solve the problem once the crystal thickness d is given. Even though the order of the transverse mode l is yet unknown, l is a discrete value (i.e., zero or a


Fig. 9.14 Two cases of the light propagation in the device geometry of a slab waveguide. (a) The projections of the vectors β and k0 onto K are parallel to each other [Case (1)]. (b) The projections of β and k0 onto K are antiparallel [Case (2)]. For the sake of simplicity, the projection of β onto K is parallel to K in both cases. The number m is assumed to be a positive integer

positive integer) so that the computation of (9.134) can be simplified. Namely, inserting probable values of l into (9.134), we can greatly save the time and effort needed to get a solution. Meanwhile, the condition of the phase matching is equivalent to determining the index m related to the longitudinal mode. To determine m, we return to (9.107). Figure 9.14 features the following two cases with regard to the light propagation represented by (9.107) in the device geometry of the slab waveguide. Case (1): The projections of the vectors β and k0 onto K are parallel to each other (see Fig. 9.14a). Case (2): The projections of β and k0 onto K are antiparallel (Fig. 9.14b). The angle θ of Fig. 9.14 is defined as that made by k0 and K.

Suppose geometric transformations that convert the vectors β, k0, and K into those having the opposite directions. Then, the mutual geometric arrangement of these vectors remains unchanged, together with the form of (9.107), before and after the transformations. Concomitantly, the angle θ is switched to θ − π. As a result, if we set θ as 0 ≤ θ ≤ π/2, we have −π ≤ θ − π ≤ −π/2. Meanwhile, let us think of the reflection with respect to the plane that contains the vector K and is perpendicular to the plane of the paper. In that case, we are uncertain as to whether β undergoes the same reflection under the said reflection operation. This is because the dielectric constant ellipsoid does not necessarily possess mirror symmetry with respect to that plane in an anisotropic medium. We accordingly need the information about the emission data taken at the angle −θ. Similarly, setting −θ as 0 ≤ −θ ≤ π/2 (or −π/2 ≤ θ ≤ 0), we have −3π/2 ≤ θ − π ≤ −π. Combining the above results, the ranges of θ and θ − π cover the emission angles [−3π/2, π/2], which is equivalent to the whole range [0, 2π]. Thus, angle-dependent measurements covering −π/2 ≤ θ ≤ π/2 provide us with sufficient data to be analyzed.

Now, care should be taken in determining β.
This is because, from an experimental perspective, k0 can be determined accurately, whereas arbitrariness is involved with β. Defining β ≡ |β|, k0 ≡ |k0|, and K ≡ |K|, we examine the relationship among

388

9 Light Quanta: Radiation and Absorption

the magnitudes of β, k0, and K according to the above two cases. Using an elementary trigonometric formula (the law of cosines), we describe the results as follows:

β² = k0² + m²K² − 2mKk0 cos(π − θ) = k0² + m²K² + 2mKk0 cos θ,

or

β² = k0² + m²K² − 2mKk0 cos θ,   (9.135)

where we assume that m is a positive integer. In (9.135) the former and latter equations correspond to Case (1) and Case (2), respectively (see Fig. 9.14). Replacing β with β = k0neff and taking the variation of the resulting equation with regard to k0 and θ, we get

δk0/δθ = ∓ mKk0 sin θ / [k0(neff² − 1) ∓ mK cos θ],   (9.136)

where the signs − and + are associated with Case (1) and Case (2), respectively. In (9.136) we did not take the variation of neff, because it was assumed to be weakly dependent on θ compared to k0. Note that since k0 was not allowed to take continuous values, we used the sign of variation δ instead of the sign of differentiation d or ∂. The denominator of (9.136) is positive (readers, check it), and so δk0/δθ is negative or positive with 0 ≤ θ ≤ π/2 for Case (1) and Case (2), respectively. The quantity δk0/δθ is positive or negative with −π/2 ≤ θ ≤ 0 for Case (1) and Case (2), respectively. For both Case (1) and Case (2), δk0/δθ = 0 at θ = 0; namely, k0 (or the emission wavelength) takes an extremum at θ = 0. In terms of the emission wavelength, Case (1) and Case (2) are related to the redshift and blueshift in the spectra, respectively. These shifts can be detected and inspected closely by the angle-dependent emission spectra (vide infra).

Meanwhile, taking inner products (see Chap. 13) of both sides of (9.107), we have

⟨β − mK | β − mK⟩ = ⟨k0 | k0⟩.

That is, we get

β² + m²K² − 2mKβ cos φ = k0²,   (9.137)

where φ is the angle between mK and β (see Fig. 9.11). Precisely determining φ is, however, accompanied by experimental ambiguity. To avoid the ambiguity, therefore, it is preferable to deal with a case where all the vectors β, K, and k0 are (anti)parallel to ẽ in (9.105). From a practical point of view, moreover, it is most desirable to make a device so that it can strongly emit light in the direction parallel to the grating wavevector.
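The sign analysis above can be verified numerically. The sketch below substitutes β = neff k0 into (9.135) and solves the resulting quadratic for k0(θ); the parameter values (neff, Λ, m) are illustrative assumptions only, and neff is taken as dispersionless for simplicity.

```python
import math

# Solve (9.135) with beta = n_eff * k0 for the positive root k0(theta).
# n_eff, Lam (grating period), and m below are illustrative assumptions.
n_eff = 2.0
Lam = 300e-9                 # grating period (m)
K = 2 * math.pi / Lam        # grating wavenumber
m = 1                        # diffraction order

def k0(theta, case):
    """Positive root of (n_eff^2 - 1) k0^2 -/+ 2 m K cos(theta) k0 - m^2 K^2 = 0.
    case 1: beta^2 = k0^2 + m^2 K^2 + 2 m K k0 cos(theta)  (redshifted branch)
    case 2: beta^2 = k0^2 + m^2 K^2 - 2 m K k0 cos(theta)  (blueshifted branch)"""
    c = math.cos(theta)
    sign = 1.0 if case == 1 else -1.0
    a = n_eff ** 2 - 1.0
    return m * K * (sign * c + math.sqrt(c * c + a)) / a

# At theta = 0 the roots reduce to (9.138): k0 = mK/(n_eff - 1) and mK/(n_eff + 1).
assert math.isclose(k0(0.0, 1), m * K / (n_eff - 1))
assert math.isclose(k0(0.0, 2), m * K / (n_eff + 1))

# Emission wavelength 2*pi/k0 versus angle: Case (1) redshifts and Case (2)
# blueshifts as |theta| grows, with an extremum at theta = 0 in both cases.
for case in (1, 2):
    print(case, [round(2e9 * math.pi / k0(math.radians(t), case), 1)
                 for t in (0, 15, 30)])
```

The monotonic trends printed for the two cases reproduce the redshifted and blueshifted progressions discussed around (9.136).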



Fig. 9.15 Schematic diagram for the relationship among k0, β, and K. This geometry is obtained by substituting φ = 0 for (9.137). (a) Redshifted case. (b) Blueshifted case. (c) Bragg's condition case. All these diagrams are depicted as a cross section of the devices cut along the K direction. (Reproduced from Inada Y, Kawata Y, Kawai T, Hotta S, Yamao T (2020) Laser oscillation at an intended wavelength from a distributed feedback structure on anisotropic organic crystals. J Appl Phys 127(5): 053102/6 pages [13], with the permission of AIP Publishing. https://doi.org/10.1063/1.5129425)

In that case, we have φ = 0 and (9.137) is reduced to

(β − mK)² = k0²   or   β − mK = ±k0.

Then, this gives us two choices such that

(1) β = k0 + mK = neff k0,   k0 = 2π/λp = mK/(neff − 1),
(2) −β = k0 − mK = −neff k0,   k0 = 2π/λp = mK/(neff + 1).   (9.138)

In (9.138), all the physical quantities are assumed to be positive. According to the above two cases, we have the following corresponding geometries:
1. β, K ∥ k0 (β and K are parallel to k0),
2. β, K ∥ −k0 (β and K are antiparallel to k0).
The above geometry is depicted in Fig. 9.15 [13]. Figure 9.15 includes a case of the Bragg condition as well (Fig. 9.15c). Notice that Fig. 9.15a and Fig. 9.15b are special cases of the configurations depicted in Fig. 9.14a and Fig. 9.14b, respectively.


Further rewriting (9.138) to combine the two equations, we get

λp = 2π(neff ∓ 1)/(mK) = (neff ∓ 1)Λ/m,   (9.139)

where the sign ∓ corresponds to the aforementioned Case (1) and Case (2), respectively; we used the relation K = 2π/Λ. Equation (9.139) is useful for guessing the magnitude of m: since m changes stepwise (e.g., m = 1, 2, 3, etc.), we have only to examine a limited number of values of m so that neff may satisfy (9.125). When laser oscillation takes place, (9.138) and (9.139) should be replaced with the following relation that represents the Bragg condition [14]:

2β = mK   (9.140)

or

λp = 2neff Λ/m.   (9.141)
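As a numerical sketch of these two relations, the snippet below first enumerates candidate diffraction orders m through (9.139) for a redshifted θ = 0 peak and then fixes the grating period from the Bragg condition (9.141). The period, peak wavelength, index window, and neff value are assumed numbers for illustration, not data from the text.

```python
# Assumed observation: grating period and redshifted peak wavelength at theta = 0.
Lam = 320e-9
lam_p = 569e-9

# Step 1: guess m via (9.139) for the redshifted branch, lam_p = (n_eff - 1)*Lam/m,
# i.e. n_eff = m*lam_p/Lam + 1. Only small integers giving a plausible index survive.
candidates = {mm: mm * lam_p / Lam + 1 for mm in range(1, 6)}
plausible = {mm: n for mm, n in candidates.items() if 1.5 < n < 4.0}
print(plausible)          # a short list; the final m must also satisfy (9.134)

# Step 2: with the dispersion n_eff(lam) in hand, the Bragg condition (9.141)
# fixes the period for a target lasing wavelength: Lam = m*lam_p/(2*n_eff).
n_eff = 2.67              # assumed effective index at the target wavelength
m = 3
Lam_design = m * lam_p / (2 * n_eff)
print(round(Lam_design * 1e9, 1), "nm")
```

For these assumed numbers only m = 1 survives the index window in Step 1, and Step 2 returns a period close to 320 nm.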

Once we find a suitable m, (9.107) enables us to seek the relationship between λp and neff experimentally.

As described above, we have explained two major factors of the out-coupling of emission. In the above discussion, we assumed that the TM modes dominate. Combining (9.107) and (9.134), we successfully assign individual emissions to the specific TMml modes, in which the indices m and l denote the longitudinal and transverse modes, respectively. Note that m ≥ 1 and l ≥ 0. Whereas the index m is associated with the presence of the diffraction grating, l is pertinent to the waveguide condition.

In general, it is quite difficult to precisely predict the phase refractive index and effective index of anisotropic dielectric materials such as organic crystals. This situation leads to difficulty in developing optical devices based on the organic crystals. In that respect, (9.134) is highly useful in designing high-performance optical devices. At the same time, the angle-dependent emission spectroscopy offers various pieces of information necessary to construct the device. As a typical example, Fig. 9.16a [7] shows the angle-dependent emission spectra of a P6T crystal. For this, the grazing emission was intended. There are two series of progressions in which the emission peak locations are either redshifted or blueshifted with increasing angles |θ|; the angle θ is defined as that made between the two vectors K and k0; see Figs. 9.14 and 9.16b as well as (9.135). Also, these progressions exhibit the extrema at θ = 0, as expected. Thus, Fig. 9.16 obviously demonstrates that the redshifted and blueshifted progressions result from the device geometry represented by Fig. 9.14a [corresponding to Case (1)] and Fig. 9.14b [Case (2)], respectively. The first and second equations of (9.138) are associated with the emission spectra recorded at θ = 0 in a series of the redshifted and blueshifted progressions, respectively.
That is, the index neff was determined from (9.138) by replacing λp with the peak wavelength observed at θ = 0 of Fig. 9.16a and replacing m with the



Fig. 9.16 Angle-dependent emission spectra of a P6T crystal. (a) Emission spectra. (Reproduced from Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y, Yamashita K, Ura S, Hotta S (2018) Design principle of high-performance organic single-crystal light-emitting devices. J Appl Phys 123(23): 235501/13 pages [7], with the permission of AIP Publishing. https://doi.org/10.1063/1.5030486). (b) Geometrical relationship between k0 and K. The grazing emissions (k0) were collected at various angles θ made with the grating wavevector direction (K)

most suitable integer m (vide supra). In such a manner, individual experimental emission data were assigned to a specific TMml mode. Namely, the TMml mode assignment has been completed by combining (k0, l) determined from (9.134) and (k0, m) decided by (9.138). Notice that the same number m is held unchanged within the same series of redshifted or blueshifted progression (i.e., −90° ≤ θ ≤ 90° in Fig. 9.16a). This allows us to experimentally determine the dispersion of effective indices within the same progression (either redshifted or blueshifted) using (9.107).

The emission data experimentally obtained and assigned above are displayed in Fig. 9.17 [7]. The observed emission peaks are indicated with open circles/triangles or closed circles/triangles for blueshifted and redshifted peaks, respectively. In Fig. 9.17, the TM20 (blueshifted) and TM10 (redshifted) modes are seamlessly joined so as to form an upper group of emission peaks. A lower group of emission peaks is assigned to the TM21 (blueshifted) or TM11 (redshifted) modes. For those emission data, once again, the relevant modes are seamlessly joined. Notice that these modes may have multiple values of neff according to different εξξ, εζζ, and εξζ that vary with the different propagation directions of light (i.e., the ξ-axis) within the slab crystal waveguide plane. The colored solid curves represent the results obtained by purely numerical computations based on (9.134). Nonetheless, the dielectric constant ellipsoid elements εξξ, εζζ, and εξζ reflect the direction of the wavenumber vector k0 through (9.107)



Fig. 9.17 Wavelength dispersion of the effective indices neff for the P6T crystal waveguide. The open and closed symbols show the data pertinent to the blueshifted and redshifted peaks, respectively. These data were calculated from either (9.105) or (9.107). The colored solid curves represent the dispersions of the effective refractive indices computed from (9.134). The numbers and characters (m, l, s) denote the diffraction order (m), order of transverse mode (l), and shift direction (s) that distinguishes between blueshift (b) and redshift (r). A black solid curve drawn at the upper right indicates the wavelength dispersion of the phase refractive indices (n) of the P6T crystal related to one principal axis [15]. (Reproduced from Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y, Yamashita K, Ura S, Hotta S (2018) Design principle of high-performance organic single-crystal light-emitting devices. J Appl Phys 123(23): 235501/13 pages [7], with the permission of AIP Publishing. https://doi.org/10.1063/1.5030486)

and, hence, (9.134) is associated with the experimental data in that sense. As clearly recognized from Fig. 9.17, the agreement between the experiments and the numerical computations is satisfactory. These curves are compared with a black solid curve drawn at the upper right that indicates the wavelength dispersion of the phase refractive indices (n) of the P6T crystal related to one principal axis [15].

In a practical sense, it would be difficult to make a device that causes laser oscillation at an intended wavelength using an anisotropic crystal as in the present example. This is mainly because of the difficulty in predicting neff of the anisotropic crystal. From (9.134), however, we can readily determine the wavelength dispersion of neff of the crystal. From the Bragg condition (9.141), in turn, Λ can precisely be designed for a given index m and suitable λp (at which neff can be computed for a crystal of thickness d). Naturally, it is sensible to set λp at a wavelength where high optical gain (or emission gain) is anticipated [16].

Example 9.3 [13] Thus, everything has been arranged for one to construct a high-performance optical device, especially a laser. Using a single crystal of BP3T (see Fig. 9.18a for the structural formula), Inada and co-workers have developed a device that causes laser oscillation at an intended wavelength [13]. To this end, the BP3T crystal was placed on a silicon oxide substrate and subsequently spin-coated with a photoresist film. The photoresist film was then



Fig. 9.18 Wavelength dispersions of the effective/phase refractive indices for the BP3T crystal waveguide. (a) Structural formula of BP3T. (b) Dispersion data of the device D1 (see Table 9.2). Colored solid curves (either green or red) are obtained from (9.134). The data of green circles and red triangles are obtained through (9.107) using the emission directions and the emission line locations. The laser oscillation line is indicated with a blue square as a TM30 mode occurring at 568.1 nm. The black solid curve drawn at the upper right indicates the dispersion of the phase refractive index of the BP3T crystal pertinent to one principal axis [15]. (Reproduced from Inada Y, Kawata Y, Kawai T, Hotta S, Yamao T (2020) Laser oscillation at an intended wavelength from a distributed feedback structure on anisotropic organic crystals. J Appl Phys 127(5): 053102/6 pages [13], with the permission of AIP Publishing. https://doi.org/10.1063/1.5129425)

engraved with a one-dimensional diffraction grating so that the a-axis of the BP3T crystal was parallel to the grating wavevector. With this device geometry, the angle-dependent spectroscopy was carried out in a weak-excitation regime to determine the wavelength dependence of the effective index. At the same time, a strong-excitation measurement was performed to analyze the laser oscillation data.


Table 9.2 Laser characteristics of BP3T crystal devices^a

Device | Crystal thickness d (nm) | Grating period Λ (nm) | Actual lasing wavelength λa (nm) | Intended emission wavelength λi (nm) | Effective index neff at λa | Validity V^b
D1     | 428                      | 319.8                 | 568.1                            | 569.0                               | 2.65^c, 2.66^d             | 0.994
D2     | 295                      | 349.2                 | 569.6                            | 569.0                               | 2.48^c, 2.45^d             | 0.988

^a Reproduced from Inada Y, Kawata Y, Kawai T, Hotta S, Yamao T (2020) Laser oscillation at an intended wavelength from a distributed feedback structure on anisotropic organic crystals. J Appl Phys 127(5): 053102/6 pages [13], with the permission of AIP Publishing. https://doi.org/10.1063/1.5129425
^b See (9.142) in the text
^c Estimated from neff(WG) in (9.134)
^d Estimated from neff(PM) in (9.141)

The BP3T crystal exhibited a high emission gain around 570 nm [17]. Accordingly, setting the laser oscillation wavelength at 569.0 nm, Inada and co-workers made BP3T crystal devices with different crystal thicknesses. As an example, using a crystal 428 nm thick (device D1; Table 9.2), on the basis of (9.134) they estimated the effective index neff at 569.0 nm in the direction of the a-axis of the BP3T crystal. Meanwhile, using (9.141) and assuming m = 3, the grating period was decided, with the actual period of the resulting device being 319.8 nm (see Table 9.2).

The results of the emission measurements for the device D1 are displayed in Fig. 9.18b. The figure shows the dispersion of the effective indices neff of the BP3T crystal. Once again, the TM20 (blueshifted) and TM10 (redshifted) modes are seamlessly joined. As in the case of Fig. 9.9a, the laser oscillation line shows up as a TM30 mode among other modes. That is, the effective index neff of the laser oscillation line sits on the dispersion curve consisting of other emission lines arising from weak excitation. Figure 9.19 shows the distinct single-mode laser oscillation line occurring at 568.1 nm (Table 9.2). This clearly indicates that the laser oscillation has been achieved at a location very close to the designed wavelength (569.0 nm). Compare the spectrum of Fig. 9.19 with that of Fig. 9.7: the manifestation of the single-mode line is evident and should be contrasted with the longitudinal multimode laser oscillation of Fig. 9.7.

To quantify the validity of (9.134), the following quantity V is useful [13]:

V ≡ 1 − |neff(WG) − neff(PM)| / neff(WG),   (9.142)

where V stands for the validity of (9.134); neff(WG) was estimated from the waveguide condition (9.134) in combination with the emission data obtained with the device; neff(PM) was obtained from the phase-matching condition described by (9.107),


Fig. 9.19 Single-mode laser oscillation spectrum of an organic semiconductor crystal BP3T (the device D1). A sharply resolved laser oscillation line is noted at 568.1 nm. (Reproduced from Inada Y, Kawata Y, Kawai T, Hotta S, Yamao T (2020) Laser oscillation at an intended wavelength from a distributed feedback structure on anisotropic organic crystals. J Appl Phys 127(5): 053102/6 pages [13], with the permission of AIP Publishing. https://doi.org/10.1063/1.5129425)


(9.139), or (9.141). We have 0 ≤ V ≤ 1, and a value of V closer to 1 means that the validity of (9.134) is high (see Table 9.2). Note that the validity estimation by means of (9.142) can equally be applied to both the laser oscillation and non-laser emission lines. In a word, once the thickness of a crystal is available along with its permittivity tensor, one can readily construct a single-mode laser that oscillates at an intended wavelength.

As discussed above in detail, Examples 9.1, 9.2, and 9.3 enable us to design a high-performance laser device using an organic crystal that is characterized by a rather complicated crystal structure associated with anisotropic refractive indices. From a practical point of view, a laser device is often combined with a diffraction grating. Since organic semiconductors are less robust than inorganic semiconductors, special care should be taken not to damage an organic semiconductor crystal when equipping it with the diffraction grating via rigorous semiconductor processing. In that respect, Inada and co-workers further developed an effective method [18]. The method is based upon focused ion beam lithography in combination with subsequent plasma etching. They skillfully minimized the damage that an organic crystal might otherwise suffer. The above-mentioned design principle more widely enables one to construct effective laser devices made of light-emitting materials, whether organic or inorganic. At the same time, these examples are expected to provide an effective methodology in interdisciplinary fields encompassing solid-state physics and solid-state chemistry as well as device physics.
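As a quick check of the validity measure, (9.142) can be evaluated with the Table 9.2 entries for device D2, for which neff(WG) = 2.48 and neff(PM) = 2.45; a minimal sketch:

```python
def validity(n_wg, n_pm):
    """V = 1 - |n_eff(WG) - n_eff(PM)| / n_eff(WG) of (9.142); a value of V
    close to 1 means the waveguide condition (9.134) describes the device well."""
    return 1.0 - abs(n_wg - n_pm) / n_wg

V_D2 = validity(2.48, 2.45)
print(round(V_D2, 3))   # 0.988, matching Table 9.2
```

(The D1 entry of Table 9.2 cannot be reproduced exactly from the rounded three-digit indices, so D2 is used here.)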

9.6 Mechanical System

As outlined above, two-level atoms have distinct characteristics in connection with lasers. Electromagnetic waves similarly confined within a one-dimensional cavity exhibit related properties and, above all, have many features in common with a harmonic oscillator [19]. We have already described several features and properties of the harmonic oscillator (Chap. 2). Meanwhile, we briefly discussed the formation of electromagnetic stationary waves (Chap. 8). There are several resemblances and correspondences between the harmonic oscillator and electromagnetic stationary waves when we view them as mechanical systems. The point is that in a harmonic oscillator the position and momentum are in quadrature; i.e., their phase difference is π/2. For the electromagnetic stationary waves, the electric field and magnetic field are in quadrature as well.

In Chap. 8, we examined the conditions under which stationary waves are formed. In a dielectric medium both sides of which are equipped with metal layers (or mirrors), the electric field is described as

E = 2E1 εe sin ωt sin kz.   (8.201)

In (8.201), we assumed that forward and backward electromagnetic waves are propagating in the direction of the z-axis. Here, we assume that the interfaces (or walls) are positioned at z = 0 and z = L. Within the domain [0, L], the two waves form a stationary wave. Since this expression assumed two waves, the electromagnetic energy was doubled. To normalize the energy so that a single wave is contained, the amplitude E1 of (8.201) should be divided by √2. Therefore, we think of the following description for E:

E = √2 E e1 sin ωt sin kz   or   Ex = √2 E sin ωt sin kz,

where we designated the polarization vector as the direction of the x-axis. At the same time, we omitted the index from the amplitude. Thus, from the second equation of (7.65) we have

∂Hy/∂t = −(1/μ) ∂Ex/∂z = −(√2 E k/μ) sin ωt cos kz.   (9.143)

Note that this equation appeared as (8.131) as well. Integrating both sides of (9.143), we get

Hy = (√2 E k/μω) cos ωt cos kz.

Using the relation ω = vk, we have


Hy = (√2 E/μv) cos ωt cos kz + C,

where v is the light velocity in the dielectric medium and C is an integration constant. Removing C and putting

H ≡ E/(μv),

we have

Hy = √2 H cos ωt cos kz.

Using a vector expression, we have

H = e2 √2 H cos ωt cos kz.

Thus, E (∥ e1), H (∥ e2), and n (∥ e3) form a right-handed system in this order. As noted in Sect. 8.8, at the interface (or wall) the electric field and magnetic field form nodes and antinodes, respectively. Namely, the two fields are in quadrature.

Let us calculate the electromagnetic energy of the dielectric medium within a cavity. In the present case the cavity means the dielectric sandwiched between a couple of metal layers. We have

W = We + Wm,

where W is the total electromagnetic energy; We and Wm are the electric and magnetic energies, respectively. Let the length of the cavity be L. Then the energy per unit cross-sectional area is described as

We = (ε/2) ∫0^L E² dz   and   Wm = (μ/2) ∫0^L H² dz.

Performing the integration, we get

We = (ε/2) L E² sin² ωt,
Wm = (μ/2) L H² cos² ωt = (μ/2) L (E/μv)² cos² ωt = (ε/2) L E² cos² ωt,   (9.144)

where we used 1/v² = εμ in the last equality. Thus, we have

W = (ε/2) L E² = (μ/2) L H².   (9.145)

Representing the energies per unit volume as W̃e, W̃m, and W̃, we have

W̃e = (ε/2) E² sin² ωt,   W̃m = (ε/2) E² cos² ωt,   W̃ = (ε/2) E² = (μ/2) H².   (9.146)
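The in-quadrature exchange of energy expressed by (9.144)–(9.146) is easy to confirm numerically. In the sketch below the material constants and field amplitude are arbitrary illustrative values; H is set to E/(μv) as in the text.

```python
import math

eps, mu = 2.0, 3.0
v = 1.0 / math.sqrt(eps * mu)    # light velocity in the medium, 1/v^2 = eps*mu
E = 5.0
H = E / (mu * v)                 # the definition H = E/(mu*v) used in the text
L = 1.0                          # cavity length

for wt in (0.0, 0.7, 1.3, 2.9):  # several phases omega*t
    We = (eps / 2) * L * E ** 2 * math.sin(wt) ** 2   # (9.144), electric part
    Wm = (mu / 2) * L * H ** 2 * math.cos(wt) ** 2    # (9.144), magnetic part
    assert math.isclose(We + Wm, (eps / 2) * L * E ** 2)   # (9.145): W is constant

assert math.isclose(eps * E ** 2, mu * H ** 2)   # amplitude relation in (9.146)
print("total cavity energy is constant in time")
```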

In Chap. 2, we treated the motion of a harmonic oscillator. There, we had

x(t) = (v0/ω) sin ωt = x0 sin ωt.   (2.7)

Here, we have defined the amplitude of the harmonic oscillation as x0 (>0), given by

x0 ≡ v0/ω.

Then, the momentum is described as

p(t) = m dx(t)/dt = mωx0 cos ωt.

Defining

p0 ≡ mωx0,

we have

p(t) = p0 cos ωt.

Let the kinetic energy and potential energy of the oscillator be K and V, respectively. Then, we have

K = (1/2m) p(t)² = (1/2m) p0² cos² ωt,
V = (1/2) mω² x(t)² = (1/2) mω² x0² sin² ωt,
W = K + V = (1/2m) p0² = (1/2) m v0² = (1/2) mω² x0².   (9.147)

Comparing (9.146) and (9.147), we recognize the following relationship in energy between the electromagnetic fields and harmonic oscillator motion [19]:

√m ωx0 ⟷ √ε E   and   p0/√m ⟷ √μ H.   (9.148)
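The correspondence (9.148) can likewise be checked: an oscillator whose amplitude and momentum satisfy √m ωx0 = √ε E and p0/√m = √μ H stores, term by term, the same energies as the field densities in (9.146). All numbers below are illustrative assumptions.

```python
import math

eps, mu, E = 2.0, 3.0, 5.0
v = 1.0 / math.sqrt(eps * mu)
H = E / (mu * v)                          # so that mu*H^2 = eps*E^2

m_osc, omega = 1.7, 4.2                   # arbitrary oscillator parameters
x0 = math.sqrt(eps) * E / (math.sqrt(m_osc) * omega)   # from (9.148)
p0 = math.sqrt(m_osc) * math.sqrt(mu) * H              # from (9.148)

for wt in (0.0, 0.8, 2.1):
    V_pot = 0.5 * m_osc * omega ** 2 * x0 ** 2 * math.sin(wt) ** 2   # (9.147)
    K_kin = (p0 ** 2 / (2 * m_osc)) * math.cos(wt) ** 2              # (9.147)
    We_d = 0.5 * eps * E ** 2 * math.sin(wt) ** 2                    # (9.146)
    Wm_d = 0.5 * mu * H ** 2 * math.cos(wt) ** 2                     # (9.146)
    assert math.isclose(V_pot, We_d) and math.isclose(K_kin, Wm_d)
print("oscillator energies track the cavity energy densities")
```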


Thus, there is an elegant contradistinction between the dynamics of electromagnetic fields in a cavity and the motion of a harmonic oscillator. In fact, quantum electrodynamics (QED) is based upon the treatment of the quantum harmonic oscillator introduced in Chap. 2. Various aspects of QED will be studied in Part V along with the Dirac equation and its handling.

References

1. Moore WJ (1955) Physical chemistry, 3rd edn. Prentice-Hall, Englewood Cliffs
2. Smith FG, King TA, Wilkins D (2007) Optics and photonics, 2nd edn. Wiley, Chichester
3. Sunakawa S (1965) Theoretical electromagnetism. Kinokuniya, Tokyo (in Japanese)
4. Loudon R (2000) The quantum theory of light, 3rd edn. Oxford University Press, Oxford
5. Born M, Wolf E (2005) Principles of optics, 7th edn. Cambridge University Press, Cambridge
6. Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5):053113
7. Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y, Yamashita K, Ura S, Hotta S (2018) Design principle of high-performance organic single-crystal light-emitting devices. J Appl Phys 123(23):235501
8. Siegman AE (1986) Lasers. University Science Books, Sausalito
9. Palmer C (2005) Diffraction grating handbook, 6th edn. Newport Corporation, New York
10. Hotta S, Yamao T (2011) The thiophene/phenylene co-oligomers: exotic molecular semiconductors integrating high-performance electronic and optical functionalities. J Mater Chem 21(5):1295–1304
11. Yamao T, Nishimoto Y, Terasaki K, Akagami H, Katagiri T, Hotta S, Goto M, Azumi R, Inoue M, Ichikawa M, Taniguchi Y (2010) Single crystal growth and charge transport properties of an alternating co-oligomer composed of thiophene and phenylene rings. Jpn J Appl Phys 49(4S):04DK20
12. Yariv A, Yeh P (2003) Optical waves in crystals: propagation and control of laser radiation. Wiley, New York
13. Inada Y, Kawata Y, Kawai T, Hotta S, Yamao T (2020) Laser oscillation at an intended wavelength from a distributed feedback structure on anisotropic organic crystals. J Appl Phys 127(5):053102
14. Pain HJ (2005) The physics of vibrations and waves, 6th edn. Wiley, Chichester
15. Sakurai Y, Hayashi W, Yamao T, Hotta S (2014) Phase refractive index dispersions of organic oligomer crystals with different molecular alignments. Jpn J Appl Phys 53(2S):02BB01
16. Yamao T, Yamamoto K, Hotta S (2009) A high optical-gain organic crystal comprising thiophene/phenylene co-oligomer nanomolecules. J Nanosci Nanotechnol 9(4):2582–2585
17. Bisri SZ, Takenobu T, Yomogida Y, Shimotani H, Yamao T, Hotta S, Iwasa Y (2009) High mobility and luminescent efficiency in organic single-crystal light-emitting transistors. Adv Funct Mater 19:1728–1735
18. Inada Y, Yamashita S, Murakami S, Takahashi K, Yamao T, Hotta S (2021) Direct fabrication of a diffraction grating onto organic oligomer crystals by focused ion beam lithography followed by plasma etching. Jpn J Appl Phys 60(12):120901
19. Fox M (2006) Quantum optics. Oxford University Press, Oxford

Chapter 10

Introductory Green’s Functions

In this chapter, we deal with various properties and characteristics of differential equations, especially first-order linear differential equations (FOLDEs) and second-order linear differential equations (SOLDEs). These differential equations are characterized by differential operators and boundary conditions (BCs). Of these, the differential operators appearing in SOLDEs are particularly important. Under appropriate conditions, the said operators can be converted to Hermitian operators. The SOLDEs associated with classical orthogonal polynomials play a central role in many fields of mathematical physics including quantum mechanics and electromagnetism. We study the general principle of SOLDEs in relation to several specific SOLDEs we have studied in Part I and examine general features of an eigenvalue problem and an initial-value problem (IVP). In this context, Green's functions provide a powerful tool for solving SOLDEs. For a practical purpose, we deal with the actual construction of Green's functions. In Sect. 8.8, we dealt with steady-state characteristics of electromagnetic waves in dielectrics in terms of propagation, reflection, and transmission. When we consider transient characteristics of electromagnetic and optical phenomena, we often need to deal with SOLDEs having constant coefficients. This is well known in connection with the motion of a damped harmonic oscillator. In the latter part of this chapter, we treat the initial-value problem of a SOLDE of this type.

10.1

Second-Order Linear Differential Equations (SOLDEs)

An n-th order linear differential equation has the following general form:

© Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_10

401


an(x) dⁿu/dxⁿ + an−1(x) dⁿ⁻¹u/dxⁿ⁻¹ + ⋯ + a1(x) du/dx + a0(x)u = d(x).   (10.1)

Table 10.1 Characteristics of SOLDEs

                      Type I         Type II         Type III        Type IV
Equation              Homogeneous    Homogeneous     Inhomogeneous   Inhomogeneous
Boundary conditions   Homogeneous    Inhomogeneous   Homogeneous     Inhomogeneous

If d(x) = 0, the differential equation is said to be homogeneous; otherwise, it is called inhomogeneous. Equation (10.1) is linear in u and its derivatives. Likewise, we have a SOLDE such that

a(x) d²u/dx² + b(x) du/dx + c(x)u = d(x).   (10.2)

In (10.2) we assume that the variable x is real. The equation can be solved under appropriate boundary conditions (BCs). A general form of BCs is described as

B1(u) = α1 u(a) + β1 (du/dx)|x=a + γ1 u(b) + δ1 (du/dx)|x=b = σ1,   (10.3)
B2(u) = α2 u(a) + β2 (du/dx)|x=a + γ2 u(b) + δ2 (du/dx)|x=b = σ2,   (10.4)

where α1, β1, γ1, δ1, σ1, etc. are real constants; u(x) is defined in an interval [a, b], where a and b can be infinite (i.e., ±∞). The LHSs of B1(u) and B2(u) are referred to as boundary functionals [1, 2]. If σ1 = σ2 = 0, the BCs are called homogeneous; otherwise, the BCs are said to be inhomogeneous. In combination with the inhomogeneous equation expressed as (10.2), Table 10.1 summarizes the characteristics of SOLDEs. We have four types of SOLDEs according to the homogeneity or inhomogeneity of the equations and BCs.

Even though SOLDEs are mathematically tractable, it is not necessarily easy to solve them, depending upon the nature of a(x), b(x), and c(x) of (10.2). Nonetheless, if those functions are constant coefficients, the equation can readily be solved. We will deal with SOLDEs of that type in detail later. Suppose that we find two linearly independent solutions u1(x) and u2(x) of the following homogeneous equation:

a(x) d²u/dx² + b(x) du/dx + c(x)u = 0.   (10.5)

Then, any solution u(x) of (10.5) can be expressed as their linear combination such that

u(x) = c1 u1(x) + c2 u2(x),   (10.6)

where c1 and c2 are arbitrary constants. In general, suppose that there are n arbitrarily chosen functions f1(x), f2(x), ⋯, fn(x). Consider the following equation with those functions:

a1 f1(x) + a2 f2(x) + ⋯ + an fn(x) = 0,   (10.7)

where a1, a2, ⋯, and an are constants. If a1 = a2 = ⋯ = an = 0, (10.7) always holds. In this case, (10.7) is said to be a trivial linear relation. If f1(x), f2(x), ⋯, and fn(x) satisfy a non-trivial linear relation, they are said to be linearly dependent. That is, the non-trivial relation means that in (10.7) at least one of a1, a2, ⋯, and an is non-zero. Suppose that an ≠ 0. Then, from (10.7), fn(x) is expressed as

fn(x) = −(a1/an) f1(x) − (a2/an) f2(x) − ⋯ − (an−1/an) fn−1(x).   (10.8)

If f1(x), f2(x), ⋯, and fn(x) are not linearly dependent, they are called linearly independent. In other words, the statement that f1(x), f2(x), ⋯, and fn(x) are linearly independent is equivalent to the statement that (10.7) holds if and only if a1 = a2 = ⋯ = an = 0. We will have relevant discussion in Part III. Now suppose that, with the above two linearly independent functions u1(x) and u2(x), we have

a1 u1(x) + a2 u2(x) = 0.   (10.9)

Differentiating (10.9), we have

a1 du1(x)/dx + a2 du2(x)/dx = 0.   (10.10)

Expressing (10.9) and (10.10) in matrix form, we get

( u1(x)        u2(x)      ) ( a1 )
( du1(x)/dx    du2(x)/dx  ) ( a2 ) = 0.   (10.11)

Thus, the statement that u1(x) and u2(x) are linearly independent is equivalent to the statement that the following expression holds:

| u1(x)        u2(x)      |
| du1(x)/dx    du2(x)/dx  | ≡ W(u1, u2) ≠ 0,   (10.12)

where W(u1, u2) is called the Wronskian of u1(x) and u2(x). In fact, if W(u1, u2) = 0, then we have

u1 du2/dx − u2 du1/dx = 0.   (10.13)

This implies that there is a functional relationship between u1 and u2. In fact, if we can express u2 = u2(u1(x)), then du2/dx = (du2/du1)(du1/dx). That is,

u1 du2/dx = u1 (du2/du1)(du1/dx) = u2 du1/dx   or   u1 du2/du1 = u2   or   du2/u2 = du1/u1,   (10.14)

where the second equality of the first equation comes from (10.13). The third equation can easily be integrated to yield

ln(u2/u1) = c   or   u2/u1 = eᶜ   or   u2 = eᶜ u1.   (10.15)
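The criterion just derived — linear dependence if and only if the Wronskian vanishes — can be tried numerically. The sketch below evaluates the 2 × 2 Wronskian of (10.12) for an independent pair of exponentials (a fundamental set of u″ − 3u′ + 2u = 0) and for a proportional pair; the function choices are illustrative assumptions.

```python
import math

def W(u1, du1, u2, du2, x):
    """Wronskian W(u1, u2) = u1*u2' - u2*u1' of (10.12) at the point x."""
    return u1(x) * du2(x) - u2(x) * du1(x)

# Independent pair: u1 = e^x, u2 = e^{2x} (solutions of u'' - 3u' + 2u = 0).
def u1(x): return math.exp(x)
def du1(x): return math.exp(x)
def u2(x): return math.exp(2 * x)
def du2(x): return 2 * math.exp(2 * x)

# W(u1, u2) = e^{3x}, which never vanishes: the pair is linearly independent.
for x in (-1.0, 0.0, 0.7):
    assert math.isclose(W(u1, du1, u2, du2, x), math.exp(3 * x))

# Dependent pair: u3 = 5*u1 is proportional to u1, so W vanishes identically,
# in line with (10.15).
def u3(x): return 5 * math.exp(x)
def du3(x): return 5 * math.exp(x)
for x in (-1.0, 0.0, 0.7):
    assert math.isclose(W(u1, du1, u3, du3, x), 0.0, abs_tol=1e-9)
print("W != 0 for the independent pair, W = 0 for the dependent pair")
```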

Equation (10.15) shows that u1(x) and u2(x) are linearly dependent. It is easy to show that if u1(x) and u2(x) are linearly dependent, then W(u1, u2) = 0. Thus, we have the following statement:

Two functions are linearly dependent. ⟺ W(u1, u2) = 0.

Then, as the contraposition of this statement, we have

Two functions are linearly independent. ⟺ W(u1, u2) ≠ 0.

Conversely, suppose that we have another solution u3(x) of (10.5) besides u1(x) and u2(x). Then, we have

a(x) d²u1/dx² + b(x) du1/dx + c(x)u1 = 0,
a(x) d²u2/dx² + b(x) du2/dx + c(x)u2 = 0,
a(x) d²u3/dx² + b(x) du3/dx + c(x)u3 = 0.   (10.16)

Again rewriting (10.16) in matrix form, we have

\[ \begin{pmatrix} \dfrac{d^2 u_1}{dx^2} & \dfrac{du_1}{dx} & u_1 \\ \dfrac{d^2 u_2}{dx^2} & \dfrac{du_2}{dx} & u_2 \\ \dfrac{d^2 u_3}{dx^2} & \dfrac{du_3}{dx} & u_3 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = 0. \tag{10.17} \]

A necessary and sufficient condition for a non-trivial solution (i.e., a solution besides a = b = c = 0) to exist is that [1]

\[ \begin{vmatrix} \dfrac{d^2 u_1}{dx^2} & \dfrac{du_1}{dx} & u_1 \\ \dfrac{d^2 u_2}{dx^2} & \dfrac{du_2}{dx} & u_2 \\ \dfrac{d^2 u_3}{dx^2} & \dfrac{du_3}{dx} & u_3 \end{vmatrix} = - \begin{vmatrix} u_1 & \dfrac{du_1}{dx} & \dfrac{d^2 u_1}{dx^2} \\ u_2 & \dfrac{du_2}{dx} & \dfrac{d^2 u_2}{dx^2} \\ u_3 & \dfrac{du_3}{dx} & \dfrac{d^2 u_3}{dx^2} \end{vmatrix} = 0. \tag{10.18} \]

Note here that

\[ \begin{vmatrix} \dfrac{d^2 u_1}{dx^2} & \dfrac{du_1}{dx} & u_1 \\ \dfrac{d^2 u_2}{dx^2} & \dfrac{du_2}{dx} & u_2 \\ \dfrac{d^2 u_3}{dx^2} & \dfrac{du_3}{dx} & u_3 \end{vmatrix} \equiv - W(u_1, u_2, u_3), \tag{10.19} \]

where W(u1, u2, u3) is the Wronskian of u1(x), u2(x), and u3(x). In the above relation, we used the facts that the determinant of a matrix is identical to that of its transpose and that the determinant of a matrix changes sign upon permutation of row vectors. In short, a necessary and sufficient condition for a non-trivial solution is that W(u1, u2, u3) vanishes. This implies that u1(x), u2(x), and u3(x) are linearly dependent. However, we have assumed that u1(x) and u2(x) are linearly independent, and so (10.18) and (10.19) mean that u3(x) must be described as a linear combination of u1(x) and u2(x). That is, we have no third linearly independent solution. Consequently, the general solution of (10.5) must be given by (10.6). In this sense, u1(x) and u2(x) are said to form a fundamental set of solutions of (10.5). Next, let us consider the inhomogeneous equation (10.2). Suppose that up(x) is a particular solution of (10.2), and consider the following function v(x):

\[ u(x) = v(x) + u_p(x). \tag{10.20} \]

Substituting (10.20) into (10.2), we have

\[ a(x)\frac{d^2 v}{dx^2} + b(x)\frac{dv}{dx} + c(x)v + a(x)\frac{d^2 u_p}{dx^2} + b(x)\frac{du_p}{dx} + c(x)u_p = d(x). \tag{10.21} \]

Therefore, we have

\[ a(x)\frac{d^2 v}{dx^2} + b(x)\frac{dv}{dx} + c(x)v = 0. \]

But v(x) can be described by a linear combination of u1(x) and u2(x) as in the case of (10.6). Hence, the general solution of (10.2) should be expressed as

\[ u(x) = c_1 u_1(x) + c_2 u_2(x) + u_p(x), \tag{10.22} \]

where c1 and c2 are arbitrary (complex) constants.
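The structure expressed by (10.12) and (10.22) is easy to verify numerically. Below is a minimal sketch of ours (the helper names and the sample equation u″ + u = 1 are our own illustrative choices, not taken from the text): for the homogeneous equation u″ + u = 0, the pair {sin x, cos x} has a nonvanishing Wronskian and is therefore a fundamental set, and any u = c₁ sin x + c₂ cos x + u_p with particular solution u_p ≡ 1 satisfies the inhomogeneous equation.

```python
import math

# General solution of u'' + u = 1 in the form (10.22):
# u = c1*u1 + c2*u2 + u_p with u1 = sin, u2 = cos, u_p = 1.
def u(x, c1=2.0, c2=-3.0):
    return c1 * math.sin(x) + c2 * math.cos(x) + 1.0

def second_derivative(f, x, h=1e-4):
    # Central-difference approximation of f''(x).
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

# u'' + u = 1 must hold for arbitrary c1, c2 at any point.
for x in [0.3, 1.1, 2.7]:
    assert abs(second_derivative(u, x) + u(x) - 1.0) < 1e-5

# Wronskian (10.12) of u1 = sin, u2 = cos:
# W = u1*u2' - u2*u1' = -sin^2 - cos^2 = -1, which never vanishes.
x = 0.8
W = math.sin(x) * (-math.sin(x)) - math.cos(x) * math.cos(x)
assert abs(W + 1.0) < 1e-12
print("fundamental-set checks passed")
```

Since W ≡ −1 ≠ 0 on the whole line, sin x and cos x are linearly independent everywhere, in accordance with the criterion (10.12).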

10.2 First-Order Linear Differential Equations (FOLDEs)

In the discussion that follows, first-order linear differential equations (FOLDEs) supply us with useful information. A general form of a FOLDE is expressed as

\[ a(x)\frac{du}{dx} + b(x)u = d(x) \quad [a(x) \ne 0]. \tag{10.23} \]

An associated boundary condition is given by a boundary functional B(u) such that

\[ B(u) = \alpha u(a) + \beta u(b) = \sigma, \tag{10.24} \]

where α, β, and σ are real constants and u(x) is defined in an interval [a, b]. If d(x) ≡ 0 in (10.23), then (10.23) can readily be integrated to yield a solution. Let us multiply both sides of (10.23) by w(x). Then we have

\[ w(x)a(x)\frac{du}{dx} + w(x)b(x)u = w(x)d(x). \tag{10.25} \]

Here, we define p(x) as follows:

\[ p(x) \equiv w(x)a(x), \tag{10.26} \]

where w(x) is called a weight function. As mentioned in Sect. 2.4, the weight function is a real and non-negative function within the domain considered. Here we suppose that

\[ \frac{dp(x)}{dx} = w(x)b(x). \tag{10.27} \]

Then, (10.25) can be rewritten as

\[ \frac{d}{dx}\left[ p(x)u \right] = w(x)d(x). \tag{10.28} \]

Thus, we can immediately integrate (10.28) to obtain a solution

\[ u = \frac{1}{p(x)}\left[ \int^x w(x')d(x')\,dx' + C \right], \tag{10.29} \]

where C is an arbitrary integration constant. To seek w(x), from (10.26) and (10.27) we have

\[ p' = (wa)' = wb = wa \cdot \frac{b}{a}. \tag{10.30} \]

This can easily be integrated for wa to be expressed as

\[ wa = C' \exp \int \frac{b}{a}\,dx \quad \text{or} \quad w = \frac{C'}{a} \exp \int \frac{b}{a}\,dx, \tag{10.31} \]

where C′ is an arbitrary integration constant. The quantity C′/a must be non-negative so that w can be non-negative.

Example 10.1 Let us think of the following FOLDE within an interval [a, b], i.e., a ≤ x ≤ b:

\[ \frac{du}{dx} + xu = x. \tag{10.32} \]

A boundary condition is set such that

\[ u(a) = \sigma. \tag{10.33} \]

Notice that (10.33) is obtained by setting α = 1 and β = 0 in (10.24). Following the above argument, we obtain a solution described as

\[ u(x) = \frac{1}{p(x)}\left[ \int_a^x w(x')d(x')\,dx' + u(a)p(a) \right]. \tag{10.34} \]

Also, we have

\[ p(x) = w(x) = \exp \int_a^x x'\,dx' = \exp\left[ \frac{1}{2}\left(x^2 - a^2\right) \right]. \tag{10.35} \]
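The solution formula (10.34) with the integrating factor (10.35) can be checked numerically before carrying out the integration analytically. The following is a minimal sketch of ours (the function names and the sample values a = 0.5, σ = 2 are illustrative choices, not from the text): we evaluate the integral in (10.34) by the trapezoidal rule and verify that the result satisfies the original FOLDE (10.32) and the BC (10.33).

```python
import math

a, sigma = 0.5, 2.0   # left endpoint and BC u(a) = sigma (illustrative)

def p(x):
    # (10.35): p(x) = w(x) = exp[(x^2 - a^2)/2]
    return math.exp(0.5 * (x * x - a * a))

def u(x, n=4000):
    # (10.34): u(x) = (1/p(x)) [ integral_a^x w(x') d(x') dx' + u(a) p(a) ]
    # with w = p and d(x') = x'; integral by the trapezoidal rule.
    h = (x - a) / n
    s = 0.5 * (p(a) * a + p(x) * x)
    for k in range(1, n):
        t = a + k * h
        s += p(t) * t
    return (s * h + sigma * p(a)) / p(x)

# Boundary condition (10.33): u(a) = sigma.
assert abs(u(a) - sigma) < 1e-12

# Original FOLDE (10.32): du/dx + x u = x, checked at a sample point.
x, h = 1.3, 1e-4
du = (u(x + h) - u(x - h)) / (2 * h)
assert abs(du + x * u(x) - x) < 1e-4
print("FOLDE checks passed")
```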

The integration of the RHS of (10.34) can be performed as follows:

\[ \begin{aligned} \int_a^x w(x')d(x')\,dx' &= \int_a^x x' \exp\left[\frac{1}{2}\left(x'^2 - a^2\right)\right] dx' = \exp\left(-\frac{a^2}{2}\right) \int_{a^2/2}^{x^2/2} \exp t \, dt \\ &= \exp\left(-\frac{a^2}{2}\right)\left( \exp\frac{x^2}{2} - \exp\frac{a^2}{2} \right) = \exp\left[\frac{1}{2}\left(x^2 - a^2\right)\right] - 1 = p(x) - 1, \end{aligned} \tag{10.36} \]

where with the second equality we used the integration by substitution \(\frac{1}{2}x'^2 \longrightarrow t\). Considering (10.33) and putting p(a) = 1 in (10.34), (10.34) is rewritten as

\[ u(x) = \frac{1}{p(x)}\left[ p(x) - 1 + \sigma \right] = 1 + \frac{\sigma - 1}{p(x)} = 1 + (\sigma - 1)\exp\left[\frac{1}{2}\left(a^2 - x^2\right)\right], \tag{10.37} \]

where with the last equality we used (10.35). Notice that when we put σ = 1 in (10.37), we get u(x) ≡ 1. In fact, u(x) ≡ 1 is certainly a solution of (10.32) and satisfies the BC of (10.33). Uniqueness of the solution is thus guaranteed. In the next two examples, we make more general discussions.

Example 10.2 Let us consider the following differential operator:

\[ L_x = \frac{d}{dx}. \tag{10.38} \]

We think of the following identity obtained using integration by parts:

\[ \int_a^b dx\, \varphi^* \left( \frac{d}{dx}\psi \right) + \int_a^b dx \left( \frac{d}{dx}\varphi \right)^* \psi = \left[ \varphi^* \psi \right]_a^b. \tag{10.39} \]

Rewriting this, we get

\[ \int_a^b dx\, \varphi^* \left( \frac{d}{dx}\psi \right) - \int_a^b dx \left[ \left( -\frac{d}{dx} \right) \varphi \right]^* \psi = \left[ \varphi^* \psi \right]_a^b. \tag{10.40} \]

Looking at (10.40), we notice that its LHS comprises a difference between two integrals, while its RHS, referred to as a boundary term (or surface term), does not contain an integral. Recalling the expression (1.128) and defining \(\tilde{L}_x \equiv -\frac{d}{dx}\) in (10.40), we have

\[ \langle \varphi | L_x \psi \rangle - \langle \tilde{L}_x \varphi | \psi \rangle = \left[ \varphi^* \psi \right]_a^b. \tag{10.41} \]

Here, the RHS of (10.41) needs to vanish so that we can have

\[ \langle \varphi | L_x \psi \rangle = \langle \tilde{L}_x \varphi | \psi \rangle. \tag{10.42} \]

Meanwhile, adopting the expression (1.112) with respect to an adjoint operator, we have the following expression:

\[ \langle \varphi | L_x \psi \rangle = \langle L_x^\dagger \varphi | \psi \rangle. \tag{10.43} \]

Comparing (10.42) and (10.43) and considering that φ and ψ are arbitrary functions, we have \(\tilde{L}_x = L_x^\dagger\). Thus, as the operator adjoint to \(L_x\) we get

\[ L_x^\dagger = \tilde{L}_x = -\frac{d}{dx} = -L_x. \]

Notice that only if the surface term vanishes can the adjoint operator \(L_x^\dagger\) appropriately be defined. We will encounter a similar expression again in Part III. We add that if an operator A satisfies the relation

\[ A^\dagger = -A, \tag{10.44} \]

the operator A is said to be anti-Hermitian. We have already encountered such an operator in Sects. 1.5 and 3.7.
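A discrete analogue makes the anti-Hermiticity of d/dx tangible. On a uniform grid with periodic boundary conditions (so that the surface term vanishes), the central-difference approximation of d/dx is a real antisymmetric matrix — the matrix counterpart of (10.44). This construction is our own illustration, not taken from the text.

```python
n, h = 8, 0.1   # grid size and spacing (arbitrary illustrative choices)

# Central-difference matrix D for d/dx with periodic BCs:
# (D f)_j = (f_{j+1} - f_{j-1}) / (2h), indices taken mod n.
D = [[0.0] * n for _ in range(n)]
for j in range(n):
    D[j][(j + 1) % n] = 1.0 / (2 * h)
    D[j][(j - 1) % n] = -1.0 / (2 * h)

# Antisymmetry D^T = -D is the discrete analog of A† = -A in (10.44).
for j in range(n):
    for k in range(n):
        assert D[k][j] == -D[j][k]
print("D is antisymmetric (anti-Hermitian)")
```

If the boundary conditions were not chosen so that the surface term drops out (e.g., a one-sided difference at the endpoints), the resulting matrix would fail to be antisymmetric — mirroring the remark above that the adjoint is well defined only when the surface term vanishes.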

Let us then examine under what conditions the surface term vanishes. The RHS of (10.40) and (10.41) is given by

\[ \varphi^*(b)\psi(b) - \varphi^*(a)\psi(a). \]

For this term to vanish, we should have

\[ \varphi^*(b)\psi(b) = \varphi^*(a)\psi(a) \quad \text{or} \quad \frac{\varphi^*(b)}{\varphi^*(a)} = \frac{\psi(a)}{\psi(b)}. \]

If ψ(b) = 2ψ(a), then we should have φ(b) = φ(a)/2 for the surface term to vanish. Recalling (10.24), the above conditions are expressed as

\[ B(\psi) = 2\psi(a) - \psi(b) = 0, \tag{10.45} \]

\[ B'(\varphi) = \frac{1}{2}\varphi(a) - \varphi(b) = 0. \tag{10.46} \]

The boundary functional B′(φ) is said to be adjoint to B(ψ). The two boundary functionals are admittedly different. If, however, we set ψ(b) = ψ(a), then we should have φ(b) = φ(a) for the surface term to vanish. That is,

\[ B(\psi) = \psi(a) - \psi(b) = 0, \tag{10.47} \]

\[ B'(\varphi) = \varphi(a) - \varphi(b) = 0. \tag{10.48} \]

Thus, the two functionals are identical, and ψ and φ satisfy homogeneous BCs with respect to these functionals. As discussed above, a FOLDE is characterized by its differential operator as well as by a BC (or boundary functional). This is similarly the case with SOLDEs.

Example 10.3 Next, let us consider the following differential operator:

\[ L_x = \frac{1}{i}\frac{d}{dx}. \tag{10.49} \]

As in the case of Example 10.2, we have

\[ \int_a^b dx\, \varphi^* \left( \frac{1}{i}\frac{d}{dx}\psi \right) - \int_a^b dx \left( \frac{1}{i}\frac{d}{dx}\varphi \right)^* \psi = \frac{1}{i}\left[ \varphi^* \psi \right]_a^b. \tag{10.50} \]

Rewriting (10.50) using the inner product notation, we get

\[ \langle \varphi | L_x \psi \rangle - \langle L_x \varphi | \psi \rangle = \frac{1}{i}\left[ \varphi^* \psi \right]_a^b. \tag{10.51} \]

Apart from the factor 1/i, the RHS of (10.51) is again given by

\[ \varphi^*(b)\psi(b) - \varphi^*(a)\psi(a). \]

Repeating a discussion similar to that of Example 10.2, when the surface term vanishes we get

\[ \langle \varphi | L_x \psi \rangle = \langle L_x \varphi | \psi \rangle. \tag{10.52} \]

Comparing (10.43) and (10.52), we have

\[ \langle L_x \varphi | \psi \rangle = \langle L_x^\dagger \varphi | \psi \rangle. \tag{10.53} \]

Again considering that φ and ψ are arbitrary functions, we get

\[ L_x^\dagger = L_x. \tag{10.54} \]

As (10.54) shows, if a differential operator is identical to its adjoint operator, the operator is called self-adjoint. On the basis of (1.119), Lx would apparently be Hermitian. However, we have to be careful in asserting that Lx is Hermitian. For a differential operator to be Hermitian, (1) the said operator must be self-adjoint, and (2) the two boundary functionals adjoint to each other must be identical. In other words, ψ and φ must satisfy the same homogeneous BCs with respect to these functionals. In this example, we must have the same boundary functionals as those described by (10.47) and (10.48). If and only if conditions (1) and (2) are satisfied is the operator said to be Hermitian. This may seem a somewhat formal requirement. Nonetheless, the same conditions must be satisfied by second-order differential operators for them to be Hermitian. In fact, the SOLDEs we studied in Part I are essentially dealt with within the framework of the aforementioned formalism.
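The Hermiticity of (10.49) under matching BCs can be illustrated numerically. In the sketch below (our own construction; the sample functions φ = e^{ix} and ψ = e^{ix} + ½e^{2ix} are illustrative), both functions satisfy the same periodic BCs on [0, 2π], so the surface term in (10.51) vanishes and ⟨φ|L_xψ⟩ = ⟨L_xφ|ψ⟩ holds to numerical accuracy.

```python
import cmath
import math

# Inner product <f|g> = integral over [0, 2*pi] of f(x)* g(x) dx,
# approximated by an equispaced sum (very accurate for smooth periodic f, g).
N = 256
xs = [2 * math.pi * k / N for k in range(N)]

def inner(f, g):
    return sum(f(x).conjugate() * g(x) for x in xs) * (2 * math.pi / N)

def L(f, h=1e-5):
    # Lx = (1/i) d/dx, derivative approximated by central differences.
    return lambda x: (f(x + h) - f(x - h)) / (2 * h) / 1j

phi = lambda x: cmath.exp(1j * x)                       # periodic on [0, 2*pi]
psi = lambda x: cmath.exp(1j * x) + 0.5 * cmath.exp(2j * x)

# <phi|L psi> = <L phi|psi> since both satisfy the same periodic BCs.
lhs = inner(phi, L(psi))
rhs = inner(L(phi), psi)
assert abs(lhs - rhs) < 1e-6
print("Hermiticity check passed")
```

Here the only surviving Fourier mode in both inner products is e^{ix}, so each side equals 2π; functions with mismatched endpoint values would instead leave a nonzero surface term (1/i)[φ*ψ] between the two sides.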

10.3 Second-Order Differential Operators

Second-order differential operators are the operators most commonly treated in mathematical physics. Their general form is described as

\[ L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x), \tag{10.55} \]

where a(x), b(x), and c(x) can in general be complex functions of a real variable x. Let us think of the following identities [1]:

\[ \begin{aligned} v^* \left( a \frac{d^2 u}{dx^2} \right) - u \frac{d^2 (a v^*)}{dx^2} &= \frac{d}{dx}\left[ a v^* \frac{du}{dx} - u \frac{d(a v^*)}{dx} \right], \\ v^* \left( b \frac{du}{dx} \right) + u \frac{d(b v^*)}{dx} &= \frac{d}{dx}\left[ b u v^* \right], \\ v^* c u - u c v^* &= 0. \end{aligned} \tag{10.56} \]

Summing both sides of (10.56), we have the identity

\[ v^* \left[ a \frac{d^2 u}{dx^2} + b \frac{du}{dx} + c u \right] - u \left[ \frac{d^2(a v^*)}{dx^2} - \frac{d(b v^*)}{dx} + c v^* \right] = \frac{d}{dx}\left[ a v^* \frac{du}{dx} - u \frac{d(a v^*)}{dx} + b u v^* \right]. \tag{10.57} \]

Hence, following the expressions of Sect. 10.2, we define \(L_x^\dagger\) such that

\[ \left( L_x^\dagger v \right)^* \equiv \frac{d^2(a v^*)}{dx^2} - \frac{d(b v^*)}{dx} + c v^*. \tag{10.58} \]

Taking the complex conjugate of both sides, we get

\[ L_x^\dagger v = \frac{d^2(a^* v)}{dx^2} - \frac{d(b^* v)}{dx} + c^* v. \tag{10.59} \]

Considering the differential of a product function, we have as \(L_x^\dagger\)

\[ L_x^\dagger = a^* \frac{d^2}{dx^2} + \left( 2\frac{da^*}{dx} - b^* \right)\frac{d}{dx} + \left( \frac{d^2 a^*}{dx^2} - \frac{db^*}{dx} + c^* \right). \tag{10.60} \]

Rewriting (10.57) by means of (10.55) and (10.59), we have

\[ v^*(L_x u) - \left( L_x^\dagger v \right)^* u = \frac{d}{dx}\left[ a v^* \frac{du}{dx} - u \frac{d(a v^*)}{dx} + b u v^* \right]. \tag{10.61} \]

Assuming that the relevant SOLDE is defined in [r, s] and integrating (10.61) over that interval, we get

\[ \int_r^s dx \left[ v^*(L_x u) - \left( L_x^\dagger v \right)^* u \right] = \left[ a v^* \frac{du}{dx} - u \frac{d(a v^*)}{dx} + b u v^* \right]_r^s. \tag{10.62} \]

Using the definition of the inner product described in (1.128) and rewriting (10.62), we have

\[ \langle v | L_x u \rangle - \langle L_x^\dagger v | u \rangle = \left[ a v^* \frac{du}{dx} - u \frac{d(a v^*)}{dx} + b u v^* \right]_r^s. \]

Here, if the RHS of the above (i.e., the surface term) vanishes, we get

\[ \langle v | L_x u \rangle = \langle L_x^\dagger v | u \rangle. \]

We find that this notation is consistent with (1.112). Bearing this situation in mind, let us seek a condition under which the differential operator \(L_x\) is Hermitian. Suppose here that a(x), b(x), and c(x) are all real and that

\[ \frac{da(x)}{dx} = b(x). \tag{10.63} \]

Then, instead of (10.60), we have

\[ L_x^\dagger = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x) = L_x. \]

Thus, we have succeeded in constituting a self-adjoint operator \(L_x\). In that case, (10.62) can be rewritten as

\[ \int_r^s dx \left\{ v^*(L_x u) - [L_x v]^* u \right\} = \left[ a \left( v^* \frac{du}{dx} - u \frac{dv^*}{dx} \right) \right]_r^s. \tag{10.64} \]

Notice that b(x) has been eliminated from (10.64). If the RHS of (10.64) vanishes, we get

\[ \langle v | L_x u \rangle = \langle L_x v | u \rangle. \]

This notation is consistent with (1.119), and the Hermiticity of \(L_x\) becomes well-defined. If the condition \(\frac{da(x)}{dx} = b(x)\) does not hold, how can we deal with the problem? The answer is that, following the procedures of Sect. 10.2, we can convert \(L_x\) to a self-adjoint operator by multiplying it by a weight function w(x) introduced in (10.26), (10.27), and (10.31). Replacing a(x), b(x), and c(x) with w(x)a(x), w(x)b(x), and w(x)c(x), respectively, in the identity (10.57), we rewrite (10.57) as

\[ v^* \left[ a w \frac{d^2 u}{dx^2} + b w \frac{du}{dx} + c w u \right] - u \left[ \frac{d^2(a w v^*)}{dx^2} - \frac{d(b w v^*)}{dx} + c w v^* \right] = \frac{d}{dx}\left[ a w v^* \frac{du}{dx} - u \frac{d(a w v^*)}{dx} + b w u v^* \right]. \tag{10.65} \]

Let us calculate the braces {⋯} of the second term on the LHS of (10.65). Using the conditions (10.26) and (10.27), i.e., \((aw)' = p' = bw\), we have

\[ \begin{aligned} \frac{d^2(a w v^*)}{dx^2} - \frac{d(b w v^*)}{dx} + c w v^* &= \frac{d}{dx}\left[ (a w)' v^* + a w v^{*\prime} \right] - \left[ (b w)' v^* + b w v^{*\prime} \right] + c w v^* \\ &= \frac{d}{dx}\left[ b w v^* + a w v^{*\prime} \right] - (b w)' v^* - b w v^{*\prime} + c w v^* \\ &= (b w)' v^* + b w v^{*\prime} + \left( a w v^{*\prime} \right)' - (b w)' v^* - b w v^{*\prime} + c w v^* \\ &= \left( a w v^{*\prime} \right)' + c w v^* = b w v^{*\prime} + a w v^{*\prime\prime} + c w v^* \\ &= w \left( a v^{*\prime\prime} + b v^{*\prime} + c v^* \right) = w \left( a v'' + b v' + c v \right)^*. \end{aligned} \tag{10.66} \]

= w av″ þ bv ′ þ cv : The second last equality of (10.66) is based on the assumption that a(x), b(x), and c(x) are real functions. Meanwhile, for RHS of (10.65) we have dðawv Þ du d d þ ½bwuv  -u awv dx dx dx dx =

d du dv d þ ½bwuv  awv - uðawÞ0 v - uaw dx dx dx dx

ð10:67Þ

dv d d du þ ½bwuv  = awv - ubwv - uaw dx dx dx dx =

d du dv aw v -u dx dx dx

=

d du dv p v -u dx dx dx

:

With the last equality of (10.67), we used (10.26). Using (10.66) and (10.67), we rewrite (10.65) once again as

\[ v^* w \left[ a \frac{d^2 u}{dx^2} + b \frac{du}{dx} + c u \right] - u w \left[ a \frac{d^2 v}{dx^2} + b \frac{dv}{dx} + c v \right]^* = \frac{d}{dx}\left[ p \left( v^* \frac{du}{dx} - u \frac{dv^*}{dx} \right) \right]. \tag{10.68} \]

Then, integrating (10.68) from r to s, we finally get

\[ \int_r^s dx\, w(x) \left\{ v^*(L_x u) - [L_x v]^* u \right\} = \left[ p \left( v^* \frac{du}{dx} - u \frac{dv^*}{dx} \right) \right]_r^s. \tag{10.69} \]

The relations (10.69) and (10.62) are called the generalized Green's identity. We emphasize that as long as the coefficients a(x), b(x), and c(x) in (10.55) are real functions, the associated differential operator \(L_x\) can be converted to a self-adjoint form following the procedures of (10.66) and (10.67). In the above, the LHS of the original homogeneous differential equation (10.5) is rewritten as

\[ a(x)w(x)\frac{d^2 u}{dx^2} + b(x)w(x)\frac{du}{dx} + c(x)w(x)u = \frac{d}{dx}\left[ p(x)\frac{du}{dx} \right] + c\,w(x)u. \]

Rewriting this, we have

\[ L_x u = \frac{1}{w(x)}\frac{d}{dx}\left[ p(x)\frac{du}{dx} \right] + c u \quad [w(x) > 0], \tag{10.70} \]

where we have pðxÞ = aðxÞwðxÞ and

dpðxÞ = bðxÞwðxÞ: dx

ð10:71Þ

The latter equation of (10.71) corresponds to (10.63) if we assume w(x)  1. When the differential operator Lx is defined as (10.70), Lx is said to be self-adjoint with respect to a weight function of w(x). Now we examine boundary functionals. The homogeneous adjoint boundary functionals are described as follows: B{1 ðuÞ = α1 v ðr Þ þ β1

dv dx

B{2 ðuÞ = α2 v ðr Þ þ β2

dv dx

x=r

þ γ 1 v ðsÞ þ δ1

dv dx

x=r

þ γ 2 v ð s Þ þ δ 2

dv dx

x=s

x=s

= 0,

ð10:72Þ

= 0:

ð10:73Þ

In (10.3), putting α1 = 1 and β1 = γ1 = δ1 = 0, we have

\[ B_1(u) = u(r) = \sigma_1. \tag{10.74} \]

Also, putting γ2 = 1 and α2 = β2 = δ2 = 0, we have

\[ B_2(u) = u(s) = \sigma_2. \tag{10.75} \]

Further putting

\[ \sigma_1 = \sigma_2 = 0, \tag{10.76} \]

we also get the homogeneous BCs

\[ B_1(u) = B_2(u) = 0; \quad \text{i.e.,} \quad u(r) = u(s) = 0. \tag{10.77} \]

For the RHS of (10.69) to vanish, it suffices to define \(B_1^\dagger\) and \(B_2^\dagger\) such that

\[ B_1^\dagger(v) = v^*(r) \quad \text{and} \quad B_2^\dagger(v) = v^*(s). \tag{10.78} \]

Then, the homogeneous adjoint BCs read

\[ v^*(r) = v^*(s) = 0, \quad \text{i.e.,} \quad v(r) = v(s) = 0. \tag{10.79} \]

In this manner, we can readily construct homogeneous adjoint BCs that are the same as those of (10.77), so that \(L_x\) can be Hermitian. We list several prescriptions of typical BCs below:

\[ \begin{aligned} &(1)\ u(r) = u(s) = 0 \quad \text{(Dirichlet conditions)}, \\ &(2)\ \left. \frac{du}{dx} \right|_{x=r} = \left. \frac{du}{dx} \right|_{x=s} = 0 \quad \text{(Neumann conditions)}, \\ &(3)\ u(r) = u(s) \ \text{and} \ \left. \frac{du}{dx} \right|_{x=r} = \left. \frac{du}{dx} \right|_{x=s} \quad \text{(periodic conditions)}. \end{aligned} \tag{10.80} \]

Yet care should be taken when handling the RHS of (10.69), i.e., the surface terms, because conditions (1)–(3) are sufficient but not necessary for the surface terms to vanish; conditions of this kind are not limited to them. Meanwhile, we often have to deal with non-vanishing surface terms. In that case, we have to start with (10.62) instead of (10.69). In Sect. 10.2, we defined Hermiticity of a differential operator by requiring that the operator be self-adjoint and that the homogeneous BCs and the homogeneous adjoint BCs be the same. In light of the above argument, however, we may relax the conditions for a differential operator to be Hermitian. This is particularly the case when p(x) = a(x)w(x) in (10.69) vanishes at both endpoints. We will encounter such a situation in Sect. 10.7.
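The weight-function conversion of Sect. 10.2, applied here to a second-order operator, can be checked numerically. A minimal sketch of ours (the choice a = 1, b = −2x, c = 0 is our illustrative example, not from the text): (10.31) with C′ = 1 gives w(x) = exp(−x²), so p = aw = exp(−x²) should satisfy dp/dx = bw as required by (10.71), and the self-adjoint form (10.70) should reproduce a u″ + b u′.

```python
import math

# Lx = a d^2/dx^2 + b d/dx with a(x) = 1, b(x) = -2x (so da/dx != b).
# The weight (10.31) with C' = 1 gives w(x) = exp(int of -2x dx) = exp(-x^2),
# and p(x) = a(x) w(x) = exp(-x^2).
def w(x): return math.exp(-x * x)
def p(x): return w(x)              # since a(x) = 1
def b(x): return -2.0 * x

x = 0.7
h = 1e-6
dp = (p(x + h) - p(x - h)) / (2 * h)
assert abs(dp - b(x) * w(x)) < 1e-8     # condition (10.71): p' = b w

# Self-adjoint form (10.70) with c = 0: Lx u = (1/w) d/dx[p du/dx],
# which should equal u'' - 2x u' for any sample u.
def u(t): return math.sin(2.0 * t)
def du(t): return (u(t + 1e-5) - u(t - 1e-5)) / 2e-5

h = 1e-4
self_adjoint = (p(x + h) * du(x + h) - p(x - h) * du(x - h)) / (2 * h) / w(x)
upp = (u(x + h) - 2 * u(x) + u(x - h)) / (h * h)
assert abs(self_adjoint - (upp + b(x) * du(x))) < 1e-4
print("self-adjoint form agrees")
```

This is the familiar conversion of the Hermite-type operator u″ − 2xu′ into the form e^{x²} d/dx[e^{−x²} du/dx], self-adjoint with respect to the weight e^{−x²}.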

10.4 Green's Functions

With the aforementioned discussions behind us, let us proceed to the study of Green's functions for SOLDEs. Though we keep it to a minimum, we have to mention a bit of formalism. Given \(L_x\) defined by (10.55), let us assume

\[ L_x u(x) = d(x) \tag{10.81} \]

under homogeneous BCs, with the inhomogeneous term d(x) being an arbitrary function. We also assume that (10.81) is well-defined in a domain [r, s]; the numbers r and s may be infinite. Suppose simultaneously that we have

\[ L_x^\dagger v(x) = h(x) \tag{10.82} \]

under homogeneous adjoint BCs [1, 2], with the inhomogeneous term h(x) being an arbitrary function as well. Let us describe the above relations as

\[ L|u\rangle = |d\rangle \quad \text{and} \quad L^\dagger |v\rangle = |h\rangle. \tag{10.83} \]

Suppose that there is an inverse operator \(L^{-1} \equiv G\) such that

\[ GL = LG = E, \tag{10.84} \]

where E is an identity operator. Operating with G on (10.83), we have

\[ GL|u\rangle = E|u\rangle = |u\rangle = G|d\rangle. \tag{10.85} \]

This implies that (10.81) has been solved and the solution is given by G|d⟩. Since the inverse operation of differentiation is integration, G is expected to be an integral operator. We have

\[ \langle x | LG | y \rangle = L_x \langle x | G | y \rangle = L_x G(x, y). \tag{10.86} \]

Meanwhile, using (10.84) we get

\[ \langle x | LG | y \rangle = \langle x | E | y \rangle = \langle x | y \rangle. \tag{10.87} \]

Using a weight function w(x), we generalize the inner product of (1.128) such that

\[ \langle g | f \rangle \equiv \int_r^s w(x) g(x)^* f(x)\,dx. \tag{10.88} \]

Fig. 10.1 Function |f⟩ and its coordinate representation f(x)

As we expand an arbitrary vector using basis vectors, we "expand" an arbitrary function |f⟩ using basis vectors |x⟩. Here, we are treating the real numbers as if they formed a continuum of innumerable basis vectors on the real number line (see Fig. 10.1). Thus, we could expand |f⟩ in terms of |x⟩ such that

\[ |f\rangle = \int_r^s dx\, w(x) f(x) |x\rangle. \tag{10.89} \]

In (10.89) we regarded f(x) as if it were an expansion coefficient. The following notation is reasonable accordingly:

\[ f(x) \equiv \langle x | f \rangle. \tag{10.90} \]

In (10.90), f(x) can be viewed as the coordinate representation of |f⟩. Thus, from (10.89) we get

\[ \langle x' | f \rangle = f(x') = \int_r^s dx\, w(x) f(x) \langle x' | x \rangle. \tag{10.91} \]

Alternatively, we have

\[ f(x') = \int_r^s dx\, f(x)\,\delta(x - x'). \tag{10.92} \]

This comes from a property of the δ function [1] described as

\[ \int_r^s dx\, f(x)\,\delta(x) = f(0). \tag{10.93} \]

Comparing (10.91) and (10.92), we have

\[ w(x)\langle x' | x \rangle = \delta(x - x') \quad \text{or} \quad \langle x' | x \rangle = \frac{\delta(x' - x)}{w(x)} = \frac{\delta(x - x')}{w(x)}. \tag{10.94} \]

Thus, comparing (10.86) and (10.87) and using (10.94), we get

\[ L_x G(x, y) = \frac{\delta(x - y)}{w(x)}. \tag{10.95} \]

In a similar manner, we also have

\[ L_x^\dagger g(x, y) = \frac{\delta(x - y)}{w(x)}. \tag{10.96} \]

To arrive at (10.96), we start the discussion by assuming an operator \((L_x^\dagger)^{-1} \equiv g\) with \(gL^\dagger = L^\dagger g = E\). The function G(x, y) is called a Green's function, and g(x, y) is said to be an adjoint Green's function. The handling of Green's functions and adjoint Green's functions is based upon (10.95) and (10.96), respectively. As (10.81) is defined in the domain r ≤ x ≤ s, (10.95) and (10.96) are defined in the domain r ≤ x ≤ s, r ≤ y ≤ s. Notice that except at the point x = y we have

\[ L_x G(x, y) = 0 \quad \text{and} \quad L_x^\dagger g(x, y) = 0. \tag{10.97} \]

That is, G(x, y) and g(x, y) satisfy the homogeneous equation with respect to the variable x. Accordingly, we require G(x, y) and g(x, y) to satisfy the same homogeneous BCs with respect to the variable x as those imposed upon u(x) and v(x) of (10.81) and (10.82), respectively [1]. The relation (10.88) can be obtained as follows. Operating with ⟨g| on (10.89), we have

\[ \langle g | f \rangle = \int_r^s dx\, w(x) f(x) \langle g | x \rangle = \int_r^s w(x) g(x)^* f(x)\,dx, \tag{10.98} \]

where for the last equality we used

\[ \langle g | x \rangle = \langle x | g \rangle^* = g(x)^*. \tag{10.99} \]

For this, see (1.113), where A is replaced with the identity operator E, with regard to the complex conjugate of an inner product of two vectors. Also see (13.2) of Sect. 13.1. If in (10.69) the surface term (i.e., the RHS) vanishes under appropriate conditions, e.g., (10.80), we have

\[ \int_r^s dx\, w(x) \left[ v^*(L_x u) - \left( L_x^\dagger v \right)^* u \right] = 0, \tag{10.100} \]

which is called Green's identity. Since (10.100) is derived from the identities (10.56), (10.100) is an identity as well (as the terminology "Green's identity" shows). Therefore, (10.100) must hold for any functions u and v so long as they satisfy homogeneous BCs. Thus, replacing v in (10.100) with g(x, y) and using (10.96) together with (10.81), we have

\[ \begin{aligned} \int_r^s dx\, w(x) \left\{ g^*(x, y)\left[ L_x u(x) \right] - \left[ L_x^\dagger g(x, y) \right]^* u(x) \right\} &= \int_r^s dx\, w(x) \left[ g^*(x, y) d(x) - \frac{\delta(x - y)}{w(x)} u(x) \right] \\ &= \int_r^s dx\, w(x) g^*(x, y) d(x) - u(y) = 0, \end{aligned} \tag{10.101} \]

where with the second last equality we used a property of the δ function. Also notice that δ(x − y)/w(x) is a real function. Rewriting (10.101), we get

\[ u(y) = \int_r^s dx\, w(x) g^*(x, y) d(x). \tag{10.102} \]

Similarly, replacing u in (10.100) with G(x, y) and using (10.95) together with (10.82), we have

\[ v(y) = \int_r^s dx\, w(x) G^*(x, y) h(x). \tag{10.103} \]

Replacing u and v in (10.100) with G(x, q) and g(x, t), respectively, we have

\[ \int_r^s dx\, w(x) \left\{ g^*(x, t)\left[ L_x G(x, q) \right] - \left[ L_x^\dagger g(x, t) \right]^* G(x, q) \right\} = 0. \tag{10.104} \]

Notice that we have chosen q and t for the second argument y in (10.95) and (10.96), respectively. Inserting (10.95) and (10.96) into the above equation after changing arguments, we have

\[ \int_r^s dx\, w(x) \left[ g^*(x, t) \frac{\delta(x - q)}{w(x)} - \frac{\delta(x - t)}{w(x)} G(x, q) \right] = 0. \tag{10.105} \]

Thus, we get

\[ g^*(q, t) = G(t, q) \quad \text{or} \quad g(q, t) = G^*(t, q). \tag{10.106} \]

This implies that G(t, q) must satisfy the adjoint BCs with respect to the second argument q. Inserting (10.106) into (10.102), we get

\[ u(y) = \int_r^s dx\, w(x) G(y, x) d(x). \tag{10.107} \]

Or, exchanging the arguments x and y, we have

\[ u(x) = \int_r^s dy\, w(y) G(x, y) d(y). \tag{10.108} \]

Similarly, substituting (10.106) into (10.103), we get

\[ v(y) = \int_r^s dx\, w(x) g(y, x) h(x). \tag{10.109} \]

Or, we have

\[ v(x) = \int_r^s dy\, w(y) g(x, y) h(y). \tag{10.110} \]

Equations (10.107)–(10.110) clearly show that the homogeneous equations [given by putting d(x) = h(x) = 0] have only the trivial solutions u(x) ≡ 0 and v(x) ≡ 0 under homogeneous BCs. Note that this is always the case when we are able to construct a Green's function. This in turn implies that we can construct a Green's function if the differential operator is accompanied by initial conditions. Conversely, if the homogeneous equation has a non-trivial solution under homogeneous BCs, (10.107)–(10.110) will not work. If the differential operator L in (10.81) is Hermitian, according to the associated remarks of Sect. 10.2 we must have \(L_x = L_x^\dagger\), and u(x) and v(x) of (10.81) and (10.82) must satisfy the same homogeneous BCs. Consequently, in the case of a Hermitian operator we should have

\[ G(x, y) = g(x, y). \tag{10.111} \]

From (10.106) and (10.111), if the operator is Hermitian we get

\[ G(x, y) = G^*(y, x). \tag{10.112} \]

In Sect. 10.3 we assumed that the coefficients a(x), b(x), and c(x) are real to ensure that \(L_x\) is Hermitian [1]. On this condition G(x, y) is real as well (vide infra). Then we have

\[ G(x, y) = G(y, x). \tag{10.113} \]

That is, G(x, y) is real symmetric with respect to the arguments x and y. To put Green's functions to practical use, we have to estimate the behavior of the Green's function near x = y, because in light of (10.95) and (10.96) there is a "jump" there. When we deal with a case where a self-adjoint operator is relevant, using the function p(x) of (10.69) we have

\[ \frac{a(x)}{p(x)} \frac{\partial}{\partial x}\left( p \frac{\partial G}{\partial x} \right) + c(x) G = \frac{\delta(x - y)}{w(x)}. \tag{10.114} \]

Multiplying both sides by p(x)/a(x), we have

\[ \frac{\partial}{\partial x}\left( p \frac{\partial G}{\partial x} \right) = \frac{p(x)}{a(x)} \frac{\delta(x - y)}{w(x)} - \frac{p(x) c(x)}{a(x)} G(x, y). \tag{10.115} \]

Using the property of the δ function expressed by

\[ f(x)\delta(x) = f(0)\delta(x) \quad \text{or} \quad f(x)\delta(x - y) = f(y)\delta(x - y), \tag{10.116} \]

we have

\[ \frac{\partial}{\partial x}\left( p \frac{\partial G}{\partial x} \right) = \frac{p(y)}{a(y)} \frac{\delta(x - y)}{w(y)} - \frac{p(x) c(x)}{a(x)} G(x, y). \tag{10.117} \]

Integrating (10.117) with respect to x, we get

\[ p \frac{\partial G(x, y)}{\partial x} = \frac{p(y)}{a(y) w(y)}\,\theta(x - y) - \int_r^x dt\, \frac{p(t) c(t)}{a(t)}\,G(t, y) + C, \tag{10.118} \]

where C is a constant. The function θ(x − y) is defined through

\[ \theta(x) = \begin{cases} 1 & (x > 0), \\ 0 & (x < 0). \end{cases} \tag{10.119} \]

Note that we have

\[ \frac{d\theta(x)}{dx} = \delta(x). \tag{10.120} \]

The function θ(x) is called a step function or the Heaviside step function. On the RHS of (10.118) the first term has a discontinuity at x = y because of θ(x − y), whereas the second term is continuous with respect to x. Thus, we have

\[ \lim_{\varepsilon \to +0} \left[ p(y + \varepsilon) \left. \frac{\partial G(x, y)}{\partial x} \right|_{x = y + \varepsilon} - p(y - \varepsilon) \left. \frac{\partial G(x, y)}{\partial x} \right|_{x = y - \varepsilon} \right] = \lim_{\varepsilon \to +0} \frac{p(y)}{a(y) w(y)} \left[ \theta(+\varepsilon) - \theta(-\varepsilon) \right] = \frac{p(y)}{a(y) w(y)}. \tag{10.121} \]

Since p(y) is continuous with respect to the argument y, this factor drops off and we get

\[ \lim_{\varepsilon \to +0} \left[ \left. \frac{\partial G(x, y)}{\partial x} \right|_{x = y + \varepsilon} - \left. \frac{\partial G(x, y)}{\partial x} \right|_{x = y - \varepsilon} \right] = \frac{1}{a(y) w(y)}. \tag{10.122} \]

Thus, ∂G(x, y)/∂x has a discontinuity of magnitude 1/[a(y)w(y)] at x = y. Since the RHS of (10.122) is finite and continuous with respect to the argument y, integrating (10.122) once more with respect to x, we find that G(x, y) itself is continuous at x = y. These properties of G(x, y) are useful for calculating Green's functions in practice; we will encounter several examples in the next sections. Suppose now that there are two Green's functions that satisfy the same homogeneous BCs, and let G(x, y) and G̃(x, y) be such functions. Then, we must have

\[ L_x G(x, y) = \frac{\delta(x - y)}{w(x)} \quad \text{and} \quad L_x \tilde{G}(x, y) = \frac{\delta(x - y)}{w(x)}. \tag{10.123} \]

Subtracting the two equations of (10.123), we have

\[ L_x \left[ \tilde{G}(x, y) - G(x, y) \right] = 0. \tag{10.124} \]

By virtue of the linearity of the BCs, G(x, y) − G̃(x, y) must satisfy the same homogeneous BCs as well. But (10.124) is a homogeneous equation, and so from the aforementioned constructability of the Green's function we must have the trivial solution

\[ G(x, y) - \tilde{G}(x, y) \equiv 0 \quad \text{or} \quad G(x, y) \equiv \tilde{G}(x, y). \tag{10.125} \]

This obviously indicates that the Green's function is unique. We assumed in Sect. 10.3 that the coefficients a(x), b(x), and c(x) are real. Therefore, taking the complex conjugate of (10.95), we have

\[ L_x G^*(x, y) = \frac{\delta(x - y)}{w(x)}. \tag{10.126} \]

Notice here that both δ(x − y) and w(x) are real functions. Subtracting (10.95) from (10.126), we have

\[ L_x \left[ G^*(x, y) - G(x, y) \right] = 0. \]

Again, from the uniqueness of the Green's function, we get G*(x, y) = G(x, y); i.e., G(x, y) is real. This holds independently of the specific structure of \(L_x\). In other words, as long as we are dealing with real coefficients a(x), b(x), and c(x), G(x, y) is real whether or not \(L_x\) is self-adjoint.
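The continuity, jump, symmetry, and reality properties derived above can all be checked on a standard explicit example. Below is a minimal sketch of ours (the operator d²/dx² on [0, 1] with Dirichlet BCs is our illustrative choice; the closed form of its Green's function is standard): here a = w = 1, so the jump of ∂G/∂x at x = y given by (10.122) should be exactly 1, and u(x) = ∫G(x, y)d(y)dy should solve u″ = d with u(0) = u(1) = 0.

```python
# Green's function of Lx = d^2/dx^2 on [0, 1] with Dirichlet BCs
# u(0) = u(1) = 0 (a = w = 1, b = c = 0). A standard closed form is
#   G(x, y) = x (y - 1) for x <= y,   G(x, y) = y (x - 1) for x >= y.
def G(x, y):
    return x * (y - 1.0) if x <= y else y * (x - 1.0)

y, eps = 0.4, 1e-7

# Symmetry G(x, y) = G(y, x), cf. (10.113), and continuity of G at x = y:
assert abs(G(0.2, 0.8) - G(0.8, 0.2)) < 1e-15
assert abs(G(y - eps, y) - G(y + eps, y)) < 1e-6

# Jump of dG/dx at x = y equals 1/(a w) = 1, cf. (10.122):
slope_above = (G(y + 2 * eps, y) - G(y + eps, y)) / eps
slope_below = (G(y - eps, y) - G(y - 2 * eps, y)) / eps
assert abs((slope_above - slope_below) - 1.0) < 1e-6

# u(x) = integral_0^1 G(x, y) d(y) dy with d = 1 should solve u'' = 1
# with u(0) = u(1) = 0; the exact result is u(x) = x (x - 1) / 2.
def u(x, n=2000):
    h = 1.0 / n
    return sum(G(x, (k + 0.5) * h) for k in range(n)) * h

for x in [0.0, 0.3, 1.0]:
    assert abs(u(x) - 0.5 * x * (x - 1.0)) < 1e-6
print("Green's-function checks passed")
```

Note that G here is built from the fundamental set {1, x} of u″ = 0 on either side of x = y, glued by exactly the continuity and jump conditions above — the construction pattern formalized in Sect. 10.5.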

10.5 Construction of Green's Functions

So far we have dealt with homogeneous boundary conditions (BCs) with respect to a differential equation

\[ a(x)\frac{d^2 u}{dx^2} + b(x)\frac{du}{dx} + c(x)u = d(x), \tag{10.2} \]

where the coefficients a(x), b(x), and c(x) are real. In this case, if d(x) ≡ 0, namely if the SOLDE is a homogeneous equation, we have the solution u(x) ≡ 0 under homogeneous boundary conditions on the basis of (10.108). If, however, we have inhomogeneous boundary conditions (BCs), additional terms appear on the RHS of (10.108) for both homogeneous and inhomogeneous equations. In this section, we examine how to deal with this problem. Following the remarks made in Sect. 10.3, we start with (10.62) or (10.69). If we deal with a self-adjoint or Hermitian operator, we can apply (10.69) to the problem. In the more general case where the operator is not self-adjoint, (10.62) is useful; we will have a good opportunity to see this in Sect. 10.6. In Sect. 10.3, we mentioned that we may relax the definition of Hermiticity of the differential operator in the case where the surface term vanishes. Meanwhile, we should bear in mind that Green's functions and adjoint Green's functions are constructed using homogeneous BCs, regardless of whether we are concerned with a homogeneous or an inhomogeneous equation. Thus, even if the surface terms do not vanish, we may regard the differential operator as Hermitian, because we deal with essentially the same Green's function when solving a problem in both the homogeneous and inhomogeneous cases (vide infra). Notice also that whether or not the RHS vanishes, we are to use the same Green's function [1]. In this sense, we do not have to be too strict with the definition of Hermiticity. Now, suppose that for a (self-adjoint) differential operator \(L_x\) we are given

\[ \int_r^s dx\, w(x) \left\{ v^*(L_x u) - [L_x v]^* u \right\} = \left[ p(x) \left( v^* \frac{du}{dx} - u \frac{dv^*}{dx} \right) \right]_r^s, \tag{10.69} \]

where w(x) > 0 in the domain [r, s] and p(x) is a real function. Note that since (10.69) is an identity, we may choose G(x, y) for v with an appropriate choice of w(x). Then, from (10.95) and (10.112) we have

\[ \begin{aligned} \int_r^s dx\, w(x) \left[ G^*(y, x) d(x) - \frac{\delta(x - y)}{w(x)} u(x) \right] &= \int_r^s dx\, w(x) \left[ G(y, x) d(x) - \frac{\delta(x - y)}{w(x)} u(x) \right] \\ &= \left[ p(x) \left( G(y, x) \frac{du(x)}{dx} - u(x) \frac{\partial G(y, x)}{\partial x} \right) \right]_{x=r}^{x=s}. \end{aligned} \tag{10.127} \]

Note that in (10.127) we used the fact that both δ(x − y) and w(x) are real. Using a property of the δ function, we get

\[ u(y) = \int_r^s dx\, w(x) G(y, x) d(x) - \left[ p(x) \left( G(y, x) \frac{du(x)}{dx} - u(x) \frac{\partial G(y, x)}{\partial x} \right) \right]_{x=r}^{x=s}. \tag{10.128} \]

When the differential operator \(L_x\) can be made Hermitian under the appropriate condition (10.71), we have a real Green's function with

\[ G(x, y) = G(y, x). \tag{10.113} \]

The function G(x, y) satisfies homogeneous BCs. Hence, if we assume, e.g., the Dirichlet BCs [see (10.80)], we have

\[ G(r, y) = G(s, y) = 0. \tag{10.129} \]

Using the symmetry of G(x, y) with respect to the arguments x and y, from (10.129) we get

\[ G(y, r) = G(y, s) = 0. \tag{10.130} \]

Thus, the first term of the surface terms of (10.128) is eliminated, yielding

\[ u(y) = \int_r^s dx\, w(x) G(y, x) d(x) + \left[ p(x) u(x) \frac{\partial G(y, x)}{\partial x} \right]_{x=r}^{x=s}. \]

Exchanging the arguments x and y, we get

\[ u(x) = \int_r^s dy\, w(y) G(x, y) d(y) + \left[ p(y) u(y) \frac{\partial G(x, y)}{\partial y} \right]_{y=r}^{y=s}. \tag{10.131} \]

Then, (1) substituting the surface terms u(r) and u(s) that are associated with the inhomogeneous BCs described as

\[ B_1(u) = \sigma_1 \quad \text{and} \quad B_2(u) = \sigma_2, \tag{10.132} \]

and (2) calculating \(\left. \frac{\partial G(x, y)}{\partial y} \right|_{y=r}\) and \(\left. \frac{\partial G(x, y)}{\partial y} \right|_{y=s}\), we will be able to obtain a unique

solution. Notice that even though we may have formally the same differential operator, we get different Green's functions depending on the BCs. We will see tangible examples later. On the basis of the general discussion of Sect. 10.4 and of this section, we are now in a position to construct Green's functions. Except at the points x = y, the Green's function G(x, y) must satisfy the following differential equation:

ð10:133Þ

where Lx is given by Lx = aðxÞ

d2 d þ bðxÞ þ cðxÞ: dx dx2

ð10:55Þ

The differential equation Lxu = d(x) is defined within an interval [r, s], where r may be -1 and s may be +1. From now on, we regard a(x), b(x), and c(x) as real functions. From (10.133), we expect the Green’s function to be described as a linear combination of a fundamental set of solutions u1(x) and u2(x). Here the fundamental set of solutions are given by two linearly independent solutions of a homogeneous equation Lxu = 0. Then we should be able to express G(x, y) as a combination of F1(x, y) and F2(x, y) that are described as F 1 ðx, yÞ = c1 u1 ðxÞ þ c2 u2 ðxÞ for r ≤ x < y, F 2 ðx, yÞ = d1 u1 ðxÞ þ d 2 u2 ðxÞ for y < x ≤ s,

ð10:134Þ

where c1, c2, d1, and d2 are arbitrary (complex) constants to be determined later. These constants are given as a function of y. The combination has to be made such that Gðx, yÞ =

F 1 ðx, yÞ for r ≤ x < y, F 2 ðx, yÞ for y < x ≤ s:

Thus using θ(x) function defined as (10.119), we describe G(x, y) as

ð10:135Þ

10.5

Construction of Green’s Functions

427

Gðx, yÞ = F 1 ðx, yÞθðy - xÞ þ F 2 ðx, yÞθðx - yÞ:

ð10:136Þ

Notice that F1(x, y) and F2(x, y) are “ordinary” functions and that G(x, y) is not, because G(x, y) contains the θ(x) function. If we have F 2 ðx, yÞ = F 1 ðy, xÞ,

ð10:137Þ

Gðx, yÞ = F 1 ðx, yÞθðy - xÞ þ F 1 ðy, xÞθðx - yÞ:

ð10:138Þ

Gðx, yÞ = Gðy, xÞ:

ð10:139Þ

Hence, we get

From (10.113), Lx is Hermitian. Suppose that F1(x, y) = (x - r)(y - s) and F2(x, y) = (x - s)(y - r). Then, (10.137) is satisfied and, hence, if we can construct the Green’s function from F1(x, y) and F2(x, y), Lx should be Hermitian. However, if we had, e.g., F1(x, y) = x - r and F2(x, y) = y - s, G(x, y) ≠ G(y, x), and so Lx would not be Hermitian. The Green’s functions must satisfy the homogeneous BCs. That is, B1 ðGÞ = B2 ðGÞ = 0:

ð10:140Þ

Also, we require the continuity condition of G(x, y) at x = y and the discontinuity condition of \partial G(x, y)/\partial x at x = y described by (10.122). Thus, we have four conditions to be satisfied by G(x, y): the two BCs together with the continuity and discontinuity conditions. These four conditions determine the four constants c_1, c_2, d_1, and d_2. Now, let us inspect further details of the Green's functions with an example.

Example 10.4 Let us consider the following differential equation:

\frac{d^2 u}{dx^2} + u = 1.   (10.141)

We assume that the domain of the argument x is [0, L]. We set boundary conditions such that

u(0) = \sigma_1 \quad \text{and} \quad u(L) = \sigma_2.   (10.142)

Thus, if at least one of \sigma_1 and \sigma_2 is not zero, we are dealing with an inhomogeneous differential equation under inhomogeneous BCs. Next, let us seek the conditions that the Green's function satisfies. We also seek a fundamental set of solutions of the homogeneous equation described by


\frac{d^2 u}{dx^2} + u = 0.   (10.143)

This is obtained by putting a = c = 1 and b = 0 in the general form of (10.5) with the weight function being unity. The differential equation (10.143) is therefore self-adjoint according to the argument of Sect. 10.3. A fundamental set of solutions is given by

e^{ix} \quad \text{and} \quad e^{-ix}.

Then, we have

F_1(x,y) = c_1 e^{ix} + c_2 e^{-ix} \quad \text{for } 0 \le x < y,
F_2(x,y) = d_1 e^{ix} + d_2 e^{-ix} \quad \text{for } y < x \le L.   (10.144)

The functions F_1(x, y) and F_2(x, y) must satisfy the following BCs such that

F_1(0,y) = c_1 + c_2 = 0 \quad \text{and} \quad F_2(L,y) = d_1 e^{iL} + d_2 e^{-iL} = 0.   (10.145)

Thus, we have

F_1(x,y) = c_1\left(e^{ix} - e^{-ix}\right), \quad F_2(x,y) = d_1\left(e^{ix} - e^{2iL} e^{-ix}\right).   (10.146)

Therefore, at x = y we have

c_1\left(e^{iy} - e^{-iy}\right) = d_1\left(e^{iy} - e^{2iL} e^{-iy}\right).   (10.147)

The discontinuity condition of (10.122) is equivalent to

\left.\frac{\partial F_2(x,y)}{\partial x}\right|_{x=y} - \left.\frac{\partial F_1(x,y)}{\partial x}\right|_{x=y} = 1.   (10.148)

This is because both F_1(x, y) and F_2(x, y) are ordinary functions and supposed to be differentiable at any x. The relation (10.148) then reads as

i d_1\left(e^{iy} + e^{2iL} e^{-iy}\right) - i c_1\left(e^{iy} + e^{-iy}\right) = 1.   (10.149)

From (10.147) and (10.149), using Cramer's rule we have


c_1 = \frac{\begin{vmatrix} 0 & -\left(e^{iy} - e^{2iL} e^{-iy}\right) \\ 1 & i\left(e^{iy} + e^{2iL} e^{-iy}\right) \end{vmatrix}}{\begin{vmatrix} e^{iy} - e^{-iy} & -\left(e^{iy} - e^{2iL} e^{-iy}\right) \\ -i\left(e^{iy} + e^{-iy}\right) & i\left(e^{iy} + e^{2iL} e^{-iy}\right) \end{vmatrix}} = \frac{i\left(e^{iy} - e^{2iL} e^{-iy}\right)}{2\left(1 - e^{2iL}\right)},   (10.150)

d_1 = \frac{\begin{vmatrix} e^{iy} - e^{-iy} & 0 \\ -i\left(e^{iy} + e^{-iy}\right) & 1 \end{vmatrix}}{\begin{vmatrix} e^{iy} - e^{-iy} & -\left(e^{iy} - e^{2iL} e^{-iy}\right) \\ -i\left(e^{iy} + e^{-iy}\right) & i\left(e^{iy} + e^{2iL} e^{-iy}\right) \end{vmatrix}} = \frac{i\left(e^{iy} - e^{-iy}\right)}{2\left(1 - e^{2iL}\right)}.   (10.151)

Substituting these parameters into (10.146), we get

F_1(x,y) = \frac{\sin x \left(e^{2iL} e^{-iy} - e^{iy}\right)}{1 - e^{2iL}}, \quad F_2(x,y) = \frac{\sin y \left(e^{2iL} e^{-ix} - e^{ix}\right)}{1 - e^{2iL}}.   (10.152)

Making the denominator real, we have

F_1(x,y) = \frac{\sin x \left[\cos(y - 2L) - \cos y\right]}{2 \sin^2 L}, \quad F_2(x,y) = \frac{\sin y \left[\cos(x - 2L) - \cos x\right]}{2 \sin^2 L}.   (10.153)

Using the \theta(x) function, we get

G(x,y) = \frac{\sin x \left[\cos(y - 2L) - \cos y\right]}{2 \sin^2 L}\,\theta(y - x) + \frac{\sin y \left[\cos(x - 2L) - \cos x\right]}{2 \sin^2 L}\,\theta(x - y).   (10.154)

Thus, G(x, y) = G(y, x) as expected. Notice, however, that if L = n\pi (n = 1, 2, \cdots) the Green's function cannot be defined as an ordinary function even if x \ne y. We return to this point later. The solution of (10.141) under the homogeneous BCs is then described as

u(x) = \int_0^L dy\, G(x,y)
     = \frac{\cos(x - 2L) - \cos x}{2 \sin^2 L} \int_0^x \sin y\, dy + \frac{\sin x}{2 \sin^2 L} \int_x^L \left[\cos(y - 2L) - \cos y\right] dy.   (10.155)


This can readily be integrated to yield the solution of the inhomogeneous equation such that

u(x) = \frac{\cos(x - 2L) - \cos x - 2 \sin L \sin x - \cos 2L + 1}{2 \sin^2 L} = \frac{\cos(x - 2L) - \cos x - 2 \sin L \sin x + 2 \sin^2 L}{2 \sin^2 L}.   (10.156)

Next, let us consider the surface term. This is given by the second term of (10.131). We get

\left.\frac{\partial F_1(x,y)}{\partial y}\right|_{y=L} = \frac{\sin x}{\sin L}, \quad \left.\frac{\partial F_2(x,y)}{\partial y}\right|_{y=0} = \frac{\cos(x - 2L) - \cos x}{2 \sin^2 L}.   (10.157)

Therefore, with the inhomogeneous BCs we have the following solution of the inhomogeneous equation:

u(x) = \frac{\cos(x - 2L) - \cos x - 2 \sin L \sin x + 2 \sin^2 L}{2 \sin^2 L} + \frac{2\sigma_2 \sin L \sin x + \sigma_1\left[\cos x - \cos(x - 2L)\right]}{2 \sin^2 L},   (10.158)

where the second term is the surface term. If \sigma_1 = \sigma_2 = 1, we have

u(x) \equiv 1.

Looking at (10.141), we find that u(x) \equiv 1 is certainly a solution of (10.141) with the inhomogeneous BCs of \sigma_1 = \sigma_2 = 1. The uniqueness of the solution then ensures that u(x) \equiv 1 is the sole solution under the said BCs.
From (10.154), we find that G(x, y) has a singularity at L = n\pi (n: integer). This is associated with the fact that the homogeneous equation (10.143) has a non-trivial solution, e.g., u(x) = \sin x, under the homogeneous BCs u(0) = u(L) = 0. The present situation is essentially the same as that of Example 1.1 of Sect. 1.3. In other words, when \lambda = 1 in (1.61), the form of the differential equation is identical to (10.143) with virtually the same Dirichlet conditions. The point is that (10.143) can be viewed as a homogeneous equation and, at the same time, as an eigenvalue equation. In such a case, the Green's function approach fails.
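The construction of Example 10.4 can be checked numerically. The following sketch (our own illustration, not part of the text) integrates the Green's function (10.154) against the source d(y) ≡ 1 and compares the result with the closed form (10.156); the interval length L = 1 is an arbitrary assumed value, and L = nπ must of course be avoided.

```python
import math

L = 1.0  # assumed interval length; L = n*pi must be avoided (G singular there)

def G(x, y):
    """Green's function (10.154) of u'' + u = 1 on [0, L] with u(0) = u(L) = 0."""
    if x < y:
        return math.sin(x) * (math.cos(y - 2*L) - math.cos(y)) / (2 * math.sin(L)**2)
    return math.sin(y) * (math.cos(x - 2*L) - math.cos(x)) / (2 * math.sin(L)**2)

def u_green(x, n=2000):
    """u(x) = integral of G(x, y) over [0, L], cf. (10.155); composite trapezoid."""
    h = L / n
    total = 0.5 * (G(x, 0.0) + G(x, L))
    for i in range(1, n):
        total += G(x, i * h)
    return total * h

def u_exact(x):
    """Closed form (10.156)."""
    return (math.cos(x - 2*L) - math.cos(x) - 2*math.sin(L)*math.sin(x)
            + 2*math.sin(L)**2) / (2 * math.sin(L)**2)

for x in (0.0, 0.25, 0.5, 0.9, 1.0):
    assert abs(u_green(x) - u_exact(x)) < 1e-5
```

The homogeneous BCs are visible in the check at x = 0 and x = L, where both the quadrature and the closed form vanish.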

10.6 Initial-Value Problems (IVPs)

10.6.1 General Remarks

The IVPs frequently appear in mathematical physics. The relevant conditions are dealt with as BCs in the theory of differential equations. With the boundary functionals B_1(u) and B_2(u) of (10.3) and (10.4), setting \alpha_1 = \beta_2 = 1 and the other coefficients to zero, we get

B_1(u) = u(p) = \sigma_1 \quad \text{and} \quad B_2(u) = \left.\frac{du}{dx}\right|_{x=p} = \sigma_2.   (10.159)

In the above, note that we choose [r, s] for the domain of x. The points r and s can be infinity as before. Any point p within the domain [r, s] may be designated as the special point on which the BCs (10.159) are imposed. Such conditions, set at a single point of the argument, are particularly prominent among BCs and are usually called initial conditions. In this section, we investigate fundamental characteristics of IVPs. Suppose that we have

u(p) = \left.\frac{du}{dx}\right|_{x=p} = 0   (10.160)

as homogeneous BCs. Given the differential operator L_x defined as (10.55), i.e.,

L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x),   (10.55)

let a fundamental set of solutions be u_1(x) and u_2(x) for

L_x u(x) = 0.   (10.161)

A general solution u(x) of (10.161) is given by a linear combination of u_1(x) and u_2(x) such that

u(x) = c_1 u_1(x) + c_2 u_2(x),   (10.162)

where c_1 and c_2 are arbitrary (complex) constants. Suppose that we have the homogeneous BCs expressed by (10.160). Then, we have

u(p) = c_1 u_1(p) + c_2 u_2(p) = 0, \quad u'(p) = c_1 u_1'(p) + c_2 u_2'(p) = 0.

Rewriting this in a matrix form, we have

\begin{pmatrix} u_1(p) & u_2(p) \\ u_1'(p) & u_2'(p) \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = 0.

Since the matrix represents the Wronskian of the fundamental set of solutions u_1(x) and u_2(x), its determinant never vanishes at any point p. That is, we have

\begin{vmatrix} u_1(p) & u_2(p) \\ u_1'(p) & u_2'(p) \end{vmatrix} \ne 0.   (10.163)

Then, we necessarily have c_1 = c_2 = 0. From (10.162), we have a trivial solution

u(x) \equiv 0

under the initial conditions as homogeneous BCs. Thus, as already discussed, a Green's function can always be constructed for IVPs. To seek the Green's functions for IVPs, we return to the generalized Green's identity described as

\int_r^s dx \left[ v^* (L_x u) - \left(L_x^{\dagger} v\right)^* u \right] = \left[ a v^* \frac{du}{dx} - u \frac{d(a v^*)}{dx} + b u v^* \right]_r^s.   (10.62)

For the surface term (RHS) to vanish, for homogeneous BCs we have, e.g.,

u(s) = \left.\frac{du}{dx}\right|_{x=s} = 0 \quad \text{and} \quad v(r) = \left.\frac{dv}{dx}\right|_{x=r} = 0,

for the two sets of BCs adjoint to each other. Obviously, these are not identical, simply because the former is imposed at s whereas the latter is imposed at the different point r. For this reason, the operator L_x is not Hermitian, even though it is formally self-adjoint. In such a case, we would rather use L_x directly than construct a self-adjoint operator, because we cannot make the operator Hermitian either way. Hence, unlike the preceding sections, we do not need a weight function w(x); or we may regard w(x) \equiv 1. Then, we reconsider the conditions which the Green's functions should satisfy. On the basis of the general consideration of Sect. 10.4, especially (10.86), (10.87), and (10.94), we have [2]

L_x G(x,y) = \langle x | y \rangle = \delta(x - y).

Therefore, we have

\frac{\partial^2 G(x,y)}{\partial x^2} + \frac{b(x)}{a(x)} \frac{\partial G(x,y)}{\partial x} + \frac{c(x)}{a(x)} G(x,y) = \frac{\delta(x - y)}{a(x)}.   (10.164)

Integrating, or integrating by parts, the above equation, we get

\left[ \frac{\partial G(x,y)}{\partial x} + \frac{b(x)}{a(x)} G(x,y) \right]_{x_0}^{x} - \int_{x_0}^{x} \left[ \frac{b(\xi)}{a(\xi)} \right]' G(\xi,y)\, d\xi + \int_{x_0}^{x} \frac{c(\xi)}{a(\xi)} G(\xi,y)\, d\xi = \frac{\theta(x - y)}{a(y)}.

Noting that the functions other than \partial G(x,y)/\partial x and \theta(x - y)/a(y) are continuous, as before we have

\lim_{\varepsilon \to +0} \left[ \left.\frac{\partial G(x,y)}{\partial x}\right|_{x = y + \varepsilon} - \left.\frac{\partial G(x,y)}{\partial x}\right|_{x = y - \varepsilon} \right] = \frac{1}{a(y)}.   (10.165)

10.6.2 Green's Functions for IVPs

From a practical point of view, we may set r = 0 in (10.62). Then, with (10.2) we can choose a domain [0, s] (for s > 0) or [s, 0] (for s < 0). For simplicity, we use x instead of s. We consider the two cases x > 0 and x < 0.

1. Case I (x > 0): Let u_1(x) and u_2(x) be a fundamental set of solutions. We define F_1(x, y) and F_2(x, y) as before such that

F_1(x,y) = c_1 u_1(x) + c_2 u_2(x) \quad \text{for } 0 \le x < y,
F_2(x,y) = d_1 u_1(x) + d_2 u_2(x) \quad \text{for } 0 < y < x.   (10.166)

As before, we set

G(x,y) = \begin{cases} F_1(x,y) & \text{for } 0 \le x < y, \\ F_2(x,y) & \text{for } 0 < y < x. \end{cases}

Homogeneous BCs are defined as

u(0) = 0 \quad \text{and} \quad u'(0) = 0.

Correspondingly, we have

F_1(0,y) = 0 \quad \text{and} \quad F_1'(0,y) = 0.

This is translated into

c_1 u_1(0) + c_2 u_2(0) = 0 \quad \text{and} \quad c_1 u_1'(0) + c_2 u_2'(0) = 0.

In a matrix form, we get

\begin{pmatrix} u_1(0) & u_2(0) \\ u_1'(0) & u_2'(0) \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = 0.

As mentioned above, since u_1(x) and u_2(x) are a fundamental set of solutions, we have c_1 = c_2 = 0. Hence, we get

F_1(x,y) = 0.

From the continuity and discontinuity conditions (10.165) imposed upon the Green's functions, we have

d_1 u_1(y) + d_2 u_2(y) = 0 \quad \text{and} \quad d_1 u_1'(y) + d_2 u_2'(y) = 1/a(y).   (10.167)

As before, we get

d_1 = -\frac{u_2(y)}{a(y) W(u_1(y), u_2(y))} \quad \text{and} \quad d_2 = \frac{u_1(y)}{a(y) W(u_1(y), u_2(y))},

where W(u_1(y), u_2(y)) is the Wronskian of u_1(y) and u_2(y). Thus, we get

F_2(x,y) = \frac{u_2(x) u_1(y) - u_1(x) u_2(y)}{a(y) W(u_1(y), u_2(y))}.   (10.168)

2. Case II (x < 0): Next, we think of the case as below:

F_1(x,y) = c_1 u_1(x) + c_2 u_2(x) \quad \text{for } y < x \le 0,
F_2(x,y) = d_1 u_1(x) + d_2 u_2(x) \quad \text{for } x < y < 0.   (10.169)

Proceeding similarly to the above, we have c_1 = c_2 = 0. Also, we get

F_2(x,y) = \frac{u_1(x) u_2(y) - u_2(x) u_1(y)}{a(y) W(u_1(y), u_2(y))}.   (10.170)

Here notice that the sign in (10.170) is reversed relative to (10.168). This is because for the discontinuity condition, instead of (10.167) we must have

d_1 u_1'(y) + d_2 u_2'(y) = -1/a(y).

This results from the fact that the magnitude relationship between the arguments x and y has been reversed in (10.169) relative to (10.166).

Fig. 10.2 Graph of the function \Theta(x, y). \Theta(x, y) = 1 or -1 in the hatched areas; otherwise \Theta(x, y) = 0.

Summarizing the above argument, (10.168) is obtained in the domain 0 \le y < x, and (10.170) is obtained in the domain x < y < 0. Noting this characteristic, we define a function such that

\Theta(x,y) \equiv \theta(x - y)\theta(y) - \theta(y - x)\theta(-y).   (10.171)

Notice that

\Theta(x,y) = -\Theta(-x,-y).

That is, \Theta(x, y) is antisymmetric with respect to the origin. Figure 10.2 shows the features of \Theta(x, y). If the "initial point" is taken at x = a, we can use \Theta(x - a, y - a) instead; see Fig. 10.3. This function is described as

\Theta(x - a, y - a) = \theta(x - y)\theta(y - a) - \theta(y - x)\theta(a - y).

Note that \Theta(x - a, y - a) can be obtained by shifting \Theta(x, y) in the positive direction of the x- and y-axes by a (a can be either positive or negative; in Fig. 10.3 we assume a > 0). Using the \Theta(x, y) function, the Green's function is described as

G(x,y) = \frac{u_2(x) u_1(y) - u_1(x) u_2(y)}{a(y) W(u_1(y), u_2(y))}\,\Theta(x,y).   (10.172)

Defining a function F such that

Fig. 10.3 Graph of the function \Theta(x - a, y - a). We assume a > 0.

F(x,y) \equiv \frac{u_2(x) u_1(y) - u_1(x) u_2(y)}{a(y) W(u_1(y), u_2(y))},   (10.173)

we have

G(x,y) = F(x,y)\,\Theta(x,y).   (10.174)

Notice that

\Theta(x,y) \ne \Theta(y,x) \quad \text{and} \quad G(x,y) \ne G(y,x).   (10.175)

It is therefore obvious that the differential operator is not Hermitian.
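The functions (10.171)-(10.174) translate directly into code. The sketch below (our own illustration, under the assumption a(x) \equiv 1) uses the real fundamental set u_1 = \cos x, u_2 = \sin x of u'' + u = 0, whose Wronskian is unity, and checks the antisymmetry \Theta(x, y) = -\Theta(-x, -y) at sample points.

```python
import math

def theta(x):
    """Heaviside step function; the value at x = 0 is a convention and is
    immaterial at the sample points used below."""
    return 1.0 if x >= 0 else 0.0

def Theta(x, y):
    """Theta(x, y) of (10.171)."""
    return theta(x - y)*theta(y) - theta(y - x)*theta(-y)

def F(x, y):
    """(10.173) for the operator d^2/dx^2 + 1: a(y) = 1, u1 = cos, u2 = sin,
    W(u1, u2) = cos^2 + sin^2 = 1."""
    return math.sin(x)*math.cos(y) - math.cos(x)*math.sin(y)

def G(x, y):
    """(10.174)."""
    return F(x, y) * Theta(x, y)

# antisymmetry with respect to the origin, checked away from the boundaries
for (x, y) in [(2.0, 1.0), (1.0, 2.0), (-2.0, -1.0), (-1.0, -2.0), (3.0, -1.0)]:
    assert Theta(x, y) == -Theta(-x, -y)

# F(x, y) reduces to sin(x - y), the kernel used later in Example 10.5
assert abs(F(1.3, 0.4) - math.sin(1.3 - 0.4)) < 1e-12
```

The asymmetry (10.175) also shows up numerically: G(2, 1) = sin 1 while G(1, 2) = 0.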

10.6.3 Estimation of Surface Terms

To include the surface term in the inhomogeneous case, we use (10.62):

\int_r^s dx \left[ v^* (L_x u) - \left(L_x^{\dagger} v\right)^* u \right] = \left[ a v^* \frac{du}{dx} - u \frac{d(a v^*)}{dx} + b u v^* \right]_r^s.   (10.62)

As before, we set r = 0 in (10.62). Also, we classify (10.62) into two cases according as s > 0 or s < 0.

Fig. 10.4 Domain of G(y, x). Areas in which G(y, x) does not vanish are hatched.

1. Case I (x > 0, y > 0): (10.62) reads as

\int_0^{\infty} dx \left[ v^* (L_x u) - \left(L_x^{\dagger} v\right)^* u \right] = \left[ a v^* \frac{du}{dx} - u \frac{d(a v^*)}{dx} + b u v^* \right]_0^{\infty}.   (10.176)

This time, inserting

g(x,y) = G^*(y,x) = G(y,x)   (10.177)

into v of (10.176) and arranging terms, we have

u(y) = \int_0^{\infty} dx\, G(y,x) d(x)
- \left[ a(x) G(y,x) \frac{du(x)}{dx} - u(x) \frac{da(x)}{dx} G(y,x) - u(x) a(x) \frac{\partial G(y,x)}{\partial x} + b(x) u(x) G(y,x) \right]_{x=0}^{\infty}.   (10.178)

Note that in the above we used L_x^{\dagger} g(x,y) = \delta(x - y). In Fig. 10.4, we depict the domain of G(y, x) in which G(y, x) does not vanish. Notice that we get this domain by folding back that of \Theta(x, y) (see Fig. 10.2) relative to the straight line y = x. Thus, we find that G(y, x) vanishes for x > y; so does \partial G(y,x)/\partial x; see Fig. 10.4. Namely, the second term of the RHS of (10.178) vanishes at x = \infty. In other words, g(x, y) and G(y, x) must satisfy the adjoint BCs, i.e.,

g(\infty, y) = G(y, \infty) = 0.   (10.179)

At the same time, the upper limit of the integration range of (10.178) can be set at y. Noting the above, we have

[\text{the first term of } (10.178)] = \int_0^y dx\, G(y,x) d(x).   (10.180)

Also, with the second term of (10.178), we get

[\text{the second term of } (10.178)]
= \left. \left[ a(x) G(y,x) \frac{du(x)}{dx} - u(x) \frac{da(x)}{dx} G(y,x) - u(x) a(x) \frac{\partial G(y,x)}{\partial x} + b(x) u(x) G(y,x) \right] \right|_{x=0}
= a(0) G(y,0) \frac{du(0)}{dx} - u(0) \frac{da(0)}{dx} G(y,0) - u(0) a(0) \left.\frac{\partial G(y,x)}{\partial x}\right|_{x=0} + b(0) u(0) G(y,0).   (10.181)

If we substitute the inhomogeneous BCs

u(0) = \sigma_1 \quad \text{and} \quad \left.\frac{du}{dx}\right|_{x=0} = \sigma_2   (10.182)

into (10.181) along with the other appropriate values, we should be able to get a unique solution as

u(y) = \int_0^y dx\, G(y,x) d(x) + \left[ \sigma_2 a(0) - \sigma_1 \frac{da(0)}{dx} + \sigma_1 b(0) \right] G(y,0) - \sigma_1 a(0) \left.\frac{\partial G(y,x)}{\partial x}\right|_{x=0}.   (10.183)

Exchanging the arguments x and y, we get

u(x) = \int_0^x dy\, G(x,y) d(y) + \left[ \sigma_2 a(0) - \sigma_1 \frac{da(0)}{dy} + \sigma_1 b(0) \right] G(x,0) - \sigma_1 a(0) \left.\frac{\partial G(x,y)}{\partial y}\right|_{y=0}.   (10.184)

Here, we consider that Θ(x, y) = 1 in this region and use (10.174). Meanwhile, from (10.174), we have


\frac{\partial G(x,y)}{\partial y} = \frac{\partial F(x,y)}{\partial y}\,\Theta(x,y) + F(x,y)\frac{\partial \Theta(x,y)}{\partial y}.   (10.185)

In the second term,

\frac{\partial \Theta(x,y)}{\partial y} = \frac{\partial \theta(x-y)}{\partial y}\theta(y) + \theta(x-y)\frac{\partial \theta(y)}{\partial y} - \frac{\partial \theta(y-x)}{\partial y}\theta(-y) - \theta(y-x)\frac{\partial \theta(-y)}{\partial y}
= -\delta(x-y)\theta(y) + \theta(x-y)\delta(y) - \delta(y-x)\theta(-y) + \theta(y-x)\delta(-y)
= -\delta(x-y)\left[\theta(y) + \theta(-y)\right] + \left[\theta(x-y) + \theta(y-x)\right]\delta(y)
= -\delta(x-y)\left[\theta(y) + \theta(-y)\right] + \left[\theta(x) + \theta(-x)\right]\delta(y)
= -\delta(x-y) + \delta(y),   (10.186)

where we used

\theta(x) + \theta(-x) \equiv 1   (10.187)

as well as

f(y)\delta(y) = f(0)\delta(y)   (10.188)

and

\delta(-y) = \delta(y).   (10.189)

However, the function -\delta(x-y) + \delta(y) is of secondary importance. This is because in (10.184) we may choose [\varepsilon, x - \varepsilon] (\varepsilon > 0) for the domain of y and put \varepsilon \to +0 after the integration and the other calculations related to the surface terms. Therefore, -\delta(x-y) + \delta(y) in (10.186) virtually vanishes. Thus, we can express (10.185) as

\frac{\partial G(x,y)}{\partial y} = \frac{\partial F(x,y)}{\partial y}\,\Theta(x,y) = \frac{\partial F(x,y)}{\partial y}.

Then, finally we reach

u(x) = \int_0^x dy\, F(x,y) d(y) + \left[ \sigma_2 a(0) - \sigma_1 \frac{da(0)}{dy} + \sigma_1 b(0) \right] F(x,0) - \sigma_1 a(0) \left.\frac{\partial F(x,y)}{\partial y}\right|_{y=0}.   (10.190)

2. Case II (x < 0, y < 0): Similarly to the above, (10.62) reads as

u(y) = \int_{-\infty}^0 dx\, G(y,x) d(x) - \left[ a(x) G(y,x) \frac{du(x)}{dx} - u(x) \frac{da(x)}{dx} G(y,x) - u(x) a(x) \frac{\partial G(y,x)}{\partial x} + b(x) u(x) G(y,x) \right]_{x=-\infty}^{0}.

Similarly to what was mentioned above, the lower limit of the integration range is y. Considering that both G(y, x) and \partial G(y,x)/\partial x vanish for x < y (see Fig. 10.4), we have

u(y) = \int_y^0 dx\, G(y,x) d(x) - \left. \left[ a(x) G(y,x) \frac{du(x)}{dx} - u(x) \frac{da(x)}{dx} G(y,x) - u(x) a(x) \frac{\partial G(y,x)}{\partial x} + b(x) u(x) G(y,x) \right] \right|_{x=0}
= \int_y^0 dx\, G(y,x) d(x) - \left[ \sigma_2 a(0) - \sigma_1 \frac{da(0)}{dx} + \sigma_1 b(0) \right] G(y,0) + \sigma_1 a(0) \left.\frac{\partial G(y,x)}{\partial x}\right|_{x=0}.   (10.191)

Comparing (10.191) with (10.183), we recognize that the sign of the RHS of (10.191) is reversed relative to the RHS of (10.183). This remains the case after exchanging the arguments x and y. Note, however, that \Theta(x, y) = -1 in the present case. As a result, the two minus signs cancel and (10.191) takes exactly the same expression as (10.183). Proceeding with the calculations similarly, for both Cases I and II we arrive at the unified solution represented by (10.190) throughout the domain (-\infty, +\infty).

10.6.4 Examples

To deepen our understanding of Green's functions, we deal with tangible examples of IVPs below.

Example 10.5 Let us consider the following inhomogeneous differential equation:

\frac{d^2 u}{dx^2} + u = 1.   (10.192)

Note that (10.192) is formally the same differential equation as (10.141). We may encounter (10.192) when we are observing the motion of a charged harmonic oscillator placed in a static electric field. We assume that the domain of the argument x is the whole range of real numbers. We set boundary conditions such that

u(0) = \sigma_1 \quad \text{and} \quad u'(0) = \sigma_2.   (10.193)

As in the case of Example 10.4, a fundamental set of solutions is given by

e^{ix} \quad \text{and} \quad e^{-ix}.   (10.194)

Therefore, following (10.173), we get

F(x,y) = \sin(x - y).   (10.195)

Also, following (10.190) we have

u(x) = \int_0^x dy\, \sin(x - y) + \sigma_2 \sin x - \sigma_1 \left[-\cos(x - y)\right]_{y=0}
     = 1 - \cos x + \sigma_2 \sin x + \sigma_1 \cos x.   (10.196)

In particular, if we choose \sigma_1 = 1 and \sigma_2 = 0, we have

u(x) \equiv 1.   (10.197)

The uniqueness of the solution also ensures that u(x) \equiv 1 is the sole solution under the inhomogeneous BCs described as \sigma_1 = 1 and \sigma_2 = 0.

Example 10.6 Damped Oscillator If a harmonic oscillator undergoes friction, the oscillator exhibits damped oscillation. Such an oscillator is said to be a damped oscillator. The damped oscillator is often dealt with when we think of bound electrons in a dielectric medium that undergo the effect of a dynamic external field varying with time. This is the case when the electron is placed in an alternating electric field or an electromagnetic wave. The equation of motion of the damped oscillator is described as

m\frac{d^2 u}{dt^2} + r\frac{du}{dt} + ku = d(t),   (10.198)

where m is the mass of an electron, t denotes time, r is a damping constant, k is a spring constant of the damped oscillator, and d(t) represents an external force field (generally time-dependent). To seek a fundamental set of solutions of the homogeneous equation described as

m\frac{d^2 u}{dt^2} + r\frac{du}{dt} + ku = 0,   (10.199)

putting

u(t) = e^{i\rho t}

and inserting it into (10.199), after division by m we have

\left(-\rho^2 + \frac{r}{m}i\rho + \frac{k}{m}\right)e^{i\rho t} = 0.

Since e^{i\rho t} does not vanish, we have

-\rho^2 + \frac{r}{m}i\rho + \frac{k}{m} = 0.   (10.200)

We call this equation the characteristic quadratic equation. We have three cases for the solution of the quadratic equation (10.200). Solving (10.200) with respect to \rho, we get

\rho = \frac{ir}{2m} \mp \sqrt{-\frac{r^2}{4m^2} + \frac{k}{m}}.   (10.201)

Regarding (10.201), we have the following three cases for the roots \rho: (1) \rho has two different pure imaginary roots, i.e., -\frac{r^2}{4m^2} + \frac{k}{m} < 0 (an overdamping). (2) \rho has a double root \frac{ir}{2m}, i.e., -\frac{r^2}{4m^2} + \frac{k}{m} = 0 (a critical damping). (3) \rho has two complex roots, i.e., -\frac{r^2}{4m^2} + \frac{k}{m} > 0 (a weak damping). Of these, Case (3) is characterized by an oscillating solution and has many applications in mathematical physics. For Cases (1) and (2), however, we do not have an oscillating solution.

Case (1): The characteristic roots are given by

\rho = \frac{ir}{2m} \mp i\sqrt{\frac{r^2}{4m^2} - \frac{k}{m}}.

Therefore, we have a fundamental set of solutions described by

u(t) = \exp\left(-\frac{rt}{2m}\right) \exp\left(\mp\sqrt{\frac{r^2}{4m^2} - \frac{k}{m}}\, t\right).

Then, a general solution is given by

u(t) = \exp\left(-\frac{rt}{2m}\right)\left[a \exp\left(\sqrt{\frac{r^2}{4m^2} - \frac{k}{m}}\, t\right) + b \exp\left(-\sqrt{\frac{r^2}{4m^2} - \frac{k}{m}}\, t\right)\right].

Case (2): The characteristic roots are given by

\rho = \frac{ir}{2m}.

Therefore, one of the solutions is

u_1(t) = \exp\left(-\frac{rt}{2m}\right).

Another solution u_2(t) is given by

u_2(t) = c\frac{\partial u_1(t)}{\partial(i\rho)} = c' t \exp\left(-\frac{rt}{2m}\right),

where c and c' are appropriate constants. Thus, a general solution is given by

u(t) = a \exp\left(-\frac{rt}{2m}\right) + bt \exp\left(-\frac{rt}{2m}\right).

The most important and interesting feature emerges as the "damped oscillator" of the next Case (3) in many fields of natural science. We are particularly interested in this case.

Case (3): Suppose that the damping is relatively weak so that the characteristic equation has two complex roots. Let us examine further details of this case following the prescriptions for IVPs. We divide (10.198) by m for the sake of easy handling of the differential equation such that

\frac{d^2 u}{dt^2} + \frac{r}{m}\frac{du}{dt} + \frac{k}{m}u = \frac{1}{m}d(t).

Putting

\omega \equiv \sqrt{-\frac{r^2}{4m^2} + \frac{k}{m}},   (10.202)

we get a fundamental set of solutions described as

u(t) = \exp\left(-\frac{rt}{2m}\right) \exp(\mp i\omega t).   (10.203)

Given the BCs, following (10.172) we get as a Green's function

G(t,\tau) = \frac{u_2(t) u_1(\tau) - u_1(t) u_2(\tau)}{W(u_1(\tau), u_2(\tau))}\,\Theta(t,\tau) = \frac{1}{\omega} e^{-\frac{r}{2m}(t - \tau)} \sin \omega(t - \tau)\,\Theta(t,\tau),   (10.204)

where u_1(t) = \exp\left(-\frac{rt}{2m}\right)\exp(i\omega t) and u_2(t) = \exp\left(-\frac{rt}{2m}\right)\exp(-i\omega t). We examine whether G(t, \tau) is eligible as the Green's function as follows:

\frac{dG}{dt} = \frac{1}{\omega}\left[-\frac{r}{2m} e^{-\frac{r}{2m}(t-\tau)} \sin \omega(t - \tau) + \omega e^{-\frac{r}{2m}(t-\tau)} \cos \omega(t - \tau)\right]\Theta(t,\tau)
+ \frac{1}{\omega} e^{-\frac{r}{2m}(t-\tau)} \sin \omega(t - \tau)\left[\delta(t - \tau)\theta(\tau) + \delta(\tau - t)\theta(-\tau)\right].   (10.205)

The second term of (10.205) vanishes because \sin \omega(t - \tau)\,\delta(t - \tau) = 0 \cdot \delta(t - \tau) = 0. Thus,

\frac{d^2 G}{dt^2} = \frac{1}{\omega}\left[\left(\frac{r}{2m}\right)^2 e^{-\frac{r}{2m}(t-\tau)} \sin \omega(t - \tau) - \frac{r}{m}\omega e^{-\frac{r}{2m}(t-\tau)} \cos \omega(t - \tau) - \omega^2 e^{-\frac{r}{2m}(t-\tau)} \sin \omega(t - \tau)\right]\Theta(t,\tau)
+ \frac{1}{\omega}\left[-\frac{r}{2m} e^{-\frac{r}{2m}(t-\tau)} \sin \omega(t - \tau) + \omega e^{-\frac{r}{2m}(t-\tau)} \cos \omega(t - \tau)\right] \times \left[\delta(t - \tau)\theta(\tau) + \delta(\tau - t)\theta(-\tau)\right].   (10.206)

In the last term, using the properties of the \delta and \theta functions, we get \delta(t - \tau). That is, the following part of (10.206) is calculated such that

e^{-\frac{r}{2m}(t-\tau)} \cos \omega(t - \tau)\left[\delta(t - \tau)\theta(\tau) + \delta(\tau - t)\theta(-\tau)\right]
= e^{-\frac{r}{2m}\cdot 0} \cos(\omega \cdot 0)\,\delta(t - \tau)\left[\theta(\tau) + \theta(-\tau)\right]
= \delta(t - \tau)\left[\theta(\tau) + \theta(-\tau)\right] = \delta(t - \tau).

Thus, rearranging (10.206), we get

\frac{d^2 G}{dt^2} = \delta(t - \tau) + \frac{1}{\omega}\left\{-\left[\left(\frac{r}{2m}\right)^2 + \omega^2\right] e^{-\frac{r}{2m}(t-\tau)} \sin \omega(t - \tau)
- \frac{r}{m}\left[-\frac{r}{2m} e^{-\frac{r}{2m}(t-\tau)} \sin \omega(t - \tau) + \omega e^{-\frac{r}{2m}(t-\tau)} \cos \omega(t - \tau)\right]\right\}\Theta(t,\tau)
= \delta(t - \tau) - \frac{k}{m}G - \frac{r}{m}\frac{dG}{dt},   (10.207)

where we used (10.202) and (10.204) for the last equality. Rearranging (10.207) once again, we have

\frac{d^2 G}{dt^2} + \frac{r}{m}\frac{dG}{dt} + \frac{k}{m}G = \delta(t - \tau).   (10.208)

Defining the following operator

L_t \equiv \frac{d^2}{dt^2} + \frac{r}{m}\frac{d}{dt} + \frac{k}{m},   (10.209)

we get

L_t G = \delta(t - \tau).   (10.210)

Note that this expression is consistent with (10.164). Thus, we find that (10.210) satisfies the condition (10.123) of the Green's function, where the weight function is identified with unity.
Now, suppose that a sinusoidally changing external field e^{i\Omega t} influences the motion of the damped oscillator. Here we assume that the amplitude of the external field is unity. Then, we have

\frac{d^2 u}{dt^2} + \frac{r}{m}\frac{du}{dt} + \frac{k}{m}u = \frac{1}{m}e^{i\Omega t}.   (10.211)

Thus, as a solution under the homogeneous boundary conditions [i.e., u(0) = \dot{u}(0) = 0] we get

u(t) = \frac{1}{m\omega} \int_0^t e^{-\frac{r}{2m}(t - \tau)} e^{i\Omega \tau} \sin \omega(t - \tau)\, d\tau,   (10.212)

where t is an arbitrary positive or negative time. Equation (10.212) shows that the real part corresponds to the external field \cos \Omega t and the imaginary part to the external field \sin \Omega t. To calculate (10.212) we use

\sin \omega(t - \tau) = \frac{1}{2i}\left[e^{i\omega(t - \tau)} - e^{-i\omega(t - \tau)}\right].   (10.213)

Then, the equation can readily be solved by integration of exponential functions, even though we have to do somewhat lengthy (but straightforward) calculations. Thus, for the real part (i.e., the external field is \cos \Omega t), we get a solution

Cu(t) = \frac{1}{m}\frac{r}{m}\Omega \sin \Omega t + \frac{1}{m}\left(\frac{r}{2m}\right)^2 \cos \Omega t - \frac{1}{m}\left(\Omega^2 - \omega^2\right)\cos \Omega t
+ \frac{1}{m}e^{-\frac{r}{2m}t}\left[\left(\Omega^2 - \omega^2\right)\cos \omega t - \left(\frac{r}{2m}\right)^2 \cos \omega t - \frac{r}{2m}\left(\Omega^2 + \omega^2\right)\frac{1}{\omega}\sin \omega t - \left(\frac{r}{2m}\right)^3 \frac{1}{\omega}\sin \omega t\right],   (10.214)

where C is a constant [i.e., a constant denominator of u(t)] expressed as

C = \left(\Omega^2 - \omega^2\right)^2 + \frac{r^2}{2m^2}\left(\Omega^2 + \omega^2\right) + \left(\frac{r}{2m}\right)^4.   (10.215)

For the imaginary part (i.e., the external field is \sin \Omega t) we get

Cu(t) = -\frac{1}{m}\frac{r}{m}\Omega \cos \Omega t + \frac{1}{m}\left(\frac{r}{2m}\right)^2 \sin \Omega t - \frac{1}{m}\left(\Omega^2 - \omega^2\right)\sin \Omega t
+ \frac{1}{m}e^{-\frac{r}{2m}t}\left[\frac{\Omega}{\omega}\left(\Omega^2 - \omega^2\right)\sin \omega t + \frac{\Omega}{\omega}\left(\frac{r}{2m}\right)^2 \sin \omega t + \frac{r}{m}\Omega \cos \omega t\right].   (10.216)
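The equivalence between the Green's-function integral (10.212) and the original equation of motion can be checked numerically. In the sketch below (ours; the parameter values m, r, k, \Omega are arbitrary assumed samples in the weak-damping regime), the real part of (10.212) is evaluated by quadrature and compared with a direct Runge-Kutta integration of (10.211) driven by \cos \Omega t.

```python
import math

# assumed sample parameters (not from the text): weak damping, k/m > (r/2m)^2
m, r, k = 1.0, 0.05, 2.0
Omega = 1.3                            # drive frequency of cos(Omega*t)
omega = math.sqrt(k/m - (r/(2*m))**2)  # (10.202)

def green_u(t, n=4000):
    """Real part of (10.212) by the trapezoidal rule:
    (1/(m*omega)) * int_0^t exp(-r(t-tau)/2m) cos(Omega*tau) sin(omega*(t-tau)) dtau."""
    h = t / n
    s = 0.0
    for i in range(n + 1):
        tau = i * h
        f = math.exp(-r*(t - tau)/(2*m)) * math.cos(Omega*tau) * math.sin(omega*(t - tau))
        s += (0.5 if i in (0, n) else 1.0) * f
    return s * h / (m * omega)

def rk4_u(t_end, n=4000):
    """Direct RK4 integration of u'' + (r/m)u' + (k/m)u = cos(Omega*t)/m
    with the homogeneous initial conditions u(0) = u'(0) = 0."""
    def acc(t, u, v):
        return (math.cos(Omega*t) - r*v - k*u) / m
    h = t_end / n
    t = u = v = 0.0
    for _ in range(n):
        k1u = v;             k1v = acc(t, u, v)
        k2u = v + h/2*k1v;   k2v = acc(t + h/2, u + h/2*k1u, v + h/2*k1v)
        k3u = v + h/2*k2v;   k3v = acc(t + h/2, u + h/2*k2u, v + h/2*k2v)
        k4u = v + h*k3v;     k4v = acc(t + h, u + h*k3u, v + h*k3v)
        u += h/6*(k1u + 2*k2u + 2*k3u + k4u)
        v += h/6*(k1v + 2*k2v + 2*k3v + k4v)
        t += h
    return u

for T in (2.0, 5.0, 10.0):
    assert abs(green_u(T) - rk4_u(T)) < 5e-4
```

The agreement confirms that the kernel of (10.204) indeed inverts the operator L_t of (10.209) under the initial conditions u(0) = \dot{u}(0) = 0.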

In Fig. 10.5 we show an example that depicts the positions of a damped oscillator as a function of time. In Fig. 10.5a, the amplitude of the envelope gradually diminishes with time. An enlarged diagram near the origin (Fig. 10.5b) clearly reflects the initial conditions u(0) = \dot{u}(0) = 0. In Fig. 10.5, we put m = 1 [kg], \Omega = 1 [1/s], \omega = 0.94 [1/s], and r = 0.006 [kg/s]. In the above calculations, if r/m is small enough (i.e., the damping is small enough), the third-order and fourth-order terms in r/m may be ignored and the approximation is precise enough.
In the case of inhomogeneous BCs, given \sigma_1 = u(0) and \sigma_2 = \dot{u}(0), we can determine the additional surface term S(t) such that

S(t) = \left(\sigma_2 + \sigma_1 \frac{r}{2m}\right)\frac{1}{\omega} e^{-\frac{r}{2m}t} \sin \omega t + \sigma_1 e^{-\frac{r}{2m}t} \cos \omega t.   (10.217)

The term S(t) corresponds to the second and third terms of (10.190). Thus, from (10.212) and (10.217),

Fig. 10.5 Example of a damped oscillation as a function of t. The data are taken from (10.214). (a) Overall profile. (b) Profile enlarged near the origin.

u(t) + S(t)

gives a unique solution of the SOLDE with inhomogeneous BCs. Notice that S(t) does not depend on the external field. In the above calculations, we used

F(t,\tau) = \frac{1}{\omega} e^{-\frac{r}{2m}(t - \tau)} \sin \omega(t - \tau)

together with a(t) \equiv 1, b(t) \equiv r/m, and d(t) \equiv e^{i\Omega t}/m with respect to (10.190). The confirmation of (10.217) is left to the reader.

10.7 Eigenvalue Problems

We often encounter eigenvalue problems in mathematical physics. Of these, those related to Hermitian differential operators have particularly interesting and important features. The eigenvalue problems we considered in Part I are typical illustrations. Here we investigate general properties of eigenvalue problems. Returning to the case of homogeneous BCs, we consider the following homogeneous SOLDE:

a(x)\frac{d^2 u}{dx^2} + b(x)\frac{du}{dx} + c(x)u = 0.   (10.5)

Defining the differential operator L_x such that

L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x),   (10.55)

we have the homogeneous equation

L_x u(x) = 0.   (10.218)

Putting a constant -\lambda in place of c(x), we have

a(x)\frac{d^2 u}{dx^2} + b(x)\frac{du}{dx} - \lambda u = 0.   (10.219)

If we define a differential operator L_x such that

L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} - \lambda,   (10.220)

we have a homogeneous equation

L_x u = 0   (10.221)

to express (10.219). Instead, if we define a differential operator L_x such that

L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx},

we have the same homogeneous equation

L_x u = \lambda u   (10.222)

to express (10.219). Equations (10.221) and (10.222) are essentially the same; only the form of expression differs. The expression (10.222) is familiar to us as an eigenvalue equation. The difference between (10.5) and (10.219) is that whereas c(x) in (10.5) is a given fixed function, \lambda in (10.219) is a constant, but may vary according to the solution u(x). One of the most essential properties of the eigenvalue problem posed in the form of (10.222) is that its solution is not uniquely determined, as already studied in various cases of Part I. Remember that the methods based upon the Green's function are valid for a problem in which the homogeneous differential equation has only a trivial solution (i.e., identically zero) under homogeneous BCs. In contrast to this situation, even though the eigenvalue problem is basically posed as a homogeneous equation under homogeneous BCs, non-trivial solutions are expected to be obtained. In this respect, we have seen that in Part I we rejected a trivial solution (i.e., identically zero) because of its lack of physical meaning.
As exemplified in Part I, the eigenvalue problems that appear in mathematical physics are closely connected to the Hermiticity of (differential) operators. This is because in many cases an eigenvalue is required to be real. We have already examined how we can convert a differential operator to the self-adjoint form. That is, if we define p(x) as in (10.26), we have the self-adjoint operator as described in (10.70). As a symbolic description, we have

w(x)L_x u = \frac{d}{dx}\left[p(x)\frac{du}{dx}\right] + c(x)w(x)u.   (10.223)

In the same way, multiplying both sides of (10.222) by w(x), we get

w(x)L_x u = \lambda w(x)u.   (10.224)

For instance, the Hermite differential equation, which already appeared as (2.118) in Sect. 2.3, is described as

\frac{d^2 u}{dx^2} - 2x\frac{du}{dx} + 2nu = 0.   (10.225)

If we express (10.225) as in (10.224), multiplying both sides of (10.225) by e^{-x^2} we have

\frac{d}{dx}\left[e^{-x^2}\frac{du}{dx}\right] + 2n e^{-x^2} u = 0.   (10.226)

Notice that the differential operator has been converted to a self-adjoint form according to (10.31), which defines the real and positive weight function e^{-x^2} in the present case. The domain of the Hermite differential equation is (-\infty, +\infty), at the endpoints (i.e., \pm\infty) of which the surface term of the RHS of (10.69) approaches zero sufficiently rapidly by virtue of e^{-x^2}. In Sects. 3.4 and 3.5, in turn, we dealt with the (associated) Legendre differential equation (3.127), for which the relevant differential operator is self-adjoint. The surface term corresponding to (10.69) vanishes because (1 - \xi^2) vanishes at the endpoints \xi = \cos \theta = \pm 1 (i.e., \theta = 0 or \pi) from (3.107). Thus, the Hermiticity is automatically ensured for the (associated) Legendre differential equation as well as for the Hermite differential equation. In those cases, even though the differential equations do not satisfy any particular BCs, the Hermiticity is still ensured.
In the theory of differential equations, the aforementioned properties of the Hermitian operators have been fully investigated as the so-called Sturm-Liouville system (or problem) in the form of a homogeneous differential equation. The related differential equations are connected to classical orthogonal polynomials bearing personal names such as Hermite, Laguerre, Jacobi, Gegenbauer, Legendre, Tchebichef, etc. These equations frequently appear in quantum mechanics and electromagnetism as typical examples of the Sturm-Liouville system. They can be converted to self-adjoint differential equations by multiplying the original form by a weight function. The resulting equations can be expressed as

\frac{d}{dx}\left[a(x)w(x)\frac{dY_n(x)}{dx}\right] + \lambda_n w(x)Y_n(x) = 0,   (10.227)

where Y_n(x) is a collective representation of the classical orthogonal polynomials. Equation (10.226) is an example. Conventionally, the following form is adopted instead of (10.227):

\frac{1}{w(x)}\frac{d}{dx}\left[a(x)w(x)\frac{dY_n(x)}{dx}\right] + \lambda_n Y_n(x) = 0,   (10.228)

where we put (aw)' = bw. That is, the differential equation is originally described as

a(x)\frac{d^2 Y_n(x)}{dx^2} + b(x)\frac{dY_n(x)}{dx} + \lambda_n Y_n(x) = 0.   (10.229)

In the case of Hermite polynomials, for instance, a(x) = 1 and w(x) = e^{-x^2}. Since we have (aw)' = -2x e^{-x^2} = bw, we can put b = -2x. Examples including this case are tabulated in Table 10.2. The eigenvalues \lambda_n are associated with real numbers

Table 10.2 Classical polynomials and their related SOLDEs

Name of the polynomial | SOLDE form | Weight function: w(x) | Domain
Hermite: Hn(x) | d²Hn(x)/dx² − 2x dHn(x)/dx + 2n Hn(x) = 0 | e^(−x²) | (−∞, +∞)
Laguerre: Ln^ν(x) | x d²Ln^ν(x)/dx² + (ν + 1 − x) dLn^ν(x)/dx + n Ln^ν(x) = 0 | x^ν e^(−x) (ν > −1) | [0, +∞)
Gegenbauer: Cn^λ(x) | (1 − x²) d²Cn^λ(x)/dx² − (2λ + 1)x dCn^λ(x)/dx + n(n + 2λ) Cn^λ(x) = 0 | (1 − x²)^(λ−1/2) (λ > −1/2) | [−1, +1]
Legendre: Pn(x) | (1 − x²) d²Pn(x)/dx² − 2x dPn(x)/dx + n(n + 1) Pn(x) = 0 | 1 | [−1, +1]
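The SOLDEs collected in Table 10.2 can be verified directly with exact polynomial arithmetic. The sketch below is an illustrative cross-check, not part of the text; the small polynomial helpers (`deriv`, `add`, `scale`, `shift`) are our own names. It builds the Hermite and Legendre polynomials from their standard three-term recurrences and checks that each satisfies its differential equation identically.

```python
from fractions import Fraction as F

def deriv(p):
    # d/dx of a polynomial stored as a coefficient list [c0, c1, c2, ...]
    return [i * c for i, c in enumerate(p)][1:] or [0]

def add(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def scale(p, s):
    return [s * c for c in p]

def shift(p, k=1):
    # multiply the polynomial by x^k
    return [0] * k + p

def is_zero(p):
    return all(c == 0 for c in p)

def hermite(n):
    # H_{k+1}(x) = 2x H_k(x) - 2k H_{k-1}(x)
    H = [[1], [0, 2]]
    for k in range(1, n):
        H.append(add(scale(shift(H[k]), 2), scale(H[k - 1], -2 * k)))
    return H[n]

def legendre(n):
    # (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x), exact over Q
    P = [[F(1)], [F(0), F(1)]]
    for k in range(1, n):
        P.append(scale(add(scale(shift(P[k]), 2 * k + 1), scale(P[k - 1], -k)),
                       F(1, k + 1)))
    return P[n]

for n in range(2, 7):
    H = hermite(n)
    # Hermite SOLDE: H_n'' - 2x H_n' + 2n H_n = 0
    assert is_zero(add(add(deriv(deriv(H)), scale(shift(deriv(H)), -2)),
                       scale(H, 2 * n)))
    P = legendre(n)
    ddP = deriv(deriv(P))
    # Legendre SOLDE: (1 - x^2) P_n'' - 2x P_n' + n(n+1) P_n = 0
    assert is_zero(add(add(add(ddP, scale(shift(ddP, 2), -1)),
                           scale(shift(deriv(P)), -2)),
                       scale(P, n * (n + 1))))
```

The same bookkeeping extends straightforwardly to the Laguerre and Gegenbauer rows of the table.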


that characterize the individual physical systems. The related fields have wide applications in many branches of natural science. After having converted the operator to the self-adjoint form, i.e., Lx = Lx†, instead of (10.100) we have

    ∫_r^s dx w(x){v*(Lx u) − [Lx v]* u} = 0.   (10.230)

Rewriting it, we get

    ∫_r^s dx v*[w(x)Lx u] = ∫_r^s dx [w(x)Lx v]* u.   (10.231)

If we use the inner product notation described by (10.88), we get

    ⟨v|Lx u⟩ = ⟨Lx v|u⟩.   (10.232)

Here let us think of two eigenfunctions ψi and ψj that belong to the eigenvalues λi and λj, respectively. That is,

    w(x)Lx ψi = λi w(x)ψi   and   w(x)Lx ψj = λj w(x)ψj.   (10.233)

Inserting ψi and ψj into u and v, respectively, in (10.232), we have

    ⟨ψj|Lx ψi⟩ = ⟨ψj|λi ψi⟩ = λi⟨ψj|ψi⟩   and   ⟨Lx ψj|ψi⟩ = ⟨λj ψj|ψi⟩ = λj*⟨ψj|ψi⟩.   (10.234)

Here we have used the calculation rules of the inner product (see Parts I and III). Since the two left-hand sides of (10.234) are equal by (10.232), we get

    (λi − λj*)⟨ψj|ψi⟩ = 0.   (10.235)

Putting i = j in (10.235), we get

    (λi − λi*)⟨ψi|ψi⟩ = 0.   (10.236)

An inner product ⟨ψi|ψi⟩ vanishes if and only if |ψi⟩ ≡ 0; see the inner product calculation rules of Sect. 13.1. However, |ψi⟩ ≡ 0 is not acceptable as a physical state. Therefore, we must have ⟨ψi|ψi⟩ ≠ 0. Thus, we get

    λi − λi* = 0   or   λi = λi*.   (10.237)


The relation (10.237) obviously indicates that λi is real; i.e., we find that the eigenvalues of an Hermitian operator are real. If λi ≠ λj = λj*, from (10.235) we get

    ⟨ψj|ψi⟩ = 0.   (10.238)

That is, |ψj⟩ and |ψi⟩ are orthogonal to each other. We often encounter such orthogonality relationships between vectors and between functions. We saw several cases in Part I and will see other cases in Part III.
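As an illustration of (10.238), the Legendre polynomials (weight w(x) = 1 on [−1, +1], eigenvalues n(n + 1)) can be checked for orthogonality by direct numerical integration. The sketch below is ours, not the text's; it uses only the standard three-term recurrence and a composite Simpson rule.

```python
from math import isclose

def legendre_val(n, x):
    # evaluate P_n(x) by the recurrence (k+1)P_{k+1} = (2k+1)x P_k - k P_{k-1}
    p0, p1 = 1.0, x
    if n == 0:
        return p0
    for k in range(1, n):
        p0, p1 = p1, ((2 * k + 1) * x * p1 - k * p0) / (k + 1)
    return p1

def simpson(f, a, b, m=2000):
    # composite Simpson rule with m (even) subintervals
    h = (b - a) / m
    s = f(a) + f(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def overlap(m, n):
    # inner product of P_m and P_n with weight w(x) = 1 on [-1, +1]
    return simpson(lambda x: legendre_val(m, x) * legendre_val(n, x), -1.0, 1.0)

# distinct eigenvalues n(n+1) imply orthogonality, cf. (10.238)
assert abs(overlap(2, 3)) < 1e-10
assert abs(overlap(1, 4)) < 1e-10
# the normalization integral of P_n over [-1, +1] is 2/(2n+1)
assert isclose(overlap(3, 3), 2.0 / 7.0, rel_tol=1e-8)
```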

References

1. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
2. Stakgold I (1998) Green's functions and boundary value problems, 2nd edn. Wiley, New York

Part III

Linear Vector Spaces

In this part, we treat vectors and their transformations in linear vector spaces so that we can address various aspects of mathematical physics systematically but intuitively. We outline the general principles of linear vector spaces mostly from an algebraic point of view. Starting with the abstract definition and description of vectors, we deal with their transformation in a vector space using a matrix. An inner product is a central concept in the theory of linear vector spaces: it associates two vectors with each other to yield a scalar. Unlike many books on linear algebra and linear vector spaces, however, we describe the canonical forms of matrices before considering the inner product. This is because these topics can be treated in light of the abstract theory of matrices and vector spaces without the concept of an inner product. Of the canonical forms of matrices, the Jordan canonical form is of paramount importance. We study how it is constructed by providing a tangible example. In relation to the inner product space, normal operators such as Hermitian operators and unitary operators frequently appear in quantum mechanics and electromagnetism. From a general aspect, we revisit the theory of Hermitian operators, which appeared often in both Parts I and II. The last part deals with exponential functions of matrices. The relevant topics can be counted among the important branches of applied mathematics. Also, the exponential functions of matrices play an essential role in the theory of continuous groups that will be dealt with in Parts IV and V.

Chapter 11

Vectors and Their Transformation

In this chapter, we deal with the theory of finite-dimensional linear vector spaces. Such vector spaces are spanned by a finite number of linearly independent vectors, namely, basis vectors. In conjunction with developing the abstract concepts and theory, we mention the notion of mapping among mathematical elements. A linear transformation of a vector is a special kind of mapping. In particular, we focus on endomorphisms within an n-dimensional vector space Vn. Here, an endomorphism is defined as a linear transformation Vn → Vn. An endomorphism is represented by an (n, n) square matrix. This is most often the case with physical and chemical applications when we deal with matrix algebra, and in this book we focus on this type of transformation. A non-singular matrix plays an important role in the endomorphism; in this connection, we consider its inverse matrix and determinant. All these fundamental concepts supply us with a sufficient basis for a better understanding of the theory of linear vector spaces. Through these processes, we should be able to get acquainted with the connection between algebraic and analytical approaches and gain a broad perspective on various aspects of mathematical physics and related fields.

11.1

Vectors

From both fundamental and practical points of view, it is desirable to define linear vector spaces in an abstract way. Suppose that V is a set of elements denoted by a, b, c, etc. called vectors. The set V is a linear vector space (or simply a vector space) if a sum a + b ∈ V is defined for any pair of vectors a and b and the elements of V satisfy the following mathematical relations:

    (a + b) + c = a + (b + c),   (11.1)

© Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_11


    a + b = b + a,   (11.2)

    a + 0 = a,   (11.3)

    a + (−a) = 0.   (11.4)

In the above, 0 is called the zero vector. Furthermore, for a ∈ V, ca ∈ V is defined (c is a complex number called a scalar), and we assume the following relations among vectors (a and b) and scalars (c and d):

    (cd)a = c(da),   (11.5)

    1a = a,   (11.6)

    c(a + b) = ca + cb,   (11.7)

    (c + d)a = ca + da.   (11.8)
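A concrete model helps make the abstract axioms tangible: tuples of complex numbers with componentwise operations satisfy (11.1)-(11.8). The toy class below is an illustrative sketch of ours, not part of the text.

```python
class Vec:
    # a toy model of an abstract vector: a tuple of complex components
    def __init__(self, *comps):
        self.comps = tuple(complex(c) for c in comps)
    def __add__(self, other):
        return Vec(*(a + b for a, b in zip(self.comps, other.comps)))
    def __rmul__(self, c):            # scalar multiplication c * a
        return Vec(*(c * a for a in self.comps))
    def __neg__(self):
        return (-1) * self
    def __eq__(self, other):
        return self.comps == other.comps

a, b, c = Vec(1, 2), Vec(3, -1), Vec(0, 5)
zero = Vec(0, 0)
s, t = 2 + 1j, -3.0

assert (a + b) + c == a + (b + c)          # (11.1)
assert a + b == b + a                      # (11.2)
assert a + zero == a                       # (11.3)
assert a + (-a) == zero                    # (11.4)
assert (s * t) * a == s * (t * a)          # (11.5)
assert 1 * a == a                          # (11.6)
assert s * (a + b) == s * a + s * b        # (11.7)
assert (s + t) * a == s * a + t * a        # (11.8)
```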

On the basis of the above relations, we can construct the following expression called a linear combination:

    c1a1 + c2a2 + ⋯ + cnan.

If this linear combination is equated to zero, we obtain

    c1a1 + c2a2 + ⋯ + cnan = 0.   (11.9)

If (11.9) holds only in the case where every ci = 0 (1 ≤ i ≤ n), the vectors a1, a2, ⋯, an are said to be linearly independent. In this case the relation represented by (11.9) is said to be trivial. If the relation is non-trivial (i.e., ∃ci ≠ 0), those vectors are said to be linearly dependent. If in the vector space V the maximum number of linearly independent vectors is n, V is said to be an n-dimensional vector space and is sometimes denoted by Vn. In this case any vector x of Vn is expressed uniquely as a linear combination of linearly independent vectors such that

    x = x1a1 + x2a2 + ⋯ + xnan.   (11.10)

Suppose x is also denoted by


    x = x1′a1 + x2′a2 + ⋯ + xn′an.   (11.11)

Subtracting both sides of (11.11) from (11.10), we obtain

    0 = (x1 − x1′)a1 + (x2 − x2′)a2 + ⋯ + (xn − xn′)an.   (11.12)

Linear independence of the vectors a1, a2, ⋯, an implies xi − xi′ = 0, i.e., xi = xi′ (1 ≤ i ≤ n). These n linearly independent vectors are referred to as basis vectors. A vector space that has a finite number of basis vectors is called finite-dimensional; otherwise, it is infinite-dimensional. Alternatively, we express (11.10) as

    x = (a1 ⋯ an) [x1; ⋮; xn].   (11.13)

The set of coordinates [x1; ⋮; xn] is called a column vector (or a numerical vector) that indicates an "address" of the vector x with respect to the basis vectors a1, a2, ⋯, an. These basis vectors are arranged as a row vector in (11.13). Any vector in Vn can be expressed as a linear combination of the basis vectors and, hence, we say that Vn is spanned by a1, a2, ⋯, an. This is represented as

    Vn = Span{a1, a2, ⋯, an}.   (11.14)

Let us think of a subset W of Vn (i.e., W ⊂ Vn). If the following relations hold for W, W is said to be a (linear) subspace of Vn:

    a, b ∈ W ⟹ a + b ∈ W,   a ∈ W ⟹ ca ∈ W.   (11.15)

These two relations ensure that the relations (11.1)-(11.8) hold for W as well. The dimension of W is equal to or smaller than n. For instance, W = Span{a1, a2, ⋯, ar} (r ≤ n) is a subspace of Vn. If r = n, W = Vn. Suppose that there are two subspaces W1 = Span{a1} and W2 = Span{a2}. Note that in this case W1 ∪ W2 is not a subspace, because W1 ∪ W2 does not contain a1 + a2. However, the set U defined by

    U = {x = x1 + x2; ∀x1 ∈ W1, ∀x2 ∈ W2}   (11.16)

is a subspace of Vn. We denote this subspace by W1 + W2. To show that this is in fact a subspace, suppose that x, y ∈ W1 + W2. Then we may express x = x1 + x2 and y = y1 + y2, where x1, y1 ∈ W1 and x2, y2 ∈ W2. We have x + y = (x1 + y1) + (x2 + y2), where x1 + y1 ∈ W1 and x2 + y2 ∈ W2 because both W1 and W2 are subspaces. Therefore, x + y ∈ W1 + W2. Meanwhile, with any scalar


c, cx = cx1 + cx2 ∈ W1 + W2. By definition (11.15), W1 + W2 is accordingly a subspace. Suppose here x1 ∈ W1. Then x1 = x1 + 0 ∈ W1 + W2, and hence W1 ⊂ W1 + W2. Similarly, we have W2 ⊂ W1 + W2. Thus, W1 + W2 contains both W1 and W2. Conversely, let W be an arbitrary subspace that contains both W1 and W2. Then we have ∀x1 ∈ W1 ⊂ W and ∀x2 ∈ W2 ⊂ W and, hence, x1 + x2 ∈ W by definition (11.15). But from (11.16), W1 + W2 = {x1 + x2; ∀x1 ∈ W1, ∀x2 ∈ W2}. Hence, W1 + W2 ⊂ W. Consequently, any subspace that contains both W1 and W2 necessarily contains W1 + W2. This implies that W1 + W2 is the smallest subspace that contains both W1 and W2.

Example 11.1 Consider a three-dimensional Cartesian space ℝ3 (Fig. 11.1). We regard the xy-plane and the yz-plane as subspaces W1 and W2, respectively, with ℝ3 = W1 + W2. In Fig. 11.1a, a vector OB (in ℝ3) is expressed as OA + AB (i.e., a sum of a vector in W1 and one in W2). Alternatively, the same vector OB can be expressed as OA′ + A′B. Conversely, we can designate a subspace in a different way; i.e., in Fig. 11.1b the z-axis is chosen for a subspace W3 instead of W2. We have ℝ3 = W1 + W3 as well. In this case, however, OB is uniquely expressed as OB = OP + PB. Notice that in Fig. 11.1a, W1 ∩ W2 = Span{e2}, where e2 is a unit vector in the positive direction of the y-axis. In Fig. 11.1b, however, we have W1 ∩ W3 = {0}. We can generalize this example to the following theorem.

Theorem 11.1 Let W1 and W2 be subspaces of V and V = W1 + W2. Then a vector x in V is uniquely expressed as

    x = x1 + x2; x1 ∈ W1, x2 ∈ W2,

if and only if W1 ∩ W2 = {0}.

Proof Suppose W1 ∩ W2 = {0} and x = x1 + x2 = x1′ + x2′, with x1, x1′ ∈ W1 and x2, x2′ ∈ W2. Then x1 − x1′ = x2′ − x2. The LHS belongs to W1 and the RHS belongs to W2, and hence both sides belong to W1 ∩ W2. From the supposition, both sides should then be equal to the zero vector. Therefore, x1 = x1′ and x2 = x2′. This implies that x is expressed uniquely as x = x1 + x2.

Conversely, suppose that the vector representation (x = x1 + x2) is unique and that x ∈ W1 ∩ W2. Then x = x + 0 = 0 + x, with x, 0 ∈ W1 and x, 0 ∈ W2. Uniqueness of the representation implies x = 0. Consequently, W1 ∩ W2 = {0} follows. This completes the proof.

In the case W1 ∩ W2 = {0}, V = W1 + W2 is said to be a direct sum of W1 and W2, or we say that V is decomposed into a direct sum of W1 and W2. We symbolically denote this by

    V = W1 ⊕ W2.   (11.17)

In this case, the following equality holds:

[Fig. 11.1 Decomposition of a vector in a three-dimensional Cartesian space ℝ3 into two subspaces. (a) ℝ3 = W1 + W2; W1 ∩ W2 = Span{e2}, where e2 is a unit vector in the positive direction of the y-axis. (b) ℝ3 = W1 + W3; W1 ∩ W3 = {0}]

    dim V = dim W1 + dim W2,   (11.18)

where "dim" stands for the dimension of the vector space considered. To prove (11.18), we suppose that V is an n-dimensional vector space and that W1 and W2 are spanned by r1 and r2 linearly independent vectors, respectively, such that

    W1 = Span{e1^(1), e2^(1), ⋯, er1^(1)}   and   W2 = Span{e1^(2), e2^(2), ⋯, er2^(2)}.   (11.19)

This is equivalent to saying that the dimensions of W1 and W2 are r1 and r2, respectively. If V = W1 + W2 (here we do not assume that the summation is a direct sum), we have

    V = Span{e1^(1), e2^(1), ⋯, er1^(1), e1^(2), e2^(2), ⋯, er2^(2)}.   (11.20)

Then we have n ≤ r1 + r2. This is almost trivial: V is spanned by the (r1 + r2) vectors in (11.20), and so the maximum number n of linearly independent vectors in V cannot exceed r1 + r2. That is,

    dim V ≤ dim W1 + dim W2.   (11.21)

Now let us assume that V = W1 ⊕ W2. Then each ei^(2) (1 ≤ i ≤ r2) must be linearly independent of e1^(1), e2^(1), ⋯, er1^(1). If not, ei^(2) could be described as a linear combination of e1^(1), e2^(1), ⋯, er1^(1). But this would imply that ei^(2) ∈ W1, i.e., ei^(2) ∈ W1 ∩ W2, in contradiction to V = W1 ⊕ W2, because W1 ∩ W2 = {0} by assumption. Likewise, each ej^(1) (1 ≤ j ≤ r1) is linearly independent of e1^(2), e2^(2), ⋯, er2^(2). Hence, e1^(1), e2^(1), ⋯, er1^(1), e1^(2), e2^(2), ⋯, er2^(2) must be linearly independent, and thus n ≥ r1 + r2, because V contains these (r1 + r2) linearly independent vectors (and may well contain additional independent vectors). Meanwhile, n ≤ r1 + r2 from the above. Consequently, we must have n = r1 + r2. Thus, we have proven that

    V = W1 ⊕ W2 ⟹ dim V = dim W1 + dim W2.

Conversely, suppose n = r1 + r2. Then any vector x in V is expressed uniquely as

    x = a1 e1^(1) + ⋯ + ar1 er1^(1) + b1 e1^(2) + ⋯ + br2 er2^(2).   (11.22)

The vector described by the first group of terms is contained in W1 and that described by the second group in W2; both are again expressed uniquely. Therefore, we get V = W1 ⊕ W2. This is a proof of

    dim V = dim W1 + dim W2 ⟹ V = W1 ⊕ W2.

The above statements are summarized as the following theorem:

Theorem 11.2 Let V be a vector space and let W1 and W2 be subspaces of V. Also suppose V = W1 + W2. Then the following relation holds:


    dim V ≤ dim W1 + dim W2.   (11.21)

Furthermore, we have

    dim V = dim W1 + dim W2,   (11.23)

if and only if V = W1 ⊕ W2.

Theorem 11.2 is readily extended to the case where there are three or more subspaces. That is, having W1, W2, ⋯, Wm such that V = W1 + W2 + ⋯ + Wm, we obtain the following relation:

    dim V ≤ dim W1 + dim W2 + ⋯ + dim Wm.   (11.24)

The equality of (11.24) holds if and only if V = W1 ⊕ W2 ⊕ ⋯ ⊕ Wm. In light of Theorem 11.2, Example 11.1 says that 3 = dim ℝ3 < 2 + 2 = dim W1 + dim W2. But dim ℝ3 = 2 + 1 = dim W1 + dim W3. Therefore, we have ℝ3 = W1 ⊕ W3.
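The dimension counts of Example 11.1 can be reproduced numerically: the dimension of a sum of subspaces is the rank of the concatenated spanning sets. The sketch below (exact rational Gaussian elimination; the helper `rank` is our own) is illustrative only; note that the Python `+` between lists concatenates spanning sets, it is not itself the subspace sum.

```python
from fractions import Fraction as F

def rank(rows):
    # row-reduce a list of vectors over Q and count the pivots
    m = [[F(x) for x in r] for r in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

e1, e2, e3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
W1 = [e1, e2]          # the xy-plane
W2 = [e2, e3]          # the yz-plane
W3 = [e3]              # the z-axis

# dim(W1 + W2) = 3 < dim W1 + dim W2 = 4: the sum is not direct
assert rank(W1 + W2) == 3 and rank(W1) + rank(W2) == 4
# dim(W1 + W3) = 3 = dim W1 + dim W3: R^3 = W1 (+) W3 is a direct sum
assert rank(W1 + W3) == rank(W1) + rank(W3) == 3
```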

11.2

Linear Transformations of Vectors

In the previous section we introduced vectors and their calculation rules in a linear vector space. It is natural and convenient to relate a vector to another vector, just as a function f relates a number (either real or complex) to another number such that y = f(x). A linear transformation from the vector space V to another vector space W is a mapping A: V → W such that

    A(ca + db) = cA(a) + dA(b).   (11.25)

We will briefly discuss the concept of mapping at the end of this section. It is convenient to define addition of linear transformations; it is defined as

    (A + B)a = Aa + Ba,   (11.26)

where a is any vector in V. Since (11.25) is a broad but abstract definition, we begin with a well-known simple example, rotation of a vector within the xy-plane (Fig. 11.2). We denote an arbitrary position vector x in the xy-plane by

[Fig. 11.2 Rotation of a vector x within the xy-plane]

    x = xe1 + ye2 = (e1 e2) [x; y],   (11.27)

where e1 and e2 are unit basis vectors in the xy-plane and x and y are the coordinates of the vector x in reference to e1 and e2. The expression (11.27) is consistent with (11.13). The rotation represented in Fig. 11.2 is an example of a linear transformation. We call this rotation R. According to the definition,

    R(xe1 + ye2) = R(x) = xR(e1) + yR(e2).   (11.28)

Putting

    R(xe1 + ye2) = x′,   R(e1) = e1′, and R(e2) = e2′,

we have

    x′ = xe1′ + ye2′ = (e1′ e2′) [x; y].   (11.29)

From Fig. 11.2 we readily obtain

    e1′ = e1 cos θ + e2 sin θ,   e2′ = −e1 sin θ + e2 cos θ.   (11.30)

Using a matrix representation,

    (e1′ e2′) = (e1 e2) [cos θ  −sin θ; sin θ  cos θ].   (11.31)

Substituting (11.30) into (11.29), we obtain

    x′ = (x cos θ − y sin θ)e1 + (x sin θ + y cos θ)e2.   (11.32)

Meanwhile, x′ can be expressed relative to the original basis vectors e1 and e2:

    x′ = x′e1 + y′e2.   (11.33)

Comparing (11.32) and (11.33), uniqueness of the representation ensures that

    x′ = x cos θ − y sin θ,   y′ = x sin θ + y cos θ.   (11.34)

Using a matrix representation once again,

    [x′; y′] = [cos θ  −sin θ; sin θ  cos θ] [x; y].   (11.35)

Further combining (11.29) and (11.31), we get

    x′ = R(x) = (e1 e2) [cos θ  −sin θ; sin θ  cos θ] [x; y].   (11.36)

The above example demonstrates that the linear transformation R has the (2, 2) matrix representation shown in (11.36). Moreover, this example obviously shows that if a vector is expressed as a linear combination of the basis vectors, the "coordinates" (represented by a column vector) are transformed by the same matrix. Regarding an abstract n-dimensional linear vector space Vn, a linear vector transformation A is given by

    A(x) = (e1 ⋯ en) [a11 ⋯ a1n; ⋮ ⋱ ⋮; an1 ⋯ ann] [x1; ⋮; xn],   (11.37)

where e1, e2, ⋯, en are basis vectors and x1, x2, ⋯, xn are the corresponding coordinates of a vector x = Σ_{i=1}^n xi ei. We assume that the transformation is a mapping A: Vn → Vn (i.e., an endomorphism). In this case the transformation is represented by an (n, n) matrix. Note that the matrix operates on the basis vectors from the right and on the coordinates (i.e., a column vector) from the left. In (11.37) we often omit the parentheses and simply write Ax.
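The rotation example can be exercised numerically: the matrix of (11.31) acts on coordinates as in (11.35). The sketch below is ours (the helper names `rot`, `matvec`, `matmul` are made up for this example); it also checks the composition property R(α)R(β) = R(α + β), which follows from the trigonometric addition formulas.

```python
from math import sin, cos, pi, isclose

def rot(theta):
    # matrix of the rotation R in (11.31), acting on column vectors as in (11.35)
    return [[cos(theta), -sin(theta)],
            [sin(theta),  cos(theta)]]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# rotating e1 = (1, 0) by 90 degrees gives e2 = (0, 1)
x, y = matvec(rot(pi / 2), [1.0, 0.0])
assert abs(x) < 1e-12 and isclose(y, 1.0)

# composition of rotations: R(a) R(b) = R(a + b)
a, b = 0.3, 1.1
AB, C = matmul(rot(a), rot(b)), rot(a + b)
assert all(isclose(AB[i][j], C[i][j], abs_tol=1e-12)
           for i in range(2) for j in range(2))
```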


Here we mention matrix notation for later convenience. We often identify a linear vector transformation with its representation matrix and denote both the transformation and the matrix by A. On this occasion, we write

    A = [a11 ⋯ a1n; ⋮ ⋱ ⋮; an1 ⋯ ann],   A = (A)ij = (aij), etc.,   (11.38)

where with the second expression (A)ij and (aij) represent the matrix A itself; for we frequently use indexed matrix notations such as A⁻¹, A†, Ã, etc. The notation (11.38) can conveniently be used in such cases. Note moreover that aij represents the matrix A as well. Equation (11.37) has a duality such that the matrix A operates either on the basis vectors or on the coordinates. This can explicitly be written as

    A(x) = (e1 ⋯ en) ( [a11 ⋯ a1n; ⋮ ⋱ ⋮; an1 ⋯ ann] [x1; ⋮; xn] )
         = ( (e1 ⋯ en) [a11 ⋯ a1n; ⋮ ⋱ ⋮; an1 ⋯ ann] ) [x1; ⋮; xn].   (11.39)

That is, we assume the associative law with the above expression. Making a summation representation, we have

    A(x) = (e1 ⋯ en) [Σ_{l=1}^n a1l xl; ⋮; Σ_{l=1}^n anl xl]
         = (Σ_{k=1}^n ek ak1 ⋯ Σ_{k=1}^n ek akn) [x1; ⋮; xn]
         = Σ_{l=1}^n (Σ_{k=1}^n ek akl) xl = Σ_{k=1}^n (Σ_{l=1}^n akl xl) ek.

That is, the above equation can be viewed in either of two ways: as a coordinate transformation with fixed basis vectors or as a vector transformation with fixed coordinates. Also, (11.37) can formally be written as

    A(x) = (e1 ⋯ en) A [x1; ⋮; xn] = (e1A ⋯ enA) [x1; ⋮; xn],   (11.40)


where we assumed that the distributive law holds with the operation of A on (e1 ⋯ en). Meanwhile, if in (11.37) we put xi = 1, xj = 0 (j ≠ i), we get

    A(ei) = Σ_{k=1}^n ek aki.

Therefore, (11.39) can be rewritten as

    A(x) = (A(e1) ⋯ A(en)) [x1; ⋮; xn].   (11.41)

Since the xi (1 ≤ i ≤ n) can be chosen arbitrarily, comparing (11.40) and (11.41) we have

    eiA = A(ei) (1 ≤ i ≤ n).   (11.42)

The matrix representation is unique in reference to the same basis vectors. Suppose that there were another matrix representation of the transformation A such that

    A(x) = (e1 ⋯ en) [a′11 ⋯ a′1n; ⋮ ⋱ ⋮; a′n1 ⋯ a′nn] [x1; ⋮; xn].   (11.43)

Subtracting (11.43) from (11.37), we obtain

    (e1 ⋯ en) [a11 ⋯ a1n; ⋮ ⋱ ⋮; an1 ⋯ ann] [x1; ⋮; xn] − (e1 ⋯ en) [a′11 ⋯ a′1n; ⋮ ⋱ ⋮; a′n1 ⋯ a′nn] [x1; ⋮; xn]
        = (e1 ⋯ en) [Σ_{k=1}^n (a1k − a′1k)xk; ⋮; Σ_{k=1}^n (ank − a′nk)xk] = 0.

On the basis of the linear independence of the basis vectors, Σ_{k=1}^n (aik − a′ik)xk = 0 (1 ≤ i ≤ n). This relationship holds for any arbitrarily and independently chosen complex numbers xi (1 ≤ i ≤ n). Therefore, we must have aik = a′ik (1 ≤ i, k ≤ n), meaning that the matrix representation of A is unique with regard to fixed basis vectors. Nonetheless, if a set of vectors e1, e2, ⋯, en does not constitute basis vectors (i.e., if those vectors are linearly dependent), the aforementioned uniqueness of the matrix representation loses its meaning. For instance, in V2 take vectors e1 and e2 such that e1 = e2 (i.e., the two vectors are linearly dependent) and let the


transformation matrix be B = [1 0; 0 2]. This means that e1 should remain e1 after the transformation. At the same time, the vector e2 (= e1) should be converted to 2e2 (= 2e1). This is impossible except for the case e1 = e2 = 0. The above matrix B in its own right is an object of matrix algebra, of course.

Putting a = b = 0 and c = d = 1 in the definition (11.25) of the linear transformation, we obtain

    A(0) = A(0) + A(0).

Combining this relation with (11.4) gives

    A(0) = 0.   (11.44)

Then, do we have a vector u ≠ 0 for which A(u) = 0? The answer is yes. This is because if the (2, 2) matrix [1 0; 0 0] is chosen for R, we get a linear transformation such that

    (e1′ e2′) = (e1 e2) [1 0; 0 0] = (e1 0).

That is, we have R(e2) = 0. In general, the vectors x (∈ V) satisfying A(x) = 0 form a subspace in a vector space V. This is because A(x) = 0 and A(y) = 0 ⟹ A(x + y) = A(x) + A(y) = 0 and A(cx) = cA(x) = 0. We call this subspace of maximum dimension a null-space and represent it as Ker A, where "Ker" stands for "kernel." In other words, Ker A = A⁻¹(0). Note that this symbolic notation does not ensure the existence of the inverse transformation A⁻¹ (vide infra) but represents the set comprising the elements x that satisfy A(x) = 0. In the above example, Ker R = Span{e2}.

Let us introduce some frequently used terminology and notation. First, suppose that with a vector space Vn an endomorphism A of Vn exists. That is, we have

    A(Vn) ≡ {A(x); ∀x ∈ Vn, A(x) ∈ Vn}.

Then A(Vn) forms a subspace of Vn. In fact, for any x, y ∈ Vn, we have A(x), A(y) ∈ A(Vn) ⟹ A(x) + A(y) = A(x + y) ∈ A(Vn) and cA(x) = A(cx) ∈ A(Vn). Since A is an endomorphism, obviously A(Vn) ⊂ Vn. Hence, A(Vn) is a subspace of Vn. The subspace A(Vn) is said to be the image of the transformation A and is sometimes denoted by Im A. The number dim A(Vn) is said to be the rank of the linear transformation A. That is, we write


    dim A(Vn) = rank A.

Also, the number dim Ker A is said to be the nullity of the linear transformation A. That is, we have

    dim Ker A = nullity A.

With respect to the two subspaces A(Vn) and Ker A, we show an important theorem (the so-called dimension theorem) below.

Theorem 11.3 (Dimension Theorem) Let Vn be a linear vector space of dimension n. Also let A be an endomorphism Vn → Vn. Then we have

    dim Vn = dim A(Vn) + dim Ker A.   (11.45)

Equation (11.45) can be written succinctly as

    dim Vn = rank A + nullity A.

Proof Let e1, e2, ⋯, en be basis vectors of Vn. First, assume

    A(e1) = A(e2) = ⋯ = A(en) = 0.

This implies that nullity A = n. Then, A(Σ_{i=1}^n xi ei) = Σ_{i=1}^n xi A(ei) = 0. Since the xi are arbitrarily chosen, this means that A(x) = 0 for ∀x ∈ Vn. This implies A = 0; that is, rank A = 0. Thus, (11.45) certainly holds.

To proceed with the proof of the theorem, we think of a linear combination Σ_{i=1}^n ci ei. Next, assume that Ker A = Span{e1, e2, ⋯, eν} (ν < n); dim Ker A = ν. After A is operated on the above linear combination, we are left with Σ_{i=1}^n ci A(ei) = Σ_{i=ν+1}^n ci A(ei). We put

    Σ_{i=ν+1}^n ci A(ei) = 0.   (11.46)

Suppose that the (n − ν) vectors A(ei) (ν + 1 ≤ i ≤ n) were linearly dependent. Then, without loss of generality, we can assume cν+1 ≠ 0. Dividing (11.46) by cν+1, we obtain


    A(eν+1) + (cν+2/cν+1)A(eν+2) + ⋯ + (cn/cν+1)A(en) = 0,
    A(eν+1) + A((cν+2/cν+1)eν+2) + ⋯ + A((cn/cν+1)en) = 0,
    A(eν+1 + (cν+2/cν+1)eν+2 + ⋯ + (cn/cν+1)en) = 0.

Meanwhile, the (ν + 1) vectors e1, e2, ⋯, eν, and eν+1 + (cν+2/cν+1)eν+2 + ⋯ + (cn/cν+1)en are linearly independent, because e1, e2, ⋯, en are basis vectors of Vn. This would imply that the dimension of Ker A is ν + 1, but this is in contradiction to Ker A = Span{e1, e2, ⋯, eν}. Thus, the (n − ν) vectors A(ei) (ν + 1 ≤ i ≤ n) should be linearly independent. Let Vn−ν be described as

    Vn−ν = Span{A(eν+1), A(eν+2), ⋯, A(en)}.   (11.47)

Then Vn−ν is a subspace of Vn, and so dim A(Vn) ≥ n − ν = dim Vn−ν. Meanwhile, from (11.46) we have

    A(Σ_{i=ν+1}^n ci ei) = 0.

From the above discussion, however, this relation holds if and only if cν+1 = ⋯ = cn = 0. This implies that Ker A ∩ Vn−ν = {0}. Then we have

    Vn−ν + Ker A = Vn−ν ⊕ Ker A.

Furthermore, from Theorem 11.2 we have

    dim(Vn−ν ⊕ Ker A) = dim Vn−ν + dim Ker A = (n − ν) + ν = n = dim Vn.

Thus, we must have dim A(Vn) = n − ν = dim Vn−ν. Since Vn−ν is a subspace of Vn and Vn−ν ⊂ A(Vn) from (11.47), Vn−ν = A(Vn). To conclude, we get

    dim A(Vn) + dim Ker A = dim Vn,   Vn = A(Vn) ⊕ Ker A.   (11.48)

This completes the proof. Comparing Theorem 11.3 with Theorem 11.2, we find that Theorem 11.3 is a special case of Theorem 11.2. Equations (11.45) and (11.48) play an important role


in the theory of linear vector spaces. We add that Theorem 11.3 also holds for a linear transformation that connects two vector spaces of different dimensions [1–3]. As an exercise, we have the following example:

Example 11.2 Let e1, e2, e3, e4 be basis vectors of V4. Let A be an endomorphism of V4 described by

    A = [1 0 1 0; 0 1 1 0; 1 0 1 0; 0 1 1 0].

We have

    (e1 e2 e3 e4) [1 0 1 0; 0 1 1 0; 1 0 1 0; 0 1 1 0] = (e1 + e3   e2 + e4   e1 + e2 + e3 + e4   0).

That is,

    A(e1) = e1 + e3,   A(e2) = e2 + e4,   A(e3) = e1 + e2 + e3 + e4,   A(e4) = 0.

We have

    A(−e1 − e2 + e3) = −A(e1) − A(e2) + A(e3) = 0.

Then we find

    A(V4) = Span{e1 + e3, e2 + e4},   Ker A = Span{−e1 − e2 + e3, e4}.   (11.49)

For any x ∈ V4, using scalars ci (1 ≤ i ≤ 4), we have

    x = c1e1 + c2e2 + c3e3 + c4e4
      = (1/2)(c1 + c3)(e1 + e3) + (1/2)(2c2 + c3 − c1)(e2 + e4)
        + (1/2)(c3 − c1)(−e1 − e2 + e3) + (1/2)(c1 − 2c2 − c3 + 2c4)e4.   (11.50)

Thus, x has been uniquely represented as in (11.50) with respect to the basis vectors (e1 + e3), (e2 + e4), (−e1 − e2 + e3), and e4. The linear independence of these vectors can easily be checked by equating (11.50) with zero. We also confirm that

    V4 = A(V4) ⊕ Ker A.

In the present case, (11.45) reads

    dim V4 = dim A(V4) + dim Ker A = 2 + 2 = 4.

[Fig. 11.3 Concept of mapping from a set X to another set Y: injective; surjective; bijective = injective + surjective]

Linear transformation is a kind of mapping. Figure 11.3 depicts the concept of mapping. Suppose two sets X and Y. A mapping f is a correspondence between an element x (∈ X) and an element y (∈ Y). The set f(X) (⊂ Y) is said to be the range of f.

(1) The mapping f is injective if x1 ≠ x2 ⟹ f(x1) ≠ f(x2).
(2) The mapping f is surjective if f(X) = Y; i.e., for ∀y ∈ Y corresponding element(s) x ∈ X exist(s).
(3) The mapping f is bijective if it is both injective and surjective; it is then also called a reversible (or invertible) mapping. A mapping that is not invertible is said to be non-invertible.

If the mapping f is bijective, a unique element x ∈ X exists for ∀y ∈ Y such that f(x) = y. In terms of solving an equation, we say that with any given y we can find a unique solution x to the equation f(x) = y. In this case x is said to be an inverse element to y, and this is denoted by x = f⁻¹(y). The mapping f⁻¹ is called the inverse mapping. If a linear transformation is relevant, the mapping is said to be an inverse transformation. Here we focus on the case where both X and Y form a vector space and the mapping is an endomorphism. Regarding the linear transformation A: Vn → Vn (i.e., an endomorphism of Vn), we have the following important theorem:

Theorem 11.4 Let A: Vn → Vn be an endomorphism of Vn. A necessary and sufficient condition for the existence of an inverse transformation to A (i.e., A⁻¹) is A⁻¹(0) = {0}.


Proof Suppose A⁻¹(0) = {0}. Then A(x1) = A(x2) ⟺ A(x1 − x2) = 0 ⟺ x1 − x2 = 0, i.e., x1 = x2. This implies that the transformation A is injective. The other way round, suppose that A is injective. If A⁻¹(0) ≠ {0}, there would be b (≠ 0) with which A(b) = 0. This is, however, in contradiction to A being injective. Then we must have A⁻¹(0) = {0}. Thus, A⁻¹(0) = {0} ⟺ A is injective. Meanwhile, A⁻¹(0) = {0} ⟺ dim A⁻¹(0) = 0 ⟺ dim A(Vn) = n (due to Theorem 11.3), i.e., A(Vn) = Vn. Then A⁻¹(0) = {0} ⟺ A is surjective. Combining this with the above-mentioned statement, we have A⁻¹(0) = {0} ⟺ A is bijective. This statement is equivalent to saying that an inverse transformation exists. This completes the proof.

In the proof of Theorem 11.4, to show that A⁻¹(0) = {0} ⟺ A is surjective, we have used the dimension theorem (Theorem 11.3), for which the relevant vector space is finite (i.e., n-dimensional). In other words, for a finite-dimensional vector space, that A is surjective is equivalent to that A is injective and vice versa. To conclude, as far as we are thinking of an endomorphism of a finite-dimensional vector space, if we can show that it is either injective or surjective, the other necessarily follows and, hence, the mapping is bijective.
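As a numerical cross-check of the dimension theorem, the sketch below recomputes rank A and nullity A for the matrix of Example 11.2 with exact rational row reduction. The helper names (`rank`, `apply`) are ours, chosen for this illustration only.

```python
from fractions import Fraction as F

def rank(rows):
    # row-reduce over Q and count the pivots
    m = [[F(x) for x in r] for r in rows]
    r = 0
    for col in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# the matrix of Example 11.2: column i holds the coordinates of A(e_i)
A = [[1, 0, 1, 0],
     [0, 1, 1, 0],
     [1, 0, 1, 0],
     [0, 1, 1, 0]]

def apply(A, x):
    # coordinates transform as a column vector, cf. (11.37)
    return [sum(A[i][k] * x[k] for k in range(4)) for i in range(4)]

r = rank(A)
assert r == 2                              # dim A(V^4) = rank A = 2
assert 4 - r == 2                          # nullity A = dim Ker A = 2
# the two Ker A vectors found in (11.49) are indeed annihilated
assert apply(A, [-1, -1, 1, 0]) == [0, 0, 0, 0]
assert apply(A, [0, 0, 0, 1]) == [0, 0, 0, 0]
```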

11.3

Inverse Matrices and Determinants

The existence of the inverse transformation plays a particularly important role in the theory of linear vector spaces. The inverse transformation is itself a linear transformation: let x1 = A⁻¹(y1) and x2 = A⁻¹(y2); we have A(c1x1 + c2x2) = c1A(x1) + c2A(x2) = c1y1 + c2y2, and thus c1x1 + c2x2 = A⁻¹(c1y1 + c2y2) = c1A⁻¹(y1) + c2A⁻¹(y2), showing that A⁻¹ is a linear transformation. As already mentioned, the matrix that represents a linear transformation A is uniquely determined with respect to fixed basis vectors. This should be the case with A⁻¹ accordingly. We have an important theorem for this.

Theorem 11.5 [4] The necessary and sufficient condition for the matrix A⁻¹ that represents the inverse transformation to A to exist is that det A ≠ 0 ("det" means the determinant). Here the matrix A represents the linear transformation A. The matrix A⁻¹ is uniquely determined and given by

    (A⁻¹)ij = (−1)^(i+j) (M)ji / (det A),   (11.51)

where (M)ij is the minor with respect to the element Aij.

Proof First, we suppose that the matrix A⁻¹ exists so that it satisfies the following relation:

    Σ_{k=1}^n (A⁻¹)ik (A)kj = δij (1 ≤ i, j ≤ n).   (11.52)

On this condition, suppose that det A = 0. From the properties of determinants, this implies that one of the columns of A (let it be the m-th column) can be expressed as a linear combination of the other columns of A such that

    Akm = Σ_{j≠m} Akj cj.   (11.53)

Putting i = m in (11.52), multiplying by cj, and summing over j ≠ m, we get

    Σ_{k=1}^n (A⁻¹)mk Σ_{j≠m} Akj cj = Σ_{j≠m} δmj cj = 0.   (11.54)

From (11.52) and (11.53), however, we obtain

    Σ_{k=1}^n (A⁻¹)mk Σ_{j≠m} Akj cj = Σ_{k=1}^n (A⁻¹)mk (A)km = 1.   (11.55)

Equations (11.54) and (11.55) are inconsistent. The inconsistency resulted from the supposition that the matrix A^{-1} exists. Therefore, we conclude that if detA = 0, A^{-1} does not exist. Taking the contraposition of this statement, if A^{-1} exists, then detA ≠ 0. Suppose next that detA ≠ 0. In this case, on the basis of the well-established result, a unique A^{-1} exists and it is given by (11.51). This completes the proof.

Summarizing the characteristics of the endomorphism within a finite-dimensional vector space, we have

injective ⟺ surjective ⟺ bijective ⟺ detA ≠ 0.

Let a matrix A be

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}. \qquad (11.56)$$

The determinant of the matrix A is denoted by detA or by

$$\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}.$$

Here, the determinant is defined as

$$\det A \equiv \sum_{\sigma = \left(\begin{smallmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{smallmatrix}\right)} \varepsilon(\sigma)\, a_{1 i_1} a_{2 i_2} \cdots a_{n i_n}, \qquad (11.57)$$

where σ means a permutation of 1, 2, ⋯, n and ε(σ) denotes a sign of + (in the case of even permutations) or - (for odd permutations). For future discussion, we deal with a triangle matrix. It is denoted by

$$T = \begin{pmatrix} a_{11} & & * \\ & \ddots & \\ 0 & & a_{nn} \end{pmatrix}, \qquad (11.58)$$

where an asterisk (*) means that the upper right off-diagonal elements can take any complex numbers (including zero). A large zero shows that all the lower left off-diagonal elements are zero. Its determinant is given by

$$\det T = a_{11} a_{22} \cdots a_{nn}. \qquad (11.59)$$
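The definition (11.57) and the product rule (11.59) lend themselves to a direct numerical check. The sketch below (Python with NumPy assumed; the matrix T is a made-up example) expands the determinant over all permutations and compares it with the product of the diagonal elements of a triangle matrix.

```python
import itertools
import numpy as np

def det_by_permutations(a):
    """Determinant via the definition (11.57): sum over all permutations
    sigma of epsilon(sigma) * a[0, i1] * a[1, i2] * ... * a[n-1, in]."""
    n = a.shape[0]
    total = 0.0
    for perm in itertools.permutations(range(n)):
        # Sign of the permutation: (-1) to the number of inversions.
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        sign = -1.0 if inversions % 2 else 1.0
        prod = 1.0
        for row, col in enumerate(perm):
            prod *= a[row, col]
        total += sign * prod
    return total

# An upper triangle matrix: by (11.59) its determinant is the
# product of the diagonal elements, here 2 * 3 * (-1) = -6.
T = np.array([[2.0, 5.0, 1.0],
              [0.0, 3.0, 4.0],
              [0.0, 0.0, -1.0]])
d = det_by_permutations(T)
```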

In fact, focusing on a_{n i_n}, we notice that a_{n i_n} does not vanish only if i_n = n. Then we get

$$\det T = \sum_{\sigma = \left(\begin{smallmatrix} 1 & 2 & \cdots & n-1 \\ i_1 & i_2 & \cdots & i_{n-1} \end{smallmatrix}\right)} \varepsilon(\sigma)\, a_{1 i_1} a_{2 i_2} \cdots a_{n-1, i_{n-1}} a_{nn}.$$

Repeating this process, we finally obtain (11.59).

The endomorphic linear transformation can be described succinctly as

$$A(\mathbf{x}) = \mathbf{y}, \qquad (11.60)$$

where we have vectors such that x = Σ_{i=1}^{n} x_i e_i and y = Σ_{i=1}^{n} y_i e_i. In reference to the same set of basis vectors (e1 ⋯ en) and using a matrix representation, we have

$$A(\mathbf{x}) = (e_1 \cdots e_n) \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n) \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}. \qquad (11.61)$$

From the unique representation of a vector in reference to the basis vectors, (11.61) is simply expressed as

$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}. \qquad (11.62)$$

With a shorthand notation, we have

$$y_i = \sum_{k=1}^{n} a_{ik} x_k \quad (1 \le i \le n). \qquad (11.63)$$

From the above discussion, for the linear transformation A to be bijective, we must have detA ≠ 0. In terms of the system of linear equations, we say that for (11.62) to have a unique solution (x1 ⋯ xn)ᵀ for a given (y1 ⋯ yn)ᵀ, we must have detA ≠ 0. Conversely, detA = 0 is equivalent to the situation where (11.62) has indefinite solutions or has no solution. As far as the matrix algebra is concerned, (11.61) is symbolically described by omitting the parentheses as

$$A\mathbf{x} = \mathbf{y}. \qquad (11.64)$$
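The dichotomy just stated—(11.62) has a unique solution exactly when detA ≠ 0—can be illustrated numerically. A minimal sketch assuming NumPy; the particular matrices are hypothetical examples:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])   # det A = 1, so A is non-singular
y = np.array([3.0, 2.0])

# det A != 0: a unique solution x exists.
x = np.linalg.solve(A, y)

# A singular matrix (det = 0): no unique solution exists, and
# solve() refuses with LinAlgError.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
try:
    np.linalg.solve(S, y)
    singular_solved = True
except np.linalg.LinAlgError:
    singular_solved = False
```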

However, when the vector transformation is explicitly taken into account, the full representation of (11.61) should be borne in mind. The relations (11.60) and (11.64) can be considered as a set of simultaneous equations. A necessary and sufficient condition for (11.64) to have a unique solution x for a given y is detA ≠ 0. In that case, the solution x of (11.64) can be symbolically expressed as

$$\mathbf{x} = A^{-1}\mathbf{y}, \qquad (11.65)$$

where A^{-1} represents an inverse matrix of A.

Here is a good place to supplement somewhat confusing but important concepts of matrix algebra in relation to the inverse matrix. Let A be a (n, n) matrix expressed as

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}. \qquad (11.56)$$

We often need a matrix "derived" from A of (11.56) so that we may obtain a (m, m) matrix with m < n. Let us call such a matrix derived from A a submatrix of A. Among various types of submatrices, we think of a special type of matrix A(i, j) that is defined as the submatrix obtained by striking out the i-th row and j-th column. Then, A(i, j) is a (n - 1, n - 1) submatrix. We call detA(i, j) a minor with respect to a_{ij}. Moreover, we define a cofactor Δ_{ij} as

$$\Delta_{ij} \equiv (-1)^{i+j} \det A(i, j). \qquad (11.66)$$
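The minor and cofactor just defined, together with formula (11.51), can be turned into a deliberately naive numerical recipe for the inverse. A sketch assuming NumPy; the matrix A and its entries are made-up examples (indices here are 0-based, which leaves the sign (-1)^{i+j} unchanged):

```python
import numpy as np

def minor(a, i, j):
    """det A(i, j): strike out the i-th row and j-th column (0-based)."""
    sub = np.delete(np.delete(a, i, axis=0), j, axis=1)
    return np.linalg.det(sub)

def inverse_by_cofactors(a):
    """(11.51): (A^{-1})_{ij} = (-1)^{i+j} (M)_{ji} / det A."""
    n = a.shape[0]
    det_a = np.linalg.det(a)
    inv = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # Note the transposed indices (j, i) in the minor.
            inv[i, j] = (-1) ** (i + j) * minor(a, j, i) / det_a
    return inv

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])   # det A = 3, non-singular
A_inv = inverse_by_cofactors(A)
```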


In (11.66), Δ_{ij} is said to be a cofactor of a_{ij}; see Theorem 11.5. The transposed matrix of the matrix consisting of the Δ_{ij} is sometimes referred to as the adjugate matrix of A. In relation to Theorem 11.5, if A is non-singular, the adjugate matrix of A is given by A^{-1}(detA). In that case, the adjugate matrix of A is non-singular as well.

In a particular case, suppose that we strike out the i-th row and i-th column. Then, A(i, i) is called a principal submatrix and detA(i, i) is said to be a principal minor with respect to a_{ii}, i.e., a diagonal element of A. In this case, we have Δ_{ii} ≡ detA(i, i). If we strike out several pairs of i-th rows and i-th columns, we can make (m, m) matrices (m < n), which are also called principal submatrices (of m-th order). Determinants of the resulting matrices are said to be principal minors of m-th order. Then, a_{ii} (1 ≤ i ≤ n) itself is a principal minor as well. Related discussion and examples have already appeared in Sect. 7.4. Other examples will appear in Sects. 12.2 and 14.5.

Example 11.3 Think of a three-dimensional rotation by θ in ℝ³ around the z-axis. The relevant transformation matrix is

$$R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

As detR = 1 ≠ 0, the transformation is bijective. This means that for ∀y ∈ ℝ³ there is always a corresponding x ∈ ℝ³. This x can be found by solving Rx = y; i.e., x = R^{-1}y. Putting x = xe1 + ye2 + ze3 and y = x′e1 + y′e2 + z′e3, a matrix representation is given by

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = R^{-1} \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}.$$

Thus, x can be obtained by rotating y by -θ.

Example 11.4 Think of the following matrix that represents a linear transformation P:

$$P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

This matrix transforms a vector x = xe1 + ye2 + ze3 into y = xe1 + ye2 as follows:

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ y \\ 0 \end{pmatrix}. \qquad (11.67)$$

Fig. 11.4 Example of an endomorphism P: ℝ³ → ℝ³

Here, we are thinking of an endomorphism P: ℝ³ → ℝ³. Geometrically, it can be viewed as in Fig. 11.4. Let us think of (11.67) from the point of view of solving a system of linear equations and newly consider the next equation. In other words, we are thinking of finding x, y, and z for given a, b, and c in the following equation:

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}. \qquad (11.68)$$

If c = 0, we find a solution x = a, y = b, but z can be any (complex) number; we thus have indefinite solutions. If c ≠ 0, however, we have no solution. The former situation reflects the fact that the transformation represented by P is not injective. Meanwhile, the latter reflects the fact that the transformation is not surjective. Remember that since detP = 0, the transformation is neither injective nor surjective.
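The two failure modes can be detected numerically by comparing the rank of P with the rank of the augmented matrix (the Rouché–Capelli criterion, a standard technique not used explicitly in the text). A sketch assuming NumPy:

```python
import numpy as np

P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])   # det P = 0

def classify(b):
    """Compare rank P with the rank of the augmented matrix (P | b)."""
    r = np.linalg.matrix_rank(P)
    r_aug = np.linalg.matrix_rank(np.column_stack([P, b]))
    if r_aug > r:
        return "no solution"           # b has a component outside P's image
    if r < P.shape[1]:
        return "indefinite solutions"  # solutions exist but are not unique
    return "unique solution"

case1 = classify(np.array([1.0, 2.0, 0.0]))  # c = 0
case2 = classify(np.array([1.0, 2.0, 3.0]))  # c != 0
```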

11.4 Basis Vectors and Their Transformations

In the previous sections we showed that a vector is uniquely represented as a column vector in reference to a set of fixed basis vectors. The representation, however, changes under a different set of basis vectors.


First let us think of a linear transformation of a set of basis vectors e1, e2, ⋯, and en. The transformation matrix A representing a linear transformation A is defined as follows:

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.$$

Notice here that we often denote both a linear transformation and its corresponding matrix by the same character. After the transformation, suppose that the resulting vectors are given by e1′, e2′, ⋯, and en′. This is explicitly described as

$$(e_1 \cdots e_n) \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} = (e_1' \cdots e_n'). \qquad (11.69)$$

With a shorthand notation, we have

$$e_i' = \sum_{k=1}^{n} a_{ki} e_k \quad (1 \le i \le n). \qquad (11.70)$$

Care should be taken not to confuse (11.70) with (11.63). Here, the set of vectors e1′, e2′, ⋯, and en′ may or may not be linearly independent. Let us operate both sides of (11.69) from the left on a column vector (x1 ⋯ xn)ᵀ and equate both sides to zero. That is,

$$(e_1 \cdots e_n) \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1' \cdots e_n') \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = 0.$$

Since e1, e2, ⋯, and en are the basis vectors, we get

$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = 0. \qquad (11.71)$$

Meanwhile, we must have (x1 ⋯ xn)ᵀ = 0 so that e1′, e2′, ⋯, and en′ can be linearly independent (i.e., so as to be a set of basis vectors). But this means that (11.71) has only the unique (and trivial) solution and, hence, detA ≠ 0. If conversely detA ≠ 0, (11.71) has a unique trivial solution and e1′, e2′, ⋯, and en′ are linearly independent. Thus, a necessary and sufficient condition for e1′, e2′, ⋯, and en′ to be a set of basis vectors is detA ≠ 0. If detA = 0, (11.71) has indefinite solutions (including the trivial solution) and e1′, e2′, ⋯, and en′ are linearly dependent, and vice versa. In case detA ≠ 0, an inverse matrix A^{-1} exists, and so we have

$$(e_1 \cdots e_n) = (e_1' \cdots e_n') A^{-1}. \qquad (11.72)$$
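Relations (11.69) and (11.72) can be sketched numerically; NumPy is assumed, and the particular non-singular matrix A is a made-up example:

```python
import numpy as np

e = np.eye(3)                      # columns: original basis e1, e2, e3
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])    # triangle matrix with det A = 1

e_prime = e @ A                    # (11.69): columns are the new basis vectors

# det A != 0, so the new vectors again form a basis, and the old basis
# is recovered through (11.72): (e1 ... en) = (e1' ... en') A^{-1}.
e_back = e_prime @ np.linalg.inv(A)
```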

In the previous steps we have seen how a linear transformation (and the corresponding matrix representation) converts a set of basis vectors (e1 ⋯ en) to another set of basis vectors (e1′ ⋯ en′). Is it then possible to find a suitable transformation between two sets of arbitrarily chosen basis vectors? The answer is yes. This is because any vector can be expressed uniquely as a linear combination of any set of basis vectors. A whole array of such linear combinations uniquely defines a transformation matrix between the two sets of basis vectors as expressed in (11.69) and (11.72). The matrix has a non-zero determinant and has an inverse matrix. The concept of the transformation between basis vectors is important and very often used in various fields of natural science.

Example 11.5 We revisit Example 11.2. The relation (11.49) tells us that the basis vectors of A(V4) and those of Ker A span V4 in total. Therefore, in light of the above argument, there should be a linear transformation R between the two sets of vectors, i.e., e1, e2, e3, e4 and e1 + e3, e2 + e4, -e1 - e2 + e3, e4. Moreover, the matrix R associated with the linear transformation must be non-singular (i.e., detR ≠ 0). In fact, we find that R is expressed as

$$R = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}.$$

This is because we have the following relation between the two sets of basis vectors:

$$(e_1\ e_2\ e_3\ e_4) \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix} = (e_1 + e_3 \quad e_2 + e_4 \quad -e_1 - e_2 + e_3 \quad e_4).$$
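This relation is easy to verify numerically; a quick sketch assuming NumPy:

```python
import numpy as np

e = np.eye(4)                       # e1, e2, e3, e4 as columns
R = np.array([[1.0, 0.0, -1.0, 0.0],
              [0.0, 1.0, -1.0, 0.0],
              [1.0, 0.0,  1.0, 0.0],
              [0.0, 1.0,  0.0, 1.0]])

# Columns of (e1 e2 e3 e4) R: e1+e3, e2+e4, -e1-e2+e3, e4.
new_basis = e @ R
det_R = np.linalg.det(R)
```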

We have detR = 2 ≠ 0 as expected.

Next, let us consider successive linear transformations of vectors. Again we assume that the transformations are endomorphisms: Vn → Vn. We have to take into account transformations of the basis vectors along with the targeted vectors. First we choose a transformation by a non-singular matrix (having a non-zero determinant) so that the subsequent transformation has a unique matrix representation (vide supra). The vector transformation by P is expressed as

$$P(\mathbf{x}) = (e_1 \cdots e_n) \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad (11.73)$$


where the non-singular matrix P represents the transformation P. Notice here that the transformation and its matrix are represented by the same P. As mentioned in Sect. 11.2, the matrix P can be operated either from the right on the basis vectors or from the left on the column vector. We explicitly write

$$P(\mathbf{x}) = (e_1 \cdots e_n) \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1' \cdots e_n') \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad (11.74)$$

where (e1′ ⋯ en′) = (e1 ⋯ en)P [here P is the non-singular matrix defined in (11.73)]. Alternatively, we have

$$P(\mathbf{x}) = (e_1 \cdots e_n) \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n) \begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix}, \qquad (11.75)$$

where

$$\begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix} = \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \qquad (11.76)$$

Equation (11.75) gives a column vector representation of the vector that has been obtained by the transformation P and is viewed in reference to the basis vectors (e1 ⋯ en). Combining (11.74) and (11.75), we get

$$P(\mathbf{x}) = (e_1' \cdots e_n') \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n) \begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix}. \qquad (11.77)$$

We further make another linear transformation A : Vn → Vn. In this case the corresponding matrix A may be non-singular (i.e., detA ≠ 0) or singular (detA = 0). We have to distinguish the matrix representations of the two cases, because the matrix representations are uniquely defined in reference to an individual set of basis vectors; see (11.37) and (11.43). Let us denote the matrices by A_O and A′ with respect to


the basis vectors (e1 ⋯ en) and (e1′ ⋯ en′), respectively. Then, A[P(x)] can be described in two different ways as follows:

$$A[P(\mathbf{x})] = (e_1' \cdots e_n')\, A' \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n)\, A_O \begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix}. \qquad (11.78)$$

This can be rewritten in reference to the linearly independent set of vectors e1, ⋯, en as

$$A[P(\mathbf{x})] = [(e_1 \cdots e_n) P]\, A' \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n)\, A_O P \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \qquad (11.79)$$

As (11.79) holds for a vector x = Σ_{i=1}^{n} x_i e_i arbitrarily chosen in Vn, we get

$$P A' = A_O P. \qquad (11.80)$$

Since P is non-singular, we finally obtain

$$A' = P^{-1} A_O P. \qquad (11.81)$$
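Relation (11.80)—and hence (11.81)—can be checked numerically; a sketch assuming NumPy, with an arbitrary matrix A_O and a made-up non-singular P:

```python
import numpy as np

rng = np.random.default_rng(0)
A_O = rng.standard_normal((3, 3))   # matrix of A in the original basis
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])     # non-singular change of basis (det = 1)

A_prime = np.linalg.inv(P) @ A_O @ P   # (11.81)

# The two matrices represent the same transformation A in different
# bases; they satisfy P A' = A_O P as in (11.80).
lhs = P @ A_prime
rhs = A_O @ P
```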

We can see (11.79) from the point of view of successive linear transformations. When the subsequent operation is viewed in reference to the basis vectors e1′, ⋯, en′ newly reached by the precedent transformation, we make it a rule to write the relevant subsequent operator A′ from the right. In the case where the subsequent operation is viewed in reference to the original basis vectors, however, we write the subsequent operator A_O from the left. Further discussion and examples can be seen in Part IV.

We now look at (11.81) in a different manner. Suppose we have

$$A(\mathbf{x}) = (e_1 \cdots e_n)\, A_O \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \qquad (11.82)$$

Note that since the transformation A has been performed in reference to the basis vectors (e1 ⋯ en), A_O should be used for the matrix representation. This is rewritten as

$$A(\mathbf{x}) = (e_1 \cdots e_n)\, P P^{-1} A_O P P^{-1} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \qquad (11.83)$$

Meanwhile, any vector x in Vn can be written as


$$\mathbf{x} = (e_1 \cdots e_n) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n)\, P P^{-1} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1' \cdots e_n')\, P^{-1} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1' \cdots e_n') \begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}. \qquad (11.84)$$

In (11.84) we put

$$(e_1 \cdots e_n)\, P = (e_1' \cdots e_n'), \qquad P^{-1} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}. \qquad (11.85)$$

Equation (11.84) gives column vector representations of the same vector x viewed in reference to the basis set e1, ⋯, en or e1′, ⋯, en′. Equation (11.85) should not be confused with (11.76). That is, (11.76) relates the two coordinates (x1 ⋯ xn)ᵀ and (x1′ ⋯ xn′)ᵀ of different vectors, before and after the transformation, viewed in reference to the same set of basis vectors. The relation (11.85), however, relates the two coordinates (x1 ⋯ xn)ᵀ and (x̃1 ⋯ x̃n)ᵀ of the same vector viewed in reference to different sets of basis vectors. Thus, from (11.83) we have

$$A(\mathbf{x}) = (e_1' \cdots e_n')\, P^{-1} A_O P \begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}. \qquad (11.86)$$

Meanwhile, viewing the transformation A in reference to the basis vectors (e1′ ⋯ en′), we have

$$A(\mathbf{x}) = (e_1' \cdots e_n')\, A' \begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}. \qquad (11.87)$$

Equating (11.86) and (11.87), we get

$$A' = P^{-1} A_O P. \qquad (11.88)$$

Thus, (11.81) is recovered. The relations expressed by (11.81) and (11.88) are called a similarity transformation on A. The matrices A_O and A′ are said to be similar to each other. As


mentioned earlier, if A_O (and hence A′) is non-singular, the linear transformation A produces a set of basis vectors other than e1′, ⋯, en′, say e1″, ⋯, en″. We write this symbolically as

$$(e_1 \cdots e_n)\, P A = (e_1' \cdots e_n')\, A = (e_1'' \cdots e_n''). \qquad (11.89)$$

Therefore, such A defines successive transformations of the basis vectors in conjunction with P defined in (11.73). The successive transformations and resulting basis vectors supply us with important applications in the field of group theory. Topics will be dealt with in Parts IV and V.

References

1. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
2. Satake I (1974) Linear algebra. Shokabo, Tokyo (in Japanese)
3. Hassani S (2006) Mathematical physics. Springer, New York
4. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York

Chapter 12
Canonical Forms of Matrices

In Sect. 11.4 we saw that transformation matrices are altered depending on the basis vectors we choose. Then the following question arises: can we convert a (transformation) matrix to as simple a form as possible by similarity transformation(s)? In Sect. 11.4 we have also shown that if we have two sets of basis vectors in a linear vector space Vn, we can always find a non-singular transformation matrix between the two. In conjunction with the transformation of the basis vectors, the matrix undergoes a similarity transformation. It is our task in this chapter to find a simple form or a specific form (i.e., a canonical form) of a matrix as a result of the similarity transformation. For this purpose, we should first find the eigenvalue(s) and corresponding eigenvector(s) of the matrix. Depending upon the nature of matrices, we get various canonical forms of matrices such as a triangle matrix and a diagonal matrix. Whatever form a matrix takes, we can treat it under a unified form called the Jordan canonical form.

12.1 Eigenvalues and Eigenvectors

An eigenvalue problem is one of the important subjects of the theory of linear vector spaces. Let A be a linear transformation on Vn. By a suitable similarity transformation, the corresponding matrix can be brought to one of several different canonical forms; a typical example is a diagonal matrix. To reach a satisfactory answer, we start with the so-called eigenvalue problem. Suppose that after the transformation of x we have

$$A(\mathbf{x}) = \alpha \mathbf{x}, \qquad (12.1)$$

where α is a certain (complex) number. Then we say that α is an eigenvalue and that x is an eigenvector that corresponds to the eigenvalue α. Using the notation of (11.37) of Sect. 11.2, we have

© Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_12

$$A(\mathbf{x}) = (e_1 \cdots e_n) \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \alpha\, (e_1 \cdots e_n) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$

From the linear independence of e1, e2, ⋯, and en, we simply write

$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \alpha \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$

If we identify x with (x1 ⋯ xn)ᵀ at fixed basis vectors (e1 ⋯ en), we may naturally rewrite (12.1) as

$$A\mathbf{x} = \alpha \mathbf{x}. \qquad (12.2)$$

If x1 and x2 belong to the eigenvalue α, so do x1 + x2 and cx1 (c is an appropriate complex number). Therefore, all the eigenvectors belonging to the eigenvalue α, along with 0 (the zero vector), form a subspace of Vn corresponding to the eigenvalue α. Strictly speaking, we should use terminologies such as a "proper" (or ordinary) eigenvalue, eigenvector, eigenspace, etc. to distinguish them from a "generalized" eigenvalue, eigenvector, eigenspace, etc. We will return to this point later. Further rewriting (12.2), we have

$$(A - \alpha E)\mathbf{x} = 0, \qquad (12.3)$$

where E is a (n, n) unit matrix. Equations (12.2) and (12.3) are said to be an eigenvalue equation (or eigenequation). In (12.2) or (12.3), x = 0 always holds (as a trivial solution). Consequently, for x ≠ 0 to be a solution we must have

$$|A - \alpha E| = 0. \qquad (12.4)$$

In (12.4), |A - αE| stands for det(A - αE). Now let us define the following polynomial:

$$f_A(x) = |xE - A| = \begin{vmatrix} x - a_{11} & \cdots & -a_{1n} \\ \vdots & \ddots & \vdots \\ -a_{n1} & \cdots & x - a_{nn} \end{vmatrix}. \qquad (12.5)$$

A necessary and sufficient condition for α to be an eigenvalue is that α is a root of f_A(x) = 0. The function f_A(x) is said to be a characteristic polynomial and we call f_A(x) = 0 a characteristic equation. This is an n-th order polynomial. Putting

$$f_A(x) = x^n + a_1 x^{n-1} + \cdots + a_n, \qquad (12.6)$$

we have

$$a_1 = -(a_{11} + a_{22} + \cdots + a_{nn}) \equiv -\mathrm{Tr}\,A, \qquad (12.7)$$

where Tr stands for "trace," that is, the summation of the diagonal elements. Moreover,

$$a_n = (-1)^n |A|. \qquad (12.8)$$

The characteristic equation f_A(x) = 0 has n roots including possible multiple roots. Let those roots be α1, ⋯, αn (some of them may be identical). Then we have

$$f_A(x) = \prod_{i=1}^{n} (x - \alpha_i). \qquad (12.9)$$

Furthermore, according to the relations between roots and coefficients, we get

$$\alpha_1 + \cdots + \alpha_n = -a_1 = \mathrm{Tr}\,A, \qquad (12.10)$$

$$\alpha_1 \cdots \alpha_n = (-1)^n a_n = |A|. \qquad (12.11)$$

The characteristic polynomial f_A(x) is invariant under a similarity transformation. In fact,

$$f_{P^{-1}AP}(x) = \left|xE - P^{-1}AP\right| = \left|P^{-1}(xE - A)P\right| = |P|^{-1} |xE - A| |P| = |xE - A| = f_A(x). \qquad (12.12)$$

This leads to the invariance of the trace under a similarity transformation. That is,

$$\mathrm{Tr}\left(P^{-1}AP\right) = \mathrm{Tr}\,A. \qquad (12.13)$$
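The invariances (12.12) and (12.13) are easy to verify numerically. A sketch assuming NumPy (np.poly applied to a square matrix returns the coefficients of its characteristic polynomial; the matrices are made-up examples):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 1.0]])
P = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])     # det P = 1, non-singular

B = np.linalg.inv(P) @ A @ P        # similarity transformation (11.81)

coeffs_A = np.poly(A)               # coefficients of f_A(x) = |xE - A|
coeffs_B = np.poly(B)               # coefficients of f_{P^{-1}AP}(x)
trace_A, trace_B = np.trace(A), np.trace(B)
```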

Let us think of the following triangle matrix:

$$T = \begin{pmatrix} a_{11} & & * \\ & \ddots & \\ 0 & & a_{nn} \end{pmatrix}. \qquad (11.58)$$

The matrix of this type is thought to be one of the canonical forms of matrices. Its characteristic polynomial f_T(x) is

$$f_T(x) = |xE - T| = \begin{vmatrix} x - a_{11} & & * \\ & \ddots & \\ 0 & & x - a_{nn} \end{vmatrix}. \qquad (12.14)$$

Therefore, we get

$$f_T(x) = \prod_{i=1}^{n} (x - a_{ii}), \qquad (12.15)$$

where we used (11.59). From (11.58) and (12.15), we find that the eigenvalues of a triangle matrix are given by its diagonal elements. Our immediate task will be to examine whether and how a given matrix is converted to a triangle matrix through a similarity transformation. The following theorem is important.

Theorem 12.1 Every (n, n) square matrix can be converted to a triangle matrix by a similarity transformation [1].

Proof A triangle matrix is either an "upper" triangle matrix [in which all the lower left off-diagonal elements are zero; see (11.58)] or a "lower" triangle matrix (in which all the upper right off-diagonal elements are zero). In the present case we show the proof for the upper triangle matrix. Regarding the lower triangle matrix the theorem is proven in a similar manner. The proof is based on mathematical induction. First we show that the theorem is true of a (2, 2) matrix. Suppose that one of the eigenvalues of A2 is α1 and that its corresponding eigenvector is x1. Then we have

$$A_2 \mathbf{x}_1 = \alpha_1 \mathbf{x}_1, \qquad (12.16)$$

where we assume that x1 represents a column vector. Let a non-singular matrix be

$$P_1 = (\mathbf{x}_1 \ \mathbf{p}_1), \qquad (12.17)$$

where p1 is another column vector chosen in such a way that x1 and p1 are linearly independent so that P1 can be a non-singular matrix. Then, we have

$$P_1^{-1} A_2 P_1 = \left(P_1^{-1} A_2 \mathbf{x}_1 \quad P_1^{-1} A_2 \mathbf{p}_1\right). \qquad (12.18)$$

Meanwhile,

$$\mathbf{x}_1 = (\mathbf{x}_1 \ \mathbf{p}_1) \begin{pmatrix} 1 \\ 0 \end{pmatrix} = P_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \qquad (12.19)$$

Hence, we have


$$P_1^{-1} A_2 \mathbf{x}_1 = \alpha_1 P_1^{-1} \mathbf{x}_1 = \alpha_1 P_1^{-1} P_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \alpha_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ 0 \end{pmatrix}. \qquad (12.20)$$

Thus, (12.18) can be rewritten as

$$P_1^{-1} A_2 P_1 = \begin{pmatrix} \alpha_1 & * \\ 0 & * \end{pmatrix}. \qquad (12.21)$$

This shows that Theorem 12.1 is true of a (2, 2) matrix A2.

Now let us show that Theorem 12.1 holds in the general case, i.e., for a (n, n) square matrix A_n. Let αn be one of the eigenvalues of A_n. On the basis of the argument of the (2, 2) matrix case, suppose that after a suitable similarity transformation by a non-singular matrix P̃ we have

$$\tilde{A}_n = \tilde{P}^{-1} A_n \tilde{P}. \qquad (12.22)$$

Then, we can describe Ã_n as

$$\tilde{A}_n = \begin{pmatrix} \alpha_n & x_1 \cdots x_{n-1} \\ 0 & A_{n-1} \end{pmatrix}, \qquad (12.23)$$

where the lower left 0 denotes a (n - 1, 1) zero block.

In (12.23), αn is one of the eigenvalues of A_n. To show that (12.23) is valid, we use a similar argument as in the case of a (2, 2) matrix. That is, we set P̃ such that

$$\tilde{P} = (\mathbf{a}_n \ \mathbf{p}_1 \ \mathbf{p}_2 \ \cdots \ \mathbf{p}_{n-1}),$$

where a_n is an eigenvector corresponding to αn and P̃ is a non-singular matrix formed by n linearly independent column vectors a_n, p1, p2, ⋯, and p_{n-1}. Then we have

$$\tilde{A}_n = \tilde{P}^{-1} A_n \tilde{P} = \left(\tilde{P}^{-1} A_n \mathbf{a}_n \quad \tilde{P}^{-1} A_n \mathbf{p}_1 \quad \tilde{P}^{-1} A_n \mathbf{p}_2 \ \cdots \ \tilde{P}^{-1} A_n \mathbf{p}_{n-1}\right). \qquad (12.24)$$

The vector a_n can be expressed as

$$\mathbf{a}_n = (\mathbf{a}_n \ \mathbf{p}_1 \ \mathbf{p}_2 \ \cdots \ \mathbf{p}_{n-1}) \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \tilde{P} \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

Therefore, we have

$$\tilde{P}^{-1} A_n \mathbf{a}_n = \tilde{P}^{-1} \alpha_n \mathbf{a}_n = \alpha_n \tilde{P}^{-1} \tilde{P} \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \alpha_n \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_n \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

Thus, from (12.24) we see that Ã_n can be expressed in the matrix form (12.23).

By the hypothesis of mathematical induction, we assume that there exist a (n - 1, n - 1) non-singular square matrix P_{n-1} and an upper triangle matrix Δ_{n-1} such that

$$P_{n-1}^{-1} A_{n-1} P_{n-1} = \Delta_{n-1}. \qquad (12.25)$$

Let us define the following matrix:

$$P_n = \begin{pmatrix} d & 0 \\ 0 & P_{n-1} \end{pmatrix},$$

where d ≠ 0 and the off-diagonal 0's denote zero blocks. Then P_n is a (n, n) non-singular square matrix; remember that detP_n = d(detP_{n-1}) ≠ 0. Operating P_n on Ã_n from the right, we have

$$\begin{pmatrix} \alpha_n & \mathbf{x}_{n-1}^T \\ 0 & A_{n-1} \end{pmatrix} \begin{pmatrix} d & 0 \\ 0 & P_{n-1} \end{pmatrix} = \begin{pmatrix} \alpha_n d & \mathbf{x}_{n-1}^T P_{n-1} \\ 0 & A_{n-1} P_{n-1} \end{pmatrix}, \qquad (12.26)$$

where x_{n-1}^T is the transpose of the column vector (x1 ⋯ x_{n-1})ᵀ. Therefore, x_{n-1}^T P_{n-1} is a (1, n - 1) matrix (i.e., a row vector). Meanwhile, we have

$$\begin{pmatrix} d & 0 \\ 0 & P_{n-1} \end{pmatrix} \begin{pmatrix} \alpha_n & \mathbf{x}_{n-1}^T P_{n-1}/d \\ 0 & \Delta_{n-1} \end{pmatrix} = \begin{pmatrix} \alpha_n d & \mathbf{x}_{n-1}^T P_{n-1} \\ 0 & P_{n-1} \Delta_{n-1} \end{pmatrix}. \qquad (12.27)$$

From the assumption (12.25), the right-hand sides of (12.26) and (12.27) coincide, and therefore

$$\text{LHS of } (12.26) = \text{LHS of } (12.27). \qquad (12.28)$$

That is,

$$\tilde{A}_n P_n = P_n \Delta_n. \qquad (12.29)$$

In (12.29) we define Δ_n such that

$$\Delta_n = \begin{pmatrix} \alpha_n & \mathbf{x}_{n-1}^T P_{n-1}/d \\ 0 & \Delta_{n-1} \end{pmatrix}, \qquad (12.30)$$

which has appeared in the LHS of (12.27). As Δ_{n-1} is a triangle matrix from the assumption, Δ_n is a triangle matrix as well. Combining (12.22) and (12.29), we finally get

$$\Delta_n = \left(\tilde{P} P_n\right)^{-1} A_n \tilde{P} P_n. \qquad (12.31)$$

Notice that P̃P_n is a non-singular matrix and that (P̃P_n)(P_n^{-1}P̃^{-1}) = E; hence, (P̃P_n)^{-1} = P_n^{-1}P̃^{-1}. Equation (12.31) obviously shows that A_n has been converted to a triangle matrix Δ_n. This completes the proof.

Equation (12.31) implies that the eigenvalues are disposed on the diagonal positions of a triangle matrix. Triangle matrices can further undergo similarity transformations.

Example 12.1 Let us think of the following triangle matrix A:

$$A = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}. \qquad (12.32)$$

The eigenvalues of A are 2 and 1. Remember that the diagonal elements of a triangle matrix give its eigenvalues. According to (12.20), the vector (1 0)ᵀ can be chosen as an eigenvector (in the column vector representation) corresponding to the eigenvalue 2. Another eigenvector can be chosen to be (-1 1)ᵀ. This is because for the eigenvalue 1 we obtain

$$\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} \mathbf{x} = 0$$

as the eigenvalue equation (A - E)x = 0. Therefore, with an eigenvector (c1 c2)ᵀ corresponding to the eigenvalue 1, we get c1 + c2 = 0, so that (-1 1)ᵀ is a simple choice of the eigenvector. Hence, putting

$$P = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix},$$

the similarity transformation is carried out such that

$$P^{-1}AP = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}. \qquad (12.33)$$
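This diagonalization can be verified numerically; a quick sketch assuming NumPy:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])
P = np.array([[1.0, -1.0],
              [0.0,  1.0]])
P_inv = np.array([[1.0, 1.0],
                  [0.0, 1.0]])      # inverse of P, as used in (12.33)

D = P_inv @ A @ P                   # should equal diag(2, 1)

# By (12.15), the eigenvalues of the triangle matrix A are its
# diagonal elements 2 and 1.
eigvals = np.sort(np.linalg.eigvals(A).real)
```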

This is a simple example of matrix diagonalization.

Regarding eigenvalue/eigenvector problems, we have another important theorem.

Theorem 12.2 Eigenvectors corresponding to different eigenvalues of A are linearly independent.

Proof We prove the theorem by mathematical induction. Let α1 and α2 be two different eigenvalues of a matrix A and let a1 and a2 be eigenvectors corresponding to α1 and α2, respectively. Let us think of the following equation:

$$c_1 \mathbf{a}_1 + c_2 \mathbf{a}_2 = 0. \qquad (12.34)$$

Suppose that a1 and a2 are linearly dependent. Then, without loss of generality we can put c1 ≠ 0. Accordingly, we get

$$\mathbf{a}_1 = -\frac{c_2}{c_1} \mathbf{a}_2. \qquad (12.35)$$

Operating A from the left on (12.35), we have

$$\alpha_1 \mathbf{a}_1 = -\frac{c_2}{c_1} \alpha_2 \mathbf{a}_2 = \alpha_2 \mathbf{a}_1. \qquad (12.36)$$

In the second equality we have used (12.35). From (12.36) we have

$$(\alpha_1 - \alpha_2)\, \mathbf{a}_1 = 0. \qquad (12.37)$$

As α1 ≠ α2, we have α1 - α2 ≠ 0. This implies a1 = 0, contradicting the fact that a1 is an eigenvector. Thus, a1 and a2 must be linearly independent.

Next we assume that Theorem 12.2 is true of the case where we have (n - 1) eigenvalues α1, α2, ⋯, and α_{n-1} that are different from one another and the corresponding eigenvectors a1, a2, ⋯, and a_{n-1} are linearly independent. Let us think of the following equation:

$$c_1 \mathbf{a}_1 + c_2 \mathbf{a}_2 + \cdots + c_{n-1} \mathbf{a}_{n-1} + c_n \mathbf{a}_n = 0, \qquad (12.38)$$

where a_n is an eigenvector corresponding to an eigenvalue αn. Suppose here that a1, a2, ⋯, and a_n are linearly dependent. If c_n = 0, we have

$$c_1 \mathbf{a}_1 + c_2 \mathbf{a}_2 + \cdots + c_{n-1} \mathbf{a}_{n-1} = 0. \qquad (12.39)$$

But from the linear independence of a1, a2, ⋯, and a_{n-1} we have c1 = c2 = ⋯ = c_{n-1} = 0. Thus, all the coefficients in (12.38) vanish, in contradiction to the assumption that a1, a2, ⋯, and a_n are linearly dependent. We should therefore have c_n ≠ 0. Accordingly, we get

$$\mathbf{a}_n = -\left(\frac{c_1}{c_n}\mathbf{a}_1 + \frac{c_2}{c_n}\mathbf{a}_2 + \cdots + \frac{c_{n-1}}{c_n}\mathbf{a}_{n-1}\right). \qquad (12.40)$$

Operating A from the left on (12.38) again, we have

$$\alpha_n \mathbf{a}_n = -\left(\frac{c_1}{c_n}\alpha_1\mathbf{a}_1 + \frac{c_2}{c_n}\alpha_2\mathbf{a}_2 + \cdots + \frac{c_{n-1}}{c_n}\alpha_{n-1}\mathbf{a}_{n-1}\right). \qquad (12.41)$$

Here we think of two cases: (1) αn ≠ 0 and (2) αn = 0.

1. Case I: αn ≠ 0. Multiplying both sides of (12.40) by αn, we have

$$\alpha_n \mathbf{a}_n = -\left(\frac{c_1}{c_n}\alpha_n\mathbf{a}_1 + \frac{c_2}{c_n}\alpha_n\mathbf{a}_2 + \cdots + \frac{c_{n-1}}{c_n}\alpha_n\mathbf{a}_{n-1}\right). \qquad (12.42)$$

Subtracting (12.42) from (12.41), we get

$$0 = \frac{c_1}{c_n}(\alpha_n - \alpha_1)\mathbf{a}_1 + \frac{c_2}{c_n}(\alpha_n - \alpha_2)\mathbf{a}_2 + \cdots + \frac{c_{n-1}}{c_n}(\alpha_n - \alpha_{n-1})\mathbf{a}_{n-1}. \qquad (12.43)$$

Since we assume that the eigenvalues are different from one another, we have αn ≠ α1, αn ≠ α2, ⋯, αn ≠ α_{n-1}. From the linear independence of a1, a2, ⋯, and a_{n-1}, this implies that c1 = c2 = ⋯ = c_{n-1} = 0. From (12.40) we then have a_n = 0. This is, however, in contradiction to the fact that a_n is an eigenvector. This means that our original supposition that a1, a2, ⋯, and a_n are linearly dependent was wrong. Thus, the eigenvectors a1, a2, ⋯, and a_n must be linearly independent.

2. Case II: αn = 0. Suppose again that a1, a2, ⋯, and a_n are linearly dependent. Since, as before, c_n ≠ 0, we get (12.40) and (12.41) once again. Putting αn = 0 in (12.41), we have

$$0 = -\left(\frac{c_1}{c_n}\alpha_1\mathbf{a}_1 + \frac{c_2}{c_n}\alpha_2\mathbf{a}_2 + \cdots + \frac{c_{n-1}}{c_n}\alpha_{n-1}\mathbf{a}_{n-1}\right). \qquad (12.44)$$

Since the eigenvalues are different, we should have α1 ≠ 0, α2 ≠ 0, ⋯, and α_{n-1} ≠ 0. Then, considering that a1, a2, ⋯, and a_{n-1} are linearly independent, for (12.44) to hold we must have c1 = c2 = ⋯ = c_{n-1} = 0. But from (12.40) we then have a_n = 0, again in contradiction to the fact that a_n is an eigenvector. Thus, the eigenvectors a1, a2, ⋯, and a_n must be linearly independent. This completes the proof.
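Theorem 12.2 can be illustrated numerically: for a matrix with distinct eigenvalues, the matrix whose columns are the eigenvectors has full rank. A sketch assuming NumPy, with a made-up triangle matrix whose eigenvalues 2, 3, and 5 are distinct:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])     # distinct eigenvalues 2, 3, 5

vals, vecs = np.linalg.eig(A)
# Eigenvectors belonging to distinct eigenvalues are linearly
# independent, so the eigenvector matrix has full rank.
rank = np.linalg.matrix_rank(vecs)
```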

12.2 Eigenspaces and Invariant Subspaces

Equations (12.21), (12.30), and (12.31) show that if we adopt an eigenvector as one of the basis vectors, the matrix representation of the linear transformation A in reference to such basis vectors is obtained so that the leftmost column is zero except for the top left corner, on which an eigenvalue is positioned. (Note that if the said eigenvector is zero, the leftmost column is a zero vector.) Meanwhile, neither Theorem 12.1 nor Theorem 12.2 tells us about the multiplicity of eigenvalues. If the eigenvalues have multiplicity, we have to think about different aspects. This is a major issue of this section.

Let us start with a discussion of invariant subspaces. Let A be a (n, n) square matrix. If a subspace W in Vn is characterized by

x ∈ W ⟹ Ax ∈ W,

W is said to be invariant with respect to A (or simply A-invariant) or an invariant subspace in Vn = Span{a1, a2, ⋯, an}. Suppose that x is an eigenvector of A and that its corresponding eigenvalue is α. Then, Span{x} is an invariant subspace of Vn. This is because A(cx) = cAx = cαx = α(cx), and cx is again an eigenvector belonging to α.

Suppose that dimW = m (m ≤ n) and that W = Span{a1, a2, ⋯, am}. If W is A-invariant, A causes a linear transformation within W. In that case, expressing A in reference to a1, a2, ⋯, am, a_{m+1}, ⋯, an, we have

$$(\mathbf{a}_1 \ \mathbf{a}_2 \ \cdots \ \mathbf{a}_m \ \mathbf{a}_{m+1} \ \cdots \ \mathbf{a}_n)\, A = (\mathbf{a}_1 \ \mathbf{a}_2 \ \cdots \ \mathbf{a}_m \ \mathbf{a}_{m+1} \ \cdots \ \mathbf{a}_n) \begin{pmatrix} A_m & * \\ 0 & * \end{pmatrix}, \qquad (12.45)$$

where A_m is a (m, m) square matrix and "zero" denotes a (n - m, m) zero matrix. Notice that the transformation A converts the remaining (n - m) basis vectors a_{m+1}, a_{m+2}, ⋯, and a_n in Vn into linear combinations of a1, a2, ⋯, and a_n. The triangle matrix Δ_n given in (12.30) and (12.31) is an example in which A_m is a (1, 1) matrix (i.e., simply a complex number).

Let us examine the properties of the A-invariant subspace still further. Let a be any vector in Vn and think of the following (n + 1) vectors [2]:

a, Aa, A²a, ⋯, Aⁿa.


These vectors are linearly dependent, since there are at most n linearly independent vectors in Vn. These vectors span a subspace in Vn. Let us call this subspace M; i.e.,

M ≡ Span{a, Aa, A²a, ⋯, Aⁿa}.

Consider the following equation:

$$c_0 \mathbf{a} + c_1 A\mathbf{a} + c_2 A^2\mathbf{a} + \cdots + c_n A^n\mathbf{a} = 0. \qquad (12.46)$$

Not all c_i (0 ≤ i ≤ n) are zero, because the vectors are linearly dependent. Suppose that c_n ≠ 0. Then, from (12.46) we have

$$A^n\mathbf{a} = -\frac{1}{c_n}\left(c_0\mathbf{a} + c_1 A\mathbf{a} + c_2 A^2\mathbf{a} + \cdots + c_{n-1} A^{n-1}\mathbf{a}\right).$$

Operating A on the above equation from the left, we have

$$A^{n+1}\mathbf{a} = -\frac{1}{c_n}\left(c_0 A\mathbf{a} + c_1 A^2\mathbf{a} + c_2 A^3\mathbf{a} + \cdots + c_{n-1} A^n\mathbf{a}\right).$$

Thus, An + 1a is contained in M. That is, we have An + 1a 2 Span{a, Aa, A2a, ⋯, A a}. Next, suppose that cn = 0. Then, at least one of ci (0 ≤ i ≤ n - 1) is not zero. Suppose that cn - 1 ≠ 0. From (12.46), we have n

An - 1 a = -

1 c a þ c1 Aa þ c2 A2 a þ ⋯ þ cn - 2 An - 2 a : cn - 1 0

Operating A2 on the above equation from the left, we have Anþ1 a = -

1 c A2 a þ c1 A3 a þ c2 A4 a þ ⋯ þ cn - 2 An a : cn - 1 0

Again, An + 1a is contained in M. Repeating similar procedures, we reach c0 a þ c1 Aa = 0: If c1 = 0, then we must have c0 ≠ 0. If so, a = 0. This is impossible, however, because we should have a ≠ 0. Then, we have c1 ≠ 0 and, hence, Aa = -

c0 a: c1

Operating once again An on the above equation from the left, we have

12.2

Eigenspaces and Invariant Subspaces

497

Anþ1 a = -

c0 n A a: c1

Thus, once again An + 1a is contained in M. In the above discussion, we get AM ⊂ M. Further A, A2M ⊂ AM ⊂ M, A3M ⊂ A2M ⊂ AM ⊂ M, ⋯. Then we have

operating

a, Aa, A2 a, ⋯, An a, Anþ1 a, ⋯ 2 Span a, Aa, A2 a, ⋯, An a : That is, M is an A-invariant subspace. We also have m  dimM ≤ dimV n = n: There are m basis vectors in M, and so representing A in a matrix form in reference to the n basis vectors of Vn including these m vectors, we have A=

Am 0

 

:

ð12:47Þ

Note again that in (12.45) Vⁿ is spanned by the m basis vectors in M together with (n − m) other linearly independent vectors. We may encounter a situation where two subspaces W₁ and W₂ are at once A-invariant. In that case we can take basis vectors a₁, a₂, ⋯, aₘ for W₁ and a_{m+1}, a_{m+2}, ⋯, aₙ for W₂. With respect to such a₁, a₂, ⋯, aₙ as basis vectors of Vⁿ, we have

$$A = \begin{pmatrix} A_m & 0 \\ 0 & A_{n-m} \end{pmatrix}, \tag{12.48}$$

where $A_{n-m}$ is an (n − m, n − m) square matrix and each "zero" denotes an (n − m, m) or (m, n − m) zero matrix. Alternatively, we denote

$$A = A_m \oplus A_{n-m}.$$

In this situation, the matrix A is said to be reducible. As stated above, A causes a linear transformation within both W₁ and W₂. In other words, $A_m$ and $A_{n-m}$ cause linear transformations within W₁ and W₂ with respect to a₁, a₂, ⋯, aₘ and a_{m+1}, a_{m+2}, ⋯, aₙ, respectively. In this case Vⁿ can be represented as a direct sum of W₁ and W₂ such that

$$V^n = W_1 \oplus W_2 = \mathrm{Span}\{a_1, a_2, \cdots, a_m\} \oplus \mathrm{Span}\{a_{m+1}, a_{m+2}, \cdots, a_n\}. \tag{12.49}$$
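As a small numerical aside (not taken from the text), the invariance expressed by (12.48) can be checked directly: a block-diagonal matrix maps each subspace spanned by the corresponding basis vectors into itself. The 3×3 matrix below is an arbitrary illustration with W₁ = Span{e₁, e₂} and W₂ = Span{e₃}.

```python
import numpy as np

# A reducible (block-diagonal) matrix: a 2x2 block A_m and a 1x1 block A_{n-m}.
A = np.array([[1.0, 2.0, 0.0],
              [3.0, 4.0, 0.0],
              [0.0, 0.0, 5.0]])

w1 = np.array([1.0, 1.0, 0.0])   # a vector in W1 = Span{e1, e2}
w2 = np.array([0.0, 0.0, 2.0])   # a vector in W2 = Span{e3}

print(A @ w1)   # [3. 7. 0.]  -> stays in W1 (third component remains zero)
print(A @ w2)   # [ 0.  0. 10.]  -> stays in W2
```

Both images remain inside their respective subspaces, which is exactly the A-invariance that makes the direct-sum decomposition (12.49) possible.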


This is because Span{a₁, a₂, ⋯, aₘ} ∩ Span{a_{m+1}, a_{m+2}, ⋯, aₙ} = {0}. In fact, if the two subspaces possessed x (≠ 0) in common, a₁, a₂, ⋯, aₙ would become linearly dependent, in contradiction. The vector space Vⁿ may well be decomposed further into subspaces of lower dimension. For further discussion we need the following theorem and the concepts of a generalized eigenvector and a generalized eigenspace.

Theorem 12.3 (Hamilton–Cayley Theorem [3, 4]) Let f_A(x) be the characteristic polynomial pertinent to a linear transformation A: Vⁿ → Vⁿ. Then f_A(A)x = 0 for ∀x ∈ Vⁿ. That is, Ker f_A(A) = Vⁿ.

Proof To prove the theorem we use the following well-known property of determinants. Namely, let A be an (n, n) square matrix expressed as

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.$$

Let $\widetilde{A}$ be the cofactor matrix of A, namely

$$\widetilde{A} = \begin{pmatrix} \Delta_{11} & \cdots & \Delta_{1n} \\ \vdots & \ddots & \vdots \\ \Delta_{n1} & \cdots & \Delta_{nn} \end{pmatrix},$$

where Δᵢⱼ is the cofactor of aᵢⱼ. Then

$$\widetilde{A}^T A = A\widetilde{A}^T = |A|E, \tag{12.50}$$

where $\widetilde{A}^T$ is the transposed matrix of $\widetilde{A}$. We now apply (12.50) to the characteristic polynomial to get the following equation:

$$\widetilde{(xE-A)}^T(xE-A) = (xE-A)\widetilde{(xE-A)}^T = |xE-A|E = f_A(x)E, \tag{12.51}$$

where $\widetilde{xE-A}$ is the cofactor matrix of xE − A. Let the cofactor of the (i, j)-element of (xE − A) be Δᵢⱼ. Note in this case that Δᵢⱼ is an at most (n − 1)-th order polynomial in x. Let us put accordingly

$$\Delta_{ij} = b_{ij,0}x^{n-1} + b_{ij,1}x^{n-2} + \cdots + b_{ij,n-1}. \tag{12.52}$$

Also, we put $B_k = (b_{ij,k})$. Then we have

$$\widetilde{xE-A} = \begin{pmatrix} \Delta_{11} & \cdots & \Delta_{1n} \\ \vdots & \ddots & \vdots \\ \Delta_{n1} & \cdots & \Delta_{nn} \end{pmatrix} = \begin{pmatrix} b_{11,0}x^{n-1} + \cdots + b_{11,n-1} & \cdots & b_{1n,0}x^{n-1} + \cdots + b_{1n,n-1} \\ \vdots & \ddots & \vdots \\ b_{n1,0}x^{n-1} + \cdots + b_{n1,n-1} & \cdots & b_{nn,0}x^{n-1} + \cdots + b_{nn,n-1} \end{pmatrix}$$
$$= x^{n-1}\begin{pmatrix} b_{11,0} & \cdots & b_{1n,0} \\ \vdots & \ddots & \vdots \\ b_{n1,0} & \cdots & b_{nn,0} \end{pmatrix} + \cdots + \begin{pmatrix} b_{11,n-1} & \cdots & b_{1n,n-1} \\ \vdots & \ddots & \vdots \\ b_{n1,n-1} & \cdots & b_{nn,n-1} \end{pmatrix} = x^{n-1}B_0 + x^{n-2}B_1 + \cdots + B_{n-1}. \tag{12.53}$$

Thus, we get

$$|xE-A|E = (xE-A)\left(x^{n-1}B_0^T + x^{n-2}B_1^T + \cdots + B_{n-1}^T\right). \tag{12.54}$$

Replacing x with A and considering that $B_l^T$ (l = 0, ⋯, n − 1) commutes with A, we have

$$f_A(A) = (A-A)\left(A^{n-1}B_0^T + A^{n-2}B_1^T + \cdots + B_{n-1}^T\right) = 0. \tag{12.55}$$

This means that f_A(A)x = 0 for ∀x ∈ Vⁿ. These complete the proof.

In relation to the Hamilton–Cayley theorem, we consider the important concept of a minimal polynomial.

Definition 12.1 Let f(x) be a polynomial of x with scalar coefficients such that f(A) = 0, where A is an (n, n) matrix. Among such f(x), the lowest-order polynomial whose highest-order coefficient is one is said to be the minimal polynomial. We denote it by φ_A(x); i.e., φ_A(A) = 0.

We have an important property here: the minimal polynomial φ_A(x) is a divisor of any f(x) with f(A) = 0. In fact, suppose that we have

$$f(x) = g(x)\varphi_A(x) + h(x).$$

Inserting A into x, we have

$$f(A) = g(A)\varphi_A(A) + h(A) = h(A) = 0.$$

From the above equation, h(x) would be a polynomial whose order is lower than that of φ_A(x) and which nonetheless satisfies h(A) = 0. But the presence of such a non-zero h(x) contradicts the definition of the minimal polynomial. This implies that h(x) ≡ 0. Thus, we get

$$f(x) = g(x)\varphi_A(x).$$

That is, φ_A(x) must be a divisor of f(x). Suppose that φ′_A(x) is another minimal polynomial. If the order of φ′_A(x) were lower than that of φ_A(x), we could have chosen φ′_A(x) as the minimal polynomial from the beginning. Thus we assume that φ_A(x) and φ′_A(x) are of the same order. We have

$$f(x) = g(x)\varphi_A(x) = g'(x)\varphi'_A(x).$$

Then, we have

$$\varphi_A(x)/\varphi'_A(x) = g'(x)/g(x) = c, \tag{12.56}$$

where c is a constant, because φ_A(x) and φ′_A(x) are of the same order. But c must be one, since the highest-order coefficient of a minimal polynomial is one. Thus, φ_A(x) is uniquely defined.
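The Hamilton–Cayley theorem and the divisor property of the minimal polynomial can be illustrated numerically; the 3×3 matrix A below is an arbitrary example chosen for this sketch, not taken from the text.

```python
import numpy as np

# Numerical check of the Hamilton-Cayley theorem (Theorem 12.3): f_A(A) = 0.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])

c = np.poly(A)          # coefficients of f_A(x) = |xE - A|, highest power first
n = A.shape[0]

# Evaluate f_A(A) by Horner's scheme.
F = np.zeros_like(A)
for coef in c:
    F = F @ A + coef * np.eye(n)
print(np.allclose(F, 0))    # True: the characteristic polynomial annihilates A

# The minimal polynomial divides f_A(x) = (x-2)^2 (x-3).  Here the proper
# divisor (x-2)(x-3) does NOT annihilate A, so phi_A(x) = f_A(x) for this A.
E = np.eye(n)
print(np.allclose((A - 2*E) @ (A - 3*E), 0))   # False
```

The second check reflects that A contains a 2×2 non-diagonalizable block for the eigenvalue 2, so no polynomial of lower order can vanish at A.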

12.3 Generalized Eigenvectors and Nilpotent Matrices

Equation (12.4) ensures that an eigenvalue is accompanied by an eigenvector. Therefore, if a matrix A : Vⁿ → Vⁿ has n different eigenvalues without multiple roots, the vector space Vⁿ is decomposed into a direct sum of one-dimensional subspaces spanned by the individual linearly independent eigenvectors (see the discussion of Sect. 12.1). Thus, we have

$$V^n = W_1 \oplus W_2 \oplus \cdots \oplus W_n = \mathrm{Span}\{a_1\} \oplus \mathrm{Span}\{a_2\} \oplus \cdots \oplus \mathrm{Span}\{a_n\}, \tag{12.57}$$

where aᵢ (1 ≤ i ≤ n) are the eigenvectors corresponding to the n different eigenvalues. The situation, however, is not always this simple. Even when a matrix has eigenvalues with multiple roots, we may still have a simple case, as shown in the next example.

Example 12.2 Let us think of the following matrix A : V³ → V³ described by

$$A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}. \tag{12.58}$$


The matrix has a triple root 2. As can easily be seen below, the individual eigenvectors a₁, a₂, and a₃ form basis vectors of each invariant subspace, indicating that V³ can be decomposed into a direct sum of the three invariant subspaces as in (12.57) such that

$$(a_1\ a_2\ a_3)\begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix} = (2a_1\ \ 2a_2\ \ 2a_3).$$

Let us think of another simple example.

Example 12.3 Let us think of a linear transformation A : V² → V² expressed as

$$A(x) = (a_1\ a_2)\begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \tag{12.59}$$

where a₁ and a₂ are basis vectors and, for ∀x ∈ V², x = x₁a₁ + x₂a₂. This example has a double root 3. We have

$$(a_1\ a_2)\begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix} = (3a_1\ \ a_1 + 3a_2). \tag{12.60}$$
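The defect hinted at in (12.60) shows up numerically: although 3 is a double eigenvalue, the matrix of Example 12.3 has only a one-dimensional eigenspace (this NumPy check is an illustration added here, not part of the text).

```python
import numpy as np

# The matrix of Example 12.3: double eigenvalue 3, but only one proper eigenvector.
A = np.array([[3.0, 1.0],
              [0.0, 3.0]])

w, V = np.linalg.eig(A)
print(w)   # [3. 3.] -> eigenvalue 3 with multiplicity 2

# The eigenspace Ker(A - 3E) is only one-dimensional:
print(np.linalg.matrix_rank(A - 3*np.eye(2)))   # 1, so nullity = 2 - 1 = 1
```

Because the eigenspace dimension (1) is smaller than the multiplicity (2), the eigenvectors alone cannot span V², which is exactly why generalized eigenvectors are introduced below.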

Thus, Span{a₁} is A-invariant, but this is not the case with Span{a₂}. This implies that a₁ is an eigenvector corresponding to the eigenvalue 3 but a₂ is not. A detailed discussion of matrices of this kind follows below.

Nilpotent matrices play an important role in matrix algebra. These matrices are defined as follows.

Definition 12.2 Let N be a linear transformation in a vector space Vⁿ. Suppose that we have

$$N^k = 0 \quad\text{and}\quad N^{k-1} \ne 0, \tag{12.61}$$

where N is an (n, n) square matrix and k (≥ 2) is a certain natural number. Then, N is said to be a nilpotent matrix of order k, or a k-th order nilpotent matrix. If (12.61) holds with k = 1, N is a zero matrix.

Nilpotent matrices have the following properties:

1. Eigenvalues of a nilpotent matrix are zero.

Let N be a k-th order nilpotent matrix. Suppose that

$$Nx = \alpha x, \tag{12.62}$$

where α is an eigenvalue and x (≠ 0) is its corresponding eigenvector. Operating with N (k − 1) more times from the left on both sides of (12.62), we have

$$N^k x = \alpha^k x. \tag{12.63}$$

Meanwhile, Nᵏ = 0 by definition, and so αᵏ = 0, namely α = 0.

Conversely, suppose that the eigenvalues of an (n, n) matrix N are all zero. From Theorem 12.1, via a suitable similarity transformation with P we have

$$P^{-1}NP = \widetilde{N},$$

where $\widetilde{N}$ is a triangle matrix. Then, using (12.12) and (12.15) we have

$$f_{P^{-1}NP}(x) = f_{\widetilde{N}}(x) = f_N(x) = x^n.$$

From Theorem 12.3, we have

$$f_N(N) = N^n = 0.$$

Namely, N is a nilpotent matrix. In a trivial case, we have N = 0 (the zero matrix). By Definition 12.2, we have Nᵏ = 0 for a k-th order nilpotent (n, n) matrix N; its minimal polynomial is then φ_N(x) = xᵏ (k ≤ n).

2. A nilpotent matrix N is not diagonalizable (except for a zero matrix).

Suppose that N is diagonalizable. Then N can be diagonalized by a non-singular matrix P such that

$$P^{-1}NP = 0.$$

The above equation holds because N only takes eigenvalues of zero. Operating with P from the left of the above equation and with P⁻¹ from the right, we have

$$N = 0.$$

This means that N would be a zero matrix, in contradiction. Thus, a nilpotent matrix N is not diagonalizable.


Example 12.4 Let N be a matrix of the following form:

$$N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.$$

Then, we can easily check that N² = 0. Therefore, N is a nilpotent matrix of the second order. Note that N is an upper triangle matrix, and so its eigenvalues are given by the diagonal elements. In the present case the eigenvalue is zero (as a double root), consistent with the aforementioned property. With a nilpotent matrix of order k, we have at least one vector x such that N^{k−1}x ≠ 0.

Here we add that a zero transformation A (or matrix) can be defined as

$$A = 0 \iff Ax = 0 \ \text{for}\ \forall x \in V^n. \tag{12.64}$$

Taking the contraposition of this, we have

$$A \ne 0 \iff Ax \ne 0 \ \text{for}\ \exists x \in V^n.$$

In relation to nilpotent matrices, we have the following important theorem.

Theorem 12.4 If N is a k-th order nilpotent matrix, then for ∃x ∈ Vⁿ the following k vectors are linearly independent:

$$x,\ Nx,\ N^2 x,\ \cdots,\ N^{k-1} x.$$

Proof Let x be a vector such that N^{k−1}x ≠ 0 (such x exists, as noted above) and think of the following equation:

$$\sum_{i=0}^{k-1} c_i N^i x = 0. \tag{12.65}$$

Multiplying (12.65) by N^{k−1} from the left and using Nᵏ = 0, we get

$$c_0 N^{k-1} x = 0. \tag{12.66}$$

This implies that c₀ = 0. Thus, we are left with

$$\sum_{i=1}^{k-1} c_i N^i x = 0. \tag{12.67}$$

Next, multiplying (12.67) by N^{k−2} from the left, we similarly get

$$c_1 N^{k-1} x = 0, \tag{12.68}$$

implying that c₁ = 0. Continuing this procedure, we finally get cᵢ = 0 (0 ≤ i ≤ k − 1). This completes the proof.
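Theorem 12.4 can be illustrated with a small numerical sketch (an addition, not from the text): for the third-order nilpotent "shift" matrix below, the three vectors x, Nx, N²x are linearly independent.

```python
import numpy as np

# A 3rd-order nilpotent (shift) matrix: N^3 = 0 but N^2 != 0.
N = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])

assert np.allclose(np.linalg.matrix_power(N, 3), 0)       # N^3 = 0
assert not np.allclose(np.linalg.matrix_power(N, 2), 0)   # N^2 != 0, so k = 3

x = np.array([0.0, 0.0, 1.0])          # a vector with N^2 x != 0
V = np.column_stack([x, N @ x, N @ N @ x])
print(np.linalg.matrix_rank(V))        # 3 -> x, Nx, N^2 x are linearly independent
```

The full-rank result is exactly the linear independence asserted by Theorem 12.4 for k = n = 3.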


This immediately tells us that for a k-th order nilpotent matrix N : Vⁿ → Vⁿ we should have k ≤ n. This is because the number of linearly independent vectors is at most n. In Example 12.4, N causes a transformation of the basis vectors in V² such that

$$(e_1\ e_2)N = (e_1\ e_2)\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = (0\ \ e_1).$$

That is, Ne₂ = e₁. Then, the two linearly independent vectors e₂ and Ne₂ correspond to the case of Theorem 12.4.

So far we have shown simple cases where matrices can be diagonalized via similarity transformation. This is equivalent to saying that the relevant vector space is decomposed into a direct sum of (invariant) subspaces. Nonetheless, if a characteristic polynomial of the matrix has multiple root(s), it remains uncertain whether the vector space can be decomposed into such a direct sum. To answer this question, we need the following lemma.

Lemma 12.1 Let f₁(x), f₂(x), ⋯, and fₛ(x) be polynomials without a common factor. Then there exist s polynomials M₁(x), M₂(x), ⋯, Mₛ(x) that satisfy the following relation:

$$M_1(x)f_1(x) + M_2(x)f_2(x) + \cdots + M_s(x)f_s(x) = 1. \tag{12.69}$$

Proof Let Mᵢ(x) (1 ≤ i ≤ s) be arbitrarily chosen polynomials and deal with the set of polynomials g(x) that can be expressed as

$$g(x) = M_1(x)f_1(x) + M_2(x)f_2(x) + \cdots + M_s(x)f_s(x). \tag{12.70}$$

Let the whole set of such g(x) be H. Then H has the following two properties:

1. g₁(x), g₂(x) ∈ H ⟹ g₁(x) + g₂(x) ∈ H.
2. g(x) ∈ H and M(x) an arbitrarily chosen polynomial ⟹ M(x)g(x) ∈ H.

Now let the lowest-order polynomial in the set H be g₀(x). Then every g(x) (∈ H) is a multiple of g₀(x). To see this, suppose that, dividing g(x) by g₀(x), we have

$$g(x) = M(x)g_0(x) + h(x), \tag{12.71}$$

where h(x) is a certain polynomial. Since g(x), g₀(x) ∈ H, we have h(x) ∈ H from the above properties (1) and (2). If h(x) ≠ 0, the order of h(x) is lower than that of g₀(x) from (12.71). This is, however, in contradiction to the definition of g₀(x). Therefore, h(x) = 0. Thus,

$$g(x) = M(x)g_0(x). \tag{12.72}$$


This implies that H is identical to the whole set of polynomials comprising the multiples of g₀(x). In particular, the polynomials fᵢ(x) (1 ≤ i ≤ s) belong to H; to show this, put Mᵢ(x) = 1 and the other Mⱼ(x) = 0 ( j ≠ i) in (12.70). Hence, the polynomial g₀(x) must be a common factor of the fᵢ(x). Meanwhile, g₀(x) ∈ H, and so by virtue of (12.70) we have

$$g_0(x) = M_1(x)f_1(x) + M_2(x)f_2(x) + \cdots + M_s(x)f_s(x). \tag{12.73}$$

By assumption the greatest common factor of f₁(x), f₂(x), ⋯, and fₛ(x) is 1. This implies that g₀(x) = 1. Thus, we finally get (12.69) and complete the proof.
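Lemma 12.1 is a polynomial Bézout identity; for two coprime polynomials it can be sketched with SymPy's extended Euclidean algorithm `gcdex` (the polynomials f₁, f₂ below are arbitrary choices for illustration).

```python
from sympy import symbols, gcdex, expand

# A sketch of Lemma 12.1 for s = 2: find M1, M2 with M1*f1 + M2*f2 = 1.
x = symbols('x')
f1 = (x - 1)**2        # f1 and f2 have no common factor
f2 = (x - 2)**2

M1, M2, g = gcdex(f1, f2)   # extended Euclid: M1*f1 + M2*f2 = g = gcd(f1, f2)
print(g)                     # 1
print(expand(M1*f1 + M2*f2)) # 1
```

For s > 2 polynomials one can iterate the same identity pairwise, which is essentially how the constructive proof above proceeds.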

12.4 Idempotent Matrices and Generalized Eigenspaces

In Sect. 12.1 we showed that eigenvectors corresponding to different eigenvalues of A are linearly independent. Also, if those eigenvalues do not possess multiple roots, the vector space comprises a direct sum of the subspaces of the corresponding eigenvectors. However, how do we have to treat a situation where eigenvalues possess multiple roots? Even in this case there is at least one eigenvector that corresponds to each eigenvalue. To adequately address the question, we need the concepts of generalized eigenvectors and generalized eigenspaces. For this purpose we extend and generalize the concept of eigenvectors. For a certain natural number l, if a vector x (∈ Vⁿ) satisfies the following relations, x is said to be a generalized eigenvector of rank l that corresponds to an eigenvalue α:

$$(A - \alpha E)^l x = 0, \qquad (A - \alpha E)^{l-1} x \ne 0.$$
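As a quick numerical sketch (an addition tied to Example 12.3, not from the text), the vector e₂ is a generalized eigenvector of rank 2 for the matrix [[3, 1], [0, 3]] and its eigenvalue α = 3.

```python
import numpy as np

# Generalized eigenvector of rank 2 for A = [[3,1],[0,3]], alpha = 3.
A = np.array([[3.0, 1.0],
              [0.0, 3.0]])
alpha = 3.0
B = A - alpha * np.eye(2)      # (A - alpha*E)

x = np.array([0.0, 1.0])       # candidate generalized eigenvector (= e2)
print(B @ x)                   # [1. 0.] -> (A - alpha*E)x != 0: not a proper eigenvector
print(B @ (B @ x))             # [0. 0.] -> (A - alpha*E)^2 x = 0: rank-2 generalized eigenvector
```

So e₂ satisfies the defining pair of relations with l = 2, while e₁ = (A − αE)e₂ is a proper eigenvector (rank 1).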

Thus, an eigenvector as defined in (12.3) may be said to be a generalized eigenvector of rank 1. When we need to distinguish it from generalized eigenvectors of rank l (l ≥ 2), we call it a "proper" eigenvector. Thus far we have only been concerned with proper eigenvectors. Below we will present a theorem related to the generalized eigenvectors; in this section, idempotent operators play a role. The definition is simple.

Definition 12.3 An operator A is said to be idempotent if A² = A.

From this simple definition, we can draw several important pieces of information. Let A be an idempotent operator that operates on Vⁿ. Let x be an arbitrary vector in Vⁿ. Then, A²x = A(Ax) = Ax. That is,

$$A(Ax - x) = 0.$$

Then, we have

$$Ax = x \quad\text{or}\quad Ax = 0. \tag{12.74}$$

From Theorem 11.3 (the dimension theorem), for a general case of linear operators we had

$$V^n = A(V^n) \oplus \mathrm{Ker}\,A,$$

where

$$A(V^n) = \{Ax;\ x \in V^n\}, \qquad \mathrm{Ker}\,A = \{x;\ x \in V^n,\ Ax = 0\}.$$

Consequently, if A is an idempotent operator, we must have

$$A(V^n) = \{x;\ x \in V^n,\ Ax = x\}, \qquad \mathrm{Ker}\,A = \{x;\ x \in V^n,\ Ax = 0\}. \tag{12.75}$$

Thus, we find that an idempotent operator A decomposes Vⁿ into a direct sum of A(Vⁿ), on which Ax = x, and Ker A, on which Ax = 0. Conversely, if there exists an operator A that satisfies (12.75), such an A must be an idempotent operator. Meanwhile, we have

$$(E - A)^2 = E - 2A + A^2 = E - 2A + A = E - A.$$

Hence, E − A is an idempotent matrix as well. Moreover, we have

$$A(E - A) = (E - A)A = 0.$$

Putting E − A ≡ B and following a procedure similar to the above,

$$Bx - x = (E - A)x - x = x - Ax - x = -Ax.$$

Therefore, B(Vⁿ) = {x; x ∈ Vⁿ, Bx = x} is identical to Ker A. Writing

$$W_A = \{x;\ x \in V^n,\ Ax = x\}, \qquad \overline{W}_A = \{x;\ x \in V^n,\ Ax = 0\}, \tag{12.76}$$

we get

$$W_A = AV^n, \qquad \overline{W}_A = (E - A)V^n = BV^n = \mathrm{Ker}\,A.$$

That is,

$$V^n = W_A + \overline{W}_A.$$

Suppose that ∃u ∈ $W_A \cap \overline{W}_A$. Then, from (12.76), Au = u and Au = 0, namely u = 0, and so Vⁿ is a direct sum of $W_A$ and $\overline{W}_A$. That is,

$$V^n = W_A \oplus \overline{W}_A.$$

Notice that if we take the identity operator E as the idempotent operator, we are thinking of a trivial case; that is, Vⁿ = Vⁿ ⊕ {0}.

The result can immediately be extended to the case where more idempotent operators take part in the vector space. Let us define operators such that

$$A_1 + A_2 + \cdots + A_s = E,$$

where E is an (n, n) identity matrix, and also

$$A_i A_j = A_i \delta_{ij}.$$

Moreover, we define Wᵢ such that

$$W_i = A_i V^n = \{x;\ x \in V^n,\ A_i x = x\}. \tag{12.77}$$

Then

$$V^n = W_1 \oplus W_2 \oplus \cdots \oplus W_s. \tag{12.78}$$

In fact, suppose that ∃x (∈ Vⁿ) belongs to both Wᵢ and Wⱼ (i ≠ j). Then Aᵢx = x = Aⱼx. Operating with Aⱼ ( j ≠ i) from the left, AⱼAᵢx = Aⱼx = AⱼAⱼx; that is, 0 = x = Aⱼx, implying that Wᵢ ∩ Wⱼ = {0}. Meanwhile,

$$V^n = (A_1 + A_2 + \cdots + A_s)V^n = A_1 V^n + A_2 V^n + \cdots + A_s V^n = W_1 + W_2 + \cdots + W_s. \tag{12.79}$$

As Wᵢ ∩ Wⱼ = {0} (i ≠ j), (12.79) is a direct sum. Thus, (12.78) follows.

Example 12.5 Think of the following transformation A:

$$(e_1\ e_2\ e_3\ e_4)A = (e_1\ e_2\ e_3\ e_4)\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} = (e_1\ e_2\ 0\ 0).$$

Put $x = \sum_{i=1}^{4} x_i e_i$. Then, we have

$$A(x) = \sum_{i=1}^{4} x_i A(e_i) = \sum_{i=1}^{2} x_i e_i \in W_A,$$

$$(E - A)(x) = x - A(x) = \sum_{i=3}^{4} x_i e_i \in \overline{W}_A.$$

In the above,

$$(e_1\ e_2\ e_3\ e_4)(E - A) = (e_1\ e_2\ e_3\ e_4)\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = (0\ 0\ e_3\ e_4).$$

Thus, we have

$$W_A = \mathrm{Span}\{e_1, e_2\}, \qquad \overline{W}_A = \mathrm{Span}\{e_3, e_4\}, \qquad V^4 = W_A \oplus \overline{W}_A.$$
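The checks left for the reader in Example 12.5 can be written out numerically; the following NumPy sketch verifies the idempotency relations and the resulting decomposition of V⁴.

```python
import numpy as np

# The idempotent A of Example 12.5 projects V^4 onto W_A = Span{e1, e2};
# E - A projects onto the complement Span{e3, e4}.
A = np.diag([1.0, 1.0, 0.0, 0.0])
E = np.eye(4)

assert np.allclose(A @ A, A)                  # A is idempotent: A^2 = A
assert np.allclose((E - A) @ (E - A), E - A)  # E - A is idempotent as well
assert np.allclose(A @ (E - A), 0)            # A(E - A) = (E - A)A = 0

x = np.array([1.0, 2.0, 3.0, 4.0])
print(A @ x)        # [1. 2. 0. 0.] -> component of x in W_A
print((E - A) @ x)  # [0. 0. 3. 4.] -> component in the complement
```

Summing the two printed components recovers x, mirroring V⁴ = W_A ⊕ W̄_A.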

The properties of idempotent matrices can easily be checked; this is left for readers. Using the aforementioned idempotent operators, let us introduce the following theorem.

Theorem 12.5 [3, 4] Let A be a linear transformation Vⁿ → Vⁿ. Suppose that a vector x (∈ Vⁿ) satisfies the following relation:

$$(A - \alpha E)^l x = 0, \tag{12.80}$$

where l is a sufficiently large natural number. Then the set of all such vectors x forms an A-invariant subspace that corresponds to the eigenvalue α. Let α₁, α₂, ⋯, αₛ be the eigenvalues of A different from one another. Then Vⁿ is decomposed into a direct sum of the A-invariant subspaces that correspond individually to α₁, α₂, ⋯, αₛ. This is succinctly expressed as follows:

$$V^n = \widetilde{W}_{\alpha_1} \oplus \widetilde{W}_{\alpha_2} \oplus \cdots \oplus \widetilde{W}_{\alpha_s}. \tag{12.81}$$

Here $\widetilde{W}_{\alpha_i}$ (1 ≤ i ≤ s) is given by

$$\widetilde{W}_{\alpha_i} = \{x;\ x \in V^n,\ (A - \alpha_i E)^{l_i} x = 0\}, \tag{12.82}$$

where lᵢ is a sufficiently large natural number. If the multiplicity of αᵢ is nᵢ, then dim $\widetilde{W}_{\alpha_i}$ = nᵢ.

Proof Let us define the aforementioned A-invariant subspaces as $\widetilde{W}_{\alpha_k}$ (1 ≤ k ≤ s). Let f_A(x) be the characteristic polynomial of A. Factorizing f_A(x) into a product of powers of first-degree polynomials, we have

$$f_A(x) = \prod_{i=1}^{s} (x - \alpha_i)^{n_i}, \tag{12.83}$$

where nᵢ is the multiplicity of αᵢ. Let us put

$$f_i(x) = f_A(x)/(x - \alpha_i)^{n_i} = \prod_{j \ne i}^{s} (x - \alpha_j)^{n_j}. \tag{12.84}$$

Then f₁(x), f₂(x), ⋯, and fₛ(x) do not have a common factor. Consequently, Lemma 12.1 tells us that there are polynomials M₁(x), M₂(x), ⋯, and Mₛ(x) such that

$$M_1(x)f_1(x) + \cdots + M_s(x)f_s(x) = 1. \tag{12.85}$$

Replacing x with the matrix A, we get

$$M_1(A)f_1(A) + \cdots + M_s(A)f_s(A) = E. \tag{12.86}$$

Or, defining Mᵢ(A)fᵢ(A) ≡ Aᵢ, we have

$$A_1 + A_2 + \cdots + A_s = E, \tag{12.87}$$

where E is an (n, n) identity matrix. Moreover, we have

$$A_i A_j = A_i \delta_{ij}. \tag{12.88}$$

In fact, if i ≠ j,

$$A_i A_j = M_i(A)f_i(A)M_j(A)f_j(A) = M_i(A)M_j(A)f_i(A)f_j(A) = M_i(A)M_j(A)\prod_{k \ne i}^{s}(A - \alpha_k)^{n_k}\prod_{l \ne j}^{s}(A - \alpha_l)^{n_l}$$
$$= M_i(A)M_j(A)f_A(A)\prod_{k \ne i,j}^{s}(A - \alpha_k)^{n_k} = 0. \tag{12.89}$$

The second equality results from the fact that Mⱼ(A) and fᵢ(A) commute, since both are polynomials of A. The last equality follows from the Hamilton–Cayley theorem. On the basis of (12.87) and (12.89),

$$A_i = A_i E = A_i(A_1 + A_2 + \cdots + A_s) = A_i^2. \tag{12.90}$$

Thus, we find that Aᵢ is an idempotent matrix. Next, let us show that, using the Aᵢ determined above, $A_i V^n$ is identical to $\widetilde{W}_{\alpha_i}$ (1 ≤ i ≤ s). To this end, we define Wᵢ such that

$$W_i = A_i V^n = \{x;\ x \in V^n,\ A_i x = x\}. \tag{12.91}$$

We have

$$(x - \alpha_i)^{n_i} f_i(x) = f_A(x). \tag{12.92}$$

Therefore, from the Hamilton–Cayley theorem we have

$$(A - \alpha_i E)^{n_i} f_i(A) = 0. \tag{12.93}$$

Operating with Mᵢ(A) from the left and using the fact that Mᵢ(A) commutes with $(A - \alpha_i E)^{n_i}$, we get

$$(A - \alpha_i E)^{n_i} A_i = 0, \tag{12.94}$$

where we used Mᵢ(A)fᵢ(A) = Aᵢ. Operating with both sides of this equation on Vⁿ, furthermore, we have

$$(A - \alpha_i E)^{n_i} A_i V^n = 0.$$

This means that

$$A_i V^n \subset \widetilde{W}_{\alpha_i} \quad (1 \le i \le s). \tag{12.95}$$

Conversely, suppose that $x \in \widetilde{W}_{\alpha_i}$. Then (A − αᵢE)ˡx = 0 holds for a certain natural number l. If Mᵢ(x)fᵢ(x) were divisible by x − αᵢ, the LHS of (12.85) would be divisible by x − αᵢ as well, leading to a contradiction. Thus, it follows that (x − αᵢ)ˡ and Mᵢ(x)fᵢ(x) do not have a common factor. Consequently, Lemma 12.1 ensures that we have polynomials M(x) and N(x) such that

$$M(x)(x - \alpha_i)^l + N(x)M_i(x)f_i(x) = 1$$

and, hence,

$$M(A)(A - \alpha_i E)^l + N(A)M_i(A)f_i(A) = E. \tag{12.96}$$

Operating with both sides of (12.96) on x, we get

$$M(A)(A - \alpha_i E)^l x + N(A)M_i(A)f_i(A)x = N(A)A_i x = x. \tag{12.97}$$

Notice that the first term of (12.97) vanishes from (12.82). As Aᵢ is a polynomial of A, it commutes with N(A). Hence, we have

$$x = A_i[N(A)x] \in A_i V^n. \tag{12.98}$$

Thus, we get

$$\widetilde{W}_{\alpha_i} \subset A_i V^n \quad (1 \le i \le s). \tag{12.99}$$

From (12.95) and (12.99), we conclude that

$$\widetilde{W}_{\alpha_i} = A_i V^n \quad (1 \le i \le s). \tag{12.100}$$

In other words, Wᵢ defined as (12.91) is identical to $\widetilde{W}_{\alpha_i}$ defined as (12.82). Thus, we have

$$V^n = W_1 \oplus W_2 \oplus \cdots \oplus W_s,$$

or

$$V^n = \widetilde{W}_{\alpha_1} \oplus \widetilde{W}_{\alpha_2} \oplus \cdots \oplus \widetilde{W}_{\alpha_s}. \tag{12.101}$$

This completes the former half of the proof. The latter half of the proof is as follows.

Suppose that dim $\widetilde{W}_{\alpha_i} = n'_i$. In parallel to the decomposition of Vⁿ into the direct sum of (12.81), A can be reduced as

$$A \sim \begin{pmatrix} A^{(1)} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & A^{(s)} \end{pmatrix}, \tag{12.102}$$

where $A^{(i)}$ (1 ≤ i ≤ s) is an $(n'_i, n'_i)$ matrix and the symbol ∼ indicates that A has been transformed by a suitable similarity transformation. The matrix $A^{(i)}$ represents the linear transformation that A causes within $\widetilde{W}_{\alpha_i}$. We denote the identity matrix of order $n'_i$ by $E_{n'_i}$. Equation (12.82) implies that the matrix represented by

$$N_i = A^{(i)} - \alpha_i E_{n'_i} \tag{12.103}$$

is a nilpotent matrix. The order of a nilpotent matrix is at most $n'_i$ (vide supra) and, hence, lᵢ can be taken as $n'_i$. With Nᵢ we have

$$f_{N_i}(x) = \left|xE_{n'_i} - N_i\right| = \left|xE_{n'_i} - \left(A^{(i)} - \alpha_i E_{n'_i}\right)\right| = \left|(x + \alpha_i)E_{n'_i} - A^{(i)}\right| = x^{n'_i}. \tag{12.104}$$

The last equality holds because the eigenvalues of a nilpotent matrix are all zero. Meanwhile,

$$f_{A^{(i)}}(x) = \left|xE_{n'_i} - A^{(i)}\right| = f_{N_i}(x - \alpha_i) = (x - \alpha_i)^{n'_i}. \tag{12.105}$$

Equation (12.105) implies that

$$f_A(x) = \prod_{i=1}^{s} f_{A^{(i)}}(x) = \prod_{i=1}^{s} (x - \alpha_i)^{n'_i} = \prod_{i=1}^{s} (x - \alpha_i)^{n_i}. \tag{12.106}$$

The last equality comes from (12.83). Thus, $n'_i = n_i$. At the same time, we may equate lᵢ in (12.82) to nᵢ. These procedures complete the proof.

Theorem 12.1 shows that any square matrix can be converted to an (upper) triangle matrix by a similarity transformation. Theorem 12.5 demonstrates that the matrix can further be segmented according to the individual eigenvalues. Considering Theorem 12.1 again, $A^{(i)}$ (1 ≤ i ≤ s) can be described as an upper triangle matrix by

$$A^{(i)} \sim \begin{pmatrix} \alpha_i & \cdots & * \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_i \end{pmatrix}. \tag{12.107}$$

Therefore, denoting $N^{(i)}$ such that

$$N^{(i)} = A^{(i)} - \alpha_i E_{n_i} = \begin{pmatrix} 0 & \cdots & * \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix}, \tag{12.108}$$

we find that $N^{(i)}$ is nilpotent. This is because all the eigenvalues of $N^{(i)}$ are zero. From (12.108), we have

$$\left[N^{(i)}\right]^{\mu_i} = 0 \quad (\mu_i \le n_i).$$

12.5 Decomposition of Matrix

To investigate canonical forms of matrices, it is convenient to decompose a matrix into appropriate forms. To this end, the following definition is important.

Definition 12.4 A matrix similar to a diagonal matrix is said to be semi-simple.

In the above definition, if a matrix is related to another matrix by a similarity transformation, those matrices are said to be similar to each other. When two matrices A and A′ are similar, we express it by A ∼ A′ as stated above. This relation satisfies the equivalence law. That is, (1) A ∼ A; (2) A ∼ A′ ⟹ A′ ∼ A; (3) A ∼ A′, A′ ∼ A″ ⟹ A ∼ A″.


Readers may check this. We have the following important theorem on matrix decomposition.

Theorem 12.6 [3, 4] Any (n, n) square matrix A is expressed uniquely as

$$A = S + N, \tag{12.109}$$

where S is semi-simple and N is nilpotent; S and N are commutable, i.e., SN = NS. Furthermore, S and N are polynomials of A with scalar coefficients.

Proof Using (12.86) and (12.87), we write

$$S = \alpha_1 A_1 + \cdots + \alpha_s A_s = \sum_{i=1}^{s} \alpha_i M_i(A) f_i(A). \tag{12.110}$$

Then, (12.110) is a polynomial of A. From Theorems 12.1 and 12.5, $A^{(i)}$ (1 ≤ i ≤ s) in (12.102) is characterized as a triangle matrix whose eigenvalues αᵢ (1 ≤ i ≤ s) are positioned on the diagonal and whose order is identical to the multiplicity of αᵢ. Since Aᵢ (1 ≤ i ≤ s) is an idempotent matrix, it can be diagonalized by a similarity transformation (see Sect. 12.7). In fact, corresponding to (12.102), S is transformed via a similarity transformation into

$$S \sim \begin{pmatrix} \alpha_1 E_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_s E_{n_s} \end{pmatrix}, \tag{12.111}$$

where $E_{n_i}$ (1 ≤ i ≤ s) is an identity matrix of order nᵢ, identical to the multiplicity of αᵢ. This expression is equivalent to, e.g.,

$$A_1 \sim \begin{pmatrix} E_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix}$$

in (12.110). Thus, S is obviously semi-simple (i.e., diagonalizable). Putting N = A − S, N is described after the above transformation as

$$N \sim \begin{pmatrix} N^{(1)} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & N^{(s)} \end{pmatrix}, \qquad N^{(i)} = A^{(i)} - \alpha_i E_{n_i}. \tag{12.112}$$

Since each $N^{(i)}$ is nilpotent, as stated in Sect. 12.4, N is nilpotent as well. Also, (12.112) is a polynomial of A, as in the case of S. Therefore, S and N are commutable.

To prove the uniqueness of the decomposition, we show the following:

1. Let S and S′ be commutable semi-simple matrices. Then, those matrices are simultaneously diagonalized; that is, with a certain non-singular matrix P, P⁻¹SP and P⁻¹S′P are diagonalized at once. Hence, S ± S′ is semi-simple as well.


2. Let N and N′ be commutable nilpotent matrices. Then, N ± N′ is nilpotent as well.

3. A matrix that is both semi-simple and nilpotent is a zero matrix.

1. Let the different eigenvalues of S be α₁, ⋯, αₛ. Then, since S is semi-simple, the vector space Vⁿ is decomposed into a direct sum of eigenspaces $W_{\alpha_i}$ (1 ≤ i ≤ s). That is, we have

$$V^n = W_{\alpha_1} \oplus \cdots \oplus W_{\alpha_s}.$$

Since S and S′ are commutable, for ∃x ∈ $W_{\alpha_i}$ we have SS′x = S′Sx = S′(αᵢx) = αᵢS′x. Hence, we have S′x ∈ $W_{\alpha_i}$; namely, $W_{\alpha_i}$ is S′-invariant. Therefore, if we adopt the basis vectors {a₁, ⋯, aₙ} with respect to the direct sum decomposition, we get

$$S \sim \begin{pmatrix} \alpha_1 E_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_s E_{n_s} \end{pmatrix}, \qquad S' \sim \begin{pmatrix} S'_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & S'_s \end{pmatrix}.$$

Since S′ is semi-simple, each $S'_i$ (1 ≤ i ≤ s) must be semi-simple as well. Here, let {e₁, ⋯, eₙ} be the original basis vectors before the basis vector transformation and let P be the representation matrix of the said transformation. Then, we have

$$(e_1\ \cdots\ e_n)P = (a_1\ \cdots\ a_n).$$

Thus, we get

$$P^{-1}SP = \begin{pmatrix} \alpha_1 E_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_s E_{n_s} \end{pmatrix}, \qquad P^{-1}S'P = \begin{pmatrix} S'_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & S'_s \end{pmatrix}.$$

This means that both P⁻¹SP and P⁻¹S′P are diagonal. That is, P⁻¹SP ± P⁻¹S′P = P⁻¹(S ± S′)P is diagonal, indicating that S ± S′ is semi-simple as well.

2. Suppose that $N^\nu = 0$ and $N'^{\nu'} = 0$. From the assumption, N and N′ are commutable. Consequently, using the binomial theorem we have

$$(N \pm N')^m = N^m \pm mN^{m-1}N' + \cdots + (\pm 1)^i \frac{m!}{i!(m-i)!} N^{m-i}N'^i + \cdots + (\pm 1)^m N'^m. \tag{12.113}$$

Putting m = ν + ν′ − 1, if i ≥ ν′, then N′ⁱ = 0 by the supposition. If i < ν′, we have m − i > m − ν′ = ν + ν′ − 1 − ν′ = ν − 1, i.e., m − i ≥ ν; therefore $N^{m-i} = 0$. Consequently, we have $N^{m-i}N'^i = 0$ for every i in (12.113). Thus, we get (N ± N′)ᵐ = 0, indicating that N ± N′ is nilpotent.

3. Let S be a semi-simple and nilpotent matrix. We describe S as


$$S \sim \begin{pmatrix} \alpha_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_n \end{pmatrix}, \tag{12.114}$$

where some of the αᵢ (1 ≤ i ≤ n) may be identical. Since S is nilpotent, all αᵢ (1 ≤ i ≤ n) are zero. We then have S ∼ 0, i.e., S = 0 accordingly.

Now, suppose that a matrix A is decomposed differently from (12.109). That is, we have

$$A = S + N = S' + N' \quad\text{or}\quad S - S' = N' - N. \tag{12.115}$$

From the assumption, S′ and N′ are commutable. Moreover, since S, S′, N, and N′ are described by polynomials of A, they are commutable with one another. Hence, from (1) and (2) along with the second equation of (12.115), S − S′ and N′ − N are both semi-simple and nilpotent at once. Consequently, from (3), S − S′ = N′ − N = 0. Thus, we finally get S = S′ and N = N′; that is, the decomposition is unique. These complete the proof.

Theorem 12.6 implies that the matrix decomposition of (12.109) is unique. This matrix decomposition is said to be Jordan decomposition. On the basis of Theorem 12.6, we investigate Jordan canonical forms of matrices in the next section.
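The Jordan decomposition of Theorem 12.6 can be sketched numerically for the simplest non-trivial case, a matrix with a single (double) eigenvalue; the example below is an illustration added here, not from the text.

```python
import numpy as np

# Jordan decomposition A = S + N for A with the sole eigenvalue 3:
# S = 3E is semi-simple (already diagonal) and N = A - S is nilpotent.
A = np.array([[3.0, 1.0],
              [0.0, 3.0]])
S = 3.0 * np.eye(2)
N = A - S

assert np.allclose(S + N, A)       # A = S + N
assert np.allclose(N @ N, 0)       # N is nilpotent (N^2 = 0)
assert np.allclose(S @ N, N @ S)   # S and N commute
print("Jordan decomposition verified")
```

Here S and N are both polynomials of A (S = 3E and N = A − 3E), in line with the uniqueness statement of the theorem.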

12.6 Jordan Canonical Form

Once the vector space Vⁿ has been decomposed into a direct sum of generalized eigenspaces with the matrix reduced in parallel, we are able to deal with the individual eigenspaces $\widetilde{W}_{\alpha_i}$ (1 ≤ i ≤ s) and the corresponding $A^{(i)}$ (1 ≤ i ≤ s) given in (12.102) separately.

12.6.1 Canonical Form of Nilpotent Matrix

To avoid complication of notation, we think of the following setting, where we assume an (n, n) matrix that operates on Vⁿ. Let the nilpotent matrix be N and suppose that $N^{\nu-1} \ne 0$ and $N^\nu = 0$ (1 ≤ ν ≤ n). That is, we focus on one of the eigenspaces $\widetilde{W}_{\alpha_i}$ (1 ≤ i ≤ s) and regard it as an n-dimensional vector space Vⁿ. If ν = 1, the nilpotent matrix N is a zero matrix. Notice that since the characteristic polynomial is described by f_N(x) = xⁿ, we have f_N(N) = Nⁿ = 0 from the Hamilton–Cayley theorem. Let W⁽ⁱ⁾ be given such that


$$W^{(i)} = \{x;\ x \in V^n,\ N^i x = 0\}\ (i \ge 1); \qquad W^{(0)} \equiv \{0\}.$$

Then we have

$$V^n = W^{(\nu)} \supset W^{(\nu-1)} \supset \cdots \supset W^{(1)} \supset W^{(0)} \equiv \{0\}. \tag{12.116}$$

Note that when ν = 1, we trivially have Vⁿ = W⁽ᵛ⁾ ⊃ W⁽⁰⁾ ≡ {0}. Let us put dim W⁽ⁱ⁾ = mᵢ, mᵢ − m_{i−1} = rᵢ (1 ≤ i ≤ ν), m₀ ≡ 0. Then we can add r_ν linearly independent vectors $a_1, a_2, \cdots, a_{r_\nu}$ to the basis vectors of W⁽ᵛ⁻¹⁾ so that those r_ν vectors can be basis vectors of W⁽ᵛ⁾. Unless N = 0 (i.e., the zero matrix), we must have at least one such vector: by the supposition, for ∃x ≠ 0 we have $N^{\nu-1}x \ne 0$, and so x ∉ W⁽ᵛ⁻¹⁾. At least one such vector x is present, and it is eligible as a basis vector of W⁽ᵛ⁾. Hence, r_ν ≥ 1 and we have

$$W^{(\nu)} = \mathrm{Span}\{a_1, a_2, \cdots, a_{r_\nu}\} \oplus W^{(\nu-1)}. \tag{12.117}$$

Note that (12.117) is expressed as a direct sum. Meanwhile, $Na_1, Na_2, \cdots, Na_{r_\nu} \in W^{(\nu-1)}$. In fact, suppose that x ∈ W⁽ᵛ⁾, i.e., $N^\nu x = N^{\nu-1}(Nx) = 0$; that is, Nx ∈ W⁽ᵛ⁻¹⁾. According to a reasoning similar to that made above, we have

$$\mathrm{Span}\{Na_1, Na_2, \cdots, Na_{r_\nu}\} \cap W^{(\nu-2)} = \{0\}.$$

Moreover, these r_ν vectors $Na_1, Na_2, \cdots, Na_{r_\nu}$ are linearly independent. Suppose that

$$c_1 Na_1 + c_2 Na_2 + \cdots + c_{r_\nu} Na_{r_\nu} = 0. \tag{12.118}$$

Operating with $N^{\nu-2}$ from the left, we have

$$N^{\nu-1}\left(c_1 a_1 + c_2 a_2 + \cdots + c_{r_\nu} a_{r_\nu}\right) = 0.$$

This would imply that $c_1 a_1 + c_2 a_2 + \cdots + c_{r_\nu} a_{r_\nu} \in W^{(\nu-1)}$. On the basis of the above argument, however, we must then have $c_1 = c_2 = \cdots = c_{r_\nu} = 0$. In other words, if ∃cᵢ (1 ≤ i ≤ r_ν) were non-zero, we would have aᵢ ∈ W⁽ᵛ⁻¹⁾, in contradiction. From (12.118) this means the linear independence of $Na_1, Na_2, \cdots, Na_{r_\nu}$.

As W⁽ᵛ⁻¹⁾ ⊃ W⁽ᵛ⁻²⁾, we may well have additional linearly independent vectors within the basis vectors of W⁽ᵛ⁻¹⁾. Let those vectors be $a_{r_\nu+1}, \cdots, a_{r_{\nu-1}}$; here we assume that the number of such vectors is $r_{\nu-1} - r_\nu$, so that $r_{\nu-1} - r_\nu \ge 0$ accordingly. In this way we can construct basis vectors of W⁽ᵛ⁻¹⁾ by including $a_{r_\nu+1}, \cdots, a_{r_{\nu-1}}$ along with $Na_1, Na_2, \cdots, Na_{r_\nu}$. As a result, we get

12.6 Jordan Canonical Form

$$W^{(\nu-1)} = \mathrm{Span}\{Na_1, \cdots, Na_{r_\nu}, a_{r_\nu+1}, \cdots, a_{r_{\nu-1}}\} \oplus W^{(\nu-2)}.$$

We can repeat these processes to construct \(W^{(\nu-2)}\) such that

$$W^{(\nu-2)} = \mathrm{Span}\{N^2 a_1, \cdots, N^2 a_{r_\nu},\; Na_{r_\nu+1}, \cdots, Na_{r_{\nu-1}},\; a_{r_{\nu-1}+1}, \cdots, a_{r_{\nu-2}}\} \oplus W^{(\nu-3)}.$$

For \(W^{(i)}\), furthermore, we have

$$W^{(i)} = \mathrm{Span}\{N^{\nu-i} a_1, \cdots, N^{\nu-i} a_{r_\nu},\; N^{\nu-i-1} a_{r_\nu+1}, \cdots, N^{\nu-i-1} a_{r_{\nu-1}},\; \cdots,\; a_{r_{i+1}+1}, \cdots, a_{r_i}\} \oplus W^{(i-1)}. \tag{12.119}$$

Further repeating the procedures, we exhaust all the n basis vectors of \(W^{(\nu)} = V^n\). These vectors are given as follows:

$$N^k a_{r_{i+1}+1}, \cdots, N^k a_{r_i} \quad (1 \leq i \leq \nu;\; 0 \leq k \leq i-1).$$

At the same time we have

$$0 \equiv r_{\nu+1} < 1 \leq r_\nu \leq r_{\nu-1} \leq \cdots \leq r_1. \tag{12.120}$$

Table 12.1 [3, 4] shows the resulting structure of these basis vectors pertinent to Jordan blocks. In Table 12.1, counting basis vectors laterally from the top, we have \(r_\nu, r_{\nu-1}, \cdots, r_1\) vectors; their sum is n, the same number as that counted vertically. The dimension n of the vector space \(V^n\) is thus given by

$$n = \sum_{i=1}^{\nu} r_i = \sum_{i=1}^{\nu} i\,(r_i - r_{i+1}). \tag{12.121}$$

Let us examine the structure of Table 12.1 more closely. More specifically, let us inspect the i-layered structures of \((r_i - r_{i+1})\) vectors. Picking a vector from among \(a_{r_{i+1}+1}, \cdots, a_{r_i}\), we call it \(a_\rho\). Then we get the following set of vectors \(a_\rho, Na_\rho, N^2 a_\rho, \cdots, N^{i-1} a_\rho\) in the i-layered structure. These i vectors are displayed "vertically" in Table 12.1. They are linearly independent (see Theorem 12.4) and form an i-dimensional N-invariant subspace, i.e., \(\mathrm{Span}\{N^{i-1} a_\rho, N^{i-2} a_\rho, \cdots, Na_\rho, a_\rho\}\), where \(r_{i+1}+1 \leq \rho \leq r_i\). The matrix representation of the linear transformation N with respect to the set of these i vectors is [3, 4]

Table 12.1 Jordan blocks and structure of basis vectors for a nilpotent matrix^a

r_ν ←     a_1, ⋯, a_{r_ν}
r_{ν−1} ←  Na_1, ⋯, Na_{r_ν}            | a_{r_ν+1}, ⋯, a_{r_{ν−1}}
  ⋮             ⋮                              ⋮
r_i ←     N^{ν−i}a_1, ⋯, N^{ν−i}a_{r_ν}  | N^{ν−i−1}a_{r_ν+1}, ⋯, N^{ν−i−1}a_{r_{ν−1}} | ⋯ | a_{r_{i+1}+1}, ⋯, a_{r_i}
  ⋮             ⋮                              ⋮                                             ⋮
r_1 ←     N^{ν−1}a_1, ⋯, N^{ν−1}a_{r_ν}  | N^{ν−2}a_{r_ν+1}, ⋯, N^{ν−2}a_{r_{ν−1}}     | ⋯ | N^{i−1}a_{r_{i+1}+1}, ⋯, N^{i−1}a_{r_i} | ⋯ | a_{r_2+1}, ⋯, a_{r_1}

The column groups contain (left to right) \(\nu r_\nu,\; (\nu-1)(r_{\nu-1}-r_\nu),\; \cdots,\; i(r_i - r_{i+1}),\; \cdots,\; 1\cdot(r_1 - r_2)\) basis vectors; each vertical run \(a_\rho, Na_\rho, \cdots\) spans one Jordan block. The row counts \(r_\nu, r_{\nu-1}, \cdots, r_1\) sum to \(n = \sum_{k=1}^{\nu} r_k\).

^a Adapted from Satake I (1974) Linear algebra (Mathematics Library 1: in Japanese) [4], with the permission of Shokabo Co., Ltd., Tokyo

$$N\,\big(N^{i-1}a_\rho \;\; N^{i-2}a_\rho \;\; \cdots \;\; Na_\rho \;\; a_\rho\big) = \big(N^{i-1}a_\rho \;\; N^{i-2}a_\rho \;\; \cdots \;\; Na_\rho \;\; a_\rho\big) \begin{pmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{pmatrix}. \tag{12.122}$$

These (i, i) matrices of (12.122) are called i-th order Jordan blocks. Equation (12.122) clearly shows that \(N^{i-1}a_\rho\) is a proper eigenvector and that we have (i − 1) generalized eigenvectors \(N^{i-2}a_\rho, \cdots, Na_\rho, a_\rho\). Notice that the number of those Jordan blocks is \((r_i - r_{i+1})\). Let us expressly define this number as [3, 4]

$$J_i = r_i - r_{i+1}, \tag{12.123}$$

where \(J_i\) is the number of the i-th order Jordan blocks. The total number of Jordan blocks within the whole vector space \(V^n = W^{(\nu)}\) is

$$\sum_{i=1}^{\nu} J_i = \sum_{i=1}^{\nu} (r_i - r_{i+1}) = r_1. \tag{12.124}$$

Recalling the dimension theorem mentioned in (11.45), we have

$$\dim V^n = \dim \mathrm{Ker}\, N^i + \dim N^i(V^n) = \dim \mathrm{Ker}\, N^i + \mathrm{rank}\, N^i. \tag{12.125}$$

Meanwhile, since \(W^{(i)} = \mathrm{Ker}\, N^i\), \(\dim W^{(i)} = m_i = \dim \mathrm{Ker}\, N^i\). From (12.116), \(m_0 \equiv 0\). Then (11.45) now reads

$$\dim V^n = m_i + \mathrm{rank}\, N^i. \tag{12.126}$$

That is,

$$n = m_i + \mathrm{rank}\, N^i \tag{12.127}$$

or

$$m_i = n - \mathrm{rank}\, N^i. \tag{12.128}$$

Meanwhile, from Table 12.1 [3, 4] we have


Fig. 12.1 Examples of the structure of Jordan blocks. (a) \(r_1 = n\). (b) \(r_1 = r_2 = \cdots = r_n = 1\)

$$\dim W^{(i)} = m_i = \sum_{k=1}^{i} r_k, \qquad \dim W^{(i-1)} = m_{i-1} = \sum_{k=1}^{i-1} r_k. \tag{12.129}$$

Hence, we have

$$r_i = m_i - m_{i-1}. \tag{12.130}$$

Then we get [3, 4]

$$\begin{aligned} J_i = r_i - r_{i+1} &= (m_i - m_{i-1}) - (m_{i+1} - m_i) = 2m_i - m_{i-1} - m_{i+1} \\ &= 2\big(n - \mathrm{rank}\, N^i\big) - \big(n - \mathrm{rank}\, N^{i-1}\big) - \big(n - \mathrm{rank}\, N^{i+1}\big) \\ &= \mathrm{rank}\, N^{i-1} + \mathrm{rank}\, N^{i+1} - 2\,\mathrm{rank}\, N^i. \end{aligned} \tag{12.131}$$

The number \(J_i\) is therefore determined uniquely by N. The total number of Jordan blocks \(r_1\) is also computed using (12.128) and (12.130) as

$$r_1 = m_1 - m_0 = m_1 = n - \mathrm{rank}\, N = \dim \mathrm{Ker}\, N, \tag{12.132}$$

where the last equality arises from the dimension theorem expressed as (11.45). In Table 12.1 [3, 4], moreover, we have two extreme cases. That is, if ν = 1 in (12.116), i.e., N = 0, from (12.132) we have \(r_1 = n\) and \(r_2 = \cdots = r_n = 0\); see Fig. 12.1a. Also we confirm \(n = \sum_{i=1}^{\nu} r_i\) in (12.121). In this case, all the eigenvectors are proper eigenvectors with multiplicity of n and we have n first-order Jordan blocks. The other is the case of ν = n. In that case, we have


$$r_1 = r_2 = \cdots = r_n = 1. \tag{12.133}$$

In the latter case, we also have \(n = \sum_{i=1}^{\nu} r_i\) in (12.121). We have only one proper eigenvector and (n − 1) generalized eigenvectors; see Fig. 12.1b. From (12.132), this special case of a single n-th order Jordan block occurs when \(\mathrm{rank}\, N = n - 1\).
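The counting rules (12.128), (12.130), and (12.131) can be tried out numerically. The sketch below uses plain Python with exact rational arithmetic; the sample nilpotent matrix N and the helper names are our own illustrative choices, not taken from the text:

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix via Gaussian elimination over the rationals (exact)."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matpow(A, k):
    P = [[int(i == j) for j in range(len(A))] for i in range(len(A))]
    for _ in range(k):
        P = matmul(P, A)
    return P

# Sample nilpotent matrix: one 2nd-order Jordan block plus two 1st-order blocks
N = [[0, 1, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
n = len(N)
nu = next(k for k in range(1, n + 1) if rank(matpow(N, k)) == 0)  # order of nilpotency
m = [n - rank(matpow(N, i)) for i in range(nu + 2)]   # m_i = n - rank N^i, Eq. (12.128)
r = [m[i] - m[i - 1] for i in range(1, nu + 2)]       # r_i = m_i - m_{i-1}, Eq. (12.130)
J = [r[i] - r[i + 1] for i in range(nu)]              # J_i = r_i - r_{i+1}, Eq. (12.123)
print(nu, r[:nu], J)  # -> 2 [3, 1] [2, 1]
```

The total \(r_1 = 3\) agrees with (12.132), and \(\sum_i i\,J_i = 4 = n\) reproduces (12.121).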

12.6.2 Jordan Blocks

Let us think of (12.81) on the basis of (12.102). Picking up \(A^{(i)}\) from (12.102) and considering (12.108), we put

$$N_i = A^{(i)} - \alpha_i E_{n_i} \quad (1 \leq i \leq s), \tag{12.134}$$

where \(E_{n_i}\) denotes the \((n_i, n_i)\) identity matrix. We express a nilpotent matrix as \(N_i\) as before. In (12.134) the number \(n_i\) corresponds to n in \(V^n\) of Sect. 12.6.1. As \(N_i\) is a \((n_i, n_i)\) matrix, we have \(N_i^{\,n_i} = 0\). Here we are speaking of ν-th order nilpotent matrices \(N_i\) such that \(N_i^{\,\nu-1} \neq 0\) and \(N_i^{\,\nu} = 0\) (1 ≤ ν ≤ \(n_i\)). We can deal with \(N_i\) in a manner fully consistent with the theory developed in Sect. 12.6.1. Each \(A^{(i)}\) comprises one or more Jordan blocks \(A^{(\kappa)}\) expressed as

$$A^{(\kappa)} = N_{\kappa_i} + \alpha_i E_{\kappa_i} \quad (1 \leq \kappa_i \leq n_i), \tag{12.135}$$

where \(A^{(\kappa)}\) denotes the κ-th Jordan block in \(A^{(i)}\). In \(A^{(\kappa)}\), \(N_{\kappa_i}\) and \(E_{\kappa_i}\) are a nilpotent \((\kappa_i, \kappa_i)\) matrix and the \((\kappa_i, \kappa_i)\) identity matrix, respectively. In \(N_{\kappa_i}\), \(\kappa_i\) zeros are displayed on the principal diagonal and entries of 1 are positioned on the matrix elements next above the principal diagonal, with all other entries zero; see, e.g., the matrix of (12.122). As in Sect. 12.6.1, the number \(\kappa_i\) is called the dimension of the Jordan block. Thus, \(A^{(i)}\) of (12.102) can further be reduced to segmented matrices \(A^{(\kappa)}\). Our next task is to find out how many Jordan blocks are contained in the individual \(A^{(i)}\) and what the dimensions of those Jordan blocks are. Corresponding to (12.122), the matrix representation of the linear transformation by \(A^{(\kappa)}\) with respect to the set of \(\kappa_i\) vectors is

$$\big(A^{(\kappa)} - \alpha_i E_{\kappa_i}\big)\Big(\big(A^{(\kappa)} - \alpha_i E_{\kappa_i}\big)^{\kappa_i-1} a_\sigma \;\; \big(A^{(\kappa)} - \alpha_i E_{\kappa_i}\big)^{\kappa_i-2} a_\sigma \;\; \cdots \;\; a_\sigma\Big) = \Big(\big(A^{(\kappa)} - \alpha_i E_{\kappa_i}\big)^{\kappa_i-1} a_\sigma \;\; \big(A^{(\kappa)} - \alpha_i E_{\kappa_i}\big)^{\kappa_i-2} a_\sigma \;\; \cdots \;\; a_\sigma\Big) \begin{pmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{pmatrix}. \tag{12.136}$$

A vector \(a_\sigma\) stands for a vector associated with the κ-th Jordan block of \(A^{(i)}\). From (12.136) we obtain

$$\big(A^{(\kappa)} - \alpha_i E_{\kappa_i}\big)\big(A^{(\kappa)} - \alpha_i E_{\kappa_i}\big)^{\kappa_i-1} a_\sigma = 0. \tag{12.137}$$

Namely,

$$A^{(\kappa)} \big(A^{(\kappa)} - \alpha_i E_{\kappa_i}\big)^{\kappa_i-1} a_\sigma = \alpha_i \big(A^{(\kappa)} - \alpha_i E_{\kappa_i}\big)^{\kappa_i-1} a_\sigma. \tag{12.138}$$

This shows that \(\big(A^{(\kappa)} - \alpha_i E_{\kappa_i}\big)^{\kappa_i-1} a_\sigma\) is a proper eigenvector of \(A^{(\kappa)}\) that corresponds to an eigenvalue \(\alpha_i\). Conversely, \(a_\sigma\) is a generalized eigenvector of rank \(\kappa_i\). There are another \((\kappa_i - 2)\) generalized eigenvectors \(\big(A^{(\kappa)} - \alpha_i E_{\kappa_i}\big)^{\mu} a_\sigma\) (1 ≤ μ ≤ \(\kappa_i - 2\)). In total, there are \(\kappa_i\) eigenvectors [a proper eigenvector and \((\kappa_i - 1)\) generalized eigenvectors]. Also we see that a sole proper eigenvector can be found for each Jordan block. In reference to these \(\kappa_i\) eigenvectors as the basis vectors, the \((\kappa_i, \kappa_i)\) matrix \(A^{(\kappa)}\) (i.e., a Jordan block) is expressed as


$$A^{(\kappa)} = \begin{pmatrix} \alpha_i & 1 & & & \\ & \alpha_i & 1 & & \\ & & \ddots & \ddots & \\ & & & \alpha_i & 1 \\ & & & & \alpha_i \end{pmatrix}. \tag{12.139}$$

A \((n_i, n_i)\) matrix \(A^{(i)}\) of (12.102) pertinent to an eigenvalue \(\alpha_i\) contains a direct sum of Jordan blocks whose dimension ranges from 1 to \(n_i\). The number of Jordan blocks of dimension d that satisfies \(\left[\frac{n_i}{2}\right] + 1 \leq d \leq n_i\) (where [μ] denotes the largest integer that does not exceed μ) is at most one. An example depicted below is a matrix \(A^{(p)}\) (1 ≤ p ≤ s) that explicitly includes two one-dimensional Jordan blocks, a (3, 3) three-dimensional Jordan block, and a (5, 5) five-dimensional Jordan block:

$$A^{(p)} = (\alpha_p) \oplus (\alpha_p) \oplus \begin{pmatrix} \alpha_p & 1 & \\ & \alpha_p & 1 \\ & & \alpha_p \end{pmatrix} \oplus \begin{pmatrix} \alpha_p & 1 & & & \\ & \alpha_p & 1 & & \\ & & \alpha_p & 1 & \\ & & & \alpha_p & 1 \\ & & & & \alpha_p \end{pmatrix},$$

where \(A^{(p)}\) is a (10, 10) upper triangle matrix in which the eigenvalue \(\alpha_p\) is displayed on the principal diagonal, with entries 0 or 1 on the matrix elements next above the principal diagonal and all other entries zero.

Theorem 12.1 shows that every (n, n) square matrix can be converted to a triangle matrix by a suitable similarity transformation; the diagonal elements give the eigenvalues. Furthermore, Theorem 12.5 ensures that A can be reduced to generalized eigenspaces \(\widetilde{W}_{\alpha_i}\) (1 ≤ i ≤ s) according to the individual eigenvalues. Suppose, for example, that after a suitable similarity transformation a full matrix A is represented as


$$A \simeq (\alpha_1) \oplus \begin{pmatrix} \alpha_2 & * & * \\ & \alpha_2 & * \\ & & \alpha_2 \end{pmatrix} \oplus \cdots \oplus \begin{pmatrix} \alpha_i & * & * & * \\ & \alpha_i & * & * \\ & & \alpha_i & * \\ & & & \alpha_i \end{pmatrix} \oplus \cdots \oplus \begin{pmatrix} \alpha_s & * \\ & \alpha_s \end{pmatrix}, \tag{12.140}$$

where

$$A = A^{(1)} \oplus A^{(2)} \oplus \cdots \oplus A^{(i)} \oplus \cdots \oplus A^{(s)}. \tag{12.141}$$

In (12.141), \(A^{(1)}\) is a (1, 1) matrix (i.e., simply a number); \(A^{(2)}\) is a (3, 3) matrix; ⋯; \(A^{(i)}\) is a (4, 4) matrix; ⋯; \(A^{(s)}\) is a (2, 2) matrix. The above matrix form allows us to deal further with the segmented triangle matrices separately. In the case of (12.140) we may use the following matrix for the similarity transformation:

$$\begin{pmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & P & & & \\ & & & \ddots & & \\ & & & & 1 \end{pmatrix},$$

where the (4, 4) matrix P given by

$$P = \begin{pmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \\ p_{41} & p_{42} & p_{43} & p_{44} \end{pmatrix}$$

is a non-singular matrix. The matrix P is operated on \(A^{(i)}\) so that we can separately perform the similarity transformation with respect to the (4, 4) nilpotent matrix \(A^{(i)} - \alpha_i E_4\), following the procedures mentioned in Sect. 12.6.1. Thus, only the \(\alpha_i\)-associated segment is treated, with the other segments left unchanged. In a similar fashion, we can consecutively deal with the matrix segments related to the other eigenvalues. In a practical case, however, it is more convenient to seek the different eigenvalues and corresponding (generalized) eigenvectors at once and convert the matrix to Jordan canonical form. To make a guess about the structure of a matrix, however, the following argument will be useful; we consider an example afterward. Using (12.140), we have

$$A - \alpha_i E \simeq (\alpha_1 - \alpha_i) \oplus \begin{pmatrix} \alpha_2 - \alpha_i & * & * \\ & \alpha_2 - \alpha_i & * \\ & & \alpha_2 - \alpha_i \end{pmatrix} \oplus \cdots \oplus \begin{pmatrix} 0 & * & * & * \\ & 0 & * & * \\ & & 0 & * \\ & & & 0 \end{pmatrix} \oplus \cdots \oplus \begin{pmatrix} \alpha_s - \alpha_i & * \\ & \alpha_s - \alpha_i \end{pmatrix}.$$

Here we are treating the (n, n) matrix A on \(V^n\). Note that the matrix \(A - \alpha_i E\) is not nilpotent as a whole. Suppose that the multiplicity of \(\alpha_i\) is \(n_i\); in (12.140), \(n_i = 4\). Since the eigenvalues \(\alpha_1, \alpha_2, \cdots, \alpha_s\) take different values from one another, \(\alpha_1 - \alpha_i, \alpha_2 - \alpha_i, \cdots, \alpha_s - \alpha_i \neq 0\). In a triangle matrix the diagonal elements give the eigenvalues; hence, \(\alpha_1 - \alpha_i, \alpha_2 - \alpha_i, \cdots, \alpha_s - \alpha_i\) are non-zero eigenvalues of \(A - \alpha_i E\). We rewrite (12.140) as


$$A - \alpha_i E \simeq M^{(1)} \oplus M^{(2)} \oplus \cdots \oplus M^{(i)} \oplus \cdots \oplus M^{(s)}, \tag{12.142}$$

where

$$M^{(1)} = (\alpha_1 - \alpha_i), \quad M^{(2)} = \begin{pmatrix} \alpha_2 - \alpha_i & * & * \\ & \alpha_2 - \alpha_i & * \\ & & \alpha_2 - \alpha_i \end{pmatrix}, \quad \cdots, \quad M^{(i)} = \begin{pmatrix} 0 & * & * & * \\ & 0 & * & * \\ & & 0 & * \\ & & & 0 \end{pmatrix}, \quad \cdots, \quad M^{(s)} = \begin{pmatrix} \alpha_s - \alpha_i & * \\ & \alpha_s - \alpha_i \end{pmatrix}.$$

Thus, \(M^{(p)}\) (p ≠ i) is a non-singular matrix and \(M^{(i)}\) is a nilpotent matrix. Note that if we can find \(\mu_i\) such that \(\big[M^{(i)}\big]^{\mu_i - 1} \neq 0\) and \(\big[M^{(i)}\big]^{\mu_i} = 0\), then for the minimal polynomial \(\varphi_{M^{(i)}}(x)\) of \(M^{(i)}\) we have \(\varphi_{M^{(i)}}(x) = x^{\mu_i}\). Consequently, we get

$$(A - \alpha_i E)^{\mu_i} \simeq \big[M^{(1)}\big]^{\mu_i} \oplus \big[M^{(2)}\big]^{\mu_i} \oplus \cdots \oplus 0 \oplus \cdots \oplus \big[M^{(s)}\big]^{\mu_i}. \tag{12.143}$$

In (12.143) the diagonal elements of the non-singular triangle matrices \(\big[M^{(p)}\big]^{\mu_i}\) (p ≠ i) are \((\alpha_p - \alpha_i)^{\mu_i}\) (≠ 0). Thus, we have a "perforated" matrix \((A - \alpha_i E)^{\mu_i}\), where \(\big[M^{(i)}\big]^{\mu_i} = 0\) in (12.143). Putting

$$\Phi_A(x) \equiv \prod_{i=1}^{s} (x - \alpha_i)^{\mu_i},$$

we get


$$\Phi_A(A) \equiv \prod_{i=1}^{s} (A - \alpha_i E)^{\mu_i} = 0.$$

The polynomial \(\Phi_A(x)\) gives the minimal polynomial associated with \(f_A(x)\). From the above argument, we can choose \(\mu_i\) for \(l_i\) in (12.82). Meanwhile, \(M^{(i)}\) in (12.142) is identical to \(N_i\) in (12.134). Rewriting (12.134), we get

$$A^{(i)} = M^{(i)} + \alpha_i E_{n_i}. \tag{12.144}$$

Let us think of the matrices \(\big(A^{(i)} - \alpha_i E_{n_i}\big)^k\) and \((A - \alpha_i E)^k\) (k ≤ \(\mu_i\)). From (11.45), we find

$$\dim V^n = n = \dim \mathrm{Ker}\,(A - \alpha_i E)^k + \mathrm{rank}\,(A - \alpha_i E)^k \tag{12.145}$$

and

$$\dim \widetilde{W}_{\alpha_i} = n_i = \dim \mathrm{Ker}\,\big(A^{(i)} - \alpha_i E_{n_i}\big)^k + \mathrm{rank}\,\big(A^{(i)} - \alpha_i E_{n_i}\big)^k, \tag{12.146}$$

where \(\mathrm{rank}\,(A - \alpha_i E)^k = \dim (A - \alpha_i E)^k (V^n)\) and \(\mathrm{rank}\,\big(A^{(i)} - \alpha_i E_{n_i}\big)^k = \dim \big(A^{(i)} - \alpha_i E_{n_i}\big)^k \big(\widetilde{W}_{\alpha_i}\big)\). Noting that

$$\dim \mathrm{Ker}\,(A - \alpha_i E)^k = \dim \mathrm{Ker}\,\big(A^{(i)} - \alpha_i E_{n_i}\big)^k, \tag{12.147}$$

we get

$$n - \mathrm{rank}\,(A - \alpha_i E)^k = n_i - \mathrm{rank}\,\big(A^{(i)} - \alpha_i E_{n_i}\big)^k. \tag{12.148}$$

This notable property comes from the non-singularity of \(\big[M^{(p)}\big]^k\) (p ≠ i, k: a positive integer); i.e., all the eigenvalues of \(\big[M^{(p)}\big]^k\) are non-zero. In particular, as \(\mathrm{rank}\,\big(A^{(i)} - \alpha_i E_{n_i}\big)^{\mu_i} = 0\), from (12.148) we have

$$\mathrm{rank}\,(A - \alpha_i E)^l = n - n_i \quad (l \geq \mu_i). \tag{12.149}$$

Meanwhile, putting k = 1 in (12.148) and using (12.132) we get

$$\dim \mathrm{Ker}\,\big(A^{(i)} - \alpha_i E_{n_i}\big) = n_i - \mathrm{rank}\,\big(A^{(i)} - \alpha_i E_{n_i}\big) = n - \mathrm{rank}\,(A - \alpha_i E) = \dim \mathrm{Ker}\,(A - \alpha_i E) = r_1^{(i)}. \tag{12.150}$$

The value \(r_1^{(i)}\) gives the number of Jordan blocks with an eigenvalue \(\alpha_i\).


Moreover, we must consider the following situation. We wish to know how the matrix \(A^{(i)}\) in (12.134) is reduced to Jordan blocks of lower dimension. To get detailed information about it directly, however, we would also have to obtain the (generalized) eigenvectors corresponding to eigenvalues other than \(\alpha_i\). In this context, Eq. (12.148) is useful. Equation (12.131) tells how the number of Jordan blocks in a nilpotent matrix is determined. If we can get this knowledge before finding all the (generalized) eigenvectors, it becomes easier to address the problem. Let us rewrite (12.131) as

$$J_q^{(i)} = r_q - r_{q+1} = \mathrm{rank}\,\big(A^{(i)} - \alpha_i E_{n_i}\big)^{q-1} + \mathrm{rank}\,\big(A^{(i)} - \alpha_i E_{n_i}\big)^{q+1} - 2\,\mathrm{rank}\,\big(A^{(i)} - \alpha_i E_{n_i}\big)^{q}, \tag{12.151}$$

where we define \(J_q^{(i)}\) as the number of the q-th order Jordan blocks within \(A^{(i)}\). Note that these blocks are expressed as (q, q) matrices. Meanwhile, using (12.148), \(J_q^{(i)}\) is expressed as

$$J_q^{(i)} = \mathrm{rank}\,(A - \alpha_i E)^{q-1} + \mathrm{rank}\,(A - \alpha_i E)^{q+1} - 2\,\mathrm{rank}\,(A - \alpha_i E)^{q}. \tag{12.152}$$

This relation is obtained by replacing k in (12.148) with q − 1, q + 1, and q, respectively, and eliminating n and \(n_i\) from the three resulting relations. This enables us to gain access to the whole structure of the linear transformation represented by the (n, n) matrix A without reducing it to subspaces. To enrich our understanding of Jordan canonical forms, the following tangible example will be beneficial.
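Equation (12.152) lends itself to direct computation: it needs only ranks of powers of \(A - \alpha_i E\), never the eigenvectors themselves. A minimal sketch in plain Python with exact arithmetic; the function name `jordan_block_count` and the 3 × 3 test matrix are our own illustrative choices:

```python
from fractions import Fraction

def rank(M):
    """Rank via Gaussian elimination over the rationals (exact)."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matpow(A, k):
    P = [[int(i == j) for j in range(len(A))] for i in range(len(A))]
    for _ in range(k):
        P = matmul(P, A)
    return P

def jordan_block_count(A, alpha, q):
    """Number of (q, q) Jordan blocks of A for eigenvalue alpha, Eq. (12.152)."""
    n = len(A)
    B = [[A[i][j] - (alpha if i == j else 0) for j in range(n)] for i in range(n)]
    rk = lambda k: rank(matpow(B, k))
    return rk(q - 1) + rk(q + 1) - 2 * rk(q)

# One 2nd-order block for eigenvalue 2, one 1st-order block for eigenvalue 3
A = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 3]]
print(jordan_block_count(A, 2, 1),  # -> 0
      jordan_block_count(A, 2, 2),  # -> 1
      jordan_block_count(A, 3, 1))  # -> 1
```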

12.6.3 Example of Jordan Canonical Form

Let us think of the following matrix A:

$$A = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -3 & -1 & -2 & 4 \end{pmatrix}. \tag{12.153}$$

The characteristic polynomial \(f_A(x)\) is given by


$$f_A(x) = \begin{vmatrix} x-1 & 1 & 0 & 0 \\ -1 & x-3 & 0 & 0 \\ 0 & 0 & x-2 & 0 \\ 3 & 1 & 2 & x-4 \end{vmatrix} = (x-4)(x-2)^3. \tag{12.154}$$

Equating (12.154) to zero, we get the eigenvalue 4 as a simple root and the eigenvalue 2 as a triple root. The vector space \(V^4\) is then decomposed into two invariant subspaces. The first is a one-dimensional kernel (or null-space) of the transformation (A − 4E) and the other is a three-dimensional kernel of the transformation \((A - 2E)^3\). We have to seek eigenvectors that span these invariant subspaces for the individual eigenvalues x.

1. Case I (x = 4): An eigenvector belonging to the first invariant subspace must satisfy a proper eigenvalue equation, since the eigenvalue 4 is a simple root. This equation is expressed as

$$(A - 4E)x = 0.$$

In matrix form this reads

$$\begin{pmatrix} -3 & -1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & -2 & 0 \\ -3 & -1 & -2 & 0 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix} = 0. \tag{12.155}$$

This is equivalent to the following set of four equations:

$$-3c_1 - c_2 = 0, \quad c_1 - c_2 = 0, \quad -2c_3 = 0, \quad -3c_1 - c_2 - 2c_3 = 0.$$

These are equivalent to \(c_3 = 0\) and \(c_1 = c_2 = -3c_1\). Therefore, \(c_1 = c_2 = c_3 = 0\) with an arbitrarily chosen number \(c_4\), which is chosen as 1 as usual. Hence, designating the proper eigenvector as \(e_1^{(4)}\), its column vector representation is

$$e_1^{(4)} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.$$

The (4, 4) matrix in (12.155) representing (A − 4E) has rank 3. The number of Jordan blocks for the eigenvalue 4 is given by (12.150) as

$$r_1^{(4)} = 4 - \mathrm{rank}\,(A - 4E) = 1. \tag{12.156}$$

In this case, the Jordan block is naturally one-dimensional. In fact, using (12.152) we have

$$J_1^{(4)} = \mathrm{rank}\,(A - 4E)^0 + \mathrm{rank}\,(A - 4E)^2 - 2\,\mathrm{rank}\,(A - 4E) = 4 + 3 - 2 \times 3 = 1. \tag{12.157}$$

In (12.157), \(J_1^{(4)}\) gives the number of first-order Jordan blocks for the eigenvalue 4. We used

$$(A - 4E)^2 = \begin{pmatrix} 8 & 4 & 0 & 0 \\ -4 & 0 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 8 & 4 & 4 & 0 \end{pmatrix} \tag{12.158}$$

and confirmed that \(\mathrm{rank}\,(A - 4E)^2 = 3\).

2. Case II (x = 2): The eigenvalue 2 is a triple root. Therefore, we must examine how the invariant subspace can further be decomposed into subspaces of lower dimension. To this end we first start with the secular equation expressed as

$$(A - 2E)x = 0. \tag{12.159}$$

The matrix representation is

$$\begin{pmatrix} -1 & -1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -3 & -1 & -2 & 2 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix} = 0.$$

This is equivalent to the following set of two equations:

$$c_1 + c_2 = 0, \quad -3c_1 - c_2 - 2c_3 + 2c_4 = 0.$$

From the above, we can put \(c_1 = c_2 = 0\) and \(c_3 = c_4\) (= 1). The equations also allow the existence of another proper eigenvector; for this we have \(c_1 = -c_2 = 1\),


\(c_3 = 0\), and \(c_4 = 1\). Thus, for the two proper eigenvectors corresponding to the eigenvalue 2, we get

$$e_1^{(2)} = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}, \qquad e_2^{(2)} = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 1 \end{pmatrix}.$$

The dimension of the invariant subspace corresponding to the eigenvalue 2 is three (due to the triple root) and, hence, there should be one generalized eigenvector. To determine it, we examine the following matrix equation:

$$(A - 2E)^2 x = 0. \tag{12.160}$$

The matrix representation is

$$\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -4 & 0 & -4 & 4 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix} = 0.$$

That is,

$$-c_1 - c_3 + c_4 = 0. \tag{12.161}$$

Furthermore, we have

$$(A - 2E)^3 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -8 & 0 & -8 & 8 \end{pmatrix}. \tag{12.162}$$

Moreover, \(\mathrm{rank}\,(A - 2E)^l\) (= 1) remains unchanged for l ≥ 2, as expected from (12.149). It will be convenient to examine the structure of the invariant subspace. For this purpose, we seek the number of Jordan blocks \(r_1^{(2)}\) and their orders. Using (12.150), we have

$$r_1^{(2)} = 4 - \mathrm{rank}\,(A - 2E) = 4 - 2 = 2. \tag{12.163}$$

The number of first-order Jordan blocks is

$$J_1^{(2)} = \mathrm{rank}\,(A - 2E)^0 + \mathrm{rank}\,(A - 2E)^2 - 2\,\mathrm{rank}\,(A - 2E) = 4 + 1 - 2 \times 2 = 1. \tag{12.164}$$

In turn, the number of second-order Jordan blocks is

$$J_2^{(2)} = \mathrm{rank}\,(A - 2E) + \mathrm{rank}\,(A - 2E)^3 - 2\,\mathrm{rank}\,(A - 2E)^2 = 2 + 1 - 2 \times 1 = 1. \tag{12.165}$$

Fig. 12.2 Structure of Jordan blocks of the matrix shown in (12.170): one first-order block for the eigenvalue 4; one first-order and one second-order block for the eigenvalue 2

In the above, \(J_1^{(2)}\) and \(J_2^{(2)}\) are obtained from (12.152). Thus, Fig. 12.2 gives the constitution of Jordan blocks for the eigenvalues 4 and 2. The overall number of Jordan blocks is three; the numbers of first-order and second-order Jordan blocks are two and one, respectively.

The proper eigenvector \(e_1^{(2)}\) is related to \(J_1^{(2)}\) of (12.164). The pair of the proper eigenvector \(e_2^{(2)}\) and the corresponding generalized eigenvector \(g_2^{(2)}\) is pertinent to \(J_2^{(2)}\) of (12.165). We must have the generalized eigenvector \(g_2^{(2)}\) in such a way that

$$(A - 2E)^2 g_2^{(2)} = (A - 2E)\big[(A - 2E)g_2^{(2)}\big] = (A - 2E)e_2^{(2)} = 0. \tag{12.166}$$

From (12.161), we can put \(c_1 = c_3 = c_4 = 0\) and \(c_2 = -1\). Thus, the matrix representation of the generalized eigenvector \(g_2^{(2)}\) is

$$g_2^{(2)} = \begin{pmatrix} 0 \\ -1 \\ 0 \\ 0 \end{pmatrix}. \tag{12.167}$$

We stress here that \(e_1^{(2)}\) is not eligible as a proper pair with \(g_2^{(2)}\) in \(J_2\). It is because from (12.166) we have

$$(A - 2E)g_2^{(2)} = e_2^{(2)}, \qquad (A - 2E)g_2^{(2)} \neq e_1^{(2)}. \tag{12.168}$$
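The rank values and block counts quoted in (12.156)–(12.165), along with the eigenvector relations just derived, can be double-checked numerically. A sketch in plain Python with exact rational arithmetic (helper names are our own):

```python
from fractions import Fraction

def rank(M):
    """Rank via Gaussian elimination over the rationals (exact)."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matpow(A, k):
    P = [[int(i == j) for j in range(len(A))] for i in range(len(A))]
    for _ in range(k):
        P = matmul(P, A)
    return P

def apply(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[ 1, -1,  0, 0],
     [ 1,  3,  0, 0],
     [ 0,  0,  2, 0],
     [-3, -1, -2, 4]]
B4 = [[A[i][j] - (4 if i == j else 0) for j in range(4)] for i in range(4)]
B2 = [[A[i][j] - (2 if i == j else 0) for j in range(4)] for i in range(4)]

print([rank(matpow(B4, k)) for k in (1, 2)])     # -> [3, 3]
print([rank(matpow(B2, k)) for k in (1, 2, 3)])  # -> [2, 1, 1]

# Eigenvector relations: (A-4E)e1(4) = 0, (A-2E)e1(2) = (A-2E)e2(2) = 0,
# and (A-2E)g2(2) = e2(2), as in (12.166)-(12.168)
assert apply(B4, [0, 0, 0, 1]) == [0, 0, 0, 0]
assert apply(B2, [0, 0, 1, 1]) == [0, 0, 0, 0]
assert apply(B2, [1, -1, 0, 1]) == [0, 0, 0, 0]
assert apply(B2, [0, -1, 0, 0]) == [1, -1, 0, 1]
```

Plugging the printed ranks into (12.152) reproduces \(J_1^{(4)} = 1\), \(J_1^{(2)} = 1\), and \(J_2^{(2)} = 1\).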

Thus, we have determined a set of (generalized) eigenvectors. The matrix R representing the basis vector transformation is given by

$$R = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & -1 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \end{pmatrix} = \big(\tilde{e}_1^{(4)} \;\; \tilde{e}_1^{(2)} \;\; \tilde{e}_2^{(2)} \;\; \tilde{g}_2^{(2)}\big), \tag{12.169}$$

where the symbol ~ denotes the column vector representation; \(e_1^{(4)}\), \(e_1^{(2)}\), and \(e_2^{(2)}\) represent proper eigenvectors and \(g_2^{(2)}\) is a generalized eigenvector. Performing the similarity transformation using this R, we get the following Jordan canonical form:

$$R^{-1} A R = \begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -3 & -1 & -2 & 4 \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & -1 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}. \tag{12.170}$$

The structure of Jordan blocks is shown in Fig. 12.2. Notice here that the trace of A remains unchanged before and after the similarity transformation.

Next, we consider column vector representations. According to (11.37), let us view the matrix A as a linear transformation over \(V^4\). Then A is given by

$$A(x) = (e_1 \;\; e_2 \;\; e_3 \;\; e_4)\, A \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = (e_1 \;\; e_2 \;\; e_3 \;\; e_4) \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -3 & -1 & -2 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}, \tag{12.171}$$

where \(e_1, e_2, e_3, e_4\) are basis vectors and \(x_1, x_2, x_3, x_4\) are the corresponding coordinates of a vector \(x = \sum_{i=1}^{4} x_i e_i \in V^4\). We rewrite (12.171) as


$$A(x) = (e_1 \;\; e_2 \;\; e_3 \;\; e_4)\, R R^{-1} A R R^{-1} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \big(e_1^{(4)} \;\; e_1^{(2)} \;\; e_2^{(2)} \;\; g_2^{(2)}\big) \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} x'_1 \\ x'_2 \\ x'_3 \\ x'_4 \end{pmatrix}, \tag{12.172}$$

where we have

$$\big(e_1^{(4)} \;\; e_1^{(2)} \;\; e_2^{(2)} \;\; g_2^{(2)}\big) = (e_1 \;\; e_2 \;\; e_3 \;\; e_4)\, R, \qquad \begin{pmatrix} x'_1 \\ x'_2 \\ x'_3 \\ x'_4 \end{pmatrix} = R^{-1} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}. \tag{12.173}$$

x04 After (11.84), we have x1 x = ð e1

e2

e3

e4 Þ

x2

x1 = ð e1

e2

x3 x4

ð2Þ

e1

ð2Þ

e2

ð2Þ

g2

e4 ÞRR - 1

x2 x3 x4

x01 = eð14Þ

e3

x02 x03

ð12:174Þ

:

x04 As for Vn in general, let us put R = ( p)ij and R-1 = (q)ij. Also represent j-th (generalized) eigenvectors by a column vector and denote them by p( j ). There we display individual (generalized) eigenvectors in the order of (e(1) e(2)⋯e( j )⋯e(n)), where e( j ) (1 ≤ j ≤ n) denotes either a proper eigenvector or a generalized eigenvector according to (12.174). Each p( j ) is represented in reference to original basis vectors (e1⋯en). Then we have


$$R^{-1} p^{(j)} = \left(\sum_{k=1}^{n} q_{ik}\, p_k^{(j)}\right) = \delta_i^{(j)}, \tag{12.175}$$

where \(\delta_i^{(j)}\) denotes a column vector in which only the j-th row is 1 and all others are 0. Thus, the column vector \(\delta_i^{(j)}\) is an "address" of \(e^{(j)}\) in reference to \((e^{(1)}\; e^{(2)} \cdots e^{(j)} \cdots e^{(n)})\) taken as basis vectors. In our present case, in fact, we have

$$R^{-1} p^{(1)} = \begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad R^{-1} p^{(2)} = \begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix},$$

$$R^{-1} p^{(3)} = \begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ -1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \qquad R^{-1} p^{(4)} = \begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ -1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}. \tag{12.176}$$

In (12.176), \(p^{(1)}\) is the column vector representation of \(e_1^{(4)}\); \(p^{(2)}\) is the column vector representation of \(e_1^{(2)}\), and so on. The minimal polynomial \(\Phi_A(x)\) is expressed as

$$\Phi_A(x) = (x-4)(x-2)^2.$$

Readers can easily make sure of it. We remark that a non-singular matrix R pertinent to the similarity transformation is not uniquely determined; there is some arbitrariness. In (12.169), for example, if we adopt \(-g_2^{(2)}\) instead of \(g_2^{(2)}\), we should adopt \(-e_2^{(2)}\) instead of \(e_2^{(2)}\) accordingly. Thus, instead of R in (12.169) we may choose

$$R' = \begin{pmatrix} 0 & 0 & -1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & -1 & 0 \end{pmatrix}. \tag{12.177}$$

In this case, we also get the same Jordan canonical form as before. That is,

$$R'^{-1} A R' = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}.$$

Suppose that we choose R″ such that

$$(e_1 \;\; e_2 \;\; e_3 \;\; e_4)\, R'' = \big(e_1^{(2)} \;\; e_2^{(2)} \;\; g_2^{(2)} \;\; e_1^{(4)}\big), \quad \text{i.e.,} \quad R'' = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & -1 & -1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix}.$$

In this case, we have

$$R''^{-1} A R'' = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}.$$

Note that we get a different disposition of the matrix elements from that of (12.172).

Next, we decompose A into a semi-simple matrix and a nilpotent matrix. In (12.172), we had

$$R^{-1} A R = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}.$$

Defining

$$S = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix} \quad \text{and} \quad N = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix},$$

we have

$$R^{-1} A R = S + N, \quad \text{i.e.,} \quad A = R(S + N)R^{-1} = RSR^{-1} + RNR^{-1}.$$

Performing the above matrix calculations and putting \(\widetilde{S} = RSR^{-1}\) and \(\widetilde{N} = RNR^{-1}\), we get

$$A = \widetilde{S} + \widetilde{N}$$

with

$$\widetilde{S} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -2 & 0 & -2 & 4 \end{pmatrix} \quad \text{and} \quad \widetilde{N} = \begin{pmatrix} -1 & -1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}.$$

That is, we have

$$A = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -3 & -1 & -2 & 4 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -2 & 0 & -2 & 4 \end{pmatrix} + \begin{pmatrix} -1 & -1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}. \tag{12.178}$$
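The claims around (12.178) — that \(A = \widetilde{S} + \widetilde{N}\), that \(\widetilde{N}\) is nilpotent, and that \(\widetilde{S}\) and \(\widetilde{N}\) commute — can be confirmed by direct multiplication. A sketch in plain Python:

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[ 1, -1,  0, 0],
     [ 1,  3,  0, 0],
     [ 0,  0,  2, 0],
     [-3, -1, -2, 4]]
S = [[ 2,  0,  0, 0],
     [ 0,  2,  0, 0],
     [ 0,  0,  2, 0],
     [-2,  0, -2, 4]]   # semi-simple part S~ of (12.178)
N = [[-1, -1,  0, 0],
     [ 1,  1,  0, 0],
     [ 0,  0,  0, 0],
     [-1, -1,  0, 0]]   # nilpotent part N~ of (12.178)

zero = [[0, 0, 0, 0] for _ in range(4)]
assert all(A[i][j] == S[i][j] + N[i][j] for i in range(4) for j in range(4))
assert matmul(N, N) == zero            # N~ squares to zero, hence nilpotent
assert matmul(S, N) == matmul(N, S)    # S~ and N~ commute
print("decomposition (12.178) verified")
```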

Even though the matrix forms of S and N differ depending on the choice of the similarity transformation matrix (namely R, R′, R″, etc. represented above), the decomposition (12.178) is unique. That is, \(\widetilde{S}\) and \(\widetilde{N}\) are uniquely determined once a matrix A is given. The matrices \(\widetilde{S}\) and \(\widetilde{N}\) are commutable; the confirmation is left for readers as an exercise.

We present another simple example. Let A be the matrix

$$A = \begin{pmatrix} 0 & 4 \\ -1 & 4 \end{pmatrix}.$$

The eigenvalue of A is 2 as a double root. Following the routine, we have a proper eigenvector \(e_1^{(2)}\) as a column vector, e.g.,

$$e_1^{(2)} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}.$$

The other eigenvector is a generalized eigenvector \(g_1^{(2)}\) of rank 2, decided such that

$$(A - 2E)\,g_1^{(2)} = e_1^{(2)}, \qquad A - 2E = \begin{pmatrix} -2 & 4 \\ -1 & 2 \end{pmatrix}.$$

As an option, we get

$$g_1^{(2)} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

2 1

1 1

,

and

R-1 =

-1 2

1 -1

:

Therefore, with a Jordan canonical form we have R - 1 AR =

2 0

1 2

:

ð12:179Þ

As before, putting

$$S = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \quad \text{and} \quad N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},$$

we have

$$R^{-1} A R = S + N, \quad \text{i.e.,} \quad A = R(S + N)R^{-1} = RSR^{-1} + RNR^{-1}.$$

Putting \(\widetilde{S} = RSR^{-1}\) and \(\widetilde{N} = RNR^{-1}\), we get

$$A = \widetilde{S} + \widetilde{N}$$

with

$$\widetilde{S} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \quad \text{and} \quad \widetilde{N} = \begin{pmatrix} -2 & 4 \\ -1 & 2 \end{pmatrix}. \tag{12.180}$$
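For this 2 × 2 example, everything can again be checked in a few lines of plain Python (the inverse of R is hardcoded from the text):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A    = [[0, 4], [-1, 4]]
R    = [[2, 1], [1, 1]]
Rinv = [[1, -1], [-1, 2]]

J = matmul(Rinv, matmul(A, R))
print(J)  # -> [[2, 1], [0, 2]], the Jordan canonical form (12.179)

Nt = [[-2, 4], [-1, 2]]   # N~ of (12.180); the semi-simple part S~ is 2E
assert matmul(Nt, Nt) == [[0, 0], [0, 0]]           # N~ is nilpotent
assert all(A[i][j] == (2 if i == j else 0) + Nt[i][j]
           for i in range(2) for j in range(2))      # A = S~ + N~
```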

We may also choose a transformation matrix R′ together with its inverse \(R'^{-1}\) instead of R and \(R^{-1}\), respectively, such that, e.g.,

$$R' = \begin{pmatrix} 2 & -3 \\ 1 & -1 \end{pmatrix} \quad \text{and} \quad R'^{-1} = \begin{pmatrix} -1 & 3 \\ -1 & 2 \end{pmatrix}.$$

Using these matrices, we get exactly the same Jordan canonical form and matrix decomposition as (12.179) and (12.180). Thus, again we find that the matrix decomposition is unique.

Another simple example is the lower triangle matrix

$$A = \begin{pmatrix} 2 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Following the now familiar procedures, as a transformation matrix we have, e.g.,

$$R = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad R^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Then, we get

$$R^{-1} A R = S = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad (\text{with the nilpotent part } N = 0).$$

Therefore, the "decomposition" is

$$A = RSR^{-1} = \begin{pmatrix} 2 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

where the first term is a semi-simple matrix and the second is a nilpotent matrix (i.e., the zero matrix). Thus, the decomposition is once again unique.

12.7 Diagonalizable Matrices

Among canonical forms of matrices, the simplest form is a diagonalizable matrix. Here we define a diagonalizable matrix as a matrix that can be converted by similarity transformation to one whose off-diagonal elements are all zero. In Sect. 12.5 we investigated various properties of matrices; in this section we examine the basic properties of diagonalizable matrices.

In Sect. 12.6.1 we showed that \(\mathrm{Span}\{N^{i-1}a_\rho, N^{i-2}a_\rho, \cdots, Na_\rho, a_\rho\}\) forms an N-invariant subspace of dimension i, where \(a_\rho\) satisfies the relations \(N^i a_\rho = 0\) and \(N^{i-1} a_\rho \neq 0\) \((r_{i+1}+1 \leq \rho \leq r_i)\) as in (12.122). Of these vectors, only \(N^{i-1}a_\rho\) is the sole


proper eigenvector, accompanied by (i − 1) generalized eigenvectors. Note that only the proper eigenvector can construct a one-dimensional N-invariant subspace by itself. This is because, for any generalized eigenvector g (here g stands for all the generalized eigenvectors), Ng (≠ 0) and g are linearly independent, whereas with a proper eigenvector e we have Ne = 0. A corresponding Jordan block is represented by a matrix as given in (12.122) in reference to the basis vectors comprising these i eigenvectors. Therefore, if an (n, n) matrix A has only proper eigenvectors, all Jordan blocks are one-dimensional; this means that A is diagonalizable. That A has only proper eigenvectors is equivalent to saying that each eigenvector spans a one-dimensional subspace and that \(V^n\) is a direct sum of the subspaces spanned by the individual proper eigenvectors. In other words, if \(V^n\) is a direct sum of subspaces (i.e., eigenspaces) spanned by individual proper eigenvectors of A, then A is diagonalizable.

Next, suppose that A is diagonalizable. Then, after an appropriate similarity transformation with a non-singular matrix P, A has the following form:

$$P^{-1} A P = \mathrm{diag}(\alpha_1, \cdots, \alpha_1,\; \alpha_2, \cdots, \alpha_2,\; \cdots,\; \alpha_s, \cdots, \alpha_s). \tag{12.181}$$

In this case, let us examine what form the minimal polynomial \(\varphi_A(x)\) for A takes. The characteristic polynomial \(f_A(x)\) for A is invariant under similarity transformation, and so is \(\varphi_A(x)\). That is,

$$\varphi_{P^{-1}AP}(x) = \varphi_A(x). \tag{12.182}$$

From (12.181), we find that \(A - \alpha_i E\) (1 ≤ i ≤ s) has a "perforated" form such as (12.143), with the diagonalized form unchanged. Then we have

$$(A - \alpha_1 E)(A - \alpha_2 E) \cdots (A - \alpha_s E) = 0. \tag{12.183}$$

This is because a product of matrices having only diagonal elements is merely the product of the individual diagonal elements. Meanwhile, by virtue of the Hamilton-Cayley theorem,

$$f_A(A) = \prod_{i=1}^{s} (A - \alpha_i E)^{n_i} = 0.$$

Rewriting this expression, we have


$$(A - \alpha_1 E)^{n_1} (A - \alpha_2 E)^{n_2} \cdots (A - \alpha_s E)^{n_s} = 0. \tag{12.184}$$

In light of (12.183), this implies that the minimal polynomial \(\varphi_A(x)\) is expressed as

$$\varphi_A(x) = (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_s). \tag{12.185}$$

Surely \(\varphi_A(x)\) of (12.185) is a lowest-order polynomial among those satisfying f(A) = 0 and is a divisor of \(f_A(x)\). Also, \(\varphi_A(x)\) has a highest-order coefficient of 1. Thus, \(\varphi_A(x)\) should be the minimal polynomial of A, and we conclude that \(\varphi_A(x)\) does not have a multiple root.

Then let us think how \(V^n\) is characterized in case \(\varphi_A(x)\) does not have a multiple root; this is equivalent to \(\varphi_A(x)\) being described by (12.185). To see this, suppose that we have two matrices A and B and let \(BV^n = W\). We wish to use the following relation [3, 4]:

$$\begin{aligned} \mathrm{rank}(AB) = \dim ABV^n = \dim AW &= \dim W - \dim\big(A^{-1}(0) \cap W\big) \\ &\geq \dim W - \dim A^{-1}(0) = \dim BV^n - \big(n - \dim AV^n\big) \\ &= \mathrm{rank}\,A + \mathrm{rank}\,B - n. \end{aligned} \tag{12.186}$$

In (12.186), the third equality comes from the fact that the domain of A is restricted to W; concomitantly, \(A^{-1}(0)\) is restricted to \(A^{-1}(0) \cap W\) as well (notice that \(A^{-1}(0) \cap W\) is a subspace). Considering these situations, we use a relation corresponding to that of (11.45). The fourth equality is due to the dimension theorem of (11.45). Applying (12.186) to (12.183) successively, we have

$$\begin{aligned} 0 = \mathrm{rank}\,[(A - \alpha_1 E)(A - \alpha_2 E) \cdots (A - \alpha_s E)] &\geq \mathrm{rank}\,(A - \alpha_1 E) + \mathrm{rank}\,[(A - \alpha_2 E) \cdots (A - \alpha_s E)] - n \\ &\geq \mathrm{rank}\,(A - \alpha_1 E) + \mathrm{rank}\,(A - \alpha_2 E) + \mathrm{rank}\,[(A - \alpha_3 E) \cdots (A - \alpha_s E)] - 2n \\ &\geq \cdots \\ &\geq \mathrm{rank}\,(A - \alpha_1 E) + \cdots + \mathrm{rank}\,(A - \alpha_s E) - (s-1)n \\ &= \sum_{i=1}^{s} \big[\mathrm{rank}\,(A - \alpha_i E) - n\big] + n. \end{aligned}$$

Finally we get

$$\sum_{i=1}^{s} \big[n - \mathrm{rank}\,(A - \alpha_i E)\big] \geq n. \tag{12.187}$$
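The rank inequality in (12.186), \(\mathrm{rank}(AB) \geq \mathrm{rank}\,A + \mathrm{rank}\,B - n\), is easy to probe numerically. A sketch in plain Python with exact arithmetic; the test matrices are our own illustrative choices:

```python
from fractions import Fraction

def rank(M):
    """Rank via Gaussian elimination over the rationals (exact)."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

n = 2
pairs = [
    ([[1, 0], [0, 0]], [[0, 0], [0, 1]]),  # rank 1, rank 1 -> product rank 0 (bound 0)
    ([[1, 0], [0, 1]], [[1, 2], [3, 4]]),  # rank 2, rank 2 -> product rank 2 (bound 2)
    ([[1, 1], [1, 1]], [[1, 0], [0, 1]]),  # rank 1, rank 2 -> product rank 1 (bound 1)
]
for A, B in pairs:
    assert rank(matmul(A, B)) >= rank(A) + rank(B) - n
print("rank(AB) >= rank A + rank B - n holds for all samples")
```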

As \(n - \mathrm{rank}\,(A - \alpha_i E) = \dim W_{\alpha_i}\), we have

$$\sum_{i=1}^{s} \dim W_{\alpha_i} \geq n. \tag{12.188}$$

Meanwhile, we have

$$V^n \supset W_{\alpha_1} \oplus W_{\alpha_2} \oplus \cdots \oplus W_{\alpha_s}, \qquad n \geq \dim\big(W_{\alpha_1} \oplus W_{\alpha_2} \oplus \cdots \oplus W_{\alpha_s}\big) = \sum_{i=1}^{s} \dim W_{\alpha_i}. \tag{12.189}$$

The equality results from the property of a direct sum. From (12.188) and (12.189),

$$\sum_{i=1}^{s} \dim W_{\alpha_i} = n. \tag{12.190}$$

Hence,

$$V^n = W_{\alpha_1} \oplus W_{\alpha_2} \oplus \cdots \oplus W_{\alpha_s}. \tag{12.191}$$

Thus, we have proven that if the minimal polynomial does not have a multiple root, \(V^n\) is decomposed into a direct sum of eigenspaces as in (12.191). If in turn \(V^n\) is decomposed into a direct sum of eigenspaces as in (12.191), A can be diagonalized by a similarity transformation. The proof is as follows. Suppose that (12.191) holds. Then, we can take only eigenvectors for the basis vectors of \(V^n\). Suppose that \(\dim W_{\alpha_i} = n_i\). Then, we can take vectors \(a_k\) \(\big(\sum_{j=1}^{i-1} n_j + 1 \leq k \leq \sum_{j=1}^{i} n_j\big)\) so that the \(a_k\) can be the basis vectors of \(W_{\alpha_i}\). In reference to this basis set, we describe a vector \(x \in V^n\) such that

$$x = (a_1 \;\; a_2 \;\; \cdots \;\; a_n) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.$$

Operating A on x, we get

$$\begin{aligned} A(x) &= (a_1 \;\; a_2 \;\; \cdots \;\; a_n)\, A \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = (a_1 A \;\; a_2 A \;\; \cdots \;\; a_n A) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \\ &= (\alpha_1 a_1 \;\; \alpha_2 a_2 \;\; \cdots \;\; \alpha_n a_n) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = (a_1 \;\; a_2 \;\; \cdots \;\; a_n) \begin{pmatrix} \alpha_1 & & & \\ & \alpha_2 & & \\ & & \ddots & \\ & & & \alpha_n \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \end{aligned} \tag{12.192}$$


where with the second equality we used the notation (11.40); with the third equality some of the α_i (1 ≤ i ≤ n) may be identical; a_i is an eigenvector that corresponds to an eigenvalue α_i. Suppose that a_1, a_2, ⋯, and a_n are obtained by transforming an "original" basis set e_1, e_2, ⋯, and e_n by R. Then, we have

A(x) = (e_1\ e_2\ \cdots\ e_n)\,R \begin{pmatrix} α_1 & & & \\ & α_2 & & \\ & & \ddots & \\ & & & α_n \end{pmatrix} R^{-1} \begin{pmatrix} x_1^{(0)} \\ x_2^{(0)} \\ \vdots \\ x_n^{(0)} \end{pmatrix}.

We denote the transformation A with respect to the basis set e_1, e_2, ⋯, and e_n by A_0; see (11.82) for the notation. Then, we have

A(x) = (e_1\ e_2\ \cdots\ e_n)\,A_0 \begin{pmatrix} x_1^{(0)} \\ x_2^{(0)} \\ \vdots \\ x_n^{(0)} \end{pmatrix}.

Therefore, we get

R^{-1} A_0 R = \begin{pmatrix} α_1 & & & \\ & α_2 & & \\ & & \ddots & \\ & & & α_n \end{pmatrix}.    (12.193)

Thus, A is similar to a diagonal matrix, as represented in (12.192) and (12.193). It is straightforward to show that the minimal polynomial of a diagonalizable matrix has no multiple roots; the proof is left to readers. Summarizing the above arguments, we have the following theorem:

Theorem 12.7 [3, 4] The following three statements related to A are equivalent:
1. The matrix A is similar to a diagonal matrix (i.e., semi-simple).
2. The minimal polynomial φ_A(x) does not have a multiple root.
3. The vector space V^n is decomposed into a direct sum of eigenspaces.

In Example 12.1 we showed the diagonalization of a matrix; there A had two different eigenvalues. With a (n, n) matrix having n different eigenvalues, the characteristic polynomial does not have a multiple root, and so the minimal polynomial necessarily has no multiple roots either. The above theorem therefore ensures that a matrix whose characteristic polynomial has no multiple roots must be diagonalizable.
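Theorem 12.7 can be spot-checked on small matrices. The sketch below (plain Python, no outside libraries; the matrices and helper names are our own illustrative choices, not the book's) contrasts A = [[2, 1], [0, 1]] of Example 12.1, whose minimal polynomial (x - 2)(x - 1) is square-free, with N = [[1, 1], [0, 1]], whose minimal polynomial (x - 1)^2 has a multiple root, so N admits only a one-dimensional eigenspace and cannot be diagonalized.

```python
def matmul(X, Y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

E = [[1, 0], [0, 1]]
zero = [[0, 0], [0, 0]]

def sub(X, c):
    """Return X - c*E."""
    return [[X[i][j] - c * E[i][j] for j in range(2)] for i in range(2)]

# A has the square-free minimal polynomial (x - 2)(x - 1):
A = [[2, 1], [0, 1]]
assert matmul(sub(A, 2), sub(A, 1)) == zero   # (A - 2E)(A - E) = 0

# P collects one eigenvector per eigenvalue:
# A(1,0)^T = 2(1,0)^T and A(1,-1)^T = 1*(1,-1)^T
P = [[1, 1], [0, -1]]
P_inv = [[1, 1], [0, -1]]        # this particular P is its own inverse
D = matmul(P_inv, matmul(A, P))
assert D == [[2, 0], [0, 1]]     # diagonal: A is semi-simple

# N has minimal polynomial (x - 1)^2 (a multiple root):
N = [[1, 1], [0, 1]]
assert matmul(sub(N, 1), sub(N, 1)) == zero and sub(N, 1) != zero
# (N - E) != 0 means the eigenspace of the sole eigenvalue 1 is only
# one-dimensional, so no basis of eigenvectors exists: N is not diagonalizable.
```

The assertions mirror statements 1 and 2 of the theorem: a square-free minimal polynomial comes with a full eigenvector basis, a repeated root does not.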


Another consequence of this theorem is that an idempotent matrix is diagonalizable. Such a matrix is characterized by A^2 = A. Then A(A - E) = 0. Taking its determinant, (det A)[det(A - E)] = 0. Therefore, we have either det A = 0 or det(A - E) = 0. Hence, the eigenvalues of A are 0 or 1. Think of f(x) = x(x - 1). As f(A) = 0, the minimal polynomial of A divides f(x); it has no multiple roots, and so the matrix is diagonalizable.

Example 12.6 Let us revisit Example 12.1, where we dealt with

A = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}.    (12.32)

From (12.33), f_A(x) = (x - 2)(x - 1). Note that f_A(x) = f_{P^{-1}AP}(x). Let us treat the problem according to Theorem 12.5, using the notation of (12.85). Given f_1(x) = x - 1 and f_2(x) = x - 2, let us decide M_1(x) and M_2(x) such that they satisfy

M_1(x)f_1(x) + M_2(x)f_2(x) = 1.    (12.194)

We find M_1(x) = 1 and M_2(x) = -1. Thus, using the notation of Theorem 12.5, Sect. 12.4, we have

A_1 = M_1(A)f_1(A) = A - E = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \quad A_2 = M_2(A)f_2(A) = -A + 2E = \begin{pmatrix} 0 & -1 \\ 0 & 1 \end{pmatrix}.

We also have

A_1 + A_2 = E, \quad A_iA_j = A_jA_i = A_iδ_{ij}.    (12.195)

Thus, we find that A_1 and A_2 are idempotent matrices. As both A_1 and A_2 are expressed by polynomials in A, they commute with A. We find that A is represented by

A = α_1A_1 + α_2A_2,    (12.196)

where α_1 and α_2 denote the eigenvalues 2 and 1, respectively. Thus, choosing proper eigenvectors for basis vectors, we have decomposed the vector space V^n into a direct sum of invariant subspaces comprising the proper eigenvectors. Concomitantly, A is represented as in (12.196). The relevant decomposition is always possible for a diagonalizable matrix. Thus, idempotent matrices play an important role in linear vector spaces.
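The algebra of Example 12.6 is easy to confirm by direct computation. A minimal sketch in plain Python (the helper names are ours, not the book's):

```python
def mat_mul(X, Y):
    """Matrix product of two square nested-list matrices."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_scale(c, X):
    return [[c * x for x in row] for row in X]

A  = [[2, 1], [0, 1]]
A1 = [[1, 1], [0, 0]]    # A - E
A2 = [[0, -1], [0, 1]]   # -A + 2E

assert mat_mul(A1, A1) == A1                  # A1 is idempotent
assert mat_mul(A2, A2) == A2                  # A2 is idempotent
assert mat_mul(A1, A2) == [[0, 0], [0, 0]]    # A1 A2 = 0, as in (12.195)
assert mat_add(A1, A2) == [[1, 0], [0, 1]]    # A1 + A2 = E
# Spectral decomposition (12.196): A = 2*A1 + 1*A2
assert mat_add(mat_scale(2, A1), mat_scale(1, A2)) == A
```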


Example 12.7 Let us think of the following matrix:

A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \end{pmatrix}.    (12.197)

This is a triangular matrix, and so the diagonal elements give the eigenvalues. We have an eigenvalue 1 of double root and an eigenvalue 0 of simple root. The matrix can be diagonalized using P such that

\tilde{A} = P^{-1}AP = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.    (12.198)

As can be checked easily, \tilde{A}^2 = \tilde{A}. We also have

E - \tilde{A} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix},    (12.199)

where (E - \tilde{A})^2 = E - \tilde{A} holds as well. Moreover, \tilde{A}(E - \tilde{A}) = (E - \tilde{A})\tilde{A} = 0. Thus \tilde{A} and E - \tilde{A} behave like A_1 and A_2 of (12.195).

Next, suppose that x ∈ V^n is expressed as a linear combination of basis vectors a_1, a_2, ⋯, and a_n. Then

x = c_1a_1 + c_2a_2 + ⋯ + c_{n-1}a_{n-1} + c_na_n.    (12.200)

Here let us define the following linear transformation P^{(k)} such that P^{(k)} "extracts" the k-th component of x. That is,

P^{(k)}(x) = P^{(k)}\left(\sum_{j=1}^{n} c_j a_j\right) = \sum_{j=1}^{n}\sum_{i=1}^{n} p^{[k]}_{ij} c_j a_i = c_k a_k,    (12.201)

where p^{[k]}_{ij} is the matrix representation of P^{(k)}. In fact, suppose that there is another arbitrarily chosen vector y such that

y = d_1a_1 + d_2a_2 + ⋯ + d_{n-1}a_{n-1} + d_na_n.    (12.202)

Then we have


P^{(k)}(ax + by) = (ac_k + bd_k)a_k = ac_ka_k + bd_ka_k = aP^{(k)}(x) + bP^{(k)}(y).    (12.203)

Thus P^{(k)} is a linear transformation. In (12.201), for the third equality to hold, we should have

p^{[k]}_{ij} = δ_i^{(k)}δ_j^{(k)},    (12.204)

where δ_i^{(j)} has been defined in (12.175). Meanwhile, δ_j^{(k)} denotes a row vector in which only the k-th column is 1, otherwise 0. Note that δ_i^{(k)} represents a (n, 1) matrix and that δ_j^{(k)} denotes a (1, n) matrix. Therefore, δ_i^{(k)}δ_j^{(k)} represents a (n, n) matrix whose (k, k) element is 1, otherwise 0. Thus, P^{(k)}(x) is denoted by

P^{(k)}(x) = (a_1 \cdots a_n) \begin{pmatrix} 0 & & & & \\ & \ddots & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = x_k a_k,    (12.205)

where only the (k, k) element is 1, otherwise 0. Then P^{(k)}[P^{(k)}(x)] = P^{(k)}(x). That is,

[P^{(k)}]^2 = P^{(k)}.    (12.206)

Also P^{(k)}[P^{(l)}(x)] = 0 if k ≠ l. Meanwhile, we have P^{(1)}(x) + ⋯ + P^{(n)}(x) = x. Hence, P^{(1)}(x) + ⋯ + P^{(n)}(x) = [P^{(1)} + ⋯ + P^{(n)}](x) = x. Since this relation holds with any x ∈ V^n, we get

P^{(1)} + ⋯ + P^{(n)} = E.    (12.207)
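The relations (12.206) and (12.207) can be spot-checked by building the matrices P^{(k)} explicitly. A short plain-Python sketch (the dimension n = 3 is an arbitrary choice of ours):

```python
n = 3

def proj(k):
    """Matrix of P^(k): a single 1 at position (k, k), zeros elsewhere."""
    return [[1 if (i == j == k) else 0 for j in range(n)] for i in range(n)]

def mat_mul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

ZERO = [[0] * n for _ in range(n)]
for k in range(n):
    Pk = proj(k)
    assert mat_mul(Pk, Pk) == Pk                  # (12.206): P^(k) is idempotent
    for l in range(n):
        if l != k:
            assert mat_mul(Pk, proj(l)) == ZERO   # P^(k) P^(l) = 0 for k != l

# (12.207): the projectors sum to the identity E
S = [[sum(proj(k)[i][j] for k in range(n)) for j in range(n)] for i in range(n)]
assert S == [[1 if i == j else 0 for j in range(n)] for i in range(n)]
```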

As shown above, an idempotent matrix such as P^{(k)} always exists. In particular, if the basis vectors comprise only proper eigenvectors, the decomposition expressed in (12.196) is possible. In that case, it is described as

A = α_1A_1 + ⋯ + α_nA_n,    (12.208)

where α_1, ⋯, and α_n are eigenvalues (some of which may be identical) and A_1, ⋯, and A_n are idempotent matrices such as those represented by (12.205). In fact, the above situation is equivalent to A being semi-simple. This implies that A can be decomposed into the following form:

R^{-1}AR = α_1P^{(1)} + ⋯ + α_nP^{(n)},    (12.209)

where R is a non-singular operator. Rewriting (12.209), we have

A = α_1RP^{(1)}R^{-1} + ⋯ + α_nRP^{(n)}R^{-1}.    (12.210)

Putting \tilde{A}_k ≡ RP^{(k)}R^{-1} (1 ≤ k ≤ n), we obtain

A = α_1\tilde{A}_1 + ⋯ + α_n\tilde{A}_n.    (12.211)

We can readily show that

\tilde{A}_k\tilde{A}_l = \tilde{A}_kδ_{kl}.    (12.212)

That is, \tilde{A}_k (1 ≤ k ≤ n) is an idempotent matrix. Yet, we have to be careful when constructing idempotent matrices according to the formalism described in Theorem 12.5, because we often encounter a situation where different matrices give an identical characteristic polynomial. We briefly mention this in the next example.

Example 12.8 Let us think about the following two matrices:

A = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \quad B = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 1 & 2 \end{pmatrix}.    (12.213)

Then, following Theorem 12.5, Sect. 12.4, we have

f_A(x) = f_B(x) = (x - 3)(x - 2)^2,    (12.214)

with eigenvalues α_1 = 3 and α_2 = 2. Also we have f_1(x) = (x - 2)^2 and f_2(x) = x - 3. Following the procedures of (12.85) and (12.86), we obtain

M_1(x) = x - 2 \quad and \quad M_2(x) = -x^2 + 3x - 3.    (12.215)

Therefore, we have

M_1(x)f_1(x) = (x - 2)^3, \quad M_2(x)f_2(x) = (x - 3)(-x^2 + 3x - 3).    (12.216)

Hence, we get

A_1 ≡ M_1(A)f_1(A) = (A - 2E)^3, \quad A_2 ≡ M_2(A)f_2(A) = (A - 3E)(-A^2 + 3A - 3E).    (12.217)

Similarly, we get B_1 and B_2 by replacing A with B in (12.217). Thus, we have

A_1 = B_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad A_2 = B_2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.    (12.218)

Notice that we get the same idempotent matrices in (12.218), even though the matrix forms of A and B differ. Also we have

A_1 + A_2 = B_1 + B_2 = E.    (12.219)

Then, we have

A = (A_1 + A_2)A = A_1A + A_2A, \quad B = (B_1 + B_2)B = B_1B + B_2B.    (12.220)

Nonetheless, although A = 3A_1 + 2A_2 holds, B ≠ 3B_1 + 2B_2. That is, the decomposition of the form of (12.208) does not hold for B; a decomposition of this kind is possible only with diagonalizable matrices.

In summary, a (n, n) matrix with s (1 ≤ s ≤ n) different eigenvalues has at least s proper eigenvectors. (Note that a diagonalizable matrix has n proper eigenvectors.) In the case of s < n, the matrix has multiple root(s) and may have generalized eigenvectors. If the matrix has a generalized eigenvector of rank ν, that vector is accompanied by (ν - 1) generalized eigenvectors of lower rank together with a sole proper eigenvector, and those vectors form an invariant subspace. In total, such n (generalized) eigenvectors span the whole vector space V^n. With the eigenvalue equation A(x) = αx, we have an indefinite but non-trivial solution x ≠ 0 only for a restricted number of α (i.e., the eigenvalues) in the complex plane; for any other complex number α we have only the unique, trivial solution x = 0. This is characteristic of the eigenvalue problem.
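The cautionary point of Example 12.8 is easy to reproduce by direct computation. A plain-Python sketch (the polynomial-evaluation helper is ours; the coefficient lists come from expanding (12.216)):

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly(M, coeffs):
    """Evaluate c0*E + c1*M + c2*M^2 + ... for a square matrix M."""
    n = len(M)
    R = [[0] * n for _ in range(n)]
    P = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # M^0 = E
    for c in coeffs:
        R = [[R[i][j] + c * P[i][j] for j in range(n)] for i in range(n)]
        P = mat_mul(P, M)
    return R

A = [[3, 0, 0], [0, 2, 0], [0, 0, 2]]
B = [[3, 0, 0], [0, 2, 0], [0, 1, 2]]

def projectors(M):
    # M1(x)f1(x) = (x - 2)^3 = x^3 - 6x^2 + 12x - 8
    P1 = poly(M, [-8, 12, -6, 1])
    # M2(x)f2(x) = (x - 3)(-x^2 + 3x - 3) = -x^3 + 6x^2 - 12x + 9
    P2 = poly(M, [9, -12, 6, -1])
    return P1, P2

A1, A2 = projectors(A)
B1, B2 = projectors(B)
assert A1 == B1 == [[1, 0, 0], [0, 0, 0], [0, 0, 0]]   # same idempotents,
assert A2 == B2 == [[0, 0, 0], [0, 1, 0], [0, 0, 1]]   # as in (12.218)

comb = lambda P1, P2: [[3 * P1[i][j] + 2 * P2[i][j] for j in range(3)]
                       for i in range(3)]
assert comb(A1, A2) == A    # the diagonalizable A satisfies A = 3A1 + 2A2 ...
assert comb(B1, B2) != B    # ... but the non-diagonalizable B does not
```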

References

1. Mirsky L (1990) An introduction to linear algebra. Dover, New York
2. Hassani S (2006) Mathematical physics. Springer, New York
3. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
4. Satake I (1974) Linear algebra. Shokabo, Tokyo (in Japanese)

Chapter 13

Inner Product Space

Thus far we have treated the theory of linear vector spaces. The vector spaces, however, were somewhat "structureless," and so it is desirable to introduce a concept of metric or measure into the linear vector spaces. We call a linear vector space in which an inner product is defined an inner product space. By virtue of the concept of the inner product, the linear vector space is given a variety of structures. For instance, introduction of the inner product into the linear vector space immediately leads to the definition of adjoint operators and Gram matrices. Above all, the concept of the inner product can readily be extended to a functional space and facilitates the understanding of, e.g., orthogonalization of functions, as was exemplified in Parts I and II. Moreover, the definition of the inner product allows us to relate matrix operators and differential operators. In particular, it is key to understanding the logical structure of quantum mechanics. This can easily be appreciated from the fact that Paul Dirac, known as one of the prominent founders of quantum mechanics, invented bra and ket vectors to represent an inner product.

13.1

Inner Product and Metric

The inner product relates two vectors to a complex number. To do this, we introduce the notation |a⟩ and ⟨b| to represent the vectors. This notation is due to Dirac and widely used in physics and mathematical physics. Usually |a⟩ and ⟨b| are called a "ket" vector and a "bra" vector, respectively, again due to Dirac. Alternatively, we may call ⟨a| an adjoint vector of |a⟩, or we denote ⟨a| ≡ (|a⟩)^†. The symbol "†" (dagger) means that for a matrix its transposed matrix should be taken with complex conjugate matrix elements. That is, (A^†)_{ij} = a_{ji}^*. If we represent a full matrix, we have

© Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_13

A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \quad A^† = \begin{pmatrix} a_{11}^* & \cdots & a_{n1}^* \\ \vdots & \ddots & \vdots \\ a_{1n}^* & \cdots & a_{nn}^* \end{pmatrix}.    (13.1)

We call A^† an adjoint matrix or adjoint to A; see (1.106). The operations of transposition and complex conjugation commute. A further remark will be made below after showing the definition of the inner product. The symbols |a⟩ and ⟨b| represent vectors and, hence, we do not need to use bold characters to show that those are vectors. The definition of the inner product is as follows:

⟨b|a⟩ = ⟨a|b⟩^*,    (13.2)

⟨a|(β|b⟩ + γ|c⟩) = β⟨a|b⟩ + γ⟨a|c⟩,    (13.3)

⟨a|a⟩ ≥ 0.    (13.4)

In (13.4), equality holds if and only if |a⟩ = 0. Note here that two vectors are said to be orthogonal to each other if their inner product vanishes, i.e., ⟨b|a⟩ = ⟨a|b⟩ = 0. In particular, if a vector |a⟩ ∈ V^n is orthogonal to all the vectors in V^n, i.e., ⟨x|a⟩ = 0 for ∀|x⟩ ∈ V^n, then |a⟩ = 0. This is because if we choose |a⟩ for |x⟩, we have ⟨a|a⟩ = 0, which means that |a⟩ = 0. We call a linear vector space in which the inner product is defined an inner product space.

We can give another structure to a vector space. An example is a metric (or distance function). Suppose that there is an arbitrary set Q. If a real non-negative number ρ(a, b) is defined as follows for any arbitrary elements a, b, c ∈ Q, the set Q is called a metric space [1]:

ρ(a, b) = ρ(b, a),    (13.5)

ρ(a, b) ≥ 0 for ∀a, b; ρ(a, b) = 0 if and only if a = b,    (13.6)

ρ(a, b) + ρ(b, c) ≥ ρ(a, c).    (13.7)

In our study a vector space is chosen for the set Q. Here let us define a norm for each vector a. The norm is defined as

‖a‖ = \sqrt{⟨a|a⟩}.    (13.8)

If we define ρ(a, b) ≡ ‖a - b‖, then ‖a - b‖ satisfies the definition of a metric. Equations (13.5) and (13.6) are obvious. For (13.7), let us consider a vector |c⟩ as |c⟩ = |a⟩ - x⟨b|a⟩|b⟩ with real x. Since ⟨c|c⟩ ≥ 0, we have


x^2⟨a|b⟩⟨b|a⟩⟨b|b⟩ - 2x⟨a|b⟩⟨b|a⟩ + ⟨a|a⟩ ≥ 0, or

x^2|⟨a|b⟩|^2⟨b|b⟩ - 2x|⟨a|b⟩|^2 + ⟨a|a⟩ ≥ 0.    (13.9)

Since the quadratic in x with real coefficients in (13.9) is non-negative for all real x, its discriminant must be non-positive, which requires

⟨a|a⟩⟨b|b⟩ ≥ ⟨a|b⟩⟨b|a⟩ = |⟨a|b⟩|^2.    (13.10)

That is,

\sqrt{⟨a|a⟩} \cdot \sqrt{⟨b|b⟩} ≥ |⟨a|b⟩|.    (13.11)

Namely,

‖a‖ \cdot ‖b‖ ≥ |⟨a|b⟩| ≥ Re⟨a|b⟩.    (13.12)

The relations (13.11) and (13.12) are known as the Cauchy–Schwarz inequality. Meanwhile, we have

‖a + b‖^2 = ⟨a + b|a + b⟩ = ‖a‖^2 + ‖b‖^2 + 2Re⟨a|b⟩,    (13.13)

(‖a‖ + ‖b‖)^2 = ‖a‖^2 + ‖b‖^2 + 2‖a‖ \cdot ‖b‖.    (13.14)

Comparing (13.13) and (13.14) and using (13.12), we have

‖a‖ + ‖b‖ ≥ ‖a + b‖.    (13.15)

The inequality (13.15) is known as the triangle inequality. In (13.15), replacing a → a - b and b → b - c, we get

‖a - b‖ + ‖b - c‖ ≥ ‖a - c‖.    (13.16)

Thus, (13.16) is equivalent to (13.7). At the same time, the norm defined in (13.8) may be regarded as a "length" of a vector a. As β|b⟩ + γ|c⟩ represents a vector, we use a shorthand notation for it as

|βb + γc⟩ ≡ β|b⟩ + γ|c⟩.    (13.17)

According to the definition (13.3),

⟨a|βb + γc⟩ = β⟨a|b⟩ + γ⟨a|c⟩.    (13.18)

Also from (13.2),

⟨βb + γc|a⟩ = ⟨a|βb + γc⟩^* = [β⟨a|b⟩ + γ⟨a|c⟩]^* = β^*⟨b|a⟩ + γ^*⟨c|a⟩.    (13.19)

Comparing the leftmost and rightmost sides of (13.19) and considering that |a⟩ can be chosen arbitrarily, we get

⟨βb + γc| = β^*⟨b| + γ^*⟨c|.    (13.20)

Therefore, when we take out a scalar from a bra vector, it should be converted to its complex conjugate. When the scalar is taken out from a ket vector, however, it is unaffected, by definition (13.3). To show this, we have |αa⟩ = α|a⟩; taking its adjoint, ⟨αa| = α^*⟨a|. We can view (13.17) as a linear transformation in a vector space. In other words, if we regard |·⟩ as a transformation of a vector a ∈ V^n to |a⟩ ∈ \tilde{V}^n, then |·⟩ is a linear transformation of V^n to \tilde{V}^n. Conversely, ⟨·| cannot be regarded as a linear transformation of a ∈ V^n to ⟨a| ∈ \tilde{V}^{n\prime}. Sometimes the said transformation is referred to as "antilinear" or "sesquilinear." From the point of view of formalism, the inner product can be viewed as an operation \tilde{V}^{n\prime} × \tilde{V}^n → ℂ. The vector space \tilde{V}^{n\prime} is said to be a dual vector space (or dual space) of \tilde{V}^n. The notion of the dual space occupies an important position in the theory of vector spaces. We will come back to the definition and properties of the dual space in Part V (Chap. 24). Let us consider |x⟩ = |y⟩ in an inner product space \tilde{V}^n. Then |x⟩ - |y⟩ = 0; that is, |x - y⟩ = 0. Therefore, we have x = y, or |0⟩ = 0. This means that the linear transformation |·⟩ converts 0 ∈ V^n to |0⟩ ∈ \tilde{V}^n. This is a characteristic of a linear transformation represented in (11.44). Similarly we have ⟨0| = 0. However, we do not have to get into further details about the notation in this book. Also, if we have to specify a vector space, we simply do so by designating it as V^n.
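The Cauchy–Schwarz and triangle inequalities of this section are easy to probe numerically. A minimal sketch in plain Python, using the convention ⟨a|b⟩ = Σ_i a_i^* b_i for complex column vectors (the sample vectors are arbitrary choices of ours):

```python
import math

def inner(a, b):
    """<a|b>, antilinear in the bra and linear in the ket."""
    return sum(ai.conjugate() * bi for ai, bi in zip(a, b))

def norm(a):
    return math.sqrt(inner(a, a).real)   # <a|a> is real and non-negative

a = [1 + 2j, 0.5 - 1j, 3 + 0j]
b = [2 - 1j, 1 + 1j, -1 + 4j]

# (13.2): <b|a> = <a|b>*
assert abs(inner(b, a) - inner(a, b).conjugate()) < 1e-12
# (13.12): ||a||·||b|| >= |<a|b>|  (Cauchy–Schwarz)
assert norm(a) * norm(b) >= abs(inner(a, b)) - 1e-12
# (13.15): ||a|| + ||b|| >= ||a + b||  (triangle inequality)
s = [x + y for x, y in zip(a, b)]
assert norm(a) + norm(b) >= norm(s) - 1e-12
```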

13.2

Gram Matrices

Once we have defined an inner product between any pair of vectors |a⟩ and |b⟩ of a vector space V^n, we can define and calculate various quantities related to inner products. As an example, let |a_1⟩, ⋯, |a_n⟩ and |b_1⟩, ⋯, |b_n⟩ be two sets of vectors in V^n. The vectors |a_1⟩, ⋯, |a_n⟩ may or may not be linearly independent; the same is true of |b_1⟩, ⋯, |b_n⟩. Let us think of the following matrix M defined as below:

M = \begin{pmatrix} ⟨a_1| \\ \vdots \\ ⟨a_n| \end{pmatrix} (|b_1⟩ \cdots |b_n⟩) = \begin{pmatrix} ⟨a_1|b_1⟩ & \cdots & ⟨a_1|b_n⟩ \\ \vdots & \ddots & \vdots \\ ⟨a_n|b_1⟩ & \cdots & ⟨a_n|b_n⟩ \end{pmatrix}.    (13.21)

We assume the following cases.

1. Suppose that in (13.21) |b_1⟩, ⋯, |b_n⟩ are linearly dependent. Then, without loss of generality we can put

|b_1⟩ = c_2|b_2⟩ + c_3|b_3⟩ + ⋯ + c_n|b_n⟩.    (13.22)

Then we have

M = \begin{pmatrix} c_2⟨a_1|b_2⟩ + c_3⟨a_1|b_3⟩ + \cdots + c_n⟨a_1|b_n⟩ & ⟨a_1|b_2⟩ & \cdots & ⟨a_1|b_n⟩ \\ \vdots & \vdots & & \vdots \\ c_2⟨a_n|b_2⟩ + c_3⟨a_n|b_3⟩ + \cdots + c_n⟨a_n|b_n⟩ & ⟨a_n|b_2⟩ & \cdots & ⟨a_n|b_n⟩ \end{pmatrix}.    (13.23)

Multiplying the second column, ⋯, and the n-th column by (-c_2), ⋯, and (-c_n), respectively, and adding them to the first column, we get

\tilde{M} = \begin{pmatrix} 0 & ⟨a_1|b_2⟩ & \cdots & ⟨a_1|b_n⟩ \\ \vdots & \vdots & \ddots & \vdots \\ 0 & ⟨a_n|b_2⟩ & \cdots & ⟨a_n|b_n⟩ \end{pmatrix}.    (13.24)

Note that in (13.24), \tilde{M} was obtained after applying the above operations to M. We have det \tilde{M} = 0. Since the above operations keep the determinant of a matrix unchanged, we get det M = 0 as well.

2. Suppose in turn that in (13.21) |a_1⟩, ⋯, |a_n⟩ are linearly dependent. In that case, again without loss of generality, we can put

|a_1⟩ = d_2|a_2⟩ + d_3|a_3⟩ + ⋯ + d_n|a_n⟩.    (13.25)

Focusing attention on the individual rows and taking a procedure similar to that described above, we have

\tilde{M}' = \begin{pmatrix} 0 & \cdots & 0 \\ ⟨a_2|b_1⟩ & \cdots & ⟨a_2|b_n⟩ \\ \vdots & \ddots & \vdots \\ ⟨a_n|b_1⟩ & \cdots & ⟨a_n|b_n⟩ \end{pmatrix}.    (13.26)

Again, det M = 0.

Next, let us examine the case where det M = 0. In that case, the n column vectors of M in (13.21) are linearly dependent. Without loss of generality, we suppose that the first column is expressed as a linear combination of the other (n - 1) columns such that

\begin{pmatrix} ⟨a_1|b_1⟩ \\ \vdots \\ ⟨a_n|b_1⟩ \end{pmatrix} = c_2 \begin{pmatrix} ⟨a_1|b_2⟩ \\ \vdots \\ ⟨a_n|b_2⟩ \end{pmatrix} + \cdots + c_n \begin{pmatrix} ⟨a_1|b_n⟩ \\ \vdots \\ ⟨a_n|b_n⟩ \end{pmatrix}.    (13.27)

Rewriting this, we have

\begin{pmatrix} ⟨a_1|b_1 - c_2b_2 - \cdots - c_nb_n⟩ \\ \vdots \\ ⟨a_n|b_1 - c_2b_2 - \cdots - c_nb_n⟩ \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}.    (13.28)

Multiplying the first row, ⋯, and the n-th row of (13.28) by appropriate complex numbers p_1^*, ⋯, and p_n^*, respectively, we have

⟨p_1a_1|b_1 - c_2b_2 - \cdots - c_nb_n⟩ = 0, ⋯, ⟨p_na_n|b_1 - c_2b_2 - \cdots - c_nb_n⟩ = 0.

Adding all the above, we get

⟨p_1a_1 + \cdots + p_na_n|b_1 - c_2b_2 - \cdots - c_nb_n⟩ = 0.

Now, suppose that |a_1⟩, ⋯, |a_n⟩ are the basis vectors. Then, |p_1a_1 + ⋯ + p_na_n⟩ represents any vector in the vector space. This implies that |b_1 - c_2b_2 - ⋯ - c_nb_n⟩ = 0; for this, see the remarks after (13.4). That is,

|b_1⟩ = c_2|b_2⟩ + ⋯ + c_n|b_n⟩.    (13.29)

Thus, |b_1⟩, ⋯, |b_n⟩ are linearly dependent. Meanwhile, det M = 0 also implies that the n row vectors of M in (13.21) are linearly dependent. In that case, performing calculations similar to the above, we can readily show that if |b_1⟩, ⋯, |b_n⟩ are the basis vectors, then |a_1⟩, ⋯, |a_n⟩ are linearly dependent. We summarize the above discussion in the following statements. Suppose that we have two sets of vectors |a_1⟩, ⋯, |a_n⟩ and |b_1⟩, ⋯, |b_n⟩. Then:

At least one of the sets of vectors is linearly dependent ⟺ det M = 0.    (13.30)

Both sets of vectors are linearly independent ⟺ det M ≠ 0.    (13.31)

The latter statement is obtained by considering the contraposition of the former statement. We restate the above in the following theorem:

Theorem 13.1 Let |a_1⟩, ⋯, |a_n⟩ and |b_1⟩, ⋯, |b_n⟩ be two sets of vectors defined in a vector space V^n. A necessary and sufficient condition for both these sets of vectors to be linearly independent is that with a matrix M defined as

M = \begin{pmatrix} ⟨a_1| \\ \vdots \\ ⟨a_n| \end{pmatrix} (|b_1⟩ \cdots |b_n⟩) = \begin{pmatrix} ⟨a_1|b_1⟩ & \cdots & ⟨a_1|b_n⟩ \\ \vdots & \ddots & \vdots \\ ⟨a_n|b_1⟩ & \cdots & ⟨a_n|b_n⟩ \end{pmatrix},

we must have det M ≠ 0. □

Next we consider the norm of a vector expressed in reference to a set of basis vectors |e_1⟩, ⋯, |e_n⟩ of V^n. Let us express a vector |x⟩ in an inner product space as follows, as in the case of (11.10) and (11.13):

|x⟩ = x_1|e_1⟩ + x_2|e_2⟩ + ⋯ + x_n|e_n⟩ = |x_1e_1 + x_2e_2 + ⋯ + x_ne_n⟩ = (|e_1⟩ \cdots |e_n⟩) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.    (13.32)

A bra vector ⟨x| is then denoted by

⟨x| = (x_1^* \cdots x_n^*) \begin{pmatrix} ⟨e_1| \\ \vdots \\ ⟨e_n| \end{pmatrix}.    (13.33)

Thus, we have an inner product described as

⟨x|x⟩ = (x_1^* \cdots x_n^*) \begin{pmatrix} ⟨e_1| \\ \vdots \\ ⟨e_n| \end{pmatrix} (|e_1⟩ \cdots |e_n⟩) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (x_1^* \cdots x_n^*) \begin{pmatrix} ⟨e_1|e_1⟩ & \cdots & ⟨e_1|e_n⟩ \\ \vdots & \ddots & \vdots \\ ⟨e_n|e_1⟩ & \cdots & ⟨e_n|e_n⟩ \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.    (13.34)

Here the matrix G expressed as follows is called a Gram matrix [2–4]:

G = \begin{pmatrix} ⟨e_1|e_1⟩ & \cdots & ⟨e_1|e_n⟩ \\ \vdots & \ddots & \vdots \\ ⟨e_n|e_1⟩ & \cdots & ⟨e_n|e_n⟩ \end{pmatrix}.    (13.35)

As ⟨e_j|e_i⟩ = ⟨e_i|e_j⟩^*, we have G = G^†. From Theorem 13.1, det G ≠ 0. With a shorthand notation, we write (G)_{ij} = ⟨e_i|e_j⟩. As already mentioned in Sect. 1.4, if for a matrix H we have a relation described by

H = H^†,    (1.119)

it is said to be an Hermitian matrix or a self-adjoint matrix. We often simply say that such a matrix is Hermitian.

Since the Gram matrices frequently appear in matrix algebra and play an important role there, their properties are worth examining. Since G is an Hermitian matrix, it can be diagonalized through a similarity transformation using a unitary matrix. We will give the proof later (see Sect. 14.3). Let us deal with (13.34) further. We have

⟨x|x⟩ = (x_1^* \cdots x_n^*)\,UU^† \begin{pmatrix} ⟨e_1|e_1⟩ & \cdots & ⟨e_1|e_n⟩ \\ \vdots & \ddots & \vdots \\ ⟨e_n|e_1⟩ & \cdots & ⟨e_n|e_n⟩ \end{pmatrix} UU^† \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},    (13.36)

where U is defined so that UU^† = U^†U = E (or U^{-1} = U^†). Such a matrix U is called a unitary matrix. We represent the matrix form of U as

U = \begin{pmatrix} u_{11} & \cdots & u_{1n} \\ \vdots & \ddots & \vdots \\ u_{n1} & \cdots & u_{nn} \end{pmatrix}, \quad U^† = \begin{pmatrix} u_{11}^* & \cdots & u_{n1}^* \\ \vdots & \ddots & \vdots \\ u_{1n}^* & \cdots & u_{nn}^* \end{pmatrix}.    (13.37)

Here, putting

U^† \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} ξ_1 \\ \vdots \\ ξ_n \end{pmatrix}, \quad or equivalently \quad \sum_{k=1}^{n} u_{ki}^* x_k ≡ ξ_i,    (13.38)

and taking its adjoint such that

(x_1^* \cdots x_n^*)\,U = (ξ_1^* \cdots ξ_n^*), \quad or equivalently \quad \sum_{k=1}^{n} x_k^* u_{ki} = ξ_i^*,

we have

⟨x|x⟩ = (ξ_1^* \cdots ξ_n^*)\,U^† \begin{pmatrix} ⟨e_1|e_1⟩ & \cdots & ⟨e_1|e_n⟩ \\ \vdots & \ddots & \vdots \\ ⟨e_n|e_1⟩ & \cdots & ⟨e_n|e_n⟩ \end{pmatrix} U \begin{pmatrix} ξ_1 \\ \vdots \\ ξ_n \end{pmatrix}.    (13.39)

We assume that the Gram matrix is diagonalized by a similarity transformation by U. After being diagonalized, similarly to (12.192) and (12.193), the Gram matrix takes the following form G′:

U^{-1}GU = U^†GU = G' = \begin{pmatrix} λ_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & λ_n \end{pmatrix}.    (13.40)

Thus, we get

⟨x|x⟩ = (ξ_1^* \cdots ξ_n^*) \begin{pmatrix} λ_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & λ_n \end{pmatrix} \begin{pmatrix} ξ_1 \\ \vdots \\ ξ_n \end{pmatrix} = λ_1|ξ_1|^2 + \cdots + λ_n|ξ_n|^2.    (13.41)

From the relation (13.4), ⟨x|x⟩ ≥ 0. This implies that in (13.41) we have

λ_i ≥ 0 \quad (1 ≤ i ≤ n).    (13.42)

To show this, suppose that for ∃λ_i we had λ_i < 0. Then, choosing a vector (0 ⋯ ξ_i ⋯ 0)^T with ξ_i ≠ 0 and all the other components zero, we would have ⟨x|x⟩ = λ_i|ξ_i|^2 < 0, in contradiction.

Since we have det G ≠ 0, taking the determinant of (13.40) we have

det G' = det(U^†GU) = det(U^†UG) = det E \cdot det G = det G = \prod_{i=1}^{n} λ_i ≠ 0.    (13.43)

Combining (13.42) and (13.43), all the eigenvalues λ_i are positive; i.e.,

λ_i > 0 \quad (1 ≤ i ≤ n).    (13.44)

The norm ⟨x|x⟩ = 0 if and only if ξ_1 = ξ_2 = ⋯ = ξ_n = 0, which corresponds to x_1 = x_2 = ⋯ = x_n = 0 from (13.38). For further study, we generalize the aforementioned feature a little. That is, if |e_1⟩, ⋯, and |e_n⟩ are linearly dependent, from Theorem 13.1 we get det G = 0. This implies that there is at least one eigenvalue λ_i = 0 (1 ≤ i ≤ n) and that for a vector (0 ⋯ ξ_i ⋯ 0)^T with ∃ξ_i ≠ 0 we have ⟨x|x⟩ = 0. (Suppose that in V^2 we have linearly dependent vectors |e_1⟩ and |e_2⟩ = |-e_1⟩; see Example 13.2 below.)

Let H be an Hermitian matrix [i.e., (H^*)_{ij} = (H)_{ji}]. If we have φ as a function of complex variables x_1, ⋯, x_n such that

φ(x_1, ⋯, x_n) = \sum_{i,j=1}^{n} x_i^*(H)_{ij}x_j,    (13.45)

where H_{ij} is a matrix element of an Hermitian matrix, φ(x_1, ⋯, x_n) is said to be an Hermitian quadratic form. Suppose that φ(x_1, ⋯, x_n) satisfies φ(x_1, ⋯, x_n) = 0 if and only if x_1 = x_2 = ⋯ = x_n = 0, and otherwise φ(x_1, ⋯, x_n) > 0 with any other set of x_i (1 ≤ i ≤ n). Then, the said Hermitian quadratic form is called positive definite, and we write

H > 0.    (13.46)

If φ(x_1, ⋯, x_n) ≥ 0 for any x_i (1 ≤ i ≤ n) and φ(x_1, ⋯, x_n) = 0 for at least one set of (x_1, ⋯, x_n) in which ∃x_i ≠ 0, then φ(x_1, ⋯, x_n) is said to be positive semi-definite or non-negative. In that case, we write

H ≥ 0.    (13.47)

From the above argument, a Gram matrix comprising linearly independent vectors is positive definite, whereas one comprising linearly dependent vectors is non-negative. On the basis of the above argument including (13.36)–(13.44), we have

H > 0 ⟺ λ_i > 0 (1 ≤ i ≤ n), det H > 0; \quad H ≥ 0 ⟺ λ_i ≥ 0 with ∃λ_i = 0 (1 ≤ i ≤ n), det H = 0.    (13.48)

Notice here that the eigenvalues λ_i remain unchanged under a (unitary) similarity transformation; namely, the eigenvalues are inherent to H. We have already encountered several examples of positive definite and non-negative operators. A typical example of the former case is the Hamiltonian of the quantum-mechanical harmonic oscillator (see Chap. 2); in this case, the energy eigenvalues are all positive (i.e., positive definite). Orbital angular momenta L^2 of hydrogen-like atoms, however, are non-negative operators and, hence, an eigenvalue of zero is permitted.

Alternatively, the Gram matrix can be defined as B^†B, where B is any (n, n) matrix. If we take an orthonormal basis |η_1⟩, |η_2⟩, ⋯, |η_n⟩, |e_i⟩ can be expressed as

|e_i⟩ = \sum_{j=1}^{n} b_{ji}|η_j⟩,

⟨e_k|e_i⟩ = \sum_{l=1}^{n}\sum_{j=1}^{n} b_{lk}^* b_{ji} ⟨η_l|η_j⟩ = \sum_{l,j=1}^{n} b_{lk}^* b_{ji} δ_{lj} = \sum_{j=1}^{n} b_{jk}^* b_{ji} = \sum_{j=1}^{n} (B^†)_{kj}(B)_{ji} = (B^†B)_{ki}.    (13.49)

For the second equality of (13.49), we used the orthonormal condition ⟨η_i|η_j⟩ = δ_{ij}. Thus, the Gram matrix G defined in (13.35) can be regarded as identical to B^†B. In Sect. 11.4 we dealt with a linear transformation of a set of basis vectors e_1, e_2, ⋯, and e_n by a matrix A defined in (11.69) and examined whether the transformed vectors e_1′, e_2′, ⋯, and e_n′ are linearly independent. As a result, a necessary and sufficient condition for e_1′, e_2′, ⋯, and e_n′ to be linearly independent (i.e., to be a set of basis vectors) is det A ≠ 0. Thus, we notice that B plays the same role as A of (11.69)


and, hence, det B ≠ 0 if and only if the set of vectors |e_1⟩, |e_2⟩, ⋯, and |e_n⟩ defined in (13.32) is linearly independent. By the same token, we conclude that the eigenvalues of B^†B are all positive if and only if det B ≠ 0 (i.e., B is non-singular). Alternatively, if B is singular, det B^†B = det B^† det B = 0; in that case at least one of the eigenvalues of B^†B must be zero. The Gram matrices appearing in (13.35) are frequently dealt with in the field of mathematical physics in conjunction with quadratic forms. Further topics can be seen in the next chapter.

Example 13.1 Let us take two vectors |ε_1⟩ and |ε_2⟩ that are expressed as

|ε_1⟩ = |e_1⟩ + |e_2⟩, \quad |ε_2⟩ = |e_1⟩ + i|e_2⟩.    (13.50)

Here we have ⟨e_i|e_j⟩ = δ_{ij} (1 ≤ i, j ≤ 2). Then we have a Gram matrix expressed as

G = \begin{pmatrix} ⟨ε_1|ε_1⟩ & ⟨ε_1|ε_2⟩ \\ ⟨ε_2|ε_1⟩ & ⟨ε_2|ε_2⟩ \end{pmatrix} = \begin{pmatrix} 2 & 1+i \\ 1-i & 2 \end{pmatrix}.    (13.51)

The principal minors of G are |2| = 2 > 0 and \begin{vmatrix} 2 & 1+i \\ 1-i & 2 \end{vmatrix} = 4 - (1 + i)(1 - i) = 2 > 0. Therefore, according to Theorem 14.12,

G > 0 (vide infra). Let us diagonalize the matrix G. To this end, we find the roots of the characteristic equation. That is,

det|G - λE| = \begin{vmatrix} 2-λ & 1+i \\ 1-i & 2-λ \end{vmatrix} = 0, \quad λ^2 - 4λ + 2 = 0.    (13.52)

We have λ = 2 ± \sqrt{2}. Then as a diagonalizing unitary matrix U we get

U = \begin{pmatrix} \dfrac{1+i}{2} & \dfrac{1+i}{2} \\ \dfrac{\sqrt{2}}{2} & -\dfrac{\sqrt{2}}{2} \end{pmatrix}.    (13.53)

Thus, we get

U^†GU = \begin{pmatrix} \dfrac{1-i}{2} & \dfrac{\sqrt{2}}{2} \\ \dfrac{1-i}{2} & -\dfrac{\sqrt{2}}{2} \end{pmatrix} \begin{pmatrix} 2 & 1+i \\ 1-i & 2 \end{pmatrix} \begin{pmatrix} \dfrac{1+i}{2} & \dfrac{1+i}{2} \\ \dfrac{\sqrt{2}}{2} & -\dfrac{\sqrt{2}}{2} \end{pmatrix} = \begin{pmatrix} 2+\sqrt{2} & 0 \\ 0 & 2-\sqrt{2} \end{pmatrix}.    (13.54)

The eigenvalues 2 + \sqrt{2} and 2 - \sqrt{2} are real and positive, as expected. That is, the Gram matrix is positive definite. □
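The eigenvalues found in Example 13.1 can be confirmed numerically. A plain-Python sketch: for a 2 × 2 Hermitian matrix the eigenvalues follow from the quadratic formula applied to the characteristic polynomial λ² - (tr G)λ + det G = 0.

```python
import math

# Gram matrix of Example 13.1: G = [[2, 1+i], [1-i, 2]]
g12 = 1 + 1j
tr = 4.0                                       # trace of G
det = (4 - g12 * g12.conjugate()).real         # det G = 4 - |1+i|^2 = 2
disc = math.sqrt(tr * tr - 4 * det)
lam_hi, lam_lo = (tr + disc) / 2, (tr - disc) / 2

assert abs(lam_hi - (2 + math.sqrt(2))) < 1e-12
assert abs(lam_lo - (2 - math.sqrt(2))) < 1e-12
assert lam_lo > 0   # both eigenvalues positive: G is positive definite
```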


Example 13.2 Let |e_1⟩ and |e_2⟩ (= -|e_1⟩) be two vectors, where |e_1⟩ is a unit vector. Then, we have a Gram matrix expressed as

G = \begin{pmatrix} ⟨e_1|e_1⟩ & ⟨e_1|e_2⟩ \\ ⟨e_2|e_1⟩ & ⟨e_2|e_2⟩ \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.    (13.55)

Similarly to the case of Example 13.1, we have as an eigenvalue equation

det|G - λE| = \begin{vmatrix} 1-λ & -1 \\ -1 & 1-λ \end{vmatrix} = 0, \quad λ^2 - 2λ = 0.    (13.56)

We have λ = 2 or 0. As a diagonalizing unitary matrix U we get

U = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\ -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix}.    (13.57)

Thus, we get

U^†GU = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & -\dfrac{1}{\sqrt{2}} \\ \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\ -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}.    (13.58)

As expected, we have a diagonal matrix, one of whose eigenvalues is zero. That is, the Gram matrix is non-negative. In the present case, let us think of the following Hermitian quadratic form:

φ(x_1, x_2) = \sum_{i,j=1}^{2} x_i^*(G)_{ij}x_j = (x_1^*\ x_2^*)\,UU^†GUU^† \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = (ξ_1^*\ ξ_2^*) \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} ξ_1 \\ ξ_2 \end{pmatrix} = 2|ξ_1|^2.

Then, if we take \begin{pmatrix} 0 \\ ξ_2 \end{pmatrix} with ξ_2 ≠ 0 as the column vector \begin{pmatrix} ξ_1 \\ ξ_2 \end{pmatrix}, we get φ(x_1, x_2) = 0. With this type of column vector, we have

U^† \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ ξ_2 \end{pmatrix}, \quad i.e., \quad \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = U \begin{pmatrix} 0 \\ ξ_2 \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\ -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} 0 \\ ξ_2 \end{pmatrix} = \dfrac{1}{\sqrt{2}} \begin{pmatrix} ξ_2 \\ ξ_2 \end{pmatrix},

where ξ_2 is an arbitrary complex number. Thus, even for \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} ≠ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, we may have φ(x_1, x_2) = 0. To be more specific, if we take, e.g., \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} or \begin{pmatrix} 1 \\ 2 \end{pmatrix}, we get


φ(1, 1) = (1\ 1) \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = (1\ 1) \begin{pmatrix} 0 \\ 0 \end{pmatrix} = 0,

φ(1, 2) = (1\ 2) \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = (1\ 2) \begin{pmatrix} -1 \\ 1 \end{pmatrix} = 1 > 0. □
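The degenerate case of Example 13.2 can likewise be checked directly. A plain-Python sketch of the quadratic form φ(x_1, x_2) = Σ x_i^* G_{ij} x_j for G = [[1, -1], [-1, 1]], restricted here to real inputs for simplicity:

```python
G = [[1, -1], [-1, 1]]

def phi(x1, x2):
    """Quadratic form sum_ij xi* G_ij xj, with real inputs."""
    x = (x1, x2)
    return sum(x[i] * G[i][j] * x[j] for i in range(2) for j in range(2))

assert phi(1, 1) == 0       # a non-zero vector annihilated by the form
assert phi(1, 2) == 1       # matches the worked value in the text
assert phi(1, -1) == 4      # phi stays >= 0: G is non-negative (semi-definite)
```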

13.3 Adjoint Operators

A linear transformation A is similarly defined as before, and A transforms |x⟩ of (13.32) such that

A(|x⟩) = (|e_1⟩ \cdots |e_n⟩) \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},    (13.59)

where (a_{ij}) is the matrix representation of A. Defining A(|x⟩) = |A(x)⟩ and ⟨(x)A^†| = (⟨x|)A^† = [A(|x⟩)]^† to be consistent with (1.117), we have

⟨(x)A^†| = (x_1^* \cdots x_n^*) \begin{pmatrix} a_{11}^* & \cdots & a_{n1}^* \\ \vdots & \ddots & \vdots \\ a_{1n}^* & \cdots & a_{nn}^* \end{pmatrix} \begin{pmatrix} ⟨e_1| \\ \vdots \\ ⟨e_n| \end{pmatrix}.    (13.60)

:

ð13:60Þ

j ei i, we have

a11 ⋮ a1n

⋯ ⋱ ⋯

an1 ⋮ ann

= x1 ⋯ xn A{ G

h e1 j ⋮ h en j y1 ⋮ yn

ðje1 i ⋯ jen iÞ

y1 ⋮ yn

:

ð13:61Þ

Meanwhile, we get

⟨y|A(x)⟩ = (y_1^* \cdots y_n^*) \begin{pmatrix} ⟨e_1| \\ \vdots \\ ⟨e_n| \end{pmatrix} (|e_1⟩ \cdots |e_n⟩) \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (y_1^* \cdots y_n^*)\,GA \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.    (13.62)

Hence, we have

⟨y|A(x)⟩^* = (y_1 \cdots y_n)\,G^*A^* \begin{pmatrix} x_1^* \\ \vdots \\ x_n^* \end{pmatrix} = (y_1 \cdots y_n)\,G^T(A^†)^T \begin{pmatrix} x_1^* \\ \vdots \\ x_n^* \end{pmatrix}.    (13.63)

With the second equality, we used G^* = (G^†)^T = G^T (note that G is Hermitian) and A^* = (A^†)^T. A complex conjugate matrix A^* is defined as

A^* ≡ \begin{pmatrix} a_{11}^* & \cdots & a_{1n}^* \\ \vdots & \ddots & \vdots \\ a_{n1}^* & \cdots & a_{nn}^* \end{pmatrix}.

Comparing (13.61) and (13.63), we find that one is the transposed matrix of the other. Also note that (AB)^T = B^TA^T, (ABC)^T = C^TB^TA^T, etc. Since an inner product can be viewed as a (1, 1) matrix, two mutually transposed (1, 1) matrices are identical. Hence, we get

⟨(x)A^†|y⟩ = ⟨y|A(x)⟩^* = ⟨A(x)|y⟩,    (13.64)

where the second equality is due to (13.2). The other way around, we may use (13.64) as the definition of the adjoint operator of a linear transformation A. In fact, on the basis of (13.64) we have

\sum_{i,j,k} x_i^*(A^†)_{ik}(G)_{kj}y_j = \sum_{i,j,k} y_j(G^*)_{jk}(A^*)_{ki}x_i^*,

that is,

\sum_{i,j,k} x_i^*y_j[(A^†)_{ik}(G)_{kj} - (G^*)_{jk}(A^*)_{ki}] = \sum_{i,j,k} x_i^*y_j[(A^†)_{ik} - (A^*)_{ki}](G)_{kj} = 0.    (13.65)

With the third equality of (13.65), we used (G^*)_{jk} = (G)_{kj}, i.e., G^† = G (Hermitian matrix). Thanks to the freedom in the choice of the basis vectors as well as of x_i and y_j, we must have

(A^†)_{ik} = (A^*)_{ki}.    (13.66)

Adopting the matrix representation of (13.59) for A, we get [1]

(A^†)_{ik} = a_{ki}^*.    (13.67)

Thus, we confirm that the adjoint operator A^† is represented by the complex conjugate transposed matrix of A, in accordance with (13.1). Taking the complex conjugate of (13.64), we have

[⟨(x)A^†|y⟩]^* = ⟨y|(A^†)^†(x)⟩ = ⟨y|A(x)⟩.

Comparing both sides of the second equality, we get

(A^†)^† = A.    (13.68)
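In the familiar special case of an orthonormal basis (G = E), the defining property of the adjoint reduces to ⟨A(x)|y⟩ = ⟨x|A^†(y)⟩, and (13.68) to (A^†)^† = A. A plain-Python spot check with an arbitrary complex 2 × 2 matrix of our choosing (this is an illustration under the G = E assumption, not the book's general Gram-matrix setting):

```python
def dagger(M):
    """Conjugate transpose: (M†)_ij = M_ji*."""
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]

def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]

def inner(a, b):
    return sum(ai.conjugate() * bi for ai, bi in zip(a, b))

A = [[1 + 2j, -1j], [3 + 0j, 2 - 1j]]
x = [1 - 1j, 2 + 0j]
y = [0 + 1j, -1 + 3j]

# <A(x)|y> = <x|A†(y)> when the basis is orthonormal
assert abs(inner(apply(A, x), y) - inner(x, apply(dagger(A), y))) < 1e-12
# (13.68): (A†)† = A
assert dagger(dagger(A)) == A
```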

In Chap. 11, we saw how the matrix representation of a linear transformation A is changed from A_0 to A′ by a transformation of the basis vectors. We have

A' = P^{-1}A_0P.    (11.88)

In a similar manner, taking the adjoint of (11.88) we get

(A')^† = P^†A^†(P^{-1})^† = P^†A^†(P^†)^{-1}.    (13.69)

In (13.69), we denote the adjoint operator before the basis vectors transformation simply by A^† to avoid complicated notation. We also have

(A^†)' = (A')^†.    (13.70)

Meanwhile, suppose that (A^†)′ = Q^{-1}A^†Q. Then, from (13.69) and (13.70) we have P^† = Q^{-1}.

Next let us perform a calculation as below:

⟨(αu + βv)A^†|y⟩ = ⟨y|A(αu + βv)⟩^* = α^*⟨y|A(u)⟩^* + β^*⟨y|A(v)⟩^* = α^*⟨(u)A^†|y⟩ + β^*⟨(v)A^†|y⟩ = ⟨α(u)A^† + β(v)A^†|y⟩.

As y is an element arbitrarily chosen from the relevant vector space, we have

⟨(αu + βv)A^†| = ⟨α(u)A^† + β(v)A^†|,    (13.71)

or

(αu + βv)A^† = α(u)A^† + β(v)A^†.    (13.72)

Equation (13.71) states the equality of two vectors in an inner product space on both sides, whereas (13.72) states that in a vector space where the inner product is


not defined. In either case, both (13.71) and (13.72) show that $A^{\dagger}$ is indeed a linear transformation. In fact, the matrix representation of (13.66) and (13.67) is independent of the concept of the inner product. Suppose that there are two (or more) adjoint operators $B$ and $C$ that correspond to $A$. Then, from (13.64) we have

$$\langle (x)B \mid y \rangle = \langle (x)C \mid y \rangle = \overline{\langle y \mid A(x) \rangle}. \tag{13.73}$$

Also we have

$$\langle (x)B - (x)C \mid y \rangle = \langle (x)(B - C) \mid y \rangle = 0. \tag{13.74}$$

As $x$ and $y$ are arbitrarily chosen elements, we get $B = C$, indicating the uniqueness of the adjoint operator.

It is of importance to examine how the norm of a vector is changed by a linear transformation. To this end, let us perform a calculation as below:

$$\langle (x)A^{\dagger} \mid A(x) \rangle
= (\overline{x_1} \cdots \overline{x_n})
\begin{pmatrix} \overline{a_{11}} & \cdots & \overline{a_{n1}} \\ \vdots & \ddots & \vdots \\ \overline{a_{1n}} & \cdots & \overline{a_{nn}} \end{pmatrix}
\begin{pmatrix} \langle e_1 | \\ \vdots \\ \langle e_n | \end{pmatrix}
\bigl( |e_1\rangle \cdots |e_n\rangle \bigr)
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= (\overline{x_1} \cdots \overline{x_n})\, A^{\dagger} G A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{13.75}$$

Equation (13.75) gives the norm of the vector after its transformation. We may have a case where the norm is conserved before and after the transformation. Actually, comparing (13.34) and (13.75), we notice that if $A^{\dagger} G A = G$, then $\langle x \mid x \rangle = \langle (x)A^{\dagger} \mid A(x) \rangle$. Let us consider a following example.

Example 13.3 Let us take two mutually orthogonal vectors $|e_1\rangle$ and $|e_2\rangle$ with $\lVert e_2 \rVert = 2 \lVert e_1 \rVert$ as basis vectors in the $xy$-plane (Fig. 13.1). We assume that $|e_1\rangle$ is a unit vector. Hence, the set of vectors $|e_1\rangle$ and $|e_2\rangle$ does not constitute an orthonormal basis. Let $|x\rangle$ be an arbitrary position vector expressed as

$$|x\rangle = \bigl( |e_1\rangle\ |e_2\rangle \bigr) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \tag{13.76}$$

Then we have

$$\langle x \mid x \rangle
= (x_1\ x_2) \begin{pmatrix} \langle e_1 | \\ \langle e_2 | \end{pmatrix} \bigl( |e_1\rangle\ |e_2\rangle \bigr) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= (x_1\ x_2) \begin{pmatrix} \langle e_1 \mid e_1 \rangle & \langle e_1 \mid e_2 \rangle \\ \langle e_2 \mid e_1 \rangle & \langle e_2 \mid e_2 \rangle \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= (x_1\ x_2) \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= x_1^2 + 4 x_2^2. \tag{13.77}$$

[Fig. 13.1 Basis vectors $|e_1\rangle$ and $|e_2\rangle$ in the $xy$-plane and their linear transformation by $R$.]

Next let us think of a following linear transformation $R$ whose matrix representation is given by

$$R = \begin{pmatrix} \cos\theta & -2\sin\theta \\ (\sin\theta)/2 & \cos\theta \end{pmatrix}. \tag{13.78}$$

The transformation matrix $R$ is geometrically represented in Fig. 13.1. Following (11.36) we have

$$R(|x\rangle) = \bigl( |e_1\rangle\ |e_2\rangle \bigr) \begin{pmatrix} \cos\theta & -2\sin\theta \\ (\sin\theta)/2 & \cos\theta \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \tag{13.79}$$

As a result of the transformation $R$, the basis vectors $|e_1\rangle$ and $|e_2\rangle$ are transformed into $|e_1{}'\rangle$ and $|e_2{}'\rangle$, respectively, as in Fig. 13.1 such that

$$\bigl( |e_1{}'\rangle\ |e_2{}'\rangle \bigr) = \bigl( |e_1\rangle\ |e_2\rangle \bigr) \begin{pmatrix} \cos\theta & -2\sin\theta \\ (\sin\theta)/2 & \cos\theta \end{pmatrix}.$$

Taking an inner product of (13.79), we have


$$\langle (x)R^{\dagger} \mid R(x) \rangle
= (x_1\ x_2) \begin{pmatrix} \cos\theta & (\sin\theta)/2 \\ -2\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix} \begin{pmatrix} \cos\theta & -2\sin\theta \\ (\sin\theta)/2 & \cos\theta \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= (x_1\ x_2) \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= x_1^2 + 4 x_2^2. \tag{13.80}$$

Putting

$$G = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix},$$

we have

$$R^{\dagger} G R = G. \tag{13.81}$$

Comparing (13.77) and (13.80), we notice that the norm of $|x\rangle$ remains unchanged under the transformation $R$. This means that $R$ is virtually a unitary transformation. The somewhat unfamiliar matrix form of $R$ resulted from the choice of basis vectors other than an orthonormal basis. □
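Example 13.3 can be checked numerically. The sketch below (assuming NumPy is available; the angle value is arbitrary) verifies that $R$ of (13.78) is not orthogonal with respect to the identity metric, yet satisfies (13.81) and hence conserves the $G$-norm of any vector.

```python
import numpy as np

theta = 0.7  # an arbitrary rotation angle
G = np.array([[1.0, 0.0],
              [0.0, 4.0]])  # Gram matrix of Example 13.3: <e1|e1> = 1, <e2|e2> = 4
R = np.array([[np.cos(theta), -2.0 * np.sin(theta)],
              [np.sin(theta) / 2.0, np.cos(theta)]])  # the matrix (13.78)

# R is not orthogonal in the ordinary sense (R^T R != E) ...
print(np.allclose(R.T @ R, np.eye(2)))
# ... but it preserves the G-metric, i.e., (13.81): R^dagger G R = G
print(np.allclose(R.T @ G @ R, G))
# Consequently the norm <x|x> = x^T G x is conserved, cf. (13.77) and (13.80)
x = np.array([1.0, 2.0])
print(np.isclose(x @ G @ x, (R @ x) @ G @ (R @ x)))
```

The first check prints False, the latter two True, illustrating that "unitarity" depends on the metric $G$ fixed by the chosen basis.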

13.4 Orthonormal Basis

Now we introduce an orthonormal basis, the simplest and most important basis set in an inner product space. If we choose the orthonormal basis so that $\langle e_i \mid e_j \rangle = \delta_{ij}$, the Gram matrix becomes $G = E$. Thus, $R^{\dagger} G R = G$ reads $R^{\dagger} R = E$. In that case a linear transformation is represented by a unitary matrix, and it conserves the norm of a vector and the inner product of two arbitrary vectors. So far we assumed that an adjoint operator $A^{\dagger}$ operates only on a row vector from the right, as is evident from (13.61). At the same time, $A$ operates only on a column vector from the left as in (13.62). To render the notation of (13.61) and (13.62) consistent with the associative law, we would have to examine the commutability of $A^{\dagger}$ with $G$. In this context, the choice of an orthonormal basis enables us to get through such a troublesome situation and largely eases matrix calculation. Thus,

$$\langle (x)A^{\dagger} \mid A(x) \rangle
= (\overline{x_1} \cdots \overline{x_n})
\begin{pmatrix} \overline{a_{11}} & \cdots & \overline{a_{n1}} \\ \vdots & \ddots & \vdots \\ \overline{a_{1n}} & \cdots & \overline{a_{nn}} \end{pmatrix}
\begin{pmatrix} \langle e_1 | \\ \vdots \\ \langle e_n | \end{pmatrix}
\bigl( |e_1\rangle \cdots |e_n\rangle \bigr)
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$$
$$= (\overline{x_1} \cdots \overline{x_n})\, A^{\dagger} E A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= (\overline{x_1} \cdots \overline{x_n})\, A^{\dagger} A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{13.82}$$

At the same time, we adopt a simple notation as below instead of (13.75):

$$\langle xA^{\dagger} \mid Ax \rangle = (\overline{x_1} \cdots \overline{x_n})\, A^{\dagger} A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{13.83}$$

This notation is now consistent with the associative law. Note that $A^{\dagger}$ and $A$ operate on either a column or a row vector. We can also do without the symbol "$\mid$" in (13.83) and express it as

$$\langle xA^{\dagger} Ax \rangle = \langle xA^{\dagger} \mid Ax \rangle. \tag{13.84}$$

Thus, we can freely operate A{ and A from both the left and right. By the same token, we rewrite (13.62) as hyjAxi = hyAjxi = y1 ⋯ yn

a11 ⋮ an1

⋯ ⋱ ⋯

a1n ⋮ ann

x1 ⋮ xn

= y1 ⋯ yn A

x1 ⋮ xn

: ð13:85Þ

Here, notice that a vector jxi is represented by a column vector with respect to the orthonormal basis. Using (13.64), we have y1 ⋮ yn

hyjAxi = xA{ jy = x1 ⋯ xn A{ = x1 ⋯ xn

a11 ⋮ a1n

⋯ ⋱ ⋯

an1 ⋮ ann

y1 ⋮ yn

:

ð13:86Þ

If in (13.83) we put jyi = j Axi, hxA{| Axi = hy| yi ≥ 0. Thus, we define a norm of jAxi as Ax =

xA{ jAx :

ð13:87Þ
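With an orthonormal basis ($G = E$) these relations reduce to ordinary matrix algebra, which can be checked numerically. A minimal sketch (assuming NumPy; the random matrix and vectors are arbitrary) verifies (13.85) and (13.87) with the adjoint taken as the complex conjugate transpose of (13.67):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))  # arbitrary complex matrix
x = rng.normal(size=n) + 1j * rng.normal(size=n)
y = rng.normal(size=n) + 1j * rng.normal(size=n)
Ad = A.conj().T  # (13.67): the adjoint is the complex conjugate transposed matrix

# (13.85): <y|Ax> = <yA|x>; np.vdot conjugates its first argument, so
# vdot(Ad @ y, x) represents the bra <yA| acting on |x>
print(np.isclose(np.vdot(y, A @ x), np.vdot(Ad @ y, x)))
# (13.87): ||Ax||^2 = <xA^dagger|Ax> = x^dagger A^dagger A x
print(np.isclose(np.linalg.norm(A @ x) ** 2, np.vdot(x, Ad @ A @ x).real))
```

Both checks print True; the tiny imaginary residue of $x^{\dagger} A^{\dagger} A x$ is discarded with `.real`, since the quantity is a norm and hence real.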

Now we are in a position to construct an orthonormal basis in $V^n$ using $n$ linearly independent vectors $|i\rangle\ (1 \le i \le n)$. The following theorem is well known as the Gram–Schmidt orthonormalization.

Theorem 13.2 (Gram–Schmidt Orthonormalization Theorem) [5] Suppose that there is a set of linearly independent vectors $|i\rangle\ (1 \le i \le n)$ in $V^n$. Then one can construct an orthonormal basis $|e_i\rangle\ (1 \le i \le n)$ such that $\langle e_i \mid e_j \rangle = \delta_{ij}\ (1 \le j \le n)$ and each vector $|e_i\rangle$ is a linear combination of the vectors $|i\rangle$.


Proof First let us take $|1\rangle$. This can be normalized such that

$$|e_1\rangle = \frac{|1\rangle}{\sqrt{\langle 1 \mid 1 \rangle}}, \qquad \langle e_1 \mid e_1 \rangle = 1. \tag{13.88}$$

Next let us take $|2\rangle$ and make the following vector:

$$|e_2\rangle = \frac{1}{L_2} \bigl[\, |2\rangle - \langle e_1 \mid 2 \rangle |e_1\rangle \,\bigr], \tag{13.89}$$

where $L_2$ is a normalization constant such that $\langle e_2 \mid e_2 \rangle = 1$. Note that $|e_2\rangle$ cannot be a zero vector. This is because if $|e_2\rangle$ were a zero vector, $|2\rangle$ and $|e_1\rangle$ (or $|1\rangle$) would be linearly dependent, in contradiction to the assumption. We have $\langle e_1 \mid e_2 \rangle = 0$. Thus, $|e_1\rangle$ and $|e_2\rangle$ are orthonormal. After this, the proof is based upon mathematical induction. Suppose that the theorem is true of $(n - 1)$ vectors. That is, let $|e_i\rangle\ (1 \le i \le n - 1)$ be such that $\langle e_i \mid e_j \rangle = \delta_{ij}\ (1 \le j \le n - 1)$ and each vector $|e_i\rangle$ is a linear combination of the vectors $|i\rangle$. Meanwhile, let us define

$$|\tilde{n}\rangle \equiv |n\rangle - \sum_{j=1}^{n-1} \langle e_j \mid n \rangle |e_j\rangle. \tag{13.90}$$

Again, the vector $|\tilde{n}\rangle$ cannot be a zero vector, as asserted above. We have

$$\langle e_k \mid \tilde{n} \rangle = \langle e_k \mid n \rangle - \sum_{j=1}^{n-1} \langle e_j \mid n \rangle \langle e_k \mid e_j \rangle = \langle e_k \mid n \rangle - \sum_{j=1}^{n-1} \langle e_j \mid n \rangle \delta_{kj} = 0, \tag{13.91}$$

where $1 \le k \le n - 1$. The second equality comes from the assumption of the induction. The vector $|\tilde{n}\rangle$ can always be normalized such that

$$|e_n\rangle = \frac{|\tilde{n}\rangle}{\sqrt{\langle \tilde{n} \mid \tilde{n} \rangle}}, \qquad \langle e_n \mid e_n \rangle = 1. \tag{13.92}$$

Thus, the theorem is proven. ∎

We note that in (13.92) a phase factor $e^{i\theta}$ ($\theta$: an arbitrarily chosen real number) can be added such that

$$|e_n\rangle = \frac{e^{i\theta} |\tilde{n}\rangle}{\sqrt{\langle \tilde{n} \mid \tilde{n} \rangle}}. \tag{13.93}$$
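The constructive proof above translates directly into code. The sketch below (a minimal illustration assuming NumPy; the three input vectors are an arbitrary linearly independent set) implements (13.90) and (13.92) and checks $\langle e_i \mid e_j \rangle = \delta_{ij}$:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (Theorem 13.2).

    Each |e_n> is |n> minus its projections onto the previously built
    |e_j> -- equation (13.90) -- normalized afterwards -- equation (13.92).
    """
    basis = []
    for v in vectors:
        w = v - sum(np.vdot(e, v) * e for e in basis)  # (13.90)
        basis.append(w / np.linalg.norm(w))            # (13.92)
    return np.array(basis)

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]  # linearly independent but not orthogonal
B = gram_schmidt(vs)
# Rows of B are the |e_i>; the Gram matrix of an orthonormal basis is E
print(np.allclose(B @ B.conj().T, np.eye(3)))
```

Note that, as the remark on (13.93) states, each row of `B` is fixed only up to a phase factor $e^{i\theta}$; any such rescaling leaves the orthonormality check unchanged.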

To prove Theorem 13.2, we have used the following simple but important theorem.


Theorem 13.3 Let us have any $n$ vectors $|i\rangle \ne 0\ (1 \le i \le n)$ in $V^n$ and let these vectors $|i\rangle$ be orthogonal to one another. Then the vectors $|i\rangle$ are linearly independent.

Proof Let us think of the following equation:

$$c_1 |1\rangle + c_2 |2\rangle + \cdots + c_n |n\rangle = 0. \tag{13.94}$$

Multiplying (13.94) by $\langle i |$ from the left and considering the orthogonality among the vectors, we have

$$c_i \langle i \mid i \rangle = 0. \tag{13.95}$$

Since $\langle i \mid i \rangle \ne 0$, $c_i = 0$. The above is true of any $c_i$ and $|i\rangle$. Then $c_1 = c_2 = \cdots = c_n = 0$. Thus, (13.94) implies that $|1\rangle, |2\rangle, \cdots, |n\rangle$ are linearly independent. ■

It is worth reviewing Theorem 13.2 from the point of view of the similarity transformation. In Sect. 13.2, we have seen that a Gram matrix comprising linearly independent vectors is a positive definite Hermitian matrix. In that case, we had

$$U^{-1} G U = U^{\dagger} G U = G_0 = \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{pmatrix}, \tag{13.40}$$

where $\lambda_i > 0\ (1 \le i \le n)$. Now, let us define a following matrix given by

$$D = \begin{pmatrix} 1/\sqrt{\lambda_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1/\sqrt{\lambda_n} \end{pmatrix}. \tag{13.96}$$

Then, we have

$$D^T U^{\dagger} G U D = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix}, \tag{13.97}$$

where the RHS of (13.97) is the $(n, n)$ identity matrix. Thus, we find that (13.97) is equivalent to what Theorem 13.2 implies. If the vector space we are thinking of is a real vector space, the corresponding Gram matrix is real as well and $U$ of (13.97) is an orthogonal matrix. Then, instead of (13.97) we have

$$D^T U^T G U D = (UD)^T G\, (UD) = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix}. \tag{13.98}$$

Defining $UD \equiv P$, we have

$$P^T G P = E_n, \tag{13.99}$$

where $E_n$ stands for the $(n, n)$ identity matrix and $P$ is a real non-singular matrix. The transformation of $G$ represented by (13.99) is said to be an equivalence transformation (see Sect. 14.5). Equation (13.99) gives a special case where Theorem 14.11 holds; also see Sect. 14.5. The real vector space characterized by a Gram matrix $E_n$ is called a Euclidean space. Its related topics are discussed in Parts IV and V. In Part V, in particular, we will investigate several properties of vector spaces other than the (real) inner product space in relation to the quantum theory of fields.
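The construction $P = UD$ of (13.96)–(13.99) can be sketched numerically. The example below (assuming NumPy; the positive definite Gram matrix is an arbitrary choice) diagonalizes a real symmetric $G$ as in (13.40), rescales by the inverse square roots of the eigenvalues, and verifies (13.99):

```python
import numpy as np

G = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # a positive definite real symmetric Gram matrix

lam, U = np.linalg.eigh(G)        # U^T G U = diag(lam), cf. (13.40); lam > 0
D = np.diag(1.0 / np.sqrt(lam))   # the rescaling matrix of (13.96)
P = U @ D                         # P = UD

print(np.allclose(P.T @ G @ P, np.eye(2)))  # (13.99): P^T G P = E_n
```

This prints True. Note that $P$ is non-singular but generally not orthogonal: (13.99) is an equivalence transformation, not a similarity transformation.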

References

1. Hassani S (2006) Mathematical physics. Springer, New York
2. Satake I-O (1975) Linear algebra (pure and applied mathematics). Marcel Dekker, New York
3. Satake I (1974) Linear algebra. Shokabo, Tokyo
4. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering, 3rd edn. Cambridge University Press, Cambridge
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York

Chapter 14

Hermitian Operators and Unitary Operators

Hermitian operators and unitary operators are quite often encountered in mathematical physics and, in particular, quantum physics. In this chapter we investigate their basic properties. Both Hermitian operators and unitary operators fall under the category of normal operators. The normal matrices are characterized by the important fact that they can be diagonalized by a unitary matrix. Moreover, Hermitian matrices always possess real eigenvalues. This fact largely eases the mathematical treatment of quantum mechanics. In relation to these topics, in this chapter we investigate projection operators systematically. We find their important application to physicochemical problems in Part IV. We further investigate Hermitian quadratic forms and real symmetric quadratic forms as an important branch of matrix algebra. In connection with this topic, the positive definiteness and non-negativity of a matrix are important concepts. This characteristic is readily applicable to the theory of differential operators, thus rendering this chapter closely related to basic concepts of quantum physics.

14.1 Projection Operators

In Chap. 12 we considered the decomposition of a vector space into a direct sum of invariant subspaces. We also mentioned properties of idempotent operators. Moreover, we have shown how an orthonormal basis can be constructed from a set of linearly independent vectors. In this section an orthonormal basis set is implied as the basis vectors in an inner product space $V^n$. Let us start with the concept of an orthogonal complement. Let $W$ be a subspace in $V^n$. Let us think of the set of vectors $|x\rangle$ such that

© Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_14


$$\bigl\{\, |x\rangle \;;\; \langle x \mid y \rangle = 0 \ \text{for}\ {}^{\forall}|y\rangle \in W \,\bigr\}. \tag{14.1}$$

We name this set $W^{\perp}$ and call it an orthogonal complement of $W$. The set $W^{\perp}$ forms a subspace of $V^n$. In fact, if $|a\rangle, |b\rangle \in W^{\perp}$, then $\langle a \mid y \rangle = 0$ and $\langle b \mid y \rangle = 0$, so that $(\langle a| + \langle b|)\,|y\rangle = \langle a \mid y \rangle + \langle b \mid y \rangle = 0$. Therefore, $|a\rangle + |b\rangle \in W^{\perp}$. Also, $\langle \alpha a \mid y \rangle = \overline{\alpha}\,\langle a \mid y \rangle = 0$; hence $|\alpha a\rangle = \alpha |a\rangle \in W^{\perp}$. Then, $W^{\perp}$ is a subspace of $V^n$.

Theorem 14.1 [1, 2] Let $W$ be a subspace and $W^{\perp}$ be its orthogonal complement in $V^n$. Then,

$$V^n = W \oplus W^{\perp}. \tag{14.2}$$

Proof Suppose that an orthonormal basis comprising $|e_1\rangle, |e_2\rangle, \cdots, |e_n\rangle$ spans $V^n$; $V^n = \mathrm{Span}\{|e_1\rangle, |e_2\rangle, \cdots, |e_n\rangle\}$. Of the orthonormal basis, let $|e_1\rangle, |e_2\rangle, \cdots, |e_r\rangle\ (r < n)$ span $W$. Let an arbitrarily chosen vector from $V^n$ be $|x\rangle$. Then we have

$$|x\rangle = x_1 |e_1\rangle + x_2 |e_2\rangle + \cdots + x_n |e_n\rangle = \sum_{i=1}^{n} x_i |e_i\rangle. \tag{14.3}$$

Multiplying $\langle e_j |$ on (14.3) from the left, we have

$$\langle e_j \mid x \rangle = \sum_{i=1}^{n} x_i \langle e_j \mid e_i \rangle = \sum_{i=1}^{n} x_i \delta_{ji} = x_j. \tag{14.4}$$

That is,

$$|x\rangle = \sum_{i=1}^{n} \langle e_i \mid x \rangle |e_i\rangle. \tag{14.5}$$

Meanwhile, put

$$|x'\rangle = \sum_{i=1}^{r} \langle e_i \mid x \rangle |e_i\rangle. \tag{14.6}$$

Then we have $|x'\rangle \in W$. Also putting $|x''\rangle = |x\rangle - |x'\rangle$ and multiplying $\langle e_i |\ (1 \le i \le r)$ on it from the left, we get

$$\langle e_i \mid x'' \rangle = \langle e_i \mid x \rangle - \langle e_i \mid x' \rangle = \langle e_i \mid x \rangle - \langle e_i \mid x \rangle = 0. \tag{14.7}$$

Taking account of $W = \mathrm{Span}\{|e_1\rangle, |e_2\rangle, \cdots, |e_r\rangle\}$, from the definition of the orthogonal complement we get $|x''\rangle \in W^{\perp}$. That is, for ${}^{\forall}|x\rangle \in V^n$

$$|x\rangle = |x'\rangle + |x''\rangle. \tag{14.8}$$

This means that $V^n = W + W^{\perp}$. Meanwhile, we have $W \cap W^{\perp} = \{0\}$. In fact, suppose that $|x\rangle \in W \cap W^{\perp}$. Then $\langle x \mid x \rangle = 0$ because of the orthogonality. However, this implies that $|x\rangle = 0$. Consequently, from Theorem 11.1 we have $V^n = W \oplus W^{\perp}$. This completes the proof.

The consequence of Theorem 14.1 is that the dimension of $W^{\perp}$ is $(n - r)$; see Theorem 11.2. In other words, we have $\dim V^n = n = \dim W + \dim W^{\perp}$. Moreover, the contents of Theorem 14.1 can readily be generalized to more subspaces such that

$$V^n = W_1 \oplus W_2 \oplus \cdots \oplus W_r, \tag{14.9}$$

where $W_1, W_2, \cdots,$ and $W_r\ (r \le n)$ are mutually orthogonal subspaces. In this case ${}^{\forall}|x\rangle \in V^n$ can be expressed uniquely as the sum of individual vectors $|w_1\rangle, |w_2\rangle, \cdots,$ and $|w_r\rangle$ of each subspace; i.e.,

$$|x\rangle = |w_1\rangle + |w_2\rangle + \cdots + |w_r\rangle = |w_1 + w_2 + \cdots + w_r\rangle. \tag{14.10}$$
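The decomposition of Theorem 14.1 can be made concrete with a small numerical sketch (assuming NumPy; the basis and the vector are arbitrary choices). We take $W = \mathrm{Span}\{|e_1\rangle, |e_2\rangle\}$ in $V^3$, build $|x'\rangle$ by (14.6), and check (14.7) and (14.8):

```python
import numpy as np

# Orthonormal basis |e1>, |e2>, |e3> of V^3; W = Span{|e1>, |e2>}, so r = 2.
e = np.eye(3)
x = np.array([3.0, -1.0, 2.0])   # an arbitrary vector of V^3

xp = sum(np.vdot(e[i], x) * e[i] for i in range(2))   # |x'> in W, cf. (14.6)
xpp = x - xp                                          # |x''> = |x> - |x'>

print(np.allclose(xp + xpp, x))   # (14.8): |x> = |x'> + |x''>
print(all(np.isclose(np.vdot(e[i], xpp), 0.0) for i in range(2)))  # (14.7): |x''> in W-perp
```

Both checks print True; $|x'\rangle$ and $|x''\rangle$ are exactly the components that the projection operators defined next extract.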

Let us define the following operators similarly to the case of (12.201):

$$P_i(|x\rangle) = |w_i\rangle \quad (1 \le i \le r). \tag{14.11}$$

Thus, the operator $P_i$ extracts the vector $|w_i\rangle$ in the subspace $W_i$. Then we have

$$(P_1 + P_2 + \cdots + P_r)(|x\rangle) = P_1 |x\rangle + P_2 |x\rangle + \cdots + P_r |x\rangle = |w_1\rangle + |w_2\rangle + \cdots + |w_r\rangle = |x\rangle. \tag{14.12}$$

Since $|x\rangle$ is an arbitrarily chosen vector, we get

$$P_1 + P_2 + \cdots + P_r = E. \tag{14.13}$$

Moreover,

$$P_i \bigl[ P_i(|x\rangle) \bigr] = P_i(|w_i\rangle) = |w_i\rangle \quad (1 \le i \le r). \tag{14.14}$$

Therefore, from (14.11) and (14.14) we have

$$P_i \bigl[ P_i(|x\rangle) \bigr] = P_i(|x\rangle). \tag{14.15}$$

The vector $|x\rangle$ is arbitrarily chosen, and so we get


$$P_i{}^2 = P_i. \tag{14.16}$$

Choose another arbitrary vector $|y\rangle \in V^n$ such that

$$|y\rangle = |u_1\rangle + |u_2\rangle + \cdots + |u_r\rangle = |u_1 + u_2 + \cdots + u_r\rangle. \tag{14.17}$$

Then, we have

$$\langle x \mid P_i y \rangle = \langle w_1 + w_2 + \cdots + w_r \mid P_i \mid u_1 + u_2 + \cdots + u_r \rangle = \langle w_1 + w_2 + \cdots + w_r \mid u_i \rangle = \langle w_i \mid u_i \rangle. \tag{14.18}$$

With the last equality, we used the mutual orthogonality of the subspaces. Meanwhile, we have

$$\langle y \mid P_i x \rangle = \langle u_1 + u_2 + \cdots + u_r \mid P_i \mid w_1 + w_2 + \cdots + w_r \rangle = \langle u_1 + u_2 + \cdots + u_r \mid w_i \rangle = \langle u_i \mid w_i \rangle = \overline{\langle w_i \mid u_i \rangle}. \tag{14.19}$$

Comparing (14.18) and (14.19), we get

$$\langle x \mid P_i y \rangle = \overline{\langle y \mid P_i x \rangle} = \bigl\langle x \,\bigl|\, P_i^{\dagger} y \bigr\rangle, \tag{14.20}$$

where we used (13.64) with the second equality. Since $|x\rangle$ and $|y\rangle$ are arbitrarily chosen, we get

$$P_i^{\dagger} = P_i. \tag{14.21}$$

Equation (14.21) shows that $P_i$ is Hermitian. The above discussion parallels that made in Sect. 12.4 with an idempotent operator. We have the following definition of a projection operator.

Definition 14.1 An operator $P$ is said to be a projection operator if $P^2 = P$ and $P^{\dagger} = P$. That is, an idempotent and Hermitian operator is a projection operator.

As described above, a projection operator is characterized by (14.16) and (14.21). An idempotent operator does not premise the presence of an inner product space; we only need a direct sum of subspaces. In contrast, if we deal with the projection operator, we are thinking of orthogonal complements as subspaces and their direct sum. The projection operator can adequately be defined in an inner product vector space having an orthonormal basis. From (14.13) we have


$$(P_1 + P_2 + \cdots + P_r)(P_1 + P_2 + \cdots + P_r) = \sum_{i=1}^{r} P_i{}^2 + \sum_{i \ne j} P_i P_j = \sum_{i=1}^{r} P_i + \sum_{i \ne j} P_i P_j = E + \sum_{i \ne j} P_i P_j = E. \tag{14.22}$$

In (14.22) we used (14.13) and (14.16). Therefore, we get

$$\sum_{i \ne j} P_i P_j = 0.$$

In particular, we have

$$P_i P_j = 0, \qquad P_j P_i = 0 \quad (i \ne j). \tag{14.23}$$

In fact, we have

$$P_i P_j (|x\rangle) = P_i \bigl( |w_j\rangle \bigr) = 0 \quad (i \ne j,\ 1 \le i, j \le n). \tag{14.24}$$

The second equality comes from $W_i \cap W_j = \{0\}$. Notice that in (14.24) the indices $i$ and $j$ are interchangeable. Again, $|x\rangle$ is arbitrarily chosen, and so (14.23) holds. Combining (14.16) and (14.23), we write

$$P_i P_j = \delta_{ij} P_i. \tag{14.25}$$
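The defining properties (14.16), (14.21), (14.23), and the closure of sums of orthogonal projectors can be verified with a small sketch (assuming NumPy; the two spanning vectors are an arbitrary orthogonal pair, deliberately not normalized so that the division by $\lVert w \rVert^2$ of (14.26) is exercised):

```python
import numpy as np

w1 = np.array([1.0, 1.0, 0.0])   # spans W1 (not normalized on purpose)
w2 = np.array([1.0, -1.0, 0.0])  # spans W2, orthogonal to W1
P1 = np.outer(w1, w1) / np.dot(w1, w1)   # |w><w| / ||w||^2, cf. (14.26)
P2 = np.outer(w2, w2) / np.dot(w2, w2)

print(np.allclose(P1 @ P1, P1))               # (14.16): idempotent
print(np.allclose(P1.conj().T, P1))           # (14.21): Hermitian
print(np.allclose(P1 @ P2, np.zeros((3, 3)))) # (14.23): mutually orthogonal
Q = P1 + P2
print(np.allclose(Q @ Q, Q))                  # P1 + P2 is again a projection operator
```

All four checks print True; the last one is the numerical counterpart of the statement, proved next, that $P_i + P_j\ (i \ne j)$ is a projection operator.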

In virtue of the relation (14.23), $P_i + P_j\ (i \ne j)$ is a projection operator as well [3]. In fact, we have

$$\bigl(P_i + P_j\bigr)^2 = P_i{}^2 + P_i P_j + P_j P_i + P_j{}^2 = P_i{}^2 + P_j{}^2 = P_i + P_j,$$

where the second equality comes from (14.23). Also we have

$$\bigl(P_i + P_j\bigr)^{\dagger} = P_i^{\dagger} + P_j^{\dagger} = P_i + P_j.$$

The following notation is often used:

$$P_i = \frac{|w_i\rangle \langle w_i|}{\lVert w_i \rVert \cdot \lVert w_i \rVert}. \tag{14.26}$$

Then we have

$$P_i |x\rangle = \frac{|w_i\rangle \langle w_i|}{\lVert w_i \rVert \cdot \lVert w_i \rVert} \bigl( |w_1\rangle + |w_2\rangle + \cdots + |w_r\rangle \bigr)$$


$$= \bigl[ |w_i\rangle \bigl( \langle w_i \mid w_1 \rangle + \langle w_i \mid w_2 \rangle + \cdots + \langle w_i \mid w_r \rangle \bigr) \bigr] / \langle w_i \mid w_i \rangle = \bigl[ |w_i\rangle \langle w_i \mid w_i \rangle \bigr] / \langle w_i \mid w_i \rangle = |w_i\rangle.$$

Furthermore, we have

$$P_i{}^2 |x\rangle = P_i |w_i\rangle = |w_i\rangle = P_i |x\rangle. \tag{14.27}$$

Equation (14.16) is recovered accordingly. Meanwhile,

$$\bigl( |w_i\rangle \langle w_i| \bigr)^{\dagger} = \bigl( \langle w_i| \bigr)^{\dagger} \bigl( |w_i\rangle \bigr)^{\dagger} = |w_i\rangle \langle w_i|. \tag{14.28}$$

Hence, we recover

$$P_i^{\dagger} = P_i. \tag{14.21}$$

In (14.28) we used $(AB)^{\dagger} = B^{\dagger} A^{\dagger}$. In fact, we have

$$\langle x \mid B^{\dagger} A^{\dagger} \mid y \rangle = \langle x B^{\dagger} \mid A^{\dagger} y \rangle = \overline{\langle yA \mid Bx \rangle} = \overline{\langle y \mid AB \mid x \rangle} = \bigl\langle x \,\bigl|\, (AB)^{\dagger} \,\bigr|\, y \bigr\rangle. \tag{14.29}$$

With the second equality of (14.29), we used (13.86) where $A$ is replaced with $B$ and $|y\rangle$ is replaced with $A^{\dagger} |y\rangle$. Since (14.29) holds for arbitrarily chosen vectors $|x\rangle$ and $|y\rangle$, comparing the first and last sides of (14.29) we have

$$(AB)^{\dagger} = B^{\dagger} A^{\dagger}. \tag{14.30}$$

We can express (14.29) alternatively as follows:

$$\langle x B^{\dagger} \mid A^{\dagger} y \rangle = \overline{\bigl\langle y \bigl(A^{\dagger}\bigr)^{\dagger} \,\bigl|\, \bigl(B^{\dagger}\bigr)^{\dagger} x \bigr\rangle} = \overline{\langle yA \mid Bx \rangle} = \overline{\langle yABx \rangle}, \tag{14.31}$$

where with the second equality we used (13.68). Also recall the remarks made after (13.83) with the expressions of (14.29) and (14.31). Other notations can be adopted.

We can view a projection operator under a stricter condition, and related operators can be defined as well. As in (14.26), let us define an operator such that

$$\tilde{P}_k = |e_k\rangle \langle e_k|, \tag{14.32}$$

where $|e_k\rangle\ (k = 1, \cdots, n)$ is the orthonormal basis set of $V^n$. Operating $\tilde{P}_k$ on $|x\rangle = x_1 |e_1\rangle + x_2 |e_2\rangle + \cdots + x_n |e_n\rangle$ from the left, we get


$$\tilde{P}_k |x\rangle = |e_k\rangle \langle e_k| \sum_{j=1}^{n} x_j |e_j\rangle = |e_k\rangle \sum_{j=1}^{n} x_j \langle e_k \mid e_j \rangle = |e_k\rangle \sum_{j=1}^{n} x_j \delta_{kj} = x_k |e_k\rangle.$$

Thus, we find that $\tilde{P}_k$ plays the same role as $P^{(k)}$ defined in (12.201). Represented by a matrix, $\tilde{P}_k$ has the same structure as that denoted in (12.205). Evidently,

$$\tilde{P}_k{}^2 = \tilde{P}_k, \qquad \tilde{P}_k^{\dagger} = \tilde{P}_k, \qquad \tilde{P}_1 + \tilde{P}_2 + \cdots + \tilde{P}_n = E. \tag{14.33}$$

Now let us modify $P^{(k)}$ in (12.201). There $\bigl( P^{(k)} \bigr)_{ij} = \delta_i^{(k)} \delta_j^{(k)}$, where only the $(k, k)$ element is 1, otherwise 0, in the $(n, n)$ matrix. We define a matrix $P^{(k)}_{(m)}$ by $\bigl( P^{(k)}_{(m)} \bigr)_{ij} = \delta_i^{(k)} \delta_j^{(m)}$. A full matrix representation of it is

$$P^{(k)}_{(m)} = \begin{pmatrix}
0 & \cdots & \cdots & \cdots & 0 \\
\vdots & \ddots & & & \vdots \\
\vdots & & 1 & & \vdots \\
\vdots & & & \ddots & \vdots \\
0 & \cdots & \cdots & \cdots & 0
\end{pmatrix}, \tag{14.34}$$

eigenvalues are all zero, and so PðmÞ is a nilpotent matrix. If k > m, the matrix is a lower triangle matrix and nilpotent as well. Such a matrix is not Hermitian (nor a projection operator), as can be immediately seen from the matrix form of (14.34). ðk Þ Because of the properties of nilpotent matrices mentioned in Sect. 12.3, PðmÞ ðk ≠ mÞ is not diagonalizable either. Various relations can be extracted. As an example, we have ðk Þ

ðlÞ

PðmÞ PðnÞ =

ðk Þ

q

ðk Þ

ðk Þ

δi δqðmÞ δðqlÞ δðj nÞ = δi δml δðj nÞ = δml PðnÞ :

ð14:35Þ

ðk Þ

Note that PðkÞ  PðkÞ defined in (12.201). From (14.35), moreover, we have ðk Þ

ðmÞ

ðk Þ

ðnÞ

ðmÞ

ðnÞ

ðmÞ ðnÞ

ðmÞ

PðmÞ PðnÞ = PðnÞ , PðmÞ PðnÞ = PðnÞ , PðnÞ PðmÞ = PðmÞ , ðk Þ

ðk Þ

ðk Þ

ðmÞ ðmÞ

ðmÞ

PðmÞ PðmÞ = δmk PðmÞ , PðmÞ PðmÞ = PðmÞ

2

ðmÞ

= PðmÞ , etc:

These relations remain unchanged after the unitary similarity transformation by U. For instance, taking the first equation of the above, we have

578

14 ðk Þ

ðmÞ

Hermitian Operators and Unitary Operators

ðk Þ

ðmÞ

ðk Þ

U { PðmÞ PðnÞ U = U { PðmÞ U U { PðnÞ U = U { PðnÞ U: ðk Þ

Among these operators, only PðkÞ is eligible for a projection operator. We will encounter further examples in Parts IV and V. ðk Þ Replacing A in (13.62) with PðkÞ , we obtain ðk Þ

ðk Þ

yjPðkÞ ðxÞ = y1 ⋯ yn GPðkÞ

x1 ⋮ xn

:

ð14:36Þ

Within a framework of an orthonormal basis where G = E, the representation is largely simplified to be ðk Þ

ðk Þ

yjPðkÞ ðxÞ = y1 ⋯ yn PðkÞ

14.2

x1 ⋮ xn

= yk xk :

ð14:37Þ

Normal Operators

There are a large group of operators called normal operators that play an important role in mathematical physics, especially quantum physics. A normal operator is defined as an operator on an inner product space that commutes with its adjoint operator. That is, let A be a normal operator. Then, we have AA{ = A{ A:

ð14:38Þ

The normal operators include an Hermitian operator H defined as H{ = H as well as a unitary operator U defined as UU{ = U{U = E. In this condition let us estimate the norm of jA{xi together with jAxi defined by (13.87). If A is a normal operator, A{ x =

{

x A{ jA{ x =

xAA{ x =

xA{ Ax = jjAxjj:

ð14:39Þ

The other way around suppose that j|Ax| j = kA{xk. Then, since ||Ax||2 = kA{xk2, hxA{Axi = hxAA{xi. That is, hx| A{A - AA{| xi = 0 for an arbitrarily chosen vector jxi. To assert A{A - AA{ = 0, i.e., A{A = AA{ on the assumption that hx| A{A - AA{| xi = 0, we need the following theorems: Theorem 14.2 [4] A linear transformation A on an inner product space is the zero transformation if and only if hy| Axi = 0 for any vectors jxi and jyi.

14.2

Normal Operators

579

Proof If A = 0, then hy| Axi = hy| 0i = 0. This is because in (13.3) putting β = 1 = - γ and jbi = j ci, we get ha| 0i = 0. Conversely, suppose that hy| Axi = 0 for any vectors jxi and jyi. Then, putting jyi = j Axi, hxA{| Axi = 0 and hy| yi = 0. This implies that jyi = j Axi = 0. For jAxi = 0 to hold for any jxi we must have A = 0. Note here that if A is a singular matrix, for some vectors jxi, j Axi = 0. However, even though A is singular, for jAxi = 0 to hold for any jxi, A = 0. This completes the proof. We have another important theorem under a further restricted condition. Theorem 14.3 [4] A linear transformation A on an inner product space is the zero transformation if and only if hx| Axi = 0 for any vectors jxi. Proof As in the case of Theorem 14.2, a necessary condition is trivial. To prove a sufficient condition, let us consider the following: hx þ yjAðx þ yÞi = hxjAxi þ hyjAyi þ hxjAyi þ hyjAxi, hxjAyi þ hyjAxi = hx þ yjAðx þ yÞi - hxjAxi - hyjAyi:

ð14:40Þ

From the assumption that hx| Axi = 0 with any vectors jxi, we have hxjAyi þ hyjAxi = 0:

ð14:41Þ

Meanwhile, replacing jyi by jiyi in (14.41), we get hxjAiyi þ hiyjAxi = i½hxjAyi - hyjAxi = 0:

ð14:42Þ

hxjAyi - hyjAxi = 0:

ð14:43Þ

That is,

Summing both sides of (14.41) and (14.43), we get hxjAyi = 0:

ð14:44Þ

Theorem 14.2 means that A = 0, indicating that the sufficient condition holds. This completes the proof. Thus returning to the beginning, i.e., remarks made after (14.39), we establish the following theorem: Theorem 14.4 A necessary and sufficient condition for a linear transformation A on an inner product space to be a normal operator is that a following relation

580

14

Hermitian Operators and Unitary Operators

A{ x = jjAxjj

ð14:45Þ

holds.

14.3 Unitary Diagonalization of Matrices A normal operator has a distinct property. The normal operator can be diagonalized by a similarity transformation by a unitary matrix. The transformation is said to be a unitary similarity transformation. Let us prove the following theorem. Theorem 14.5 [5] A necessary and sufficient condition for a matrix A to be diagonalized by unitary similarity transformation is that the matrix A is a normal matrix. Proof To prove the necessary condition, suppose that A can be diagonalized by a unitary matrix U. That is, U { AU = D, i:e:, A = UDU { and A{ = UD{ U { ,

ð14:46Þ

where D is a diagonal matrix. Then AA{ = UDU { UD{ U { = UDD{ U { = UD{ DU { = UD{ U {  UDU { = A{ A:

ð14:47Þ

For the third equality, we used DD{ = D{D (i.e., D and D{ are commutable). This shows that A is a normal matrix. To prove the sufficient condition, let us show that a normal matrix can be diagonalized by unitary similarity transformation. The proof is due to mathematical induction, as is the case with Theorem 12.1. First we show that Theorem is true of a (2, 2) matrix. Suppose that one of the eigenvalues of A2 is α1 and that its corresponding eigenvector is jx1i. Following procedures of the proof for Theorem 12.1 and remembering the Gram-Schmidt orthonormalization theorem, we can construct a unitary matrix U1 such that U 1 = ðjx1 i jp1 iÞ,

ð14:48Þ

where jx1i represents a column vector and jp1i is another arbitrarily determined column vector. Then we can convert A2 to a triangle matrix such that

14.3

Unitary Diagonalization of Matrices

581

A~2  U 1 { A2 U 1 =

α1 0

x y

:

ð14:49Þ

Then we have A~2 A~2

{

{

= U 1 { A2 U 1 U 1 { A2 U 1

= U 1 { A2 U 1 U 1 { A2 { U 1

= U 1 { A2 A2 { U 1 = U 1 { A2 { A2 U 1 = U 1 { A2 { U 1 U 1 { A2 U 1 { = A~2 A~2 :

ð14:50Þ

With the fourth equality, we used the supposition that A2 is a normal matrix. Equation (14.50) means that A~2 defined in (14.49) is a normal operator. Via simple matrix calculations, we have A2 A2

{

=

jα1 j2 þ jxj2 x y

xy , j yj 2

A2

{

A2 =

jα1 j2 α1 x

α1 x : jxj þ jyj2 2

ð14:51Þ

For (14.50) to hold, we must have x = 0 in (14.51). Accordingly, we get A2 =

α1 0

0 : y

ð14:52Þ

This implies that a normal matrix A2 has been diagonalized by the unitary similarity transformation. Now let us examine a general case where we consider a (n, n) square normal matrix An. Let αn be one of the eigenvalues of An. On the basis of the argument of the ~ we (2, 2) matrix case, after a suitable similarity transformation by a unitary matrix U first have ~ ~ { An U, A~2 = U

ð14:53Þ

where we can put An =

αn 0

xT B

,

ð14:54Þ

where x is a column vector of order (n - 1), 0 is a zero column vector of order (n 1), and B is a (n - 1, n - 1) matrix. Then we have

582 {

An

=

14

Hermitian Operators and Unitary Operators

αn x

0 , B{

ð14:55Þ

where x is a complex column vector. Performing matrix calculations, we have An An An

{

{

jαn j2 þ xT x Bx

=

jαn j2 α n x

An =

xT B{ , BB{

αn xT : x x þ B{ B

ð14:56Þ

 T

For A~n ½A~n { = ½A~n { ½A~n  to hold with (14.56), we must have x = 0. Thus we get An =

αn 0

0 : B

ð14:57Þ

Since A~n is a normal matrix, so is B. According to mathematical induction, let us assume that the theorem holds with a (n - 1, n - 1) matrix, i.e., B. Then, also from the assumption there exists a unitary matrix C and a diagonal matrix D, both of order (n - 1), such that BC = CD. Hence, αn 0

0 B

1 0

0 C

=

1 0

αn 0

0 C

0 : D

ð14:58Þ

Here putting Cn =

1 0

0 C

and Dn =

αn 0

0 , D

ð14:59Þ

we get A~n C~n = C~n D~n : is a (n, n) unitary matrix, [C~n { C~n = C~n C~n As C~n { C~n A~n C~n = D~n . Thus, from (14.53) finally we get C~n

{

~ { An U ~ C~n = D~n : U

ð14:60Þ {

= E. Hence,

ð14:61Þ

~ C~n = V, V being another unitary operator, Putting U V { An V = D~n : Obviously, from (14.59) D~n is a diagonal matrix.

ð14:62Þ

14.3

Unitary Diagonalization of Matrices

583

These complete the proof. A direct consequence of Theorem 14.5 is that with any normal matrix we can find a set of orthonormal eigenvectors corresponding to individual eigenvalues whether or not those are degenerate. In Sect. 12.2 we dealt with a decomposition of a linear vector space and relevant reduction of an operator when discussing canonical forms of matrices. In this context Theorem 14.5 gives a simple and clear criterion for this. Equation (14.57) implies that a (n - 1, n - 1) submatrix B can further be reduced to matrices having lower dimensions. Considering that a diagonal matrix is a special case of triangle matrices, a normal matrix that has been diagonalized by the unitary similarity transformation gives eigenvalues by its diagonal elements. From a point of view of the aforementioned aspect, let us consider the characteristics of normal matrices, starting with the discussion about the invariant subspaces. We have a following important theorem. Theorem 14.6 Let A be a normal matrix and let one of its eigenvalues be α. Let Wα be an eigenspace corresponding to α. Then, Wα is both A-invariant and A{-invariant. Also Wα⊥ is both A-invariant and A{-invariant. Proof Theorem 14.5 ensures that a normal matrix is diagonalized by unitary similarity transformation. Therefore, we deal with only “proper” eigenvalues and eigenvectors here. First we show if a subspace W is A-invariant, then its orthogonal complement W⊥ is A{-invariant. In fact, suppose that jxi 2 W and jx′i 2 W⊥. Then, from (13.64) and (13.86) we have hx0 jAxi = 0 = xA{ jx0





= xjA{ x0 :

ð14:63Þ

The first equality comes from the fact that |xi 2 W ⟹ A|xi (=| Axi) 2 W as W is A-invariant. From the last equality of (14.63), we have A{ jx0

= jA{ x0

2 W ⊥:

ð14:64Þ

That is, W⊥ is A{-invariant. Next suppose that jxi 2 Wα. Then we have AA{ j xi = A{ A j xi = A{ ðαjxiÞ = αA{ j xi:

ð14:65Þ

Therefore, A{ j xi 2 Wα. This means that Wα is A{-invariant. From the above remark, Wα⊥ is (A{){-invariant, and so A-invariant accordingly. This completes the proof. From Theorem 14.5, we know that the resulting diagonal matrix D~n in (14.62) has a form with n eigenvalues (αn) some of which may be multiple roots arranged in diagonal elements. After diagonalizing the matrix, those eigenvalues can be sorted out according to different eigenvalues α1, α2, ⋯, and αs. This can also be done by unitary similarity transformation. The relevant unitary matrix U is represented as

584

14

Hermitian Operators and Unitary Operators

1 ⋱ 1



0 1

U=

1







1



,

ð14:66Þ

1 0 1

⋱ 1

where except (i, j) and ( j, i) elements equal to 1, all the off-diagonal elements are zero. If operated from the left, U exchanges the i-th and j-th rows of the matrix. If operated from the right, U exchanges the i-th and j-th columns of the matrix. Note that U is at once unitary and Hermitian with eigenvalue 1 or -1. Note that U2 = E. This is because exchanging two columns (or two rows) two times produces identity transformation. Thus performing such unitary similarity transformations appropriate times, we get α1



α1

α2

Dn



α2

: ⋱

αs



ð14:67Þ

αs

The matrix is identical to that represented in (12.181). In parallel, Vn is decomposed to mutually orthogonal subspaces associated with different eigenvalues α1, α2, ⋯, and αs such that V n = W α1

W α2



W αs :

ð14:68Þ

This expression is formally identical to that represented in (12.191). Note, however, that in (12.181) orthogonal subspaces are not implied. At the same time, An is reduced to Að1Þ An



Að2Þ

according to the different eigenvalues.

⋯ ⋱ ⋯

⋮ AðsÞ

,

ð14:69Þ

14.3

Unitary Diagonalization of Matrices

585

A normal operator has other distinct properties. Following theorems are good examples. Theorem 14.7 Let A be a normal operator on Vn. Then jxi is an eigenvector of A with an eigenvalue α, if and only if jxi is an eigenvector of A{ with an eigenvalue α. Proof We apply (14.45) for the proof. Both (A - αE){ = A{ - αE and (A - αE) are normal, since A is normal. Consequently, we have k(A - αE)xk = 0 if and only if k(A{ - αE)xk = 0. Since only the zero vector has a zero norm, we get (A - αE) j xi = 0 if and only if (A{ - αE) j xi = 0. This completes the proof. Theorem 14.8 Let A be a normal operator on Vn. Then, eigenvectors corresponding to different eigenvalues are mutually orthogonal. Proof Let A be a normal operator on Vn. Let jui be an eigenvector corresponding to an eigenvalue α and jvi be an eigenvector corresponding to an eigenvalue β with α ≠ β. Then we have αhvjui = hvjαui = hvjAui = ujA{ v



= hujβ vi = hβ vjui = βhvjui,

ð14:70Þ

where with the fourth equality we used Theorem 14.7. Then we get ðα - βÞhvjui = 0: Since α - β ≠ 0, hv| ui = 0. Namely, the eigenvectors jui and jvi are mutually orthogonal. This completes the proof. In (12.208) we mentioned the decomposition of diagonalizable matrices. As for the normal matrices, we have a related matrix decomposition. Let A be a normal operator. Then, according to Theorem 14.5, A can be diagonalized and expressed as (14.67). This is equivalently expressed as a following succinct relation. That is, if we choose U for a diagonalizing unitary matrix, we have U { AU = α1 P1 þ α2 P2 þ ⋯ þ αs Ps ,

ð14:71Þ

where α1, α2, ⋯, and αs are the different eigenvalues (possibly degenerate) of A; P_l (1 ≤ l ≤ s) is described such that, e.g.,

$$P_1 = \begin{pmatrix} E_{n_1} & & & \\ & 0_{n_2} & & \\ & & \ddots & \\ & & & 0_{n_s} \end{pmatrix}, \quad (14.72)$$


where E_{n_1} stands for an (n_1, n_1) identity matrix, with n_1 corresponding to the multiplicity of α_1. A matrix represented by 0_{n_2} is an (n_2, n_2) zero matrix, and so forth. This expression is in accordance with (14.69). From the matrix form (14.72), obviously P_l (1 ≤ l ≤ s) is a projection operator. Thus, operating U and U† on both sides of (14.71) from the left and the right, respectively, we obtain

$$A = \alpha_1 UP_1U^{\dagger} + \alpha_2 UP_2U^{\dagger} + \cdots + \alpha_s UP_sU^{\dagger}. \quad (14.73)$$

Defining $\tilde P_l \equiv UP_lU^{\dagger}$ (1 ≤ l ≤ s), we have

$$A = \alpha_1\tilde P_1 + \alpha_2\tilde P_2 + \cdots + \alpha_s\tilde P_s. \quad (14.74)$$

In (14.74), we can easily check that $\tilde P_l$ is a projection operator, with α1, α2, ⋯, and αs being the different eigenvalues of A. If one of the eigenvalues is zero, in (14.74) we may disregard the term related to the zero eigenvalue. For example, supposing that α1 = 0 in (14.74), we get

$$A = \alpha_2\tilde P_2 + \cdots + \alpha_s\tilde P_s.$$

In case, moreover, α_l (1 ≤ l ≤ s) is degenerate, we express $\tilde P_l$ through $\tilde P_l^{\mu}$ (1 ≤ μ ≤ m_l), where m_l is the multiplicity of α_l. In that case, we may write

$$\tilde P_l = \tilde P_l^{1} \oplus \cdots \oplus \tilde P_l^{m_l}. \quad (14.75)$$

Also, for k ≠ l we have

$$\tilde P_k\tilde P_l = UP_kU^{\dagger}UP_lU^{\dagger} = UP_kEP_lU^{\dagger} = UP_kP_lU^{\dagger} = 0 \quad (1 \le k, l \le s).$$

The last equality comes from (14.23). Similarly, we have $\tilde P_l\tilde P_k = 0$. Thus, we have

$$\tilde P_k\tilde P_l = \delta_{kl}\tilde P_l.$$

If the operator is decomposed as in the case of (14.75), we can express

$$\tilde P_l^{\mu}\tilde P_l^{\nu} = \delta_{\mu\nu}\tilde P_l^{\mu} \quad (1 \le \mu, \nu \le m_l).$$

Conversely, if an operator A is expressed by (14.74), that operator is a normal operator. In fact, we have

$$A^{\dagger}A = \left(\sum_i \alpha_i^{*}\tilde P_i\right)\left(\sum_j \alpha_j\tilde P_j\right) = \sum_{i,j}\alpha_i^{*}\alpha_j\tilde P_i\tilde P_j = \sum_{i,j}\alpha_i^{*}\alpha_j\delta_{ij}\tilde P_i = \sum_i |\alpha_i|^2\tilde P_i,$$

$$AA^{\dagger} = \left(\sum_j \alpha_j\tilde P_j\right)\left(\sum_i \alpha_i^{*}\tilde P_i\right) = \sum_{i,j}\alpha_i^{*}\alpha_j\tilde P_j\tilde P_i = \sum_{i,j}\alpha_i^{*}\alpha_j\delta_{ji}\tilde P_j = \sum_i |\alpha_i|^2\tilde P_i. \quad (14.76)$$

Hence, A†A = AA†. If the projection operators are further decomposed as in the case of (14.75), we have an expression related to (14.76). Thus, a necessary and sufficient condition for an operator to be normal is that the said operator is expressed as (14.74). The relation (14.74) is well known as the spectral decomposition theorem. Thus, the spectral decomposition theorem is equivalent to Theorem 14.5.

In Sect. 12.7, we have already encountered the matrix decomposition (12.211) similar to that of (14.74). In this context, (12.211) may well be referred to as the spectral decomposition in a broader sense. Notice, however, that in (12.211) $\tilde A_k$ (1 ≤ k ≤ n) is not necessarily Hermitian. In Part V, we will encounter interesting examples of matrix decomposition such as the spectral decomposition.

The relations (12.208) and (14.74) are virtually the same, aside from the fact that whereas (14.74) premises an inner product space, (12.208) does not. Correspondingly, whereas the related operators are called projection operators in the case of (14.74), those operators are said to be idempotent operators for (12.208).

Example 14.1 Let us think of the Gram matrix of Example 13.1, as shown below:

$$G = \begin{pmatrix} 2 & 1+i \\ 1-i & 2 \end{pmatrix}. \quad (13.51)$$

After a unitary similarity transformation, we got

$$U^{\dagger}GU = \begin{pmatrix} 2+\sqrt{2} & 0 \\ 0 & 2-\sqrt{2} \end{pmatrix}. \quad (13.54)$$

Putting $\tilde G = U^{\dagger}GU$ and rewriting (13.54), we have

$$\tilde G = \begin{pmatrix} 2+\sqrt{2} & 0 \\ 0 & 2-\sqrt{2} \end{pmatrix} = \left(2+\sqrt{2}\right)\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + \left(2-\sqrt{2}\right)\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$

By a back calculation of $U\tilde G U^{\dagger} = G$, we get

$$G = \left(2+\sqrt{2}\right)\begin{pmatrix} \dfrac{1}{2} & \dfrac{\sqrt{2}(1+i)}{4} \\ \dfrac{\sqrt{2}(1-i)}{4} & \dfrac{1}{2} \end{pmatrix} + \left(2-\sqrt{2}\right)\begin{pmatrix} \dfrac{1}{2} & -\dfrac{\sqrt{2}(1+i)}{4} \\ -\dfrac{\sqrt{2}(1-i)}{4} & \dfrac{1}{2} \end{pmatrix}. \quad (14.77)$$

Putting the eigenvalues $\alpha_1 = 2+\sqrt{2}$ and $\alpha_2 = 2-\sqrt{2}$ along with

$$A_1 = \begin{pmatrix} \dfrac{1}{2} & \dfrac{\sqrt{2}(1+i)}{4} \\ \dfrac{\sqrt{2}(1-i)}{4} & \dfrac{1}{2} \end{pmatrix}, \quad A_2 = \begin{pmatrix} \dfrac{1}{2} & -\dfrac{\sqrt{2}(1+i)}{4} \\ -\dfrac{\sqrt{2}(1-i)}{4} & \dfrac{1}{2} \end{pmatrix}, \quad (14.78)$$

we get

$$G = \alpha_1 A_1 + \alpha_2 A_2. \quad (14.79)$$

In the above, A1 and A2 are projection operators. In fact, as anticipated we have

$$A_1^{\,2} = A_1, \quad A_2^{\,2} = A_2, \quad A_1A_2 = A_2A_1 = 0, \quad A_1 + A_2 = E. \quad (14.80)$$

Moreover, (14.78) obviously shows that both A1 and A2 are Hermitian. Thus, (14.77) and (14.79) are an example of spectral decomposition. The decomposition is unique. □

Example 14.1 can be dealt with in parallel to Example 12.5. In Example 12.5, however, an inner product space is not implied, and so we used an idempotent matrix instead of a projection operator. Note that, as can be seen in Example 12.5, that idempotent matrix was not Hermitian.
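Example 14.1 can also be verified numerically. The following NumPy sketch rebuilds G from its eigenvalues and projection operators and checks (14.79) and (14.80):

```python
import numpy as np

# Gram matrix G of (13.51); its eigenvalues are 2 +/- sqrt(2), cf. (13.54).
G = np.array([[2.0, 1.0 + 1.0j],
              [1.0 - 1.0j, 2.0]])

w, V = np.linalg.eigh(G)          # eigh: Hermitian input, real ascending eigenvalues
assert np.allclose(w, [2 - np.sqrt(2), 2 + np.sqrt(2)])

# Projection operators P_l = v_l v_l^dagger built from the eigenvectors.
P = [np.outer(V[:, k], V[:, k].conj()) for k in range(2)]

# Spectral decomposition G = alpha_1 P_1 + alpha_2 P_2, cf. (14.77)-(14.79).
assert np.allclose(w[0] * P[0] + w[1] * P[1], G)

# The projectors are Hermitian, idempotent, mutually orthogonal, and complete, (14.80).
for Pk in P:
    assert np.allclose(Pk, Pk.conj().T)
    assert np.allclose(Pk @ Pk, Pk)
assert np.allclose(P[0] @ P[1], 0)
assert np.allclose(P[0] + P[1], np.eye(2))
```

Note that the projectors are basis-independent even though the eigenvector phases returned by the library are arbitrary.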

14.4 Hermitian Matrices and Unitary Matrices

Among normal matrices, Hermitian matrices and unitary matrices play a crucial role in both fundamental and applied science. Let us think of several topics and examples.

In quantum physics, one frequently treats the expectation value of an operator. In general, such an operator is Hermitian, more strictly an observable. Moreover, a vector on an inner product space is interpreted as a state on a Hilbert space. Suppose that there is a linear operator (or observable that represents a physical quantity) A that has discrete (or countable) eigenvalues α1, α2, ⋯. The number of the eigenvalues may be finite or infinite, but here we assume a finite number; i.e., let us suppose that we have eigenvalues α1, α2, ⋯, and αs, consistent with our previous discussion.

In quantum physics, we have the well-known Born probability rule. The rule says the following: Suppose that we carry out a physical measurement of A with respect to a physical state |u⟩. Here we assume that |u⟩ has been normalized. Then, the probability ℘_l that A takes α_l (1 ≤ l ≤ s) is given by

$$\wp_l = \left\|\tilde P_l u\right\|^2, \quad (14.81)$$

where $\tilde P_l$ is a projection operator that projects |u⟩ onto the eigenspace $W_{\alpha_l}$ spanned by |α_l, k⟩. Here k (1 ≤ k ≤ n_l) reflects the multiplicity n_l of the eigenvalue α_l. Hence, we express the n_l-dimensional eigenspace $W_{\alpha_l}$ as

$$W_{\alpha_l} = \mathrm{Span}\{|\alpha_l, 1\rangle, |\alpha_l, 2\rangle, \cdots, |\alpha_l, n_l\rangle\}. \quad (14.82)$$

Now, we define the expectation value ⟨A⟩ of A such that

$$\langle A\rangle \equiv \sum_{l=1}^{s} \alpha_l\wp_l. \quad (14.83)$$

From (14.81) we have

$$\wp_l = \left\|\tilde P_l u\right\|^2 = \langle \tilde P_l u|\tilde P_l u\rangle = \langle u|\tilde P_l^{\dagger}\tilde P_l u\rangle = \langle u|\tilde P_l\tilde P_l u\rangle = \langle u|\tilde P_l u\rangle. \quad (14.84)$$

For the third equality we used the fact that $\tilde P_l$ is Hermitian; for the last equality we used $\tilde P_l^{\,2} = \tilde P_l$. Summing (14.84) over the index l, we have

$$\sum_l \wp_l = \sum_l \langle u|\tilde P_l u\rangle = \left\langle u\middle|\left(\sum_l \tilde P_l\right)u\right\rangle = \langle u|Eu\rangle = \langle u|u\rangle = 1,$$

where with the third equality we used (14.33). Meanwhile, from the spectral decomposition theorem, we have

$$A = \alpha_1\tilde P_1 + \alpha_2\tilde P_2 + \cdots + \alpha_s\tilde P_s. \quad (14.74)$$

Operating ⟨u| and |u⟩ on both sides of (14.74), we get

$$\langle u|A|u\rangle = \alpha_1\langle u|\tilde P_1|u\rangle + \alpha_2\langle u|\tilde P_2|u\rangle + \cdots + \alpha_s\langle u|\tilde P_s|u\rangle = \alpha_1\wp_1 + \alpha_2\wp_2 + \cdots + \alpha_s\wp_s. \quad (14.85)$$

Equating (14.83) and (14.85), we have

$$\langle A\rangle = \langle u|A|u\rangle. \quad (14.86)$$
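The Born rule (14.81) and the equality (14.86) can be illustrated with a short NumPy sketch; the observable and state below are our own random examples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Our own example: a random 3x3 Hermitian observable A and a normalized state |u>.
X = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
A = (X + X.conj().T) / 2
u = rng.normal(size=3) + 1j * rng.normal(size=3)
u /= np.linalg.norm(u)

# Born probabilities (14.81): p_l = ||P_l u||^2 with P_l = v_l v_l^dagger,
# which for one-dimensional eigenspaces equals |<v_l|u>|^2.
w, V = np.linalg.eigh(A)
p = np.abs(V.conj().T @ u) ** 2

assert np.isclose(p.sum(), 1.0)                 # probabilities sum to unity
expect_from_rule = np.sum(w * p)                # (14.83)
expect_direct = np.real(u.conj() @ A @ u)       # (14.86)
assert np.isclose(expect_from_rule, expect_direct)
```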


In quantum physics, a real number is required for the expectation value of an observable (i.e., a physical quantity). To warrant this, we have the following theorems.

Theorem 14.9 A linear transformation A on an inner product space is Hermitian if and only if ⟨u|A|u⟩ is real for all |u⟩ of that inner product space.

Proof If A = A†, then ⟨u|A|u⟩* = ⟨u|A†|u⟩ = ⟨u|A|u⟩. Therefore, ⟨u|A|u⟩ is real. Conversely, if ⟨u|A|u⟩ is real for all |u⟩, we have ⟨u|A|u⟩ = ⟨u|A|u⟩* = ⟨u|A†|u⟩. Hence,

$$\langle u|(A - A^{\dagger})|u\rangle = 0. \quad (14.87)$$

From Theorem 14.3, we get A - A† = 0, i.e., A = A†. This completes the proof.

Theorem 14.10 [4] The eigenvalues of an Hermitian operator A are real.

Proof Let α be an eigenvalue of A and let |u⟩ be a corresponding eigenvector. Then, A|u⟩ = α|u⟩. Operating ⟨u| from the left, we have ⟨u|A|u⟩ = α⟨u|u⟩ = α‖u‖². Thus,

$$\alpha = \langle u|A|u\rangle/\|u\|^2. \quad (14.88)$$

Since A is Hermitian, ⟨u|A|u⟩ is real. Then, the eigenvalue α is real as well. This completes the proof.

Unitary matrices have the following conspicuous features:

(1) An inner product is held invariant under a unitary transformation: Suppose that |x′⟩ = U|x⟩ and |y′⟩ = U|y⟩, where U is a unitary operator. Then ⟨y′|x′⟩ = ⟨yU†|Ux⟩ = ⟨y|x⟩. A norm of any vector is held invariant under a unitary transformation as well. This is easily checked by replacing |y⟩ with |x⟩ in the above.

(2) Let U be a unitary matrix and suppose that λ is an eigenvalue with |λ⟩ being its corresponding eigenvector. Then we have

$$|\lambda\rangle = U^{\dagger}U|\lambda\rangle = U^{\dagger}(\lambda|\lambda\rangle) = \lambda U^{\dagger}|\lambda\rangle = \lambda\lambda^{*}|\lambda\rangle, \quad (14.89)$$

where with the last equality we used Theorem 14.7. Thus,

$$(1 - \lambda\lambda^{*})|\lambda\rangle = 0. \quad (14.90)$$

As |λ⟩ ≠ 0 is assumed, 1 - λλ* = 0. That is,

$$\lambda\lambda^{*} = |\lambda|^2 = 1. \quad (14.91)$$


Thus, the eigenvalues of a unitary matrix have unit absolute value. Let those eigenvalues be λ_k = e^{iθ_k} (k = 1, 2, ⋯; θ_k real). Then, from (12.11) we have

$$\det U = \prod_k \lambda_k = e^{i(\theta_1+\theta_2+\cdots)} \quad\text{and}\quad |\det U| = \prod_k |\lambda_k| = \prod_k \left|e^{i\theta_k}\right| = 1.$$

That is, any unitary matrix has a determinant of unit absolute value.

Example 14.2 Let us think of the following unitary matrix R:

$$R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \quad (14.92)$$

A characteristic equation is

$$\begin{vmatrix} \cos\theta-\lambda & -\sin\theta \\ \sin\theta & \cos\theta-\lambda \end{vmatrix} = \lambda^2 - 2\lambda\cos\theta + 1 = 0. \quad (14.93)$$

Solving (14.93), we have

$$\lambda = \cos\theta \pm \sqrt{\cos^2\theta - 1} = \cos\theta \pm i|\sin\theta|.$$

1. θ = 0: This is a trivial case. The matrix R has automatically been diagonalized to be an identity matrix. Eigenvalues are 1 (double root).
2. θ = π: This is a trivial case. Eigenvalues are -1 (double root).
3. θ ≠ 0, π: Let us think of 0 < θ < π. Then λ = cos θ ± i sin θ. As a diagonalizing unitary matrix U, we get

$$U = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\ -\dfrac{i}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \end{pmatrix}, \quad U^{\dagger} = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \\ \dfrac{1}{\sqrt{2}} & -\dfrac{i}{\sqrt{2}} \end{pmatrix}.$$

Therefore, we have

$$U^{\dagger}RU = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \\ \dfrac{1}{\sqrt{2}} & -\dfrac{i}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\ -\dfrac{i}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{pmatrix}. \quad (14.94)$$


A trace of the resulting matrix is 2 cos θ. In the case of π < θ < 2π, we get a diagonal matrix similarly. The confirmation is left for readers. □
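The statements of Example 14.2 can be checked numerically; in the sketch below the angle value is our own choice:

```python
import numpy as np

theta = 0.7                         # our own choice, with 0 < theta < pi
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

w, _ = np.linalg.eig(R)

# Eigenvalues of the unitary (orthogonal) matrix R lie on the unit circle, (14.91).
assert np.allclose(np.abs(w), 1.0)
# They are exp(+/- i*theta), so their sum equals the trace 2 cos(theta).
assert np.isclose(np.sum(w).real, 2 * np.cos(theta))
# The determinant has unit absolute value.
assert np.isclose(abs(np.linalg.det(R)), 1.0)
# Compare against {e^{i theta}, e^{-i theta}} after sorting both sets.
expected = np.sort_complex(np.array([np.exp(1j * theta), np.exp(-1j * theta)]))
assert np.allclose(np.sort_complex(w), expected)
```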

14.5 Hermitian Quadratic Forms

The Hermitian quadratic forms appeared in, e.g., (13.34) and (13.83) in relation to Gram matrices in Sect. 13.2. The Hermitian quadratic forms have wide applications in the fields of mathematical physics and materials science. Let H be an Hermitian operator and |x⟩ a vector on an inner product space V^n. We define the Hermitian quadratic form in an arbitrary orthonormal basis as follows:

$$\langle x|H|x\rangle \equiv (x_1^{*}\ \cdots\ x_n^{*})\,H\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \sum_{i,j} x_i^{*}(H)_{ij}x_j,$$

where |x⟩ is represented as a column vector, as already mentioned in Sect. 13.4. Let us start with the unitary diagonalization of (13.40), where a Gram matrix is a kind of Hermitian matrix. Following similar procedures, as in the case of (13.36) we obtain a diagonal matrix and an inner product such that

$$\langle x|H|x\rangle = (\xi_1^{*}\ \cdots\ \xi_n^{*})\begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{pmatrix}\begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix} = \lambda_1|\xi_1|^2 + \cdots + \lambda_n|\xi_n|^2. \quad (14.95)$$

Notice, however, that a Gram matrix comprising basis vectors (that are linearly independent) is positive definite. Remember that a Gram matrix constructed as A†A (Hermitian as well) is either positive definite or non-negative according to whether A is non-singular or singular. The Hermitian matrix we are dealing with here does not, in general, possess positive definiteness or the non-negative feature. Yet, remember that ⟨x|H|x⟩ and the eigenvalues λ1, ⋯, λn are real.

Positive definiteness of matrices is an important concept in relation to the Hermitian (and real) quadratic forms (see Sect. 13.2). In particular, in the case where all the matrix elements are real and |x⟩ is defined in a real domain, we are dealing with the quadratic form with respect to a real symmetric matrix. In the case of the real quadratic forms, we sometimes adopt the following notation [1, 2]:

$$A[\mathbf{x}] \equiv \mathbf{x}^{T}A\mathbf{x} = \sum_{i,j=1}^{n} a_{ij}x_ix_j; \qquad \mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},$$

where A = (a_ij) is a real symmetric matrix and x_i (1 ≤ i ≤ n) are real numbers.


Here, we wish to show that positive definiteness is invariant under a transformation P^TAP, where P is non-singular. In other words, if A > 0, we must have

$$A' = P^{T}AP > 0. \quad (14.96)$$

In (14.96) the transformation A → A′ is said to be an equivalence transformation (or equivalent transformation) of A by P. Also, if A and A′ satisfy (14.96), A and A′ are said to be equivalent, and we write [1, 2]

$$A' \simeq A.$$

Note that the equivalence transformation satisfies the equivalence law; readers, check it. To prove the above proposition stated by (14.96), we take in advance the notion of the polar decomposition of a matrix (see Sect. 24.9.1). The point is that any real non-singular matrix P is uniquely decomposed into a product such that

$$P = OS,$$

where O is an orthogonal matrix and S is a positive definite real symmetric matrix (see Corollary 24.3). Then, we have

$$A' = (OS)^{T}A(OS) = S\left(O^{T}AO\right)S = S\left(O^{-1}AO\right)S.$$

Defining $\tilde A \equiv O^{-1}AO$, we have $\tilde A > 0$. It is because the similarity transformation of A by O keeps its eigenvalues unchanged. Next, suppose that S is diagonalized through an orthogonal matrix $\tilde O$; it is always possible on the basis of Theorem 14.5. Then, we have

$$\tilde O^{T}A'\tilde O = \tilde O^{T}S\tilde A S\tilde O = \left(\tilde O^{T}S\tilde O\right)\left(\tilde O^{T}\tilde A\tilde O\right)\left(\tilde O^{T}S\tilde O\right) = D\bar A D, \quad (14.97)$$

where D is a diagonal matrix given by

$$D = \tilde O^{T}S\tilde O = \begin{pmatrix} d_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & d_n \end{pmatrix}$$

with d_i > 0 (i = 1, ⋯, n), and we have put $\bar A = \tilde O^{T}\tilde A\tilde O$. Since $\tilde A > 0$, we have $\bar A > 0$. Meanwhile, let us think of the following inner product $\langle x|D\bar A D|x\rangle$, where x is an arbitrary real vector in V^n. We have

$$\langle x|D\bar A D|x\rangle = (x_1\ \cdots\ x_n)D\bar A D\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = [(x_1\ \cdots\ x_n)D]\,\bar A\left[D\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}\right] = (x_1d_1\ \cdots\ x_nd_n)\,\bar A\begin{pmatrix} x_1d_1 \\ \vdots \\ x_nd_n \end{pmatrix}. \quad (14.98)$$

Suppose here that (x_1⋯x_n) ≠ 0; that is, at least one of x_i (i = 1, ⋯, n) is a non-zero (real) number. Then, at least one of x_id_i (i = 1, ⋯, n) is a non-zero (real) number as well. Since $\bar A > 0$, the rightmost side of (14.98) is positive. This implies that $\langle x|D\bar A D|x\rangle$ is positive for arbitrarily chosen (x_1⋯x_n) ≠ 0. Namely, $D\bar A D > 0$. Thus, from (14.97) and (13.48) we have A′ > 0. It is because a similarity transformation of A′ by $\tilde O$ keeps its eigenvalues unchanged.

Thus, we have established the next important theorem.

Theorem 14.11 [1, 2] Let A be a real symmetric matrix with A > 0. Then, we have P^TAP > 0, where P is a real non-singular matrix.

Since A is a real symmetric matrix, we must have a suitable orthogonal matrix $\hat O$ that diagonalizes A such that

$$\hat O^{T}A\hat O = \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{pmatrix},$$

where λ_i (1 ≤ i ≤ n) are the eigenvalues of A. Since A > 0 from the assumption, we should have all λ_i > 0 (1 ≤ i ≤ n) from (13.48). Then, we have

$$\det\left(\hat O^{T}A\hat O\right) = \left(\det\hat O^{T}\right)(\det A)\left(\det\hat O\right) = (\pm 1)\det A\,(\pm 1) = \det A = \prod_{i=1}^{n}\lambda_i.$$
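Theorem 14.11 can be illustrated numerically with a small sketch; the matrices below are our own randomly generated examples:

```python
import numpy as np

rng = np.random.default_rng(3)

# Our own example: a positive definite symmetric A and a random non-singular P.
M = rng.normal(size=(4, 4))
A = M @ M.T + 4 * np.eye(4)               # symmetric and A > 0
P = rng.normal(size=(4, 4))
assert abs(np.linalg.det(P)) > 1e-9       # P is non-singular

# Theorem 14.11: the equivalence transform P^T A P stays positive definite,
# although it generally changes the eigenvalues (unlike a similarity transform).
A_prime = P.T @ A @ P
assert np.all(np.linalg.eigvalsh(A) > 0)
assert np.all(np.linalg.eigvalsh(A_prime) > 0)
```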

Notice that we distinguish the equivalence transformation from the similarity transformation P^{-1}AP. Nonetheless, if we choose an orthogonal matrix O for P, the two transformations are the same because O^T = O^{-1}.

We often encounter real quadratic forms in the field of electromagnetism. A typical example is a trace of electromagnetic fields observed with an elliptically or circularly polarized light (see Sect. 7.3). A permittivity tensor of anisotropic media such as crystals (either inorganic or organic) is another example. A tangible example for this appeared in Sect. 9.5.2 with an anisotropic organic crystal.

Regarding the real quadratic forms, we have the following important theorem (Theorem 14.12). To prove Theorem 14.12, we need Theorem 14.11.

Theorem 14.12 [1, 2] Let A be an n-dimensional real symmetric matrix A = (a_ij). Let A^{(k)} be k-dimensional principal submatrices described by

$$A^{(k)} = \begin{pmatrix} a_{i_1i_1} & a_{i_1i_2} & \cdots & a_{i_1i_k} \\ a_{i_2i_1} & a_{i_2i_2} & \cdots & a_{i_2i_k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{i_ki_1} & a_{i_ki_2} & \cdots & a_{i_ki_k} \end{pmatrix} \quad (1 \le i_1 < i_2 < \cdots < i_k \le n),$$

where the principal submatrices mean matrices obtained by striking out the same sets of rows and columns of A (so that the remaining diagonal elements are diagonal elements of A). Then, we have

$$A > 0 \iff \det A^{(k)} > 0 \quad (1 \le k \le n). \quad (14.99)$$
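Criterion (14.99) can be checked against a direct eigenvalue computation; in the following sketch the test matrices are our own examples, and we evaluate only the leading principal minors (those with i_1, ⋯, i_k = 1, ⋯, k), which already suffice for the criterion:

```python
import numpy as np

def leading_principal_minors(A):
    """Our own helper: det A^(k) for k = 1..n with (i_1, ..., i_k) = (1, ..., k)."""
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

A_pos = np.array([[5.0, 2.0], [2.0, 1.0]])   # H of Example 14.3, positive definite
A_ind = np.array([[1.0, 2.0], [2.0, 1.0]])   # indefinite: eigenvalues 3 and -1

# Criterion (14.99): A > 0 iff every (leading) principal minor is positive.
assert all(m > 0 for m in leading_principal_minors(A_pos))
assert not all(m > 0 for m in leading_principal_minors(A_ind))

# Cross-check directly against the eigenvalues.
assert np.all(np.linalg.eigvalsh(A_pos) > 0)
assert np.any(np.linalg.eigvalsh(A_ind) < 0)
```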

Proof First, suppose that A > 0. Then, in the quadratic form A[x], equating (n - k) variables x_l = 0 (l ≠ i_1, i_2, ⋯, i_k), we obtain a quadratic form

$$\sum_{\mu,\nu=1}^{k} a_{i_\mu i_\nu}x_{i_\mu}x_{i_\nu}.$$

Since A > 0, this (partial) quadratic form is positive definite as well; i.e., we have

$$A^{(k)} > 0.$$

Therefore, we get

$$\det A^{(k)} > 0.$$

This is due to (13.48). Notice that det A^{(k)} is said to be a principal minor. Thus, we have proven ⟹ of (14.99).

To prove ⟸, in turn, we use mathematical induction. If n = 1, we have a trivial case; i.e., A is merely a real positive number. Suppose that ⟸ is true of n - 1. Then, we have A^{(n-1)} > 0 by supposition. Thus, it follows that it will suffice to show A > 0 on condition that A^{(n-1)} > 0 and det A > 0 in addition. Let A be an n-dimensional real symmetric non-singular matrix such that

$$A = \begin{pmatrix} A^{(n-1)} & \mathbf{a} \\ \mathbf{a}^{T} & a_n \end{pmatrix},$$

where A^{(n-1)} is a symmetric matrix and non-singular as well. We define P such that

$$P = \begin{pmatrix} E & {A^{(n-1)}}^{-1}\mathbf{a} \\ 0 & 1 \end{pmatrix},$$

where E is an (n - 1, n - 1) unit matrix. Notice that det P = det E · 1 = 1 ≠ 0, indicating that P is non-singular. We have

$$P^{T} = \begin{pmatrix} E & 0 \\ \mathbf{a}^{T}{A^{(n-1)}}^{-1} & 1 \end{pmatrix}.$$

For this expression, consider a non-singular matrix S. Then, we have SS^{-1} = E. Taking its transposition, we have (S^{-1})^TS^T = E. Therefore, if S is a symmetric matrix, (S^{-1})^TS = E; i.e., (S^{-1})^T = S^{-1}. Hence, an inverse matrix of a symmetric matrix is symmetric as well. Then, for the symmetric matrix A^{(n-1)} we have

$$\left(\mathbf{a}^{T}{A^{(n-1)}}^{-1}\right)^{T} = \left({A^{(n-1)}}^{-1}\right)^{T}\left(\mathbf{a}^{T}\right)^{T} = {A^{(n-1)}}^{-1}\mathbf{a}.$$

Therefore, A can be expressed as

$$A = P^{T}\begin{pmatrix} A^{(n-1)} & 0 \\ 0 & a_n - {A^{(n-1)}}^{-1}[\mathbf{a}] \end{pmatrix}P = \begin{pmatrix} E & 0 \\ \mathbf{a}^{T}{A^{(n-1)}}^{-1} & 1 \end{pmatrix}\begin{pmatrix} A^{(n-1)} & 0 \\ 0 & a_n - {A^{(n-1)}}^{-1}[\mathbf{a}] \end{pmatrix}\begin{pmatrix} E & {A^{(n-1)}}^{-1}\mathbf{a} \\ 0 & 1 \end{pmatrix}. \quad (14.100)$$

Now, taking a determinant of (14.100), we have

$$\det A = \det P^{T}\,\det A^{(n-1)}\left(a_n - {A^{(n-1)}}^{-1}[\mathbf{a}]\right)\det P = \det A^{(n-1)}\left(a_n - {A^{(n-1)}}^{-1}[\mathbf{a}]\right).$$

By supposition, we have det A^{(n-1)} > 0 and det A > 0. Hence, we have

$$a_n - {A^{(n-1)}}^{-1}[\mathbf{a}] > 0.$$

Putting $\tilde a_n \equiv a_n - {A^{(n-1)}}^{-1}[\mathbf{a}]$ and $\mathbf{x} \equiv \begin{pmatrix} \mathbf{x}^{(n-1)} \\ x_n \end{pmatrix}$, we get

$$\begin{pmatrix} A^{(n-1)} & 0 \\ 0 & \tilde a_n \end{pmatrix}[\mathbf{x}] = A^{(n-1)}\left[\mathbf{x}^{(n-1)}\right] + \tilde a_n x_n^{\,2}.$$

Since A^{(n-1)} > 0 and $\tilde a_n > 0$, we have

$$\tilde A \equiv \begin{pmatrix} A^{(n-1)} & 0 \\ 0 & \tilde a_n \end{pmatrix} > 0.$$

Meanwhile, A is expressed as

$$A = P^{T}\tilde A P \quad\text{or}\quad A \simeq \tilde A.$$

From (14.96), we have A > 0. These complete the proof.

We also have a related theorem (the following Theorem 14.13) for an Hermitian quadratic form.

Theorem 14.13 Let A = (a_ij) be an n-dimensional Hermitian matrix. Let $\tilde A^{(k)}$ be k-dimensional principal submatrices. Then, we have

$$A > 0 \iff \det\tilde A^{(k)} > 0 \quad (1 \le k \le n),$$

where $\tilde A^{(k)}$ is described as

$$\tilde A^{(k)} = \begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & \ddots & \vdots \\ a_{k1} & \cdots & a_{kk} \end{pmatrix}.$$

The proof is left for readers.

Example 14.3 Let us consider the following Hermitian (real symmetric) matrix and corresponding Hermitian quadratic form:

$$H = \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix}, \quad \langle x|H|x\rangle = (x_1\ x_2)\begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \quad (14.101)$$

Principal minors of H are 5 (>0) and 1 (>0), and det H = 5 - 4 = 1 (>0). Therefore, from Theorem 14.12 we have H > 0. A characteristic equation gives the following eigenvalues, i.e.,

$$\lambda_1 = 3 + 2\sqrt{2}, \quad \lambda_2 = 3 - 2\sqrt{2}.$$

Both the eigenvalues are positive, as anticipated. As a diagonalizing matrix R, we get

$$R = \begin{pmatrix} 1+\sqrt{2} & 1-\sqrt{2} \\ 1 & 1 \end{pmatrix}.$$

To obtain a unitary matrix, we have to seek norms of the column vectors. Corresponding to λ1 and λ2, we estimate their norms to be $\sqrt{4+2\sqrt{2}}$ and $\sqrt{4-2\sqrt{2}}$, respectively. Using them, as a unitary matrix U we get

$$U = \begin{pmatrix} \dfrac{1}{\sqrt{4-2\sqrt{2}}} & -\dfrac{1}{\sqrt{4+2\sqrt{2}}} \\ \dfrac{1}{\sqrt{4+2\sqrt{2}}} & \dfrac{1}{\sqrt{4-2\sqrt{2}}} \end{pmatrix}. \quad (14.102)$$

Thus, performing the matrix diagonalization, we obtain a diagonal matrix D such that

$$D = U^{\dagger}HU = \begin{pmatrix} 3+2\sqrt{2} & 0 \\ 0 & 3-2\sqrt{2} \end{pmatrix}. \quad (14.103)$$

Let us view the above unitary diagonalization in terms of coordinate transformation. Using the above matrix U and changing (14.101) as in (13.36),

$$\langle x|H|x\rangle = (x_1\ x_2)UU^{\dagger}\begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix}UU^{\dagger}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = (x_1\ x_2)U\begin{pmatrix} 3+2\sqrt{2} & 0 \\ 0 & 3-2\sqrt{2} \end{pmatrix}U^{\dagger}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \quad (14.104)$$

Making an argument analogous to that of Sect. 13.2 and using similar notation, we have

$$\langle\tilde x|H'|\tilde x\rangle = (\tilde x_1\ \tilde x_2)H'\begin{pmatrix} \tilde x_1 \\ \tilde x_2 \end{pmatrix} = (x_1\ x_2)UU^{\dagger}HUU^{\dagger}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \langle x|H|x\rangle. \quad (14.105)$$

That is, we have

$$\begin{pmatrix} \tilde x_1 \\ \tilde x_2 \end{pmatrix} = U^{\dagger}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \quad (14.106)$$

Or, taking an adjoint of (14.106), we have equivalently $(\tilde x_1\ \tilde x_2) = (x_1\ x_2)U$. Notice here that $\begin{pmatrix}\tilde x_1 \\ \tilde x_2\end{pmatrix}$ and $\begin{pmatrix}x_1 \\ x_2\end{pmatrix}$ are real and that U is real (i.e., an orthogonal matrix). Likewise,

$$H' = U^{\dagger}HU. \quad (14.107)$$

Thus, it follows that $\begin{pmatrix}x_1 \\ x_2\end{pmatrix}$ and $\begin{pmatrix}\tilde x_1 \\ \tilde x_2\end{pmatrix}$ are different column vector representations of the identical vector that is viewed in reference to two different sets of orthonormal bases (i.e., different coordinate systems). From (14.102), as an approximation we have


Fig. 14.1 Ellipse obtained through diagonalization of an Hermitian quadratic form. The angle θ is about 22.5°

$$U \approx \begin{pmatrix} 0.92 & -0.38 \\ 0.38 & 0.92 \end{pmatrix} \approx \begin{pmatrix} \cos 22.5^{\circ} & -\sin 22.5^{\circ} \\ \sin 22.5^{\circ} & \cos 22.5^{\circ} \end{pmatrix}. \quad (14.108)$$

Equating ⟨x|H|x⟩ to a constant, we get a hypersurface in a plane. Choosing 1 for the constant, we get an equation of the hypersurface (i.e., an ellipse) as a function of $\tilde x_1$ and $\tilde x_2$ such that

$$\frac{\tilde x_1^{\,2}}{\left(1/\sqrt{3+2\sqrt{2}}\right)^{2}} + \frac{\tilde x_2^{\,2}}{\left(1/\sqrt{3-2\sqrt{2}}\right)^{2}} = 1. \quad (14.109)$$

Figure 14.1 depicts the ellipse. □
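Example 14.3 and the principal-axis picture of (14.103)-(14.109) can be checked numerically with the following sketch (the sample point x is our own choice):

```python
import numpy as np

# H of (14.101); eigenvalues 3 +/- 2*sqrt(2), cf. (14.103).
H = np.array([[5.0, 2.0], [2.0, 1.0]])
w, U = np.linalg.eigh(H)          # ascending: [3 - 2*sqrt(2), 3 + 2*sqrt(2)]

assert np.allclose(w, [3 - 2*np.sqrt(2), 3 + 2*np.sqrt(2)])

# U^T H U is diagonal, cf. (14.103); U is real orthogonal here.
assert np.allclose(U.T @ H @ U, np.diag(w))

# The quadratic form is invariant under the coordinate change x_tilde = U^T x,
# cf. (14.105)-(14.106): <x|H|x> = sum_i w_i * x_tilde_i^2.
x = np.array([0.3, -1.2])
xt = U.T @ x
assert np.isclose(x @ H @ x, np.sum(w * xt**2))

# On the ellipse <x|H|x> = 1, the principal semi-axes are 1/sqrt(w_i), cf. (14.109).
semi_axes = 1 / np.sqrt(w)
```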

14.6 Simultaneous Eigenstates and Diagonalization

In quantum physics, the concept of simultaneous eigenstates is important and has been briefly mentioned in Sect. 3.3. To rephrase this concept, suppose that there are two operators B and C, and ask whether the two operators (or more) possess a common set of eigenvectors. The question boils down to whether the two operators commute. To address this question, the following theorem is important:


Theorem 14.14 [4] Two Hermitian matrices B and C commute if and only if there exists a complete orthonormal set of common eigenvectors.

Proof Suppose that there exists a complete orthonormal set of common eigenvectors {|x_i; k⟩ (1 ≤ k ≤ m_i)} that span the linear vector space, where i and m_i are positive integers and |x_i⟩ corresponds to an eigenvalue b_i of B and c_i of C. Note that if m_i is 1, we say that the spectrum is non-degenerate, and that if m_i is equal to two or more, the spectrum is said to be degenerate. Then we have

$$B(|x_i\rangle) = b_i|x_i\rangle, \quad C(|x_i\rangle) = c_i|x_i\rangle. \quad (14.110)$$

Therefore, BC(|x_i⟩) = B(c_i|x_i⟩) = c_iB(|x_i⟩) = c_ib_i|x_i⟩. Similarly, CB(|x_i⟩) = c_ib_i|x_i⟩. Consequently, (BC - CB)(|x_i⟩) = 0 for any |x_i⟩. As all of the |x_i⟩ span the vector space, BC - CB = 0, namely BC = CB. In turn, assume that BC = CB and that B(|x_i⟩) = b_i|x_i⟩, where the |x_i⟩ are orthonormal. Then, we have

$$CB(|x_i\rangle) = b_iC(|x_i\rangle) \implies B[C(|x_i\rangle)] = b_iC(|x_i\rangle). \quad (14.111)$$

This implies that C(|x_i⟩) is an eigenvector of B corresponding to the eigenvalue b_i. We have two cases.

1. The spectrum is non-degenerate: The spectrum is said to be non-degenerate if only one eigenvector belongs to an eigenvalue. In other words, the multiplicity of b_i is one. Then, C(|x_i⟩) must be equal to some constant times |x_i⟩; i.e., C(|x_i⟩) = c_i|x_i⟩. That is, |x_i⟩ is an eigenvector of C corresponding to an eigenvalue c_i; i.e., |x_i⟩ is an eigenvector common to B and C. This completes the proof for the non-degenerate case.

2. The spectrum is degenerate: The spectrum is said to be degenerate if two or more eigenvectors belong to an eigenvalue. The multiplicity of b_i is two or more; here suppose that the multiplicity is m (m ≥ 2). Let $|x_i^{(1)}\rangle, |x_i^{(2)}\rangle, \cdots,$ and $|x_i^{(m)}\rangle$ be linearly independent vectors belonging to the eigenvalue b_i of B. Then, from the assumption we have m eigenvectors

$$C\left(|x_i^{(1)}\rangle\right),\ C\left(|x_i^{(2)}\rangle\right),\ \cdots,\ C\left(|x_i^{(m)}\rangle\right)$$

that belong to the eigenvalue b_i of B. This means that the individual $C\left(|x_i^{(\mu)}\rangle\right)$ (1 ≤ μ ≤ m) are described by linear combinations of $|x_i^{(1)}\rangle, |x_i^{(2)}\rangle, \cdots,$ and $|x_i^{(m)}\rangle$. What we want to prove is that suitable linear combinations of these m vectors constitute eigenvectors corresponding to eigenvalues c_μ of C. Here, to avoid complexity, we denote the multiplicity by m instead of the above-mentioned m_i. The vectors $C\left(|x_i^{(\mu)}\rangle\right)$ (1 ≤ μ ≤ m) can be described as

$$C\left(|x_i^{(1)}\rangle\right) = \sum_{j=1}^{m}\alpha_{j1}|x_i^{(j)}\rangle, \quad \cdots, \quad C\left(|x_i^{(m)}\rangle\right) = \sum_{j=1}^{m}\alpha_{jm}|x_i^{(j)}\rangle. \quad (14.112)$$

Using full matrix representation, we have

$$C\left(\sum_{k=1}^{m}\gamma_k|x_i^{(k)}\rangle\right) = \left(|x_i^{(1)}\rangle\cdots|x_i^{(m)}\rangle\right)C\begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_m \end{pmatrix} = \left(|x_i^{(1)}\rangle\cdots|x_i^{(m)}\rangle\right)\begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1m} \\ \vdots & \ddots & \vdots \\ \alpha_{m1} & \cdots & \alpha_{mm} \end{pmatrix}\begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_m \end{pmatrix}. \quad (14.113)$$

In (14.113), we adopt the notation of (11.37). Since (α_ij) is a matrix representation of the Hermitian operator C, (α_ij) is Hermitian as well. More specifically, if we take an inner product of a vector expressed in (14.112) with $|x_i^{(l)}\rangle$, then we have

$$\langle x_i^{(l)}|Cx_i^{(k)}\rangle = \left\langle x_i^{(l)}\middle|\sum_{j=1}^{m}\alpha_{jk}x_i^{(j)}\right\rangle = \sum_{j=1}^{m}\alpha_{jk}\langle x_i^{(l)}|x_i^{(j)}\rangle = \sum_{j=1}^{m}\alpha_{jk}\delta_{lj} = \alpha_{lk} \quad (1 \le k, l \le m), \quad (14.114)$$

where the third equality comes from the orthonormality of the basis vectors. Meanwhile,

$$\langle x_i^{(l)}|Cx_i^{(k)}\rangle = \langle x_i^{(k)}|C^{\dagger}x_i^{(l)}\rangle^{*} = \langle x_i^{(k)}|Cx_i^{(l)}\rangle^{*} = \alpha_{kl}^{*}, \quad (14.115)$$

where the second equality comes from the Hermiticity of C. From (14.114) and (14.115), we get

$$\alpha_{lk} = \alpha_{kl}^{*} \quad (1 \le k, l \le m). \quad (14.116)$$

This indicates that (α_ij) is in fact Hermitian.

We are seeking the condition under which linear combinations of the eigenvectors $|x_i^{(k)}\rangle$ (1 ≤ k ≤ m) for B are simultaneously eigenvectors of C. If the linear combination $\sum_{k=1}^{m}\gamma_k|x_i^{(k)}\rangle$ is to be an eigenvector of C, we must have

$$C\left(\sum_{k=1}^{m}\gamma_k|x_i^{(k)}\rangle\right) = c\sum_{j=1}^{m}\gamma_j|x_i^{(j)}\rangle. \quad (14.117)$$

Considering (14.113), we have

$$\left(|x_i^{(1)}\rangle\cdots|x_i^{(m)}\rangle\right)\begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1m} \\ \vdots & \ddots & \vdots \\ \alpha_{m1} & \cdots & \alpha_{mm} \end{pmatrix}\begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_m \end{pmatrix} = \left(|x_i^{(1)}\rangle\cdots|x_i^{(m)}\rangle\right)c\begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_m \end{pmatrix}. \quad (14.118)$$
The vectors $|x_i^{(j)}\rangle$ (1 ≤ j ≤ m) span an invariant subspace (i.e., an eigenspace corresponding to the eigenvalue b_i). Let us call this subspace W^m. Consequently, in (14.118) we can equate the scalar coefficients of the individual $|x_i^{(j)}\rangle$ (1 ≤ j ≤ m). Then, we get

$$\begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1m} \\ \vdots & \ddots & \vdots \\ \alpha_{m1} & \cdots & \alpha_{mm} \end{pmatrix}\begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_m \end{pmatrix} = c\begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_m \end{pmatrix}. \quad (14.119)$$

This is nothing other than an eigenvalue equation. Since (α_ij) is an Hermitian matrix, there should be m eigenvalues c_μ, some of which may be identical (the degenerate case). Moreover, we can always decide m orthonormal column vectors by solving (14.119). We denote them by γ^{(μ)} (1 ≤ μ ≤ m), belonging to c_μ. Rewriting (14.119), we get

$$\begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1m} \\ \vdots & \ddots & \vdots \\ \alpha_{m1} & \cdots & \alpha_{mm} \end{pmatrix}\begin{pmatrix} \gamma_1^{(\mu)} \\ \vdots \\ \gamma_m^{(\mu)} \end{pmatrix} = c_\mu\begin{pmatrix} \gamma_1^{(\mu)} \\ \vdots \\ \gamma_m^{(\mu)} \end{pmatrix}. \quad (14.120)$$

Equation (14.120) implies that we can construct an (m, m) unitary matrix from the m orthonormal column vectors γ^{(μ)}. Using the said unitary matrix, we will be able to diagonalize (α_ij) according to Theorem 14.5. Having determined the m eigenvectors γ^{(μ)} (1 ≤ μ ≤ m), we can construct a set of eigenvectors such that

$$|y_i^{(\mu)}\rangle = \sum_{k=1}^{m}\gamma_k^{(\mu)}|x_i^{(k)}\rangle. \quad (14.121)$$

Finally, let us confirm that the $|y_i^{(\mu)}\rangle$ (1 ≤ μ ≤ m) in fact constitute an orthonormal basis. To show this, we have

$$\langle y_i^{(\nu)}|y_i^{(\mu)}\rangle = \left\langle \sum_{k=1}^{m}\gamma_k^{(\nu)}x_i^{(k)}\middle|\sum_{l=1}^{m}\gamma_l^{(\mu)}x_i^{(l)}\right\rangle = \sum_{k=1}^{m}\sum_{l=1}^{m}\gamma_k^{(\nu)*}\gamma_l^{(\mu)}\langle x_i^{(k)}|x_i^{(l)}\rangle = \sum_{k=1}^{m}\sum_{l=1}^{m}\gamma_k^{(\nu)*}\gamma_l^{(\mu)}\delta_{kl} = \sum_{k=1}^{m}\gamma_k^{(\nu)*}\gamma_k^{(\mu)} = \delta_{\nu\mu}. \quad (14.122)$$
The last equality comes from the fact that a matrix comprising the m orthonormal column vectors γ^{(μ)} (1 ≤ μ ≤ m) forms a unitary matrix. Thus, the $|y_i^{(\mu)}\rangle$ (1 ≤ μ ≤ m) certainly constitute an orthonormal basis. The above completes the proof.


Theorem 14.14 can be restated as follows: Two Hermitian matrices B and C can be simultaneously diagonalized by a unitary similarity transformation. As mentioned above, we can construct a unitary matrix U such that

$$U = \begin{pmatrix} \gamma_1^{(1)} & \cdots & \gamma_1^{(m)} \\ \vdots & \ddots & \vdots \\ \gamma_m^{(1)} & \cdots & \gamma_m^{(m)} \end{pmatrix}.$$

Then, using U, the matrices B and C are diagonalized at once such that

$$U^{\dagger}BU = \begin{pmatrix} b_i & & & \\ & b_i & & \\ & & \ddots & \\ & & & b_i \end{pmatrix}, \quad U^{\dagger}CU = \begin{pmatrix} c_1 & & & \\ & c_2 & & \\ & & \ddots & \\ & & & c_m \end{pmatrix}. \quad (14.123)$$

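The degenerate-case construction, diagonalizing B first and then diagonalizing the block of C within each degenerate eigenspace as in (14.119)-(14.121), can be sketched numerically; the matrices below are our own examples, built to commute by sharing a random eigenbasis:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two commuting Hermitian matrices, drawn as diagonal matrices in a shared
# random unitary eigenbasis V (our own construction).
X = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
V, _ = np.linalg.qr(X)                                   # random unitary
B = V @ np.diag([1.0, 1.0, 2.0, 3.0]) @ V.conj().T       # degenerate spectrum
C = V @ np.diag([0.5, 1.5, 2.5, 3.5]) @ V.conj().T

assert np.allclose(B @ C, C @ B)                         # B and C commute

# Diagonalize B; then, inside each degenerate eigenspace of B, diagonalize the
# corresponding block of C, mirroring (14.119)-(14.121).
wB, U = np.linalg.eigh(B)
Cb = U.conj().T @ C @ U          # block diagonal over B's eigenspaces
for val in np.unique(np.round(wB, 9)):
    idx = np.where(np.isclose(wB, val))[0]
    _, u_small = np.linalg.eigh(Cb[np.ix_(idx, idx)])
    U[:, idx] = U[:, idx] @ u_small

# One unitary U now diagonalizes both matrices, cf. (14.123).
DB = U.conj().T @ B @ U
DC = U.conj().T @ C @ U
assert np.allclose(DB, np.diag(np.diag(DB)), atol=1e-8)
assert np.allclose(DC, np.diag(np.diag(DC)), atol=1e-8)
```

The inner `eigh` call rotates the eigenvectors only within each eigenspace of B, so the diagonalization of B is preserved while C becomes diagonal too.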
Note that both B and C are represented in the invariant subspace W^m.

As we have already seen in Part I, which dealt with an eigenvalue problem of a hydrogen-like atom, the squared angular momentum (L²) and the z-component of angular momentum (L_z) possess a common set of eigenvectors and, hence, their eigenvalues are determined at once. Matrix representations related to (14.123) were given in (3.159). Conversely, this was not the case with the set of operators L_x, L_y, and L_z; see (3.30). Yet, we pointed out the exceptional case where these three operators along with L² take the eigenvalue zero in common, to which the eigenstate $Y_0^0(\theta,\phi) = 1/\sqrt{4\pi}$ corresponds. Nonetheless, no complete orthonormal set of common eigenvectors exists for the set of operators L_x, L_y, and L_z. This fact is equivalent to the fact that these three operators are non-commutative among themselves. In contrast, L² and L_z share a complete orthonormal set of common eigenvectors and, hence, are commutable.

Notice that $C\left(|x_i^{(1)}\rangle\right), C\left(|x_i^{(2)}\rangle\right), \cdots,$ and $C\left(|x_i^{(m)}\rangle\right)$ are not necessarily linearly independent (see Sect. 11.4). Suppose that among the m eigenvalues c_μ (1 ≤ μ ≤ m), some c_μ = 0. Then, det C = 0 according to (13.48) and (14.123). This means that C is singular. In that case, $C\left(|x_i^{(1)}\rangle\right), C\left(|x_i^{(2)}\rangle\right), \cdots,$ and $C\left(|x_i^{(m)}\rangle\right)$ are linearly dependent. In Sect. 3.3, in fact, we had $L_zY_0^0(\theta,\phi) = L^2Y_0^0(\theta,\phi) = 0$. But this special situation does not affect the proof of Theorem 14.14.

We know that any matrix A can be decomposed such that

$$A = \frac{1}{2}\left(A + A^{\dagger}\right) + i\,\frac{1}{2i}\left(A - A^{\dagger}\right), \quad (14.124)$$

where we put $B \equiv \frac{1}{2}\left(A + A^{\dagger}\right)$ and $C \equiv \frac{1}{2i}\left(A - A^{\dagger}\right)$; both B and C are Hermitian. That is, any matrix A can be decomposed into two Hermitian matrices in such a way that

$$A = B + iC. \quad (14.125)$$

Note here that B and C commute if and only if A and A† commute, that is, A is a normal matrix. In fact, from (14.125) we get

$$AA^{\dagger} - A^{\dagger}A = 2i(CB - BC). \quad (14.126)$$
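Relations (14.124)-(14.126) are easy to verify numerically; the matrix below is our own random example:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))   # a generic matrix

# Cartesian decomposition (14.124)-(14.125): A = B + iC with B, C Hermitian.
B = (A + A.conj().T) / 2
C = (A - A.conj().T) / 2j

assert np.allclose(B, B.conj().T)
assert np.allclose(C, C.conj().T)
assert np.allclose(A, B + 1j * C)

# Identity (14.126): A A^dagger - A^dagger A = 2i (CB - BC), so A is normal
# exactly when B and C commute.
lhs = A @ A.conj().T - A.conj().T @ A
rhs = 2j * (C @ B - B @ C)
assert np.allclose(lhs, rhs)
```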

From (14.126), if and only if B and C commute, AA† - A†A = 0, i.e., AA† = A†A. This indicates that A is a normal matrix.

Taking account of the above context along with Theorem 14.14, we see that we can construct the diagonalizing matrix of A from the complete orthonormal set of eigenvectors common to B and C. This immediately implies that the diagonalizing matrix is unitary. In other words, with respect to a normal operator A on V^n a diagonalizing unitary matrix U must exist. On top of it, such a matrix U diagonalizes the Hermitian matrices B and C at once. Then, we have

$$U^{\dagger}AU = U^{\dagger}BU + iU^{\dagger}CU = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix} + i\begin{pmatrix} \mu_1 & & \\ & \ddots & \\ & & \mu_n \end{pmatrix} = \begin{pmatrix} \lambda_1 + i\mu_1 & & \\ & \ddots & \\ & & \lambda_n + i\mu_n \end{pmatrix}, \quad (14.127)$$

where the real eigenvalues λ1, ⋯, λn of B as well as those μ1, ⋯, μn of C are possibly degenerate. The complex eigenvalues λ1 + iμ1, ⋯, λn + iμn are possibly degenerate accordingly. Then, (14.127) can be rewritten as

$$U^{\dagger}AU = \operatorname{diag}(\omega_1, \cdots, \omega_1,\ \omega_2, \cdots, \omega_2,\ \cdots,\ \omega_s, \cdots, \omega_s), \quad (14.128)$$

where ω_l (1 ≤ l ≤ s ≤ n) denote the different complex eigenvalues, possibly with several of them degenerate. Equation (14.128) is equivalent to (14.71). Considering (14.74), (14.128) represents the spectral decomposition. Following the argument made in Sect. 14.3, especially including (14.71)-(14.74), we acknowledge that the above discussion is equivalent to the implication of Theorem 14.5. Moreover, (14.12) implies that even with eigenvectors belonging to a degenerate eigenvalue, we can choose mutually orthogonal (unit) vectors. Thus, we recognize that Theorems 14.5, 14.8, and 14.14 are mutually interrelated. These theorems and related concepts can be interpreted in terms of the projection operators and spectral decomposition. We will survey the related aspects in a broader context from a quantum-mechanical point of view later (see Chap. 24).

In Sects. 14.1 and 14.3, we mentioned the spectral decomposition. There, we showed a case where projection operators commute with one another; see (14.23). Thus, in light of Theorem 14.14, those projection operators can be diagonalized at once to be expressed as, e.g., (14.72). This is a conspicuous feature of the projection operators.

References

1. Satake I-O (1975) Linear algebra (pure and applied mathematics). Marcel Dekker, New York
2. Satake I (1974) Linear algebra. Shokabo, Tokyo
3. Hassani S (2006) Mathematical physics. Springer, New York
4. Byron FW, Fuller RW (1992) Mathematics of classical and quantum physics. Dover, New York
5. Mirsky L (1990) An introduction to linear algebra. Dover, New York

Chapter 15

Exponential Functions of Matrices

In Chap. 12, we dealt with functions of matrices. In this chapter we study several important definitions and characteristics of such functions. Matrices whose elements are analytic functions of a real variable are of particular importance; for instance, differentiation can naturally be defined for them. Among these, exponential functions of matrices are widely used in various fields of mathematical physics and frequently appear in systems of differential equations. In Chap. 10, we showed that SOLDEs with suitable BCs can be solved using Green's functions. In the present chapter, in parallel, we show a solution method based on resolvent matrices. The exponential functions of matrices also have broad applications in the group theory that we will study in Part IV. In preparation for it, we study how a collection of matrices forms a linear vector space. In accordance with Chap. 13, we introduce the basic notions of inner product and norm for matrices.

15.1 Functions of Matrices

We consider a matrix whose individual components are differentiable with respect to a real variable t:

A(t) = \bigl(a_{ij}(t)\bigr). \qquad (15.1)

We define the differentiation as

A'(t) = \frac{dA(t)}{dt} \equiv \bigl(a'_{ij}(t)\bigr). \qquad (15.2)

© Springer Nature Singapore Pte Ltd. 2023
S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_15


We have

[A(t) + B(t)]' = A'(t) + B'(t), \qquad (15.3)

[A(t)B(t)]' = A'(t)B(t) + A(t)B'(t). \qquad (15.4)

To get (15.4), putting A(t)B(t) \equiv C(t), we have

c_{ij}(t) = \sum_{k} a_{ik}(t)\, b_{kj}(t), \qquad (15.5)

where C(t) = (c_{ij}(t)) and B(t) = (b_{ij}(t)). Differentiating (15.5), we get

c'_{ij}(t) = \sum_{k} a'_{ik}(t)\, b_{kj}(t) + \sum_{k} a_{ik}(t)\, b'_{kj}(t), \qquad (15.6)

namely, C'(t) = A'(t)B(t) + A(t)B'(t); i.e., (15.4) holds.

Analytic functions are usually expanded as an infinite power series, and a function of a matrix can be expanded in the same way. Among the various functions of matrices, exponential functions are particularly important and have a wide field of applications. As in the case of a real or complex number x, the exponential function of a matrix A can be defined as

\exp A \equiv E + A + \frac{1}{2!}A^2 + \cdots + \frac{1}{\nu!}A^{\nu} + \cdots, \qquad (15.7)

where A is an (n, n) square matrix and E is the (n, n) identity matrix. Note that E is used instead of the number 1 that appears in the power series expansion of \exp x, described as

\exp x \equiv 1 + x + \frac{1}{2!}x^2 + \cdots + \frac{1}{\nu!}x^{\nu} + \cdots.

Here we define the convergence of a series of matrices as follows.

Definition 15.1 Let A_0, A_1, \cdots, A_{\nu}, \cdots [A_{\nu} \equiv (a_{ij,\nu}) (1 \le i, j \le n)] be a sequence of matrices, and let a_{ij,0}, a_{ij,1}, \cdots, a_{ij,\nu}, \cdots be the corresponding sequence of each matrix component. Let us define a series of matrices as \tilde{A} \equiv (\tilde{a}_{ij}) = A_0 + A_1 + \cdots + A_{\nu} + \cdots and a series of individual matrix components as \tilde{a}_{ij} \equiv a_{ij,0} + a_{ij,1} + \cdots + a_{ij,\nu} + \cdots. Then, if each \tilde{a}_{ij} converges, we say that \tilde{A} converges.

Regarding the matrix notation in Definition 15.1, readers are referred to (11.38). Now, let us show that the power series (15.7) converges [1, 2].


Let A be an (n, n) square matrix. Putting A = (a_{ij}) and \max_{i,j} |a_{ij}| = M, and defining A^{\nu} \equiv \bigl(a_{ij}^{(\nu)}\bigr), where a_{ij}^{(\nu)} is the (i, j) component of A^{\nu}, we get

\bigl|a_{ij}^{(\nu)}\bigr| \le n^{\nu} M^{\nu} \quad (1 \le i, j \le n). \qquad (15.8)

Note that A^0 = E, and hence a_{ij}^{(0)} = \delta_{ij}; thus, if \nu = 0, (15.8) routinely holds. When \nu = 1, (15.8) obviously holds as well from the definition of M and the assumption n \ge 2 (we are considering (n, n) matrices with n \ge 2). We wish to show that (15.8) holds for any \nu (\ge 2) using mathematical induction. Suppose that (15.8) holds for \nu - 1, that is,

\bigl|a_{ij}^{(\nu-1)}\bigr| \le n^{\nu-1} M^{\nu-1} \quad (1 \le i, j \le n).

Then, we have

(A^{\nu})_{ij} \equiv a_{ij}^{(\nu)} = (A^{\nu-1}A)_{ij} = \sum_{k=1}^{n} (A^{\nu-1})_{ik}(A)_{kj} = \sum_{k=1}^{n} a_{ik}^{(\nu-1)} a_{kj}. \qquad (15.9)

Thus, we get

\bigl|a_{ij}^{(\nu)}\bigr| = \Bigl|\sum_{k=1}^{n} a_{ik}^{(\nu-1)} a_{kj}\Bigr| \le \sum_{k=1}^{n} \bigl|a_{ik}^{(\nu-1)}\bigr|\,|a_{kj}| \le n \cdot n^{\nu-1}M^{\nu-1} \cdot M = n^{\nu} M^{\nu}. \qquad (15.10)

Consequently, (15.8) holds for \nu. Meanwhile, we have the relation

e^{nM} = \sum_{\nu=0}^{\infty} \frac{1}{\nu!}\, n^{\nu} M^{\nu}. \qquad (15.11)

The series (15.11) certainly converges; note that nM is not a matrix but a number. From (15.10) we obtain

\sum_{\nu=0}^{\infty} \frac{1}{\nu!}\bigl|a_{ij}^{(\nu)}\bigr| \le \sum_{\nu=0}^{\infty} \frac{1}{\nu!}\, n^{\nu} M^{\nu} = e^{nM}.

This implies that \sum_{\nu=0}^{\infty} \frac{1}{\nu!}\, a_{ij}^{(\nu)} converges (i.e., converges absolutely [3]). Thus, from Definition 15.1, (15.7) converges. To further examine the exponential functions of matrices, we show in the following proposition that a collection of matrices forms a linear vector space.
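The convergence just established can be checked numerically: the partial sums of the series (15.7) approach the matrix exponential componentwise. The following is a minimal sketch assuming NumPy and SciPy are available; the matrix A is an arbitrary example, not one from the text.

```python
# Partial sums of exp A = E + A + A^2/2! + ... compared with SciPy's expm.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.3, -1.2],
              [0.7,  0.4]])

partial = np.zeros_like(A)
term = np.eye(2)               # current term A^nu / nu!, starting with A^0/0! = E
for nu in range(1, 31):
    partial = partial + term
    term = term @ A / nu       # build A^nu/nu! from A^(nu-1)/(nu-1)!
partial = partial + term

print(np.allclose(partial, expm(A)))   # the truncated series matches exp A
```

The componentwise bound (15.8)-(15.11) guarantees that the same truncation level works for every matrix component simultaneously.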


Proposition 15.1 Let M be the collection of all (n, n) square matrices. Then, M forms a linear vector space.

Proof Let A = (a_{ij}) \in M and B = (b_{ij}) \in M be (n, n) square matrices. Then, \alpha A = (\alpha a_{ij}) and A + B = (a_{ij} + b_{ij}) are again (n, n) square matrices, and so we have \alpha A \in M and A + B \in M. Thus, M forms a linear vector space.

In Proposition 15.1, M forms an n^2-dimensional vector space V^{n^2}. To define an inner product in this space, we introduce |P_{(l)}^{(k)}\rangle (k, l = 1, 2, \cdots, n) as a basis set of n^2 vectors. Now, let A = (a_{ij}) be an (n, n) square matrix. To show explicitly that A is a vector in the inner product space, we express it as

|A\rangle = \sum_{k,l=1}^{n} a_{kl}\, |P_{(l)}^{(k)}\rangle.

Then, the adjoint of |A\rangle is written as

\langle A| \equiv |A\rangle^{\dagger} = \sum_{k,l=1}^{n} a_{kl}^{*}\, \langle P_{(l)}^{(k)}|, \qquad (15.12)

where \langle P_{(l)}^{(k)}| is the adjoint of |P_{(l)}^{(k)}\rangle. For the notations in (15.12), see Sects. 1.4 and 13.3; for P_{(l)}^{(k)}, see Sect. 14.1. Meanwhile, let B = (b_{ij}) be another (n, n) square matrix denoted by

|B\rangle = \sum_{s,t=1}^{n} b_{st}\, |P_{(t)}^{(s)}\rangle.

Here, let us define |P_{(l)}^{(k)}\rangle as an orthonormal basis set, with the inner product between vectors described in the form of matrices such that

\langle P_{(l)}^{(k)}|P_{(t)}^{(s)}\rangle \equiv \delta_{ks}\,\delta_{lt}.

Then, from (15.12) the inner product between A and B is given by

\langle A|B\rangle \equiv \sum_{k,l,s,t=1}^{n} a_{kl}^{*} b_{st}\, \langle P_{(l)}^{(k)}|P_{(t)}^{(s)}\rangle = \sum_{k,l,s,t=1}^{n} a_{kl}^{*} b_{st}\, \delta_{ks}\delta_{lt} = \sum_{k,l=1}^{n} a_{kl}^{*} b_{kl}. \qquad (15.13)

The relation (15.13) leads to the norm already introduced in (13.8). The norm of a matrix is defined as

\|A\| \equiv \sqrt{\langle A|A\rangle} = \sqrt{\sum_{i,j=1}^{n} |a_{ij}|^2}. \qquad (15.14)

We have the following theorem regarding the product of two matrices.

Theorem 15.1 Let A = (a_{ij}) and B = (b_{ij}) be two square matrices. Then, we have the following inequality:

\|AB\| \le \|A\| \cdot \|B\|. \qquad (15.15)

Proof Let C = AB. Then, from the Cauchy-Schwarz inequality (Sect. 13.1), we get

\|C\|^2 = \sum_{i,j} |c_{ij}|^2 = \sum_{i,j} \Bigl|\sum_{k} a_{ik} b_{kj}\Bigr|^2 \le \sum_{i,j} \Bigl(\sum_{k} |a_{ik}|^2\Bigr)\Bigl(\sum_{l} |b_{lj}|^2\Bigr) = \Bigl(\sum_{i,k} |a_{ik}|^2\Bigr)\Bigl(\sum_{j,l} |b_{lj}|^2\Bigr) = \|A\|^2 \cdot \|B\|^2. \qquad (15.16)

This leads to (15.15).
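The inner product (15.13) and norm (15.14) can be phrased compactly as \langle A|B\rangle = Tr(A\dagger B) and \|A\| = \sqrt{Tr(A\dagger A)} (the Frobenius norm), and Theorem 15.1 can be spot-checked numerically. A small sketch assuming NumPy; the random complex matrices are arbitrary.

```python
# <A|B> = sum_{k,l} a*_{kl} b_{kl} = Tr(A† B); ||A|| is the Frobenius norm.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

inner = np.trace(A.conj().T @ B)                 # <A|B> as a trace
norm_A = np.sqrt(np.trace(A.conj().T @ A).real)  # (15.14)

print(np.isclose(inner, np.sum(A.conj() * B)))   # agrees with the sum in (15.13)
print(np.isclose(norm_A, np.linalg.norm(A)))     # NumPy's default matrix norm is Frobenius
print(np.linalg.norm(A @ B) <= np.linalg.norm(A) * np.linalg.norm(B))  # (15.15)
```

All three checks print True; the last line is exactly the submultiplicativity of Theorem 15.1.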

15.2 Exponential Functions of Matrices and Their Manipulations

Using Theorem 15.1, we explore important characteristics of exponential functions of matrices. For this we have the next theorem.

Theorem 15.2 [4] Let A and B be mutually commutative matrices, i.e., AB = BA. Then, we have

\exp(A + B) = \exp A \exp B = \exp B \exp A. \qquad (15.17)

Proof Since A and B are mutually commutative, (A + B)^n can be expanded through the binomial theorem such that

\frac{1}{n!}(A + B)^n = \sum_{k=0}^{n} \frac{A^k B^{n-k}}{k!\,(n-k)!}. \qquad (15.18)

Meanwhile, with an arbitrary integer m, we have

Fig. 15.1 Number of possible combinations (k, l). Such combinations are indicated with green points. The number of the green points of the upper-left triangle is m(m + 1)/2; that of the lower-right triangle is m(m + 1)/2 as well. The total number of the green points is m(m + 1) accordingly. Adapted from Yamanouchi T, Sugiura M (1960) Introduction to continuous groups (New Mathematics Series 18: in Japanese) [4], with the permission of Baifukan Co., Ltd., Tokyo.

\sum_{n=0}^{2m} \frac{1}{n!}(A + B)^n = \Bigl(\sum_{n=0}^{m} \frac{A^n}{n!}\Bigr)\Bigl(\sum_{n=0}^{m} \frac{B^n}{n!}\Bigr) + R_m, \qquad (15.19)

where R_m has the form \sum_{(k,l)} A^k B^l / (k!\,l!), the summation being taken over all combinations (k, l) that satisfy \max(k, l) \ge m + 1 and k + l \le 2m. The number of all such combinations (k, l) is m(m + 1) accordingly; see Fig. 15.1 [4]. We remark that if in (15.19) we put m = 0, (15.19) trivially holds with R_m = 0 (the zero matrix). The matrix R_m, however, changes with increasing m, and so we must evaluate it as m \to \infty. Putting \max(\|A\|, \|B\|, 1) = C and using Theorem 15.1, we get

\|R_m\| \le \sum_{(k,l)} \frac{\|A\|^k \|B\|^l}{k!\,l!}. \qquad (15.20)

In (15.20), since \max(k, l) \ge m + 1 (and k!\,l! is smallest when \min(k, l) = 0), we have k!\,l! \ge (m + 1)!. Since \|A\|^k \|B\|^l \le C^{k+l} \le C^{2m}, we have

\sum_{(k,l)} \frac{\|A\|^k \|B\|^l}{k!\,l!} \le \frac{m(m+1)\,C^{2m}}{(m+1)!} = \frac{C^{2m}}{(m-1)!}, \qquad (15.21)

where m(m + 1) in the numerator comes from the number of combinations (k, l) as pointed out above. From Stirling's formula [5], we have


\Gamma(z + 1) \approx \sqrt{2\pi}\, e^{-z} z^{z + 1/2}. \qquad (15.22)

Replacing z with the integer m - 1, we get

\Gamma(m) = (m - 1)! \approx \sqrt{2\pi}\, e^{-(m-1)} (m - 1)^{m - 1/2}. \qquad (15.23)

Then, we have

[\text{RHS of (15.21)}] = \frac{C^{2m}}{(m-1)!} \approx \frac{e^{m-1}\, C^{2m}}{\sqrt{2\pi}\,(m-1)^{m-1/2}} = \frac{C}{\sqrt{2\pi e}} \Bigl(\frac{eC^2}{m-1}\Bigr)^{m-1/2} \qquad (15.24)

and

\lim_{m \to \infty} \frac{C}{\sqrt{2\pi e}} \Bigl(\frac{eC^2}{m-1}\Bigr)^{m-1/2} = 0. \qquad (15.25)

Thus, from (15.20) and (15.21), we get

\lim_{m \to \infty} \|R_m\| = 0. \qquad (15.26)

Remember that \|R_m\| = 0 if and only if R_m = 0 (the zero matrix); see (13.4) and (13.8) in Sect. 13.1. Finally, taking the limit (m \to \infty) of both sides of (15.19), we obtain

\lim_{m \to \infty} \sum_{n=0}^{2m} \frac{1}{n!}(A+B)^n = \lim_{m \to \infty}\Bigl[\Bigl(\sum_{n=0}^{m}\frac{A^n}{n!}\Bigr)\Bigl(\sum_{n=0}^{m}\frac{B^n}{n!}\Bigr) + R_m\Bigr].

Considering (15.26), we get

\lim_{m \to \infty} \sum_{n=0}^{2m} \frac{1}{n!}(A+B)^n = \Bigl(\lim_{m \to \infty}\sum_{n=0}^{m}\frac{A^n}{n!}\Bigr)\Bigl(\lim_{m \to \infty}\sum_{n=0}^{m}\frac{B^n}{n!}\Bigr). \qquad (15.27)

This implies that (15.17) holds. It is obvious that if A and B commute, then \exp A and \exp B commute; that is, \exp A \exp B = \exp B \exp A. These complete the proof.

From Theorem 15.2, we immediately get the following property:

(1) (\exp A)^{-1} = \exp(-A). \qquad (15.28)

To show this, it suffices to note that A and -A commute; that is, A(-A) = -A^2 = (-A)A. Replacing B with -A in (15.17), we have


\exp(A - A) = \exp 0 = E = \exp A\, \exp(-A) = \exp(-A)\, \exp A. \qquad (15.29)

This leads to (15.28). Notice here that \exp 0 = E from (15.7). We have further important properties of the exponential functions of matrices:

(2) P^{-1}(\exp A)P = \exp(P^{-1}AP). \qquad (15.30)

(3) \exp(A^{T}) = (\exp A)^{T}. \qquad (15.31)

(4) \exp(A^{\dagger}) = (\exp A)^{\dagger}. \qquad (15.32)

To show (2), we have

(P^{-1}AP)^n = (P^{-1}AP)(P^{-1}AP)\cdots(P^{-1}AP) = P^{-1}A(PP^{-1})A(PP^{-1})\cdots(PP^{-1})AP = P^{-1}AEAE\cdots EAP = P^{-1}A^nP. \qquad (15.33)

Applying (15.33) to (15.7), we have

\exp(P^{-1}AP) = E + P^{-1}AP + \frac{1}{2!}(P^{-1}AP)^2 + \cdots + \frac{1}{n!}(P^{-1}AP)^n + \cdots
= P^{-1}EP + P^{-1}AP + P^{-1}\frac{A^2}{2!}P + \cdots + P^{-1}\frac{A^n}{n!}P + \cdots
= P^{-1}\Bigl(E + A + \frac{A^2}{2!} + \cdots + \frac{A^n}{n!} + \cdots\Bigr)P = P^{-1}(\exp A)P. \qquad (15.34)

For (3) and (4), use the following relations:

(A^{T})^n = (A^n)^{T} \quad \text{and} \quad (A^{\dagger})^n = (A^n)^{\dagger}. \qquad (15.35)
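Properties (1) and (2) lend themselves to a direct numerical check. A sketch assuming SciPy; A and P are arbitrary examples (P is shifted by 3E only to keep it safely away from singularity).

```python
# (exp A)^{-1} = exp(-A)  and  P^{-1}(exp A)P = exp(P^{-1} A P).
import numpy as np
from scipy.linalg import expm, inv

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
P = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)   # almost surely non-singular

print(np.allclose(inv(expm(A)), expm(-A)))                       # Property (1), (15.28)
print(np.allclose(inv(P) @ expm(A) @ P, expm(inv(P) @ A @ P)))   # Property (2), (15.30)
```

Both checks print True; the second one is exactly the similarity rule (15.34) used repeatedly below.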

The confirmation of (15.31) and (15.32) is left to the reader.

For future purposes, we describe another important theorem and further properties of the exponential functions of matrices.

Theorem 15.3 Let a_1, a_2, \cdots, a_n be the eigenvalues of an (n, n) square matrix A. Then, \exp a_1, \exp a_2, \cdots, \exp a_n are the eigenvalues of \exp A.

Proof From Theorem 12.1, we know that every (n, n) square matrix can be converted to a triangle matrix by a similarity transformation. Also, we know that the eigenvalues of a triangle matrix are given by its diagonal elements (Sect. 12.1). Let P be a non-singular matrix used for such a similarity transformation. Then, we have

P^{-1}AP = \begin{pmatrix} a_1 & & \\ & \ddots & \\ 0 & & a_n \end{pmatrix}. \qquad (15.36)

Meanwhile, let T be a triangle matrix described by

T = \begin{pmatrix} t_{11} & & \\ & \ddots & \\ 0 & & t_{nn} \end{pmatrix}. \qquad (15.37)

Then, from a property of triangle matrices, we have

T^m = \begin{pmatrix} t_{11}^m & & \\ & \ddots & \\ 0 & & t_{nn}^m \end{pmatrix}.

Applying this property to (15.36), we get

(P^{-1}AP)^m = \begin{pmatrix} a_1^m & & \\ & \ddots & \\ 0 & & a_n^m \end{pmatrix} = P^{-1}A^mP, \qquad (15.38)

where the last equality is due to (15.33). Summing (15.38) over m following (15.34), we obtain another triangle matrix expressed as

P^{-1}(\exp A)P = \begin{pmatrix} \exp a_1 & & \\ & \ddots & \\ 0 & & \exp a_n \end{pmatrix}. \qquad (15.39)

Once again considering that the eigenvalues of a triangle matrix are given by its diagonal elements and that the eigenvalues are invariant under a similarity transformation, (15.39) implies that \exp a_1, \exp a_2, \cdots, \exp a_n are the eigenvalues of \exp A. These complete the proof.

A question of whether the eigenvalues are proper eigenvalues or generalized eigenvalues is irrelevant to Theorem 15.3; the eigenvalue may be either a proper one or a generalized one (Sect. 12.6). That depends solely upon the nature of A in question.

Theorem 15.3 immediately leads to the next important relation. Taking the determinant of (15.39), we get

\det[P^{-1}(\exp A)P] = \det P^{-1}\, \det(\exp A)\, \det P = \det(\exp A) = (\exp a_1)(\exp a_2)\cdots(\exp a_n) = \exp(a_1 + a_2 + \cdots + a_n) = \exp(\mathrm{Tr}\,A). \qquad (15.40)


To derive (15.40), refer to (11.59) and (12.10) as well. Since (15.40) represents an important property of the exponential functions of matrices, we restate it separately as the following formula [4]:

(5) \det(\exp A) = \exp(\mathrm{Tr}\,A). \qquad (15.41)
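Formula (15.41) and Theorem 15.3 can be verified together numerically. A sketch assuming SciPy; A is an arbitrary real matrix, and sorting is simply a convenient way to pair the two spectra for this small example.

```python
# det(exp A) = exp(Tr A); the eigenvalues of exp A are exp of those of A.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.5,  1.0, 0.0],
              [0.2, -0.3, 0.4],
              [0.0,  0.6, 0.1]])

print(np.isclose(np.linalg.det(expm(A)), np.exp(np.trace(A))))   # (15.41)

eig_A = np.linalg.eigvals(A)
eig_expA = np.linalg.eigvals(expm(A))
print(np.allclose(np.sort_complex(np.exp(eig_A)), np.sort_complex(eig_expA)))
```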

We have examined general aspects so far, but we have not yet studied the relationship between the exponential functions of matrices and specific types of matrices. In the applications of exponential functions of matrices to mathematical physics, especially the transformations of functions and vectors, we frequently encounter orthogonal matrices and unitary matrices. For this purpose, we examine the following properties:

(6) Let A be a real skew-symmetric matrix. Then, \exp A is a real orthogonal matrix.
(7) Let A be an anti-Hermitian matrix. Then, \exp A is a unitary matrix.

To show (6), we note that a skew-symmetric matrix is described as

A^{T} = -A \quad \text{or} \quad A^{T} + A = 0. \qquad (15.42)

Then, we have A^{T}A = -A^2 = AA^{T}; hence, A and A^{T} commute. Therefore, from Theorem 15.2, we have

\exp(A + A^{T}) = \exp 0 = E = \exp A\, \exp A^{T} = \exp A\, (\exp A)^{T}, \qquad (15.43)

where the last equality comes from Property (3) of (15.31). Equation (15.43) implies that \exp A is a real orthogonal matrix. Since an anti-Hermitian matrix A is expressed as

A^{\dagger} = -A \quad \text{or} \quad A^{\dagger} + A = 0, \qquad (15.44)

we have A^{\dagger}A = -A^2 = AA^{\dagger}, and so A and A^{\dagger} commute. Using (15.32), we have

\exp(A + A^{\dagger}) = \exp 0 = E = \exp A\, \exp A^{\dagger} = \exp A\, (\exp A)^{\dagger}, \qquad (15.45)

where the last equality comes from Property (4) of (15.32). This implies that \exp A is a unitary matrix.

Regarding Properties (6) and (7), if A is replaced with tA (t real), the relations (15.42)-(15.45) hold unchanged. Then, Properties (6) and (7) are rewritten as

(6)′ Let A be a real skew-symmetric matrix. Then, \exp tA (t real) is a real orthogonal matrix.
(7)′ Let A be an anti-Hermitian matrix. Then, \exp tA (t real) is a unitary matrix.

Now, let a function F(t) be expressed as F(t) = \exp tx with real numbers t and x. Then, we have the well-known formula


\frac{dF(t)}{dt} = xF(t). \qquad (15.46)

Next, we extend (15.46) to the exponential functions of matrices; that is, we define the differentiation of an exponential function of a matrix with respect to a real parameter t. In concert with (15.46), we have the following important theorem.

Theorem 15.4 [1, 2] Let F(t) \equiv \exp tA, with t a real number and A a matrix that does not depend on t. Then, we have

\frac{dF(t)}{dt} = AF(t). \qquad (15.47)

In (15.47) we assume that the individual matrix components of F(t) are differentiable with respect to t and that the differentiation of a matrix is defined as in (15.2).

Proof We have

F(t + \Delta t) = \exp(t + \Delta t)A = \exp(tA + \Delta tA) = (\exp tA)(\exp \Delta tA) = F(t)F(\Delta t) = (\exp \Delta tA)(\exp tA) = F(\Delta t)F(t), \qquad (15.48)

where we considered that tA and \Delta tA are commutative and used Theorem 15.2. Therefore, we get

\frac{1}{\Delta t}[F(t + \Delta t) - F(t)] = \frac{1}{\Delta t}[F(\Delta t) - E]F(t) = \frac{1}{\Delta t}\Bigl[\sum_{\nu=0}^{\infty}\frac{1}{\nu!}(\Delta tA)^{\nu} - E\Bigr]F(t) = \sum_{\nu=1}^{\infty}\frac{1}{\nu!}(\Delta t)^{\nu-1}A^{\nu} F(t). \qquad (15.49)

Taking the limit of both sides of (15.49), we get

\lim_{\Delta t \to 0}\frac{1}{\Delta t}[F(t + \Delta t) - F(t)] \equiv \frac{dF(t)}{dt} = AF(t).

Notice that in (15.49) only the first term (i.e., \nu = 1) of the RHS is non-vanishing as \Delta t \to 0. Then, (15.47) follows. This completes the proof.

On the basis of the power series expansion (15.7) of the exponential function of a matrix, A and F(t) are commutative. Therefore, instead of (15.47) we may write

\frac{dF(t)}{dt} = F(t)A. \qquad (15.50)
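Theorem 15.4 can be illustrated by comparing a finite-difference derivative of \exp tA with AF(t). A sketch assuming SciPy; A, t, and h are arbitrary choices.

```python
# dF/dt for F(t) = exp(tA), approximated by a central difference.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0,  1.0],
              [-2.0, -0.5]])
t, h = 0.7, 1e-6

dF = (expm((t + h) * A) - expm((t - h) * A)) / (2 * h)
print(np.allclose(dF, A @ expm(t * A), atol=1e-6))   # (15.47)
print(np.allclose(dF, expm(t * A) @ A, atol=1e-6))   # (15.50): A and F(t) commute
```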

Using Theorem 15.4, we show the following important properties. These are converse propositions to Properties (6) and (7).


(8) Let \exp tA be a real orthogonal matrix for any real number t. Then, A is a real skew-symmetric matrix.
(9) Let \exp tA be a unitary matrix for any real number t. Then, A is an anti-Hermitian matrix.

To show (8), from the assumption, for any real number t we have

\exp tA\, (\exp tA)^{T} = \exp tA\, \exp(tA^{T}) = E. \qquad (15.51)

Differentiating (15.51) with respect to t by use of (15.4), we have

(A \exp tA)\, \exp(tA^{T}) + (\exp tA)\, A^{T} \exp(tA^{T}) = 0. \qquad (15.52)

Since (15.52) must hold for any real number t, it must hold as t \to 0 as well. On this condition, from (15.52) we get A + A^{T} = 0; that is, A is a real skew-symmetric matrix. In the case of (9), similarly we have

\exp tA\, (\exp tA)^{\dagger} = \exp tA\, \exp(tA^{\dagger}) = E. \qquad (15.53)

Differentiating (15.53) with respect to t and taking t \to 0, we get A + A^{\dagger} = 0; that is, A is an anti-Hermitian matrix. Properties (6)-(9), including Properties (6)′ and (7)′, are frequently used later in Chap. 20 in relation to Lie groups and Lie algebras.
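Properties (6)′ and (7)′ can be checked with small sample matrices. A sketch assuming SciPy; S and H below are arbitrary examples of a real skew-symmetric and an anti-Hermitian matrix.

```python
# exp(tS) is orthogonal for S^T = -S; exp(tH) is unitary for H† = -H.
import numpy as np
from scipy.linalg import expm

t = 1.3
S = np.array([[0.0, 2.0],
              [-2.0, 0.0]])           # real skew-symmetric
H = np.array([[1j, 1.0 + 0.5j],
              [-1.0 + 0.5j, -2j]])    # anti-Hermitian

O = expm(t * S)
U = expm(t * H)
print(np.allclose(O @ O.T, np.eye(2)))          # orthogonality, Property (6)'
print(np.allclose(U @ U.conj().T, np.eye(2)))   # unitarity, Property (7)'
```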

15.3 System of Differential Equations

In this section we describe important applications of the exponential functions of matrices to systems of differential equations.

15.3.1 Introduction

In Chap. 10 we dealt with SOLDEs using Green's functions. In the present section, we examine the properties of systems of differential equations. There is a close relationship between the two topics, which we briefly outline first.

We describe an example of a system of differential equations. First, let us think of the following general equations:

\dot{x}(t) = a(t)x(t) + b(t)y(t), \qquad (15.54)

\dot{y}(t) = c(t)x(t) + d(t)y(t), \qquad (15.55)

where x(t) and y(t) vary as functions of t; a(t), b(t), c(t), and d(t) are coefficients. Differentiating (15.54) with respect to t, we get

\ddot{x}(t) = \dot{a}(t)x(t) + a(t)\dot{x}(t) + \dot{b}(t)y(t) + b(t)\dot{y}(t) = \dot{a}x + a\dot{x} + \dot{b}y + b[cx + dy] = (\dot{a} + bc)x + a\dot{x} + (\dot{b} + bd)y, \qquad (15.56)

where with the second equality we used (15.55). To eliminate y(t) from (15.56), we multiply both sides of (15.56) by b(t), multiply both sides of (15.54) by \dot{b}(t) + b(t)d(t), and subtract the latter equation from the former. As a result, we obtain

b\ddot{x} - [\dot{b} + (a + d)b]\dot{x} + [a(\dot{b} + bd) - b(\dot{a} + bc)]x = 0. \qquad (15.57)

Upon the above derivation, we assumed b \ne 0. Solving (15.57) and substituting the resulting solution into (15.54), we can get a solution for y(t). If, on the other hand, b = 0, we have \dot{x}(t) = a(t)x(t). This is a simple FOLDE and can readily be integrated to yield

x = C \exp \int^{t} a(t')\, dt', \qquad (15.58)

where C is an integration constant. Meanwhile, differentiating (15.55) with respect to t, we have

\ddot{y} = \dot{c}x + c\dot{x} + \dot{d}y + d\dot{y} = \dot{c}x + cax + \dot{d}y + d\dot{y} = (\dot{c} + ca)\, C \exp\Bigl[\int^{t} a(t')\, dt'\Bigr] + \dot{d}y + d\dot{y},

where we used (15.58) as well as \dot{x} = ax. Thus, we are going to solve the following inhomogeneous SOLDE:

\ddot{y} - d\dot{y} - \dot{d}y = (\dot{c} + ca)\, C \exp \int^{t} a(t')\, dt'. \qquad (15.59)

In Sect. 10.2 we showed how to solve an inhomogeneous FOLDE using a weight function. For a later purpose, let us consider the following FOLDE, assuming a(x) \equiv 1 in (10.23):


\dot{x}(t) + p(t)x = q(t). \qquad (15.60)

As another method to solve (15.60), we examine the method of variation of constants. The homogeneous equation corresponding to (15.60) is

\dot{x}(t) + p(t)x = 0. \qquad (15.61)

It can be solved as just before such that

x(t) = C \exp\Bigl[-\int^{t} p(t')\, dt'\Bigr] \equiv u(t). \qquad (15.62)

Now we assume that a solution of (15.60) can be sought by putting

x(t) = k(t)u(t), \qquad (15.63)

where we assume that the functional form u(t) remains unchanged in the inhomogeneous equation (15.60) and that instead the "constant" k(t) may change as a function of t. Inserting (15.63) into (15.60), we have

\dot{x} + px = \dot{k}u + k\dot{u} + pku = \dot{k}u + k(\dot{u} + pu) = \dot{k}u + k(-pu + pu) = \dot{k}u = q, \qquad (15.64)

where with the third equality we used (15.61) and (15.62) to get \dot{u} = -pu. In this way, (15.64) can easily be integrated to give

k(t) = \int^{t} \frac{q(t')}{u(t')}\, dt' + C,

where C is an integration constant. Thus, as a solution of (15.60), we have

x(t) = k(t)u(t) = u(t) \int^{t} \frac{q(t')}{u(t')}\, dt' + Cu(t). \qquad (15.65)
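The variation-of-constants formula (15.65) can be exercised numerically against a direct integration of (15.60). A sketch assuming SciPy; the choices p(t) = 2, q(t) = e^{-t}, and the initial value are illustrative only (here u(0) = 1, so the constant C is fixed by x(0)).

```python
# x' + p(t)x = q(t) solved via (15.62)-(15.65) and via solve_ivp.
import numpy as np
from scipy.integrate import quad, solve_ivp

p = lambda t: 2.0
q = lambda t: np.exp(-t)
x0, T = 1.5, 2.0

u = lambda t: np.exp(-quad(p, 0.0, t)[0])                  # (15.62) with C = 1
k = lambda t: quad(lambda s: q(s) / u(s), 0.0, t)[0] + x0  # k(t); constant fixed by BC
x_voc = k(T) * u(T)                                        # (15.65)

sol = solve_ivp(lambda t, x: -p(t) * x + q(t), (0.0, T), [x0], rtol=1e-10)
print(np.isclose(x_voc, sol.y[0, -1], atol=1e-6))
```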

Comparing (15.65) with (10.29), we notice that the two expressions are related. In other words, the method of variation of constants in (15.65) is essentially the same as the method using the weight function in (10.29). Another important point to bear in mind is that in Chap. 10 we solved SOLDEs with constant coefficients using the method of Green's functions. In that case, we estimated the homogeneous term and the inhomogeneous term (i.e., surface term) separately. In the present section, we seek a method corresponding to that using Green's functions. In the above discussion, we saw that a system of differential equations with two unknowns can be translated into a SOLDE. Then,


we expect that such a system of differential equations with constant coefficients can be solved in a similar fashion. We describe this below in detail, giving examples.

15.3.2 System of Differential Equations in a Matrix Form: Resolvent Matrix

The equations (15.54) and (15.55) are unified into a single homogeneous matrix equation such that

(\dot{x}(t)\ \dot{y}(t)) = (x(t)\ y(t)) \begin{pmatrix} a(t) & c(t) \\ b(t) & d(t) \end{pmatrix}. \qquad (15.66)

In (15.66), x(t) and y(t) and their derivatives represent functions (of t), and so we describe them as row matrices. For this form of the equation, see Sect. 11.2 and (11.37) therein. Another expression is the transposition of (15.66), described by

\begin{pmatrix} \dot{x}(t) \\ \dot{y}(t) \end{pmatrix} = \begin{pmatrix} a(t) & b(t) \\ c(t) & d(t) \end{pmatrix} \begin{pmatrix} x(t) \\ y(t) \end{pmatrix}. \qquad (15.67)

Notice that the coefficient matrix has been transposed in (15.67) accordingly. We are most interested in equations with constant coefficients, and hence we rewrite (15.66) once again explicitly as

(\dot{x}(t)\ \dot{y}(t)) = (x(t)\ y(t)) \begin{pmatrix} a & c \\ b & d \end{pmatrix} \qquad (15.68)

or

\frac{d}{dt}(x(t)\ y(t)) = (x(t)\ y(t)) \begin{pmatrix} a & c \\ b & d \end{pmatrix}. \qquad (15.69)

Putting F(t) \equiv (x(t)\ y(t)), we have

\frac{dF(t)}{dt} = F(t) \begin{pmatrix} a & c \\ b & d \end{pmatrix}. \qquad (15.70)

This is exactly the same as (15.50) if we put

A = \begin{pmatrix} a & c \\ b & d \end{pmatrix}. \qquad (15.71)

Then, we get

F(t) = \exp tA. \qquad (15.72)

As already shown in (15.7) of Sect. 15.1, the exponential function of a matrix, i.e., \exp tA, can be defined as

\exp tA \equiv E + tA + \frac{1}{2!}(tA)^2 + \cdots + \frac{1}{\nu!}(tA)^{\nu} + \cdots. \qquad (15.73)

From Property (5) of (15.41), we have \det(\exp A) \ne 0. This is because no matter what number (real or complex) \mathrm{Tr}\,A may take, \exp(\mathrm{Tr}\,A) never vanishes. That is, F = \exp A is non-singular, and so an inverse matrix of \exp A must exist. This is the case with \exp tA as well; see the equation below:

(5)′ \det(\exp tA) = \exp \mathrm{Tr}(tA) = \exp[t(\mathrm{Tr}\,A)]. \qquad (15.74)

Note that if A (or tA) is an (n, n) matrix, F(t) of (15.72) is an (n, n) matrix as well. Since in that general case F(t) is non-singular, it consists of n linearly independent column vectors, i.e., (n, 1) matrices; see Chap. 11. Then, F(t) is symbolically described as

F(t) = (x_1(t)\ x_2(t)\ \cdots\ x_n(t)),

where x_i(t) (1 \le i \le n) represents the i-th column vector. Note that since F(0) = E, x_i(0) has its i-th row component equal to 1 and all the others equal to 0. Since F(t) is non-singular, the n column vectors x_i(t) (1 \le i \le n) are linearly independent. In the case of, e.g., (15.66), A is a (2, 2) matrix, which implies that x(t) and y(t) represent two linearly independent column vectors, i.e., (2, 1) matrices. Furthermore, looking at the constitution of (15.70), x(t) and y(t) represent two linearly independent solutions of (15.70). We will come back to this point later.

On the basis of the method of variation of constants, we deal with inhomogeneous equations and describe below the procedure for solving a system of equations with two unknowns; the procedure can readily be adapted to the general case of equations with n unknowns. Now, we rewrite (15.68) symbolically such that

\dot{x}(t) = x(t)A, \qquad (15.75)

where x(t) \equiv (x(t)\ y(t)), \dot{x}(t) \equiv (\dot{x}(t)\ \dot{y}(t)), and A \equiv \begin{pmatrix} a & c \\ b & d \end{pmatrix}. Then, the inhomogeneous equation can be described as

\dot{x}(t) = x(t)A + b(t), \qquad (15.76)

where we define the inhomogeneous term as b \equiv (p\ q). This is equivalent to

(\dot{x}(t)\ \dot{y}(t)) = (x(t)\ y(t)) \begin{pmatrix} a & c \\ b & d \end{pmatrix} + (p\ q). \qquad (15.77)

Using the same solution F(t) that appears in the homogeneous equation, we assume that the solution of the inhomogeneous equation is described by

x(t) = k(t)F(t), \qquad (15.78)

where k(t) is a variable "constant." Replacing x(t) in (15.76) with x(t) = k(t)F(t) of (15.78), we obtain

\dot{x} = \dot{k}F + k\dot{F} = \dot{k}F + kFA = kFA + b, \qquad (15.79)

where with the second equality we used \dot{F} = FA of (15.50). Then, we have

\dot{k}F = b. \qquad (15.80)

This can be integrated so that we have

k = \int^{t} ds\, b(s)F(s)^{-1}. \qquad (15.81)

As previously noted, F(s)^{-1} exists because F(s) is non-singular. Thus, as a solution we get

x(t) = k(t)F(t) = \int_{t_0}^{t} ds\, [b(s)F(s)^{-1}]F(t). \qquad (15.82)

Here we define

R(s, t) \equiv F(s)^{-1}F(t). \qquad (15.83)

The matrix R(s, t) is said to be a resolvent matrix [6]. This matrix plays an essential role in solving systems of differential equations, both homogeneous and inhomogeneous, with either homogeneous or inhomogeneous boundary conditions (BCs). Rewriting (15.82), we get

x(t) = k(t)F(t) = \int_{t_0}^{t} ds\, [b(s)R(s, t)]. \qquad (15.84)

We summarize the principal characteristics of the resolvent matrix:

(1) R(s, s) = F(s)^{-1}F(s) = E, \qquad (15.85)

where E is the (2, 2) identity matrix, i.e., E = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, for an equation having two unknowns.

(2) R(s, t)^{-1} = [F(s)^{-1}F(t)]^{-1} = F(t)^{-1}F(s) = R(t, s). \qquad (15.86)

(3) R(s, t)R(t, u) = F(s)^{-1}F(t)F(t)^{-1}F(u) = F(s)^{-1}F(u) = R(s, u). \qquad (15.87)

It is easy to include in the solution a term related to the inhomogeneous BCs. As already seen in (10.33) of Sect. 10.2, a boundary condition for a FOLDE is set such that, e.g.,

u(a) = \sigma, \qquad (15.88)

where u(t) is a solution of the FOLDE. In that case, the BC can be translated into an "initial" condition. In the present case, we describe the BCs as

x(t_0) \equiv x_0 \equiv (\sigma\ \tau), \qquad (15.89)

where \sigma \equiv x(t_0) and \tau \equiv y(t_0). Then, the term associated with the inhomogeneous BCs is expected to be written as

x(t_0)R(t_0, t).

In fact, since x(t_0)R(t_0, t_0) = x(t_0)E = x(t_0), this expression satisfies the BCs. Then, the full expression of the solution of the inhomogeneous equation that takes account of the BCs is assumed to be

x(t) = x(t_0)R(t_0, t) + \int_{t_0}^{t} ds\, [b(s)R(s, t)]. \qquad (15.90)
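Before verifying (15.90) analytically, it can be probed numerically: for a constant A the resolvent is R(s, t) = \exp[(t - s)A] (anticipating (15.116)), so (15.90) can be evaluated by quadrature and compared with a direct integration of (15.76). A sketch assuming SciPy; A, b(s), and the BCs are arbitrary examples.

```python
# Row-vector system x' = xA + b(t) solved via the resolvent formula (15.90).
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp, trapezoid

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
b = lambda s: np.array([np.cos(s), 1.0])
x0, t0, t1 = np.array([0.5, -1.0]), 0.0, 1.0

R = lambda s, t: expm((t - s) * A)                          # resolvent for constant A
ss = np.linspace(t0, t1, 2001)
integrand = np.array([b(s) @ R(s, t1) for s in ss])
x_res = x0 @ R(t0, t1) + trapezoid(integrand, ss, axis=0)   # (15.90)

sol = solve_ivp(lambda t, x: x @ A + b(t), (t0, t1), x0, rtol=1e-10)
print(np.allclose(x_res, sol.y[:, -1], atol=1e-4))
```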

Let us confirm that (15.90) certainly gives a proper solution of (15.76). First, we give a formula for the differentiation

Q(t) = \frac{d}{dt}\int_{t_0}^{t} P(t, s)\, ds, \qquad (15.91)

where P(t, s) stands for a (1, 2) matrix whose general form is described by (q(t, s)\ r(t, s)), in which q(t, s) and r(t, s) are functions of t and s. We rewrite (15.91) as

Q(t) = \lim_{\Delta \to 0} \frac{1}{\Delta}\Bigl[\int_{t_0}^{t+\Delta} P(t+\Delta, s)\, ds - \int_{t_0}^{t} P(t, s)\, ds\Bigr]
= \lim_{\Delta \to 0} \frac{1}{\Delta}\Bigl[\int_{t_0}^{t+\Delta} P(t+\Delta, s)\, ds - \int_{t_0}^{t} P(t+\Delta, s)\, ds + \int_{t_0}^{t} P(t+\Delta, s)\, ds - \int_{t_0}^{t} P(t, s)\, ds\Bigr]
= \lim_{\Delta \to 0} \frac{1}{\Delta}\Bigl[\int_{t}^{t+\Delta} P(t+\Delta, s)\, ds + \int_{t_0}^{t} [P(t+\Delta, s) - P(t, s)]\, ds\Bigr]
\approx \lim_{\Delta \to 0} \frac{1}{\Delta}\, P(t, t)\Delta + \lim_{\Delta \to 0} \frac{1}{\Delta}\int_{t_0}^{t} [P(t+\Delta, s) - P(t, s)]\, ds
= P(t, t) + \int_{t_0}^{t} \frac{\partial P(t, s)}{\partial t}\, ds. \qquad (15.92)

Replacing P(t, s) of (15.92) with b(s)R(s, t), we have

\frac{d}{dt}\int_{t_0}^{t} b(s)R(s, t)\, ds = b(t)R(t, t) + \int_{t_0}^{t} b(s)\frac{\partial R(s, t)}{\partial t}\, ds = b(t) + \int_{t_0}^{t} b(s)\frac{\partial R(s, t)}{\partial t}\, ds, \qquad (15.93)

where with the last equality we used (15.85). Considering (15.83) and (15.93) and differentiating (15.90) with respect to t, we get

\dot{x}(t) = x(t_0)F(t_0)^{-1}\dot{F}(t) + \int_{t_0}^{t} ds\, [b(s)F(s)^{-1}]\dot{F}(t) + b(t)
= x(t_0)F(t_0)^{-1}F(t)A + \int_{t_0}^{t} ds\, [b(s)F(s)^{-1}]F(t)A + b(t)
= x(t_0)R(t_0, t)A + \int_{t_0}^{t} ds\, [b(s)R(s, t)A] + b(t)
= \Bigl\{x(t_0)R(t_0, t) + \int_{t_0}^{t} ds\, [b(s)R(s, t)]\Bigr\}A + b(t)
= x(t)A + b(t), \qquad (15.94)

where with the second equality we used (15.70) and with the last equality we used (15.90). Thus, we have certainly recovered the original differential equation of (15.76). Consequently, (15.90) is the proper solution for the given inhomogeneous equation with inhomogeneous BCs. The above discussion and formulation equally apply to the general case of the differential equation with n unknowns, even though the calculation procedures become increasingly complicated with the increasing number of unknowns.

15.3.3 Several Examples

To deepen our understanding of the essence and characteristics of systems of differential equations, we deal with several examples of equations with two unknowns and constant coefficients. Constant matrices A of specific types (e.g., anti-Hermitian matrices and skew-symmetric matrices) have a wide field of applications in mathematical physics, especially in Lie groups and Lie algebras (see Chap. 20). In the subsequent examples, however, we deal with various types of matrices.

Example 15.1 Solve the following equations

\dot{x}(t) = x + 2y + 1, \quad \dot{y}(t) = 2x + y + 2, \qquad (15.95)

under the BCs x(0) = c and y(0) = d. In matrix form, (15.95) can be written as

(\dot{x}(t)\ \dot{y}(t)) = (x\ y) \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} + (1\ 2). \qquad (15.96)

The homogeneous equation corresponding to (15.96) is expressed as

(\dot{x}(t)\ \dot{y}(t)) = (x\ y) \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}. \qquad (15.97)

As discussed in Sect. 15.3.2, this can readily be solved to produce a solution F(t) such that

F(t) = \exp tA, \qquad (15.98)

where A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}. Since A is a real symmetric (i.e., Hermitian) matrix, it can be diagonalized by a unitary similarity transformation (see Sect. 14.3). To obtain the exponential function of a matrix, it is easier to evaluate \exp tD (where D is a diagonal matrix) than to calculate \exp tA directly. That is, we rewrite (15.97) as

(\dot{x}(t)\ \dot{y}(t)) = (x\ y)\, UU^{\dagger} \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} UU^{\dagger}, \qquad (15.99)

where U is a unitary matrix, i.e., U^{\dagger} = U^{-1}. Following routine calculation procedures (as in, e.g., Example 14.3), we readily get the following diagonal matrix and the corresponding unitary matrix for the diagonalization. That is, we obtain

U = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} \quad \text{and} \quad D = U^{-1} \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} U = U^{-1}AU = \begin{pmatrix} -1 & 0 \\ 0 & 3 \end{pmatrix}. \qquad (15.100)

Or we have

A = UDU^{-1}. \qquad (15.101)

Hence, we get

F(t) = \exp tA = \exp(tUDU^{-1}) = U(\exp tD)U^{-1}, \qquad (15.102)

where with the last equality we used (15.30). From Theorem 15.3, we have

\exp tD = \begin{pmatrix} e^{-t} & 0 \\ 0 & e^{3t} \end{pmatrix}. \qquad (15.103)

In turn, we get

F(t) = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} e^{-t} & 0 \\ 0 & e^{3t} \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} = \frac{1}{2} \begin{pmatrix} e^{-t} + e^{3t} & -e^{-t} + e^{3t} \\ -e^{-t} + e^{3t} & e^{-t} + e^{3t} \end{pmatrix}. \qquad (15.104)

For the inverse matrix, we have

F(t)^{-1} = \frac{1}{2} \begin{pmatrix} e^{t} + e^{-3t} & -e^{t} + e^{-3t} \\ -e^{t} + e^{-3t} & e^{t} + e^{-3t} \end{pmatrix}. \qquad (15.105)

Therefore, as the resolvent matrix, we get

R(s, t) = F(s)^{-1}F(t) = \frac{1}{2} \begin{pmatrix} e^{-(t-s)} + e^{3(t-s)} & -e^{-(t-s)} + e^{3(t-s)} \\ -e^{-(t-s)} + e^{3(t-s)} & e^{-(t-s)} + e^{3(t-s)} \end{pmatrix}. \qquad (15.106)

Thus, as a solution of (15.95), we obtain

x(t) = x(0)R(0, t) + \int_{0}^{t} ds\, [b(s)R(s, t)], \qquad (15.107)

where x(0) = (x(0)\ y(0)) = (c\ d) represents the BCs and b(s) = (1\ 2) comes from (15.96). This can easily be calculated so that we have

x(0)R(0, t) = \Bigl(\tfrac{1}{2}\bigl[(c - d)e^{-t} + (c + d)e^{3t}\bigr]\ \ \tfrac{1}{2}\bigl[(d - c)e^{-t} + (c + d)e^{3t}\bigr]\Bigr) \qquad (15.108)

and

\int_{0}^{t} ds\, [b(s)R(s, t)] = \Bigl(\tfrac{1}{2}\bigl(e^{-t} + e^{3t}\bigr) - 1\ \ -\tfrac{1}{2}\bigl(e^{-t} - e^{3t}\bigr)\Bigr). \qquad (15.109)

Note that the first component of (15.108) and (15.109) represents the x component of the solution for (15.95) and that the second component represents the y component. From (15.108), we have

x(0)R(0, 0) = x(0)E = x(0) = (c d).  (15.110)

Thus, we find that the BCs are certainly satisfied. Putting t = 0 in (15.109), we have

\int_0^0 ds\,[b(s)R(s, 0)] = 0,  (15.111)

confirming that the second term of (15.107) vanishes at t = 0. The summation of (15.108) and (15.109) gives an overall solution of (15.107). Separating the individual components of the solution, we describe them as

x(t) = \frac{1}{2}[(c - d + 1)e^{-t} + (c + d + 1)e^{3t} - 2],
y(t) = \frac{1}{2}[(d - c - 1)e^{-t} + (c + d + 1)e^{3t}].  (15.112)
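As a numerical cross-check (an illustrative Python sketch using only the standard library; it is not part of the text), one can sum the defining series of exp tA termwise, compare it with the closed form (15.104), and verify that (15.112) satisfies (15.96) together with the BCs:

```python
import math

# 2x2 matrices as nested lists; enough for the examples of this chapter
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exp_taylor(A, t, terms=60):
    # exp(tA) = E + tA + (tA)^2/2! + ...; cf. the defining series (15.7)
    S = [[1.0, 0.0], [0.0, 1.0]]      # running sum, starts at E
    T = [[1.0, 0.0], [0.0, 1.0]]      # current term (tA)^n / n!
    for n in range(1, terms):
        T = matmul(T, [[t * a / n for a in row] for row in A])
        S = [[S[i][j] + T[i][j] for j in range(2)] for i in range(2)]
    return S

A = [[1.0, 2.0], [2.0, 1.0]]
t = 0.7
em, ep = math.exp(-t), math.exp(3 * t)
F_closed = [[(em + ep) / 2, (ep - em) / 2],
            [(ep - em) / 2, (em + ep) / 2]]        # (15.104)
F_series = exp_taylor(A, t)
err_F = max(abs(F_series[i][j] - F_closed[i][j])
            for i in range(2) for j in range(2))

# the solution (15.112) under x(0) = c, y(0) = d
c, d = 0.3, -1.1
x = lambda u: ((c - d + 1) * math.exp(-u) + (c + d + 1) * math.exp(3 * u) - 2) / 2
y = lambda u: ((d - c - 1) * math.exp(-u) + (c + d + 1) * math.exp(3 * u)) / 2

# (15.96) in component form: x' = x + 2y + 1, y' = 2x + y + 2
h = 1e-6
res_x = (x(t + h) - x(t - h)) / (2 * h) - (x(t) + 2 * y(t) + 1)
res_y = (y(t + h) - y(t - h)) / (2 * h) - (2 * x(t) + y(t) + 2)
```

Here err_F measures the deviation of the truncated series from (15.104), and res_x, res_y are residuals of the differential equation at a sample point; all of them vanish to within rounding error.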

Moreover, differentiating (15.112) with respect to t, we recover the original form of (15.95). The confirmation is left for readers.

Remarks on Example 15.1: We become aware of several points in Example 15.1.

1. The resolvent matrix given by (15.106) can be obtained by replacing t of (15.104) with t - s. This is one of the important characteristics of a resolvent matrix derived from an exponential function of a constant matrix. To see it, note that if A is a constant matrix, tA and -sA commute. By the definition (15.7) of the exponential function, we have

exp tA = E + tA + \frac{1}{2!}t^2A^2 + ⋯ + \frac{1}{ν!}t^νA^ν + ⋯,  (15.113)

exp(-sA) = E + (-sA) + \frac{1}{2!}(-s)^2A^2 + ⋯ + \frac{1}{ν!}(-s)^νA^ν + ⋯.  (15.114)

Both (15.113) and (15.114) are polynomials in the constant matrix A and, hence, exp tA and exp(-sA) commute as well. Then, from Theorem 15.2 and Property (1) of (15.28), we get

exp(tA - sA) = (exp tA)[exp(-sA)] = [exp(-sA)](exp tA) = exp(-sA + tA) = exp[(t - s)A].  (15.115)

That is, using (15.28), (15.72), and (15.83), we get

R(s, t) = F(s)⁻¹F(t) = [exp(sA)]⁻¹ exp(tA) = exp(-sA) exp(tA) = exp[(t - s)A] = F(t - s).  (15.116)

This implies that once we get exp tA, we can safely obtain the resolvent matrix simply by replacing t of (15.72) with t - s. Also, exp tA and exp(-sA) are commutative. In the present case, therefore, by exchanging the order of the products of F(s)⁻¹ and F(t) in (15.106), we have the same result as (15.106) such that

R(s, t) = F(t)F(s)⁻¹ = \frac{1}{2} \begin{pmatrix} e^{-(t-s)} + e^{3(t-s)} & -e^{-(t-s)} + e^{3(t-s)} \\ -e^{-(t-s)} + e^{3(t-s)} & e^{-(t-s)} + e^{3(t-s)} \end{pmatrix}.  (15.117)

2. The resolvent matrix of the present example is real symmetric. This is because if A is real symmetric, i.e., A^T = A, we have

(A^n)^T = (A^T)^n = A^n,  (15.118)

that is, A^n is real symmetric as well. From (15.7), exp A is real symmetric accordingly. In a similar manner, if A is Hermitian, exp A is also Hermitian.

3. Let us think of a case where A is not a constant matrix but varies as a function of t. In that case, we have

A(t) = \begin{pmatrix} a(t) & b(t) \\ c(t) & d(t) \end{pmatrix}.  (15.119)

Also, we have

A(s) = \begin{pmatrix} a(s) & b(s) \\ c(s) & d(s) \end{pmatrix}.  (15.120)

We can easily check that in a general case

A(t)A(s) ≠ A(s)A(t),  (15.121)

namely, A(t) and A(s) are not generally commutative. The matrix tA(t) is not commutative with -sA(s), either. In turn, (15.115) does not hold either. Thus, we need a more elaborate treatment for this case.

Example 15.2 Find the resolvent matrix of the following homogeneous equation:

ẋ(t) = 0,  ẏ(t) = x + y.  (15.122)

In a matrix form, (15.122) can be written as

(ẋ(t) ẏ(t)) = (x y) \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}.  (15.123)

The matrix A = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} is characterized as an idempotent matrix (see Sect. 12.4). As in the case of Example 15.1, we get

(ẋ(t) ẏ(t)) = (x y) PP⁻¹ \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} PP⁻¹,  (15.124)

where we choose P = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} and, hence, P⁻¹ = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}. Notice that in this example, since A is not a symmetric matrix, we did not use a unitary matrix for the similarity transformation. As a diagonal matrix D, we have

D = P⁻¹ \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} P = P⁻¹AP = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.  (15.125)

Or we have

A = PDP⁻¹.  (15.126)

Hence, we get

F(t) = exp tA = exp(tPDP⁻¹) = P(exp tD)P⁻¹,  (15.127)

where we used (15.30) again. From Theorem 15.3, we have

exp tD = \begin{pmatrix} 1 & 0 \\ 0 & e^{t} \end{pmatrix}.  (15.128)

In turn, we get

F(t) = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & e^{t} \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & -1 + e^{t} \\ 0 & e^{t} \end{pmatrix}.  (15.129)

With an inverse matrix, we have

F(t)⁻¹ = \begin{pmatrix} 1 & -1 + e^{-t} \\ 0 & e^{-t} \end{pmatrix}.  (15.130)

Therefore, as a resolvent matrix, we get

R(s, t) = F(s)⁻¹F(t) = \begin{pmatrix} 1 & -1 + e^{t-s} \\ 0 & e^{t-s} \end{pmatrix}.  (15.131)

Using the resolvent matrix, we can include the inhomogeneous term along with BCs (or initial conditions). Once again, by exchanging the order of the products of F(s)⁻¹ and F(t) in (15.131), we obtain the same resolvent matrix. This can readily be checked. Moreover, we get R(s, t) = F(t - s).

Example 15.3 Solve the following equation:

ẋ(t) = x + c,  ẏ(t) = x + y + d,  (15.132)

under the BCs x(0) = σ and y(0) = τ. In a matrix form, (15.132) can be written as

(ẋ(t) ẏ(t)) = (x y) \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} + (c d).  (15.133)

A matrix of the type M = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} has been fully investigated in Sect. 12.6 and

characterized as a matrix that cannot be diagonalized. In such a case, we directly calculate exp tM. Here, remember that a product of triangle matrices of the same kind (i.e., both upper triangle matrices or both lower triangle matrices; see Sect. 12.1) is a triangle matrix of the same kind. We have

M^2 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix},  M^3 = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix},  ⋯.

Repeating this, we get

M^n = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix}.  (15.134)

Thus, we have

exp tM = R(0, t) = E + tM + \frac{1}{2!}t^2M^2 + ⋯ + \frac{1}{ν!}t^νM^ν + ⋯
 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + t \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} + \frac{1}{2!}t^2 \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} + ⋯ + \frac{1}{ν!}t^ν \begin{pmatrix} 1 & ν \\ 0 & 1 \end{pmatrix} + ⋯
 = \begin{pmatrix} e^{t} & t\left[1 + t + \frac{1}{2!}t^2 + ⋯ + \frac{1}{(ν-1)!}t^{ν-1} + ⋯\right] \\ 0 & e^{t} \end{pmatrix} = \begin{pmatrix} e^{t} & te^{t} \\ 0 & e^{t} \end{pmatrix}.  (15.135)

With the resolvent matrix, we get

R(s, t) = exp(-sM) exp tM = \begin{pmatrix} e^{t-s} & (t - s)e^{t-s} \\ 0 & e^{t-s} \end{pmatrix}.  (15.136)

Using (15.107), we get

x(t) = (σ τ)R(0, t) + \int_0^t ds\,[(c d)R(s, t)].  (15.137)

Thus, we obtain the solution described by

x(t) = (c + σ)e^{t} - c,
y(t) = (c + σ)te^{t} + (d - c + τ)e^{t} + c - d.  (15.138)
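A short Python sketch (illustrative only, standard library; not part of the text) can confirm the Jordan-type exponential (15.135) by direct summation of the series and check that (15.138) solves (15.132) with the stated BCs:

```python
import math

# Taylor sum of exp(tM) for the non-diagonalizable M of (15.133)
def exp_taylor(M, t, terms=60):
    S = [[1.0, 0.0], [0.0, 1.0]]
    T = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        T = [[sum(T[i][k] * t * M[k][j] / n for k in range(2)) for j in range(2)]
             for i in range(2)]
        S = [[S[i][j] + T[i][j] for j in range(2)] for i in range(2)]
    return S

M = [[1.0, 1.0], [0.0, 1.0]]
t = 0.9
closed = [[math.exp(t), t * math.exp(t)], [0.0, math.exp(t)]]   # (15.135)
err = max(abs(exp_taylor(M, t)[i][j] - closed[i][j])
          for i in range(2) for j in range(2))

# the solution (15.138) under x(0) = sigma, y(0) = tau
c, d, sigma, tau = 0.5, -0.2, 1.0, 2.0
x = lambda u: (c + sigma) * math.exp(u) - c
y = lambda u: (c + sigma) * u * math.exp(u) + (d - c + tau) * math.exp(u) + c - d

# residuals of (15.132): x' = x + c, y' = x + y + d
h = 1e-6
res_x = (x(t + h) - x(t - h)) / (2 * h) - (x(t) + c)
res_y = (y(t + h) - y(t - h)) / (2 * h) - (x(t) + y(t) + d)
```

The series agrees with the closed form even though M has no diagonalization; only the nilpotent part M - E terminates the off-diagonal sum into te^t.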

From (15.138), we find that the original inhomogeneous equation (15.132) and the BCs x(0) = σ and y(0) = τ are recovered.

Example 15.4 Find the resolvent matrix of the following homogeneous equation:

ẋ(t) = 0,  ẏ(t) = x.  (15.139)

In a matrix form, (15.139) can be written as

(ẋ(t) ẏ(t)) = (x y) \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.  (15.140)

The matrix N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} is characterized as a nilpotent matrix (Sect. 12.3). Since N^2 and all higher powers of N vanish, we have

exp tN = E + tN = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}.  (15.141)

We have a corresponding resolvent matrix such that

R(s, t) = \begin{pmatrix} 1 & t - s \\ 0 & 1 \end{pmatrix}.  (15.142)

Using the resolvent matrix, we can include the inhomogeneous term along with BCs.

Example 15.5 As a trivial case, let us consider the following homogeneous equation:

(ẋ(t) ẏ(t)) = (x y) \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix},  (15.143)

where one of a and b can be zero, or both of them can be zero. Even though (15.143) merely apposes two FOLDEs, we can deal with such cases in an automatic manner. From Theorem 15.3, as eigenvalues of exp A, where A = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}, we have e^{a} and e^{b}. That is, we have

F(t) = exp tA = \begin{pmatrix} e^{at} & 0 \\ 0 & e^{bt} \end{pmatrix}.  (15.144)

In particular, if a = b (including a = b = 0), we merely have the same FOLDE twice with the arguments x and y interchanged. Readers may well wonder why we have to argue such a trivial case expressly. That is because the theory we developed in this chapter holds widely and enables us to make a clear forecast about the constitution of the solution for differential equations of first order. For instance, (15.144) immediately tells us that a fundamental solution for

ẋ(t) = ax(t)  (15.145)

is e^{at}. We only have to perform routine calculations with the counterpart x(t) [or y(t)] of (x(t) y(t)) using (15.107) to solve an inhomogeneous differential equation under BCs.

Returning to Example 15.1, we had

(ẋ(t) ẏ(t)) = (x y) \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} + (1 2).  (15.96)

This equation can be converted to

(ẋ(t) ẏ(t))U = (x y)UU⁻¹ \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} U + (1 2)U,  (15.146)

where U = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}. Defining (X Y) as

(X Y) ≡ (x y)U = (x y) \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix},  (15.147)

we have

(Ẋ(t) Ẏ(t)) = (X Y) \begin{pmatrix} -1 & 0 \\ 0 & 3 \end{pmatrix} + (1 2)U.  (15.148)

Then, according to (15.90), we get a solution

X(t) = X(t_0)R̃(t_0, t) + \int_{t_0}^{t} ds\,[b̃(s)R̃(s, t)],  (15.149)

where X(t_0) = (X(t_0) Y(t_0)) and b̃(s) = b(s)U. The resolvent matrix R̃(s, t) is given by

R̃(s, t) = \begin{pmatrix} e^{-(t-s)} & 0 \\ 0 & e^{3(t-s)} \end{pmatrix}.  (15.150)

To convert (X Y) to (x y), operating U⁻¹ on both sides of (15.149) from the right, we obtain

X(t)U⁻¹ = X(t_0)U⁻¹U R̃(t_0, t)U⁻¹ + \int_{t_0}^{t} ds\,[b̃(s)U⁻¹U R̃(s, t)U⁻¹],  (15.151)

where in the first and second terms we insert U⁻¹U = E. Then, we get

x(t) = x(t_0)[U R̃(t_0, t)U⁻¹] + \int_{t_0}^{t} ds\,b(s)[U R̃(s, t)U⁻¹].  (15.152)

We calculate U R̃(s, t)U⁻¹ to have

U R̃(s, t)U⁻¹ = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} e^{-(t-s)} & 0 \\ 0 & e^{3(t-s)} \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}
 = \frac{1}{2} \begin{pmatrix} e^{-(t-s)} + e^{3(t-s)} & -e^{-(t-s)} + e^{3(t-s)} \\ -e^{-(t-s)} + e^{3(t-s)} & e^{-(t-s)} + e^{3(t-s)} \end{pmatrix} = R(s, t).  (15.153)

Thus, we recover (15.90), i.e., we have

x(t) = x(t_0)R(t_0, t) + \int_{t_0}^{t} ds\,[b(s)R(s, t)].  (15.90)

Equation (15.153) shows that R(s, t) and R̃(s, t) are connected to each other through the unitary similarity transformation such that

R̃(s, t) = U⁻¹R(s, t)U = U†R(s, t)U.  (15.154)

Example 15.6 Let us consider the following system of differential equations:

(ẋ(t) ẏ(t)) = (x y) \begin{pmatrix} 0 & -ω \\ ω & 0 \end{pmatrix} + (a 0).  (15.155)

The matrix \begin{pmatrix} 0 & -ω \\ ω & 0 \end{pmatrix} is characterized as an anti-Hermitian operator. This type of operator frequently appears in the Lie groups and Lie algebras that we will deal with in Chap. 20. Here we define an operator D as

D = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}  or  ωD = \begin{pmatrix} 0 & -ω \\ ω & 0 \end{pmatrix}.  (15.156)

The calculation procedures will be shown in Chap. 20, and so we only quote the result here. That is, we have

exp tωD = \begin{pmatrix} \cos ωt & -\sin ωt \\ \sin ωt & \cos ωt \end{pmatrix}.  (15.157)

We seek the resolvent matrix R(s, t) such that

R(s, t) = \begin{pmatrix} \cos ω(t-s) & -\sin ω(t-s) \\ \sin ω(t-s) & \cos ω(t-s) \end{pmatrix}.  (15.158)

The implication of (15.158) is simple; the synthesis of a rotation by ωt and the inverse rotation by -ωs is described by ω(t - s). Physically, (15.155) represents the motion of a point mass under an external field. We routinely obtain the above solution using (15.90). If, for example, we solve (15.155) under the BCs for which the point mass is placed at rest at the origin of the xy-plane at t = 0, we get a solution described by

x = \frac{a}{ω} \sin ωt,  y = \frac{a}{ω}(\cos ωt - 1).  (15.159)

Deleting t from (15.159), we get

x^2 + \left(y + \frac{a}{ω}\right)^2 = \left(\frac{a}{ω}\right)^2.  (15.160)

Fig. 15.2 Circular motion of a point mass
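The rotation-matrix exponential (15.157) and the orbit implied by (15.159) and (15.160) can be verified numerically; the following Python fragment (an illustrative sketch with arbitrarily chosen ω, a, and t; not part of the text) does so with a truncated series:

```python
import math

# Taylor sum of exp(t*omega*D) with D = [[0, -1], [1, 0]]
def exp_taylor(A, t, terms=60):
    S = [[1.0, 0.0], [0.0, 1.0]]
    T = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        T = [[sum(T[i][k] * t * A[k][j] / n for k in range(2)) for j in range(2)]
             for i in range(2)]
        S = [[S[i][j] + T[i][j] for j in range(2)] for i in range(2)]
    return S

omega, t, a = 2.0, 0.6, 1.5
wD = [[0.0, -omega], [omega, 0.0]]
rot = [[math.cos(omega * t), -math.sin(omega * t)],
       [math.sin(omega * t), math.cos(omega * t)]]     # (15.157)
err_rot = max(abs(exp_taylor(wD, t)[i][j] - rot[i][j])
              for i in range(2) for j in range(2))

x = (a / omega) * math.sin(omega * t)                  # (15.159)
y = (a / omega) * (math.cos(omega * t) - 1.0)
circle_residual = x**2 + (y + a / omega)**2 - (a / omega)**2   # (15.160)
```

The residual of (15.160) vanishes identically, reflecting that the point mass stays on the circle for every t.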

In other words, the point mass performs a circular motion (see Fig. 15.2). In Example 15.1 we mentioned that if the matrix A that defines the system of differential equations varies as a function of t, then the solution method based upon exp tA does not work. Yet, we have an important case where A is not a constant matrix. Let us consider the next illustration.

15.4 Motion of a Charged Particle in Polarized Electromagnetic Wave

In Sect. 4.5 we dealt with an electron motion under a circularly polarized light. Here we consider the motion of a charged particle in a linearly polarized electromagnetic wave. Suppose that a charged particle is placed in vacuum. Suppose also that an electromagnetic wave (light) linearly polarized along the x-direction is propagated in the positive direction of the z-axis. Unlike the case of Sect. 4.5, we have to take account of the influence from both the electric field and magnetic field components of the Lorentz force. From (7.58) we have

E = E_0 e^{i(kz - ωt)} = E_0 e_1 e^{i(kz - ωt)},  (15.161)

where e_1, e_2, and e_3 are the unit vectors of the x-, y-, and z-directions. (The latter two unit vectors appear just below.) If the extent of motion of the charged particle around the origin is narrow enough compared to the wavelength of the electromagnetic field, we can ignore kz in the exponent (see Sect. 4.5). Then, we have

E ≈ E_0 e_1 e^{-iωt}.  (15.162)

Taking the real part of (15.162), we get

E = E_0 e_1 \cos ωt.  (15.163)

Meanwhile, from (7.59) we have

H = H_0 e^{i(kz - ωt)} ≈ H_0 \cos ωt.  (15.164)

From (7.60), furthermore, we get

H_0 = e_3 × E_0 \sqrt{ε_0/μ_0} = e_3 × \frac{E_0 e_1}{μ_0 c} = \frac{E_0 e_2}{μ_0 c}.  (15.165)

Thus, as the magnetic flux density, we have

B = μ_0 H ≈ μ_0 H_0 \cos ωt = \frac{E_0 e_2 \cos ωt}{c}.  (15.166)

The Lorentz force F exerted on the charged particle is then described by

F = eE + eẋ × B,  (15.167)

where x (= xe_1 + ye_2 + ze_3) is the position vector of the charged particle having a charge e. In (15.167) the first term represents the electric Lorentz force and the second term the magnetic Lorentz force; see Sect. 4.5. Replacing E and B in (15.167) with those in (15.163) and (15.166), we get

F = \left(1 - \frac{ż}{c}\right) eE_0 \cos ωt \, e_1 + \frac{eE_0}{c} ẋ \cos ωt \, e_3 = m\ddot{x}e_1 + m\ddot{y}e_2 + m\ddot{z}e_3,  (15.168)

where m is the mass of the charged particle. Comparing the individual vector components, we have

m\ddot{x} = \left(1 - \frac{ż}{c}\right) eE_0 \cos ωt,
m\ddot{y} = 0,
m\ddot{z} = \frac{eE_0}{c} ẋ \cos ωt.  (15.169)

Note that the Lorentz force is not exerted in the direction of the y-axis. Putting a ≡ eE_0/m and b ≡ eE_0/mc, we get

\ddot{x} = a \cos ωt - b ż \cos ωt,
\ddot{z} = b ẋ \cos ωt.  (15.170)

Further putting ẋ = ξ and ż = ζ, we obtain

ξ̇ = -bζ \cos ωt + a \cos ωt,
ζ̇ = bξ \cos ωt.  (15.171)

Writing (15.171) in a matrix form, we obtain the following system of inhomogeneous differential equations:

(ξ̇(t) ζ̇(t)) = (ξ ζ) \begin{pmatrix} 0 & b\cos ωt \\ -b\cos ωt & 0 \end{pmatrix} + (a\cos ωt \; 0),  (15.172)

where the inhomogeneous term is given by (a cos ωt  0). First, we think of the homogeneous equation described by

(ξ̇(t) ζ̇(t)) = (ξ ζ) \begin{pmatrix} 0 & b\cos ωt \\ -b\cos ωt & 0 \end{pmatrix}.  (15.173)

Meanwhile, we consider the following equation:

\frac{d}{dt} \begin{pmatrix} \cos f(t) & \sin f(t) \\ -\sin f(t) & \cos f(t) \end{pmatrix} = \begin{pmatrix} -ḟ(t)\sin f(t) & ḟ(t)\cos f(t) \\ -ḟ(t)\cos f(t) & -ḟ(t)\sin f(t) \end{pmatrix}
 = \begin{pmatrix} \cos f(t) & \sin f(t) \\ -\sin f(t) & \cos f(t) \end{pmatrix} \begin{pmatrix} 0 & ḟ(t) \\ -ḟ(t) & 0 \end{pmatrix}.  (15.174)

Closely inspecting (15.173) and (15.174), we find that if we can choose f(t) so that it satisfies

ḟ(t) = b \cos ωt,  (15.175)

we might well be able to solve (15.172). In that case, moreover, we expect that the two linearly independent column vectors

\begin{pmatrix} \cos f(t) \\ -\sin f(t) \end{pmatrix}  and  \begin{pmatrix} \sin f(t) \\ \cos f(t) \end{pmatrix}  (15.176)

give a fundamental set of solutions of (15.172). A function f(t) that satisfies (15.175) is given by

f(t) = \frac{b}{ω} \sin ωt = b̃ \sin ωt,  (15.177)

where b̃ ≡ b/ω. Notice that at t = 0 the two column vectors of (15.176) give

\begin{pmatrix} 1 \\ 0 \end{pmatrix}  and  \begin{pmatrix} 0 \\ 1 \end{pmatrix},

if we choose f(t) = b̃ sin ωt for f(t) in (15.176). Thus, defining F(t) as

F(t) ≡ \begin{pmatrix} \cos(b̃ \sin ωt) & \sin(b̃ \sin ωt) \\ -\sin(b̃ \sin ωt) & \cos(b̃ \sin ωt) \end{pmatrix}  (15.178)

and taking account of (15.174), we recast the homogeneous equation (15.173) as

\frac{dF(t)}{dt} = F(t)A,  A ≡ \begin{pmatrix} 0 & b\cos ωt \\ -b\cos ωt & 0 \end{pmatrix}.  (15.179)
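That F(t) of (15.178) indeed obeys (15.179) even though A depends on t can be checked numerically. The following Python sketch (illustrative only, with arbitrarily chosen parameter values) compares a central-difference derivative of F with the product F(t)A(t):

```python
import math

# F(t) of (15.178) with f(t) = (b/omega) sin(omega t), and A(t) of (15.179)
b, omega = 1.3, 2.0
f = lambda t: (b / omega) * math.sin(omega * t)        # (15.177)

def F(t):                                              # (15.178)
    return [[math.cos(f(t)), math.sin(f(t))],
            [-math.sin(f(t)), math.cos(f(t))]]

def A(t):                                              # (15.179)
    return [[0.0, b * math.cos(omega * t)],
            [-b * math.cos(omega * t), 0.0]]

t, h = 0.45, 1e-6
dF = [[(F(t + h)[i][j] - F(t - h)[i][j]) / (2 * h) for j in range(2)]
      for i in range(2)]
FA = [[sum(F(t)[i][k] * A(t)[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]
err = max(abs(dF[i][j] - FA[i][j]) for i in range(2) for j in range(2))
```

Note that the product must be taken in the order F(t)A(t); with a t-dependent A the two factors do not commute with their own time derivatives in general.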

Equation (15.179) is formally the same as (15.47), even though F(t) is not given in the form of exp tA. This enables us to apply the general scheme mentioned in Sect. 15.3.2 to the present case. That is, we are going to address the given problem using the method of variation of constants such that

x(t) = k(t)F(t),  (15.78)

where we define x(t) ≡ (ξ(t) ζ(t)) and k(t) ≡ (u(t) v(t)) as a variable constant. Hence, using (15.80)–(15.90), we should be able to find the solution of (15.172) described by

x(t) = x(t_0)R(t_0, t) + \int_{t_0}^{t} ds\,[b(s)R(s, t)],  (15.90)

where b(s) ≡ (a cos ωs  0), i.e., the inhomogeneous term in (15.172). The resolvent matrix R(s, t) is expressed as

R(s, t) = F(s)⁻¹F(t) = \begin{pmatrix} \cos[f(t) - f(s)] & \sin[f(t) - f(s)] \\ -\sin[f(t) - f(s)] & \cos[f(t) - f(s)] \end{pmatrix},  (15.180)

where f(t) is given by (15.177). For F(s)⁻¹, we simply take the transpose matrix of (15.178), because it is an orthogonal matrix. Thus, we obtain

F(s)⁻¹ = \begin{pmatrix} \cos(b̃ \sin ωs) & -\sin(b̃ \sin ωs) \\ \sin(b̃ \sin ωs) & \cos(b̃ \sin ωs) \end{pmatrix}.

The resolvent matrix (15.180) possesses the properties described as (15.85), (15.86), and (15.87). These can readily be checked. Also, we become aware that F(t) of (15.178) represents a rotation matrix in ℝ²; see Sect. 11.2. Therefore, F(t) and F(s)⁻¹ are commutative. Notice, however, that we do not have a resolvent matrix of the simple form that appears in (15.116). In fact, from (15.178) we find R(s, t) ≠ F(t - s). This is because the matrix A given in (15.179) is not constant but depends on t. Rewriting (15.90) as

x(t) = x(0)R(0, t) + \int_0^t ds\,[(a\cos ωs \; 0)R(s, t)]  (15.181)

and setting the BCs as x(0) = (σ τ), with the individual components we finally obtain

ξ(t) = σ\cos(b̃ \sin ωt) - τ\sin(b̃ \sin ωt) + \frac{a}{b}\sin(b̃ \sin ωt),
ζ(t) = σ\sin(b̃ \sin ωt) + τ\cos(b̃ \sin ωt) - \frac{a}{b}\cos(b̃ \sin ωt) + \frac{a}{b}.  (15.182)

Note that differentiating both sides of (15.182) with respect to t, we recover the original equation (15.172) to be solved. Further setting ξ(0) = σ = 0 and ζ(0) = τ = 0 as BCs, we have

ξ(t) = \frac{a}{b}\sin(b̃ \sin ωt),
ζ(t) = \frac{a}{b}[1 - \cos(b̃ \sin ωt)].  (15.183)

Fig. 15.3 Velocities ξ and ζ of a charged particle in the x- and z-directions, respectively. The charged particle undergoes a periodic motion under the influence of a linearly polarized electromagnetic wave

Notice that the above BCs correspond to the situation where the charged particle is initially placed at the origin at rest (i.e., with zero velocity). Deleting the argument t, we get

ξ^2 + \left(ζ - \frac{a}{b}\right)^2 = \left(\frac{a}{b}\right)^2.  (15.184)
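Equation (15.184) lends itself to a numerical cross-check. The following Python sketch (illustrative only; a fourth-order Runge–Kutta integration with arbitrarily chosen parameter values) integrates (15.171) from rest and compares the result with (15.183) and (15.184):

```python
import math

# RK4 integration of (15.171) starting from rest
a, b, omega = 1.0, 0.8, 3.0
bt = b / omega                                  # b-tilde of (15.177)

def rhs(t, xi, zeta):
    c = math.cos(omega * t)
    return (a * c - b * zeta * c, b * xi * c)   # (15.171)

xi, zeta, t, h = 0.0, 0.0, 0.0, 1e-4
for _ in range(20000):                          # integrate up to t = 2
    k1 = rhs(t, xi, zeta)
    k2 = rhs(t + h / 2, xi + h / 2 * k1[0], zeta + h / 2 * k1[1])
    k3 = rhs(t + h / 2, xi + h / 2 * k2[0], zeta + h / 2 * k2[1])
    k4 = rhs(t + h, xi + h * k3[0], zeta + h * k3[1])
    xi += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
    zeta += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    t += h

xi_exact = (a / b) * math.sin(bt * math.sin(omega * t))          # (15.183)
zeta_exact = (a / b) * (1 - math.cos(bt * math.sin(omega * t)))
circle_residual = xi**2 + (zeta - a / b)**2 - (a / b)**2          # (15.184)
```

The integrated velocities track the closed form (15.183) to machine precision and remain on the circle (15.184), with ζ staying nonnegative throughout.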

Figure 15.3 depicts the particle velocities ξ and ζ in the x- and z-directions, respectively. It is interesting that although the velocity in the x-direction switches between negative and positive, that in the z-direction is kept nonnegative. This implies that the particle is oscillating along the x-direction but continually drifting in the z-direction while oscillating. In fact, if we integrate ζ(t), we have a continually drifting term (a/b)t. Figure 15.4 shows the Lorentz force F as a function of time in the case where the charge of the particle is positive. Again, the electric Lorentz force (represented by E) switches between positive and negative in the x-direction, but the magnetic Lorentz force (represented by ẋ × B) remains nonnegative in the z-direction at all times. To precisely analyze the positional change of the particle, we need to integrate (15.182) once again. This requires a detailed numerical calculation. For this purpose the following relations are useful [7]:

\cos(x \sin t) = \sum_{m=-∞}^{∞} J_m(x) \cos mt,
\sin(x \sin t) = \sum_{m=-∞}^{∞} J_m(x) \sin mt,  (15.185)

where the functions J_m(x) are called the Bessel functions, which satisfy the following equation:

\frac{d^2 J_n(x)}{dx^2} + \frac{1}{x}\frac{dJ_n(x)}{dx} + \left(1 - \frac{n^2}{x^2}\right)J_n(x) = 0.  (15.186)

Fig. 15.4 Lorentz force as a function of time. In (a) and (b), the phase of the electromagnetic fields E and B is reversed. As a result, the electric Lorentz force (represented by E) switches between positive and negative in the x-direction, but the magnetic Lorentz force (represented by ẋ × B) remains nonnegative in the z-direction at all times
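The expansions (15.185) can be tested numerically without any special-function library by evaluating J_m(x) from Bessel's integral representation J_m(x) = (1/π)∫₀^π cos(mτ − x sin τ) dτ, which holds for integer m. The following Python sketch (illustrative only) truncates the sums at |m| ≤ 12:

```python
import math

# J_m(x) from Bessel's integral representation, evaluated with the
# trapezoidal rule (spectrally accurate here, since the integrand extends
# to a smooth periodic function)
def bessel_j(m, x, n=2000):
    h = math.pi / n
    vals = [math.cos(m * k * h - x * math.sin(k * h)) for k in range(n + 1)]
    return (h / math.pi) * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

x, t, M = 1.2, 0.7, 12       # truncate the sums (15.185) at |m| <= M
series_cos = sum(bessel_j(m, x) * math.cos(m * t) for m in range(-M, M + 1))
series_sin = sum(bessel_j(m, x) * math.sin(m * t) for m in range(-M, M + 1))
err_cos = abs(series_cos - math.cos(x * math.sin(t)))
err_sin = abs(series_sin - math.sin(x * math.sin(t)))
```

The truncation converges very fast for moderate x, since J_m(x) decays roughly like (x/2)^m/m! for large m.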

Although we do not examine the properties of the Bessel functions in this book, the related topics are dealt with in detail in the literature [5, 7–9]. The Bessel functions are widely investigated as one of the special functions in many branches of mathematical physics.

To conclude this section: we have described an example in which the matrix A of (15.179) that characterizes the system of differential equations is not a constant matrix but depends on the parameter t. Unlike the case of a constant matrix A, it may well be difficult to find a fundamental set of solutions. Once we find a fundamental set of solutions, however, it is always possible to construct the resolvent matrix and solve the problem with it. In this respect, we have already encountered a similar situation in Chap. 10, where Green's functions were constructed from a fundamental set of solutions. In this section, a fundamental set of solutions could be found and, as a consequence, we could construct the resolvent matrix. It is, however, not always the case that we can recast the system of differential equations (15.66) in the form of (15.179). Nonetheless, whenever we succeed in finding the fundamental set of solutions, we can construct a resolvent matrix and, hence, solve the problem. This is a generic and powerful tool.

References

1. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
2. Satake I (1974) Linear algebra. Shokabo, Tokyo
3. Rudin W (1987) Real and complex analysis, 3rd edn. McGraw-Hill, New York
4. Yamanouchi T, Sugiura M (1960) Introduction to continuous groups. Baifukan, Tokyo
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
6. Inami T (1998) Ordinary differential equations. Iwanami, Tokyo
7. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering, 3rd edn. Cambridge University Press, Cambridge
8. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
9. Hassani S (2006) Mathematical physics. Springer, New York

Part IV

Group Theory and Its Chemical Applications

The universe comprises space and matter. These two mutually stipulate their modality of existence. We often comprehend the related various aspects as a manifestation of symmetry. In this part we deal with symmetry from the point of view of group theory, and we outline and emphasize chemical applications of the methods of mathematical physics.

This part supplies us with an introductory description of group theory, which forms an important field of both pure and applied mathematics. Starting with the definition of groups, we cover a variety of topics related to group theory. Of these, symmetry groups are familiar to chemists, because chemists deal with a variety of matter and molecules that are characterized by different types of symmetry. A symmetry group is a kind of finite group and is also called a point group. Meanwhile, we have various infinite groups, which include the rotation group as a typical example. We also present an introductory theory of the rotation group SO(3), which deals with important topics such as Euler angles, and we treat successive coordinate transformations.

Next, we describe the representation theory of groups. Schur's lemmas and the related grand orthogonality theorem underlie the representation theory of groups. In parallel, characters and irreducible representations are important concepts that support the representation theory. We present various representations, e.g., regular representation, direct-product representation, and symmetric and antisymmetric representations. These have wide applications in the fields of quantum mechanics, quantum chemistry, and so forth. On the basis of the above topics, we highlight quantum chemical applications of group theory in relation to the method of molecular orbitals. As tangible examples, we adopt aromatic molecules and methane.

The last part deals with the theory of continuous groups. The relevant theory has wide applications in many fields of both pure and applied physics and chemistry. We highlight the topics of SU(2) and SO(3) that very often appear there. Tangible examples help in understanding the essence.

Chapter 16
Introductory Group Theory

A group comprises mathematical elements that satisfy four simple definitions. A bunch of groups exists under these simple definitions. This makes group theory a discriminating field of mathematics. To get familiar with various concepts of groups, we first show several tangible examples. Group elements can be numbers (both real and complex) and matrices. More abstract mathematical elements can be included as well; examples include transformations, operations, etc. as already studied in previous parts. Once those mathematical elements form a group, they share several common notions such as classes, subgroups, and direct-product groups. In this context, readers are encouraged to conceive different kinds of groups close to their heart. Mapping is an important concept, as in the case of vector spaces. In particular, isomorphism and homomorphism frequently appear in group theory. These concepts are closely related to the representation theory that is an important pillar of group theory.

16.1 Definition of Groups

In contrast to its broad range of applications, the definition of a group is simple. Let ℊ be a set of elements g_ν, where ν is an index either countable (e.g., integers) or uncountable (e.g., real numbers), and the number of elements may be finite or infinite. We denote this by ℊ = {g_ν}. If the group is a finite group, we express it as

ℊ = {g_1, g_2, ⋯, g_n},  (16.1)

where n is said to be the order of the group. The definition of the group comprises the following four axioms with respect to a well-defined "multiplication" rule between any pair of elements. The multiplication

© Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_16


is denoted by a symbol "⋄" below. Note that the symbol ⋄ implies an ordinary multiplication, an ordinary addition, etc.

(A1) If a and b are any elements of the set ℊ, then so is a ⋄ b. (We sometimes say that the set is "closed" regarding the multiplication.)
(A2) Multiplication is associative, i.e., a ⋄ (b ⋄ c) = (a ⋄ b) ⋄ c.
(A3) The set ℊ contains an element e called the identity element such that we have a ⋄ e = e ⋄ a = a with any element a of ℊ.
(A4) For any a of ℊ, we have an element b such that a ⋄ b = b ⋄ a = e. The element b is said to be the inverse element of a. We denote b ≡ a⁻¹.

In the above definitions, we assume that the commutative law does not necessarily hold, that is, a ⋄ b ≠ b ⋄ a. In that case the group ℊ is said to be a non-commutative group. However, we have cases where the commutative law holds, i.e., a ⋄ b = b ⋄ a. If so, the group ℊ is called a commutative group or an Abelian group. Let us think of some examples of groups. Henceforth, we follow the convention and write ab to express a ⋄ b.

Example 16.1 We present several examples of groups below. Examples (1)–(4) are simple, but Example (5) is general.

1. ℊ = {1, -1}. The set ℊ makes a group with respect to the multiplication. This is an example of a finite group.

2. ℊ = {⋯, -3, -2, -1, 0, 1, 2, 3, ⋯}. The set ℊ makes a group with respect to the addition. This is an infinite group. For instance, take a (>0) and make a + 1, then (a + 1) + 1, [(a + 1) + 1] + 1, ⋯ again and again. Thus, the addition never closes within finitely many elements and, hence, we must have an infinite group.

3. Let us start with a matrix a = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. We have a^2 = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}. Then, the inverse is a⁻¹ = a^3 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. The inverse of a^2 is a^2 itself. These four elements make a group. That is, ℊ = {e, a, a^2, a^3}. This is an example of cyclic groups.

4. ℊ = {1}. It is a most trivial case, but sometimes the trivial case is very important as well. We will come back later to this point.

5. Let us think of a more general case. In Chap. 11, we discussed endomorphism on a vector space and showed the necessary and sufficient condition for the existence of an inverse transformation. In this relation, consider a set that comprises matrices such that

GL(n, ℂ) ≡ {A = (a_{ij}); i, j = 1, 2, ⋯, n, a_{ij} ∈ ℂ, det A ≠ 0}.

A group of this kind may be either a finite group or an infinite group. The former can be a symmetry group and the latter can be a rotation group. This group is characterized as a set of invertible and endomorphic linear transformations over a vector space V^n and is called a linear transformation group or a general linear group, denoted by GL(n, ℂ), GL(V^n), GL(V), etc. The relevant transformations are bijective. We can

readily make sure that axioms (A1) to (A4) are satisfied with GL(n, ℂ). Here, the vector space can be ℂ^n or a function space.

The structure of a finite group is tabulated in a multiplication table. This is made up such that the group elements are arranged in a first row and a first column and that the intersection of an element g_i in the row and g_j in the column is designated as the product g_i ⋄ g_j. Choosing the above (3) as an example, we make its multiplication table (see Table 16.1). There we define a^2 = b and a^3 = c.

Table 16.1 Multiplication table of ℊ = {e, a, a^2, a^3}

  ℊ         | e | a | b ≡ a^2 | c ≡ a^3
  e         | e | a | b       | c
  a         | a | b | c       | e
  b ≡ a^2   | b | c | e       | a
  c ≡ a^3   | c | e | a       | b

Having a look at Table 16.1, we notice that in the individual rows and columns each group element appears once and only once. This is well known as the rearrangement theorem.

Theorem 16.1 Rearrangement theorem [1] In each row or each column of the group multiplication table, individual group elements appear once and only once. From this, each row and each column list merely rearranged group elements.

Proof Let a set ℊ = {g_1 ≡ e, g_2, ⋯, g_n} be a group. Arbitrarily choosing any element h from ℊ and multiplying the individual elements by h, we obtain a set ℋ = {hg_1, hg_2, ⋯, hg_n}. Then, all the group elements of ℊ appear in ℋ once and only once. Choosing any group element g_i, let us multiply g_i by h⁻¹ to get h⁻¹g_i. Since h⁻¹g_i must be a certain element g_k of ℊ, we put h⁻¹g_i = g_k. Multiplying both sides by h, we have g_i = hg_k. Therefore, we are able to find this very element hg_k in ℋ, i.e., g_i in ℋ. This implies that the element g_i necessarily appears in ℋ. Suppose in turn that g_i appears more than once. Then, we must have g_i = hg_k = hg_l (k ≠ l). Multiplying the relation by h⁻¹, we would get h⁻¹g_i = g_k = g_l, in contradiction to the supposition. This means that g_i appears in ℋ once and only once. This confirms that the theorem is true of each row of the group multiplication table. A similar argument applies with a set ℋ′ = {g_1h, g_2h, ⋯, g_nh}. This confirms in turn that the theorem is true of each column of the group multiplication table. These complete the proof.
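The cyclic group of Example 16.1 (3) and the rearrangement theorem can be illustrated with a short script. The following Python fragment (an illustrative sketch, not from the text) rebuilds Table 16.1 as an index table and checks that every row and column is a permutation of the group elements:

```python
# The cyclic group {e, a, a^2, a^3} realized as 2x2 integer matrices
def matmul(P, Q):
    return tuple(tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

e = ((1, 0), (0, 1))
a = ((0, 1), (-1, 0))
G = [e, a, matmul(a, a), matmul(a, matmul(a, a))]      # {e, a, a^2, a^3}

# axiom (A1): the set is closed under multiplication
closed = all(matmul(g, h) in G for g in G for h in G)

# entry (i, j) of the table is the index of g_i . g_j in G
table = [[G.index(matmul(g, h)) for h in G] for g in G]
rows_ok = all(sorted(row) == [0, 1, 2, 3] for row in table)
cols_ok = all(sorted(table[i][j] for i in range(4)) == [0, 1, 2, 3]
              for j in range(4))
```

Each row and each column of `table` is a permutation of {0, 1, 2, 3}, exactly as Theorem 16.1 asserts.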

16.2 Subgroups

As we think of subspaces in a linear vector space, we have subgroups in a group. The definition of a subgroup is that a subset ℋ of a group makes a group with respect to the multiplication ⋄ that is defined for the group ℊ. The identity element makes a

group by itself. Both {e} and ℊ are subgroups as well. We often call subgroups other than {e} and ℊ “proper” subgroups. A necessary and sufficient condition for the subset H to be a subgroup is the following: ð1Þ hi , hj 2 H ⟹ hi ⋄hj 2 H , ð 2Þ h 2 H ⟹ h - 1 2 H : If H is a subgroup of ℊ, it is obvious that the relations (1) and (2) hold. Conversely, if (1) and (2) hold, H is a subgroup. In fact, (1) ensures the aforementioned relation (A1). Since H is a subset of ℊ, this guarantees the associative law (A2). The relation (2) ensures (A4). Finally, in virtue of (1) and (2), h ⋄ h-1 = e is contained in H ; this implies that (A3) is satisfied. Thus, H is a subgroup, because H satisfies the axioms (A1) to (A4). Of the above examples, (3) has a subgroup H = e, a2 . It is important to decompose a set into subsets that do not mutually contain an element (except for a special element) among them. We saw this in Part III when we decomposed a linear vector space into subspaces. In that case the said special element was a zero vector. Here let us consider a related question in its similar aspects. Let H = fh1  e, h2 , ⋯, hs g be a subgroup of ℊ. Also let us consider aH where ∃ a 2 ℊ and a= 2H . Suppose that aH is a subset of ℊ such that aH = fah1 , ah2 , ⋯, ahs g. Then, we have another subset H þ aH . If H contains s elements, so does aH . In fact, if it were not the case, namely, if ahi = ahj, multiplying both sides by a-1, we would have hi = hj, in contradiction. Next, let us take b such that b= 2H and b= 2aH and make up bH and H þ aH þ bH successively. Our question is whether these procedures decompose ℊ into subsets mutually exclusive and collectively exhaustive. Suppose that we can succeed in such a decomposition and get ℊ = g1 H þ g2 H þ ⋯ þ gk H ,

ð16:2Þ

where g1, g2, ⋯, gk are mutually different elements with g1 being the identity e. In that case (16.2) is said to be the left coset decomposition of ℊ by H. Similarly, right coset decomposition can be done to give

ℊ = Hg1 + Hg2 + ⋯ + Hgk.   (16.3)

In general, however,

gkH ≠ Hgk, or gkHgk⁻¹ ≠ H.   (16.4)

Taking the case of the left coset as an example, let us examine whether different cosets mutually contain a common element. Suppose that giH and gjH mutually


contain a common element. Then, that element would be expressed as gihp = gjhq (1 ≤ i, j ≤ k; 1 ≤ p, q ≤ s). Thus, we have gihphq⁻¹ = gj. Since H is a subgroup of ℊ, hphq⁻¹ ∈ H. This implies that gj ∈ giH, in contradiction to the definition of the left coset decomposition. Thus, we conclude that different cosets do not mutually contain a common element. Suppose that the order of ℊ and H is n and s, respectively. The k different cosets comprise s elements individually, and different cosets do not mutually possess a common element; hence, we must have

n = sk,   (16.5)

where k is called the index of H. We will have many examples afterward.
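The coset decomposition (16.2) and the relation n = sk of (16.5) can be made concrete with a small example of our own (not from the text): take ℊ = S3, the permutations of three objects, and a cyclic subgroup H of order s = 3. The sketch below represents a permutation as a tuple p with p[i] the image of i:

```python
from itertools import permutations

# Sketch: S3 as tuples p, where p[i] is the image of i;
# composition (p*q)(i) = p[q[i]].
G = list(permutations(range(3)))

def mult(p, q):
    return tuple(p[i] for i in q)

# A subgroup H of order s = 3: the cyclic subgroup generated by a 3-cycle.
a = (1, 2, 0)
H = [(0, 1, 2), a, mult(a, a)]

# Left cosets gH partition G: distinct cosets never share an element.
cosets = {tuple(sorted(mult(g, h) for h in H)) for g in G}
print(len(G), len(H), len(cosets))      # 6 3 2
assert len(G) == len(H) * len(cosets)   # n = sk, Eq. (16.5)
```

Here the index is k = 2: the subgroup itself and the coset of the three transpositions exhaust S3, exactly as in (16.2).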

16.3 Classes

Another method to decompose a group into subsets is conjugacy classes. A conjugate element is defined as follows: Let a be an element arbitrarily chosen from a group. Then an element gag⁻¹ is called a conjugate element or conjugate to a. If c is conjugate to b and b is conjugate to a, then c is conjugate to a. It is because

c = g′bg′⁻¹, b = gag⁻¹ ⟹ c = g′bg′⁻¹ = g′gag⁻¹g′⁻¹ = (g′g)a(g′g)⁻¹.   (16.6)

In the above, a set containing a and all the elements conjugate to a is said to be a (conjugate) class of a. Denoting this set by ∁a, we have

∁a = {a, g2ag2⁻¹, g3ag3⁻¹, ⋯, gnagn⁻¹}.   (16.7)

In ∁a the same element may appear repeatedly. It is obvious that in every group the identity element e forms a class by itself. That is,

∁e = {e}.   (16.8)

As in the case of the decomposition of a group into (left or right) cosets, we can decompose a group into classes. If the group elements are not exhausted by the set comprising ∁e and ∁a, let us take b such that b ≠ e and b ∉ ∁a, and make ∁b similarly to (16.7). Repeating this procedure, we should be able to decompose a group into classes. In fact, if the group elements have not yet been exhausted after these procedures, take a remaining element z and make a class. If at this moment the only remaining element is z, then z makes a class by itself (as in the case of e). Notice that for an Abelian group every element makes a class by itself. Thus, with a finite group, we have a decomposition such that


ℊ = ∁e + ∁a + ∁b + ⋯ + ∁z.   (16.9)

To show that (16.9) is really a decomposition, suppose that, for instance, a set ∁a ∩ ∁b is not an empty set and that x ∈ ∁a ∩ ∁b. Then we must have α and β that satisfy the following relation: x = αaα⁻¹ = βbβ⁻¹, i.e., b = β⁻¹αaα⁻¹β = (β⁻¹α)a(β⁻¹α)⁻¹. This implies that b has already been included in ∁a, in contradiction to the supposition. Thus, (16.9) is in fact a decomposition of ℊ into a finite number of classes.

In the above we thought of a class conjugate to a single element. This notion can be extended to a class conjugate to a subgroup. Let H be a subgroup of ℊ and let g be an element of ℊ. Let us now consider a set H′ = gHg⁻¹. The set H′ is a subgroup of ℊ and is called a conjugate subgroup. In fact, let hi and hj be any two elements of H; that is, let ghig⁻¹ and ghjg⁻¹ be any two elements of H′. Then, we have

(ghig⁻¹)(ghjg⁻¹) = ghihjg⁻¹ = ghkg⁻¹,   (16.10)

where hk = hihj ∈ H. Hence, ghkg⁻¹ ∈ H′. Meanwhile, (ghig⁻¹)⁻¹ = ghi⁻¹g⁻¹ ∈ H′. Thus, conditions (1) and (2) of Sect. 16.2 are satisfied with H′. Therefore, H′ is a subgroup of ℊ. The subgroup H′ has the same order as H. This is because ghig⁻¹ ≠ ghjg⁻¹ for any two different elements hi and hj.

If for ∀g ∈ ℊ and a subgroup H we have the following equality

g⁻¹Hg = H,   (16.11)

such a subgroup H is said to be an invariant subgroup. If (16.11) holds, H should be a sum of classes (reader, please show this). A set comprising only the identity, i.e., {e}, forms a class. Therefore, if H is a proper subgroup, H must contain two or more classes. The relation (16.11) can be rewritten as

gH = Hg.   (16.12)
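As an illustrative sketch of our own (again on ℊ = S3), the class decomposition (16.9) and the invariance condition (16.11) can be verified by direct enumeration:

```python
from itertools import permutations

# Sketch: conjugacy classes of S3 (permutations as tuples; p[i] = image of i).
G = list(permutations(range(3)))

def mult(p, q):
    return tuple(p[i] for i in q)

def inv(p):
    q = [0] * len(p)
    for i, pi in enumerate(p):
        q[pi] = i
    return tuple(q)

def cls(a):
    # The class of a: all elements g a g^-1, Eq. (16.7).
    return frozenset(mult(mult(g, a), inv(g)) for g in G)

classes = {cls(a) for a in G}
print(sorted(len(c) for c in classes))   # [1, 2, 3]

# The subgroup H = A3 (identity and the two 3-cycles) is a union of classes,
# and g H g^-1 = H for every g: H is an invariant subgroup, Eq. (16.11).
H = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}
assert all({mult(mult(g, h), inv(g)) for h in H} == H for g in G)
```

The three classes (the identity, the two 3-cycles, the three transpositions) exhaust S3, in line with (16.9), and A3 is indeed a sum of two of them.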

Equation (16.12) implies that the left coset is identical with the right coset. Thus, as far as we are dealing with a coset pertinent to an invariant subgroup, we do not have to distinguish left and right cosets. Now let us consider anew the (left) coset decomposition of ℊ by an invariant subgroup H

ℊ = g1H + g2H + ⋯ + gkH,   (16.13)

where we have H = {h1 ≡ e, h2, ⋯, hs}. Then multiplication of two elements that belong to the cosets giH and gjH is expressed as

(gihl)(gjhm) = gi(gjgj⁻¹)hlgjhm = gigj(gj⁻¹hlgj)hm = gigjhphm = gigjhq,   (16.14)

where the third equality comes from (16.11). That is, we should have ∃hp such that gj⁻¹hlgj = hp and hphm = hq. In (16.14) hα ∈ H (α stands for l, m, p, q, etc. with 1 ≤ α ≤ s). Note that gigjhq ∈ gigjH. Accordingly, a product of elements belonging to giH and gjH belongs to gigjH. We rewrite (16.14) as a relation between the sets

(giH)(gjH) = gigjH.   (16.15)

Viewing the LHS of (16.15) as a product of two cosets, we find that the said product is a coset as well. This implies that a collection of the cosets forms a group. Such a group that possesses cosets as elements is said to be a factor group or quotient group. In this context, the multiplication is a product of cosets. We denote the factor group by

ℊ/H.

An identity element of this factor group is H. This is because in (16.15) putting gi = e, we get (H)(gjH) = gjH. Alternatively, putting gj = e, we have (giH)H = giH. In (16.15), moreover, putting gj = gi⁻¹, we get

(giH)(gi⁻¹H) = gigi⁻¹H = H.   (16.16)

Hence, (giH)⁻¹ = gi⁻¹H. That is, the inverse element of giH is gi⁻¹H.
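The coset multiplication (16.15) can be checked elementwise on the example ℊ = S3 with the invariant subgroup H = A3; the sketch below (ours, not from the text) confirms that the product set of any two cosets is again a coset, so ℊ/H is a group:

```python
from itertools import permutations

# Sketch: the factor group G/H for G = S3 and the invariant subgroup H = A3.
G = list(permutations(range(3)))

def mult(p, q):
    return tuple(p[i] for i in q)

H = frozenset({(0, 1, 2), (1, 2, 0), (2, 0, 1)})   # invariant subgroup A3

def coset(g):
    return frozenset(mult(g, h) for h in H)

cosets = {coset(g) for g in G}                     # elements of G/H
# (g_i H)(g_j H) = g_i g_j H, Eq. (16.15): the elementwise product set of
# any two cosets collapses to a single coset again.
for A in cosets:
    for B in cosets:
        prod = frozenset(mult(a, b) for a in A for b in B)
        assert prod in cosets
print(len(cosets))   # 2
```

Here ℊ/H has order n/s = 2, consistent with (16.5).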

16.4 Isomorphism and Homomorphism

As in the case of the linear vector space, we consider mappings between group elements. Of these, the notions of isomorphism and homomorphism are important.

Definition 16.1 Let ℊ = {x, y, ⋯} and ℊ′ = {x′, y′, ⋯} be groups and let a mapping ℊ → ℊ′ exist. Suppose that there is a one-to-one correspondence (i.e., injective mapping) x ↔ x′, y ↔ y′, ⋯ between the elements such that xy = z implies x′y′ = z′ and vice versa. Meanwhile, any element in ℊ′ must be the image of some element of ℊ. That is, the mapping is surjective as well and, hence, the mapping is bijective. Then, the two


groups ℊ and ℊ′ are said to be isomorphic. The relevant mapping is called an isomorphism. We symbolically denote this relation by

ℊ ≅ ℊ′.

Note that the aforementioned groups can be either a finite group or an infinite group. We did not designate identity elements. Suppose that x is the identity e. Then, from the relations xy = z and x′y′ = z′, we have

ey = z = y, x′y′ = z′ = y′.   (16.17)

Then, we get

x′ = e′, i.e., e ↔ e′.   (16.18)

Also let us put y = x⁻¹. Then,

xx⁻¹ = z = e, x′(x⁻¹)′ = e′, x′y′ = z′ = e′.   (16.19)

Comparing the second and third equations of (16.19), we get

y′ = (x′)⁻¹ = (x⁻¹)′.   (16.20)

The bijective character mentioned above can be loosened somewhat in such a way that the one-to-one correspondence is replaced with an n-to-one correspondence. We have the following definition.

Definition 16.2 Let ℊ = {x, y, ⋯} and ℊ′ = {x′, y′, ⋯} be groups. Also let a mapping ρ: ℊ → ℊ′ exist such that with any two arbitrarily chosen elements the following relation holds:

ρ(x)ρ(y) = ρ(xy).   (16.21)

Then, the two groups ℊ and ℊ′ are said to be homomorphic. The relevant mapping is called a homomorphism. We symbolically denote this relation by

ℊ ∼ ℊ′.

In this case, we have

ρ(e)ρ(e) = ρ(ee) = ρ(e), i.e., ρ(e) = e′,

where e′ is an identity element of ℊ′. Also, we have


ρ(x)ρ(x⁻¹) = ρ(xx⁻¹) = ρ(e) = e′.

Therefore,

[ρ(x)]⁻¹ = ρ(x⁻¹).

The two groups can be either a finite group or an infinite group. Note that in the above the mapping is not necessarily injective. The mapping may or may not be surjective. Regarding the identity and inverse elements, we have the same relations as (16.18) and (16.20). From Definitions 16.1 and 16.2, we say that a bijective homomorphism is an isomorphism.

Let us introduce the important notion of a kernel of a mapping. In this regard, we have the following definition:

Definition 16.3 Let ℊ = {e, x, y, ⋯} and ℊ′ = {e′, x′, y′, ⋯} be groups and let e and e′ be the identity elements. Suppose that there exists a homomorphic mapping ρ: ℊ → ℊ′. Also let F be the subset of ℊ such that

ρ(F) = e′.   (16.22)

Then, F is said to be a kernel of ρ. Regarding the kernel, we have the following important theorems.

Theorem 16.2 Let ℊ = {e, x, y, ⋯} and ℊ′ = {e′, x′, y′, ⋯} be groups, where e and e′ are the identity elements. A necessary and sufficient condition for a surjective and homomorphic mapping ρ: ℊ → ℊ′ to be isomorphic is that the kernel F = {e}.

Proof We assume that F = {e}. Suppose that ρ(x) = ρ(y). Then, we have

ρ(x)[ρ(y)]⁻¹ = ρ(x)ρ(y⁻¹) = ρ(xy⁻¹) = e′.   (16.23)

The first and second equalities result from the homomorphism of ρ. Since F = {e}, xy⁻¹ = e, i.e., x = y. Therefore, ρ is injective (i.e., a one-to-one correspondence). As ρ is surjective from the assumption, ρ is bijective. The mapping ρ is isomorphic accordingly. Conversely, suppose that ρ is isomorphic. Also suppose that ρ(x) = e′ for ∃x ∈ ℊ. From (16.18), ρ(e) = e′. We have ρ(x) = ρ(e) = e′ ⟹ x = e due to the isomorphism of ρ (i.e., one-to-one correspondence). This implies F = {e}. This completes the proof.

We become aware of a close relationship between Theorem 16.2 and the linear transformation versus its kernel already mentioned in Sect. 11.2 of Part III. Figure 16.1 shows this relationship. Figure 16.1a represents the homomorphic mapping ρ: ℊ → ℊ′ between two groups, whereas Fig. 16.1b shows a linear transformation (endomorphism) A: Vn → Vn in a vector space Vn.

[Fig. 16.1 Mapping in a group and vector space. (a) Homomorphic mapping ρ: ℊ → ℊ′ between two groups (ρ: isomorphism). (b) Linear transformation (endomorphism) A: Vn → Vn in a vector space Vn (A: invertible (bijective))]

Theorem 16.3 Suppose that there exists a homomorphic mapping ρ: ℊ → ℊ′, where ℊ and ℊ′ are groups. Then, the kernel F of ρ is an invariant subgroup of ℊ.

Proof Let ki and kj be any two arbitrarily chosen elements of F. Then,

ρ(ki) = ρ(kj) = e′,   (16.24)

where e′ is the identity element of ℊ′. From (16.21) we have

ρ(kikj) = ρ(ki)ρ(kj) = e′e′ = e′.   (16.25)

Therefore, kikj ∈ F. Meanwhile, from (16.20) we have

ρ(ki⁻¹) = [ρ(ki)]⁻¹ = (e′)⁻¹ = e′.   (16.26)

Then, ki⁻¹ ∈ F. Thus, F is a subgroup of ℊ. Next, for ∀g ∈ ℊ we have

ρ(gkig⁻¹) = ρ(g)ρ(ki)ρ(g⁻¹) = ρ(g)e′ρ(g⁻¹) = e′.   (16.27)

Accordingly, we have gkig⁻¹ ∈ F. Thus, gFg⁻¹ ⊂ F. Since g is chosen arbitrarily, replacing it with g⁻¹, we have g⁻¹Fg ⊂ F. Multiplying g and g⁻¹ on both sides from the left and right, respectively, we get F ⊂ gFg⁻¹. Consequently, we get

gFg⁻¹ = F.   (16.28)

This implies that the kernel F of ρ is an invariant subgroup of ℊ.


Theorem 16.4 (Homomorphism Theorem) Let ℊ = {x, y, ⋯} and ℊ′ = {x′, y′, ⋯} be groups and let a homomorphic (and surjective) mapping ρ: ℊ → ℊ′ exist. Also let F be the kernel of ρ. Let us define a surjective mapping ρ̃: ℊ/F → ℊ′ such that

ρ̃(giF) = ρ(gi).   (16.29)

Then, ρ̃ is an isomorphic mapping.

Proof From (16.15) and (16.21), it is obvious that ρ̃ is homomorphic. The confirmation is left for readers. Let giF and gjF be two different cosets. Suppose here that ρ(gi) = ρ(gj). Then we have

ρ(gi⁻¹gj) = ρ(gi⁻¹)ρ(gj) = [ρ(gi)]⁻¹ρ(gj) = [ρ(gi)]⁻¹ρ(gi) = e′.   (16.30)

This implies that gi⁻¹gj ∈ F. That is, we would have gj ∈ giF. This is in contradiction to the definition of a coset. Thus, we should have ρ(gi) ≠ ρ(gj). In other words, the different cosets giF and gjF are mapped into different elements ρ(gi) and ρ(gj) in ℊ′. That is, ρ̃ is isomorphic; i.e., ℊ/F ≅ ℊ′.
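Theorems 16.2-16.4 can be seen at work in a standard small example of our own choosing: the sign of a permutation gives a surjective homomorphism ρ: S3 → {+1, -1} whose kernel F is A3, and ℊ/F ≅ {+1, -1}:

```python
from itertools import permutations

# Sketch: rho(p) = sign of the permutation p is a homomorphism S3 -> {+1,-1};
# its kernel F = A3, and G/F is isomorphic to {+1, -1}.
G = list(permutations(range(3)))

def mult(p, q):
    # composition: apply q first, then p
    return tuple(p[i] for i in q)

def sign(p):
    # parity via inversion count: +1 for even permutations, -1 for odd
    s = 1
    for i in range(3):
        for j in range(i + 1, 3):
            if p[i] > p[j]:
                s = -s
    return s

# rho(x)rho(y) = rho(xy), Eq. (16.21)
assert all(sign(mult(p, q)) == sign(p) * sign(q) for p in G for q in G)

F = frozenset(p for p in G if sign(p) == 1)          # kernel of rho (= A3)
cosets = {frozenset(mult(g, f) for f in F) for g in G}
# rho~(gF) = rho(g), Eq. (16.29), is well defined and bijective: each coset
# carries a single sign, and the two cosets carry the two different signs.
coset_signs = [{sign(g) for g in c} for c in cosets]
assert all(len(s) == 1 for s in coset_signs) and len(cosets) == 2
```

The kernel A3 is exactly the invariant subgroup found in Sect. 16.3, as Theorem 16.3 requires.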

16.5 Direct-Product Groups

So far we have investigated basic properties of groups. In Sect. 16.4 we examined factor groups. The homomorphism theorem shows that the factor group is characterized by division; in the case of a finite group, the order of the group is reduced. In this section, we study the opposite character, i.e., properties of the direct product of groups, or direct-product groups.

Let H = {h1 ≡ e, h2, ⋯, hm} and H′ = {h′1 ≡ e, h′2, ⋯, h′n} be groups of order m and n, respectively. Suppose that (1) ∀hi (1 ≤ i ≤ m) and ∀h′j (1 ≤ j ≤ n) commute, i.e., hih′j = h′jhi, and that (2) H ∩ H′ = {e}. Under these conditions let us construct a set ℊ such that

ℊ = {h1h′1 ≡ e, hih′j (1 ≤ i ≤ m, 1 ≤ j ≤ n)}.   (16.31)

In other words, ℊ is a set comprising the mn elements hih′j. A product of elements is defined as

(hih′j)(hkh′l) = (hihk)(h′jh′l) = hph′q,   (16.32)

where hp = hihk and h′q = h′jh′l. The identity element is h1h′1 = ee = e. The inverse element of hih′j is (hih′j)⁻¹ = (h′j)⁻¹hi⁻¹ = hi⁻¹(h′j)⁻¹. The associative law is obvious. Thus, ℊ forms a group. This is said to be a direct product of groups, or a direct-product group. The groups H and H′ are called direct factors of ℊ. In this case, we succinctly represent

ℊ = H × H′.

In the above, the condition (2) is equivalent to the statement that ∀g ∈ ℊ is uniquely represented as

g = hh′; h ∈ H, h′ ∈ H′.   (16.33)

In fact, suppose that H ∩ H′ = {e} and that g can be represented in two ways such that

g = h1h′1 = h2h′2; h1, h2 ∈ H, h′1, h′2 ∈ H′.   (16.34)

Then, we have

h2⁻¹h1 = h′2(h′1)⁻¹; h2⁻¹h1 ∈ H, h′2(h′1)⁻¹ ∈ H′.   (16.35)

From the supposition, we get

h2⁻¹h1 = h′2(h′1)⁻¹ = e.   (16.36)

That is, h2 = h1 and h′2 = h′1. This means that the representation is unique. Conversely, suppose that the representation is unique and that x ∈ H ∩ H′. Then we must have

x = xe = ex.   (16.37)

Thanks to the uniqueness of the representation, x = e. This implies H ∩ H′ = {e}.

Now suppose h ∈ H. Then for ∀g ∈ ℊ, putting g = hνh′μ, we have

ghg⁻¹ = hνh′μh(h′μ)⁻¹hν⁻¹ = hνhh′μ(h′μ)⁻¹hν⁻¹ = hνhhν⁻¹ ∈ H.   (16.38)

Then we have gHg⁻¹ ⊂ H. Similarly to the proof of Theorem 16.3, we get

gHg⁻¹ = H.   (16.39)


This shows that H is an invariant subgroup of ℊ. Similarly, H′ is an invariant subgroup as well. Regarding the unique representation of the group element of a direct-product group, we become again aware of the close relationship between the direct product and the direct sum that was mentioned earlier in Part III.
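A minimal sketch of the direct-product construction (our own choice of example, not from the text) takes H = Z2 and H′ = Z3, realized as pairs with componentwise addition; the order mn of (16.31) and the unique representation (16.33) are checked by enumeration:

```python
# Sketch: direct product of two cyclic groups, H = Z2 and H' = Z3, realized
# as pairs (h, h') with componentwise modular addition.
H = [0, 1]        # Z2 under addition mod 2
Hp = [0, 1, 2]    # Z3 under addition mod 3
G = [(h, hp) for h in H for hp in Hp]

def mult(a, b):
    return ((a[0] + b[0]) % 2, (a[1] + b[1]) % 3)

assert len(G) == len(H) * len(Hp)   # order mn = 6, cf. Eq. (16.31)

# Every g in G is uniquely g = (h, e')(e, h'), cf. Eq. (16.33).
for g in G:
    reps = [(a, b) for a in G for b in G
            if a[1] == 0 and b[0] == 0 and mult(a, b) == g]
    assert len(reps) == 1
print(len(G))   # 6
```

Since this ℊ is Abelian, both direct factors are trivially invariant subgroups, in line with (16.39).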

Reference

1. Cotton FA (1990) Chemical applications of group theory, 3rd edn. Wiley, New York

Chapter 17 Symmetry Groups

We have many opportunities to observe symmetry, both macroscopic and microscopic, in the natural world. First, we need to formulate the symmetry appropriately. For this purpose, we must regard various symmetry operations as mathematical elements and classify these operations under several categories. In Part III we examined various properties of vectors and their transformations. We also showed that the vector transformation can be viewed as the coordinate transformation. On these topics, we focused upon abstract concepts in various ways. On another front, however, we have not paid attention to specific geometric objects, especially molecules. In this chapter, we study the symmetry of these concrete objects. For this, it will be indispensable to correctly understand a variety of symmetry operations. At the same time, we deal with the vector and coordinate transformations as group elements. Among such transformations, rotations occupy a central place in group theory and related fields of mathematics. Regarding the three-dimensional Euclidean space, SO(3) is particularly important. It is characterized as an infinite group, in contrast to the various symmetry groups (or point groups) we investigate in the earlier parts of this chapter.

17.1 A Variety of Symmetry Operations

To understand various aspects of symmetry operations, it is convenient and essential to consider a general point that is fixed in a three-dimensional Euclidean space and to examine how this point is transformed in the space. In parallel to the description in Part III, we express the coordinates of the general point P as

© Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_17

P = (x, y, z)ᵀ.   (17.1)

Note that P may be on or within or outside a geometric object or molecule that we are dealing with. The relevant position vector x for P is expressed as

x = xe1 + ye2 + ze3 = (e1 e2 e3)(x, y, z)ᵀ,   (17.2)

where e1, e2, and e3 denote orthonormal basis vectors pointing to the positive directions of the x-, y-, and z-axes, respectively. Similarly, we denote a linear transformation A by

A(x) = (e1 e2 e3)
[ a11  a12  a13 ]
[ a21  a22  a23 ]
[ a31  a32  a33 ] (x, y, z)ᵀ.   (17.3)

Among the various linear transformations that are represented by matrices, orthogonal transformations are the simplest and most widely used. Accordingly, we use orthogonal matrices to represent the orthogonal transformations.

Let us think of a movement or translation of a geometric object and an operation that causes such a movement. First suppose that the geometric object is fixed on a coordinate system. Then the object is moved (or translated) to another place. If before and after such a movement (or translation) one could not tell whether the object has been moved, we say that the object possesses a "symmetry" in a certain sense. Thus, we have to specify this symmetry. In that context, group theory deals with the symmetry and defines it clearly. To tell whether the object has been moved (to another place), we usually distinguish it by a change in (1) positional relationship and (2) attribute or property. To make the situation simple, let us consider the following example:

Example 17.1 We have two round disks, i.e., Disk A and Disk B. Suppose that Disk A is a solid white disk, whereas Disk B is partly painted black (see Fig. 17.1). In Fig. 17.1 we are thinking of a rotation of an object (e.g., a round disk) around an axis standing on its center (C) and stretching perpendicularly to the object plane. If an arbitrarily chosen position vector fixed on the object before the rotation is moved to another position that was not originally occupied by the object, then we recognize that the object has certainly been moved. For instance, imagine that a round disk having a through-hole located off center is rotating. What about the case where that position was originally occupied by the object, then? We have two possibilities. The first alternative is that we cannot recognize that the object has been moved. The second one is that we can yet recognize that the object has been

[Fig. 17.1 Rotation of an object around the center C. (a) Case where we cannot recognize that the object has been moved. (b) Case where we can recognize that the object has been moved because of its attribute (i.e., because the round disk is partly painted black)]

moved. According to Fig. 17.1a, b, we have the former case and the latter case, respectively. In the latter case, we have recognized the movement of the object by its attribute, i.e., by the fact that the object is partly painted black. □

In the above example, we do not have to be rigorous. We have a clear intuitive criterion for judging whether a geometric object has been moved. From now on, we assume that the geometric character of an object is pertinent to both its positional relationship and attribute. We define the equivalent (or indistinguishable) disposition of an object and the operation that yields such an equivalent disposition as follows:

Definition 17.1
1. Symmetry operation: A geometric operation that produces an equivalent (or indistinguishable) disposition of an object.
2. Equivalent (or indistinguishable) disposition: Suppose that regarding a geometric operation of an object, we cannot recognize that the object has been moved before and after that geometric operation. In that case, the original disposition of the object and the resulting disposition reached after the geometric operation are referred to as an equivalent disposition. The relevant geometric operation is the symmetry operation. ▽

Here we should clearly distinguish translation (i.e., parallel displacement) from the abovementioned symmetry operations. This is because for a geometric object to possess translation symmetry the object must be infinite in extent, typically an infinite crystal lattice. The relevant discipline is widely studied as the space group and has a broad class of applications in physics and chemistry. However, we will not deal with the space group or associated topics, but focus our attention upon symmetry groups in this book. In the above example, the rotation is a symmetry operation with Fig. 17.1a, but the said geometric operation is not a symmetry operation with Fig. 17.1b. Let us further inspect properties of the symmetry operation.

Let us consider a set H consisting of symmetry operations. Let a and b be any two symmetry operations of H. Then, (1) a ⋄ b is a symmetry operation as well. (2) Multiplication of successive symmetry operations a, b, c is associative, i.e., a ⋄ (b ⋄ c) = (a ⋄ b) ⋄ c. (3) The set H contains an element e called the identity element such that we have a ⋄ e = e ⋄ a = a with any element a of H. Operating "nothing" should be e. If the


rotation is relevant, a 2π rotation is thought of as e. These are intuitively acceptable. (4) For any a of H, we have an element b such that a ⋄ b = b ⋄ a = e. The element b is said to be the inverse element of a. We denote it by b ≡ a⁻¹. The inverse element corresponds to an operation that brings the disposition of a geometric object back to the original disposition. Thus, H forms a group.

We call H satisfying the above criteria a symmetry group. A symmetry group is called a point group as well. This is because a point group comprises symmetry operations of geometric objects as group elements, and those objects have at least one fixed point after the relevant symmetry operation. The name of a point group comes from this fact. As mentioned above, a symmetry operation is best characterized by a (3, 3) orthogonal matrix. In Example 17.1, e.g., the π rotation is represented by an orthogonal matrix A such that

A =
[ -1   0  0 ]
[  0  -1  0 ]
[  0   0  1 ].   (17.4)

This operation represents a π rotation around the z-axis. Let us think of another symmetry operation described by

B =
[ 1  0   0 ]
[ 0  1   0 ]
[ 0  0  -1 ].   (17.5)

This produces a mirror symmetry with respect to the xy-plane. Then, we have

C ≡ AB = BA =
[ -1   0   0 ]
[  0  -1   0 ]
[  0   0  -1 ].   (17.6)

The operation C shows an inversion about the origin. Thus, A, B, and C along with an identity element E form a group. Here E is expressed as

E =
[ 1  0  0 ]
[ 0  1  0 ]
[ 0  0  1 ].   (17.7)
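The group property of the four matrices of (17.4)-(17.7) can be checked by direct multiplication; the sketch below (ours, with a minimal matrix-product helper) verifies closure, commutativity, and that each element is its own inverse:

```python
# Sketch: the four diagonal matrices E, A, B, C of Eqs. (17.4)-(17.7),
# checked for closure, commutativity, and self-inverseness.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

E = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]   # pi rotation about z, Eq. (17.4)
B = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]    # mirror in the xy-plane, Eq. (17.5)
C = matmul(A, B)                          # inversion, Eq. (17.6)

group = [E, A, B, C]
for X in group:
    assert matmul(X, X) == E                  # each element is its own inverse
    for Y in group:
        assert matmul(X, Y) == matmul(Y, X)   # Abelian
        assert matmul(X, Y) in group          # closure, axiom (A1)
```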

The above group is represented by four three-dimensional diagonal matrices whose elements are 1 or -1. Therefore, it is evident that the inverse element of A, B, and C is A, B, and C itself, respectively. The said group is commutative (Abelian) and is said to be a four-group (or Klein four-group, more specifically) [1]. Meanwhile, we have a number of non-commutative groups. From the point of view of the matrix structure, non-commutativity comes from off-diagonal elements of the matrix. A typical example is a rotation matrix with a rotation angle different from zero

[Fig. 17.2 Rotation by θ around the z-axis]

or nπ (n: integer). For later use, let us have a matrix form that expresses a θ rotation around the z-axis. Figure 17.2 depicts a graphical illustration for this. The matrix R has the following form:

R =
[ cos θ  -sin θ  0 ]
[ sin θ   cos θ  0 ]
[   0       0    1 ].   (17.8)

Note that R has been reduced. This implies that the three-dimensional Euclidean space is decomposed into a two-dimensional subspace (the xy-plane) and a one-dimensional subspace (the z-axis). The xy-coordinates are not mixed with the z-component after the rotation R. Note, however, that if the rotation axis is oblique to the xy-plane, this is not the case. We will come back to this point later.

Taking only the xy-coordinates in Fig. 17.3, we make a calculation. Using the addition theorems of trigonometric functions, we get

x′ = r cos(θ + α) = r(cos α cos θ - sin α sin θ) = x cos θ - y sin θ,   (17.9)
y′ = r sin(θ + α) = r(sin α cos θ + cos α sin θ) = y cos θ + x sin θ,   (17.10)

where we used x = r cos α and y = r sin α.

[Fig. 17.3 Transformation of the xy-coordinates by a θ rotation]

Combining (17.9) and (17.10) in a matrix form, we get

(x′, y′)ᵀ =
[ cos θ  -sin θ ]
[ sin θ   cos θ ] (x, y)ᵀ.   (17.11)
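The derivation (17.9)-(17.11) can be checked numerically; the sketch below (ours) rotates the point (x, y) = (r cos α, r sin α) by θ and compares with (r cos(α + θ), r sin(α + θ)):

```python
import math

# Sketch: Eqs. (17.9)-(17.11) send (r cos a, r sin a) to
# (r cos(a + t), r sin(a + t)) for a rotation angle t.
def rotate(t, x, y):
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))

r, a, t = 2.0, 0.3, 1.1                    # arbitrary radius and angles
x, y = r * math.cos(a), r * math.sin(a)
xp, yp = rotate(t, x, y)
assert abs(xp - r * math.cos(a + t)) < 1e-12
assert abs(yp - r * math.sin(a + t)) < 1e-12
```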

The matrix on the RHS of (17.11) is the same as (11.31) and represents a transformation matrix for a rotation angle θ within the xy-plane. Whereas in Chap. 11 we considered this from the point of view of the transformation of basis vectors, here we deal with the transformations of coordinates in the fixed coordinate system. If θ ≠ π, the off-diagonal elements do not vanish. Including the z-component, we get (17.8).

Let us summarize the symmetry operations and their (3, 3) matrix representations. The coordinates before and after a symmetry operation are expressed as (x, y, z)ᵀ and (x′, y′, z′)ᵀ, respectively:

1. Identity transformation: To leave a geometric object or a coordinate system unchanged (or unmoved). By convention, we denote it by a capital letter E. It is represented by a (3, 3) identity matrix.

2. Rotation symmetry around a rotation axis: Here a "proper" rotation is intended. We denote a rotation by its rotation axis and its magnitude (i.e., rotation angle). Thus, we have

Rzθ =
[ cos θ  -sin θ  0 ]
[ sin θ   cos θ  0 ]
[   0       0    1 ],

Rxφ =
[ 1    0       0    ]
[ 0  cos φ  -sin φ ]
[ 0  sin φ   cos φ ],

Ryϕ =
[  cos ϕ  0  sin ϕ ]
[    0    1    0   ]
[ -sin ϕ  0  cos ϕ ].   (17.12)

With Ryϕ we first consider the following coordinate transformation:

(z′, x′, y′)ᵀ =
[ cos ϕ  -sin ϕ  0 ]
[ sin ϕ   cos ϕ  0 ]
[   0       0    1 ] (z, x, y)ᵀ.   (17.13)

This can easily be visualized in Fig. 17.4; consider that the cyclic permutation of (x, y, z)ᵀ produces (z, x, y)ᵀ. Shuffling the order of the coordinates, we get

(x′, y′, z′)ᵀ =
[  cos ϕ  0  sin ϕ ]
[    0    1    0   ]
[ -sin ϕ  0  cos ϕ ] (x, y, z)ᵀ.   (17.14)
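The passage from (17.13) to (17.14) can be verified numerically; the sketch below (ours) applies the z-type rotation to the cyclically permuted coordinates (z, x, y) and compares with Ryϕ acting directly on (x, y, z):

```python
import math

# Sketch: the y-rotation matrix of Eq. (17.14) follows from rotating the
# cyclically permuted coordinates (z, x, y), Eq. (17.13).
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

phi = 0.7                                   # arbitrary rotation angle
c, s = math.cos(phi), math.sin(phi)
Ry = [[c, 0, s], [0, 1, 0], [-s, 0, c]]     # Eq. (17.14)

x, y, z = 1.0, 2.0, 3.0
# Rotate (z, x, y) in its first two components, Eq. (17.13)...
zp = c * z - s * x
xp = s * z + c * x
yp = y
# ...and compare with Ry acting directly on (x, y, z).
assert all(abs(u - v) < 1e-12
           for u, v in zip(matvec(Ry, [x, y, z]), [xp, yp, zp]))
```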

By convention of the symmetry groups, the notation Cn is used to denote a rotation. The subscript n of Cn represents the order of the rotation axis. The order means the largest n such that the rotation through 2π/n gives an equivalent configuration. Successive rotations performed m times (m < n) are denoted by Cnᵐ. If m = n, the successive rotations produce an equivalent configuration the same as the beginning, i.e., Cnⁿ = E. The rotation angles θ, φ, etc. used above are restricted to 2πm/n accordingly.

3. Mirror symmetry with respect to a plane of mirror symmetry: We denote a mirror symmetry by its mirror symmetry plane (e.g., xy-plane, yz-plane, etc.). We have

Mxy =
[ 1  0   0 ]
[ 0  1   0 ]
[ 0  0  -1 ],

Myz =
[ -1  0  0 ]
[  0  1  0 ]
[  0  0  1 ],

Mzx =
[ 1   0  0 ]
[ 0  -1  0 ]
[ 0   0  1 ].   (17.15)

[Fig. 17.4 Rotation by ϕ around the y-axis]

The mirror symmetry is usually denoted by σv, σh, and σd, whose subscripts stand for "vertical," "horizontal," and "dihedral," respectively. Among these symmetry operations, σv and σd include a rotation axis in the symmetry plane, while σh is perpendicular to the rotation axis if such an axis exists. Notice that a group belonging to the Cs symmetry possesses only E and σh. Although σh can exist by itself as a mirror symmetry, neither σv nor σd can exist as a mirror symmetry by itself. We will come back to this point later.

4. Inversion symmetry with respect to a center of inversion: We specify an inversion center if necessary, e.g., the origin O of a coordinate system. The operation of inversion is denoted by

IO =
[ -1   0   0 ]
[  0  -1   0 ]
[  0   0  -1 ].   (17.16)

Note that, as is obvious from the matrix form, IO is commutable with any other symmetry operation. Note also that IO can be expressed as successive symmetry operations, or a product of symmetry operations. For instance, we have

IO = RzπMxy =
[ -1   0  0 ] [ 1  0   0 ]
[  0  -1  0 ] [ 0  1   0 ]
[  0   0  1 ] [ 0  0  -1 ].   (17.17)

Note that Rzπ and Mxy are commutable, i.e., RzπMxy = MxyRzπ.

5. Improper rotation: This is a combination of a proper rotation and a reflection by a mirror symmetry plane. That is, a rotation around an axis is performed first, and then a reflection is carried out by a mirror plane that is perpendicular to the rotation axis. For instance, an improper rotation is expressed as

MxyRzθ =
[ 1  0   0 ] [ cos θ  -sin θ  0 ]
[ 0  1   0 ] [ sin θ   cos θ  0 ]
[ 0  0  -1 ] [   0       0    1 ]
=
[ cos θ  -sin θ   0 ]
[ sin θ   cos θ   0 ]
[   0       0    -1 ].   (17.18)
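Both (17.17) and (17.18) can be checked numerically; the sketch below (ours, with hypothetical helper names `matmul` and `Rz`) builds Rz(π)·Mxy and Mxy·Rz(θ) explicitly:

```python
import math

# Sketch: the inversion I_O as the product Rz(pi) * M_xy, Eq. (17.17), and
# the improper rotation M_xy * Rz(theta), Eq. (17.18).
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def Rz(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

Mxy = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]
I_O = [[-1, 0, 0], [0, -1, 0], [0, 0, -1]]

P = matmul(Rz(math.pi), Mxy)        # Eq. (17.17)
assert all(abs(P[i][j] - I_O[i][j]) < 1e-12
           for i in range(3) for j in range(3))

t = 0.9                             # arbitrary rotation angle
S = matmul(Mxy, Rz(t))              # improper rotation, Eq. (17.18)
assert S[2][2] == -1 and abs(S[0][0] - math.cos(t)) < 1e-12
```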

As mentioned just above, the inversion symmetry I can be viewed as an improper rotation. Note that in this case the reflection and rotation operations are commutable. However, we will follow the conventional custom that considers the inversion symmetry as an independent symmetry operation. Readers may well wonder why we need to consider the improper rotation. The answer is simple; it solely rests upon the axiom (A1) of the group theory. A group must be closed with respect to the multiplication. The improper rotations are usually denoted by Sn. The subscript n again stands for the order of the rotation.

17.2 Successive Symmetry Operations

Let us now consider successive reflections in different planes and successive rotations about different axes [2]. Figure 17.5a displays two reflections with respect to the planes σ and σ̃, both perpendicular to the xy-plane. The said planes make a dihedral angle θ, with their intersection line identical to the z-axis. Also, the plane σ is identical with the zx-plane. Suppose that an arrow lies on the xy-plane perpendicularly to the zx-plane. As in (17.15), the operation σ is represented as

[Fig. 17.5 Two successive reflections about two planes σ and σ̃ that make an angle θ. (a) Reflections σ and σ̃ with respect to the two planes. (b) Successive operations of σ and σ̃ in this order; the combined operation is denoted by σ̃σ. The operations result in a 2θ rotation around the z-axis. (c) Successive operations of σ̃ and σ in this order; the combined operation is denoted by σσ̃. The operations result in a -2θ rotation around the z-axis]

σ =
[ 1   0  0 ]
[ 0  -1  0 ]
[ 0   0  1 ].   (17.19)

To determine a matrix representation of σ̃, we calculate the matrix again as in the above case. As a result we have

σ̃ =
[ cos θ  -sin θ  0 ] [ 1   0  0 ] [  cos θ  sin θ  0 ]
[ sin θ   cos θ  0 ] [ 0  -1  0 ] [ -sin θ  cos θ  0 ]
[   0       0    1 ] [ 0   0  1 ] [    0      0    1 ]
=
[ cos 2θ   sin 2θ  0 ]
[ sin 2θ  -cos 2θ  0 ]
[   0        0     1 ].   (17.20)

Notice that this matrix representation is referred to the original xyz-coordinate system; see the discussion of Sect. 11.4. Hence, we describe the successive transformations, σ followed by σ̃, as

σ̃σ =
[ cos 2θ  -sin 2θ  0 ]
[ sin 2θ   cos 2θ  0 ]
[   0        0     1 ].   (17.21)

The expression (17.21) means that the multiplication should be done first by σ and then by σ̃; see Fig. 17.5b. Note that σ and σ̃ are conjugate to each other (Sect. 16.3). In this case, the combined operations produce a 2θ rotation around the z-axis. If, on the other hand, the multiplication is made first by σ̃ and then by σ, we have a -2θ rotation around the z-axis; see Fig. 17.5c. As a matrix representation, we have

σσ̃ =
[  cos 2θ  sin 2θ  0 ]
[ -sin 2θ  cos 2θ  0 ]
[    0       0     1 ].   (17.22)

Thus, successive reflections by two planes that make a dihedral angle θ yield a rotation of ±2θ around the z-axis (i.e., the intersection line of σ and σ̃). The operation σσ̃ is an inverse of σ̃σ. That is,

$$(\sigma\tilde{\sigma})(\tilde{\sigma}\sigma) = E. \tag{17.23}$$
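These composition rules can be checked numerically. The following Python sketch (the angle θ = π/5 is an arbitrary choice of ours) builds σ from (17.19) and σ̃ by conjugation as in (17.20), and verifies that σ̃σ and σσ̃ are the ±2θ rotations of (17.21) and (17.22):

```python
import math

def mul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def rot_z(phi):
    """Counterclockwise rotation by phi around the z-axis."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

theta = math.pi / 5                                   # arbitrary dihedral angle
sigma = [[1, 0, 0], [0, -1, 0], [0, 0, 1]]            # reflection in the zx-plane, (17.19)
# sigma-tilde: the zx-plane reflection conjugated by a theta rotation, cf. (17.20)
sigma_t = mul(mul(rot_z(theta), sigma), rot_z(-theta))

close = lambda a, b: all(abs(a[i][j] - b[i][j]) < 1e-12 for i in range(3) for j in range(3))
print(close(mul(sigma_t, sigma), rot_z(2 * theta)))   # sigma-tilde after sigma -> +2*theta
print(close(mul(sigma, sigma_t), rot_z(-2 * theta)))  # sigma after sigma-tilde -> -2*theta
```

Both checks print `True`, confirming that the order of the two reflections only flips the sense of the resulting rotation.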

We have det(σ̃σ) = det(σσ̃) = 1. Meanwhile, putting

$$R_{2\theta} = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \tag{17.24}$$

we have

$$\tilde{\sigma}\sigma = R_{2\theta} \quad \text{or} \quad \tilde{\sigma} = R_{2\theta}\,\sigma. \tag{17.25}$$

This implies the following: Suppose a plane σ and a straight line lying on it. Also suppose that one first makes a reflection about σ and then makes a 2θ rotation around the said straight line. Then the resulting transformation is equivalent to a reflection about a plane σ̃ that makes an angle θ with σ; at the same time, the said straight line is the intersection line of σ and σ̃. Note that the dihedral angle between the two planes σ and σ̃ is half the angle of the rotation. Thus, any two of the symmetry operations related by (17.25) are mutually dependent; any two of them produce the third symmetry operation. In the above illustration, we did not take account of the presence of a symmetry axis. If the aforementioned axis is a symmetry axis Cn, we must have

$$2\theta = 2\pi/n \quad \text{or} \quad n = \pi/\theta. \tag{17.26}$$

From a symmetry requirement, there should be n planes of mirror symmetry in combination with the Cn axis. Moreover, the intersection line of these n mirror symmetry planes should coincide with that Cn axis. This can be seen in the various Cnv groups. Next we consider another set of successive symmetry operations. Suppose that there are two C2 axes (Cxπ and Caπ) that intersect at an angle θ (Fig. 17.6). Of these, Cxπ is identical to the x-axis, and the other (Caπ) lies on the xy-plane making an angle θ with Cxπ. Following procedures similar to the above, we have matrix representations of the successive C2 operations in reference to the xyz-system such that

Fig. 17.6 Two successive π rotations around two C2 axes Cxπ and Caπ that intersect at an angle θ. Of these, Cxπ is identical to the x-axis (not shown)

$$C_{x\pi} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix},$$

$$C_{a\pi} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ \sin 2\theta & -\cos 2\theta & 0 \\ 0 & 0 & -1 \end{pmatrix}, \tag{17.27}$$

where Caπ can be calculated similarly to (17.20). Again we get

$$C_{a\pi}C_{x\pi} = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad C_{x\pi}C_{a\pi} = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ -\sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.28}$$

Notice that (CxπCaπ)(CaπCxπ) = E. Once again putting

$$R_{2\theta} = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \tag{17.24}$$

we have [1, 2]

$$C_{a\pi}C_{x\pi} = R_{2\theta} \quad \text{or} \quad C_{a\pi} = R_{2\theta}\,C_{x\pi}. \tag{17.29}$$
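The analogous check for two C2 axes can be sketched in Python (θ = π/3 here is our arbitrary choice); it confirms (17.27)–(17.29) by direct matrix multiplication:

```python
import math

def mul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def rot_z(phi):
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

theta = math.pi / 3
c2x = [[1, 0, 0], [0, -1, 0], [0, 0, -1]]            # pi rotation around the x-axis
c2a = mul(mul(rot_z(theta), c2x), rot_z(-theta))     # pi rotation around the rotated axis, cf. (17.27)

close = lambda a, b: all(abs(a[i][j] - b[i][j]) < 1e-12 for i in range(3) for j in range(3))
print(close(mul(c2a, c2x), rot_z(2 * theta)))        # C_a C_x = R_{2theta}, (17.29)
print(close(mul(c2x, c2a), rot_z(-2 * theta)))       # reversed order gives -2theta
```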

Note that in the above two illustrations of successive symmetry operations, both relevant operators have been represented in reference to the original xyz-system. For this reason, the latter operation was applied from the left in (17.28). From a symmetry requirement, once again the aforementioned C2 axes must be present in combination with the Cn axis. Moreover, those C2 axes should be perpendicular to the Cn axis. This can be seen in the various Dn groups.

Another illustration of successive symmetry operations is an improper rotation; if the rotation angle is π, this produces an inversion symmetry. In such an illustration, reflection, rotation, and inversion symmetries coexist; a C2h symmetry is a typical example.

Equations (17.21), (17.22), and (17.24) demonstrate the same relation. Namely, two successive mirror reflections about a pair of planes and two successive π rotations about a pair of C2 axes cause the same effect with regard to the geometric transformation. In this relation, we emphasize that two successive reflection operations make the determinant of the relevant matrix product equal to 1. These aspects cause an interesting effect, which we will briefly discuss in relation to the O and Td groups. If, furthermore, the abovementioned mirror symmetry planes and C2 axes coexist, the symmetry planes coincide with or bisect the C2 axes, and vice versa. If this were not the case, another mirror plane or C2 axis would be generated from the symmetry requirement, and the newly generated plane or axis would be coupled with the original plane or axis. From the above argument, these processes would again produce another Cn axis. That must be prohibited.

Next, suppose that a Cn axis intersects obliquely with a plane of mirror symmetry. A rotation of 2π/n around such an axis produces another mirror symmetry plane. This newly generated plane intersects with the original mirror plane and produces a different Cn axis according to the above discussion.
Thus, in this situation a mirror symmetry plane cannot coexist with a sole rotation axis. In a geometric object with higher symmetry such as Oh, however, several mirror symmetry planes can coexist with several rotation axes in such a way that the axes intersect the mirror planes obliquely. If the Cn axis is perpendicular to a mirror symmetry plane, that plane can coexist with a sole rotation axis (see Fig. 17.7). This is actually the case with a geometric object having C2h symmetry; the mirror symmetry plane is denoted by σh. Now, let us examine simple examples of molecules and geometric figures as well as the associated symmetry operations.


Fig. 17.7 Mirror symmetry plane σ h and a sole rotation axis perpendicular to it

Fig. 17.8 Chemical structural formulae and point groups of (a) thiophene, (b) bithiophene, (c) biphenyl, and (d) naphthalene


Example 17.2 Figure 17.8 shows the chemical structural formulae of thiophene, bithiophene, biphenyl, and naphthalene. These molecules belong to C2v, C2h, D2, and D2h, respectively. Note that these symbols are normally used to denote specific point groups. Notice also that in biphenyl the two benzene rings are twisted relative to the molecular axis. As an example, a multiplication table for the C2v group is shown in Table 17.1. Table 17.1 clearly demonstrates that the group constitution of C2v differs from that of the group appearing in Example 16.1 (3), even though the order is four in both cases. Similar tables can be constructed for C2h and D2; this is left to readers as an exercise. We will find that the multiplication tables of these groups have the same structure and that C2v, C2h, and D2 are all isomorphic to one another as four-groups. Table 17.2 gives the matrix representation of the symmetry operations of C2v. The

Table 17.1 Multiplication table of C2v

C2v      | E    C2(z)  σv(zx)  σv′(yz)
---------|---------------------------
E        | E    C2     σv      σv′
C2(z)    | C2   E      σv′     σv
σv(zx)   | σv   σv′    E       C2
σv′(yz)  | σv′  σv     C2      E
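Table 17.1 can be regenerated mechanically from the matrices of Table 17.2: multiply any two of them and look the product up again in the set. A short Python sketch (the operation labels are our shorthand):

```python
# The four C2v matrices of Table 17.2 (stored as tuples so they are hashable).
ops = {
    "E":   ((1, 0, 0), (0, 1, 0), (0, 0, 1)),
    "C2":  ((-1, 0, 0), (0, -1, 0), (0, 0, 1)),
    "sv":  ((1, 0, 0), (0, -1, 0), (0, 0, 1)),    # sigma_v(zx)
    "sv'": ((-1, 0, 0), (0, 1, 0), (0, 0, 1)),    # sigma_v'(yz)
}

def mul(a, b):
    """3x3 matrix product, returning a tuple of tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)) for i in range(3))

names = {m: n for n, m in ops.items()}
for g in ops:
    print(g, [names[mul(ops[g], ops[h])] for h in ops])
# Every product stays inside the set, each element is its own inverse,
# and the table is symmetric: C2v is a four-group.
```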

Table 17.2 Matrix representation of symmetry operations for C2v

Operation: E                  C2 (around z-axis)   σv(zx)             σv′(yz)
Matrix:    diag(1, 1, 1)      diag(-1, -1, 1)      diag(1, -1, 1)     diag(-1, 1, 1)

Table 17.3 Multiplication table of D2h

D2h      | E    C2(z)  σv(zx)  σv′(yz)  i     σv″(xy)  C2′(y)  C2″(x)
---------|----------------------------------------------------------
E        | E    C2     σv      σv′      i     σv″      C2′     C2″
C2(z)    | C2   E      σv′     σv       σv″   i        C2″     C2′
σv(zx)   | σv   σv′    E       C2       C2′   C2″      i       σv″
σv′(yz)  | σv′  σv     C2      E        C2″   C2′      σv″     i
i        | i    σv″    C2′     C2″      E     C2       σv      σv′
σv″(xy)  | σv″  i      C2″     C2′      C2    E        σv′     σv
C2′(y)   | C2′  C2″    i       σv″      σv    σv′      E       C2
C2″(x)   | C2″  C2′    σv″     i        σv′   σv       C2      E

representation is defined as the transformation by the symmetry operations of a set of basis vectors (x y z) in ℝ3. Meanwhile, Table 17.3 gives the multiplication table of D2h. We recognize that the multiplication table of C2v appears in the upper-left and lower-right blocks. If we suitably rearrange the order of the group elements, we can make another multiplication table in which, e.g., C2h appears in the upper-left and lower-right blocks. As in the case of Table 17.2, Table 17.4 summarizes the matrix representation of the symmetry operations of D2h. There are eight group elements, i.e., an identity, an inversion, three mutually perpendicular C2 axes, and three mutually perpendicular planes of mirror symmetry (σ).

Here, we consider the possibility of constructing subgroups of D2h. The order of a subgroup must be a divisor of eight, so let us list the subgroups of order four and examine how many exist. We have ₈C₄ = 70 combinations in all, but those allowed are restricted by the requirement of forming a group. Because every subgroup must contain the identity element, the number allowed is no greater than ₇C₃ = 35. (1) In light of the aforementioned discussion, two C2 axes mutually intersecting at π/2 yield another C2 axis around the normal to the plane defined by the intersecting axes. Thus, three C2 axes are chosen and a D2 symmetry results; in this case, we have only one choice. (2) In the case of C2v, two planes mutually intersecting at

Table 17.4 Matrix representation of symmetry operations for D2h

Operation: E                C2(z)             C2(y)             C2(x)
Matrix:    diag(1, 1, 1)    diag(-1, -1, 1)   diag(-1, 1, -1)   diag(1, -1, -1)

Operation: i                σ(xy)             σ(zx)             σ(yz)
Matrix:    diag(-1, -1, -1) diag(1, 1, -1)    diag(1, -1, 1)    diag(-1, 1, 1)


Table 17.5 Choice of symmetry operations for construction of subgroups of D2h

Subgroup | E | C2(z) | σ | i | Choice
---------|---|-------|---|---|-------
D2       | 1 | 3     | 0 | 0 | 1
C2v      | 1 | 1     | 2 | 0 | 3
C2h      | 1 | 1     | 1 | 1 | 3

π/2 yield a C2 axis around their line of intersection. There are three possibilities of choosing two planes out of three (i.e., ₃C₂ = 3). (3) If we choose the inversion i along with, e.g., one of the three C2 axes, a σ necessarily results. This is also the case when we first combine a σ with i to obtain a C2 axis. We again have three possibilities (i.e., ₃C₁ = 3). Thus, we have only seven choices for constructing subgroups of D2h of order four. This is summarized in Table 17.5. The inverse of any element is that element itself. Therefore, if with any of the above subgroups one chooses any element out of the remaining four elements and combines it with the identity, one can construct a subgroup Cs, C2, or Ci of order 2. Since all those subgroups of order 4 and 2 commute with D2h, these subgroups are invariant subgroups. Thus, in terms of a direct-product group, D2h can be expressed with various direct factors. Conversely, we can construct factor groups by coset decomposition. For instance, we have

$$D_{2h}/C_{2v} \cong C_s, \quad D_{2h}/C_{2v} \cong C_2, \quad D_{2h}/C_{2v} \cong C_i. \tag{17.30}$$

In turn, we express direct-product groups as, e.g.,

$$D_{2h} = C_{2v} \otimes C_s, \quad D_{2h} = C_{2v} \otimes C_2, \quad D_{2h} = C_{2v} \otimes C_i. \tag{17.31}$$

□
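The counting argument above can be confirmed by brute force. Since every D2h matrix is diagonal (Table 17.4), each operation is encoded below by its three diagonal signs; the sketch enumerates all 4-element subsets containing E and keeps the closed ones:

```python
from itertools import combinations

# The eight D2h operations as diagonal signatures (Table 17.4).
ops = {
    "E": (1, 1, 1),    "C2z": (-1, -1, 1), "C2y": (-1, 1, -1), "C2x": (1, -1, -1),
    "i": (-1, -1, -1), "s_xy": (1, 1, -1), "s_zx": (1, -1, 1), "s_yz": (-1, 1, 1),
}

def mul(a, b):
    """All D2h matrices are diagonal, so multiply componentwise."""
    return tuple(x * y for x, y in zip(a, b))

def is_subgroup(elems):
    return all(mul(g, h) in elems for g in elems for h in elems)

subs4 = [c for c in combinations(ops.values(), 4)
         if (1, 1, 1) in c and is_subgroup(set(c))]
print(len(subs4))   # 7 subgroups of order 4, as counted in Table 17.5
```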

Example 17.3 Figure 17.9 shows an equilateral triangle placed on the xy-plane of a three-dimensional Cartesian coordinate system. Orthonormal basis vectors e1 and e2 are designated as shown. As a chemical species, we have, e.g., boron trifluoride (BF3). In the molecule, a boron atom is positioned at the molecular center, with three fluorine atoms located at the vertices of the equilateral triangle. The boron atom and fluorine atoms form a planar molecule (Fig. 17.10). The symmetry group D3h comprises 12 symmetry operations such that

$$D_{3h} = \{E, C_3, C_3', C_2, C_2', C_2'', \sigma_h, S_3, S_3', \sigma_v, \sigma_v', \sigma_v''\}, \tag{17.32}$$

where symmetry operations of the same species but distinct operation are denoted by a prime or double prime. Representing these operations by matrices is straightforward in most cases; for instance, the matrix for σv is given by Myz of (17.15). However, some matrix calculations are needed for C2′, C2″, σv′, and σv″. To determine a matrix representation of, e.g., σv′ in reference to the xyz-coordinate system with orthonormal basis vectors e1 and e2, we consider the x′y′z′-coordinate

Fig. 17.9 Equilateral triangle placed on the xy-plane. Several symmetry operations of D3h are shown. We consider the successive operations (i), (ii), and (iii) to represent σv′ in reference to the xyz-coordinate system (see text)

Fig. 17.10 Boron trifluoride (BF3) belonging to a D3h point group

system with orthonormal basis vectors e1′ and e2′ (see Fig. 17.9). A transformation matrix between the two sets of basis vectors is represented by

$$(\mathbf{e}_1'\ \mathbf{e}_2'\ \mathbf{e}_3') = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)R_{z\pi/6} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} & 0 \\ \frac{1}{2} & \frac{\sqrt{3}}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.33}$$

This representation corresponds to (11.69). Let Σv be the reflection with respect to the z′x′-plane. This is the same operation as σv′, but its matrix representation is different: Σv is represented in reference to the x′y′z′-system, while σv′ is in reference to the xyz-system. The matrix representation of Σv is simple and expressed as

$$\Sigma_v = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.34}$$

Referring to (11.80), we have

$$R_{z\pi/6}\,\Sigma_v = \sigma_v'\,R_{z\pi/6} \quad \text{or} \quad \sigma_v' = R_{z\pi/6}\,\Sigma_v\,(R_{z\pi/6})^{-1}. \tag{17.35}$$

Thus, we see that in the first equation of (17.35) the order of the multiplications is reversed according as the latter operation is expressed in the xyz-system or in the x′y′z′-system. As a full matrix representation, we get

$$\sigma_v' = \begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} & 0 \\ \frac{1}{2} & \frac{\sqrt{3}}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \frac{\sqrt{3}}{2} & \frac{1}{2} & 0 \\ -\frac{1}{2} & \frac{\sqrt{3}}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} & 0 \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.36}$$
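The similarity transformation (17.35) and its result (17.36) can be verified numerically (a Python sketch):

```python
import math

def mul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

c, s = math.cos(math.pi / 6), math.sin(math.pi / 6)
R = [[c, -s, 0], [s, c, 0], [0, 0, 1]]         # R_{z pi/6}, (17.33)
Rinv = [[c, s, 0], [-s, c, 0], [0, 0, 1]]      # its inverse (transpose)
Sigma_v = [[1, 0, 0], [0, -1, 0], [0, 0, 1]]   # reflection in the z'x'-plane, (17.34)

sigma_v1 = mul(mul(R, Sigma_v), Rinv)          # sigma_v' by (17.35)
r3 = math.sqrt(3)
expected = [[1/2, r3/2, 0], [r3/2, -1/2, 0], [0, 0, 1]]   # (17.36)
print(all(abs(sigma_v1[i][j] - expected[i][j]) < 1e-12
          for i in range(3) for j in range(3)))           # True
```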

Notice that this matrix representation is referred to the original xyz-coordinate system as before. Graphically, (17.35) corresponds to the multiplication of the symmetry operations in the order of (i) a -π/6 rotation, (ii) the reflection Σv, and (iii) a π/6 rotation (Fig. 17.9). The associated matrices are multiplied from the left. Similarly, with C2′ we have

$$C_2' = \begin{pmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} & 0 \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} & 0 \\ 0 & 0 & -1 \end{pmatrix}. \tag{17.37}$$

The matrix form of (17.37) can also be determined in a manner similar to the above, according to the three successive operations shown in Fig. 17.9. In terms of classes, σv, σv′, and σv″ form one conjugacy class, and C2, C2′, and C2″ form another. With regard to reflection and rotation, we have det σv′ = -1 and det C2 = 1, respectively. □


17.3 O and Td Groups

As a geometric object (or a molecule) has higher symmetry, we have to deal with more symmetry operations and more relationships between them. As an example, we consider the O and Td groups. Both groups have 24 symmetry operations and are isomorphic. Let us think of the group O first. We start by considering rotations of π/2 around the x-, y-, and z-axes. The matrices representing these rotations are obtained from (17.12) to give

$$R_{x\pi/2} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \quad R_{y\pi/2} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \quad R_{z\pi/2} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.38}$$

We continue multiplying these matrices until they make up a complete set (i.e., a closure). Counting those matrices, we have 24 of them, and they form a group termed O. The group O is a pure rotation group; here, a pure rotation group is defined as a group whose elements comprise only proper rotations (with determinant 1). An example is shown in Fig. 17.11, where the individual vertices of the cube carry three arrows so that the cube possesses no mirror or inversion symmetry. The group O has five conjugacy classes. Figure 17.12 summarizes them; the geometric characteristics of the individual classes are sketched as well. These classes are categorized by the trace of the matrix. This is

Fig. 17.11 Cube whose individual vertices have three arrows so that the cube does not possess mirror symmetries. This object belongs to the point group O, a pure rotation group


[Fig. 17.12 tabulates the 24 rotation matrices of O by conjugacy class: E (χ = 3), 6C4 (χ = 1), 8C3 (χ = 0), 3C2 (χ = -1), and 6C2′ (χ = -1).]

Fig. 17.12 Point group O and its five conjugacy classes. Geometric characteristics of individual classes are briefly sketched

because the trace is kept unchanged by a similarity transformation. (Remember that elements of the same conjugacy class are connected by a similarity transformation.) Looking at the sketches, we notice that each operation switches the basis vectors e1, e2, and e3, i.e., the x-, y-, and z-axes. Therefore, the presence of diagonal elements (either 1 or -1) implies that the matrix takes the corresponding basis vector(s) as eigenvectors with respect to the rotation; the corresponding eigenvalue(s) are 1 or -1 accordingly. This is expected from the fact that the matrix is orthogonal (i.e., unitary). The trace, namely the sum of the diagonal elements, is closely related to the geometric feature of the operation. The operations of a π rotation around the x-, y-, or z-axis and those of a π rotation around an axis bisecting any two of the three axes have a trace of -1. The former operations take all the basis vectors as eigenvectors; that is, all the diagonal elements are non-vanishing. With the latter operations, however, only one diagonal element is -1. This feature comes from the fact that the bisected axes are switched by the rotation, whereas the remaining axis is reversed. Another characteristic is the generation of eight rotation axes that trisect the x-, y-, and z-axes, more specifically the solid angle π/2 formed by the x-, y-, and z-axes. Since such a rotation switches all of the x-, y-, and z-axes, the trace is zero. At the same time, we find that this operation belongs to C3. It is generated by two successive π/2 rotations around two mutually orthogonal axes. To inspect this situation more closely, we consider the conjugacy class of π/2 rotations, which belongs to the C4 symmetry and includes six elements, i.e., R_{xπ/2}, R_{yπ/2}, R_{zπ/2}, R_{x̄π/2}, R_{ȳπ/2}, and R_{z̄π/2}. With


these notations, e.g., R_{xπ/2} stands for a π/2 counterclockwise rotation around the x-axis, while R_{x̄π/2} denotes a π/2 counterclockwise rotation around the -x-axis. Consequently, R_{x̄π/2} implies a π/2 clockwise rotation around the x-axis and is, hence, an inverse element of R_{xπ/2}. Namely, we have

$$R_{\bar{x}\pi/2} = (R_{x\pi/2})^{-1}. \tag{17.39}$$
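The closure construction described above, together with the trace classification, can be checked with a short Python sketch that multiplies the three generators of (17.38) until no new matrices appear:

```python
from collections import Counter

def mul(a, b):
    """3x3 matrix product over tuples (hashable, so usable in sets)."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)) for i in range(3))

Rx = ((1, 0, 0), (0, 0, -1), (0, 1, 0))   # the pi/2 rotations of (17.38)
Ry = ((0, 0, 1), (0, 1, 0), (-1, 0, 0))
Rz = ((0, -1, 0), (1, 0, 0), (0, 0, 1))

group = {Rx, Ry, Rz}
while True:                               # multiply until the set closes
    new = {mul(g, h) for g in group for h in group} - group
    if not new:
        break
    group |= new

print(len(group))                         # 24 elements of O
traces = Counter(sum(m[i][i] for i in range(3)) for m in group)
print(sorted(traces.items()))             # trace 3 once (E), 1 six times (6C4),
                                          # 0 eight times (8C3), -1 nine times (3C2 + 6C2')
```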

Now let us consider two successive rotations. This is denoted by the multiplication of the matrices that represent the related rotations. For instance, the multiplication of R_{xπ/2} and R′_{yπ/2} produces the following:

$$R_{xyz\,2\pi/3} = R_{x\pi/2}R_{y\pi/2}' = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \tag{17.40}$$

In (17.40), we define R_{xyz 2π/3} as a 2π/3 counterclockwise rotation around the axis that trisects the x-, y-, and z-axes. The prime of R′_{yπ/2} means that the operation is carried out in reference to the new coordinate system reached by the previous operation R_{xπ/2}. For this reason, R′_{yπ/2} is operated (i.e., multiplied) from the right in (17.40). Compare this with the remark made just after (17.29). Changing the order of R_{xπ/2} and R′_{yπ/2}, we have

$$R_{xy\bar{z}\,2\pi/3} = R_{y\pi/2}R_{x\pi/2}' = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & -1 \\ -1 & 0 & 0 \end{pmatrix}, \tag{17.41}$$

where R_{xyz̄ 2π/3} is a 2π/3 counterclockwise rotation around the axis that trisects the x-, y-, and -z-axes. Notice that we used R′_{xπ/2} this time, because it was performed after R_{yπ/2}. Thus, we notice that there are eight related operations that trisect the eight octants of the coordinate system. These operations are further categorized into four pairs in which the two elements are inverses of each other. For instance, we have

$$R_{\bar{x}\bar{y}\bar{z}\,2\pi/3} = (R_{xyz\,2\pi/3})^{-1}. \tag{17.42}$$

Notice that a 2π/3 counterclockwise rotation around an axis that trisects the -x-, -y-, and -z-axes is equivalent to a 2π/3 clockwise rotation around an axis that


trisects the x-, y-, and z-axes. Also, we have R_{x̄ȳz 2π/3} = (R_{xyz̄ 2π/3})^{-1}, etc. Moreover, we have "cyclic" relations such as

$$R_{x\pi/2}R_{y\pi/2}' = R_{y\pi/2}R_{z\pi/2}' = R_{z\pi/2}R_{x\pi/2}' = R_{xyz\,2\pi/3}. \tag{17.43}$$
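Equations (17.40), (17.41), and (17.43) are easy to verify by direct matrix multiplication. Note that in this fixed-frame Python sketch, the primed operators of the text correspond simply to the fixed-frame matrices multiplied from the right:

```python
def mul(a, b):
    """3x3 matrix product over tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)) for i in range(3))

Rx = ((1, 0, 0), (0, 0, -1), (0, 1, 0))   # (17.38)
Ry = ((0, 0, 1), (0, 1, 0), (-1, 0, 0))
Rz = ((0, -1, 0), (1, 0, 0), (0, 0, 1))

Rxyz = mul(Rx, Ry)                        # (17.40): cyclically permutes x -> y -> z -> x
print(Rxyz)                               # ((0, 0, 1), (1, 0, 0), (0, 1, 0))
print(mul(Rx, Ry) == mul(Ry, Rz) == mul(Rz, Rx))   # True: the cyclic relations (17.43)
print(mul(Ry, Rx) == Rxyz)                # False: reversing the order gives another axis, (17.41)
```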

Returning to Sect. 11.4, we had

$$A[P(\mathbf{x})] = [(\mathbf{e}_1 \cdots \mathbf{e}_n)PA']\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (\mathbf{e}_1 \cdots \mathbf{e}_n)AP\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{11.79}$$

The implication of (11.79) is that the LHS is related to the transformation of the basis vectors while retaining the coordinates (x₁ ⋯ xₙ)ᵀ, and that the transformation matrices should be operated on the basis vectors from the right. Meanwhile, the RHS describes the transformation of the coordinates while retaining the basis vectors; in that case, the transformation matrices should be operated on the coordinates from the left. Thus, the order of operator multiplication is reversed. Following (11.80), we describe

$$R_{x\pi/2}R_{y\pi/2}' = R_O R_{x\pi/2}, \quad \text{i.e.,} \quad R_O = R_{x\pi/2}R_{y\pi/2}'(R_{x\pi/2})^{-1}, \tag{17.44}$$

where R_O is viewed in reference to the original (or fixed) coordinate system and is conjugate to R′_{yπ/2}. Thus, we have

$$R_O = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.45}$$

Note that (17.45) is identical to the matrix representation of a π/2 rotation around the z-axis. This is evident from the fact that the y-axis is converted to the original z-axis by R_{xπ/2}; readers, imagine it. We have two conjugacy classes of π rotations (the C2 symmetry). One of them includes six elements, i.e., R_{xyπ}, R_{yzπ}, R_{zxπ}, R_{x̄yπ}, R_{ȳzπ}, and R_{z̄xπ}. In these notations, a subscript such as xy stands for an axis that bisects the angle formed by the x- and y-axes; a subscript x̄y denotes an axis bisecting the angle formed by the -x- and y-axes. Another class includes three elements, i.e., R_{xπ}, R_{yπ}, and R_{zπ}. As for R_{xπ}, R_{yπ}, and R_{zπ}, a combination of these operations should yield a C2 rotation axis, as discussed in Sect. 17.2. Of these three rotation axes, in fact, any two


produce a C2 rotation around the remaining axis, as is the case with naphthalene belonging to the D2h symmetry (see Sect. 17.2). Regarding the class comprising the six π rotation elements, a combination of, e.g., R_{xyπ} and R_{x̄yπ}, which cross each other at a right angle, causes a related effect. For the other combinations, the two C2 axes intersect each other at π/3; see Fig. 17.6 and put θ = π/3 there. In this respect, elementary analytic geometry teaches the positional relationship among planes and straight lines. The argument is as follows: A plane determined by three points (x₁, y₁, z₁), (x₂, y₂, z₂), and (x₃, y₃, z₃) that do not sit on a line is expressed by the following equation:

$$\begin{vmatrix} x & y & z & 1 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{vmatrix} = 0. \tag{17.46}$$

Substituting (0, 0, 0), (1, 1, 0), and (0, 1, 1) for (x₁, y₁, z₁), (x₂, y₂, z₂), and (x₃, y₃, z₃) in (17.46), respectively, we have

$$x - y + z = 0. \tag{17.47}$$
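The determinant criterion (17.46) can be evaluated directly; with the three points used above, it reproduces the plane x − y + z = 0 of (17.47). A Python sketch with a hand-rolled 4×4 determinant:

```python
def det4(m):
    """Laplace expansion of a 4x4 determinant along the first row."""
    def det3(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
              - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
              + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    total, sign = 0, 1
    for j in range(4):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += sign * m[0][j] * det3(minor)
        sign = -sign
    return total

p1, p2, p3 = (0, 0, 0), (1, 1, 0), (0, 1, 1)
def plane(x, y, z):                     # left-hand side of (17.46)
    return det4([[x, y, z, 1], [*p1, 1], [*p2, 1], [*p3, 1]])

# The determinant reproduces x - y + z of (17.47):
for x, y, z in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (2, -1, 3)]:
    print(plane(x, y, z), x - y + z)    # both columns agree
```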

Taking account of direction cosines and using Hesse's normal form, we get

$$\frac{1}{\sqrt{3}}(x - y + z) = 0, \tag{17.48}$$

where the normal to the plane expressed in (17.48) has direction cosines of 1/√3, -1/√3, and 1/√3 in relation to the x-, y-, and z-axes, respectively. Therefore, the normal is given by the straight line connecting the origin and (1, -1, 1).

In other words, a line connecting the origin and a corner of a cube is the normal to the plane described by (17.48). That plane is formed by two intersecting lines, i.e., the rotation axes of C2 and C2′ (see Fig. 17.13, which depicts a cube with each side of 2). These axes make an angle π/3; this can easily be checked by taking an inner product between (1/√2, 1/√2, 0)ᵀ and (0, 1/√2, 1/√2)ᵀ, the column vectors of the direction cosines of C2 and C2′. On the basis of the discussion of Sect. 17.2, we must then have a rotation axis of C3. That is, this axis trisects the solid angle π/2 shaped by the three intersecting sides.


Fig. 17.13 Rotation axes of C2 and C2′ (marked red) along with another rotation axis C3 (marked blue) in the point group O


Fig. 17.14 Simple kit that helps visualize the positional relationship among planes and straight lines in three-dimensional space. To make it, follow these procedures: (1) Take three thick sheets of paper and make slits (dashed lines) as shown. (2) Insert Sheet 2 into Sheet 1 so that the two sheets make a right angle. (3) Insert Sheet 3 into the combined Sheets 1 and 2

It is sometimes hard to visualize or envisage the positional relationship among planes and straight lines in three-dimensional space. It is therefore useful to make a simple kit to help visualize it; Fig. 17.14 gives an illustration.

Another typical example having 24 group elements is Td. A molecule of methane belongs to this symmetry. Table 17.6 collects the relevant symmetry operations and their (3, 3) matrix representations. As in the case of Fig. 17.12, the matrices show how a set of vectors (x y z) is transformed by the symmetry operations. Comparing it with Fig. 17.12, we immediately recognize that a close relationship between Td and O exists and that these point groups share notable characteristics. (1) Both Td and O consist of five conjugacy classes, each of which contains the same number of symmetry species. (2) Both Td and O contain a pure rotation group T as a subgroup. The subgroup T consists of the 12 group elements E, 8C3, and 3C2. The other 12 group elements of Td are symmetry species related to reflection: S4 and σd. The elements 6S4 and 6σd correspond to the 6C4 and 6C2 of O, respectively. That is, successive operations of S4 cause effects similar to those of the C4 of O. Meanwhile, successive operations of σd are related to those of the 6C2 of O.

Table 17.6 Symmetry operations and their matrix representation of Td. (The 24 matrices fall into five conjugacy classes; one representative per class is reproduced here, the remaining members being obtained by permuting the axes and signs.)

E: diag(1, 1, 1)

8C3, e.g., $\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$ (a 2π/3 rotation about an axis trisecting the x-, y-, and z-axes)

3C2: diag(1, -1, -1), diag(-1, 1, -1), diag(-1, -1, 1)

6S4, e.g., $\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}$ (an S4 operation about the x-axis)

6σd, e.g., $\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ (reflection in the plane x = y)


Fig. 17.15 Regular tetrahedron inscribed in a cube. As an example of symmetry operations, we can choose three pairs of planes of σd (six planes in total) given by the equations x = ±y, y = ±z, and z = ±x

Let us imagine in Fig. 17.15 that a regular tetrahedron is inscribed in a cube. As an example of symmetry operations, suppose that three pairs of planes of σd (six planes in total) are given by the equations x = ±y, y = ±z, and z = ±x. Their Hesse normal forms are represented as

$$\frac{1}{\sqrt{2}}(x \pm y) = 0, \tag{17.49}$$

$$\frac{1}{\sqrt{2}}(y \pm z) = 0, \tag{17.50}$$

$$\frac{1}{\sqrt{2}}(z \pm x) = 0. \tag{17.51}$$

Then, the dihedral angle α between two of these planes is given by

$$\cos\alpha = \frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}} = \frac{1}{2} \tag{17.52}$$

or

$$\cos\alpha = \frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}} = 0. \tag{17.53}$$
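A Python sketch confirming that the six σd planes pairwise intersect only at the dihedral angles π/3 or π/2 (the unit normals are written down from (17.49)–(17.51)):

```python
import math
from itertools import combinations

# Unit normals of the six sigma_d planes x = ±y, y = ±z, z = ±x.
r = 1 / math.sqrt(2)
normals = [(r, -r, 0), (r, r, 0), (0, r, -r), (0, r, r), (-r, 0, r), (r, 0, r)]

angles = set()
for n1, n2 in combinations(normals, 2):
    cos_a = abs(sum(a * b for a, b in zip(n1, n2)))   # |cos| of the dihedral angle
    angles.add(round(math.degrees(math.acos(cos_a))))
print(sorted(angles))   # [60, 90]: only pi/3 and pi/2 occur, as in (17.52)-(17.53)
```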


That is, α = π/3 or α = π/2. On the basis of the discussion of Sect. 17.2, the intersection of the two planes must be a rotation axis of C3 or C2. Once again, in the case of C3, the intersection is a straight line connecting the origin and a vertex of the cube. This can readily be verified as follows: For instance, the two planes given by x = y and y = z make an angle π/3 and produce the intersection line x = y = z. This line, in turn, connects the origin and a vertex of the cube. If we choose, e.g., the two planes x = ±y from the above, these planes make a right angle and their intersection must be a C2 axis. The three such C2 axes coincide with the x-, y-, and z-axes. In this light, σd functions similarly to the 6C2 of O in that their combinations produce 8C3 or 3C2. Thus, the constitutions and operations of Td and O are closely related.

Let us inspect the structure and constitution of O and Td more closely. First we construct a mapping ρ between the group elements of O and Td such that

ρ : g ∈ O ⟷ g′ ∈ Td, if g ∈ T,

ρ : g ∈ O ⟷ -(g′)⁻¹ ∈ Td, if g ∉ T.

In the above relation, the minus sign indicates that in taking the inverse, the representation matrix R must be replaced with -R. Then, ρ(g) = g′ is an isomorphic mapping. In fact, comparing Fig. 17.12 and Table 17.6, ρ gives identical matrix representations for O and Td. For example, taking the first matrix of S4 of Td, we have

$$-\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}^{-1} = -\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}.$$

The resulting matrix is identical to the first matrix of C4 of O. Thus, we find that Td and O are isomorphic to each other. Both O and Td consist of 24 group elements and are isomorphic to the symmetric group S4; do not confuse this with the same symbol S4 used for a group element of Td. The subgroup T consists of the three conjugacy classes E, 8C3, and 3C2. Since T is constructed only from entire classes, it is an invariant subgroup; in this respect, see the discussion of Sect. 16.3. The groups O, Td, and T, along with Th and Oh, form the cubic groups [2]. In Table 17.7, we list these cubic groups together with their names and orders. Of these, O is a pure rotation subgroup of Oh, and T is a pure rotation subgroup of Td and Th.

Table 17.7 Several cubic groups and their characteristics

Notation | Group name           | Order | Remark
---------|----------------------|-------|---------------------
T        | Tetrahedral rotation | 12    | Subgroup of Th, Td
Th, Td   | Tetrahedral          | 24    |
O        | Octahedral rotation  | 24    | Subgroup of Oh
Oh       | Octahedral           | 48    |
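The isomorphism between O and Td can also be checked numerically. The sketch below uses a convenient variant of the mapping ρ described above — our choice is the sign flip g ↦ -g for g ∉ T — which likewise carries O onto Td; membership in T is decided by the parity of the permutation of the axes:

```python
def mul(a, b):
    """3x3 matrix product over tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)) for i in range(3))

def neg(m):
    return tuple(tuple(-x for x in row) for row in m)

Rx = ((1, 0, 0), (0, 0, -1), (0, 1, 0))   # generators of O, (17.38)
Ry = ((0, 0, 1), (0, 1, 0), (-1, 0, 0))
Rz = ((0, -1, 0), (1, 0, 0), (0, 0, 1))

O = {Rx, Ry, Rz}
while True:                               # generate O by closure
    new = {mul(g, h) for g in O for h in O} - O
    if not new:
        break
    O |= new

def in_T(m):
    """True for E, 8C3, 3C2: the underlying permutation of the axes is even."""
    p = [next(j for j in range(3) if m[i][j] != 0) for i in range(3)]
    return sum(p[i] > p[j] for i in range(3) for j in range(i + 1, 3)) % 2 == 0

rho = {g: (g if in_T(g) else neg(g)) for g in O}
Td = set(rho.values())
print(len(Td))                                                           # 24 elements
print(all(rho[mul(g, h)] == mul(rho[g], rho[h]) for g in O for h in O))  # True: rho is a homomorphism
```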


Symmetry groups are related to permutations of n elements (or objects or numbers). The permutation has already appeared in (11.57), where we defined the determinant of a matrix. It was defined as

$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{pmatrix},$$

where σ means a permutation of the numbers 1, 2, ⋯, n. The symmetric group on n elements therefore has n! group elements (i.e., different ways of rearrangement). Although we do not dwell on symmetric groups much, we state the following important theorem on finite groups without proof. Interested readers are referred to the literature [1].

Theorem 17.1 (Cayley's Theorem [1]) Every finite group ℊ of order n is isomorphic to a subgroup (possibly the whole group) of the symmetric group Sn. ■

17.4 Special Orthogonal Group SO(3)

In Parts III and IV thus far, we have dealt with a broad class of linear transformations, for which the related groups are finite groups. Here we describe the characteristics of the special orthogonal group SO(3), a kind of infinite group. SO(3) represents the rotations in three-dimensional Euclidean space ℝ3. Rotations are made around an axis (a line through the origin) with the origin fixed. A rotation is defined by an azimuth of direction and a magnitude (angle) of rotation. The azimuth of direction is defined by two parameters and the magnitude by one parameter, the rotation angle. Hence, a rotation is defined by three independent parameters. Since those parameters are continuously variable, SO(3) is one of the continuous groups. Two rotations result in another rotation with the origin again fixed. A reverse rotation is unambiguously defined. An identity transformation is naturally defined. The associative law holds as well. Thus, the relevant rotations form a group, i.e., SO(3). A rotation is represented by a real (3, 3) matrix whose determinant is 1. A matrix representation is uniquely determined once an orthonormal basis is set in ℝ3. Any rotation is represented by a rotation matrix accordingly. The rotation matrix R is defined by

$$R^T R = R R^T = E, \tag{17.54}$$

where R is a real matrix with det R = 1. Notice that we exclude the case where det R = -1. Matrices that satisfy (17.54) with det R = ±1 are referred to as orthogonal matrices and cause orthogonal transformations. Correspondingly, orthogonal groups (represented by orthogonal matrices) contain rotation groups as a special case. In other words, the orthogonal groups contain the rotation groups as a


subgroup. The orthogonal group in ℝ3, denoted O(3), contains SO(3) as a subgroup. By the same token, orthogonal matrices contain rotation matrices as a special case. In Sect. 17.2 we treated reflections and improper rotations, the determinants of whose matrices are -1. In this section these transformations are excluded and only rotations are dealt with. We focus on the geometric characteristics of the rotation groups. Readers are referred to the literature [1] for a more detailed representation theory of SO(3).

17.4.1 Rotation Axis and Rotation Matrix

In this section we represent a vector as |x⟩. We start by showing that any rotation possesses a unique rotation axis. The rotation axis is defined as follows: Suppose that there is a rigid body with some point within the body fixed (the rigid body may have infinite extent), and let the rigid body exert a rotation. The rotation axis is then a line on which every point is unmoved during the rotation. As a matter of course, the identity matrix E has three linearly independent rotation axes. (Practically, this represents no rotation.)

Theorem 17.2 Any rotation matrix R is accompanied by at least one rotation axis. Unless the rotation matrix is the identity, the rotation matrix is accompanied by one and only one rotation axis.

Proof As R is an orthogonal matrix of determinant 1, so are R^T and R^{-1}. Then we have

(R − E)^T = R^T − E = R^{-1} − E.    (17.55)

Hence, we get

det(R − E) = det(R^T − E) = det(R^{-1} − E) = det[R^{-1}(E − R)]
           = det R^{-1} det(E − R) = det(E − R) = −det(R − E).    (17.56)

Note here that for any (3, 3) matrix A on ℝ³, det(−A) = (−1)³ det A = −det A, so that det A = −det(−A). This equality holds in ℝⁿ for odd n, but in ℝⁿ for even n we have det A = det(−A), and (17.56) would then result in the trivial equation 0 = 0. Therefore, the discussion made below applies only to ℝⁿ (n: odd). Thus, from (17.56) we have


det(R − E) = 0.    (17.57)

This implies that for ∃|x₀⟩ ≠ 0

(R − E)|x₀⟩ = 0.    (17.58)

Therefore, we get

R(a|x₀⟩) = a|x₀⟩,    (17.59)

where a is an arbitrarily chosen real number. In this case an eigenvalue of R is 1, to which the eigenvector a|x₀⟩ corresponds. Thus, as a rotation axis, we have a straight line expressed as

l = Span{a|x₀⟩; a ∈ ℝ}.    (17.60)
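Theorem 17.2 lends itself to a quick numerical check: for any R ∈ SO(3), det(R − E) vanishes, and the eigenvector belonging to the eigenvalue 1 spans the rotation axis. A minimal numpy sketch (the helper names rz and ry are ours, not the book's):

```python
import numpy as np

def rz(t):
    # rotation about the z-axis, cf. (17.8)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def ry(t):
    # rotation about the y-axis
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

R = rz(0.7) @ ry(1.1) @ rz(-0.4)                      # some rotation, det R = 1
assert np.isclose(np.linalg.det(R), 1.0)
assert np.isclose(np.linalg.det(R - np.eye(3)), 0.0)  # (17.57)

# the eigenvector of eigenvalue 1 spans the rotation axis (17.60)
w, v = np.linalg.eig(R)
axis = np.real(v[:, np.argmin(np.abs(w - 1))])
assert np.allclose(R @ axis, axis)
```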

This proves the presence of a rotation axis. Next suppose that there are two (or more) rotation axes. The presence of two rotation axes implies that there are two linearly independent vectors (i.e., two straight lines that mutually intersect at the fixed point). Suppose that such vectors are |u⟩ and |v⟩. Then we have

(R − E)|u⟩ = 0,    (17.61)

(R − E)|v⟩ = 0.    (17.62)

Let us consider a vector ∀|y⟩ chosen from Span{|u⟩, |v⟩}, for which we assume that

Span{|u⟩, |v⟩} ≡ Span{a|u⟩ + b|v⟩; a, b ∈ ℝ}.    (17.63)

That is, Span{|u⟩, |v⟩} represents a plane P formed by the two mutually intersecting straight lines. Then we have

|y⟩ = s|u⟩ + t|v⟩,    (17.64)

where s and t are some real numbers. Operating R − E on (17.64), we have

(R − E)|y⟩ = (R − E)(s|u⟩ + t|v⟩) = s(R − E)|u⟩ + t(R − E)|v⟩ = 0.    (17.65)

This indicates that any vector in P can be an eigenvector of R, implying that an infinite number of rotation axes exist. Now, take another vector |w⟩ that is perpendicular to the plane P (see Fig. 17.16). Let us consider an inner product ⟨y|Rw⟩. Since


Fig. 17.16 Plane P formed by two mutually intersecting straight lines represented by |u⟩ and |v⟩. Another vector |w⟩ is perpendicular to the plane P

R|u⟩ = |u⟩,  ⟨u|R† = ⟨u|R^T = ⟨u|.    (17.66)

Similarly, we have

⟨v|R^T = ⟨v|.    (17.67)

Therefore, using the relation (17.64), we get

⟨y|R^T = ⟨y|.    (17.68)

Here we are dealing with real numbers and, hence, we have

R† = R^T.    (17.69)

Now we have

⟨y|Rw⟩ = ⟨yR^T|Rw⟩ = ⟨y|R^T Rw⟩ = ⟨y|Ew⟩ = ⟨y|w⟩ = 0.    (17.70)

In (17.70) the second equality comes from the associative law; the third is due to (17.54). The last equality comes from the fact that |w⟩ is perpendicular to P. From (17.70) we have

⟨y|(R − E)w⟩ = 0.    (17.71)


However, we should be careful not to conclude immediately from (17.71) that (R − E)|w⟩ = 0, i.e., R|w⟩ = |w⟩. This is because in (17.70) |y⟩ does not represent all the vectors in ℝ³, but merely all the vectors in Span{|u⟩, |v⟩}. Nonetheless, both |w⟩ and |Rw⟩ are perpendicular to P, and so

|Rw⟩ = a|w⟩,    (17.72)

where a is some real number. From (17.72), we have

⟨wR†|Rw⟩ = ⟨wR^T|Rw⟩ = ⟨w|w⟩ = |a|²⟨w|w⟩, i.e., a = ±1,

where the first equality comes from the fact that R is an orthogonal matrix. Since det R = 1, a = 1. Thus, from (17.72) this time around, we have |Rw⟩ = |w⟩; that is,

(R − E)|w⟩ = 0.    (17.73)

Equations (17.65) and (17.73) imply that for any vector |x⟩ arbitrarily chosen from ℝ³, we have

(R − E)|x⟩ = 0.    (17.74)

Consequently, we get

R − E = 0, or R = E.    (17.75)

The above procedures represented by (17.61)–(17.75) indicate that the presence of two rotation axes necessarily requires the transformation matrix to be the identity; this implies that all the vectors in ℝ³ are eigenvectors of the rotation matrix. Taking the contraposition of the above, unless the rotation matrix is the identity, the relevant rotation cannot have two rotation axes. Meanwhile, the former half of the proof ensures the presence of at least one rotation axis. Hence, any rotation is characterized by a unique rotation axis except for the identity transformation. This completes the proof. ■

An immediate consequence of Theorem 17.2 is that a rotation matrix has an eigenvalue 1, to which an eigenvector representing the rotation axis corresponds. This statement includes the trivial fact that all the eigenvalues of the identity matrix are 1. In Sect. 14.4, we calculated the eigenvalues of a two-dimensional rotation matrix; the eigenvalues were e^{iθ} and e^{-iθ}, where θ is a rotation angle. Let us consider the rotation matrices that we dealt with in Sect. 17.1. The matrix representing the rotation around the z-axis by a rotation angle θ is expressed by

R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.    (17.8)

In Sect. 14.4 we treated the diagonalization of a rotation matrix. As R is in reduced (block-diagonal) form, the diagonalization can be performed in essentially the same manner as (14.92). That is, as a diagonalizing unitary matrix, we have

U = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix},\quad U^{\dagger} = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & -\frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}.    (17.76)

As a result of the unitary similarity transformation, we get

U^{\dagger} R U = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & -\frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} e^{i\theta} & 0 & 0 \\ 0 & e^{-i\theta} & 0 \\ 0 & 0 & 1 \end{pmatrix}.    (17.77)

Thus, the eigenvalues are 1, e^{iθ}, and e^{-iθ}. The eigenvalue 1 results from the existence of the unique rotation axis. When θ = 0, (17.77) gives an identity matrix with all the eigenvalues 1, as expected. When θ = π, the eigenvalues are −1, −1, and 1; the eigenvalue 1 is again associated with the unique rotation axis. The (unitary) similarity transformation keeps the trace unchanged; that is, the trace χ is

χ = 1 + 2 cos θ.    (17.78)
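The diagonalization (17.77) and the trace formula (17.78) can be checked numerically. A sketch (not from the book) with U taken as in (17.76):

```python
import numpy as np

theta = 0.9
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])   # (17.8)

# U of (17.76): columns are the eigenvectors (1, -i, 0)/sqrt2, (1, i, 0)/sqrt2, (0, 0, 1)
U = np.array([[1, 1, 0], [-1j, 1j, 0], [0, 0, np.sqrt(2)]]) / np.sqrt(2)

D = U.conj().T @ R @ U                             # (17.77)
assert np.allclose(D, np.diag([np.exp(1j*theta), np.exp(-1j*theta), 1]))
assert np.isclose(np.trace(R), 1 + 2*np.cos(theta))  # (17.78)
```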

As R is a normal operator, the spectral decomposition can be done as in the case of Example 14.1. Here we only show the result:

R = e^{i\theta} \begin{pmatrix} \frac{1}{2} & \frac{i}{2} & 0 \\ -\frac{i}{2} & \frac{1}{2} & 0 \\ 0 & 0 & 0 \end{pmatrix} + e^{-i\theta} \begin{pmatrix} \frac{1}{2} & -\frac{i}{2} & 0 \\ \frac{i}{2} & \frac{1}{2} & 0 \\ 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

The three matrices of the above equation are projection operators.
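That the three matrices are projection operators resolving the identity, and that they reproduce R when weighted by the eigenvalues, can be verified directly (a numerical sketch, not from the book):

```python
import numpy as np

theta = 0.6
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

P1 = np.array([[1, 1j, 0], [-1j, 1, 0], [0, 0, 0]]) / 2   # eigenvalue e^{i theta}
P2 = np.array([[1, -1j, 0], [1j, 1, 0], [0, 0, 0]]) / 2   # eigenvalue e^{-i theta}
P3 = np.diag([0, 0, 1])                                   # eigenvalue 1

# each is idempotent (a projection operator), and together they resolve the identity
for P in (P1, P2, P3):
    assert np.allclose(P @ P, P)
assert np.allclose(P1 + P2 + P3, np.eye(3))

# the spectral decomposition reproduces R
assert np.allclose(np.exp(1j*theta)*P1 + np.exp(-1j*theta)*P2 + P3, R)
```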

17.4.2 Euler Angles and Related Topics

Euler angles are well known and have been used in various fields of science. We wish to connect the above discussion with the Euler angles. In Part III we dealt with successive linear transformations; this can be extended to the case of three or more successive transformations. Suppose that we have three successive transformations R₁, R₂, and R₃ and that the coordinate system (a three-dimensional orthonormal basis) is transformed from O ⟶ I ⟶ II ⟶ III accordingly. The symbol O stands for the original coordinate system, and I, II, and III represent the successively transformed systems. In the discussion that follows, let us denote the transformations by R₂′, R₃′, R₃″, etc. in reference to the respective coordinate systems. For example, R₃′ means that the third transformation is viewed from the system I, and R₃″ indicates that the third transformation is viewed from the system II; that is, the number of primes denotes the number of the coordinate system, distinguishing the systems I and II. Let R₂ (without prime) stand for the second transformation viewed from the system O. Meanwhile, we have

R₁R₂′ = R₂R₁.    (17.79)

This notation is in parallel to (11.80). Similarly, we have

R₂′R₃″ = R₃′R₂′ and R₁R₃′ = R₃R₁.    (17.80)

Therefore, we get [3]

R₁R₂′R₃″ = R₁R₃′R₂′ = R₃R₁R₂′ = R₃R₂R₁.    (17.81)

Also, combining (17.79) and (17.80), we have

R₃″ = (R₂R₁)^{-1} R₃ (R₂R₁).    (17.82)

Let us call R₂′, R₃″, etc. transformations on a “moving” coordinate system (i.e., the systems I, II, III, ⋯). On the other hand, we call R₁, R₂, etc. transformations on a “fixed” system (i.e., the original coordinate system O). Thus, (17.81) shows that the multiplication order is reversed between the moving system and the fixed system [3]. For practical purposes, it would be enough to consider three successive transformations. Let us think, however, of a general case where n successive transformations are involved (n denotes a positive integer). For succinct notation, let us define the linear transformations and relevant coordinate systems as those in Fig. 17.17; the diagram shows the transformation of the basis vectors. We define the following orthogonal transformation R_j^{(i)}:

Fig. 17.17 Successive orthogonal transformations and relevant coordinate systems

R_j^{(i)} (0 ≤ i < j ≤ n), and R_i ≡ R_i^{(0)},    (17.83)

where R_j^{(i)} is defined as the transformation R_j described in reference to the coordinate system i; R_i^{(0)} means that R_i is referred to the original coordinate system (i.e., the fixed coordinate). Then we have

R_{i-1}^{(i-2)} R_k^{(i-1)} = R_k^{(i-2)} R_{i-1}^{(i-2)}  (k > i − 1).    (17.84)

Particularly, when i = 3,

R₂^{(1)} R_k^{(2)} = R_k^{(1)} R₂^{(1)}  (k > 2).    (17.85)

For i = 2 we have

R₁ R_k^{(1)} = R_k R₁  (k > 1).    (17.86)

We define the n successive transformations on a moving coordinate system as R̃_n such that

R̃_n ≡ R₁ R₂^{(1)} R₃^{(2)} ⋯ R_{n-2}^{(n-3)} R_{n-1}^{(n-2)} R_n^{(n-1)}.    (17.87)

Applying (17.84) to R_{n-1}^{(n-2)} R_n^{(n-1)} and rewriting (17.87), we have

R̃_n = R₁ R₂^{(1)} R₃^{(2)} ⋯ R_{n-2}^{(n-3)} R_n^{(n-2)} R_{n-1}^{(n-2)}.    (17.88)

Applying (17.84) again to R_{n-2}^{(n-3)} R_n^{(n-2)}, we get


R̃_n = R₁ R₂^{(1)} R₃^{(2)} ⋯ R_n^{(n-3)} R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}.    (17.89)

Proceeding similarly, we have

R̃_n = R₁ R_n^{(1)} R₂^{(1)} R₃^{(2)} ⋯ R_{n-2}^{(n-3)} R_{n-1}^{(n-2)} = R_n R₁ R₂^{(1)} R₃^{(2)} ⋯ R_{n-2}^{(n-3)} R_{n-1}^{(n-2)},    (17.90)

where with the last equality we used (17.86). In this case, we have

R₁ R_n^{(1)} = R_n R₁.    (17.91)

To reach the RHS of (17.90), we applied (17.84) (n − 1) times in total. Then we repeat the above procedures with respect to R_{n-1}^{(n-2)} another (n − 2) times to get

R̃_n = R_n R_{n-1} R₁ R₂^{(1)} R₃^{(2)} ⋯ R_{n-2}^{(n-3)}.    (17.92)

Further proceeding similarly, we finally get

R̃_n = R_n R_{n-1} R_{n-2} ⋯ R₃ R₂ R₁.    (17.93)

In total, we have applied the permutation of (17.84) n(n − 1)/2 times. When n = 2, n(n − 1)/2 = 1; this is the case with (17.79). When n = 3, n(n − 1)/2 = 3; this is the case with (17.81). Thus, (17.93) once again confirms that the multiplication order is reversed between the moving system and the fixed system. Meanwhile, we define P̃ as

P̃ ≡ R₁ R₂^{(1)} R₃^{(2)} ⋯ R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}.    (17.94)

Alternatively, we describe

P̃ = R_{n-1} R_{n-2} ⋯ R₃ R₂ R₁.    (17.95)

Then, from (17.87) and (17.90), we get

R̃_n = P̃ R_n^{(n-1)} = R_n P̃.    (17.96)

Equivalently, we have

R_n = P̃ R_n^{(n-1)} P̃^{-1}    (17.97)

or


R_n^{(n-1)} = P̃^{-1} R_n P̃.    (17.98)

Moreover, we have

P̃^T = [R₁ R₂^{(1)} R₃^{(2)} ⋯ R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}]^T
    = [R_{n-1}^{(n-2)}]^T [R_{n-2}^{(n-3)}]^T ⋯ [R₂^{(1)}]^T [R₁]^T
    = [R_{n-1}^{(n-2)}]^{-1} [R_{n-2}^{(n-3)}]^{-1} ⋯ [R₂^{(1)}]^{-1} [R₁]^{-1}
    = [R₁ R₂^{(1)} R₃^{(2)} ⋯ R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}]^{-1} = P̃^{-1}.    (17.99)

The third equality of (17.99) comes from the fact that the matrices R_{n-1}^{(n-2)}, R_{n-2}^{(n-3)}, ⋯, R₂^{(1)}, and R₁ are orthogonal matrices. Then, we get

P̃^T P̃ = P̃ P̃^T = E.    (17.100)
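The order reversal (17.93) between the moving and fixed frames, together with the conjugation relations (17.79) and (17.97), can be verified numerically for n = 3. A sketch (helper names rz, ry are ours, not the book's):

```python
import numpy as np

def rz(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def ry(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

a, b, g = 0.3, 1.2, -0.8
R1, R2p, R3pp = rz(a), ry(b), rz(g)       # R1, R2^(1), R3^(2) on the moving frames

R3t = R1 @ R2p @ R3pp                     # moving-frame product, (17.87) with n = 3

P = R1 @ R2p                              # P~ of (17.94)
R2 = R1 @ R2p @ np.linalg.inv(R1)         # second transformation in the fixed frame, (17.79)
R3 = P @ R3pp @ np.linalg.inv(P)          # third transformation in the fixed frame, (17.97)

# multiplication order is reversed on the fixed frame, (17.93)
assert np.allclose(R3t, R3 @ R2 @ R1)
```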

Thus, P̃ is an orthogonal matrix. From the viewpoint of practical application, (17.97) and (17.98) are very useful, because R_n and R_n^{(n-1)} are conjugate to each other; consequently, R_n has the same eigenvalues and trace as R_n^{(n-1)}. In light of (11.81), we see that (17.97) and (17.98) relate R_n (i.e., the transformation viewed in reference to the original coordinate system) to R_n^{(n-1)} [i.e., the same transformation viewed in reference to the coordinate system reached after the (n − 1) transformations]. Since the transformation R_n^{(n-1)} is usually described in a simple form, matrix calculations to compute R_n can readily be done. Now let us consider an example.

Example 17.4 Successive rotations A typical illustration of three successive transformations in moving coordinate systems is well known in association with the Euler angles. It comprises the following three steps:

1. Rotation by α around the z-axis in the original coordinate system (O)
2. Rotation by β around the y′-axis in the transferred coordinate system (I)
3. Rotation by γ around the z′′-axis (the same as the z′-axis) in the transferred coordinate system (II)

The above three steps are represented by the matrices R_{zα}, R′_{y′β}, and R′′_{z′′γ} of (17.12). That is, as a total transformation R̃₃, we have

R̃₃ = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}

= \begin{pmatrix} \cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma & -\cos\alpha\cos\beta\sin\gamma - \sin\alpha\cos\gamma & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\cos\beta\sin\gamma + \cos\alpha\cos\gamma & \sin\alpha\sin\beta \\ -\sin\beta\cos\gamma & \sin\beta\sin\gamma & \cos\beta \end{pmatrix}.    (17.101)

This matrix corresponds to (17.87) with n = 3. The angles α, β, and γ in (17.101) are called Euler angles and their domains are usually taken as 0 ≤ α ≤ 2π, 0 ≤ β ≤ π, 0 ≤ γ ≤ 2π. The matrix (17.101) is widely used in quantum mechanics and related fields of natural science. The matrix notation, however, differs from literature to literature, and so care should be taken [3–5]. Using the notation of (17.87), we have

R₁ = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix},    (17.102)

R₂^{(1)} = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix},    (17.103)

R₃^{(2)} = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}.    (17.104)
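The product R₁R₂^{(1)}R₃^{(2)} can be checked against the closed form (17.101) numerically (a sketch, not from the book):

```python
import numpy as np
from numpy import sin, cos

def rz(t):
    return np.array([[cos(t), -sin(t), 0], [sin(t), cos(t), 0], [0, 0, 1]])

def ry(t):
    return np.array([[cos(t), 0, sin(t)], [0, 1, 0], [-sin(t), 0, cos(t)]])

a, b, g = 0.4, 0.9, 1.7                   # alpha, beta, gamma
product = rz(a) @ ry(b) @ rz(g)           # R1 R2^(1) R3^(2)

closed_form = np.array([                   # (17.101)
    [cos(a)*cos(b)*cos(g) - sin(a)*sin(g), -cos(a)*cos(b)*sin(g) - sin(a)*cos(g), cos(a)*sin(b)],
    [sin(a)*cos(b)*cos(g) + cos(a)*sin(g), -sin(a)*cos(b)*sin(g) + cos(a)*cos(g), sin(a)*sin(b)],
    [-sin(b)*cos(g),                        sin(b)*sin(g),                        cos(b)]])
assert np.allclose(product, closed_form)
```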

From (17.94) we also get

P̃ ≡ R₁ R₂^{(1)} = \begin{pmatrix} \cos\alpha\cos\beta & -\sin\alpha & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta & \cos\alpha & \sin\alpha\sin\beta \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}.    (17.105)

Corresponding to (17.97), we have

R₃ = P̃ R₃^{(2)} P̃^{-1} = R₁ R₂^{(1)} R₃^{(2)} [R₂^{(1)}]^{-1} R₁^{-1}.    (17.106)

Now the matrix calculations are readily performed such that

R₃ = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\beta & 0 & -\sin\beta \\ 0 & 1 & 0 \\ \sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}

= \begin{pmatrix} (\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma + \cos^2\alpha\sin^2\beta & \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) - \cos\beta\sin\gamma & \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\ \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) + \cos\beta\sin\gamma & (\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma + \sin^2\alpha\sin^2\beta & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\ \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) + \cos\alpha\sin\beta\sin\gamma & \sin^2\beta\cos\gamma + \cos^2\beta \end{pmatrix}.    (17.107)

Notice that in (17.107) the trace χ is described as

χ = 1 + 2 cos γ.    (17.108)
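That R₃ of (17.106) is indeed the rotation by γ around the axis A of Fig. 17.18, and that its trace obeys (17.108), can be confirmed numerically (a sketch, not from the book):

```python
import numpy as np

def rz(t):
    # rotation about the z-axis, cf. (17.102)/(17.104)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def ry(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

a, b, g = 0.5, 1.0, 2.1                  # alpha, beta, gamma
P = rz(a) @ ry(b)                        # P~ of (17.105)
R3 = P @ rz(g) @ P.T                     # (17.106); P is orthogonal, so P^-1 = P^T

# R3 leaves the axis A fixed; its direction cosines are (cos a sin b, sin a sin b, cos b)
n = np.array([np.cos(a)*np.sin(b), np.sin(a)*np.sin(b), np.cos(b)])
assert np.allclose(R3 @ n, n)
assert np.isclose(np.trace(R3), 1 + 2*np.cos(g))   # (17.108)
```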

The trace is the same as that of R₃^{(2)}, as expected from (17.106) and (17.97). Equation (17.107) apparently seems complicated but has a simple and well-defined meaning. The rotation represented by R₃^{(2)} is characterized by a rotation by γ around the z′′-axis. Figure 17.18 represents the orientation of the z′′-axis viewed in reference to the original xyz-system. That is, the z′′-axis (identical to the rotation axis A) is designated by an azimuthal angle α and a zenithal angle β as shown. The operation R₃ is represented by a rotation by γ around the axis A in the xyz-system. The angles α, β, and γ coincide with the Euler angles designated with the same independent parameters α, β, and γ. From (17.77) and (17.107), a diagonalizing matrix for R₃ is P̃U. That is,

U P

{

~U= P ~ U R3 P ~U R3 P

ð2Þ = U { R3 U

=

eiγ 0 0

0 0 e - iγ 0 0 1

:

ð17:109Þ

Note that as P̃ is a real matrix, we have

P̃^{\dagger} = P̃^T = P̃^{-1}    (17.110)

and


Fig. 17.18 Rotation γ around the rotation axis A. The orientation of A is defined by the angles α and β as shown

P̃U = \begin{pmatrix} \cos\alpha\cos\beta & -\sin\alpha & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta & \cos\alpha & \sin\alpha\sin\beta \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}

= \begin{pmatrix} \frac{1}{\sqrt{2}}\cos\alpha\cos\beta + \frac{i}{\sqrt{2}}\sin\alpha & \frac{1}{\sqrt{2}}\cos\alpha\cos\beta - \frac{i}{\sqrt{2}}\sin\alpha & \cos\alpha\sin\beta \\ \frac{1}{\sqrt{2}}\sin\alpha\cos\beta - \frac{i}{\sqrt{2}}\cos\alpha & \frac{1}{\sqrt{2}}\sin\alpha\cos\beta + \frac{i}{\sqrt{2}}\cos\alpha & \sin\alpha\sin\beta \\ -\frac{1}{\sqrt{2}}\sin\beta & -\frac{1}{\sqrt{2}}\sin\beta & \cos\beta \end{pmatrix}.    (17.111)

A vector representing the rotation axis corresponds to the eigenvalue 1. The direction cosines of the x-, y-, and z-components of the rotation axis A are cos α sin β, sin α sin β, and cos β (see Fig. 3.1), respectively, when viewed in reference to the original xyz-coordinate system. This can directly be shown as follows: The characteristic equation of R₃ is expressed as


|R₃ − λE| = 0.    (17.112)

Using (17.107) we have

|R₃ − λE| = \begin{vmatrix} (\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma + \cos^2\alpha\sin^2\beta - \lambda & \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) - \cos\beta\sin\gamma & \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\ \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) + \cos\beta\sin\gamma & (\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma + \sin^2\alpha\sin^2\beta - \lambda & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\ \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) + \cos\alpha\sin\beta\sin\gamma & \sin^2\beta\cos\gamma + \cos^2\beta - \lambda \end{vmatrix}.    (17.113)

When λ = 1, we must have the direction cosines of the rotation axis as an eigenvector. That is, we get

\begin{pmatrix} (\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma + \cos^2\alpha\sin^2\beta - 1 & \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) - \cos\beta\sin\gamma & \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\ \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) + \cos\beta\sin\gamma & (\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma + \sin^2\alpha\sin^2\beta - 1 & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\ \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) + \cos\alpha\sin\beta\sin\gamma & \sin^2\beta\cos\gamma + \cos^2\beta - 1 \end{pmatrix} \begin{pmatrix} \cos\alpha\sin\beta \\ \sin\alpha\sin\beta \\ \cos\beta \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.    (17.114)
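Equation (17.114) can be confirmed numerically by building R₃ directly from the closed-form entries of (17.107) and applying R₃ − E to the direction cosines (a sketch, not from the book):

```python
import numpy as np
from numpy import sin, cos

a, b, g = 0.7, 1.3, 2.4                    # alpha, beta, gamma
R3 = np.array([                            # closed form (17.107)
    [(cos(a)**2*cos(b)**2 + sin(a)**2)*cos(g) + cos(a)**2*sin(b)**2,
     cos(a)*sin(a)*sin(b)**2*(1 - cos(g)) - cos(b)*sin(g),
     cos(a)*cos(b)*sin(b)*(1 - cos(g)) + sin(a)*sin(b)*sin(g)],
    [cos(a)*sin(a)*sin(b)**2*(1 - cos(g)) + cos(b)*sin(g),
     (sin(a)**2*cos(b)**2 + cos(a)**2)*cos(g) + sin(a)**2*sin(b)**2,
     sin(a)*cos(b)*sin(b)*(1 - cos(g)) - cos(a)*sin(b)*sin(g)],
    [cos(a)*cos(b)*sin(b)*(1 - cos(g)) - sin(a)*sin(b)*sin(g),
     sin(a)*cos(b)*sin(b)*(1 - cos(g)) + cos(a)*sin(b)*sin(g),
     sin(b)**2*cos(g) + cos(b)**2]])

n = np.array([cos(a)*sin(b), sin(a)*sin(b), cos(b)])   # direction cosines of the axis A
assert np.allclose((R3 - np.eye(3)) @ n, 0)            # (17.114)
```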

The above matrix calculations certainly verify that (17.114) holds. The confirmation is left as an exercise for readers. As an application of (17.107) to an illustrative example, let us consider a 2π/3 rotation around an axis trisecting the solid angle π/2 formed by the x-, y-, and z-axes (see Fig. 17.19 and Sect. 17.3). Then we have

cos α = sin α = 1/√2, cos β = 1/√3, sin β = √2/√3,
cos γ = −1/2, sin γ = √3/2.    (17.115)

Substituting (17.115) into (17.107), we get


Fig. 17.19 Rotation axis of C₃ that permutates the basis vectors. (a) The C₃ axis is a straight line that connects the origin and a vertex of a cube. (b) Cube viewed along the C₃ axis that connects the origin and the vertex

R₃ = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.    (17.116)
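Substituting the values of (17.115) into the conjugation (17.106) indeed produces the cyclic permutation matrix (17.116) (a numerical sketch, not from the book):

```python
import numpy as np
from numpy import sqrt

# Euler-angle values of (17.115): C3 axis trisecting the x-, y-, z-axes, gamma = 2*pi/3
ca = 1/sqrt(2); sa = 1/sqrt(2)
cb = 1/sqrt(3); sb = sqrt(2)/sqrt(3)
cg, sg = -0.5, sqrt(3)/2

R1 = np.array([[ca, -sa, 0], [sa, ca, 0], [0, 0, 1]])   # (17.102)
R2 = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])   # (17.103)
Rg = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])   # (17.104)

P = R1 @ R2
R3 = P @ Rg @ P.T                         # (17.106) with the values of (17.115)

assert np.allclose(R3, [[0, 0, 1], [1, 0, 0], [0, 1, 0]])  # (17.116)
```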

If we write the linear transformation R₃ following (11.37), we get

R₃(|x⟩) = (|e₁⟩ |e₂⟩ |e₃⟩) \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix},    (17.117)

where |e₁⟩, |e₂⟩, and |e₃⟩ represent the unit basis vectors in the direction of the x-, y-, and z-axes, respectively, and |x⟩ = x₁|e₁⟩ + x₂|e₂⟩ + x₃|e₃⟩. Thus, we get

R₃(|x⟩) = (|e₂⟩ |e₃⟩ |e₁⟩) \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.    (17.118)

This implies a cyclic permutation of the basis vectors, which is well characterized by Fig. 17.19. Alternatively, (17.117) can be expressed in terms of the column-vector (i.e., coordinate) transformation as

R₃(|x⟩) = (|e₁⟩ |e₂⟩ |e₃⟩) \begin{pmatrix} x_3 \\ x_1 \\ x_2 \end{pmatrix}.    (17.119)

Care should be taken as to which linear transformation is intended, the basis-vector transformation or the coordinate transformation. As mentioned above, we have compared the geometric features on the moving coordinate system and on the fixed coordinate system. The features apparently seem to differ at a glance, but are essentially the same. □

References

1. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
2. Cotton FA (1990) Chemical applications of group theory, 3rd edn. Wiley, New York
3. Edmonds AR (1957) Angular momentum in quantum mechanics. Princeton University Press, Princeton
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World Scientific, Singapore
5. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham

Chapter 18

Representation Theory of Groups

Representation theory is an important pillar of group theory. As we shall see soon, the word “representation” and its definition sound a bit daunting. In short, however, we need “numbers” or “matrices” to do mathematical calculations; therefore, we may think of a representation as merely numbers and matrices. Individual representations have their dimension. If that dimension is one, we treat a representation as a number (real or complex). If the dimension is two or more, we deal with matrices; in the case of dimension n, it is an (n, n) square matrix. In this chapter we focus on the representation theory of finite groups. In this case, we have an important theorem stating that a representation of any finite group can be converted to a unitary representation by a similarity transformation; that is, the group elements of a finite group can be represented by unitary matrices. According to the dimension of the representation, we have the same number of basis vectors. Bearing these things firmly in mind, we can pretty easily understand this important notion of representation.

18.1 Definition of Representation

In Sect. 16.4 we dealt with various aspects of the mapping between group elements; of these, we studied fundamental properties of isomorphism and homomorphism. In this section we introduce the notion of a representation of groups and study it. If we deal with a finite group consisting of n elements, we describe it as ℊ = {g₁ ≡ e, g₂, ⋯, g_n} as in the case of Sect. 16.1.

Definition 18.1 Let ℊ = {g_ν} be a group comprising elements g_ν. Suppose that a (d, d) matrix D(g_ν) is given for each group element g_ν. Suppose also that in correspondence with

© Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_18


g_μ g_ν = g_ρ,    (18.1)

D(g_μ) D(g_ν) = D(g_ρ)    (18.2)

holds. Then a set L consisting of the D(g_ν), that is, L = {D(g_ν)}, is said to be a representation. Although the definition seems somewhat daunting, a representation is, as already seen, merely a homomorphism. We call the individual matrices D(g_ν) representation matrices. The dimension d of the matrices is said to be the dimension of the representation as well. In correspondence with g_ν e = g_ν, we have

D(g_ν) D(e) = D(g_ν).    (18.3)

Therefore, we get

D(e) = E,    (18.4)

where E is an identity matrix of dimension d. Also, in correspondence with g_ν g_ν^{-1} = g_ν^{-1} g_ν = e, we have

D(g_ν) D(g_ν^{-1}) = D(g_ν^{-1}) D(g_ν) = D(e) = E.    (18.5)

That is,

D(g_ν^{-1}) = [D(g_ν)]^{-1}.    (18.6)

Namely, an inverse matrix corresponds to an inverse element. If the representation has one-to-one correspondence, the representation is said to be faithful. In the case of n-to-one correspondence, the representation is homomorphic. In particular, representing all group elements by 1 [as a (1, 1) matrix] is called the “identity representation.” Illustrative examples of representations have already appeared in Sects. 17.2 and 17.3; in those cases the representation was faithful. For instance, in Tables 17.2, 17.4, and 17.6 as well as Fig. 17.12, the number of representation matrices is the same as that of the group elements. In Sects. 17.2 and 17.3, in most cases we used real orthogonal matrices; such matrices are included among unitary matrices. A representation using unitary matrices is said to be a “unitary representation.” In fact, a representation of a finite group can always be converted to a unitary representation.
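Definition 18.1 can be illustrated on a small concrete group. The example below (ours, not the book's) takes the cyclic group C₃ and checks the homomorphism (18.2) together with (18.4)–(18.6):

```python
import numpy as np

# A faithful two-dimensional representation of the cyclic group C3 = {e, c, c^2}:
# D(c^k) is the 2x2 rotation by 2*pi*k/3.
def D(k):
    t = 2*np.pi*k/3
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

# (18.2): D(g_mu) D(g_nu) = D(g_mu g_nu); here g_mu g_nu = c^((mu+nu) mod 3)
for mu in range(3):
    for nu in range(3):
        assert np.allclose(D(mu) @ D(nu), D((mu + nu) % 3))

assert np.allclose(D(0), np.eye(2))          # (18.4): identity element -> E
assert np.allclose(D(1) @ D(2), np.eye(2))   # (18.5)/(18.6): c^2 = c^-1
```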


Theorem 18.1 [1, 2] A representation of any finite group can be converted to a unitary representation by a similarity transformation.

Proof Let ℊ = {g₁, g₂, ⋯, g_n} be a finite group of order n. Let D(g_ν) be a representation matrix of g_ν of dimension d. Here we suppose that D(g_ν) is not unitary, but non-singular. Using the D(g_i), we construct the following matrix H such that

H = Σ_{i=1}^{n} D(g_i)† D(g_i).    (18.7)

Then, for an arbitrarily chosen group element g_j, we have

D(g_j)† H D(g_j) = Σ_{i=1}^{n} D(g_j)† D(g_i)† D(g_i) D(g_j)
                = Σ_{i=1}^{n} [D(g_i) D(g_j)]† [D(g_i) D(g_j)]
                = Σ_{i=1}^{n} D(g_i g_j)† D(g_i g_j)
                = Σ_{k=1}^{n} D(g_k)† D(g_k) = H,    (18.8)

where the third equality comes from the homomorphism of the representation and the second last equality is due to the rearrangement theorem (see Sect. 16.1). Note that each matrix D(g_i)† D(g_i) is a Hermitian Gram matrix constructed from a non-singular matrix and that H is a summation of such Gram matrices. Consequently, on the basis of the argument of Sect. 13.2, H is positive definite and all the eigenvalues of H are positive. Then, using an appropriate unitary matrix U, we can get a diagonal matrix Λ such that

U† H U = Λ.    (18.9)

Here, the diagonal matrix Λ is given by

Λ = diag(λ₁, λ₂, ⋯, λ_d),    (18.10)

with λ_i > 0 (1 ≤ i ≤ d). We define Λ^{1/2} such that

Λ^{1/2} = diag(√λ₁, √λ₂, ⋯, √λ_d),    (18.11)

(Λ^{1/2})^{-1} = diag(1/√λ₁, 1/√λ₂, ⋯, 1/√λ_d).    (18.12)

Notice that both Λ^{1/2} and (Λ^{1/2})^{-1} are non-singular. Furthermore, we define a matrix V such that

V = U (Λ^{1/2})^{-1}.    (18.13)

Then, multiplying both sides of (18.8) by V^{-1} from the left and by V from the right and inserting V V^{-1} = E in between, we get

V^{-1} D(g_j)† V V^{-1} H V V^{-1} D(g_j) V = V^{-1} H V.    (18.14)

Meanwhile, we have

V^{-1} H V = Λ^{1/2} U† H U (Λ^{1/2})^{-1} = Λ^{1/2} Λ (Λ^{1/2})^{-1} = Λ.    (18.15)

With the second equality of (18.15), we used (18.9). Inserting (18.15) into (18.14), we get

V^{-1} D(g_j)† V Λ V^{-1} D(g_j) V = Λ.    (18.16)

Multiplying both sides of (18.16) by Λ^{-1} from the left, we get

Λ^{-1} V^{-1} D(g_j)† (VΛ) · V^{-1} D(g_j) V = E.    (18.17)

Using (18.13), we have

Λ^{-1} V^{-1} = (VΛ)^{-1} = (U Λ^{1/2})^{-1} = (Λ^{1/2})^{-1} U^{-1} = (Λ^{1/2})^{-1} U†.

Meanwhile, taking the adjoint of (18.13) and noting (18.12), we get

V† = [(Λ^{1/2})^{-1}]† U† = (Λ^{1/2})^{-1} U†.

Using (18.13) once again, we have


VΛ = U Λ^{1/2}.

Also using (18.13), we have

(V^{-1})† = {[U(Λ^{1/2})^{-1}]^{-1}}† = (Λ^{1/2} U†)† = (U†)† (Λ^{1/2})† = U Λ^{1/2},

where we used the unitarity of U and the Hermiticity of Λ^{1/2} indicated in (18.11). Using the above relations and rewriting (18.17), finally we get

V† D(g_j)† (V^{-1})† · V^{-1} D(g_j) V = E.    (18.18)

Defining D̃(g_j) as follows,

D̃(g_j) ≡ V^{-1} D(g_j) V,    (18.19)

and taking the adjoint of both sides of (18.19), we get

V† D(g_j)† (V^{-1})† = D̃(g_j)†.    (18.20)

Then, from (18.18) we have

D̃(g_j)† D̃(g_j) = E.    (18.21)

18.2

Basis Functions of Representation

In Part III we considered a linear transformation of a vector. In that case we have defined vectors as abstract elements with which operation laws of (11.1)–(11.8) hold. We assumed that the operation is addition. In this part, so far we have dealt with vectors mostly in ℝ3. Therefore, vectors naturally possess geometric features. In this section, we extend a notion of vectors so that they can be treated under a wider scope. More specifically, we include mathematical functions treated in analysis as vectors. To this end, let us think of a basis of a representation. According to the dimension d of the representation, we have d basis vectors. Here we adopt d linearly independent basis vectors for the representation.

710

18 Representation Theory of Groups

Let ψ 1, ψ 2, ⋯, and ψ d be linearly independent vectors in a vector space Vd. Let ℊ = {g1, g2, ⋯, gn} be a finite group of order n. Here we assume that gi (1 ≤ i ≤ n) is a linear transformation (or operator) such as a symmetry operation dealt with in Chap. 17. Suppose that the following relation holds with gi 2 ℊ (1 ≤ i ≤ n): d

gi ð ψ ν Þ =

ψ μ Dμν ðgi Þ ð1 ≤ ν ≤ dÞ:

μ=1

ð18:22Þ

Here we followed the notation of (11.37) that represented a linear transformation of a vector. Rewriting (18.22) more explicitly, we have D11 ðgi Þ ⋮ Dd1 ðgi Þ

gi ðψ ν Þ = ðψ 1 ψ 2 ⋯ ψ d Þ

⋯ D1d ðgi Þ ⋱ ⋮ ⋯ Ddd ðgi Þ

:

ð18:23Þ

Comparing (18.23) with (11.37), we notice that ψ 1, ψ 2, ⋯, and ψ d act as vectors. The corresponding coordinates of the vectors (or a column vector), i.e., x1, x2, ⋯, and xd, have been omitted. Now let us make sure that a set L consisting of {D(g1), D(g2), ⋯, D(gn)} forms a representation. Operating gj on (18.22), we have d

d

gj ½gi ðψ ν Þ = gj gi ðψ ν Þ = gj

μ=1

d

= μ=1

ψ μ Dμν ðgi Þ =

d

gj ψ μ Dμν ðgi Þ = =

d λ=1

μ=1

d λ=1

μ=1

gj ψ μ Dμν ðgi Þ

ψ λ Dλμ gj Dμν ðgi Þ

ψ λ D gj Dðgi Þ

λν

:

ð18:24Þ

Putting gjgi = gk according to multiplication of group elements, we get gk ð ψ ν Þ =

d λ=1

ψ λ D gj Dðgi Þ

λν

:

ð18:25Þ

Meanwhile, replacing gi in (18.22) with gk, we have d

gk ð ψ ν Þ =

μ=1

ψ λ Dλν ðgk Þ:

ð18:26Þ

Comparing (18.25) and (18.26) and considering the uniqueness of vector representation based on linear independence of the basis vectors (see Sect. 11.1), we get

18.2

Basis Functions of Representation

D gj D ð gi Þ

711

λν

= Dλν ðgk Þ  ½Dðgk Þλν ,

ð18:27Þ

where the last identity follows the notation (11.38) of Sect. 11.2. Hence, we have D gj Dðgi Þ = Dðgk Þ:

ð18:28Þ

Thus, the set L consisting of {D(g1), D(g2), ⋯, D(gn)} is certainly a representation of the group ℊ = {g1, g2, ⋯, gn}. In such a case, the set B consisting of linearly independent d functions (i.e., vectors) B = fψ 1 , ψ 2 , ⋯, ψ d g

ð18:29Þ

is said to be a basis functions of the representation D. The number d equals the dimension of representation. Correspondingly, the representation matrix is a (d, d ) square matrix. As remarked above, the correspondence between the elements of L and ℊ is not necessarily one-to-one (isomorphic), but may be n-to-one (homomorphic). Definition 18.2 Let D and D′ be two representations of ℊ = {g1, g2, ⋯, gn}. Suppose that these representations are related to each other by similarity transformation such that D0 ðgi Þ = T - 1 Dðgi ÞT ð1 ≤ i ≤ nÞ,

ð18:30Þ

where T is a non-singular matrix. Then, D and D′ are said to be equivalent representations, or simply equivalent. If the representations are not equivalent, they are called inequivalent. Suppose that B = fψ 1 , ψ 2 , ⋯, ψ d g is a basis of a representation D of ℊ = {g1, g2, ⋯, gn}. Then we have (18.22). Let T be a non-singular matrix. Using T, we want to transform a basis of the representation D from B to a new set B 0 = ψ 01 , ψ 02 , ⋯, ψ 0d . Individual elements of the new basis are expressed as ψ 0ν =

d λ=1

ψ λ T λν :

ð18:31Þ

Since T is non-singular, this ensures that ψ 01 , ψ 02 , ⋯, and ψ 0d are linearly independent and, hence, that B 0 forms another basis set of D (see discussion of Sect. 11.4). Thus, we can describe ψ μ (1 ≤ μ ≤ n) in terms of ψ 0ν . That is, d

ψμ =

λ=1

ψ 0λ T - 1

Operating gi on both sides of (18.31), we have

λμ

:

ð18:32Þ

712

18 Representation Theory of Groups

gi ψ 0ν = =

d λ=1

d

d

μ=1 κ=1

d λ=1

ψ 0κ T - 1

κμ

d

d

gi ðψ λ ÞT λν =

λ=1

Dμλ ðgi ÞT λν =

μ=1 d κ=1

ψ μ Dμλ ðgi ÞT λν ψ 0κ T - 1 Dðgi ÞT

κν

:

ð18:33Þ

Let D′ be a representation of ℊ in reference to B 0 = ψ 01 , ψ 02 , ⋯, ψ 0d . Then we have gi ψ 0ν =

d κ=1

ψ 0κ D ′ κν :

Hence, (18.30) follows in virtue of the linear independence of ψ 01 , ψ 02 , ⋯, and ψ 0d . Thus, we see that the transformation of basis vectors via (18.31) causes similarity transformation between representation matrices. This is in parallel with (11.81) and (11.88). Below we show several important notions as a definition. We assume that a group is a finite group. ~ be two representations of ℊ = {g1, g2, ⋯, gn}. Let Definition 18.3 Let D and D ~ ðgi Þ be two representation matrices of gi (1 ≤ i ≤ n). If we construct Δ(gi) D(gi) and D such that

Δ ð gi Þ =

Dðgi Þ

~ ð gi Þ , D

ð18:34Þ

then Δ(gi) is a representation as well. The representation Δ(gi) is said to be a direct ~ ðgi Þ. We denote it by sum of D(gi) and D Δðgi Þ = Dðgi Þ

~ ðgi Þ: D

ð18:35Þ

~ ðgi Þ. A dimension Δ(gi) is a sum of that of D(gi) and that of D Definition 18.4 Let D be a representation of ℊ and let D(gi) be a representation matrix of a group element gi. If we can convert D(gi) to a block matrix such as (18.34) by its similarity transformation, D is said to be a reducible representation, or completely reducible. If the representation is not reducible, it is irreducible. A reducible representation can be decomposed (or reduced) to a direct sum of plural irreducible representations. Such an operation is called reduction. Definition 18.5 Let ℊ = {g1, g2, ⋯, gn} be a group and let V be a vector space. Then, we denote a representation D of ℊ operating on V by


    D : ℊ → GL(V).

We call V a representation space (or carrier space) of D [3, 4]. From Definition 18.5, the dimension of the representation is identical with the dimension of V. Suppose that there is a subspace W in V. If W is D(g_i)-invariant (see Sect. 12.2) for all g_i ∈ ℊ, W is said to be an invariant subspace of V. Here, we say that W is D(g_i)-invariant if we have |x⟩ ∈ W ⟹ D(g_i)|x⟩ ∈ W; see Sect. 12.2. Notice that |x⟩ may well represent a function ψ_ν of (18.22). In this context, we have the following important theorem.

Theorem 18.2 [3] Let D: ℊ → GL(V) be a unitary representation over V. Let W be a D(g_i)-invariant subspace of V, where g_i (1 ≤ i ≤ n) ∈ ℊ = {g1, g2, ⋯, gn}. Then, W⊥ is a D(g_i)-invariant subspace of V as well.

Proof Suppose that |a⟩ ∈ W⊥ and let |b⟩ ∈ W. Then, from (13.86) we have

    ⟨b|D(g_i)|a⟩ = ⟨a|D(g_i)†|b⟩* = ⟨a|[D(g_i)]^{-1}|b⟩* = ⟨a|D(g_i^{-1})|b⟩*,  (18.36)

where with the second equality we used the fact that D(g_i) is a unitary matrix; also, see (18.6). But, since W is D(g_i)-invariant from the supposition, D(g_i^{-1})|b⟩ ∈ W. Here notice that g_i^{-1} must be identical to some g_k (1 ≤ k ≤ n) ∈ ℊ. Therefore, we should have

    ⟨a|D(g_i^{-1})|b⟩ = 0.

Hence, from (18.36) we get

    0 = ⟨a|D(g_i^{-1})|b⟩* = ⟨b|D(g_i)|a⟩.

This implies that D(g_i)|a⟩ ∈ W⊥. In turn, this means that W⊥ is a D(g_i)-invariant subspace of V. This completes the proof.

From Theorem 14.1, we have V = W ⊕ W⊥. Correspondingly, as in the case of (12.102), D(g_i) is reduced as follows:

    D(g_i) = ( D^{(W)}(g_i)    0
               0               D^{(W⊥)}(g_i) ),

where D^{(W)}(g_i) and D^{(W⊥)}(g_i) are the representation matrices associated with the subspaces W and W⊥, respectively. This is the converse of the additive representations that appeared in Definition 18.3. From (11.17) and (11.18), we have

    dim V = dim W + dim W⊥.  (18.37)
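Theorem 18.2 can be sanity-checked numerically. The following is a minimal Python sketch (not part of the text); it uses the four unitary (3, 3) matrices that will appear in Table 18.1 of Example 18.1 below, with W = span{|ϕ1⟩} as the invariant subspace. The dictionary labels are ad hoc names for the C2v operations.

```python
# Numerical check of Theorem 18.2: for a unitary representation, the orthogonal
# complement of an invariant subspace is itself invariant.
# Matrices: the C2v representation carried by the allyl pi-orbitals (Table 18.1).
REP = {
    "E":       [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    "C2":      [[-1, 0, 0], [0, 0, -1], [0, -1, 0]],
    "sv(zx)":  [[1, 0, 0], [0, 0, 1], [0, 1, 0]],
    "sv'(yz)": [[-1, 0, 0], [0, -1, 0], [0, 0, -1]],
}

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def invariant(subspace_basis, outside_axes):
    """True if D(g)|x> stays inside the subspace for every g and every basis
    vector |x>, i.e. the components along the 'outside' axes all vanish."""
    for m in REP.values():
        for v in subspace_basis:
            w = mat_vec(m, v)
            if any(abs(w[k]) > 1e-12 for k in outside_axes):
                return False
    return True

w_invariant = invariant([[1, 0, 0]], outside_axes=[1, 2])            # W = span{phi1}
wperp_invariant = invariant([[0, 1, 0], [0, 0, 1]], outside_axes=[0])  # W-perp

print(w_invariant, wperp_invariant)   # both True, as Theorem 18.2 requires
```

The check that W⊥ is invariant is exactly the block structure of D(g_i) displayed above.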


If V is a d-dimensional vector space, V is spanned by d linearly independent basis vectors (or functions) ψ_μ (1 ≤ μ ≤ d). Suppose that the dimensions of W and W⊥ are d^{(W)} and d^{(W⊥)}, respectively. Then, W and W⊥ are spanned by d^{(W)} linearly independent vectors and d^{(W⊥)} linearly independent vectors, respectively. The subspaces W and W⊥ may well further be decomposed into orthogonal complements [i.e., D(g_i)-invariant subspaces of V]. In this situation, in general, we write

    D(g_i) = D^{(1)}(g_i) ⊕ D^{(2)}(g_i) ⊕ ⋯ ⊕ D^{(ω)}(g_i).  (18.38)

Thus, a unitary representation, if it is reducible at all, is completely reducible (Definition 18.4). This is one of the conspicuous features of unitary representations, as can readily be understood from the course of the proof of Theorem 18.2. We will develop a further detailed discussion in Sect. 18.4. If the aforementioned decomposition cannot be done, D is irreducible. Therefore, it will be of great importance to examine the dimensions of the irreducible representations contained in (18.38). This is closely related to the choice of basis vectors and the properties of the invariant subspaces. Notice that the same irreducible representation may occur repeatedly in (18.38).

Before advancing to the next section, however, let us think of an example to get used to the abstract concepts. At the same time, this anticipates the contents of Chap. 19.

Example 18.1 Figure 18.1 shows structural formulae, including resonance structures, for the allyl radical. Thanks to the resonance structures, the allyl radical belongs to C2v; see the multiplication table in Table 17.1. The molecule lies on the yz-plane, and the line connecting C1 and a bonded H is the C2 axis (see Fig. 18.1). The C2 axis is identical to the z-axis. The molecule has mirror symmetries with respect to the yz- and zx-planes. We denote the π-orbitals of C1, C2, and C3 by ϕ1, ϕ2, and ϕ3, respectively. We suppose that these orbitals extend toward the x-direction with a positive sign on the upper side and a negative sign on the lower side relative to the plane of the paper. Notice that we follow the custom of group theory notation with the coordinate setting in Fig. 18.1.

We consider an inner product space V3 spanned by ϕ1, ϕ2, and ϕ3. To show explicitly that these are vectors of the inner product space, we express them as |ϕ1⟩, |ϕ2⟩, and |ϕ3⟩ in this example. Then, according to (11.19) of Sect. 11.1, we write

    V3 = Span{|ϕ1⟩, |ϕ2⟩, |ϕ3⟩}.

The vector space V3 is a representation space pertinent to a representation D of the present example. Also, in parallel to (13.32) of Sect. 13.2, we express an arbitrary vector |ψ⟩ ∈ V3 as


Fig. 18.1 Allyl radical and its resonance structure. The molecule is placed on the yz-plane. The z-axis is identical with the straight line connecting C1 and H (i.e., the C2 axis)

    |ψ⟩ = c1|ϕ1⟩ + c2|ϕ2⟩ + c3|ϕ3⟩ = |c1ϕ1 + c2ϕ2 + c3ϕ3⟩
        = (|ϕ1⟩ |ϕ2⟩ |ϕ3⟩) ( c1
                             c2
                             c3 ).

Let us now operate with a group element of C2v. For example, choosing C2(z), we have

    C2(z)(|ψ⟩) = (|ϕ1⟩ |ϕ2⟩ |ϕ3⟩) ( -1  0  0
                                     0   0 -1
                                     0  -1  0 ) ( c1
                                                  c2
                                                  c3 ).  (18.39)

Thus, C2(z) is represented by a (3, 3) matrix. The other group elements are represented similarly. These results are collected in Table 18.1, where each matrix is given with respect to |ϕ1⟩, |ϕ2⟩, and |ϕ3⟩ as basis vectors. Notice that the matrix representations differ from those of Table 17.2, where we chose |x⟩, |y⟩, and |z⟩ as the basis vectors. From Table 18.1, we immediately see that the representation matrices are reduced to an upper (1, 1) diagonal block (i.e., just a number) and a lower (2, 2) diagonal block. In Table 18.1, we find that all the matrices are Hermitian (as well as unitary). Since C2v is an Abelian group (i.e., commutative), in light of Theorem 14.14, we should be able to diagonalize these matrices by a single unitary similarity transformation all at once. In fact, E and σv′(yz) (being ±E) are invariant under any unitary similarity transformation, and so we only have to diagonalize C2(z) and σv(zx). As the characteristic equation of the (3, 3) matrix of (18.39), we have


Table 18.1 Matrix representation of the symmetry operations given with respect to |ϕ1⟩, |ϕ2⟩, and |ϕ3⟩

    E:        ( 1  0  0;  0  1  0;  0  0  1 )
    C2(z):    ( -1 0  0;  0  0 -1;  0 -1  0 )
    σv(zx):   ( 1  0  0;  0  0  1;  0  1  0 )
    σv′(yz):  ( -1 0  0;  0 -1  0;  0  0 -1 )

    | -λ-1   0    0  |
    |  0    -λ   -1  | = 0.
    |  0    -1   -λ  |

Solving the above equation, we get λ = 1 or λ = -1 (as a double root). Also, as a diagonalizing unitary matrix U, we get

    U = ( 1    0      0
          0   1/√2   1/√2
          0   1/√2  -1/√2 ) = U†.  (18.40)

Thus, (18.39) can be rewritten as

    C2(z)(|ψ⟩) = (|ϕ1⟩ |ϕ2⟩ |ϕ3⟩) U U† ( -1 0 0; 0 0 -1; 0 -1 0 ) U U† ( c1; c2; c3 )
      = ( |ϕ1⟩  (1/√2)|ϕ2 + ϕ3⟩  (1/√2)|ϕ2 - ϕ3⟩ ) ( -1 0 0; 0 -1 0; 0 0 1 ) ( c1; (1/√2)(c2 + c3); (1/√2)(c2 - c3) ).

The diagonalization of the representation matrix σv(zx) using the same U is left as an exercise for the readers. Table 18.2 shows the results of the diagonalized representations with regard to the vectors |ϕ1⟩, (1/√2)|ϕ2 + ϕ3⟩, and (1/√2)|ϕ2 - ϕ3⟩. Notice that the traces of the matrices remain unchanged between Tables 18.1 and 18.2, i.e., before and after the unitary similarity transformation. From Table 18.2, we find that the vectors are eigenfunctions of the individual symmetry operations. Each diagonal element is the corresponding eigenvalue of those operations. Of the three vectors, |ϕ1⟩ and (1/√2)|ϕ2 + ϕ3⟩ have the same eigenvalues with respect to the individual symmetry operations. With the symmetry operations C2 and σv(zx), the other vector (1/√2)|ϕ2 - ϕ3⟩ has an eigenvalue of opposite sign to that of the former two vectors. Thus, we find that we have arrived at "symmetry-adapted" vectors by taking linear combinations of the original vectors.
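The simultaneous diagonalization claimed above can be verified numerically. Below is a minimal Python sketch (not part of the text); the matrices are those of Table 18.1 and U is that of (18.40), while the dictionary keys are ad hoc labels.

```python
import math

# U of (18.40); it is real and symmetric, so U dagger = U (and U*U = identity).
s = 1 / math.sqrt(2)
U = [[1, 0, 0], [0, s, s], [0, s, -s]]

# The four representation matrices of Table 18.1.
REP = {
    "E":       [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    "C2":      [[-1, 0, 0], [0, 0, -1], [0, -1, 0]],
    "sv(zx)":  [[1, 0, 0], [0, 0, 1], [0, 1, 0]],
    "sv'(yz)": [[-1, 0, 0], [0, -1, 0], [0, 0, -1]],
}

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

results = {}
for name, m in REP.items():
    d = matmul(matmul(U, m), U)          # U dagger M U, with U dagger = U
    off_diag = max(abs(d[i][j]) for i in range(3) for j in range(3) if i != j)
    trace_kept = abs(sum(d[i][i] for i in range(3))
                     - sum(m[i][i] for i in range(3))) < 1e-12
    results[name] = (off_diag < 1e-12, trace_kept)

print(results)   # every matrix becomes diagonal; every trace is preserved
```

A single U works for all four matrices at once, exactly as Theorem 14.14 predicts for an Abelian group.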


Table 18.2 Matrix representation of the symmetry operations given with respect to |ϕ1⟩, (1/√2)(|ϕ2 + ϕ3⟩), and (1/√2)(|ϕ2 - ϕ3⟩)

    E:        diag(1, 1, 1)
    C2(z):    diag(-1, -1, 1)
    σv(zx):   diag(1, 1, -1)
    σv′(yz):  diag(-1, -1, -1)

Returning to Table 17.2, we see that the representation matrices there have already been diagonalized. In terms of the representation space, we constructed those representation matrices with respect to x, y, and z as symmetry-adapted basis vectors. In the next chapter, we make the most of such vectors constructed by symmetry-adapted linear combination. Meanwhile, by allocating appropriate numbers to the coefficients c1, c2, and c3, we can represent any vector in V3 spanned by |ϕ1⟩, |ϕ2⟩, and |ϕ3⟩. In the next chapter, we deal with molecular orbital (MO) calculations. Including the present case of the allyl radical, we solve the energy eigenvalue problem by appropriately determining those coefficients (i.e., eigenvectors) and the corresponding energy eigenvalues in a representation space. The dimension of the representation space depends on the number of molecular orbitals. The representation space is decomposed into several or more (but finitely many) orthogonal complements according to (18.38). We will come back to the present example in Sect. 19.4.4 and investigate the problems further there.

As can be seen in the above example, the representation space accommodates various types of vectors, e.g., mathematical functions in the present case. If the representation space is decomposed into invariant subspaces, we can choose appropriate basis vectors for each subspace, the number of which is equal to the dimensionality of each irreducible representation [4]. In this context, the situation is related to that of Part III, where we studied how a linear vector space is decomposed into a direct sum of invariant subspaces spanned by associated basis vectors. In particular, it will be of great importance to construct mutually orthogonal symmetry-adapted vectors through linear combinations of the original basis vectors in each subspace associated with an irreducible representation. We further study these important subjects in the following several sections.

18.3 Schur's Lemmas and Grand Orthogonality Theorem (GOT)

Pivotal notions of the representation theory of finite groups rest upon Schur's lemmas (the first lemma and the second lemma).

Schur's First Lemma [1, 2] Let D and D̃ be two irreducible representations of ℊ, of dimensions m and n, respectively. Suppose that for ∀g ∈ ℊ the following relation holds:


    D(g) M = M D̃(g),  (18.41)

where M is a (m, n) matrix. Then we must have either Case (1): M = 0, or Case (2): M is a square matrix (i.e., m = n) with det M ≠ 0. In Case (1), the representations D and D̃ are inequivalent. In Case (2), on the other hand, D and D̃ are equivalent.

Proof (a) First, suppose that m > n. Let B = {ψ_1, ψ_2, ⋯, ψ_m} be a basis set of the representation space related to D. Then we have

    g(ψ_ν) = Σ_{μ=1}^{m} ψ_μ D_{μν}(g)  (1 ≤ ν ≤ m).

Next we form linear combinations of ψ_1, ψ_2, ⋯, and ψ_m such that

    ϕ_ν = Σ_{μ=1}^{m} ψ_μ M_{μν}  (1 ≤ ν ≤ n).  (18.42)

Operating g on both sides of (18.42), we have

    g(ϕ_ν) = Σ_{μ=1}^{m} g(ψ_μ) M_{μν} = Σ_{μ=1}^{m} Σ_{λ=1}^{m} ψ_λ D_{λμ}(g) M_{μν}
           = Σ_{λ=1}^{m} ψ_λ [Σ_{μ=1}^{m} D_{λμ}(g) M_{μν}] = Σ_{λ=1}^{m} ψ_λ [Σ_{μ=1}^{n} M_{λμ} D̃_{μν}(g)]
           = Σ_{μ=1}^{n} [Σ_{λ=1}^{m} ψ_λ M_{λμ}] D̃_{μν}(g) = Σ_{μ=1}^{n} ϕ_μ D̃_{μν}(g).  (18.43)

With the fourth equality of (18.43), we used (18.41). Therefore, B̃ = {ϕ_1, ϕ_2, ⋯, ϕ_n} constitutes a basis set of the representation space for D̃. If m > n, we would thus have obtained a D-invariant subspace spanned by at most n (< m) linearly independent functions, in contradiction to the supposition that D is irreducible. To avoid this contradiction, we must have M = 0.

(b) Next, suppose that m < n. Then a relation analogous to (18.41) holds with a (n, m) matrix [see (18.46)]. Figure 18.2 graphically shows the magnitude relationship in dimensions of representation between the representation matrices and M. Thus, in parallel with the above argument of (a), we exclude the case where m < n as well.

(c) We consider the third case of m = n. Similarly as before, we make linear combinations of the vectors contained in the basis set B = {ψ_1, ψ_2, ⋯, ψ_m} such that

    ϕ_ν = Σ_{μ=1}^{m} ψ_μ M_{μν}  (1 ≤ ν ≤ m),  (18.47)

where M is a (m, m) square matrix. If det M = 0, then ϕ_1, ϕ_2, ⋯, and ϕ_m are linearly dependent (see Sect. 11.4); with p the number of linearly independent vectors among them, we have p < m accordingly. As in (18.43) again, this implies that we would have obtained a representation of a smaller dimension p for D̃, in contradiction to the supposition that D̃ is irreducible. To avoid this contradiction, we must have det M ≠ 0. These complete the proof.

Schur's Second Lemma [1, 2] Let D be a representation of ℊ. Suppose that for ∀g ∈ ℊ we have

Fig. 18.2 Magnitude relationship between the dimensions of the representation matrices [D(g) and D̃(g)] and M. (a) m > n: a (m, m) matrix times a (m, n) matrix equals a (m, n) matrix times a (n, n) matrix. (b) m < n: a (n, n) matrix times a (n, m) matrix equals a (n, m) matrix times a (m, m) matrix. The diagram is based on (18.41) and (18.46) of the text

    D(g) M = M D(g).  (18.48)

Then, if D is irreducible, M = cE with c being some complex number, where E is an identity matrix.

Proof Let c be an arbitrarily chosen complex number. From (18.48) we have

    D(g)(M - cE) = (M - cE) D(g).  (18.49)

If D is irreducible, Schur's first lemma implies that we must have either (1) M - cE = 0 or (2) det(M - cE) ≠ 0. A matrix M has at least one proper eigenvalue λ (Sect. 12.1), and so choosing λ for c in (18.49), we have det(M - λE) = 0. Consequently, only the former case is allowed. That is, we have

    M = cE.  (18.50)

This completes the proof.

Schur's lemmas lead to an important orthogonality theorem that plays a fundamental role in many scientific fields. The orthogonality theorem includes the orthogonality of matrices and of their traces (or characters).

Theorem 18.3 Grand Orthogonality Theorem (GOT) [5] Let D^{(1)}, D^{(2)}, ⋯ be all the inequivalent irreducible representations of a group ℊ = {g1, g2, ⋯, gn} of order


n. Let D^{(α)} and D^{(β)} be two irreducible representations chosen from among D^{(1)}, D^{(2)}, ⋯. Then, regarding their matrix representations, we have the following relationship:

    Σ_g D_{ij}^{(α)}(g)* D_{kl}^{(β)}(g) = (n/d_α) δ_{αβ} δ_{ik} δ_{jl},  (18.51)

where Σ_g means that the summation should be taken over all n group elements and d_α denotes the dimension of the representation D^{(α)}. The symbol δ_{αβ} means that δ_{αβ} = 1 when D^{(α)} and D^{(β)} are equivalent and δ_{αβ} = 0 when they are inequivalent.

Proof First we prove the case where D^{(α)} = D^{(β)}. For the sake of simple expression, we omit the superscript and denote D^{(α)} simply by D. Let us construct a matrix A such that

    A = Σ_g D(g) X D(g^{-1}),  (18.52)

where X is an arbitrary matrix. Hence,

    D(g′) A = Σ_g D(g′) D(g) X D(g^{-1}) = Σ_g D(g′) D(g) X D(g^{-1}) D(g′)^{-1} D(g′)
            = Σ_g D(g′g) X D((g′g)^{-1}) D(g′).  (18.53)

Thanks to the rearrangement theorem, for fixed g′ the element g′g runs through all the group elements as g does. Therefore, we have

    Σ_g D(g′g) X D((g′g)^{-1}) = Σ_g D(g) X D(g^{-1}) = A.  (18.54)

Thus,

    D(g) A = A D(g).  (18.55)

According to Schur's second lemma, we have

    A = λE.  (18.56)

The value of the constant λ depends upon the choice of X. Let X be δ_i^{(l)} δ_j^{(m)}, i.e., the matrix in which all elements are zero except for the (l, m)-component, which is 1 (Sects. 12.5 and 12.6). Thus, from (18.52) and (18.56) we have

    Σ_g Σ_{p,q} D_{ip}(g) δ_p^{(l)} δ_q^{(m)} D_{qj}(g^{-1}) = Σ_g D_{il}(g) D_{mj}(g^{-1}) = λ_{lm} δ_{ij},  (18.57)

where λ_{lm} is a constant to be determined. Using the fact that the representation is unitary, from (18.57) we have

    Σ_g D_{il}(g) D_{jm}(g)* = λ_{lm} δ_{ij}.  (18.58)

Next, we wish to determine the coefficients λ_{lm}. To this end, setting i = j and summing over i in (18.57), we get for the LHS

    Σ_g Σ_i D_{il}(g) D_{mi}(g^{-1}) = Σ_g [D(g^{-1}) D(g)]_{ml} = Σ_g [D(g^{-1} g)]_{ml} = Σ_g [D(e)]_{ml} = Σ_g δ_{ml} = n δ_{ml},  (18.59)

where n is equal to the order of the group. As for the RHS, we have

    Σ_i λ_{lm} δ_{ii} = λ_{lm} d,  (18.60)

where d is equal to the dimension of D. From (18.59) and (18.60), we get

    λ_{lm} d = n δ_{lm}, or λ_{lm} = (n/d) δ_{lm}.  (18.61)

Therefore, from (18.58),

    Σ_g D_{il}(g) D_{jm}(g)* = (n/d) δ_{lm} δ_{ij}.  (18.62)

Specifying the species of the irreducible representation, we get

    Σ_g D_{il}^{(α)}(g) D_{jm}^{(α)}(g)* = (n/d_α) δ_{lm} δ_{ij},  (18.63)

where d_α is the dimension of D^{(α)}. Next, we examine the relationship between two inequivalent irreducible representations. Let D^{(α)} and D^{(β)} be such representations with dimensions d_α and d_β, respectively. Let us construct a matrix B such that

    B = Σ_g D^{(α)}(g) X D^{(β)}(g^{-1}),  (18.64)

where X is again an arbitrary matrix. Hence,


    D^{(α)}(g′) B = Σ_g D^{(α)}(g′) D^{(α)}(g) X D^{(β)}(g^{-1})
                 = Σ_g D^{(α)}(g′) D^{(α)}(g) X D^{(β)}(g^{-1}) D^{(β)}(g′)^{-1} D^{(β)}(g′)
                 = Σ_g D^{(α)}(g′g) X D^{(β)}((g′g)^{-1}) D^{(β)}(g′) = B D^{(β)}(g′).  (18.65)

According to Schur's first lemma, we have

    B = 0.  (18.66)

Putting X = δ_i^{(l)} δ_j^{(m)} as before and rewriting (18.64), we get

    Σ_g D_{il}^{(α)}(g) D_{jm}^{(β)}(g)* = 0.  (18.67)

Combining (18.63) and (18.67), we get (18.51). These procedures complete the proof.
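GOT can be verified numerically on a concrete non-abelian example. The sketch below (not part of the text) uses a standard two-dimensional unitary irreducible representation of C3v, taken here as an assumed example: the rotations by 0°, 120°, and 240° plus three mirror matrices. It checks (18.51) for α = β, where the sum must equal (n/d_α) δ_{ik} δ_{jl} = 3 δ_{ik} δ_{jl}, and the orthogonality against the trivial one-dimensional representation.

```python
import itertools
import math

def rot(t):
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# the six matrices of a 2-dim irreducible representation of C3v (assumed example)
sigma = [[1, 0], [0, -1]]
group = [rot(0), rot(2 * math.pi / 3), rot(4 * math.pi / 3),
         sigma, matmul(rot(2 * math.pi / 3), sigma),
         matmul(rot(4 * math.pi / 3), sigma)]
n, d = 6, 2

# (18.51) with alpha = beta (matrices are real, so conjugation is omitted):
# sum_g D_ij(g) D_kl(g) = (n/d) delta_ik delta_jl
got_ok = all(
    abs(sum(g[i][j] * g[k][l] for g in group)
        - ((n / d) if (i == k and j == l) else 0.0)) < 1e-9
    for i, j, k, l in itertools.product(range(2), repeat=4)
)

# (18.51) with alpha != beta, taking D^(beta) as the trivial representation
# (every element represented by the number 1): sum_g 1 * D_ij(g) = 0
cross_ok = all(abs(sum(g[i][j] for g in group)) < 1e-9
               for i in range(2) for j in range(2))

print(got_ok, cross_ok)
```

A two-dimensional irrep is used deliberately: with one-dimensional representations, GOT degenerates into the character orthogonality (18.76), while here the indices i, j, k, l are exercised in full.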

18.4 Characters

Representation matrices of a group are square matrices. In Part III we examined the properties of a trace, i.e., the sum of the diagonal elements of a square matrix. In group theory the trace is called a character.

Definition 18.6 Let D be a (matrix) representation of a group ℊ = {g1, g2, ⋯, gn}. The sum of the diagonal elements χ(g) is defined as follows:

    χ(g) ≡ Tr D(g) = Σ_{i=1}^{d} D_{ii}(g),  (18.68)

where g stands for the group elements g1, g2, ⋯, and gn, Tr stands for "trace," and d is the dimension of the representation D. Let ∁ be a set defined as

    ∁ = {χ(g1), χ(g2), ⋯, χ(gn)}.  (18.69)

Then, the set ∁ is called the character of D. A character of an irreducible representation is said to be an irreducible character. Let us describe several properties of the character or trace.

1. The character of the identity element χ(e) is equal to the dimension d of the representation. This is because the identity element is given by a unit matrix.
2. Let P and Q be two square matrices. Then, we have


    Tr(PQ) = Tr(QP).  (18.70)

This is because

    Tr(PQ) = Σ_i (PQ)_{ii} = Σ_i Σ_j P_{ij} Q_{ji} = Σ_j Σ_i Q_{ji} P_{ij} = Σ_j (QP)_{jj} = Tr(QP).  (18.71)

Putting Q = S P^{-1} in (18.70), we get

    Tr(P S P^{-1}) = Tr(S P^{-1} P) = Tr(S).  (18.72)

Therefore, we have the following property:

3. Characters of group elements that are conjugate to each other are equal. If g_i and g_j are conjugate, these elements are connected by means of a suitable element g such that

    g g_i g^{-1} = g_j.  (18.73)

Accordingly, the representation matrices are related by

    D(g) D(g_i) D(g^{-1}) = D(g) D(g_i) [D(g)]^{-1} = D(g_j).  (18.74)

Taking the trace of both sides of (18.74), from (18.72) we have

    χ(g_i) = χ(g_j).  (18.75)

4. Any two equivalent representations have the same trace. This immediately follows from (18.30).

There are several orthogonality theorems about traces. Among them, the following theorem is well known.

Theorem 18.4 The traces of irreducible representations satisfy the following orthogonality relation:

    Σ_g χ^{(α)}(g)* χ^{(β)}(g) = n δ_{αβ},  (18.76)

where χ^{(α)} and χ^{(β)} are the traces of the irreducible representations D^{(α)} and D^{(β)}, respectively.

Proof In (18.51), putting i = j and k = l on both sides and summing over all i and k, we have

    Σ_g Σ_{i,k} D_{ii}^{(α)}(g)* D_{kk}^{(β)}(g) = (n/d_α) δ_{αβ} Σ_{i,k} δ_{ik} δ_{ik} = (n/d_α) δ_{αβ} Σ_{i,k} δ_{ik} = (n/d_α) δ_{αβ} d_α = n δ_{αβ}.  (18.77)

From (18.68) and (18.77), we get (18.76). This completes the proof.

Since a character is identical among group elements belonging to the same conjugacy class K_l, we may write it as χ(K_l) and rewrite the summation of (18.76) as a summation over the conjugacy classes. Thus, we have

    Σ_{l=1}^{n_c} χ^{(α)}(K_l)* χ^{(β)}(K_l) k_l = n δ_{αβ},  (18.78)

where n_c denotes the number of conjugacy classes in the group and k_l indicates the number of group elements contained in the class K_l.

We have seen a case where a representation matrix can be reduced to two (or more) block matrices as in (18.34). Also, as already seen in Part III, the block-matrix decomposition takes place with normal matrices (including unitary matrices). The character is often used to examine the constitution of a reducible representation or a reducible matrix. Alternatively, if a unitary matrix (a normal matrix, more generally) is decomposed into block matrices, we say that the unitary matrix comprises a direct sum of those block matrices. In physics and chemistry dealing with atoms, molecules, crystals, etc., we very often encounter such a situation. Extending (18.35), the relation can generally be summarized as

    D(g_i) = D^{(1)}(g_i) ⊕ D^{(2)}(g_i) ⊕ ⋯ ⊕ D^{(ω)}(g_i),  (18.79)

where D(g_i) is a reducible representation for a group element g_i and D^{(1)}(g_i), D^{(2)}(g_i), ⋯, and D^{(ω)}(g_i) are irreducible representations of the group. The notation D^{(ω)}(g_i) means that D^{(ω)}(g_i) may be equivalent (or identical) to D^{(1)}(g_i), D^{(2)}(g_i), etc., or may be inequivalent to them. More specifically, the same irreducible representation may well appear several times. To make the above situation clear, we usually use the following equation instead:

    D(g_i) = ⊕_α q_α D^{(α)}(g_i),  (18.80)

where q_α is zero or a positive integer and the D^{(α)} are the different types of irreducible representations. If the same D^{(α)} appears repeatedly in the direct sum, then q_α specifies how many times D^{(α)} appears; unless D^{(α)} appears, q_α is zero. Bearing the above in mind, we take the trace of (18.80). Then we have

    χ(g) = Σ_α q_α χ^{(α)}(g),  (18.81)

where χ(g) is the character of the reducible representation for a group element g; here we omitted the subscript i indicating the i-th group element g_i (1 ≤ i ≤ n). To find q_α, let us multiply both sides of (18.81) by χ^{(α)}(g)* and take the summation over the group elements. That is,

    Σ_g χ^{(α)}(g)* χ(g) = Σ_β q_β Σ_g χ^{(α)}(g)* χ^{(β)}(g) = Σ_β q_β n δ_{αβ} = q_α n,  (18.82)

where we used (18.76) with the second equality. Thus, we get

    q_α = (1/n) Σ_g χ^{(α)}(g)* χ(g).  (18.83)

The integer q_α explicitly gives the number of times D^{(α)}(g) appears in the reducible representation D(g). The expression pertinent to the classes is

    q_α = (1/n) Σ_j χ^{(α)}(K_j)* χ(K_j) k_j,  (18.84)

where Kj and kj denote the j-th class of the group and the number of elements belonging to Kj, respectively.
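As a concrete use of (18.83), the three-dimensional allyl representation of Example 18.1 can be reduced by hand. The following Python sketch (not part of the text) reads the characters off Table 18.1 (traces 3, -1, 1, -3) and the C2v character table (Table 18.4); all characters are real, so conjugation is omitted.

```python
# Reduction formula (18.83) applied to the allyl pi-orbital representation of C2v.
# Element order throughout: E, C2(z), sigma_v(zx), sigma_v'(yz).
irreps = {
    "A1": [1, 1, 1, 1],
    "A2": [1, 1, -1, -1],
    "B1": [1, -1, 1, -1],
    "B2": [1, -1, -1, 1],
}
chi_red = [3, -1, 1, -3]   # traces of the four matrices in Table 18.1
n = 4                      # order of C2v

# q_alpha = (1/n) * sum_g chi_alpha(g) * chi(g); all characters here are real
q = {name: sum(ca * c for ca, c in zip(chi_a, chi_red)) // n
     for name, chi_a in irreps.items()}
print(q)
```

The result, q = {A1: 0, A2: 1, B1: 2, B2: 0}, says the allyl π system reduces to A2 ⊕ 2 B1 — consistent with Table 18.2, where |ϕ1⟩ and (1/√2)|ϕ2 + ϕ3⟩ share one set of eigenvalues and (1/√2)|ϕ2 - ϕ3⟩ carries the other.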

18.5 Regular Representation and Group Algebra

Now, the readers may wonder how many different irreducible representations exist for a given group. To answer this question, let us introduce a special representation called the regular representation.

Definition 18.7 Let ℊ = {g1, g2, ⋯, gn} be a group. Let us define a (n, n) square matrix D^{(R)}(g_ν) for an arbitrary group element g_ν (1 ≤ ν ≤ n) such that

    [D^{(R)}(g_ν)]_{ij} = δ(g_i^{-1} g_ν g_j)  (1 ≤ i, j ≤ n),  (18.85)

where

    δ(g_ν) = { 1 for g_ν = e (i.e., the identity),
               0 for g_ν ≠ e.  (18.86)

Let us consider the set


    R = {D^{(R)}(g1), D^{(R)}(g2), ⋯, D^{(R)}(gn)}.  (18.87)

Then, the set R is said to be the regular representation of the group ℊ. In fact, R is a representation. This is confirmed as follows: In (18.85), δ(g_i^{-1} g_ν g_j) = 1 if g_i^{-1} g_ν g_j = e, which occurs when g_ν g_j = g_i (A). Meanwhile, consider the situation where g_j^{-1} g_μ g_k = e. This occurs when g_μ g_k = g_j (B). Replacing g_j in (A) with that in (B), we have

    g_ν g_μ g_k = g_i.  (18.88)

That is,

    g_i^{-1} g_ν g_μ g_k = e.  (18.89)

If we choose g_i and g_ν, then g_j is uniquely decided from (A). If g_μ is separately chosen, then g_k is uniquely decided from (B) as well, because g_j has already been uniquely decided. Thus, performing the following matrix calculation, we get

    Σ_j [D^{(R)}(g_ν)]_{ij} [D^{(R)}(g_μ)]_{jk} = Σ_j δ(g_i^{-1} g_ν g_j) δ(g_j^{-1} g_μ g_k) = δ(g_i^{-1} g_ν g_μ g_k) = [D^{(R)}(g_ν g_μ)]_{ik}.  (18.90)

Rewriting (18.90) in matrix product form, we get

    D^{(R)}(g_ν) D^{(R)}(g_μ) = D^{(R)}(g_ν g_μ).  (18.91)

Thus, D^{(R)} is certainly a representation. To further confirm this, let us think of an example.

Example 18.2 Let us consider a thiophene molecule that we have already examined in Sect. 17.2. In the multiplication table, we arrange E, C2, σv, σv′ in the first column and their inverse elements E, C2, σv, σv′ in the first row. In this case, each inverse element is the same as the original element itself. Paying attention, e.g., to C2, we allocate the number 1 to the places where C2 appears and the number 0 otherwise. The resulting matrix is the regular representation of C2; see Table 18.3. Thus, as D^{(R)}(C2), we get


Table 18.3 How to make a regular representation of C2v

    C2v        E^{-1}    C2(z)^{-1}    σv(zx)^{-1}    σv′(yz)^{-1}
    E          E         C2            σv             σv′
    C2(z)      C2        E             σv′            σv
    σv(zx)     σv        σv′           E              C2
    σv′(yz)    σv′       σv            C2             E

    D^{(R)}(C2) = ( 0 1 0 0
                    1 0 0 0
                    0 0 0 1
                    0 0 1 0 ).  (18.92)

As evidenced in (18.92), the rearrangement theorem ensures that the number 1 appears once and only once in each column and each row, in such a way that the individual column and row vectors are linearly independent. Thus, at the same time, we confirm that the matrix is unitary. Another characteristic of the regular representation is that the identity is represented by an identity matrix. In this example, we have

    D^{(R)}(E) = ( 1 0 0 0
                   0 1 0 0
                   0 0 1 0
                   0 0 0 1 ).  (18.93)

For the other symmetry operations, we have

    D^{(R)}(σv) = ( 0 0 1 0          D^{(R)}(σv′) = ( 0 0 0 1
                    0 0 0 1                           0 0 1 0
                    1 0 0 0                           0 1 0 0
                    0 1 0 0 ),                        1 0 0 0 ).

Let χ^{(R)}(g_ν) be the character of the regular representation. Then, according to the definition (18.85),

    χ^{(R)}(g_ν) = Σ_{i=1}^{n} δ(g_i^{-1} g_ν g_i) = { n for g_ν = e,
                                                      0 for g_ν ≠ e.  (18.94)

As can be seen from (18.92), the regular representation is reducible because the matrix is decomposed into block matrices. Therefore, the representation can be reduced to a direct sum of irreducible representations such that

    D^{(R)} = ⊕_α q_α D^{(α)},  (18.95)


Table 18.4 Character table of C2v

    C2v    E    C2(z)    σv(zx)    σv′(yz)
    A1     1      1        1         1       z; x², y², z²
    A2     1      1       -1        -1       xy
    B1     1     -1        1        -1       x; zx
    B2     1     -1       -1         1       y; yz

where q_α is a positive integer or zero and D^{(α)} is an irreducible representation. Then, from (18.81), we have

    χ^{(R)}(g_ν) = Σ_α q_α χ^{(α)}(g_ν),  (18.96)

where χ^{(α)} is the trace of the irreducible representation D^{(α)}. Using (18.83) and (18.94),

    q_α = (1/n) Σ_g χ^{(α)}(g)* χ^{(R)}(g) = (1/n) χ^{(α)}(e)* χ^{(R)}(e) = (1/n) χ^{(α)}(e)* · n = χ^{(α)}(e) = d_α,  (18.97)

where we noted that χ^{(α)}(e) = d_α is real.

Note that the dimension d_α of the representation D^{(α)} is equal to the trace of its identity matrix. Also notice from (18.95) and (18.97) that D^{(R)} contains every irreducible representation D^{(α)} exactly d_α times. To show this more clearly, Table 18.4 gives the character table of C2v. The regular representation matrices are given in Example 18.2. For these, we have

    D^{(R)}(C2v) = A1 + A2 + B1 + B2.  (18.98)

This relation obviously indicates that all the irreducible representations of C2v are contained exactly once (i.e., the same number of times as the dimension of each representation of C2v, which is one). To examine general properties of the irreducible representations, we return to (18.94) and (18.96). Replacing q_α with d_α there, we get

    Σ_α d_α χ^{(α)}(g_ν) = { n for g_ν = e,
                             0 for g_ν ≠ e.  (18.99)

In particular, when g_ν = e, again we have χ^{(α)}(e) = d_α (d_α is a real number!). That is,

    Σ_α d_α² = n.  (18.100)

This is a very important relation in the representation theory of a finite group, in that (18.100) sets an upper limit on the number of irreducible representations and their dimensions. That number cannot exceed the order of the group.
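The statements (18.91), (18.94), (18.97), and (18.100) can all be exercised at once by building the regular representation of C2v from (18.85). In the Python sketch below (not part of the text), the elements are modeled as sign flips (s_x, s_y) acting on (x, y): E = (1, 1), C2 = (-1, -1), σv(zx) = (1, -1), σv′(yz) = (-1, 1); this encoding is an assumption for illustration.

```python
# Regular representation of C2v (cf. Example 18.2) and its decomposition.
elems = [(1, 1), (-1, -1), (1, -1), (-1, 1)]   # E, C2, sigma_v, sigma_v'

def mul(a, b):
    return (a[0] * b[0], a[1] * b[1])

inv = {g: g for g in elems}   # every element of C2v is its own inverse
n = len(elems)

def D_reg(g):
    # (18.85): D(R)(g)_ij = 1 iff g_i^{-1} g g_j = e, i.e. g g_j = g_i
    return [[1 if mul(mul(inv[gi], g), gj) == (1, 1) else 0 for gj in elems]
            for gi in elems]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# homomorphism property (18.91)
hom_ok = all(matmul(D_reg(g), D_reg(h)) == D_reg(mul(g, h))
             for g in elems for h in elems)

# characters (18.94): n for the identity, 0 otherwise
chi_R = [sum(D_reg(g)[i][i] for i in range(n)) for g in elems]

# q_alpha = d_alpha (18.97) for the four 1-dim irreps of Table 18.4
irreps = {"A1": [1, 1, 1, 1], "A2": [1, 1, -1, -1],
          "B1": [1, -1, 1, -1], "B2": [1, -1, -1, 1]}
q = {a: sum(c * cr for c, cr in zip(chi, chi_R)) // n for a, chi in irreps.items()}

print(hom_ok, chi_R, q)
```

The run reproduces (18.98): every irrep occurs q_α = d_α = 1 times, and 1² + 1² + 1² + 1² = 4 = n confirms (18.100) for C2v.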


In (18.76) and (18.78), we have shown the orthogonality relationship between traces. We have another important orthogonality relationship between them. To prove this theorem, we need the notion of a group algebra [5]. The argument is as follows:

(a) Let us think of a set comprising group elements expressed as

    ℵ = Σ_g a_g g,  (18.101)

where g is a group element of a group ℊ = {g1, g2, ⋯, gn}, a_g is an arbitrarily chosen complex number, and Σ_g means that the summation should be taken over the group elements. Let ℵ′ be another set similarly defined as in (18.101). That is,

    ℵ′ = Σ_{g′} a′_{g′} g′.  (18.102)

Then we can define the following sum:

    ℵ + ℵ′ = Σ_g a_g g + Σ_{g′} a′_{g′} g′ = Σ_g (a_g g + a′_g g) = Σ_g (a_g + a′_g) g.  (18.103)

Also, we get

    ℵ ∙ ℵ′ = (Σ_g a_g g)(Σ_{g′} a′_{g′} g′) = Σ_g Σ_{g′} a_g a′_{g′} (g g′)
           = Σ_{g′} Σ_g a_{g g′} a′_{g′^{-1}} (g g′)(g′^{-1}) = Σ_g [Σ_{g′} a_{g g′} a′_{g′^{-1}}] g,  (18.104)

where with the third equality g has been replaced by g g′ and g′ by g′^{-1},

where we used the rearrangement theorem and suitable exchange of group elements. Thus, we see that the above-defined set ℵ is closed under summations (i.e., linear combinations) and multiplications. A set closed under summations and multiplications is said to be an algebra. If the set forms a group, the said set is called a group algebra.
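The convolution form of the coefficients in (18.104) can be checked against a direct term-by-term expansion. In the Python sketch below (not part of the text), C2v elements are again modeled as sign flips, and the coefficients a_g and a′_g are arbitrary illustrative values (an assumption, not from the text).

```python
from collections import defaultdict

# Group-algebra product check for (18.104) on C2v.
elems = [(1, 1), (-1, -1), (1, -1), (-1, 1)]   # E, C2, sigma_v, sigma_v'

def mul(a, b):
    return (a[0] * b[0], a[1] * b[1])

inv = {g: g for g in elems}   # each C2v element is its own inverse

a = {elems[0]: 1, elems[1]: 2, elems[2]: 0, elems[3]: -1}   # coefficients of aleph
b = {elems[0]: 3, elems[1]: 0, elems[2]: 1, elems[3]: 2}    # coefficients of aleph'

# direct expansion: (sum_g a_g g)(sum_g' b_g' g'), collected by product element
direct = defaultdict(int)
for g, ag in a.items():
    for h, bh in b.items():
        direct[mul(g, h)] += ag * bh

# convolution form of (18.104): coefficient of g is sum_{g'} a_{g g'} b_{g'^{-1}}
conv = {g: sum(a[mul(g, gp)] * b[inv[gp]] for gp in elems) for g in elems}

print(dict(direct) == conv)   # True: both collections of coefficients agree
```

The agreement is exactly the index substitution (g → g g′, g′ → g′^{-1}) justified by the rearrangement theorem.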


So far we have treated the calculation between two elements g and g′ as a multiplication, i.e., g ⋄ g′ (see Sect. 16.1). Now we start regarding the calculations as summation as well. In that case, group elements act as basis vectors in a vector space. Bearing this in mind, let us further define a specific group algebra.

(b) Let K_i be the i-th conjugacy class of a group ℊ = {g1, g2, ⋯, gn}. Also let K_i be such that

    K_i = {A_1^{(i)}, A_2^{(i)}, ⋯, A_{k_i}^{(i)}},  (18.105)

where k_i is the number of elements belonging to K_i. Now think of the set g K_i g^{-1} (∀g ∈ ℊ). Then, by the definition of a class, we have

    g K_i g^{-1} ⊂ K_i.  (18.106)

Multiplying by g^{-1} from the left and g from the right on both sides, we get K_i ⊂ g^{-1} K_i g. Since g is arbitrarily chosen, replacing g with g^{-1}, we have

    K_i ⊂ g K_i g^{-1}.  (18.107)

Therefore, we get

    g K_i g^{-1} = K_i.  (18.108)

Meanwhile, for A_α^{(i)}, A_β^{(i)} ∈ K_i with A_α^{(i)} ≠ A_β^{(i)} (1 ≤ α, β ≤ k_i), we have

    g A_α^{(i)} g^{-1} ≠ g A_β^{(i)} g^{-1}  (∀g ∈ ℊ).  (18.109)

This is because if the equality held in (18.109), we would have A_α^{(i)} = A_β^{(i)}, in contradiction.

(c) Let K be a set collecting several classes, described as

    K = Σ_i a_i K_i,  (18.110)

where a_i is a positive integer or zero. Thanks to (18.108) and (18.110), we have

    g K g^{-1} = K.  (18.111)


Conversely, if a group algebra K satisfies (18.111), K can be expressed as a sum of classes as in (18.110). Suppose instead that K is not expressed by (18.110) but is described by

    K = Σ_i a_i K_i + Q,  (18.112)

where Q is an "incomplete" set that does not form a class. Then,

    g K g^{-1} = g (Σ_i a_i K_i + Q) g^{-1} = Σ_i a_i g K_i g^{-1} + g Q g^{-1} = Σ_i a_i K_i + g Q g^{-1} = K = Σ_i a_i K_i + Q,  (18.113)

where the equality before the last comes from (18.111). Thus, from (18.113) we get

    g Q g^{-1} = Q  (∀g ∈ ℊ).  (18.114)

By the definition of classes, this implies that Q must form a "complete" class, in contradiction to the supposition. Thus, (18.110) holds. In Sect. 16.3, we described several characteristics of an invariant subgroup. As readily seen from (16.11) and (18.111), any invariant subgroup consists of two or more entire classes. Conversely, if a group comprises entire classes, it must be an invariant subgroup.

(d) Let us think of a product of classes. Let K_i and K_j be sets described as in (18.105). We define the product K_i K_j as the set containing the products A_α^{(i)} A_β^{(j)} (1 ≤ α ≤ k_i, 1 ≤ β ≤ k_j). That is,

    K_i K_j = Σ_{l=1}^{k_i} Σ_{m=1}^{k_j} A_l^{(i)} A_m^{(j)}.  (18.115)

Multiplying (18.115) by g (∀g ∈ ℊ) from the left and by g^{-1} from the right, we get

    g K_i K_j g^{-1} = (g K_i g^{-1})(g K_j g^{-1}) = K_i K_j.  (18.116)

From the above discussion, we get

K_iK_j = Σ_l c_{ijl}K_l,  (18.117)

where c_{ijl} is a positive integer or zero. In fact, when we take gK_iK_jg^{-1}, we merely permute the terms in (18.117).

(e) If two group elements of the group ℊ = {g_1, g_2, ⋯, g_n} are conjugate to each other, their inverse elements are conjugate to each other as well. In fact, suppose that for g_μ, g_ν ∈ K_i we have

g_μ = gg_νg^{-1}.  (18.118)

Then, taking the inverse of (18.118), we get

g_μ^{-1} = gg_ν^{-1}g^{-1}.  (18.119)

Thus, given a class K_i, there exists another class K_{i′} that consists of the inverses of the elements of K_i. If g_μ ≠ g_ν, then g_μ^{-1} ≠ g_ν^{-1}; therefore, K_i and K_{i′} are of the same order. Suppose that the number of elements contained in K_i is k_i and that in K_{i′} is k_{i′}. Then, we have

k_i = k_{i′}.  (18.120)

If K_j ≠ K_{i′} (i.e., K_j is not identical with K_{i′} as a set), K_iK_j does not contain e. In fact, suppose that for g_ρ ∈ K_i there were a group element b ∈ K_j such that bg_ρ = e. Then, we would have

b = g_ρ^{-1} and b ∈ K_{i′}.  (18.121)

This would mean that b belongs to both K_j and K_{i′}, implying that K_j = K_{i′}, since two classes are either identical or disjoint. This is in contradiction to K_j ≠ K_{i′}. Consequently, K_iK_j does not contain e. Taking the product of the classes K_i and K_{i′}, on the other hand, we obtain the identity e precisely k_i times. Rewriting (18.117), we have

K_iK_j = c_{ij1}K_1 + Σ_{l≠1} c_{ijl}K_l,  (18.122)

where K_1 = {e}. As mentioned below, we are most interested in the first term of (18.122). In (18.122), if K_j = K_{i′}, we have

c_{ij1} = k_i.  (18.123)

On the other hand, if K_j ≠ K_{i′}, c_{ij1} = 0. Summarizing the above arguments, we get


c_{ij1} = { k_i for K_j = K_{i′};  0 for K_j ≠ K_{i′} }.  (18.124)

Or, symbolically, we write

c_{ij1} = k_iδ_{ji′}.  (18.125)
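As a concrete illustration (not part of the original text), the class relations above can be checked numerically for the symmetric group S3; the sketch below verifies the conjugation invariance (18.108) and the class-product structure (18.117) with (18.125), using permutations of three letters.

```python
from itertools import permutations
from collections import Counter

# Elements of S3 as permutations of (0, 1, 2); compose(p, q) = p.q (apply q first)
elems = list(permutations(range(3)))
def compose(p, q):
    return tuple(p[q[x]] for x in range(3))
def inv(p):
    r = [0, 0, 0]
    for x in range(3):
        r[p[x]] = x
    return tuple(r)

# Conjugacy classes, distinguished by the number of fixed points:
K1 = [p for p in elems if sum(p[x] == x for x in range(3)) == 3]   # {e}
K2 = [p for p in elems if sum(p[x] == x for x in range(3)) == 1]   # transpositions
K3 = [p for p in elems if sum(p[x] == x for x in range(3)) == 0]   # 3-cycles

# (18.108): gK_i g^{-1} = K_i for every g
for g in elems:
    for K in (K1, K2, K3):
        assert {compose(compose(g, a), inv(g)) for a in K} == set(K)

# (18.117)/(18.125): the class product K2*K2 is a sum of whole classes; since
# transpositions are self-inverse (K2' = K2), e occurs exactly k2 = 3 times.
prod = Counter(compose(a, b) for a in K2 for b in K2)
print(prod[K1[0]])   # multiplicity of e in K2*K2
```

The printed multiplicity equals k_2 = 3, in agreement with (18.123); the remaining six products distribute evenly over the 3-cycles, so K_2K_2 = 3K_1 + 3K_3.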

18.6 Classes and Irreducible Representations

After the aforementioned considerations, we have the following theorem:

Theorem 18.5 Let ℊ = {g_1, g_2, ⋯, g_n} be a group. The traces of its irreducible representations satisfy the following orthogonality relation:

Σ_{α=1}^{n_r} χ^{(α)}(K_i)[χ^{(α)}(K_j)]* = (n/k_i)δ_{ij},  (18.126)

where the summation over α is taken over all n_r inequivalent irreducible representations, K_i and K_j indicate conjugacy classes, and k_i denotes the number of elements contained in the i-th class K_i.

Proof Rewriting (18.108), we have

gK_i = K_ig  (∀g ∈ ℊ).  (18.127)

Since a homomorphic correspondence holds between a group element and its representation matrix, a similar correspondence holds as well with (18.127). Let K̄_i be the sum of the k_i matrices of the α-th irreducible representation D^{(α)} that represent the elements of K_i:

K̄_i = Σ_{g∈K_i} D^{(α)}(g).  (18.128)

Note that in (18.128) D^{(α)} functions as a linear transformation with respect to a group algebra. From (18.127), we have

D^{(α)}(g)K̄_i = K̄_iD^{(α)}(g)  (∀g ∈ ℊ).  (18.129)

Since D^{(α)} is an irreducible representation, K̄_i must, on the basis of Schur's second lemma, be expressed as

K̄_i = λE.  (18.130)

To determine λ, we take the trace of both sides of (18.130). Then, from (18.128) and (18.130), we get

k_iχ^{(α)}(K_i) = λd_α.  (18.131)

Thus, we get

K̄_i = (k_i/d_α)χ^{(α)}(K_i)E.  (18.132)
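The scalar form (18.132) of a class sum can be checked numerically. The sketch below (an illustrative construction, not from the text) builds the two-dimensional irreducible representation of S3 by restricting the permutation representation to the invariant plane x + y + z = 0 and verifies that each class sum is a multiple of the unit matrix.

```python
import numpy as np
from itertools import permutations

elems = list(permutations(range(3)))
def perm_mat(p):
    M = np.zeros((3, 3))
    for x in range(3):
        M[p[x], x] = 1.0
    return M

# Orthonormal basis (columns of B) of the invariant plane x + y + z = 0
B = np.linalg.qr(np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]))[0]
D = {g: B.T @ perm_mat(g) @ B for g in elems}   # real orthogonal 2-dim irrep
d = 2

# Class sum of the transpositions: by (18.130)-(18.132) it equals (k_i chi / d) E
Ki = [p for p in elems if sum(p[x] == x for x in range(3)) == 1]
Ki_sum = sum(D[g] for g in Ki)
chi = np.trace(D[Ki[0]])                        # class character; chi = 0 here
print(np.allclose(Ki_sum, len(Ki) * chi / d * np.eye(d)))
```

Here the sum of the three reflection matrices vanishes, matching (k_iχ/d)E = 0; for the class of 3-cycles the same check gives (2·(-1)/2)E = -E.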

Next, corresponding to (18.122), we have

K̄_iK̄_j = Σ_l c_{ijl}K̄_l.  (18.133)

Replacing each K̄ in (18.133) with the expression (18.132), we get

(k_ik_j/d_α)χ^{(α)}(K_i)χ^{(α)}(K_j) = Σ_l c_{ijl}k_lχ^{(α)}(K_l).  (18.134)

Returning to (18.99) and rewriting it, we have

Σ_α d_αχ^{(α)}(K_i) = nδ_{i1},  (18.135)

where again we have K_1 = {e}. Multiplying (18.134) by d_α and summing over all the irreducible representations α, we have

k_ik_j Σ_α χ^{(α)}(K_i)χ^{(α)}(K_j) = Σ_l c_{ijl}k_l Σ_α d_αχ^{(α)}(K_l) = Σ_l c_{ijl}k_ln δ_{l1} = c_{ij1}n.  (18.136)

In (18.136) we remark that k_1 = 1, i.e., the number of group elements contained in K_1 = {e} is 1. Rewriting (18.136), we get

k_ik_j Σ_α χ^{(α)}(K_i)χ^{(α)}(K_j) = c_{ij1}n = k_in δ_{ji′},  (18.137)

where we used (18.125) with the last equality. Moreover, using


χ^{(α)}(K_{i′}) = [χ^{(α)}(K_i)]*,  (18.138)

we get

k_ik_j Σ_α χ^{(α)}(K_i)[χ^{(α)}(K_j)]* = c_{ij1}n = k_in δ_{ji}.  (18.139)

Rewriting (18.139), we finally get

Σ_{α=1}^{n_r} χ^{(α)}(K_i)[χ^{(α)}(K_j)]* = (n/k_i)δ_{ij}.  (18.126)

These complete the proof.

Equations (18.78) and (18.126) are well known as orthogonality relations. So far, we have had no idea about the relative magnitudes of the two numbers n_c and n_r. In (18.78), let us consider the following set S:

S = {√k_1 χ^{(α)}(K_1), √k_2 χ^{(α)}(K_2), ⋯, √k_{n_c} χ^{(α)}(K_{n_c})}.  (18.140)

Viewing the individual components of S as coordinates of an n_c-dimensional vector, (18.78) can be regarded as an inner product expressed using (complex) coordinates of two such vectors. At the same time, (18.78) represents an orthogonality relationship between the vectors. Since we can obtain at most n_c mutually orthogonal (i.e., linearly independent) vectors in an n_c-dimensional space, for the number n_r of such vectors we have

n_r ≤ n_c.  (18.141)

Here n_r is equal to the number of different α, i.e., the number of irreducible representations. Meanwhile, in (18.126) we consider the following set S′:

S′ = {χ^{(1)}(K_i), χ^{(2)}(K_i), ⋯, χ^{(n_r)}(K_i)}.  (18.142)

Similarly, the individual components of S′ can be considered as coordinates of an n_r-dimensional vector. Again, (18.126) implies an orthogonality relation among such vectors. Therefore, for the number n_c of mutually orthogonal vectors, we have

n_c ≤ n_r.  (18.143)

Thus, from (18.141) and (18.143), we finally reach a simple but very important conclusion about the relationship between n_c and n_r, namely

n_r = n_c.  (18.144)

That is, the number n_r of inequivalent irreducible representations is equal to the number n_c of conjugacy classes of the group. An immediate and important consequence of (18.144), together with (18.100), is that every representation of an Abelian group is one-dimensional. This is because in an Abelian group each group element constitutes a conjugacy class by itself; we have n_c = n accordingly.
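Both conclusions can be checked against a small example. The sketch below (illustrative; the character table of S3 is assumed, not taken from the text) verifies the column orthogonality (18.126) and the squareness of the character table implied by n_r = n_c.

```python
import numpy as np

# Character table of S3: rows = irreps, columns = classes
# Classes: {e}, {three transpositions}, {two 3-cycles}, sizes k_i below
chi = np.array([[1,  1,  1],     # A1 (trivial)
                [1, -1,  1],     # A2 (sign)
                [2,  0, -1]])    # E  (2-dimensional)
k = np.array([1, 3, 2])
n = k.sum()                      # group order, n = 6

# Column orthogonality, Eq. (18.126):
# sum_alpha chi^(a)(K_i) [chi^(a)(K_j)]* = (n/k_i) delta_ij
G = chi.T @ chi.conj()
print(G)                         # -> diag(6, 2, 3) = diag(n/k_i)

# n_r = n_c, Eq. (18.144): the character table is square
assert chi.shape[0] == chi.shape[1]
```

The diagonal entries 6, 2, 3 are exactly n/k_i for the three classes, and all off-diagonal inner products vanish.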

18.7 Projection Operators: Revisited

In Sect. 18.2 we described how basis vectors (or functions) are transformed by a symmetry operation. In that case the symmetry operation is performed by a group element that belongs to a transformation group. More specifically, given a group ℊ = {g_1, g_2, ⋯, g_n} and a set of basis vectors ψ_1, ψ_2, ⋯, ψ_d, the basis vectors are transformed by g_i ∈ ℊ (1 ≤ i ≤ n) such that

g_i(ψ_ν) = Σ_{μ=1}^{d} ψ_μD_{μν}(g_i).  (18.22)

We may well ask then how we can construct such basis vectors. In this section we address this question. A central concept here is the projection operator. We have already studied the definition and basic properties of projection operators; here we deal with them bearing in mind that we apply the group theory to molecular science, especially to quantum chemical calculations.

In Sect. 18.6, we examined the permissible number of irreducible representations and concluded that the number n_r of inequivalent irreducible representations is equal to the number n_c of conjugacy classes of the group. Following this conclusion, we modify the above Eq. (18.22) such that

g(ψ_i^{(α)}) = Σ_{j=1}^{d_α} ψ_j^{(α)}D_{ji}^{(α)}(g),  (18.145)

where α and d_α denote the α-th irreducible representation and its dimension, respectively; the subscript i is omitted from g_i for simplicity. This naturally leads to the next question of how the basis vectors ψ_j^{(α)} are related to the basis vectors ψ_j^{(β)} belonging to a different irreducible representation β.

Suppose that we have an arbitrarily chosen function f. Then, f is assumed to contain various components of different irreducible representations. Thus, let us assume that f is decomposed into components such that


f = Σ_α Σ_m c_m^{(α)}ψ_m^{(α)},  (18.146)

where c_m^{(α)} is a coefficient of the expansion and ψ_m^{(α)} is the m-th component of the α-th irreducible representation. Now, let us define the following operator:

P_{l(m)}^{(α)} ≡ (d_α/n) Σ_g [D_{lm}^{(α)}(g)]* g,  (18.147)

where Σ_g means that the summation is taken over all n group elements g, and d_α denotes the dimension of the representation D^{(α)}. Operating P_{l(m)}^{(α)} on f, we have

P_{l(m)}^{(α)}f = (d_α/n) Σ_g [D_{lm}^{(α)}(g)]* gf
  = (d_α/n) Σ_g [D_{lm}^{(α)}(g)]* Σ_ν Σ_k c_k^{(ν)} gψ_k^{(ν)}
  = (d_α/n) Σ_g [D_{lm}^{(α)}(g)]* Σ_ν Σ_k c_k^{(ν)} Σ_{j=1}^{d_ν} ψ_j^{(ν)}D_{jk}^{(ν)}(g)
  = (d_α/n) Σ_ν Σ_k c_k^{(ν)} Σ_{j=1}^{d_ν} ψ_j^{(ν)} Σ_g [D_{lm}^{(α)}(g)]* D_{jk}^{(ν)}(g)
  = (d_α/n) Σ_ν Σ_k c_k^{(ν)} Σ_{j=1}^{d_ν} ψ_j^{(ν)} (n/d_α)δ_{αν}δ_{lj}δ_{mk} = c_m^{(α)}ψ_l^{(α)},  (18.148)

where we used (18.51) for the equality before the last.

Thus, the implication of the operator P_{l(m)}^{(α)} is that if f contains the component ψ_m^{(α)}, i.e., c_m^{(α)} ≠ 0, then P_{l(m)}^{(α)} extracts the component c_m^{(α)}ψ_m^{(α)} from f and converts it to c_m^{(α)}ψ_l^{(α)}. If the ψ_m^{(α)} component is not contained in f, then (18.148) implies P_{l(m)}^{(α)}f = 0; in that case we choose a more suitable function for f. In this context, we will investigate an example of a quantum chemical calculation later.

In the above case, let us call P_{l(m)}^{(α)} a projection operator sensu lato, relaxing Definition 14.1 of the projection operator. In Sect. 14.1, we dealt with several aspects of projection operators. There we mentioned that, in a rigorous sense, a projection operator should be idempotent and Hermitian (Definition 14.1). To address another important aspect of the projection operator, let us prove an important relation in the following theorem.


Theorem 18.6 [1, 2] Let P_{l(m)}^{(α)} and P_{s(t)}^{(β)} be projection operators defined in (18.147). Then, the following equation holds:

P_{l(m)}^{(α)}P_{s(t)}^{(β)} = δ_{αβ}δ_{ms}P_{l(t)}^{(α)}.  (18.149)

Proof We have

P_{l(m)}^{(α)}P_{s(t)}^{(β)} = (d_αd_β/n²) Σ_g Σ_{g′} [D_{lm}^{(α)}(g)]* [D_{st}^{(β)}(g′)]* gg′.

Replacing g′ with g^{-1}g′ (the summation over g′ may equally be taken over g^{-1}g′) and using the homomorphism and unitarity of D^{(β)}, we have

[D_{st}^{(β)}(g^{-1}g′)]* = Σ_k [D^{(β)}(g^{-1})_{sk}]* [D^{(β)}(g′)_{kt}]* = Σ_k D^{(β)}(g)_{ks}[D^{(β)}(g′)_{kt}]*.

Therefore, using the grand orthogonality theorem,

P_{l(m)}^{(α)}P_{s(t)}^{(β)} = (d_αd_β/n²) Σ_{g′} Σ_k { Σ_g [D_{lm}^{(α)}(g)]* D^{(β)}(g)_{ks} } [D^{(β)}(g′)_{kt}]* g′
  = (d_αd_β/n²) Σ_{g′} Σ_k (n/d_α)δ_{αβ}δ_{lk}δ_{ms} [D^{(β)}(g′)_{kt}]* g′
  = δ_{αβ}δ_{ms} (d_β/n) Σ_{g′} [D^{(β)}(g′)_{lt}]* g′ = δ_{αβ}δ_{ms}P_{l(t)}^{(α)}.  (18.150)

This completes the proof. In the above proof, we used the rearrangement theorem and the grand orthogonality theorem (GOT), as well as the homomorphism and unitarity of the representation matrices.

Comparing (18.146) and (18.148), we notice that the term c_m^{(α)}ψ_m^{(α)} is not extracted as it stands, but c_m^{(α)}ψ_l^{(α)} is given instead. This is due to the linearity of P_{l(m)}^{(α)}. Nonetheless, this is somewhat inconvenient for practical purposes. To overcome this inconvenience, we modify (18.149). In (18.149), putting s = m, we have

P_{l(m)}^{(α)}P_{m(t)}^{(β)} = δ_{αβ}P_{l(t)}^{(α)}.  (18.151)

We further modify the relation. Putting β = α, we have

P_{l(m)}^{(α)}P_{m(t)}^{(α)} = P_{l(t)}^{(α)}.  (18.152)

Putting t = l furthermore,

P_{l(m)}^{(α)}P_{m(l)}^{(α)} = P_{l(l)}^{(α)}.  (18.153)

In particular, putting m = l moreover, we get

P_{l(l)}^{(α)}P_{l(l)}^{(α)} = [P_{l(l)}^{(α)}]² = P_{l(l)}^{(α)}.  (18.154)

In fact, putting m = l in (18.148), we get

P_{l(l)}^{(α)}f = c_l^{(α)}ψ_l^{(α)}.  (18.155)

ðαÞ

This means that the term c_l^{(α)}ψ_l^{(α)} has been extracted entirely. Moreover, in (18.149) putting β = α, s = l, and t = m, we have

P_{l(m)}^{(α)}P_{l(m)}^{(α)} = [P_{l(m)}^{(α)}]² = δ_{ml}P_{l(m)}^{(α)}.

Therefore, for P_{l(m)}^{(α)} to be an idempotent operator, we must have m = l. That is, of the various operators P_{l(m)}^{(α)}, only P_{l(l)}^{(α)} is eligible as an idempotent operator.

Meanwhile, fully describing P_{l(l)}^{(α)}, we have

P_{l(l)}^{(α)} = (d_α/n) Σ_g [D_{ll}^{(α)}(g)]* g.  (18.156)

Taking the complex conjugate transposition (i.e., adjoint) of (18.156), we have

[P_{l(l)}^{(α)}]† = (d_α/n) Σ_g D_{ll}^{(α)}(g)g† = (d_α/n) Σ_{g^{-1}} [D_{ll}^{(α)}(g^{-1})]* g^{-1}
  = (d_α/n) Σ_g [D_{ll}^{(α)}(g)]* g = P_{l(l)}^{(α)},  (18.157)


where we used the unitarity of g (with the third equality) and the equivalence of summation over g and over g^{-1}. Notice that the notation g† is less common. It should be interpreted as meaning that g† operates on a vector constituting a representation space; thus, the notation implies that g† corresponds to the adjoint of the unitary representation matrix D^{(α)}(g). Note also that D_{ll}^{(α)} is not a matrix but a (complex) number; namely, in (18.156) [D_{ll}^{(α)}(g)]* is a coefficient of the operator g.

Equations (18.154) and (18.157) establish that P_{l(l)}^{(α)} is a projection operator in the rigorous sense; namely, it is an idempotent and Hermitian operator (see Definition 14.1). Let us call P_{l(l)}^{(α)} a projection operator sensu stricto accordingly. Also, we notice that c_l^{(α)}ψ_l^{(α)} is extracted entirely from an arbitrary function f, including its coefficient. This situation resembles that of (12.205).

Regarding P_{l(m)}^{(α)} (l ≠ m), on the other hand, we have

[P_{l(m)}^{(α)}]† = (d_α/n) Σ_g D_{lm}^{(α)}(g)g† = (d_α/n) Σ_{g^{-1}} [D_{lm}^{(α)}(g^{-1})]* g^{-1}
  = (d_α/n) Σ_g [D_{ml}^{(α)}(g)]* g = P_{m(l)}^{(α)}.  (18.158)

Hence, PlðmÞ is not Hermitian. We have many other related operators and equations. For instance, ðαÞ

ðαÞ

PlðmÞ PmðlÞ ðαÞ

{

ðαÞ

= PmðlÞ

{

ðαÞ

PlðmÞ

{

ðαÞ

ðαÞ

= PlðmÞ PmðlÞ :

ð18:159Þ

ðαÞ

Therefore, PlðmÞ PmðlÞ is Hermitian, recovering the relation (18.153). In (18.149) putting m = l and t = s, we get ðαÞ ðβÞ

ðαÞ

PlðlÞ PsðsÞ = δαβ δls PlðsÞ :

ð18:160Þ
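The operator identities (18.149), (18.154), and (18.157)–(18.158) can be checked numerically. The sketch below (an illustrative construction, not from the text) realizes the group elements by the regular representation of S3 and builds the 2-dim irrep by restricting the permutation representation to the plane x + y + z = 0.

```python
import numpy as np
from itertools import permutations

elems = list(permutations(range(3)))
idx = {g: i for i, g in enumerate(elems)}
n = len(elems)
def compose(p, q):
    return tuple(p[q[x]] for x in range(3))
def perm_mat(p):
    M = np.zeros((3, 3))
    for x in range(3):
        M[p[x], x] = 1.0
    return M

B = np.linalg.qr(np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]))[0]
D = {g: B.T @ perm_mat(g) @ B for g in elems}   # real orthogonal 2-dim irrep
d = 2

def reg(g):                                     # regular representation: reg(g)reg(h) = reg(gh)
    R = np.zeros((n, n))
    for j, h in enumerate(elems):
        R[idx[compose(g, h)], j] = 1.0
    return R

def P(l, m):                                    # P^(a)_{l(m)} of (18.147), realized as a matrix
    return d / n * sum(np.conj(D[g][l, m]) * reg(g) for g in elems)

# (18.149) with alpha = beta: P_{l(m)} P_{s(t)} = delta_ms P_{l(t)}
for l in range(d):
    for m in range(d):
        for s in range(d):
            for t in range(d):
                assert np.allclose(P(l, m) @ P(s, t), (m == s) * P(l, t))

assert np.allclose(P(0, 0).T.conj(), P(0, 0))   # (18.157): P_{l(l)} is Hermitian
assert np.allclose(P(0, 1).T.conj(), P(1, 0))   # (18.158): P_{l(m)}^dagger = P_{m(l)}
print("Theorem 18.6 verified for the 2-dim irrep of S3")
```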

As in the case of (18.146), we assume that h is described as

h = Σ_α Σ_m d_m^{(α)}ϕ_m^{(α)},  (18.161)

where ϕ_m^{(α)} is transformed in the same manner as ψ_m^{(α)} in (18.146), but is linearly independent of ψ_m^{(α)}. Namely, we have

g(ϕ_i^{(α)}) = Σ_{j=1}^{d_α} ϕ_j^{(α)}D_{ji}^{(α)}(g).  (18.162)

Tangible examples can be seen in Chap. 19.


Operating P_{l(l)}^{(α)} on both sides of (18.155), we have

[P_{l(l)}^{(α)}]²f = P_{l(l)}^{(α)}(c_l^{(α)}ψ_l^{(α)}) = P_{l(l)}^{(α)}f = c_l^{(α)}ψ_l^{(α)},  (18.163)

where with the second equality we used (18.154). That is, we have

P_{l(l)}^{(α)}(c_l^{(α)}ψ_l^{(α)}) = c_l^{(α)}ψ_l^{(α)}.  (18.164)

ðαÞ

This equation means that c_l^{(α)}ψ_l^{(α)} is an eigenfunction corresponding to the eigenvalue 1 of P_{l(l)}^{(α)}. In other words, once c_l^{(α)}ψ_l^{(α)} is extracted from f, it belongs to the "position" l of an irreducible representation α. Furthermore, for some constants c and d as well as the functions f and h that appeared in (18.146) and (18.161), we consider the following equation:

P_{l(l)}^{(α)}(cc_l^{(α)}ψ_l^{(α)} + dd_l^{(α)}ϕ_l^{(α)}) = cP_{l(l)}^{(α)}(c_l^{(α)}ψ_l^{(α)}) + dP_{l(l)}^{(α)}(d_l^{(α)}ϕ_l^{(α)}) = cc_l^{(α)}ψ_l^{(α)} + dd_l^{(α)}ϕ_l^{(α)},  (18.165)

where with the last equality we used (18.164).

This means that an arbitrary linear combination of c_l^{(α)}ψ_l^{(α)} and d_l^{(α)}ϕ_l^{(α)} again belongs to the position l of an irreducible representation α. If ψ_l^{(α)} and ϕ_l^{(α)} are linearly independent, we can construct two orthonormal basis vectors following Theorem 13.2 (Gram–Schmidt orthonormalization theorem). If there are other linearly independent vectors p_l^{(α)}ξ_l^{(α)}, q_l^{(α)}φ_l^{(α)}, ⋯, where ξ_l^{(α)}, φ_l^{(α)}, etc. belong to the position l of the irreducible representation α, then cc_l^{(α)}ψ_l^{(α)} + dd_l^{(α)}ϕ_l^{(α)} + pp_l^{(α)}ξ_l^{(α)} + qq_l^{(α)}φ_l^{(α)} + ⋯ again belongs to the position l of the irreducible representation α. Thus, we can construct orthonormal basis vectors of the representation space according to Theorem 13.2.

Regarding arbitrary functions f and h as vectors and using (18.160), we form an inner product such that

ðαÞ ðαÞ ðβÞ

ðαÞ

ðαÞ

hjPlðlÞ PsðsÞ jf = hjPlðlÞ PlðlÞ PsðsÞ jf = hjPlðlÞ δαβ δls PlðsÞ jf ðαÞ

ðαÞ

= δαβ δls hPlðlÞ jPlðsÞ jf ,

ð18:166Þ

where with the first equality we used (18.154). Meanwhile, from (18.148) we have

P_{l(s)}^{(α)}|f⟩ = c_s^{(α)}ψ_l^{(α)}.  (18.167)

Also using (18.155), we get

P_{l(l)}^{(α)}|h⟩ = d_l^{(α)}ϕ_l^{(α)}.  (18.168)

Taking the adjoint of (18.168), we have

⟨h|[P_{l(l)}^{(α)}]† = ⟨h|P_{l(l)}^{(α)} = [d_l^{(α)}]*⟨ϕ_l^{(α)}|,  (18.169)

where we used (18.157). The relation (18.169) follows the notation of Sect. 13.3. Substituting (18.167) and (18.168) into (18.166), we get

⟨h|P_{l(l)}^{(α)}P_{s(s)}^{(β)}|f⟩ = δ_{αβ}δ_{ls}[d_l^{(α)}]* c_s^{(α)}⟨ϕ_l^{(α)}|ψ_l^{(α)}⟩.

Meanwhile, (18.166) can also be described as

⟨h|P_{l(l)}^{(α)}P_{s(s)}^{(β)}|f⟩ = ⟨d_l^{(α)}ϕ_l^{(α)}|c_s^{(β)}ψ_s^{(β)}⟩ = [d_l^{(α)}]* c_s^{(β)}⟨ϕ_l^{(α)}|ψ_s^{(β)}⟩.  (18.170)

For (18.166) and (18.170) to be identical, we must have

δ_{αβ}δ_{ls}[d_l^{(α)}]* c_s^{(α)}⟨ϕ_l^{(α)}|ψ_l^{(α)}⟩ = [d_l^{(α)}]* c_s^{(β)}⟨ϕ_l^{(α)}|ψ_s^{(β)}⟩.

Deleting the coefficients, we get

⟨ϕ_l^{(α)}|ψ_s^{(β)}⟩ = δ_{αβ}δ_{ls}⟨ϕ_l^{(α)}|ψ_l^{(α)}⟩.  (18.171)

The relation (18.171) is frequently used to estimate whether definite integrals vanish. Functional forms depend upon the actual problems we encounter in various situations. We will deal with this problem in Chap. 19 in relation to, e.g., the discussion of optical transitions and the evaluation of overlap integrals. To evaluate (18.170), if α ≠ β or l ≠ s, we simply get

⟨h|P_{l(l)}^{(α)}P_{s(s)}^{(β)}|f⟩ = 0  (α ≠ β or l ≠ s).  (18.172)

That is, under the condition α ≠ β or l ≠ s, P_{s(s)}^{(β)}|f⟩ and P_{l(l)}^{(α)}|h⟩ are orthogonal (see Theorem 13.3 of Sect. 13.4). The relation clearly indicates that functions belonging to different irreducible representations (α ≠ β) are mutually orthogonal. Even when the functions belong to the same irreducible representation, they are orthogonal if they are allocated to different "places" as basis vectors designated by l, s, etc. Here the place means the index j of ψ_j^{(α)} in (18.146) or of ϕ_j^{(α)} in (18.161) that designates the "order" of ψ_j^{(α)} within ψ_1^{(α)}, ψ_2^{(α)}, ⋯, ψ_{d_α}^{(α)} or of ϕ_j^{(α)}

within ϕ_1^{(α)}, ϕ_2^{(α)}, ⋯, ϕ_{d_α}^{(α)}. This takes place if the representation is multidimensional (i.e., of dimensionality d_α). In (18.160), putting α = β, we get

P_{l(l)}^{(α)}P_{s(s)}^{(α)} = δ_{ls}P_{l(s)}^{(α)}.

In the above relation, furthermore, unless l = s we have

P_{l(l)}^{(α)}P_{s(s)}^{(α)} = 0.

Therefore, on the basis of the discussion of Sect. 14.1, P_{l(l)}^{(α)} + P_{s(s)}^{(α)} is a projection operator as well in the case of l ≠ s. Notice, however, that if l = s, P_{l(l)}^{(α)} + P_{s(s)}^{(α)} is not a projection operator; readers can readily show it. Moreover, let us define P^{(α)} as below:

P^{(α)} ≡ Σ_{l=1}^{d_α} P_{l(l)}^{(α)}.  (18.173)

ð18:173Þ

Similarly, P(α) is again a projection operator as well. From (18.156), P(α) in (18.173) can be rewritten as PðαÞ =

dα n

ðαÞ

dα g

l=1

Dll ðgÞ g =

dα n



χ ðαÞ ðgÞ g:

ð18:174Þ

g

Returning to (18.155) and taking the summation over l there, we have

Σ_{l=1}^{d_α} P_{l(l)}^{(α)}f = Σ_{l=1}^{d_α} c_l^{(α)}ψ_l^{(α)}.

Using (18.173), we get

P^{(α)}f = Σ_{l=1}^{d_α} c_l^{(α)}ψ_l^{(α)}.  (18.175)

Thus, the operator P^{(α)} has a clear meaning: P^{(α)} plays a role in extracting all the vectors (or functions) that belong to the irreducible representation α, including their coefficients. Defining

ψ^{(α)} ≡ Σ_{l=1}^{d_α} c_l^{(α)}ψ_l^{(α)},  (18.176)

we succinctly rewrite (18.175) as


P^{(α)}f = ψ^{(α)}.  (18.177)

In turn, let us calculate P^{(β)}P^{(α)}. This can be done with (18.174) as follows:

P^{(β)}P^{(α)} = (d_β/n) Σ_g [χ^{(β)}(g)]* g · (d_α/n) Σ_{g′} [χ^{(α)}(g′)]* g′.  (18.178)

To carry out the calculation, (1) first we replace g′ with g^{-1}g′ and rewrite the summation over g′ as that over g^{-1}g′. (2) Using the homomorphism and unitarity of the representation, we rewrite [χ^{(α)}(g^{-1}g′)]* as

[χ^{(α)}(g^{-1}g′)]* = Σ_{i,j} D^{(α)}(g)_{ji}[D^{(α)}(g′)_{ji}]*.  (18.179)

ji

Rewriting (18.178) we have PðβÞ PðαÞ = =

dβ dα n n

DðβÞ ðgÞ g k , i, j

dβ dα n δ δ δ n n k, i, j d β αβ kj ki = δαβ

dα n



DðαÞ ðgÞ

kk 

DðαÞ ðg0 Þ g0 = ji

g0

dα n



ji

DðαÞ ðg0 Þ g0 ji

g0

DðαÞ ðg0 Þ k

g0



χ ðαÞ ðg0 Þ g0 = δαβ PðαÞ :

 kk

g0 ð18:180Þ

g0

These relationships are anticipated from the fact that both P^{(α)} and P^{(β)} are projection operators. As in the case of (18.160), the relation (18.180) is useful for evaluating inner products of functions. Again taking an inner product of (18.180) with arbitrary functions f and g, we have

⟨g|P^{(β)}P^{(α)}|f⟩ = ⟨g|δ_{αβ}P^{(α)}|f⟩ = δ_{αβ}⟨g|P^{(α)}|f⟩.  (18.181)

If we define P such that

P ≡ Σ_{α=1}^{n_r} P^{(α)},  (18.182)

then P is once again a projection operator (see Sect. 14.1). Now taking summation in (18.177) over all the irreducible representations α, we have

Σ_{α=1}^{n_r} P^{(α)}f = Σ_{α=1}^{n_r} ψ^{(α)} = f.  (18.183)

The function f has been taken arbitrarily and, hence, we get

P = E.  (18.184)

The relation described in (18.182)–(18.184) is called the completeness relation (see Sect. 5.1). As mentioned in Example 18.1, the concept of the representation space is not only important but also very useful for addressing various problems of physics and chemistry. For instance, regarding the molecular orbital methods to be dealt with in Chap. 19, we consider a representation space whose dimension is equal to the number of molecular orbitals of the molecule whose electronic energy eigenvalues we wish to know. According to the symmetry species of the molecule, the representation matrix is decomposed into a direct sum of invariant eigenspaces relevant to the individual irreducible representations. If basis vectors belong to different irreducible representations, such vectors are orthogonal to each other by virtue of (18.171). Even when those vectors belong to the same irreducible representation, they are orthogonal if they are allocated to different places. It is, however, often the case that vectors belong to the same place of the same irreducible representation; then, it is always possible according to Theorem 13.2 to make them mutually orthogonal by taking their linear combinations. In such a way, we can construct an orthonormal basis set throughout the representation space. In fact, this method is a powerful tool for solving an energy eigenvalue problem and for determining the associated eigenfunctions (or molecular orbitals).
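The character projectors (18.174) and the completeness relation (18.184) can be checked numerically in the regular representation. The sketch below is illustrative (S3 and its characters are assumed, not taken from the text).

```python
import numpy as np
from itertools import permutations

elems = list(permutations(range(3)))
idx = {g: i for i, g in enumerate(elems)}
n = len(elems)
def compose(p, q):
    return tuple(p[q[x]] for x in range(3))
def reg(g):                        # regular representation of g
    R = np.zeros((n, n))
    for j, h in enumerate(elems):
        R[idx[compose(g, h)], j] = 1.0
    return R

def fixed(p):                      # fixed points distinguish the classes of S3
    return sum(p[x] == x for x in range(3))
# (dimension, real character) for the three irreps of S3
irreps = [(1, lambda p: 1),                                   # trivial
          (1, lambda p: 1 if fixed(p) != 1 else -1),          # sign
          (2, lambda p: {3: 2, 1: 0, 0: -1}[fixed(p)])]       # 2-dimensional

# P^(a) = (d_a/n) sum_g chi^(a)(g)* reg(g), Eq. (18.174); characters are real here
P = [da / n * sum(ch(g) * reg(g) for g in elems) for da, ch in irreps]
for Pa in P:
    assert np.allclose(Pa @ Pa, Pa)            # idempotent, cf. (18.180)
assert np.allclose(sum(P), np.eye(n))          # completeness, Eq. (18.184)
print("completeness verified")
```

The trace of each P^{(α)} equals d_α², consistent with the fact that the regular representation contains each irreducible representation d_α times.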

18.8 Direct-Product Representation

In Sect. 16.5 we studied basic properties of direct-product groups. Correspondingly, in this section we examine properties of the direct-product representation. This notion is very useful for investigating optical transitions in molecular systems and the selection rules relevant to those transitions.

Let D^{(α)} and D^{(β)} be two different irreducible representations whose dimensions are d_α and d_β, respectively. Then, operating a group element g on the basis functions, we have

g(ψ_i) = Σ_{k=1}^{d_α} ψ_kD_{ki}^{(α)}(g)  (1 ≤ i ≤ d_α),  (18.185)

g(ϕ_j) = Σ_{l=1}^{d_β} ϕ_lD_{lj}^{(β)}(g)  (1 ≤ j ≤ d_β),  (18.186)

where ψ_k (1 ≤ k ≤ d_α) and ϕ_l (1 ≤ l ≤ d_β) are basis functions of D^{(α)} and D^{(β)}, respectively. We can construct d_αd_β new basis vectors using the products ψ_kϕ_l. These functions are transformed by g such that

g(ψ_iϕ_j) = g(ψ_i)g(ϕ_j) = [Σ_k ψ_kD_{ki}^{(α)}(g)][Σ_l ϕ_lD_{lj}^{(β)}(g)] = Σ_k Σ_l ψ_kϕ_lD_{ki}^{(α)}(g)D_{lj}^{(β)}(g).  (18.187)

Here let us define the following matrix [D^{(α×β)}(g)]_{kl,ij} such that

[D^{(α×β)}(g)]_{kl,ij} ≡ D_{ki}^{(α)}(g)D_{lj}^{(β)}(g).  (18.188)

Then we have

g(ψ_iϕ_j) = Σ_k Σ_l ψ_kϕ_l[D^{(α×β)}(g)]_{kl,ij}.  (18.189)

The notation using double subscripts is somewhat complicated. We notice, however, that in (18.188) the subscript order kilj is converted to kl, ij; i.e., the subscripts i and l have been interchanged. We write D^{(α)} and D^{(β)} in explicit forms as follows:

D^{(α)}(g) = [d_{1,1}^{(α)} ⋯ d_{1,d_α}^{(α)}; ⋮ ⋱ ⋮; d_{d_α,1}^{(α)} ⋯ d_{d_α,d_α}^{(α)}],  D^{(β)}(g) = [d_{1,1}^{(β)} ⋯ d_{1,d_β}^{(β)}; ⋮ ⋱ ⋮; d_{d_β,1}^{(β)} ⋯ d_{d_β,d_β}^{(β)}].  (18.190)

Thus,

D^{(α×β)}(g) = D^{(α)}(g) ⊗ D^{(β)}(g) = [d_{1,1}^{(α)}D^{(β)}(g) ⋯ d_{1,d_α}^{(α)}D^{(β)}(g); ⋮ ⋱ ⋮; d_{d_α,1}^{(α)}D^{(β)}(g) ⋯ d_{d_α,d_α}^{(α)}D^{(β)}(g)].  (18.191)

To get familiar with the double-subscript notation, we describe the case of (2, 2) matrices. Denoting

D^{(α)}(g) = [a_{11} a_{12}; a_{21} a_{22}] and D^{(β)}(g) = [b_{11} b_{12}; b_{21} b_{22}],  (18.192)

we get

D^{(α×β)}(g) = D^{(α)}(g) ⊗ D^{(β)}(g) =
[a_{11}b_{11} a_{11}b_{12} a_{12}b_{11} a_{12}b_{12};
 a_{11}b_{21} a_{11}b_{22} a_{12}b_{21} a_{12}b_{22};
 a_{21}b_{11} a_{21}b_{12} a_{22}b_{11} a_{22}b_{12};
 a_{21}b_{21} a_{21}b_{22} a_{22}b_{21} a_{22}b_{22}].  (18.193)

Corresponding to (18.22), Eq. (18.189) describes the transformation of ψ_iϕ_j in the double-subscript notation. At the same time, the set {ψ_iϕ_j; 1 ≤ i ≤ d_α, 1 ≤ j ≤ d_β} is a basis of D^{(α×β)}. In fact,

[D^{(α×β)}(gg′)]_{kl,ij} = D_{ki}^{(α)}(gg′)D_{lj}^{(β)}(gg′)
  = [Σ_μ D_{kμ}^{(α)}(g)D_{μi}^{(α)}(g′)][Σ_ν D_{lν}^{(β)}(g)D_{νj}^{(β)}(g′)]
  = Σ_μ Σ_ν D_{kμ}^{(α)}(g)D_{lν}^{(β)}(g)D_{μi}^{(α)}(g′)D_{νj}^{(β)}(g′)
  = Σ_μ Σ_ν [D^{(α×β)}(g)]_{kl,μν}[D^{(α×β)}(g′)]_{μν,ij}.  (18.194)

Equation (18.194) shows that the rule of matrix multiplication with respect to the double subscript is satisfied. Consequently, we get

D^{(α×β)}(gg′) = D^{(α×β)}(g)D^{(α×β)}(g′).  (18.195)

The relation (18.195) indicates that D^{(α×β)} is certainly a representation of the group. This representation is said to be a direct-product representation. A character of the direct-product representation is given by putting k = i and l = j in (18.188). That is,

[D^{(α×β)}(g)]_{ij,ij} = D_{ii}^{(α)}(g)D_{jj}^{(β)}(g).  (18.196)

Denoting

χ^{(α×β)}(g) ≡ Σ_{i,j} [D^{(α×β)}(g)]_{ij,ij},  (18.197)

we have

χ^{(α×β)}(g) = χ^{(α)}(g)χ^{(β)}(g).  (18.198)

Even though D^{(α)} and D^{(β)} are both irreducible, D^{(α×β)} is not necessarily irreducible. Suppose that

D^{(α)}(g) ⊗ D^{(β)}(g) = Σ_γ q_γD^{(γ)}(g),  (18.199)

where q_γ is given by (18.83). Then, we have

q_γ = (1/n) Σ_g [χ^{(γ)}(g)]* χ^{(α×β)}(g) = (1/n) Σ_g [χ^{(γ)}(g)]* χ^{(α)}(g)χ^{(β)}(g),  (18.200)

where n is the order of the group. This relation is often used in quantum mechanical or quantum chemical calculations, especially to evaluate optical transitions of matter including atoms and molecular systems. It is also useful for examining whether a definite integral of a product of functions (or an inner product of vectors) vanishes.

In Sect. 16.5, we investigated the definition and properties of direct-product groups. Similarly to the case of the direct-product representation, we consider a representation of a direct-product group. Let us consider two groups ℊ and H and assume that the direct-product group ℊ ⊗ H is defined (Sect. 16.5). Also let D^{(α)} and D^{(β)} be d_α- and d_β-dimensional representations of the groups ℊ and H, respectively. Furthermore, let us define a matrix D^{(α×β)}(ab) as in (18.188) such that

[D^{(α×β)}(ab)]_{kl,ij} ≡ D_{ki}^{(α)}(a)D_{lj}^{(β)}(b),  (18.201)

where a and b are arbitrarily chosen from ℊ and H, respectively, and ab ∈ ℊ ⊗ H. Then the set comprising the D^{(α×β)}(ab) forms a representation of ℊ ⊗ H. (Readers, please verify it.) The dimension of D^{(α×β)} is d_αd_β. The character is given as in (18.69); in the present case, by putting i = k and j = l in (18.201), we get

χ^{(α×β)}(ab) = Σ_k Σ_l [D^{(α×β)}(ab)]_{kl,kl} = Σ_k Σ_l D_{kk}^{(α)}(a)D_{ll}^{(β)}(b) = χ^{(α)}(a)χ^{(β)}(b).  (18.202)

Equation (18.202) resembles (18.198). Hence, we should be careful not to confuse them. In (18.198), we were thinking of a direct-product representation within a sole group ℊ. In (18.202), however, we are considering a representation


of the direct-product group comprising two different groups. In fact, even though a character is computed with respect to a sole group element g in (18.198), in (18.202) we evaluate a character regarding two group elements a and b chosen from different groups ℊ and H , respectively.
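Within a single group, the reduction formula (18.200) is easy to exercise numerically. The sketch below (illustrative; the S3 character table is assumed) reduces the direct product E ⊗ E of the two-dimensional irreducible representation of S3.

```python
import numpy as np

# S3 classes: {e}, {three transpositions}, {two 3-cycles}
k = np.array([1, 3, 2])
n = k.sum()
chi = {"A1": np.array([1, 1, 1]),
       "A2": np.array([1, -1, 1]),
       "E":  np.array([2, 0, -1])}

# (18.198): chi^(ExE)(g) = chi^E(g) * chi^E(g)  ->  [4, 0, 1]
chi_EE = chi["E"] * chi["E"]

# (18.200): q_gamma = (1/n) sum_g chi^(gamma)(g)* chi^(ExE)(g), summed class-wise
q = {name: int((k * c.conj() * chi_EE).sum() / n) for name, c in chi.items()}
print(q)   # E x E = A1 + A2 + E
```

The result q = 1 for every irreducible representation shows E ⊗ E = A1 + A2 + E, consistent with the dimension count 2 × 2 = 1 + 1 + 2.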

18.9 Symmetric Representation and Antisymmetric Representation

As mentioned in Sect. 18.2, we have viewed a group element g as a linear transformation over a vector space V. There we dealt with widely chosen functions as vectors. In this section we introduce other useful ideas so that we can apply them to molecular science and atomic physics. In the previous section, we defined the direct-product representation. In D^{(α×β)}(g) = D^{(α)}(g) ⊗ D^{(β)}(g), we can freely consider the case where D^{(α)}(g) = D^{(β)}(g). Then we have

g(ψ_iϕ_j) = g(ψ_i)g(ϕ_j) = Σ_k Σ_l ψ_kϕ_lD_{ki}^{(α)}(g)D_{lj}^{(α)}(g).  (18.203)

Regarding the product function ψ_jϕ_i, we get a similar equation such that

g(ψ_jϕ_i) = g(ψ_j)g(ϕ_i) = Σ_k Σ_l ψ_kϕ_lD_{kj}^{(α)}(g)D_{li}^{(α)}(g).  (18.204)

On the basis of the linearity of the relations (18.203) and (18.204), let us construct a linear combination of the product functions. That is,

g(ψ_iϕ_j ± ψ_jϕ_i) = g(ψ_iϕ_j) ± g(ψ_jϕ_i)
  = Σ_k Σ_l ψ_kϕ_lD_{ki}^{(α)}(g)D_{lj}^{(α)}(g) ± Σ_k Σ_l ψ_kϕ_lD_{kj}^{(α)}(g)D_{li}^{(α)}(g)
  = Σ_k Σ_l (ψ_kϕ_l ± ψ_lϕ_k)D_{ki}^{(α)}(g)D_{lj}^{(α)}(g).  (18.205)

Here, defining Ψ_{ij}^± as

Ψ_{ij}^± ≡ ψ_iϕ_j ± ψ_jϕ_i,  (18.206)

we rewrite (18.205) as

gΨ_{ij}^± = Σ_k Σ_l Ψ_{kl}^± (1/2)[D_{ki}^{(α)}(g)D_{lj}^{(α)}(g) ± D_{li}^{(α)}(g)D_{kj}^{(α)}(g)].  (18.207)


Notice that Ψ_{ij}^+ and Ψ_{ij}^- in (18.206) are symmetric and antisymmetric with respect to the interchange of the subscripts i and j, respectively. That is,

Ψ_{ij}^± = ±Ψ_{ji}^±.  (18.208)

We may naturally ask how we can constitute representation matrices with (18.207). To answer this question, we carry out the calculation by replacing g with gg′ in (18.207) and following the procedures of Sect. 18.2. Thus, we have

gg′Ψ_{ij}^± = Σ_k Σ_l Ψ_{kl}^± (1/2)[D_{ki}^{(α)}(gg′)D_{lj}^{(α)}(gg′) ± D_{li}^{(α)}(gg′)D_{kj}^{(α)}(gg′)]
  = Σ_k Σ_l Ψ_{kl}^± (1/2) Σ_μ Σ_ν [D_{kμ}^{(α)}(g)D_{lν}^{(α)}(g) ± D_{lμ}^{(α)}(g)D_{kν}^{(α)}(g)]D_{μi}^{(α)}(g′)D_{νj}^{(α)}(g′)
  = Σ_k Σ_l Ψ_{kl}^± (1/2) Σ_μ Σ_ν {[D^{(α×α)}(g)]_{kl,μν} ± [D^{(α×α)}(g)]_{lk,μν}} D_{μi}^{(α)}(g′)D_{νj}^{(α)}(g′),

(18.209)

where the last equality follows from the definition (18.188) of a direct-product representation. We notice that the terms [D^{(α×α)}(gg′)]_{kl,μν} ± [D^{(α×α)}(gg′)]_{lk,μν} in (18.209) are symmetric and antisymmetric with respect to the simultaneous interchange of the subscripts k and l and the subscripts μ and ν, respectively. Comparing both sides of (18.209), we see that this must be the case with i and j as well. Then, the last factor of (18.209) should be rewritten as

D_{μi}^{(α)}(g′)D_{νj}^{(α)}(g′) → (1/2)[D_{μi}^{(α)}(g′)D_{νj}^{(α)}(g′) ± D_{νi}^{(α)}(g′)D_{μj}^{(α)}(g′)].

Now, we define the notations D^{[α×α]}(g) and D^{{α×α}}(g), represented by

[D^{[α×α]}(g)]_{kl,μν} = (1/2){[D^{(α×α)}(g)]_{kl,μν} + [D^{(α×α)}(g)]_{lk,μν}}
  = (1/2)[D_{kμ}^{(α)}(g)D_{lν}^{(α)}(g) + D_{lμ}^{(α)}(g)D_{kν}^{(α)}(g)]  (18.210)

and


[D^{{α×α}}(g)]_{kl,μν} = (1/2){[D^{(α×α)}(g)]_{kl,μν} - [D^{(α×α)}(g)]_{lk,μν}}
  = (1/2)[D_{kμ}^{(α)}(g)D_{lν}^{(α)}(g) - D_{lμ}^{(α)}(g)D_{kν}^{(α)}(g)].  (18.211)

Meanwhile, using D_{μi}^{(α)}(g′)D_{νj}^{(α)}(g′) = [D^{(α×α)}(g′)]_{μν,ij} and considering the exchange of summation with respect to the subscripts μ and ν, we can also define D^{[α×α]}(g′) and D^{{α×α}}(g′) according to the symmetric and antisymmetric cases, respectively. Thus, for the symmetric case, we have

gg′Ψ_{ij}^+ = Σ_k Σ_l Ψ_{kl}^+ Σ_μ Σ_ν [D^{[α×α]}(g)]_{kl,μν}[D^{[α×α]}(g′)]_{μν,ij} = Σ_k Σ_l Ψ_{kl}^+ [D^{[α×α]}(g)D^{[α×α]}(g′)]_{kl,ij}.  (18.212)

Using (18.210), Eq. (18.207) can be rewritten as

gΨ_{ij}^+ = Σ_k Σ_l Ψ_{kl}^+ [D^{[α×α]}(g)]_{kl,ij}.  (18.213)

Then we have

gg′Ψ_{ij}^+ = Σ_k Σ_l Ψ_{kl}^+ [D^{[α×α]}(gg′)]_{kl,ij}.  (18.214)

Comparing (18.212) and (18.214), we finally get

D^{[α×α]}(gg′) = D^{[α×α]}(g)D^{[α×α]}(g′).  (18.215)

Similarly, for the antisymmetric case, we have

gΨ_{ij}^- = Σ_k Σ_l Ψ_{kl}^- [D^{{α×α}}(g)]_{kl,ij},  (18.216)

gg′Ψ_{ij}^- = Σ_k Σ_l Ψ_{kl}^- [D^{{α×α}}(g)D^{{α×α}}(g′)]_{kl,ij}.  (18.217)

Hence, we obtain

D^{{α×α}}(gg′) = D^{{α×α}}(g)D^{{α×α}}(g′).  (18.218)

Thus, both D^{[α×α]}(g) and D^{{α×α}}(g) produce well-defined representations. Letting the dimension of the representation α be d_α, we have d_α(d_α + 1)/2 functions belonging to the symmetric representation and d_α(d_α - 1)/2 functions belonging to

References

753

the antisymmetric representation. With the two-dimensional representation, for instance, functions belonging to the symmetric representation are ψ 1 ϕ1 , ψ 1 ϕ2 þ ψ 2 ϕ1 , and ψ 2 ϕ2 :

ð18:219Þ

A function belonging to the antisymmetric representation is ψ 1 ϕ2 - ψ 2 ϕ1 :

ð18:220Þ

Note that these vectors have not yet been normalized. From (18.210) and (18.211), we can readily obtain useful expressions for the characters of the symmetric and antisymmetric representations. In (18.210), putting μ = k and ν = l and summing over k and l, we get

$$\chi^{[\alpha\times\alpha]}(g) = \sum_k\sum_l\left[D^{[\alpha\times\alpha]}(g)\right]_{kl,kl} = \frac{1}{2}\sum_k\sum_l\left[D^{(\alpha)}_{kk}(g)D^{(\alpha)}_{ll}(g) + D^{(\alpha)}_{lk}(g)D^{(\alpha)}_{kl}(g)\right] = \frac{1}{2}\left\{\left[\chi^{(\alpha)}(g)\right]^2 + \chi^{(\alpha)}\!\left(g^2\right)\right\}. \quad (18.221)$$

Similarly, we have

$$\chi^{\{\alpha\times\alpha\}}(g) = \frac{1}{2}\left\{\left[\chi^{(\alpha)}(g)\right]^2 - \chi^{(\alpha)}\!\left(g^2\right)\right\}. \quad (18.222)$$
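As a small numerical sketch, (18.221) and (18.222) can be evaluated for the two-dimensional E representation of the point group C3v. The class sizes and characters below are the standard C3v character-table entries, assumed here for illustration rather than taken from this text.

```python
# Characters of the symmetric [E x E] and antisymmetric {E x E} squares of the
# two-dimensional E representation of C3v, using (18.221) and (18.222).

sizes   = [1, 2, 3]      # class sizes: E, 2C3, 3sigma_v
chi_E   = [2, -1, 0]     # chi^(E)(g)
chi_Eg2 = [2, -1, 2]     # chi^(E)(g^2): E^2 = E, C3^2 is again a C3, sigma^2 = E

chi_sym  = [(c * c + c2) // 2 for c, c2 in zip(chi_E, chi_Eg2)]   # (18.221)
chi_anti = [(c * c - c2) // 2 for c, c2 in zip(chi_E, chi_Eg2)]   # (18.222)
print(chi_sym)   # [3, 0, 1]  -> reduces to A1 + E
print(chi_anti)  # [1, 1, -1] -> the A2 representation
```

Note that the symmetric square has dimension $d_\alpha(d_\alpha+1)/2 = 3$ and the antisymmetric square has dimension $d_\alpha(d_\alpha-1)/2 = 1$, in accord with the counting above.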

References

1. Inui T, Tanabe Y, Onodera Y (1990) Group theory and its applications in physics. Springer, Berlin
2. Inui T, Tanabe Y, Onodera Y (1980) Group theory and its applications in physics. Shokabo, Tokyo
3. Hassani S (2006) Mathematical physics. Springer, New York
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World Scientific, Singapore
5. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York

Chapter 19

Applications of Group Theory to Physical Chemistry

On the basis of our studies of group theory, in this chapter we apply that knowledge to molecular orbital (MO) calculations (or quantum chemical calculations). As tangible examples, we adopt unsaturated hydrocarbons (ethylene, cyclopropenyl radical, benzene, and allyl radical) and methane. The approach is based upon the method of linear combination of atomic orbitals (LCAO). To seek an appropriate LCAO MO, we make the most of a method based on symmetry-adapted linear combinations (SALCs). Projection operators are a powerful tool for this purpose. For a correct understanding, it is desirable to consider the transformation of functions; to this end, we first show several examples. In the process of carrying out MO calculations, we encounter a secular equation as an eigenvalue equation. Using SALCs eases the calculation of the secular equation. Molecular science relies largely on spectroscopic measurements, and researchers need to assign individual spectral lines to specific transitions between the relevant molecular states. Representation theory works well in this situation. Thus, group theory finds a natural fit in its applications to molecular science.

19.1 Transformation of Functions

Before showing individual examples, let us consider the transformation of functions (or vectors) by a symmetry operation. Here we consider scalar functions. For example, let us suppose an arbitrary scalar function f(x, y) on Cartesian xy-coordinates. Figure 19.1 shows a contour map of f(x, y) = constant. Then suppose that the map is rotated around the origin. More specifically, the position vector $r_0$ fixed on a "summit" [i.e., a point that gives a maximal value of f(x, y)] undergoes a symmetry operation, namely, a rotation around the origin. As a result, $r_0$ is transformed to $r_0'$. Here, we assume that the "mountain" represented by f(x, y) is a rigid body before and © Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_19


Fig. 19.1 Contour map of f(x, y) and f ′(x′, y′). The function f ′(x′, y′) is obtained by rotating a map of f(x, y) around the z-axis


after the transformation. Concomitantly, a general point r (see Sect. 17.1) is transformed to r′ in exactly the same way as $r_0$. A new function f ′ gives a new contour map after the rotation. Let us describe f ′ as

$$f' \equiv O_R f. \quad (19.1)$$

The RHS of (19.1) means that f ′ is produced as a result of operating the rotation R on f. We describe the position vector after the transformation as $r_0'$. Following the notation of Sect. 11.1, $r_0'$ and r′ are expressed as

$$r_0' = R(r_0) \quad \text{and} \quad r' = R(r). \quad (19.2)$$

The matrix representation for R is given by, e.g., (11.35). In (19.2) we have

$$r_0 = (e_1\ e_2)\begin{pmatrix}x_0\\ y_0\end{pmatrix} \quad \text{and} \quad r_0' = (e_1\ e_2)\begin{pmatrix}x_0'\\ y_0'\end{pmatrix}, \quad (19.3)$$

where $e_1$ and $e_2$ are orthonormal basis vectors in the xy-plane. Also we have

$$r = (e_1\ e_2)\begin{pmatrix}x\\ y\end{pmatrix} \quad \text{and} \quad r' = (e_1\ e_2)\begin{pmatrix}x'\\ y'\end{pmatrix}. \quad (19.4)$$

Meanwhile, the following equation must hold:

$$f'(x', y') = f(x, y). \quad (19.5)$$

Or, using (19.1), we have

$$O_R f(x', y') = f(x, y). \quad (19.6)$$

The above argument can be extended to a three-dimensional (or higher-dimensional) space. In that case, we similarly have

$$O_R f(x', y', z') = f(x, y, z), \quad O_R f(r') = f(r), \quad \text{or} \quad f'(r') = f(r), \quad (19.7)$$

where

$$r = (e_1\ e_2\ e_3)\begin{pmatrix}x\\ y\\ z\end{pmatrix} \quad \text{and} \quad r' = (e_1\ e_2\ e_3)\begin{pmatrix}x'\\ y'\\ z'\end{pmatrix}. \quad (19.8)$$

The last relation of (19.7) comes from (19.1). Using (19.2) and (19.3), we rewrite (19.7) as

$$O_R f[R(r)] = f(r). \quad (19.9)$$

Replacing r with $R^{-1}(r)$, we get

$$O_R f\left\{R\left[R^{-1}(r)\right]\right\} = f\left[R^{-1}(r)\right]. \quad (19.10)$$

That is,

$$O_R f(r) = f\left[R^{-1}(r)\right]. \quad (19.11)$$

More succinctly, (19.11) may be rewritten as

$$Rf(r) = f\left[R^{-1}(r)\right]. \quad (19.12)$$

Comparing (19.11) with (19.12) and considering (19.1), we have

$$f' \equiv O_R f \equiv Rf. \quad (19.13)$$
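The rule (19.12) is easy to verify numerically. The sketch below checks it pointwise for a π/2 rotation about the z-axis; the quadratic function and the values a = 1, b = 2 are illustrative assumptions, not part of the text.

```python
# Numerical check of (19.12), Rf(r) = f(R^{-1}(r)), for a pi/2 rotation
# about the z-axis acting on an illustrative quadratic function.
import numpy as np

a, b = 1.0, 2.0
f  = lambda x, y: (x - a) ** 2 + (y - b) ** 2     # original function
Rf = lambda x, y: (x + b) ** 2 + (y - a) ** 2     # f rotated by pi/2

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])                       # pi/2 rotation matrix
R_inv = R.T                                       # orthogonal: R^{-1} = R^T

rng = np.random.default_rng(0)
for r in rng.normal(size=(5, 2)):
    x, y = r
    xi, yi = R_inv @ r                            # R^{-1}(r) = (y, -x)
    assert np.isclose(Rf(x, y), f(xi, yi))        # (19.12) holds pointwise
```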

To gain a good understanding of the function transformation, let us think of some examples.

Example 19.1 Let f(x, y) be a function described by

$$f(x, y) = (x - a)^2 + (y - b)^2; \quad a, b > 0. \quad (19.14)$$

A contour is shown in Fig. 19.2. We consider a π/2 rotation around the z-axis. Then, f(x, y) is transformed into Rf(x, y) = f ′(x, y) such that


Fig. 19.2 Contour map of f(x, y) = (x − a)² + (y − b)² and f ′(x′, y′) = (x′ + b)² + (y′ − a)² before and after a π/2 rotation around the z-axis


$$Rf(x, y) = f'(x, y) = (x + b)^2 + (y - a)^2. \quad (19.15)$$

We also have

$$r_0 = (e_1\ e_2)\begin{pmatrix}a\\ b\end{pmatrix}, \quad r_0' = (e_1\ e_2)\begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}\begin{pmatrix}a\\ b\end{pmatrix} = (e_1\ e_2)\begin{pmatrix}-b\\ a\end{pmatrix},$$

where we define R as

$$R = \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}.$$

Similarly,

$$r = (e_1\ e_2)\begin{pmatrix}x\\ y\end{pmatrix} \quad \text{and} \quad r' = (e_1\ e_2)\begin{pmatrix}-y\\ x\end{pmatrix}.$$

From (19.15), we have

$$f'(x', y') = (x' + b)^2 + (y' - a)^2 = (-y + b)^2 + (x - a)^2 = f(x, y). \quad (19.16)$$

This ensures that (19.5) holds. The implication of (19.16) combined with (19.5) is that a view of f ′(x′, y′) from (−b, a) is the same as that of f(x, y) from (a, b). Imagine that if we are standing at (−b, a) of f ′(x′, y′), we are at the bottom of the "valley" of f ′(x′, y′). Exactly in the same manner, if we are standing at (a, b) of f(x, y), we are at

Fig. 19.3 Contour map of f(r) = e^{−2[(x−a)²+y²+z²]} + e^{−2[(x+a)²+y²+z²]} (a > 0). The functional form remains unchanged by a reflection with respect to the yz-plane

the bottom of the valley of f(x, y) as well. Notice that (a, b) and (−b, a) are the lowest points of f(x, y) and f ′(x′, y′), respectively. Meanwhile, we have

$$R^{-1} = \begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}, \quad R^{-1}(r) = (e_1\ e_2)\begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} = (e_1\ e_2)\begin{pmatrix}y\\ -x\end{pmatrix}.$$

Then we have

$$f\left[R^{-1}(r)\right] = (y - a)^2 + (-x - b)^2 = (x + b)^2 + (y - a)^2 = Rf(r). \quad (19.17)$$

Thus, (19.12) certainly holds.

Example 19.2 Let f(r) and g(r) be functions described by

$$f(r) = e^{-2\left[(x-a)^2 + y^2 + z^2\right]} + e^{-2\left[(x+a)^2 + y^2 + z^2\right]} \quad (a > 0), \quad (19.18)$$

$$g(r) = e^{-2\left[(x-a)^2 + y^2 + z^2\right]} - e^{-2\left[(x+a)^2 + y^2 + z^2\right]} \quad (a > 0). \quad (19.19)$$

Figure 19.3 shows an outline of the contour of f(r). We consider the following symmetry operation in a three-dimensional coordinate system:

Fig. 19.4 Plots of f(r) = e^{−2[(x−a)²+y²+z²]} + e^{−2[(x+a)²+y²+z²]} and g(r) = e^{−2[(x−a)²+y²+z²]} − e^{−2[(x+a)²+y²+z²]} (a > 0) as functions of x. The plots have been taken along the x-axis. (a) f(r). (b) g(r)

$$R = \begin{pmatrix}-1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix} \quad \text{and} \quad R^{-1} = \begin{pmatrix}-1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}. \quad (19.20)$$

This represents a reflection with respect to the yz-plane. Then we have

$$f\left[R^{-1}(r)\right] = Rf(r) = f(r), \quad (19.21)$$

$$g\left[R^{-1}(r)\right] = Rg(r) = -g(r). \quad (19.22)$$

Plotting f(r) and g(r) as functions of x along the x-axis, we depict the results in Fig. 19.4. Looking at (19.21) and (19.22), we find that f(r) and g(r) are solutions of an eigenvalue equation for the operator R; the corresponding eigenvalues are 1 and −1 for f(r) and g(r), respectively. In particular, f(r) retains its functional form after the transformation R. In this case, f(r) is said to be invariant under the transformation R. Moreover, f(r) is invariant under the following eight transformations:

$$R = \begin{pmatrix}\pm 1 & 0 & 0\\ 0 & \pm 1 & 0\\ 0 & 0 & \pm 1\end{pmatrix}. \quad (19.23)$$

These transformations form a group that is isomorphic to $D_{2h}$. Therefore, f(r) is eligible as a basis function of the totally symmetric representation $A_g$ of $D_{2h}$. Notice that f(r) is invariant as well under a rotation of an arbitrary angle around the x-axis. On the other hand, g(r) belongs to $B_{3u}$.
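The eigenvalue relations (19.21) and (19.22) can be checked numerically. In the sketch below, the value a = 0.5 is an arbitrary illustrative choice.

```python
# Numerical check of (19.21) and (19.22): under the reflection (19.20) in the
# yz-plane, f has eigenvalue +1 and g has eigenvalue -1.
import numpy as np

a = 0.5
def f(x, y, z):
    return np.exp(-2 * ((x - a)**2 + y**2 + z**2)) + np.exp(-2 * ((x + a)**2 + y**2 + z**2))

def g(x, y, z):
    return np.exp(-2 * ((x - a)**2 + y**2 + z**2)) - np.exp(-2 * ((x + a)**2 + y**2 + z**2))

rng = np.random.default_rng(1)
for x, y, z in rng.normal(size=(5, 3)):
    # the reflection sends (x, y, z) to (-x, y, z); here R^{-1} = R
    assert np.isclose(f(-x, y, z), f(x, y, z))    # (19.21): eigenvalue +1
    assert np.isclose(g(-x, y, z), -g(x, y, z))   # (19.22): eigenvalue -1
```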

19.2 Method of Molecular Orbitals (MOs)

Bearing in mind these arguments, we examine several examples of quantum chemical calculations. Our approach is based upon molecular orbital theory. The theory assumes the existence of molecular orbitals (MOs) in a molecule, just as the notion of atomic orbitals has been well established in atomic physics. Furthermore, we assume that the molecular orbitals comprise a linear combination of atomic orbitals (LCAO). This notion is equivalent to assuming that individual electrons in a molecule move independently in a potential field produced by the nuclei and the other electrons. In other words, we assume that each electron moves in an MO that extends over the whole molecule. The electronic state of the molecule is formed by different MOs of various energies that are occupied by electrons. As in the case of an atom, an MO $\psi_i(r)$ occupied by an electron is described by

$$H\psi_i(r) \equiv \left[-\frac{\hbar^2}{2m}\nabla^2 + V(r)\right]\psi_i(r) = E_i\psi_i(r), \quad (19.24)$$

where H is the Hamiltonian of the molecule; m is the mass of an electron (note that we do not use a reduced mass μ here); r is the position vector of the electron; ∇² is the Laplacian (Laplace operator); V(r) is the potential of the molecule at r; and $E_i$ is the energy of the electron occupying $\psi_i$, called a molecular orbital energy. We assume that V(r) possesses the same symmetry as the molecule. Let ℊ be a symmetry group and let R be a group element arbitrarily chosen from ℊ. Suppose that an arbitrary position vector r is moved to another position r′. This transformation is expressed as (19.2). Since V(r) has the same symmetry as the molecule, an electron "feels" the same potential field at r′ as that at r. That is,

$$V(r) = V'(r') = V(r'). \quad (19.25)$$

Or we have

$$V(r)\psi(r) = V(r')\psi'(r'), \quad (19.26)$$

where ψ is an arbitrary function. Defining $V(r)\psi(r) \equiv [V\psi](r) = V\psi(r)$ and recalling (19.1) and (19.7), we get

$$[RV\psi](r') = R[V\psi](r') = V'\psi'(r') = V'(r')\psi'(r') = V(r')\psi'(r') \quad (19.27)$$

$$= V(r')R\psi(r') = VR\psi(r') = [VR\psi](r'). \quad (19.28)$$

Comparing the first and last sides and remembering that ψ is an arbitrary function, we get

$$RV = VR. \quad (19.29)$$

Next, let us examine the symmetry of the Laplacian ∇². The Laplacian is defined in Sect. 1.2 as

$$\nabla^2 \equiv \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}, \quad (1.24)$$

where x, y, and z denote the Cartesian coordinates. Let S be an orthogonal matrix that transforms the xyz-coordinate system into the x′y′z′-coordinate system. We suppose that S is expressed as

$$S = \begin{pmatrix}s_{11} & s_{12} & s_{13}\\ s_{21} & s_{22} & s_{23}\\ s_{31} & s_{32} & s_{33}\end{pmatrix} \quad \text{and} \quad \begin{pmatrix}x'\\ y'\\ z'\end{pmatrix} = \begin{pmatrix}s_{11} & s_{12} & s_{13}\\ s_{21} & s_{22} & s_{23}\\ s_{31} & s_{32} & s_{33}\end{pmatrix}\begin{pmatrix}x\\ y\\ z\end{pmatrix}. \quad (19.30)$$

Since S is an orthogonal matrix, we have

$$\begin{pmatrix}x\\ y\\ z\end{pmatrix} = \begin{pmatrix}s_{11} & s_{21} & s_{31}\\ s_{12} & s_{22} & s_{32}\\ s_{13} & s_{23} & s_{33}\end{pmatrix}\begin{pmatrix}x'\\ y'\\ z'\end{pmatrix}. \quad (19.31)$$

This equation is due to

$$SS^{\mathsf T} = S^{\mathsf T}S = E, \quad (19.32)$$

where E is a unit matrix. Then we have

$$\frac{\partial}{\partial x'} = \frac{\partial x}{\partial x'}\frac{\partial}{\partial x} + \frac{\partial y}{\partial x'}\frac{\partial}{\partial y} + \frac{\partial z}{\partial x'}\frac{\partial}{\partial z} = s_{11}\frac{\partial}{\partial x} + s_{12}\frac{\partial}{\partial y} + s_{13}\frac{\partial}{\partial z}. \quad (19.33)$$

Partially differentiating (19.33), we have

$$\frac{\partial^2}{\partial x'^2} = \left(s_{11}\frac{\partial}{\partial x} + s_{12}\frac{\partial}{\partial y} + s_{13}\frac{\partial}{\partial z}\right)\left(s_{11}\frac{\partial}{\partial x} + s_{12}\frac{\partial}{\partial y} + s_{13}\frac{\partial}{\partial z}\right)$$
$$= s_{11}^2\frac{\partial^2}{\partial x^2} + s_{11}s_{12}\frac{\partial^2}{\partial y\partial x} + s_{11}s_{13}\frac{\partial^2}{\partial z\partial x} + s_{12}s_{11}\frac{\partial^2}{\partial x\partial y} + s_{12}^2\frac{\partial^2}{\partial y^2} + s_{12}s_{13}\frac{\partial^2}{\partial z\partial y} + s_{13}s_{11}\frac{\partial^2}{\partial x\partial z} + s_{13}s_{12}\frac{\partial^2}{\partial y\partial z} + s_{13}^2\frac{\partial^2}{\partial z^2}. \quad (19.34)$$

Calculating the terms of $\partial^2/\partial y'^2$ and $\partial^2/\partial z'^2$, we have similar results. Then, collecting all those 27 terms, we get, e.g.,

$$\left(s_{11}^2 + s_{21}^2 + s_{31}^2\right)\frac{\partial^2}{\partial x^2} = \frac{\partial^2}{\partial x^2}, \quad (19.35)$$

where we used the orthogonality relations of S. The cross terms $\partial^2/\partial x\partial y$, $\partial^2/\partial y\partial x$, etc. all vanish. Consequently, we get

$$\frac{\partial^2}{\partial x'^2} + \frac{\partial^2}{\partial y'^2} + \frac{\partial^2}{\partial z'^2} = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}. \quad (19.36)$$

Defining ∇′² as

$$\nabla'^2 \equiv \frac{\partial^2}{\partial x'^2} + \frac{\partial^2}{\partial y'^2} + \frac{\partial^2}{\partial z'^2}, \quad (19.37)$$

we have

$$\nabla'^2 = \nabla^2. \quad (19.38)$$

Notice that (19.38) holds not only for the symmetry operations of the molecule but also for any rotation operation in ℝ³ (see Sect. 17.4).


As in the case of (19.28), we have

$$R\nabla^2\psi(r') = R\left[\nabla^2\psi\right](r') = \nabla'^2\psi'(r') = \nabla^2\psi'(r') = \nabla^2 R\psi(r') = \left[\nabla^2 R\right]\psi(r'). \quad (19.39)$$

Consequently, we get

$$R\nabla^2 = \nabla^2 R. \quad (19.40)$$

Adding both sides of (19.29) and (19.40), we get

$$R\left(\nabla^2 + V\right) = \left(\nabla^2 + V\right)R. \quad (19.41)$$

From (19.24), we have

$$RH = HR. \quad (19.42)$$

Thus, we confirm that the Hamiltonian H commutes with any symmetry operation R. In other words, H is invariant under the coordinate transformation relevant to the symmetry operation. Now we consider the matrix representation of (19.42). Let D be an irreducible representation of the symmetry group ℊ. Then, we have

$$D(g)H = HD(g) \quad (g \in \mathscr{g}).$$

Let {ψ₁, ψ₂, ⋯, ψ_d} be a set of basis vectors that span a representation space L associated with D. Then, on the basis of Schur's Second Lemma of Sect. 18.3, (19.42) immediately leads to an important conclusion: if H is represented by a matrix, we must have

$$H = \lambda E, \quad (19.43)$$

where λ is an arbitrary complex constant. That is, H is represented such that

$$H = \begin{pmatrix}\lambda & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \lambda\end{pmatrix}, \quad (19.44)$$

where the above matrix is a (d, d) square diagonal matrix. This implies that H is reduced to

$$H = \lambda D^{(0)} \oplus \cdots \oplus \lambda D^{(0)},$$

where $D^{(0)}$ denotes the totally symmetric representation; notice that it is given by 1 for any symmetry operation. Thus, the commutability of an operator with all the symmetry operations is equivalent to saying that the operator belongs to the totally symmetric representation. Since H is Hermitian, λ should be real. Operating both sides of (19.42) on {ψ₁, ψ₂, ⋯, ψ_d} from the right, we have

$$(\psi_1\ \psi_2\ \cdots\ \psi_d)RH = (\psi_1\ \psi_2\ \cdots\ \psi_d)HR = (\psi_1\ \psi_2\ \cdots\ \psi_d)\begin{pmatrix}\lambda & & \\ & \ddots & \\ & & \lambda\end{pmatrix}R = (\lambda\psi_1\ \lambda\psi_2\ \cdots\ \lambda\psi_d)R = \lambda(\psi_1\ \psi_2\ \cdots\ \psi_d)R. \quad (19.45)$$

In particular, putting R = E, we get

$$(\psi_1\ \psi_2\ \cdots\ \psi_d)H = \lambda(\psi_1\ \psi_2\ \cdots\ \psi_d).$$

Simplifying this equation, we have

$$\psi_i H = \lambda\psi_i \quad (1 \le i \le d). \quad (19.46)$$

Thus, λ is found to be an energy eigenvalue of H, and $\psi_i$ (1 ≤ i ≤ d) are eigenfunctions belonging to the eigenvalue λ. Meanwhile, using

$$R(\psi_i) = \sum_{k=1}^{d}\psi_k D_{ki}(R),$$

we rewrite (19.45) as

$$(\psi_1\ \psi_2\ \cdots\ \psi_d)RH = (\psi_1 R\ \psi_2 R\ \cdots\ \psi_d R)H = \left(\sum_{k=1}^{d}\psi_k D_{k1}(R)\ \ \sum_{k=1}^{d}\psi_k D_{k2}(R)\ \cdots\ \sum_{k=1}^{d}\psi_k D_{kd}(R)\right)H,$$

where with the second equality we used the relation (11.42), i.e., $\psi_i R = R(\psi_i)$. Thus, we get

$$\sum_{k=1}^{d}\psi_k D_{ki}(R)H = \lambda\sum_{k=1}^{d}\psi_k D_{ki}(R) \quad (1 \le i \le d). \quad (19.47)$$
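The content of (19.42)–(19.47) — a Hamiltonian commuting with a symmetry operation acts as a scalar on each irreducible subspace — can be illustrated with a minimal numerical sketch. The two-site matrix below and the numbers alpha = −1.0, beta = −0.5 are assumptions for illustration only.

```python
# Sketch of (19.42)-(19.47): a Hamiltonian commuting with a symmetry
# operation is a scalar multiple of the identity on each irreducible
# subspace.  P exchanges the two sites (P^2 = E).
import numpy as np

alpha, beta = -1.0, -0.5
H = np.array([[alpha, beta],
              [beta, alpha]])
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

assert np.allclose(P @ H, H @ P)                  # (19.42): PH = HP

# Symmetric and antisymmetric combinations span the two one-dimensional
# irreducible subspaces of the exchange representation; on each, H = lambda*E.
v_plus  = np.array([1.0,  1.0]) / np.sqrt(2)
v_minus = np.array([1.0, -1.0]) / np.sqrt(2)
assert np.allclose(H @ v_plus,  (alpha + beta) * v_plus)
assert np.allclose(H @ v_minus, (alpha - beta) * v_minus)
```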

The relations (19.45)–(19.47) imply that ψ₁, ψ₂, ⋯, and ψ_d, as well as their linear combinations formed with a representation matrix D(R), are eigenfunctions that belong to the same eigenvalue λ. That is, ψ₁, ψ₂, ⋯, and ψ_d are said to be degenerate with a multiplicity d. Following the remarks after Theorem 14.5, we can construct an orthonormal basis set $\{\tilde\psi_1, \tilde\psi_2, \cdots, \tilde\psi_d\}$ of eigenfunctions. These vectors can be constructed via linear combinations of ψ₁, ψ₂, ⋯, and ψ_d. After transforming the vectors $\tilde\psi_i$ (1 ≤ i ≤ d) by R, we get

$$\left\langle\sum_{k=1}^{d}\tilde\psi_k D_{ki}(R)\,\middle|\,\sum_{l=1}^{d}\tilde\psi_l D_{lj}(R)\right\rangle = \sum_{k=1}^{d}\sum_{l=1}^{d}D_{ki}(R)^{*}D_{lj}(R)\langle\tilde\psi_k|\tilde\psi_l\rangle = \sum_{k=1}^{d}D_{ki}(R)^{*}D_{kj}(R) = \sum_{k=1}^{d}\left[D^{\dagger}(R)\right]_{ik}\left[D(R)\right]_{kj} = \left[D^{\dagger}(R)D(R)\right]_{ij} = \delta_{ij}.$$

With the last equality, we used the unitarity of D(R). Thus, orthonormality of the basis set is retained after the unitary transformation of $\{\tilde\psi_1, \tilde\psi_2, \cdots, \tilde\psi_d\}$. In the above discussion, we have assumed that ψ₁, ψ₂, ⋯, and ψ_d belong to a certain irreducible representation $D^{(\nu)}$. Since $\tilde\psi_1, \tilde\psi_2, \cdots,$ and $\tilde\psi_d$ consist of their linear combinations, they belong to $D^{(\nu)}$ as well. Also, we have assumed that $\tilde\psi_1, \tilde\psi_2, \cdots,$ and $\tilde\psi_d$ form an orthonormal basis; hence, according to Theorem 11.3, these vectors constitute basis vectors that belong to $D^{(\nu)}$. In particular, functions derived using projection operators share these characteristics. In fact, the above principles underlie the molecular orbital calculations dealt with in Sects. 19.4 and 19.5. We will go into more detail in subsequent sections. Bearing the aforementioned argument in mind, we make the most of the relation expressed by (18.171) to evaluate inner products of functions (or vectors). In this connection, we often need to calculate matrix elements of an operator. One of the most typical examples is the matrix elements of the Hamiltonian. In the field of molecular science, we have to estimate overlap integrals, Coulomb integrals, resonance integrals, etc. Other examples include transition matrix elements pertinent to, e.g., the electric dipole transition. To this end, we deal with the direct-product representation (see Sects. 18.7 and 18.8). Let $O^{(\gamma)}$ be a Hermitian operator belonging to the γ-th irreducible representation. Let us think of the following inner product:

$$\left\langle\phi_l^{(\beta)}\middle|O^{(\gamma)}\psi_s^{(\alpha)}\right\rangle, \quad (19.48)$$

where $\psi_s^{(\alpha)}$ and $\phi_l^{(\beta)}$ are the s-th component of the α-th irreducible representation and the l-th component of the β-th irreducible representation, respectively; see (18.146) and (18.161).

(i) Let us first think of the case where $O^{(\gamma)}$ is the Hamiltonian H. Suppose that $\psi_s^{(\alpha)}$ belongs to an eigenvalue λ of H. Then, we have

$$H\left|\psi_l^{(\alpha)}\right\rangle = \lambda\left|\psi_l^{(\alpha)}\right\rangle.$$

In that case, with (19.48) we get

$$\left\langle\phi_l^{(\beta)}\middle|H\psi_s^{(\alpha)}\right\rangle = \lambda\left\langle\phi_l^{(\beta)}\middle|\psi_s^{(\alpha)}\right\rangle. \quad (19.49)$$

Having the benefit of (18.171), (19.49) can be rewritten as

$$\left\langle\phi_l^{(\beta)}\middle|H\psi_s^{(\alpha)}\right\rangle = \lambda\delta_{\alpha\beta}\delta_{ls}\left\langle\phi_l^{(\alpha)}\middle|\psi_l^{(\alpha)}\right\rangle. \quad (19.50)$$

At first glance this equation seems trivial, but it gives us a powerful tool that saves a lot of troublesome calculations. In fact, if we encounter a series of inner product calculations (i.e., definite integrals), we can ignore many of them: in (19.50), the matrix elements of the Hamiltonian do not vanish only if α = β and l = s. That is, we only have to estimate $\langle\phi_l^{(\alpha)}|\psi_l^{(\alpha)}\rangle$; otherwise the inner products vanish. The functional forms of $\phi_l^{(\alpha)}$ and $\psi_l^{(\alpha)}$ are determined depending upon the individual practical problems. We will show tangible examples later (Sects. 19.4 and 19.5).

(ii) Next, let us consider matrix elements of the optical transition. In this case, we are thinking of the transition probability between quantum states. Assuming the dipole approximation, we use $\varepsilon_e \cdot P^{(\gamma)}$ for $O^{(\gamma)}$ in $\langle\phi_l^{(\beta)}|O^{(\gamma)}\psi_s^{(\alpha)}\rangle$ of (19.48). The quantity $P^{(\gamma)}$ represents an electric dipole moment associated with the position vector. Normally, we can readily find it in a character table, in which we examine which irreducible representation γ the position vector components x, y, and z indicated in the rightmost column correspond to. Table 18.4 is an example. If we take a unit polarization vector $\varepsilon_e$ parallel to the position vector component that the character table designates, $\varepsilon_e \cdot P^{(\gamma)}$ is non-vanishing. Suppose that $\psi_s^{(\alpha)}$ and $\phi_l^{(\beta)}$ are an initial state and a final state, respectively. At first glance, we would wish to use (18.200) and count how many times a representation β occurs in the direct-product representation $D^{(\gamma\times\alpha)}$. It is often the case, however, that β does not belong to an irreducible representation. Even in such a case, if either α or β belongs to a totally symmetric representation, the handling becomes easier. This situation corresponds to the case where an initial electronic configuration or a final electronic configuration forms a closed shell, an electronic


state of which is totally symmetric [1]. The former occurs when we consider the optical absorption that takes place, e.g., in a molecule in its ground electronic state. The latter corresponds to an optical emission that ends up in the ground state. Let us consider the former case. Since $O_j^{(\gamma)}$ is Hermitian, we rewrite (19.48) as

$$\left\langle\phi_l^{(\beta)}\middle|O_j^{(\gamma)}\psi_s^{(\alpha)}\right\rangle = \left\langle O_j^{(\gamma)}\phi_l^{(\beta)}\middle|\psi_s^{(\alpha)}\right\rangle = \left\langle\psi_s^{(\alpha)}\middle|O_j^{(\gamma)}\phi_l^{(\beta)}\right\rangle^{*}, \quad (19.51)$$

where we assume that $\psi_s^{(\alpha)}$ is a ground state having a closed-shell electronic configuration. Therefore, $\psi_s^{(\alpha)}$ belongs to the totally symmetric irreducible representation. For (19.48) not to vanish, therefore, we may alternatively state that it is necessary for $D^{(\gamma\times\beta)} = D^{(\gamma)}\otimes D^{(\beta)}$ to contain $D^{(\alpha)}$, which belongs to a totally symmetric representation. Note that in group theory we usually write $D^{(\gamma)}\times D^{(\beta)}$ instead of $D^{(\gamma)}\otimes D^{(\beta)}$; see (18.191). If $\phi_l^{(\beta)}$ belongs to a reducible representation, from (18.80) we have

$$D^{(\beta)} = \sum_\omega q_\omega D^{(\omega)},$$

where $D^{(\omega)}$ belongs to an irreducible representation ω. Then, we get

$$D^{(\gamma)}\times D^{(\beta)} = \sum_\omega q_\omega D^{(\gamma)}\times D^{(\omega)}.$$

Thus, applying (18.199) and (18.200), we can obtain a direct sum of irreducible representations. After that, we examine whether the irreducible representation α is contained in $D^{(\gamma)}\times D^{(\beta)}$. Since the totally symmetric representation is one-dimensional, s = 1 in (19.51).
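The counting just described — how many times each irreducible representation occurs in a reducible character — is the reduction formula of (18.80)/(18.200). A minimal sketch follows; the character data are the standard C2v table, and the reducible character chi_red = (2, 0, 2, 0) is an illustrative assumption, not taken from this text.

```python
# Multiplicity of an irreducible representation in a reducible character:
# q_alpha = (1/n) * sum_g chi^(alpha)(g)* chi(g).
c2v = {                      # classes E, C2, sigma_v(xz), sigma_v'(yz); size 1 each
    "A1": [1, 1, 1, 1],
    "A2": [1, 1, -1, -1],
    "B1": [1, -1, 1, -1],
    "B2": [1, -1, -1, 1],
}
n = 4                        # group order
chi_red = [2, 0, 2, 0]       # illustrative reducible character

q = {name: sum(a * b for a, b in zip(chi, chi_red)) // n for name, chi in c2v.items()}
print(q)  # {'A1': 1, 'A2': 0, 'B1': 1, 'B2': 0} -> chi_red = A1 + B1
```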

19.3 Calculation Procedures of Molecular Orbitals (MOs)

We describe a brief outline of the MO method based on LCAO (LCAO MO). Suppose that a molecule contains n atomic orbitals and that each MO $\psi_i$ (1 ≤ i ≤ n) comprises a linear combination of those n atomic orbitals $\phi_k$ (1 ≤ k ≤ n). That is, we have

$$\psi_i = \sum_{k=1}^{n} c_{ki}\phi_k \quad (1 \le i \le n), \quad (19.52)$$

where $c_{ki}$ are complex coefficients. We assume that the $\phi_k$ are normalized. That is,

$$\langle\phi_k|\phi_k\rangle \equiv \int \phi_k^{*}\phi_k\, d\tau = 1, \quad (19.53)$$

where dτ implies that the integration should be taken over ℝ³. The notation ⟨g| f⟩ means an inner product defined in Sect. 13.1. The inner product is usually defined by a definite integral of $g^{*}f$ whose integration range covers a part or all of ℝ³, depending on the constitution of the physical system. First we try to solve the Schrödinger equation given as an eigenvalue equation. The said equation is described as

$$H\psi = \lambda\psi \quad \text{or} \quad (H - \lambda)\psi = 0. \quad (19.54)$$

Replacing ψ with (19.52), we have

$$\sum_{k=1}^{n} c_k (H - \lambda)\phi_k = 0, \quad (19.55)$$

where the subscript i in (19.52) has been omitted for simplicity. Multiplying by $\phi_j^{*}$ from the left and integrating over the whole of ℝ³, we have

$$\sum_{k=1}^{n} c_k \int \phi_j^{*}(H - \lambda)\phi_k\, d\tau = 0. \quad (19.56)$$

Rewriting (19.56), we get

$$\sum_{k=1}^{n} c_k \int \left(\phi_j^{*}H\phi_k - \lambda\phi_j^{*}\phi_k\right) d\tau = 0. \quad (19.57)$$

Here let us define the following quantities:

$$H_{jk} = \int \phi_j^{*}H\phi_k\, d\tau \quad \text{and} \quad S_{jk} = \int \phi_j^{*}\phi_k\, d\tau, \quad (19.58)$$

where $S_{ii} = 1$ (1 ≤ i ≤ n) because the $\phi_i$ are normalized. Then we get

$$\sum_{k=1}^{n} c_k \left(H_{jk} - \lambda S_{jk}\right) = 0. \quad (19.59)$$

Rewriting (19.59) in matrix form, we get

$$\begin{pmatrix} H_{11} - \lambda & \cdots & H_{1n} - \lambda S_{1n} \\ \vdots & \ddots & \vdots \\ H_{n1} - \lambda S_{n1} & \cdots & H_{nn} - \lambda \end{pmatrix}\begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} = 0. \quad (19.60)$$

Suppose that by solving (19.59) or (19.60), we get $\lambda_i$ (1 ≤ i ≤ n), some of which may be identical (i.e., the degenerate case), and obtain n different column eigenvectors corresponding to the n eigenvalues $\lambda_i$. In light of (12.4) of Sect. 12.1, for eigenvectors in which not all $c_k$ are zero to exist, the following condition must be satisfied:

$$\det\left(H_{jk} - \lambda S_{jk}\right) = 0$$

or

$$\begin{vmatrix} H_{11} - \lambda & \cdots & H_{1n} - \lambda S_{1n} \\ \vdots & \ddots & \vdots \\ H_{n1} - \lambda S_{n1} & \cdots & H_{nn} - \lambda \end{vmatrix} = 0. \quad (19.61)$$

Equation (19.61) is called a secular equation. This equation is pertinent to a determinant of order n, and so we expect to get n roots, some of which may be identical (i.e., a degenerate case). In the above discussion, it is useful to introduce the following notation:

$$\left\langle\phi_j\middle|H\phi_k\right\rangle \equiv H_{jk} = \int \phi_j^{*}H\phi_k\, d\tau \quad \text{and} \quad \left\langle\phi_j\middle|\phi_k\right\rangle \equiv S_{jk} = \int \phi_j^{*}\phi_k\, d\tau. \quad (19.62)$$

The above notation has already been introduced in Sect. 1.4. Equation (19.62) certainly satisfies the definition of the inner product described in (13.2)–(13.4); readers, please check it. On the basis of (13.64), we have

$$\langle y|Hx\rangle = \langle xH^{\dagger}|y\rangle^{*}. \quad (19.63)$$

Since H is Hermitian, $H^{\dagger} = H$. That is,

$$\langle y|Hx\rangle = \langle xH|y\rangle^{*}. \quad (19.64)$$
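In practice, (19.61) can also be handed directly to a numerical eigensolver as a generalized eigenvalue problem. The sketch below uses Löwdin orthogonalization; the 2×2 numbers (alpha, beta, s) mimic a two-orbital problem with overlap and are illustrative assumptions only.

```python
# Solving the secular equation det(H_jk - lambda * S_jk) = 0, (19.61), as a
# generalized eigenvalue problem H c = lambda S c.
import numpy as np

alpha, beta, s = -1.0, -0.4, 0.2
H = np.array([[alpha, beta],
              [beta, alpha]])
S = np.array([[1.0, s],
              [s, 1.0]])               # S_ii = 1 for normalized orbitals

w, U = np.linalg.eigh(S)
X = U @ np.diag(w ** -0.5) @ U.T       # X = S^{-1/2} (Loewdin orthogonalization)
lam, C = np.linalg.eigh(X @ H @ X)     # ordinary eigenproblem in the new basis
C = X @ C                              # back-transformed MO coefficients

# for this symmetric 2x2 case the roots are (alpha +- beta) / (1 +- s)
expected = sorted([(alpha + beta) / (1 + s), (alpha - beta) / (1 - s)])
assert np.allclose(sorted(lam), expected)
```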

To solve (19.61) with a sufficiently large n is usually formidable. However, if we can find appropriate conditions, (19.61) can be solved fairly easily. The essential point rests upon how we deal with the off-diagonal elements $H_{jk}$ and $S_{jk}$. If we are able to choose the basis vectors appropriately so that we get

$$H_{jk} = 0 \quad \text{and} \quad S_{jk} = 0 \quad (j \ne k), \quad (19.65)$$

the secular equation is reduced to the simple form

$$\begin{vmatrix} \tilde H_{11} - \lambda\tilde S_{11} & & & \\ & \tilde H_{22} - \lambda\tilde S_{22} & & \\ & & \ddots & \\ & & & \tilde H_{nn} - \lambda\tilde S_{nn} \end{vmatrix} = 0, \quad (19.66)$$

where all the off-diagonal elements are zero and the eigenvalues λ are given by

$$\lambda_i = \tilde H_{ii}/\tilde S_{ii}. \quad (19.67)$$

This means that the eigenvalue equation has automatically been solved. The best way to achieve this is to choose basis functions (vectors) that conform to the symmetry of the molecule. That is, we "shuffle" the atomic orbitals as follows:

$$\xi_i = \sum_{k=1}^{n} d_{ki}\phi_k \quad (1 \le i \le n), \quad (19.68)$$

where the $\xi_i$ are new functions chosen instead of the $\psi_i$ of (19.52). That is, we construct linear combinations of the atomic orbitals that belong to individual irreducible representations. Such a linear combination is called a symmetry-adapted linear combination (SALC). We remark that not all the atomic orbitals are necessarily included in a SALC. Then we have

$$\tilde H_{jk} = \int \xi_j^{*}H\xi_k\, d\tau \quad \text{and} \quad \tilde S_{jk} = \int \xi_j^{*}\xi_k\, d\tau. \quad (19.69)$$

Thus, instead of (19.61), we have the new secular equation

$$\det\left(\tilde H_{jk} - \lambda\tilde S_{jk}\right) = 0. \quad (19.70)$$

Since the Hamiltonian H is totally symmetric, in terms of the direct-product representation, $H\xi_k$ in (19.69) belongs to the same irreducible representation as $\xi_k$. This can be understood intuitively; to assert it, however, use (18.76) and (18.200) along with the fact that the characters of the totally symmetric representation are 1. In light of (18.171) and (18.181), if $\xi_k$ and $\xi_j$ belong to different irreducible representations, $\tilde H_{jk}$ and $\tilde S_{jk}$ both vanish at once. From (18.172), at the same time, $\xi_k$ and $\xi_j$ are orthogonal to each other. The most ideal situation is to get (19.66) with all the off-diagonal elements vanishing. Regarding the diagonal elements of (19.70), we always have the direct product of the same representation and, hence, the integrals (19.69) do not vanish. Thus, we get a powerful guideline for evaluating (19.70) in as simple a form as possible. If the SALC orbitals $\xi_j$ and $\xi_k$ belong to the same irreducible representation, the relevant matrix elements are generally non-vanishing. Even in that case, however, according to Theorem 13.2 we can construct a set of orthonormal vectors by taking appropriate linear combinations of $\xi_j$ and $\xi_k$. The resulting vectors naturally belong to the same irreducible representation. This can be done by solving the secular equation as described below. On top of that, the Hermiticity of the Hamiltonian ensures that those vectors can be rendered orthogonal (see Theorem 14.5). If n electrons are present in a molecule, we deal with n SALC orbitals, which we accordingly view as vectors. In terms of representation theory, these vectors span a representation space in which the vectors undergo symmetry operations. Thus, we can construct


orthonormal basis vectors belonging to various irreducible representations throughout the representation space.

To address our problems, we take the following procedures:
(i) First we determine the symmetry species (i.e., the point group) of the molecule.
(ii) Next, we pick up the atomic orbitals contained in the molecule.
(iii) We examine how those atomic orbitals are transformed by the symmetry operations. Since the symmetry operations are represented by (n, n) unitary matrices, we can readily decide the trace of each matrix. Generally, that matrix representation is reducible, and so we reduce the representation according to the procedures of (18.81)–(18.83). Thus, we are able to determine how many irreducible representations are contained in the original reducible representation.
(iv) After having determined the irreducible representations, we construct SALCs and constitute a secular equation using them.
(v) Solving the secular equation, we determine the molecular orbital energies and decide the functional forms of the corresponding MOs.
(vi) We examine physicochemical properties such as optical transitions within the molecule.

In procedure (iii) above, if the same irreducible representation appears more than once, we have to solve a secular equation of order two or more. Even in that case, we can render the set of resulting eigenvectors orthogonal to one another in the process of solving the problem. To construct the above-mentioned SALCs, we make the most of the projection operators defined in Sect. 14.1. In (18.147), putting m = l, we have

$$P_{l(l)}^{(\alpha)} = \frac{d_\alpha}{n}\sum_g D_{ll}^{(\alpha)}(g)^{*}\, g. \quad (19.71)$$

Or we can choose a projection operator $P^{(\alpha)}$ described as

$$P^{(\alpha)} = \frac{d_\alpha}{n}\sum_{l=1}^{d_\alpha}\sum_g D_{ll}^{(\alpha)}(g)^{*}\, g = \frac{d_\alpha}{n}\sum_g \chi^{(\alpha)}(g)^{*}\, g. \quad (18.174)$$

In a one-dimensional representation, $P_{l(l)}^{(\alpha)}$ and $P^{(\alpha)}$ are identical. As expressed in (18.155) and (18.175), these projection operators act on an arbitrary function and extract the specific component(s) pertinent to a specific irreducible representation of the point group to which the molecule belongs. At first glance the definitions of $P_{l(l)}^{(\alpha)}$ and $P^{(\alpha)}$ look daunting, but the use of character tables relieves the calculation task. In particular, all the irreducible representations of Abelian groups are one-dimensional (i.e., just a number!), as mentioned in Sect. 18.6. For this, notice that the individual group elements form a class by themselves. Therefore, the utilization of character tables is easier for Abelian groups. Even when we encounter a case where the dimension of a representation is more than one (i.e., the case of non-commutative groups), $D_{ll}^{(\alpha)}$ can be determined without much difficulty.
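As a minimal sketch of how (18.174) produces SALCs, consider a two-element Abelian group {E, R} with R² = E acting on two basis orbitals that R exchanges; the orbitals are represented by coefficient vectors over (ϕ₁, ϕ₂), an assumption made purely for illustration.

```python
# Projection operators (18.174) for the group {E, R}, R^2 = E, acting on two
# exchanged basis orbitals phi_1, phi_2 (represented as coefficient vectors).
import numpy as np

E = np.eye(2)
R = np.array([[0.0, 1.0],
              [1.0, 0.0]])             # exchanges phi_1 and phi_2

chars = {"sym":  {"E": 1, "R": 1},     # characters of the two 1-D irreps
         "anti": {"E": 1, "R": -1}}

def projector(chi):
    # P^(alpha) = (d_alpha / n) sum_g chi^(alpha)(g)* g, with d_alpha = 1, n = 2
    return 0.5 * (chi["E"] * E + chi["R"] * R)

phi1 = np.array([1.0, 0.0])            # start from a single atomic orbital
salc_sym  = projector(chars["sym"])  @ phi1   # -> (phi_1 + phi_2)/2
salc_anti = projector(chars["anti"]) @ phi1   # -> (phi_1 - phi_2)/2
assert np.allclose(salc_sym,  [0.5,  0.5])
assert np.allclose(salc_anti, [0.5, -0.5])
```

The projected vectors are not yet normalized, exactly as remarked for (18.219) and (18.220).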

19.4 MO Calculations Based on π-Electron Approximation

On the basis of the general argument developed in Sects. 19.2 and 19.3, we perform molecular orbital calculations on individual molecules. First, we apply group theory to the molecular orbital calculations of unsaturated hydrocarbons such as ethylene, the cyclopropenyl radical and cation, benzene, and the allyl radical. In these cases, in addition to adopting the molecular orbital theory, we adopt the so-called π-electron approximation. With the first three molecules, we will not have to "solve" a secular equation. For the allyl radical, however, we deal with two SALC orbitals belonging to the same irreducible representation and, hence, the final molecular orbitals must be obtained by solving the secular equation.

19.4.1 Ethylene

We start with one of the simplest examples, ethylene. Ethylene is a planar molecule and belongs to $D_{2h}$ symmetry (see Sect. 17.2). In the molecule, two π-electrons extend perpendicular to the molecular plane, toward the upper and lower directions (Fig. 19.5). The molecular plane forms a node of the atomic $2p_z$ orbitals; that is, those atomic orbitals change sign across the molecular plane (i.e., the xy-plane). In Fig. 19.5, the two $p_z$ atomic orbitals of carbon are denoted by ϕ₁ and ϕ₂. We should be able to construct basis vectors using ϕ₁ and ϕ₂. Corresponding to the two atomic orbitals, we are dealing with a two-dimensional vector space. Let us consider how ϕ₁ and ϕ₂ are transformed by a symmetry operation. First we examine the operation C₂(z). This operation exchanges ϕ₁ and ϕ₂. That is, we have

Fig. 19.5 Ethylene molecule placed on the xyz-coordinate system. Two pz atomic orbitals of carbon are denoted by ϕ1 and ϕ2. The atomic orbitals change sign across the molecular plane (i.e., the xy-plane)


C2(z)(ϕ1) = ϕ2   (19.72)

and

C2(z)(ϕ2) = ϕ1.   (19.73)

Equations (19.72) and (19.73) can be combined into the following equation:

(ϕ1 ϕ2) C2(z) = (ϕ2 ϕ1).   (19.74)

Thus, using a matrix representation, we have

C2(z) = ( 0  1 )
        ( 1  0 ).   (19.75)

In Sect. 17.2, we had

Rzθ = ( cos θ  -sin θ  0 )
      ( sin θ   cos θ  0 )
      (   0       0    1 )   (19.76)

or

C2(z) = ( -1   0  0 )
        (  0  -1  0 )
        (  0   0  1 ).   (19.77)

Whereas (19.76) and (19.77) show the transformation of a position vector in ℝ3, (19.75) represents the transformation of functions in a two-dimensional vector space. A vector space composed of functions may be finite-dimensional or infinite-dimensional. We have already encountered the latter case in Part I, where we dealt with the quantum mechanics of a harmonic oscillator. Such a function space is often referred to as a Hilbert space. The essential feature of (19.75) is that the trace (or character) of the matrix is zero.
Let us consider another transformation, C2(y). In this case the situation differs from the above in that ϕ1 is converted to -ϕ1 by C2(y) and ϕ2 is converted to -ϕ2. Notice again that the molecular plane forms a node of the atomic p-orbitals. Thus, C2(y) is represented by

C2(y) = ( -1   0 )
        (  0  -1 ).   (19.78)

The trace of the matrix C2(y) is -2. In this way, choosing ϕ1 and ϕ2 as the basis functions for a representation of D2h, we can determine the characters for the individual symmetry transformations of the atomic 2pz orbitals in ethylene. We collect the results in Table 19.1.


Table 19.1 Characters for individual symmetry transformations of atomic 2pz orbitals in ethylene

D2h   E   C2(z)   C2(y)   C2(x)   i   σ(xy)   σ(zx)   σ(yz)
Γ     2     0      -2       0     0    -2       0       2

Table 19.2 Character table of D2h

D2h   E   C2(z)   C2(y)   C2(x)    i   σ(xy)   σ(zx)   σ(yz)
Ag    1     1       1       1      1     1       1       1     x2, y2, z2
B1g   1     1      -1      -1      1     1      -1      -1     xy
B2g   1    -1       1      -1      1    -1       1      -1     zx
B3g   1    -1      -1       1      1    -1      -1       1     yz
Au    1     1       1       1     -1    -1      -1      -1
B1u   1     1      -1      -1     -1    -1       1       1     z
B2u   1    -1       1      -1     -1     1      -1       1     y
B3u   1    -1      -1       1     -1     1       1      -1     x

Next, we examine what irreducible representations are contained in our present representation of ethylene. To do this, we need the character table of D2h (see Table 19.2). If a specific irreducible representation is contained, we want to know how many times it occurs. Equation (18.83) is very useful for this purpose. In the present case, n = 8 in (18.83). Also taking into account (18.79) and (18.80), we get

Γ = B1u ⊕ B3g,   (19.79)

where Γ is the reducible representation for the set consisting of the two pz atomic orbitals of ethylene. Equation (19.79) clearly shows that Γ is a direct sum of the two irreducible representations B1u and B3g of D2h. Instead of (19.79), in quantum chemical notation we usually express the direct sum simply as

Γ = B1u + B3g.   (19.80)
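This reduction can be checked numerically. The sketch below (a minimal Python transcription, using the D2h character table of Table 19.2 and the reducible characters of Table 19.1) applies the reduction formula (18.83):

```python
# Character table of D2h; columns in the order E, C2(z), C2(y), C2(x), i, s(xy), s(zx), s(yz)
irreps = {
    "Ag":  [1, 1, 1, 1, 1, 1, 1, 1],
    "B1g": [1, 1, -1, -1, 1, 1, -1, -1],
    "B2g": [1, -1, 1, -1, 1, -1, 1, -1],
    "B3g": [1, -1, -1, 1, 1, -1, -1, 1],
    "Au":  [1, 1, 1, 1, -1, -1, -1, -1],
    "B1u": [1, 1, -1, -1, -1, -1, 1, 1],
    "B2u": [1, -1, 1, -1, -1, 1, -1, 1],
    "B3u": [1, -1, -1, 1, -1, 1, 1, -1],
}

# Reducible representation spanned by the two 2pz orbitals of ethylene (Table 19.1)
gamma = [2, 0, -2, 0, 0, -2, 0, 2]

# q_alpha = (1/n) sum_g chi_alpha(g) chi(g), Eq. (18.83); n = 8 for D2h (real characters)
for name, chi in irreps.items():
    q = sum(a * b for a, b in zip(chi, gamma)) // 8
    if q:
        print(name, q)   # -> B1u 1 and B3g 1, i.e. Gamma = B1u + B3g
```

Only B1u and B3g survive with multiplicity 1, reproducing (19.80).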

As a next step, we are going to find appropriate basis functions belonging to the two irreducible representations of (19.80). For this purpose, we use projection operators expressed as

P(α) = (dα/n) Σg χ(α)(g)* g.   (18.174)

Taking ϕ1, for instance, we apply (18.174) to ϕ1. That is,

P(B1u) ϕ1 = (1/8) Σg χ(B1u)(g)* gϕ1

Table 19.3 Character table of C2

C2   E   C2
A    1    1    z; x2, y2, z2, xy
B    1   -1    x, y; yz, zx

= (1/8)[1·ϕ1 + 1·ϕ2 + (-1)(-ϕ1) + (-1)(-ϕ2) + (-1)(-ϕ2) + (-1)(-ϕ1) + 1·ϕ2 + 1·ϕ1]
= (1/2)(ϕ1 + ϕ2).   (19.81)

Also with B3g, we apply (18.174) to ϕ1 and get

P(B3g) ϕ1 = (1/8) Σg χ(B3g)(g)* gϕ1
= (1/8)[1·ϕ1 + (-1)ϕ2 + (-1)(-ϕ1) + 1·(-ϕ2) + 1·(-ϕ2) + (-1)(-ϕ1) + (-1)ϕ2 + 1·ϕ1]
= (1/2)(ϕ1 - ϕ2).   (19.82)
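The projections (19.81) and (19.82) can also be carried out mechanically. In the sketch below, the action of each D2h operation on (ϕ1, ϕ2) is encoded as a signed 2×2 permutation matrix (our transcription of the term-by-term expansions in (19.81) and (19.82), not notation from the text), and Eq. (18.174) is applied with dα = 1:

```python
import numpy as np

# Action of each D2h operation on the basis (phi1, phi2); columns are the images of phi1, phi2
ops = {
    "E":    np.array([[1, 0], [0, 1]]),
    "C2z":  np.array([[0, 1], [1, 0]]),   # exchanges phi1 and phi2
    "C2y":  -np.eye(2, dtype=int),        # phi_i -> -phi_i (nodal plane)
    "C2x":  np.array([[0, -1], [-1, 0]]),
    "i":    np.array([[0, -1], [-1, 0]]),
    "s_xy": -np.eye(2, dtype=int),
    "s_zx": np.array([[0, 1], [1, 0]]),
    "s_yz": np.eye(2, dtype=int),
}

# Characters of B1u and B3g in the same operation order
chi = {
    "B1u": dict(zip(ops, [1, 1, -1, -1, -1, -1, 1, 1])),
    "B3g": dict(zip(ops, [1, -1, -1, 1, 1, -1, -1, 1])),
}

def project(irrep, v):
    """P^(alpha) v = (1/n) sum_g chi_alpha(g)* D(g) v, Eq. (18.174) with d_alpha = 1, n = 8."""
    return sum(chi[irrep][g] * D @ v for g, D in ops.items()) / 8

phi1 = np.array([1.0, 0.0])         # phi1 expressed in the (phi1, phi2) basis
print(project("B1u", phi1))         # -> [0.5 0.5], i.e. (phi1 + phi2)/2
print(project("B3g", phi1))         # -> [ 0.5 -0.5], i.e. (phi1 - phi2)/2
```

The outputs reproduce the SALCs of (19.81) and (19.82).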

Thus, after going through routine but sure procedures, we have reached appropriate basis functions that belong to each irreducible representation.
As mentioned earlier (see Table 17.5), D2h can be expressed as a direct-product group, and the C2 group is contained in D2h as a subgroup. We can use this for the present analysis. Suppose now that we have a group ℊ and that H is a subgroup of ℊ. Let g be an arbitrary element of ℊ and let D(g) be a representation of g. Meanwhile, let h be an arbitrary element of H. Then, with ∀h ∈ H, the collection of D(h) is a representation of H. We write this relation as

D ↓ H.   (19.83)

This representation is called a subduced representation of D to H. Table 19.3 shows the character table of the irreducible representations of C2. In the present case, we are thinking of C2(z) as C2; see Table 17.5 and Fig. 19.5. Then we have

B1u ↓ C2 = A and B3g ↓ C2 = B.   (19.84)

The expression of (19.84) is called a compatibility relation. Note that in (19.84) C2 is not a symmetry operation, but means a subgroup of D2h. Thus, (19.81) is reduced to

P(A) ϕ1 = (1/2) Σg χ(A)(g)* gϕ1 = (1/2)[1·ϕ1 + 1·ϕ2] = (1/2)(ϕ1 + ϕ2).   (19.85)

Also, we have

P(B) ϕ1 = (1/2) Σg χ(B)(g)* gϕ1 = (1/2)[1·ϕ1 + (-1)ϕ2] = (1/2)(ϕ1 - ϕ2).   (19.86)

The relations of (19.85) and (19.86) are essentially the same as (19.81) and (19.82), respectively. We can easily construct the character table of C2 (see Table 19.3). There should be two irreducible representations. To the totally symmetric representation, we allocate 1 for each symmetry operation. For the other representation we allocate 1 to the identity element E and -1 to the element C2 so that the rows and columns of the character table are orthogonal to each other.
Readers might well ask why we bother to take such circuitous approaches to reach predictable results such as (19.81) and (19.82) or (19.85) and (19.86). This question seems natural when we are dealing with a case where the number of basis vectors (i.e., the dimension of the vector space) is small, typically 2 as in the present case. With increasing dimension of the vector space, however, seeking and determining appropriate SALCs becomes increasingly complicated and difficult. Under such circumstances, a projection operator is an indispensable tool to address the problems.
We have a two-dimensional secular equation to be solved such that

| H̃11 - λS̃11   H̃12 - λS̃12 |
| H̃21 - λS̃21   H̃22 - λS̃22 | = 0.   (19.87)

In the above argument, (1/2)(ϕ1 + ϕ2) belongs to B1u and (1/2)(ϕ1 - ϕ2) belongs to B3g. Since they belong to different irreducible representations, we have

H̃12 = S̃12 = 0.   (19.88)

Thus, the secular equation (19.87) is reduced to

| H̃11 - λS̃11        0        |
|      0         H̃22 - λS̃22 | = 0.   (19.89)

As expected, (19.89) has automatically been solved to give the solutions

λ1 = H̃11/S̃11 and λ2 = H̃22/S̃22.   (19.90)

The next step is to determine the energy eigenvalues of the molecule. Note here that the role of SALCs is to determine a suitable irreducible representation that corresponds to a "direction" of a vector. As a coefficient keeps the direction of a vector unaltered, it is of secondary importance. The final form of the normalized MOs can be decided last; that procedure includes the normalization of a vector. Thus, we tentatively choose the following functions for the SALCs, i.e.,

ξ1 = ϕ1 + ϕ2 and ξ2 = ϕ1 - ϕ2.

Then, we have

H̃11 = ∫ ξ1* H ξ1 dτ = ∫ (ϕ1 + ϕ2)* H (ϕ1 + ϕ2) dτ
    = ∫ ϕ1* H ϕ1 dτ + ∫ ϕ1* H ϕ2 dτ + ∫ ϕ2* H ϕ1 dτ + ∫ ϕ2* H ϕ2 dτ
    = H11 + H12 + H21 + H22 = H11 + 2H12 + H22.   (19.91)

Similarly, we have

H̃22 = ∫ ξ2* H ξ2 dτ = H11 - 2H12 + H22.   (19.92)

The last equality comes from the fact that we have chosen real functions for ϕ1 and ϕ2, as studied in Part I. Moreover, we have

H11 ≡ ∫ ϕ1* H ϕ1 dτ = ∫ ϕ2* H ϕ2 dτ = H22,
H12 ≡ ∫ ϕ1* H ϕ2 dτ = ∫ ϕ2* H ϕ1 dτ = H21.   (19.93)

The first equation comes from the fact that both H11 and H22 are calculated using the same 2pz atomic orbital of carbon. The second equation results from the fact that H is Hermitian. Notice that both ϕ1 and ϕ2 are real functions. Following the convention, we denote

α ≡ H11 = H22 and β ≡ H12,   (19.94)

where α is called the Coulomb integral and β the resonance integral. Then, we have

H̃11 = 2(α + β).   (19.95)

In a similar manner, we get

H̃22 = 2(α - β).   (19.96)

Meanwhile, we have

S̃11 = ⟨ξ1|ξ1⟩ = ∫ (ϕ1 + ϕ2)* (ϕ1 + ϕ2) dτ = ∫ (ϕ1 + ϕ2)2 dτ
    = ∫ ϕ12 dτ + ∫ ϕ22 dτ + 2 ∫ ϕ1ϕ2 dτ = 2 + 2 ∫ ϕ1ϕ2 dτ,   (19.97)

where we used the fact that ϕ1 and ϕ2 have been normalized. Also following the convention, we denote

S ≡ ∫ ϕ1ϕ2 dτ = S12,   (19.98)

where S is called the overlap integral. Thus, we have

S̃11 = 2(1 + S).   (19.99)

Similarly, we get

S̃22 = 2(1 - S).   (19.100)

Substituting (19.95) and (19.96) along with (19.99) and (19.100) into (19.90), we get as the energy eigenvalues

λ1 = (α + β)/(1 + S) and λ2 = (α - β)/(1 - S).   (19.101)
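The eigenvalues (19.101) can be cross-checked by solving the generalized eigenvalue problem (19.87) directly. A minimal numerical sketch follows; the values of α, β, and S are illustrative assumptions, not fitted parameters:

```python
import numpy as np

alpha, beta, s = -6.0, -3.0, 0.25     # illustrative values; not fitted to experiment

H = np.array([[alpha, beta], [beta, alpha]])   # Hamiltonian matrix in the (phi1, phi2) basis
S = np.array([[1.0, s], [s, 1.0]])             # overlap matrix

# Generalized eigenvalue problem H c = lambda S c, i.e. det(H - lambda S) = 0, Eq. (19.87)
lams = np.sort(np.linalg.eigvals(np.linalg.inv(S) @ H).real)

print(lams)
print((alpha + beta) / (1 + s), (alpha - beta) / (1 - s))   # Eq. (19.101)
```

The two printed lines agree: the numerical eigenvalues coincide with the closed forms (α ± β)/(1 ± S).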

From (19.97), we get

||ξ1|| = √⟨ξ1|ξ1⟩ = √(2(1 + S)).   (19.102)

Thus, for one of the MOs, corresponding to the energy eigenvalue λ1, we get

Ψ1 = |ξ1⟩/||ξ1|| = (ϕ1 + ϕ2)/√(2(1 + S)).   (19.103)

Also, we have

||ξ2|| = √⟨ξ2|ξ2⟩ = √(2(1 - S)).   (19.104)

For the other MO, corresponding to the energy eigenvalue λ2, we get

Ψ2 = |ξ2⟩/||ξ2|| = (ϕ1 - ϕ2)/√(2(1 - S)).   (19.105)

Note that both the normalized MOs and the energy eigenvalues depend upon whether we ignore the overlap integral, as is the case with the simplest Hückel approximation. Nonetheless, the MO functional forms (in this case either ϕ1 + ϕ2 or ϕ1 - ϕ2) remain the same regardless of the approximation level applied to the overlap integral. Regarding quantitative evaluation of α, β, and S, we will briefly mention it later.
Once we have decided the symmetry of an MO (or the irreducible representation the orbital belongs to) and its energy eigenvalue, we are in a position to examine various physicochemical properties of the molecule. One of them is an optical transition within a molecule, particularly the electric dipole transition. In most cases, the most important transition is that occurring between the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO). In the case of ethylene, those levels are depicted in Fig. 19.6. In the ground state (i.e., the most stable state), two electrons occupy the B1u state (HOMO). The excited state is assigned to B3g (LUMO). A ground state that consists only of fully occupied MOs belongs to the totally symmetric representation. In the case of ethylene, the ground state accordingly belongs to Ag. If a photon is absorbed by a molecule, that molecule is excited by the energy of the photon. In ethylene, this process takes place by exciting an electron from B1u to B3g. The resulting electronic state ends up with one electron remaining in B1u and another excited to B3g (a final state). Thus, the representation of the final-state electronic configuration (denoted by Γf) is described as

Γf = B1u × B3g.   (19.106)

That is, the final excited state is expressed as a direct product of the states associated with the optical transition. To determine the symmetry of the final state, we use (18.198) and (18.200). If Γf in (19.106) is reducible, using (18.200) we can determine the number of times that the individual representations occur. Calculating χ(B1u)(g)χ(B3g)(g) for each group element, we readily get the result. Using the character table, we have

Γf = B2u.   (19.107)

Thus, we find that the transition is Ag ⟶ B2u, where Ag is called the initial-state electronic configuration and B2u the final-state electronic configuration. This transition is characterized by an electric dipole transition moment operator P. Here

Fig. 19.6 HOMO and LUMO energy levels and their assignments of ethylene

we have an important physical quantity, the transition matrix element. This quantity Tfi is approximated by

Tfi ≈ ⟨Θf|εe · P|Θi⟩,   (19.108)

where εe is a unit polarization vector of the electric field; Θi and Θf are the initial-state and final-state electronic configurations, respectively. The description (19.108) is parallel to (4.5). Notice that in (4.5) we dealt with single-electron systems such as a particle confined in a square-well potential, a sole one-dimensional harmonic oscillator, and an electron in a hydrogen atom. In the present case, however, we are dealing with a two-electron system, ethylene. Consequently, we cannot describe Θf by a simple wave function, but must use a more elaborate function. Nonetheless, when we discuss the optical transition of a molecule, we often first wish to know whether optical absorption or emission truly takes place at all. In such a case, a qualitative prediction is of great importance. This can be done by judging whether the integral (19.108) vanishes. If the integral does not vanish, the relevant transition is said to be allowed. If, however, the integral vanishes, the transition is called forbidden. In this context, a systematic approach based on group theory is a powerful tool.
Let us consider the optical absorption of ethylene. For the transition matrix element, we have

Tfi = ⟨Θf(B2u)|εe · P|Θi(Ag)⟩.   (19.109)


Again, note that a closed-shell electronic configuration belongs to the totally symmetric representation [1]. Suppose that the position vector x belongs to an irreducible representation η. A necessary condition for (19.109) not to vanish is that D(η) × D(Ag) contains the irreducible representation D(B2u). In the present case, all the representations are one-dimensional, and so we can use χ(ω) instead of D(ω), where ω denotes an arbitrary irreducible representation of D2h. This procedure is straightforward, as seen in Sect. 19.2. If, moreover, the characters are real (as is the case with many symmetry groups, including the point group D2h), the situation is even simpler. Suppose in general that we are examining whether the following matrix element vanishes:

Mfi = ⟨Φf(β)|O(γ)|Φi(α)⟩,   (19.110)

where α, β, and γ stand for irreducible representations and O is an appropriate operator. In this case, (18.200) can be rewritten as

qα = (1/n) Σg χ(α)(g)* χ(γ×β)(g) = (1/n) Σg χ(α)(g) χ(γ×β)(g)
   = (1/n) Σg χ(α)(g) χ(γ)(g) χ(β)(g) = (1/n) Σg χ(γ)(g) χ(α)(g) χ(β)(g)
   = (1/n) Σg χ(γ)(g) χ(α×β)(g) = (1/n) Σg χ(γ)(g)* χ(α×β)(g) = qγ.   (19.111)

Consequently, the number of times that D(γ) appears in D(α × β) is identical to the number of times that D(α) appears in D(γ × β). Thus, it suffices to examine whether D(γ × β) contains D(α); in other words, we only have to examine whether qα ≠ 0 in (19.111). Thus, applying (19.111) to (19.109), we examine whether D(B2u × Ag) contains the irreducible representation D(η) that is related to x. We easily get

B2u = B2u × Ag.   (19.112)

Therefore, if εe · P (or x) belongs to B2u, the transition is allowed. Consulting the character table, we find that y belongs to B2u. In this case, (19.111) in fact reads as

qB2u = (1/8)[1 × 1 + (-1) × (-1) + 1 × 1 + (-1) × (-1) + (-1) × (-1) + 1 × 1 + (-1) × (-1) + 1 × 1] = 1.

Equivalently, we simply write

B2u × B2u = Ag.

This means that if light polarized along the y-axis is incident, i.e., εe is parallel to the y-axis, the transition is allowed. In that situation, ethylene is said to be polarized along the y-axis, or polarized in the direction of the y-axis. As the molecular axis is parallel to the y-axis, ethylene is polarized in the direction of the molecular axis. This is often the case with conjugated molecules having a well-defined molecular long axis, such as ethylene.
Let us also examine whether ethylene is polarized along, e.g., the x-axis. From the character table of D2h (see Table 19.2), x belongs to B3u. In that case, using (19.111) we have

qB3u = (1/8)[1 × 1 + (-1) × (-1) + 1 × (-1) + (-1) × 1 + (-1) × (-1) + 1 × 1 + (-1) × 1 + 1 × (-1)] = 0.

This implies that B3u is not contained in B1u × B3g (= B2u). The above results on the optical transitions are quite obvious. Once we get used to using a character table, such estimates can be made quickly.
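The two selection-rule calculations above can be reproduced with a few lines of code. The sketch below uses the D2h characters of Table 19.2; the irrep labels for x and y are taken from the basis-function column of that table:

```python
# D2h character table rows in the order E, C2(z), C2(y), C2(x), i, s(xy), s(zx), s(yz)
irreps = {
    "Ag":  [1, 1, 1, 1, 1, 1, 1, 1],
    "B2u": [1, -1, 1, -1, -1, 1, -1, 1],   # y transforms as B2u
    "B3u": [1, -1, -1, 1, -1, 1, 1, -1],   # x transforms as B3u
    "B1u": [1, 1, -1, -1, -1, -1, 1, 1],   # HOMO of ethylene
    "B3g": [1, -1, -1, 1, 1, -1, -1, 1],   # LUMO of ethylene
}

def direct_product(a, b):
    """Characters of the direct-product representation a x b, cf. Eq. (18.198)."""
    return [x * y for x, y in zip(irreps[a], irreps[b])]

def multiplicity(alpha, chars):
    """Number of times irrep alpha occurs in a representation with the given characters."""
    return sum(x * y for x, y in zip(irreps[alpha], chars)) // 8

gamma_f = direct_product("B1u", "B3g")   # final-state configuration, Eq. (19.106)
print(multiplicity("B2u", gamma_f))      # 1 -> y-polarized transition allowed
print(multiplicity("B3u", gamma_f))      # 0 -> x-polarized transition forbidden
```

This reproduces qB2u = 1 and qB3u = 0 computed above.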

19.4.2 Cyclopropenyl Radical [1]

Let us think of another example, the cyclopropenyl radical, which has three resonance structures (Fig. 19.7). It is a planar molecule, and its three carbon atoms form an equilateral triangle. Hence, the molecule belongs to D3h symmetry (see Sect. 17.2). In the molecule three π-electrons extend vertically to the molecular plane toward the upper and lower directions. Suppose that the cyclopropenyl radical is placed on the xy-plane. The three pz atomic orbitals of the carbons located at the vertices of the equilateral triangle are denoted by ϕ1, ϕ2, and ϕ3 in Fig. 19.8. The orbitals are numbered clockwise so that the calculations can be consistent with the conventional notation of a character table (vide infra). We assume that these π-orbitals take positive and negative signs on the upper and lower sides of the plane of paper, respectively, with a nodal plane lying on the xy-plane. The situation is similar to that of ethylene, and the problem can be treated in parallel with the case of ethylene. As in the case of ethylene, we can choose ϕ1, ϕ2, and ϕ3 as real functions. We construct basis vectors using these vectors.
What we want to do to address the problem is as follows: (i) We examine how ϕ1, ϕ2, and ϕ3 are transformed by the

Fig. 19.7 Three resonance structures of cyclopropenyl radical

Fig. 19.8 Three pz atomic orbitals of carbons for cyclopropenyl radical that is placed on the xy-plane. The carbon atoms are located at vertices of an equilateral triangle. The atomic orbitals are denoted by ϕ1, ϕ2, and ϕ3

symmetry operations of D3h. According to the analysis, we can determine what irreducible representations the SALCs should be assigned to. (ii) On the basis of the knowledge obtained in (i), we construct proper MOs.
In Table 19.4 we list the character table of D3h along with the symmetry species. First we examine the traces (characters) of the representation matrices. As in the case of ethylene, a subgroup C3 of D3h plays an essential role (vide infra). This subgroup contains three group elements such that

C3 = {E, C3, C3²}.

In the above, we use the same notation for the group name and the group element, and so we should be careful not to confuse them. The orbitals are transformed as follows:

C3(z)(ϕ1) = ϕ3, C3(z)(ϕ2) = ϕ1, and C3(z)(ϕ3) = ϕ2;   (19.113)

Table 19.4 Character table of D3h

D3h   E   2C3   3C2   σh   2S3   3σv
A1′   1    1     1     1    1     1    x2 + y2, z2
A2′   1    1    -1     1    1    -1
E′    2   -1     0     2   -1     0    (x, y); (x2 - y2, xy)
A1″   1    1     1    -1   -1    -1
A2″   1    1    -1    -1   -1     1    z
E″    2   -1     0    -2    1     0    (yz, zx)

C3²(z)(ϕ1) = ϕ2, C3²(z)(ϕ2) = ϕ3, and C3²(z)(ϕ3) = ϕ1.   (19.114)

Equation (19.113) can be combined into the following form:

(ϕ1 ϕ2 ϕ3) C3(z) = (ϕ3 ϕ1 ϕ2).   (19.115)

Using a matrix representation, we have

C3(z) = ( 0  1  0 )
        ( 0  0  1 )
        ( 1  0  0 ).   (19.116)

In turn, (19.114) is expressed as

C3²(z) = ( 0  0  1 )
         ( 1  0  0 )
         ( 0  1  0 ).   (19.117)

The traces of both (19.116) and (19.117) are zero. Similarly, let us check the representation matrices of the other symmetry species. Of these, e.g., for the C2 related to the y-axis (see Fig. 19.8) we have

(ϕ1 ϕ2 ϕ3) C2 = (-ϕ1 -ϕ3 -ϕ2).   (19.118)

Therefore,

C2 = ( -1   0   0 )
     (  0   0  -1 )
     (  0  -1   0 ).   (19.119)

We have a trace of -1 accordingly. Regarding σh, we have

(ϕ1 ϕ2 ϕ3) σh = (-ϕ1 -ϕ2 -ϕ3) and σh = ( -1   0   0 )
                                        (  0  -1   0 )
                                        (  0   0  -1 ).   (19.120)

Table 19.5 Characters for individual symmetry transformations of 2pz orbitals in cyclopropenyl radical

D3h   E   2C3   3C2   σh   2S3   3σv
Γ     3    0    -1    -3    0     1

Table 19.6 Character table of C3 [ε = exp(i2π/3)]

C3     E   C3    C3²
A      1    1     1     z; x2 + y2, z2
E    { 1    ε     ε*    (x, y); (x2 - y2, xy), (yz, zx)
     { 1    ε*    ε

In this way we can determine the trace for the individual symmetry transformations of the basis functions ϕ1, ϕ2, and ϕ3. We collect the resulting characters of the reducible representation Γ in Table 19.5. It can be reduced to a direct sum of irreducible representations according to the procedures given in (18.81)–(18.83) and using the character table of D3h (Table 19.4). As a result, we get

Γ = A2″ + E″.   (19.121)

As in the case of ethylene, we make the best use of the information of the subgroup C3 of D3h. Let us consider the subduced representation of D of D3h to C3. For this, in Table 19.6 we show the character table of the irreducible representations of C3. We can readily construct this character table. There should be three irreducible representations. To the totally symmetric representation, we allocate 1 for each symmetry operation. For the other representations we allocate 1 to the identity element E, and the two other cube roots of 1, i.e., ε and ε* [where ε = exp(i2π/3)], to the elements C3 and C3², as shown, so that the rows and columns of the character table are orthogonal to each other. Returning to the construction of the subduced representation, we have

A2″ ↓ C3 = A and E″ ↓ C3 = 2E.   (19.122)

Then, corresponding to (18.147) we have

P(A) ϕ1 = (1/3) Σg χ(A)(g)* gϕ1 = (1/3)[1·ϕ1 + 1·ϕ2 + 1·ϕ3]
        = (1/3)(ϕ1 + ϕ2 + ϕ3).   (19.123)

Also, we have

P[E(1)] ϕ1 = (1/3) Σg χ[E(1)](g)* gϕ1 = (1/3)[1·ϕ1 + ε*ϕ3 + εϕ2]
           = (1/3)(ϕ1 + εϕ2 + ε*ϕ3).   (19.124)

Also, we get

P[E(2)] ϕ1 = (1/3) Σg χ[E(2)](g)* gϕ1 = (1/3)[1·ϕ1 + εϕ3 + ε*ϕ2]
           = (1/3)(ϕ1 + ε*ϕ2 + εϕ3).   (19.125)

Here is the best place to mention an eigenvalue of a symmetry operator. Let us designate the SALCs as follows:

ξ1 = ϕ1 + ϕ2 + ϕ3, ξ2 = ϕ1 + εϕ2 + ε*ϕ3, and ξ3 = ϕ1 + ε*ϕ2 + εϕ3.   (19.126)

Let us choose C3 for a symmetry operator. Then we have

C3(ξ1) = C3(ϕ1 + ϕ2 + ϕ3) = C3ϕ1 + C3ϕ2 + C3ϕ3 = ϕ3 + ϕ1 + ϕ2 = ξ1,   (19.127)

where for the second equality we used the fact that C3 is a linear operator. That is, regarding the SALC ξ1, the eigenvalue of C3 is 1. Similarly, we have

C3(ξ2) = C3(ϕ1 + εϕ2 + ε*ϕ3) = C3ϕ1 + εC3ϕ2 + ε*C3ϕ3
       = ϕ3 + εϕ1 + ε*ϕ2 = ε(ϕ1 + εϕ2 + ε*ϕ3) = εξ2.   (19.128)

Furthermore, we get

C3(ξ3) = ε*ξ3.   (19.129)

Thus, we find that regarding the SALCs ξ2 and ξ3, the eigenvalues of C3 are ε and ε*, respectively. These pieces of information imply that if we appropriately choose proper functions for the basis vectors, the character of a symmetry operation for a one-dimensional representation is identical to an eigenvalue of the said symmetry operation (see Table 19.6).
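The eigenvalue relations (19.127)–(19.129) are easy to verify numerically by letting the C3 permutation act on the SALC coefficient vectors, as sketched below:

```python
import numpy as np

eps = np.exp(2j * np.pi / 3)          # epsilon = exp(i 2pi/3) of Table 19.6

# C3 maps phi1 -> phi3, phi2 -> phi1, phi3 -> phi2 (Eq. 19.113); acting on coefficient
# vectors c (where xi = c1 phi1 + c2 phi2 + c3 phi3) this is the permutation matrix:
C3 = np.array([[0, 1, 0],
               [0, 0, 1],
               [1, 0, 0]], dtype=complex)

xi1 = np.array([1, 1, 1], dtype=complex)          # SALC of irrep A, Eq. (19.126)
xi2 = np.array([1, eps, eps.conjugate()])         # SALC of irrep E(1)
xi3 = np.array([1, eps.conjugate(), eps])         # SALC of irrep E(2)

# Each SALC is an eigenvector of C3; the eigenvalue equals the character (Eqs. 19.127-19.129)
print(np.allclose(C3 @ xi1, 1 * xi1))                  # True
print(np.allclose(C3 @ xi2, eps * xi2))                # True
print(np.allclose(C3 @ xi3, eps.conjugate() * xi3))    # True
```

Each SALC is thus an eigenvector of C3 whose eigenvalue is the corresponding entry of Table 19.6.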

Regarding the last parts of the calculations, we follow the procedures described in the case of ethylene. Using the above functions ξ1, ξ2, and ξ3, we construct the secular equation such that

| H̃11 - λS̃11        0              0       |
|      0        H̃22 - λS̃22         0       |
|      0             0        H̃33 - λS̃33 | = 0.   (19.130)

Since we have obtained three SALCs that are assigned to the individual irreducible representations A, E(1), and E(2), these SALCs span the representation space V3. This makes the off-diagonal elements of the secular equation vanish, so that it is simplified as in (19.130). Here, we have

H̃11 = ∫ ξ1* H ξ1 dτ = ∫ (ϕ1 + ϕ2 + ϕ3)* H (ϕ1 + ϕ2 + ϕ3) dτ = 3(α + 2β),   (19.131)

where we used the same α and β as defined in (19.94). Strictly speaking, the α and β appearing in (19.131) should be slightly different from those of (19.94), because the Hamiltonian is different. This approximation, however, is sufficient for the present studies. In a similar manner, we get

H̃22 = ∫ ξ2* H ξ2 dτ = 3(α - β) and H̃33 = ∫ ξ3* H ξ3 dτ = 3(α - β).   (19.132)

Meanwhile, we have

S̃11 = ⟨ξ1|ξ1⟩ = ∫ (ϕ1 + ϕ2 + ϕ3)* (ϕ1 + ϕ2 + ϕ3) dτ = ∫ (ϕ1 + ϕ2 + ϕ3)2 dτ
    = 3(1 + 2S).   (19.133)

Similarly, we get

S̃22 = S̃33 = 3(1 - S).   (19.134)

Readers are urged to verify (19.133) and (19.134). Substituting (19.131) through (19.134) into (19.130), we get as the energy eigenvalues

λ1 = (α + 2β)/(1 + 2S), λ2 = (α - β)/(1 - S), and λ3 = (α - β)/(1 - S).   (19.135)

Notice that the two MOs belonging to E″ have the same energy. These MOs are said to be energetically degenerate. This situation is characteristic of a two-dimensional representation. Actually, even though the group C3 has only one-dimensional representations (because it is an Abelian group), the two complex conjugate representations labelled E behave as if they were a two-dimensional representation [2]. We will encounter the same situation again in the next example, benzene. From (19.126), we get

||ξ1|| = √⟨ξ1|ξ1⟩ = √(3(1 + 2S)).   (19.136)

Thus, as one of the MOs, corresponding to the energy eigenvalue λ1, i.e., Ψ1, we get

Ψ1 = |ξ1⟩/||ξ1|| = (ϕ1 + ϕ2 + ϕ3)/√(3(1 + 2S)).   (19.137)

Also, we have

||ξ2|| = √⟨ξ2|ξ2⟩ = √(3(1 - S)) and ||ξ3|| = √⟨ξ3|ξ3⟩ = √(3(1 - S)).   (19.138)

Thus, for another MO, corresponding to the energy eigenvalue λ2, we get

Ψ2 = |ξ2⟩/||ξ2|| = (ϕ1 + εϕ2 + ε*ϕ3)/√(3(1 - S)).   (19.139)

Also, with the MO corresponding to λ3 (= λ2), we have

Ψ3 = |ξ3⟩/||ξ3|| = (ϕ1 + ε*ϕ2 + εϕ3)/√(3(1 - S)).   (19.140)

Equations (19.139) and (19.140) include complex numbers, which is inconvenient for computational analysis. In that case, we can convert them to real functions. In Part III, we examined the properties of unitary transformations. Since a unitary transformation keeps the norm of a vector unchanged, it is suited to our present purpose. The conversion can be done using the following unitary matrix U:

U = ( 1/√2   -i/√2 )
    ( 1/√2    i/√2 ).   (19.141)

Then we have

(Ψ2 Ψ3) U = ( (1/√2)(Ψ2 + Ψ3)   (i/√2)(-Ψ2 + Ψ3) ).   (19.142)

Thus, defining Ψ̃2 and Ψ̃3 as

Ψ̃2 = (1/√2)(Ψ2 + Ψ3) and Ψ̃3 = (i/√2)(-Ψ2 + Ψ3),   (19.143)

we get

Ψ̃2 = [2ϕ1 + (ε + ε*)ϕ2 + (ε* + ε)ϕ3]/√(6(1 - S))
    = (2ϕ1 - ϕ2 - ϕ3)/√(6(1 - S)).   (19.144)

Also, we have

Ψ̃3 = i[(ε* - ε)ϕ2 + (ε - ε*)ϕ3]/√(6(1 - S)) = i(-i√3 ϕ2 + i√3 ϕ3)/√(6(1 - S))
    = (ϕ2 - ϕ3)/√(2(1 - S)).   (19.145)

Thus, we have successfully converted the complex functions to real functions. Notice that the norms of the vectors remain unchanged before and after the unitary transformation.
As the cyclopropenyl radical has three π-electrons, two occupy the lowest energy level, of A2″. The remaining electron occupies a level of E″. Since this level possesses an energy higher than α, the electron occupying it is anticipated to be unstable. Under such a circumstance, a molecule tends to lose the said electron so as to become a cation. Following the argument given in the previous case of ethylene, it is easy to verify that the allowed transition of the cyclopropenyl radical takes place when


the light is polarized parallel to the molecular plane (i.e., the xy-plane in Fig. 19.8). The proof is left for readers as an exercise. This polarizing feature is typical of planar molecules with high molecular symmetry.

19.4.3 Benzene

Benzene has the structural formula shown in Fig. 19.9. It is a planar molecule, and its six carbon atoms form a regular hexagon. Hence, the molecule belongs to D6h symmetry. In the molecule six π-electrons extend vertically to the molecular plane toward the upper and lower directions, as in the case of ethylene and cyclopropenyl radical. This is a standard illustration of quantum chemistry and is dealt with in many textbooks. As in the case of ethylene and cyclopropenyl radical, the problem can be treated similarly. As before, the six equivalent pz atomic orbitals of carbon are denoted by ϕ1 to ϕ6 in Fig. 19.10. These vectors or their linear combinations span a six-dimensional representation space. We construct basis vectors using these vectors. Following the previous procedures, we construct proper SALC orbitals along with MOs.
Similarly as before, a subgroup C6 of D6h plays an essential role. This subgroup contains six group elements such that

C6 = {E, C6, C3, C2, C3², C6⁵}.   (19.146)

Taking C6(z) as an example, we have

(ϕ1 ϕ2 ϕ3 ϕ4 ϕ5 ϕ6) C6(z) = (ϕ6 ϕ1 ϕ2 ϕ3 ϕ4 ϕ5).   (19.147)

Fig. 19.9 Structural formula of benzene. It belongs to D6h symmetry

Using a matrix representation, we have

C6(z) = ( 0  1  0  0  0  0 )
        ( 0  0  1  0  0  0 )
        ( 0  0  0  1  0  0 )
        ( 0  0  0  0  1  0 )
        ( 0  0  0  0  0  1 )
        ( 1  0  0  0  0  0 ).   (19.148)

Fig. 19.10 Six equivalent pz atomic orbitals of carbon of benzene

Table 19.7 Characters for individual symmetry transformations of 2pz orbitals in benzene

D6h   E   2C6   2C3   C2   3C2′   3C2″   i   2S3   2S6   σh   3σd   3σv
Γ     6    0     0    0     -2     0     0    0     0    -6    0     2

Once again, we can determine the trace for the individual symmetry transformations belonging to D6h. We collect the results in Table 19.7. The representation is reducible and is reduced as follows using the character table of D6h (Table 19.8). As a result, we get

Γ = A2u + B2g + E1g + E2u.   (19.149)
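The reduction (19.149) can again be verified with the reduction formula, this time weighting each class by its size (D6h has order n = 24). A minimal sketch, using the characters of Tables 19.7 and 19.8:

```python
# D6h: class sizes and characters (Table 19.8), columns in the order
# E, 2C6, 2C3, C2, 3C2', 3C2'', i, 2S3, 2S6, s_h, 3s_d, 3s_v
sizes = [1, 2, 2, 1, 3, 3, 1, 2, 2, 1, 3, 3]
irreps = {
    "A1g": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "A2g": [1, 1, 1, 1, -1, -1, 1, 1, 1, 1, -1, -1],
    "B1g": [1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1],
    "B2g": [1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1],
    "E1g": [2, 1, -1, -2, 0, 0, 2, 1, -1, -2, 0, 0],
    "E2g": [2, -1, -1, 2, 0, 0, 2, -1, -1, 2, 0, 0],
    "A1u": [1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1],
    "A2u": [1, 1, 1, 1, -1, -1, -1, -1, -1, -1, 1, 1],
    "B1u": [1, -1, 1, -1, 1, -1, -1, 1, -1, 1, -1, 1],
    "B2u": [1, -1, 1, -1, -1, 1, -1, 1, -1, 1, 1, -1],
    "E1u": [2, 1, -1, -2, 0, 0, -2, -1, 1, 2, 0, 0],
    "E2u": [2, -1, -1, 2, 0, 0, -2, 1, 1, -2, 0, 0],
}
gamma = [6, 0, 0, 0, -2, 0, 0, 0, 0, -6, 0, 2]   # Table 19.7

# q_alpha = (1/n) sum over classes of (class size) chi_alpha chi_Gamma, with n = 24
for name, chi in irreps.items():
    q = sum(s * a * b for s, a, b in zip(sizes, chi, gamma)) // 24
    if q:
        print(name, q)   # -> A2u 1, B2g 1, E1g 1, E2u 1, reproducing Eq. (19.149)
```

Only A2u, B2g, E1g, and E2u survive, each with multiplicity 1.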

As a subduced representation of D of D6h to C6, we have

A2u ↓ C6 = A, B2g ↓ C6 = B, E1g ↓ C6 = 2E1, E2u ↓ C6 = 2E2.   (19.150)

Here, we used Table 19.9 that shows a character table of irreducible representations of C6. Following the previous procedures, as SALCs we have


Table 19.8 Character table of D6h

D6h   E   2C6   2C3   C2   3C2′   3C2″    i   2S3   2S6   σh   3σd   3σv
A1g   1    1     1    1     1      1      1    1     1     1    1     1    x2 + y2, z2
A2g   1    1     1    1    -1     -1      1    1     1     1   -1    -1
B1g   1   -1     1   -1     1     -1      1   -1     1    -1    1    -1
B2g   1   -1     1   -1    -1      1      1   -1     1    -1   -1     1
E1g   2    1    -1   -2     0      0      2    1    -1    -2    0     0    (yz, zx)
E2g   2   -1    -1    2     0      0      2   -1    -1     2    0     0    (x2 - y2, xy)
A1u   1    1     1    1     1      1     -1   -1    -1    -1   -1    -1
A2u   1    1     1    1    -1     -1     -1   -1    -1    -1    1     1    z
B1u   1   -1     1   -1     1     -1     -1    1    -1     1   -1     1
B2u   1   -1     1   -1    -1      1     -1    1    -1     1    1    -1
E1u   2    1    -1   -2     0      0     -2   -1     1     2    0     0    (x, y)
E2u   2   -1    -1    2     0      0     -2    1     1    -2    0     0

Table 19.9 Character table of C6 [ε = exp(iπ/3)]

C6     E   C6    C3    C2   C3²   C6⁵
A      1    1     1     1    1     1     z; x2 + y2, z2
B      1   -1     1    -1    1    -1
E1   { 1    ε    -ε*   -1   -ε     ε*    (x, y); (yz, zx)
     { 1    ε*   -ε    -1   -ε*    ε
E2   { 1   -ε*   -ε     1   -ε*   -ε     (x2 - y2, xy)
     { 1   -ε    -ε*    1   -ε    -ε*

6P(A) ϕ1 ≡ ξ1 = ϕ1 + ϕ2 + ϕ3 + ϕ4 + ϕ5 + ϕ6,
6P(B) ϕ1 ≡ ξ2 = ϕ1 - ϕ2 + ϕ3 - ϕ4 + ϕ5 - ϕ6,
6P[E1(1)] ϕ1 ≡ ξ3 = ϕ1 + εϕ2 - ε*ϕ3 - ϕ4 - εϕ5 + ε*ϕ6,
6P[E1(2)] ϕ1 ≡ ξ4 = ϕ1 + ε*ϕ2 - εϕ3 - ϕ4 - ε*ϕ5 + εϕ6,
6P[E2(1)] ϕ1 ≡ ξ5 = ϕ1 - ε*ϕ2 - εϕ3 + ϕ4 - ε*ϕ5 - εϕ6,
6P[E2(2)] ϕ1 ≡ ξ6 = ϕ1 - εϕ2 - ε*ϕ3 + ϕ4 - εϕ5 - ε*ϕ6,   (19.151)

where ε = exp(iπ/3).

Correspondingly, we have a diagonal secular equation of sixth order such that

| H̃11 - λS̃11      0           0           0           0           0      |
|     0       H̃22 - λS̃22      0           0           0           0      |
|     0            0      H̃33 - λS̃33      0           0           0      |
|     0            0           0      H̃44 - λS̃44      0           0      |
|     0            0           0           0      H̃55 - λS̃55      0      |
|     0            0           0           0           0      H̃66 - λS̃66 | = 0.   (19.152)

Here, we have, for example,

H̃11 = ∫ ξ1* H ξ1 dτ = ⟨ξ1|Hξ1⟩ = 6(α + 2β + 2β′ + β″).   (19.153)

In (19.153), we used the same α and β defined in (19.94), as in the case of cyclopropenyl radical. That is, α is a Coulomb integral and β is a resonance integral between two adjacent 2pz orbitals of carbon. Meanwhile, β′ is a resonance integral between orbitals in "meta" positions, such as ϕ1 and ϕ3. The quantity β″ is a resonance integral between orbitals in "para" positions, such as ϕ1 and ϕ4. It is unusual to include resonance integrals such as β′ and β″ at the simple π-electron approximation level; they are commonly ignored for the practical purpose of simplifying the calculations. However, we have no reason to exclude them. Or rather, the use of appropriate SALCs makes it feasible to include β′ and β″. In a similar manner, we get

ξ2 Hξ2 dτ = hξ2 jHξ2 i = 6ðα - 2β þ 2β0 - β00 Þ, ~ 33 = H ~ 44 = 6ðα þ β - β0 - β00 Þ, H ~ 55 = H ~ 66 = 6ðα - β - β0 þ β00 Þ: H

Meanwhile, we have

S̃₁₁ = ⟨ξ1|ξ1⟩ = 6(1 + 2S + 2S′ + S″),
S̃₂₂ = ⟨ξ2|ξ2⟩ = 6(1 − 2S + 2S′ − S″),
S̃₃₃ = S̃₄₄ = 6(1 + S − S′ − S″),
S̃₅₅ = S̃₆₆ = 6(1 − S − S′ + S″),   (19.155)

where S, S′, and S″ are overlap integrals between the ortho, meta, and para positions, respectively. Substituting (19.153) through (19.155) into (19.152), the energy eigenvalues are readily obtained as

λ1 = (α + 2β + 2β′ + β″)/(1 + 2S + 2S′ + S″),
λ2 = (α − 2β + 2β′ − β″)/(1 − 2S + 2S′ − S″),
λ3 = λ4 = (α + β − β′ − β″)/(1 + S − S′ − S″),
λ5 = λ6 = (α − β − β′ + β″)/(1 − S − S′ + S″).   (19.156)
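The eigenvalues (19.156) can be cross-checked numerically by building the 6 × 6 Hückel matrices with overlap and solving the generalized eigenvalue problem Hc = λSc. This is only an illustrative sketch: the numerical values chosen below for α, β, β′, β″ and for the overlaps S, S′, S″ are arbitrary assumptions, not values from the text.

```python
import numpy as np
from scipy.linalg import eigh

# Hypothetical parameter values (beta-type integrals negative by convention).
alpha, beta, beta_m, beta_p = 0.0, -1.0, -0.05, -0.02   # onsite, ortho, meta, para
S, S_m, S_p = 0.25, 0.02, 0.01                          # overlaps: ortho, meta, para

n = 6
H = np.zeros((n, n)); Smat = np.eye(n)
for i in range(n):
    H[i, i] = alpha
    for j in range(n):
        if i == j:
            continue
        d = min((i - j) % n, (j - i) % n)   # ring distance: 1=ortho, 2=meta, 3=para
        H[i, j] = {1: beta, 2: beta_m, 3: beta_p}[d]
        Smat[i, j] = {1: S, 2: S_m, 3: S_p}[d]

# Generalized eigenvalue problem H c = lambda S c
evals = np.sort(eigh(H, Smat, eigvals_only=True))

# Closed forms (19.156)
lam1 = (alpha + 2*beta + 2*beta_m + beta_p) / (1 + 2*S + 2*S_m + S_p)
lam2 = (alpha - 2*beta + 2*beta_m - beta_p) / (1 - 2*S + 2*S_m - S_p)
lam34 = (alpha + beta - beta_m - beta_p) / (1 + S - S_m - S_p)
lam56 = (alpha - beta - beta_m + beta_p) / (1 - S - S_m + S_p)
closed = np.sort([lam1, lam2, lam34, lam34, lam56, lam56])
assert np.allclose(evals, closed)
```

The doubly degenerate pairs λ3 = λ4 and λ5 = λ6 emerge automatically from the circulant symmetry of the ring matrices.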

Notice that the MOs ξ3 and ξ4, as well as ξ5 and ξ6, are degenerate; accordingly, λ3 = λ4 and λ5 = λ6 are each doubly degenerate. From (19.151), we get, for instance,

‖ξ1‖ = √⟨ξ1|ξ1⟩ = √[6(1 + 2S + 2S′ + S″)].   (19.157)

Thus, for the normalized MO corresponding to the energy eigenvalue λ1, we get

Ψ1 = |ξ1⟩/‖ξ1‖ = (ϕ1 + ϕ2 + ϕ3 + ϕ4 + ϕ5 + ϕ6)/√[6(1 + 2S + 2S′ + S″)].   (19.158)

Following the previous examples, we obtain the other normalized MOs. That is, we have

Ψ2 = |ξ2⟩/‖ξ2‖ = (ϕ1 − ϕ2 + ϕ3 − ϕ4 + ϕ5 − ϕ6)/√[6(1 − 2S + 2S′ − S″)],
Ψ3 = |ξ3⟩/‖ξ3‖ = (ϕ1 + εϕ2 − ε*ϕ3 − ϕ4 − εϕ5 + ε*ϕ6)/√[6(1 + S − S′ − S″)],
Ψ4 = |ξ4⟩/‖ξ4‖ = (ϕ1 + ε*ϕ2 − εϕ3 − ϕ4 − ε*ϕ5 + εϕ6)/√[6(1 + S − S′ − S″)],

Fig. 19.11 Energy diagram and MO assignments of benzene. Energy eigenvalues λ1 to λ6 are given in (19.156)

Ψ5 = |ξ5⟩/‖ξ5‖ = (ϕ1 − ε*ϕ2 − εϕ3 + ϕ4 − ε*ϕ5 − εϕ6)/√[6(1 − S − S′ + S″)],
Ψ6 = |ξ6⟩/‖ξ6‖ = (ϕ1 − εϕ2 − ε*ϕ3 + ϕ4 − εϕ5 − ε*ϕ6)/√[6(1 − S − S′ + S″)].   (19.159)

The eigenfunction Ψi (1 ≤ i ≤ 6) corresponds to the eigenvalue λi. As in the case of the cyclopropenyl radical, Ψ3 and Ψ4 can be transformed to Ψ̃3 and Ψ̃4, respectively, through the unitary matrix of (19.141). Thus, we get

Ψ̃3 = (2ϕ1 + ϕ2 − ϕ3 − 2ϕ4 − ϕ5 + ϕ6)/√[12(1 + S − S′ − S″)],
Ψ̃4 = (ϕ2 + ϕ3 − ϕ5 − ϕ6)/[2√(1 + S − S′ − S″)].   (19.160)

Similarly, transforming Ψ5 and Ψ6 to Ψ̃5 and Ψ̃6, respectively, we have

Ψ̃5 = (2ϕ1 − ϕ2 − ϕ3 + 2ϕ4 − ϕ5 − ϕ6)/√[12(1 − S − S′ + S″)],
Ψ̃6 = (ϕ2 − ϕ3 + ϕ5 − ϕ6)/[2√(1 − S − S′ + S″)].   (19.161)

Figure 19.11 shows an energy diagram and MO assignments of benzene.


A major optical transition takes place between the HOMO (E1g) and LUMO (E2u) levels. In the case of optical absorption, the initial electronic configuration is assigned to the totally symmetric representation A1g, and the symmetry of the final electronic configuration is described as

Γ = E1g × E2u.

Therefore, the transition matrix element is expressed as

⟨Φ(E1g × E2u)|εe · P|Φ(A1g)⟩,   (19.162)

where Φ(A1g) stands for the totally symmetric ground-state electronic configuration and Φ(E1g × E2u) denotes an excited-state electronic configuration represented by a direct-product representation. This representation is reducible and is expressed as a direct sum of irreducible representations such that

Γ = B1u + B2u + E1u.   (19.163)

Notice that, unlike ethylene, the direct-product representation associated with the final state is reducible. To examine whether (19.162) is non-vanishing, as in the case of (19.109) we check whether the direct-product representation A1g × E1g × E2u = E1g × E2u = B1u + B2u + E1u contains an irreducible representation to which εe · P belongs. Consulting the character table of D6h, we find that x and y belong to the irreducible representation E1u and that z belongs to A2u. Since the direct sum Γ in (19.163) contains E1u, benzene is expected to be polarized along both the x- and y-axes (see Fig. 19.10). Since Γ does not contain A2u, however, the transition along the z-axis is forbidden. Accordingly, the transition takes place when the light is polarized parallel to the molecular plane (i.e., the xy-plane). This is a common feature among planar aromatic molecules, including benzene and the cyclopropenyl radical.
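The reduction (19.163) and the resulting polarization selection rule can be verified with a short character-theoretic computation. The D6h character-table rows used below are the standard ones and are an assumption of this sketch (order 24; classes E, 2C6, 2C3, C2, 3C2′, 3C2″, i, 2S3, 2S6, σh, 3σd, 3σv):

```python
import numpy as np

# Class sizes of D6h and characters of the relevant irreducible representations.
sizes = np.array([1, 2, 2, 1, 3, 3, 1, 2, 2, 1, 3, 3])
chi = {
    "A1g": [1]*12,
    "A2u": [1, 1, 1, 1, -1, -1, -1, -1, -1, -1, 1, 1],
    "B1u": [1, -1, 1, -1, 1, -1, -1, 1, -1, 1, -1, 1],
    "B2u": [1, -1, 1, -1, -1, 1, -1, 1, -1, 1, 1, -1],
    "E1g": [2, 1, -1, -2, 0, 0, 2, 1, -1, -2, 0, 0],
    "E1u": [2, 1, -1, -2, 0, 0, -2, -1, 1, 2, 0, 0],
    "E2u": [2, -1, -1, 2, 0, 0, -2, 1, 1, -2, 0, 0],
}
chi = {k: np.array(v) for k, v in chi.items()}

product = chi["E1g"] * chi["E2u"]          # characters of the direct product E1g x E2u

def multiplicity(irrep):
    # Reduction formula q = (1/n) * sum over classes of size * chi_irrep * chi_product
    return int(round(np.sum(sizes * chi[irrep] * product) / 24))

# (19.163): E1g x E2u = B1u + B2u + E1u
assert multiplicity("B1u") == 1 and multiplicity("B2u") == 1 and multiplicity("E1u") == 1
assert multiplicity("A2u") == 0            # hence no z-polarized transition
```

Since E1u (spanned by x, y) appears once while A2u (spanned by z) does not, the in-plane polarization rule stated above follows directly.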

19.4.4 Allyl Radical [1]

We revisit the allyl radical and perform its MO calculations. As already noted, Tables 18.1 and 18.2 of Example 18.1 collected the representation matrices of the individual symmetry operations in reference to the basis vectors comprising the three atomic orbitals of the allyl radical. As usual, we examine the traces (or characters) of those matrices; Table 19.10 collects them. The representation is readily reduced according to the character table of C2v (see Table 18.4), so that we have


Table 19.10 Characters for individual symmetry transformations of 2pz orbitals in allyl radical

C2v   E    C2(z)   σv(zx)   σ′v(yz)
Γ     3     -1       1        -3

Γ = A2 + 2B1.   (19.164)
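The reduction leading to (19.164) is a direct application of the formula qα = (1/n) Σg χ^(α)(g)* χ(g) with n = 4. A minimal sketch with the C2v character table (class order E, C2(z), σv(zx), σ′v(yz)):

```python
# Characters of the C2v irreducible representations over (E, C2(z), sigma_v(zx), sigma_v'(yz)).
c2v = {
    "A1": (1,  1,  1,  1),
    "A2": (1,  1, -1, -1),
    "B1": (1, -1,  1, -1),
    "B2": (1, -1, -1,  1),
}
gamma = (3, -1, 1, -3)   # characters of the 2pz-orbital representation (Table 19.10)

q = {name: sum(a * b for a, b in zip(chi, gamma)) // 4 for name, chi in c2v.items()}
assert q == {"A1": 0, "A2": 1, "B1": 2, "B2": 0}   # i.e., Gamma = A2 + 2B1
```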

We have two SALC orbitals that belong to the same irreducible representation B1. As noted in Sect. 19.3, an orbital obtained as a linear combination of these two SALCs belongs to B1 as well; such a linear combination is given by a unitary transformation. In the present case, it is convenient to transform the basis vectors twice: the first transformation yields the SALCs, and the second is carried out in the process of solving the secular equation with those SALCs. Schematically, we have

(ϕ1, ϕ2, ϕ3) → (Ψ1, Ψ2, Ψ3) → (Φ1, Φ2, Φ3),

where ϕ1, ϕ2, ϕ3 are the original atomic orbitals; Ψ1, Ψ2, Ψ3 the SALCs; and Φ1, Φ2, Φ3 the final MOs. Thus, the three sets of vectors are connected through unitary transformations. Starting with ϕ1 and following the previous cases, we have, e.g.,

P^(B1) ϕ1 = (1/4) Σg χ^(B1)(g)* g ϕ1 = ϕ1.

Also starting with ϕ2, we have

P^(B1) ϕ2 = (1/4) Σg χ^(B1)(g)* g ϕ2 = (1/2)(ϕ2 + ϕ3).

Meanwhile, we get

P^(A2) ϕ1 = 0,  P^(A2) ϕ2 = (1/4) Σg χ^(A2)(g)* g ϕ2 = (1/2)(ϕ2 − ϕ3).

Thus, we have recovered the results of Example 18.1. Notice that ϕ1 does not participate in A2 but takes part in B1 by itself. The normalized SALCs are given as follows:

Ψ1 = ϕ1,  Ψ2 = (ϕ2 + ϕ3)/√[2(1 + S′)],  Ψ3 = (ϕ2 − ϕ3)/√[2(1 − S′)],

where we define S′ ≡ ∫ϕ2ϕ3 dτ. If Ψ1, Ψ2, and Ψ3 belonged to mutually different irreducible representations (as in the previous three examples of ethylene, the cyclopropenyl radical, and benzene), the secular equation would be fully reduced to the form of (19.66). In the present case, however, Ψ1 and Ψ2 belong to the same irreducible representation B1, which makes the situation a bit more complicated. Nonetheless, we can use (19.61), and the secular equation is "partially" reduced. Defining

H_jk = ∫Ψj H Ψk dτ  and  S_jk = ∫Ψj Ψk dτ,

we have a secular equation of the same form as (19.61):

det(H_jk − λS_jk) = 0.

More specifically, we have

| α − λ                      √2(β − Sλ)/√(1 + S′)       0                       |
| √2(β − Sλ)/√(1 + S′)       (α + β′)/(1 + S′) − λ      0                       |  = 0,
| 0                          0                          (α − β′)/(1 − S′) − λ   |

where α, β, and S are defined as in (19.94) and (19.98), and the quantity β′ is defined as

β′ ≡ ∫ϕ2 H ϕ3 dτ.

Thus, the secular equation separates into the following two:

| α − λ                     √2(β − Sλ)/√(1 + S′)   |
| √2(β − Sλ)/√(1 + S′)      (α + β′)/(1 + S′) − λ  |  = 0   and   (α − β′)/(1 − S′) − λ = 0.   (19.165)

The second equation immediately gives


λ = (α − β′)/(1 − S′).

The first equation of (19.165) is somewhat complicated, and so we adopt the following approximation:

S′ = β′ = 0.   (19.166)

This approximation is justified because the two carbon atoms C2 and C3 are fairly remote from each other, so that the interaction between them is expected to be weak. Thus, we rewrite the first equation of (19.165) as

| α − λ           √2(β − Sλ) |
| √2(β − Sλ)      α − λ      |  = 0.   (19.167)

Moreover, since S is small compared with 1, we have S² ≪ 1 and hence ignore the terms in S². Using the approximation (19.166) and setting S² ≈ 0, from (19.167) we obtain the following quadratic equation:

λ² − 2(α − 2βS)λ + α² − 2β² = 0.

Solving this equation, we have

λ = α − 2βS ± √2 β √(1 − 2αS/β) ≈ α − 2βS ± √2 β (1 − αS/β),

where the last approximation is based on √(1 − x) ≈ 1 − x/2 for a small quantity x. Thus, we get

λL ≈ (α + √2 β)(1 − √2 S)  and  λH ≈ (α − √2 β)(1 + √2 S),   (19.168)
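As a numerical sanity check of (19.167) and (19.168), one can solve the 2 × 2 generalized eigenvalue problem directly and compare it with the closed forms; note that (19.167) is in fact solved exactly by λ = (α ± √2β)/(1 ± √2S). The parameter values below are arbitrary assumptions (β < 0 by convention, with a small overlap S):

```python
import numpy as np
from scipy.linalg import eigh

alpha, beta, S = -6.0, -2.5, 0.1   # hypothetical Hueckel parameters

H = np.array([[alpha,           np.sqrt(2)*beta],
              [np.sqrt(2)*beta, alpha]])
Smat = np.array([[1.0,          np.sqrt(2)*S],
                 [np.sqrt(2)*S, 1.0]])

lam_exact = np.sort(eigh(H, Smat, eigvals_only=True))

# Exact solutions of (19.167): lambda = (alpha ± sqrt(2) beta) / (1 ± sqrt(2) S)
lamL = (alpha + np.sqrt(2)*beta) / (1 + np.sqrt(2)*S)
lamH = (alpha - np.sqrt(2)*beta) / (1 - np.sqrt(2)*S)
assert np.allclose(lam_exact, np.sort([lamL, lamH]))

# First-order forms (19.168) agree to O(S^2)
approxL = (alpha + np.sqrt(2)*beta) * (1 - np.sqrt(2)*S)
approxH = (alpha - np.sqrt(2)*beta) * (1 + np.sqrt(2)*S)
assert abs(approxL - lamL) < 0.2 and abs(approxH - lamH) < 0.2
```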

where λL < λH. To determine the corresponding eigenfunctions Φ (i.e., MOs), we use a linear combination of the two SALCs Ψ1 and Ψ2. That is, putting

Φ = c1Ψ1 + c2Ψ2   (19.169)

and using (19.167), we obtain

(α − λ)c1 + √2(β − Sλ)c2 = 0.

Thus, with λ = λL we get c1 = c2, and for λ = λH we get c1 = −c2. Consequently, as the normalized eigenfunction ΦL corresponding to λL, we get

ΦL = (Ψ1 + Ψ2)/√[2(1 + √2 S)] = (1/2)(1 + √2 S)^(−1/2) (√2 ϕ1 + ϕ2 + ϕ3).   (19.170)

Likewise, as the normalized eigenfunction ΦH corresponding to λH, we get

ΦH = (−Ψ1 + Ψ2)/√[2(1 − √2 S)] = (1/2)(1 − √2 S)^(−1/2) (−√2 ϕ1 + ϕ2 + ϕ3).   (19.171)

As the remaining eigenvalue λ0 and the corresponding eigenfunction Φ0, we have

λ0 ≈ α  and  Φ0 = (1/√2)(ϕ2 − ϕ3),   (19.172)

where λL < λ0 < λH. The eigenfunction Φ0 does not participate in chemical bonding and is hence called a non-bonding orbital. It is worth noting that the eigenfunctions ΦL, Φ0, and ΦH have the same functional forms as those obtained by the simple Hückel theory, which ignores the overlap integral S. This is because the interaction between ϕ2 and ϕ3, which constitute the SALCs Ψ2 and Ψ3, is weak. Notice that within the framework of the simple Hückel theory, the two sets of basis vectors {|ϕ1⟩, (1/√2)|ϕ2 + ϕ3⟩, (1/√2)|ϕ2 − ϕ3⟩} and {|ΦL⟩, |ΦH⟩, |Φ0⟩} are connected by the following unitary matrix V:

(|ΦL⟩ |ΦH⟩ |Φ0⟩) = (|ϕ1⟩  (1/√2)|ϕ2 + ϕ3⟩  (1/√2)|ϕ2 − ϕ3⟩) V,

V =
| 1/√2   −1/√2   0 |
| 1/√2    1/√2   0 |
| 0       0      1 |.   (19.173)

Although both SALCs Ψ1 and Ψ2 belong to the same irreducible representation B1, they are not orthogonal. As can be seen from (19.170) and (19.171), however, the MOs ΦL and ΦH obtained by solving the secular equation (19.167) are mutually orthogonal. Starting from |ϕ1⟩, |ϕ2⟩, and |ϕ3⟩ of Example 18.1, we have reached |ΦL⟩, |ΦH⟩, and |Φ0⟩ via the two-step unitary transformations (18.40) and (19.173). The combined transformation W = UV is again unitary. That is, we have

Fig. 19.12 Electronic configurations and symmetry species of individual eigenstates along with their corresponding energy eigenvalues for the allyl cation. (a) Ground state. (b) First excited state. (c) Second excited state

Fig. 19.13 Geometry and position vectors of carbon atoms of the allyl cation. The origin O is located at the center of a line segment connecting C2 and C3 (r2 + r3 = 0)

(|ϕ1⟩ |ϕ2⟩ |ϕ3⟩) UV = (|ϕ1⟩ |ϕ2⟩ |ϕ3⟩)
| 1/√2   −1/√2    0    |
| 1/2     1/2    1/√2  |
| 1/2     1/2   −1/√2  |
= (|ΦL⟩ |ΦH⟩ |Φ0⟩).   (19.174)

The optical transition of the allyl radical represents general features of the optical transitions of molecules. To keep the story simple, let us consider the case of the allyl cation. Figure 19.12 shows the electronic configurations together with the symmetry species of the individual eigenstates and their corresponding energy eigenvalues for the allyl cation. In Fig. 19.13, we redraw its geometry, where the origin is located at the center of a line segment connecting C2 and C3 (r2 + r3 = 0). The major optical transitions (in optical absorption) are ΦL → Φ0 and ΦL → ΦH. (i) ΦL → Φ0: In this case, following (19.109), the transition matrix element Tfi is described by

Tfi = ⟨Θf(B1 × A2)|εe · P|Θi(A1)⟩.

(19.175)


In the above equation, we designate the irreducible representations of the eigenstates according to (19.164). Therefore, the symmetry of the final electronic configuration is described as

Γ = B1 × A2 = B2.

The direct product of the initial electronic configuration Θi(A1) and the final configuration Θf(B1 × A2) is B1 × A2 × A1 = B2. Hence, if εe · P belongs to the same irreducible representation B2, the associated optical transition is allowed. Consulting the character table of C2v, we find that y belongs to B2. Thus, the allowed optical transition is polarized along the y-direction (see Fig. 18.1). (ii) ΦL → ΦH: In parallel with the above case, Tfi is described by

Tfi = ⟨Θf(B1 × B1)|εe · P|Θi(A1)⟩.   (19.176)

Thus, the transition is characterized by A1 → A1, where the former A1 indicates the electronic ground state and the latter the excited state given by B1 × B1 = A1. The direct product of them is simply B1 × B1 × A1 = A1. Consulting the character table again, we find that z belongs to A1. This implies that the allowed transition is polarized along the z-direction (see Fig. 18.1).

Next, we investigate the above optical transitions in a semi-quantitative manner. In principle, the transition matrix element Tfi should be estimated from (19.175) and (19.176), which use electronic configurations of the two-electron system. Nonetheless, the formulation of (4.7) in Sect. 4.1, based upon one-electron states, serves our present purpose well. For the ΦL → Φ0 transition, we have

Tfi(ΦL → Φ0) = ∫Φ0 (εe · P) ΦL dτ = e εe · ∫Φ0 r ΦL dτ
 = e εe · ∫ (1/√2)(ϕ2 − ϕ3) r (1/2)(√2 ϕ1 + ϕ2 + ϕ3) dτ
 = (e εe/2√2) · ∫ (√2 ϕ1 r ϕ2 − √2 ϕ1 r ϕ3 + ϕ2 r ϕ2 − ϕ3 r ϕ3) dτ
 ≈ (e εe/2√2) · ∫ (ϕ2 r ϕ2 − ϕ3 r ϕ3) dτ
 ≈ (e εe/2√2) · (r2 ∫|ϕ2|² dτ − r3 ∫|ϕ3|² dτ) = (e εe/2√2) · (r2 − r3) = (e εe/√2) · r2,

where in the first near equality we ignored the integrals ∫ϕi r ϕj dτ (i ≠ j), and in the last near equality we set r ≈ r2 or r3. For these approximations, we assumed that the electron density is very high near C2 or C3, with negligible density at places remote from them. Choosing εe along the direction of r2, we get

Tfi(ΦL → Φ0) ≈ (e/√2)|r2|.

For the ΦL → ΦH transition, we similarly have

Tfi(ΦL → ΦH) = ∫ΦH (εe · P) ΦL dτ ≈ (e εe/4) · (r3 − 2r1 + r2) = −(e εe/2) · r1.

Choosing εe along the direction of r1, we get

Tfi(ΦL → ΦH) ≈ (e/2)|r1|.

The transition probability is proportional to the square of the absolute value of Tfi. Using |r2| ≈ √3|r1|, we have

|Tfi(ΦL → Φ0)|² ≈ (e²/2)|r2|² = (3e²/2)|r1|²,  |Tfi(ΦL → ΦH)|² ≈ (e²/4)|r1|².

Thus, we obtain

|Tfi(ΦL → Φ0)|² ≈ 6 |Tfi(ΦL → ΦH)|².   (19.177)

Thus, the transition probability of ΦL → Φ0 is about six times that of ΦL → ΦH. Note that in this simple estimation we ignored the overlap integral S. From the above discussion, we conclude (i) that the ΦL → Φ0 transition is polarized along the r2 direction (i.e., the molecular long axis), whereas the ΦL → ΦH transition is polarized along the r1 direction (i.e., the molecular short axis), and (ii) that the transition probability of ΦL → Φ0 is about six times that of ΦL → ΦH. Note


that the polarized characteristics are consistent with those obtained from the discussion based on the group theory. The conclusion reached by the semi-quantitative estimation of a simple molecule of allyl cation well typifies the general optical features of more complicated molecules having a well-defined molecular long axis such as polyenes.
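The factor of about six in (19.177) follows directly from the geometry of Fig. 19.13. A small numerical sketch, assuming an idealized geometry with a 120° CCC angle and unit bond length (these geometric values are assumptions for illustration):

```python
import numpy as np

# Allyl geometry with the origin at the midpoint of C2-C3, so that r2 + r3 = 0.
a = 1.0
r1 = np.array([0.0, a/2])                 # apex carbon C1
r2 = np.array([a*np.sqrt(3)/2, 0.0])      # C2
r3 = -r2                                  # C3

# One-electron transition moments (overlap S neglected), as in the text:
# T(L->0) ∝ (r2 - r3)/(2 sqrt(2)),  T(L->H) ∝ (r3 - 2 r1 + r2)/4
T_L0 = (r2 - r3) / (2*np.sqrt(2))
T_LH = (r3 - 2*r1 + r2) / 4

ratio = np.dot(T_L0, T_L0) / np.dot(T_LH, T_LH)
assert np.isclose(ratio, 6.0)                                   # reproduces (19.177)
assert np.isclose(np.linalg.norm(r2), np.sqrt(3)*np.linalg.norm(r1))
```

The two moments are also seen to be orthogonal (along r2 and r1, respectively), matching the polarization conclusions above.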

19.5 MO Calculations of Methane

So far, we have investigated MO calculations of aromatic molecules based upon the π-electron approximation. Those molecules are homogeneous systems consisting of the same kind of electrons. Here we deal with methane, which comprises a carbon atom and four surrounding hydrogen atoms. The hydrogen atoms form a regular tetrahedron with the carbon atom positioned at its center; methane is therefore considered a heterogeneous system. The calculation principle, however, is the same: we make the most of projection operators and construct appropriate SALCs of methane. We deal with the four 1s electrons of hydrogen along with the two 2s electrons and two 2p electrons of carbon. As basis functions of carbon, however, we consider the 2s atomic orbital and the three 2p orbitals (i.e., the 2px, 2py, and 2pz orbitals). That is, we deal with eight atomic orbitals all together; they are depicted in Fig. 19.14. The dimension of the vector space (i.e., the representation space) is accordingly eight.

Fig. 19.14 Four 1s atomic orbitals of hydrogen and a 2pz orbital of carbon. The former orbitals are represented by H1 to H4. The 2px and 2py orbitals of carbon are omitted for simplicity


As before, we wish to determine which irreducible representations the individual MOs belong to. As already mentioned in Sect. 17.3, there are 24 symmetry operations in the point group Td, to which methane belongs (see Table 17.6). For each symmetry operation we determine the corresponding transformation matrix. For example, C3^xyz transforms the basis functions as follows:

(H1 H2 H3 H4 C2s C2px C2py C2pz) C3^xyz = (H1 H3 H4 H2 C2s C2py C2pz C2px),

where by the above notations we denote the atomic species and atomic orbitals. Hence, as a matrix representation we have

C3^xyz =
| 1 0 0 0 0 0 0 0 |
| 0 0 1 0 0 0 0 0 |
| 0 0 0 1 0 0 0 0 |
| 0 1 0 0 0 0 0 0 |
| 0 0 0 0 1 0 0 0 |
| 0 0 0 0 0 0 1 0 |
| 0 0 0 0 0 0 0 1 |
| 0 0 0 0 0 1 0 0 |,   (19.178)

where C3^xyz is the same operation as R_{xyz 2π/3} that appeared in Sect. 17.3. Therefore,

χ(C3^xyz) = 2.

As another example, for σd^yz we have

(H1 H2 H3 H4 C2s C2px C2py C2pz) σd^yz = (H1 H2 H4 H3 C2s C2px C2pz C2py),

where σd^yz represents a mirror symmetry with respect to the plane that includes the x-axis and bisects the angle formed by the y- and z-axes. Thus, we have

σd^yz =
| 1 0 0 0 0 0 0 0 |
| 0 1 0 0 0 0 0 0 |
| 0 0 0 1 0 0 0 0 |
| 0 0 1 0 0 0 0 0 |
| 0 0 0 0 1 0 0 0 |
| 0 0 0 0 0 1 0 0 |
| 0 0 0 0 0 0 0 1 |
| 0 0 0 0 0 0 1 0 |.   (19.179)


Table 19.11 Characters for individual symmetry transformations of hydrogen 1s orbitals and carbon 2s and 2p orbitals in methane

Td   E    8C3   3C2   6S4   6σd
Γ    8     2     0     0     4

Then, we have

χ(σd^yz) = 4.

Taking some more examples, for C2^z we get

(H1 H2 H3 H4 C2s C2px C2py C2pz) C2^z = (H4 H3 H2 H1 C2s −C2px −C2py C2pz),

where C2^z means a rotation by π around the z-axis. Also, we have

C2^z =
| 0 0 0 1 0 0 0 0 |
| 0 0 1 0 0 0 0 0 |
| 0 1 0 0 0 0 0 0 |
| 1 0 0 0 0 0 0 0 |
| 0 0 0 0 1 0 0 0 |
| 0 0 0 0 0 -1 0 0 |
| 0 0 0 0 0 0 -1 0 |
| 0 0 0 0 0 0 0 1 |,   (19.180)

χ(C2^z) = 0.

With S4^z (i.e., an improper rotation by π/2 around the z-axis), we get

(H1 H2 H3 H4 C2s C2px C2py C2pz) S4^z = (H3 H1 H4 H2 C2s C2py −C2px −C2pz),

S4^z =
| 0 0 1 0 0 0 0 0 |
| 1 0 0 0 0 0 0 0 |
| 0 0 0 1 0 0 0 0 |
| 0 1 0 0 0 0 0 0 |
| 0 0 0 0 1 0 0 0 |
| 0 0 0 0 0 0 1 0 |
| 0 0 0 0 0 -1 0 0 |
| 0 0 0 0 0 0 0 -1 |,   (19.181)

χ(S4^z) = 0.

As for the identity E, we have

χ(E) = 8.

Thus, Table 19.11 collects the characters of the individual symmetry transformations with respect to the hydrogen 1s and carbon 2s and 2p orbitals in methane. From the above examples, we notice that all the symmetry operators R reduce the eight-dimensional representation space V8 to the subspaces Span{H1, H2, H3, H4} and Span{C2s, C2px, C2py, C2pz}. In terms of the notation of Sect. 12.2, we have

V8 = Span{H1, H2, H3, H4} ⊕ Span{C2s, C2px, C2py, C2pz}.   (19.182)

In other words, V8 is decomposed into the above two R-invariant subspaces (see Part III): one is the hydrogen-related subspace, and the other is the carbon-related subspace. Correspondingly, the representation D comprising the above representation matrices should be reduced to a direct sum of irreducible representations D^(α). That is, we should have

D = Σα qα D^(α),

where qα is a positive integer or zero. Here we are thinking of the decomposition of the (8, 8) matrices such as (19.178) into submatrices. We estimate qα using (18.83). That is,

qα = (1/n) Σg χ^(α)(g)* χ(g).   (18.83)

With the irreducible representation A1 of Td, for instance, we have

qA1 = (1/24) Σg χ^(A1)(g)* χ(g) = (1/24)(1·8 + 1·8·2 + 1·6·4) = 2.

As for A2, we have

qA2 = (1/24)[1·8 + 1·8·2 + (−1)·6·4] = 0.

Regarding T2, we get

qT2 = (1/24)(3·8 + 1·6·4) = 2.

For other irreducible representations of Td, we get qα = 0. Consequently, we have
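The characters of Table 19.11 and this reduction can be reproduced by building representative matrices from the orbital mappings quoted above for (19.178)–(19.181). The helper `rep_matrix` and the index ordering are conveniences of this sketch, not part of the text:

```python
import numpy as np

# Basis order: (H1, H2, H3, H4, C2s, C2px, C2py, C2pz).
# Each operation is specified by its action b_i -> sign * b_j.
def rep_matrix(images):
    M = np.zeros((8, 8))
    for i, (j, s) in enumerate(images):
        M[j, i] = s
    return M

E    = rep_matrix([(i, 1) for i in range(8)])
C3   = rep_matrix([(0, 1), (2, 1), (3, 1), (1, 1), (4, 1), (6, 1), (7, 1), (5, 1)])
C2z  = rep_matrix([(3, 1), (2, 1), (1, 1), (0, 1), (4, 1), (5, -1), (6, -1), (7, 1)])
S4z  = rep_matrix([(2, 1), (0, 1), (3, 1), (1, 1), (4, 1), (6, 1), (5, -1), (7, -1)])
sigd = rep_matrix([(0, 1), (1, 1), (3, 1), (2, 1), (4, 1), (5, 1), (7, 1), (6, 1)])

chi = [np.trace(M) for M in (E, C3, C2z, S4z, sigd)]   # Table 19.11
assert chi == [8, 2, 0, 0, 4]

# Reduce with (18.83): class sizes and characters of Td over (E, 8C3, 3C2, 6S4, 6sigma_d).
sizes = np.array([1, 8, 3, 6, 6])
td = {"A1": [1, 1, 1, 1, 1], "A2": [1, 1, 1, -1, -1], "E": [2, -1, 2, 0, 0],
      "T1": [3, 0, -1, 1, -1], "T2": [3, 0, -1, -1, 1]}
q = {k: int(round(np.dot(sizes, np.array(v) * chi) / 24)) for k, v in td.items()}
assert q == {"A1": 2, "A2": 0, "E": 0, "T1": 0, "T2": 2}   # D = 2 A1 + 2 T2
```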

D = 2D^(A1) + 2D^(T2).   (19.183)

Evidently, for the hydrogen-related representation D^(H) and the carbon-related representation D^(C) we individually have

D^(H) = D^(A1) + D^(T2)  and  D^(C) = D^(A1) + D^(T2),   (19.184)

where D = D^(H) + D^(C). In fact, taking (19.180), for example, in the subspace Span{H1, H2, H3, H4} the operator C2^z is expressed as

C2^z =
| 0 0 0 1 |
| 0 0 1 0 |
| 0 1 0 0 |
| 1 0 0 0 |.

Following the routine procedure based on the characteristic polynomial, we get the eigenvalues +1 (a double root) and −1 (a double root as well). A unitary similarity transformation using, for example, the unitary matrix P given by

P =
| 1/2    1/2    1/2    1/2 |
| 1/2    1/2   -1/2   -1/2 |
| 1/2   -1/2    1/2   -1/2 |
| 1/2   -1/2   -1/2    1/2 |

yields a diagonal matrix described by

P⁻¹ C2^z P =
| 1  0  0  0 |
| 0 -1  0  0 |
| 0  0 -1  0 |
| 0  0  0  1 |.
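The diagonalization just described can be checked directly. The sign pattern of P below is one valid choice of orthonormal eigenvector columns, an assumption chosen so that the eigenvalues appear in the order (1, −1, −1, 1):

```python
import numpy as np

# C2z restricted to Span{H1, H2, H3, H4}: it swaps H1<->H4 and H2<->H3.
C2z = np.array([[0, 0, 0, 1],
                [0, 0, 1, 0],
                [0, 1, 0, 0],
                [1, 0, 0, 0]], dtype=float)

# Columns are eigenvectors of C2z (symmetric/antisymmetric combinations).
P = 0.5 * np.array([[1,  1,  1,  1],
                    [1,  1, -1, -1],
                    [1, -1,  1, -1],
                    [1, -1, -1,  1]], dtype=float)

assert np.allclose(P.T @ P, np.eye(4))        # P is unitary (real orthogonal)
D = np.linalg.inv(P) @ C2z @ P
assert np.allclose(D, np.diag([1, -1, -1, 1]))
```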

Note that this diagonal matrix is identical to the submatrix of (19.180) for Span{C2s, C2px, C2py, C2pz}. Similarly, S4^z of (19.181) gives the eigenvalues ±1 and ±i for both subspaces. Notice that since S4^z is unitary, its eigenvalues are complex numbers with an absolute value of 1. Since these symmetry operation matrices are unitary, they can be diagonalized according to Theorem 14.5; using unitary matrices whose column vectors are chosen from eigenvectors, the diagonal elements are identical to the eigenvalues, including their multiplicities. Writing the representation matrices of a symmetry operation for the hydrogen-associated and carbon-associated subspaces as H and C, respectively, we find that H and C have the


same eigenvalues in common. Notice here that different types of transformation matrices (e.g., C2^z, S4^z, etc.) give different sets of eigenvalues. Via unitary similarity transformations using unitary matrices P and Q, we get

P⁻¹HP = Q⁻¹CQ  or  (PQ⁻¹)⁻¹ H (PQ⁻¹) = C.

Namely, H and C are similar; i.e., the representations are equivalent. Thus, recalling Schur's first lemma, (19.184) results. Our next task is to construct SALCs, i.e., proper basis vectors, using projection operators. From the above, we anticipate that the MOs comprise linear combinations of a hydrogen-associated SALC and a carbon-associated SALC belonging to the same irreducible representation (i.e., A1 or T2). For this purpose, we first find proper SALCs using the projection operators described as

P_l(l)^(α) = (dα/n) Σg D_ll^(α)(g)* g.   (18.156)

We apply this operator to Span{H1, H2, H3, H4}. Taking, e.g., H1 and operating both sides of (18.156) on H1, as a basis vector corresponding to a one-dimensional representation of A1 we have ðA Þ

P1ð11Þ H 1 =

d A1 n

ðA Þ

D111 ðgÞ gH 1 = g

1 ½ð1  H 1 Þ þ ð1  H 1 þ 1  H 1 þ 1  H 3 24

þ1  H 4 þ 1  H 3 þ 1  H 2 þ 1  H 4 þ 1  H 2 Þ þ ð1  H 4 þ 1  H 3 þ 1  H 2 Þ þð1  H 3 þ 1  H 2 þ 1  H 2 þ 1  H 4 þ 1  H 4 þ 1  H 3 Þ þð1  H 1 þ 1  H 4 þ 1  H 1 þ 1  H 2 þ 1  H 1 þ 1  H 3 Þ: =

1 ð6  H 1 þ 6  H 2 þ 6  H 3 þ 6  H 4 Þ 24 1 = ðH 1 þ H 2 þ H 3 þ H 4 Þ: 4

(19.185)

The case of C2s is simple, because all the symmetry operations convert C2s into itself. That is, we have

P_1(1)^(A1) C2s = (1/24)(24 · C2s) = C2s.

ð19:186Þ

19.5

MO Calculations of Methane

Table 19.12 Character table of Td

ðA Þ

P1ð11Þ C2px =

Td A1 A2 E T1 T2

811 E 1 1 2 3 3

8C3 1 1 -1 0 0

3C2 1 1 2 -1 -1

6S4 1 -1 0 1 -1

6σ d 1 -1 0 -1 1

x2 + y2 + z2 (2z2 - x2 - y2, x2 - y2) (x, y, z); (xy, yz, zx)

1 ð1  C2px Þ þ 1  C2py þ 1  C2pz þ 1  - C2py 24

þ1  - C2pz þ 1  - C2py þ 1  C2pz þ 1  C2py þ 1  - C2pz  þ½ð1  C2px þ 1  ð- C2px Þ þ 1  ð- C2px Þ þ ½1  ð- C2px Þ þ1  ð- C2px Þ þ 1  - C2pz þ 1  C2pz þ 1  C2py þ 1  - C2py  þ 1  C2py þ 1  C2px þ 1  C2pz þ 1  - C2py þ 1  C2px þ 1  - C2pz g = 0: The calculation is somewhat tedious, but it is natural that since C2s is spherically symmetric, it belongs to the totally symmetric representation. Conversely, it is natural to think that C2px is totally unlikely to contain a totally symmetric representation. This is also the case with C2py and C2pz. Table 19.12 shows the character table of Td in which the three-dimensional irreducible representation T2 is spanned by basis vectors (x y z). Since in Table 17.6 each (3, 3) matrix is given in reference to the vectors (x y z), it can directly be utilized to represent T2. More specifically, we can directly choose the ðT Þ diagonal elements (1, 1), (2, 2), and (3, 3) of the individual (3, 3) matrices for D112 , ðT 2 Þ ðT 2 Þ D22 , and D33 elements of the projection operators, respectively. Thus, we can construct SALCs using projection operators explained in Sect. 18.7. For example, using H1 we obtain SALCs that belong to T2 such that ðT Þ

P1ð12Þ H 1 =

3 24

P_1(1)^(T2) H1 = (3/24) Σg D_11^(T2)(g)* g H1 = (3/24)[2H1 + 2H2 + (−2)H3 + (−2)H4] = (1/4)(H1 + H2 − H3 − H4).

(19.187)

Similarly, we have

P_2(2)^(T2) H1 = (1/4)(H1 − H2 + H3 − H4),   (19.188)
P_3(3)^(T2) H1 = (1/4)(H1 − H2 − H3 + H4).   (19.189)

Now we can easily guess that P_1(1)^(T2) C2px solely contains C2px. Likewise, P_2(2)^(T2) C2py and P_3(3)^(T2) C2pz contain only C2py and C2pz, respectively. In fact, we obtain what we anticipate. That is,



P_1(1)^(T2) C2px = C2px,  P_2(2)^(T2) C2py = C2py,  P_3(3)^(T2) C2pz = C2pz.

(19.190)
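The projection results (19.185)–(19.190) can be reproduced mechanically: generate all 24 representation matrices by closing the group on the two generator matrices (19.178) and (19.181), then apply (18.156), using the 3 × 3 p-orbital block of each matrix as D^(T2)(g) in the spirit of the remark about Table 17.6. This is a sketch; the helper `rep_matrix` and index conventions are assumptions of the code, not of the text.

```python
import numpy as np

# Basis order: (H1, H2, H3, H4, C2s, C2px, C2py, C2pz); b_i -> sign * b_j.
def rep_matrix(images):
    M = np.zeros((8, 8))
    for i, (j, s) in enumerate(images):
        M[j, i] = s
    return M

C3  = rep_matrix([(0, 1), (2, 1), (3, 1), (1, 1), (4, 1), (6, 1), (7, 1), (5, 1)])
S4z = rep_matrix([(2, 1), (0, 1), (3, 1), (1, 1), (4, 1), (6, 1), (5, -1), (7, -1)])

# Closure: C3 and S4 generate all 24 elements of Td.
group = [np.eye(8)]
frontier = [np.eye(8)]
while frontier:
    new = []
    for M in frontier:
        for G in (C3, S4z):
            X = M @ G
            if not any(np.allclose(X, Y) for Y in group):
                group.append(X)
                new.append(X)
    frontier = new
assert len(group) == 24

# The p-orbital block is a T2 irrep; its (1,1) element M[5,5] is D_11^(T2)(g).
P11_T2 = sum(M[5, 5] * M for M in group) * 3 / 24   # projector (18.156), d=3, n=24
P_A1   = sum(group) / 24                            # A1 projector (all characters 1)

e_H1, e_px = np.eye(8)[0], np.eye(8)[5]
assert np.allclose(P_A1 @ e_H1, np.array([1, 1, 1, 1, 0, 0, 0, 0]) / 4)      # (19.185)
assert np.allclose(P11_T2 @ e_H1, np.array([1, 1, -1, -1, 0, 0, 0, 0]) / 4)  # (19.187)
assert np.allclose(P11_T2 @ e_px, e_px)                                      # (19.190)
```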

This gives a good example to illustrate the general concept of projection operators and the related calculations of inner products discussed in Sect. 18.7. For example, the inner product ⟨P_1(1)^(T2) H1 | P_1(1)^(T2) C2px⟩ is non-vanishing, whereas an inner product such as ⟨P_2(2)^(T2) H1 | P_1(1)^(T2) C2px⟩ is zero. This significantly reduces the effort needed to solve the secular equation (vide infra). Notice that the functions P_1(1)^(T2) H1, P_1(1)^(T2) C2px, etc. are linearly independent of one another. Recall also that P_1(1)^(T2) H1 and P_1(1)^(T2) C2px correspond to ϕl^(α) and ψl^(α) in (18.171), respectively; that is, they are linearly independent while belonging to the same place "1" of the same three-dimensional irreducible representation T2. Equations (19.187)–(19.190) seem intuitively obvious: if we draw the molecular geometry of methane (see Fig. 19.14), we can immediately recognize the relationship between the directionality in ℝ³ and the "directionality" of

MO Calculations of Methane

813

SALCs represented by sings of hydrogen atomic orbitals (or C2px, C2py, and C2pz of carbon). As stated above, we have successfully obtained SALCs relevant to methane. Therefore, our next task is to solve an eigenvalue problem and construct appropriate MOs. To do this, let us first normalize the SALCs. We assume that carbon-based SALCs have already been normalized as well-studied atomic orbitals. We suppose that all the functions are real. For the hydrogen-based SALCs hH 1 þ H 2 þ H 3 þ H 4 jH 1 þ H 2 þ H 3 þ H 4 i = 4hH 1 jH 1 i þ 12hH 1 jH 2 i = 4 þ 12SHH = 4 1 þ 3SHH ,

ð19:191Þ

where the second equality comes from the fact that jH1i, i.e., a 1 s atomic orbital is normalized. We define hH1| H2i = SHH, i.e., an overlap integral between two adjacent hydrogen atoms. Note also that hHi| Hji (1 ≤ i, j ≤ 4; i ≠ j) is the same because of the symmetry requirement. Thus, as a normalized SALC we have H ðA1 Þ 

j H1 þ H2 þ H3 þ H4i 2

Defining a denominator as c = 2

1 þ 3SHH

:

ð19:192Þ

1 þ 3SHH , we have

H ðA1 Þ = j H 1 þ H 2 þ H 3 þ H 4 i=c:

ð19:193Þ

Also, we have hH 1 þ H 2 - H 3 - H 4 jH 1 þ H 2 - H 3 - H 4 i = 4hH 1 jH 1 i - 4hH 1 jH 2 i = 4 - 4SHH = 4 1 - SHH :

ð19:194Þ

Thus, we have ðT 2 Þ

H1



j H1 þ H2 - H3 - H4i p : 2 1 - SHH

p Also defining a denominator as d = 2 1 - SHH , we have

ð19:195Þ

814

19 ðT 2 Þ

Applications of Group Theory to Physical Chemistry

j H 1 þ H 2 - H 3 - H 4 i=d:

H1

Similarly, we define other hydrogen-based SALCs as ðT 2 Þ

j H 1 - H 2 þ H 3 - H 4 i=d,

ðT 2 Þ

j H 1 - H 2 - H 3 þ H 4 i=d:

H2

H3

The next step is to construct MOs using the above SALCs. To this end, we make a linear combination using SALCs belonging to the same irreducible representations. In the case of A1, we choose H ðA1 Þ and C2s. Naturally, we anticipate two linear combinations of a1 H ðA1 Þ þ b1 C2s,

ð19:196Þ

where a1 and b1 are arbitrary constants. On the basis of the discussions of projection operators in Sect. 18.7, both the above two linear combinations belong to A1 as well. ðT Þ ðT Þ ðT Þ Similarly, according to the projection operators P1ð12Þ , P2ð22Þ , and P3ð32Þ , we make three sets of linear combinations ðT Þ

ðT Þ

ðT Þ

q1 P1ð12Þ H 1 þ r 1 C2px ; q2 P2ð22Þ H 1 þ r 2 C2py ; q3 P3ð32Þ H 1 þ r 3 C2pz ,

ð19:197Þ

where q1, r1, etc. are arbitrary constants. These three sets of linear combinations belong to individual “addresses” 1, 2, and 3 of T2. What we have to do is to determine coefficients of the above MOs and to normalize them by solving the secular equations. With two different energy eigenvalues, we get two orthogonal (i.e., linearly independent) MOs for the individual four sets of linear combinations of (19.196) and (19.197). Thus, total eight linear combinations constitute MOs of methane. In light of (19.183), the secular equation can be reduced as follows according to the representations A1 and T2. There we have changed the order of entries in the equation so that we can deal with the equation easily. Then we have H 11 - λ H 21 - λS21

H 12 - λS12 H 22 - λ G11 - λ G21 - λT 21

G12 - λT 12 G22 - λ F 11 - λ F 21 - λV 21

F 12 - λV 12 F 22 - λ K 11 - λ K 12 - λW 12 K 21 - λW 21 K 22 - λ

¼ 0,

ð19:198Þ

19.5

MO Calculations of Methane

815

where off-diagonal elements are all zero except for those explicitly described. This is because of the symmetry requirement (see Sect. 19.2). Thus, in (19.198) the secular equation is decomposed into four (2, 2) blocks. The first block is pertinent to A1 of a hydrogen-based component and carbon-based component from the left, respectively. Lower three blocks are pertinent to T2 of hydrogen-based and carbon-based compoðT Þ ðT Þ ðT Þ nents from the left, respectively, in order of P1ð12Þ , P2ð22Þ , and P3ð32Þ SALCs from the top. The notations follow those of (19.59). We compute these equations. The calculations are equivalent to solving the following four two-dimensional secular equations: H 11 - λ H 21 - λS21

H 12 - λS12 H 22 - λ

= 0,

G11 - λ G21 - λT 21

G12 - λT 12 G22 - λ

= 0,

F 11 - λ F 21 - λV 21

F 12 - λV 12 F 22 - λ

= 0,

K 11 - λ K 21 - λW 21

K 12 - λW 12 K 22 - λ

= 0:

ð19:199Þ

Notice that these four secular equations are the same as a (2, 2) determinant of (19.59) in the form of a secular equation. Note, at the same time, that while (19.59) did not assume SALCs, (19.199) takes account of SALCs. That is, (19.199) expresses a secular equation with respect to two SALCs that belong to the same irreducible representation. The first equation of (19.199) reads as 1 - S12 2 λ2 - ðH 11 þ H 22 - 2H 12 S12 Þλ þ H 11 H 22 - H 12 2 = 0: In (19.200) we define quantities as follows: S12 

H ðA1 Þ C2sdτ

 H ðA1 Þ jC2s =

hH 1 þ H 2 þ H 3 þ H 4 jC2si 2

1 þ 3S

HH

=

2hH 1 jC2si 1 þ 3SHH

ð19:200Þ

816

19

Applications of Group Theory to Physical Chemistry

2SCH A1

=

1 þ 3SHH

:

ð19:201Þ

In (19.201), an overlap integral between hydrogen atomic orbitals and C2s is identical from the symmetry requirement and it is defined as SCH A1  hH 1 jC2si:

ð19:202Þ

Also in (19.200), the other quantities are defined as follows:

$$H_{11} \equiv \left\langle H^{(A_1)}\middle|H\,H^{(A_1)}\right\rangle = \frac{\alpha_H + 3\beta_{HH}}{1+3S^{HH}}, \tag{19.203}$$

$$H_{22} \equiv \langle \mathrm{C}2s|H\,\mathrm{C}2s\rangle, \tag{19.204}$$

$$H_{12} \equiv \left\langle H^{(A_1)}\middle|H\,\mathrm{C}2s\right\rangle
= \frac{\langle H_1+H_2+H_3+H_4|H\,\mathrm{C}2s\rangle}{2\sqrt{1+3S^{HH}}}
= \frac{2\langle H_1|H\,\mathrm{C}2s\rangle}{\sqrt{1+3S^{HH}}}
= \frac{2\beta^{CH}_{A_1}}{\sqrt{1+3S^{HH}}}, \tag{19.205}$$

where H is the Hamiltonian of the methane molecule. In (19.203) and (19.205), moreover, we define the quantities

$$\alpha_H \equiv \langle H_1|HH_1\rangle, \quad \beta_{HH} \equiv \langle H_1|HH_2\rangle, \quad \beta^{CH}_{A_1} \equiv \langle H_1|H\,\mathrm{C}2s\rangle. \tag{19.206}$$

The quantity H₁₁ is a "Coulomb" integral of the hydrogen-based SALC that involves all four hydrogen atoms. Solving the first equation of (19.199), we get

$$\lambda = \frac{H_{11}+H_{22}-2H_{12}S_{12} \pm \sqrt{(H_{11}-H_{22})^{2}+4\left[H_{12}^{\,2}+H_{11}H_{22}S_{12}^{\,2}-H_{12}S_{12}(H_{11}+H_{22})\right]}}{2\left(1-S_{12}^{\,2}\right)}. \tag{19.207}$$
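The equivalence of the quadratic (19.200) and the closed form (19.207) is easy to check numerically. The following sketch uses purely illustrative numerical values (they are assumptions, not taken from the text) for the two-dimensional secular problem:

```python
import numpy as np

# Hypothetical sample values for the 2x2 secular problem (in, say, eV); not from the text
H11, H22, H12, S12 = -13.6, -19.4, -8.0, 0.55

# Quadratic (19.200): (1 - S12^2) l^2 - (H11 + H22 - 2 H12 S12) l + (H11 H22 - H12^2) = 0
roots = np.roots([1 - S12**2, -(H11 + H22 - 2*H12*S12), H11*H22 - H12**2])

# Closed form (19.207)
disc = (H11 - H22)**2 + 4*(H12**2 + H11*H22*S12**2 - H12*S12*(H11 + H22))
lam = (H11 + H22 - 2*H12*S12 + np.array([1.0, -1.0])*np.sqrt(disc)) / (2*(1 - S12**2))

assert np.allclose(np.sort(roots), np.sort(lam))
```

The same two numbers are also the generalized eigenvalues of det(𝗛 − λ𝗦) = 0 with 𝗛 = [[H₁₁, H₁₂], [H₁₂, H₂₂]] and 𝗦 = [[1, S₁₂], [S₁₂, 1]], which is how such problems are usually solved in practice.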

Similarly, we obtain the solutions of the latter three eigenvalue equations of (19.199). With the second equation of (19.199), for instance, we have

$$\left(1-T_{12}^{\,2}\right)\lambda^{2}-\left(G_{11}+G_{22}-2G_{12}T_{12}\right)\lambda+G_{11}G_{22}-G_{12}^{\,2}=0. \tag{19.208}$$

Solving this, we get

$$\lambda = \frac{G_{11}+G_{22}-2G_{12}T_{12} \pm \sqrt{(G_{11}-G_{22})^{2}+4\left[G_{12}^{\,2}+G_{11}G_{22}T_{12}^{\,2}-G_{12}T_{12}(G_{11}+G_{22})\right]}}{2\left(1-T_{12}^{\,2}\right)}. \tag{19.209}$$

In (19.208) and (19.209) we define these quantities as follows:

$$T_{12} \equiv \int H_1^{(T_2)*}\,\mathrm{C}2p_x\,d\tau \equiv \left\langle H_1^{(T_2)}\middle|\mathrm{C}2p_x\right\rangle
= \frac{\langle H_1+H_2-H_3-H_4|\mathrm{C}2p_x\rangle}{2\sqrt{1-S^{HH}}}
= \frac{2\langle H_1|\mathrm{C}2p_x\rangle}{\sqrt{1-S^{HH}}}
= \frac{2S^{CH}_{T_2}}{\sqrt{1-S^{HH}}}, \tag{19.210}$$

$$G_{11} \equiv \left\langle H_1^{(T_2)}\middle|H\,H_1^{(T_2)}\right\rangle = \frac{\alpha_H - \beta_{HH}}{1-S^{HH}}, \tag{19.211}$$

$$G_{22} \equiv \langle \mathrm{C}2p_x|H\,\mathrm{C}2p_x\rangle, \tag{19.212}$$

$$G_{12} \equiv \left\langle H_1^{(T_2)}\middle|H\,\mathrm{C}2p_x\right\rangle
= \frac{\langle H_1+H_2-H_3-H_4|H\,\mathrm{C}2p_x\rangle}{2\sqrt{1-S^{HH}}}
= \frac{2\langle H_1|H\,\mathrm{C}2p_x\rangle}{\sqrt{1-S^{HH}}}
= \frac{2\beta^{CH}_{T_2}}{\sqrt{1-S^{HH}}}. \tag{19.213}$$

In the above equations, we have further defined the integrals

$$S^{CH}_{T_2} \equiv \langle H_1|\mathrm{C}2p_x\rangle, \quad \beta^{CH}_{T_2} \equiv \langle H_1|H\,\mathrm{C}2p_x\rangle. \tag{19.214}$$

In (19.210), the overlap integral T₁₂ between the four hydrogen atomic orbitals and C2pₓ is again identical for each orbital by the symmetry requirement. That is, the integrals of (19.213) add constructively: the product of the plus components of the hydrogen SALC of (19.187) with C2pₓ contributes with the same sign as the product of its minus components with C2pₓ; see Fig. 19.14. Notice that C2pₓ has a node on the yz-plane. The third and fourth equations of (19.199) give exactly the same eigenvalues as those given in (19.209). This is obvious from the fact that all three latter equations of (19.199) are associated with the irreducible representation T₂. The corresponding three MOs are triply degenerate.

Fig. 19.15 Configuration of an electron (e) and the hydrogen nuclei (A and B). r_A and r_B denote the separation between the electron and A and that between the electron and B, respectively; R denotes the separation between A and B.

In (19.207) and (19.209), the plus sign gives the higher orbital energy and the minus sign the lower. Equations (19.207) and (19.209), however, look somewhat complicated. To simplify the situation, (i) consider, e.g., in (19.207) a case where |H₁₁| ≫ |H₂₂| or |H₁₁| ≪ |H₂₂|. In that case, (H₁₁ − H₂₂)² dominates inside the square root, and so, ignoring −2H₁₂S₁₂ and −S₁₂², we have λ ≈ H₁₁ or λ ≈ H₂₂. Inserting these values into (19.199), we have either Ψ ≈ H^(A₁) or Ψ ≈ C2s, where Ψ is the resulting MO. This implies that no interaction would arise between H^(A₁) and C2s. (ii) If, however, H₁₁ = H₂₂, we would get

$$\lambda = \frac{H_{11}-H_{12}S_{12} \pm \left|H_{12}-H_{11}S_{12}\right|}{1-S_{12}^{\,2}}. \tag{19.215}$$

According as H₁₂ − H₁₁S₁₂ is positive or negative, we have the following alternative:

Case I: H₁₂ − H₁₁S₁₂ > 0. We have λ_H = (H₁₁ + H₁₂)/(1 + S₁₂) and λ_L = (H₁₁ − H₁₂)/(1 − S₁₂), where λ_H > λ_L.

Case II: H₁₂ − H₁₁S₁₂ < 0. We have λ_H = (H₁₁ − H₁₂)/(1 − S₁₂) and λ_L = (H₁₁ + H₁₂)/(1 + S₁₂), where λ_H > λ_L.

In the above cases, we would anticipate maximum orbital mixing between H^(A₁) and C2s. (iii) If H₁₁ and H₂₂ differ moderately, i.e., between the above cases (i) and (ii), the orbital mixing is likewise moderate. This is the actual situation. With (19.209) we have eigenvalues expressed similarly to those of (19.207). Therefore, the classifications related to Cases I and II above hold with G₁₂ − G₁₁T₁₂, F₁₂ − F₁₁V₁₂, and K₁₂ − K₁₁W₁₂ as well. In spite of the simplicity of (19.215), the quantities H₁₁, H₁₂, and S₁₂ are hard to calculate. In general cases, including the present example (i.e., methane), the difficulty in calculating these quantities results essentially from the fact that we are dealing with many-particle interactions that include electron repulsion. Nonetheless, for the simplest case of the hydrogen molecular ion H₂⁺ the estimation is feasible [3]. Let us estimate H₁₂ − H₁₁S₁₂ quantitatively according to Atkins and Friedman [3].

The Hamiltonian of H₂⁺ is described as

$$H = -\frac{\hbar^{2}}{2m}\nabla^{2} - \frac{e^{2}}{4\pi\varepsilon_{0}r_{A}} - \frac{e^{2}}{4\pi\varepsilon_{0}r_{B}} + \frac{e^{2}}{4\pi\varepsilon_{0}R}, \tag{19.216}$$

where m is the electron rest mass and the other symbols are defined in Fig. 19.15; the last term represents the repulsive interaction between the two hydrogen nuclei. To estimate the quantities in (19.215), it is convenient to use the dimensionless ellipsoidal coordinates (μ, ν, ϕ) such that [3]

$$\mu = \frac{r_A + r_B}{R} \quad \text{and} \quad \nu = \frac{r_A - r_B}{R}. \tag{19.217}$$

The quantity ϕ is an azimuthal angle around the molecular axis (i.e., the straight line connecting the two nuclei). Then, we have

$$H_{11} = \langle A|H|A\rangle
= \left\langle A\left|-\frac{\hbar^{2}}{2m}\nabla^{2}-\frac{e^{2}}{4\pi\varepsilon_{0}r_{A}}\right|A\right\rangle
- \left\langle A\left|\frac{e^{2}}{4\pi\varepsilon_{0}r_{B}}\right|A\right\rangle
+ \left\langle A\left|\frac{e^{2}}{4\pi\varepsilon_{0}R}\right|A\right\rangle$$
$$= E_{1s} - \frac{e^{2}}{4\pi\varepsilon_{0}}\left\langle A\left|\frac{1}{r_{B}}\right|A\right\rangle + \frac{e^{2}}{4\pi\varepsilon_{0}R}, \tag{19.218}$$

where E₁ₛ is the same as that given in (3.258) with Z = n = 1 and μ replaced with m (i.e., E₁ₛ = −ħ²/2ma²). Using the coordinate representation of (3.301), we have

$$|A\rangle = \frac{1}{\sqrt{\pi}}\,a^{-3/2}\,e^{-r_{A}/a}. \tag{19.219}$$

Moreover, considering (19.217), we have

$$\left\langle A\left|\frac{1}{r_{B}}\right|A\right\rangle = \frac{1}{\pi a^{3}}\int d\tau\, e^{-2r_{A}/a}\,\frac{1}{r_{B}}.$$

Converting the Cartesian coordinates to the ellipsoidal coordinates [3], such that

$$\int d\tau = \left(\frac{R}{2}\right)^{3}\int_{0}^{2\pi}d\phi\int_{1}^{\infty}d\mu\int_{-1}^{1}d\nu\,\left(\mu^{2}-\nu^{2}\right),$$

we have

$$\left\langle A\left|\frac{1}{r_B}\right|A\right\rangle
= \frac{R^{3}}{8\pi a^{3}}\cdot 2\pi\int_{1}^{\infty}d\mu\int_{-1}^{1}d\nu\,\left(\mu^{2}-\nu^{2}\right)\frac{e^{-(\mu+\nu)R/a}}{\frac{R}{2}\left(\mu-\nu\right)}
= \frac{R^{2}}{2a^{3}}\int_{1}^{\infty}d\mu\int_{-1}^{1}d\nu\,(\mu+\nu)\,e^{-(\mu+\nu)R/a}. \tag{19.220}$$

Putting $I = \int_{1}^{\infty}d\mu\int_{-1}^{1}d\nu\,(\mu+\nu)\,e^{-(\mu+\nu)R/a}$, we obtain

$$I = \int_{1}^{\infty}\mu\, e^{-\mu R/a}\,d\mu\int_{-1}^{1}e^{-\nu R/a}\,d\nu
+ \int_{1}^{\infty}e^{-\mu R/a}\,d\mu\int_{-1}^{1}\nu\, e^{-\nu R/a}\,d\nu. \tag{19.221}$$

The above definite integrals can readily be calculated using the methods described in Sect. 3.7.2; see, e.g., (3.262) and (3.263). For instance, we have

$$\int_{-1}^{1}e^{-c\nu}\,d\nu = \left[\frac{e^{-c\nu}}{-c}\right]_{-1}^{1} = \frac{1}{c}\left(e^{c}-e^{-c}\right). \tag{19.222}$$

Differentiating (19.222) with respect to the parameter c, we get

$$\int_{-1}^{1}\nu\, e^{-c\nu}\,d\nu = \frac{1}{c^{2}}\left[(1-c)e^{c}-(1+c)e^{-c}\right]. \tag{19.223}$$

In the present case, c is to be replaced with R/a. The other calculations of the definite integrals are left to the reader. Thus, we obtain

$$I = \frac{2a^{3}}{R^{3}}\left[1-\left(1+\frac{R}{a}\right)e^{-2R/a}\right].$$
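The factorized integral (19.221) and the closed form of I can be cross-checked numerically. The sketch below sets a = 1 and uses R/a = 2 as an assumed sample value (scipy is assumed to be available):

```python
import numpy as np
from scipy.integrate import quad

a, R = 1.0, 2.0          # Bohr radius set to 1; R/a = 2 is an assumed sample value
c = R / a

# I = ∫_1^∞ dμ ∫_{-1}^{1} dν (μ+ν) e^{-(μ+ν)R/a}, factorized as in (19.221)
i1 = quad(lambda m: m*np.exp(-c*m), 1, np.inf)[0] * quad(lambda n: np.exp(-c*n), -1, 1)[0]
i2 = quad(lambda m: np.exp(-c*m), 1, np.inf)[0] * quad(lambda n: n*np.exp(-c*n), -1, 1)[0]
I_num = i1 + i2

# Closed form: I = (2 a^3 / R^3) [1 - (1 + R/a) e^{-2R/a}]
I_exact = (2*a**3/R**3) * (1 - (1 + c)*np.exp(-2*c))

assert abs(I_num - I_exact) < 1e-10
```

The 1-D pieces here are exactly the integrals (19.222) and (19.223) with c = R/a, together with the corresponding μ-integrals over [1, ∞).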

In turn, we have

$$\left\langle A\left|\frac{1}{r_B}\right|A\right\rangle = \frac{R^{2}}{2a^{3}}\,I = \frac{1}{R}\left[1-\left(1+\frac{R}{a}\right)e^{-2R/a}\right].$$

Introducing a symbol j₀ according to Atkins and Friedman [3], given by

$$j_{0} \equiv \frac{e^{2}}{4\pi\varepsilon_{0}},$$

we finally obtain

$$H_{11} = E_{1s} - \frac{j_{0}}{R}\left[1-\left(1+\frac{R}{a}\right)e^{-2R/a}\right] + \frac{j_{0}}{R}. \tag{19.224}$$

The quantity S₁₂ appearing in (19.199) can be obtained as follows:

$$S_{12} = \langle B|A\rangle = \frac{R^{3}}{8\pi a^{3}}\int_{0}^{2\pi}d\phi\int_{1}^{\infty}d\mu\int_{-1}^{1}d\nu\,\left(\mu^{2}-\nu^{2}\right)e^{-\mu R/a}, \tag{19.225}$$

where we used

$$|B\rangle = \frac{1}{\sqrt{\pi}}\,a^{-3/2}\,e^{-r_{B}/a}$$

and (19.217). Noting that

$$\int_{1}^{\infty}d\mu\int_{-1}^{1}d\nu\,\left(\mu^{2}-\nu^{2}\right)e^{-\mu R/a} = \int_{1}^{\infty}d\mu\,\left(2\mu^{2}-\frac{2}{3}\right)e^{-\mu R/a}$$

and following procedures similar to those described above, we get

$$S_{12} = \left[1+\frac{R}{a}+\frac{1}{3}\left(\frac{R}{a}\right)^{2}\right]e^{-R/a}. \tag{19.226}$$
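The reduction of (19.225) to a single μ-integral and the closed form (19.226) can again be checked numerically (scipy assumed; the sampled R values are illustrative):

```python
import numpy as np
from scipy.integrate import quad

a = 1.0
for R in (0.5, 1.0, 2.0, 4.0):       # assumed sample internuclear separations (units of a)
    # (19.225) after the ϕ and ν integrations: S12 = (R^3 / 4a^3) ∫_1^∞ (2μ^2 - 2/3) e^{-μR/a} dμ
    s_num = R**3/(4*a**3) * quad(lambda m: (2*m**2 - 2/3)*np.exp(-m*R/a), 1, np.inf)[0]
    # Closed form (19.226)
    s_exact = (1 + R/a + (R/a)**2/3) * np.exp(-R/a)
    assert abs(s_num - s_exact) < 1e-10
```

The prefactor R³/4a³ is (R³/8πa³)·2π, i.e., (19.225) after the trivial ϕ-integration.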

In turn, for H₁₂ in (19.199) we have

$$H_{12} = \langle A|H|B\rangle
= \left\langle A\left|-\frac{\hbar^{2}}{2m}\nabla^{2}-\frac{e^{2}}{4\pi\varepsilon_{0}r_{B}}\right|B\right\rangle
- \left\langle A\left|\frac{e^{2}}{4\pi\varepsilon_{0}r_{A}}\right|B\right\rangle
+ \left\langle A\left|\frac{e^{2}}{4\pi\varepsilon_{0}R}\right|B\right\rangle$$
$$= E_{1s}\langle A|B\rangle - \frac{e^{2}}{4\pi\varepsilon_{0}}\left\langle A\left|\frac{1}{r_{A}}\right|B\right\rangle + \frac{e^{2}}{4\pi\varepsilon_{0}R}\langle A|B\rangle. \tag{19.227}$$

Note that in (19.227) |B⟩ is an eigenfunction belonging to E₁ₛ = −ħ²/2ma², which is an eigenvalue of the operator −(ħ²/2m)∇² − e²/4πε₀r_B. From (3.258), we estimate E₁ₛ to be −13.61 eV. Using (19.217) and converting the Cartesian coordinates to the ellipsoidal coordinates once again, we get

$$\left\langle A\left|\frac{1}{r_{A}}\right|B\right\rangle = \frac{1}{a}\left(1+\frac{R}{a}\right)e^{-R/a}.$$

Thus, we have

$$H_{12} = \left(E_{1s}+\frac{j_{0}}{R}\right)S_{12} - \frac{j_{0}}{a}\left(1+\frac{R}{a}\right)e^{-R/a}. \tag{19.228}$$

We are now in a position to evaluate H₁₂ − H₁₁S₁₂ in (19.215). According to Atkins and Friedman [3], we define the following notations:

$$j' \equiv j_{0}\left\langle A\left|\frac{1}{r_{B}}\right|A\right\rangle \quad \text{and} \quad k' \equiv j_{0}\left\langle A\left|\frac{1}{r_{A}}\right|B\right\rangle.$$

Then, we have

$$H_{12} - H_{11}S_{12} = -k' + j'S_{12}. \tag{19.229}$$

The calculation of this quantity is straightforward. The result is

$$H_{12}-H_{11}S_{12}
= j_{0}\left(\frac{1}{R}-\frac{2R}{3a^{2}}\right)e^{-R/a}
- \frac{j_{0}}{R}\left(1+\frac{R}{a}\right)\left[1+\frac{R}{a}+\frac{1}{3}\left(\frac{R}{a}\right)^{2}\right]e^{-3R/a}. \tag{19.230}$$

In (19.230) we notice that, whereas the second term is always negative, the first term may be negative or positive depending upon R. Accordingly, we cannot tell a priori whether H₁₂ − H₁₁S₁₂ is negative. If we had R ≪ a, (19.230) would become positive. Let us then make a quantitative estimation. The Bohr radius a is about 52.9 pm (using the electron rest mass). As an experimental result, R is approximately 106 pm [3]. Hence, for the H₂⁺ ion we have

$$R/a \approx 2.0.$$

Using this number, we get

$$H_{12} - H_{11}S_{12} \approx -0.13\,j_{0}/a < 0.$$

We estimate H₁₂ − H₁₁S₁₂ to be about −3.5 eV. Therefore, from (19.215) we get

$$\lambda_{L} = \frac{H_{11}+H_{12}}{1+S_{12}} \quad \text{and} \quad \lambda_{H} = \frac{H_{11}-H_{12}}{1-S_{12}}, \tag{19.231}$$

where λ_L and λ_H indicate the lower and higher energy eigenvalues, respectively. Namely, Case II above is the more likely. Correspondingly, for the MOs we have

$$\Psi_{L} = \frac{|A\rangle + |B\rangle}{\sqrt{2(1+S_{12})}} \quad \text{and} \quad \Psi_{H} = \frac{|A\rangle - |B\rangle}{\sqrt{2(1-S_{12})}}, \tag{19.232}$$

where Ψ_L and Ψ_H belong to λ_L and λ_H, respectively. The results are virtually the same as those given in (19.101)–(19.105) of Sect. 19.1. In (19.101) and Fig. 19.6, however, we merely assumed that λ₁ = (α + β)/(1 + S) is lower than λ₂ = (α − β)/(1 − S). Here, we have confirmed that this is truly the case. A chemical bond is formed in such a way that an electron is distributed as much as possible along the molecular axis (see Fig. 19.4 for a schematic) and that in this configuration a minimized orbital energy is achieved. In our present case, H₁₁ in (19.203) and H₂₂ in (19.204) should differ. This is also the case with G₁₁ in (19.211) and G₂₂ in (19.212).

Now we return to (19.199). Suppose that we get an MO by solving, e.g., the first secular equation of (19.199) such that

$$\Psi = a_{1}H^{(A_1)} + b_{1}\mathrm{C}2s.$$

From the secular equation, we get

$$a_{1} = -\frac{H_{12}-\lambda S_{12}}{H_{11}-\lambda}\,b_{1}, \tag{19.233}$$

where S₁₂ and H₁₂ were defined as in (19.201) and (19.205), respectively. A normalized MO $\tilde{\Psi}$ is described by

$$\tilde{\Psi} = \frac{a_{1}H^{(A_1)} + b_{1}\mathrm{C}2s}{\sqrt{a_{1}^{\,2}+b_{1}^{\,2}+2a_{1}b_{1}S_{12}}}. \tag{19.234}$$

Thus, according to the two different energy eigenvalues λ, we get two linearly independent MOs. The other three secular equations are dealt with similarly. From the energetic consideration of H₂⁺ we infer that a₁b₁ > 0 for a bonding MO and a₁b₁ < 0 for an anti-bonding MO. Meanwhile, since both H^(A₁) and C2s belong to the irreducible representation A₁, so does $\tilde{\Psi}$, on the basis of the discussion of the projection operators in Sect. 18.7. Thus, we can construct proper MOs that belong to A₁. Similarly, we get proper MOs belonging to T₂ by solving the other three secular equations of (19.199). In this case, the three bonding MOs are triply degenerate, and so are the three anti-bonding MOs. All these six MOs belong to the irreducible representation T₂. Thus, we obtain a complete set of MOs for methane. These eight MOs span the representation space V⁸.

Fig. 19.16 Probable energy diagram and MO symmetry species of methane (the diagram correlates the orbitals of C and 4H with those of CH₄). (Adapted from http://www.science.oregonstate.edu/~gablek/CH334/Chapter1/methane_MOs.htm with kind permission of Professor Kevin P. Gable)

To determine the energy levels precisely, we need more elaborate approaches to approximating and calculating the various parameters that appear in (19.199). At the same time, we need to perform detailed experiments, including spectroscopic measurements, and to interpret those results carefully [4]. Taking account of these situations, Fig. 19.16 [5] displays, as an example of MO calculations, a probable energy diagram and the MO symmetry species of methane. The diagram comprises a bonding a₁ ground state and its corresponding anti-bonding state a₁*, along with the triply degenerate bonding t₂ states and their corresponding anti-bonding states t₂*.

We emphasize that the said elaborate approaches ensue truly from the "paper-and-pencil" methods based upon group theory. Group theory thus supplies us with a powerful tool and a clear guideline for addressing various quantum-chemical problems, a few of which we introduce as examples in this book.

Finally, let us examine the optical transition of methane. In this case, we have to consider the electronic configurations of the initial and final states. If we are dealing with optical absorption, the initial state is the ground state A₁, which is described by the totally symmetric representation. The final state, however, will be an excited state, which is described by a direct-product representation related to the two states associated with the optical transition. The matrix element is expressed as

$$\left\langle \Theta^{(\alpha\times\beta)}\middle|\,\boldsymbol{\varepsilon}_{e}\cdot\mathbf{P}\,\middle|\Theta^{(A_1)}\right\rangle, \tag{19.235}$$

where Θ^(A₁) stands for the electronic configuration of the totally symmetric ground state; Θ^(α×β) denotes the electronic configuration of an excited state represented by a direct-product representation pertinent to the irreducible representations α and β; P is the electric dipole operator and ε_e is a unit polarization vector of the electric field. From a


character table for T_d, we find that ε_e·P belongs to the irreducible representation T₂ (see Table 19.12). The ground-state electronic configuration is A₁ (totally symmetric). It is denoted by $a_1^2\,t_2^2\,t_2'^2\,t_2''^2$, where the three T₂ states are distinguished by a prime and a double prime. For possible configurations of the excited states, we have

$$a_1^2\,t_2\,t_2'^2\,t_2''^2\,t_2^{*}\quad (A_1 \to T_2\times T_2),$$
$$a_1^2\,t_2\,t_2'^2\,t_2''^2\,a_1^{*}\quad (A_1 \to T_2\times A_1 = T_2),$$
$$a_1\,t_2^2\,t_2'^2\,t_2''^2\,t_2^{*}\quad (A_1 \to A_1\times T_2 = T_2),$$
$$a_1\,t_2^2\,t_2'^2\,t_2''^2\,a_1^{*}\quad (A_1 \to A_1\times A_1 = A_1). \tag{19.236}$$

In (19.236), we have the optical excitations t₂ → t₂*, t₂ → a₁*, a₁ → t₂*, and a₁ → a₁*, respectively. Consequently, the excited states are denoted by T₂ × T₂, T₂, T₂, and A₁, respectively. Since ε_e·P is Hermitian, as seen in (19.51) of Sect. 19.2, we have

$$\left\langle \Theta^{(\alpha\times\beta)}\middle|\,\boldsymbol{\varepsilon}_{e}\cdot\mathbf{P}\,\middle|\Theta^{(A_1)}\right\rangle = \left\langle \Theta^{(A_1)}\middle|\,\boldsymbol{\varepsilon}_{e}\cdot\mathbf{P}\,\middle|\Theta^{(\alpha\times\beta)}\right\rangle^{*}. \tag{19.237}$$

Therefore, according to the general theory of Sect. 19.2, we need to examine whether T₂ × D^(α) × D^(β) contains A₁ to judge whether the optical transition is allowed. As mentioned above, D^(α) × D^(β) is chosen from among T₂ × T₂, T₂, T₂, and A₁. The results are given as follows:

$$T_2 \times T_2 \times T_2 = 3T_1 + 4T_2 + 2E + A_2 + A_1, \tag{19.238}$$
$$T_2 \times T_2 = T_1 + T_2 + E + A_1, \tag{19.239}$$
$$T_2 \times A_1 = T_2. \tag{19.240}$$

Admittedly, (19.238) and (19.239) contain A₁, and so the transition is allowed. As for (19.240), however, the transition is forbidden because it does not contain A₁. In light of the character table of T_d (Table 19.12), we find that the allowed transitions (i.e., t₂ → t₂*, t₂ → a₁*, and a₁ → t₂*) take place equally in the directions polarized along the x-, y-, and z-axes. This is often the case with molecules having higher symmetries, such as methane. The transition a₁ → a₁*, however, is forbidden.

In the above argument, including the energetic consideration, we could not determine the ordering of MO energies or the photon energies associated with the


optical transition. Once again, this requires more accurate calculations and experiments. Yet, the discussion we have developed gives us strong guiding principles in the investigation of molecular science.

References

1. Cotton FA (1990) Chemical applications of group theory, 3rd edn. Wiley, New York
2. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
3. Atkins P, Friedman R (2005) Molecular quantum mechanics, 4th edn. Oxford University Press, Oxford
4. Anslyn EV, Dougherty DA (2006) Modern physical organic chemistry. University Science Books, Sausalito
5. Gable KP (2014) Molecular orbitals: methane. http://www.science.oregonstate.edu/~gablek/CH334/Chapter1/methane_MOs.htm

Chapter 20

Theory of Continuous Groups

In Chap. 16, we classified groups into finite groups and infinite groups. In the preceding chapters we focused mostly on finite groups and their representations. Of the infinite groups, the continuous groups and their properties have been widely investigated. In this chapter we use "continuous groups" as a collective term that includes the Lie groups and the topological groups. The continuous groups may also be viewed as transformation groups. Aside from the strict definition, we will see a continuous group as a natural extension of the rotation group SO(3), which we studied briefly in Chap. 17. Here we reconstruct SO(3) on the basis of the notion of infinitesimal rotation. We make the most of the exponential functions of matrices, the theory of which we explored in Chap. 15. In this context, we study the basic properties and representations of the special unitary group SU(2), which has close relevance to SO(3). Thus, we focus our attention on the representation theory of SU(2) and SO(3). The results are associated with the (generalized) angular momenta that we dealt with in Chap. 3. In particular, we show that the spherical surface harmonics constitute basis functions of the representation space of SO(3). Finally, we study various important properties of SU(2) and SO(3) within the framework of Lie groups and Lie algebras. The last sections comprise descriptions of abstract ideas, but these are useful for comprehending the constitution of group theory from a higher perspective.

20.1 Introduction: Operators of Rotation and Infinitesimal Rotation

To characterize the continuous transformation appropriately, we have to describe an infinitesimal transformation of rotation. This is because a rotation with a finite angle is attained by an infinite number of infinitesimal rotations.

© Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_20


First let us consider how a function Ψ is transformed by a rotation (see Sect. 19.1). Here we express Ψ by

$$\Psi = (\psi_1\;\psi_2\;\cdots\;\psi_d)\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_d \end{pmatrix}, \tag{20.1}$$

where {ψ₁, ⋯, ψ_d} span the representation space relevant to the rotation; d is the dimension of the representation space; c₁, c₂, ⋯, c_d are coefficients (or coordinates). Henceforth, we make it a rule to denote a vector in a representation space as in (20.1). This is a natural extension of (11.13). We regard ψ_ν (ν = 1, 2, ⋯, d) as a vector component; ψ_ν is normally expressed as a function of a position vector x in the real three-dimensional space (see Chap. 18). Considering for simplicity a one-dimensional representation space (i.e., d = 1) and omitting the index ν, as a transformation of ψ we have

$$R_{\theta}\,\psi(\mathbf{x}) = \psi\!\left(R_{\theta}^{-1}(\mathbf{x})\right), \tag{20.2}$$

where the index θ indicates the rotation angle in a real space whose dimension is two or three. Note that the dimensions of the real space and the representation space may or may not be the same. Equation (20.2) is essentially the same as (19.11). As a three-dimensional matrix representation of a rotation, we have, e.g.,

$$R_{\theta} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.3}$$

The matrix R_θ represents a rotation around the z-axis in ℝ³; see Fig. 11.2 and (11.31). Borrowing the notation of (17.3) and (17.12), we have

$$R_{\theta}(\mathbf{x}) = (\mathbf{e}_1\;\mathbf{e}_2\;\mathbf{e}_3)\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{20.4}$$

Therefore, as an inverse operator we get

$$R_{\theta}^{-1}(\mathbf{x}) = (\mathbf{e}_1\;\mathbf{e}_2\;\mathbf{e}_3)\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{20.5}$$

Let us think of an infinitesimal rotation. Retaining only first-order quantities in an infinitesimal angle θ, (20.5) can be approximated as

$$R_{\theta}^{-1}(\mathbf{x}) \approx (\mathbf{e}_1\;\mathbf{e}_2\;\mathbf{e}_3)\begin{pmatrix} 1 & \theta & 0 \\ -\theta & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= (\mathbf{e}_1\;\mathbf{e}_2\;\mathbf{e}_3)\begin{pmatrix} x+\theta y \\ y-\theta x \\ z \end{pmatrix}
= (x+\theta y)\mathbf{e}_1 + (y-\theta x)\mathbf{e}_2 + z\mathbf{e}_3. \tag{20.6}$$

Putting ψ(x) = ψ(xe₁ + ye₂ + ze₃) ≡ ψ(x, y, z), we have from (20.2)

$$R_{\theta}\,\psi(x, y, z) \approx \psi(x+\theta y,\; y-\theta x,\; z)
= \psi(x,y,z) + \theta y\frac{\partial\psi}{\partial x} - \theta x\frac{\partial\psi}{\partial y}
= \psi(x,y,z) - i\theta\,(-i)\!\left(x\frac{\partial}{\partial y} - y\frac{\partial}{\partial x}\right)\psi(x,y,z).$$

Now, using the dimensionless operator ℳ_z (= L_z/ħ) such that

$$\mathcal{M}_z \equiv -i\left(x\frac{\partial}{\partial y} - y\frac{\partial}{\partial x}\right), \tag{20.7}$$

we get

$$R_{\theta}\,\psi(x,y,z) \approx \psi(x,y,z) - i\theta\,\mathcal{M}_z\,\psi(x,y,z) = (1 - i\theta\mathcal{M}_z)\,\psi(x,y,z). \tag{20.8}$$

Note that the operator ℳ_z has already appeared in Sect. 3.5 [also see (3.19), Sect. 3.2], but in this chapter we denote it by ℳ_z to distinguish it from the previous notation M_z. This is because ℳ_z and M_z are connected through a unitary similarity transformation (vide infra). From (20.8), we express an infinitesimal rotation θ around the z-axis as

$$R_{\theta} = 1 - i\theta\mathcal{M}_z. \tag{20.9}$$

Next, let us think of a rotation of a finite angle θ. We have

$$R_{\theta} = \left(R_{\theta/n}\right)^{n}, \tag{20.10}$$

where the RHS of (20.10) implies that the rotation R_θ of a finite angle θ is attained by n successive infinitesimal rotations R_{θ/n} with a large enough n. Taking the limit n → ∞ [1, 2], we get

$$R_{\theta} = \lim_{n\to\infty}\left(R_{\theta/n}\right)^{n} = \lim_{n\to\infty}\left(1 - i\frac{\theta}{n}\mathcal{M}_z\right)^{n} = \exp(-i\theta\mathcal{M}_z). \tag{20.11}$$

Recall the following definition of an exponential function, other than (15.7) [3]:

$$e^{x} \equiv \lim_{n\to\infty}\left(1+\frac{x}{n}\right)^{n}. \tag{20.12}$$

Note that, as in the case of (15.7), (20.12) holds when x represents a matrix. Thus, if the dimension d of the representation space is two or more, (20.9) should be read as

$$R_{\theta} = E - i\theta\mathcal{M}_z, \tag{20.13}$$

where E is a (d, d) identity matrix; ℳ_z is represented by a (d, d) square matrix accordingly. As already shown in Chap. 15, an exponential function of a matrix is well-defined. Since ℳ_z is Hermitian, −iℳ_z is anti-Hermitian (see Chaps. 2 and 3). Thus, using (20.11), R_θ can be expressed as

$$R_{\theta} = \exp(\theta A_z) \tag{20.14}$$

with

$$A_z \equiv -i\mathcal{M}_z. \tag{20.15}$$
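The limit definition (20.12) behind the exponential in (20.11) can be verified numerically for the three-dimensional matrix representation; a sketch (numpy assumed, ℳ_z written as `Mz` per (20.19)):

```python
import numpy as np

theta = 0.7
Mz = np.array([[0, -1j, 0], [1j, 0, 0], [0, 0, 0]])   # matrix of Mz, cf. (20.19)

# exp(-iθ Mz) approximated via the limit (20.12): (E - iθ Mz/n)^n for large n
n = 10**6
approx = np.linalg.matrix_power(np.eye(3) - 1j*theta*Mz/n, n)

Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])                             # rotation matrix (20.3)
assert np.allclose(approx, Rz, atol=1e-5)
```

The residual error of the finite-n product scales like θ²/2n, so the tolerance above is comfortably met.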

Operators of this type play an essential role in continuous groups. This is because in (20.14) the operator A_z represents a Lie algebra element, which in turn generates a Lie group. The Lie groups are categorized as continuous groups along with the topological groups. In this chapter we further explore various characteristics of SO(3) and SU(2), both of which are Lie groups and have been fully investigated from various aspects. The Lie algebras frequently appear in the theory of continuous groups in combination with the Lie groups. A brief outline of the theory of Lie groups and Lie algebras will be given at the end of this chapter. Let us further consider the implications of (20.9) and (20.13). From those equations we have

$$\mathcal{M}_z = (1 - R_{\theta})/i\theta \tag{20.16}$$

or

$$\mathcal{M}_z = (E - R_{\theta})/i\theta. \tag{20.17}$$

As a very trivial case, we might think of a one-dimensional real space. Let the space extend along the z-axis and a unit vector of that space be e₃. Then, any rotation R_θ around the z-axis leaves e₃ unchanged (or invariant). That is, we have

$$R_{\theta}(\mathbf{e}_3) = \mathbf{e}_3.$$

In the three-dimensional case, we have

$$E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad R_{\theta} = \begin{pmatrix} 1 & -\theta & 0 \\ \theta & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.18}$$

Therefore, from (20.17) we get

$$\mathcal{M}_z = \begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{20.19}$$

Next, viewing x, y, and z as vector components (i.e., functions), we think of the following equation:

$$\mathcal{M}_z(\mathrm{X}) = (x\;y\;z)\begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}. \tag{20.20}$$

The notation of (20.20) denotes the vector transformation in the representation space, in accordance with (11.37). In (20.20) we regard (x y z) as basis vectors, as in the case of (e₁ e₂ e₃) of (17.2) and (17.3). In other words, we are thinking of a vector transformation in a three-dimensional representation space spanned by the set of basis vectors (x y z). In this case, ℳ_z represented by (20.19) operates on a vector (i.e., a function) X expressed by

$$\mathrm{X} = (x\;y\;z)\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}, \tag{20.21}$$

Fig. 20.1 The function f(x, y) = ay (a > 0), drawn in green. Only the part with f(x, y) > 0 is shown.

where c₁, c₂, and c₃ are coefficients (or coordinates). Again, this notation is a natural extension of (11.13). We emphasize once more that (x y z) does not represent coordinates but functions. As an example, we depict the function f(x, y) = ay (a > 0) in Fig. 20.1. The function ay (drawn in green in Fig. 20.1) behaves as a vector with respect to a linear transformation, just like the basis vector e₂ of the Cartesian coordinates. The "directionality" as a vector can easily be visualized with the function ay. Since ℳ_z of (20.19) is Hermitian, it can be diagonalized by a unitary similarity transformation (Sect. 14.3). We find a diagonalizing unitary matrix P with respect to ℳ_z such that

$$P = \begin{pmatrix} -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} & 0 \\[2pt] -\dfrac{i}{\sqrt{2}} & -\dfrac{i}{\sqrt{2}} & 0 \\[2pt] 0 & 0 & 1 \end{pmatrix}. \tag{20.22}$$

Using (20.22), we rewrite (20.20) as

$$\mathcal{M}_z(\mathrm{X}) = (x\;y\;z)\,P\left(P^{-1}\mathcal{M}_z P\right)P^{-1}\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}$$
$$= \left(-\frac{1}{\sqrt{2}}x-\frac{i}{\sqrt{2}}y\;\;\;\frac{1}{\sqrt{2}}x-\frac{i}{\sqrt{2}}y\;\;\;z\right)
\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} -\dfrac{1}{\sqrt{2}}c_1+\dfrac{i}{\sqrt{2}}c_2 \\[2pt] \dfrac{1}{\sqrt{2}}c_1+\dfrac{i}{\sqrt{2}}c_2 \\[2pt] c_3 \end{pmatrix}. \tag{20.23}$$

Here we define $\widetilde{M}_z$ such that

$$\widetilde{M}_z \equiv P^{-1}\mathcal{M}_z P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{20.24}$$

Equation (20.24) was obtained by putting l = 1 and shuffling the rows and columns of (3.158) in Chap. 3. That is, choosing the (real) unitary matrix R₃ given by

$$R_3 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \tag{17.116}$$

we have

$$R_3^{-1}\widetilde{M}_z R_3 = R_3^{-1}P^{-1}\mathcal{M}_z P R_3 = (PR_3)^{-1}\mathcal{M}_z(PR_3) = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} = M_z.$$


Hence, M_z, which appeared as a diagonal matrix in (3.158), has been obtained by a unitary similarity transformation of ℳ_z using the unitary operator PR₃.

Comparing (20.20) and (20.23), we obtain useful information. If we choose (x y z) for the basis vectors, we cannot have a representation that diagonalizes ℳ_z. If, however, we choose

$$\left(-\frac{1}{\sqrt{2}}x-\frac{i}{\sqrt{2}}y\;\;\;\frac{1}{\sqrt{2}}x-\frac{i}{\sqrt{2}}y\;\;\;z\right)$$

for the basis vectors, we get a representation that diagonalizes ℳ_z. These basis vectors are sometimes referred to as a spherical basis. If we look carefully at the latter basis vectors (i.e., the spherical basis), we notice that they have the same transformation properties as (Y₁¹ Y₁⁻¹ Y₁⁰). In fact, using (3.216) and converting the spherical coordinates to Cartesian coordinates in (3.216), we get [4]

$$\left(Y_1^1\;\;Y_1^{-1}\;\;Y_1^0\right) = \sqrt{\frac{3}{4\pi}}\,\frac{1}{r}\left(-\frac{1}{\sqrt{2}}(x+iy)\;\;\;\frac{1}{\sqrt{2}}(x-iy)\;\;\;z\right), \tag{20.25}$$

where r is the radial coordinate (i.e., a positive number) that appears in the polar coordinates (r, θ, ϕ). Note that the factor r contained in (Y₁¹ Y₁⁻¹ Y₁⁰) is invariant under rotation. We can generalize these results and conform them to the discussion of related characteristics in representation spaces of higher dimensions. The spherical surface harmonics occupy an important position for this purpose (vide infra). From (3.150) of Sect. 3.5 we have

$$\mathcal{M}_z Y_0^0 = 0,$$

where Y₀⁰(θ, ϕ) = 1/√(4π). The above relation obviously shows that Y₀⁰(θ, ϕ) belongs to the eigenvalue zero of ℳ_z. This is consistent with the earlier comments made on the one-dimensional real space.

20.2 Rotation Groups: SU(2) and SO(3)

Bearing in mind the aforementioned background knowledge, we further develop the theory of the rotation groups SU(2) and SO(3) and explore their various properties in detail. First, we deal with the topics through conventional approaches; later we will describe them systematically. Following the argument developed above, we wish to relate anti-Hermitian operators to the rotation operators of a finite angle θ represented by (17.12). We express the Hermitian operators related to the x- and y-axes as

$$\mathcal{M}_x = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -i \\ 0 & i & 0 \end{pmatrix} \quad \text{and} \quad \mathcal{M}_y = \begin{pmatrix} 0 & 0 & i \\ 0 & 0 & 0 \\ -i & 0 & 0 \end{pmatrix}.$$

Then, the anti-Hermitian operators corresponding to (20.15) are described by

$$A_z = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad
A_x = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \quad
A_y = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \tag{20.26}$$

where A_x ≡ −iℳ_x and A_y ≡ −iℳ_y. These are real skew-symmetric matrices. Using (15.7), we calculate exp(θA_z) such that

$$\exp(\theta A_z) = E + \theta A_z + \frac{1}{2!}(\theta A_z)^2 + \frac{1}{3!}(\theta A_z)^3 + \frac{1}{4!}(\theta A_z)^4 + \frac{1}{5!}(\theta A_z)^5 + \cdots. \tag{20.27}$$

We note that

$$A_z^2 = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix} = A_z^6, \qquad
A_z^3 = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = A_z^7,$$
$$A_z^4 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} = A_z^8, \qquad
A_z^5 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = A_z^9, \ \text{etc.}$$

Thus, we get

$$\exp(\theta A_z) = \left[E + \frac{1}{2!}(\theta A_z)^2 + \frac{1}{4!}(\theta A_z)^4 + \cdots\right] + \left[\theta A_z + \frac{1}{3!}(\theta A_z)^3 + \frac{1}{5!}(\theta A_z)^5 + \cdots\right]$$
$$= \begin{pmatrix} 1-\frac{\theta^2}{2!}+\frac{\theta^4}{4!}-\cdots & 0 & 0 \\ 0 & 1-\frac{\theta^2}{2!}+\frac{\theta^4}{4!}-\cdots & 0 \\ 0 & 0 & 1 \end{pmatrix}
+ \begin{pmatrix} 0 & -\left(\theta-\frac{\theta^3}{3!}+\frac{\theta^5}{5!}-\cdots\right) & 0 \\ \theta-\frac{\theta^3}{3!}+\frac{\theta^5}{5!}-\cdots & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
$$= \begin{pmatrix} \cos\theta & 0 & 0 \\ 0 & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}
+ \begin{pmatrix} 0 & -\sin\theta & 0 \\ \sin\theta & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
= \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.28}$$

Hence, we recover the matrix form of (17.12). Similarly, we get

$$\exp(\varphi A_x) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\varphi & -\sin\varphi \\ 0 & \sin\varphi & \cos\varphi \end{pmatrix}, \tag{20.29}$$

$$\exp(\phi A_y) = \begin{pmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{pmatrix}. \tag{20.30}$$

Of the above operators, using (20.28) and (20.30) we recover (17.101) related to the Euler angles such that

$$R_3 = \exp(\alpha A_z)\,\exp(\beta A_y)\,\exp(\gamma A_z). \tag{20.31}$$

Using exponential functions of matrices, (20.31) gives a transformation matrix containing the Euler angles and supplies us with a direct method of constructing a matrix representation of SO(3).
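The exponential formulas (20.28) and the Euler-angle factorization (20.31) can be checked with a numerical matrix exponential; a sketch (scipy assumed, angle values are arbitrary samples):

```python
import numpy as np
from scipy.linalg import expm

Az = np.array([[0., -1, 0], [1, 0, 0], [0, 0, 0]])
Ay = np.array([[0., 0, 1], [0, 0, 0], [-1, 0, 0]])

def Rz(t):
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t),  np.cos(t), 0],
                     [0, 0, 1]])

theta = 0.4
assert np.allclose(expm(theta*Az), Rz(theta))          # (20.28)

# Euler-angle factorization (20.31): R3 = exp(αAz) exp(βAy) exp(γAz)
alpha, beta, gamma = 0.3, 1.1, -0.7
R = expm(alpha*Az) @ expm(beta*Ay) @ expm(gamma*Az)
assert np.allclose(R @ R.T, np.eye(3)) and np.isclose(np.linalg.det(R), 1.0)  # R ∈ SO(3)
```

The final assertion checks exactly what membership in SO(3) means: orthogonality and unit determinant.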

Although (20.31) is useful for the matrix representation in the real space, its extension to abstract representation spaces is somewhat limited. In this respect, the method developed for SU(2) can be used widely to produce representation matrices for a representation space of an arbitrary dimension. Let us start by constructing those representation matrices using (2, 2) complex matrices.

20.2.1 Construction of SU(2) Matrices

In Sect. 16.1 we mentioned the general linear group GL(n, ℂ), which has many subgroups. Among them SU(n), in particular SU(2), plays a central role in theoretical physics and chemistry. In this section we examine how to construct the SU(2) matrices.

Definition 20.1 A group consisting of (n, n) unitary matrices with determinant 1 is called a special unitary group of degree n and is denoted by SU(n).

In Chap. 14 we mentioned that the absolute value of the determinant of a unitary matrix is 1. In Definition 20.1, however, there is an additional constraint in that we are dealing with unitary matrices whose determinant is exactly 1. The SU(2) matrices U have the following general form:

$$U = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \tag{20.32}$$

where a, b, c, and d are complex numbers. According to Definition 20.1 we seek the conditions on these numbers. The unitarity condition UU† = U†U = E reads as

$$UU^{\dagger} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} a^{*} & c^{*} \\ b^{*} & d^{*} \end{pmatrix}
= \begin{pmatrix} |a|^{2}+|b|^{2} & ac^{*}+bd^{*} \\ a^{*}c+b^{*}d & |c|^{2}+|d|^{2} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$
$$U^{\dagger}U = \begin{pmatrix} a^{*} & c^{*} \\ b^{*} & d^{*} \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix}
= \begin{pmatrix} |a|^{2}+|c|^{2} & a^{*}b+c^{*}d \\ ab^{*}+cd^{*} & |b|^{2}+|d|^{2} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \tag{20.33}$$

Then we have

(i) |a|² + |b|² = 1, (ii) |c|² + |d|² = 1, (iii) |a|² + |c|² = 1, (iv) |b|² + |d|² = 1,
(v) a*c + b*d = 0, (vi) a*b + c*d = 0, (vii) ad − bc = 1.

Of the above conditions, (vii) comes from det U = 1. Condition (iv) follows from conditions (i)–(iii). From (v) and (vii), using Cramer's rule we have

$$c = \frac{\begin{vmatrix} 0 & b^{*} \\ 1 & a \end{vmatrix}}{\begin{vmatrix} a^{*} & b^{*} \\ -b & a \end{vmatrix}} = \frac{-b^{*}}{|a|^{2}+|b|^{2}} = -b^{*}, \qquad
d = \frac{\begin{vmatrix} a^{*} & 0 \\ -b & 1 \end{vmatrix}}{\begin{vmatrix} a^{*} & b^{*} \\ -b & a \end{vmatrix}} = \frac{a^{*}}{|a|^{2}+|b|^{2}} = a^{*}. \tag{20.34}$$

From (20.34) we see that (i)–(iv) are equivalent and that (v) and (vi) are redundant. Thus, as a general form of U we get

$$U = \begin{pmatrix} a & b \\ -b^{*} & a^{*} \end{pmatrix} \quad \text{with} \quad |a|^{2}+|b|^{2} = 1. \tag{20.35}$$

As a result, we have four freedoms with a and b with a restricting condition of |a| 2+|b|2 = 1. That is, we finally get three freedoms with choice of parameters. Putting a = p þ iq and

b = r þ is ðp, q, r, s : realÞ,

we have p2 þ q2 þ r 2 þ s2 = 1:

ð20:36Þ

Therefore, the restricted condition of (20.35) is equivalent to that p, q, r, and s are regarded as coordinates that are positioned on a unit sphere of ℝ4. From (20.35), we have U - 1 = U{ =

a b



-b a

:

Also putting

U_{1} = \begin{pmatrix} a_{1} & b_{1} \\ -b_{1}^{*} & a_{1}^{*} \end{pmatrix} \quad \text{and} \quad U_{2} = \begin{pmatrix} a_{2} & b_{2} \\ -b_{2}^{*} & a_{2}^{*} \end{pmatrix},   (20.37)

we get

U_{1}U_{2} = \begin{pmatrix} a_{1}a_{2} - b_{1}b_{2}^{*} & a_{1}b_{2} + b_{1}a_{2}^{*} \\ -a_{2}b_{1}^{*} - a_{1}^{*}b_{2}^{*} & -b_{1}^{*}b_{2} + a_{1}^{*}a_{2}^{*} \end{pmatrix}.   (20.38)

Thus, both U⁻¹ and U₁U₂ satisfy criterion (20.35) and, hence, the unitary matrices having the matrix form of (20.35) constitute a group; i.e., SU(2).
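The closure property just stated can be checked numerically. The following sketch (not from the text; the helper names `su2` and `random_su2` are ours) draws random SU(2) matrices by sampling points on the unit sphere of ℝ⁴, cf. (20.36), and verifies that a product again has the form (20.35):

```python
import numpy as np

def su2(a, b):
    """Build the SU(2) matrix of (20.35) from complex a, b with |a|^2 + |b|^2 = 1."""
    return np.array([[a, b], [-np.conj(b), np.conj(a)]])

rng = np.random.default_rng(0)

def random_su2():
    # A random point (p, q, r, s) on the unit sphere of R^4, cf. (20.36)
    v = rng.normal(size=4)
    p, q, r, s = v / np.linalg.norm(v)
    return su2(p + 1j * q, r + 1j * s)

U1, U2 = random_su2(), random_su2()
P = U1 @ U2

# Unitarity and det = 1 are preserved under the product ...
assert np.allclose(P @ P.conj().T, np.eye(2))
assert np.isclose(np.linalg.det(P), 1.0)
# ... and the product again has the form (20.35): lower row = (-b*, a*)
assert np.isclose(P[1, 0], -np.conj(P[0, 1]))
assert np.isclose(P[1, 1], np.conj(P[0, 0]))
```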

20.2 Rotation Groups: SU(2) and SO(3)

Next we wish to connect the SU(2) matrices to the matrices that represent the angular momentum. In Sect. 3.4 we developed the theory of generalized angular momentum. In the case of j = 1/2 we can obtain accurate information about the spin states. Using (3.101) as a starting point and defining the states belonging to the spin angular momentum μ = −1/2 and μ = 1/2 as \binom{1}{0} and \binom{0}{1}, respectively, we obtain

J^{(+)} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \quad \text{and} \quad J^{(-)} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.   (20.39)

Note that \binom{1}{0} and \binom{0}{1} are in accordance with (2.64) as a column vector representation. Using (3.72), we have

J_{x} = \frac{1}{2}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad J_{y} = \frac{1}{2}\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}, \quad J_{z} = \frac{1}{2}\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}.   (20.40)

In (20.40), J_z was given so that we may have

J_{z}\binom{1}{0} = -\frac{1}{2}\binom{1}{0} \quad \text{and} \quad J_{z}\binom{0}{1} = \frac{1}{2}\binom{0}{1}.

Deleting the coefficient 1/2 from (20.40), we get the Pauli spin matrices

σ_{x} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad σ_{y} = \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}, \quad σ_{z} = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}.   (20.41)

Multiplying (20.40) by −i, we have anti-Hermitian operators described by

ζ_{x} = \frac{1}{2}\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}, \quad ζ_{y} = \frac{1}{2}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad ζ_{z} = \frac{1}{2}\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}.   (20.42)

Notice that these are traceless anti-Hermitian matrices. Now let us seek rotation matrices in a manner related to that used in deriving (20.31), but at a more general level. From Property (7)′ of Chap. 15 (see Sect. 15.2), we know that exp(tA) with an anti-Hermitian operator A combined with a real number t produces a unitary matrix. Let us apply this to the anti-Hermitian matrices described by (20.42). Using ζ_z of (20.42), we calculate exp(αζ_z) as in (20.27). We show only the result, described by

\exp(αζ_{z}) = E\cos\frac{α}{2} - iσ_{z}\sin\frac{α}{2} = \begin{pmatrix} \cos\frac{α}{2}+i\sin\frac{α}{2} & 0 \\ 0 & \cos\frac{α}{2}-i\sin\frac{α}{2} \end{pmatrix} = \begin{pmatrix} e^{iα/2} & 0 \\ 0 & e^{-iα/2} \end{pmatrix}.   (20.43)

Readers are encouraged to derive this. Using ζ_y and applying a similar calculation procedure, we also get

\exp(βζ_{y}) = \begin{pmatrix} \cos\frac{β}{2} & \sin\frac{β}{2} \\ -\sin\frac{β}{2} & \cos\frac{β}{2} \end{pmatrix}.   (20.44)

These representation matrices describe "complex rotations"; (20.43) and (20.44) correspond to (20.28) and (20.30), respectively. We define D^{(1/2)}_{α,β,γ} as the representation matrix that describes the successive rotations α, β, and γ and corresponds to R̃₃ in (17.101) and (20.31) involving the Euler angles. As a result, we have

D^{(1/2)}_{α,β,γ} = \exp(αζ_{z})\exp(βζ_{y})\exp(γζ_{z}) = \begin{pmatrix} e^{\frac{i}{2}(α+γ)}\cos\frac{β}{2} & e^{\frac{i}{2}(α-γ)}\sin\frac{β}{2} \\ -e^{\frac{i}{2}(γ-α)}\sin\frac{β}{2} & e^{-\frac{i}{2}(α+γ)}\cos\frac{β}{2} \end{pmatrix},   (20.45)

where the superscript 1/2 of D^{(1/2)}_{α,β,γ} corresponds to the non-negative generalized angular momentum j (= 1/2). Equation (20.45) can be obtained by putting a = e^{\frac{i}{2}(α+γ)}\cos\frac{β}{2} and b = e^{\frac{i}{2}(α-γ)}\sin\frac{β}{2} in the matrix U of (20.35). Using this general SU(2) representation matrix, we construct representation matrices of higher orders.
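The product of matrix exponentials above can be checked numerically. A minimal sketch using NumPy and SciPy (helper names `zeta_y`, `zeta_z`, and `d_half` are ours; `scipy.linalg.expm` computes the matrix exponential) confirms that exp(αζ_z)exp(βζ_y)exp(γζ_z) reproduces the closed form (20.45):

```python
import numpy as np
from scipy.linalg import expm

# Anti-Hermitian generators of (20.42), in the book's basis convention
zeta_y = 0.5 * np.array([[0, 1], [-1, 0]], dtype=complex)
zeta_z = 0.5 * np.array([[1j, 0], [0, -1j]])

def d_half(alpha, beta, gamma):
    """Closed form (20.45) for D^(1/2)."""
    a = np.exp(0.5j * (alpha + gamma)) * np.cos(beta / 2)
    b = np.exp(0.5j * (alpha - gamma)) * np.sin(beta / 2)
    return np.array([[a, b], [-np.conj(b), np.conj(a)]])

alpha, beta, gamma = 0.7, 1.1, -0.4
via_exp = expm(alpha * zeta_z) @ expm(beta * zeta_y) @ expm(gamma * zeta_z)

assert np.allclose(via_exp, d_half(alpha, beta, gamma))          # (20.45)
assert np.allclose(via_exp @ via_exp.conj().T, np.eye(2))        # unitary
```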

20.2.2 SU(2) Representation Matrices: Wigner Formula

Suppose that we have two functions (or vectors) v and u. We regard v and u as linearly independent vectors that undergo a unitary transformation by the unitary matrix U of (20.35). That is, we have

(v' \;\; u') = (v \;\; u)\,U = (v \;\; u)\begin{pmatrix} a & b \\ -b^{*} & a^{*} \end{pmatrix},   (20.46)

where (v u) is transformed into (v′ u′). Namely, we have

v' = av - b^{*}u, \qquad u' = bv + a^{*}u.

Let us further define, following Hamermesh [5], an orthonormal set of 2j + 1 functions such that

f_{m} = \frac{u^{j+m}v^{j-m}}{\sqrt{(j+m)!(j-m)!}} \quad (m = -j, -j+1, \cdots, j-1, j),   (20.47)

where j is an integer or a half-odd-integer. Equation (20.47) implies that the 2j + 1 monomials f_m of degree 2j with respect to v and u constitute an orthonormal basis. A related discussion can be seen in Sect. 20.3.2. The description that follows includes contents related to generalized angular momenta (see Sect. 3.4). We examine how these functions f_m (or vectors) are transformed by (20.45). We denote this transformation by ℜ(a, b). That is, we have

ℜ(a,b)(f_{m}) \equiv \sum_{m'} f_{m'}\,D^{(j)}_{m'm}(a,b),   (20.48)

where D^{(j)}_{m'm}(a,b) denotes the matrix elements with respect to the transformation. Replacing u and v with u′ and v′, respectively, and using the binomial theorem we get

ℜ(a,b)f_{m} = \frac{1}{\sqrt{(j+m)!(j-m)!}}\,(bv+a^{*}u)^{j+m}(av-b^{*}u)^{j-m}
= \frac{1}{\sqrt{(j+m)!(j-m)!}}\sum_{μ,ν}\frac{(j+m)!(j-m)!}{(j+m-μ)!μ!(j-m-ν)!ν!}\,(a^{*}u)^{j+m-μ}(bv)^{μ}(-b^{*}u)^{j-m-ν}(av)^{ν}
= \sum_{μ,ν}\frac{\sqrt{(j+m)!(j-m)!}}{(j+m-μ)!μ!(j-m-ν)!ν!}\,(a^{*})^{j+m-μ}b^{μ}(-b^{*})^{j-m-ν}a^{ν}\,u^{2j-μ-ν}v^{μ+ν}.   (20.49)

Here, to eliminate ν, putting

2j - μ - ν = j + m', \qquad μ + ν = j - m',

we get

ℜ(a,b)f_{m} = \sum_{μ,m'}\frac{\sqrt{(j+m)!(j-m)!}}{(j+m-μ)!μ!(m'-m+μ)!(j-m'-μ)!}\,(a^{*})^{j+m-μ}b^{μ}(-b^{*})^{m'-m+μ}a^{j-m'-μ}\,u^{j+m'}v^{j-m'}.   (20.50)

Expressing f_{m'} as

f_{m'} = \frac{u^{j+m'}v^{j-m'}}{\sqrt{(j+m')!(j-m')!}} \quad (m' = -j, -j+1, \cdots, j-1, j),   (20.51)

we obtain

ℜ(a,b)(f_{m}) = \sum_{μ,m'} f_{m'}\,\frac{\sqrt{(j+m)!(j-m)!(j+m')!(j-m')!}}{(j+m-μ)!μ!(m'-m+μ)!(j-m'-μ)!}\,(a^{*})^{j+m-μ}b^{μ}(-b^{*})^{m'-m+μ}a^{j-m'-μ}.   (20.52)

Equating (20.52) with (20.48), as the (m′, m) component of D^{(j)}(a, b) we get

D^{(j)}_{m'm}(a,b) = \sum_{μ}\frac{\sqrt{(j+m)!(j-m)!(j+m')!(j-m')!}}{(j+m-μ)!μ!(m'-m+μ)!(j-m'-μ)!}\,(a^{*})^{j+m-μ}b^{μ}(-b^{*})^{m'-m+μ}a^{j-m'-μ}.   (20.53)

In (20.53), factorials of negative integers must be avoided; see (3.216). Finally, replacing a (or a*) and b (or b*) with e^{\frac{i}{2}(α+γ)}\cos\frac{β}{2} [or e^{-\frac{i}{2}(α+γ)}\cos\frac{β}{2}] and e^{\frac{i}{2}(α-γ)}\sin\frac{β}{2} [or e^{\frac{i}{2}(γ-α)}\sin\frac{β}{2}] by use of (20.45), respectively, we get

D^{(j)}_{m'm}(α,β,γ) = \sum_{μ}\frac{(-1)^{m'-m+μ}\sqrt{(j+m)!(j-m)!(j+m')!(j-m')!}}{(j+m-μ)!μ!(m'-m+μ)!(j-m'-μ)!}\, e^{-i(αm'+γm)}\left(\cos\frac{β}{2}\right)^{2j+m-m'-2μ}\left(\sin\frac{β}{2}\right)^{m'-m+2μ}.   (20.54)

In (20.54), the notation D^{(j)}(a, b) has been replaced with D^{(j)}(α, β, γ), where α, β, and γ represent the Euler angles that appeared in (17.101). Equation (20.54) is called the Wigner formula [5–7].

To confirm that D^{(j)}_{m'm} is indeed unitary, we carry out the following calculations. From (20.47) we have

\sum_{m=-j}^{j} |f_{m}|^{2} = \sum_{m=-j}^{j} \frac{|u^{j+m}v^{j-m}|^{2}}{(j+m)!(j-m)!} = \sum_{m=-j}^{j} \frac{|u|^{2(j+m)}|v|^{2(j-m)}}{(j+m)!(j-m)!}.

Meanwhile, we have

\left(|u|^{2}+|v|^{2}\right)^{2j} = \sum_{k=0}^{2j} \frac{(2j)!\,|u|^{2(2j-k)}|v|^{2k}}{(2j-k)!k!} = \sum_{m=-j}^{j} \frac{(2j)!\,|u|^{2(j+m)}|v|^{2(j-m)}}{(j+m)!(j-m)!},

where with the last equality k was replaced with j − m. Comparing the above two equations, we get

\sum_{m=-j}^{j} |f_{m}|^{2} = \frac{1}{(2j)!}\left(|u|^{2}+|v|^{2}\right)^{2j}.   (20.55)

Viewing (20.46) as a transformation of variables and taking the adjoint of (20.46), we have

\begin{pmatrix} v'^{*} \\ u'^{*} \end{pmatrix} = U^{\dagger}\begin{pmatrix} v^{*} \\ u^{*} \end{pmatrix},   (20.56)

where we defined U = \begin{pmatrix} a & b \\ -b^{*} & a^{*} \end{pmatrix} (i.e., a unitary matrix). Operating with (20.56) on both sides of (20.46) from the right, we get

(v' \;\; u')\begin{pmatrix} v'^{*} \\ u'^{*} \end{pmatrix} = (v \;\; u)\,UU^{\dagger}\begin{pmatrix} v^{*} \\ u^{*} \end{pmatrix} = (v \;\; u)\begin{pmatrix} v^{*} \\ u^{*} \end{pmatrix}.

That is, we have a quadratic form described by

|u'|^{2} + |v'|^{2} = |u|^{2} + |v|^{2}.   (20.57)

From (20.55) and (20.57), we conclude that \sum_{m=-j}^{j}|f_{m}|^{2} (i.e., the length of the vector) is invariant under the transformation D^{(j)}_{m'm}. This implies that the transformation is unitary and, hence, that D^{(j)}_{m'm} is a unitary matrix [5].
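The Wigner formula (20.54) and the unitarity just proved can be probed numerically. The sketch below (the helper name `wigner_D` is ours) implements (20.54) directly, skipping summands whose factorial arguments would be negative as noted after (20.53), and checks unitarity for several j as well as agreement with the closed form (20.45) for j = 1/2:

```python
import numpy as np
from math import factorial, sqrt, cos, sin

def wigner_D(j, alpha, beta, gamma):
    """D^(j)_{m'm}(alpha, beta, gamma) by the Wigner formula (20.54).
    Rows and columns are ordered m', m = -j, ..., j."""
    r = lambda x: int(round(x))          # exact (half-)integers -> int
    dim = r(2 * j) + 1
    ms = [-j + k for k in range(dim)]
    D = np.zeros((dim, dim), dtype=complex)
    for i, mp in enumerate(ms):
        for k, m in enumerate(ms):
            s = 0.0
            for mu in range(dim):
                # skip factorials of negative integers, cf. (20.53)
                if r(j + m - mu) < 0 or r(mp - m + mu) < 0 or r(j - mp - mu) < 0:
                    continue
                num = sqrt(factorial(r(j + m)) * factorial(r(j - m))
                           * factorial(r(j + mp)) * factorial(r(j - mp)))
                den = (factorial(r(j + m - mu)) * factorial(mu)
                       * factorial(r(mp - m + mu)) * factorial(r(j - mp - mu)))
                s += ((-1) ** r(mp - m + mu)) * num / den \
                     * cos(beta / 2) ** r(2 * j + m - mp - 2 * mu) \
                     * sin(beta / 2) ** r(mp - m + 2 * mu)
            D[i, k] = np.exp(-1j * (alpha * mp + gamma * m)) * s
    return D

alpha, beta, gamma = 0.3, 0.9, -1.2
for j in (0.5, 1, 1.5, 2):
    D = wigner_D(j, alpha, beta, gamma)
    assert np.allclose(D @ D.conj().T, np.eye(D.shape[0]))   # unitarity

# j = 1/2 reproduces the closed form (20.45)
a = np.exp(0.5j * (alpha + gamma)) * np.cos(beta / 2)
b = np.exp(0.5j * (alpha - gamma)) * np.sin(beta / 2)
assert np.allclose(wigner_D(0.5, alpha, beta, gamma),
                   np.array([[a, b], [-np.conj(b), np.conj(a)]]))
```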

20.2.3 SO(3) Representation Matrices and Spherical Surface Harmonics

Once the representation matrices D^{(j)}(α, β, γ) have been obtained, we are able to get further useful information. We immediately notice that (20.54) is related to (3.216), which is explicitly described by trigonometric functions. Putting m = 0 in (20.54) we have

D^{(j)}_{m'0}(α,β,γ) = D^{(j)}_{m'0}(α,β) = \sum_{μ}\frac{(-1)^{m'+μ}\,j!\sqrt{(j+m')!(j-m')!}}{(j-μ)!μ!(m'+μ)!(j-m'-μ)!}\, e^{-iαm'}\left(\cos\frac{β}{2}\right)^{2j-m'-2μ}\left(\sin\frac{β}{2}\right)^{m'+2μ}.   (20.58)

Note that (20.58) does not depend on the variable γ, because m = 0 leads to e^{imγ} = e^{i·0·γ} = e^{0} = 1 in (20.54). Notice also that the case m = 0 never occurs in (20.54) when j is a half-odd-integer, but does occur when j is an integer l. In fact, D^{(l)}(α, β, γ) is a (2l + 1)-degree representation of SO(3). Later in this section, we will give a full matrix form in the case of l = 1. Replacing m′ with m in (20.58), we get

D^{(j)}_{m0}(α,β,γ) = \sum_{μ}\frac{(-1)^{m+μ}\,j!\sqrt{(j+m)!(j-m)!}}{(j-μ)!μ!(m+μ)!(j-m-μ)!}\, e^{-iαm}\left(\cos\frac{β}{2}\right)^{2j-m-2μ}\left(\sin\frac{β}{2}\right)^{m+2μ}.   (20.59)

For (20.59) to be consistent with (3.216), by changing the notation of the variables and replacing j with an integer l we further rewrite (20.59) to get

D^{(l)}_{m0}(ϕ,θ) = \sum_{r}\frac{(-1)^{m+r}\,l!\sqrt{(l+m)!(l-m)!}}{r!(l-m-r)!(l-r)!(m+r)!}\, e^{-imϕ}\left(\cos\frac{θ}{2}\right)^{2l-2r-m}\left(\sin\frac{θ}{2}\right)^{2r+m}.   (20.60)

Notice that in (20.60) we deleted the variable γ as it was redundant. Comparing (20.60) with (3.216), we get [5]

\left[D^{(l)}_{m0}(ϕ,θ)\right]^{*} = \sqrt{\frac{4π}{2l+1}}\,Y_{l}^{m}(θ,ϕ).   (20.61)
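Relation (20.61) can be probed numerically at m = 0: assuming the standard normalization Y_l^0(θ,ϕ) = √((2l+1)/4π) P_l(cos θ), (20.60) and (20.61) imply D^{(l)}_{00}(0,θ) = P_l(cos θ). The following sketch (the function name `D_m0` is ours) checks this against SciPy's Legendre polynomials:

```python
import numpy as np
from math import factorial, sqrt, cos, sin
from scipy.special import eval_legendre

def D_m0(l, m, phi, theta):
    """D^(l)_{m0}(phi, theta) evaluated from the sum (20.60)."""
    s = 0.0
    for r in range(l + 1):
        if l - m - r < 0 or m + r < 0:
            continue  # skip factorials of negative integers
        s += ((-1) ** (m + r)) * factorial(l) \
             * sqrt(factorial(l + m) * factorial(l - m)) \
             / (factorial(r) * factorial(l - m - r)
                * factorial(l - r) * factorial(m + r)) \
             * cos(theta / 2) ** (2 * l - 2 * r - m) \
             * sin(theta / 2) ** (2 * r + m)
    return np.exp(-1j * m * phi) * s

theta = 0.8
# D^(l)_{00}(0, theta) = P_l(cos theta), assuming the standard Y_l^0
for l in range(5):
    assert np.isclose(D_m0(l, 0, 0.0, theta).real, eval_legendre(l, cos(theta)))
```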

Fig. 20.2 Coordinate systems O, I, and II and their transformation. Regarding the symbols and notations, see text.

From a general point of view, let us return to a discussion of SU(2) and introduce the following important theorem in relation to the basis functions of SU(2).

Theorem 20.1 [1] Let {ψ₋ⱼ, ψ₋ⱼ₊₁, ⋯, ψⱼ} be a set of (2j + 1) functions. Let D^{(j)} be a representation of the special unitary group SU(2). Suppose that we have the following expression for ψ_{m,k} (m = −j, ⋯, j) such that

ψ_{m,k}(α,β,γ) = \left[D^{(j)}_{mk}(α,β,γ)\right]^{*},   (20.62)

where α, β, and γ are the Euler angles that appeared in (17.101). Then, the above functions are basis functions of the representation D^{(j)}.

Proof In Fig. 20.2 we show coordinate systems O, I, and II. Let P, Q, R be operators of transformation (i.e., rotation) between the coordinate systems as shown. The transformation Q is defined as the coordinate transformation that transforms I into II. Note that their inverse transformations, e.g., Q⁻¹, exist. In Fig. 20.2, P and R are specified as P(α₀, β₀, γ₀) and R(α, β, γ), respectively. Then, we have [1, 2]

Q^{-1}R = P.   (20.63)

According to the expression (19.12), we have

Qψ_{m,k}(α,β,γ) = ψ_{m,k}\left(Q^{-1}(α,β,γ)\right) = ψ_{m,k}(α_{0},β_{0},γ_{0}) = \left[D^{(j)}_{mk}\left(Q^{-1}R\right)\right]^{*},

where the last equality comes from (20.62) and (20.63); (α, β, γ) and (α₀, β₀, γ₀) stand for the transformations R(α, β, γ) and P(α₀, β₀, γ₀), respectively. The matrix element D^{(j)}_{mk}(Q⁻¹R) can be written as

D^{(j)}_{mk}\left(Q^{-1}R\right) = \sum_{n} D^{(j)}_{mn}\left(Q^{-1}\right)D^{(j)}_{nk}(R) = \sum_{n}\left[D^{(j)}_{nm}(Q)\right]^{*}D^{(j)}_{nk}(R),

where with the last equality we used the unitarity of the representation matrix D^{(j)}. Taking the complex conjugate of the above expression, we get

Qψ_{m,k}(α,β,γ) = ψ_{m,k}(α_{0},β_{0},γ_{0}) = \sum_{n} D^{(j)}_{nm}(Q)\left[D^{(j)}_{nk}(α,β,γ)\right]^{*} = \sum_{n} ψ_{n,k}(α,β,γ)\,D^{(j)}_{nm}(Q),   (20.64)

where with the last equality we used the supposition (20.62). Equation (20.64) implies that ψ_{m,k} (m = −j, ⋯, 0, ⋯, j) are basis functions of the representation D^{(j)}. This completes the proof.

If we restrict Theorem 20.1 to SO(3), we get an important result. Replacing j with an integer l and putting k = 0 in (20.62) and (20.64), we get

Qψ_{m,0}(α,β,γ) = ψ_{m,0}(α_{0},β_{0},γ_{0}) = \sum_{n} ψ_{n,0}(α,β,γ)\,D^{(l)}_{nm}(Q), \qquad ψ_{m,0}(α,β,γ) = \left[D^{(l)}_{m0}(α,β,γ)\right]^{*}.   (20.65)

This expression shows that the complex conjugates of the individual components of the "center" column vector of the representation matrix give the basis functions of the representation D^{(l)}. Removing γ as it is redundant and with the aid of (20.61), we get

ψ_{m,0}(α,β) = \sqrt{\frac{4π}{2l+1}}\,Y_{l}^{m}(β,α).

Thus, in combination with (20.61), Theorem 20.1 immediately indicates that the spherical surface harmonics Y_l^m(θ,ϕ) (m = −l, ⋯, 0, ⋯, l) span the representation space with respect to D^{(l)}. That is, we have

QY_{l}^{m}(β,α) = \sum_{n} Y_{l}^{n}(β,α)\,D^{(l)}_{nm}(Q).   (20.66)

Now, let us get back to (20.63). From (20.63), we have

Q = RP^{-1}.   (20.67)

Since P represents an arbitrary transformation, we may choose any (orthogonal) coordinate system I in ℝ3. Meanwhile, a transformation which transforms I into II is necessarily present and given by Q that is expressed as (20.67); see (11.72) and the relevant argument of Sect. 11.4. Consequently, Q represents an arbitrary transformation in ℝ3 in turn.

Now, in (20.67) putting P = E we get Q ≡ R(α, β, γ). Then, replacing Q with R in (20.66) as a special case we have

R(α,β,γ)\,Y_{l}^{m}(β,α) = \sum_{n} Y_{l}^{n}(β,α)\,D^{(l)}_{nm}(α,β,γ).   (20.68)

The LHS of (20.68) is associated with three successive transformations. To describe it appropriately, let us recall (19.12) again. We have

Rf(\mathbf{r}) = f\left(R^{-1}(\mathbf{r})\right).   (19.12)

Now, we are thinking of the case where R in (19.12) is described by R(α, β, γ), or R̃₃ = R_{zα}R_{y'β}R_{z''γ} in (17.101). That is, we have

R(α,β,γ)f(\mathbf{r}) = f\left(R(α,β,γ)^{-1}(\mathbf{r})\right).

Considering the constitution of R̃₃ of (17.101), we readily get [5]

R(α,β,γ)^{-1} = R(-γ,-β,-α).

Thus, we have

R(α,β,γ)f(\mathbf{r}) = f\left[R(-γ,-β,-α)(\mathbf{r})\right].   (20.69)

Applying (20.69) to (20.68), we get the successive (inverse) transformations of the functions such that

Y_{l}^{m}(β,α) \to Y_{l}^{m}(β,\,α-γ) \to Y_{l}^{m}(β-β,\,α-γ) \to Y_{l}^{m}(β-β,\,α-γ-α).

The last entry in the above is [5]

Y_{l}^{m}(β-β,\,α-γ-α) = Y_{l}^{m}(0,-γ).

Then, from (20.68) we get

Y_{l}^{m}(0,-γ) = \sum_{n} Y_{l}^{n}(β,α)\,D^{(l)}_{nm}(α,β,γ).   (20.70)

Meanwhile, from (20.61) we have

Y_{l}^{m}(0,-γ) = \sqrt{\frac{2l+1}{4π}}\left[D^{(l)}_{m0}(-γ,0)\right]^{*}.

Returning to (20.60): wherever a factor sin(θ/2) appears in (20.60) or in (3.216), the relevant terms vanish in D^{(l)}_{m0}(−γ, 0) except for the special cases where 2r + m = 0 in (3.216) or (20.60). But, to avoid having a negative integer inside a factorial in the denominator of (20.60), we must have r + m ≥ 0. Since r ≥ 0, we must have 2r + m = (r + m) + r ≥ 0 as well. In other words, only if r = m = 0 in (20.60) do we have 2r + m = 0, for which D^{(l)}_{m0}(−γ, 0) does not vanish. Using these conditions, (20.60) is greatly simplified to

D^{(l)}_{m0}(-γ,0) = δ_{m0}.

From (20.61) we get [5]

Y_{l}^{m}(0,-γ) = \sqrt{\frac{2l+1}{4π}}\,δ_{0m}.   (20.71)

Notice that this expression is consistent with (3.147) and (3.148). Meanwhile, replacing ϕ and θ in (20.61) with α and β, respectively, multiplying the resulting equation by D^{(l)}_{mk}(α,β,γ), and further summing over m, we get

\sum_{m}\sqrt{\frac{4π}{2l+1}}\,Y_{l}^{m}(β,α)\,D^{(l)}_{mk}(α,β,γ) = \sum_{m}\left[D^{(l)}_{m0}(α,β,γ)\right]^{*}D^{(l)}_{mk}(α,β,γ) = δ_{0k} = \sqrt{\frac{4π}{2l+1}}\,Y_{l}^{k}(0,-γ),   (20.72)

where with the second equality we used the unitarity of the matrices D^{(l)}_{nm}(α,β,γ) (see Sect. 20.2.2) and with the last equality we used (20.70) multiplied by the constant \sqrt{4π/(2l+1)}. At the same time, we recovered (20.71) by comparing the last two sides of (20.72).

Rewriting (20.48) in a (2j + 1, 2j + 1) matrix form, we get

(f_{-j}\; f_{-j+1}\; \cdots\; f_{j-1}\; f_{j})\,ℜ(a,b) = (f_{-j}\; f_{-j+1}\; \cdots\; f_{j-1}\; f_{j})\begin{pmatrix} D_{-j,-j} & \cdots & D_{-j,j} \\ \vdots & \ddots & \vdots \\ D_{j,-j} & \cdots & D_{j,j} \end{pmatrix},   (20.73)

where f_m (m = −j, −j+1, ⋯, j−1, j) is described as in (20.47). Though abstract, the matrix on the RHS of (20.73) gives the (2j + 1)-dimensional representation D^{(j)} of (20.54). If j is an integer l, we can choose Y_l^m(β,α) for the individual f_m (i.e., basis functions).

To get used to the abstract representation theory of continuous groups, we consider the following trivial case: putting j = 0 in (20.54), (20.54) becomes trivial but still valid. In that case, we have m′ = m = 0 so that the arguments inside the factorials can be non-negative. In turn, we must have μ = 0 as well. Thus, we have

D^{(0)}_{00}(α,β,γ) = 1.

Or, inserting l = m = 0 into (20.61), we have

\left[D^{(0)}_{00}(ϕ,θ)\right]^{*} = 1 = \sqrt{4π}\,Y_{0}^{0}(θ,ϕ), \quad \text{i.e.,} \quad Y_{0}^{0}(θ,ϕ) = \sqrt{1/4π}.

Thus, we recover (3.150). Next, let us reproduce the three-dimensional representation D^{(1)} using (20.54). A full matrix form is expressed as

D^{(1)} = \begin{pmatrix} D_{-1,-1} & D_{-1,0} & D_{-1,1} \\ D_{0,-1} & D_{0,0} & D_{0,1} \\ D_{1,-1} & D_{1,0} & D_{1,1} \end{pmatrix} = \begin{pmatrix} e^{i(α+γ)}\cos^{2}\frac{β}{2} & \frac{1}{\sqrt{2}}e^{iα}\sin β & e^{i(α-γ)}\sin^{2}\frac{β}{2} \\ -\frac{1}{\sqrt{2}}e^{iγ}\sin β & \cos β & \frac{1}{\sqrt{2}}e^{-iγ}\sin β \\ e^{i(γ-α)}\sin^{2}\frac{β}{2} & -\frac{1}{\sqrt{2}}e^{-iα}\sin β & e^{-i(α+γ)}\cos^{2}\frac{β}{2} \end{pmatrix}.   (20.74)

Equation (20.74) is a special case of (20.73), in which j = 1. Note that (20.74) is a unitary matrix. The confirmation is left for readers. The complex conjugate of the column vector m = 0 in (20.61) with l = 1 is described by
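The unitarity of (20.74), left to the reader above, is quickly confirmed numerically. A minimal sketch (the helper name `D1` is ours):

```python
import numpy as np

def D1(alpha, beta, gamma):
    """Full matrix (20.74) of D^(1), rows/columns ordered m', m = -1, 0, 1."""
    c2, s2, s = np.cos(beta/2)**2, np.sin(beta/2)**2, np.sin(beta)
    rt2 = np.sqrt(2)
    return np.array([
        [np.exp(1j*(alpha+gamma))*c2,   np.exp(1j*alpha)*s/rt2,  np.exp(1j*(alpha-gamma))*s2],
        [-np.exp(1j*gamma)*s/rt2,       np.cos(beta),            np.exp(-1j*gamma)*s/rt2],
        [np.exp(1j*(gamma-alpha))*s2,  -np.exp(-1j*alpha)*s/rt2, np.exp(-1j*(alpha+gamma))*c2]])

M = D1(0.3, 1.1, -0.5)
assert np.allclose(M @ M.conj().T, np.eye(3))   # unitary
assert np.isclose(np.linalg.det(M), 1.0)        # determinant 1
```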

\left[D^{(1)}_{m0}(α,β)\right]^{*} = \begin{pmatrix} \frac{1}{\sqrt{2}}e^{-iα}\sin β \\ \cos β \\ -\frac{1}{\sqrt{2}}e^{iα}\sin β \end{pmatrix}   (20.75)

that is related to the spherical surface harmonics (see Sect. 3.6); also see p. 760, Table 15.4 of Ref. [4]. Now, (20.25), (20.61), (20.65), and (20.73) provide a clue to relating D(1) to the rotation matrix of ℝ3, i.e., R~3 of (17.101) that includes Euler angles. The argument is as follows: As already shown in (20.61) and (20.65), we have related the basis

vectors (f₋₁ f₀ f₁) of D^{(1)} to the spherical harmonics (Y₁⁻¹ Y₁⁰ Y₁¹). Denoting the spherical basis by

\tilde{f}_{-1} \equiv \frac{1}{\sqrt{2}}(x - iy), \qquad \tilde{f}_{0} \equiv z, \qquad \tilde{f}_{1} \equiv \frac{1}{\sqrt{2}}(-x - iy),   (20.76)

we have

(\tilde{f}_{-1}\;\; \tilde{f}_{0}\;\; \tilde{f}_{1}) = (x\;\; z\;\; y)\,U = (x\;\; z\;\; y)\begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ -\frac{i}{\sqrt{2}} & 0 & -\frac{i}{\sqrt{2}} \end{pmatrix},   (20.77)

where we define a (unitary) matrix U as

U \equiv \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ -\frac{i}{\sqrt{2}} & 0 & -\frac{i}{\sqrt{2}} \end{pmatrix}.

Equation (20.77) describes the relationship between the two sets of functions (\tilde{f}_{-1}\; \tilde{f}_{0}\; \tilde{f}_{1}) and (x z y). We should not regard (x z y) as a coordinate in ℝ³, as mentioned in Sect. 20.1. The (3, 3) matrix of (20.77) is essentially the same as P of (20.22). Following (20.73) and taking account of (20.25), we express (\tilde{f}_{-1}'\; \tilde{f}_{0}'\; \tilde{f}_{1}') as

(\tilde{f}_{-1}'\;\; \tilde{f}_{0}'\;\; \tilde{f}_{1}') \equiv (\tilde{f}_{-1}\;\; \tilde{f}_{0}\;\; \tilde{f}_{1})\,ℜ(a,b) = (\tilde{f}_{-1}\;\; \tilde{f}_{0}\;\; \tilde{f}_{1})\,D^{(1)}.   (20.78)

This means that the set of functions (\tilde{f}_{-1}\; \tilde{f}_{0}\; \tilde{f}_{1}) forms basis vectors of D^{(1)} as well. Meanwhile, the relation (20.77) holds after the transformation, because (20.77) is independent of the choice of specific coordinate systems. That is, we have

(\tilde{f}_{-1}'\;\; \tilde{f}_{0}'\;\; \tilde{f}_{1}') = (x'\;\; z'\;\; y')\,U.

Then, from (20.78) we get

~ 1 f~0 f~1 forms the basis vectors of D(1) as This means that a set of functions f well. Meanwhile, the relation (20.77) holds after the transformation, because (20.77) is independent of the choice of specific coordinate systems. That is, we have ~ 1 0 f~0 0 f~1 0 = ðx0 z0 y0 Þ U: fThen, from (20.78) we get

(x'\;\; z'\;\; y') = (\tilde{f}_{-1}'\;\; \tilde{f}_{0}'\;\; \tilde{f}_{1}')\,U^{-1} = (\tilde{f}_{-1}\;\; \tilde{f}_{0}\;\; \tilde{f}_{1})\,D^{(1)}U^{-1} = (x\;\; z\;\; y)\,UD^{(1)}U^{-1}.   (20.79)

Defining a unitary matrix V as

V = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}

and operating with V on both sides of (20.79) from the right, we get

(x'\;\; z'\;\; y')\,V = (x\;\; z\;\; y)\,V\cdot V^{-1}UD^{(1)}U^{-1}V.

That is, we have

(x'\;\; y'\;\; z') = (x\;\; y\;\; z)\,\left(U^{-1}V\right)^{-1}D^{(1)}U^{-1}V.   (20.80)

Note that (x′ z′ y′)V = (x′ y′ z′) and (x z y)V = (x y z); that is, V exchanges the order of z and y in (x z y). Meanwhile, regarding (x y z) as the basis vectors in ℝ³ we have

(x'\;\; y'\;\; z') = (x\;\; y\;\; z)\,R,   (20.81)

where R represents a (3, 3) orthogonal matrix. Equation (20.81) represents a rotation in SO(3). Since x, y, and z are linearly independent, comparing (20.80) and (20.81) we get

R \equiv \left(U^{-1}V\right)^{-1}D^{(1)}U^{-1}V = \left(U^{-1}V\right)^{\dagger}D^{(1)}U^{-1}V.   (20.82)

More explicitly, using a = e^{\frac{i}{2}(α+γ)}\cos\frac{β}{2} and b = e^{\frac{i}{2}(α-γ)}\sin\frac{β}{2} as in the case of (20.45), we describe R as

R = \begin{pmatrix} \mathrm{Re}(a^{2}-b^{2}) & -\mathrm{Im}(a^{2}+b^{2}) & 2\mathrm{Re}(ab) \\ \mathrm{Im}(a^{2}-b^{2}) & \mathrm{Re}(a^{2}+b^{2}) & 2\mathrm{Im}(ab) \\ -2\mathrm{Re}(ab^{*}) & 2\mathrm{Im}(ab^{*}) & |a|^{2}-|b|^{2} \end{pmatrix}.   (20.83)

In (20.83), Re and Im denote the real part and imaginary part of a complex number, respectively. Comparing (20.83) with (17.101), we find that R is identical to R̃₃ of (17.101). Equations (20.82) and (20.83) clearly show that D^{(1)} and R ≡ R̃₃ are connected to each other through a unitary similarity transformation, even though these matrices differ in that the former is unitary and the latter is a real orthogonal matrix. That is, D^{(1)} and R̃₃ are equivalent in terms of representation; see

Schur's First Lemma of Sect. 18.3. Thus, D^{(1)} is certainly a representation matrix of SO(3). The trace of D^{(1)} and R̃₃ is given by

\mathrm{Tr}\,D^{(1)} = \mathrm{Tr}\,\tilde{R}_{3} = \cos(α+γ)(1+\cos β) + \cos β.

Let us continue the discussion still further. We think of the quantity (x/r y/r z/r), where r is the radial coordinate of (20.25). Operating with R of (20.82) on (x/r y/r z/r) from the right, we have

(x/r\;\; y/r\;\; z/r)\,R = (x/r\;\; y/r\;\; z/r)\,V^{-1}UD^{(1)}U^{-1}V = (\tilde{f}_{-1}/r\;\; \tilde{f}_{0}/r\;\; \tilde{f}_{1}/r)\,D^{(1)}U^{-1}V.

From (20.25), (20.61), (20.75), and (20.76), we get

(\tilde{f}_{-1}/r\;\; \tilde{f}_{0}/r\;\; \tilde{f}_{1}/r) = \left(\frac{1}{\sqrt{2}}e^{-iα}\sin β \quad \cos β \quad -\frac{1}{\sqrt{2}}e^{iα}\sin β\right).   (20.84)

Using (20.74) and (20.84), we obtain

(\tilde{f}_{-1}/r\;\; \tilde{f}_{0}/r\;\; \tilde{f}_{1}/r)\,D^{(1)} = (0\;\; 1\;\; 0).

This is a tangible example of (20.72). Then, we have

(\tilde{f}_{-1}/r\;\; \tilde{f}_{0}/r\;\; \tilde{f}_{1}/r)\,D^{(1)}U^{-1}V = (x/r\;\; y/r\;\; z/r)\,R = (0\;\; 0\;\; 1).   (20.85)

For the confirmation of (20.85), use the spherical coordinate representation (3.17) to express x, y, and z in the above equation. This may seem to be merely a confirmation of the unitarity of the representation matrices; see (20.72). Nonetheless, we emphasize that the simple relation (20.85) holds only in the special case where the parameters α and β in D^{(1)} (or R) are identical with the angular components of the spherical coordinates associated with the Cartesian coordinates x, y, and z. Equation (20.85) does not hold in the general case where the parameters α and β differ from those angular components. In other words, (20.68), which describes the special case where Q ≡ R(α, β, γ), leads to (20.85); see Fig. 20.2. Equation (20.66), by contrast, is used for the general case where Q represents an arbitrary transformation in ℝ³. Regarding further detailed discussion of these topics, readers are referred to the literature [1, 2, 5, 6].
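The equivalence of D^{(1)} and R̃₃ expressed by (20.82) and (20.83) can be verified numerically. The sketch below (helper names ours) builds R both by the similarity transformation (20.82) and by the explicit form (20.83), then checks that it is real orthogonal with determinant 1 and satisfies the trace formula above:

```python
import numpy as np

alpha, beta, gamma = 0.4, 1.0, 0.7
a = np.exp(0.5j * (alpha + gamma)) * np.cos(beta / 2)
b = np.exp(0.5j * (alpha - gamma)) * np.sin(beta / 2)

# D^(1) of (20.74)
c2, s2, s = np.cos(beta/2)**2, np.sin(beta/2)**2, np.sin(beta)
rt2 = np.sqrt(2)
D1 = np.array([
    [np.exp(1j*(alpha+gamma))*c2,   np.exp(1j*alpha)*s/rt2,  np.exp(1j*(alpha-gamma))*s2],
    [-np.exp(1j*gamma)*s/rt2,       np.cos(beta),            np.exp(-1j*gamma)*s/rt2],
    [np.exp(1j*(gamma-alpha))*s2,  -np.exp(-1j*alpha)*s/rt2, np.exp(-1j*(alpha+gamma))*c2]])

# U of (20.77), V, and W = U^{-1} V = U^dagger V
U = np.array([[1/rt2, 0, -1/rt2],
              [0, 1, 0],
              [-1j/rt2, 0, -1j/rt2]])
V = np.array([[1., 0, 0], [0, 0, 1], [0, 1, 0]])
W = U.conj().T @ V

R_sim = W.conj().T @ D1 @ W      # similarity transformation (20.82)
R_exp = np.array([               # explicit form (20.83)
    [(a**2 - b**2).real, -(a**2 + b**2).imag, 2*(a*b).real],
    [(a**2 - b**2).imag,  (a**2 + b**2).real, 2*(a*b).imag],
    [-2*(a*np.conj(b)).real, 2*(a*np.conj(b)).imag, abs(a)**2 - abs(b)**2]])

assert np.allclose(R_sim.imag, 0)               # real matrix
assert np.allclose(R_sim.real, R_exp)           # (20.82) = (20.83)
assert np.allclose(R_exp @ R_exp.T, np.eye(3))  # orthogonal
assert np.isclose(np.linalg.det(R_exp), 1.0)    # proper rotation
assert np.isclose(np.trace(R_exp),
                  np.cos(alpha+gamma)*(1+np.cos(beta)) + np.cos(beta))
```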

20.2.4 Irreducible Representations of SU(2) and SO(3)

In this section we further explore characteristics of the representations of SU(2) and SO(3) and show the irreducibility of their representation matrices. To prove the irreducibility of the SU(2) representation matrices D^{(j)}(α, β, γ), we need to use Schur's lemmas, already explained in Sect. 18.3. At present, our task is to find a suitable matrix A that commutes with D^{(j)}(α, β, γ). Since a general form of D^{(j)}(α, β, γ) has a complicated structure, as shown in (20.54), it would be a formidable task to deal with D^{(j)}(α, β, γ) directly to verify its irreducibility. To gain a clear understanding, we impose a restriction on D^{(j)}(α, β, γ) so that we can readily perform the related calculations [5]. To this end, in (20.54) we consider the special case where β = 0, i.e., D^{(j)}(α, 0, γ). The point is that if we can find a matrix A which commutes with D^{(j)}(α, 0, γ), then a matrix that commutes with the general form D^{(j)}(α, β, γ) must be of the type A as well (if such a matrix exists). We wish to seek conditions under which D^{(j)}(α, 0, γ) does not vanish. In (20.54) the exponential term never vanishes. When β = 0, cos(β/2) (= 1) does not vanish either. For the factor (sin β/2)^{m'-m+2μ} not to vanish, we must have

m' - m + 2μ = 0.   (20.86)

In the denominator of (20.54), to avoid a negative number in the factorials we must have m′ − m + μ ≥ 0 and μ ≥ 0. If μ > 0, then m′ − m + 2μ = (m′ − m + μ) + μ > 0 as well. Therefore, for (20.86) to hold, we must have μ = 0. From (20.86), in turn, this means m′ − m = 0, i.e., m′ = m. Then, we have a greatly simplified expression for the matrix elements D^{(j)}_{m'm}(α, 0, γ) such that

D^{(j)}_{m'm}(α,0,γ) = δ_{m'm}\,e^{-i(αm'+γm)}.   (20.87)

This implies that D^{(j)}(α, 0, γ) is diagonal. Suppose that we have a matrix A = (A_{m'm}). If A commutes with D^{(j)}(α, 0, γ), it must be diagonal by (20.87). That is, A should have the form A_{m'm} = a_m δ_{m'm}. This can readily be confirmed using simple matrix algebra; the confirmation is left for readers. We seek further conditions for A to commute with D^{(j)} in the general form D^{(j)}(α, β, γ) (β ≠ 0). For this purpose, we consider the (j, m) component of D^{(j)}(α, β, γ). Putting m′ = j in (20.54), we obtain (j − m′ − μ)! = (−μ)! in the denominator. Then, for (20.54) to be meaningful, we must have μ = 0. Hence, we get

D^{(j)}_{jm}(α,β,γ) = (-1)^{j-m}\sqrt{\frac{(2j)!}{(j+m)!(j-m)!}}\, e^{-i(αj+γm)}\left(\cos\frac{β}{2}\right)^{j+m}\left(\sin\frac{β}{2}\right)^{j-m}.   (20.88)

With respect to (20.88), we think of the following three cases: (i) m ≠ j, m ≠ −j; (ii) m = j; (iii) m = −j. For case (i), D^{(j)}_{jm}(α, β, γ) vanishes only when β = 0 or π. For case (ii), it vanishes only when β = π; for case (iii), D^{(j)}_{jm}(α, β, γ) vanishes only when β = 0. Other than in these cases, the matrix element D^{(j)}_{jm}(α, β, γ) does not vanish. Taking account of the above consideration, we continue our discussion. If A commutes with D^{(j)}(α, β, γ), then with respect to the (j, m) component of AD^{(j)}(α, β, γ) we have

\left[AD^{(j)}\right]_{jm} = \sum_{k} A_{jk}D^{(j)}_{km} = \sum_{k} a_{k}δ_{jk}D^{(j)}_{km} = a_{j}D^{(j)}_{jm}.   (20.89)

Meanwhile, we obtain

\left[D^{(j)}A\right]_{jm} = \sum_{k} D^{(j)}_{jk}A_{km} = \sum_{k} D^{(j)}_{jk}a_{m}δ_{km} = a_{m}D^{(j)}_{jm}.   (20.90)

From (20.89) and (20.90), for A to commute with D^{(j)} we must have

D^{(j)}_{jm}(α,β,γ)\,\left(a_{j} - a_{m}\right) = 0.   (20.91)

Considering that the matrix elements D^{(j)}_{jm}(α, β, γ) in (20.88) do not vanish for general α, β, γ, from (20.91) we must have a_j − a_m = 0 for every m that satisfies −j ≤ m ≤ j − 1. When m = j, we have the trivial equation 0 = 0, which places no constraint on a_j. Thus, putting a_m = a_j ≡ a (−j ≤ m ≤ j), we have A_{m'm} = aδ_{m'm}, where a is an arbitrary complex constant. Then, from Schur's Second Lemma we conclude that D^{(j)}(α, β, γ) must be irreducible. We should be careful, however, about the application of Schur's Second Lemma. This is because Schur's Second Lemma as described in Sect. 18.3 says the following: a representation D is irreducible ⟹ the matrices M that commute with all D(g) are limited to the form M = cE. Nonetheless, we are uncertain of the truth or falsehood of the converse proposition. Fortunately, however, the converse proposition is true if the representation is unitary. We prove the converse proposition by its contraposition: a representation D is not irreducible (i.e., reducible) ⟹ among the matrices that commute with all D(g), we can find at least one matrix M of a type other than M = cE.

In this context, Theorem 18.2 (Sect. 18.2) as well as (18.38) teaches us that the unitary representation D(g) is completely reducible, so that we can describe it as a direct sum such that

D(g) = D^{(1)}(g) \oplus D^{(2)}(g) \oplus \cdots \oplus D^{(ω)}(g),   (20.92)

where g is any group element. Then, we can choose the following matrix A for a linear transformation that is commutative with D(g):

A = α^{(1)}E_{1} \oplus α^{(2)}E_{2} \oplus \cdots \oplus α^{(ω)}E_{ω},   (20.93)

where α^{(1)}, ⋯, α^{(ω)} are complex constants that may differ from one another; E₁, ⋯, E_ω are unit matrices having the same dimensions as D^{(1)}(g), ⋯, D^{(ω)}(g), respectively. Then, even though A is not of the type A_{m'm} = αδ_{m'm}, A commutes with D(g) for any g. Here we rephrase Schur's Second Lemma as the following theorem.

Theorem 20.2 Let D be a unitary representation of a group G. Suppose that for ∀g ∈ G we have

D(g)M = MD(g).   (18.48)

A necessary and sufficient condition for the said unitary representation to be irreducible is that the linear transformations commutative with D(g) (∀g ∈ G) are limited to those of the form A_{m'm} = αδ_{m'm}, where α is a (complex) constant.

Next, we examine the irreducibility of the SO(3) representation matrices. We develop the discussion in conjunction with an explanation of the important properties of the representation matrices of SU(2) as well as SO(3). We rewrite (20.54) for j = l (l: zero or a positive integer) to relate it to the rotation in ℝ³. Namely, we have

\left[D^{(l)}(α,β,γ)\right]_{m'm} = \sum_{μ}\frac{(-1)^{m'-m+μ}\sqrt{(l+m)!(l-m)!(l+m')!(l-m')!}}{(l+m-μ)!μ!(m'-m+μ)!(l-m'-μ)!}\, e^{-i(αm'+γm)}\left(\cos\frac{β}{2}\right)^{2l+m-m'-2μ}\left(\sin\frac{β}{2}\right)^{m'-m+2μ}.   (20.94)

Meanwhile, as mentioned in Sect. 20.2.3, the transformation R̃₃ (appearing in Sect. 17.4.2) described by

\tilde{R}_{3} = R_{zα}R'_{y'β}R''_{z''γ}

may be regarded as R(α, β, γ) of (20.63). Consequently, the representation matrix D^{(l)}(α, β, γ) is described by

D^{(l)}(α,β,γ) \equiv D^{(l)}\left(\tilde{R}_{3}\right) = D_{α}^{(l)}(R_{zα})\,D_{β}^{(l)}\left(R'_{y'β}\right)\,D_{γ}^{(l)}\left(R''_{z''γ}\right).   (20.95)

Equation (20.95) is based upon the definition of a representation, (18.2). This notation parallels (20.45), in which D^{(1/2)}_{α,β,γ} is described by the product of three exponential functions of matrices, each of which is associated with a rotation characterized by the Euler angles of (17.101). Denoting D_{α}^{(l)}(R_{zα}) = D^{(l)}(α, 0, 0), D_{β}^{(l)}(R'_{y'β}) = D^{(l)}(0, β, 0), and D_{γ}^{(l)}(R''_{z''γ}) = D^{(l)}(0, 0, γ), we have

D^{(l)}(α,β,γ) = D^{(l)}(α,0,0)\,D^{(l)}(0,β,0)\,D^{(l)}(0,0,γ),   (20.96)

where for each representation two out of the three Euler angles are zero. In light of (20.94), we estimate each factor of (20.96). By the same token as before, putting β = γ = 0 in (20.94), let us examine under what conditions the factor (sin β/2)^{m'-m+2μ} survives. For this factor to survive, we must have m′ − m + 2μ = 0, as in (20.87). Then, following the same argument as before, we should have μ = 0 and m′ = m. Thus, D^{(l)}(α, 0, 0) must be a diagonal matrix. This is the case with D^{(l)}(0, 0, γ) as well. Equation (3.216) implies that the functions Y_l^m(β,α) (m = −l, ⋯, 0, ⋯, l) are eigenfunctions with regard to the rotation about the z-axis. In other words, we have

R_{zα}Y_{l}^{m}(θ,ϕ) = Y_{l}^{m}(θ,\,ϕ-α) = e^{-imα}Y_{l}^{m}(θ,ϕ).

This is because, from (3.216), Y_l^m(θ,ϕ) can be described as

Y_{l}^{m}(θ,ϕ) = F(θ)\,e^{imϕ},

where F(θ) is a function of θ. Hence, we have

Y_{l}^{m}(θ,\,ϕ-α) = F(θ)\,e^{im(ϕ-α)} = e^{-imα}F(θ)\,e^{imϕ} = e^{-imα}Y_{l}^{m}(θ,ϕ).

Therefore, with a full matrix representation we get

\left(Y_{l}^{-l}\; Y_{l}^{-l+1}\; \cdots\; Y_{l}^{0}\; \cdots\; Y_{l}^{l}\right)R_{zα} = \left(Y_{l}^{-l}\; Y_{l}^{-l+1}\; \cdots\; Y_{l}^{0}\; \cdots\; Y_{l}^{l}\right)\begin{pmatrix} e^{-i(-l)α} & & & \\ & e^{-i(-l+1)α} & & \\ & & \ddots & \\ & & & e^{-ilα} \end{pmatrix}
= \left(Y_{l}^{-l}\; Y_{l}^{-l+1}\; \cdots\; Y_{l}^{0}\; \cdots\; Y_{l}^{l}\right)\begin{pmatrix} e^{ilα} & & & \\ & e^{i(l-1)α} & & \\ & & \ddots & \\ & & & e^{-ilα} \end{pmatrix}.   (20.97)

With a shorthand notation, we have

\left[D^{(l)}(α,0,0)\right]_{m'm} = e^{-imα}δ_{m'm} \quad (m = -l, \cdots, l).   (20.98)

From (20.96), for the (m′, m) matrix elements of D^{(l)}(α, β, γ) we get

\left[D^{(l)}(α,β,γ)\right]_{m'm} = \sum_{s,t}\left[D^{(l)}(α,0,0)\right]_{m's}\left[D^{(l)}(0,β,0)\right]_{st}\left[D^{(l)}(0,0,γ)\right]_{tm}
= \sum_{s,t} e^{-isα}δ_{m's}\left[D^{(l)}(0,β,0)\right]_{st}\,e^{-imγ}δ_{tm}
= e^{-im'α}\left[D^{(l)}(0,β,0)\right]_{m'm}\,e^{-imγ}.   (20.99)

Meanwhile, from (20.94) we have

\left[D^{(l)}(0,β,0)\right]_{m'm} = \sum_{μ}\frac{(-1)^{m'-m+μ}\sqrt{(l+m)!(l-m)!(l+m')!(l-m')!}}{(l+m-μ)!μ!(m'-m+μ)!(l-m'-μ)!}\left(\cos\frac{β}{2}\right)^{2l+m-m'-2μ}\left(\sin\frac{β}{2}\right)^{m'-m+2μ}.   (20.100)

Thus, we find that [D^{(l)}(α,β,γ)]_{m'm} of (20.99) has been factorized into three factors. Equation (20.94) has such an implication. The same characteristic is shared more generally by the SU(2) representation matrices (20.54). This can readily be understood from the aforementioned discussion. Taking D^{(1)} of (20.74) as an example, we have

D^{(1)}(α,β,γ) = D^{(1)}(α,0,0)\,D^{(1)}(0,β,0)\,D^{(1)}(0,0,γ)
= \begin{pmatrix} e^{iα} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{-iα} \end{pmatrix}\begin{pmatrix} \cos^{2}\frac{β}{2} & \frac{1}{\sqrt{2}}\sin β & \sin^{2}\frac{β}{2} \\ -\frac{1}{\sqrt{2}}\sin β & \cos β & \frac{1}{\sqrt{2}}\sin β \\ \sin^{2}\frac{β}{2} & -\frac{1}{\sqrt{2}}\sin β & \cos^{2}\frac{β}{2} \end{pmatrix}\begin{pmatrix} e^{iγ} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{-iγ} \end{pmatrix}
= \begin{pmatrix} e^{i(α+γ)}\cos^{2}\frac{β}{2} & \frac{1}{\sqrt{2}}e^{iα}\sin β & e^{i(α-γ)}\sin^{2}\frac{β}{2} \\ -\frac{1}{\sqrt{2}}e^{iγ}\sin β & \cos β & \frac{1}{\sqrt{2}}e^{-iγ}\sin β \\ e^{i(γ-α)}\sin^{2}\frac{β}{2} & -\frac{1}{\sqrt{2}}e^{-iα}\sin β & e^{-i(α+γ)}\cos^{2}\frac{β}{2} \end{pmatrix}.   (20.74)

In this way, we have reproduced the result of (20.74). Once D(l )(α, β, γ) has been factorized as in (20.94), we can readily examine whether the representation is irreducible. To know what kinds of matrices commute with D(l )(α, β, γ), it suffices to examine whether the matrices commute with individual factors. The argument is due to Hamermesh [5]. (i) Let A be a matrix having a dimension the same as that of D(l )(α, β, γ), namely A is a (2l + 1, 2l + 1) square matrix with a form of A = (aij). First, we examine the commutativity of D(l )(α, 0, 0) and A. With the matrix elements we have ADðlÞ ðα, 0, 0Þ

m0 m

= am 0 m e

am0 k DðlÞ ðα, 0, 0Þ

= k - imα

km

am0 k e - imα δkm

= k

ð20:101Þ

,

where the second equality comes from (20.98). Also, we have

[D^{(l)}(α, 0, 0) A]_{m′m} = Σ_k [D^{(l)}(α, 0, 0)]_{m′k} a_{km} = a_{m′m} e^{−im′α}.   (20.102)

From (20.101) and (20.102), as the condition of commutativity we have

a_{m′m} (e^{−imα} − e^{−im′α}) = 0.

If m′ ≠ m, we must have a_{m′m} = 0. If m′ = m, a_{m′m} does not have to vanish but may take an arbitrary complex value. Hence, A is a diagonal matrix, namely

A = (a_m δ_{m′m}),   (20.103)

where a_m is an arbitrary complex number.

20.2

Rotation Groups: SU(2) and SO(3)

859

(ii) Next we examine the commutativity of D^{(l)}(0, β, 0) and A of (20.103). Using (20.66), we have

Q Y_l^m(β, α) = Σ_n Y_l^n(β, α) D^{(l)}_{nm}(Q).   (20.66)

Choosing R(0, θ, 0) for Q and putting α = 0 in (20.66), we have

R(0, θ, 0) Y_l^m(β, 0) = Σ_n Y_l^n(β, 0) D^{(l)}_{nm}(0, θ, 0).   (20.104)

Meanwhile, from (20.69) we get

R(0, θ, 0) Y_l^m(β, 0) = Y_l^m(R(0, θ, 0)^{−1}(β, 0)) = Y_l^m(R(0, −θ, 0)(β, 0)) = Y_l^m(β − θ, 0).   (20.105)

Combining (20.104) and (20.105), we get

Y_l^m(β − θ, 0) = Σ_n Y_l^n(β, 0) D^{(l)}_{nm}(0, θ, 0).   (20.106)

Further putting β = 0 in (20.106), we have

Y_l^m(−θ, 0) = Σ_n Y_l^n(0, 0) D^{(l)}_{nm}(0, θ, 0).   (20.107)

Taking account of (20.71), the terms on the RHS of (20.107) do not vanish only when n = 0. On this condition, we obtain

Y_l^m(−θ, 0) = Y_l^0(0, 0) D^{(l)}_{0m}(0, θ, 0).   (20.108)

Meanwhile, from (20.71) with m = 0, we have

Y_l^0(0, 0) = √[(2l + 1)/(4π)].   (20.109)

Inserting this into (20.108), we have

Y_l^m(−θ, 0) = √[(2l + 1)/(4π)] D^{(l)}_{0m}(0, θ, 0).   (20.110)

Replacing θ with −θ in (20.110), we get

D^{(l)}_{0m}(0, −θ, 0) = √[4π/(2l + 1)] Y_l^m(θ, 0).   (20.111)

Compare (20.111) with (20.61) and confirm (20.111) using (20.74) with l = 1. Viewing (20.111) as a "row vector," (20.111) with l = 1 can be expressed as

(D^{(1)}_{0,−1}(0, −θ, 0)  D^{(1)}_{0,0}(0, −θ, 0)  D^{(1)}_{0,1}(0, −θ, 0))
= √(4π/3) (Y_1^{−1}(θ, 0)  Y_1^0(θ, 0)  Y_1^1(θ, 0))
= ((1/√2) sin θ   cos θ   −(1/√2) sin θ).
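The l = 1 instance of (20.111) can also be verified numerically against the explicit spherical harmonics. A short sketch (helper names ours; the Condon-Shortley phase is assumed for Y_1^{±1}):

```python
import math

def d1_middle_row(beta):
    # row m' = 0 of the l = 1 Wigner matrix, columns m = -1, 0, +1
    s = math.sin(beta)/math.sqrt(2)
    return (-s, math.cos(beta), s)

def Y1(m, theta):
    # l = 1 spherical harmonics evaluated at phi = 0 (Condon-Shortley phase)
    if m == 0:
        return math.sqrt(3/(4*math.pi))*math.cos(theta)
    return -m*math.sqrt(3/(8*math.pi))*math.sin(theta)

theta = 0.6
row = d1_middle_row(-theta)            # D^(1)_{0m}(0, -theta, 0)
for idx, m in enumerate((-1, 0, 1)):
    assert abs(row[idx] - math.sqrt(4*math.pi/3)*Y1(m, theta)) < 1e-12
```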

The above relation should be compared with (20.74). Getting back to our topic, with the matrix elements we have

[A D^{(l)}(0, θ, 0)]_{0m} = Σ_k a_k δ_{0k} [D^{(l)}(0, θ, 0)]_{km} = a_0 [D^{(l)}(0, θ, 0)]_{0m}

and

[D^{(l)}(0, θ, 0) A]_{0m} = Σ_k [D^{(l)}(0, θ, 0)]_{0k} a_m δ_{km} = [D^{(l)}(0, θ, 0)]_{0m} a_m,

where we assume A = (a_m δ_{m′m}) as in (20.103). Since, by (20.108), [D^{(l)}(0, θ, 0)]_{0m} vanishes only at specific values of θ, for A and D^{(l)}(0, θ, 0) to commute we must have a_m = a_0 for all m = −l, −l + 1, ⋯, 0, ⋯, l − 1, l. Then, from (20.103) we get

A = a_0 δ_{m′m},   (20.112)

where a_0 is an arbitrary complex number. Thus, a matrix that commutes with both D^{(l)}(α, 0, 0) and D^{(l)}(0, β, 0) must be of the form (20.112). The same discussion holds for D^{(l)}(α, β, γ) of (20.94) as a whole with regard to the commutativity. On the basis of Schur's Second Lemma (Sect. 18.3), we conclude that the representation D^{(l)}(α, β, γ) is again irreducible. This is one of the prominent features of SO(3) as well as SU(2). As can be seen clearly in (20.97), the dimension of the representation matrix D^{(l)}(α, β, γ) is 2l + 1. Suppose that there is another representation matrix D^{(l′)}(α, β, γ) (l′ ≠ l). Then, D^{(l)}(α, β, γ) and D^{(l′)}(α, β, γ) are inequivalent (see Sect. 18.3). Meanwhile, the unitary representation of SO(3) is completely reducible and, hence, any reducible unitary representation D can be described by

D = Σ_l ⊕ a_l D^{(l)}(α, β, γ),   (20.113)

where a_l is zero or a positive integer. If a_l ≥ 2, this means that the same representation D^{(l)}(α, β, γ) appears repeatedly. Equation (20.113) provides a tangible example of (20.92) expressed as

D(g) = D^{(1)}(g) ⊕ D^{(2)}(g) ⊕ ⋯ ⊕ D^{(ω)}(g)   (20.92)

and applies to the representation matrices of SU(2) more generally. We will encounter related discussion in Sect. 20.3 in connection with the direct-product representation.

20.2.5 Parameter Space of SO(3)

As already discussed in Sect. 17.4.2, we need three angles α, β, and γ to specify a rotation in ℝ³. Their domains are usually taken as follows:

0 ≤ α ≤ 2π, 0 ≤ β ≤ π, 0 ≤ γ ≤ 2π.

The domains defined above are referred to as a parameter space. Yet, there are different implications depending upon the choice of coordinate system. The first choice is a moving coordinate system, where α, β, and γ are Euler angles (see Sect. 17.4.2). The second choice is a fixed coordinate system, where α is an azimuthal angle, β is a zenithal angle, and γ is a rotation angle. Notice that in the former case α, β, and γ represent equivalent rotation angles, whereas in the latter case α and β define the orientation of the rotation axis and γ is designated as a rotation angle. In the present section we further discuss the characteristics of the parameter space.

First, we study several characteristics of (3, 3) real orthogonal matrices.

(i) The (3, 3) real orthogonal matrices have eigenvalues 1, e^{iγ}, and e^{−iγ} (0 ≤ γ ≤ π). This is a direct consequence of (17.77) and the invariance of the characteristic equation of the matrix. The rotation axis is associated with an eigenvector that belongs to the eigenvalue 1. Let A = (a_{ij}) (1 ≤ i, j ≤ 3) be a (3, 3) real orthogonal matrix and let u be an eigenvector belonging to the eigenvalue 1. Thinking of the rotation on the fixed coordinate system, we suppose that u is given by a column vector such that

u = (u_1, u_2, u_3)^T.

Then, we have

A u = 1u = u.   (20.114)

Operating with A^T on both sides of (20.114), we get

A^T A u = Eu = u = A^T u,   (20.115)

where we used the property of an orthogonal matrix, i.e., A^T A = E. From (20.114) and (20.115), we have

(A − A^T) u = 0.   (20.116)

Writing (20.116) in matrix components, we have [5]

[ ( a_{11}, a_{12}, a_{13} ; a_{21}, a_{22}, a_{23} ; a_{31}, a_{32}, a_{33} )
 − ( a_{11}, a_{21}, a_{31} ; a_{12}, a_{22}, a_{32} ; a_{13}, a_{23}, a_{33} ) ] (u_1, u_2, u_3)^T = 0.   (20.117)

That is,

(a_{12} − a_{21}) u_2 + (a_{13} − a_{31}) u_3 = 0,
(a_{21} − a_{12}) u_1 + (a_{23} − a_{32}) u_3 = 0,
(a_{31} − a_{13}) u_1 + (a_{32} − a_{23}) u_2 = 0.   (20.118)
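Reading off the rotation axis from the antisymmetric part of A, as (20.118) suggests, is easy to automate. A minimal sketch (helper name ours), checked on a rotation about the z-axis:

```python
import math

def rotation_axis(A):
    # direction ratios (a32 - a23, a13 - a31, a21 - a12), then normalized
    u = (A[2][1] - A[1][2], A[0][2] - A[2][0], A[1][0] - A[0][1])
    norm = math.sqrt(sum(x*x for x in u))
    return tuple(x/norm for x in u)

t = 0.9
Rz = [[math.cos(t), -math.sin(t), 0.0],
      [math.sin(t),  math.cos(t), 0.0],
      [0.0, 0.0, 1.0]]              # rotation by t around the z-axis
ux, uy, uz = rotation_axis(Rz)
assert max(abs(ux), abs(uy), abs(uz - 1.0)) < 1e-12
```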

Solving (20.118), we get

u_1 : u_2 : u_3 = (a_{32} − a_{23}) : (a_{13} − a_{31}) : (a_{21} − a_{12}).   (20.119)

If we normalize u, i.e., u_1² + u_2² + u_3² = 1, then (u_1, u_2, u_3)^T gives the direction cosines of the eigenvector, namely of the rotation axis. Equation (20.119) applies to any orthogonal matrix. A simple example is (20.3); a more complicated example can be seen in (17.107). Confirmation is left for readers. We use this property soon below.

Let us consider infinitesimal transformations of rotation. As already implied in (20.6), an infinitesimal rotation θ around the z-axis is described by

R_zθ = ( 1, −θ, 0 ; θ, 1, 0 ; 0, 0, 1 ).   (20.120)

Now, we newly introduce infinitesimal rotation operators as below:

[Fig. 20.3 Infinitesimal rotations ξ, η, and ζ around the x-, y-, and z-axes, respectively]

R_xξ = ( 1, 0, 0 ; 0, 1, −ξ ; 0, ξ, 1 ),  R_yη = ( 1, 0, η ; 0, 1, 0 ; −η, 0, 1 ),  R_zζ = ( 1, −ζ, 0 ; ζ, 1, 0 ; 0, 0, 1 ).   (20.121)
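The first-order commutativity of these operators, discussed next in the text, can be illustrated numerically: multiplying two of the matrices of (20.121) in either order leaves a discrepancy only at second order in the small angles. A small sketch (helper names ours):

```python
def matmul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def Rx(x): return [[1, 0, 0], [0, 1, -x], [0, x, 1]]
def Ry(y): return [[1, 0, y], [0, 1, 0], [-y, 0, 1]]

eps = 1e-4
C = matmul(Rx(eps), Ry(eps))
D = matmul(Ry(eps), Rx(eps))
# the two orderings agree up to second-order terms in the small angles
diff = max(abs(C[i][j] - D[i][j]) for i in range(3) for j in range(3))
assert diff < 2*eps*eps
```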

These operators represent infinitesimal rotations ξ, η, and ζ around the x-, y-, and z-axes, respectively (see Fig. 20.3). Note that these operators commute with one another to the first order of the infinitesimal quantities ξ, η, and ζ. That is, we have

R_xξ R_yη = R_yη R_xξ,  R_yη R_zζ = R_zζ R_yη,  R_xξ R_yη R_zζ = R_zζ R_yη R_xξ, etc.   (20.122)

with, e.g.,

R_xξ R_yη R_zζ = ( 1, −ζ, η ; ζ, 1, −ξ ; −η, ξ, 1 )

to the first order. Notice that the order of the operations R_xξ, R_yη, and R_zζ is therefore immaterial. Next, let us consider a successive transformation with a finite rotation angle ω that follows the infinitesimal rotations. Readers may well ask why and for what purpose we need to make such elaborate calculations. It is partly because when we were dealing with finite groups in the previous chapters, the group elements were finite in number and


relevant calculations were straightforward accordingly. But now we are thinking of continuous groups that possess an infinite number of group elements and, hence, we have to consider a "density" of group elements. In this respect, we are asking how the density of group elements in the parameter space changes under a rotation through a finite angle ω. The results are used for various group calculations including the orthogonalization of characters. Getting back to our subject, we further operate a finite rotation of an angle ω around the z-axis subsequent to the aforementioned infinitesimal rotations. The discussion developed below is due to Hamermesh [5]. We denote this rotation by R_ω. Since we are dealing with a spherically symmetric system, any rotation axis can equivalently be chosen without loss of generality. Notice also that rotations of the same rotation angle belong to the same conjugacy class [see (17.106) and Fig. 17.18]. Defining R_xξ R_yη R_zζ ≡ R_Δ as in (20.122), the rotation R that combines R_Δ and the subsequently occurring R_ω is described by

R = R_Δ R_ω = ( 1, −ζ, η ; ζ, 1, −ξ ; −η, ξ, 1 ) ( cos ω, −sin ω, 0 ; sin ω, cos ω, 0 ; 0, 0, 1 )

= ( cos ω − ζ sin ω, −sin ω − ζ cos ω, η ;
    sin ω + ζ cos ω, cos ω − ζ sin ω, −ξ ;
    ξ sin ω − η cos ω, ξ cos ω + η sin ω, 1 ).   (20.123)

Hence, we have

R_{32} − R_{23} = ξ cos ω + η sin ω − (−ξ) = ξ(1 + cos ω) + η sin ω,
R_{13} − R_{31} = η − (ξ sin ω − η cos ω) = η(1 + cos ω) − ξ sin ω,
R_{21} − R_{12} = sin ω + ζ cos ω − (−sin ω − ζ cos ω) = 2(sin ω + ζ cos ω).   (20.124)

Since (20.124) gives the relative directional ratios of the rotation axis, we should normalize a vector whose components are given by (20.124) to get the direction cosines of the rotation axis. Note that the direction of the rotation axis related to R of (20.123) should be close to that of R_ω and that only R_{21} − R_{12} in (20.124) has a term (i.e., 2 sin ω) free of the infinitesimal quantities ξ, η, ζ. Therefore, it suffices to normalize by R_{21} − R_{12} to obtain the direction cosines to the first order. Hence, dividing (20.124) by 2 sin ω, as components of the direction cosines we get

ξ(1 + cos ω)/(2 sin ω) + η/2,   η(1 + cos ω)/(2 sin ω) − ξ/2,   1.   (20.125)

Meanwhile, combining (17.78) and (20.123), we can find the rotation angle ω′ of R in (20.123). The trace χ of (20.123) is written as

χ = 1 + 2(cos ω − ζ sin ω).   (20.126)

From (17.78) we have

χ = 1 + 2 cos ω′.   (20.127)

Equating (20.126) and (20.127), we obtain

cos ω′ = cos ω − ζ sin ω.   (20.128)

Approximating (20.128), we get

1 − (1/2)ω′² ≈ 1 − (1/2)ω² − ζω.   (20.129)

From (20.129), we have the following approximate expression:

(ω′ + ω)(ω′ − ω) ≈ 2ω(ω′ − ω) ≈ 2ζω.

Hence, we obtain

ω′ ≈ ω + ζ.   (20.130)

Combining (20.125) and (20.130), we form their products to the first order such that

ω[ξ(1 + cos ω)/(2 sin ω) + η/2],   ω[η(1 + cos ω)/(2 sin ω) − ξ/2],   ω + ζ.   (20.131)

Since these quantities are products of the individual direction cosines and the rotation angle ω + ζ, we introduce variables x̃, ỹ, and z̃ as the x-, y-, and z-related quantities, respectively. Namely, we have

x̃ = ω[ξ(1 + cos ω)/(2 sin ω) + η/2],  ỹ = ω[η(1 + cos ω)/(2 sin ω) − ξ/2],  z̃ = ω + ζ.   (20.132)

To evaluate how the density of group elements changes as a function of the rotation angle ω, we are interested in the variations of x̃, ỹ, and z̃ that depend on ξ, η, and ζ. To this end, we calculate the Jacobian J as

J = ∂(x̃, ỹ, z̃)/∂(ξ, η, ζ) = det( ∂x̃/∂ξ, ∂x̃/∂η, ∂x̃/∂ζ ; ∂ỹ/∂ξ, ∂ỹ/∂η, ∂ỹ/∂ζ ; ∂z̃/∂ξ, ∂z̃/∂η, ∂z̃/∂ζ )

= det( ω(1 + cos ω)/(2 sin ω), ω/2, 0 ; −ω/2, ω(1 + cos ω)/(2 sin ω), 0 ; 0, 0, 1 )

= [ω(1 + cos ω)/(2 sin ω)]² + ω²/4 = ω²(1 + cos ω)²/(4 sin² ω) + ω²/4 = ω²/[4 sin²(ω/2)],   (20.133)

where we used formulae of trigonometric functions with the last equality. Note that (20.133) does not depend on the ± sign of ω. This is because J represents the relative volume density ratio between the two parameter spaces of the x̃ỹz̃-system and the ξηζ-system, and this ratio is solely determined by the modulus of the rotation angle. Taking ω → 0 and applying l'Hôpital's rule, we have

lim_{ω→0} J = lim_{ω→0} ω²/[4 sin²(ω/2)] = lim_{ω→0} 2ω/[4 sin(ω/2) cos(ω/2)] = lim_{ω→0} ω/sin ω = lim_{ω→0} 1/cos ω = 1.

Thus, the relative volume density ratio in the limit ω → 0 is 1, as expected. Let dV = dx̃ dỹ dz̃ and dΠ = dξ dη dζ be the volume elements of each coordinate system. Then we have

dV = J dΠ.   (20.134)
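The trigonometric reduction at the end of (20.133) can be confirmed numerically; a short sketch (helper name ours):

```python
import math

def jacobian_det(w):
    # determinant of the 3x3 matrix in (20.133): a^2 + b^2 with
    # a = w(1+cos w)/(2 sin w), b = w/2
    a = w*(1 + math.cos(w))/(2*math.sin(w))
    b = w/2
    return a*a + b*b

for w in (0.3, 1.0, 2.5):
    assert abs(jacobian_det(w) - w*w/(4*math.sin(w/2)**2)) < 1e-12
```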

We assume that

J = dV/dΠ = ρ_{ξηζ}/ρ_{x̃ỹz̃},

where ρ_{x̃ỹz̃} and ρ_{ξηζ} are the "number densities" of group elements in each coordinate system. We suppose that the infinitesimal rotations in the ξηζ-coordinates are converted by the finite rotation R_ω of (20.123) into the x̃ỹz̃-coordinates. In this way, J can be viewed as an "expansion" coefficient as a function of the rotation angle |ω|. Hence, we have

dV = (ρ_{ξηζ}/ρ_{x̃ỹz̃}) dΠ or ρ_{x̃ỹz̃} dV = ρ_{ξηζ} dΠ.   (20.135)

Equation (20.135) implies that the total (infinite) number of group elements contained in the group SO(3) is invariant with respect to the transformation. Let us calculate the total volume of SO(3). This can be measured such that

dΠ = (1/J) dV.   (20.136)

Converting dV to the polar coordinate representation, we get

Π[SO(3)] = ∫dΠ = ∫₀^π dω̃ ∫₀^π dθ ∫₀^{2π} dϕ [4 sin²(ω̃/2)/ω̃²] ω̃² sin θ = 8π²,   (20.137)
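The value 8π² in (20.137) can be recovered by brute-force quadrature of the weight 4 sin²(ω̃/2) over the parameter space. A rough midpoint-rule sketch (function name and grid size ours):

```python
import math

def so3_volume(n=20000):
    # total volume of SO(3): ∫0^π dw 4 sin^2(w/2) × ∫0^π sinθ dθ × ∫0^2π dφ,
    # with the w-integral done by the midpoint rule
    h = math.pi/n
    omega_part = sum(4*math.sin((i + 0.5)*h/2)**2 for i in range(n))*h
    return omega_part*2*(2*math.pi)   # ∫sinθ dθ = 2, ∫dφ = 2π

assert abs(so3_volume() - 8*math.pi**2) < 1e-6
```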

where we denote the total volume of SO(3) by Π[SO(3)]. Note that ω in (20.133) has been replaced with ω̃ ≡ |ω| so that the radial coordinate can be positive. As already noted, (20.133) does not depend on the ± sign of ω. In other words, among the angles α, β, and ω (usually γ is used instead of ω; see Sect. 17.4.2), ω is taken as −π ≤ ω < π (instead of 0 ≤ ω < 2π) so that it can be conformed to the radial coordinate. We utilize (20.137) for the calculation of various functions. Let f(ϕ, θ, ω) be an arbitrary function on the parameter space. We can readily estimate a "mean value" of f(ϕ, θ, ω) on that space. That is, we have

f̄(ϕ, θ, ω) ≡ ∫f(ϕ, θ, ω) dΠ / ∫dΠ,   (20.138)

where f̄(ϕ, θ, ω) represents the mean value of the function and is given by

f̄(ϕ, θ, ω) = 4 ∫₀^π dω̃ ∫₀^π dθ ∫₀^{2π} dϕ f(ϕ, θ, ω) sin²(ω̃/2) sin θ / ∫dΠ.   (20.139)

Using (20.139) might seem inconvenient because of the mixture of the variables ω and ω̃. Yet, that causes no problem if f(ϕ, θ, ω) is an even function with respect to ω (see Sect. 20.2.6).

20.2.6 Irreducible Characters of SO(3) and Their Orthogonality

To evaluate (20.139), let us get back to the irreducible representations D^{(l)}(α, β, γ) of Sect. 20.2.3. Also, we recall how successive coordinate transformations produce changes in the orthogonal matrices of transformation (Sect. 17.4.2). Since the parameters α, β, and γ uniquely specify an individual rotation, different sets of α, β, and γ (in this section we use ω instead of γ) cause different irreducible representations. A corresponding rotation matrix is given by (17.101); in fact, the said matrix is a representation of the rotation. Keeping these points in mind, we discuss irreducible representations and their characters, particularly in relation to the orthogonality of the irreducible characters. Figure 20.4 depicts a geometrical arrangement of two rotation axes accompanied by the same rotation angle ω. Let R_ω and R′_ω be two such rotations around the rotation axes A and A′, respectively. Let Q be another rotation that transforms the rotation axis A into A′. Then we have [1, 2]

R′_ω = Q R_ω Q^{−1}.   (20.140)

This implies that R_ω and R′_ω belong to the same conjugacy class.

[Fig. 20.4 Geometrical arrangement of two rotation axes A and A′ accompanied by the same rotation angle ω]

To consider the situation more clearly, let us view the two rotations R_ω and R′_ω from two different coordinate systems, say some xyz-coordinate system and another x′y′z′-coordinate system. Suppose also that A coincides with the z-axis and that A′ coincides with the z′-axis. Meanwhile, if we describe R_ω in reference to the xyz-system and R′_ω in reference to the x′y′z′-system, the representations of these rotations must be identical in reference to the individual coordinate systems. Let this common representation matrix be R̄_ω.

(i) If we describe R_ω and R′_ω with respect to the xyz-system, we have

R_ω = R̄_ω,  R′_ω = (Q^{−1})^{−1} R̄_ω Q^{−1} = Q R̄_ω Q^{−1}.   (20.141)

Namely, we reproduce

R′_ω = Q R_ω Q^{−1}.   (20.140)

(ii) If we describe R_ω and R′_ω with respect to the x′y′z′-system, we have

R′_ω = R̄_ω,  R_ω = Q^{−1} R̄_ω Q.   (20.142)

Hence, again we reproduce (20.140). That is, the relation (20.140) is independent of specific choices of the coordinate system. Taking the representation indexed by l of Sect. 20.2.4 with respect to (20.140), we have

D^{(l)}(R′_ω) = D^{(l)}(Q R_ω Q^{−1}) = D^{(l)}(Q) D^{(l)}(R_ω) D^{(l)}(Q^{−1}) = D^{(l)}(Q) D^{(l)}(R_ω) [D^{(l)}(Q)]^{−1},   (20.143)

where with the last equality we used (18.6). Operating with D^{(l)}(Q) on (20.143) from the right, we get

D^{(l)}(R′_ω) D^{(l)}(Q) = D^{(l)}(Q) D^{(l)}(R_ω).   (20.144)

Since both D^{(l)}(R′_ω) and D^{(l)}(R_ω) are irreducible, from (18.41) of Schur's First Lemma D^{(l)}(R′_ω) and D^{(l)}(R_ω) are equivalent. The infinite number of rotations determined by the orientation of the rotation axis A, which is specified by an azimuthal angle α (0 ≤ α < 2π) and a zenithal angle β (0 ≤ β < π) (see Fig. 17.18), form a conjugacy class with each specific ω shared. Thus, we have classified the various representations D^{(l)}(R_ω) according to ω and l. Meanwhile, we may identify D^{(l)}(R_ω) with D^{(l)}(α, β, γ) of Sect. 20.2.4. From Sect. 20.2.2 we know that the spherical surface harmonics Y_l^m(θ, ϕ) (m = −l, ⋯, 0, ⋯, l) constitute basis functions of D^{(l)} and span the representation space. The dimension of the matrix (or of the representation space) is 2l + 1. Therefore, D^{(l)} and D^{(l′)} for different l and l′ have different dimensions. From (18.41) of Schur's First Lemma, such D^{(l)} and D^{(l′)} are inequivalent. Returning to (20.139), let us evaluate the mean value f̄(ϕ, θ, ω). Let the traces of D^{(l)}(R′_ω) and D^{(l)}(R_ω) be χ^{(l)}(R′_ω) and χ^{(l)}(R_ω), respectively. Remembering (12.13), the trace is invariant under a similarity transformation, and so χ^{(l)}(R′_ω) = χ^{(l)}(R_ω). Then, we put

χ^{(l)}(ω) ≡ χ^{(l)}(R_ω) = χ^{(l)}(R′_ω).   (20.145)

Consequently, it suffices to evaluate the trace χ (l )(ω) using D(l )(α, 0, 0) of (20.96) whose representation matrix is given by (20.97). We have

χ^{(l)}(ω) = Σ_{m=−l}^{l} e^{imω} = e^{−ilω} [1 − e^{iω(2l+1)}] / (1 − e^{iω}) = e^{−ilω} (1 − e^{−iω}) [1 − e^{iω(2l+1)}] / [(1 − e^{iω})(1 − e^{−iω})],   (20.146)

where

[numerator of (20.146)] = e^{−ilω} + e^{ilω} − e^{iω(l+1)} − e^{−iω(l+1)} = 2[cos lω − cos(l + 1)ω] = 4 sin[(l + 1/2)ω] sin(ω/2)

and

[denominator of (20.146)] = 2(1 − cos ω) = 4 sin²(ω/2).

Thus, we get

χ^{(l)}(ω) = sin[(l + 1/2)ω] / sin(ω/2).   (20.147)
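The geometric-series evaluation leading to (20.147) is easily spot-checked by summing e^{imω} directly and comparing with the closed form; a short sketch (helper name ours):

```python
import cmath, math

def chi(l, w):
    # character of D^(l): sum of e^{i m w} over m = -l, ..., l
    return sum(cmath.exp(1j*m*w) for m in range(-l, l + 1)).real

for l in (0, 1, 2, 3):
    for w in (0.4, 1.3, 2.9):
        assert abs(chi(l, w) - math.sin((l + 0.5)*w)/math.sin(w/2)) < 1e-12
```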

To adapt (18.76) to the present formulation, we rewrite it as

(1/n) Σ_g χ^{(α)}(g)* χ^{(β)}(g) = δ_{αβ}.   (20.148)

The summation over the finite number n of group elements in (20.148) should be read as an integration in the case of continuous groups. Instead of a finite number of irreducible representations as in a finite group, we are thinking of an infinite number of irreducible representations with continuous groups. In (20.139) the denominator ∫dΠ (= 8π²) corresponds to n of (20.148). If f(ϕ, θ, ω) or f(α, β, ω) is an even function with respect to ω (−π ≤ ω < π), as in the case of (20.147), the numerator of (20.139) can be expressed in the form

∫f(α, β, ω) dΠ = ∫₀^π dω ∫₀^π dβ ∫₀^{2π} dα f(α, β, ω) [4 sin²(ω/2)/ω²] ω² sin β
= 4 ∫₀^π dω ∫₀^π dβ ∫₀^{2π} dα f(α, β, ω) sin²(ω/2) sin β.   (20.149)

If, moreover, f(α, β, ω) depends only on ω, again as in the case of the character described by (20.147), the calculation is further simplified such that

∫f(ω) dΠ = 16π ∫₀^π dω f(ω) sin²(ω/2).   (20.150)

Replacing f(ω) with χ^{(l′)}(ω)* χ^{(l)}(ω), we have

∫χ^{(l′)}(ω)* χ^{(l)}(ω) dΠ = 16π ∫₀^π dω χ^{(l′)}(ω)* χ^{(l)}(ω) sin²(ω/2)
= 16π ∫₀^π dω {sin[(l′ + 1/2)ω] sin[(l + 1/2)ω] / sin²(ω/2)} sin²(ω/2)
= 16π ∫₀^π dω sin[(l′ + 1/2)ω] sin[(l + 1/2)ω]
= 8π ∫₀^π dω [cos(l′ − l)ω − cos(l′ + l + 1)ω].   (20.151)

Since l′ + l + 1 > 0, the integral of the cos(l′ + l + 1)ω term vanishes. Only if l′ − l = 0 does the integral of the cos(l′ − l)ω term not vanish, and then it takes the value π. Therefore, we have

∫χ^{(l′)}(ω)* χ^{(l)}(ω) dΠ = 8π² δ_{l′l}.   (20.152)

Finally, with the mean value \overline{χ^{(l′)}(ω)* χ^{(l)}(ω)} defined as in (20.139), we get

\overline{χ^{(l′)}(ω)* χ^{(l)}(ω)} = δ_{l′l}.   (20.153)
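The orthogonality (20.152) can likewise be verified by direct quadrature of (20.151); a sketch with our own helper name (the integrand χ^{(l₁)} χ^{(l₂)} sin²(ω/2) collapses to a product of sines):

```python
import math

def char_inner(l1, l2, n=20000):
    # 16π ∫0^π χ^(l1)(w) χ^(l2)(w) sin^2(w/2) dw by the midpoint rule
    h = math.pi/n
    s = 0.0
    for i in range(n):
        w = (i + 0.5)*h
        s += math.sin((l1 + 0.5)*w)*math.sin((l2 + 0.5)*w)
    return 16*math.pi*s*h

assert abs(char_inner(2, 2) - 8*math.pi**2) < 1e-4
assert abs(char_inner(2, 3)) < 1e-4
```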

This relation gives the orthogonality condition, in concert with (20.148) obtained in the case of a finite group. In Sect. 18.6 we have shown that the number of inequivalent irreducible representations of a finite group is well-defined and given by the number of conjugacy classes of that group. This is clearly demonstrated in (18.144). In the case of continuous groups, however, the situation is somewhat complicated and the number of inequivalent irreducible representations is not obvious. We address this problem as follows: Suppose that Ω(α, β, ω) were an irreducible


representation that is inequivalent to each D^{(l)}(α, β, ω) (l = 0, 1, 2, ⋯). Let K(ω) be the character of Ω(α, β, ω). Defining f(ω) ≡ χ^{(l)}(ω)* K(ω), (20.150) reads as

∫χ^{(l)}(ω)* K(ω) dΠ = 16π ∫₀^π dω χ^{(l)}(ω)* K(ω) sin²(ω/2).   (20.154)

The orthogonality condition (20.153) demands that the integral of (20.154) vanish. Similarly, we would have

∫χ^{(l+1)}(ω)* K(ω) dΠ = 16π ∫₀^π dω χ^{(l+1)}(ω)* K(ω) sin²(ω/2),   (20.155)

which must vanish as well. Subtracting the RHS of (20.154) from the RHS of (20.155) and taking account of the fact that χ^{(l)}(ω) is real [see (20.147)], we have

∫₀^π dω [χ^{(l+1)}(ω) − χ^{(l)}(ω)] K(ω) sin²(ω/2) = 0.   (20.156)

Invoking the trigonometric formula

sin a − sin b = 2 cos[(a + b)/2] sin[(a − b)/2]

and applying it to (20.147), (20.156) can be rewritten as

2 ∫₀^π dω [cos(l + 1)ω] K(ω) sin²(ω/2) = 0  (l = 0, 1, 2, ⋯).   (20.157)

Putting l = 0 in the LHS of (20.154) and using the assumption that the integral of (20.154) vanishes, we also have

∫₀^π dω K(ω) sin²(ω/2) = 0,   (20.158)

where we used χ^{(0)}(ω) = 1. Then, (20.157) and (20.158) are combined to give

∫₀^π dω (cos lω) K(ω) sin²(ω/2) = 0  (l = 0, 1, 2, ⋯).   (20.159)

This implies that all the Fourier cosine coefficients of K(ω) sin²(ω/2) vanish. Considering that cos lω (l = 0, 1, 2, ⋯) forms a complete orthogonal system on the interval [0, π], we must have

K(ω) sin²(ω/2) ≡ 0.   (20.160)

Requiring K(ω) to be continuous, (20.160) is equivalent to K(ω) ≡ 0. This implies that there is no inequivalent irreducible representation other than the D^{(l)}(α, β, ω) (l = 0, 1, 2, ⋯). In other words, the representations D^{(l)}(α, β, ω) (l = 0, 1, 2, ⋯) constitute a complete set of irreducible representations. The spherical surface harmonics Y_l^m(θ, ϕ) (m = −l, ⋯, 0, ⋯, l) constitute basis functions of D^{(l)} and accordingly span the representation space of D^{(l)}(α, β, ω). In the above discussion, we assumed that K(ω) is an even function with respect to ω, as in the case of χ^{(l)}(ω) in (20.147). Hence, K(ω) sin²(ω/2) is an even function as well. Therefore, we assumed that K(ω) sin²(ω/2) can be expanded in a Fourier cosine series.

20.3 Clebsch-Gordan Coefficients of Rotation Groups

In Sect. 18.8, we studied direct-product representations. We know that even though D^{(α)} and D^{(β)} are both irreducible, D^{(α×β)} is not necessarily irreducible. This is a common feature of both finite and infinite groups. Moreover, the reducible unitary representation can be expressed as a direct sum of irreducible representations such that

D^{(α)}(g) ⊗ D^{(β)}(g) = Σ_γ ⊕ q_γ D^{(γ)}(g).   (18.199)

In infinite groups, typically the continuous groups, this situation often appears when we deal with the addition of two angular momenta. Examples include the addition of two (or more) orbital angular momenta and that of an orbital angular momentum and a spin angular momentum [6, 8]. Meanwhile, there is a set of basis vectors that spans the representation space of each individual irreducible representation. Here a question arises: What are the basis vectors that span the total reduced representation space described by (18.199)? The answer is that these vectors must be constructed from the basis vectors relevant to the individual irreducible representations. The Clebsch-Gordan coefficients appear as the coefficients of a


linear combination of the basis vectors associated with those irreducible representations. In short, finding the Clebsch-Gordan coefficients is equivalent to calculating the proper coefficients of the basis functions that span the representation space of the direct-product groups. The relevant continuous groups in which we are interested are SU(2) and SO(3).

20.3.1 Direct-Product of SU(2) and Clebsch-Gordan Coefficients

In Sect. 20.2.6 we explained the orthogonality of the irreducible characters of SO(3) by deriving (20.153). It takes more advanced theory to show the orthogonality of the irreducible characters of SU(2); fortunately, however, we have an expression similar to (20.153) for SU(2) as well. Readers are encouraged to look up the appropriate literature for this [9]. On the basis of this important expression, we further develop the representation theory of the continuous groups. Within this framework, we calculate the Clebsch-Gordan coefficients. As in the previous section, the character of the irreducible representation D^{(j)}(α, β, γ) of SU(2) is described only as a function of a rotation angle. Similarly to (20.147), in SU(2) we get

χ^{(j)}(ω) = Σ_{m=−j}^{j} e^{imω} = sin[(j + 1/2)ω] / sin(ω/2),   (20.161)

where χ ( j )(ω) is an irreducible character of D( j )(α, β, γ). It is because we can align the rotation axis in the direction of, e.g., the z-axis and the representation matrix of a rotation angle ω is typified by D( j )(ω, 0, 0) or D( j )(0, 0, ω). Or, we can evenly choose any direction of the rotation axis at will. We are interested in calculating angular momenta of a coupled system, i.e. addition of angular momenta of that system [5, 6]. The word of “coupled system” needs some explanation. This is because on the one hand we may deal with, e.g., a sum of an orbital angular momentum and a spin angular momentum of a “single” electron, but on the other hand we might calculate, e.g., a sum of two angular momenta of “two” electrons. In either case, we will deal with the total angular momenta of the coupled system. Now, let us consider a direct-product group for this kind of problem. Suppose that one system is characterized by a quantum number j1 and the other by j2. Here these numbers are supposed to be those of (3.89), namely the highest positive number of the generalized angular momentum in the z-direction. That is, the representation matrices of rotation for systems 1 and 2 are assumed to be Dðj1 Þ ðω, 0, 0Þ and Dðj2 Þ ðω, 0, 0Þ, respectively. Then, we are to deal with the direct-product representation Dðj1 Þ  Dðj2 Þ (see Sect. 18.8). As in (18.193), we describe


D^{(j₁×j₂)}(ω) = D^{(j₁)}(ω) ⊗ D^{(j₂)}(ω).   (20.162)

In (20.162), the group element is represented by the rotation angle ω, more strictly by the rotation operation through an angle ω. The quantum numbers j₁ and j₂ label the irreducible representations of the rotations for systems 1 and 2, respectively. According to (20.161), we have χ^{(j₁×j₂)}(ω) = χ^{(j₁)}(ω) χ^{(j₂)}(ω), where χ^{(j₁×j₂)}(ω) gives the character of the direct-product representation; see (18.198). Furthermore, we have [5]

χ^{(j₁×j₂)}(ω) = Σ_{m₁=−j₁}^{j₁} e^{im₁ω} Σ_{m₂=−j₂}^{j₂} e^{im₂ω} = Σ_{m₁=−j₁}^{j₁} Σ_{m₂=−j₂}^{j₂} e^{i(m₁+m₂)ω}
= Σ_{J=|j₁−j₂|}^{j₁+j₂} Σ_{M=−J}^{J} e^{iMω} = Σ_{J=|j₁−j₂|}^{j₁+j₂} χ^{(J)}(ω),   (20.163)

where the number J can be chosen from among

J = |j₁ − j₂|, |j₁ − j₂| + 1, ⋯, j₁ + j₂.   (20.164)

Rewriting (20.163) more explicitly, we have

χ^{(j₁×j₂)}(ω) = χ^{(|j₁−j₂|)}(ω) + χ^{(|j₁−j₂|+1)}(ω) + ⋯ + χ^{(j₁+j₂)}(ω).   (20.165)
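The Clebsch-Gordan series (20.165) is a trigonometric identity for the characters and can be spot-checked numerically (helper name ours; integer j for simplicity):

```python
import math

def chi(j, w):
    # SO(3)/SU(2) character sin((j+1/2)w)/sin(w/2); integer j here
    return math.sin((j + 0.5)*w)/math.sin(w/2)

j1, j2, w = 3, 2, 0.77
lhs = chi(j1, w)*chi(j2, w)
rhs = sum(chi(J, w) for J in range(abs(j1 - j2), j1 + j2 + 1))
assert abs(lhs - rhs) < 1e-12
```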

If both j₁ and j₂ are integers, from (20.165) we can immediately derive an important result. In light of (18.199) and (18.200), we multiply both sides of (20.165) by χ^{(k)}(ω)* and then integrate to get

∫χ^{(k)}(ω)* χ^{(j₁×j₂)}(ω) dΠ = ∫χ^{(k)}(ω)* χ^{(|j₁−j₂|)}(ω) dΠ + ∫χ^{(k)}(ω)* χ^{(|j₁−j₂|+1)}(ω) dΠ + ⋯ + ∫χ^{(k)}(ω)* χ^{(j₁+j₂)}(ω) dΠ
= 8π² (δ_{k,|j₁−j₂|} + δ_{k,|j₁−j₂|+1} + ⋯ + δ_{k,j₁+j₂}),   (20.166)

where k is zero or a positive integer and with the last equality we used (20.152). Dividing both sides by ∫dΠ, we get

\overline{χ^{(k)}(ω)* χ^{(j₁×j₂)}(ω)} = δ_{k,|j₁−j₂|} + δ_{k,|j₁−j₂|+1} + ⋯ + δ_{k,j₁+j₂},   (20.167)

where we used (20.153). Equation (20.166) implies that \overline{χ^{(k)}(ω)* χ^{(j₁×j₂)}(ω)} does not vanish only if k is identical to one of |j₁ − j₂|, |j₁ − j₂| + 1, ⋯, j₁ + j₂. If, for example, we choose |j₁ − j₂| for k in (20.167), we have

\overline{χ^{(|j₁−j₂|)}(ω)* χ^{(j₁×j₂)}(ω)} = δ_{|j₁−j₂|,|j₁−j₂|} = 1.   (20.168)
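Projecting out the multiplicities as in (20.166)-(20.168) can be imitated numerically: the normalized integral returns 1 for J inside the range (20.164) and 0 otherwise. A midpoint-rule sketch (helper name and grid size ours):

```python
import math

def multiplicity(k, j1, j2, n=50000):
    # (1/8π²) ∫ χ^(k)* χ^(j1×j2) dΠ, evaluated via (20.150) by the midpoint rule
    h = math.pi/n
    s = 0.0
    for i in range(n):
        w = (i + 0.5)*h
        chi = lambda j: math.sin((j + 0.5)*w)/math.sin(w/2)
        s += chi(k)*chi(j1)*chi(j2)*math.sin(w/2)**2
    return 16*math.pi*s*h/(8*math.pi**2)

assert abs(multiplicity(1, 2, 1) - 1) < 1e-3   # J = 1 occurs once in j1=2, j2=1
assert abs(multiplicity(4, 2, 1)) < 1e-3       # J = 4 lies outside |j1-j2|..j1+j2
```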

This relation corresponds to (18.200), where a finite group was relevant. Note that the volume of SO(3), i.e., ∫dΠ = 8π², corresponds to the order n of a finite group in (20.148). Considering (18.199) and (18.200) once again, it follows that in the above case D^{(|j₁−j₂|)} takes place once and only once in the direct-product representation D^{(j₁×j₂)}. Thus, we obtain the following important relation:

D^{(j₁×j₂)} = D^{(j₁)}(ω) ⊗ D^{(j₂)}(ω) = Σ_{J=|j₁−j₂|}^{j₁+j₂} ⊕ D^{(J)}.   (20.169)

Any group (finite or infinite) is said to be simply reducible if each irreducible representation takes place at most once when a direct-product representation of the group in question is reduced. Equation (20.169) clearly shows that SO(3) is simply reducible. With respect to SU(2) we also have the same expression (20.169) [9]. In Sect. 18.8 we examined the direct-product representation of two (irreducible) representations. Let D^{(μ)}(g) and D^{(ν)}(g) be two different irreducible representations of a group G, where G may be either a finite or an infinite group. Note that in Sect. 18.8 we focused on finite groups, whereas now we are thinking of an infinite group, typically SU(2) or SO(3). Let n_μ and n_ν be the dimensions of the representation spaces of the irreducible representations μ and ν, respectively. We assume that each representation space is spanned by the following basis functions:

ψ_j^{(μ)} (j = 1, ⋯, n_μ)  and  ϕ_l^{(ν)} (l = 1, ⋯, n_ν).

We know from Sect. 18.8 that n_μ n_ν new basis functions can be constructed from the products ψ_j^{(μ)} ϕ_l^{(ν)}. Our next task will be to classify these n_μ n_ν functions into those belonging to the different irreducible representations. For this, we need the relation (18.199). According to custom, we rewrite it as follows:

D^{(μ)}(ω) ⊗ D^{(ν)}(ω) = Σ_γ ⊕ (μνγ) D^{(γ)}(ω),   (20.170)

where ω denotes a group element (i.e., a rotation angle); (μνγ) is identical with q_γ on the RHS of (18.199) and indicates how many times the same representation γ takes place in the direct-product representation. Naturally, we have

(μνγ) = (νμγ).

Then, we get

Σ_γ (μνγ) n_γ = n_μ n_ν.

Meanwhile, we should be able to construct functions that have the same transformation properties as those of the irreducible representation γ; in other words, these functions should form basis functions belonging to γ. Let such a function be Ψ_s^{(γτ_γ)}. Then, we have

Ψ_s^{(γτ_γ)} = Σ_{j,l} (μj, νl | γτ_γ s) ψ_j^{(μ)} ϕ_l^{(ν)},   (20.171)

where τ_γ = 1, ⋯, (μνγ) and s stands for the number that designates the basis functions of the irreducible representation γ, i.e., s = 1, ⋯, n_γ. (We often designate the basis functions with a negative integer. In that case, we choose that negative integer for s; see Examples 20.1 and 20.2 shown later.) If γ takes place more than once, we label it γτ_γ. The coefficients (μj, νl | γτ_γ s) with respect to the above linear combination are called Clebsch-Gordan coefficients. Readers might well be bewildered by the notation of abstract algebra, but the Examples of Sect. 20.3.3 should greatly relieve any anxiety about it.

As we shall see later in this chapter, the functions Ψ_s^{(γτ_γ)} are going to be normalized along with the product functions ψ_j^{(μ)} ϕ_l^{(ν)}. The Clebsch-Gordan coefficients must form a unitary matrix accordingly; notice that a transformation between two orthonormal basis sets is unitary (see Chap. 14). Since there are n_μ n_ν product functions as the basis vectors, the Clebsch-Gordan coefficients are associated with (n_μ n_ν, n_μ n_ν) square matrices. The said coefficients can be defined for any group, finite or infinite. Rather, difficulty in determining the Clebsch-Gordan coefficients arises when τ_γ is more than one. This is because we should take account of linear combinations described by

τγ

ðγ τ Þ c γ τ γ Ψs γ

that belong to the irreducible representation γ. This would cause arbitrariness and ambiguity [5]. Nevertheless, we do not need to get into further discussion about this


problem. From now on, we focus on the Clebsch–Gordan coefficients of SU(2) and SO(3), typical simply reducible groups; namely, τ_γ is at most one. That SU(2) and SO(3) are simply reducible saves us a lot of labor. The argument is as follows: Equation (20.169) shows how the dimension (2j₁ + 1)(2j₂ + 1) of the representation space of the direct product of two representations is redistributed to individual irreducible representations that each take place only once. The arithmetic based on this fact is

(2j_1 + 1)(2j_2 + 1) = 2(j_1 - j_2)(2j_2 + 1) + 2\cdot\tfrac{1}{2}\cdot 2j_2 (2j_2 + 1) + (2j_2 + 1)
 = 2(j_1 - j_2)(2j_2 + 1) + 2(0 + 1 + 2 + ⋯ + 2j_2) + (2j_2 + 1)
 = [2(j_1 - j_2) + 0] + 2[(j_1 - j_2) + 1] + 2[(j_1 - j_2) + 2] + ⋯ + 2[(j_1 - j_2) + 2j_2] + (2j_2 + 1)
 = [2(j_1 - j_2) + 1] + \{2[(j_1 - j_2) + 1] + 1\} + ⋯ + [2(j_1 + j_2) + 1], \qquad (20.172)

where with the second equality we used the formula 1 + 2 + ⋯ + n = ½n(n + 1); with the third equality, the resulting numbers 0, 1, 2, ⋯, 2j₂ are distributed among the 2j₂ + 1 terms; and on the rightmost side the remaining (2j₂ + 1) is likewise distributed as +1 to each of the 2j₂ + 1 terms. Equation (20.172) is symmetric in j₁ and j₂, but if we assume j₁ ≥ j₂, each term on the rightmost side of (20.172) is positive. Therefore, for convenience we assume j₁ ≥ j₂ without loss of generality. To clarify the situation, we list several tables (Tables 20.1, 20.2, and 20.3). Tables 20.1 and 20.2 show specific cases, whereas Table 20.3 represents the general case. In these tables the topmost layer contains a diagram corresponding to the case J = |j₁ − j₂| with 2|j₁ − j₂| + 1 different quantum states. The lower layers contain diagrams corresponding to the cases J = |j₁ − j₂| + 1, ⋯, j₁ + j₂. For each J, 2J + 1 functions (or quantum states) are included and labeled according to the different numbers M ≡ m₁ + m₂ = −J, −J + 1, ⋯, J. The rightmost column displays J = |j₁ − j₂|, |j₁ − j₂| + 1, ⋯, j₁ + j₂ from the topmost row through the bottommost row. Tables 20.1, 20.2, and 20.3 comprise 2j₂ + 1 layers, in which the total (2j₁ + 1)(2j₂ + 1) functions forming the basis vectors of D^{(j₁)}(ω) ⊗ D^{(j₂)}(ω) are redistributed according to the number M. The same numbers appearing from the topmost row through the bottommost row are marked red and connected with dotted aqua lines. These lines form a parallelogram together with the top and bottom horizontal lines. The upper and lower sides of the parallelogram are 2|j₁ − j₂| in width. The parallelogram gets flattened with decreasing |j₁ − j₂|; if j₁ = j₂, it coalesces into a line (Table 20.1). Regarding the notations M and J, see Sects. 20.3.2 and 20.3.3 (vide infra).
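The redistribution (20.172) and the layer structure of the tables can be checked with a short script. This is a sketch; the function name `coupled_layers` is ours, and only the J and M values come from the text:

```python
from fractions import Fraction

def coupled_layers(j1, j2):
    """List (J, [M values]) for the coupling D^(j1) x D^(j2); J = |j1-j2|, ..., j1+j2."""
    j1, j2 = Fraction(j1), Fraction(j2)
    J = abs(j1 - j2)
    layers = []
    while J <= j1 + j2:
        # M runs over -J, -J+1, ..., J for each layer
        layers.append((J, [M - J for M in range(int(2 * J) + 1)]))
        J += 1
    return layers

# Table 20.1: j1 = j2 = 1 gives J = 0, 1, 2
layers = coupled_layers(1, 1)
assert [J for J, _ in layers] == [0, 1, 2]
# Dimension count of (20.172): sum of (2J+1) equals (2j1+1)(2j2+1)
assert sum(2 * J + 1 for J, _ in layers) == 9
# Table 20.2: j1 = 2, j2 = 1 gives J = 1, 2, 3 with 15 states in total
assert sum(2 * J + 1 for J, _ in coupled_layers(2, 1)) == 15
```

The same count works for half-odd integers, e.g. `coupled_layers(Fraction(3, 2), Fraction(1, 2))` yields J = 1, 2 with 3 + 5 = 8 states.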
Bearing in mind the above remarks, we focus on the coupling of two angular momenta as a typical example. Within this framework we deal with the Clebsch–Gordan coefficients of SU(2). We develop the discussion according to


Table 20.1 Summation of angular momenta (j₁ = j₂ = 1). J = 0, 1, or 2; M ≡ −J, −J + 1, ⋯, J

J = 0:                 0
J = 1:            −1   0   1
J = 2:       −2   −1   0   1   2

Regarding a dotted aqua line, see text

Table 20.2 Summation of angular momenta (j₁ = 2, j₂ = 1). J = 1, 2, or 3; M ≡ −J, −J + 1, ⋯, J

J = 1:                 −1   0   1
J = 2:            −2   −1   0   1   2
J = 3:       −3   −2   −1   0   1   2   3

Table 20.3 Summation of angular momenta in a general case

J = j₁ − j₂:          −(j₁ − j₂),   −(j₁ − j₂) + 1,   ⋯,   j₁ − j₂
J = j₁ − j₂ + 1:      −(j₁ − j₂) − 1,   ⋯,   (j₁ − j₂) + 1
⋮
J = j₁ + j₂:          −(j₁ + j₂),   −(j₁ + j₂) + 1,   ⋯,   j₁ + j₂

We suppose j₁ > 2j₂. J = |j₁ − j₂|, ⋯, j₁ + j₂; M ≡ −J, −J + 1, ⋯, J

the literature [5]. The relevant approach helps address more complicated situations where the coupling of three or more angular momenta, including j–j coupling and L–S coupling, is involved [6]. From (20.47) we know that f_m forms a basis set that spans the (2j + 1)-dimensional representation space associated with D^{(j)} such that

f_m = \frac{u^{j+m} v^{j-m}}{\sqrt{(j+m)!\,(j-m)!}} \quad (m = -j, -j+1, ⋯, j-1, j). \qquad (20.47)

The functions f_m are transformed according to R(a, b) so that we have

R(a, b)(f_m) = \sum_{m'} f_{m'} D^{(j)}_{m'm}(a, b), \qquad (20.48)

where j can be either an integer or a half-odd integer. We henceforth work with f_m of (20.47).

20.3.2 Calculation Procedures of Clebsch–Gordan Coefficients

In the previous section we described how the quantum states are systematically made up of two angular momenta. We express those states as linear combinations of the basis functions related to the direct-product representation of (20.169). To determine the coefficients, we follow the calculation procedures due to Hamermesh [5]. The procedures are rather lengthy, so we summarize each item separately.

1. Invariant quantities A_J and B_J: We start by seeking quantities invariant under the unitary transformation U of (20.35) expressed as

U = \begin{pmatrix} a & b \\ -b^{*} & a^{*} \end{pmatrix} \quad \text{with} \quad |a|^2 + |b|^2 = 1. \qquad (20.35)

As in (20.46), let us consider combinations of variables u ≡ (u₂ u₁) and v ≡ (v₂ v₁) that are transformed such that

(u_2' \;\; u_1') = (u_2 \;\; u_1) \begin{pmatrix} a & b \\ -b^{*} & a^{*} \end{pmatrix}, \qquad (v_2' \;\; v_1') = (v_2 \;\; v_1) \begin{pmatrix} a & b \\ -b^{*} & a^{*} \end{pmatrix}. \qquad (20.173)

For a shorthand notation we have

u' = um, \qquad v' = vm, \qquad (20.174)

where we define u′, v′, and m as

u' ≡ (u_2' \;\; u_1'), \quad v' ≡ (v_2' \;\; v_1'), \quad m ≡ \begin{pmatrix} a & b \\ -b^{*} & a^{*} \end{pmatrix}. \qquad (20.175)

As in (20.47) we also define the following functions:

ψ^{j_1}_{m_1} = \frac{u_1^{\,j_1+m_1} u_2^{\,j_1-m_1}}{\sqrt{(j_1+m_1)!\,(j_1-m_1)!}} \quad (m_1 = -j_1, -j_1+1, ⋯, j_1-1, j_1),

φ^{j_2}_{m_2} = \frac{v_1^{\,j_2+m_2} v_2^{\,j_2-m_2}}{\sqrt{(j_2+m_2)!\,(j_2-m_2)!}} \quad (m_2 = -j_2, -j_2+1, ⋯, j_2-1, j_2). \qquad (20.176)

Meanwhile, we also consider a combination of variables x ≡ (x₂ x₁) that is transformed as

(x_2' \;\; x_1') = (x_2 \;\; x_1) \begin{pmatrix} a^{*} & b^{*} \\ -b & a \end{pmatrix}. \qquad (20.177)

For a shorthand notation we have

x' = x\bar{m}, \qquad (20.178)

where

x' ≡ (x_2' \;\; x_1'), \quad x ≡ (x_2 \;\; x_1), \quad \bar{m} = \begin{pmatrix} a^{*} & b^{*} \\ -b & a \end{pmatrix}. \qquad (20.179)

The unitary matrix m̄ is said to be the complex conjugate matrix of m. We say that in (20.174) u and v are transformed covariantly and that in (20.178) x is transformed contravariantly. Defining a unitary matrix g as

g ≡ \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \qquad (20.180)

and operating with g on both sides of (20.174), we have

u'g = umg = ug\,(g^{†}mg) = ug\bar{m}. \qquad (20.181)

Rewriting (20.181) as

u'g = (ug)\bar{m}, \qquad (20.182)

we find that ug is transformed contravariantly. Taking the transposition of (20.178), we have


x'^{T} = \bar{m}^{T} x^{T} = m^{†} x^{T}.

Then, we get

u'x'^{T} = (um)\,m^{†}x^{T} = u\,(mm^{†})\,x^{T} = ux^{T} = u_2 x_2 + u_1 x_1, \qquad (20.183)

where with the third equality we used the unitarity of m. This implies that ux^{T} is an invariant quantity under the unitary transformation by m. We likewise have

v'x'^{T} = vx^{T}, \qquad (20.184)

meaning that vx^{T} is an invariant as well. In a similar manner, we have

u_1'v_2' - u_2'v_1' = v'(u'g)^{T} = v'g^{T}u'^{T} = vmg^{T}m^{T}u^{T} = vg^{T}u^{T} = v(ug)^{T} = (v_2 \;\; v_1)\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} u_2 \\ u_1 \end{pmatrix} = u_1 v_2 - u_2 v_1. \qquad (20.185)

Thus, v(ug)^{T} (or u₁v₂ − u₂v₁) is another invariant under the unitary transformation by m. In the above discussion we can view (20.183)–(20.185) as quadratic forms of two variables (see Sect. 14.5). Using these invariants, we define other important invariants A_J and B_J as follows [5]:

A_J ≡ (u_1 v_2 - u_2 v_1)^{j_1+j_2-J} (u_2 x_2 + u_1 x_1)^{j_1-j_2+J} (v_2 x_2 + v_1 x_1)^{j_2-j_1+J}, \qquad (20.186)

B_J ≡ (u_2 x_2 + u_1 x_1)^{2J}. \qquad (20.187)
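The invariance of the building blocks of A_J and B_J is easy to verify numerically. The sketch below assumes the conventions m = ((a, b), (−b*, a*)) and m̄ = ((a*, b*), (−b, a)) used in this section:

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=4)
a, b = complex(z[0], z[1]), complex(z[2], z[3])
r = (abs(a) ** 2 + abs(b) ** 2) ** 0.5
a, b = a / r, b / r                                       # |a|^2 + |b|^2 = 1

m = np.array([[a, b], [-b.conjugate(), a.conjugate()]])   # (20.175)
mbar = m.conj()                                           # (20.179)

u = rng.normal(size=2) + 1j * rng.normal(size=2)          # row vector (u2, u1)
v = rng.normal(size=2) + 1j * rng.normal(size=2)
x = rng.normal(size=2) + 1j * rng.normal(size=2)
up, vp, xp = u @ m, v @ m, x @ mbar        # u, v covariant; x contravariant

det = lambda w, s: w[1] * s[0] - w[0] * s[1]   # u1*v2 - u2*v1 in the (u2, u1) ordering
assert np.isclose(up @ xp, u @ x)              # (20.183): u x^T invariant
assert np.isclose(vp @ xp, v @ x)              # (20.184): v x^T invariant
assert np.isclose(det(up, vp), det(u, v))      # (20.185): u1 v2 - u2 v1 invariant

# Hence A_J of (20.186) is invariant; e.g. j1 = j2 = J = 1 gives exponents (1, 1, 1)
AJ = lambda uu, vv, xx: det(uu, vv) * (uu @ xx) * (vv @ xx)
assert np.isclose(AJ(up, vp, xp), AJ(u, v, x))
```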

The polynomial A_J is of degree 2j₁ in the covariant variables u₁ and u₂ and of degree 2j₂ in v₁ and v₂. Its degree in the contravariant variables x₁ and x₂ is 2J. Expanding A_J in powers of x₁ and x₂, we get

A_J = \sum_{M=-J}^{J} W_M^{J} X_M^{J}, \qquad (20.188)

where X_M^{J} is defined as

X_M^{J} ≡ \frac{x_1^{\,J+M} x_2^{\,J-M}}{\sqrt{(J+M)!\,(J-M)!}} \quad (M = -J, -J+1, ⋯, J-1, J) \qquad (20.189)

and the coefficients W_M^{J} are polynomials in u₁, u₂, v₁, and v₂. The coefficients W_M^{J} will be determined soon in combination with X_M^{J}. Meanwhile, we have


B_J = \sum_{k=0}^{2J} \frac{(2J)!}{k!\,(2J-k)!} (u_1 x_1)^{k} (u_2 x_2)^{2J-k}
 = (2J)! \sum_{M=-J}^{J} \frac{1}{(J+M)!\,(J-M)!} (u_1 x_1)^{J+M} (u_2 x_2)^{J-M}
 = (2J)! \sum_{M=-J}^{J} \frac{u_1^{\,J+M} u_2^{\,J-M}}{\sqrt{(J+M)!\,(J-M)!}} \cdot \frac{x_1^{\,J+M} x_2^{\,J-M}}{\sqrt{(J+M)!\,(J-M)!}}
 = (2J)! \sum_{M=-J}^{J} Φ_M^{J} X_M^{J}, \qquad (20.190)

where Φ_M^{J} is defined as

Φ_M^{J} ≡ \frac{u_1^{\,J+M} u_2^{\,J-M}}{\sqrt{(J+M)!\,(J-M)!}} \quad (M = -J, -J+1, ⋯, J-1, J). \qquad (20.191)

The implications of the above abstract-algebraic calculations are as follows: In Sect. 20.2.2 we determined the representation matrices D^{(j)}_{m'm} of SU(2) using (20.47) and (20.48). We see that the functions f_m (m = −j, −j+1, ⋯, j−1, j) of (20.47) are transformed as basis vectors by the unitary transformation D^{(j)}. Meanwhile, the functions Φ_M^{J} (M = −J, −J+1, ⋯, J−1, J) of (20.191) are transformed as basis vectors by the unitary transformation D^{(J)}. Here let us compare (20.188) and (20.190). We see that whereas Φ_M^{J} associates X_M^{J} with the invariant B_J, W_M^{J} associates X_M^{J} with the invariant A_J. Thus, W_M^{J} plays the same role as Φ_M^{J} and, hence, we expect that W_M^{J} is eligible as a set of basis vectors for D^{(J)} as well.

2. Binomial expansion of A_J: To calculate W_M^{J}, we expand A_J using the binomial theorem [5]:

(u_1 v_2 - u_2 v_1)^{j_1+j_2-J} = \sum_{λ=0}^{j_1+j_2-J} (-1)^{λ} \binom{j_1+j_2-J}{λ} (u_1 v_2)^{j_1+j_2-J-λ} (u_2 v_1)^{λ},

(u_2 x_2 + u_1 x_1)^{j_1-j_2+J} = \sum_{μ=0}^{j_1-j_2+J} \binom{j_1-j_2+J}{μ} (u_1 x_1)^{j_1-j_2+J-μ} (u_2 x_2)^{μ},

(v_2 x_2 + v_1 x_1)^{j_2-j_1+J} = \sum_{ν=0}^{j_2-j_1+J} \binom{j_2-j_1+J}{ν} (v_1 x_1)^{j_2-j_1+J-ν} (v_2 x_2)^{ν}. \qquad (20.192)

Therefore, we have

A_J = \sum_{λ, μ, ν} (-1)^{λ} \binom{j_1+j_2-J}{λ} \binom{j_1-j_2+J}{μ} \binom{j_2-j_1+J}{ν} u_1^{\,2j_1-λ-μ} u_2^{\,λ+μ} v_1^{\,j_2-j_1+J-ν+λ} v_2^{\,j_1+j_2-J-λ+ν} x_1^{\,2J-μ-ν} x_2^{\,μ+ν}. \qquad (20.193)

Introducing the new summation variables m₁ = j₁ − λ − μ and m₂ = J − j₁ + λ − ν, we obtain

A_J = \sum_{λ, μ, ν} (-1)^{λ} \binom{j_1+j_2-J}{λ} \binom{j_1-j_2+J}{j_1-λ-m_1} \binom{j_2-j_1+J}{j_2-λ+m_2} u_1^{\,j_1+m_1} u_2^{\,j_1-m_1} v_1^{\,j_2+m_2} v_2^{\,j_2-m_2} x_1^{\,J+m_1+m_2} x_2^{\,J-m_1-m_2}, \qquad (20.194)

where to derive \binom{j_2-j_1+J}{j_2-λ+m_2} on the RHS we used

\binom{a}{b} = \binom{a}{a-b}. \qquad (20.195)

Further using (20.176) and (20.189), we modify A_J such that

A_J = \sum_{m_1, m_2, λ} (-1)^{λ} \frac{(j_1+j_2-J)!\,(j_1-j_2+J)!\,(j_2-j_1+J)!}{λ!\,(j_1+j_2-J-λ)!\,(j_1-λ-m_1)!\,(J-j_2+λ+m_1)!\,(j_2-λ+m_2)!\,(J-j_1+λ-m_2)!}
 × [(j_1+m_1)!\,(j_1-m_1)!\,(j_2+m_2)!\,(j_2-m_2)!\,(J+m_1+m_2)!\,(J-m_1-m_2)!]^{1/2} \, ψ^{j_1}_{m_1} φ^{j_2}_{m_2} X^{J}_{m_1+m_2}.

Setting m₁ + m₂ = M, we have

A_J = \sum_{m_1, m_2, λ} (-1)^{λ} \frac{(j_1+j_2-J)!\,(j_1-j_2+J)!\,(j_2-j_1+J)!}{λ!\,(j_1+j_2-J-λ)!\,(j_1-λ-m_1)!\,(J-j_2+λ+m_1)!\,(j_2-λ+m_2)!\,(J-j_1+λ-m_2)!}
 × [(j_1+m_1)!\,(j_1-m_1)!\,(j_2+m_2)!\,(j_2-m_2)!\,(J+M)!\,(J-M)!]^{1/2} \, ψ^{j_1}_{m_1} φ^{j_2}_{m_2} X^{J}_{M}. \qquad (20.196)

In this way, we get W_M^{J} of (20.188) expressed as

W_M^{J} = (j_1+j_2-J)!\,(j_1-j_2+J)!\,(j_2-j_1+J)! \sum_{m_1, m_2;\; m_1+m_2=M} C^{J}_{m_1, m_2} ψ^{j_1}_{m_1} φ^{j_2}_{m_2}, \qquad (20.197)

where we define C^{J}_{m_1, m_2} as

C^{J}_{m_1, m_2} ≡ \sum_{λ} \frac{(-1)^{λ}\,[(j_1+m_1)!\,(j_1-m_1)!\,(j_2+m_2)!\,(j_2-m_2)!\,(J+M)!\,(J-M)!]^{1/2}}{λ!\,(j_1+j_2-J-λ)!\,(j_1-λ-m_1)!\,(J-j_2+λ+m_1)!\,(j_2-λ+m_2)!\,(J-j_1+λ-m_2)!}. \qquad (20.198)

From (20.197), we expect that the functions W_M^{J} (M = −J, −J+1, ⋯, J−1, J) are suited to form basis vectors of D^{(J)}. Then, any basis set Λ_M^{J} should be connected to W_M^{J} via a unitary transformation such that Λ_M^{J} = U W_M^{J}, where U is a unitary matrix. Then we get W_M^{J} = U^{†} Λ_M^{J}. What remains is only to normalize W_M^{J}. If we look carefully at the functional form of (20.197), we become aware that we only have to adjust the first factor of (20.197), which is a function of J alone. Hence, as the properly normalized functions we expect to have

Ψ_M^{J} = C(J) W_M^{J} = C(J) U^{†} Λ_M^{J}, \qquad (20.199)

where C(J) is a constant that depends only on J for given numbers j₁ and j₂. An implication of (20.199) is that we can get suitably normalized functions Ψ_M^{J} using arbitrary basis vectors Λ_M^{J}. Putting

ρ_J ≡ C(J)\,(j_1+j_2-J)!\,(j_1-j_2+J)!\,(j_2-j_1+J)!, \qquad (20.200)

we get

Ψ_M^{J} = ρ_J \sum_{m_1, m_2;\; m_1+m_2=M} C^{J}_{m_1, m_2} ψ^{j_1}_{m_1} φ^{j_2}_{m_2}. \qquad (20.201)

The combined states are completely determined by J and M. If we fix J at a certain number, (20.201) is expressed as a linear combination of different functions ψ^{j_1}_{m_1} φ^{j_2}_{m_2} with varying m₁ and m₂ but fixed m₁ + m₂ = M. Returning to Tables 20.1, 20.2, and 20.3, the same M can be found in at most 2j₂ + 1 places. This implies that Ψ_M^{J} of (20.201) has at most 2j₂ + 1 terms on condition that j₁ ≥ j₂. In (20.201) the Clebsch–Gordan coefficients are given by ρ_J C^{J}_{m_1, m_2}. The notations for the Clebsch–Gordan coefficients differ from one literature source to another [5, 6]. We adopt the description due to Hamermesh [5] such that


(j_1 m_1 j_2 m_2 | JM) ≡ ρ_J C^{J}_{m_1, m_2}, \qquad Ψ_M^{J} = \sum_{m_1, m_2;\; m_1+m_2=M} (j_1 m_1 j_2 m_2 | JM)\, ψ^{j_1}_{m_1} φ^{j_2}_{m_2}. \qquad (20.202)

3. Normalization of Ψ_M^{J}: The normalization condition for Ψ_M^{J} of (20.201) or (20.202) is given by

|ρ_J|^2 \sum_{m_1, m_2;\; m_1+m_2=M} |C^{J}_{m_1, m_2}|^2 = 1. \qquad (20.203)

To obtain (20.203), we assume the following normalization condition for the basis vectors ψ^{j_1}_{m_1} φ^{j_2}_{m_2} that span the representation space. That is, for an appropriate pair of complex variables z₁ and z₂, we define their inner product as follows [9]:

⟨ z_1^{l} z_2^{m-l} \mid z_1^{k} z_2^{m-k} ⟩ = l!\,(m-l)!\,δ_{lk},

i.e.,

\left\langle \frac{z_1^{l} z_2^{m-l}}{\sqrt{l!\,(m-l)!}} \;\middle|\; \frac{z_1^{k} z_2^{m-k}}{\sqrt{k!\,(m-k)!}} \right\rangle = δ_{lk}. \qquad (20.204)

Equation (20.204) implies that the m + 1 monomials of degree m in z₁ and z₂ constitute an orthonormal basis. Since ρ_J defined in (20.200) is independent of M, we can evaluate it by choosing M conveniently in (20.203). Setting M (= m₁ + m₂) = J and substituting it into J − j₁ + λ − m₂ of (20.198), we obtain (m₁ + m₂) − j₁ + λ − m₂ = λ + m₁ − j₁. So long as we are dealing with integers, the argument of a factorial must be non-negative, and so we have

λ + m_1 - j_1 ≥ 0 \quad \text{or} \quad λ ≥ j_1 - m_1. \qquad (20.205)

Meanwhile, looking at another factorial ( j1 - λ - m1)!, we get λ ≤ j1 - m 1 :

ð20:206Þ

From the above equations, we obtain only one choice for λ such that [5] λ = j1 - m 1 :

ð20:207Þ

Notice that j₁ − m₁ is an integer whether j₁ is an integer or a half-odd integer. Then, we get

C^{J}_{m_1, m_2} = (-1)^{j_1-m_1} \frac{\sqrt{(2J)!}}{(J-j_2+j_1)!\,(J-j_1+j_2)!} \sqrt{\frac{(j_1+m_1)!\,(j_2+m_2)!}{(j_1-m_1)!\,(j_2-m_2)!}}
 = (-1)^{j_1-m_1} \sqrt{\frac{(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!}} \sqrt{\binom{j_1+m_1}{j_2-m_2} \binom{j_2+m_2}{j_1-m_1}}. \qquad (20.208)

To confirm (20.208), calculate \binom{j_1+m_1}{j_2-m_2}\binom{j_2+m_2}{j_1-m_1} and use, e.g.,

j_1 - j_2 + m_1 + m_2 = j_1 - j_2 + M = j_1 - j_2 + J. \qquad (20.209)

Thus, we get

\sum_{m_1, m_2;\; m_1+m_2=M} |C^{J}_{m_1, m_2}|^2 = \frac{(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!} \sum_{m_1, m_2;\; m_1+m_2=M} \binom{j_1+m_1}{j_2-m_2} \binom{j_2+m_2}{j_1-m_1}. \qquad (20.210)

To evaluate the sum, we use the following relations [4, 5, 10]:

Γ(z)\,Γ(1-z) = π/\sin πz. \qquad (20.211)

Replacing z with x + 1 in (20.211), we have

Γ(x+1)\,Γ(-x) = π/\sin π(x+1). \qquad (20.212)

Meanwhile, using gamma functions, we have

\binom{x}{y} = \frac{x!}{y!\,(x-y)!} = \frac{Γ(x+1)}{y!\,Γ(x-y+1)} = \frac{Γ(x+1)\,Γ(-x)}{Γ(x-y+1)\,Γ(y-x)} \cdot \frac{Γ(y-x)}{y!\,Γ(-x)}
 = \frac{π}{\sin π(x+1)} \cdot \frac{\sin π(x-y+1)}{π} \cdot \frac{Γ(y-x)}{y!\,Γ(-x)} = \frac{\sin π(x-y+1)}{\sin π(x+1)} \cdot \frac{Γ(y-x)}{y!\,Γ(-x)} = (-1)^{y} \frac{Γ(y-x)}{y!\,Γ(-x)}. \qquad (20.213)

ð20:213Þ Replacing x in (20.213) with y - x - 1, we have

888

20 y-x-1 y

= ð- 1Þy 

Theory of Continuous Groups

Γðx þ 1Þ = ð- 1Þy  y!Γðx - y þ 1Þ

x y

,

where with the last equality we used (20.213). That is, we get = ð- 1Þy

x y

y-x-1 y

:

ð20:214Þ
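The reflection identity (20.214) holds for the generalized binomial coefficient with any integer upper argument, including the negative values exploited below. A quick check; the helper name `gbinom` is ours:

```python
from fractions import Fraction
from math import prod

def gbinom(x, y):
    """Generalized binomial coefficient C(x, y) = x(x-1)...(x-y+1)/y! for integer y >= 0."""
    return Fraction(prod(range(x, x - y, -1)), prod(range(1, y + 1)))

# (20.214): C(x, y) = (-1)^y C(y - x - 1, y), also for negative x
for x in range(-6, 7):
    for y in range(0, 6):
        assert gbinom(x, y) == (-1) ** y * gbinom(y - x - 1, y)
```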

Applying (20.214) to (20.210), we get

\sum_{m_1, m_2;\; m_1+m_2=M} |C^{J}_{m_1, m_2}|^2 = \frac{(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!} \sum_{m_1, m_2;\; m_1+m_2=M} (-1)^{j_2-m_2} (-1)^{j_1-m_1} \binom{j_2-m_2-j_1-m_1-1}{j_2-m_2} \binom{j_1-m_1-j_2-m_2-1}{j_1-m_1}
 = \frac{(-1)^{j_1+j_2-J}\,(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!} \sum_{m_1, m_2;\; m_1+m_2=M} \binom{j_2-j_1-J-1}{j_2-m_2} \binom{j_1-j_2-J-1}{j_1-m_1}. \qquad (20.215)

ð20:215Þ Meanwhile, from the binomial theorem, we have ð1 þ xÞr ð1 þ xÞs = α

r α x α

= ð 1 þ xÞ

rþs

β

= γ

s β x = β α, β rþs γ x: γ

r α

s αþβ x β

ð20:216Þ

Comparing the coefficients of the γ-th power monomials of the last three sides, we get rþs γ

= α, β, αþβ = γ

r α

s β

:

ð20:217Þ
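Identity (20.217) is the Vandermonde convolution. In the derivation it is applied with negative upper arguments via (20.214); the non-negative case sketched here is the underlying identity and can be checked directly with `math.comb`:

```python
from math import comb

# (20.217): C(r+s, g) equals the sum over a+b = g of C(r, a) C(s, b)
for r in range(0, 7):
    for s in range(0, 7):
        for g in range(0, r + s + 1):
            lhs = comb(r + s, g)
            rhs = sum(comb(r, a) * comb(s, g - a) for a in range(0, g + 1))
            assert lhs == rhs   # comb(n, k) is 0 for k > n, so out-of-range terms vanish
```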

Applying (20.217) to (20.215) and employing (20.214) once again, we obtain

\sum_{m_1, m_2;\; m_1+m_2=M} |C^{J}_{m_1, m_2}|^2 = \frac{(-1)^{j_1+j_2-J}\,(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!} \binom{-2J-2}{j_1+j_2-J}
 = \frac{(-1)^{j_1+j_2-J}(-1)^{j_1+j_2-J}\,(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!} \binom{j_1+j_2+J+1}{j_1+j_2-J}
 = \frac{(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!} \cdot \frac{(j_1+j_2+J+1)!}{(j_1+j_2-J)!\,(2J+1)!}
 = \frac{(j_1+j_2+J+1)!}{(2J+1)\,(J-j_2+j_1)!\,(J-j_1+j_2)!\,(j_1+j_2-J)!}. \qquad (20.218)

Inserting (20.218) into (20.203), as a positive number ρ_J we have

ρ_J = \sqrt{\frac{(2J+1)\,(J-j_2+j_1)!\,(J-j_1+j_2)!\,(j_1+j_2-J)!}{(j_1+j_2+J+1)!}}. \qquad (20.219)

From (20.200), we find

C(J) = \sqrt{\frac{2J+1}{(J-j_2+j_1)!\,(J-j_1+j_2)!\,(j_1+j_2-J)!\,(j_1+j_2+J+1)!}}. \qquad (20.220)

Thus, as the Clebsch–Gordan coefficients (j₁m₁j₂m₂|JM), at last we get

(j_1 m_1 j_2 m_2 | JM) ≡ ρ_J C^{J}_{m_1, m_2} = \sqrt{\frac{(2J+1)\,(J-j_2+j_1)!\,(J-j_1+j_2)!\,(j_1+j_2-J)!}{(j_1+j_2+J+1)!}}
 × \sum_{λ} \frac{(-1)^{λ}\,[(j_1+m_1)!\,(j_1-m_1)!\,(j_2+m_2)!\,(j_2-m_2)!\,(J+M)!\,(J-M)!]^{1/2}}{λ!\,(j_1+j_2-J-λ)!\,(j_1-λ-m_1)!\,(J-j_2+λ+m_1)!\,(j_2-λ+m_2)!\,(J-j_1+λ-m_2)!}. \qquad (20.221)

In the course of the above calculations we encountered, e.g., Γ(−x) and factorials involving negative integers. Under ordinary circumstances such quantities must be avoided. Nonetheless, it is convenient for practical use to recognize that we may first choose a number −x close to (but not identical with) a negative integer and take the limit of −x at the negative integer only after the related calculations have been finished.


Choosing, e.g., M (= m₁ + m₂) = −J in (20.203) instead of setting M = J as above, we find that only the term with λ = j₂ + m₂ in (20.198) survives. Yet we get the same result as above. The confirmation is left to readers.
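Formula (20.221) translates directly into code. The sketch below (the function name `cg` is ours) evaluates the Clebsch–Gordan coefficient with floating-point arithmetic; half-odd-integer arguments are passed as exact binary fractions such as 0.5:

```python
from math import factorial, sqrt

def cg(j1, m1, j2, m2, J, M):
    """Clebsch-Gordan coefficient (j1 m1 j2 m2 | J M) from (20.221)."""
    if m1 + m2 != M or not (abs(j1 - j2) <= J <= j1 + j2):
        return 0.0
    f = factorial
    pref = sqrt((2 * J + 1) * f(int(J - j2 + j1)) * f(int(J - j1 + j2))
                * f(int(j1 + j2 - J)) / f(int(j1 + j2 + J + 1)))
    num = sqrt(f(int(j1 + m1)) * f(int(j1 - m1)) * f(int(j2 + m2))
               * f(int(j2 - m2)) * f(int(J + M)) * f(int(J - M)))
    s, lam = 0.0, 0
    while lam <= j1 + j2 - J:
        args = [lam, j1 + j2 - J - lam, j1 - lam - m1,
                J - j2 + lam + m1, j2 - lam + m2, J - j1 + lam - m2]
        if all(t >= 0 for t in args):       # factorial arguments must be non-negative
            den = 1
            for t in args:
                den *= f(int(t))
            s += (-1) ** lam * num / den
        lam += 1
    return pref * s

# Spin-1/2 x spin-1/2 (Example 20.1): triplet and singlet coefficients
assert abs(cg(0.5, 0.5, 0.5, -0.5, 1, 0) - 1 / sqrt(2)) < 1e-12
assert abs(cg(0.5, 0.5, 0.5, -0.5, 0, 0) - 1 / sqrt(2)) < 1e-12
assert abs(cg(0.5, -0.5, 0.5, 0.5, 0, 0) + 1 / sqrt(2)) < 1e-12
```

The columns of coefficients produced this way are orthonormal, reflecting the unitarity discussed after (20.171).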

20.3.3 Examples of Calculation of Clebsch–Gordan Coefficients

The simple examples mentioned below help us understand the calculation procedures of the Clebsch–Gordan coefficients and their implications.

Example 20.1 As the simplest example, we examine the case of D^{(1/2)} ⊗ D^{(1/2)}. At the same time, it is an example of the symmetric and antisymmetric representations discussed in Sect. 18.9. Taking two sets of complex variables u ≡ (u₂ u₁) and v ≡ (v₂ v₁) and rewriting them according to (20.176), we have

ψ^{1/2}_{-1/2} = u_2, \quad ψ^{1/2}_{1/2} = u_1, \quad φ^{1/2}_{-1/2} = v_2, \quad φ^{1/2}_{1/2} = v_1,

so that we can get the basis functions of the direct-product representation described by

(u_2 v_2 \;\; u_1 v_2 \;\; u_2 v_1 \;\; u_1 v_1).

Notice that we have four functions above; i.e., (2j₁ + 1)(2j₂ + 1) = 4 with j₁ = j₂ = 1/2. Using (20.221) we can determine the proper basis functions Ψ_M^{J} of (20.202). For example, for Ψ^{1}_{-1} we have only one choice, m₁ = m₂ = −1/2, to get m₁ + m₂ = M = −1. In (20.221) we then have no choice but to take λ = 0. In this way, we have

Ψ^{1}_{-1} = u_2 v_2 = ψ^{1/2}_{-1/2} φ^{1/2}_{-1/2}.

Similarly, we get

Ψ^{1}_{1} = u_1 v_1 = ψ^{1/2}_{1/2} φ^{1/2}_{1/2}.

For the two other functions, using (20.221) we obtain

Ψ^{1}_{0} = \frac{1}{\sqrt{2}}(u_2 v_1 + u_1 v_2) = \frac{1}{\sqrt{2}}\left(ψ^{1/2}_{-1/2} φ^{1/2}_{1/2} + ψ^{1/2}_{1/2} φ^{1/2}_{-1/2}\right),

Ψ^{0}_{0} = \frac{1}{\sqrt{2}}(u_1 v_2 - u_2 v_1) = \frac{1}{\sqrt{2}}\left(ψ^{1/2}_{1/2} φ^{1/2}_{-1/2} - ψ^{1/2}_{-1/2} φ^{1/2}_{1/2}\right).

We put the above results in a simple matrix form representing the basis vector transformation such that

(u_2 v_2 \;\; u_1 v_2 \;\; u_1 v_1 \;\; u_2 v_1)
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & \tfrac{1}{\sqrt{2}} & 0 & \tfrac{1}{\sqrt{2}} \\
0 & 0 & 1 & 0 \\
0 & \tfrac{1}{\sqrt{2}} & 0 & -\tfrac{1}{\sqrt{2}}
\end{pmatrix}
= \left(Ψ^{1}_{-1} \;\; Ψ^{1}_{0} \;\; Ψ^{1}_{1} \;\; Ψ^{0}_{0}\right). \qquad (20.222)

In this way, (20.222) shows how the basis functions that possess a proper irreducible representation and span the direct-product representation space are constructed from the original product functions. The Clebsch–Gordan coefficients play the role of a linker (i.e., a unitary operator) between those two sets of functions. In this respect these coefficients resemble those appearing in the symmetry-adapted linear combinations (SALCs) mentioned in Sects. 18.2 and 19.3. Expressing the unitary transformation according to (20.173) and choosing (Ψ^{1}_{-1} Ψ^{1}_{0} Ψ^{1}_{1} Ψ^{0}_{0}) as the basis vectors, we describe the transformation in a manner similar to (20.48) such that

R(a, b)\left(Ψ^{1}_{-1} \; Ψ^{1}_{0} \; Ψ^{1}_{1} \; Ψ^{0}_{0}\right) = \left(Ψ^{1}_{-1} \; Ψ^{1}_{0} \; Ψ^{1}_{1} \; Ψ^{0}_{0}\right)\left[D^{(1/2)} ⊗ D^{(1/2)}\right]. \qquad (20.223)

Then, as the matrix representation of D^{(1/2)} ⊗ D^{(1/2)} we get

D^{(1/2)} ⊗ D^{(1/2)} =
\begin{pmatrix}
a^2 & \sqrt{2}\,ab & b^2 & 0 \\
-\sqrt{2}\,ab^{*} & aa^{*} - bb^{*} & \sqrt{2}\,a^{*}b & 0 \\
(b^{*})^2 & -\sqrt{2}\,a^{*}b^{*} & (a^{*})^2 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}. \qquad (20.224)

Notice that (20.224) is a block matrix; namely, (20.224) has been reduced according to the symmetric and antisymmetric representations. Writing (20.223) and (20.224) symbolically, we have

D^{(1/2)} ⊗ D^{(1/2)} = D^{(1)} ⊕ D^{(0)}.

The block matrix of (20.224) is unitary, and so the (3, 3) submatrix and the (1, 1) submatrix (i.e., the number 1) are unitary accordingly. To show it, use the conditions (20.35) and (20.173); the confirmation is left to readers. The (3, 3) submatrix is a symmetric representation, whose representation space the set (Ψ^{1}_{-1} Ψ^{1}_{0} Ψ^{1}_{1}) spans. Note that these functions are symmetric under the exchange m₁ ↔ m₂; or, we may think of them as symmetric under the exchange ψ ↔ φ. Though trivial, the basis function Ψ^{0}_{0} spans the antisymmetric representation space. The corresponding submatrix is merely the number 1. This is directly related to the fact that v(ug)^{T} (or u₁v₂ − u₂v₁) of (20.185) is an invariant. The function Ψ^{0}_{0} changes sign (i.e., is antisymmetric) under the exchange m₁ ↔ m₂ or ψ ↔ φ.

Once again, readers might well ask why we need such an elaborate means to pick up a small pebble. With increasing dimension of the vector space, however, seeking an appropriate set of proper basis functions becomes as difficult as lifting a huge rock. Though simple, the next example gives us a feel for this. Under such situations, a projection operator is an indispensable tool for addressing the problems.

Example 20.2 We examine the direct product D^{(1)} ⊗ D^{(1)}, where j₁ = j₂ = 1. This is another example of the symmetric and antisymmetric representations. Taking two sets of complex variables u ≡ (u₂ u₁) and v ≡ (v₂ v₁) and rewriting them according to (20.176) again, we have nine, i.e., (2j₁ + 1)(2j₂ + 1), product functions expressed as

ψ^{1}_{-1}φ^{1}_{-1}, \; ψ^{1}_{0}φ^{1}_{-1}, \; ψ^{1}_{1}φ^{1}_{-1}, \; ψ^{1}_{-1}φ^{1}_{0}, \; ψ^{1}_{0}φ^{1}_{0}, \; ψ^{1}_{1}φ^{1}_{0}, \; ψ^{1}_{-1}φ^{1}_{1}, \; ψ^{1}_{0}φ^{1}_{1}, \; ψ^{1}_{1}φ^{1}_{1}. \qquad (20.225)

These product functions form the basis functions of the direct-product representation. Consequently, we will be able to construct the proper eigenfunctions by means of linear combinations of these functions. This method is again related to that based on the SALCs discussed in Chap. 19. In the present case, it is accomplished by finding the Clebsch–Gordan coefficients described by (20.221). As implied in Table 20.1, we need to determine the individual coefficients of the nine functions of (20.225) with respect to the following functions, which are constituted by linear combinations of the above nine functions:

Ψ^{2}_{-2}, \; Ψ^{2}_{-1}, \; Ψ^{2}_{0}, \; Ψ^{2}_{1}, \; Ψ^{2}_{2}, \; Ψ^{1}_{-1}, \; Ψ^{1}_{0}, \; Ψ^{1}_{1}, \; Ψ^{0}_{0}.

(i) For the first five functions, we have J = j₁ + j₂ = 2. In this case, the determination of the coefficients is much simplified. Substituting J = j₁ + j₂ into the second factor of the denominator of (20.221), we get (−λ)! as well as λ! in the first factor of the denominator. Therefore, for the arguments of the factorials not to be negative we must have λ = 0. Consequently, we have [5]

ρ_J = \sqrt{\frac{(2j_1)!\,(2j_2)!}{(2J)!}}, \qquad
C^{J}_{m_1, m_2} = \left[\frac{(J+M)!\,(J-M)!}{(j_1+m_1)!\,(j_1-m_1)!\,(j_2+m_2)!\,(j_2-m_2)!}\right]^{1/2}.


For Ψ^{2}_{±2} we have only one choice for ρ_J and C^{J}_{m_1, m_2}; that is, m₁ = ±j₁ and m₂ = ±j₂. Then, Ψ^{2}_{±2} = ψ^{1}_{±1}φ^{1}_{±1}. In the case of M = −1, we have two choices in (20.221), i.e., m₁ = −1, m₂ = 0 and m₁ = 0, m₂ = −1. Then, we get

Ψ^{2}_{-1} = \frac{1}{\sqrt{2}}\left(ψ^{1}_{-1}φ^{1}_{0} + ψ^{1}_{0}φ^{1}_{-1}\right). \qquad (20.226)

In the case of M = 1, we similarly have two choices in (20.221), i.e., m₁ = 1, m₂ = 0 and m₁ = 0, m₂ = 1. We get

Ψ^{2}_{1} = \frac{1}{\sqrt{2}}\left(ψ^{1}_{1}φ^{1}_{0} + ψ^{1}_{0}φ^{1}_{1}\right). \qquad (20.227)

Moreover, in the case of M = 0, we have three choices in (20.221), i.e., m₁ = −1, m₂ = 1 and m₁ = 0, m₂ = 0 along with m₁ = 1, m₂ = −1 (see Table 20.1). As a result, we get

Ψ^{2}_{0} = \frac{1}{\sqrt{6}}\left(ψ^{1}_{-1}φ^{1}_{1} + 2ψ^{1}_{0}φ^{1}_{0} + ψ^{1}_{1}φ^{1}_{-1}\right). \qquad (20.228)

(ii) For J = 1, i.e., Ψ^{1}_{-1}, Ψ^{1}_{0}, and Ψ^{1}_{1}, we start with (20.221). Noting λ! and (1 − λ)! in the first two factors of the denominator, we have λ = 0 or λ = 1. In the case of M = −1, we have only one choice, m₁ = 0, m₂ = −1, for λ = 0; similarly, m₁ = −1, m₂ = 0 for λ = 1. Hence, we get

Ψ^{1}_{-1} = \frac{1}{\sqrt{2}}\left(ψ^{1}_{0}φ^{1}_{-1} - ψ^{1}_{-1}φ^{1}_{0}\right). \qquad (20.229)

In a similar manner, for M = 0 and M = 1, respectively, we have

Ψ^{1}_{0} = \frac{1}{\sqrt{2}}\left(ψ^{1}_{1}φ^{1}_{-1} - ψ^{1}_{-1}φ^{1}_{1}\right), \qquad (20.230)

Ψ^{1}_{1} = \frac{1}{\sqrt{2}}\left(ψ^{1}_{1}φ^{1}_{0} - ψ^{1}_{0}φ^{1}_{1}\right). \qquad (20.231)

(iii) For J = 0 (i.e., Ψ^{0}_{0}), we have J = j₁ − j₂ (= 0). In this case, the denominator of (20.221) contains J − j₁ + λ − m₂ = −j₂ + λ − m₂ [5]. Since this quantity is inside a factorial, we must have −j₂ + λ − m₂ ≥ 0. We also have j₂ − λ + m₂ ≥ 0 in the denominator of (20.221). Hence, we have only one choice, λ = j₂ + m₂. Therefore, we get

ρ_J = \sqrt{\frac{(2J+1)!\,(2j_2)!}{(2j_1+1)!}}, \qquad
C^{J}_{m_1, m_2} = (-1)^{j_2+m_2} \left[\frac{(j_1+m_1)!\,(j_1-m_1)!}{(J+M)!\,(J-M)!\,(j_2+m_2)!\,(j_2-m_2)!}\right]^{1/2}.

In the above, we have three choices: m₁ = −1, m₂ = 1; m₁ = m₂ = 0; m₁ = 1, m₂ = −1. As a result, we get

Ψ^{0}_{0} = \frac{1}{\sqrt{3}}\left(ψ^{1}_{-1}φ^{1}_{1} - ψ^{1}_{0}φ^{1}_{0} + ψ^{1}_{1}φ^{1}_{-1}\right).

Summarizing the above results, we have constructed the proper eigenfunctions of the combined angular momenta from the basis functions of the direct-product representation. The relevant matrix representation is described as follows:

\left(ψ^{1}_{-1}φ^{1}_{-1} \;\; ψ^{1}_{0}φ^{1}_{-1} \;\; ψ^{1}_{1}φ^{1}_{-1} \;\; ψ^{1}_{0}φ^{1}_{1} \;\; ψ^{1}_{1}φ^{1}_{1} \;\; ψ^{1}_{-1}φ^{1}_{0} \;\; ψ^{1}_{-1}φ^{1}_{1} \;\; ψ^{1}_{1}φ^{1}_{0} \;\; ψ^{1}_{0}φ^{1}_{0}\right)

× \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \tfrac{1}{\sqrt{2}} & 0 & 0 & 0 & \tfrac{1}{\sqrt{2}} & 0 & 0 & 0 \\
0 & 0 & \tfrac{1}{\sqrt{6}} & 0 & 0 & 0 & \tfrac{1}{\sqrt{2}} & 0 & \tfrac{1}{\sqrt{3}} \\
0 & 0 & 0 & \tfrac{1}{\sqrt{2}} & 0 & 0 & 0 & -\tfrac{1}{\sqrt{2}} & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & \tfrac{1}{\sqrt{2}} & 0 & 0 & 0 & -\tfrac{1}{\sqrt{2}} & 0 & 0 & 0 \\
0 & 0 & \tfrac{1}{\sqrt{6}} & 0 & 0 & 0 & -\tfrac{1}{\sqrt{2}} & 0 & \tfrac{1}{\sqrt{3}} \\
0 & 0 & 0 & \tfrac{1}{\sqrt{2}} & 0 & 0 & 0 & \tfrac{1}{\sqrt{2}} & 0 \\
0 & 0 & \tfrac{2}{\sqrt{6}} & 0 & 0 & 0 & 0 & 0 & -\tfrac{1}{\sqrt{3}}
\end{pmatrix} \qquad (20.232)

= \left(Ψ^{2}_{-2} \;\; Ψ^{2}_{-1} \;\; Ψ^{2}_{0} \;\; Ψ^{2}_{1} \;\; Ψ^{2}_{2} \;\; Ψ^{1}_{-1} \;\; Ψ^{1}_{0} \;\; Ψ^{1}_{1} \;\; Ψ^{0}_{0}\right).

In (20.232) the matrix elements represent the Clebsch–Gordan coefficients, and their combination forms a unitary matrix. In (20.232) we have arbitrariness in the disposition of the elements of both the row and column vectors. In other


words, we have arbitrariness up to a unitary similarity transformation of the matrix; but a unitary matrix subjected to a unitary similarity transformation is again a unitary matrix (see Chap. 14). To show that the determinant of (20.232) is 1, use the expansion of the determinant together with the fact that when (a multiple of) a certain column (or row) is added to another column (or row), the determinant is unchanged (see Sect. 11.3). We find that the functions belonging to J = 2 and J = 0 are the basis functions of the symmetric representations, whereas those belonging to J = 1 are the basis functions of the antisymmetric representations (see Sect. 18.9). Expressing the unitary transformation of this example according to (20.173) and choosing (Ψ^{2}_{-2} Ψ^{2}_{-1} Ψ^{2}_{0} Ψ^{2}_{1} Ψ^{2}_{2} Ψ^{1}_{-1} Ψ^{1}_{0} Ψ^{1}_{1} Ψ^{0}_{0}) as the basis vectors, we describe the transformation as

R(a, b)\left(Ψ^{2}_{-2} \; ⋯ \; Ψ^{0}_{0}\right) = \left(Ψ^{2}_{-2} \; ⋯ \; Ψ^{0}_{0}\right)\left[D^{(1)} ⊗ D^{(1)}\right]. \qquad (20.233)

Notice again that the representation matrix of D^{(1)} ⊗ D^{(1)} can be converted (or reduced) to block unitary matrices according to the symmetric and antisymmetric representations such that

D^{(1)} ⊗ D^{(1)} = D^{(2)} ⊕ D^{(1)} ⊕ D^{(0)}, \qquad (20.234)

where D^{(2)} and D^{(0)} are the symmetric representations and D^{(1)} is the antisymmetric representation. Recall that SO(3) is simply reducible. Showing (20.234) requires somewhat lengthy but straightforward computations. The set of basis functions (Ψ^{2}_{-2} Ψ^{2}_{-1} Ψ^{2}_{0} Ψ^{2}_{1} Ψ^{2}_{2}) spans a five-dimensional symmetric representation space; Ψ^{0}_{0} in turn spans a one-dimensional symmetric representation space; and the set (Ψ^{1}_{-1} Ψ^{1}_{0} Ψ^{1}_{1}) spans a three-dimensional antisymmetric representation space.

We add that some special Clebsch–Gordan coefficients are easy to find. For example, we can readily find them in the case of J = j₁ + j₂ = M or J = j₁ + j₂ = −M. The final result for the normalized basis functions is

Ψ^{j_1+j_2}_{j_1+j_2} = ψ^{j_1}_{j_1} φ^{j_2}_{j_2}, \qquad Ψ^{j_1+j_2}_{-(j_1+j_2)} = ψ^{j_1}_{-j_1} φ^{j_2}_{-j_2}.

That is, among the related coefficients only (j₁, m₁ = j₁, j₂, m₂ = j₂ | J = j₁+j₂, M = J) and (j₁, m₁ = −j₁, j₂, m₂ = −j₂ | J = j₁+j₂, M = −J) survive, each taking the value 1; see the corresponding matrix elements in (20.221). This can readily be seen in Tables 20.1, 20.2, and 20.3.
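As a numerical cross-check of Example 20.2, the 9 × 9 coefficient matrix of (20.232) is indeed unitary (real orthogonal). This is a sketch; the matrix entries are taken from the coupled-state expansions above:

```python
import numpy as np

s2, s3, s6 = np.sqrt(2), np.sqrt(3), np.sqrt(6)
# Rows: psi(-1)phi(-1), psi(0)phi(-1), psi(1)phi(-1), psi(0)phi(1), psi(1)phi(1),
#       psi(-1)phi(0), psi(-1)phi(1), psi(1)phi(0), psi(0)phi(0)
# Columns: Psi^2_{-2..2}, Psi^1_{-1..1}, Psi^0_0
C = np.array([
    [1, 0,    0,    0,    0, 0,     0,     0,     0],
    [0, 1/s2, 0,    0,    0, 1/s2,  0,     0,     0],
    [0, 0,    1/s6, 0,    0, 0,     1/s2,  0,     1/s3],
    [0, 0,    0,    1/s2, 0, 0,     0,     -1/s2, 0],
    [0, 0,    0,    0,    1, 0,     0,     0,     0],
    [0, 1/s2, 0,    0,    0, -1/s2, 0,     0,     0],
    [0, 0,    1/s6, 0,    0, 0,     -1/s2, 0,     1/s3],
    [0, 0,    0,    1/s2, 0, 0,     0,     1/s2,  0],
    [0, 0,    2/s6, 0,    0, 0,     0,     0,     -1/s3],
])
assert np.allclose(C.T @ C, np.eye(9))   # Clebsch-Gordan matrix of (20.232) is unitary
```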

20.4 Lie Groups and Lie Algebras

In Sect. 20.2 we developed a practical approach to dealing with various aspects and characteristics of SU(2) and SO(3). In this section we introduce the elegant theory of Lie groups and Lie algebras, which enables us to study SU(2) and SO(3) systematically.

20.4.1 Definition of Lie Groups and Lie Algebras: One-Parameter Groups

We start with the following discussion. Let A(t) be a non-singular matrix whose elements vary as functions of a real number t. If under such conditions the matrices A(t) constitute a group, A(t) is said to be a one-parameter group. In particular, we are interested in the situation where we have

A(s + t) = A(s)A(t). \qquad (20.235)

We further get

A(s)A(t) = A(s + t) = A(t + s) = A(t)A(s). \qquad (20.236)

Therefore, any pair A(t), A(s) is commutative. Putting t = s = 0 in (20.236), we have

A(0) = A(0)A(0). \qquad (20.237)

Since A(0) is non-singular, A(0)-1 must exist. Then, multiplying A(0)-1 on both sides of (20.237), we obtain Að0Þ = E,

ð20:238Þ

where E is an identity matrix. Also, we have Aðt ÞAð- t Þ = Að0Þ = E, meaning that Að- t Þ = Aðt Þ - 1 : Differentiating both sides of (20.235) with respect to s at s = 0, we get

ð20:239Þ


A'(t) = A'(0)A(t). \qquad (20.240)

Defining

X ≡ A'(0), \qquad (20.241)

we obtain

A'(t) = XA(t). \qquad (20.242)

Integrating (20.242), we have

A(t) = \exp(tX) \qquad (20.243)

under the initial condition A(0) = E. Thus, any one-parameter group A(t) satisfying (20.235) is described by (20.243). In Sect. 20.1 we mentioned the Lie algebra in relation to (20.14); in this context, (20.243) shows how the Lie algebra is related to the one-parameter group. That a group is a continuous group is thus deeply connected with one-parameter groups. Equation (20.45) of Sect. 20.2 is an example of a product of one-parameter groups. Bearing in mind the above situation, we give the definitions of the Lie group and the Lie algebra.

Definition 20.2 [9] Let G be a subgroup of GL(n, ℂ). Suppose that A_k (k = 1, 2, ⋯) ∈ G and that lim_{k→∞} A_k = A [∈ GL(n, ℂ)]. If A ∈ G, then G is said to be a linear Lie group of dimension n.

We usually call a linear Lie group simply a Lie group. Note that GL(n, ℂ) appeared in Sect. 16.1. The definition is again likely to bewilder chemists, reading as though it had been written in the air. Nevertheless, the next example should again relieve them of needless anxiety.

Example 20.3 Let G be a group and let A_k ∈ G with det A_k = 1 such that

A_k = \begin{pmatrix} \cos θ_k & -\sin θ_k \\ \sin θ_k & \cos θ_k \end{pmatrix} \quad (θ_k: \text{real number}). \qquad (20.244)

Suppose that lim θk = θ and then we have k→1

lim Ak = lim

k→1

k→1

cos θk sin θk

- sin θk cos θk

=

cos θ sin θ

- sin θ cos θ

 A:

ð20:245Þ

Certainly, we have A 2 G, because det Ak = det A = 1. Such group is called SO(2).
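The one-parameter group (20.243) and Example 20.3 can be probed numerically. A minimal sketch, assuming NumPy is available; the Taylor-series expm below is a simple stand-in for a library matrix exponential:

```python
import numpy as np

def expm(X, terms=60):
    """Matrix exponential summed from its Taylor series (adequate for small matrices)."""
    result = np.eye(len(X))
    term = np.eye(len(X))
    for k in range(1, terms):
        term = term @ X / k
        result = result + term
    return result

# Generator: a real skew-symmetric matrix
X = np.array([[0.0, -1.0],
              [1.0,  0.0]])
A = lambda t: expm(t * X)

s, t = 0.7, 1.9
# One-parameter group law (20.235): A(s + t) = A(s)A(t), hence A(0) = E
assert np.allclose(A(s + t), A(s) @ A(t))
assert np.allclose(A(0.0), np.eye(2))
# A(t) is exactly the rotation matrix of Example 20.3, an element of SO(2)
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
assert np.allclose(A(t), R)
assert np.isclose(np.linalg.det(A(t)), 1.0)
```

The choice X = A′(0) here reproduces the SO(2) matrices of (20.244), illustrating how the generator encodes the whole one-parameter group.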


In a word, a Lie group is a group whose elements are continuous and differentiable (or analytic) functions of its parameters. In that sense, the groups such as SU(2) and SO(3) that we have already encountered are Lie groups. Next, the following is in turn a definition of a Lie algebra.

Definition 20.3 [9] Let G be a linear Lie group. If for every real number t we have

exp(tX) ∈ G,   (20.246)

then the set of all such X is called the Lie algebra of the corresponding Lie group G. We denote the Lie algebra of the Lie group G by 𝔤.

The relation (20.246) includes as a trivial case that where X is a (1, 1) matrix (i.e., merely a complex number). Most importantly, (20.246) is directly connected with (20.243), and Definition 20.3 shows the direct relevance between the Lie group and the Lie algebra. The two theories of Lie groups and Lie algebras underlie continuous groups.

The Lie algebra 𝔤 has the following properties: (i) If X ∈ 𝔤, then aX ∈ 𝔤 as well, where a is any real number. (ii) If X, Y ∈ 𝔤, then X + Y ∈ 𝔤 as well. (iii) If X, Y ∈ 𝔤, then [X, Y] (≡ XY - YX) ∈ 𝔤 as well.

In fact, exp[t(aX)] = exp[(ta)X] and ta is a real number, so we have (i). Regarding (ii), we should examine the individual properties of X. As already shown in Property (7)′ of Chap. 15 (Sect. 15.2), e.g., a unitary matrix exp(tX) is associated with an anti-Hermitian matrix X; we are interested in this particular case. In the case of SU(2), X is a traceless anti-Hermitian matrix, and as to SO(3), X is a real skew-symmetric matrix. In the former case, for example, traceless anti-Hermitian matrices X and Y can be expressed as

X = \begin{pmatrix} ic & -a+ib \\ a+ib & -ic \end{pmatrix},  Y = \begin{pmatrix} ih & -f+ig \\ f+ig & -ih \end{pmatrix},   (20.247)

where a, b, c, etc. are real. Then, we have

X + Y = \begin{pmatrix} i(c+h) & -(a+f)+i(b+g) \\ (a+f)+i(b+g) & -i(c+h) \end{pmatrix}.   (20.248)

Thus, X + Y is a traceless anti-Hermitian matrix as well. Property (iii) needs some explanation. Suppose X, Y ∈ 𝔤. Then, with any real numbers s and t, exp(sX), exp(tY), exp(-sX), exp(-tY) ∈ G by Definition 20.3, and so is their product exp(sX) exp(tY) exp(-sX) exp(-tY). Taking an infinitesimal transformation of this product within the first order of s and t, we have

exp(sX) exp(tY) exp(-sX) exp(-tY) ≈ (1 + sX)(1 + tY)(1 - sX)(1 - tY)
 = (1 + tY + sX + stXY)(1 - tY - sX + stXY)
 ≈ (1 - tY - sX + stXY) + (tY - stYX) + (sX - stXY) + stXY
 = 1 + st(XY - YX).   (20.249)

Notice that we ignored the terms having s², t², st², s²t, and s²t² as coefficients. Defining

[X, Y] ≡ XY - YX,   (20.250)

we have

exp(sX) exp(tY) exp(-sX) exp(-tY) ≈ 1 + st[X, Y] ≈ exp(st[X, Y]).   (20.251)

Since the LHS represents an element of G and st is a real number, by Definition 20.3 we have [X, Y] ∈ 𝔤. The quantity [X, Y] is called a commutator; the commutator and its definition appeared in (1.139). From properties (i) and (ii), the Lie algebra 𝔤 forms a vector space (see Chap. 11). An element of 𝔤 is usually expressed by a matrix; the zero vector corresponds to the zero matrix (see Proposition 15.1). As already seen in Chap. 15, we define an inner product between any A, B ∈ 𝔤 such that

⟨A|B⟩ ≡ Σ_{i,j} a_{ij}* b_{ij}.   (20.252)

Thus, 𝔤 constitutes an inner product (vector) space.
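The first-order reduction (20.251) and the inner product (20.252) lend themselves to a quick numerical check. A sketch assuming NumPy; the Taylor-series expm again stands in for a library routine, and s, t are taken small so that the neglected cubic terms are below the tolerance:

```python
import numpy as np

def expm(X, terms=60):
    """Matrix exponential via its Taylor series (fine for small matrices)."""
    result = np.eye(len(X), dtype=complex)
    term = np.eye(len(X), dtype=complex)
    for k in range(1, terms):
        term = term @ X / k
        result = result + term
    return result

def random_anti_hermitian(n, rng):
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M - M.conj().T) / 2   # satisfies X† = -X

rng = np.random.default_rng(0)
X, Y = random_anti_hermitian(2, rng), random_anti_hermitian(2, rng)
s = t = 1e-4

# (20.249)-(20.251): the group commutator reduces to 1 + st[X, Y] to first order
lhs = expm(s * X) @ expm(t * Y) @ expm(-s * X) @ expm(-t * Y)
bracket = X @ Y - Y @ X          # [X, Y] of (20.250)
assert np.allclose(lhs, np.eye(2) + s * t * bracket, atol=1e-9)

# [X, Y] is again anti-Hermitian, i.e., it stays inside the Lie algebra
assert np.allclose(bracket.conj().T, -bracket)

# The inner product (20.252): <A|B> = sum_ij a_ij* b_ij
inner = np.sum(X.conj() * Y)
assert np.isclose(inner, np.trace(X.conj().T @ Y))
```

The last assertion anticipates the trace form (20.279) introduced in Sect. 20.4.3.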

20.4.2 Properties of Lie Algebras

Let us further continue our discussion of Lie algebras by thinking of the following proposition about the inner product.

Proposition 20.1 Let f(t) and g(t) be continuous and differentiable functions with respect to a real number t. Then, we have

d/dt ⟨f(t)|g(t)⟩ = ⟨df(t)/dt | g(t)⟩ + ⟨f(t) | dg(t)/dt⟩.   (20.253)

Proof We calculate the following equation:


⟨f(t+Δ)|g(t+Δ)⟩ - ⟨f(t)|g(t)⟩
 = ⟨f(t+Δ)|g(t+Δ)⟩ - ⟨f(t)|g(t+Δ)⟩ + ⟨f(t)|g(t+Δ)⟩ - ⟨f(t)|g(t)⟩
 = ⟨f(t+Δ) - f(t)|g(t+Δ)⟩ + ⟨f(t)|g(t+Δ) - g(t)⟩.   (20.254)

Dividing both sides of (20.254) by a real number Δ and taking the limit, we have

lim_{Δ→0} (1/Δ)[⟨f(t+Δ)|g(t+Δ)⟩ - ⟨f(t)|g(t)⟩]
 = lim_{Δ→0} [⟨(f(t+Δ) - f(t))/Δ | g(t+Δ)⟩ + ⟨f(t) | (g(t+Δ) - g(t))/Δ⟩],   (20.255)

where we used the calculation rules for inner products of (13.3) and (13.20). Then, we get (20.253). This completes the proof.

Now, let us apply (20.253) to the unitary operator A(t) we dealt with in the previous section. Replacing f(t) and g(t) in (20.253) with ψA(t)† and A(t)χ, respectively, we have

d/dt ⟨ψA(t)† | A(t)χ⟩ = ⟨ψ dA(t)†/dt | A(t)χ⟩ + ⟨ψA(t)† | dA(t)/dt χ⟩,   (20.256)

where ψ and χ are arbitrarily chosen vectors that do not depend on t. Then, we have

LHS of (20.256) = d/dt ⟨ψ|A(t)†A(t)|χ⟩ = d/dt ⟨ψ|E|χ⟩ = d/dt ⟨ψ|χ⟩ = 0,   (20.257)

where with the second equality we used the unitarity of A(t). Taking the limit t → 0 in (20.256), we have

lim_{t→0} [RHS of (20.256)] = ⟨ψ dA(0)†/dt | A(0)χ⟩ + ⟨ψA(0)† | dA(0)/dt χ⟩.

Meanwhile, taking the limit t → 0 in the following relation, we have

[dA(t)†/dt]_{t→0} = [(dA(t)/dt)_{t→0}]† = [A′(0)]†,

where we assumed that the operations of taking the adjoint and the limit t → 0 commute. Then, we get


lim_{t→0} [RHS of (20.256)] = ⟨ψ[A′(0)]† | A(0)χ⟩ + ⟨ψA(0)† | dA(0)/dt χ⟩
 = ⟨ψX†|χ⟩ + ⟨ψ|Xχ⟩ = ⟨ψ(X† + X)|χ⟩,   (20.258)

where we used X ≡ A′(0) of (20.241) and A(0) = A(0)† = E. From (20.257) and (20.258), we have 0 = ⟨ψ(X† + X)|χ⟩. Since ψ and χ are arbitrary, from Theorem 14.2 we get

X† + X ≡ 0,  i.e.,  X† = -X.   (20.259)

This indicates that X is an anti-Hermitian operator. Let X be expressed as (x_{ij}). Then, from (20.259), x_{ij} = -x_{ji}*. As for the diagonal elements, x_{ii} = -x_{ii}*, i.e., x_{ii} + x_{ii}* = 0. Hence, x_{ii} is a pure imaginary number or zero. We have the following theorem accordingly.

Theorem 20.3 Let A(t) be a one-parameter unitary group with A(0) = E that is described by A(t) = exp(tX) and satisfies (20.235). Then, X ≡ A′(0) is an anti-Hermitian operator.

Differentiating both sides of A(t) = exp(tX) with respect to t and using (15.47) of Theorem 15.4 [11, 12], we have

A′(t) = X exp(tX) = XA(t) = A(t)X.   (20.260)

Putting t = 0 in (20.260), we recover X = A′(0). If we require A(t) to be unitary, from Theorem 20.3 again X must be anti-Hermitian. Thus, once we have a one-parameter group in the form of a unitary operator A(t) = exp(tX), the exponent can be separated into a product of the parameter t and a t-independent constant anti-Hermitian operator X. In fact, all the one-parameter groups that have appeared in Sect. 20.2 are of the type exp(tX).

Conversely, let us consider what type of operator exp(tX) is if X is anti-Hermitian. Here we assume that X is a (n, n) matrix. With an arbitrary real number t we have

[exp(tX)][exp(tX)]† = [exp(tX)][exp(tX†)] = [exp(tX)][exp(-tX)] = exp(tX - tX) = exp 0 = E,   (20.261)

where we used (15.32) and Theorem 15.2 as well as the assumption that X is anti-Hermitian. Note that exp(tX) and exp(-tX) are commutative; see (15.29) of Sect. 15.2. Equation (20.261) implies that exp(tX) is unitary, i.e., exp(tX) ∈ U(n). The notation U(n) means a unitary group. Hence, X ∈ 𝔲(n), where 𝔲(n) denotes the Lie algebra corresponding to U(n).

Next, let us seek the Lie algebra corresponding to the special unitary group SU(n). By Definition 20.3, since U(n) ⊃ SU(n), it is obvious that 𝔲(n) ⊃ 𝔰𝔲(n), where 𝔰𝔲(n) denotes the Lie algebra corresponding to SU(n). It suffices to find the condition under which det[exp(tX)] = 1 holds for any real number t and X ∈ 𝔲(n). From (15.41) [9], we have

det[exp(tX)] = exp Tr(tX) = exp t[Tr(X)] = 1,   (20.262)

where Tr stands for the trace (see Chap. 12). This is equivalent to the condition that

t[Tr(X)] = 2mπi  (m : zero or an integer)   (20.263)

holds with any real t. For this, we must have Tr(X) = 0 with m = 0. This implies that 𝔰𝔲(n) comprises the anti-Hermitian matrices whose trace is zero (i.e., traceless). Consequently, the Lie algebra 𝔰𝔲(n) is certainly a subset (or subspace) of 𝔲(n), which consists of anti-Hermitian matrices.

In relation to (20.256), let us think of a real orthogonal matrix A(t). If A(t) is real and orthogonal, (20.256) can be rewritten as

d/dt ⟨ψA(t)ᵀ | A(t)χ⟩ = ⟨ψ dA(t)ᵀ/dt | A(t)χ⟩ + ⟨ψA(t)ᵀ | dA(t)/dt χ⟩.   (20.264)

Following a procedure similar to the case of (20.259), we get

Xᵀ + X ≡ 0,  i.e.,  Xᵀ = -X.   (20.265)

In this case we obtain a skew-symmetric matrix. All its diagonal elements are zero and, hence, a skew-symmetric matrix is traceless. Other typical Lie groups are the orthogonal group O(n) and the special orthogonal group SO(n); the corresponding Lie algebras are denoted by 𝔬(n) and 𝔰𝔬(n), respectively. Note that both 𝔬(n) and 𝔰𝔬(n) consist of skew-symmetric matrices.

Conversely, let us consider what type of operator exp(tX) is if X is a skew-symmetric matrix. Here we assume that X is a (n, n) matrix. With an arbitrary real number t we have

[exp(tX)][exp(tX)]ᵀ = [exp(tX)][exp(tXᵀ)] = [exp(tX)][exp(-tX)] = exp(tX - tX) = exp 0 = E,   (20.266)

where we used (15.31). This implies that exp(tX) is an orthogonal matrix, i.e., exp(tX) ∈ O(n). Hence, X ∈ 𝔬(n). Notice that exp(tX) and exp(-tX) are commutative. Summarizing the above, we have the following theorem.

Theorem 20.4 The n-th order Lie algebra 𝔲(n) corresponding to the Lie group U(n) (i.e., the unitary group) consists of all the anti-Hermitian (n, n) matrices. The n-th order Lie algebra 𝔰𝔲(n) corresponding to the Lie group SU(n) (i.e., the special unitary group) comprises all the anti-Hermitian (n, n) matrices with trace zero. The n-th order Lie algebras 𝔬(n) and 𝔰𝔬(n) corresponding to the Lie groups O(n) and SO(n), respectively, consist of all the real skew-symmetric (n, n) matrices.

Notice that if A and B are anti-Hermitian matrices, so are A + B and cA (c : real number). This is true of skew-symmetric matrices as well. Hence, 𝔲(n), 𝔰𝔲(n), 𝔬(n), and 𝔰𝔬(n) form linear vector spaces.

In Sect. 20.4.1 we introduced the commutator. Commutators are ubiquitous in quantum mechanics, especially as commutation relations (see Part I). Their major properties are as follows:

(i) [X + Y, Z] = [X, Z] + [Y, Z],
(ii) [aX, Y] = a[X, Y]  (a ∈ ℝ),
(iii) [X, Y] = -[Y, X],
(iv) [X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y]] = 0.   (20.267)
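Theorem 20.4 and the commutator rules (20.267) can both be verified numerically. A sketch assuming NumPy (the Taylor-series expm is a stand-in for a library matrix exponential): an anti-Hermitian generator exponentiates to a unitary matrix, a traceless one to det = 1, and a real skew-symmetric one to an orthogonal matrix.

```python
import numpy as np

def expm(X, terms=60):
    """Matrix exponential summed from its Taylor series (fine for small matrices)."""
    result = np.eye(len(X), dtype=complex)
    term = np.eye(len(X), dtype=complex)
    for k in range(1, terms):
        term = term @ X / k
        result = result + term
    return result

rng = np.random.default_rng(0)

# u(n): anti-Hermitian X  ->  exp(tX) unitary, cf. (20.261)
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
X = (M - M.conj().T) / 2
U = expm(1.3 * X)
assert np.allclose(U.conj().T @ U, np.eye(3))

# su(n): additionally traceless  ->  det = 1, cf. (20.262)
X0 = X - np.trace(X) / 3 * np.eye(3)
assert np.isclose(np.linalg.det(expm(1.3 * X0)), 1.0)

# o(n): real skew-symmetric X  ->  exp(tX) orthogonal, cf. (20.266)
S = rng.standard_normal((3, 3))
S = (S - S.T) / 2
R = expm(0.7 * S).real
assert np.allclose(R.T @ R, np.eye(3))

# Commutator rules (20.267) for arbitrary matrices
comm = lambda P, Q: P @ Q - Q @ P
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))
assert np.allclose(comm(A + B, C), comm(A, C) + comm(B, C))    # (i)
assert np.allclose(comm(2.5 * A, B), 2.5 * comm(A, B))         # (ii)
assert np.allclose(comm(A, B), -comm(B, A))                    # (iii)
jacobi = comm(A, comm(B, C)) + comm(B, comm(C, A)) + comm(C, comm(A, B))
assert np.allclose(jacobi, np.zeros((3, 3)))                   # (iv) Jacobi's identity
```

Checking (iv) this way is exactly the exercise suggested below for Jacobi's identity.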

The last equation of (20.267) is well known as Jacobi's identity. Readers are encouraged to check it. Since the Lie algebra forms a linear vector space of finite dimension, we denote its basis vectors by X_1, X_2, ⋯, X_d. Then, we have

𝔤(d) = Span{X_1, X_2, ⋯, X_d},   (20.268)

where 𝔤(d) stands for a Lie algebra of dimension d. As their commutators belong to 𝔤(d) as well, with an arbitrary pair of X_i and X_j (i, j = 1, 2, ⋯, d) we have

[X_i, X_j] = Σ_{k=1}^{d} f_{ijk} X_k,   (20.269)

where the set of real coefficients f_{ijk} is said to be the structure constants, which define the structure of the Lie algebra.

Example 20.4 In (20.42) we write ζ_1 ≡ ζ_x, ζ_2 ≡ ζ_y, and ζ_3 ≡ ζ_z. Rewriting it explicitly, we have

ζ_1 = \frac{1}{2}\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix},  ζ_2 = \frac{1}{2}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},  ζ_3 = \frac{1}{2}\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}.   (20.270)

Then, we get


[ζ_i, ζ_j] = Σ_{k=1}^{3} ε_{ijk} ζ_k,   (20.271)

where ε_{ijk} is called the Levi-Civita symbol [4] and defined by

ε_{ijk} = +1  for (i, j, k) = (1, 2, 3), (2, 3, 1), (3, 1, 2);
ε_{ijk} = -1  for (i, j, k) = (3, 2, 1), (1, 3, 2), (2, 1, 3);
ε_{ijk} = 0  otherwise.   (20.272)

Notice that in (20.272), if (i, j, k) represents an even permutation of (1, 2, 3), ε_{ijk} = 1, but if (i, j, k) is an odd permutation of (1, 2, 3), ε_{ijk} = -1. Otherwise (e.g., for i = j, j = k, or k = i), ε_{ijk} = 0. The relation (20.271) is essentially the same as (3.30) and (3.69) of Chap. 3. We have

𝔰𝔲(2) = Span{ζ_1, ζ_2, ζ_3}.   (20.273)

Equation (20.270) shows that the trace of ζ_i (i = 1, 2, 3) is zero.
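Example 20.4 is easily verified by direct computation. A sketch assuming NumPy; indices run 0–2 here instead of the text's 1–3, and the closed form for the Levi-Civita symbol is an implementation convenience:

```python
import numpy as np

# Basis of su(2) from (20.270)
z1 = 0.5 * np.array([[0, -1j], [-1j, 0]])
z2 = 0.5 * np.array([[0,  1],  [-1,  0]], dtype=complex)
z3 = 0.5 * np.array([[1j, 0],  [0, -1j]])
zeta = [z1, z2, z3]

def eps(i, j, k):
    """Levi-Civita symbol (20.272) for 0-based indices 0..2."""
    return (i - j) * (j - k) * (k - i) / 2

# Structure constants (20.271): [zeta_i, zeta_j] = sum_k eps_ijk zeta_k
for i in range(3):
    for j in range(3):
        lhs = zeta[i] @ zeta[j] - zeta[j] @ zeta[i]
        rhs = sum(eps(i, j, k) * zeta[k] for k in range(3))
        assert np.allclose(lhs, rhs)
        # each basis vector is traceless and anti-Hermitian
        assert np.isclose(np.trace(zeta[i]), 0)
        assert np.allclose(zeta[i].conj().T, -zeta[i])
```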

Example 20.5 In (20.26) we write A_1 ≡ A_x, A_2 ≡ A_y, and A_3 ≡ A_z. Rewriting it, we have

A_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix},  A_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix},  A_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.   (20.274)

Again, we have

[A_i, A_j] = Σ_{k=1}^{3} ε_{ijk} A_k.   (20.275)

This is of the same form as (20.271). Namely,

𝔰𝔬(3) = 𝔬(3) = Span{A_1, A_2, A_3}.   (20.276)

Equation (20.274) clearly shows that the diagonal elements of A_i (i = 1, 2, 3) are zero. From Examples 20.4 and 20.5, 𝔰𝔲(2) and 𝔰𝔬(3) [or 𝔬(3)] structurally resemble each other. This fact is directly related to the similarity between SU(2) and SO(3). Note also that 𝔰𝔲(2) and 𝔰𝔬(3) have the same dimensionality as linear vector spaces. In Chap. 3, we dealt with the generalized angular momentum using the relation (3.30). Using (20.15), we can readily obtain relations of the same form as (20.271) and (20.275). That is, from (3.30), defining the following anti-Hermitian operators as

J̃_x ≡ -iJ_x/ħ,  J̃_y ≡ -iJ_y/ħ,  J̃_z ≡ -iJ_z/ħ,

we get

[J̃_l, J̃_m] = Σ_{n} ε_{lmn} J̃_n,

where l, m, n stand for x, y, z. The derivation is left for readers.
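The matching structure constants of Example 20.5 can be checked the same way (a sketch assuming NumPy; comparing the output with the previous block makes the 𝔰𝔲(2)–𝔰𝔬(3) resemblance concrete):

```python
import numpy as np

# Basis of so(3) from (20.274)
A1 = np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=float)
A2 = np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], dtype=float)
A3 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
A = [A1, A2, A3]

def eps(i, j, k):
    """Levi-Civita symbol for 0-based indices 0..2."""
    return (i - j) * (j - k) * (k - i) / 2

# (20.275): the very same structure constants as for su(2) in (20.271)
for i in range(3):
    for j in range(3):
        lhs = A[i] @ A[j] - A[j] @ A[i]
        rhs = sum(eps(i, j, k) * A[k] for k in range(3))
        assert np.allclose(lhs, rhs)
        assert np.allclose(A[i].T, -A[i])   # real skew-symmetric
```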

20.4.3 Adjoint Representation of Lie Groups

In (20.252) of Sect. 20.4.1 we defined the inner product on 𝔤 as

⟨A|B⟩ ≡ Σ_{i,j} a_{ij}* b_{ij}.   (20.252)

We can readily check that the definition (20.252) satisfies the defining properties of the inner product, (13.2), (13.3), and (13.4). In fact, we have

⟨B|A⟩ = Σ_{i,j} b_{ij}* a_{ij} = (Σ_{i,j} a_{ij}* b_{ij})* = ⟨A|B⟩*.   (20.277)

Equation (13.3) is obvious from the calculation rules for matrices. With (13.4), we get

⟨A|A⟩ = Tr(A†A) = Σ_{i,j} |a_{ij}|² ≥ 0,   (20.278)

where we have A = (a_{ij}) and the equality holds if and only if all a_{ij} = 0, i.e., A = 0. Thus, ⟨A|A⟩ gives a positive definite inner product on 𝔤. From (20.278), we may equally define the inner product on 𝔤 as

⟨A|B⟩ = Tr(A†B).   (20.279)

In fact, we have

Tr(A†B) = Σ_{i,j} (A†)_{ij}(B)_{ji} = Σ_{i,j} a_{ji}* b_{ji} ≡ ⟨A|B⟩.

In another way, we can readily compare (20.279) with (20.252) using, e.g., a (2, 2) matrix and find that both equations give the same result. It is left for readers as an exercise.
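That exercise can also be carried out numerically (a sketch assuming NumPy; random complex (2, 2) matrices stand in for the hand calculation):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))

inner_sum = np.sum(A.conj() * B)        # (20.252): sum_ij a_ij* b_ij
inner_tr  = np.trace(A.conj().T @ B)    # (20.279): Tr(A† B)
assert np.isclose(inner_sum, inner_tr)

# Positive definiteness (20.278)
assert np.trace(A.conj().T @ A).real >= 0
# Conjugate symmetry (20.277): <B|A> = <A|B>*
assert np.isclose(np.trace(B.conj().T @ A), np.conj(inner_tr))
```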


From Theorem 20.4, 𝔲(n) corresponding to the unitary group U(n) consists of all the anti-Hermitian (n, n) matrices. With these matrices, we have

⟨B|A⟩ = Tr(B†A) = Σ_{i,j} b_{ji}* a_{ji} = Σ_{i,j} (-b_{ij})(-a_{ij}*) = Σ_{i,j} a_{ij}* b_{ij} = ⟨A|B⟩.   (20.280)

Then, comparing (20.277) and (20.280), we have ⟨A|B⟩* = ⟨A|B⟩. This means that ⟨A|B⟩ is real. For example, using 𝔲(2), let us evaluate an inner product. We denote arbitrary X_1, X_2 ∈ 𝔲(2) by

X_1 = \begin{pmatrix} ia & c+id \\ -c+id & ib \end{pmatrix},  X_2 = \begin{pmatrix} ip & r+is \\ -r+is & iq \end{pmatrix},

where a, b, c, d; p, q, r, s are real. Then, according to the calculation rule for the inner product of the Lie algebra, we have

⟨X_1|X_2⟩ = ap + 2(cr + ds) + bq,

which gives a real inner product. Notice that this is the case with 𝔰𝔲(2) as well. Next, let us think of the following inner product:

⟨gXg^{-1}|gYg^{-1}⟩ = Tr[(gXg^{-1})†(gYg^{-1})],   (20.281)

where g is any non-singular matrix. Bearing in mind that we are particularly interested in the continuous groups U(n) [or its subgroup SU(n)] and O(n) [or its subgroup SO(n)], we assume that g is an element of those groups and is represented by a unitary matrix (including an orthogonal matrix); i.e., g^{-1} = g† (or gᵀ). Then, if g is a unitary matrix, we have

Tr[(gXg^{-1})†(gYg^{-1})] = Tr[(gXg†)†(gYg†)] = Tr(gX†g†gYg†) = Tr(gX†Yg†) = Tr(X†Y) = ⟨X|Y⟩,   (20.282)

where with the second last equality we used (12.13), namely the invariance of the trace under a (unitary) similarity transformation. If g is represented by a real orthogonal matrix, instead of (20.282) we have

Tr[(gXg^{-1})†(gYg^{-1})] = Tr[(gXgᵀ)†(gYgᵀ)] = Tr(ḡX†g†gYgᵀ) = Tr(gX†Ygᵀ) = Tr(X†Y) = ⟨X|Y⟩,   (20.283)


where ḡ is the complex conjugate matrix of g, so that ḡ = g and g†g = E for a real orthogonal g. Combining (20.281) with (20.282) or (20.283), we have

⟨gXg^{-1}|gYg^{-1}⟩ = ⟨X|Y⟩.   (20.284)

This relation clearly shows that the real inner product of (20.284) remains unchanged under the operation X ⟼ gXg^{-1}, where X is an anti-Hermitian (or skew-symmetric) matrix and g is a unitary (or orthogonal) matrix. Now, we give the following definition and related theorems concerning both the Lie groups and Lie algebras.

Definition 20.4 Let G be a Lie group chosen from U(n) [including SU(n)] and O(n) [including SO(n)]. Let 𝔤 be the Lie algebra corresponding to G. Let g ∈ G and X ∈ 𝔤. We define the following transformation on 𝔤 such that

Ad[g](X) ≡ gXg^{-1}.   (20.285)

Then, Ad[g] is said to be an adjoint representation of G on g. We write the relation (20.285) as Ad½g : g → g: From Definition 20.4, Ad½gðX Þ 2 g. The operator Ad[g] is a kind of mapping (i.e., endomorphism) discussed in Sect. 11.2. Notice that both g and X are represented by (n, n) matrices. The matrix g is either a unitary matrix or an orthogonal matrix. The matrix X is either an anti-Hermitian matrix or a skew-symmetric matrix. Theorem 20.5 Let g be an element of a Lie group G and g be a Lie algebra of G. Then, Ad[g] is a linear transformation on g. Proof Let X, Y 2 g. Then, we have Ad½gðaX þ bY Þ  gðaX þ bY Þg - 1 = a gXg - 1 þ b gYg - 1 = aAd½gðX Þ þ bAd½gðY Þ:

ð20:286Þ

Thus, we find that Ad[g] is a linear transformation. By Definition 20.3, exp (tX) 2 G with an arbitrary real number t. Meanwhile, we have expftAd½gðX Þg = exp tgXg - 1 = exp gtXg - 1 = g expðtX Þg - 1 , where with the last equality we used (15.30).
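Both the invariance (20.284) and the linearity of Theorem 20.5 can be probed numerically. A sketch assuming NumPy; the unitary g is built from a QR decomposition purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_anti_hermitian(n):
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M - M.conj().T) / 2

# A unitary g obtained from the QR decomposition of a random complex matrix
M = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
g, _ = np.linalg.qr(M)
assert np.allclose(g.conj().T @ g, np.eye(2))

X, Y = random_anti_hermitian(2), random_anti_hermitian(2)
Ad = lambda Z: g @ Z @ np.linalg.inv(g)   # (20.285)

# Ad[g](X) stays anti-Hermitian, i.e., inside the Lie algebra
assert np.allclose(Ad(X).conj().T, -Ad(X))
# Linearity, (20.286)
assert np.allclose(Ad(2 * X + 3 * Y), 2 * Ad(X) + 3 * Ad(Y))
# Invariance of the inner product, (20.284)
inner = lambda P, Q: np.trace(P.conj().T @ Q)
assert np.isclose(inner(Ad(X), Ad(Y)), inner(X, Y))
```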


Then, if g ∈ G, g exp(tX)g^{-1} ∈ G. Again from Definition 20.3, Ad[g](X) ∈ 𝔤. That is, Ad[g] is a linear transformation on 𝔤. This completes the proof.

Notice that in Theorem 20.5, G may be any linear Lie group within GL(n, ℂ). The Lie algebra corresponding to GL(n, ℂ) is denoted by 𝔤𝔩(n, ℂ). The following theorem shows that Ad[g] is a representation.

Theorem 20.6 Let g be an element of a Lie group G. Then, g ⟼ Ad[g] is a representation of G on 𝔤.

Proof From (20.285), we have

Ad[g_1g_2](X) ≡ (g_1g_2)X(g_1g_2)^{-1} = g_1(g_2Xg_2^{-1})g_1^{-1} = g_1Ad[g_2](X)g_1^{-1} = Ad[g_1](Ad[g_2](X)) = (Ad[g_1]Ad[g_2])(X).   (20.287)

Comparing the first and last sides of (20.287), we get

Ad[g_1g_2] = Ad[g_1]Ad[g_2].

That is, g ⟼ Ad[g] is a representation of G on 𝔤. By virtue of Theorem 20.6, we call g ⟼ Ad[g] an adjoint representation of G on 𝔤. Once again, we remark that

X ⟼ Ad[g](X) ≡ X′  or  Ad[g] : 𝔤 → 𝔤   (20.288)

is a linear transformation 𝔤 → 𝔤, namely an endomorphism that operates on the vector space 𝔤 (see Sect. 11.2). Figure 20.5 depicts it schematically.

Fig. 20.5 Linear transformation Ad[g] : 𝔤 → 𝔤 (i.e., endomorphism). Ad[g] is an endomorphism that operates on a vector space 𝔤

The notation of the adjoint representation Ad is somewhat confusing. That is, (20.288) shows that Ad[g] is a linear transformation 𝔤 → 𝔤 (i.e., an endomorphism). Conversely, Ad is thought of as a mapping G → G′, where G and G′ mean two groups. Namely, we express it symbolically as [9]


g ⟼ Ad[g]  or  Ad : G → G′.   (20.289)

The groups G and G′ may or may not be identical. Examples can be seen in, e.g., (20.294) and (20.302) later. As mentioned above, if 𝔤 is either 𝔲(n) or 𝔬(n), Ad[g] is an orthogonal transformation on 𝔤; the representation of Ad[g] is real accordingly. An immediate implication of this is that the transformation of the basis vectors of 𝔤 has a connection with the corresponding Lie groups such as U(n) and O(n). Moreover, it is well known and well studied that there is a close relationship between Ad[SU(2)] and SO(3).

First, we wish to seek basis vectors of 𝔰𝔲(2). A general form of X ∈ 𝔰𝔲(2) is the following traceless anti-Hermitian matrix:

X = \begin{pmatrix} ic & -a+ib \\ a+ib & -ic \end{pmatrix},

where a, b, and c are real. We have encountered this type of matrix in (20.42) and (20.270). Using the inner product described in (20.252), we determine an orthonormal basis set of 𝔰𝔲(2) such that

e_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix},  e_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},  e_3 = \frac{1}{\sqrt{2}}\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix},   (20.290)

where we have ⟨e_i|e_j⟩ = δ_{ij} (i, j = 1, 2, 3). Using the basis set of 𝔰𝔲(2) given by (20.290) for that of the representation space, we wish to seek a tangible representation matrix of Ad[g] [g ∈ SU(2)]. To construct Ad[g] [g ∈ SU(2)], we choose g_α, g_β ∈ SU(2) given by

g_α ≡ \begin{pmatrix} e^{iα/2} & 0 \\ 0 & e^{-iα/2} \end{pmatrix}  and  g_β ≡ \begin{pmatrix} \cos\frac{β}{2} & \sin\frac{β}{2} \\ -\sin\frac{β}{2} & \cos\frac{β}{2} \end{pmatrix}.

The elements g_α and g_β have appeared in (20.43) and (20.44). Then, we have [9]

Ad[g_α](e_1) = g_α e_1 g_α^{-1}
 = \frac{1}{\sqrt{2}}\begin{pmatrix} e^{iα/2} & 0 \\ 0 & e^{-iα/2} \end{pmatrix}\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}\begin{pmatrix} e^{-iα/2} & 0 \\ 0 & e^{iα/2} \end{pmatrix}
 = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & \sin α - i\cos α \\ -\sin α - i\cos α & 0 \end{pmatrix}
 = e_1 \cos α + e_2 \sin α.   (20.291)

Similarly, we get

Ad[g_α](e_2) = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & \cos α + i\sin α \\ -(\cos α - i\sin α) & 0 \end{pmatrix} = -e_1 \sin α + e_2 \cos α,   (20.292)

Ad[g_α](e_3) = \frac{1}{\sqrt{2}}\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} = e_3.   (20.293)

Notice that in (20.291)–(20.293) the outer two factors g_α and g_α^{-1} are diagonal matrices. Thus, we obtain

(e_1 e_2 e_3)Ad[g_α] = (e_1 e_2 e_3)\begin{pmatrix} \cos α & -\sin α & 0 \\ \sin α & \cos α & 0 \\ 0 & 0 & 1 \end{pmatrix}.   (20.294)

From (20.294), as a real representation matrix we get

Ad[g_α] = \begin{pmatrix} \cos α & -\sin α & 0 \\ \sin α & \cos α & 0 \\ 0 & 0 & 1 \end{pmatrix}.   (20.295)
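The matrix (20.295) can be reproduced numerically by expanding Ad[g_α](e_j) in the orthonormal basis (20.290). A sketch assuming NumPy and the Euler-angle conventions used here:

```python
import numpy as np

s2 = 1 / np.sqrt(2)
# Orthonormal basis (20.290) of su(2)
e1 = s2 * np.array([[0, -1j], [-1j, 0]])
e2 = s2 * np.array([[0, 1], [-1, 0]], dtype=complex)
e3 = s2 * np.array([[1j, 0], [0, -1j]])
basis = [e1, e2, e3]

inner = lambda P, Q: np.trace(P.conj().T @ Q).real   # (20.279), real on u(n)

alpha = 0.6
g = np.diag([np.exp(1j * alpha / 2), np.exp(-1j * alpha / 2)])   # g_alpha

# Matrix elements of Ad[g_alpha] in the basis (e1, e2, e3)
AdM = np.array([[inner(basis[i], g @ basis[j] @ np.linalg.inv(g))
                 for j in range(3)] for i in range(3)])

# (20.295): a rotation about the third axis by alpha
R = np.array([[np.cos(alpha), -np.sin(alpha), 0],
              [np.sin(alpha),  np.cos(alpha), 0],
              [0, 0, 1]])
assert np.allclose(AdM, R)
```

The same recipe with g_β and g_γ yields (20.296) and (20.298).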

Calculating (e_1 e_2 e_3)Ad[g_β] likewise, we get

Ad[g_β] = \begin{pmatrix} \cos β & 0 & \sin β \\ 0 & 1 & 0 \\ -\sin β & 0 & \cos β \end{pmatrix}.   (20.296)

Moreover, with

g_γ = \begin{pmatrix} e^{iγ/2} & 0 \\ 0 & e^{-iγ/2} \end{pmatrix},   (20.297)

we have

Ad[g_γ] = \begin{pmatrix} \cos γ & -\sin γ & 0 \\ \sin γ & \cos γ & 0 \\ 0 & 0 & 1 \end{pmatrix}.   (20.298)

Finally, let us reproduce (17.101) by calculating

Ad[g_α]Ad[g_β]Ad[g_γ] = Ad[g_α g_β g_γ],   (20.299)

where we used the fact that g_ω ⟼ Ad[g_ω] (ω = α, β, γ) is a representation of SU(2) on 𝔰𝔲(2). To obtain an explicit form of the representation matrix of Ad[g] [g ∈ SU(2)], we use


g_{αβγ} ≡ g_α g_β g_γ = \begin{pmatrix} e^{\frac{i}{2}(α+γ)}\cos\frac{β}{2} & e^{\frac{i}{2}(α-γ)}\sin\frac{β}{2} \\ -e^{\frac{i}{2}(γ-α)}\sin\frac{β}{2} & e^{-\frac{i}{2}(α+γ)}\cos\frac{β}{2} \end{pmatrix}.   (20.300)

Note that the matrix of (20.300) is identical with (20.45). Using (20.300), we calculate Ad[g_{αβγ}](e_1) such that

Ad[g_{αβγ}](e_1) = g_{αβγ} e_1 g_{αβγ}^{-1}
 = \frac{1}{\sqrt{2}}\begin{pmatrix} -i\sin β\cos γ & (\sin α\cos β\cos γ + \cos α\sin γ) - i(\cos α\cos β\cos γ - \sin α\sin γ) \\ -(\sin α\cos β\cos γ + \cos α\sin γ) - i(\cos α\cos β\cos γ - \sin α\sin γ) & i\sin β\cos γ \end{pmatrix}
 = (e_1 e_2 e_3)\begin{pmatrix} \cos α\cos β\cos γ - \sin α\sin γ \\ \sin α\cos β\cos γ + \cos α\sin γ \\ -\sin β\cos γ \end{pmatrix}.   (20.301)

Similarly calculating Ad[g_{αβγ}](e_2) and Ad[g_{αβγ}](e_3) and combining the results with (20.301), we obtain

(e_1 e_2 e_3)Ad[g_{αβγ}] = (e_1 e_2 e_3)
 \begin{pmatrix} \cos α\cos β\cos γ - \sin α\sin γ & -\cos α\cos β\sin γ - \sin α\cos γ & \cos α\sin β \\ \sin α\cos β\cos γ + \cos α\sin γ & -\sin α\cos β\sin γ + \cos α\cos γ & \sin α\sin β \\ -\sin β\cos γ & \sin β\sin γ & \cos β \end{pmatrix}.   (20.302)
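Equation (20.302), together with the two-to-one character of the map established in Theorem 20.7 below, can be confirmed numerically for sample Euler angles. A sketch assuming NumPy and the conventions of (20.300) as written here:

```python
import numpy as np

s2 = 1 / np.sqrt(2)
e = [s2 * np.array([[0, -1j], [-1j, 0]]),        # orthonormal basis (20.290)
     s2 * np.array([[0, 1], [-1, 0]], dtype=complex),
     s2 * np.array([[1j, 0], [0, -1j]])]

def g_euler(a, b, c):
    """SU(2) element g_{alpha beta gamma} of (20.300)."""
    ga = np.diag([np.exp(1j * a / 2), np.exp(-1j * a / 2)])
    gb = np.array([[np.cos(b / 2), np.sin(b / 2)],
                   [-np.sin(b / 2), np.cos(b / 2)]])
    gc = np.diag([np.exp(1j * c / 2), np.exp(-1j * c / 2)])
    return ga @ gb @ gc

def Ad(g):
    """Matrix of Ad[g] in the basis (e1, e2, e3), cf. (20.285)."""
    inner = lambda P, Q: np.trace(P.conj().T @ Q).real
    return np.array([[inner(e[i], g @ e[j] @ np.linalg.inv(g))
                      for j in range(3)] for i in range(3)])

a, b, c = 0.4, 1.1, -0.7
# The Euler rotation matrix of (20.302)
R = np.array([
    [np.cos(a)*np.cos(b)*np.cos(c) - np.sin(a)*np.sin(c),
     -np.cos(a)*np.cos(b)*np.sin(c) - np.sin(a)*np.cos(c),
     np.cos(a)*np.sin(b)],
    [np.sin(a)*np.cos(b)*np.cos(c) + np.cos(a)*np.sin(c),
     -np.sin(a)*np.cos(b)*np.sin(c) + np.cos(a)*np.cos(c),
     np.sin(a)*np.sin(b)],
    [-np.sin(b)*np.cos(c), np.sin(b)*np.sin(c), np.cos(b)]])

g = g_euler(a, b, c)
assert np.allclose(Ad(g), R)     # (20.302)
assert np.allclose(Ad(-g), R)    # two-to-one: Ad[g] = Ad[-g]
```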

The matrix on the RHS of (20.302) is the same as (17.101), which contains the Euler angles. The calculations are somewhat lengthy but straightforward. As can be seen, e.g., in (20.294) and (20.302), we may symbolically express the above adjoint representation as [9]

Ad : SU(2) → SO(3)   (20.303)

in parallel with (20.289). Equation (20.303) shows the mapping between the different groups SU(2) and SO(3). From (20.302) we may identify Ad[g_{αβγ}] with the SO(3) matrix (17.101) and write


Ad[g_{αβγ}] = \begin{pmatrix} \cos α\cos β\cos γ - \sin α\sin γ & -\cos α\cos β\sin γ - \sin α\cos γ & \cos α\sin β \\ \sin α\cos β\cos γ + \cos α\sin γ & -\sin α\cos β\sin γ + \cos α\cos γ & \sin α\sin β \\ -\sin β\cos γ & \sin β\sin γ & \cos β \end{pmatrix}.   (20.304)

In the above discussion, we make several remarks. (i) g ∈ SU(2) is described by (2, 2) complex matrices, but Ad[SU(2)] is expressed by (3, 3) real orthogonal matrices. Thus, the dimension of the matrices is different. (ii) Notice again that (e_1 e_2 e_3) of (20.302) is an orthonormal basis set of 𝔰𝔲(2). Hence, (20.302) can be viewed as an orthogonal transformation of (e_1 e_2 e_3). That is, we may regard Ad[SU(2)] as all the matrix representations of SO(3). From (12.64) and (20.302), we write [9]

Ad[SU(2)] - SO(3) = 0  or  Ad[SU(2)] = SO(3).

Regarding the adjoint representation of G, we have the following important theorem.

Theorem 20.7 The adjoint representation Ad[SU(2)] is a surjective homomorphism of SU(2) onto SO(3). The kernel F of the representation is {e, -e} ⊂ SU(2). With any rotation R ∈ SO(3), the two elements ±g ∈ SU(2) satisfy Ad[g] = Ad[-g] = R.

Proof We have shown by (20.302) that the adjoint representation Ad[SU(2)] is a surjective mapping of SU(2) onto SO(3). Since Ad[g] is a representation, namely a homomorphism mapping from SU(2) to SO(3), we seek its kernel on the basis of Theorem 16.3 (Sect. 16.4). Let h be an element of the kernel in SU(2) such that

Ad[h] = E ∈ SO(3),   (20.305)

where E is the identity transformation of SO(3). Operating (20.305) on e_3 ∈ 𝔰𝔲(2), we have the following expression:

Ad[h](e_3) = he_3h^{-1} = E(e_3) = e_3  or  he_3 = e_3h,

where e_3 is given by (20.290). Expressing h as \begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix}, we get

\begin{pmatrix} ih_{11} & -ih_{12} \\ ih_{21} & -ih_{22} \end{pmatrix} = \begin{pmatrix} ih_{11} & ih_{12} \\ -ih_{21} & -ih_{22} \end{pmatrix}.

From the above relation, we must have h_{12} = h_{21} = 0. As h ∈ SU(2), det h = 1. Hence, for h we choose h = \begin{pmatrix} a & 0 \\ 0 & a^{-1} \end{pmatrix}, where a ≠ 0. Moreover, considering he_2 = e_2h, we get


Fig. 20.6 Homomorphism mapping Ad : SU(2) ⟶ SO(3) with the kernel F = {e, -e}. E is the identity transformation of SO(3)

\begin{pmatrix} 0 & a \\ -a^{-1} & 0 \end{pmatrix} = \begin{pmatrix} 0 & a^{-1} \\ -a & 0 \end{pmatrix}.   (20.306)

This implies that a = a^{-1}, or a = ±1. That is, h = ±e, where e denotes the identity element of SU(2). (Note that we get no further conditions from he_1 = e_1h.) Thus, we obtain

F = {e, -e}.   (20.307)

In fact, with respect to X ∈ 𝔰𝔲(2) we have

Ad[±e](X) = (±e)X(±e)^{-1} = (±e)X(±e) = X.   (20.308)

Therefore, certainly we get

Ad[±e] = E.   (20.309)

Meanwhile, suppose that with g_1, g_2 ∈ SU(2) we have Ad[g_1] = Ad[g_2] ∈ SO(3). Then, we have

Ad[g_1g_2^{-1}] = Ad[g_1]Ad[g_2^{-1}] = Ad[g_2]Ad[g_2^{-1}] = Ad[g_2g_2^{-1}] = Ad[e] = E.   (20.310)

Therefore, from (20.307) and (20.309) we get

g_1g_2^{-1} = ±e  or  g_1 = ±g_2.   (20.311)

Conversely, we have

Ad[-g] = Ad[-e]Ad[g] = Ad[g],   (20.312)

where with the first equality we used Theorem 20.6 and with the second equality we used (20.309). The relations (20.311) and (20.312) imply that with any g ∈ SU(2) there are two elements ±g that satisfy Ad[g] = Ad[-g]. These complete the proof.

In the above proof of Theorem 20.7, (20.309) is a simple but important expression. In view of Theorem 16.4 (the homomorphism theorem), we summarize the above discussion as follows: Let Ad be a homomorphism mapping such that


Ad : SU(2) ⟶ SO(3)   (20.303)

with the kernel F = {e, -e}. The relation is schematically represented in Fig. 20.6. Notice the resemblance between Figs. 20.6 and 16.1a. Correspondingly, from Theorem 16.4 of Sect. 16.4 we have an isomorphic mapping Ãd such that

Ãd : SU(2)/F ⟶ SO(3),   (20.313)

where F = {e, -e}. Writing (20.313) symbolically as in Sect. 16.4, we have SU(2)/F ≅ SO(3). Notice that F is an invariant subgroup of SU(2). Using the general form (20.300) of an SU(2) element and the general form (20.302) of an SO(3) element, we can readily show Ad[g_{αβγ}] = Ad[-g_{αβγ}]. That is, from (20.300) we have

g_{α+2π, β+2π, γ+2π} = -g_{αβγ}.   (20.314)

Both of the two elements g_{αβγ} and -g_{αβγ} produce the identical orthogonal matrix given on the RHS of (20.302). To associate the above results of abstract algebra with the tangible matrix form obtained earlier, we rewrite (20.54) by applying the simple trigonometric formula

sin β = 2 sin(β/2) cos(β/2).

That is, we get

D^{(j)}_{m'm}(α, β, γ) = \sum_{μ} \frac{(-1)^{m'-m+μ}\sqrt{(j+m)!(j-m)!(j+m')!(j-m')!}}{(j+m-μ)!\,μ!\,(m'-m+μ)!\,(j-m'-μ)!}\, e^{-i(αm'+γm)} \left(\cos\frac{β}{2}\right)^{2j+m-m'-2μ} \left(\sin\frac{β}{2}\right)^{m'-m+2μ}

 = \sum_{μ} \frac{(-1)^{m'-m+μ}\sqrt{(j+m)!(j-m)!(j+m')!(j-m')!}}{(j+m-μ)!\,μ!\,(m'-m+μ)!\,(j-m'-μ)!}\, 2^{m-m'-2μ}\, e^{-i(αm'+γm)} (\sin β)^{m'-m+2μ} \left(\cos\frac{β}{2}\right)^{2j+2(m-m'-2μ)}.   (20.315)

From (20.315), we clearly see that (i) if j is zero or a positive integer, so are m and m′; therefore, (20.315) is a periodic function of α, β, and γ with period 2π. But (ii) if j is a half-odd-integer, so are m and m′; then, (20.315) is a periodic function with period 4π. Note that in case (ii), 2j = 2n + 1 (n = 0, 1, 2, ⋯). Therefore, in (20.315) we have

\left(\cos\frac{β}{2}\right)^{2j+2(m-m'-2μ)} = \cos\frac{β}{2}\,\left(\cos\frac{β}{2}\right)^{2(n+m-m'-2μ)}.   (20.316)

As studied in Sect. 20.2.3, in the former case (i) the spherical surface harmonics span the representation space of D^{(l)} (l = 0, 1, 2, ⋯). The simplest case of it is D^{(0)} = 1, a (1, 1) identity matrix, i.e., merely the number 1. The second simplest case was given by (20.74), which describes D^{(1)}. Meanwhile, the simplest case of (ii) is D^{(1/2)}, whose matrix representation was given by D^{(1/2)}_{α,β,γ} in (20.45) or by g_{αβγ} in (20.300). With ±g_{αβγ}, both Ad[±g_{αβγ}] produce the real orthogonal (3, 3) matrix expressed as (20.302). This representation matrix naturally belongs to SO(3).
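The 4π periodicity of case (ii) can be seen directly from the j = 1/2 matrix (20.300). A sketch assuming NumPy and the Euler-angle conventions of (20.300) as written here:

```python
import numpy as np

def g_euler(a, b, c):
    """The j = 1/2 representation matrix g_{alpha beta gamma} of (20.300)."""
    ga = np.diag([np.exp(1j * a / 2), np.exp(-1j * a / 2)])
    gb = np.array([[np.cos(b / 2), np.sin(b / 2)],
                   [-np.sin(b / 2), np.cos(b / 2)]])
    gc = np.diag([np.exp(1j * c / 2), np.exp(-1j * c / 2)])
    return ga @ gb @ gc

a, b, c = 0.3, 0.9, 1.4
two_pi = 2 * np.pi

# A 2*pi shift of all three angles flips the sign, (20.314) ...
assert np.allclose(g_euler(a + two_pi, b + two_pi, c + two_pi), -g_euler(a, b, c))
# ... while a 4*pi shift restores the matrix: period 4*pi for half-odd-integer j
assert np.allclose(g_euler(a + 2 * two_pi, b + 2 * two_pi, c + 2 * two_pi),
                   g_euler(a, b, c))
```

For integer j the representation (e.g., the SO(3) matrix (20.302)) is 2π periodic, since only whole multiples of the angles appear.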

20.5 Connectedness of Lie Groups

As already discussed in Chap. 6, connectedness is an important concept for Lie groups, reflecting their topological characteristics. For analytic functions on the complex plane ℂ, connectedness was easily envisaged geometrically. To deal with connectedness in general Lie groups, however, we need abstract concepts. In this section we take a down-to-earth approach as much as possible.

20.5.1 Several Definitions and Examples

To discuss this topic, let us give several definitions and examples.

Definition 20.5 Suppose that A is a subset of GL(n, ℂ) and that we have a pair of elements a, b ∈ A ⊂ GL(n, ℂ). Meanwhile, let f(t) be a continuous function defined on the interval 0 ≤ t ≤ 1 that takes values within A; that is, f(t) ∈ A for all t. If f(0) = a and f(1) = b, then a and b are said to be connected within A.

Major parts of the discussion that follows are based mostly upon the literature [9]. A simple example for the above is given below.

Example 20.6 Let a = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} and b = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} be elements of SO(2). Let f(t) be described by

f(t) = \begin{pmatrix} \cos πt & -\sin πt \\ \sin πt & \cos πt \end{pmatrix}.

Then, f(t) is a continuous matrix function of all real numbers t. We have f(0) = a and f(1) = b and, hence, a and b are connected to each other within SO(2).
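Example 20.6 can be traced numerically by sampling the path (a sketch assuming NumPy):

```python
import numpy as np

def f(t):
    """Continuous path in SO(2) from Example 20.6."""
    return np.array([[np.cos(np.pi * t), -np.sin(np.pi * t)],
                     [np.sin(np.pi * t),  np.cos(np.pi * t)]])

a, b = np.eye(2), -np.eye(2)
assert np.allclose(f(0), a)   # f(0) = a
assert np.allclose(f(1), b)   # f(1) = b

# Every sampled point of the path stays inside SO(2)
for t in np.linspace(0, 1, 101):
    F = f(t)
    assert np.allclose(F.T @ F, np.eye(2))
    assert np.isclose(np.linalg.det(F), 1.0)
```

Thus the identity and -E lie in the same connected component of SO(2), which is in fact connected as a whole.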


For a and b to be connected within A is denoted by a ~ b. The symbol ~ satisfies the equivalence relation; that is, we have the following proposition.

Proposition 20.2 Let a, b, c ∈ A ⊂ GL(n, ℂ). Then, we have the following three relations: (i) a ~ a. (ii) If a ~ b, then b ~ a. (iii) If a ~ b and b ~ c, then a ~ c.

Proof (i) Let f(t) = a (0 ≤ t ≤ 1). Then, f(0) = f(1) = a, so that we have a ~ a. (ii) Let f(t) be a continuous curve that connects a and b such that f(0) = a and f(1) = b (0 ≤ t ≤ 1), and so a ~ b. Then, g(t) ≡ f(1 - t) is also a continuous curve within A with g(0) = b and g(1) = a. This implies b ~ a. (iii) Let f(t) and g(t) be two continuous curves that connect a and b, and b and c, respectively. Meanwhile, we define h(t) as

h(t) = f(2t)  (0 ≤ t ≤ 1/2);  h(t) = g(2t - 1)  (1/2 ≤ t ≤ 1),

so that h(t) is a third continuous curve; note that h(1/2) = f(1) = g(0). From the supposition we have h(0) = f(0) = a, h(1/2) = f(1) = g(0) = b, and h(1) = g(1) = c. This implies that if a ~ b and b ~ c, then we have a ~ c.

Let us give another related definition.

Definition 20.6 Let A be a subset of GL(n, ℂ) and let a point a ∈ A. Then, the collection of points that are connected to a within A is said to be the connected component of a and is denoted by C(a).

From Proposition 20.2 (i), a ∈ C(a). Let C(a) and C(b) be two connected components. We have two alternatives for C(a) and C(b): (I) C(a) ∩ C(b) = ∅; (II) C(a) = C(b). Indeed, suppose c ∈ C(a) ∩ C(b). Then, it follows that c ~ a and c ~ b. From Proposition 20.2 (ii) and (iii), we have a ~ b and hence C(a) = C(b). Thus, the subset A is a direct sum of several connected components such that

A = C(a) ∪ C(b) ∪ ⋯ ∪ C(z) with C(p) ∩ C(q) = ∅ (p ≠ q),   (20.317)

where p, q are taken from among a, b, ⋯, z. In particular, the connected component containing e (i.e., the identity element) is of major importance.

Definition 20.7 Let A be a subset of GL(n, ℂ). If any pair of elements a, b (∈ A) is connected within A, then A is said to be connected, or A is called a connected set.

A connected set is nothing but a set that consists of a single connected component. With a connected set, we have the following important theorem.

Theorem 20.8 The image of a connected set A under a continuous mapping f is a connected set as well.

Proof The proof is almost self-evident. Let f : A ⟶ B be a continuous mapping from a connected set A to B. Suppose f(A) = B. Then, we have ∀a, a′ ∈ A and b, b′ ∈ B that


satisfy f(a) = b and f(a′) = b′. Then, from Definition 20.5 there is a continuous function g(t) (0 ≤ t ≤ 1) that satisfies g(0) = a and g(1) = a′. Now, define a function h(t) ≡ f[g(t)]. Then, h(t) is a continuous mapping with h(0) = f[g(0)] = f(a) = b and h(1) = f[g(1)] = f(a′) = b′. Again from Definition 20.5, h(t) is a continuous curve that connects b and b′ within B. Meanwhile, from the supposition f(A) = B, with any b, b′ ∈ B we must have ∃a₀, a₀′ ∈ A that satisfy f(a₀) = b and f(a₀′) = b′. Consequently, from Definition 20.7, B is a connected set as well. This completes the proof.

Applications of Theorem 20.8 are given below as examples.

Example 20.7 Let f(x) be a real continuous function defined on a subset A ⊂ GL(n, ℝ), where x ∈ A with A being a connected set. Suppose that for a pair a, b ∈ A, f(a) = p and f(b) = q (p, q real) with p < q. Then, f(x) takes every real number on the interval [p, q] as its value. In other words, we have f(A) ⊃ [p, q]. This is well known as the intermediate value theorem.

Example 20.8 [9] Suppose that A is a connected subset of GL(n, ℝ). Then, for any pair of elements a, b ∈ A ⊂ GL(n, ℝ), there must be a continuous function g(t) (0 ≤ t ≤ 1) such that g(0) = a and g(1) = b. Meanwhile, the determinant is a real continuous function of the matrix entries, so det g(t) is a real continuous function of t. Now, suppose that det a = p < 0 and det b = q > 0 with p < q. Then, from Theorem 20.8 we would have det(A) ⊃ [p, q]. Consequently, we would have some x ∈ A such that det x = 0. But this contradicts A ⊂ GL(n, ℝ), because every element of GL(n, ℝ) has a nonzero determinant. Thus, within a connected set the sign of the determinant must be constant, as long as the determinant is a real continuous function.

In relation to the discussion of the connectedness, we have the following general theorem.

Theorem 20.9 [9] Let G₀ be the connected component within a linear Lie group G that contains the identity element e. Then, G₀ is an invariant subgroup of G.
A connected component C(g) containing g ∈ G is identical with gG₀ = G₀g.

Proof Let a, b ∈ G₀. Since G₀ = C(e), from the supposition there are continuous curves f(t) and g(t) within G₀ that connect a with e and b with e, respectively, such that f(0) = a, f(1) = e and g(0) = b, g(1) = e. Meanwhile, let h(t) ≡ f(t)[g(t)]⁻¹. Then, h(t) is another continuous curve. We have h(0) = f(0)[g(0)]⁻¹ = ab⁻¹ and h(1) = f(1)[g(1)]⁻¹ = e·e⁻¹ = e. That is, ab⁻¹ and e are connected, indicating ab⁻¹ ∈ G₀. This implies that G₀ is a subgroup of G.

Next, with ∀g ∈ G and ∀a ∈ G₀ we take f(t) in the same sense as above. Defining k(t) as k(t) = gf(t)g⁻¹, k(t) is a continuous curve within G with k(0) = gag⁻¹ and k(1) = geg⁻¹ = e. Hence, we have gag⁻¹ ∈ G₀. Since a is an arbitrary point of G₀, we have gG₀g⁻¹ ⊂ G₀ accordingly. Similarly, g is an arbitrary point of G, and so replacing g with g⁻¹ in the above, we get g⁻¹G₀g ⊂ G₀. Operating with g and g⁻¹ from the left and right, respectively, we obtain G₀ ⊂ gG₀g⁻¹. Combining this with the above


relation gG₀g⁻¹ ⊂ G₀, we obtain gG₀g⁻¹ = G₀, or gG₀ = G₀g. This means that G₀ is an invariant subgroup of G; see Sect. 16.3.

Taking ∀a ∈ G₀ and f(t) in the same sense as above again, l(t) = gf(t) is a continuous curve within G with l(0) = ga and l(1) = ge = g. Then, ga and g are connected within G; that is, ga ∈ C(g). Since a is an arbitrary point of G₀, gG₀ ⊂ C(g). Meanwhile, choosing any p ∈ C(g), we set a continuous curve m(t) (0 ≤ t ≤ 1) that connects p with g within G, so that m(0) = p and m(1) = g. Also setting a continuous curve n(t) = g⁻¹m(t), we have n(0) = g⁻¹m(0) = g⁻¹p and n(1) = g⁻¹m(1) = g⁻¹g = e. This means that g⁻¹p ∈ G₀. Multiplying both sides by g, we get p ∈ gG₀. Since p is arbitrarily chosen from C(g), we get C(g) ⊂ gG₀. Combining this with the above relation gG₀ ⊂ C(g), we obtain C(g) = gG₀ = G₀g. This completes the proof.

To the above proof, we add the following statement. In Sect. 16.2, the necessary and sufficient condition for a subset H to be a subgroup was given as (1) hᵢ, hⱼ ∈ H ⟹ hᵢ⋄hⱼ ∈ H and (2) h ∈ H ⟹ h⁻¹ ∈ H. The conditions (1) and (2) can be combined into the single condition

hᵢ ∈ H, hⱼ ∈ H ⟹ hᵢ⋄hⱼ⁻¹ ∈ H.

Indeed, if in this condition we replace hⱼ with hᵢ, we get hᵢ⋄hᵢ⁻¹ = e ∈ H. In turn, if we replace hᵢ with e, we get e⋄hⱼ⁻¹ = hⱼ⁻¹ ∈ H. Finally, if we replace hⱼ with hⱼ⁻¹, we get hᵢ⋄(hⱼ⁻¹)⁻¹ = hᵢ⋄hⱼ ∈ H. Thus, H satisfies the axioms (A1), (A3), and (A4) of Sect. 16.1 and, hence, forms a (sub)group.

The above general theorem can immediately be applied to an important class of Lie groups, SO(n). We discuss this topic below.

20.5.2 O(3) and SO(3)

In Chap. 17 we dealt with finite groups related to O(3) and SO(3), i.e., their subgroups. In this section, we examine several important properties of these groups in terms of Lie groups. The groups O(3) and SO(3) are characterized by their determinants, det O(3) = ±1 and det SO(3) = 1. Example 20.8 implies that O(3) should consist of two different connected components according to det O(3) = ±1. Also, Theorem 20.9 tells us that the connected component C(E) is an invariant subgroup G₀ of O(3), where E is the identity element of O(3). Obviously, G₀ is SO(3). In turn, the other connected component is SO(3)ᶜ, where SO(3)ᶜ denotes the complementary set of SO(3) within O(3); for the notation see Sect. 6.1. Remembering (20.317), we have


O(3) = SO(3) ∪ SO(3)ᶜ.   (20.318)

This can be rewritten as a direct sum such that

O(3) = C(E) ∪ C(−E) with C(E) ∩ C(−E) = ∅,   (20.319)

where E denotes the (3, 3) unit matrix and

−E =
( −1   0   0 )
(  0  −1   0 )
(  0   0  −1 ).

In the above, SO(3) is identical to C(E), with SO(3) itself being a connected set; SO(3)ᶜ is identical with C(−E), which is another connected set. Another description of O(3) is given by [13]

O(3) = SO(3) ⊗ {E, −E},   (20.320)

where the symbol ⊗ denotes a direct-product group (Sect. 16.5). An alternative description for this is

O(3) = SO(3) ∪ {(−E)SO(3)},   (20.321)

where the direct sum is implied.

In Sect. 20.2.6, we discussed two rotations around different rotation axes with the same rotation angle. These rotations belong to the same conjugacy class; see (20.140). With Q as another rotation that transforms the rotation axis of Rω to that of R′ω, the two rotations Rω and R′ω having the same rotation angle ω are associated with each other through the following relation:

R′ω = QRωQ⁻¹.   (20.140)

Notice that this relation is independent of the choice of a specific coordinate system. In Fig. 20.4 let us choose the z-axis for the rotation axis A with respect to the rotation Rω. Then, the rotation matrix Rω is expressed in reference to the Cartesian coordinate system as

Rω =
( cos ω   −sin ω   0 )
( sin ω    cos ω   0 )
( 0        0       1 ).   (20.322)

Meanwhile, multiplying both sides of (20.140) by −E, we have

−R′ω = Q(−Rω)Q⁻¹.   (20.323)

Note that −E commutes with any Q (or Q⁻¹). Also, notice that from (20.323) −R′ω and −Rω are again connected via a unitary similarity transformation. From (20.322), we have

−Rω =
( −cos ω    sin ω    0 )
( −sin ω   −cos ω    0 )
(  0        0       −1 )
=
( cos(ω + π)   −sin(ω + π)   0 )
( sin(ω + π)    cos(ω + π)   0 )
( 0             0           −1 ),   (20.324)

where −Rω represents an improper rotation of angle ω + π (see Sect. 17.1). Referring to Fig. 17.12 and Table 17.6 (Sect. 17.3), as another tangible example we further get

( 0  −1   0 ) ( −1   0   0 )   ( 0  1  0 )
( −1  0   0 ) (  0  −1   0 ) = ( 1  0  0 ),   (20.325)
( 0   0  −1 ) (  0   0  −1 )   ( 0  0  1 )

where the first factor on the LHS belongs to O (octahedral rotation group), the second factor is the inversion −E, and the RHS belongs to Td (tetrahedral group), giving a mirror symmetry. Combining (20.324) and (20.325), we get the following schematic representation:

(Rotation) × (Inversion) = (Improper rotation) or (Mirror symmetry).

This is a finite group version of (20.320).
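The relations (20.324) and (20.325) can be checked numerically. The following sketch (not from the text; it assumes NumPy) verifies that −Rω equals the improper rotation of angle ω + π, and that the C₂ rotation of (20.325) times the inversion −E yields a mirror symmetry:

```python
import numpy as np

def Rz(w):
    """Proper rotation by angle w about the z-axis, Eq. (20.322)."""
    return np.array([[np.cos(w), -np.sin(w), 0.0],
                     [np.sin(w),  np.cos(w), 0.0],
                     [0.0,        0.0,       1.0]])

E = np.eye(3)
w = 0.7  # arbitrary rotation angle

# (20.324): -Rz(w) is the improper rotation of angle w + pi,
# i.e., Rz(w + pi) followed by the mirror diag(1, 1, -1).
improper = Rz(w + np.pi) @ np.diag([1.0, 1.0, -1.0])
assert np.allclose(-Rz(w), improper)

# (20.325): a C2 rotation (element of O) times the inversion -E
# gives a mirror symmetry (element of Td).
C2 = np.array([[0, -1, 0], [-1, 0, 0], [0, 0, -1]], dtype=float)
mirror = C2 @ (-E)
assert np.allclose(mirror, np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=float))
assert np.isclose(np.linalg.det(C2), 1.0)       # proper rotation: det = +1
assert np.isclose(np.linalg.det(mirror), -1.0)  # improper (mirror): det = -1
print("rotation x inversion = improper rotation / mirror: verified")
```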

20.5.3 Simply Connected Lie Groups: Local Properties and Global Properties

As a final topic of Lie groups, we outline the characteristics of simply connected groups. In (20.318), (20.319), and (20.321) we showed three different decompositions of O(3). This can be generalized to O(n). That is, instead of −E we may take F described by [9]

F = diag(1, 1, ⋯, 1, −1),   (20.326)

where det F = −1. Then, we have

O(n) = SO(n) ∪ SO(n)ᶜ = C(E) ∪ C(F) = SO(n) + F SO(n),   (20.327)

where the last side represents the coset decomposition (Sect. 16.2). This is essentially the same as (20.321). In terms of the isomorphism, we have


O(n)/SO(n) ≅ {E, F},   (20.328)

where {E, F} is a subgroup of O(n). If, in particular, n is odd (n ≥ 3), we can choose −E instead of F, because det(−E) = −1 for odd n. Then, similarly to (20.320) we have

O(n) = SO(n) ⊗ {E, −E}  (n: odd),   (20.329)

where E is the (n, n) identity matrix. Note that since −E commutes with any element of O(n), the direct product in (20.329) is allowed.

Of the connected components, the one containing the identity element is of particular importance. This is because, as already mentioned in Sect. 20.3.1, the "initial" condition of the one-parameter group A(t) is set at t = 0 such that

A(0) = E.   (20.238)

Under this condition, it is important to examine how A(t) evolves with t, starting from E at t = 0. In this connection we have the following theorems that give good grounds for the further discussion of Sect. 20.2. We describe them without proof; interested readers are encouraged to consult the literature [9].

Theorem 20.10 Let G be a linear Lie group. Let G₀ be the connected component of the identity element e ∈ G. Then, G₀ is itself a linear Lie group, and the Lie algebras of G and G₀ are identical.

Theorem 20.10 shows the significance of the connected component of the identity. From this theorem the Lie algebras of, e.g., o(n) and so(n) are identical; namely, both Lie algebras consist of the real skew-symmetric matrices. This is inherently related to the properties of Lie algebras. The zero matrix (i.e., the zero vector) as an element of the Lie algebra corresponds to the identity element of the Lie group. This is obvious from the relation e = exp 0, where e is the identity element of the Lie group and 0 represents the zero matrix. Basically, Lie algebras are suited for describing local properties associated with infinitesimal transformations around the identity element e. More specifically, the exponential functions of the real skew-symmetric matrices cannot produce −E. Considering that the connected component of O(3) containing the identity element is SO(3), it is intuitively understandable that o(n) and so(n) are the same.

Another interesting point is that SO(3) is at once an open set and a closed set (i.e., a clopen set; see Sect. 6.1). This is relevant to Theorem 6.3, which says that a necessary and sufficient condition for a subset S of a topological space to be both open and closed at once is Sᵇ = ∅. Since the determinant of any element of O(3) is either +1 or −1, it is natural that SO(3) has no boundary; if there were a boundary, its determinant would have to be zero, in contradiction to SO(3) ⊂ GL(3, ℝ); see Example 20.8.
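The statement that exponentials of real skew-symmetric matrices cannot produce −E can be illustrated numerically. A minimal sketch (assuming NumPy; the exact matrix exponential on so(3) is computed with Rodrigues' formula, an assumption of this sketch rather than the text's method):

```python
import numpy as np

def skew(v):
    """Real skew-symmetric matrix (an element of so(3)) built from v in R^3."""
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def expm_so3(X):
    """exp(X) for X in so(3), via the (exact) Rodrigues formula."""
    theta = np.sqrt(0.5 * np.trace(X.T @ X))  # |v| for X = skew(v)
    if theta < 1e-12:
        return np.eye(3)
    return np.eye(3) + (np.sin(theta) / theta) * X \
           + ((1.0 - np.cos(theta)) / theta**2) * (X @ X)

rng = np.random.default_rng(0)
for _ in range(100):
    R = expm_so3(skew(rng.normal(size=3)))
    assert np.allclose(R.T @ R, np.eye(3))    # orthogonal
    assert np.isclose(np.linalg.det(R), 1.0)  # det is always +1: -E is never reached
print("exp(so(3)) stays inside SO(3); det = +1 throughout")
```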


Theorem 20.11 Let G be a connected linear Lie group. (The Lie group G may be G₀ in the sense of Theorem 20.10.) Then, any g ∈ G can be described by

g = (exp t₁X₁)(exp t₂X₂)⋯(exp t_d X_d),

where Xᵢ (i = 1, 2, ⋯, d) is a basis set of the Lie algebra corresponding to G and tᵢ (i = 1, 2, ⋯, d) is an arbitrary real number.

Originally, the notion of Lie algebra was introduced to deal with infinitesimal transformations near the identity element. Yet, Theorem 20.11 says that any transformation (or element) of a connected linear Lie group can be described by a finite number of elements of the corresponding Lie algebra. That is, the Lie algebra determines the global characteristics of the connected Lie group.

Strengthening the condition of connectedness, we next deal with simply connected groups. In this context we have the following definitions.

Definition 20.8 Let G be a linear Lie group and let I = [0, 1] be an interval. Let f be a continuous function such that

f : I ⟶ G or f(I) ⊂ G.   (20.330)

Then, the function f is called a path, in which f(0) and f(1) are the initial point and the end point, respectively. If f(0) = f(1) = x₀, f is called a loop (i.e., a closed path) at x₀ [14]. If f(t) ≡ x₀ (0 ≤ t ≤ 1), f is said to be a constant loop.

Definition 20.9 Let f and g be two paths. If f and g can be continuously deformed into each other, they are said to be homotopic.

To be more specific about Definition 20.9, let us define a function h(s, t) that is continuous with respect to s and t in the region I × I = {(s, t); 0 ≤ s ≤ 1, 0 ≤ t ≤ 1}. If, furthermore, h(0, t) = f(t) and h(1, t) = g(t) hold, f and g are homotopic [9].

Definition 20.10 Let G be a connected Lie group. If all the loops at an arbitrarily chosen point x ∈ G are homotopic to a constant loop, G is said to be simply connected.

This is a somewhat abstract concept. In a word, for G to be simply connected means that every loop at any x ∈ G can be continuously contracted to the point x. The next example helps us understand the meaning of being simply connected.

Example 20.9 Let S² be the spherical surface in ℝ³ such that

S² = {x ∈ ℝ³; ⟨x|x⟩ = 1}.   (20.331)

Any x is equivalent by virtue of the spherical symmetry of S². Let us think of any loop at x. This loop can be contracted to the point x. Hence, S² is simply connected.
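The contraction of a loop on S² can be made concrete. Below is a hedged sketch with NumPy (not from the text) for a loop that avoids the antipode of the base point x₀; a general loop can first be deformed off the antipode. It realizes a homotopy h(s, t) in the sense of Definition 20.9 between the loop and the constant loop:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

x0 = np.array([0.0, 0.0, 1.0])  # base point on S^2 (the north pole)

def loop(t):
    """A loop at x0 (Definition 20.8): f(0) = f(1) = x0, staying on S^2."""
    t = np.asarray(t)
    raw = np.stack([0.5 * (1.0 - np.cos(2 * np.pi * t)),
                    0.5 * np.sin(2 * np.pi * t),
                    np.ones_like(t)], axis=-1)
    return normalize(raw)

def h(s, t):
    """Homotopy h(s, t): straight-line contraction projected back onto S^2.
    Well defined here because this loop never hits the antipode -x0."""
    p = (1.0 - s) * loop(t) + s * x0
    return normalize(p)

ts = np.linspace(0.0, 1.0, 201)
assert np.allclose(h(0.0, ts), loop(ts))  # h(0, t) = f(t)
assert np.allclose(h(1.0, ts), x0)        # h(1, t) = constant loop at x0
for s in np.linspace(0.0, 1.0, 21):
    assert np.allclose(np.linalg.norm(h(s, ts), axis=-1), 1.0)  # stays on S^2
print("loop on S^2 contracted continuously to the base point")
```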


By the same token, a spherical hypersurface (or hypersphere) given by Sⁿ = {x ∈ ℝⁿ⁺¹; ⟨x|x⟩ = 1} is simply connected. Thus, from (20.36) SU(2) is simply connected. Note that the parameter space of SU(2) is S³.

Example 20.10 In Sect. 20.4.2 we showed that SO(3) is a connected set. But it is not simply connected. To see this, we return to Sect. 17.4.2, which dealt with the three-dimensional rotation matrices. Rewriting R̃₃ in (17.101) as R̃₃(α, β, γ), we have

R̃₃(α, β, γ) =
( cos α cos β cos γ − sin α sin γ   −cos α cos β sin γ − sin α cos γ   cos α sin β )
( sin α cos β cos γ + cos α sin γ   −sin α cos β sin γ + cos α cos γ   sin α sin β )
( −sin β cos γ                       sin β sin γ                       cos β      ).   (20.332)

Note that in (20.332) we represent the rotation in the moving coordinate system. Putting β = 0 in (20.332), we get

R̃₃(α, 0, γ) =
( cos α cos γ − sin α sin γ   −cos α sin γ − sin α cos γ   0 )
( sin α cos γ + cos α sin γ   −sin α sin γ + cos α cos γ   0 )
( 0                            0                           1 )
=
( cos(α + γ)   −sin(α + γ)   0 )
( sin(α + γ)    cos(α + γ)   0 )
( 0             0            1 ).   (20.333)

If in (20.333) we replace α with α + ϕ₀ and γ with γ − ϕ₀, we obtain the same result as (20.333). That is, different sets of parameters give the same result (i.e., the same rotation) such that

(α, 0, γ) ⟷ (α + ϕ₀, 0, γ − ϕ₀).   (20.334)
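The redundancy (20.334) is easy to confirm numerically. A sketch assuming NumPy, with the Euler-angle matrix of (20.332) coded directly:

```python
import numpy as np

def R3_tilde(a, b, g):
    """Euler-angle rotation matrix of Eq. (20.332), moving-axes z-y-z convention."""
    ca, sa = np.cos(a), np.sin(a)
    cb, sb = np.cos(b), np.sin(b)
    cg, sg = np.cos(g), np.sin(g)
    return np.array([
        [ca*cb*cg - sa*sg, -ca*cb*sg - sa*cg, ca*sb],
        [sa*cb*cg + ca*sg, -sa*cb*sg + ca*cg, sa*sb],
        [-sb*cg,            sb*sg,            cb   ]])

a, g, phi0 = 0.4, 1.1, 0.8
# (20.334): at beta = 0, the parameter sets (a, 0, g) and
# (a + phi0, 0, g - phi0) give one and the same rotation.
assert np.allclose(R3_tilde(a, 0.0, g), R3_tilde(a + phi0, 0.0, g - phi0))
print("parameter redundancy at beta = 0 confirmed")
```

The same kind of check applies at β = π, where the corresponding redundancy appears in the combination α − γ.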

Putting β = π in (20.332), we get

R̃₃(α, π, γ) =
( −cos(α − γ)   −sin(α − γ)    0 )
( −sin(α − γ)    cos(α − γ)    0 )
(  0             0            −1 ).   (20.335)

If, in turn, we replace α with α + ϕ₀ and γ with γ + ϕ₀ in (20.335), we obtain the same result as (20.335). Again, different sets of parameters give the same result (i.e., the same rotation) such that

(α, π, γ) ⟷ (α + ϕ₀, π, γ + ϕ₀).   (20.336)

Meanwhile, R₃ of (17.107) is expressed as

R₃ =
( (cos²α cos²β + sin²α) cos γ + cos²α sin²β ,   cos α sin α sin²β (1 − cos γ) − cos β sin γ ,   cos α cos β sin β (1 − cos γ) + sin α sin β sin γ )
( cos α sin α sin²β (1 − cos γ) + cos β sin γ ,   (sin²α cos²β + cos²α) cos γ + sin²α sin²β ,   sin α cos β sin β (1 − cos γ) − cos α sin β sin γ )
( cos β sin β cos α (1 − cos γ) − sin α sin β sin γ ,   sin α cos β sin β (1 − cos γ) + cos α sin β sin γ ,   sin²β cos γ + cos²β )

(20.337)

Note that in (20.337) we represent the rotation in the fixed coordinate system. In this case, again we have arbitrariness in the choice of parameters. Figure 20.7 shows the geometry of the rotation axis (A) accompanied by a rotation γ; see Fig. 17.18 once again. In the parameter space specifying SO(3), a rotation is characterized by the azimuthal angle α, the zenithal angle β, and the magnitude of rotation γ. The domains of variability of the parameters (α, β, γ) are

0 ≤ α ≤ 2π, 0 ≤ β ≤ π, 0 ≤ γ ≤ 2π.

Figure 20.7a shows the geometry viewed in parallel to the equator, including the zenithal angle β as well as a point P (i.e., the point of intersection between the rotation axis and the sphere) and its antipodal point P̃. If β is measured at P̃, β and γ should be replaced with π − β and 2π − γ, respectively. Meanwhile, Fig. 20.7b, c depict the azimuthal angle α viewed from above the north pole (N). If the azimuthal angle α is measured at the antipodal point P̃, α should be replaced with either α + π (in the case of 0 ≤ α ≤ π; see Fig. 20.7b) or α − π (in the case of π < α ≤ 2π; see Fig. 20.7c). Consequently, we have the following correspondence:

(α, β, γ) ⟷ (α ± π, π − β, 2π − γ).   (20.338)
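The correspondence (20.338) can be confirmed numerically. The sketch below (assuming NumPy) builds the rotation of (20.337) from its axis and angle via Rodrigues' formula, which is algebraically equivalent to (20.337):

```python
import numpy as np

def R3(alpha, beta, gamma):
    """Rotation by angle gamma about the axis with azimuthal angle alpha and
    zenithal angle beta (fixed coordinate system); equivalent to Eq. (20.337)."""
    n = np.array([np.cos(alpha) * np.sin(beta),
                  np.sin(alpha) * np.sin(beta),
                  np.cos(beta)])                      # unit rotation axis
    K = np.array([[0, -n[2], n[1]],
                  [n[2], 0, -n[0]],
                  [-n[1], n[0], 0]])                  # skew matrix of n
    return np.eye(3) + np.sin(gamma) * K + (1 - np.cos(gamma)) * (K @ K)

alpha, beta, gamma = 0.9, 0.6, 2.0
# (20.338): (alpha, beta, gamma) and (alpha + pi, pi - beta, 2*pi - gamma)
# describe the same rotation (the antipodal axis with the complementary angle).
assert np.allclose(R3(alpha, beta, gamma),
                   R3(alpha + np.pi, np.pi - beta, 2 * np.pi - gamma))
print("correspondence (20.338) confirmed numerically")
```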

Thus, we confirm once again that different sets of parameters give the same rotation. In other words, two different points in the parameter space correspond to the same transformation (i.e., rotation). In fact, using these different sets of parameters, the matrix form of (20.337) remains unchanged. The confirmation is left to readers. Because of the aforementioned characteristic of the parameter space, a loop cannot always be contracted to a single point in SO(3). For this reason, SO(3) is not simply connected.

In contrast to Example 20.10, the mapping from the parameter space S³ to SU(2) is bijective. That is, any two different points of S³ give different transformations in SU(2) (i.e., the mapping is injective), and every element of SU(2) has a corresponding point in S³ (i.e., it is surjective). This can be rephrased by saying that SU(2) and S³ are isomorphic.

In this context, let us introduce important concepts. Let (T, τ) and (S, σ) be two topological spaces, and suppose that f : (T, τ) ⟶ (S, σ) is a bijective mapping. If in this


Fig. 20.7 Geometry of the rotation axis (A) accompanied by a rotation γ. (a) Geometry viewed in parallel to the equator. (b), (c) Geometry viewed from above the north pole (N). If the azimuthal angle α is measured at the antipodal point P̃, α should be replaced with (b) α + π (in the case of 0 ≤ α ≤ π) or (c) α − π (in the case of π < α ≤ 2π)


case, furthermore, both f and f⁻¹ are continuous mappings, (T, τ) and (S, σ) are said to be homeomorphic [14]. In our present case, SU(2) and S³ are homeomorphic.

Apart from the question of being simply connected, SU(2) and SO(3) resemble each other in many respects. The resemblance already showed up in Chap. 3, where we considered the (orbital) angular momentum and the generalized angular momentum. As shown in (3.30) and (3.69), the commutation relations were virtually the same. This is also obvious when we compare (20.271) and (20.275). That is, in terms of Lie algebras their structure constants are the same and, eventually, the structures of the corresponding Lie groups are related. This became well understood when we considered the adjoint representation of the Lie group G on its Lie algebra g. Of the groups whose Lie algebras are characterized by the same structure constants, the simply connected group is said to be the universal covering group. For example, among the Lie groups O(3), SO(3), and SU(2), only SU(2) is simply connected and, hence, is the universal covering group.

The understanding of the Lie algebra is indispensable for explicitly describing the elements of the Lie group, especially the connected component of the identity element e ∈ G. Then, we are able to understand the global characteristics of the Lie group. In this chapter, we have focused on SU(2) and SO(3) as typical examples of linear Lie groups.

Finally, though formal, we describe the definition of a topological group.

Definition 20.11 [15] Let G be a set. If G satisfies the following conditions, G is called a topological group.
1. The set G is a group.
2. The set G is a T1-space.
3. The group operations of G are continuous. That is, let P and Q be two mappings such that

P : G × G → G; P(g, h) = gh (g, h ∈ G),   (20.339)
Q : G → G; Q(g) = g⁻¹ (g ∈ G).   (20.340)

Then, both the mappings P and Q are continuous.

By Definition 20.11, a topological group combines the structures of a group and a topological space. Usually, the term continuous group refers collectively to Lie groups and topological groups. Continuous groups are very frequently dealt with in various fields of natural science and mathematical physics.
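The double covering of SO(3) by SU(2) discussed above can also be made tangible numerically. A sketch assuming NumPy; the covering homomorphism used here is the standard one, R_ij = (1/2) tr(σᵢ U σⱼ U†), under which the two distinct SU(2) elements U and −U map to the same rotation:

```python
import numpy as np

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
sigma = [sx, sy, sz]

def su2_to_so3(U):
    """Covering homomorphism SU(2) -> SO(3): R_ij = (1/2) tr(sigma_i U sigma_j U^dagger)."""
    R = np.empty((3, 3))
    for i in range(3):
        for j in range(3):
            R[i, j] = 0.5 * np.trace(sigma[i] @ U @ sigma[j] @ U.conj().T).real
    return R

# An SU(2) element: U = cos(theta/2) I - i sin(theta/2) (n . sigma)
theta = 1.2
n = np.array([1.0, 2.0, 2.0]) / 3.0        # unit axis
U = np.cos(theta/2) * np.eye(2) \
    - 1j * np.sin(theta/2) * (n[0]*sx + n[1]*sy + n[2]*sz)

R = su2_to_so3(U)
assert np.allclose(R.T @ R, np.eye(3)) and np.isclose(np.linalg.det(R), 1.0)
# The two distinct elements U and -U of SU(2) give the SAME rotation:
assert np.allclose(su2_to_so3(-U), R)
print("SU(2) double-covers SO(3): U and -U map to one rotation")
```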

References

1. Inui T, Tanabe Y, Onodera Y (1990) Group theory and its applications in physics. Springer, Berlin
2. Inui T, Tanabe Y, Onodera Y (1980) Group theory and its applications in physics, Expanded edn. Shokabo, Tokyo (in Japanese)
3. Takagi T (2010) Introduction to analysis, Standard edn. Iwanami, Tokyo (in Japanese)
4. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
5. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
6. Rose ME (1995) Elementary theory of angular momentum. Dover, New York
7. Hassani S (2006) Mathematical physics. Springer, New York
8. Edmonds AR (1957) Angular momentum in quantum mechanics. Princeton University Press, Princeton, NJ
9. Yamanouchi T, Sugiura M (1960) Introduction to continuous groups. Baifukan, Tokyo (in Japanese)
10. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
11. Satake I-O (1975) Linear algebra (pure and applied mathematics). Marcel Dekker, New York
12. Satake I (1974) Linear algebra. Shokabo, Tokyo (in Japanese)
13. Steeb W-H (2007) Continuous symmetries, Lie algebras, differential equations and computer algebra, 2nd edn. World Scientific, Singapore
14. McCarty G (1988) Topology. Dover, New York
15. Murakami S (2004) Foundation of continuous groups, Revised edn. Asakura Publishing, Tokyo (in Japanese)

Part V

Introduction to the Quantum Theory of Fields

In Part I we described the fundamental features of quantum mechanics. The theory based upon the Schrödinger equation was inconsistent with the theory of relativity, however. To make the theory consistent with the special theory of relativity, Dirac developed a theory that led to the discovery of the Dirac equation, named after him. The success by Dirac was followed by the development of the quantum theory of fields. In this part, we study the basic properties and constitution of the Dirac equation. We examine various properties of the plane wave solutions of the Dirac equation. The quantum theory of fields was first established as quantum electrodynamics (QED), in which the Dirac field and the electromagnetic field (or photon field) play the leading roles. With the electromagnetic field, we highlight the coexistence of classical theory and quantum theory. Before the quantization of the Dirac field and the photon field, however, we start with the canonical quantization of the scalar field (or the Klein-Gordon field) because of its simplicity and accessibility as well as its wide applicability to other fields. After studying the quantization of fields, we approach several features of the interaction between the fields, with emphasis upon the practical aspect. The interactions between the fields are presented as nonlinear field equations. Since it is difficult to obtain analytic solutions of the nonlinear equations, suitable approximations must be made. The method for this is well established as the perturbation theory that makes the most of the S-matrix. In this book, we approach the problem by calculating the interaction process in the lowest order of the perturbation theory. As a tangible example, we take up Compton scattering and study its characteristics in detail. In the latter half, we examine the basic formalism of the Lorentz group from the point of view of Lie groups and Lie algebras.
In particular, we explore in detail the Lorentz transformation properties of the Dirac equation and its plane wave solutions. We stress the importance and practicability of the matrix representation of the Dirac operators and Dirac spinors with respect to the Dirac equation. We study their transformation properties in terms of the matrix algebra that is connected to the Lorentz group. The Dirac operators associated with the Lorentz transformation are somewhat intractable, however, since the operators are not represented by normal


matrices (e.g., Hermitian or unitary matrices) because of the presence of the Lorentz boost. We examine how these issues are dealt with within the framework of the standard matrix theory. This last part of the book deals with advanced topics of Lie algebra, with its differential representation stressed. Since the theory of vector spaces underlies the whole of this final part, several extended concepts of vector spaces are explained in relation to Part III. Historically speaking, the Schrödinger equation has obtained citizenship across a wide range of scientific communities including physics, chemistry, materials science, and even biology. As for the Dirac equation, on the other hand, it seems to have found a place only in theoretical physics, especially the quantum theory of fields. Nonetheless, in recent years the Dirac equation often shows up as a fundamental equation in, e.g., solid-state physics. Under these circumstances, we wish to investigate the constitution and properties of the Dirac equation as well as the Dirac operators and spinors on a practical level throughout the whole of this part.

Chapter 21

The Dirac Equation

In this chapter we describe the features of the Dirac equation, including its historical background. The Dirac equation is a quantum theory of the electron that deals with the electron dynamics within the framework of the special theory of relativity. Before the establishment of the theory achieved by Dirac, several attempts to develop a relativistic quantum-mechanical theory of the electron were made, mainly on the basis of the Klein-Gordon equation. That theory, however, had a serious problem in terms of the probability density. Dirac proposed the Dirac equation, named after him, to dispel the problem. Most importantly, the Dirac equation opened up a novel route to the quantum theory of fields. The quantum theory of fields was first established as quantum electrodynamics (QED), which deals with electrons and photons as quantized fields. The theory later became the foundation of the advanced quantum field theory based on the gauge theory. In this chapter we study and pursue the framework of the Dirac equation with emphasis upon the plane wave solutions, to make it a clear starting point for gaining a good understanding of QED. The structure of the Dirac equation can be well understood using the polar coordinate representation. The results will be used in Chap. 24 as well to further develop the theory. The Dirac equation is originally "relativistic." To help understand the nature of the Dirac equation, we accordingly add several remarks on the special theory of relativity at the beginning of the chapter.

21.1

Historical Background

The success in establishing the atomic-scale mechanics by the Schrödinger equation prompted many researchers to pursue a relativistic formulation of quantum mechanics. As a first approach, the Klein-Gordon equation (occasionally referred to as the Klein-Gordon-Fock equation) was proposed [1, 2]. It is expressed as

© Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_21


[(1/c²)(∂²/∂t²) − ∇² + (mc/ħ)²] ϕ(t, x) = 0,   (21.1)

where the operator ∇² has already appeared in (1.24) and (7.30). More succinctly we have

[□ + (mc/ħ)²] ϕ(t, x) = 0,

where □ is the d'Alembertian (or d'Alembert operator) defined by

□ ≡ (1/c²)(∂²/∂t²) − ∇².

The Klein-Gordon equation is a relativistic wave equation for a quantum-mechanical free particle. This equation is directly associated with the equivalence of mass and energy. In other words, it is based on the following relationship among the energy, momentum, and mass of the particle:

E² = p²c² + m²c⁴,   (21.2)

where E, p, and m are the energy, momentum, and mass of the particle, respectively; see (1.20) and (1.21). Equation (21.1) can be obtained using the correspondence between momentum or energy and differential operators expressed in (1.31) or (1.40). Nonetheless, the Klein-Gordon equation involved a fundamental difficulty in explaining a probability density for the presence of the particle. The argument is as follows. The Schrödinger equation is described as

iħ ∂ψ/∂t = [−(ħ²/2m)∇² + V]ψ.   (1.45)

Multiplying both sides of (1.45) by ψ* from the left, we have

iħψ* ∂ψ/∂t = −(ħ²/2m)ψ* ∇²ψ + Vψ*ψ.   (21.3)

Taking the complex conjugate of (1.45) and multiplying by ψ from the right, we have

−iħ (∂ψ*/∂t)ψ = −(ħ²/2m)(∇²ψ*)ψ + Vψ*ψ.   (21.4)

Subtracting (21.4) from (21.3), we get

iħ [ψ* ∂ψ/∂t + (∂ψ*/∂t)ψ] = (ħ²/2m)[(∇²ψ*)ψ − ψ* ∇²ψ].   (21.5)

Further rewriting (21.5), we have

∂(ψ*ψ)/∂t + (ħ/2mi)[ψ* ∇²ψ − (∇²ψ*)ψ] = 0.   (21.6)

Here, we define ρ and j as

ρ(x, t) ≡ ψ*(x, t)ψ(x, t),  j(x, t) ≡ (ħ/2mi)[ψ* ∇ψ − (∇ψ*)ψ],   (21.7)

where x is a position vector and ∇ is the nabla operator [defined in (3.10)]. Recasting (21.6) by use of (21.7), we get a continuity equation expressed as

∂ρ/∂t + div j = 0.   (21.8)

If we deem ρ to be a probability density of the particle and j to be its flow, (21.8) represents the conservation law of the probability density; see (7.20). Notice that the quantity ρ(x, t) defined in (21.7) is positive (or zero) anywhere at any time. Meanwhile, we can similarly define ρ and j with the Klein-Gordon equation of (21.1) such that

ρ(x, t) ≡ (iħ/2m)(ϕ* ∂ϕ/∂t − ϕ ∂ϕ*/∂t),  j(x, t) ≡ (ħ/2mi)[ϕ* ∇ϕ − (∇ϕ*)ϕ]   (21.9)

to derive a continuity equation similar to (21.8). A problem lies, however, in regarding ρ as the probability density. It is because ρ may be either positive or negative. To see this, put, e.g., ϕ = f + ig (where f and g are real functions) and calculate the first equation of (21.9); up to the overall constant one finds ρ ∝ g ∂f/∂t − f ∂g/∂t, which may take either sign. Thus, the conservation of the probability density could not be guaranteed. Under such circumstances, it was Dirac who settled the problem (1928); see Sect. 1.2. He proposed the Dirac equation, and his theory was counted as one of the most successful theories among those describing the equation of motion of a quantum-mechanical particle (i.e., the electron) within the framework of the special theory of relativity.
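The sign problem of the Klein-Gordon density can be seen numerically. A sketch assuming NumPy and the conventions of (21.7) and (21.9): for a plane wave ϕ = exp(−iEt/ħ), ρ comes out proportional to E and hence negative for a negative-energy solution, while the Schrödinger density |ψ|² is never negative:

```python
import numpy as np

hbar = m = 1.0

def rho_schrodinger(psi):
    """Eq. (21.7): psi* psi, nonnegative by construction."""
    return (np.conj(psi) * psi).real

def rho_klein_gordon(phi, dphi_dt):
    """Eq. (21.9), up to the overall constant convention."""
    return ((1j * hbar / (2 * m))
            * (np.conj(phi) * dphi_dt - phi * np.conj(dphi_dt))).real

t = 0.3
for E in (+2.0, -2.0):                    # positive- and negative-energy plane waves
    phi = np.exp(-1j * E * t / hbar)      # phi(t) = exp(-i E t / hbar)
    dphi = (-1j * E / hbar) * phi
    assert rho_schrodinger(phi) >= 0.0                      # |phi|^2 never negative
    assert np.sign(rho_klein_gordon(phi, dphi)) == np.sign(E)
print("Klein-Gordon rho follows the sign of E; |psi|^2 does not")
```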

21.2 Several Remarks on the Special Theory of Relativity

21.2.1 Minkowski Space and Lorentz Transformation

The special theory of relativity is based on the principle of the constancy of the light velocity. We give several brief remarks on the theory together with examples. Regarding the mathematical equations dealt with in the quantum theory of fields, we normally use natural units to represent physical quantities, putting

c = 1,  ħ = 1.   (21.10)

We follow this custom from here on. In natural units, the Klein-Gordon equation reads

(□ + m²) ϕ(t, x) = 0.   (21.11)

Also, the time t and the space coordinates x are denoted altogether by the column vector

x = xμ = (t, x)ᵀ = (x⁰, x¹, x², x³)ᵀ  (μ = 0, 1, 2, 3),   (21.12)

where x⁰ represents the time coordinate and x¹, x², and x³ show the space coordinates. The quantity x is referred to collectively as a space-time coordinate. We follow the custom as well with the superscript notation of physical quantities. The implication will be mentioned later (also, see Sect. 24.1). The quantity xμ (μ = 0, 1, 2, 3) is called a four-vector; it is often represented by a single letter x. That is, ϕ(x) is meant to be ϕ(t, x).

Example 21.1 Suppose that a train is running on a straight rail track with a constant velocity v toward the positive direction of the x-axis and that a man (M) standing on firm ground is watching the train. Also, suppose that at the moment when the tail end of the train is passing in front of him, he and a passenger (P) on the train adjust their clocks to t = t′ = 0. Here, t is the time for the man on firm ground and t′ for the passenger (Fig. 21.1). Suppose moreover that a light is emitted at the tail end of the train at t = t′ = 0. Let O be the inertial frame of reference fixed on the man on firm ground and let O′ be another inertial frame of reference fixed on the train. Also, let the origin of O be on that man and let that of O′ be on the tail end of the train. Let us assume that the emitted light is observed at time t at a position vector x (= xe₁ + ye₂ + ze₃) of O and that the same light is observed at time t′ at a position vector x′ (= x′e′₁ + y′e′₂ + z′e′₃) of O′. In the above, e₁, e₂, e₃ and e′₁, e′₂, e′₃ are the individual basis sets for the Cartesian coordinates related to O and O′, respectively. To make the situation clearer, suppose that a laser beam has been emitted at t = 0 and x = 0 in the frame O and that the beam has been emitted at t′ = 0 and x′ = 0 in

21.2 Several Remarks on the Special Theory of Relativity

Fig. 21.1 Two inertial frames of reference O and O′. In O a man (M) is standing on firm ground and in O′ a passenger (P) is standing at the tail end of the train. The train is running on a straight rail track with a constant velocity v in reference to O toward the positive direction of the x-axis. The man and the passenger are observing how a light (e.g., a laser beam) is emitted and how that light hits a ball after a moment at a certain place Q (see text)

the frame O′. After a moment, the same beam hits a ball at a certain place Q. Let Q be specified as x = x in the frame O and as x′ = x′ in the frame O′. Let the time when the beam hits the ball be t = t in the frame O and t′ = t′ in the frame O′. Naturally, the man M chooses his standing position for x = 0 (i.e., the origin of O) and the passenger P chooses the tail end of the train for x′ = 0 (i.e., the origin of O′). In this situation, the special theory of relativity tells us that the following relationship must hold:

(ct)² − x² = (ct′)² − x′² = 0.

Note that the light velocity c is the same in the two frames O and O′ by the requirement of the special theory of relativity. Using the natural units, the above equation is rewritten as

t² − x² = t′² − x′² = 0.    (21.13)

Equation (21.13) represents an invariant (i.e., zero in this case) with respect to the choice of space–time coordinates, or inertial frames of reference. Before taking a step further to think of the implication of (21.13) and generalizing the situation of Example 21.1, let us introduce a mathematical technique. Let B(y, x) be a bilinear form defined as [3, 4]

B(y, x) ≡ yᵀηx = (y₁ ⋯ yₙ) η (x₁ ⋯ xₙ)ᵀ,    (21.14)

where x and y represent sets of coordinates of vectors in an n-dimensional vector space. A common basis set is assumed for x and y. The coordinates x and y are defined as column vectors described by

x ≡ (x₁ ⋯ xₙ)ᵀ,  y ≡ (y₁ ⋯ yₙ)ᵀ.

Although the coordinates xᵢ and yᵢ (i = 1, ⋯, n) are usually chosen from either real or complex numbers, in this chapter we assume that both xᵢ and yᵢ are real. In the Minkowski space we will be dealing with in this chapter, xⁱ and yⁱ (i = 0, 1, 2, 3) are real numbers. The coordinates x⁰ and y⁰ denote the time coordinate, and xᵏ and yᵏ (k = 1, 2, 3) represent the space coordinates. We collectively call xⁱ and yⁱ (i = 0, 1, 2, 3) the space–time coordinates. In (21.14), η is said to be the metric tensor that characterizes the vector space we are thinking of. In the general case of an n-dimensional vector space, we have

η = diag(1, ⋯, 1, −1, ⋯, −1),    (21.15)

where η is an (n, n) matrix having p entries of 1 and q entries of −1 (p + q = n) as diagonal elements; all the off-diagonal elements are zero. For this notation, see Sect. 24.1 as well. In the present specific case, we have

η = diag(1, −1, −1, −1).    (21.16)

The tensor η of (21.16) is called the Minkowski metric, and the vector space pertinent to the special theory of relativity is known as the four-dimensional Minkowski space accompanied by the space–time coordinates represented by (21.12). By virtue of the metric tensor η, we have the following three cases:

B(x, x) > 0,  B(x, x) = 0,  B(x, x) < 0.

We define a new row vector such that

(x₀ x₁ x₂ x₃) ≡ (x⁰ x¹ x² x³) η.    (21.17)

Then, the bilinear form B(x, x) given by (21.14) can be rewritten as

B(x, x) = (x₀ x₁ x₂ x₃)(x⁰ x¹ x² x³)ᵀ.    (21.18)
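Numerically, the indefinite bilinear form of (21.14)–(21.18) is a one-liner. The following Python sketch (an illustration, not from the book; the function name `bilinear` and the sample vector are ours) evaluates B(x, x) with the Minkowski metric of (21.16) for a lightlike four-vector:

```python
import numpy as np

# Minkowski metric eta = diag(1, -1, -1, -1), as in (21.16).
eta = np.diag([1.0, -1.0, -1.0, -1.0])

def bilinear(y, x):
    """Bilinear form B(y, x) = y^T eta x of (21.14)."""
    return y @ eta @ x

# A lightlike four-vector (t, x, y, z) with t^2 = x^2 + y^2 + z^2:
x = np.array([5.0, 3.0, 4.0, 0.0])
print(bilinear(x, x))   # 25 - 9 - 16 - 0 → 0.0
```

Timelike vectors give B(x, x) > 0 and spacelike ones B(x, x) < 0, illustrating the three cases listed above.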

With respect to the individual components x_ν, we have

x_\nu = \sum_{\mu=0}^{3} x^\mu \eta_{\mu\nu} = \sum_{\mu=0}^{3} x^\mu \eta_{\nu\mu},    (21.19)

where x^μ is called a contravariant (column) vector and x_ν a covariant (row) vector. These kinds of vectors have already appeared in Sect. 20.3.2 and will be dealt with later in detail in Sect. 24.1 in relation to tensors and tensor spaces. Then, we have

x₀ = x⁰,  x₁ = −x¹,  x₂ = −x²,  x₃ = −x³.    (21.20)

Adopting the Einstein summation convention, we express (21.18) as

B(x, x) = \sum_{\mu=0}^{3} x_\mu x^\mu \equiv x_\mu x^\mu,    (21.21)

where the same subscripts and superscripts are supposed to be summed over 0 to 3. Using this summation convention, (21.13) can be rewritten as

x_\mu x^\mu = x'_\mu x'^\mu = 0.    (21.22)

Notice that η of (21.16) is a symmetric matrix. The inverse matrix of η is the same as η; that is, η⁻¹ = η, or η^{μν} η_{νρ} = δ^μ_ρ, where δ^μ_ρ is the Kronecker delta. Once again, note that the repeated subscript and superscript ν are summed. The symbols η^{μν} and η_{νρ} represent η⁻¹ and η, respectively; that is, (η^{μν}) = (η⁻¹)^{μν} and (η_{μν}) = (η)_{μν}; with these notations see (11.38). What we immediately see from (21.17) is that, unlike the inner product, B(x, x) does not have a positive definite feature. If we put, e.g., x⁰ = x₀ = 0, we have B(x, x) ≤ 0. For this reason, the Minkowski metric η is said to be an indefinite metric, or the Minkowski space is called an indefinite metric space. Inserting E = Λ⁻¹Λ in (21.18), we rewrite it as

B(x, x) = (x₀ x₁ x₂ x₃) Λ⁻¹Λ (x⁰ x¹ x² x³)ᵀ,    (21.23)

where Λ is a non-singular matrix. Suppose in (21.23) that the contravariant vector is transformed such that

\begin{pmatrix} x'^0 \\ x'^1 \\ x'^2 \\ x'^3 \end{pmatrix} = \Lambda \begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix}.    (21.24)

Suppose, in turn, that with the transformation of the covariant vector we have

(x'_0\; x'_1\; x'_2\; x'_3) = (x_0\; x_1\; x_2\; x_3)\,\Lambda^{-1}.    (21.25)

Then, (21.23) can further be rewritten as

B(x, x) = (x'_0\; x'_1\; x'_2\; x'_3) \begin{pmatrix} x'^0 \\ x'^1 \\ x'^2 \\ x'^3 \end{pmatrix} = \sum_{\mu=0}^{3} x'_\mu x'^\mu = x'_\mu x'^\mu = B(x', x').    (21.26)

Equation (21.26) implies that if we demand that the contravariant vector x^μ (μ = 0, 1, 2, 3) and the covariant vector x_μ be transformed according to (21.24) and (21.25), respectively, the bilinear form B(x, x) is held invariant before and after the transformation caused by Λ. In fact, using (21.24) and (21.25) we have

B(x', x') = x'_\mu x'^\mu = x_\sigma (\Lambda^{-1})^\sigma{}_\mu \Lambda^\mu{}_\rho x^\rho = x_\sigma \delta^\sigma_\rho x^\rho = x_\sigma x^\sigma = B(x, x).

In this respect, (21.22) is a special case of (21.26). Or rather, the transformation Λ is decided so that B(x, x) can be held invariant. A linear transformation Λ that holds B(x, x) invariant on the Minkowski space is called a Lorentz transformation. A tangible form of the matrix Λ will be decided below. To show explicitly how the covariant vector is transformed as a column vector, taking the transposition of (21.25) we get

(x'_0\; x'_1\; x'_2\; x'_3)^T = (\Lambda^{-1})^T (x_0\; x_1\; x_2\; x_3)^T.    (21.27)

Equation (21.27) indicates that a covariant vector is transformed by the matrix (Λ⁻¹)ᵀ. We have (Λ⁻¹)ᵀ = (Λᵀ)⁻¹. To confirm this, taking the transposition of ΛΛ⁻¹ = E, we get (Λ⁻¹)ᵀΛᵀ = E. Multiplying both sides of this equation by (Λᵀ)⁻¹ from the right, we obtain (Λ⁻¹)ᵀ = (Λᵀ)⁻¹. Regarding the discussion of the bilinear form, we will come back to the issue in Chap. 24.
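The pairing of the transformation rules (21.24) and (21.25) makes B(x, x) invariant for any non-singular Λ, by construction. A minimal Python sketch (our illustration, not from the book) checks this with a random invertible matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
eta = np.diag([1.0, -1.0, -1.0, -1.0])

# Any non-singular Lambda: contravariant x' = Lambda x as in (21.24),
# covariant row x'_cov = x_cov Lambda^{-1} as in (21.25).
Lam = rng.normal(size=(4, 4))          # almost surely invertible
x = rng.normal(size=4)                 # contravariant components
x_cov = x @ eta                        # covariant row vector, cf. (21.17)

x_p = Lam @ x                          # (21.24)
x_cov_p = x_cov @ np.linalg.inv(Lam)   # (21.25)

# B(x, x) = x_cov . x is unchanged, cf. (21.26).
print(np.isclose(x_cov @ x, x_cov_p @ x_p))   # → True
```

Note that the transformed covariant components here no longer equal x′ᵀη in general; the two agree only when Λ is a Lorentz transformation, i.e., when ΛᵀηΛ = η.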


Remark 21.1 Up to now in this book, we have not thought about the necessity of distinguishing covariant from contravariant vectors except for a few cases (see Sect. 20.3.2). This is because if we think of a vector, e.g., in ℝⁿ (i.e., the n-dimensional Euclidean space), the metric, called the Euclidean metric, is represented by the n-dimensional identity matrix Eₙ. Instead of dealing with the Lorentz transformation, let us think of a familiar real orthogonal transformation in ℝ³. In parallel with (21.14), as a bilinear form we have

B(x, x) = (x¹ x² x³) E₃ (x¹ x² x³)ᵀ = (x₁ x₂ x₃)(x¹ x² x³)ᵀ,

where E₃ is the (3, 3) identity matrix. In this case the bilinear form B(x, x) is identical to the positive definite inner product ⟨x|x⟩. With the transformation of the contravariant vector, we have

(x'^1\; x'^2\; x'^3)^T = R\,(x^1\; x^2\; x^3)^T.    (21.28)

With the covariant vector, in turn, similarly to the case of (21.27) we have

(x'_1\; x'_2\; x'_3)^T = (R^{-1})^T (x_1\; x_2\; x_3)^T = (R^T)^T (x_1\; x_2\; x_3)^T = R\,(x_1\; x_2\; x_3)^T,    (21.29)

where in the second equality we used R⁻¹ = Rᵀ for an orthogonal matrix. From (21.28) and (21.29), we find that with an orthogonal transformation in ℝ³ we do not need to distinguish a contravariant vector from a covariant vector. Note, however, that unless the transformation is orthogonal, the contravariant and covariant vectors should be distinguished. We will also come back to this issue in Chap. 24.

21.2.2 Event and World Interval

If we need to specify the space–time points at which physical phenomena happen, we call such physical phenomena "events." Suppose, for instance, that we are watching a baseball game and that we are going to decide when and where the ball is hit with a bat. In that situation, the phenomenon of batting the ball is called an event. At the same time, we can specify the space–time coordinates at which (or when) the ball is hit in an arbitrarily chosen inertial frame of reference. Bearing in mind such a situation, suppose in Example 21.1 that two events E and F have happened. Also, suppose that in Fig. 21.1 an observer M in the frame O has experienced the events E and F at x^μ(E) and x^μ(F), respectively. Meanwhile, another observer P in the


frame O′ has experienced the same events E and F at x′^μ(E) and x′^μ(F), respectively. In that situation we define the following quantity s:

s \equiv \sqrt{[x^0(F) - x^0(E)]^2 - \sum_{k=1}^{3} [x^k(F) - x^k(E)]^2}.    (21.30)

The quantity s is said to be the world interval between the events E and F. The world interval between the two events is classified as follows:

s : real — timelike,
s : imaginary — spacelike,
s : zero — lightlike (null).    (21.31)

In terms of the special principle of relativity, which demands that physical laws be the same in every inertial frame of reference, the world interval s must be invariant under the Lorentz transformation. The relationship s = 0 is associated with the principle of the constancy of light velocity. Therefore, we can see (21.31) as a natural extension of that principle. Historically, this principle was established on the basis of precise physical measurements, including those performed in the Michelson–Morley experiment [5]. Now we are in a position to determine an explicit form of Λ. To make the argument simple, we revisit Example 21.1.

Example 21.2 We assume that in Fig. 21.1 the frame O′ is moving at a velocity v toward the positive direction of the x¹-axis of the frame O so that the x¹- and x′¹-axes coincide. In this situation, it suffices to take account only of x¹ as a space coordinate. In other words, we have

x'^2 = x^2  and  x'^3 = x^3.    (21.32)

Then, we assume the following linear transformation [5]:

x'^0 = p x^0 + q x^1,  x'^1 = r x^0 + u x^1.    (21.33)

Writing (21.33) in matrix form, we get

\begin{pmatrix} x'^0 \\ x'^1 \end{pmatrix} = \begin{pmatrix} p & q \\ r & u \end{pmatrix} \begin{pmatrix} x^0 \\ x^1 \end{pmatrix}.    (21.34)

Corresponding to (21.30), we have

s² = [x⁰(F) − x⁰(E)]² − [x¹(F) − x¹(E)]² = [x⁰(F)]² − [x¹(F)]²,
s′² = [x′⁰(F) − x′⁰(E)]² − [x′¹(F) − x′¹(E)]² = [x′⁰(F)]² − [x′¹(F)]².    (21.35)

In (21.35) we assume x⁰(E) = x¹(E) = 0 and x′⁰(E) = x′¹(E) = 0; i.e., as discussed earlier, we regard the event E as the fact that the tail end of the train is just passing in front of the man standing on firm ground. Dropping the label F from (21.35) and using the Lorentz invariance of s, we must have

s′² = (x′⁰)² − (x′¹)² = (x⁰)² − (x¹)² = s².    (21.36)

The coefficients p, q, r, and u of (21.33) can be determined from the requirement that (21.36) hold for any x⁰ and x¹. Inserting x′⁰ and x′¹ of the RHSs of (21.33) into (21.36) and comparing the coefficients of xⁱxᵏ (i, k = 0, 1; i ≤ k), we get [5]

p² − r² = 1,  pq − ru = 0,  q² − u² = −1.    (21.37)

To decide the four parameters, we need another condition besides (21.37). Since we assume that the frame O′ is moving at the velocity v in the x¹-direction of the frame O, the origin of O′ (i.e., x′¹ = 0) should correspond to x¹ = vx⁰ of O. Namely, from (21.33) we have

0 = r x⁰ + u v x⁰ = (r + uv) x⁰.    (21.38)

As (21.38) must hold for any x⁰, we have

r + uv = 0.    (21.39)

Since (21.37) and (21.39) are second-order equations with respect to p, q, r, and u, we do not obtain a unique solution. Solving (21.37) and (21.39), we get p = ±1/√(1 − v²), u = ±p, q = −vp, and r = −uv. However, p, q, r, and u must also satisfy the condition

\lim_{v \to 0} \begin{pmatrix} p & q \\ r & u \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.

Then, we must have

p = u = 1/\sqrt{1 - v^2},  r = q = -v/\sqrt{1 - v^2}.

Inserting these four parameters into (21.33), we rewrite it as

x'^0 = \frac{1}{\sqrt{1 - v^2}}\,(x^0 - v x^1),  x'^1 = \frac{1}{\sqrt{1 - v^2}}\,(-v x^0 + x^1).

Using a single parameter γ given by

γ ≡ 1/√(1 − v²),

(21.34) can be rewritten as

\begin{pmatrix} x'^0 \\ x'^1 \end{pmatrix} = \begin{pmatrix} \gamma & -\gamma v \\ -\gamma v & \gamma \end{pmatrix} \begin{pmatrix} x^0 \\ x^1 \end{pmatrix}.    (21.40)
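As a quick numerical check of (21.40), the following sketch (our illustration; the helper name `boost` is ours) applies the boost to an arbitrary event and confirms that t² − x² is preserved, as (21.36) demands:

```python
import math

def boost(v):
    """2x2 Lorentz boost of (21.40) acting on (x^0, x^1), natural units."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, -g * v], [-g * v, g]]

v = 0.6
B = boost(v)                 # gamma = 1.25 for v = 0.6
t, x = 2.0, 1.5
tp = B[0][0] * t + B[0][1] * x
xp = B[1][0] * t + B[1][1] * x
print(round(t * t - x * x, 12), round(tp * tp - xp * xp, 12))   # → 1.75 1.75
```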

With the reversed form, we obtain

\begin{pmatrix} x^0 \\ x^1 \end{pmatrix} = \begin{pmatrix} \gamma & \gamma v \\ \gamma v & \gamma \end{pmatrix} \begin{pmatrix} x'^0 \\ x'^1 \end{pmatrix}.    (21.41)

Taking account of (21.32), the full (4, 4) matrix form is

\begin{pmatrix} x'^0 \\ x'^1 \\ x'^2 \\ x'^3 \end{pmatrix} = \begin{pmatrix} \gamma & -\gamma v & 0 & 0 \\ -\gamma v & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix}.    (21.42)

The (4, 4) matrix of (21.42) is a simple example of a Lorentz transformation. These transformations form a group called the Lorentz group. If v = 0, we have γ = 1; the Lorentz transformation given by (21.42) then becomes the identity matrix (i.e., the identity element of the Lorentz group). The inverse element of the matrix of (21.42) is given by its inverse matrix. The matrix given by (21.42) can conveniently be expressed as follows. Putting

tanh ω ≡ v  (−∞ < ω < ∞ ⟺ −1 < v < 1)    (21.43)

and using the formula for the hyperbolic functions

cosh ω = 1/√(1 − tanh²ω),    (21.44)

we have

cosh ω = 1/√(1 − v²) = γ,  sinh ω = γv = v/√(1 − v²).    (21.45)

Thus, (21.42) can be rewritten as

\begin{pmatrix} \gamma & -\gamma v & 0 & 0 \\ -\gamma v & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cosh\omega & -\sinh\omega & 0 & 0 \\ -\sinh\omega & \cosh\omega & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.    (21.46)

The matrix of (21.46) is Hermitian (or real symmetric) with real positive eigenvalues and determinant 1. We will come back to this feature in Chap. 24. If two successive Lorentz transformations are given by tanh ω₁ and tanh ω₂, their multiplication is given by

\begin{pmatrix} \cosh\omega_2 & -\sinh\omega_2 & 0 & 0 \\ -\sinh\omega_2 & \cosh\omega_2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cosh\omega_1 & -\sinh\omega_1 & 0 & 0 \\ -\sinh\omega_1 & \cosh\omega_1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cosh(\omega_2 + \omega_1) & -\sinh(\omega_2 + \omega_1) & 0 & 0 \\ -\sinh(\omega_2 + \omega_1) & \cosh(\omega_2 + \omega_1) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.    (21.47)
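The additivity of rapidities in (21.47) is easy to verify numerically. The sketch below (our illustration; only the x⁰–x¹ block of the boost is used) multiplies two boosts and compares the product with the single boost of rapidity ω₁ + ω₂:

```python
import math

def boost(w):
    """x^0-x^1 block of the Lorentz boost (21.46), parameterized by rapidity w."""
    return [[math.cosh(w), -math.sinh(w)],
            [-math.sinh(w), math.cosh(w)]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

w1, w2 = 0.3, 0.9
prod = matmul(boost(w2), boost(w1))
direct = boost(w1 + w2)
print(all(abs(prod[i][j] - direct[i][j]) < 1e-12
          for i in range(2) for j in range(2)))   # → True
```

Since v = tanh ω by (21.43), this is the group-theoretic face of the relativistic velocity-addition law v = (v₁ + v₂)/(1 + v₁v₂).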

Equation (21.47) obviously shows closure with respect to the multiplication. The associative law for the multiplication is evident. Thus, the matrices represented by (21.46) satisfy the axioms (A1) to (A4) of Sect. 16.1. Hence, a collection of these matrices forms a group (i.e., the Lorentz group). As is clearly seen in (21.46) and (21.47), the constitution of the Lorentz group is well characterized by the variable ω, which is sometimes called the "rapidity." Notice that the matrices given above are not unitary. These matrices are often referred to as Lorentz boosts.

Example 21.3 Here let us briefly think of the Lorentz contraction. Suppose that a straight stick exists in a certain inertial frame of reference O. The stick may or may not be moving relative to that frame. Strictly speaking, measuring the length of the stick in that frame comprises (1) measuring and fixing a spatial point P(x¹_P, x²_P, x³_P) that corresponds to one end of the stick and (2) measuring and fixing another spatial point Q(x¹_Q, x²_Q, x³_Q) corresponding to the other end. The former and latter operations correspond to the events E and F, respectively. These operations must be done at the same time in the frame. Then, the distance d_PQ between P and Q (which may be either different or identical) in the frame is given by the following absolute value of the world interval:

d_{PQ} \equiv \sqrt{\sum_{k=1}^{3} (x^k_Q - x^k_P)^2}.    (21.48)


We define the length of the straight stick as d_PQ. We have d_PQ ≥ 0, and d_PQ = 0 if and only if P and Q are identical. Notice that the world interval between the events E and F is pure imaginary from (21.30), because those events happen at the same time. Bearing in mind the above discussion, suppose that the straight stick is laid along the x′¹-direction with its end in contact with the tail end of the train. Suppose that the length of the stick is L′ in the frame O′ of Example 21.1; that is, x′¹ = L′. (Remember that the man standing on firm ground and the passenger on the train adjusted their clocks to t = t′ = 0 at the moment when the tail end of the train was passing in front of the man.) If we assume that the length of the stick is read off at t (or x⁰) = 0 in the frame O of Example 21.1, from (21.40) we have

L = x^1 = \sqrt{1 - v^2}\, x'^1 = L' \sqrt{1 - v^2} = L'/\gamma,    (21.49)

where L is the length of the stick measured in the frame O. This implies that the man on firm ground judges that the stick has been shortened by the factor √(1 − v²) (i.e., the Lorentz contraction). Conversely, suppose that a straight stick of length L is laid along the x¹-direction with its end at the origin of the frame O. Similarly to the above case, x¹ = L. If the length of the stick is read off at t′ (or x′⁰) = 0 in the frame O′, from (21.41), i.e., the inverse matrix to (21.40), this time we have

L' = x'^1 = \sqrt{1 - v^2}\, x^1 = L \sqrt{1 - v^2} = L/\gamma,

where L′ is the length of the stick measured in the frame O′. The Lorentz contraction is again observed. Substituting x¹ = L in (21.40) gives x′⁰ ≠ 0. This observation implies that even though the measurements of the two ends of the stick have been done at the same time in the frame O, the simultaneity is lost in the frame O′. The relativity of the time dilation can be dealt with in a similar manner; the discussion is left to readers. In the above examples, we have shown a brief outline of the constitution of the special theory of relativity. Since the theory is closely related to the Lorentz transformation, or more specifically the Lorentz group, we will survey the constitution and properties of the Lorentz group in more detail in Chap. 24.
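Putting numbers into (21.49): for v equal to 60 % of the light velocity, γ = 1.25, so a stick of rest length 10 appears contracted to 8. A minimal sketch (our illustration):

```python
import math

def gamma(v):
    """Lorentz factor of (21.43)-(21.45), natural units (c = 1)."""
    return 1.0 / math.sqrt(1.0 - v * v)

v = 0.6            # 60 % of the light velocity
L_prime = 10.0     # rest length of the stick in O'
L = L_prime / gamma(v)          # Lorentz contraction, (21.49)
print(round(gamma(v), 12), round(L, 12))   # → 1.25 8.0
```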

21.3 Constitution and Solutions of the Dirac Equation

21.3.1 General Form of the Dirac Equation

To address the difficulty in the probabilistic interpretation of (21.9) mentioned in Sect. 21.1, Dirac postulated that the Klein–Gordon Eq. (21.11) could be "factorized" in such a way that

(□ + m²) ϕ(x) = (−iγ^μ∂_μ − m)(iγ^ν∂_ν − m) ϕ(x) = 0,    (21.50)

where γ^μ denotes a certain constant mathematical quantity and ∂_μ is a shorthand notation for ∂/∂x^μ. The quantity ∂_μ behaves as a covariant vector (see Sect. 21.2). Remember that x represents the four-vector x^μ. Once again, note that the same subscripts and superscripts are to be summed. To show that ∂_μ is a covariant vector, we have

\partial_\nu \equiv \frac{\partial}{\partial x^\nu} = \frac{\partial x'^\mu}{\partial x^\nu}\frac{\partial}{\partial x'^\mu} = \frac{\partial \Lambda^\mu{}_\rho x^\rho}{\partial x^\nu}\frac{\partial}{\partial x'^\mu} = \Lambda^\mu{}_\rho \delta^\rho_\nu \frac{\partial}{\partial x'^\mu} = \Lambda^\mu{}_\nu \frac{\partial}{\partial x'^\mu} = \Lambda^\mu{}_\nu\, \partial'_\mu.    (21.51)

Multiplying (21.51) by (\Lambda^{-1})^\nu{}_\kappa, we have

\partial_\nu (\Lambda^{-1})^\nu{}_\kappa = \Lambda^\mu{}_\nu (\Lambda^{-1})^\nu{}_\kappa\, \partial'_\mu = \delta^\mu_\kappa\, \partial'_\mu = \partial'_\kappa.    (21.52)

Equation (21.52) shows that ∂_μ is transformed as in (21.25); that is, ∂_μ behaves as a covariant vector. Meanwhile, defining A ≡ iγ^μ∂_μ, we rewrite (21.50) as

(−A − m)(A − m) ϕ(x) = 0.    (21.53)

This expression leads to the following equation:

(A − m) ϕ(x) = 0.    (21.54)

Since (−A − m) and (A − m) are commutative, we have the alternative expression

(A − m)(−A − m) ϕ(x) = 0.    (21.55)

This naturally produces

(−A − m) ϕ(x) = 0.    (21.56)

Equations (21.54) and (21.56) both afford a solid footing for detailed analysis of the Dirac equation. Following the custom, however, we solely use (21.54) for our present purposes. As a consequence of the fact that the original Klein–Gordon equation (21.11) is a second-order linear differential equation (SOLDE), we have

another linearly independent solution (see Chap. 10). Letting that solution be χ(x), we have

(A − m) χ(x) = 0.    (21.57)

We give a full description of the Dirac equation as

(iγ^μ∂_μ − m) ϕ(x) = 0  and  (iγ^μ∂_μ − m) χ(x) = 0.    (21.58)

As in the case of Example 1.1 of Sect. 1.3, we choose exponential functions for ϕ(x) and χ(x). More specifically, we assume the solutions described by

ϕ(x) = u(p) e^{−ipx}  and  χ(x) = v(p) e^{ipx},    (21.59)

where x = (x⁰, x¹, x², x³)ᵀ and p = (p⁰, p¹, p², p³)ᵀ. Also, remark that

px = p_μ x^μ = p^μ x_μ.    (21.60)

Notice that in (21.59) neither u(p) nor v(p) contains the space–time coordinates; that is, u(p) and v(p) behave as coefficients of the exponential functions. To determine the form of γ^μ in (21.50), we consider the cross terms of γ^μ∂_μ γ^ν∂_ν (μ ≠ ν). We have

γ^μ∂_μ γ^ν∂_ν = γ^μγ^ν ∂_μ∂_ν = γ^νγ^μ ∂_ν∂_μ = γ^νγ^μ ∂_μ∂_ν  (μ ≠ ν),    (21.61)

where the last equality was obtained by exchanging ∂_ν for ∂_μ. Therefore, the coefficient of ∂_μ∂_ν (μ ≠ ν) is

γ^μγ^ν + γ^νγ^μ.

Taking account of the fact that the cross terms ∂_μ∂_ν (μ ≠ ν) are not present in the original Klein–Gordon equation (21.11), we must have

γ^μγ^ν + γ^νγ^μ = 0  (μ ≠ ν).    (21.62)

To satisfy the relation (21.62), γ^μ must have a matrix form. With regard to the terms (∂_μ)², comparing their coefficients with those of (21.11), we get

(γ⁰)² = 1,  (γ¹)² = (γ²)² = (γ³)² = −1.    (21.63)

Combining (21.62) and (21.63), we have

γ^μγ^ν + γ^νγ^μ = 2η^{μν}.    (21.64)

Introducing the anti-commutator

{A, B} ≡ AB + BA,    (21.65)

we obtain

{γ^μ, γ^ν} = 2η^{μν}.    (21.66)

The quantities γ^μ are represented by (4, 4) matrices and are called the gamma matrices. Among several representations of the gamma matrices, the Dirac representation is frequently used. In this representation, we have [1, 2, 6]

\gamma^0 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad \gamma^k = \begin{pmatrix} 0 & \sigma_k \\ -\sigma_k & 0 \end{pmatrix} \quad (k = 1, 2, 3),    (21.67)

where each component denotes a (2, 2) matrix: 1 is the (2, 2) identity matrix, 0 the (2, 2) zero matrix, and σ_k (k = 1, 2, 3) are the Pauli spin matrices given in (20.41). That is, as a full representation of the gamma matrices we have

\gamma^0 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}, \quad \gamma^1 = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & -1 & 0 & 0 \\ -1 & 0 & 0 & 0 \end{pmatrix},

\gamma^2 = \begin{pmatrix} 0 & 0 & 0 & -i \\ 0 & 0 & i & 0 \\ 0 & i & 0 & 0 \\ -i & 0 & 0 & 0 \end{pmatrix}, \quad \gamma^3 = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \\ -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}.    (21.68)
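The anticommutation relation (21.66) is mechanical to check by computer. The following sketch (our illustration, not from the book) builds the Dirac-representation gamma matrices from the Pauli matrices as in (21.67) and verifies {γ^μ, γ^ν} = 2η^{μν}:

```python
import numpy as np

# Pauli matrices, cf. (20.41)
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
I2, Z2 = np.eye(2, dtype=complex), np.zeros((2, 2), dtype=complex)

# Dirac representation (21.67): gamma^0 block-diagonal, gamma^k block-off-diagonal.
g0 = np.block([[I2, Z2], [Z2, -I2]])
gam = [g0] + [np.block([[Z2, s], [-s, Z2]]) for s in (s1, s2, s3)]

eta = np.diag([1.0, -1.0, -1.0, -1.0])
ok = all(np.allclose(gam[m] @ gam[n] + gam[n] @ gam[m],
                     2 * eta[m, n] * np.eye(4))
         for m in range(4) for n in range(4))
print(ok)   # the anticommutation relation (21.66) holds → True
```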

Let us determine the solutions of the Dirac equation. Replacing ϕ(x) and χ(x) of (21.58) with those of (21.59), we get

(p_μγ^μ − m) u(p) = 0  and  (−p_μγ^μ − m) v(p) = 0.    (21.69)

In accordance with (21.68) and (21.69), u(p) and v(p) are (4, 1) matrices, i.e., column vectors or, strictly speaking, spinors, to be determined below. If we replaced p of v(p) with −p, where

p = (E, p)ᵀ = (p⁰, p¹, p², p³)ᵀ,  −p = (−E, −p)ᵀ = (−p⁰, −p¹, −p², −p³)ᵀ,    (21.70)

we would have

(p_μγ^μ − m) v(−p) = 0.    (21.71)

Comparing (21.71) with the first equation of (21.69), we have

v(−p) = c u(p)  (c : constant).    (21.72)

So far, the relevant discussion looks parallel with that which appeared in Example 1.1 of Sect. 1.3. However, the present case differs from the previous example in the following respects: (1) although boundary conditions (BCs) of the Dirichlet type were imposed in the previous case, we are dealing with a plane wave solution with an infinite space–time extent. (2) In the previous case (Example 1.1), k could take both positive and negative values. In the present case, however, a negative p⁰ does not fit naturally into our intuition, even though the Lorentz covariance permits a negative value of p⁰. This is because p⁰ represents energy and, unlike the momentum p, p⁰ does not have "directionality." Instead of considering p⁰ as a quantity changeable between −∞ and +∞, we single out p⁰ and deal with it as a positive parameter (0 < p⁰ < +∞) whose value varies with |p|, while assuming that the components of p = (p¹, p², p³)ᵀ are parameters freely changeable between −∞ and +∞. This treatment seems at first glance to violate the Lorentz invariance, but that is not the case; we will discuss the issue in Chap. 22. Thus, we safely assign p⁰ (>0) and −p⁰ (<0) to the positive-energy and negative-energy states, respectively. In other words, ϕ(x) = u(p) e^{−ipx} of (21.58) and (21.59) must be assigned to the positive-energy state and χ(x) = v(p) e^{ipx} of (21.58) and (21.59) to the negative-energy state. To further specify the solutions of (21.69), we restate (21.69) as

(p_μγ^μ − m) u(p, h) = 0,    (21.73)
(−p_μγ^μ − m) v(p, h) = 0,    (21.74)


where (21.73) and (21.74) represent the positive-energy state and the negative-energy state, respectively. The index h is called the helicity and takes the value +1 or −1. Notice that we have abandoned u(p) and v(p) in favor of u(p, h) and v(p, h). We will give further consideration to the index h in the next section.

21.3.2 Plane Wave Solutions of the Dirac Equation

The Dirac equation is of special importance as a relativistic equation. It has a wide range of applications in physics, including atomic physics, which deals with the properties of hydrogen-like atoms (see Chap. 3) as one of its major subjects [2]. Among the applications of the Dirac equation, however, its plane wave solutions are most frequently dealt with, for the following reason: in the field of high-energy physics, particles (e.g., electrons) need to be accelerated close to the light velocity, and in that case relativistic effects become increasingly dominant. What is more, the motion of the particles is well approximated by a plane wave except during collisions, which take place within a very short period of time and in a very small extent of space. In this book, we accordingly deal solely with the plane wave solutions. First, we examine the properties of the solutions u(p, h) of (21.73). Using the gamma matrices given in (21.68), the matrix equation to be solved is described as

\begin{pmatrix} p^0-m & 0 & -p^3 & -p^1+ip^2 \\ 0 & p^0-m & -p^1-ip^2 & p^3 \\ p^3 & p^1-ip^2 & -p^0-m & 0 \\ p^1+ip^2 & -p^3 & 0 & -p^0-m \end{pmatrix} \begin{pmatrix} c^0 \\ c^1 \\ c^2 \\ c^3 \end{pmatrix} = 0,    (21.75)

where (c⁰, c¹, c², c³)ᵀ is the column-vector representation of u(p, h). That is, we have

u(p, h) = (c⁰, c¹, c², c³)ᵀ.    (21.76)

We must have u(p, h) ≢ 0 to obtain a physically meaningful solution. For this, the determinant of the (4, 4) matrix of (21.75) must vanish. That is,

\det \begin{pmatrix} p^0-m & 0 & -p^3 & -p^1+ip^2 \\ 0 & p^0-m & -p^1-ip^2 & p^3 \\ p^3 & p^1-ip^2 & -p^0-m & 0 \\ p^1+ip^2 & -p^3 & 0 & -p^0-m \end{pmatrix} = 0.    (21.77)

This is a quartic equation with respect to p^μ. The equation leads to the simple form

[(p⁰)² − p² − m²]² = 0.    (21.78)

Hence, we have

(p⁰)² − p² − m² = 0, or p⁰ = √(p² + m²).    (21.79)
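The determinant identity behind (21.77) and (21.78) can be spot-checked numerically. The sketch below (our illustration; the helper `G` is ours) uses the (2, 2)-block form p_μγ^μ − m that appears later in (21.82) and compares det(p_μγ^μ − m) with [(p⁰)² − p² − m²]² at an off-shell momentum:

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

def G(p0, p, m):
    """p_mu gamma^mu - m in the (2, 2)-block form, cf. (21.82)."""
    sp = p[0] * s1 + p[1] * s2 + p[2] * s3
    I2 = np.eye(2, dtype=complex)
    return np.block([[(p0 - m) * I2, -sp], [sp, -(p0 + m) * I2]])

p0, p, m = 1.9, np.array([0.4, -0.2, 1.1]), 0.7
lhs = np.linalg.det(G(p0, p, m))
rhs = (p0**2 - p @ p - m**2) ** 2
print(np.isclose(lhs, rhs))   # → True
```

Setting the determinant to zero therefore enforces the on-shell condition (21.79).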

Remember that p⁰ represents the positive energy. Note that (21.79) is consistent with (1.8), (1.9), and (1.20); recall the natural units. The result, however, has already been implied in (21.2). Thus, we have to find another way to solve (21.75). To this end, first let us consider (21.75) in a simpler form. That is, putting p¹ = p² = 0, (21.75) reads as

\begin{pmatrix} p^0-m & 0 & -p^3 & 0 \\ 0 & p^0-m & 0 & p^3 \\ p^3 & 0 & -p^0-m & 0 \\ 0 & -p^3 & 0 & -p^0-m \end{pmatrix} \begin{pmatrix} c^0 \\ c^1 \\ c^2 \\ c^3 \end{pmatrix} = 0.    (21.80)

Clearly, the first and second column vectors are linearly independent of each other. Multiplying the first column by (p⁰ + m) and the third column by p³ produces column vectors that coincide up to sign; hence, the first and third column vectors are linearly dependent. In a similar manner, the second and fourth column vectors are linearly dependent as well. Thus, the rank of the matrix of (21.80) is two. If p³ = 0 in (21.80), i.e., p = 0, from (21.79) we have p⁰ = m (>0). Then, (21.80) can be rewritten as

\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & -2m & 0 \\ 0 & 0 & 0 & -2m \end{pmatrix} \begin{pmatrix} c^0 \\ c^1 \\ c^2 \\ c^3 \end{pmatrix} = 0.    (21.81)

Equation (21.81) gives the solution for an electron at rest. We will come back to this simple situation in Sect. 24.5. The rank of the matrix of (21.81) is two. In the general case of (21.75), the rank of the matrix is two as well. This means the presence of two linearly independent solutions of (21.75). The index h (h = ±1) of u(p, h)

represents and distinguishes these two solutions. The situation is the same for the negative-energy solutions v(p, h) given in (21.74); see Sect. 21.3.3. Now, (21.75) can be rewritten in a succinct (2, 2)-block form as

\begin{pmatrix} p^0-m & -\sigma\cdot\mathbf{p} \\ \sigma\cdot\mathbf{p} & -p^0-m \end{pmatrix} \begin{pmatrix} c^0 \\ c^1 \\ c^2 \\ c^3 \end{pmatrix} = 0.    (21.82)

Here, we introduce another (4, 4) matrix

H \equiv \frac{1}{|\mathbf{p}|} \begin{pmatrix} \sigma\cdot\mathbf{p} & 0 \\ 0 & \sigma\cdot\mathbf{p} \end{pmatrix}.    (21.83)

Since σ·p is Hermitian, H is Hermitian as well. Since a Hermitian matrix can be diagonalized by a unitary similarity transformation (see Theorem 14.5 of Sect. 14.3), so can both σ·p and H. Let χ be a (2, 1) matrix, i.e., a column vector. As an eigenfunction of (21.83), we assume the following type of (4, 1) matrix:

Χ ≡ (χ, sχ)ᵀ,    (21.84)

where s is a parameter to be determined below. To confirm whether this assumption is valid, let us examine the properties of the following (2, 2) submatrix S described by

S \equiv \frac{1}{|\mathbf{p}|}\, \sigma\cdot\mathbf{p}.    (21.85)

The factor σ·p is contained in common in the matrix on the LHS of (21.82) and in H of (21.83). Using the notation of (21.85), we have

H = \begin{pmatrix} S & 0 \\ 0 & S \end{pmatrix} = \frac{1}{|\mathbf{p}|} \begin{pmatrix} \sigma\cdot\mathbf{p} & 0 \\ 0 & \sigma\cdot\mathbf{p} \end{pmatrix}.    (21.86)

We can readily check the unitarity of S, i.e., S†S = E₂, where E₂ is the (2, 2) identity matrix. Also, we have

det S = −1.    (21.87)

Since the (2, 2) matrix S is both Hermitian and unitary, (21.87) immediately implies that the eigenvalues of S are ±1; see Chaps. 12 and 14. Representing the


two linearly independent eigenfunctions (belonging to the eigenvalues ±1 of S) as χ₋ and χ₊, we have

Sχ₋ = (−1)χ₋  and  Sχ₊ = (+1)χ₊,    (21.88)

where χ₋ belongs to the eigenvalue −1 and χ₊ to the eigenvalue +1 with respect to the positive-energy solutions. When we do not specify the eigenvalue, we write

Sχ = hχ  (h = ±1),    (21.89)

where h is said to be the helicity. Thus, we get

HΧ = \begin{pmatrix} S & 0 \\ 0 & S \end{pmatrix} \begin{pmatrix} \chi \\ s\chi \end{pmatrix} = \begin{pmatrix} S\chi \\ sS\chi \end{pmatrix} = h \begin{pmatrix} \chi \\ s\chi \end{pmatrix} = hΧ.    (21.90)
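The claimed properties of S — Hermitian, unitary, det S = −1, eigenvalues ±1 — are quick to verify numerically. A minimal sketch (our illustration, with an arbitrary momentum):

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

p = np.array([0.3, 0.4, 1.2])
S = (p[0] * s1 + p[1] * s2 + p[2] * s3) / np.linalg.norm(p)   # (21.85)

# S is Hermitian and unitary with det S = -1, cf. (21.87),
# so its eigenvalues (the helicities) are +1 and -1.
assert np.allclose(S, S.conj().T)               # Hermitian
assert np.allclose(S @ S.conj().T, np.eye(2))   # unitary
print(np.isclose(np.linalg.det(S), -1), np.linalg.eigvalsh(S))
```

The reported eigenvalues are −1 and +1 up to rounding, matching (21.88).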

That is, we certainly find that Χ of (21.84) is an eigenvector of H belonging to the eigenvalue h. Meanwhile, defining the matrix of (21.82) as

G \equiv \begin{pmatrix} p^0-m & -\sigma\cdot\mathbf{p} \\ \sigma\cdot\mathbf{p} & -p^0-m \end{pmatrix} = p_\mu\gamma^\mu - m,    (21.91)

we have

G u(p, h) = 0    (21.92)

as the equation to be solved. In (21.92), u(p, h) indicates the (4, 1) column vector of (21.76) or (21.82). We find

[G, H] = 0.    (21.93)
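The commutation relation (21.93) follows from the block structures of (21.86) and (21.91), since σ·p commutes with itself and with scalar blocks. A numerical spot check (our illustration) at an on-shell momentum:

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
Z2, I2 = np.zeros((2, 2), dtype=complex), np.eye(2, dtype=complex)

m = 0.5
p = np.array([0.3, -0.8, 0.6])
p0 = np.sqrt(p @ p + m * m)                   # on-shell energy, (21.79)
sp = p[0] * s1 + p[1] * s2 + p[2] * s3

G = np.block([[(p0 - m) * I2, -sp], [sp, -(p0 + m) * I2]])   # (21.91)
H = np.block([[sp, Z2], [Z2, sp]]) / np.linalg.norm(p)       # (21.83)

print(np.allclose(G @ H - H @ G, 0))   # [G, H] = 0, cf. (21.93) → True
```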

That is, G and H commute with each other. Note that (21.93) results naturally from the presence of gamma matrices. We remember that two Hermitian matrices commute if and only if there exists a complete orthonormal set (CONS) of common eigenfunctions (Theorem 14.14). This is almost true of G and H. Care should be taken, however, to apply Theorem 14.14 to the present case. This is because G is neither a normal matrix nor an Hermitian matrix except for special cases. Therefore, the matrix G is in general impossible to diagonalize by the unitary similarity transformation (Theorem 14.5). From the point of view of the matrix algebra, it is of great interest whether or not G can be diagonalized. Moreover, if G could be diagonalize, it is important to know what kind of non-singular matrix is used for the similarity transformation (i.e., diagonalization) and what eigenvalues G possesses. Yet, (21.92) immediately tells us that G possesses an eigenvalue zero. Otherwise, we

21.3

Constitution and Solutions of the Dirac Equation

953

are to have u(p, h)  0. We will come back to this intriguing discussion in Chap. 24 to examine the condition for G to be diagonalized. Here, we examine what kind of eigenfunctions G and H share as the common eigenvectors. Since G and H commute, from (21.84) and (21.90) we have GHΧ = HGΧ = hGΧ:

ð21:94Þ

This means that GΧ is an eigenvector of H. Let us tentatively consider that GΧ can be described by GΧ = cΧ,

ð21:95Þ

where c is a constant (certainly including zero). This implies that Χ is an eigenfunction of G and that from (21.90) and (21.95) G and H share a common eigenfunction Χ. Using the above results and notations, let us examine the conditions under which χ sχ

is a solution of (21.75). Replacing the column vector of (21.75) with

χ sχ

, we

have

$$G X = \begin{pmatrix} p^0 - m & -\boldsymbol{\sigma}\cdot\boldsymbol{p} \\ \boldsymbol{\sigma}\cdot\boldsymbol{p} & -p^0 - m \end{pmatrix}\begin{pmatrix} \chi \\ s\chi \end{pmatrix} = \begin{pmatrix} p^0 - m & -PS \\ PS & -p^0 - m \end{pmatrix}\begin{pmatrix} \chi \\ s\chi \end{pmatrix} = \begin{pmatrix} (p^0 - m)\chi - PsS\chi \\ PS\chi - (p^0 + m)s\chi \end{pmatrix} = \begin{pmatrix} (p^0 - m)\chi - Psh\chi \\ Ph\chi - (p^0 + m)s\chi \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \tag{21.96}$$

where we define a positive number P such that P ≡ |p|. For (21.96) to be satisfied, namely, for

$$\begin{pmatrix} \chi \\ s\chi \end{pmatrix} \tag{21.97}$$

to be an eigenvector of G that belongs to the eigenvalue zero, the coefficient of χ in each row of (21.96) must vanish. That is, we must have

$$p^0 - m - Psh = 0, \qquad Ph - (p^0 + m)s = 0.$$

Then, if we postulate the following relation:

$$s = \frac{p^0 - m}{Ph} = \frac{Ph}{p^0 + m}, \tag{21.98}$$

(21.96) will be satisfied. Equation (21.98) implies that

$$(Ph)^2 = P^2 h^2 = P^2 = \boldsymbol{p}^2 = (p^0 + m)(p^0 - m) = (p^0)^2 - m^2.$$

However, this is equivalent to (21.79). Thus, we have confirmed that $X = \binom{\chi}{s\chi}$ of (21.84) is certainly an eigenfunction of G, if we choose s as $s = Ph/(p^0 + m)$. At the same time, we find that $X = \binom{\chi}{s\chi}$ is indeed the common eigenfunction shared by G and H with the definite value s designated by (21.98). This eigenfunction belongs to the eigenvalue h (h = ±1) of H and to the eigenvalue 0 of G. For future reference, it will be convenient to define a positive number S as

$$S \equiv \frac{P}{p^0 + m}, \tag{21.99}$$

so that we have s = hS. The eigenfunction of (21.96) is then expressed as

$$X = \begin{pmatrix} \chi \\ hS\chi \end{pmatrix}. \tag{21.100}$$

Hence, as u(p, h) given in (21.76) we obtain

$$u(p, h) = N\begin{pmatrix} \chi \\ hS\chi \end{pmatrix}, \tag{21.101}$$

where N is a normalization constant that will be determined later in Sect. 21.4. To solve the Dirac equation in an actual form, we choose the polar coordinate representation (see Sect. 3.2). For this, we put

$$p_1 = P\sin\theta\cos\phi, \quad p_2 = P\sin\theta\sin\phi, \quad p_3 = P\cos\theta.$$

Then, we have

$$S = \begin{pmatrix} -\cos\theta & e^{i\phi}\sin\theta \\ e^{-i\phi}\sin\theta & \cos\theta \end{pmatrix}. \tag{21.102}$$

Note that det S = −1. Solving the eigenvalue equation (21.89) according to Sects. 12.1 and 14.3, we get normalized eigenfunctions described by

$$\chi_- = \begin{pmatrix} e^{i\phi/2}\cos\dfrac{\theta}{2} \\ -e^{-i\phi/2}\sin\dfrac{\theta}{2} \end{pmatrix} \quad\text{and}\quad \chi_+ = \begin{pmatrix} e^{i\phi/2}\sin\dfrac{\theta}{2} \\ e^{-i\phi/2}\cos\dfrac{\theta}{2} \end{pmatrix}, \tag{21.103}$$

where χ− and χ+ are the eigenfunctions of the operator S belonging to the helicity h = −1 and +1, respectively. Then, as the positive-energy solutions of (21.73) we have

$$u(p, -1) = N\begin{pmatrix} e^{i\phi/2}\cos\dfrac{\theta}{2} \\ -e^{-i\phi/2}\sin\dfrac{\theta}{2} \\ -Se^{i\phi/2}\cos\dfrac{\theta}{2} \\ Se^{-i\phi/2}\sin\dfrac{\theta}{2} \end{pmatrix}, \qquad u(p, +1) = N\begin{pmatrix} e^{i\phi/2}\sin\dfrac{\theta}{2} \\ e^{-i\phi/2}\cos\dfrac{\theta}{2} \\ Se^{i\phi/2}\sin\dfrac{\theta}{2} \\ Se^{-i\phi/2}\cos\dfrac{\theta}{2} \end{pmatrix}. \tag{21.104}$$

Since the rank of G is two, any positive-energy solution can be expressed as a linear combination of u(p, −1) and u(p, +1).
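The chain of results (21.96)–(21.104) lends itself to a direct numerical cross-check. The sketch below is an illustration, not part of the original text: it models σ·p as P times the helicity matrix of (21.102), builds G in its 2 × 2 block form, and confirms that χ± are helicity eigenvectors, that u(p, ±1) belong to the eigenvalue zero of G, and that G has rank two. The numerical values of m, P, θ, φ are arbitrary choices.

```python
import numpy as np

# Illustrative check of (21.96)-(21.104); the numerical values are arbitrary.
m, P = 1.0, 3.0
p0 = np.sqrt(P**2 + m**2)                 # positive energy, (21.79)
theta, phi = 0.7, 1.2
Ssc = P / (p0 + m)                        # the scalar S of (21.99)

# Helicity matrix of (21.102) and its eigenvectors (21.103)
S = np.array([[-np.cos(theta),                np.exp(1j*phi)*np.sin(theta)],
              [np.exp(-1j*phi)*np.sin(theta), np.cos(theta)]])
chi = {-1: np.array([ np.exp( 1j*phi/2)*np.cos(theta/2),
                     -np.exp(-1j*phi/2)*np.sin(theta/2)]),
       +1: np.array([ np.exp( 1j*phi/2)*np.sin(theta/2),
                      np.exp(-1j*phi/2)*np.cos(theta/2)])}

# G = p_mu gamma^mu - m in 2x2 block form, (21.96), with sigma.p -> P*S
I2 = np.eye(2)
G = np.block([[(p0 - m)*I2, -P*S], [P*S, -(p0 + m)*I2]])

for h in (-1, +1):
    assert np.allclose(S @ chi[h], h*chi[h])      # eigenvalue equation (21.89)
    u = np.concatenate([chi[h], h*Ssc*chi[h]])    # u(p, h) of (21.101), N = 1
    assert np.allclose(G @ u, 0)                  # Gu(p, h) = 0, (21.92)
assert np.linalg.matrix_rank(G) == 2              # the rank of G is two
print("u(p, ±1) are common eigenvectors: eigenvalue h of S and 0 of G")
```

The last assertion mirrors the rank statement above: the two spinors span the two-dimensional null space of G.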

21.3.3 Negative-Energy Solution of the Dirac Equation

Using (21.74), with the negative-energy state v(p, h) we have the Dirac equation expressed as

$$(-p_\mu\gamma^\mu - m)v(p, h) = \begin{pmatrix} -p^0 - m & -\boldsymbol{\sigma}\cdot(-\boldsymbol{p}) \\ \boldsymbol{\sigma}\cdot(-\boldsymbol{p}) & p^0 - m \end{pmatrix}\begin{pmatrix} \tilde\chi \\ t\tilde\chi \end{pmatrix} = 0. \tag{21.105}$$

In accordance with (21.84) we put

$$\tilde X = \begin{pmatrix} \tilde\chi \\ t\tilde\chi \end{pmatrix}. \tag{21.106}$$

Meanwhile, defining S̃ as

$$\tilde S \equiv \frac{1}{|-\boldsymbol{p}|}\,\boldsymbol{\sigma}\cdot(-\boldsymbol{p}) = \frac{1}{P}\,\boldsymbol{\sigma}\cdot(-\boldsymbol{p}) = -S \tag{21.107}$$

and defining G̃ as

$$\tilde G \equiv -p_\mu\gamma^\mu - m, \tag{21.108}$$

we rewrite (21.105) as

$$\tilde G\begin{pmatrix} \tilde\chi \\ t\tilde\chi \end{pmatrix} = \begin{pmatrix} -p^0 - m & -P\tilde S \\ P\tilde S & p^0 - m \end{pmatrix}\begin{pmatrix} \tilde\chi \\ t\tilde\chi \end{pmatrix} = 0. \tag{21.109}$$

As in the case of (21.96), we get

$$\tilde G\begin{pmatrix} \tilde\chi \\ t\tilde\chi \end{pmatrix} = \begin{pmatrix} -(p^0 + m)\tilde\chi - Pt\tilde S\tilde\chi \\ P\tilde S\tilde\chi + (p^0 - m)t\tilde\chi \end{pmatrix} = \begin{pmatrix} -(p^0 + m)\tilde\chi - Pth\tilde\chi \\ Ph\tilde\chi + (p^0 - m)t\tilde\chi \end{pmatrix} = 0, \tag{21.110}$$

where we put $\tilde S\tilde\chi = h\tilde\chi$. For (21.110) to be satisfied, we have

$$-(p^0 + m) - Pht = 0, \tag{21.111}$$

$$Ph + (p^0 - m)t = 0. \tag{21.112}$$

Then, we get

$$t = -\frac{p^0 + m}{Ph} = -\frac{Ph}{p^0 - m}. \tag{21.113}$$

Again, (21.113) is equivalent to (21.79). Hence, (21.106) is represented as

$$\begin{pmatrix} \tilde\chi \\ t\tilde\chi \end{pmatrix} = \frac{p^0 + m}{Ph}\begin{pmatrix} hS\tilde\chi \\ -\tilde\chi \end{pmatrix} = \frac{1}{S}\begin{pmatrix} S\tilde\chi \\ -h\tilde\chi \end{pmatrix}, \tag{21.114}$$

where with the last equality the relations h² = 1 and (21.99) were used. As a consequence of switching the operator S to S̃, the eigenvalue equation (21.89) is replaced with

$$\tilde S\tilde\chi = h\tilde\chi = -S\tilde\chi. \tag{21.115}$$

Defining H̃ as in the case of (21.86) such that

$$\tilde H \equiv \begin{pmatrix} \tilde S & 0 \\ 0 & \tilde S \end{pmatrix} = -H = \frac{1}{|-\boldsymbol{p}|}\begin{pmatrix} -\boldsymbol{\sigma}\cdot\boldsymbol{p} & 0 \\ 0 & -\boldsymbol{\sigma}\cdot\boldsymbol{p} \end{pmatrix}, \tag{21.116}$$

we get

$$\tilde H\begin{pmatrix} S\tilde\chi \\ -h\tilde\chi \end{pmatrix} = \begin{pmatrix} \tilde S & 0 \\ 0 & \tilde S \end{pmatrix}\begin{pmatrix} S\tilde\chi \\ -h\tilde\chi \end{pmatrix} = \begin{pmatrix} S\tilde S\tilde\chi \\ -h\tilde S\tilde\chi \end{pmatrix} = \begin{pmatrix} Sh\tilde\chi \\ -h^2\tilde\chi \end{pmatrix} = h\begin{pmatrix} S\tilde\chi \\ -h\tilde\chi \end{pmatrix}. \tag{21.117}$$

From (21.114) and (21.117), as the negative-energy solution v(p, h) of (21.105) we get

$$v(p, h) = \tilde N\begin{pmatrix} S\tilde\chi \\ -h\tilde\chi \end{pmatrix}, \tag{21.118}$$

where Ñ is a normalization constant to be determined later. From (21.115), the negative-energy state corresponding to the helicity eigenvalue +h of H̃ should be translated into the positive-energy state corresponding to the eigenvalue −h of H. Borrowing the results of (21.103), for the normalized eigenfunctions we have

$$\tilde\chi_+ = \begin{pmatrix} -e^{i\phi/2}\cos\dfrac{\theta}{2} \\ e^{-i\phi/2}\sin\dfrac{\theta}{2} \end{pmatrix} \quad\text{and}\quad \tilde\chi_- = \begin{pmatrix} -e^{i\phi/2}\sin\dfrac{\theta}{2} \\ -e^{-i\phi/2}\cos\dfrac{\theta}{2} \end{pmatrix}, \tag{21.119}$$

where χ̃+ belongs to the helicity eigenvalue h = +1 of S̃ and χ̃− belongs to the helicity h = −1 of S̃ with respect to the negative-energy solutions. Thus, using (21.114) we obtain

$$v(p, +1) = \tilde N\begin{pmatrix} -Se^{i\phi/2}\cos\dfrac{\theta}{2} \\ Se^{-i\phi/2}\sin\dfrac{\theta}{2} \\ e^{i\phi/2}\cos\dfrac{\theta}{2} \\ -e^{-i\phi/2}\sin\dfrac{\theta}{2} \end{pmatrix}, \qquad v(p, -1) = \tilde N\begin{pmatrix} -Se^{i\phi/2}\sin\dfrac{\theta}{2} \\ -Se^{-i\phi/2}\cos\dfrac{\theta}{2} \\ -e^{i\phi/2}\sin\dfrac{\theta}{2} \\ -e^{-i\phi/2}\cos\dfrac{\theta}{2} \end{pmatrix}. \tag{21.120}$$

The column vectors of (21.104) and (21.120), expressed as (4, 1) matrices, are called the Dirac spinors. In both cases, we retain the freedom to choose a phase factor, because it does not alter the quantum state. Strictly speaking, the factor e^{±ipx} of (21.59) is a phase factor, but this factor is invariant under the Lorentz transformation. Ignoring the phase factor, with the second equation of (21.120) we may choose

$$v(p, -1) = \tilde N\begin{pmatrix} Se^{i\phi/2}\sin\dfrac{\theta}{2} \\ Se^{-i\phi/2}\cos\dfrac{\theta}{2} \\ e^{i\phi/2}\sin\dfrac{\theta}{2} \\ e^{-i\phi/2}\cos\dfrac{\theta}{2} \end{pmatrix}. \tag{21.121}$$
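A parallel numerical sketch (again an illustration with arbitrary values, under the same assumed conventions as in Sect. 21.3.2) confirms that the spinors of (21.118)–(21.119) solve the negative-energy equation (21.105):

```python
import numpy as np

# Illustrative check of (21.105)-(21.119); numerical values are arbitrary.
m, P = 1.0, 3.0
p0 = np.sqrt(P**2 + m**2)
theta, phi = 0.7, 1.2
Ssc = P / (p0 + m)

S = np.array([[-np.cos(theta),                np.exp(1j*phi)*np.sin(theta)],
              [np.exp(-1j*phi)*np.sin(theta), np.cos(theta)]])
chi_m = np.array([ np.exp( 1j*phi/2)*np.cos(theta/2),
                  -np.exp(-1j*phi/2)*np.sin(theta/2)])       # chi_-
chi_p = np.array([ np.exp( 1j*phi/2)*np.sin(theta/2),
                   np.exp(-1j*phi/2)*np.cos(theta/2)])       # chi_+
chit = {+1: -chi_m, -1: -chi_p}     # chi~ of (21.119)
St = -S                             # S~ of (21.107)

I2 = np.eye(2)
Gt = np.block([[-(p0 + m)*I2, -P*St], [P*St, (p0 - m)*I2]])  # (21.109)

for h in (-1, +1):
    assert np.allclose(St @ chit[h], h*chit[h])     # (21.115)
    v = np.concatenate([Ssc*chit[h], -h*chit[h]])   # v(p, h) of (21.118), N~ = 1
    assert np.allclose(Gt @ v, 0)                   # (21.105)
print("v(p, ±1) solve the negative-energy Dirac equation")
```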

We will revisit this issue in relation to the Dirac adjoint and the charge conjugation (vide infra). Since the rank of G̃ is two, any negative-energy solution can be expressed as a linear combination of v(p, +1) and v(p, −1), as in the case of the positive-energy solutions.

In the description of the Dirac spinors, the following tangible example helps us realize what the point is.

Example 21.4 In Sect. 21.3.1 we mentioned that the helicity h can be chosen freely between −1 and +1 and the momentum p freely in the three-dimensional space. Then, we wish to examine how the Dirac spinor is described when the momentum is switched from p to −p. Choosing u(p, −1) for the Dirac spinor, for instance, we had

$$u(p, -1) = N\begin{pmatrix} e^{i\phi/2}\cos\dfrac{\theta}{2} \\ -e^{-i\phi/2}\sin\dfrac{\theta}{2} \\ -Se^{i\phi/2}\cos\dfrac{\theta}{2} \\ Se^{-i\phi/2}\sin\dfrac{\theta}{2} \end{pmatrix}. \tag{21.104}$$

Corresponding to p → −p, we have

$$(\phi, \theta) \to (\phi \pm \pi, \pi - \theta); \tag{20.338}$$

see Sect. 20.5.3. Upon the replacement of (20.338), u(p, −1) is converted to

$$u'(-p, +1) = N\begin{pmatrix} (\pm i)e^{i\phi/2}\sin\dfrac{\theta}{2} \\ (\pm i)e^{-i\phi/2}\cos\dfrac{\theta}{2} \\ (\pm i)Se^{i\phi/2}\sin\dfrac{\theta}{2} \\ (\pm i)Se^{-i\phi/2}\cos\dfrac{\theta}{2} \end{pmatrix} = (\pm i)N\begin{pmatrix} e^{i\phi/2}\sin\dfrac{\theta}{2} \\ e^{-i\phi/2}\cos\dfrac{\theta}{2} \\ Se^{i\phi/2}\sin\dfrac{\theta}{2} \\ Se^{-i\phi/2}\cos\dfrac{\theta}{2} \end{pmatrix} = (\pm i)u(p, +1), \tag{21.122}$$

where u′ denotes the change in the functional form; the helicity has been switched from −1 to +1 according to (21.89). With the above switching, we have

$$u(p, -1) \to (\pm i)u(p, +1) \tag{21.123}$$

with the energy unaltered. For u(p, +1), similarly we have

$$u(p, +1) \to (\pm i)u(p, -1). \tag{21.124}$$

With the negative-energy solutions, we obtain a similar result. Then, aside from the phase factor, the momentum switching produces

$$\chi_- \leftrightarrow \chi_+, \qquad h_- \leftrightarrow h_+,$$

where h− and h+ stand for the helicity −1 and +1, respectively.

Although in non-relativistic quantum mechanics (i.e., the Schrödinger equation) the helicity was not an important dynamical variable, it plays an essential role as a conserved quantity in the relativistic Dirac equation. In a quantum-mechanical system, the total angular momentum (A), comprising the orbital angular momentum (L) and the spin angular momentum (σ), is a conserved quantity. That is, we have

$$\boldsymbol{A} = \boldsymbol{L} + \boldsymbol{\sigma}. \tag{21.125}$$

Taking an inner product of (21.125) with p, we obtain

$$\boldsymbol{A}\cdot\boldsymbol{p} = \boldsymbol{L}\cdot\boldsymbol{p} + \boldsymbol{\sigma}\cdot\boldsymbol{p}. \tag{21.126}$$

If the angular momentum is measured in the direction of motion (i.e., the direction parallel to p), L·p = 0. It is because in this situation we can represent

$$\boldsymbol{L}\cdot\boldsymbol{p} = (\boldsymbol{x}\times\boldsymbol{p})\cdot\boldsymbol{p} = 0,$$

where x is a position vector. Thus, we get

$$\boldsymbol{A}\cdot\boldsymbol{p} = \boldsymbol{\sigma}\cdot\boldsymbol{p}. \tag{21.127}$$

Since we are dealing with the free field, we have

$$\boldsymbol{A}\cdot\boldsymbol{p} = \boldsymbol{\sigma}\cdot\boldsymbol{p} = c, \tag{21.128}$$

where c is a constant. That is, the component of the total angular momentum in the direction of motion is conserved and equal to the component of the spin angular momentum in the direction of motion. Getting back to the definition of S, we may define it as

$$S \equiv \lim_{|\boldsymbol{p}|\to 0}\frac{1}{|\boldsymbol{p}|}\,\boldsymbol{\sigma}\cdot\boldsymbol{p}. \tag{21.129}$$

Equation (21.129) implies that S can be defined even for an infinitesimally small |p|. We remark that we can observe the motion of the electron from various inertial frames of reference. Depending upon the choice of the inertial frame, we may have varying p (of varying directions and magnitudes). In particular, small changes in p may switch the sign of σ·p and cause a sudden jump of the helicity between ±1 accordingly. This seems somewhat unnatural. Or rather, it may be natural to assume that an electron inherently carries spin regardless of its state of motion (either moving or resting). Even though in (21.129) S seemed to be described as an indeterminate limit at p → 0, it was actually defined in (21.102) regardless of the magnitude of the momentum |p|. Or rather, it is preferable to assume that the angles φ and θ represent the geometric transformation of the basis vectors. We will come back to this point in Chap. 24.

Readers may well wonder why we did not define an operator

$$\begin{pmatrix} \boldsymbol{\sigma} & 0 \\ 0 & \boldsymbol{\sigma} \end{pmatrix}$$

instead of H defined in (21.83). Such an operator, however, would be non-commutative with Hp defined just below (see Sect. 21.3.4). This aspect warrants that the projection of the spin operator σ onto the direction of motion should be taken into account. For this reason, from here on, we will not distinguish the spin operator from the helicity operators H and H̃ when we are thinking of the plane-wave solutions of the Dirac equation.

21.3.4 One-Particle Hamiltonian of the Dirac Equation

In this section, we wish to think of how we can specify the quantum state of the plane wave of an electron in the relativistic Dirac field. We know from Chaps. 3 and 14 that it is important to specify a set of mutually commutative physical quantities (i.e., Hermitian operators) to characterize the physical system. Normally, it will suffice to describe the Hamiltonian of the system and seek the operators that commute with it. First, let us think of the one-particle Hamiltonian Hp for the plane wave with the positive energy. It is given by [2]

$$H_p = \gamma^0(\boldsymbol{\gamma}\cdot\boldsymbol{p} + m). \tag{21.130}$$

The derivation of (21.130) will be performed in Chap. 22 in relation to the quantization of fields. As a matrix representation, we have

$$H_p = \begin{pmatrix} m & \boldsymbol{\sigma}\cdot\boldsymbol{p} \\ \boldsymbol{\sigma}\cdot\boldsymbol{p} & -m \end{pmatrix}. \tag{21.131}$$

Hence, operating Hp on $\binom{\chi}{hS\chi}$, we have

$$H_p\begin{pmatrix} \chi \\ hS\chi \end{pmatrix} = \begin{pmatrix} m & \boldsymbol{\sigma}\cdot\boldsymbol{p} \\ \boldsymbol{\sigma}\cdot\boldsymbol{p} & -m \end{pmatrix}\begin{pmatrix} \chi \\ hS\chi \end{pmatrix} = \begin{pmatrix} m\chi + PhhS\chi \\ Ph\chi - mhS\chi \end{pmatrix} = \begin{pmatrix} p^0\chi \\ hSp^0\chi \end{pmatrix} = p^0\begin{pmatrix} \chi \\ hS\chi \end{pmatrix}. \tag{21.132}$$

For the negative-energy solution, in turn, similarly operating H−p on

$$X' = \begin{pmatrix} S\tilde\chi \\ -h\tilde\chi \end{pmatrix}, \tag{21.133}$$

we have

$$H_{-p}\begin{pmatrix} S\tilde\chi \\ -h\tilde\chi \end{pmatrix} = \begin{pmatrix} m & -\boldsymbol{\sigma}\cdot\boldsymbol{p} \\ -\boldsymbol{\sigma}\cdot\boldsymbol{p} & -m \end{pmatrix}\begin{pmatrix} S\tilde\chi \\ -h\tilde\chi \end{pmatrix} = \begin{pmatrix} mS\tilde\chi - Phh\tilde\chi \\ PhS\tilde\chi + mh\tilde\chi \end{pmatrix} = \begin{pmatrix} -Sp^0\tilde\chi \\ hp^0\tilde\chi \end{pmatrix} = -p^0\begin{pmatrix} S\tilde\chi \\ -h\tilde\chi \end{pmatrix}. \tag{21.134}$$

Thus, as expected, we have reproduced a positive energy p0 and a negative energy −p0, to which u(p, h) and v(p, h) belong, respectively. The spectra of Hp and H−p are characterized by the continuous spectra shown in Fig. 21.2. Note that energies E in the range −m < E < m are forbidden.

Fig. 21.2 Continuous spectra of Hp and H−p. Energies E in the range −m < E < m are forbidden

We can readily check that Hp and H commute with each other. So do H−p and H̃. Thus, we find that the Dirac spinors u(p, h) and v(p, h) determined in (21.101) and (21.118) are the simultaneous eigenstates of the Hamiltonian and the Hermitian helicity operator of the Dirac field. Naturally, these Dirac spinors constitute the complete set of the solutions of the Dirac equation. In Table 21.1 we summarize the matrix forms of the Hamiltonian and helicity operators relevant to the Dirac field, together with their simultaneous eigenstates and eigenvalues.

It is worth comparing the Dirac equation with the Schrödinger equation from the point of view of an eigenvalue problem. An energy eigenvalue equation based on the Schrödinger equation is usually described as

$$H\phi(\boldsymbol{x}) = E\phi(\boldsymbol{x}), \tag{1.55}$$

where H denotes the Hamiltonian. In a hydrogen atom, the total angular momentum operator L² is commutative with H; see (3.15). Since L² (or M²) and Lz (or Mz) have simultaneously been diagonalized, Lz is commutative with H; see (3.158) and (3.159) as well. Since all of H, L², and Lz are Hermitian operators, there must exist a complete orthonormal set of common eigenvectors (Theorem 14.14). Thus, with the hydrogen atom, the solution φ(x) is represented by (3.300), i.e., $Y_l^m(\theta, \phi)\tilde R_l^{(n)}(r)$. As compared with the Schrödinger equation, the eigenvalue equation of the Dirac equation is given by

$$Gu(p, h) = 0 \quad\text{or}\quad \tilde Gv(p, h) = 0.$$

In other words, we are seeking an eigenfunction u(p, h) or v(p, h) that belongs to an eigenvalue of zero. Since Hp and H (or H−p and H̃) are commutative Hermitian operators, according to Theorem 14.14 there must exist a CONS of common eigenvectors as well. Nonetheless, H±p is not commutative with G or G̃ in general. Therefore, the said CONS cannot be a solution of the Dirac equation. As an exceptional case, however, H±p is commutative with G and G̃ if p = 0. (We have already encountered a similar case in Sect. 3.3, where Lx, Ly, and Lz are commutative with one another if the orbital angular momentum L = 0, even though it is not the case with L ≠ 0.) In that exceptional case where p = 0, both G and G̃ are Hermitian (i.e., real diagonal matrices), and so we have a CONS of common eigenvectors that are solutions of the Dirac equation.

The fact that H±p (p ≠ 0) is not commutative with G or G̃ needs further explanation. Consider an eigenvalue problem GΩ = ωΩ (or G̃Ω̃ = ω̃Ω̃). Then, both the positive-energy solution and the negative-energy solution of the Dirac equation can be a solution of the above two types of eigenvalue problems. That is, both Ω and Ω̃ take both the positive-energy and negative-energy solutions as eigenfunctions. Nonetheless, Hp (H−p) takes only a positive (negative) energy as an eigenvalue. Correspondingly, Hp (H−p) takes only the positive-energy (negative-energy) solution as its eigenfunction. For this reason, H±p (p ≠ 0) is not commutative with G or G̃. For further discussion, see Chap. 24.
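The statements of this subsection can be probed with the same kind of numerical sketch as before (illustrative, with σ·p modeled by P times the helicity matrix of (21.102) and arbitrary values): Hp and H−p reproduce the energies ±p0 on u(p, h) and v(p, h), and Hp commutes with the helicity operator.

```python
import numpy as np

# Illustrative check of (21.130)-(21.134); numerical values are arbitrary.
m, P = 1.0, 3.0
p0 = np.sqrt(P**2 + m**2)
theta, phi = 0.7, 1.2
Ssc = P / (p0 + m)

S = np.array([[-np.cos(theta),                np.exp(1j*phi)*np.sin(theta)],
              [np.exp(-1j*phi)*np.sin(theta), np.cos(theta)]])
chi = {-1: np.array([ np.exp( 1j*phi/2)*np.cos(theta/2),
                     -np.exp(-1j*phi/2)*np.sin(theta/2)]),
       +1: np.array([ np.exp( 1j*phi/2)*np.sin(theta/2),
                      np.exp(-1j*phi/2)*np.cos(theta/2)])}

I2, Z2 = np.eye(2), np.zeros((2, 2))
Hp  = np.block([[ m*I2,  P*S], [ P*S, -m*I2]])    # (21.131)
Hmp = np.block([[ m*I2, -P*S], [-P*S, -m*I2]])
Hel = np.block([[S, Z2], [Z2, S]])                # helicity operator

assert np.allclose(Hp @ Hel, Hel @ Hp)            # [H_p, helicity] = 0
for h in (-1, +1):
    u = np.concatenate([chi[h], h*Ssc*chi[h]])    # (21.101)
    v = np.concatenate([Ssc*(-chi[-h]), h*chi[-h]])   # (21.118), chi~_h = -chi_{-h}
    assert np.allclose(Hp @ u, p0*u)              # (21.132)
    assert np.allclose(Hmp @ v, -p0*v)            # (21.134)
print("H_p u = p0 u and H_-p v = -p0 v for h = ±1")
```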

Table 21.1 Hamiltonian and helicity operators of the Dirac field

Dirac spinor u(p, h): Hamiltonian $H_p = \begin{pmatrix} m & \boldsymbol{\sigma}\cdot\boldsymbol{p} \\ \boldsymbol{\sigma}\cdot\boldsymbol{p} & -m \end{pmatrix}$; helicity operator $\mathfrak H = \dfrac{1}{|\boldsymbol{p}|}\begin{pmatrix} \boldsymbol{\sigma}\cdot\boldsymbol{p} & 0 \\ 0 & \boldsymbol{\sigma}\cdot\boldsymbol{p} \end{pmatrix}$; eigenstate$^{\rm a}$ $N\begin{pmatrix} \chi \\ hS\chi \end{pmatrix}$; energy eigenvalue $p^0$; helicity eigenvalue $h$.

Dirac spinor v(p, h): Hamiltonian $H_{-p} = \begin{pmatrix} m & -\boldsymbol{\sigma}\cdot\boldsymbol{p} \\ -\boldsymbol{\sigma}\cdot\boldsymbol{p} & -m \end{pmatrix}$; helicity operator $\tilde{\mathfrak H} = \dfrac{1}{|-\boldsymbol{p}|}\begin{pmatrix} -\boldsymbol{\sigma}\cdot\boldsymbol{p} & 0 \\ 0 & -\boldsymbol{\sigma}\cdot\boldsymbol{p} \end{pmatrix}$; eigenstate$^{\rm a}$ $\tilde N\begin{pmatrix} S\tilde\chi \\ -h\tilde\chi \end{pmatrix}$; energy eigenvalue $-p^0$; helicity eigenvalue $h$.

$^{\rm a}$ With the normalization constants N and Ñ, we have $N = \tilde N = \sqrt{p^0 + m}$; see (21.138) and (21.143).

In a general case where p ≠ 0, however, the eigenvectors eligible for the solution of the Dirac equation are not likely to constitute a CONS. As already mentioned in Sect. 21.3.2, the fact that neither G nor G̃ is Hermitian in the general case makes the problem more complicated. Hence, we must abandon the familiar orthonormalization method based on the inner product in favor of another way. We develop a different method in the next section accordingly.

21.4 Normalization of the Solutions of the Dirac Equation

First, we introduce the Dirac adjoint ψ̄ such that

$$\bar\psi \equiv \psi^\dagger\gamma^0. \tag{21.135}$$

Using this notation, we have

$$\bar u(p, h)u(p, h) = N^*\bigl(\chi^\dagger \;\; hS\chi^\dagger\bigr)\gamma^0\,N\begin{pmatrix} \chi \\ hS\chi \end{pmatrix} = |N|^2\bigl(\chi^\dagger \;\; -hS\chi^\dagger\bigr)\begin{pmatrix} \chi \\ hS\chi \end{pmatrix} = |N|^2\bigl(\chi^\dagger\cdot\chi - h^2S^2\chi^\dagger\cdot\chi\bigr) = |N|^2\bigl(1 - S^2\bigr) = |N|^2\left[1 - \left(\frac{P}{p^0 + m}\right)^2\right] = |N|^2\,\frac{2m}{p^0 + m}, \tag{21.136}$$

where χ represents either χ+ or χ−, and χ†·χ = 1 in either case; we have h² = 1; with the last equality we used the relation (p0)² = P² + m². The number N is the normalization constant of (21.101). According to the custom, we wish to get a normalized eigenfunction u(p, h) such that

$$\bar u(p, h)u(p, h) = 2m. \tag{21.137}$$

This implies that |N|² = p0 + m. Choosing N to be real and positive, we have

$$N = \sqrt{p^0 + m}. \tag{21.138}$$

Then, from (21.101) we get

$$u(p, h) = \sqrt{p^0 + m}\begin{pmatrix} \chi \\ hS\chi \end{pmatrix}. \tag{21.139}$$

With the eigenfunctions having different helicities h and h′ (h ≠ h′), we have

$$\bar u(p, h)u(p, h') = 0 \quad (h \ne h'), \tag{21.140}$$

because $\chi_h^\dagger\cdot\chi_{h'} = 0$; namely, χh and χh′ (h ≠ h′) are orthogonal. Notice that χh or χh′ denotes either χ− or χ+. Combined with the relation χ†·χ = 1, we have $\chi_h^\dagger\cdot\chi_{h'} = \delta_{hh'}$. Further combining (21.137) and (21.140), we get

$$\bar u(p, h)u(p, h') = 2m\delta_{hh'}. \tag{21.141}$$

With the negative-energy solution, using (21.114) and similarly normalizing the function given in (21.118), we obtain

$$\bar v(p, h)v(p, h') = -2m\delta_{hh'}, \tag{21.142}$$

where v(p, h) is expressed as

$$v(p, h) = \tilde N\begin{pmatrix} S\tilde\chi \\ -h\tilde\chi \end{pmatrix} = \sqrt{p^0 + m}\begin{pmatrix} S\tilde\chi \\ -h\tilde\chi \end{pmatrix}.$$

That is, the normalization constant given by

$$\tilde N = \sqrt{p^0 + m} \tag{21.143}$$

for (21.118) is the same as the constant N of (21.138). Note, however, the minus sign in (21.142). Thus, the normalization of the eigenfunctions differs from that based on the positive definite inner product (see Theorem 13.2: Gram–Schmidt orthonormalization theorem).

With regard to the normalization of the plane-wave solutions of the Dirac equation, we have the following useful relations:

$$\sum_{h = -1, +1} u(p, h)\bar u(p, h) = p_\mu\gamma^\mu + m, \tag{21.144}$$

$$\sum_{h = -1, +1} v(p, h)\bar v(p, h) = p_\mu\gamma^\mu - m, \tag{21.145}$$

where the summation with respect to h should be taken over both −1 and +1. We will come back to these important expressions for the normalization in Chap. 24 in relation to the projection operators.

We remark the following relationship with respect to the Dirac adjoint. We had

$$Gu(p, h) = 0. \tag{21.92}$$

Taking the adjoint of this expression, we have

$$u^\dagger(p, h)G^\dagger = u^\dagger(p, h)\gamma^0\gamma^0 G^\dagger = \bar u(p, h)\gamma^0 G^\dagger = 0. \tag{21.146}$$

In (21.91) we had G = pμγμ − m. Taking the adjoint of both sides and taking account of the fact that pμ and m are real, we have

$$G^\dagger = p_\mu(\gamma^\mu)^\dagger - m = p_\mu\gamma^0\gamma^\mu\gamma^0 - m, \tag{21.147}$$

where notice that [1]

$$(\gamma^\mu)^\dagger = \gamma^0\gamma^\mu\gamma^0. \tag{21.148}$$

Multiplying both sides of (21.147) by γ0 from both the left and the right, we get

$$\gamma^0 G^\dagger\gamma^0 = p_\mu\gamma^\mu - m = G. \tag{21.149}$$

Moreover, multiplying (21.146) by γ0 from the right, we get

$$\bar u(p, h)\gamma^0 G^\dagger\gamma^0 = \bar u(p, h)G = 0, \tag{21.150}$$

where with the first equality we used (21.149). Meanwhile, we have

$$(p_\nu\gamma^\nu)\gamma^\mu + \gamma^\mu(p_\nu\gamma^\nu) = p_\nu(\gamma^\nu\gamma^\mu + \gamma^\mu\gamma^\nu) = p_\nu\cdot 2\eta^{\nu\mu} = 2p^\mu. \tag{21.151}$$

Multiplying both sides of (21.151) by ū(p, h) from the left and u(p, h′) from the right, we obtain

$$\bar u(p, h)\bigl[(p_\nu\gamma^\nu)\gamma^\mu + \gamma^\mu(p_\nu\gamma^\nu)\bigr]u(p, h') = 2p^\mu\,\bar u(p, h)u(p, h') = 2p^\mu\cdot 2m\delta_{hh'}, \tag{21.152}$$

where with the last equality we used (21.141). Using (21.92) and (21.150), we get

$$\text{LHS of (21.152)} = \bar u(p, h)(m\gamma^\mu + \gamma^\mu m)u(p, h') = 2m\,\bar u(p, h)\gamma^\mu u(p, h'). \tag{21.153}$$

Comparing the RHS of (21.152) with (21.153), we obtain

$$\bar u(p, h)\gamma^\mu u(p, h') = 2p^\mu\delta_{hh'}. \tag{21.154}$$

Likewise, we have

$$\bar v(p, h)\gamma^\mu v(p, h') = 2p^\mu\delta_{hh'}. \tag{21.155}$$

In particular, putting μ = 0 in (21.154) and (21.155), we have

$$\bar u\gamma^0 u(p, h') = [u(p, h)]^\dagger u(p, h') = 2p^0\delta_{hh'}, \qquad \bar v\gamma^0 v(p, h') = [v(p, h)]^\dagger v(p, h') = 2p^0\delta_{hh'}. \tag{21.156}$$

With respect to the other normalization relations, we have, e.g.,

$$u^\dagger(p, h)v(-p, h') = v^\dagger(p, h)u(-p, h') = 0. \tag{21.157}$$

Equation (21.157) immediately follows from the fact that u(±p, h) and v(±p, h) belong to the different energy eigenvalues p0 and −p0, respectively; see the discussion of Sect. 21.3.4.
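The normalization and completeness relations of this section lend themselves to the same kind of numerical sketch (illustrative, under the same assumed conventions, with arbitrary values of m, P, θ, φ):

```python
import numpy as np

# Illustrative check of (21.137)-(21.145) and (21.156); values are arbitrary.
m, P = 1.0, 3.0
p0 = np.sqrt(P**2 + m**2)
theta, phi = 0.7, 1.2
Ssc = P / (p0 + m)
N = np.sqrt(p0 + m)                                  # (21.138) and (21.143)

S = np.array([[-np.cos(theta),                np.exp(1j*phi)*np.sin(theta)],
              [np.exp(-1j*phi)*np.sin(theta), np.cos(theta)]])
chi = {-1: np.array([ np.exp( 1j*phi/2)*np.cos(theta/2),
                     -np.exp(-1j*phi/2)*np.sin(theta/2)]),
       +1: np.array([ np.exp( 1j*phi/2)*np.sin(theta/2),
                      np.exp(-1j*phi/2)*np.cos(theta/2)])}

I2, Z2 = np.eye(2), np.zeros((2, 2))
g0 = np.block([[I2, Z2], [Z2, -I2]])
u = {h: N*np.concatenate([chi[h], h*Ssc*chi[h]]) for h in (-1, +1)}
v = {h: N*np.concatenate([Ssc*(-chi[-h]), h*chi[-h]]) for h in (-1, +1)}
bar = lambda w: w.conj() @ g0                        # Dirac adjoint (21.135)

for h in (-1, +1):
    assert np.isclose(bar(u[h]) @ u[h],  2*m)        # (21.141)
    assert np.isclose(bar(v[h]) @ v[h], -2*m)        # (21.142)
    assert np.isclose(bar(u[h]) @ u[-h], 0)          # (21.140)
    assert np.isclose(u[h].conj() @ u[h], 2*p0)      # (21.156)

pgam = np.block([[p0*I2, -P*S], [P*S, -p0*I2]])      # p_mu gamma^mu
uu = sum(np.outer(u[h], bar(u[h])) for h in (-1, +1))
vv = sum(np.outer(v[h], bar(v[h])) for h in (-1, +1))
assert np.allclose(uu, pgam + m*np.eye(4))           # (21.144)
assert np.allclose(vv, pgam - m*np.eye(4))           # (21.145)
print("normalization and completeness relations hold")
```

Note that the indefinite sign of (21.142) appears automatically once the Dirac adjoint, rather than the ordinary inner product, is used.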

21.5 Charge Conjugation

As in the case of (21.141) and (21.142), the normalization based upon the Dirac adjoint keeps (21.141) and (21.142) invariant under multiplication by a phase factor c = e^{iθ} (θ: real). That is, instead of u(p, h) and v(p, h) we may freely use cu(p, h) and cv(p, h). Yet, we normally impose a pretty strong constraint on the eigenfunctions u(p, h) and v(p, h) (vide infra). Such a constraint is known as the charge conjugation or charge-conjugation transformation, which is a kind of discrete transformation and is thought of as the particle–antiparticle transduction. The charge-conjugation transformation of a function ψ is defined by

$$\psi^C \equiv i\gamma^2\psi^*. \tag{21.158}$$

This transformation is defined as the interchange between a particle and an antiparticle. The notion of antiparticle is characteristic of the quantum theory of fields. If ψ represents the quantum state of the particle, ψC corresponds to that of the antiparticle, and vice versa. If ψ is associated with the charge Q (= +1), ψC carries the charge −Q (= −1). The charge conjugation is related to the Dirac adjoint such that

$$\psi^C = C\bar\psi^T, \tag{21.159}$$

where C is given by

$$C \equiv i\gamma^2\gamma^0. \tag{21.160}$$

That is, we have

$$\psi^C = C\bar\psi^T = i\gamma^2\gamma^0\bigl(\gamma^0\bigr)^T\bigl(\psi^\dagger\bigr)^T = i\gamma^2\gamma^0\gamma^0\psi^* = i\gamma^2\psi^*, \tag{21.161}$$

which recovers the definition (21.158) of the charge conjugation. Writing ψ* as a column vector representation such that

$$\psi^* = \begin{pmatrix} \psi_1^* \\ \psi_2^* \\ \psi_3^* \\ \psi_4^* \end{pmatrix}, \tag{21.162}$$

we obtain

$$\psi^C = i\gamma^2\psi^* = \begin{pmatrix} 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} \psi_1^* \\ \psi_2^* \\ \psi_3^* \\ \psi_4^* \end{pmatrix} = \begin{pmatrix} -\psi_4^* \\ \psi_3^* \\ \psi_2^* \\ -\psi_1^* \end{pmatrix}, \tag{21.163}$$

where we used the representation of (21.68) for γ2. As for the representation of the matrix C, we get

$$C = i\gamma^2\gamma^0 = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \end{pmatrix}. \tag{21.164}$$

The matrix C is anti-Hermitian with eigenvalues ±i (each a double root). Also, we have C = −C⁻¹. As anticipated, the successive charge-conjugation transformation of ψ gives

$$\bigl(\psi^C\bigr)^C = i\gamma^2\bigl(i\gamma^2\psi^*\bigr)^* = i\gamma^2(-i)\bigl[-\gamma^2\bigr]\psi = -\bigl(\gamma^2\bigr)^2\psi = -(-E_4)\psi = \psi, \tag{21.165}$$

where with the second-to-last equality we used (21.63). We apply the notion of charge conjugation to u(p, h) and v(p, h) of (21.104) and (21.120) so that we may obtain

$$v(p, h) = i\gamma^2 u(p, h)^*. \tag{21.166}$$

Also, we have

$$u(p, h) = i\gamma^2 v(p, h)^*. \tag{21.167}$$

Using the column vector representation for (21.166), we get

$$i\gamma^2 u(p, +1)^* = N\begin{pmatrix} 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} e^{-i\phi/2}\sin\dfrac{\theta}{2} \\ e^{i\phi/2}\cos\dfrac{\theta}{2} \\ Se^{-i\phi/2}\sin\dfrac{\theta}{2} \\ Se^{i\phi/2}\cos\dfrac{\theta}{2} \end{pmatrix} = N\begin{pmatrix} -Se^{i\phi/2}\cos\dfrac{\theta}{2} \\ Se^{-i\phi/2}\sin\dfrac{\theta}{2} \\ e^{i\phi/2}\cos\dfrac{\theta}{2} \\ -e^{-i\phi/2}\sin\dfrac{\theta}{2} \end{pmatrix} = v(p, +1),$$

$$i\gamma^2 u(p, -1)^* = N\begin{pmatrix} 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} e^{-i\phi/2}\cos\dfrac{\theta}{2} \\ -e^{i\phi/2}\sin\dfrac{\theta}{2} \\ -Se^{-i\phi/2}\cos\dfrac{\theta}{2} \\ Se^{i\phi/2}\sin\dfrac{\theta}{2} \end{pmatrix} = N\begin{pmatrix} -Se^{i\phi/2}\sin\dfrac{\theta}{2} \\ -Se^{-i\phi/2}\cos\dfrac{\theta}{2} \\ -e^{i\phi/2}\sin\dfrac{\theta}{2} \\ -e^{-i\phi/2}\cos\dfrac{\theta}{2} \end{pmatrix} = v(p, -1). \tag{21.168}$$

In (21.168), we used Ñ = N. Notice that (21.168) is exactly the same as the second equation of (21.120), including the phase factor. This feature is essential when we deal with the quantization of the Dirac field (see Chap. 22). Also, note that the charge conjugation holds the helicity invariant. As already mentioned, the charge conjugation transforms a particle carrying the charge Q (= +1) into its antiparticle carrying Q (= −1), or vice versa. Therefore, the charge conjugation does not change the state of motion of a particle but is responsible for the inner transformation between particle and antiparticle. Then, it is natural that the charge conjugation holds the helicity invariant.

We also derive the charge-conjugation transformation in a general form as follows. We start from

$$(p_\mu\gamma^\mu - m)u(p, h) = 0;$$

taking the complex conjugate [note that $(\gamma^2)^* = -\gamma^2$, while $(\gamma^\mu)^* = \gamma^\mu$ for μ ≠ 2]:

$$\Bigl(-p_2\gamma^2 + \sum_{\mu\ne 2}p_\mu\gamma^\mu - m\Bigr)u(p, h)^* = 0;$$

inserting $i\gamma^2 i\gamma^2 = E_4$ [the (4, 4) identity matrix] before u(p, h)*:

$$\Bigl(-p_2\gamma^2 + \sum_{\mu\ne 2}p_\mu\gamma^\mu - m\Bigr)i\gamma^2\,i\gamma^2 u(p, h)^* = 0;$$

using $C\bar u(p, h)^T \equiv i\gamma^2 u(p, h)^*$:

$$\Bigl(-p_2\gamma^2 i\gamma^2 + \sum_{\mu\ne 2}p_\mu\gamma^\mu i\gamma^2 - mi\gamma^2\Bigr)C\bar u(p, h)^T = 0;$$

using $\gamma^\mu\gamma^2 = -\gamma^2\gamma^\mu$ (μ ≠ 2) and $(\gamma^2)^2 = -E_4$:

$$\Bigl(ip_2 - i\gamma^2\sum_{\mu\ne 2}p_\mu\gamma^\mu - mi\gamma^2\Bigr)C\bar u(p, h)^T = 0;$$

multiplying the above expression by $i\gamma^2$ from the left:

$$\Bigl(-p_2\gamma^2 - \sum_{\mu\ne 2}p_\mu\gamma^\mu - m\Bigr)C\bar u(p, h)^T = 0.$$

Thus, we obtain

$$(-p_\mu\gamma^\mu - m)C\bar u(p, h)^T = 0. \tag{21.169}$$

Meanwhile, as already seen, we have had

$$(-p_\mu\gamma^\mu - m)v(p, h) = 0. \tag{21.74}$$

Comparing (21.169) and (21.74), we get

$$v(p, h) = C\bar u(p, h)^T. \tag{21.170}$$

In an opposite way, we have

$$u(p, h) = C\bar v(p, h)^T. \tag{21.171}$$

The concepts of the Dirac adjoint and charge conjugation play an essential role in QED, and so we will come back to this point in Chaps. 23 and 24 to consider these concepts in terms of the matrix algebra.
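The algebraic properties of the charge conjugation stated above can be confirmed numerically with the explicit matrices of (21.163) and (21.164), taken here at face value; this sketch is an illustration, not part of the original text.

```python
import numpy as np

# Illustrative check of (21.158)-(21.165), using the matrices printed in
# (21.163) and (21.164).
ig2 = np.array([[ 0, 0, 0, -1],
                [ 0, 0, 1,  0],
                [ 0, 1, 0,  0],
                [-1, 0, 0,  0]], dtype=complex)       # i*gamma^2 of (21.163)
g0 = np.diag([1.0, 1.0, -1.0, -1.0]).astype(complex)  # gamma^0
C = ig2 @ g0                                          # (21.160)

assert np.allclose(C, [[0, 0, 0, 1], [0, 0, -1, 0],
                       [0, 1, 0, 0], [-1, 0, 0, 0]])  # (21.164)
assert np.allclose(C.conj().T, -C)                    # C is anti-Hermitian
assert np.allclose(C @ C, -np.eye(4))                 # hence C = -C^{-1}
ev = np.linalg.eigvals(C)
assert np.allclose(ev.real, 0) and np.allclose(np.sort(ev.imag), [-1, -1, 1, 1])

# Successive charge conjugation is the identity, (21.165)
rng = np.random.default_rng(0)
psi = rng.normal(size=4) + 1j*rng.normal(size=4)
conj_c = lambda w: ig2 @ w.conj()                     # psi^C = i gamma^2 psi*
assert np.allclose(conj_c(conj_c(psi)), psi)
print("C is anti-Hermitian with C = -C^{-1}, and (psi^C)^C = psi")
```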

21.6 Characteristics of the Gamma Matrices

In this section we summarize important expressions and relations of the gamma matrices. The gamma matrices were constructed using the Pauli spin matrices as in (21.67), and their tangible forms were given by (21.68). The gamma matrices are inherently coupled to the Lorentz transformation; accordingly, they combine basic importance with practical utility in the quantum theory of fields. In terms of the latter aspect, we make the most of various formulas for the gamma matrices (see Chaps. 23 and 24). We summarize several of them in the last part of this section.

We started the discussion of the Dirac equation by introducing the gamma matrices (Sect. 21.3.1). The key equation for this is given by

$$\{\gamma^\mu, \gamma^\nu\} = 2\eta^{\mu\nu}. \tag{21.66}$$

As the frequently used representation of the gamma matrices, we had the Dirac representation described by

$$\gamma^0 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad \gamma^k = \begin{pmatrix} 0 & \sigma_k \\ -\sigma_k & 0 \end{pmatrix} \quad (k = 1, 2, 3). \tag{21.67}$$
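As an aside, the anticommutation relation (21.66), together with the (anti-)Hermiticity and unitarity stated below, is easy to verify numerically for the Dirac representation. The sketch (an illustration, not part of the original text) uses the common sign convention for the Pauli matrices; any sign convention for σ2 yields the same anticommutation relations.

```python
import numpy as np

# Illustrative check of (21.66)-(21.67) for the Dirac representation.
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
I2, Z2 = np.eye(2), np.zeros((2, 2))

g = [np.block([[I2, Z2], [Z2, -I2]])]                        # gamma^0
g += [np.block([[Z2, s], [-s, Z2]]) for s in (s1, s2, s3)]   # gamma^k
eta = np.diag([1.0, -1.0, -1.0, -1.0])

for mu in range(4):
    for nu in range(4):
        anti = g[mu] @ g[nu] + g[nu] @ g[mu]
        assert np.allclose(anti, 2*eta[mu, nu]*np.eye(4))    # (21.66)

# gamma^0 is Hermitian; gamma^k are anti-Hermitian; all are unitary
assert np.allclose(g[0].conj().T, g[0])
for k in (1, 2, 3):
    assert np.allclose(g[k].conj().T, -g[k])
    assert np.allclose(g[k] @ g[k].conj().T, np.eye(4))
print("Clifford algebra verified for the Dirac representation")
```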

In (21.67), γ0 is an Hermitian operator and γk (k = 1, 2, 3) are anti-Hermitian, while both γ0 and γk are unitary. While sticking to the representation of γ0 and the (anti-)Hermitian property of these operators, we wish to deal with the representation of the gamma matrices in a somewhat broader context. Suppose that γk (k = 1, 2, 3) is represented by

$$\gamma^k = \begin{pmatrix} a & b & c & d \\ e & f & g & h \\ p & q & r & s \\ t & u & v & w \end{pmatrix}, \tag{21.172}$$

where a, b, ⋯, w are arbitrary complex numbers. Taking account of

$$\gamma^0\gamma^k + \gamma^k\gamma^0 = 0 \quad (k = 1, 2, 3) \tag{21.173}$$

and the fact that γk is anti-Hermitian, (21.172) is reduced to

$$\gamma^k = -\gamma^{k\dagger} = \begin{pmatrix} 0 & b & c & d \\ -b^* & 0 & g & h \\ -c^* & -g^* & 0 & s \\ -d^* & -h^* & -s^* & 0 \end{pmatrix}.$$

From the unitarity of γk, we get

$$cg^* + dh^* = 0, \quad -bg + ds^* = 0, \quad bh + cs = 0, \quad b^*c + hs^* = 0, \quad b^*d - gs = 0, \quad c^*d + g^*h = 0; \tag{21.174}$$

$$|b|^2 + |c|^2 + |d|^2 = 1, \quad |b|^2 + |g|^2 + |h|^2 = 1, \quad |c|^2 + |g|^2 + |s|^2 = 1, \quad |d|^2 + |h|^2 + |s|^2 = 1. \tag{21.175}$$

Putting b = c = h = s = 0 in (21.175), we have |d|² = |g|² = 1. Then, using θk and ρk (k = 1, 2) as real parameters, we obtain

$$\gamma^k = \begin{pmatrix} 0 & 0 & 0 & e^{-i\rho_k} \\ 0 & 0 & e^{-i\theta_k} & 0 \\ 0 & -e^{i\theta_k} & 0 & 0 \\ -e^{i\rho_k} & 0 & 0 & 0 \end{pmatrix} \quad (k = 1, 2). \tag{21.176}$$

From (21.66), we have

$$\{\gamma^1, \gamma^2\} = -\mathrm{diag}\Bigl(e^{i(\rho_2 - \rho_1)} + e^{i(\rho_1 - \rho_2)},\; e^{i(\theta_2 - \theta_1)} + e^{i(\theta_1 - \theta_2)},\; e^{i(\theta_1 - \theta_2)} + e^{i(\theta_2 - \theta_1)},\; e^{i(\rho_1 - \rho_2)} + e^{i(\rho_2 - \rho_1)}\Bigr) = 0. \tag{21.177}$$

Hence,

$$2\cos(\theta_1 - \theta_2) = 2\cos(\rho_1 - \rho_2) = 0.$$

This is equivalent to θ1 − θ2 = ±π/2 and ρ1 − ρ2 = ±π/2. Setting θ1 = ρ1 = 0 and θ2 = −ρ2 = π/2 in (21.176), we get γ1 and γ2 as represented in (21.68). Similarly, putting b = d = g = s = 0 in (21.174), we have |c|² = |h|² = 1 from (21.175). Using τ and σ as real parameters, we obtain γ3 described by

$$\gamma^3 = \begin{pmatrix} 0 & 0 & -e^{-i\tau} & 0 \\ 0 & 0 & 0 & -e^{-i\sigma} \\ e^{i\tau} & 0 & 0 & 0 \\ 0 & e^{i\sigma} & 0 & 0 \end{pmatrix}. \tag{21.178}$$

Setting τ = π and σ = 0, we get γ3 as represented in (21.68).

There is arbitrariness in choosing the gamma matrices. Multiplying both sides of (21.66) by a unitary operator U† from the left and U from the right and defining

$$\gamma'^\mu \equiv U^\dagger\gamma^\mu U, \tag{21.179}$$

we get

$$\{\gamma'^\mu, \gamma'^\nu\} = 2\eta^{\mu\nu}. \tag{21.180}$$

That is, the newly defined gamma matrices γ′μ satisfy the same condition (21.66) while retaining unitarity and (anti-)Hermiticity.

The characteristic polynomials of γk (k = 1, 2, 3) given by (21.68) are all

$$(\lambda^2 + 1)^2 = 0. \tag{21.181}$$

That is, λ = ±i (each doubly degenerate). Their determinants are all 1. With γ0, its characteristic polynomial is

$$(\lambda^2 - 1)^2 = 0. \tag{21.182}$$

Hence, λ = ±1 (each doubly degenerate). Its determinant is 1.

We have another gamma matrix, γ5, that is linearly independent of γμ (μ = 0, 1, 2, 3). The matrix is defined as

$$\gamma^5 \equiv i\gamma^0\gamma^1\gamma^2\gamma^3 = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}. \tag{21.183}$$

The matrix γ5 is Hermitian and unitary with eigenvalues λ = ±1 (each doubly degenerate), and γ5 is anti-commutative with γμ (μ = 0, 1, 2, 3). The determinant of γ5 is 1. In Table 21.2, we summarize major characteristics of the gamma matrices including γ5.

Table 21.2 Major characteristics of the gamma matrices

γ0: Hermitian, unitary; eigenvalues ±1 (each doubly degenerate); determinant 1.
γi (i = 1, 2, 3): anti-Hermitian, unitary; eigenvalues ±i (each doubly degenerate); determinant 1.
γ5 ≡ iγ0γ1γ2γ3: Hermitian, unitary; eigenvalues ±1 (each doubly degenerate); determinant 1.

So far, we have assumed that the gamma matrices comprise fourth-order square matrices, but this is not self-evident. Now, we tentatively assume that a gamma matrix is an n-th order square matrix. From (21.62) we have

$$\gamma^\mu\gamma^\nu = -\gamma^\nu\gamma^\mu \quad (\mu \ne \nu).$$

Taking the determinant of the above equation, we have

$$\det\gamma^\mu\det\gamma^\nu = (-1)^n\det\gamma^\nu\det\gamma^\mu. \tag{21.184}$$

Thus, we obtain (−1)ⁿ = 1; namely, n must be an even number. In (21.184) we know that det γμ ≠ 0 from (21.63). We acknowledge that the linearly independent matrices are E (the identity matrix), γμ (μ = 0, 1, 2, 3, 5), and the $\binom{5}{2} = 10$ products consisting of two different gamma matrices. Therefore, the number of linearly independent matrices is 16. Hence, n of (21.184) is at least 4, because a (4, 4) matrix has 4² = 16 matrix elements. In fact, it suffices to choose (4, 4) matrices for the gamma matrices. The detailed proof for n = 4 is given in the literature and on the publisher's website [7, 8].

Other than the characteristics of the gamma matrices summarized in Table 21.2, various properties of the gamma matrices are summarized or tabulated in many textbooks. We list some typical examples below; among them, the contractions and trace calculations are frequently used. Useful applications can be seen in subsequent chapters.

(i) Contractions [1]:

$$\gamma^\lambda\gamma_\lambda = 4E_4, \qquad \gamma^\lambda\gamma^\alpha\gamma_\lambda = -2\gamma^\alpha, \qquad \gamma^\lambda(A_\sigma\gamma^\sigma)\gamma_\lambda = -2(A_\sigma\gamma^\sigma), \tag{21.185}$$

$$\gamma^\lambda\gamma^\alpha\gamma^\beta\gamma_\lambda = 4\eta^{\alpha\beta}, \qquad \gamma^\lambda\gamma^\alpha\gamma^\beta\gamma^\rho\gamma_\lambda = -2\gamma^\rho\gamma^\beta\gamma^\alpha, \tag{21.186}$$

$$\gamma^\lambda(A_\sigma\gamma^\sigma)(B_\sigma\gamma^\sigma)\gamma_\lambda = 4A_\sigma B^\sigma, \tag{21.187}$$

$$\gamma^\lambda(A_\sigma\gamma^\sigma)(B_\sigma\gamma^\sigma)(C_\sigma\gamma^\sigma)\gamma_\lambda = -2(C_\sigma\gamma^\sigma)(B_\sigma\gamma^\sigma)(A_\sigma\gamma^\sigma). \tag{21.188}$$

Note that in the above, E4 denotes the (4, 4) identity matrix.

(ii) Traces [1]:

$$\mathrm{Tr}\bigl[(A_\sigma\gamma^\sigma)(B_\sigma\gamma^\sigma)\bigr] = 4A_\sigma B^\sigma, \tag{21.189}$$

$$\mathrm{Tr}\bigl[(A_\sigma\gamma^\sigma)(B_\sigma\gamma^\sigma)(C_\sigma\gamma^\sigma)(D_\sigma\gamma^\sigma)\bigr] = 4\bigl[(A_\sigma B^\sigma)(C_\tau D^\tau) - (A_\sigma C^\sigma)(B_\tau D^\tau) + (A_\sigma D^\sigma)(B_\tau C^\tau)\bigr]. \tag{21.190}$$
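The contraction and trace formulas can be spot-checked numerically; the identities are representation independent, so the sketch below (an illustration, not part of the original text) simply uses the standard Dirac representation and random real four-vectors.

```python
import numpy as np

# Illustrative numerical check of (21.185)-(21.190) in the standard Dirac
# representation (the identities themselves are representation independent).
s = [np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]])]
I2, Z2 = np.eye(2), np.zeros((2, 2))
g = [np.block([[I2, Z2], [Z2, -I2]])] + \
    [np.block([[Z2, sk], [-sk, Z2]]) for sk in s]
eta = np.diag([1.0, -1.0, -1.0, -1.0])
g_lo = [eta[mu, mu]*g[mu] for mu in range(4)]     # gamma_mu (eta is diagonal)
E4 = np.eye(4)
Z4 = np.zeros((4, 4), dtype=complex)

# Contractions (21.185)
assert np.allclose(sum((g[l] @ g_lo[l] for l in range(4)), Z4), 4*E4)
for a in range(4):
    con = sum((g[l] @ g[a] @ g_lo[l] for l in range(4)), Z4)
    assert np.allclose(con, -2*g[a])

# Traces (21.189) and (21.190), with random real four-vectors
rng = np.random.default_rng(1)
A, B, Cv, D = rng.normal(size=(4, 4))
slash = lambda a: sum((a[mu]*g_lo[mu] for mu in range(4)), Z4)  # A_mu gamma^mu
dot = lambda a, b: a @ eta @ b                                  # A_mu B^mu
assert np.isclose(np.trace(slash(A) @ slash(B)).real, 4*dot(A, B))
lhs = np.trace(slash(A) @ slash(B) @ slash(Cv) @ slash(D)).real
rhs = 4*(dot(A, B)*dot(Cv, D) - dot(A, Cv)*dot(B, D) + dot(A, D)*dot(B, Cv))
assert np.isclose(lhs, rhs)
print("contraction and trace identities verified numerically")
```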


References

1. Mandl F, Shaw G (2010) Quantum field theory, 2nd edn. Wiley, Chichester
2. Itzykson C, Zuber J-B (2005) Quantum field theory. Dover, New York
3. Satake I-O (1975) Linear algebra (pure and applied mathematics). Marcel Dekker, New York
4. Satake I (1974) Linear algebra. Shokabo, Tokyo (in Japanese)
5. Møller C (1952) The theory of relativity. Oxford University Press, London
6. Kaku M (1993) Quantum field theory. Oxford University Press, New York
7. Sakamoto M (2014) Quantum field theory. Shokabo, Tokyo (in Japanese)
8. Sakamoto M (2014) https://www.shokabo.co.jp/author/2511/answer/QFT_answer_all.pdf (in Japanese)

Chapter 22 Quantization of Fields

In Chap. 1 we regarded the canonical commutation relation as the key concept of quantum mechanics. The canonical commutation relation is based on the commutation relation between the canonical coordinates and canonical momenta. These quantities are represented as generalized coordinates and generalized momenta so that these fundamental variables can conform to different coordinate systems. In this chapter, we reformulate the canonical commutation relation as the guiding principle of the quantization of fields. This will be done within the framework of the Lagrangian formalism. In this chapter we deal with the scalar field, the Dirac field, and the photon field (electromagnetic field). The commutation relation is first represented as the equal-time commutation relation; as the Lagrangian formalism singles out the time coordinate, so does the equal-time commutation relation. At first glance, this looks inconsistent with Lorentz invariance (or covariance). However, this superficial inconsistency is worked out by introducing the invariant delta functions. At the same time, by translating the canonical commutation relation in the coordinate space into that in the momentum space, the calculations with respect to the field quantities are eased. The covariant nature is indispensable to any physical theory that is based on the theory of relativity; in that sense, the invariant delta functions provide us with a good example. The combination of the invariant delta function and the time-ordered product yields the Feynman propagator, which is fully necessary to evaluate the interaction between the fields. These procedures are directly associated with the construction of quantum electrodynamics (QED).

22.1 Lagrangian Formalism of the Fields [1]

© Springer Nature Singapore Pte Ltd. 2023. S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_22

Analytical mechanics is based upon the introduction of scalar quantities (e.g., the Lagrangian and Hamiltonian) that are independent of the choice of individual coordinate system. Of these quantities, the Lagrangian and the associated action integral

play a central role in the quantum theory of fields as a powerful tool for developing the theory and performing calculations. We give a brief outline of the Lagrangian formalism [1]. For the sake of brevity, suppose that a single particle is moving under the influence of a potential U. Then, the kinetic energy T of the particle is described as

$$T = \sum_{k=1}^{3} \frac{1}{2} m \dot{x}_k^2 = \sum_{k=1}^{3} \frac{1}{2m} p_k^2, \tag{22.1}$$

where m is the mass of the particle, and x_k and p_k (k = 1, 2, 3) represent the x-, y-, and z-components of the Cartesian coordinate of the particle and of the momentum in that direction, respectively. In Cartesian coordinates, we have

$$p_k = m \dot{x}_k \quad (k = 1, 2, 3). \tag{22.2}$$

Here, we may modify (22.2) so that we can use any suitable variables, instead of the Cartesian coordinates x_k, to specify the position of the particle. Examples include the spherical coordinates introduced in Sect. 3.2. Let q_k (k = 1, 2, 3) be a set of such variables, which are said to be generalized coordinates. Suppose that x_k is expressed as a function of the q_k such that

$$x_1 = x_1(q_1, q_2, q_3), \quad x_2 = x_2(q_1, q_2, q_3), \quad x_3 = x_3(q_1, q_2, q_3).$$

Differentiating x_k with respect to t (i.e., time), we get

$$\dot{x}_k = \sum_{l=1}^{3} \frac{\partial x_k}{\partial q_l} \dot{q}_l \quad (k = 1, 2, 3). \tag{22.3}$$

Since ẋ_k thus depends upon both q_1, q_2, q_3 and q̇_1, q̇_2, q̇_3, it can be described by

$$\dot{x}_k = \dot{x}_k(q_1, q_2, q_3, \dot{q}_1, \dot{q}_2, \dot{q}_3). \tag{22.4}$$

Then, from (22.3) we obtain

$$\frac{\partial \dot{x}_k}{\partial \dot{q}_l} = \frac{\partial x_k}{\partial q_l}. \tag{22.5}$$

Comparing (22.1) and (22.2), we may express p_k as

$$p_k = m \dot{x}_k = \frac{\partial T}{\partial \dot{x}_k} \quad (k = 1, 2, 3). \tag{22.6}$$

Since in (22.1) T is described by the ẋ_k (k = 1, 2, 3), T can be described by q_1, q_2, q_3, q̇_1, q̇_2, q̇_3 as well through (22.4), such that

$$T = T(\dot{x}_1, \dot{x}_2, \dot{x}_3) = T(q_1, q_2, q_3, \dot{q}_1, \dot{q}_2, \dot{q}_3). \tag{22.7}$$

Thus, in parallel with (22.6) we can define a generalized momentum (or conjugate momentum) p_k such that

$$p_k \equiv \frac{\partial T}{\partial \dot{q}_k} \quad (k = 1, 2, 3). \tag{22.8}$$

Taking account of (22.4) and applying the differentiation of a composite function to (22.7), we have

$$p_k = \sum_{l=1}^{3} \frac{\partial T}{\partial \dot{x}_l} \frac{\partial \dot{x}_l}{\partial \dot{q}_k} \quad (k = 1, 2, 3). \tag{22.9}$$

Using (22.5) and (22.6), we obtain

$$p_k = \sum_{l=1}^{3} m \dot{x}_l \frac{\partial x_l}{\partial q_k}. \tag{22.10}$$

Further differentiating (22.10) with respect to t, we get

$$\dot{p}_k = \sum_{l=1}^{3} m \ddot{x}_l \frac{\partial x_l}{\partial q_k} + \sum_{l=1}^{3} m \dot{x}_l \frac{d}{dt} \frac{\partial x_l}{\partial q_k}. \tag{22.11}$$

Now, suppose that the single particle we are dealing with undergoes an infinitesimal displacement so that q_1, q_2, q_3 are changed by dq_1, dq_2, dq_3. This causes x_k = x_k(q_1, q_2, q_3) to be changed by dx_k such that

$$dx_k = \sum_{l=1}^{3} \frac{\partial x_k}{\partial q_l} dq_l. \tag{22.12}$$

If we think of the virtual displacement δx_k [1], we rewrite (22.12) as

$$\delta x_k = \sum_{l=1}^{3} \frac{\partial x_k}{\partial q_l} \delta q_l. \tag{22.13}$$

Suppose that the particle is subject to a force F, which exerts an infinitesimal virtual work δW on the particle described by

$$\delta W = \sum_{k=1}^{3} F_k \, \delta x_k. \tag{22.14}$$

Note that the F_k are the x-, y-, and z-components in the original Cartesian coordinate system. Namely, F can be expressed as

$$\mathbf{F} = \sum_{k=1}^{3} F_k \mathbf{e}_k, \tag{22.15}$$

where e_k (k = 1, 2, 3) are the orthonormal basis vectors of the Cartesian coordinate system that appeared in Sect. 3.2. Also, notice that we have used the term virtual work and the symbol δW in correspondence with the virtual displacement δx_k. Measuring δW in reference to the generalized coordinates, we get

$$\delta W = \sum_{k=1}^{3} F_k \, \delta x_k = \sum_{k=1}^{3} F_k \sum_{l=1}^{3} \frac{\partial x_k}{\partial q_l} \delta q_l = \sum_{l=1}^{3} \left( \sum_{k=1}^{3} F_k \frac{\partial x_k}{\partial q_l} \right) \delta q_l, \tag{22.16}$$

where δx_k in (22.14) was replaced with δq_l through (22.13) and the order of summation was exchanged with respect to l and k. In (22.16), we define Q_l as

$$Q_l \equiv \sum_{k=1}^{3} F_k \frac{\partial x_k}{\partial q_l}. \tag{22.17}$$

Then, we have

$$\delta W = \sum_{l=1}^{3} Q_l \, \delta q_l. \tag{22.18}$$

Comparing (22.14) and (22.18), we have

$$\sum_{k=1}^{3} F_k \, \delta x_k = \sum_{k=1}^{3} Q_k \, \delta q_k. \tag{22.19}$$


Thus, we notice that Q_k, represented in reference to the generalized coordinate system, corresponds to F_k, represented in reference to the original Cartesian coordinate system. Hence, Q_k is said to be a generalized force. Describing Newton's equation of motion in the Cartesian coordinate system, we obtain

$$F_l = m \ddot{x}_l. \tag{22.20}$$

Substituting (22.20) into (22.11), we have

$$\dot{p}_k = \sum_{l=1}^{3} F_l \frac{\partial x_l}{\partial q_k} + \sum_{l=1}^{3} m \dot{x}_l \frac{d}{dt} \frac{\partial x_l}{\partial q_k} = Q_k + \sum_{l=1}^{3} m \dot{x}_l \frac{d}{dt} \frac{\partial x_l}{\partial q_k}, \tag{22.21}$$

where with the second equality we used (22.17). Once again, taking account of (22.4) and applying the differentiation of a composite function to (22.7), we have

$$\frac{\partial T}{\partial q_k} = \sum_{l=1}^{3} \frac{\partial T}{\partial \dot{x}_l} \frac{\partial \dot{x}_l}{\partial q_k} = \sum_{l=1}^{3} m \dot{x}_l \frac{\partial \dot{x}_l}{\partial q_k} \quad (k = 1, 2, 3), \tag{22.22}$$

where with the second equality we used (22.6). Calculating ∂ẋ_l/∂q_k by use of (22.3), we obtain

$$\frac{\partial \dot{x}_l}{\partial q_k} = \sum_{m=1}^{3} \frac{\partial^2 x_l}{\partial q_k \partial q_m} \dot{q}_m = \sum_{m=1}^{3} \frac{\partial^2 x_l}{\partial q_m \partial q_k} \dot{q}_m = \sum_{m=1}^{3} \frac{\partial}{\partial q_m}\!\left( \frac{\partial x_l}{\partial q_k} \right) \frac{dq_m}{dt} = \frac{d}{dt} \frac{\partial x_l}{\partial q_k}. \tag{22.23}$$

The last equality of (22.23) follows by differentiating the composite function ∂x_l/∂q_k, which depends on t only through q_1, q_2, q_3, with respect to t. Substituting (22.23) into the RHS of (22.22), we get

$$\frac{\partial T}{\partial q_k} = \sum_{l=1}^{3} m \dot{x}_l \frac{d}{dt} \frac{\partial x_l}{\partial q_k} \quad (k = 1, 2, 3). \tag{22.24}$$

Replacing the second term of (22.21) with the LHS of (22.24), we finally obtain a succinct expression described by

$$\dot{p}_k = Q_k + \frac{\partial T}{\partial q_k}. \tag{22.25}$$

Using (22.8), which defined the generalized momentum, (22.25) is rewritten as

$$\frac{d}{dt} \frac{\partial T}{\partial \dot{q}_k} = Q_k + \frac{\partial T}{\partial q_k} \quad (k = 1, 2, 3). \tag{22.26}$$

Equation (22.26) can further be transformed into a well-known form based upon the Lagrangian L. Suppose that the force is conservative. In that case, Q_k can be described as

$$Q_k = -\frac{\partial U}{\partial q_k}, \tag{22.27}$$

where U is a potential. Then, (22.26) is further rewritten as

$$\frac{d}{dt} \frac{\partial T}{\partial \dot{q}_k} - \frac{\partial T}{\partial q_k} + \frac{\partial U}{\partial q_k} = 0 \quad (k = 1, 2, 3). \tag{22.28}$$

We assume that U is independent of the q̇_k (k = 1, 2, 3). That is, we have

$$\frac{\partial U}{\partial \dot{q}_k} = 0. \tag{22.29}$$

Accordingly, we get

$$\frac{d}{dt} \frac{\partial (T - U)}{\partial \dot{q}_k} - \frac{\partial (T - U)}{\partial q_k} = 0. \tag{22.30}$$

Defining the Lagrangian L as

$$L \equiv T - U, \tag{22.31}$$

we obtain

$$\frac{d}{dt} \frac{\partial L}{\partial \dot{q}_k} - \frac{\partial L}{\partial q_k} = 0 \quad (k = 1, 2, 3). \tag{22.32}$$

Equation (22.32) is called Lagrange's equations of motion (or the Euler–Lagrange equations) and is utilized in various fields of physics and natural science. Although we have assumed a single-particle system so far, the number of particles need not be limited. If the dynamical system we are thinking of contains n particles, the number of degrees of freedom of that system is 3n. However, some of them may be put

under a constraint; for instance, the motion of some particles may be restricted to the surface of a sphere, or some particles may be subject to free fall under the force of gravity. In such cases the number of degrees of freedom decreases. Hence, the number of degrees of freedom of the dynamical system is normally characterized by a positive integer n. Then, (22.32) is reformulated as follows:

$$\frac{d}{dt} \frac{\partial L}{\partial \dot{q}_k} - \frac{\partial L}{\partial q_k} = 0 \quad (k = 1, 2, \cdots, n). \tag{22.33}$$
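As a quick numerical illustration (not part of the original text), Lagrange's equation (22.33) can be verified along a known trajectory. The sketch below assumes a one-dimensional harmonic oscillator with illustrative values of m and k:

```python
import math

# Check d/dt(∂L/∂q̇) − ∂L/∂q = 0 numerically for L = ½ m q̇² − ½ k q²
# along the classical trajectory q(t) = cos(ωt), ω = √(k/m).
# m, k, and the sample times are illustrative choices.
m, k = 2.0, 8.0
w = math.sqrt(k / m)
q = lambda t: math.cos(w * t)
qdot = lambda t: -w * math.sin(w * t)

def residual(t, h=1e-5):
    # d/dt(∂L/∂q̇) = d(m q̇)/dt, evaluated by a central difference
    dp_dt = m * (qdot(t + h) - qdot(t - h)) / (2 * h)
    dL_dq = -k * q(t)                      # ∂L/∂q
    return dp_dt - dL_dq

worst = max(abs(residual(0.1 * i)) for i in range(1, 50))
print(worst)
```

The residual stays at the level of the finite-difference error, confirming that the trajectory satisfies (22.33).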

We remark that under the condition (22.29), (22.8) can be rewritten as

$$p_k \equiv \frac{\partial L}{\partial \dot{q}_k} = \frac{\partial T}{\partial \dot{q}_k} \quad (k = 1, 2, \cdots, n). \tag{22.34}$$

The relations (22.8) and (22.34) mark a turning point in our formulation. Through the medium of (22.7) and (22.33), we express the Lagrangian L as

$$L = L(q_1, \cdots, q_n, \dot{q}_1, \cdots, \dot{q}_n, t) \equiv L(q, \dot{q}, t), \tag{22.35}$$

where with the last identity q and q̇ represent q_1, ⋯, q_n and q̇_1, ⋯, q̇_n collectively. Consequently, p_k (k = 1, 2, ⋯, n) can be described as a function of q, q̇, t. Conversely, we could describe L as L(q, p, t) or L(q̇, p, t), but this would break the original functional form of L. Instead, we wish to seek another way of describing a physical quantity P as a function of q, p, and t, i.e., P(q, p, t). Such a prescription is called the Legendre transformation [1]. The Legendre transformation is defined by

$$P(q, p, t) = L(q, \dot{q}, t) - p\dot{q}. \tag{22.36}$$

The implication of (22.36) is that the physical quantity P has been described as a function of the independent variables q, p, t instead of q, q̇, t. Notice that P(q, p, t) and L(q, q̇, t) are connected to each other through p and q̇. The quantities q, p are said to be canonical variables. That is, the Legendre transformation enables us to describe P as a function of the canonical variables. The Hamiltonian H(q, p, t), another important quantity in analytical mechanics, is given by the Legendre transformation and defined as

$$H(q, p, t) = \sum_{k=1}^{n} p_k \dot{q}_k - L(q, \dot{q}, t). \tag{22.37}$$

If the kinetic energy T is given by a second-order homogeneous polynomial with respect to the q̇_k (k = 1, 2, ⋯, n), as in (22.1), from (22.34) we have

$$\frac{\partial T}{\partial \dot{q}_k} = p_k = 2T_k / \dot{q}_k. \tag{22.38}$$

Hence, we obtain 2T_k = p_k q̇_k. Taking the summation of this expression, we get

$$\sum_{k=1}^{n} p_k \dot{q}_k = 2 \sum_{k=1}^{n} T_k = 2T, \tag{22.39}$$

where $\sum_{k=1}^{n} T_k = T$. Hence, the Hamiltonian H(q, p, t) given in (22.37) is described by

$$H(q, p, t) = 2T - L = 2T - (T - U) = T + U. \tag{22.40}$$
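The identity H = Σ_k p_k q̇_k − L = T + U of (22.40) can be illustrated numerically. The sketch below (one degree of freedom, illustrative numbers not taken from the text) applies the Legendre transformation to a kinetic energy quadratic in q̇:

```python
# Check H = p q̇ − L = T + U for T = ½ m q̇² (quadratic in q̇) and an
# arbitrary potential U(q) independent of q̇.  All numbers are illustrative.
m = 1.5
U = lambda q: q**4 - 2.0 * q
q, qdot = 0.7, -1.2

T = 0.5 * m * qdot**2
L = T - U(q)               # Lagrangian (22.31)
p = m * qdot               # generalized momentum (22.34): p = ∂L/∂q̇
H = p * qdot - L           # Legendre transformation (22.37)
print(H - (T + U(q)))      # vanishes up to rounding
```

The agreement is exact up to floating-point rounding, as guaranteed by Euler's theorem on homogeneous functions used in (22.38)–(22.39).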

Thus, the Hamiltonian represents the total energy of the dynamical system. The Legendre transformation is often utilized in thermodynamics; interested readers are referred to the literature [1].

Along with Lagrange's equations of motion, we have another important guiding principle of the Lagrangian formalism. It is referred to as the principle of least action or Hamilton's principle. To formulate the principle properly, we define an action integral S such that [2]

$$S(q_1, \cdots, q_n) \equiv \int_{-\infty}^{\infty} dt \, L(q_1, \cdots, q_n, \dot{q}_1, \cdots, \dot{q}_n, t). \tag{22.41}$$

The principle of least action is simply described as

$$\delta S(q_1, \cdots, q_n) = 0. \tag{22.42}$$

The principle of least action says that the particles of the dynamical system move in such a way that the action integral takes an extremum. Equations (22.41) and (22.42) lead to the following equation:

$$\delta S(q) = \int_{-\infty}^{\infty} dt \left[ L(q + \delta q, \dot{q} + \delta \dot{q}, t) - L(q, \dot{q}, t) \right] = 0, \tag{22.43}$$

where q and q̇ represent q_1, ⋯, q_n and q̇_1, ⋯, q̇_n collectively. We have


$$L(q + \delta q, \dot{q} + \delta \dot{q}, t) = L(q, \dot{q}, t) + \sum_{k=1}^{n} \left( \delta q_k \frac{\partial L}{\partial q_k} + \delta \dot{q}_k \frac{\partial L}{\partial \dot{q}_k} \right). \tag{22.44}$$

Noting that

$$\delta \dot{q}_k \frac{\partial L}{\partial \dot{q}_k} = \frac{d(\delta q_k)}{dt} \frac{\partial L}{\partial \dot{q}_k} = \frac{d}{dt}\!\left( \delta q_k \frac{\partial L}{\partial \dot{q}_k} \right) - \delta q_k \frac{d}{dt} \frac{\partial L}{\partial \dot{q}_k}, \tag{22.45}$$

we get

$$\begin{aligned} \delta S(q) &= \int_{-\infty}^{\infty} dt \sum_{k=1}^{n} \left[ \delta q_k \left( \frac{\partial L}{\partial q_k} - \frac{d}{dt} \frac{\partial L}{\partial \dot{q}_k} \right) + \frac{d}{dt}\!\left( \delta q_k \frac{\partial L}{\partial \dot{q}_k} \right) \right] \\ &= \int_{-\infty}^{\infty} dt \sum_{k=1}^{n} \delta q_k \left( \frac{\partial L}{\partial q_k} - \frac{d}{dt} \frac{\partial L}{\partial \dot{q}_k} \right) + \left[ \sum_{k=1}^{n} \delta q_k \frac{\partial L}{\partial \dot{q}_k} \right]_{-\infty}^{\infty} = 0. \end{aligned}$$

Assuming that the infinitesimal variations vanish at t = ±∞, we obtain

$$\delta S(q) = \int_{-\infty}^{\infty} dt \sum_{k=1}^{n} \delta q_k \left( \frac{\partial L}{\partial q_k} - \frac{d}{dt} \frac{\partial L}{\partial \dot{q}_k} \right) = 0. \tag{22.46}$$

The integrand of (22.46) must vanish so that (22.46) can hold for any arbitrarily chosen variations δq_k. In other words, Lagrange's equations of motion must be satisfied:

$$\frac{d}{dt} \frac{\partial L}{\partial \dot{q}_k} - \frac{\partial L}{\partial q_k} = 0 \quad (k = 1, 2, \cdots, n). \tag{22.33}$$

In what follows, we will make the most of the Lagrangian formalism.
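The principle of least action can likewise be probed numerically. In the following sketch (an illustrative discretization, not from the text), the action of a harmonic oscillator with m = k = 1 is evaluated along the classical path q(t) = sin t and along slightly perturbed paths vanishing at the endpoints; the first-order variation comes out essentially zero:

```python
import math

# Discretized sketch of δS = 0, eq. (22.42): along the classical
# harmonic-oscillator path q(t) = sin(t) (m = k = 1, illustrative),
# the first-order change of the action under an endpoint-vanishing
# perturbation vanishes up to discretization error.
def action(path, dt):
    # S ≈ Σ (½ v² − ½ q²) dt with midpoint q and finite-difference v
    S = 0.0
    for i in range(len(path) - 1):
        qm = 0.5 * (path[i] + path[i + 1])
        v = (path[i + 1] - path[i]) / dt
        S += (0.5 * v * v - 0.5 * qm * qm) * dt
    return S

N = 2000
dt = math.pi / N
ts = [i * dt for i in range(N + 1)]
cl = [math.sin(t) for t in ts]            # classical path on [0, π]
eta = [math.sin(2 * t) for t in ts]       # perturbation, zero at endpoints

eps = 1e-3
S_plus = action([c + eps * e for c, e in zip(cl, eta)], dt)
S_minus = action([c - eps * e for c, e in zip(cl, eta)], dt)
first_var = (S_plus - S_minus) / (2 * eps)
print(first_var)
```

For this quadratic Lagrangian the symmetric difference gives the first variation exactly, so the printed value reflects only the O(dt²) discretization error.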

22.2 Introductory Fourier Analysis [3]

Fourier analysis is one of the most important mathematical techniques. It comprises the Fourier series expansion, the Fourier transform, the inverse Fourier transform, etc. We have dealt with these techniques in this book from place to place (e.g., in Chaps. 5 and 20). Since Fourier analysis, especially the Fourier transform, is an indispensable tool in the quantum theory of fields, we introduce the related techniques rather systematically in this section. This is because the field quantization is associated with the whole space–time. Be that as it may, the major concepts related to the field quantization have already been presented in Parts I and IV. Here, we further extend and refine such concepts.

22.2.1 Fourier Series Expansion

Let us start with the Fourier series expansion. Consider the following SOLDE that appeared in Example 1.1 of Sect. 1.3:

$$\frac{d^2 y(x)}{dx^2} + \lambda y(x) = 0. \tag{1.61}$$

We solved (1.61) under the Dirichlet boundary conditions (see Sect. 10.3) of y(L) = 0 and y(−L) = 0 (L > 0). The normalized eigenfunctions were described by

$$y(x) = \sqrt{\frac{1}{L}} \cos kx \quad \left[ kL = \frac{\pi}{2} + m\pi \ (m = 0, 1, 2, \cdots) \right], \tag{1.91}$$

with λ = (2m + 1)²π²/4L² (m = 0, 1, 2, ⋯) or

$$y(x) = \sqrt{\frac{1}{L}} \sin kx \quad [kL = n\pi \ (n = 1, 2, 3, \cdots)], \tag{1.92}$$

with λ = (2n)²π²/4L² (n = 1, 2, 3, ⋯). Meanwhile, in Sect. 3.3 we dealt with essentially the same SOLDE

$$-\frac{1}{\Phi(\phi)} \frac{d^2 \Phi(\phi)}{d\phi^2} = \eta. \tag{3.58}$$

Rewriting (3.58), we get

$$\frac{d^2 \Phi(\phi)}{d\phi^2} + \eta \Phi(\phi) = 0 \quad (0 \le \phi \le 2\pi). \tag{22.47}$$

Under the periodic boundary conditions (BCs) of Φ(0) = Φ(2π) and Φ′(0) = Φ′(2π), we obtained the eigenfunctions described by

$$\Phi(\phi) = \frac{1}{\sqrt{2\pi}} e^{im\phi} \quad (m = 0, \pm 1, \pm 2, \cdots), \tag{3.64}$$

with η = m² (m = 0, ±1, ±2, ⋯). The above two examples show how different BCs (boundary conditions) produce different eigenfunctions corresponding to different eigenvalues, even though we are considering the same differential equation. When we deal with problems of the quantum field theory, the periodic BCs are more likely to be encountered than the Dirichlet BCs. If we admit that with the quantum-mechanical wave function ψ, |ψ|² represents the existence probability (of either a particle or a field), it is natural to assume that |ψ(x)|² → 0 as x → ±∞, where x denotes the space–time coordinate. In light of the periodic BCs, we may safely say that ψ(x) vanishes "smoothly" as x → ±∞. With the aforementioned background knowledge, let us consider the Fourier series expansion of a function f(x) that satisfies the periodic BCs specified by

$$f(-a) = f(a) \ \text{ and } \ f'(-a) = f'(a) \quad (a > 0). \tag{22.48}$$

As the system of eigenfunctions F_n(x), we use

$$F_n(x) = \frac{1}{\sqrt{2a}} e^{in\pi x/a} \quad (n = 0, \pm 1, \pm 2, \cdots;\ a > 0). \tag{22.49}$$

Note that if we put a = π in (22.49), it is of the same form as (3.64). Then, using (22.49), f(x) can be expanded such that

$$f(x) = \frac{1}{2a} \sum_{n=-\infty}^{\infty} c(n) e^{in\pi x/a} \tag{22.50}$$

with

$$c(n) = \int_{-a}^{a} e^{-in\pi x/a} f(x)\, dx. \tag{22.51}$$

In the next subsection, we further explore the characteristics and implications of (22.50) and (22.51).
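As a concrete illustration of (22.50) and (22.51) (an illustrative sketch, not from the text), the coefficients c(n) can be computed numerically for a simple periodic function and used to rebuild it:

```python
import cmath
import math

# Numerical sketch of (22.50)–(22.51): compute the coefficients c(n) of a
# function obeying the periodic BCs (22.48) by quadrature, then rebuild
# f(x) from a partial sum.  f(x) = cos(πx/a) and the grid are illustrative.
a, N = 2.0, 2000
f = lambda x: math.cos(math.pi * x / a)
dx = 2 * a / N
xs = [-a + j * dx for j in range(N)]      # one full period

def c(n):  # c(n) = ∫_{−a}^{a} e^{−inπx/a} f(x) dx, eq. (22.51)
    return sum(cmath.exp(-1j * n * math.pi * x / a) * f(x) for x in xs) * dx

x0 = 0.3
recon = sum(c(n) * cmath.exp(1j * n * math.pi * x0 / a)
            for n in range(-5, 6)) / (2 * a)   # partial sum of (22.50)
print(abs(recon - f(x0)))
```

Because the test function contains only the n = ±1 modes, the short partial sum already reproduces f(x₀) to near machine precision.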

22.2.2 Fourier Integral Transforms: Fourier Transform and Inverse Fourier Transform

Let us suppose that a → ∞ in (22.50) and (22.51), keeping the BCs the same as (22.48), to see what happens. Putting

$$\frac{n\pi}{a} = k, \tag{22.52}$$

we think of its difference

$$\frac{\Delta n \, \pi}{a} = \Delta k. \tag{22.53}$$

Taking Δn = 1, we have $\frac{1}{2a} = \frac{\Delta k}{2\pi}$. Then, (22.50) is rewritten as

$$f(x) = \frac{\Delta k}{2\pi} \sum_{n=-\infty}^{\infty} c(n) e^{ikx} = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} c(n) e^{ikx} \Delta k. \tag{22.54}$$

We see from (22.53) that as a → ∞, Δk → 0. Then, (22.54) can further be converted into

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} c(k) e^{ikx} dk, \tag{22.55}$$

where c(n) in (22.50) was changed to c(k) through (22.52). Multiplying both sides of (22.55) by e^{−iqx} and integrating with respect to x, we have

$$\begin{aligned} \int_{-\infty}^{\infty} e^{-iqx} f(x)\, dx &= \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} c(k) e^{i(k-q)x} dk\, dx = \frac{1}{2\pi} \int_{-\infty}^{\infty} c(k) \left[ \int_{-\infty}^{\infty} e^{i(k-q)x} dx \right] dk \\ &= \frac{1}{2\pi} \int_{-\infty}^{\infty} c(k)\, 2\pi \delta(k - q)\, dk = c(q). \end{aligned} \tag{22.56}$$

In (22.56) we used a property of the δ function described by

$$\delta(k) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{ikx} dx. \tag{22.57}$$
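The representation (22.57) can be made plausible numerically by truncating the integral at a large cutoff L; the resulting kernel sin(Lk)/(πk) behaves like δ(k) under smearing. The following sketch uses an illustrative Gaussian test function:

```python
import math

# Numerical sketch of (22.57): the truncated integral
# δ_L(k) = (1/2π) ∫_{−L}^{L} e^{ikx} dx = sin(Lk)/(πk) acts like δ(k):
# smearing it against a smooth g(k) returns g(0) for large L.
# g, L, and the grid parameters are illustrative assumptions.
g = lambda k: math.exp(-k * k)            # smooth test function, g(0) = 1

def smeared(L, K=30.0, N=200000):
    # midpoint rule for ∫ g(k) sin(Lk)/(πk) dk over [−K, K]
    dk = 2 * K / N
    total = 0.0
    for j in range(N):
        k = -K + (j + 0.5) * dk           # midpoint grid: never exactly zero
        total += g(k) * math.sin(L * k) / (math.pi * k) * dk
    return total

err = abs(smeared(100.0) - g(0.0))
print(err)
```

The error shrinks as the cutoff L grows, mirroring the distributional limit in (22.57).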

Changing the argument q to k in (22.56), we get

$$c(k) = \int_{-\infty}^{\infty} e^{-ikx} f(x)\, dx. \tag{22.58}$$

The mathematical manipulation that transforms f(x) into c(k) as in (22.58) is referred to as the Fourier transform. The transformation from (22.58) back to (22.55) is said to be the inverse Fourier transform. These transforms are collectively called the Fourier integral transforms. Rewriting (22.55) as

$$f(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \bar{c}(k) e^{ikx} dk, \tag{22.59}$$

where $\bar{c}(k) \equiv \frac{1}{\sqrt{2\pi}} c(k)$, similarly we get

$$\int_{-\infty}^{\infty} e^{-iqx} f(x)\, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \bar{c}(k)\, 2\pi \delta(k - q)\, dk = \sqrt{2\pi}\, \bar{c}(q). \tag{22.60}$$

That is, we have

$$\bar{c}(q) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iqx} f(x)\, dx. \tag{22.61}$$

In (22.59) and (22.61), both f(x) and $\bar{c}(q)$ share $1/\sqrt{2\pi}$ as the coefficient of the integral. Writing $\bar{c}(q)$ as c(q) anew, we obtain

$$f(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} c(k) e^{ikx} dk, \tag{22.62}$$

$$c(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-ikx} f(x)\, dx. \tag{22.63}$$

In what follows, we make it a rule to represent the Fourier integral transforms as (22.62) and (22.63) when we deal with the transformation of the canonical coordinates and canonical momenta. Note, however, that when we calculate the invariant delta functions and the Feynman propagator, we use the Fourier transform pair of (22.55) and (22.58). Though non-essential, care should be taken accordingly.

In Sect. 5.1 we mentioned that the eigenvectors |k⟩ (k = 0, 1, 2, ⋯) (i.e., solutions of an eigenvalue equation) form a complete orthonormal system (CONS) such that

$$\sum_{k} |k\rangle \langle k| = E, \tag{5.15}$$

where E is an identity operator. In (5.15) the number of the |k⟩ might be either finite or infinite, but the |k⟩ were countable. A dynamical system we will be dealing with in this chapter, however, normally has infinitely many, uncountable degrees of freedom. To conform (5.15) to our present purpose, it should be modified as follows:

$$E = \int_{-\infty}^{\infty} |k\rangle \langle k| \, dk, \tag{22.64}$$

where E is the identity operator and k is a suitable continuous parameter.

Both (5.15) and (22.64) can be regarded as expressing the completeness of the projection operators (see Sect. 14.1). Multiplying both sides of (22.64) by ⟨x| from the left and |f⟩ from the right, we get

$$\langle x | f \rangle = \int_{-\infty}^{\infty} \langle x | k \rangle \langle k | f \rangle \, dk.$$

Using the relation introduced in Sect. 10.4,

$$f(x) \equiv \langle x | f \rangle, \tag{10.90}$$

and further defining

$$c(k) \equiv \langle k | f \rangle, \tag{22.65}$$

we have

$$f(x) = \int_{-\infty}^{\infty} \langle x | k \rangle c(k) \, dk. \tag{22.66}$$

Comparing (22.62) and (22.66), we obtain the following relationship [4]:

$$\langle x | k \rangle = \frac{1}{\sqrt{2\pi}} e^{ikx}. \tag{22.67}$$

As in the case of (22.64), we have

$$E = \int_{-\infty}^{\infty} |x\rangle \langle x| \, dx. \tag{22.68}$$

Multiplying both sides of (22.68) by ⟨k| from the left and |f⟩ from the right, we get

$$\langle k | f \rangle = \int_{-\infty}^{\infty} \langle k | x \rangle \langle x | f \rangle \, dx. \tag{22.69}$$

Thus, we obtain

$$c(k) = \int_{-\infty}^{\infty} \langle k | x \rangle f(x)\, dx = \int_{-\infty}^{\infty} \langle x | k \rangle^{*} f(x)\, dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-ikx} f(x)\, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-ikx} f(x)\, dx, \tag{22.70}$$

where with the second equality we used (13.2). Thus, we reproduced (22.63).

22.3 Quantization of the Scalar Field [5, 6]

22.3.1 Lagrangian Density and Action Integral

Going back to Sects. 1.1 and 21.1, the energy–momentum–mass relationship of a particle is represented in natural units as

$$E^2 = \mathbf{p}^2 + m^2. \tag{22.71}$$

As already seen in Sect. 21.1, the Klein–Gordon equation was based upon this relationship. From (22.71), we have

$$E = \pm \sqrt{\mathbf{p}^2 + m^2}, \tag{22.72}$$

which admits both positive and negative energies. The quantization of the (neutral) scalar field (or Klein–Gordon field), the Dirac field, and the electromagnetic field to be dealt with in this chapter is closely associated with this assumption. At the same time, this caused problems at the early stage of the quantum field theory, along with the difficulty in the probabilistic interpretation of (21.9). Nevertheless, after Dirac invented the equation named after him (i.e., the Dirac equation), the Klein–Gordon equation turned out to be as valid as the Dirac equation as a basic equation describing quantum fields [7]. Historically speaking, the quantum theory of fields was first established as the theory that dealt with the dynamical system consisting of photons and electrons, referred to as quantum electrodynamics (QED). In this connection, the quantization of the scalar field is not strictly required. In this section, however, we first describe the quantization of the (neutral) scalar field based on the Klein–Gordon equation, because the scalar field quantization is not only the simplest example but also a prototype of the related quantum theories of fields. Hence, we study the quantization of the scalar field in detail accordingly. We start with the formulation of the Lagrangian density of the free Klein–Gordon field. The Lagrangian density ℒ is given by [5, 8]

1 μ ∂ ϕ∂ ϕ - m2 ϕ2 : 2 μ

ð22:73Þ

The action integral S(ϕ) follows from (22.73) such that Sð ϕÞ =

dx L =

dx

1 μ ∂ ϕðxÞ∂ ϕðxÞ - m2 ½ϕðxÞ2 : 2 μ

ð22:74Þ

Note that unlike (22.41), where the integration was taken only over the time coordinate, the integration in (22.74) is taken over the whole space–time. That is,

$$\int dx \equiv \int_{-\infty}^{\infty} dx^0 \int_{-\infty}^{\infty} dx^1 \int_{-\infty}^{\infty} dx^2 \int_{-\infty}^{\infty} dx^3. \tag{22.75}$$

Taking the variation of S(ϕ) with respect to ϕ, we have

$$\begin{aligned} \delta S(\phi) &= \delta \int dx \, \mathcal{L}(x) = \frac{1}{2} \int dx \left\{ \left[ \partial_\mu \delta\phi(x) \right] \partial^\mu \phi(x) + \partial_\mu \phi(x) \left[ \partial^\mu \delta\phi(x) \right] - m^2 \cdot 2\phi(x)\, \delta\phi(x) \right\} \\ &= \frac{1}{2} \int dx_{\neq\mu} \left\{ \left[ \delta\phi(x) \right] \partial^\mu \phi(x) \right\}_{x^\mu = -\infty}^{\infty} + \frac{1}{2} \int dx^{\neq\mu} \left\{ \partial_\mu \phi(x) \left[ \delta\phi(x) \right] \right\}_{x_\mu = -\infty}^{\infty} \\ &\quad + \frac{1}{2} \int dx \left[ -\partial_\mu \partial^\mu \phi(x) - \partial^\mu \partial_\mu \phi(x) - m^2 \cdot 2\phi(x) \right] \delta\phi(x) = 0, \end{aligned} \tag{22.76}$$

where the second-to-last equality resulted from integration by parts; dx_{≠μ} (or dx^{≠μ}) in the first and second terms means the triple integral with respect to the variables x other than x^μ (or x_μ). We assume that δϕ(x) → 0 as x^μ (x_μ) → ±∞. Hence, with the second-to-last equality of (22.76), the first and second integrals vanish. Consequently, for δS(ϕ) = 0 to be satisfied with any arbitrarily taken δϕ(x), the integrand of the third integral must vanish. This is equivalent to

$$-\partial_\mu \partial^\mu \phi(x) - \partial^\mu \partial_\mu \phi(x) - m^2 \cdot 2\phi(x) = 0.$$

That is, we obtain

$$\partial_\mu \partial^\mu \phi(x) + m^2 \phi(x) = 0. \tag{22.77}$$

Note that in the above equation ∂_μ∂^μϕ(x) = ∂^μ∂_μϕ(x). Equation (22.77) is identical with (21.11). Thus, we find that the Klein–Gordon equation has followed from the principle of least action. In other words, on the basis of that principle we obtain an equation of motion of the quantum field from the action integral described by

$$S(\phi) = \int dx \, \mathcal{L}\left[ \phi(x), \partial_\mu \phi(x) \right]. \tag{22.78}$$

Namely, the Euler–Lagrange equation described by [2]

$$\partial_\mu \frac{\partial \mathcal{L}}{\partial (\partial_\mu \phi)} - \frac{\partial \mathcal{L}}{\partial \phi} = 0 \tag{22.79}$$


follows from δS(ϕ) = 0. The Euler–Lagrange equation can be regarded as a relativistic extension of Lagrange's equations of motion (22.33). The derivation of (22.79) is left to the reader [2]. The implication of (22.78) and (22.79) is that once ℒ[ϕ(x), ∂_μϕ(x)] is properly formulated for a relativistic field, the equation of motion for that field can adequately be derived from the principle of least action. The Lagrangian formalism of the Dirac field and electromagnetic field will be given in Sects. 22.4 and 22.5, respectively.
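As a numerical aside (illustrative values, not from the text), the classical content of (22.77) can be checked with a plane wave: ϕ(t, x) = cos(k₀t − k·x) satisfies the Klein–Gordon equation exactly when the dispersion relation k₀² = |k|² + m² holds:

```python
import math

# Plane-wave check of the Klein–Gordon equation (22.77): in natural units,
# φ(t, x) = cos(k0·t − k·x) satisfies ∂t²φ − ∇²φ + m²φ = 0 when
# k0² = |k|² + m².  Wave vector, mass, and sample point are illustrative.
kvec = (0.4, -1.1, 0.7)
m = 0.9
k0 = math.sqrt(sum(c * c for c in kvec) + m * m)

def phi(t, x):
    return math.cos(k0 * t - sum(kc * xc for kc, xc in zip(kvec, x)))

def kg_residual(t, x, h=1e-4):
    # second derivatives by central differences
    d2t = (phi(t + h, x) - 2 * phi(t, x) + phi(t - h, x)) / h**2
    lap = 0.0
    for i in range(3):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        lap += (phi(t, xp) - 2 * phi(t, x) + phi(t, xm)) / h**2
    return d2t - lap + m * m * phi(t, x)

res = abs(kg_residual(0.3, (0.2, -0.5, 1.0)))
print(res)
```

The residual is at the level of the finite-difference error only, reflecting the dispersion relation (22.88) introduced below.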

22.3.2 Equal-Time Commutation Relation and Field Quantization

Once the Lagrangian density ℒ has been given for the field to be quantized, the quantization of the field is carried out through the equal-time commutation relation between the canonical variables, i.e., the generalized coordinate (or canonical coordinate) and generalized momentum (or canonical momentum). That is, if ℒ is given as ℒ[ϕ(x), ∂_μϕ(x)] in (22.78), as a function of the generalized coordinates ϕ(x) as well as their derivatives ∂_μϕ(x), the quantization (or canonical quantization) of the scalar field is postulated as the canonical commutation relations expressed as

$$[\phi(t, \mathbf{x}), \pi(t, \mathbf{y})] = i \delta^3(\mathbf{x} - \mathbf{y}), \quad [\phi(t, \mathbf{x}), \phi(t, \mathbf{y})] = [\pi(t, \mathbf{x}), \pi(t, \mathbf{y})] = 0. \tag{22.80}$$

In (22.80), the generalized momentum π(x) is defined in analogy with (22.34) as

$$\pi \equiv \frac{\partial \mathcal{L}}{\partial \dot{\phi}}. \tag{22.81}$$

Also, we have

$$\delta^3(\mathbf{x} - \mathbf{y}) \equiv \delta(x^1 - y^1)\, \delta(x^2 - y^2)\, \delta(x^3 - y^3).$$

If more than one field component is responsible for (22.80), they can appropriately be indexed. In that case, the commutation relations are sometimes represented in matrix form (see Sect. 22.4). The first equation of (22.80) is essentially the same as (1.140) and (1.141), which is rewritten in natural units (i.e., ħ = 1) as

$$[q, p] = iE, \tag{22.82}$$

where E is the identity operator. Thus, we notice that E of (22.82) is replaced with δ³(x − y) in (22.80). This is because the whole set of spatial coordinates is responsible for the field quantization.

Equation (22.80) is often called the equal-time commutation relation. As the name indicates, the equal-time commutation relation favors the time coordinate over the spatial coordinates. For this reason, the Lorentz covariance seems unclear, but the covariance manifests itself later, especially when we deal with, e.g., the invariant delta functions and the Feynman propagator. For the free Klein–Gordon field, from (22.73) the canonical momentum π(t, x) is described as

$$\pi(t, \mathbf{x}) = \frac{\partial \mathcal{L}}{\partial (\partial_0 \phi)} = \partial^0 \phi(t, \mathbf{x}) = \dot{\phi}(t, \mathbf{x}). \tag{22.83}$$

Using (22.83), the equal-time commutation relations are written as

$$[\phi(t, \mathbf{x}), \pi(t, \mathbf{y})] = [\phi(t, \mathbf{x}), \dot{\phi}(t, \mathbf{y})] = i \delta^3(\mathbf{x} - \mathbf{y}), \quad [\phi(t, \mathbf{x}), \phi(t, \mathbf{y})] = [\dot{\phi}(t, \mathbf{x}), \dot{\phi}(t, \mathbf{y})] = 0. \tag{22.84}$$

Now, we are in a position to attend to our task at hand. We wish to seek the Fourier integral representation of the free Klein–Gordon field ϕ(x). The three-dimensional representation is given by [8]

$$\phi(t, \mathbf{x}) = \int \frac{d^3 k}{\sqrt{(2\pi)^3}} \, q_{\mathbf{k}}(t) \, e^{i\mathbf{k}\cdot\mathbf{x}}, \tag{22.85}$$

where d³k ≡ dk₁dk₂dk₃. Notice that (22.85) is the three-dimensional version of (22.62). The Fourier integral representation (22.85) differs from those that appeared in Sect. 22.2.2 in that the expansion coefficient q_k(t) contains the time coordinate as a parameter. Inserting (22.85) into the Klein–Gordon equation (22.77), we obtain

$$\ddot{q}_{\mathbf{k}}(t) + (\mathbf{k}^2 + m^2)\, q_{\mathbf{k}}(t) = 0. \tag{22.86}$$

This is the equation of motion of a harmonic oscillator. As a general solution of (22.86), as in the case of (2.4) of Sect. 2.1, we have

$$q_{\mathbf{k}}(t) = q_1(\mathbf{k}) e^{-ik_0 t} + q_2(\mathbf{k}) e^{ik_0 t}, \tag{22.87}$$

with

$$k_0 \equiv \sqrt{\mathbf{k}^2 + m^2} \ (> 0). \tag{22.88}$$

Notice that in (22.87) each term has been factorized with respect to k and t so that the exponential factors contain only t as a variable. The quantity k₀ represents the angular frequency of the harmonic oscillator and varies with |k|. Also, note that the


angular frequency and energy have the same dimension in natural units (vide infra). Here we assume that ϕ(x) represents the neutral scalar field so that ϕ(x) is Hermitian. In the present case, ϕ(x) is represented by a (1, 1) matrix. Whereas it is a real c-number before the field quantization, after the field quantization it should be regarded as a real q-number; namely, the order of multiplication between q-numbers must be treated properly. Taking the adjoint of (22.85), we have

$$\phi^\dagger(x) = \int \frac{d^3 k}{\sqrt{(2\pi)^3}} \, q_{\mathbf{k}}{}^\dagger(t) \, e^{-i\mathbf{k}\cdot\mathbf{x}}. \tag{22.89}$$

Replacing k with −k in (22.89), we get

$$\phi^\dagger(x) = \int \frac{d^3 k}{\sqrt{(2\pi)^3}} \, q_{-\mathbf{k}}{}^\dagger(t) \, e^{i\mathbf{k}\cdot\mathbf{x}}. \tag{22.90}$$

For (22.90) we used

$$\int_{-\infty}^{+\infty} \frac{d^3 k}{\sqrt{(2\pi)^3}} \, q_{\mathbf{k}}{}^\dagger(t) e^{-i\mathbf{k}\cdot\mathbf{x}} = \int_{+\infty}^{-\infty} \frac{(-1)^3 d^3 k}{\sqrt{(2\pi)^3}} \, q_{-\mathbf{k}}{}^\dagger(t) e^{i\mathbf{k}\cdot\mathbf{x}} = \int_{-\infty}^{+\infty} \frac{d^3 k}{\sqrt{(2\pi)^3}} \, q_{-\mathbf{k}}{}^\dagger(t) e^{i\mathbf{k}\cdot\mathbf{x}}. \tag{22.91}$$

ð22:92Þ

Meanwhile, from (22.87) we have q - k { ðt Þ = q1 { ð- kÞeik0 t þ q2 { ð- kÞe - ik0 t :

ð22:93Þ

Comparing once again (22.87) and (22.93), we obtain q1 { ð- kÞ = q2 ðkÞ and q2 { ð- kÞ = q1 ðkÞ,

ð22:94Þ

996

22 Quantization of Fields

with the first and second relations of (22.94) being identical. Substituting (22.87) for (22.85), we get ϕðt, xÞ = = = = =

1 -1 1 -1 1 -1 1 -1 1 -1

d3 k ð2π Þ

3

d3 k ð2π Þ

3

d k 3

ð2π Þ

3

d k 3

ð2π Þ

3

d k 3

ð2π Þ

3

q1 ðkÞe - ik0 t þ q2 ðkÞeik0 t eikx -1

q1 ðkÞe - ik0 t eikx þ

1

ð2π Þ

-1

q1 ðkÞe - ik0 t eikx þ q1 ðkÞe - ikx þ

ð- 1Þ3 d3 k ð- 1Þ3 d3 k

1

1 -1

3

ð2π Þ d k 3

ð2π Þ

3

3

q2 ð - kÞeik0 t eið- kÞx q1 { ðkÞeik0 t eið- kÞx

q1 { ðkÞeikx

q1 ðkÞe - ikx þ q1 { ðkÞeikx , ð22:95Þ

where with the second equality k of the second term has been replaced with -k; with the third equality (22.94) was used in the second term. Note also that in (22.95) kx  k0 x0 - kx =

k2 þ m2 t - kx:

ð22:96Þ

Defining a(k) as [8]

$$a(\mathbf{k}) \equiv \sqrt{2k_0}\, q_1(\mathbf{k}), \tag{22.97}$$

we get

$$\phi(x) = \int \frac{d^3 k}{\sqrt{(2\pi)^3 (2k_0)}} \left[ a(\mathbf{k}) e^{-ikx} + a^\dagger(\mathbf{k}) e^{ikx} \right]. \tag{22.98}$$

The origin of the coefficient √(2k₀) is explained in the literature [9] (vide infra). Equation (22.98) clearly shows that ϕ(x) is Hermitian, that is, ϕ†(x) = ϕ(x). We divide (22.98) into the "positive-frequency" part ϕ₊(x) and the "negative-frequency" part ϕ₋(x) such that [5]

$$\phi(x) = \phi_+(x) + \phi_-(x), \quad \phi_+(x) \equiv \int \frac{d^3 k}{\sqrt{(2\pi)^3 (2k_0)}} \, a(\mathbf{k}) e^{-ikx}, \quad \phi_-(x) \equiv \int \frac{d^3 k}{\sqrt{(2\pi)^3 (2k_0)}} \, a^\dagger(\mathbf{k}) e^{ikx}. \tag{22.99}$$

0

π ð xÞ = ∂ ϕð xÞ = ϕð xÞ =

d3 k ð2π Þ3 ð2k0 Þ

 - ik 0 aðkÞe - ikx þ ik 0 a{ ðkÞeikx :

ð22:100Þ

Next, we convert the equal-time commutation relations (22.84) into commutation relations between a(k) and a†(q). For this purpose, we solve (22.98) and (22.100) with respect to a(k) and a†(k). To perform the calculation, we make the most of the Fourier transform developed in Sect. 22.2.2. In particular, the three-dimensional version of (22.57) is useful. The formula is expressed as

$$(2\pi)^3 \delta^3(\mathbf{k}) = \int_{-\infty}^{\infty} e^{i\mathbf{k}\cdot\mathbf{x}} d^3 x. \tag{22.101}$$

To apply (22.101) to (22.98) and (22.100), we deform those equations such that

$$\phi(t, \mathbf{x}) = \int \frac{d^3 k}{\sqrt{(2\pi)^3 (2k_0)}} \left[ a(\mathbf{k}) e^{-ik_0 t} + a^\dagger(-\mathbf{k}) e^{ik_0 t} \right] e^{i\mathbf{k}\cdot\mathbf{x}}, \tag{22.102}$$

$$\dot{\phi}(t, \mathbf{x}) = \int \frac{d^3 k}{\sqrt{(2\pi)^3 (2k_0)}} \, (-ik_0) \left[ a(\mathbf{k}) e^{-ik_0 t} - a^\dagger(-\mathbf{k}) e^{ik_0 t} \right] e^{i\mathbf{k}\cdot\mathbf{x}}. \tag{22.103}$$

In (22.102) and (22.103), k has been replaced with −k in the second terms of the RHS. We want to solve these equations in terms of a(k)e^{−ik₀t} and a†(−k)e^{ik₀t}. The calculation procedure is as follows: (1) Multiply both sides of (22.102) and (22.103) by e^{−iq·x} and integrate with respect to x by use of (22.101) to get a factor δ³(k − q). (2) Then, we perform the integration with respect to k to remove k. As a consequence of the integration, k is replaced with q, and k₀ = √(k² + m²) is replaced with √(q² + m²) ≡ q₀ accordingly. In that way, we convert (22.102) and (22.103) into equations for a(q)e^{−iq₀t} and a†(q)e^{iq₀t}. The result is expressed as follows:

$$a(\mathbf{q}) e^{-iq_0 t} = i \int \frac{d^3 x}{\sqrt{(2\pi)^3 (2q_0)}} \left[ \dot{\phi}(t, \mathbf{x}) - iq_0 \phi(t, \mathbf{x}) \right] e^{-i\mathbf{q}\cdot\mathbf{x}},$$

$$a^\dagger(\mathbf{q}) e^{iq_0 t} = -i \int \frac{d^3 x}{\sqrt{(2\pi)^3 (2q_0)}} \left[ \dot{\phi}(t, \mathbf{x}) + iq_0 \phi(t, \mathbf{x}) \right] e^{i\mathbf{q}\cdot\mathbf{x}}. \tag{22.104}$$

Note that in the second equation of (22.104), −q was switched back to q to change a†(−q) to a†(q). (3) Next, we calculate the commutation relation between a(k) and a†(q). For this, the commutator [a(k)e^{−ik₀t}, a†(q)e^{iq₀t}] can directly be calculated using the equal-time commutation relations (22.84). The factors e^{−ik₀t} and e^{iq₀t} are removed in the final stage. The relevant calculations are straightforward and greatly simplified by virtue of the integration over the factor δ³(x − y) that comes from (22.84). The use of (22.101) also eases the computations. As a result, we get a succinct equation described by

$$\left[ a(\mathbf{k}) e^{-ik_0 t}, a^\dagger(\mathbf{q}) e^{iq_0 t} \right] = e^{-i(k_0 - q_0)t} \left[ a(\mathbf{k}), a^\dagger(\mathbf{q}) \right] = \frac{k_0 + q_0}{2\sqrt{k_0 q_0}} \, \delta^3(\mathbf{k} - \mathbf{q}).$$

Since e^{−i(k₀−q₀)t} never vanishes, we have [a(k), a†(q)] = 0 if k ≠ q. If k = q, we obtain k₀ = √(k² + m²) = √(q² + m²) = q₀. Consequently, [a(k), a†(q)] = δ³(k − q). Meanwhile, we have [a(k), a(q)] = [a†(k), a†(q)] = 0 for any values of k and q. Summarizing the above results, for the commutation relations among the annihilation and creation operators we get

$$\left[ a(\mathbf{k}), a^\dagger(\mathbf{q}) \right] = \delta^3(\mathbf{k} - \mathbf{q}), \quad \left[ a(\mathbf{k}), a(\mathbf{q}) \right] = \left[ a^\dagger(\mathbf{k}), a^\dagger(\mathbf{q}) \right] = 0. \tag{22.105}$$

Note that the factor √(2k₀) of (22.97) has been introduced so that (22.105) can be expressed in the simplest form possible. Equation (22.105) is free from the inclusion of space–time coordinates and is, hence, conveniently used for various calculations, e.g., those of the interaction between the quantum fields. Looking at (22.105), we find that we have again reached a result similar to that obtained in Sect. 2.2, expressed as

$$[a, a^\dagger] = 1 \ \text{ and } \ [a, a^\dagger] = E. \tag{2.24}$$

In fact, the characteristics of a(k), a†(q), and a, a† are closely related in that a†(q) and a† function as creation operators and that a(k) and a act as annihilation operators. Thus, the above discussion clearly shows that the quantum-mechanical description of the harmonic oscillators is of fundamental importance in the quantum field theory. The operator a(k) is sometimes given in a simple form such that [8]


$$a(\mathbf{k}) = i\int \frac{d^3x}{\sqrt{(2\pi)^3(2k_0)}}\, e^{ikx}\,\overleftrightarrow{\partial_0}\,\phi, \qquad (22.106)$$

where

$$f\,\overleftrightarrow{\partial_0}\,g \equiv f(\partial_0 g) - (\partial_0 f)g. \qquad (22.107)$$

As already shown in Chap. 2, the use of a(k) and a{(k) enables us to calculate various physical quantities in simple algebraic forms. In the next section, we give the calculation processes for the Hamiltonian of the free Klein–Gordon field. The concept of the Fock space is introduced as well. The results can readily be extended to the calculations regarding, e.g., the Dirac field and photon field.
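The algebraic structure just described (a single-mode analogue of (22.105)) can be illustrated numerically. The following sketch builds truncated matrix representations of the ladder operators of Chap. 2 (the truncation dimension `N` is an arbitrary illustrative choice) and checks that $[a, a^{\dagger}] = 1$ holds on all states below the truncation edge:

```python
import numpy as np

N = 12  # truncation dimension (an arbitrary illustrative choice)
# Annihilation operator in the number basis: a|n> = sqrt(n)|n-1>
a = np.diag(np.sqrt(np.arange(1, N)), k=1)
adag = a.conj().T  # creation operator a-dagger

comm = a @ adag - adag @ a
# [a, a-dagger] equals the identity except on the highest truncated level
print(np.allclose(comm[:-1, :-1], np.eye(N - 1)))  # → True
```

The deviation on the last diagonal entry is a truncation artifact; in the infinite-dimensional Fock space the commutator is exactly the identity.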

22.3.3 Hamiltonian and Fock Space

The Hamiltonian is described in a form analogous to that of the classical theory of analytical mechanics. It is given as the spatial integration of the Hamiltonian density $\mathcal{H}$ defined as

$$\mathcal{H}(x) = \pi(x)\dot{\phi}(x) - \mathcal{L}\left(\phi(x);\, \partial_\mu\phi(x)\right). \qquad (22.108)$$

In our present case, we have $\pi(x) = \dot{\phi}(t, \mathbf{x})$ and from (22.73) we get

$$\mathcal{H}(x) = \frac{1}{2}\left\{\left[\pi(x)\right]^2 + \left[\nabla\phi(x)\right]^2 + \left[m\phi(x)\right]^2\right\}. \qquad (22.109)$$

The Hamiltonian $H$ is then given by

$$H = \int d^3x\, \mathcal{H} = \int d^3x\, \frac{1}{2}\left\{\left[\pi(x)\right]^2 + \left[\nabla\phi(x)\right]^2 + \left[m\phi(x)\right]^2\right\}. \qquad (22.110)$$

The first term of the above integral is


$$\begin{aligned}
\frac{1}{2}\int d^3x\,\left[\pi(t,\mathbf{x})\right]^2 &= \frac{1}{2}\int d^3x\,\left[\dot{\phi}(t,\mathbf{x})\right]^2 \\
&= \frac{1}{2}\int d^3x \int \frac{d^3k}{\sqrt{(2\pi)^3(2k_0)}}\,(-ik_0)\left[a(\mathbf{k})e^{-ik_0 t} - a^{\dagger}(-\mathbf{k})e^{ik_0 t}\right]e^{i\mathbf{k}\cdot\mathbf{x}} \\
&\qquad\times \int \frac{d^3q}{\sqrt{(2\pi)^3(2q_0)}}\,(-iq_0)\left[a(\mathbf{q})e^{-iq_0 t} - a^{\dagger}(-\mathbf{q})e^{iq_0 t}\right]e^{i\mathbf{q}\cdot\mathbf{x}} \\
&= -\frac{1}{4(2\pi)^3}\int d^3x \iint d^3k\,d^3q\,\frac{k_0 q_0}{\sqrt{k_0 q_0}}\left[a(\mathbf{k})a(\mathbf{q})e^{-i(k_0+q_0)t} - a(\mathbf{k})a^{\dagger}(-\mathbf{q})e^{-i(k_0-q_0)t}\right. \\
&\qquad\left.-\, a^{\dagger}(-\mathbf{k})a(\mathbf{q})e^{i(k_0-q_0)t} + a^{\dagger}(-\mathbf{k})a^{\dagger}(-\mathbf{q})e^{i(k_0+q_0)t}\right]e^{i(\mathbf{k}+\mathbf{q})\cdot\mathbf{x}} \\
&\overset{[\dagger]}{=} -\frac{1}{4(2\pi)^3}\iint d^3k\,d^3q\,\frac{k_0 q_0}{\sqrt{k_0 q_0}}\left[\int d^3x\, e^{i(\mathbf{k}+\mathbf{q})\cdot\mathbf{x}}\right]\left[a(\mathbf{k})a(\mathbf{q})e^{-i(k_0+q_0)t} - a(\mathbf{k})a^{\dagger}(-\mathbf{q})e^{-i(k_0-q_0)t}\right. \\
&\qquad\left.-\, a^{\dagger}(-\mathbf{k})a(\mathbf{q})e^{i(k_0-q_0)t} + a^{\dagger}(-\mathbf{k})a^{\dagger}(-\mathbf{q})e^{i(k_0+q_0)t}\right] \\
&\overset{[\dagger\dagger]}{=} -\frac{(2\pi)^3}{4(2\pi)^3}\iint d^3k\,d^3q\,\delta^3(\mathbf{k}+\mathbf{q})\,\frac{k_0 q_0}{\sqrt{k_0 q_0}}\left[a(\mathbf{k})a(\mathbf{q})e^{-i(k_0+q_0)t} - a(\mathbf{k})a^{\dagger}(-\mathbf{q})e^{-i(k_0-q_0)t}\right. \\
&\qquad\left.-\, a^{\dagger}(-\mathbf{k})a(\mathbf{q})e^{i(k_0-q_0)t} + a^{\dagger}(-\mathbf{k})a^{\dagger}(-\mathbf{q})e^{i(k_0+q_0)t}\right] \\
&= -\frac{1}{4}\int d^3k\,\frac{k_0 q_0}{\sqrt{k_0 q_0}}\left[a(\mathbf{k})a(-\mathbf{k})e^{-i(k_0+q_0)t} - a(\mathbf{k})a^{\dagger}(\mathbf{k})e^{-i(k_0-q_0)t}\right. \\
&\qquad\left.-\, a^{\dagger}(-\mathbf{k})a(-\mathbf{k})e^{i(k_0-q_0)t} + a^{\dagger}(-\mathbf{k})a^{\dagger}(\mathbf{k})e^{i(k_0+q_0)t}\right] \\
&= \frac{1}{4}\int d^3k\,\frac{k_0 q_0}{\sqrt{k_0 q_0}}\left[-a(\mathbf{k})a(-\mathbf{k})e^{-i(k_0+q_0)t} + a(\mathbf{k})a^{\dagger}(\mathbf{k})e^{-i(k_0-q_0)t}\right. \\
&\qquad\left.+\, a^{\dagger}(-\mathbf{k})a(-\mathbf{k})e^{i(k_0-q_0)t} - a^{\dagger}(-\mathbf{k})a^{\dagger}(\mathbf{k})e^{i(k_0+q_0)t}\right] \\
&\overset{[\dagger\dagger\dagger]}{=} \frac{1}{4}\int d^3k\,k_0\left[-a(\mathbf{k})a(-\mathbf{k})e^{-2ik_0 t} + a(\mathbf{k})a^{\dagger}(\mathbf{k}) + a^{\dagger}(\mathbf{k})a(\mathbf{k}) - a^{\dagger}(-\mathbf{k})a^{\dagger}(\mathbf{k})e^{2ik_0 t}\right]. \qquad (22.111)
\end{aligned}$$


In the above somewhat lengthy calculations, several points should be noted. Namely, at $[\dagger]$ we exchanged the order of the integrations over $d^3x$ and $d^3k\,d^3q$ with respect to the factor $e^{i(\mathbf{k}+\mathbf{q})\cdot\mathbf{x}}$. This exchange allowed us to use (22.101) at $[\dagger\dagger]$. At $[\dagger\dagger\dagger]$ we switched $-\mathbf{k}$ to $\mathbf{k}$ with regard to the factor $a^{\dagger}(-\mathbf{k})a(-\mathbf{k})$. Also, we used $k_0 = q_0$; since $k_0 = \sqrt{\mathbf{k}^2+m^2}$ and $q_0 = \sqrt{\mathbf{q}^2+m^2}$, we have $k_0 = q_0$ in the case of $\mathbf{k}+\mathbf{q} = 0$, which comes from the factor $\delta^3(\mathbf{k}+\mathbf{q})$ appearing at $[\dagger\dagger]$. The second and third terms of (22.110) can readily be calculated in a similar manner. We describe the results as follows:

$$\frac{1}{2}\int d^3x\,\left[\nabla\phi(x)\right]^2 = \frac{1}{4}\int \frac{d^3k}{\sqrt{k_0 q_0}}\,\mathbf{k}^2\left[+a(\mathbf{k})a(-\mathbf{k})e^{-2ik_0 t} + a(\mathbf{k})a^{\dagger}(\mathbf{k}) + a^{\dagger}(\mathbf{k})a(\mathbf{k}) + a^{\dagger}(-\mathbf{k})a^{\dagger}(\mathbf{k})e^{2ik_0 t}\right], \qquad (22.112)$$

$$\frac{1}{2}\int d^3x\,\left[m\phi(x)\right]^2 = \frac{1}{4}\int \frac{d^3k}{\sqrt{k_0 q_0}}\,m^2\left[+a(\mathbf{k})a(-\mathbf{k})e^{-2ik_0 t} + a(\mathbf{k})a^{\dagger}(\mathbf{k}) + a^{\dagger}(\mathbf{k})a(\mathbf{k}) + a^{\dagger}(-\mathbf{k})a^{\dagger}(\mathbf{k})e^{2ik_0 t}\right]. \qquad (22.113)$$

Combining (22.112) and (22.113), we obtain

$$\begin{aligned}
\frac{1}{2}\int d^3x\,\left\{\left[\nabla\phi(x)\right]^2 + \left[m\phi(x)\right]^2\right\} &= \frac{1}{4}\int d^3k\,\frac{\mathbf{k}^2+m^2}{\sqrt{k_0 q_0}}\left[+a(\mathbf{k})a(-\mathbf{k})e^{-2ik_0 t} + a(\mathbf{k})a^{\dagger}(\mathbf{k}) + a^{\dagger}(\mathbf{k})a(\mathbf{k}) + a^{\dagger}(-\mathbf{k})a^{\dagger}(\mathbf{k})e^{2ik_0 t}\right] \\
&= \frac{1}{4}\int d^3k\,k_0\left[+a(\mathbf{k})a(-\mathbf{k})e^{-2ik_0 t} + a(\mathbf{k})a^{\dagger}(\mathbf{k}) + a^{\dagger}(\mathbf{k})a(\mathbf{k}) + a^{\dagger}(-\mathbf{k})a^{\dagger}(\mathbf{k})e^{2ik_0 t}\right], \qquad (22.114)
\end{aligned}$$

where we used $\mathbf{k}^2 + m^2 = k_0^2 = q_0^2$. The integrals including the terms $\mp a(\mathbf{k})a(-\mathbf{k})e^{-2ik_0 t}$ and $\mp a^{\dagger}(-\mathbf{k})a^{\dagger}(\mathbf{k})e^{2ik_0 t}$ in (22.111) and (22.114) cancel out. Taking account of these results, finally we get the Hamiltonian in a simple form described by


$$\begin{aligned}
H &= \frac{1}{2}\int d^3k\,k_0\left[a(\mathbf{k})a^{\dagger}(\mathbf{k}) + a^{\dagger}(\mathbf{k})a(\mathbf{k})\right] \\
&= \frac{1}{2}\int d^3k\,\sqrt{\mathbf{k}^2+m^2}\left[a(\mathbf{k})a^{\dagger}(\mathbf{k}) + a^{\dagger}(\mathbf{k})a(\mathbf{k})\right], \qquad (22.115)
\end{aligned}$$

where we explicitly described the dependence of the integrand on $\mathbf{k}$. Let us continue the discussion further. Using the commutation relation (22.105), we have

$$\lim_{\boldsymbol{\epsilon}\to 0}\left[a(\mathbf{k}), a^{\dagger}(\mathbf{k}-\boldsymbol{\epsilon})\right] = \lim_{\boldsymbol{\epsilon}\to 0}\delta^3(\boldsymbol{\epsilon}),$$

where $\boldsymbol{\epsilon}$ is an infinitesimal three-dimensional real vector. Then, we have

$$H = \frac{1}{2}\int d^3k\,k_0\left[a^{\dagger}(\mathbf{k})a(\mathbf{k}) + a^{\dagger}(\mathbf{k})a(\mathbf{k}) + \lim_{\boldsymbol{\epsilon}\to 0}\delta^3(\boldsymbol{\epsilon})\right]. \qquad (22.116)$$

From (22.101), we obtain

$$\lim_{\boldsymbol{\epsilon}\to 0}\delta^3(\boldsymbol{\epsilon}) = \frac{1}{(2\pi)^3}\lim_{\boldsymbol{\epsilon}\to 0}\int e^{i\boldsymbol{\epsilon}\cdot\mathbf{x}}\,d^3x \to \frac{1}{(2\pi)^3}\int d^3x.$$

Therefore, we get

$$H = \int d^3k\,k_0\,a^{\dagger}(\mathbf{k})a(\mathbf{k}) + \frac{1}{2(2\pi)^3}\int k_0\,d^3k\int d^3x. \qquad (22.117)$$

Defining the number density operator $n(\mathbf{k})$ as

$$n(\mathbf{k}) \equiv a^{\dagger}(\mathbf{k})a(\mathbf{k}), \qquad (22.118)$$

we have

$$H = \int d^3k\,k_0\,n(\mathbf{k}) + \frac{1}{(2\pi)^3}\int d^3k\,d^3x\,\frac{1}{2}k_0. \qquad (22.119)$$

The second term of (22.119) represents the total energy contained in the whole phase space. In other words, the quantity $k_0/2$ represents the energy density per "volume" $(2\pi)^3$ of phase space. Note here that in the natural units the dimension of time and length is the inverse of mass (mass dimension). The angular frequency and wavenumber as well as the energy and momentum have the same dimension of mass accordingly. This is represented directly in (22.88). Representing (2.21) in the natural units, we have


$$H = \omega a^{\dagger}a + \frac{1}{2}\omega. \qquad (22.120)$$

The analogy between (22.117) and (22.120) is evident. The operator $n(\mathbf{k})$ is closely related to the number operator introduced in Sect. 2.2; see (2.58). The second term of (22.117) corresponds to the zero-point energy and has relevance to the discussion about the harmonic oscillator of Chap. 2. Thus, we appreciate how deeply the quantization of fields is connected to the treatment of the harmonic oscillators. In accordance with (2.28) of Sect. 2.2, we define the state of vacuum $|0\rangle$ such that

$$a(\mathbf{k})\,|0\rangle = 0 \quad \text{for } \forall\,\mathbf{k}. \qquad (22.121)$$
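The single-mode statements (22.120) and (22.121) can be illustrated with the truncated ladder matrices of Chap. 2; in this sketch the frequency and truncation dimension are arbitrary illustrative choices:

```python
import numpy as np

dim, omega = 10, 1.5   # truncation dimension and mode frequency (illustrative)
a = np.diag(np.sqrt(np.arange(1, dim)), k=1)
adag = a.conj().T

# Symmetrized single-mode Hamiltonian, cf. H = (1/2) omega (a a-dag + a-dag a)
H = 0.5 * omega * (a @ adag + adag @ a)

vac = np.zeros(dim); vac[0] = 1.0              # |0>, annihilated by a
print(np.allclose(a @ vac, 0.0))               # a|0> = 0, → True
print(np.isclose(vac @ H @ vac, 0.5 * omega))  # <0|H|0> = omega/2, → True
```

The vacuum expectation value reproduces the zero-point energy $\omega/2$, the single-mode counterpart of the divergent second term of (22.119).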

Operating (22.119) on $|0\rangle$, we have

$$\begin{aligned}
H\,|0\rangle &= \int d^3k\,k_0\,n(\mathbf{k})\,|0\rangle + \frac{1}{(2\pi)^3}\int d^3k\,d^3x\,\frac{1}{2}k_0\,|0\rangle \\
&= \frac{1}{(2\pi)^3}\int d^3k\,d^3x\,\frac{1}{2}k_0\,|0\rangle. \qquad (22.122)
\end{aligned}$$

That is, $|0\rangle$ seems to be an "eigenstate" of $H$ that belongs to an "eigenvalue" of $\frac{1}{(2\pi)^3}\int d^3k\,d^3x\,\frac{1}{2}k_0$. Since this value diverges to infinity, it is usually ignored and we plainly state that the energy of $|0\rangle$ (i.e., vacuum) is zero. To discuss the implication of (22.119) and (22.122) is beyond the scope of this book and, hence, we do not get into further details about this issue. In contrast to the above argument, the following momentum operator $P^\mu$ has a well-defined meaning:

$$P^\mu \equiv \int d^3k\,k^\mu\,n(\mathbf{k}) + \frac{1}{(2\pi)^3}\int d^3k\,d^3x\,\frac{1}{2}k^\mu \quad (\mu = 1, 2, 3). \qquad (22.123)$$

The second term vanishes, because $k^\mu$ ($\mu = 1, 2, 3$) is an odd function with respect to $k^\mu$. Then, we get

$$P^\mu = \int d^3k\,k^\mu\,n(\mathbf{k}) \quad (\mu = 1, 2, 3). \qquad (22.124)$$

Notice from (22.119) and (22.124) that, if we ignore the second term of (22.119) and the index $\mu$ is extended to 0, we have $P^0 \equiv H$. Thus, we reformulate $P^\mu$ as a four-vector such that

$$P^\mu = \int d^3k\,k^\mu\,n(\mathbf{k}) \quad (\mu = 0, 1, 2, 3). \qquad (22.125)$$


Meanwhile, using the creation operators $a^{\dagger}(\mathbf{k})$, let us consider the following states:

$$|k_1, k_2, \cdots, k_n\rangle \equiv a^{\dagger}(\mathbf{k}_1)a^{\dagger}(\mathbf{k}_2)\cdots a^{\dagger}(\mathbf{k}_n)\,|0\rangle, \qquad (22.126)$$

where the freedom of the dynamical system is specified by an integer $i$ ($i = 1, \cdots, n$); $k_i$ denotes the four-momentum of a particle that is present in the $i$th state. Operating $a^{\dagger}(\mathbf{q})a(\mathbf{q})$ on both sides of (22.126) from the left, we have

$$\begin{aligned}
a^{\dagger}(\mathbf{q})a(\mathbf{q})\,|k_1, k_2, \cdots, k_n\rangle &= a^{\dagger}(\mathbf{q})a(\mathbf{q})a^{\dagger}(\mathbf{k}_1)a^{\dagger}(\mathbf{k}_2)\cdots a^{\dagger}(\mathbf{k}_n)\,|0\rangle \\
&= a^{\dagger}(\mathbf{q})\left[a^{\dagger}(\mathbf{k}_1)a(\mathbf{q}) + \delta^3(\mathbf{q}-\mathbf{k}_1)\right]a^{\dagger}(\mathbf{k}_2)\cdots a^{\dagger}(\mathbf{k}_n)\,|0\rangle \\
&= a^{\dagger}(\mathbf{q})a^{\dagger}(\mathbf{k}_1)a(\mathbf{q})a^{\dagger}(\mathbf{k}_2)\cdots a^{\dagger}(\mathbf{k}_n)\,|0\rangle + \delta^3(\mathbf{q}-\mathbf{k}_1)a^{\dagger}(\mathbf{q})a^{\dagger}(\mathbf{k}_2)\cdots a^{\dagger}(\mathbf{k}_n)\,|0\rangle \\
&= a^{\dagger}(\mathbf{q})a^{\dagger}(\mathbf{k}_1)\left[a^{\dagger}(\mathbf{k}_2)a(\mathbf{q}) + \delta^3(\mathbf{q}-\mathbf{k}_2)\right]\cdots a^{\dagger}(\mathbf{k}_n)\,|0\rangle + \delta^3(\mathbf{q}-\mathbf{k}_1)a^{\dagger}(\mathbf{q})a^{\dagger}(\mathbf{k}_2)\cdots a^{\dagger}(\mathbf{k}_n)\,|0\rangle \\
&= a^{\dagger}(\mathbf{q})a^{\dagger}(\mathbf{k}_1)a^{\dagger}(\mathbf{k}_2)a(\mathbf{q})\cdots a^{\dagger}(\mathbf{k}_n)\,|0\rangle + \delta^3(\mathbf{q}-\mathbf{k}_2)a^{\dagger}(\mathbf{q})a^{\dagger}(\mathbf{k}_1)\cdots a^{\dagger}(\mathbf{k}_n)\,|0\rangle \\
&\qquad + \delta^3(\mathbf{q}-\mathbf{k}_1)a^{\dagger}(\mathbf{q})a^{\dagger}(\mathbf{k}_2)\cdots a^{\dagger}(\mathbf{k}_n)\,|0\rangle \\
&= \cdots \\
&= a^{\dagger}(\mathbf{q})a^{\dagger}(\mathbf{k}_1)a^{\dagger}(\mathbf{k}_2)\cdots a^{\dagger}(\mathbf{k}_n)a(\mathbf{q})\,|0\rangle + \delta^3(\mathbf{q}-\mathbf{k}_n)a^{\dagger}(\mathbf{q})a^{\dagger}(\mathbf{k}_1)\cdots a^{\dagger}(\mathbf{k}_{n-1})\,|0\rangle \\
&\qquad + \cdots + \delta^3(\mathbf{q}-\mathbf{k}_1)a^{\dagger}(\mathbf{q})a^{\dagger}(\mathbf{k}_2)\cdots a^{\dagger}(\mathbf{k}_n)\,|0\rangle \\
&= \delta^3(\mathbf{q}-\mathbf{k}_n)a^{\dagger}(\mathbf{q})a^{\dagger}(\mathbf{k}_1)\cdots a^{\dagger}(\mathbf{k}_{n-1})\,|0\rangle + \cdots + \delta^3(\mathbf{q}-\mathbf{k}_1)a^{\dagger}(\mathbf{q})a^{\dagger}(\mathbf{k}_2)\cdots a^{\dagger}(\mathbf{k}_n)\,|0\rangle, \qquad (22.127)
\end{aligned}$$

where at the second to the last equality the first term vanishes because from (22.121) we have $a(\mathbf{q})\,|0\rangle = 0$; at the last equality $n$ terms containing the $\delta^3(\mathbf{q}-\mathbf{k}_i)$ ($i = 1, \cdots, n$) factor survive. Thus, from (22.118), (22.125), and (22.127) we get

$$\begin{aligned}
P^\mu\,|k_1, k_2, \cdots, k_n\rangle &= \int d^3q\,q^\mu\,a^{\dagger}(\mathbf{q})a(\mathbf{q})\,|k_1, k_2, \cdots, k_n\rangle \\
&= \int d^3q\,q^\mu\left[\delta^3(\mathbf{q}-\mathbf{k}_n)a^{\dagger}(\mathbf{q})a^{\dagger}(\mathbf{k}_1)\cdots a^{\dagger}(\mathbf{k}_{n-1})\,|0\rangle + \cdots + \delta^3(\mathbf{q}-\mathbf{k}_1)a^{\dagger}(\mathbf{q})a^{\dagger}(\mathbf{k}_2)\cdots a^{\dagger}(\mathbf{k}_n)\,|0\rangle\right] \\
&= k_n^\mu\,a^{\dagger}(\mathbf{k}_n)a^{\dagger}(\mathbf{k}_1)\cdots a^{\dagger}(\mathbf{k}_{n-1})\,|0\rangle + \cdots + k_1^\mu\,a^{\dagger}(\mathbf{k}_1)a^{\dagger}(\mathbf{k}_2)\cdots a^{\dagger}(\mathbf{k}_n)\,|0\rangle \\
&= \left(k_1^\mu + \cdots + k_n^\mu\right)a^{\dagger}(\mathbf{k}_1)a^{\dagger}(\mathbf{k}_2)\cdots a^{\dagger}(\mathbf{k}_n)\,|0\rangle \\
&= \left(k_1^\mu + \cdots + k_n^\mu\right)|k_1, k_2, \cdots, k_n\rangle, \qquad (22.128)
\end{aligned}$$

where at the second to the last equality we used the second equation of (22.105); at the last equality we used the definition (22.126). With respect to the energy of the $j$th state, we have

$$k_j^0 \equiv \sqrt{\mathbf{k}_j^2 + m^2}\ \ (>0). \qquad (22.129)$$

Therefore, once $\mathbf{k}_j$ ($j = 1, \cdots, n$) is determined from (22.128), $k_j^0$ is also determined from (22.129) and, hence, $k_j^\mu$ ($\mu = 0, 1, 2, 3$) is determined accordingly. Thus, we obtain [8]

$$P^\mu\,|k_1, k_2, \cdots, k_n\rangle = \left(k_1^\mu + \cdots + k_n^\mu\right)|k_1, k_2, \cdots, k_n\rangle \quad (\mu = 0, 1, 2, 3). \qquad (22.130)$$

From (22.130), we find that $|k_1, k_2, \cdots, k_n\rangle$ is the eigenstate of the four-momentum operator $P^\mu$ ($\mu = 0, 1, 2, 3$) that belongs to the eigenvalue $k_1^\mu + \cdots + k_n^\mu$. The state described by $|k_1, k_2, \cdots, k_n\rangle$ is called a Fock basis. A Hilbert space spanned by the Fock basis is called a Fock space. As in the case of Chap. 2, the operator $a^{\dagger}(\mathbf{k})$ is interpreted as an operator that creates a particle having momentum $\mathbf{k}$. The calculation procedures of (22.127) naturally lead to the normal product (or normal-ordered product), where the annihilation operators $a(\mathbf{k})$ always stand to the right of the creation operators $a^{\dagger}(\mathbf{k})$. This is because, if so, any term that contains $a(\mathbf{k})\,|0\rangle$ as a factor vanishes by virtue of (22.121). This situation is reminiscent of (2.28), where $a|\psi_0\rangle = 0$ corresponds to $a(\mathbf{k})\,|0\rangle = 0$.
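The eigenvalue relation (22.130) can be mimicked with a toy model: two discrete modes (a discretized stand-in for the continuum of momenta, with illustrative momentum values) built as a tensor product of truncated oscillators. The two-particle state $a^{\dagger}(k_1)a^{\dagger}(k_2)|0\rangle$ is then an eigenstate of $P = \sum_i k_i\, n(k_i)$ with eigenvalue $k_1 + k_2$:

```python
import numpy as np

k1, k2 = 0.7, 1.3   # illustrative mode momenta
d = 4               # truncation dimension per mode
a = np.diag(np.sqrt(np.arange(1, d)), k=1)
I = np.eye(d)
a1, a2 = np.kron(a, I), np.kron(I, a)   # mode operators on the product space

vac = np.zeros(d * d); vac[0] = 1.0     # |0, 0>
state = a1.T @ a2.T @ vac               # a-dag(k1) a-dag(k2)|0>

P = k1 * a1.T @ a1 + k2 * a2.T @ a2     # discrete analogue of (22.125)
print(np.allclose(P @ state, (k1 + k2) * state))  # → True
```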

22.3.4 Invariant Delta Functions of the Scalar Field

We have already mentioned that the Lorentz invariance (or covariance) is not necessarily clear with the equal-time commutation relation. In this section we describe the characteristics of the invariant delta functions of the scalar field to explicitly show the Lorentz invariance of these functions. We deal with the invariant delta functions of the scalar field in detail, since their usage offers a typical example as a prototype of the field quantization and, hence, can be applied to the Dirac field and photon field as well. The invariant delta function is defined using a commutator as

$$i\Delta(x-y) \equiv \left[\phi(x), \phi(y)\right], \qquad (22.131)$$

where $x$ and $y$ represent arbitrary space–time four-coordinates such that $x = (t, \mathbf{x})$ and $y = (t', \mathbf{y})$. Several different invariant delta functions will appear soon. Since their definition and notation differ depending on the literature, care should be taken. Note that although we first imposed the condition $t = t'$ on the commutation relation, it is not the case with the present consideration. Since the invariant delta function $\Delta(x-y)$ is a scalar function as we shall see below, we must have a Lorentz


invariant form for it. One of the major purposes of this section is to represent $\Delta(x-y)$ and related functions in the Lorentz invariant form. Dividing $\phi(x)$ and $\phi(y)$ into the positive-frequency part and negative-frequency part such that [5]

$$\phi(x) = \phi^+(x) + \phi^-(x), \qquad \phi(y) = \phi^+(y) + \phi^-(y), \qquad (22.132)$$

we have

$$\left[\phi(x), \phi(y)\right] = \left[\phi^+(x), \phi^+(y)\right] + \left[\phi^+(x), \phi^-(y)\right] + \left[\phi^-(x), \phi^+(y)\right] + \left[\phi^-(x), \phi^-(y)\right]. \qquad (22.133)$$

From the expansion formulae of (22.99) as well as the second equation of (22.105), we find that

$$\left[\phi^+(x), \phi^+(y)\right] = \left[\phi^-(x), \phi^-(y)\right] = 0. \qquad (22.134)$$

Then, we have

$$\left[\phi(x), \phi(y)\right] = \left[\phi^+(x), \phi^-(y)\right] + \left[\phi^-(x), \phi^+(y)\right]. \qquad (22.135)$$

Next, applying the first equation of (22.105) to the commutator consisting of (22.99), we obtain

$$\begin{aligned}
\left[\phi^+(x), \phi^-(y)\right] &= \iint \frac{d^3k\,d^3q}{(2\pi)^3\,2\sqrt{k_0 q_0}}\left[a(\mathbf{k})e^{-ikx}, a^{\dagger}(\mathbf{q})e^{iqy}\right] \\
&= \iint \frac{d^3k\,d^3q}{(2\pi)^3\,2\sqrt{k_0 q_0}}\,e^{-i(kx-qy)}\left[a(\mathbf{k}), a^{\dagger}(\mathbf{q})\right] = \iint \frac{d^3k\,d^3q}{(2\pi)^3\,2\sqrt{k_0 q_0}}\,e^{-i(kx-qy)}\,\delta^3(\mathbf{k}-\mathbf{q}) \\
&= \frac{1}{(2\pi)^3}\int \frac{d^3k}{2k_0}\,e^{-ik(x-y)}, \qquad (22.136)
\end{aligned}$$

where with the third equality we used the first equation of (22.105). Also, in (22.136) we used the fact that since $k_0 = \sqrt{\mathbf{k}^2+m^2}$ and $q_0 = \sqrt{\mathbf{q}^2+m^2}$ with $\mathbf{k} = \mathbf{q}$, we have $k_0 = q_0$ accordingly (vide supra). We define one of the invariant delta functions, i.e., $\Delta^+(x-y)$, as

$$i\Delta^+(x-y) \equiv \left[\phi^+(x), \phi^-(y)\right] = \frac{1}{(2\pi)^3}\int \frac{d^3k}{2k_0}\,e^{-ik(x-y)}. \qquad (22.137)$$

Exchanging the coordinates $x$ and $y$ in (22.137) and reversing the order of the operators in the commutator, we get the following expression, where another invariant delta function $\Delta^-(x-y)$ is defined as [5]

$$i\Delta^-(x-y) \equiv \left[\phi^-(x), \phi^+(y)\right] = -i\Delta^+(y-x) = -\frac{1}{(2\pi)^3}\int \frac{d^3k}{2k_0}\,e^{-ik(y-x)}. \qquad (22.138)$$

From (22.131), (22.135), (22.137), and (22.138), we have

$$\Delta(x-y) = \Delta^+(x-y) + \Delta^-(x-y). \qquad (22.139)$$

From (22.137) and (22.138), we can readily get the following expressions described by

$$\left(\Box_x + m^2\right)\Delta^+(x-y) = \left(\Box_x + m^2\right)\Delta^-(x-y) = 0, \qquad \left(\Box_x + m^2\right)\Delta(x-y) = 0, \qquad (22.140)$$

where $\Box_x$ means that the differentiation should be performed with respect to $x$. Note that upon defining the invariant delta functions, the notations and equalities vary from literature to literature. In this book, to avoid the complexity of the notation for "contraction," we have chosen (22.137) and (22.138) for the definitions of the invariant delta functions. The contraction is frequently used for the perturbation calculation, which will appear later in Chap. 23. The contraction is a convenient notation to describe a time-ordered product (or T-product) that is defined as $\langle 0|T[\phi(x)\phi(y)]|0\rangle$ (see Sect. 22.3.5). The time-ordered product is represented by the following symbol:

$$\langle 0|T[\phi(x)\phi(y)]|0\rangle \equiv \overbrace{\phi(x)\phi(y)}, \qquad (22.141)$$

where the bracket denotes the contraction of the two field operators. Equation (22.141) is referred to as the Feynman propagator and the discussion about it will be made in the next section. To show that (22.137) is explicitly Lorentz invariant, we wish to convert the integral with respect to $\mathbf{k}$ into the integral with respect to $k = (k^0, \mathbf{k})$. For this purpose, we use the $\delta$ function. We have


$$\delta\left(k^2 - m^2\right) = \delta\left(k_0^2 - \left(\mathbf{k}^2 + m^2\right)\right) = \frac{1}{2\sqrt{\mathbf{k}^2+m^2}}\left[\delta\left(k_0 - \sqrt{\mathbf{k}^2+m^2}\right) + \delta\left(k_0 + \sqrt{\mathbf{k}^2+m^2}\right)\right]. \qquad (22.142)$$

This relation results from the following property of the $\delta$ function [10]. We have

$$\delta[f(x)] = \sum_n \frac{\delta(x - a_n)}{|f'(a_n)|}, \qquad (22.143)$$

where $f(a_n) = 0$, $f'(a_n) \ne 0$. As a special case, we obtain [4]

$$\delta\left(x^2 - a^2\right) = \frac{1}{2a}\left[\delta(x-a) + \delta(x+a)\right] \quad (a > 0). \qquad (22.144)$$
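The identity (22.144) can be checked numerically by replacing the $\delta$ function with a narrow Gaussian (a nascent $\delta$ function); the width, grid, and test function below are arbitrary choices for this sketch:

```python
import numpy as np

def delta_eps(u, eps=1e-3):
    """Narrow Gaussian as a nascent delta function."""
    return np.exp(-u**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))

a = 2.0
g = lambda x: np.cos(x) + x**2          # arbitrary smooth test function
x = np.linspace(-10.0, 10.0, 2_000_001)
dx = x[1] - x[0]

# integral of delta(x^2 - a^2) g(x) dx vs. [g(a) + g(-a)]/(2a), cf. (22.144)
lhs = np.sum(delta_eps(x**2 - a**2) * g(x)) * dx
rhs = (g(a) + g(-a)) / (2 * a)
print(abs(lhs - rhs) < 1e-3)  # → True
```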

Replacing $x$ with $k_0$ and $a$ with $\sqrt{\mathbf{k}^2+m^2}$, we get (22.142). Taking account of $k_0 > 0$ in (22.136), we wish to pick up the positive value in (22.142). Multiplying both sides of (22.142) by $\theta(k_0)$, we get

$$\delta\left(k^2 - m^2\right)\theta(k_0) = \frac{1}{2\sqrt{\mathbf{k}^2+m^2}}\left[\delta\left(k_0 - \sqrt{\mathbf{k}^2+m^2}\right)\theta(k_0) + \delta\left(k_0 + \sqrt{\mathbf{k}^2+m^2}\right)\theta(k_0)\right]. \qquad (22.145)$$

Since the argument of the $\delta$ function in the second term of (22.145) is positive wherever $\theta(k_0) \ne 0$, that term vanishes and we are left with

$$\delta\left(k^2 - m^2\right)\theta(k_0) = \frac{1}{2\sqrt{\mathbf{k}^2+m^2}}\,\delta\left(k_0 - \sqrt{\mathbf{k}^2+m^2}\right). \qquad (22.146)$$

Multiplying both sides of (22.146) by an arbitrary function $f(k_0)$, we obtain

$$\begin{aligned}
\delta\left(k^2 - m^2\right)\theta(k_0)f(k_0) &= \frac{1}{2\sqrt{\mathbf{k}^2+m^2}}\,\delta\left(k_0 - \sqrt{\mathbf{k}^2+m^2}\right)f(k_0) \\
&= \frac{1}{2\sqrt{\mathbf{k}^2+m^2}}\,\delta\left(k_0 - \sqrt{\mathbf{k}^2+m^2}\right)f\left(\sqrt{\mathbf{k}^2+m^2}\right), \qquad (22.147)
\end{aligned}$$

where with the second equality we used the following relation [11]:


$$f(x)\,\delta(x-a) = f(a)\,\delta(x-a). \qquad (22.148)$$

Integrating (22.147) with respect to $k_0$, we obtain

$$\begin{aligned}
\int dk_0\,\delta\left(k^2 - m^2\right)\theta(k_0)f(k_0) &= \int dk_0\,\frac{1}{2\sqrt{\mathbf{k}^2+m^2}}\,\delta\left(k_0 - \sqrt{\mathbf{k}^2+m^2}\right)f\left(\sqrt{\mathbf{k}^2+m^2}\right) \\
&= \frac{1}{2\sqrt{\mathbf{k}^2+m^2}}\,f\left(\sqrt{\mathbf{k}^2+m^2}\right). \qquad (22.149)
\end{aligned}$$

Multiplying both sides of (22.149) by $g(\mathbf{k})$ and further integrating with respect to $\mathbf{k}$, we get

$$\int d^3k\,dk_0\,\delta\left(k^2 - m^2\right)\theta(k_0)f(k_0)g(\mathbf{k}) = \int d^4k\,\delta\left(k^2 - m^2\right)\theta(k_0)f(k_0)g(\mathbf{k}) = \int \frac{d^3k}{2\sqrt{\mathbf{k}^2+m^2}}\,f\left(\sqrt{\mathbf{k}^2+m^2}\right)g(\mathbf{k}). \qquad (22.150)$$

We wish to use (22.150) to show that (22.137) is explicitly Lorentz invariant. To this end, we put

$$f(k_0) \equiv e^{-ik_0(x^0-y^0)} \quad \text{and} \quad g(\mathbf{k}) \equiv e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}. \qquad (22.151)$$

Under the condition of $k_0 = \sqrt{\mathbf{k}^2+m^2}$, in the RHS of (22.150) we have

$$\frac{1}{2k_0}\,f\left(\sqrt{\mathbf{k}^2+m^2}\right)g(\mathbf{k}) = \frac{1}{2k_0}\,e^{-ik_0(x^0-y^0)}e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})} = \frac{1}{2k_0}\,e^{-ik(x-y)}.$$

Applying (22.150) to (22.137), we get

$$i\Delta^+(x-y) = \frac{1}{(2\pi)^3}\int \frac{d^3k}{2k_0}\,e^{-ik(x-y)} = \frac{1}{(2\pi)^3}\int d^4k\,\delta\left(k^2 - m^2\right)\theta(k_0)\,e^{-ik(x-y)}. \qquad (22.152)$$

Note that with the first equality of (22.152) the factor $e^{-ik(x-y)}$ is given by


$$e^{-ik(x-y)} = e^{-i\sqrt{\mathbf{k}^2+m^2}\,(x^0-y^0)}\,e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}, \qquad (22.153)$$

with $k_0$ replaced with $\sqrt{\mathbf{k}^2+m^2}$. With the second equality of (22.152), however, $k_0$ is changed to an explicit integration variable. Also, notice that in spite of the above change in $k_0$, the function $e^{-ik(x-y)}$ is represented in the same form throughout (22.152). The important point is that the positive number $\sqrt{\mathbf{k}^2+m^2}$ does not appear explicitly in the LHS of (22.145), but that number is absorbed into the Lorentz invariant form of $\delta(k^2-m^2)\theta(k_0)$. Thus, (22.137) has successfully been converted into the explicitly Lorentz invariant form. In a similar way, for $\Delta^-(x-y)$ we obtain another Lorentz invariant form described by

$$i\Delta^-(x-y) = \frac{-1}{(2\pi)^3}\int d^4k\,\delta\left(k^2 - m^2\right)\theta(k_0)\,e^{ik(x-y)}. \qquad (22.154)$$

From (22.137) to (22.139) as well as (22.152) and (22.154), we have

$$\begin{aligned}
i\Delta(x-y) &= i\Delta^+(x-y) + i\Delta^-(x-y) = \left[\phi^+(x), \phi^-(y)\right] + \left[\phi^-(x), \phi^+(y)\right] \\
&= \frac{1}{(2\pi)^3}\int d^4k\,\delta\left(k^2 - m^2\right)\theta(k_0)\left[e^{-ik(x-y)} - e^{ik(x-y)}\right] \\
&= \frac{-2i}{(2\pi)^3}\int d^4k\,\delta\left(k^2 - m^2\right)\theta(k_0)\,\sin k(x-y). \qquad (22.155)
\end{aligned}$$

Note that as a representation equivalent to (22.155), we have

$$i\Delta(x-y) = \frac{1}{(2\pi)^3}\int d^4k\,\delta\left(k^2 - m^2\right)\varepsilon(k_0)\,e^{-ik(x-y)}, \qquad (22.156)$$

where $\varepsilon(x)$ is defined by

$$\varepsilon(x) \equiv x/|x|, \quad \varepsilon(0) \equiv 0 \quad (x:\ \text{real}). \qquad (22.157)$$

To show that (22.155) and (22.156) are identical, use the fact that $\varepsilon(x)$ is an odd function with respect to $x$ that is expressed as

$$\varepsilon(x) = \theta(x) - \theta(-x). \qquad (22.158)$$
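Relation (22.158) is easy to verify numerically; note that at $x = 0$ the equality $\varepsilon(0) = 0$ requires the convention $\theta(0) = 1/2$, which is what `np.heaviside(·, 0.5)` uses in this sketch:

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 601)          # includes x = 0
eps_fn = np.sign(x)                      # epsilon(x) = x/|x|, epsilon(0) = 0
theta_diff = np.heaviside(x, 0.5) - np.heaviside(-x, 0.5)
print(np.allclose(eps_fn, theta_diff))   # → True
```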

Thus, we have shown that all the above functions Δ+(x - y), Δ-(x - y), and Δ(x - y) are Lorentz invariant. On account of the Lorentz invariance, these functions are said to be invariant delta functions. Since the invariant delta functions are given


Fig. 22.1 Contours to evaluate the complex integrations with respect to the invariant delta functions $\Delta^\pm(x-y)$ and $\Delta(x-y)$ of the scalar field. The contours $C_+$, $C_-$, and $C$ correspond to $\Delta^+(x-y)$, $\Delta^-(x-y)$, and $\Delta(x-y)$, respectively

as a scalar, they are of special importance in the quantum field theory. This situation is the same with the Dirac field and photon field as well. In (22.152) and (22.154), we assume that the integration range spans the whole real numbers from $-\infty$ to $+\infty$. As another way to represent the invariant delta functions in the Lorentz invariant form, we may choose a complex variable for $k_0$. The contour integration method developed in Chap. 6 is useful for the present purpose. In Fig. 22.1, we draw several contours to evaluate the complex integrations with respect to the invariant delta functions. With these contours, the invariant delta functions are represented as [5]

$$\Delta^\pm(x-y) = \frac{1}{(2\pi)^4}\int_{C_\pm} d^4k\,\frac{e^{-ik(x-y)}}{m^2 - k^2}. \qquad (22.159)$$

To examine whether (22.159) is equivalent to (22.137) or (22.138), we rewrite the denominator of (22.159) as

$$m^2 - k^2 = m^2 + \mathbf{k}^2 - k_0^2. \qquad (22.160)$$

Further defining a positive number $K \equiv \sqrt{\mathbf{k}^2+m^2}$ in accordance with (22.149) and defining the integral $I_{C_\pm}$ as

$$I_{C_\pm} \equiv \int_{C_\pm} d^4k\,\frac{e^{-ik(x-y)}}{m^2 - k^2}, \qquad (22.161)$$

we have


$$\begin{aligned}
I_{C_\pm} &= \int_{C_\pm} d^4k\,\frac{e^{-ik(x-y)}}{m^2 - k^2} = \int_{C_\pm} d^4k\,\frac{1}{2K}\,e^{-ik(x-y)}\left[\frac{1}{k_0+K} - \frac{1}{k_0-K}\right] \\
&= \int d^3k\,e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}\,\frac{1}{2K}\int_{C_\pm} dk_0\,e^{-ik_0(x^0-y^0)}\left[\frac{1}{k_0+K} - \frac{1}{k_0-K}\right]. \qquad (22.162)
\end{aligned}$$

The function $\frac{1}{k_0+K}$ is analytic within the complex domain of $k_0$ encircled by $C_+$. With the function $\frac{1}{k_0-K}$, however, a simple pole is present within the said domain (at $k_0 = K$). Therefore, choosing $C_+$ for the contour, only the second term contributes to the integration $I_{C_+}$. It is given by

$$\begin{aligned}
I_{C_+} &= -\int_{C_+} d^4k\,\frac{1}{2K}\,e^{-ik(x-y)}\,\frac{1}{k_0-K} \\
&= -\int d^3k\,e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}\,\frac{1}{2K}\oint_{C_+} dk_0\,e^{-ik_0(x^0-y^0)}\,\frac{1}{k_0-K} \\
&= -\int d^3k\,e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}\,\frac{1}{2K}\cdot 2\pi i\,\mathrm{Res}\left[e^{-ik_0(x^0-y^0)}\,\frac{1}{k_0-K}\right]_{k_0=K} \\
&= -2\pi i\int d^3k\,e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}\,\frac{1}{2K}\,e^{-iK(x^0-y^0)}, \qquad (22.163)
\end{aligned}$$

where with the third equality we used (6.146) and (6.148). Then, from (22.159) $\Delta^+$ is given by [5]

$$\Delta^+(x-y) = \frac{1}{(2\pi)^4}\,I_{C_+} = \frac{-i}{(2\pi)^3}\int d^3k\,\frac{1}{2K}\,e^{-ik(x-y)}, \qquad (22.164)$$

where we have

$$e^{-ik(x-y)} = e^{-iK(x^0-y^0)}\,e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}. \qquad (22.165)$$

Thus, $\Delta^+(x-y)$ of (22.159) is obviously identical to (22.137) if $C_+$ is chosen for the contour. Notice that in (22.164) $k_0$ is no longer an integration variable, but the real positive number $K = \sqrt{\mathbf{k}^2+m^2}$ as in (22.165). In turn, the function $\frac{1}{k_0-K}$ is analytic within the complex domain encircled by $C_-$. With the function $\frac{1}{k_0+K}$, a simple pole is present within the said domain. Therefore, choosing $C_-$ for the contour, only the first term of the RHS of (22.162) contributes to the integration $I_{C_-}$. It is given by

$$\begin{aligned}
I_{C_-} &= \int_{C_-} d^4k\,\frac{1}{2K}\,e^{-ik(x-y)}\,\frac{1}{k_0+K} \\
&= \int d^3k\,e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}\,\frac{1}{2K}\oint_{C_-} dk_0\,e^{-ik_0(x^0-y^0)}\,\frac{1}{k_0+K} \\
&= \int d^3k\,e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}\,\frac{1}{2K}\cdot 2\pi i\,\mathrm{Res}\left[e^{-ik_0(x^0-y^0)}\,\frac{1}{k_0+K}\right]_{k_0=-K} \\
&= 2\pi i\int d^3k\,e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}\,\frac{1}{2K}\,e^{iK(x^0-y^0)} \\
&= 2\pi i\int d^3k\,\frac{1}{2K}\,e^{ik(x-y)}, \qquad (22.166)
\end{aligned}$$

where with the third equality we used (6.148) again. Also, note that at the last equality we used

$$\int d^3k\,e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})} = \int d^3k\,e^{-i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}. \qquad (22.167)$$

Then, $\Delta^-(x-y)$ is given by

$$\Delta^-(x-y) = \frac{1}{(2\pi)^4}\,I_{C_-} = \frac{i}{(2\pi)^3}\int d^3k\,\frac{1}{2K}\,e^{ik(x-y)}, \qquad (22.168)$$

where $e^{ik(x-y)} = e^{iK(x^0-y^0)}\,e^{-i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}$. This shows that $\Delta^-(x-y)$ of (22.159) is identical with (22.138) if $C_-$ is chosen for the contour. Then, from (22.139), (22.164), and (22.168), we have [5]

$$\begin{aligned}
\Delta(x-y) &= \frac{-i}{(2\pi)^3}\int d^3k\,\frac{1}{2K}\left[e^{-ik(x-y)} - e^{ik(x-y)}\right] \\
&= \frac{-1}{(2\pi)^3}\int d^3k\,\frac{1}{K}\,\sin k(x-y). \qquad (22.169)
\end{aligned}$$

Or from (22.131), we equivalently have

$$\left[\phi(x), \phi(y)\right] = \frac{1}{(2\pi)^3}\int d^3k\,\frac{1}{2K}\left[e^{-ik(x-y)} - e^{ik(x-y)}\right]. \qquad (22.170)$$

This relation can readily be derived from (22.137) and (22.138) as well. Returning to the notation (22.159), we get


$$\begin{aligned}
\Delta(x-y) &= \frac{1}{(2\pi)^4}\left[I_{C_+} + I_{C_-}\right] = \frac{1}{(2\pi)^4}\left[\int_{C_+} d^4k\,\frac{e^{-ik(x-y)}}{m^2-k^2} + \int_{C_-} d^4k\,\frac{e^{-ik(x-y)}}{m^2-k^2}\right] \\
&= \frac{1}{(2\pi)^4}\int_C d^4k\,\frac{e^{-ik(x-y)}}{m^2-k^2}, \qquad (22.171)
\end{aligned}$$

where $C$ is a contour that encircles both $C_+$ and $C_-$, as shown in Fig. 22.1. This is evident from (6.87) and (6.146), which relate the complex integral to the residues. We summarize the major properties of the invariant delta functions below. From (22.169), we have

where C is a contour that encircles both C+ and C- as shown in Fig. 22.1. This is evident from (6.87) and (6.146) that relate the complex integral with residues. We summarize major properties of the invariant delta functions below. From (22.169), we have ΔðxÞ = - i½ϕðxÞ, ϕð0Þ = - i

d3 k e - ikx - eikx : ð2π Þ3 2k 0

ð22:172Þ

Therefore, Δ(x) = - Δ(-x) (i.e., odd function). Putting x0 = 0 in (22.172), we have ΔðxÞjx0 = 0 = - i

d3 k eikx - e - ikx = 0, ð2π Þ3 2k 0

ð22:173Þ

where with the second equality we used (22.167). Differentiating (22.172) with respect to $x^0$, we get

$$\dot{\Delta}(x) = -i\int \frac{d^3k}{(2\pi)^3\,2k_0}\left[(-ik_0)e^{-ikx} - ik_0 e^{ikx}\right] = -\int \frac{d^3k}{(2\pi)^3\,2}\left[e^{-ikx} + e^{ikx}\right]. \qquad (22.174)$$

Putting $x^0 = 0$ in (22.174), we have

$$\dot{\Delta}(x)\big|_{x^0=0} = -\int \frac{d^3k}{(2\pi)^3\,2}\left[e^{i\mathbf{k}\cdot\mathbf{x}} + e^{-i\mathbf{k}\cdot\mathbf{x}}\right] = -\delta^3(\mathbf{x}),$$

where we used (22.101). As an alternative way, from (22.169) we have

$$\Delta(x) = \frac{-1}{(2\pi)^3}\int d^3k\,\frac{1}{K}\,\sin kx.$$

Using $\partial_0(\sin kx) = k_0\cos kx$, we obtain

$$\dot{\Delta}(x) = \frac{-1}{(2\pi)^3}\int d^3k\,\cos kx.$$

Then, we get

$$\dot{\Delta}(x)\big|_{x^0=0} = \frac{-1}{(2\pi)^3}\int d^3k\,\cos(-\mathbf{k}\cdot\mathbf{x}) = \frac{-1}{(2\pi)^3}\int d^3k\,\cos(\mathbf{k}\cdot\mathbf{x}) = -\delta^3(\mathbf{x}).$$

Getting back to the equal-time commutation relation, we had

$$\left[\phi(t,\mathbf{x}), \phi(t,\mathbf{y})\right] = 0. \qquad (22.84)$$

Using (22.131) and (22.173), however, the above equation can be rewritten as

$$\left[\phi(t,\mathbf{x}), \phi(t,\mathbf{y})\right] = i\Delta(0, \mathbf{x}-\mathbf{y}) = 0. \qquad (22.175)$$

A world interval between the events that are labeled by $(t, \mathbf{x})$ and $(t, \mathbf{y})$ is imaginary (i.e., spacelike), as can be seen in (21.31). Since this nature is Lorentz invariant, we conclude that

$$\left[\phi(x), \phi(y)\right] = 0 \quad \text{for} \quad (x-y)^2 < 0. \qquad (22.176)$$

This clearly states that the fields at any two points $x$ and $y$ with spacelike separation commute [5]. In other words, the physical measurements of the fields at two points with spacelike separation must not interfere with each other. This is known as the microcausality.

22.3.5 Feynman Propagator of the Scalar Field

Once the Lorentz invariant functions have been introduced, this directly leads to an important quantity called the Feynman propagator. It is expressed as a c-number directly connected to the experimental results that are dealt with in the framework of the interaction between fields. The Feynman propagator is defined as a vacuum expectation value of the time-ordered product of $\phi(x)\phi(y)$, i.e.,

$$\langle 0|T[\phi(x)\phi(y)]|0\rangle. \qquad (22.177)$$

With the above quantity, the time-ordered product $T[\phi(x)\phi(y)]$ of the neutral scalar field is defined by


$$T[\phi(x)\phi(y)] \equiv \theta\left(x^0-y^0\right)\phi(x)\phi(y) + \theta\left(y^0-x^0\right)\phi(y)\phi(x). \qquad (22.178)$$

The time-ordered product (abbreviated as T-product) frequently appears in the quantum field theory together with the normal-ordered product (abbreviated as N-product). Their tangible usage for practical calculations will be seen in the next chapter. Using (22.102), we have

$$\begin{aligned}
T[\phi(x)\phi(y)] &= \iint \frac{d^3k\,d^3q}{(2\pi)^3\,2\sqrt{k_0 q_0}}\,\theta\left(x^0-y^0\right)\left[a(\mathbf{k})e^{-ikx} + a^{\dagger}(\mathbf{k})e^{ikx}\right]\left[a(\mathbf{q})e^{-iqy} + a^{\dagger}(\mathbf{q})e^{iqy}\right] \\
&\quad + \iint \frac{d^3k\,d^3q}{(2\pi)^3\,2\sqrt{k_0 q_0}}\,\theta\left(y^0-x^0\right)\left[a(\mathbf{q})e^{-iqy} + a^{\dagger}(\mathbf{q})e^{iqy}\right]\left[a(\mathbf{k})e^{-ikx} + a^{\dagger}(\mathbf{k})e^{ikx}\right].
\end{aligned}$$

Each integral contains four terms, only one of which survives after taking the vacuum expectation value $\langle 0|\cdots|0\rangle$; use the equations $a(\mathbf{k})\,|0\rangle = \langle 0|\,a^{\dagger}(\mathbf{k}) = 0$. Then, we have

$$\begin{aligned}
\langle 0|T[\phi(x)\phi(y)]|0\rangle &= \iint \frac{d^3k\,d^3q}{(2\pi)^3\,2\sqrt{k_0 q_0}}\,\theta\left(x^0-y^0\right)e^{-ikx}e^{iqy}\,\langle 0|a(\mathbf{k})a^{\dagger}(\mathbf{q})|0\rangle \\
&\quad + \iint \frac{d^3k\,d^3q}{(2\pi)^3\,2\sqrt{k_0 q_0}}\,\theta\left(y^0-x^0\right)e^{-iqy}e^{ikx}\,\langle 0|a(\mathbf{q})a^{\dagger}(\mathbf{k})|0\rangle. \qquad (22.179)
\end{aligned}$$

Using the first equation of (22.105), we have

$$a(\mathbf{k})a^{\dagger}(\mathbf{q}) = a^{\dagger}(\mathbf{q})a(\mathbf{k}) + \delta^3(\mathbf{k}-\mathbf{q}), \qquad a(\mathbf{q})a^{\dagger}(\mathbf{k}) = a^{\dagger}(\mathbf{k})a(\mathbf{q}) + \delta^3(\mathbf{q}-\mathbf{k}).$$

Hence, we obtain

$$\langle 0|a(\mathbf{k})a^{\dagger}(\mathbf{q})|0\rangle = \langle 0|a^{\dagger}(\mathbf{q})a(\mathbf{k})|0\rangle + \langle 0|\delta^3(\mathbf{k}-\mathbf{q})|0\rangle = \delta^3(\mathbf{k}-\mathbf{q})\langle 0|0\rangle = \delta^3(\mathbf{k}-\mathbf{q}).$$

Similarly, we have

$$\langle 0|a(\mathbf{q})a^{\dagger}(\mathbf{k})|0\rangle = \delta^3(\mathbf{q}-\mathbf{k}).$$

Substituting the above equations into (22.179) and integrating the resulting equation with respect to $\mathbf{q}$, we get

$$\langle 0|T[\phi(x)\phi(y)]|0\rangle = \int \frac{d^3k}{(2\pi)^3\,2k_0}\left[\theta\left(x^0-y^0\right)e^{-ik(x-y)} + \theta\left(y^0-x^0\right)e^{ik(x-y)}\right], \qquad (22.180)$$

where, e.g., $e^{-ik(x-y)}$ is represented by

$$e^{-ik(x-y)} = e^{-ik_0(x^0-y^0)+i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}; \qquad k_0 = \sqrt{\mathbf{k}^2+m^2}. \qquad (22.181)$$

We call (22.180) the Feynman propagator and symbolically define it as [8]

$$\Delta_F(x-y) \equiv \langle 0|T[\phi(x)\phi(y)]|0\rangle. \qquad (22.182)$$

Notice that the definition and notation of the invariant delta functions and the Feynman propagator differ from literature to literature. Caution should be taken accordingly. In this book we define the Feynman propagator as identical with the vacuum expectation value of the time-ordered product [8]. Meanwhile, the commutator $[\phi^+(x), \phi^-(y)]$ is a scalar function. Then, we have [5]

$$i\Delta^+(x-y) = \left[\phi^+(x), \phi^-(y)\right] = \langle 0|\left[\phi^+(x), \phi^-(y)\right]|0\rangle = \langle 0|\phi^+(x)\phi^-(y)|0\rangle = \langle 0|\phi(x)\phi(y)|0\rangle. \qquad (22.183)$$

This relation is particularly important, because the invariant delta function $\Delta^+(x-y)$ is not only a scalar, but is also given by a vacuum expectation value of the product function $\phi(x)\phi(y)$. This aspect will become clearer when we evaluate the Feynman propagator. Switching the arguments $x$ and $y$ and using (22.138), we have

$$i\Delta^+(y-x) = -i\Delta^-(x-y) = \langle 0|\phi(y)\phi(x)|0\rangle. \qquad (22.184)$$

Using (22.178) and (22.182), we get [5]

$$\Delta_F(x-y) = i\left[\theta\left(x^0-y^0\right)\Delta^+(x-y) - \theta\left(y^0-x^0\right)\Delta^-(x-y)\right]. \qquad (22.185)$$

The Feynman propagator plays an important role in evaluating the interactions between various fields, especially through the perturbation calculations. As in the case of (22.155), we wish to describe (22.180) in the explicitly Lorentz invariant form. To do this, one normally starts with (22.180) and arrives at the Lorentz invariant Fourier integral representation. In the present case, however, we proceed in the opposite manner: we start with the Fourier integral representation of the time-ordered product in the four-momentum space and show how the said representation naturally leads to (22.180).


The Fourier integral representation of the Feynman propagator is described in an invariant form as [8]

$$\Delta_F(x-y) = \frac{-i}{(2\pi)^4}\int d^4k\,\frac{e^{-ik(x-y)}}{m^2 - k^2 - i\epsilon}, \qquad (22.186)$$

where $\epsilon$ is an infinitesimal positive constant and we will take the limit $\epsilon \to 0$ after the whole calculation. When we dealt with the invariant delta functions in the last section, the constant $\epsilon$ was not included in the denominator. The presence or absence of $\epsilon$ comes from the difference in the integration paths of the complex integral that are chosen for the evaluation of the individual invariant functions. We will make a brief discussion on this point at the end of this section. It is convenient to express the Feynman propagator in the momentum space (momentum space representation). In practice, taking the Fourier transform of (22.186), we have

$$\frac{-i}{m^2 - k^2 - i\epsilon} = \int d^4x\,\Delta_F(x-y)\,e^{ik(x-y)}, \qquad (22.187)$$

where we applied the relations (22.55) and (22.58) to (22.186). For the derivation, use

$$(2\pi)^4\,\delta^4(k) = \int_{-\infty}^{\infty} e^{ikx}\,d^4x.$$

We define $\Delta_F(k)$ as

$$\Delta_F(k) \equiv \frac{-i}{m^2 - k^2 - i\epsilon}. \qquad (22.188)$$

The momentum space representation $\Delta_F(k)$ is frequently used for dealing with problems of the interaction between the fields (see Chap. 23). We modify (22.186) as follows for the sake of easy handling. Defining a positive constant $K \equiv \sqrt{\mathbf{k}^2+m^2}$ as before, we think of the quantity

$$A \equiv \frac{1}{2K}\left[\frac{1}{k_0 - (-K + i\epsilon)} - \frac{1}{k_0 - (K - i\epsilon)}\right]. \qquad (22.189)$$

After some calculations, (22.189) is deformed to

$$A = \frac{1}{2K}\cdot\frac{2K - 2i\epsilon}{K^2 - k_0^2 - 2iK\epsilon - \epsilon^2} = \frac{1 - (i\epsilon/K)}{m^2 - k^2 - 2iK\epsilon - \epsilon^2}. \qquad (22.190)$$

In (22.190), $2K\epsilon$ in the denominator is an infinitesimal positive constant as well; we neglect $\epsilon^2$ as usual. The term $-i\epsilon/K$ in the numerator does not affect subsequent calculations and, hence, can be discarded from the beginning. Thus, (22.190) can be regarded as identical to $1/(m^2 - k^2 - i\epsilon)$. Then, instead of calculating (22.186), we are going to evaluate

$$\Delta_F(x-y) = \frac{-i}{(2\pi)^4}\int d^4k\,\frac{1}{2K}\,e^{-ik(x-y)}\left[\frac{1}{k_0 - (-K + i\epsilon)} - \frac{1}{k_0 - (K - i\epsilon)}\right]. \qquad (22.191)$$
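The partial-fraction manipulation of (22.189)–(22.190) is routine but error-prone; it can be confirmed symbolically (a sketch using SymPy):

```python
from sympy import symbols, I, cancel

k0 = symbols('k0')
K, eps = symbols('K epsilon', positive=True)

# A of (22.189), recombined over a common denominator; cf. (22.190)
A = (1/(2*K)) * (1/(k0 - (-K + I*eps)) - 1/(k0 - (K - I*eps)))
target = (1 - I*eps/K) / (K**2 - k0**2 - 2*I*K*eps - eps**2)
print(cancel(A - target) == 0)  # → True
```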

Rewriting (22.191), we have

$$\begin{aligned}
\Delta_F(x-y) &= \frac{-i}{(2\pi)^4}\int d^3k\,e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}\int_{-\infty}^{+\infty} dk_0\,\frac{1}{2K}\,e^{-ik_0(x^0-y^0)} \\
&\qquad\times\left[\frac{1}{k_0 - (-K + i\epsilon)} - \frac{1}{k_0 - (K - i\epsilon)}\right]. \qquad (22.192)
\end{aligned}$$

With the above relation, we define $I$ as

$$I \equiv \int_{-\infty}^{+\infty} dk_0\,\frac{1}{2K}\,e^{-ik_0(x^0-y^0)}\left[\frac{1}{k_0 - (-K + i\epsilon)} - \frac{1}{k_0 - (K - i\epsilon)}\right]. \qquad (22.193)$$

Since complex numbers are contained in the denominators of the integrand of (22.193), we wish to convert $I$ to a complex integration with respect to $k_0$ (see Chap. 6). From (22.192) we immediately see that the integrand has simple poles at $k_0 = \pm(K - i\epsilon)$. The point is that the presence of the simple poles allows us to use (6.146), described by

$$\oint_C f(z)\,dz = 2\pi i\sum_j \mathrm{Res}\,f(a_j). \qquad (6.146)$$

Another point is that the functions

$$\frac{1}{k_0 - (-K + i\epsilon)} \quad \text{and} \quad \frac{1}{k_0 - (K - i\epsilon)}$$

tend uniformly to zero as $|k_0| \to \infty$. This situation allows us to apply Jordan's lemma to the complex integration of (22.192); see Sect. 6.8, Lemma 6.6. To perform the complex integration, we consider the following two cases.


Fig. 22.2 Contours to calculate the complex integrations with respect to the Feynman propagator $\Delta_F(x-y)$ of the scalar field. (a) Contour integration with the upper semicircle $C$. (b) Contour integration with the lower semicircle $\widetilde{C}$

1. Case I: We choose an upper semicircle $C$ for the contour integration (see Fig. 22.2a). As in the case of Example 6.4, we define $I_C$ as

$$I_C = \oint_C dk_0\,\frac{1}{2K}\,e^{-ik_0(x^0-y^0)}\left[\frac{1}{k_0 - (-K + i\epsilon)} - \frac{1}{k_0 - (K - i\epsilon)}\right]. \qquad (22.194)$$


As the function $\frac{1}{k_0-(K-i\varepsilon)}$ is analytic on and inside the contour $C$, its associated contour integration vanishes by Cauchy's integral theorem (Theorem 6.10). Hence, we are left with
\[
I_C = \oint_C dk_0\,\frac{1}{2K}\, e^{-ik_0(x_0-y_0)}\, \frac{1}{k_0-(-K+i\varepsilon)}. \tag{22.195}
\]
Making the upper semicircle $\Gamma_R$ large enough, we get
\[
\lim_{R\to\infty} I_C = \int_{-\infty}^{+\infty} dk_0\,\frac{1}{2K}\, e^{-ik_0(x_0-y_0)}\, \frac{1}{k_0-(-K+i\varepsilon)} + \lim_{R\to\infty}\int_{\Gamma_R} dk_0\,\frac{1}{2K}\, e^{-ik_0(x_0-y_0)}\, \frac{1}{k_0-(-K+i\varepsilon)}. \tag{22.196}
\]

In (22.196), the contour integration is taken counter-clockwise so that $k_0 = -K+i\varepsilon$ is contained inside the closed contour (i.e., the upper semicircle $C$). For the contour integral along $C$ to have a physical meaning (i.e., for Jordan's lemma to be applied to the present case), we must have $x_0 - y_0 < 0$. To see this, put $k_0 = Re^{i\theta}$ ($R$: real positive; $0 < \theta < \pi$) so that we may get
\[
e^{-ik_0(x_0-y_0)} = e^{-iR(x_0-y_0)\cos\theta}\, e^{R(x_0-y_0)\sin\theta}.
\]
Taking the absolute value of both sides of the above equation, we obtain
\[
\left| e^{-ik_0(x_0-y_0)} \right| = \left| e^{-iR(x_0-y_0)\cos\theta} \right| e^{R(x_0-y_0)\sin\theta} = e^{R(x_0-y_0)\sin\theta}.
\]
Since $x_0 - y_0 < 0$ and $\sin\theta > 0$, we have $e^{R(x_0-y_0)\sin\theta} \to 0$ with $R \to \infty$. Since the function $\frac{1}{k_0-(-K+i\varepsilon)}$ is analytic on $\Gamma_R$ and tends uniformly to zero as $R \to \infty$, the second term of (22.196) vanishes according to Lemma 6.6 (Jordan's lemma). Thus, we obtain
\[
\lim_{R\to\infty} I_C = \int_{-\infty}^{+\infty} dk_0\,\frac{1}{2K}\, e^{-ik_0(x_0-y_0)}\, \frac{1}{k_0-(-K+i\varepsilon)} = I.
\]

Applying (6.146) and (6.148) to the present case, we get
\[
I = 2\pi i \left[ \frac{1}{2K}\, e^{-ik_0(x_0-y_0)} \right]_{k_0 = -K+i\varepsilon} = 2\pi i\, \frac{1}{2K}\, e^{(iK+\varepsilon)(x_0-y_0)}.
\]
Taking $\varepsilon \to 0$, we obtain
\[
I = 2\pi i\, \frac{1}{2K}\, e^{iK(x_0-y_0)}. \tag{22.197}
\]

2. Case II: We choose a lower semicircle $\tilde{C}$ for the contour integration (see Fig. 22.2b). As in Case I, we define $I_{\tilde{C}}$ as
\[
I_{\tilde{C}} = \oint_{\tilde{C}} dk_0\,\frac{1}{2K}\, e^{-ik_0(x_0-y_0)} \left[\frac{1}{k_0-(-K+i\varepsilon)}-\frac{1}{k_0-(K-i\varepsilon)}\right].
\]

This time, as the function $\frac{1}{k_0-(-K+i\varepsilon)}$ is analytic on and inside the contour $\tilde{C}$, its associated contour integration vanishes by Cauchy's integral theorem, and so we are left with
\[
\lim_{R\to\infty} I_{\tilde{C}} = -\int_{-\infty}^{+\infty} dk_0\,\frac{1}{2K}\, e^{-ik_0(x_0-y_0)}\, \frac{1}{k_0-(K-i\varepsilon)} - \lim_{R\to\infty}\int_{\Gamma_R} dk_0\,\frac{1}{2K}\, e^{-ik_0(x_0-y_0)}\, \frac{1}{k_0-(K-i\varepsilon)}. \tag{22.198}
\]

Using Jordan's lemma as in Case I and applying (6.146) and (6.148) to the present case, we get
\[
I = -2\pi i \left[ -\frac{1}{2K}\, e^{-ik_0(x_0-y_0)} \right]_{k_0 = K} = 2\pi i\, \frac{1}{2K}\, e^{-iK(x_0-y_0)}. \tag{22.199}
\]
In (22.199), the minus sign before $2\pi i$ in the first equality resulted from the fact that the contour integration was taken clockwise, so that $k_0 = K - i\varepsilon$ could be contained inside the closed contour (i.e., the lower semicircle $\tilde{C}$). Note that in this case we must have $x_0 - y_0 > 0$ for Jordan's lemma to be applied. Combining the results of Case I and Case II and taking account of the fact that we must have $x_0 - y_0 < 0$ ($x_0 - y_0 > 0$) with Case I (Case II), we get
\[
I = 2\pi i\, \frac{1}{2K} \left[ \theta(-x_0+y_0)\, e^{iK(x_0-y_0)} + \theta(x_0-y_0)\, e^{-iK(x_0-y_0)} \right]. \tag{22.200}
\]
Substituting (22.200) into (22.192), we get


\[
\begin{aligned}
\Delta_F(x-y) &= \frac{-i}{(2\pi)^4} \int d^3k\, I\, e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})} \\
&= \frac{1}{(2\pi)^3} \int \frac{d^3k}{2K} \left[ \theta(-x_0+y_0)\, e^{ik(x-y)} + \theta(x_0-y_0)\, e^{-ik(x-y)} \right] \\
&= \langle 0 | T[\phi(x)\phi(y)] | 0 \rangle,
\end{aligned} \tag{22.201}
\]
where with the second equality we substituted (22.200) and switched the integration variable $\mathbf{k}$ to $-\mathbf{k}$ in the first term; the last equality came from (22.180). Thus, we have shown that (22.186) properly reproduces the definition of the Feynman propagator $\Delta_F(x-y)$ given by (22.182).

To calculate the invariant delta functions, we have encountered several examples of the complex integration. It is worth noting that, comparing the contour $C$ of Fig. 22.2a with $C_-$ of Fig. 22.1, their topological features are virtually the same. This is also the case with the topological relationship between the contour $\tilde{C}$ of Fig. 22.2b and $C_+$ of Fig. 22.1. That is, the former two contours encircle only $k_0 = -K$ (or $k_0 = -K + i\varepsilon$), whereas the latter two contours encircle only $k_0 = K$ (or $k_0 = K - i\varepsilon$). Comparing (22.186) with (22.159) and (22.171), we realize that the functional forms of the integrands are virtually the same except for the presence or absence of the term $-i\varepsilon$ in the denominator. With the complex integration of $k_0$, the simple poles $\pm K$ are located inside the contour for the integration regarding (22.159) and (22.171). With (22.186), however, if the term $-i\varepsilon$ were lacking, the simple poles $\pm K$ would be located on the integration path. To avoid this situation, we “raise” or “lower” the pole by adding $+i\varepsilon$ ($\varepsilon > 0$) or $-i\varepsilon$ to its complex coordinate, respectively, so that we can apply Cauchy's integral theorem (Theorem 6.10) and Cauchy's integral formula (Theorem 6.11) to the contour integration [11]. After completing the relevant calculations, we take $\varepsilon \to 0$ to evaluate the integral properly; see (22.197), for example. Nevertheless, we often do without this mathematical manipulation. An example can be seen in Chap. 23.
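The pole-displacement prescription just described can be checked numerically. The following sketch (with arbitrary test values $K = 1$, $\tau = x_0 - y_0 = \pm 0.7$, and a small finite $\varepsilon$) integrates (22.193) along the real $k_0$-axis and compares the result with the closed form (22.200); this is an illustrative check, not part of the book's derivation.

```python
import numpy as np

K, eps = 1.0, 1e-3                       # arbitrary test values
k0 = np.linspace(-400.0, 400.0, 800001)  # fine grid; the tail is O(1/k0^2)
dk = k0[1] - k0[0]

def I_numeric(tau):
    # integrand of (22.193) with the poles displaced by +/- i*eps
    f = (np.exp(-1j * k0 * tau) / (2 * K)
         * (1.0 / (k0 - (-K + 1j * eps)) - 1.0 / (k0 - (K - 1j * eps))))
    return dk * (f.sum() - 0.5 * (f[0] + f[-1]))   # trapezoidal rule

def I_closed(tau):
    # closed form (22.200) in the eps -> 0 limit
    step = 1.0 if tau > 0 else 0.0
    return (2j * np.pi / (2 * K)) * ((1.0 - step) * np.exp(1j * K * tau)
                                     + step * np.exp(-1j * K * tau))

for tau in (0.7, -0.7):                  # tau = x0 - y0 of either sign
    assert abs(I_numeric(tau) - I_closed(tau)) < 2e-2
```

With $\varepsilon$ kept finite the numerical value differs from the $\varepsilon \to 0$ limit only at $O(\varepsilon)$, in line with the "raise or lower the pole, then take $\varepsilon \to 0$" prescription.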

22.3.6 General Consideration on the Field Quantization

So far, we have described the quantization procedures of the scalar field. We will subsequently deal with the quantization of the Dirac field and the electromagnetic field (or photon field). The quantization processes and the resulting formulations take different forms for the different types of relativistic fields. Nevertheless, the quantization methods can be integrated into a unified approach. This section links the contents of the previous sections to those of the subsequent sections. In Sect. 22.2.2, we had


\[
E = \int_{-\infty}^{\infty} |k\rangle\langle k|\, dk, \tag{22.64}
\]
where $E$ is the identity operator. We extend the variable $k$ of (22.64) to the momentum four-vector $p$. That is, we have
\[
E = \int_{-\infty}^{\infty} |p\rangle\langle p|\, dp^4, \tag{22.202}
\]
where $p = \begin{pmatrix} p_0 \\ \mathbf{p} \end{pmatrix}$, in which $p_0$ denotes the energy and $\mathbf{p}$ represents the three-dimensional momentum of a relativistic particle. We define the quantity $p^2$ as
\[
p^2 \equiv p_\mu p^\mu.
\]
Then, from (21.79) we have
\[
p^2 = p_0^{\,2} - \mathbf{p}^2 = m^2 \quad \text{or} \quad p^2 - m^2 = 0.
\]

Taking account of this constraint properly, we modify (22.202) in an explicitly invariant form in such a way that [7]
\[
E = \int_{-\infty}^{\infty} \delta(p^2 - m^2)\, |p\rangle\langle p|\, dp^4 = \int_{-\infty}^{\infty} d\mathbf{p}^3 \int_{-\infty}^{\infty} dp_0\, \delta\!\left(p_0^{\,2} - (\mathbf{p}^2 + m^2)\right) |p\rangle\langle p|. \tag{22.203}
\]
To further modify (22.203), using (22.143) with $f(x) = x^2 - a^2$ we obtain
\[
\delta(x^2 - a^2) = \frac{\delta(x-a)}{2|a|} + \frac{\delta(x+a)}{2|-a|}.
\]

Exchanging the roles of $x$ and $a$ and using the fact that the $\delta$ function is an even function, we have
\[
\delta(x^2 - a^2) = \frac{\delta(x-a)}{2|x|} + \frac{\delta(x+a)}{2|-x|},
\]
where $x$ is regarded as the variable. Replacing $x$ with $p_0$ and $a$ with $\sqrt{\mathbf{p}^2 + m^2}$, we get


\[
\delta(p^2 - m^2) = \delta\!\left(p_0^{\,2} - (\mathbf{p}^2 + m^2)\right) = \frac{\delta\!\left(p_0 - \sqrt{\mathbf{p}^2 + m^2}\right)}{2|p_0|} + \frac{\delta\!\left(p_0 + \sqrt{\mathbf{p}^2 + m^2}\right)}{2|-p_0|}.
\]
Inserting this relation into (22.203), we obtain
\[
E = \int_{-\infty}^{\infty} \delta(p^2 - m^2)\, |p\rangle\langle p|\, dp^4 = \int_{-\infty}^{\infty} d\mathbf{p}^3 \int_{-\infty}^{\infty} dp_0 \left[ \frac{\delta\!\left(p_0 - \sqrt{\mathbf{p}^2 + m^2}\right)}{2|p_0|} + \frac{\delta\!\left(p_0 + \sqrt{\mathbf{p}^2 + m^2}\right)}{2|-p_0|} \right] |p\rangle\langle p|.
\]
Defining $|p\rangle \equiv |\mathbf{p}, p_0\rangle$ along with a real positive number $E_p \equiv \sqrt{\mathbf{p}^2 + m^2}$ and using a property of the $\delta$ function, we get
\[
E = \int_{-\infty}^{\infty} d\mathbf{p}^3\, \frac{1}{2E_p} \left[ |\mathbf{p}, E_p\rangle\langle \mathbf{p}, E_p| + |\mathbf{p}, -E_p\rangle\langle \mathbf{p}, -E_p| \right].
\]
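The $\delta$-function reduction $\delta(x^2 - a^2) = [\delta(x-a) + \delta(x+a)]/(2|a|)$ used in the step above can be spot-checked numerically by replacing the $\delta$ function with a narrow Gaussian (a nascent $\delta$ function). The test function $f$, the value $a = 2$, and the width $\sigma$ are arbitrary choices for this sketch.

```python
import numpy as np
from scipy.integrate import quad

a, sigma = 2.0, 1e-2                     # arbitrary test values
f = lambda x: np.exp(-x)                 # arbitrary smooth test function

def delta(u):
    # narrow Gaussian as a nascent delta function
    return np.exp(-u**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

lhs = quad(lambda x: f(x) * delta(x**2 - a**2), -10, 10,
           points=[-a, a], limit=400)[0]
rhs = (f(a) + f(-a)) / (2 * abs(a))      # [delta(x-a) + delta(x+a)] / (2|a|)
assert abs(lhs - rhs) / rhs < 5e-3
```

The agreement improves as $\sigma \to 0$, mirroring the distributional identity.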

Operating with $\langle x|$ and $|\Phi\rangle$ on both sides of the above equation from the left and the right, respectively, we get
\[
\langle x|E|\Phi\rangle = \langle x|\Phi\rangle = \int_{-\infty}^{\infty} \frac{d\mathbf{p}^3}{2E_p} \left[ \langle x|\mathbf{p}, E_p\rangle\langle \mathbf{p}, E_p|\Phi\rangle + \langle x|\mathbf{p}, -E_p\rangle\langle \mathbf{p}, -E_p|\Phi\rangle \right], \tag{22.204}
\]
where $|x\rangle$ represents the continuous space–time basis vectors of the Minkowski space (see Fig. 10.1, which shows a one-dimensional case); $|\Phi\rangle$ denotes the quantum state of the relativistic particle (photon, electron, etc.) we are thinking of. According to (10.90), we express $\langle x|\Phi\rangle$ of (22.204) as
\[
\Phi(x) \equiv \langle x|\Phi\rangle.
\]
Also defining $|x\rangle \equiv |\mathbf{x}, t\rangle$, we rewrite (22.204) as
\[
\Phi(x) = \int_{-\infty}^{\infty} \frac{d\mathbf{p}^3}{2E_p} \left[ \langle \mathbf{x}, t|\mathbf{p}, E_p\rangle\langle \mathbf{p}, E_p|\Phi\rangle + \langle \mathbf{x}, t|\mathbf{p}, -E_p\rangle\langle \mathbf{p}, -E_p|\Phi\rangle \right].
\]
Using (22.67), we have


\[
\langle \mathbf{x}, t|\mathbf{p}, \pm E_p\rangle = \frac{1}{\sqrt{(2\pi)^3}}\, e^{i\mathbf{p}\cdot\mathbf{x}}. \tag{22.205}
\]
In the above equation, neither $t$ nor $E_p$ can be regarded as a Fourier component because of the presence of $\delta(p^2 - m^2)$ in the integrand of (22.203). In other words, if we choose $\mathbf{p}$ as a freely changeable parameter over the range $(-\infty, +\infty)$, we cannot deal with $p_0$ as an independent parameter on account of the factor $\delta(p^2 - m^2)$ but have to regard it as a dependent parameter $p_0 = \pm\sqrt{\mathbf{p}^2 + m^2}$. The factor $\langle \mathbf{p}, \pm E_p|\Phi\rangle$ describes the quantum state of the free particle in the positive-energy state and the negative-energy state, respectively. The former (latter) state contains $e^{-iE_p t}$ ($e^{iE_p t}$) as a factor accordingly, independent of the particle species. The factor $\langle \mathbf{p}, \pm E_p|\Phi\rangle$ reflects the nature of the individual particle species. As already seen in Chap. 21, for example, $\langle \mathbf{p}, \pm E_p|\Phi\rangle$ denotes a spinor for the Dirac field. As for the photon field, $\langle \mathbf{p}, \pm E_p|\Phi\rangle$ represents an electromagnetic four-vector, as we will see later. More importantly, $\langle \mathbf{p}, \pm E_p|\Phi\rangle$ contains a q-number, i.e., an operator. Normally, in the quantum field theory, the positive-energy (or negative-energy) state is accompanied by the annihilation (or creation) operator. The factor $\langle \mathbf{p}, \pm E_p|\Phi\rangle$ must properly be “normalized.” Including the aforementioned characteristics, $\Phi(x)$ is described as

\[
\Phi(x) = \int_{-\infty}^{\infty} d\mathbf{p}^3\, \frac{1}{2E_p}\, \frac{1}{\sqrt{(2\pi)^3}}\, e^{i\mathbf{p}\cdot\mathbf{x}} \left[ e^{-iE_p t}\, \alpha_+(\mathbf{p})\Phi_+(\mathbf{p}, E_p) + e^{iE_p t}\, \alpha_-^{\dagger}(\mathbf{p})\Phi_-(\mathbf{p}, -E_p) \right], \tag{22.206}
\]
where $\alpha_+(\mathbf{p})$ and $\alpha_-^{\dagger}(\mathbf{p})$ are the annihilation and creation operators (i.e., q-numbers), respectively. In (22.206) we have defined
\[
\langle \mathbf{p}, E_p|\Phi\rangle \equiv \alpha_+(\mathbf{p})\Phi_+(\mathbf{p}, E_p) \quad \text{and} \quad \langle \mathbf{p}, -E_p|\Phi\rangle \equiv \alpha_-^{\dagger}(\mathbf{p})\Phi_-(\mathbf{p}, -E_p).
\]
Even though $\Phi_\pm(\mathbf{p}, \pm E_p)$ contains the representation of a spinor, vector, etc., it is a c-number. Further defining the operators such that [8]
\[
a_+(\mathbf{p}) \equiv \sqrt{2E_p}\, \alpha_+(\mathbf{p}) \quad \text{and} \quad a_-^{\dagger}(\mathbf{p}) \equiv \sqrt{2E_p}\, \alpha_-^{\dagger}(\mathbf{p}),
\]
we obtain


\[
\Phi(x) = \int_{-\infty}^{\infty} \frac{d\mathbf{p}^3}{\sqrt{(2\pi)^3\, 2E_p}}\, e^{i\mathbf{p}\cdot\mathbf{x}} \left[ e^{-iE_p t}\, a_+(\mathbf{p})\Phi_+(\mathbf{p}, E_p) + e^{iE_p t}\, a_-^{\dagger}(\mathbf{p})\Phi_-(\mathbf{p}, -E_p) \right]. \tag{22.207}
\]
Then, we get
\[
\begin{aligned}
\Phi(x) &= \int_{-\infty}^{\infty} \frac{d\mathbf{p}^3}{\sqrt{(2\pi)^3\, 2E_p}} \left[ e^{i\mathbf{p}\cdot\mathbf{x}} e^{-iE_p t}\, a_+(\mathbf{p})\Phi_+(\mathbf{p}, E_p) + e^{-i\mathbf{p}\cdot\mathbf{x}} e^{iE_p t}\, a_-^{\dagger}(-\mathbf{p})\Phi_-(-\mathbf{p}, -E_p) \right] \\
&= \int_{-\infty}^{\infty} \frac{d\mathbf{p}^3}{\sqrt{(2\pi)^3\, 2E_p}} \left[ e^{-ipx}\, a_+(\mathbf{p})\Phi_+(\mathbf{p}, E_p) + e^{ipx}\, a_-^{\dagger}(-\mathbf{p})\Phi_-(-\mathbf{p}, -E_p) \right],
\end{aligned} \tag{22.208}
\]

where with the second equality we used $e^{-ipx} = e^{i\mathbf{p}\cdot\mathbf{x}}\, e^{-iE_p t}$ and $e^{ipx} = e^{-i\mathbf{p}\cdot\mathbf{x}}\, e^{iE_p t}$; in the second term of the RHS, $\mathbf{p}$ was switched to $-\mathbf{p}$. In (22.208), e.g., $b_-^{\dagger}(\mathbf{p})$ is often used instead of $a_-^{\dagger}(-\mathbf{p})$ for the sake of distinction and clarity.

The quantization of the relativistic fields is commonly based upon (22.208) with regard to the neutral scalar field, the Dirac field, and the photon field. The first term of (22.208) is related to the positive-energy state that is characterized by $a_+(\mathbf{p})$ and $\Phi_+(\mathbf{p}, E_p)$. The second term, in turn, is associated with the negative-energy state characterized by $a_-^{\dagger}(-\mathbf{p})$ and $\Phi_-(-\mathbf{p}, -E_p)$.

In Sect. 22.3.2, we established (22.98) as a basic equation and a starting point for the quantization of the scalar field. It was unclear, however, whether (22.98) is Lorentz invariant. Its equivalent expression (22.208) has been derived from the explicitly invariant equation (22.203). This enables us to adopt (22.208) as a fundamental equation for the quantization of relativistic fields. As in (22.208), it is always possible to separate (or decompose) the field operator $\Phi(x)$ into the positive-energy and negative-energy parts. Defining
\[
\Phi(x) = \Phi_+(x) + \Phi_-(x),
\]
we have


\[
\begin{aligned}
\Phi_+(x) &\equiv \int_{-\infty}^{\infty} \frac{d\mathbf{p}^3}{\sqrt{(2\pi)^3\, 2E_p}}\, e^{-ipx}\, a_+(\mathbf{p})\Phi_+(\mathbf{p}, E_p), \\
\Phi_-(x) &\equiv \int_{-\infty}^{\infty} \frac{d\mathbf{p}^3}{\sqrt{(2\pi)^3\, 2E_p}}\, e^{ipx}\, a_-^{\dagger}(-\mathbf{p})\Phi_-(-\mathbf{p}, -E_p).
\end{aligned} \tag{22.209}
\]
Using these equations, we obtain various useful representations regarding, e.g., commutation relations, the invariant delta functions, and Feynman propagators (see Sects. 22.3.4 and 22.3.5 as well as the subsequent Sects. 22.4 and 22.5). In the case of the neutral scalar field (or Klein-Gordon field), we do not have to distinguish $a_\pm(\pm\mathbf{p})$ but only have to take account of $a(\mathbf{p})$ [and $a^{\dagger}(\mathbf{p})$]. Also in the neutral scalar case, as a normalization condition to be satisfied in (22.208) we may put
\[
\Phi_+(\mathbf{p}, E_p) \equiv \Phi_-(-\mathbf{p}, -E_p) \equiv 1.
\]
Thus, we have reproduced (22.98). In the case of the Dirac field, in turn, we have the following correspondence:
\[
\Phi_+(\mathbf{p}, E_p) \leftrightarrow u(\mathbf{p}, h) \quad \text{and} \quad \Phi_-(-\mathbf{p}, -E_p) \leftrightarrow v(\mathbf{p}, h),
\]
where we consider the helicity $h$ as an additional degree of freedom (see Sect. 22.4). Regarding other specific notations, see the subsequent sections.

22.4 Quantization of the Dirac Field [5, 6]

22.4.1 Lagrangian Density and Hamiltonian Density of the Dirac Field

As in the case of the scalar field, we start with the Lagrangian density of the Dirac field. The Lagrangian density $\mathcal{L}(x)$ is given by
\[
\mathcal{L}(x) \equiv \bar{\psi}(x)\left( i\gamma^\mu \partial_\mu - m \right)\psi(x). \tag{22.210}
\]

Then, the action integral $S(\psi)$ follows from (22.210) such that
\[
S(\psi) = \int dx\, \mathcal{L}(x) = \int dx\, \bar{\psi}(x)\left( i\gamma^\mu \partial_\mu - m \right)\psi(x). \tag{22.211}
\]
Taking the variation of $S(\psi)$ with respect to $\psi$ as before, we have

\[
\begin{aligned}
\delta S(\psi) &= \int dx\, \delta\mathcal{L}(x) \\
&= \int dx \left[ \delta\bar{\psi}\, i\gamma^\mu \partial_\mu \psi + \bar{\psi}\, i\gamma^\mu \partial_\mu \delta\psi - m\, \delta\bar{\psi}\, \psi - m\, \bar{\psi}\, \delta\psi \right] \\
&= \int dx\, \delta\bar{\psi} \left( i\gamma^\mu \partial_\mu \psi - m\psi \right) + \int dx \left( \bar{\psi}\, i\gamma^\mu \partial_\mu - m\bar{\psi} \right) \delta\psi \\
&= \int dx\, \delta\bar{\psi} \left( i\gamma^\mu \partial_\mu \psi - m\psi \right) + \left[ \bar{\psi}\, i\gamma^\mu \delta\psi \right]_{-\infty}^{\infty} + \int dx \left( -\partial_\mu \bar{\psi}\, i\gamma^\mu - m\bar{\psi} \right) \delta\psi \\
&= \int dx\, \delta\bar{\psi} \left( i\gamma^\mu \partial_\mu \psi - m\psi \right) - \int dx \left( i\partial_\mu \bar{\psi}\, \gamma^\mu + m\bar{\psi} \right) \delta\psi,
\end{aligned} \tag{22.212}
\]

where with the second to the last equality we used integration by parts, and $\delta\psi$ was supposed to vanish at $x = \pm\infty$. Since $\psi$ and $\bar{\psi}$ are regarded as independent variables, for $\delta S(\psi) = 0$ to hold with any arbitrarily chosen $\delta\psi$ and $\delta\bar{\psi}$ we must have
\[
\left( i\gamma^\mu \partial_\mu - m \right)\psi(x) = 0 \tag{22.213}
\]
and
\[
i\partial_\mu \bar{\psi}(x)\gamma^\mu + m\bar{\psi}(x) = 0. \tag{22.214}
\]
Instead of (22.214), we sometimes write [6]
\[
i\bar{\psi}\,\overleftarrow{\partial}_\mu \gamma^\mu + m\bar{\psi} = 0. \tag{22.215}
\]
Equations (22.214) and (22.215) are called the adjoint Dirac equation. Rewriting (22.214), we have


\[
i\partial_\mu \psi^{\dagger} \gamma^0 \gamma^\mu + m\psi^{\dagger}\gamma^0 = 0,
\]
↓ take the transposition and rearrange the terms:
\[
i(\gamma^\mu)^T (\gamma^0)^T \partial_\mu \psi^* + m(\gamma^0)^T \psi^* = 0,
\]
↓ use the transposition of $\gamma^0\gamma^\mu = 2\eta^{0\mu} - \gamma^\mu\gamma^0$:
\[
i\left[ 2\eta^{0\mu} - (\gamma^0)^T (\gamma^\mu)^T \right] \partial_\mu \psi^* + m(\gamma^0)^T \psi^* = 0,
\]
↓ multiply both sides by $(\gamma^0)^T$ from the left and use $(\gamma^0)^T(\gamma^0)^T = (\gamma^0\gamma^0)^T = E$ (identity matrix) from (21.66):
\[
i\, 2\eta^{0\mu} (\gamma^0)^T \partial_\mu \psi^* - i(\gamma^\mu)^T \partial_\mu \psi^* + m\psi^* = i(\gamma^0)^T \partial_0 \psi^* - i\sum_{k=1}^{3} (\gamma^k)^T \partial_k \psi^* + m\psi^* = 0, \tag{22.216}
\]
↓ take the complex conjugate of the above:
\[
-i\left( \gamma^0 \partial_0 \psi + \sum_{k=1}^{3} \gamma^k \partial_k \psi \right) + m\psi = 0,
\]
where we used the fact that $\gamma^0$ is an Hermitian operator and that $\gamma^k$ ($k = 1, 2, 3$) are anti-Hermitian. Rewriting (22.216), finally we recover
\[
\left( i\gamma^\mu \partial_\mu - m \right)\psi(x) = 0. \tag{22.213}
\]

Thus, (22.213) can naturally be derived from (22.214), which is again equivalent to (21.54).

As in the case of the scalar field, the Hamiltonian density $\mathcal{H}(x)$ of the Dirac field is given by
\[
\mathcal{H}(x) = \pi(x)\dot{\psi}(x) - \mathcal{L}\left( \psi(x), \partial_\mu \psi(x) \right). \tag{22.217}
\]
Since $\psi(x)$ represents a spinor field that is described by a $(4, 1)$ matrix (i.e., a column vector), $\pi(x)$ is a $(1, 4)$ matrix (a row vector), so that $\mathcal{H}$ is a scalar, i.e., a $(1, 1)$ matrix. If we need to distinguish each component, we attach an index, e.g., $\psi(x)_a$ and $\pi(x)_b$ ($a, b = 1, 2, 3, 4$). When we need to distinguish the time coordinate from the space coordinates, we write, e.g., $\pi(x)$ as $\pi(t, \mathbf{x})$. We have
\[
\pi(x) = \frac{\partial \mathcal{L}(x)}{\partial[\partial_0 \psi(x)]} = \bar{\psi}(x)\, i\gamma^0 = \psi^{\dagger}(x)\gamma^0 i\gamma^0 = i\psi^{\dagger}(x). \tag{22.218}
\]
The representation of (22.218) is somewhat symbolic, because $\psi(x)$, involved as the differentiation variable, is a matrix. Therefore, we use the following expression to specify the individual components of $\psi(x)$ and $\pi(x)$:

\[
\pi_a(x) = \frac{\partial \mathcal{L}(x)}{\partial[\partial_0 \psi_a(x)]} = \left[ \bar{\psi}(x)\, i\gamma^0 \right]_a = \left[ \psi^{\dagger}(x)\gamma^0 i\gamma^0 \right]_a = i\left[ \psi^{\dagger}(x) \right]_a = i\psi^*(x)_a. \tag{22.219}
\]

Note that in (22.219) $\psi^{\dagger}(x)$ is a $(1, 4)$ matrix and that $\psi^*(x)_a$ is its $a$-th component. The derivation of (22.219) is based on a so-called compendium quantization method. Interested readers are referred to the literature for further discussion [2, 6, 8]. Using (22.219), the Hamiltonian density $\mathcal{H}$ of (22.217) is expressed as
\[
\begin{aligned}
\mathcal{H}(x) &= i\psi^{\dagger}(x)\dot{\psi}(x) - \bar{\psi}(x)\left( i\gamma^\mu \partial_\mu - m \right)\psi(x) \\
&= i\psi^{\dagger}(x)\dot{\psi}(x) - \psi^{\dagger}(x)\gamma^0 i\gamma^\mu \partial_\mu \psi(x) + \bar{\psi}(x)m\psi(x) \\
&= -\sum_{k=1}^{3} \psi^{\dagger}(x)\gamma^0 i\gamma^k \partial_k \psi(x) + \bar{\psi}(x)m\psi(x) \\
&= \bar{\psi}(x)\left( -i\sum_{k=1}^{3} \gamma^k \partial_k + m \right)\psi(x).
\end{aligned}
\]
Defining $\boldsymbol{\alpha} \equiv \gamma^0\boldsymbol{\gamma}$ [where $\boldsymbol{\gamma} \equiv (\gamma^1, \gamma^2, \gamma^3)$] and $\beta \equiv \gamma^0$ according to the custom, we have
\[
\mathcal{H} = \psi^{\dagger}(x)\left( -i\boldsymbol{\alpha}\cdot\nabla + m\beta \right)\psi(x). \tag{22.220}
\]

Hence, with the Hamiltonian $H$ we get
\[
H = \int d^3x\, \mathcal{H} = \int d^3x\, \psi^{\dagger}(x)\left( -i\boldsymbol{\alpha}\cdot\nabla + m\beta \right)\psi(x). \tag{22.221}
\]
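As a quick numerical illustration of the one-particle Hamiltonian $-i\boldsymbol{\alpha}\cdot\nabla + m\beta$ appearing in (22.221): in momentum space it becomes $\boldsymbol{\alpha}\cdot\mathbf{p} + m\beta$, whose square is $(\mathbf{p}^2 + m^2)E$, so its eigenvalues are the positive and negative energies $\pm\sqrt{\mathbf{p}^2 + m^2}$ discussed below. The sketch assumes the Dirac representation of the $\gamma$ matrices; $m$ and $\mathbf{p}$ are arbitrary test values.

```python
import numpy as np

s = [np.array([[0, 1], [1, 0]], complex),
     np.array([[0, -1j], [1j, 0]], complex),
     np.array([[1, 0], [0, -1]], complex)]            # Pauli matrices
I2, Z2 = np.eye(2, dtype=complex), np.zeros((2, 2), complex)

beta = np.block([[I2, Z2], [Z2, -I2]])                # gamma^0 (Dirac rep.)
alpha = [np.block([[Z2, sk], [sk, Z2]]) for sk in s]  # alpha_k = gamma^0 gamma^k

m, p = 1.0, np.array([0.3, -0.4, 1.2])                # arbitrary test values
H = sum(pk * ak for pk, ak in zip(p, alpha)) + m * beta

E = np.sqrt(p @ p + m**2)
# H^2 = (p^2 + m^2) * identity, so the eigenvalues are +/- E (two of each sign)
assert np.allclose(H @ H, (p @ p + m**2) * np.eye(4))
assert np.allclose(np.sort(np.linalg.eigvalsh(H)), [-E, -E, E, E])
```

The doubly degenerate negative eigenvalues are the negative-energy states that motivate the hole-theory discussion later in this section.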

According to the classical theory (i.e., the theory before the quantum field theory) based on the one-particle wave function and its probabilistic interpretation, the integrand of (22.221) can be viewed as an expectation value of the one-particle energy if $\psi(x)$ is normalized. Then, the classical Schrödinger equation could be described as
\[
H\psi(x) = \left( -i\boldsymbol{\alpha}\cdot\nabla + m\beta \right)\psi(x) = i\partial_0 \psi(x), \tag{22.222}
\]
where $H$ represents the one-particle Hamiltonian. Then, (22.222) is written as

\[
\left( -i\sum_{k=1}^{3} \gamma^0\gamma^k \partial_k + \gamma^0 m - \gamma^0\gamma^0 i\partial_0 \right)\psi(x) = 0. \tag{22.223}
\]
Multiplying both sides of (22.223) by $\gamma^0$ from the left, we obtain
\[
\left( -i\sum_{k=1}^{3} \gamma^k \partial_k + m - i\gamma^0 \partial_0 \right)\psi(x) = 0.
\]
Rearranging the terms, we get
\[
\left( i\gamma^\mu \partial_\mu - m \right)\psi(x) = 0. \tag{22.213}
\]

Thus, once again we have recovered the Dirac equation. If Schrödinger had invented the gamma matrices, he would have discovered the Dirac equation (22.213). If we choose $\phi(x)$ of (21.59) as the solution of the Dirac equation, $H$ is identical with $H_p$ of (21.131) and gives a positive one-particle energy $p_0$. If, however, $\chi(x)$ of (21.59) is chosen, $H$ is identical with $H_{-p}$, which gives a negative energy $-p_0$, in agreement with the result of Sect. 21.3.3.

Unlike the classical theory, $\psi(x)$ must be dealt with as a quantum-mechanical operator. Instead of the commutation relation given in (22.84) for the scalar field, we use the following equal-time anti-commutation relations (see Sect. 21.3.1) described by
\[
\{ \psi(t, \mathbf{x}), \pi(t, \mathbf{y}) \} = \{ \psi(t, \mathbf{x}), i\psi^{\dagger}(t, \mathbf{y}) \} = i\delta^3(\mathbf{x} - \mathbf{y}), \tag{22.224}
\]
\[
\{ \psi(t, \mathbf{x}), \psi^T(t, \mathbf{y}) \} = \{ \pi^T(t, \mathbf{x}), \pi(t, \mathbf{y}) \} = 0.
\]

The expression of (22.224) is somewhat symbolic. Dealing with the anti-commutators of (22.224) requires caution, because although $\psi(t, \mathbf{x})\pi(t, \mathbf{y})$ is a $(4, 4)$ matrix, $\pi(t, \mathbf{y})\psi(t, \mathbf{x})$ is a $(1, 1)$ matrix. To avoid this inconsistency, we define
\[
\{ \psi(t, \mathbf{x}), \pi(t, \mathbf{y}) \} \equiv \psi(t, \mathbf{x})\pi(t, \mathbf{y}) + \left( \pi(t, \mathbf{y})^T \left[ \psi(t, \mathbf{x}) \right]^T \right)^T. \tag{22.225}
\]
In the second term of (22.225), $\pi(t, \mathbf{y})\psi(t, \mathbf{x})$ is replaced with $\pi(t, \mathbf{y})^T[\psi(t, \mathbf{x})]^T$, and the components of $\pi(t, \mathbf{y})$ are placed in front of those of $\psi(t, \mathbf{x})$. Another word of caution is that both $\psi(t, \mathbf{x})$ and $\pi(t, \mathbf{y})$ are Grassmann numbers. Let $a$ and $b$ be Grassmann numbers. Then, we have
\[
ab = -ba, \quad a^2 = b^2 = 0.
\]
Let $A = (a_{ij})$ and $B = (b_{ij})$ be matrices whose matrix elements are Grassmann numbers. Unlike ordinary c-numbers, we have $a_{ij}b_{kl} \neq b_{kl}a_{ij}$. Hence, the standard matrix multiplication rules must be modified. If the numbers we are dealing with were not Grassmann numbers but ordinary c-numbers, we would simply have

\[
\left( \pi(t, \mathbf{y})^T \left[ \psi(t, \mathbf{x}) \right]^T \right)^T = \psi(t, \mathbf{x})\pi(t, \mathbf{y}).
\]

This equation, however, is not true in the present case, where both $\psi(t, \mathbf{x})$ and $\pi(t, \mathbf{y})$ are Grassmann numbers. To highlight this feature, let us think of the following simple example.

Example 22.1 Suppose that we have matrices
\[
A = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{pmatrix} \quad \text{and} \quad B = ( b_1\ b_2\ b_3\ b_4 ). \tag{22.226}
\]
Then, we have

\[
AB = \begin{pmatrix}
a_1 b_1 & a_1 b_2 & a_1 b_3 & a_1 b_4 \\
a_2 b_1 & a_2 b_2 & a_2 b_3 & a_2 b_4 \\
a_3 b_1 & a_3 b_2 & a_3 b_3 & a_3 b_4 \\
a_4 b_1 & a_4 b_2 & a_4 b_3 & a_4 b_4
\end{pmatrix}, \quad
\left( B^T A^T \right)^T = \begin{pmatrix}
b_1 a_1 & b_2 a_1 & b_3 a_1 & b_4 a_1 \\
b_1 a_2 & b_2 a_2 & b_3 a_2 & b_4 a_2 \\
b_1 a_3 & b_2 a_3 & b_3 a_3 & b_4 a_3 \\
b_1 a_4 & b_2 a_4 & b_3 a_4 & b_4 a_4
\end{pmatrix}. \tag{22.227}
\]

If all the above numbers are ordinary c-numbers, we have no problem with $AB = (B^T A^T)^T$. But if we are dealing with Grassmann numbers, we have $AB \neq (B^T A^T)^T$. If we express the anti-commutator between $A$ and $B$, symbolically we have
\[
\{ A, B \} = AB + \left( B^T A^T \right)^T = \begin{pmatrix}
a_1 b_1 + b_1 a_1 & a_1 b_2 + b_2 a_1 & a_1 b_3 + b_3 a_1 & a_1 b_4 + b_4 a_1 \\
a_2 b_1 + b_1 a_2 & a_2 b_2 + b_2 a_2 & a_2 b_3 + b_3 a_2 & a_2 b_4 + b_4 a_2 \\
a_3 b_1 + b_1 a_3 & a_3 b_2 + b_2 a_3 & a_3 b_3 + b_3 a_3 & a_3 b_4 + b_4 a_3 \\
a_4 b_1 + b_1 a_4 & a_4 b_2 + b_2 a_4 & a_4 b_3 + b_3 a_4 & a_4 b_4 + b_4 a_4
\end{pmatrix}.
\]
What we have done in the above calculation is to compute $a_i b_k + b_k a_i$ ($i, k = 1, 2, 3, 4$) for the individual matrix elements.
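The bookkeeping of Example 22.1 can be mirrored symbolically. The sympy symbols below are merely noncommutative (not nilpotent Grassmann generators), which is already enough to exhibit that $AB$ and $(B^T A^T)^T$ differ componentwise and that the anticommutator has entries $a_i b_k + b_k a_i$.

```python
import sympy as sp

a = sp.symbols('a1:5', commutative=False)
b = sp.symbols('b1:5', commutative=False)
A = sp.Matrix(4, 1, list(a))    # column vector, as in (22.226)
B = sp.Matrix(1, 4, list(b))    # row vector

AB = A * B                      # (4, 4) matrix with entries a_i b_k
BTAT_T = (B.T * A.T).T          # (4, 4) matrix with entries b_k a_i

assert AB[0, 1] == a[0] * b[1]
assert BTAT_T[0, 1] == b[1] * a[0]
assert AB != BTAT_T             # equality would require a_i b_k = b_k a_i

anti = AB + BTAT_T              # symbolic {A, B}: entries a_i b_k + b_k a_i
assert anti[2, 3] == a[2] * b[3] + b[3] * a[2]
```

For genuine Grassmann generators one would additionally impose $a_i b_k = -b_k a_i$ and $a_i^2 = b_k^2 = 0$, which sympy's noncommutative symbols do not encode.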

Thus, this simple example shows that care should be taken when dealing with Grassmann numbers, particularly in the case of matrix calculations. □

Getting back to our present issue, we write the first equation of (22.224) in a matrix form to get

\[
\begin{pmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \\ \psi_4 \end{pmatrix}
( \pi_1\ \pi_2\ \pi_3\ \pi_4 )
+ \left[ \begin{pmatrix} \pi_1 \\ \pi_2 \\ \pi_3 \\ \pi_4 \end{pmatrix}
( \psi_1\ \psi_2\ \psi_3\ \psi_4 ) \right]^T
= \begin{pmatrix}
i\delta^3(\mathbf{x}-\mathbf{y}) & & & \\
& i\delta^3(\mathbf{x}-\mathbf{y}) & & \\
& & i\delta^3(\mathbf{x}-\mathbf{y}) & \\
& & & i\delta^3(\mathbf{x}-\mathbf{y})
\end{pmatrix}, \tag{22.228}
\]
where on the RHS all the off-diagonal elements vanish, with the diagonal elements being $i\delta^3(\mathbf{x}-\mathbf{y})$. That is, the RHS of (22.228) is a diagonal matrix. Also, notice that using (22.218), (22.228) can be rewritten as

\[
\begin{pmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \\ \psi_4 \end{pmatrix}
( \psi_1^*\ \psi_2^*\ \psi_3^*\ \psi_4^* )
+ \left[ \begin{pmatrix} \psi_1^* \\ \psi_2^* \\ \psi_3^* \\ \psi_4^* \end{pmatrix}
( \psi_1\ \psi_2\ \psi_3\ \psi_4 ) \right]^T
= \begin{pmatrix}
\delta^3(\mathbf{x}-\mathbf{y}) & & & \\
& \delta^3(\mathbf{x}-\mathbf{y}) & & \\
& & \delta^3(\mathbf{x}-\mathbf{y}) & \\
& & & \delta^3(\mathbf{x}-\mathbf{y})
\end{pmatrix}. \tag{22.229}
\]

Another way to show (22.228) is to write down the matrix elements in such a way that [8]
\[
\{ \psi_\alpha(t, \mathbf{x}), \pi_\beta(t, \mathbf{y}) \} = i\delta_{\alpha\beta}\, \delta^3(\mathbf{x} - \mathbf{y}) \quad (\alpha, \beta = 1, 2, 3, 4). \tag{22.230}
\]
Namely,
\[
\psi_\alpha(t, \mathbf{x})\pi_\beta(t, \mathbf{y}) + \pi_\beta(t, \mathbf{y})\psi_\alpha(t, \mathbf{x}) = i\delta_{\alpha\beta}\, \delta^3(\mathbf{x} - \mathbf{y}). \tag{22.231}
\]

Note that neither $\alpha$ nor $\beta$ is an index indicating a four-vector; both $\alpha$ and $\beta$ are spinor indices. For this reason, we specify $\alpha, \beta = 1, 2, 3, 4$ instead of $\alpha, \beta = 0, 1, 2, 3$. Equations (22.230) and (22.231) imply the presence of a transposed matrix, which has appeared in (22.228) and (22.229). Unlike (22.84), which describes the commutation relation of the scalar field, the anti-commutation relations for the Dirac field, (22.224), (22.229), and (22.230), are represented by $(4, 4)$ matrices.

22.4.2 Quantization Procedure of the Dirac Field

The quantization of the Dirac field can be performed basically in parallel with that of the scalar field. That is, (A) the operator $\psi(x)$ needs to be Fourier-expanded in an invariant form. (B) The creation operator and annihilation operator are introduced. (C) The anti-commutation relations between the creation and annihilation operators need to be established. The calculation processes, however, differ from the scalar field quantization in several points. For instance, (a) $\psi(x)$ does not need to be Hermitian. (b) The calculations are somewhat time-consuming in that we have to perform $(4, 4)$ matrix calculations. (But they are straightforward.)

A. We start with the Fourier expansion of $\psi(x)$. In Sect. 22.3.6 we had
\[
\Phi(x) = \int_{-\infty}^{\infty} \frac{d\mathbf{p}^3}{\sqrt{(2\pi)^3\, 2E_p}} \left[ e^{-ipx}\, a_+(\mathbf{p})\Phi_+(\mathbf{p}, E_p) + e^{ipx}\, a_-^{\dagger}(-\mathbf{p})\Phi_-(-\mathbf{p}, -E_p) \right]. \tag{22.208}
\]
In the Dirac field, we must include an additional degree of freedom, the helicity $h$, which takes the value $1$ or $-1$. Accordingly, the Fourier expansion of the identity operator (22.203) is reformulated as
\[
E = \sum_{h = \pm 1} \int_{-\infty}^{\infty} \delta(p^2 - m^2)\, |p, h\rangle\langle p, h|\, dp^4.
\]

Hence, (22.208) is rewritten as
\[
\Phi(x) = \int_{-\infty}^{\infty} \frac{d\mathbf{p}^3}{\sqrt{(2\pi)^3\, 2E_p}} \sum_{h = \pm 1} \left[ e^{-ipx}\, a_+(\mathbf{p}, h)\Phi_+(\mathbf{p}, E_p, h) + e^{ipx}\, a_-^{\dagger}(-\mathbf{p}, h)\Phi_-(-\mathbf{p}, -E_p, h) \right].
\]


B. To this relation, we wish to introduce the creation operator and the annihilation operator. To this end, we rewrite the field quantity $\Phi(x)$ using the following correspondences given by [8]
\[
a_+(\mathbf{p}, h) \leftrightarrow b(\mathbf{p}, h) \quad \text{and} \quad a_-^{\dagger}(-\mathbf{p}, h) \leftrightarrow d^{\dagger}(\mathbf{p}, h),
\]
as well as
\[
\Phi_+(\mathbf{p}, E_p, h) \leftrightarrow u(\mathbf{p}, h) \quad \text{and} \quad \Phi_-(-\mathbf{p}, -E_p, h) \leftrightarrow v(\mathbf{p}, h).
\]
With respect to the Dirac field, using $\psi(x)$ in place of $\Phi(x)$ in the above expression, we get [8]
\[
\psi(x) = \int_{-\infty}^{\infty} \frac{d\mathbf{p}^3}{\sqrt{(2\pi)^3\, 2p_0}} \sum_{h = \pm 1} \left[ e^{-ipx}\, b(\mathbf{p}, h)u(\mathbf{p}, h) + e^{ipx}\, d^{\dagger}(\mathbf{p}, h)v(\mathbf{p}, h) \right], \tag{22.232}
\]
where $p_0$ was used instead of $E_p$ in (22.208). The operators $b(\mathbf{p}, h)$ and $d^{\dagger}(\mathbf{p}, h)$ are an annihilation operator of an electron and a creation operator of a positron, respectively.

The introduction of $d^{\dagger}(\mathbf{p}, h)$ needs some explanation. So far, we have permitted the negative-energy state. However, if a particle carrying a negative energy did exist in the real world, we would face an intolerable situation. To get rid of this big hurdle, Dirac proposed the hole theory (1930). Even though several aspects of the hole theory are explained in the literature [6, 8], we wish to add a few comments on the theory. Dirac thought that the vacuum consisted of electrons that occupied all the negative-energy states (strictly speaking, the energy states with an energy of $-m$ or less, where $m$ is the electron mass). If an electron happened to be excited so as to have a positive energy, the hole created within the negative-energy electrons would behave as if it were a particle carrying a positive charge ($|e| = -e$) and a positive energy with a rest mass $m$, i.e., a positron. The positron predicted by Dirac was indeed soon discovered experimentally by Carl Anderson (1932).

Getting back to the present issue, the operator $d^{\dagger}(\mathbf{p}, h)$ might well be interpreted as an annihilation operator of an electron carrying a negative energy. But once the positron was discovered in the real world, $d^{\dagger}(\mathbf{p}, h)$ must be interpreted as the creation operator of a positron. Notice again that, according to the custom, the annihilation operator $b(\mathbf{p}, h)$ is a “coefficient” of the positive-energy state $u(\mathbf{p}, h)$ and the creation operator $d^{\dagger}(\mathbf{p}, h)$ is a coefficient of the negative-energy state $v(\mathbf{p}, h)$. The anti-commutation relations will be introduced because the electron is a fermion. At the same time, the anti-commutation relations facilitate various calculations (vide infra). Note that unlike the scalar field $\phi(x)$ described in (22.98), $\psi(x)$ is

asymmetric with respect to the creation and annihilation operators, reflecting the fact that $\psi(x)$ is not Hermitian.

C. Our next task is to translate the canonical anti-commutation relations (22.224) into anti-commutation relations between the creation and annihilation operators. To this end, we wish first to solve (22.232) with respect to $b(\mathbf{p}, h)$. The calculation procedures are as follows:

(a) Multiply both sides of (22.232) by $e^{-i\mathbf{q}\cdot\mathbf{x}}$ and integrate with respect to $\mathbf{x}$, applying (22.101). As a result, we obtain

\[
\int_{-\infty}^{\infty} d\mathbf{x}^3\, e^{-i\mathbf{q}\cdot\mathbf{x}}\, \psi(x) = \sqrt{\frac{(2\pi)^3}{2q_0}} \sum_{h' = \pm 1} \left[ e^{-iq_0 t}\, b(\mathbf{q}, h')u(\mathbf{q}, h') + e^{iq_0 t}\, d^{\dagger}(-\mathbf{q}, h')v(-\mathbf{q}, h') \right]. \tag{22.233}
\]

(b) Multiplying both sides of (22.233) by $u^{\dagger}(\mathbf{q}, h)$ from the left and using (21.156) and (21.157), we delete the term containing $d^{\dagger}$ from (22.233) to get
\[
\int_{-\infty}^{\infty} d\mathbf{x}^3\, e^{-i\mathbf{q}\cdot\mathbf{x}}\, u^{\dagger}(\mathbf{q}, h)\psi(x) = \sqrt{(2\pi)^3\, 2q_0}\, e^{-iq_0 t}\, b(\mathbf{q}, h). \tag{22.234}
\]
Rewriting (22.234), we have
\[
b(\mathbf{q}, h) = \frac{1}{\sqrt{(2\pi)^3\, 2q_0}} \int_{-\infty}^{\infty} d\mathbf{x}^3\, e^{iqx}\, u^{\dagger}(\mathbf{q}, h)\psi(x). \tag{22.235}
\]
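The normalization (21.156) and the orthogonality used to delete the $d^{\dagger}$-term in passing from (22.233) to (22.234) can be verified numerically with explicit spinors. The sketch below assumes the Dirac representation and a spin (not helicity) basis, for which the same relations $u^{\dagger}(\mathbf{p}, s)u(\mathbf{p}, s') = 2p_0\delta_{ss'}$ and $u^{\dagger}(\mathbf{p}, s)v(-\mathbf{p}, s') = 0$ hold; $m$ and $\mathbf{p}$ are arbitrary test values.

```python
import numpy as np

s = [np.array([[0, 1], [1, 0]], complex),
     np.array([[0, -1j], [1j, 0]], complex),
     np.array([[1, 0], [0, -1]], complex)]           # Pauli matrices
chi = [np.array([1, 0], complex), np.array([0, 1], complex)]

m, p = 1.0, np.array([0.5, -0.2, 0.8])               # arbitrary test values
E = np.sqrt(p @ p + m**2)                            # E depends only on |p|

def u(pvec, i):                                      # positive-energy spinor
    sp = sum(pk * sk for pk, sk in zip(pvec, s))     # sigma . p
    return np.sqrt(E + m) * np.concatenate([chi[i], sp @ chi[i] / (E + m)])

def v(pvec, i):                                      # negative-energy spinor
    sp = sum(pk * sk for pk, sk in zip(pvec, s))
    return np.sqrt(E + m) * np.concatenate([sp @ chi[i] / (E + m), chi[i]])

for i in range(2):
    for j in range(2):
        # u^dag(p, s) u(p, s') = 2 p0 delta_ss'  [cf. (21.156)]
        assert np.isclose(np.vdot(u(p, i), u(p, j)), 2 * E * (i == j))
        # u^dag(p, s) v(-p, s') = 0              [orthogonality]
        assert np.isclose(np.vdot(u(p, i), v(-p, j)), 0)
```

The same checks pass for $v^{\dagger}v = 2p_0\delta_{ss'}$, so the factor $2q_0$ that appears on the RHS of (22.234) is exactly the spinor normalization.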

(c) Next, we determine the anti-commutation relations between $b(\mathbf{q}, h)$ and $d(\mathbf{q}, h)$ (and their adjoints). For example,
\[
\begin{aligned}
\left\{ b(\mathbf{p}, h), b^{\dagger}(\mathbf{q}, h') \right\}
&= \frac{1}{(2\pi)^3 \sqrt{2p_0 \cdot 2q_0}} \int_{-\infty}^{\infty} d\mathbf{x}^3 \int_{-\infty}^{\infty} d\mathbf{y}^3\, e^{ipx}\, e^{-iqy} \\
&\quad \times \left[ u^{\dagger}(\mathbf{p}, h)\psi(x)\psi^{\dagger}(y)u(\mathbf{q}, h') + \psi^{\dagger}(y)u(\mathbf{q}, h')u^{\dagger}(\mathbf{p}, h)\psi(x) \right]. \tag{22.236}
\end{aligned}
\]

The RHS is a $(1, 1)$ matrix containing the Grassmann numbers $(\psi)_\alpha$, $(\psi^*)_\beta$ ($\alpha, \beta = 1, 2, 3, 4$). The quantity $(u)_\alpha$, however, is an ordinary c-number. Note that the matrix multiplication $(1, 4) \times (4, 1) \times (1, 4) \times (4, 1)$ produces a $(1, 1)$ matrix. The matrix calculations are somewhat tedious, but after termwise estimation of (22.236) we are left with a simple form by virtue of the anti-commutation relation (22.224). The calculation processes are as follows. Regarding $u^{\dagger}(\mathbf{p}, h)\psi(x)\psi^{\dagger}(y)u(\mathbf{q}, h')$, its full description is given by
\[
u^{\dagger}(\mathbf{p}, h)\psi(x)\psi^{\dagger}(y)u(\mathbf{q}, h')
= ( u_1^*(\mathbf{p}, h)\ u_2^*(\mathbf{p}, h)\ u_3^*(\mathbf{p}, h)\ u_4^*(\mathbf{p}, h) )
\begin{pmatrix} \psi_1(x) \\ \psi_2(x) \\ \psi_3(x) \\ \psi_4(x) \end{pmatrix}
( \psi_1(y)^*\ \psi_2(y)^*\ \psi_3(y)^*\ \psi_4(y)^* )
\begin{pmatrix} u_1(\mathbf{q}, h') \\ u_2(\mathbf{q}, h') \\ u_3(\mathbf{q}, h') \\ u_4(\mathbf{q}, h') \end{pmatrix}. \tag{22.237}
\]
Meanwhile, $\psi^{\dagger}(y)u(\mathbf{q}, h')u^{\dagger}(\mathbf{p}, h)\psi(x)$ is similarly expressed as in (22.237). After the calculations, we get
\[
u^{\dagger}(\mathbf{p}, h)\psi(x)\psi^{\dagger}(y)u(\mathbf{q}, h') + \psi^{\dagger}(y)u(\mathbf{q}, h')u^{\dagger}(\mathbf{p}, h)\psi(x)
= \sum_{\alpha=1}^{4} u_\alpha^*\, u_\alpha\, \delta^3(\mathbf{x} - \mathbf{y}) = 2p_0\, \delta_{hh'}\, \delta^3(\mathbf{x} - \mathbf{y}), \tag{22.238}
\]

where with the last equality we used (21.156) and (22.229). Inserting (22.238) into (22.236), we obtain
\[
\begin{aligned}
\left\{ b(\mathbf{p}, h), b^{\dagger}(\mathbf{q}, h') \right\}
&= \frac{2p_0\, e^{it(p_0 - q_0)}\, \delta_{hh'}}{(2\pi)^3 \sqrt{2p_0 \cdot 2q_0}} \int_{-\infty}^{\infty} d\mathbf{x}^3 \int_{-\infty}^{\infty} d\mathbf{y}^3\, e^{i(\mathbf{q}\cdot\mathbf{y} - \mathbf{p}\cdot\mathbf{x})}\, \delta^3(\mathbf{x} - \mathbf{y}) \\
&= \frac{2p_0\, e^{it(p_0 - q_0)}\, \delta_{hh'}}{(2\pi)^3 \sqrt{2p_0 \cdot 2q_0}} \int_{-\infty}^{\infty} d\mathbf{x}^3\, e^{i\mathbf{x}\cdot(\mathbf{q} - \mathbf{p})} \\
&= \frac{2p_0\, e^{it(p_0 - q_0)}\, \delta_{hh'}}{(2\pi)^3 \sqrt{2p_0 \cdot 2q_0}}\, (2\pi)^3\, \delta^3(\mathbf{q} - \mathbf{p}) = \delta_{hh'}\, \delta^3(\mathbf{q} - \mathbf{p}),
\end{aligned} \tag{22.239}
\]

where with the second to the last equality we used (22.101); the last equality was due to the inclusion of $\delta^3(\mathbf{q} - \mathbf{p})$, which leads to $q_0 - p_0 = 0$. Thus, the result of (22.239) has largely been simplified. We remark that the above calculation procedures offer an interesting example of a computation involving both Grassmann numbers and ordinary c-numbers.

Other anti-commutation relations can be derived similarly. For example, we obtain the following relation as in the case of (22.235):
\[
d^{\dagger}(\mathbf{q}, h) = \frac{1}{\sqrt{(2\pi)^3\, 2q_0}} \int_{-\infty}^{\infty} d\mathbf{x}^3\, e^{-iqx}\, v^{\dagger}(\mathbf{q}, h)\psi(x). \tag{22.240}
\]

Also, proceeding similarly to (22.239), we have
\[
\left\{ d(\mathbf{p}, h), d^{\dagger}(\mathbf{q}, h') \right\} = \delta_{hh'}\, \delta^3(\mathbf{q} - \mathbf{p}). \tag{22.241}
\]
With the other combinations of the $b$-related operators and $d$-related operators, we have the following anti-commutation relations such that
\[
\begin{aligned}
\{ b(\mathbf{p}, h), b(\mathbf{q}, h') \} &= \{ b(\mathbf{p}, h), d(\mathbf{q}, h') \} = \{ b(\mathbf{p}, h), d^{\dagger}(\mathbf{q}, h') \} = \{ b^{\dagger}(\mathbf{p}, h), b^{\dagger}(\mathbf{q}, h') \} \\
&= \{ b^{\dagger}(\mathbf{p}, h), d(\mathbf{q}, h') \} = \{ b^{\dagger}(\mathbf{p}, h), d^{\dagger}(\mathbf{q}, h') \} = \{ d(\mathbf{p}, h), d(\mathbf{q}, h') \} = \{ d^{\dagger}(\mathbf{p}, h), d^{\dagger}(\mathbf{q}, h') \} = 0. \tag{22.242}
\end{aligned}
\]
These results can be viewed in parallel with (22.105), in spite of the difference between the commutation and anti-commutation relations and the difference in the presence or absence of the helicity freedom. Since (22.242) does not contain space–time coordinates, it is useful for various calculations relating to the quantum fields.
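The operator algebra (22.239), (22.241), and (22.242) can be illustrated on a finite-dimensional Fock space. Below, two fermionic modes standing in for $b$ and $d$ at one fixed momentum and helicity are realized by a Jordan-Wigner construction; this is a toy discretization for illustration, not the continuum field itself.

```python
import numpy as np

sm = np.array([[0, 1], [0, 0]], complex)    # sigma^-: annihilates |1>
sz = np.array([[1, 0], [0, -1]], complex)
I2 = np.eye(2, dtype=complex)

b = np.kron(sm, I2)                         # mode 1 ("b")
d = np.kron(sz, sm)                         # mode 2 ("d"), Jordan-Wigner string

anti = lambda X, Y: X @ Y + Y @ X
Id = np.eye(4)

assert np.allclose(anti(b, b.conj().T), Id)   # {b, b^dag} = 1
assert np.allclose(anti(d, d.conj().T), Id)   # {d, d^dag} = 1
assert np.allclose(anti(b, d), 0)             # {b, d} = 0
assert np.allclose(anti(b, d.conj().T), 0)    # {b, d^dag} = 0
assert np.allclose(b @ b, 0)                  # b^2 = 0 (Pauli exclusion)
```

The $\sigma_z$ string in $d$ is what enforces anticommutation between different modes; without it, $b$ and $d$ would commute instead.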

22.4.3 Antiparticle: Positron

The Dirac field is characterized by the antiparticles. The antiparticle of electron is the positron. In Sect. 21.4 we have introduced the notion of the charge conjugation. It can be regarded as the particle–antiparticle transformation. Strictly speaking, it is not until we properly describe the field quantization by introducing the creation and annihilation operators that the charge conjugation has its place. Nonetheless, it is of great use in a mathematical approach to introduce the notion of charge conjugation into the spinor field before being quantized. In fact, the formulation of the charge

conjugation developed in Sect. 21.4 can directly be imported, along with the creation and annihilation operators, into the field quantization of the antiparticle. In Sect. 21.4 the charge conjugation of the spinor $\psi$ into $\psi^C$ was described as
\[
\psi^C = C\bar{\psi}^T = i\gamma^2 \psi^*. \tag{21.161}
\]

Borrowing this notation and using (22.232), the antiparticle field $\psi^{\mathrm{anti}}(x)$ is expressed as
\[
\psi^{\mathrm{anti}}(x) = \psi^C(x) = i\gamma^2 \psi^*
= \int_{-\infty}^{\infty} \frac{d\mathbf{p}^3}{\sqrt{(2\pi)^3\, 2p_0}} \sum_{h = \pm 1} \left[ e^{ipx}\, b^{\dagger}(\mathbf{p}, h)\, i\gamma^2 u^*(\mathbf{p}, h) + e^{-ipx}\, d(\mathbf{p}, h)\, i\gamma^2 v^*(\mathbf{p}, h) \right]. \tag{22.243}
\]
Further using (21.166) and (21.167), we get
\[
\psi^{\mathrm{anti}}(x) = \int_{-\infty}^{\infty} \frac{d\mathbf{p}^3}{\sqrt{(2\pi)^3\, 2p_0}} \sum_{h = \pm 1} \left[ e^{ipx}\, b^{\dagger}(\mathbf{p}, h)v(\mathbf{p}, h) + e^{-ipx}\, d(\mathbf{p}, h)u(\mathbf{p}, h) \right]. \tag{22.244}
\]

Comparing (22.232) and (22.244), we find that $\psi(x)$ and $\psi^{\mathrm{anti}}(x)$ can be obtained from each other merely by exchanging $b(\mathbf{p}, h)$ and $d(\mathbf{p}, h)$. The discrimination between the particle and the antiparticle is associated with whether the operator $\psi(x)$ is Hermitian or not. Since in the case of the neutral scalar field the operator $\phi(x)$ is Hermitian, there is no distinction between the particle and the antiparticle. With the Dirac field, however, the operator $\psi(x)$ is not Hermitian, and the particle and antiparticle must be distinguished. We may think that destroying the negative-energy state is equivalent to creating the positive-energy state, or vice versa. Interested readers are referred to appropriate literature for more detailed discussions on the particle and antiparticle as well as their implications [8, 12].
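The statement that $i\gamma^2\psi^*$ exchanges the roles of the positive- and negative-energy solutions can be checked numerically in a convention-independent way: if $u(\mathbf{p}, s)$ satisfies $(\not{p} - m)u = 0$, then $i\gamma^2 u^*(\mathbf{p}, s)$ satisfies $(\not{p} + m)\psi = 0$, i.e., it is a $v$-type spinor. The sketch assumes the Dirac representation; $m$ and $\mathbf{p}$ are arbitrary test values.

```python
import numpy as np

s = [np.array([[0, 1], [1, 0]], complex),
     np.array([[0, -1j], [1j, 0]], complex),
     np.array([[1, 0], [0, -1]], complex)]
I2, Z2 = np.eye(2, dtype=complex), np.zeros((2, 2), complex)

g0 = np.block([[I2, Z2], [Z2, -I2]])                 # gamma^0 (Dirac rep.)
g = [np.block([[Z2, sk], [-sk, Z2]]) for sk in s]    # gamma^1, gamma^2, gamma^3

m, p = 1.0, np.array([0.5, -0.2, 0.8])               # arbitrary test values
E = np.sqrt(p @ p + m**2)
sp = sum(pk * sk for pk, sk in zip(p, s))
chi = np.array([1, 0], complex)
u = np.sqrt(E + m) * np.concatenate([chi, sp @ chi / (E + m)])

pslash = E * g0 - sum(pk * gk for pk, gk in zip(p, g))

assert np.allclose(pslash @ u, m * u)                # (pslash - m) u = 0
w = 1j * g[1] @ u.conj()                             # i gamma^2 u*, cf. (21.161)
assert np.allclose(pslash @ w, -m * w)               # (pslash + m) w = 0
```

The relative phase between $i\gamma^2 u^*(\mathbf{p}, s)$ and a given $v(\mathbf{p}, h)$ depends on the spinor conventions, which is why the check above tests the defining equation rather than a specific equality.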

22.4.4 Invariant Delta Functions of the Dirac Field

In this section we introduce the invariant delta functions of the Dirac field. As in the case of the scalar field, these functions are of great importance in studying the interaction of quantum fields. The derivation of these functions is parallel with those of the scalar field. We have performed the canonical quantization by setting the equal-time anticommutation relation (22.224). Combining (22.224) with the Dirac field expansion


described by (22.232) and (22.244), we introduced the anti-commutation relations between the creation and annihilation operators expressed as (22.239), (22.241), and (22.242). Further combining the Dirac field expansion with the anti-commutation relations between the creation and annihilation operators, we wish to seek the covariant anti-commutation relations between the Dirac fields. Note that unlike the equal-time anti-commutation relation (22.224), the covariant anti-commutation relations do not contain the (equal) time variable explicitly. We obtain the following covariant relations at once:
\[
\{ \psi(x), \psi(y) \} = \{ \bar{\psi}(x), \bar{\psi}(y) \} = 0. \tag{22.245}
\]

With this expression, see Example 22.1. To explore other covariant anti-commutation relations of the Dirac field, we decompose the field operators into the positive-frequency and negative-frequency parts, as in the case of the scalar field (Sect. 22.3.4). For example, from (22.232) we have
\[
\psi^{+}(x) = \int \frac{d^3 p}{\sqrt{(2\pi)^3\, 2p_0}} \sum_{h=\pm1} e^{-ipx}\, b(p,h)\, u(p,h),
\qquad
\bar{\psi}^{-}(x) = \int \frac{d^3 p}{\sqrt{(2\pi)^3\, 2p_0}} \sum_{h=\pm1} e^{ipx}\, b^{\dagger}(p,h)\, \bar{u}(p,h). \tag{22.246}
\]

Similarly, the pair of functions ψ⁻(x) and ψ̄⁺(x) is obtained as
\[
\psi^{-}(x) = \int \frac{d^3 p}{\sqrt{(2\pi)^3\, 2p_0}} \sum_{h=\pm1} e^{ipx}\, d^{\dagger}(p,h)\, v(p,h),
\qquad
\bar{\psi}^{+}(x) = \int \frac{d^3 p}{\sqrt{(2\pi)^3\, 2p_0}} \sum_{h=\pm1} e^{-ipx}\, d(p,h)\, \bar{v}(p,h).
\]
From (22.246) we get
\[
\{\psi^{+}(x), \bar{\psi}^{-}(y)\}
= \frac{1}{(2\pi)^3} \int\!\!\int \frac{d^3 q\, d^3 p}{\sqrt{2p_0 \cdot 2q_0}} \sum_{h,h'=\pm1} e^{-i(px-qy)}
\left[ b(p,h)\, b^{\dagger}(q,h')\, u(p,h)\bar{u}(q,h') + b^{\dagger}(q,h')\, b(p,h)\, \bar{u}(q,h') u(p,h) \right]. \tag{22.247}
\]
As implied in Example 22.1, the factor ū(q,h′)u(p,h) in the second term of the integrand of (22.247) must be read as

\[
\bar{u}(q,h')\, u(p,h) = \left[ \bar{u}(q,h')^{\mathsf T}\, u(p,h)^{\mathsf T} \right]^{\mathsf T} = u(p,h)\, \bar{u}(q,h'). \tag{22.248}
\]

Notice that since (22.248) does not contain the Grassmann numbers, unlike (22.227) the second equality of (22.248) holds. Then, (22.247) can be rewritten as
\[
\begin{aligned}
\{\psi^{+}(x), \bar{\psi}^{-}(y)\}
&= \frac{1}{(2\pi)^3} \int\!\!\int \frac{d^3q\, d^3p}{\sqrt{2p_0 \cdot 2q_0}} \sum_{h,h'=\pm1} e^{-i(px-qy)}\, u(p,h)\bar{u}(q,h') \left[ b(p,h)\, b^{\dagger}(q,h') + b^{\dagger}(q,h')\, b(p,h) \right] \\
&= \frac{1}{(2\pi)^3} \int\!\!\int \frac{d^3q\, d^3p}{\sqrt{2p_0 \cdot 2q_0}} \sum_{h,h'=\pm1} e^{-i(px-qy)}\, u(p,h)\bar{u}(q,h') \left\{ b(p,h),\, b^{\dagger}(q,h') \right\} \\
&= \frac{1}{(2\pi)^3} \int\!\!\int \frac{d^3q\, d^3p}{\sqrt{2p_0 \cdot 2q_0}} \sum_{h,h'=\pm1} e^{-i(px-qy)}\, u(p,h)\bar{u}(q,h')\, \delta_{hh'}\, \delta^3(\boldsymbol{q}-\boldsymbol{p}) \qquad [\dagger] \\
&= \frac{1}{(2\pi)^3} \int \frac{d^3p}{2p_0}\, e^{-ip(x-y)} \sum_{h=\pm1} u(p,h)\bar{u}(p,h) \\
&= \frac{1}{(2\pi)^3} \int \frac{d^3p}{2p_0}\, e^{-ip(x-y)} \left( p_\mu \gamma^\mu + m \right) \qquad [\dagger\dagger] \\
&= \left( i\gamma^\mu \partial_\mu^x + m \right) \frac{1}{(2\pi)^3} \int \frac{d^3p}{2p_0}\, e^{-ip(x-y)}
 = \left( i\gamma^\mu \partial_\mu^x + m \right) i\Delta^{+}(x-y). \qquad [\dagger\dagger\dagger]
\end{aligned} \tag{22.249}
\]

At [†] in (22.249) we used (22.239); at [††] we used (21.144); at [†††] we used (22.164). In (22.249), ∂_μ^x means the partial differentiation with respect to x. Similarly calculating {ψ⁻(x), ψ̄⁺(y)}, we obtain
\[
\{\psi^{-}(x), \bar{\psi}^{+}(y)\} = \left( i\gamma^\mu \partial_\mu^x + m \right) i\Delta^{-}(x-y),
\]
where Δ⁻(x − y) is identical with that appearing in (22.168). Also, we define
\[
\left( i\gamma^\mu \partial_\mu^x + m \right) i\Delta^{+}(x-y) \equiv iS^{+}(x-y) = \{\psi^{+}(x), \bar{\psi}^{-}(y)\},
\qquad
\left( i\gamma^\mu \partial_\mu^x + m \right) i\Delta^{-}(x-y) \equiv iS^{-}(x-y) = \{\psi^{-}(x), \bar{\psi}^{+}(y)\}. \tag{22.250}
\]
Summing the two equations of (22.250), using (22.139), and further defining S(x − y) ≡ S⁺(x − y) + S⁻(x − y), we obtain [5]

\[
\left( i\gamma^\mu \partial_\mu^x + m \right) i\Delta(x-y) \equiv iS(x-y). \tag{22.251}
\]

Meanwhile, we have [5]
\[
\begin{aligned}
\{\psi(x), \bar{\psi}(y)\}
&= \{\psi^{+}(x), \bar{\psi}^{+}(y)\} + \{\psi^{+}(x), \bar{\psi}^{-}(y)\} + \{\psi^{-}(x), \bar{\psi}^{+}(y)\} + \{\psi^{-}(x), \bar{\psi}^{-}(y)\} \\
&= \{\psi^{+}(x), \bar{\psi}^{-}(y)\} + \{\psi^{-}(x), \bar{\psi}^{+}(y)\}
= iS^{+}(x-y) + iS^{-}(x-y) = iS(x-y),
\end{aligned} \tag{22.252}
\]
where with the first equality the first and fourth terms vanish because of (22.242).
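The spin sum used at step [††] above, Σ_h u(p,h)ū(p,h) = p_μγ^μ + m (cf. (21.144)), can be verified numerically. The sketch below is an illustration under stated assumptions: it uses the Dirac representation of the γ matrices and the spin basis rather than the helicity basis of the text (the sum over the two states is basis-independent), with the normalization ū(p,s)u(p,s) = 2m.

```python
# Numerical check of the spin sum  sum_s u(p,s) ubar(p,s) = p_mu gamma^mu + m,
# written with plain nested lists (no external libraries).
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
sig = ([[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]])  # Pauli matrices

def block(A, B, C, D):
    """Assemble a 4x4 matrix from 2x2 blocks [[A, B], [C, D]]."""
    return [A[i] + B[i] for i in range(2)] + [C[i] + D[i] for i in range(2)]

g0 = block(I2, Z2, Z2, [[-1, 0], [0, -1]])                            # gamma^0
g = [block(Z2, s, [[-x for x in row] for row in s], Z2) for s in sig]  # gamma^k

m = 1.0
p = (0.3, -0.4, 0.5)                          # sample 3-momentum (arbitrary)
E = (m * m + sum(c * c for c in p)) ** 0.5    # on-shell energy p^0

sp = [[sum(p[a] * sig[a][i][j] for a in range(3)) for j in range(2)]
      for i in range(2)]                      # sigma . p

def u(s):
    """Positive-energy spinor for spin label s = 0, 1; normalized so ubar u = 2m."""
    chi = ([1, 0], [0, 1])[s]
    low = matmul(sp, [[chi[0]], [chi[1]]])
    n = (E + m) ** 0.5
    return [n * chi[0], n * chi[1], n * low[0][0] / (E + m), n * low[1][0] / (E + m)]

def ubar(s):
    # ubar = u^dagger gamma^0; gamma^0 is diagonal, so this acts componentwise.
    return [u(s)[j].conjugate() * g0[j][j] for j in range(4)]

# Spin sum as a 4x4 matrix versus p_mu gamma^mu + m = E gamma^0 - p.gamma + m.
S = [[sum(u(s)[i] * ubar(s)[j] for s in (0, 1)) for j in range(4)] for i in range(4)]
slash_p = [[E * g0[i][j] - sum(p[a] * g[a][i][j] for a in range(3))
            + (m if i == j else 0) for j in range(4)] for i in range(4)]
err = max(abs(S[i][j] - slash_p[i][j]) for i in range(4) for j in range(4))
print(err < 1e-12)  # True: the spin sum reproduces p_mu gamma^mu + m
```

The same scaffolding checks the companion sum over v spinors, Σ_h v(p,h)v̄(p,h) = p_μγ^μ − m of (21.145), by flipping the sign of m in the lower components.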

22.4.5 Feynman Propagator of the Dirac Field

Similarly to the case of the scalar field, the Feynman propagator S_F(x − y) is defined as a vacuum expectation value of the T-product such that
\[
S_F(x-y) \equiv \langle 0 | \mathrm{T}[\psi(x)\bar{\psi}(y)] | 0 \rangle. \tag{22.253}
\]
In the case of the Dirac field, the time-ordered product (T-product) T[ψ(x)ψ̄(y)] is defined by
\[
\mathrm{T}[\psi(x)\bar{\psi}(y)] \equiv \theta(x^0 - y^0)\, \psi(x)\bar{\psi}(y) - \theta(y^0 - x^0)\, \bar{\psi}(y)\psi(x). \tag{22.254}
\]

Note the minus sign before the second term of (22.254). It comes from the fact that the fermion operators ψ(x) and ψ̄(y) are anti-commutative. We have to pay attention to the symbolic notation of (22.254), as in the case of the calculation of {ψ⁺(x), ψ̄⁻(y)} and {ψ⁻(x), ψ̄⁺(y)}. In fact, ψ̄(y)ψ(x) must be dealt with as a (4, 4) matrix as well. Taking the vacuum expectation value of (22.254), we have
\[
S_F(x-y) = \langle 0|\mathrm{T}[\psi(x)\bar{\psi}(y)]|0\rangle
= \theta(x^0-y^0)\, \langle 0|\psi(x)\bar{\psi}(y)|0\rangle - \theta(y^0-x^0)\, \langle 0|\bar{\psi}(y)\psi(x)|0\rangle. \tag{22.255}
\]
Similarly to the scalar field case of (22.183), we have
\[
\langle 0|\psi(x)\bar{\psi}(y)|0\rangle = \{\psi^{+}(x), \bar{\psi}^{-}(y)\} = iS^{+}(x-y), \tag{22.256}
\]
\[
\langle 0|\bar{\psi}(y)\psi(x)|0\rangle = \{\psi^{-}(x), \bar{\psi}^{+}(y)\} = iS^{-}(x-y). \tag{22.257}
\]
Substituting (22.256) and (22.257) into (22.255), we get

\[
S_F(x-y) = i\left[ \theta(x^0-y^0)\, S^{+}(x-y) - \theta(y^0-x^0)\, S^{-}(x-y) \right]. \tag{22.258}
\]

Then, from (22.232), we have
\[
\begin{aligned}
\mathrm{T}[\psi(x)\bar{\psi}(y)]
=\ & \int\!\!\int_{-\infty}^{\infty} d^3q\, d^3p\, \frac{\theta(x^0-y^0)}{(2\pi)^3 \sqrt{2p_0 \cdot 2q_0}}
 \sum_{h=\pm1} \left[ e^{-ipx}\, b(p,h)\, u(p,h) + e^{ipx}\, d^{\dagger}(p,h)\, v(p,h) \right] \\
&\qquad \times \sum_{h'=\pm1} \left[ e^{iqy}\, b^{\dagger}(q,h')\, \bar{u}(q,h') + e^{-iqy}\, d(q,h')\, \bar{v}(q,h') \right] \\
&- \int\!\!\int_{-\infty}^{\infty} d^3q\, d^3p\, \frac{\theta(y^0-x^0)}{(2\pi)^3 \sqrt{2p_0 \cdot 2q_0}}
 \sum_{h'=\pm1} \left[ e^{iqy}\, b^{\dagger}(q,h')\, \bar{u}(q,h') + e^{-iqy}\, d(q,h')\, \bar{v}(q,h') \right] \\
&\qquad \times \sum_{h=\pm1} \left[ e^{-ipx}\, b(p,h)\, u(p,h) + e^{ipx}\, d^{\dagger}(p,h)\, v(p,h) \right].
\end{aligned} \tag{22.259}
\]

Therefore, we obtain
\[
\begin{aligned}
\langle 0|\mathrm{T}[\psi(x)\bar{\psi}(y)]|0\rangle
=\ & \int\!\!\int_{-\infty}^{\infty} d^3q\, d^3p\, \frac{\theta(x^0-y^0)}{(2\pi)^3 \sqrt{2p_0 \cdot 2q_0}}
 \Big\langle 0 \Big| \sum_{h=\pm1} \left[ e^{-ipx}\, b(p,h)\, u(p,h) + e^{ipx}\, d^{\dagger}(p,h)\, v(p,h) \right] \\
&\qquad \times \sum_{h'=\pm1} \left[ e^{iqy}\, b^{\dagger}(q,h')\, \bar{u}(q,h') + e^{-iqy}\, d(q,h')\, \bar{v}(q,h') \right] \Big| 0 \Big\rangle \\
&- \int\!\!\int_{-\infty}^{\infty} d^3q\, d^3p\, \frac{\theta(y^0-x^0)}{(2\pi)^3 \sqrt{2p_0 \cdot 2q_0}}
 \Big\langle 0 \Big| \sum_{h'=\pm1} \left[ e^{iqy}\, b^{\dagger}(q,h')\, \bar{u}(q,h') + e^{-iqy}\, d(q,h')\, \bar{v}(q,h') \right] \\
&\qquad \times \sum_{h=\pm1} \left[ e^{-ipx}\, b(p,h)\, u(p,h) + e^{ipx}\, d^{\dagger}(p,h)\, v(p,h) \right] \Big| 0 \Big\rangle.
\end{aligned} \tag{22.260}
\]

Equation (22.260) comprises two integrals, and the integrand of each integral comprises 2 × 2 = 4 terms. However, we do not need to calculate all of them, because only the following two integrands lead to non-vanishing integrals:
\[
A \equiv \sum_{h,h'=\pm1} e^{-i(px-qy)} \left\langle 0 \left| b(p,h)\, u(p,h) \cdot b^{\dagger}(q,h')\, \bar{u}(q,h') \right| 0 \right\rangle
\]
and
\[
B \equiv \sum_{h,h'=\pm1} e^{-i(qy-px)} \left\langle 0 \left| d(q,h')\, \bar{v}(q,h') \cdot d^{\dagger}(p,h)\, v(p,h) \right| 0 \right\rangle. \tag{22.261}
\]
The other terms vanish similarly as in the case of (22.121). It is because we have
\[
b(p,h)|0\rangle = d(q,h')|0\rangle = 0 \quad \text{and} \quad \langle 0|b^{\dagger}(q,h') = \langle 0|d^{\dagger}(p,h) = 0. \tag{22.262}
\]

We estimate the above two integrands A and B one by one. First, we rewrite A as
\[
\begin{aligned}
A &= \sum_{h,h'=\pm1} e^{-i(px-qy)} \left\langle 0 \left| b(p,h)\, u(p,h) \cdot b^{\dagger}(q,h')\, \bar{u}(q,h') \right| 0 \right\rangle \\
&= \sum_{h,h'=\pm1} e^{-i(px-qy)} \left\langle 0 \left| \left[ \delta_{hh'}\, \delta^3(\boldsymbol{q}-\boldsymbol{p}) - b^{\dagger}(q,h')\, b(p,h) \right] u(p,h)\, \bar{u}(q,h') \right| 0 \right\rangle \\
&= \sum_{h,h'=\pm1} e^{-i(px-qy)} \left\langle 0 \left| \delta_{hh'}\, \delta^3(\boldsymbol{q}-\boldsymbol{p})\, u(p,h)\, \bar{u}(q,h') \right| 0 \right\rangle \\
&= \sum_{h,h'=\pm1} e^{-i(px-qy)}\, \delta_{hh'}\, \delta^3(\boldsymbol{q}-\boldsymbol{p})\, \langle 0 | u(p,h)\, \bar{u}(q,h') | 0 \rangle,
\end{aligned} \tag{22.263}
\]
where the second equality comes from (22.239) and the third equality from (22.262). Paying attention to the fact that u(p,h)ū(q,h′) is not a quantum operator but a (4, 4) matrix, we have
\[
\langle 0 | u(p,h)\, \bar{u}(q,h') | 0 \rangle = u(p,h)\, \bar{u}(q,h')\, \langle 0 | 0 \rangle = u(p,h)\, \bar{u}(q,h'). \tag{22.264}
\]

Notice that ⟨0|0⟩ = 1, i.e., the vacuum state has been normalized. Thus, we obtain
\[
A = \sum_{h,h'=\pm1} e^{-i(px-qy)}\, \delta_{hh'}\, \delta^3(\boldsymbol{q}-\boldsymbol{p})\, u(p,h)\, \bar{u}(q,h'). \tag{22.265}
\]
Similarly, we have
\[
B = \sum_{h,h'=\pm1} e^{-i(qy-px)}\, \delta_{hh'}\, \delta^3(\boldsymbol{q}-\boldsymbol{p})\, \bar{v}(q,h')\, v(p,h). \tag{22.266}
\]
Neither A nor B is a Grassmann number, but an ordinary number. Hence, according to the manipulation noted earlier in (22.248), we obtain

\[
\bar{v}(q,h')\, v(p,h) = v(p,h)\, \bar{v}(q,h'). \tag{22.267}
\]

Substituting (22.265) and (22.266) into (22.260), with the aid of (22.267) we get
\[
\begin{aligned}
\langle 0|\mathrm{T}[\psi(x)\bar{\psi}(y)]|0\rangle
&= \int\!\!\int_{-\infty}^{\infty} d^3q\, d^3p \left[ \frac{\theta(x^0-y^0)}{(2\pi)^3 \sqrt{2p_0 \cdot 2q_0}}\, A - \frac{\theta(y^0-x^0)}{(2\pi)^3 \sqrt{2p_0 \cdot 2q_0}}\, B \right] \\
&= \int\!\!\int_{-\infty}^{\infty} d^3q\, d^3p\, \frac{\theta(x^0-y^0)}{(2\pi)^3 \sqrt{2p_0 \cdot 2q_0}} \sum_{h,h'=\pm1} e^{-i(px-qy)}\, \delta_{hh'}\, \delta^3(\boldsymbol{q}-\boldsymbol{p})\, u(p,h)\, \bar{u}(q,h') \\
&\quad - \int\!\!\int_{-\infty}^{\infty} d^3q\, d^3p\, \frac{\theta(y^0-x^0)}{(2\pi)^3 \sqrt{2p_0 \cdot 2q_0}} \sum_{h,h'=\pm1} e^{-i(qy-px)}\, \delta_{hh'}\, \delta^3(\boldsymbol{q}-\boldsymbol{p})\, v(p,h)\, \bar{v}(q,h'). 
\end{aligned} \tag{22.268}
\]

Performing the integration with respect to q and the summation with respect to h′, we obtain
\[
\begin{aligned}
\langle 0|\mathrm{T}[\psi(x)\bar{\psi}(y)]|0\rangle
&= \int_{-\infty}^{\infty} d^3p\, \frac{\theta(x^0-y^0)}{(2\pi)^3\, 2p_0}\, e^{-ip(x-y)} \sum_{h=\pm1} u(p,h)\, \bar{u}(p,h) \\
&\quad - \int_{-\infty}^{\infty} d^3p\, \frac{\theta(y^0-x^0)}{(2\pi)^3\, 2p_0}\, e^{ip(x-y)} \sum_{h=\pm1} v(p,h)\, \bar{v}(p,h).
\end{aligned} \tag{22.269}
\]

Using (21.144) and (21.145), finally we obtain
\[
\begin{aligned}
S_F(x-y) &= \langle 0|\mathrm{T}[\psi(x)\bar{\psi}(y)]|0\rangle \\
&= \int_{-\infty}^{\infty} \frac{d^3p}{(2\pi)^3\, 2p_0} \left[ \theta(x^0-y^0)\, e^{-ip(x-y)} \left( p_\mu\gamma^\mu + m \right) - \theta(y^0-x^0)\, e^{ip(x-y)} \left( p_\mu\gamma^\mu - m \right) \right] \\
&= \int_{-\infty}^{\infty} \frac{d^3p}{(2\pi)^3\, 2p_0} \left[ \theta(x^0-y^0)\, e^{-ip(x-y)} \left( p_\mu\gamma^\mu + m \right) + \theta(y^0-x^0)\, e^{ip(x-y)} \left( -p_\mu\gamma^\mu + m \right) \right].
\end{aligned} \tag{22.270}
\]
We get another representation of S_F(x − y) in a way analogous to (22.251). It is described by
\[
S_F(x-y) = \left( i\gamma^\mu \partial_\mu^x + m \right) \Delta_F(x-y), \tag{22.271}
\]
where Δ_F(x − y) was given by (22.201). For the derivation of (22.271), use (22.270) and perform the differentiation with respect to the individual coordinates x^μ. We should be careful in treating the differentiation with respect to x⁰, because it contains the differentiation of θ(x⁰ − y⁰). We show the related calculations as follows:

\[
\begin{aligned}
i\gamma^0 \partial_0^x \Delta_F(x-y)
&= \int_{-\infty}^{\infty} \frac{i\gamma^0\, d^3p}{(2\pi)^3\, 2p_0}\, \partial_0^x \left[ \theta(x^0-y^0)\, e^{-ip(x-y)} + \theta(y^0-x^0)\, e^{ip(x-y)} \right] \\
&= \int_{-\infty}^{\infty} \frac{i\gamma^0\, d^3p}{(2\pi)^3\, 2p_0} \Big[ \delta(x^0-y^0)\, e^{-ip(x-y)} + \theta(x^0-y^0)\, (-ip_0)\, e^{-ip(x-y)} \\
&\qquad\qquad\qquad\qquad - \delta(y^0-x^0)\, e^{ip(x-y)} + \theta(y^0-x^0)\, (ip_0)\, e^{ip(x-y)} \Big] \\
&= \int_{-\infty}^{\infty} \frac{i\gamma^0\, d^3p}{(2\pi)^3\, 2p_0} \Big[ \delta(x^0-y^0)\, e^{-ip(x-y)} - \delta(y^0-x^0)\, e^{ip(x-y)} \\
&\qquad\qquad\qquad\qquad + \theta(x^0-y^0)\, (-ip_0)\, e^{-ip(x-y)} + \theta(y^0-x^0)\, (ip_0)\, e^{ip(x-y)} \Big] \\
&= \int_{-\infty}^{\infty} \frac{d^3p\; p_0\gamma^0}{(2\pi)^3\, 2p_0} \left[ \theta(x^0-y^0)\, e^{-ip(x-y)} - \theta(y^0-x^0)\, e^{ip(x-y)} \right],
\end{aligned} \tag{22.272}
\]

where in the second-to-last equality the first and second integrands canceled each other out. For this, use the properties of the δ functions, i.e., δ(x⁰ − y⁰)e^{−ip(x−y)} = δ(x⁰ − y⁰)e^{ip(x−y)} and δ(x⁰ − y⁰) = δ(y⁰ − x⁰), along with ∫_{−∞}^{∞} e^{ip(x−y)} d³p = ∫_{−∞}^{∞} e^{−ip(x−y)} d³p. Also, we make use of the following relations:
\[
\begin{aligned}
\sum_{k=1}^{3} i\gamma^k \partial_k^x \Delta_F(x-y)
&= \sum_{k=1}^{3} \int_{-\infty}^{\infty} \frac{i\gamma^k\, d^3p}{(2\pi)^3\, 2p_0}\, \partial_k^x \left[ \theta(x^0-y^0)\, e^{-ip(x-y)} + \theta(y^0-x^0)\, e^{ip(x-y)} \right] \\
&= \sum_{k=1}^{3} \int_{-\infty}^{\infty} \frac{i\gamma^k\, d^3p}{(2\pi)^3\, 2p_0} \left[ \theta(x^0-y^0)\, (-ip_k)\, e^{-ip(x-y)} + \theta(y^0-x^0)\, (ip_k)\, e^{ip(x-y)} \right] \\
&= \sum_{k=1}^{3} \int_{-\infty}^{\infty} \frac{d^3p\; p_k\gamma^k}{(2\pi)^3\, 2p_0} \left[ \theta(x^0-y^0)\, e^{-ip(x-y)} - \theta(y^0-x^0)\, e^{ip(x-y)} \right].
\end{aligned} \tag{22.273}
\]

Adding (22.272) and (22.273) together with mΔ_F(x − y) and taking account of (22.201), we find that (22.271) is identical to (22.270).

We wish to relate S_F(x − y) expressed as (22.270) to the momentum representation (or Fourier integral representation) discussed in Sect. 22.3.5. We readily see that the difference between (22.270) and (22.201) lies in the presence or absence of the factor (p_μγ^μ + m) or (−p_μγ^μ + m). We derive these factors by modifying the functional form of the integrand of (22.186). As studied in Sect. 22.3.5, (22.201) was based on the residue obtained by the contour integral with respect to k⁰. In Sect. 6.5, we considered a complex function having a pole of order n at z = a that is represented by f(z) = g(z)/(z − a)^n, where g(z) is analytic within the domain under consideration. In Sect. 22.3.5, we thought of a function having a simple pole, given by e^{−ikz}/(z − a) with g(z) = e^{−ikz}. In the present case, we are thinking of a function of the type e^{−ikz}(cz + b)/(z − a), where c and b are certain constants. That is, we have
\[
g(z) = e^{-ikz}(cz + b).
\]
From (6.148), we have
\[
\mathrm{Res}\, f(a) = f(z)(z-a)\big|_{z=a} = g(a).
\]
Therefore, the residue of the present case is given by g(a) = e^{−ika}(ca + b). Taking account of (22.270), we assume the following analytic function:
\[
g(p_0, \boldsymbol{p}) = e^{i\boldsymbol{p}\cdot(\boldsymbol{x}-\boldsymbol{y})}\, e^{-ip_0(x^0-y^0)} \left( p_0\gamma^0 + \sum_{k=1}^{3} p_k\gamma^k + m \right) = e^{-ip(x-y)} \left( p_\mu\gamma^\mu + m \right). \tag{22.274}
\]

Since in (22.274) e^{i𝐩·(𝐱−𝐲)} and Σ_{k=1}^{3} p_kγ^k are constants with respect to p₀, as is m, the p dependence is denoted by g(p₀, 𝐩) in (22.274). The function g(p₀) consists of a product of an exponential function and a linear function of p₀ and is an entire function (Definition 6.10). Depending on whether the pole is located at p₀ = −p⁰ or p₀ = +p⁰, we obtain g(∓p⁰, 𝐩), corresponding to Case I (−p⁰) and Case II (+p⁰) of Sect. 22.3.5, respectively. With Case I, the contour integration produces a residue g(−p⁰, 𝐩) described by
\[
g(-p^0, \boldsymbol{p}) = e^{i\boldsymbol{p}\cdot(\boldsymbol{x}-\boldsymbol{y})}\, e^{ip^0(x^0-y^0)} \left( -p^0\gamma^0 + \sum_{k=1}^{3} p_k\gamma^k + m \right).
\]
As in the scalar field case, switching 𝐩 to −𝐩 effects no change in the ensuing definite integral. Then, with Case I we obtain a function given by
\[
g(-p^0, -\boldsymbol{p}) = e^{-i\boldsymbol{p}\cdot(\boldsymbol{x}-\boldsymbol{y})}\, e^{ip^0(x^0-y^0)} \left( -p^0\gamma^0 - \sum_{k=1}^{3} p_k\gamma^k + m \right)
= e^{ip(x-y)} \left( -p_\mu\gamma^\mu + m \right).
\]

Comparing (22.201) and (22.270), we find that, instead of e^{−ik(x−y)} in (22.186), g(p₀, 𝐩) = e^{−ip(x−y)}(p_μγ^μ + m) produces the proper Feynman propagator of the Dirac field, S_F(x − y). It is described by
\[
S_F(x-y) = \frac{-i}{(2\pi)^4} \int d^4p\, \frac{e^{-ip(x-y)} \left( p_\mu\gamma^\mu + m \right)}{m^2 - p^2 - i\varepsilon}. \tag{22.275}
\]
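The residue rule invoked above can be illustrated numerically. The sketch below (the sample values of k, a, b, c are arbitrary assumptions) checks that for f(z) = e^{−ikz}(cz + b)/(z − a) the integral along a circle enclosing the simple pole z = a equals 2πi·g(a) with g(a) = e^{−ika}(ca + b):

```python
import cmath
from math import pi

k, a, b, c = 1.3, 0.7 - 0.2j, 2.0, -0.5   # arbitrary sample parameters

def f(z):
    # Simple pole at z = a with entire numerator g(z) = e^{-ikz}(cz + b).
    return cmath.exp(-1j * k * z) * (c * z + b) / (z - a)

# Discretized contour integral over a circle of radius r centered on the pole;
# for a smooth periodic integrand this converges extremely fast in N.
N, r = 20000, 0.8
total = 0j
for n in range(N):
    th = 2 * pi * n / N
    z = a + r * cmath.exp(1j * th)
    dz = 1j * r * cmath.exp(1j * th) * (2 * pi / N)
    total += f(z) * dz

residue = cmath.exp(-1j * k * a) * (c * a + b)   # g(a) = e^{-ika}(ca + b)
print(abs(total - 2j * pi * residue) < 1e-8)     # True
```

In the propagator calculation this role is played by g(p₀, 𝐩) of (22.274), whose linear-in-p₀ numerator supplies the factor (p_μγ^μ + m) in (22.275).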

Equation (22.275) is well known as the Lorentz-invariant representation of the Feynman propagator for the Dirac field. The momentum-space representation S̃_F(p) (the Fourier counterpart of the coordinate representation) of the Feynman propagator is given, in accordance with (22.188), as
\[
\tilde{S}_F(p) \equiv \frac{-i \left( p_\mu\gamma^\mu + m \right)}{m^2 - p^2 - i\varepsilon}. \tag{22.276}
\]

We will make the most of the above-mentioned Feynman propagator in the next chapter that deals with the interaction between the quantum fields.

22.5 Quantization of the Electromagnetic Field [5–7]

In Part II we studied the classical electromagnetic theory based on Maxwell's equations. As in the case of the scalar field and the Dirac field, we now deal with the quantization of the electromagnetic field (or photon field). In this case, too, the relativistic description of the electromagnetic field is essential. The quantized theory of both the Dirac field and the electromagnetic field directly leads to the construction of quantum electrodynamics (QED).

22.5.1 Relativistic Formulation of the Electromagnetic Field

To properly define the Lagrangian density and Hamiltonian density of the electromagnetic field, we revisit Part II as a preliminary stage. First, we introduce the Heaviside–Lorentz units (in combination with the natural units) to simplify the description of Maxwell's equations, because the use of the dielectric constant of vacuum (ε₀) and the permeability of vacuum (μ₀) is often troublesome. To delete these constants, we redefine the electromagnetic quantities as follows:
\[
\boldsymbol{E}, \boldsymbol{D} \to \boldsymbol{E}/\sqrt{\varepsilon_0},\ \sqrt{\varepsilon_0}\,\boldsymbol{D}; \qquad
\boldsymbol{H}, \boldsymbol{B} \to \boldsymbol{H}/\sqrt{\mu_0},\ \sqrt{\mu_0}\,\boldsymbol{B}; \qquad
\rho, \boldsymbol{i} \to \sqrt{\varepsilon_0}\,\rho,\ \sqrt{\varepsilon_0}\,\boldsymbol{i}.
\]
As a result, Maxwell's equations of electromagnetism (7.1)–(7.4) are recast as follows:

\[
\mathrm{div}\, \boldsymbol{E} = \rho, \qquad
\mathrm{div}\, \boldsymbol{B} = 0, \qquad
\mathrm{rot}\, \boldsymbol{E} + \frac{\partial \boldsymbol{B}}{\partial t} = 0, \qquad
\mathrm{rot}\, \boldsymbol{B} - \frac{\partial \boldsymbol{E}}{\partial t} = \boldsymbol{i}. \tag{22.277}
\]

In essence, we have redefined the electromagnetic quantities such that
\[
\boldsymbol{D} = \boldsymbol{E} \quad \text{and} \quad \boldsymbol{B} = \boldsymbol{H}. \tag{22.278}
\]
That is, with D = ε₀E, e.g., the relationship (7.7) in vacuum is rewritten as
\[
\sqrt{\varepsilon_0}\, \boldsymbol{D} = \varepsilon_0 \left( \boldsymbol{E}/\sqrt{\varepsilon_0} \right) = \sqrt{\varepsilon_0}\, \boldsymbol{E} \quad \text{or} \quad \boldsymbol{D} = \boldsymbol{E}.
\]
In (22.277), μ₀ε₀ = 1 is implied in accordance with the natural units; see (7.12). Next, we introduce the vector potential and scalar potential to express the electromagnetic fields. Of the Maxwell's equations, with the magnetic flux density B we had
\[
\mathrm{div}\, \boldsymbol{B} = 0. \tag{7.2}
\]

We knew a formula of vector analysis described by
\[
\mathrm{div}\ \mathrm{rot}\, \boldsymbol{V} = 0, \tag{7.22}
\]
for any vector field V. From the above equations, we immediately express B as
\[
\boldsymbol{B} = \mathrm{rot}\, \boldsymbol{A}, \tag{22.279}
\]
where A is a vector field called the vector potential. We know another formula of vector analysis,
\[
\mathrm{rot}\, \nabla \Lambda \equiv 0, \tag{22.280}
\]
where Λ is any scalar field. The confirmation of (22.280) is left for readers. Therefore, the vector potential A has an arbitrariness of choice such that B remains unchanged:
\[
\boldsymbol{A}' = \boldsymbol{A} - \nabla \Lambda \quad \text{and} \quad \boldsymbol{B} = \mathrm{rot}\, \boldsymbol{A}'. \tag{22.281}
\]
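The identity rot ∇Λ ≡ 0, whose confirmation is left for readers, can also be checked numerically. The sketch below (the concrete Λ is an arbitrary sample choice) evaluates the rotation of the gradient by central finite differences; because the mixed second differences coincide, the result vanishes up to floating-point roundoff:

```python
from math import sin, cos, exp

def Lam(x, y, z):                      # sample scalar field Lambda (assumption)
    return sin(x) * cos(2.0 * y) * exp(0.3 * z)

h = 1e-5

def grad(F, x, y, z):                  # central-difference gradient of a scalar
    return [(F(x + h, y, z) - F(x - h, y, z)) / (2 * h),
            (F(x, y + h, z) - F(x, y - h, z)) / (2 * h),
            (F(x, y, z + h) - F(x, y, z - h)) / (2 * h)]

def curl(V, x, y, z):                  # central-difference rotation of a vector field
    def d(i, j):                       # partial of V_i with respect to coordinate j
        q_p = [x, y, z]; q_m = [x, y, z]
        q_p[j] += h; q_m[j] -= h
        return (V(*q_p)[i] - V(*q_m)[i]) / (2 * h)
    return [d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1)]

A = lambda x, y, z: grad(Lam, x, y, z)         # the vector field grad Lambda
rot = curl(A, 0.4, -0.7, 1.1)
print(max(abs(comp) for comp in rot) < 1e-4)   # True: rot grad Lambda vanishes
```

The same cancellation of mixed partial derivatives underlies the gauge arbitrariness of A in (22.281).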

Meanwhile, replacing B in the third equation of (22.277) with (22.279) and exchanging the operating order of the differential operators ∂/∂t and rot, we obtain
\[
\mathrm{rot} \left( \boldsymbol{E} + \frac{\partial \boldsymbol{A}}{\partial t} \right) = 0. \tag{22.282}
\]
Using (22.280), we express E as
\[
\boldsymbol{E} = -\frac{\partial \boldsymbol{A}}{\partial t} - \nabla \phi, \tag{22.283}
\]
where ϕ is another scalar field called the scalar potential. The scalar potential ϕ and the vector potential A are collectively called the electromagnetic potentials. Nonetheless, we have no clear reason to mathematically distinguish ϕ of (22.283) from Λ in (22.280) or (22.281). Considering that these potentials ϕ and A have arbitrariness, we wish to fix the functional relationship between them by imposing an appropriate constraint. Such a constraint on the electromagnetic potentials is called the gauge transformation. The gauge transformation is defined by
\[
\boldsymbol{A}'(t, \boldsymbol{x}) \equiv \boldsymbol{A}(t, \boldsymbol{x}) - \nabla \Lambda(t, \boldsymbol{x}), \qquad
\phi'(t, \boldsymbol{x}) \equiv \phi(t, \boldsymbol{x}) + \frac{\partial \Lambda(t, \boldsymbol{x})}{\partial t}. \tag{22.284}
\]

Note that A′ and ϕ′ denote the change in the functional form caused by the gauge transformation. The gauge transformation leaves the electric field E and the magnetic flux density B unchanged:
\[
\begin{aligned}
\boldsymbol{E}'(t,\boldsymbol{x}) &= -\frac{\partial \boldsymbol{A}'(t,\boldsymbol{x})}{\partial t} - \nabla \phi'(t,\boldsymbol{x})
= -\frac{\partial \boldsymbol{A}(t,\boldsymbol{x})}{\partial t} + \frac{\partial \nabla \Lambda(t,\boldsymbol{x})}{\partial t} - \nabla \phi(t,\boldsymbol{x}) - \nabla \frac{\partial \Lambda(t,\boldsymbol{x})}{\partial t} \\
&= -\frac{\partial \boldsymbol{A}(t,\boldsymbol{x})}{\partial t} - \nabla \phi(t,\boldsymbol{x}),
\end{aligned}
\]
where the last equality resulted from the fact that the operating order of ∂/∂t and grad is exchangeable in the second equality. Also, we have
\[
\boldsymbol{B}'(t,\boldsymbol{x}) = \mathrm{rot}\, \boldsymbol{A}'(t,\boldsymbol{x}) = \mathrm{rot}\, \boldsymbol{A}(t,\boldsymbol{x}) - \mathrm{rot}\, [\nabla \Lambda(t,\boldsymbol{x})] = \mathrm{rot}\, \boldsymbol{A}(t,\boldsymbol{x}).
\]
With the above background knowledge, we recast the Maxwell's equations in the relativistic formulation. For this, we introduce the following antisymmetric field tensor F^{μν} (in a contravariant tensor form) expressed as
\[
F^{\mu\nu} \equiv
\begin{pmatrix}
F^{00} & F^{01} & F^{02} & F^{03} \\
F^{10} & F^{11} & F^{12} & F^{13} \\
F^{20} & F^{21} & F^{22} & F^{23} \\
F^{30} & F^{31} & F^{32} & F^{33}
\end{pmatrix}
=
\begin{pmatrix}
0 & -E_1 & -E_2 & -E_3 \\
E_1 & 0 & -B_3 & B_2 \\
E_2 & B_3 & 0 & -B_1 \\
E_3 & -B_2 & B_1 & 0
\end{pmatrix}
= -F^{\nu\mu}, \tag{22.285}
\]

where the indices 0, 1, 2, and 3 denote the t-, x-, y-, and z-components, respectively. Note that the field quantities of (22.285) have been taken from (22.277). For later use, we show the covariant form expressed as
\[
F_{\mu\nu} =
\begin{pmatrix}
F_{00} & F_{01} & F_{02} & F_{03} \\
F_{10} & F_{11} & F_{12} & F_{13} \\
F_{20} & F_{21} & F_{22} & F_{23} \\
F_{30} & F_{31} & F_{32} & F_{33}
\end{pmatrix}
=
\begin{pmatrix}
0 & E_1 & E_2 & E_3 \\
-E_1 & 0 & -B_3 & B_2 \\
-E_2 & B_3 & 0 & -B_1 \\
-E_3 & -B_2 & B_1 & 0
\end{pmatrix}
= -F_{\nu\mu}. \tag{22.286}
\]

This expression is obtained from
\[
F_{\mu\nu} = F^{\rho\sigma}\, \eta_{\rho\mu}\, \eta_{\sigma\nu}, \tag{22.287}
\]
where η_{ρμ} is the Minkowski metric that appeared in (21.16). We will discuss the constitution and properties of the tensors in detail in Chap. 24. Defining the charge–current density four-vector as
\[
s^\nu(x) \equiv
\begin{pmatrix} \rho(x) \\ \boldsymbol{i}(x) \end{pmatrix}
=
\begin{pmatrix} \rho(x) \\ i^1(x) \\ i^2(x) \\ i^3(x) \end{pmatrix}, \tag{22.288}
\]
we get the Maxwell's equations described in the Lorentz-invariant form as
\[
\partial_\mu F^{\mu\nu}(x) = s^\nu(x), \tag{22.289}
\]
\[
\partial^\rho F^{\mu\nu}(x) + \partial^\mu F^{\nu\rho}(x) + \partial^\nu F^{\rho\mu}(x) = 0, \tag{22.290}
\]

where x represents the four-vector (t, 𝐱)^T. Note that we use the notation (t, 𝐱)^T to represent the four-vector and that, e.g., the notation rot A(t, 𝐱) is used to show the dependence of a function on the independent variables. In (22.290), the indices ρ, μ, and ν are cyclically permuted. These equations are equivalent to (22.277). For instance, in (22.289) the equation described by
\[
\partial_\mu F^{\mu 0}(x) = \partial_0 F^{00}(x) + \partial_1 F^{10}(x) + \partial_2 F^{20}(x) + \partial_3 F^{30}(x) = s^0(x)
\]
implies
\[
\frac{\partial E_1}{\partial x^1} + \frac{\partial E_2}{\partial x^2} + \frac{\partial E_3}{\partial x^3} = \rho(x).
\]
Rewriting the above equation using (t, 𝐱), we have div E = ρ, i.e., the first equation of (22.277). As an example of (22.290), the equation given by
\[
\partial^0 F^{12}(x) + \partial^1 F^{20}(x) + \partial^2 F^{01}(x) = 0
\]
means
\[
-\frac{\partial B_3}{\partial x^0} - \frac{\partial E_2}{\partial x^1} + \frac{\partial E_1}{\partial x^2} = 0
\quad \text{or} \quad
\frac{\partial E_2}{\partial x^1} - \frac{\partial E_1}{\partial x^2} + \frac{\partial B_3}{\partial x^0} = 0.
\]
Also, rewriting the above equation using (t, 𝐱), we get
\[
\frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y} + \frac{\partial B_z}{\partial t} = 0
\quad \text{or} \quad
(\mathrm{rot}\, \boldsymbol{E})_z + \left( \frac{\partial \boldsymbol{B}}{\partial t} \right)_z = 0.
\]
That is, the third equation of (22.277) is recovered. The confirmation of the other relations is left for readers. In the above derivations of Maxwell's equations, note that
\[
\partial^0 = \partial/\partial x_0 = \partial/\partial x^0 = \partial_0, \qquad
\partial^k = \partial/\partial x_k = -\partial/\partial x^k = -\partial_k \quad (k = 1, 2, 3). \tag{22.291}
\]
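The tensor relations above lend themselves to a direct numerical check. In the sketch below (the sample values of E and B are arbitrary assumptions), F^{μν} is built as in (22.285), its antisymmetry is verified, and lowering the indices with the Minkowski metric as in (22.287) reproduces the sign pattern of (22.286):

```python
E = [0.5, -1.0, 2.0]    # sample electric field components E_1, E_2, E_3
B = [1.5, 0.3, -0.7]    # sample magnetic flux density components B_1, B_2, B_3

# Contravariant field tensor F^{mu nu} of (22.285).
F_up = [[0,    -E[0], -E[1], -E[2]],
        [E[0],  0,    -B[2],  B[1]],
        [E[1],  B[2],  0,    -B[0]],
        [E[2], -B[1],  B[0],  0   ]]

eta = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

# Index lowering (22.287): F_{mu nu} = F^{rho sigma} eta_{rho mu} eta_{sigma nu}.
F_dn = [[sum(eta[r][m] * eta[s][n] * F_up[r][s]
             for r in range(4) for s in range(4))
         for n in range(4)] for m in range(4)]

# Antisymmetry F^{mu nu} = -F^{nu mu}.
assert all(F_up[m][n] == -F_up[n][m] for m in range(4) for n in range(4))
# Lowering flips the sign of the time-space (E) entries and keeps the B entries,
# exactly the sign pattern of (22.286).
assert F_dn[0][1:] == E and [F_dn[k][0] for k in (1, 2, 3)] == [-e for e in E]
assert F_dn[1][2] == -B[2] and F_dn[2][3] == -B[0] and F_dn[1][3] == B[1]
print("F^{mu nu} antisymmetric; lowered tensor matches (22.286)")
```

Because only the time-space entries pick up a single metric sign flip, the E block changes sign under lowering while the purely spatial B block does not.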

Since F^{μν} is the antisymmetric tensor, from (22.289) we have
\[
\partial_\mu s^\mu(x) = 0. \tag{22.292}
\]
Equation (22.292) is essentially the same as the current continuity equation that has already appeared in (7.20). Using four-vectors, the gauge transformation is succinctly described as
\[
A'^\mu(x) = A^\mu(x) + \partial^\mu \Lambda(x), \tag{22.293}
\]
where A^μ(x) = (ϕ(x), 𝐀(x))^T. Equation (22.293) is equivalent to (22.284). Using A^μ(x), we get
\[
F^{\mu\nu}(x) = \partial^\mu A^\nu(x) - \partial^\nu A^\mu(x). \tag{22.294}
\]
For example, we have F^{01}(x) = ∂⁰A¹(x) − ∂¹A⁰(x), i.e.,
\[
-E_1(x) = \frac{\partial A^1(x)}{\partial x^0} + \frac{\partial A^0(x)}{\partial x^1}
\quad \text{or} \quad
(\boldsymbol{E})_x = -\left( \frac{\partial \boldsymbol{A}}{\partial t} \right)_x - (\nabla \phi)_x.
\]
Other relations can readily be confirmed. The field strength F^{μν}(x) is invariant under the gauge transformation (gauge invariance). It can be shown as follows:
\[
\begin{aligned}
F'^{\mu\nu}(x) &= \partial^\mu A'^\nu(x) - \partial^\nu A'^\mu(x)
= \partial^\mu \left[ A^\nu(x) + \partial^\nu \Lambda(x) \right] - \partial^\nu \left[ A^\mu(x) + \partial^\mu \Lambda(x) \right] \\
&= \partial^\mu A^\nu(x) + \partial^\mu \partial^\nu \Lambda - \partial^\nu A^\mu(x) - \partial^\nu \partial^\mu \Lambda
= \partial^\mu A^\nu(x) - \partial^\nu A^\mu(x) = F^{\mu\nu}(x),
\end{aligned} \tag{22.295}
\]
where with the third equality the operating order of ∂^μ and ∂^ν can be switched. Further substituting (22.294) into (22.289), we obtain
\[
\partial_\mu \left[ \partial^\mu A^\nu(x) - \partial^\nu A^\mu(x) \right] = \partial_\mu \partial^\mu A^\nu(x) - \partial_\mu \partial^\nu A^\mu(x) = s^\nu(x).
\]
Exchanging the operating order of ∂_μ and ∂^ν in the first equality of the above equation, we get
\[
\partial_\mu \partial^\mu A^\nu(x) - \partial^\nu \partial_\mu A^\mu(x) = s^\nu(x).
\]
Or, using ∂_μ∂^μ = □, we have

\[
\Box A^\nu(x) - \partial^\nu \partial_\mu A^\mu(x) = s^\nu(x). \tag{22.296}
\]
Defining in (22.296)
\[
\chi(x) \equiv \partial_\mu A^\mu(x), \tag{22.297}
\]
we obtain
\[
\Box A^\nu(x) - \partial^\nu \chi(x) = s^\nu(x). \tag{22.298}
\]
The quantity A^μ(x) is said to be a four-vector potential, and a relativistic field described by such a four-vector is called a gauge field. The electromagnetic field is a typical gauge field.
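The gauge invariance (22.295) of F^{μν} built from A^μ via (22.294) can be checked numerically as well. The sketch below is an illustration under stated assumptions: the four-potential A^μ and the gauge function Λ are arbitrary sample fields, and all derivatives are central finite differences obeying the sign convention (22.291):

```python
from math import sin, cos, exp

def Lam(t, x, y, z):                  # sample gauge function Lambda (assumption)
    return sin(t - z) * cos(x) * exp(0.2 * y)

def A(t, x, y, z):                    # sample four-potential A^mu (assumption)
    return [cos(t) * sin(x), sin(y - t), x * z, cos(2 * y)]

h = 1e-4
eta_diag = [1, -1, -1, -1]            # metric signature (+, -, -, -)

def d_upper(F, mu, pt):
    """Contravariant derivative d^mu of a scalar F(t,x,y,z), cf. (22.291)."""
    q_p = list(pt); q_m = list(pt)
    q_p[mu] += h; q_m[mu] -= h
    return eta_diag[mu] * (F(*q_p) - F(*q_m)) / (2 * h)

def field_tensor(Afun, pt):
    """F^{mu nu} = d^mu A^nu - d^nu A^mu of (22.294), by finite differences."""
    comp = lambda nu: (lambda *q: Afun(*q)[nu])
    return [[d_upper(comp(nu), mu, pt) - d_upper(comp(mu), nu, pt)
             for nu in range(4)] for mu in range(4)]

def A_gauged(t, x, y, z):
    """Gauge-transformed potential A'^mu = A^mu + d^mu Lambda of (22.293)."""
    q = (t, x, y, z)
    return [A(*q)[mu] + d_upper(Lam, mu, q) for mu in range(4)]

pt = (0.3, 0.5, -0.4, 0.9)
F1 = field_tensor(A, pt)
F2 = field_tensor(A_gauged, pt)
diff = max(abs(F1[m][n] - F2[m][n]) for m in range(4) for n in range(4))
print(diff < 1e-5)   # True: F^{mu nu} is unchanged by the gauge transformation
```

The Λ contribution drops out because the discrete mixed derivatives ∂^μ∂^νΛ and ∂^ν∂^μΛ are identical stencils, mirroring the analytic cancellation in (22.295).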

22.5.2 Lagrangian Density and Hamiltonian Density of the Electromagnetic Field

Now we are in a position to describe the Lagrangian density used to quantize the free electromagnetic field. Choosing A_μ(x) for the canonical coordinate, we first define the Lagrangian density L(x) as
\[
\mathcal{L}(x) \equiv -\frac{1}{4} F_{\mu\nu}(x) F^{\mu\nu}(x)
= -\frac{1}{4} \left[ \partial_\mu A_\nu(x) - \partial_\nu A_\mu(x) \right] \left[ \partial^\mu A^\nu(x) - \partial^\nu A^\mu(x) \right]. \tag{22.299}
\]

Using (22.285) and (22.286) based on the classical electromagnetic fields E(t,𝐱) and B(t,𝐱), we obtain
\[
\mathcal{L}(x) = \frac{1}{2} \left\{ [\boldsymbol{E}(t,\boldsymbol{x})]^2 - [\boldsymbol{B}(t,\boldsymbol{x})]^2 \right\}. \tag{22.300}
\]
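The step from (22.299) to (22.300) can be traced componentwise from (22.285) and (22.286): the time–space entries enter with one metric sign flip, the space–space entries with two:

```latex
F_{\mu\nu}F^{\mu\nu}
  = 2\sum_{k=1}^{3} F_{0k}F^{0k} + \sum_{j,k=1}^{3} F_{jk}F^{jk}
  = 2\sum_{k=1}^{3} (E_k)(-E_k) + 2\sum_{k=1}^{3} B_k^{\,2}
  = -2\boldsymbol{E}^2 + 2\boldsymbol{B}^2,
\qquad\text{hence}\qquad
-\frac{1}{4}F_{\mu\nu}F^{\mu\nu} = \frac{1}{2}\left(\boldsymbol{E}^2 - \boldsymbol{B}^2\right).
```

Here each off-diagonal pair (jk) and (kj) contributes equally, which supplies the factors of 2.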

From (22.299), as the corresponding canonical momentum π^μ(x) we have
\[
\pi^\mu(x) \equiv \partial \mathcal{L}(x) / \partial \dot{A}_\mu(x) = F^{\mu 0}(x). \tag{22.301}
\]
The Hamiltonian density ℋ(x) is then described by
\[
\mathcal{H}(x) = \pi^\mu(x)\, \dot{A}_\mu(x) - \mathcal{L}(x).
\]
The first term is given by

\[
\pi_\mu(x)\, \dot{A}^\mu(x) =
\begin{pmatrix} 0 & -E_1 & -E_2 & -E_3 \end{pmatrix}
\begin{pmatrix} 0 \\ -E_1 \\ -E_2 \\ -E_3 \end{pmatrix}
= \boldsymbol{E}^2,
\]
where we used (22.286) to denote π_μ(x); we also used (22.283), to which ϕ ≡ 0 is imposed. The condition ϕ ≡ 0 is due to the Coulomb gauge that is often used to describe the radiation in vacuum [5]. From (22.300), we get
\[
\mathcal{H}(x) = \frac{1}{2} \left\{ [\boldsymbol{E}(t,\boldsymbol{x})]^2 + [\boldsymbol{B}(t,\boldsymbol{x})]^2 \right\}. \tag{22.302}
\]

Although (22.300) and (22.302) seem reasonable, (22.301) has an inconvenience: we could not define π⁰(x), i.e., π⁰(x) = F^{00}(x) ≡ 0. To circumvent this inconvenience, we need to redefine the Lagrangian density. We develop the discussion in accordance with the literature [12]. The point is to use the scalar function χ(x) of (22.297). That is, we set the Lagrangian density such that [12]
\[
\mathcal{L}(x) \equiv -\frac{1}{4} F_{\mu\nu}(x) F^{\mu\nu}(x) - \frac{1}{2} \chi^2. \tag{22.303}
\]
Rewriting F_{μν}F^{μν} as
\[
-\frac{1}{4} F_{\mu\nu} F^{\mu\nu}
= -\frac{1}{2} \left[ \partial_\mu A_\nu\, \partial^\mu A^\nu - \partial_\mu A_\nu\, \partial^\nu A^\mu \right]
= -\frac{1}{2} \partial_\mu A_\nu\, \partial^\mu A^\nu + \frac{1}{2} \partial_\mu (A_\nu \partial^\nu A^\mu) - \frac{1}{2} A_\nu \partial^\nu \chi,
\]
we have
\[
\mathcal{L}(x) \equiv -\frac{1}{2} \partial_\mu A_\nu\, \partial^\mu A^\nu + \frac{1}{2} \partial_\mu (A_\nu \partial^\nu A^\mu) - \frac{1}{2} A_\nu \partial^\nu \chi - \frac{1}{2} \chi^2. \tag{22.304}
\]

We wish to take the variation with respect to A_ν (i.e., δA_ν) termwise in (22.304). Namely, we have
\[
\mathcal{L}(x) = \mathcal{L}_1(x) + \mathcal{L}_2(x) + \mathcal{L}_3(x) + \mathcal{L}_4(x), \tag{22.305}
\]
where L₁(x), ⋯, L₄(x) correspond to the first, ⋯, fourth terms of (22.304), respectively. We obtain
\[
S(A_\nu) = \int dx\, \mathcal{L}(x) = \int dx\, \mathcal{L}_1(x) + \cdots + \int dx\, \mathcal{L}_4(x) = S_1 + \cdots + S_4,
\]
\[
\delta S(A_\nu) = \delta \int dx\, \mathcal{L}(x)
= \delta \int dx\, \mathcal{L}_1(x) + \cdots + \delta \int dx\, \mathcal{L}_4(x)
= \delta S_1(A_\nu) + \cdots + \delta S_4(A_\nu), \tag{22.306}
\]
where S_k(A_ν) = ∫dx L_k(x) and δS_k(A_ν) = δ∫dx L_k(x) (k = 1, 2, 3, 4). We evaluate the δS_k one by one. With δS₁ we get

\[
\begin{aligned}
\delta S_1(A_\nu) = \delta \int dx\, \mathcal{L}_1(x)
&= \delta \int dx \left( -\frac{1}{2} \partial_\mu A_\nu\, \partial^\mu A^\nu \right)
= -\frac{1}{2} \int dx \left[ \partial_\mu \delta A_\nu\, \partial^\mu A^\nu + \partial_\mu A_\nu\, \partial^\mu \delta A^\nu \right] \\
&= \frac{1}{2} \int dx\, \delta A_\nu \left[ \partial_\mu \partial^\mu A^\nu + \partial^\mu \partial_\mu A^\nu \right]
= \int dx\, \delta A_\nu\, \partial_\mu \partial^\mu A^\nu, 
\end{aligned} \tag{22.307}
\]

δS3 ðAν Þ =

1 2 1 dxδL3 ðxÞ = 2

δS4 ðAν Þ =

dxδL4 ðxÞ =

δS2 ðAν Þ =

dxδL2 ðxÞ = -

ν

dxδAν ∂μ ∂ Aμ , ν

dxδAν ∂ ∂μ Aμ ,

ð22:308Þ

ν

dxδAν ∂ ∂μ Aμ :

Summing δS1(Aν), δS2(Aν), δS3(Aν), and δS4(Aν), we obtain a simple result expressed as δSðAν Þ = δS1 ðAν Þ =

μ

dxδAν ∂μ ∂ Aν :

ð22:309Þ

Notice that δS2(Aν), δS3(Aν), and δS4(Aν) canceled out one another. Thus, the Euler–Lagrange equation obtained from (22.309) reads as

1058

22 Quantization of Fields μ

∂ μ ∂ A ν ð xÞ = 0

or

□Aν ðxÞ = 0:

ð22:310Þ

For (22.310) to be equivalent to the Maxwell’s equations, we need another condition [12]. We seek the condition below. With a free electromagnetic field without the charge–current density, considering (22.297) and (22.298) on condition that sν(x) = 0 we have μ

ν

μ

ν

∂μ ∂ Aν ðxÞ - ∂ ∂μ Aμ ðxÞ = ∂μ ∂ Aν ðxÞ - ∂ χ ðxÞ = 0:

ð22:311Þ

Then, for (22.310) to be consistent with (22.311) we must have ν

∂ χ ðxÞ = 0:

ð22:312Þ

This leads to ν

∂ν ∂ χ ðxÞ = 0:

ð22:313Þ

Putting ν = 0 in (22.312), we have 0

∂ χ ðxÞ = ∂χ ðxÞ=∂t = 0:

ð22:314Þ

Consequently, if we impose χ ðxÞ = ∂μ Aμ ðxÞ = 0 at

t = 0,

ð22:315Þ

from (22.313) we have χ(x)  0 at all the times [12]. The constraint χ(x) = 0 is called the Lorentz condition. Any gauge in which the Lorentz condition holds is called a Lorentz gauge [12]. The significance of the Lorentz condition will be mentioned later in relation to the indefinite metric (Sect. 22.5.5). Differentiating (22.293) with respect to xμ, we obtain μ

μ

∂μ A0 ðxÞ = ∂μ Aμ ðxÞ þ ∂μ ∂ ΛðxÞ:

ð22:316Þ

Using the Lorentz gauge, from (22.315) and (22.316) we have [12] μ

∂ μ ∂ Λ ð xÞ = 0

or

□ΛðxÞ = 0:

ð22:317Þ

When we originally chose Λ(x) in (22.280), Λ could be any scalar field. We find, however, that the scalar function Λ(x) must satisfy the condition of (22.317) under the Lorentz gauge. Under the Lorentz gauge, in turn, the last two terms of (22.304) vanish. If we assume that Aν vanishes at x = ± 1, the second term of (22.304) vanishes as well. It is because the integration of that term can be regarded the extension of the Gauss’s

22.5

Quantization of the Electromagnetic Field [5–7]

1059

theorem to the four-dimensional space–time (see Sect. 7.1). Thus, among four terms of (22.304) only the first term survives as the Lagrangian density L(x) that properly produces the physically meaningful π 0(x). That is, we get [12] Lð xÞ = -

1 μ ∂ A ∂ Aν : 2 μ ν

ð22:318Þ

The Lagrangian density described by (22.318) is equivalent to that of (22.303) or (22.304) on which the Lorentz condition is imposed and for which the field Aν is supposed to vanish at x = ± 1. Using (22.318), as the canonical momentum π μ(x) we obtain [12] μ

π μ ðxÞ = ∂LðxÞ=∂A_ μ ðxÞ = - A_ ðxÞ:

ð22:319Þ

Thus, the inconvenience of π 0(x) = 0 mentioned earlier has been skillfully gotten rid of. Notice that RHS of (22.319) has a minus sign, in contrast to (22.83). What remains to be done is to examine whether or not L(x) is invariant under the gauge transformation (22.293). We will discuss this issue in Chap. 23. In the subsequent sections, we perform the canonical quantization of the electromagnetic field using the canonical coordinates and canonical momenta.

22.5.3

Polarization Vectors of the Electromagnetic Field [5]

In Part I and Part II, we discussed the polarization vectors of the electromagnetic fields within the framework of the classical theory. In this section, we revisit the related discussion and rebuild the polarization features of the fields from the relativistic point of view. Previously, we have mentioned that the electromagnetic waves have two independent polarization states as the transverse modes. Since in this section we are dealing with the electromagnetic fields as the four-vectors, we need to consider two extra polarization freedoms, i.e., the longitudinal mode and scalar mode. Then, we distinguish these modes as the polarization vectors such as εμ ðk, r Þ

or

εμ ðk, r Þ ðμ, r = 0, 1, 2, 3Þ,

where the index μ denotes the component of the four-vector (either contravariant or covariant); k is a wavenumber vector (see Part II); r represents the four linearly independent polarization states for each k. The modes εμ(k, 1) and εμ(k, 2) are called transverse polarizations, εμ(k, 3) longitudinal polarization, εμ(k, 0) scalar polarization. The quantity εμ(k, r) is a complex c-number. Also, note that

1060

22 Quantization of Fields

k=

k0 k

with k0 = k0 = jkj:

ð22:320Þ

Equation (22.320) is a consequence of the fact that photon is massless. The orthonormality and completeness relations are read as [5] εμ ðk, r Þ εμ ðk, sÞ = - ζ r δrs ðr, s = 0, 1, 2, 3Þ, 3

ζ r εμ ðk, r Þ εν ðk, r Þ = - ημν or

r=0

3 r=0

ζ r εμ ðk, r Þ εν ðk, r Þ = - δμν

ð22:321Þ ð22:322Þ

with ζ 0 = - 1, ζ 1 = ζ 2 = ζ 3 = 1:

ð22:323Þ

Equation (22.321) means that εμ ðk, 0Þ εμ ðk, 0Þ = 1 > 0, εμ ðk, r Þ εμ ðk, r Þ = - 1 < 0 ðr = 1, 2, 3Þ:

ð22:324Þ

If εμ(k, r) is real, εμ(k, 0) is a timelike vector and that εμ(k, r) (r = 1, 2, 3) are spacelike vectors in the Minkowski space. We show other important relations in a special frame where k is directed to the positive direction of the z-axis. We have εμ ðk, 0Þ =

1 , εμ ðk, r Þ = 0

0 εðk, r Þ

ðr = 1, 2, 3Þ,

ð22:325Þ

where ε(k, r) denotes a usual three-dimensional vector; to specify it we use a Gothic letter ε. If we wish to represent εμ(k, 0) in a general frame, we denote it by a covariant notation [5] εμ ðk, 0Þ  nμ :

ð22:326Þ

Also, we have k  εðk, r Þ = 0 ðr = 1, 2Þ, εðk, 3Þ = k=jkj, εðk, r Þ  εðk, sÞ = δrs ðr, s = 1, 2, 3Þ: Denoting εμ(k, 3) by a covariant form, we have [5]

ð22:327Þ

\[
\varepsilon^\mu(\boldsymbol{k}, 3) = \frac{k^\mu - (kn)\, n^\mu}{\sqrt{(kn)^2 - k^2}}, \tag{22.328}
\]
where the numerator was obtained by subtracting the timelike component from k = (k⁰, 𝐤)^T; a related discussion of the three-dimensional version can be seen in Sect. 7.3. In (22.321) and (22.322), we used complex numbers for ε^μ(k, s); these are generally used for circularly (or elliptically) polarized light (Sect. 7.4). For linearly polarized light, we use real numbers for ε^μ(k, s). All the above discussions relevant to the polarization features of the electromagnetic fields can be dealt with within the framework of the classical theory of electromagnetism.
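The orthonormality (22.321) and completeness (22.322) relations can be checked numerically in the special frame (22.325) with k along the z-axis. In the sketch below, the two transverse modes are taken circularly polarized — a complex choice, as mentioned above, and an assumption of this illustration; any orthonormal transverse pair works equally well:

```python
# Polarization vectors eps^mu(k, r) in the frame where k is along the z-axis.
inv_sqrt2 = 2 ** -0.5
eps = [
    [1, 0, 0, 0],                              # r = 0: scalar mode
    [0, inv_sqrt2, 1j * inv_sqrt2, 0],         # r = 1: transverse (circular)
    [0, inv_sqrt2, -1j * inv_sqrt2, 0],        # r = 2: transverse (circular)
    [0, 0, 0, 1],                              # r = 3: longitudinal mode
]
zeta = [-1, 1, 1, 1]                           # (22.323)
eta = [1, -1, -1, -1]                          # diagonal of the Minkowski metric

def mdot(a, b):                                # eps^mu(r)* eps_mu(s)
    return sum(eta[m] * complex(a[m]).conjugate() * b[m] for m in range(4))

# Orthonormality (22.321): eps^mu(k,r)* eps_mu(k,s) = -zeta_r delta_rs.
ok1 = all(abs(mdot(eps[r], eps[s]) - (-zeta[r] if r == s else 0)) < 1e-12
          for r in range(4) for s in range(4))

# Completeness (22.322): sum_r zeta_r eps^mu eps^nu* = -eta^{mu nu}.
ok2 = all(abs(sum(zeta[r] * eps[r][m] * complex(eps[r][n]).conjugate()
                  for r in range(4))
              - (-(eta[m] if m == n else 0))) < 1e-12
          for m in range(4) for n in range(4))
print(ok1 and ok2)   # True
```

Note how the factor ζ₀ = −1 of the scalar mode is what allows the sum to reproduce the indefinite metric −η^{μν}.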

22.5.4 Canonical Quantization of the Electromagnetic Field

The quantization of the electromagnetic field is inherently represented in a tensor form. This differs from the cases of the scalar field and the Dirac field, and this feature is highly important in both classical and quantum electromagnetism. As in the case of the scalar field, the canonical quantization is translated into the commutation relations between the creation and annihilation operators. Unlike the scalar field and the Dirac field, however, the classical polarization vectors of the photon are incorporated into the commutation relations. The equal-time canonical commutation relation is described as
\[
[A^\mu(t, \boldsymbol{x}),\, \pi_\nu(t, \boldsymbol{y})] = i\, \delta^\mu_{\ \nu}\, \delta^3(\boldsymbol{x} - \boldsymbol{y}). \tag{22.329}
\]

Equation (22.329) corresponds to (22.84) of the scalar field case and is clearly represented in the tensor form, i.e., a direct product of a contravariant vector A^μ(t,𝐱) and a covariant vector π_ν(t,𝐲). Multiplying both sides of (22.329) by η^{νρ}, we obtain
\[
[A^\mu(t, \boldsymbol{x}),\, \pi^\rho(t, \boldsymbol{y})] = i\, \eta^{\mu\rho}\, \delta^3(\boldsymbol{x} - \boldsymbol{y}). \tag{22.330}
\]
This is a tensor described as a direct product of two contravariant vectors. Using (22.319), we get
\[
[A^\mu(t, \boldsymbol{x}),\, \dot{A}_\nu(t, \boldsymbol{y})] = -i\, \delta^\mu_{\ \nu}\, \delta^3(\boldsymbol{x} - \boldsymbol{y}), \tag{22.331}
\]
\[
[A^\mu(t, \boldsymbol{x}),\, \dot{A}^\rho(t, \boldsymbol{y})] = -i\, \eta^{\mu\rho}\, \delta^3(\boldsymbol{x} - \boldsymbol{y}). \tag{22.332}
\]

In (22.331) and (22.332), the minus sign on the RHS comes from (22.319). All the commutators of the other combinations vanish. (That is, all the other combinations are commutative.) We have, e.g.,
\[
[A^\mu(t, \boldsymbol{x}),\, A^\nu(t, \boldsymbol{y})] = [\dot{A}^\mu(t, \boldsymbol{x}),\, \dot{A}^\nu(t, \boldsymbol{y})] = 0.
\]
We perform the quantization of the electromagnetic field in accordance with (22.98) of the scalar field (Sect. 22.3.2). Taking account of the polarization index r (= 0, ⋯, 3), we expand the identity operator E such that
\[
E = \int_{-\infty}^{\infty} \sum_{r=0}^{3} \delta\!\left( {k_0}^2 - \boldsymbol{k}^2 \right) |p, r\rangle \langle p, r|\, d^4 p. \tag{22.333}
\]

Notice that in (22.333) the photon is massless. As the Fourier expansion of the field operator A^μ(x), we obtain [2]
\[
A^\mu(x) = \int \frac{d^3 k}{\sqrt{(2\pi)^3 (2k_0)}} \sum_{r=0}^{3} \left[ a(\boldsymbol{k}, r)\, \varepsilon^\mu(\boldsymbol{k}, r)\, e^{-ikx} + a^{\dagger}(\boldsymbol{k}, r)\, \varepsilon^\mu(\boldsymbol{k}, r)^*\, e^{ikx} \right]. \tag{22.334}
\]

In (22.334), the summation with respect to r is pertinent to a(k, r) [or a†(k, r)] and ε^μ(k, r) [or ε^μ(k, r)*]. In comparison with (22.208), in (22.334) we use the following correspondence:
\[
a_{+}(\boldsymbol{p}) \leftrightarrow a(\boldsymbol{k}, r) \quad \text{and} \quad a_{-}^{\dagger}(-\boldsymbol{p}) \leftrightarrow a^{\dagger}(\boldsymbol{k}, r)
\]
as well as
\[
\Phi_{+}\!\left( \boldsymbol{p}, E_p \right) \leftrightarrow \varepsilon^\mu(\boldsymbol{k}, r) \quad \text{and} \quad \Phi_{-}\!\left( -\boldsymbol{p}, -E_p \right) \leftrightarrow \varepsilon^\mu(\boldsymbol{k}, r)^*.
\]
Then, as the decomposition of A^μ(x) into the positive-frequency term A^{μ+}(x) and the negative-frequency term A^{μ−}(x), we have A^μ(x) = A^{μ+}(x) + A^{μ−}(x) with

22.5

Quantization of the Electromagnetic Field [5–7]

$$A^{\mu+}(x) \equiv \int \frac{d^3k}{\sqrt{(2\pi)^3(2k_0)}}\,\sum_{r=0}^{3} a(\mathbf{k}, r)\,\varepsilon^\mu(\mathbf{k}, r)\,e^{-ikx},$$

$$A^{\mu-}(x) \equiv \int \frac{d^3k}{\sqrt{(2\pi)^3(2k_0)}}\,\sum_{r=0}^{3} a^\dagger(\mathbf{k}, r)\,\varepsilon^\mu(\mathbf{k}, r)^{*}\,e^{ikx}. \tag{22.335}$$

Note that A^μ(x) is Hermitian, as is the case with the scalar field ϕ(x). With the covariant quantized electromagnetic field A_μ(x), we have

$$A_\mu(x) = \int \frac{d^3k}{\sqrt{(2\pi)^3(2k_0)}}\,\sum_{r=0}^{3}\left[a(\mathbf{k}, r)\,\varepsilon_\mu(\mathbf{k}, r)\,e^{-ikx} + a^\dagger(\mathbf{k}, r)\,\varepsilon_\mu(\mathbf{k}, r)^{*}\,e^{ikx}\right]. \tag{22.336}$$

The operator A_μ(x) is also divided into A_μ(x) = A_μ^+(x) + A_μ^−(x), where A_μ^+(x) and A_μ^−(x) are defined similarly to (22.335). Now, we define operators B_μ(k) as

$$B_\mu(\mathbf{k}) \equiv \sum_{r=0}^{3} a(\mathbf{k}, r)\,\varepsilon_\mu(\mathbf{k}, r). \tag{22.337}$$

Taking the adjoint of (22.337), we get

$$B_\mu(\mathbf{k})^\dagger = \sum_{r=0}^{3} a^\dagger(\mathbf{k}, r)\,\varepsilon_\mu(\mathbf{k}, r)^{*}. \tag{22.338}$$

Using the operators B_μ(k) and B_μ(k)†, we may advance the discussion of the quantization in parallel with that of the scalar field. Note the following correspondence between the photon field and the scalar field:

$$B_\mu(\mathbf{k}) \leftrightarrow a(\mathbf{k}), \qquad B_\mu(\mathbf{k})^\dagger \leftrightarrow a^\dagger(\mathbf{k}).$$

Remember once again that both the photon field operator and the real scalar field operator are Hermitian (and bosonic field operators), allowing us to carry out the parallel discussion on the field quantization. As in the case of (22.104), we have


$$B^\mu(\mathbf{k})\,e^{-ik_0 t} = i\int \frac{d^3x}{\sqrt{(2\pi)^3(2k_0)}}\left[\dot{A}^\mu(t,\mathbf{x}) - ik_0 A^\mu(t,\mathbf{x})\right]e^{-i\mathbf{k}\cdot\mathbf{x}}, \tag{22.339}$$

$$B^\mu(\mathbf{k})^\dagger\,e^{ik_0 t} = -i\int \frac{d^3x}{\sqrt{(2\pi)^3(2k_0)}}\left[\dot{A}^\mu(t,\mathbf{x}) + ik_0 A^\mu(t,\mathbf{x})\right]e^{i\mathbf{k}\cdot\mathbf{x}}. \tag{22.340}$$

In light of the quantization of the scalar field represented by (22.105), we acknowledge that the following relationship holds:

$$[B_\mu(\mathbf{k}),\,B^\nu(\mathbf{q})^\dagger] = -\delta_\mu^{\ \nu}\,\delta^3(\mathbf{k}-\mathbf{q}), \tag{22.341}$$

where B^ν(q)† denotes the contravariant counterpart of B_ν(q)†; the minus sign at the head of the RHS in (22.341) comes from (22.319). In fact, we have

$$[B_\mu(\mathbf{k}),\,B^\nu(\mathbf{q})^\dagger] = \left[\sum_{r=0}^{3} a(\mathbf{k}, r)\,\varepsilon_\mu(\mathbf{k}, r),\ \sum_{s=0}^{3} a^\dagger(\mathbf{q}, s)\,\varepsilon^\nu(\mathbf{q}, s)^{*}\right]$$
$$= \sum_{r=0}^{3}\sum_{s=0}^{3}\varepsilon_\mu(\mathbf{k}, r)\,\varepsilon^\nu(\mathbf{q}, s)^{*}\,[a(\mathbf{k}, r),\,a^\dagger(\mathbf{q}, s)]$$
$$= \sum_{r=0}^{3}\sum_{s=0}^{3}\zeta_r\,\varepsilon_\mu(\mathbf{k}, r)\,\varepsilon^\nu(\mathbf{q}, s)^{*}\,\zeta_r\,[a(\mathbf{k}, r),\,a^\dagger(\mathbf{q}, s)], \tag{22.342}$$

where with the last equality we used ζ_r² = 1. For (22.341) and (22.342) to be identical, taking account of (22.322) we must have [5]

$$\zeta_r\,[a(\mathbf{k}, r),\,a^\dagger(\mathbf{q}, s)] = \delta_{rs}\,\delta^3(\mathbf{k}-\mathbf{q}) \tag{22.343}$$

or

$$[a(\mathbf{k}, r),\,a^\dagger(\mathbf{q}, s)] = \zeta_r\,\delta_{rs}\,\delta^3(\mathbf{k}-\mathbf{q}). \tag{22.344}$$

Also, as other commutation relations we have

$$[a(\mathbf{k}, r),\,a(\mathbf{q}, s)] = [a^\dagger(\mathbf{k}, r),\,a^\dagger(\mathbf{q}, s)] = 0.$$

Then, (22.342) becomes


$$[B_\mu(\mathbf{k}),\,B^\nu(\mathbf{q})^\dagger] = \sum_{r=0}^{3}\sum_{s=0}^{3}\zeta_r\,\varepsilon_\mu(\mathbf{k}, r)\,\varepsilon^\nu(\mathbf{q}, s)^{*}\,\delta_{rs}\,\delta^3(\mathbf{k}-\mathbf{q})$$
$$= \sum_{r=0}^{3}\zeta_r\,\varepsilon_\mu(\mathbf{k}, r)\,\varepsilon^\nu(\mathbf{q}, r)^{*}\,\delta^3(\mathbf{k}-\mathbf{q})$$
$$= \sum_{r=0}^{3}\zeta_r\,\varepsilon_\mu(\mathbf{q}, r)\,\varepsilon^\nu(\mathbf{q}, r)^{*}\,\delta^3(\mathbf{k}-\mathbf{q})$$
$$= -\delta_\mu^{\ \nu}\,\delta^3(\mathbf{k}-\mathbf{q}), \tag{22.341}$$

as required. With the second to the last equality of (22.341), we used (10.116), i.e., the property of the δ function; with the last equality we used (22.322). It is worth mentioning that the RHS of (22.344) contains ζ_r. This fact implies that the quantization of the radiation field is directly associated with the polarization feature of the classical electromagnetic field.
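The sign structure of (22.341)–(22.344) can be checked numerically in the special frame introduced later in (22.361), where the polarization vectors reduce to ε^μ(k, r) = δ^μ_r. The following sketch is not from the book; it assumes the metric η = diag(1, −1, −1, −1) and the sign factors ζ₀ = −1, ζ₁ = ζ₂ = ζ₃ = 1 used in the text:

```python
import numpy as np

# Metric and polarization sign factors in the book's conventions:
eta = np.diag([1.0, -1.0, -1.0, -1.0])
zeta = np.array([-1.0, 1.0, 1.0, 1.0])

# Special-frame polarization vectors eps^mu(k, r) = delta^mu_r, cf. (22.361).
eps = np.eye(4)  # eps[r] is the 4-vector of polarization index r

# Completeness (22.322): sum_r zeta_r eps^mu(k,r) eps^nu(k,r)* = -eta^{mu nu}
completeness = sum(zeta[r] * np.outer(eps[r], eps[r]) for r in range(4))
assert np.allclose(completeness, -eta)

# Orthogonality (22.321): eps^mu(k,r) eps_mu(k,s)* = -zeta_r delta_rs
for r in range(4):
    for s in range(4):
        inner = eps[r] @ eta @ eps[s]  # contraction with the metric
        expected = -zeta[r] if r == s else 0.0
        assert np.isclose(inner, expected)

print("completeness and orthogonality verified")
```

These two identities are exactly what turn the double sum in (22.342) into the −δ_μ^ν of (22.341).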

22.5.5 Hamiltonian and Indefinite Metric

In the last section, we obtained the commutation relation (22.344) of the creation and annihilation operators as well as the canonical commutation relations (22.331) and (22.332). These equations differ from the corresponding commutation relations (22.84) and (22.105) of the scalar field in that (22.331) has a minus sign and (22.344) has a factor ζ_r. This feature causes problems in the subsequent discussion. In relation to this issue, we calculate the Hamiltonian density ℋ(x) of the electromagnetic field. It is given by

$$\mathcal{H}(x) = \pi^\mu(x)\dot{A}_\mu(x) - \mathcal{L}(x) = -\dot{A}^\mu(x)\dot{A}_\mu(x) + \frac{1}{2}\partial_\mu A_\nu\,\partial^\mu A^\nu, \tag{22.345}$$

where with the second equality we used (22.318) and (22.319). As in the case of the scalar field, the Hamiltonian H is given by the triple integral of ℋ(x) such that

$$H = \int d^3x\,\mathcal{H}(x).$$

The calculations are similar to those of the scalar field (Sect. 22.3.3). We evaluate the integral termwise. For the first term of (22.345), using (22.334) and (22.336) we have

$$-\dot{A}^\mu(x)\dot{A}_\mu(x) = \frac{-1}{2(2\pi)^3}\int\frac{d^3k}{\sqrt{k_0}}\int\frac{d^3q}{\sqrt{q_0}}$$
$$\times \sum_{r=0}^{3}\left[a(\mathbf{k}, r)\varepsilon^\mu(\mathbf{k}, r)(-ik_0)e^{-ikx} + a^\dagger(\mathbf{k}, r)\varepsilon^\mu(\mathbf{k}, r)^{*}(ik_0)e^{ikx}\right]$$
$$\times \sum_{s=0}^{3}\left[a(\mathbf{q}, s)\varepsilon_\mu(\mathbf{q}, s)(-iq_0)e^{-iqx} + a^\dagger(\mathbf{q}, s)\varepsilon_\mu(\mathbf{q}, s)^{*}(iq_0)e^{iqx}\right]$$
$$= \frac{1}{2(2\pi)^3}\int d^3k\,\sqrt{k_0}\int d^3q\,\sqrt{q_0}\,\sum_{r=0}^{3}\left[-a(\mathbf{k}, r)\varepsilon^\mu(\mathbf{k}, r)e^{-ikx} + a^\dagger(\mathbf{k}, r)\varepsilon^\mu(\mathbf{k}, r)^{*}e^{ikx}\right]$$
$$\times \sum_{s=0}^{3}\left[-a(\mathbf{q}, s)\varepsilon_\mu(\mathbf{q}, s)e^{-iqx} + a^\dagger(\mathbf{q}, s)\varepsilon_\mu(\mathbf{q}, s)^{*}e^{iqx}\right]. \tag{22.346}$$

The integrand of (22.346) is given by

$$[\text{Integrand of (22.346)}] = \sum_{r=0}^{3}\sum_{s=0}^{3}\sqrt{k_0 q_0}\,\Big\{-a(\mathbf{k}, r)\varepsilon^\mu(\mathbf{k}, r)\,a^\dagger(\mathbf{q}, s)\varepsilon_\mu(\mathbf{q}, s)^{*}\,e^{-i(k-q)x}$$
$$- a^\dagger(\mathbf{k}, r)\varepsilon^\mu(\mathbf{k}, r)^{*}\,a(\mathbf{q}, s)\varepsilon_\mu(\mathbf{q}, s)\,e^{i(k-q)x}$$
$$+ a(\mathbf{k}, r)\varepsilon^\mu(\mathbf{k}, r)\,a(\mathbf{q}, s)\varepsilon_\mu(\mathbf{q}, s)\,e^{-i(k+q)x}$$
$$+ a^\dagger(\mathbf{k}, r)\varepsilon^\mu(\mathbf{k}, r)^{*}\,a^\dagger(\mathbf{q}, s)\varepsilon_\mu(\mathbf{q}, s)^{*}\,e^{i(k+q)x}\Big\}.$$

To calculate ∫d³x[−Ȧ^μ(x)Ȧ_μ(x)], we exchange the integration order between the variable x and the variable k (or q). That is, performing the integration with respect to x first, we obtain

$$\int d^3x\,[\text{Integrand of (22.346)}]$$
$$= \sum_{r=0}^{3}\sum_{s=0}^{3}\sqrt{k_0 q_0}\,\Big\{-a(\mathbf{k}, r)\varepsilon^\mu(\mathbf{k}, r)\,a^\dagger(\mathbf{q}, s)\varepsilon_\mu(\mathbf{q}, s)^{*}\,e^{-i(k_0-q_0)t}\,(2\pi)^3\delta^3(\mathbf{k}-\mathbf{q})$$
$$- a^\dagger(\mathbf{k}, r)\varepsilon^\mu(\mathbf{k}, r)^{*}\,a(\mathbf{q}, s)\varepsilon_\mu(\mathbf{q}, s)\,e^{i(k_0-q_0)t}\,(2\pi)^3\delta^3(\mathbf{k}-\mathbf{q})$$
$$+ a(\mathbf{k}, r)\varepsilon^\mu(\mathbf{k}, r)\,a(\mathbf{q}, s)\varepsilon_\mu(\mathbf{q}, s)\,e^{-i(k_0+q_0)t}\,(2\pi)^3\delta^3(\mathbf{k}+\mathbf{q})$$
$$+ a^\dagger(\mathbf{k}, r)\varepsilon^\mu(\mathbf{k}, r)^{*}\,a^\dagger(\mathbf{q}, s)\varepsilon_\mu(\mathbf{q}, s)^{*}\,e^{i(k_0+q_0)t}\,(2\pi)^3\delta^3(\mathbf{k}+\mathbf{q})\Big\}. \tag{22.347}$$

Further performing the integration of (22.347) with respect to q, we obtain


$$\int d^3q\,(22.347) = (2\pi)^3 k_0 \sum_{r=0}^{3}\sum_{s=0}^{3}\Big\{-a(\mathbf{k}, r)\varepsilon^\mu(\mathbf{k}, r)\,a^\dagger(\mathbf{k}, s)\varepsilon_\mu(\mathbf{k}, s)^{*}$$
$$- a^\dagger(\mathbf{k}, r)\varepsilon^\mu(\mathbf{k}, r)^{*}\,a(\mathbf{k}, s)\varepsilon_\mu(\mathbf{k}, s)$$
$$+ a(\mathbf{k}, r)\varepsilon^\mu(\mathbf{k}, r)\,a(-\mathbf{k}, s)\varepsilon_\mu(-\mathbf{k}, s)\,e^{-2ik_0 t}$$
$$+ a^\dagger(\mathbf{k}, r)\varepsilon^\mu(\mathbf{k}, r)^{*}\,a^\dagger(-\mathbf{k}, s)\varepsilon_\mu(-\mathbf{k}, s)^{*}\,e^{2ik_0 t}\Big\}. \tag{22.348}$$

Only when k = −k are the last two terms of (22.348) physically meaningful. In that case, however, we have k = 0. But this implies no photon state, and so we discard these two terms. Compare the present situation with that of (22.111), i.e., the scalar field case, where k = 0 is physically meaningful. Then, using (22.321) we get

$$\int d^3q\,(22.347) = (2\pi)^3 k_0 \sum_{r=0}^{3}\sum_{s=0}^{3}\left[-a(\mathbf{k}, r)a^\dagger(\mathbf{k}, s)(-\zeta_r\delta_{rs}) - a^\dagger(\mathbf{k}, r)a(\mathbf{k}, s)(-\zeta_r\delta_{rs})\right]$$
$$= (2\pi)^3 k_0 \sum_{r=0}^{3}\zeta_r\left[a(\mathbf{k}, r)a^\dagger(\mathbf{k}, r) + a^\dagger(\mathbf{k}, r)a(\mathbf{k}, r)\right]$$
$$= (2\pi)^3 k_0 \sum_{r=0}^{3}\zeta_r\left[a^\dagger(\mathbf{k}, r)a(\mathbf{k}, r) + \zeta_r\delta^3(0) + a^\dagger(\mathbf{k}, r)a(\mathbf{k}, r)\right]$$
$$= (2\pi)^3 k_0 \sum_{r=0}^{3}\left[\zeta_r\,2a^\dagger(\mathbf{k}, r)a(\mathbf{k}, r) + \zeta_r^{\,2}\,\delta^3(0)\right]$$
$$= (2\pi)^3 k_0 \sum_{r=0}^{3}\zeta_r\,2a^\dagger(\mathbf{k}, r)a(\mathbf{k}, r) + (2\pi)^3 k_0\,4\delta^3(0),$$

where with the third equality we used (22.344); with the last equality we used (22.323). Finally, we get

$$\int d^3x\left[-\dot{A}^\mu(x)\dot{A}_\mu(x)\right] = \int d^3k\,k_0\sum_{r=0}^{3}\zeta_r\,a^\dagger(\mathbf{k}, r)a(\mathbf{k}, r) + 2\int d^3k\,k_0\,\delta^3(0). \tag{22.349}$$

In (22.349) the second term leads to a divergence, as discussed in Sect. 22.3.3. As before, we discard this term; namely, the normal order is implied there.


As for the integration of the second term of (22.345), use the Fourier expansion formulae (22.334) and (22.336) in combination with the δ-function formula (22.101). That term vanishes because of the presence of a factor

$$k^\mu k_\mu = k_0 k_0 - |\mathbf{k}|^2 = 0.$$

The confirmation is left to the reader. Thus, we obtain the Hamiltonian H described by

$$H = \int d^3k\,k_0\sum_{r=0}^{3}\zeta_r\,a^\dagger(\mathbf{k}, r)a(\mathbf{k}, r). \tag{22.350}$$

In (22.350) the presence of the factor ζ₀ = −1 causes a problem, because ζ₀ makes the integrand of (22.350) negative. The argument is as follows: Defining the one-photon state |1_{q,s}⟩ as [5]

$$|1_{\mathbf{q},s}\rangle \equiv a^\dagger(\mathbf{q}, s)|0\rangle, \tag{22.351}$$

where |0⟩ denotes the vacuum, its energy is

$$H|1_{\mathbf{q},s}\rangle = \int d^3k\,k_0\sum_{r=0}^{3}\zeta_r\,a^\dagger(\mathbf{k}, r)a(\mathbf{k}, r)a^\dagger(\mathbf{q}, s)|0\rangle$$
$$= \int d^3k\,k_0\sum_{r=0}^{3}\zeta_r\,a^\dagger(\mathbf{k}, r)\left[a^\dagger(\mathbf{q}, s)a(\mathbf{k}, r) + \zeta_r\delta_{rs}\delta^3(\mathbf{k}-\mathbf{q})\right]|0\rangle$$
$$= \int d^3k\,k_0\sum_{r=0}^{3}\zeta_r\,a^\dagger(\mathbf{k}, r)\,\zeta_r\delta_{rs}\delta^3(\mathbf{k}-\mathbf{q})|0\rangle$$
$$= q_0\,(\zeta_s)^2\,a^\dagger(\mathbf{q}, s)|0\rangle = q_0\,a^\dagger(\mathbf{q}, s)|0\rangle = q_0\,|1_{\mathbf{q},s}\rangle, \tag{22.352}$$

where with the second equality we used (22.344); with the third equality we used a(k, r)|0⟩ = 0; with the third to the last equality we used (22.323). Thus, (22.352) certainly shows that the energy of the one-photon state is q₀. As the square of the norm ⟨1_{q,s}|1_{q,s}⟩ [see (13.8)], we have

$$\langle 1_{\mathbf{q},s}|1_{\mathbf{q},s}\rangle = \langle 0|a(\mathbf{q}, s)a^\dagger(\mathbf{q}, s)|0\rangle = \langle 0|a^\dagger(\mathbf{q}, s)a(\mathbf{q}, s) + \zeta_s\delta^3(\mathbf{q}-\mathbf{q})|0\rangle$$
$$= \zeta_s\,\delta^3(0)\,\langle 0|0\rangle = \zeta_s\,\delta^3(0), \tag{22.353}$$

where with the last equality we assumed that the vacuum state is normalized. With s = 0, we have ζ₀ = −1 and, hence,


$$\langle 1_{\mathbf{q},0}|1_{\mathbf{q},0}\rangle = -\delta^3(0). \tag{22.354}$$

This implies that the (square of the) norm of a scalar photon is negative. Such a negative norm is said to be an indefinite metric. Admittedly, (22.354) violates the rules of inner product calculations (see Sect. 13.1). In fact, such a metric is intractable, and so it must be dealt with appropriately. So far as the factor a(k, 0)a†(k, 0) is involved, we cannot avoid this inconvenience; we wish to remove it from the quantum-mechanical operators. At the same time, we have another problem with the Lorentz condition described by

$$\partial_\mu A^\mu(x) = 0. \tag{22.355}$$

The Lorentz condition is incompatible with the equal-time canonical commutation relation. That is, we have [2]

$$[A^0(t,\mathbf{x}),\,\partial_\mu A^\mu(t,\mathbf{y})] = [A^0(t,\mathbf{x}),\,\partial_0 A^0(t,\mathbf{y})] + [A^0(t,\mathbf{x}),\,\partial_k A^k(t,\mathbf{y})]$$
$$= [A^0(t,\mathbf{x}),\,-\pi^0(t,\mathbf{y})] = -i\delta^3(\mathbf{x}-\mathbf{y}) \neq 0. \tag{22.356}$$

This problem was resolved by Gupta and Bleuler by replacing the Lorentz condition (22.355) with a weaker condition. The point is that we divide the Hermitian operator A^μ(x) into A^{μ+}(x) and A^{μ−}(x) such that A^μ(x) = A^{μ+}(x) + A^{μ−}(x) with

$$A^{\mu+}(x) \equiv \int \frac{d^3k}{\sqrt{(2\pi)^3(2k_0)}}\,\sum_{r=0}^{3} a(\mathbf{k}, r)\,\varepsilon^\mu(\mathbf{k}, r)\,e^{-ikx},$$

$$A^{\mu-}(x) \equiv \int \frac{d^3k}{\sqrt{(2\pi)^3(2k_0)}}\,\sum_{r=0}^{3} a^\dagger(\mathbf{k}, r)\,\varepsilon^\mu(\mathbf{k}, r)^{*}\,e^{ikx}. \tag{22.335}$$

We then require A^{μ+}(x) to satisfy

$$\partial_\mu A^{\mu+}(x)\,|\Psi\rangle = 0, \tag{22.357}$$

where |Ψ⟩ is any physically meaningful state. Taking the adjoint of (22.357), we have

$$\langle\Psi|\,\partial_\mu A^{\mu-}(x) = 0. \tag{22.358}$$

Accordingly, we obtain


$$\langle\Psi|\,\partial_\mu A^\mu(x)\,|\Psi\rangle = \langle\Psi|\,\partial_\mu A^{\mu-}(x) + \partial_\mu A^{\mu+}(x)\,|\Psi\rangle = 0. \tag{22.359}$$

Equation (22.359) implies that the Lorentz condition holds as an "expectation value." Substituting A^{μ+}(x) of (22.335) into (22.357), we get [2]

$$\sum_{r=0}^{3} k_\mu\,a(\mathbf{k}, r)\,\varepsilon^\mu(\mathbf{k}, r)\,|\Psi\rangle = 0. \tag{22.360}$$

With the above equation, notice that the functions e^{−ikx} in (22.335) are linearly independent with respect to the individual k. Recalling the special coordinate frame where k is directed along the positive direction of the z-axis, we have [2]

$$k_\mu = (k_0\ \ 0\ \ 0\ \ -k_0) \quad (k_0 > 0),$$

where k₁ = k₂ = 0 and k₃ = −k₀. Also, designating ε^μ(k, r) as [2]

$$\varepsilon(\mathbf{k}, 0) = \begin{pmatrix}1\\0\\0\\0\end{pmatrix} \equiv \delta^{(0)},\quad \varepsilon(\mathbf{k}, 1) = \begin{pmatrix}0\\1\\0\\0\end{pmatrix} \equiv \delta^{(1)},\quad \varepsilon(\mathbf{k}, 2) = \begin{pmatrix}0\\0\\1\\0\end{pmatrix} \equiv \delta^{(2)},\quad \varepsilon(\mathbf{k}, 3) = \begin{pmatrix}0\\0\\0\\1\end{pmatrix} \equiv \delta^{(3)}, \tag{22.361}$$

we obtain

$$\sum_{r=0}^{3} k_\mu\,a(\mathbf{k}, r)\,\varepsilon^\mu(\mathbf{k}, r) = k_0\,a(\mathbf{k}, 0) + k_3\,a(\mathbf{k}, 3) = k_0\,[a(\mathbf{k}, 0) - a(\mathbf{k}, 3)].$$

With δ^{(0)}, δ^{(3)}, etc. in (22.361), see (12.175). Then, (22.360) is rewritten as

$$k_0\,[a(\mathbf{k}, 0) - a(\mathbf{k}, 3)]\,|\Psi\rangle = 0. \tag{22.362}$$

Since k₀ ≠ 0, we have

$$[a(\mathbf{k}, 0) - a(\mathbf{k}, 3)]\,|\Psi\rangle = 0. \tag{22.363}$$
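The reduction from (22.360) to (22.362) in this special frame amounts to contracting k_μ with each polarization vector; a quick numerical sketch (not from the book) confirms that only the r = 0 and r = 3 terms survive, with opposite signs:

```python
import numpy as np

k0 = 2.5  # photon energy; an arbitrary positive sample value
k_cov = np.array([k0, 0.0, 0.0, -k0])  # covariant components k_mu in the special frame
eps = np.eye(4)                         # eps[r] = eps^mu(k, r) = delta^mu_r, cf. (22.361)

# Coefficient of a(k, r) in sum_r k_mu a(k, r) eps^mu(k, r):
coeff = np.array([k_cov @ eps[r] for r in range(4)])

# Only the scalar (r = 0) and longitudinal (r = 3) operators survive:
# k0 [a(k, 0) - a(k, 3)], cf. (22.362).
assert np.isclose(coeff[0], k0)
assert np.isclose(coeff[1], 0.0) and np.isclose(coeff[2], 0.0)
assert np.isclose(coeff[3], -k0)
```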

Taking the adjoint of (22.363), we have

$$\langle\Psi|\,[a^\dagger(\mathbf{k}, 0) - a^\dagger(\mathbf{k}, 3)] = 0. \tag{22.364}$$

Multiplying both sides of (22.363) by ⟨Ψ|a†(k, 0) or ⟨Ψ|a†(k, 3) from the left, multiplying both sides of (22.364) by a(k, 0)|Ψ⟩ or a(k, 3)|Ψ⟩ from the right, and then summing both sides of all these four equations, we get

$$\langle\Psi|\,a^\dagger(\mathbf{k}, 0)a(\mathbf{k}, 0) - a^\dagger(\mathbf{k}, 3)a(\mathbf{k}, 3)\,|\Psi\rangle = 0. \tag{22.365}$$

Further multiplying both sides of (22.365) by k₀ and performing the triple integration with respect to k, we have

$$0 = \int d^3k\,k_0\,\langle\Psi|\,a^\dagger(\mathbf{k}, 0)a(\mathbf{k}, 0) - a^\dagger(\mathbf{k}, 3)a(\mathbf{k}, 3)\,|\Psi\rangle. \tag{22.366}$$

Meanwhile, sandwiching both sides of (22.350) between ⟨Ψ| and |Ψ⟩, we obtain

$$\langle\Psi|H|\Psi\rangle = \int d^3k\,k_0\sum_{r=0}^{3}\zeta_r\,\langle\Psi|\,a^\dagger(\mathbf{k}, r)a(\mathbf{k}, r)\,|\Psi\rangle. \tag{22.367}$$

Finally, summing both sides of (22.366) and (22.367), we get

$$\langle\Psi|H|\Psi\rangle = \int d^3k\,k_0\sum_{r=1}^{2}\langle\Psi|\,a^\dagger(\mathbf{k}, r)a(\mathbf{k}, r)\,|\Psi\rangle. \tag{22.368}$$

Thus, we have successfully gotten rid of the factor a†(k, 0)a(k, 0), together with a†(k, 3)a(k, 3), from the Hamiltonian.

22.5.6 Feynman Propagator of Photon Field

As in the case of both the scalar field and the Dirac field, we wish to find the Feynman propagator of the photon field. First, we calculate [A^μ(x), A^ν(y)]. Using (22.334), we have


$$[A^\mu(x),\,A^\nu(y)] = \int\frac{d^3k\,d^3q}{(2\pi)^3\,2\sqrt{k_0 q_0}}\Big[\sum_{r=0}^{3}\{a(\mathbf{k}, r)\varepsilon^\mu(\mathbf{k}, r)e^{-ikx} + a^\dagger(\mathbf{k}, r)\varepsilon^\mu(\mathbf{k}, r)^{*}e^{ikx}\},$$
$$\sum_{s=0}^{3}\{a(\mathbf{q}, s)\varepsilon^\nu(\mathbf{q}, s)e^{-iqy} + a^\dagger(\mathbf{q}, s)\varepsilon^\nu(\mathbf{q}, s)^{*}e^{iqy}\}\Big]$$
$$= \int\frac{d^3k\,d^3q}{(2\pi)^3\,2\sqrt{k_0 q_0}}\sum_{r=0}^{3}\sum_{s=0}^{3}\Big\{\varepsilon^\mu(\mathbf{k}, r)\varepsilon^\nu(\mathbf{q}, s)^{*}\,e^{-ikx}e^{iqy}\,[a(\mathbf{k}, r),\,a^\dagger(\mathbf{q}, s)]$$
$$+ \varepsilon^\mu(\mathbf{k}, r)^{*}\varepsilon^\nu(\mathbf{q}, s)\,e^{ikx}e^{-iqy}\,[a^\dagger(\mathbf{k}, r),\,a(\mathbf{q}, s)]\Big\}$$
$$= \int\frac{d^3k}{(2\pi)^3\,2k_0}\sum_{r=0}^{3}\Big\{\zeta_r\,\varepsilon^\mu(\mathbf{k}, r)\varepsilon^\nu(\mathbf{k}, r)^{*}\,e^{-ik(x-y)} - \zeta_r\,\varepsilon^\mu(\mathbf{k}, r)^{*}\varepsilon^\nu(\mathbf{k}, r)\,e^{ik(x-y)}\Big\}$$
$$= \int\frac{d^3k\,(-\eta^{\mu\nu})}{(2\pi)^3\,2k_0}\left[e^{-ik(x-y)} - e^{ik(x-y)}\right], \tag{22.369}$$

where with the second to the last equality we used (22.344) and with the last equality we used (22.322). Comparing (22.369) with (22.169), we find that (22.369) differs from (22.169) only by the factor −η^{μν}. Also, notice that the photon is massless (m = 0). Therefore, we have

$$\lim_{m\to 0} K = \lim_{m\to 0}\sqrt{\mathbf{k}^2 + m^2} = |\mathbf{k}| = k_0 \quad (k_0 > 0). \tag{22.370}$$

Thus, k₀ corresponds to K of (22.169). In conjunction with (22.131), we define the invariant delta function D^{μν}(x − y) such that [5, 6]

$$iD^{\mu\nu}(x-y) \equiv [A^\mu(x),\,A^\nu(y)] = \lim_{m\to 0}\left[-i\eta^{\mu\nu}\,\Delta(x-y)\right]. \tag{22.371}$$

Once we have obtained the invariant delta function D^{μν}(x) in the form of (22.369), we can immediately obtain the Feynman propagator with respect to the photon field. Defining the Feynman propagator D_F^{μν}(x − y) as

$$D_F^{\mu\nu}(x-y) \equiv \langle 0|\mathrm{T}[A^\mu(x)A^\nu(y)]|0\rangle$$

and taking account of (22.369), we get

$$D_F^{\mu\nu}(x-y) = \lim_{m\to 0}\left[-\eta^{\mu\nu}\,\Delta_F(x-y)\right]. \tag{22.372}$$

Writing it explicitly, we have

$$D_F^{\mu\nu}(x-y) = \frac{-i}{(2\pi)^4}\int d^4k\,(-\eta^{\mu\nu})\,\frac{e^{-ik(x-y)}}{-k^2 - i\epsilon} = \frac{-i}{(2\pi)^4}\int d^4k\,\frac{\eta^{\mu\nu}\,e^{-ik(x-y)}}{k^2 + i\epsilon}. \tag{22.373}$$

Using the momentum space representation, we have

$$D_F^{\mu\nu}(k) = \frac{-i\eta^{\mu\nu}}{-k^2 - i\epsilon} = \frac{i\eta^{\mu\nu}}{k^2 + i\epsilon} = \lim_{m\to 0}\left[-\eta^{\mu\nu}\,\Delta_F(k)\right]. \tag{22.374}$$
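The equality of the two forms in (22.373) and (22.374) is a one-line complex identity; a quick numerical sanity check (ours, not the book's) reads:

```python
# Check -i/(-k^2 - i*eps) == i/(k^2 + i*eps) for sample values of k^2.
eps = 1e-3
for k2 in (-2.0, -0.5, 0.1, 1.7):
    lhs = -1j / (-k2 - 1j * eps)
    rhs = 1j / (k2 + 1j * eps)
    assert abs(lhs - rhs) < 1e-12
print("propagator sign identity verified")
```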

Equation (22.374) can further be transformed using (22.322), (22.323), (22.326), and (22.328). That is, (22.322) is rewritten as

$$\eta^{\mu\nu} = -\sum_{r=1}^{2}\varepsilon^\mu(\mathbf{k}, r)^{*}\varepsilon^\nu(\mathbf{k}, r) - \varepsilon^\mu(\mathbf{k}, 3)^{*}\varepsilon^\nu(\mathbf{k}, 3) + \varepsilon^\mu(\mathbf{k}, 0)^{*}\varepsilon^\nu(\mathbf{k}, 0)$$
$$= -\sum_{r=1}^{2}\varepsilon^\mu(\mathbf{k}, r)^{*}\varepsilon^\nu(\mathbf{k}, r) - \frac{[k^\mu - (kn)n^\mu][k^\nu - (kn)n^\nu]}{(kn)^2 - k^2} + n^\mu n^\nu$$
$$= -\sum_{r=1}^{2}\varepsilon^\mu(\mathbf{k}, r)^{*}\varepsilon^\nu(\mathbf{k}, r) - \frac{k^\mu k^\nu - (kn)(k^\mu n^\nu + n^\mu k^\nu) + n^\mu n^\nu k^2}{(kn)^2 - k^2}. \tag{22.375}$$

The quantity η^{μν} in (22.374) can be replaced with that of (22.375), and the resulting D_F^{μν}(k) can be used for the calculations of the interaction among the quantum fields. Meanwhile, in (22.375) we assume that k² ≠ 0 in general. For a real photon we have k² = 0. Then, we get

$$\eta^{\mu\nu} = -\sum_{r=1}^{2}\varepsilon^\mu(\mathbf{k}, r)^{*}\varepsilon^\nu(\mathbf{k}, r) - \frac{k^\mu k^\nu - (kn)(k^\mu n^\nu + n^\mu k^\nu)}{(kn)^2}. \tag{22.376}$$

Equation (22.376) is rewritten as

$$\sum_{r=1}^{2}\varepsilon^\mu(\mathbf{k}, r)^{*}\varepsilon^\nu(\mathbf{k}, r) = -\eta^{\mu\nu} - \frac{k^\mu k^\nu - (kn)(k^\mu n^\nu + n^\mu k^\nu)}{(kn)^2}. \tag{22.377}$$

Equation (22.377) will be useful in evaluating the matrix elements of the S-matrix, including the Feynman amplitude; see the next chapter, especially Sect. 23.8.
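Equation (22.377) can be checked numerically for a real photon in the special frame of (22.361), taking the unit time-like vector as n^μ = (1, 0, 0, 0) (a standard choice; the check itself is ours, not from the book):

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])
k0 = 1.3
k = np.array([k0, 0.0, 0.0, k0])    # real photon along z: k^mu k_mu = 0
n = np.array([1.0, 0.0, 0.0, 0.0])  # unit time-like vector n^mu

kn = k @ eta @ n                     # (kn) = k^mu n_mu = k0
assert np.isclose(k @ eta @ k, 0.0)  # masslessness

# Transverse polarization vectors for k along the z-axis, cf. (22.361).
eps1 = np.array([0.0, 1.0, 0.0, 0.0])
eps2 = np.array([0.0, 0.0, 1.0, 0.0])
lhs = np.outer(eps1, eps1) + np.outer(eps2, eps2)

rhs = -eta - (np.outer(k, k) - kn * (np.outer(k, n) + np.outer(n, k))) / kn**2
assert np.allclose(lhs, rhs)
print("polarization sum (22.377) verified")
```

Both sides reduce to diag(0, 1, 1, 0), i.e., the projector onto the two transverse directions.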


References

1. Goldstein H, Poole C, Safko J (2002) Classical mechanics, 3rd edn. Addison Wesley, San Francisco
2. Sakamoto M (2014) Quantum field theory. Shokabo, Tokyo (in Japanese)
3. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover, New York
4. Sunakawa S (1991) Quantum mechanics. Iwanami, Tokyo (in Japanese)
5. Mandl F, Shaw G (2010) Quantum field theory, 2nd edn. Wiley, Chichester
6. Itzykson C, Zuber J-B (2005) Quantum field theory. Dover, New York
7. Kaku M (1993) Quantum field theory. Oxford University Press, New York
8. Kugo T (1989) Quantum theory of gauge field I. Baifukan, Tokyo (in Japanese)
9. Greiner W, Reinhardt J (1996) Field quantization. Springer, Berlin
10. Boas ML (2006) Mathematical methods in the physical sciences, 3rd edn. Wiley, New York
11. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
12. Schweber SS (1989) An introduction to relativistic quantum field theory. Dover, New York

Chapter 23

Interaction Between Fields

So far, we have focused on the characteristics of the scalar field, the Dirac field, and the electromagnetic field, which behave as free fields without interaction among one another. In this chapter, we deal with the interaction between these fields. The interaction is described as that between the quantized fields, whose important features were studied in Chap. 22. The interactions between the fields are represented by non-linear field equations. It is difficult to obtain analytic solutions of the non-linear equations, and so a suitable approximation must be made. The method for this is well established as the perturbation theory that makes the most of the S-matrix. Although we dealt with the perturbation method in Chap. 5, the method we use in the present chapter differs largely from the previous one. Here we focus on the interaction between the Dirac field (i.e., the electron) and the electromagnetic field (the photon). The physical system that comprises electrons and photons has been fully investigated and established as quantum electrodynamics (QED). We take this opportunity to get familiar with this major field of quantum mechanics. In the last part of the chapter, we deal with the Compton scattering as a typical example to deepen understanding of the theory [1].

23.1 Lorentz Force and Minimal Coupling

Although in the quantum theory of fields abstract and axiomatic approaches tend to be taken, we wish to adhere to an intuitive approach to the maximum extent possible. To that end, we get back to the Lorentz force we studied in Chaps. 4 and 15. The Lorentz force and its nature are widely investigated in various areas of natural science. The Lorentz force F(t) is described by

© Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_23



$$\mathbf{F}(t) = e\mathbf{E}(\mathbf{x}(t)) + e\dot{\mathbf{x}}(t)\times\mathbf{B}(\mathbf{x}(t)), \tag{4.88}$$

where the first term is the electric Lorentz force and the second term represents the magnetic Lorentz force. We wish to relate the Lorentz force to the Hamiltonian of the interacting physical system within the framework of the quantum field theory. To this end, the minimal coupling (or minimal substitution) is an important concept in association with the covariant derivative. The covariant derivative is defined as [1]

$$i\frac{\partial}{\partial t} \to i\frac{\partial}{\partial t} - q\phi(t,\mathbf{x}), \qquad \frac{1}{i}\nabla \to \frac{1}{i}\nabla - q\mathbf{A}(t,\mathbf{x}), \tag{23.1}$$

where q is the charge of the matter field; ϕ(t, x) and A(t, x) are the electromagnetic potentials. Using the four-vector notation, we have

$$\partial_\mu \to D_\mu \equiv \partial_\mu + iqA_\mu(x), \tag{23.2}$$

where D_μ defines the covariant derivative, which leads to the interaction between the electron and the photon. This interaction is referred to as minimal coupling. To see how the covariant derivative produces the interaction, we replace ∂_μ with D_μ in (22.213) such that

$$\left[i\gamma^\mu D_\mu - m\right]\psi(x) = 0. \tag{23.3}$$

Then, we have

$$\left[i\gamma^\mu\left(\partial_\mu + ieA_\mu\right) - m\right]\psi(x) = 0, \tag{23.4}$$

where e is the elementary charge (e < 0), as defined in Chap. 4. Rewriting (23.4), we get

$$i\gamma^0\partial_0\psi = \left[-i\sum_{k=1}^{3}\gamma^k\partial_k + e\gamma^0 A_0 + e\sum_{k=1}^{3}\gamma^k A_k + m\right]\psi. \tag{23.5}$$

Multiplying both sides of (23.5) by γ⁰, we obtain

$$i\partial_0\psi = \left[-i\sum_{k=1}^{3}\gamma^0\gamma^k\partial_k + e\gamma^0\gamma^0 A_0 + e\sum_{k=1}^{3}\gamma^0\gamma^k A_k + m\gamma^0\right]\psi$$
$$= \left[-i\sum_{k=1}^{3}\gamma^0\gamma^k\partial_k + m\gamma^0 + eA_0 + e\sum_{k=1}^{3}\gamma^0\gamma^k A_k\right]\psi$$
$$= \left[\frac{1}{i}\boldsymbol{\alpha}\cdot\nabla + m\beta + (eA_0 - e\boldsymbol{\alpha}\cdot\mathbf{A})\right]\psi, \tag{23.6}$$

where α ≡ γ⁰γ with γ ≡ (γ¹ γ² γ³)ᵀ, and β ≡ γ⁰. The operator (1/i)α·∇ + mβ is identical with the one-particle Hamiltonian H̃ (of the free Dirac field) defined in (22.222). Therefore, the operator (eA₀ − eα·A) in (23.6) is the interaction Hamiltonian H_int that represents the interaction between the Dirac field and the electromagnetic field. Symbolically writing (23.6), we have

$$i\partial_0\psi(x) = \left[\tilde{H} + H_{\mathrm{int}}\right]\psi(x), \tag{23.7}$$

where

$$\tilde{H} = \frac{1}{i}\boldsymbol{\alpha}\cdot\nabla + m\beta \tag{23.8}$$

and

$$H_{\mathrm{int}} = eA_0 - e\boldsymbol{\alpha}\cdot\mathbf{A}. \tag{23.9}$$

Defining the one-particle Hamiltonian H as

$$H = \tilde{H} + H_{\mathrm{int}}, \tag{23.10}$$

we get

$$i\partial_0\psi(x) = H\psi(x). \tag{23.11}$$
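The algebra of α ≡ γ⁰γ and β ≡ γ⁰ underlying (23.6) and (23.8) can be verified with an explicit matrix representation. The Dirac representation below is one concrete choice (an assumption for the sketch; any representation satisfying the Clifford algebra works):

```python
import numpy as np

# Pauli matrices and the Dirac representation of the gamma matrices.
I2, Z2 = np.eye(2), np.zeros((2, 2))
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

gamma0 = np.block([[I2, Z2], [Z2, -I2]]).astype(complex)
gammas = [np.block([[Z2, s], [-s, Z2]]) for s in sigma]

alpha = [gamma0 @ g for g in gammas]  # alpha^k = gamma^0 gamma^k
beta = gamma0                          # beta = gamma^0

I4 = np.eye(4)
anti = lambda a, b: a @ b + b @ a      # anticommutator

assert np.allclose(beta @ beta, I4)                      # beta^2 = 1
for i in range(3):
    assert np.allclose(anti(alpha[i], beta), 0)          # {alpha^i, beta} = 0
    for j in range(3):
        assert np.allclose(anti(alpha[i], alpha[j]), 2 * (i == j) * I4)

# Hence the free Hamiltonian H~ = alpha.p + m beta squares to (p^2 + m^2) I,
# the relativistic energy-momentum relation.
p, m = np.array([0.3, -1.2, 0.7]), 1.5
H = sum(p[k] * alpha[k] for k in range(3)) + m * beta
assert np.allclose(H @ H, (p @ p + m**2) * I4)
```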

Equations (23.7) and (23.11) show the Schrödinger equation of the Dirac field that interacts with the electromagnetic field. The interaction Hamiltonian H_int is directly associated with the Lorentz force F(t) described by (4.88). Representing E and B using the electromagnetic potentials as in (22.279) and (22.283), we get

$$\mathbf{F}(t) = -e\nabla\phi(\mathbf{x}(t)) - e\frac{\partial\mathbf{A}(\mathbf{x}(t))}{\partial t} + e\dot{\mathbf{x}}(t)\times\mathrm{rot}\,\mathbf{A}(\mathbf{x}(t)). \tag{23.12}$$

The second term of (23.12) can be rewritten as [2]

$$e\dot{\mathbf{x}}(t)\times\mathrm{rot}\,\mathbf{A} = e\nabla_A\!\left(\mathbf{A}\cdot\dot{\mathbf{x}}(t)\right) - e\left(\dot{\mathbf{x}}(t)\cdot\nabla\right)\mathbf{A}, \tag{23.13}$$

where ∇_A acts only on A (i.e., the vector potential). Then, (23.12) is further rewritten as

$$\mathbf{F}(t) = -e\nabla\phi(\mathbf{x}(t)) - e\frac{\partial\mathbf{A}(\mathbf{x}(t))}{\partial t} + e\nabla_A\!\left(\mathbf{A}\cdot\dot{\mathbf{x}}(t)\right) - e\left(\dot{\mathbf{x}}(t)\cdot\nabla\right)\mathbf{A}.$$

Of these terms, the moiety of the conservative force F_C(t) is expressed as

$$\mathbf{F}_C(t) = -e\nabla\phi(\mathbf{x}(t)) + e\nabla_A\!\left(\mathbf{A}\cdot\dot{\mathbf{x}}(t)\right). \tag{23.14}$$

Integrating (23.14) between two spatial points Q and P, we obtain

$$-\int_Q^P \mathbf{F}_C(t)\cdot d\mathbf{l} = e\int_Q^P\left[\nabla\phi(\mathbf{x}(t)) - \nabla_A\!\left(\mathbf{A}\cdot\dot{\mathbf{x}}(t)\right)\right]\cdot d\mathbf{l}$$
$$= e\int_Q^P d\phi(\mathbf{x}(t)) - e\int_Q^P d\!\left(\dot{\mathbf{x}}(t)\cdot\mathbf{A}\right)$$
$$= e\left[\phi(P) - \phi(Q)\right] - e\left[\dot{\mathbf{x}}(t)\cdot\mathbf{A}(P) - \dot{\mathbf{x}}(t)\cdot\mathbf{A}(Q)\right],$$

where dl denotes the line element of the integral; we used ∇ϕ·dl = dϕ and ∇_A(A·ẋ(t))·dl = d(ẋ(t)·A). Taking Q → −∞ and setting ϕ(−∞) = A(−∞) = 0, we get

$$-\int_Q^P \mathbf{F}_C(t)\cdot d\mathbf{l} = e\phi(P) - e\dot{\mathbf{x}}(t)\cdot\mathbf{A}(P). \tag{23.15}$$

Comparing (23.15) with (23.6), we find that α corresponds to the classical velocity operator ẋ(t) [3]. Thus, H_int of (23.7) corresponds to the work the Lorentz force exerts on an electron. Notice, however, that α of (23.6) does not carry the concept of a particle velocity, because ∇_A of (23.14) represents a derivative with respect to the vector potential. Hence, the minimal coupling introduced through the covariant derivative defined in (23.2) properly reflects the interaction between the Dirac field and the electromagnetic field. If the interaction is sufficiently weak, namely if H_int is sufficiently small compared to H̃, we can apply the perturbation method we developed in Chap. 5. This is indeed the case with the problems that follow. Nevertheless, the approaches described in this chapter differ largely from those of Chap. 5. After we present the relevant methods, which partly include somewhat abstract notions, we address practical problems such as the Compton scattering, which we dealt with in Chap. 1.

23.2 Lagrangian and Field Equation of the Interacting Fields

We continue the aforementioned discussion still further. To represent H_int in a scalar form using the four-vector potentials, we rewrite (23.9) as

$$H_{\mathrm{int}} = e\gamma^0\gamma^\mu A_\mu.$$

Assuming that ψ(x) of (23.11) is normalized, we have the expectation value ⟨H_int⟩ such that

$$\langle H_{\mathrm{int}}(t)\rangle = \langle\psi(x)|H_{\mathrm{int}}|\psi(x)\rangle = \int d^3x\,\psi^\dagger(x)\,e\gamma^0\gamma^\mu A_\mu(x)\,\psi(x) = \int d^3x\,e\,\overline{\psi}(x)\gamma^\mu A_\mu(x)\psi(x). \tag{23.16}$$

Thus, the interaction Hamiltonian density ℋ_int(x) is given by the integrand of (23.16) as

$$\mathcal{H}_{\mathrm{int}}(x) \equiv e\,\overline{\psi}(x)\gamma^\mu A_\mu(x)\psi(x). \tag{23.17}$$

Then, the interaction Lagrangian density ℒ_int(x) is given by

$$\mathcal{L}_{\mathrm{int}}(x) = -\mathcal{H}_{\mathrm{int}}(x) = -e\,\overline{\psi}(x)\gamma^\mu A_\mu(x)\psi(x). \tag{23.18}$$

Note that in (23.18) we have derived the Lagrangian density from the Hamiltonian density in a way opposite to (22.37). This derivation, however, does not cause a problem, because in the equation of motion of the interacting system we are considering, the interaction does not involve derivatives of the physical variables (see Sect. 22.1). Or rather, since the kinetic energy is irrelevant to the interaction, from (22.31) and (22.40) we have ℒ_int(x) = −𝒰_int(x) = −ℋ_int(x), where 𝒰_int(x) denotes the potential energy density. On the basis of the above discussion and (22.210), (22.318), as well as (23.18), the Lagrangian density ℒ(x) of the interacting fields (we assume the interaction between the Dirac field and the electromagnetic field) is given by

$$\mathcal{L}(x) = \mathcal{L}_0(x) + \mathcal{L}_{\mathrm{int}}(x) = \overline{\psi}(x)\left(i\gamma^\mu\partial_\mu - m\right)\psi(x) - \frac{1}{2}\partial_\mu A_\nu(x)\partial^\mu A^\nu(x) - e\,\overline{\psi}(x)\gamma^\mu A_\mu(x)\psi(x), \tag{23.19}$$

where ℒ_0(x) denotes the free-field Lagrangian density.

Note that in some textbooks [1] the sign of the third term is reversed. This is due to the difference in the definition of e (i.e., e < 0 in this book, whereas e > 0 in [1]). Rewriting (23.19) using the covariant derivative, we get

$$\mathcal{L}(x) = \overline{\psi}(x)\left[i\gamma^\mu\left(\partial_\mu + ieA_\mu(x)\right) - m\right]\psi(x) - \frac{1}{2}\partial_\mu A_\nu(x)\partial^\mu A^\nu(x)$$
$$= \overline{\psi}(x)\left(i\gamma^\mu D_\mu - m\right)\psi(x) - \frac{1}{2}\partial_\mu A_\nu(x)\partial^\mu A^\nu(x). \tag{23.20}$$

The quantum mechanics featured by the Lagrangian density of (23.19) or (23.20) is called quantum electrodynamics (QED). From now on, we focus on QED and describe its characteristics, including the fundamental principles and practical problems. For future use, we introduce the charge-current density operator s^μ(x) as follows [1]:

$$s^\mu(x) \equiv q\,\overline{\psi}(x)\gamma^\mu\psi(x). \tag{23.21}$$

Notice that in the case of the electron, q = e (<0, i.e., negative charge); for the positron, the charge is −e (>0, i.e., positive charge). Thus, (23.29) represents the field equation of the positron.

23.3 Local Phase Transformation and U(1) Gauge Field

In Chap. 22, we mentioned several characteristics of the gauge field and the associated gauge transformation. To give a precise definition of the "gauge field" and "gauge transformation," however, is beyond the scope of this book. Yet, we can get an intuitive concept of these principles at the core of the covariant derivative. In Chap. 22, we have shown that the field strength F_μν(x) is invariant under the gauge transformation of (22.293). Meanwhile, we can express an exponential function (or phase factor) as

$$e^{i\Theta(x)}, \tag{23.30}$$

where Θ(x) is a real function of x. The phase factor (23.30) is different from those that appeared in Part I in that in this chapter the exponent may change depending upon the space-time coordinate x. Let us consider a function transformation described by

$$f'(x) = e^{i\Theta(x)}f(x). \tag{23.31}$$

Also, let us consider a set G(x) defined by


$$\mathcal{G}(x) \equiv \left\{e^{i\Theta(x)};\ \Theta(x):\ \text{real function of } x\right\}. \tag{23.32}$$

Then, G(x) forms a U(1) group (i.e., a continuous transformation group). That is, (A1) closure and (A2) the associative law are evident (see Sect. 16.1) by choosing real functions Θ_λ(x) (λ ∈ Λ), where Λ can be a finite set or an infinite set with different cardinalities (see Sect. 6.1). (A3) The identity element corresponds to Θ(x) ≡ 0. (A4) The inverse element of e^{iΘ(x)} is e^{−iΘ(x)}. The functional transformation described by (23.31) and related to G(x) is said to be a local phase transformation. The important point is that if we select Θ(x) appropriately, we can make the covariant derivative of f(x) in (23.31) undergo the same local phase transformation as f(x). In other words, appropriately selecting a functional form of Θ(x), we may have

$$\left[D_\mu f(x)\right]' = e^{i\Theta(x)}\left[D_\mu f(x)\right], \tag{23.33}$$

where the prime (′) denotes the local phase transformation. Meanwhile, in Chap. 22 we chose a function Λ(x) for the gauge transformation such that

$$A'^\mu(x) = A^\mu(x) + \partial^\mu\Lambda(x). \tag{22.293}$$

We wish to connect Θ(x) of (23.31) to Λ(x). The proper choice is to fix Θ(x) as Θ(x) = −qΛ(x). We show this as follows:

$$\left[D_\mu f(x)\right]' = \left[\partial_\mu + iqA'_\mu(x)\right]f'(x) = \partial_\mu f'(x) + iqA'_\mu(x)f'(x)$$
$$= \partial_\mu\!\left[e^{-iq\Lambda(x)}f(x)\right] + iq\left[A_\mu(x) + \partial_\mu\Lambda(x)\right]e^{-iq\Lambda(x)}f(x)$$
$$= -iq\,e^{-iq\Lambda(x)}\left[\partial_\mu\Lambda(x)\right]f(x) + e^{-iq\Lambda(x)}\partial_\mu f(x) + iqA_\mu(x)e^{-iq\Lambda(x)}f(x) + iq\left[\partial_\mu\Lambda(x)\right]e^{-iq\Lambda(x)}f(x)$$
$$= e^{-iq\Lambda(x)}\left[\partial_\mu + iqA_\mu(x)\right]f(x) = e^{-iq\Lambda(x)}\left[D_\mu f(x)\right]. \tag{23.34}$$
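The cancellation in (23.34) can be verified symbolically in one dimension. The functions f, A, and Λ below are arbitrary test inputs chosen for illustration (not quantities from the text):

```python
# One-dimensional check of the covariant-derivative transformation law (23.34):
# D'f' = exp(-i q Lambda) D f, with f' = exp(-i q Lambda) f and A' = A + Lambda'.
import sympy as sp

x, q = sp.symbols('x q', real=True)
f = sp.exp(sp.I * x) * sp.cos(x)   # arbitrary complex test function
A = sp.sin(x)                      # arbitrary gauge potential
Lam = x**3 / 3                     # arbitrary gauge function Lambda(x)

D = lambda A_, g: sp.diff(g, x) + sp.I * q * A_ * g  # covariant derivative

f_prime = sp.exp(-sp.I * q * Lam) * f
A_prime = A + sp.diff(Lam, x)

lhs = D(A_prime, f_prime)
rhs = sp.exp(-sp.I * q * Lam) * D(A, f)
assert sp.simplify(lhs - rhs) == 0
print("covariant derivative transforms covariantly")
```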

In (23.34), the prime (′) denotes either the local phase transformation of the Dirac field or the gauge transformation of the electromagnetic field. Applying (23.34) to ψ̄(x)(iγ^μD_μ − m)ψ(x), we get

$$\left[\overline{\psi}(x)\left(i\gamma^\mu D_\mu - m\right)\psi(x)\right]' = e^{iq\Lambda(x)}\overline{\psi}(x)\,e^{-iq\Lambda(x)}\left(i\gamma^\mu D_\mu - m\right)\psi(x) = \overline{\psi}(x)\left(i\gamma^\mu D_\mu - m\right)\psi(x). \tag{23.35}$$

Thus, the first term of ℒ(x) of (23.20) is invariant under both the gauge transformation (22.293) and the local phase transformation (23.31). As already seen in (22.318), the second term of ℒ(x) of (23.20) represents the Lagrangian density of the free electromagnetic field. From (22.316) and (22.317), ∂_μA^μ(x) is gauge invariant if one uses the Lorentz gauge. It is unclear, however, whether the second term of ℒ(x) in (23.20) is gauge invariant. To examine this, from (22.293) we have

$$\partial_\mu A'_\nu(x)\partial^\mu A'^\nu(x) = \partial_\mu\{A_\nu(x) + \partial_\nu\Lambda(x)\}\,\partial^\mu\{A^\nu(x) + \partial^\nu\Lambda(x)\}$$
$$= \left[\partial_\mu A_\nu(x) + \partial_\mu\partial_\nu\Lambda(x)\right]\left[\partial^\mu A^\nu(x) + \partial^\mu\partial^\nu\Lambda(x)\right]$$
$$= \partial_\mu A_\nu(x)\partial^\mu A^\nu(x) + \partial_\mu A_\nu(x)\partial^\mu\partial^\nu\Lambda(x) + \partial_\mu\partial_\nu\Lambda(x)\partial^\mu A^\nu(x) + \partial_\mu\partial_\nu\Lambda(x)\partial^\mu\partial^\nu\Lambda(x). \tag{23.36}$$

Thus, we see that the term ∂_μA_ν(x)∂^μA^ν(x) is not gauge invariant. Yet, this does not cause a serious problem. Let us think of the action integral S(A^ν) of ∂_μA_ν(x)∂^μA^ν(x). As in (22.306), we have

$$S(A'^\nu) = \int dx\,\partial_\mu A'_\nu(x)\partial^\mu A'^\nu(x). \tag{23.37}$$

Taking the variation with respect to A^ν and using integration by parts on the second and third terms of (23.36), we are left with ∂_μ∂^μΛ(x) for them, on the condition that A^ν(x) vanishes at x = ±∞. But, from (22.317), we have

$$\partial_\mu\partial^\mu\Lambda(x) = 0. \tag{23.38}$$

Taking the variation with respect to Λ in the last term of (23.36), we also obtain the same result (23.38). Thus, only the first term of (23.36) survives on the condition that A^ν(x) vanishes at x = ±∞ and, hence, we conclude that ℒ(x) of (23.20) is virtually unchanged in total by the gauge transformation. That is, ℒ(x) is gauge invariant. In this book, we restrict the gauge field to the electromagnetic vector field A^μ(x). Correspondingly, we also restrict the gauge transformation to (22.293). By virtue of the local phase transformation (23.31) and the gauge transformation (22.293), we can construct a firm theory of QED. More specifically, we can develop the theory that uniquely determines the interaction between the Dirac field and the gauge field (i.e., the electromagnetic field). Before closing this section, it is worth mentioning that the gauge fields must be accompanied by massless particles. Among them, the photon field is the most important both theoretically and practically. Moreover, photons are the only


entity that has ever been observed experimentally as a massless free particle. Gluons are also believed to exist, accompanying the vector bosonic field, but they have never been observed as massless free particles so far [4, 5]. Also, massless free particles must move at the velocity of light. We revisit Example 21.1, where the inertial frame of reference O′ is moving with a constant velocity v (>0) relative to the frame O. Suppose that the x-axis of O and the x′-axis of O′ coincide and that the frame O′ is moving in the positive direction of the x′-axis. Also, suppose that a particle is moving along the x-axis (and the x′-axis, too) at a velocity of u and u′ relative to the frames O and O′, respectively. Then, we have [6]

$$u' = \frac{u - v}{1 - uv} \quad\text{and}\quad u = \frac{u' + v}{1 + u'v}. \tag{23.39}$$

If we put u′ = 1, we have u = 1, and vice versa. This simple relation clearly demonstrates that if the velocity of a particle is 1 (i.e., the light velocity) with respect to an arbitrarily chosen inertial frame of reference, the velocity of that particle must be 1 with respect to any other inertial frame of reference as well. This is a direct confirmation of the principle of constancy of the light velocity and an outstanding feature of the gauge field.
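This invariance can be confirmed numerically from (23.39); the helper function names below are illustrative choices, not the book's:

```python
# Relativistic velocity addition (23.39) in natural units (c = 1):
# if u' = 1 in one inertial frame, then u = 1 in any other.
def add_velocity(u_prime, v):
    """u in frame O for a particle moving at u' in frame O' (O' moves at v)."""
    return (u_prime + v) / (1 + u_prime * v)

def to_primed(u, v):
    """Inverse: u' in frame O' for a particle moving at u in O."""
    return (u - v) / (1 - u * v)

for v in (0.1, 0.5, 0.99, -0.7):
    assert abs(add_velocity(1.0, v) - 1.0) < 1e-12  # light speed is invariant
    assert abs(to_primed(1.0, v) - 1.0) < 1e-12
    # The two formulas in (23.39) are mutual inverses for sub-light speeds:
    u = 0.42
    assert abs(add_velocity(to_primed(u, v), v) - u) < 1e-12

print("velocity addition checks passed")
```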

23.4 Interaction Picture

In the previous sections, we have seen how the covariant derivative introduces the interaction term into the Dirac equation through the minimal coupling. Meanwhile, we also acknowledge that this interaction term results from the variation of the action integral related to the interaction Lagrangian. As a result, we have obtained the proper field equation (23.27). The magnitude of the interaction relative to the free field, however, is not explicitly represented in (23.27). In this respect, the situation differs from that of Chap. 5, where the interaction Hamiltonian was presented quantitatively. Thus, we are going to take a different approach in this section. The point is that the Hamiltonian density ℋ_int(x) is explicitly presented in (23.17). This provides a strong clue to solving the problem. We adopt the interaction picture to address the given issue. Let us assume the time-dependent total Hamiltonian H(t) as follows:

$$H(t) = H_0(t) + H_{\mathrm{int}}(t), \tag{23.40}$$

where H₀(t) is the free-field Hamiltonian and H_int(t) is the interaction Hamiltonian. All of H(t), H₀(t), and H_int(t) may or may not be time-dependent, but we designate them as, e.g., H(t), where t denotes time. From (23.17), as H_int(t) we have

1086 23 Interaction Between Fields

$$H_{\text{int}}(t) = \int d^3x\, \mathscr{H}_{\text{int}}(x) = \int d^3x\, e\,\overline{\psi}(x)\gamma^{\mu} A_{\mu}(x)\psi(x). \tag{23.41}$$

To deal with the interaction between the quantum fields systematically, we revisit Sect. 1.2. In Chap. 1, we solved the Schrödinger equation by the method of separation of variables and obtained the solution of (1.48) described by

$$\psi(x, t) = \exp(-iEt)\,\phi(x), \tag{1.60}$$

where we adopted the natural units with ħ = 1. More generally, the solution of (1.47) is given as

$$|\psi(t, x)\rangle = \exp[-iH(t)t]\,|\psi(0, x)\rangle. \tag{23.42}$$
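The unitarity of the evolution operator in (23.42) is easy to illustrate numerically. Below is a small Python sketch (ours, not from the text): for a Hermitian matrix H, exp(-iHt) is built from the eigendecomposition, and the norm of an arbitrary state vector is shown to be preserved.

```python
import numpy as np

# exp(-iHt) for Hermitian H, built from the eigendecomposition H = V diag(lam) V†.
# Illustrates that the evolution operator of Eq. (23.42) is unitary.

def evolution_operator(H, t):
    lam, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * lam * t)) @ V.conj().T

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (A + A.conj().T) / 2          # Hermitian "Hamiltonian"
U = evolution_operator(H, t=0.7)

# U is unitary: U† U = E.
assert np.allclose(U.conj().T @ U, np.eye(4))

# The norm of a state vector is preserved under |psi(t)> = U |psi(0)>.
psi0 = rng.normal(size=4) + 1j * rng.normal(size=4)
psi_t = U @ psi0
assert np.isclose(np.linalg.norm(psi_t), np.linalg.norm(psi0))
```

The eigendecomposition route works because -iHt is anti-Hermitian exactly when H is Hermitian, which is the point made in the next paragraph.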

For the exponential function of the operator H(t), especially one given in a matrix form, see Chap. 15. Since H(t) is Hermitian, -iH(t)t is anti-Hermitian. Hence, exp[-iH(t)t] is unitary; see Sect. 15.2. Thus, (23.42) implies that the state vector |ψ(t, x)⟩ evolves with time through the unitary transformation of exp[-iH(t)t], starting with |ψ(0, x)⟩. Now, we wish to generalize the present situation to some degree. We can think of the situation as viewing the same vector in reference to a different set of basis vectors (Sect. 11.4). Suppose that |ψ(λ, x)⟩ is an element of a vector space V (or a Hilbert space, more generally) that is spanned by a set of basis vectors E_λ (λ ∈ Λ), where Λ is an infinite set (see Sect. 6.1). Also, let |ψ(λ, x)⟩′ be an element of another vector space V′ spanned by a set of basis vectors E′_λ (λ ∈ Λ). In general, V′ might be different from the original vector space V, but in this chapter we assume that the mapping caused by an operator is solely an endomorphism (see Sect. 11.2). Suppose that a quantum state ψ is described as

$$\psi = \int d\lambda\, E_\lambda\, |\psi(\lambda, x)\rangle. \tag{23.43}$$

Equation (23.43) is given pursuant to (11.13) so that the vector ψ can be dealt with in a vector space of infinite dimension. Note, however, that if we were strictly pursuant to (13.32), we would write instead

$$|\psi\rangle = \int d\lambda\, |E_\lambda\rangle\, \psi(\lambda, x),$$

where |E_λ⟩ is the basis set of the inner product space with ψ(λ, x) being a coordinate of a vector in that space. The reason why we use (23.43) is that we place emphasis on the coordinate representation; the basis set representation is hardly used in the present case. Operating O on the function ψ, we have

$$O(\psi) = \int d\lambda\, E_\lambda\, O\,|\psi(\lambda, x)\rangle, \tag{23.44}$$

where O is an arbitrary Hermitian operator. For a future purpose, we assume that the operator O does not depend on λ in a special coordinate system (or frame). Equation (23.44) can be rewritten as

$$O(\psi) = \int d\lambda\, E_\lambda\, V(\lambda)V(\lambda)^{\dagger} O V(\lambda)V(\lambda)^{\dagger}\,|\psi(\lambda, x)\rangle, \tag{23.45}$$

where V(λ) is a unitary operator. According to Theorem 14.5, as O is Hermitian, O can be diagonalized via a unitary similarity transformation. But in (23.45), we do not necessarily assume the diagonalization. Putting

$$E'_\lambda \equiv E_\lambda V(\lambda), \quad O'(\lambda) \equiv V(\lambda)^{\dagger} O V(\lambda), \quad V(\lambda)^{\dagger}|\psi(\lambda, x)\rangle \equiv |\psi(\lambda, x)\rangle', \tag{23.46}$$

we rewrite (23.45) such that

$$O(\psi) = \int d\lambda\, E'_\lambda\, O'(\lambda)\,|\psi(\lambda, x)\rangle'. \tag{23.47}$$

Equation (23.47) shows how the operator O′(λ) and the state vector |ψ(λ, x)⟩′ change in accordance with the transformation of the basis vectors E_λ. We can develop the theory, however, so that |ψ(λ, x)⟩′ may not depend upon λ in a specific frame. To show how it looks in the present case, following custom we choose t (the time variable) for λ and choose the unitary operator exp[-iH(t)t] for V(t). Namely,

$$V(t) = \exp[-iH(t)t] \equiv U(t). \tag{23.48}$$

A time-independent state vector |ψ(0, x)⟩ of (23.42) is said to be a state vector of the Heisenberg picture (or Heisenberg representation). In other words, we can choose the Heisenberg picture so that the state vector |ψ(t, x)⟩′ may not depend upon t. Conversely, we may choose a special frame where the operator O does not depend on t. That frame is referred to as the Schrödinger frame, where a time-dependent state vector |ψ(t, x)⟩ described by (23.42) is called a state vector of the Schrödinger picture (or Schrödinger representation). Accordingly, we designate a state vector in the Heisenberg frame as |⟩_H and that in the Schrödinger frame as |⟩_S. Replacing λ with t, |ψ(λ, x)⟩ with |ψ(t, x)⟩_S, and O with O_S in (23.44), we rewrite it as

$$O(\psi) = \int dt\, E_t\, O_S\,|\psi(t, x)\rangle_S, \tag{23.49}$$

where O_S denotes the operator O in the Schrödinger frame. Equation (23.49) can further be rewritten as

$$\begin{aligned} O(\psi) &= \int dt\, E_t\, U(t)U(t)^{\dagger} O_S U(t)U(t)^{\dagger}\,|\psi(t, x)\rangle_S \\ &= \int dt\, E'_t \left[U(t)^{\dagger} O_S U(t)\right] |\psi(t, x)\rangle'_S \\ &= \int dt\, E'_t\, O_H(t)\,|\psi(0, x)\rangle_H, \end{aligned} \tag{23.50}$$

where we define

$$|\psi(t, x)\rangle'_S \equiv U(t)^{\dagger}|\psi(t, x)\rangle_S, \tag{23.51}$$

$$E'_t \equiv E_t U(t), \quad |\psi(0, x)\rangle_H \equiv |\psi(t, x)\rangle'_S = U(t)^{\dagger}|\psi(t, x)\rangle_S, \tag{23.52}$$

$$O_H(t) \equiv U(t)^{\dagger} O_S U(t). \tag{23.53}$$

Putting t = 0 in the above equation, we have

$$|\psi(0, x)\rangle_H = |\psi(0, x)\rangle_S.$$

Equations (23.49) and (23.50) imply the invariance of O(ψ) upon transfer from the Schrödinger frame to the Heisenberg frame. The situation parallels that of Sect. 11.4. We can view (23.51) and (23.52) as the unitary transformation between the two vectors |ψ(t, x)⟩_S and |ψ(0, x)⟩_H. Correspondingly, (23.53) represents the unitary similarity transformation of the operator O between the two frames. In (23.53), we assume that O_H(t) is a time-dependent operator, but that O_S is time-independent. In particular, if we choose the Hamiltonian H(t) for O,

$$H_H(t) = H_H = H_S. \tag{23.54}$$

This is because H(t) and U(t) = exp[-iH(t)t] are commutative. Thus, the Hamiltonian is a time-independent quantity, regardless of the specific frame chosen. Differentiating O_H(t) with respect to t in (23.53), we obtain

$$\begin{aligned} \frac{dO_H(t)}{dt} &= iH(t)e^{iH(t)t} O_S e^{-iH(t)t} + e^{iH(t)t} O_S \left[-iH(t)\right] e^{-iH(t)t} \\ &= iH(t)e^{iH(t)t} O_S e^{-iH(t)t} - i e^{iH(t)t} O_S e^{-iH(t)t} H(t) \\ &= i\left[H(t)e^{iH(t)t} O_S e^{-iH(t)t} - e^{iH(t)t} O_S e^{-iH(t)t} H(t)\right] \\ &= i\left[H(t)O_H(t) - O_H(t)H(t)\right] = i[H(t), O_H(t)], \end{aligned} \tag{23.55}$$

where with the second equality we used the fact that e^{-iH(t)t} and H(t) are commutative. Multiplying both sides of (23.55) by i, we get

$$i\,\frac{dO_H(t)}{dt} = [O_H(t), H(t)]. \tag{23.56}$$
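As a quick numerical illustration of (23.53) and (23.56) (our own sketch, not from the text), one can take a random Hermitian Hamiltonian, transform an observable into the Heisenberg picture, and compare a central finite difference of O_H(t) with the commutator:

```python
import numpy as np

# Finite-difference check of the Heisenberg equation of motion (23.56),
# i dO_H/dt = [O_H, H], for a time-independent Hamiltonian on a toy 3-level system.

def expm_herm(H, t):
    lam, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * lam * t)) @ V.conj().T   # U(t) = exp(-iHt)

def heisenberg(O_S, H, t):
    U = expm_herm(H, t)
    return U.conj().T @ O_S @ U                              # O_H(t) = U(t)† O_S U(t)

rng = np.random.default_rng(1)
B = rng.normal(size=(3, 3)); H = (B + B.T) / 2               # Hermitian H
C = rng.normal(size=(3, 3)); O_S = (C + C.T) / 2             # Hermitian observable

t, dt = 0.5, 1e-6
dO = (heisenberg(O_S, H, t + dt) - heisenberg(O_S, H, t - dt)) / (2 * dt)
O_H = heisenberg(O_S, H, t)
assert np.allclose(1j * dO, O_H @ H - H @ O_H, atol=1e-6)    # Eq. (23.56)
```

The test passes for any Hermitian pair (H, O_S), which is exactly the frame-independence the derivation above expresses.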

Equation (23.56) is well-known as the Heisenberg equation of motion. Let us think of a commutator [O_H(t), P_H(t)], where P_H(t) is another physical quantity of the Heisenberg picture. Then, we have

$$[O_H(t), P_H(t)] = U(t)^{\dagger}[O_S, P_S]U(t), \tag{23.57}$$

where P_S is the physical quantity of the Schrödinger picture that corresponds to P_H(t) of the Heisenberg picture. If, in particular, [O_S, P_S] = c (c being a certain constant), then [O_H(t), P_H(t)] = c as well. Thus, we have

$$[q_H(t), p_H(t)] = [q_S, p_S] = i, \tag{23.58}$$

where q and p are the canonical coordinate and canonical momentum, respectively. Thus, the unitary similarity transformation described by (23.53) leaves the canonical commutation relation invariant. Such a transformation is called a canonical transformation. Now, we get back to the major issue related to the interaction picture. A major difficulty in solving the interacting field equation rests partly on the fact that H_0(t) and H_int(t) in general are not commutative. Let us consider another transformation of the state vector and operator different from (23.50). That is, using

$$U_0(t) \equiv \exp[-iH_0(t)t] \tag{23.59}$$

instead of U(t), we have expressions related to (23.51)-(23.53). Namely, we have

$$O(\psi) = \int dt\, E^I_t\, O_I(t)\,|\psi(t, x)\rangle_I, \tag{23.60}$$

where we define

$$|\psi(t, x)\rangle_I \equiv U_0(t)^{\dagger}|\psi(t, x)\rangle_S = U_0(t)^{\dagger}U(t)|\psi(0, x)\rangle_H, \quad E^I_t \equiv E_t U_0(t), \tag{23.61}$$

$$O_I(t) \equiv U_0(t)^{\dagger} O_S U_0(t). \tag{23.62}$$

In (23.62), O_I(t) is a physical quantity of the interaction picture. This time, |ψ(t, x)⟩_I is time-dependent. The previous results (23.45) and (23.46) obtained with the Schrödinger picture and the Heisenberg picture basically hold between the Schrödinger picture and the interaction picture as well. Differentiating |ψ(t, x)⟩_I with respect to t, we get

$$\begin{aligned} \frac{d|\psi(t, x)\rangle_I}{dt} &= iH_0(t)e^{iH_0(t)t}|\psi(t, x)\rangle_S + e^{iH_0(t)t}\left[-iH(t)\right]|\psi(t, x)\rangle_S \\ &= iH_0(t)e^{iH_0(t)t}|\psi(t, x)\rangle_S - i e^{iH_0(t)t}\left[H_0(t) + H_{\text{int}}(t)\right]|\psi(t, x)\rangle_S \\ &= iH_0(t)e^{iH_0(t)t}|\psi(t, x)\rangle_S - iH_0(t)e^{iH_0(t)t}|\psi(t, x)\rangle_S - i e^{iH_0(t)t}H_{\text{int}}(t)|\psi(t, x)\rangle_S \\ &= -i e^{iH_0(t)t} H_{\text{int}}(t) e^{-iH_0(t)t}|\psi(t, x)\rangle_I = -i H_{\text{int}}(t)_I\,|\psi(t, x)\rangle_I, \end{aligned} \tag{23.63}$$

where with the third equality we used the fact that e^{iH_0(t)t} and H_0(t) are commutative. In (23.63), we defined H_int(t)_I as

$$H_{\text{int}}(t)_I \equiv e^{iH_0(t)t} H_{\text{int}}(t) e^{-iH_0(t)t} = U_0(t)^{\dagger} H_{\text{int}}(t) U_0(t). \tag{23.64}$$

Also, notice again that in (23.63) e^{\pm iH_0(t)t} and H_int(t) are not commutative in general. Multiplying both sides of (23.63) by i, we obtain

$$i\,\frac{d|\psi(t, x)\rangle_I}{dt} = H_{\text{int}}(t)_I\,|\psi(t, x)\rangle_I. \tag{23.65}$$

Similarly to (23.56), we have

$$i\,\frac{dO_I(t)}{dt} = [O_I(t), H_0(t)]. \tag{23.66}$$

The close relationship between (23.66) and (23.56) is evident. We schematically depict the conversion among the three pictures of Schrödinger, Heisenberg, and interaction in Fig. 23.1. Equation (23.65) formally resembles the time-dependent Schrödinger equation (1.47) and provides a starting point for addressing a key question of the interacting fields.
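Equation (23.65) can also be checked numerically. The following Python sketch (ours, not from the text) takes constant matrices H0 and Hint on a two-level toy system, builds the interaction-picture state |ψ(t)⟩_I = U0(t)†U(t)|ψ(0)⟩ as in (23.61), and confirms (23.65) by a central finite difference:

```python
import numpy as np

# Finite-difference check of Eq. (23.65), i d|psi>_I/dt = H_int(t)_I |psi>_I,
# for constant H0 and H_int on a toy two-level system.

def expmi(H, t):                       # exp(-iHt) via eigendecomposition
    lam, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * lam * t)) @ V.conj().T

H0   = np.diag([0.0, 1.0])                       # free Hamiltonian
Hint = np.array([[0.0, 0.2], [0.2, 0.0]])        # interaction (Hermitian)
H    = H0 + Hint
psi0 = np.array([1.0, 0.0], dtype=complex)

def psi_I(t):                          # |psi(t)>_I = U0(t)† U(t) |psi(0)>
    return expmi(H0, t).conj().T @ expmi(H, t) @ psi0

def Hint_I(t):                         # H_int(t)_I = U0(t)† H_int U0(t), Eq. (23.64)
    U0 = expmi(H0, t)
    return U0.conj().T @ Hint @ U0

t, dt = 0.8, 1e-6
deriv = (psi_I(t + dt) - psi_I(t - dt)) / (2 * dt)
assert np.allclose(1j * deriv, Hint_I(t) @ psi_I(t), atol=1e-6)
```

Note that the check exploits exactly the cancellation of the H0 terms shown in (23.63).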

Fig. 23.1 State vector |ψ(t, x)⟩_I of the interaction picture in comparison with those of the Schrödinger picture |ψ(t, x)⟩_S and the Heisenberg picture |ψ(0, x)⟩_H

23.5 S-Matrix and S-Matrix Expansion

Since (23.65) is a non-linear equation, it is impossible in general to obtain an analytic solution. Instead, we approach the problem by an iteration method. To this end, the S-matrix expansion plays a central role. From here on, we exclusively adopt the interaction picture that gave (23.65), and so we rewrite (23.65) as follows:

$$i\,\frac{d|\Psi(t, x)\rangle}{dt} = H_{\text{int}}(t)\,|\Psi(t, x)\rangle, \tag{23.67}$$

where the subscript I of |ψ(t, x)⟩_I and H_int(t)_I has been omitted. Namely, as we work solely in the interaction picture, we redefine the state vector and Hamiltonian such that

$$|\Psi(t, x)\rangle \equiv |\psi(t, x)\rangle_I \quad \text{and} \quad H_{\text{int}}(t) \equiv H_{\text{int}}(t)_I. \tag{23.68}$$

If we have no interaction, i.e., H_int(t) = 0, the state vector |Ψ(t, x)⟩ is constant in time. Suppose that the interaction is "switched on" at a certain time t_on and that the state vector is |Ψ(t_i, x)⟩ at an initial time t = t_i well before t_on. In that situation, we define the state vector |Ψ(t_i, x)⟩ as

$$|\Psi(t_i, x)\rangle \equiv |i\rangle \approx |\Psi(-\infty, x)\rangle. \tag{23.69}$$

Also, suppose that the interaction is "switched off" at a certain time t_off and that the state vector is |Ψ(t_f, x)⟩ at a final time t = t_f well after t_off. Then, the state vector |Ψ(t_f, x)⟩ is approximated by

$$|\Psi(t_f, x)\rangle \approx |\Psi(\infty, x)\rangle. \tag{23.70}$$

Because of the interaction (or collision), the state |Ψ(∞, x)⟩ may include different final states |f_α⟩ (α = 1, 2, ⋯), which may involve different species of particles. If we assume that the |f_α⟩ form a complete orthonormal system (CONS), we have [1]

$$\sum_\alpha |f_\alpha\rangle\langle f_\alpha| = E, \tag{23.71}$$

where E is the identity operator. Then, we have

$$|\Psi(\infty, x)\rangle = \sum_\alpha |f_\alpha\rangle\langle f_\alpha|\Psi(\infty, x)\rangle. \tag{23.72}$$

(i) The state |i⟩ is described by a Fock state [see (22.126)] that is characterized by the particle species and a definite number of those particles with definite momenta and energies. The particles in |i⟩ are so far apart from one another before the collision that they do not interact with one another. The particles in |f_α⟩ are likewise far apart from one another after the collision. (ii) The states |i⟩ and |f_α⟩ are described by plane waves. Normally, the interaction is assumed to occur in a very narrow space and within a short time. During the interaction, we cannot get detailed information about |Ψ(t, x)⟩. Before and after the interaction, the species or number of particles may or may not be conserved. The plane wave description is particularly suited to the actual experimental situation where accelerated particle beams collide. Bound states, on the other hand, are neither practical nor appropriate. In what follows, we are going to deal with electrons, positrons, and photons within the framework of QED. The S-matrix S is defined as an operator that relates |f_α⟩ to |i⟩ such that

$$\langle f_\alpha|\Psi(t_f, x)\rangle \approx \langle f_\alpha|\Psi(\infty, x)\rangle = \langle f_\alpha|S|i\rangle \equiv S_{f_\alpha, i} = \langle f_\alpha|S|\Psi(-\infty, x)\rangle, \tag{23.73}$$

where with the first near equality we used (23.70). Then, (23.72) can be rewritten as

$$|\Psi(\infty, x)\rangle = \sum_\alpha S_{f_\alpha, i}\,|f_\alpha\rangle. \tag{23.74}$$

Equation (23.74) is a plane wave expansion of |Ψ(∞, x)⟩. If |Ψ(t_f, x)⟩ and |i⟩ are normalized, we have

$$\langle\Psi(t_f, x)|\Psi(t_f, x)\rangle = \langle i|i\rangle = 1 = \langle i|S^{\dagger}S|i\rangle.$$

Since the transformation with S does not change the norm, S is unitary. Integrating (23.67) with respect to t from -∞ to t, we get

$$|\Psi(t, x)\rangle = |\Psi(-\infty, x)\rangle + (-i)\int_{-\infty}^{t} dt_1\, H_{\text{int}}(t_1)\,|\Psi(t_1, x)\rangle \tag{23.75}$$

$$= |i\rangle + (-i)\int_{-\infty}^{t} dt_1\, H_{\text{int}}(t_1)\,|\Psi(t_1, x)\rangle. \tag{23.76}$$

The RHS of (23.76) comprises |i⟩ and an additional second term. Since we can solve (23.67) only iteratively, as a next step, replacing |Ψ(t_1, x)⟩ with

$$|i\rangle + (-i)\int_{-\infty}^{t_1} dt_2\, H_{\text{int}}(t_2)\,|\Psi(t_2, x)\rangle, \tag{23.77}$$

we obtain

$$\begin{aligned} |\Psi(t, x)\rangle = |i\rangle &+ (-i)\int_{-\infty}^{t} dt_1\, H_{\text{int}}(t_1)\,|i\rangle \\ &+ (-i)^2 \int_{-\infty}^{t} dt_1 \int_{-\infty}^{t_1} dt_2\, H_{\text{int}}(t_1)H_{\text{int}}(t_2)\,|\Psi(t_2, x)\rangle. \end{aligned} \tag{23.78}$$

Repeating the iteration, we get

$$|\Psi(t, x)\rangle = |i\rangle + \sum_{n=1}^{r} (-i)^n \int_{-\infty}^{t} dt_1 \cdots \int_{-\infty}^{t_{n-1}} dt_n\, H_{\text{int}}(t_1)\cdots H_{\text{int}}(t_n)\,|\Psi(t_n, x)\rangle, \tag{23.79}$$

where t_0 ≡ t when n = 1. Although we could further replace |Ψ(t_r, x)⟩ with

$$|i\rangle + (-i)\int_{-\infty}^{t_r} dt_{r+1}\, H_{\text{int}}(t_{r+1})\,|\Psi(t_{r+1}, x)\rangle,$$

we cut off the iteration process at this stage. Hence, |Ψ(t, x)⟩ of (23.79) is approximated by

$$\begin{aligned} |\Psi(t, x)\rangle &\approx |i\rangle + \sum_{n=1}^{r} (-i)^n \int_{-\infty}^{t} dt_1 \cdots \int_{-\infty}^{t_{n-1}} dt_n\, H_{\text{int}}(t_1)\cdots H_{\text{int}}(t_n)\,|i\rangle \\ &= \left[E + \sum_{n=1}^{r} (-i)^n \int_{-\infty}^{t} dt_1 \cdots \int_{-\infty}^{t_{n-1}} dt_n\, H_{\text{int}}(t_1)\cdots H_{\text{int}}(t_n)\right]|i\rangle. \end{aligned} \tag{23.80}$$

Taking t, t_1, ⋯, t_n → ∞ in (23.80), we have

$$|\Psi(\infty, x)\rangle \approx \left[E + \sum_{n=1}^{r} (-i)^n \int_{-\infty}^{\infty} dt_1 \cdots \int_{-\infty}^{t_{n-1}} dt_n\, H_{\text{int}}(t_1)\cdots H_{\text{int}}(t_n)\right]|i\rangle. \tag{23.81}$$

Taking account of (23.73), we have

$$|\Psi(\infty, x)\rangle = S\,|i\rangle.$$

Then, from (23.81) we obtain

$$S = E + \sum_{n=1}^{r} (-i)^n \int_{-\infty}^{\infty} dt_1 \cdots \int_{-\infty}^{t_{n-1}} dt_n\, H_{\text{int}}(t_1)\cdots H_{\text{int}}(t_n).$$

Defining S^{(0)} ≡ E and

$$S^{(n)} \equiv (-i)^n \int_{-\infty}^{\infty} dt_1 \cdots \int_{-\infty}^{t_{n-1}} dt_n\, H_{\text{int}}(t_1)\cdots H_{\text{int}}(t_n) \quad (r \ge n \ge 1), \tag{23.82}$$

we get

$$S = \sum_{n=0}^{r} S^{(n)}. \tag{23.83}$$

The matrix S^{(n)} is said to be the n-th order S-matrix. In (23.83), the number r can be taken as large as one pleases, even though the related S-matrix calculations become increasingly complicated. Taking the T-product of (23.82), moreover, we get

$$S^{(n)} = \frac{(-i)^n}{n!} \int_{-\infty}^{\infty} dt_1 \int_{-\infty}^{\infty} dt_2 \cdots \int_{-\infty}^{\infty} dt_n\, \mathrm{T}[H_{\text{int}}(t_1)\cdots H_{\text{int}}(t_n)]. \tag{23.84}$$

Note that the factorial in the denominator of (23.84) resulted from taking the T-product. Using the Hamiltonian density Hint ðxÞ of (23.17), we rewrite (23.84) as Sð r Þ =

ð- iÞn n = 0 n! r

1 -1

1

dx1

-1

dx2 ⋯

1 -1

dxn T½Hint ðx1 Þ⋯Hint ðxn Þ:

ð23:85Þ
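The origin of the 1/n! in (23.84) is easy to see numerically in a toy where the integrand is a commuting c-number, so that time ordering is trivial: the nested integral over t_2 < t_1 is exactly half the unrestricted double integral. A small Python sketch (ours, not from the text; f is an arbitrary illustrative function):

```python
import numpy as np

# The 1/n! of Eq. (23.84) at second order: for an integrand symmetric under
# t1 <-> t2 (a commuting, c-number toy f), the nested integral over t2 < t1
# equals 1/2! times the unrestricted double integral over the full square.

f = lambda t: np.exp(-t)               # toy "interaction strength"; any f works
T, N = 2.0, 2000
t = (np.arange(N) + 0.5) * (T / N)     # midpoint grid on [0, T]
w = T / N

nested = sum(f(t1) * np.sum(f(t[t < t1])) * w * w for t1 in t)   # t2 < t1 only
full   = (np.sum(f(t)) * w) ** 2                                  # whole square
assert abs(nested - full / 2) < 1e-3
```

For non-commuting H_int(t), the T-product supplies exactly the symmetrization that makes this counting argument go through, which is the content of (23.84).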

Combining (23.83) with (23.73) and (23.74), we get

$$|\Psi(\infty, x)\rangle = \sum_\alpha \sum_n \langle f_\alpha|S^{(n)}|i\rangle\,|f_\alpha\rangle. \tag{23.86}$$

By virtue of using 𝓗_int(x), S has been put into an explicitly covariant form. Notice that we must add a factor -1 when we reorder Fermion quantities, as in the case of the T-product. In the above, however, we have not put in the factor -1, because 𝓗_int(x) contains two Fermion quantities and, hence, exchanging 𝓗_int(x_i) and 𝓗_int(x_j) (x_i ≠ x_j) produces an even number of permutations of the Fermion quantities. Such even permutations produce a factor (-1)^{2n} = 1, where n is a positive integer. The S-matrix contains the identity operator E plus infinitely many additional terms and can be expressed as an infinite power series in the fine structure constant α, described in the natural units (and in SI units) by

$$\alpha \equiv e^2/4\pi\ \left(= e^2/4\pi\varepsilon_0\hbar c\ \text{in SI units}\right) \approx 7.29735 \times 10^{-3}. \tag{23.87}$$
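The SI form of (23.87) can be evaluated directly. The constants below are CODATA 2018 values (our addition, not from the text):

```python
import math

# Eq. (23.87) in SI form: alpha = e^2 / (4*pi*eps0*hbar*c).
e    = 1.602176634e-19       # elementary charge [C]
eps0 = 8.8541878128e-12      # vacuum permittivity [F/m]
hbar = 1.054571817e-34       # reduced Planck constant [J s]
c    = 299792458.0           # speed of light [m/s]

alpha = e**2 / (4 * math.pi * eps0 * hbar * c)
assert abs(alpha - 7.29735e-3) < 1e-7    # ~ 1/137, as quoted in the text
```

The smallness of α is what makes the order-by-order S-matrix expansion of QED converge rapidly in practice.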

23.6 N-Product and T-Product

In Sects. 22.3 and 23.5, we mentioned the normal-ordered product (N-product) and the time-ordered product (T-product). Practical calculations based on them appear frequently and occupy an important position in quantum field theory. We represent them as, for example, N(AB) and T(ABC⋯), where the former represents the N-product and the latter denotes the T-product; A, B, C, etc. denote physical quantities (i.e., quantum fields). The N-product and T-product are important not only in practical calculations but also from the theoretical point of view. In particular, they are useful for clarifying the relativistic invariance (or covariance) of the mathematical formulae.

23.6.1 Example of Two Field Operators

We start with the simplest case, where two field operators are pertinent. The essence of both the N-product and the T-product clearly shows up in this case, and the methods developed here provide a good starting point for dealing with more complicated problems. Let A(x) and B(x) be two field operators, where x represents the space-time coordinates. We choose A(x) and B(x) from among the following operators:

$$\phi(x) = \int \frac{d^3k}{\sqrt{(2\pi)^3(2k_0)}} \left[a(k)e^{-ikx} + a^{\dagger}(k)e^{ikx}\right], \tag{22.98}$$

$$\psi(x) = \int_{-\infty}^{\infty} \frac{d^3p}{\sqrt{(2\pi)^3\,2p_0}} \sum_{h=\pm 1} \left[b(p, h)u(p, h)e^{-ipx} + d^{\dagger}(p, h)v(p, h)e^{ipx}\right], \tag{22.232}$$

$$\overline{\psi}(x) = \int_{-\infty}^{\infty} \frac{d^3p}{\sqrt{(2\pi)^3\,2p_0}} \sum_{h=\pm 1} \left[d(p, h)\overline{v}(p, h)e^{-ipx} + b^{\dagger}(p, h)\overline{u}(p, h)e^{ipx}\right], \tag{23.88}$$

$$\psi_{\text{anti}}(x) = \int_{-\infty}^{\infty} \frac{d^3p}{\sqrt{(2\pi)^3\,2p_0}} \sum_{h=\pm 1} \left[d(p, h)u(p, h)e^{-ipx} + b^{\dagger}(p, h)v(p, h)e^{ipx}\right], \tag{22.244}$$

$$\overline{\psi}_{\text{anti}}(x) = \int_{-\infty}^{\infty} \frac{d^3p}{\sqrt{(2\pi)^3\,2p_0}} \sum_{h=\pm 1} \left[b(p, h)\overline{v}(p, h)e^{-ipx} + d^{\dagger}(p, h)\overline{u}(p, h)e^{ipx}\right], \tag{23.89}$$

$$A^{\mu}(x) = \int \frac{d^3k}{\sqrt{(2\pi)^3(2k_0)}} \sum_{r=0}^{3} \left[a(k, r)\varepsilon^{\mu}(k, r)e^{-ikx} + a^{\dagger}(k, r)\varepsilon^{\mu}(k, r)^{*}e^{ikx}\right]. \tag{22.334}$$

Equations (22.244) and (23.89) represent the operators of a positron. Note that with respect to (22.98) for the real scalar field and (22.334) for the photon field, the field operators ϕ(x) and A^μ(x) are Hermitian. The Dirac field operators, on the other hand, are not Hermitian. For this reason, we need the two pairs of operators b(p, h), b†(p, h) and d(p, h), d†(p, h) to describe the Dirac field. We shall see that this is indeed the case when we describe the time-ordered products (vide infra). When QED is relevant, the operators of the Dirac field and photon field are chosen from among ψ(x), ψ̄(x), ψ_anti(x), ψ̄_anti(x), and A^μ(x). The above field operators share several features in common. (i) They can be divided into two parts, i.e., the positive-frequency part (containing the factor e^{-ikx})


and the negative-frequency part (containing the factor e^{ikx}) [1]. The positive and negative frequencies correspond to the positive and negative energies, respectively. These quantities result from the fact that the relativistic field theory is associated with the Klein-Gordon equation (see Chaps. 21 and 22). (ii) For each operator, the first term, i.e., the positive-frequency part, contains the annihilation operator, and the second term (the negative-frequency part) contains the creation operator. Accordingly, we express the operators as in, e.g., (22.99) and (22.132) such that [1]

$$\phi(x) = \phi^{+}(x) + \phi^{-}(x). \tag{23.90}$$

Care should be taken when dealing with ψ̄(x) [or ψ̄_anti(x)]. This is because when taking the Dirac adjoint of ψ(x) [or ψ_anti(x)], the positive-frequency part and the negative-frequency part are switched. Let us get back to the present task. As already mentioned in Sect. 22.3, with respect to the N-product for, e.g., the scalar field, the annihilation operators a(k) always stand to the right of the creation operators a†(k). Now, we show the computation rules of the N-product as follows: (i) Boson:

$$\begin{aligned} \mathrm{N}[\phi(x)\phi(y)] &= \mathrm{N}\{[\phi^{+}(x) + \phi^{-}(x)][\phi^{+}(y) + \phi^{-}(y)]\} \\ &= \mathrm{N}[\phi^{+}(x)\phi^{+}(y) + \phi^{+}(x)\phi^{-}(y) + \phi^{-}(x)\phi^{+}(y) + \phi^{-}(x)\phi^{-}(y)] \\ &= \phi^{+}(x)\phi^{+}(y) + \phi^{-}(y)\phi^{+}(x) + \phi^{-}(x)\phi^{+}(y) + \phi^{-}(x)\phi^{-}(y). \end{aligned} \tag{23.91}$$

Notice that in the last equality of (23.91) the second term ϕ⁺(x)ϕ⁻(y) has been switched to ϕ⁻(y)ϕ⁺(x). Similarly, we obtain

$$\mathrm{N}[\phi(y)\phi(x)] = \phi^{+}(y)\phi^{+}(x) + \phi^{-}(x)\phi^{+}(y) + \phi^{-}(y)\phi^{+}(x) + \phi^{-}(y)\phi^{-}(x). \tag{23.92}$$

Applying (22.134) to (23.92), we get

$$\mathrm{N}[\phi(y)\phi(x)] = \phi^{+}(x)\phi^{+}(y) + \phi^{-}(x)\phi^{+}(y) + \phi^{-}(y)\phi^{+}(x) + \phi^{-}(x)\phi^{-}(y). \tag{23.93}$$

Comparing (23.91) and (23.93), we have

$$\mathrm{N}[\phi(x)\phi(y)] = \mathrm{N}[\phi(y)\phi(x)]. \tag{23.94}$$
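The defining property of the N-product, that its vacuum expectation value vanishes, can be seen in a single-mode toy with truncated ladder matrices (our sketch, not from the text): here N(a a†) = a† a, so the vacuum expectation value drops from 1 to 0.

```python
import numpy as np

# Vacuum expectation values of an ordinary product vs. the N-product, in a
# single-mode truncated Fock space. N(a a†) = a† a, so <0|N(a a†)|0> = 0
# while <0|a a†|0> = 1.

n = 6                                           # Fock-space truncation
a = np.diag(np.sqrt(np.arange(1, n)), k=1)      # annihilation operator
adag = a.conj().T                               # creation operator
vac = np.zeros(n); vac[0] = 1.0                 # vacuum |0>

assert np.isclose(vac @ (a @ adag) @ vac, 1.0)  # <0| a a† |0> = 1
assert np.isclose(vac @ (adag @ a) @ vac, 0.0)  # <0| N(a a†) |0> = 0
```

The difference between the two expectation values is precisely the c-number commutator that the subtraction in (23.99) below isolates.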

In this way, in the Boson field case the N-product is kept unchanged after exchanging the operators involved in the N-product. The expressions related to (23.91) and (23.94) hold with the photon field A^μ(x) as well. (ii) Fermion (electron, positron, etc.):

$$\begin{aligned} \mathrm{N}[\psi(x)\overline{\psi}(y)] &= \mathrm{N}\{[\psi^{+}(x) + \psi^{-}(x)][\overline{\psi}^{+}(y) + \overline{\psi}^{-}(y)]\} \\ &= \mathrm{N}[\psi^{+}(x)\overline{\psi}^{+}(y) + \psi^{+}(x)\overline{\psi}^{-}(y) + \psi^{-}(x)\overline{\psi}^{+}(y) + \psi^{-}(x)\overline{\psi}^{-}(y)] \\ &= \psi^{+}(x)\overline{\psi}^{+}(y) - \overline{\psi}^{-}(y)\psi^{+}(x) + \psi^{-}(x)\overline{\psi}^{+}(y) + \psi^{-}(x)\overline{\psi}^{-}(y). \end{aligned} \tag{23.95}$$

The expression related to (23.92) holds with the Dirac fields ψ̄(x), ψ_anti(x), and ψ̄_anti(x) as well. In the above calculations, the positive-frequency part always stands to the right of the negative-frequency part, as in the Boson case. When the multiplication order is switched, however, the sign of the relevant term is reversed (+ to - and - to +) in the Fermion case. The reversal of the sign with respect to +ψ⁺(x)ψ̄⁻(y) and -ψ̄⁻(y)ψ⁺(x) results from the relations (22.246) and the anti-commutation relations (22.242). Similarly, we have

$$\mathrm{N}[\overline{\psi}(y)\psi(x)] = \overline{\psi}^{+}(y)\psi^{+}(x) - \psi^{-}(x)\overline{\psi}^{+}(y) + \overline{\psi}^{-}(y)\psi^{+}(x) + \overline{\psi}^{-}(y)\psi^{-}(x). \tag{23.96}$$

Applying (22.245) to (23.96), we get

$$\mathrm{N}[\overline{\psi}(y)\psi(x)] = -\psi^{+}(x)\overline{\psi}^{+}(y) - \psi^{-}(x)\overline{\psi}^{+}(y) + \overline{\psi}^{-}(y)\psi^{+}(x) - \psi^{-}(x)\overline{\psi}^{-}(y). \tag{23.97}$$

Comparing (23.95) and (23.97), we have

$$\mathrm{N}[\overline{\psi}(y)\psi(x)] = -\mathrm{N}[\psi(x)\overline{\psi}(y)]. \tag{23.98}$$
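The sign flip in (23.98) originates in the anticommutation relations. For a single fermionic mode, these relations can be realized with 2×2 matrices (our sketch, not from the text):

```python
import numpy as np

# A single fermionic mode in its 2-dimensional Fock space {|0>, |1>}:
# b annihilates, b† creates, and the anticommutation relations force the
# sign flip of Eq. (23.98) whenever two Fermion factors are reordered.

b = np.array([[0.0, 1.0], [0.0, 0.0]])     # b|1> = |0>, b|0> = 0
bdag = b.T

anti = b @ bdag + bdag @ b                 # {b, b†}
assert np.allclose(anti, np.eye(2))        # {b, b†} = 1
assert np.allclose(b @ b, 0)               # b^2 = 0 (Pauli principle)

# Reordering with the anticommutator: b b† = -b† b + 1, i.e. moving an
# annihilation operator to the right costs a minus sign plus a c-number.
assert np.allclose(b @ bdag, -bdag @ b + np.eye(2))
```

The c-number left over by the reordering is the single-mode analogue of the invariant functions S^±(x - y) appearing below.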

Contrast (23.98) with (23.94) with respect to the presence or absence of the minus sign. If we think of N[ψ̄(x)ψ(y)], we can perform the relevant calculations following (23.95)-(23.98) in the same manner. Considering the constitution of the N-product, it is clear that for both the Boson and the Fermion case the vacuum expectation value of the N-product is zero. Or rather, the N-product is constructed in such a way that the vacuum expectation value of the product of field operators vanishes.

23.6.2 Calculations Including Both the N-Products and T-Products

We continue dealing with the mathematical expressions of the above example and examine their implications for both the Boson and Fermion cases. (i) Boson: Subtracting N[ϕ(x)ϕ(y)] from ϕ(x)ϕ(y) in (23.91), we have

$$\phi(x)\phi(y) - \mathrm{N}[\phi(x)\phi(y)] = [\phi^{+}(x), \phi^{-}(y)] = i\Delta^{+}(x - y), \tag{23.99}$$

where with the last equality we used (22.137). Exchanging x and y in (23.99), we obtain

$$\phi(y)\phi(x) - \mathrm{N}[\phi(y)\phi(x)] = [\phi^{+}(y), \phi^{-}(x)] = i\Delta^{+}(y - x) = -i\Delta^{-}(x - y), \tag{23.100}$$

where with the last equality we used (22.138). Using (23.94), we get

$$\phi(y)\phi(x) - \mathrm{N}[\phi(x)\phi(y)] = -i\Delta^{-}(x - y). \tag{23.101}$$

Equations (23.99) and (23.101) show that if we subtract the N-product from the product of the field operators, i.e., ϕ(x)ϕ(y) or ϕ(y)ϕ(x), we are left with the scalar iΔ⁺(x - y) or -iΔ⁻(x - y). This implies the relativistic invariance of (23.99) and (23.101). Meanwhile, sandwiching (23.99) between ⟨0| and |0⟩, i.e., taking the vacuum expectation value of (23.99), we obtain

$$\langle 0|\phi(x)\phi(y)|0\rangle - \langle 0|\mathrm{N}[\phi(x)\phi(y)]|0\rangle = \langle 0|[\phi^{+}(x), \phi^{-}(y)]|0\rangle. \tag{23.102}$$

Taking account of the fact that ⟨0|N[ϕ(x)ϕ(y)]|0⟩ vanishes, we get

$$\langle 0|\phi(x)\phi(y)|0\rangle = \langle 0|[\phi^{+}(x), \phi^{-}(y)]|0\rangle = [\phi^{+}(x), \phi^{-}(y)]\langle 0|0\rangle = [\phi^{+}(x), \phi^{-}(y)] = i\Delta^{+}(x - y), \tag{23.103}$$

where with the second equality we used the fact that [ϕ⁺(x), ϕ⁻(y)] = iΔ⁺(x - y) is a c-number, see (22.137); with the second-to-last equality, we used the normalization condition

$$\langle 0|0\rangle = 1. \tag{23.104}$$

Then, using (23.103), we rewrite (23.99) as [1]

$$\phi(x)\phi(y) - \mathrm{N}[\phi(x)\phi(y)] = \langle 0|\phi(x)\phi(y)|0\rangle. \tag{23.105}$$
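Relation (23.105) can be made concrete in a single-mode toy field ϕ(t) = a e^{-iωt} + a† e^{iωt} (our simplification, not from the text): the difference between the product and its N-product is the c-number e^{-iω(t₁-t₂)}, which coincides with the vacuum expectation value of the product.

```python
import numpy as np

# Toy single-mode check of Eq. (23.105): subtracting the N-product from the
# product leaves the vacuum expectation value. phi(t) = a e^{-iwt} + a† e^{iwt}.

n, w = 8, 1.3
a = np.diag(np.sqrt(np.arange(1, n)), k=1)
adag = a.conj().T
vac = np.zeros(n); vac[0] = 1.0

def phi(t):
    return a * np.exp(-1j * w * t) + adag * np.exp(1j * w * t)

def N_phi_phi(t1, t2):     # normal order: every a† moved to the left of every a
    return (a @ a * np.exp(-1j*w*(t1+t2)) + adag @ a * np.exp(1j*w*(t2-t1))
            + adag @ a * np.exp(1j*w*(t1-t2)) + adag @ adag * np.exp(1j*w*(t1+t2)))

t1, t2 = 0.4, -0.9
diff = phi(t1) @ phi(t2) - N_phi_phi(t1, t2)     # = [a, a†] e^{-iw(t1-t2)}
vev = vac @ (phi(t1) @ phi(t2)) @ vac            # <0| phi(t1) phi(t2) |0>
assert np.isclose(vev, np.exp(-1j * w * (t1 - t2)))
# Truncation spoils [a, a†] = 1 only in the top corner of the matrix,
# so we compare on the vacuum sector:
assert np.isclose(diff[0, 0], vev)
```

Here e^{-iω(t₁-t₂)} plays the role of iΔ⁺(x - y) of the full field theory.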

Thus, (23.105) shows the relativistic invariance based on the N-product and the vacuum expectation value of ϕ(x)ϕ(y). We wish to derive a more useful expression on the basis of (23.105). To this end, we deal with the mathematical expressions including both the N-products and the T-products. Multiplying both sides of (23.99) by θ(x⁰ - y⁰), we get

$$\theta(x^0 - y^0)\,\phi(x)\phi(y) - \theta(x^0 - y^0)\,\mathrm{N}[\phi(x)\phi(y)] = i\theta(x^0 - y^0)\,\Delta^{+}(x - y). \tag{23.106}$$

Multiplying both sides of (23.101) by θ(y⁰ - x⁰), we obtain

$$\theta(y^0 - x^0)\,\phi(y)\phi(x) - \theta(y^0 - x^0)\,\mathrm{N}[\phi(x)\phi(y)] = -i\theta(y^0 - x^0)\,\Delta^{-}(x - y). \tag{23.107}$$

Adding both sides of (23.106) and (23.107), we get

$$\begin{aligned} \theta(x^0 - y^0)\,&\phi(x)\phi(y) + \theta(y^0 - x^0)\,\phi(y)\phi(x) - \left[\theta(x^0 - y^0) + \theta(y^0 - x^0)\right]\mathrm{N}[\phi(x)\phi(y)] \\ &= i\left[\theta(x^0 - y^0)\,\Delta^{+}(x - y) - \theta(y^0 - x^0)\,\Delta^{-}(x - y)\right] = \Delta_F(x - y), \end{aligned} \tag{23.108}$$

where the last equality is due to (22.185). Using the definition of the time-ordered product (or T-product) of (22.178) and the relation described by (see Chap. 10)

$$\theta(x^0 - y^0) + \theta(y^0 - x^0) \equiv 1, \tag{23.109}$$

we obtain

$$\mathrm{T}[\phi(x)\phi(y)] - \mathrm{N}[\phi(x)\phi(y)] = \Delta_F(x - y) = \langle 0|\mathrm{T}[\phi(x)\phi(y)]|0\rangle, \tag{23.110}$$

where with the last equality we used (22.182). Comparing (23.110) with (23.105), we find that in (23.110) T[ϕ(x)ϕ(y)] has replaced the ϕ(x)ϕ(y) of (23.105), except for the ϕ(x)ϕ(y) inside N[ϕ(x)ϕ(y)]. Putting it another way, subtracting the N-product from the T-product in (23.110), we are left with the relativistically invariant scalar ⟨0|T[ϕ(x)ϕ(y)]|0⟩.

(ii) Fermion: The discussion is in parallel with that of the Boson case mentioned above. From a practical point of view, the product ψ(x)ψ̄(y) is the most important. Subtracting N[ψ(x)ψ̄(y)] from ψ(x)ψ̄(y) in (23.95), we have

$$\psi(x)\overline{\psi}(y) - \mathrm{N}[\psi(x)\overline{\psi}(y)] = \{\psi^{+}(x), \overline{\psi}^{-}(y)\} = iS^{+}(x - y). \tag{23.111}$$

Multiplying both sides by θ(x⁰ - y⁰) yields

$$\theta(x^0 - y^0)\,\psi(x)\overline{\psi}(y) - \theta(x^0 - y^0)\,\mathrm{N}[\psi(x)\overline{\psi}(y)] = i\theta(x^0 - y^0)\,S^{+}(x - y). \tag{23.112}$$

Note that with the last equality of (23.111), we used (22.256). In a similar manner, we have

$$\overline{\psi}(y)\psi(x) - \mathrm{N}[\overline{\psi}(y)\psi(x)] = \{\overline{\psi}^{+}(y), \psi^{-}(x)\} = \{\psi^{-}(x), \overline{\psi}^{+}(y)\} = iS^{-}(x - y). \tag{23.113}$$

Multiplying both sides by θ(y⁰ - x⁰) yields

$$\theta(y^0 - x^0)\,\overline{\psi}(y)\psi(x) - \theta(y^0 - x^0)\,\mathrm{N}[\overline{\psi}(y)\psi(x)] = i\theta(y^0 - x^0)\,S^{-}(x - y). \tag{23.114}$$

Subtracting both sides of (23.114) from (23.112) and using the definition of the time-ordered product of (22.254) for the Dirac field, we get

$$\begin{aligned} \mathrm{T}[\psi(x)\overline{\psi}(y)] &- \left[\theta(x^0 - y^0) + \theta(y^0 - x^0)\right]\mathrm{N}[\psi(x)\overline{\psi}(y)] \\ &= \mathrm{T}[\psi(x)\overline{\psi}(y)] - \mathrm{N}[\psi(x)\overline{\psi}(y)] \\ &= i\left[\theta(x^0 - y^0)\,S^{+}(x - y) - \theta(y^0 - x^0)\,S^{-}(x - y)\right], \end{aligned} \tag{23.115}$$

where we used the following relationship that holds in the Fermion case:

$$\mathrm{N}[\overline{\psi}(y)\psi(x)] = -\mathrm{N}[\psi(x)\overline{\psi}(y)]. \tag{23.116}$$

Using (22.258), we rewrite (23.115) as

$$\mathrm{T}[\psi(x)\overline{\psi}(y)] - \mathrm{N}[\psi(x)\overline{\psi}(y)] = S_F(x - y) = \langle 0|\mathrm{T}[\psi(x)\overline{\psi}(y)]|0\rangle. \tag{23.117}$$

Again, we have obtained the expression (23.117) related to (23.110). Collectively writing these equations, we obtain [1]

$$\mathrm{T}[A(x)B(y)] - \mathrm{N}[A(x)B(y)] = \langle 0|\mathrm{T}[A(x)B(y)]|0\rangle. \tag{23.118}$$

In the case where both A and B are Fermion operators, A(x) = ψ(x) [or ψ̄(x)] and B(y) = ψ̄(y) [or ψ(y)] are intended. In this case, all three terms of (23.118) change sign upon exchanging A(x) and B(y). Therefore, (23.118) itself remains valid under the same exchange. Meanwhile, in the case where at least one of A and B is a Boson operator, each term of (23.118) does not change sign upon permutation of A(x) and B(y). Hence, (23.118) holds as well.

In the case of the photon field, the discussion can be developed in parallel to that of the scalar bosonic field, because the photon belongs to the Boson family. In both the Boson and Fermion cases, the essence of the perturbation method clearly shows up in the simple expression (23.118). In fact, (23.118) is the simplest case of Wick's theorem [1, 3]. In the general case, Wick's theorem is expressed as [1]

$$\begin{aligned} \mathrm{T}(ABCD\cdots WXYZ) ={} & \mathrm{N}(ABCD\cdots WXYZ) \\ & + \mathrm{N}(\underbracket{AB}\,CD\cdots YZ) + \mathrm{N}(\underbracket{AC}\,BD\cdots YZ) + \cdots \\ & + \mathrm{N}(\underbracket{AB}\,\underbracket{CD}\cdots YZ) + \cdots \\ & + \mathrm{N}(\underbracket{AB}\cdots\underbracket{WX}\,\underbracket{YZ}) + \cdots, \end{aligned} \tag{23.119}$$

where a bracket drawn beneath a pair of operators denotes their contraction, i.e., the c-number ⟨0|T(AB)|0⟩ of (23.118); the successive groups of terms stand for the sums over all possible single contractions, double contractions, and so forth.

Equation (23.119) looks complicated but clearly shows how the Feynman propagators are produced in the coexistence of the T-products and N-products. Glancing at (23.110) and (23.117), this feature is once again obvious. On the RHS of (23.119), the terms from the second on down are sometimes called the generalized normal products [1]. We have another rule for them. For instance, we have [1]

$$\mathrm{N}(\underbracket{A\,B\,C}\cdots YZ) = (-1)^P\,\underbracket{A\,C}\;\mathrm{N}(B\cdots YZ), \tag{23.120}$$

where the contraction on the left-hand side connects A and C, and P is the number of interchanges of neighboring Fermion operators required to change the order (ABC⋯YZ) into (ACB⋯YZ). In this case, the interchange takes place between B and C. If both B and C are Fermion operators, then P = 1; otherwise, P = 0. Bearing in mind a future application (e.g., the Compton scattering), let us think of the following example:

$$\mathrm{N}(F_1 B_1 F'_1 F_2 B_2 F'_2), \tag{23.121}$$

where F₁, F′₁, F₂, F′₂ are Fermion operators and B₁, B₂ are Boson operators; the indices 1 and 2 designate the space-time coordinates. Suppose that we have the following contraction within the operators:

$$\mathrm{N}(\underbracket{F_1\,B_1 F'_1 F_2 B_2\,F'_2}) = (-1)^2\,\underbracket{F_1 F'_2}\;\mathrm{N}(B_1 F'_1 F_2 B_2) = \underbracket{F_1 F'_2}\;\mathrm{N}(B_1 F'_1 F_2 B_2), \tag{23.122}$$

where the contraction connects F₁ and F′₂. Meanwhile, suppose that we have

$$\begin{aligned} \mathrm{N}(\underbracket{F_2 B_2\,F'_2 F_1\,B_1 F'_1}) &= \underbracket{F'_2 F_1}\,(-1)^2\,\mathrm{N}(F_2 B_2 B_1 F'_1) = -\underbracket{F_1 F'_2}\,(-1)^2\,\mathrm{N}(F_2 B_2 B_1 F'_1) \\ &= -\underbracket{F_1 F'_2}\,(-1)^2(-1)\,\mathrm{N}(B_1 F'_1 F_2 B_2) = \underbracket{F_1 F'_2}\;\mathrm{N}(B_1 F'_1 F_2 B_2), \end{aligned} \tag{23.123}$$

where this time the contraction connects F′₂ and F₁.

Comparing (23.122) and (23.123), we get

$$\mathrm{N}(\underbracket{F_1\,B_1 F'_1 F_2 B_2\,F'_2}) = \mathrm{N}(\underbracket{F_2 B_2\,F'_2 F_1\,B_1 F'_1}), \tag{23.124}$$

with the contraction connecting F₁ and F′₂ in both N-products. Thus, we have obtained a simple but important relationship for the generalized normal product. We make use of this relationship in the following sections.

23.7 Feynman Rules and Feynman Diagrams in QED

So far, we have developed the general discussion on the interacting fields and on how to deal with the interaction using the perturbation methods based upon the S-matrix. In this section, we develop the methods for applying the general principle to specific practical problems. For this purpose, we make the most of the Feynman diagrams and the associated Feynman rules. In parallel, we represent the mathematical expressions using symbols or diagrams. One typical example is the contraction, which we mentioned in the previous section. We summarize the frequently appearing contractions below. The contractions represent the Feynman propagators:

$$\underbracket{\phi(x)\phi(y)} = \Delta_F(x - y), \tag{23.125}$$

$$\underbracket{\psi(x)\overline{\psi}(y)} = -\underbracket{\overline{\psi}(y)\psi(x)} = S_F(x - y), \tag{23.126}$$

$$\underbracket{A_\mu(x)A_\nu(y)} = D_F(x - y). \tag{23.127}$$

Among the above relations, (23.126) and (23.127) are pertinent to QED. When we examine the interaction between the fields, it is convenient to describe a field operator F such that [1]

$$F = F^{+} + F^{-}, \tag{23.128}$$

where F⁺ and F⁻ represent the positive-frequency part and the negative-frequency part, respectively (Sects. 22.3.6 and 23.6.1). Regarding a product of two or more operators F, G, etc., we similarly denote

$$FG\cdots = (F^{+} + F^{-})(G^{+} + G^{-})\cdots. \tag{23.129}$$

In our present case of the Hamiltonian density of the electromagnetic interaction, we have

$$\mathscr{H}_{\text{int}}(x) \equiv e\left[\overline{\psi}^{+}(x) + \overline{\psi}^{-}(x)\right]\gamma^{\mu}\left[A_{\mu}^{+}(x) + A_{\mu}^{-}(x)\right]\left[\psi^{+}(x) + \psi^{-}(x)\right]. \tag{23.130}$$

The operator A_μ(x) defined by (22.336) is a Hermitian operator with

$$A_\mu(x) = A_\mu^{+}(x) + A_\mu^{-}(x).$$

We have

$$\left[A_\mu^{+}(x)\right]^{\dagger} = A_\mu^{-}(x) \quad \text{and} \quad \left[A_\mu^{-}(x)\right]^{\dagger} = A_\mu^{+}(x). \tag{23.131}$$

Note, however, that neither ψ(x) nor ψ̄(x) is Hermitian. In conjunction with this fact, the first factor of (23.130) contains a creation operator ψ̄⁻ of the electron and an annihilation operator ψ̄⁺ of the positron. The third factor of (23.130), on the other hand, includes a creation operator ψ⁻ of the positron and an annihilation operator ψ⁺ of the electron. The relevant discussion can be seen in Sect. 23.8.2. According to the prescription of Sect. 23.5, we wish to calculate the matrix elements of (23.86) up to the second order of the S-matrix. For simplicity, we assume that the final state obtained after the interaction between the fields is monochromatic. In other words, the final state of the individual particles is assumed to be described by a single plane wave. For instance, if we are observing photons and electrons as the final states after the interaction, those photons and electrons possess definite four-momenta and spins (or polarizations). We use f instead of f_α in (23.86) accordingly.

23.7.1 Zeroth- and First-Order S-Matrix Elements

We start with the titled simple cases to show how the S-matrix elements are related to the elementary processes associated with the interaction between fields.

(i) Zeroth-order matrix element \langle f|S^{(0)}|i\rangle: We have S^{(0)} = E (identity operator) in this trivial case. We assume

|i\rangle = b^\dagger(p, h)|0\rangle \quad {\rm and} \quad |f\rangle = b^\dagger(p', h')|0\rangle.   (23.132)


The RHSs of (23.132) imply that an electron with momentum p (or p′) and helicity h (or h′) has been created in vacuum. That is, the elementary process is described by e^- \longrightarrow e^-, where e^- denotes an electron. With this process we have

\langle f|S^{(0)}|i\rangle = \langle f|E|i\rangle = \langle f|i\rangle = \langle 0|b(p', h')b^\dagger(p, h)|0\rangle
= \langle 0|[-b^\dagger(p, h)b(p', h') + \delta_{h'h}\delta^3(\mathbf{p}' - \mathbf{p})]|0\rangle
= -\langle 0|b^\dagger(p, h)b(p', h')|0\rangle + \delta_{h'h}\delta^3(\mathbf{p}' - \mathbf{p})\langle 0|0\rangle
= \delta_{h'h}\delta^3(\mathbf{p}' - \mathbf{p}),   (23.133)
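The operator algebra behind (23.133) can be checked on a minimal model. The sketch below is an assumption for illustration (not the text's formalism): it represents a single (p, h) mode of the electron field by 2×2 matrices on the Fock space {|0⟩, |1⟩}, where the anticommutation relation reduces to {b, b†} = 1 and ⟨0|b b†|0⟩ = 1, the single-mode analogue of δ_{h′h}δ³(p′ − p).

```python
import numpy as np

# Single-mode sketch (illustration only): b and b† act on the
# two-dimensional Fock space {|0>, |1>} of one (p, h) mode.
b = np.array([[0.0, 1.0],
              [0.0, 0.0]])          # b|1> = |0>, b|0> = 0
bdag = b.T                           # creation operator b†
vac = np.array([1.0, 0.0])           # vacuum state |0>

# Canonical anticommutation relation {b, b†} = 1 for a single mode.
anticomm = b @ bdag + bdag @ b
assert np.allclose(anticomm, np.eye(2))

# <0| b b† |0> = 1, the single-mode analogue of the RHS of (23.133).
amplitude = vac @ (b @ bdag) @ vac
assert np.isclose(amplitude, 1.0)

# b² = 0 reflects the Pauli principle.
assert np.allclose(b @ b, np.zeros((2, 2)))
```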

where with the last equality, we used (22.262). Equation (23.133) implies that \langle f|S^{(0)}|i\rangle vanishes unless both h′ = h and \mathbf{p}' = \mathbf{p}. That is, the electron simply keeps moving without changing its helicity or momentum. This means that no interaction has occurred.

(ii) First-order matrix element \langle f|S^{(1)}|i\rangle [1]: We assume that

|i\rangle = b^\dagger(p, h)|0\rangle \quad {\rm and} \quad |f\rangle = a^\dagger(k', r')b^\dagger(p', h')|0\rangle.   (23.134)

The corresponding elementary process would be

e^- \longrightarrow e^- + \gamma,   (23.135)

where e^- and γ denote an electron and a photon, respectively. An electron of the initial state |i⟩ in vacuum is converted into an electron and a photon in the final state |f⟩ of (23.135). We pick up the effective normal product from the interaction Hamiltonian (23.17) [1]. Then, we have

N[e\bar{\psi}(x)\gamma^\mu A_\mu(x)\psi(x)] = e\bar{\psi}^-(x)\gamma^\mu A_\mu^-(x)\psi^+(x).   (23.136)

As the matrix element \langle f|S^{(1)}|i\rangle, we have

\langle f|S^{(1)}|i\rangle = ie\langle 0|\, b(p', h')a(k', r')\left[\int d^4x\, \bar{\psi}^-(x)\gamma^\mu A_\mu^-(x)\psi^+(x)\right] b^\dagger(p, h)\, |0\rangle.   (23.137)

In (23.137), we expand \bar{\psi}^-(x), A_\mu^-(x), and \psi^+(x) using (23.88), (22.334), and (22.232), respectively. That is, we have

\bar{\psi}^-(x) = \int_{-\infty}^{\infty}\frac{d^3\mathbf{w}'}{\sqrt{(2\pi)^3\, 2w'_0}}\sum_{t' = \pm 1} b^\dagger(w', t')\,\bar{u}(w', t')\, e^{iw'x},   (23.138)

A_\mu^-(x) = \int_{-\infty}^{\infty}\frac{d^3\mathbf{l}'}{\sqrt{(2\pi)^3\, 2l'_0}}\sum_{m' = 1}^{2} a^\dagger(l', m')\,\varepsilon_\mu(l', m')\, e^{il'x},   (23.139)

\psi^+(x) = \int_{-\infty}^{\infty}\frac{d^3\mathbf{w}}{\sqrt{(2\pi)^3\, 2w_0}}\sum_{t = \pm 1} b(w, t)\, u(w, t)\, e^{-iwx}.   (23.140)

In (23.139), we took only m′ = 1, 2 (i.e., the transverse modes) for the photon polarization [1]. Using (22.239), we have

b(p', h')b^\dagger(w', t') = -b^\dagger(w', t')b(p', h') + \delta_{h't'}\delta^3(\mathbf{p}' - \mathbf{w}'),   (23.141)

b(w, t)b^\dagger(p, h) = -b^\dagger(p, h)b(w, t) + \delta_{th}\delta^3(\mathbf{w} - \mathbf{p}).   (23.142)

Also, using (22.344), we have

a(k', r')a^\dagger(l', m') = a^\dagger(l', m')a(k', r') + \zeta_{r'}\delta_{r'm'}\delta^3(\mathbf{k}' - \mathbf{l}').   (23.143)

Further using (23.141)-(23.143) and performing the momentum-space integrations with respect to d^3\mathbf{w}', d^3\mathbf{w}, and d^3\mathbf{l}', we get

\langle f|S^{(1)}|i\rangle = ie\,\frac{1}{\sqrt{(2\pi)^9\, 2p_0\, 2p'_0\, 2k'_0}}\int d^4x\, \bar{u}(p', h')\varepsilon_\mu(k', r')\gamma^\mu \zeta_{r'} u(p, h)\, e^{i(p' + k' - p)x}
= ie\,\frac{1}{\sqrt{(2\pi)^9\, 2p_0\, 2p'_0\, 2k'_0}}\, \bar{u}(p', h')\varepsilon_\mu(k', r')\gamma^\mu \zeta_{r'} u(p, h)\,(2\pi)^4\delta^4(p' + k' - p),   (23.144)

where with the last equality, we used the four-dimensional version of (22.101). The four-dimensional δ function means that

\delta^4(p' + k' - p) = \delta(p'_0 + k'_0 - p_0)\,\delta^3(\mathbf{p}' + \mathbf{k}' - \mathbf{p}),   (23.145)


where p' = (p'_0, \mathbf{p}'), p = (p_0, \mathbf{p}), etc. From (23.145), \varepsilon_\mu(k', r') in (23.144) can be replaced with \varepsilon_\mu(p - p', r'). Equation (23.145) implies the conservation of four-momentum (or energy-momentum). That is, we have

p = p' + k'.   (23.146)

With the electron, we have

p'^2 = p'^2_0 - \mathbf{p}'^2 = m^2 \quad {\rm and} \quad p^2 = p_0^2 - \mathbf{p}^2 = m^2.   (23.147)

For the photon, we get

k'^2 = k'^2_0 - \mathbf{k}'^2 = \omega'^2 - \omega'^2 = 0,   (23.148)

where ω′ is the angular frequency of the emitted photon. But, from (23.146), we have

p^2 = p'^2 + 2p'k' + k'^2.

This implies p′k′ = 0. It is, however, incompatible with a real physical process [1]. This can be shown as follows: We have

p'k' = p'_0\omega' - \mathbf{p}'\cdot\mathbf{k}'.

But, we have

\mathbf{p}'\cdot\mathbf{k}' \le |\mathbf{p}'|\,|\mathbf{k}'| = \sqrt{p'^2_0 - m^2}\,|\mathbf{k}'| < p'_0\omega'.

Hence, p′k′ > p'_0\omega' - p'_0\omega' = 0. That is, p′k′ > 0. This is, however, inconsistent with p′k′ = 0, and so the equality p′k′ = 0 is physically unrealistic. The simplest case would be that a single electron is present at rest in the initial state and it emits a photon, with the electron recoiling in the final state. Put

p = (m, \mathbf{0}), \quad p' = (p'_0, \mathbf{p}'), \quad k' = (\omega', \mathbf{k}').

Then, seek the value of p′k′ on the basis of the energy-momentum conservation. It is left for readers. As evidenced from the above illustration, we must have


\langle f|S^{(1)}|i\rangle = 0.   (23.149)
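The reader exercise above can be checked numerically. In the sketch below (the values of m and ω′ are illustrative, not from the text), the electron starts at rest and the conservation p = p′ + k′ fixes the recoil; one finds p′k′ = mω′ > 0, so p′k′ = 0 — and hence the process e⁻ → e⁻ + γ — cannot be realized on the mass shells, in agreement with (23.149).

```python
import numpy as np

def mdot(a, b):
    # Minkowski product a·b = a0*b0 − a⃗·b⃗ with metric (+, −, −, −), ħ = c = 1
    return a[0] * b[0] - np.dot(a[1:], b[1:])

m = 0.511            # electron rest mass (MeV); illustrative value
omega = 0.2          # assumed photon energy (MeV); illustrative value

p = np.array([m, 0.0, 0.0, 0.0])            # electron at rest
kp = np.array([omega, 0.0, 0.0, omega])     # emitted photon, k'² = 0
pp = p - kp                                  # recoil electron p' = p − k'

# p'k' = mω' > 0, so p'k' = 0 required by the mass shells cannot hold
assert np.isclose(mdot(pp, kp), m * omega)

# Indeed p' is then off the mass shell: p'² = m² − 2mω' ≠ m²
assert np.isclose(mdot(pp, pp), m**2 - 2 * m * omega)
```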

23.7.2 Second-Order S-Matrix Elements and Feynman Diagrams

As already examined above, we had to write down a fairly large number of equations to examine even the simplest cases. It is the Feynman diagram that facilitates understanding the physical meaning of the individual processes. A building block comprises three lines joined at a vertex (see Fig. 23.2); i.e., two electron- or positron-associated lines and a photon-associated line. These lines correspond to the fields in the interaction Hamiltonian (23.17). Figure 23.2 is topologically equivalent to the first-order S-matrix S^{(1)} discussed in the last section. The diagram shows an incoming (or initial) electron (represented by a straight line labelled with p) and an outgoing (final) electron (labelled with p′). A photon line (shown as a wavy line labelled with k or k′) intersects the electron lines at their vertex. The Feynman diagrams comprise such building blocks, and the number of blocks contained in a diagram is determined by the order of the S-matrix. In general, a Feynman diagram consisting of n blocks is related to an n-th order S-matrix S^{(n)}. We show several kinds of Feynman diagrams related to S^{(2)} below. Other diagrams representing S^{(2)} processes can readily be constructed using two building blocks of the type depicted in Fig. 23.2. The associated Feynman rules will be summarized at the end of this section. We show examples of the second-order S-matrix S^{(2)}. Figure 23.3 summarizes several topologically different diagrams. Individual diagrams correspond to different scattering processes. One of the important features that has not appeared in the previous section is that the matrix elements \langle f|S^{(2)}|i\rangle contain the contraction(s). In

Fig. 23.2 Building blocks of the Feynman diagram. The Feynman diagram comprises two electron- or positron-associated lines (labelled p, p′) and a photon-associated line (labelled k, k′) that are joined at a vertex. (a) Incoming photon is labelled k. (b) Outgoing photon is labelled k′


Fig. 23.3 Feynman diagrams representing the second-order S-matrix S^{(2)}. Each diagram is composed of two building blocks. (a) Feynman diagram containing a single contraction related to the electron field. (b) Feynman diagram showing the electron self-energy. (c) Feynman diagram showing the photon self-energy (or vacuum polarization). (d) Feynman diagram showing the vacuum bubble (or vacuum diagram)

QED, each contraction corresponds to the Feynman propagator described by S_F(x - y) or D_F(x - y). On the basis of (23.85), we explicitly show S^{(2)} such that

S^{(2)} = -\frac{1}{2}\int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2\, T[\mathcal{H}_{\rm int}(x_1)\mathcal{H}_{\rm int}(x_2)]
= -\frac{e^2}{2}\int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2\, T\{[\bar{\psi}(x_1)\gamma^\mu A_\mu(x_1)\psi(x_1)][\bar{\psi}(x_2)\gamma^\nu A_\nu(x_2)\psi(x_2)]\}.   (23.150)

From (23.150), up to three contractions may be present in S^{(2)}. As can be seen in Fig. 23.3, the patterns and number of the contractions determine the topological features of the Feynman diagrams. In other words, the contractions play an essential role in determining how the building blocks are joined in the diagram (see Fig. 23.3). Before discussing the features of an example (i.e., Compton scattering) at large in the next section, we briefly mention the characteristics of the Feynman diagrams that have contraction(s). Figure 23.3a shows a diagram having a single contraction. The initial electron and photon are drawn on the left side and the final electron and photon are depicted on the right side, with the contraction with respect to the electron shown at the center.


Examples that have two contractions are shown in Fig. 23.3b, c, which represent the "electron self-energy" and "photon self-energy," respectively. The former term causes us to reconsider an electron as an entity surrounded by a "photon cloud." The photon self-energy, on the other hand, is sometimes referred to as the vacuum polarization. This is because the photon field modifies the distribution of virtual electron-positron pairs in such a way that it polarizes the vacuum like a dielectric [1]. The S-matrix computation of the physical processes represented by Fig. 23.3b, c leads to a divergent integral. We need the notion of renormalization to deal with these processes, but we will not go into detail about this issue. We show another example of the Feynman diagrams in Fig. 23.3d, called the "vacuum diagram" (or vacuum bubble [7]). It lacks external lines, and so no transitions are thought to occur. The diagrams collected in Fig. 23.3c, d contain two contractions between the Fermions. In those cases, the associated contraction lines go in opposite directions. This implies that both types of Fermions (i.e., electron and positron) take part in the process. Now, in accordance with the above topological features of the Feynman diagrams, we list the major S^{(2)} processes [1] as follows:

S_A^{(2)} = -\frac{e^2}{2}\int dx_1\, dx_2\, N[(\bar{\psi}\gamma^\mu A_\mu\psi)_{x_1}(\bar{\psi}\gamma^\nu A_\nu\psi)_{x_2}],   (23.151)

S_B^{(2)} = -\frac{e^2}{2}\int dx_1\, dx_2\, \{N[(\bar{\psi}\gamma^\mu A_\mu\psi)_{x_1}(\bar{\psi}\gamma^\nu A_\nu\psi)_{x_2}] + N[(\bar{\psi}\gamma^\mu A_\mu\psi)_{x_1}(\bar{\psi}\gamma^\nu A_\nu\psi)_{x_2}]\},   (23.152)

where in the first term of (23.152) \psi(x_1) is contracted with \bar{\psi}(x_2), while in the second term \bar{\psi}(x_1) is contracted with \psi(x_2),

S_C^{(2)} = -\frac{e^2}{2}\int dx_1\, dx_2\, N[(\bar{\psi}\gamma^\mu A_\mu\psi)_{x_1}(\bar{\psi}\gamma^\nu A_\nu\psi)_{x_2}],   (23.153)

where A_\mu(x_1) is contracted with A_\nu(x_2). The N-products of the remaining three processes have the same form, with the following pairs contracted:

S_D^{(2)}: \psi(x_1) with \bar{\psi}(x_2), and A_\mu(x_1) with A_\nu(x_2),   (23.154)

S_E^{(2)}: \psi(x_1) with \bar{\psi}(x_2), and \bar{\psi}(x_1) with \psi(x_2),   (23.155)

S_F^{(2)}: all three pairs contracted simultaneously.   (23.156)

Of these equations, S_B^{(2)} contains a single contraction between the Fermions. We describe the related important aspects in full detail in the next section, with emphasis


upon the Compton scattering. Another process containing a single contraction, S_C^{(2)}, includes the Møller scattering and the Bhabha scattering. Interested readers are referred to the literature [1, 3]. The S_D^{(2)}, S_E^{(2)}, and S_F^{(2)} processes are associated with the electron self-energy, the photon self-energy, and the vacuum bubble, respectively. The first two processes contain two contractions, whereas the last process S_F^{(2)} contains three contractions. Thus, we find that the second-order S-matrix elements S_k^{(2)} (k = A, ⋯, F) reflect the major processes of QED. Among the above equations, (23.151), which lacks a contraction, is of little importance as in the previous case, because it does not reflect real processes. Equation (23.152) contains two terms, in each of which the Dirac fields (ψ and \bar{\psi}) are contracted once, and these two terms are equivalent. As discussed in Sect. 23.6.2, especially as represented in (23.124), we have the following relation between the contracted N-products:

N[(\bar{\psi}\gamma^\mu A_\mu\psi)_{x_1}(\bar{\psi}\gamma^\nu A_\nu\psi)_{x_2}] = N[(\bar{\psi}\gamma^\mu A_\mu\psi)_{x_2}(\bar{\psi}\gamma^\nu A_\nu\psi)_{x_1}].   (23.157)

Notice that (i) exchanging the two factors (\bar{\psi}\gamma^\mu A_\mu\psi)_{x_1} and (\bar{\psi}\gamma^\mu A_\mu\psi)_{x_2} causes even permutations of the Fermion operators (ψ and \bar{\psi}). But the even permutations do not change the N-products, as shown in (23.120). (ii) Since (23.157) needs to be integrated with respect to x_1 and x_2, these variables can freely be exchanged. Hence, exchanging the integration variables x_1 and x_2 on the RHS of (23.157), we obtain

N[(\bar{\psi}\gamma^\mu A_\mu\psi)_{x_2}(\bar{\psi}\gamma^\nu A_\nu\psi)_{x_1}] = N[(\bar{\psi}\gamma^\mu A_\mu\psi)_{x_1}(\bar{\psi}\gamma^\nu A_\nu\psi)_{x_2}].

Thus, the two terms of (23.152) are identical, and so (23.152) can be equivalently expressed as

S_B^{(2)} = -e^2\int dx_1\, dx_2\, N[(\bar{\psi}\gamma^\mu A_\mu\psi)_{x_1}(\bar{\psi}\gamma^\nu A_\nu\psi)_{x_2}],   (23.158)

where \psi(x_1) is contracted with \bar{\psi}(x_2).

As already noted, the contractions represent the Feynman propagators. The contraction in (23.158) corresponds to S_F(x - y) described by (23.126), and (23.158) contains two uncontracted Fermion operators and two uncontracted photon operators. These uncontracted operators are referred to as external particles (associated with the external lines in the Feynman diagrams) and are responsible for absorbing or creating the particles (i.e., electrons or photons). Consequently, (23.158) contributes to many real physical processes (see Sect. 23.8.2). In parallel with the discussion made so far, in Fig. 23.4 we depict several major "components" of the diagrams along with their meaning. A set of line drawings accompanied by their physical meaning is usually referred to as the Feynman rules. The words "initial" and "final" used in the physical meaning should be taken as formal. In the above discussion, the initial states are associated with the momenta p and k, whereas the final states are pertinent to p′ and k′. In this respect, the words initial and


Fig. 23.4 Major components of the Feynman diagrams. (a) Internal (or intermediate) electron line and the corresponding Feynman propagator. (b) Internal (or intermediate) photon line and the corresponding Feynman propagator. (c) External lines of the initial electron [u(p, h)] and final electron [\bar{u}(p, h)] from the top. (d) External lines of the initial positron [\bar{v}(p, h)] and final positron [v(p, h)] from the top. (e) External lines of the initial photon [ε_μ(k, r)] and final photon [ε_μ(k, r)] from the top

final are of secondary importance. What is important is the measured (or measurable) quantity among p, k, p′, and k′. Figure 23.4a, b shows the internal (or intermediate) electron and photon lines labelled by the momenta p and k, respectively. Figure 23.4c indicates the external lines of the initial electron [u(p, h)] and final electron [\bar{u}(p, h)] from the top. In Fig. 23.4d, we similarly depict the external lines of the initial positron [\bar{v}(p, h)] and final positron [v(p, h)] from the top. Note that the direction of the arrows in Fig. 23.4d is reversed


relative to Fig. 23.4c. Moreover, Fig. 23.4e likewise shows the external lines of the photons. They are labelled by the polarization ε_μ(k, r).

23.8 Example: Compton Scattering [1]

23.8.1 Introductory Remarks

To gain a better understanding of the interaction between the quantum fields and to get used to dealing with the Feynman diagrams, we take this opportunity to focus upon the familiar example of Compton scattering. In Chap. 1, we dealt with it from a classical point of view (i.e., early-stage quantum theory). Here we revisit this problem to address the major issues from the point of view of quantum field theory. Even though the Compton scattering is only one example among various kinds of interactions of the quantum fields, it represents the general features of the field interactions, and, hence, exploring its characteristics in depth enhances heuristic knowledge. The elementary process of the Compton scattering is described by

\gamma + e^- \longrightarrow \gamma + e^-,   (23.159)

where γ denotes a photon and e^- stands for an electron. Using (23.130), a major part of the interaction Hamiltonian for S_B^{(2)} of (23.158) is described as

e^2 N\{[\bar{\psi}^+(x_1) + \bar{\psi}^-(x_1)]\gamma^\mu[A_\mu^+(x_1) + A_\mu^-(x_1)]\, S_F(x_1 - x_2)\,\gamma^\nu[A_\nu^+(x_2) + A_\nu^-(x_2)][\psi^+(x_2) + \psi^-(x_2)]\}.   (23.160)

Equation (23.160) represents many elementary processes including the Compton scattering (see Sect. 23.8.2). Of the operators in (23.160), \bar{\psi}^-(x_1) and \psi^+(x_2) are responsible for the creation [b^\dagger(p, h)] and annihilation [b(p, h)] of the electron, respectively. The operators \psi^-(x_2) and \bar{\psi}^+(x_1) are associated with the creation [d^\dagger(p, h)] and annihilation [d(p, h)] of the positron, respectively. The operators A_\mu^- and A_\mu^+ are pertinent to the creation and annihilation of the photon, respectively. We need to pick the appropriate operators according to the individual elementary processes and arrange them in the normal order. In the case of the Compton scattering, for example, the order of operators is either

\bar{\psi}^-(x_1)\gamma^\mu A_\mu^-(x_1)\gamma^\nu A_\nu^+(x_2)\psi^+(x_2)

or


\bar{\psi}^-(x_1)\gamma^\mu A_\nu^-(x_2)\gamma^\nu A_\mu^+(x_1)\psi^+(x_2).

Regarding the photon operators, either A_\mu^+(x_1) or A_\nu^+(x_2) is thought to annihilate the "initial" photon. Either A_\mu^-(x_1) or A_\nu^-(x_2) creates the "final" photon accordingly. Note the double quotation marks on "initial" and "final" written above. Because of the presence of the time-ordered products, the words initial and final are somewhat pointless. Nonetheless, we use them according to custom. The custom merely arises from the notation \langle f|S^{(2)}|i\rangle; that is, in this notation the "initial state" i stands to the right of the "final state" f. Defining the S^{(2)} relevant to the Compton scattering as S_{\rm Comp}^{(2)}, we have [1]

S_{\rm Comp}^{(2)} = S_a + S_b,   (23.161)

where S_a and S_b are given by

S_a = -e^2\int d^4x_1\, d^4x_2\, \bar{\psi}^-(x_1)\gamma^\mu S_F(x_1 - x_2)\gamma^\nu A_\mu^-(x_1)A_\nu^+(x_2)\psi^+(x_2)   (23.162)

and

S_b = -e^2\int d^4x_1\, d^4x_2\, \bar{\psi}^-(x_1)\gamma^\mu S_F(x_1 - x_2)\gamma^\nu A_\nu^-(x_2)A_\mu^+(x_1)\psi^+(x_2).   (23.163)

Of these, the first and last terms are responsible for Sa and Sb, respectively. Other ð2Þ two terms produce no contribution to SComp . The term Aμ- ðx1 ÞAþ ν ðx2 Þ of Sa in (23.162) plays a role in annihilating the initial photon at x2 and in creating the final photon at x1. Meanwhile, the term Aν- ðx2 ÞAþ μ ðx1 Þ of Sb in (23.163) plays a role in annihilating the initial photon at x1 and in creating the final photon at x2. Thus, comparing (23.162) and (23.163), we find that the coordinates x1 and x2 switch their role in the annihilation and creation of photon. Therefore, it is of secondary importance which “event” has happened prior þ to another between the event labelled by Aþ ν ðx2 Þ and that labelled by Aμ ðx1 Þ. This is the case with Aμ- ðx1 Þ and Aν- ðx2 Þ as well. Such a situation frequently occurs in other kinds of field interactions and their calculations more generally. Normally, we


distinguish the states of the quantum fields by momentum and helicity for the electron and by momentum and polarization for the photon. The related Feynman diagrams and calculations pertinent to the Compton scattering are given and discussed accordingly (vide infra).

23.8.2 Feynman Amplitude and Feynman Diagrams of Compton Scattering

The S-matrix S_B^{(2)} of (23.152) represents interaction processes other than the Compton scattering shown in (23.159) as well. These are described as follows:

\gamma + e^+ \longrightarrow \gamma + e^+,   (23.164)

e^- + e^+ \longrightarrow \gamma + \gamma,   (23.165)

\gamma + \gamma \longrightarrow e^- + e^+,   (23.166)

where e^+ stands for a positron. In the S_B^{(2)} types of processes, we are given four particles, i.e., e^-, e^+, γ, γ. Using the above four particles, we have four different ways of choosing two of them. That is, we have

(\gamma, e^-), \quad (\gamma, e^+), \quad (e^-, e^+), \quad (\gamma, \gamma).

Each of the above four combinations is the same as the remainder combination left after the choice. The elementary processes represented by (23.164), (23.165), and (23.166) are called the Compton scattering by positrons, the electron-positron pair annihilation, and the electron-positron pair creation, respectively. To distinguish those processes unambiguously, the interaction Hamiltonian (23.160), including both the positive-frequency part and the negative-frequency part, must be written down in full. In the case of the electron-positron pair creation of (23.166), for instance, the second-order S-matrix element is described as [1]

S^{(2)} = -e^2\int d^4x_1\, d^4x_2\, \bar{\psi}^-(x_1)\gamma^\mu S_F(x_1 - x_2)\gamma^\nu \psi^-(x_2)A_\mu^+(x_1)A_\nu^+(x_2).   (23.167)

In (23.167), field operators are expressed in the normal order and both the annihilation operators (corresponding to the positive frequency) and creation operators (corresponding to the negative frequency) are easily distinguished.
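The counting of "four different ways of choosing two of them" can be reproduced directly; the particle labels below are just strings chosen for illustration.

```python
from itertools import combinations

# The four particles appearing in the S_B^(2) processes
particles = ['gamma', 'e-', 'gamma', 'e+']

# Choosing two of them, counted as distinct multisets
pairs = {tuple(sorted(pair)) for pair in combinations(particles, 2)}

# Exactly four distinct combinations: (γ, e−), (γ, e+), (e−, e+), (γ, γ)
assert len(pairs) == 4
assert ('gamma', 'gamma') in pairs
```

Each chosen pair leaves the complementary pair behind, which is why the four combinations match the four processes (23.159) and (23.164)-(23.166).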


We have already shown the calculation procedure of \langle f|S^{(1)}|i\rangle. Even though it did not describe a real physical process, it helped in understanding how the calculations are performed on the interacting fields. The calculation processes of \langle f|S^{(2)}|i\rangle are basically the same as those described above. In the present case, instead of (23.134) we assume

|i\rangle = a^\dagger(k, r)b^\dagger(p, h)|0\rangle \quad {\rm and} \quad |f\rangle = a^\dagger(k', r')b^\dagger(p', h')|0\rangle.

Then, using, e.g., (23.161), we have

\langle f|S_{\rm Comp}^{(2)}|i\rangle = \langle f|S_a|i\rangle + \langle f|S_b|i\rangle

with

\langle f|S_a|i\rangle = \langle 0|\, b(p', h')a(k', r')\, S_a\, a^\dagger(k, r)b^\dagger(p, h)\, |0\rangle,
\langle f|S_b|i\rangle = \langle 0|\, b(p', h')a(k', r')\, S_b\, a^\dagger(k, r)b^\dagger(p, h)\, |0\rangle.   (23.168)

Substituting (23.162) or (23.163) for S_a or S_b in (23.168), respectively, we carry out the calculations as in Sect. 23.7.1. The calculation procedures are summarized as follows: (i) Expand the field operators into the positive- or negative-frequency parts in the three-dimensional momentum space as in Sects. 22.4 and 22.5. (ii) Use the (anti)commutation relations for the creation and annihilation operators of the photon and electron as in (23.141)-(23.143). (iii) Perform the integration with respect to the momentum space and further perform the integration with respect to x_1 and x_2 of S_F(x_1 - x_2), which produces δ functions of four-momenta. Then, (iv) the further integration of the resulting S_F(x_1 - x_2) in terms of the four-momenta converts it into its Fourier counterpart \tilde{S}_F(p) denoted by (22.276). Thus, the complicated integration can be reduced to simple algebraic computation. The above summary might suggest an overwhelming number of computations. As the proverb goes, however, "an attempt is sometimes easier than expected." We show the major parts of the calculations for \langle f|S_a|i\rangle below. Similar techniques can be applied to the computation of \langle f|S_b|i\rangle. The S-matrix element is described as

\langle f|S_a|i\rangle = (-ie)^2\,\frac{\zeta_r\zeta_{r'}}{(2\pi)^6\sqrt{2p'_0\, 2k'_0\,(2p_0)(2k_0)}}\int d^4x_1\, d^4x_2\, \langle 0|\,\bar{u}(p', h')\varepsilon_\nu(k', r')\gamma^\nu S_F(x_1 - x_2)\varepsilon_\mu(k, r)\gamma^\mu u(p, h)\, e^{i(p' + k')x_1} e^{-i(p + k)x_2}\,|0\rangle,   (23.169)

where \zeta_r and \zeta_{r'} come from (22.344). In (23.169), S_F(x_1 - x_2), combined with the phase factors in the integrand, is expressed as


S_F(x_1 - x_2)\, e^{i(p' + k')x_1}\, e^{-i(p + k)x_2} = \int \frac{-i\, d^4q}{(2\pi)^4}\, e^{-ix_1(q - p' - k')}\, e^{ix_2(q - p - k)}\, \frac{q_\mu\gamma^\mu + m}{m^2 - q^2 - i\varepsilon}.

Performing the integration of S_F(x_1 - x_2) in (23.169) with respect to x_1 and x_2, we obtain an expression including δ functions of four-momenta such that

\langle f|S_a|i\rangle = (-ie)^2\,\frac{\zeta_r\zeta_{r'}\langle 0|0\rangle}{(2\pi)^6\sqrt{2p'_0\, 2k'_0\,(2p_0)(2k_0)}}\,(2\pi)^8\,\bar{u}(p', h')\varepsilon_\nu(k', r')\gamma^\nu\left[\int \frac{-i\, d^4q}{(2\pi)^4}\,\delta^4(q - p' - k')\,\delta^4(q - p - k)\,\frac{q_\mu\gamma^\mu + m}{m^2 - q^2 - i\varepsilon}\right]\varepsilon_\mu(k, r)\gamma^\mu u(p, h)

= (-ie)^2(2\pi)^4\,\frac{\zeta_r\zeta_{r'}\,\delta^4(p + k - p' - k')}{(2\pi)^6\sqrt{2p'_0\, 2k'_0\,(2p_0)(2k_0)}}\,\bar{u}(p', h')\varepsilon_\nu(k', r')\gamma^\nu\,\frac{(-i)[(p_\mu + k_\mu)\gamma^\mu + m]}{m^2 - (p + k)^2}\,\varepsilon_\mu(k, r)\gamma^\mu u(p, h)

= \frac{(2\pi)^4\delta^4(p + k - p' - k')}{(2\pi)^6\sqrt{2p'_0\, 2k'_0\,(2p_0)(2k_0)}} \times \left[-e^2\zeta_r\zeta_{r'}\,\bar{u}(p', h')\varepsilon_\nu(k', r')\gamma^\nu \tilde{S}_F(p + k)\varepsilon_\mu(k, r)\gamma^\mu u(p, h)\right].   (23.170)

In (23.170), we used \langle 0|0\rangle = 1 and

\delta^4(q - p' - k')\,\delta^4(q - p - k) = \delta^4(p + k - p' - k')\,\delta^4(q - p - k).   (23.171)

Since the first factor of (23.171) does not contain the variable q, it can be taken out of the integrand. The integration of the integrand containing the second factor produces \tilde{S}_F(p + k) in (23.170). In (23.170), \tilde{S}_F(p) was defined in Sect. 22.4.4 as

\tilde{S}_F(p) = \frac{-i(p_\mu\gamma^\mu + m)}{m^2 - p^2 - i\varepsilon}.   (22.276)

Namely, the function \tilde{S}_F(p) is a momentum component (or Fourier component) of S_F(x) defined in (22.275). If in (23.170) we think of only the transverse modes for the photon field, the indices r and r′ of \zeta_r and \zeta_{r'} run over 1 and 2 [1]. In that case, we have \zeta_1 = \zeta_2 = 1, and so we may neglect the factor \zeta_r\zeta_{r'} (= 1) accordingly. On this condition, we define a complex number \mathcal{M}_a as


Fig. 23.5 Feynman diagrams representing the second-order S-matrix of the Compton scattering. (a) Diagram of S_a. (b) Diagram of S_b

\mathcal{M}_a \equiv -e^2\,\bar{u}(p', h')\varepsilon_\nu(k', r')\gamma^\nu \tilde{S}_F(p + k)\varepsilon_\mu(k, r)\gamma^\mu u(p, h).   (23.172)

The complex number \mathcal{M}_a is called the Feynman amplitude (or invariant scattering amplitude) associated with the Feynman diagram of S_a (see Fig. 23.5a). To compute \tilde{S}_F(p + k), we have

\tilde{S}_F(p + k) = \frac{(-i)[(p_\mu + k_\mu)\gamma^\mu + m]}{m^2 - (p + k)^2 - i\varepsilon} = \frac{(-i)[(p_\mu + k_\mu)\gamma^\mu + m]}{m^2 - p^2 - 2pk - k^2 - i\varepsilon} = \frac{(-i)[(p_\mu + k_\mu)\gamma^\mu + m]}{-2pk - i\varepsilon} = \frac{i[(p_\mu + k_\mu)\gamma^\mu + m]}{2pk},   (23.173)

where in the denominators we used p^2 = m^2 and k^2 = 0. In (23.173), we dealt with the "square" (p + k)^2 as if both p and k were ordinary numbers. Here, it means


(p + k)^2 \equiv (p^\mu + k^\mu)(p_\mu + k_\mu) = p^\mu p_\mu + p^\mu k_\mu + k^\mu p_\mu + k^\mu k_\mu = p^2 + 2pk + k^2,

where we define pk \equiv p^\mu k_\mu. Since this is a convenient notation, it will be used henceforth. Since we have pk ≠ 0 in the present case, the denominator is not an infinitesimal quantity and, hence, we deleted iε as redundant. Thus, we get

\langle f|S_a|i\rangle = (2\pi)^4\delta^4(p' + k' - p - k)\,\frac{1}{(2\pi)^6\sqrt{2p'_0\, 2k'_0\,(2p_0)(2k_0)}}\,\mathcal{M}_a.   (23.174)
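The manipulation of the "square" (p + k)² used in (23.173) is easy to verify numerically. The sketch below (randomly chosen on-shell momenta, units with ħ = c = 1; the values are illustrative) checks (p + k)² = p² + 2pk + k² and the resulting propagator denominator 2pk.

```python
import numpy as np

rng = np.random.default_rng(0)

def mdot(a, b):
    # Minkowski product with metric (+, −, −, −)
    return a[0] * b[0] - np.dot(a[1:], b[1:])

m = 1.0
pvec = rng.normal(size=3)
p = np.concatenate(([np.sqrt(m**2 + pvec @ pvec)], pvec))  # on-shell electron, p² = m²

kvec = rng.normal(size=3)
k = np.concatenate(([np.linalg.norm(kvec)], kvec))          # photon, k² = 0

# (p + k)² = p² + 2pk + k², treating the four-vectors like ordinary numbers
lhs = mdot(p + k, p + k)
rhs = mdot(p, p) + 2 * mdot(p, k) + mdot(k, k)
assert np.isclose(lhs, rhs)

# With p² = m² and k² = 0, the denominator (p + k)² − m² reduces to 2pk
assert np.isclose(lhs - m**2, 2 * mdot(p, k))
```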

In a similar manner, with \langle f|S_b|i\rangle, we have

\langle f|S_b|i\rangle = (2\pi)^4\delta^4(p' + k' - p - k)\,\frac{1}{(2\pi)^6\sqrt{2p'_0\, 2k'_0\,(2p_0)(2k_0)}} \times \left[-e^2\zeta_r\zeta_{r'}\,\bar{u}(p', h')\varepsilon_\nu(k, r)\gamma^\nu \tilde{S}_F(p - k')\varepsilon_\mu(k', r')\gamma^\mu u(p, h)\right].   (23.175)

Also, defining a complex number \mathcal{M}_b as

\mathcal{M}_b \equiv -e^2\,\bar{u}(p', h')\varepsilon_\nu(k, r)\gamma^\nu \tilde{S}_F(p - k')\varepsilon_\mu(k', r')\gamma^\mu u(p, h),   (23.176)

we obtain

\langle f|S_b|i\rangle = (2\pi)^4\delta^4(p' + k' - p - k)\,\frac{1}{(2\pi)^6\sqrt{2p'_0\, 2k'_0\,(2p_0)(2k_0)}}\,\mathcal{M}_b.   (23.177)

In (23.174) and (23.177), the presence of \delta^4(p' + k' - p - k) reflects the four-momentum conservation before and after the Compton scattering process. Figure 23.5 represents the Feynman diagrams that correspond to S_a (Fig. 23.5a) and S_b (Fig. 23.5b) related to the Compton scattering. Both diagrams consist of two building blocks of the type depicted in Fig. 23.2. Even though Fig. 23.5a, b shares the same topological feature as Fig. 23.3a, the connection of the two building blocks is reversed between S_a and S_b. From the discussions of the present and previous sections, the implications of these Feynman diagrams (Fig. 23.5a, b) are evident. Equations (23.174) and (23.177) represent a general feature of the field interactions. The factor (2π)^4 present in both the numerator and denominator could be reduced, but we leave it unchanged. This is because (23.174) and (23.177) can readily be generalized to the case where two initial particles and N final particles take part in the overall reaction. In the general two-particle collision, we assume that the reaction is represented by

P_1 + P_2 \longrightarrow P'_1 + P'_2 + \cdots + P'_N,   (23.178)

where P1 and P2 denote the initial particles and P′1, P′2, ⋯P′N represent the final particles. Then, in accordance with (23.174) and (23.177), the S-matrix element is expressed as hf jSjii N

p0 f =1 f

= ð2π Þ4 δ4



f =1

In (23.179), pi =

p i=1 i

1

N

ð2π Þ 2E 0f 3

Ei pi

two particles and p0f =

1

2

-

ð2π Þ3 ð2E 1 Þð2π Þ3 ð2E 2 Þ

M:

ð23:179Þ

ði = 1, 2Þ denotes four-momenta of the initially colliding E 0f p0f

ðf = 1, ⋯, N Þ represents those of the finally created

N particles. Also, in (23.179) we used S and M instead of Sa, M a , etc., since a general case is intended. Notice that with the Compton scattering N = 2.

23.8.3 Scattering Cross-Section [1]

To connect the general formalism to experimental observations, we wish to relate the S-matrix elements to the transition probability that measures the strength of the interaction between the fields. The relevant discussion is first based on the general expression (23.179), and then we focus our attention on the Compton scattering. Let w be the transition probability per unit time and unit volume. This is given by

w = |\langle f|S|i\rangle|^2 / TV,   (23.180)

where T is the time during which the scattering experiment is being done and V is the volume related to the experiment. Usually, TV is taken as the whole space-time, because an elementary particle reaction takes place in an extremely narrow space-time domain. Conservation of the energy and momentum with respect to (23.179) reads

E_1 + E_2 = \sum_{k=1}^{N} E'_k,   (23.181)


\mathbf{p}_1 + \mathbf{p}_2 = \sum_{k=1}^{N}\mathbf{p}'_k.   (23.182)

From the above equations, we know that not all of E'_1, ⋯, E'_N are independent variables. Nor are all of \mathbf{p}'_1, ⋯, \mathbf{p}'_N independent variables. The cross-section σ for the reaction or scattering is described by

\sigma = (2\pi)^6\, w / v_{\rm rel},   (23.183)

where v_{\rm rel} represents the relative velocity between the two colliding particles P_1 and P_2. In (23.183), v_{\rm rel} is defined by [1]

v_{\rm rel} \equiv \sqrt{(p_1 p_2)^2 - (m_1 m_2)^2}\, / E_1 E_2,   (23.184)

where m_1 and m_2 are the rest masses of the colliding Particles 1 and 2, respectively. When we have a collinear collision of the two particles, (23.184) is reduced to a simple expression. Suppose that at a certain inertial frame of reference Particle 1 and Particle 2 have velocities \mathbf{v}_1 and \mathbf{v}_2 = a\mathbf{v}_1 (where a is a real number), respectively. According to (1.9), we define \gamma_1 and \gamma_2 such that

\gamma_i \equiv 1 / \sqrt{1 - |\mathbf{v}_i|^2} \quad (i = 1, 2).   (23.185)

Then, from (1.21), we have

E_i = m_i\gamma_i \quad {\rm and} \quad \mathbf{p}_i = m_i\gamma_i\mathbf{v}_i \quad (i = 1, 2).   (23.186)

Also, we obtain

(p_1 p_2)^2 - (m_1 m_2)^2 = (m_1 m_2\gamma_1\gamma_2 - \mathbf{p}_1\cdot\mathbf{p}_2)^2 - (m_1 m_2)^2
= (m_1 m_2\gamma_1\gamma_2)^2\left[1 - 2a|\mathbf{v}_1|^2 + a^2|\mathbf{v}_1|^4 - \frac{1}{\gamma_1^2\gamma_2^2}\right]
= (m_1 m_2\gamma_1\gamma_2)^2\,|\mathbf{v}_1|^2(a - 1)^2.   (23.187)

With the above derivation, we used (23.185) and (23.186) along with \mathbf{v}_2 = a\mathbf{v}_1 (collinear collision). Then, we get

v_{\rm rel} = |\mathbf{v}_1(a - 1)| = |\mathbf{v}_2 - \mathbf{v}_1|.   (23.188)

Thus, in the collinear collision case, we have a relative velocity identical to that in the non-relativistic limit. In the particular instance of a = 0 (in which Particle 2 is at rest), v_{\rm rel} = |\mathbf{v}_1|. This is usually the case with a laboratory frame where Particle


1 is caused to collide against Particle 2 at rest. If Particle 1 is a photon, in a laboratory frame we have the particularly simple expression

v_{\rm rel} = 1.   (23.189)
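The reduction of the invariant relative velocity (23.184) to |v₂ − v₁| for a collinear collision, (23.188), can be confirmed numerically; the masses and velocities below are illustrative values, not from the text.

```python
import numpy as np

def gamma(v):
    return 1.0 / np.sqrt(1.0 - v @ v)        # (23.185), with c = 1

def vrel(m1, v1, m2, v2):
    # Invariant definition (23.184): sqrt((p1 p2)² − (m1 m2)²) / (E1 E2)
    E1, E2 = m1 * gamma(v1), m2 * gamma(v2)
    p1, p2 = E1 * v1, E2 * v2                # (23.186): p_i = m_i γ_i v_i
    p1p2 = E1 * E2 - p1 @ p2                 # Minkowski product of the four-momenta
    return np.sqrt(p1p2**2 - (m1 * m2)**2) / (E1 * E2)

# Collinear collision v2 = a v1 (illustrative masses and velocities)
m1, m2 = 1.0, 2.0
v1 = np.array([0.6, 0.0, 0.0])
a = -0.5
v2 = a * v1

# (23.188): v_rel = |v2 − v1| in the collinear case
assert np.isclose(vrel(m1, v1, m2, v2), np.linalg.norm(v2 - v1))

# a = 0 (Particle 2 at rest, laboratory frame): v_rel = |v1|
assert np.isclose(vrel(m1, v1, m2, 0 * v1), np.linalg.norm(v1))
```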

Substituting (23.180) and (23.184) into (23.183), we obtain

\sigma = \frac{(2\pi)^6}{TV}\cdot\frac{E_1 E_2}{\sqrt{(p_1 p_2)^2 - (m_1 m_2)^2}}\cdot\frac{1}{(2\pi)^6\, 4E_1 E_2}\left[(2\pi)^4\delta^4\left(\sum_{f=1}^{N} p'_f - \sum_{i=1}^{2} p_i\right)\right]^2\prod_{f=1}^{N}\frac{1}{(2\pi)^3\, 2E'_f}\,|\mathcal{M}|^2,   (23.190)

where (2π)6 of the numerator in the first factor is related to the normalization of incident particles 1 and 2 (in the initial state) [8]. In (23.190), we have N

p0 f =1 f

ð2π Þ4 δ4

2

2

-

p i=1 i

N

p0 f =1 f

= ð2π Þ4 δ4 ð0Þ  ð2π Þ4 δ4

-

2

p i=1 i

,

where we used a property of the δ function expressed as [9]

f(x)\delta(x) = f(0)\delta(x).   (23.191)

But, in analogy with (22.57), we obtain

\delta(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i0x}\, dx = \frac{1}{2\pi}\int_{-\infty}^{\infty} dx \quad {\rm or} \quad (2\pi)^4\delta^4(0) = \int_{-\infty}^{\infty} d^4x \approx TV,   (23.192)

where (2\pi)^4\delta^4(0) represents the volume of the whole space-time. Thus, we get

\sigma = \frac{(2\pi)^4}{4\sqrt{(p_1 p_2)^2 - (m_1 m_2)^2}}\prod_{f=1}^{N}\frac{1}{(2\pi)^3\, 2E'_f}\,\delta^4\left(\sum_{f=1}^{N} p'_f - \sum_{i=1}^{2} p_i\right)|\mathcal{M}|^2.   (23.193)

The cross-section is usually described in terms of the differential cross-section dσ such that

d\sigma = \frac{(2\pi)^4}{4\sqrt{(p_1 p_2)^2 - (m_1 m_2)^2}}\,\delta^4\left(\sum_{f=1}^{N} p'_f - \sum_{i=1}^{2} p_i\right)|\mathcal{M}|^2\prod_{f=1}^{N}\frac{d^3\mathbf{p}'_f}{(2\pi)^3\, 2E'_f},   (23.194)

where d^3\mathbf{p}'_f represents an infinitesimal range of momentum from \mathbf{p}'_f to \mathbf{p}'_f + d\mathbf{p}'_f within which the particle P'_f (f = 1, ⋯, N) is found. Then, σ can be expressed as an integral of (23.194):

\sigma = \frac{(2\pi)^4}{4\sqrt{(p_1 p_2)^2 - (m_1 m_2)^2}}\int\prod_{f=1}^{N}\frac{d^3\mathbf{p}'_f}{(2\pi)^3\, 2E'_f}\,\delta^4\left(\sum_{f=1}^{N} p'_f - \sum_{i=1}^{2} p_i\right)|\mathcal{M}|^2.   (23.195)

Note that (23.195) is Lorentz invariant. In fact, if in (22.150) we put f(k_0) ≡ 1 and g(\mathbf{k}) ≡ 1, we have

\int d^4k\,\delta(k^2 - m^2)\,\theta(k_0) = \int\frac{d^3\mathbf{k}}{2\sqrt{\mathbf{k}^2 + m^2}}.

As an equivalent form of (23.195), we obtain

\sigma = \frac{(2\pi)^4}{4v_{\rm rel}E_1 E_2}\int\delta^4\left(\sum_{f=1}^{N} p'_f - \sum_{i=1}^{2} p_i\right)\prod_{f=1}^{N}\frac{d^3\mathbf{p}'_f}{(2\pi)^3\, 2E'_f}\,|\mathcal{M}|^2.   (23.196)

In particular, if N = 2 (the case to which the Compton scattering is relevant), we obtain

$$d\sigma=\frac{1}{64\pi^2v_{\mathrm{rel}}E_1E_2E'_1E'_2}\,\delta^4\Big(\sum\nolimits_{f=1}^{2}p'_f-\sum\nolimits_{i=1}^{2}p_i\Big)\,|\mathcal{M}|^2\,d^3\mathbf{p}'_1\,d^3\mathbf{p}'_2. \quad (23.197)$$
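The reduction of the prefactor from (23.196) to (23.197) is elementary arithmetic, and can be checked numerically. A minimal sketch (not from the book; the numerical values of the energies and relative velocity are arbitrary illustrations):

```python
import math

# Arbitrary illustrative values (placeholders, not data from the text).
E1, E2, E1p, E2p, v_rel = 1.3, 0.9, 1.7, 0.5, 0.8

# Prefactor of (23.196) specialized to N = 2 final-state particles
lhs = ((2 * math.pi) ** 4 / (4 * v_rel * E1 * E2)
       * 1 / ((2 * math.pi) ** 3 * 2 * E1p)
       * 1 / ((2 * math.pi) ** 3 * 2 * E2p))

# Prefactor appearing in (23.197)
rhs = 1 / (64 * math.pi ** 2 * v_rel * E1 * E2 * E1p * E2p)

print(math.isclose(lhs, rhs, rel_tol=1e-12))  # True
```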

Now, we wish to perform practical calculations of the Compton scattering after Mandl and Shaw [1]. Because of the momentum conservation described by

$$\mathbf{p}'_2=\mathbf{p}_1+\mathbf{p}_2-\mathbf{p}'_1, \quad (23.198)$$

we regard only p′₁ (of the scattered photon) as an independent variable out of p′₁ and p′₂. The choice is arbitrary, but it is both practical and convenient to choose the momentum of the particle in question as the independent variable. To express the differential cross-section dσ as a function of the scattering angle of the particle, we convert the Cartesian coordinate representation into the polar coordinate representation with respect to the momentum space p′₁ in (23.197). Then, we integrate (23.197) with respect to p′₂ to get

$$d\sigma=\delta\Big(\sum\nolimits_{f=1}^{2}E'_f-\sum\nolimits_{i=1}^{2}E_i\Big)\frac{1}{64\pi^2E_1E_2E'_1E'_2v_{\mathrm{rel}}}\,|\mathcal{M}|^2\,|\mathbf{p}'_1|^2\,d|\mathbf{p}'_1|\,d\Omega', \quad (23.199)$$

where dΩ′ denotes the solid-angle element into which the particle P′₁ is scattered. Notice that the integration over p′₂ converted $\delta^4\big(\sum_{f=1}^{2}p'_f-\sum_{i=1}^{2}p_i\big)$ into $\delta\big(\sum_{f=1}^{2}E'_f-\sum_{i=1}^{2}E_i\big)$.

Further integration of dσ over |p′₁| produces [1]

$$\int d\sigma\,d|\mathbf{p}'_1|=d\tilde{\sigma}=\frac{1}{64\pi^2E_1E_2E'_1E'_2v_{\mathrm{rel}}}\,|\mathcal{M}|^2\,|\mathbf{p}'_1|^2\left[\frac{\partial\big(E'_1+E'_2\big)}{\partial|\mathbf{p}'_1|}\right]^{-1}d\Omega', \quad (23.200)$$

where dσ̃ means the differential cross-section obtained after the integration of dσ over |p′₁|. To derive (23.200), we used the following formula for the δ function [1]:

$$\int f(x,y)\,\delta[g(x,y)]\,dx=\int f(x,y)\,\delta[g(x,y)]\left(\frac{\partial g}{\partial x}\right)_y^{-1}dg=\left.f(x,y)\left[\left(\frac{\partial g}{\partial x}\right)_y\right]^{-1}\right|_{g=0}. \quad (23.201)$$

We make use of (23.200) in Sect. 23.8.6.

23.8.4 Spin and Photon Polarization Sums

We often encounter a situation where a Feynman amplitude $\mathcal{M}$ takes the form [1]

$$\mathcal{M}=c\,\bar{u}(p',h')\,\Gamma\,u(p,h), \quad (23.202)$$

where c is a complex number and Γ is usually a (4, 4) matrix (tensor) that contains gamma matrices. The amplitude $\mathcal{M}$ is a complex number as well, and we are interested in calculating $|\mathcal{M}|^2$ (a real number). Hence, we have


$$|\mathcal{M}|^2=|c|^2\,[\bar{u}(p',h')\Gamma u(p,h)]\,[\bar{u}(p',h')\Gamma u(p,h)]^{*}. \quad (23.203)$$

Viewing $\bar{u}(p',h')\Gamma u(p,h)$ as a (1, 1) matrix, we have

$$[\bar{u}(p',h')\Gamma u(p,h)]^{*}=[\bar{u}(p',h')\Gamma u(p,h)]^{\dagger}. \quad (23.204)$$

Then, we obtain

$$\begin{aligned}|\mathcal{M}|^2&=|c|^2\,[\bar{u}(p',h')\Gamma u(p,h)]\,u^{\dagger}(p,h)\Gamma^{\dagger}\bar{u}^{\dagger}(p',h')\\&=|c|^2\,[\bar{u}(p',h')\Gamma u(p,h)]\,u^{\dagger}(p,h)\gamma^0\gamma^0\Gamma^{\dagger}\gamma^0\gamma^0\bar{u}^{\dagger}(p',h')\\&=|c|^2\,[\bar{u}(p',h')\Gamma u(p,h)]\,\bar{u}(p,h)\big(\gamma^0\Gamma^{\dagger}\gamma^0\big)\gamma^0\gamma^0u(p',h')\\&=|c|^2\,[\bar{u}(p',h')\Gamma u(p,h)]\,\bar{u}(p,h)\big(\gamma^0\Gamma^{\dagger}\gamma^0\big)u(p',h'),\end{aligned} \quad (23.205)$$

where with the second equality we used γ⁰γ⁰ = E; with the third equality, we used $u^{\dagger}\gamma^0=\bar{u}$ and $\bar{u}^{\dagger}=\gamma^0u$. Defining $\tilde{\Gamma}$ as

$$\tilde{\Gamma}\equiv\gamma^0\Gamma^{\dagger}\gamma^0, \quad (23.206)$$

we get

$$|\mathcal{M}|^2=|c|^2\,\bar{u}(p',h')\Gamma u(p,h)\,\bar{u}(p,h)\tilde{\Gamma}u(p',h'). \quad (23.207)$$

In (23.207), u(p,h), u(p′,h′), etc. are (4, 1) matrices; Γ and $\tilde{\Gamma}$, as well as $u(p,h)\bar{u}(p,h)$, are (4, 4) matrices. Then, $\Gamma u(p,h)\bar{u}(p,h)\tilde{\Gamma}u(p',h')$ is a (4, 1) matrix. Now, we recall (18.70) and (18.71); namely,

$$\mathrm{Tr}(PQ)=\mathrm{Tr}(QP), \quad (18.70)$$

$$\sum_i(PQ)_{ii}=\sum_i\sum_jP_{ij}Q_{ji}=\sum_j\sum_iQ_{ji}P_{ij}=\sum_j(QP)_{jj}. \quad (18.71)$$

Having a look at (18.71), we find that (18.70) can be extended to the case where neither P nor Q is a square matrix but PQ is a square matrix. Thus, (23.207) can be rewritten as

$$|\mathcal{M}|^2=|c|^2\,\mathrm{Tr}\big[\Gamma u(p,h)\bar{u}(p,h)\tilde{\Gamma}u(p',h')\bar{u}(p',h')\big], \quad (23.208)$$

where $\Gamma u(p,h)\bar{u}(p,h)\tilde{\Gamma}u(p',h')\bar{u}(p',h')$ is a (4, 4) matrix. Note, however, that taking the trace does not yet mean the sum or average over the spin and photon polarization. It is because in (23.208) the helicity indices h and h′ for the electron remain unsummed. The photon polarization indices [hidden in Γ and $\tilde{\Gamma}$ in (23.208)] have not yet been summed either. The polarization sum and average are completed by taking the summation over all the pertinent indices. This will be done in detail in the next section.
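The extension of the cyclic trace property (18.70)-(18.71) to rectangular matrices is easy to verify numerically. A minimal sketch (ours, not the book's): P is a (2, 3) matrix and Q a (3, 2) matrix, so PQ and QP are square matrices of different sizes, yet their traces agree.

```python
# P is (2, 3), Q is (3, 2): PQ is (2, 2) and QP is (3, 3).
P = [[1.0, 2.0, -1.0],
     [0.5, 3.0,  4.0]]
Q = [[2.0, -1.0],
     [1.0,  0.5],
     [3.0,  2.0]]

def matmul(A, B):
    # Plain matrix product of nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

print(trace(matmul(P, Q)), trace(matmul(Q, P)))  # 10.0 10.0
```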

23.8.5 Detailed Calculation Procedures of Feynman Amplitude [1]

Now that we have gotten used to the generalities of the scattering theory, we are in a position to carry out the detailed calculations of the Feynman amplitude. In the case of the Compton scattering, the Feynman amplitudes $\mathcal{M}_a$ and $\mathcal{M}_b$ have been given as

$$\mathcal{M}_a\equiv-e^2\,\bar{u}(p',h')\,\varepsilon_\nu(k',r')^{*}\gamma^\nu\,\tilde{S}_F(p+k)\,\varepsilon_\mu(k,r)\gamma^\mu\,u(p,h), \quad (23.172)$$

$$\mathcal{M}_b\equiv-e^2\,\bar{u}(p',h')\,\varepsilon_\nu(k,r)\gamma^\nu\,\tilde{S}_F(p-k')\,\varepsilon_\mu(k',r')^{*}\gamma^\mu\,u(p,h). \quad (23.176)$$

Comparing (23.172) and (23.176) with (23.202), we find that c in (23.202) is given by

$$c=-e^2\zeta_r\zeta_{r'}. \quad (23.209)$$

Note that $\zeta_r\zeta_{r'}=1$ (r, r′ = 1, 2). The total Feynman amplitude $\mathcal{M}$ is given by

$$\mathcal{M}=\mathcal{M}_a+\mathcal{M}_b. \quad (23.210)$$

Then, we have

$$|\mathcal{M}|^2=|\mathcal{M}_a|^2+|\mathcal{M}_b|^2+\mathcal{M}_a\mathcal{M}_b^{*}+\mathcal{M}_a^{*}\mathcal{M}_b. \quad (23.211)$$

From here on, we assume that the electrons and/or photons are unpolarized in both their initial and final states. In terms of the experimental techniques, it is more difficult to detect or determine the polarization or helicity of electrons than the polarization of photons. To estimate the polarization sum of the photon field, we need another technique related to its gauge transformation (vide infra). (i) In the case of the electrons, we first average over the initial spins (or helicities) h and sum over the final helicities h′. These procedures yield a positive real number $\tilde{X}$ that is described by [1]

$$\tilde{X}=\frac{1}{2}\sum_{h=1}^{2}\sum_{h'=1}^{2}|\mathcal{M}|^2. \quad (23.212)$$

(ii) As for the photons, we similarly average over the initial polarizations r and sum over the final polarizations r′. These procedures yield another positive real number X defined as [1]

$$X\equiv\frac{1}{2}\sum_{r=1}^{2}\sum_{r'=1}^{2}\tilde{X}=\frac{1}{4}\sum_{r=1}^{2}\sum_{r'=1}^{2}\sum_{h=1}^{2}\sum_{h'=1}^{2}|\mathcal{M}|^2, \quad (23.213)$$

where $|\mathcal{M}|^2$ was given in (23.211). We further define $X_1$ to $X_4$ as

$$X_1\equiv\frac{1}{4}\sum_{r=1}^{2}\sum_{r'=1}^{2}\sum_{h=1}^{2}\sum_{h'=1}^{2}|\mathcal{M}_a|^2, \quad (23.214)$$

$$X_2\equiv\frac{1}{4}\sum_{r=1}^{2}\sum_{r'=1}^{2}\sum_{h=1}^{2}\sum_{h'=1}^{2}|\mathcal{M}_b|^2, \quad (23.215)$$

$$X_3\equiv\frac{1}{4}\sum_{r=1}^{2}\sum_{r'=1}^{2}\sum_{h=1}^{2}\sum_{h'=1}^{2}\mathcal{M}_a\mathcal{M}_b^{*}, \quad (23.216)$$

$$X_4\equiv\frac{1}{4}\sum_{r=1}^{2}\sum_{r'=1}^{2}\sum_{h=1}^{2}\sum_{h'=1}^{2}\mathcal{M}_a^{*}\mathcal{M}_b. \quad (23.217)$$

As the number X we want to compute, we have

$$X=X_1+X_2+X_3+X_4. \quad (23.218)$$

We calculate the four terms of (23.218) one by one. To compute $|\mathcal{M}_a|^2$ and $|\mathcal{M}_b|^2$, we follow (23.214) and (23.215), respectively. We think of $\mathcal{M}_a\mathcal{M}_b^{*}$ and $\mathcal{M}_a^{*}\mathcal{M}_b$ afterward. First, with $|\mathcal{M}_a|^2$ we have

$$\Gamma=\frac{\varepsilon_\nu(k',r')^{*}\gamma^\nu\big[(p_\mu+k_\mu)\gamma^\mu+m\big]\varepsilon_\lambda(k,r)\gamma^\lambda}{2pk}. \quad (23.219)$$

Then, from (23.206), we obtain

$$\tilde{\Gamma}=\gamma^0\,\frac{\big(\gamma^0\gamma^\rho\gamma^0\big)\varepsilon_\rho(k,r)^{*}\,\gamma^0\big[(p_\mu+k_\mu)\gamma^\mu+m\big]\gamma^0\,\big(\gamma^0\gamma^\sigma\gamma^0\big)\varepsilon_\sigma(k',r')}{2pk}\,\gamma^0=\frac{\gamma^\rho\varepsilon_\rho(k,r)^{*}\big[(p_\mu+k_\mu)\gamma^\mu+m\big]\varepsilon_\sigma(k',r')\gamma^\sigma}{2pk}. \quad (23.220)$$

Thereby, we get

$$\begin{aligned}X_1&=\frac{e^4}{4}\sum_{r,r',h,h'}\mathrm{Tr}\big[\Gamma u(p,h)\bar{u}(p,h)\tilde{\Gamma}u(p',h')\bar{u}(p',h')\big]\\&=\frac{e^4}{4}\sum_{r,r',h,h'}\mathrm{Tr}\bigg\{\frac{\varepsilon_\nu(k',r')^{*}\gamma^\nu\big[(p_\mu+k_\mu)\gamma^\mu+m\big]\varepsilon_\lambda(k,r)\gamma^\lambda}{2pk}\,u(p,h)\bar{u}(p,h)\,\frac{\gamma^\rho\varepsilon_\rho(k,r)^{*}\big[(p_\mu+k_\mu)\gamma^\mu+m\big]\varepsilon_\sigma(k',r')\gamma^\sigma}{2pk}\,u(p',h')\bar{u}(p',h')\bigg\}\\&\overset{[\dagger]}{=}\frac{e^4}{4}\sum_{r,r'}\mathrm{Tr}\bigg\{\frac{\varepsilon_\nu(k',r')^{*}\gamma^\nu\big[(p_\mu+k_\mu)\gamma^\mu+m\big]\varepsilon_\lambda(k,r)\gamma^\lambda(p_\alpha\gamma^\alpha+m)\gamma^\rho\varepsilon_\rho(k,r)^{*}\big[(p_\mu+k_\mu)\gamma^\mu+m\big]\varepsilon_\sigma(k',r')\gamma^\sigma\big(p'_\beta\gamma^\beta+m\big)}{(2pk)^2}\bigg\}\\&\overset{[\dagger\dagger]}{=}\frac{e^4}{4}\sum_{r,r'}\mathrm{Tr}\bigg\{\varepsilon_\nu(k',r')^{*}\varepsilon_\sigma(k',r')\,\varepsilon_\rho(k,r)^{*}\varepsilon_\lambda(k,r)\,\frac{\gamma^\nu\big[(p_\mu+k_\mu)\gamma^\mu+m\big]\gamma^\lambda(p_\alpha\gamma^\alpha+m)\gamma^\rho\big[(p_\mu+k_\mu)\gamma^\mu+m\big]\gamma^\sigma\big(p'_\beta\gamma^\beta+m\big)}{(2pk)^2}\bigg\}\\&\overset{[\dagger\dagger\dagger]}{=}\frac{e^4}{4}\,\mathrm{Tr}\bigg\{\frac{(-\eta_{\nu\sigma})\gamma^\nu\big[(p_\mu+k_\mu)\gamma^\mu+m\big](-\eta_{\rho\lambda})\gamma^\lambda(p_\alpha\gamma^\alpha+m)\gamma^\rho\big[(p_\mu+k_\mu)\gamma^\mu+m\big]\gamma^\sigma\big(p'_\beta\gamma^\beta+m\big)}{(2pk)^2}\bigg\}\\&=\frac{e^4}{4}\,\mathrm{Tr}\bigg\{\frac{(-\gamma_\sigma)\big[(p_\mu+k_\mu)\gamma^\mu+m\big](-\gamma_\rho)(p_\alpha\gamma^\alpha+m)\gamma^\rho\big[(p_\mu+k_\mu)\gamma^\mu+m\big]\gamma^\sigma\big(p'_\beta\gamma^\beta+m\big)}{(2pk)^2}\bigg\}\\&=\frac{e^4}{4}\,\mathrm{Tr}\bigg\{\frac{\gamma_\sigma\big[(p_\mu+k_\mu)\gamma^\mu+m\big]\gamma_\rho(p_\alpha\gamma^\alpha+m)\gamma^\rho\big[(p_\mu+k_\mu)\gamma^\mu+m\big]\gamma^\sigma\big(p'_\beta\gamma^\beta+m\big)}{(2pk)^2}\bigg\}. \quad (23.221)\end{aligned}$$

At [†] of (23.221), we used

$$\sum_h u(p,h)\bar{u}(p,h)=p_\mu\gamma^\mu+m \quad\text{and}\quad \sum_{h'}u(p',h')\bar{u}(p',h')=p'_\beta\gamma^\beta+m. \quad (21.144)$$

At [††], we used the fact that $\varepsilon_\sigma(k',r')$ and $\varepsilon_\rho(k,r)^{*}$ can be freely moved in (23.221), because they are not operators but merely complex numbers. At [†††], see the discussion made just below, which is based on the fact that the Feynman amplitude $\mathcal{M}$ must be gauge invariant. Suppose that the vector potential $A^\mu(x)$ takes the form [1]

$$A^\mu(x)=\alpha\varepsilon^\mu(k,r)e^{\pm ikx},$$

where α is a constant. Also, suppose that in (22.293) we have [1]

$$\Lambda(x)=\tilde{\Lambda}(k)e^{\pm ikx}.$$

Then, from (22.293), $A^\mu(x)$ should be transformed such that

$$A'^\mu(x)=A^\mu(x)+\partial^\mu\Lambda(x)=A^\mu(x)\pm ik^\mu\tilde{\Lambda}(k)e^{\pm ikx}=\alpha\varepsilon^\mu(k,r)e^{\pm ikx}\pm ik^\mu\tilde{\Lambda}(k)e^{\pm ikx}=\alpha\big[\varepsilon^\mu(k,r)\pm ik^\mu\tilde{\Lambda}(k)/\alpha\big]e^{\pm ikx}. \quad (23.222)$$

ð23:222Þ

Therefore, corresponding to the gauge transformation of Aμ(x), εμ(k, r) is transformed such that [1] ~ ðkÞ=α: ε0μ ðk, r Þ = εμ ðk, r Þ ± ik μ Λ

ð23:223Þ

Taking complex conjugate of (23.223), we have ~ ðkÞ =α : ε0μ ðk, r Þ = εμ ðk, r Þ ∓ ik μ Λ

ð23:224Þ

Using (23.223) and (23.224), we get

$$\begin{aligned}\sum_{r=1}^{2}\varepsilon'_\mu(k,r)^{*}\varepsilon'_\nu(k,r)&=\sum_{r=1}^{2}\big[\varepsilon_\mu(k,r)^{*}\mp ik_\mu\tilde{\Lambda}(k)^{*}/\alpha^{*}\big]\big[\varepsilon_\nu(k,r)\pm ik_\nu\tilde{\Lambda}(k)/\alpha\big]\\&=\sum_{r=1}^{2}\varepsilon_\mu(k,r)^{*}\varepsilon_\nu(k,r)\\&\quad+\sum_{r=1}^{2}\Big[\pm ik_\nu\varepsilon_\mu(k,r)^{*}\tilde{\Lambda}(k)/\alpha\mp ik_\mu\varepsilon_\nu(k,r)\tilde{\Lambda}(k)^{*}/\alpha^{*}+k_\mu k_\nu\big|\tilde{\Lambda}(k)/\alpha\big|^2\Big]. \quad (23.225)\end{aligned}$$

Namely, the gauge transformation of the photon field produces the additional second term in (23.225). Let the (partial) Feynman amplitude $X_1$ including the factor $\sum_{r=1}^{2}\varepsilon'_\mu(k,r)^{*}\varepsilon'_\nu(k,r)$ be described by

$$X_1=\frac{e^4}{4}\,\mathrm{Tr}\sum_{r'=1}^{2}\bigg[\sum_{r=1}^{2}\varepsilon'_\mu(k,r)^{*}\varepsilon'_\nu(k,r)\bigg]\tilde{M}^{\mu\nu}=\frac{e^4}{4}\,\mathrm{Tr}\sum_{r'=1}^{2}\bigg\{\bigg[\sum_{r=1}^{2}\varepsilon_\mu(k,r)^{*}\varepsilon_\nu(k,r)\bigg]\tilde{M}^{\mu\nu}+\Xi(k_\mu,k_\nu)\tilde{M}^{\mu\nu}\bigg\},$$

where Ξ(k_μ, k_ν) represents the second term of (23.225). That is, we have

$$\Xi(k_\mu,k_\nu)\equiv\sum_{r=1}^{2}\Big[\pm ik_\nu\varepsilon_\mu(k,r)^{*}\tilde{\Lambda}(k)/\alpha\mp ik_\mu\varepsilon_\nu(k,r)\tilde{\Lambda}(k)^{*}/\alpha^{*}+k_\mu k_\nu\big|\tilde{\Lambda}(k)/\alpha\big|^2\Big]. \quad (23.226)$$

The factor $\tilde{M}^{\mu\nu}$ represents the remaining part of $X_1$. In the present case, the indices μ and ν of $\tilde{M}^{\mu\nu}$ indicate those of the gamma matrices γ^μ and γ^ν. Since $X_1$ is gauge invariant, $\Xi(k_\mu,k_\nu)\tilde{M}^{\mu\nu}$ must vanish. As Ξ(k_μ, k_ν) is linear with respect to both k_μ and k_ν, it follows that

$$k_\mu\tilde{M}^{\mu\nu}=k_\nu\tilde{M}^{\mu\nu}=0. \quad (23.227)$$

Meanwhile, we had

$$\sum_{r=1}^{2}\varepsilon_\mu(k,r)^{*}\varepsilon_\nu(k,r)=-\eta_{\mu\nu}-\frac{k_\mu k_\nu-(kn)\big(k_\mu n_\nu+n_\mu k_\nu\big)}{(kn)^2}. \quad (23.228)$$

Note that, as pointed out previously, (22.377) holds for the real photon field (i.e., an external photon), as in the present case. Then, we obtain

$$X_1=\frac{e^4}{4}\,\mathrm{Tr}\sum_{r'=1}^{2}\bigg[-\eta_{\mu\nu}-\frac{k_\mu k_\nu-(kn)\big(k_\mu n_\nu+n_\mu k_\nu\big)}{(kn)^2}\bigg]\tilde{M}^{\mu\nu}. \quad (23.229)$$

But, for the reason mentioned just above, we have

$$\frac{k_\mu k_\nu-(kn)\big(k_\mu n_\nu+n_\mu k_\nu\big)}{(kn)^2}\,\tilde{M}^{\mu\nu}=0, \quad (23.230)$$

with only the first term $-\eta_{\mu\nu}\tilde{M}^{\mu\nu}$ contributing to $X_1$. With regard to the other related factor $\sum_{r'=1}^{2}\varepsilon_\alpha(k',r')^{*}\varepsilon_\beta(k',r')$ of (23.221), the above argument holds as well. Thus, the calculation of $X_1$ comes down to the trace computation of (23.221). The remaining calculations are straightforward and follow the method of the literature [1]. We first calculate the numerator of (23.221). To this end, we define a (4, 4) matrix Y, included as the numerator of (23.221), as [1]

$$Y\equiv\gamma_\sigma\big[(p_\mu+k_\mu)\gamma^\mu+m\big]\gamma_\rho(p_\alpha\gamma^\alpha+m)\gamma^\rho\big[(p_\lambda+k_\lambda)\gamma^\lambda+m\big]\gamma^\sigma$$

[use $\gamma_\rho(p_\alpha\gamma^\alpha)\gamma^\rho=-2(p_\alpha\gamma^\alpha)$ and $\gamma_\rho\gamma^\rho=4$]:

$$\begin{aligned}Y&=\gamma_\sigma\big[(p_\mu+k_\mu)\gamma^\mu+m\big](-2p_\alpha\gamma^\alpha+4m)\big[(p_\lambda+k_\lambda)\gamma^\lambda+m\big]\gamma^\sigma\\&=\gamma_\sigma\big\{-2\big[(p_\mu+k_\mu)\gamma^\mu\big](p_\alpha\gamma^\alpha)\big[(p_\lambda+k_\lambda)\gamma^\lambda\big]-2m\big[(p_\mu+k_\mu)\gamma^\mu\big](p_\alpha\gamma^\alpha)+4m(p_\mu+k_\mu)\big(p^\mu+k^\mu\big)\\&\qquad+4m^2(p_\mu+k_\mu)\gamma^\mu-2m(p_\alpha\gamma^\alpha)\big[(p_\lambda+k_\lambda)\gamma^\lambda\big]-2m^2(p_\alpha\gamma^\alpha)+4m^2(p_\lambda+k_\lambda)\gamma^\lambda+4m^3\big\}\gamma^\sigma\\&=4\big[(p_\lambda+k_\lambda)\gamma^\lambda\big](p_\alpha\gamma^\alpha)\big[(p_\mu+k_\mu)\gamma^\mu\big]-16m(p_\mu+k_\mu)p^\mu+16m(p_\mu+k_\mu)\big(p^\mu+k^\mu\big)\\&\qquad-16m^2(p_\mu+k_\mu)\gamma^\mu+4m^2(p_\alpha\gamma^\alpha)+16m^3. \quad (23.231)\end{aligned}$$

With the last equality of (23.231), we used the following formulae [1]:

$$\gamma_\lambda\gamma^\lambda=4E_4, \quad (21.185)$$

$$\gamma_\lambda(A_\sigma\gamma^\sigma)\gamma^\lambda=-2(A_\sigma\gamma^\sigma),\qquad\gamma_\lambda(A_\sigma\gamma^\sigma)(B_\sigma\gamma^\sigma)\gamma^\lambda=4A_\sigma B^\sigma, \quad (21.187)$$

$$\gamma_\lambda(A_\sigma\gamma^\sigma)(B_\sigma\gamma^\sigma)(C_\sigma\gamma^\sigma)\gamma^\lambda=-2(C_\sigma\gamma^\sigma)(B_\sigma\gamma^\sigma)(A_\sigma\gamma^\sigma). \quad (21.188)$$

In (21.187) and (21.188), A_α, B_α, and C_α are arbitrary four-vectors. The spin and polarization average and sum have already been taken in the process of deriving (23.218), an example of which is given as (23.221). Hence, what remains to be done is only the computation of the trace. To this end, we define another real number $\mathcal{Y}$ as [1]

$$\mathcal{Y}\equiv\mathrm{Tr}\big[Y\big(p'_\beta\gamma^\beta+m\big)\big]. \quad (23.232)$$

Then, from (23.221), we have

$$X_1=e^4\mathcal{Y}/\big[16(pk)^2\big]. \quad (23.233)$$

With $\mathcal{Y}$, we obtain

$$\begin{aligned}\mathcal{Y}=16\big\{&2\big[(p_\mu+k_\mu)p^\mu\big]\big[(p_\rho+k_\rho)p'^\rho\big]-(p_\mu+k_\mu)\big(p^\mu+k^\mu\big)\,p_\rho p'^\rho\\&+m^2\big[-4p_\mu\big(p^\mu+k^\mu\big)+4(p_\mu+k_\mu)\big(p^\mu+k^\mu\big)\big]+m^2p_\rho p'^\rho-4m^2(p_\mu+k_\mu)p'^\mu+4m^4\big\}. \quad (23.234)\end{aligned}$$

In deriving (23.234), we used the following formulae [1]:

ð23:234Þ

$$\mathrm{Tr}\big[(A_\sigma\gamma^\sigma)(B_\sigma\gamma^\sigma)\big]=4A_\sigma B^\sigma, \quad (21.189)$$

$$\mathrm{Tr}\big[(A_\sigma\gamma^\sigma)(B_\sigma\gamma^\sigma)(C_\sigma\gamma^\sigma)(D_\sigma\gamma^\sigma)\big]=4\big[(A_\sigma B^\sigma)(C_\sigma D^\sigma)-(A_\sigma C^\sigma)(B_\sigma D^\sigma)+(A_\sigma D^\sigma)(B_\sigma C^\sigma)\big]. \quad (21.190)$$

Also, we used the fact that the trace of a product of an odd number of gamma matrices vanishes [1]. The calculations of (23.234) are straightforward using three independent scalars, i.e., m², pk, and pk′; here notice that pk, pk′, etc. are the abbreviations

$$pk\equiv p_\mu k^\mu,\quad pk'\equiv p_\mu k'^\mu,\ \text{etc.} \quad (23.235)$$

We note, for instance,

$$(p+k)p=p^2+kp=m^2+kp. \quad (23.236)$$

The conservation of four-momentum is expressed as

$$p+k=p'+k'. \quad (23.237)$$

Taking the square of both sides of (23.237), we have

$$p^2+2kp+k^2=p'^2+2k'p'+k'^2. \quad (23.238)$$

Using p² = p′² = m² and k² = k′² = 0, we get

$$kp=k'p'. \quad (23.239)$$

From (23.237), we obtain another relation

$$pk'=p'k. \quad (23.240)$$

Rewriting (23.237), in turn, we obtain the relationship

$$p-p'=k'-k. \quad (23.241)$$

Taking the square of both sides of (23.241) again and using k′k = pk − pk′, we get

$$pp'=m^2+pk-pk'. \quad (23.242)$$

Using the above relationships among the scalars, we finally get [1]

$$\mathcal{Y}=32\big[m^4+m^2(pk)+(pk)(pk')\big]. \quad (23.243)$$

Substituting (23.243) for (23.233), we obtain

$$X_1=\frac{2e^4\big[m^4+m^2(pk)+(pk)(pk')\big]}{(pk)^2}. \quad (23.244)$$
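The scalar relations (23.239)-(23.242) and the equivalence of the long form (23.234) with the closed form (23.243) can be verified numerically. The sketch below (ours, not the book's) builds an exactly momentum-conserving Compton configuration in the lab frame, using the relation m(ω − ω′) = ωω′(1 − cos θ), which appears later as (23.266); the numerical values are arbitrary illustrations in natural units:

```python
import math

def dot(a, b):
    # Minkowski inner product with metric signature (+, -, -, -)
    return a[0]*b[0] - a[1]*b[1] - a[2]*b[2] - a[3]*b[3]

m, omega, theta = 1.0, 0.7, 1.2                        # illustrative values
omega_p = m*omega / (m + omega*(1 - math.cos(theta)))  # scattered-photon energy, cf. (23.266)

p  = (m, 0.0, 0.0, 0.0)                                # initial electron at rest
k  = (omega, 0.0, 0.0, omega)                          # incident photon
kp = (omega_p, omega_p*math.sin(theta), 0.0, omega_p*math.cos(theta))
pp = tuple(p[i] + k[i] - kp[i] for i in range(4))      # final electron, from (23.237)

a, b = dot(p, k), dot(p, kp)                           # the scalars pk and pk'
s = tuple(p[i] + k[i] for i in range(4))               # the combination p + k

print(math.isclose(dot(pp, pp), m**2))                 # p'^2 = m^2 (on-shell)
print(math.isclose(dot(k, p), dot(kp, pp)))            # kp = k'p', (23.239)
print(math.isclose(dot(p, kp), dot(pp, k)))            # pk' = p'k, (23.240)
print(math.isclose(dot(p, pp), m**2 + a - b))          # pp' = m^2 + pk - pk', (23.242)

# Long form (23.234) versus closed form (23.243):
Y_long = 16*(2*dot(s, p)*dot(s, pp) - dot(s, s)*dot(p, pp)
             + m**2*(-4*dot(p, s) + 4*dot(s, s))
             + m**2*dot(p, pp) - 4*m**2*dot(s, pp) + 4*m**4)
Y_closed = 32*(m**4 + m**2*a + a*b)
print(math.isclose(Y_long, Y_closed))
```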

Next, we compute $X_2$ of (23.215). From (23.172), with respect to the c-numbers of the photon field, we have

$$|\mathcal{M}_a|^2\propto\varepsilon_\nu(k',r')^{*}\varepsilon_\nu(k',r')\,\tilde{S}_F(p+k)^{*}\tilde{S}_F(p+k)\,\varepsilon_\mu(k,r)\varepsilon_\mu(k,r)^{*}.$$

Since $X_1$ is obtained as a result of averaging over the indices r and r′, $X_1$ is unaltered by exchanging the indices r and r′. Hence, we obtain

$$X_1\propto\varepsilon_\nu(k',r)^{*}\varepsilon_\nu(k',r)\,\tilde{S}_F(p+k)^{*}\tilde{S}_F(p+k)\,\varepsilon_\mu(k,r')\varepsilon_\mu(k,r')^{*}=\varepsilon_\nu(-k',r)^{*}\varepsilon_\nu(-k',r)\,\tilde{S}_F(p+k)^{*}\tilde{S}_F(p+k)\,\varepsilon_\mu(-k,r')\varepsilon_\mu(-k,r')^{*}.$$

Since in the above equation we are thinking of the transverse modes of polarization, those modes are held unchanged with respect to the exchanges k ↔ −k and k′ ↔ −k′, as shown in the above equation. Meanwhile, from (23.176), we have

$$X_2\propto|\mathcal{M}_b|^2\propto\varepsilon_\nu(k,r)\varepsilon_\nu(k,r)^{*}\,\tilde{S}_F(p-k')^{*}\tilde{S}_F(p-k')\,\varepsilon_\mu(k',r')^{*}\varepsilon_\mu(k',r').$$

Comparing the above expressions of $X_1$ and $X_2$, we notice that $X_2$ can be obtained merely by switching the four-momenta k ↔ −k′. Then, changing k ↔ −k′ in (23.244), we immediately get

$$X_2=\frac{2e^4\big[m^4-m^2(pk')+(pk)(pk')\big]}{(pk')^2}. \quad (23.245)$$

To calculate $X_3$, we need techniques related to, but slightly different from, those used for the calculation of $X_1$. The outline is as follows. Similarly to the case of (23.221), we arrive at the following relation:

$$X_3=\frac{e^4}{4}\,\mathrm{Tr}\bigg\{\frac{\gamma_\sigma\big[(p_\rho+k_\rho)\gamma^\rho+m\big]\gamma^\mu(p_\alpha\gamma^\alpha+m)\gamma^\sigma\big[(p_\rho-k'_\rho)\gamma^\rho+m\big]\gamma_\mu\big(p'_\beta\gamma^\beta+m\big)}{(2pk)(-2pk')}\bigg\}. \quad (23.246)$$

Decomposing the numerator of (23.246), we have eight non-vanishing terms that comprise gamma matrices and four-vectors as well as scalars (i.e., the electron mass m). Of these terms, m is included as either a factor of m² or m⁴. Classifying the eight terms according to the power of m, we get

$$m^4:\ \text{one term},\qquad m^2:\ \text{six terms},\qquad m^0:\ \text{one term}. \quad (23.247)$$

We evaluate (23.247) termwise below.

(i) m⁴: the term is $m^4\gamma_\sigma\gamma_\mu\gamma^\sigma\gamma^\mu$:

$$m^4\gamma_\sigma\gamma_\mu\gamma^\sigma\gamma^\mu=m^4(\gamma_\sigma\gamma_\mu\gamma^\sigma)\gamma^\mu=m^4(-2\gamma_\mu)\gamma^\mu=-2m^4\gamma_\mu\gamma^\mu=-8m^4E_4, \quad (23.248)$$

where with the second and last equalities we used (21.187) and (21.185), respectively; E₄ denotes the (4, 4) identity matrix. Taking the trace of (23.248), we have

$$\mathrm{Tr}\big(-8m^4E_4\big)=-32m^4. \quad (23.249)$$

Notice that the trace of E₄ is 4.

(ii) m²: six terms, one of which is, e.g., $m^2\gamma_\sigma\big[(p_\rho+k_\rho)\gamma^\rho\big]\gamma_\mu(p_\alpha\gamma^\alpha)\gamma^\sigma\gamma^\mu$:

$$m^2\gamma_\sigma\big[(p_\rho+k_\rho)\gamma^\rho\big]\gamma_\mu(p_\alpha\gamma^\alpha)\gamma^\sigma\gamma^\mu=m^2\gamma_\sigma\big[(p_\rho+k_\rho)\gamma^\rho\big]p_\alpha\big(\gamma_\mu\gamma^\alpha\gamma^\sigma\gamma^\mu\big)=m^2\gamma_\sigma\big[(p_\rho+k_\rho)\gamma^\rho\big]p_\alpha\big(4\eta^{\alpha\sigma}\big)=m^2\gamma_\sigma\big[(p_\rho+k_\rho)\gamma^\rho\big]\big(4p^\sigma\big)=4m^2\big(\gamma_\sigma p^\sigma\big)\big[(p_\rho+k_\rho)\gamma^\rho\big], \quad (23.250)$$

where with the second equality we used the first equation of (21.186). Taking the trace of (23.250), we obtain

$$\mathrm{Tr}\big\{4m^2\big(\gamma_\sigma p^\sigma\big)\big[(p_\rho+k_\rho)\gamma^\rho\big]\big\}=16m^2p(p+k)=16m^2\big[m^2+(pk)\big], \quad (23.251)$$

where with the first equality we used (21.189). The other five terms of the second power of m are calculated in a similar manner. Summing up the traces of these six matrices, we get

$$96m^4+48m^2(pk)-48m^2(pk'). \quad (23.252)$$

(iii) m⁰: the term is $\gamma_\sigma\big[(p_\rho+k_\rho)\gamma^\rho\big]\gamma_\mu(p_\alpha\gamma^\alpha)\gamma^\sigma\big[(p_\lambda-k'_\lambda)\gamma^\lambda\big]\gamma^\mu\big(p'_\beta\gamma^\beta\big)$:

$$\begin{aligned}\gamma_\sigma\big[(p_\rho+k_\rho)\gamma^\rho\big]\gamma_\mu(p_\alpha\gamma^\alpha)\gamma^\sigma\big[(p_\lambda-k'_\lambda)\gamma^\lambda\big]\gamma^\mu\big(p'_\beta\gamma^\beta\big)&=-2(p_\alpha\gamma^\alpha)\gamma_\mu\big[(p_\rho+k_\rho)\gamma^\rho\big]\big[(p_\lambda-k'_\lambda)\gamma^\lambda\big]\gamma^\mu\big(p'_\beta\gamma^\beta\big)\\&=-2(p_\alpha\gamma^\alpha)\Big\{\gamma_\mu\big[(p_\rho+k_\rho)\gamma^\rho\big]\big[(p_\lambda-k'_\lambda)\gamma^\lambda\big]\gamma^\mu\Big\}\big(p'_\beta\gamma^\beta\big)\\&=-2(p_\alpha\gamma^\alpha)\cdot4(p+k)(p-k')\cdot\big(p'_\beta\gamma^\beta\big)\\&=-8(p+k)(p-k')(p_\alpha\gamma^\alpha)\big(p'_\beta\gamma^\beta\big), \quad (23.253)\end{aligned}$$

where with the first equality we used (21.188) with B_σ ≡ 1 [or B ≡ (1 1 1 1), where B denotes a covariant four-vector]; with the second-to-last equality, we used the second equation of (21.187). Taking the trace of (23.253), we have

$$\mathrm{Tr}\big[-8(p+k)(p-k')(p_\alpha\gamma^\alpha)\big(p'_\beta\gamma^\beta\big)\big]=-32(p+k)(p-k')\,pp'=-32m^2\big[m^2+(pk)-(pk')\big], \quad (23.254)$$

where with the first equality we used (21.189); the last equality results from (23.237), (23.239), and (23.242).

where with the first equality, we used (21.189); the last equality resulted from (23.237), (23.239), and (23.242). Summing (23.249), (23.252), and (23.254), finally we get X3 = -

e4 32m4 þ 16m2 ðpk Þ - 16m2 ðpk 0 Þ =  4 ð2pk Þð- 2pk 0 Þ e4 ½2m4 þ m2 ðpk Þ - m2 ðpk 0 Þ : ðpk Þðpk 0 Þ

ð23:255Þ

From (23.216) and (23.217), at once we see that X4 is obtained as a complex conjugate of X3 . Also, we realize that (23.255) remains unchanged with respect to the exchange of k with -k′. Since X3 is real, we obtain X3 = X4 = -

e4 ½2m4 þ m2 ðpk Þ - m2 ðpk 0 Þ : ðpk Þðpk 0 Þ

ð23:256Þ

Summing up all the above results and substituting them for (23.218), we get X of (23.213) as

$$\begin{aligned}X&=X_1+X_2+X_3+X_4=\frac{1}{4}\sum_{r=1}^{2}\sum_{r'=1}^{2}\sum_{h=1}^{2}\sum_{h'=1}^{2}|\mathcal{M}|^2\\&=\frac{2e^4\big[m^4+m^2(pk)+(pk)(pk')\big]}{(pk)^2}+\frac{2e^4\big[m^4-m^2(pk')+(pk)(pk')\big]}{(pk')^2}-2\times\frac{e^4\big[2m^4+m^2(pk)-m^2(pk')\big]}{(pk)(pk')}\\&=2e^4\bigg[m^4\bigg(\frac{1}{pk}-\frac{1}{pk'}\bigg)^2+2m^2\bigg(\frac{1}{pk}-\frac{1}{pk'}\bigg)+\bigg(\frac{pk'}{pk}+\frac{pk}{pk'}\bigg)\bigg]. \quad (23.257)\end{aligned}$$

Once again, notice that (23.257) is unchanged under the exchange between k and −k′.
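The combination of the three partial sums into the closed form of (23.257) can be verified by direct arithmetic. A minimal sketch (ours, not the book's; the values of m, pk, and pk′ are arbitrary positive numbers, and the common factor e⁴ is divided out):

```python
import math

m, pk, pkp = 1.0, 0.6, 0.35   # arbitrary illustrative scalars m, pk, pk'

X1  = 2*(m**4 + m**2*pk + pk*pkp) / pk**2           # (23.244), in units of e^4
X2  = 2*(m**4 - m**2*pkp + pk*pkp) / pkp**2         # (23.245)
X34 = -2*(2*m**4 + m**2*pk - m**2*pkp) / (pk*pkp)   # X3 + X4, from (23.256)

closed = 2*(m**4*(1/pk - 1/pkp)**2
            + 2*m**2*(1/pk - 1/pkp)
            + pkp/pk + pk/pkp)                      # closed form (23.257)

print(math.isclose(X1 + X2 + X34, closed, rel_tol=1e-12))  # True
```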

23.8.6 Experimental Observations

Our next task is to connect (23.257) to the experimental results associated with the scattering cross-section that has been expressed as (23.200). In the present situation, we assume a scattering described by (23.178) as

$$P_1+P_2\longrightarrow P'_1+P'_2. \quad (23.258)$$

It is the case where N = 2 in (23.178). In relation to an evaluation of the cross-section of the Compton scattering, we frequently encounter an actual case where the electrons and/or photons are unpolarized, or their polarization is not detected in their initial or final states (i.e., before and after the collision experiments). In this section, we deal with such a case in as general a manner as possible. We denote the momentum four-vectors by

$$p\equiv\begin{pmatrix}E\\ \mathbf{p}\end{pmatrix},\quad p'\equiv\begin{pmatrix}E'\\ \mathbf{p}'\end{pmatrix},\quad k\equiv\begin{pmatrix}\omega\\ \mathbf{k}\end{pmatrix},\quad k'\equiv\begin{pmatrix}\omega'\\ \mathbf{k}'\end{pmatrix}, \quad (23.259)$$

where p and k represent the initial states of the electron and photon fields, respectively; p′ and k′ represent their final states. To use (23.200), we express the quantities of (23.259) in the laboratory frame, where a photon beam is incident on the targeted electrons that are nearly stationary. In practice, this is a good approximation. Thereby, p of (23.259) is approximated as

$$p\approx\begin{pmatrix}m\\ \mathbf{0}\end{pmatrix}.$$

Then, the momentum conservation described by p + k = p′ + k′ is translated into

$$\mathbf{p}'=\mathbf{k}-\mathbf{k}'. \quad (23.260)$$

[Fig. 23.6: Geometry of the photon field. For the meaning of the notation, see text.]

The energy conservation described by E + ω = E′ + ω′ is approximated as

$$m+\omega\approx E'+\omega'. \quad (23.261)$$

Meanwhile, the relation

$$k'k=pk-pk', \quad (23.262)$$

directly derived from the four-momentum conservation, is approximated by

$$k'k\approx m\omega-m\omega'=m(\omega-\omega'). \quad (23.263)$$

With the photon field, we have

$$\omega^2=\mathbf{k}^2,\quad\omega'^2=\mathbf{k}'^2,\quad\omega=|\mathbf{k}|,\quad\omega'=|\mathbf{k}'|. \quad (23.264)$$

Figure 23.6 shows the geometry of the photon field. We assume that the photons are scattered with axial symmetry about the incident photon momentum k, as a function of the scattering angle θ. Notice that Fig. 23.6 conforms to Fig. 1.1. In this geometry, we have

$$k'k=\omega'\omega-\mathbf{k}'\cdot\mathbf{k}=\omega'\omega-\omega'\omega\cos\theta=\omega'\omega(1-\cos\theta). \quad (23.265)$$

Combining (23.263) and (23.265), we obtain

$$m(\omega-\omega')=\omega\omega'(1-\cos\theta). \quad (23.266)$$

This is consistent with (1.16), if (1.16) is written in the natural units. In (23.258), we may freely choose P′₁ (or P′₂) from among the electron and photon. For a practical purpose, however, we choose the photon for P′₁ (i.e., the scattered photon) in (23.258). That is, since we directly observe the photon scattering in the geometry of the laboratory frame, it is convenient to measure the energy and momentum of the scattered photon as P′₁ in the final state. Then, P′₂ is the electron accordingly. To apply (23.200) to the experimental results, we need to express the electron energy E′₂ in terms of the final photon energy ω′. We have

$$E'^2=\mathbf{p}'^2+m^2=(\mathbf{k}-\mathbf{k}')^2+m^2, \quad (23.267)$$

where with the last equality we used (23.260). Applying the cosine rule to Fig. 23.6, we have

$$(\mathbf{k}-\mathbf{k}')^2=|\mathbf{k}|^2+|\mathbf{k}'|^2-2|\mathbf{k}||\mathbf{k}'|\cos\theta=\omega^2+\omega'^2-2\omega\omega'\cos\theta. \quad (23.268)$$

Then, we obtain

$$E'=\sqrt{\omega^2+\omega'^2-2\omega\omega'\cos\theta+m^2}. \quad (23.269)$$

Differentiating (23.269) with respect to ω′, we get

$$\frac{\partial E'}{\partial\omega'}=\frac{2\omega'-2\omega\cos\theta}{2\sqrt{\omega^2+\omega'^2-2\omega\omega'\cos\theta+m^2}}=\frac{\omega'-\omega\cos\theta}{E'}. \quad (23.270)$$

Accordingly, we obtain

$$\frac{\partial(E'+\omega')}{\partial\omega'}=\frac{\omega'-\omega\cos\theta}{\sqrt{\omega^2+\omega'^2-2\omega\omega'\cos\theta+m^2}}+1=\frac{\omega'-\omega\cos\theta+E'}{E'}=\frac{\omega(1-\cos\theta)+m}{E'}, \quad (23.271)$$

where with the last equality we used (23.261). Using (23.266), furthermore, we get

$$\frac{\partial(E'+\omega')}{\partial\omega'}=\frac{m\omega}{E'\omega'}. \quad (23.272)$$
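The compact derivative (23.272) can be checked against a finite difference of (23.269). In the sketch below (ours, not the book's; illustrative values, natural units), ω′ is fixed at its on-shell value from (23.266), where the energy-conservation step used in (23.271) is valid:

```python
import math

m, omega, theta = 1.0, 0.8, 1.1
omega_p = m*omega / (m + omega*(1 - math.cos(theta)))  # on-shell ω' solving (23.266)

def E_prime(w):
    # Scattered-electron energy as a function of ω', (23.269)
    return math.sqrt(omega**2 + w**2 - 2*omega*w*math.cos(theta) + m**2)

h = 1e-6
numeric = ((E_prime(omega_p + h) + (omega_p + h))
           - (E_prime(omega_p - h) + (omega_p - h))) / (2*h)   # d(E' + ω')/dω'
closed = m*omega / (E_prime(omega_p)*omega_p)                  # (23.272)

print(abs(numeric - closed) < 1e-5)  # True
```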

Now, (23.200) is read as

$$d\tilde{\sigma}=\frac{\omega'^2\,X}{64\pi^2m\omega\omega'E'v_{\mathrm{rel}}}\left[\frac{\partial(\omega'+E')}{\partial\omega'}\right]^{-1}d\Omega'. \quad (23.273)$$

Substituting (23.189), (23.257), and (23.272) for (23.273), we get

$$\frac{d\tilde{\sigma}}{d\Omega'}=\frac{\omega'^2}{64\pi^2m\omega\omega'E'v_{\mathrm{rel}}}\cdot\frac{E'\omega'}{m\omega}\,X=\frac{2e^4\omega'^2}{64\pi^2m^2\omega^2}\bigg[m^4\bigg(\frac{1}{pk}-\frac{1}{pk'}\bigg)^2+2m^2\bigg(\frac{1}{pk}-\frac{1}{pk'}\bigg)+\bigg(\frac{pk'}{pk}+\frac{pk}{pk'}\bigg)\bigg]. \quad (23.274)$$

The energy of the scattered electron has naturally been eliminated. As mentioned earlier, in (23.274) we assumed v_rel = 1 as a good approximation. In the case where the electron is initially at rest, we can use the approximate relation (23.266) as well as pk = mω and pk′ = mω′. Under these conditions, we obtain the succinct relation

$$\frac{d\tilde{\sigma}}{d\Omega'}=\frac{\alpha^2}{2m^2}\Big(\frac{\omega'}{\omega}\Big)^2\bigg(\frac{\omega}{\omega'}+\frac{\omega'}{\omega}-\sin^2\theta\bigg), \quad (23.275)$$

where α is the fine-structure constant given in (23.87). Moreover, using (23.266), ω′ can be eliminated from (23.275) [1]. Polarized measurements of the photon have also been thoroughly pursued, and the results are well known as the Klein-Nishina formula. The details can be found in the literature [1, 3].
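As a sanity check of (23.275), its integral over the full solid angle must approach the Thomson cross-section σ_T = 8πα²/3m² in the soft-photon limit ω ≪ m. A minimal numerical sketch (ours, not the book's; simple midpoint-rule integration, natural units):

```python
import math

alpha, m = 1/137.035999, 1.0   # fine-structure constant; electron mass in natural units

def dsigma_dOmega(omega, theta):
    # Unpolarized differential cross-section (23.275), with ω' from (23.266)
    wp = m*omega / (m + omega*(1 - math.cos(theta)))
    return (alpha**2/(2*m**2)) * (wp/omega)**2 * (omega/wp + wp/omega - math.sin(theta)**2)

def total_sigma(omega, n=2000):
    # Integrate over the solid angle, exploiting azimuthal symmetry
    total, dtheta = 0.0, math.pi/n
    for i in range(n):
        th = (i + 0.5)*dtheta
        total += dsigma_dOmega(omega, th) * 2*math.pi*math.sin(th) * dtheta
    return total

sigma_T = 8*math.pi*alpha**2/(3*m**2)   # Thomson cross-section
print(abs(total_sigma(1e-4)/sigma_T - 1) < 1e-3)  # True in the soft-photon limit
```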

23.9 Summary

In this chapter, we have described the features of the interaction among quantum fields, especially within the framework of QED. As a typical example, we detailed the calculation procedures for the Compton scattering. It is intriguing to point out that QED adequately explains the various aspects of the Compton scattering, whose experiments precisely opened up a route to QED. In this chapter, we focused on the S-matrix expansion of the second order. Yet, the essence of QED clearly showed up. Namely, we studied how the second-order S-matrix elements $S^{(2)}_k$ (k = A, ⋯, F) reflect major processes of QED, including the Compton scattering. We have a useful index for judging whether the integrals relevant to the S-matrix diverge. The index K is given by [1]

$$K=4-(3/2)f_e-b_e, \quad (23.276)$$

where f_e and b_e are the numbers of external fermion (i.e., electron) lines and external boson (photon) lines in the Feynman diagram, respectively. K ≥ 0 is a necessary condition for the integral to diverge. In the case of the Compton scattering, f_e = b_e = 2, and we have K = −1. The integrals converge accordingly. We have confirmed that this is really the case. In the cases of the electron self-energy (K = 1) and photon self-energy (K = 2), the second-order S-matrix integrals diverge [1].
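The index (23.276) is trivial to evaluate; a small sketch (the function name is ours):

```python
def K_index(f_e, b_e):
    # Divergence index K = 4 - (3/2) f_e - b_e of (23.276)
    return 4 - (3/2)*f_e - b_e

print(K_index(2, 2))  # Compton scattering: -1.0 (integrals converge)
print(K_index(2, 0))  # electron self-energy: 1.0 (diverges)
print(K_index(0, 2))  # photon self-energy: 2.0 (diverges)
```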

Even though we did not go into detail on those issues, especially the renormalization, we have gained a glimpse into the essence of QED, which offers prolific aspects of elementary-particle physics.

References

1. Mandl F, Shaw G (2010) Quantum field theory, 2nd edn. Wiley, Chichester
2. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
3. Itzykson C, Zuber J-B (2005) Quantum field theory. Dover, New York
4. Kaku M (1993) Quantum field theory. Oxford University Press, New York
5. Peskin ME, Schroeder DV (1995) An introduction to quantum field theory. Westview Press, Boulder
6. Møller C (1952) The theory of relativity. Oxford University Press, London
7. Klauber RD (2013) Student friendly quantum field theory, 2nd edn. Sandtrove Press, Fairfield
8. Sakamoto M (2020) Quantum field theory II. Shokabo, Tokyo (in Japanese)
9. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York

Chapter 24 Basic Formalism

We have studied several important aspects of the quantum theory of fields and found that the theory is based in essence upon the theory of relativity. In Chaps. 1 and 21, we briefly surveyed its essence. In this chapter, we explore the basic formalism and fundamental skeleton of the theory more systematically, in depth and in breadth. First, we examine the constitution of the Lorentz group from the point of view of Lie algebras. The Lorentz group comprises two components, one of which is virtually the same as the rotation group that we examined in Part IV. The other component is called the Lorentz boost, which differs considerably from the former. In particular, whereas the matrices associated with the spatial rotation are unitary, the matrices that represent the Lorentz boost are not. In this chapter, we connect these aspects of the Lorentz group to the matrix representation of the Dirac equation to clarify the constitution of the Dirac equation. Another point of great importance rests on the properties of the Dirac operators G and $\tilde{G}$ that describe the Dirac equation (Sect. 21.3). We examine their properties from the point of view of matrix algebra. In particular, we study how G and $\tilde{G}$ can be diagonalized through a similarity transformation. The remaining parts of this chapter deal with advanced topics of the Lorentz group from the point of view of continuous groups and their Lie algebras.

24.1 Extended Concepts of Vector Spaces

In Part III, we studied the properties and characteristics of linear vector spaces. We have seen that, among the vector spaces, the inner product space and its properties are most widely used and investigated in various areas of natural science, accompanied by the bra-ket notation ⟨a| b⟩ of the inner product due to Dirac (see Chaps. 1 and 13). The inner product space is characterized by the positive definite metric associated with the Gram matrix (Sect. 13.2). In this section, we wish to modify the concept of metric so that it can be applied to wider vector spaces, including the Minkowski space (Chap. 21). In parallel, we formally introduce tensors and related concepts as a typical example of the extended vector spaces.

© Springer Nature Singapore Pte Ltd. 2023
S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_24

24.1.1 Bilinear Mapping

First, we introduce a direct product (vector) space formed by the product of two (or more) vector spaces. In Part IV, we introduced the notion of a direct product in relation to groups and their representations. In this chapter, we extend this notion to linear vector spaces. We also study the properties of the bilinear mapping that maps the direct product space to another vector space. First, we have the following definition:

Definition 24.1 Let B be a mapping described by

$$B:\ W\times V\longrightarrow T, \quad (24.1)$$

where V, W, and T are vector spaces over K; W × V means the direct product of V and W. Suppose that B(·, ·) satisfies the following conditions [1, 2]:

$$B(y+y',x)=B(y,x)+B(y',x),\quad B(y,x+x')=B(y,x)+B(y,x'),\quad B(ay,x)=B(y,ax)=aB(y,x)\quad(x,x'\in V;\ y,y'\in W;\ a\in K), \quad (24.2)$$

where K is a field, usually chosen as the complex number field ℂ or the real number field ℝ. Then, the mapping B is called a bilinear mapping.

The range of B, i.e., B(·, ·), is a subspace of T. This is obvious from the definition (24.2) and (11.15). Since in Definition 24.1 the dimensions of V and W can be different, we assume that the dimensions of V and W are n and m, respectively. If the direct product W × V is deemed to be a topological space (Sect. 6.1.2), it is called a direct product space. (However, we do not need to be too strict with the terminology.) Definition 24.1 does not tell us more about the detailed structure of B(·, ·), with the exception of the fact that B(·, ·) forms a subspace of T. To clarify this point, we set further conditions on the bilinear mapping B defined by (24.1) and (24.2). The conditions (B1)-(B3) are given as follows [1, 2]:

(B1) Let V and W be linear vector spaces with dim V = n and dim W = m. If x₁, ⋯, x_r ∈ V are linearly independent, then with y₁, ⋯, y_r ∈ W we have

$$\sum_{i=1}^{r}B(y_i,x_i)=0\ \Longrightarrow\ y_i=0\ (1\le i\le r), \quad (24.3)$$

where r ≤ n.

(B2) If y₁, ⋯, y_r ∈ W are linearly independent, then for x₁, ⋯, x_r ∈ V we have

$$\sum_{i=1}^{r}B(y_i,x_i)=0\ \Longrightarrow\ x_i=0\ (1\le i\le r), \quad (24.4)$$

where r ≤ m.

(B3) The vector space T is spanned by B(y, x) (x ∈ V, y ∈ W).

Regarding Definition 24.1, we have the following important theorem.

Theorem 24.1 [1, 2] Let B be a bilinear mapping described by B(·, ·) : W × V ⟶ T, where V, W, and T are vector spaces over K. Let {e_i; 1 ≤ i ≤ n} and {f_j; 1 ≤ j ≤ m} be the basis sets of V and W, respectively. Then, a necessary and sufficient condition for the statements (B1)-(B3) to hold is that B( f_j, e_i) (1 ≤ i ≤ n, 1 ≤ j ≤ m) form a basis set of T.

ð24:5Þ

From Definition 24.1 and from the assumption that ei (1 ≤ i ≤ n) and fj (1 ≤ j ≤ m) are the basis set of V and W, regarding 8x 2 V, 8y 2 W, B(y, x) is described by a linear combination of B( fj, ei). Since that linear combination may take any number (over K ) as a coefficient of B( fj, ei), the said B(y, x) spans T. It is the assertion of (B3). Suppose that with x(k) (2V ) and y(k) (2W ) we have xðkÞ = ðk Þ

ðk Þ

with ξi , ηj

ðk Þ ξ ei , yðkÞ i=1 i n

=

ðk Þ η fj j=1 j m

ð1 ≤ k ≤ r Þ,

ð24:6Þ

2 K. We assume that in (24.6) r ≤ max (m, n). Then, we have

r k=1

B yð k Þ , xð k Þ =

In (24.7) putting

r k = 1B

m

n

j=1

i=1

ðk Þ ðkÞ η ξi k=1 j r

B f j , ei :

ð24:7Þ

yðkÞ , xðkÞ = 0, we get ðk Þ ðk Þ η ξi k=1 j r

= 0:

ð24:8Þ

It is because B( fj, ei) (1 ≤ i ≤ n, 1 ≤ j ≤ m) are the basis set of T and, hence, linearly independent. Equation (24.8) holds with any pair of j and i. Therefore, with a fixed j, from (24.8) we obtain

$$\begin{aligned}\eta_j^{(1)}\xi_1^{(1)}+\eta_j^{(2)}\xi_1^{(2)}+\cdots+\eta_j^{(r)}\xi_1^{(r)}&=0,\\ \eta_j^{(1)}\xi_2^{(1)}+\eta_j^{(2)}\xi_2^{(2)}+\cdots+\eta_j^{(r)}\xi_2^{(r)}&=0,\\ &\ \,\vdots\\ \eta_j^{(1)}\xi_n^{(1)}+\eta_j^{(2)}\xi_n^{(2)}+\cdots+\eta_j^{(r)}\xi_n^{(r)}&=0. \quad (24.9)\end{aligned}$$

Rewriting (24.9) in the form of column vectors, we get

$$\eta_j^{(1)}\begin{pmatrix}\xi_1^{(1)}\\ \vdots\\ \xi_n^{(1)}\end{pmatrix}+\eta_j^{(2)}\begin{pmatrix}\xi_1^{(2)}\\ \vdots\\ \xi_n^{(2)}\end{pmatrix}+\cdots+\eta_j^{(r)}\begin{pmatrix}\xi_1^{(r)}\\ \vdots\\ \xi_n^{(r)}\end{pmatrix}=0. \quad (24.10)$$

If x^(1), ⋯, x^(r) ∈ V are linearly independent, the corresponding column vectors of (24.10) are linearly independent as well, with the basis set fixed. Consequently, we have

$$\eta_j^{(1)}=\eta_j^{(2)}=\cdots=\eta_j^{(r)}=0. \quad (24.11)$$

Since (24.11) holds for all j (1 ≤ j ≤ m), from (24.6) we obtain

$$y^{(1)}=y^{(2)}=\cdots=y^{(r)}=0.$$

The above argument ensures the assertion of (B1). In a similar manner, the statement (B2) is assured. Since B( f_j, e_i) (1 ≤ i ≤ n, 1 ≤ j ≤ m) form a basis set of T, (B3) is evidently assured.

(ii) Necessary condition: Suppose that the conditions (B1)-(B3) hold. From the argument on the sufficient condition, we have (24.5). Therefore, for B( f_j, e_i) (1 ≤ i ≤ n, 1 ≤ j ≤ m) to form a basis set of T, we only have to show that the B( f_j, e_i) are linearly independent. Suppose that we have

$$\sum_{i,j}\zeta_{ij}B(f_j,e_i)=0,\quad\zeta_{ij}\in K. \quad (24.12)$$

Putting $y_i=\sum_j\zeta_{ij}f_j$, from (24.12) we obtain

$$\sum_iB(y_i,e_i)=0. \quad (24.13)$$

Since {e_i; 1 ≤ i ≤ n} are linearly independent, from (B1) we get y_i = 0. Since {f_j; 1 ≤ j ≤ m} form a basis set of W, from $\sum_j\zeta_{ij}f_j=0$ we must have ζ_ij = 0 (1 ≤ i ≤ n, 1 ≤ j ≤ m). Thus, from (24.12), the B( f_j, e_i) are linearly independent. These complete the proof.


Earlier we mentioned that B(,) is a subspace of the vector space T. Practically, however, we may safely identify the range of B with the whole space T; in other words, we may as well regard the subspace B(,) as T itself. Thus, the dimension of T can be regarded as mn. Abbreviating B(f_j, e_i) as f_j ⊗ e_i and omitting the superscript k in (24.7), we have

B(y, x) = \sum_{j=1}^{m} \sum_{i=1}^{n} (f_j ⊗ e_i) \eta_j \xi_i.   (24.14)

Further rewriting (24.14), we obtain

B(y, x) = (f_1 ⊗ e_1 ⋯ f_1 ⊗ e_n  f_2 ⊗ e_1 ⋯ f_2 ⊗ e_n ⋯ f_m ⊗ e_1 ⋯ f_m ⊗ e_n) \begin{pmatrix} \eta_1 \xi_1 \\ \vdots \\ \eta_1 \xi_n \\ \eta_2 \xi_1 \\ \vdots \\ \eta_2 \xi_n \\ \vdots \\ \eta_m \xi_1 \\ \vdots \\ \eta_m \xi_n \end{pmatrix}.   (24.15)

Equations (24.14) and (24.15) clearly show that any element belonging to T can be expressed as the RHS of these equations. Namely, T is spanned by B(y, x) (x ∈ V, y ∈ W). This is what the statement (B3) represents. Equation (24.15) corresponds to (11.13); in (24.15), mn basis vectors form a row vector and mn "coordinates" form a corresponding column vector.

The next example will help us understand the implication of the statements (B1)-(B3) imposed on the bilinear mapping earlier in this section.

Example 24.1 Let V and W be vector spaces of dimension 3 and 2, respectively. Suppose that x_i ∈ V (i = 1, 2, 3) and y_j ∈ W (j = 1, 2, 3). Let x_i and y_j be given by

x_1 = (e_1 e_2 e_3) \begin{pmatrix} \xi_1 \\ \xi_2 \\ \xi_3 \end{pmatrix}, \quad x_2 = (e_1 e_2 e_3) \begin{pmatrix} \xi_1' \\ \xi_2' \\ \xi_3' \end{pmatrix}, \quad x_3 = (e_1 e_2 e_3) \begin{pmatrix} \xi_1'' \\ \xi_2'' \\ \xi_3'' \end{pmatrix};
y_1 = (f_1 f_2) \begin{pmatrix} \eta_1 \\ \eta_2 \end{pmatrix}, \quad y_2 = (f_1 f_2) \begin{pmatrix} \eta_1' \\ \eta_2' \end{pmatrix}, \quad y_3 = (f_1 f_2) \begin{pmatrix} \eta_1'' \\ \eta_2'' \end{pmatrix},   (24.16)

where (e_1 e_2 e_3) and (f_1 f_2) are basis sets of V and W, respectively; \xi_1, \eta_1, etc. ∈ K. For simplicity, we write

(e_1 e_2 e_3) ≡ e, \quad (f_1 f_2) ≡ f.   (24.17)

We define the bilinear mapping B(y, x) as, e.g.,

B(y_1, x_1) = (f × e) \left[ \begin{pmatrix} \eta_1 \\ \eta_2 \end{pmatrix} × \begin{pmatrix} \xi_1 \\ \xi_2 \\ \xi_3 \end{pmatrix} \right] ≡ (f × e) \begin{pmatrix} \eta_1 \xi_1 \\ \eta_1 \xi_2 \\ \eta_1 \xi_3 \\ \eta_2 \xi_1 \\ \eta_2 \xi_2 \\ \eta_2 \xi_3 \end{pmatrix},   (24.18)

where we have

(f × e) = (f_1 ⊗ e_1  f_1 ⊗ e_2  f_1 ⊗ e_3  f_2 ⊗ e_1  f_2 ⊗ e_2  f_2 ⊗ e_3).   (24.19)

The RHS of (24.19) is a row vector comprising the direct products f_j ⊗ e_k (j = 1, 2; k = 1, 2, 3). Fixing the basis sets of V and W, we represent the vectors as numerical vectors (or column vectors) [1, 2]. Then, the first equation of the bilinear mapping in (24.3) of (B1) can be read as

B(y_1, x_1) + B(y_2, x_2) + B(y_3, x_3) = (f × e) \begin{pmatrix} \eta_1 \xi_1 + \eta_1' \xi_1' + \eta_1'' \xi_1'' \\ \eta_1 \xi_2 + \eta_1' \xi_2' + \eta_1'' \xi_2'' \\ \eta_1 \xi_3 + \eta_1' \xi_3' + \eta_1'' \xi_3'' \\ \eta_2 \xi_1 + \eta_2' \xi_1' + \eta_2'' \xi_1'' \\ \eta_2 \xi_2 + \eta_2' \xi_2' + \eta_2'' \xi_2'' \\ \eta_2 \xi_3 + \eta_2' \xi_3' + \eta_2'' \xi_3'' \end{pmatrix} = 0.   (24.20)

Omitting the basis set (f × e), (24.20) is separated into two components such that

\eta_1 \begin{pmatrix} \xi_1 \\ \xi_2 \\ \xi_3 \end{pmatrix} + \eta_1' \begin{pmatrix} \xi_1' \\ \xi_2' \\ \xi_3' \end{pmatrix} + \eta_1'' \begin{pmatrix} \xi_1'' \\ \xi_2'' \\ \xi_3'' \end{pmatrix} = 0, \quad \eta_2 \begin{pmatrix} \xi_1 \\ \xi_2 \\ \xi_3 \end{pmatrix} + \eta_2' \begin{pmatrix} \xi_1' \\ \xi_2' \\ \xi_3' \end{pmatrix} + \eta_2'' \begin{pmatrix} \xi_1'' \\ \xi_2'' \\ \xi_3'' \end{pmatrix} = 0.   (24.21)

Assuming that x_1, x_2, and x_3 (∈V) are linearly independent, (24.21) implies that \eta_1 = \eta_1' = \eta_1'' = \eta_2 = \eta_2' = \eta_2'' = 0. Namely, from (24.16) we have

y_1 = y_2 = y_3 = 0.   (24.22)

This is an assertion of (B1). Meanwhile, think of the following equation:


B(y_1, x_1) + B(y_2, x_2) = (f × e) \begin{pmatrix} \eta_1 \xi_1 + \eta_1' \xi_1' \\ \eta_1 \xi_2 + \eta_1' \xi_2' \\ \eta_1 \xi_3 + \eta_1' \xi_3' \\ \eta_2 \xi_1 + \eta_2' \xi_1' \\ \eta_2 \xi_2 + \eta_2' \xi_2' \\ \eta_2 \xi_3 + \eta_2' \xi_3' \end{pmatrix} = 0.   (24.23)

This time, (24.23) can be separated into three components such that

\xi_1 \begin{pmatrix} \eta_1 \\ \eta_2 \end{pmatrix} + \xi_1' \begin{pmatrix} \eta_1' \\ \eta_2' \end{pmatrix} = 0, \quad \xi_2 \begin{pmatrix} \eta_1 \\ \eta_2 \end{pmatrix} + \xi_2' \begin{pmatrix} \eta_1' \\ \eta_2' \end{pmatrix} = 0, \quad \xi_3 \begin{pmatrix} \eta_1 \\ \eta_2 \end{pmatrix} + \xi_3' \begin{pmatrix} \eta_1' \\ \eta_2' \end{pmatrix} = 0.   (24.24)

Assuming that y_1 and y_2 (∈W) are linearly independent, (24.24) in turn implies that \xi_1 = \xi_1' = \xi_2 = \xi_2' = \xi_3 = \xi_3' = 0. Namely, from (24.16), we have

x_1 = x_2 = 0.   (24.25)

This is an assertion of (B2). The statement (B3) is self-evident; see (24.14) and (24.15).

In relation to Theorem 24.1, we give another important theorem.

Theorem 24.2 [1-3] Let V and W be vector spaces over K. Then, a vector space T (over K) and a bilinear mapping B : W × V → T that satisfy the above-mentioned conditions (B1), (B2), and (B3) must exist. Also, let T′ be an arbitrarily chosen vector space over K and let B′ be an arbitrary bilinear mapping expressed as

B′ : W × V → T′.   (24.26)

Then, a linear mapping ρ : T → T′ that satisfies the condition described by

B′(y, x) = ρ[B(y, x)], \quad x ∈ V, y ∈ W   (24.27)

is uniquely determined.

Proof (i) Existence: Suppose that T is an nm-dimensional vector space with the basis set B(f_j, e_i) (1 ≤ i ≤ n, 1 ≤ j ≤ m), where e_i (1 ≤ i ≤ n) and f_j (1 ≤ j ≤ m) are the basis sets of V and W, respectively. We assume that V and W are n- and m-dimensional vector spaces, respectively. Let x ∈ V and y ∈ W be expressed as

x = \sum_{i=1}^{n} \xi_i e_i, \quad y = \sum_{j=1}^{m} \eta_j f_j.   (24.28)

We define the bilinear mapping B(y, x) as

B(y, x) = \sum_{j=1}^{m} \sum_{i=1}^{n} \eta_j \xi_i e_{ji}   (24.29)

with e_{ji} ≡ B(f_j, e_i), so that e_{ji} ∈ T (1 ≤ i ≤ n, 1 ≤ j ≤ m) form the basis set of T. Then, we have B : W × V → T. Moreover, the set (T, B) satisfies the conditions (B1)-(B3) according to Theorem 24.1. Thus, the bilinear mapping B required by the statement of the theorem necessarily exists.

(ii) Uniqueness: Suppose that an arbitrary bilinear mapping B′ : W × V → T′ is given. Given such B′ and T′, we define a linear mapping ρ : T → T′ that is described by

ρ\left( \sum_{i,j} \zeta_{ji} e_{ji} \right) = \sum_{i,j} \zeta_{ji} B′(f_j, e_i),   (24.30)

where \zeta_{ji} is an arbitrary number over K. Choosing \zeta_{ji} = 1 when j = k, i = l, and otherwise \zeta_{ji} = 0, we have

ρ(e_{kl}) = B′(f_k, e_l).   (24.31)

Meanwhile, we have shown the existence of the bilinear mapping B(y, x) described by (24.29). Operating ρ on both sides of (24.29), we obtain

ρ[B(y, x)] = ρ\left( \sum_{j=1}^{m} \sum_{i=1}^{n} \eta_j \xi_i e_{ji} \right) = \sum_{i,j} \eta_j \xi_i ρ(e_{ji}) = \sum_{i,j} \eta_j \xi_i B′(f_j, e_i) = \sum_{i,j} B′(\eta_j f_j, \xi_i e_i) = B′\left( \sum_{j=1}^{m} \eta_j f_j, \sum_{i=1}^{n} \xi_i e_i \right) = B′(y, x),

where with the third equality we used (24.30) and (24.31); with the fourth and fifth equalities, we used the bilinearity of B′. Thus, the linear mapping ρ defined in (24.30) satisfies (24.27). The other way around, any linear mapping ρ that satisfies (24.27) can be described as

ρ(e_{ji}) = ρ[B(f_j, e_i)] = B′(f_j, e_i).   (24.32)

Multiplying both sides of (24.32) by \zeta_{ji} and summing over i and j, we recover (24.30). Thus, the linear mapping ρ is determined uniquely. These complete the proof.

24.1.2 Tensor Product

Despite appearing craggy and highly abstract, Theorem 24.2 finds extensive applications in vector space theory. The implication of Theorem 24.2 is that once the set (T, B) is determined for the combination of the vector spaces V and W such that B : W × V → T, any other set (T′, B′) can be derived from (T, B). In this respect, B defined in Theorem 24.2 is referred to as the universal bilinear mapping [3]. Another important point is that the linear mapping ρ : T → T′ defined in Theorem 24.2 is bijective (see Fig. 11.3) [1, 2]; that is, the inverse mapping ρ^{-1} exists.

The range of the universal bilinear mapping B is a vector space over K. The vector space T accompanied by the bilinear mapping B, i.e., the set (T, B), is called a tensor product (or Kronecker product) of V and W. Thus, we find that the constitution of the tensor product hinges upon Theorem 24.2. The tensor product T is expressed as

T ≡ W ⊗ V.   (24.33)

Also, B(y, x) is described as

B(y, x) ≡ y ⊗ x,   (24.34)

where x ∈ V, y ∈ W, and y ⊗ x ∈ T.

The notion of the tensor product produces a fertile plain for vector space theory. In fact, (24.29) can readily be extended so that various bilinear mappings are included. As a typical example, let us think of linear mappings φ and ψ such that

φ : V → V′, \quad ψ : W → W′,   (24.35)

where V, V′, W, and W′ are vector spaces of dimension n, n′, m, and m′, respectively. Also, suppose that we have [1, 2]

T = W ⊗ V \quad and \quad T′ ≡ W′ ⊗ V′.   (24.36)

What we assume in (24.36) is that even though in Theorem 24.2 we chose T′ as an arbitrary vector space, in (24.36) we choose T′ as a special vector space (i.e., the tensor product W′ ⊗ V′). From (24.35) and (24.36), using φ and ψ we can construct a bilinear mapping B′ : W × V → W′ ⊗ V′ such that

B′(y, x) = ψ(y) ⊗ φ(x),   (24.37)

where ψ(y) ∈ W′, φ(x) ∈ V′, and ψ(y) ⊗ φ(x) ∈ T′ = W′ ⊗ V′. Then, from Theorem 24.2 and (24.34), a linear mapping ρ : T → T′ must uniquely be determined such that

ρ(y ⊗ x) = ψ(y) ⊗ φ(x).   (24.38)

Also, defining this linear mapping ρ as

ρ ≡ ψ ⊗ φ,   (24.39)

(24.38) can be rewritten as [1, 2]

(ψ ⊗ φ)(y ⊗ x) = ψ(y) ⊗ φ(x).   (24.40)

Equation (24.40) implies that in response to a tensor product of the vector spaces, we can construct a tensor product of the operators that act on the vector spaces. This is one of the most practical consequences derived from Theorem 24.2.

The constitution of the tensor products is simple and straightforward. Indeed, we have already dealt with the tensor products in connection with the group representation (see Sect. 18.8). Let A and B be matrix representations of φ and ψ, respectively. Following (18.193), the tensor product B ⊗ A is expressed as [4]

B ⊗ A = \begin{pmatrix} b_{11}A & \cdots & b_{1m}A \\ \vdots & \ddots & \vdots \\ b_{m'1}A & \cdots & b_{m'm}A \end{pmatrix},   (24.41)

where A and B are given by (a_{ij}) (1 ≤ i ≤ n′; 1 ≤ j ≤ n) and (b_{ij}) (1 ≤ i ≤ m′; 1 ≤ j ≤ m), respectively. In the RHS of (24.41), A represents the full (n′, n) matrix (a_{ij}). Thus, the direct product of the matrices A and B yields an (m′n′, mn) matrix. We write it symbolically as

[(m′, m) matrix] ⊗ [(n′, n) matrix] = [(m′n′, mn) matrix].   (24.42)
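The block structure of (24.41) and the size rule (24.42) can be checked directly with numpy's Kronecker product, which builds exactly the block matrix of (24.41). This is a minimal sketch; the matrix entries below are arbitrary and only the shapes matter.

```python
import numpy as np

m_p, m = 3, 2   # B is an (m', m) matrix
n_p, n = 4, 5   # A is an (n', n) matrix

B = np.arange(m_p * m, dtype=float).reshape(m_p, m)
A = np.arange(n_p * n, dtype=float).reshape(n_p, n)

# np.kron(B, A) is assembled block-wise as [[b11*A, ..., b1m*A], ...],
# i.e. the direct product of (24.41)
BA = np.kron(B, A)
print(BA.shape)   # (m'n', mn) = (12, 10), the size rule (24.42)
```

The top-left (n′, n) block of the result is b_{11}A, in agreement with (24.41).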

The computation rules of (24.41) and (24.42) apply to a column matrix and a row matrix as well. Suppose that we have

x = (e_1 ⋯ e_n) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \quad y = (f_1 ⋯ f_m) \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix},   (24.43)

where e_1, ⋯, e_n are the basis vectors of V and f_1, ⋯, f_m are the basis vectors of W. Thus, the expression of y ⊗ x has already appeared in (24.15) as

y ⊗ x = (f_1 ⊗ e_1 ⋯ f_1 ⊗ e_n  f_2 ⊗ e_1 ⋯ f_2 ⊗ e_n ⋯ f_m ⊗ e_1 ⋯ f_m ⊗ e_n) \begin{pmatrix} y_1 x_1 \\ \vdots \\ y_1 x_n \\ y_2 x_1 \\ \vdots \\ y_2 x_n \\ \vdots \\ y_m x_1 \\ \vdots \\ y_m x_n \end{pmatrix}.   (24.44)

In (24.44), the basis set is f_p ⊗ e_q (1 ≤ p ≤ m, 1 ≤ q ≤ n) and the corresponding coordinates are y_p x_q (1 ≤ p ≤ m, 1 ≤ q ≤ n). Replacing ψ ⊗ φ in (24.40) with B ⊗ A, we obtain

(B ⊗ A)(y ⊗ x) = B(y) ⊗ A(x).   (24.45)
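The coordinate content of (24.45) can be verified numerically: applying the Kronecker matrix of (24.41) to the Kronecker coordinate vector of (24.44) gives the same column as forming B(y) ⊗ A(x) first. A short numpy sketch with arbitrary random data:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, m_p, n_p = 2, 3, 4, 2          # dim W, dim V, dim W', dim V'
B = rng.standard_normal((m_p, m))    # matrix of psi : W -> W'
A = rng.standard_normal((n_p, n))    # matrix of phi : V -> V'
y = rng.standard_normal(m)           # coordinates of y in W
x = rng.standard_normal(n)           # coordinates of x in V

lhs = np.kron(B, A) @ np.kron(y, x)  # (B (x) A)(y (x) x), as in (24.45)
rhs = np.kron(B @ y, A @ x)          # B(y) (x) A(x)
assert np.allclose(lhs, rhs)
```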

Following the expression of (11.37), we have

B(y) = (f_1′ ⋯ f_{m'}′) B \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}, \quad A(x) = (e_1′ ⋯ e_{n'}′) A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},   (24.46)

where e_1′, ⋯, e_{n'}′ are the basis vectors of V′ and f_1′, ⋯, f_{m'}′ are the basis vectors of W′. Thus, (24.45) can be rewritten as

\left[ (f_1′ ⋯ f_{m'}′) B \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} \right] ⊗ \left[ (e_1′ ⋯ e_{n'}′) A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \right] = \left[ (f_1′ ⋯ f_{m'}′) ⊗ (e_1′ ⋯ e_{n'}′) \right] (B ⊗ A) \left[ \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} ⊗ \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \right].   (24.47)

Using (24.41) and (24.44), we get

(B ⊗ A) \left[ \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} ⊗ \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \right] = \begin{pmatrix} b_{11}A & \cdots & b_{1m}A \\ \vdots & \ddots & \vdots \\ b_{m'1}A & \cdots & b_{m'm}A \end{pmatrix} \begin{pmatrix} y_1 x_1 \\ \vdots \\ y_1 x_n \\ y_2 x_1 \\ \vdots \\ y_2 x_n \\ \vdots \\ y_m x_1 \\ \vdots \\ y_m x_n \end{pmatrix}.   (24.48)

Notice that since B ⊗ A operates on the coordinates from the left with the basis set unaltered, we omitted the basis set (f_1′ ⋯ f_{m'}′) ⊗ (e_1′ ⋯ e_{n'}′) in (24.48). Thus, we only have to perform the numerical calculations based on the standard matrix theory as in the RHS of (24.48), where we routinely deal with a product of an (m′n′, mn) matrix and an (mn, 1) matrix, i.e., a column vector. The resulting column vector is an (m′n′, 1) matrix. The bilinearity of the tensor product of the operators A and B can readily be checked, so that we have

B ⊗ (A + A′) = B ⊗ A + B ⊗ A′, \quad (B + B′) ⊗ A = B ⊗ A + B′ ⊗ A.   (24.49)
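The two bilinearity relations of (24.49) hold entrywise for the block matrices of (24.41), as a quick numpy check with arbitrary random matrices illustrates:

```python
import numpy as np

rng = np.random.default_rng(1)
A, A2, B, B2 = (rng.standard_normal((2, 2)) for _ in range(4))

# B (x) (A + A') = B (x) A + B (x) A'
assert np.allclose(np.kron(B, A + A2), np.kron(B, A) + np.kron(B, A2))
# (B + B') (x) A = B (x) A + B' (x) A
assert np.allclose(np.kron(B + B2, A), np.kron(B, A) + np.kron(B2, A))
```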

The bilinearity of the tensor product of column or row matrices can be checked similarly as well.

Suppose that the dimension of all the vector spaces V, V′, W, and W′ is the same n. This simplest case is the most important and frequently appears in many practical applications. We have

A(x) = (e_1 ⋯ e_n) \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \quad B(y) = (e_1 ⋯ e_n) \begin{pmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & \ddots & \vdots \\ b_{n1} & \cdots & b_{nn} \end{pmatrix} \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},   (24.50)

where x and y are given by

x = (e_1 ⋯ e_n) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \quad y = (e_1 ⋯ e_n) \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}.   (24.51)

Now, we are thinking of the endomorphisms

A : V_n → V_n, \quad B : V_n → V_n,   (24.52)

where V_n is a vector space of dimension n; A and B are given by

A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \quad B = \begin{pmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & \ddots & \vdots \\ b_{n1} & \cdots & b_{nn} \end{pmatrix}.   (24.53)

Then, replacing φ and ψ of (24.40) with A and B as before, respectively, we obtain

(B ⊗ A)(y ⊗ x) = B(y) ⊗ A(x).   (24.54)

Following (11.37) once again, we rewrite (24.54) such that

(B ⊗ A)(y ⊗ x) = [(e_1 ⋯ e_n) ⊗ (e_1 ⋯ e_n)][B ⊗ A] \left[ \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} ⊗ \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \right],   (24.55)

where B ⊗ A is given by an (n², n²) matrix expressed as

B ⊗ A = \begin{pmatrix} b_{11}A & \cdots & b_{1n}A \\ \vdots & \ddots & \vdots \\ b_{n1}A & \cdots & b_{nn}A \end{pmatrix}.   (24.56)

The structure of (24.55) is the same as that of (24.45) in point of the linear mapping. Also, (24.56) has the same constitution as (24.41), except that the former gives an (n², n²) matrix, whereas the latter gives an (m′n′, mn) matrix. In other words, instead of (24.48) we have a linear mapping B ⊗ A described by

B ⊗ A : V_{n²} → V_{n²}.   (24.57)

In this regard, the tensor product produces an enlarged vector space. As shown above, we use the term tensor product to represent a direct product of two operators (or matrices) besides a direct product of two vector spaces. In relation to the latter case, we use the same term to represent a direct product of row vectors and that of column vectors (vide supra). The following example helps us understand how to construct a tensor product.

Example 24.2 We have

x = (e_1 e_2) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \quad and \quad y = (e_1 e_2) \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}.

Also, let A and B be

A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}.   (24.58)

Then, we get

(B ⊗ A)(y ⊗ x) = [(e_1 e_2) ⊗ (e_1 e_2)][B ⊗ A] \left[ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} ⊗ \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \right]
= (e_1 ⊗ e_1  e_1 ⊗ e_2  e_2 ⊗ e_1  e_2 ⊗ e_2) × \begin{pmatrix} b_{11}a_{11} & b_{11}a_{12} & b_{12}a_{11} & b_{12}a_{12} \\ b_{11}a_{21} & b_{11}a_{22} & b_{12}a_{21} & b_{12}a_{22} \\ b_{21}a_{11} & b_{21}a_{12} & b_{22}a_{11} & b_{22}a_{12} \\ b_{21}a_{21} & b_{21}a_{22} & b_{22}a_{21} & b_{22}a_{22} \end{pmatrix} \begin{pmatrix} y_1 x_1 \\ y_1 x_2 \\ y_2 x_1 \\ y_2 x_2 \end{pmatrix}.   (24.59)

Omitting the basis set, we have

RHS of (24.59) = \begin{pmatrix} (b_{11}y_1 + b_{12}y_2)(a_{11}x_1 + a_{12}x_2) \\ (b_{11}y_1 + b_{12}y_2)(a_{21}x_1 + a_{22}x_2) \\ (b_{21}y_1 + b_{22}y_2)(a_{11}x_1 + a_{12}x_2) \\ (b_{21}y_1 + b_{22}y_2)(a_{21}x_1 + a_{22}x_2) \end{pmatrix} = \begin{pmatrix} \sum_{l,j} b_{1l}a_{1j} y_l x_j \\ \sum_{l,j} b_{1l}a_{2j} y_l x_j \\ \sum_{l,j} b_{2l}a_{1j} y_l x_j \\ \sum_{l,j} b_{2l}a_{2j} y_l x_j \end{pmatrix} ≡ \begin{pmatrix} y_1' x_1' \\ y_1' x_2' \\ y_2' x_1' \\ y_2' x_2' \end{pmatrix},   (24.60)

where the rightmost side denotes the column vector after operating B ⊗ A on y ⊗ x. Thus, we obtain

y_i' x_k' = \sum_{j,l} b_{il} a_{kj} y_l x_j.   (24.61)

Defining Y_{ik} ≡ y_i x_k, we get

Y_{ik}' = \sum_{j,l} b_{il} a_{kj} Y_{lj}.   (24.62)

Equation (24.62) already represents a general feature of the tensor product. We discuss the transformation properties of tensors later.
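The transformation law (24.62) can also be read as a matrix identity: with Y = (y_i x_k), the sum \sum_{j,l} b_{il} a_{kj} Y_{lj} is the (i, k)-element of B Y Aᵀ, and flattening that matrix row by row reproduces the column vector of (24.60). A sketch with arbitrary random data:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2))
x = rng.standard_normal(2)
y = rng.standard_normal(2)

Y = np.outer(y, x)        # Y_ik = y_i x_k, as defined before (24.62)
Y_new = B @ Y @ A.T       # (B Y A^T)_ik = sum_{j,l} b_il a_kj Y_lj, i.e. (24.62)

# the same data flattened: (B (x) A) acting on the coordinates of (24.59)
flat = np.kron(B, A) @ np.kron(y, x)
assert np.allclose(Y_new.reshape(-1), flat)
```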

24.1.3 Bilinear Form

In the previous sections, we have studied several forms of the bilinear mapping. In this section, we introduce a specific form of the bilinear mapping called a bilinear form.

Definition 24.2 Let V and W be n-dimensional and m-dimensional vector spaces over K, respectively. Let B be a mapping described by

B : W × V → K,   (24.63)

where W × V is a direct product space (see Sect. 24.1.1); K denotes the field (usually chosen from among the complex number field ℂ and the real number field ℝ). Suppose that B(,) satisfies the following conditions [1, 2]:

B(y + y′, x) = B(y, x) + B(y′, x), \quad B(y, x + x′) = B(y, x) + B(y, x′),
B(ay, x) = B(y, ax) = aB(y, x), \quad x, x′ ∈ V; y, y′ ∈ W; a ∈ K.   (24.64)

Then, the mapping B is called a bilinear form.

Equation (24.64) is virtually the same as (24.2). Note, however, that whereas in (24.2) the range of B is a vector space, in (24.64) the range of B is K (i.e., simply a number). Let x ∈ V and y ∈ W be expressed as

x = \sum_{i=1}^{n} \xi_i e_i, \quad y = \sum_{j=1}^{m} \eta_j f_j.   (24.28)

Then, we have

B(y, x) = \sum_{j=1}^{m} \sum_{i=1}^{n} \eta_j \xi_i B(f_j, e_i).   (24.65)

Thus, B(,) is uniquely determined by the (m, n) matrix of B(f_j, e_i) (∈K).
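Equation (24.65) says that once the matrix M with M[j, i] = B(f_j, e_i) is fixed, every value of the bilinear form is the number ηᵀMξ. A minimal sketch; the entries of M below are arbitrary illustrations, not values taken from the text.

```python
import numpy as np

# hypothetical (m, n) matrix of values B(f_j, e_i), here m = 2, n = 3
M = np.array([[1.0, 0.0, 2.0],
              [0.0, -1.0, 3.0]])

def bilinear(eta, xi):
    """B(y, x) = sum_j sum_i eta_j xi_i B(f_j, e_i), as in (24.65)."""
    return eta @ M @ xi

eta = np.array([2.0, 1.0])      # coordinates of y in W
xi = np.array([1.0, 1.0, 1.0])  # coordinates of x in V

val = bilinear(eta, xi)         # 2*(1+0+2) + 1*(0-1+3) = 8.0
# bilinearity in each argument follows from the matrix form:
assert np.isclose(bilinear(3.0 * eta, xi), 3.0 * val)
assert np.isclose(bilinear(eta, xi + xi), 2.0 * val)
```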

24.1.4 Dual Vector Space

So far, we have introduced the bilinear mapping and tensor product as well as related notions. The vector spaces, however, were “structureless,” as in the case of the vector spaces where the inner product was not introduced (see Part III). Now, we wish to introduce the concept of the dual (vector) space to equip the vector space with a “metric.”

Definition 24.3 Let V be a vector space over a field K. Then, the collection of arbitrary homogeneous linear functionals f defined on V that map x (∈V) to K is called a dual vector space (or dual space) of V.

This simple but abstract definition needs some explanation. The dual space of V is written as V*. With ∀f ∈ V* and ∀g ∈ V*, we define the sum f + g and the multiplication αf (α ∈ K) as

(f + g)(x) ≡ f(x) + g(x), \quad (αf)(x) ≡ α ⋅ f(x), \quad x ∈ V.   (24.66)

Then, V* forms a vector space over K with respect to the operations of (24.66). The dimensionality of V* is the same as that of V [1, 2]. To show this, we make the following argument. Let ∀x ∈ V be expressed as

x = \sum_{i=1}^{n} \xi^i e_i,

where \xi^i ∈ K and e_i (i = 1, ⋯, n) is a basis set of the n-dimensional vector space V. Also, for a future purpose, we denote the "coordinate" that corresponds to e_i by a character with a superscript such as \xi^i. The element x ∈ V is uniquely represented by the set {\xi^1, ⋯, \xi^n} (see Sect. 11.1). Let f^i ∈ V* (f^i : V → K) be defined as

f^i(x) = \xi^i.   (24.67)

Accordingly, we have

x = \sum_{i=1}^{n} f^i(x) e_i \quad (i = 1, ⋯, n).   (24.68)

Meanwhile, we have

f^i(x) = f^i\left( \sum_{k=1}^{n} \xi^k e_k \right) = \sum_{k=1}^{n} \xi^k f^i(e_k).   (24.69)

For (24.67) and (24.69) to be consistent, we get

f^i(e_k) = δ^i_k.   (24.70)

The set of f^1, ⋯, f^n is a basis set of V* and is called a dual basis with respect to {e_1, ⋯, e_n}. As we have already seen in Sect. 11.4, we can choose another basis set of V* that consists of n linearly independent vectors. Accordingly, the dual basis that satisfies (24.70) is sometimes referred to as a standard basis or canonical basis to distinguish it from other dual bases of V*. We have

V* = Span{f^1, ⋯, f^n} in correspondence with V = Span{e_1, ⋯, e_n}.

Thus, dim V* = n. We have V* ≅ V (i.e., isomorphic) accordingly [1, 2]. In this case, we must have a bijective mapping (or isomorphism) between V and V*, especially between {e_1, ⋯, e_n} and {f^1, ⋯, f^n} (Sects. 11.2, 11.4, and 16.4). We define the bijective mapping N : V → V* as

N(e_k) = f^k \quad (k = 1, ⋯, n).   (24.71)

Then, there must be an inverse mapping N^{-1} to N (see Sect. 11.2). That is, we have

e_k = N^{-1}(f^k) \quad (k = 1, ⋯, n).

Now, let y (∈V*) be expressed as

y = \sum_{i=1}^{n} η_i f^i,

where η_i ∈ K. As in the case of (24.68), we have

y = \sum_{i=1}^{n} e_i(y) f^i \quad (i = 1, ⋯, n),   (24.72)

with

e_i(y) = η_i,   (24.73)

where e_i is here regarded as an element of V** (e_i : V* → K). As discussed above, we find that the vector spaces V and V* stand in their own right on an equal footing. Whereas residents in V see V* as a dream world, those in V* see V likewise. Nonetheless, the two vector spaces are complementary to each other, which can clearly be seen from (24.67), (24.68), (24.72), and (24.73). These situations can be formulated as a recurrence property such that

(V*)* = V** = V.   (24.74)

Notice, however, that (24.74) is not necessarily true of an infinite-dimensional vector space [1, 2].

A bilinear form is naturally introduced as in (24.64) to the direct product V* × V. Although the field K can arbitrarily be chosen, from here on we assume K = ℝ (a real number field) for a future purpose. Then, as in (24.65), B(,) is uniquely determined by an (n, n) matrix B(f^j, e_i) (∈ℝ), where f^j (j = 1, ⋯, n) (∈V*) are linearly independent in V* and form a dual basis (vide supra). The other way around, e_k (k = 1, ⋯, n) (∈V) are linearly independent in V. To denote these bilinear forms B(,), we use a shorthand notation described by [ | ], where an element of V* and that of V are written in the left-side and right-side positions, respectively. Also, to clearly specify the nature of a vector species, we use | ⋅ ] for an element of V and [ ⋅ | for that of V* as in the case of a vector of the inner product space (see Chap. 13). Notice, however, that [ | ] is different from the inner product ⟨ | ⟩ in the sense that although ⟨ | ⟩ is positive definite, this is not necessarily true of [ | ]. Using the shorthand notation, e.g., (24.70) can succinctly be expressed as

[f^i | e_k] = δ^i_k.   (24.75)
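For a concrete basis of ℝⁿ stored as the columns of a matrix E, the canonical dual basis functionals are the rows of E⁻¹, which makes the defining property (24.70)/(24.75) a one-line check. A minimal sketch with an arbitrary non-orthogonal basis:

```python
import numpy as np

# columns of E are a (deliberately non-orthogonal) basis e_1, e_2, e_3 of R^3
E = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

F = np.linalg.inv(E)   # row i of F is the functional f^i: f^i(x) = F[i] @ x

# f^i(e_k) = delta^i_k, the dual-basis property (24.70)
assert np.allclose(F @ E, np.eye(3))

# f^i extracts the i-th coordinate of x in the basis {e_k}, cf. (24.67)
xi = np.array([2.0, -1.0, 0.5])
x = E @ xi
assert np.allclose(F @ x, xi)
```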

Because of the isomorphism between V and V*, |x] is mapped to [x| by the bijective mapping and vice versa. We call [x| a dual vector of |x]. Here we briefly compare the bilinear form with the inner product (Chap. 13). Unlike the inner product space, the basis vectors [x| do not imply the concept of adjoint with respect to the vector spaces we are thinking of now. Although ⟨x|x⟩ is positive definite for any |x⟩ ∈ V, the positive definiteness is not necessarily true of [x|x]. Think of the following simple example: in case K = ℂ, we have [ix|ix] = i²[x|x] = -[x|x]. Hence, if [x|x] > 0, then [ix|ix] < 0. Also, we may have a case where [x|x] = 0 with |x] ≠ 0 (vide infra). We will come back to this issue later. Hence, it is safe to say that whereas the inner product buys the positive definiteness at the cost of the bilinearity, the bilinear form buys the bilinearity at the cost of the positive definiteness.

The relation represented by (24.70) is somewhat trivial and, hence, it will be convenient to define another bilinear form B(,). Meanwhile, in mathematics and physics, we often deal with invariants before and after a vector (or coordinate) transformation (see Sect. 24.1.5). For this purpose, we wish to choose a basis set of V* more suited than the canonical basis f^k (k = 1, ⋯, n). Let us denote the said basis set of V* (which need not satisfy (24.75)) again by

{f^1, ⋯, f^n}.

Also, we define a linear mapping M such that

M(e_k) = f^k \quad (k = 1, ⋯, n),   (24.76)

where M : V → V* is a bijective mapping. Expressing the vectors |x] (∈V) and [x| (∈V*) as

|x] = \sum_{i=1}^{n} \xi^i |e_i], \quad [x| = \sum_{i=1}^{n} \xi^i [f^i|, \quad \xi^i ∈ ℝ,   (24.77)

respectively, and using a matrix representation, we express [x|x] as

[x|x] = (\xi^1 ⋯ \xi^n) \begin{pmatrix} [f^1|e_1] & \cdots & [f^1|e_n] \\ \vdots & \ddots & \vdots \\ [f^n|e_1] & \cdots & [f^n|e_n] \end{pmatrix} \begin{pmatrix} \xi^1 \\ \vdots \\ \xi^n \end{pmatrix}.   (24.78)

Note that from the definition of M, we may write

M|x] = [x|.   (24.79)

Defining a matrix M as [1, 2]

M ≡ \begin{pmatrix} [f^1|e_1] & \cdots & [f^1|e_n] \\ \vdots & \ddots & \vdots \\ [f^n|e_1] & \cdots & [f^n|e_n] \end{pmatrix},   (24.80)

we obtain

[x|x] = (\xi^1 ⋯ \xi^n) M \begin{pmatrix} \xi^1 \\ \vdots \\ \xi^n \end{pmatrix}.   (24.81)

We rewrite (24.81) as

[x|x] = \sum_{i,j} \xi^i M_{ij} \xi^j = \sum_k \xi^k M_{kk} \xi^k + \sum_{l,m; l≠m} \xi^l M_{lm} \xi^m,   (24.82)

where we put M_{ij} = (M)_{ij}. The second term of (24.82) can be written as

\sum_{l,m; l≠m} \xi^l M_{lm} \xi^m = \frac{1}{2} \left[ \sum_{l,m; l≠m} \xi^l M_{lm} \xi^m + \sum_{m,l; m≠l} \xi^m M_{ml} \xi^l \right] = \frac{1}{2} \sum_{l,m; l≠m} (M_{lm} + M_{ml}) \xi^l \xi^m.

Therefore, defining \tilde{M}_{ij} ≡ \frac{1}{2}(M_{ij} + M_{ji}), we get a symmetric quadratic form described by

[x|x] = \sum_k \tilde{M}_{kk} \xi^k \xi^k + \sum_{l,m; l≠m} \tilde{M}_{lm} \xi^l \xi^m.

Then, we assume that M is symmetric from the beginning, i.e., [f^i|e_j] = [f^j|e_i] [1, 2]. Thus, M is a real symmetric matrix (note K = ℝ), and so it can be diagonalized through an orthogonal similarity transformation. That is, M can be diagonalized with an orthogonal matrix P = (P^i_j). We have

P^T M P = \begin{pmatrix} P^1_1 & \cdots & P^n_1 \\ \vdots & \ddots & \vdots \\ P^1_n & \cdots & P^n_n \end{pmatrix} \begin{pmatrix} [f^1|e_1] & \cdots & [f^1|e_n] \\ \vdots & \ddots & \vdots \\ [f^n|e_1] & \cdots & [f^n|e_n] \end{pmatrix} \begin{pmatrix} P^1_1 & \cdots & P^1_n \\ \vdots & \ddots & \vdots \\ P^n_1 & \cdots & P^n_n \end{pmatrix} = \begin{pmatrix} [f'^1|e'_1] & \cdots & [f'^1|e'_n] \\ \vdots & \ddots & \vdots \\ [f'^n|e'_1] & \cdots & [f'^n|e'_n] \end{pmatrix} = \begin{pmatrix} d_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & d_n \end{pmatrix},   (24.83)

where the summations with respect to both i and j range from 1 to n. In (24.83), we have

|e'_k] = P(e_k) = \sum_i P^i_k |e_i], \quad [f'^k| = P(f^k) = \sum_j P^j_k [f^j|, \quad [f'^j|e'_i] = d_j δ^j_i,   (24.84)

where d_j (j = 1, ⋯, n) are the real eigenvalues of M. Equation (24.84) shows that the basis sets |e_i] and [f^j| (i, j = 1, ⋯, n) of V and V*, respectively, both undergo the orthogonal transformation by P so that they are converted into the new basis sets |e'_i] and [f'^j|. The number d_j takes a positive or a negative value. Further, interchanging the basis vectors |e_i] and similarly exchanging [f^j|, the resulting diagonalized matrix of (24.83) is converted through an orthogonal transformation Π = Q_1⋯Q_s into

diag(d_{i_1}, ⋯, d_{i_p}, d_{i_{p+1}}, ⋯, d_{i_{p+q}}),

where d_{i_1}, ⋯, d_{i_p} are positive, d_{i_{p+1}}, ⋯, d_{i_{p+q}} are negative, and p + q = n. What we have done in the above process is to rearrange the order of the basis vectors s times so that the positive d_{i_1}, ⋯, d_{i_p} are arranged in the upper diagonal elements and the negative d_{i_{p+1}}, ⋯, d_{i_{p+q}} in the lower diagonal elements. Correspondingly, we obtain

(|e'_1] ⋯ |e'_n]) Π = (|e'_{i_1}] ⋯ |e'_{i_{p+q}}]), \quad ([f'^1| ⋯ [f'^n|) Π = ([f'^{i_1}| ⋯ [f'^{i_{p+q}}|).   (24.85)

Assuming the following non-singular matrix D expressed as [1, 2]

D = diag\left( 1/\sqrt{d_{i_1}}, ⋯, 1/\sqrt{d_{i_p}}, 1/\sqrt{-d_{i_{p+1}}}, ⋯, 1/\sqrt{-d_{i_{p+q}}} \right),   (24.86)

we get

D^T Π^T P^T M P Π D = (PΠD)^T M (PΠD) = diag(1, ⋯, 1, -1, ⋯, -1),   (24.87)

with p diagonal elements equal to +1 and q elements equal to -1.

The numbers p and q are intrinsic to the matrix M. Since M is a real symmetric matrix, M can be diagonalized with a similarity transformation by an orthogonal matrix. But since the eigenvalues (given by the diagonal elements) are unchanged under the similarity transformation, it follows that p (i.e., the number of the positive eigenvalues) and q (the number of the negative eigenvalues) are uniquely determined by M. The integers p and q are called a signature [1, 2]. The uniqueness of (p, q) is known as Sylvester's law of inertia [1, 2]. We add as an aside that if M is a real symmetric singular matrix, M must have an eigenvalue 0. In case the eigenvalue zero is degenerate r-fold, we have p + q + r = n. In that case, p, q, and r are also uniquely determined, and these integers are likewise called a signature; this case is also covered by Sylvester's law of inertia [1, 2]. Note that (24.87) is not a similarity transformation but an equivalence transformation and, hence, that the eigenvalues have been changed [from d_j (≠0) to ±1]. Also, notice that unlike the case of the equivalence transformation with an arbitrarily chosen non-singular matrix mentioned in Sect. 14.5, (24.87) represents an equivalence transformation as a special case: the non-singular operator comprises a product of an orthogonal matrix PΠ and a diagonal matrix D.

The numbers p and q are intrinsic to the matrix M. Since M is a real symmetric matrix, M can be diagonalized with the similarity transformation by an orthogonal matrix. But, since the eigenvalues (given by diagonal elements) are unchanged after the similarity transformation, it follows that p (i.e., the number of the positive eigenvalues) and q (the number of the negative eigenvalues) are uniquely determined by M. The integers p and q are called a signature [1, 2]. The uniqueness of ( p, q) is known as the Sylvester’s law of inertia [1, 2]. We add as an aside that if M is a real symmetric singular matrix, M must have an eigenvalue 0. In case the eigenvalue zero is degenerated r-fold, we have p + q + r = n. In that case, p, q, and r are also uniquely determined and these integers are called a signature as well. This case is also included as the Sylvester’s law of inertia [1, 2]. Note that (24.87) is not a similarity transformation, but an equivalence transformation and, hence, that the eigenvalues have been changed [from dj (≠0) to ±1]. Also, notice that unlike the case of the equivalence transformation with an arbitrarily chosen non-singular matrix mentioned in Sect. 14.5, (24.87) represents an equivalence transformation as a special case. That is, the non-singular operator comprises a product of an orthogonal matrix PΠ and a diagonal matrix D. The above-mentioned equivalence transformation also causes the transformation of the basis set. Using (24.86), we have je0i1 ⋯je0ipþq  D = je0i1 = f0i1 j⋯ f0ipþq j D = Defining

f0i1 j=

d i1 ⋯je0ipþq =

- d ipþq ,

di1 ⋯ f0ipþq j=

- d ipþq :

1162

24

± dik and ~fik j f0ik j =

j ~eik  j e0ik =

Basic Formalism

± d ik ðk = 1, ⋯, n = p þ qÞ,

we find that both the basis sets of V and V finally undergo the same transformation through the equivalence transformation by PΠD such that ðj~ei1 ⋯j~ein Þ = ðje1 ⋯jen ÞPΠD, ~f ⋯ ~f i1 in

= ð½f1 j ⋯½fn jÞPΠD

ð24:88Þ

with ~f j~ei = ± δji i, j take a number out of i1 , ⋯, ip , ipþ1 , ⋯, ipþq : j

ð24:89Þ

Putting W ≡ PΠD, from (24.81) we have

[x|x] = \left[ (\xi^1 ⋯ \xi^n)(W^{-1})^T \right] \left( W^T M W \right) \left[ W^{-1} \begin{pmatrix} \xi^1 \\ \vdots \\ \xi^n \end{pmatrix} \right].

Just as the basis sets are transformed by W, so the coordinates are transformed according to W^{-1} = D^{-1}Π^{-1}P^{-1} = D^{-1}Π^T P^T. Then, consecutively we have

\xi'^k = \sum_i (P^{-1})^k{}_i \xi^i, \quad \tilde{\xi}^{i_k} = \sum_m (D^{-1}Π^{-1})^{i_k}{}_m \xi'^m = \sqrt{±d_{i_k}} \sum_m (Π^{-1})^{i_k}{}_m \xi'^m.   (24.90)

Collectively writing (24.90), we obtain

\begin{pmatrix} \tilde{\xi}^{i_1} \\ \vdots \\ \tilde{\xi}^{i_n} \end{pmatrix} = W^{-1} \begin{pmatrix} \xi^1 \\ \vdots \\ \xi^n \end{pmatrix} = D^{-1}Π^{-1}P^{-1} \begin{pmatrix} \xi^1 \\ \vdots \\ \xi^n \end{pmatrix}.   (24.91)

Thus, (24.88) and (24.91) afford the transformation of the basis vectors and the corresponding coordinates. These relations can be compared to (11.85).

Now, suppose the bilinear form [y|x] (x ∈ V, y ∈ V*) described by [1, 2]

[y|x] = (η_1 ⋯ η_n) M \begin{pmatrix} \xi^1 \\ \vdots \\ \xi^n \end{pmatrix}

satisfies the following conditions; then the bilinear form is said to be non-degenerate: (i) if for ∀x ∈ V we have [y|x] = 0, then we must have y = 0; (ii) if for ∀y ∈ V* we have [y|x] = 0, then we must have x = 0. The condition (i) is translated into

(η_1 ⋯ η_n) M = 0 ⟹ (η_1 ⋯ η_n) = 0.

This implies det M ≠ 0; namely, M is non-singular. From the condition (ii), which implies that

M \begin{pmatrix} \xi^1 \\ \vdots \\ \xi^n \end{pmatrix} = 0 ⟹ \begin{pmatrix} \xi^1 \\ \vdots \\ \xi^n \end{pmatrix} = 0,

we get det M ≠ 0 as well. The determinant of the matrix on the RHS of (24.87) is ±1. Thus, the non-degeneracy of the bilinear form is equivalent to M having only non-zero eigenvalues.

It is worth comparing two vector spaces, one of which is an inner product space V_in and the other of which is a vector space V_bi on which the bilinear form is defined. Let x ∈ V_in and y ∈ V_in*, where V_in* is the dual space of V_in. The inner product can be formulated by a mapping I such that

I : V_in* × V_in → ℂ.

In that case, we described the resulting complex number of the above expression as an inner product ⟨y|x⟩, and we defined the computation rules of the inner product in (13.2)-(13.4). Regarding two sets of vectors of V_in, we defined a Gram matrix G in (13.35) of Sect. 13.2. There, we showed that

Both sets of vectors of V_in are linearly independent ⟺ det G ≠ 0.   (13.31)

As for the vector space V_bi, the matrix M defined in (24.80) is a counterpart to the Gram matrix G. With respect to two sets of basis vectors of V_bi and its dual space V_bi*, we have an expression related to (13.31). That is, we have [1, 2]

The bilinear form B given by B : V_bi* × V_bi → K (K = ℂ or ℝ) is non-degenerate ⟺ det M ≠ 0.   (24.92)

Let us consider a case where we construct M as in (24.80) from two sets of basis vectors of V_bi and V_bi* given by

{|e_1], ⋯, |e_n]; |e_i] ∈ V_bi (i = 1, ⋯, n)},
{[f^1|, ⋯, [f^n|; [f^i| ∈ V_bi* (i = 1, ⋯, n)}.

We assume that these two sets are connected to each other through (24.76). Suppose that we do not know whether these sets are linearly dependent or independent. However, if either of the above two sets of vectors is linearly independent, so is the remaining set. Think of the following relation:

\xi^1 |e_1] + ⋯ + \xi^n |e_n] = 0
  ↓ M  ↑ M^{-1}
\xi^1 [f^1| + ⋯ + \xi^n [f^n| = 0.

In the above relation, M has been defined in (24.76). If |e_1], ⋯, |e_n] are linearly independent, we obtain \xi^1 = ⋯ = \xi^n = 0. Then, it necessarily follows that [f^1|, ⋯, [f^n| are linearly independent as well, and vice versa. If, the other way around, either of the above two sets of vectors is linearly dependent, so is the remaining set; this can be shown in a manner similar to the above. Nonetheless, even though we construct M using these two basis sets, we do not necessarily have det M ≠ 0. To assert det M ≠ 0, we need the condition (24.92).

In place of M of (24.80), we use the character ℳ and define (24.87) anew as

ℳ ≡ diag(1, ⋯, 1, -1, ⋯, -1),   (24.93)

with p entries equal to +1 and q entries equal to -1. Equation (24.93) can be rewritten as

(ℳ)_{ij} = -δ_{ij}ζ_i = -δ_{ij}ζ_j; \quad ζ_1 = ⋯ = ζ_p = -ζ_{p+1} = ⋯ = -ζ_{p+q} = -1.   (24.94)

Meanwhile, we redefine the notation \tilde{e}, \tilde{f}, and \tilde{\xi} as e, f, and ξ, respectively, and renumber their subscripts (or superscripts) in (24.88) and (24.90) such that

{e_1, ⋯, e_n} ⟵ {e_{i_1}, ⋯, e_{i_n}}, \quad {f^1, ⋯, f^n} ⟵ {f^{i_1}, ⋯, f^{i_n}}, \quad {\xi^1, ⋯, \xi^n} ⟵ {\xi^{i_1}, ⋯, \xi^{i_n}}.   (24.95)

Then, from (24.89) and (24.95), ℳ can be rewritten as

(ℳ)^j{}_i = [f^j|e_i] = ±δ^j_i \quad (1 ≤ i, j ≤ n).

24.1

Extended Concepts of Vector Spaces

1165

The matrix $\mathcal{M}$ is called the metric tensor. This terminology is, however, somewhat misleading, because $\mathcal{M}$ is merely a matrix consisting of real numbers; care should be taken accordingly. If all the eigenvalues of $\mathcal{M}$ are positive, the quadratic form $[x|x]$ is positive definite (see Sect. 14.5). In that case, the metric tensor $\mathcal{M}$ can be expressed using an inner product such that

$$(\mathcal{M})_{ji} = [f_j|e_i] = \langle f_j|e_i\rangle = \delta_{ji} \quad (1 \le i, j \le n). \quad (24.96)$$

This is ensured by Theorem 13.2 (Gram–Schmidt orthonormalization theorem) and features the Euclidean space. That space is characterized by the Euclidean metric given by (24.96). If, on the other hand, $\mathcal{M}$ contains both positive and negative eigenvalues, $[x|x]$ can be positive, negative, or zero. In the latter case, $[x|x]$ is said to be an indefinite (Hermitian) quadratic form. As a typical example, we have the four-dimensional Minkowski space, where the metric tensor $\mathcal{M}$ is represented by

$$\mathcal{M} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}. \quad (24.97)$$

The matrix $\mathcal{M}$ is identical with $\eta$ of (21.16), which has already appeared in Chap. 21.
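As an aside not in the text, the indefiniteness of the Minkowski metric (24.97) can be checked numerically: its eigenvalues have mixed signs, so $[x|x]$ takes positive, negative, and zero values. A minimal numpy sketch (the vector values are arbitrary choices for illustration):

```python
import numpy as np

# Minkowski metric tensor M = diag(1, -1, -1, -1) of (24.97)
M = np.diag([1.0, -1.0, -1.0, -1.0])

# eigenvalues of M are mixed in sign, so [x|x] is an indefinite quadratic form
eigenvalues = np.linalg.eigvalsh(M)

def bilinear(x, y, metric=M):
    """Bilinear form [x|y] = x^T M y."""
    return x @ metric @ y

timelike = np.array([2.0, 1.0, 0.0, 0.0])   # [x|x] > 0
spacelike = np.array([1.0, 2.0, 0.0, 0.0])  # [x|x] < 0
lightlike = np.array([1.0, 1.0, 0.0, 0.0])  # [x|x] = 0 (a singular vector)

print(bilinear(timelike, timelike))    # 3.0
print(bilinear(spacelike, spacelike))  # -3.0
print(bilinear(lightlike, lightlike))  # 0.0
```

The three sample vectors anticipate the timelike/spacelike/lightlike classification used in Example 24.3 below.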

24.1.5 Invariants

The metric tensor $\mathcal{M}$ plays a fundamental role in an $n$-dimensional real vector space and is indispensable for studying the structure and properties of that space. One of the most important points is that $\mathcal{M}$ has a close relationship to the invariants of the vector space. First, among the various operators that act on vectors, we wish to examine what kinds of operators leave $[x|x]$ invariant after operating on $|x]$ and $[x|$ belonging to $V$ and $V^*$, respectively. Let the relevant operator be $R$. Operating $R$ on

$$|x] = \sum_{i=1}^n \xi^i|e_i] = (|e_1] \cdots |e_n])\begin{pmatrix}\xi^1\\ \vdots\\ \xi^n\end{pmatrix} \quad (\in V),$$

we obtain

$$R|x] = |Rx] = \sum_{i,j=1}^n R^i{}_j\,\xi^j|e_i], \quad (24.98)$$

24 Basic Formalism

where $e_i$ has been defined in (24.95). Operating $M$ of (24.76) on both sides of (24.98), we obtain

$$[Rx| = \sum_{i,j=1}^n R^i{}_j\,\xi^j [f_i|. \quad (24.99)$$

Then, we get

$$[Rx|Rx] = \sum_{i,j=1}^n R^i{}_j\,\xi^j \sum_{k,l=1}^n R^k{}_l\,\xi^l\,[f_i|e_k] = \sum_{i,j,k,l=1}^n \xi^j R^i{}_j\,\mathcal{M}_{ik}\,R^k{}_l\,\xi^l, \quad (24.100)$$

where with the last equality we used (24.80) and $\mathcal{M}_{ik}$ represents the $(i, k)$-element of $\mathcal{M}$ given in (24.93), i.e., $\mathcal{M}_{ik} = \pm\delta_{ik}$. Meanwhile, we have

$$[x|x] = \sum_{j,l=1}^n \xi^j\,\mathcal{M}_{jl}\,\xi^l. \quad (24.101)$$

From (24.100) and (24.101), we have

$$[Rx|Rx] - [x|x] = \sum_{j,l=1}^n \xi^j\left(\sum_{i,k=1}^n R^i{}_j\,\mathcal{M}_{ik}\,R^k{}_l - \mathcal{M}_{jl}\right)\xi^l. \quad (24.102)$$

Since $\xi^j$ and $\xi^l$ can arbitrarily be chosen from $\mathbb{R}$, we must have

$$\sum_{i,k=1}^n R^i{}_j\,\mathcal{M}_{ik}\,R^k{}_l - \mathcal{M}_{jl} = 0 \quad (24.103)$$

so that $[Rx|Rx] - [x|x] = 0$ (namely, so that $[x|x]$ can be invariant before and after the transformation by $R$). In matrix form, (24.103) can be rewritten as

$$R^T \mathcal{M} R = \mathcal{M}. \quad (24.104)$$
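The invariance condition (24.104) can be checked numerically. As an illustration not taken from the text, a Lorentz boost (a hypothetical concrete choice of $R$, with an arbitrary rapidity) satisfies $R^T\mathcal{M}R = \mathcal{M}$ for the Minkowski metric of (24.97), so $[Rx|Rx] = [x|x]$ for every $x$:

```python
import numpy as np

M = np.diag([1.0, -1.0, -1.0, -1.0])  # Minkowski metric (24.97)

# A Lorentz boost along the first spatial axis: one concrete transformation R
# satisfying the invariance condition R^T M R = M of (24.104).
phi = 0.7  # rapidity (arbitrary illustrative value)
R = np.eye(4)
R[0, 0] = R[1, 1] = np.cosh(phi)
R[0, 1] = R[1, 0] = np.sinh(phi)

# R^T M R = M, hence [Rx|Rx] = [x|x] for every x
assert np.allclose(R.T @ M @ R, M)

x = np.array([1.0, 2.0, 3.0, 4.0])
assert np.isclose((R @ x) @ M @ (R @ x), x @ M @ x)
```

For the Euclidean metric $\mathcal{M} = E$, the same condition reduces to $R^TR = E$, i.e., any orthogonal matrix (cf. Sect. 24.1.7 below).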

Recall that an equation related to (24.104) has already appeared in Sect. 13.3 in connection with the Gram matrix and the adjoint operator; see (13.81). If real matrices are chosen for $R$ and $G$, (13.81) is translated into $R^T G R = G$. Hence, that expression is virtually the same as (24.104). Even though (24.103) and (24.104) are equivalent, (24.103) clearly shows a covariant characteristic (vide infra). The transformation $R$ plays a fundamental role in a tensor equation, i.e., an equation that contains tensor(s). In what follows, the transformation of tensors and tensor equations, along with the related invariants, will form the central theme.

First, we think about the transformation of the basis vectors and their corresponding coordinates. In Sect. 11.4, we described the transformation as

$$x = (e_1 \cdots e_n)\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix} = (e_1 \cdots e_n)PP^{-1}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix} = (e'_1 \cdots e'_n)P^{-1}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix} = (e'_1 \cdots e'_n)\begin{pmatrix}\tilde{x}_1\\ \vdots\\ \tilde{x}_n\end{pmatrix}. \quad (11.84)$$

Here, we rewrite this relationship as

$$x = (e_1 \cdots e_n)\begin{pmatrix}\xi^1\\ \vdots\\ \xi^n\end{pmatrix} = (e_1 \cdots e_n)R^{-1}\cdot R\begin{pmatrix}\xi^1\\ \vdots\\ \xi^n\end{pmatrix} = (e'_1 \cdots e'_n)\begin{pmatrix}\xi'^1\\ \vdots\\ \xi'^n\end{pmatrix}. \quad (24.105)$$

Note that in (24.105) $R$ was used instead of $P^{-1}$ of (11.84), because in this section we place emphasis on the coordinate transformation. Thus, for the transformation of the basis set and its corresponding coordinates of $V$, we have

$$(e'_1 \cdots e'_n) = (e_1 \cdots e_n)R^{-1} \quad \text{or} \quad e'_i = \sum_{j=1}^n e_j\,(R^{-1})^j{}_i. \quad (24.106)$$

Also, we obtain

$$\begin{pmatrix}\xi'^1\\ \vdots\\ \xi'^n\end{pmatrix} = R\begin{pmatrix}\xi^1\\ \vdots\\ \xi^n\end{pmatrix} \quad \text{or} \quad \xi'^i = \sum_{j=1}^n R^i{}_j\,\xi^j. \quad (24.107)$$

We assume that the transformation of the dual basis $f^i$ of $V^*$ is described by

$$f'^i = \sum_{j=1}^n Q^i{}_j\,f^j. \quad (24.108)$$

We wish to determine $Q$ so that it satisfies (24.70):

$$[f^i|e_j] = [f'^i|e'_j] = \delta^i_j. \quad (24.109)$$

Inserting $e'_j$ of (24.106) and $f'^i$ of (24.108) into (24.109), we get

$$[f'^i|e'_j] = Q^i{}_l\,[f^l|e_k]\,(R^{-1})^k{}_j = Q^i{}_l\,(R^{-1})^k{}_j\,\delta^l_k = Q^i{}_l\,(R^{-1})^l{}_j = [f^i|e_j] = \delta^i_j. \quad (24.110)$$

Representing (24.110) in matrix form, we obtain

$$QR^{-1} = E. \quad (24.111)$$

Hence, we get $Q = R$. Thus, we find that $f^i$ undergoes the same transformation as $\xi^i$; this is said to be a contravariant transformation. Since both $N$ of (24.71) and $M$ of (24.76) are bijective mappings, $\{f^1, \dots, f^n\}$ and $\{f_1, \dots, f_n\}$ are two basis sets of $V^*$, and so there must be a non-singular matrix that transforms one into the other (Sect. 11.4). Let $T$ be the said non-singular matrix. Then, we have

$$f_i = \sum_{j=1}^n T_{ij}\,f^j \quad (i = 1, \dots, n). \quad (24.112)$$

Also, we have

$$[x|x] = \sum_{j,l=1}^n \xi^j\,[f_j|e_l]\,\xi^l = \sum_{j,l=1}^n \xi^j\Big[\sum_{k=1}^n T_{jk}\,f^k\,\Big|\,e_l\Big]\xi^l = \sum_{j,l,k=1}^n \xi^j\,T_{jk}\,\delta^k_l\,\xi^l = \sum_{j,l=1}^n \xi^j\,T_{jl}\,\xi^l. \quad (24.113)$$

Comparing (24.113) with (24.101) and seeing that $\xi^j$ $(j = 1, \dots, n)$ can arbitrarily be chosen, we get

$$T_{jl} = \mathcal{M}_{jl}, \quad \text{i.e.,} \quad T = \mathcal{M}.$$

Hence, we have

$$f_i = \sum_{j=1}^n \mathcal{M}_{ij}\,f^j \quad (i = 1, \dots, n). \quad (24.114)$$

Making a bilinear form between $e_k$ and both sides of (24.114), we obtain

$$[f_i|e_k] = \sum_{j=1}^n \mathcal{M}_{ij}\,[f^j|e_k] = \sum_{j=1}^n \mathcal{M}_{ij}\,\delta^j_k = \mathcal{M}_{ik}, \quad (24.115)$$

where with the second equality we used (24.109). Next, we investigate the transformation property of $f_i$. Let $R$ be a transformation in the vector space $V$. Operating $R^{-1}$ on both sides of (24.76), we obtain

$$R^{-1}M(e_k) = R^{-1}f_k \quad (k = 1, \dots, n). \quad (24.116)$$

Assuming that $R^{-1}$ and $M$ are commutative, we have

$$M(R^{-1}e_k) = M(e'_k) = R^{-1}f_k = f'_k. \quad (24.117)$$

Operating $M$ on (24.106), we obtain


$$M(e'_i) = f'_i = \sum_{j=1}^n (R^{-1})^j{}_i\,M(e_j) = \sum_{j=1}^n (R^{-1})^j{}_i\,f_j, \quad (24.118)$$

where with the first equality we used (24.117); with the last equality we used (24.76). Rewriting (24.118), we get

$$f'_i = \sum_{j=1}^n f_j\,(R^{-1})^j{}_i. \quad (24.119)$$

Namely, $f_j$ exhibits the same transformation property as $e_j$. We examine the coordinate transformation in a similar manner. To this end, let us define a quantity $\xi_i$ such that

$$\xi_l \equiv \sum_{j=1}^n \xi^j\,\mathcal{M}_{jl}. \quad (24.120)$$

Using the definition (24.120), (24.101) is rewritten as

$$[x|x] = \sum_{l=1}^n \xi_l\,\xi^l. \quad (24.121)$$

Further rewriting (24.121), we get

$$[x|x] = \sum_{l=1}^n \xi_l\,\xi^l = \sum_{k,l=1}^n \xi_k\,\delta^k_l\,\xi^l = \sum_{i,k,l=1}^n \xi_k\,(R^{-1})^k{}_i\,R^i{}_l\,\xi^l. \quad (24.122)$$

Defining the transformation of $\xi_i$ as

$$\xi'_i \equiv \sum_{k=1}^n \xi_k\,(R^{-1})^k{}_i \quad (24.123)$$

and using (24.107), we obtain

$$[x|x] = \sum_{i=1}^n \xi'_i\,\xi'^i. \quad (24.124)$$

Consequently, $\sum_{l=1}^n \xi_l\,\xi^l$ is a quantity invariant under the transformation by $R$. Thus, (24.106), (24.107), (24.108), (24.119), and (24.123) give the transformation properties of the basis vectors and their corresponding coordinates. Next, we investigate the transformation property of $\mathcal{M}$ of (24.93). Using (24.106) and (24.119), we have

$$[f'_i|e'_k] = \sum_{j,l=1}^n [f_j|e_l]\,(R^{-1})^j{}_i\,(R^{-1})^l{}_k. \quad (24.125)$$

Multiplying both sides by $R^i{}_s R^k{}_t$ and summing over $i$ and $k$, we obtain

$$\sum_{i,k=1}^n R^i{}_s\,R^k{}_t\,[f'_i|e'_k] = \sum_{i,k,j,l=1}^n (R^{-1})^j{}_i\,R^i{}_s\,(R^{-1})^l{}_k\,R^k{}_t\,[f_j|e_l] = \sum_{j,l=1}^n \delta^j_s\,\delta^l_t\,[f_j|e_l] = [f_s|e_t] = \mathcal{M}_{st} = \sum_{i,k=1}^n R^i{}_s\,\mathcal{M}_{ik}\,R^k{}_t, \quad (24.126)$$

where with the second to the last equality we used (24.115); with the last equality we used (24.103). Putting $[f'_i|e'_k] = \mathcal{M}'_{ik}$ and comparing the leftmost and rightmost sides of (24.126), we have

$$\sum_{i,k=1}^n R^i{}_s\,\mathcal{M}'_{ik}\,R^k{}_t = \sum_{i,k=1}^n R^i{}_s\,\mathcal{M}_{ik}\,R^k{}_t. \quad (24.127)$$

Rewriting (24.127) in matrix form, we obtain

$$R^T\mathcal{M}'R = R^T\mathcal{M}R. \quad (24.128)$$

Since $R$ is a non-singular matrix, we get

$$\mathcal{M}' = \mathcal{M}. \quad (24.129)$$

That is, $\mathcal{M}$ is invariant with respect to the basis vector transformation; hence, $\mathcal{M}$ is called the invariant metric tensor. In other words, the transformation $R$ that satisfies (24.128) and (24.129) makes $[x|x]$ invariant and, hence, plays a major role in the vector space characterized by the invariant metric tensor $\mathcal{M}$. As can be seen immediately from (24.93), $\mathcal{M}$ is represented as a symmetric matrix and the inverse matrix $\mathcal{M}^{-1}$ is $\mathcal{M}$ itself. We denote the $(i, j)$-element of $\mathcal{M}^{-1}$ by $\mathcal{M}^{ij}$, i.e.,

$$\mathcal{M}^{ij} \equiv (\mathcal{M}^{-1})^{ij} = (\mathcal{M})_{ij}.$$

Then, we have

$$\sum_{j=1}^n \mathcal{M}^{ij}\,\mathcal{M}_{jk} = \delta^i_k \quad \text{or} \quad \sum_{k=1}^n \mathcal{M}_{jk}\,\mathcal{M}^{ki} = \delta^i_j. \quad (24.130)$$

These notations make various expressions simple. For instance, multiplying both sides of (24.120) by $\mathcal{M}^{ik}$ and summing the result over $i$, we have

$$\sum_{i=1}^n \xi_i\,\mathcal{M}^{ik} = \sum_{i=1}^n\sum_{j=1}^n \xi^j\,\mathcal{M}_{ji}\,\mathcal{M}^{ik} = \sum_{j=1}^n \xi^j\,\delta^k_j = \xi^k.$$

Multiplying both sides of (24.114) by $\mathcal{M}^{il} = \mathcal{M}^{li}$ and summing over $i$, we obtain

$$\sum_{i=1}^n f_i\,\mathcal{M}^{il} = \sum_{i=1}^n\sum_{j=1}^n \mathcal{M}^{li}\,\mathcal{M}_{ij}\,f^j = \sum_{j=1}^n \delta^l_j\,f^j = f^l. \quad (24.131)$$

Moreover, we get

$$\sum_{i=1}^n \xi_i\,f^i = \sum_{l=1}^n\sum_{i=1}^n \xi_i\,\delta^i_l\,f^l = \sum_{l=1}^n\sum_{k=1}^n\sum_{i=1}^n \xi_i\,\mathcal{M}^{ik}\,\mathcal{M}_{kl}\,f^l = \sum_{k=1}^n \xi^k\,f_k, \quad (24.132)$$

where with the second and last equalities we used (24.130). In this way, we can readily move the indices up and down. With regard to the basis vector transformation of, e.g., (24.114), we can carry out the matrix calculation in a similar way. These manipulations are useful in relation to the derivation of the invariants.
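The index gymnastics above can be sketched numerically. The following illustration (not from the text) lowers an index with $\mathcal{M}_{jl}$ as in (24.120), raises it back with $\mathcal{M}^{ik}$, and checks $\mathcal{M}^{-1} = \mathcal{M}$ of (24.130); the component values are arbitrary:

```python
import numpy as np

# Invariant metric of signature (p, q) = (1, 3); note M^{-1} = M, cf. (24.130)
M = np.diag([1.0, -1.0, -1.0, -1.0])
M_inv = np.linalg.inv(M)
assert np.allclose(M_inv, M)

xi_up = np.array([3.0, 1.0, -2.0, 0.5])   # contravariant components xi^j

# Lowering an index, eq. (24.120): xi_l = sum_j xi^j M_jl
xi_down = xi_up @ M

# Raising it back with M^{ik}: xi^k = sum_i xi_i M^{ik}
assert np.allclose(xi_down @ M_inv, xi_up)

# [x|x] = sum_l xi_l xi^l of (24.121) agrees with xi^j M_jl xi^l of (24.101)
assert np.isclose(xi_down @ xi_up, xi_up @ M @ xi_up)
```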

24.1.6 Tensor Space and Tensors

In Sect. 24.1.1, we showed the definition and a tangible example of the bilinear mapping. Advancing this notion further, we reach the multilinear mapping [1, 2]. Suppose that a mapping $S$ from a direct product space $V_s \times \cdots \times V_1$ to a vector space $\mathcal{T}$ is described by

$$S: V_s \times \cdots \times V_1 \longrightarrow \mathcal{T},$$

where $V_1, \dots, V_s$ are vector spaces and $V_s \times \cdots \times V_1$ is their direct product space. If, with $x^{(i)} \in V_i$ $(i = 1, \dots, s)$, $S(x^{(s)}, \dots, x^{(1)})$ is linear with respect to each $x^{(i)}$ $(i = 1, \dots, s)$, then $S$ is called a multilinear mapping. As a natural extension of the tensor product dealt with in Sect. 24.1.2, we can define a tensor product $T$ in relation to the multilinear mapping. It is expressed as

$$T = V_s \otimes \cdots \otimes V_1. \quad (24.133)$$

Then, in accordance with (24.34), we have

$$S(x^{(s)}, \dots, x^{(1)}) = x^{(s)} \otimes \cdots \otimes x^{(1)}.$$

Also, as a natural extension of Theorem 24.2, we have the following theorem. Although the proof based on mathematical induction can be found in the literature [1, 2], we skip it and state the theorem here without proof.

Theorem 24.3 [1, 2] Let $V_i$ $(i = 1, \dots, s)$ be vector spaces over $K$. Let $\mathcal{T}$ be an arbitrary vector space. Let $\tilde{S}$ be an arbitrary multilinear mapping expressed as

$$\tilde{S}: V_s \times \cdots \times V_1 \longrightarrow \mathcal{T}.$$

Also, a tensor product is defined such that

$$T = V_s \otimes \cdots \otimes V_1.$$

Then, a linear mapping $\rho: V_s \otimes \cdots \otimes V_1 \longrightarrow \mathcal{T}$ that satisfies the condition described by

$$\tilde{S}(x^{(s)}, \dots, x^{(1)}) = \rho(x^{(s)} \otimes \cdots \otimes x^{(1)}); \quad x^{(i)} \in V_i\ (i = 1, \dots, s) \quad (24.134)$$

is uniquely determined. As in the case of the bilinear mapping, we have linear mappings $\varphi_i$ $(i = 1, \dots, s)$ such that

$$\varphi_i: V_i \longrightarrow V'_i \quad (i = 1, \dots, s). \quad (24.135)$$

Also, suppose that we have

$$T' = V'_s \otimes \cdots \otimes V'_1. \quad (24.136)$$

Notice that (24.136) is an extension of (24.36). Then, from (24.133) to (24.136), we can construct a multilinear mapping $\tilde{S}: V_s \times \cdots \times V_1 \longrightarrow T'$ as before such that

$$\tilde{S}(x^{(s)}, \dots, x^{(1)}) = \varphi_s(x^{(s)}) \otimes \cdots \otimes \varphi_1(x^{(1)}). \quad (24.137)$$

Also defining a linear mapping $\rho: T \longrightarrow T'$ such that

$$\rho \equiv \varphi_s \otimes \cdots \otimes \varphi_1, \quad (24.138)$$

we get

$$(\varphi_s \otimes \cdots \otimes \varphi_1)(x^{(s)} \otimes \cdots \otimes x^{(1)}) = \varphi_s(x^{(s)}) \otimes \cdots \otimes \varphi_1(x^{(1)}). \quad (24.139)$$

We wish to consider the endomorphism with respect to $\varphi_i$ $(i = 1, \dots, s)$ within the same vector space $V_i$, as in Sect. 24.1.2. Then, we have

$$\varphi_i: V_i \longrightarrow V_i \quad (i = 1, \dots, s). \quad (24.140)$$

In this case, we have


$$T = V_s \otimes \cdots \otimes V_1.$$

Accordingly, from (24.133) we get the following endomorphism with the multilinear mapping $\varphi_s \otimes \cdots \otimes \varphi_1$:

$$\varphi_s \otimes \cdots \otimes \varphi_1: T \longrightarrow T.$$

Replacing $\varphi_i$ with an $(n, n)$ matrix $A_i$, (24.55) can readily be generalized such that

$$(A_s \otimes \cdots \otimes A_1)(x^{(s)} \otimes \cdots \otimes x^{(1)}) = [e^{(1)} \otimes \cdots \otimes e^{(s)}][A_s \otimes \cdots \otimes A_1][\xi^{(s)} \otimes \cdots \otimes \xi^{(1)}], \quad (24.141)$$

where $e^{(i)}$ $(i = 1, \dots, s)$ denotes $(e_1^{(i)} \cdots e_n^{(i)})$; $\xi^{(i)}$ $(i = 1, \dots, s)$ is an abbreviation for $\begin{pmatrix}\xi^1_{(i)}\\ \vdots\\ \xi^n_{(i)}\end{pmatrix}$. Also, in (24.141), $A_s \otimes \cdots \otimes A_1$ operates on a vector $e^{(i)} \otimes \xi^{(i)} \in T$. In the above, $e^{(i)} \otimes \xi^{(i)}$ is a symbolic expression meaning

$$e^{(i)} \otimes \xi^{(i)} = [e^{(1)} \otimes \cdots \otimes e^{(s)}][\xi^{(s)} \otimes \cdots \otimes \xi^{(1)}],$$

where $e^{(1)} \otimes \cdots \otimes e^{(s)}$ is represented by a $(1, n^s)$ row matrix and $\xi^{(s)} \otimes \cdots \otimes \xi^{(1)}$ by an $(n^s, 1)$ column matrix. The operator $A_s \otimes \cdots \otimes A_1$ is represented by an $(n^s, n^s)$ matrix. As in the case of (11.39), the operation of $A_s \otimes \cdots \otimes A_1$ is described by

$$(A_s \otimes \cdots \otimes A_1)(x^{(s)} \otimes \cdots \otimes x^{(1)}) = \big([e^{(1)} \otimes \cdots \otimes e^{(s)}][A_s \otimes \cdots \otimes A_1]\big)[\xi^{(s)} \otimes \cdots \otimes \xi^{(1)}] = [e^{(1)} \otimes \cdots \otimes e^{(s)}]\big([A_s \otimes \cdots \otimes A_1][\xi^{(s)} \otimes \cdots \otimes \xi^{(1)}]\big).$$

That is, the associative law holds with the matrix multiplication. We have a more important case where some of the vector spaces $V_i$ $(i = 1, \dots, s)$ involved in the tensor product are the same space $V$ and the remaining vector spaces are its dual space $V^*$. The resulting tensor product is described by

$$T = V_s \otimes \cdots \otimes V_1 \quad (V_i = V \text{ or } V^*;\ i = 1, \dots, s), \quad (24.142)$$

where $T$ is a tensor product consisting of $s$ vector spaces in total; $V^*$ is the dual space of $V$. The tensor product described by (24.142) is called a tensor space. If the number of vector spaces $V$ and that of $V^*$ contained in the tensor space are $q$ and $r$ $(q + r = s)$, respectively, that tensor space is said to be a $(q, r)$-type tensor space or a $q$-th order contravariant, $r$-th order covariant tensor space.
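The matrix representation of a tensor-product operator described around (24.141) can be checked numerically with the Kronecker product. The following sketch (not from the text; the entries are random) verifies the mixed-product property, i.e., the matrix counterpart of (24.139):

```python
import numpy as np

rng = np.random.default_rng(0)
A2, A1 = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
x2, x1 = rng.normal(size=2), rng.normal(size=2)

# The tensor-product operator A2 (x) A1 as an (n^2, n^2) matrix
A = np.kron(A2, A1)

# Mixed-product property: (A2 (x) A1)(x2 (x) x1) = (A2 x2) (x) (A1 x1),
# the matrix counterpart of (24.139)/(24.141)
lhs = A @ np.kron(x2, x1)
rhs = np.kron(A2 @ x2, A1 @ x1)
assert np.allclose(lhs, rhs)
```

The associative law mentioned above is simply associativity of the matrix products once every factor is flattened to a Kronecker matrix or vector.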


An element of the tensor space is called an $s$-th rank tensor. In particular, if all $V_i$ $(i = 1, \dots, s)$ are $V$, the said tensor space is called an $s$-th rank contravariant tensor space, and the elements in that space are said to be $s$-th rank contravariant tensors. If all $V_i$ $(i = 1, \dots, s)$ are $V^*$, the tensor space is called an $s$-th rank covariant tensor space, and the elements in that space are said to be $s$-th rank covariant tensors accordingly. As an example, we construct three types of second-rank tensors described by

(i) $(R \otimes R)(x_{(2)} \otimes x_{(1)}) = [e_{(2)} \otimes e_{(1)}][R \otimes R][\xi_{(2)} \otimes \xi_{(1)}]$,

(ii) $\big((R^{-1})^T \otimes (R^{-1})^T\big)(y_{(2)} \otimes y_{(1)}) = [f^T_{(2)} \otimes f^T_{(1)}][(R^{-1})^T \otimes (R^{-1})^T][\eta^T_{(2)} \otimes \eta^T_{(1)}]$,

(iii) $\big((R^{-1})^T \otimes R\big)(y_{(2)} \otimes x_{(1)}) = [f^T_{(2)} \otimes e_{(1)}][(R^{-1})^T \otimes R][\eta^T_{(2)} \otimes \xi_{(1)}]$.  (24.143)

Equation (24.143) shows (2, 0)-, (0, 2)-, and (1, 1)-type tensors for (i), (ii), and (iii) from the top. In (24.143), with $x_{(i)} \in V$ we denote $x_{(i)} = e_{(i)} \otimes \xi_{(i)}$ $(i = 1, 2)$; for $y_{(i)} \in V^*$ we express it as $y_{(i)} = f^T_{(i)} \otimes \eta^T_{(i)}$ $(i = 1, 2)$. The vectors $x_{(i)}$ and $y_{(i)}$ are referred to as a contravariant vector and a covariant vector, respectively. Since for a dual vector of $V^*$ the basis set $f_{(i)}$ is arranged in a column vector and the coordinates are arranged in a row vector, they are transposed so that their arrangement is consistent with that of a vector of $V$. Concomitantly, $R^{-1}$ has been transposed as well. Higher-rank tensors can be constructed in a similar manner as in (24.143). The basis sets $e_{(2)} \otimes e_{(1)}$, $f^T_{(2)} \otimes f^T_{(1)}$, and $f^T_{(2)} \otimes e_{(1)}$ are referred to as a standard basis (or canonical basis) of the tensor spaces $V \otimes V$, $V^* \otimes V^*$, and $V^* \otimes V$, respectively. Then, if (24.142) contains components of $V^*$, we need to use $\eta^T_{(i)}$ for the coordinates; see (24.132). These basis sets are often disregarded in the computation process of tensors. Equation (24.143) is rewritten using tensor components (or "coordinates" of a tensor) as

(i) $\displaystyle \xi'^i_{(2)}\xi'^k_{(1)} = \sum_{j,l=1}^n R^i{}_l\,R^k{}_j\,\xi^l_{(2)}\xi^j_{(1)}$,

(ii) $\displaystyle \eta'_{i(2)}\eta'_{k(1)} = \sum_{j,l=1}^n (R^{-1})^l{}_i\,(R^{-1})^j{}_k\,\eta_{l(2)}\eta_{j(1)}$,

(iii) $\displaystyle \eta'_{i(2)}\xi'^k_{(1)} = \sum_{j,l=1}^n (R^{-1})^l{}_i\,R^k{}_j\,\eta_{l(2)}\xi^j_{(1)}$,

where, e.g., $\xi'^i_{(2)}\xi'^k_{(1)}$ shows the tensor components after being transformed by $R$. With these relations, see (24.62). Also, notice that the invariant tensor $[f_j|e_l]$ of (24.125) indicates the transformation property of the tensor components of a second-rank tensor of (0, 2)-type. We have another invariant tensor $\delta^i_j$ of (1, 1)-type, whose transformation is described by

$$\delta'^i{}_j = (R^{-1})^l{}_j\,R^i{}_k\,\delta^k_l = (R^{-1})^l{}_j\,R^i{}_l = (RR^{-1})^i{}_j = \delta^i_j.$$

Note that in the above relation $R$ can be any non-singular operator. The invariant tensor $\delta^i_j$ is exactly the same as $f^i(e_k)$ of (24.70). Although the invariant metric tensor $(\mathcal{M})_{ij}$ of (24.93) is determined by the inherent structure of the individual vector spaces that form a tensor product, the invariant tensor $\delta^i_j$ is common to any tensor space, regardless of the nature of the constituent vector spaces. To show tensor components, we use a character with super/subscripts such as, e.g., $\Xi^{ij}$ $(1 \le i, j \le n)$, which are the components of a second-rank tensor of (2, 0)-type. The transformation of $\Xi$ is described by

$$\Xi'^{ik} = \sum_{j,l=1}^n R^i{}_l\,R^k{}_j\,\Xi^{lj}, \quad \text{etc.}$$

We can readily infer the computation rule of the tensor from analogy and generalization of the tensor product calculations shown in Example 24.2. More generally, let us consider a $(q, r)$-type tensor space $\mathbf{T}$. We express an element $T$ (i.e., a tensor) of $\mathbf{T}$ $(T \in \mathbf{T})$ symbolically as

$$T = \Gamma\,\Theta\,\Psi, \quad (24.144)$$

ð24:144Þ

where Γ denotes a tensor product of the s basis vectors of V or V; Θ denotes a tensor product of the s operators R or (R-1)T; Ψ also denotes a tensor product of the s coordinates ξiðjÞ or ηk(l ) where 1 ≤ j ≤ q, 1 ≤ l ≤ r, 1 ≤ i, k ≤ n (n is a dimension of the vector spaces V and V). Individual tensor products are further described by

1176

24

Basic Formalism

Γ = gðsÞ  gðs - 1Þ  ⋯  gð1Þ , Θ = QðsÞ  Qðs - 1Þ  ⋯  Qð1Þ , Ψ = ζ ðsÞ  ζ ðs - 1Þ  ⋯  ζ ð1Þ :

ð24:145Þ

In (24.145), Γ represents s basis sets gðiÞ ð1 ≤ i ≤ sÞ, which denote either e(i) or fTðiÞ ð1 ≤ i ≤ sÞ. The number of e(i) is q and that of f TðiÞ is r with q + r = s. With the tensor product Θ, Q(i) represents R or (R-1)T with the number of R and (R-1)T being q and r, respectively. The tensor product Ψ denotes s coordinates ζ (i) (1 ≤ i ≤ s), which represent either ξ(i) or ηTðiÞ . The number of ξ(i) is q and that of ηTðiÞ is r with q + r = s. The symbols ξ(i) and ηTðiÞ represent ξðiÞ =

ξ1ðiÞ ⋮ ξnðiÞ

ð1 ≤ i ≤ qÞ, ηTðiÞ =

η 1ð i Þ ⋮ η nð i Þ

ð1 ≤ i ≤ r Þ,

respectively. Thus, Ψ is described by the tensor products of column vectors. Consequently, Ψ has a dimension of nq + r = ns. In Sect. 22.5, we have introduced the field tensor Fμν, i.e., a (2, 0)-type tensor which is represented by a matrix. To express a physical quantity in a tensor form, Ψ is usually described as ξ ξ

⋯ξ

Ψ  T ηððqrÞÞηððrq-- 11ÞÞ⋯ηðð11ÞÞ :

ð24:146Þ

Then, $\Psi$ is transformed by $R$ such that

$$\Psi' = T'^{\xi_{(q)}\xi_{(q-1)}\cdots\xi_{(1)}}{}_{\eta_{(r)}\eta_{(r-1)}\cdots\eta_{(1)}} = (R^{-1})^\alpha{}_{\eta_{(r)}}\,(R^{-1})^\beta{}_{\eta_{(r-1)}}\cdots(R^{-1})^\gamma{}_{\eta_{(1)}}\;R^{\xi_{(q)}}{}_\rho\,R^{\xi_{(q-1)}}{}_\sigma\cdots R^{\xi_{(1)}}{}_\tau\;T^{\rho\sigma\cdots\tau}{}_{\alpha\beta\cdots\gamma}. \quad (24.147)$$

As can be seen above, if the tensor product of the basis vectors $\Gamma$ is fixed (normally at the standard basis) in (24.144), the tensor $T$ and its transformation are determined by the part $\Theta\,\Psi$ in (24.144). Moreover, since the operator $R$ is related to the invariant metric tensor $\mathcal{M}$ through (24.104), the most important factor of (24.144) is the tensor product $\Psi$ of the coordinates. Hence, it is often the case in practice in physics and natural science that $\Psi$ described by (24.146) is simply referred to as a tensor. Let us think of the transformation property of the following (1, 1)-type tensor $\Xi^j{}_i$:

$$\Xi'^k{}_i = \sum_{j,l=1}^n (R^{-1})^l{}_i\,R^k{}_j\,\Xi^j{}_l. \quad (24.148)$$

Putting $k = i$ in (24.148) and taking the sum over $i$, we obtain

$$\sum_{i=1}^n \Xi'^i{}_i = \sum_{i,j,l=1}^n (R^{-1})^l{}_i\,R^i{}_j\,\Xi^j{}_l = \sum_{j,l=1}^n \delta^l_j\,\Xi^j{}_l = \sum_{l=1}^n \Xi^l{}_l \equiv \Xi^l{}_l. \quad (24.149)$$

Thus, (24.149) represents an invariant quantity that does not depend on the coordinate transformation. The above computation rule is known as tensor contraction. In this case, the summation symbol is usually omitted, a rule referred to as the Einstein summation convention (see Chap. 21). We readily infer from (24.149) that a $(q, r)$-type tensor is converted into a $(q - 1, r - 1)$-type tensor by contraction.
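The contraction invariance (24.149) can be sketched numerically. For a (1, 1)-type tensor, the transformation (24.148) reads $\Xi' = R\,\Xi\,R^{-1}$ in matrix form, and its contraction is the trace. The following illustration (not from the text; random entries, and a random matrix is almost surely non-singular) uses `np.einsum`, which implements exactly the Einstein summation convention mentioned above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
Xi = rng.normal(size=(n, n))   # components Xi^j_l of a (1, 1)-type tensor
R = rng.normal(size=(n, n))    # a (almost surely) non-singular transformation
R_inv = np.linalg.inv(R)

# (24.148): Xi'^k_i = R^k_j Xi^j_l (R^-1)^l_i, i.e., Xi' = R Xi R^-1 in matrix form
Xi_prime = np.einsum('kj,jl,li->ki', R, Xi, R_inv)

# (24.149): the contraction (the trace) is invariant under the transformation
assert np.isclose(np.trace(Xi_prime), np.trace(Xi))
```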

24.1.7 Euclidean Space and Minkowski Space

As typical examples of the real vector spaces often dealt with in mathematical physics, we have the Euclidean space and the Minkowski space. Even though the former is familiar enough even at the elementary school level, we wish to revisit and summarize related topics. The metric tensor of the Euclidean space is given by

$$\mathcal{M} = (\mathcal{M})_{ij} = \delta_{ij}. \quad (24.150)$$

Hence, from (24.104) the transformation $R$ that keeps the bilinear form $[x|x]$ invariant satisfies the condition

$$R^T E R = R^T R = E, \quad (24.151)$$

where $E$ is the identity operator of the Euclidean space. That is, $R$ must be an orthogonal matrix. From (24.120), we have $\xi_i = \xi^i$. Also, from (24.123) we have

$$\xi'_i = \sum_{k=1}^n \xi_k\,(R^{-1})^k{}_i = \sum_{k=1}^n \xi_k\,(R^T)^k{}_i = \sum_{k=1}^n \xi_k\,R^i{}_k = \sum_{k=1}^n R^i{}_k\,\xi^k. \quad (24.152)$$

Comparison with (24.107) shows that $\xi_i$ has the same transformation property as $\xi^i$. Therefore, we do not have to distinguish a contravariant vector from a covariant vector in the Euclidean space. The bilinear form $[x|x]$ is identical with the positive definite inner product $\langle x|x\rangle$ in a real vector space.

In the $n$-dimensional Minkowski space, the metric tensor $(\mathcal{M})_{ij} = [f_i|e_j]$ is represented by


$$[f_i|e_j] = -\delta_{ij}\zeta_i = -\delta_{ij}\zeta_j = [f_j|e_i]; \quad \zeta_0 = -\zeta_k = -1 \ (k = 1, \dots, n-1). \quad (24.153)$$

In the four-dimensional case, $\mathcal{M}$ is expressed as

$$\mathcal{M} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}. \quad (24.97)$$

The Minkowski space has several properties different from those of the Euclidean space because of the difference in the metric tensor. First, we make a general discussion concerning the vector spaces, followed by specific topics related to the Minkowski space as contrasted with the properties of the Euclidean space. The discussion is based on the presence of the bilinear form that has already been mentioned in Sect. 24.1.3. First, we formally give the following definition.

Definition 24.4 [5] Let $W$ be a subspace of a vector space $V$. Suppose that with some $w_0\ (\neq 0) \in W$ we have

$$[w_0|w] = 0 \quad \text{for } \forall w \in W. \quad (24.154)$$

Then, the subspace $W$ is said to be degenerate. Otherwise, $W$ is called non-degenerate. If we have

$$[u|v] = 0 \quad \text{for } \forall u, v \in W,$$

$W$ is called totally degenerate.

The statement that the subspaces are degenerate, non-degenerate, or totally degenerate in the above definition is clear from the definition of a subspace (Sect. 11.1) and of $[\,|\,]$. The existence of degenerate and totally degenerate subspaces results from the presence of a singular vector, i.e., a vector $u\ (\neq 0)$ that satisfies $[u|u] = 0$. Note that any subspace of the Euclidean space is non-degenerate and that no singular vector is present in the Euclidean space except for the zero vector. If that were not the case, from (24.154) with $\exists w_0\ (\neq 0) \in W$ we would have $[w_0|w_0] = 0$, in contradiction to the positive definite inner product of the Euclidean space. In Sect. 14.1, we showed a direct sum decomposition of a vector space into its subspaces. The concept of the direct sum decomposition plays an essential role in this section as well. The definition of an orthogonal complement stated in (14.1), however, needs to be modified in such a way that

$$W^\perp \equiv \{|x];\ [x|y] = 0 \ \text{for } \forall |y] \in W\}. \quad (24.155)$$

In that case, $W^\perp$ is said to be an orthogonal complement of $W$.
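A degenerate subspace is easy to exhibit numerically. The following sketch (an illustration not from the text) checks that in the four-dimensional Minkowski metric the single vector $u = e_0 + e_1$ is singular and spans a totally degenerate subspace, every element of which is also orthogonal to $u$ in the sense of (24.155):

```python
import numpy as np

M = np.diag([1.0, -1.0, -1.0, -1.0])  # Minkowski metric (24.97)

# u = e_0 + e_1 is a singular vector: [u|u] = 0 although u != 0
u = np.array([1.0, 1.0, 0.0, 0.0])
assert u @ M @ u == 0.0

# W = Span{u} is totally degenerate: [a u | b u] = 0 for all scalars a, b,
# so u itself lies in W's orthogonal complement and W ∩ W⊥ != {0}
for a in (1.0, -2.0, 0.5):
    for b in (1.0, 3.0):
        assert (a * u) @ M @ (b * u) == 0.0
```

This is exactly the situation that Proposition 24.1 below rules out for non-degenerate subspaces.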


In this context, we have the following important proposition and theorems; we see how the present situation compares with the previous results obtained in the inner product space (Sect. 14.1).

Proposition 24.1 [5] Let $V$ be a vector space where a bilinear form $[\,|\,]$ is defined. Let $W$ be a subspace of $V$ and let $W^\perp$ be an orthogonal complement of $W$. A necessary and sufficient condition for $W \cap W^\perp = \{0\}$ to hold is that $W$ is non-degenerate.

Proof (i) Sufficient condition: To prove "$W$ is non-degenerate $\Longrightarrow W \cap W^\perp = \{0\}$," we prove its contraposition. That is, suppose $W \cap W^\perp \neq \{0\}$. Then, we have $0 \neq \exists w_0$ which satisfies $w_0 \in W \cap W^\perp$. From the definition (24.155) of the orthogonal complement and the assumption $w_0 \in W^\perp$, we have

$$[w_0|w] = 0 \quad \text{for } 0 \neq w_0 \in W,\ \forall w \in W. \quad (24.156)$$

But (24.156) implies that $W$ is degenerate. (ii) Necessary condition: Let us suppose that when $W \cap W^\perp = \{0\}$, $W$ is degenerate. Then, from the definition (24.154), (24.156) holds for some $w_0\ (\neq 0) \in W$. But, from the definition of the orthogonal complement, (24.156) implies $w_0 \in W^\perp$. That is, we have $0 \neq w_0 \in W \cap W^\perp$. This is, however, in contradiction to $W \cap W^\perp = \{0\}$. Then, $W$ must be non-degenerate. These complete the proof.

Theorem 24.4 [5] Let $V$ be a vector space with a dimension $n$ where a bilinear form $[\,|\,]$ is defined. Let $W$ be a subspace of $V$ with a dimension $r$. If $W$ is non-degenerate, we have

$$V = W \oplus W^\perp. \quad (24.157)$$

Moreover, we have

$$(W^\perp)^\perp = W, \quad (24.158)$$

$$\dim W^\perp = n - r. \quad (24.159)$$

Proof If $W$ is non-degenerate, Theorem 14.1 can be employed. As already seen, we can always construct an orthonormal basis such that [5]

$$[f_i|e_j] = \delta_{ij}\zeta_i = \delta_{ij}\zeta_j \quad (\zeta_j = \pm 1); \quad (24.160)$$

see (24.94) and (24.115). Notice that the terminology of orthonormal basis is borrowed from that of the inner product space and that the existence of the orthonormal basis can be proven in parallel with the proofs of Theorem 13.2 (Gram–Schmidt orthonormalization theorem) and Theorem 13.3 [4].

Suppose that with $\forall |x]\ (\in V)$, $|x]$ is described as $|x] = \sum_{i=1}^n \xi^i|e_i]$. Then,

$$[f_j|x] = \sum_{i=1}^n \xi^i\,[f_j|e_i] = \sum_{i=1}^n \xi^i\,\delta_{ji}\zeta_i = \xi^j\zeta_j. \quad (24.161)$$

Note that the summation with respect to $j$ was not taken. Multiplying both sides of (24.161) by $\zeta_j$, we have

$$\xi^j = \zeta_j\,[f_j|x]. \quad (24.162)$$

Thus, we obtain

$$|x] = \sum_{j=1}^n \zeta_j\,[f_j|x]\,|e_j]. \quad (24.163)$$

Since $W$ is a non-degenerate subspace, it is always possible to choose the orthonormal basis for it. Then, we can make $\{e_1, \dots, e_r\}$ the basis set of $W$. In fact, with $\exists |w_0] \in W$ and $\forall w \in W$, describing

$$|w_0] = \sum_{k=1}^r \eta_0^k\,|e_k], \qquad |w] = \sum_{j=1}^r \eta^j\,|e_j], \quad (24.164)$$

we obtain

$$[w_0|w] = \sum_{k,j=1}^r \eta_0^k\,\eta^j\,[f_k|e_j] = \sum_{k,j=1}^r \eta_0^k\,\eta^j\,\delta_{kj}\zeta_j = \sum_{j=1}^r \eta_0^j\,\eta^j\,\zeta_j. \quad (24.165)$$

For $[w_0|w] = 0$ to hold with $\forall w \in W$, i.e., with $\forall \eta^j \in \mathbb{R}$ $(j = 1, \dots, r)$, we must have $\eta_0^j\zeta_j = 0$. As $\zeta_j = \pm 1\ (\neq 0)$, we must have $\eta_0^j = 0$ $(j = 1, \dots, r)$, that is, $|w_0] = 0$. This means that $W$ is non-degenerate. Thus, using the orthonormal basis we represent $|x'] \in W$ as

$$|x'] = \sum_{j=1}^r \zeta_j\,[f_j|x]\,|e_j], \quad (24.166)$$

where we used (24.162). Putting $|x''] = |x] - |x']$ and operating $[f_k|$ $(1 \le k \le r)$ on both sides from the left, we get

$$[f_k|x''] = \sum_{j=1}^n \zeta_j\,[f_j|x]\,[f_k|e_j] - \sum_{j=1}^r \zeta_j\,[f_j|x]\,[f_k|e_j] = (\zeta_k)^2\,[f_k|x] - (\zeta_k)^2\,[f_k|x] = [f_k|x] - [f_k|x] = 0. \quad (24.167)$$

Note that in (24.167) $[f_k|e_j] = 0$ with $r + 1 \le j \le n$. Rewriting (24.167), we have


$$[x''|e_k] = 0 \quad (1 \le k \le r). \quad (24.168)$$

Since $W = \mathrm{Span}\{e_1, \dots, e_r\}$, (24.168) implies that $[x''|w] = 0$ with $\forall |w] \in W$. Consequently, from the definition of the orthogonal complement, we have $x'' \in W^\perp$. Thus, we obtain

$$|x] = |x'] + |x''], \quad (24.169)$$

where $|x] \in V$, $|x'] \in W$, and $x'' \in W^\perp$. Then, from Proposition 24.1 we get

$$V = W \oplus W^\perp. \quad (24.157)$$

From Theorem 11.2, we immediately get

$$\dim W^\perp = n - r. \quad (24.159)$$

Also, from the above argument we have $W^\perp = \mathrm{Span}\{e_{r+1}, \dots, e_n\}$. This once again implies that $W^\perp$ is non-degenerate as well. Then, we have

$$V = W^\perp \oplus (W^\perp)^\perp. \quad (24.170)$$

Therefore, we obtain

$$\dim (W^\perp)^\perp = n - (n - r) = r. \quad (24.171)$$

From the definition (24.155) of the orthogonal complement, we have

$$(W^\perp)^\perp = \{|x];\ [x|y] = 0 \ \text{for } \forall |y] \in W^\perp\}. \quad (24.172)$$

Then, if $|x] \in W$, again from the definition such $|x]$ must be contained in $(W^\perp)^\perp$; that is, $(W^\perp)^\perp \supset W$ [1, 2]. But, from the assumption and (24.171), $\dim (W^\perp)^\perp = \dim W$. This implies [1, 2]

$$(W^\perp)^\perp = W. \quad (24.158)$$

These complete the proof.

Getting back to (24.93), we have two non-degenerate subspaces of $V$: one whose basis vectors comprise $|e_i]$ $(i = 1, \dots, p)$, for which $[f_k|e_i] = \delta_{ki}$, and one whose basis vectors comprise $|e_j]$ $(j = p + 1, \dots, p + q = n)$, for which $[f_k|e_j] = -\delta_{kj}$. Let us define two such subspaces as

$$W_+ = \mathrm{Span}\{e_1, \dots, e_p\}, \qquad W_- = \mathrm{Span}\{e_{p+1}, \dots, e_{p+q}\}. \quad (24.173)$$

Then, from Theorem 24.4 we have

$$V = W_+ \oplus W_-. \quad (24.174)$$

In (24.173) we redefine $p$ and $q$ as

$$p = n_+, \quad q = n_-; \quad n_+ + n_- = n.$$

We have

$$\dim V = \dim W_+ + \dim W_- = n_+ + n_- = n. \quad (24.175)$$

We name $W_+$ and $W_-$ the Euclidean subspace and the anti-Euclidean subspace, respectively. With respect to $W_+$ and $W_-$, we have the following theorem.

Theorem 24.5 [5] Let $V$ be a vector space with a dimension $n$. Let $W_+$ and $W_-$ be the Euclidean subspace and the anti-Euclidean subspace of $V$ with the dimensions $n_+$ and $n_-$, respectively $(n_+ + n_- = n)$. Then, there must be a totally degenerate subspace $W_0$ with a dimension $n_0$ given by

$$n_0 = \min(n_+, n_-). \quad (24.176)$$

Proof From Theorem 24.4, we have $W_+ \cap W_- = \{0\}$. From the definition of the totally degenerate subspace, obviously we have $W_+ \cap W_0 = W_- \cap W_0 = \{0\}$. (Otherwise, $W_+$ and $W_-$ would be degenerate.) From this, we have

$$n_+ + n_0 \le n, \qquad n_- + n_0 \le n. \quad (24.177)$$

Then, we obtain

$$n_0 \le \min(n - n_+,\, n - n_-) = \min(n_-, n_+) \equiv m. \quad (24.178)$$

Suppose $n_+ = m \le n_-$. Then, we can choose two sets of orthonormal bases: $\{e_{i_1}, \dots, e_{i_m}\}$, for which $[f_{i_k}|e_{i_l}] = \delta_{i_k i_l}$, and $\{e_{j_1}, \dots, e_{j_m}\}$, for which $[f_{j_k}|e_{j_l}] = -\delta_{j_k j_l}$. Let us think of the following set $W$ given by

$$W = \{e_{i_1} - e_{j_1}, \dots, e_{i_m} - e_{j_m}\}. \quad (24.179)$$

Defining

$$e_{i_s} - e_{j_s} \equiv E_s, \qquad f_{i_s} - f_{j_s} \equiv F_s \quad (s = 1, \dots, m),$$

we have

$$[F_s|E_t] = 0 \quad (s, t = 1, \dots, m). \quad (24.180)$$

Let us consider the following equation:

$$c^1 E_1 + \cdots + c^m E_m = 0. \quad (24.181)$$

Suppose that $E_1, \dots, E_m$ are linearly dependent. Then, without loss of generality we may assume $c^1 \neq 0$. Dividing both sides of (24.181) by $c^1$ and expressing (24.181) in terms of $e_{i_1}$, we obtain

$$e_{i_1} = e_{j_1} - \frac{c^2}{c^1}\left(e_{i_2} - e_{j_2}\right) - \cdots - \frac{c^m}{c^1}\left(e_{i_m} - e_{j_m}\right). \quad (24.182)$$

Equation (24.182) shows that $e_{i_1}$ is described by a linear combination of the other $(2m - 1)$ linearly independent vectors, but this is in contradiction to the fact that the $2m$ vectors $e_{i_1}, \dots, e_{i_m}; e_{j_1}, \dots, e_{j_m}$ are linearly independent. Thus, $E_1, \dots, E_m$ must be linearly independent. Hence, $\mathrm{Span}\{E_1, \dots, E_m\}$ forms an $m$-dimensional vector space. Equation (24.180) implies that it is a totally degenerate subspace of $V$ with a dimension of $m$. This immediately means that

$$W_0 = \mathrm{Span}\{e_{i_1} - e_{j_1}, \dots, e_{i_m} - e_{j_m}\}. \quad (24.183)$$

Then, from the assumption we have $n_0 = m$; namely, the equality holds in (24.178), so that (24.176) is established. If, on the other hand, we assume $n_- = m \le n_+$, proceeding in a similar way we reach the same conclusion, i.e., $n_0 = m$. This completes the proof.

Since we have made a general discussion and remarks on the vector spaces where the bilinear form is defined, we wish to show examples of the Minkowski spaces $\mathbb{M}^4$ and $\mathbb{M}^3$.

Example 24.3 We study several properties of the Minkowski space, taking the four-dimensional space $\mathbb{M}^4$ and the three-dimensional space $\mathbb{M}^3$ as examples. For both $\mathbb{M}^4$ and $\mathbb{M}^3$, the dimension of a totally degenerate subspace is 1 (Theorem 24.5). In $\mathbb{M}^4$, $n_+$ and $n_-$ of (24.176) are 1 and 3, respectively; see (24.97). The space $\mathbb{M}^4$ is denoted by

$$\mathbb{M}^4 = \mathrm{Span}\{e_0, e_1, e_2, e_3\}, \quad (24.184)$$

where $e_0$, $e_1$, $e_2$, and $e_3$ are defined according to (24.95) with

$$[f_i|e_j] = -\delta_{ij}\zeta_i = -\delta_{ij}\zeta_j = [f_j|e_i]; \quad \zeta_0 = -\zeta_k = -1 \ (k = 1, 2, 3). \quad (24.185)$$

With the notation $\zeta_k$ $(k = 0, \dots, 3)$, see (22.323). According to Theorem 24.4, we have, e.g., the following orthogonal decomposition:

$$\mathbb{M}^4 = \mathrm{Span}\{e_0\} \oplus \mathrm{Span}\{e_0\}^\perp,$$

where $\mathrm{Span}\{e_0\}^\perp = \mathrm{Span}\{e_1, e_2, e_3\}$. Since $[f_0|e_0] = 1$, $e_0$ is a timelike vector. As $[f_1 + f_2 + f_3|e_1 + e_2 + e_3] = -3$, $e_1 + e_2 + e_3$ is a spacelike vector, etc. Meanwhile, according to Theorem 24.5, the dimension $n_0$ of the totally degenerate subspace $W_0$ is $n_0 = \min(n_+, n_-) = \min(1, 3) = 1$. Then, for instance, we have

$$W_0 = \mathrm{Span}\{|e_0] + |e_1]\}. \quad (24.186)$$

Proposition 24.1 tells us that it is impossible to get an orthogonal decomposition using a degenerate subspace. In fact, as an orthogonal complement to $W_0$ we have

$$W_0^\perp = \mathrm{Span}\{e_0 + e_1, e_2, e_3\}. \quad (24.187)$$

Then, $W_0 \cap W_0^\perp = W_0 \ni |e_0] + |e_1] \neq 0$. Note that $W_0^\perp$ can alternatively be expressed as [5]

$$W_0^\perp = W_0 \oplus \mathrm{Span}\{e_2, e_3\}. \quad (24.188)$$

The presence of the totally degenerate subspace $W_0$ has a special meaning in the theory of special relativity. The collection of all singular vectors forms a light cone. Since the dimension of $W_0$ is one, it is spanned by a single lightlike vector (i.e., singular vector), e.g., $|e_0] + |e_1]$, which may be chosen from among any of the singular vectors. Let us denote a singular vector by $u_\lambda$ $(\lambda \in \Lambda)$ and the collection of all the singular vectors that form the light cone by $L$. Then, we have

$$L = \bigcup_{\lambda \in \Lambda} u_\lambda. \quad (24.189)$$

With the above notation, see Sect. 6.1.1. Notice that the set $L$ is not a vector space (vide infra). The totally degenerate subspace $W_0$ can be described by

$$W_0 = \mathrm{Span}\{u_\lambda\},$$

where $u_\lambda$ is any single singular vector. We wish to consider the coordinate representation of the light cone in the orthogonal coordinate system. In the previous chapters, by the orthogonal coordinate system we meant a two- or three-dimensional Euclidean space, i.e., a real inner product space accompanied by an orthonormal basis. In this chapter, however, we use the terminology of orthogonal coordinate system and orthonormal basis if the basis vectors satisfy the relations of (24.153). Let the vector $w \in L$ be expressed as

$$w = \sum_{i=0}^3 \xi^i e_i. \quad (24.190)$$

Then, we obtain 3 i=0

ξ i ei j

3 k=0

ξ k ek = ξ 0

2

- ξ1

2

- ξ2

2

- ξ3

2

= 0:

ð24:191Þ

That is, ξ1

2

þ ξ2

2

þ ξ3

2

2

= ξ0 :

ð24:192Þ

If we regard the time coordinate (represented by ξ0) as parameter, the light cone can be seen as an expanding (or shrinking) hypersphere in 4 with elapsing time (both past and future directions). In Fig. 24.1a, we depict a geometric structure including the light cone of the three-dimensional Minkowski space 3 so that the geometric feature can appeal to eye. There, we have ξ1

2

þ ξ2

2

2

= ξ0 ,

ð24:193Þ

where ξ^1 and ξ^2 represent the coordinates of the x- and y-axes, respectively, and ξ^0 indicates the time coordinate t. Figure 24.1b shows how to make a right circular cone contained in the light cone. The light cone comprises the whole collection of singular vectors uλ (λ ∈ Λ). Geometrically, it consists of a couple of right circular cones that come face-to-face with each other at the vertices, with the axis (i.e., a line that connects the vertex and the center of the bottom circle) shared in common by the two right circular cones in M^3 (see Fig. 24.1a). A lightlike vector (i.e., singular vector) is represented by a straight line obliquely extending from the origin in both the future and past directions in the orthogonal coordinate system. Since individual lightlike vectors form the light cone, it is a kind of ruled surface. A circle expanding (or shrinking) with time ξ^0 according to (24.193) is a conic section of the right circular cone shaping the light cone. In Fig. 24.1a, the light cone comprises a two-dimensional figure, and so one might well wonder whether the light cone is a two-dimensional vector space. Think, however, of, e.g., |e0] + |e1] and |e0] - |e1] within the light cone. If those vectors were basis vectors, (|e0] + |e1]) + (|e0] - |e1]) = 2|e0] would be a vector of the light cone as well. That is, however, evidently not the case, because the vector |e0] is located

Fig. 24.1 Geometric structure including the light cone of the three-dimensional Minkowski space M^3. (a) The light cone consists of a couple of right circular cones that come face-to-face with each other at the vertices, with the axis (i.e., a line that connects the vertex and the center of the bottom circle) shared in common. A lightlike vector (i.e., singular vector) is represented by an oblique arrow extending from the origin in both the future and past directions. Individual lightlike vectors form the light cone as a ruled surface. (b) Simple kit that helps visualize the light cone. To make it, follow the next procedures: (i) Take a thick sheet of paper and cut out two circles to shape them into sectors with a central angle θ = √2 π radian ≈ 254.6°. (ii) Bring together the two sides of each sector so that it becomes a right circular cone with a vertical angle of a right angle (i.e., 90°). (iii) Bring together the two right circular cones so that they are joined at their vertices with the axis shared in common, to shape the light cone depicted in (a) above

outside the light cone. Whereas a vector "inside" the light cone is timelike, one "outside" the light cone is spacelike. We have, e.g.,

M^3 = Span{e0} ⊕ Span{e0}^⊥,  (24.194)

where Span{e0}^⊥ = Span{e1, e2}. Since [f0|e0] = 1, e0 is a timelike vector. As [f1 + f2|e1 + e2] = -2, e1 + e2 is a spacelike vector, etc. Meanwhile, the dimension n0 of the totally degenerate subspace W̃0 of M^3 is n0 = min (n+, n-) = min (1, 2) = 1. Then, for instance, as in (24.186), we have

W̃0 = Span{|e0] + |e1]}.  (24.195)

Also, as an orthogonal complement to W̃0, we obtain

W̃0^⊥ = Span{e0 + e1, e2}  or  W̃0^⊥ = W̃0 ⊕ Span{e2}.  (24.196)

The orthogonal complement W̃0^⊥ forms a tangent space to the light cone L. We have W̃0 ∩ W̃0^⊥ = W̃0 = Span{|e0] + |e1]} ≠ 0.
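The timelike/spacelike/lightlike classification used throughout this subsection can be checked numerically. The following minimal sketch (an illustration under this chapter's metric convention η = diag(1, -1, -1, -1); the helper function name is ours, not the book's notation) evaluates the bilinear form on the vectors discussed above:

```python
import numpy as np

# Metric tensor eta = diag(1, -1, -1, -1), cf. (24.153)
eta = np.diag([1.0, -1.0, -1.0, -1.0])

def bilinear(x, y):
    """Minkowski bilinear form [x|y] (illustrative helper)."""
    return np.asarray(x, dtype=float) @ eta @ np.asarray(y, dtype=float)

e0 = np.array([1.0, 0.0, 0.0, 0.0])
e1 = np.array([0.0, 1.0, 0.0, 0.0])

assert bilinear(e0, e0) > 0             # e0 is timelike
assert bilinear(e1, e1) < 0             # e1 is spacelike
assert bilinear(e0 + e1, e0 + e1) == 0  # |e0] + |e1] is lightlike (singular)
```

The zero-norm vector e0 + e1 is exactly the one spanning the totally degenerate subspace W0 of (24.186).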

As described above, we have overviewed the characteristics of the extended concept of vector spaces and studied several properties which the inner product space and the Euclidean space lack. Examples included the singular vectors and degenerate subspaces as well as their associated properties. We have already seen that the indefinite metric causes intractability in the field quantization; the presence of the singular vectors is partly responsible for it. The theory of vector spaces is fertile ground for mathematics and physics. Indeed, the theory deals with a variety of vector spaces such as an inner product space (Chap. 13), Euclidean space, Minkowski space, etc. Besides them, a spinor space is also included, even though we do not go into detail about it; interested readers are referred to the literature on this topic [5]. Yet, the spinor space naturally makes an appearance in the Dirac equation as a representation space. We will deal with the Dirac equation from a different angle in the following sections.

24.2 Lorentz Group and Lorentz Transformations

In the previous section, we have learned that the transformation R that makes the bilinear form [ | ] invariant must satisfy the condition described by

R^T M R = M.  (24.104)

At the same time, (24.104) has given a condition for the metric tensor M to be invariant with respect to the vector (or coordinate) transformation R in a vector space where [ | ] is defined. In the Minkowski space M^4, R of (24.104) is called the Lorentz transformation and written as Λ according to custom. Also, M is denoted by η. Then, we have

η = Λ^T η Λ.  (24.197)

Equation (24.197) implies that the matrix η is determined independently of the choice of the inertial frame of reference. Or rather, the mathematical formulation of the theory of special relativity has been established in such a way that η is invariant in any inertial frame of reference. This has its origin in the invariance of the world interval of the events (see Sect. 21.2.2). As already mentioned in Sect. 21.2, the Lorentz group consists of all the Lorentz transformations Λ. The Lorentz group is usually denoted by O(3, 1). On the basis of (24.197), we define O(3, 1) as the set that satisfies the following relation:

O(3, 1) = {Λ ∈ GL(4, ℝ); Λ^T η Λ = η}.  (24.198)

Suppose that Λ1, Λ2 ∈ O(3, 1). Then, we have

(Λ1 Λ2)^T η Λ1 Λ2 = Λ2^T Λ1^T η Λ1 Λ2 = Λ2^T η Λ2 = η.  (24.199)

That is, Λ1 Λ2 ∈ O(3, 1). With the identity operator E, we have

E^T η E = EηE = ηEE = η.  (24.200)

Multiplying both sides of Λ1^T η Λ1 = η by (Λ1^T)^{-1} from the left and (Λ1)^{-1} from the right, we obtain

η = (Λ1^T)^{-1} η (Λ1)^{-1} = (Λ1^{-1})^T η (Λ1)^{-1},  (24.201)

where we used (Λ1^T)^{-1} = (Λ1^{-1})^T. From (24.199) to (24.201), we find that O(3, 1) certainly satisfies the definitions of a group; see Sect. 16.1. Taking the determinant of (24.197), we have

det Λ^T det η det Λ = (det Λ)(-1)(det Λ) = det η = -1,  (24.202)

where with the first equality we used det Λ^T = det Λ [1, 2]. That is, we obtain

(det Λ)^2 = 1 or det Λ = ±1.  (24.203)
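The defining relation (24.198), its group properties, and the determinant condition (24.203) can be illustrated numerically for concrete transformations. A minimal sketch (the boost and rotation matrices used here are the forms appearing later in this section; sample parameter values are ours):

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])

def is_lorentz(L):
    """Defining relation Lambda^T eta Lambda = eta of (24.198)."""
    return np.allclose(L.T @ eta @ L, eta)

w, th = 0.7, 0.3  # sample rapidity and rotation angle
boost = np.eye(4)
boost[0, 0] = boost[3, 3] = np.cosh(w)
boost[0, 3] = boost[3, 0] = np.sinh(w)
rot = np.eye(4)
rot[1, 1] = rot[2, 2] = np.cos(th)
rot[1, 2], rot[2, 1] = -np.sin(th), np.sin(th)

assert is_lorentz(boost) and is_lorentz(rot)
assert is_lorentz(rot @ boost)                           # closure, cf. (24.199)
assert is_lorentz(np.linalg.inv(boost))                  # inverses, cf. (24.201)
assert np.isclose(abs(np.linalg.det(rot @ boost)), 1.0)  # cf. (24.203)
```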

Lorentz groups are classified according to the sign of the determinant and the (0, 0) component (i.e., Λ^0_0) of the Lorentz transformation matrix. Using the notation of tensor elements, (24.197) can be written as

η_ρσ = η_μν Λ^μ_ρ Λ^ν_σ.  (24.204)

In (24.204), putting ρ = σ = 0, we get

η_00 (Λ^0_0)^2 + Σ_{i=1}^{3} η_ii (Λ^i_0)^2 = η_00,

namely, we have

(Λ^0_0)^2 = 1 + Σ_{i=1}^{3} (Λ^i_0)^2.  (24.205)

Hence,

(Λ^0_0)^2 ≥ 1.  (24.206)

Therefore, we obtain either

Λ^0_0 ≥ 1  or  Λ^0_0 ≤ -1.  (24.207)

Table 24.1 Four connected components of the Lorentz group^a

Connected component      Nomenclature                  Λ^0_0        det Λ
L0 ≡ SO0(3, 1)           Proper orthochronous          Λ^0_0 ≥ 1    +1
L1                       Improper orthochronous        Λ^0_0 ≥ 1    -1
L2                       Improper anti-orthochronous   Λ^0_0 ≤ -1   -1
L3                       Proper anti-orthochronous     Λ^0_0 ≤ -1   +1

^a With the notations L0, L1, L2, and L3, see Sect. 24.9.2

Notice that in (24.207) the case of Λ^0_0 ≥ 1 is not accompanied by time reversal, whereas the case of Λ^0_0 ≤ -1 is accompanied by time reversal. Combining (24.203) and (24.207), we get four combinations for the Lorentz groups, and these combinations correspond to individual connected components of the group (see Chap. 20). Table 24.1 summarizes them [6-8]. The connected component that contains the identity element is an invariant subgroup of O(3, 1); see Theorem 20.9. This component is termed the proper orthochronous Lorentz group and denoted by SO0(3, 1). The notation differs from literature to literature, and so care should be taken accordingly. In this book, to denote the proper orthochronous Lorentz group, we use SO0(3, 1) according to Hirai [6], for which the index 0 means the orthochronous transformation and S stands for the proper transformation (i.e., pure rotation without spatial inversion). In Table 24.1, we distinguish the improper Lorentz transformation from the proper transformation according to whether the determinant of the transformation matrix Λ is -1 or +1 [7]. We will come back to the discussion about the construction of the Lorentz group in Sect. 24.9.2 in terms of the connectedness.
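The component relation (24.205) and the orthochronous condition of Table 24.1 can be confirmed numerically for a product of a boost and a rotation; a short sketch (sample parameter values are ours):

```python
import numpy as np

# Verify (24.205)/(24.207): (Lambda^0_0)^2 = 1 + sum_i (Lambda^i_0)^2,
# hence |Lambda^0_0| >= 1, for a boost combined with a rotation.
w, th = 1.2, 0.5
boost = np.eye(4)
boost[0, 0] = boost[3, 3] = np.cosh(w)
boost[0, 3] = boost[3, 0] = np.sinh(w)
rot = np.eye(4)
rot[1, 1] = rot[2, 2] = np.cos(th)
rot[1, 2], rot[2, 1] = -np.sin(th), np.sin(th)
L = rot @ boost

assert np.isclose(L[0, 0] ** 2, 1.0 + sum(L[i, 0] ** 2 for i in (1, 2, 3)))
# This L is proper orthochronous: det = +1 and Lambda^0_0 >= 1 (Table 24.1).
assert L[0, 0] >= 1.0 and np.isclose(np.linalg.det(L), 1.0)
```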

24.2.1 Lie Algebra of the Lorentz Group

The Lie algebra of the Lorentz group is represented by a (4, 4) real matrix. We have an important theorem about the Lie algebra of the Lorentz group.

Theorem 24.6 [8] Let the Lie algebra of the Lorentz group O(3, 1) be denoted by o(3, 1). Then, we have

o(3, 1) = {X; X^T η + ηX = 0, X ∈ gl(4, ℝ)}.  (24.208)


Proof Suppose that X ∈ o(3, 1). Then, from Definition 20.3, exp(sX) ∈ O(3, 1) for ∀s (s: real number). Moreover, from (24.198), we have

(exp sX)^T η (exp sX) = η.  (24.209)

Differentiating (24.209) with respect to s, we obtain

X^T (exp sX)^T η (exp sX) + (exp sX)^T η X (exp sX) = 0.  (24.210)

Taking the limit s → 0, we get

X^T η + ηX = 0.  (24.211)

Conversely, suppose that X satisfies (24.211). Then, using (15.31), we obtain

(exp sX)^T η = (exp sX^T) η = Σ_{ν=0}^{∞} (1/ν!)(sX^T)^ν η = Σ_{ν=0}^{∞} (s^ν/ν!)(X^T)^ν η.  (24.212)

In (24.212), we have

(X^T)^ν η = (X^T)^{ν-1} X^T η = (X^T)^{ν-1}(-ηX) = (X^T)^{ν-2} X^T(-ηX) = (X^T)^{ν-2} η(-1)^2 X^2 = ⋯ = η(-1)^ν X^ν.  (24.213)

Substituting (24.213) into (24.212), we get

(exp sX)^T η = Σ_{ν=0}^{∞} (s^ν/ν!) η(-1)^ν X^ν = η Σ_{ν=0}^{∞} (s^ν/ν!)(-X)^ν = η exp(-sX) = η (exp sX)^{-1},  (24.214)

where with the last equality we used (15.28). Multiplying both sides of (24.214) by exp sX from the right, we obtain

(exp sX)^T η (exp sX) = η.  (24.215)

This implies that exp sX ∈ O(3, 1), which in turn means X ∈ o(3, 1). These complete the proof.

Let us seek a tangible form of the Lie algebra of the Lorentz group. Suppose that X ∈ o(3, 1) of (24.208). Assuming that

X =
[a b c d]
[e f g h]
[p q r s]
[t u v w]  (a, ⋯, w: real)  (24.216)

and substituting X of (24.216) into (24.208), we get

[2a      b - e     c - p     d - t  ]
[b - e   -2f       -(g + q)  -(h + u)]
[c - p   -(g + q)  -2r       -(s + v)]
[d - t   -(h + u)  -(s + v)  -2w    ] = 0.  (24.217)

Then, we obtain

a = f = r = w = 0, b = e, c = p, d = t, g = -q, h = -u, s = -v.  (24.218)

The resulting form of X is

X =
[0  b  c  d]
[b  0  g  h]
[c -g  0  s]
[d -h -s  0].  (24.219)

Since we have ten conditions for sixteen elements, we must have six independent representations. We show a tangible example of these representations as follows: 0 0 0

0 0 0

0 0 0

0 0 -1

0

0

1

0

0 1 0

1 0 0

0 0 0

0 0 0

0

0

0

0

0 ,

,

0 0

0 0 0

0 0 0 1 0 0

0

-1

0

0 0 1

0 0 0

1 0 0

0 0 0

0

0

0

0

,

0

,

0 0 0

0 0 1

0

0

0 0 -1 0 0 0 0

0 0 0 0 0 0

0 0 0

1 0 0

1

0

0

0

;

0

: ð24:220Þ

The Lie algebra of the Lorentz group can be expressed as linear combinations of the above matrices (i.e., basis vectors of this Lie algebra). This Lie algebra is a six-dimensional vector space V6 accordingly. Note that in (24.220) the upper three matrices are anti-Hermitian and that the lower three matrices are Hermitian. We symbolically represent the upper three matrices and the lower three matrices as A  ðA1 , A2 , A3 Þ

and

B  ðB1 , B2 , B3 Þ,

ð24:221Þ


respectively, in the order from the left of (24.220). Regarding the upper three matrices, the (3, 3) principal submatrices excluding the (0, 0) element are identical with Ax, Ay, and Az in the shown order; see (20.26). In other words, these three matrices represent the spatial rotation in ℝ^3. Moreover, these matrices have been decomposed into a direct sum such that

A1 = {0} ⊕ Ax,  A2 = {0} ⊕ Ay,  A3 = {0} ⊕ Az,  (24.222)

where {0} denotes the (0, 0) element of A. The remaining three matrices B are called Lorentz boosts. Expressing magnitudes related to A and B as a ≡ (a1, a2, a3) and b ≡ (b1, b2, b3), respectively, we succinctly describe any Lorentz transformation Λ as

Λ = exp(a · A) exp(b · B).  (24.223)
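The six basis matrices of (24.220) can be checked directly against the Lie-algebra condition (24.211), and their linear independence confirms the six-dimensionality stated above; a minimal sketch:

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])

# Basis matrices of (24.220): rotation generators A1-A3, boost generators B1-B3.
A1 = np.zeros((4, 4)); A1[2, 3], A1[3, 2] = -1, 1
A2 = np.zeros((4, 4)); A2[1, 3], A2[3, 1] = 1, -1
A3 = np.zeros((4, 4)); A3[1, 2], A3[2, 1] = -1, 1
B1 = np.zeros((4, 4)); B1[0, 1] = B1[1, 0] = 1
B2 = np.zeros((4, 4)); B2[0, 2] = B2[2, 0] = 1
B3 = np.zeros((4, 4)); B3[0, 3] = B3[3, 0] = 1

basis = [A1, A2, A3, B1, B2, B3]
for X in basis:
    # Each basis element satisfies the condition (24.211) defining o(3,1)
    assert np.allclose(X.T @ eta + eta @ X, 0)

# The six matrices are linearly independent: the Lie algebra is 6-dimensional.
stacked = np.array([X.ravel() for X in basis])
assert np.linalg.matrix_rank(stacked) == 6
```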

The former factor of (24.223) is related to a spatial rotation and represented by a unitary matrix (see Sect. 15.2). The latter factor of (24.223) is pertinent to the Lorentz boost and represented by a Hermitian matrix (Sect. 25.3). Since the Lorentz transformation is represented by a real matrix, a matrix describing the Lorentz boost is symmetric. With the former factor, we obtain, e.g.,

exp(θA3) =
[1 0      0      0]
[0 cos θ  -sin θ 0]
[0 sin θ  cos θ  0]
[0 0      0      1],  (24.224)

where we put a3 = θ. With the Lorentz boost, we have, e.g.,

exp(-ωB3) =
[cosh ω  0 0 -sinh ω]
[0       1 0 0      ]
[0       0 1 0      ]
[-sinh ω 0 0 cosh ω ],  (24.225)

where we put b3 = -ω. In (24.224), we have 0 ≤ θ ≤ 2π; that is, the parameter space of θ is bounded. In (24.225), on the other hand, we have -∞ < ω < ∞, where the parameter space of ω is unbounded. The former group is called a compact group, whereas the latter is accordingly said to be a non-compact group. In general, with a compact group it is possible to construct a unitary representation [8]. For a non-compact group, however, it is not necessarily possible to construct a unitary representation. In fact, although exp(θA3) of (24.224) is unitary [see (7)' of Sect. 15.2], exp(-ωB3) in (24.225) is not unitary, but Hermitian. These features of the Lorentz group lead to various intriguing properties (vide infra).
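The closed forms (24.224) and (24.225) can be recovered from the generators by the matrix exponential; a small sketch (we use a truncated power series for exp, which is adequate for these small matrices; sample parameter values are ours):

```python
import numpy as np

def expm(X, terms=40):
    """Matrix exponential via truncated power series (sufficient here)."""
    out, term = np.eye(4), np.eye(4)
    for n in range(1, terms):
        term = term @ X / n
        out = out + term
    return out

A3 = np.zeros((4, 4)); A3[1, 2], A3[2, 1] = -1, 1   # rotation generator of (24.220)
B3 = np.zeros((4, 4)); B3[0, 3] = B3[3, 0] = 1      # boost generator of (24.220)

th, w = 0.4, 1.1
R = expm(th * A3)
# Rotation block of (24.224)
assert np.allclose(R[1:3, 1:3], [[np.cos(th), -np.sin(th)],
                                 [np.sin(th), np.cos(th)]])

Bst = expm(-w * B3)
# Boost entries of (24.225)
assert np.isclose(Bst[0, 0], np.cosh(w)) and np.isclose(Bst[0, 3], -np.sinh(w))
```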

24.2.2 Successive Lorentz Transformations [9]

In the last sections, we have studied how the Lorentz transformation is dealt with using the bilinear form and the Lie algebra in the Minkowski space. In the subsequent sections, we examine how the Dirac equation is transformed through the medium of the Lorentz transformation within the framework of standard matrix algebra. To examine the constitution of the Dirac equation appropriately, we need to deal with successive Lorentz transformations. These transformations can be well understood by the idea of the moving coordinate systems (Sect. 17.4.2). Let us express a four-vector X (i.e., a space-time vector) in the Minkowski space as

X = (|e0] |e1] |e2] |e3]) (x^0, x^1, x^2, x^3)^T,  (24.226)

where {|e0], |e1], |e2], |e3]} is the basis set of the Minkowski space defined in Sect. 24.1.7 and (x^0, x^1, x^2, x^3)^T are the corresponding coordinates. We will use the symbol ei (i = 0, 1, 2, 3) instead of |ei] in case confusion is unlikely to happen. Following Sect. 11.4, we specify the transformation in such a way that the said transformation keeps the vector X unaltered. In terms of the special theory of relativity, a vector is associated with a certain space-time point where an event has taken place. Consequently, the vector X should be the same throughout the inertial frames of reference that are connected to one another via the Lorentz transformations. In what follows, we express the four-vector X using the shorthand notation

X ≡ ex,

where e stands for the basis set of M^4 and x denotes the corresponding coordinates (i.e., a column vector). Let a transformation be expressed as P. We rewrite X as

X = ex = eP^{-1}(Px) = e′x′,  (24.227)

where e and e′ are the sets of basis vectors connected by P with e′ = eP^{-1} and Px = x′. Once again, we note that in the case where the Lorentz transformation is relevant to the expression, unlike (11.84) of Sect. 11.4, the coordinate representation is placed ahead of the basis vector representation (see Sect. 24.1.5). Next, let us think about several successive transformations P1, ⋯, Pn. We have

ex = eP1^{-1}⋯Pn^{-1}(Pn⋯P1 x) = e′(Pn⋯P1 x) = e′x′.  (24.228)

From the last equality, we obtain

x′ = Pn⋯P1 x,  (24.229)

because e′ is the basis set of vectors. Now, we wish to apply the above general point of view to our present case. Let us think of three successive Lorentz transformations. Of these, two transformations are spatial rotations and the remaining one is a Lorentz boost. We have four pertinent inertial frames of reference. We denote them by O, Oθ, Oϕ, and O′. Also, we distinguish the associated basis vectors and corresponding coordinates by

e, x (O); θ, xθ (Oθ); ϕ, xϕ (Oϕ); e′, x′ (O′).  (24.230)

We proceed with the calculations from the assumption that an electron is at rest in the frame O, where the Dirac equation can readily be solved. We perform the successive Lorentz transformations that start with O and finally reach O′ via Oθ and Oϕ such that

O (e) → Oθ (θ) → Oϕ (ϕ) → O′ (e′).  (24.231)

Then, the successive transformation of the basis vectors is described by

eΛb Λθ^{-1} Λϕ^{-1} = e′,  (24.232)

where the individual transformations are given in the moving coordinate. Figure 24.2a depicts these transformations similarly as in the case of Fig. 17.17. Equation (24.232) can be rewritten as

e = e′Λϕ Λθ Λb^{-1}.  (24.233)

In accordance with Fig. 24.2a, Fig. 24.2b shows how the motion of the electron looks from the frame O′, where the electron is moving at a velocity v in the direction defined by a zenithal angle θ and azimuthal angle ϕ. Correspondingly, from (24.229) we have

x′ = Λϕ Λθ Λb^{-1} x.  (24.234)

As the full description of (24.234), we get

(x′^0, x′^1, x′^2, x′^3)^T = Λϕ Λθ Λb^{-1} (x^0, x^1, x^2, x^3)^T.  (24.235)

Thus, we find that the three successive Lorentz transformations have produced the total Lorentz transformation Λ described by

Λ = Λϕ Λθ Λb^{-1},  (24.236)

where Λb^{-1}, related to the Lorentz boost, is the inverse matrix of (24.225). We compute the combined Lorentz transformation Λ such that

Λ = Λϕ Λθ Λb^{-1}

= [1, 0, 0, 0; 0, cos ϕ, -sin ϕ, 0; 0, sin ϕ, cos ϕ, 0; 0, 0, 0, 1]
× [1, 0, 0, 0; 0, cos θ, 0, sin θ; 0, 0, 1, 0; 0, -sin θ, 0, cos θ]
× [cosh ω, 0, 0, sinh ω; 0, 1, 0, 0; 0, 0, 1, 0; sinh ω, 0, 0, cosh ω]

=
[cosh ω              0            0       sinh ω             ]
[cos ϕ sin θ sinh ω  cos ϕ cos θ  -sin ϕ  cos ϕ sin θ cosh ω ]
[sin ϕ sin θ sinh ω  sin ϕ cos θ  cos ϕ   sin ϕ sin θ cosh ω ]
[cos θ sinh ω        -sin θ       0       cos θ cosh ω       ].  (24.237)
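As a numerical confirmation that the combined transformation (24.237) indeed satisfies Λ^T η Λ = η of (24.197), one can multiply the three factors and test the result directly (sample parameter values are ours):

```python
import numpy as np

th, ph, w = 0.6, 1.3, 0.9  # sample zenithal angle, azimuthal angle, rapidity
c, s = np.cos, np.sin

L_phi = np.array([[1, 0, 0, 0],
                  [0, c(ph), -s(ph), 0],
                  [0, s(ph), c(ph), 0],
                  [0, 0, 0, 1]])
L_th = np.array([[1, 0, 0, 0],
                 [0, c(th), 0, s(th)],
                 [0, 0, 1, 0],
                 [0, -s(th), 0, c(th)]])
Lb_inv = np.array([[np.cosh(w), 0, 0, np.sinh(w)],
                   [0, 1, 0, 0],
                   [0, 0, 1, 0],
                   [np.sinh(w), 0, 0, np.cosh(w)]])

Lam = L_phi @ L_th @ Lb_inv
eta = np.diag([1.0, -1.0, -1.0, -1.0])
assert np.allclose(Lam.T @ eta @ Lam, eta)  # criterion (24.197)

# Spot-check two entries of the closed form (24.237)
assert np.isclose(Lam[1, 0], c(ph) * s(th) * np.sinh(w))
assert np.isclose(Lam[3, 3], c(th) * np.cosh(w))
```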

Equation (24.237) describes a general form of the Lorentz transformation with respect to the polar coordinate representation (see Fig. 24.2b). Notice that the transformation Λ expressed as (24.237) satisfies the criterion of (24.197). The confirmation is left for the readers. The Lorentz boost Λb^{-1} has been defined in reference to the frame O. It is worth examining how the said boost is described in reference to the frame O′. For this purpose, we make the most of the general method that was developed in Sect. 17.4.2. It is more convenient to take the reverse procedure to (24.231) so that the general method can directly be applied to the present case. That is, think of the following transformations

O′ (e′) → Oϕ (ϕ) → Oθ (θ) → O (e)  (24.238)

in combination with the inverse operators to those shown in Fig. 24.2a. Then, use (17.85) and (17.91) with k = n = 3. Deleting R3^(1), we get

R1 R2 R3^(2) (R1 R2)^{-1} = R3^(1).  (24.239)

Substituting Λϕ, Λθ, and Λb^{-1} for R1, R2, and R3^(2), respectively, and defining R3^(1) ≡ Λ̃b^{-1}, we obtain

Fig. 24.2 Successive Lorentz transformations and their corresponding coordinate systems. (a) Lorentz boost (Λb) followed by two spatial rotations (Λθ^{-1} and Λϕ^{-1}). The diagram shows the successive three transformations of the basis vectors. (b) Geometry of the electron motion. The electron is moving at a velocity v in the direction defined by a zenithal angle θ and azimuthal angle ϕ in reference to the frame O′

Λ̃b^{-1} = Λϕ Λθ Λb^{-1} (Λϕ Λθ)^{-1}.  (24.240)

Further defining the combined rotation-related unitary transformation Λu in the Minkowski space as

Λu ≡ Λϕ Λθ,  (24.241)

we get

Λ = Λu Λb^{-1}.  (24.242)

Also, we have

Λ̃b^{-1} = Λu Λb^{-1} Λu†.  (24.243)

The operator Λ̃b^{-1} represents the Lorentz boost viewed in reference to the frame O′, whereas the operator Λb^{-1} is given in reference to Oθ with respect to the same boost. Both transformations are connected to each other through the unitary similarity transformation and belong to the same conjugacy class, and so they have the same physical meaning. The matrix representation of Λ̃b^{-1} is given by

Λ̃b^{-1} =
[cosh ω,              cos ϕ sin θ sinh ω,                                cos θ sinh ω        ]
[                     sin ϕ sin θ sinh ω,                                                    ]
[cos ϕ sin θ sinh ω,  cos^2 ϕ cos^2 θ + sin^2 ϕ + cos^2 ϕ sin^2 θ cosh ω,                    ]
[                     cos ϕ sin ϕ sin^2 θ (cosh ω - 1),                  cos ϕ cos θ sin θ (cosh ω - 1)]
[sin ϕ sin θ sinh ω,  cos ϕ sin ϕ sin^2 θ (cosh ω - 1),                                      ]
[                     sin^2 ϕ cos^2 θ + cos^2 ϕ + sin^2 ϕ sin^2 θ cosh ω, sin ϕ cos θ sin θ (cosh ω - 1)]
[cos θ sinh ω,        cos ϕ cos θ sin θ (cosh ω - 1),                                        ]
[                     sin ϕ cos θ sin θ (cosh ω - 1),                    sin^2 θ + cos^2 θ cosh ω].  (24.244)

Written row by row, the symmetric matrix (24.244) reads

row 0: cosh ω, cos ϕ sin θ sinh ω, sin ϕ sin θ sinh ω, cos θ sinh ω;
row 1: cos ϕ sin θ sinh ω, cos^2 ϕ cos^2 θ + sin^2 ϕ + cos^2 ϕ sin^2 θ cosh ω, cos ϕ sin ϕ sin^2 θ (cosh ω - 1), cos ϕ cos θ sin θ (cosh ω - 1);
row 2: sin ϕ sin θ sinh ω, cos ϕ sin ϕ sin^2 θ (cosh ω - 1), sin^2 ϕ cos^2 θ + cos^2 ϕ + sin^2 ϕ sin^2 θ cosh ω, sin ϕ cos θ sin θ (cosh ω - 1);
row 3: cos θ sinh ω, cos ϕ cos θ sin θ (cosh ω - 1), sin ϕ cos θ sin θ (cosh ω - 1), sin^2 θ + cos^2 θ cosh ω.

The trace χ of (24.244) equals

χ = 2 + 2cosh ω,  (24.245)

which is identical with the trace of Λb^{-1}, reflecting the fact that the unitary similarity transformation keeps the trace unchanged. Equation (24.242) gives an interesting example of the decomposition of a Lorentz transformation into the unitary part and the Hermitian boost part (i.e., polar decomposition). The Lorentz boost is represented by a positive definite Hermitian matrix (or a real symmetric matrix). We will come back to this issue later (Sect. 24.9.1).
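The similarity relation (24.243), the symmetry of the boost viewed from O′, and the trace invariance can all be checked numerically; a minimal sketch (sample parameter values are ours):

```python
import numpy as np

th, ph, w = 0.6, 1.3, 0.9
c, s = np.cos, np.sin

L_phi = np.array([[1, 0, 0, 0], [0, c(ph), -s(ph), 0],
                  [0, s(ph), c(ph), 0], [0, 0, 0, 1]])
L_th = np.array([[1, 0, 0, 0], [0, c(th), 0, s(th)],
                 [0, 0, 1, 0], [0, -s(th), 0, c(th)]])
Lb_inv = np.array([[np.cosh(w), 0, 0, np.sinh(w)], [0, 1, 0, 0],
                   [0, 0, 1, 0], [np.sinh(w), 0, 0, np.cosh(w)]])

Lu = L_phi @ L_th                   # (24.241); real, so Lu† = Lu^T
Lb_tilde_inv = Lu @ Lb_inv @ Lu.T   # (24.243)

# The boost viewed from O' stays real symmetric ...
assert np.allclose(Lb_tilde_inv, Lb_tilde_inv.T)
# ... and the unitary similarity transformation preserves the trace.
assert np.isclose(np.trace(Lb_tilde_inv), np.trace(Lb_inv))
```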

24.3 Covariant Properties of the Dirac Equation

24.3.1 General Consideration on the Physical Equation

It is of great importance and interest to examine how the Dirac equation is transformed under the Lorentz transformation. Before addressing this question, we wish to see how physical equations (including the Schrödinger equation, the Klein-Gordon equation, and the Dirac equation) undergo the transformation. In general, a physical equation comprises a state function X (scalar, vector, spinor, etc.) and an operator A that operates on the state function X. A general form of the physical equation is given as

A(X) = 0.  (24.246)

According to Sect. 11.2, we write X as


X = (ξ1 ⋯ ξn) (c1(x), ⋯, cn(x))^T,  (24.247)

where {ξ1, ⋯, ξn} are the basis functions (or basis vectors) of a representation space and (c1(x), ⋯, cn(x))^T indicate the corresponding coordinates. In (24.247), we assume that X is an element of the representation space of dimension n that is related to the physical equation. Correspondingly, we write the LHS of (24.246) as

A(X) = (ξ1 ⋯ ξn) A (c1(x), ⋯, cn(x))^T.  (24.248)

Also, according to Sect. 11.4, we rewrite (24.248) such that

A(X) = (ξ1 ⋯ ξn) R^{-1} (RAR^{-1}) R (c1(x), ⋯, cn(x))^T
     = (ξ′1 ⋯ ξ′n) (RAR^{-1}) (c′1(x), ⋯, c′n(x))^T
     = (ξ′1 ⋯ ξ′n) A′ (c′1(x), ⋯, c′n(x))^T,  (24.249)

where R represents a transformation matrix in a general form; it can be, e.g., an orthogonal transformation, a Lorentz transformation, etc. Notice again that unlike (11.84) of Sect. 11.4 (see Sect. 24.2.2), the coordinate representation and transformation are placed ahead of those of the basis vectors. In (24.249), the relation expressed as

(ξ′1 ⋯ ξ′n) = (ξ1 ⋯ ξn) R^{-1}  (24.250)

shows the set of basis functions obtained after the transformation. The relation

(c′1(x), ⋯, c′n(x))^T = R (c1(x), ⋯, cn(x))^T  (24.251)

represents the corresponding coordinates after being transformed. Also, the operator A′ given by

A′ = RAR^{-1}  (24.252)

denotes the representation matrix of the operator viewed in reference to (ξ′1, ⋯, ξ′n). In general, the collection of the transformations forms a group (i.e., a transformation group). In that case, ξ′1, ⋯, ξ′n span the representation space. To stress that


R is the representation matrix of the group, we identify it with D(R). That is, from (18.6) we have

D(R) ≡ R  and  [D(R)]^{-1} = D(R^{-1}).  (24.253)

Then, (24.249) is rewritten as

A(X) = (ξ′1 ⋯ ξ′n) D(R) A [D(R)]^{-1} (c′1(x), ⋯, c′n(x))^T.  (24.254)

Instead of the full representation of (24.248) and (24.249), we normally write

(ξ1 ⋯ ξn) A = 0  or  (ξ′1 ⋯ ξ′n) A′ = 0.  (24.255)

Alternatively, we write

A (c1(x), ⋯, cn(x))^T = 0  or  A′ (c′1(x), ⋯, c′n(x))^T = 0,  (24.256)

where

A′ = D(R) A [D(R)]^{-1},  (24.257)

(c′1(x), ⋯, c′n(x))^T = D(R) (c1(x), ⋯, cn(x))^T.  (24.258)

Equation (24.256) can be viewed as an eigenvalue problem with respect to the operator A, with its corresponding "eigenvector" being an (n, 1) matrix. At least one of the eigenvalues is zero (possibly degenerate). The second equation of (24.256) indicates the transformation of the operator A and its eigenvectors, which are transformed according to (24.257) and (24.258), respectively. Their transformation is accompanied by the transformation of the basis functions (or vectors) {ξ1, ⋯, ξn}, even though the basis functions (or vectors) are not shown explicitly. In this section, we solely use (24.256) to conform to the custom. The relevant formalism may well be referred to as the "coordinate representation." According to the custom, when the formalism is associated with the Lorentz transformation, we express the representation D(Λ) as

D(Λ) ≡ S(Λ),  (24.259)

where Λ indicates a Lorentz transformation and S(Λ) denotes a representation of the Lorentz group. The notation (Λ) is usually omitted, and we simply write S to mean S(Λ).

24.3.2 The Klein-Gordon Equation and the Dirac Equation

Now, we wish to see how the tangible physical equations in which we are interested undergo the transformation. We start with the Klein-Gordon equation and then examine the transformation properties of the Dirac equation in detail.

(i) Klein-Gordon equation: The equation is given by

(∂_μ∂^μ + m^2) ϕ(x) = 0.  (24.260)

Using (24.252), we have

A ≡ ∂_μ∂^μ + m^2,  (24.261)

A′ = SAS^{-1} = S(∂_μ∂^μ + m^2)S^{-1}.  (24.262)

Then, (24.256) can be translated into

S(∂_μ∂^μ + m^2)S^{-1} · Sϕ(x) = 0.  (24.263)

Meanwhile, using (21.25) and (21.51) we obtain

∂_μ∂^μ = Λ^ν_μ ∂′_ν (Λ^{-1})^μ_ρ ∂′^ρ = δ^ν_ρ ∂′_ν ∂′^ρ = ∂′_ρ ∂′^ρ.  (24.264)

Assuming that S is commutative with both ∂_μ and ∂^μ, (24.263) can be rewritten as

(∂′_μ∂′^μ + m^2) Sϕ(x) = 0.  (24.265)

For (24.265) to be covariant with respect to the Lorentz transformation, we must have

(∂′_μ∂′^μ + m^2) ϕ′(x′) = 0.  (24.266)

Since ϕ(x) is a scalar function, from the discussion of Sect. 19.1 we must have

ϕ(x) = ϕ′(x′) = Sϕ(x).  (24.267)

Comparing the first and last sides of (24.267), we get

S ≡ 1.  (24.268)

Thus, the Klein-Gordon equation gives a trivial but important one-dimensional representation with the scalar field.

(ii) Dirac equation: In the x-system, the Dirac equation has been described as

(iγ^μ∂_μ - m) ψ(x) = 0.  (22.213)

Since we assume a four-dimensional representation space with the Dirac spinor ψ(x), we accordingly describe ψ(x) as

ψ(x) ≡ (c^1(x), c^2(x), c^3(x), c^4(x))^T.  (24.269)

Notice that in (24.269) the superscripts take the integers 1, ⋯, 4 instead of 0, ⋯, 3. This is because the representation space for the Dirac equation is not the Minkowski space. Since ψ(x) in (24.269) represents a spinor, the superscripts are not associated with the space-time. Defining

A ≡ iγ^μ∂_μ - m,  (24.270)

(22.213) is read as

Aψ(x) = 0.  (24.271)

If we were faithful to the notation of (24.248), we would write (χ1 ⋯ χn)Aψ(x) = 0. At the moment, however, we only pay attention to the coordinate representation in the representation space. In this case, we may regard the column vector ψ(x), i.e., the spinor, as the "eigenvector." The situation is virtually the same as that where we formulated the quantum-mechanical harmonic oscillator in terms of the matrix representation (see Sect. 2.3). A general form (24.256) is reduced to

S(iγ^μ∂_μ - m)S^{-1} · Sψ(x) = 0.  (24.272)

Using (21.51), (24.272) is rewritten as

S(iγ^μ Λ^ν_μ ∂′_ν - m)S^{-1} · Sψ(x) = 0.  (24.273)

We wish to delete the factor Λ^ν_μ from (24.273) to explicitly describe (22.213) in the Lorentz covariant form. For this, it suffices for the following relation to hold:

S iγ^μ Λ^ν_μ ∂′_ν S^{-1} = iγ^ν ∂′_ν.  (24.274)

In (24.274) we have assumed that S is commutative with ∂′_ν. Rewriting (24.274), we get

∂′_ν (Λ^ν_μ Sγ^μS^{-1} - γ^ν) = 0.  (24.275)

That is, to maintain the Lorentz covariant form, again, it suffices for the following relation to hold [10, 11]:

Λ^ν_μ Sγ^μS^{-1} = γ^ν.  (24.276)

Notice that in (24.276) neither S nor S^{-1} is in general commutative with γ^μ. Meanwhile, we define the Dirac spinor observed in the x′-system as [10, 11]

ψ′(x′) ≡ Sψ(x) = Sψ(Λ^{-1}x′),  (24.277)

where ψ′(x′) denotes the change in the functional form accompanied by the coordinate transformation. Notice that only in the case where ψ(x) is a scalar function do we have ψ′(x′) = ψ(x); see (19.5) and (24.267). Using (24.276), we rewrite (24.273) as

(iγ^μ∂′_μ - m) ψ′(x′) = 0.  (24.278)

Thus, on the condition of (24.276), we have gotten the covariant form of the Dirac equation (24.278). Defining

A′ ≡ iγ^μ∂′_μ - m,  (24.279)

from (24.273), (24.274), and (24.278), we get

A′ψ′(x′) = 0,  (24.280)

with

A′ = SAS^{-1}.  (24.281)

Equation (24.281) results from (24.257).
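Condition (24.276) can be verified numerically for a concrete case. The sketch below uses the Dirac representation of the gamma matrices and the spinor rotation S = diag(e^{-iϕ/2}, e^{iϕ/2}, e^{-iϕ/2}, e^{iϕ/2}) about the z-axis, paired with the corresponding vector rotation Λ; this is one common sign convention, which we assume here (the book's convention may differ by ϕ → -ϕ):

```python
import numpy as np

# Gamma matrices in the Dirac representation (a standard choice).
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
g = [np.block([[np.eye(2), Z2], [Z2, -np.eye(2)]])]
for sk in (s1, s2, s3):
    g.append(np.block([[Z2, sk], [-sk, Z2]]))

ph = 0.7
# Spinor representation of a rotation by ph about the z-axis (assumed convention).
S = np.diag(np.exp(-1j * ph / 2 * np.array([1, -1, 1, -1])))
Sinv = np.conj(S).T  # S is unitary for a spatial rotation

# The corresponding vector (Lorentz) rotation about the z-axis.
Lam = np.eye(4)
Lam[1, 1] = Lam[2, 2] = np.cos(ph)
Lam[1, 2], Lam[2, 1] = -np.sin(ph), np.sin(ph)

for nu in range(4):
    lhs = sum(Lam[nu, mu] * (S @ g[mu] @ Sinv) for mu in range(4))
    assert np.allclose(lhs, g[nu])  # condition (24.276)
```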


In this way, (24.281) decides the transformation property of the operator A. Needless to say, our next task is to determine a tangible form of S so that (24.276) and (24.281) can hold at once. To do this, (24.276) will provide a good criterion by which to judge whether we have chosen a proper matrix form for S. These tasks will be performed in Sect. 24.4. The series of equations (24.277)-(24.281) can be viewed as the transformation of the Dirac operators in connection with the Lorentz transformation and is of fundamental importance in solving the Dirac equation. That is, once we successfully find an appropriate form of the matrix S in combination with a solution ψ(x) of the Dirac equation in the x-system, from (24.277) we can immediately determine ψ′(x′), i.e., a solution of the Dirac equation in the x′-system. To discuss the transformation of the Dirac equation, in this book we call G defined in (21.91) and G̃ of (21.108) Dirac operators, together with their differential form A ≡ iγ^μ∂_μ - m. Note that normally we call the differential operator (iγ^μ∂_μ - m) that appeared in Sect. 21.3.1 the Dirac operator. When we are dealing with the plane wave solution of the Dirac equation, ∂_μ is to be replaced with -ip_μ or +ip_μ (i.e., Fourier components) to give the Dirac operator G or G̃, respectively. We also emphasize that (24.277) is contrasted with (24.267), which characterizes the transformation of the scalar function. Comparing the transformation properties of the Klein-Gordon equation and the Dirac equation, we realize that the representation matrix S is closely related to the nature of the representation space. In the representation space, state functions (i.e., scalar, vector, spinor, etc.) and basis functions (e.g., basis vectors in a vector space) as well as the operators undergo the simultaneous transformation by the representation matrix S; see (24.257) and (24.258). In the next section, we wish to determine a general form of S for the Dirac equation.
To this end, we carry the discussion in parallel with that of Sect. 24.2. Before going into detail, once again we wish to view solving the Dirac equation as an eigenvalue problem. From (21.96), (21.107), and (21.109), we immediately get

G + G̃ = -2mE,  (24.282)

where E is the (4, 4) identity matrix. Since neither G nor G̃ is a normal operator (i.e., neither Hermitian nor unitary), it is impossible to diagonalize G and G̃ through the unitary similarity transformation (see Theorem 14.5). It would therefore be intractable to construct the diagonalizing matrix of the Dirac operators G and G̃, even though we can find their eigenvalues. Instead, if we are somehow able to find a suitable diagonalizing matrix S(Λ), we can at once solve the Dirac equation. Let S be such a matrix that diagonalizes G. Then, from (24.282) we have

S^{-1}GS + S^{-1}G̃S = -2mS^{-1}ES = -2mE.

Rewriting this equation, we obtain

S⁻¹G̃S = −2mE − D,

where D is a diagonal matrix given by D = S⁻¹GS. This implies that G̃ can also be diagonalized using the same matrix S. Bearing this point in mind, in the next section we are going to seek and construct a matrix S(Λ) that diagonalizes both G and G̃.
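The spectral statements above can be checked numerically. The sketch below is an illustration assuming the standard Dirac representation of the gamma matrices (the book's own sign conventions may differ by γ^k → −γ^k, which does not affect any of the checks): it builds G = p_μγ^μ − m and G̃ = −p_μγ^μ − m for an on-shell momentum and confirms (24.282), the non-normality of G, and the doubly degenerate eigenvalues 0 and −2m.

```python
import numpy as np

# Gamma matrices in the standard Dirac representation (an assumed convention).
g0 = np.diag([1, 1, -1, -1]).astype(complex)
g1 = np.array([[0, 0, 0, 1], [0, 0, 1, 0], [0, -1, 0, 0], [-1, 0, 0, 0]], dtype=complex)
g2 = np.array([[0, 0, 0, -1j], [0, 0, 1j, 0], [0, 1j, 0, 0], [-1j, 0, 0, 0]], dtype=complex)
g3 = np.array([[0, 0, 1, 0], [0, 0, 0, -1], [-1, 0, 0, 0], [0, 1, 0, 0]], dtype=complex)

m = 1.0
pvec = np.array([0.3, -0.4, 0.6])          # arbitrary spatial momentum
p0 = np.sqrt(m**2 + pvec @ pvec)           # on-shell energy
pslash = p0 * g0 - (pvec[0] * g1 + pvec[1] * g2 + pvec[2] * g3)

E4 = np.eye(4)
G = pslash - m * E4        # Fourier form of the Dirac operator G
Gt = -pslash - m * E4      # Dirac operator G-tilde

# (24.282): G + G~ = -2mE
sum_ok = np.allclose(G + Gt, -2 * m * E4)

# G is not a normal matrix for nonzero spatial momentum
normal_ok = not np.allclose(G @ G.conj().T, G.conj().T @ G)

# On shell, G^2 = -2mG, so the eigenvalues are 0 and -2m (each doubly degenerate)
eigvals = np.sort(np.linalg.eigvals(G).real)
spec_ok = np.allclose(eigvals, [-2 * m, -2 * m, 0, 0], atol=1e-8)
```

The spectrum follows because on shell (p_μγ^μ)² = m²E, so G² = −2mG.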

24.4 Determination of General Form of Matrix S(Λ)

In the previous section we saw that, if the representation matrix S(Λ) is properly chosen, S(iγ^μ∂_μ − m)S⁻¹ and Sψ(x) appropriately represent the Dirac operator and Dirac spinor, respectively. Naturally, then, we are viewing the Dirac operator and spinor in reference to the basis vectors after being transformed. The condition for S(Λ) to satisfy is given by (24.276). The matrices S(Λ) of (24.259) and their properties have been fully investigated and are available in the literature [10, 11]. We will make the most of those results to decide a tangible form of S(Λ) for the Dirac equation. Since S(Λ) is a representation matrix, from (24.236) we immediately calculate S(Λ) as

S(Λ) = S(Λϕ)S(Λθ)S(Λb⁻¹) = S(Λϕ)S(Λθ)[S(Λb)]⁻¹.   (24.283)

The matrix S(Λϕ) produced by the spatial rotation is given as [11]

S(Λϕ) = exp(ϕρ_m),   (24.284)

where ϕ is a rotation angle and ρ_m is given by

ρ_m ≡ (−i)J^{kl};  J^{kl} ≡ (i/4)[γ^k, γ^l]  (k, l, m = 1, 2, 3).   (24.285)

In (24.285), γ^k is a gamma matrix and k, l, m change cyclically. We have

\[ J^{kl} = \frac{1}{2}\begin{pmatrix} \sigma_m & 0 \\ 0 & \sigma_m \end{pmatrix}, \tag{24.286} \]

where σ_m is a Pauli spin matrix and 0 denotes a (2, 2) zero matrix. Note that J^{kl} is given by a direct sum of σ_m. That is,

J^{kl} = (1/2)(σ_m ⊕ σ_m).   (24.287)

When the rotation by ϕ is made around the z-axis, S(Λϕ) is given by

\[ S(\Lambda_\phi) = \exp(\phi\rho_3) = \begin{pmatrix} e^{i\phi/2} & 0 & 0 & 0 \\ 0 & e^{-i\phi/2} & 0 & 0 \\ 0 & 0 & e^{i\phi/2} & 0 \\ 0 & 0 & 0 & e^{-i\phi/2} \end{pmatrix}. \tag{24.288} \]

With another case where the rotation by θ is made around the y-axis, we have

\[ S(\Lambda_\theta) = \exp(\theta\rho_2) = \begin{pmatrix} \cos\frac{\theta}{2} & \sin\frac{\theta}{2} & 0 & 0 \\ -\sin\frac{\theta}{2} & \cos\frac{\theta}{2} & 0 & 0 \\ 0 & 0 & \cos\frac{\theta}{2} & \sin\frac{\theta}{2} \\ 0 & 0 & -\sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix}. \tag{24.289} \]

Meanwhile, an example of the transformation matrices S(Λb) related to the Lorentz boost is expressed as

S(Λb) = exp(−ωβ_k),   (24.290)

where ω represents the magnitude of the boost; β_k is given by

β_k ≡ (−i)J^{0k};  J^{0k} ≡ (i/2)γ^0γ^k  (k = 1, 2, 3).   (24.291)

If we make the Lorentz boost in the direction of the z-axis, S(Λb) is expressed as

\[ S(\Lambda_b) = \exp(-\omega\beta_3) = \begin{pmatrix} \cosh\frac{\omega}{2} & 0 & \sinh\frac{\omega}{2} & 0 \\ 0 & \cosh\frac{\omega}{2} & 0 & -\sinh\frac{\omega}{2} \\ \sinh\frac{\omega}{2} & 0 & \cosh\frac{\omega}{2} & 0 \\ 0 & -\sinh\frac{\omega}{2} & 0 & \cosh\frac{\omega}{2} \end{pmatrix}. \tag{24.292} \]
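The half-angle closed forms (24.288), (24.289), and (24.292) can be verified by brute force. The sketch below writes the generators out as explicit (4, 4) matrices consistent with those closed forms (an assumption; the sign convention for β₃ depends on the choice of gamma matrices) and compares a plain Taylor-series matrix exponential against them; since ρ₃² = ρ₂² = −E/4 and β₃² = E/4, the exponentials collapse to single half-angle terms.

```python
import numpy as np

def expm(A, terms=60):
    """Matrix exponential by plain Taylor series (fine for these small norms)."""
    S = np.eye(A.shape[0], dtype=complex)
    T = np.eye(A.shape[0], dtype=complex)
    for n in range(1, terms):
        T = T @ A / n
        S = S + T
    return S

# Generators as explicit matrices consistent with (24.288), (24.289), (24.292).
rho3 = 0.5j * np.diag([1, -1, 1, -1]).astype(complex)
rho2 = 0.5 * np.array([[0, 1, 0, 0], [-1, 0, 0, 0],
                       [0, 0, 0, 1], [0, 0, -1, 0]], dtype=complex)
beta3 = -0.5 * np.array([[0, 0, 1, 0], [0, 0, 0, -1],
                         [1, 0, 0, 0], [0, -1, 0, 0]], dtype=complex)

phi, theta, omega = 0.7, 1.1, 0.9

S_phi = np.diag(np.exp([0.5j*phi, -0.5j*phi, 0.5j*phi, -0.5j*phi]))  # (24.288)
c, s = np.cos(theta/2), np.sin(theta/2)
S_theta = np.array([[c, s, 0, 0], [-s, c, 0, 0],
                    [0, 0, c, s], [0, 0, -s, c]], dtype=complex)      # (24.289)
ch, sh = np.cosh(omega/2), np.sinh(omega/2)
S_b = np.array([[ch, 0, sh, 0], [0, ch, 0, -sh],
                [sh, 0, ch, 0], [0, -sh, 0, ch]], dtype=complex)      # (24.292)

rot_ok = np.allclose(expm(phi*rho3), S_phi) and np.allclose(expm(theta*rho2), S_theta)
boost_ok = np.allclose(expm(-omega*beta3), S_b)
# rho_m anti-Hermitian (unitary exponentials); beta_3 Hermitian (Hermitian exponential)
herm_ok = (np.allclose(rho3.conj().T, -rho3) and np.allclose(rho2.conj().T, -rho2)
           and np.allclose(beta3.conj().T, beta3))
```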

As in the case of the Lorentz transformation matrices, the generator ρm (m = 1, 2, 3), which is associated with the spatial rotation, is anti-Hermitian. Hence, exp(ϕρ3) and exp(θρ2) are unitary. On the other hand, βk (k = 1, 2, 3), associated with the Lorentz boost, is Hermitian. The matrix exp(ωβ3) is Hermitian as well; for these see Sects. 15.2 and 25.4. From (24.288), (24.289), and (24.292) we get

\[ S(\Lambda) = S(\Lambda_\phi)S(\Lambda_\theta)[S(\Lambda_b)]^{-1} = \exp(\phi\rho_3)\exp(\theta\rho_2)\exp(\omega\beta_3) \]

\[ = \begin{pmatrix} e^{i\phi/2} & 0 & 0 & 0 \\ 0 & e^{-i\phi/2} & 0 & 0 \\ 0 & 0 & e^{i\phi/2} & 0 \\ 0 & 0 & 0 & e^{-i\phi/2} \end{pmatrix} \begin{pmatrix} \cos\frac{\theta}{2} & \sin\frac{\theta}{2} & 0 & 0 \\ -\sin\frac{\theta}{2} & \cos\frac{\theta}{2} & 0 & 0 \\ 0 & 0 & \cos\frac{\theta}{2} & \sin\frac{\theta}{2} \\ 0 & 0 & -\sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix} \begin{pmatrix} \cosh\frac{\omega}{2} & 0 & -\sinh\frac{\omega}{2} & 0 \\ 0 & \cosh\frac{\omega}{2} & 0 & \sinh\frac{\omega}{2} \\ -\sinh\frac{\omega}{2} & 0 & \cosh\frac{\omega}{2} & 0 \\ 0 & \sinh\frac{\omega}{2} & 0 & \cosh\frac{\omega}{2} \end{pmatrix} \]

\[ = \begin{pmatrix} e^{i\phi/2}\cos\frac{\theta}{2}\cosh\frac{\omega}{2} & e^{i\phi/2}\sin\frac{\theta}{2}\cosh\frac{\omega}{2} & -e^{i\phi/2}\cos\frac{\theta}{2}\sinh\frac{\omega}{2} & e^{i\phi/2}\sin\frac{\theta}{2}\sinh\frac{\omega}{2} \\ -e^{-i\phi/2}\sin\frac{\theta}{2}\cosh\frac{\omega}{2} & e^{-i\phi/2}\cos\frac{\theta}{2}\cosh\frac{\omega}{2} & e^{-i\phi/2}\sin\frac{\theta}{2}\sinh\frac{\omega}{2} & e^{-i\phi/2}\cos\frac{\theta}{2}\sinh\frac{\omega}{2} \\ -e^{i\phi/2}\cos\frac{\theta}{2}\sinh\frac{\omega}{2} & e^{i\phi/2}\sin\frac{\theta}{2}\sinh\frac{\omega}{2} & e^{i\phi/2}\cos\frac{\theta}{2}\cosh\frac{\omega}{2} & e^{i\phi/2}\sin\frac{\theta}{2}\cosh\frac{\omega}{2} \\ e^{-i\phi/2}\sin\frac{\theta}{2}\sinh\frac{\omega}{2} & e^{-i\phi/2}\cos\frac{\theta}{2}\sinh\frac{\omega}{2} & -e^{-i\phi/2}\sin\frac{\theta}{2}\cosh\frac{\omega}{2} & e^{-i\phi/2}\cos\frac{\theta}{2}\cosh\frac{\omega}{2} \end{pmatrix}. \tag{24.293} \]

Also, we have

\[ [S(\Lambda)]^{-1} = \begin{pmatrix} e^{-i\phi/2}\cos\frac{\theta}{2}\cosh\frac{\omega}{2} & -e^{i\phi/2}\sin\frac{\theta}{2}\cosh\frac{\omega}{2} & e^{-i\phi/2}\cos\frac{\theta}{2}\sinh\frac{\omega}{2} & -e^{i\phi/2}\sin\frac{\theta}{2}\sinh\frac{\omega}{2} \\ e^{-i\phi/2}\sin\frac{\theta}{2}\cosh\frac{\omega}{2} & e^{i\phi/2}\cos\frac{\theta}{2}\cosh\frac{\omega}{2} & -e^{-i\phi/2}\sin\frac{\theta}{2}\sinh\frac{\omega}{2} & -e^{i\phi/2}\cos\frac{\theta}{2}\sinh\frac{\omega}{2} \\ e^{-i\phi/2}\cos\frac{\theta}{2}\sinh\frac{\omega}{2} & -e^{i\phi/2}\sin\frac{\theta}{2}\sinh\frac{\omega}{2} & e^{-i\phi/2}\cos\frac{\theta}{2}\cosh\frac{\omega}{2} & -e^{i\phi/2}\sin\frac{\theta}{2}\cosh\frac{\omega}{2} \\ -e^{-i\phi/2}\sin\frac{\theta}{2}\sinh\frac{\omega}{2} & -e^{i\phi/2}\cos\frac{\theta}{2}\sinh\frac{\omega}{2} & e^{-i\phi/2}\sin\frac{\theta}{2}\cosh\frac{\omega}{2} & e^{i\phi/2}\cos\frac{\theta}{2}\cosh\frac{\omega}{2} \end{pmatrix}. \tag{24.294} \]

Equation (24.293) gives a general form of S(Λ). The construction process of (24.293) somewhat resembles the one already taken to form the unitary transformation matrix of SU(2) in (20.45), or the one used to construct the transformation matrix of (17.101). Because of the presence of [S(Λb)]⁻¹ in (24.293), however, S(Λ) of (24.293) is not unitary. This feature plays an essential role in what follows.
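The product structure of (24.293), the explicit inverse (24.294), and the non-unitarity of S(Λ) can all be verified directly. The sketch below (sample parameter values only) multiplies the three factor matrices, checks the first row against the closed form in (24.293), checks the closed-form inverse, and shows S†S ≠ E.

```python
import numpy as np

phi, theta, omega = 0.6, 0.8, 1.2
ct, st = np.cos(theta/2), np.sin(theta/2)
ch, sh = np.cosh(omega/2), np.sinh(omega/2)
ep, em = np.exp(0.5j*phi), np.exp(-0.5j*phi)

S_phi = np.diag([ep, em, ep, em])
S_theta = np.array([[ct, st, 0, 0], [-st, ct, 0, 0],
                    [0, 0, ct, st], [0, 0, -st, ct]], dtype=complex)
S_b_inv = np.array([[ch, 0, -sh, 0], [0, ch, 0, sh],
                    [-sh, 0, ch, 0], [0, sh, 0, ch]], dtype=complex)

S = S_phi @ S_theta @ S_b_inv            # S(Lambda) of (24.293)

# First row of the closed form (24.293)
row1 = np.array([ep*ct*ch, ep*st*ch, -ep*ct*sh, ep*st*sh])
row1_ok = np.allclose(S[0], row1)

# Closed-form inverse of (24.294)
S_inv = np.array([
    [em*ct*ch, -ep*st*ch,  em*ct*sh, -ep*st*sh],
    [em*st*ch,  ep*ct*ch, -em*st*sh, -ep*ct*sh],
    [em*ct*sh, -ep*st*sh,  em*ct*ch, -ep*st*ch],
    [-em*st*sh, -ep*ct*sh, em*st*ch,  ep*ct*ch]])
inv_ok = np.allclose(S @ S_inv, np.eye(4))

# S is not unitary, because the boost factor is not
unitary = np.allclose(S.conj().T @ S, np.eye(4))
```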

24.5 Transformation Properties of the Dirac Spinors [9]

Now, we are in a position to seek a tangible form of the Dirac spinors. Regarding the plane waves, ψ(x), written in a general form in (24.269), is described by

\[ \psi(x) = \begin{pmatrix} c_0(x) \\ c_1(x) \\ c_2(x) \\ c_3(x) \end{pmatrix} = e^{\pm ipx}\begin{pmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{pmatrix} = e^{\pm ipx}\,w(p, h), \tag{24.295} \]

where w(p, h) represents either u(p, h) or v(p, h) given in Sect. 21.3. Note that w(p, h) does not depend on x. Since px (≡ p_μx^μ) in e^{±ipx} is a scalar quantity, it is independent of the choice of an inertial frame of reference; namely, px is invariant under the Lorentz transformation. Therefore, it behaves as a constant in terms of the operation of S(Λ). Let the x-system be set within the inertial frame of reference O where the electron is at rest (see Fig. 24.2). Then, we have

ψ(x) = e^{±ip₀x⁰} w(0, h) = e^{±imx⁰} w(0, h).   (24.296)

Meanwhile, (24.277) is rewritten as

ψ′(x′) = e^{±ip′x′} S(Λ)w(0, h) = e^{±ip′x′} w(p′, h).   (24.297)

In (24.297) we assume that the x′-system is set within the inertial frame of reference O′ where the electron is moving at a velocity of v (see Fig. 24.2). Note that in (24.297) p′x′ = px = p₀x⁰ = mx⁰ and

\[ p = \begin{pmatrix} p^0 \\ \mathbf{p} \end{pmatrix};\quad p^0 = p_0,\qquad p' = \begin{pmatrix} p'^0 \\ \mathbf{p}' \end{pmatrix}. \]

In the remaining parts of this section, we use the variables x and p for the frame O and the variables x′ and p′ for the frame O′. Note that the frames O and O′ are identical to those appearing in Sect. 24.2.2. First, we wish to seek the solution w(0, h) of the Dirac equation with an electron at rest. In this case, the momentum p of the electron is zero, i.e., p¹ = p² = p³ = 0. Then, once we find the solution w(0, h) in the frame O, we are going to get a solution w(p′, h) in the frame O′ such that

S(Λ)w(0, h) = w(p′, h).   (24.298)

Equation (24.298) implies that the helicity is held unchanged by the transformation S(Λ). Now, the Dirac operator of the rest frame is given by

p_μγ^μ − m = p₀γ⁰ − m = mγ⁰ − m = m(γ⁰ − 1) ≡ A   (24.299)

or

−p_μγ^μ − m = −p₀γ⁰ − m = −mγ⁰ − m = m(−γ⁰ − 1) ≡ B.   (24.300)

Namely, we have

\[ A = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & -2m & 0 \\ 0 & 0 & 0 & -2m \end{pmatrix} \tag{24.301} \]

and

\[ B = \begin{pmatrix} -2m & 0 & 0 & 0 \\ 0 & -2m & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}. \tag{24.302} \]

The operators A and B correspond to the positive-energy solution and the negative-energy solution, respectively. For the former case, we have

\[ \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & -2m & 0 \\ 0 & 0 & 0 & -2m \end{pmatrix}\begin{pmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{pmatrix} = 0. \tag{24.303} \]

In (24.303) we have c₂ = c₃ = 0, while c₀ and c₁ can be chosen freely. Then, as the two independent solutions, we get

\[ u_1(0; -1) = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad u_2(0; +1) = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}. \tag{24.304} \]

With the plane wave solutions, we have

\[ \psi_1(x) = e^{-imx^0}\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad \psi_2(x) = e^{-imx^0}\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} \quad (x^0 \equiv t). \tag{24.305} \]

With the negative-energy solution, on the other hand, we have

\[ \begin{pmatrix} -2m & 0 & 0 & 0 \\ 0 & -2m & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{pmatrix} = 0. \tag{24.306} \]

Therefore, as another two independent solutions we obtain

\[ v_1(0; +1) = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad v_2(0; -1) = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} \tag{24.307} \]

with the plane wave solutions described by

\[ \psi_3(x) = e^{imx^0}\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad \psi_4(x) = e^{imx^0}\begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}. \tag{24.308} \]

The above solutions (24.304) can automatically be obtained from the results of Sect. 21.3 (including the normalization constant as well). That is, using (21.104) and putting θ = ϕ = 0 as well as S = 0 in (21.104), we have the same results as (24.304). Note, however, that in (24.307) we did not take account of the charge conjugation (vide infra). Using (21.120) and putting θ = ϕ = 0 and S = 0 once again, we obtain

\[ v_1(0; +1) = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad v_2(0; -1) = \begin{pmatrix} 0 \\ 0 \\ 0 \\ -1 \end{pmatrix}. \tag{24.309} \]

Notice that the latter solution of (24.309) correctly reflects the charge conjugation. Therefore, with the negative-energy plane wave solutions we must use

\[ \psi_3(x) = e^{imx^0}\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad \psi_4(x) = e^{imx^0}\begin{pmatrix} 0 \\ 0 \\ 0 \\ -1 \end{pmatrix}. \tag{24.310} \]
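The rest-frame statements can be verified at a glance: A annihilates the positive-energy spinors u₁, u₂ and B annihilates v₁, v₂, while each operator acts as multiplication by −2m on the complementary pair. A minimal sketch:

```python
import numpy as np

m = 1.0
g0 = np.diag([1.0, 1.0, -1.0, -1.0])
E4 = np.eye(4)

A = m * (g0 - E4)      # (24.299): rest-frame operator for positive energy
B = m * (-g0 - E4)     # (24.300): rest-frame operator for negative energy

u1 = np.array([1.0, 0, 0, 0]); u2 = np.array([0, 1.0, 0, 0])
v1 = np.array([0, 0, 1.0, 0]); v2 = np.array([0, 0, 0, -1.0])  # sign as in (24.309)

A_ok = np.allclose(A, np.diag([0, 0, -2*m, -2*m]))
B_ok = np.allclose(B, np.diag([-2*m, -2*m, 0, 0]))
zero_ok = all(np.allclose(M @ w, 0) for M, w in [(A, u1), (A, u2), (B, v1), (B, v2)])
# On the complementary pair, each operator multiplies by -2m
swap_ok = np.allclose(A @ v1, -2*m*v1) and np.allclose(B @ u1, -2*m*u1)
```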

Even though (24.303) and (24.306) and their corresponding solutions (24.304) and (24.309) look trivial, they provide solid ground for the Dirac equation and its solution for an electron at rest. In our case, these equations are dealt with in the frame O and described by x. In the above discussion, (24.305) and (24.310) give ψ(x) in (24.296). Thus, substituting (24.304) and (24.309) for w(0, h) of (24.297) and using D(Λ) = S(Λ) of (24.293), we get at once the solutions ψ′(x′) described by x′ with respect to the frame O′. The said solutions are given by

\[ \psi'_1(x') = e^{-ip'x'}\begin{pmatrix} e^{i\phi/2}\cos\frac{\theta}{2}\cosh\frac{\omega}{2} \\ -e^{-i\phi/2}\sin\frac{\theta}{2}\cosh\frac{\omega}{2} \\ -e^{i\phi/2}\cos\frac{\theta}{2}\sinh\frac{\omega}{2} \\ e^{-i\phi/2}\sin\frac{\theta}{2}\sinh\frac{\omega}{2} \end{pmatrix}, \quad \psi'_2(x') = e^{-ip'x'}\begin{pmatrix} e^{i\phi/2}\sin\frac{\theta}{2}\cosh\frac{\omega}{2} \\ e^{-i\phi/2}\cos\frac{\theta}{2}\cosh\frac{\omega}{2} \\ e^{i\phi/2}\sin\frac{\theta}{2}\sinh\frac{\omega}{2} \\ e^{-i\phi/2}\cos\frac{\theta}{2}\sinh\frac{\omega}{2} \end{pmatrix}, \]

\[ \psi'_3(x') = e^{ip'x'}\begin{pmatrix} -e^{i\phi/2}\cos\frac{\theta}{2}\sinh\frac{\omega}{2} \\ e^{-i\phi/2}\sin\frac{\theta}{2}\sinh\frac{\omega}{2} \\ e^{i\phi/2}\cos\frac{\theta}{2}\cosh\frac{\omega}{2} \\ -e^{-i\phi/2}\sin\frac{\theta}{2}\cosh\frac{\omega}{2} \end{pmatrix}, \quad \psi'_4(x') = e^{ip'x'}\begin{pmatrix} -e^{i\phi/2}\sin\frac{\theta}{2}\sinh\frac{\omega}{2} \\ -e^{-i\phi/2}\cos\frac{\theta}{2}\sinh\frac{\omega}{2} \\ -e^{i\phi/2}\sin\frac{\theta}{2}\cosh\frac{\omega}{2} \\ -e^{-i\phi/2}\cos\frac{\theta}{2}\cosh\frac{\omega}{2} \end{pmatrix}. \tag{24.311} \]

The individual column vectors of (24.311) correspond to those of (24.293), with the sign for ψ′₄(x′) reversed relative to the last column of (24.293). This is because of the charge conjugation. Note that both the positive-energy solutions and the negative-energy solutions are transformed by the same representation matrix S. This is evident from (24.297). Further rewriting (24.311), we obtain, e.g.,

\[ \psi'_1(x') = e^{-ip'x'}\cosh\frac{\omega}{2}\begin{pmatrix} e^{i\phi/2}\cos\frac{\theta}{2} \\ -e^{-i\phi/2}\sin\frac{\theta}{2} \\ -e^{i\phi/2}\cos\frac{\theta}{2}\tanh\frac{\omega}{2} \\ e^{-i\phi/2}\sin\frac{\theta}{2}\tanh\frac{\omega}{2} \end{pmatrix}. \tag{24.312} \]

Notice that in (24.312) we normalized ψ₁(x) to unity [instead of to 2m as used in, e.g., (21.141)]. Here we wish to associate the hyperbolic functions with the physical quantities of the Dirac fields. From (1.8), (1.9), and (1.21) along with (21.45), we have [12]

p′⁰ = mγ = m cosh ω  and  𝐩′ = mγ𝐯.   (24.313)

As important formulae of the hyperbolic functions (associated with the physical quantities), we have

\[ \cosh\frac{\omega}{2} = \sqrt{\frac{1+\cosh\omega}{2}} = \sqrt{\frac{p'^0+m}{2m}},\qquad \sinh\frac{\omega}{2} = \sqrt{\frac{-1+\cosh\omega}{2}} = \sqrt{\frac{p'^0-m}{2m}}, \tag{24.314} \]

\[ \tanh\frac{\omega}{2} = \frac{\sinh\frac{\omega}{2}}{\cosh\frac{\omega}{2}} = \sqrt{\frac{p'^0-m}{p'^0+m}} = \frac{\sqrt{(p'^0)^2-m^2}}{p'^0+m} = \frac{P'}{p'^0+m} = S', \tag{24.315} \]

where P′ ≡ |𝐩′| and S′ ≡ P′/(p′⁰ + m) [see (21.97) and (21.99)]. With the second-to-last equality of (24.315) we used (21.79) and (21.97); with the last equality we used (21.99). Then, we rewrite (24.312) as

\[ \psi'_1(x') = e^{-ip'x'}\sqrt{\frac{p'^0+m}{2m}}\begin{pmatrix} e^{i\phi/2}\cos\frac{\theta}{2} \\ -e^{-i\phi/2}\sin\frac{\theta}{2} \\ -S'e^{i\phi/2}\cos\frac{\theta}{2} \\ S'e^{-i\phi/2}\sin\frac{\theta}{2} \end{pmatrix}. \tag{24.316} \]
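As a numerical sanity check on the half-angle relations (24.314) and (24.315): with p′⁰ = m cosh ω and |𝐩′| = m sinh ω from (24.313), the square-root expressions and S′ follow at once (sample values only):

```python
import math

m, omega = 1.0, 0.85
p0 = m * math.cosh(omega)        # (24.313): p'^0 = m*gamma = m*cosh(omega)
P = m * math.sinh(omega)         # |p'| = m*gamma*v

cosh_ok = math.isclose(math.cosh(omega/2), math.sqrt((p0 + m) / (2*m)))
sinh_ok = math.isclose(math.sinh(omega/2), math.sqrt((p0 - m) / (2*m)))
# (24.315): tanh(omega/2) = sqrt((p'0-m)/(p'0+m)) = P'/(p'0+m) = S'
tanh_ok = (math.isclose(math.tanh(omega/2), math.sqrt((p0 - m)/(p0 + m)))
           and math.isclose(math.tanh(omega/2), P / (p0 + m)))
```

The last identity is the half-angle formula tanh(ω/2) = sinh ω/(1 + cosh ω) in kinematic disguise.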

The column vector of (24.316) is exactly the same as that of the first equation of (21.104), including the phase factor (but aside from the normalization factor 1/√(2m)). The other three equations of (24.311) yield the same column vectors as given in (21.104) and (21.120) as well. Thus, following the procedure of (24.277), we have proven that S(Λ) given in (24.293) produces the proper solutions of the Dirac equation. From (24.304) and (24.309), we find that the individual column vectors of S(Λ) yield the four solutions of the Dirac equation, i.e., the eigenvectors of G and G̃ (aside from the phase factor). This suggests that S(Λ) is a diagonalizing matrix of G and G̃. For the same reason, the column vectors of S(Λ) are the eigenvectors of H of (21.86) as well. As already pointed out in Sect. 21.3.2, however, G is not Hermitian. Consequently, on the basis of Theorem 14.5, the said eigenvectors shared by G and H do not constitute an orthonormal system. Hence, we conclude that S(Λ) is not unitary. This is consistent with the results of Sect. 24.4; see the explicit matrix form of S(Λ) indicated by (24.293). That is, the third factor matrix representing the Lorentz boost in LHS of (24.293) is not unitary.

Now, we represent the positive-energy solutions and the negative-energy solutions altogether as Ψ and Ψ̃, respectively. Operating with both sides of (24.282) on Ψ and Ψ̃ from the left, we get

GΨ + G̃Ψ = 0 + G̃Ψ = G̃Ψ = −2mΨ,
GΨ̃ + G̃Ψ̃ = GΨ̃ + 0 = GΨ̃ = −2mΨ̃.

From the above equations, we realize that Ψ is an eigenfunction belonging to the eigenvalue 0 of G and, at the same time, an eigenfunction belonging to the eigenvalue −2m of G̃. In turn, Ψ̃ is an eigenfunction belonging to the eigenvalue 0 of G̃ and, at the same time, an eigenfunction belonging to the eigenvalue −2m of G. Both G and G̃ possess the eigenvalues 0 and −2m, each doubly degenerate. Thus, the questions raised in Sect. 21.3.2 have been adequately answered. The point rests upon finding a proper matrix form of S(Λ). We will further pursue the structure of S(Λ) in Sect. 24.9 in relation to the polar decomposition of a matrix.
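The eigenvector bookkeeping can be made concrete: building G = SAS⁻¹ and G̃ = SBS⁻¹ numerically, the first two columns of S(Λ) satisfy Gv = 0 and G̃v = −2mv, while the last two satisfy G̃v = 0 and Gv = −2mv. A sketch (sample parameters; matrices as in (24.293), (24.301), (24.302)):

```python
import numpy as np

phi, theta, omega, m = 0.4, 0.9, 1.1, 1.0
ct, st = np.cos(theta/2), np.sin(theta/2)
ch, sh = np.cosh(omega/2), np.sinh(omega/2)
ep, em = np.exp(0.5j*phi), np.exp(-0.5j*phi)

# S(Lambda) of (24.293), assembled from its three factors
S = (np.diag([ep, em, ep, em])
     @ np.array([[ct, st, 0, 0], [-st, ct, 0, 0],
                 [0, 0, ct, st], [0, 0, -st, ct]], dtype=complex)
     @ np.array([[ch, 0, -sh, 0], [0, ch, 0, sh],
                 [-sh, 0, ch, 0], [0, sh, 0, ch]], dtype=complex))
S_inv = np.linalg.inv(S)

A = np.diag([0, 0, -2*m, -2*m]).astype(complex)
B = np.diag([-2*m, -2*m, 0, 0]).astype(complex)
G = S @ A @ S_inv        # cf. (24.339)
Gt = S @ B @ S_inv       # cf. (24.342)

cols = [S[:, k] for k in range(4)]
# Columns 1,2: G v = 0 and G~ v = -2m v (positive-energy pair)
pos_ok = all(np.allclose(G @ v, 0) and np.allclose(Gt @ v, -2*m*v) for v in cols[:2])
# Columns 3,4: G~ v = 0 and G v = -2m v (negative-energy pair)
neg_ok = all(np.allclose(Gt @ v, 0) and np.allclose(G @ v, -2*m*v) for v in cols[2:])
```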

24.6 Transformation Properties of the Dirac Operators [9]

In the previous section, we derived the functional form of the Dirac spinor from the one determined in the rest frame of the electron. In this section we study how the Dirac operator is transformed according to the Lorentz transformation. In the first half of this section, we examine it in reference to the rest frame, in accordance with the approach taken in the last section. More generally, however, we need to investigate the transformation property of the Dirac operator in reference to a moving frame of an electron. We deal with this issue in the latter half of the section.

24.6.1 Case I: Single Lorentz Boost

In (24.293) we considered S(Λ) = S(Λϕ)S(Λθ)[S(Λb)]-1. To further investigate the operation and properties of S(Λ), we consider the spatial rotation factor S(Λϕ)S(Λθ) and the Lorentz boost [S(Λb)]-1 of (24.293) separately. Defining SR as

S_R ≡ S(Λ_u) = S(Λϕ)S(Λθ),   (24.317)

we have

\[ S_R = \begin{pmatrix} e^{i\phi/2} & 0 & 0 & 0 \\ 0 & e^{-i\phi/2} & 0 & 0 \\ 0 & 0 & e^{i\phi/2} & 0 \\ 0 & 0 & 0 & e^{-i\phi/2} \end{pmatrix}\begin{pmatrix} \cos\frac{\theta}{2} & \sin\frac{\theta}{2} & 0 & 0 \\ -\sin\frac{\theta}{2} & \cos\frac{\theta}{2} & 0 & 0 \\ 0 & 0 & \cos\frac{\theta}{2} & \sin\frac{\theta}{2} \\ 0 & 0 & -\sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix} = \begin{pmatrix} e^{i\phi/2}\cos\frac{\theta}{2} & e^{i\phi/2}\sin\frac{\theta}{2} & 0 & 0 \\ -e^{-i\phi/2}\sin\frac{\theta}{2} & e^{-i\phi/2}\cos\frac{\theta}{2} & 0 & 0 \\ 0 & 0 & e^{i\phi/2}\cos\frac{\theta}{2} & e^{i\phi/2}\sin\frac{\theta}{2} \\ 0 & 0 & -e^{-i\phi/2}\sin\frac{\theta}{2} & e^{-i\phi/2}\cos\frac{\theta}{2} \end{pmatrix}. \tag{24.318} \]

eiϕ=2 sin

Also defining a (2, 2) submatrix S~R as S~R 

θ 2 θ - e - iϕ=2 sin 2 eiϕ=2 cos

θ 2 θ e - iϕ=2 cos 2 eiϕ=2 sin

ð24:319Þ

,

we have SR =

SR 0

0 SR

:

0 0

0 E2

ð24:320Þ

Then, we obtain

\[ S_R A S_R^{-1} = (-2m)\begin{pmatrix} \tilde{S}_R & 0 \\ 0 & \tilde{S}_R \end{pmatrix}\begin{pmatrix} 0 & 0 \\ 0 & E_2 \end{pmatrix}\begin{pmatrix} \tilde{S}_R^{-1} & 0 \\ 0 & \tilde{S}_R^{-1} \end{pmatrix} = (-2m)\begin{pmatrix} 0 & 0 \\ 0 & \tilde{S}_R E_2 \tilde{S}_R^{-1} \end{pmatrix} = (-2m)\begin{pmatrix} 0 & 0 \\ 0 & E_2 \end{pmatrix} = A, \tag{24.321} \]


where A is the operator that appeared in (24.301). Consequently, the functional form of the Dirac equation for the electron at rest is invariant with respect to the spatial rotation. Corresponding to (24.311), the Dirac spinor is given by, e.g.,

\[ \psi'_1(x') = S_R\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = e^{-imx'^0}\begin{pmatrix} e^{i\phi/2}\cos\frac{\theta}{2} \\ -e^{-i\phi/2}\sin\frac{\theta}{2} \\ 0 \\ 0 \end{pmatrix}, \tag{24.322} \]

\[ \psi'_2(x') = S_R\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} = e^{-imx'^0}\begin{pmatrix} e^{i\phi/2}\sin\frac{\theta}{2} \\ e^{-i\phi/2}\cos\frac{\theta}{2} \\ 0 \\ 0 \end{pmatrix}. \tag{24.323} \]

Similarly, we have

\[ \psi'_3(x') = S_R\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} = e^{imx'^0}\begin{pmatrix} 0 \\ 0 \\ e^{i\phi/2}\cos\frac{\theta}{2} \\ -e^{-i\phi/2}\sin\frac{\theta}{2} \end{pmatrix}, \tag{24.324} \]

\[ \psi'_4(x') = S_R\begin{pmatrix} 0 \\ 0 \\ 0 \\ -1 \end{pmatrix} = e^{imx'^0}\begin{pmatrix} 0 \\ 0 \\ -e^{i\phi/2}\sin\frac{\theta}{2} \\ -e^{-i\phi/2}\cos\frac{\theta}{2} \end{pmatrix}. \tag{24.325} \]

With the projection operators, we obtain, e.g.,

\[ |\psi'_1\rangle\langle\psi'_1| = \begin{pmatrix} \cos^2\frac{\theta}{2} & -e^{i\phi}\cos\frac{\theta}{2}\sin\frac{\theta}{2} & 0 & 0 \\ -e^{-i\phi}\cos\frac{\theta}{2}\sin\frac{\theta}{2} & \sin^2\frac{\theta}{2} & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} = \left(|\psi'_1\rangle\langle\psi'_1|\right)^2. \tag{24.326} \]

We find that |ψ′₁⟩⟨ψ′₁| is a Hermitian and idempotent operator (Sects. 12.4 and 14.1), i.e., a projection operator sensu stricto (Sect. 18.7). It is the case with the other operators |ψ′_k⟩⟨ψ′_k| (k = 2, 3, 4) as well. Similarly calculating the other operators, we get

\[ \sum_{k=1}^{4} |\psi'_k\rangle\langle\psi'_k| = E_4, \tag{24.327} \]

where E₄ is a (4, 4) identity matrix. Other requirements on the projection operators mentioned in Sect. 14.1 are met accordingly (Definition 14.1). Equation (24.327) represents the completeness relation (see Sect. 18.7).
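Because S_R is unitary, its columns (the rotated rest-frame spinors) form an orthonormal set, and the four projectors resolve the identity. A sketch (the overall phases e^{∓imx′⁰} are dropped, since they cancel in |ψ⟩⟨ψ|):

```python
import numpy as np

phi, theta = 0.5, 1.3
ct, st = np.cos(theta/2), np.sin(theta/2)
ep, em = np.exp(0.5j*phi), np.exp(-0.5j*phi)

# S_R of (24.318); its columns are the rotated rest-frame spinors
SR = np.array([[ep*ct, ep*st, 0, 0],
               [-em*st, em*ct, 0, 0],
               [0, 0, ep*ct, ep*st],
               [0, 0, -em*st, em*ct]], dtype=complex)

projs = [np.outer(SR[:, k], SR[:, k].conj()) for k in range(4)]

herm_ok = all(np.allclose(P, P.conj().T) for P in projs)   # Hermitian
idem_ok = all(np.allclose(P @ P, P) for P in projs)        # idempotent
complete_ok = np.allclose(sum(projs), np.eye(4))           # (24.327)
```

The sign flip of ψ′₄ relative to the fourth column is irrelevant here, since the projector is quadratic in the spinor.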


It is worth mentioning that in (24.322)-(24.325) the components of the upper two rows and the lower two rows remain separated within the two pairs of Dirac spinors ψ₁(x), ψ₂(x) and ψ₃(x), ψ₄(x). The spatial rotation does not change the relative movement between the observers standing still on the individual inertial frames of reference. This is the case with the frames O and O_R. Here, we have defined O_R as an inertial frame of reference that is reached by the spatial rotation of O. Consequently, if the observer standing still on O_R sees an electron at rest, the other observer on O sees that electron at rest as well. Thus, the physical situation is, in that sense, the same for the two observers, whether on O or on O_R. Meanwhile, defining the Lorentz boost factor S_ω ≡ [S(Λb)]⁻¹ in (24.293) as

\[ S_\omega = \begin{pmatrix} \cosh\frac{\omega}{2} & 0 & -\sinh\frac{\omega}{2} & 0 \\ 0 & \cosh\frac{\omega}{2} & 0 & \sinh\frac{\omega}{2} \\ -\sinh\frac{\omega}{2} & 0 & \cosh\frac{\omega}{2} & 0 \\ 0 & \sinh\frac{\omega}{2} & 0 & \cosh\frac{\omega}{2} \end{pmatrix}, \tag{24.328} \]

we get

\[ S_\omega A S_\omega^{-1} = (-2m)\begin{pmatrix} \cosh\frac{\omega}{2} & 0 & -\sinh\frac{\omega}{2} & 0 \\ 0 & \cosh\frac{\omega}{2} & 0 & \sinh\frac{\omega}{2} \\ -\sinh\frac{\omega}{2} & 0 & \cosh\frac{\omega}{2} & 0 \\ 0 & \sinh\frac{\omega}{2} & 0 & \cosh\frac{\omega}{2} \end{pmatrix}\begin{pmatrix} 0 & 0 \\ 0 & E_2 \end{pmatrix}\begin{pmatrix} \cosh\frac{\omega}{2} & 0 & \sinh\frac{\omega}{2} & 0 \\ 0 & \cosh\frac{\omega}{2} & 0 & -\sinh\frac{\omega}{2} \\ \sinh\frac{\omega}{2} & 0 & \cosh\frac{\omega}{2} & 0 \\ 0 & -\sinh\frac{\omega}{2} & 0 & \cosh\frac{\omega}{2} \end{pmatrix} \]

\[ = (-2m)\begin{pmatrix} -\sinh^2\frac{\omega}{2} & 0 & -\cosh\frac{\omega}{2}\sinh\frac{\omega}{2} & 0 \\ 0 & -\sinh^2\frac{\omega}{2} & 0 & \cosh\frac{\omega}{2}\sinh\frac{\omega}{2} \\ \cosh\frac{\omega}{2}\sinh\frac{\omega}{2} & 0 & \cosh^2\frac{\omega}{2} & 0 \\ 0 & -\cosh\frac{\omega}{2}\sinh\frac{\omega}{2} & 0 & \cosh^2\frac{\omega}{2} \end{pmatrix}. \tag{24.329} \]

Rewriting (24.329) by use of the formulae of the hyperbolic functions in (24.314), we obtain

\[ S_\omega A S_\omega^{-1} = \begin{pmatrix} p'^0-m & 0 & p'^3 & 0 \\ 0 & p'^0-m & 0 & -p'^3 \\ -p'^3 & 0 & -p'^0-m & 0 \\ 0 & p'^3 & 0 & -p'^0-m \end{pmatrix} \quad (p'^3 > 0). \tag{24.330} \]

In the above computation, we put ϕ = θ = 0 in (24.293). From (24.314) we had

\[ \cosh\frac{\omega}{2}\sinh\frac{\omega}{2} = \sqrt{\frac{p'^0+m}{2m}}\sqrt{\frac{p'^0-m}{2m}} = \frac{|\mathbf{p}'|}{2m} = \frac{p'^3}{2m}. \]

This is identical with the RHS of (21.80). The Dirac spinor is given by, e.g.,

\[ \psi'_1(x') = S_\omega\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = e^{-ip'x'}\begin{pmatrix} \cosh\frac{\omega}{2} \\ 0 \\ -\sinh\frac{\omega}{2} \\ 0 \end{pmatrix} = e^{-ip'x'}\sqrt{\frac{p'^0+m}{2m}}\begin{pmatrix} 1 \\ 0 \\ -\frac{|p'^3|}{p'^0+m} \\ 0 \end{pmatrix}, \tag{24.331} \]

\[ \psi'_2(x') = S_\omega\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} = e^{-ip'x'}\begin{pmatrix} 0 \\ \cosh\frac{\omega}{2} \\ 0 \\ \sinh\frac{\omega}{2} \end{pmatrix} = e^{-ip'x'}\sqrt{\frac{p'^0+m}{2m}}\begin{pmatrix} 0 \\ 1 \\ 0 \\ \frac{|p'^3|}{p'^0+m} \end{pmatrix}. \tag{24.332} \]

Similarly, we have

\[ \psi'_3(x') = S_\omega\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} = e^{ip'x'}\begin{pmatrix} -\sinh\frac{\omega}{2} \\ 0 \\ \cosh\frac{\omega}{2} \\ 0 \end{pmatrix} = e^{ip'x'}\sqrt{\frac{p'^0+m}{2m}}\begin{pmatrix} -\frac{|p'^3|}{p'^0+m} \\ 0 \\ 1 \\ 0 \end{pmatrix}, \tag{24.333} \]

\[ \psi'_4(x') = S_\omega\begin{pmatrix} 0 \\ 0 \\ 0 \\ -1 \end{pmatrix} = e^{ip'x'}\begin{pmatrix} 0 \\ -\sinh\frac{\omega}{2} \\ 0 \\ -\cosh\frac{\omega}{2} \end{pmatrix} = e^{ip'x'}\sqrt{\frac{p'^0+m}{2m}}\begin{pmatrix} 0 \\ -\frac{|p'^3|}{p'^0+m} \\ 0 \\ -1 \end{pmatrix}. \tag{24.334} \]

In the case of the Lorentz boost, we see that, unlike in (24.322)-(24.325), the components of the upper two rows and the lower two rows are intermixed within the two pairs of Dirac spinors ψ′₁(x′), ψ′₂(x′) and ψ′₃(x′), ψ′₄(x′). This is because whereas one observer is observing the electron at rest, the other is observing the moving electron. Using (24.301) and (24.302), we have



B = −2m − A.   (24.335)

Therefore, multiplying both sides of (24.335) by S_ω from the left and S_ω⁻¹ from the right and further using (24.330), we get

\[ S_\omega B S_\omega^{-1} = -2m - S_\omega A S_\omega^{-1} = \begin{pmatrix} -p'^0-m & 0 & -p'^3 & 0 \\ 0 & -p'^0-m & 0 & p'^3 \\ p'^3 & 0 & p'^0-m & 0 \\ 0 & -p'^3 & 0 & p'^0-m \end{pmatrix}. \tag{24.336} \]

Equation (24.336) is the operator that gives the negative-energy solution of the Dirac equation. From (24.293) we see that S(Λ) is decomposed into S(Λ) = S_R S_ω. Although S_R is unitary, S_ω is not a unitary matrix; accordingly, S is not a unitary matrix (nor a normal matrix) as a whole. Using (24.329) in combination with (24.318), we obtain

\[ SAS^{-1} = S_R S_\omega A S_\omega^{-1} S_R^{-1} = S(\Lambda)A[S(\Lambda)]^{-1} \]

\[ = (-2m)\begin{pmatrix} -\sinh^2\frac{\omega}{2} & 0 & -\cos\theta\cosh\frac{\omega}{2}\sinh\frac{\omega}{2} & e^{i\phi}\sin\theta\cosh\frac{\omega}{2}\sinh\frac{\omega}{2} \\ 0 & -\sinh^2\frac{\omega}{2} & e^{-i\phi}\sin\theta\cosh\frac{\omega}{2}\sinh\frac{\omega}{2} & \cos\theta\cosh\frac{\omega}{2}\sinh\frac{\omega}{2} \\ \cos\theta\cosh\frac{\omega}{2}\sinh\frac{\omega}{2} & -e^{i\phi}\sin\theta\cosh\frac{\omega}{2}\sinh\frac{\omega}{2} & \cosh^2\frac{\omega}{2} & 0 \\ -e^{-i\phi}\sin\theta\cosh\frac{\omega}{2}\sinh\frac{\omega}{2} & -\cos\theta\cosh\frac{\omega}{2}\sinh\frac{\omega}{2} & 0 & \cosh^2\frac{\omega}{2} \end{pmatrix}. \tag{24.337} \]

Using (24.314) and (21.79), (24.337) can be rewritten in the polar coordinate representation as

\[ SAS^{-1} = \begin{pmatrix} p'^0-m & 0 & |\mathbf{p}'|\cos\theta & -|\mathbf{p}'|e^{i\phi}\sin\theta \\ 0 & p'^0-m & -|\mathbf{p}'|e^{-i\phi}\sin\theta & -|\mathbf{p}'|\cos\theta \\ -|\mathbf{p}'|\cos\theta & |\mathbf{p}'|e^{i\phi}\sin\theta & -p'^0-m & 0 \\ |\mathbf{p}'|e^{-i\phi}\sin\theta & |\mathbf{p}'|\cos\theta & 0 & -p'^0-m \end{pmatrix}. \tag{24.338} \]

Converting from the polar coordinates to the Cartesian coordinates, we find that (24.338) is identical with the (4, 4) matrix of (21.75). Namely, we have successfully obtained

SAS⁻¹ = G.   (24.339)
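Equations (24.330) and (24.339) can be checked numerically: conjugating A by S_ω reproduces the boosted operator with p′⁰ = m cosh ω and p′³ = m sinh ω, while the similarity transformation preserves the trace (−4m) and the doubly degenerate spectrum {0, −2m}. A sketch for the single-boost case ϕ = θ = 0:

```python
import numpy as np

m, omega = 1.0, 1.4
ch, sh = np.cosh(omega/2), np.sinh(omega/2)

S_w = np.array([[ch, 0, -sh, 0], [0, ch, 0, sh],
                [-sh, 0, ch, 0], [0, sh, 0, ch]])     # S_omega of (24.328)
A = np.diag([0, 0, -2*m, -2*m])

G = S_w @ A @ np.linalg.inv(S_w)                      # (24.330)

p0, p3 = m*np.cosh(omega), m*np.sinh(omega)           # (24.313)
G_expected = np.array([[p0-m, 0, p3, 0],
                       [0, p0-m, 0, -p3],
                       [-p3, 0, -p0-m, 0],
                       [0, p3, 0, -p0-m]])
boost_ok = np.allclose(G, G_expected)

trace_ok = np.isclose(np.trace(G), -4*m)              # trace invariant
spec_ok = np.allclose(np.sort(np.linalg.eigvals(G).real), [-2*m, -2*m, 0, 0])
```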

We remark that, when deriving (24.338), we may choose Λ̃_b⁻¹ of (24.240) for the Lorentz boost in place of S(Λ) = S(Λϕ)S(Λθ)[S(Λb)]⁻¹. Using [S(Λ̃_b)]⁻¹, instead of SAS⁻¹ of (24.338) we get

[S(Λ̃_b)]⁻¹ A S(Λ̃_b),

which yields the same result as SAS⁻¹ of (24.338). This is evident in light of (24.321). Corresponding to (24.243), we have

-1

= SðΛu ÞS Λb- 1 S Λ{u = SðΛu ÞS Λb- 1 S Λu- 1

= SðΛu ÞS Λb- 1 ½SðΛu Þ - 1 = SðΛu ÞS Λb- 1 ½SðΛu Þ{

cosh

ω 2

0 ¼

ω 2 ω - iϕ sin θ sinh e 2 - cos θ sinh

cosh

ω 2 ω e - iϕ sin θ sinh 2 ω cosh 2 - cos θ sinh

0 ω 2

ω 2 ω cos θ sinh 2

eiϕ sin θ sinh

0

ω 2 ω cos θ sinh 2

eiϕ sin θ sinh

:

0 cosh

ω 2 ð24:340Þ

In (24.340) we obtain the representation [S(Λ̃_b)]⁻¹ as a consequence of viewing the operator S(Λ_b⁻¹) in reference to the frame O′. The trace of (24.340) is 4 cosh(ω/2), which is the same as that of [S(Λb)]⁻¹, reflecting the fact that a unitary similarity transformation keeps the trace unchanged; see (24.328).

Now, the significance of (24.339) is evident and consistent with (24.281). Tracing the calculation of (24.339) reversely, we find that S⁻¹GS = A. That is, G has been diagonalized to yield A through the similarity transformation with S. The operator G possesses the eigenvalues −2m (doubly degenerate) and zero (doubly degenerate). We also find that the rank of G is two. From (24.338) and (24.339), we find that the trace of G is −4m, which is identical with the traces of the matrices A and B of (24.301) and (24.302).

Let us get back to (24.282) once again. Sandwiching both sides of (24.282) between S⁻¹ and S from the left and right, respectively, we obtain

S⁻¹GS + S⁻¹G̃S = −2mS⁻¹ES.   (24.341)

Using (24.339), we have A + S⁻¹G̃S = −2mE. Further using (24.301) and (24.302), we get


Table 24.2 Eigenvalues and their corresponding Dirac spinors (eigenspinors) of the Dirac operators

  Dirac operator^a        Eigenvalue   Dirac spinor^b
  (frame O)  (frame O′)                (frame O)          (frame O′)
  A          G            0            ψ1(x), ψ2(x)       Sψ1(x), Sψ2(x)
                          −2m          ψ3(x), ψ4(x)       Sψ3(x), Sψ4(x)
  B          G̃            0            ψ3(x), ψ4(x)       Sψ3(x), Sψ4(x)
                          −2m          ψ1(x), ψ2(x)       Sψ1(x), Sψ2(x)

^a A and B are taken from (24.301) and (24.302), respectively. G and G̃ are connected to A and B through the similarity transformation by S, respectively; see (24.339) and (24.342)
^b ψ1(x) and ψ2(x) are taken from (24.305); ψ3(x) and ψ4(x) are taken from (24.310)

S⁻¹G̃S = −2mE − A = B  or  SBS⁻¹ = G̃.   (24.342)

The operator G̃ is found to possess the eigenvalues −2m (doubly degenerate) and zero (doubly degenerate) as well, in agreement with the argument made at the end of Sect. 24.5. Thus, choosing S of (24.293) as an appropriate diagonalizing matrix, both G and G̃ have been successfully diagonalized so as to yield A and B, respectively. Table 24.2 summarizes the eigenvalues and their corresponding Dirac spinors (eigenspinors) of the Dirac operators. Note that both eigenvalues 0 and −2m are doubly degenerate. Of the Dirac spinors, those belonging to the eigenvalue 0 are the solutions of the Dirac equation. Following (24.277), we define

Sϕ(x) ≡ ϕ′(x′)  and  Sχ(x) ≡ χ′(x′).   (24.343)

In (24.343), ϕ(x) is chosen from among ψ₁(x) and ψ₂(x) of (24.305) and represents the positive-energy state of an electron at rest; χ(x) is chosen from among ψ₃(x) and ψ₄(x) of (24.310) and represents the negative-energy state of an electron at rest. Using (24.301), (24.303) is rewritten as

Aϕ(x) = 0.   (24.344)

Then, we have successively

AS⁻¹·Sϕ(x) = 0,  SAS⁻¹·Sϕ(x) = 0.   (24.345)

The second equation of (24.345) can be rewritten as

Gϕ′(x′) = 0,   (24.346)

where G = SAS⁻¹ and ϕ′(x′) = Sϕ(x). Taking the reverse process of the above three equations, we get

Gϕ′(x′) = 0,  GS·S⁻¹ϕ′(x′) = 0,  S⁻¹GS·S⁻¹ϕ′(x′) = 0.   (24.347)

The final equation of (24.347) is identical to (24.344), where

S⁻¹GS = A  and  S⁻¹ϕ′(x′) = ϕ(x).   (24.348)

The implication of (24.348) is that once G has been diagonalized with S, ϕ′(x′) is at once given by operating S on a simple function ϕ(x) that represents the state of an electron at rest. This means that the Dirac equation has been solved. With negative-energy state of an electron, starting with (24.306), we reach a related result described by ~ 0 ð x0 Þ = 0 Gχ

ð24:349Þ

~ = SBS - 1 and χ ′(x′) = Sχ(x). Evidently, (24.346) and (24.349) correspond to with G ~ (21.73) and (21.74), respectively. At the same time, we find that both G and G possess eigenvalues -2m and 0. It is evident from (24.303) and (24.306). At the end of this section, we remark that in relation to the charge conjugation mentioned in Sect. 21.5, using the matrix C of (21.164) we obtain following relations: CAC - 1 = B and CBC - 1 = A:

24.6.2 Case II: Non-Coaxial Lorentz Boosts

As already seen, the Lorentz transformations include unitary operators related to the spatial rotation and Hermitian operators (or real symmetric operators) called Lorentz boosts. In the discussion made thus far, we focused on transformations that included only a single Lorentz boost. In this section, we deal with the case where the transformation contains multiple Lorentz boosts, especially non-coaxial Lorentz boosts. When non-coaxial Lorentz boosts are involved, we encounter a somewhat peculiar feature. As described in Sect. 24.2.2, the successive Lorentz transformations consisted of a set of two spatial rotations and a single Lorentz boost. In this section, we make a further step toward constructing the Lorentz transformations. Considering those three successive transformations as a unit, we combine two unit transformations so that the resulting transformation contains two Lorentz boosts. We have two cases for this: (i) the boosts are coaxial; (ii) the boosts are non-coaxial. Whereas in the former case the boost directions are the same when viewed from a certain inertial frame of reference, in the latter case the boost directions are different.


As can be seen from Fig. 24.2a, if the rotation angle θ ≠ 0 in the frame Oθ, the initial Lorentz boost and the subsequently occurring boost are non-coaxial. Here, we assume that the second Lorentz boost takes place along the z′-axis of the frame O′. In the sense of the non-coaxial boost, the situation is equivalent to the following case: an observer A₁ is watching an electron moving with a constant velocity in the direction of, say, the z-axis in a certain inertial frame of reference F₁ where the observer A₁ stays at rest. Meanwhile, another observer A₂ is moving with a constant velocity in the direction of, say, the x-axis relative to the frame F₁. On this occasion, we have the question of how the electron is moving in reference to the frame F₂ where the observer A₂ stays at rest. In Sect. 24.2.2 we expressed the successive transformations as

Λ = Λϕ Λθ Λb⁻¹.   (24.236)

Let us denote this transformation by (ϕ, θ, b). Let another transformation be (ϕ′, θ′, b′), described by

Λ′ = Λϕ′ Λθ′ Λb′⁻¹.   (24.350)

Suppose that (ϕ′, θ′, b′) takes place subsequently to the initial transformation (ϕ, θ, b). The overall transformation Ξ is then described as

Ξ = Λ′Λ = Λϕ′ Λθ′ Λb′⁻¹ Λϕ Λθ Λb⁻¹.   (24.351)

Note that in (24.351) we are viewing the transformations in the moving coordinate systems. That is, in the second transformation Λ′ the boost takes place in the direction of the z′-axis of the frame O′ (Fig. 24.2a). As a result, the successive transformations Λ′Λ cause non-coaxial Lorentz boosts in the case of θ ≠ 0. In the special case where θ = 0 is assumed for the original transformation Λ, however, Ξ = Λ′Λ produces coaxial boosts. Let the general form of the Dirac equation be

Dψ(x) = 0,   (24.352)

where D represents either G of (24.339) or G̃ of (24.342). Starting with the rest frame of the electron, we have D = A or D = B and ψ(x) = e^{±imx⁰}w(0, h); see (24.296) for the latter equation. Operating with S(Λ) of (24.293) on (24.352) from the left, we obtain

S(Λ)D[S(Λ)]⁻¹·S(Λ)ψ(x) = 0.   (24.353)

Defining D′ ≡ S(Λ)D[S(Λ)]⁻¹ and using (24.277), we have

D′ψ′(x′) = 0.   (24.354)

Further operating with S(Λ′) on (24.354) from the left, we get

S(Λ′)D′[S(Λ′)]⁻¹·S(Λ′)ψ′(x′) = 0,   (24.355)

where Λ′ was given in (24.350). Once again, defining D″ ≡ S(Λ′)D′[S(Λ′)]⁻¹ and ψ″(x″) ≡ S(Λ′)ψ′(x′), we have

D″ψ″(x″) = 0.   (24.356)

We assume that after the successive Lorentz transformations Λ and Λ′ we reach the frame O″. Also, we have

ψ″(x″) = S(Λ′)ψ′(x′) = S(Λ′)S(Λ)ψ(x) = S(Λ′Λ)ψ(x),
D″ = S(Λ′)D′[S(Λ′)]⁻¹ = S(Λ′)S(Λ)D[S(Λ)]⁻¹[S(Λ′)]⁻¹ = S(Λ′)S(Λ)D[S(Λ′)S(Λ)]⁻¹ = S(Λ′Λ)D[S(Λ′Λ)]⁻¹.   (24.357)

In the second equation of (24.357), with the last equality we used the fact that S(Λ) is a representation of the Lorentz group. By repeatedly operating with S(Λ), S(Λ′), etc., we can obtain a series of successively transformed Dirac spinors and corresponding Dirac operators. Thus, the successive transformations of the Dirac operators can be viewed as a similarity transformation as a whole. In what follows, we focus on a case where two non-coaxial Lorentz boosts take place.
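The chain (24.357) is just the composition of similarity transformations: because S is a representation, applying S(Λ′) after S(Λ) is the same as applying S(Λ′)S(Λ) in one step. A sketch with two factors of the form (24.293) whose boost parts are non-coaxial (sample parameters):

```python
import numpy as np

def S_of(phi, theta, omega):
    """S(Lambda) = S(L_phi) S(L_theta) [S(L_b)]^{-1}, as in (24.293)."""
    ct, st = np.cos(theta/2), np.sin(theta/2)
    ch, sh = np.cosh(omega/2), np.sinh(omega/2)
    ep, em = np.exp(0.5j*phi), np.exp(-0.5j*phi)
    S_phi = np.diag([ep, em, ep, em])
    S_theta = np.array([[ct, st, 0, 0], [-st, ct, 0, 0],
                        [0, 0, ct, st], [0, 0, -st, ct]], dtype=complex)
    S_b_inv = np.array([[ch, 0, -sh, 0], [0, ch, 0, sh],
                        [-sh, 0, ch, 0], [0, sh, 0, ch]], dtype=complex)
    return S_phi @ S_theta @ S_b_inv

m = 1.0
D = np.diag([0, 0, -2*m, -2*m]).astype(complex)   # start from the rest frame, D = A

S1 = S_of(0.3, 0.7, 0.9)          # first transformation  (phi, theta, omega)
S2 = S_of(1.1, 0.4, 1.2)          # second transformation (phi', theta', omega')

# Step by step: D' = S1 D S1^{-1}, then D'' = S2 D' S2^{-1}
D1 = S1 @ D @ np.linalg.inv(S1)
D2_stepwise = S2 @ D1 @ np.linalg.inv(S2)
# One shot with the composed matrix S(Lambda' Lambda) = S2 S1
S21 = S2 @ S1
D2_composed = S21 @ D @ np.linalg.inv(S21)

compose_ok = np.allclose(D2_stepwise, D2_composed)
# The spectrum {0, -2m} survives every similarity transformation
spec_ok = np.allclose(np.sort(np.linalg.eigvals(D2_composed).real), [-2*m, -2*m, 0, 0])
```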



=2

cos

S Λb ′

-1

=

ω′ ω′ θ′ θ′ ′ sinh eiϕ =2 sin sinh 2 2 2 2 ω′ ω′ θ′ θ′ - iϕ ′ =2 - iϕ ′ =2 e e sin sinh cos sinh 2 2 2 2 : ω′ ω′ θ′ θ′ iϕ ′ =2 iϕ ′ =2 e e cos cosh sin cosh 2 2 2 2 ′ ′ ′ ω ω′ θ θ ′ ′ - e - iϕ =2 sin cosh e - iϕ =2 cos cosh 2 2 2 2 ð24:358Þ - eiϕ



=2

cos

Table 24.3 Representation matrix of the Lorentz transformation Ξ = Λ′Λ. [Each of the sixteen elements is a sum of products of $\cos$ and $\sin$ of $\theta$, $\theta'$, $\phi$, $\phi'$ with $\cosh$ and $\sinh$ of $\omega$, $\omega'$; for instance, the time-time element reads $\cosh\omega'\cosh\omega + \cos\theta\,\sinh\omega'\sinh\omega$.]


Table 24.4 Representation matrix of S(Λ′Λ). [Each matrix element is a sum of two terms of the form $e^{\pm i(\phi'\pm\phi)/2}$ multiplied by products of $\cos$ or $\sin$ of $\theta'/2$ and $\theta/2$ with $\cosh$ or $\sinh$ of $(\omega'\pm\omega)/2$.]


Table 24.5 Representation matrix of the Dirac operator $D'' = S(\Lambda'\Lambda)\,A\,[S(\Lambda'\Lambda)]^{-1} = (-2m)\times[\cdots]$. [The elements of the bracketed (4, 4) matrix are built from hyperbolic functions of $\omega$, $\omega'$ and trigonometric functions of $\theta$, $\theta'$, $\phi$, $\phi'$.] In the matrix, a symbol such as (1, 1) means that the matrix element in that position should be replaced with the (1, 1) element.


As shown in Sect. 24.5, the Dirac spinors ψ(x) were expressed in reference to the rest frame of the electron as

$$\psi(x) = e^{\pm ip\cdot x}\,w(0, h) = e^{\pm imx^0}\,w(0, h). \qquad (24.296)$$

Then, we have

$$\psi''(x'') = S(\Xi)\,\psi(x) = e^{\pm imx^0}\,S(\Xi)\,w(0, h). \qquad (24.359)$$

The spinors w(0, h) were given in (24.305) and (24.310) according to the positive- and negative-energy electron. Thus, the individual columns of S(Ξ) in Table 24.4 are directly associated with the four spinors of either the positive- or negative-energy electron, as in the case of (24.293). The orthogonality characteristics hold with the Dirac spinors as in (21.141) and (21.142). The individual matrix elements of S(Ξ) consist of two complex terms. Note, however, that in the case of the co-axial boosts (i.e., θ = 0) each matrix element comprises a single term. In both the co-axial and non-coaxial boost cases, the individual four columns of S(Λ′Λ) represent the Dirac spinors with respect to the frame O′′. As mentioned before, the sign for $\psi_4''(x'')$ should be reversed relative to the fourth column of S(Λ′Λ) because of the charge conjugation.

Meanwhile, Table 24.5 lists the individual matrix elements of the (4, 4) matrix of the Dirac operator D′′ for the positive-energy electron. Using S(Λ′Λ), the operator D′′ viewed in reference to the frame O′′ is described by

$$D'' = S(\Lambda'\Lambda)\,A\,[S(\Lambda'\Lambda)]^{-1} = S(\Xi)\,A\,[S(\Xi)]^{-1}, \qquad (24.360)$$

where A was given in (24.301). As can be seen in (21.75), the Dirac operator contains all the information of the rest mass of an electron and the energy-momentum of a plane wave electron. This is clearly stated in (24.278). In the x′′-system (i.e., the frame O′′) the Dirac equation is described in a similar way such that

$$\left(i\gamma^\mu \partial''_\mu - m\right)\psi''(x'') = 0.$$

The point is that in any inertial frame of reference the matrix form of the gamma matrices is held invariant. Or rather, S(Ξ) has been chosen for the gamma matrices to be invariant in such a way that

$$\Lambda^\nu{}_\mu\, S\gamma^\mu S^{-1} = \gamma^\nu. \qquad (24.276)$$

Readers are encouraged to confirm this relation by calculating a tangible example. Thus, with the plane wave solution (of the positive-energy case) in frame O′′ we obtain

1228

24

Basic Formalism

p00 μ γ μ - m  ψ 00 ðx00 Þ = 0, where p′′μ is a four-momentum in O′′. Hence, even though the matrix form of Table 24.5 is somewhat complicated, the Dirac equation must uniquely be represented in any inertial frame of reference. Thus, from Table 24.5 we get the following four-momenta in the frame O′′ described by p00 = ½ð1, 1Þ - ð3, 3Þ=2, m = ½ð1, 1Þ þ ð3, 3Þ=ð- 2Þ, 0

p00 = ð1, 3Þ, p1 = ½ - ð1, 4Þ þ ð4, 1Þ=2, p2 = ½ð1, 4Þ þ ð4, 1Þi=2, 3

ð24:361Þ

where (m, n) to which 1 ≤ m, n ≤ 4 indicates the (m, n)-element of the matrix given in Table 24.5. As noted from (21.75), we can determine other matrix elements from the four elements indicated in Table 24.5, in which we certainly confirm that (1, 1) + (3, 3) = - 2m. This is in agreement with G of (21.75) and (21.91). Thus, from (24.361) and Table 24.5 we get p000 = mðcosh ω0 cosh ω þ cos θ sinh ω0 sinh ωÞ, p003 = m cos θ0 ðsinh ω0 cosh ω þ cos θ cosh ω0 sinh ωÞ - m sin θ0 sin θ cos ϕ sinh ω, p001 = m½ sin θ0 cos ϕ0 ðsinh ω0 cosh ω þ cos θ cosh ω0 sinh ωÞ þ cosðϕ0 þ ϕÞ cos 2

θ0 θ0 sin θ sinh ω - cosðϕ0 - ϕÞ sin 2 sin θ sinh ω, 2 2

p002 = m½ sin θ0 sin ϕ0 ðsinh ω0 cosh ω þ cos θ cosh ω0 sinh ωÞ þ sinðϕ0 þ ϕÞ cos 2

θ0 sin θ sinh ω 2

- sinðϕ0 - ϕÞ sin 2

θ0 sin θ sinh ω: 2

ð24:362Þ

From the above relations, we recover p000

2

- p002 - m2 = 0

or

p000 =

p002 þ m2 :

ð21:79Þ

As seen above, the basic construction of the Dirac equation remains unaltered by the operation of successive Lorentz transformations including the non-coaxial boosts.

24.7

Projection Operators and Related Operators for the Dirac Equation

1229

Example 24.5 We show another simple example of the non-coaxial Lorentz boosts. Suppose that in the moving coordinate systems we have successive non-coaxial boosts Λ described by cosh ω ′ - sinh ω ′ - sinh ω 0

Λ=



0

cosh ω 0



0

0

0

0 1 0

0 0 1

cosh ω ′ cosh ω - sinh ω ′ =

cosh ω 0 0

0 1 0

0 0 1

sinh ω 0 0

sinh ω

0

0

cosh ω

0

cosh ω ′ sinh ω

- sinh ω ′ cosh ω 0

cosh ω ′ 0

0 1

- sinh ω ′ sinh ω 0

sinh ω

0

0

cosh ω

:

ð24:363Þ

Equation (24.363) corresponds to a special case of (24.351), where θ = - θ′ = π/ 2 and ϕ = ϕ′ = 0; see Table 24.3. Readers are encouraged to imagine the geometric feature of the successive non-coaxial boosts in the moving coordinate systems; see Fig. 24.2.

24.7

Projection Operators and Related Operators for the Dirac Equation

In this section, we examine the constitution of the Dirac equation in terms of the projection operator. In the next section, related topics are discussed as well. To this end, we wish to deal with the issue from a somewhat general standpoint. Let G be expressed as G=

pE -H

H ðp ≠ qÞ, qE

where E is a (2, 2) identity matrix; H is a (2, 2) Hermitian matrix. Then, we have G{ G =

p2 E þ H 2 ðq - pÞH

ðq - pÞH , GG{ = q2 E þ H 2

p2 E þ H 2 ðp - qÞH

ðp - qÞH : ð24:364Þ q2 E þ H 2

For G to be a normal matrix (i.e., G{G = GG{), we must have H  0. In our present case, if we choose G of (21.91) for G, we have p = p0 - m, q = - p0 - m, and H = - σ  p. Thus, H  0 is equivalent to p  0. This means that an electron is at rest. Namely, G is normal (and Hermitian) only if the electron is at rest. As long as the electron is moving (in a certain inertial frame of reference), G is not normal. In this situation, Theorem 14.5 tells us that G could only be diagonalized via similarity

1230

24

Basic Formalism

transformation using a non-unitary operator, if p ≠ 0. In fact, in (24.339) using a non-unitary operator S, G could be diagonalized. This means that although the operator G is not normal (nor Hermitian), it is semi-simple (Sect. 12.5). The above ~ of (21.108) as shown in (24.342). discussion is true of the operator G In the discussion of the previous section, we have examined the constitution of the operator S and known that their non-unitarity comes from the Lorentz boost. ~ is given, the diagonalizing matrix form for S can uniquely be However, once G or G determined (see Sect. 24.9). Let us start our main theme about the projection operator. From (24.301) and (24.302), we obtain A B þ = E, ð- 2mÞ ð- 2mÞ

ð24:365Þ

where E is the identity operator; i.e., a (4, 4) identity matrix. Operating S from the left and S-1 from the right on both sides of (24.365), respectively, we get SAS - 1 SBS - 1 þ = SES - 1 = E ð- 2mÞ ð- 2mÞ

SAS - 1 þ SBS - 1 = ð- 2mÞE:

or

ð24:366Þ

Using (24.339) and (24.342), we restore (24.282). Meanwhile, operating both sides of (24.366) on ϕ′(x′) of (24.346) from the left and taking account of (24.282), we obtain ~ 0 ðx0 Þ = - 2mϕ0 ðx0 Þ: Gϕ

ð24:367Þ

Operating both sides of (24.366) on χ ′(x′) of (24.349) from the left and taking account of (24.282) once again, we have Gχ 0 ðx0 Þ = - 2mχ 0 ðx0 Þ:

ð24:368Þ

Now, suppose that the general solutions of the plane wave Dirac field Ψ(x′) are given by 4

Ψðx0 Þ =

α ψ 0 ðx0 Þ, i=1 i i

ð24:369Þ

where ψ 0i ðx0 Þ are those of (24.311) and αi are their suitable coefficients. Then, ~ from the left on both sides of (24.369), we get operating G ~ ð x0 Þ = GΨ

2 i=1

ð- 2mÞαi ψ 0i ðx0 Þ,

ð24:370Þ

24.7

Projection Operators and Related Operators for the Dirac Equation 4

GΨðx0 Þ =

i=3

ð- 2mÞαi ψ 0i ðx0 Þ:

1231

ð24:371Þ

Rewriting the above equations, we obtain ~ ðx0 Þ=ð- 2mÞ = GΨ

2

α ψ 0 ðx0 Þ, i=1 i i

ð24:372Þ

GΨðx0 Þ=ð- 2mÞ =

4

ð24:373Þ

α ψ 0 ðx0 Þ: i=3 i i

~ ðx0 Þ=ð- 2mÞ extracts the positiveEquations (24.372) and (24.373) mean that GΨ ′ energy states from Ψ(x ) including the coefficients and that GΨðx0 Þ=ð- 2mÞ extracts the negative-energy states from Ψ(x′) including the coefficients as well. These imply ~ ðx0 Þ=ð- 2mÞ and GΨðx0 Þ=ð- 2mÞ act as the projection that both the operators GΨ operators (see Sects. 14.1 and 18.7). Defining P and Q as P

~ - pμ γ μ - m pμ γ μ þ m G = = , 2m ð- 2mÞ ð- 2mÞ

Q

- pμ γ μ þ m pμ γ μ - m G = = 2m ð- 2mÞ ð- 2mÞ

ð24:374Þ

and referring to (24.282), we obtain P þ Q = E:

ð24:375Þ

pμ γ μ þ m p ρ γ ρ þ m pμ γ μ pρ γ ρ þ 2mpρ γ ρ þ m2 : = ð2mÞð2mÞ ð2mÞð2mÞ

ð24:376Þ

From (24.374), we have P2 =

To calculate (24.376), we take account of the following relation described by [13] p μ γ μ pρ γ ρ = pμ pρ γ μ γ ρ =

1 1 p p ðγ μ γ ρ þ γ ρ γ μ Þ = pμ pρ  2ημρ = pμ pμ 2 μ ρ 2 = p2 = m 2 ,

ð24:377Þ

where with the third equality we used (21.64). Substituting (24.377) into (24.376), we rewrite it as P2 =

2m pρ γ ρ þ m pμ γ μ þ m = = P, 2m ð2mÞð2mÞ

ð24:378Þ

where the last equality came from the first equation of (24.374). Or, using the matrix algebra we reach the same result at once. That is, we have

1232

24

P= P2 =

Basic Formalism

SBS - 1 , ð- 2mÞ

ð- 2mÞSBS - 1 SBS - 1 SBS - 1 SBS - 1 SB2 S - 1 = =  = = P, ð- 2mÞ ð- 2mÞ ð- 2mÞ2 ð- 2mÞ ð- 2mÞ2

ð24:379Þ

where with the third equality we used (24.302). Similarly, we obtain Q2 = Q:

ð24:380Þ

PQ = QP = 0:

ð24:381Þ

Moreover, we have

The relations (24.375) through (24.381) imply that P and Q are projection operators. However, notice that neither P nor Q is Hermitian. This property originally comes from the involvement of the Lorentz boost term of (24.293). In this regard, P and Q can be categorized as projection operators sensu lato (Sect. 18.7), even though these operators are idempotent as can be seen from (24.378) and (24.380). Equation (24.375) represents the completeness relation. Let us consider the following example. Example 24.6 We have an interesting and important example in relation to the projection operators and spectral decomposition dealt with in Sect. 14.3. A projection operator was defined using an adjoint operator (see Sect. 14.1) such that P~k = jwk ihwk j :

ð24:382Þ

Notice that (24.382) postulates that P~k is Hermitian. That is, ðjwi ihwi jÞ{ = ðhwi jÞ{ ðjwi iÞ{ = jwi ihwi j :

ð14:28Þ

To apply this notation to the present case, we define following vectors in relation to (24.304) and (24.309) as: j1i  u1 ð0, - 1Þ, j2i  u2 ð0, þ1Þ;

ð24:383Þ

j3i  v1 ð0, þ1Þ, j4i  v2 ð0, - 1Þ:

ð24:384Þ

Defining ~  P we have

A ð- 2mÞ

and

~  Q

B , ð- 2mÞ

ð24:385Þ

24.7

Projection Operators and Related Operators for the Dirac Equation 2

2

P þ Q = E, P = P, Q = Q, PQ = 0:

1233

ð24:386Þ

~ are Hermitian projection operators (i.e., projection operators ~ and Q Hence, both P sensu stricto; see Sect. 18.7). This is self-evident, however, notice that neither A nor B in (24.385) involves a Lorentz boost. Then, using (24.383) and (24.384), we obtain j1ih1jþj2ih2j þ j3ih3jþj4ih4j = E:

ð24:387Þ

Rewriting (24.387) in the form of completeness relation, we have 1 0 0 0 þ

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0

þ

0 0 0 0 1 0 0 0

0 0 0 0

0 1 0 0

0 0 0 0 0 0 0 0

0 0 0 0

þ

0 0 0 0

0 0 0 0

0 0 0 1

= E:

ð24:388Þ

Though trivial, (24.388) represents the spectral decomposition. Example 24.7 Another important projection operator can readily be derived from the spin associated operator defined in Chap. 21 as 1 σp jpj

ð21:85Þ

1 σ  ð- pÞ = - S: j - pj

ð21:107Þ

S and ~  S

The above operators were (2, 2) submatrices (Sect. 21.3). The projection opera~ ~ of (4, 4) full matrix can be defined by use of the direct sum of S or S tors Σ and Σ such that 1 Eþ 2

S 0

0 S

,

ð24:389Þ

~ 1 Eþ Σ 2

~ S 0

0 ~ S

,

ð24:390Þ

Σ

~ are Hermitian. We can where E is the (4, 4) identity matrix. Notice that both Σ and Σ readily check that

1234

24

Basic Formalism

2

Σ þ Σ = E, Σ2 = Σ, Σ = Σ, ΣΣ = ΣΣ = 0:

ð24:391Þ

More importantly, we have ~ { = Σ: ~ Σ{ = Σ and Σ

ð24:392Þ

~ are the projection operators sensu stricto. In this respect, the operators Σ and Σ In Sect.21.3.2, as the solution of the Dirac equation (21.96) we have gotten the eigenfunction Χ described by χ hSχ

Χ=

:

ð21:100Þ

Operating Σ of (24.389) on Χ of (21.100) from the left, for instance, we obtain ΣΧ =

1 Eþ 2

S 0

0 S

χ hSχ

=

=

1 2

1 2

χ hSχ

χ hSχ

χ hSχ

1 2

1 þ h 2

χ hSχ

χ hSχ

If in (24.393) h = 1, we have ΣΧ = ΣΧ = 0. Consequently, Σ extracts

þ

Sχ hSSχ

χ hSχ

=

1 2

χ hSχ

þ

:

1 2

hχ hShχ

ð24:393Þ

. If h = - 1, on the other hand,

intactly when h = 1, whereas it discards

in whole when h = - 1.

For another example of the projection operator, we introduce ~h  1 E þ h Σ 2

~ S 0

:

0 ~ S

ð24:394Þ

~ h of (24.394) on Χ′ of (21.133) from the left, we obtain Operating Σ ~ h Χ0 ¼ 1 E þ h Σ 2 ¼

1 2

S~ χ - h~ χ

1 þ h 2

Sh~ χ - hh~ χ

~ S 0

¼

S~ χ - h~ χ

0 ~ S

1 2

S~ χ - h~ χ

¼

1 þ h2 2

1 2

S~ χ - h~ χ

S~ χ - h~ χ

¼

1 þ h 2 S~ χ - h~ χ

~χ S S~ ~χ - hS~

¼ Χ0 : ð24:395Þ

Equation (24.395) looks trivial, but it clearly shows that Χ′ is certainly a proper negative-energy eigenvector that belongs to the helicity h. Similarly, we define

24.8

Spectral Decomposition of the Dirac Operators

Σh 

1 Eþh 2

S 0

1235

0 S

:

ð24:396Þ

Also, we have Σh Χ = Σ h

χ hSχ

= Χ, Σh Χ0 = Σh

Sχ - hχ

= 0, Σh Χ = 0, {

~h = Σ ~h, Σ ~ h Σ h = Σh Σ ~ =Σ ~ h , etc: ~ h = 0, Σ{ = Σh , Σ Σ h 2 = Σh , Σ h h 2

ð24:397Þ ð24:398Þ

To the above derivation, use (21.89) and (21.115). It is obvious that both Σh and ~ h are projection operators sensu stricto. Σ

24.8

Spectral Decomposition of the Dirac Operators

~ (with the In the previous section, we have shown how the Dirac operators G and G eigenvalues -2m and 0 both doubly degenerate) can be diagonalized. In Sect. 14.3 we discussed the spectral decomposition and its condition. That is, we have shown that a necessary and sufficient condition for an operator to be a normal operator is that the said operator is expressed as the relation (14.74) called the spectral decomposition. This is equivalent to the statement of Theorem 14.5. In this case, relevant projection operators are the projection operators sensu stricto (i.e., Hermitian and idempotent). Meanwhile, in Sects. 12.5 and 12.7 we mentioned that similar decomposition can be done if an operator in question can be diagonalized via similarity transformation (namely, in the case where the said operator is semi-simple). In that case, Definition 14.1 of the projection operator has been relaxed so that the projection operators sensu lato can play a role (see Sect. 18.7). The terminology of spectral decomposition should be taken in a broader meaning accordingly. Bearing in mind the above discussion, we examine the conditions for the spectral decomposition of the Dirac ~ Neither G nor G ~ is Hermitian or normal, but both of them are operators G and G. semi-simple. Now, operating S and S-1 on both sides of (24.387) from the left and right, respectively, we have Sðj1ih1jþj2ih2j þ j3ih3jþj4ih4jÞS - 1 = SES - 1 = E:

ð24:399Þ

If the Lorentz boost is absent, the representation matrix S related to the Lorentz transformation is unitary (i.e., normal). In this case, (24.399) is rewritten as

1236

24

Basic Formalism

Sðj1ih1jþj2ih2j þ j3ih3jþj4ih4jÞS{ = E:

ð24:400Þ

Sjki  jk0 i ðk = 1, 2, 3, 4Þ

ð24:401Þ

Defining

and taking the adjoint of (24.401), we have hkjS{  hk0 j:

ð24:402Þ

j10 ih10 jþj20 ih20 j þ j30 ih30 jþj40 ih40 j = E:

ð24:403Þ

Then, from (24.400) we get

Individual projection operators | k′ihk′ j (k = 1, 2, 3, 4) are Hermitian and, hence, these operators are the projection operators sensu stricto. We have already seen this case in Sect. 24.7. Notice again that (24.401) represents the transformation of the Dirac spinor caused by the Lorentz transformation. If the Lorentz boost is present, however, S is not unitary. The spectral decomposition is not straightforward in that case. Instead, we have a following relation with S described by [11] γ 0 S{ γ 0 = S - 1 :

ð24:404Þ

Using (24.404) we rewrite (24.399) as Sðj1ih1jþj2ih2j þ j3ih3jþj4ih4jÞγ 0 S{ γ 0 = E: Using (24.401), we have ðj10 ih1jþj20 ih2j þ j30 ih3jþj40 ih4jÞγ 0 S{ γ 0 = E:

ð24:405Þ

Note that | 1′i and | 2′i represent two linearly independent solutions ϕ′(x′) of (24.346) and that | 3′i and | 4′i represent two linearly independent solutions χ ′(x′) of (24.349). Using h1| γ 0 = h1| , h2|γ 0 = h2j; h3| γ 0 = - h3| , h4|γ 0 = - h4j, we get ðj10 ih1jþj20 ih2j - j30 ih3j - j40 ih4jÞS{ γ 0 = E: Using (24.402), we have

ð24:406Þ

24.8

Spectral Decomposition of the Dirac Operators

1237

ðj10 ih10 jþj20 ih20 j - j30 ih30 j - j40 ih40 jÞγ 0 = E:

ð24:407Þ

hk 0 jγ 0  hk 0 j,

ð24:408Þ

j10 ih10 j þ j20 ih20 j - j30 ih30 j - j40 ih40 j = E:

ð24:409Þ

Defining

we obtain

Note that (24.408) is equivalent to (21.135) that defines the Dirac adjoint. Equation (24.409) represents the completeness relation of the Dirac spinors; be aware of the minus signs of the third and fourth terms. Multiplying P of (24.374) on both sides of (24.409) from the left and using (24.349) and (24.367), we have P j10 ih10 j þ j20 ih20 j = j10 ih10 j þ j20 ih20 j = P =

pμ γ μ þ m : 2m

ð24:410Þ

In (24.410), we identify j10 ih10 j þ j20 ih20 j with ( pμγ μ + m)/2m. This can readily be confirmed using the coordinate representation ψ 01 ðx0 Þ and ψ 02 ðx0 Þ of (24.311) for | 1′i and | 2′i, respectively. Readers are encouraged to confirm this. Multiplying Q of (24.374) on both sides of (24.409) from the left and using (24.346) and (24.368), in turn, we obtain Q - j30 ih30 j - j40 ih40 j = - j30 ih30 j - j40 ih40 j = Q:

ð24:411Þ

That is, we get j30 ih30 j þ j40 ih40 j = - Q =

pμ γ μ - m : 2m

ð24:412Þ

From (24.410) and (24.412), we recover P þ Q = E:

ð24:375Þ

- pμ γ μ - m = - 2mj10 ih10 j þ ð- 2mÞj20 ih20 j:

ð24:413Þ

Rewriting (24.410), we have

Also, we obtain

1238

24

Basic Formalism

j10 ih10 j  j10 ih10 j = j10 ih10 j, j20 ih20 j  j20 ih20 j = j20 ih20 j, j10 ih10 j  j20 ih20 j = j20 ih20 j  j10 ih10 j = 0:

ð24:414Þ

Similarly, we have pμ γ μ - m = - 2m - j30 ih30 j þ ð- 2mÞ - j40 ih40 j ,

ð24:415Þ

j30 ih30 j  j30 ih30 j = - j30 ih30 j, j40 ih40 j  j40 ih40 j = - j40 ih40 j, j30 ih30 j  j40 ih40 j = j40 ih40 j  j30 ih30 j = 0: For instance, (24.415) can readily be checked using (24.311) and (24.337). Equation (24.413) can similarly be checked from (24.282). Looking at (24.413) and (24.415), we find the resemblance between them and (12.196). In fact, both these cases offer examples of the spectral decomposition (see Sects. 12.5 and 12.7). Taking account of the fact that eigenvalues of a matrix remain unchanged after the similarity transformation (Sect. 12.1), we find that the coefficients -2m for RHS of (24.413) and (24.415) are the eigenvalues (doubly degenerate) for the matrix B of (24.302) and A of (24.301). From (24.414), j10 ih10 j and j20 ih20 j are idempotent operators. Also, from (24.415), - j30 ih30 j and - j40 ih40 j are idempotent as well. The presence of the two idempotent operators in (24.413) and (24.415) reflects that the rank of (-pμγ μ - m) and ( pμγ μ - m) is 2. Notice that neither the operator j30 ih30 j nor j40 ih40 j, on the other hand, is an idempotent operator but, say, “anti-idempotent” operator. That is, for an anti-idempotent operator A we have A2 = - A, where its eigenvalues are 0 or -1. The operators of both types play an important role in QED. Meanwhile, we have jk0 ihk 0 j

{

= jk 0 ihk 0 jγ 0

{

{

= γ 0 jk 0 ihk 0 j = γ 0 jk0 ihk0 j

≠ jk 0 ihk0 jγ 0 = jk0 ihk 0 j: Hence, jk0 ihk0 j is not Hermitian. The operator - jk 0 ihk 0 j is not Hermitian, either. Thus, the operators ± jk0 ihk0 j may be regarded as the projection operators sensu lato accordingly. The non-Hermiticity of the projection operators results from the fact that S is not unitary; see the discussion made in Sect. 24.7. As mentioned earlier in this section, however, we have an exceptional case where the Lorentz boost is absent; see Sect. 24.6.1. In this case, we can construct the projection operators sensu stricto; see (24.327), (24.387), and (24.403). As can be

24.8

Spectral Decomposition of the Dirac Operators

1239

seen in these examples, the Dirac operator G is Hermitian. Then, as pointed out in Sect. 21.3.4, we must have the complete orthonormal set (CONS) of common eigenfunctions with the operators G and H according to Theorem 14.14. Then, the CONS constitutes the diagonalizing unitary matrix S(Λ). This can easily be checked using (21.86), (21.102), (24.319) and (24.320). With H, we obtain

SR

-1

-1 0 0 0

{

HSR = SR HSR =

0 1 0 0

0 0 -1 0

0 0 0 1

,

where SR is given by (24.318); the diagonal elements are eigenvalues -1 and 1 (each doubly degenerate) of H. Individual columns of (24.318) certainly constitute the CONS. Denoting them from the left as | ji ( j = 1, 2, 3, 4), we have hjjji = δij,

4 j=1

jjihj j = E:

The operator | jihjj is a projection operator sensu stricto. The above results are essentially the same as those of Example 24.6. With G ð = AÞ, the result is selfevident. What about S - 1 HS, then? As already seen in Sect. 21.3.2, G and H commute with each other; see (21.93). Consequently, we expect to have common eigenvectors for G and H. In other words, once G has been diagonalized with S as in (24.339), H may well be diagonalized with S as well. It is indeed the case. That is, we also have

S

-1

HS = Sω- 1

SR

-1

HSR Sω =

-1 0 0 0

0 1 0 0

0 0 -1 0

0 0 0 1

,

ð24:416Þ

where Sω is given by (24.328). Thus, we find that the common eigenvectors for G and H are designated by the individual column vectors of S given in (24.293). As already shown in Sects. 21.4 and 24.5, these column vectors do not constitute CONS. ~ given in (21.116), similarly we get Regarding the operator H

S

-1~

HS= -S

-1

HS =

1 0 0 0

0 -1 0 0

0 0 1 0

0 0 0 -1

:

ð24:417Þ

We take the normalization as in (21.141) and (21.142) and define the operators | u-i, | u+i, etc. such that

1240

p

24

Basic Formalism

p p p 2m j 1 ′ j ju - i, 2m 2 ′  juþ i, 2m j3 ′  jvþ i, 2m 4 ′  jv - i; p p p p 2m h10 j  hu - j, 2m h20 j  huþ j, 2m h30 j  hvþ j, 2m h40 j  hv - j, ð24:418Þ

where the subscripts - and + denote the helix -1 and +1, respectively. Thus, rewriting (24.410) and (24.412) we get the relations described by ju - ihu - j þ juþ ihuþ j = pμ γ μ þ m,

ð24:419Þ

jvþ ihvþ j þ jv - ihv - j = pμ γ μ - m:

ð24:420Þ

Using (21.141), we have ju - ihu - j  ju - ihu - j = 2mju - ihu - j, juþ ihuþ j  juþ ihuþ j = 2mjuþ ihuþ j: Similarly, using (21.142) we obtain jvþ ihvþ j  jvþ ihvþ j = - 2mjvþ ihvþ j, jv - ihv - j  jv - ihv - j = - 2mjv - ihv - j: Equivalent expressions of (24.419) and (24.420) are (21.144) and (21.145), respectively. The above relations are frequently used for various calculations including the field interaction (see Chap. 23). Since the operator pμγ μ in (24.419) or (24.420) is not a normal operator, neither (24.419) nor (24.420) represents a spectral decomposition in a strict sense (see Sect. 14.3). Yet, (24.419) and (24.420) may be referred to as the spectral decomposition in a broader sense (Sect. 12.7). In the present section and the precedent sections, we have examined the properties of the projection operators and related characteristics of the spectral decomposition in connection with the Dirac equation on the basis of their results, we wish to summarize important aspects below. ~ and helicity operators (H and H) ~ are mutually (i) The Dirac operators (G and G) commutative. Both the operators can be diagonalized with the operator S(Λ). In other words, individual column vectors of S(Λ) are the eigenvectors common to ~ H, and H. ~ all the operators G, G, (ii) Although the Dirac operators are not Hermitian (nor normal) for the case of p ≠ 0, the helicity operators are Hermitian. The non-Hermiticity of Dirac operators implies that the diagonalizing matrix S(Λ) is not unitary. Consequently, the projection operators formed by the individual column vectors of S(Λ) may be termed projection operators sensu lato. Combining the Dirac adjoint with the change in the functional form of the Dirac spinor described by (24.277), we have [10]

24.9

Connectedness of the Lorentz Group

ψ ′ ðx0 Þ = Sψ ðxÞ = ½Sψ ðxÞ{ γ 0 = ψ { ðxÞS{ γ 0 = ψ ðxÞ γ 0

1241 -1 { 0



= ψ ðxÞγ 0 S{ γ 0 = ψ ðxÞS - 1 ,

ð24:421Þ

where with the last equality we used (24.404). Multiplying both sides of (24.421) by the Dirac operator G from the right, we obtain ψ ′ ðx0 ÞG = ψ ðxÞS - 1 G = ψ ðxÞS - 1  SAS - 1 = ψ ðxÞAS - 1 , where with the second equality we used (24.339). Multiplying both sides of (24.421) ~ from the right, we get by the Dirac operator G ~ = ψ ðxÞS - 1  SBS - 1 = ψ ðxÞBS - 1 , ~ = ψ ðxÞS - 1 G ψ ′ ðx0 ÞG where with the second equality we used (24.342). If we choose ψ 1(x) or ψ 2(x) of (24.305) for ψ(x), we have ψ ðxÞA = 0. This leads to ψ ′ ðx0 ÞG = 0,

ð24:422Þ

which reproduces (21.150). Similarly, if we choose ψ 3(x) or ψ 4(x) of (24.308) for ψ(x), in turn, we have ψ ðxÞB = 0. Then, we obtain ~ = 0: ψ ′ ð x0 Þ G

24.9

ð24:423Þ

Connectedness of the Lorentz Group

In this section, we wish to examine the properties of the Lorenz groups in terms of the Lie groups and Lie algebras. In Chap. 20 we studied the structure and properties of SU(2) and SO(3). We mentioned that several different Lie groups share the same Lie algebra. Of those Lie groups, we had a single universal covering group characterized as the simply connected group (Sect. 20.5). As a typical example, only SU(2) is simply connected as the universal covering group amongst the Lie groups such as O(3), SO(3), etc., having the same Lie algebra. In this section, we examine how the Lorentz group is positioned as part of the continuous groups.

1242

24.9.1

24

Basic Formalism

Polar Decomposition of a Non-Singular Matrix [14, 15]

We have already studied properties of Lie algebra of the Lorentz group (Sect 24.2). We may ask what kind of properties the Lorentz group has in relation to the abovementioned characteristics of SU(2) and related continuous groups. In particular, is there a universal covering group for the Lorentz group? To answer this question, we need a fundamentally important theorem about the matrix algebra. The theorem is well-known as the polar decomposition of a matrix. Theorem 24.7 Polar Decomposition Theorem [15] Any complex non-singular matrix A is uniquely decomposed into a product such that A = UP,

ð24:424Þ

where U is a unitary matrix and P is a positive definite Hermitian matrix. Proof Since A is non-singular, A{A is a positive definite Hermitian matrix (see Sect. 13.2). Hence, A{A can be diagonalized via unitary similarity transformation. Namely, we have a following equation: λ1 W { A{ AW = Λ 

λ2



ð24:425Þ

, λn

where W is a suitable unitary matrix; n is a dimension of the matrix and λi (i = 1, ⋯, n) is a real positive number with all the off-diagonal elements being zero. Defining another positive definite Hermitian matrix Λ1/2 as p þ λ1 Λ

1=2



p þ λ2



p þ λn

,

ð24:426Þ

we rewrite (24.425) such that W { A{ A W = Λ1=2 Λ1=2 : Further rewriting (24.427), we have

ð24:427Þ

24.9

Connectedness of the Lorentz Group

1243

~ WΛ1=2 W { = Λ~1=2  Λ~1=2 = Λ1=2

A{ A = WΛ1=2 Λ1=2 W { = WΛ1=2 W {

2

,

ð24:428Þ ~ is defined as where Λ1=2 ~  WΛ1=2 W { : Λ1=2

ð24:429Þ

~ Since the unitary similarity transformation holds the eigenvalue unchanged, Λ1=2 is the positive definite Hermitian operator. Meanwhile, we define a following matrix V as ~ V  A Λ1=2

-1

:

ð24:430Þ

Then, using (24.428) and (24.430) we obtain V {V =

~ Λ1=2

-1 {

~ A{ A Λ1=2

~ = Λ1=2

-1

-1

Λ~1=2

where with the second equality we used

2

{ -1

~ Λ1=2

=

Λ~1=2

~ Λ1=2

-1

-1 {

~ Λ1=2

2

~ Λ1=2

= E,

=

Λ~1=2

-1

ð24:431Þ { -1

and with the

third equality we used the Hermiticity of Λ~1=2 . To show the former relation, suppose that we have two non-singular matrices Q and R. We have (QR){ = R{Q{. Putting R = Q-1, we get E{ = E = (Q-1){Q{, namely (Q{)-1 = (Q-1){ [14]. Thus, from (24.430) it follows that there exist a unitary matrix V and a positive definite ~ which satisfy (24.424) such that Hermitian matrix Λ1=2 A = V  Λ~1=2 :

ð24:432Þ

Next, suppose that there is another combination of unitary matrix U and Hermitian matrix P other than (24.432) and see what happens. From (24.424) we have A{ A = P{ U { UP = P2 ,

ð24:433Þ

where with the second equality we used U{U = E (U: unitary) and P{ = P (P: Hermitian). Then, from (24.428) and (24.433), we obtain

1244

24

P2 = Λ~1=2

2

Basic Formalism

:

ð24:434Þ

= WΛW { :

ð24:435Þ

P2 = WΩ2 W { = WΛW { :

ð24:436Þ

From (24.429) we get ~ P2 = Λ1=2

2

Here, assuming P = WΩW{, we obtain

Hence, Ω2 = Λ. Then, we have p ± λ1 Ω=

p ± λ2



p

:

ð24:437Þ

± λn

From the assumption, however, P is positive definite. Consequently, only the positive sign is allowed in (24.437). Then, we must have Ω = Λ1/2, and so from the above assumption of P = WΩW{ and (24.429) we get P = WΛ1=2 W { = Λ~1=2 :

ð24:438Þ

Then, from (24.424) we have ~ U = A Λ1=2

-1

:

ð24:439Þ

From (24.430) we have U = V. This implies that the decomposition (24.424) is unique. These complete the proof. In relation to the proof of Theorem 24.7, we add that including Λ1/2 we have 2n 1=2 matrices Λð ± Þ that satisfy 1=2

1=2

Λð ± Þ Λð ± Þ = Λ, 1=2

where Λð ± Þ is defined as

ð24:440Þ

24.9

Connectedness of the Lorentz Group

1245

p ± λ1 1=2 Λð ± Þ

p ± λ2

=



:

p

ð24:441Þ

± λn

Among them, however, only Λ^{1/2} is the positive definite operator.

The one-dimensional polar representation of a non-zero complex number is the polar coordinate representation expressed as z = e^{iθ}r [where the unitary part is described first and r (>0) corresponds to a positive definite eigenvalue]. Then, we can safely say that the polar decomposition of a matrix is a multi-dimensional polar representation of the matrix.

Corollary 24.1 [14] Any complex non-singular matrix A is uniquely decomposed into a product such that

A = P′U′,        (24.442)

where P′ is a positive definite Hermitian matrix and U′ is a unitary matrix. (Notice that the product order of P′ and U′ has been reversed relative to Theorem 24.7.)

Proof The proof can be carried out following that of Theorem 24.7, except that we perform the calculation using AA† instead of the A†A used before. That is, we have

U′†AA†U′ = Ω ≡ diag(λ′₁, λ′₂, ⋯, λ′ₙ),        (24.443)

where all the off-diagonal elements are zero. Defining Ω^{1/2} as

Ω^{1/2} ≡ diag(+√λ′₁, +√λ′₂, ⋯, +√λ′ₙ)        (24.444)

and proceeding similarly as before, we get

AA† = U′Ω^{1/2}Ω^{1/2}U′† = (U′Ω^{1/2}U′†)(U′Ω^{1/2}U′†) = Ω̃^{1/2}·Ω̃^{1/2} = (Ω̃^{1/2})²,

where Ω̃^{1/2} is defined as Ω̃^{1/2} ≡ U′Ω^{1/2}U′†. Moreover, defining a unitary matrix W as W ≡ (Ω̃^{1/2})⁻¹A, we obtain

A = Ω̃^{1/2}·W.

This means the existence of the polar decomposition of (24.442). The uniqueness of the polar decomposition can be shown as before. This completes the proof.

Corollary 24.2 [14] Let A be a non-singular matrix. Then, there exist positive definite Hermitian matrices H₁ and H₂ as well as a unitary matrix U that satisfy

A = UH₁ = H₂U.        (24.445)

If and only if A is a normal matrix, we have H₁ = H₂. (In that case, H₁ = H₂ and U are commutative.)

Proof From Theorem 24.7, there must exist a positive definite Hermitian matrix H₁ and a unitary matrix U that satisfy the relation A = UH₁. Rewriting this, we have

A = UH₁U†U = H̃₁U,        (24.446)

where H̃₁ ≡ UH₁U†. This is a unitary similarity transformation, and so H̃₁ is a positive definite Hermitian matrix. From Corollary 24.1 the decomposition A = H̃₁U is unique, and so comparing the RHSs of (24.445) and (24.446) we can choose H₂ as H₂ = H̃₁. Multiplying this by U from the right, we get

H₂U = H̃₁U = UH₁U†U = UH₁.        (24.447)

This gives the proof of the former half of Corollary 24.2. From Corollary 24.1, with respect to (24.447), that U and H₁ commute is equivalent to H₁ = H₂. Taking account of this fact, the latter half of the proof is as follows: If U and H₁ commute, we have

A†A = H₁†U†UH₁ = H₁†H₁ = H₁H₁.

Also, we have

AA† = UH₁H₁†U† = H₁UU†H₁† = H₁H₁† = H₁H₁,        (24.448)

where with the second equality we used UH₁ = H₁U and its adjoint H₁†U† = U†H₁†. From the above two equations, A is normal. Conversely, if A is normal, from (24.447) we have

A†A = H₁² = U†H₂†H₂U = U†H₂²U = AA† = H₂².

Hence, H₁² = H₂². Since both H₁ and H₂ are positive definite Hermitian, as in the proof of Theorem 24.7 we have H₁ = H₂. Namely, U and H₁ = H₂ commute. These complete the proof.

Regarding real non-singular matrices, we have the following corollary.

Corollary 24.3 [8] Any real non-singular matrix A is uniquely decomposed into a product such that

A = OP,        (24.449)

where O is an orthogonal matrix and P is a positive definite symmetric matrix.

Proof The proof can immediately be performed using Theorem 24.7. This is because a real unitary matrix is an orthogonal matrix and a real Hermitian matrix is a symmetric matrix.

Since the Lorentz transformation Λ is represented by a real (4, 4) matrix, from Corollary 24.3 we realize that the matrix Λ is expressed as (24.449); see Sect. 24.2. Tangible examples with respect to the above theorem and corollaries can be seen in the following examples.

Example 24.8 Let us take an example of the polar decomposition for a matrix A expressed as

A = | 2  1 |.        (24.450)
    | 1  0 |

We have

AᵀA = | 5  2 |.        (24.451)
      | 2  1 |

We dealt with the diagonalization of this matrix in Example 14.3. Borrowing U in (14.102), we have

D = UᵀHU = | 3+2√2    0    | = | (√2+1)²     0     |.        (24.452)
           |   0    3-2√2  |   |    0     (√2-1)²  |

Then, we get

D^{1/2} = | √2+1    0   |        (24.453)
          |  0    √2-1  |

as well as

D̃^{1/2} = UD^{1/2}U† = UD^{1/2}Uᵀ = | 3/√2  1/√2 |.        (24.454)
                                     | 1/√2  1/√2 |

With the matrix V, we obtain

V = A(D̃^{1/2})⁻¹ = | 2  1 | |  1/√2  -1/√2 | = | 1/√2   1/√2 |.        (24.455)
                   | 1  0 | | -1/√2   3/√2 |   | 1/√2  -1/√2 |

Thus, we find that V is certainly an orthogonal matrix. The unique polar decomposition of A is given by

A = | 2  1 | = V·D̃^{1/2} = | 1/√2   1/√2 | | 3/√2  1/√2 |.        (24.456)
    | 1  0 |                | 1/√2  -1/√2 | | 1/√2  1/√2 |
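The arithmetic of Example 24.8 can be checked numerically. The sketch below (plain Python; the helper names `sqrtm2` and `polar2` are ours, not the book's) uses the closed-form square root of a 2×2 positive definite matrix M, namely M^{1/2} = (M + √(detM)·E)/√(trM + 2√(detM)), to recompute D̃^{1/2} and V:

```python
import math

def matmul(X, Y):
    # product of two 2x2 matrices
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sqrtm2(M):
    # closed-form square root of a 2x2 positive definite matrix:
    # sqrt(M) = (M + sqrt(det M) E) / sqrt(tr M + 2 sqrt(det M))
    s = math.sqrt(M[0][0] * M[1][1] - M[0][1] * M[1][0])
    t = math.sqrt(M[0][0] + M[1][1] + 2.0 * s)
    return [[(M[0][0] + s) / t, M[0][1] / t],
            [M[1][0] / t, (M[1][1] + s) / t]]

def polar2(A):
    # right polar decomposition A = V P of Theorem 24.7 (real 2x2 case):
    # P = (A^T A)^{1/2}, V = A P^{-1}
    At = [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]
    P = sqrtm2(matmul(At, A))
    d = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    Pinv = [[P[1][1] / d, -P[0][1] / d], [-P[1][0] / d, P[0][0] / d]]
    return matmul(A, Pinv), P

A = [[2.0, 1.0], [1.0, 0.0]]
V, P = polar2(A)
r = 1.0 / math.sqrt(2.0)
# P agrees with (24.454) and V with (24.455)
assert all(abs(P[i][j] - [[3 * r, r], [r, r]][i][j]) < 1e-12
           for i in range(2) for j in range(2))
assert all(abs(V[i][j] - [[r, r], [r, -r]][i][j]) < 1e-12
           for i in range(2) for j in range(2))
```

The same helper reproduces A = V·P for any real non-singular 2×2 matrix; for larger matrices one would diagonalize AᵀA as in the proof of Theorem 24.7.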

In (24.456), V and D̃^{1/2} are commutative, as expected, because A is a real symmetric matrix (i.e., Hermitian and normal). The confirmation that D̃^{1/2} is positive definite is left to readers.

Example 24.9 Let us take an example of the polar decomposition for a triangular matrix A expressed as

A = | 1  2 |.        (24.457)
    | 0  1 |

Similarly proceeding as above, we get the result

A = | 1  2 | = |  1/√2  1/√2 | | 1/√2  1/√2 |.        (24.458)
    | 0  1 |   | -1/√2  1/√2 | | 1/√2  3/√2 |

In this case, the orthogonal matrix and the positive definite symmetric matrix are not commutative. Finding the other decomposition is left as an exercise for readers. With respect to the transformation matrix S(Λ) of Sect. 24.4, let us consider the following example.
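A numerical sketch (plain Python; the helper names are ours) confirms that for the triangular matrix of Example 24.9 the two factors indeed fail to commute, and illustrates one way to obtain the reversed decomposition A = P′U′ of Corollary 24.1 by applying the same square-root construction to AAᵀ:

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sqrtm2(M):
    # sqrt(M) = (M + sqrt(det M) E) / sqrt(tr M + 2 sqrt(det M)) for a 2x2 SPD matrix M
    s = math.sqrt(M[0][0] * M[1][1] - M[0][1] * M[1][0])
    t = math.sqrt(M[0][0] + M[1][1] + 2.0 * s)
    return [[(M[0][0] + s) / t, M[0][1] / t],
            [M[1][0] / t, (M[1][1] + s) / t]]

def inv2(M):
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

A = [[1.0, 2.0], [0.0, 1.0]]
At = [[1.0, 0.0], [2.0, 1.0]]

# right decomposition A = V P as in (24.458)
P = sqrtm2(matmul(At, A))
V = matmul(A, inv2(P))
VP, PV = matmul(V, P), matmul(P, V)
# V P reproduces A, but P V does not: the two factors do not commute
assert all(abs(VP[i][j] - A[i][j]) < 1e-12 for i in range(2) for j in range(2))
assert any(abs(PV[i][j] - A[i][j]) > 1e-6 for i in range(2) for j in range(2))

# reversed ("left") decomposition A = P' U' of Corollary 24.1, built from A A^T
Pp = sqrtm2(matmul(A, At))
Up = matmul(inv2(Pp), A)
PpUp = matmul(Pp, Up)
assert all(abs(PpUp[i][j] - A[i][j]) < 1e-12 for i in range(2) for j in range(2))
```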

Example 24.10 In the general case of the representation matrix described by (24.293) of the previous section, consider the special case of θ = 0. In that case, from (24.293) we obtain [with c ≡ cosh(ω/2) and s ≡ sinh(ω/2)]

|  e^{iϕ/2}c      0        -e^{iϕ/2}s       0       |
|      0      e^{-iϕ/2}c       0        e^{-iϕ/2}s  |
| -e^{iϕ/2}s      0         e^{iϕ/2}c       0       |
|      0      e^{-iϕ/2}s       0        e^{-iϕ/2}c  |

   = | e^{iϕ/2}     0         0         0      | |  c  0  -s  0 |.        (24.459)
     |    0     e^{-iϕ/2}     0         0      | |  0  c   0  s |
     |    0         0     e^{iϕ/2}      0      | | -s  0   c  0 |
     |    0         0         0     e^{-iϕ/2}  | |  0  s   0  c |

The LHS of Eq. (24.459) is a normal matrix, and the two matrices of the RHS are commutative, as expected. As in this example, if the axis of rotation and the direction of the Lorentz boost coincide, such successive operations of rotation and boost are commutative and, hence, the relevant representation matrix S(Λ) is normal according to Corollary 24.2.

Theorem 24.7 helps us understand that a universal covering group exists for the Lorentz group. The said group is termed the special linear group and abbreviated as SL(2, ℂ). In Sects. 20.2 and 20.5, we easily showed that SU(2) is isomorphic to S³, because many constraint conditions are imposed upon the parameters of SU(2). Then, we could readily show that since S³ is simply connected, so is SU(2). With SL(2, ℂ), on the other hand, the number of constraint conditions is limited; we have only the two conditions resulting from the requirement that the determinant of the two-dimensional complex matrices be 1. Therefore, it is not easy to examine whether SL(2, ℂ) is simply connected. By virtue of Theorem 24.7, however, we can readily prove that SL(2, ℂ) is indeed simply connected. This will be one of the main subjects of the next section.
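The commutativity claimed for (24.459) can be confirmed numerically. The sketch below (plain Python; we write the unitary factor as the diagonal matrix diag(e^{iϕ/2}, e^{-iϕ/2}, e^{iϕ/2}, e^{-iϕ/2}) and the boost factor in terms of cosh(ω/2) and sinh(ω/2), under our reading of the factored form) multiplies the two factors in both orders:

```python
import cmath
import math

def matmul(X, Y, n=4):
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

phi, omega = 0.7, 1.1
u = cmath.exp(1j * phi / 2)
c, s = math.cosh(omega / 2), math.sinh(omega / 2)

# unitary (rotation) factor: diag(e^{i phi/2}, e^{-i phi/2}, e^{i phi/2}, e^{-i phi/2})
R = [[u, 0, 0, 0], [0, u.conjugate(), 0, 0],
     [0, 0, u, 0], [0, 0, 0, u.conjugate()]]
# positive definite (boost) factor
B = [[c, 0, -s, 0], [0, c, 0, s], [-s, 0, c, 0], [0, s, 0, c]]

RB = matmul(R, B)
BR = matmul(B, R)
# a rotation and a boost about/along the same axis commute
assert all(abs(RB[i][j] - BR[i][j]) < 1e-12 for i in range(4) for j in range(4))
```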

24.9.2  Special Linear Group: SL(2, ℂ)

In this section we examine how SL(2, ℂ) is related to the Lorentz group. We start with the most general features of SL(2, ℂ) on the basis of the discussion of Sect. 20.5. The discussion is based on Definitions 20.8–20.10 (Sect. 20.5) along with the following definition (Definition 24.5).

Definition 24.5 The special linear group SL(2, ℂ) is defined as SL(2, ℂ) ≡ {A = (aᵢⱼ); i, j = 1, 2, aᵢⱼ ∈ ℂ, detA = 1}.

Theorem 24.8 [8] The group SL(2, ℂ) is simply connected.

Proof Let g(t, s) ∈ SL(2, ℂ) be an arbitrary continuous matrix function with respect to (t, s) in the region I × I = {(t, s); 0 ≤ t ≤ 1, 0 ≤ s ≤ 1}. Also, let g(t, 1) be an arbitrary loop in SL(2, ℂ). Then, according to Theorem 24.7, g(t, 1) can uniquely be decomposed such that

g(t, 1) = U(t)P(t, 1),        (24.460)

where U(t) is unitary and P(t, 1) is positive definite Hermitian. Since U(t) [∈ SL(2, ℂ)] is unitary, we must have U(t) ∈ SU(2). Suppose that the function P(t, s) can be described as

P(t, s) = exp sX(t)  (0 ≤ s ≤ 1),        (24.461)

where X(t) ∈ sl(2, ℂ); sl(2, ℂ) is the Lie algebra of SL(2, ℂ). Since P(t, 1) is Hermitian and s is real, X(t) is Hermitian as well. This can immediately be derived from (15.32). Since we have

g(t, 1) = U(t) exp X(t)   and   g(t, 0) = U(t)        (24.462)

and exp sX(t) is analytic, g(t, 1) is homotopic to g(t, 0) = U(t); see Definition 20.9. Because SU(2) is simply connected (see Sect. 20.5.3), U(t) is homotopic to any constant loop (Definition 20.10). Consequently, from the above statement, g(t, 1) is homotopic to a constant loop. This loop can be chosen arbitrarily, and g(t, 1) is an arbitrary loop in SL(2, ℂ). Therefore, SL(2, ℂ) is simply connected.

During the process of the above proof, an important feature of SL(2, ℂ) has already shown up. That is, Hermitian operators are contained in sl(2, ℂ). This feature shows a striking contrast with the Lie algebra su(n) in that su(n) comprises anti-Hermitian operators only.

Theorem 24.9 [8] The Lorentz group O(3, 1) is self-conjugate. That is, if Λ ∈ O(3, 1), then Λᵀ ∈ O(3, 1).

Proof From (24.198), for a group element Λ to constitute the Lorentz group O(3, 1) we have

ΛᵀηΛ = η.

Multiplying both sides by Λη from the left and by Λ⁻¹η from the right, we obtain

Λη·ΛᵀηΛ·Λ⁻¹η = Λη·η·Λ⁻¹η.

Using η² = 1, we get

ΛηΛᵀ = η   or   (Λᵀ)ᵀηΛᵀ = η.

This implies Λᵀ ∈ O(3, 1).

Theorem 24.10 [8] Suppose that with the Lorentz group O(3, 1), individual elements Λ ∈ O(3, 1) are decomposed into

Λ = UP.        (24.463)

Then, their factors U and P belong to O(3, 1) as well, with P described by

P = exp Y,        (24.464)

where Y ∈ o(3, 1) is a real symmetric matrix.

Proof From Theorem 24.9, Λᵀ ∈ O(3, 1). Then, ΛᵀΛ ∈ O(3, 1). Since Λ is real, from Corollary 24.3, U is an orthogonal matrix and P is a positive definite symmetric matrix. Meanwhile, we have

ΛᵀΛ = PᵀUᵀUP = PᵀP = P²,        (24.465)

where with the second equality we used the fact that U is an orthogonal matrix; with the last equality we used the fact that P is symmetric. Then, P² ∈ O(3, 1). Therefore, using ∃X ∈ o(3, 1), P² can be expressed as

P² = exp X.        (24.466)

Since P is a positive definite symmetric matrix, so is P². Taking the transposition of (24.466), we have

(P²)ᵀ = P² = (exp X)ᵀ = exp Xᵀ,        (24.467)

where with the last equality we used (15.31). Comparing (24.466) and (24.467), we obtain

exp X = exp Xᵀ.        (24.468)

Since (24.468) equals the positive definite matrix P², a logarithmic function corresponding to (24.468) uniquely exists [8]. Consequently, taking the logarithm of (24.468), we get

Xᵀ = X.        (24.469)

Namely, X is real symmetric. As X/2 ∈ o(3, 1), exp(X/2) ∈ O(3, 1). Since P and P² are positive definite, exp(X/2) is uniquely determined such that

exp(X/2) = (P²)^{1/2} = P.        (24.470)

Hence, P ∈ O(3, 1). Putting X/2 ≡ Y, we obtain (24.464), in which Y ∈ o(3, 1) is real symmetric. From (24.463) we have U = ΛP⁻¹, and so U ∈ O(3, 1). This completes the proof.

Theorem 24.10 shows that O(3, 1) comprises the direct product of a compact subgroup O(3, 1) ∩ O(4) and exp õ(3, 1), where õ(3, 1) is the subspace of o(3, 1) consisting of real symmetric matrices. We have another two important theorems on the Lorentz group O(3, 1).

Theorem 24.11 [8] The compact subgroup O(3, 1) ∩ O(4) is represented as

O(3, 1) ∩ O(4) = O(3) ⊗ O(1).        (24.471)

That is, any element Λ of O(3, 1) ∩ O(4) can be described by

Λ = | ±1  0  |,  g₃ ∈ O(3).        (24.472)
    |  0  g₃ |

Conversely, any element that can be expressed as (24.472) belongs to O(3, 1) ∩ O(4).

Proof We write Λ such that

Λ = | g₀  g₁ |;  g₀ ∈ ℝ, g₁: (1, 3) matrix, g₂: (3, 1) matrix, g₃ ∈ gl(3, ℝ).        (24.473)
    | g₂  g₃ |

As Λ is an orthogonal matrix, i.e., Λᵀ = Λ⁻¹, ΛᵀηΛ = η is equivalent to

ηΛ = |  g₀   g₁ | = | g₀  -g₁ | = Λη.        (24.474)
     | -g₂  -g₃ |   | g₂  -g₃ |

Therefore, we have g₁ = g₂ = 0. From ΛᵀηΛ = η and the above result, we have

ΛᵀηΛ = | g₀   g₂ᵀ | diag(1, -1, -1, -1) | g₀  g₁ | = | g₀²     0    | = η.        (24.475)
       | g₁ᵀ  g₃ᵀ |                     | g₂  g₃ |   | 0   -g₃ᵀg₃  |

From the above equation, we have

g₀ = ±1   and   g₃ᵀg₃ = 1.        (24.476)

The second equation of (24.476) implies that g₃ is an orthogonal matrix. Conversely, any element given in the form of (24.472) must belong to O(3, 1) ∩ O(4). These complete the proof.

Theorem 24.12 [8] The Lorentz group O(3, 1) consists of four connected components L₀ ≡ SO₀(3, 1), L₁, L₂, and L₃ that contain

E₀ ≡ E = diag(1, 1, 1, 1),    E₁ = diag(1, 1, 1, -1),
E₂ = diag(-1, 1, 1, 1),       E₃ = diag(-1, 1, 1, -1),        (24.477)

respectively.

Proof As in the case of the proof of Theorem 24.8, an arbitrary element Λ of the Lorentz group O(3, 1) can be represented as

Λ = O(t)P(t, s),        (24.478)

where O(t) is an orthogonal matrix of O(4); P(t, s) = exp sX(t), for which X(t) is a real symmetric matrix according to Theorem 24.10. Both O(t) and P(t, s) belong to O(3, 1) (Theorem 24.10). Then, we have O(t) ∈ O(3, 1) ∩ O(4).

Consequently, making the parameter s change continuously from 1 to 0, we find that Λ ∈ O(3, 1) is connected to O(t) in O(3, 1). Since O(t) ∈ O(3, 1) ∩ O(4), O(t) must be described in the form of (24.472) according to Theorem 24.11. The elements Eᵢ (i = 0, 1, 2, 3) of (24.477) certainly take the form of O(t) and, hence, belong to O(3, 1) ∩ O(4) as well. Meanwhile, Example 20.8 tells us that within a connected set the sign of the determinant must be constant, if the determinant is defined as a real continuous function. Also, we already know that O(n) is decomposed into two connected sets (Sect. 20.5). In the case of O(3), each set is connected to one of the following two matrices according to the determinant ±1:

O = | 1  0  0 |,   O′ = | 1  0   0 |.        (24.479)
    | 0  1  0 |         | 0  1   0 |
    | 0  0  1 |         | 0  0  -1 |

For this, see (20.326) and (20.327). Taking account of O(1) = {+1, -1} in (24.471), we have 2 × 2 = 4 connected sets in O(3, 1). These four sets originate from the combinations of (24.479) with ±1 of O(1), and the individual connected sets are represented by Eᵢ (i = 0, 1, 2, 3) of (24.477). Suppose that L₀, L₁, L₂, and L₃ comprise all the elements that are connected to E₀, E₁, E₂, and E₃, respectively. Then, these Lᵢ (i = 0, 1, 2, 3) are connected sets, and arbitrary elements of O(3, 1) are contained in one of them. Thus, it suffices to show that Lᵢ ∩ Lⱼ = ∅ (i ≠ j).

First, from ΛᵀηΛ = η we have (detΛ)² = 1, i.e., detΛ = ±1. From (24.477) we have detE₀ = detE₃ = 1 and detE₁ = detE₂ = -1. Then, all the elements Λ belonging to L₀ and L₃ have detΛ = 1, and those belonging to L₁ and L₂ have detΛ = -1. From Example 20.8, the elements of L₀ cannot be connected to the elements of L₁ or those of L₂. Similarly, the elements of L₃ cannot be connected to the elements of L₁ or those of L₂.

Let Λ₀ ∈ L₀ and let Λ₀ be expressed as Λ₀ = O₀P₀ with the polar decomposition. The element Λ₀ is connected to O₀, and so O₀ ∈ L₀ from Theorem 24.10. Since L₀ is a collection of the elements that are connected to E₀, so is O₀ [∈ L₀ ∩ O(4)]. The element E₀ can be expressed as E₀ = Õ ⊗ {+1}, so that detÕ = 1 in (24.479) and det{+1} = 1. Meanwhile, let Λ₃ ∈ L₃ be described by Λ₃ = O₃P₃. The element Λ₃ is connected to O₃, and so O₃ ∈ L₃. Similarly to the above case, O₃ [∈ L₃ ∩ O(4)] is connected to E₃, which is expressed as E₃ = Õ′ ⊗ {-1}, so that detÕ′ = -1 in (24.479) and det{-1} = -1. Again, from Example 20.8, within a connected set the sign of the determinant must be constant. Therefore, the elements of L₀ cannot be connected to the elements of L₃. The discussion equally applies to the relationship between L₁ and L₂. Thus, after all, we get

O(3, 1) = C(E₀) ∪ C(E₁) ∪ C(E₂) ∪ C(E₃),   C(Eᵢ) ∩ C(Eⱼ) = ∅ (i ≠ j),        (24.480)

where C(Eᵢ) ≡ Lᵢ. These complete the proof.

In the above proof, (24.480) is in parallel with (20.318) of O(3), where the number of connected components is two. The group L₀ ≡ SO₀(3, 1) is the connected component that contains the identity element E₀. Then, according to Theorem 20.9, SO₀(3, 1) is an invariant subgroup of O(3, 1). Thus, following Sect. 16.3, we have a coset decomposition described by

O(3, 1) = E₀SO₀(3, 1) + E₁SO₀(3, 1) + E₂SO₀(3, 1) + E₃SO₀(3, 1).        (24.481)

Summarizing the above, we classify the connected components such that [8]:

L₀ ≡ SO₀(3, 1) = {Λ; Λ ∈ O(3, 1), detΛ = 1, Λ⁰₀ ≥ 1},
L₁ = {Λ; Λ ∈ O(3, 1), detΛ = -1, Λ⁰₀ ≥ 1},
L₂ = {Λ; Λ ∈ O(3, 1), detΛ = -1, Λ⁰₀ ≤ -1},
L₃ = {Λ; Λ ∈ O(3, 1), detΛ = 1, Λ⁰₀ ≤ -1}.        (24.482)

The set {E₀, E₁, E₂, E₃} forms a group, which is isomorphic to the four-group we dealt with in Sects. 17.1 and 17.2. In parallel with (20.328), we have

O(3, 1)/SO₀(3, 1) ≅ {E₀, E₁, E₂, E₃}.        (24.483)

In theoretical physics, instead of E₁ the following Ẽ₁ is often used as a representative element:

Ẽ₁ = diag(1, -1, -1, -1).

The element Ẽ₁ represents the inversion symmetry (i.e., spatial inversion) and is connected to E₁. This can be shown as in the case of Example 20.6. Suppose the following continuous matrix function:

f(t) = | 1     0        0      0 |.
       | 0  cos πt  -sin πt   0 |
       | 0  sin πt   cos πt   0 |
       | 0     0        0     -1 |


Then, f(t) is a continuous matrix function with respect to real t; f(t) represents the product of two transformations, a mirror reflection with respect to the xy-plane and a rotation about the z-axis. We have f(0) = E₁ and f(1) = Ẽ₁. Hence, f(0) and f(1) are connected to each other in L₁. The element E₁ (or Ẽ₁) is known as a discrete symmetry along with the element E₂ that represents the time reversal.

The matrix C defined in (21.164), on the other hand, is not a Lorentz transformation. This is known from Theorem 24.11. Namely, C is unitary, and so if C belonged to O(3, 1), C would be described in the form of (24.472). But this is not the case. This is associated with the fact that C represents the exchange between a particle and its antiparticle. Note that this is not a space-time transformation but an "inner" transformation.

As a result of Theorem 24.12 and on the basis of Theorem 20.10, we reach the following important consequence: the Lie algebras o(3, 1) of O(3, 1) and so₀(3, 1) of SO₀(3, 1) are identical. With the modulus of Λ⁰₀, see (24.207). The special theory of relativity is based upon L₀ = SO₀(3, 1), where neither the spatial inversion nor the time reversal is assumed. As a tangible example, exp(θA₃) of (24.224) and exp(-ωB₃) of (24.225) satisfy this criterion. Equation (24.223) reflects the polar decomposition. The representation matrix S(Λ) of (24.293) with respect to the transformation of the Dirac spinor gives another example of the polar decomposition. In this respect, we show some examples below.

Example 24.11: Lorentz Transformation Matrix In (24.223), exp(a·A) and exp(b·B) represent an orthogonal matrix and a positive definite symmetric matrix, respectively. From Theorems 24.10 and 24.11, we have

SO₀(3, 1) ∩ SO(4) = SO(3) ⊗ SO(1),        (24.484)

where SO(1) ≡ {1}. Any element Λ̃ of SO₀(3, 1) ∩ SO(4) can therefore be described by

Λ̃ = | 1  0  |,  g₃ ∈ SO(3).        (24.485)
    | 0  g₃ |

In fact, (24.224) is an orthogonal matrix, i.e., a specific type of (24.485), and is the unitary-matrix component of the polar decomposition (Theorem 24.7 and Corollary 24.3). Equation (24.225) represents an example of the positive definite real symmetric (i.e., Hermitian) matrix component of the polar decomposition. The matrix exp(-ωB₃) of (24.225) has the eigenvalues

cosh ω + sinh ω,  cosh ω - sinh ω,  1,  1.        (24.486)

Since all these eigenvalues are positive and det exp(-ωB₃) = 1 [i.e., the product of the eigenvalues of (24.486) is 1, as can easily be checked], exp(-ωB₃) is positive definite (see Sect. 13.2). Consequently, (24.223) is a (unique) polar decomposition.

Example 24.12: Transformation Matrix S(Λ) of the Dirac Spinor As already shown, (24.293) describes the representation matrix S(Λ) with respect to the transformation of the Dirac spinor caused by the Lorentz transformation. The RHS of (24.293) can be decomposed such that [with c ≡ cosh(ω/2) and s ≡ sinh(ω/2)]

S(Λ) = |  e^{iϕ/2}cos(θ/2)   e^{iϕ/2}sin(θ/2)           0                    0          |
       | -e^{-iϕ/2}sin(θ/2)  e^{-iϕ/2}cos(θ/2)          0                    0          |
       |         0                   0           e^{iϕ/2}cos(θ/2)   e^{iϕ/2}sin(θ/2)   |
       |         0                   0          -e^{-iϕ/2}sin(θ/2)  e^{-iϕ/2}cos(θ/2)  |

       × |  c   0  -s   0 |.        (24.487)
         |  0   c   0   s |
         | -s   0   c   0 |
         |  0   s   0   c |

The first factor of (24.487) is unitary and the second factor is a real symmetric matrix. The eigenvalues of the second factor are positive. These are given by

cosh(ω/2) + sinh(ω/2)  (doubly degenerate),
cosh(ω/2) - sinh(ω/2)  (doubly degenerate),        (24.488)

with a determinant of 1 (>0). Hence, the second matrix is positive definite. Thus, as in the case of Example 24.10, S(Λ) gives a unique polar decomposition. Given the above tangible examples, we wish to generalize the argument a little bit further. In Theorem 24.7 and Corollary 24.1 of Sect. 24.9.1 we mentioned two ways of the polar decomposition. What about the inverse Lorentz transformation, then? The inverse transformation of (24.424) is described by

A⁻¹ = P⁻¹U⁻¹.        (24.489)

Here, suppose that the positive definite Hermitian operator P is represented by an (n, n) square matrix. From Theorem 14.5, using a suitable unitary matrix U, we have the following diagonal matrix D:

U†PU = D ≡ diag(λ₁, ⋯, λₙ),        (24.490)

where λ₁, ⋯, λₙ are real positive numbers. Taking the inverse of (24.490), we obtain

U⁻¹P⁻¹(U†)⁻¹ = U†P⁻¹U = diag(1/λ₁, ⋯, 1/λₙ).        (24.491)

Meanwhile, we have

(P⁻¹)† = (P†)⁻¹ = P⁻¹.        (24.492)

Equations (24.491) and (24.492) imply that P⁻¹ is a positive definite Hermitian operator. Thus, the above argument ensures that the inverse transformation A⁻¹ of (24.489) can be dealt with on the same footing as the transformation A. In fact, in both Examples 24.11 and 24.12, the eigenvalues of the positive definite real symmetric (i.e., Hermitian) matrices are unchanged, or switched to one another (consider, e.g., the exchange between ω and -ω), after the inversion.

We saw from Theorem 24.8 that an arbitrary loop g(t, 1) in SL(2, ℂ) is homotopic to U(t) ∈ SU(2) and concluded that since SU(2) is simply connected, SL(2, ℂ) is simply connected as well. As for the Lorentz group O(3, 1), on the other hand, a loop in O(3, 1) is homotopic to O(t) ∈ O(4), but since O(4) is not simply connected, we infer that O(3, 1) is not simply connected, either. In the next section, we investigate the representation of O(3, 1), especially that of the proper orthochronous Lorentz group SO₀(3, 1). We further investigate the connectedness of the Lorentz group in this special but important case.

24.10  Representation of the Proper Orthochronous Lorentz Group SO₀(3, 1) [6, 8]

Amongst the four connected components of the Lorentz group O(3, 1), the component that is connected to the identity element E₀ ≡ E does not contain the spatial inversion or time reversal. This component itself forms a group termed SO₀(3, 1) and frequently appears in elementary particle physics.


Let us consider the following set V₄ represented by

V₄ = {H = (hᵢⱼ); i, j = 1, 2, hᵢⱼ ∈ ℂ, H: Hermitian},        (24.493)

where (hᵢⱼ) is a complex (2, 2) Hermitian matrix. That is, V₄ is the set of all (2, 2) Hermitian matrices. The set V₄ is a vector space over the real number field ℝ. To show this, we express an arbitrary element H of V₄ as

H = |  a    c+id |  (a, b, c, d: real).        (24.494)
    | c-id   b   |

We put

σ₀ = | 1  0 |,  σ₁ = | 0  1 |,  σ₂ = | 0   i |,  σ₃ = | -1  0 |.        (24.495)
     | 0  1 |        | 1  0 |        | -i  0 |        |  0  1 |

Notice that σₖ (k = 1, 2, 3) are the Pauli spin matrices that appeared in (20.41). Also, σₖ is expressed as σₖ = 2iζₖ (k = 1, 2, 3), where ζₖ is given by (20.270). Whereas ζₖ is anti-Hermitian, σₖ is Hermitian. Then, H of (24.494) can be described with the Hermitian operators σμ (μ = 0, 1, 2, 3) by

H = (1/2)(a + b)σ₀ + (1/2)(-a + b)σ₃ + cσ₁ + dσ₂.        (24.496)

Equation (24.496) implies that any vector H belonging to V₄ is expressed as a linear combination of the four basis vectors σμ (μ = 0, 1, 2, 3). Since the four coefficients of σμ (μ = 0, 1, 2, 3) are real, this means that V₄ is a four-dimensional vector space over the real number field ℝ. If we view V₄ as an inner product space, an orthonormal set is given by

(1/√2)σ₀,  (1/√2)σ₁,  (1/√2)σ₂,  (1/√2)σ₃.        (24.497)

Notice that among σμ (μ = 0, 1, 2, 3), σ₀ is not an element of the Lie algebra sl(2, ℂ), since the trace of σ₀ is not zero. Meanwhile, putting

H = | x⁰-x³   x¹+ix² |  (x⁰, x¹, x², x³: real),        (24.498)
    | x¹-ix²  x⁰+x³  |

we have

H = x⁰σ₀ + x¹σ₁ + x²σ₂ + x³σ₃ = xμσμ.        (24.499)

That is, (24.498) gives the coordinates xμ (μ = 0, 1, 2, 3) in reference to the basis vectors σμ. Now, we consider the following operation:

H ⟼ qHq†  [q ∈ SL(2, ℂ)],        (24.500)

with

q = | a  b |,  detq = ad - bc = 1.        (24.501)
    | c  d |

We have (qHq†)† = qH†q† = qHq†. That is, qHq† ∈ V₄. Defining this mapping as φ[q], we write

φ[q]: V₄ ⟶ V₄,  φ[q](H) ≡ qHq†.        (24.502)

The mapping φ[q] is a linear transformation within V₄. This is because we have

φ[q](H₁ + H₂) = φ[q](H₁) + φ[q](H₂),  φ[q](αH) = αφ[q](H)  (α ∈ ℝ).        (24.503)

Also, we have

φ[qp](H) = qpH(qp)† = qpHp†q† = q(pHp†)q† = φ[q][φ[p](H)] = φ[q]φ[p](H).        (24.504)

Thus, we find that q ⟼ φ[q] is a real representation of SL(2, ℂ) over V₄. Using the notation of Sect. 11.2, we have

H′ = φ[q](H) = (σ₀ σ₁ σ₂ σ₃)φ[q](x⁰ x¹ x² x³)ᵀ
   = (σ₀φ[q]  σ₁φ[q]  σ₂φ[q]  σ₃φ[q])(x⁰ x¹ x² x³)ᵀ.        (24.505)

Then, with, e.g., σ₀φ[q] we obtain

σ₀φ[q] = φ[q](σ₀) = qσ₀q† = | a  b | | 1  0 | | a*  c* | = | |a|²+|b|²   ac*+bd*  |.        (24.506)
                            | c  d | | 0  1 | | b*  d* |   | a*c+b*d   |c|²+|d|² |

Using the relation (24.496), after the transformation φ[q](σ₀) we get

H′ = (1/2)(|a|² + |b|² + |c|² + |d|²)σ₀ + (1/2)(|c|² + |d|² - |a|² - |b|²)σ₃
     + Re(ac* + bd*)σ₁ + Im(ac* + bd*)σ₂.        (24.507)

The coefficients of each σᵢ (i = 0, 1, 2, 3) act as the new coordinates. Meanwhile, from (24.498) we have

detH = (x⁰)² - (x¹)² - (x²)² - (x³)².        (24.508)

Equation (24.508) is identical with the bilinear form introduced into the Minkowski space M⁴. We know from (24.500) and (24.501) that the transformation φ[q] keeps the determinant unchanged. That is, the bilinear form (24.508) is invariant under this transformation. This implies that φ[q] is associated with the Lorentz transformation. To see whether this is really the case, we continue the calculations pertinent to (24.507). In a similar manner, we obtain the related results H′ after the transformations φ[q](σ₁), φ[q](σ₂), and φ[q](σ₃). The results are shown collectively as [6]

(σ₀ σ₁ σ₂ σ₃)φ[q] = (σ₀ σ₁ σ₂ σ₃) ×

| (1/2)(|a|²+|b|²+|c|²+|d|²)   Re(ab*+cd*)   -Im(ab*+cd*)   (1/2)(|b|²+|d|²-|a|²-|c|²) |
| Re(ac*+bd*)                  Re(ad*+bc*)   -Im(ad*-bc*)   Re(bd*-ac*)                |
| Im(ac*+bd*)                  Im(ad*+bc*)    Re(ad*-bc*)   Im(bd*-ac*)                |
| (1/2)(|c|²+|d|²-|a|²-|b|²)   Re(cd*-ab*)    Im(ab*-cd*)   (1/2)(|a|²+|d|²-|b|²-|c|²) |.        (24.509)

Equation (24.509) clearly gives a matrix representation of SO₀(3, 1). As already noticed, the present method based on (24.500) is in parallel with the derivation of (20.302) containing the Euler angles, which was performed by the use of the adjoint representation. In either case, we have represented a vector in the real vector space (either three-dimensional or four-dimensional) and obtained the matrices (20.302) or (24.509) as a set of the expansion coefficients with respect to the basis vectors. Let us further inspect the properties of (24.509) in the following example.
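The statement that (24.509) preserves the Minkowski bilinear form can be spot-checked numerically. The sketch below (plain Python; the helper names are ours) computes the columns of φ[q] directly as the coordinates of qσμq† read off via (24.498), and then verifies ΛᵀηΛ = η for a sample q with detq = 1:

```python
def mm(X, Y):
    # 2x2 complex matrix product
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(X):
    return [[X[j][i].conjugate() for j in range(2)] for i in range(2)]

def coords(H):
    # inverse of (24.498): (x0, x1, x2, x3) from a Hermitian 2x2 matrix H
    return [(H[0][0] + H[1][1]).real / 2, H[0][1].real,
            H[0][1].imag, (H[1][1] - H[0][0]).real / 2]

# basis (24.495); note the sign conventions of the text for sigma2 and sigma3
sigma = [[[1, 0], [0, 1]], [[0, 1], [1, 0]],
         [[0, 1j], [-1j, 0]], [[-1, 0], [0, 1]]]

def phi(q):
    # column mu of phi[q] holds the coordinates of q sigma_mu q^dagger
    cols = [coords(mm(mm(q, s), dagger(q))) for s in sigma]
    return [[cols[j][i] for j in range(4)] for i in range(4)]

# a sample q with det q = ad - bc = 1
a, b, c = 1.2 + 0.3j, 0.4 - 0.1j, -0.2 + 0.5j
d = (1 + b * c) / a
L = phi([[a, b], [c, d]])

eta = [1.0, -1.0, -1.0, -1.0]
# check Lambda^T eta Lambda = eta entrywise
for i in range(4):
    for j in range(4):
        v = sum(L[k][i] * eta[k] * L[k][j] for k in range(4))
        assert abs(v - (eta[i] if i == j else 0.0)) < 1e-9
```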


Example 24.13 We may freely choose the parameters a, b, c, and d under the restriction of (24.501). Let us choose b = c = 0, a = e^{iθ/2}, and d = e^{-iθ/2} in (24.501). That is,

q = | e^{iθ/2}     0      |.        (24.510)
    |    0     e^{-iθ/2}  |

Then, we have ad* = e^{iθ} to get

φ[q] = | 1    0       0     0 |.        (24.511)
       | 0  cos θ  -sin θ   0 |
       | 0  sin θ   cos θ   0 |
       | 0    0       0     1 |

This is just a rotation by θ around the z-axis. Next, we choose a = d = cosh(ω/2) and b = c = -sinh(ω/2). Namely,

q = |  cosh(ω/2)  -sinh(ω/2) |.        (24.512)
    | -sinh(ω/2)   cosh(ω/2) |

Then, we obtain

φ[q] = |  cosh ω  -sinh ω  0  0 |.        (24.513)
       | -sinh ω   cosh ω  0  0 |
       |    0        0     1  0 |
       |    0        0     0  1 |

Moreover, if we choose b = c = 0, a = cosh(ω/2) - sinh(ω/2), and d = cosh(ω/2) + sinh(ω/2) in (24.501), we have

q = | cosh(ω/2) - sinh(ω/2)            0             |.        (24.514)
    |           0            cosh(ω/2) + sinh(ω/2)   |

Then, we get

φ[q] = | cosh ω  0  0  sinh ω |.        (24.515)
       |   0     1  0    0    |
       |   0     0  1    0    |
       | sinh ω  0  0  cosh ω |

The latter two illustrations represent Lorentz boosts (along the x- and z-axes, respectively). We find that the (2, 2) matrix of (24.510) belongs to SU(2) and that the (2, 2) matrices of (24.512) and (24.514) are positive definite symmetric matrices. In terms of Theorem 24.7 and Corollary 24.3,
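The special cases of Example 24.13 can likewise be verified by direct computation of qσμq† (plain Python; the helper names are ours). The sketch below checks the rotation (24.511) generated by (24.510) and the z-boost (24.515) generated by (24.514):

```python
import cmath
import math

def mm(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(X):
    return [[X[j][i].conjugate() for j in range(2)] for i in range(2)]

def coords(H):
    # (x0, x1, x2, x3) of (24.498)
    return [(H[0][0] + H[1][1]).real / 2, H[0][1].real,
            H[0][1].imag, (H[1][1] - H[0][0]).real / 2]

sigma = [[[1, 0], [0, 1]], [[0, 1], [1, 0]],
         [[0, 1j], [-1j, 0]], [[-1, 0], [0, 1]]]

def phi(q):
    cols = [coords(mm(mm(q, s), dagger(q))) for s in sigma]
    return [[cols[j][i] for j in range(4)] for i in range(4)]

th, om = 0.6, 0.9
# (24.510): rotation about the z-axis
Lr = phi([[cmath.exp(1j * th / 2), 0], [0, cmath.exp(-1j * th / 2)]])
assert abs(Lr[1][1] - math.cos(th)) < 1e-12 and abs(Lr[2][1] - math.sin(th)) < 1e-12
assert abs(Lr[0][0] - 1) < 1e-12 and abs(Lr[3][3] - 1) < 1e-12

# (24.514): boost along the z-axis
Lb = phi([[math.cosh(om / 2) - math.sinh(om / 2), 0],
          [0, math.cosh(om / 2) + math.sinh(om / 2)]])
assert abs(Lb[0][0] - math.cosh(om)) < 1e-12 and abs(Lb[0][3] - math.sinh(om)) < 1e-12
```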

these matrices have already been decomposed. If we expressly (or purposely) write them in a decomposed form, we may do so with the above examples such that

q = | e^{iθ/2}     0      | | 1  0 |,        (24.516)
    |    0     e^{-iθ/2}  | | 0  1 |

q = | 1  0 | |  cosh(ω/2)  -sinh(ω/2) |.        (24.517)
    | 0  1 | | -sinh(ω/2)   cosh(ω/2) |

The identity matrix of (24.516) behaves as the positive definite Hermitian matrix P in relation to Theorem 24.7. The identity matrix of (24.517) behaves as the orthogonal matrix O in relation to Corollary 24.3.

The parameter θ in (24.510) is bounded. A group characterized by a bounded parameter is called a compact group, for which a unitary representation can be constructed (Theorem 18.1). In contrast, the parameter ω of (24.512) and (24.514) ranges from -∞ to +∞. A group characterized by an unbounded parameter is called a non-compact group, and a unitary representation is in general impossible. Equations (24.511) and (24.513) are typical examples.

We investigate the constitution of the Lorentz group a little further. In Sect. 20.4, we showed that

SU(2)/F ≅ SO(3),  F = {e, -e},        (24.518)

where F is the kernel and e is the identity element. To show this, it was important to show the existence of the kernel F. In the present case, we wish to find the characteristics of the kernel N with respect to the homomorphism mapping φ, namely

N = {q ∈ SL(2, ℂ); φ[q] = φ[σ₀]}.        (24.519)

First, we show the following corollary.

Corollary 24.4 [8] Let ρ be a continuous representation of a connected linear Lie group G and let e_G be the identity element of G. Let H be another Lie group and let H₀ be the connected component of the identity element e_H of H. We express the image of G by ρ as ρ(G). If ρ(G) is contained in the Lie group H, then ρ(G) is contained in the connected component H₀.

Proof Since ρ transforms the identity element e_G to e_H, we have e_H ∈ ρ(G). The identity element e_H is contained in H₀. Because ρ(G) is the image of the connected set G, by virtue of Theorem 20.8 we find that ρ(G) is also a connected subset of H. We have e_H ∈ ρ(G), and so this implies that all the elements of ρ(G) are connected to e_H. This means that ρ(G) ⊂ H₀. This completes the proof.

The representation φ[q] keeps the bilinear form given by detH of (24.508) invariant, and so φ[q] [q ∈ SL(2, ℂ)] is a Lorentz transformation; that is, φ[SL(2, ℂ)] ⊂ O(3, 1). Meanwhile, SL(2, ℂ) is (simply) connected and SO₀(3, 1) is the connected Lie

group that contains the identity element E₀. Then, according to Corollary 24.4, we have

φ[SL(2, ℂ)] ⊂ SO₀(3, 1).        (24.520)

As already seen in (24.509), φ[q] [q ∈ SL(2, ℂ)] has given a matrix representation of the Lorentz group SO₀(3, 1). This seems to imply at first glance that

SO₀(3, 1) ⊂ φ[SL(2, ℂ)].        (24.521)

Strictly speaking, however, to assert (24.521) we need the knowledge of the differential representation and Theorem 25.2 based on it. Equation (24.521) will be proven in Sect. 25.1 accordingly. Taking (24.521) in advance and combining (24.520) with (24.521), we get

φ[SL(2, ℂ)] = SO₀(3, 1).        (24.522)

Now, we have a question of what kind of property φ has (see Sect. 16.4). In terms of the mapping, if we have

f(x1) = f(x2) ⟹ x1 = x2,  (24.523)

the mapping f is injective (Sect. 11.2). If (24.523) is not true, f is not injective, and so f is not isomorphic either. Therefore, our question can be translated into whether or not the following proposition is true of φ:

φ[q] = φ[σ0] ⟹ q = σ0.  (24.524)

Meanwhile, φ[σ 0] gives the identity mapping (Sect. 16.4). In fact, we have φ½σ 0 ðH Þ = σ 0 Hσ 0 { = σ 0 Hσ 0 = H:

ð24:525Þ

Consequently, our question boils down to whether we can find q (≠ σ0) for which φ[q] gives the identity mapping. If we find such q, φ is not isomorphic, but has a non-trivial kernel. Let us get back to our current issue. We have

φ[−σ0](H) = (−σ0)H(−σ0)† = σ0Hσ0 = H.  (24.526)

Then, φ[−σ0] gives the identity mapping as well; that is, φ[−σ0] = φ[σ0]. Equation (24.524) does not hold with φ accordingly. Therefore, according to Theorem 16.2, φ is not isomorphic. To fully characterize the representation φ, we must seek element(s) other than −σ0 as candidates for the kernel.

24.10 Representation of the Proper Orthochronous Lorentz. . .

Now, suppose that q ∈ SL(2, ℂ) is an element of the kernel N, i.e., q ∈ N. Then, for an arbitrary element H ∈ V4 defined in (24.493) we must have

φ[q](H) = qHq† = H.

In the special case where we choose H = σ0 (∈ V4), the following relation must hold:

φ[q](σ0) = qσ0q† = qq† = σ0,  (24.527)

where the last equality comes from the supposition that q is contained in the kernel N. This means that q is unitary; that is, q† = q⁻¹. Then, for any H ∈ V4 we have

φ[q](H) = qHq† = qHq⁻¹ = H  or  qH = Hq.  (24.528)

Meanwhile, any (2, 2) complex matrix z can be expressed as

z = x + iy, x, y ∈ V4,  (24.529)

which will be proven in Theorem 24.14 given just below. Then, from (24.528) and (24.529), q must commute with any (2, 2) complex matrix. Let C be the set of all (2, 2) complex matrices, described as

C = {C = (cij); i, j = 1, 2, cij ∈ ℂ}.  (24.530)

Then, since C ⊃ SL(2, ℂ), it follows that q commutes with ∀g ∈ SL(2, ℂ). Consequently, according to Schur's Second Lemma q must be expressed as

q = αE  [α ∈ ℂ, E: (2, 2) identity matrix].

Since det q = 1, we must have α² = 1, i.e., α = ±1. This implies that the elements of N are limited to q that satisfies

q = σ0  or  q = −σ0.  (24.531)

Thus, we obtain

N = {σ0, −σ0}.  (24.532)

Any other element of SL(2, ℂ) is excluded from the kernel N. This implies that φ is a two-to-one homomorphic mapping. At the same time, from Theorem 16.4 (Homomorphism Theorem) we obtain


SO0(3, 1) ≅ SL(2, ℂ)/N.  (24.533)
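The two-to-one correspondence can also be checked numerically. The sketch below is a minimal illustration, not the book's construction: it builds the 4×4 matrix of H ↦ qHq† in a basis of Hermitian matrices (the standard Pauli basis is assumed here; the book's own basis of V4 is fixed in (24.495)) and confirms that φ is a homomorphism, that φ[−q] = φ[q], and that det H is preserved.

```python
import numpy as np

# Basis of the 2x2 Hermitian matrices V4 (standard Pauli convention assumed here)
sigma = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def phi(q):
    """4x4 real matrix of H -> q H q^dagger with respect to the basis sigma."""
    M = np.empty((4, 4))
    for j in range(4):
        img = q @ sigma[j] @ q.conj().T
        for i in range(4):
            # coefficients via (1/2) Tr(sigma_i img), since Tr(sigma_i sigma_j) = 2 delta_ij
            M[i, j] = 0.5 * np.trace(sigma[i] @ img).real
    return M

rng = np.random.default_rng(0)
def random_sl2c():
    q = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    return q / np.sqrt(np.linalg.det(q))          # normalize so that det q = 1

q1, q2 = random_sl2c(), random_sl2c()
# phi is a homomorphism and phi[-q] = phi[q]: the mapping is two-to-one
assert np.allclose(phi(q1 @ q2), phi(q1) @ phi(q2))
assert np.allclose(phi(-q1), phi(q1))
# phi[q] preserves det H (the Minkowski-type quadratic form on V4)
H = sum(c * s for c, s in zip(rng.normal(size=4), sigma))
assert np.isclose(np.linalg.det(q1 @ H @ q1.conj().T).real, np.linalg.det(H).real)
```

The last assertion is the numerical counterpart of the invariance of the bilinear form given by det H.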

Equation (24.533) is in parallel to the relationship between SU(2) and SO(3) discussed in Sect. 20.4.3; see (20.314). Thus, we reach the next important theorem.

Theorem 24.13 [8] Suppose that we have a representation φ[q] [q ∈ SL(2, ℂ)] on V4 described by

φ[q](H): V4 ⟶ V4, φ[q](H) = qHq†.  (24.502)

Then, φ defined by φ: SL(2, ℂ) ⟶ SO0(3, 1) is a two-to-one surjective homomorphic mapping from SL(2, ℂ) to SO0(3, 1). Namely, we have SO0(3, 1) ≅ SL(2, ℂ)/N, where N = {σ0, −σ0}.

Thus, we find that the relationship between SO0(3, 1) and SL(2, ℂ) can be dealt with in parallel with that between SO(3) and SU(2). As the representation of SU(2) has led to that of SO(3), so the representation of SL(2, ℂ) boils down to that of SO0(3, 1). We will come back to this issue in Sect. 25.1. Moreover, we have simple but important theorems on the complex matrices and, more specifically, on sl(2, ℂ).

Theorem 24.14 [8] Any (2, 2) complex matrix z can be expressed as

z = x + iy, x, y ∈ V4.  (24.534)

Proof Let z be described as

z = ( a + ib  c + id
      p + iq  r + is )  (a, b, c, d, p, q, r, s ∈ ℝ).  (24.535)

Then, z is written as

z = (1/2)[(a + r)σ0 + (−a + r)σ3 + (c + p)σ1 + (d − q)σ2]
  + (i/2)[(b + s)σ0 + (−b + s)σ3 + (d + q)σ1 + (−c + p)σ2],  (24.536)

where σi (i = 0, 1, 2, 3) ∈ V4 are given by (24.495). The first and second terms of (24.536) are evidently expressed as x and iy of (24.534), respectively. This completes the proof.

Note that in Theorem 24.14 all the (2, 2) complex matrices z form the Lie algebra gl(2, ℂ). Therefore, from (24.535) gl(2, ℂ) has eight degrees of freedom; namely, gl(2, ℂ) is an eight-dimensional vector space.
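Theorem 24.14 admits a quick numerical check. The sketch below does not follow the basis expansion (24.536); instead it uses the convenient projections x = (z + z†)/2 and y = (z − z†)/2i, which produce the same Hermitian pair:

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))  # arbitrary (2,2) complex matrix

# Decomposition z = x + i y with x, y Hermitian (i.e., x, y in V4)
x = (z + z.conj().T) / 2
y = (z - z.conj().T) / (2j)

assert np.allclose(x, x.conj().T)    # x is Hermitian
assert np.allclose(y, y.conj().T)    # y is Hermitian
assert np.allclose(z, x + 1j * y)    # z = x + i y is recovered
```

Since x and y are determined by these two projections, the decomposition is also unique.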


Theorem 24.15 [8] Let Z be an arbitrary element of sl(2, ℂ). Then, Z is uniquely described as

Z = X + iY  [X, Y ∈ su(2)].  (24.537)

Proof Let X and Y be described as

X = (Z − Z†)/2, Y = (Z + Z†)/2i.  (24.538)

Then, (24.537) certainly holds. Both X and Y are anti-Hermitian. As Z is traceless, so are X and Y. Therefore, X, Y ∈ su(2). Next, suppose that X′ (≠ X) and Y′ (≠ Y) satisfy (24.537). That is,

Z = X′ + iY′  [X′, Y′ ∈ su(2)].  (24.539)

Subtracting (24.537) from (24.539), we obtain

(X′ − X) + i(Y′ − Y) = 0.  (24.540)

Taking the adjoint of (24.540), we get

−(X′ − X) + i(Y′ − Y) = 0.  (24.541)

Adding (24.540) and (24.541), we get

2i(Y′ − Y) = 0  or  Y′ = Y.  (24.542)

Subtracting (24.541) from (24.540) once again, we have

2(X′ − X) = 0  or  X′ = X.  (24.543)

Equations (24.542) and (24.543) indicate that the representation (24.537) is unique. This completes the proof.

The fact that an element of sl(2, ℂ) is traceless is shown as follows. From (15.41), we have

det(exp tA) = exp(Tr tA), t: real.  (24.544)

Suppose A ∈ sl(2, ℂ). Then, exp tA ∈ SL(2, ℂ). Therefore, we have

exp(Tr tA) = det(exp tA) = 1,  (24.545)


where with the first equality we used (15.41) with A replaced with tA; the last equality came from Definition 24.5. This implies that Tr tA = t TrA = 0. That is, A is traceless.

Theorem 24.15 gives an outstanding feature to sl(2, ℂ). That is, Theorem 24.15 tells us that an element of sl(2, ℂ) can be decomposed into a matrix belonging to su(2) and one belonging to the set i·su(2). In turn, the elements of sl(2, ℂ) can be synthesized from those of su(2). Within su(2), an arbitrary vector is described as a linear combination over the real field ℝ. In fact, let X be a vector of su(2). Then, we have

X = ( ic       a − ib
      −a − ib  −ic )  (a, b, c ∈ ℝ).  (24.546)

Defining the basis vectors of su(2) by the use of the matrices defined in (20.270) as

ζ̃1 = ( 0  −i     ζ̃2 = ( 0   1     ζ̃3 = ( i   0
       −i   0 ),        −1   0 ),         0  −i ),  (24.547)

we describe X as

X = cζ̃3 + aζ̃2 + bζ̃1  (a, b, c ∈ ℝ).  (24.548)
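The decomposition of Theorem 24.15 can be verified numerically. The sketch below applies the projections (24.538) to a random traceless matrix and checks that the resulting X and Y are anti-Hermitian and traceless, i.e., elements of su(2):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
Z = A - np.trace(A) / 2 * np.eye(2)     # remove the trace: Z is in sl(2, C)

# (24.538): X = (Z - Z^dagger)/2, Y = (Z + Z^dagger)/2i
X = (Z - Z.conj().T) / 2
Y = (Z + Z.conj().T) / (2j)

assert np.allclose(Z, X + 1j * Y)       # (24.537): Z = X + iY
for W in (X, Y):
    assert abs(np.trace(W)) < 1e-12     # traceless
    assert np.allclose(W, -W.conj().T)  # anti-Hermitian, hence W is in su(2)
```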

As for sl(2, ℂ), on the other hand, we describe Z ∈ sl(2, ℂ) as

Z = ( a + ib   c + id
      p + iq  −a − ib )  (a, b, c, d, p, q ∈ ℝ).  (24.549)

Using (24.547) for the basis vectors, we write Z as

Z = (b − ia)ζ̃3 + [(1/2)(c − p) + (i/2)(d − q)]ζ̃2 + [−(1/2)(d + q) + (i/2)(c + p)]ζ̃1.  (24.550)

Thus, we find that sl(2, ℂ) is a complex vector space. If in general a complex Lie algebra g has a real Lie subalgebra h (i.e., h ⊂ g) and an arbitrary element Z ∈ g is uniquely expressed as

Z = X + iY  (X, Y ∈ h),  (24.551)

then h is said to be a real form of g. By the same token, g is said to be the complexification of h [8]. In light of (24.551), the relationship between su(2) and sl(2, ℂ) is evident. Note that in the case of sl(2, ℂ), X is anti-Hermitian and iY is Hermitian. Defining the operators ζ̃k* such that

ζ̃k* ≡ iζ̃k  (k = 1, 2, 3),  (24.552)

we can construct useful commutation relations among them. On the basis of (20.271), we obtain

[ζ̃k, ζ̃l] = Σm εklm ζ̃m,  (24.553)

[ζ̃k*, ζ̃l*] = −Σm εklm ζ̃m,  (24.554)

[ζ̃k, ζ̃l*] = Σm εklm ζ̃m*,  (24.555)

where the sums run over m = 1, 2, 3.

The complexification is an important concept that extends the scope of applications of the Lie algebra. Examples of the complexification will be provided in the next chapter. In the above discussion we mentioned the subalgebra. Its definition is formally given as follows.

Definition 24.6 Let g be a Lie algebra. Suppose that h is a subspace of g and that with ∀X, Y ∈ h we have [X, Y] ∈ h. Then, h is called a subalgebra of g.

References

1. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
2. Satake I (1974) Linear algebra. Shokabo, Tokyo (in Japanese)
3. Birkhoff G, Mac Lane S (2010) A survey of modern algebra, 4th edn. CRC Press, Boca Raton
4. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover, New York
5. Rosén A (2019) Geometric multivector analysis. Birkhäuser, Cham
6. Hirai T (2001) Linear algebra and representation of groups II. Asakura Publishing, Tokyo (in Japanese)
7. Kugo C (1989) Quantum theory of gauge field I. Baifukan, Tokyo (in Japanese)
8. Yamanouchi T, Sugiura M (1960) Introduction to continuous groups. Baifukan, Tokyo (in Japanese)
9. Hotta S (2023) A method for direct solution of the Dirac equation. Bull Kyoto Inst Technol 15:27–46
10. Itzykson C, Zuber J-B (2005) Quantum field theory. Dover, New York
11. Sakamoto M (2014) Quantum field theory. Shokabo, Tokyo (in Japanese)
12. Goldstein H, Poole C, Safko J (2002) Classical mechanics, 3rd edn. Addison Wesley, San Francisco
13. Schweber SS (1989) An introduction to relativistic quantum field theory. Dover, New York
14. Mirsky L (1990) An introduction to linear algebra. Dover, New York
15. Hassani S (2006) Mathematical physics. Springer, New York

Chapter 25

Advanced Topics of Lie Algebra

Abstract In Chap. 20, we dealt with the Lie groups and Lie algebras within the framework of the continuous groups. With the Lie groups the operation is multiplicative (see Part IV), whereas the operation of the Lie algebras is additive since the Lie algebra forms a vector space (Part III). It is of importance and interest, however, that the representation is multiplicative in both the Lie groups and Lie algebras. So far, we have examined the properties of the representations of the groups (including the Lie group), but little was mentioned about the Lie algebra. In this chapter, we present advanced topics of the Lie algebra with emphasis upon its representation. Among those topics, the differential representation of Lie algebra is particularly useful. We can form a clear view of the representation theory by contrasting the representation of the Lie algebra with that of the Lie group. Another topic is the complexification of the Lie algebra. We illustrate how to extend the scope of application of Lie algebra through the complexification with actual examples. In the last part, we revisit the topic of the coupling of angular momenta and compare the method with the previous approach based on the theory of continuous groups.

25.1 Differential Representation of Lie Algebra [1]

25.1.1 Overview

As already discussed, the Lie algebra is based on the notion of infinitesimal transformation around the identity element of the Lie group (Sect. 20.4). The representations of Lie groups and those of Lie algebras accordingly have a close relationship. As the Lie algebra is related to the infinitesimal transformation, so is its representation. We summarize the definition of the representation of a Lie algebra below. We also have the differential representation as another representation specific to the Lie algebra; it is defined in relation to the representation of the Lie group.

© Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4_25


Definition 25.1 Let σ(X) be a mapping that relates X ∈ g (i.e., X is an element of a Lie algebra g) to a vector space V. Suppose that σ(X) is a linear transformation and that σ(X) satisfies the following relationships (i), (ii), and (iii):

(i) σ(aX) = aσ(X)  (a ∈ ℝ),  (25.1)

(ii) σ(X + Y) = σ(X) + σ(Y),  (25.2)

(iii) σ(XY) = σ(X)σ(Y).  (25.3)

Then, the mapping σ(X) is called a representation of the Lie algebra g on V. We schematically write this mapping as

σ: g ⟶ V  or  g ∋ X ↦ σ(X) ∈ V.  (25.4)

The relationship (iii) implies that σ(X) is a homomorphic mapping on g. In the above definition, if V of (25.4) is identical to g, σ(X) is an endomorphism (Chap. 11). Next, we think of the mapping σ([X, Y]), where [X, Y] is a commutator. We have

σ([X, Y]) = σ(XY − YX) = σ(XY) − σ(YX) = σ(X)σ(Y) − σ(Y)σ(X) = [σ(X), σ(Y)],

where with the second equality we used (25.2) and with the third equality we used (25.3). That is,

(iv) σ([X, Y]) = [σ(X), σ(Y)].  (25.5)

The reducibility and irreducibility of the representation can be defined similarly as before (see, e.g., Sect. 20.2). The expression (25.5) plays a special role with the representation because it keeps the commutation relation unchanged in V as well as in g. More importantly, we have a relationship described by

ρ(exp tX) = exp[tσ(X)]  (X ∈ g, t ∈ ℝ),  (25.6)

where ρ is a representation of the Lie group ℊ corresponding to g. Differentiating both sides of (25.6) with respect to t and taking the limit t → 0, we get

lim(t→0) (1/t)[ρ(exp tX) − ρ(exp 0)] = σ(X) lim(t→0) exp[tσ(X)].  (25.7)

From (18.4) we have

ρ(exp 0) = ρ(e) = E,  (25.8)

where e and E are the identity elements of ℊ and ρ(ℊ), respectively. Then, (25.7) can be rewritten as

σ(X) = lim(t→0) (1/t)[ρ(exp tX) − E].  (25.9)

Thus, the relationship between the Lie group ℊ and its representation space [see (18.36)] can be translated into that between the Lie algebra g and the associated vector space V. In other words, as the whole collection of X forms the Lie algebra g (i.e., a vector space g), so the whole collection of σ(X) forms a Lie algebra on the vector space V. We summarize the above statements as the following definition.

Definition 25.2 Suppose that we have a relationship described by

ρ(exp tX) = exp[tσ(X)]  (X ∈ g, t ∈ ℝ).  (25.6)

The representation σ(X) of g on V is called the differential representation of the representation ρ of the Lie group ℊ. In this instance, σ is expressed as

σ ≡ dρ.  (25.10)

To specify the representation space associated with the Lie group and Lie algebra, we normally write the combination of the representation σ or ρ and its associated representation space V as [1]

(σ, V)  or  (ρ, V).  (25.11)

Notice that the same representation space V has been used in (25.11) for both the representations σ and ρ relevant to the Lie algebra and Lie group, respectively.

25.1.2 Adjoint Representation of Lie Algebra [1]

One of the most important representations of the Lie algebras is the adjoint representation. In Sect. 20.4.3, we defined the adjoint representation for the Lie group. In the present section, we deal with the adjoint representation for the Lie algebra. It is


defined as the differential representation of the adjoint representation Ad[g] of the Lie group ℊ ∋ g. In this case, the differential representation σ(X) is denoted by

σ(X) ≡ ad[X],  (25.12)

where we used square brackets on the RHS in accordance with Sect. 20.4.3. The operator ad[X] operates on an element Y of the Lie algebra such that

ad[X](Y).  (25.13)

With (25.13) we have a simple but important theorem.

Theorem 25.1 [1] With X, Y ∈ g we have

ad[X](Y) = [X, Y].  (25.14)

Proof Let gX(t) be expressed as

gX(t) = Ad[exp tX].  (25.15)

That is, gX is the adjoint representation of the Lie group ℊ on the Lie algebra g. Notice that we are using (25.15) for ρ(exp tX) of (25.6). Then, according to (25.6) and (25.9), we have

ad[X](Y) = g′X(0)Y = d/dt [(exp tX) Y (exp tX)⁻¹]|t=0
 = d/dt [(exp tX) Y (exp −tX)]|t=0 = XY − YX = [X, Y],  (25.16)

where with the third equality we used (15.28) and with the second-to-last equality we used (15.4). This completes the proof.

Equation (25.14) can be seen as a definition of the adjoint representation of the Lie algebra. We can directly show that the operator ad[X] is indeed a representation of g as follows. With any arbitrary element Z ∈ g we have

ad[[X, Y]](Z) = [[X, Y], Z] = −[Z, [X, Y]] = [X, [Y, Z]] − [Y, [X, Z]]
 = [X, ad[Y](Z)] − [Y, ad[X](Z)] = ad[X](ad[Y]Z) − ad[Y](ad[X]Z)
 = ad[X]ad[Y](Z) − ad[Y]ad[X](Z) = [ad[X], ad[Y]](Z),  (25.17)

where with the third equality we used Jacobi's identity (20.267). Comparing the first and last sides of (25.17), we get


ad[[X, Y]](Z) = [ad[X], ad[Y]](Z).  (25.18)

Equation (25.18) holds with any arbitrary element Z ∈ g. This implies that

ad[[X, Y]] = [ad[X], ad[Y]].  (25.19)

From (25.5), (25.19) satisfies the condition for a representation of the Lie algebra g.
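Both (25.19) and the derivative characterization (25.16) lend themselves to a quick numerical sketch (under the assumption that g is realized by matrices; the truncated series below stands in for a full matrix-exponential routine at the tiny step used):

```python
import numpy as np

rng = np.random.default_rng(3)

def comm(a, b):
    return a @ b - b @ a

def ad(x):
    # ad[X](Y) = [X, Y], cf. (25.14)
    return lambda y: comm(x, y)

X, Y, Z = (rng.normal(size=(3, 3)) for _ in range(3))

# (25.19): ad[[X, Y]] = [ad[X], ad[Y]], a consequence of Jacobi's identity
assert np.allclose(ad(comm(X, Y))(Z),
                   ad(X)(ad(Y)(Z)) - ad(Y)(ad(X)(Z)))

# (25.16): ad[X](Y) is the derivative of (exp tX) Y (exp -tX) at t = 0
def exp_approx(M):
    # second-order series; ample accuracy for ||M|| ~ 1e-6
    return np.eye(3) + M + M @ M / 2

t = 1e-6
numeric = (exp_approx(t * X) @ Y @ exp_approx(-t * X) - Y) / t
assert np.allclose(numeric, comm(X, Y), atol=1e-4)
```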

We know that if X, Y ∈ g, then [X, Y] ∈ g (see Sect. 20.4). Hence, we have the following relationship described by

g ∋ Y ↦ ad[X](Y) = Y′ ∈ g  (X ∈ g)  (25.20)

or

ad[X]: g ⟶ g.  (25.21)

That is, ad[X] is an endomorphism within g.

Example 25.1 In Example 20.4 we had (20.270) as su(2). Among them we use

ζ3 = (1/2) ( i   0
             0  −i ) ∈ su(2)

for X in (25.20). We have

ad[X]: g ⟶ g  [X ∈ su(2), g = su(2)].  (25.22)

Also, we have

ad[ζ3](Y) = ζ3Y − Yζ3  [Y ∈ su(2)].  (25.23)

Substituting ζ1, ζ2, and ζ3 for Y, respectively, we obtain

ad[ζ3](ζ1) = ζ2, ad[ζ3](ζ2) = −ζ1, and ad[ζ3](ζ3) = 0.  (25.24)

That is, we get

(ζ1 ζ2 ζ3) ad[ζ3] = (ζ1 ζ2 ζ3) ( 0  −1  0
                                 1   0  0
                                 0   0  0 ) = (ζ1 ζ2 ζ3) A3.  (25.25)

Fig. 25.1 Isomorphism ad between su(2) and so(3)

Similarly, we obtain

(ζ1 ζ2 ζ3) ad[ζ1] = (ζ1 ζ2 ζ3) ( 0  0   0
                                 0  0  −1
                                 0  1   0 ) = (ζ1 ζ2 ζ3) A1,  (25.26)

(ζ1 ζ2 ζ3) ad[ζ2] = (ζ1 ζ2 ζ3) ( 0  0  1
                                 0  0  0
                                −1  0  0 ) = (ζ1 ζ2 ζ3) A2.  (25.27)

The (3, 3) matrices on the RHS of (25.25), (25.26), and (25.27) are identical to A3, A1, and A2 of (20.274), respectively. Equations (25.25)-(25.27) show that the differential mapping ad transforms the basis vectors ζ1, ζ2, and ζ3 of su(2) into the basis vectors A1, A2, and A3 of so(3) [= o(3)], respectively (through the medium of ζ1, ζ2, and ζ3). From the discussion of Sect. 11.4, the transformation ad is bijective. Namely, su(2) and so(3) are isomorphic to each other. In Fig. 25.1 we depict the isomorphism ad between su(2) and so(3) as

ad: su(2) ⟶ so(3).  (25.28)
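The matrix of ad[ζ3] in (25.25) can be reproduced numerically. In the sketch below, the inner product used to extract coefficients is a choice made for this illustration (it happens to make the ζ basis orthonormal); the rotation check at the end is the content of (25.30):

```python
import numpy as np

# su(2) basis of (20.270), as used in Example 25.1 (matrices with the 1/2 factor)
z1 = 0.5 * np.array([[0, -1j], [-1j, 0]])
z2 = 0.5 * np.array([[0, 1], [-1, 0]], dtype=complex)
z3 = 0.5 * np.array([[1j, 0], [0, -1j]])
basis = [z1, z2, z3]

def expand(M):
    """Coefficients of M in the basis (zeta_1, zeta_2, zeta_3)."""
    # <A, B> = -2 Tr(A B) makes this basis orthonormal
    return np.array([(-2 * np.trace(b @ M)).real for b in basis])

def ad_matrix(X):
    """(3,3) matrix of ad[X]: Y -> [X, Y] with respect to the zeta basis."""
    return np.column_stack([expand(X @ b - b @ X) for b in basis])

A3 = np.array([[0., -1, 0], [1, 0, 0], [0, 0, 0]])
assert np.allclose(ad_matrix(z3), A3)                 # (25.25)

# (25.30): exp(alpha * ad[zeta_3]) is the rotation about the third axis
alpha = 0.7
R, T = np.eye(3), np.eye(3)
for n in range(1, 30):                                # truncated exponential series
    T = T @ (alpha * A3) / n
    R = R + T
Rz = np.array([[np.cos(alpha), -np.sin(alpha), 0],
               [np.sin(alpha),  np.cos(alpha), 0],
               [0, 0, 1]])
assert np.allclose(R, Rz)
```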

Notice that the mapping between the two Lie algebras su(2) and so(3) represented in (25.28) is in parallel with the mapping between the two Lie groups SU(2) and SO(3) that is described by

Ad: SU(2) ⟶ SO(3).  (20.303)

Thus, in accordance with (25.11), we have the following correspondence between the differential representation ad[X] [X ∈ su(2)] on a three-dimensional vector space V3 (as a representation space) and the representation Ad[g] [g ∈ SU(2)] on V3 such that


(ad, V3), (Ad, V3).  (25.29)

In the present example, we have chosen su(2) for V3; see Sect. 20.4.3. In (25.6), replacing ρ, t, X, and σ with Ad, α, ζ3, and ad, respectively, we get

exp{α·ad[ζ3]} = Ad[exp(αζ3)]  (ζ3 ∈ g, α ∈ ℝ).  (25.30)

In fact, comparing (20.28) with (20.295), we find that both sides of the above equation certainly give the same matrix expressed as

( cos α  −sin α  0
  sin α   cos α  0
  0       0      1 ).

:

Notice that [LHS of (25.30)]= (20.28) and [RHS of (25.30)]= (20.295). As explained above, the present example clearly shows a close relationship between the representations of Lie group and Lie algebra. We can construct the adjoint representation of the Lie algebras using a representation other than (25.14). In that case it is again helpful to use Jacobi’s identity in combination with (20.269) that gives structure constants which define the structure of the Lie algebra (Sect. 20.4.2). We rewrite (20.269) in such a way that the expression given below conforms to the Einstein summation rule X i , X j = f ij l X l :

ð25:31Þ

Jacobi's identity is described by

[Xi, [Xj, Xk]] + [Xj, [Xk, Xi]] + [Xk, [Xi, Xj]] = 0.  (25.32)

The term [Xi, [Xj, Xk]] can be rewritten by the use of the structure constants such that

[Xi, f_jk^l X_l] = f_jk^l [Xi, Xl] = f_jk^l f_il^m X_m.

Similarly rewriting the second and third terms of (25.32), we obtain

(f_jk^l f_il^m + f_ki^l f_jl^m + f_ij^l f_kl^m) X_m = 0.

Since the X_m (m = 1, ⋯, d; d is the dimension of the representation) are linearly independent, we must have for each m

f_jk^l f_il^m + f_ki^l f_jl^m + f_ij^l f_kl^m = 0.  (25.33)

Taking account of the fact that f_ki^l is antisymmetric with respect to the indices k and i, (25.33) is rewritten as

f_jk^l f_il^m − f_ik^l f_jl^m = f_ij^l f_lk^m.  (25.34)

If in (25.34) we define the matrices σ_j by

(σ_j)^l_k ≡ f_jk^l,  (25.35)

then the operator σ_j is a representation of the Lie algebra. Note that in (25.35) (σ_j)^l_k indicates the (l, k) element of the matrix σ_j. By the use of (25.35), (25.34) can be rewritten as

(σ_i)^m_l (σ_j)^l_k − (σ_j)^m_l (σ_i)^l_k = f_ij^l (σ_l)^m_k.  (25.36)

That is, we have

[σ_i, σ_j] = f_ij^l σ_l.  (25.37)

Equation (25.37) has virtually the same form as (25.31) and (20.269). Hence, we find that (25.35) gives a representation of the Lie algebra whose structure is represented by (25.37) or (20.269). Using the notation (25.14), we get

ad[σ_i](σ_j) = [σ_i, σ_j] = f_ij^l σ_l.  (25.38)
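The construction (25.35)-(25.37) can be illustrated with the totally antisymmetric structure constants of su(2)/so(3) (a sketch; ε denotes the Levi-Civita symbol, with indices running 0, 1, 2 here rather than 1, 2, 3):

```python
import numpy as np

# Structure constants f_ij^l = eps_ijl of su(2)/so(3)
eps = np.zeros((3, 3, 3))
for i, j, l in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, l] = 1.0
    eps[j, i, l] = -1.0

# (25.35): (sigma_j)^l_k = f_jk^l, i.e., row index l, column index k
sig = [eps[j].T for j in range(3)]    # sig[j][l, k] = eps[j, k, l]

# (25.37): [sigma_i, sigma_j] = f_ij^l sigma_l -- the adjoint representation
for i in range(3):
    for j in range(3):
        lhs = sig[i] @ sig[j] - sig[j] @ sig[i]
        rhs = sum(eps[i, j, l] * sig[l] for l in range(3))
        assert np.allclose(lhs, rhs)
```

For these structure constants the matrices sig[0], sig[1], sig[2] coincide with the rotation generators A1, A2, A3 of (20.274).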

25.1.3 Differential Representation of Lorentz Algebra [1]

As another important differential representation of the Lie algebra, we wish to introduce the differential representation of the Lorentz algebra. In Sect. 24.10 we examined the properties of the operation described by

H ↦ qHq†  [q ∈ SL(2, ℂ)],  (24.500)

with

q = ( a  b
      c  d ),  det q = ad − bc = 1.  (24.501)

This constituted the representation (φ, V4) described by

φ[q](H): V4 ⟶ V4, φ[q](H) = qHq†.  (24.502)

Now, we wish to construct another differential representation (dφ, V4). Following Theorem 25.1 and defining fX(t) as

fX(t) ≡ φ(exp tX)  [X ∈ sl(2, ℂ)],

for y ∈ V4 = {H = (hij); i, j = 1, 2, hij ∈ ℂ, H: Hermitian} we have

dφ[X](y) = f′X(0)(y) = d/dt [(exp tX) y (exp tX)†]|t=0
 = [X (exp tX) y (exp tX)† + (exp tX) y X† (exp tX)†]|t=0
 = Xy + yX†,  (25.39)

where with the third equality we used (15.32).

where with the third equality we used (15.32). Example 25.2 Following Example 25.1, we compute dφ[X]( y) with X 2 slð2, ℂÞ and y 2 V4 . The mapping is expressed as dφ½X  : V4 → V4 ; dφ½X ðyÞ = Xy þ yX {

ð25:40Þ

with y 2 V4 = H = hij ; i, j = 1, 2, hij 2 ℂ, H : Hermitian , where V4 is spanned by σ 0, σ 1, σ 2, and σ 3 of (24.497). Note that (25.40) gives an endomorphism on V4 . i 0 Putting, e.g., X = ζ 3 = 12 , we have 0 -i dφ½ζ 3 ðyÞ = ζ 3 y þ yζ {3 : Substituting σ 0, σ 1, σ 2, and σ 3 for y, respectively, we obtain

ð25:41Þ


Substituting σ0, σ1, σ2, and σ3 for y, respectively, we obtain dφ[ζ3](σ0) = 0, dφ[ζ3](σ1) = σ2, dφ[ζ3](σ2) = −σ1, and dφ[ζ3](σ3) = 0. That is, we get

(σ0 σ1 σ2 σ3) dφ[ζ3] = (σ0 σ1 σ2 σ3) ( 0  0   0  0
                                       0  0  −1  0
                                       0  1   0  0
                                       0  0   0  0 ).  (25.42)

Similarly, we obtain

(σ0 σ1 σ2 σ3) dφ[ζ1] = (σ0 σ1 σ2 σ3) ( 0  0  0   0
                                       0  0  0   0
                                       0  0  0  −1
                                       0  0  1   0 ),  (25.43)

(σ0 σ1 σ2 σ3) dφ[ζ2] = (σ0 σ1 σ2 σ3) ( 0   0  0  0
                                       0   0  0  1
                                       0   0  0  0
                                       0  −1  0  0 ).  (25.44)

On the basis of Theorem 24.15, we know that sl(2, ℂ) is spanned by six linearly independent basis vectors. That is, we have

sl(2, ℂ) = Span{ζ1, ζ2, ζ3, iζ1, iζ2, iζ3},  (25.45)

where ζ1, ζ2, ζ3 have been defined in (20.270). Then, corresponding to (25.42), (25.43), and (25.44), we must have related expressions with iζ1, iζ2, and iζ3. In fact, we obtain

(σ0 σ1 σ2 σ3) dφ[iζ3] = (σ0 σ1 σ2 σ3) ( 0  0  0  1
                                        0  0  0  0
                                        0  0  0  0
                                        1  0  0  0 ),  (25.46)

(σ0 σ1 σ2 σ3) dφ[iζ1] = (σ0 σ1 σ2 σ3) ( 0  1  0  0
                                        1  0  0  0
                                        0  0  0  0
                                        0  0  0  0 ),  (25.47)

(σ0 σ1 σ2 σ3) dφ[iζ2] = (σ0 σ1 σ2 σ3) ( 0  0  1  0
                                        0  0  0  0
                                        1  0  0  0
                                        0  0  0  0 ).  (25.48)
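The generator matrices of dφ can be computed mechanically from (25.39). The sketch below is an illustration under an assumed basis of V4: the book's basis is fixed in (24.495) and is not reproduced in this section, so the matrices for σ2 and σ3 used here are the ones consistent with the results (25.42) and (25.46) as reconstructed above.

```python
import numpy as np

# Assumed basis of V4 (consistent with (25.42) and (25.46) as given here)
s0 = np.eye(2, dtype=complex)
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, 1j], [-1j, 0]])
s3 = np.array([[-1, 0], [0, 1]], dtype=complex)
sig = [s0, s1, s2, s3]

z3 = 0.5 * np.array([[1j, 0], [0, -1j]])      # zeta_3 of (20.270)

def dphi_matrix(X):
    """(4,4) real matrix of dphi[X]: y -> X y + y X^dagger in the sigma basis, (25.39)."""
    cols = []
    for s in sig:
        img = X @ s + s @ X.conj().T
        # coefficients via (1/2) Tr(sigma_i img), since Tr(sigma_i sigma_j) = 2 delta_ij
        cols.append([0.5 * np.trace(t @ img).real for t in sig])
    return np.array(cols).T

# (25.42): dphi[zeta_3] generates a rotation in the sigma_1-sigma_2 plane
R3 = np.zeros((4, 4)); R3[2, 1], R3[1, 2] = 1, -1
assert np.allclose(dphi_matrix(z3), R3)

# (25.46): dphi[i zeta_3] generates a boost mixing sigma_0 and sigma_3
B3 = np.zeros((4, 4)); B3[0, 3], B3[3, 0] = 1, 1
assert np.allclose(dphi_matrix(1j * z3), B3)
```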

Equations (25.42), (25.43), and (25.44) are related to the spatial rotation, and (25.46), (25.47), and (25.48) to the Lorentz boost. All these six (4, 4) real matrices are identical to those given in (24.220). Equations (25.42)-(25.48) show that the differential mapping dφ transforms the basis vectors ζ1, ζ2, ζ3, iζ1, iζ2, iζ3 of sl(2, ℂ) into the basis vectors A = (A1, A2, A3), B = (B1, B2, B3) of so0(3, 1), respectively (through the medium of σ0, σ1, σ2, and σ3). Whereas sl(2, ℂ) is a six-dimensional complex vector space, so0(3, 1) is a six-dimensional real vector space. Once again, from the discussion of Sect. 11.4, the transformation dφ is bijective. Namely, sl(2, ℂ) and so0(3, 1) [i.e., the Lie algebras corresponding to the Lie groups SL(2, ℂ) and SO0(3, 1)] are isomorphic to each other. Thus, the implication of Examples 25.1 and 25.2 is evident: the differential representation (e.g., ad and dφ in the present case) allows us to get a clear view of the representation theory of the Lie algebra.

In Sect. 24.1, we had the following relation described by

Λ = exp(a·A) exp(b·B).  (24.223)

In this section, redefining a and b as

a ≡ (θ1, θ2, θ3) and b ≡ (−ω1, −ω2, −ω3),  (25.49)

we establish the same expression as (24.223). Notice that the parameters ωi (i = 1, 2, 3) have the minus sign; this comes from the conventional definition of the direction of the Lorentz boost.

Example 25.3 In Example 25.2, through the complexification we extended the Lie algebra su(2) = Span{ζ1, ζ2, ζ3} to sl(2, ℂ) such that

sl(2, ℂ) = Span{ζ1, ζ2, ζ3, iζ1, iζ2, iζ3}.  (25.45)

Meanwhile, replacing ρ, t, X, and σ with φ, ω, iζ3, and dφ, respectively, the following equation given by

ρ(exp tX) = exp[tσ(X)]  (X ∈ g, t ∈ ℝ)  (25.6)

can be read as

exp{ω·dφ[iζ3]} = φ[exp(ω·iζ3)]  [iζ3 ∈ sl(2, ℂ), ω ∈ ℝ].  (25.50)

From (20.270) we have

iζ3 = (1/2) ( −1  0
               0  1 ).

The function exp(ω·iζ3) can readily be calculated so that we have

exp(ω·iζ3) = ( cosh(ω/2) − sinh(ω/2)  0
               0                      cosh(ω/2) + sinh(ω/2) ).

Then, from (24.514) and (24.515), we obtain

φ(exp ω·iζ3) = ( cosh ω  0  0  sinh ω
                 0       1  0  0
                 0       0  1  0
                 sinh ω  0  0  cosh ω ).

With respect to the LHS of (25.50), in turn, from (25.46) we get

ω·dφ[iζ3] = ( 0  0  0  ω
              0  0  0  0
              0  0  0  0
              ω  0  0  0 ).
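The equality (25.50) lends itself to a direct numerical check. As before, the basis matrices for σ2 and σ3 below are an assumption consistent with the results of Example 25.2 as reconstructed here (the book's basis is fixed in (24.495)):

```python
import numpy as np

s0 = np.eye(2, dtype=complex)
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, 1j], [-1j, 0]])
s3 = np.array([[-1, 0], [0, 1]], dtype=complex)
sig = [s0, s1, s2, s3]

def mexp(M, terms=40):
    """Truncated exponential series; sufficient for the small matrices used here."""
    R = T = np.eye(M.shape[0], dtype=complex)
    for n in range(1, terms):
        T = T @ M / n
        R = R + T
    return R

w = 0.8
izeta3 = 0.5 * np.array([[-1, 0], [0, 1]], dtype=complex)   # i * zeta_3
q = mexp(w * izeta3)                                        # exp(w i zeta_3)

# RHS of (25.50): 4x4 matrix of H -> q H q^dagger in the sigma basis
phi_q = np.array([[0.5 * np.trace(si @ (q @ sj @ q.conj().T)).real
                   for sj in sig] for si in sig])

# LHS of (25.50): exponential of w * dphi[i zeta_3], cf. (25.46)
B3 = np.zeros((4, 4)); B3[0, 3] = B3[3, 0] = 1
lhs = mexp(w * B3).real

boost = np.array([[np.cosh(w), 0, 0, np.sinh(w)],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [np.sinh(w), 0, 0, np.cosh(w)]])
assert np.allclose(lhs, phi_q)     # (25.50)
assert np.allclose(phi_q, boost)   # both sides equal the Lorentz boost
```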

Then, calculating exp{ω·dφ[iζ3]}, we immediately find that (25.50) holds. Once again, Examples 25.1 and 25.3 provide tangible illustrations of the abstract relation (25.6) between the representations of the Lie group and Lie algebra.

With regard to Examples 25.1 and 25.2, we have the following important theorem in relation to Theorem 24.13.

Theorem 25.2 [1] Let (ρ, V) be a representation of a linear Lie group G on a vector space V and let (σ, V) be a representation of the Lie algebra g of G so that σ is the differential representation of ρ; i.e., σ = dρ. Also, let N be the kernel of ρ. That is, we have

N = {g ∈ G; ρ(g) = e},

where e is an identity element. Then, we have

(i)

G/N ≅ ρ(G).

(ii) Let n be the Lie algebra of N. Then, n is the kernel of the homomorphic mapping σ = dρ of g. That is,

n = {X ∈ g; σ(X) = 0}.  (25.51)

Proof (i) We have already proven (i) in the proof of Theorem 16.4 (Homomorphism Theorem). With (ii), the proof is as follows. Regarding ∀X ∈ g and any real number t, we have

ρ(exp tX) = exp[tσ(X)]  (X ∈ g, t ∈ ℝ).  (25.6)

If σ(X) = 0, the RHS of (25.6) is e for any t ∈ ℝ. Therefore, according to Definition 16.3 we have exp tX ∈ N for any t ∈ ℝ. This means that X is an element of the Lie algebra of N; that is, X ∈ n. Conversely, if X ∈ n, we have exp tX ∈ N for any t ∈ ℝ and, hence, the LHS of (25.6) is always e according to Definition 16.3. Consequently, differentiating both sides of (25.6) with respect to t, we have

0 = σ(X) exp[tσ(X)].

Taking the limit t → 0, we have 0 = σ(X). That is, n comprises the elements X that satisfy (25.51). This completes the proof.

Now, we come back to the proof of Theorem 24.13. We wish to consider (dφ, V4) in correspondence with (φ, V4). Let us start a discussion about the kernel N of φ. According to Theorem 24.13, it was given as

N = {σ0, −σ0}.  (24.532)

Since the differential representation dφ is related to the neighborhood of the identity element σ0, the kernel n with respect to dφ is only X = 0 in light of (25.40). In fact, from Example 25.2, we have

dφ[aζ1 + bζ2 + cζ3 + d·iζ1 + e·iζ2 + f·iζ3] = ( 0   d   e   f
                                                d   0  −c   b
                                                e   c   0  −a
                                                f  −b   a   0 ).

From (25.45), aζ1 + ⋯ + f·iζ3 (a, ⋯, f: real) represents all the elements of sl(2, ℂ). For the RHS to be the zero element 0′ of so0(3, 1), we need a = ⋯ = f = 0. This implies that

dφ⁻¹(0′) = {0},

Fig. 25.2 Isomorphism dφ between sl(2, ℂ) and so0(3, 1)

where 0 denotes the zero element of sl(2, ℂ) and dφ⁻¹ is the inverse transformation of dφ. The zero element 0′ is represented by a (4, 4) matrix, whose explicit form is given by

0′ = ( 0  0  0  0
       0  0  0  0
       0  0  0  0
       0  0  0  0 ).

The presence of dφ⁻¹ is equivalent to the statement that

dφ: sl(2, ℂ) ⟶ so0(3, 1)  (25.52)

is a bijective mapping (see Sect. 11.2). From Theorem 25.2, with respect to the kernel n of dφ we have

n = {X ∈ sl(2, ℂ); dφ(X) = 0′}.

Since dφ is bijective, from Theorem 11.4 we get

n = {0}, 0 ≡ ( 0  0
               0  0 ).

Consequently, the mapping dφ can be characterized as an isomorphism expressed as

sl(2, ℂ) ≅ so0(3, 1).  (25.53)

This relationship is in parallel with that between su(2) and so(3); see Example 25.1. In Fig. 25.2 we depict the isomorphism dφ between sl(2, ℂ) and so0(3, 1). Meanwhile, Theorem 20.11 guarantees that ∀g ∈ SO0(3, 1) can be described by

g = (exp t1X1)(exp t2X2)⋯(exp t6X6),  (25.54)

Fig. 25.3 Homomorphic mapping φ from SL(2, ℂ) to SO0(3, 1) with the kernel N = {σ0, −σ0}

where X1, X2, ⋯, X6 can be chosen from among the (4, 4) matrices of, e.g., (25.42), (25.43), (25.44), (25.46), (25.47), and (25.48). With ∀q ∈ SL(2, ℂ), q can be expressed as

q = (exp t1ζ1)(exp t2ζ2)(exp t3ζ3)(exp t4·iζ1)(exp t5·iζ2)(exp t6·iζ3).  (25.55)

Then, (25.6) is read in the present case as

φ[(exp t1ζ1)⋯(exp t6·iζ3)] = exp(t1·dφ[ζ1])⋯exp(t6·dφ[iζ3])
 [ζ1, ⋯, iζ3 ∈ sl(2, ℂ), ti (i = 1, ⋯, 6) ∈ ℝ].

Since from the above discussion there is the bijective mapping (or isomorphism) between sl(2, ℂ) and so0(3, 1), we may identify X1, ⋯, X6 with dφ[ζ1], ⋯, dφ[iζ3], respectively; see (25.42)-(25.48). Thus, we get

φ[q] = exp(t1·dφ[ζ1])⋯exp(t6·dφ[iζ3]) = (exp t1X1)(exp t2X2)⋯(exp t6X6) = g.  (25.56)

Equation (25.56) implies that ∀g ∈ SO0(3, 1) can be described using ∃q ∈ SL(2, ℂ) given by (25.55). This means that

SO0(3, 1) ⊂ φ[SL(2, ℂ)].  (24.521)

Combining (24.520) and (24.521), we get the relation described by

φ[SL(2, ℂ)] = SO0(3, 1).  (24.522)

Figure 25.3 shows the homomorphic mapping φ from SL(2, ℂ) to SO0(3, 1) with the kernel N = {σ0, −σ0}. In correspondence with (25.29) of Example 25.1, we have

(dφ, V4), (φ, V4).  (25.57)

Table 25.1 Representation species of Lie algebras

Representation   Representation space   Mapping                 Category
Ad               su(2) (real)           SU(2) → SO(3)           Homomorphism
ad               su(2) (real)           su(2) → so(3)           Isomorphism
φ                V4 (real)              SL(2, ℂ) → SO0(3, 1)    Homomorphism
dφ               V4 (real)              sl(2, ℂ) → so0(3, 1)    Isomorphism

Note that in Example 25.1 the representation space V4 has been defined as a four-dimensional real vector space given in (24.493). Comparing Figs. 25.1 and 25.2 with Figs. 25.3 and 20.6, we conclude the following: (i) The differential representations ad and dφ exhibit the isomorphic mapping between the Lie algebras, whereas the representations Ad and φ indicate the homomorphic mapping between the Lie groups. (ii) This enables us to deal with the relationship between SL(2, ℂ) and SO0(3, 1) and that between SU(2) and SO(3) in parallel. Also, we are allowed to deal with the relation between sl(2, ℂ) and so0(3, 1) and that between su(2) and so(3) in parallel. Table 25.1 summarizes the properties of several (differential) representations associated with the Lie algebras and the corresponding Lie groups.

The Lie algebra is a special kind of vector space. In terms of the vector space, it is worth taking a second look at the basic concepts of Chaps. 11 and 20. In Sect. 25.1, we have mentioned the properties of the differential representation of the Lie algebra g on a certain vector space V. That is, we have had

σ : g → V.   (25.4)

Equation (25.4) may or may not be an endomorphism. Meanwhile, the adjoint representation Ad of the Lie group SU(2) was not an endomorphism; see (20.303), denoted by

Ad : SU(2) → SO(3).   (20.303)

These features not only enrich the properties of Lie groups and Lie algebras but also make various representations, including the differential representation, easy to use and broaden the range of applications.

As already seen in Sect. 20.4, so(3) and su(2) have the same structure constants. Getting back to Theorem 11.3 (Dimension Theorem), we have had

dim V^n = dim A(V^n) + dim Ker A.   (11.45)

Suppose that there is a mapping M from su(2) to so(3) such that


M : su(2) → so(3).

Equations (20.273) and (20.276) clearly show that both su(2) and so(3) are spanned by three basis vectors; namely, the dimension of both su(2) and so(3) is three. Hence, (11.45) should be translated into

dim[su(2)] = dim M[su(2)] + dim Ker M.   (25.58)

Also, we have

dim[su(2)] = dim M[su(2)] = dim[so(3)] = 3.   (25.59)

This naturally leads to dim Ker M = 0 in (25.58). From Theorem 11.4, in turn, we conclude that M is a bijective mapping, i.e., an isomorphism. In this situation, we expect to have the homomorphism between the corresponding Lie groups. This was indeed the case with SU(2) and SO(3) (Theorem 20.7). By the same token, in this section we have shown that there is one-to-one correspondence between sl(2, ℂ) and so0(3, 1). Corresponding to (25.58), we have

dim[sl(2, ℂ)] = dim M̃[sl(2, ℂ)] + dim Ker M̃,   (25.60)

where

M̃ : sl(2, ℂ) → so0(3, 1).   (25.61)

In the above, we have

dim[sl(2, ℂ)] = dim M̃[sl(2, ℂ)] = dim[so0(3, 1)] = 6.   (25.62)

Again, we have dim Ker M̃ = 0. Notice that dφ of (25.52) is a special but important case of M̃. Similarly to the case of su(2), there has been a homomorphic mapping from SL(2, ℂ) to SO0(3, 1) (Theorem 24.13).

25.2 Cartan-Weyl Basis of Lie Algebra [1]

In this section, we wish to take the concept of complexification a step further. In this context, we deal with the Cartan-Weyl basis that is frequently used in quantum mechanics and the quantum theory of fields. In our case, the Cartan-Weyl basis has already appeared in Sect. 3.5, even though we did not mention that terminology.

25.2.1 Complexification

We focus on the application of the Cartan-Weyl basis to so(3) [or o(3)] to widen the vector space of so(3) by its "complexification" (see Sect. 24.10). The Lie algebra so(3) is a vector space spanned by three basis vectors whose commutation relation is described by

[Xi, Xj] = f_ij^k Xk   (i, j, k = 1, 2, 3),   (25.63)

where Xi (i = 1, 2, 3) are the basis vectors, which are not commutative with one another. We know that so(3) is isomorphic to su(2). In the case of su(2), the complexification of su(2) enables us to deal with many problems within the framework of sl(2, ℂ). Here, we construct the Cartan-Weyl basis from the elements of so(3) [or o(3)] by extending so(3) to o(3, ℂ) through the complexification. Note that o(3, ℂ) comprises complex skew-symmetric matrices. In Sects. 20.1 and 20.2 we converted the Hermitian operators to the anti-Hermitian operators belonging to so(3) such that, e.g.,

Az ≡ -iMz.   (20.15)

These give a clue to how to address the problem. In an opposite way, we get

iAz = Mz.   (25.64)

In light of Theorem 24.15, the Hermitian operator Mz should now be regarded as due to the complexification of so(3) in (25.64). We redefine A3 ≡ Az and M3 ≡ Mz; see (20.274). In a similar way, we get

M1 ≡ iA1 and M2 ≡ iA2.

The operators M1, M2, and M3 construct o(3, ℂ) together with A1, A2, and A3. That is, we have

o(3, ℂ) = Span{A1, A2, A3, M1, M2, M3}.   (25.65)

A general form of o(3, ℂ) is given using real numbers a, b, c, d, f, g as

⎛     0         a + bi     c + di ⎞     ⎛  0   a   c ⎞       ⎛  0   b   d ⎞
⎜ -(a + bi)       0        f + gi ⎟  =  ⎜ -a   0   f ⎟  +  i ⎜ -b   0   g ⎟ .   (25.66)
⎝ -(c + di)   -(f + gi)      0    ⎠     ⎝ -c  -f   0 ⎠       ⎝ -d  -g   0 ⎠

Let U be an arbitrary element of o(3, ℂ). Then, from (25.66) U is uniquely described as

U = V + iW   [V, W ∈ so(3)].

This implies that o(3, ℂ) is a complexification of so(3). Or, we may say that so(3) is a real form of o(3, ℂ). According to the custom, we define F+ [∈ o(3, ℂ)] and F- [∈ o(3, ℂ)] as

F+ ≡ A1 + iA2 = A1 + M2 and F- ≡ A1 - iA2 = A1 - M2,

and choose F+ and F- for the basis vectors of o(3, ℂ) instead of M1 and M2. We further define the operators J(+) and J(-) in accordance with (3.72) such that

J(+) ≡ iF+ = iA1 - A2 and J(-) ≡ iF- = iA1 + A2.

Thus, we have gotten back to the discussion made in Sects. 3.4 and 3.5 to directly apply it to the present situation. Although J(+), J(-), and M3 are formed by linear combinations of the elements of so(3), these operators are not contained in so(3) but in o(3, ℂ). The vectors J(+), J(-), and M3 form a subspace of o(3, ℂ). We term this subspace W, for which these three vectors constitute a basis set. This basis set is referred to as the Cartan-Weyl basis, and we have

W ≡ Span{J(-), M3, J(+)}.   (25.67)

Even though M3 is Hermitian, J(+) is neither Hermitian nor anti-Hermitian. Nor is J(-), either. Nonetheless, we are well used to dealing with these operators in Sect. 3.5. We have commutation relations described as


[M3, J(+)] = J(+),   [M3, J(-)] = -J(-),
[J(+), J(-)] = 2M3.   (25.68)

Note that the commutation relations of (25.68) have already appeared in (3.74) in the discussion of the generalized angular momentum. Using J̃(±) ≡ J(±)/√2, we obtain the "normalized" relations expressed as

[M3, J̃(±)] = ±J̃(±) and [J̃(+), J̃(-)] = M3.   (25.69)

From (25.68) and (25.69), we find that W forms a subalgebra of o(3, ℂ). Also, at the same time, from (25.14) the differential representation ad with respect to W diagonalizes ad[M3]. That is, with the three-dimensional representation, using (25.68) or (25.69) we have

                                         ⎛ -1  0  0 ⎞
(J(-) M3 J(+)) ad[M3] = (J(-) M3 J(+))   ⎜  0  0  0 ⎟ .   (25.70)
                                         ⎝  0  0  1 ⎠

Hence, we obtain

           ⎛ -1  0  0 ⎞
ad[M3]  =  ⎜  0  0  0 ⎟ .   (25.71)
           ⎝  0  0  1 ⎠
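These relations are easy to verify numerically. Below is a minimal check, assuming NumPy and the standard 3 × 3 real skew-symmetric matrix realization of so(3) with [A1, A2] = A3 and cyclic permutations (this concrete realization is an assumption for illustration, consistent with Sect. 20.4):

```python
import numpy as np

# Standard 3x3 real skew-symmetric realization of so(3):
# [A1, A2] = A3 and cyclic permutations.
A1 = np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=complex)
A2 = np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], dtype=complex)
A3 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]], dtype=complex)

def comm(X, Y):
    """Commutator [X, Y] = XY - YX."""
    return X @ Y - Y @ X

# Complexified elements of o(3, C), cf. (25.64) and the definitions of J(+), J(-):
M3 = 1j * A3
Jp = 1j * A1 - A2     # J(+) = iA1 - A2
Jm = 1j * A1 + A2     # J(-) = iA1 + A2

# Cartan-Weyl commutation relations (25.68):
assert np.allclose(comm(M3, Jp), Jp)
assert np.allclose(comm(M3, Jm), -Jm)
assert np.allclose(comm(Jp, Jm), 2 * M3)
```

In the ordered basis (J(-), M3, J(+)), the three assertions say precisely that ad[M3] acts as diag(-1, 0, 1), i.e., (25.71).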

In an alternative way, let us consider sl(2, ℂ) instead of o(3, ℂ). In a similar manner to the above case, we can choose a subspace W̃ of sl(2, ℂ) defined by

W̃ ≡ Span{iζ1 + ζ2, iζ3, iζ1 - ζ2}.

At the same time, W̃ is a subalgebra of sl(2, ℂ). The operators iζ1 + ζ2, iζ3, and iζ1 - ζ2 correspond to J(-), M3, and J(+) of the above case, respectively. Then, W and W̃ are isomorphic, and commutation relations exactly the same as (25.68) hold with the basis vectors iζ1 + ζ2, iζ3, and iζ1 - ζ2. Corresponding to (25.71), we have

            ⎛ -1  0  0 ⎞
ad[iζ3]  =  ⎜  0  0  0 ⎟ .   (25.72)
            ⎝  0  0  1 ⎠

We find that, choosing the basis vectors iζ1 + ζ2, iζ3, and iζ1 - ζ2 contained in sl(2, ℂ), ad[iζ3] can be diagonalized as in (25.71). Thus, in terms of the Lie algebra, the subalgebras W and W̃ are equivalent.


From (25.68) and on the basis of the discussions in Sects. 3.4, 3.5, 20.2, and 20.4, we can readily construct an irreducible representation ad of W or W̃ on the (2j + 1)-dimensional representation space V, where j is given as a positive integer or a positive half-odd-integer. The number j is said to be a highest weight, as defined below.

Definition 25.3 Eigenvalues of the differential representation ad[M3] or ad[iζ3] are called weights of the representation ad. Among the weights, the weight that has the largest real part is the highest weight.

In (25.71) and (25.72), for instance, the highest weight of M3 or iζ3 is 1. From the discussion of Sects. 3.4 and 3.5, we immediately obtain the matrix representation of the Cartan-Weyl basis of J(-), M3, and J(+) [or iζ1 + ζ2, iζ3, and iζ1 - ζ2]. Replacing l with j in (3.152), (3.153), and (3.158), we get

       ⎛ -j                        ⎞
       ⎜    -j+1                   ⎟
       ⎜         -j+2              ⎟
M3  =  ⎜              ⋱            ⎟ ,   (25.73)
       ⎜                k-j        ⎟
       ⎜                     ⋱     ⎟
       ⎝                        j  ⎠

where k = 0, 1, ⋯, 2j and the highest weight of M3 is j. Also, we have

         ⎛ 0  √(2j·1)                                     ⎞
         ⎜     0   √((2j-1)·2)                            ⎟
         ⎜          ⋱     ⋱                               ⎟
J(-)  =  ⎜             0   √((2j-k+1)·k)                  ⎟ ,   (25.74)
         ⎜                  ⋱       ⋱                     ⎟
         ⎜                       0    √(1·2j)             ⎟
         ⎝                                0               ⎠

         ⎛    0                                           ⎞
         ⎜ √(2j·1)    0                                   ⎟
         ⎜       √((2j-1)·2)   0                          ⎟
J(+)  =  ⎜             ⋱     ⋱                            ⎟ ,   (25.75)
         ⎜               √((2j-k+1)·k)   0                ⎟
         ⎜                        ⋱    ⋱                  ⎟
         ⎝                            √(1·2j)   0         ⎠

where k = 1, 2, ⋯, 2j. In (25.73)-(25.75), the basis vectors iζ1 + ζ2, iζ3, and iζ1 - ζ2 may equally be used instead of J(-), M3, and J(+), respectively.

The above matrix representations of the Cartan-Weyl basis of J(-), M3, and J(+) give important information by directly appealing to the intuition. The matrix M3 is admittedly a diagonalized matrix. The matrices J(+) and J(-) are lower and upper triangular matrices, respectively. This means that all their eigenvalues are zero (see Sect. 12.1). In turn, this immediately leads to the fact that J(+) and J(-) are (2j+1)-th nilpotent (2j+1, 2j+1) square matrices and implies the presence of (2j+1) linearly independent vectors in the representation space V. Also, we know that neither J(+) nor J(-) can be diagonalized; see Sect. 12.3. Let ad be a representation pertinent to (25.73)-(25.75). Then, the representation (ad, V) is irreducible. Summarizing the above results, we have the following theorem.

Theorem 25.3 [1] Let (ad, V) be an irreducible representation of W. Suppose that the highest weight of (ad, V) is j (>0). Then, the following relationships hold:

(i) dim V = 2j + 1.
(ii) The weights are j, j - 1, ⋯, -j + 1, -j with the highest weight j.
(iii) There exist (2j + 1) basis vectors |ζ, μ⟩ [μ = j, j - 1, ⋯, -j + 1, -j; ζ = j(j + 1)] that satisfy the following equations:

M3 |ζ, μ⟩ = μ |ζ, μ⟩   (μ = j, j - 1, ⋯, -j + 1, -j),
J(+) |ζ, μ⟩ = √((j - μ)(j + μ + 1)) |ζ, μ + 1⟩   (μ = j - 1, j - 2, ⋯, -j),   (25.76)
J(-) |ζ, μ⟩ = √((j - μ + 1)(j + μ)) |ζ, μ - 1⟩   (μ = j, j - 1, ⋯, -j + 1),

where j is chosen from among positive integers or positive half-odd-integers.
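Theorem 25.3 and the matrices (25.73)-(25.75) lend themselves to a quick numerical check. A minimal sketch follows, assuming NumPy; `cartan_weyl_matrices` is a hypothetical helper name introduced here for illustration:

```python
import numpy as np

def cartan_weyl_matrices(j):
    """Matrices (25.73)-(25.75) for highest weight j (a positive integer
    or half-odd-integer). Hypothetical helper for illustration."""
    dim = int(round(2 * j)) + 1
    M3 = np.diag([-j + k for k in range(dim)])           # weights -j, ..., j
    Jm = np.zeros((dim, dim))
    for k in range(1, dim):
        Jm[k - 1, k] = np.sqrt((2 * j - k + 1) * k)      # upper triangular, (25.74)
    Jp = Jm.T                                            # lower triangular, (25.75)
    return M3, Jp, Jm

def comm(X, Y):
    return X @ Y - Y @ X

for j in (1 / 2, 1, 3 / 2, 2):
    M3, Jp, Jm = cartan_weyl_matrices(j)
    n = M3.shape[0]
    assert n == int(2 * j) + 1                           # dim V = 2j + 1
    assert np.allclose(comm(M3, Jp), Jp)                 # (25.68)
    assert np.allclose(comm(M3, Jm), -Jm)
    assert np.allclose(comm(Jp, Jm), 2 * M3)
    assert np.allclose(np.linalg.matrix_power(Jp, n), 0) # (2j+1)-th nilpotent
```

The last assertion confirms the nilpotency noted above: a triangular matrix with zero diagonal in dimension 2j + 1 vanishes at the (2j + 1)-th power.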


The three operators of (25.73)-(25.75) satisfy the commutation relations described by (25.68). Readers are encouraged to confirm it. The dimension of the representation space V relevant to these irreducible representations is 2j + 1. Correspondingly, V is spanned by the (2j + 1) basis vectors |ζ, μ⟩ [μ = j, j - 1, ⋯, -j + 1, -j; ζ = j(j + 1)]. Note, however, that the dimension of W as a Lie algebra is three; see (25.67). The relations (25.73)-(25.75) equally hold with iζ1 + ζ2, iζ3, and iζ1 - ζ2, which constitute the basis vectors of W̃.

As explained above, the notion of complexification of the Lie algebra finds wide applications in the field of continuous groups, not only from the point of view of abstract algebra but also with the physical interpretations of, e.g., the angular momenta that were dealt with in Chap. 3. In particular, the Cartan-Weyl basis has been constructed through the complexification of so(3) or su(2). The Lie algebras W and W̃ are subalgebras of o(3, ℂ) and sl(2, ℂ), respectively.

In relation to the complexification, we add the following condition to the definition of the representation of the Lie algebra so that we may extend the concept of the representation to the complex representation. The condition is described by [1]

ρ(iX) = iρ(X),   (25.77)

where ρ is a representation of the Lie algebra. If ρ satisfies (25.77), ρ is said to be a complex analytic representation of the Lie algebra [1]. Let us formally give the following definition.

Definition 25.4 Let g be a complex Lie algebra and let V be a vector space. Suppose that there is a complex linear mapping ρ from g to V. If ρ is a representation of g on V and satisfies the condition (25.77), then such a ρ is said to be a complex analytic representation.

By virtue of the condition (25.77), we may extend (i) of Definition 25.1 to

(i)′ σ(aX) = aσ(X)   (a ∈ ℂ).   (25.78)

Except for (i) of Definition 25.1, the conditions (ii), (iii), and (iv) hold without alterations. In the context of the complex representation, we have the following general theorem.

Except for (i) of Definition 25.1, the conditions (ii), (iii), and (iv) hold without alterations. In the context of the complex representation, we have a following general theorem. Theorem 25.4 [1] Let g be a complex Lie algebra and h be a real form of g (see Sect. 24.10). Let ρ be any representation of h on a vector space V. Moreover, let ρc be a representation of g on V. Suppose with ρc we have a following relationship: ρc ðZ Þ = ρðX Þ þ iρðY Þ ðZ 2 g; X, Y 2 hÞ,

ð25:79Þ

where Z = X + iY. Then, ðρc , VÞ is a complex analytic representation of g on V.


Proof For the proof, use (25.1), (25.2), and (25.5). The representation ρc is a real linear mapping from g to gl(V). The representation ρc is a complex linear mapping as well, because we have

ρc[i(X + iY)] = ρc(iX - Y) = -ρ(Y) + iρ(X) = iρc(X + iY).   (25.80)

Meanwhile, we have

ρc([X + iY, U + iW]) = [ρ(X), ρ(U)] - [ρ(Y), ρ(W)] + i[ρ(X), ρ(W)] + i[ρ(Y), ρ(U)]
                     = [ρc(X + iY), ρc(U + iW)].

Hence, (ρc, V) is a complex analytic representation of g. This completes the proof.

In the next section, we make the most of the results obtained in this section.

25.2.2 Coupling of Angular Momenta: Revisited

In Sect. 20.3, we calculated the Clebsch-Gordan coefficients on the basis of the representation theory of the continuous group SU(2). Here, we wish to seek the relevant results using the above-mentioned theory of Lie algebras. We deal with the problems that we considered in Examples 20.1 and 20.2. Since in this section we consider the coupling of the angular momenta, we modify the expression as in Sect. 20.3 such that

J = |j1 - j2|, |j1 - j2| + 1, ⋯, j1 + j2,   (20.164)

where J indicates the momentum of the coupled system, with j1 and j2 being the momenta of the individual systems. At the same time, J represents the highest weight. Thus, j used in the previous section should be replaced with J. We show two examples below and compare the results with those of Sect. 20.3.3.

In the examples given below, we adopt the following calculation rule. Let O be an operator chosen from among M3, J(+), and J(-) of (25.76). Suppose that the starting equation we wish to deal with is given by Ξ = Φ ⊗ Ψ, where Φ ⊗ Ψ is a direct product of quantum states Φ and Ψ (or physical states, more generally). Then, we have


OΞ = O(Φ ⊗ Ψ) = (OΦ) ⊗ Ψ + Φ ⊗ (OΨ),

where O is an operator that is supposed to operate on Ξ. If we choose M3 for O and both Φ and Ψ are eigenstates of M3, we obtain only a trivial result with nothing new. It can be checked easily. If, however, we choose J(+) or J(-), we can get meaningful results. To show this, we give two examples below.

Example 25.4 We deal with the problem that we considered in Example 20.1. That is, we are examining the case of D(1/2) ⊗ D(1/2). According to (20.164), as the highest weight J we have J = 1, 0.

(i) J = 1

If among the highest weights we choose the largest one, the associated coupled state is always given by the direct product of the individual component states (see Examples 20.1 and 20.2). Thus, we have

|1, 3⟩ = |1/2, 2⟩_A |1/2, 2⟩_B.   (25.81)

In (25.81), we adopted the expression |μ, 2J + 1⟩ with 2J + 1 boldfaced according to the custom, instead of |ζ, μ⟩ used in the previous section. The subscripts A and B on the RHS of (25.81) designate the individual component systems. The strategy to solve the problem is to operate J(-) of (25.76) on both sides of (25.81) and to diminish μ stepwise with J (or ζ) kept unchanged. That is, we have

J(-) |1, 3⟩ = J(-) [|1/2, 2⟩_A |1/2, 2⟩_B].   (25.82)

Then, we obtain

√2 |0, 3⟩ = [J(-) |1/2, 2⟩_A] |1/2, 2⟩_B + |1/2, 2⟩_A [J(-) |1/2, 2⟩_B].   (25.83)

Hence, we have

|0, 3⟩ = (1/√2) [|-1/2, 2⟩_A |1/2, 2⟩_B + |1/2, 2⟩_A |-1/2, 2⟩_B].   (25.84)

Further operating J(-) on both sides, we get

|-1, 3⟩ = |-1/2, 2⟩_A |-1/2, 2⟩_B.   (25.85)


Thus, we have obtained the symmetric representations (25.81), (25.84), and (25.85). Note that

J(-) |-1/2, 2⟩_A = J(-) |-1/2, 2⟩_B = 0.   (25.86)

Looking at (25.85), we notice that we can equally start with (25.85) using J(+) instead of starting with (25.81) using J(-).

(ii) J = 0

In this case, we determine a suitable solution |0, 1⟩ so that it may become orthogonal to |0, 3⟩ given by (25.84). We can readily find the solution such that

|0, 1⟩ = (1/√2) [|-1/2, 2⟩_A |1/2, 2⟩_B - |1/2, 2⟩_A |-1/2, 2⟩_B].   (25.87)

Compare the above results with those obtained in Example 20.1.

Example 25.5 Next, we consider the case of the direct product D(1) ⊗ D(1) in parallel with Example 20.2. In this case the highest weight J is

J = 2, 1, 0.   (25.88)

The calculation processes are a little longer than in Example 25.4, but the calculation principle is virtually the same.

(i) J = 2

We start with

|2, 5⟩ = |1, 3⟩_A |1, 3⟩_B.   (25.89)

Proceeding in a similar manner as before, we get

|1, 5⟩ = (1/√2) (|1, 3⟩_A |0, 3⟩_B + |0, 3⟩_A |1, 3⟩_B).   (25.90)

Further operating J(-) on both sides of (25.90), we successively obtain

|0, 5⟩ = (1/√6) (|-1, 3⟩_A |1, 3⟩_B + 2|0, 3⟩_A |0, 3⟩_B + |1, 3⟩_A |-1, 3⟩_B),   (25.91)

|-1, 5⟩ = (1/√2) (|-1, 3⟩_A |0, 3⟩_B + |0, 3⟩_A |-1, 3⟩_B),   (25.92)

|-2, 5⟩ = |-1, 3⟩_A |-1, 3⟩_B.   (25.93)

(ii) J = 1


In this case, we determine a suitable solution |1, 3⟩ so that it may become orthogonal to |1, 5⟩ given by (25.90). We can readily find the solution described by

|1, 3⟩ = (1/√2) (|1, 3⟩_A |0, 3⟩_B - |0, 3⟩_A |1, 3⟩_B).   (25.94)

Then, operating J(-) on both sides of (25.94) successively, we get

|0, 3⟩ = (1/√2) (|1, 3⟩_A |-1, 3⟩_B - |-1, 3⟩_A |1, 3⟩_B),   (25.95)

|-1, 3⟩ = (1/√2) (|0, 3⟩_A |-1, 3⟩_B - |-1, 3⟩_A |0, 3⟩_B).   (25.96)

(iii) J = 0

We determine a suitable solution |0, 1⟩ so that it is orthogonal to |0, 5⟩ given by (25.91), to get the desired solution described by

|0, 1⟩ = (1/√3) (|-1, 3⟩_A |1, 3⟩_B - |0, 3⟩_A |0, 3⟩_B + |1, 3⟩_A |-1, 3⟩_B).   (25.97)

Notice that if we had chosen for |0, 1⟩ a vector orthogonal to |0, 3⟩ of (25.95), we would not have successfully constructed a unitary matrix expressed in (20.232).

The above is the whole procedure that finds a set of proper solutions. Of these, the solutions obtained with J = 2 and 0 possess the symmetric representations, whereas those obtained for J = 1 have the antisymmetric representations. Symbolically writing the above results of the direct product in terms of the direct sum, we have

3 ⊗ 3 = 5 ⊕ 3 ⊕ 1.   (25.98)

If in the above processes we start with |-2, 5⟩ = |-1, 3⟩_A |-1, 3⟩_B and operate J(+) on both sides, we will reach the same conclusion as the above. Once again, compare the above results with those obtained in Example 20.2.

As shown in the above examples, we recognize that we have been able to find a whole set of solutions almost automatically, compared to the previous calculations carried out in Examples 20.1 and 20.2. What is more, as we have seen in this chapter, various problems which the representation theory deals with can be addressed clearly and succinctly by replacing the representation (ρ, V) of the group G with the representation (σ, V) of its corresponding Lie algebra g. Thus, the representation theory of the Lie algebras, together with that of the Lie groups, serves as a powerful tool to solve various kinds of problems in mathematical physics.
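The ladder procedure of Examples 25.4 and 25.5 is easy to reproduce numerically: represent J(-) on the product space as J(-) ⊗ I + I ⊗ I's counterpart, then lower the top state. A minimal sketch for D(1) ⊗ D(1) follows, assuming NumPy; `ladder_matrices` is a hypothetical helper name for illustration:

```python
import numpy as np

def ladder_matrices(j):
    """M3 of (25.73) and J(-) of (25.74); hypothetical helper for illustration."""
    dim = int(round(2 * j)) + 1
    M3 = np.diag([-j + k for k in range(dim)])
    Jm = np.zeros((dim, dim))
    for k in range(1, dim):
        Jm[k - 1, k] = np.sqrt((2 * j - k + 1) * k)
    return M3, Jm

M3, Jm = ladder_matrices(1)
I = np.eye(3)
# Rule O(Phi x Psi) = (O Phi) x Psi + Phi x (O Psi) on the product space:
Jm_tot = np.kron(Jm, I) + np.kron(I, Jm)
M3_tot = np.kron(M3, I) + np.kron(I, M3)

e = [np.eye(3)[:, k] for k in range(3)]   # e[0] = |-1>, e[1] = |0>, e[2] = |1>

# Start from |2, 5> = |1, 3>_A |1, 3>_B and lower, cf. (25.89)-(25.90):
v = Jm_tot @ np.kron(e[2], e[2])
v /= np.linalg.norm(v)                    # this is |1, 5>
assert np.allclose(v, (np.kron(e[2], e[1]) + np.kron(e[1], e[2])) / np.sqrt(2))

# One more application of J(-) reproduces |0, 5> of (25.91):
w = Jm_tot @ v
w /= np.linalg.norm(w)
assert np.allclose(w, (np.kron(e[0], e[2]) + 2 * np.kron(e[1], e[1])
                       + np.kron(e[2], e[0])) / np.sqrt(6))
assert np.allclose(M3_tot @ w, 0)         # w indeed has weight mu = 0
```

Normalizing after each lowering step sidesteps the bookkeeping of the prefactors in (25.76), which is why the procedure yields the Clebsch-Gordan coefficients "automatically".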

25.3 Decomposition of Lie Algebra

In Sect. 24.10 we have mentioned the decomposition of the complex matrices (Theorem 24.14) and that of sl(2, ℂ) (Theorem 24.15) in connection with the discussion that has clarified the relationship between SL(2, ℂ) and SO0(3, 1). Since the whole collection of the (2, 2) complex matrices forms gl(2, ℂ), Theorem 24.14 states the decomposition of gl(2, ℂ). Since the Lie algebras form vector spaces, the relevant discussion will come down to the decomposition of the vector spaces. Let us formally define gl(2, ℂ) such that

gl(2, ℂ) ≡ {C = (cij); i, j = 1, 2, cij ∈ ℂ}.   (25.99)

This has already appeared in Sect. 24.10. In fact, gl(2, ℂ) is identical to C of (24.530). In Sect. 11.1 we dealt with the decomposition of a vector space into a direct sum of two (or more) subspaces. Meanwhile, in Sect. 14.6 we have shown that any matrix A can be decomposed such that

A = B + iC,   (14.125)

where

B ≡ (1/2)(A + A†) and C ≡ (1/2i)(A - A†)   (25.100)

with both B and C being Hermitian. In other words, any matrix A can be decomposed into the sum of an Hermitian matrix and an anti-Hermitian matrix. Instead of decomposing A as (14.125), we may equally decompose A such that

A = B + iC,   (14.125)

where

B ≡ (1/2)(A - A†) and C ≡ (1/2i)(A + A†)   (25.101)

with both B and C being anti-Hermitian.

If a Lie algebra is formed by the complexification of a subalgebra (see Definition 24.2), such examples are of high practical interest. As typical examples, we had sl(2, ℂ) and gl(2, ℂ). Both (25.100) and (25.101) are used for the complexification of real Lie algebras. We have already encountered the latter example in Sect. 24.10. There, we have shown that sl(2, ℂ) is a complexification of su(2), which is a real Lie subalgebra of sl(2, ℂ); see Theorem 24.15. In that case, we can choose su(2) for a real form h of sl(2, ℂ). The Lie algebra sl(2, ℂ) is decomposed as in (14.125) in


combination with (25.101), where both B and C are anti-Hermitian. Namely, we have

Z = B + iC   [Z ∈ sl(2, ℂ); B, C ∈ su(2)].

In the case of gl(2, ℂ), however, gl(2, ℂ) is a complexification of V4. The vector space V4, which has appeared in (24.493), is evidently a real subalgebra of gl(2, ℂ). (The confirmation is left for the readers.) Therefore, we can choose V4 for a real form h of gl(2, ℂ). It is decomposed as in (14.125) in combination with (25.100), where both B and C are Hermitian. In this case, we have

Z = B + iC   [Z ∈ gl(2, ℂ); B, C ∈ V4].

Viewing the Lie algebra as a vector space, we have the following theorem with respect to V4.

Theorem 25.5 Let the vector space V4 be given by

V4 = {H = (hij); i, j = 1, 2, hij ∈ ℂ, H : Hermitian},   (24.493)

where (hij) is a complex (2, 2) Hermitian matrix. Then, the Lie algebra gl(2, ℂ) is expressed as a direct sum of V4 and iV4. That is, we have

gl(2, ℂ) = V4 ⊕ iV4,   (25.102)

where iV4 = {iH; H ∈ V4}; see (24.493).

Proof It is evident that both V4 and iV4 form a vector space [i.e., a subspace of gl(2, ℂ)]. Let z be an arbitrarily chosen element of gl(2, ℂ). We assume that z can be described as

     ⎛ a + ib   c + id ⎞
z =  ⎝ p + iq   r + is ⎠   (a, b, c, d, p, q, r, s ∈ ℝ).   (24.535)

Using σi (i = 0, 1, 2, 3) ∈ V4 given by (24.495), we have

z = (1/2) [(a + r)σ0 + (-a + r)σ3 + (c + p)σ1 + (d - q)σ2]
  + (i/2) [(b + s)σ0 + (-b + s)σ3 + (d + q)σ1 + (-c + p)σ2].   (24.536)

The first term of (24.536) belongs to V4, and the second term of (24.536) belongs to iV4. Thus, z ∈ V4 + iV4. Since z is arbitrary, this implies that


gl(2, ℂ) ⊂ V4 + iV4.   (25.103)

Now, suppose that ∃u ∈ gl(2, ℂ) is contained in V4 ∩ iV4. Then, u can be expressed as

u = ασ0 + βσ1 + γσ2 + δσ3   (α, β, γ, δ : arbitrary real numbers)
  = i(λσ0 + μσ1 + νσ2 + ξσ3)   (λ, μ, ν, ξ : arbitrary real numbers).   (25.104)

Since σ0, σ1, σ2, and σ3 are basis vectors of V4, (25.104) implies that

α = iλ, β = iμ, γ = iν, δ = iξ.   (25.105)

Since α, β, ⋯, λ, μ, etc. are real numbers, (25.105) means that

α = β = γ = δ = λ = μ = ν = ξ = 0.

Namely, u ≡ 0. This clearly indicates that

V4 ∩ iV4 = {0}.

Thus, from Theorem 11.1, any vector w ∈ V4 + iV4 is uniquely described by

w = w1 + w2   [w1 ∈ V4, w2 ∈ iV4].   (25.106)

That is, by the definition of the direct sum described by (11.17), V4 + iV4 is identical with V4 ⊕ iV4. Meanwhile, from (25.106) evidently we have

V4 ⊕ iV4 ⊂ gl(2, ℂ).   (25.107)

Using (25.103) and (25.107) along with (25.106), we obtain (25.102). This completes the proof.

Thus, gl(2, ℂ) is a complexification of V4. In a complementary style, the subalgebra V4 is said to be a real form of gl(2, ℂ). Theorem 25.5 is in parallel with the discussion of Sects. 24.9 and 25.2 on the relationship between sl(2, ℂ) and su(2). In the latter case, we say that sl(2, ℂ) is a complexification of su(2) and that su(2) is a real form of sl(2, ℂ). Caution should be taken in that the real form V4 comprises the Hermitian operators, whereas the real form su(2) consists of the anti-Hermitian operators. We remark that although iV4 is a subspace of gl(2, ℂ), it is not a subalgebra of gl(2, ℂ).

In this book, we chose gl(2, ℂ) and sl(2, ℂ) as typical examples of the complex Lie algebra. Somewhat confusingly, however, these Lie algebras are real Lie algebras as well. This is related to the definition of the representation of the Lie algebra. In (25.1), we designated a as a real number. Suppose that the basis set of the Lie algebra g is


{e1, ⋯, ed}, where d is the dimension of g (as a vector space). Then, from (25.1) and (25.2) we obtain

σ(Σ_{k=1}^{d} ak ek) = Σ_{k=1}^{d} ak σ(ek)   (ak : real).   (25.108)

Now, we assume that σ : g → g′. Also, we assume that σ(ek) (k = 1, ⋯, d) is the basis set of g′. Then, g and g′ are isomorphic and σ is a bijective mapping. If σ is an endomorphism (i.e., g = g′), σ is merely an identity mapping. Though trivial, (25.108) clearly indicates that σ transforms a real vector Σ_{k=1}^{d} ak ek into another real vector Σ_{k=1}^{d} ak σ(ek). It follows that both g and g′ are real Lie algebras. Thus, so far

as ak is a real number, we must deal with even a complex Lie algebra as a real Lie algebra. As already discussed, this confusing situation applies to gl(2, ℂ) and sl(2, ℂ). Note that, in contrast, e.g., su(2) is inherently not a complex algebra but a real Lie algebra.

Recall (24.550). We have mentioned that sl(2, ℂ) is a complex vector space. Yet, if we regard iζ1, iζ2, and iζ3 as the basis vectors of sl(2, ℂ), then their coefficients a, (d - q)/2, and (c + p)/2 of (24.550) are real, and so sl(2, ℂ) can be viewed as a real vector space. This situation is the same as that for gl(2, ℂ); see (24.536). In this respect, the simplest example is a complex number z ∈ ℂ. It can be expressed as z = x + iy (x, y ∈ ℝ). We rewrite it as

z = x·1 + y·i   (x, y ∈ ℝ).   (25.109)

We have z = 0 if and only if x = y = 0. Suppose that we view the whole complex field ℂ as a two-dimensional vector space. Then, 1 and i constitute the basis vectors of ℂ, with x and y being the coefficients. In that sense, ℂ is a two-dimensional real vector space over ℝ. In terms of abstract algebra, ℂ is referred to as the second-order extension of ℝ. We will not go into further detail about this issue. Instead, further reading is given at the end of this chapter (see Refs. [2, 3]).
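The decomposition of this section is easy to check numerically. Below is a minimal sketch of the Hermitian split (14.125) with (25.100), i.e., of the statement gl(2, ℂ) = V4 ⊕ iV4, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))  # an element of gl(2, C)

# Split Z = B + iC with B, C Hermitian, cf. (14.125) with (25.100):
B = (Z + Z.conj().T) / 2
C = (Z - Z.conj().T) / 2j

assert np.allclose(B, B.conj().T)   # B is Hermitian: B lies in V4
assert np.allclose(C, C.conj().T)   # C is Hermitian: iC lies in iV4
assert np.allclose(Z, B + 1j * C)   # Z = B + iC, i.e., gl(2, C) = V4 + iV4
```

The uniqueness of the split (the direct-sum property proved above) corresponds to the fact that B and C here are completely determined by Z.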

25.4 Further Topics of Lie Algebra

In relation to the advanced topics of the Lie algebra given in this chapter, we wish to mention related topics below. In Sect. 15.2 we derived various properties of the exponential functions of matrices. Here, it is worth adding some more important properties of those functions [1].

(10) Let A be a symmetric matrix. Then, exp A is a symmetric matrix.
(11) Let A be an Hermitian matrix. Then, exp A is an Hermitian matrix.
(12) Let exp A be a symmetric matrix. Then, A is a symmetric matrix.
(13) Let exp A be an Hermitian matrix. Then, A is an Hermitian matrix.
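Properties (10) and (11) can be spot-checked numerically. A minimal sketch follows, assuming NumPy; the series-based `expm` here is a simple stand-in for a library matrix exponential:

```python
import numpy as np

def expm(X, terms=60):
    """Matrix exponential via truncated power series (adequate for small norms)."""
    out = np.eye(X.shape[0], dtype=X.dtype)
    term = np.eye(X.shape[0], dtype=X.dtype)
    for k in range(1, terms):
        term = term @ X / k
        out = out + term
    return out

rng = np.random.default_rng(1)
S = rng.normal(size=(3, 3))
S = (S + S.T) / 2                                  # symmetric matrix
H = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
H = (H + H.conj().T) / 2                           # Hermitian matrix

assert np.allclose(expm(S), expm(S).T)             # Property (10)
assert np.allclose(expm(H), expm(H).conj().T)      # Property (11)
```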

Properties (10) and (11) are evident from Properties (3) and (4), respectively. To show Property (12), we have

(exp tA)^T = exp tA^T = exp tA,   (25.110)

where with the first equality we used Property (3) of (15.31). Differentiating exp(tA^T) and exp tA of (25.110) with respect to t and taking the limit t → 0 of the derivative, we obtain A^T = A. In a similar manner, we can show Property (13).

With so0(3, 1) we have six basis vectors (in a matrix form) as shown in (24.220). Of these, three vectors are real skew-symmetric (or anti-Hermitian) matrices and the other three vectors are real symmetric (or Hermitian) matrices. Then, their exponential functions are real orthogonal matrices [from Property (6) of Sect. 15.2] and real symmetric (or Hermitian) matrices [from Properties (12) and (13) as shown above], respectively. Since the eigenvalues of an Hermitian matrix are real, the eigenvalues of its exponential function are positive definite from Theorem 15.3. Thus, we naturally have a good example of the polar decomposition of a matrix with respect to the Lorentz transformation matrix; see Theorem 24.7 and Corollary 24.3. The next example illustrates this.

Example 25.6 Returning to the topics dealt with in Sects. 24.1 and 24.2, let us consider the successive Lorentz transformations of the rotation about the z-axis and the subsequent boost in the positive direction of the z-axis. It is a case where the direction of the rotation axis and that of the boost coincide. The pertinent elements of the Lie algebra are given by

       ⎛ 0  0   0  0 ⎞          ⎛ 0  0  0  1 ⎞
       ⎜ 0  0  -1  0 ⎟          ⎜ 0  0  0  0 ⎟
Xr  =  ⎜ 0  1   0  0 ⎟ , Xb  =  ⎜ 0  0  0  0 ⎟ ,   (25.111)
       ⎝ 0  0   0  0 ⎠          ⎝ 1  0  0  0 ⎠

where Xr and Xb stand for the rotation and the boost, respectively. We express the rotation by θ and the boost by ω as

         ⎛ 0  0   0  0 ⎞            ⎛  0  0  0  -ω ⎞
         ⎜ 0  0  -θ  0 ⎟            ⎜  0  0  0   0 ⎟
Xr^θ  =  ⎜ 0  θ   0  0 ⎟ , Xb^ω  =  ⎜  0  0  0   0 ⎟ .   (25.112)
         ⎝ 0  0   0  0 ⎠            ⎝ -ω  0  0   0 ⎠

Notice that Xr and Xb are commutative and so are Xr^θ and Xb^ω. Therefore, on the basis of Theorem 15.2 we obtain

(exp Xr^θ)(exp Xb^ω) = (exp Xb^ω)(exp Xr^θ) = exp(Xr^θ + Xb^ω).   (25.113)

Meanwhile, we define X as

                      ⎛  0  0   0  -ω ⎞
                      ⎜  0  0  -θ   0 ⎟
X  ≡  Xr^θ + Xb^ω  =  ⎜  0  θ   0   0 ⎟ .   (25.114)
                      ⎝ -ω  0   0   0 ⎠

Borrowing (24.224) and the related result, as the successive Lorentz transformations Λ we get

Λ = (exp Xr^θ)(exp Xb^ω) = (exp Xb^ω)(exp Xr^θ) = exp(Xr^θ + Xb^ω) = exp X

    ⎛ 1    0       0     0 ⎞ ⎛  cosh ω  0  0  -sinh ω ⎞
    ⎜ 0  cos θ  -sin θ   0 ⎟ ⎜    0     1  0     0    ⎟
  = ⎜ 0  sin θ   cos θ   0 ⎟ ⎜    0     0  1     0    ⎟
    ⎝ 0    0       0     1 ⎠ ⎝ -sinh ω  0  0   cosh ω ⎠

    ⎛  cosh ω    0       0    -sinh ω ⎞
    ⎜    0     cos θ  -sin θ     0    ⎟
  = ⎜    0     sin θ   cos θ     0    ⎟ .   (25.115)
    ⎝ -sinh ω    0       0     cosh ω ⎠

It can be readily checked that Λ is normal. Thus, (25.115) provides a good illustration of the polar decomposition. As expected from the discussion of Sect. 24.9.1, the two matrices in the second last equality of (25.115) are commutative by virtue of Corollary 24.2. We have


ΛΛ† = Λ†Λ = (exp X)(exp X†) = (exp X†)(exp X) = exp(X + X†)

                  ⎛  cosh 2ω  0  0  -sinh 2ω ⎞
                  ⎜     0     1  0      0    ⎟
  = exp 2Xb^ω  =  ⎜     0     0  1      0    ⎟ .   (25.116)
                  ⎝ -sinh 2ω  0  0   cosh 2ω ⎠

As shown above, if the successive transformations comprise a rotation around a certain axis (the z-axis in the present case) and a boost in the direction of that same axis, the corresponding representation matrices Λ and Λ† are commutative (i.e., Λ is normal). In this case, the effects of the rotations of Λ and Λ† cancel out, and we are left only with a doubled effect of the boosts.
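Example 25.6 can also be confirmed numerically. A minimal sketch follows, assuming NumPy; the series-based `expm` is a simple stand-in for a library matrix exponential:

```python
import numpy as np

def expm(X, terms=60):
    """Matrix exponential via truncated power series (adequate for these norms)."""
    out = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for k in range(1, terms):
        term = term @ X / k
        out = out + term
    return out

theta, omega = 0.7, 1.2

# Xr^theta and Xb^omega of (25.112), coordinates ordered as in (25.115):
Xr = theta * np.array([[0, 0, 0, 0], [0, 0, -1, 0], [0, 1, 0, 0], [0, 0, 0, 0]], float)
Xb = -omega * np.array([[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]], float)

L = expm(Xr + Xb)

# The factors commute, so exp(Xr) exp(Xb) = exp(Xr + Xb), cf. (25.113):
assert np.allclose(expm(Xr) @ expm(Xb), L)

# L agrees with the closed form (25.115):
c, s = np.cos(theta), np.sin(theta)
ch, sh = np.cosh(omega), np.sinh(omega)
closed = np.array([[ch, 0, 0, -sh], [0, c, -s, 0], [0, s, c, 0], [-sh, 0, 0, ch]])
assert np.allclose(L, closed)

# L is normal, and L L^T doubles only the boost, cf. (25.116):
assert np.allclose(L @ L.T, L.T @ L)
assert np.allclose(L @ L.T, expm(2 * Xb))
```

The last two assertions exhibit the polar-decomposition structure directly: the orthogonal (rotation) factor cancels in ΛΛᵀ, leaving the doubled boost.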

25.5 Closing Remarks

In this book we have dealt with classical electromagnetism and quantum mechanics, which are bridged by the quantum field theory. Meanwhile, chemistry-oriented topics such as quantum chemistry were studied within a framework of materials science and molecular science, including device physics. The theory of vector spaces has given us a bird's-eye view of all these topics. The theory of analytic functions and group theory supply us with powerful tools for calculating the various physical quantities and invariants that appear in the mathematical equations. The author is confident that the readers have acquired the ability to attack various problems in mathematical physical chemistry.

References

1. Yamanouchi T, Sugiura M (1960) Introduction to continuous groups. Baifukan, Tokyo (in Japanese)
2. Takeuchi G (1983) Lie algebra and the theory of elementary particles. Shokabo, Tokyo (in Japanese)
3. Gilmore R (2005) Lie groups, Lie algebras, and some of their applications. Dover, New York

Index

A Abelian group, 648, 715, 737, 772, 789 Abscissa axis, 185, 202 Absolute convergence, 609 Absolute temperature, 352 Absolute value, 122, 129, 141, 144, 202, 211, 222, 349, 366, 591, 804, 809, 837, 943, 1021 Abstract algebra, 877, 883, 914, 1293, 1301 Abstract concept, 457, 661, 714, 915, 922 AC5, 374–377 AC’7, 376 Accumulation point, 196–198, 230, 232, 233 Adherent point, 192, 197 Adjoint boundary functionals, 411, 415 Adjoint Dirac equation, 1029, 1081 Adjoint Green’s functions, 419, 424 Adjoint matrix, 22, 550, 610 Adjoint operator, 111, 409, 411, 549, 561–566, 1166, 1232 Algebraic branch point, 264 Algebraic equation, 15 Allowed transition, 140, 149, 790, 803, 825 Allyl radical, 714, 715, 773, 797–805 Aluminum-doped zinc oxide (AZO), 377, 378, 384–386 Ampère, A.-M., 281 Ampère’s circuital law, 281 Amplitude, 287, 293, 294, 308, 310, 313, 315, 316, 321, 336, 344, 345, 347, 364, 396, 398, 445, 446, 1073, 1115–1120, 1124, 1126–1136 Analytical equation, 3 Analytical mechanics, 977, 983, 999 Analytical method, 80, 85, 94

© Springer Nature Singapore Pte Ltd. 2023 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-99-2512-4

Analytical solution, 45, 109, 155 Analytic continuation, 232–234, 241, 273 Analytic function, 185–274, 330, 607, 608, 898, 1048, 1304 Analytic prolongation, 234 Anderson, C., 1036 Angle-dependent emission spectra, 377, 388, 390, 391 Angle-dependent spectroscopy, 393 Angular coordinate, 202, 203 Angular frequency, 4, 7, 32, 129, 131, 142, 287, 353, 358, 359, 364, 369, 994, 995, 1002, 1107 Angular momentum generalized, 75–79, 839, 840, 874, 904, 926, 1290 operator, 30, 89, 94, 144, 147 orbital, 60, 79–110, 124, 144, 874, 926, 959 spin, 79, 873, 874, 959 total, 139, 959, 960, 962 Anisotropic crystal, 381, 392 dielectric constant, 380 medium, 380 Annihilation operator, 31, 41, 113, 114, 998, 1005, 1026, 1035–1037, 1039–1041, 1061, 1065, 1097, 1104, 1115, 1116 Anti-commutative, 974, 1043 Anti-commutator, 947, 1033 Anti-Hermitian, 29, 111, 119, 409, 830, 901, 968, 972, 974, 1030, 1086, 1191, 1208, 1259, 1267, 1268, 1298, 1299 Anti-Hermitian operator, 29, 111, 636, 834, 835, 839, 901, 904, 1288, 1300 Antilinear, 552


Antiparticle, 967, 970, 1039–1040, 1256 Anti-phase, 349 Antipodal point, 924, 925 Antisymmetric, 435, 750–753, 890–892, 895, 1003, 1053, 1278, 1297 Approximation method, 155–183 Arc tangent, 66, 386 Argument, 30, 57, 68, 156, 192, 203, 234, 238, 245, 251, 255, 261, 263, 264, 267, 268, 270, 273, 313, 328, 330, 365, 407, 416, 420–423, 425, 427, 428, 431, 434, 435, 438, 440, 441, 480, 489, 516, 525, 527, 543, 558, 581, 598, 604, 634, 642, 649, 673, 684, 707, 719, 729, 733, 757, 761, 766, 773, 777, 790, 825, 834, 846, 849, 856, 858, 878, 932, 940, 988, 1003, 1008, 1017, 1068, 1130, 1144, 1156, 1181, 1220, 1257, 1258 Aromatic hydrocarbons, 755, 773 Aromatic molecules, 783, 797, 805 Associated Laguerre polynomials, 59, 110, 118–124 Associated Legendre differential equation, 85, 94–105, 450 Associated Legendre functions, 59, 85, 97, 105–109 Associative law, 24, 466, 566, 567, 650, 658, 689, 692, 943, 1083, 1173 Asymmetric, 1037 Atomic orbitals, 761, 768, 771–773, 775, 778, 783, 784, 791, 792, 798, 805, 813 Attribute, 662, 663 Azimuthal angle, 30, 72, 700, 819, 861, 869, 924, 925, 1194, 1196 Azimuthal quantum number, 122, 123, 144, 149 Azimuth of direction, 689

B Back calculation, 587 Backward wave, 286, 343, 347 Ball, 226, 227, 935, 939 Basis functions, 90, 709–717, 746, 747, 771, 776, 786, 805, 806, 827, 844–846, 849, 869, 873, 874, 876, 877, 880, 890–892, 894, 895, 1198, 1199 set, 63, 483, 542, 543, 566, 576, 610, 711, 718, 719, 746, 766, 877, 879, 885, 909, 912, 922, 934, 935, 1086, 1143–1148, 1151, 1152, 1154, 1156, 1158,

1160–1162, 1164, 1167, 1174, 1176, 1180, 1193, 1194, 1289, 1300 vectors, 8, 42, 61, 63, 143, 159, 300, 418, 457, 459, 464, 465, 467, 469–471, 473, 475, 478–484, 495, 497, 501, 503, 514, 516–518, 522, 532–535, 540, 542, 544–546, 554, 558, 562–566, 571, 601, 662, 666, 675, 677, 678, 681, 683, 695, 703, 705, 709, 710, 712, 714, 715, 717, 730, 737, 742, 746, 764, 770, 773, 787, 791, 798, 810, 831, 834, 873, 883, 895, 903, 960, 1025, 1086, 1150, 1160, 1175, 1194, 1198, 1203, 1259, 1261, 1276, 1287–1290, 1302 BCs, see Boundary conditions (BCs) Benzene, 674, 755, 773, 789, 791–797, 799 Bessel functions, 642, 643 Bijective, 472–474, 476, 648, 655, 924, 926, 1149, 1157, 1158, 1168, 1276, 1281, 1284, 1285, 1287, 1301 Bijective homomorphism, 655, 656 Bilinear form, 935, 936, 938, 939, 1155, 1157, 1158, 1162, 1163, 1168, 1177–1179, 1183, 1187, 1193, 1261, 1263 Bilinearity of tensor product, 1152 Binomial expansion, 240, 241, 883 Binomial theorem generalized, 240 Biphenyl, 674 Bithiophene, 674 Blackbody, 351–353 Blackbody radiation, 351–353 Block matrix, 712, 891 Blueshift, 388 Blueshifted, 389–392, 394 Bohr radius of hydrogen, 135, 822 of hydrogen-like atoms, 110 Boltzmann distribution law, 351, 360 Born probability rule, 589 Boron trifluoride, 678 Bose–Einstein distribution functions, 353 Boson, 1097–1102, 1139 Boundary functionals, 402, 406, 410, 411, 415, 431 term, 409 Boundary conditions (BCs) adjoint, 415 homogeneous, 445 homogeneous adjoint, 415–417 inhomogeneous, 424, 623

Bounded, 25, 29, 77, 81, 212, 219, 221–223, 229, 1192, 1263 BP1T, 374–377 BP3T, 392–394 Bragg’s condition, 389, 390, 392 Branch, 259, 261–264, 643 Branch cut, 261, 262, 264–269, 271–273 Branch point, 260, 262, 264–266, 268, 271–273 Bra vector, 22, 549, 552, 555 Brewster angle, 322–325

C Canonical basis, 1156, 1174 Canonical coordinate, 28, 977, 993, 1055, 1059, 1089 Canonical forms of matrices, 485–548, 583 Canonical momentum, 977, 993, 994, 1055, 1059, 1089 Canonical variables, 983, 993 Cardinalities, 188, 1083 Cardinal numbers, 188 Carrier space, 713 Cartan–Weyl basis, 1287–1297 Cartesian coordinate, 8, 47, 111, 175, 291, 361, 677, 762, 819, 821, 834, 852, 919, 978, 980, 1124, 1218 Cartesian space, 61, 460, 461 Cauchy–Hadamard theorem, 209–212 Cauchy–Liouville Theorem, 222 Cauchy–Riemann conditions, 209–212 Cauchy–Schwarz inequality, 551, 611 Cauchy’s integral formula, 213–223, 225, 232, 236 Cauchy’s integral theorem, 216, 217, 251, 1021–1023 Cavity, 33, 351, 353, 355, 357, 396, 397, 399 Cavity radiation, 351 Cayley’s theorem, 689 Center of inversion, 668 Center-of-mass coordinate, 60 Character, 479, 654, 657, 663, 723, 725, 726, 728, 729, 748–750, 767, 772, 774–777, 782–787, 792, 797, 803, 811, 825, 871, 872, 874, 943, 1156, 1164, 1175 Characteristic equation, 443, 486, 487, 528, 591, 597, 715, 861 Characteristic impedance of dielectric media, 279, 280, 288, 305–351 Characteristic polynomial, 486, 508, 540, 543, 547, 974 Charge conjugation, 967–971, 1039, 1040, 1081, 1209, 1210, 1221, 1227

Chemical species, 677 Circular argument, 156 Circular cone, 1185, 1186 Circularly polarized light left-, 138, 140, 150, 302 right-, 138, 140 Clad layer, 332, 339, 340, 343, 386 Classes, 647, 651–653, 679–681, 683, 685, 688, 725, 726, 731–737, 871 Classical Hamiltonian, 33 Classical numbers, 10 Classical orthogonal polynomials, 49, 97, 401, 450 Clebsch–Gordan coefficient, 873–895, 1294 Clopen set, 196, 199, 200, 921 Closed interval, 201 Closed-open set, 196 Closed path, 199, 200, 215, 216, 922 Closed set, 190, 191, 193–195, 198, 201, 219, 921 Closed shell, 767, 768, 782 Closed surface, 281, 282 Closure, 191–194, 680, 943, 1083 C-number, 10, 22, 995, 1015, 1026, 1032, 1033, 1038, 1039, 1059, 1099, 1133 Cofactor, 476, 477, 498 Cofactor matrix, 498 Coherent state, 128, 129, 135, 141, 152 Coinstantaneity, 944 Column vector, 8, 22, 42, 90, 91, 93, 299, 465, 481, 483, 488, 489, 491, 492, 529, 533–535, 537, 553, 560, 567, 580, 581, 592, 597, 598, 602, 622, 640, 684, 710, 786, 809, 839, 846, 849, 861, 894, 935, 937, 938, 947, 949–953, 957, 968, 969, 1030, 1144–1146, 1152–1154, 1174, 1176, 1193, 1201, 1210–1212, 1239, 1240 Common factor, 504, 505, 509, 510 Commutable, 28, 509, 513–515, 537, 550, 580, 603, 668, 669, 900 Commutation relation anti-, 1032, 1035–1039, 1041, 1098, 1116 canonical, 3, 28–30, 44, 45, 56, 61, 67, 68, 977, 993, 1061, 1065 equal-time, 977, 993–999, 1005, 1015 Commutative, 28, 175, 499, 544, 611, 617, 629, 630, 641, 648, 677, 715, 854, 855, 896, 901, 902, 919, 921, 945, 960–962, 1062, 1088–1090, 1168, 1200, 1202, 1240, 1248, 1249, 1265, 1288, 1303, 1304 Commutative group, 648 Commutative law, 648

Commutator, 28–30, 146, 147, 899, 903, 947, 998, 1005–1007, 1017, 1032, 1062, 1089, 1272 Compact group, 1192, 1263 Compatibility relation, 776 Compendium quantization method, 1031 Complement, 187, 188, 190, 194, 195, 571, 572, 583, 1178, 1179, 1181, 1184, 1186, 1187 Complementary set, 188, 219, 918 Completely reducible, 712, 714, 855, 860 Completeness relation, 746, 1214, 1232, 1233, 1237 Complete orthonormal system (CONS), 64, 155, 159, 167, 170, 171, 177, 873, 952, 962, 969, 989, 1091, 1239 Complete system, 159 Complex amplitude, 313 Complex analysis, 185, 186, 203 Complex analytic representation, 1293, 1294 Complex conjugate, 22, 24, 25, 29, 34, 119, 140, 205, 360, 412, 419, 423, 552, 562, 718, 849, 881, 907, 932, 970, 1030, 1129 Complex conjugate representation, 846 Complex conjugate transposed matrix, 22, 550, 740, 789 Complex domain, 19, 205, 223, 234, 272–274, 326, 1012 Complex function, 14, 185, 205, 210, 212, 213, 411, 790, 1047 Complexification, 1268, 1269, 1281, 1288–1294, 1298–1300 Complex number field, 1142, 1155 Complex phase shift, 341 Complex plane, 14, 185, 186, 202–204, 206, 207, 212, 213, 217–219, 223, 249, 250, 258–261, 272, 328, 329, 548, 915 Complex roots, 243, 442, 443 Complex variable, 15, 204–213, 222, 237, 557, 886, 1011 Composite function, 66, 979, 981 Compton, A., 4, 6, 1075, 1078, 1102, 1109, 1111 Compton effect, 4 Compton scattering, 1075, 1078, 1102, 1109, 1111, 1113–1139 Compton wavelength, 4 Condon–Shortley phase, 103, 105, 109, 125, 135, 141–143 Conductance, 336 Conic section, 1185 Conjugacy classes, 651, 680, 681, 683, 685, 688, 725, 734, 737 Conjugate element, 651

Conjugate momentum, 979 Conjugate representations, 789 Connected component, 916–918, 921, 926, 1189, 1255, 1258, 1263 set, 916, 917, 919, 923, 1254, 1263 Connectedness, 199, 200, 204, 915–926, 1241–1258 CONS, see Complete orthonormal system (CONS) Conservation law, 283, 933 Conservation of charge, 282 Conservation of energy, 4 Conservative force, 982, 1078 Conserved quantity, 959 Constant loop, 922, 1250 Constant matrix, 629, 630, 643 Constant term, 173 Constructability, 423 Continuity equation current, 282, 1054 Continuous function, 220, 915, 917, 922, 1258 Continuous group, 196, 612, 689, 827–926, 1141, 1241, 1242, 1293, 1294 Continuous spectra, 961 Contour integral, 213–215, 236–238, 244, 246, 248, 250, 1021, 1047 integration, 217–219, 223, 227, 237, 239, 243, 245, 247, 255, 267, 270, 271, 1011, 1020–1023, 1048 map, 755, 756, 758, 759 Contraposition, 210, 230, 404, 474, 503, 554, 693, 854, 1179 Contravariant tensor, 1051, 1173, 1174 vector, 937–939, 1061, 1174, 1177 Convergence circle, 226, 227, 233 Convergence radius, 226 Converse proposition, 617, 854 Coordinate point, 298, 301 Coordinate representation, 45–53, 110, 114, 124, 129, 131, 139, 149, 161, 167, 170, 172, 174, 175, 179, 181, 182, 264, 418, 819, 852, 867, 954, 1049, 1086, 1124, 1184, 1193, 1195, 1199, 1201, 1237, 1245 Coordinate space, 977 Coordinate transformation, 300, 382, 661, 667, 703, 706, 764, 845, 868, 1167, 1169, 1177, 1187 Core layer, 300, 382, 661, 667, 703, 764, 845, 868, 1167, 1169, 1177, 1187 Corpuscular beam, 6, 7 Correction term, 157–160

Coset decomposition, 650, 652, 677, 920, 1255 left, 650, 651 right, 650, 651 Cosine rule, 1138 Coulomb gauge, 1056 Coulomb integral, 766, 778, 794, 816 Coulomb potential, 59, 69 Coulomb’s law, 278 Countable, 14, 588, 647, 989 Covariant derivative, 1076, 1080, 1082, 1083, 1085 form, 1052, 1060, 1064, 1095, 1202 tensor, 1173, 1174 vector, 938, 939, 945, 1061, 1174, 1177 Covering group universal, 926, 1241, 1242, 1249 Cramer’s rule, 317, 428, 837 Creation operator, 41, 998, 1004, 1005, 1026, 1035, 1036, 1097, 1104 Critical angle, 322–325, 328, 329, 331, 341 Cross-section area, 367, 369, 397 Crystal anisotropic, 392 organic, 372, 377, 380, 389, 390, 393–395, 594 Cubic groups, 688 Cyclic group, 648 Cyclic permutation, 667, 703 Cyclopropenyl radical, 773, 783–791, 794, 796, 799

D d’Alembertian, 932 d’Alembert operator, 932 Damped oscillator, 441, 443, 445, 446 Damping constant, 441 critical, 442 over, 442 weak, 442, 443 Darboux’s inequality, 219–220, 238 de Broglie, L.-V., 6 de Broglie wave, 6 Definite integral, 26, 101, 114, 134, 139, 175, 236, 743, 749, 767, 769, 820, 1048 Degeneracy, 156 Degenerate doubly, 795, 974, 1212, 1219, 1220, 1235, 1238, 1239, 1257 triply, 817, 823, 824

δ function, 418, 420, 422, 425, 444, 988, 1007, 1008, 1024, 1025, 1047, 1065, 1068, 1106, 1116, 1117, 1122, 1124 de Moivre’s theorem, 203 de Morgan’s law, 189 Derived set, 196, 197 Determinant, 43, 61, 296, 405, 432, 457, 473–478, 544, 553, 557, 591, 596, 673, 680, 689, 690, 770, 815, 837, 895, 917, 918, 921, 943, 945, 949, 974, 1163, 1188, 1189, 1249, 1254, 1257 Device physics, 342, 372, 395, 1304 Device substrate, 377, 384 Diagonal elements, 89, 90, 93, 296, 475, 477, 487, 488, 492, 503, 523, 525, 526, 540, 545, 583, 595, 614, 615, 681, 716, 723, 771, 809, 811, 901, 902, 904, 936, 1034, 1160, 1161, 1239 Diagonalizable, 502, 513, 540, 543, 544 Diagonalizable matrix, 539–548, 585 Diagonalization, 493, 543, 592, 598–605, 627, 694, 716, 952, 1087, 1247 Diagonalized, 44, 502, 504, 513, 540, 542, 545, 556, 571, 580, 581, 583, 585, 591, 593, 603, 605, 627, 632, 716, 717, 809, 832, 951–953, 962, 1087, 1141, 1160, 1161, 1204, 1219–1221, 1229, 1230, 1235, 1239, 1240, 1242, 1290, 1292 Diagonal matrix, 91, 485, 512, 543, 560, 580, 582, 583, 592, 593, 598, 627, 631, 664, 707, 715, 764, 809, 834, 856, 858, 910, 1034, 1161, 1204, 1258 Diagonal positions, 492, 513 Dielectric constant, 278, 279, 348, 349, 380, 1049 Dielectric constant ellipsoid, 381, 387, 391 Dielectric media, 279, 280, 288, 294, 305–351 Differentiable, 67, 205, 207, 210, 212, 222, 283, 428, 607, 617, 898, 899 Differential cross-section, 1122, 1124 Differential equations inhomogeneous, 402, 405, 424, 430, 440, 624, 626, 634 in a matrix form, 621–626 second-order, 941 system of, 607, 618–637, 643, 644 theory of, 431, 450 Differential operator, 15, 19, 26, 27, 51, 52, 71, 74, 80, 171, 177, 278, 401, 408, 410, 411, 413, 415, 416, 421, 424–426, 431, 436, 448–450, 549, 571, 932, 1050, 1203

Diffraction grating, 372, 375–378, 386, 390, 393, 395 Diffraction order, 379, 392 Dimensionless, 75, 80, 81, 110, 278–280, 287, 368, 372, 819, 829 Dimensionless parameter, 51, 156 Dimension theorem, 469, 473, 505, 519, 520, 541, 1286 Dipole approximation, 129, 144, 359 isolated, 365 oscillation, 351, 364 radiation, 361–367 transition, 127–129, 131, 134, 149, 766, 780 Dirac, P., 11, 12, 22, 549, 931 Dirac adjoint, 958, 964, 965, 967, 971, 1097, 1237, 1240 Dirac equation, 12, 399, 931–975, 991, 1032, 1081, 1085, 1141, 1187, 1193, 1194, 1197–1204, 1208, 1210, 1211, 1214, 1218, 1220–1222, 1227–1235, 1240 Dirac field, 960, 962, 963, 970, 977, 991, 993, 999, 1005, 1011, 1023, 1026–1049, 1061, 1071, 1075, 1077–1079, 1083, 1084, 1096, 1098, 1101, 1111, 1211 Dirac operators, 1141, 1203, 1204, 1208, 1212–1229, 1235–1241 Direct factor, 658, 677 Direction cosine, 288, 315, 684, 701, 702, 862, 864, 865 Direct-product group, 647, 657–659, 677, 746, 749, 750, 874, 919 Direct product space, 1142, 1155, 1171 Direct sum, 196–198, 460, 462, 497, 500, 501, 504–508, 514–516, 523, 540, 542, 543, 571, 574, 659, 712, 717, 725, 728, 746, 768, 775, 797, 808, 855, 873, 916, 919, 1178, 1192, 1204, 1233, 1298–1300 Dirichlet conditions, 15, 19, 26, 73, 354, 357, 416, 430 Disconnected, 199, 200 Discrete set, 196, 197 Discrete topology, 199 Disjoint sets, 199, 200 Dispersion of refractive index, 373–376 Displacement current, 281, 283 Distance function, 185, 202, 204, 550 Distributive law, 189, 467 Divisor, 499, 500, 541, 675 Domain of analyticity, 213, 215–217, 219, 223, 234 complex, 19, 205, 223, 234, 272–274, 326, 1012

of variability, 185 Dual basis, 1156, 1158, 1167 Dual space, 552, 1156, 1163, 1173 Dual vector, 552, 1155–1165, 1174 Dynamical system, 982–984, 989, 991, 1004 Dynamical variable, 959

E Effective index, 377, 383, 390–394 Eigenenergy, 44, 113, 123, 155–160, 167, 168 Eigenequation, 486 Eigenfunction, 17–19, 24, 25, 27, 36, 37, 40, 42, 69–71, 74, 89, 112, 114, 139, 156, 177, 452, 716, 742, 746, 765, 766, 796, 800, 801, 821, 856, 892, 894, 951–955, 957, 962, 964, 965, 967, 986, 987, 1234, 1239 Eigenspace, 486, 495–500, 505–512, 514, 515, 523, 540, 542, 543, 583, 589, 602, 746 Eigenspinor, 1220 Eigenstate energy, 167, 168, 180 simultaneous, 68, 71, 75, 81, 88, 92, 93, 599–605, 962 Eigenvalue equation, 13, 24, 29, 33, 52, 128, 155, 156, 164, 176, 177, 386, 430, 486, 492, 529, 548, 560, 602, 755, 760, 769, 771, 816, 954, 956, 962, 989 problem, 14, 16, 19, 21, 24, 122, 448–453, 485, 548, 603, 717, 746, 813, 962, 1199, 1203 Eigenvector, 27, 37, 42, 76, 89, 156, 157, 159, 299, 485–495, 498, 500–505, 519–522, 525, 528–534, 537, 538, 540, 542–544, 546, 548, 580, 583, 585, 590, 599–604, 681, 691, 693, 702, 717, 770, 772, 809, 861, 862, 952, 953, 962, 964, 989, 1199, 1201, 1211, 1212, 1234, 1239, 1240 Einstein, A., 3, 4, 6, 7, 351, 367 Einstein coefficients, 359, 361, 365 Einstein summation convention, 937, 1177 Electric dipole approximation, 144 moment, 129, 169, 170, 767 operator, 824 transition, 127–129, 131, 134, 149, 766, 780 Electric displacement, 278 Electric field, 129, 150, 152, 161, 164, 168, 170, 178, 277, 278, 280, 293–295, 297, 298, 301, 302, 307, 313–317, 321, 326, 336, 337, 340–343, 347–349, 357,

363–365, 383–385, 396, 397, 441, 637, 781, 824, 1051 Electric flux density, 278, 382 Electric permittivity tensor, 380 Electric wave, 293–295 Electromagnetic field, 33, 129, 151, 305–307, 313–315, 332, 363, 377, 380, 384, 386, 398, 594, 643, 977, 991, 993, 1023, 1049–1073, 1075, 1077–1079, 1083, 1084 Electromagnetic induction, 280 Electromagnetic plane wave, 290 Electromagnetic potential, 1051, 1077 Electromagnetic wave, 129, 277, 289–303, 305–351, 353–358, 365, 370, 372, 382, 383, 396, 401, 441, 637, 1059 Electromagnetism, 170, 277, 278, 287, 450, 594, 1049, 1061, 1304 Electronic configuration, 767, 780–782, 797, 802, 803, 824, 825 Element, 47, 86, 129, 186, 296, 359, 457, 487, 549, 577, 607, 647, 661, 705, 761, 841, 936, 1032, 1083, 1145, 1271 Elementary charge, 60, 129, 151, 359, 1076 Elementary particle physics, 1140, 1258 reaction, 1120 Ellipse, 297, 298, 301, 302, 381, 599 Ellipsoidal coordinates, 819 Elliptically polarized light, 297, 298, 301, 1061 Emission direction, 377, 393 gain, 392, 394 grazing, 379, 390, 391 lines, 370, 374, 375, 377, 393, 394 optical, 372, 768 spontaneous, 358, 359, 365, 366, 368 stimulated, 358, 359, 367, 368 Empty set, 188–190, 652 Endomorphism, 457, 465, 468, 469, 471–474, 478, 480, 648, 656, 907, 908, 1086, 1152, 1172, 1173, 1272, 1275, 1279, 1286, 1301 Energy conservation, 4, 322, 1137 Energy eigenvalue, 20, 31, 36, 37, 53, 123, 135, 155, 156, 158, 162, 176, 717, 746, 765, 777, 779, 780, 789, 795, 796, 802, 814, 823, 962, 967 Energy level, 41, 157–160, 176, 358, 781, 824 Energy-momentum, 991, 1107 Energy transport, 319–322, 331, 347 Entire function, 213, 222, 223, 250, 1048 Equation of motion, 60, 151, 441, 933, 981, 992–994, 1079, 1089 Equation of wave motion, 277, 284–289, 334

Equilateral triangle, 677, 678, 783, 784 Equivalence law, 512, 593 relation, 916 theorem, 932 transformation, 570, 593, 594, 1161, 1162 Equivalent transformation, 593 Ethylene, 755, 773–784, 786, 788, 790, 791, 797, 799 Euler angles, 695–704, 836, 840, 842, 845, 849, 856, 861, 911, 1261 Euler equation second-order, 173 Euler–Lagrange equation, 982, 992, 993, 1057 Euler’s formula, 203 Euler’s identity, 203 Evanescent mode, 331 Evanescent wave, 332, 338–343 Even function, 19, 51, 134, 161, 867, 873, 1024 Even permutations, 475, 904, 1095, 1111 Excitation optical, 372, 825 strong, 373, 393 weak, 393, 394 Excited state, 37, 44, 134, 156, 178, 179, 181, 351, 352, 358, 359, 366, 780, 797, 802, 803, 824, 825 Existence probability, 21, 32, 987 Existence probability density, 142 Expansion coefficient, 177, 225, 418, 866, 994, 1261 Expectation value, 25, 33–36, 53, 54, 70, 176, 178, 588–590, 1015–1017, 1031, 1043, 1079, 1098–1100 Exponential function, 15, 248, 290, 308, 309, 446, 607–644, 827, 830, 836, 856, 921, 946, 1048, 1082, 1086, 1301, 1302 Exponential polynomial, 309 External force field, 60, 441 Extremals, 131, 347 Extremum, 180, 388, 984

F Fabry-Pérot resonator, 373 Factor group, 653, 657, 677 Faraday, M., 280 Faraday’s law of electromagnetic induction, 280 Fermion, 1036, 1043, 1095, 1098–1103, 1110, 1111, 1139 Feynman amplitude, 1073, 1115–1120, 1124, 1126, 1129

Feynman diagrams, 1103–1113, 1115–1120, 1139 propagator, 977, 989, 994, 1007, 1015–1023, 1028, 1043–1049, 1071, 1072, 1103, 1109, 1111, 1112 rules, 1103–1113 FIB, see Focused ion beam (FIB) Fine structure constant, 1095, 1139 First-order approximation, 181, 182 First-order linear differential equation (FOLDE), 45, 46, 401, 406–411, 619, 620, 624, 634 First-order partial differential equation, 277 Fixed coordinate system, 666, 683, 696, 704, 861, 924 Flux, 151, 278–280, 322, 365, 369, 382, 638, 1050, 1051 Fock basis, 1005 space, 999–1005 state, 1092 Focused ion beam (FIB), 377, 395 FOLDE, see First-order linear differential equation (FOLDE) Forbidden, 127, 781, 797, 825, 961 Force field central, 59, 109 external, 60, 441 Forward wave, 286, 343, 345, 347 Four-group, 664, 674, 1255 Fourier analysis, 985–990 coefficients, 171 components, 1026, 1117, 1203 cosine coefficients, 873 cosine series, 873 counterpart, 1049, 1116 integral transform, 987–990 series expansion, 985–987 Fourier transform inverse, 985, 987–990 pairs, 989 Four-momentum, 1004, 1017, 1104, 1107, 1119, 1132, 1133, 1137, 1228 Four-vector, 934, 945, 1003, 1024, 1026, 1035, 1052–1055, 1059, 1076, 1079, 1131, 1133, 1135, 1136, 1193 Fréchet axiom, 200 Free field, 960, 1075, 1085 Free particle, 21, 932, 1026, 1085 Free space, 305, 337, 343, 365 Functionals, 13, 15, 103, 120, 124, 136, 142, 165, 171, 173, 206, 210, 225, 234, 241, 287, 402, 404, 406, 410, 411, 415, 431, 549, 620, 743, 760, 767, 772, 780, 885,

959, 983, 1023, 1047, 1051, 1083, 1156, 1202, 1212, 1214, 1240 Functional space, 549 Function space, 25, 649, 774 Fundamental set of solutions, 15, 405, 426–428, 431–434, 441–443, 640, 644 Fundamental solution, 634

G Gain function, 369 Gamma functions, 98–100, 887 Gamma matrices, 947, 949, 952, 971–975, 1032, 1082, 1124, 1130, 1132, 1133, 1204, 1227 Gauge field, 1055, 1082–1085 invariant, 1084, 1130 transformation, 1051, 1054, 1082–1084, 1126, 1129 Gaussian plane, 185 Gauss’ law, 278 Gegenbauer polynomials, 97, 102 Generalized binomial theorem, 98, 240 Generalized coordinate, 977, 978, 980, 981, 993 Generalized eigenspace, 498, 505–512, 515, 523 Generalized eigenvalue, 486, 528, 615 Generalized eigenvector, 498, 500–505, 519, 521, 522, 525, 528, 531–534, 538, 540, 548 Generalized force, 981 Generalized Green’s identity, 432 Generalized momentum, 979, 982, 993 General linear group, 648, 837 General point, 661, 756, 844, 1194 General solution, 15, 32, 286, 405, 406, 431, 442, 443, 994, 1230 Generating function, 97 Geometric figures, 673 Geometric object, 661–664, 666, 673, 680 Geometric series, 224, 228 Glass, 277, 288, 295 Global properties, 920–926 Gram matrix, 555, 556, 558–560, 566, 569, 570, 587, 592, 707, 1141, 1163, 1166 Gram–Schmidt orthonormalization, 567, 580, 742, 965, 1165, 1179 Grand orthogonality theorem (GOT), 717–723, 739 Grassmann numbers, 1032–1034, 1038, 1042, 1045 Grating period, 379, 394 Grating wavevector, 378, 379, 388, 391, 393

Grazing incidence, 328, 330 Greatest common factor, 505 Green’s functions, 401–453, 607, 618, 620, 621, 644 Green’s identity, 420 Ground state, 36, 130, 134, 156, 157, 160, 161, 168, 176, 178, 179, 181, 351, 352, 358, 359, 768, 780, 797, 802, 803, 824, 825 Group algebra, 726–734 element, 647, 649, 651, 659, 675, 680, 685, 688, 689, 705–707, 710, 712, 715, 719, 721, 723–726, 730, 733–735, 737, 738, 746, 750, 761, 772, 780, 784, 791, 855, 863–867, 870, 876, 1250 finite, 647–649, 651, 654, 655, 657, 689, 705–707, 709, 710, 712, 717, 729, 827, 863, 870, 871, 876, 918, 920 infinite, 648, 655, 661, 689, 827, 873, 876 representation, 711, 720, 723, 725–727 symmetric, 688, 689 theory, 484, 607, 647–659, 662, 669, 705, 714, 737, 755–826 Group index, 371, 376 Group refractive index, 371 Group velocity, 7, 12, 338

H Half-integer, 78, 79 Half-odd-integer, 78, 79, 81, 84, 841, 844, 880, 914, 1291 Hamilton–Cayley theorem, 498–499, 509, 510, 515, 540 Hamiltonian, 11, 21, 29, 31, 33, 36, 43, 59–69, 109, 156, 161, 168, 178, 181, 558, 761, 764, 766, 767, 771, 788, 816, 819, 960–964, 977, 983, 984, 999–1005, 1031, 1049, 1065–1071, 1076, 1077, 1079, 1085, 1088, 1091, 1094, 1104, 1105, 1108, 1113, 1115 Hamiltonian density, 999, 1028–1035, 1049, 1055–1059, 1065, 1079, 1085, 1094, 1104 Hamilton’s principle, 984 Harmonic oscillator, 30–59, 109–111, 127, 128, 132, 155, 156, 178, 179, 351, 353, 396, 398, 399, 401, 440, 441, 558, 774, 994, 998, 1003, 1201 Heaviside-Lorentz units, 1049 Heaviside step function, 422 Heisenberg, W., 11, 1087–1091 Heisenberg equation of motion, 1089 Hermite differential equation, 449, 450

Hermite polynomials, 49, 51, 59, 167, 450 Hermitian, 7, 34, 70, 132, 156, 360, 401, 555, 571, 616, 707, 765, 830, 943, 995, 1081, 1165, 1278 Hermitian differential operator, 176, 178, 411, 421, 424, 425, 448 Hermitian matrix, 24, 133, 555–557, 569, 592, 597, 616, 618, 627, 898, 907, 909, 951, 952, 1192, 1197, 1229, 1242, 1243, 1245–1247, 1259, 1263, 1298, 1299, 1302 Hermitian operator, 24, 25, 29, 54, 55, 71, 79, 111, 132, 160, 177, 181, 360, 401, 424, 450, 453, 571–605, 636, 741, 766, 834, 835, 839, 901, 904, 960, 962, 972, 1030, 1069, 1087, 1104, 1221, 1243, 1250, 1258, 1259, 1288, 1300 Hermiticity, 3, 19, 25, 26, 119, 413, 416, 424, 449, 450, 601, 709, 771, 974, 1238, 1240, 1243 Hesse’s normal form, 288, 684, 687 High-energy physics, 949 Highest-occupied molecular orbital (HOMO), 780, 781, 797 Hilbert space, 26, 588, 774, 1005 Hole theory, 1036 Homeomorphic, 926 HOMO, see Highest-occupied molecular orbital (HOMO) Homogeneous boundary conditions (BCs), 410, 411, 416, 417, 420, 421, 424, 425, 427, 429–433, 445, 448, 449, 623 Homogeneous equation, 173, 402, 419, 421, 423, 424, 426, 427, 430, 441, 448, 449, 620, 623, 626, 630, 633, 639, 640 Homomorphic, 654–657, 706, 711, 734, 1265, 1272, 1285–1287 Homomorphism, 647, 653–657, 705–707, 739, 745, 912–914, 1263, 1283, 1286, 1287 Homomorphism theorem, 657, 914, 1265 Homotopic, 922, 1250, 1258 Hückel approximation, 780 Hydrogen, 59, 124, 127, 134–144, 167–171, 176, 781, 805, 807–810, 813–819, 962 Hydrogen-like atoms, 59–125, 127, 134, 143, 144, 155, 157, 558, 603, 949 Hyperbolic functions, 942, 1211, 1216 Hypersphere, 923, 1185 Hypersurface, 599, 923

I Idempotent matrix, 505–513, 544, 546–548, 588, 631

Identity element, 648, 649, 651, 653–656, 658, 663, 664, 675, 723, 777, 786, 913, 916–918, 921, 922, 942, 1083, 1189, 1255, 1258, 1263, 1271, 1273, 1282, 1283 mapping, 1264, 1301 matrix, 44, 507, 509, 511, 513, 521, 569, 570, 586, 591, 608, 666, 690, 694, 706, 720, 728, 729, 808, 830, 896, 915, 921, 939, 947, 951, 970, 975, 1030, 1134, 1203, 1214, 1229, 1230, 1233, 1263, 1265 operator, 35, 56, 159, 171, 417, 419, 507, 989, 993, 1024, 1035, 1062, 1092, 1095, 1104, 1177, 1188, 1230 transformation, 666, 689, 693, 912, 913 Image of transformation, 468 Imaginary axis, 185, 202 Imaginary unit, 185 Improper rotation, 669, 673, 690, 807, 920 Incidence angle, 311, 325 Incidence plane, 311, 314, 315, 317, 322, 326 Incident light, 310, 312, 314 Indeterminate constant, 19 Indeterminate form of limit, 323, 960 Indiscrete topology, 199 Inequivalent, 711, 718, 720–722, 725, 734, 737, 860, 869, 871–873 Inertial frame of reference, 934, 939, 940, 943, 1085, 1121, 1187, 1207, 1222, 1227–1229 Infinite dielectric media, 294, 295, 305, 306 Infinite-dimensional, 459, 1157 Infinite power series, 608, 1095 Infinite series, 163, 224, 232 Inflection point, 242 Inhomogeneous boundary conditions (BCs), 426, 427, 430, 438, 441, 446, 447, 623, 624, 626 Inhomogeneous equation, 402, 405, 424, 430, 623, 624, 633 Initial value problems (IVPs), 401, 431–448 Injective, 472–474, 478, 653, 655, 924, 1264 Inner product of functions, 745 space, 202, 549–571, 574, 578, 579, 587, 588, 590, 592, 610, 714, 1086, 1141, 1158, 1163, 1179, 1259 of vectors, 419, 601, 749 vector space, 574 In-quadrature, 396, 397 Integral

action, 977, 984, 991–993, 1028, 1080, 1084 line, 306, 307 surface, 305–307, 363 Integrand, 221, 239, 252, 254, 266, 268, 271, 274, 985, 992, 995, 1002, 1019, 1023, 1026, 1031, 1041, 1044, 1045, 1047, 1066, 1068, 1079, 1117 Integration clockwise, 217 complex, 213, 236, 1011, 1019, 1020, 1023 constant, 151, 152, 397, 407, 619, 620 contour, 217–219, 223, 227, 237, 239, 243, 245, 247, 255, 267, 270, 271, 1011, 1020–1023, 1048 counter-clockwise, 217 definite, 134 by parts, 26, 29, 86, 106, 108, 132, 408, 432, 992, 1029, 1057, 1084 path, 215, 217, 229, 1018, 1023 range, 25, 247, 438, 440, 769, 995, 1011 real, 236 surface, 282, 306 termwise, 224, 228 volume, 282 Interface, 305–308, 310–315, 322, 325, 331, 332, 335–339, 343, 347–351, 357, 377, 380, 385, 386, 396, 397 Interference constructive, 370 theory, 372 Interference modulation of emissions, 373, 376 spectra, 374 Interior point, 191 Intermediate state, 160 Intermediate value theorem, 917 Intersection, 188, 195, 649, 669, 671, 677, 688, 924 Invariant, 487, 495–501, 504, 508, 514, 517, 529–531, 539, 540, 544, 548, 571, 583, 590, 593, 602, 603, 615, 652, 656, 659, 677, 688, 713–715, 717, 732, 746, 760, 764, 808, 831, 834, 843, 867, 869, 880, 882, 883, 892, 914, 917, 918, 935, 938, 940, 957, 967, 970, 977, 989, 994, 1005–1015, 1017, 1018, 1023, 1024, 1027, 1028, 1035, 1040–1043, 1049, 1052, 1054, 1059, 1072, 1082, 1084, 1089, 1100, 1118, 1123, 1130, 1158, 1165–1171, 1175–1177, 1187, 1189, 1207, 1214, 1227, 1255, 1261, 1263, 1304

Invariant delta functions, 977, 1005–1015, 1017, 1018, 1023, 1028, 1040–1043, 1072 Invariant scattering amplitude, 1118 Invariant subgroup, 652, 656, 659, 677, 688, 732, 914, 917, 918, 1189, 1255 Invariant subspace, 495–500, 508, 517, 529–531, 539, 540, 544, 548, 571, 583, 602, 603, 713, 714, 717, 808 Inverse element, 472, 648, 653, 655, 658, 664, 682, 706, 727, 733, 942, 1083 Inverse matrix, 43, 205, 457, 476, 479, 480, 538, 596, 622, 628, 631, 706, 937, 942, 944, 1170, 1195 Inverse transformation, 468, 472, 473, 648, 845, 847, 1257, 1258, 1284 Inversion symmetry, 668, 669, 673, 680, 1255 Inverted distribution, 368 Invertible, 472, 648, 656 Irradiance, 322, 367–370 Irrational functions, 256 Irreducible, 712, 714, 717–726, 728, 729, 734–738, 742–746, 749, 764, 766–768, 771–773, 775–777, 780, 782, 784, 786, 788, 792, 797–799, 801, 803, 806, 808, 810–812, 814, 815, 817, 823–825, 853–861, 868–874, 876–878, 891, 1291–1293 Irreducible character, 723, 868–874 Isolated point, 196–199 Isomorphic, 654, 655, 657, 674, 680, 688, 711, 760, 914, 924, 1157, 1249, 1255, 1264, 1276, 1281, 1286, 1288, 1290, 1301 Isomorphism, 647, 653–657, 705, 920, 1157, 1158, 1276, 1284–1287 IVPs, see Initial value problems (IVPs)

J Jacobian, 865 Jacobi’s identity, 903, 1274, 1277 j-j Coupling, 879 Jordan blocks, 517–528, 530–533, 540 Jordan canonical form, 485, 515–539 Jordan decomposition, 515 Jordan’s lemma, 248–256, 1021, 1022

K Kernel, 468, 529, 655–657, 912–914, 1263–1265, 1283–1285 Ket vector, 22, 549, 552 Klein-Gordon equation, 931–934, 946, 991, 992, 994, 1097, 1197, 1200–1204

Klein-Gordon field, 991, 994, 999, 1028 Kronecker delta, 50, 937 Kronecker product, 1149

L Laboratory coordinate system, 381 Laboratory frame, 1121, 1136, 1138 Ladder operator, 76, 93, 94, 114 Lagrange’s equations of motion, 982, 984, 985, 993 Lagrangian, 977–985, 991–993, 1028–1035, 1049, 1055–1059, 1079–1082, 1084, 1085 Lagrangian density, 991–993, 1028, 1049, 1055–1059, 1079, 1080, 1084 Lagrangian formalism, 977–985, 993 Laguerre polynomials, 59, 110, 118–124 Laplace operator, 8, 761 Laplacian, 8, 761, 762 Laser device, 372, 375, 395 material, 371, 375, 377 medium, 367–369, 371, 380 organic, 372–395 Laser oscillation multimode, 376, 394 single-mode, 394, 395 Lasing property, 372 Laurent’s expansion, 227, 229, 236, 240, 241 Laurent’s series, 223–230, 234, 236, 241 Law of conservation of charge, 282 Law of electromagnetic induction, 280 Law of equipartition of energy, 353 LCAO, see Linear combination of atomic orbitals (LCAO) Least action principle of, 984, 992, 993 Left-circularly polarized light, 138, 140, 146, 150, 301, 302, 637 Legendre differential equation, 95, 96, 450 Legendre polynomials, 59, 88, 94, 95, 101, 103, 107–109, 273 Legendre transformation, 983, 984 Leibniz rule, 94, 96, 103, 120 Leibniz rule about differentiation, 94, 96 Levi-Civita symbol, 904 Lie algebra, 618, 626, 636, 827, 830, 896–915, 921, 922, 926, 1189–1193, 1241, 1242, 1250, 1256, 1259, 1266, 1269, 1271–1304 Lie group linear, 897, 898, 908, 917, 921, 922, 926, 1282

Light amplification, 367, 368 Light amplification by stimulated emission of radiation (LASER), 367 Light-emitting devices, 377, 378, 391, 392 Light-emitting materials organic, 372 Light-emitting properties, 372 Lightlike, 940 Lightlike vector, 1184–1186 Light propagation, 151, 277, 331, 377, 379, 381, 387 Light quantum, 3, 4, 358, 359 Limes superior, 226 Limit, 208, 212–214, 218, 219, 226, 233, 267, 323, 438, 440, 613, 617, 729, 830, 866, 889, 900, 960, 1018, 1121, 1190, 1272, 1283, 1302 Linear algebra, 518 Linear combination of atomic orbitals (LCAO), 755, 761, 768 of basis vectors, 545 of functions, 128 of a fundamental set of solutions, 426 Linearly dependent, 17, 96, 103, 109, 309, 355, 403–405, 458, 467, 469, 479, 493, 494, 496, 553, 554, 557, 558, 568, 950, 1164, 1183 Linearly independent, 15, 17, 32, 73, 309, 353, 357, 383, 403–405, 426, 457–459, 461, 462, 470, 479, 482, 489, 493–495, 497, 500, 503–505, 516, 517, 540, 552, 554, 558, 559, 567, 569, 571, 600, 622, 640, 690, 691, 710, 714, 719, 728, 736, 741, 742, 812, 814, 840, 851, 946, 950, 952, 974, 975, 1142–1144, 1146, 1147, 1156, 1158, 1163, 1164, 1183, 1236, 1280 Linearly polarized light, 144, 146, 295, 303, 1061 Linear transformation endomorphic, 475, 648 group, 648 successive, 480, 482, 695 Linear vector space finite-dimensional, 457, 473, 474, 1157 n-dimensional, 457, 458, 461, 465, 515 Lithography, 395 Local properties, 920–926 Logarithmic branch point, 264 Logarithmic functions, 256, 1252 Longitudinal multimode, 350, 370, 374, 394 Loop, 306, 922, 924, 1250

Lorentz algebra, 1278–1287 boost, 1141, 1192, 1194–1197, 1205, 1206, 1212–1223, 1229, 1230, 1232, 1233, 1235, 1236, 1238, 1249, 1262, 1281 condition, 1058, 1059, 1069, 1070 contraction, 943, 944 force, 150, 637–639, 642, 643, 1075–1079 gauge, 1058, 1084 group, 942–944, 1141, 1187–1197, 1223, 1241–1258, 1263, 1264 invariant, 1006, 1007, 1009–1011, 1015, 1027, 1049, 1052, 1123 transformation, 934–940, 942, 943, 957, 971, 1187–1200, 1203, 1205, 1207, 1212, 1221, 1223, 1224, 1228, 1236, 1247, 1256, 1257, 1261, 1302, 1303 Lowering operator, 93, 110, 114 Lower left off-diagonal elements, 475 Lower triangle matrix, 488, 539, 577, 632 Lowest-unoccupied molecular orbital (LUMO), 780, 781, 797 L-S Coupling, 879 Luminous flux, 322

M Magnetic field, 150, 277, 279–281, 284, 285, 292–294, 307, 313–317, 320, 321, 334, 336, 338, 357, 365, 383–385, 396, 637 Magnetic flux, 280 Magnetic flux density, 151, 280 Magnetic permeability, 279, 380 Magnetic quantum number, 143, 144 Magnetic wave, 155, 293, 295 Magnetostatics, 279 Magnitude relationship, 434, 720, 825 Major axis, 301 Mapping bijective, 926, 1157, 1158, 1168, 1284, 1285, 1287, 1301 bilinear, 1142–1149, 1155, 1172 continuous, 916, 917, 926 homomorphic, 655, 656, 1265, 1272, 1285–1287 identity, 1264, 1301 injective, 653 inverse, 472, 1149, 1157 invertible, 472 isomorphic, 657, 688 multilinear, 1171–1173 non-invertible, 472 reversible, 472

surjective, 657, 912 universal bilinear, 1149 Mathematical induction, 49, 82, 115, 117, 488, 490, 493, 568, 582, 595, 609, 1171 Matrix adjugate, 477 algebra, 43, 317, 382, 457, 468, 476, 501, 556, 853, 952, 971, 1141, 1193, 1242 decomposition, 515, 539, 585, 587, 725 equation, 531, 949 function, 915, 1250, 1255, 1256 mechanics, 11 nilpotent, 93, 94, 500–505, 511–521, 526, 536, 539, 577, 633 self-adjoint, 555 semi-simple, 513, 536, 539 singular, 579, 1161 skew-symmetric, 616, 618, 626, 902, 903, 907, 921, 1288, 1302 symmetric, 133, 592, 594–597, 631, 1160, 1170, 1197, 1247, 1248, 1251–1253, 1262, 1302 transposed, 405, 498, 549, 562, 1035 Matrix element of electric dipole, 144 of Hamiltonian, 766, 767 Matter field, 1076 Matter wave, 6, 7 Maximum number, 458 Maxwell, J.C., 332, 333, 380, 381, 1049–1053, 1058 Maxwell’s equations, 277–303, 332, 333, 380, 381, 1049, 1050, 1052, 1053, 1058 Mechanical system, 33, 124, 396–399 Meromorphic, 232, 238 Meta, 794, 795 Methane, 620, 622, 640 Method of variation of constants, 685, 755, 805–826 Metric Euclidean, 939, 1165 indefinite, 476, 937, 1058, 1065–1071, 1187 Minkowski, 936, 937, 1052 space, 202, 204, 226, 227, 550, 937 tensor, 936, 1165, 1170, 1175–1178, 1187 Michelson–Morley experiment, 940 Microcausality, 1015 Minimal coupling, 1075–1078, 1085 Minimal substitution, 1076 Minkowski space four-dimensional, 936, 1165 n-dimensional, 1177

Minor, 296, 397, 473, 476, 559, 597 Minor axis, 301 Mirror symmetry plane of, 667, 671, 673, 675 Mode density, 353–358 Modulus, 202, 219, 242, 866, 1256 Molecular axis, 674, 783, 819, 823 Molecular long axis, 783, 804, 805 Molecular orbital energy, 761 Molecular orbitals (MOs), 717, 746, 761–826 Molecular science, 3, 737, 750, 755, 826, 1304 Molecular short axis, 804 Molecular states, 755 Molecular systems, 746, 749 Momentum, 3–5, 7, 28–30, 33, 57, 62, 396, 932, 958, 960, 977, 991, 1002, 1005, 1018, 1024, 1073, 1105, 1112, 1116, 1123, 1136 Momentum operator, 29–31, 33, 63, 79–94, 147, 1003 Momentum space representation, 1017, 1018, 1049, 1073 Monoclinic, 380 Monomials, 841, 886, 888 Moving coordinate system, 695, 696, 698, 861, 923, 1193, 1229 Multiple roots, 258, 487, 500, 505, 543, 544, 583 Multiplication table, 649, 674, 675, 714, 727 Multiplicity, 495, 508, 509, 513, 520, 525, 586, 589, 600, 809 Multiply connected, 199, 229 Multivalued function, 256–274

N Nabla, 64, 933 Nano capacitor, 161 Nano device, 164 Naphthalene, 674, 684 Natural units, 934, 935, 991, 993, 995, 1002, 1049, 1050, 1086, 1095, 1137 Necessary and sufficient condition, 195, 198, 201, 405, 473, 476, 486, 554, 579, 580, 648, 650, 655, 855, 918, 1143, 1179, 1235 Negation, 192, 196 Negative-energy solution, 951, 955–960, 962, 965, 1209, 1212, 1218 state, 948, 949, 1026, 1027, 1036, 1220, 1221, 1231 Negative helicity, 302

Neumann conditions, 27, 338, 416 Newtonian equation, 31 Newton’s equation of motion, 981 Nilpotent, 93, 94, 500–505, 511–521, 525, 526, 528, 539, 577, 633, 1292 Node, 337, 338, 347–349, 397, 773, 774, 817 Non-commutative, 3, 28, 603, 960 Non-commutative group, 648, 664, 772 Non-compact group, 1192, 1263 Non-degenerate, 155–158, 167, 168, 180, 600, 1163, 1178, 1180, 1181 Non-degenerate subspace, 1180, 1181 Non-equilibrium phenomenon, 368 Non-linear equation, 1075, 1091 Non-magnetic substance, 288, 295 Non-negative, 27, 34, 51, 70, 71, 77, 96, 202, 220, 249, 250, 406, 550, 560, 571, 592, 840, 849, 886 Non-negative operator, 76, 558 Non-relativistic approximation, 7, 11, 12 limit, 1121 quantum mechanics, 959 Non-singular matrix, 457, 480, 481, 488, 489, 492, 502, 513, 525, 535, 540, 570, 593–596, 614, 707, 711, 896, 906, 938, 952, 1161, 1168, 1170, 1242–1249 Non-trivial, 403, 458, 1264 Non-trivial solution, 405, 421, 430, 449, 548 Non-vanishing, 15, 16, 93, 130, 134, 139, 149, 230, 231, 235, 282, 306, 416, 617, 681, 767, 771, 797, 812, 1044, 1133 Non-zero coefficient, 310 constant, 27 determinant, 480 eigenvalue, 525, 527, 1163 element, 89, 90 vector, 70 Norm, 24, 55, 550, 551, 555, 557, 564–567, 578, 585, 590, 607, 610, 789, 790, 1068, 1069, 1092 Normalization constant, 37, 46, 53, 114, 568, 954, 957, 965 Normalized, 18, 24, 25, 29, 34, 35, 37, 40, 55, 70, 78, 109, 117, 123, 130, 134–136, 158, 369, 568, 753, 769, 779, 813, 885, 1026, 1031, 1045, 1079, 1209, 1211, 1290 Normalized eigenfunction, 37, 801 Normalized eigenstate, 55 Normalized eigenvector, 157 Normal operator, 578–581, 585, 604, 694, 1203, 1235 Normal-ordered product, 1005, 1016, 1095

Normal product, 1005, 1103, 1105 N-product, 1016, 1095–1103, 1111 n-th Root, 257–259 Nullity, 469 Null-space, 468, 529 Number density operator, 1002 Number operator, 41, 1003 Numerical vector, 459, 1146

O O(3), 690, 918, 919, 921 O(3, 1), 1187–1190, 1250–1256, 1258 O(n), 902, 903, 906, 907, 1254 Oblique incidence of wave, 305, 315 Observable, 588, 590 Octants, 682 Odd function, 51, 134, 161, 1014 Odd permutation, 475, 904 Off-diagonal elements, 475, 488, 539, 584, 664, 666, 770, 771, 788, 815, 936, 1034, 1242, 1245 One-dimensional harmonic oscillator, 31, 59, 128, 132 infinite potential well, 20 position coordinate, 33 representation, 772, 787, 828 system, 130–134 One-parameter group, 896–905, 921 One-particle Hamiltonian, 960–964, 1031, 1077 One-particle wave function, 1031 Open ball, 227 Open neighborhood, 201 Open set, 190–193, 195, 196, 199, 201, 219 Operator, 3, 31, 63, 132, 155, 278, 359, 401, 482, 505, 549, 571, 636, 673, 710, 755, 827, 932, 989, 1077, 1141, 1274 Operator method, 33–41, 134 Optical device high-performance, 390 Optical path, 339, 340 Optical process, 127, 358 Optical resonator, 372, 373 Optical transition, 127–152, 155, 359, 743, 746, 749, 767, 780, 781, 783, 797, 802, 803, 824–826 Optical waveguide, 341 Optics, 305, 351, 377 Orbital energy, 761, 772, 818, 823 Order of group, 675, 722 Ordinate axis, 131 Organic materials, 375 Orthochronous Lorentz group

proper, 1189, 1258 Orthogonal complement, 571–574, 714, 717, 1178, 1179, 1181, 1184, 1186, 1187 coordinate, 202, 381, 1184, 1185 decomposition, 1184 group, 689–694, 902 matrix, 569, 593, 594, 598, 616, 618, 641, 664, 690, 693, 698, 762, 851, 861, 862, 902, 907, 939, 1160, 1161, 1247, 1248, 1251–1253, 1256, 1263 similarity transformation, 1160 transformation, 662, 689, 695, 696, 912, 939, 1160, 1198 Orthonormal base, 8, 1182 basis set, 63, 571, 576, 610, 746, 766, 877, 972 basis vectors, 61, 63, 662, 677, 678, 742, 756, 772, 980 eigenfunctions, 40 eigenvectors, 157, 583 system, 109, 155, 159, 873, 989, 1091, 1212 Orthonormalization, 567, 580, 742, 964, 965, 1165, 1179 Oscillating electromagnetic field, 129 Out-coupling of emission, 379, 380, 390 of light, 377 Over damping, 442 Overlap integrals, 743, 766, 779, 780, 795, 804, 813, 816, 817

P Pair annihilation, 1115 Pair creation, 1115 Paper-and-pencil, 824 Para, 794, 795 Parallelepiped, 367, 368, 372 Parallelogram, 878 Parallel translation, 286 Parameter space, 861–867, 923, 924, 1192 Parity, 19, 51, 132 Partial differential equations, 277 Partial differentiation, 9, 10, 291, 333, 1095 Partial fraction decomposition method of, 239, 1213 Partial sum, 163, 224 Particular solution, 173, 405 Path, 199, 200, 214–217, 219, 229, 339, 340, 370, 922, 1018, 1023 Pauli spin matrices, 839, 947, 971, 1204, 1259 Periodic conditions, 27, 416 Periodic function, 259, 914

Periodic modulation, 373, 374 Periodic wave, 287 Permeability, 151, 279, 316, 380, 1049 Permittivity, 60, 110, 278, 279, 316, 380, 395, 594 Permutation, 405, 475, 667, 689, 697, 703, 904, 1095, 1101, 1111 Perturbation calculation, 1007, 1017 method, 36, 155–176, 183, 1075, 1078, 1102, 1103 theory, 176, 1075 Perturbed state, 157, 168 Phase change, 329, 342 difference, 295, 339, 340, 396 factor, 18, 40, 76, 79, 86, 122, 135, 143, 293, 568, 957, 959, 967, 970, 1082, 1211 matching, 379, 380, 386, 387, 394 matching conditions, 380, 394 refractive index, 376, 377, 384, 390 shift, 328, 329, 331, 341, 342 space, 1002 velocity, 7, 8, 286, 288, 289, 313, 338, 343 Photoabsorption, 134 Photoemission, 134 Photon absorption, 134, 137, 140 emission, 134, 137, 140 energy, 368, 825, 1138 field, 977, 999, 1005, 1011, 1023, 1026, 1027, 1049, 1063, 1071, 1072, 1084, 1096, 1098, 1102, 1110, 1117, 1129, 1130, 1133, 1136, 1137 Physical laws, 940 Physicochemical properties, 772, 780 Picture Heisenberg, 1087, 1089–1091 interaction, 1085–1091 Schrödinger, 1087, 1089–1091 π-Electron approximation, 773–805 Planar molecule, 677, 773, 783, 791 Planck, M., 3, 351, 358 Planck constant, 3, 4, 129 Planck’s law of radiation, 351, 353–361 Plane of incidence, 311 Plane of mirror symmetry, 667, 671, 673 Plane wave electromagnetic, 290 expansion, 1092 solution, 334, 931, 948–955, 960, 965, 1209, 1227 Plasma etching, 395

Plastics, 295 Point group, 661, 664, 674, 678, 680, 681, 685, 772, 782, 806 Polar coordinate, 30, 47, 65, 149, 175, 202, 249, 1218 Polar coordinate representation, 149, 264, 867, 931, 1124, 1195, 1218 Polar decomposition of a matrix, 593, 1212, 1242, 1245, 1247, 1302 Polar form, 202, 203, 245, 257 Polarizability, 168–170, 175 Polarization elliptic, 277 left-circular, 302 longitudinal, 1059 photon, 1106, 1124–1126 right-circular, 302 scalar, 1059 transverse, 1059 vacuum, 1109, 1110 vector, 129, 136, 137, 140, 293, 313–316, 319, 326, 336, 343, 347, 363–365, 396, 767, 781, 824, 1059–1061 Pole isolated, 232, 239 of order n, 231, 235, 1047 simple, 231, 235, 237, 239, 243–245, 247, 266, 267, 271, 274, 1012, 1019, 1023 Polyene, 805 Polymer, 277, 288 Polynomial exponential, 309 homogeneous, 983 minimal, 499, 500, 502, 526, 527, 535, 540–544 Population inversion, 368 Portmanteau word, 196 Position coordinate, 33 Position operator, 29, 30, 57, 359, 366 Position vector, 8, 61, 129, 131, 136, 151, 169, 288, 289, 308, 361, 362, 364, 463, 564, 638, 662, 756, 761, 767, 774, 782, 802, 828, 933, 934, 959 Positive definite, 17, 27, 34, 36, 296, 297, 558, 559, 569, 571, 592, 593, 595, 707, 905, 937, 939, 965, 1141, 1158, 1165, 1177, 1178, 1197, 1242–1248, 1250–1252, 1256–1258, 1262, 1263, 1302 Positive definiteness, 296, 571, 592, 593, 1158 Positive-definite operator, 27, 1245 Positive-energy solution, 952, 955, 962, 1208, 1210, 1212 state, 948, 949, 957, 1026, 1027, 1036, 1040, 1220, 1231

Positive helicity, 302 Positive semi-definite, 27, 558 Positron, 1036, 1039, 1082, 1092, 1096, 1098, 1104, 1108, 1110, 1112, 1113, 1115 Power series expansion, 110, 118, 120, 123, 124, 203, 204, 223, 608, 617 Poynting vector, 319–322, 347 Primitive function, 215 Principal axis, 381, 392, 393 Principal branch, 259, 263 Principal diagonal, 521, 523 Principal minor, 296, 477, 559, 595, 597 Principal part, 230 Principal submatrix, 382, 477 Principal value of branch, 259 of integral, 243, 244 of ln z, 267 Principle of constancy of light velocity, 934, 940, 1085 Probabilistic interpretation, 944, 991, 1031 Probability density, 128, 129, 131, 142, 931–933 distribution, 128 distribution density, 131 Projection operator sensu lato, 738, 1232, 1235, 1238, 1240 sensu stricto, 741, 1214, 1233–1236, 1238, 1239 Proof by contradiction, 195 Propagating electromagnetic wave, 305 Propagation constant, 337, 338, 343, 376, 377, 379 Propagation vector, 379 Propagation velocity, 7, 286 Proper rotation, 666, 669 Proposition converse, 617, 854 P6T, 377, 378, 380, 381, 383–385, 390–392 Pure imaginary, 29, 326, 342, 442, 901, 944 Pure rotation group, 680, 685

Q QED, see Quantum electrodynamics (QED) Q-numbers, 995, 1026 Quadratic equation, 180, 297, 442, 551, 800 Quadratic form Hermitian, 54, 557, 558, 560, 592–599, 1165 real symmetric, 571 Quantization canonical, 993, 1040, 1059, 1061–1065 field, 986, 991, 993, 995, 1005, 1023–1028, 1035, 1039, 1040, 1063, 1187

Quantum chemical calculations, 737, 755 Quantum chemistry, 791, 1304 Quantum electrodynamics (QED), 399, 931, 971, 977, 991, 1049, 1075, 1080, 1084, 1092, 1096, 1103–1113, 1139, 1140, 1238 Quantum electromagnetism, 1061 Quantum-mechanical, 3, 21–27, 30, 59, 124, 155, 169, 605, 931, 959, 987, 1032, 1069 Quantum-mechanical harmonic oscillator, 31–58, 109, 111, 156, 164, 558, 1201 Quantum mechanics, 3, 6, 11, 12, 19, 28, 29, 31, 59, 60, 155–183, 351, 366, 401, 450, 549, 571, 699, 774, 903, 959, 977, 1075, 1080, 1304 Quantum numbers azimuthal, 122, 149 magnetic, 143, 144 orbital angular momentum, 110, 124, 144 principal, 122–124 Quantum operator, 29, 1045 Quantum state, 21, 32, 54, 77, 93, 114, 123, 128, 129, 134, 144, 145, 148, 155–164, 167, 168, 171, 176, 366, 767, 878, 880, 957, 960, 967, 1026, 1086, 1294 Quantum theory of light, 368 Quartic equation, 148, 950 Quotient group, 653

R Radial coordinate, 202, 834, 852, 867 Radial wave function, 109–124 Radiation field, 129, 139, 140, 1065 Raising and lowering operators, 93 Raising operator, 76, 93 Rank, 468, 469, 505, 522, 527, 530–532, 548, 950, 955, 958, 1174, 1175, 1219, 1238 Rapidity, 943 Rational number, 203 Rayleigh–Jeans law, 351, 357 Real analysis, 186, 222 Real axis, 18, 37, 185, 202, 242, 250, 254, 255, 263, 267, 269 Real domain, 205, 222, 234, 273, 592 Real form, 1268, 1289, 1293, 1298–1300 Real function, 51, 130, 143, 144, 205, 330, 414, 420, 423, 425, 426, 778, 783, 790, 933, 1082, 1083 Real number field, 1142, 1155, 1259 line, 207, 418 Real space, 828, 834, 837

Real symmetric matrix, 592–594, 597, 1160, 1197, 1251 Real variable, 14, 25, 205, 212, 213, 225, 237, 411, 607 Rearrangement theorem, 649, 707, 721, 728, 730, 739 Rectangular parallelepiped, 367, 368, 372 Recurrence relation, 86 Redshift, 4, 388, 392 Redshifted, 389–392, 394 Reduced mass, 60, 61, 110, 135, 761 Reduced Planck constant, 4, 129 Reducible, 497, 712, 714, 725, 726, 728, 768, 772, 775, 780, 786, 792, 797, 854, 855, 860, 873, 876, 878, 895 Reduction, 583, 712 Reflectance, 321, 328, 349 Reflected light, 312 Reflected wave, 321, 324, 336, 339, 348 Reflection angle, 312 coefficient, 317, 328, 336, 338, 349, 357 Refraction angle, 312, 319, 326 Refractive index of dielectric, 288, 305, 324, 325, 332, 337, 338 group, 371 of laser medium, 369, 371 phase, 376, 377, 384, 390, 393 relative, 313, 323, 329 Region, 151, 204, 210, 213–218, 221, 225–230, 232–234, 294, 295, 328, 330, 332, 343, 351, 363, 438, 873, 922, 1250 Regular hexagon, 791 Regular point, 212, 223 Regular singular point, 173 Regular tetrahedron, 687, 805 Relations between roots and coefficients, 487 Relative coordinates, 59–61 Relative permeability, 279 Relative permittivity, 279 Relative refractive index, 313, 323, 324, 329 Relativistic effect, 949 Relativistic field, 993, 1023, 1027, 1055, 1097 Relativistic quantum mechanics, 959 Relativity special principle of, 940 Renormalization, 1110, 1140 Representation adjoint, 905–915, 1261, 1273–1278, 1286 antisymmetric, 750–753, 890–892, 895, 1297 complex conjugate, 789, 846 conjugate, 651

Representation (cont.) coordinate, 3, 31, 45–53, 110, 114, 124, 129, 131, 134, 139, 149, 161, 167, 170, 172, 174, 175, 179, 181, 182, 264, 418, 819, 852, 867, 931, 934, 936, 954, 1023, 1049, 1086, 1124, 1184, 1193, 1195, 1198, 1199, 1201, 1237, 1245 differential, 1264, 1271–1287, 1290, 1291 dimension of, 706, 711, 713, 729, 746, 772, 860 Dirac, 947 direct-product, 746–751, 766, 767, 797, 824, 873–876, 880, 890–892, 894 of groups, 705, 749 Heisenberg, 1087 irreducible, 712, 714, 717, 720–722, 724–726, 728, 729, 734, 735, 737, 738, 742–746, 766–768, 771–773, 775–777, 780, 782, 784, 786, 797–799, 801, 803, 806, 808, 810–812, 814, 815, 817, 823–825, 853–861, 868, 870, 871, 873, 876–878, 891, 1291–1293 matrix, 3, 31, 34, 42–45, 89, 92, 93, 464, 465, 467, 475, 477, 480–482, 495, 517, 521, 530, 531, 545, 561–564, 577, 601, 603, 666, 670, 675–679, 685, 689, 717, 721, 756, 764, 772, 785, 828, 837, 856, 891, 894, 912, 961, 1141, 1150, 1196, 1261, 1291, 1292 one-dimensional, 768, 772, 782, 787, 789, 810, 828, 1201, 1245 reducible, 712, 725, 726, 728, 768, 772, 775, 786 regular, 726–734 Schrödinger, 1087 space, 1018, 1049, 1073 subduced, 776, 786, 792 symmetric, 750–753, 760, 765, 767, 768, 771, 777, 780, 786, 797, 811, 824, 891 theory, 690, 705–753, 755, 771, 827, 849, 874, 1281, 1294, 1297 three-dimensional, 831, 849, 1290 totally symmetric, 760, 765, 767, 768, 771, 777, 780, 786, 797, 811, 824 two-dimensional, 753, 789 unitary, 705, 706, 709, 713, 714, 741, 855, 860, 1192, 1263 Representative element, 1255 Residue, 234–237, 239, 240, 244, 267, 1014, 1047, 1048 Resolvent, 607, 621–626, 628–633, 635, 636, 641, 644 Resolvent matrix, 621–626, 628–633, 635, 636, 641, 644 Resonance integral, 778, 794

Resonance structures, 714, 715 Resonant structures, 784 Resonator, 350, 372, 373, 375 Riemann sheet, 260 Riemann surface, 256–274 Right-handed system, 293, 314, 319, 397 Rigid body, 690, 755 Rodrigues formula, 88, 95, 97, 102, 118, 273 Root complex, 243, 442, 443 double, 257, 297, 442, 501, 503, 537, 545, 591, 716, 809, 968 multiple, 258, 487, 500, 504, 505, 541, 543, 544, 548, 583 simple, 529, 545 square, 258, 297, 382, 818 triple, 257, 501, 529–531, 786 Rotation angle, 664, 666, 667, 673, 693, 828, 861, 863–866, 868, 874–876, 919, 1204, 1222 axis, 665–668, 673, 674, 683–685, 688, 690–694, 700–703, 861, 862, 864, 868, 869, 874, 919, 924, 925, 1302 group, 680, 685, 689, 690, 827, 834–895, 920, 1141 improper, 669, 673, 690, 807, 920 matrix, 641, 664, 689–694, 849, 868, 919 proper, 666, 669 symmetry, 666 transformation, 661 Row vector, 8, 405, 459, 491, 546, 554, 566, 567, 728, 860, 936, 937, 1030, 1146, 1174 Ruled surface, 1185, 1186

S SALC, see Symmetry-adapted linear combination (SALC) Scalar, 278, 284, 458, 459, 471, 499, 513, 552, 602, 977, 1017, 1028, 1030, 1059, 1069, 1079, 1099, 1100, 1102, 1132, 1133, 1197, 1203, 1207 Scalar field, 977, 991–1028, 1030, 1032, 1035, 1036, 1040, 1041, 1043, 1048–1051, 1058, 1061–1065, 1067, 1071, 1075, 1096, 1097, 1201 Scalar function, 286, 287, 755, 1005, 1017, 1056, 1058, 1200, 1202, 1203 Scalar potential, 1050, 1051 Scattering Bhabha, 1111 cross-section, 1120–1124, 1136 Møller, 1111

process, 4, 5, 176, 1108, 1111, 1119 Schrödinger, E., 8 Schrödinger equation of motion, 60 time-evolved, 128 Schur’s First Lemma, 717, 720, 723, 810, 852, 869 Schur’s lemmas, 717–723, 853 Schur’s Second Lemma, 717, 719, 721, 734, 764, 854, 855, 860, 1265 Second-order differential operator, 411–416 Second-order extension, 1301 Second-order linear differential equation (SOLDE), 3, 14, 15, 33, 45, 72, 73, 85, 383, 401–406, 410–412, 417, 424, 447, 448, 451, 607, 618–620, 945, 986 Secular equation, 530, 755, 770–773, 777, 788, 794, 798, 799, 801, 812, 814, 815, 823 Selection rule, 127–152, 746 Self-adjoint, 411, 415, 416, 424, 432, 449, 450, 452, 555 Self-adjoint operator, 413, 422, 424, 449 Self-energy electron, 1109, 1139 photon, 1109–1111, 1139 Sellmeier’s dispersion formula, 371, 375, 376 Semicircle, 237–239, 243, 245, 246, 249, 250, 255, 1020–1022 Semiclassical, 127 Semiclassical theory, 150 Semiconductor crystal, 372, 373, 375, 376, 395 organic, 372, 373, 375, 376, 395 processing, 395 Semi-infinite dielectric media, 305, 306 Semi-simple, 512–515, 536, 539, 543, 546, 1230, 1235 Separation axiom, 200, 210 Separation condition, 200 Separation of variables, 12, 69–74, 110, 127, 353, 1086 Sesquilinear, 552 Set difference, 188 finite, 188, 1083, 1086 infinite, 188, 1083, 1086 of points, 232 theory, 185, 186, 202, 204, 274 Signature, 1161 Similarity transformation, 485, 487–489, 492, 502, 504, 511–513, 523, 525, 533, 535, 537, 540, 542, 556, 577, 580, 581, 583, 584, 587, 593, 603, 614, 627, 631, 635, 681, 694, 705, 707, 709, 712, 834, 851, 869, 895, 906, 919, 951, 952,

1323 1087–1089, 1141, 1160, 1161, 1196, 1197, 1203, 1219, 1220, 1223, 1235, 1238, 1242, 1243, 1246 Simple harmonic motion, 32 Simply connected, 199, 200, 215–219, 221, 225, 228, 229, 234, 920–926, 1241, 1249, 1250, 1258, 1263 Simply reducible, 876, 878, 895 Simultaneous eigenfunction, 89 Singleton, 201 Single-valuedness, 212, 221, 259, 260 Singularity essential, 232 removable, 230, 252 Singular part, 230, 232 Singular point isolated, 242 regular, 173 Sinusoidal oscillation, 129, 131, 135 Skew-symmetric, 616, 618, 626, 902, 903, 907, 921, 1288, 1302 S-matrix element, 1104–1108, 1111, 1115, 1116, 1120 expansion, 1091–1095, 1139 Snell’s law, 313, 323, 325, 326, 343 SO0(3, 1), 1189, 1255, 1256, 1258, 1261, 1263, 1264, 1266, 1281, 1284–1287, 1298 Solid angle, 681, 684, 702, 1124 Solid-state chemistry, 395 Solid-state physics, 395 Space coordinates, 127, 934, 936, 1030 Space group, 663 Spacelike, 940, 1015, 1186 Spacelike separation, 1015 Spacelike vector, 1060, 1184, 1186 Space-time coordinates, 934–936, 946, 987, 998, 1039, 1082, 1102 Space-time point, 939, 1193 Span, 459–462, 468–470, 480, 495–498, 501, 517, 529, 539, 548, 572, 600, 602, 691, 693, 764, 771, 788, 791, 808–810, 823, 828, 846, 869, 873, 874, 886, 891, 915, 1143, 1157, 1181–1184, 1186, 1187, 1288, 1289 Special functions, 59, 109, 643 Special linear group, 1249–1258 Special orthogonal group SO(3), 661, 689–704, 827, 830, 834–874, 876, 878, 895, 896, 898, 904, 909, 911–915, 918, 919, 921, 923, 924, 926, 1241, 1265, 1266, 1276, 1286, 1287 SO(n), 902, 903, 906, 907, 918 Special solution, 8 Special theory of relativity, 931, 934–944, 1256

Special unitary group SU(2), 827, 830, 834–880, 883, 896, 898, 904, 909–914, 923, 924, 926, 1207, 1241, 1242, 1249, 1250, 1258, 1262, 1265, 1266, 1276, 1286, 1287, 1294 SU(n), 837, 902, 903, 906, 907 Spectral decomposition, 587–589, 604, 605, 694, 1232, 1233, 1235–1241 Spectral lines, 129, 279, 370, 755 Spherical basis, 834, 850 Spherical coordinate, 60, 62, 63, 834, 852, 978 Spherical surface harmonics, 89, 94–105, 109, 171, 846, 869, 873, 915 Spherical symmetry, 59, 60, 922 Spinor Dirac, 957, 958, 961–963, 1201, 1202, 1207–1212, 1214–1217, 1220, 1223, 1227, 1236, 1237, 1256, 1257 space, 1187 Spin state, 123 Spring constant, 31 Square-integrable, 26 Square matrix, 457, 488, 490, 495, 497, 498, 501, 512, 513, 523, 608, 609, 614, 711, 718, 719, 723, 830, 858, 975, 1125, 1258 Standard deviation, 54, 57 Stark effect, 176 State vector, 129, 133, 135, 157, 1086, 1087, 1089, 1091 Stationary current, 283 Stationary wave, 343–350, 355, 396 Step function, 422 Stimulated absorption, 358, 359, 368 Stirling’s interpolation formula, 612 Stokes’ theorem, 306, 307 Strictly positive, 17 Structure constants, 903, 926, 1095, 1139, 1277, 1286 Structure-property relationship, 372 Sturm-Liouville system, 450 Subalgebra, 1268, 1269, 1290, 1293, 1298–1300 Subgroups, 647, 649–652, 656, 659, 675, 677, 685, 688–690, 732, 776, 784, 786, 791, 897, 906, 914, 917, 918, 921, 1252, 1255 Submatrix, 382, 476, 477, 583, 809, 891, 892, 951, 1213 Subset, 186–198, 200, 201, 204, 233, 459, 649–651, 655, 902, 915–918, 921, 1263 Subspace, 190, 199, 200, 204, 459–463, 468–470, 486, 495–501, 504, 505, 508,

514, 517, 528–531, 539, 540, 544, 548, 571–574, 583, 584, 602, 603, 649, 650, 665, 713, 714, 717, 808, 809, 902, 1142, 1145, 1178–1184, 1186, 1187, 1252, 1269, 1289, 1290, 1298–1300 Superior limit, 226 Superposed wave, 346, 347 Superposition of waves, 294–303, 336, 344–346, 348 Surface term, 409–411, 413, 416, 419, 424–426, 430, 432, 436–440, 446, 450, 621 Surjective, 472–474, 478, 653, 655, 657, 912, 924, 1266 Sylvester’s law of inertia, 1161 Symmetric, 59, 60, 69, 133, 175, 179, 296, 360, 386, 422, 425, 571, 592–597, 616, 618, 626, 627, 630, 631, 663, 664, 688, 689, 750–753, 760, 765, 767, 768, 771, 777, 780, 786, 797, 811, 824, 825, 835, 864, 878, 890–892, 895, 898, 902, 903, 907, 921, 937, 943, 995, 1051, 1137, 1159–1161, 1170, 1192, 1197, 1221, 1247, 1248, 1251–1253, 1256–1258, 1262, 1288, 1296, 1297, 1302 Symmetry discrete, 1256 groups, 648, 661–704, 761, 764, 782 operations, 661, 663–669, 671, 673–680, 685–687, 710, 716, 717, 728, 737, 755, 759, 763–765, 771–773, 776, 777, 784, 787, 797, 806, 809, 810 requirement, 179, 671, 673, 813, 815–817 species, 685, 746, 772, 784, 785, 802, 824 transformation, 774, 775, 786, 792, 798, 807, 808 Symmetry-adapted linear combination (SALC), 755, 771–773, 777–778, 784, 787, 788, 791, 792, 794, 798, 800, 801, 805, 810, 811, 813–816, 891, 892 Syn-phase, 347, 349 System of differential equations, 607–637, 643, 644 System of linear equations, 476, 478

T Tangent space, 1187 Taylor’s expansion, 223, 225, 229 Taylor’s series, 99, 223–229, 231, 241, 245, 250, 272 Td, 673, 680–689, 806–808, 811, 825, 920 TE mode, see Transverse electric (TE) mode

Tensor contraction, 1177 electric permittivity, 380 equation, 1166 invariant, 1175 permittivity, 380, 395, 594 product, 1149–1155, 1171–1173, 1175, 1176 Termwise integration, 224, 228 TE wave, see Transverse electric (TE) wave θ function, 444 Thiophene, 674, 727 Thiophene/phenylene co-oligomers (TPCOs), 373, 375–377 Three-dimensional, 8, 59–61, 134–144, 168, 287, 288, 334, 354, 361, 460, 461, 477, 523, 529, 661, 664, 665, 677, 685, 689, 757, 759, 811, 828, 831, 849, 895, 958, 994, 1002, 1060, 1061, 1116, 1183–1186, 1276, 1290 Three-dimensional Cartesian coordinate, 8, 677 Time-averaged Poynting vector, 320–322 Time coordinate, 934–936, 939, 946, 948, 977, 987, 994, 998, 1030, 1039, 1082, 1102, 1185 Time dilation, 944 Timelike, 940, 1061, 1186 Timelike vector, 1060, 1084, 1186 Time-ordered product (T-product), 977, 1007, 1015–1017, 1043, 1094–1103, 1114, 1146 TM mode, see Transverse magnetic (TM) mode TM wave, see Transverse magnetic (TM) wave Topological group, 196, 827, 830, 926 Topological space, 185, 189–191, 194, 196, 199, 200, 204, 921, 926, 1142 Topology, 185–204 Torus, 199 Total internal reflection, 332, 338–343, 386 Totally symmetric, 760, 765, 767, 768, 771, 777, 780, 786, 797, 811, 824 Totally symmetric ground state, 797, 824 Total reflection, 305, 313, 325–331, 334, 342 TPCOs, see Thiophene/phenylene co-oligomers (TPCOs) T-product, see Time-ordered product (T-product) Trace, 255, 298, 301, 302, 487, 533, 592, 594, 680, 681, 694, 698, 700, 716, 720, 723–725, 729, 734, 735, 772, 774, 784–786, 792, 797, 852, 864, 869, 902–904, 906, 975, 1125, 1130–1132, 1134, 1135, 1197, 1219, 1259

Traceless, 839, 898, 902, 909, 1267 Transformation canonical, 1089 charge-conjugation, 967, 968, 970 contravariant, 1168 coordinate, 300, 382, 661, 667, 703, 704, 764, 845, 868, 1167, 1169, 1177, 1187 discrete, 967 equivalence, 593, 594, 1161, 1162 group, 648, 737, 837, 1083 linear, 457, 463–473, 475–477, 479–482, 484, 485, 495, 497, 498, 501, 508, 511, 517, 521, 528, 533, 545, 546, 552, 558, 561–563, 565, 578, 579, 590, 648, 655, 656, 662, 695, 703, 704, 709, 710, 734, 832, 855, 907, 908, 938, 940, 1260, 1272 local phase, 1082–1085 matrix, 468, 477, 479, 480, 485, 565, 693, 836, 1188, 1198, 1207, 1248, 1256–1258, 1302 particle-antiparticle, 1039 similarity, 485, 487–489, 492, 502, 504, 511–513, 523, 525, 533, 535, 537, 540, 542, 556, 577, 580, 581, 583, 584, 587, 593, 603, 614, 627, 631, 635, 681, 694, 705, 707, 709, 712, 834, 851, 869, 895, 906, 919, 951, 952, 1087–1089, 1141, 1160, 1161, 1196, 1197, 1203, 1219, 1220, 1223, 1235, 1238, 1242, 1243, 1246 successive, 484, 695, 696, 847, 863, 1193, 1194, 1221–1223, 1304 Transition dipole moment, 129, 134 Transition electric dipole moment, 129 Transition matrix element, 133, 766, 781, 797, 802, 803 Transition probability, 129, 135, 141, 176, 359, 767, 804, 1120 Translation symmetry, 663 Transmission coefficient, 317, 318 of electromagnetic wave, 305–350 irradiance, 322 of light, 308 Transmittance, 321 Transpose complex conjugate, 22 Transposition, 550, 596, 621, 740, 881, 938, 1030, 1081, 1251 Transverse electric (TE) mode, 336, 338, 341 Transverse electric (TE) wave, 305, 313–319, 327, 329, 331–338, 341

Transverse magnetic (TM) mode, 341, 383, 386, 390 Transverse magnetic (TM) wave, 305, 313–319, 321, 323–325, 327, 329, 331–338, 341 Transverse mode, 349, 375, 377, 386, 392, 1059, 1106, 1117, 1133 Transverse wave, 290, 291, 365 Trial function, 178, 179 Triangle inequality, 551 Triangle matrix lower, 488, 539, 577, 632 upper, 488, 490, 502, 512, 523, 577, 632 Trigonometric formula, 104, 132, 320, 323, 324, 342, 388, 872, 914 Trigonometric function, 143, 248, 665, 843, 866 Triple integral, 992, 1065 Triply degenerate, 817, 824 Trivial, 21, 26, 199, 309, 403, 421, 423, 432, 449, 458, 462, 479, 486, 502, 507, 548, 579, 591, 595, 633, 634, 648, 690, 693, 767, 831, 849, 854, 892, 898, 1104, 1158, 1201, 1210, 1233, 1234, 1264, 1301 Trivial solution, 421, 423, 432, 449, 479, 486, 548 Trivial topology, 199 T₁-space, 199–202, 926 Two-level atoms, 351, 358–361, 366, 367, 396

U U(1), 1082–1085 Ultraviolet catastrophe, 351, 358 Ultraviolet light, 377 Unbounded, 25, 29, 223, 1192, 1263 Uncertainty principle, 29, 53–58 Uncountable, 647, 989 Underlying set, 190 Undetermined constant, 76, 78, 86 Uniformly convergent, 224, 225, 227–229 Uniqueness of decomposition, 513 of Green’s function, 424 of representation, 460, 465, 467, 658, 710 of solution, 408, 430 Unique solution, 438, 441, 447, 472, 476, 941 Unitarity, 709, 739, 741, 745, 766, 837, 846, 848, 852, 882, 900, 951, 972, 974, 1230 Unitary diagonalization, 580–588, 592, 598 group, 827, 837, 844, 901–903, 906

  matrix, 138, 299, 556, 559, 560, 566, 571, 580–583, 585, 590, 591, 597, 602–604, 616, 618, 627, 631, 694, 705, 707, 713, 716, 725, 772, 789, 796, 801, 809, 832, 833, 843, 849–851, 877, 881, 885, 894, 895, 898, 906, 907, 1192, 1218, 1239, 1242, 1243, 1245–1247, 1256, 1258, 1297
  operator, 571–605, 834, 881, 891, 900, 901, 974, 1087, 1221, 1230
  representation, 705–707, 709, 713, 714, 741, 855, 860, 1192, 1263
  transformation, 138, 142, 566, 590, 766, 789, 790, 801, 840, 880, 882, 883, 885, 891, 895, 1086, 1088, 1196, 1207
Unitary similarity transformation, 577, 580, 583, 584, 587, 603, 627, 635, 694, 834, 851, 895, 906, 919, 951, 1087–1089, 1197, 1203, 1242, 1243, 1246
Unit polarization vector
  of electric field, 129, 293, 313, 363, 364, 781, 824
  of magnetic field, 314, 365
Unit set, 201
Universal covering group, 926, 1241, 1242, 1249
Universal set, 186–190
Unknowns, 157, 293, 316, 386, 622, 624, 626
Unperturbed
  state, 157, 162
  system, 156, 157
Upper right off-diagonal elements, 475, 488

V
Vacuum expectation value, 1016, 1017, 1043, 1098–1100
Variable transformation, 48, 80, 142, 241
Variance, 53–58
Variance operator, 53
Variational method, 155, 176–183
Variational principle, 36, 112
Vector
  analysis, 284, 291, 1050
  field, 280, 281, 306, 1050, 1084, 1165
  potential, 1050, 1051, 1055, 1077–1079, 1129
  singular, 1178, 1184–1187
  transformation, 465, 466, 476, 480, 514, 661, 704, 831, 1170, 1171
  unit, 4, 131, 132, 281, 288–290, 306–308, 313–315, 336, 339, 340, 365, 460, 461, 564, 605, 638, 831

Vector space
  complex, 138, 1268, 1301
  finite-dimensional, 473
  linear, 457, 465, 469, 471, 473, 485, 544, 549, 583, 600, 609, 610, 649, 650, 653, 717, 903, 1141, 1142
  n-dimensional, 457, 458, 461, 515, 936, 1156
  theory of, 552, 1187, 1304
Venn diagram, 186, 187, 189, 191
Vertex, 688, 703, 1108, 1185, 1186
Vertical incidence, 313, 314, 323
Virtual displacement, 979, 980
Virtual work
  infinitesimal, 980

W
Wave equation, 8, 277, 287, 293, 334, 336, 353, 354, 932
Wave front, 340
Waveguide, 305, 331–343, 349, 372, 375, 377, 380, 386, 390–394
Wavelength, 4, 6, 7, 152, 312, 313, 321, 346, 348–350, 370–376, 388–395, 638
Wavelength dispersion, 371, 372, 374–376, 392, 393
Wavenumber, 4, 7, 287, 337, 338, 340, 370, 379, 1002
Wavenumber vector, 4, 5, 287, 307, 308, 311, 326, 332, 334, 368, 379, 391, 1059

Wavevector, 378, 379, 388, 391, 393
Wave zone, 363–365
Weight function, 51, 406, 413, 415, 417, 428, 432, 445, 450, 620
Wick’s theorem, 1102
Wigner formula, 840–843
World interval, 939–944, 1015, 1187
Wronskian, 15, 308, 404, 405, 432, 434

X
X-ray, 4–6

Y
Yukawa potential, 109

Z
Zenithal angle, 72, 700, 861, 869, 924, 1194, 1196
Zero
  element, 1283, 1284
  matrix, 28, 495, 497, 501, 502, 514–516, 539, 586, 612, 613, 921, 947, 1204
  vector, 24, 278, 458, 460, 486, 495, 568, 585, 650, 899, 921, 1178
Zero-point energy, 1003
Zeros, 19, 229–232, 521