Mathematical physical chemistry [2 ed.] 9789811522246, 9789811522253

English Pages 920 Year 2020

Recommended

Mathematical Physical Chemistry: Practical and Intuitive Methodology [2 ed.] 9789811522253, 9811522251

Mathematical Physical Chemistry: Practical and Intuitive Methodology

Physical Chemistry [11th Ed.]

PHYSICAL CHEMISTRY IN BRIEF

Physical organic chemistry

Exam Survival Guide: Physical Chemistry

Physical Chemistry [4 ed.]
Moore is something approaching a Holy Canon of Physical Chemistry. This is a text that should be on the reference books

Mathematical Physical Chemistry: Practical and Intuitive Methodology [3 ed.] 9819925118, 9789819925117, 9789819925124, 9789819925148
The third edition of this book has been updated so that both advanced physics and advanced chemistry can be overviewed f

Mathematical Physical Chemistry. Practical and Intuitive Methodology [3 ed.] 9789819925117, 9789819925124

Physical Chemistry [4 ed.]
Moore's Physical Chemistry is a classic and reference for other authors. It's something approaching a Holy Can

Author / Uploaded
Hotta S

Table of contents :
Preface to the Second Edition
Preface to the First Edition
Contents
Part I: Quantum Mechanics
1.1 Early-Stage Quantum Theory
1.2 Schrödinger Equation
1.3 Simple Applications of Schrödinger Equation
1.4 Quantum-Mechanical Operators and Matrices
1.5 Commutator and Canonical Commutation Relation
Reference
2.1 Classical Harmonic Oscillator
2.2 Formulation Based on an Operator Method
2.3 Matrix Representation of Physical Quantities
2.4 Coordinate Representation of Schrödinger Equation
2.5 Variance and Uncertainty Principle
References
3.1 Introductory Remarks
3.2 Constitution of Hamiltonian
3.3 Separation of Variables
3.4 Generalized Angular Momentum
3.5 Orbital Angular Momentum: Operator Approach
3.6 Orbital Angular Momentum: Analytic Approach
3.6.1 Spherical Surface Harmonics and Associated Legendre Differential Equation
3.6.2 Orthogonality of Associated Legendre Functions
3.7.1 Operator Approach to Radial Wave Functions
3.7.2 Normalization of Radial Wave Functions
3.7.3 Associated Laguerre Polynomials
3.8 Total Wave Functions
References
4.1 Electric Dipole Transition
4.2 One-Dimensional System
4.3 Three-Dimensional System
4.4 Selection Rules
4.5 Angular Momentum of Radiation [6]
References
5.1 Perturbation Method
5.1.1 Quantum State and Energy Level Shift Caused by Perturbation
5.1.2 Several Examples
5.2 Variational Method
References
6.1 Set and Topology
6.1.1 Basic Notions and Notations
6.1.2 Topological Spaces and Their Building Blocks
6.1.3 T1-Space
6.1.4 Complex Numbers and Complex Plane
6.2 Analytic Functions of a Complex Variable
6.3 Integration of Analytic Functions: Cauchy’s Integral Formula
6.4 Taylor’s Series and Laurent’s Series
6.5 Zeros and Singular Points
6.6 Analytic Continuation
6.7 Calculus of Residues
6.8 Examples of Real Definite Integrals
6.9.1 Brief Outline
6.9.2 Examples of Multivalued Functions
References
Part II: Electromagnetism
7.1 Maxwell’s Equations and Their Characteristics
7.2 Equation of Wave Motion
7.3 Polarized Characteristics of Electromagnetic Waves
7.4 Superposition of Two Electromagnetic Waves
References
8.1 Electromagnetic Fields at an Interface
8.2 Basic Concepts Underlying Phenomena
8.3 Transverse Electric (TE) Waves and Transverse Magnetic (TM) Waves
8.4 Energy Transport by Electromagnetic Waves
8.5 Brewster Angles and Critical Angles
8.6 Total Reflection
8.7 Waveguide Applications
8.7.1 TE and TM Waves in a Waveguide
8.7.2 Total Internal Reflection and Evanescent Waves
8.8 Stationary Waves
References
9.1 Blackbody Radiation
9.2 Planck’s Law of Radiation and Mode Density of Electromagnetic Waves
9.3 Two-Level Atoms
9.4 Dipole Radiation
9.5.1 Brief Outlook
9.5.2 Organic Lasers
9.6 Mechanical System
References
10.1 Second-Order Linear Differential Equations (SOLDEs)
10.2 First-Order Linear Differential Equations (FOLDEs)
10.3 Second-Order Differential Operators
10.4 Green’s Functions
10.5 Construction of Green’s Functions
10.6.1 General Remarks
10.6.2 Green’s Functions for IVPs
10.6.3 Estimation of Surface Terms
10.6.4 Examples
10.7 Eigenvalue Problems
References
Part III: Linear Vector Spaces
11.1 Vectors
11.2 Linear Transformations of Vectors
11.3 Inverse Matrices and Determinants
11.4 Basis Vectors and Their Transformations
Reference
12.1 Eigenvalues and Eigenvectors
12.2 Eigenspaces and Invariant Subspaces
12.3 Generalized Eigenvectors and Nilpotent Matrices
12.4 Idempotent Matrices and Generalized Eigenspaces
12.5 Decomposition of Matrix
12.6.1 Canonical Form of Nilpotent Matrix
12.6.2 Jordan Blocks
12.6.3 Example of Jordan Canonical Form
12.7 Diagonalizable Matrices
References
13.1 Inner Product and Metric
13.2 Gram Matrices
13.3 Adjoint Operators
13.4 Orthonormal Basis
References
14.1 Projection Operators
14.2 Normal Operators
14.3 Unitary Diagonalization of Matrices
14.4 Hermitian Matrices and Unitary Matrices
14.5 Hermitian Quadratic Forms
14.6 Simultaneous Eigenstates and Diagonalization
References
15.1 Functions of Matrices
15.2 Exponential Functions of Matrices and Their Manipulations
15.3.1 Introduction
15.3.2 System of Differential Equations in a Matrix Form: Resolvent Matrix
15.3.3 Several Examples
15.4 Motion of a Charged Particle in Polarized Electromagnetic Wave
References
Part IV: Group Theory and Its Chemical Applications
16.1 Definition of Groups
16.2 Subgroups
16.3 Classes
16.4 Isomorphism and Homomorphism
16.5 Direct-Product Groups
Reference
17.1 A Variety of Symmetry Operations
17.2 Successive Symmetry Operations
17.3 O and Td Groups
17.4 Special Orthogonal Group SO(3)
17.4.1 Rotation Axis and Rotation Matrix
17.4.2 Euler Angles and Related Topics
References
18.1 Definition of Representation
18.2 Basis Functions of Representation
18.3 Schur’s Lemmas and Grand Orthogonality Theorem (GOT)
18.4 Characters
18.5 Regular Representation and Group Algebra
18.6 Classes and Irreducible Representations
18.7 Projection Operators
18.8 Direct-Product Representation
18.9 Symmetric Representation and Antisymmetric Representation
References
19.1 Transformation of Functions
19.2 Method of Molecular Orbitals (MOs)
19.3 Calculation Procedures of Molecular Orbitals (MOs)
19.4.1 Ethylene
19.4.2 Cyclopropenyl Radical [1]
19.4.3 Benzene
19.4.4 Allyl Radical [1]
19.5 MO Calculations of Methane
References
20.1 Introduction: Operators of Rotation and Infinitesimal Rotation
20.2 Rotation Groups: SU(2) and SO(3)
20.2.1 Construction of SU(2) Matrices
20.2.2 SU(2) Representation Matrices: Wigner Formula
20.2.3 SO(3) Representation Matrices and Spherical Surface Harmonics
20.2.4 Irreducible Representations of SU(2) and SO(3)
20.2.5 Parameter Space of SO(3)
20.2.6 Irreducible Characters of SO(3) and Their Orthogonality
20.3.1 Direct-Product of SU(2) and Clebsch-Gordan Coefficients
20.3.2 Calculation Procedures of Clebsch-Gordan Coefficients
20.3.3 Examples of Calculation of Clebsch-Gordan Coefficients
20.4 Lie Groups and Lie Algebras
20.4.1 Definition of Lie Groups and Lie Algebras: One-Parameter Groups
20.4.2 Properties of Lie Algebras
20.4.3 Adjoint Representation of Lie Groups
20.5.1 Several Definitions and Examples
20.5.2 O(3) and SO(3)
20.5.3 Simply Connected Lie Groups: Local Properties and Global Properties
References
Index

Citation preview

Shu Hotta

Mathematical Physical Chemistry: Practical and Intuitive Methodology, Second Edition

Shu Hotta Takatsuki, Osaka, Japan

ISBN 978-981-15-2224-6
ISBN 978-981-15-2225-3 (eBook)
https://doi.org/10.1007/978-981-15-2225-3

© Springer Nature Singapore Pte Ltd. 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

To my wife Kazue and to the memory of Roro

Preface to the Second Edition

This book is the second edition of Mathematical Physical Chemistry. Mathematics is a common language of natural science including physics, chemistry, and biology. Although the words mathematical physics and physical chemistry (or chemical physics) are commonly used, mathematical physical chemistry sounds rather uncommon. Therefore, it might well be reworded as the mathematical physics for chemists. The book title could have been, for instance, “The Mathematics of Physics and Chemistry” accordingly, in tribute to the famous book that was written three-quarters of a century ago by H. Margenau and G. M. Murphy. Yet, the word mathematical physical chemistry is expected to be granted citizenship, considering that chemistry and related interdisciplinary fields such as materials science and molecular science are becoming increasingly mathematical.

The main concept and main theme remain unchanged, but this book’s second edition contains the theory of analytic functions and the theory of continuous groups. Both theories are counted among the most elegant theories of mathematics. The mathematics of these topics is of a somewhat advanced level and something like a “sufficient condition” for chemists, whereas that of the first edition may be a prerequisite (or a necessary condition) for them. Therefore, chemists (or maybe physicists as well) can creatively use the two editions.

In association with these major additions to the second edition, the author has placed the mathematical topics (the theory of analytic functions, Green’s functions, exponential functions of matrices, and the theory of continuous groups) in the last chapter of each part (Part I through Part IV). At the same time, the author has also made several specific revisions including the introductory discussion on the perturbation method and variational method, both of which can be effectively used for gaining approximate solutions of various quantum-mechanical problems.
As another topic, the author has presented the recent progress on organic lasers. This topic is expected to help develop high-performance light-emitting devices, one of the important fields of materials science. As in the case of the first edition, readers benefit from going freely back and forth across all the topics of this book.

Once again, the author wishes to thank many students for valuable discussions and Dr. Shin’ichi Koizumi at Springer for giving him an opportunity to write this book.

Takatsuki, Japan
October 2019
Shu Hotta

Preface to the First Edition

The contents of this book are based upon manuscripts prepared by the author for two undergraduate courses at Kyoto Institute of Technology, entitled “Polymer Nanomaterials Engineering” and “Photonics Physical Chemistry,” and for a master’s course lecture at the same institute entitled “Solid-State Polymers Engineering.” This book is intended for graduate and undergraduate students, especially those who major in chemistry and, at the same time, wish to study mathematical physics. Readers are supposed to have a basic knowledge of analysis and linear algebra. However, they are not supposed to be familiar with the theory of analytic functions (i.e., complex analysis), even though it is desirable to have relevant knowledge about it. At the beginning, mathematical physics looks daunting to chemists, as used to be the case with myself as a chemist. The book introduces the basic concepts of mathematical physics to chemists. Unlike other books related to mathematical physics, this book makes a reasonable selection of material so that students majoring in chemistry can readily understand the contents. In particular, we stress the importance of practical and intuitive methodology. We also expect engineers and physicists to benefit from reading this book. In Part I and Part II, the book describes quantum mechanics and electromagnetism. The relevance between the two is carefully considered. Although quantum mechanics covers the broad field of modern physics, in Part I we focus on a harmonic oscillator and a hydrogen(-like) atom. This is because we can study and deal with many of the fundamental concepts of quantum mechanics within these restricted topics. Moreover, knowledge acquired from the study of the topics can readily be extended to practical investigation of, e.g., electronic states and vibrational (or vibronic) states of molecular systems.
We describe these topics by both the analytic method (which uses differential equations) and the operator approach (which uses matrix calculations). We believe that the basic concepts of quantum mechanics can be best understood by contrasting the analytical and algebraic approaches. For this reason, we give matrix representations of physical quantities whenever possible. Examples include energy eigenvalues of a quantum-mechanical harmonic oscillator and angular momenta of a hydrogen-like atom. At the same time, these two physical systems supply us with a good opportunity to study classical polynomials, e.g., Hermite polynomials, (associated) Legendre polynomials, Laguerre polynomials, Gegenbauer polynomials, and special functions, more generally. These topics constitute one of the important branches of mathematical physics. One of the basic concepts of quantum mechanics is that a physical quantity is represented by a Hermitian operator or matrix. In this respect, the algebraic approach gives a good opportunity to get familiar with this concept. We present tangible examples for this. We also emphasize the importance of the notion of Hermiticity of a differential operator. We often encounter a unitary operator or unitary transformation alongside the notion of Hermitian operators. We show several examples of unitary operators in connection with transformation of vectors and coordinates.

Part II describes Maxwell’s equations and their applications to various phenomena of electromagnetic waves. These include their propagation, reflection, and transmission in dielectric media. We restrict ourselves to treating those phenomena in dielectrics without charge. Yet, we cover a wide range of important topics. In particular, when two (or more) dielectrics are in contact with each other at a plane interface, reflection and transmission of light are characterized by various important parameters such as reflection and transmission coefficients, Brewster angles, and critical angles. We should have a proper understanding of these parameters, not only from the point of view of basic study but also to make use of the relevant knowledge in optical device applications such as waveguides. In contrast to the concept of electromagnetic waves, light also possesses the character of light quanta.
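As a small illustration of the interface parameters named above, the Brewster angle and the critical angle can be computed from the refractive indices alone. The following is a minimal sketch, not taken from the book: it assumes two lossless, nonmagnetic dielectrics with real refractive indices n1 (incidence side) and n2 (transmission side), and the function names and the sample values 1.0 and 1.5 are purely illustrative.

```python
import math


def brewster_angle(n1: float, n2: float) -> float:
    """Incidence angle (radians) at which a TM (p-polarized) wave is not reflected.

    Assumes lossless, nonmagnetic dielectrics: tan(theta_B) = n2 / n1.
    """
    return math.atan2(n2, n1)


def critical_angle(n1: float, n2: float) -> float:
    """Incidence angle (radians) beyond which total reflection occurs.

    Only defined when the wave travels from the denser medium (n1 > n2):
    sin(theta_c) = n2 / n1.
    """
    if n1 <= n2:
        raise ValueError("total reflection requires n1 > n2")
    return math.asin(n2 / n1)


# Illustrative values only: air (n = 1.0) and a glass-like medium (n = 1.5).
theta_b = math.degrees(brewster_angle(1.0, 1.5))   # air -> glass
theta_c = math.degrees(critical_angle(1.5, 1.0))   # glass -> air
print(f"Brewster angle: {theta_b:.2f} deg")
print(f"Critical angle: {theta_c:.2f} deg")
```

The critical angle exists only for incidence from the optically denser side, which is why the second call swaps the two indices.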
We present semiclassical and statistical approaches to blackbody radiation occurring in a simplified system in relation to Part I. The physical processes are well characterized by a notion of two-level atoms. In this context, we outline the dipole radiation within the framework of the classical theory. We briefly describe how the optical processes occurring in a confined dielectric medium are related to a laser that is of great importance in fundamental science and its applications. Many of the basic equations of physics are described as second-order linear differential equations (SOLDEs). Different methods were developed and proposed to seek their solutions. One of the most important methods is that of Green’s functions. We present the introductory theory of Green’s functions accordingly. In this connection, we rethink the Hermiticity of a differential operator.

In Part III and Part IV, we describe algebraic structures of mathematical physics. Their understanding is useful to studies of quantum mechanics and electromagnetism whose topics are presented in Part I and Part II. Part III deals with theories of linear vector spaces. We focus on the discussion of vectors and their transformations in finite-dimensional vector spaces. Generally, we consider the vector transformations among the vector spaces of different dimensions. In this book, however, we restrict ourselves to the case of the transformation between the vector spaces of the same dimension, i.e., endomorphism of the space (V^n → V^n). This is not only because this is most often the case with many physical applications, but because the relevant operator is represented by a square matrix. Canonical forms of square matrices hold an important position in algebra. These include a triangular matrix, diagonalizable matrix as well as a nilpotent matrix and idempotent matrix. The most general form will be Jordan canonical form. We present its essential parts in detail taking a tangible example. Next to the general discussion, we deal with an inner product space. Once an inner product is defined between any pair of vectors, the vector space is given a fruitful structure. An example is a norm (i.e., “length”) of a vector. Also, we gain a clear relationship between Part III and Part I. We define various operators or matrices that are important in physical applications. Examples include normal operators (or matrices) such as Hermitian operators, projection operators, and unitary operators. Once again, we emphasize the importance of Hermitian operators. In particular, two commuting Hermitian matrices share simultaneous eigenvectors (or eigenstates) and, in this respect, such two matrices occupy a special position in quantum mechanics.

Finally, Part IV describes the essence of group theory and its chemical applications. Group theory has a broad range of applications in solid-state physics, solid-state chemistry, molecular science, etc. Nonetheless, the knowledge of group theory does not seem to have fully prevailed among chemists. We can discover an adequate reason for this in a preface to the first edition of Chemical Applications of Group Theory written by F. A. Cotton. It might well be natural that definition and statement of abstract algebra, especially group theory, sound somewhat pretentious for chemists, even though the definition of group is quite simple. Therefore, we present various examples for readers to get used to notions of group theory. The notion of mapping is important as in the case of the linear vector spaces.
Aside from the operation being additive for a vector space and multiplicative for a group, the fundamental rules of calculation are much the same for the vector space and the group. We describe the characteristics of symmetry groups in detail partly because related knowledge is useful for molecular orbital (MO) calculations that are presented in the last section of the book. Representation theory is probably one of the most daunting notions for chemists. Practically, however, the representation is just homomorphism that corresponds to a linear transformation in a vector space. In this context, the representation is merely denoted by a number or a matrix. Basis functions of representation correspond to basis vectors in a vector space. The grand orthogonality theorem (GOT) is a “nursery bed” of the representation theory. Therefore, readers are encouraged to understand its essence apart from the rigorous proof of the theorem. In conjunction with Part III, we present a variety of projection operators. These are very useful to practical applications in, e.g., quantum mechanics and molecular science. The final parts of the book are devoted to applications of group theory to problems of physical chemistry, especially those of quantum chemistry, more specifically molecular orbital calculations. We see how symmetry consideration, particularly the use of projection operators, saves us a lot of labor. Examples include aromatic hydrocarbons and methane.

The previous sections sum up the contents of this book. Readers may start with any part and go freely back and forth. This is because contents of many parts are interrelated. For example, we emphasize the importance of Hermiticity of differential operators and matrices. Also projection operators and nilpotent matrices appear in many parts along with their tangible applications to individual topics. Hence, readers are recommended to carefully examine and compare the related contents throughout the book. We believe that readers, especially chemists, benefit from the writing style of this book, since it is suited to chemists who are good at intuitive understanding.

The author would like to thank many students for their valuable suggestions and discussions at the lectures. The author also wishes to thank Dr. Shin’ichi Koizumi at Springer for giving him an opportunity to write this book.

Kyoto, Japan
October 2017
Shu Hotta

Contents

Part I

Quantum Mechanics

1

Schrödinger Equation and Its Application . . . . . . . . . . . . . . . . . . 1.1 Early-Stage Quantum Theory . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 Schrödinger Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3 Simple Applications of Schrödinger Equation . . . . . . . . . . . . . 1.4 Quantum-Mechanical Operators and Matrices . . . . . . . . . . . . . 1.5 Commutator and Canonical Commutation Relation . . . . . . . . . Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . .

3 3 8 14 21 27 30

2

Quantum-Mechanical Harmonic Oscillator . . . . . . . . . . . . . . . . . . 2.1 Classical Harmonic Oscillator . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Formulation Based on an Operator Method . . . . . . . . . . . . . . . 2.3 Matrix Representation of Physical Quantities . . . . . . . . . . . . . 2.4 Coordinate Representation of Schrödinger Equation . . . . . . . . 2.5 Variance and Uncertainty Principle . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . .

31 31 33 41 44 51 56

3

Hydrogen-Like Atoms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 Introductory Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Constitution of Hamiltonian . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Separation of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Generalized Angular Momentum . . . . . . . . . . . . . . . . . . . . . . 3.5 Orbital Angular Momentum: Operator Approach . . . . . . . . . . . 3.6 Orbital Angular Momentum: Analytic Approach . . . . . . . . . . . 3.6.1 Spherical Surface Harmonics and Associated Legendre Differential Equation . . . . . . . . . . . . . . . . . 3.6.2 Orthogonality of Associated Legendre Functions . . . . 3.7 Radial Wave Functions of Hydrogen-Like Atoms . . . . . . . . . . 3.7.1 Operator Approach to Radial Wave Functions . . . . . . 3.7.2 Normalization of Radial Wave Functions . . . . . . . . . . 3.7.3 Associated Laguerre Polynomials . . . . . . . . . . . . . . .

. . . . . . .

57 57 58 67 72 77 91

. . . . . .

92 103 107 107 112 116 xiii

xiv

Contents

3.8 Total Wave Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 4

Optical Transition and Selection Rules . . . . . . . . . . . . . . . . . . . . . 4.1 Electric Dipole Transition . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 One-Dimensional System . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Three-Dimensional System . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4 Selection Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5 Angular Momentum of Radiation . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . .

5

Approximation Methods of Quantum Mechanics . . . . . . . . . . . . . 5.1 Perturbation Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1.1 Quantum State and Energy Level Shift Caused by Perturbation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1.2 Several Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Variational Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. 151 . 151

6

. . . .

153 156 172 179

Theory of Analytic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1 Set and Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1.1 Basic Notions and Notations . . . . . . . . . . . . . . . . . . . . 6.1.2 Topological Spaces and Their Building Blocks . . . . . . . 6.1.3 T1-Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1.4 Complex Numbers and Complex Plane . . . . . . . . . . . . 6.2 Analytic Functions of a Complex Variable . . . . . . . . . . . . . . . . 6.3 Integration of Analytic Functions: Cauchy’s Integral Formula . . . . 6.4 Taylor’s Series and Laurent’s Series . . . . . . . . . . . . . . . . . . . . . 6.5 Zeros and Singular Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.6 Analytic Continuation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.7 Calculus of Residues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.8 Examples of Real Definite Integrals . . . . . . . . . . . . . . . . . . . . . 6.9 Multivalued Functions and Riemann Surfaces . . . . . . . . . . . . . . 6.9.1 Brief Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.9.2 Examples of Multivalued Functions . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

181 181 182 185 195 197 199 207 216 223 225 227 230 248 248 256 265

Part II 7

125 125 128 132 142 147 150

Electromagnetism

Maxwell’s Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1 Maxwell’s Equations and Their Characteristics . . . . . . . . . . . . 7.2 Equation of Wave Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.3 Polarized Characteristics of Electromagnetic Waves . . . . . . . . 7.4 Superposition of Two Electromagnetic Waves . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . .

269 269 276 280 285 293

Contents

8

9

10

Reflection and Transmission of Electromagnetic Waves in Dielectric Media . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.1 Electromagnetic Fields at an Interface . . . . . . . . . . . . . . . . . . 8.2 Basic Concepts Underlying Phenomena . . . . . . . . . . . . . . . . . 8.3 Transverse Electric (TE) Waves and Transverse Magnetic (TM) Waves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.4 Energy Transport by Electromagnetic Waves . . . . . . . . . . . . . 8.5 Brewster Angles and Critical Angles . . . . . . . . . . . . . . . . . . . 8.6 Total Reflection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.7 Waveguide Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.7.1 TE and TM Waves in a Waveguide . . . . . . . . . . . . . . 8.7.2 Total Internal Reflection and Evanescent Waves . . . . . 8.8 Stationary Waves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. 295 . 295 . 297 . . . . . . . . .

303 308 312 315 319 320 327 331 337

Light Quanta: Radiation and Absorption
9.1 Blackbody Radiation
9.2 Planck’s Law of Radiation and Mode Density of Electromagnetic Waves
9.3 Two-Level Atoms
9.4 Dipole Radiation
9.5 Lasers
9.5.1 Brief Outlook
9.5.2 Organic Lasers
9.6 Mechanical System
References

Introductory Green’s Functions
10.1 Second-Order Linear Differential Equations (SOLDEs)
10.2 First-Order Linear Differential Equations (FOLDEs)
10.3 Second-Order Differential Operators
10.4 Green’s Functions
10.5 Construction of Green’s Functions
10.6 Initial Value Problems (IVPs)
10.6.1 General Remarks
10.6.2 Green’s Functions for IVPs
10.6.3 Estimation of Surface Terms
10.6.4 Examples
10.7 Eigenvalue Problems
References

Part III Linear Vector Spaces

11 Vectors and Their Transformation
11.1 Vectors
11.2 Linear Transformations of Vectors
11.3 Inverse Matrices and Determinants
11.4 Basis Vectors and Their Transformations
Reference

12 Canonical Forms of Matrices
12.1 Eigenvalues and Eigenvectors
12.2 Eigenspaces and Invariant Subspaces
12.3 Generalized Eigenvectors and Nilpotent Matrices
12.4 Idempotent Matrices and Generalized Eigenspaces
12.5 Decomposition of Matrix
12.6 Jordan Canonical Form
12.6.1 Canonical Form of Nilpotent Matrix
12.6.2 Jordan Blocks
12.6.3 Example of Jordan Canonical Form
12.7 Diagonalizable Matrices
References

13 Inner Product Space
13.1 Inner Product and Metric
13.2 Gram Matrices
13.3 Adjoint Operators
13.4 Orthonormal Basis
References

14 Hermitian Operators and Unitary Operators
14.1 Projection Operators
14.2 Normal Operators
14.3 Unitary Diagonalization of Matrices
14.4 Hermitian Matrices and Unitary Matrices
14.5 Hermitian Quadratic Forms
14.6 Simultaneous Eigenstates and Diagonalization
References

15 Exponential Functions of Matrices
15.1 Functions of Matrices
15.2 Exponential Functions of Matrices and Their Manipulations
15.3 System of Differential Equations
15.3.1 Introduction
15.3.2 System of Differential Equations in a Matrix Form: Resolvent Matrix
15.3.3 Several Examples
15.4 Motion of a Charged Particle in Polarized Electromagnetic Wave
References

Part IV Group Theory and Its Chemical Applications

16 Introductory Group Theory
16.1 Definition of Groups
16.2 Subgroups
16.3 Classes
16.4 Isomorphism and Homomorphism
16.5 Direct-Product Groups
Reference

17 Symmetry Groups
17.1 A Variety of Symmetry Operations
17.2 Successive Symmetry Operations
17.3 O and Td Groups
17.4 Special Orthogonal Group SO(3)
17.4.1 Rotation Axis and Rotation Matrix
17.4.2 Euler Angles and Related Topics
References

18 Representation Theory of Groups
18.1 Definition of Representation
18.2 Basis Functions of Representation
18.3 Schur’s Lemmas and Grand Orthogonality Theorem (GOT)
18.4 Characters
18.5 Regular Representation and Group Algebra
18.6 Classes and Irreducible Representations
18.7 Projection Operators
18.8 Direct-Product Representation
18.9 Symmetric Representation and Antisymmetric Representation
References

19 Applications of Group Theory to Physical Chemistry
19.1 Transformation of Functions
19.2 Method of Molecular Orbitals (MOs)
19.3 Calculation Procedures of Molecular Orbitals (MOs)
19.4 MO Calculations Based on π-Electron Approximation
19.4.1 Ethylene
19.4.2 Cyclopropenyl Radical
19.4.3 Benzene
19.4.4 Allyl Radical
19.5 MO Calculations of Methane
References

20 Theory of Continuous Groups
20.1 Introduction: Operators of Rotation and Infinitesimal Rotation
20.2 Rotation Groups: SU(2) and SO(3)
20.2.1 Construction of SU(2) Matrices
20.2.2 SU(2) Representation Matrices: Wigner Formula
20.2.3 SO(3) Representation Matrices and Spherical Surface Harmonics
20.2.4 Irreducible Representations of SU(2) and SO(3)
20.2.5 Parameter Space of SO(3)
20.2.6 Irreducible Characters of SO(3) and Their Orthogonality
20.3 Clebsch-Gordan Coefficients of Rotation Groups
20.3.1 Direct-Product of SU(2) and Clebsch-Gordan Coefficients
20.3.2 Calculation Procedures of Clebsch-Gordan Coefficients
20.3.3 Examples of Calculation of Clebsch-Gordan Coefficients
20.4 Lie Groups and Lie Algebras
20.4.1 Definition of Lie Groups and Lie Algebras: One-Parameter Groups
20.4.2 Properties of Lie Algebras
20.4.3 Adjoint Representation of Lie Groups
20.5 Connectedness of Lie Groups
20.5.1 Several Definitions and Examples
20.5.2 O(3) and SO(3)
20.5.3 Simply Connected Lie Groups: Local Properties and Global Properties
References

Index

Part I

Quantum Mechanics

Quantum mechanics is clearly distinguished from classical physics, whose major pillars are Newtonian mechanics and the electromagnetism established by Maxwell. Quantum mechanics was first established as a theory of atomic physics that handled the microscopic world. Later on, quantum mechanics was applied to the macroscopic world, i.e., the cosmos. The question of how exactly quantum mechanics describes the natural world, and of how far the theory can go, remains problematic and is in dispute to this day. Such an ultimate question is irrelevant to this monograph.

Our major aim is to study a standard approach to applying the Schrödinger equation to selected topics. The topics include a particle confined within a potential well, a harmonic oscillator, and hydrogen-like atoms. Our major task rests on solving the eigenvalue problems of these topics. To this end, we describe both an analytical method and an algebraic (or operator) method. By focusing on these topics, we will be able to acquire various methods to tackle a wide range of quantum-mechanical problems. These problems are usually posed as an analytical equation (i.e., a differential equation) or an algebraic equation. A Hamiltonian is constructed analytically or algebraically accordingly. Besides the Hamiltonian, physical quantities are expressed as a differential operator or a matrix operator. In both analytical and algebraic approaches, the Hermitian property (or Hermiticity) of an operator and matrix is of crucial importance. This feature will, therefore, be highlighted not only in this part but also throughout this book, along with unitary operators and matrices.

Optical transitions and the associated selection rules are dealt with in relation to the above topics. Those subjects are closely related to the electromagnetic phenomena that are considered in Part II.

Unlike the eigenvalue problems of the above-mentioned topics, it is difficult to get exact analytical solutions in most quantum-mechanical problems. For this reason, we need appropriate methods to obtain approximate solutions to various problems, including the eigenvalue problems. In this context, we deal with the approximation techniques of a perturbation method and a variational method.


In the last part, we study the theory of analytic functions, one of the most elegant theories of mathematics. This approach not only helps cultivate a broad view of pure mathematics, but also leads to the acquisition of practical methodology. The last part deals with introductory set theory and topology as well.

Chapter 1

Schrödinger Equation and Its Application

Quantum mechanics is an indispensable research tool of modern natural science that covers cosmology, atomic physics, molecular science, materials science, and so forth. The basic concept underlying quantum mechanics rests upon the Schrödinger equation. The Schrödinger equation is described as a second-order linear differential equation (SOLDE) and is solved analytically accordingly. Alternatively, equations of quantum mechanics are often described in terms of operators and matrices, and physical quantities are represented by those operators and matrices. Normally, they are noncommutative. In particular, the quantum-mechanical formalism requires the canonical commutation relation between position and momentum operators. One of the great characteristics of quantum mechanics is that physical quantities must be Hermitian. This aspect is deeply related to the requirement that these quantities should be described by real numbers. We deal with the Hermiticity from both an analytical point of view (or coordinate representation) relevant to the differential equations and an algebraic viewpoint (or matrix representation) associated with the operators and matrices. Including these topics, we briefly survey the origin of the Schrödinger equation and consider its implications. To get acquainted with the quantum-mechanical formalism, we deal with simple examples of the Schrödinger equation.

1.1 Early-Stage Quantum Theory

The Schrödinger equation is a direct consequence of the discovery of quanta. It stemmed from the hypothesis of energy quanta propounded by Max Planck (1900). This hypothesis was followed by the photon (light quantum) hypothesis propounded by Albert Einstein (1905). He claimed that light is an aggregation of light quanta and that individual quanta carry an energy $E$ expressed as the Planck constant $h$ multiplied by the frequency of light $\nu$, i.e.,

$$E = h\nu = \hbar\omega, \tag{1.1}$$

where $\hbar \equiv h/2\pi$ and $\omega = 2\pi\nu$. The quantity $\omega$ is called the angular frequency, with $\nu$ being the frequency. The quantity $\hbar$ is said to be the reduced Planck constant. Einstein (1917) also concluded that the momentum $p$ of a light quantum is identical to the energy of the light quantum divided by the light velocity in vacuum $c$. That is, we have

$$p = E/c = \hbar\omega/c = \hbar k, \tag{1.2}$$

where $k \equiv 2\pi/\lambda$ ($\lambda$ is the wavelength of light in vacuum) and $k$ is called the wavenumber. Using vector notation, we have

$$\mathbf{p} = \hbar\mathbf{k}, \tag{1.3}$$

where $\mathbf{k} \equiv \frac{2\pi}{\lambda}\mathbf{n}$ ($\mathbf{n}$: a unit vector in the direction of propagation of light) is said to be a wavenumber vector.

© Springer Nature Singapore Pte Ltd. 2020. S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3_1

Fig. 1.1 Scattering of an X-ray beam by an electron. (a) θ denotes a scattering angle of the X-ray beam. (b) Conservation of momentum. [Figure labels: incident X-ray, scattered X-ray, rest electron, recoiled electron]

Meanwhile, Arthur Compton (1923) conducted various experiments in which he investigated how an incident X-ray beam was scattered by matter (e.g., graphite, copper). As a result, Compton found a systematic redshift in X-ray wavelengths as a function of the scattering angle of the X-ray beam (Compton effect). Moreover, he found that the shift in wavelength depended only on the scattering angle, regardless of the material of the scatterer. The results can be summarized in a simple equation described as

$$\Delta\lambda = \frac{h}{m_e c}(1 - \cos\theta), \tag{1.4}$$

where $\Delta\lambda$ denotes the shift in wavelength of the scattered beam; $m_e$ is the rest mass of an electron; $\theta$ is the scattering angle of the X-ray beam (see Fig. 1.1). The quantity $h/m_e c$ has the dimension of length and is denoted by $\lambda_e$. That is,

$$\lambda_e \equiv h/m_e c. \tag{1.5}$$

In other words, $\lambda_e$ is equal to the shift in the wavelength of the scattered beam obtained when $\theta = \pi/2$; the maximum shift, $2\lambda_e$, is obtained when $\theta = \pi$. The quantity $\lambda_e$ is called the electron Compton wavelength and has an approximate value of $2.426 \times 10^{-12}$ [m].

Let us derive (1.4) on the basis of conservation of energy and momentum. To this end, in Fig. 1.1 we assume that an electron is originally at rest. An X-ray beam is incident on the electron. Then the X-ray is scattered and the electron recoils as shown. The energy conservation reads as

$$\hbar\omega + m_e c^2 = \hbar\omega' + \sqrt{p^2 c^2 + m_e^2 c^4}, \tag{1.6}$$

where $\omega$ and $\omega'$ are the initial and final angular frequencies of the X-ray; the second term of the RHS is the energy of the electron, in which $p$ is the magnitude of its momentum after recoil. Meanwhile, conservation of the momentum as a vector quantity reads as

$$\hbar\mathbf{k} = \hbar\mathbf{k}' + \mathbf{p}, \tag{1.7}$$

where $\mathbf{k}$ and $\mathbf{k}'$ are the wavenumber vectors of the X-ray before and after being scattered; $\mathbf{p}$ is the momentum of the electron after recoil. Note that the initial momentum of the electron is zero since the electron is originally at rest. Here $\mathbf{p}$ is defined as

$$\mathbf{p} \equiv m\mathbf{u}, \tag{1.8}$$

where $\mathbf{u}$ is the velocity of the electron and $m$ is given by [1]

$$m = m_e \Big/ \sqrt{1 - |\mathbf{u}|^2/c^2}. \tag{1.9}$$

Figure 1.1 shows that $\hbar\mathbf{k}$, $\hbar\mathbf{k}'$, and $\mathbf{p}$ form a closed triangle. From (1.6), we have

$$\left[m_e c^2 + \hbar(\omega - \omega')\right]^2 = p^2 c^2 + m_e^2 c^4. \tag{1.10}$$

Hence, we get

$$2 m_e c^2 \hbar(\omega - \omega') + \hbar^2(\omega - \omega')^2 = p^2 c^2. \tag{1.11}$$

From (1.7), we have

$$p^2 = \hbar^2(\mathbf{k} - \mathbf{k}')^2 = \hbar^2\left(k^2 + k'^2 - 2kk'\cos\theta\right) = \frac{\hbar^2}{c^2}\left(\omega^2 + \omega'^2 - 2\omega\omega'\cos\theta\right), \tag{1.12}$$


where we used the relations $\omega = ck$ and $\omega' = ck'$ with the third equality. Therefore, we get

$$p^2 c^2 = \hbar^2\left(\omega^2 + \omega'^2 - 2\omega\omega'\cos\theta\right). \tag{1.13}$$

From (1.11) and (1.13), we have

$$2 m_e c^2 \hbar(\omega - \omega') + \hbar^2(\omega - \omega')^2 = \hbar^2\left(\omega^2 + \omega'^2 - 2\omega\omega'\cos\theta\right). \tag{1.14}$$

Equation (1.14) is simplified to the following:

$$2 m_e c^2 \hbar(\omega - \omega') - 2\hbar^2\omega\omega' = -2\hbar^2\omega\omega'\cos\theta.$$

That is,

$$m_e c^2(\omega - \omega') = \hbar\omega\omega'(1 - \cos\theta). \tag{1.15}$$

Thus, we get

$$\frac{\omega - \omega'}{\omega\omega'} = \frac{1}{\omega'} - \frac{1}{\omega} = \frac{1}{2\pi c}(\lambda' - \lambda) = \frac{\hbar}{m_e c^2}(1 - \cos\theta), \tag{1.16}$$

where $\lambda$ and $\lambda'$ are the wavelengths of the initial and final X-ray beams, respectively. Since $\lambda' - \lambda = \Delta\lambda$, we have (1.4) from (1.16) accordingly.

We have to mention another important person in the development of quantum mechanics, Louis-Victor de Broglie (1924). Encouraged by the success of Einstein and Compton, he propounded the concept of the matter wave, which was referred to as the de Broglie wave afterward. Namely, de Broglie reversed the relationship of (1.1) and (1.2) such that

$$\omega = E/\hbar, \tag{1.17}$$

and

$$k = \frac{p}{\hbar} \quad \text{or} \quad \lambda = h/p, \tag{1.18}$$

where $p$ equals $|\mathbf{p}|$ and $\lambda$ is the wavelength of a corpuscular beam. This is said to be the de Broglie wavelength. In (1.18), de Broglie thought that a particle carrying an energy $E$ and momentum $p$ is accompanied by a wave that is characterized by an angular frequency $\omega$ and wavenumber $k$ (or a wavelength $\lambda = 2\pi/k$). Equation (1.18) implies that if we are able to determine the wavelength of the corpuscular beam experimentally, we can determine the magnitude of the momentum accordingly.
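Returning to the Compton relations (1.4), (1.5), (1.6), (1.13), and (1.15) derived above, the following is a minimal numerical sketch of how they fit together. The function names and the sample wavelength are illustrative assumptions, not part of the text; the constants are standard SI values.

```python
import math

# Physical constants in SI units (standard values; adequate for a sketch)
H = 6.62607015e-34       # Planck constant h [J s]
M_E = 9.1093837015e-31   # electron rest mass m_e [kg]
C = 2.99792458e8         # speed of light in vacuum c [m/s]
HBAR = H / (2.0 * math.pi)

def compton_wavelength():
    """Electron Compton wavelength, Eq. (1.5): lambda_e = h / (m_e c)."""
    return H / (M_E * C)

def compton_shift(theta):
    """Wavelength shift of Eq. (1.4) for a scattering angle theta [rad]."""
    return compton_wavelength() * (1.0 - math.cos(theta))

def scattered_omega(omega, theta):
    """Solve Eq. (1.15) for omega' given omega and theta."""
    return omega / (1.0 + HBAR * omega * (1.0 - math.cos(theta)) / (M_E * C**2))

def energy_balance(omega, theta):
    """Relative residual of Eq. (1.6), with p^2 c^2 taken from Eq. (1.13)."""
    w1 = scattered_omega(omega, theta)
    p2c2 = HBAR**2 * (omega**2 + w1**2 - 2.0 * omega * w1 * math.cos(theta))
    lhs = HBAR * omega + M_E * C**2
    rhs = HBAR * w1 + math.sqrt(p2c2 + (M_E * C**2)**2)
    return abs(lhs - rhs) / lhs

# Illustrative (hypothetical) incident wavelength of 0.0709 nm, scattered at 90 degrees
lam = 7.09e-11
omega = 2.0 * math.pi * C / lam
lam_scattered = 2.0 * math.pi * C / scattered_omega(omega, math.pi / 2)
print("lambda_e        =", compton_wavelength())
print("lambda' - lambda =", lam_scattered - lam)
print("Eq. (1.4) shift  =", compton_shift(math.pi / 2))
```

Under these assumptions the shift obtained from the conservation laws and the shift from (1.4) should agree to rounding error, and `energy_balance` should be close to zero.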


In turn, from the squares of both sides of (1.8) and (1.9) we get

$$u = \frac{p}{m_e\sqrt{1 + (p/m_e c)^2}}. \tag{1.19}$$

This relation represents the velocity of particles of the corpuscular beam. If we are dealing with an electron beam, (1.19) gives the velocity of the electron beam. As a nonrelativistic approximation (i.e., $p/m_e c \ll 1$), we have

$$p \approx m_e u.$$

We used a relativistic relation in the second term of the RHS of (1.6), where the energy of an electron $E_e$ is expressed by

$$E_e = \sqrt{p^2 c^2 + m_e^2 c^4}. \tag{1.20}$$

In the meantime, eliminating $u^2$ from (1.8) and (1.9) we have

$$mc^2 = \sqrt{p^2 c^2 + m_e^2 c^4}.$$

Namely, we get [1]

$$E_e = mc^2. \tag{1.21}$$

The relation (1.21) is due to Einstein (1905, 1907) and is said to be the equivalence theorem of mass and energy. If an electron is accompanied by a matter wave, that wave should be propagated with a certain phase velocity $v_p$ and a group velocity $v_g$. Thus, using (1.17) and (1.18) we have

$$\begin{aligned} v_p &= \omega/k = E_e/p = \sqrt{p^2 c^2 + m_e^2 c^4}\,\big/\,p > c, \\ v_g &= \partial\omega/\partial k = \partial E_e/\partial p = c^2 p \,\big/ \sqrt{p^2 c^2 + m_e^2 c^4} < c, \\ v_p v_g &= c^2. \end{aligned} \tag{1.22}$$

Notice that in the above expressions, we replaced $E$ of (1.17) with $E_e$ of (1.20). The group velocity is thought to be the velocity of a wave packet and, hence, the propagation velocity of a matter wave should be identical to $v_g$. Thus, $v_g$ is considered as a particle velocity as well. In fact, $v_g$ given by (1.22) is identical to $u$ expressed in (1.19). Therefore, a particle velocity must not exceed $c$. As for photons (or light quanta), $v_p = v_g = c$ and, hence, once again we get $v_p v_g = c^2$. We will encounter the last relation of (1.22) in Part II as well.

The above discussion is a brief historical outlook of early-stage quantum theory before Erwin Schrödinger (1926) propounded his equation.
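The relations (1.19), (1.20), and (1.22) lend themselves to a quick numerical check. The sketch below, with hypothetical function names chosen for illustration, evaluates the phase and group velocities of an electron matter wave for a few momenta and compares the group velocity with the particle velocity of (1.19).

```python
import math

C = 2.99792458e8         # speed of light in vacuum [m/s]
M_E = 9.1093837015e-31   # electron rest mass [kg]

def particle_velocity(p, m0=M_E):
    """Eq. (1.19): u = p / (m0 * sqrt(1 + (p / (m0 c))^2))."""
    return p / (m0 * math.sqrt(1.0 + (p / (m0 * C))**2))

def energy(p, m0=M_E):
    """Eq. (1.20): E_e = sqrt(p^2 c^2 + m0^2 c^4)."""
    return math.hypot(p * C, m0 * C**2)

def phase_velocity(p, m0=M_E):
    """First relation of Eq. (1.22): v_p = E_e / p."""
    return energy(p, m0) / p

def group_velocity(p, m0=M_E):
    """Second relation of Eq. (1.22): v_g = c^2 p / E_e."""
    return C**2 * p / energy(p, m0)

# Momenta given in units of m_e c: nonrelativistic, intermediate, ultrarelativistic
for ratio in (0.01, 1.0, 100.0):
    p = ratio * M_E * C
    vp, vg, u = phase_velocity(p), group_velocity(p), particle_velocity(p)
    print(f"p/(m_e c)={ratio:7.2f}  v_p/c={vp / C:.6f}  v_g/c={vg / C:.6f}  u/c={u / C:.6f}")
```

For every momentum the printed values should show $v_g = u < c < v_p$, with the product $v_p v_g$ equal to $c^2$ up to rounding.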

1.2 Schrödinger Equation

First we introduce a wave equation expressed by

$$\nabla^2\psi = \frac{1}{v^2}\frac{\partial^2\psi}{\partial t^2}, \tag{1.23}$$

where $\psi$ is an arbitrary function of a physical quantity relevant to propagation of a wave; $v$ is the phase velocity of the wave; $\nabla^2$, called the Laplacian, is defined below:

$$\nabla^2 \equiv \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}. \tag{1.24}$$

One of the special solutions of (1.23), called a plane wave, is well studied and expressed as

$$\psi = \psi_0 e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)}. \tag{1.25}$$

In (1.25), $\mathbf{x}$ denotes a position vector of a three-dimensional Cartesian coordinate and is described as

$$\mathbf{x} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} x \\ y \\ z \end{pmatrix}, \tag{1.26}$$

where $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ denote basis vectors of an orthonormal base pointing to the positive directions of the x-, y-, and z-axes. Here we make it a rule to represent basis vectors by a row vector and to represent a coordinate or a component of a vector by a column vector; see Sect. 9.1.

The other way around, now we wish to seek a basic equation whose solution is described as (1.25). Taking account of (1.1)–(1.3) as well as (1.17) and (1.18), we rewrite (1.25) as

$$\psi = \psi_0 e^{i\left(\frac{\mathbf{p}}{\hbar}\cdot\mathbf{x} - \frac{E}{\hbar}t\right)}, \tag{1.27}$$

where we redefine $\mathbf{p} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} p_x \\ p_y \\ p_z \end{pmatrix}$ and $E$ as quantities associated with those of the matter (electron) wave. Taking partial differentiation of (1.27) with respect to $x$, we obtain

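$$\frac{\partial\psi}{\partial x} = \frac{i}{\hbar}p_x\psi_0 e^{i\left(\frac{\mathbf{p}}{\hbar}\cdot\mathbf{x} - \frac{E}{\hbar}t\right)} = \frac{i}{\hbar}p_x\psi. \tag{1.28}$$

Rewriting (1.28), we have

$$\frac{\hbar}{i}\frac{\partial\psi}{\partial x} = p_x\psi. \tag{1.29}$$

Similarly we have

$$\frac{\hbar}{i}\frac{\partial\psi}{\partial y} = p_y\psi \quad \text{and} \quad \frac{\hbar}{i}\frac{\partial\psi}{\partial z} = p_z\psi. \tag{1.30}$$

Comparing both sides of (1.29), we notice that we may relate a differential operator $\frac{\hbar}{i}\frac{\partial}{\partial x}$ to $p_x$. From (1.30), a similar relationship holds with the y and z components. That is, we have the following relations:

$$\frac{\hbar}{i}\frac{\partial}{\partial x} \leftrightarrow p_x, \quad \frac{\hbar}{i}\frac{\partial}{\partial y} \leftrightarrow p_y, \quad \frac{\hbar}{i}\frac{\partial}{\partial z} \leftrightarrow p_z. \tag{1.31}$$

Taking partial differentiation of (1.28) once more,

$$\frac{\partial^2\psi}{\partial x^2} = \left(\frac{i}{\hbar}\right)^2 p_x^2\psi_0 e^{i\left(\frac{\mathbf{p}}{\hbar}\cdot\mathbf{x} - \frac{E}{\hbar}t\right)} = -\frac{1}{\hbar^2}p_x^2\psi. \tag{1.32}$$

Hence,

$$-\hbar^2\frac{\partial^2\psi}{\partial x^2} = p_x^2\psi. \tag{1.33}$$

Similarly we have

$$-\hbar^2\frac{\partial^2\psi}{\partial y^2} = p_y^2\psi \quad \text{and} \quad -\hbar^2\frac{\partial^2\psi}{\partial z^2} = p_z^2\psi. \tag{1.34}$$

As in the above cases, we have

$$-\hbar^2\frac{\partial^2}{\partial x^2} \leftrightarrow p_x^2, \quad -\hbar^2\frac{\partial^2}{\partial y^2} \leftrightarrow p_y^2, \quad -\hbar^2\frac{\partial^2}{\partial z^2} \leftrightarrow p_z^2. \tag{1.35}$$

Summing both sides of (1.33) and (1.34) and then dividing by 2m, we have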

$$-\frac{\hbar^2}{2m}\nabla^2\psi = \frac{p^2}{2m}\psi \tag{1.36}$$

and the following correspondence

$$-\frac{\hbar^2}{2m}\nabla^2 \leftrightarrow \frac{p^2}{2m}, \tag{1.37}$$

where $m$ is the mass of a particle. Meanwhile, taking partial differentiation of (1.27) with respect to $t$, we obtain

$$\frac{\partial\psi}{\partial t} = -\frac{i}{\hbar}E\psi_0 e^{i\left(\frac{\mathbf{p}}{\hbar}\cdot\mathbf{x} - \frac{E}{\hbar}t\right)} = -\frac{i}{\hbar}E\psi. \tag{1.38}$$

That is,

$$i\hbar\frac{\partial\psi}{\partial t} = E\psi. \tag{1.39}$$

As the above, we get the following relationship:

$$i\hbar\frac{\partial}{\partial t} \leftrightarrow E. \tag{1.40}$$

Thus, we have relationships between c-numbers (classical numbers) and q-numbers (quantum numbers, namely, operators) in (1.35) and (1.40). Subtracting (1.36) from (1.39), we get

$$i\hbar\frac{\partial\psi}{\partial t} + \frac{\hbar^2}{2m}\nabla^2\psi = \left(E - \frac{p^2}{2m}\right)\psi. \tag{1.41}$$

Invoking the relationship on energy

$$(\text{Total energy}) = (\text{Kinetic energy}) + (\text{Potential energy}), \tag{1.42}$$

we have

$$E = \frac{p^2}{2m} + V, \tag{1.43}$$

where $V$ is a potential energy. Thus, (1.41) reads as

$$i\hbar\frac{\partial\psi}{\partial t} + \frac{\hbar^2}{2m}\nabla^2\psi = V\psi. \tag{1.44}$$

Rearranging (1.44), we finally get

$$\left(-\frac{\hbar^2}{2m}\nabla^2 + V\right)\psi = i\hbar\frac{\partial\psi}{\partial t}. \tag{1.45}$$

This is the Schrödinger equation, a fundamental equation of quantum mechanics. In (1.45), we define the following Hamiltonian operator $H$ as

$$H \equiv -\frac{\hbar^2}{2m}\nabla^2 + V. \tag{1.46}$$

Then we have a shorthand representation such that

$$H\psi = i\hbar\frac{\partial\psi}{\partial t}. \tag{1.47}$$
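The chain of steps from the plane wave (1.27) to (1.45) can be illustrated numerically. The sketch below is a one-dimensional check under stated assumptions: natural units ($\hbar = m = 1$), $V = 0$, and central finite differences in place of exact derivatives. The function names are hypothetical.

```python
import cmath

HBAR = 1.0   # natural units (an assumption made only for this sketch)
MASS = 1.0

def psi(x, t, p):
    """One-dimensional version of the plane wave (1.27) with E = p^2 / 2m (V = 0)."""
    e = p**2 / (2.0 * MASS)
    return cmath.exp(1j * (p * x - e * t) / HBAR)

def momentum_residual(x, t, p, h=1e-4):
    """|(hbar / i) d(psi)/dx - p psi|, i.e., the content of Eq. (1.29)."""
    dx = (psi(x + h, t, p) - psi(x - h, t, p)) / (2.0 * h)
    return abs(HBAR / 1j * dx - p * psi(x, t, p))

def schrodinger_residual(x, t, p, h=1e-4):
    """|H psi - i hbar d(psi)/dt| for Eq. (1.45) with V = 0, by central differences."""
    d2x = (psi(x + h, t, p) - 2.0 * psi(x, t, p) + psi(x - h, t, p)) / h**2
    dt = (psi(x, t + h, p) - psi(x, t - h, p)) / (2.0 * h)
    h_psi = -HBAR**2 / (2.0 * MASS) * d2x
    return abs(h_psi - 1j * HBAR * dt)

print(momentum_residual(0.7, 0.4, 1.3))
print(schrodinger_residual(0.7, 0.4, 1.3))
```

Both residuals should be small, limited only by the finite-difference step, which is consistent with the plane wave satisfying (1.29) and the free-particle form of (1.45).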

On going from (1.25) to (1.27), we realize that the quantities $\mathbf{k}$ and $\omega$ pertinent to a field have been converted to the quantities $\mathbf{p}$ and $E$ related to a particle. At the same time, whereas $\mathbf{x}$ and $t$ represent a whole space-time in (1.25), those in (1.27) are characterized as localized quantities.

From a historical point of view, we have to mention a great achievement accomplished by Werner Heisenberg (1925), who propounded matrix mechanics. Matrix mechanics is often contrasted with the wave mechanics Schrödinger initiated. Schrödinger and Paul Dirac (1926) demonstrated that wave mechanics and matrix mechanics are mathematically equivalent. Note that the Schrödinger equation is described as a nonrelativistic expression based on (1.43). In fact, the kinetic energy $K$ of a particle is given by [1]

$$K = \frac{m_e c^2}{\sqrt{1 - (u/c)^2}} - m_e c^2.$$

As a nonrelativistic approximation, we get

$$K \approx m_e c^2\left[1 + \frac{1}{2}\left(\frac{u}{c}\right)^2\right] - m_e c^2 = \frac{1}{2}m_e u^2 \approx \frac{p^2}{2m_e},$$

where we used $p \approx m_e u$ again as a nonrelativistic approximation; also, we used

$$\frac{1}{\sqrt{1 - x}} \approx 1 + \frac{1}{2}x$$

when $x$ (> 0), corresponding to $(u/c)^2$, is sufficiently smaller than 1. This implies that in the above case the group velocity $u$ of a particle is supposed to be well below the light velocity $c$. Dirac (1928) formulated an equation that describes relativistic quantum mechanics (the Dirac equation).

In (1.45) $\psi$ varies as a function of $\mathbf{x}$ and $t$. Suppose, however, that a potential $V$ depends only upon $\mathbf{x}$. Then we have

$$\left[-\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{x})\right]\psi(\mathbf{x}, t) = i\hbar\frac{\partial\psi(\mathbf{x}, t)}{\partial t}. \tag{1.48}$$

Now, let us assume that separation of variables can be done with (1.48) such that

$$\psi(\mathbf{x}, t) = \phi(\mathbf{x})\xi(t). \tag{1.49}$$

Then, we have

$$\left[-\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{x})\right]\phi(\mathbf{x})\xi(t) = i\hbar\frac{\partial\,\phi(\mathbf{x})\xi(t)}{\partial t}. \tag{1.50}$$

Accordingly, (1.50) can be recast as

$$\left[-\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{x})\right]\phi(\mathbf{x})\Big/\phi(\mathbf{x}) = i\hbar\frac{\partial\xi(t)}{\partial t}\Big/\xi(t). \tag{1.51}$$

For (1.51) to hold, we must equate both sides to a constant $E$. That is, for a certain fixed point $\mathbf{x}_0$ we have

$$\left[-\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{x}_0)\right]\phi(\mathbf{x}_0)\Big/\phi(\mathbf{x}_0) = i\hbar\frac{\partial\xi(t)}{\partial t}\Big/\xi(t), \tag{1.52}$$

where $\phi(\mathbf{x}_0)$ in the numerator should be evaluated after operating $\nabla^2$, while $\phi(\mathbf{x}_0)$ in the denominator is evaluated by simply replacing $\mathbf{x}$ in $\phi(\mathbf{x})$ with $\mathbf{x}_0$. Now, let us define a function $\Phi(\mathbf{x})$ such that

$$\Phi(\mathbf{x}) \equiv \left[-\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{x})\right]\phi(\mathbf{x})\Big/\phi(\mathbf{x}). \tag{1.53}$$

Then, we have

$$\Phi(\mathbf{x}_0) = i\hbar\frac{\partial\xi(t)}{\partial t}\Big/\xi(t). \tag{1.54}$$

If the RHS of (1.54) varied depending on $t$, $\Phi(\mathbf{x}_0)$ would be allowed to have various values, but this must not be the case with our present investigation. Thus, the RHS of (1.54) should take a constant value $E$. For the same reason, the LHS of (1.51) should take a constant. Thus, (1.48) or (1.51) should be separated into the following equations:

$$H\phi(\mathbf{x}) = E\phi(\mathbf{x}), \tag{1.55}$$

$$i\hbar\frac{\partial\xi(t)}{\partial t} = E\xi(t). \tag{1.56}$$

Equation (1.56) can readily be solved. Since (1.56) depends on a sole variable $t$, we have

$$\frac{d\xi(t)}{\xi(t)} = \frac{E}{i\hbar}dt \quad \text{or} \quad d\ln\xi(t) = \frac{E}{i\hbar}dt. \tag{1.57}$$

Integrating (1.57) from zero to $t$, we get

$$\ln\frac{\xi(t)}{\xi(0)} = \frac{Et}{i\hbar}. \tag{1.58}$$

That is,

$$\xi(t) = \xi(0)\exp(-iEt/\hbar). \tag{1.59}$$

Comparing (1.59) with (1.38), we find that the constant $E$ in (1.55) and (1.56) represents the energy of a particle (electron). Thus, the next task we want to do is to solve the eigenvalue equation (1.55). After solving the problem, we get a solution

$$\psi(\mathbf{x}, t) = \phi(\mathbf{x})\exp(-iEt/\hbar), \tag{1.60}$$

where the constant $\xi(0)$ has been absorbed in $\phi(\mathbf{x})$. Normally, $\phi(\mathbf{x})$ is to be normalized after determining the functional form (vide infra).
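As a small illustration of the time factor obtained above, the following sketch checks numerically that (1.59) satisfies (1.56) and that its modulus does not change with time. It assumes natural units ($\hbar = 1$) and an arbitrary sample energy; the function names are hypothetical.

```python
import cmath

HBAR = 1.0   # natural units (an assumption made only for this sketch)

def xi(t, energy, xi0=1.0):
    """Eq. (1.59): xi(t) = xi(0) exp(-i E t / hbar)."""
    return xi0 * cmath.exp(-1j * energy * t / HBAR)

def time_equation_residual(t, energy, h=1e-5):
    """|i hbar d(xi)/dt - E xi| for Eq. (1.56), by central differences."""
    dxi = (xi(t + h, energy) - xi(t - h, energy)) / (2.0 * h)
    return abs(1j * HBAR * dxi - energy * xi(t, energy))

sample_energy = 2.5   # arbitrary illustrative value
for t in (0.0, 1.0, 3.0):
    print(t, abs(xi(t, sample_energy)), time_equation_residual(t, sample_energy))
```

Because the modulus of the time factor stays equal to $|\xi(0)|$, the probability density associated with (1.60) is determined by $\phi(\mathbf{x})$ alone.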

1.3 Simple Applications of Schrödinger Equation

The Schrödinger equation has been expressed as (1.48). The equation is a second-order linear differential equation (SOLDE). In particular, our major interest lies in solving an eigenvalue problem of (1.55). Eigenvalues consist of points in a complex plane. Those points sometimes form a continuous domain, but we focus on the eigenvalues that comprise discrete points in the complex plane. Therefore, in our studies the eigenvalues are countable and numbered as, e.g., $\lambda_n$ ($n = 1, 2, 3, \ldots$). An example is depicted in Fig. 1.2. Having this common belief as a background, let us first think of a simple form of SOLDE.

Fig. 1.2 Eigenvalues $\lambda_n$ ($n = 1, 2, 3, \ldots$) on a complex plane

Example 1.1 Let us think of the following differential equation:

$$\frac{d^2 y(x)}{dx^2} + \lambda y(x) = 0, \tag{1.61}$$

where $x$ is a real variable; $y$ may be a complex function of $x$, with $\lambda$ possibly being a complex constant as well. Suppose that $y(x)$ is defined within a domain $[-L, L]$ ($L > 0$). We set boundary conditions (BCs) for (1.61) such that

$$y(L) = 0 \quad \text{and} \quad y(-L) = 0 \quad (L > 0). \tag{1.62}$$

The BCs of (1.62) are called Dirichlet conditions. We define the following differential operator $D$ described as

$$D \equiv -\frac{d^2}{dx^2}. \tag{1.63}$$

Then rewriting (1.61), we have

$$Dy(x) = \lambda y(x). \tag{1.64}$$

According to a general principle of SOLDE, it has two linearly independent solutions. In the case of (1.61), we choose exponential functions for those solutions described by

$$e^{ikx} \quad \text{and} \quad e^{-ikx} \quad (k \neq 0).$$

This is because the above functions do not change a functional form with respect to the differentiation, and we ascribe solving a differential equation to solving an algebraic equation among constants (or parameters). In the present case, $\lambda$ and $k$ are such constants. The parameter $k$ could be a complex variable, because $\lambda$ is allowed to take a complex value as well. Linear independence of these functions is ensured from a nonvanishing Wronskian, $W$. That is,

$$W = \begin{vmatrix} e^{ikx} & e^{-ikx} \\ \left(e^{ikx}\right)' & \left(e^{-ikx}\right)' \end{vmatrix} = \begin{vmatrix} e^{ikx} & e^{-ikx} \\ ike^{ikx} & -ike^{-ikx} \end{vmatrix} = -ik - ik = -2ik. \tag{1.65}$$

If $k \neq 0$, $W \neq 0$. Therefore, as a general solution, we get

$$y(x) = ae^{ikx} + be^{-ikx} \quad (k \neq 0), \tag{1.66}$$

where $a$ and $b$ are (complex) constants. We call two linearly independent solutions $e^{ikx}$ and $e^{-ikx}$ ($k \neq 0$) a fundamental set of solutions of a SOLDE. Inserting (1.66) into (1.61), we have

$$\left(\lambda - k^2\right)\left(ae^{ikx} + be^{-ikx}\right) = 0. \tag{1.67}$$

For (1.67) to hold with any $x$, we must have

$$\lambda - k^2 = 0, \quad \text{i.e.,} \quad \lambda = k^2. \tag{1.68}$$

Using BCs (1.62), we have

$$ae^{ikL} + be^{-ikL} = 0 \quad \text{and} \quad ae^{-ikL} + be^{ikL} = 0. \tag{1.69}$$

Rewriting (1.69) in a matrix form, we have

$$\begin{pmatrix} e^{ikL} & e^{-ikL} \\ e^{-ikL} & e^{ikL} \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \tag{1.70}$$

For $a$ and $b$ in (1.70) to have nonvanishing solutions, we must have

16

1 Schrödinger Equation and Its Application

ikL

e

eikL

eikL

¼ 0, eikL

i:e:,

e2ikL e2ikL ¼ 0:

ð1:71Þ

This is because if the determinant in (1.71) were not zero, we would have a = b = 0 and y(x) ≡ 0. Note that with an eigenvalue problem we must avoid having a solution that is identically zero. Rewriting (1.71), we get

$$\left(e^{ikL} + e^{-ikL}\right)\left(e^{ikL} - e^{-ikL}\right) = 0. \tag{1.72}$$

That is, we have either

$$e^{ikL} + e^{-ikL} = 0 \tag{1.73}$$

or

$$e^{ikL} - e^{-ikL} = 0. \tag{1.74}$$

In the case of (1.73), inserting this into (1.69) we have

$$e^{ikL}(a - b) = 0. \tag{1.75}$$

Therefore,

$$a = b, \tag{1.76}$$

where we used the fact that $e^{ikL}$ is nonvanishing for any ikL (either real or complex). Similarly, in the case of (1.74), we have

$$a = -b. \tag{1.77}$$

For (1.76), from (1.66) we have

$$y(x) = a\left(e^{ikx} + e^{-ikx}\right) = 2a \cos kx. \tag{1.78}$$

With (1.77), in turn, we get

$$y(x) = a\left(e^{ikx} - e^{-ikx}\right) = 2ia \sin kx. \tag{1.79}$$

Thus, we get two linearly independent solutions (1.78) and (1.79). Inserting BCs (1.62) into (1.78), we have

$$\cos kL = 0. \tag{1.80}$$

Hence,

$$kL = \frac{\pi}{2} + m\pi \quad (m = 0, \pm 1, \pm 2, \dots). \tag{1.81}$$

In (1.81), for instance, we have $k = \pi/2L$ for m = 0 and $k = -\pi/2L$ for m = −1. Also, we have $k = 3\pi/2L$ for m = 1 and $k = -3\pi/2L$ for m = −2. Each such pair, however, gives linearly dependent solutions for (1.78), because cos kx is even in k. Therefore, to get a set of linearly independent eigenfunctions we may define k as positive. Correspondingly, from (1.68) we get eigenvalues of

$$\lambda = (2m + 1)^2 \pi^2 / 4L^2 \quad (m = 0, 1, 2, \dots). \tag{1.82}$$

Also, inserting BCs (1.62) into (1.79), we have

$$\sin kL = 0. \tag{1.83}$$

Hence,

$$kL = n\pi \quad (n = 1, 2, 3, \dots). \tag{1.84}$$

From (1.68) we get

$$\lambda = n^2 \pi^2 / L^2 = (2n)^2 \pi^2 / 4L^2 \quad (n = 1, 2, 3, \dots), \tag{1.85}$$

where we chose positive numbers n for the same reason as above. With the second equality of (1.85), we made the eigenvalues easily comparable to those of (1.82). Figure 1.3 shows the eigenvalues given in both (1.82) and (1.85) in units of $\pi^2/4L^2$. From (1.82) and (1.85), we find that λ is positive definite (or strictly positive), and so from (1.68) we have

$$k = \sqrt{\lambda}. \tag{1.86}$$
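The eigenvalues (1.82) and (1.85) can be checked numerically. The following is a minimal sketch, not part of the original derivation: the single index j (odd j for the cosine family, even j for the sine family), the value L = 1, and the helper names `eigenpairs` and `residual` are assumptions introduced only for illustration.

```python
import math

L = 1.0  # half-width of the domain [-L, L]; value assumed for illustration

def eigenpairs(count):
    """Return (lambda_j, y_j) for j = 1..count, ordered by eigenvalue.

    Odd j reproduces the cosine family (1.82), even j the sine family (1.85);
    in both cases lambda_j = j**2 * pi**2 / (4 * L**2).
    """
    pairs = []
    for j in range(1, count + 1):
        k = j * math.pi / (2 * L)
        if j % 2:
            y = lambda x, k=k: math.cos(k * x)
        else:
            y = lambda x, k=k: math.sin(k * x)
        pairs.append((k * k, y))
    return pairs

def residual(lam, y, x, h=1e-4):
    """Central-difference estimate of D y - lam * y at x, with D = -d^2/dx^2."""
    second = (y(x + h) - 2 * y(x) + y(x - h)) / h**2
    return -second - lam * y(x)

pairs = eigenpairs(6)
for lam, y in pairs:
    # Dirichlet BCs (1.62) hold up to round-off.
    assert abs(y(L)) < 1e-12 and abs(y(-L)) < 1e-12
    # The eigenvalue equation (1.64) holds up to discretization error.
    assert abs(residual(lam, y, 0.3)) < 1e-4

# Eigenvalues in units of pi^2 / 4L^2, as plotted in Fig. 1.3.
units = [lam / (math.pi**2 / (4 * L**2)) for lam, _ in pairs]
print([round(u) for u in units])  # [1, 4, 9, 16, 25, 36]
```

The squares 1, 4, 9, … reproduce the tick marks of Fig. 1.3, with cosine and sine eigenfunctions alternating.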

The next step is to normalize the eigenfunctions. This step corresponds to an appropriate choice of the constant a in (1.78) and (1.79) so that we have

$$I = \int_{-L}^{L} y(x)^* y(x)\,dx = \int_{-L}^{L} |y(x)|^2\,dx = 1. \tag{1.87}$$

[Fig. 1.3 Eigenvalues of a differential equation (1.61) under boundary conditions given by (1.62). The eigenvalues are given in units of π²/4L² on a real axis; tick marks at 0, 1, 4, 9, 16, 25, …]

That is,

$$I = 4|a|^2 \int_{-L}^{L} \cos^2 kx\,dx = 4|a|^2 \int_{-L}^{L} \frac{1}{2}(1 + \cos 2kx)\,dx = 2|a|^2 \left[x + \frac{1}{2k}\sin 2kx\right]_{-L}^{L} = 4L|a|^2. \tag{1.88}$$

Combining (1.87) and (1.88), we get

$$|a| = \frac{1}{2}\sqrt{\frac{1}{L}}. \tag{1.89}$$

Thus, we have

$$a = \frac{1}{2}\sqrt{\frac{1}{L}}\,e^{i\theta}, \tag{1.90}$$

where θ is any real number and $e^{i\theta}$ is said to be a phase factor. We usually set $e^{i\theta} \equiv 1$. Then, we have $a = \frac{1}{2}\sqrt{1/L}$. Thus, for the normalized cosine eigenfunctions, we get

$$y(x) = \sqrt{\frac{1}{L}} \cos kx \quad \left[kL = \frac{\pi}{2} + m\pi \ (m = 0, 1, 2, \dots)\right] \tag{1.91}$$

that corresponds to an eigenvalue $\lambda = (2m + 1)^2\pi^2/4L^2$ (m = 0, 1, 2, …). For the other series of normalized sine functions, we similarly get

$$y(x) = \sqrt{\frac{1}{L}} \sin kx \quad [kL = n\pi \ (n = 1, 2, 3, \dots)] \tag{1.92}$$

that corresponds to an eigenvalue $\lambda = (2n)^2\pi^2/4L^2$ (n = 1, 2, 3, …). Notice that arranging λ in ascending order, we have even functions and odd functions alternately as eigenfunctions corresponding to λ. Such a property is said to be parity. We often encounter it in quantum mechanics and related fields. From (1.61) we find that if y(x) is an eigenfunction, so is cy(x). That is, we should bear in mind that the eigenvalue problem is always accompanied by an indeterminate constant and that normalization of an eigenfunction does not mean the uniqueness of the solution (see Chap. 10).
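The normalization (1.87) and the alternating parity can be checked numerically. The sketch below is illustrative only: the value L = 2, the combined index j (odd for cosine, even for sine), and the helpers `simpson`, `y`, and `inner` are assumptions, and a composite Simpson rule stands in for the exact integrals.

```python
import math

L = 2.0  # illustrative half-width of the domain

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def y(j):
    """Normalized eigenfunction of (1.91)/(1.92): cosine for odd j, sine for even j."""
    k = j * math.pi / (2 * L)
    trig = math.cos if j % 2 else math.sin
    return lambda x: math.sqrt(1 / L) * trig(k * x)

def inner(j1, j2):
    """Inner product of two (real) eigenfunctions over [-L, L]."""
    f, g = y(j1), y(j2)
    return simpson(lambda x: f(x) * g(x), -L, L)

# Gram matrix of the first four eigenfunctions; it should be close to the identity.
gram = [[inner(i, j) for j in range(1, 5)] for i in range(1, 5)]
for row in gram:
    print(["%.6f" % abs(v) for v in row])

# Parity alternates with ascending eigenvalue: even, odd, even, odd, ...
parity = ["even" if abs(y(j)(0.7) - y(j)(-0.7)) < 1e-12 else "odd" for j in range(1, 5)]
print(parity)
```

The diagonal entries confirm the choice of a in (1.89); the vanishing off-diagonal entries show that the eigenfunctions are also mutually orthogonal on [−L, L].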


Strictly speaking, we should be careful to make sure that (1.81) follows from (1.80), because we still have the possibility that k is a complex number. To see this, we examine the zeros of a cosine function defined in a complex domain. Here the zeros are the (complex) numbers at which the function takes the value zero. That is, if $f(z_0) = 0$, $z_0$ is called a zero (i.e., one of the zeros) of f(z). Now we have

$$\cos z \equiv \frac{1}{2}\left(e^{iz} + e^{-iz}\right); \quad z = x + iy \ (x, y: \text{real}). \tag{1.93}$$

Inserting z = x + iy in cos z and rearranging terms, we get

$$\cos z = \frac{1}{2}\left[\cos x\left(e^{y} + e^{-y}\right) + i \sin x\left(e^{-y} - e^{y}\right)\right]. \tag{1.94}$$

For cos z to vanish, both its real and imaginary parts must be zero. Since $e^{y} + e^{-y} > 0$ for all real numbers y, we must have cos x = 0 for the real part to vanish; i.e.,

$$x = \frac{\pi}{2} + m\pi \quad (m = 0, \pm 1, \pm 2, \dots). \tag{1.95}$$

Note in this case that sin x = ±1 (≠ 0). Therefore, for the imaginary part to vanish, $e^{-y} - e^{y} = 0$. That is, we must have y = 0. Consequently, the zeros of cos z are real numbers. In other words, with respect to $z_0$ that satisfies $\cos z_0 = 0$ we have

$$z_0 = \frac{\pi}{2} + m\pi \quad (m = 0, \pm 1, \pm 2, \dots). \tag{1.96}$$

The above discussion equally applies to a sine function. Thus, we ensure that k is a nonzero real number. Eigenvalues λ are positive definite from (1.68) accordingly. This conclusion is not fortuitous but a direct consequence of the form of the differential equation we have dealt with in combination with the BCs we imposed, i.e., the Dirichlet conditions. Detailed discussion will follow in Sects. 1.4, 8.3, and 8.4 in relation to the Hermiticity of a differential operator.

Example 1.2: A Particle Confined Within a Potential Well

The results obtained in Example 1.1 can immediately be applied to a particle (electron) in a one-dimensional infinite potential well. In this case, (1.55) reads as

$$\frac{\hbar^2}{2m}\frac{d^2\psi(x)}{dx^2} + E\psi(x) = 0, \tag{1.97}$$

where m is the mass of the particle and E is its energy. The potential V is expressed as

$$V(x) = \begin{cases} 0 & (-L \le x \le L), \\ \infty & (-L > x;\ x > L). \end{cases}$$

Rewriting (1.97), we have

$$\frac{d^2\psi(x)}{dx^2} + \frac{2mE}{\hbar^2}\psi(x) = 0 \tag{1.98}$$

with BCs

$$\psi(L) = \psi(-L) = 0. \tag{1.99}$$

If we replace λ of (1.61) with $2mE/\hbar^2$, we can follow the procedures of Example 1.1. That is, we put

$$E = \frac{\hbar^2 \lambda}{2m} \tag{1.100}$$

with $\lambda = k^2$ in (1.68). For k we use the values of (1.81) and (1.84). Therefore, for the energy eigenvalues we get either

$$E = \frac{\hbar^2}{2m}\cdot\frac{(2l + 1)^2\pi^2}{4L^2} \quad (l = 0, 1, 2, \dots),$$

to which

$$\psi(x) = \sqrt{\frac{1}{L}} \cos kx \quad \left[kL = \frac{\pi}{2} + l\pi \ (l = 0, 1, 2, \dots)\right] \tag{1.101}$$

corresponds, or

$$E = \frac{\hbar^2}{2m}\cdot\frac{(2n)^2\pi^2}{4L^2} \quad (n = 1, 2, 3, \dots),$$

to which

$$\psi(x) = \sqrt{\frac{1}{L}} \sin kx \quad [kL = n\pi \ (n = 1, 2, 3, \dots)] \tag{1.102}$$

corresponds. Since the particle behaves as a free particle within the potential well (−L ≤ x ≤ L) and p = ħk, we obtain

$$E = \frac{p^2}{2m} = \frac{\hbar^2}{2m}k^2,$$

where

$$k = \begin{cases} (2l + 1)\pi/2L & (l = 0, 1, 2, \dots), \\ 2n\pi/2L & (n = 1, 2, 3, \dots). \end{cases}$$

The energy E is the kinetic energy of the particle. Although ψ(x) ≡ 0 trivially satisfies (1.97), such a function may not be regarded as a solution of the eigenvalue problem. In fact, considering that $|\psi(x)|^2$ represents the existence probability of a particle, ψ(x) ≡ 0 corresponds to a situation where the particle in question does not exist. Consequently, such a trivial case has no physical meaning.
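To get a feeling for the magnitudes involved, the energy levels can be evaluated for an electron. This is a rough sketch under stated assumptions: the well half-width L = 0.5 nm is chosen arbitrarily, the combined index j = 1, 2, 3, … (odd j for (1.101), even j for (1.102)) and the function name `well_energy` are introduced here for illustration, and the constants are the CODATA 2018 values.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
M_E = 9.1093837015e-31   # electron mass, kg
EV = 1.602176634e-19     # joules per electronvolt

def well_energy(j, half_width):
    """E_j = hbar^2 k^2 / 2m with k = j * pi / (2L), j = 1, 2, 3, ...

    Odd j corresponds to the cosine states (1.101), even j to the sine states (1.102).
    """
    k = j * math.pi / (2 * half_width)
    return HBAR**2 * k**2 / (2 * M_E)

L = 0.5e-9  # assumed half-width, so the well spans 1 nm in total

levels_eV = [well_energy(j, L) / EV for j in range(1, 5)]
for j, energy in enumerate(levels_eV, start=1):
    print(f"j = {j}: E = {energy:.3f} eV")
```

The levels scale as j², so the spacing between neighbouring levels grows with energy, and halving L raises every level by a factor of four.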

1.4 Quantum-Mechanical Operators and Matrices

As represented by (1.55), a quantum-mechanical operator corresponds to a physical quantity. In (1.55), we connect a Hamiltonian operator to an energy (eigenvalue). Let us rephrase the situation as follows:

$$P\Psi = p\Psi. \tag{1.103}$$

In (1.103), we are viewing P as an operation or measurement on a physical system that is characterized by the quantum state Ψ. Operating P on the physical system (or state), we obtain a physical quantity p relevant to P as a result of the operation (or measurement). An effective way to achieve this is to use a matrix and a vector to represent the operation and the physical state, respectively. Let us take a brief look at matrix calculation to get used to the quantum-mechanical concept and, hence, to obtain a clear understanding of it. In Part III, we will deal with matrix calculation in detail from the point of view of a general principle. At present, a (2, 2) matrix suffices. Let A be a (2, 2) matrix expressed as

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. \tag{1.104}$$

Let $|\psi\rangle$ be a (2, 1) matrix, i.e., a column vector such that

$$|\psi\rangle = \begin{pmatrix} e \\ f \end{pmatrix}. \tag{1.105}$$

Note that operating a (2, 2) matrix on a (2, 1) matrix produces another (2, 1) matrix. Furthermore, we define an adjoint matrix $A^\dagger$ such that

$$A^\dagger = \begin{pmatrix} a^* & c^* \\ b^* & d^* \end{pmatrix}, \tag{1.106}$$

where $a^*$ is the complex conjugate of a. That is, $A^\dagger$ is the complex conjugate transposed matrix of A. Also, we define an adjoint vector $\langle\psi|$ or $|\psi\rangle^\dagger$ such that

$$\langle\psi| \equiv |\psi\rangle^\dagger = (e^* \ \ f^*). \tag{1.107}$$

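In this case, $|\psi\rangle^\dagger$ also denotes a complex conjugate transpose of $|\psi\rangle$. The notations $|\psi\rangle$ and $\langle\psi|$ are due to Dirac. He named $\langle\psi|$ and $|\varphi\rangle$ a bra vector and a ket vector, respectively. This naming or equivoque comes from the fact that $\langle\psi| \cdot |\varphi\rangle = \langle\psi|\varphi\rangle$ forms a bracket. This is a (1, 2) × (2, 1) = (1, 1) matrix, i.e., a c-number (including a complex number), and $\langle\psi|\varphi\rangle$ represents an inner product. These notations are widely used nowadays in mathematics and physics.

Taking another vector $|\xi\rangle = \begin{pmatrix} g \\ h \end{pmatrix}$ and using the matrix calculation rule, we have

$$A^\dagger|\psi\rangle = |A^\dagger\psi\rangle = \begin{pmatrix} a^* & c^* \\ b^* & d^* \end{pmatrix}\begin{pmatrix} e \\ f \end{pmatrix} = \begin{pmatrix} a^* e + c^* f \\ b^* e + d^* f \end{pmatrix}. \tag{1.108}$$

According to the definition (1.107), we have

$$|A^\dagger\psi\rangle^\dagger = \langle A^\dagger\psi| = (a e^* + c f^* \ \ \ b e^* + d f^*). \tag{1.109}$$

Thus, we get

$$\langle A^\dagger\psi|\xi\rangle = (a e^* + c f^* \ \ \ b e^* + d f^*)\begin{pmatrix} g \\ h \end{pmatrix} = (ag + bh)e^* + (cg + dh)f^*. \tag{1.110}$$

Similarly, we have

$$\langle\psi|A\xi\rangle = (e^* \ \ f^*)\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} g \\ h \end{pmatrix} = (ag + bh)e^* + (cg + dh)f^*. \tag{1.111}$$

Comparing (1.110) and (1.111), we get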

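$$\langle A^\dagger\psi|\xi\rangle = \langle\psi|A\xi\rangle. \tag{1.112}$$

Also, we have

$$\langle\psi|A\xi\rangle^* = \langle A\xi|\psi\rangle. \tag{1.113}$$

Replacing A with $A^\dagger$ in (1.112), we get

$$\left\langle \left(A^\dagger\right)^\dagger\psi \middle| \xi \right\rangle = \left\langle \psi \middle| A^\dagger\xi \right\rangle. \tag{1.114}$$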

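From (1.104) and (1.106), obviously we have

$$\left(A^\dagger\right)^\dagger = A. \tag{1.115}$$

Then, from (1.114) and (1.115) we have

$$\langle A\psi|\xi\rangle = \langle\psi|A^\dagger\xi\rangle = \langle\xi|A\psi\rangle^*, \tag{1.116}$$

where the second equality comes from (1.113) obtained by exchanging ψ and ξ there. Moreover, we have the following relation:

$$(AB)^\dagger = B^\dagger A^\dagger. \tag{1.117}$$

The proof is left for readers. Using this relation, we have

$$\langle A\psi| = |A\psi\rangle^\dagger = [A|\psi\rangle]^\dagger = |\psi\rangle^\dagger A^\dagger = \langle\psi|A^\dagger = \langle\psi A^\dagger|. \tag{1.118}$$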

Making an inner product by multiplying $|\xi\rangle$ from the right of the leftmost and rightmost sides of (1.118) and using (1.116), we get

$$\langle A\psi|\xi\rangle = \langle\psi A^\dagger|\xi\rangle = \langle\psi|A^\dagger\xi\rangle.$$

This relation may be regarded as the associative law with regard to the symbol "|" of the inner product. This is equivalent to the associative law with regard to the matrix multiplication. The results obtained above can readily be extended to a general case where (n, n) matrices are dealt with.

Now, let us introduce an Hermitian operator (or matrix) H. When we have

$$H^\dagger = H, \tag{1.119}$$

H is called an Hermitian matrix. Then, applying (1.112) to the Hermitian matrix H, we have

$$\langle H^\dagger\psi|\xi\rangle = \langle\psi|H\xi\rangle = \langle\psi|H^\dagger\xi\rangle \quad \text{or} \quad \langle H\psi|\xi\rangle = \langle\psi|H^\dagger\xi\rangle = \langle\psi|H\xi\rangle. \tag{1.120}$$
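The relations (1.112), (1.113), and (1.117) can be spot-checked with concrete numbers. The following sketch uses plain Python complex arithmetic on (2, 2) matrices; the specific entries of `A`, `B`, `psi`, and `xi` and the helper names are arbitrary choices made for illustration, not values from the text.

```python
def dagger(M):
    """Complex-conjugate transpose of a (2, 2) matrix stored as nested lists, cf. (1.106)."""
    return [[M[j][i].conjugate() for j in range(2)] for i in range(2)]

def matvec(M, v):
    """Matrix acting on a column vector."""
    return [M[i][0] * v[0] + M[i][1] * v[1] for i in range(2)]

def matmul(X, Y):
    """Product of two (2, 2) matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def braket(u, v):
    """Inner product <u|v>; the bra components are conjugated as in (1.107)."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

# Arbitrary complex test data (assumed values).
A = [[1 + 2j, 3 - 1j], [0.5j, -2 + 0j]]
B = [[2 + 0j, 1j], [1 - 1j, 4 + 0j]]
psi = [1 - 1j, 2 + 0.5j]
xi = [0.3 + 0j, -1 + 2j]

lhs_112 = braket(matvec(dagger(A), psi), xi)        # <A^dagger psi | xi>
rhs_112 = braket(psi, matvec(A, xi))                # <psi | A xi>
lhs_113 = braket(psi, matvec(A, xi)).conjugate()    # <psi | A xi>^*
rhs_113 = braket(matvec(A, xi), psi)                # <A xi | psi>
lhs_117 = dagger(matmul(A, B))                      # (AB)^dagger
rhs_117 = matmul(dagger(B), dagger(A))              # B^dagger A^dagger

print(abs(lhs_112 - rhs_112), abs(lhs_113 - rhs_113))
```

Replacing `A` by an Hermitian matrix, for instance `[[1 + 0j, 2 - 1j], [2 + 1j, 3 + 0j]]`, turns the first check into a numerical instance of (1.120).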

Also let us introduce a norm of a vector $|\psi\rangle$ such that

$$\|\psi\| \equiv \sqrt{\langle\psi|\psi\rangle}. \tag{1.121}$$

A norm is a natural extension of the notion of the "length" of a vector. The norm $\|\psi\|$ is zero if and only if $|\psi\rangle = 0$ (zero vector). Indeed, from (1.105) and (1.107), we have

$$\langle\psi|\psi\rangle = |e|^2 + |f|^2.$$

Therefore, $\langle\psi|\psi\rangle = 0 \iff e = f = 0$, i.e., $|\psi\rangle = 0$.

Let us further consider an eigenvalue problem represented by our newly introduced notation. The eigenvalue equation is symbolically written as

$$H|\psi\rangle = \lambda|\psi\rangle, \tag{1.122}$$

where H represents an Hermitian operator and $|\psi\rangle$ is an eigenfunction that belongs to an eigenvalue λ. Operating $\langle\psi|$ on (1.122) from the left, we have

$$\langle\psi|H|\psi\rangle = \langle\psi|\lambda|\psi\rangle = \lambda\langle\psi|\psi\rangle = \lambda, \tag{1.123}$$

where we assume that $|\psi\rangle$ is normalized, namely $\langle\psi|\psi\rangle = 1$ or $\|\psi\| = 1$. Notice that the symbol "|" in an inner product is of secondary importance. We may disregard this notation, just as a product notation "×" is omitted by writing ab instead of a × b. Taking the complex conjugate of (1.123), we have

$$\langle\psi|H\psi\rangle^* = \lambda^*. \tag{1.124}$$

Using (1.116) and (1.124), we have

$$\lambda^* = \langle\psi|H\psi\rangle^* = \langle\psi|H^\dagger\psi\rangle = \langle\psi|H\psi\rangle = \lambda, \tag{1.125}$$

where with the third equality we used the definition (1.119). The relation $\lambda^* = \lambda$ shows that any eigenvalue λ is real if H is Hermitian. The relation (1.125) immediately tells us that even if $|\psi\rangle$ is not an eigenfunction, $\langle\psi|H\psi\rangle$ is real as well, if H is Hermitian. The quantity $\langle\psi|H\psi\rangle$ is said to be an expectation value. This value is interpreted as the averaged value of H obtained as a result of operation of H on a physical state $|\psi\rangle$. We sometimes denote the expectation value as

$$\langle H\rangle \equiv \langle\psi|H\psi\rangle, \tag{1.126}$$

where $|\psi\rangle$ is normalized. If $|\psi\rangle$ is not normalized, it can be normalized on the basis of (1.121) by choosing $|\Phi\rangle$ such that

$$|\Phi\rangle = |\psi\rangle / \|\psi\|. \tag{1.127}$$

Thus, we have an important consequence: if an Hermitian operator has an eigenvalue, it must be real. An expectation value of an Hermitian operator is real as well. Real eigenvalues and expectation values are a prerequisite for a physical quantity. As discussed above, Hermitian matrices play a central role in quantum physics.

Taking a further step, let us extend the notion of Hermiticity to a function space. In Example 1.1, we remarked that we finally reached a solution where λ is a real (and positive) number, even though at the beginning we set no restriction on λ. This is because the SOLDE form (1.61) accompanied by BCs (1.62) is Hermitian, and so the eigenvalues λ are real. In this context, we give the matter a little further consideration. We define an inner product between two functions as follows:

$$\langle g|f\rangle \equiv \int_a^b g(x)^* f(x)\,dx, \tag{1.128}$$

where $g(x)^*$ is the complex conjugate of g(x); x is a real variable and the integration range can be either bounded or unbounded. If a and b are real definite numbers, [a, b] is the bounded case. With the unbounded case, we have, e.g., (−∞, ∞), (−∞, c), and (c, ∞), where c is a definite number. This notation will appear again in Chap. 10. In (1.128) we view functions f and g as vectors in a function space, often referred to as a Hilbert space. We assume that any function f is square-integrable; i.e., the integral of $|f|^2$ is finite. That is,

$$\int_a^b |f(x)|^2\,dx < \infty. \tag{1.129}$$

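Using the above definition, let us calculate $\langle g|Df\rangle$, where D was defined in (1.63). Then, using integration by parts, we have

$$\begin{aligned}
\langle g|Df\rangle &= \int_a^b g(x)^*\left(-\frac{d^2 f(x)}{dx^2}\right)dx = -\left[g^* f'\right]_a^b + \int_a^b {g^*}' f'\,dx \\
&= -\left[g^* f'\right]_a^b + \left[{g^*}' f\right]_a^b - \int_a^b {g^*}'' f\,dx = \left[{g^*}' f - g^* f'\right]_a^b + \int_a^b \left(-g''\right)^* f\,dx \\
&= \left[{g^*}' f - g^* f'\right]_a^b + \langle Dg|f\rangle.
\end{aligned} \tag{1.130}$$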

If we have BCs such that

$$f(b) = f(a) = 0 \quad \text{and} \quad g(b) = g(a) = 0, \quad \text{i.e.,} \quad g(b)^* = g(a)^* = 0, \tag{1.131}$$

we get

$$\langle g|Df\rangle = \langle Dg|f\rangle. \tag{1.132}$$

In light of (1.120), (1.132) implies that D is Hermitian. In (1.131), notice that the functions f and g satisfy the same BCs. Normally, an operator is Hermitian only with respect to functions that share the same BCs in this way. Thus, the Hermiticity of a differential operator is closely related to the BCs of the differential equation. Next, we consider the following inner product:

$$\langle f|Df\rangle = -\int_a^b f^* f''\,dx = -\left[f^* f'\right]_a^b + \int_a^b {f^*}' f'\,dx = -\left[f^* f'\right]_a^b + \int_a^b |f'|^2\,dx. \tag{1.133}$$

Note that the definite integral of (1.133) cannot be negative. There are several possibilities for D to be Hermitian according to different BCs.

(i) Dirichlet conditions: f(b) = f(a) = 0. If we could have f′ ≡ 0, $\langle f|Df\rangle$ would be zero. But in that case f would be constant and, hence, f(x) ≡ 0 according to the BCs. We must exclude this trivial case. Consequently, we must have

$$\int_a^b |f'|^2\,dx > 0 \quad \text{or} \quad \langle f|Df\rangle > 0. \tag{1.134}$$

In this case, the operator D is said to be positive definite. Suppose that such a positive-definite operator has an eigenvalue λ. Then, for a corresponding eigenfunction y(x) we have

$$Dy(x) = \lambda y(x). \tag{1.135}$$

In this case, we state that y(x) is an eigenfunction or eigenvector that corresponds (or belongs) to an eigenvalue λ. Taking an inner product of both sides with y, we have

$$\langle y|Dy\rangle = \langle y|\lambda y\rangle = \lambda\langle y|y\rangle = \lambda\|y\|^2 \quad \text{or} \quad \lambda = \langle y|Dy\rangle / \|y\|^2. \tag{1.136}$$

Both $\langle y|Dy\rangle$ and $\|y\|^2$ are positive and, hence, we have λ > 0. Thus, if D has an eigenvalue, it must be positive. In this case, λ is said to be positive definite as well; see Example 1.1.

(ii) Neumann conditions: f′(b) = f′(a) = 0. From (1.130), D is Hermitian as well. Unlike the condition (i), however, f may be a nonzero constant in this case. Therefore, we are allowed to have

$$\int_a^b |f'|^2\,dx = 0 \quad \text{or} \quad \langle f|Df\rangle = 0. \tag{1.137}$$

For any function, we have

$$\langle f|Df\rangle \ge 0. \tag{1.138}$$

In this case, the operator D is said to be non-negative (or positive semidefinite). The eigenvalue may be zero from (1.136) and, hence, is called non-negative accordingly.

(iii) Periodic conditions: f(b) = f(a) and f′(b) = f′(a). We are allowed to have $\langle f|Df\rangle \ge 0$ as in the case of the condition (ii). Then, the operator and eigenvalues are non-negative.

Thus, in spite of being formally the same operator, D behaves differently according to the different BCs. In particular, a differential operator that has an eigenvalue of zero is of special interest. We will encounter another illustration in Chap. 3.
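The dependence of the spectrum on the BCs can be illustrated with a discretized version of D. The following is a minimal sketch and not the method of the text: it assumes NumPy is available, replaces D = −d²/dx² by the standard three-point finite-difference matrix on a grid of unit spacing, and the function name `second_difference` and the grid size 50 are arbitrary choices.

```python
import numpy as np

def second_difference(n, bc):
    """Finite-difference matrix for D = -d^2/dx^2 on n grid points (unit spacing).

    bc = "dirichlet": f vanishes just outside the grid,
    bc = "neumann":   f' vanishes at both ends (cell-centred grid),
    bc = "periodic":  the first and last grid points are neighbours.
    """
    D = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    if bc == "neumann":
        D[0, 0] = D[-1, -1] = 1.0
    elif bc == "periodic":
        D[0, -1] = D[-1, 0] = -1.0
    elif bc != "dirichlet":
        raise ValueError(f"unknown boundary condition: {bc}")
    return D

# All three matrices are real symmetric (Hermitian), so eigvalsh applies;
# it returns the eigenvalues in ascending order.
lowest = {
    bc: np.linalg.eigvalsh(second_difference(50, bc))[0]
    for bc in ("dirichlet", "neumann", "periodic")
}
for bc, value in lowest.items():
    print(f"{bc:9s} lowest eigenvalue = {value:.3e}")
```

Under these assumptions the Dirichlet matrix has a strictly positive lowest eigenvalue, whereas the Neumann and periodic matrices have a lowest eigenvalue of zero (up to round-off) belonging to the constant vector, mirroring cases (i) to (iii).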

1.5 Commutator and Canonical Commutation Relation

In quantum mechanics it is important whether two operators A and B are commutable. In this context, a commutator between A and B is defined such that

$$[A, B] \equiv AB - BA. \tag{1.139}$$

If [A, B] = 0 (zero matrix), A and B are said to be commutable (or commutative). If [A, B] ≠ 0, A and B are noncommutative. Such relationships between two operators are called commutation relations. We have the canonical commutation relation as an underlying concept of quantum mechanics. This is defined between a (canonical) coordinate q and a (canonical) momentum p such that

$$[q, p] = i\hbar, \tag{1.140}$$

where the presence of a unit matrix E is implied. Explicitly writing it, we have

$$[q, p] = i\hbar E. \tag{1.141}$$

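The relations (1.140) and (1.141) are called the canonical commutation relation. On the basis of the relation $p = \frac{\hbar}{i}\frac{\partial}{\partial q}$, a brief proof for this is as follows:

$$\begin{aligned}
[q, p]|\psi\rangle &= (qp - pq)|\psi\rangle = \left(q\frac{\hbar}{i}\frac{\partial}{\partial q} - \frac{\hbar}{i}\frac{\partial}{\partial q}q\right)|\psi\rangle = q\frac{\hbar}{i}\frac{\partial|\psi\rangle}{\partial q} - \frac{\hbar}{i}\frac{\partial}{\partial q}\left(q|\psi\rangle\right) \\
&= q\frac{\hbar}{i}\frac{\partial|\psi\rangle}{\partial q} - \frac{\hbar}{i}\frac{\partial q}{\partial q}|\psi\rangle - \frac{\hbar}{i}q\frac{\partial|\psi\rangle}{\partial q} = -\frac{\hbar}{i}|\psi\rangle = i\hbar|\psi\rangle.
\end{aligned} \tag{1.142}$$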
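The proof (1.142) can be mimicked numerically by letting q and p act on an ordinary test function. This is only a sketch under stated assumptions: units with ħ = 1, a central-difference approximation of ∂/∂q with step `STEP`, and an arbitrarily chosen smooth function `psi`; none of these appear in the text.

```python
import math

HBAR = 1.0    # units with hbar = 1 (assumption of this sketch)
STEP = 1e-4   # finite-difference step for the derivative

def p_op(f):
    """Momentum operator (hbar/i) d/dq, approximated by a central difference."""
    return lambda q: (HBAR / 1j) * (f(q + STEP) - f(q - STEP)) / (2 * STEP)

def q_op(f):
    """Position operator: multiplication by q."""
    return lambda q: q * f(q)

def commutator_qp(f):
    """[q, p] f = q (p f) - p (q f)."""
    return lambda q: q_op(p_op(f))(q) - p_op(q_op(f))(q)

# Arbitrary smooth test function standing in for |psi>.
psi = lambda q: math.exp(-q * q) * (1 + 0.5 * q)

samples = [-1.0, 0.2, 0.7]
ratios = [commutator_qp(psi)(q) / psi(q) for q in samples]
for q, r in zip(samples, ratios):
    print(f"q = {q:+.1f}: ([q, p] psi) / psi = {r:.6f}")
```

At every sample point the ratio is close to iħ, independently of q and of the particular test function, which is the content of (1.140).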

Since $|\psi\rangle$ is an arbitrarily chosen vector, we have (1.140). Using (1.117), we have

$$[A, B]^\dagger = (AB - BA)^\dagger = B^\dagger A^\dagger - A^\dagger B^\dagger. \tag{1.143}$$

If in (1.143) A and B are both Hermitian, we have

$$[A, B]^\dagger = BA - AB = -[A, B]. \tag{1.144}$$

If we have an operator G such that

$$G^\dagger = -G, \tag{1.145}$$

G is said to be anti-Hermitian. Therefore, [A, B] is anti-Hermitian if both A and B are Hermitian. If an anti-Hermitian operator has an eigenvalue, that eigenvalue is zero or pure imaginary. To show this, suppose that

$$G|\psi\rangle = \lambda|\psi\rangle, \tag{1.146}$$

where G is an anti-Hermitian operator and $|\psi\rangle$ has been normalized. As in the case of (1.123) we have

$$\langle\psi|G|\psi\rangle = \lambda\langle\psi|\psi\rangle = \lambda. \tag{1.147}$$

Taking the complex conjugate of (1.147), we have

$$\langle\psi|G\psi\rangle^* = \lambda^*. \tag{1.148}$$

Using (1.116) and (1.145) again, we have

$$\lambda^* = \langle\psi|G\psi\rangle^* = \langle\psi|G^\dagger\psi\rangle = -\langle\psi|G\psi\rangle = -\lambda. \tag{1.149}$$
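Relations (1.144) and (1.149) can be illustrated with finite matrices. The sketch below assumes NumPy is available; the (4, 4) size, the random seed, and the helper `random_hermitian` are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hermitian(n):
    """Return M + M^dagger for a random complex M, which is Hermitian by construction."""
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return M + M.conj().T

A = random_hermitian(4)
B = random_hermitian(4)
G = A @ B - B @ A                                   # commutator [A, B]

is_anti_hermitian = np.allclose(G.conj().T, -G)     # relation (1.144)
eigenvalues = np.linalg.eigvals(G)
max_real_part = np.max(np.abs(eigenvalues.real))    # vanishes up to round-off, cf. (1.149)

print(is_anti_hermitian, max_real_part)
```

The eigenvalues of the commutator lie on the imaginary axis, whereas `np.linalg.eigvalsh(A)` returns purely real eigenvalues for the Hermitian factor, in line with (1.125).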

This shows that λ is zero or pure imaginary. Therefore, (1.142) can be viewed as an eigenvalue equation in which any physical state $|\psi\rangle$ has a pure imaginary eigenvalue iħ with respect to [q, p]. Note that both q and p are Hermitian (see Sect. 10.2, Example 10.3), and so [q, p] is anti-Hermitian as mentioned above. The canonical commutation relation given by (1.140) is believed to underpin the uncertainty principle.

In quantum mechanics, it is of great importance whether a quantum operator is Hermitian or not. A position operator and a momentum operator, along with an angular momentum operator, are particularly important when we construct a Hamiltonian. Let f and g be arbitrary functions. Let us consider, e.g., the following inner product with the momentum operator:

$$\langle g|pf\rangle = \int_a^b g(x)^*\,\frac{\hbar}{i}\frac{\partial}{\partial x}[f(x)]\,dx, \tag{1.150}$$

where the domain [a, b] depends on the physical system; it can be either bounded or unbounded. Performing integration by parts, we have

$$\begin{aligned}
\langle g|pf\rangle &= \frac{\hbar}{i}\left[g(x)^* f(x)\right]_a^b - \frac{\hbar}{i}\int_a^b \frac{\partial}{\partial x}\left[g(x)^*\right] f(x)\,dx \\
&= \frac{\hbar}{i}\left[g(b)^* f(b) - g(a)^* f(a)\right] + \int_a^b \left[\frac{\hbar}{i}\frac{\partial}{\partial x}g(x)\right]^* f(x)\,dx.
\end{aligned} \tag{1.151}$$

If we require f(b) = f(a) and g(b) = g(a), the first term vanishes and we get

$$\langle g|pf\rangle = \int_a^b \left[\frac{\hbar}{i}\frac{\partial}{\partial x}g(x)\right]^* f(x)\,dx = \langle pg|f\rangle. \tag{1.152}$$

Thus, as in the case of (1.120), the momentum operator p is Hermitian. Note that the position operator q of (1.142) is Hermitian as an a priori assumption. Meanwhile, the z-component of the angular momentum operator $L_z$ is described in polar coordinates as follows:

$$L_z = \frac{\hbar}{i}\frac{\partial}{\partial\phi}, \tag{1.153}$$

where ϕ is an azimuthal angle varying from 0 to 2π. The notation and implication of $L_z$ will be mentioned in Chap. 3. Similarly to the above, we have

$$\langle g|L_z f\rangle = \frac{\hbar}{i}\left[g(2\pi)^* f(2\pi) - g(0)^* f(0)\right] + \int_0^{2\pi} \left[\frac{\hbar}{i}\frac{\partial}{\partial\phi}g(\phi)\right]^* f(\phi)\,d\phi. \tag{1.154}$$

Requiring arbitrary functions f and g to satisfy the BCs f(2π) = f(0) and g(2π) = g(0), we reach

$$\langle g|L_z f\rangle = \langle L_z g|f\rangle. \tag{1.155}$$

Note that we must have the above BCs, because ϕ = 0 and ϕ = 2π are spatially the same point. Thus, we find that $L_z$ is Hermitian as well under this condition. On the basis of the aforementioned argument, let us proceed to quantum-mechanical studies of a harmonic oscillator. Regarding the angular momentum, we will study its basic properties in Chap. 3.

Reference 1. Møller C (1952) The theory of relativity. Oxford University Press, London

Chapter 2

Quantum-Mechanical Harmonic Oscillator

Quantum-mechanical treatment of a harmonic oscillator has been a well-studied topic from the beginning of the history of quantum mechanics. This topic is a standard subject in classical mechanics as well. In this chapter, first we briefly survey characteristics of a classical harmonic oscillator. From a quantum-mechanical point of view, we deal with features of a harmonic oscillator through matrix representation. We define creation and annihilation operators using position and momentum operators. A Hamiltonian of the oscillator is described in terms of the creation and annihilation operators. This enables us to easily determine energy eigenvalues of the oscillator. As a result, energy eigenvalues are found to be positive definite. Meanwhile, we express the Schrödinger equation by the coordinate representation. We compare the results with those of the matrix representation and show that the two representations are mathematically equivalent. Thus, the treatment of the quantum-mechanical harmonic oscillator supplies us with a firm ground for studying basic concepts of the quantum mechanics.

2.1

Classical Harmonic Oscillator

The classical Newtonian equation of a one-dimensional harmonic oscillator is expressed as

$$m\frac{d^2 x(t)}{dt^2} = -s x(t), \tag{2.1}$$

where m is the mass of the oscillator and s is a spring constant. Putting $s/m = \omega^2$, we have

© Springer Nature Singapore Pte Ltd. 2020 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3_2


$$\frac{d^2 x(t)}{dt^2} + \omega^2 x(t) = 0. \tag{2.2}$$

In (2.2) we set ω positive, namely,

$$\omega = \sqrt{s/m}, \tag{2.3}$$

where ω is called the angular frequency of the oscillator. If we replace $\omega^2$ with λ, we have formally the same equation as (1.61). Two linearly independent solutions of (2.2) are the same as before (see Example 1.1); we have $e^{i\omega t}$ and $e^{-i\omega t}$ (ω ≠ 0) as such. Note, however, that in Example 1.2 we were dealing with a quantum state related to the existence probability of a particle in a potential well. In (2.2), on the other hand, we are examining the position of a harmonic oscillator subject to the force of a spring. We are thus considering a different situation. As a general solution we have

$$x(t) = a e^{i\omega t} + b e^{-i\omega t}, \tag{2.4}$$

where a and b are suitable constants. Let us consider BCs different from those of Examples 1.1 or 1.2 this time. That is, we set BCs such that

$$x(0) = 0 \quad \text{and} \quad x'(0) = v_0 \quad (v_0 > 0). \tag{2.5}$$

Notice that (2.5) gives initial conditions (ICs). Mathematically, ICs are included in BCs (see Chap. 10). From (2.4) we have

$$x(0) = a + b = 0 \quad \text{and} \quad x'(0) = i\omega(a - b) = v_0. \tag{2.6}$$

Then, we get $a = -b = v_0/2i\omega$. Thus, we get a simple harmonic motion as a solution expressed as

$$x(t) = \frac{v_0}{2i\omega}\left(e^{i\omega t} - e^{-i\omega t}\right) = \frac{v_0}{\omega}\sin\omega t. \tag{2.7}$$

From this, we have

$$E = K + V = \frac{1}{2}m v_0^2. \tag{2.8}$$

In particular, if $v_0 = 0$, x(t) ≡ 0. This is a solution of (2.1) that has the meaning that the particle is eternally at rest. It is physically acceptable as well. Notice also that unlike Examples 1.1 and 1.2, the solution has been determined uniquely. This is due to the different BCs.
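The energy relation (2.8) follows from (2.7) by direct substitution and can be confirmed numerically. The sketch below is illustrative: the values of m, s, and v0, the sampling times, and the function names are assumptions, and the potential energy is taken to be that of the spring, V = s x²/2.

```python
import math

def trajectory(t, v0, omega):
    """Position and velocity from (2.7): x = (v0/omega) sin(omega t), x' = v0 cos(omega t)."""
    return (v0 / omega) * math.sin(omega * t), v0 * math.cos(omega * t)

def total_energy(t, m, s, v0):
    """Kinetic plus potential energy K + V at time t."""
    omega = math.sqrt(s / m)                     # (2.3)
    x, v = trajectory(t, v0, omega)
    return 0.5 * m * v**2 + 0.5 * s * x**2

m, s, v0 = 2.0, 8.0, 3.0                         # illustrative values (arbitrary units)
times = (0.0, 0.4, 1.3, 5.0)
energies = [total_energy(t, m, s, v0) for t in times]
expected = 0.5 * m * v0**2                       # right-hand side of (2.8)

for t, energy in zip(times, energies):
    print(f"t = {t:3.1f}: E = {energy:.12f}")
```

The energy stays at m v0²/2 for all sampled times, because the cos² and sin² contributions of K and V add up to a constant.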

From the point of view of a mechanical system, the mathematical formulation of the classical harmonic oscillator resembles that of electromagnetic fields confined within a cavity. We return to this point later in Sect. 9.6.

2.2 Formulation Based on an Operator Method

Now let us return to our task of finding quantum-mechanical solutions of a harmonic oscillator. The potential V is given by

$$V(q) = \frac{1}{2}s q^2 = \frac{1}{2}m\omega^2 q^2, \tag{2.9}$$

where q is used for a one-dimensional position coordinate. Then, we have a classical Hamiltonian H expressed as

$$H = \frac{p^2}{2m} + V(q) = \frac{p^2}{2m} + \frac{1}{2}m\omega^2 q^2. \tag{2.10}$$

Following the formulation of Sect. 1.2, the Schrödinger equation as an eigenvalue equation related to energy E is described as

$$H\psi(q) = E\psi(q) \quad \text{or} \quad -\frac{\hbar^2}{2m}\nabla^2\psi(q) + \frac{1}{2}m\omega^2 q^2\psi(q) = E\psi(q). \tag{2.11}$$

This is a SOLDE, and it is well known that the SOLDE can be solved by a power series expansion method. In the present studies, however, let us first use an operator method to solve the eigenvalue equation (2.11) of a one-dimensional oscillator. To this end, we use a quantum-mechanical Hamiltonian where a momentum operator p is explicitly represented. Thus, the Hamiltonian reads as

$$H = \frac{p^2}{2m} + \frac{1}{2}m\omega^2 q^2. \tag{2.12}$$

Equation (2.12) is formally the same as (2.10). Note, however, that in (2.12) p and q are expressed as quantum-mechanical operators. As in (1.126), we first examine an expectation value $\langle H\rangle$ of H. It is given by

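$$\begin{aligned}
\langle H\rangle = \langle\psi|H\psi\rangle &= \left\langle\psi\middle|\frac{p^2}{2m}\psi\right\rangle + \left\langle\psi\middle|\frac{1}{2}m\omega^2 q^2\psi\right\rangle \\
&= \frac{1}{2m}\langle p^\dagger\psi|p\psi\rangle + \frac{1}{2}m\omega^2\langle q^\dagger\psi|q\psi\rangle = \frac{1}{2m}\langle p\psi|p\psi\rangle + \frac{1}{2}m\omega^2\langle q\psi|q\psi\rangle \\
&= \frac{1}{2m}\|p\psi\|^2 + \frac{1}{2}m\omega^2\|q\psi\|^2 \ge 0,
\end{aligned} \tag{2.13}$$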

where again we assumed that $|\psi\rangle$ has been normalized. In (2.13) we used the notation (1.126) and the fact that both q and p are Hermitian. In this situation, $\langle H\rangle$ takes a non-negative value. In (2.13), the equality holds if and only if $|p\psi\rangle = 0$ and $|q\psi\rangle = 0$. Let us specify a vector $|\psi_0\rangle$ that satisfies these conditions such that

$$|p\psi_0\rangle = 0 \quad \text{and} \quad |q\psi_0\rangle = 0. \tag{2.14}$$

Multiplying q from the left on the first equation of (2.14) and multiplying p from the left on the second equation, we have

$$qp|\psi_0\rangle = 0 \quad \text{and} \quad pq|\psi_0\rangle = 0. \tag{2.15}$$

Subtracting the second equation of (2.15) from the first equation, we get

$$(qp - pq)|\psi_0\rangle = i\hbar|\psi_0\rangle = 0, \tag{2.16}$$

where with the first equality we used (1.140). Therefore, we would have $|\psi_0(q)\rangle \equiv 0$. This leads to the relations (2.14). That is, $\langle H\rangle = 0$ if and only if $|\psi_0(q)\rangle \equiv 0$. But, since it has no physical meaning, $|\psi_0(q)\rangle \equiv 0$ must be rejected as unsuitable for the solution of (2.11). Regarding a physically acceptable solution of (2.13), $\langle H\rangle$ must take a positive definite value accordingly. Thus, on the basis of the canonical commutation relation, we restrict the range of the expectation values.

Instead of directly dealing with (2.12), it is customary to introduce the following operator [1]:

$$a \equiv \sqrt{\frac{m\omega}{2\hbar}}\,q + \frac{i}{\sqrt{2m\hbar\omega}}\,p \tag{2.17}$$

and its adjoint (complex conjugate) operator

$$a^\dagger = \sqrt{\frac{m\omega}{2\hbar}}\,q - \frac{i}{\sqrt{2m\hbar\omega}}\,p. \tag{2.18}$$

Notice here again that both q and p are Hermitian. Using a matrix representation for (2.17) and (2.18), we have

$$\begin{pmatrix} a \\ a^\dagger \end{pmatrix} = \begin{pmatrix} \sqrt{\dfrac{m\omega}{2\hbar}} & \dfrac{i}{\sqrt{2m\hbar\omega}} \\ \sqrt{\dfrac{m\omega}{2\hbar}} & -\dfrac{i}{\sqrt{2m\hbar\omega}} \end{pmatrix}\begin{pmatrix} q \\ p \end{pmatrix}. \tag{2.19}$$

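Then we have

$$\begin{aligned}
a^\dagger a &= (2m\hbar\omega)^{-1}(m\omega q - ip)(m\omega q + ip) = (2m\hbar\omega)^{-1}\left[m^2\omega^2 q^2 + p^2 + im\omega(qp - pq)\right] \\
&= (\hbar\omega)^{-1}\left[\frac{1}{2}m\omega^2 q^2 + \frac{1}{2m}p^2 + \frac{1}{2}i\omega \cdot i\hbar\right] = (\hbar\omega)^{-1}\left[H - \frac{1}{2}\hbar\omega\right],
\end{aligned} \tag{2.20}$$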

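where the second last equality comes from (1.140). Rewriting (2.20), we get

$$H = \hbar\omega a^\dagger a + \frac{1}{2}\hbar\omega. \tag{2.21}$$

Similarly we get

$$H = \hbar\omega a a^\dagger - \frac{1}{2}\hbar\omega. \tag{2.22}$$

Subtracting (2.22) from (2.21), we have

$$0 = \hbar\omega a^\dagger a - \hbar\omega a a^\dagger + \hbar\omega. \tag{2.23}$$

That is,

$$[a, a^\dagger] = 1 \quad \text{or} \quad [a, a^\dagger] = E, \tag{2.24}$$

where E represents an identity operator. Furthermore, using (2.21) we have

$$[H, a^\dagger] = \hbar\omega\left[a^\dagger a + \frac{1}{2}, a^\dagger\right] = \hbar\omega\left(a^\dagger a a^\dagger - a^\dagger a^\dagger a\right) = \hbar\omega a^\dagger[a, a^\dagger] = \hbar\omega a^\dagger. \tag{2.25}$$

Similarly, we get

$$[H, a] = -\hbar\omega a. \tag{2.26}$$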
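The commutators (2.24) to (2.26) can be illustrated with truncated matrices. The following sketch rests on an assumption that the text has not derived at this point: it takes the standard number-basis representation in which the only nonzero elements of a are ⟨n−1|a|n⟩ = √n. NumPy, the truncation size N = 8, and energies measured in units of ħω are further choices of this sketch.

```python
import numpy as np

N = 8              # truncation size (assumption of this sketch)
HBAR_OMEGA = 1.0   # energies in units of hbar * omega

# Assumed number-basis representation: <n-1| a |n> = sqrt(n) on the superdiagonal.
a = np.diag(np.sqrt(np.arange(1, N)), k=1)
a_dag = a.conj().T
H = HBAR_OMEGA * (a_dag @ a + 0.5 * np.eye(N))   # Hamiltonian in the form (2.21)

def comm(X, Y):
    """Commutator [X, Y] = XY - YX, cf. (1.139)."""
    return X @ Y - Y @ X

c = comm(a, a_dag)
# Truncation spoils only the last diagonal element of [a, a^dagger];
# the upper-left block reproduces the identity of (2.24).
print(np.round(np.diag(c), 12))
print(np.allclose(comm(H, a_dag), HBAR_OMEGA * a_dag))   # (2.25)
print(np.allclose(comm(H, a), -HBAR_OMEGA * a))          # (2.26)
```

In this representation H is diagonal with entries 1/2, 3/2, 5/2, …, and the last diagonal element of the commutator deviates from 1 purely as an artifact of cutting the infinite matrices off at size N.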

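Next, let us calculate an expectation value of H. Using a normalized function $|\psi\rangle$, from (2.21) we have

$$\begin{aligned}
\langle\psi|H|\psi\rangle &= \left\langle\psi\middle|\hbar\omega a^\dagger a + \frac{1}{2}\hbar\omega\middle|\psi\right\rangle = \hbar\omega\langle\psi|a^\dagger a|\psi\rangle + \frac{1}{2}\hbar\omega\langle\psi|\psi\rangle \\
&= \hbar\omega\langle a\psi|a\psi\rangle + \frac{1}{2}\hbar\omega = \hbar\omega\|a\psi\|^2 + \frac{1}{2}\hbar\omega \ge \frac{1}{2}\hbar\omega.
\end{aligned} \tag{2.27}$$

Thus, the expectation value is equal to or larger than $\frac{1}{2}\hbar\omega$. This is consistent with the fact that an energy eigenvalue is positive definite, as mentioned above. Equation (2.27) also tells us that if we have

$$|a\psi_0\rangle = 0, \tag{2.28}$$

we get

$$\langle\psi_0|H|\psi_0\rangle = \frac{1}{2}\hbar\omega. \tag{2.29}$$

Equation (2.29) means that the smallest expectation value is $\frac{1}{2}\hbar\omega$ on the condition of (2.28). On the same condition, using (2.21) we have

$$H|\psi_0\rangle = \hbar\omega a^\dagger a|\psi_0\rangle + \frac{1}{2}\hbar\omega|\psi_0\rangle = \frac{1}{2}\hbar\omega|\psi_0\rangle. \tag{2.30}$$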

Thus, $|\psi_0\rangle$ is an eigenfunction corresponding to an eigenvalue $\frac{1}{2}\hbar\omega \equiv E_0$, which is identical with the smallest expectation value of (2.29). Since this is the lowest eigenvalue, $|\psi_0\rangle$ is said to be a ground state. We ensure later that $|\psi_0\rangle$ is certainly an eligible function for a ground state. The above method is consistent with the variational principle [2], which stipulates that under appropriate BCs an expectation value of the Hamiltonian estimated with any arbitrary function is always larger than or equal to the smallest eigenvalue corresponding to the ground state.

Next, let us evaluate energy eigenvalues of the oscillator. First we have

$$H|\psi_0\rangle = \frac{1}{2}\hbar\omega|\psi_0\rangle = E_0|\psi_0\rangle. \tag{2.31}$$

Operating $a^\dagger$ on both sides of (2.31), we have

$$a^\dagger H|\psi_0\rangle = a^\dagger E_0|\psi_0\rangle. \tag{2.32}$$

Meanwhile, using (2.25), we have

$$a^\dagger H|\psi_0\rangle = \left(H a^\dagger - \hbar\omega a^\dagger\right)|\psi_0\rangle. \tag{2.33}$$

Equating RHSs of (2.32) and (2.33), we get

$$H a^\dagger|\psi_0\rangle = (E_0 + \hbar\omega)a^\dagger|\psi_0\rangle. \tag{2.34}$$

This implies that $a^\dagger|\psi_0\rangle$ belongs to an eigenvalue $(E_0 + \hbar\omega)$, which is larger than $E_0$ as expected. Again multiplying $a^\dagger$ on both sides of (2.34) from the left and using (2.25), we get

$$H\left(a^\dagger\right)^2|\psi_0\rangle = (E_0 + 2\hbar\omega)\left(a^\dagger\right)^2|\psi_0\rangle. \tag{2.35}$$

[Fig. 2.1 Energy eigenvalues of a quantum-mechanical harmonic oscillator on a real axis; tick marks at 0, 1/2, 3/2, 5/2, 7/2, 9/2, …]

This implies that $(a^\dagger)^2|\psi_0\rangle$ belongs to an eigenvalue $(E_0 + 2\hbar\omega)$. Thus, repeatedly taking the above procedures, we get

$$H\left(a^\dagger\right)^n|\psi_0\rangle = (E_0 + n\hbar\omega)\left(a^\dagger\right)^n|\psi_0\rangle. \tag{2.36}$$

Thus, $(a^\dagger)^n|\psi_0\rangle$ belongs to an eigenvalue

$$E_n \equiv E_0 + n\hbar\omega = \left(n + \frac{1}{2}\right)\hbar\omega, \tag{2.37}$$

where En denotes an energy eigenvalue of the n-th excited state. The energy eigenvalues are plotted in Fig. 2.1. Our next task is to seek normalized eigenvectors of the n-th excited state. Let cn be a normalization constant of that state. That is, we have n j ψ n i ¼ cn a{ j ψ 0 i,

ð2:38Þ

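where $|\psi_n\rangle$ is a normalized eigenfunction of the n-th excited state. To determine $c_n$, let us calculate $a|\psi_n\rangle$. This includes a factor $a(a^\dagger)^n$. We have

$$\begin{aligned}
a\left(a^\dagger\right)^n &= \left(aa^\dagger - a^\dagger a\right)\left(a^\dagger\right)^{n-1} + a^\dagger a\left(a^\dagger\right)^{n-1} \\
&= \left[a, a^\dagger\right]\left(a^\dagger\right)^{n-1} + a^\dagger a\left(a^\dagger\right)^{n-1} = \left(a^\dagger\right)^{n-1} + a^\dagger a\left(a^\dagger\right)^{n-1} \\
&= \left(a^\dagger\right)^{n-1} + a^\dagger\left[a, a^\dagger\right]\left(a^\dagger\right)^{n-2} + \left(a^\dagger\right)^2 a\left(a^\dagger\right)^{n-2} \\
&= 2\left(a^\dagger\right)^{n-1} + \left(a^\dagger\right)^2 a\left(a^\dagger\right)^{n-2} \\
&= 2\left(a^\dagger\right)^{n-1} + \left(a^\dagger\right)^2\left[a, a^\dagger\right]\left(a^\dagger\right)^{n-3} + \left(a^\dagger\right)^3 a\left(a^\dagger\right)^{n-3} \\
&= 3\left(a^\dagger\right)^{n-1} + \left(a^\dagger\right)^3 a\left(a^\dagger\right)^{n-3} \\
&= \cdots.
\end{aligned} \tag{2.39}$$

In the above procedures, we used $[a, a^\dagger] = 1$. What is implied in (2.39) is that the coefficient of $(a^\dagger)^{n-1}$ increases one by one while $a$ is transferred toward the right one by one in the second term of RHS. Notice that in the second term $a$ is sandwiched such that $(a^\dagger)^m a (a^\dagger)^{n-m}$ $(m = 1, 2, \ldots)$. Finally, we have

$$a\left(a^\dagger\right)^n = n\left(a^\dagger\right)^{n-1} + \left(a^\dagger\right)^n a. \tag{2.40}$$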
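The identity (2.40) can be checked mechanically. The following is a minimal sketch, assuming a stand-in representation that is not used in the text: polynomials in a dummy variable z, with $a^\dagger$ acting as multiplication by z and $a$ acting as d/dz, which satisfies $[a, a^\dagger] = 1$. The function names are hypothetical.

```python
# Polynomials in z are coefficient lists: [c0, c1, c2] means c0 + c1*z + c2*z**2.
# Assumption for illustration: a† -> multiply by z, a -> d/dz, so [a, a†] = 1.

def raise_op(poly):
    """Stand-in for a†: multiply the polynomial by z."""
    return [0] + list(poly)

def lower_op(poly):
    """Stand-in for a: differentiate with respect to z."""
    return [k * c for k, c in enumerate(poly)][1:] or [0]

def power(op, n, poly):
    """Apply the operator op to poly n times."""
    for _ in range(n):
        poly = op(poly)
    return poly

def add(p, q):
    """Add two coefficient lists of possibly different length."""
    size = max(len(p), len(q))
    p = list(p) + [0] * (size - len(p))
    q = list(q) + [0] * (size - len(q))
    return [u + v for u, v in zip(p, q)]

def trim(p):
    """Drop trailing zero coefficients so that equal polynomials compare equal."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def check_identity(n, f):
    """Compare a (a†)^n f with n (a†)^(n-1) f + (a†)^n a f, as in (2.40)."""
    lhs = lower_op(power(raise_op, n, f))
    rhs = add([n * c for c in power(raise_op, n - 1, f)],
              power(raise_op, n, lower_op(f)))
    return trim(lhs) == trim(rhs)

if __name__ == "__main__":
    print(all(check_identity(n, [1, 2, 5]) for n in range(1, 8)))
```

Integer coefficients keep the comparison exact, so the check does not depend on floating-point tolerances.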

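Thus, we get

$$\begin{aligned}
a|\psi_n\rangle &= c_n a\left(a^\dagger\right)^n|\psi_0\rangle = c_n\left[n\left(a^\dagger\right)^{n-1} + \left(a^\dagger\right)^n a\right]|\psi_0\rangle = c_n n\left(a^\dagger\right)^{n-1}|\psi_0\rangle \\
&= n\frac{c_n}{c_{n-1}}\,c_{n-1}\left(a^\dagger\right)^{n-1}|\psi_0\rangle = n\frac{c_n}{c_{n-1}}|\psi_{n-1}\rangle,
\end{aligned} \tag{2.41}$$

where the third equality comes from (2.28). Next, operating $a$ on (2.40) we get

$$\begin{aligned}
a^2\left(a^\dagger\right)^n &= na\left(a^\dagger\right)^{n-1} + a\left(a^\dagger\right)^n a \\
&= n\left[(n-1)\left(a^\dagger\right)^{n-2} + \left(a^\dagger\right)^{n-1}a\right] + a\left(a^\dagger\right)^n a \\
&= n(n-1)\left(a^\dagger\right)^{n-2} + n\left(a^\dagger\right)^{n-1}a + a\left(a^\dagger\right)^n a.
\end{aligned} \tag{2.42}$$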

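Operating another $a$ on (2.42), we get

$$\begin{aligned}
a^3\left(a^\dagger\right)^n &= n(n-1)a\left(a^\dagger\right)^{n-2} + na\left(a^\dagger\right)^{n-1}a + a^2\left(a^\dagger\right)^n a \\
&= n(n-1)\left[(n-2)\left(a^\dagger\right)^{n-3} + \left(a^\dagger\right)^{n-2}a\right] + na\left(a^\dagger\right)^{n-1}a + a^2\left(a^\dagger\right)^n a \\
&= n(n-1)(n-2)\left(a^\dagger\right)^{n-3} + n(n-1)\left(a^\dagger\right)^{n-2}a + na\left(a^\dagger\right)^{n-1}a + a^2\left(a^\dagger\right)^n a \\
&= \cdots.
\end{aligned} \tag{2.43}$$

To generalize the above procedures, operating a on (2.40) m (

$$\int_{-\infty}^{\infty} e^{-cq^2}\,dq = \sqrt{\frac{\pi}{c}} \quad (c > 0), \tag{2.84}$$

we have

$$\int_{-\infty}^{\infty} e^{-\frac{m\omega}{\hbar}q^2}\,dq = \sqrt{\frac{\pi\hbar}{m\omega}}. \tag{2.85}$$

To get (2.84), putting $I \equiv \int_{-\infty}^{\infty} e^{-cq^2}\,dq$, we have

$$\begin{aligned}
I^2 &= \int_{-\infty}^{\infty} e^{-cq^2}\,dq \int_{-\infty}^{\infty} e^{-cs^2}\,ds = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-c\left(q^2+s^2\right)}\,dq\,ds \\
&= \int_0^{\infty} e^{-cr^2}\,r\,dr \int_0^{2\pi} d\theta = \frac{1}{2}\int_0^{\infty} e^{-cR}\,dR \int_0^{2\pi} d\theta = \frac{\pi}{c},
\end{aligned} \tag{2.86}$$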

where with the third equality we converted the two-dimensional Cartesian coordinates to polar coordinates; take $q = r\cos\theta$, $s = r\sin\theta$ and convert an infinitesimal area element $dq\,ds$ to $dr\,r\,d\theta$. With the second to last equality of (2.86), we used the variable transformation $r^2 \to R$. Hence, we get $I = \sqrt{\pi/c}$. Thus, we get

$$N_0 = \left(\frac{m\omega}{\pi\hbar}\right)^{1/4} \quad \text{and} \quad \psi_0(q) = \left(\frac{m\omega}{\pi\hbar}\right)^{1/4} e^{-\frac{m\omega}{2\hbar}q^2}. \tag{2.87}$$
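The Gaussian integral (2.84) and the resulting normalization constant $N_0$ of (2.87) can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: it uses a plain trapezoidal rule on a finite interval (the interval width and step count are arbitrary choices, and the function names are hypothetical) and sets $m = \omega = \hbar = 1$ by default.

```python
import math

def gaussian_integral(c, half_width=12.0, steps=24000):
    """Trapezoidal estimate of the integral of exp(-c q^2) over [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        q = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # end points count half
        total += weight * math.exp(-c * q * q)
    return total * h

def ground_state_norm(m=1.0, omega=1.0, hbar=1.0):
    """N0 as given in (2.87)."""
    return (m * omega / (math.pi * hbar)) ** 0.25

if __name__ == "__main__":
    c = 2.0
    print(gaussian_integral(c), math.sqrt(math.pi / c))
    # |psi_0|^2 = N0^2 exp(-(m omega / hbar) q^2) should integrate to 1.
    print(ground_state_norm() ** 2 * gaussian_integral(1.0))
```

The two numbers on the first printed line should agree closely, and the second line should be close to 1 if (2.87) normalizes $\psi_0(q)$.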

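Also, we have

$$\begin{aligned}
a^\dagger &= \sqrt{\frac{m\omega}{2\hbar}}\,q - \frac{i}{\sqrt{2m\hbar\omega}}\,p = \sqrt{\frac{m\omega}{2\hbar}}\,q - \frac{i}{\sqrt{2m\hbar\omega}}\frac{\hbar}{i}\frac{\partial}{\partial q} \\
&= \sqrt{\frac{m\omega}{2\hbar}}\,q - \sqrt{\frac{\hbar}{2m\omega}}\frac{\partial}{\partial q}.
\end{aligned} \tag{2.88}$$

From (2.52), we get

$$\begin{aligned}
\psi_n(q) &= \frac{1}{\sqrt{n!}}\left(a^\dagger\right)^n|\psi_0\rangle = \frac{1}{\sqrt{n!}}\left(\sqrt{\frac{m\omega}{2\hbar}}\,q - \sqrt{\frac{\hbar}{2m\omega}}\frac{\partial}{\partial q}\right)^n \psi_0(q) \\
&= \frac{1}{\sqrt{n!}}\left(\frac{m\omega}{2\hbar}\right)^{n/2}\left(q - \frac{\hbar}{m\omega}\frac{\partial}{\partial q}\right)^n \psi_0(q).
\end{aligned} \tag{2.89}$$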

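Putting

$$\beta \equiv \sqrt{\frac{m\omega}{\hbar}} \quad \text{and} \quad \xi = \beta q, \tag{2.90}$$

we rewrite (2.89) as

$$\begin{aligned}
\psi_n(q) = \psi_n\!\left(\frac{\xi}{\beta}\right) &= \frac{1}{\sqrt{n!}}\left(\frac{m\omega}{2\hbar\beta^2}\right)^{n/2}\left(\xi - \frac{\partial}{\partial\xi}\right)^n \psi_0(q) \\
&= \frac{1}{\sqrt{n!}}\left(\frac{1}{2}\right)^{n/2}\left(\xi - \frac{\partial}{\partial\xi}\right)^n \psi_0(q) = \sqrt{\frac{1}{2^n n!}}\left(\frac{m\omega}{\pi\hbar}\right)^{1/4}\left(\xi - \frac{\partial}{\partial\xi}\right)^n e^{-\frac{1}{2}\xi^2}.
\end{aligned} \tag{2.91}$$

Comparing (2.81) and (2.90), we have

$$\alpha = \frac{\beta^2}{2}. \tag{2.92}$$

Moreover, putting

$$N_n \equiv \sqrt{\frac{1}{2^n n!}}\left(\frac{m\omega}{\pi\hbar}\right)^{1/4} = \sqrt{\frac{\beta}{\pi^{1/2}\,2^n n!}}, \tag{2.93}$$

we get

$$\psi_n(\xi/\beta) = N_n\left(\xi - \frac{\partial}{\partial\xi}\right)^n e^{-\frac{1}{2}\xi^2}. \tag{2.94}$$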

We have to normalize (2.94) with respect to a variable $\xi$. Since $\psi_n(q)$ has already been normalized as in (2.53), we have

$$\int_{-\infty}^{\infty} |\psi_n(q)|^2\,dq = 1. \tag{2.95}$$

Changing a variable $q$ to $\xi$, we have

$$\frac{1}{\beta}\int_{-\infty}^{\infty} |\psi_n(\xi/\beta)|^2\,d\xi = 1. \tag{2.96}$$

Let us define $\tilde{\psi}_n(\xi)$ as being normalized with $\xi$. In other words, $\psi_n(q)$ is converted to $\tilde{\psi}_n(\xi)$ by means of variable transformation and concomitant change in normalization condition. Then, we have

$$\int_{-\infty}^{\infty} |\tilde{\psi}_n(\xi)|^2\,d\xi = 1. \tag{2.97}$$

Comparing (2.96) and (2.97), if we define $\tilde{\psi}_n(\xi)$ as

$$\tilde{\psi}_n(\xi) \equiv \sqrt{\frac{1}{\beta}}\,\psi_n(\xi/\beta), \tag{2.98}$$

$\tilde{\psi}_n(\xi)$ should be a proper normalized function. Thus, we get

$$\tilde{\psi}_n(\xi) = \tilde{N}_n\left(\xi - \frac{\partial}{\partial\xi}\right)^n e^{-\frac{1}{2}\xi^2} \quad \text{with} \quad \tilde{N}_n \equiv \sqrt{\frac{1}{\pi^{1/2}\,2^n n!}}. \tag{2.99}$$

Meanwhile, according to the theory of classical orthogonal polynomials, the Hermite polynomials $H_n(x)$ are defined as [3]

$$H_n(x) \equiv (-1)^n e^{x^2}\frac{d^n}{dx^n}e^{-x^2} \quad (n \geq 0), \tag{2.100}$$

where $H_n(x)$ is an n-th order polynomial. We wish to show the following relation on the basis of mathematical induction:

$$\tilde{\psi}_n(\xi) = \tilde{N}_n H_n(\xi)\,e^{-\frac{1}{2}\xi^2}. \tag{2.101}$$

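Comparing (2.87), (2.98), and (2.99), we make sure that (2.101) holds with $n = 0$. When $n = 1$, from (2.99) we have

$$\begin{aligned}
\tilde{\psi}_1(\xi) &= \tilde{N}_1\left(\xi - \frac{\partial}{\partial\xi}\right)e^{-\frac{1}{2}\xi^2} = \tilde{N}_1\left[\xi e^{-\frac{1}{2}\xi^2} - (-\xi)e^{-\frac{1}{2}\xi^2}\right] = \tilde{N}_1\cdot 2\xi e^{-\frac{1}{2}\xi^2} \\
&= \tilde{N}_1\left[(-1)^1 e^{\xi^2}\frac{d}{d\xi}e^{-\xi^2}\right]e^{-\frac{1}{2}\xi^2} = \tilde{N}_1 H_1(\xi)\,e^{-\frac{1}{2}\xi^2}.
\end{aligned} \tag{2.102}$$

Then, (2.101) holds with $n = 1$ as well. Next, from supposition of mathematical induction we assume that (2.101) holds with $n$. Then, we have

$$\begin{aligned}
\tilde{\psi}_{n+1}(\xi) &= \tilde{N}_{n+1}\left(\xi - \frac{\partial}{\partial\xi}\right)^{n+1} e^{-\frac{1}{2}\xi^2} = \frac{1}{\sqrt{2(n+1)}}\left(\xi - \frac{\partial}{\partial\xi}\right)\left[\tilde{N}_n\left(\xi - \frac{\partial}{\partial\xi}\right)^n e^{-\frac{1}{2}\xi^2}\right] \\
&= \frac{1}{\sqrt{2(n+1)}}\left(\xi - \frac{\partial}{\partial\xi}\right)\left[\tilde{N}_n H_n(\xi)\,e^{-\frac{1}{2}\xi^2}\right] \\
&= \frac{1}{\sqrt{2(n+1)}}\tilde{N}_n\left(\xi - \frac{\partial}{\partial\xi}\right)\left[(-1)^n e^{\xi^2}\left(\frac{d^n}{d\xi^n}e^{-\xi^2}\right)e^{-\frac{1}{2}\xi^2}\right] \\
&= \frac{1}{\sqrt{2(n+1)}}\tilde{N}_n(-1)^n\left(\xi - \frac{\partial}{\partial\xi}\right)\left[e^{\frac{1}{2}\xi^2}\frac{d^n}{d\xi^n}e^{-\xi^2}\right] \\
&= \frac{1}{\sqrt{2(n+1)}}\tilde{N}_n(-1)^n\left[\xi e^{\frac{1}{2}\xi^2}\frac{d^n}{d\xi^n}e^{-\xi^2} - \frac{\partial}{\partial\xi}\left(e^{\frac{1}{2}\xi^2}\frac{d^n}{d\xi^n}e^{-\xi^2}\right)\right] \\
&= \frac{1}{\sqrt{2(n+1)}}\tilde{N}_n(-1)^n\left[\xi e^{\frac{1}{2}\xi^2}\frac{d^n}{d\xi^n}e^{-\xi^2} - \xi e^{\frac{1}{2}\xi^2}\frac{d^n}{d\xi^n}e^{-\xi^2} - e^{\frac{1}{2}\xi^2}\frac{d^{n+1}}{d\xi^{n+1}}e^{-\xi^2}\right] \\
&= \frac{1}{\sqrt{2(n+1)}}\tilde{N}_n(-1)^{n+1}e^{\frac{1}{2}\xi^2}\frac{d^{n+1}}{d\xi^{n+1}}e^{-\xi^2} \\
&= \tilde{N}_{n+1}\left[(-1)^{n+1}e^{\xi^2}\frac{d^{n+1}}{d\xi^{n+1}}e^{-\xi^2}\right]e^{-\frac{1}{2}\xi^2} = \tilde{N}_{n+1}H_{n+1}(\xi)\,e^{-\frac{1}{2}\xi^2}.
\end{aligned} \tag{2.103}$$

This means that (2.101) holds with $n + 1$ as well. Thus, it follows that (2.101) is true of $n$ that is zero or any positive integer. The orthogonality relation reads as

$$\int_{-\infty}^{\infty} \tilde{\psi}_m(\xi)^{*}\,\tilde{\psi}_n(\xi)\,d\xi = \delta_{mn}. \tag{2.104}$$

Table 2.1 First six Hermite polynomials
$H_0(x) = 1$
$H_1(x) = 2x$
$H_2(x) = 4x^2 - 2$
$H_3(x) = 8x^3 - 12x$
$H_4(x) = 16x^4 - 48x^2 + 12$
$H_5(x) = 32x^5 - 160x^3 + 120x$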

Placing (2.98) back into the function form $\psi_n(q)$, we have

$$\psi_n(q) = \sqrt{\beta}\,\tilde{\psi}_n(\beta q). \tag{2.105}$$

Using (2.101) and explicitly rewriting (2.105), we get

$$\psi_n(q) = \left(\frac{m\omega}{\hbar}\right)^{1/4}\sqrt{\frac{1}{\pi^{1/2}\,2^n n!}}\;H_n\!\left(\sqrt{\frac{m\omega}{\hbar}}\,q\right)e^{-\frac{m\omega}{2\hbar}q^2} \quad (n = 0, 1, 2, \cdots). \tag{2.106}$$

We tabulate the first several Hermite polynomials $H_n(x)$ in Table 2.1, where the index $n$ represents the highest order of the polynomials. In Table 2.1 we see that even functions and odd functions appear alternately (i.e., parity). This is the case with $\psi_n(q)$ as well, because $\psi_n(q)$ is a product of $H_n$ and an even function $e^{-\frac{m\omega}{2\hbar}q^2}$. Combining (2.101) and (2.104), the orthogonal relation between $\tilde{\psi}_n(\xi)$ $(n = 0, 1, 2, \cdots)$ can be described alternatively as [3]

$$\int_{-\infty}^{\infty} e^{-\xi^2}H_m(\xi)H_n(\xi)\,d\xi = \sqrt{\pi}\,2^n n!\,\delta_{mn}. \tag{2.107}$$
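Table 2.1 and the orthogonality relation (2.107) can be reproduced with exact arithmetic. The sketch below rests on two assumptions that are not spelled out in the text: (i) applying $(\xi - \partial/\partial\xi)$ to $H_n(\xi)e^{-\xi^2/2}$ gives $(2\xi H_n - H_n')e^{-\xi^2/2}$, so that (2.99) and (2.101) imply $H_{n+1} = 2\xi H_n - H_n'$; (ii) the standard Gaussian moments $\int x^{2k}e^{-x^2}dx = \sqrt{\pi}\,(2k-1)!!/2^k$. All function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def next_hermite(h):
    """Return 2*x*H(x) - H'(x) for H given as a coefficient list [c0, c1, ...]."""
    two_x_h = [0] + [2 * c for c in h]
    deriv = [k * c for k, c in enumerate(h)][1:]
    deriv += [0] * (len(two_x_h) - len(deriv))
    return [u - v for u, v in zip(two_x_h, deriv)]

def hermite_table(n_max):
    """Coefficient lists of H_0 ... H_n_max, starting from H_0 = 1."""
    table = [[1]]
    for _ in range(n_max):
        table.append(next_hermite(table[-1]))
    return table

def gaussian_moment(k):
    """Integral of x^k exp(-x^2) over the real line, divided by sqrt(pi)."""
    if k % 2:
        return Fraction(0)
    value = Fraction(1)
    for odd in range(1, k, 2):  # builds (k-1)!!
        value *= odd
    return value / 2 ** (k // 2)

def weighted_overlap(hm, hn):
    """Left-hand side of (2.107) divided by sqrt(pi), computed exactly."""
    total = Fraction(0)
    for i, a in enumerate(hm):
        for j, b in enumerate(hn):
            total += a * b * gaussian_moment(i + j)
    return total

if __name__ == "__main__":
    table = hermite_table(5)
    for n, coeffs in enumerate(table):
        print(n, coeffs)  # coefficients in ascending powers of x
    print(weighted_overlap(table[2], table[4]), weighted_overlap(table[3], table[3]))
```

Coefficients are listed in ascending powers, so `[-2, 0, 4]` stands for $4x^2 - 2$.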

Note that $H_m(\xi)$ is a real function, and so $H_m(\xi)^{*} = H_m(\xi)$. The relation (2.107) is well known as the orthogonality of Hermite polynomials with $e^{-\xi^2}$ taken as a weight function [3]. Here the weight function is a real and non-negative function within the domain considered [e.g., $(-\infty, +\infty)$ in the present case] and independent of indices $m$ and $n$. We will deal with it again in Sect. 8.4.

The relation (2.101) and the orthogonality relationship described as (2.107) can more explicitly be understood as follows: From (2.11) we have the Schrödinger equation of a one-dimensional quantum-mechanical harmonic oscillator such that

$$-\frac{\hbar^2}{2m}\frac{d^2u(q)}{dq^2} + \frac{1}{2}m\omega^2q^2u(q) = Eu(q). \tag{2.108}$$

Changing a variable as in (2.90), we have

$$-\frac{d^2u(\xi)}{d\xi^2} + \xi^2u(\xi) = \frac{2E}{\hbar\omega}u(\xi). \tag{2.109}$$

Defining a dimensionless parameter

$$\lambda \equiv \frac{2E}{\hbar\omega} \tag{2.110}$$

and also defining a differential operator $D$ such that

$$D \equiv -\frac{d^2}{d\xi^2} + \xi^2, \tag{2.111}$$

we have the following eigenvalue equation:

$$Du(\xi) = \lambda u(\xi). \tag{2.112}$$

We further consider a function $v(\xi)$ such that

$$u(\xi) = v(\xi)\,e^{-\xi^2/2}. \tag{2.113}$$

Then, (2.109) is converted as follows:

$$\left[-\frac{d^2v(\xi)}{d\xi^2} + 2\xi\frac{dv(\xi)}{d\xi}\right]e^{-\frac{\xi^2}{2}} = (\lambda - 1)\,v(\xi)\,e^{-\frac{\xi^2}{2}}. \tag{2.114}$$

Since $e^{-\frac{\xi^2}{2}}$ does not vanish with any $\xi$, we have

$$-\frac{d^2v(\xi)}{d\xi^2} + 2\xi\frac{dv(\xi)}{d\xi} = (\lambda - 1)\,v(\xi). \tag{2.115}$$

If we define another differential operator $\tilde{D}$ such that

$$\tilde{D} \equiv -\frac{d^2}{d\xi^2} + 2\xi\frac{d}{d\xi}, \tag{2.116}$$

we have another eigenvalue equation

$$\tilde{D}v(\xi) = (\lambda - 1)\,v(\xi). \tag{2.117}$$

Meanwhile, we have the following well-known differential equation:

$$\frac{d^2H_n(\xi)}{d\xi^2} - 2\xi\frac{dH_n(\xi)}{d\xi} + 2nH_n(\xi) = 0. \tag{2.118}$$

This equation is said to be the Hermite differential equation. Using (2.116), (2.118) can be recast as an eigenvalue equation such that

$$\tilde{D}H_n(\xi) = 2nH_n(\xi). \tag{2.119}$$

Therefore, comparing (2.115) and (2.118) and putting

$$\lambda = 2n + 1, \tag{2.120}$$

we get

$$v(\xi) = cH_n(\xi), \tag{2.121}$$

where $c$ is an arbitrary constant. Thus, using (2.113), for a solution of (2.109) we get

$$u_n(\xi) = cH_n(\xi)\,e^{-\xi^2/2}, \tag{2.122}$$

where the solution $u(\xi)$ is indexed with $n$. From (2.110), as an energy eigenvalue we have

$$E_n = \left(n + \frac{1}{2}\right)\hbar\omega.$$

Thus, (2.37) is recovered. A normalization constant $c$ of (2.122) can be decided as in (2.106). As discussed above, the operator representation and coordinate representation are fully consistent.
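The eigenvalue equation (2.119) and the resulting energies can be verified directly for the polynomials listed in Table 2.1. The following is a minimal sketch; the coefficient lists are copied from Table 2.1 (ascending powers), while the helper names and the default $\hbar\omega = 1$ are assumptions made for illustration.

```python
# Coefficients of H_0 ... H_5 from Table 2.1, in ascending powers of xi.
HERMITE = {
    0: [1],
    1: [0, 2],
    2: [-2, 0, 4],
    3: [0, -12, 0, 8],
    4: [12, 0, -48, 0, 16],
    5: [0, 120, 0, -160, 0, 32],
}

def derivative(p):
    """Derivative of a polynomial given as a coefficient list."""
    return [k * c for k, c in enumerate(p)][1:] or [0]

def d_tilde(p):
    """Apply -d^2/dxi^2 + 2*xi*d/dxi of (2.116) to a coefficient list."""
    first = derivative(p)
    second = derivative(first)
    two_xi_first = [0] + [2 * c for c in first]
    out = [0] * max(len(p), len(two_xi_first), len(second))
    for k, c in enumerate(two_xi_first):
        out[k] += c
    for k, c in enumerate(second):
        out[k] -= c
    return out

def is_eigenfunction(n):
    """Check that D~ H_n equals 2n H_n, as stated in (2.119)."""
    lhs = d_tilde(HERMITE[n])
    rhs = [2 * n * c for c in HERMITE[n]]
    rhs += [0] * (len(lhs) - len(rhs))
    return lhs == rhs

def energy(n, hbar_omega=1.0):
    """E_n from lambda = 2n + 1 (2.120) and lambda = 2E / (hbar omega) (2.110)."""
    lam = 2 * n + 1
    return lam * hbar_omega / 2

if __name__ == "__main__":
    print([is_eigenfunction(n) for n in HERMITE])
    print([energy(n) for n in range(4)])
```

With $\hbar\omega = 1$ the energies come out as half-odd integers, in line with (2.37).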

2.5 Variance and Uncertainty Principle

The uncertainty principle is one of the most fundamental concepts of quantum mechanics. To think of this concept on the basis of a quantum harmonic oscillator, let us introduce a variance operator [4]. Let $A$ be a physical quantity and let $\langle A\rangle$ be an expectation value as defined in (1.126). We define the variance as $\langle(\Delta A)^2\rangle$, where we have

$$\Delta A \equiv A - \langle A\rangle. \tag{2.123}$$

In (2.123), we assume that $\langle A\rangle$ is obtained by operating $A$ on a certain physical state $|\psi\rangle$. Then, we have

$$\left\langle(\Delta A)^2\right\rangle = \left\langle(A - \langle A\rangle)^2\right\rangle = \left\langle A^2 - 2\langle A\rangle A + \langle A\rangle^2\right\rangle = \left\langle A^2\right\rangle - \langle A\rangle^2. \tag{2.124}$$

If $A$ is Hermitian, $\Delta A$ is Hermitian as well. This is because

$$(\Delta A)^\dagger = A^\dagger - \langle A\rangle^{*} = A - \langle A\rangle = \Delta A, \tag{2.125}$$

where we used the fact that an expectation value of a Hermitian operator is real. Then, $\langle(\Delta A)^2\rangle$ is non-negative as in the case of (2.13). Moreover, if $|\psi\rangle$ is an eigenstate of $A$, $\langle(\Delta A)^2\rangle = 0$. Therefore, $\langle(\Delta A)^2\rangle$ represents a measure of how widely measured values are dispersed when $A$ is measured in reference to a quantum state $|\psi\rangle$. Also, we define a standard deviation $\delta A$ as

$$\delta A \equiv \sqrt{\left\langle(\Delta A)^2\right\rangle}. \tag{2.126}$$

We have the following important theorem on a standard deviation $\delta A$ [4].

Theorem 2.1 Let $A$ and $B$ be Hermitian operators. If $A$ and $B$ satisfy

$$[A, B] = ik \quad (k: \text{nonzero real number}), \tag{2.127}$$

then we have

$$\delta A\cdot\delta B \geq |k|/2 \tag{2.128}$$

in reference to any quantum state $|\psi\rangle$.

Proof We have

$$[\Delta A, \Delta B] = [A - \langle\psi|A|\psi\rangle,\ B - \langle\psi|B|\psi\rangle] = [A, B] = ik. \tag{2.129}$$

In (2.129), we used the fact that $\langle\psi|A|\psi\rangle$ and $\langle\psi|B|\psi\rangle$ are just real numbers and those commute with any operator. Next, we calculate the following quantity in relation to a real number $\lambda$:

$$\left\|(\Delta A + i\lambda\Delta B)|\psi\rangle\right\|^2 = \langle\psi|(\Delta A - i\lambda\Delta B)(\Delta A + i\lambda\Delta B)|\psi\rangle = \langle\psi|(\Delta A)^2|\psi\rangle - k\lambda + \langle\psi|(\Delta B)^2|\psi\rangle\lambda^2, \tag{2.130}$$

where we used the fact that $\Delta A$ and $\Delta B$ are Hermitian. Since the left-hand side is a squared norm, the above quadratic form in $\lambda$ must be non-negative for any real number $\lambda$. For this to hold, we have

$$(-k)^2 - 4\langle\psi|(\Delta A)^2|\psi\rangle\langle\psi|(\Delta B)^2|\psi\rangle \leq 0. \tag{2.131}$$

Thus, (2.128) will follow. ∎

On the basis of Theorem 2.1, we find that both $\delta A$ and $\delta B$ are positive on condition that (2.127) holds. We have another important theorem.

Theorem 2.2 Let $A$ be a Hermitian operator. The necessary and sufficient condition for a physical state $|\psi_0\rangle$ to be an eigenstate of $A$ is $\delta A = 0$.

Proof Suppose that $|\psi_0\rangle$ is a normalized eigenstate of $A$ that belongs to an eigenvalue $a$. Then, we have

$$\left\langle\psi_0|A^2\psi_0\right\rangle = a\langle\psi_0|A\psi_0\rangle = a^2\langle\psi_0|\psi_0\rangle = a^2, \qquad \langle\psi_0|A\psi_0\rangle^2 = \left[a\langle\psi_0|\psi_0\rangle\right]^2 = a^2. \tag{2.132}$$

From (2.124) and (2.126), we have

$$\left\langle\psi_0|(\Delta A)^2\psi_0\right\rangle = 0, \quad \text{i.e.,} \quad \delta A = 0. \tag{2.133}$$

Note that $\delta A$ is measured in reference to $|\psi_0\rangle$. Conversely, suppose that $\delta A = 0$. Then,

$$\delta A = \sqrt{\left\langle\psi_0|(\Delta A)^2\psi_0\right\rangle} = \sqrt{\langle\Delta A\psi_0|\Delta A\psi_0\rangle} = \|\Delta A\psi_0\|, \tag{2.134}$$

where we used the fact that $\Delta A$ is Hermitian. From the definition of norm of (1.121), for $\delta A = 0$ to hold, we have

$$\Delta A\psi_0 = (A - \langle A\rangle)\psi_0 = 0, \quad \text{i.e.,} \quad A\psi_0 = \langle A\rangle\psi_0. \tag{2.135}$$

This indicates that $\psi_0$ is an eigenstate of $A$ that belongs to an eigenvalue $\langle A\rangle$. This completes the proof. ∎

Theorem 2.1 implies that (2.127) holds with any physical state $|\psi\rangle$. That is, we must have $\delta A > 0$ and $\delta B > 0$, if $\delta A$ and $\delta B$ are evaluated in reference to any $|\psi\rangle$ on condition that (2.127) holds. From Theorem 2.2, in turn, it follows that eigenstates cannot exist with $A$ or $B$ under the condition of (2.127). To explicitly show this, we take an inner product of (2.127). That is, with Hermitian operators $A$ and $B$, consider the following inner product:

$$\langle\psi|[A, B]|\psi\rangle = \langle\psi|ik|\psi\rangle, \quad \text{i.e.,} \quad \langle\psi|AB - BA|\psi\rangle = ik, \tag{2.136}$$

where we assumed that $|\psi\rangle$ is an arbitrarily chosen normalized vector. Suppose now that $|\psi_0\rangle$ is an eigenstate of $A$ that belongs to an eigenvalue $a$. Then, we have

$$A|\psi_0\rangle = a|\psi_0\rangle. \tag{2.137}$$

Taking an adjoint of (2.137), we get

$$\langle\psi_0|A^\dagger = \langle\psi_0|A = \langle\psi_0|a^{*} = a\langle\psi_0|, \tag{2.138}$$

where the last equality comes from the fact that $A$ is Hermitian. From (2.138), we would have

$$\begin{aligned}
\langle\psi_0|AB - BA|\psi_0\rangle &= \langle\psi_0|AB|\psi_0\rangle - \langle\psi_0|BA|\psi_0\rangle = \langle\psi_0|aB|\psi_0\rangle - \langle\psi_0|Ba|\psi_0\rangle \\
&= a\langle\psi_0|B|\psi_0\rangle - a\langle\psi_0|B|\psi_0\rangle = 0.
\end{aligned}$$

This would imply that (2.136) does not hold with $|\psi_0\rangle$, in contradiction to (2.127), where $ik \neq 0$. Namely, we conclude that no physical state can be an eigenstate of $A$ on condition that (2.127) holds. Equation (2.127) is rewritten as

$$\langle\psi|BA - AB|\psi\rangle = -ik. \tag{2.139}$$

Suppose now that $|\varphi_0\rangle$ is an eigenstate of $B$ that belongs to an eigenvalue $b$. Then, we can similarly show that no physical state can be an eigenstate of $B$. Summarizing the above, we restate that once we have a relation $[A, B] = ik$ $(k \neq 0)$, their representation matrix does not diagonalize $A$ or $B$. Or, once we postulate $[A, B] = ik$ $(k \neq 0)$, we must abandon an effort to have a representation matrix that diagonalizes $A$ and $B$. In the quantum-mechanical formulation of a harmonic oscillator, we have introduced the canonical commutation relation (see Sect. 2.3) described by $[q, p] = i\hbar$ (1.140). Indeed, neither $q$ nor $p$ is diagonalized as shown in (2.69) or (2.70).

Example 2.1 Taking a quantum harmonic oscillator as an example, we consider variance of $q$ and $p$ in reference to $|\psi_n\rangle$ $(n = 0, 1, \cdots)$. We have

$$\left\langle(\Delta q)^2\right\rangle = \langle\psi_n|q^2|\psi_n\rangle - \langle\psi_n|q|\psi_n\rangle^2. \tag{2.140}$$

Using (2.55) and (2.62) as well as (2.68), we get

$$\langle\psi_n|q|\psi_n\rangle = \sqrt{\frac{\hbar}{2m\omega}}\left\langle\psi_n\middle|\left(a + a^\dagger\right)\psi_n\right\rangle = \sqrt{\frac{\hbar n}{2m\omega}}\langle\psi_n|\psi_{n-1}\rangle + \sqrt{\frac{\hbar(n+1)}{2m\omega}}\langle\psi_n|\psi_{n+1}\rangle = 0, \tag{2.141}$$

where the last equality comes from (2.53). We have

$$q^2 = \frac{\hbar}{2m\omega}\left(a + a^\dagger\right)^2 = \frac{\hbar}{2m\omega}\left[a^2 + E + 2a^\dagger a + \left(a^\dagger\right)^2\right], \tag{2.142}$$

where $E$ denotes an identity operator and we used (2.24) along with the following relation:

$$aa^\dagger = aa^\dagger - a^\dagger a + a^\dagger a = \left[a, a^\dagger\right] + a^\dagger a = E + a^\dagger a. \tag{2.143}$$

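Using (2.55) and (2.62), we have

$$\langle\psi_n|q^2|\psi_n\rangle = \frac{\hbar}{2m\omega}\left[\langle\psi_n|\psi_n\rangle + 2\left\langle\psi_n|a^\dagger a\psi_n\right\rangle\right] = \frac{\hbar}{2m\omega}(2n + 1), \tag{2.144}$$

where we used (2.60) with the last equality. Thus, we get

$$\left\langle(\Delta q)^2\right\rangle = \langle\psi_n|q^2|\psi_n\rangle - \langle\psi_n|q|\psi_n\rangle^2 = \frac{\hbar}{2m\omega}(2n + 1).$$

Following similar procedures to those mentioned above, we get

$$\langle\psi_n|p|\psi_n\rangle = 0 \quad \text{and} \quad \langle\psi_n|p^2|\psi_n\rangle = \frac{m\hbar\omega}{2}(2n + 1). \tag{2.145}$$

Thus, we get

$$\left\langle(\Delta p)^2\right\rangle = \langle\psi_n|p^2|\psi_n\rangle = \frac{m\hbar\omega}{2}(2n + 1).$$

Accordingly, we have

$$\delta q\cdot\delta p = \sqrt{\left\langle(\Delta q)^2\right\rangle}\sqrt{\left\langle(\Delta p)^2\right\rangle} = \frac{\hbar}{2}(2n + 1) \geq \frac{\hbar}{2}. \tag{2.146}$$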
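The result (2.146) can be reproduced with small matrices. The sketch below is an illustration under assumptions not reproduced in this excerpt: it takes the standard number-basis matrix elements $a|\psi_n\rangle = \sqrt{n}\,|\psi_{n-1}\rangle$, truncates the basis just above the state of interest, and uses $q = \sqrt{\hbar/2m\omega}\,(a + a^\dagger)$ and $p = i\sqrt{m\hbar\omega/2}\,(a^\dagger - a)$, the latter obtained by inverting (2.88) and its adjoint. Function names are hypothetical, and $m = \omega = \hbar = 1$ by default.

```python
import math

def ladder_matrices(dim):
    """Truncated matrices of a and a† in the number basis (assumed standard elements)."""
    a = [[0.0] * dim for _ in range(dim)]
    for n in range(1, dim):
        a[n - 1][n] = math.sqrt(n)  # <n-1| a |n> = sqrt(n)
    a_dag = [[a[j][i] for j in range(dim)] for i in range(dim)]
    return a, a_dag

def matmul(x, y):
    """Plain square-matrix product."""
    dim = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(dim)) for j in range(dim)]
            for i in range(dim)]

def uncertainty_product(n, m=1.0, omega=1.0, hbar=1.0):
    """delta q * delta p in the state |psi_n>; <q> and <p> vanish there."""
    dim = n + 3  # large enough that truncation does not affect the (n, n) element
    a, a_dag = ladder_matrices(dim)
    plus = [[a[i][j] + a_dag[i][j] for j in range(dim)] for i in range(dim)]
    minus = [[a_dag[i][j] - a[i][j] for j in range(dim)] for i in range(dim)]
    q2 = hbar / (2 * m * omega) * matmul(plus, plus)[n][n]
    p2 = -(m * hbar * omega / 2) * matmul(minus, minus)[n][n]  # p^2 = -(m hbar omega / 2)(a† - a)^2
    return math.sqrt(q2) * math.sqrt(p2)

if __name__ == "__main__":
    for n in range(4):
        print(n, uncertainty_product(n), (2 * n + 1) / 2)
```

Each printed pair should agree up to rounding, with the minimum $\hbar/2$ reached only at $n = 0$.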

The quantity $\delta q\cdot\delta p$ is equal to $\frac{\hbar}{2}$ for $n = 0$ and becomes larger with increasing $n$. The above example gives a good illustration for Theorem 2.1. Note that putting $A = q$ and $B = p$ along with $k = \hbar$ in Theorem 2.1, we should have from (1.140)

$$\delta q\cdot\delta p \geq \frac{\hbar}{2}.$$

This is indeed the case with (2.146) for the quantum-mechanical harmonic oscillator. This example represents the uncertainty principle more generally.

In relation to the aforementioned argument, we might well wonder if Examples 1.1 and 1.2 have an eigenstate of a fixed momentum. Suppose that we chose for an eigenstate $y(x) = ce^{ikx}$, where $c$ is a constant. Then, we would have $\frac{\hbar}{i}\frac{\partial y(x)}{\partial x} = \hbar k\,y(x)$ and get an eigenvalue $\hbar k$ for a momentum. Nonetheless, such $y(x)$ does not satisfy the proper BCs; i.e., $y(L) = y(-L) = 0$. This is because $e^{ikx}$ never vanishes with any real numbers of $k$ or $x$ (any complex numbers of $k$ or $x$, more generally). Thus, we cannot obtain a proper solution that has an eigenstate with a fixed momentum in a confined physical system.

References
1. Messiah A (1999) Quantum mechanics. Dover, New York
2. Stakgold I (1998) Green’s functions and boundary value problems, 2nd edn. Wiley, New York
3. Lebedev NN (1972) Special functions and their applications. Dover, New York
4. Shimizu A (2003) Upgrade foundation of quantum theory (in Japanese). Saiensu-sha, Tokyo

Chapter 3

Hydrogen-Like Atoms

In the history of quantum mechanics, the theory was first successfully applied to the motion of an electron in a hydrogen atom along with a harmonic oscillator. Unlike the case of the one-dimensional harmonic oscillator we dealt with in Chap. 2, however, with a hydrogen atom we have to consider three-dimensional motion of an electron. Accordingly, it takes somewhat elaborate calculations to constitute the Hamiltonian. The calculation procedures themselves, however, are worth following to understand the underlying basic concepts of quantum mechanics. At the same time, this chapter is a treasure trove of special functions. In Chap. 2, we have already encountered one of them, i.e., Hermite polynomials. Here we will deal with Legendre polynomials, associated Legendre polynomials, etc. These special functions arise when we deal with a physical system having, e.g., spherical symmetry. In a hydrogen atom, an electron is moving in a spherically symmetric Coulomb potential field produced by a proton. This topic provides us with a good opportunity to study various special functions. The related Schrödinger equation can be separated into an angular part and a radial part. The solutions of the angular part are characterized by spherical (surface) harmonics. The (associated) Legendre functions are correlated to them. The solutions of the radial part are connected to the (associated) Laguerre polynomials. The exact solutions are accordingly obtained as the product of the (associated) Legendre functions and (associated) Laguerre polynomials. Thus, studying the characteristics of hydrogen-like atoms from the quantum-mechanical perspective is of fundamental importance.

3.1 Introductory Remarks

The motion of the electron in hydrogen is well known as a two-particle problem (or two-body problem) in a central force field. In that case, the coordinate system of the physical system is separated into the relative coordinates and center-of-mass coordinates. To be more specific, the coordinate separation is true of the case where two particles are moving under control only by a force field between the two particles without other external force fields [1]. In classical mechanics, the equation of motion is accordingly separated into two equations related to the relative coordinates and center-of-mass coordinates. Of these, a term of the potential field is only included in the equation of motion with respect to the relative coordinates.

© Springer Nature Singapore Pte Ltd. 2020 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3_3

The situation is the same with quantum mechanics. Namely, the Schrödinger equation of motion with the relative coordinates is expressed as an eigenvalue equation that reads as

$$\left[-\frac{\hbar^2}{2\mu}\nabla^2 + V(r)\right]\psi = E\psi, \tag{3.1}$$

where $\mu$ is a reduced mass of two particles [1], i.e., an electron and a proton; $V(r)$ is a potential with $r$ being a distance between the electron and proton. In (3.1), we assume the spherically symmetric potential; i.e., the potential is expressed only as a function of the distance $r$. Moreover, if the potential is coulombic,

$$\left[-\frac{\hbar^2}{2\mu}\nabla^2 - \frac{e^2}{4\pi\varepsilon_0 r}\right]\psi = E\psi, \tag{3.2}$$

where $\varepsilon_0$ is the permittivity of vacuum and $e$ is the elementary charge. If we think of hydrogen-like atoms such as He$^+$, Li$^{2+}$, and Be$^{3+}$, we have an equation described as

$$\left[-\frac{\hbar^2}{2\mu}\nabla^2 - \frac{Ze^2}{4\pi\varepsilon_0 r}\right]\psi = E\psi, \tag{3.3}$$

where $Z$ is an atomic number and $\mu$ is a reduced mass of an electron and a nucleus pertinent to the atomic (or ionic) species. We start with (3.3) in this chapter.

3.2 Constitution of Hamiltonian

As explicitly described in (3.3), the coulombic potential has a spherical symmetry. In such a case, it will be convenient to recast (3.3) in a spherical coordinate (or polar coordinate). As the physical system is three-dimensional, we have to consider orbital angular momentum $\mathbf{L}$ in the Hamiltonian. We have

$$\mathbf{L} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} L_x \\ L_y \\ L_z \end{pmatrix}, \tag{3.4}$$

where $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ denote orthonormal basis vectors in a three-dimensional Cartesian space ($\mathbb{R}^3$); $L_x$, $L_y$, and $L_z$ represent each component of $\mathbf{L}$. The angular momentum $\mathbf{L}$ is expressed in a form of determinant as

$$\mathbf{L} = \mathbf{x}\times\mathbf{p} = \begin{vmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \mathbf{e}_3 \\ x & y & z \\ p_x & p_y & p_z \end{vmatrix},$$

where $\mathbf{x}$ denotes a position vector with respect to the relative coordinates $x$, $y$, and $z$. That is,

$$\mathbf{x} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{3.5}$$

The quantity $\mathbf{p}$ denotes a momentum of an electron (as a particle carrying a reduced mass $\mu$) with $p_x$, $p_y$, and $p_z$ being its components; $\mathbf{p}$ is denoted similarly to the above. As for each component of $\mathbf{L}$, we have, e.g.,

$$L_x = yp_z - zp_y. \tag{3.6}$$

To calculate $L^2$, we estimate $L_x{}^2$, $L_y{}^2$, and $L_z{}^2$ separately. We have

$$\begin{aligned}
L_x{}^2 &= \left(yp_z - zp_y\right)\left(yp_z - zp_y\right) = yp_zyp_z - yp_zzp_y - zp_yyp_z + zp_yzp_y \\
&= y^2p_z{}^2 - yp_zzp_y - zp_yyp_z + z^2p_y{}^2 \\
&= y^2p_z{}^2 - y\left(zp_z - i\hbar\right)p_y - z\left(yp_y - i\hbar\right)p_z + z^2p_y{}^2 \\
&= y^2p_z{}^2 + z^2p_y{}^2 - yzp_zp_y - zyp_yp_z + i\hbar\left(yp_y + zp_z\right),
\end{aligned}$$

where we have used the canonical commutation relation (1.140) in the second to the last equality. In the above calculations, we used commutability of, e.g., $y$ and $p_z$; $z$ and $p_y$. For example, we have

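$$\left[p_z, y\right]|\psi\rangle = \frac{\hbar}{i}\left(\frac{\partial}{\partial z}y - y\frac{\partial}{\partial z}\right)|\psi\rangle = \frac{\hbar}{i}\left(y\frac{\partial|\psi\rangle}{\partial z} - y\frac{\partial|\psi\rangle}{\partial z}\right) = 0.$$

Since $|\psi\rangle$ is arbitrarily chosen, this relation implies that $p_z$ and $y$ commute. We obtain similar relations regarding $L_y{}^2$ and $L_z{}^2$ as well. Thus, we have

$$\begin{aligned}
L^2 &= L_x{}^2 + L_y{}^2 + L_z{}^2 \\
&= \left(y^2p_z{}^2 + z^2p_y{}^2 + z^2p_x{}^2 + x^2p_z{}^2 + x^2p_y{}^2 + y^2p_x{}^2\right) \\
&\quad + \left(x^2p_x{}^2 - x^2p_x{}^2 + y^2p_y{}^2 - y^2p_y{}^2 + z^2p_z{}^2 - z^2p_z{}^2\right) \\
&\quad - \left(yzp_zp_y + zyp_yp_z + zxp_xp_z + xzp_zp_x + xyp_yp_x + yxp_xp_y\right) \\
&\quad + i\hbar\left(yp_y + zp_z + zp_z + xp_x + xp_x + yp_y\right) \\
&= \left(y^2p_z{}^2 + z^2p_y{}^2 + z^2p_x{}^2 + x^2p_z{}^2 + x^2p_y{}^2 + y^2p_x{}^2 + x^2p_x{}^2 + y^2p_y{}^2 + z^2p_z{}^2\right) \\
&\quad - \left(x^2p_x{}^2 + y^2p_y{}^2 + z^2p_z{}^2 + yzp_zp_y + zyp_yp_z + zxp_xp_z + xzp_zp_x + xyp_yp_x + yxp_xp_y\right) \\
&\quad + i\hbar\left(yp_y + zp_z + zp_z + xp_x + xp_x + yp_y\right) \\
&= \mathbf{r}^2\cdot\mathbf{p}^2 - \mathbf{r}(\mathbf{r}\cdot\mathbf{p})\cdot\mathbf{p} + 2i\hbar(\mathbf{r}\cdot\mathbf{p}).
\end{aligned} \tag{3.7}$$

In (3.7), we are able to ease the calculations by virtue of putting a term $\left(x^2p_x{}^2 - x^2p_x{}^2 + y^2p_y{}^2 - y^2p_y{}^2 + z^2p_z{}^2 - z^2p_z{}^2\right)$. As a result, for the second term after the second to the last equality we have

$$\begin{aligned}
&x^2p_x{}^2 + y^2p_y{}^2 + z^2p_z{}^2 + yzp_zp_y + zyp_yp_z + zxp_xp_z + xzp_zp_x + xyp_yp_x + yxp_xp_y \\
&\quad = x\left(xp_x + yp_y + zp_z\right)p_x + y\left(xp_x + yp_y + zp_z\right)p_y + z\left(xp_x + yp_y + zp_z\right)p_z = \mathbf{r}(\mathbf{r}\cdot\mathbf{p})\cdot\mathbf{p}.
\end{aligned}$$

The calculations of $\mathbf{r}^2\cdot\mathbf{p}^2$ [the first term of (3.7)] and $\mathbf{r}\cdot\mathbf{p}$ (in the third term) are straightforward. In a spherical coordinate, momentum $\mathbf{p}$ is expressed as

$$\mathbf{p} = p_r\mathbf{e}^{(r)} + p_\theta\mathbf{e}^{(\theta)} + p_\phi\mathbf{e}^{(\phi)}, \tag{3.8}$$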

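where $p_r$, $p_\theta$, and $p_\phi$ are components of $\mathbf{p}$; $\mathbf{e}^{(r)}$, $\mathbf{e}^{(\theta)}$, and $\mathbf{e}^{(\phi)}$ are orthonormal basis vectors of $\mathbb{R}^3$ in the direction of increasing $r$, $\theta$, and $\phi$, respectively (see Fig. 3.1). In Fig. 3.1b, $\mathbf{e}^{(\phi)}$ is perpendicular to the plane shaped by the z-axis and a straight line of $y = x\tan\phi$. Notice that the said plane is spanned by $\mathbf{e}^{(r)}$ and $\mathbf{e}^{(\theta)}$.

Fig. 3.1 Spherical coordinate system and orthonormal basis set. (a) Orthonormal basis vectors $\mathbf{e}^{(r)}$, $\mathbf{e}^{(\theta)}$, and $\mathbf{e}^{(\phi)}$ in $\mathbb{R}^3$. (b) The basis vector $\mathbf{e}^{(\phi)}$ is perpendicular to the plane shaped by the z-axis and a straight line of $y = x\tan\phi$

Meanwhile, the momentum operator is expressed as [2]

$$\mathbf{p} = \frac{\hbar}{i}\nabla = \frac{\hbar}{i}\left[\mathbf{e}^{(r)}\frac{\partial}{\partial r} + \mathbf{e}^{(\theta)}\frac{1}{r}\frac{\partial}{\partial\theta} + \mathbf{e}^{(\phi)}\frac{1}{r\sin\theta}\frac{\partial}{\partial\phi}\right]. \tag{3.9}$$

The vector notation of (3.9) corresponds to (1.31). That is, in the Cartesian coordinate, we have

$$\mathbf{p} = \frac{\hbar}{i}\nabla = \frac{\hbar}{i}\left(\mathbf{e}_1\frac{\partial}{\partial x} + \mathbf{e}_2\frac{\partial}{\partial y} + \mathbf{e}_3\frac{\partial}{\partial z}\right),$$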

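where $\nabla$ is said to be nabla (or del), a kind of differential vector operator. Noting that

$$\mathbf{r} = r\mathbf{e}^{(r)}, \tag{3.10}$$

and using (3.9), we have

$$\mathbf{r}\cdot\mathbf{p} = \mathbf{r}\cdot\frac{\hbar}{i}\nabla = r\frac{\hbar}{i}\frac{\partial}{\partial r}. \tag{3.11}$$

Hence,

$$\mathbf{r}(\mathbf{r}\cdot\mathbf{p})\cdot\mathbf{p} = r\cdot r\left(\frac{\hbar}{i}\frac{\partial}{\partial r}\right)\left(\frac{\hbar}{i}\frac{\partial}{\partial r}\right) = -\hbar^2r^2\frac{\partial^2}{\partial r^2}. \tag{3.12}$$

Thus, we have

$$L^2 = r^2p^2 + \hbar^2r^2\frac{\partial^2}{\partial r^2} + 2\hbar^2r\frac{\partial}{\partial r} = r^2p^2 + \hbar^2\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right). \tag{3.13}$$

Therefore,

$$p^2 = -\frac{\hbar^2}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + \frac{L^2}{r^2}. \tag{3.14}$$

Notice here that $L^2$ does not contain $r$ (vide infra); i.e., $L^2$ commutes with $r^2$, and so it can freely be divided by $r^2$. Thus, the Hamiltonian $H$ is represented by

$$H = \frac{p^2}{2\mu} + V(r) = \frac{1}{2\mu}\left[-\frac{\hbar^2}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + \frac{L^2}{r^2}\right] - \frac{Ze^2}{4\pi\varepsilon_0 r}. \tag{3.15}$$

Thus, the Schrödinger equation can be expressed as

$$\left\{\frac{1}{2\mu}\left[-\frac{\hbar^2}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + \frac{L^2}{r^2}\right] - \frac{Ze^2}{4\pi\varepsilon_0 r}\right\}\psi = E\psi. \tag{3.16}$$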

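Now, let us describe $L^2$ in a polar coordinate. The calculation procedures are somewhat lengthy, but straightforward. First we have

$$x = r\sin\theta\cos\phi, \quad y = r\sin\theta\sin\phi, \quad z = r\cos\theta, \tag{3.17}$$

where we have $0 \leq \theta \leq \pi$ and $0 \leq \phi \leq 2\pi$. Rewriting (3.17) with respect to $r$, $\theta$, and $\phi$, we get

$$r = \left(x^2 + y^2 + z^2\right)^{1/2}, \quad \theta = \tan^{-1}\frac{\left(x^2 + y^2\right)^{1/2}}{z}, \quad \phi = \tan^{-1}\frac{y}{x}. \tag{3.18}$$

Thus, we have

$$L_z = xp_y - yp_x = -i\hbar\left(x\frac{\partial}{\partial y} - y\frac{\partial}{\partial x}\right), \tag{3.19}$$

$$\frac{\partial}{\partial x} = \frac{\partial r}{\partial x}\frac{\partial}{\partial r} + \frac{\partial\theta}{\partial x}\frac{\partial}{\partial\theta} + \frac{\partial\phi}{\partial x}\frac{\partial}{\partial\phi}. \tag{3.20}$$

In turn, we have

$$\begin{aligned}
\frac{\partial r}{\partial x} &= \frac{x}{r} = \sin\theta\cos\phi, \\
\frac{\partial\theta}{\partial x} &= \frac{1}{1 + \left(x^2 + y^2\right)/z^2}\cdot\frac{\left(x^2 + y^2\right)^{-1/2}\cdot 2x}{2z} = \frac{z}{x^2 + y^2 + z^2}\cdot\frac{x}{\left(x^2 + y^2\right)^{1/2}} = \frac{\cos\theta\cos\phi}{r}, \\
\frac{\partial\phi}{\partial x} &= \frac{1}{1 + \left(y^2/x^2\right)}\left(-\frac{y}{x^2}\right) = -\frac{\sin\phi}{r\sin\theta}.
\end{aligned} \tag{3.21}$$

In calculating the last two equations of (3.21), we used the differentiation of an arc tangent function along with a composite function. Namely,

$$\left(\tan^{-1}x\right)' = \frac{1}{1 + x^2}.$$

Inserting (3.21) into (3.20), we get

$$\frac{\partial}{\partial x} = \sin\theta\cos\phi\frac{\partial}{\partial r} + \frac{\cos\theta\cos\phi}{r}\frac{\partial}{\partial\theta} - \frac{\sin\phi}{r\sin\theta}\frac{\partial}{\partial\phi}. \tag{3.22}$$

Similarly, we have

$$\frac{\partial}{\partial y} = \sin\theta\sin\phi\frac{\partial}{\partial r} + \frac{\cos\theta\sin\phi}{r}\frac{\partial}{\partial\theta} + \frac{\cos\phi}{r\sin\theta}\frac{\partial}{\partial\phi}. \tag{3.23}$$

Inserting (3.22) and (3.23) together with (3.17) into (3.19), we get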

$$L_z = -i\hbar\frac{\partial}{\partial\phi}. \tag{3.24}$$
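The step from (3.19) to (3.24) amounts to the statement that $x\,\partial/\partial y - y\,\partial/\partial x$ equals $\partial/\partial\phi$ at fixed $r$ and $\theta$. The sketch below checks this numerically with central differences; the test function, step size, and function names are arbitrary choices made only for illustration.

```python
import math

def f(x, y, z):
    """Arbitrary smooth test function (an illustrative choice)."""
    return x * x * y + z * x + math.sin(y)

def cartesian_side(x, y, z, h=1e-5):
    """(x d/dy - y d/dx) f, evaluated by central differences."""
    df_dy = (f(x, y + h, z) - f(x, y - h, z)) / (2 * h)
    df_dx = (f(x + h, y, z) - f(x - h, y, z)) / (2 * h)
    return x * df_dy - y * df_dx

def to_cartesian(r, theta, phi):
    """Coordinate transformation (3.17)."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def polar_side(r, theta, phi, h=1e-5):
    """d f / d phi at fixed r and theta, evaluated by central differences."""
    forward = f(*to_cartesian(r, theta, phi + h))
    backward = f(*to_cartesian(r, theta, phi - h))
    return (forward - backward) / (2 * h)

if __name__ == "__main__":
    r, theta, phi = 1.3, 0.7, 2.1
    print(cartesian_side(*to_cartesian(r, theta, phi)), polar_side(r, theta, phi))
```

Multiplying either side by $-i\hbar$ gives $L_z f$; the factor is left out here so that the comparison stays real-valued.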

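In a similar manner, we have

$$\frac{\partial}{\partial z} = \frac{\partial r}{\partial z}\frac{\partial}{\partial r} + \frac{\partial\theta}{\partial z}\frac{\partial}{\partial\theta} = \cos\theta\frac{\partial}{\partial r} - \frac{\sin\theta}{r}\frac{\partial}{\partial\theta}.$$

Combining this relation with either (3.23) or (3.22), we get

$$L_x = i\hbar\left(\sin\phi\frac{\partial}{\partial\theta} + \cot\theta\cos\phi\frac{\partial}{\partial\phi}\right), \tag{3.25}$$

$$L_y = -i\hbar\left(\cos\phi\frac{\partial}{\partial\theta} - \cot\theta\sin\phi\frac{\partial}{\partial\phi}\right). \tag{3.26}$$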

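Now, we introduce the following operators:

$$L^{(+)} \equiv L_x + iL_y \quad \text{and} \quad L^{(-)} \equiv L_x - iL_y. \tag{3.27}$$

Then, we have

$$L^{(+)} = \hbar e^{i\phi}\left(\frac{\partial}{\partial\theta} + i\cot\theta\frac{\partial}{\partial\phi}\right) \quad \text{and} \quad L^{(-)} = \hbar e^{-i\phi}\left(-\frac{\partial}{\partial\theta} + i\cot\theta\frac{\partial}{\partial\phi}\right). \tag{3.28}$$

Thus, we get

$$\begin{aligned}
L^{(+)}L^{(-)} &= \hbar^2e^{i\phi}\left(\frac{\partial}{\partial\theta} + i\cot\theta\frac{\partial}{\partial\phi}\right)e^{-i\phi}\left(-\frac{\partial}{\partial\theta} + i\cot\theta\frac{\partial}{\partial\phi}\right) \\
&= \hbar^2e^{i\phi}\left[e^{-i\phi}\left(-\frac{\partial^2}{\partial\theta^2} - i\frac{1}{\sin^2\theta}\frac{\partial}{\partial\phi} + i\cot\theta\frac{\partial^2}{\partial\theta\partial\phi}\right)\right. \\
&\qquad\left. + e^{-i\phi}\cot\theta\left(-\frac{\partial}{\partial\theta} + i\cot\theta\frac{\partial}{\partial\phi}\right) + ie^{-i\phi}\cot\theta\left(-\frac{\partial^2}{\partial\phi\partial\theta} + i\cot\theta\frac{\partial^2}{\partial\phi^2}\right)\right] \\
&= -\hbar^2\left(\frac{\partial^2}{\partial\theta^2} + \cot\theta\frac{\partial}{\partial\theta} + i\frac{\partial}{\partial\phi} + \cot^2\theta\frac{\partial^2}{\partial\phi^2}\right).
\end{aligned} \tag{3.29}$$

In the above calculation procedure, we used differentiation of a product function. For instance, we have

$$\frac{\partial}{\partial\theta}\left(i\cot\theta\frac{\partial}{\partial\phi}\right) = i\left(\frac{\partial\cot\theta}{\partial\theta}\frac{\partial}{\partial\phi} + \cot\theta\frac{\partial^2}{\partial\theta\partial\phi}\right) = i\left(-\frac{1}{\sin^2\theta}\frac{\partial}{\partial\phi} + \cot\theta\frac{\partial^2}{\partial\theta\partial\phi}\right).$$

Note also that $\frac{\partial^2}{\partial\theta\partial\phi} = \frac{\partial^2}{\partial\phi\partial\theta}$. This is because we are dealing with continuous and differentiable functions. Meanwhile, we have the following commutation relations: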

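$$\left[L_x, L_y\right] = i\hbar L_z, \quad \left[L_y, L_z\right] = i\hbar L_x, \quad \text{and} \quad \left[L_z, L_x\right] = i\hbar L_y. \tag{3.30}$$

This can easily be confirmed by using the canonical commutation relations. The derivation can routinely be performed, but we show it because the procedures include several important points. For instance, we have

$$\begin{aligned}
\left[L_x, L_y\right] &= L_xL_y - L_yL_x = \left(yp_z - zp_y\right)\left(zp_x - xp_z\right) - \left(zp_x - xp_z\right)\left(yp_z - zp_y\right) \\
&= yp_zzp_x - yp_zxp_z - zp_yzp_x + zp_yxp_z - zp_xyp_z + zp_xzp_y + xp_zyp_z - xp_zzp_y \\
&= \left(yp_xp_zz - zp_xyp_z\right) + \left(zp_yxp_z - xp_zzp_y\right) + \left(xp_zyp_z - yp_zxp_z\right) + \left(zp_xzp_y - zp_yzp_x\right) \\
&= -yp_x\left(zp_z - p_zz\right) + xp_y\left(zp_z - p_zz\right) = i\hbar\left(xp_y - yp_x\right) = i\hbar L_z.
\end{aligned}$$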
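The commutation relation $[L_x, L_y] = i\hbar L_z$ can also be checked by letting the differential operators act on a test polynomial. The sketch below is an illustration under assumptions: polynomials in x, y, z are stored as dictionaries mapping exponent triples to coefficients, $p_u = -i\hbar\,\partial/\partial u$ is used for each momentum component, $\hbar = 1$ by default, and all function names are hypothetical.

```python
# A polynomial is a dict {(i, j, k): coeff}, meaning coeff * x**i * y**j * z**k.
X, Y, Z = 0, 1, 2

def add(p, q, factor=1):
    """Return p + factor * q, dropping terms whose coefficient cancels to zero."""
    out = dict(p)
    for key, c in q.items():
        out[key] = out.get(key, 0) + factor * c
    return {key: c for key, c in out.items() if c != 0}

def scale(p, s):
    """Multiply every coefficient by the number s."""
    return {key: s * c for key, c in p.items()}

def times_var(p, axis):
    """Multiply the polynomial by x, y, or z."""
    out = {}
    for key, c in p.items():
        new = list(key)
        new[axis] += 1
        out[tuple(new)] = c
    return out

def d_var(p, axis):
    """Partial derivative with respect to x, y, or z."""
    out = {}
    for key, c in p.items():
        if key[axis] > 0:
            new = list(key)
            new[axis] -= 1
            out[tuple(new)] = c * key[axis]
    return out

def momentum(p, axis, hbar=1):
    """p_axis = -i hbar d/d(axis)."""
    return scale(d_var(p, axis), -1j * hbar)

def angular(p, u, v):
    """(u p_v - v p_u) acting on p, e.g. angular(p, Y, Z) is L_x p as in (3.6)."""
    return add(times_var(momentum(p, v), u), times_var(momentum(p, u), v), factor=-1)

def Lx(p): return angular(p, Y, Z)
def Ly(p): return angular(p, Z, X)
def Lz(p): return angular(p, X, Y)

def commutator_defect(p, hbar=1):
    """[L_x, L_y] p - i hbar L_z p; an empty dict means the relation holds on p."""
    lhs = add(Lx(Ly(p)), Ly(Lx(p)), factor=-1)
    return add(lhs, scale(Lz(p), 1j * hbar), factor=-1)

if __name__ == "__main__":
    test_poly = {(2, 1, 0): 1, (1, 0, 3): 2, (0, 1, 1): 5}  # x^2 y + 2 x z^3 + 5 y z
    print(commutator_defect(test_poly))
```

Small integer coefficients keep the complex arithmetic exact, so the cancellation is not subject to rounding.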

In the above calculations, we used the canonical commutation relation as well as commutability of, e.g., $y$ and $p_x$; $y$ and $z$; $p_x$ and $p_y$. For example, we get

$$\left[p_x, p_y\right]|\psi\rangle = -\hbar^2\left(\frac{\partial}{\partial x}\frac{\partial}{\partial y} - \frac{\partial}{\partial y}\frac{\partial}{\partial x}\right)|\psi\rangle = -\hbar^2\left(\frac{\partial^2|\psi\rangle}{\partial x\partial y} - \frac{\partial^2|\psi\rangle}{\partial y\partial x}\right) = 0.$$

In the above equation, we assumed that the order of differentiation with respect to $x$ and $y$ can be switched. It is because we are dealing with continuous and differentiable normal functions. Thus, $p_x$ and $p_y$ commute. For other important commutation relations, we have

$$\left[L_x, L^2\right] = 0, \quad \left[L_y, L^2\right] = 0, \quad \text{and} \quad \left[L_z, L^2\right] = 0. \tag{3.31}$$

With the derivation, use

$$[A, B + C] = [A, B] + [A, C].$$

The derivation is straightforward and it is left for readers. The relations (3.30) and (3.31) imply that a simultaneous eigenstate exists for $L^2$ and one of $L_x$, $L_y$, and $L_z$. This is because $L^2$ commutes with them from (3.31), whereas $L_z$ does not commute with $L_x$ or $L_y$. The detailed argument about the simultaneous eigenstate can be seen in Part III. Thus, we have

$$L^{(+)}L^{(-)} = L_x{}^2 + L_y{}^2 + i\left(L_yL_x - L_xL_y\right) = L_x{}^2 + L_y{}^2 + i\left[L_y, L_x\right] = L_x{}^2 + L_y{}^2 + \hbar L_z.$$

Notice here that $[L_y, L_x] = -[L_x, L_y] = -i\hbar L_z$. Hence,

$$L^2 = L^{(+)}L^{(-)} + L_z{}^2 - \hbar L_z. \tag{3.32}$$

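From (3.24), we have

$$L_z{}^2 = -\hbar^2\frac{\partial^2}{\partial\phi^2}. \tag{3.33}$$

Finally we get

$$L^2 = -\hbar^2\left(\frac{\partial^2}{\partial\theta^2} + \cot\theta\frac{\partial}{\partial\theta} + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right)$$

or

$$L^2 = -\hbar^2\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right]. \tag{3.34}$$

Replacing $L^2$ in (3.15) with that of (3.34), we have

$$H = -\frac{\hbar^2}{2\mu r^2}\left[\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + \frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right] - \frac{Ze^2}{4\pi\varepsilon_0 r}. \tag{3.35}$$

Thus, the Schrödinger equation of (3.3) takes the following form:

$$\left\{-\frac{\hbar^2}{2\mu r^2}\left[\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + \frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right] - \frac{Ze^2}{4\pi\varepsilon_0 r}\right\}\psi = E\psi. \tag{3.36}$$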

3.3 Separation of Variables

If the potential is spherically symmetric (e.g., a Coulomb potential), it is well known that the Schrödinger equations of (3.1)–(3.3) can be solved by a method of separation of variables. More specifically, (3.36) can be separated into two differential equations, one of which depends only on a radial component $r$ and the other of which depends only upon angular components $\theta$ and $\phi$. To apply the method of separation of variables to (3.36), let us first return to (3.15). Considering that $L^2$ is expressed as (3.34), we assume that $L^2$ has eigenvalues $\gamma$ (if any) and takes eigenfunctions $Y(\theta, \phi)$ (again, if any) corresponding to $\gamma$. That is,

$$L^2Y(\theta, \phi) = \gamma Y(\theta, \phi), \tag{3.37}$$

where $Y(\theta, \phi)$ is assumed to be normalized. Meanwhile,

$$L^2 = L_x{}^2 + L_y{}^2 + L_z{}^2. \tag{3.38}$$

From (3.6), we have

$$L_x{}^\dagger = \left(yp_z - zp_y\right)^\dagger = p_z{}^\dagger y^\dagger - p_y{}^\dagger z^\dagger = p_zy - p_yz = yp_z - zp_y = L_x. \tag{3.39}$$

Note that $p_z$ and $y$ commute, so do $p_y$ and $z$. Therefore, $L_x$ is Hermitian, so is $L_x{}^2$. More generally, if an operator $A$ is Hermitian, so is $A^n$ ($n$: a positive integer); readers, please show it. Likewise, $L_y$ and $L_z$ are Hermitian as well. Thus, $L^2$ is Hermitian, too.

Next, we consider an expectation value of $L^2$, i.e., $\langle L^2\rangle$. Let $|\psi\rangle$ be an arbitrary normalized nonzero vector (or function). Then,

$$\begin{aligned}
\left\langle L^2\right\rangle &\equiv \left\langle\psi|L^2\psi\right\rangle = \left\langle\psi|L_x{}^2\psi\right\rangle + \left\langle\psi|L_y{}^2\psi\right\rangle + \left\langle\psi|L_z{}^2\psi\right\rangle \\
&= \left\langle L_x{}^\dagger\psi|L_x\psi\right\rangle + \left\langle L_y{}^\dagger\psi|L_y\psi\right\rangle + \left\langle L_z{}^\dagger\psi|L_z\psi\right\rangle \\
&= \langle L_x\psi|L_x\psi\rangle + \langle L_y\psi|L_y\psi\rangle + \langle L_z\psi|L_z\psi\rangle \\
&= \|L_x\psi\|^2 + \|L_y\psi\|^2 + \|L_z\psi\|^2 \geq 0.
\end{aligned} \tag{3.40}$$

Notice that the second to last equality comes from the fact that $L_x$, $L_y$, and $L_z$ are Hermitian. An operator that satisfies (3.40) is said to be non-negative (see Sects. 1.4, 2.2, etc. where we saw the calculation routines). Note also that in (3.40) the equality holds only when the following relations hold:

$$|L_x\psi\rangle = |L_y\psi\rangle = |L_z\psi\rangle = 0. \tag{3.41}$$

On this condition, we have

$$\left|L^2\psi\right\rangle = \left|\left(L_x{}^2 + L_y{}^2 + L_z{}^2\right)\psi\right\rangle = \left|L_x{}^2\psi\right\rangle + \left|L_y{}^2\psi\right\rangle + \left|L_z{}^2\psi\right\rangle = L_x|L_x\psi\rangle + L_y|L_y\psi\rangle + L_z|L_z\psi\rangle = 0. \tag{3.42}$$

The eigenfunction that satisfies (3.42) and the next relation (3.43) is a simultaneous eigenstate of $L_x$, $L_y$, $L_z$, and $L^2$. This could seem to be in contradiction to the fact that $L_z$ does not commute with $L_x$ or $L_y$. However, this is an exceptional case. Let $|\psi_0\rangle$ be the eigenfunction that satisfies both (3.41) and (3.42). Then, we have

$$|L_x\psi_0\rangle = |L_y\psi_0\rangle = |L_z\psi_0\rangle = \left|L^2\psi_0\right\rangle = 0. \tag{3.43}$$

As can be seen from (3.24) to (3.26) along with (3.34), the operators $L_x$, $L_y$, $L_z$, and $L^2$ are differential operators. Therefore, (3.43) implies that $|\psi_0\rangle$ is a constant. We will come back to this point later. In spite of this exceptional situation, it is impossible for $L_x$, $L_y$, $L_z$, and $L^2$ to share a whole set of eigenfunctions as simultaneous eigenstates. We briefly show this below. In Chap. 2, we mentioned that if $[A, B] = ik$, no physical state can be an eigenstate of $A$ or $B$. The situation is different, on the other hand, in the following case:

$$
[A, B] = iC, \tag{3.44}
$$

where $A$, $B$, and $C$ are Hermitian operators. The relation (3.30) is a typical example of this. If $C|\psi\rangle = 0$ in (3.44), $|\psi\rangle$ might well be an eigenstate of $A$ and/or $B$. However, if $C|\psi\rangle = c|\psi\rangle$ ($c \neq 0$), $|\psi\rangle$ cannot be an eigenstate of $A$ or $B$. This can readily be shown in a fashion similar to that described in Sect. 2.5. Let us think of, e.g., $[L_x, L_y] = i\hbar L_z$. Suppose that for some $\psi_0$ we have $L_z|\psi_0\rangle = 0$. Taking an inner product using $|\psi_0\rangle$, from (3.30) we have

$$
\langle\psi_0|(L_xL_y-L_yL_x)\psi_0\rangle = 0.
$$

In this case, moreover, even if we have $|L_x\psi_0\rangle = 0$ and $|L_y\psi_0\rangle = 0$, we have no inconsistency. If, on the other hand, $L_z|\psi\rangle = m|\psi\rangle$ ($m \neq 0$), $|\psi\rangle$ cannot be an eigenstate of $L_x$ or $L_y$, as mentioned above. Thus, we should be careful in dealing with a general situation where $[A, B] = iC$. In the case where $[A, B] = 0$, i.e., $AB = BA$, namely $A$ and $B$ commute, we have a different situation. This relation is equivalent to the statement that the operator $AB - BA$ has an eigenvalue zero for any physical state $|\psi\rangle$. Yet, this statement is of less practical use. Again, we defer the details to Sect. 12.6 of Part III.

Returning to (3.40), let us replace $\psi$ with a particular eigenfunction $Y(\theta, \phi)$. Then, we have

$$
\langle Y|L^2Y\rangle = \langle Y|\gamma Y\rangle = \gamma\langle Y|Y\rangle = \gamma \ge 0. \tag{3.45}
$$

Again, if $L^2$ has an eigenvalue, the eigenvalue should be non-negative. Taking account of the coefficient $\hbar^2$ in (3.34), it is convenient to put

$$
\gamma = \hbar^2\lambda \quad (\lambda \ge 0). \tag{3.46}
$$

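On the grounds that the solution of (3.36) can be described as

$$
\psi(r, \theta, \phi) = R(r)Y(\theta, \phi), \tag{3.47}
$$

the Schrödinger equation (3.16) can be rewritten as

$$
\left[\frac{1}{2\mu}\left\{-\frac{\hbar^2}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right)+\frac{L^2}{r^2}\right\}-\frac{Ze^2}{4\pi\varepsilon_0 r}\right]R(r)Y(\theta,\phi) = ER(r)Y(\theta,\phi). \tag{3.48}
$$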

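That is,

$$
\frac{1}{2\mu}\left[-\frac{\hbar^2}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial R(r)}{\partial r}\right)Y(\theta,\phi)+\frac{L^2Y(\theta,\phi)}{r^2}R(r)\right]-\frac{Ze^2}{4\pi\varepsilon_0 r}R(r)Y(\theta,\phi) = ER(r)Y(\theta,\phi). \tag{3.49}
$$

Recalling (3.37) and (3.46), we have

$$
\frac{1}{2\mu}\left[-\frac{\hbar^2}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial R(r)}{\partial r}\right)Y(\theta,\phi)+\frac{\hbar^2\lambda Y(\theta,\phi)}{r^2}R(r)\right]-\frac{Ze^2}{4\pi\varepsilon_0 r}R(r)Y(\theta,\phi) = ER(r)Y(\theta,\phi). \tag{3.50}
$$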

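Dividing both sides by $Y(\theta, \phi)$, we get a SOLDE for the radial component:

$$
\frac{1}{2\mu}\left[-\frac{\hbar^2}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial R(r)}{\partial r}\right)+\frac{\hbar^2\lambda}{r^2}R(r)\right]-\frac{Ze^2}{4\pi\varepsilon_0 r}R(r) = ER(r). \tag{3.51}
$$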

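Regarding the angular components $\theta$ and $\phi$, using (3.34), (3.37), and (3.46), we have

$$
L^2Y(\theta,\phi) = -\hbar^2\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right)+\frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right]Y(\theta,\phi) = \hbar^2\lambda Y(\theta,\phi). \tag{3.52}
$$

Dividing both sides by $\hbar^2$, we get

$$
-\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right)+\frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right]Y(\theta,\phi) = \lambda Y(\theta,\phi). \tag{3.53}
$$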

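Notice in (3.53) that the angular part of the SOLDE does not depend on the specific form of the potential. Now, we further assume that (3.53) can be separated into a zenithal angle part $\theta$ and an azimuthal angle part $\phi$ such that

$$
Y(\theta, \phi) = \Theta(\theta)\Phi(\phi). \tag{3.54}
$$

Then we have

$$
-\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\Theta(\theta)}{\partial\theta}\right)\Phi(\phi)+\frac{1}{\sin^2\theta}\frac{\partial^2\Phi(\phi)}{\partial\phi^2}\Theta(\theta)\right] = \lambda\Theta(\theta)\Phi(\phi). \tag{3.55}
$$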

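Multiplying both sides by $\sin^2\theta/[\Theta(\theta)\Phi(\phi)]$ and rearranging, we get

$$
-\frac{1}{\Phi(\phi)}\frac{\partial^2\Phi(\phi)}{\partial\phi^2} = \frac{\sin^2\theta}{\Theta(\theta)}\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\Theta(\theta)}{\partial\theta}\right)+\lambda\Theta(\theta)\right]. \tag{3.56}
$$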

Since the LHS of (3.56) depends only on $\phi$ and the RHS depends only on $\theta$, we must have

$$
\text{LHS of (3.56)} = \text{RHS of (3.56)} = \eta \quad (\text{constant}). \tag{3.57}
$$

Thus, we have the following relation for the LHS of (3.56):

$$
-\frac{1}{\Phi(\phi)}\frac{d^2\Phi(\phi)}{d\phi^2} = \eta. \tag{3.58}
$$

Putting $D \equiv -\dfrac{d^2}{d\phi^2}$, we get

$$
D\Phi(\phi) = \eta\Phi(\phi). \tag{3.59}
$$

The SOLDEs of (3.58) and (3.59) are formally the same as (1.61) of Sect. 1.3, where the boundary conditions (BCs) are Dirichlet conditions. Unlike (1.61), however, we have to consider different BCs, i.e., the periodic BCs. As in Example 1.1, we adopt two linearly independent solutions. That is, we have

$$
e^{im\phi} \quad \text{and} \quad e^{-im\phi} \quad (m \neq 0).
$$

As their linear combination, we have

$$
\Phi(\phi) = ae^{im\phi}+be^{-im\phi}. \tag{3.60}
$$

As BCs, we consider $\Phi(0) = \Phi(2\pi)$ and $\Phi'(0) = \Phi'(2\pi)$; i.e., we have

$$
a+b = ae^{i2\pi m}+be^{-i2\pi m}. \tag{3.61}
$$

Meanwhile, we have

$$
\Phi'(\phi) = aime^{im\phi}-bime^{-im\phi}. \tag{3.62}
$$

Therefore, from the BCs we have

$$
aim-bim = aime^{i2\pi m}-bime^{-i2\pi m}.
$$

Then,

$$
a-b = ae^{i2\pi m}-be^{-i2\pi m}. \tag{3.63}
$$

From (3.61) and (3.63), we have

$$
2a\left(1-e^{i2\pi m}\right) = 0 \quad \text{and} \quad 2b\left(1-e^{-i2\pi m}\right) = 0.
$$

If $a \neq 0$, we must have $m = 0, \pm 1, \pm 2, \dots$. If $a = 0$, we must have $b \neq 0$ to avoid having $\Phi(\phi) \equiv 0$ as a solution. In that case, we have $m = 0, \pm 1, \pm 2, \dots$ as well. Thus, it suffices to put $\Phi(\phi) = ce^{im\phi}$ ($m = 0, \pm 1, \pm 2, \dots$). Therefore, as a normalized function $\tilde{\Phi}(\phi)$, we get

$$
\tilde{\Phi}(\phi) = \frac{1}{\sqrt{2\pi}}e^{im\phi} \quad (m = 0, \pm 1, \pm 2, \dots). \tag{3.64}
$$
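As a quick numerical check of (3.64), the following Python sketch evaluates the normalized functions, approximates their inner products on $[0, 2\pi)$ by a uniform Riemann sum, and tests the periodic BC. The function names `phi`, `inner_product`, and `is_periodic`, as well as the sample count, are illustrative assumptions and do not appear in the text.

```python
import cmath
import math


def phi(m, angle):
    """Normalized azimuthal function of (3.64): exp(i m phi) / sqrt(2 pi)."""
    return cmath.exp(1j * m * angle) / math.sqrt(2.0 * math.pi)


def inner_product(m, n, samples=256):
    """Approximate <Phi_m | Phi_n> on [0, 2 pi) with a uniform Riemann sum."""
    h = 2.0 * math.pi / samples
    total = 0j
    for k in range(samples):
        angle = k * h
        total += phi(m, angle).conjugate() * phi(n, angle)
    return total * h


def is_periodic(m, tol=1e-12):
    """Check the boundary condition Phi(0) = Phi(2 pi) for a given m."""
    return abs(phi(m, 0.0) - phi(m, 2.0 * math.pi)) < tol
```

Under these assumptions, integer values of $m$ pass the periodicity test and give orthonormal functions, whereas a trial value such as $m = 1/2$ fails the BC.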

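Inserting it into (3.58), we have

$$
m^2e^{im\phi} = \eta e^{im\phi}.
$$

Therefore, we get

$$
\eta = m^2 \quad (m = 0, \pm 1, \pm 2, \dots). \tag{3.65}
$$

From (3.56) and (3.65), we have

$$
-\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta(\theta)}{d\theta}\right)+\frac{m^2\Theta(\theta)}{\sin^2\theta} = \lambda\Theta(\theta) \quad (m = 0, \pm 1, \pm 2, \dots). \tag{3.66}
$$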

Putting $m = 0$ in (3.64), we have $\tilde{\Phi}(\phi) = 1/\sqrt{2\pi}$ as the eigenfunction that corresponds to the eigenvalue $\eta = 0$. Unlike Examples 1.1 and 1.2, this reflects that the differential operator $-\dfrac{d^2}{d\phi^2}$ accompanied by the periodic BCs is a non-negative operator that allows an eigenvalue of zero. Yet, the range of $m$ is still undetermined. To clarify this point, we consider generalized angular momentum in the next section.

3.4 Generalized Angular Momentum

We obtained the commutation relations (3.30) among the individual angular momentum components $L_x$, $L_y$, and $L_z$. Conversely, we may start with (3.30) to define angular momentum. Such a quantity is called generalized angular momentum. Let $\tilde{\mathbf{J}}$ be a generalized angular momentum, as in the case of (3.4), such that

$$
\tilde{\mathbf{J}} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix}\tilde{J}_x\\ \tilde{J}_y\\ \tilde{J}_z\end{pmatrix}. \tag{3.67}
$$

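For the sake of simple notation, let us define $\mathbf{J}$ as follows so that we can eliminate $\hbar$ and deal with dimensionless quantities in the present discussion:

$$
\mathbf{J} \equiv \tilde{\mathbf{J}}/\hbar = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix}\tilde{J}_x/\hbar\\ \tilde{J}_y/\hbar\\ \tilde{J}_z/\hbar\end{pmatrix} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix}J_x\\ J_y\\ J_z\end{pmatrix}, \qquad J^2 = J_x^2+J_y^2+J_z^2. \tag{3.68}
$$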

Then, we require the following commutation relations:

$$
[J_x, J_y] = iJ_z, \quad [J_y, J_z] = iJ_x, \quad \text{and} \quad [J_z, J_x] = iJ_y. \tag{3.69}
$$

Also, we require $J_x$, $J_y$, and $J_z$ to be Hermitian. The operator $J^2$ is Hermitian accordingly. The relations (3.69) lead to

$$
[J_x, J^2] = 0, \quad [J_y, J^2] = 0, \quad \text{and} \quad [J_z, J^2] = 0. \tag{3.70}
$$

This can be confirmed as in the case of (3.30). As noted above, a simultaneous eigenstate again exists for $J^2$ and one of $J_x$, $J_y$, and $J_z$. By convention, we choose $J^2$ and $J_z$ for the simultaneous eigenstate. Then, designating the eigenstate by $|\zeta, \mu\rangle$, we have

$$
J^2|\zeta, \mu\rangle = \zeta|\zeta, \mu\rangle \quad \text{and} \quad J_z|\zeta, \mu\rangle = \mu|\zeta, \mu\rangle. \tag{3.71}
$$

The implication of (3.71) is that $|\zeta, \mu\rangle$ is a simultaneous eigenstate that belongs to the eigenvalue $\mu$ of $J_z$ and to the eigenvalue $\zeta$ of $J^2$. Since $J_z$ and $J^2$ are Hermitian, both $\mu$ and $\zeta$ are real (see Sect. 1.4). Of these, $\zeta \ge 0$, as in the case of (3.45). We define the following operators $J^{(+)}$ and $J^{(-)}$ as in the case of (3.27):

$$
J^{(+)} \equiv J_x+iJ_y \quad \text{and} \quad J^{(-)} \equiv J_x-iJ_y. \tag{3.72}
$$

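Then, from (3.69) and (3.70), we get

$$
\left[J^{(+)}, J^2\right] = \left[J^{(-)}, J^2\right] = 0. \tag{3.73}
$$

Also, we obtain the following commutation relations:

$$
\left[J_z, J^{(+)}\right] = J^{(+)}; \quad \left[J_z, J^{(-)}\right] = -J^{(-)}; \quad \left[J^{(+)}, J^{(-)}\right] = 2J_z. \tag{3.74}
$$

From (3.70) to (3.72), we get

$$
\begin{aligned}
J^2J^{(+)}|\zeta, \mu\rangle &= J^{(+)}J^2|\zeta, \mu\rangle = \zeta J^{(+)}|\zeta, \mu\rangle, \\
J^2J^{(-)}|\zeta, \mu\rangle &= J^{(-)}J^2|\zeta, \mu\rangle = \zeta J^{(-)}|\zeta, \mu\rangle.
\end{aligned}
\tag{3.75}
$$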

Equation (3.75) indicates that both $J^{(+)}|\zeta, \mu\rangle$ and $J^{(-)}|\zeta, \mu\rangle$ are eigenvectors of $J^2$ that correspond to the eigenvalue $\zeta$. Meanwhile, from (3.74) we get

$$
\begin{aligned}
J_zJ^{(+)}|\zeta, \mu\rangle &= J^{(+)}(J_z+1)|\zeta, \mu\rangle = (\mu+1)J^{(+)}|\zeta, \mu\rangle, \\
J_zJ^{(-)}|\zeta, \mu\rangle &= J^{(-)}(J_z-1)|\zeta, \mu\rangle = (\mu-1)J^{(-)}|\zeta, \mu\rangle.
\end{aligned}
\tag{3.76}
$$

The relation (3.76) means that $J^{(+)}|\zeta, \mu\rangle$ is an eigenvector of $J_z$ corresponding to the eigenvalue $(\mu+1)$, while $J^{(-)}|\zeta, \mu\rangle$ is an eigenvector of $J_z$ corresponding to the eigenvalue $(\mu-1)$. This implies that $J^{(+)}$ and $J^{(-)}$ function as raising and lowering operators (or ladder operators) that have been introduced in this chapter. Thus, using undetermined constants (or phase factors) $a_\mu^{(+)}$ and $a_\mu^{(-)}$, we describe

$$
J^{(+)}|\zeta, \mu\rangle = a_\mu^{(+)}|\zeta, \mu+1\rangle \quad \text{and} \quad J^{(-)}|\zeta, \mu\rangle = a_\mu^{(-)}|\zeta, \mu-1\rangle. \tag{3.77}
$$

Next, let us characterize the eigenvalues $\mu$. We have

$$
J_x^2+J_y^2 = J^2-J_z^2. \tag{3.78}
$$

Therefore,

$$
\left(J_x^2+J_y^2\right)|\zeta, \mu\rangle = \left(J^2-J_z^2\right)|\zeta, \mu\rangle = \left(\zeta-\mu^2\right)|\zeta, \mu\rangle. \tag{3.79}
$$

Since $(J_x^2+J_y^2)$ is a non-negative operator, its eigenvalues are non-negative as well, as can be seen from (3.40) and (3.45). Then, we have

$$
\zeta-\mu^2 \ge 0. \tag{3.80}
$$

Thus, for a fixed non-negative $\zeta$, $\mu$ is bounded both above and below. We then define the maximum of $\mu$ as $j$ and the minimum of $\mu$ as $j'$. Consequently, on the basis of (3.77), we have

$$
J^{(+)}|\zeta, j\rangle = 0 \quad \text{and} \quad J^{(-)}|\zeta, j'\rangle = 0. \tag{3.81}
$$

This is because we have no quantum state corresponding to $|\zeta, j+1\rangle$ or $|\zeta, j'-1\rangle$. From (3.75) and (3.81), the possible values of $\mu$ are

$$
j,\ j-1,\ j-2,\ \dots,\ j'. \tag{3.82}
$$

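From (3.69) and (3.72), we get

$$
J^{(-)}J^{(+)} = J^2-J_z^2-J_z, \qquad J^{(+)}J^{(-)} = J^2-J_z^2+J_z. \tag{3.83}
$$

Operating these operators on $|\zeta, j\rangle$ or $|\zeta, j'\rangle$ and using (3.81), we get

$$
\begin{aligned}
J^{(-)}J^{(+)}|\zeta, j\rangle &= \left(J^2-J_z^2-J_z\right)|\zeta, j\rangle = \left(\zeta-j^2-j\right)|\zeta, j\rangle = 0, \\
J^{(+)}J^{(-)}|\zeta, j'\rangle &= \left(J^2-J_z^2+J_z\right)|\zeta, j'\rangle = \left(\zeta-j'^2+j'\right)|\zeta, j'\rangle = 0.
\end{aligned}
\tag{3.84}
$$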

Since $|\zeta, j\rangle \neq 0$ and $|\zeta, j'\rangle \neq 0$, we have

$$
\zeta-j^2-j = \zeta-j'^2+j' = 0. \tag{3.85}
$$

This means that

$$
\zeta = j(j+1) = j'(j'-1). \tag{3.86}
$$

Moreover, from (3.86) we get

$$
j(j+1)-j'(j'-1) = (j+j')(j-j'+1) = 0. \tag{3.87}
$$

As $j \ge j'$, we have $j-j'+1 > 0$. From (3.87), therefore, we get

$$
j+j' = 0 \quad \text{or} \quad j = -j'. \tag{3.88}
$$

Then, we conclude that the minimum of $\mu$ is $-j$. Accordingly, the possible values of $\mu$ are

$$
\mu = j,\ j-1,\ j-2,\ \dots,\ -j+1,\ -j. \tag{3.89}
$$

That is, the number of values $\mu$ can take is $(2j+1)$. The relation (3.89) implies that, taking a non-negative integer $k$,

$$
j-k = -j \quad \text{or} \quad j = k/2. \tag{3.90}
$$
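The enumeration in (3.89) and (3.90) can be sketched in a few lines of Python. The helper name `allowed_mu` is hypothetical, and exact rational arithmetic is assumed so that half-odd-integer values are handled without rounding.

```python
from fractions import Fraction


def allowed_mu(j):
    """Return [j, j-1, ..., -j] for j = k/2 with k a non-negative integer, as in (3.89)."""
    j = Fraction(j)
    if j < 0 or (2 * j).denominator != 1:
        raise ValueError("j must be zero, a positive integer, or a positive half-odd-integer")
    count = int(2 * j) + 1  # the multiplicity 2j + 1
    return [j - step for step in range(count)]
```

For example, `allowed_mu(Fraction(1, 2))` lists the two values $1/2$ and $-1/2$, and `allowed_mu(1)` lists $1$, $0$, and $-1$.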

In other words, $j$ is permitted to be zero, a positive integer, or a positive half-integer (or more precisely, half-odd-integer). For instance, if $j = 1/2$, $\mu$ can be $1/2$ or $-1/2$. When $j = 1$, $\mu$ can be $1$, $0$, or $-1$.

Finally, we have to determine the constants $a_\mu^{(+)}$ and $a_\mu^{(-)}$. To this end, multiplying both sides of the second equation of (3.77) by $\langle\zeta, \mu-1|$ from the left, we have

$$
\langle\zeta, \mu-1|J^{(-)}|\zeta, \mu\rangle = a_\mu^{(-)}\langle\zeta, \mu-1|\zeta, \mu-1\rangle = a_\mu^{(-)}, \tag{3.91}
$$

where the second equality holds because $|\zeta, \mu-1\rangle$ has been normalized; i.e., $\|\,|\zeta, \mu-1\rangle\| = 1$. Meanwhile, taking the adjoint of both sides of the first equation of (3.77), we have

$$
\langle\zeta, \mu|\left[J^{(+)}\right]^{\dagger} = \left[a_\mu^{(+)}\right]^{*}\langle\zeta, \mu+1|. \tag{3.92}
$$

But, from (3.72) and the fact that $J_x$ and $J_y$ are Hermitian,

$$
\left[J^{(+)}\right]^{\dagger} = J^{(-)}. \tag{3.93}
$$

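Using (3.93) and replacing $\mu$ in (3.92) with $\mu-1$, we get

$$
\langle\zeta, \mu-1|J^{(-)} = \left[a_{\mu-1}^{(+)}\right]^{*}\langle\zeta, \mu|. \tag{3.94}
$$

Furthermore, multiplying (3.94) by $|\zeta, \mu\rangle$ from the right, we have

$$
\langle\zeta, \mu-1|J^{(-)}|\zeta, \mu\rangle = \left[a_{\mu-1}^{(+)}\right]^{*}\langle\zeta, \mu|\zeta, \mu\rangle = \left[a_{\mu-1}^{(+)}\right]^{*}, \tag{3.95}
$$

where again $|\zeta, \mu\rangle$ is assumed to be normalized. Comparing (3.91) and (3.95), we get

$$
a_\mu^{(-)} = \left[a_{\mu-1}^{(+)}\right]^{*}. \tag{3.96}
$$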

Taking an inner product of the first equation of (3.77) with its adjoint, we have

$$
\langle\zeta, \mu|J^{(-)}J^{(+)}|\zeta, \mu\rangle = \left[a_\mu^{(+)}\right]^{*}a_\mu^{(+)}\langle\zeta, \mu+1|\zeta, \mu+1\rangle = \left|a_\mu^{(+)}\right|^2. \tag{3.97}
$$

Once again, the second equality of (3.97) results from the normalization of the vector. Using (3.83) as well as (3.71) and (3.86), (3.97) can be rewritten as

$$
\begin{aligned}
\langle\zeta, \mu|J^2-J_z^2-J_z|\zeta, \mu\rangle &= \langle\zeta, \mu|j(j+1)-\mu^2-\mu|\zeta, \mu\rangle \\
&= \langle\zeta, \mu|\zeta, \mu\rangle(j-\mu)(j+\mu+1) = \left|a_\mu^{(+)}\right|^2.
\end{aligned}
\tag{3.98}
$$

Thus, we get

$$
a_\mu^{(+)} = e^{i\delta}\sqrt{(j-\mu)(j+\mu+1)} \quad (\delta: \text{an arbitrary real number}), \tag{3.99}
$$

where $e^{i\delta}$ is a phase factor. From (3.96) we also get

$$
a_\mu^{(-)} = e^{-i\delta}\sqrt{(j-\mu+1)(j+\mu)}. \tag{3.100}
$$

In (3.99) and (3.100), we routinely put $\delta = 0$ so that $a_\mu^{(+)}$ and $a_\mu^{(-)}$ are real and non-negative. Explicitly rewriting (3.77), we get

$$
\begin{aligned}
J^{(+)}|\zeta, \mu\rangle &= \sqrt{(j-\mu)(j+\mu+1)}\,|\zeta, \mu+1\rangle, \\
J^{(-)}|\zeta, \mu\rangle &= \sqrt{(j-\mu+1)(j+\mu)}\,|\zeta, \mu-1\rangle,
\end{aligned}
\tag{3.101}
$$

where j is a fixed given number chosen from among zero, positive integers, and positive half-integers (or half-odd-integers). As discussed above, we have derived various properties and relations with respect to the generalized angular momentum on the basis of (i) the relation (3.30) or (3.69) and (ii) the fact that Jx, Jy, and Jz are Hermitian operators. This notion is very useful in dealing with various angular momenta of different origins (e.g., orbital angular momentum, spin angular momentum) from a unified point of view. In Chap. 20, we will revisit this issue in more detail.
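The coefficient relations of this section lend themselves to a small consistency check. The Python sketch below works with the squared moduli $|a_\mu^{(\pm)}|^2$ in exact rational arithmetic and verifies (3.98), the modulus form of (3.96), and the termination conditions (3.81) for a given $j$. The function names `a_plus_sq`, `a_minus_sq`, and `check_ladder` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def a_plus_sq(j, mu):
    """|a_mu^(+)|^2 = (j - mu)(j + mu + 1), cf. (3.98) and (3.99)."""
    j, mu = Fraction(j), Fraction(mu)
    return (j - mu) * (j + mu + 1)


def a_minus_sq(j, mu):
    """|a_mu^(-)|^2 = (j - mu + 1)(j + mu), cf. (3.100)."""
    j, mu = Fraction(j), Fraction(mu)
    return (j - mu + 1) * (j + mu)


def check_ladder(j):
    """Verify (3.98), the modulus form of (3.96), and (3.81) for every allowed mu."""
    j = Fraction(j)
    zeta = j * (j + 1)                      # eigenvalue of J^2, see (3.86)
    mus = [j - s for s in range(int(2 * j) + 1)]
    for mu in mus:
        # (3.98): |a_mu^(+)|^2 equals zeta - mu^2 - mu
        if a_plus_sq(j, mu) != zeta - mu * mu - mu:
            return False
        # (3.96): |a_mu^(-)| equals |a_{mu-1}^(+)|
        if a_minus_sq(j, mu) != a_plus_sq(j, mu - 1):
            return False
    # (3.81): the ladder terminates at mu = j and mu = -j
    return a_plus_sq(j, j) == 0 and a_minus_sq(j, -j) == 0
```

Working with squared moduli avoids floating-point square roots, so the comparisons are exact; the phase convention $\delta = 0$ plays no role in this check.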

3.5 Orbital Angular Momentum: Operator Approach

In Sect. 3.4 we derived various important results on angular momenta on the basis of the commutation relations (3.69) and the assumption that $J_x$, $J_y$, and $J_z$ are Hermitian. Now, let us return to the discussion of the orbital angular momenta we dealt with in Sects. 3.2 and 3.3. First, we treat the orbital angular momenta via an operator approach. This approach enables us to understand why the quantity $j$ introduced in Sect. 3.4 takes a value of zero or a positive integer for the orbital angular momenta. In the next section (Sect. 3.6) we will deal with the related issues by an analytical method.

In (3.28) we introduced the differential operators $L^{(+)}$ and $L^{(-)}$. Following Sect. 3.4, we define the following operators to eliminate $\hbar$ so that we can deal with dimensionless quantities:

$$
\mathbf{M} \equiv \mathbf{L}/\hbar = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix}M_x\\ M_y\\ M_z\end{pmatrix}, \qquad M^2 = L^2/\hbar^2 = M_x^2+M_y^2+M_z^2. \tag{3.102}
$$

Hence, we have

$$
M_x = \frac{L_x}{\hbar}, \quad M_y = \frac{L_y}{\hbar}, \quad \text{and} \quad M_z = \frac{L_z}{\hbar}. \tag{3.103}
$$

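Moreover, we define the following operators:

$$
M^{(+)} \equiv M_x+iM_y = L^{(+)}/\hbar = e^{i\phi}\left(\frac{\partial}{\partial\theta}+i\cot\theta\frac{\partial}{\partial\phi}\right), \tag{3.104}
$$

$$
M^{(-)} \equiv M_x-iM_y = L^{(-)}/\hbar = e^{-i\phi}\left(-\frac{\partial}{\partial\theta}+i\cot\theta\frac{\partial}{\partial\phi}\right). \tag{3.105}
$$

Then we have

$$
M^2 = -\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right)+\frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right]. \tag{3.106}
$$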

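Here we carry out a variable transformation such that

$$
\xi = \cos\theta \quad (0 \le \theta \le \pi) \quad \text{or} \quad \sin\theta = \sqrt{1-\xi^2}. \tag{3.107}
$$

Noting, e.g., that

$$
\frac{\partial}{\partial\theta} = \frac{\partial\xi}{\partial\theta}\frac{\partial}{\partial\xi} = -\sin\theta\frac{\partial}{\partial\xi} = -\sqrt{1-\xi^2}\,\frac{\partial}{\partial\xi}, \qquad \sin\theta\frac{\partial}{\partial\theta} = -\sin^2\theta\frac{\partial}{\partial\xi} = -\left(1-\xi^2\right)\frac{\partial}{\partial\xi}, \tag{3.108}
$$

we get

$$
\begin{aligned}
M^{(+)} &= e^{i\phi}\left(-\sqrt{1-\xi^2}\,\frac{\partial}{\partial\xi}+i\frac{\xi}{\sqrt{1-\xi^2}}\frac{\partial}{\partial\phi}\right), \\
M^{(-)} &= e^{-i\phi}\left(\sqrt{1-\xi^2}\,\frac{\partial}{\partial\xi}+i\frac{\xi}{\sqrt{1-\xi^2}}\frac{\partial}{\partial\phi}\right), \\
M^2 &= -\frac{\partial}{\partial\xi}\left[\left(1-\xi^2\right)\frac{\partial}{\partial\xi}\right]-\frac{1}{1-\xi^2}\frac{\partial^2}{\partial\phi^2}.
\end{aligned}
\tag{3.109}
$$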

Although we showed in Sect. 3.3 that $m = 0, \pm 1, \pm 2, \dots$, the range of $m$ was unclear. The relationship between $m$ and $\lambda$ in (3.66) has remained unclear so far as well. On the basis of the general approach developed in Sect. 3.4, however, we know that the eigenvalue $\mu$ of the dimensionless $z$-component angular momentum $J_z$ is bounded, with its maximum and minimum being $j$ and $-j$, respectively [see (3.89)], where $j$ can be zero, a positive integer, or a positive half-odd-integer. Concomitantly, the eigenvalue $\zeta$ of $J^2$ equals $j(j+1)$. In the present section, let us reconsider the relationship between $m$ and $\lambda$ in (3.66) in light of the knowledge obtained in Sect. 3.4. Following custom, we replace $\mu$ in (3.89) with $m$ to have

$$
m = j,\ j-1,\ j-2,\ \dots,\ -j+1,\ -j. \tag{3.110}
$$

At the moment, we assume that $m$ can be a half-odd-integer besides zero or an integer [3]. Now, let us define the notation of $Y(\theta, \phi)$ that appeared in (3.37). This function is eligible for a simultaneous eigenstate of $M^2$ and $M_z$ and can be indexed with $j$ and $m$ as in (3.110). Then, let $Y(\theta, \phi)$ be described accordingly as

$$
Y_j^m(\theta, \phi) \equiv Y(\theta, \phi). \tag{3.111}
$$

From (3.54) and (3.64), we have

$$
Y_j^m(\theta, \phi) \propto e^{im\phi}.
$$

Therefore, we get

$$
\begin{aligned}
M^{(+)}Y_j^m(\theta, \phi) &= e^{i\phi}\left(-\sqrt{1-\xi^2}\,\frac{\partial}{\partial\xi}-\frac{m\xi}{\sqrt{1-\xi^2}}\right)Y_j^m(\theta, \phi) \\
&= -e^{i\phi}\left(\sqrt{1-\xi^2}\right)^{m+1}\frac{\partial}{\partial\xi}\left[\left(\sqrt{1-\xi^2}\right)^{-m}Y_j^m(\theta, \phi)\right],
\end{aligned}
\tag{3.112}
$$

where we used the following equations:

$$
\begin{aligned}
\frac{\partial}{\partial\xi}\left[\left(\sqrt{1-\xi^2}\right)^{-m}\right] &= (-m)\left(\sqrt{1-\xi^2}\right)^{-m-1}\cdot\frac{1}{2}\left(\sqrt{1-\xi^2}\right)^{-1}(-2\xi) = m\xi\left(\sqrt{1-\xi^2}\right)^{-m-2}, \\
\frac{\partial}{\partial\xi}\left[\left(\sqrt{1-\xi^2}\right)^{-m}Y_j^m(\theta, \phi)\right] &= m\xi\left(\sqrt{1-\xi^2}\right)^{-m-2}Y_j^m(\theta, \phi)+\left(\sqrt{1-\xi^2}\right)^{-m}\frac{\partial Y_j^m(\theta, \phi)}{\partial\xi}.
\end{aligned}
\tag{3.113}
$$

Similarly, we get

$$
\begin{aligned}
M^{(-)}Y_j^m(\theta, \phi) &= e^{-i\phi}\left(\sqrt{1-\xi^2}\,\frac{\partial}{\partial\xi}-\frac{m\xi}{\sqrt{1-\xi^2}}\right)Y_j^m(\theta, \phi) \\
&= e^{-i\phi}\left(\sqrt{1-\xi^2}\right)^{-m+1}\frac{\partial}{\partial\xi}\left[\left(\sqrt{1-\xi^2}\right)^{m}Y_j^m(\theta, \phi)\right].
\end{aligned}
\tag{3.114}
$$

Let us derive the relations where $M^{(+)}$ or $M^{(-)}$ is successively operated on $Y_j^m(\theta, \phi)$. In the case of $M^{(+)}$, using (3.109) we have

$$
\left[M^{(+)}\right]^{n}Y_j^m(\theta, \phi) = (-1)^n e^{in\phi}\left(\sqrt{1-\xi^2}\right)^{m+n}\frac{\partial^n}{\partial\xi^n}\left[\left(\sqrt{1-\xi^2}\right)^{-m}Y_j^m(\theta, \phi)\right]. \tag{3.115}
$$

We confirm this relation by mathematical induction. We have (3.112) by replacing $n$ with 1 in (3.115). Namely, (3.115) holds when $n = 1$. Next, suppose that (3.115) holds with $n$. Then, using the first equation of (3.109) and noting (3.64), we have

$$
\begin{aligned}
\left[M^{(+)}\right]^{n+1}Y_j^m(\theta, \phi)
&= M^{(+)}\left\{\left[M^{(+)}\right]^{n}Y_j^m(\theta, \phi)\right\} \\
&= e^{i\phi}\left(-\sqrt{1-\xi^2}\,\frac{\partial}{\partial\xi}+i\frac{\xi}{\sqrt{1-\xi^2}}\frac{\partial}{\partial\phi}\right)(-1)^n e^{in\phi}\left(\sqrt{1-\xi^2}\right)^{m+n}\frac{\partial^n}{\partial\xi^n}\left[\left(\sqrt{1-\xi^2}\right)^{-m}Y_j^m(\theta, \phi)\right] \\
&= (-1)^n e^{i(n+1)\phi}\left[-\sqrt{1-\xi^2}\,\frac{\partial}{\partial\xi}-\frac{\xi(n+m)}{\sqrt{1-\xi^2}}\right]\left\{\left(\sqrt{1-\xi^2}\right)^{m+n}\frac{\partial^n}{\partial\xi^n}\left[\left(\sqrt{1-\xi^2}\right)^{-m}Y_j^m(\theta, \phi)\right]\right\} \\
&= (-1)^n e^{i(n+1)\phi}\Bigg\{-\sqrt{1-\xi^2}\,(m+n)\left(\sqrt{1-\xi^2}\right)^{m+n-1}\cdot\frac{1}{2}\left(\sqrt{1-\xi^2}\right)^{-1}(-2\xi)\,\frac{\partial^n}{\partial\xi^n}\left[\left(\sqrt{1-\xi^2}\right)^{-m}Y_j^m(\theta, \phi)\right] \\
&\qquad -\sqrt{1-\xi^2}\left(\sqrt{1-\xi^2}\right)^{m+n}\frac{\partial^{n+1}}{\partial\xi^{n+1}}\left[\left(\sqrt{1-\xi^2}\right)^{-m}Y_j^m(\theta, \phi)\right] \\
&\qquad -\frac{\xi(n+m)}{\sqrt{1-\xi^2}}\left(\sqrt{1-\xi^2}\right)^{m+n}\frac{\partial^n}{\partial\xi^n}\left[\left(\sqrt{1-\xi^2}\right)^{-m}Y_j^m(\theta, \phi)\right]\Bigg\} \\
&= (-1)^{n+1}e^{i(n+1)\phi}\left(\sqrt{1-\xi^2}\right)^{m+(n+1)}\frac{\partial^{n+1}}{\partial\xi^{n+1}}\left[\left(\sqrt{1-\xi^2}\right)^{-m}Y_j^m(\theta, \phi)\right].
\end{aligned}
\tag{3.116}
$$

Notice that the first and third terms in the second-to-last expression cancel each other. Thus, (3.115) certainly holds with $(n+1)$. Similarly, we have [3]

$$
\left[M^{(-)}\right]^{n}Y_j^m(\theta, \phi) = e^{-in\phi}\left(\sqrt{1-\xi^2}\right)^{-m+n}\frac{\partial^n}{\partial\xi^n}\left[\left(\sqrt{1-\xi^2}\right)^{m}Y_j^m(\theta, \phi)\right]. \tag{3.117}
$$

The proof of (3.117) is left to the reader. From the second equation of (3.81) and (3.114) with $m$ replaced by $-j$, we have

$$
M^{(-)}Y_j^{-j}(\theta, \phi) = e^{-i\phi}\left(\sqrt{1-\xi^2}\right)^{j+1}\frac{\partial}{\partial\xi}\left[\left(\sqrt{1-\xi^2}\right)^{-j}Y_j^{-j}(\theta, \phi)\right] = 0. \tag{3.118}
$$

This implies that $\left(\sqrt{1-\xi^2}\right)^{-j}Y_j^{-j}(\theta, \phi)$ is a constant with respect to $\xi$. We describe this as

$$
\left(\sqrt{1-\xi^2}\right)^{-j}Y_j^{-j}(\theta, \phi) = c \quad (c: \text{constant with respect to } \xi). \tag{3.119}
$$

Meanwhile, putting $m = -j$ and $n = 2j+1$ in (3.115) and taking account of the first equation of (3.77) and the first equation of (3.81), we get

$$
\left[M^{(+)}\right]^{2j+1}Y_j^{-j}(\theta, \phi) = (-1)^{2j+1}e^{i(2j+1)\phi}\left(\sqrt{1-\xi^2}\right)^{j+1}\frac{\partial^{2j+1}}{\partial\xi^{2j+1}}\left[\left(\sqrt{1-\xi^2}\right)^{j}Y_j^{-j}(\theta, \phi)\right] = 0. \tag{3.120}
$$

This means that

$$
\left(\sqrt{1-\xi^2}\right)^{j}Y_j^{-j}(\theta, \phi) = (\text{at most a } 2j\text{-degree polynomial in } \xi). \tag{3.121}
$$

Replacing $Y_j^{-j}(\theta, \phi)$ in (3.121) with that of (3.119), we get

$$
c\left(\sqrt{1-\xi^2}\right)^{j}\left(\sqrt{1-\xi^2}\right)^{j} = c\left(1-\xi^2\right)^{j} = (\text{at most a } 2j\text{-degree polynomial in } \xi). \tag{3.122}
$$

Here, if $j$ is a half-odd-integer, $c(1-\xi^2)^j$ of (3.122) cannot be a polynomial. If, on the other hand, $j$ is zero or a positive integer, $c(1-\xi^2)^j$ is certainly a polynomial and, moreover, a $2j$-degree polynomial with respect to $\xi$; so is $\left(\sqrt{1-\xi^2}\right)^{j}Y_j^{-j}(\theta, \phi)$. Following custom, we henceforth use $l$, which is zero or a positive integer, instead of $j$. That is,

$$
Y(\theta, \phi) \equiv Y_l^m(\theta, \phi) \quad (l: \text{zero or a positive integer}). \tag{3.123}
$$

At the same time, so far as the orbital angular momentum is concerned, from (3.71) and (3.86) we can identify $\zeta$ in (3.71) with $l(l+1)$. Namely, we have

$$
\zeta = l(l+1).
$$

Concomitantly, $m$ in (3.110) is determined as

$$
m = l,\ l-1,\ l-2,\ \dots,\ 1,\ 0,\ -1,\ \dots,\ -l+1,\ -l. \tag{3.124}
$$

Thus, as expected, $m$ is zero or a positive or negative integer. Considering (3.37) and (3.46), $\zeta$ is identical with $\lambda$ in (3.46). Finally, we rewrite (3.66) such that

$$
-\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta(\theta)}{d\theta}\right)+\frac{m^2\Theta(\theta)}{\sin^2\theta} = l(l+1)\Theta(\theta), \tag{3.125}
$$

where $l$ is zero or a positive integer and $m$ is given by (3.124). With $\xi = \cos\theta$ as in (3.107), defining the function

$$
P_l^m(\xi) \equiv \Theta(\theta), \tag{3.126}
$$

and considering (3.109) along with (3.54), we arrive at the following SOLDE:

$$
\frac{d}{d\xi}\left[\left(1-\xi^2\right)\frac{dP_l^m(\xi)}{d\xi}\right]+\left[l(l+1)-\frac{m^2}{1-\xi^2}\right]P_l^m(\xi) = 0. \tag{3.127}
$$

The SOLDE of (3.127) is well known as the associated Legendre differential equation. The solutions $P_l^m(\xi)$ are called associated Legendre functions. In the next section, we characterize this equation and these functions by an analytical method. Before going into details, however, we further seek characteristics of $P_l^m(\xi)$ by the operator approach.

Adopting the notation of (3.123) and putting $m = l$ in (3.112), we have

$$
M^{(+)}Y_l^l(\theta, \phi) = -e^{i\phi}\left(\sqrt{1-\xi^2}\right)^{l+1}\frac{\partial}{\partial\xi}\left[\left(\sqrt{1-\xi^2}\right)^{-l}Y_l^l(\theta, \phi)\right]. \tag{3.128}
$$

Corresponding to (3.81), we have $M^{(+)}Y_l^l(\theta, \phi) = 0$. This implies that

$$
\left(\sqrt{1-\xi^2}\right)^{-l}Y_l^l(\theta, \phi) = c \quad (c: \text{constant with respect to } \xi). \tag{3.129}
$$

From (3.107) and (3.64), we get

$$
Y_l^l(\theta, \phi) = \kappa_l\sin^l\theta\,e^{il\phi}, \tag{3.130}
$$

where $\kappa_l$ is another constant that depends on $l$ but is independent of $\theta$ and $\phi$. Let us seek $\kappa_l$ from the normalization condition. That is,

$$
\int_0^{2\pi}d\phi\int_0^{\pi}\sin\theta\,d\theta\left|Y_l^l(\theta, \phi)\right|^2 = 2\pi|\kappa_l|^2\int_0^{\pi}\sin^{2l+1}\theta\,d\theta = 1, \tag{3.131}
$$

where the integration is performed over the unit sphere. Note that an infinitesimal area element on the unit sphere is represented by $\sin\theta\,d\theta\,d\phi$. We evaluate the above integral, denoted as

$$
I \equiv \int_0^{\pi}\sin^{2l+1}\theta\,d\theta. \tag{3.132}
$$

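Using integration by parts,

$$
\begin{aligned}
I &= \int_0^{\pi}(-\cos\theta)'\sin^{2l}\theta\,d\theta \\
&= \left[(-\cos\theta)\sin^{2l}\theta\right]_0^{\pi}+\int_0^{\pi}(\cos\theta)\cdot 2l\sin^{2l-1}\theta\cos\theta\,d\theta \\
&= 2l\int_0^{\pi}\sin^{2l-1}\theta\,d\theta-2l\int_0^{\pi}\sin^{2l+1}\theta\,d\theta.
\end{aligned}
\tag{3.133}
$$

Thus, we get a recurrence relation with respect to $I$ of (3.132) such that

$$
I = \frac{2l}{2l+1}\int_0^{\pi}\sin^{2l-1}\theta\,d\theta. \tag{3.134}
$$

Repeating the above process, we get

$$
I = \frac{2l}{2l+1}\cdot\frac{2l-2}{2l-1}\cdots\frac{2}{3}\int_0^{\pi}\sin\theta\,d\theta = \frac{2^{2l+1}(l!)^2}{(2l+1)!}. \tag{3.135}
$$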
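The recurrence (3.134) and the closed form (3.135) can be compared exactly with rational arithmetic. In the following Python sketch, the function names `integral_recurrence`, `integral_closed_form`, and `pi_times_kappa_sq` are hypothetical; the last one merely rearranges the normalization condition (3.131) into $\pi|\kappa_l|^2 = 1/(2I)$ so that the result stays rational.

```python
from fractions import Fraction
from math import factorial


def integral_recurrence(l):
    """I = integral of sin^(2l+1) over [0, pi], built up by repeated use of (3.134)."""
    value = Fraction(2)  # integral of sin(theta) over [0, pi]
    for n in range(1, l + 1):
        value *= Fraction(2 * n, 2 * n + 1)
    return value


def integral_closed_form(l):
    """Closed form (3.135): 2^(2l+1) (l!)^2 / (2l+1)!."""
    return Fraction(2 ** (2 * l + 1) * factorial(l) ** 2, factorial(2 * l + 1))


def pi_times_kappa_sq(l):
    """pi * |kappa_l|^2 = 1 / (2 I), a rearrangement of (3.131)."""
    return 1 / (2 * integral_recurrence(l))
```

For $l = 1$ both routes give $I = 4/3$, which in turn yields $|\kappa_1|^2 = 3/(8\pi)$.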

Then,

$$
|\kappa_l| = \frac{1}{2^l l!}\sqrt{\frac{(2l+1)!}{4\pi}} \quad \text{or} \quad \kappa_l = \frac{e^{i\chi}}{2^l l!}\sqrt{\frac{(2l+1)!}{4\pi}} \quad (\chi: \text{real}), \tag{3.136}
$$

where $e^{i\chi}$ is an undetermined constant (phase factor) that is to be determined below. Thus we get

$$
Y_l^l(\theta, \phi) = \frac{e^{i\chi}}{2^l l!}\sqrt{\frac{(2l+1)!}{4\pi}}\,\sin^l\theta\,e^{il\phi}. \tag{3.137}
$$

Meanwhile, in the second equation of (3.101), replacing $J^{(-)}$, $j$, and $\mu$ with $M^{(-)}$, $l$, and $m$, respectively, and using $Y_l^m(\theta, \phi)$ instead of $|\zeta, \mu\rangle$, we get

$$
Y_l^{m-1}(\theta, \phi) = \frac{1}{\sqrt{(l-m+1)(l+m)}}M^{(-)}Y_l^m(\theta, \phi). \tag{3.138}
$$

Replacing $m$ with $l$ in (3.138), we have

$$
Y_l^{l-1}(\theta, \phi) = \frac{1}{\sqrt{2l}}M^{(-)}Y_l^l(\theta, \phi). \tag{3.139}
$$

Operating $M^{(-)}$ $(l-m)$ times in total on $Y_l^l(\theta, \phi)$ of (3.139), we have

$$
\begin{aligned}
Y_l^m(\theta, \phi) &= \frac{1}{\sqrt{2l(2l-1)\cdots(l+m+1)}\,\sqrt{1\cdot 2\cdots(l-m)}}\left[M^{(-)}\right]^{l-m}Y_l^l(\theta, \phi) \\
&= \sqrt{\frac{(l+m)!}{(2l)!\,(l-m)!}}\left[M^{(-)}\right]^{l-m}Y_l^l(\theta, \phi).
\end{aligned}
\tag{3.140}
$$

Meanwhile, replacing $m$ with $l$, $n$ with $l-m$, and $j$ with $l$ in (3.117), we have

$$
\left[M^{(-)}\right]^{l-m}Y_l^l(\theta, \phi) = e^{-i(l-m)\phi}\left(\sqrt{1-\xi^2}\right)^{-m}\frac{\partial^{l-m}}{\partial\xi^{l-m}}\left[\left(\sqrt{1-\xi^2}\right)^{l}Y_l^l(\theta, \phi)\right]. \tag{3.141}
$$

Further replacing $\left[M^{(-)}\right]^{l-m}Y_l^l(\theta, \phi)$ in (3.140) with that of (3.141), we get

$$
Y_l^m(\theta, \phi) = \sqrt{\frac{(l+m)!}{(2l)!\,(l-m)!}}\,e^{-i(l-m)\phi}\left(\sqrt{1-\xi^2}\right)^{-m}\frac{\partial^{l-m}}{\partial\xi^{l-m}}\left[\left(\sqrt{1-\xi^2}\right)^{l}Y_l^l(\theta, \phi)\right]. \tag{3.142}
$$

Finally, replacing $Y_l^l(\theta, \phi)$ in (3.142) with that of (3.137) and converting $\theta$ to $\xi$, we arrive at the following equation:

$$
Y_l^m(\theta, \phi) = \frac{e^{i\chi}}{2^l l!}\sqrt{\frac{(2l+1)(l+m)!}{4\pi(l-m)!}}\,e^{im\phi}\left(1-\xi^2\right)^{-m/2}\frac{\partial^{l-m}}{\partial\xi^{l-m}}\left[\left(1-\xi^2\right)^{l}\right]. \tag{3.143}
$$

Now, let us decide $e^{i\chi}$. Putting $m = 0$ in (3.143), we have

$$
Y_l^0(\theta, \phi) = \frac{e^{i\chi}(-1)^l}{2^l l!\,(-1)^l}\sqrt{\frac{2l+1}{4\pi}}\,\frac{\partial^l}{\partial\xi^l}\left[\left(1-\xi^2\right)^{l}\right], \tag{3.144}
$$

where we put $(-1)^l$ in both the numerator and the denominator. In the RHS of (3.144),

$$
\frac{(-1)^l}{2^l l!}\frac{\partial^l}{\partial\xi^l}\left[\left(1-\xi^2\right)^{l}\right] \equiv P_l(\xi). \tag{3.145}
$$

Equation (3.145) is well known as the Rodrigues formula of Legendre polynomials. We mention characteristics of Legendre polynomials in the next section. Thus,

$$
Y_l^0(\theta, \phi) = \frac{e^{i\chi}}{(-1)^l}\sqrt{\frac{2l+1}{4\pi}}\,P_l(\xi). \tag{3.146}
$$
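The Rodrigues formula (3.145) can be evaluated directly on polynomial coefficient lists. The Python sketch below is one possible way to do so with exact rational coefficients; the helper names `poly_mul`, `poly_diff`, `legendre`, and `poly_eval` are hypothetical, and coefficient lists are assumed to be ordered from the lowest degree upward.

```python
from fractions import Fraction
from math import factorial


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for k, b in enumerate(q):
            out[i + k] += a * b
    return out


def poly_diff(p):
    """Differentiate a coefficient list once; a constant differentiates to [0]."""
    return [i * c for i, c in enumerate(p)][1:] or [Fraction(0)]


def legendre(l):
    """Coefficients of P_l(xi) from (3.145): (-1)^l / (2^l l!) d^l/dxi^l (1 - xi^2)^l."""
    poly = [Fraction(1)]
    for _ in range(l):
        poly = poly_mul(poly, [Fraction(1), Fraction(0), Fraction(-1)])  # times (1 - xi^2)
    for _ in range(l):
        poly = poly_diff(poly)
    scale = Fraction((-1) ** l, 2 ** l * factorial(l))
    return [scale * c for c in poly]


def poly_eval(p, x):
    """Evaluate the polynomial with coefficient list p at x."""
    return sum(c * x ** i for i, c in enumerate(p))
```

With these helpers, `legendre(2)` returns the coefficients of $(3\xi^2-1)/2$, and evaluating any `legendre(l)` at $\xi = 1$ gives 1.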

Following custom [2], we require $Y_l^0(0, \phi)$ to be positive. Noting that $\theta = 0$ corresponds to $\xi = 1$, we have

$$
Y_l^0(0, \phi) = \frac{e^{i\chi}}{(-1)^l}\sqrt{\frac{2l+1}{4\pi}}\,P_l(1) = \frac{e^{i\chi}}{(-1)^l}\sqrt{\frac{2l+1}{4\pi}}, \tag{3.147}
$$

where we used $P_l(1) = 1$. For this important relation, see Sect. 3.6.1. Also noting that $\left|\dfrac{e^{i\chi}}{(-1)^l}\right| = 1$, we must have

$$
\frac{e^{i\chi}}{(-1)^l} = 1 \quad \text{or} \quad e^{i\chi} = (-1)^l \tag{3.148}
$$

so that $Y_l^0(0, \phi)$ can be positive. Thus, (3.143) is rewritten as

$$
Y_l^m(\theta, \phi) = \frac{(-1)^l}{2^l l!}\sqrt{\frac{(2l+1)(l+m)!}{4\pi(l-m)!}}\,e^{im\phi}\left(1-\xi^2\right)^{-m/2}\frac{\partial^{l-m}}{\partial\xi^{l-m}}\left[\left(1-\xi^2\right)^{l}\right]. \tag{3.149}
$$
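Formula (3.149) can be evaluated numerically in a direct way. The following Python sketch expands $(1-\xi^2)^l$ by the binomial theorem, differentiates the coefficient list $(l-m)$ times, and assembles the prefactors. The function name `sph_harm` and the private helpers are hypothetical, and floating-point arithmetic is assumed; for $m > 0$ the evaluation at $\theta = 0$ or $\theta = \pi$ is excluded because of the factor $(1-\xi^2)^{-m/2}$.

```python
import cmath
import math


def _diff(p):
    """Differentiate a polynomial coefficient list (lowest degree first)."""
    return [i * c for i, c in enumerate(p)][1:] or [0]


def _one_minus_xi2_power(l):
    """Coefficients of (1 - xi^2)^l from the binomial theorem."""
    coeffs = [0] * (2 * l + 1)
    for k in range(l + 1):
        coeffs[2 * k] = (-1) ** k * math.comb(l, k)
    return coeffs


def sph_harm(l, m, theta, phi):
    """Evaluate Y_l^m(theta, phi) from (3.149); needs 0 < theta < pi when m > 0."""
    if not -l <= m <= l:
        raise ValueError("m must satisfy -l <= m <= l")
    xi = math.cos(theta)
    poly = _one_minus_xi2_power(l)
    for _ in range(l - m):                      # (l - m)-th derivative with respect to xi
        poly = _diff(poly)
    deriv = sum(c * xi ** i for i, c in enumerate(poly))
    norm = math.sqrt((2 * l + 1) * math.factorial(l + m)
                     / (4 * math.pi * math.factorial(l - m)))
    prefactor = (-1) ** l / (2 ** l * math.factorial(l))
    # (1 - xi^2)^(-m/2) equals sin(theta)^(-m) on 0 <= theta <= pi
    return prefactor * norm * cmath.exp(1j * m * phi) * math.sin(theta) ** (-m) * deriv
```

As a usage example, `sph_harm(1, 0, theta, phi)` reproduces $\sqrt{3/4\pi}\cos\theta$, and `sph_harm(1, 1, theta, phi)` reproduces $-\sqrt{3/8\pi}\sin\theta\,e^{i\phi}$.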

In Sect. 3.3, we mentioned that $|\psi_0\rangle$ in (3.43) is a constant. In fact, putting $l = m = 0$ in (3.149), we have

$$
Y_0^0(\theta, \phi) = \sqrt{1/4\pi}. \tag{3.150}
$$

Thus, as a simultaneous eigenstate of all of $L_x$, $L_y$, $L_z$, and $L^2$ corresponding to $l = 0$ and $m = 0$, we have

$$
|\psi_0\rangle \equiv Y_0^0(\theta, \phi).
$$

The normalized functions $Y_l^m(\theta, \phi)$ described by (3.149) define simultaneous eigenfunctions of $L^2$ (or $M^2$) and $L_z$ (or $M_z$). These functions are called spherical surface harmonics and frequently appear in various fields of mathematical physics.

As in the case of Sect. 2.3, a matrix representation enables us to intuitively grasp the relationship between angular momentum operators and their eigenfunctions (or eigenvectors). Rewriting the relations of (3.101) to suit the present purpose, we have

$$
\begin{aligned}
M^{(-)}|l, m\rangle &= \sqrt{(l-m+1)(l+m)}\,|l, m-1\rangle, \\
M^{(+)}|l, m\rangle &= \sqrt{(l-m)(l+m+1)}\,|l, m+1\rangle,
\end{aligned}
\tag{3.151}
$$

where we used $l$ instead of $\zeta$ to designate the eigenstate.

Now, we know that $m$ takes $(2l+1)$ different values for each $l$. This implies that the operators can be expressed with $(2l+1, 2l+1)$ matrices. As implied in (3.151), $M^{(-)}$ takes the following form:

$$
M^{(-)} = \begin{pmatrix}
0 & \sqrt{2l\cdot 1} & & & & & \\
 & 0 & \sqrt{(2l-1)\cdot 2} & & & & \\
 & & 0 & \ddots & & & \\
 & & & \ddots & \sqrt{(2l-k+1)\cdot k} & & \\
 & & & & 0 & \ddots & \\
 & & & & & \ddots & \sqrt{1\cdot 2l} \\
 & & & & & & 0
\end{pmatrix},
\tag{3.152}
$$

where the diagonal elements are zero and the $(k, k+1)$ element is $\sqrt{(2l-k+1)\cdot k}$. That is, nonzero elements are positioned just above the zero diagonal elements. Correspondingly, we have

$$
M^{(+)} = \begin{pmatrix}
0 & & & & & & \\
\sqrt{2l\cdot 1} & 0 & & & & & \\
 & \sqrt{(2l-1)\cdot 2} & 0 & & & & \\
 & & \ddots & \ddots & & & \\
 & & & \sqrt{(2l-k+1)\cdot k} & 0 & & \\
 & & & & \ddots & \ddots & \\
 & & & & & \sqrt{1\cdot 2l} & 0
\end{pmatrix},
\tag{3.153}
$$

where again the diagonal elements are zero and the $(k+1, k)$ element is $\sqrt{(2l-k+1)\cdot k}$. In this case, nonzero elements are positioned just below the zero diagonal elements. Notice also that $M^{(-)}$ and $M^{(+)}$ are adjoint to each other and that these notations correspond to (2.65) and (2.66).

The basis functions $Y_l^m(\theta, \phi)$ can be represented by column vectors, as in the case of Sect. 2.3. These are denoted as follows:

$$
|l, -l\rangle = \begin{pmatrix}1\\0\\0\\\vdots\\0\\0\end{pmatrix}, \quad
|l, -l+1\rangle = \begin{pmatrix}0\\1\\0\\\vdots\\0\\0\end{pmatrix}, \quad \dots, \quad
|l, l-1\rangle = \begin{pmatrix}0\\0\\0\\\vdots\\1\\0\end{pmatrix}, \quad
|l, l\rangle = \begin{pmatrix}0\\0\\0\\\vdots\\0\\1\end{pmatrix},
\tag{3.154}
$$

where the first number $l$ in $|l, -l\rangle$, $|l, -l+1\rangle$, etc. denotes the quantum number associated with $\lambda = l(l+1)$ of (3.124) and is kept constant; the latter number denotes $m$. Note from (3.154) that the column vector whose $k$-th row is 1 corresponds to $m$ such that

$$
m = -l+k-1. \tag{3.155}
$$

For instance, if $k = 1$, $m = -l$; if $k = 2l+1$, $m = l$, etc. The operator $M^{(-)}$ converts the column vector whose $(k+1)$-th row is 1 to that whose $k$-th row is 1. The former column vector corresponds to $|l, m+1\rangle$ and the latter to $|l, m\rangle$. Therefore, using (3.152), we get the following representation:

$$
M^{(-)}|l, m+1\rangle = \sqrt{(2l-k+1)\cdot k}\,|l, m\rangle = \sqrt{(l-m)(l+m+1)}\,|l, m\rangle, \tag{3.156}
$$

where the second equality is obtained by replacing $k$ with that of (3.155), i.e., $k = l+m+1$. Changing $m$ to $(m-1)$, we get the first equation of (3.151). Similarly, we obtain the second equation of (3.151) as well. That is, we have

$$
M^{(+)}|l, m\rangle = \sqrt{(2l-k+1)\cdot k}\,|l, m+1\rangle = \sqrt{(l-m)(l+m+1)}\,|l, m+1\rangle. \tag{3.157}
$$

From (3.32) we have

\[
M^2 = M^{(+)}M^{(-)} + M_z^2 - M_z .
\]

In the above, $M^{(+)}M^{(-)}$ and $M_z$ are diagonal matrices and, hence, $M_z^2$ and $M^2$ are diagonal matrices as well such that

\[
M_z = \mathrm{diag}\bigl(-l,\; -l+1,\; -l+2,\; \ldots,\; k-l,\; \ldots,\; l\bigr), \qquad
M^{(+)}M^{(-)} = \mathrm{diag}\bigl(0,\; 2l\cdot 1,\; (2l-1)\cdot 2,\; \ldots,\; (2l-k+1)\,k,\; \ldots,\; 1\cdot 2l\bigr),
\tag{3.158}
\]

where $k-l$ and $(2l-k+1)\,k$ represent the $(k+1, k+1)$ elements of $M_z$ and $M^{(+)}M^{(-)}$, respectively. Therefore, the $(k+1, k+1)$ element of $M^2$ is calculated as

\[
(2l-k+1)\,k + (k-l)^2 - (k-l) = l(l+1).
\]

As expected, $M^2$ takes a constant value $l(l+1)$. A matrix representation is shown in (3.159) such that

\[
M^2 = \mathrm{diag}\bigl(l(l+1),\; l(l+1),\; \ldots,\; l(l+1),\; \ldots,\; l(l+1),\; l(l+1)\bigr). \tag{3.159}
\]

These expressions are useful to understand how the vectors of (3.154) constitute simultaneous eigenstates of $M^2$ and $M_z$. In this situation, the matrix representation is said to diagonalize both $M^2$ and $M_z$. In other words, the quantum states represented by (3.154) are simultaneous eigenstates of $M^2$ and $M_z$.

The matrices (3.152) and (3.153) that represent $M^{(-)}$ and $M^{(+)}$, respectively, are called ladder operators, or lowering and raising operators, because operating on the column vectors they convert $|m\rangle$ to $|m \mp 1\rangle$ as mentioned above. The operators $M^{(-)}$ and $M^{(+)}$ correspond to $a$ and $a^\dagger$ given in (2.65) and (2.66), respectively. All these operators are characterized by the fact that the corresponding matrices have diagonal elements of zero and that the nonvanishing elements are positioned only "right above" or "right below" the diagonal elements. These matrices are a kind of triangular matrices whose diagonal elements are all zero, which is characteristic of nilpotent matrices. That is, if a suitable power of a matrix is zero as a matrix, such a matrix is said to be a nilpotent matrix (see Part III). In the present case, the $(2l+1)$-th power of $M^{(-)}$ and $M^{(+)}$ becomes zero as a matrix.

The operator $M^{(-)}$ can be described by the following shorthand representation:

\[
\left[M^{(-)}\right]_{kj} = \sqrt{(2l-k+1)\,k}\;\delta_{k+1,j} \quad (1 \le k \le 2l). \tag{3.160}
\]

If $l = 0$, $M_z = M^{(+)}M^{(-)} = M^2 = 0$. This case corresponds to $Y_0^0(\theta,\phi) = \sqrt{1/4\pi}$ and we do not need the matrix representation. Defining

\[
a_k \equiv \sqrt{(2l-k+1)\,k},
\]

we have for instance

\[
\left[\left(M^{(-)}\right)^2\right]_{kj} = \sum_p a_k\delta_{k+1,p}\,a_p\delta_{p+1,j} = a_k a_{k+1}\,\delta_{k+2,j}
= \sqrt{(2l-k+1)\,k}\;\sqrt{[2l-(k+1)+1](k+1)}\;\delta_{k+2,j}, \tag{3.161}
\]

where the summation is nonvanishing only if $p = k+1$. The factor $\delta_{k+2,j}$ implies that the elements are shifted by one toward upper right by being squared. Similarly we have

\[
\left[M^{(+)}\right]_{kj} = a_{k-1}\,\delta_{k,j+1} \quad (1 \le k \le 2l+1). \tag{3.162}
\]

In (3.158), $M^{(+)}M^{(-)}$ is represented as follows:

\[
\left[M^{(+)}M^{(-)}\right]_{kj} = \sum_p a_{k-1}\delta_{k,p+1}\,a_p\delta_{p+1,j} = a_{k-1}a_{j-1}\,\delta_{k,j} = (a_{k-1})^2\,\delta_{k,j}
= [2l-(k-1)+1](k-1)\,\delta_{k,j} = (2l-k+2)(k-1)\,\delta_{k,j}. \tag{3.163}
\]

Notice that although $a_0$ is not defined, $\delta_{1,j+1} = 0$ for any $j$, and so this causes no inconvenience. Hence, $[M^{(+)}M^{(-)}]_{kj}$ of (3.163) is well defined with $1 \le k \le 2l+1$. Important properties of angular momentum operators examined above are based upon the fact that those operators are ladder operators and represented by nilpotent matrices. These characteristics will further be studied in Part III.
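The matrix statements above can be checked numerically. The following is a minimal sketch in plain Python, assuming the basis ordering $m = -l, \ldots, l$ of (3.154); the function names (`ladder_matrices`, `m_squared`, and so on) are illustrative and not taken from the text. It builds $M^{(-)}$ from (3.160), takes $M^{(+)}$ as its transpose, and forms $M^2 = M^{(+)}M^{(-)} + M_z^2 - M_z$.

```python
import math


def ladder_matrices(l):
    """Return (M_minus, M_plus, M_z) as nested lists; basis ordered m = -l, ..., l."""
    dim = 2 * l + 1
    m_minus = [[0.0] * dim for _ in range(dim)]
    m_z = [[0.0] * dim for _ in range(dim)]
    for k in range(1, dim + 1):                  # 1-based row index k, as in the text
        m_z[k - 1][k - 1] = float(-l + k - 1)    # m = -l + k - 1, cf. (3.155)
        if k <= 2 * l:
            # [M^(-)]_{k,k+1} = sqrt((2l - k + 1) k), cf. (3.160)
            m_minus[k - 1][k] = math.sqrt((2 * l - k + 1) * k)
    m_plus = [list(row) for row in zip(*m_minus)]  # transpose (entries are real)
    return m_minus, m_plus, m_z


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][p] * b[p][j] for p in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(a, n):
    """a multiplied by itself n times (n >= 1)."""
    result = a
    for _ in range(n - 1):
        result = matmul(result, a)
    return result


def m_squared(l):
    """M^2 = M^(+) M^(-) + M_z^2 - M_z, which should equal l(l+1) times the identity."""
    m_minus, m_plus, m_z = ladder_matrices(l)
    pm = matmul(m_plus, m_minus)
    zz = matmul(m_z, m_z)
    dim = 2 * l + 1
    return [[pm[i][j] + zz[i][j] - m_z[i][j] for j in range(dim)] for i in range(dim)]
```

For $l = 2$, `m_squared(2)` should come out as $6$ times the $5 \times 5$ identity up to rounding, and `matrix_power(ladder_matrices(2)[0], 5)` should be the zero matrix, in line with the nilpotency remark.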

3.6 Orbital Angular Momentum: Analytic Approach

In this section, our central task is to solve the associated Legendre differential equation expressed by (3.127) by an analytical method. Putting $m = 0$ in (3.127), we have

\[
\frac{d}{dx}\left[(1-x^2)\frac{dP_l^0(x)}{dx}\right] + l(l+1)P_l^0(x) = 0, \tag{3.164}
\]

where we use a variable $x$ instead of $\xi$. Equation (3.164) is called the Legendre differential equation, and its characteristics and solutions have been widely investigated. Hence, we put

\[
P_l^0(x) \equiv P_l(x), \tag{3.165}
\]

where the $P_l(x)$ are called Legendre polynomials. We first start with the Legendre differential equation and Legendre polynomials.

3.6.1 Spherical Surface Harmonics and Associated Legendre Differential Equation

Let us think of the following identity according to Byron and Fuller [4]:

\[
(1-x^2)\frac{d}{dx}(1-x^2)^l = -2lx\,(1-x^2)^l, \tag{3.166}
\]

where $l$ is a positive integer. We differentiate both sides of (3.166) $(l+1)$ times. Here we use the Leibniz rule about differentiation of a product function that is described by

\[
d^n(uv) = \sum_{m=0}^{n}\frac{n!}{m!\,(n-m)!}\,d^m u\;d^{n-m}v, \tag{3.167}
\]

where $d^m u/dx^m \equiv d^m u$. The above shorthand notation is due to Byron and Fuller [4]. We use this notation for simplicity from place to place. Noting that the third order and higher differentiations of $(1-x^2)$ vanish in LHS of (3.166), we have

\[
\mathrm{LHS} = d^{l+1}\left[(1-x^2)\,d(1-x^2)^l\right]
= (1-x^2)\,d^{l+2}(1-x^2)^l - 2(l+1)x\,d^{l+1}(1-x^2)^l - l(l+1)\,d^l(1-x^2)^l .
\]

Also noting that the second order and higher differentiations of $2lx$ vanish in RHS of (3.166), we have

\[
\mathrm{RHS} = d^{l+1}\left[-2lx\,(1-x^2)^l\right]
= -2lx\,d^{l+1}(1-x^2)^l - 2l(l+1)\,d^l(1-x^2)^l .
\]

Therefore,

\[
\mathrm{LHS} - \mathrm{RHS} = (1-x^2)\,d^{l+2}(1-x^2)^l - 2x\,d^{l+1}(1-x^2)^l + l(l+1)\,d^l(1-x^2)^l = 0 .
\]

We define $P_l(x)$ as

\[
P_l(x) \equiv \frac{(-1)^l}{2^l\,l!}\,\frac{d^l}{dx^l}\left[(1-x^2)^l\right], \tag{3.168}
\]

where a constant $\frac{(-1)^l}{2^l\,l!}$ is multiplied according to the custom so that we can explicitly represent Rodrigues formula of Legendre polynomials. Thus, from (3.164) $P_l(x)$ defined above satisfies the Legendre differential equation. Rewriting it, we get

\[
(1-x^2)\frac{d^2P_l(x)}{dx^2} - 2x\frac{dP_l(x)}{dx} + l(l+1)P_l(x) = 0. \tag{3.169}
\]

Or equivalently, we have

\[
\frac{d}{dx}\left[(1-x^2)\frac{dP_l(x)}{dx}\right] + l(l+1)P_l(x) = 0. \tag{3.170}
\]
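As a cross-check of the argument above, the sketch below builds $P_l(x)$ from the Rodrigues formula (3.168) with exact rational arithmetic and evaluates the left-hand side of (3.169) as a polynomial. Polynomials are stored as coefficient lists indexed by the power of $x$; this representation and the function names are choices made here for illustration only.

```python
from fractions import Fraction
from math import comb, factorial


def poly_derivative(p):
    """Derivative of a polynomial given as a coefficient list [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]


def legendre_rodrigues(l):
    """Coefficients of P_l(x) from the Rodrigues formula (3.168)."""
    # (1 - x^2)^l = sum_j C(l, j) (-1)^j x^(2j)
    p = [Fraction(0)] * (2 * l + 1)
    for j in range(l + 1):
        p[2 * j] = Fraction(comb(l, j) * (-1) ** j)
    for _ in range(l):
        p = poly_derivative(p)
    scale = Fraction((-1) ** l, 2 ** l * factorial(l))
    return [scale * c for c in p]


def legendre_residual(l):
    """Coefficients of (1 - x^2) P'' - 2x P' + l(l+1) P for P = P_l, cf. (3.169)."""
    p = legendre_rodrigues(l)
    d1 = poly_derivative(p)
    d2 = poly_derivative(d1)
    res = [Fraction(0)] * (len(p) + 2)
    for k, c in enumerate(d2):
        res[k] += c          # contribution of  P''
        res[k + 2] -= c      # contribution of -x^2 P''
    for k, c in enumerate(d1):
        res[k + 1] -= 2 * c  # contribution of -2x P'
    for k, c in enumerate(p):
        res[k] += l * (l + 1) * c
    return res
```

If the derivation holds, every entry of `legendre_residual(l)` is exactly zero, and `legendre_rodrigues(2)` reproduces the familiar $P_2(x) = (3x^2 - 1)/2$.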

Returning to (3.127) and using $x$ as a variable, we rewrite (3.127) as

\[
\frac{d}{dx}\left[(1-x^2)\frac{dP_l^m(x)}{dx}\right] + \left[l(l+1) - \frac{m^2}{1-x^2}\right]P_l^m(x) = 0, \tag{3.171}
\]

where $l$ is a non-negative integer and $m$ is an integer that takes the following values:

\[
m = l,\; l-1,\; l-2,\; \ldots,\; 1,\; 0,\; -1,\; \ldots,\; -l+1,\; -l .
\]

Differential equations expressed as

\[
\frac{d}{dx}\left[p(x)\frac{dy(x)}{dx}\right] + c(x)\,y(x) = 0
\]

are of particular importance. We will come back to this point in Sect. 8.3.

Since $m$ can be either positive or negative, from (3.171) we notice that $P_l^m(x)$ and $P_l^{-m}(x)$ must satisfy the same differential equation (3.171). This implies that $P_l^m(x)$ and $P_l^{-m}(x)$ are connected, i.e., linearly dependent. First, let us assume that $m \ge 0$. The case of $m < 0$ will be examined shortly. According to Dennery and Krzywicki [5], we assume

\[
P_l^m(x) = \kappa\,(1-x^2)^{m/2}\,C(x), \tag{3.172}
\]

where $\kappa$ is a constant. Inserting (3.172) into (3.171) and rearranging the terms, we obtain

\[
(1-x^2)\frac{d^2C}{dx^2} - 2(m+1)x\frac{dC}{dx} + (l-m)(l+m+1)\,C = 0 \quad (0 \le m \le l). \tag{3.173}
\]

Recall once again that if $m = 0$, the associated Legendre differential equation given by (3.127) and (3.171) is exactly identical to the Legendre differential equation of (3.170). Differentiating (3.170) $m$ times, we get

\[
(1-x^2)\frac{d^2}{dx^2}\left(\frac{d^mP_l}{dx^m}\right) - 2(m+1)x\frac{d}{dx}\left(\frac{d^mP_l}{dx^m}\right) + (l-m)(l+m+1)\frac{d^mP_l}{dx^m} = 0, \tag{3.174}
\]

where we used the Leibniz rule about differentiation of (3.167). Comparing (3.173) and (3.174), we find that

\[
C(x) = \kappa'\,\frac{d^mP_l}{dx^m},
\]

where $\kappa'$ is a constant. Inserting this relation into (3.172) and setting $\kappa\kappa' = 1$, we get

\[
P_l^m(x) = (1-x^2)^{m/2}\,\frac{d^mP_l(x)}{dx^m} \quad (0 \le m \le l). \tag{3.175}
\]

Using Rodrigues formula of (3.168), we have

\[
P_l^m(x) \equiv \frac{(-1)^l}{2^l\,l!}\,(1-x^2)^{m/2}\,\frac{d^{l+m}}{dx^{l+m}}\left[(1-x^2)^l\right]. \tag{3.176}
\]

Equation (3.175) defines the associated Legendre functions. Note, however, that the functional form differs from literature to literature [2, 5, 6].

Among classical orthogonal polynomials, Gegenbauer polynomials $C_n^\lambda(x)$ often appear in the literature. The relevant differential equation is defined by

\[
(1-x^2)\frac{d^2}{dx^2}C_n^\lambda(x) - (2\lambda+1)x\frac{d}{dx}C_n^\lambda(x) + n(n+2\lambda)\,C_n^\lambda(x) = 0 \quad \left(\lambda > -\frac{1}{2}\right). \tag{3.177}
\]

Setting $n = l-m$ and $\lambda = m+\frac{1}{2}$ in (3.177) [5], we have

\[
(1-x^2)\frac{d^2}{dx^2}C_{l-m}^{m+\frac12}(x) - 2(m+1)x\frac{d}{dx}C_{l-m}^{m+\frac12}(x) + (l-m)(l+m+1)\,C_{l-m}^{m+\frac12}(x) = 0. \tag{3.178}
\]

Once again comparing (3.174) and (3.178), we obtain

\[
\frac{d^mP_l(x)}{dx^m} = \text{constant}\cdot C_{l-m}^{m+\frac12}(x) \quad (0 \le m \le l). \tag{3.179}
\]

Next, let us determine the constant appearing in (3.179). To this end, we consider the following generating function of the polynomials $C_n^\lambda(x)$ defined by [7, 8]

\[
(1-2tx+t^2)^{-\lambda} \equiv \sum_{n=0}^{\infty}C_n^\lambda(x)\,t^n \quad \left(\lambda > -\frac{1}{2}\right). \tag{3.180}
\]

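To calculate (3.180), let us think of the following expression for $x$ and $\lambda$:

\[
(1+x)^{-\lambda} = \sum_{m=0}^{\infty}\binom{-\lambda}{m}x^m, \tag{3.181}
\]

where $\lambda$ is an arbitrary real number and we define $\binom{-\lambda}{m}$ as

\[
\binom{-\lambda}{m} \equiv \frac{-\lambda(-\lambda-1)(-\lambda-2)\cdots(-\lambda-m+1)}{m!} \quad\text{and}\quad \binom{-\lambda}{0} \equiv 1. \tag{3.182}
\]

Notice that (3.181) with (3.182) is an extension of the binomial theorem (generalized binomial theorem). Putting $-\lambda = n$, we have

\[
\binom{n}{m} = \frac{n!}{(n-m)!\,m!} \quad\text{and}\quad \binom{n}{0} = 1 .
\]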

We rewrite (3.181) using gamma functions $\Gamma(z)$ such that

\[
(1+x)^{-\lambda} = \sum_{m=0}^{\infty}\frac{\Gamma(-\lambda+1)}{m!\,\Gamma(-\lambda-m+1)}\,x^m, \tag{3.183}
\]

where $\Gamma(z)$ is defined by integral representation as

\[
\Gamma(z) = \int_0^\infty e^{-t}\,t^{z-1}\,dt \quad (\operatorname{Re} z > 0). \tag{3.184}
\]

Changing variables such that $t = u^2$, we have

\[
\Gamma(z) = 2\int_0^\infty e^{-u^2}\,u^{2z-1}\,du \quad (\operatorname{Re} z > 0). \tag{3.185}
\]

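Note that the above expression is associated with the following fundamental feature of the gamma functions:

\[
\Gamma(z+1) = z\,\Gamma(z), \tag{3.186}
\]

where $z$ is any complex number at which $\Gamma(z)$ is defined (i.e., other than zero and the negative integers, where $\Gamma$ has poles). Replacing $x$ with $-t(2x-t)$ and rewriting (3.183), we have

\[
(1-2tx+t^2)^{-\lambda} = \sum_{m=0}^{\infty}\frac{\Gamma(-\lambda+1)}{m!\,\Gamma(-\lambda-m+1)}\,(-t)^m(2x-t)^m. \tag{3.187}
\]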

Assuming that $x$ is a real number belonging to an interval $[-1, 1]$, (3.187) holds with $t$ satisfying $|t| < 1$ [8]. The discussion is as follows: When $x$ satisfies the above condition, solving $1-2tx+t^2 = 0$ we get solutions $t_\pm$ such that

\[
t_\pm = x \pm i\sqrt{1-x^2} .
\]

Defining $r$ as

\[
r \equiv \min\{|t_+|,\,|t_-|\},
\]

$(1-2tx+t^2)^{-\lambda}$, regarded as a function of $t$, is analytic in the disk $|t| < r$. But, we have

\[
|t_\pm| = 1 .
\]

Thus, $(1-2tx+t^2)^{-\lambda}$ is analytic within the disk $|t| < 1$ and, hence, it can be expanded in a Taylor's series (see Chap. 6).

Continuing the calculation of (3.187), we have

\[
\begin{aligned}
(1-2tx+t^2)^{-\lambda}
&= \sum_{m=0}^{\infty}\frac{\Gamma(-\lambda+1)}{m!\,\Gamma(-\lambda-m+1)}\,(-1)^m t^m\left[\sum_{k=0}^{m}\frac{m!}{k!\,(m-k)!}\,2^{m-k}x^{m-k}(-1)^k t^k\right] \\
&= \sum_{m=0}^{\infty}\sum_{k=0}^{m}\frac{(-1)^{m+k}}{k!\,(m-k)!}\,\frac{\Gamma(-\lambda+1)}{\Gamma(-\lambda-m+1)}\,2^{m-k}x^{m-k}t^{m+k} \\
&= \sum_{m=0}^{\infty}\sum_{k=0}^{m}\frac{(-1)^{m+k}(-1)^m}{k!\,(m-k)!}\,\frac{\Gamma(\lambda+m)}{\Gamma(\lambda)}\,2^{m-k}x^{m-k}t^{m+k},
\end{aligned}
\tag{3.188}
\]

where the last equality results from rewriting the gamma functions using (3.186). Replacing $(m+k)$ with $n$, we get

\[
(1-2tx+t^2)^{-\lambda} = \sum_{n=0}^{\infty}\sum_{k=0}^{[n/2]}\frac{(-1)^k\,2^{n-2k}}{k!\,(n-2k)!}\,\frac{\Gamma(\lambda+n-k)}{\Gamma(\lambda)}\,x^{n-2k}\,t^n, \tag{3.189}
\]

where $[n/2]$ represents the largest integer that does not exceed $n/2$. This expression comes from a requirement that the order of $x$ must satisfy the following condition:

\[
n - 2k \ge 0 \quad\text{or}\quad k \le n/2 . \tag{3.190}
\]

That is, if $n$ is even, the maximum of $k$ is $n/2$. If $n$ is odd, the maximum of $k$ is $(n-1)/2$. Comparing (3.180) and (3.189), we get [8]

\[
C_n^\lambda(x) = \sum_{k=0}^{[n/2]}\frac{(-1)^k\,2^{n-2k}}{k!\,(n-2k)!}\,\frac{\Gamma(\lambda+n-k)}{\Gamma(\lambda)}\,x^{n-2k}. \tag{3.191}
\]
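A quick numerical sanity check of (3.191) against the generating function (3.180) can be sketched as follows. The code assumes $\lambda > 0$ so that $\Gamma(\lambda)$ is finite (the text allows $\lambda > -1/2$, but $\lambda = 0$ would need separate treatment), and the truncation at 40 terms is an arbitrary choice that is adequate for small $|t|$.

```python
from math import factorial, gamma


def gegenbauer(n, lam, x):
    """C_n^lambda(x) from the finite sum (3.191); assumes lam > 0."""
    total = 0.0
    for k in range(n // 2 + 1):
        coeff = (-1) ** k * 2 ** (n - 2 * k) / (factorial(k) * factorial(n - 2 * k))
        total += coeff * gamma(lam + n - k) / gamma(lam) * x ** (n - 2 * k)
    return total


def generating_series(lam, x, t, terms=40):
    """Truncated right-hand side of (3.180): sum_n C_n^lambda(x) t^n."""
    return sum(gegenbauer(n, lam, x) * t ** n for n in range(terms))


def generating_closed_form(lam, x, t):
    """Left-hand side of (3.180)."""
    return (1.0 - 2.0 * t * x + t * t) ** (-lam)
```

For example, with $\lambda = 1.5$, $x = 0.3$, and $t = 0.2$ the truncated series and the closed form should agree to within rounding error.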

Comparing (3.164) and (3.177) and putting $\lambda = 1/2$, we immediately find that the two differential equations are identical [7]. That is,

\[
C_n^{1/2}(x) = P_n(x). \tag{3.192}
\]

Hence, we further have

\[
P_n(x) = \sum_{k=0}^{[n/2]}\frac{(-1)^k\,2^{n-2k}}{k!\,(n-2k)!}\,\frac{\Gamma\!\left(\frac12+n-k\right)}{\Gamma\!\left(\frac12\right)}\,x^{n-2k}. \tag{3.193}
\]

Using (3.186) once again, we get [8]

\[
P_n(x) = \sum_{k=0}^{[n/2]}\frac{(-1)^k\,(2n-2k)!}{2^n\,k!\,(n-k)!\,(n-2k)!}\,x^{n-2k}. \tag{3.194}
\]
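The closed sum (3.194) is easy to evaluate exactly. The sketch below returns the coefficients of $P_n(x)$ as rational numbers, indexed by the power of $x$; the helper names are illustrative only.

```python
from fractions import Fraction
from math import factorial


def legendre_coefficients(n):
    """Coefficients of P_n(x), indexed by the power of x, from (3.194)."""
    coeffs = [Fraction(0)] * (n + 1)
    for k in range(n // 2 + 1):
        num = (-1) ** k * factorial(2 * n - 2 * k)
        den = 2 ** n * factorial(k) * factorial(n - k) * factorial(n - 2 * k)
        coeffs[n - 2 * k] = Fraction(num, den)
    return coeffs


def legendre_value(n, x):
    """Evaluate P_n at x (x may be a Fraction for exact results)."""
    return sum(c * x ** p for p, c in enumerate(legendre_coefficients(n)))
```

For instance, `legendre_coefficients(3)` gives the coefficients of $P_3(x) = (5x^3 - 3x)/2$, and `legendre_value(n, Fraction(1))` should return exactly 1 for every $n$, the standard normalization of Legendre polynomials.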

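It is convenient to make a formula about a gamma function. In (3.193), $n-k > 0$, and so let us think of $\Gamma\!\left(\frac12+m\right)$ ($m$: positive integer). Using (3.186), we have

\[
\begin{aligned}
\Gamma\!\left(\frac12+m\right)
&= \left(m-\frac12\right)\Gamma\!\left(m-\frac12\right)
= \left(m-\frac12\right)\left(m-\frac32\right)\Gamma\!\left(m-\frac32\right) = \cdots \\
&= \left(m-\frac12\right)\left(m-\frac32\right)\cdots\frac12\,\Gamma\!\left(\frac12\right)
= 2^{-m}(2m-1)(2m-3)\cdots 3\cdot 1\;\Gamma\!\left(\frac12\right) \\
&= 2^{-m}\,\frac{(2m-1)!}{2^{m-1}(m-1)!}\,\Gamma\!\left(\frac12\right)
= 2^{-2m}\,\frac{(2m)!}{m!}\,\Gamma\!\left(\frac12\right).
\end{aligned}
\tag{3.195}
\]

Notice that (3.195) still holds even if $m = 0$. Inserting $n-k$ into $m$ of (3.195), we get

\[
\Gamma\!\left(\frac12+n-k\right) = 2^{-2(n-k)}\,\frac{(2n-2k)!}{(n-k)!}\,\Gamma\!\left(\frac12\right). \tag{3.196}
\]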
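Formula (3.195) can be compared directly with a library gamma function. The short sketch below uses the standard value $\Gamma(1/2) = \sqrt{\pi}$ together with Python's `math.gamma`; the function name `gamma_half_integer` is made up for this illustration.

```python
from math import factorial, gamma, pi, sqrt


def gamma_half_integer(m):
    """Gamma(m + 1/2) for integer m >= 0 via (3.195), taking Gamma(1/2) = sqrt(pi)."""
    return factorial(2 * m) / (4 ** m * factorial(m)) * sqrt(pi)


def max_relative_error(m_max=12):
    """Largest relative deviation from math.gamma over m = 0, ..., m_max - 1."""
    return max(abs(gamma_half_integer(m) - gamma(m + 0.5)) / gamma(m + 0.5)
               for m in range(m_max))
```

The deviation returned by `max_relative_error()` should be at the level of floating-point rounding.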

Replacing $\Gamma\!\left(\frac12+n-k\right)$ of (3.193) with RHS of (3.196), (3.194) will follow. A gamma function $\Gamma\!\left(\frac12\right)$ often appears in mathematical physics. According to (3.185), we have

\[
\Gamma\!\left(\frac12\right) = 2\int_0^\infty e^{-u^2}\,du = \sqrt{\pi} .
\]

For the derivation of the above definite integral, see (2.86) of Sect. 2.4. From (3.184), we also have

\[
\Gamma(1) = 1 .
\]

In relation to the discussion of Sect. 3.5, let us derive an important formula about Legendre polynomials. From (3.180) and (3.192), we get

\[
(1-2tx+t^2)^{-1/2} \equiv \sum_{n=0}^{\infty}P_n(x)\,t^n. \tag{3.197}
\]

Assuming $|t| < 1$, when we put $x = 1$ in (3.197), we have

\[
(1-2tx+t^2)^{-\frac12} = \frac{1}{1-t} = \sum_{n=0}^{\infty}t^n = \sum_{n=0}^{\infty}P_n(1)\,t^n. \tag{3.198}
\]

Comparing individual coefficients of $t^n$ in (3.198), we get

\[
P_n(1) = 1 .
\]

See the related parts of Sect. 3.5.

Now, we are in the position to determine the constant in (3.179). Differentiating (3.194) $m$ times, we have

\[
\begin{aligned}
\frac{d^mP_l(x)}{dx^m}
&= \sum_{k=0}^{[(l-m)/2]}\frac{(-1)^k\,(2l-2k)!\,(l-2k)(l-2k-1)\cdots(l-2k-m+1)}{2^l\,k!\,(l-k)!\,(l-2k)!}\,x^{l-2k-m} \\
&= \sum_{k=0}^{[(l-m)/2]}\frac{(-1)^k\,(2l-2k)!}{2^l\,k!\,(l-k)!\,(l-2k-m)!}\,x^{l-2k-m}.
\end{aligned}
\tag{3.199}
\]

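Meanwhile, we have

\[
C_{l-m}^{m+\frac12}(x) = \sum_{k=0}^{[(l-m)/2]}\frac{(-1)^k\,2^{l-2k-m}}{k!\,(l-2k-m)!}\,\frac{\Gamma\!\left(l+\frac12-k\right)}{\Gamma\!\left(m+\frac12\right)}\,x^{l-2k-m}. \tag{3.200}
\]

Using (3.195) and (3.196), we have

\[
\frac{\Gamma\!\left(l+\frac12-k\right)}{\Gamma\!\left(m+\frac12\right)} = 2^{-2(l-k-m)}\,\frac{(2l-2k)!}{(l-k)!}\,\frac{m!}{(2m)!} .
\]

Therefore, we get

\[
C_{l-m}^{m+\frac12}(x) = \sum_{k=0}^{[(l-m)/2]}\frac{(-1)^k\,(2l-2k)!}{2^l\,k!\,(l-k)!\,(l-2k-m)!}\,\frac{2^m\,\Gamma(m+1)}{\Gamma(2m+1)}\,x^{l-2k-m}, \tag{3.201}
\]

where we used $m! = \Gamma(m+1)$ and $(2m)! = \Gamma(2m+1)$. Comparing (3.199) and (3.201), we get

\[
\frac{d^mP_l(x)}{dx^m} = \frac{\Gamma(2m+1)}{2^m\,\Gamma(m+1)}\,C_{l-m}^{m+\frac12}(x). \tag{3.202}
\]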
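Relation (3.202) lends itself to a direct numerical test: evaluate the left-hand side from (3.199) and the right-hand side from the Gegenbauer sum (3.191) with $n = l-m$ and $\lambda = m+\frac12$. The following is a sketch with illustrative function names, using floating-point arithmetic and Python's `math.gamma`.

```python
from math import factorial, gamma


def dm_legendre(l, m, x):
    """d^m P_l / dx^m from the termwise-differentiated sum (3.199)."""
    total = 0.0
    for k in range((l - m) // 2 + 1):
        coeff = ((-1) ** k * factorial(2 * l - 2 * k)
                 / (2 ** l * factorial(k) * factorial(l - k) * factorial(l - 2 * k - m)))
        total += coeff * x ** (l - 2 * k - m)
    return total


def gegenbauer(n, lam, x):
    """C_n^lambda(x) from the finite sum (3.191); assumes lam > 0."""
    total = 0.0
    for k in range(n // 2 + 1):
        coeff = (-1) ** k * 2 ** (n - 2 * k) / (factorial(k) * factorial(n - 2 * k))
        total += coeff * gamma(lam + n - k) / gamma(lam) * x ** (n - 2 * k)
    return total


def proportionality_constant(m):
    """The constant Gamma(2m+1) / (2^m Gamma(m+1)) appearing in (3.202)."""
    return gamma(2 * m + 1) / (2 ** m * gamma(m + 1))


def relation_3_202_gap(l, m, x):
    """Difference between the two sides of (3.202) at the point x."""
    return dm_legendre(l, m, x) - proportionality_constant(m) * gegenbauer(l - m, m + 0.5, x)
```

For any $0 \le m \le l$ and $|x| \le 1$, `relation_3_202_gap(l, m, x)` should vanish to within rounding error.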

Thus, we find that the constant appearing in (3.179) is $\frac{\Gamma(2m+1)}{2^m\,\Gamma(m+1)}$. Putting $m = 0$ in (3.202), we have $P_l(x) = C_l^{1/2}(x)$. Therefore, (3.192) is certainly recovered. This gives an easy checkup to (3.202).

Meanwhile, Rodrigues formula of Gegenbauer polynomials [5] is given by

\[
C_n^\lambda(x) = \frac{(-1)^n\,\Gamma(n+2\lambda)\,\Gamma\!\left(\lambda+\frac12\right)}{2^n\,n!\,\Gamma\!\left(n+\lambda+\frac12\right)\Gamma(2\lambda)}\,(1-x^2)^{-\lambda+\frac12}\,\frac{d^n}{dx^n}\left[(1-x^2)^{n+\lambda-\frac12}\right]. \tag{3.203}
\]

Hence, we have

\[
C_{l-m}^{m+\frac12}(x) = \frac{(-1)^{l-m}\,\Gamma(l+m+1)\,\Gamma(m+1)}{2^{l-m}\,(l-m)!\,\Gamma(l+1)\,\Gamma(2m+1)}\,(1-x^2)^{-m}\,\frac{d^{l-m}}{dx^{l-m}}\left[(1-x^2)^l\right]. \tag{3.204}
\]

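Inserting (3.204) into (3.202), we have

\[
\begin{aligned}
\frac{d^mP_l(x)}{dx^m}
&= \frac{(-1)^{l-m}\,\Gamma(l+m+1)}{2^l\,(l-m)!\,\Gamma(l+1)}\,(1-x^2)^{-m}\,\frac{d^{l-m}}{dx^{l-m}}\left[(1-x^2)^l\right] \\
&= \frac{(-1)^{l-m}\,(l+m)!}{2^l\,l!\,(l-m)!}\,(1-x^2)^{-m}\,\frac{d^{l-m}}{dx^{l-m}}\left[(1-x^2)^l\right].
\end{aligned}
\tag{3.205}
\]

Further inserting this into (3.175), we finally get

\[
P_l^m(x) = \frac{(-1)^{l-m}\,(l+m)!}{2^l\,l!\,(l-m)!}\,(1-x^2)^{-m/2}\,\frac{d^{l-m}}{dx^{l-m}}\left[(1-x^2)^l\right]. \tag{3.206}
\]

When $m = 0$, we have

\[
P_l^0(x) = \frac{(-1)^l}{2^l\,l!}\,\frac{d^l}{dx^l}\left[(1-x^2)^l\right] = P_l(x). \tag{3.207}
\]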

Thus, we recover the functional form of Legendre polynomials. The expression (3.206) is also meaningful for negative $m$, provided $|m| \le l$, and permits an extension of the definition of $P_l^m(x)$ given by (3.175) to negative numbers of $m$ [5]. Changing $m$ to $-m$ in (3.206), we have

\[
P_l^{-m}(x) = \frac{(-1)^{l+m}\,(l-m)!}{2^l\,l!\,(l+m)!}\,(1-x^2)^{m/2}\,\frac{d^{l+m}}{dx^{l+m}}\left[(1-x^2)^l\right]. \tag{3.208}
\]

Meanwhile, from (3.168) and (3.175),

\[
P_l^m(x) = \frac{(-1)^l}{2^l\,l!}\,(1-x^2)^{m/2}\,\frac{d^{l+m}}{dx^{l+m}}\left[(1-x^2)^l\right] \quad (0 \le m \le l). \tag{3.209}
\]

Comparing (3.208) and (3.209), we get

\[
P_l^{-m}(x) = \frac{(-1)^m\,(l-m)!}{(l+m)!}\,P_l^m(x) \quad (-l \le m \le l). \tag{3.210}
\]

Thus, as expected earlier, $P_l^m(x)$ and $P_l^{-m}(x)$ are linearly dependent.
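The two representations (3.206) and (3.209) involve derivatives of different order ($l-m$ versus $l+m$), so their agreement for $0 \le m \le l$ is a nontrivial check of the derivation. The sketch below evaluates both at a point, differentiating $(1-x^2)^l$ exactly via its integer coefficients; the function names are chosen here for illustration.

```python
from math import comb, factorial


def derivative_of_power(l, order, x):
    """Value at x of d^order/dx^order (1 - x^2)^l, using exact integer coefficients."""
    coeffs = [0] * (2 * l + 1)
    for j in range(l + 1):
        coeffs[2 * j] = comb(l, j) * (-1) ** j
    for _ in range(order):
        coeffs = [k * c for k, c in enumerate(coeffs)][1:]
    return sum(c * x ** k for k, c in enumerate(coeffs))


def assoc_legendre_3_209(l, m, x):
    """P_l^m(x) for 0 <= m <= l via (3.209)."""
    pref = (-1) ** l / (2 ** l * factorial(l))
    return pref * (1 - x * x) ** (m / 2) * derivative_of_power(l, l + m, x)


def assoc_legendre_3_206(l, m, x):
    """P_l^m(x) for -l <= m <= l via (3.206)."""
    pref = ((-1) ** (l - m) * factorial(l + m)
            / (2 ** l * factorial(l) * factorial(l - m)))
    return pref * (1 - x * x) ** (-m / 2) * derivative_of_power(l, l - m, x)
```

For $|x| < 1$ the two functions should agree for every $0 \le m \le l$, and `assoc_legendre_3_206(l, -m, x)` should equal $(-1)^m (l-m)!/(l+m)!$ times `assoc_legendre_3_209(l, m, x)`, as stated in (3.210).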

Now, we return to (3.149). From (3.206) we have

\[
(1-x^2)^{-\frac{m}{2}}\,\frac{d^{l-m}}{dx^{l-m}}\left[(1-x^2)^l\right] = \frac{(-1)^{l-m}\,2^l\,l!\,(l-m)!}{(l+m)!}\,P_l^m(x). \tag{3.211}
\]

Inserting (3.211) into (3.149) and changing the variable $x$ to $\xi$, we have

\[
Y_l^m(\theta,\phi) = (-1)^m\sqrt{\frac{(2l+1)(l-m)!}{4\pi\,(l+m)!}}\;P_l^m(\xi)\,e^{im\phi} \quad (\xi = \cos\theta;\ 0 \le \theta \le \pi). \tag{3.212}
\]

The coefficient $(-1)^m$ appearing in (3.212) is well known as the Condon–Shortley phase [7]. Another important expression obtained from (3.210) and (3.212) is

\[
Y_l^{-m}(\theta,\phi) = (-1)^m\left[Y_l^m(\theta,\phi)\right]^*. \tag{3.213}
\]

Since (3.208) or (3.209) involves higher order differentiations, it would be somewhat inconvenient to find their functional forms. Here we try to seek a convenient representation of spherical harmonics using familiar cosine and sine functions. Starting with (3.206) and applying the Leibniz rule there, we have

\[
\begin{aligned}
P_l^m(x)
&= \frac{(-1)^{l-m}\,(l+m)!}{2^l\,l!\,(l-m)!}\,(1+x)^{-m/2}(1-x)^{-m/2}\sum_{r=0}^{l-m}\frac{(l-m)!}{r!\,(l-m-r)!}\,\frac{d^r}{dx^r}\left[(1+x)^l\right]\frac{d^{l-m-r}}{dx^{l-m-r}}\left[(1-x)^l\right] \\
&= \frac{(-1)^{l-m}\,(l+m)!}{2^l\,l!\,(l-m)!}\sum_{r=0}^{l-m}\frac{(l-m)!}{r!\,(l-m-r)!}\,\frac{l!\,(1+x)^{l-r-\frac{m}{2}}}{(l-r)!}\,\frac{(-1)^{l-m-r}\,l!\,(1-x)^{m+r-\frac{m}{2}}}{(m+r)!} \\
&= \frac{l!\,(l+m)!}{2^l}\sum_{r=0}^{l-m}\frac{(-1)^r}{r!\,(l-m-r)!}\,\frac{(1+x)^{l-r-\frac{m}{2}}\,(1-x)^{r+\frac{m}{2}}}{(l-r)!\,(m+r)!}.
\end{aligned}
\tag{3.214}
\]

Putting $x = \cos\theta$ in (3.214) and using a trigonometric formula, we have

\[
P_l^m(x) = l!\,(l+m)!\sum_{r=0}^{l-m}\frac{(-1)^r}{r!\,(l-m-r)!}\,\frac{\cos^{2l-2r-m}\frac{\theta}{2}\;\sin^{2r+m}\frac{\theta}{2}}{(l-r)!\,(m+r)!}. \tag{3.215}
\]

Inserting this into (3.212), we get

\[
Y_l^m(\theta,\phi) = l!\sqrt{\frac{(2l+1)(l+m)!\,(l-m)!}{4\pi}}\;e^{im\phi}\sum_{r=0}^{l-m}\frac{(-1)^{r+m}\,\cos^{2l-2r-m}\frac{\theta}{2}\;\sin^{2r+m}\frac{\theta}{2}}{r!\,(l-m-r)!\,(l-r)!\,(m+r)!}. \tag{3.216}
\]

The summation domain of $r$ must be determined so that factorials of negative integers can be avoided [6]. That is,

(i) If $m \ge 0$, $0 \le r \le l-m$; $(l-m+1)$ terms,
(ii) If $m < 0$, $|m| \le r \le l$; $(l-|m|+1)$ terms.

For example, if we choose $l$ for $m$, putting $r = 0$ in (3.216) we have

\[
Y_l^l(\theta,\phi) = \sqrt{\frac{(2l+1)!}{4\pi}}\;e^{il\phi}\,\frac{(-1)^l\cos^l\frac{\theta}{2}\,\sin^l\frac{\theta}{2}}{l!}
= \sqrt{\frac{(2l+1)!}{4\pi}}\;e^{il\phi}\,\frac{(-1)^l\sin^l\theta}{2^l\,l!}. \tag{3.217}
\]

In particular, we have $Y_0^0(\theta,\phi) = \sqrt{\frac{1}{4\pi}}$ to recover (3.150). When $m = -l$, putting $r = l$ in (3.216) we get

\[
Y_l^{-l}(\theta,\phi) = \sqrt{\frac{(2l+1)!}{4\pi}}\;e^{-il\phi}\,\frac{\cos^l\frac{\theta}{2}\,\sin^l\frac{\theta}{2}}{l!}
= \sqrt{\frac{(2l+1)!}{4\pi}}\;e^{-il\phi}\,\frac{\sin^l\theta}{2^l\,l!}. \tag{3.218}
\]

For instance, choosing $l = 3$ and $m = \pm 3$ and using (3.217) or (3.218), we have

\[
Y_3^3(\theta,\phi) = -\sqrt{\frac{35}{64\pi}}\;e^{i3\phi}\sin^3\theta, \qquad
Y_3^{-3}(\theta,\phi) = \sqrt{\frac{35}{64\pi}}\;e^{-i3\phi}\sin^3\theta .
\]

The minus sign appearing in $Y_3^3(\theta,\phi)$ is due to the Condon–Shortley phase. For $l = 3$ and $m = 0$, moreover, we have

\[
\begin{aligned}
Y_3^0(\theta,\phi)
&= 3!\sqrt{\frac{7\cdot 3!\,3!}{4\pi}}\sum_{r=0}^{3}\frac{(-1)^r\cos^{6-2r}\frac{\theta}{2}\;\sin^{2r}\frac{\theta}{2}}{r!\,(3-r)!\,(3-r)!\,r!} \\
&= 18\sqrt{\frac{7}{\pi}}\left[\frac{\cos^6\frac{\theta}{2}}{0!\,3!\,3!\,0!} - \frac{\cos^4\frac{\theta}{2}\sin^2\frac{\theta}{2}}{1!\,2!\,2!\,1!} + \frac{\cos^2\frac{\theta}{2}\sin^4\frac{\theta}{2}}{2!\,1!\,1!\,2!} - \frac{\sin^6\frac{\theta}{2}}{3!\,0!\,0!\,3!}\right] \\
&= 18\sqrt{\frac{7}{\pi}}\left\{\frac{\cos^6\frac{\theta}{2} - \sin^6\frac{\theta}{2}}{36} + \frac{\cos^2\frac{\theta}{2}\sin^2\frac{\theta}{2}\left(\sin^2\frac{\theta}{2} - \cos^2\frac{\theta}{2}\right)}{4}\right\} \\
&= \sqrt{\frac{7}{\pi}}\left(\frac{5}{4}\cos^3\theta - \frac{3}{4}\cos\theta\right),
\end{aligned}
\]

where in the last equality we used formulae of elementary algebra and trigonometric functions. At the same time, we get

\[
Y_3^0(0,\phi) = \sqrt{\frac{7}{4\pi}} .
\]

This is consistent with (3.147) in that $Y_3^0(0,\phi)$ is positive.
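The finite sum (3.216), together with the summation ranges (i) and (ii), translates directly into code. The following sketch (the function name `spherical_harmonic` is chosen here, not taken from the text) evaluates $Y_l^m(\theta,\phi)$ with the Condon–Shortley phase included and can be used to reproduce the worked examples above.

```python
import cmath
from math import cos, factorial, pi, sin, sqrt


def spherical_harmonic(l, m, theta, phi):
    """Y_l^m(theta, phi) from the finite sum (3.216), Condon-Shortley phase included."""
    r_min = max(0, -m)        # case (ii): r starts at |m| when m < 0
    r_max = min(l - m, l)     # case (i): r stops at l - m when m >= 0
    c, s = cos(theta / 2), sin(theta / 2)
    total = 0.0
    for r in range(r_min, r_max + 1):
        denom = (factorial(r) * factorial(l - m - r)
                 * factorial(l - r) * factorial(m + r))
        total += (-1) ** (r + m) * c ** (2 * l - 2 * r - m) * s ** (2 * r + m) / denom
    norm = factorial(l) * sqrt((2 * l + 1) * factorial(l + m) * factorial(l - m) / (4 * pi))
    return norm * cmath.exp(1j * m * phi) * total
```

For example, `spherical_harmonic(3, 0, 0.0, 0.0)` should return $\sqrt{7/4\pi}$, and `spherical_harmonic(3, 3, theta, phi)` should match $-\sqrt{35/64\pi}\,e^{i3\phi}\sin^3\theta$.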

3.6.2 Orthogonality of Associated Legendre Functions

Orthogonality relation of functions is important. Here we deal with it, regarding the associated Legendre functions. Replacing $m$ with $(m-1)$ in (3.174) and using the notation introduced before, we have

\[
(1-x^2)\,d^{m+1}P_l - 2mx\,d^mP_l + (l+m)(l-m+1)\,d^{m-1}P_l = 0 .
\]

Multiplying both sides by $(1-x^2)^{m-1}$, we have

\[
(1-x^2)^m\,d^{m+1}P_l - 2mx\,(1-x^2)^{m-1}\,d^mP_l + (l+m)(l-m+1)\,(1-x^2)^{m-1}\,d^{m-1}P_l = 0. \tag{3.219}
\]

Rewriting the above equation, we get

\[
d\left[(1-x^2)^m\,d^mP_l\right] = -(l+m)(l-m+1)\,(1-x^2)^{m-1}\,d^{m-1}P_l. \tag{3.220}
\]

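Now, let us define $f(m)$ as follows:

\[
f(m) \equiv \int_{-1}^{1}(1-x^2)^m\,(d^mP_l)\,d^mP_{l'}\,dx \quad (0 \le m \le l,\ l'). \tag{3.221}
\]

Rewriting (3.221) as follows and integrating it by parts, we have

\[
\begin{aligned}
f(m) &= \int_{-1}^{1}\left[(1-x^2)^m\,d\!\left(d^{m-1}P_l\right)\right]d^mP_{l'}\,dx \\
&= \left[\left(d^{m-1}P_l\right)(1-x^2)^m\,d^mP_{l'}\right]_{-1}^{1} - \int_{-1}^{1}\left(d^{m-1}P_l\right)d\!\left[(1-x^2)^m\,d^mP_{l'}\right]dx \\
&= -\int_{-1}^{1}\left(d^{m-1}P_l\right)d\!\left[(1-x^2)^m\,d^mP_{l'}\right]dx \\
&= \int_{-1}^{1}\left(d^{m-1}P_l\right)(l'+m)(l'-m+1)\,(1-x^2)^{m-1}\,d^{m-1}P_{l'}\,dx \\
&= (l'+m)(l'-m+1)\,f(m-1),
\end{aligned}
\tag{3.222}
\]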

where with the second equality the first term vanished and with the second last equality we used (3.220). Equation (3.222) gives a recurrence formula regarding $f(m)$. Further performing the calculation, we get

\[
\begin{aligned}
f(m) &= (l'+m)(l'+m-1)\cdot(l'-m+2)(l'-m+1)\,f(m-2) = \cdots \\
&= (l'+m)(l'+m-1)\cdots(l'+1)\cdot l'\cdots(l'-m+2)(l'-m+1)\,f(0) \\
&= \frac{(l'+m)!}{(l'-m)!}\,f(0),
\end{aligned}
\tag{3.223}
\]

where we have

\[
f(0) = \int_{-1}^{1}P_l(x)\,P_{l'}(x)\,dx. \tag{3.224}
\]

Note that in (3.223) the coefficient of $f(0)$ comprises $2m$ factors. In (3.224), $P_l(x)$ and $P_{l'}(x)$ are Legendre polynomials defined in (3.165). Then, using (3.168) we have

\[
f(0) = \frac{(-1)^l}{2^l\,l!}\,\frac{(-1)^{l'}}{2^{l'}\,l'!}\int_{-1}^{1}\left[d^l(1-x^2)^l\right]\left[d^{l'}(1-x^2)^{l'}\right]dx. \tag{3.225}
\]

To evaluate (3.224), we have two cases; i.e., (i) $l \ne l'$ and (ii) $l = l'$. With the first case, assuming that $l > l'$ and taking partial integration, we have

\[
\begin{aligned}
I &= \int_{-1}^{1}\left[d^l(1-x^2)^l\right]\left[d^{l'}(1-x^2)^{l'}\right]dx \\
&= \left[\left\{d^{l-1}(1-x^2)^l\right\}\left\{d^{l'}(1-x^2)^{l'}\right\}\right]_{-1}^{1} - \int_{-1}^{1}\left[d^{l-1}(1-x^2)^l\right]\left[d^{l'+1}(1-x^2)^{l'}\right]dx.
\end{aligned}
\tag{3.226}
\]

In the above, we find that the first term vanishes because it contains $(1-x^2)$ as a factor. Integrating (3.226) another $l'$ times as before, we get

\[
I = (-1)^{l'+1}\int_{-1}^{1}\left[d^{l-l'-1}(1-x^2)^l\right]\left[d^{2l'+1}(1-x^2)^{l'}\right]dx. \tag{3.227}
\]

In (3.227) $0 \le l-l'-1 \le 2l$, and so $d^{l-l'-1}(1-x^2)^l$ does not vanish, but $(1-x^2)^{l'}$ is an at most $2l'$-degree polynomial and, hence, $d^{2l'+1}(1-x^2)^{l'}$ vanishes. Therefore,

\[
f(0) = 0. \tag{3.228}
\]

If $l < l'$, exchanging $P_l(x)$ and $P_{l'}(x)$ in (3.224), we get $f(0) = 0$ as well.

In the second case of $l = l'$, we evaluate the following integral:

\[
I = \int_{-1}^{1}\left[d^l(1-x^2)^l\right]^2dx. \tag{3.229}
\]

Similarly integrating (3.229) by parts $l$ times, we have

\[
I = (-1)^l\int_{-1}^{1}(1-x^2)^l\left[d^{2l}(1-x^2)^l\right]dx = (-1)^{2l}\,(2l)!\int_{-1}^{1}(1-x^2)^l\,dx. \tag{3.230}
\]

In (3.230), changing $x$ to $\cos\theta$, we have

\[
\int_{-1}^{1}(1-x^2)^l\,dx = \int_0^\pi\sin^{2l+1}\theta\,d\theta. \tag{3.231}
\]

We have already evaluated this integral in (3.132) to have $\frac{2^{2l+1}(l!)^2}{(2l+1)!}$. Therefore,

\[
f(0) = \frac{(-1)^{2l}\,(2l)!}{2^{2l}\,(l!)^2}\cdot\frac{2^{2l+1}\,(l!)^2}{(2l+1)!} = \frac{2}{2l+1}. \tag{3.232}
\]

Thus, we get

\[
f(m) = \frac{(l+m)!}{(l-m)!}\,f(0) = \frac{(l+m)!}{(l-m)!}\cdot\frac{2}{2l+1}. \tag{3.233}
\]

From (3.228) and (3.233), we have

\[
\int_{-1}^{1}P_l^m(x)\,P_{l'}^m(x)\,dx = \frac{(l+m)!}{(l-m)!}\cdot\frac{2}{2l+1}\,\delta_{ll'}. \tag{3.234}
\]
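Because $P_l^m(x)P_{l'}^m(x) = (1-x^2)^m\,(d^mP_l)(d^mP_{l'})$ is a polynomial for $0 \le m \le \min(l, l')$, the integral in (3.234) can be evaluated exactly. The sketch below does so with rational arithmetic, taking $P_l$ from the explicit sum (3.194); all helper names are illustrative.

```python
from fractions import Fraction
from math import factorial


def legendre_coefficients(n):
    """Coefficients of P_n(x), indexed by the power of x, from (3.194)."""
    coeffs = [Fraction(0)] * (n + 1)
    for k in range(n // 2 + 1):
        num = (-1) ** k * factorial(2 * n - 2 * k)
        den = 2 ** n * factorial(k) * factorial(n - k) * factorial(n - 2 * k)
        coeffs[n - 2 * k] = Fraction(num, den)
    return coeffs


def poly_derivative(p, order):
    """Differentiate a coefficient list `order` times."""
    for _ in range(order):
        p = [k * c for k, c in enumerate(p)][1:]
    return p


def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def overlap(l, lp, m):
    """Exact integral over [-1, 1] of P_l^m P_l'^m, assuming 0 <= m <= min(l, lp)."""
    weight = [Fraction(1)]
    for _ in range(m):
        weight = poly_mul(weight, [Fraction(1), Fraction(0), Fraction(-1)])  # times (1 - x^2)
    integrand = poly_mul(weight,
                         poly_mul(poly_derivative(legendre_coefficients(l), m),
                                  poly_derivative(legendre_coefficients(lp), m)))
    # The integral of x^k over [-1, 1] is 2/(k+1) for even k and 0 for odd k.
    return sum(c * Fraction(2, k + 1) for k, c in enumerate(integrand) if k % 2 == 0)
```

With this, `overlap(l, lp, m)` should vanish for $l \ne l'$ and equal $\frac{(l+m)!}{(l-m)!}\cdot\frac{2}{2l+1}$ for $l = l'$.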

Accordingly, putting

\[
\widetilde{P}_l^m(x) \equiv \sqrt{\frac{(2l+1)(l-m)!}{2\,(l+m)!}}\;P_l^m(x), \tag{3.235}
\]

we get

\[
\int_{-1}^{1}\widetilde{P}_l^m(x)\,\widetilde{P}_{l'}^m(x)\,dx = \delta_{ll'}. \tag{3.236}
\]

Normalized Legendre polynomials immediately follow. This is given by

\[
\widetilde{P}_l(x) \equiv \sqrt{\frac{2l+1}{2}}\;P_l(x) = \frac{(-1)^l}{2^l\,l!}\sqrt{\frac{2l+1}{2}}\;\frac{d^l}{dx^l}\left[(1-x^2)^l\right]. \tag{3.237}
\]

Combining a normalized function (3.235) with $\frac{1}{\sqrt{2\pi}}e^{im\phi}$, we recover

\[
Y_l^m(\theta,\phi) = \sqrt{\frac{(2l+1)(l-m)!}{4\pi\,(l+m)!}}\;P_l^m(x)\,e^{im\phi} \quad (x = \cos\theta;\ 0 \le \theta \le \pi). \tag{3.238}
\]

Notice that in (3.238), however, we could not determine the Condon–Shortley phase $(-1)^m$; see (3.212).

Since $P_l^m(x)$ and $P_l^{-m}(x)$ are linearly dependent as noted in (3.210), the set of the associated Legendre functions cannot define a complete set of orthonormal system. In fact, we have

\[
\int_{-1}^{1}P_l^m(x)\,P_l^{-m}(x)\,dx = \frac{(-1)^m\,(l-m)!}{(l+m)!}\cdot\frac{(l+m)!}{(l-m)!}\cdot\frac{2}{2l+1} = \frac{2\,(-1)^m}{2l+1}. \tag{3.239}
\]

This means that $P_l^m(x)$ and $P_l^{-m}(x)$ are not orthogonal. Thus, we need $e^{im\phi}$ to constitute the complete set of orthonormal system. In other words,

\[
\int_0^{2\pi}d\phi\int_{-1}^{1}d(\cos\theta)\left[Y_{l'}^{m'}(\theta,\phi)\right]^*Y_l^m(\theta,\phi) = \delta_{ll'}\,\delta_{mm'}. \tag{3.240}
\]

3.7 Radial Wave Functions of Hydrogen-Like Atoms

In Sect. 3.1 we have constructed the Hamiltonian of hydrogen-like atoms. If the physical system is characterized by a central force field, the method of separation of variables into the angular part $(\theta, \phi)$ and the radial $(r)$ part is successfully applied to the problem, and that method allows us to deal with the Schrödinger equation separately. The spherical surface harmonics play a central role in dealing with the differential equations related to the angular part. We studied important properties of the special functions such as Legendre polynomials and associated Legendre functions, independent of the nature of the specific central force fields such as the Coulomb potential and the Yukawa potential. With the Schrödinger equation pertinent to the radial part, on the other hand, its characteristics differ depending on the nature of individual force fields. Of these, the differential equation associated with the Coulomb potential gives exact (or analytical) solutions. It is well known that second-order differential equations are often solved by an operator representation method. Examples include its application to a quantum-mechanical harmonic oscillator and angular momenta of a particle placed in a central force field. Nonetheless, the corresponding approach to the radial equation for the electron has been less popular to date. The initial approach, however, was made by Sunakawa [3]. The purpose of this chapter rests upon further improvement of that approach.

3.7.1 Operator Approach to Radial Wave Functions

In Sect. 3.2, the separation of variables led to the radial part of the Schrödinger equation described as

\[
-\frac{\hbar^2}{2\mu}\,\frac{1}{r^2}\,\frac{\partial}{\partial r}\left(r^2\frac{\partial R(r)}{\partial r}\right) + \frac{\hbar^2\lambda}{2\mu r^2}\,R(r) - \frac{Ze^2}{4\pi\varepsilon_0 r}\,R(r) = E\,R(r). \tag{3.51}
\]

We identified $\lambda$ with $l(l+1)$ in (3.124). Thus, rewriting (3.51) and indexing $R(r)$ with $l$, we have

\[
-\frac{\hbar^2}{2\mu r^2}\,\frac{d}{dr}\left(r^2\frac{dR_l(r)}{dr}\right) + \left[\frac{\hbar^2\,l(l+1)}{2\mu r^2} - \frac{Ze^2}{4\pi\varepsilon_0 r}\right]R_l(r) = E\,R_l(r), \tag{3.241}
\]

where $R_l(r)$ is a radial wave function parametrized with $l$; $\mu$, $Z$, $\varepsilon_0$, and $E$ denote a reduced mass of hydrogen-like atom, atomic number, permittivity of vacuum, and eigenvalue of energy, respectively. Otherwise we follow conventions.

Now, we are in position to solve (3.241). As in the cases of Chap. 2 of a quantum-mechanical harmonic oscillator and the previous section of the angular momentum operator, we present the operator formalism in dealing with radial wave functions of hydrogen-like atoms. The essential point is that the radial wave functions can be derived by successively operating lowering operators on a radial wave function having a maximum allowed orbital angular momentum quantum number. The results agree with the conventional coordinate representation method based upon power series expansion that is related to associated Laguerre polynomials.

Sunakawa [3] introduced the following differential equation by suitable transformations of a variable, parameter, and function:

\[
-\frac{d^2\psi_l(\rho)}{d\rho^2} + \left[\frac{l(l+1)}{\rho^2} - \frac{2}{\rho}\right]\psi_l(\rho) = \mathcal{E}\,\psi_l(\rho), \tag{3.242}
\]

where $\rho = \frac{Zr}{a}$, $\mathcal{E} = \frac{2\mu}{\hbar^2}\left(\frac{a}{Z}\right)^2E$, and $\psi_l(\rho) = \rho R_l(r)$ with $a\ (\equiv 4\pi\varepsilon_0\hbar^2/\mu e^2)$ being the Bohr radius of a hydrogen-like atom. Note that $\rho$ and $\mathcal{E}$ are dimensionless quantities. The related calculations are as follows: We have

\[
\frac{dR_l}{dr} = \frac{d(\psi_l/\rho)}{d\rho}\,\frac{d\rho}{dr} = \left(\frac{d\psi_l}{d\rho}\,\frac{1}{\rho} - \frac{\psi_l}{\rho^2}\right)\frac{Z}{a} .
\]

Thus, we get

\[
r^2\frac{dR_l}{dr} = \frac{a}{Z}\left(\frac{d\psi_l}{d\rho}\,\rho - \psi_l\right), \qquad
\frac{d}{dr}\left(r^2\frac{dR_l}{dr}\right) = \frac{d^2\psi_l(\rho)}{d\rho^2}\,\rho .
\]

Using the above relations we arrive at (3.242). Here we define the following operators:

\[
b_l \equiv \frac{d}{d\rho} + \left(\frac{l}{\rho} - \frac{1}{l}\right). \tag{3.243}
\]

Hence,

\[
b_l^\dagger = -\frac{d}{d\rho} + \left(\frac{l}{\rho} - \frac{1}{l}\right), \tag{3.244}
\]

where the operator $b_l^\dagger$ is an adjoint operator of $b_l$. Notice that these definitions are different from those of Sunakawa [3]. The operator $\frac{d}{d\rho}\ (\equiv A)$ is formally an anti-Hermitian operator. We have mentioned such an operator in Sect. 1.5. The second terms of (3.243) and (3.244) are Hermitian operators, which we define as $H$. Thus, we foresee that $b_l$ and $b_l^\dagger$ can be denoted as follows:

\[
b_l = A + H \quad\text{and}\quad b_l^\dagger = -A + H .
\]

These representations are analogous to those appearing in the operator formalism of a quantum-mechanical harmonic oscillator. Special care, however, should be taken in dealing with the operators $b_l$ and $b_l^\dagger$. First, we should carefully examine whether $\frac{d}{d\rho}$ is in fact an anti-Hermitian operator. This is because for $\frac{d}{d\rho}$ to be anti-Hermitian, the solution $\psi_l(\rho)$ must satisfy boundary conditions in such a way that $\psi_l(\rho)$ vanishes or takes the same value at the endpoints $\rho \to 0$ and $\infty$. Second, the coordinate system we have chosen is not the Cartesian coordinate but the polar (spherical) coordinate, and so $\rho$ is defined only on a domain $\rho > 0$. We will come back to this point later.

Let us proceed with the calculations. We have

\[
\begin{aligned}
b_l b_l^\dagger
&= \left[\frac{d}{d\rho} + \left(\frac{l}{\rho} - \frac{1}{l}\right)\right]\left[-\frac{d}{d\rho} + \left(\frac{l}{\rho} - \frac{1}{l}\right)\right] \\
&= -\frac{d^2}{d\rho^2} + \frac{d}{d\rho}\left(\frac{l}{\rho} - \frac{1}{l}\right) - \left(\frac{l}{\rho} - \frac{1}{l}\right)\frac{d}{d\rho} + \left(\frac{l^2}{\rho^2} - \frac{2}{\rho} + \frac{1}{l^2}\right) \\
&= -\frac{d^2}{d\rho^2} - \frac{l}{\rho^2} + \frac{l^2}{\rho^2} - \frac{2}{\rho} + \frac{1}{l^2}
= -\frac{d^2}{d\rho^2} + \frac{l(l-1)}{\rho^2} - \frac{2}{\rho} + \frac{1}{l^2}.
\end{aligned}
\tag{3.245}
\]

Also, we have

\[
b_l^\dagger b_l = -\frac{d^2}{d\rho^2} + \frac{l(l+1)}{\rho^2} - \frac{2}{\rho} + \frac{1}{l^2}. \tag{3.246}
\]

We further define an operator $H^{(l)}$ as follows:

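\[
H^{(l)} \equiv -\frac{d^2}{d\rho^2} + \frac{l(l+1)}{\rho^2} - \frac{2}{\rho} .
\]

Then, from (3.243) and (3.244) as well as (3.245) and (3.246) we have

\[
H^{(l)} = b_{l+1}b_{l+1}^\dagger + \varepsilon^{(l)} \quad (l \ge 0), \tag{3.247}
\]

where $\varepsilon^{(l)} \equiv -\frac{1}{(l+1)^2}$. Alternatively,

\[
H^{(l)} = b_l^\dagger b_l + \varepsilon^{(l-1)} \quad (l \ge 1). \tag{3.248}
\]

If we put $l = n-1$ in (3.247) with $n$ being a fixed given integer larger than $l$, we obtain

\[
H^{(n-1)} = b_n b_n^\dagger + \varepsilon^{(n-1)}. \tag{3.249}
\]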

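We evaluate the following inner product of both sides of (3.249):

\[
\begin{aligned}
\left\langle\chi\middle|H^{(n-1)}\middle|\chi\right\rangle
&= \left\langle\chi\middle|b_n b_n^\dagger\middle|\chi\right\rangle + \varepsilon^{(n-1)}\langle\chi|\chi\rangle \\
&= \left\langle b_n^\dagger\chi\middle|b_n^\dagger\chi\right\rangle + \varepsilon^{(n-1)}\langle\chi|\chi\rangle \\
&= \left\|\,b_n^\dagger|\chi\rangle\right\|^2 + \varepsilon^{(n-1)}\langle\chi|\chi\rangle \ \ge\ \varepsilon^{(n-1)}.
\end{aligned}
\tag{3.250}
\]

Here we assume that $\chi$ is normalized. On the basis of the variational principle [9], the above expected value must take a minimum $\varepsilon^{(n-1)}$ so that $\chi$ can be an eigenfunction. To satisfy this condition, we have

\[
\left|b_n^\dagger\chi\right\rangle = 0. \tag{3.251}
\]

In fact, if (3.251) holds, from (3.249) we have

\[
H^{(n-1)}\chi = \varepsilon^{(n-1)}\chi. \tag{3.252}
\]

We define such a function as below:

\[
\psi_{n-1}^{(n)} \equiv \chi. \tag{3.253}
\]

From (3.247) and (3.248), we have the following relationship:

\[
H^{(l)}b_{l+1} = b_{l+1}H^{(l+1)} \quad (l \ge 0). \tag{3.254}
\]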

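Meanwhile we define the functions as shown below:

\[
\psi_{n-s}^{(n)} \equiv b_{n-s+1}\,b_{n-s+2}\cdots b_{n-1}\,\psi_{n-1}^{(n)} \quad (2 \le s \le n). \tag{3.255}
\]

With these functions, $(s-1)$ operators have been operated on $\psi_{n-1}^{(n)}$. Note that if $s$ took 1, no operation of $b_l$ would take place. Thus, we find that $b_l$ functions upon the $l$-state to produce the $(l-1)$-state. That is, $b_l$ acts as an annihilation operator. For the sake of convenience we express

\[
H^{(n,s)} \equiv H^{(n-s)}. \tag{3.256}
\]

Using this notation and (3.254), we have

\[
\begin{aligned}
H^{(n,s)}\psi_{n-s}^{(n)}
&= H^{(n,s)}\,b_{n-s+1}\,b_{n-s+2}\cdots b_{n-1}\,\psi_{n-1}^{(n)} \\
&= b_{n-s+1}\,H^{(n,s-1)}\,b_{n-s+2}\cdots b_{n-1}\,\psi_{n-1}^{(n)} \\
&= b_{n-s+1}\,b_{n-s+2}\,H^{(n,s-2)}\cdots b_{n-1}\,\psi_{n-1}^{(n)} \\
&= \cdots \\
&= b_{n-s+1}\,b_{n-s+2}\cdots H^{(n,2)}\,b_{n-1}\,\psi_{n-1}^{(n)} \\
&= b_{n-s+1}\,b_{n-s+2}\cdots b_{n-1}\,H^{(n,1)}\,\psi_{n-1}^{(n)} \\
&= b_{n-s+1}\,b_{n-s+2}\cdots b_{n-1}\,\varepsilon^{(n-1)}\,\psi_{n-1}^{(n)} \\
&= \varepsilon^{(n-1)}\,b_{n-s+1}\,b_{n-s+2}\cdots b_{n-1}\,\psi_{n-1}^{(n)} \\
&= \varepsilon^{(n-1)}\,\psi_{n-s}^{(n)}.
\end{aligned}
\tag{3.257}
\]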

Thus, total $n$ functions $\psi_{n-s}^{(n)}$ $(1 \le s \le n)$ belong to the same eigenvalue $\varepsilon^{(n-1)}$. Notice that the eigenenergy $E_n$ corresponding to $\varepsilon^{(n-1)}$ is given by

\[
E_n = -\frac{\hbar^2}{2\mu}\left(\frac{Z}{a}\right)^2\frac{1}{n^2}. \tag{3.258}
\]

If we define $l \equiv n-s$ and take account of (3.252), total $n$ functions $\psi_l^{(n)}$ $(l = 0, 1, 2, \cdots, n-1)$ belong to the same eigenvalue $\varepsilon^{(n-1)}$.

The quantum state $\psi_l^{(n)}$ is associated with the operators $H^{(l)}$. Thus, the solution of (3.242) has been given by functions $\psi_l^{(n)}$ parametrized with $n$ and $l$ on condition that (3.251) holds. As explicitly indicated in (3.255) and (3.257), $b_l$ lowers the parameter $l$ by one from $l$ to $l-1$, when it operates on $\psi_l^{(n)}$. The operator $b_0$ cannot be defined as indicated in (3.243), and so the lowest number of $l$ should be zero. Operators such as $b_l$ are known as ladder operators (a lowering operator or annihilation operator in the present case). The implication is that the successive operations of $b_l$ on $\psi_{n-1}^{(n)}$ produce various parameters $l$ as a subscript down to zero, while retaining the same integer parameter $n$ as a superscript.
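The ladder construction can be tested symbolically with very little machinery. The sketch below assumes the top state has the form $\rho^n e^{-\rho/n}$, which solves $b_n^\dagger\chi = 0$ of (3.251), and represents every function as $\sum_k c_k\rho^k\,e^{-\rho/n}$ through a dictionary `{k: c_k}` with exact rational coefficients. This representation and all function names are assumptions made for this illustration.

```python
from fractions import Fraction


def derivative(f, n):
    """d/d(rho) of sum_k c_k rho^k exp(-rho/n), returned in the same representation."""
    out = {}
    for k, c in f.items():
        out[k - 1] = out.get(k - 1, 0) + k * c   # from differentiating rho^k
        out[k] = out.get(k, 0) - c / n           # from differentiating exp(-rho/n)
    return out


def times_power(f, coeff, shift):
    """Multiply f by coeff * rho^shift."""
    return {k + shift: coeff * c for k, c in f.items()}


def add(*fs):
    """Sum of several functions, with vanishing coefficients dropped."""
    out = {}
    for f in fs:
        for k, c in f.items():
            out[k] = out.get(k, 0) + c
    return {k: c for k, c in out.items() if c != 0}


def apply_b(f, l, n):
    """Apply b_l = d/d(rho) + (l/rho - 1/l) of (3.243)."""
    return add(derivative(f, n), times_power(f, Fraction(l), -1),
               times_power(f, Fraction(-1, l), 0))


def apply_b_dagger(f, l, n):
    """Apply b_l^dagger = -d/d(rho) + (l/rho - 1/l) of (3.244)."""
    return add(times_power(derivative(f, n), Fraction(-1), 0),
               times_power(f, Fraction(l), -1), times_power(f, Fraction(-1, l), 0))


def hamiltonian(f, l, n):
    """Apply H^(l) = -d^2/d(rho)^2 + l(l+1)/rho^2 - 2/rho."""
    d2 = derivative(derivative(f, n), n)
    return add(times_power(d2, Fraction(-1), 0),
               times_power(f, Fraction(l * (l + 1)), -2),
               times_power(f, Fraction(-2), -1))


def radial_family(n):
    """Unnormalized psi_l^(n) for l = n-1, ..., 0, built as in (3.255)."""
    psi = {n: Fraction(1)}          # rho^n exp(-rho/n)
    family = {n - 1: psi}
    for l in range(n - 1, 0, -1):
        psi = apply_b(psi, l, n)    # b_l maps the l-state to the (l-1)-state
        family[l - 1] = psi
    return family
```

For each $n$, every member $\psi_l^{(n)}$ of `radial_family(n)` should satisfy $H^{(l)}\psi_l^{(n)} = -\psi_l^{(n)}/n^2$, i.e., `add(hamiltonian(psi, l, n), times_power(psi, Fraction(1, n * n), 0))` should be the empty dictionary.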

3.7.2 Normalization of Radial Wave Functions

Next we seek normalized eigenfunctions. Coordinate representation of (3.251) takes

\[
-\frac{d\psi_{n-1}^{(n)}}{d\rho} + \left(\frac{n}{\rho} - \frac{1}{n}\right)\psi_{n-1}^{(n)} = 0. \tag{3.259}
\]

The solution can be obtained as

\[
\psi_{n-1}^{(n)} = c_n\,\rho^n e^{-\rho/n}, \tag{3.260}
\]

where cn is a normalization constant. This can be determined as follows: Z

1

ðnÞ 2 ψ n1 dρ ¼ 1:

0

ð3:261Þ

Namely, Z j cn j

1

2

ρ2n e2ρ=n dρ ¼ 1:

ð3:262Þ

0

Consider the following definite integral:

\[
\int_0^\infty e^{-2\rho\xi}\, d\rho = \frac{1}{2\xi}.
\]

Differentiating the above integral $2n$ times with respect to $\xi$ gives

\[
\int_0^\infty \rho^{2n} e^{-2\rho\xi}\, d\rho = \left(\frac{1}{2}\right)^{2n+1} (2n)!\, \xi^{-(2n+1)}. \tag{3.263}
\]

Substituting $1/n$ for $\xi$, we obtain

\[
\int_0^\infty \rho^{2n} e^{-2\rho/n}\, d\rho = \left(\frac{1}{2}\right)^{2n+1} (2n)!\, n^{2n+1}. \tag{3.264}
\]

Hence,

\[
c_n = \left(\frac{2}{n}\right)^{n+\frac{1}{2}} \Big/ \sqrt{(2n)!}. \tag{3.265}
\]
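As a numerical cross-check of (3.260) to (3.265), the following minimal sketch integrates $|c_n \rho^n e^{-\rho/n}|^2$ with a composite Simpson rule. The helper names (`c_n`, `simpson`, `norm_integral`) and the finite cutoff replacing the upper limit $\infty$ are assumptions made for this illustration only; they do not appear in the text.

```python
import math

def c_n(n):
    """Normalization constant of (3.265): (2/n)^(n + 1/2) / sqrt((2n)!)."""
    return (2.0 / n) ** (n + 0.5) / math.sqrt(math.factorial(2 * n))

def simpson(f, a, b, steps=20000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def norm_integral(n):
    """Integral of |c_n rho^n exp(-rho/n)|^2 over [0, cutoff], cf. (3.261)."""
    cutoff = 40.0 * n  # assumed cutoff; the integrand is negligible beyond it
    c = c_n(n)
    return simpson(lambda rho: (c * rho ** n * math.exp(-rho / n)) ** 2, 0.0, cutoff)

for n in (1, 2, 3, 4):
    print(n, norm_integral(n))  # each value should be close to 1
```

If (3.265) is correct, every printed value should agree with 1 to within the quadrature error.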

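To further normalize the other wave functions, we calculate the following inner product:

\[
\left\langle \psi_l^{(n)} \middle| \psi_l^{(n)} \right\rangle = \left\langle \psi_{n-1}^{(n)} \right| b_{n-1}^\dagger \cdots b_{l+2}^\dagger b_{l+1}^\dagger b_{l+1} b_{l+2} \cdots b_{n-1} \left| \psi_{n-1}^{(n)} \right\rangle. \tag{3.266}
\]

From (3.247) and (3.248), we have

\[
b_l^\dagger b_l + \varepsilon^{(l-1)} = b_{l+1} b_{l+1}^\dagger + \varepsilon^{(l)} \quad (l \ge 1). \tag{3.267}
\]

Applying (3.267) to (3.266) repeatedly and considering (3.251), we reach the following relationship:

\[
\left\langle \psi_l^{(n)} \middle| \psi_l^{(n)} \right\rangle = \left[\varepsilon^{(n-1)} - \varepsilon^{(n-2)}\right]\left[\varepsilon^{(n-1)} - \varepsilon^{(n-3)}\right] \cdots \left[\varepsilon^{(n-1)} - \varepsilon^{(l)}\right] \left\langle \psi_{n-1}^{(n)} \middle| \psi_{n-1}^{(n)} \right\rangle. \tag{3.268}
\]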

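To show this, we use mathematical induction. We have already normalized $\psi_{n-1}^{(n)}$ in (3.261). Next, we calculate $\langle \psi_{n-2}^{(n)} | \psi_{n-2}^{(n)} \rangle$ such that

\[
\begin{aligned}
\left\langle \psi_{n-2}^{(n)} \middle| \psi_{n-2}^{(n)} \right\rangle
&= \left\langle \psi_{n-1}^{(n)} b_{n-1}^\dagger \middle| b_{n-1} \psi_{n-1}^{(n)} \right\rangle
 = \left\langle \psi_{n-1}^{(n)} \right| b_{n-1}^\dagger b_{n-1} \left| \psi_{n-1}^{(n)} \right\rangle \\
&= \left\langle \psi_{n-1}^{(n)} \right| \left[ b_n b_n^\dagger + \varepsilon^{(n-1)} - \varepsilon^{(n-2)} \right] \left| \psi_{n-1}^{(n)} \right\rangle \\
&= \left\langle \psi_{n-1}^{(n)} \right| b_n b_n^\dagger \left| \psi_{n-1}^{(n)} \right\rangle + \left[\varepsilon^{(n-1)} - \varepsilon^{(n-2)}\right] \left\langle \psi_{n-1}^{(n)} \middle| \psi_{n-1}^{(n)} \right\rangle \\
&= \left[\varepsilon^{(n-1)} - \varepsilon^{(n-2)}\right] \left\langle \psi_{n-1}^{(n)} \middle| \psi_{n-1}^{(n)} \right\rangle.
\end{aligned}
\tag{3.269}
\]

With the third equality, we used (3.267) with $l = n - 1$; with the last equality we used (3.251). Therefore, (3.268) holds with $l = n - 2$. Then, it suffices to show that, assuming that (3.268) holds with $\langle \psi_{l+1}^{(n)} | \psi_{l+1}^{(n)} \rangle$, it holds with $\langle \psi_l^{(n)} | \psi_l^{(n)} \rangle$ as well. Let us calculate $\langle \psi_l^{(n)} | \psi_l^{(n)} \rangle$, starting with (3.266) as below: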

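\[
\begin{aligned}
\left\langle \psi_l^{(n)} \middle| \psi_l^{(n)} \right\rangle
&= \left\langle \psi_{n-1}^{(n)} \right| b_{n-1}^\dagger \cdots b_{l+2}^\dagger b_{l+1}^\dagger b_{l+1} b_{l+2} \cdots b_{n-1} \left| \psi_{n-1}^{(n)} \right\rangle \\
&= \left\langle \psi_{n-1}^{(n)} \right| b_{n-1}^\dagger \cdots b_{l+2}^\dagger \left( b_{l+1}^\dagger b_{l+1} \right) b_{l+2} \cdots b_{n-1} \left| \psi_{n-1}^{(n)} \right\rangle \\
&= \left\langle \psi_{n-1}^{(n)} \right| b_{n-1}^\dagger \cdots b_{l+2}^\dagger \left[ b_{l+2} b_{l+2}^\dagger + \varepsilon^{(l+1)} - \varepsilon^{(l)} \right] b_{l+2} \cdots b_{n-1} \left| \psi_{n-1}^{(n)} \right\rangle \\
&= \left\langle \psi_{n-1}^{(n)} \right| b_{n-1}^\dagger \cdots b_{l+2}^\dagger b_{l+2} b_{l+2}^\dagger b_{l+2} \cdots b_{n-1} \left| \psi_{n-1}^{(n)} \right\rangle \\
&\quad + \left[\varepsilon^{(l+1)} - \varepsilon^{(l)}\right] \left\langle \psi_{n-1}^{(n)} \right| b_{n-1}^\dagger \cdots b_{l+2}^\dagger b_{l+2} \cdots b_{n-1} \left| \psi_{n-1}^{(n)} \right\rangle \\
&= \left\langle \psi_{n-1}^{(n)} \right| b_{n-1}^\dagger \cdots b_{l+2}^\dagger b_{l+2} b_{l+2}^\dagger b_{l+2} \cdots b_{n-1} \left| \psi_{n-1}^{(n)} \right\rangle
 + \left[\varepsilon^{(l+1)} - \varepsilon^{(l)}\right] \left\langle \psi_{l+1}^{(n)} \middle| \psi_{l+1}^{(n)} \right\rangle.
\end{aligned}
\tag{3.270}
\]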

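In the next step, using $b_{l+2}^\dagger b_{l+2} = b_{l+3} b_{l+3}^\dagger + \varepsilon^{(l+2)} - \varepsilon^{(l+1)}$, we have

\[
\begin{aligned}
\left\langle \psi_l^{(n)} \middle| \psi_l^{(n)} \right\rangle
&= \left\langle \psi_{n-1}^{(n)} \right| b_{n-1}^\dagger \cdots b_{l+2}^\dagger b_{l+2} b_{l+3} b_{l+3}^\dagger b_{l+3} \cdots b_{n-1} \left| \psi_{n-1}^{(n)} \right\rangle \\
&\quad + \left[\varepsilon^{(l+1)} - \varepsilon^{(l)}\right] \left\langle \psi_{l+1}^{(n)} \middle| \psi_{l+1}^{(n)} \right\rangle
 + \left[\varepsilon^{(l+2)} - \varepsilon^{(l+1)}\right] \left\langle \psi_{l+1}^{(n)} \middle| \psi_{l+1}^{(n)} \right\rangle.
\end{aligned}
\tag{3.271}
\]

Thus, we find that in the first term the index of $b_{l+3}^\dagger$ has been increased by one and the operator itself has been transferred toward the right side. On the other hand, we notice that between the second and third terms, $\varepsilon^{(l+1)} \langle \psi_{l+1}^{(n)} | \psi_{l+1}^{(n)} \rangle$ cancels out. Repeating the above processes, we reach the following expression:

\[
\begin{aligned}
\left\langle \psi_l^{(n)} \middle| \psi_l^{(n)} \right\rangle
&= \left\langle \psi_{n-1}^{(n)} \right| b_{n-1}^\dagger \cdots b_{l+2}^\dagger b_{l+2} \cdots b_{n-1} b_n b_n^\dagger \left| \psi_{n-1}^{(n)} \right\rangle \\
&\quad + \left[\varepsilon^{(n-1)} - \varepsilon^{(n-2)}\right] \left\langle \psi_{l+1}^{(n)} \middle| \psi_{l+1}^{(n)} \right\rangle
 + \left[\varepsilon^{(n-2)} - \varepsilon^{(n-3)}\right] \left\langle \psi_{l+1}^{(n)} \middle| \psi_{l+1}^{(n)} \right\rangle + \cdots \\
&\quad + \left[\varepsilon^{(l+2)} - \varepsilon^{(l+1)}\right] \left\langle \psi_{l+1}^{(n)} \middle| \psi_{l+1}^{(n)} \right\rangle
 + \left[\varepsilon^{(l+1)} - \varepsilon^{(l)}\right] \left\langle \psi_{l+1}^{(n)} \middle| \psi_{l+1}^{(n)} \right\rangle \\
&= \left[\varepsilon^{(n-1)} - \varepsilon^{(l)}\right] \left\langle \psi_{l+1}^{(n)} \middle| \psi_{l+1}^{(n)} \right\rangle.
\end{aligned}
\tag{3.272}
\]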

In (3.272), the first term of the RHS vanishes because of (3.251); the subsequent terms produce $\langle \psi_{l+1}^{(n)} | \psi_{l+1}^{(n)} \rangle$, whose coefficients cancel one another except for $[\varepsilon^{(n-1)} - \varepsilon^{(l)}]$. Meanwhile, from the assumption of the mathematical induction we have

\[
\left\langle \psi_{l+1}^{(n)} \middle| \psi_{l+1}^{(n)} \right\rangle = \left[\varepsilon^{(n-1)} - \varepsilon^{(n-2)}\right]\left[\varepsilon^{(n-1)} - \varepsilon^{(n-3)}\right] \cdots \left[\varepsilon^{(n-1)} - \varepsilon^{(l+1)}\right] \left\langle \psi_{n-1}^{(n)} \middle| \psi_{n-1}^{(n)} \right\rangle.
\]

Inserting this equation into (3.272), we arrive at (3.268). In other words, we have shown that if (3.268) holds with $l$ replaced by $l + 1$, then (3.268) holds with $l$ as well. This completes the proof that (3.268) is true of $\langle \psi_l^{(n)} | \psi_l^{(n)} \rangle$ with $l$ down to 0.

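The normalized wave functions $\tilde{\psi}_l^{(n)}$ are expressed from (3.255) as

\[
\tilde{\psi}_l^{(n)} = \kappa(n, l)^{-\frac{1}{2}}\, b_{l+1} b_{l+2} \cdots b_{n-1} \tilde{\psi}_{n-1}^{(n)}, \tag{3.273}
\]

where $\kappa(n, l)$ is defined such that

\[
\kappa(n, l) \equiv \left[\varepsilon^{(n-1)} - \varepsilon^{(n-2)}\right]\left[\varepsilon^{(n-1)} - \varepsilon^{(n-3)}\right] \cdots \left[\varepsilon^{(n-1)} - \varepsilon^{(l)}\right], \tag{3.274}
\]

with $l \le n - 2$. More explicitly, we get

\[
\kappa(n, l) = \frac{(2n-1)!\,(n-l-1)!\,(l!)^2}{(n+l)!\,(n!)^2 \left(n^{n-l-2}\right)^2}. \tag{3.275}
\]

In particular, from (3.265) we have

\[
\tilde{\psi}_{n-1}^{(n)} = \left(\frac{2}{n}\right)^{n+\frac{1}{2}} \frac{1}{\sqrt{(2n)!}}\, \rho^n e^{-\rho/n}. \tag{3.276}
\]

From (3.272), we define the following operator:

\[
\tilde{b}_l \equiv \left[\varepsilon^{(n-1)} - \varepsilon^{(l-1)}\right]^{-\frac{1}{2}} b_l. \tag{3.277}
\]

Then (3.273) becomes

\[
\tilde{\psi}_l^{(n)} = \tilde{b}_{l+1} \tilde{b}_{l+2} \cdots \tilde{b}_{n-1} \tilde{\psi}_{n-1}^{(n)}. \tag{3.278}
\]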
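The closed form (3.275) can be compared with the product (3.274) in exact rational arithmetic. The sketch below assumes $\varepsilon^{(k)} = -1/(k+1)^2$, a choice that is consistent with (3.287) but is not restated in this passage; only differences of $\varepsilon$ enter the check. The function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def eps(k):
    """Assumed eigenvalue epsilon^(k) = -1/(k+1)^2 (consistent with (3.287))."""
    return Fraction(-1, (k + 1) ** 2)

def kappa_product(n, l):
    """Product form (3.274): prod over k = l .. n-2 of [eps(n-1) - eps(k)]."""
    result = Fraction(1)
    for k in range(l, n - 1):
        result *= eps(n - 1) - eps(k)
    return result

def kappa_closed(n, l):
    """Closed form (3.275)."""
    num = factorial(2 * n - 1) * factorial(n - l - 1) * factorial(l) ** 2
    den = factorial(n + l) * factorial(n) ** 2 * Fraction(n) ** (2 * (n - l - 2))
    return Fraction(num) / den

for n in range(2, 6):
    for l in range(0, n - 1):
        print(n, l, kappa_product(n, l), kappa_closed(n, l))
```

For every pair $(n, l)$ with $l \le n - 2$ the two columns should coincide exactly, since `Fraction` avoids rounding.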

3.7.3 Associated Laguerre Polynomials

It will be of great importance to compare the functions $\tilde{\psi}_l^{(n)}$ with conventional wave functions that are expressed using associated Laguerre polynomials. For this purpose we define the following functions $\Phi_l^{(n)}(\rho)$ such that

\[
\Phi_l^{(n)}(\rho) \equiv \left(\frac{2}{n}\right)^{l+\frac{3}{2}} \sqrt{\frac{(n-l-1)!}{2n\,(n+l)!}}\; e^{-\rho/n} \rho^{l+1} L_{n-l-1}^{2l+1}\!\left(\frac{2\rho}{n}\right). \tag{3.279}
\]

The associated Laguerre polynomials are described as

\[
L_n^{\nu}(x) = \frac{1}{n!}\, x^{-\nu} e^{x} \frac{d^n}{dx^n}\left(x^{n+\nu} e^{-x}\right), \quad (\nu > -1). \tag{3.280}
\]

In the form of a power series expansion, the polynomials are expressed for integer $k \ge 0$ as

\[
L_n^{k}(x) = \sum_{m=0}^{n} \frac{(-1)^m (n+k)!}{(n-m)!\,(k+m)!\,m!}\, x^m. \tag{3.281}
\]

Notice that "Laguerre polynomials" $L_n(x)$ are defined as

\[
L_n(x) \equiv L_n^{0}(x).
\]

Hence, instead of (3.280) and (3.281), the Rodrigues formula and power series expansion of $L_n(x)$ are given by [2, 5]

\[
L_n(x) = \frac{1}{n!}\, e^{x} \frac{d^n}{dx^n}\left(x^{n} e^{-x}\right), \qquad
L_n(x) = \sum_{m=0}^{n} \frac{(-1)^m\, n!}{(n-m)!\,(m!)^2}\, x^m.
\]
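The series (3.281) translates directly into code. The following sketch builds the coefficients of $L_n^k(x)$ as exact fractions and evaluates the polynomial by Horner's scheme; the function names are assumptions for illustration. With this definition $L_n^k(0) = \binom{n+k}{n}$, which the loop prints for a few cases.

```python
from fractions import Fraction
from math import comb, factorial

def assoc_laguerre_coeffs(n, k):
    """Coefficients [a_0, ..., a_n] of L_n^k(x) taken from the series (3.281)."""
    return [Fraction((-1) ** m * factorial(n + k),
                     factorial(n - m) * factorial(k + m) * factorial(m))
            for m in range(n + 1)]

def evaluate(coeffs, x):
    """Evaluate sum_m coeffs[m] * x^m by Horner's scheme."""
    result = Fraction(0)
    for a in reversed(coeffs):
        result = result * x + a
    return result

for n, k in [(1, 1), (2, 0), (2, 1), (3, 2)]:
    coeffs = assoc_laguerre_coeffs(n, k)
    print(n, k, coeffs, evaluate(coeffs, 0), comb(n + k, n))
```

For example, `assoc_laguerre_coeffs(1, 1)` yields the coefficients of $2 - x$, and `assoc_laguerre_coeffs(2, 0)` those of $1 - 2x + x^2/2$.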

The function $\Phi_l^{(n)}(\rho)$ contains the multiplication factors $e^{-\rho/n}$ and $\rho^{l+1}$. The function $L_{n-l-1}^{2l+1}(2\rho/n)$ is a polynomial of $\rho$ with the highest order of $\rho^{n-l-1}$. Therefore, $\Phi_l^{(n)}(\rho)$ consists of a summation of terms containing $e^{-\rho/n} \rho^t$, where $t$ is an integer equal to 1 or larger. Consequently, $\Phi_l^{(n)}(\rho) \to 0$ when $\rho \to 0$ and $\rho \to \infty$ (vide supra). Thus, we have confirmed that $\Phi_l^{(n)}(\rho)$ certainly satisfies the proper BCs mentioned earlier and, hence, the operator $d/d\rho$ is indeed anti-Hermitian. To show this more explicitly, we define $D \equiv d/d\rho$. An inner product between arbitrarily chosen functions $f$ and $g$ is

\[
\begin{aligned}
\langle f | Dg \rangle &\equiv \int_0^\infty f^* Dg\, d\rho \\
&= \left[ f^* g \right]_0^\infty - \int_0^\infty (Df^*)\, g\, d\rho \\
&= \left[ f^* g \right]_0^\infty + \langle -Df | g \rangle,
\end{aligned}
\tag{3.282}
\]

where $f^*$ is the complex conjugate of $f$. Meanwhile, from (1.112) we have

\[
\langle f | Dg \rangle = \langle D^\dagger f | g \rangle. \tag{3.283}
\]

Therefore, if the functions $f$ and $g$ vanish at $\rho \to 0$ and $\rho \to \infty$, we get $D^\dagger = -D$ by equating (3.282) and (3.283). This means that $D$ is anti-Hermitian. The functions $\Phi_l^{(n)}(\rho)$ we are dealing with certainly satisfy the required boundary conditions. The operator $H^{(l)}$ appearing in (3.247) and (3.248) is Hermitian accordingly. This is because

\[
b_l^\dagger b_l = (-A + H)(A + H) = H^2 - A^2 - AH + HA, \tag{3.284}
\]

\[
\begin{aligned}
\left( b_l^\dagger b_l \right)^\dagger &= \left(H^2\right)^\dagger - \left(A^2\right)^\dagger - H^\dagger A^\dagger + A^\dagger H^\dagger
 = \left(H^\dagger\right)^2 - \left(A^\dagger\right)^2 - H^\dagger A^\dagger + A^\dagger H^\dagger \\
&= H^2 - (-A)^2 - H(-A) + (-A)H = H^2 - A^2 + HA - AH = b_l^\dagger b_l.
\end{aligned}
\tag{3.285}
\]

The Hermiticity is true of $b_l b_l^\dagger$ as well. Thus, the eigenvalue and eigenstate (or wave function) which belongs to that eigenvalue are physically meaningful. Next, consider the following operation:

\[
\tilde{b}_l \Phi_l^{(n)}(\rho) = \left[\varepsilon^{(n-1)} - \varepsilon^{(l-1)}\right]^{-\frac{1}{2}} \left(\frac{2}{n}\right)^{l+\frac{3}{2}} \sqrt{\frac{(n-l-1)!}{2n\,(n+l)!}} \left( \frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l} \right) \left\{ e^{-\rho/n} \rho^{l+1} L_{n-l-1}^{2l+1}\!\left(\frac{2\rho}{n}\right) \right\}, \tag{3.286}
\]

where

\[
\left[\varepsilon^{(n-1)} - \varepsilon^{(l-1)}\right]^{-\frac{1}{2}} = \frac{nl}{\sqrt{(n+l)(n-l)}}. \tag{3.287}
\]

Rewriting $L_{n-l-1}^{2l+1}(2\rho/n)$ in a power series expansion form using (3.280) and rearranging the result, we obtain

\[
\begin{aligned}
\tilde{b}_l \Phi_l^{(n)}(\rho)
&= \left(\frac{2}{n}\right)^{l+\frac{3}{2}} \frac{nl}{\sqrt{(n+l)(n-l)}} \sqrt{\frac{1}{2n\,(n+l)!\,(n-l-1)!}} \left( \frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l} \right) \left\{ e^{\rho/n} \rho^{-l} \frac{d^{\,n-l-1}}{d\rho^{\,n-l-1}} \left( \rho^{n+l} e^{-2\rho/n} \right) \right\} \\
&= \left(\frac{2}{n}\right)^{l+\frac{3}{2}} \frac{nl}{\sqrt{(n+l)(n-l)}} \sqrt{\frac{1}{2n\,(n+l)!\,(n-l-1)!}} \left( \frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l} \right) \\
&\quad \times \left\{ e^{-\rho/n} \sum_{m=0}^{n-l-1} \frac{(-1)^m (n+l)!}{(2l+m+1)!}\, \rho^{l+m+1} \left(\frac{2}{n}\right)^m \frac{(n-l-1)!}{m!\,(n-l-m-1)!} \right\},
\end{aligned}
\tag{3.288}
\]

where we used the well-known Leibniz rule of higher-order differentiation of a product function, i.e., $\frac{d^{\,n-l-1}}{d\rho^{\,n-l-1}} \left( \rho^{n+l} e^{-2\rho/n} \right)$. To perform further calculation, notice that $d/d\rho$ does not change the functional form of $e^{-\rho/n}$, whereas it lowers the order of $\rho^{l+m+1}$ by one. Meanwhile, operation of $l/\rho$ lowers the order of $\rho^{l+m+1}$ by one as well. The factor $\frac{n+l}{2l}$ in the following equation (3.289) results from these calculation processes. Considering these characteristics of the operator, we get

\[
\begin{aligned}
\tilde{b}_l \Phi_l^{(n)}(\rho)
&= \left(\frac{2}{n}\right)^{l+\frac{3}{2}} \frac{nl}{\sqrt{(n+l)(n-l)}} \sqrt{\frac{1}{2n\,(n+l)!\,(n-l-1)!}}\; \frac{n+l}{2l}\, e^{-\rho/n} \rho^{l} \\
&\quad \times \left\{ \sum_{m=0}^{n-l-1} \frac{(-1)^{m+1} (n+l)!}{(2l+m+1)!}\, \frac{(n-l-1)!}{m!\,(n-l-m-1)!} \left(\frac{2}{n}\right)^{m+1} \rho^{m+1} \right. \\
&\qquad \left. + 2l \sum_{m=0}^{n-l-1} \frac{(-1)^{m} (n+l-1)!}{(2l+m)!}\, \frac{(n-l-1)!}{m!\,(n-l-m-1)!} \left(\frac{2}{n}\right)^{m} \rho^{m} \right\}.
\end{aligned}
\tag{3.289}
\]

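In (3.289), calculation of the part $\{\ \}$ of the RHS is somewhat complicated, and so we describe the outline of the calculation procedure below.

\[
\begin{aligned}
\{\ \} \text{ of RHS of (3.289)}
&= \sum_{m=1}^{n-l} \frac{(-1)^m (n+l)!}{(2l+m)!}\, \frac{(n-l-1)!}{(m-1)!\,(n-l-m)!} \left(\frac{2}{n}\right)^m \rho^m
 + 2l \sum_{m=0}^{n-l-1} \frac{(-1)^m (n+l-1)!}{(2l+m)!}\, \frac{(n-l-1)!}{m!\,(n-l-m-1)!} \left(\frac{2}{n}\right)^m \rho^m \\
&= \sum_{m=1}^{n-l-1} (-1)^m \left(\frac{2\rho}{n}\right)^m \frac{(n-l-1)!\,(n+l-1)!}{(2l+m)!} \left[ \frac{n+l}{(m-1)!\,(n-l-m)!} + \frac{2l}{m!\,(n-l-m-1)!} \right] \\
&\quad + (-1)^{n-l} \left(\frac{2\rho}{n}\right)^{n-l} + \frac{(n+l-1)!}{(2l-1)!} \\
&= \sum_{m=1}^{n-l-1} (-1)^m \left(\frac{2\rho}{n}\right)^m \frac{(n-l-1)!\,(n+l-1)!}{(2l+m)!}\, \frac{(m-1)!\,(n-l-m-1)!\,(2l+m)(n-l)}{(m-1)!\,(n-l-m)!\,m!\,(n-l-m-1)!} \\
&\quad + (-1)^{n-l} \left(\frac{2\rho}{n}\right)^{n-l} + \frac{(n+l-1)!}{(2l-1)!} \\
&= \sum_{m=1}^{n-l-1} (-1)^m \left(\frac{2\rho}{n}\right)^m \frac{(n-l)!\,(n+l-1)!}{(2l+m-1)!\,(n-l-m)!\,m!}
 + (-1)^{n-l} \left(\frac{2\rho}{n}\right)^{n-l} + \frac{(n+l-1)!}{(2l-1)!} \\
&= (n-l)! \left[ \sum_{m=1}^{n-l-1} \frac{(-1)^m \left(\frac{2\rho}{n}\right)^m (n+l-1)!}{(2l+m-1)!\,(n-l-m)!\,m!}
 + (-1)^{n-l} \frac{1}{(n-l)!} \left(\frac{2\rho}{n}\right)^{n-l} + \frac{(n+l-1)!}{(n-l)!\,(2l-1)!} \right] \\
&= (n-l)! \sum_{m=0}^{n-l} \frac{(-1)^m \left(\frac{2\rho}{n}\right)^m (n+l-1)!}{(2l+m-1)!\,(n-l-m)!\,m!} \\
&= (n-l)!\, L_{n-l}^{2l-1}\!\left(\frac{2\rho}{n}\right).
\end{aligned}
\tag{3.290}
\]

Notice that with the second equality of (3.290), the summation is divided into three parts; i.e., $1 \le m \le n-l-1$, $m = n-l$ (the highest-order term), and $m = 0$ (the lowest-order term). In the bracket of the second equality, bringing the two fractions to the common denominator $m!\,(n-l-m)!$ gives the numerator $(n+l)m + 2l(n-l-m) = (2l+m)(n-l)$, which is the factor appearing in the third equality. Note that with the second last equality of (3.290), the highest-order $(n-l)$ term and the lowest-order term (i.e., a constant) have been absorbed into a single expression, namely, an associated Laguerre polynomial. Correspondingly, with the second last equality of (3.290) the summation range is extended to $0 \le m \le n-l$. Summarizing the above results, we get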

\[
\begin{aligned}
\tilde{b}_l \Phi_l^{(n)}(\rho)
&= \left(\frac{2}{n}\right)^{l+\frac{3}{2}} \frac{n+l}{2l}\, \frac{nl\,(n-l)!}{\sqrt{(n+l)(n-l)}} \sqrt{\frac{1}{2n\,(n+l)!\,(n-l-1)!}}\; e^{-\rho/n} \rho^{l} L_{n-l}^{2l-1}\!\left(\frac{2\rho}{n}\right) \\
&= \left(\frac{2}{n}\right)^{l+\frac{1}{2}} \sqrt{\frac{(n-l)!}{2n\,(n+l-1)!}}\; e^{-\rho/n} \rho^{l} L_{n-l}^{2l-1}\!\left(\frac{2\rho}{n}\right) \\
&= \left(\frac{2}{n}\right)^{(l-1)+\frac{3}{2}} \sqrt{\frac{[n-(l-1)-1]!}{2n\,[n+(l-1)]!}}\; e^{-\rho/n} \rho^{(l-1)+1} L_{n-(l-1)-1}^{2(l-1)+1}\!\left(\frac{2\rho}{n}\right) \equiv \Phi_{l-1}^{(n)}(\rho).
\end{aligned}
\tag{3.291}
\]

Thus, we find out that $\Phi_l^{(n)}(\rho)$ behaves exactly like $\tilde{\psi}_l^{(n)}$. Moreover, if we replace $l$ in (3.279) with $n - 1$, we find

\[
\Phi_{n-1}^{(n)}(\rho) = \tilde{\psi}_{n-1}^{(n)}. \tag{3.292}
\]

Operating $\tilde{b}_{n-1}$ on both sides of (3.292),

\[
\Phi_{n-2}^{(n)}(\rho) = \tilde{\psi}_{n-2}^{(n)}. \tag{3.293}
\]

Likewise successively operating $\tilde{b}_l$ $(1 \le l \le n-1)$,

\[
\Phi_l^{(n)}(\rho) = \tilde{\psi}_l^{(n)}(\rho), \tag{3.294}
\]

with all allowed numbers of $l$ (i.e., $0 \le l \le n-1$). This permits us to identify

\[
\Phi_l^{(n)}(\rho) \equiv \tilde{\psi}_l^{(n)}(\rho). \tag{3.295}
\]
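The lowering step (3.286) to (3.291) can be checked without floating-point error by treating $e^{-\rho/n}$ times a polynomial symbolically. The sketch below is only an illustration with hypothetical helper names: it applies $b_l = d/d\rho + l/\rho - 1/l$ to $e^{-\rho/n}\rho^{l+1}L_{n-l-1}^{2l+1}(2\rho/n)$ and compares the result with $\frac{(n+l)(n-l)}{2l}\, e^{-\rho/n}\rho^{l}L_{n-l}^{2l-1}(2\rho/n)$, which is what (3.289) and (3.290) imply once the normalization factors are stripped off.

```python
from fractions import Fraction
from math import factorial, sqrt

def laguerre_coeffs(n, k):
    """Coefficients a_m of L_n^k(x) = sum_m a_m x^m, taken from (3.281)."""
    return [Fraction((-1) ** m * factorial(n + k),
                     factorial(n - m) * factorial(k + m) * factorial(m))
            for m in range(n + 1)]

def scaled_poly(n, l, shift):
    """Coefficients (index = power of rho) of rho^shift * L_{n-l-1}^{2l+1}(2 rho / n)."""
    coeffs = laguerre_coeffs(n - l - 1, 2 * l + 1)
    return [Fraction(0)] * shift + [a * Fraction(2, n) ** m for m, a in enumerate(coeffs)]

def apply_b(n, l, poly):
    """Return Q such that b_l [exp(-rho/n) P] = exp(-rho/n) Q.

    Assumes the lowest power in P is at least 1, which holds for the
    polynomials built by scaled_poly with shift = l + 1.
    """
    out = [Fraction(0)] * len(poly)
    for p, a in enumerate(poly):
        if a == 0:
            continue
        out[p - 1] += a * (p + l)                        # d/drho on rho^p, plus l/rho
        out[p] -= a * (Fraction(1, n) + Fraction(1, l))  # d/drho on exp(-rho/n), plus -1/l
    return out

def lowered_target(n, l):
    """(n+l)(n-l)/(2l) * rho^l * L_{n-l}^{2l-1}(2 rho / n)."""
    factor = Fraction((n + l) * (n - l), 2 * l)
    return [factor * c for c in scaled_poly(n, l - 1, l)]

def norm_factor(n, l):
    """Numerical prefactor of Phi_l^(n) in (3.279)."""
    return (2.0 / n) ** (l + 1.5) * sqrt(factorial(n - l - 1) / (2.0 * n * factorial(n + l)))

for n in range(2, 5):
    for l in range(1, n):
        print(n, l, apply_b(n, l, scaled_poly(n, l, l + 1)) == lowered_target(n, l))
```

Every line should print `True`. Multiplying the polynomial identity by `norm_factor(n, l)` and by the factor $nl/\sqrt{(n+l)(n-l)}$ of (3.287) then reproduces the prefactor of $\Phi_{l-1}^{(n)}$.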

Consequently, it is clear that the parameter $n$ introduced in (3.249) is identical to a principal quantum number and that the parameter $l$ $(0 \le l \le n-1)$ is an orbital angular momentum quantum number (or azimuthal quantum number). The functions $\Phi_l^{(n)}(\rho)$ and $\tilde{\psi}_l^{(n)}(\rho)$ are identical up to the constant $c_n$ expressed in (3.265). Note, however, that a complex constant with an absolute value of 1 (phase factor) remains undetermined, as is always the case with the eigenvalue problem. The radial wave functions are derived from the following relationship as described earlier:

\[
R_l^{(n)}(r) = \tilde{\psi}_l^{(n)} / \rho. \tag{3.296}
\]

To normalize $R_l^{(n)}(r)$, we have to calculate the following integral:

\[
\int_0^\infty \left| R_l^{(n)}(r) \right|^2 r^2\, dr
= \int_0^\infty \frac{1}{\rho^2} \left| \tilde{\psi}_l^{(n)} \right|^2 \left(\frac{a}{Z}\right)^2 \rho^2\, \frac{a}{Z}\, d\rho
= \left(\frac{a}{Z}\right)^3 \int_0^\infty \left| \tilde{\psi}_l^{(n)} \right|^2 d\rho
= \left(\frac{a}{Z}\right)^3. \tag{3.297}
\]

Accordingly, we choose the following functions $\tilde{R}_l^{(n)}(r)$ for the normalized radial wave functions:

\[
\tilde{R}_l^{(n)}(r) = \sqrt{(Z/a)^3}\; R_l^{(n)}(r). \tag{3.298}
\]

Substituting (3.296) into (3.298) and taking account of (3.279) and (3.280), we obtain

\[
\tilde{R}_l^{(n)}(r) = \sqrt{\left(\frac{2Z}{an}\right)^3 \frac{(n-l-1)!}{2n\,(n+l)!}}\; \exp\!\left(-\frac{Zr}{an}\right) \left(\frac{2Zr}{an}\right)^{l} L_{n-l-1}^{2l+1}\!\left(\frac{2Zr}{an}\right). \tag{3.299}
\]
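The explicit form (3.299) lends itself to a quick numerical test. The sketch below, written under the assumption $Z = a = 1$ by default and with a finite radial cutoff in place of $\infty$, evaluates $\int_0^\infty \tilde{R}_{l_1}^{(n_1)} \tilde{R}_{l_2}^{(n_2)} r^2\, dr$ with Simpson's rule. All function names are illustrative.

```python
import math

def assoc_laguerre(n, k, x):
    """L_n^k(x) from the series (3.281)."""
    return sum((-1) ** m * math.factorial(n + k)
               / (math.factorial(n - m) * math.factorial(k + m) * math.factorial(m))
               * x ** m
               for m in range(n + 1))

def radial(n, l, r, Z=1.0, a=1.0):
    """Normalized radial wave function of (3.299)."""
    x = 2.0 * Z * r / (a * n)
    pref = math.sqrt((2.0 * Z / (a * n)) ** 3
                     * math.factorial(n - l - 1) / (2.0 * n * math.factorial(n + l)))
    return pref * math.exp(-x / 2.0) * x ** l * assoc_laguerre(n - l - 1, 2 * l + 1, x)

def overlap(n1, l1, n2, l2, rmax=80.0, steps=40000):
    """Simpson estimate of the radial overlap integral on [0, rmax] (assumed cutoff)."""
    h = rmax / steps
    f = lambda r: radial(n1, l1, r) * radial(n2, l2, r) * r * r
    total = f(0.0) + f(rmax)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

print(overlap(1, 0, 1, 0))  # normalization of the 1s radial function
print(overlap(2, 1, 2, 1))  # normalization of the 2p radial function
print(overlap(1, 0, 2, 0))  # radial functions with equal l and different n
```

The first two values should be close to 1 and the last close to 0, since radial functions with the same $l$ and different $n$ are orthogonal.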

Equation (3.298) is exactly the same as the normalized radial wave functions that can be obtained as the solution of (3.241) through the power series expansion. All these functions belong to the same eigenenergy $E_n = -\frac{\hbar^2}{2\mu}\left(\frac{Z}{a}\right)^2\frac{1}{n^2}$; see (3.258). Returning to the quantum states sharing the same eigenenergy, we have $n$ states that belong to the principal quantum number $n$ with varying azimuthal quantum number $l$ $(0 \le l \le n-1)$. Meanwhile, from (3.158), (3.159), and (3.171), $2l + 1$ quantum states possess an eigenvalue of $l(l + 1)$ with the quantity $M^2$. As already mentioned in Sects. 3.5 and 3.6, the integer $m$ takes the $2l + 1$ different values of

\[
m = l,\ l-1,\ l-2,\ \cdots,\ 1,\ 0,\ -1,\ \cdots,\ -l+1,\ -l.
\]

The quantum states of hydrogen-like atoms are characterized by a set of integers $(m, l, n)$. On the basis of the above discussion, the number of those sharing the same energy (i.e., the principal quantum number $n$) is

\[
\sum_{l=0}^{n-1} (2l + 1) = n^2.
\]

Regarding this situation, we say that the quantum states belonging to the principal quantum number $n$, or the energy eigenvalue $E_n = -\frac{\hbar^2}{2\mu}\left(\frac{Z}{a}\right)^2\frac{1}{n^2}$, are degenerate $n^2$-fold. Note that we ignored the freedom of spin state.

In summary of this section, we have developed the operator formalism in dealing with radial wave functions of hydrogen-like atoms and seen how the operator formalism features the radial wave functions. The essential point is that the radial wave functions can be derived by successively operating the lowering operators $b_l$ on $\tilde{\psi}_{n-1}^{(n)}$, which is parametrized with a principal quantum number $n$ and an orbital angular momentum quantum number $l = n - 1$. This is clearly represented by (3.278). The results agree with the conventional coordinate representation method based upon the power series expansion that leads to associated Laguerre polynomials. Thus, the operator formalism is again found to be powerful in explicitly representing the mathematical constitution of quantum-mechanical systems.

3.8 Total Wave Functions

Since we have obtained angular wave functions and radial wave functions, we describe normalized total wave functions $\tilde{\Lambda}_{l,m}^{(n)}$ of hydrogen-like atoms as a product of the angular part and radial part such that

\[
\tilde{\Lambda}_{l,m}^{(n)} = Y_l^m(\theta, \phi)\, \tilde{R}_l^{(n)}(r). \tag{3.300}
\]

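Let us seek several tangible functional forms of hydrogen ($Z = 1$) including angular and radial parts. For example, we have

\[
\phi(1s) \equiv Y_0^0(\theta, \phi)\, \tilde{R}_0^{(1)}(r) = \sqrt{\frac{1}{4\pi}}\; a^{-3/2} \left( \frac{\tilde{\psi}_{n-1}^{(n)}}{\rho} \right) = \sqrt{\frac{1}{\pi}}\; a^{-3/2} e^{-r/a}, \tag{3.301}
\]

where we used (3.276) and (3.295). For $\phi(2s)$, using (3.277) and (3.278) we have

\[
\phi(2s) \equiv Y_0^0(\theta, \phi)\, \tilde{R}_0^{(2)}(r) = \frac{1}{4\sqrt{2\pi}}\, a^{-\frac{3}{2}} e^{-\frac{r}{2a}} \left( 2 - \frac{r}{a} \right). \tag{3.302}
\]

For $\phi(2p_z)$, in turn, we express it as

\[
\begin{aligned}
\phi(2p_z) &\equiv Y_1^0(\theta, \phi)\, \tilde{R}_1^{(2)}(r) = \sqrt{\frac{3}{4\pi}}\, (\cos\theta)\, \frac{1}{2\sqrt{6}}\, a^{-\frac{3}{2}}\, \frac{r}{a}\, e^{-\frac{r}{2a}} \\
&= \frac{1}{4\sqrt{2\pi}}\, a^{-\frac{3}{2}}\, \frac{r}{a}\, e^{-\frac{r}{2a}} \cos\theta
 = \frac{1}{4\sqrt{2\pi}}\, a^{-\frac{3}{2}} e^{-\frac{r}{2a}}\, \frac{r}{a}\, \frac{z}{r}
 = \frac{1}{4\sqrt{2\pi}}\, a^{-\frac{5}{2}} e^{-\frac{r}{2a}}\, z.
\end{aligned}
\tag{3.303}
\]

For $\phi(2p_{x+iy})$, using (3.217) we get

\[
\begin{aligned}
\phi(2p_{x+iy}) &\equiv Y_1^1(\theta, \phi)\, \tilde{R}_1^{(2)}(r) = -\frac{1}{8\sqrt{\pi}}\, a^{-\frac{3}{2}}\, \frac{r}{a}\, e^{-\frac{r}{2a}} \sin\theta\, e^{i\phi} \\
&= -\frac{1}{8\sqrt{\pi}}\, a^{-\frac{3}{2}} e^{-\frac{r}{2a}}\, \frac{r}{a}\, \frac{x + iy}{r}
 = -\frac{1}{8\sqrt{\pi}}\, a^{-\frac{5}{2}} e^{-\frac{r}{2a}}\, (x + iy).
\end{aligned}
\tag{3.304}
\]

In (3.304), the minus sign comes from the Condon–Shortley phase. Furthermore, we have

\[
\begin{aligned}
\phi(2p_{x-iy}) &\equiv Y_1^{-1}(\theta, \phi)\, \tilde{R}_1^{(2)}(r) = \frac{1}{8\sqrt{\pi}}\, a^{-\frac{3}{2}}\, \frac{r}{a}\, e^{-\frac{r}{2a}} \sin\theta\, e^{-i\phi} \\
&= \frac{1}{8\sqrt{\pi}}\, a^{-\frac{3}{2}} e^{-\frac{r}{2a}}\, \frac{r}{a}\, \frac{x - iy}{r}
 = \frac{1}{8\sqrt{\pi}}\, a^{-\frac{5}{2}} e^{-\frac{r}{2a}}\, (x - iy).
\end{aligned}
\tag{3.305}
\]

Notice that the above notations $\phi(2p_{x+iy})$ and $\phi(2p_{x-iy})$ differ from the custom that uses, e.g., $\phi(2p_x)$ and $\phi(2p_y)$. We will come back to this point in Sect. 4.3.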

References

1. Schiff LI (1955) Quantum mechanics, 2nd edn. McGraw-Hill, New York
2. Arfken GB (1970) Mathematical methods for physicists, 2nd edn. Academic Press, Waltham
3. Sunakawa S (1991) Quantum mechanics. Iwanami, Tokyo. (in Japanese)
4. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover, New York
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
6. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering, 3rd edn. Cambridge University Press, Cambridge
7. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
8. Lebedev NN (1972) Special functions and their applications. Dover, New York
9. Stakgold I (1998) Green's functions and boundary value problems, 2nd edn. Wiley, New York

Chapter 4

Optical Transition and Selection Rules

In Sect. 1.2 we showed the Schrödinger equation as a function of space coordinates and time. In subsequent sections, we dealt with the time-independent eigenvalue problems of a harmonic oscillator and hydrogen-like atoms. This implies that the physical system is isolated from the outside world and that there is no interaction between the outside world and the physical system we are considering. However, by virtue of an interaction the system may acquire or lose energy, momentum, angular momentum, etc. As a consequence of the interaction, the system changes its quantum state as well. Such a change is said to be a transition. If the interaction takes place as an optical process, we are to deal with an optical transition. Of various optical transitions, the electric dipole transition is the most common and the most important. In this chapter, we study the optical transition of a particle confined in a potential well, a harmonic oscillator, and a hydrogen atom using a semiclassical approach. The question of whether a transition is allowed or forbidden is of great importance. We have a selection rule to judge it.

4.1 Electric Dipole Transition

We have a time-dependent Schrödinger equation described as

\[
H\psi = i\hbar \frac{\partial \psi}{\partial t}. \tag{1.47}
\]

Using the method of separation of variables, we obtained two equations expressed below.

\[
H\phi(x) = E\phi(x), \tag{1.55}
\]

\[
i\hbar \frac{\partial \xi(t)}{\partial t} = E\xi(t). \tag{1.56}
\]

© Springer Nature Singapore Pte Ltd. 2020 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3_4

Equation (1.55) is an eigenvalue equation of energy and (1.56) is an equation with time. So far we have focused our attention upon (1.55), taking a one-dimensional harmonic oscillator and hydrogen-like atoms as examples. In this chapter we deal with a time-evolved Schrödinger equation and its relevance to an optical transition. The optical transition takes place according to selection rules. We mention their significance as well. We showed that after solving the eigenvalue equation, the solution of the Schrödinger equation is expressed as

\[
\psi(x, t) = \phi(x) \exp(-iEt/\hbar). \tag{1.60}
\]

The probability density of the system (i.e., normally a particle such as an electron and a harmonic oscillator) residing at a certain place $x$ at a certain time $t$ is expressed as

\[
\psi^*(x, t)\, \psi(x, t).
\]

If the solution is described in a form of separated variables as in the case of (1.60), the exponential factors including $t$ cancel out and we have

\[
\psi^*(x, t)\, \psi(x, t) = \phi^*(x)\, \phi(x). \tag{4.1}
\]

This means that the probability density of the system depends only on the spatial coordinate and is constant in time. Such a state is said to be a stationary state. That is, the system continues residing in a quantum state described by $\phi(x)$ and remains unchanged independent of time. Next, we consider a linear combination of functions described by (1.60). That is, we have

\[
\psi(x, t) = c_1 \phi_1(x) \exp(-iE_1 t/\hbar) + c_2 \phi_2(x) \exp(-iE_2 t/\hbar), \tag{4.2}
\]

where the first term is pertinent to the state 1 and the second term to the state 2; $c_1$ and $c_2$ are complex constants with respect to the spatial coordinates but may be weakly time-dependent. The state described by (4.2) is called a coherent state. The probability distribution of that state is described as

\[
\psi^*(x, t)\, \psi(x, t) = |c_1|^2 |\phi_1|^2 + |c_2|^2 |\phi_2|^2 + c_1^* c_2\, \phi_1^* \phi_2\, e^{-i\omega t} + c_2^* c_1\, \phi_2^* \phi_1\, e^{i\omega t}, \tag{4.3}
\]

where $\omega$ is expressed as

\[
\omega = (E_2 - E_1)/\hbar. \tag{4.4}
\]

This equation shows that the probability density of the system undergoes a sinusoidal oscillation with time. The angular frequency equals the energy difference between the two states divided by the reduced Planck constant. If the system is a charged particle such as an electron or a proton, the sinusoidal oscillation is accompanied by an oscillating electromagnetic field. Thus, the coherent state is associated with the optical transition from one state to another, when the transition is related to the charged particle.

The optical transitions result from various causes. Of these, the electric dipole transition yields the largest transition probability and the dipole approximation is often chosen to represent the transition probability. From the point of view of optical measurements, the electric dipole transition gives the strongest absorption or emission spectral lines. The matrix element of the electric dipole, more specifically the square of the absolute value of the matrix element, is a measure of the optical transition probability. Labelling the quantum states as $a$, $b$, etc. and describing the corresponding state vectors as $|a\rangle$, $|b\rangle$, etc., the matrix element $P_{ba}$ is given by

\[
P_{ba} \equiv \langle b | \boldsymbol{\varepsilon}_e \cdot \mathbf{P} | a \rangle, \tag{4.5}
\]

where $\boldsymbol{\varepsilon}_e$ is a unit polarization vector of the electric field of an electromagnetic wave (i.e., light). Equation (4.5) describes the optical transition that takes place as a result of the interaction between electrons and the radiation field in such a way that the interaction causes electrons in the system to change the state from $|a\rangle$ to $|b\rangle$. That interaction is represented by $\boldsymbol{\varepsilon}_e \cdot \mathbf{P}$. The quantum states $|a\rangle$ and $|b\rangle$ are referred to as an initial state and a final state, respectively. The quantity $\mathbf{P}$ is the electric dipole moment of the system, which is defined as

\[
\mathbf{P} \equiv e \sum_j \mathbf{x}_j, \tag{4.6}
\]

where $e$ is an elementary charge ($e < 0$) and $\mathbf{x}_j$ is a position vector of the $j$-th electron. A detailed description of $\boldsymbol{\varepsilon}_e$ and $\mathbf{P}$ can be seen in Chap. 7. The quantity $P_{ba}$ is said to be a transition dipole moment, or, more precisely, a transition electric dipole moment with respect to the states $|a\rangle$ and $|b\rangle$. We assume that the optical transition occurs from a quantum state $|a\rangle$ to another state $|b\rangle$. Since $P_{ba}$ is generally a complex number, $|P_{ba}|^2$ represents the transition probability. If we adopt the coordinate representation, (4.5) is expressed by

\[
P_{ba} = \int \phi_b^*\, \boldsymbol{\varepsilon}_e \cdot \mathbf{P}\, \phi_a\, d\tau, \tag{4.7}
\]

where $\tau$ denotes an integral range of a space.

4.2 One-Dimensional System

Let us apply the aforementioned general description to individual cases of Chaps. 1–3.

Example 4.1: A Particle Confined in a Square-Well Potential

This example was treated in Chap. 1. As before, we assume that a particle (i.e., electron) is confined in a one-dimensional system [$-L \le x \le L$ $(L > 0)$]. We consider the optical transition from the ground state $\phi_1(x)$ to the first excited state $\phi_2(x)$. Here, we put $L = \pi/2$ for convenience. Then, the normalized coherent state $\psi(x)$ is described as

\[
\psi(x, t) = \frac{1}{\sqrt{2}} \left[ \phi_1(x) \exp(-iE_1 t/\hbar) + \phi_2(x) \exp(-iE_2 t/\hbar) \right], \tag{4.8}
\]

where we put $c_1 = c_2 = \frac{1}{\sqrt{2}}$ in (4.2). In (4.8), we have

\[
\phi_1(x) = \sqrt{\frac{2}{\pi}} \cos x \quad \text{and} \quad \phi_2(x) = \sqrt{\frac{2}{\pi}} \sin 2x. \tag{4.9}
\]

Following (4.3), we have the following real function called a probability distribution density:

\[
\psi^*(x, t)\, \psi(x, t) = \frac{1}{\pi} \left[ \cos^2 x + \sin^2 2x + (\sin 3x + \sin x) \cos \omega t \right], \tag{4.10}
\]

where $\omega$ is given by (4.4) as

\[
\omega = 3\hbar/2m, \tag{4.11}
\]

where $m$ is the mass of an electron. Rewriting (4.10), we have

\[
\psi^*(x, t)\, \psi(x, t) = \frac{1}{\pi} \left[ 1 + \frac{1}{2} (\cos 2x - \cos 4x) + (\sin 3x + \sin x) \cos \omega t \right]. \tag{4.12}
\]

Integrating (4.12) over $\left[ -\frac{\pi}{2}, \frac{\pi}{2} \right]$, a contribution from only the first term is nonvanishing to give 1, as anticipated (because of the normalization). Putting $t = 0$ and integrating (4.12) over the positive domain $\left[ 0, \frac{\pi}{2} \right]$, we have

\[
\int_0^{\pi/2} \psi^*(x, 0)\, \psi(x, 0)\, dx = \frac{1}{2} + \frac{4}{3\pi} \approx 0.924. \tag{4.13}
\]

Similarly, integrating (4.12) over the negative domain $\left[ -\frac{\pi}{2}, 0 \right]$, we have

\[
\int_{-\pi/2}^{0} \psi^*(x, 0)\, \psi(x, 0)\, dx = \frac{1}{2} - \frac{4}{3\pi} \approx 0.076. \tag{4.14}
\]

Fig. 4.1 Probability distribution density $\psi^*(x, t)\psi(x, t)$ of a particle confined in a square-well potential, plotted over $-\pi/2 \le x \le \pi/2$. The solid curve and broken curve represent the density at $t = 0$ and $t = \pi/\omega$ (i.e., half period), respectively
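The numbers quoted for (4.13) and (4.14), as well as the position of the major maximum of the density, can be reproduced numerically from (4.10). The following is a minimal sketch; the function names and the grid used to locate the maximum are choices made for this illustration.

```python
import math

def density(x, phase=0.0):
    """Probability density (4.10); `phase` stands for omega * t."""
    return (math.cos(x) ** 2 + math.sin(2 * x) ** 2
            + (math.sin(3 * x) + math.sin(x)) * math.cos(phase)) / math.pi

def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

half = math.pi / 2
positive = simpson(density, 0.0, half)    # compare with 1/2 + 4/(3 pi)
negative = simpson(density, -half, 0.0)   # compare with 1/2 - 4/(3 pi)

# Locate the largest value of the t = 0 density on a uniform grid.
grid = [-half + i * math.pi / 20000 for i in range(20001)]
x_max = max(grid, key=density)

print(positive, negative, x_max)
```

Calling `density(x, math.pi)` gives the half-period curve, which equals `density(-x, 0.0)`; this is the mirror relation between the solid and broken curves of Fig. 4.1.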

Thus, 92% of the total charge (as a probability density) is concentrated in the positive domain. Differentiation of $\psi^*(x, 0)\psi(x, 0)$ gives five extremals including both edges. Of these, a major maximum is located at 0.635 radian, which corresponds to about 40% of $\pi/2$. This can be a measure of the transition moment. Figure 4.1 demonstrates these results (see the solid curve). Meanwhile, putting $t = \pi/\omega$ (i.e., half period), we plot $\psi^*(x, \pi/\omega)\psi(x, \pi/\omega)$. The result shows that the graph is obtained by folding back the solid curve of Fig. 4.1 with respect to the ordinate axis. Thus, we find that the charge (or the probability density) exerts a sinusoidal oscillation with an angular frequency $3\hbar/2m$ along the $x$-axis around the origin.

Let $\mathbf{e}_1$ be a unit vector in the positive direction of the $x$-axis. Then, the electric dipole $\mathbf{P}$ of the system is

\[
\mathbf{P} = e\mathbf{x} = ex\,\mathbf{e}_1, \tag{4.15}
\]

where $\mathbf{x}$ is a position vector of the electron. Let us define the matrix element of the electric dipole transition as

\[
P_{21} \equiv \langle \phi_2(x) | \mathbf{e}_1 \cdot \mathbf{P} | \phi_1(x) \rangle = \langle \phi_2(x) | ex | \phi_1(x) \rangle. \tag{4.16}
\]

Notice that we only have to consider that the polarization of light is parallel to the $x$-axis. With the coordinate representation, we have

\[
\begin{aligned}
P_{21} &= \int_{-\pi/2}^{\pi/2} \phi_2^*(x)\, ex\, \phi_1(x)\, dx
 = \int_{-\pi/2}^{\pi/2} \sqrt{\frac{2}{\pi}}\, (\cos x)\, ex\, \sqrt{\frac{2}{\pi}}\, \sin 2x\, dx \\
&= e\, \frac{2}{\pi} \int_{-\pi/2}^{\pi/2} x \cos x \sin 2x\, dx
 = \frac{e}{\pi} \int_{-\pi/2}^{\pi/2} x (\sin x + \sin 3x)\, dx \\
&= \frac{e}{\pi} \int_{-\pi/2}^{\pi/2} \left[ x (-\cos x)' + x \left( -\frac{1}{3} \cos 3x \right)' \right] dx = \frac{16e}{9\pi},
\end{aligned}
\tag{4.17}
\]

where we used a trigonometric formula and integration by parts. The factor $16/9\pi$ in (4.17) is about 36% of $\pi/2$. This number is in fairly good agreement with the 40% that is estimated above from the major maximum of $\psi^*(x, 0)\psi(x, 0)$. Note that the transition moment vanishes if the two states associated with the transition have the same parity. In other words, if these are both described by sine functions or both by cosine functions, the integral vanishes.

Example 4.2: One-Dimensional Harmonic Oscillator

Second, let us think of an optical transition regarding a harmonic oscillator that we dealt with in Chap. 2. We denote the state of the oscillator as $|n\rangle$ in place of $|\psi_n\rangle$ $(n = 0, 1, 2, \cdots)$ of Chap. 2. Then, the general expression (4.5) can be written as

\[
P_{kl} = \langle k | \boldsymbol{\varepsilon}_e \cdot \mathbf{P} | l \rangle. \tag{4.18}
\]

Since we are considering the sole one-dimensional oscillator,

\[
\boldsymbol{\varepsilon}_e = \mathbf{e}_q \quad \text{and} \quad \mathbf{P} = e\mathbf{q}, \tag{4.19}
\]

where $\mathbf{e}_q$ is a unit vector in the positive direction of the coordinate $q$. Therefore, similarly to the above we have

\[
\boldsymbol{\varepsilon}_e \cdot \mathbf{P} = eq. \tag{4.20}
\]

That is,

\[
P_{kl} = e \langle k | q | l \rangle. \tag{4.21}
\]

Since $q$ is a Hermitian operator, we have

\[
P_{kl}^* = e \langle l | q^\dagger | k \rangle = e \langle l | q | k \rangle = P_{lk}, \tag{4.22}
\]

where we used (1.116). Using (2.68), we have

\[
P_{kl} = e \sqrt{\frac{\hbar}{2m\omega}}\, \langle k | a + a^\dagger | l \rangle = e \sqrt{\frac{\hbar}{2m\omega}} \left[ \langle k | a | l \rangle + \langle k | a^\dagger | l \rangle \right]. \tag{4.23}
\]

Taking the adjoint of (2.62) and modifying the notation, we have

\[
\langle k | a = \sqrt{k+1}\, \langle k+1 |. \tag{4.24}
\]

Using (2.62) once again, we get

\[
P_{kl} = e \sqrt{\frac{\hbar}{2m\omega}} \left[ \sqrt{k+1}\, \langle k+1 | l \rangle + \sqrt{l+1}\, \langle k | l+1 \rangle \right]. \tag{4.25}
\]

Using orthonormal conditions between the state vectors, we have

\[
P_{kl} = e \sqrt{\frac{\hbar}{2m\omega}} \left[ \sqrt{k+1}\, \delta_{k+1,l} + \sqrt{l+1}\, \delta_{k,l+1} \right]. \tag{4.26}
\]

Exchanging $k$ and $l$ in the above, we get

\[
P_{kl} = P_{lk}.
\]

The matrix element $P_{kl}$ is symmetric with respect to indices $k$ and $l$. Notice that the first term is nonvanishing only when $k + 1 = l$. The second term is nonvanishing only when $k = l + 1$. Therefore we get

\[
P_{k,k+1} = e \sqrt{\frac{\hbar (k+1)}{2m\omega}} \quad \text{and} \quad P_{l+1,l} = e \sqrt{\frac{\hbar (l+1)}{2m\omega}} \ \text{ or } \ P_{k+1,k} = e \sqrt{\frac{\hbar (k+1)}{2m\omega}}. \tag{4.27}
\]

Meanwhile, we find that the transition matrix $P$ is expressed as

\[
P = eq = e \sqrt{\frac{\hbar}{2m\omega}} \left( a + a^\dagger \right)
= e \sqrt{\frac{\hbar}{2m\omega}}
\begin{pmatrix}
0 & 1 & 0 & 0 & 0 & \cdots \\
1 & 0 & \sqrt{2} & 0 & 0 & \cdots \\
0 & \sqrt{2} & 0 & \sqrt{3} & 0 & \cdots \\
0 & 0 & \sqrt{3} & 0 & 2 & \cdots \\
0 & 0 & 0 & 2 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix},
\tag{4.28}
\]

where we used (2.68). Note that a real Hermitian matrix is a symmetric matrix. Practically, a fast way to construct the transition matrix (4.28) is to use (2.65) and (2.66). It is an intuitively obvious and straightforward task. A glance at the matrix form immediately tells us that the transition matrix elements are nonvanishing only at the $(k, k+1)$ and $(k+1, k)$ positions. Whereas the $(k, k+1)$-element represents the transition from the $k$-th excited state to the $(k-1)$-th excited state accompanied by photoemission, the $(k+1, k)$-element implies the transition from the $(k-1)$-th excited state to the $k$-th excited state accompanied by photoabsorption. The two transitions give the same transition moment. Note that the zeroth excited state means the ground state; see (2.64) for basis vector representations.

We should be careful about the "addresses" of the matrix accordingly. For example, $P_{0,1}$ in (4.27) represents the (1, 2) element of the matrix (4.28); $P_{2,1}$ stands for the (3, 2) element.

Suppose that we seek the transition dipole moments using the coordinate representation. Then, we need to use (2.106) and perform definite integration. For instance, we have

\[
e \int_{-\infty}^{\infty} \psi_0(q)\, q\, \psi_1(q)\, dq
\]

that corresponds to the (1, 2) element of (4.28). Indeed, the above integral gives $e \sqrt{\frac{\hbar}{2m\omega}}$. The confirmation is left for the readers. Nonetheless, seeking a definite integral of a product of higher excited-state wave functions becomes increasingly troublesome. In this respect, the operator method described above provides us with a much better insight into complicated calculations.

Equations (4.26)–(4.28) imply that the electric dipole transition is allowed to occur only when the quantum number changes by one. Notice also that the transition takes place between an even function and an odd function; see Table 2.1 and (2.101). Such a condition or restriction on the optical transition is called a selection rule. The former equation of (4.27) shows that the transition takes place from the upper state to the lower state accompanied by photon emission. The latter equation, on the other hand, shows that the transition takes place from the lower state to the upper state accompanied by photon absorption.
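The structure of (4.26) and (4.28) can be made concrete with a short script. The sketch below builds a finite block of the matrix of $a + a^\dagger$ in units of $e\sqrt{\hbar/2m\omega}$ and, separately, evaluates $\int \psi_0\, q\, \psi_1\, dq$ numerically. The explicit wave functions $\psi_0 = \pi^{-1/4}e^{-q^2/2}$ and $\psi_1 = \sqrt{2}\,\pi^{-1/4}\, q\, e^{-q^2/2}$ in units where $\hbar = m = \omega = 1$ are standard forms assumed here rather than quoted from (2.106), and the integration window $[-10, 10]$ is likewise an assumption.

```python
import math

def dipole_matrix(size):
    """Leading size x size block of a + a^dagger in the number basis, cf. (4.26)."""
    return [[math.sqrt(k + 1) if l == k + 1
             else math.sqrt(l + 1) if k == l + 1
             else 0.0
             for l in range(size)]
            for k in range(size)]

def simpson(f, a, b, steps=4000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def psi0(q):
    """Assumed ground-state wave function for hbar = m = omega = 1."""
    return math.pi ** -0.25 * math.exp(-q * q / 2.0)

def psi1(q):
    """Assumed first excited-state wave function for hbar = m = omega = 1."""
    return math.sqrt(2.0) * math.pi ** -0.25 * q * math.exp(-q * q / 2.0)

matrix = dipole_matrix(5)
for row in matrix:
    print(["%.4f" % entry for entry in row])

# <0|q|1> in these units should equal sqrt(hbar / (2 m omega)) = 1 / sqrt(2).
q01 = simpson(lambda q: psi0(q) * q * psi1(q), -10.0, 10.0)
print(q01, matrix[0][1] / math.sqrt(2.0))
```

Only entries with $|k - l| = 1$ are nonzero, which is the selection rule stated above, and the two numbers on the last line should agree.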

4.3 Three-Dimensional System

The hydrogen-like atoms give us a typical example. Since we have fully investigated the quantum states of those atoms, we make the most of the related results.

Example 4.3: An Electron in a Hydrogen Atom

Unlike the one-dimensional system, we have to take account of an angular momentum in the three-dimensional system. We have already obtained explicit wave functions. Here we focus on the 1s and 2p states of hydrogen. For their normalized states we have

\[
\phi(1s) = \sqrt{\frac{1}{\pi a^3}}\; e^{-r/a}, \tag{4.29}
\]

\[
\phi(2p_z) = \frac{1}{4} \sqrt{\frac{1}{2\pi a^3}}\; \frac{r}{a}\, e^{-r/2a} \cos\theta, \tag{4.30}
\]

\[
\phi(2p_{x+iy}) = -\frac{1}{8} \sqrt{\frac{1}{\pi a^3}}\; \frac{r}{a}\, e^{-r/2a} \sin\theta\, e^{i\phi}, \tag{4.31}
\]

\[
\phi(2p_{x-iy}) = \frac{1}{8} \sqrt{\frac{1}{\pi a^3}}\; \frac{r}{a}\, e^{-r/2a} \sin\theta\, e^{-i\phi}, \tag{4.32}
\]

where $a$ denotes the Bohr radius of hydrogen. Note that the minus sign of $\phi(2p_{x+iy})$ is due to the Condon–Shortley phase. Even though the transition probability is proportional to the square of the matrix element and so the phase factor cancels out, we describe the state vector faithfully. The energy eigenvalues are

\[
E(1s) = -\frac{\hbar^2}{2\mu a^2}, \quad E(2p_z) = E(2p_{x+iy}) = E(2p_{x-iy}) = -\frac{\hbar^2}{8\mu a^2}, \tag{4.33}
\]

where $\mu$ is the reduced mass of hydrogen. Note that the latter three states are degenerate. First, we consider a transition between the $\phi(1s)$ and $\phi(2p_z)$ states. Suppose that the normalized coherent state is described as

$$\psi(\mathbf{x}, t) = \frac{1}{\sqrt{2}}\left\{\phi(1s)\exp\left[-iE(1s)t/\hbar\right] + \phi(2p_z)\exp\left[-iE(2p_z)t/\hbar\right]\right\}. \tag{4.34}$$

As before, we have

$$\psi^*(\mathbf{x}, t)\psi(\mathbf{x}, t) = |\psi(\mathbf{x}, t)|^2 = \frac{1}{2}\left\{[\phi(1s)]^2 + [\phi(2p_z)]^2 + 2\phi(1s)\phi(2p_z)\cos\omega t\right\}, \tag{4.35}$$

where $\omega$ is given by

$$\omega = \left[E(2p_z) - E(1s)\right]/\hbar = 3\hbar/8\mu a^2. \tag{4.36}$$
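As a quick arithmetic check of (4.36), the sketch below evaluates $\omega$, the corresponding photon energy $\hbar\omega$, and the wavelength for hydrogen. It is only an illustration: the constants are rounded CODATA-style values supplied here, and the electron mass is used in place of the reduced mass $\mu$ (an assumption that changes the result by well under 0.1%).

```python
import math

# Assumed physical constants (SI units); not taken from the text.
HBAR = 1.054571817e-34   # J s
M_E = 9.1093837015e-31   # kg, used here in place of the reduced mass mu
A0 = 5.29177210903e-11   # m, Bohr radius a
EV = 1.602176634e-19     # J per eV
C = 2.99792458e8         # m/s

omega = 3.0 * HBAR / (8.0 * M_E * A0**2)        # angular frequency of (4.36)
photon_energy_eV = HBAR * omega / EV            # E(2p_z) - E(1s) in eV
wavelength_nm = 2.0 * math.pi * C / omega * 1e9
half_period = math.pi / omega                   # time at which omega * t = pi

print(omega, photon_energy_eV, wavelength_nm, half_period)
```

Under these assumptions the energy difference comes out near 10.2 eV, i.e., a wavelength of roughly 121.5 nm, and the charge distribution reverses its displacement after about $2\times10^{-16}$ s.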

In virtue of the third term of (4.35) that contains a $\cos\omega t$ factor, the charge distribution undergoes a sinusoidal oscillation along the z-axis with an angular frequency described by (4.36). For instance, the cosine factor of (4.35) equals $+1$ when $\omega t = 0$, i.e., $t = 0$, whereas it equals $-1$ when $\omega t = \pi$, i.e., $t = 8\pi\mu a^2/3\hbar$. Integrating (4.35), we have

$$\int \psi^*(\mathbf{x}, t)\psi(\mathbf{x}, t)\,d\tau = \int |\psi(\mathbf{x}, t)|^2\,d\tau = \frac{1}{2}\int\left\{[\phi(1s)]^2 + [\phi(2p_z)]^2\right\}d\tau + \cos\omega t\int \phi(1s)\phi(2p_z)\,d\tau = \frac{1}{2} + \frac{1}{2} = 1,$$

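where we used the normalized functional forms of $\phi(1s)$ and $\phi(2p_z)$ together with their orthogonality. Note that both functions are real. Next, we calculate the matrix element. For simplicity, we denote the matrix element simply as $P^{(\boldsymbol{\varepsilon}_e)}$ only by designating the unit polarization vector $\boldsymbol{\varepsilon}_e$. Then, we have

$$P^{(\boldsymbol{\varepsilon}_e)} = \langle\phi(1s)|\boldsymbol{\varepsilon}_e\cdot\mathbf{P}|\phi(2p_z)\rangle, \tag{4.37}$$

where

$$\mathbf{P} = e\mathbf{x} = e(\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{4.38}$$

We have three possibilities of choosing $\boldsymbol{\varepsilon}_e$ out of $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$. Choosing $\mathbf{e}_3$, we have

$$P^{(\mathbf{e}_3)}_{z,|p_z\rangle} = e\langle\phi(1s)|z|\phi(2p_z)\rangle = \frac{e}{4\sqrt{2}\pi a^4}\int_0^\infty r^4 e^{-3r/2a}\,dr\int_0^\pi \cos^2\theta\sin\theta\,d\theta\int_0^{2\pi}d\phi = \frac{2^7\sqrt{2}}{3^5}ea \approx 0.745\,ea. \tag{4.39}$$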

In (4.39), we express the matrix element as $P^{(\mathbf{e}_3)}_{z,|p_z\rangle}$ to indicate the z-component of the position vector and to explicitly show that the $\phi(2p_z)$ state is responsible for the transition. In (4.39), we used $z = r\cos\theta$. We also used a radial part integration such that

$$\int_0^\infty r^4 e^{-3r/2a}\,dr = 24\left(\frac{2a}{3}\right)^5.$$

Also, we changed a variable $\cos\theta \longrightarrow t$ to perform the integration with respect to $\theta$. We see that a “leverage” length of the transition moment is comparable to the Bohr radius $a$. With the notation $P^{(\mathbf{e}_3)}_{z,|p_z\rangle}$ we need some explanation for consistency with the latter description. Equation (4.39) represents the transition from $|\phi(2p_z)\rangle$ to $|\phi(1s)\rangle$ that is accompanied by the photon emission. Thus, $|p_z\rangle$ in the notation means that $|\phi(2p_z)\rangle$ is the initial state. In the notation, in turn, $(\mathbf{e}_3)$ denotes the polarization vector and $z$ represents the electric dipole. In the case of photon absorption where the transition occurs from $|\phi(1s)\rangle$ to $|\phi(2p_z)\rangle$, we use the following notation:

$$P^{(\mathbf{e}_3)}_{z,\langle p_z|} = e\langle\phi(2p_z)|z|\phi(1s)\rangle. \tag{4.40}$$

Since all the functions related to the integration are real, we have

$$P^{(\mathbf{e}_3)}_{z,|p_z\rangle} = P^{(\mathbf{e}_3)}_{z,\langle p_z|}.$$

Meanwhile, if we choose $\mathbf{e}_1$ for $\boldsymbol{\varepsilon}_e$ to evaluate the matrix element $P_x$, we have

$$P^{(\mathbf{e}_1)}_{x,|p_z\rangle} = e\langle\phi(1s)|x|\phi(2p_z)\rangle = \frac{e}{4\sqrt{2}\pi a^4}\int_0^\infty r^4 e^{-3r/2a}\,dr\int_0^\pi \sin^2\theta\cos\theta\,d\theta\int_0^{2\pi}\cos\phi\,d\phi = 0, \tag{4.41}$$

where $\cos\phi$ comes from $x = r\sin\theta\cos\phi$ and an integration of $\cos\phi$ gives zero. In a similar manner, we have

$$P^{(\mathbf{e}_2)}_{y,|p_z\rangle} = e\langle\phi(1s)|y|\phi(2p_z)\rangle = 0. \tag{4.42}$$
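The three matrix elements (4.39), (4.41), and (4.42) can be cross-checked numerically. The sketch below is an illustration under stated assumptions, not the book's procedure: it sets $a = 1$ and $e = 1$, uses the fact that each integrand factorizes into radial, polar, and azimuthal parts, and evaluates each one-dimensional integral with Simpson's rule. The helper names `simpson` and `element` are hypothetical.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

# Assumed units: Bohr radius a = 1, elementary charge e = 1.
PREFACTOR = 1.0 / (4.0 * math.sqrt(2.0) * math.pi)  # from phi(1s) * phi(2p_z)

# Radial integral of r^4 exp(-3r/2); the upper limit 60 stands in for infinity.
radial = simpson(lambda r: r**4 * math.exp(-1.5 * r), 0.0, 60.0, 6000)

def element(theta_part, phi_part):
    """e<1s|coordinate|2p_z> for a coordinate r * theta_part(theta) * phi_part(phi)."""
    # cos(t) comes from phi(2p_z); sin(t) comes from the volume element.
    theta = simpson(lambda t: math.cos(t) * theta_part(t) * math.sin(t), 0.0, math.pi)
    phi = simpson(phi_part, 0.0, 2.0 * math.pi)
    return PREFACTOR * radial * theta * phi

P_z = element(math.cos, lambda p: 1.0)  # z = r cos(theta)
P_x = element(math.sin, math.cos)       # x = r sin(theta) cos(phi)
P_y = element(math.sin, math.sin)       # y = r sin(theta) sin(phi)
exact = 2**7 * math.sqrt(2.0) / 3**5    # closed form quoted in (4.39)

print(P_z, exact, P_x, P_y)
```

In these units `P_z` agrees with $2^7\sqrt{2}/3^5 \approx 0.745$, while `P_x` and `P_y` vanish to rounding error.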

Next, we estimate the matrix elements associated with $2p_x$ and $2p_y$. For this purpose, it is convenient to introduce the following complex coordinates by a unitary transformation:

$$(\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} x \\ y \\ z \end{pmatrix} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & -\frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \left(\tfrac{1}{\sqrt{2}}(\mathbf{e}_1 - i\mathbf{e}_2)\ \ \tfrac{1}{\sqrt{2}}(\mathbf{e}_1 + i\mathbf{e}_2)\ \ \mathbf{e}_3\right)\begin{pmatrix} \frac{1}{\sqrt{2}}(x + iy) \\ \frac{1}{\sqrt{2}}(x - iy) \\ z \end{pmatrix}, \tag{4.43}$$

where a unitary transformation is represented by a unitary matrix defined as

$$U^\dagger U = UU^\dagger = E. \tag{4.44}$$

We will investigate details of the unitary transformation and matrix in Parts III and IV. We define $\mathbf{e}_+$ and $\mathbf{e}_-$ as follows [1]:

$$\mathbf{e}_+ \equiv \frac{1}{\sqrt{2}}(\mathbf{e}_1 + i\mathbf{e}_2) \quad\text{and}\quad \mathbf{e}_- \equiv \frac{1}{\sqrt{2}}(\mathbf{e}_1 - i\mathbf{e}_2), \tag{4.45}$$

where the complex vectors $\mathbf{e}_+$ and $\mathbf{e}_-$ represent the left-circularly polarized light and right-circularly polarized light that carry an angular momentum $\hbar$ and $-\hbar$, respectively. We will revisit the characteristics and implication of these complex vectors in Sect. 7.4. We have

$$(\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} x \\ y \\ z \end{pmatrix} = (\mathbf{e}_-\ \mathbf{e}_+\ \mathbf{e}_3)\begin{pmatrix} \frac{1}{\sqrt{2}}(x + iy) \\ \frac{1}{\sqrt{2}}(x - iy) \\ z \end{pmatrix}. \tag{4.46}$$

Note that $\mathbf{e}_+$, $\mathbf{e}_-$, and $\mathbf{e}_3$ are orthonormal. That is,

$$\langle\mathbf{e}_+|\mathbf{e}_+\rangle = 1, \quad \langle\mathbf{e}_+|\mathbf{e}_-\rangle = 0, \quad \text{etc.} \tag{4.47}$$
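The orthonormality (4.47) and the decomposition (4.46) are easy to confirm with plain complex arithmetic. The following is a small sketch; the helper names `combine` and `inner` and the sample point $(x, y, z)$ are arbitrary choices made for the illustration.

```python
import math

S = 1.0 / math.sqrt(2.0)
E1, E2, E3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)

def combine(c1, v1, c2, v2):
    """Linear combination c1*v1 + c2*v2 of two 3-vectors."""
    return tuple(c1 * a + c2 * b for a, b in zip(v1, v2))

def inner(u, v):
    """Hermitian inner product <u|v>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

e_plus = combine(S, E1, 1j * S, E2)    # (e1 + i e2)/sqrt(2), see (4.45)
e_minus = combine(S, E1, -1j * S, E2)  # (e1 - i e2)/sqrt(2)
e3 = tuple(complex(c) for c in E3)

basis = [e_minus, e_plus, e3]          # ordering used in (4.46)
gram = [[inner(u, v) for v in basis] for u in basis]

# Rebuild an arbitrary position vector from its complex coordinates, as in (4.46).
x, y, z = 0.3, -1.2, 2.5
comps = (S * (x + 1j * y), S * (x - 1j * y), z)
recon = tuple(sum(c * b[k] for c, b in zip(comps, basis)) for k in range(3))

print(gram)
print(recon)
```

The Gram matrix comes out as the identity, and `recon` reproduces $(x, y, z)$ with vanishing imaginary parts.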

In this situation, $\mathbf{e}_+$, $\mathbf{e}_-$, and $\mathbf{e}_3$ are said to form an orthonormal basis in a three-dimensional complex vector space (see Sect. 11.4). Now, choosing $\mathbf{e}_+$ for $\boldsymbol{\varepsilon}_e$, we have [2]

$$P^{(\mathbf{e}_+)}_{x-iy,|p_+\rangle} \equiv e\langle\phi(1s)|\frac{1}{\sqrt{2}}(x - iy)|\phi(2p_{x+iy})\rangle, \tag{4.48}$$

where $|p_+\rangle$ is a shorthand notation of $\phi(2p_{x+iy})$; $x - iy$ represents a complex electric dipole. Equation (4.48) represents an optical process in which an electron causes a transition from $\phi(2p_{x+iy})$ to $\phi(1s)$ to lose an angular momentum $\hbar$, whereas the radiation field gains that angular momentum to conserve a total angular momentum $\hbar$. The notation $P^{(\mathbf{e}_+)}_{x-iy,|p_+\rangle}$ reflects this situation. Using the coordinate representation, we rewrite (4.48) as

$$P^{(\mathbf{e}_+)}_{x-iy,|p_+\rangle} = -\frac{e}{8\sqrt{2}\pi a^4}\int_0^\infty r^4 e^{-3r/2a}\,dr\int_0^\pi \sin^3\theta\,d\theta\int_0^{2\pi} e^{-i\phi}e^{i\phi}\,d\phi = -\frac{2^7\sqrt{2}}{3^5}ea, \tag{4.49}$$

where we used

$$x - iy = r\sin\theta\, e^{-i\phi}. \tag{4.50}$$

In the definite integral of (4.49), $e^{-i\phi}$ comes from $x - iy$, while $e^{i\phi}$ comes from $\phi(2p_{x+iy})$. Note that from (3.24) $e^{i\phi}$ is an eigenfunction corresponding to an angular momentum eigenvalue $\hbar$. Notice that in (4.49) the exponents $e^{-i\phi}$ and $e^{i\phi}$ cancel out and that the azimuthal integral is nonvanishing. If we choose $\mathbf{e}_-$ for $\boldsymbol{\varepsilon}_e$, we have

$$P^{(\mathbf{e}_-)}_{x+iy,|p_+\rangle} = e\langle\phi(1s)|\frac{1}{\sqrt{2}}(x + iy)|\phi(2p_{x+iy})\rangle = -\frac{e}{8\sqrt{2}\pi a^4}\int_0^\infty r^4 e^{-3r/2a}\,dr\int_0^\pi \sin^3\theta\,d\theta\int_0^{2\pi} e^{2i\phi}\,d\phi = 0, \tag{4.51}$$

where we used

$$x + iy = r\sin\theta\, e^{i\phi}. \tag{4.52}$$

With (4.52), a factor $e^{2i\phi}$ results from the product $\phi(2p_{x+iy})(x + iy)$, which renders the integral (4.51) vanishing. Note that the only difference between (4.49) and (4.51) lies in the integration of the $\phi$ factor. For the same reason, if we choose $\mathbf{e}_3$ for $\boldsymbol{\varepsilon}_e$, the matrix element vanishes. Thus, with the $\phi(2p_{x+iy})$-related matrix element, only $P^{(\mathbf{e}_+)}_{x-iy,|p_+\rangle}$ survives. Similarly, with the $\phi(2p_{x-iy})$-related matrix element, only $P^{(\mathbf{e}_-)}_{x+iy,|p_-\rangle}$ survives. Notice that $|p_-\rangle$ is a shorthand notation of $\phi(2p_{x-iy})$. That is, we have, e.g.,

$$P^{(\mathbf{e}_-)}_{x+iy,|p_-\rangle} = e\langle\phi(1s)|\frac{1}{\sqrt{2}}(x + iy)|\phi(2p_{x-iy})\rangle = \frac{2^7\sqrt{2}}{3^5}ea, \qquad P^{(\mathbf{e}_+)}_{x-iy,|p_-\rangle} = e\langle\phi(1s)|\frac{1}{\sqrt{2}}(x - iy)|\phi(2p_{x-iy})\rangle = 0. \tag{4.53}$$

Taking the complex conjugate of (4.48), we have

$$\left[P^{(\mathbf{e}_+)}_{x-iy,|p_+\rangle}\right]^* = e\langle\phi(2p_{x+iy})|\frac{1}{\sqrt{2}}(x + iy)|\phi(1s)\rangle = -\frac{2^7\sqrt{2}}{3^5}ea. \tag{4.54}$$

Here recall (1.116) and $(x - iy)^\dagger = x + iy$. Also note that since $P^{(\mathbf{e}_+)}_{x-iy,|p_+\rangle}$ is real, $\left[P^{(\mathbf{e}_+)}_{x-iy,|p_+\rangle}\right]^*$ is real as well so that we have

$$\left[P^{(\mathbf{e}_+)}_{x-iy,|p_+\rangle}\right]^* = P^{(\mathbf{e}_+)}_{x-iy,|p_+\rangle} = P^{(\mathbf{e}_-)}_{x+iy,\langle p_+|}. \tag{4.55}$$
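The pattern of vanishing and surviving elements in (4.49), (4.51), and (4.53) is governed entirely by the azimuthal integral. The sketch below illustrates this under the assumptions $a = 1$ and $e = 1$; it takes the radial integral in its closed form $24(2a/3)^5$ and integrates the angular parts with Simpson's rule. The functions `azimuthal` and `element` and the integer labels `m_op` and `m_state` are hypothetical names introduced for the illustration.

```python
import cmath
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule; works for real- or complex-valued f."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

# Assumed units: a = 1, e = 1.
radial = 24.0 * (2.0 / 3.0) ** 5                             # closed form used with (4.39)
theta = simpson(lambda t: math.sin(t) ** 3, 0.0, math.pi)    # equals 4/3

def azimuthal(m_op, m_state):
    """Integral over phi of exp(i*m_op*phi) * exp(i*m_state*phi)."""
    return simpson(lambda p: cmath.exp(1j * (m_op + m_state) * p), 0.0, 2.0 * math.pi)

def element(m_op, m_state):
    """e<1s| (x + i*m_op*y)/sqrt(2) |2p_(x + i*m_state*y)> with m_op, m_state = +1 or -1."""
    sign = -1.0 if m_state == 1 else 1.0   # Condon-Shortley phase in (4.31)/(4.32)
    prefactor = sign / (8.0 * math.sqrt(2.0) * math.pi)
    return prefactor * radial * theta * azimuthal(m_op, m_state)

exact = 2**7 * math.sqrt(2.0) / 3**5
print(element(-1, +1))  # (4.49): close to -exact
print(element(+1, +1))  # (4.51): close to 0
print(element(+1, -1))  # first equation of (4.53): close to +exact
print(element(-1, -1))  # second equation of (4.53): close to 0
```

The returned values are complex numbers whose imaginary parts are at the level of rounding error, reflecting the statement that these matrix elements are real.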

Comparing (4.48) and (4.55), we notice that the polarization vector has been switched from $\mathbf{e}_+$ to $\mathbf{e}_-$ with the allowed transition, even though the matrix element remains the same. This can be explained as follows: In (4.48) the photon emission is occurring, while the electron is causing a transition from $\phi(2p_{x+iy})$ to $\phi(1s)$. As a result, the radiation field has gained an angular momentum by $\hbar$ during the process in which the electron has lost an angular momentum $\hbar$. In other words, $\hbar$ is transferred from the electron to the radiation field and this process results in the generation of left-circularly polarized light in the radiation field.

In (4.54), on the other hand, the reversed process takes place. That is, the photon absorption is occurring in such a way that the electron is excited from $\phi(1s)$ to $\phi(2p_{x+iy})$. After this process has been completed, the electron has gained an angular momentum by $\hbar$, whereas the radiation field has lost an angular momentum by $\hbar$. As a result, the positive angular momentum $\hbar$ is transferred to the electron from the radiation field that involves left-circularly polarized light. This can be translated into the statement that the radiation field has gained an angular momentum by $-\hbar$. This is equivalent to the generation of right-circularly polarized light (characterized by $\mathbf{e}_-$) in the radiation field. In other words, the electron gains the angular momentum by $\hbar$ to compensate the change in the radiation field. The implication of the first equation of (4.53) can be interpreted in a similar manner. Also we have

$$\left[P^{(\mathbf{e}_-)}_{x+iy,|p_-\rangle}\right]^* = P^{(\mathbf{e}_-)}_{x+iy,|p_-\rangle} = P^{(\mathbf{e}_+)}_{x-iy,\langle p_-|} = e\langle\phi(2p_{x-iy})|\frac{1}{\sqrt{2}}(x - iy)|\phi(1s)\rangle = \frac{2^7\sqrt{2}}{3^5}ea.$$

Notice that the inner products of (4.49) and (4.53) are real, even though the operators $x + iy$ and $x - iy$ are not Hermitian. Also note that $P^{(\mathbf{e}_+)}_{x-iy,|p_+\rangle}$ of (4.49) and $P^{(\mathbf{e}_-)}_{x+iy,|p_-\rangle}$ of (4.53) have the same absolute value with minus and plus signs, respectively. The minus sign of (4.49) comes from the Condon–Shortley phase. The difference, however, is not essential, because the transition probability is proportional to $\left|P^{(\mathbf{e}_-)}_{x+iy,|p_-\rangle}\right|^2$ or $\left|P^{(\mathbf{e}_+)}_{x-iy,|p_+\rangle}\right|^2$. Some literature [3, 4] uses $-(x + iy)$ instead of $x + iy$. This is simply because of the inclusion of the Condon–Shortley phase; see (3.304).

Let us think of the coherent state that is composed of $\phi(1s)$ and $\phi(2p_{x+iy})$ or $\phi(2p_{x-iy})$. Choosing $\phi(2p_{x+iy})$, the state $\psi(\mathbf{x}, t)$ can be given by

$$\psi(\mathbf{x}, t) = \frac{1}{\sqrt{2}}\left\{\phi(1s)\exp\left(-iE(1s)t/\hbar\right) + \phi(2p_{x+iy})\exp\left[-iE(2p_{x+iy})t/\hbar\right]\right\}, \tag{4.56}$$

where $\phi(1s)$ is described by (3.301) and $\phi(2p_{x+iy})$ is expressed as (3.304). Then we have

$$\psi^*(\mathbf{x}, t)\psi(\mathbf{x}, t) = |\psi(\mathbf{x}, t)|^2 = \frac{1}{2}\left\{|\phi(1s)|^2 + |\phi(2p_{x+iy})|^2 + \phi(1s)R(2p_{x+iy})\left[e^{i(\phi-\omega t)} + e^{-i(\phi-\omega t)}\right]\right\} = \frac{1}{2}\left\{|\phi(1s)|^2 + |\phi(2p_{x+iy})|^2 + 2\phi(1s)R(2p_{x+iy})\cos(\phi - \omega t)\right\}, \tag{4.57}$$

where using $R(2p_{x+iy})$, we denote $\phi(2p_{x+iy})$ as follows:

$$\phi(2p_{x+iy}) \equiv R(2p_{x+iy})\,e^{i\phi}. \tag{4.58}$$

That is, $R(2p_{x+iy})$ represents a real component of $\phi(2p_{x+iy})$ that depends only on $r$ and $\theta$. The third term of (4.57) implies that the existence probability density of an electron represented by $|\psi(\mathbf{x}, t)|^2$ is rotating counterclockwise around the z-axis with an angular frequency of $\omega$. Similarly, in the case of $\phi(2p_{x-iy})$, the existence probability density of an electron is rotating clockwise around the z-axis with an angular frequency of $\omega$. Integrating (4.57), we have

$$\int\psi^*(\mathbf{x}, t)\psi(\mathbf{x}, t)\,d\tau = \int|\psi(\mathbf{x}, t)|^2\,d\tau = \frac{1}{2}\int\left\{|\phi(1s)|^2 + |\phi(2p_{x+iy})|^2\right\}d\tau + \int_0^\infty r^2\,dr\int_0^\pi \sin\theta\,d\theta\,\phi(1s)R(2p_{x+iy})\int_0^{2\pi}d\phi\cos(\phi - \omega t) = \frac{1}{2} + \frac{1}{2} = 1,$$

where we used the normalized functional forms of $\phi(1s)$ and $\phi(2p_{x+iy})$; the last term vanishes because

$$\int_0^{2\pi}d\phi\cos(\phi - \omega t) = 0.$$

This is easily shown by a suitable variable transformation. In relation to the above discussion, we often use real numbers to describe wave functions. For this purpose, we use the following unitary transformation to transform the orthonormal basis of $e^{\pm im\phi}$ to $\cos m\phi$ and $\sin m\phi$. That is, we have

$$\left(\frac{1}{\sqrt{\pi}}\cos m\phi\ \ \frac{1}{\sqrt{\pi}}\sin m\phi\right) = \left(\frac{(-1)^m}{\sqrt{2\pi}}e^{im\phi}\ \ \frac{1}{\sqrt{2\pi}}e^{-im\phi}\right)\begin{pmatrix} \frac{(-1)^m}{\sqrt{2}} & -\frac{(-1)^m i}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} \end{pmatrix}, \tag{4.59}$$

where we assume that $m$ is positive so that we can appropriately take into account the Condon–Shortley phase. Alternatively, we describe it via the unitary transformation as follows:

$$\left(\frac{(-1)^m}{\sqrt{2\pi}}e^{im\phi}\ \ \frac{1}{\sqrt{2\pi}}e^{-im\phi}\right) = \left(\frac{1}{\sqrt{\pi}}\cos m\phi\ \ \frac{1}{\sqrt{\pi}}\sin m\phi\right)\begin{pmatrix} \frac{(-1)^m}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{(-1)^m i}{\sqrt{2}} & -\frac{i}{\sqrt{2}} \end{pmatrix}. \tag{4.60}$$
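The two matrices in (4.59) and (4.60) should be mutual inverses, and each should map one pair of functions onto the other at every angle. The following sketch checks both statements pointwise with complex arithmetic; the function names are hypothetical and the sample values of $m$ and $\phi$ are arbitrary.

```python
import cmath
import math

R = 1.0 / math.sqrt(2.0)

def to_real(m):
    """Matrix of (4.59): maps the exponential pair to the (cos, sin) pair."""
    s = (-1) ** m
    return [[s * R, -s * 1j * R],
            [R, 1j * R]]

def to_complex(m):
    """Matrix of (4.60): maps the (cos, sin) pair back to the exponential pair."""
    s = (-1) ** m
    return [[s * R, R],
            [s * 1j * R, -1j * R]]

def row_times_matrix(row, mat):
    return [sum(row[k] * mat[k][j] for k in range(2)) for j in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exponential_pair(m, phi):
    s = (-1) ** m
    c = 1.0 / math.sqrt(2.0 * math.pi)
    return [s * c * cmath.exp(1j * m * phi), c * cmath.exp(-1j * m * phi)]

def real_pair(m, phi):
    c = 1.0 / math.sqrt(math.pi)
    return [c * math.cos(m * phi), c * math.sin(m * phi)]

m, phi = 2, 1.7
print(row_times_matrix(exponential_pair(m, phi), to_real(m)), real_pair(m, phi))
print(matmul(to_real(m), to_complex(m)))  # close to the 2x2 identity
```

Because the second matrix is the Hermitian conjugate of the first, the identity product also confirms that the transformation is unitary.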

In this regard, we have to be careful about normalization constants; for trigonometric functions the constant should be $\frac{1}{\sqrt{\pi}}$, whereas for the exponential representation the constant is $\frac{1}{\sqrt{2\pi}}$. At the same time, trigonometric functions are expressed as a linear combination of $e^{im\phi}$ and $e^{-im\phi}$, and so if we use the trigonometric functions, information of a magnetic quantum number is lost.

In Sect. 3.7, we showed the normalized functions $\tilde{\Lambda}^{(n)}_{l,m} = Y_l^m(\theta, \phi)\tilde{R}^{(n)}_l(r)$ of the hydrogen-like atom. Noting that $Y_l^m(\theta, \phi)$ is proportional to $e^{im\phi}$, $\tilde{\Lambda}^{(n)}_{l,m}$ can be described using $\cos m\phi$ and $\sin m\phi$ for the basis vectors. We denote two linearly independent vectors by $\tilde{\Lambda}^{(n)}_{l,\cos m\phi}$ and $\tilde{\Lambda}^{(n)}_{l,\sin m\phi}$. Then, these vectors are expressed as

$$\left(\tilde{\Lambda}^{(n)}_{l,\cos m\phi}\ \ \tilde{\Lambda}^{(n)}_{l,\sin m\phi}\right) = \left(\tilde{\Lambda}^{(n)}_{l,m}\ \ \tilde{\Lambda}^{(n)}_{l,-m}\right)\begin{pmatrix} \frac{(-1)^m}{\sqrt{2}} & -\frac{(-1)^m i}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} \end{pmatrix}, \tag{4.61}$$

where we again assume that $m$ is positive. In chemistry and materials science, we normally use the real functions $\tilde{\Lambda}^{(n)}_{l,\cos m\phi}$ and $\tilde{\Lambda}^{(n)}_{l,\sin m\phi}$. In particular, we use the notations of, e.g., $\phi(2p_x)$ and $\phi(2p_y)$ instead of $\tilde{\Lambda}^{(2)}_{1,\cos\phi}$ and $\tilde{\Lambda}^{(2)}_{1,\sin\phi}$, respectively. In that case, we explicitly have the following form:

$$\left(\phi(2p_x)\ \ \phi(2p_y)\right) = \left(\tilde{\Lambda}^{(2)}_{1,1}\ \ \tilde{\Lambda}^{(2)}_{1,-1}\right)\begin{pmatrix} -\frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} \end{pmatrix} = \left(\phi(2p_{x+iy})\ \ \phi(2p_{x-iy})\right)\begin{pmatrix} -\frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} \end{pmatrix} = \left(\frac{1}{4\sqrt{2\pi}}a^{-\frac{3}{2}}\frac{r}{a}e^{-\frac{r}{2a}}\sin\theta\cos\phi\ \ \ \frac{1}{4\sqrt{2\pi}}a^{-\frac{3}{2}}\frac{r}{a}e^{-\frac{r}{2a}}\sin\theta\sin\phi\right). \tag{4.62}$$

Thus, the Condon–Shortley phase factor has been removed. Using this expression, we calculate matrix elements of the electric dipole transition. We have

$$P^{(\mathbf{e}_1)}_{x,|p_x\rangle} = e\langle\phi(1s)|x|\phi(2p_x)\rangle = \frac{e}{4\sqrt{2}\pi a^4}\int_0^\infty r^4 e^{-3r/2a}\,dr\int_0^\pi \sin^3\theta\,d\theta\int_0^{2\pi}\cos^2\phi\,d\phi = \frac{2^7\sqrt{2}}{3^5}ea. \tag{4.63}$$

Thus, we obtained the same result as (4.49) apart from the minus sign. Since a square of an absolute value of the transition moment plays a role, the minus sign is again of secondary importance. With $P^{(\mathbf{e}_2)}_y$, similarly we have

$$P^{(\mathbf{e}_2)}_{y,|p_y\rangle} = e\langle\phi(1s)|y|\phi(2p_y)\rangle = \frac{e}{4\sqrt{2}\pi a^4}\int_0^\infty r^4 e^{-3r/2a}\,dr\int_0^\pi \sin^3\theta\,d\theta\int_0^{2\pi}\sin^2\phi\,d\phi = \frac{2^7\sqrt{2}}{3^5}ea. \tag{4.64}$$

Comparing (4.39), (4.63), and (4.64), we have

$$P^{(\mathbf{e}_3)}_{z,|p_z\rangle} = P^{(\mathbf{e}_1)}_{x,|p_x\rangle} = P^{(\mathbf{e}_2)}_{y,|p_y\rangle} = \frac{2^7\sqrt{2}}{3^5}ea.$$

In the case of $P^{(\mathbf{e}_3)}_{z,|p_z\rangle}$, $P^{(\mathbf{e}_1)}_{x,|p_x\rangle}$, and $P^{(\mathbf{e}_2)}_{y,|p_y\rangle}$, the optical transition is said to be polarized along the z-, x-, and y-axis, respectively, and so linearly polarized lights are relevant. Note moreover that the operators $z$, $x$, and $y$ in (4.39), (4.63), and (4.64) are Hermitian and that $\phi(2p_z)$, $\phi(2p_x)$, and $\phi(2p_y)$ are real functions.

4.4 Selection Rules

In a three-dimensional system such as hydrogen-like atoms, quantum states of particles (i.e., electrons) are characterized by three quantum numbers: principal quantum numbers, orbital angular momentum quantum numbers (or azimuthal quantum numbers), and magnetic quantum numbers. In this section, we examine the selection rules for the electric dipole approximation. Of the three quantum numbers mentioned above, angular momentum quantum numbers are denoted by $l$ and magnetic quantum numbers by $m$. First, we examine the conditions on $m$. With the angular momentum operator $\mathbf{L}$ and its corresponding operator $\mathbf{M}$, we get the following commutation relations:

$$[M_z, x] = iy, \quad [M_y, z] = ix, \quad [M_x, y] = iz;$$
$$[M_z, iy] = x, \quad [M_y, ix] = z, \quad [M_x, iz] = y; \quad \text{etc.} \tag{4.65}$$

Notice that in the upper line the indices change cyclically like (z, x, y), whereas in the lower line they change anti-cyclically such as (z, y, x). The proof of (4.65) is left for the reader. Thus, we have, e.g.,

$$[M_z, x + iy] = x + iy, \quad [M_z, x - iy] = -(x - iy), \quad \text{etc.} \tag{4.66}$$

Putting $Q_+ \equiv x + iy$ and $Q_- \equiv x - iy$, we have

$$[M_z, Q_+] = Q_+, \quad [M_z, Q_-] = -Q_-. \tag{4.67}$$

Taking an inner product of both sides of (4.67), we have

$$\langle m'|[M_z, Q_+]|m\rangle = \langle m'|M_zQ_+ - Q_+M_z|m\rangle = m'\langle m'|Q_+|m\rangle - m\langle m'|Q_+|m\rangle = \langle m'|Q_+|m\rangle, \tag{4.68}$$

where the quantum state $|m\rangle$ is identical to $|l, m\rangle$ in (3.151). Here we need no information about $l$, and so it is omitted. Thus, we have, e.g., $M_z|m\rangle = m|m\rangle$. Taking its adjoint, we have $\langle m|M_z^\dagger = \langle m|M_z = m\langle m|$, where $M_z$ is Hermitian. These results lead to (4.68). From (4.68), we get

$$(m' - m - 1)\langle m'|Q_+|m\rangle = 0. \tag{4.69}$$

Therefore, for the matrix element $\langle m'|Q_+|m\rangle$ not to vanish, we must have

$$m' - m - 1 = 0 \quad\text{or}\quad \Delta m = 1 \quad (\Delta m \equiv m' - m).$$

This represents the selection rule with respect to the coordinate $Q_+$. Similarly, we get

$$(m' - m + 1)\langle m'|Q_-|m\rangle = 0. \tag{4.70}$$

In this case, for the matrix element $\langle m'|Q_-|m\rangle$ not to vanish we have

$$m' - m + 1 = 0 \quad\text{or}\quad \Delta m = -1.$$

To derive (4.70), we can alternatively use the following: Taking the adjoint of (4.69), we have

$$(m' - m - 1)\langle m|Q_-|m'\rangle = 0.$$

Exchanging $m'$ and $m$, we have

$$(m - m' - 1)\langle m'|Q_-|m\rangle = 0 \quad\text{or}\quad (m' - m + 1)\langle m'|Q_-|m\rangle = 0.$$

Thus, (4.70) is recovered. Meanwhile, we have a commutation relation

$$[M_z, z] = 0. \tag{4.71}$$

Similarly, taking an inner product of both sides of (4.71), we have

$$(m' - m)\langle m'|z|m\rangle = 0.$$

Therefore, for the matrix element $\langle m'|z|m\rangle$ not to vanish, we must have

$$m' - m = 0 \quad\text{or}\quad \Delta m = 0. \tag{4.72}$$
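The rules (4.69), (4.70), and (4.72) can be illustrated through the azimuthal factor of the matrix elements alone. The sketch below assumes, as in Example 4.3, that the states carry a factor $e^{im\phi}$ and that $Q_\pm$ and $z$ carry $e^{\pm i\phi}$ and 1, respectively; it then lists which values of $\Delta m = m' - m$ give a nonvanishing azimuthal integral. The names `azimuthal` and `allowed` and the integer label `q` are hypothetical.

```python
import cmath
import math

def azimuthal(m_prime, q, m, n=64):
    """(1/2pi) * integral over [0, 2pi) of exp(-i m' phi) exp(i q phi) exp(i m phi).

    Evaluated as an n-point Riemann sum, which is exact for these
    trigonometric polynomials as long as |q + m - m'| < n.
    """
    k = q + m - m_prime
    return sum(cmath.exp(1j * k * 2.0 * math.pi * j / n) for j in range(n)) / n

def allowed(q, mmax=2):
    """Values of m' - m with a nonvanishing integral; q = +1 (Q+), -1 (Q-), 0 (z)."""
    values = range(-mmax, mmax + 1)
    return sorted({mp - m for m in values for mp in values
                   if abs(azimuthal(mp, q, m)) > 1e-9})

print(allowed(+1))  # [1]
print(allowed(-1))  # [-1]
print(allowed(0))   # [0]
```

This only tests the $\phi$-dependence; whether the remaining radial and polar integrals vanish is a separate question, addressed by the conditions on $l$.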

These results are fully consistent with Example 4.3 of Sect. 4.3. That is, if circularly polarized light takes part in the optical transition, $\Delta m = \pm 1$. For instance, using the present notation we rewrite (4.48) as

$$\langle\phi(1s)|\frac{1}{\sqrt{2}}(x - iy)|\phi(2p_{x+iy})\rangle = \frac{1}{\sqrt{2}}\langle 0|Q_-|1\rangle = -\frac{2^7\sqrt{2}}{3^5}a.$$

If linearly polarized light is related to the optical transition, we have $\Delta m = 0$. Next, we examine the conditions on $l$. To this end, we calculate the following commutator [5]:

$$\begin{aligned}
[\mathbf{M}^2, z] &= [M_x^2 + M_y^2 + M_z^2, z] = [M_x^2, z] + [M_y^2, z] \\
&= M_x(M_xz - zM_x) + M_xzM_x - zM_x^2 + M_y(M_yz - zM_y) + M_yzM_y - zM_y^2 \\
&= M_x[M_x, z] + [M_x, z]M_x + M_y[M_y, z] + [M_y, z]M_y \\
&= i(M_yx + xM_y - M_xy - yM_x) \\
&= i(M_xy - yM_x - M_yx + xM_y + 2M_yx - 2M_xy) \\
&= i(2iz + 2M_yx - 2M_xy) = 2i(M_yx - M_xy + iz).
\end{aligned} \tag{4.73}$$

In the above calculations, (i) we used $[M_z, z] = 0$ (with the second equality); (ii) RHS was modified so that the commutation relations can be used (the third equality); (iii) we used $-M_xy = M_xy - 2M_xy$ and $M_yx = -M_yx + 2M_yx$ so that we can use (4.65) (the second last equality). Moreover, using (4.65), (4.73) can be written as

$$[\mathbf{M}^2, z] = 2i(xM_y - M_xy) = 2i(M_yx - yM_x).$$

Similar results on the commutator can be obtained with $[\mathbf{M}^2, x]$ and $[\mathbf{M}^2, y]$. For further use, we give alternative relations such that

$$[\mathbf{M}^2, x] = 2i(yM_z - M_yz) = 2i(M_zy - zM_y), \qquad [\mathbf{M}^2, y] = 2i(zM_x - M_zx) = 2i(M_xz - xM_z). \tag{4.74}$$

Using (4.73), we calculate another commutator such that

$$\begin{aligned}
[\mathbf{M}^2, [\mathbf{M}^2, z]] &= 2i\left\{[\mathbf{M}^2, M_yx] - [\mathbf{M}^2, M_xy] + i[\mathbf{M}^2, z]\right\} \\
&= 2i\left\{M_y[\mathbf{M}^2, x] - M_x[\mathbf{M}^2, y] + i[\mathbf{M}^2, z]\right\} \\
&= 2i\left\{2iM_y(yM_z - M_yz) - 2iM_x(M_xz - xM_z) + i[\mathbf{M}^2, z]\right\} \\
&= -2\left\{2(M_xx + M_yy + M_zz)M_z - 2(M_x^2 + M_y^2 + M_z^2)z + \mathbf{M}^2z - z\mathbf{M}^2\right\} \\
&= 2(\mathbf{M}^2z + z\mathbf{M}^2).
\end{aligned} \tag{4.75}$$

In the above calculations, (i) we used $[\mathbf{M}^2, M_y] = [\mathbf{M}^2, M_x] = 0$ (with the second equality); (ii) we used (4.74) (the third equality); (iii) RHS was modified so that we can use the relation $\mathbf{M} \perp \mathbf{x}$ from the definition of the angular momentum operator, i.e., $M_xx + M_yy + M_zz = 0$ (the second last equality). We used $[M_z, z] = 0$ as well. Similar results are obtained with $x$ and $y$. That is, we have

$$[\mathbf{M}^2, [\mathbf{M}^2, x]] = 2(\mathbf{M}^2x + x\mathbf{M}^2), \tag{4.76}$$

$$[\mathbf{M}^2, [\mathbf{M}^2, y]] = 2(\mathbf{M}^2y + y\mathbf{M}^2). \tag{4.77}$$

Rewriting, e.g., (4.75), we have

$$\mathbf{M}^4z - 2\mathbf{M}^2z\mathbf{M}^2 + z\mathbf{M}^4 = 2(\mathbf{M}^2z + z\mathbf{M}^2). \tag{4.78}$$

Using the relation (4.78) and taking inner products of both sides, we get, e.g.,

$$\langle l'|\mathbf{M}^4z - 2\mathbf{M}^2z\mathbf{M}^2 + z\mathbf{M}^4|l\rangle = \langle l'|2(\mathbf{M}^2z + z\mathbf{M}^2)|l\rangle. \tag{4.79}$$

That is,

$$\langle l'|\mathbf{M}^4z - 2\mathbf{M}^2z\mathbf{M}^2 + z\mathbf{M}^4|l\rangle - \langle l'|2(\mathbf{M}^2z + z\mathbf{M}^2)|l\rangle = 0.$$

Considering that both terms of LHS contain a factor $\langle l'|z|l\rangle$ in common, we have

$$\left[l'^2(l'+1)^2 - 2l'l(l'+1)(l+1) + l^2(l+1)^2 - 2l'(l'+1) - 2l(l+1)\right]\langle l'|z|l\rangle = 0, \tag{4.80}$$

where the quantum state $|l\rangle$ is identical to $|l, m\rangle$ in (3.151) with $m$ omitted. To factorize the first factor of LHS of (4.80), we view it as a quartic equation with respect to $l'$. Replacing $l'$ with $-l$, we find that the first factor vanishes, and so the first factor should have a factor $(l' + l)$. Then, we factorize the first factor of LHS of (4.80) such that

$$\begin{aligned}
[\text{the first factor of LHS of (4.80)}] &= (l'+l)^2(l'-l)^2 + 2(l'+l)(l'^2 - l'l + l^2) - 2l'l(l'+l) - 2(l'+l) - (l'+l)^2 \\
&= (l'+l)\left[(l'+l)(l'-l)^2 + 2(l'^2 - l'l + l^2) - 2l'l - 2 - (l'+l)\right] \\
&= (l'+l)\left[(l'+l)(l'-l)^2 + 2(l'^2 - 2l'l + l^2) - (l'+l+2)\right] \\
&= (l'+l)\left[(l'+l)(l'-l)^2 + 2(l'-l)^2 - (l'+l+2)\right] \\
&= (l'+l)\left\{(l'-l)^2[(l'+l) + 2] - (l'+l+2)\right\} \\
&= (l'+l)(l'+l+2)(l'-l+1)(l'-l-1).
\end{aligned} \tag{4.81}$$

Thus rewriting (4.80), we get

$$(l'+l)(l'+l+2)(l'-l+1)(l'-l-1)\langle l'|z|l\rangle = 0. \tag{4.82}$$
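The factorization (4.81) is purely algebraic, so it can be checked by brute force. The sketch below compares the first factor of (4.80) with the product in (4.82) on a grid of integers (agreement on a 21 × 21 grid is sufficient for two polynomials of degree four in each variable) and lists the nonnegative pairs $(l', l)$ at which the factor vanishes. The function names are hypothetical.

```python
def first_factor(lp, l):
    """First factor on the left-hand side of (4.80), with lp standing for l'."""
    a, b = lp * (lp + 1), l * (l + 1)
    return a * a - 2 * a * b + b * b - 2 * a - 2 * b

def factored(lp, l):
    """Product form of (4.81)."""
    return (lp + l) * (lp + l + 2) * (lp - l + 1) * (lp - l - 1)

grid = range(-10, 11)
mismatches = [(lp, l) for lp in grid for l in grid
              if first_factor(lp, l) != factored(lp, l)]

# Pairs of nonnegative integers for which the factor vanishes.
zeros = [(lp, l) for lp in range(6) for l in range(6) if first_factor(lp, l) == 0]

print(mismatches)  # []
print(zeros)       # (0, 0) together with the pairs satisfying |l' - l| = 1
```

The only zeros with nonnegative $l'$ and $l$ are $l' = l = 0$ and $|l' - l| = 1$, which are exactly the two cases examined next in the text.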

We have similar relations with respect to $\langle l'|x|l\rangle$ and $\langle l'|y|l\rangle$ because of (4.76) and (4.77). For the electric dipole transition to be allowed, among $\langle l'|x|l\rangle$, $\langle l'|y|l\rangle$, and $\langle l'|z|l\rangle$ at least one term must be nonvanishing. For this, at least one of the four factors of (4.81) should be zero. Since $l' + l + 2 > 0$, this factor is excluded. For $l' + l$ to vanish, we should have $l' = l = 0$; notice that both $l'$ and $l$ are nonnegative integers. We must then examine this condition. This condition is equivalent to the statement that the spherical harmonics related to the angular variables $\theta$ and $\phi$ take the form of $Y_0^0(\theta, \phi) = 1/\sqrt{4\pi}$, i.e., a constant. Therefore, the $\theta$-related integral for the matrix element $\langle l'|z|l\rangle$ only consists of the following factor:

$$\int_0^\pi \cos\theta\sin\theta\,d\theta = \frac{1}{2}\int_0^\pi \sin 2\theta\,d\theta = -\frac{1}{4}[\cos 2\theta]_0^\pi = 0,$$

where $\cos\theta$ comes from the polar coordinate $z = r\cos\theta$; $\sin\theta$ is due to an infinitesimal volume of space, i.e., $r^2\sin\theta\,dr\,d\theta\,d\phi$. Thus, we find that $\langle l'|z|l\rangle$ vanishes on condition that $l' = l = 0$. As a polar coordinate representation, $x = r\sin\theta\cos\phi$ and $y = r\sin\theta\sin\phi$, and so the $\phi$-related integrals of $\langle l'|x|l\rangle$ and $\langle l'|y|l\rangle$ vanish as well. That is,

$$\int_0^{2\pi}\cos\phi\,d\phi = \int_0^{2\pi}\sin\phi\,d\phi = 0.$$

Therefore, the matrix elements relevant to $l' = l = 0$ vanish with all the coordinates; i.e., we have

$$\langle l'|x|l\rangle = \langle l'|y|l\rangle = \langle l'|z|l\rangle = 0. \tag{4.83}$$

Consequently, we exclude the $(l' + l)$-factor as well, when we consider a condition of the allowed transition. Thus, regarding the condition that should be satisfied with the allowed transition, from (4.82) we get

$$l' - l + 1 = 0 \quad\text{or}\quad l' - l - 1 = 0. \tag{4.84}$$

Or defining $\Delta l \equiv l' - l$, we get

$$\Delta l = \pm 1. \tag{4.85}$$

Thus, for the transition to be allowed, the azimuthal quantum number must change by one.

4.5 Angular Momentum of Radiation [6]

In Sect. 4.3 we mentioned circularly polarized light. If the circularly polarized light acts on an electron, what can we anticipate? Here we deal with this problem within a framework of a semiclassical theory. Let $\mathbf{E}$ and $\mathbf{H}$ be the electric and magnetic fields of a left-circularly polarized light, respectively. They are expressed as

$$\mathbf{E} = \frac{1}{\sqrt{2}}E_0(\mathbf{e}_1 + i\mathbf{e}_2)\exp i(kz - \omega t), \tag{4.86}$$

$$\mathbf{H} = \frac{1}{\sqrt{2}}H_0(\mathbf{e}_2 - i\mathbf{e}_1)\exp i(kz - \omega t) = \frac{1}{\sqrt{2}}\frac{E_0}{\mu v}(\mathbf{e}_2 - i\mathbf{e}_1)\exp i(kz - \omega t). \tag{4.87}$$

Here we assume that the light is propagating in the direction of the positive z-axis. The electric and magnetic fields described by (4.86) and (4.87) represent the left-circularly polarized light. A synchronized motion of an electron is expected, if the electron executes a circular motion in such a way that the motion direction of the electron is always perpendicular to the electric field and parallel to the magnetic field (see Fig. 4.2). In this situation, the magnetic Lorentz force does not affect the electron motion. Here, the Lorentz force $\mathbf{F}(t)$ is described by

$$\mathbf{F}(t) = e\mathbf{E}(\mathbf{x}(t)) + e\dot{\mathbf{x}}(t)\times\mathbf{B}(\mathbf{x}(t)), \tag{4.88}$$

where the first term is electric Lorentz force and the second term represents the magnetic Lorentz force.

Fig. 4.2 Synchronized motion of an electron under a left-circularly polarized light (the figure shows the x- and y-axes with the origin O, the fields E and H, and the electron with its direction of motion)

The quantity $\mathbf{B}$ called magnetic flux density is related to $\mathbf{H}$ as $\mathbf{B} = \mu_0\mathbf{H}$, where $\mu_0$ is the permeability of vacuum; see (7.10) and (7.11) of Sect. 7.1. In (4.88) $\mathbf{E}$ and $\mathbf{B}$ are measured at a position where the electron is situated at a certain time $t$. Equation (4.88) universally describes the motion of a charged particle in the presence of electromagnetic fields. We consider another related example in Chap. 15. Equation (4.86) can be rewritten as

$$\mathbf{E} = \frac{1}{\sqrt{2}}E_0\left[\mathbf{e}_1\cos(kz - \omega t) - \mathbf{e}_2\sin(kz - \omega t)\right] + i\frac{1}{\sqrt{2}}E_0\left[\mathbf{e}_2\cos(kz - \omega t) + \mathbf{e}_1\sin(kz - \omega t)\right]. \tag{4.89}$$

Suppose that the electron executes the circular motion in a region narrow enough around the origin and that the said electron motion is confined within the xy-plane that is perpendicular to the light propagation direction. Then, we can assume that $z \approx 0$ in (4.89). Ignoring $kz$ in (4.89) accordingly and taking a real part, we have

$$\mathbf{E} = \frac{1}{\sqrt{2}}E_0(\mathbf{e}_1\cos\omega t + \mathbf{e}_2\sin\omega t).$$

Thus, a force $\mathbf{F}$ exerted on the electron is described by

$$\mathbf{F} = e\mathbf{E}, \tag{4.90}$$

where $e$ is an elementary charge ($e < 0$). Accordingly, an equation of motion of the electron is approximated such that

$$m\ddot{\mathbf{x}} = e\mathbf{E}, \tag{4.91}$$

where $m$ is the mass of an electron and $\mathbf{x}$ is the position vector of the electron. With individual components of the coordinate, we have

$$m\ddot{x} = \frac{1}{\sqrt{2}}eE_0\cos\omega t \quad\text{and}\quad m\ddot{y} = \frac{1}{\sqrt{2}}eE_0\sin\omega t. \tag{4.92}$$

Integrating (4.92) two times, we get

$$mx = -\frac{eE_0}{\sqrt{2}\omega^2}\cos\omega t + Ct + D, \tag{4.93}$$

where $C$ and $D$ are integration constants. Setting $x(0) = -\frac{eE_0}{\sqrt{2}m\omega^2}$ and $x'(0) = 0$, we have $C = D = 0$. Similarly, we have

$$my = -\frac{eE_0}{\sqrt{2}\omega^2}\sin\omega t + C't + D', \tag{4.94}$$

where $C'$ and $D'$ are integration constants. Setting $y(0) = 0$ and $y'(0) = -\frac{eE_0}{\sqrt{2}m\omega}$, we have $C' = D' = 0$. Thus, making $t$ a parameter, we get

$$x^2 + y^2 = \left(\frac{eE_0}{\sqrt{2}m\omega^2}\right)^2. \tag{4.95}$$

This implies that the electron is executing a counterclockwise circular motion with a radius $-\frac{eE_0}{\sqrt{2}m\omega^2}$ (positive, since $e < 0$) under the influence of the electric field. This is consistent with a motion of an electron in the coherent state of $\phi(1s)$ and $\phi(2p_{x+iy})$ as expressed in (4.57). An angular momentum the electron has acquired is

$$L = |\mathbf{x}\times\mathbf{p}| = xp_y - yp_x = \left(-\frac{eE_0}{\sqrt{2}m\omega^2}\right)\left(-\frac{meE_0}{\sqrt{2}m\omega}\right) = \frac{e^2E_0^2}{2m\omega^3}. \tag{4.96}$$

Identifying this with $\hbar$, we have

$$\frac{e^2E_0^2}{2m\omega^3} = \hbar. \tag{4.97}$$

In terms of energy, we have

$$\frac{e^2E_0^2}{2m\omega^2} = \hbar\omega. \tag{4.98}$$

Assuming that the wavelength of the light is 600 nm, we need a left-circularly polarized light whose electric field is about $1.5\times10^{10}$ [V/m]. A radius $\alpha$ of the circular motion of the electron is given by

$$\alpha = -\frac{eE_0}{\sqrt{2}m\omega^2}. \tag{4.99}$$

Under the same condition as the above, $\alpha$ is estimated to be 2 Å.
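The two estimates quoted above follow from (4.97) and (4.99) by direct substitution. The sketch below reproduces the arithmetic; the physical constants are rounded values supplied here as assumptions, and the magnitude $|e|$ of the electron charge is used so that the radius comes out positive.

```python
import math

# Assumed physical constants (SI units); not taken from the text.
HBAR = 1.054571817e-34      # J s
M_E = 9.1093837015e-31      # kg, electron mass m
E_CHARGE = 1.602176634e-19  # C, magnitude |e| of the electron charge
C = 2.99792458e8            # m/s

wavelength = 600e-9                     # m, as assumed in the text
omega = 2.0 * math.pi * C / wavelength  # angular frequency of the light

# (4.97): e^2 E0^2 / (2 m omega^3) = hbar, solved for E0
E0 = math.sqrt(2.0 * M_E * HBAR * omega**3) / E_CHARGE

# Magnitude of (4.99)
alpha = E_CHARGE * E0 / (math.sqrt(2.0) * M_E * omega**2)

# Angular momentum of the circular orbit, m * alpha^2 * omega, in units of hbar
L_over_hbar = M_E * alpha**2 * omega / HBAR

print(E0)           # roughly 1.5e10 V/m
print(alpha)        # roughly 1.9e-10 m, i.e., about 2 angstroms
print(L_over_hbar)  # 1 by construction
```

Eliminating $E_0$ between (4.97) and (4.99) gives $\alpha = \sqrt{\hbar/m\omega}$, so the radius depends only on the frequency of the light.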

References
1. Jackson JD (1999) Classical electrodynamics, 3rd edn. Wiley, New York
2. Fowles GR (1989) Introduction to modern optics, 2nd edn. Dover, New York
3. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World Scientific, Singapore
5. Sunakawa S (1991) Quantum mechanics. Iwanami, Tokyo. (in Japanese)
6. Rossi B (1957) Optics. Addison-Wesley, Reading, Massachusetts

Chapter 5 Approximation Methods of Quantum Mechanics

In Chaps. 1–3 we have focused on solving eigenvalue equations with respect to a particle confined within a one-dimensional potential well or a harmonic oscillator along with an electron of a hydrogen-like atom. In each example we obtained exact analytical solutions with the quantum-mechanical states and corresponding eigenvalues (energy, angular momentum, etc.). In most cases of quantum-mechanical problems, however, we are not able to get such analytical solutions or accurately determine the corresponding eigenvalues. Under these circumstances, we need appropriate approximation methods of those problems. Among those methods, the perturbation method and variational method are widely used. In terms of usefulness, we provide several examples concerning physical systems that have already appeared in Chaps. 1–3. In this chapter, we examine how these physical systems change their quantum states and corresponding energy eigenvalues as a result of undergoing influence from the external field. We assume that the change results from the application of external electric field. For simplicity, we focus on the change in eigenenergy and corresponding eigenstate with respect to the nondegenerate quantum state. As specific cases in these examples, we happen to be able to get perturbed physical quantities accurately. Including such cases, for later purposes we take in advance important concepts of a complete orthonormal system (CONS) and projection operator.

5.1 Perturbation Method

In Chaps. 1–3, we considered a situation where no external field is exerted on a physical system. In Chap. 4, on the other hand, we studied the optical transition that takes place as a consequence of the interaction between the physical system and electromagnetic wave. In this chapter, we wish to examine how the physical system changes its quantum state, when an external electric field is applied to the system. We usually assume that the external field is weak and the corresponding change in the quantum state or energy is small.

© Springer Nature Singapore Pte Ltd. 2020. S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3_5

The applied external field causes the change in Hamiltonian of a quantum system and results in the change in that quantum state usually accompanied by the energy change of the system. We describe this process as the following eigenvalue equation:

$$H|\Psi_i\rangle = E_i|\Psi_i\rangle, \tag{5.1}$$

where $H$ represents the total Hamiltonian; $E_i$ is an energy eigenvalue and $\Psi_i$ is an eigenstate corresponding to $E_i$. Equation (5.1) implies that there should be a series of eigenvalues and corresponding eigenvectors belonging to those eigenvalues. Strictly speaking, however, we can get such combinations of the eigenstate and eigenvalue only by solving (5.1) analytically. This is some kind of circular argument. Yet, at present we do not have to go into detail about that issue. Instead we advance to discussion of the approximation methods from a practical point of view. We assume that $H$ is divided into two terms such that

$$H = H_0 + \lambda V, \tag{5.2}$$

where $H_0$ is the Hamiltonian of the unperturbed system without the external field; $V$ is an additional Hamiltonian due to the applied external field, or interaction between the physical system and the external field. The second term includes a parameter $\lambda$ that can be modulated according to the change in the strength of the applied field. The parameter $\lambda$ should be real so that $H$ of (5.2) can be Hermitian. The parameter $\lambda$ can be the applied field itself or can merely be a dimensionless parameter that can be set at $\lambda = 1$ after finishing the calculation. We will come back to this point later in Examples. In (5.2) we assume that the following equation holds:

$$H_0|i\rangle = E_i^{(0)}|i\rangle, \tag{5.3}$$

where $|i\rangle$ is the i-th quantum state. We designate $|0\rangle$ as the ground state. We assume the excited states $|i\rangle$ ($i = 1, 2, \cdots$) to be numbered in order of increasing energies. These functions (or vectors) $|i\rangle$ (including $|0\rangle$) can be a quantum state that appeared as various eigenfunctions in the previous chapters. Of these, two or more functions may have the same eigenenergy (i.e., degenerate). If we are to deal with such degenerate states, those states $|i\rangle$ have to be distinguished by, e.g., $|i_d\rangle$ ($d = 1, 2, \cdots, s$), where $s$ denotes the degeneracy (or degree of degeneracy). For simplicity, however, in this chapter we are going to examine only the change in energy and physical states with respect to nondegenerate states. We disregard the issue on notation of the degenerate states accordingly. We pay attention to each individual case later, when necessary.

Regarding a one-dimensional physical system, e.g., a particle confined within a potential well and a quantum-mechanical harmonic oscillator, all the quantum states are nondegenerate as we have already seen in Chaps. 1 and 2. As a special case we also consider the ground state of a hydrogen-like atom. This is because that state is nondegenerate, and so the problem can be dealt with in parallel to the cases of the one-dimensional physical systems. We assume that the quantum states $|i\rangle$ ($i = 0, 1, 2, \cdots$) constitute the orthonormal eigenvectors such that

$$\langle j|i\rangle = \delta_{ji}. \tag{5.4}$$

Remember that the notation has already appeared in (2.53); see Chap. 13 as well.

5.1.1 Quantum State and Energy Level Shift Caused by Perturbation

One of the most important applications of the perturbation method is to evaluate the shift in quantum state and energy level caused by the applied external field. To this end, we expand both the quantum state and the energy as power series in $\lambda$. That is, in (5.1) we expand $|\Psi_i\rangle$ and $E_i$ such that

$$|\Psi_i\rangle = |i\rangle + \lambda|\phi_i^{(1)}\rangle + \lambda^2|\phi_i^{(2)}\rangle + \lambda^3|\phi_i^{(3)}\rangle + \cdots, \tag{5.5}$$

$$E_i = E_i^{(0)} + \lambda E_i^{(1)} + \lambda^2 E_i^{(2)} + \lambda^3 E_i^{(3)} + \cdots, \tag{5.6}$$

where $|\phi_i^{(1)}\rangle$, $|\phi_i^{(2)}\rangle$, $|\phi_i^{(3)}\rangle$, etc. are chosen as correction terms for the state $|i\rangle$ that is associated with the unperturbed system. Once again, we assume that the state $|i\rangle$ ($i = 0, 1, 2, \dots$) represents the nondegenerate normalized eigenvector that belongs to the eigenenergy $E_i^{(0)}$ of the unperturbed state. The unknown state vectors $|\phi_i^{(1)}\rangle$, $|\phi_i^{(2)}\rangle$, $|\phi_i^{(3)}\rangle$, etc. as well as $E_i^{(1)}$, $E_i^{(2)}$, $E_i^{(3)}$, etc. are to be determined by the calculation procedures carried out from now on. These states $|\Psi_i\rangle$ and energies $E_i$ result from the perturbation term $\lambda V$ and represent the deviation from $|i\rangle$ and $E_i^{(0)}$ of the unperturbed system. As a normalization condition we impose the following condition upon $|\Psi_i\rangle$:

$$\langle i|\Psi_i\rangle = 1. \tag{5.7}$$

This condition, however, does not necessarily mean that $|\Psi_i\rangle$ has been normalized (vide infra). From (5.4) and (5.7) we have

$$\langle i|\phi_i^{(1)}\rangle = \langle i|\phi_i^{(2)}\rangle = \cdots = 0. \tag{5.8}$$

Let us calculate $\langle\Psi_i|\Psi_i\rangle$ on condition of (5.7) and (5.8). From (5.5) we have

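$$\begin{aligned}
\langle\Psi_i|\Psi_i\rangle &= \langle i|i\rangle + \lambda\left[\langle i|\phi_i^{(1)}\rangle + \langle\phi_i^{(1)}|i\rangle\right] + \lambda^2\left[\langle i|\phi_i^{(2)}\rangle + \langle\phi_i^{(2)}|i\rangle + \langle\phi_i^{(1)}|\phi_i^{(1)}\rangle\right] + \cdots \\
&= \langle i|i\rangle + \lambda^2\langle\phi_i^{(1)}|\phi_i^{(1)}\rangle + \cdots \approx 1 + \lambda^2\langle\phi_i^{(1)}|\phi_i^{(1)}\rangle,
\end{aligned}$$

where we used (5.4) and (5.8). Thus, $|\Psi_i\rangle$ is normalized if we ignore terms of order $\lambda^2$. Now, inserting (5.5) and (5.6) into (5.1) and comparing the terms with the same power of $\lambda$, we obtain

$$\left[E_i^{(0)} - H_0\right]|i\rangle = 0, \tag{5.9}$$

$$\left[E_i^{(0)} - H_0\right]|\phi_i^{(1)}\rangle + E_i^{(1)}|i\rangle = V|i\rangle, \tag{5.10}$$

$$\left[E_i^{(0)} - H_0\right]|\phi_i^{(2)}\rangle + E_i^{(1)}|\phi_i^{(1)}\rangle + E_i^{(2)}|i\rangle = V|\phi_i^{(1)}\rangle. \tag{5.11}$$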

Equation (5.9) is identical with (5.3). Operating $\langle j|$ on both sides of (5.10) from the left, we get

$$\begin{aligned}
\langle j|\left[E_i^{(0)} - H_0\right]|\phi_i^{(1)}\rangle + \langle j|E_i^{(1)}|i\rangle &= \langle j|\left[E_i^{(0)} - E_j^{(0)}\right]|\phi_i^{(1)}\rangle + E_i^{(1)}\langle j|i\rangle \\
&= \left[E_i^{(0)} - E_j^{(0)}\right]\langle j|\phi_i^{(1)}\rangle + E_i^{(1)}\delta_{ji} = \langle j|V|i\rangle.
\end{aligned}\tag{5.12}$$

Putting $j = i$ in (5.12), we have

$$E_i^{(1)} = \langle i|V|i\rangle. \tag{5.13}$$

This represents the first-order correction term of the energy eigenvalue. Meanwhile, assuming $j \neq i$ in (5.12), we have

$$\langle j|\phi_i^{(1)}\rangle = \frac{\langle j|V|i\rangle}{E_i^{(0)} - E_j^{(0)}} \quad (j \neq i), \tag{5.14}$$

where we used $E_i^{(0)} \neq E_j^{(0)}$ on the assumption that the quantum state $|i\rangle$ that belongs to the eigenenergy $E_i^{(0)}$ is nondegenerate. Here, we emphasize that the eigenstates that correspond to $E_j^{(0)}$ may or may not be degenerate. In other words, only the quantum state whose perturbation we are examining is required to be nondegenerate. In this context, we do not question whether or not the other quantum states [represented by $|j\rangle$ in (5.14)] are degenerate.

Here we postulate that the eigenvectors $|i\rangle$ ($i = 0, 1, 2, \dots$) of $H_0$ form a complete orthonormal system (CONS) such that

$$\sum_k |k\rangle\langle k| = E, \tag{5.15}$$

where $|k\rangle\langle k|$ is said to be a projection operator and $E$ is an identity operator. As remarked just above, (5.15) holds regardless of whether the eigenstates are degenerate. The term complete system implies that any vector (or function) can be expanded as a linear combination of the basis vectors of the said system. The formal definition of the projection operator can be seen in Chaps. 14 and 18. Thus, we have

$$|\phi_i^{(1)}\rangle = E|\phi_i^{(1)}\rangle = \sum_k |k\rangle\langle k|\phi_i^{(1)}\rangle = \sum_{k\neq i}|k\rangle\langle k|\phi_i^{(1)}\rangle = \sum_{k\neq i}|k\rangle\frac{\langle k|V|i\rangle}{E_i^{(0)} - E_k^{(0)}}, \tag{5.16}$$

where with the third equality we used (5.8) and with the last equality we used (5.14). Operating $\langle i|$ from the left on (5.16) and using (5.4), we recover (5.8). Hence, using (5.5) the approximated quantum state to the first order of $\lambda$ is described by

$$|\Psi_i\rangle \approx |i\rangle + \lambda|\phi_i^{(1)}\rangle = |i\rangle + \lambda\sum_{k\neq i}|k\rangle\frac{\langle k|V|i\rangle}{E_i^{(0)} - E_k^{(0)}}. \tag{5.17}$$

For a while, let us think of a case where we deal with the change in the eigenenergy and corresponding eigenstate with respect to a degenerate quantum state. In that case, regarding the state $|i\rangle$ in question in (5.12), there exists a state $|j\rangle$ ($j \neq i$) that satisfies $E_i^{(0)} = E_j^{(0)}$. Therefore, we would have $\langle j|V|i\rangle = 0$ from (5.12). Generally this is not the case, however, and so we need a relation different from (5.12) to deal with the degenerate case. However, we do not go into details about this issue in this book.

Next let us seek the second-order correction term $E_i^{(2)}$ of the eigenenergy. Operating $\langle j|$ on both sides of (5.11) from the left, we get

$$\left[E_i^{(0)} - E_j^{(0)}\right]\langle j|\phi_i^{(2)}\rangle + E_i^{(1)}\langle j|\phi_i^{(1)}\rangle + E_i^{(2)}\delta_{ji} = \langle j|V|\phi_i^{(1)}\rangle.$$

Putting $j = i$ in the above relation as before, we have

$$E_i^{(2)} = \langle i|V|\phi_i^{(1)}\rangle. \tag{5.18}$$

Notice that we used (5.8) to derive (5.18). Using (5.16) furthermore, we get

$$\begin{aligned}
E_i^{(2)} &= \sum_{k\neq i}\langle i|V|k\rangle\frac{\langle k|V|i\rangle}{E_i^{(0)} - E_k^{(0)}} = \sum_{k\neq i}\langle k|V^\dagger|i\rangle^{*}\frac{\langle k|V|i\rangle}{E_i^{(0)} - E_k^{(0)}} \\
&= \sum_{k\neq i}\langle k|V|i\rangle^{*}\frac{\langle k|V|i\rangle}{E_i^{(0)} - E_k^{(0)}} = \sum_{k\neq i}\frac{1}{E_i^{(0)} - E_k^{(0)}}\left|\langle k|V|i\rangle\right|^2,
\end{aligned}\tag{5.19}$$

where with the second equality we used (1.116) in combination with (1.118) and with the third equality we used the fact that $V$ is an Hermitian operator; i.e., $V^\dagger = V$. The state $|k\rangle$ is sometimes called an intermediate state.

Considering the expression of (5.16), we find that the approximated state $|i\rangle + \lambda|\phi_i^{(1)}\rangle$ in (5.5) is not an eigenstate of energy, because the approximated state contains a linear combination of different eigenstates $|k\rangle$ that have eigenenergies different from $E_i^{(0)}$ of $|i\rangle$. Substituting (5.13) and (5.19) into (5.6), with the energy correction terms up to the second order we have

$$E_i \approx E_i^{(0)} + \lambda E_i^{(1)} + \lambda^2E_i^{(2)} = E_i^{(0)} + \lambda\langle i|V|i\rangle + \lambda^2\sum_{k\neq i}\frac{1}{E_i^{(0)} - E_k^{(0)}}\left|\langle k|V|i\rangle\right|^2. \tag{5.20}$$

If we think of the ground state $|0\rangle$ for $|i\rangle$, we get

$$E_0 \approx E_0^{(0)} + \lambda E_0^{(1)} + \lambda^2E_0^{(2)} = E_0^{(0)} + \lambda\langle 0|V|0\rangle + \lambda^2\sum_{k\neq 0}\frac{1}{E_0^{(0)} - E_k^{(0)}}\left|\langle k|V|0\rangle\right|^2. \tag{5.21}$$

Since $E_0^{(0)} < E_k^{(0)}$ ($k = 1, 2, \dots$) for any $|k\rangle$, the second-order correction term is always negative. This contributes to the energy stabilization.
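The behavior of (5.21) can be illustrated with a minimal sketch, assuming a hypothetical two-level system in which $H_0$ is diagonal and $V$ is a real symmetric 2×2 matrix. The numerical values below are arbitrary and are not taken from the text. For a 2×2 matrix the exact ground energy follows from the quadratic formula, so the second-order estimate can be compared with it directly.

```python
import math

def exact_ground_energy(e0, e1, v00, v01, v11, lam):
    """Lowest eigenvalue of the real symmetric 2x2 matrix H0 + lam * V."""
    a = e0 + lam * v00          # upper-left matrix element
    d = e1 + lam * v11          # lower-right matrix element
    b = lam * v01               # off-diagonal matrix element
    return 0.5 * (a + d) - math.hypot(0.5 * (a - d), b)

def second_order_energy(e0, e1, v00, v01, lam):
    """Right-hand side of (5.21) when there is a single excited state."""
    return e0 + lam * v00 + lam**2 * v01**2 / (e0 - e1)

# Hypothetical numbers chosen only for illustration.
E0, E1 = 0.0, 1.0
V00, V01, V11 = 0.3, 0.5, -0.2

for lam in (0.2, 0.1, 0.05):
    exact = exact_ground_energy(E0, E1, V00, V01, V11, lam)
    approx = second_order_energy(E0, E1, V00, V01, lam)
    print(f"lambda={lam:5.2f}  exact={exact:.6f}  "
          f"second order={approx:.6f}  error={abs(exact - approx):.2e}")
```

In this sketch the second-order term is negative, and halving $\lambda$ reduces the error by roughly a factor of eight, as expected when the leading neglected term is of order $\lambda^3$.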

5.1.2 Several Examples

To deepen the understanding of the perturbation method, we show several examples. We focus on the change in the eigenenergy and corresponding eigenstate with respect to specific quantum states. We assume that the change is caused by the applied electric field as a perturbation.

Example 5.1 Suppose that a charged particle is confined within a one-dimensional potential well. This problem has already appeared in Example 1.2. Now let us consider how the energy of a particle carrying a charge $e$ is shifted by the applied electric field. This situation could be achieved experimentally using a “nano capacitor” where, e.g., an electron is confined within the capacitor while a voltage is applied between the two electrodes. We adopt the same coordinate system as that of Example 1.2. That is, suppose that we apply an electric field $F$ between the electrodes that are positioned at $x = \pm L$ and that the electron is confined between the two electrodes. Then, the Hamiltonian of the system is described as

$$H = -\frac{\hbar^2}{2m}\frac{d^2}{dx^2} - eFx, \tag{5.22}$$

where $m$ is the mass of the particle. Note that we may choose $-eF$ for the parameter $\lambda$ of Sect. 5.1.1. Then, the coordinate representation of the Schrödinger equation is given by

$$-\frac{\hbar^2}{2m}\frac{d^2\psi_i(x)}{dx^2} - eFx\,\psi_i(x) = E_i\psi_i(x), \tag{5.23}$$

where $E_i$ is the energy of the $i$-th state from the bottom (i.e., the ground state); $\psi_i$ is its corresponding eigenstate. We wish to seek the perturbation energy up to the second order and the quantum state up to the first order. We are particularly interested in the change in the ground state $|0\rangle$. Notice that the ground state is obtained in (1.101) by putting $l = 0$. According to (5.21) we have

$$E_0 \approx E_0^{(0)} + \lambda\langle 0|V|0\rangle + \lambda^2\sum_{k\neq 0}\frac{1}{E_0^{(0)} - E_k^{(0)}}\left|\langle k|V|0\rangle\right|^2, \tag{5.24}$$

where $V = -eFx$. Since the field $F$ is thought to be an adjustable parameter, in (5.24) we may replace $\lambda$ with $-eF$ (vide supra). Then, in (5.24) $V$ is replaced by $x$ in turn. In this way, (5.24) can be rewritten as

$$E_0 \approx E_0^{(0)} - eF\langle 0|x|0\rangle + (eF)^2\sum_{k\neq 0}\frac{1}{E_0^{(0)} - E_k^{(0)}}\left|\langle k|x|0\rangle\right|^2. \tag{5.25}$$

Considering that $|0\rangle$ is an even function [i.e., a cosine function in (1.101)] with respect to $x$ and that $x$ is an odd function, we find that the second term vanishes. Notice that the explicit coordinate representation of $|0\rangle$ is

$$|0\rangle = \sqrt{\frac{1}{L}}\cos\frac{\pi}{2L}x, \tag{5.26}$$

which is obtained by putting $l = 0$ in $|l_{\cos}\rangle \equiv \sqrt{\frac{1}{L}}\cos\left(\frac{\pi}{2L} + \frac{l\pi}{L}\right)x$ ($l = 0, 1, 2, \dots$) in (1.101). By the same token, $\langle k|x|0\rangle$ in the third term of RHS of (5.25) vanishes if $|k\rangle$ denotes a cosine function in (1.101). If, on the other hand, $|k\rangle$ denotes a sine function, $\langle k|x|0\rangle$ does not vanish. To distinguish the sine functions from the cosine functions, we denote

$$|n_{\sin}\rangle \equiv \sqrt{\frac{1}{L}}\sin\frac{n\pi}{L}x \quad (n = 1, 2, 3, \dots) \tag{5.27}$$

that is identical with (1.102); see Fig. 5.1 for the notation of the quantum states.

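Now, putting $E_0^{(0)} = \frac{\hbar^2}{2m}\cdot\frac{\pi^2}{4L^2}$ and $E_{2n-1}^{(0)} = \frac{\hbar^2}{2m}\cdot\frac{(2n)^2\pi^2}{4L^2}$ ($n = 1, 2, 3, \dots$) in (5.25), we rewrite it as

$$\begin{aligned}
E_0 &\approx E_0^{(0)} - eF\langle 0|x|0\rangle + (eF)^2\sum_{k=1}^{\infty}\frac{1}{E_0^{(0)} - E_k^{(0)}}\left|\langle k|x|0\rangle\right|^2 \\
&= E_0^{(0)} - eF\langle 0|x|0\rangle + (eF)^2\sum_{n=1}^{\infty}\frac{1}{E_0^{(0)} - E_{2n-1}^{(0)}}\left|\langle n_{\sin}|x|0\rangle\right|^2,
\end{aligned}\tag{5.25}$$

[Fig. 5.1 Notation of the quantum states that distinguish cosine and sine functions: $|0_{\cos}\rangle \equiv |0\rangle$, $|1_{\sin}\rangle \equiv |1\rangle$, $|1_{\cos}\rangle \equiv |2\rangle$, $|2_{\sin}\rangle \equiv |3\rangle$, $|2_{\cos}\rangle \equiv |4\rangle$, and so on. $E_k^{(0)}$ ($k = 0, 1, 2, \dots$) represents the energy eigenvalues of the unperturbed states.]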

where with the second equality $n$ denotes the number appearing in (5.27) and $|n_{\sin}\rangle$ belongs to $E_{2n-1}^{(0)}$. Referring to, e.g., (4.17) with regard to the calculation of $\langle k|x|0\rangle$, we arrive at the following equation:

$$E_0 \approx \frac{\hbar^2}{2m}\cdot\frac{\pi^2}{4L^2} - (eF)^2\frac{32^2\cdot 8\,mL^4}{\pi^6\hbar^2}\sum_{n=1}^{\infty}\frac{n^2}{(4n^2-1)^5}. \tag{5.28}$$
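The matrix elements behind (5.28) can be checked numerically. The following is a minimal sketch that integrates $\langle n_{\sin}|x|0\rangle$ by Simpson's rule using (5.26) and (5.27). The closed form $(-1)^{n-1}\,32Ln/[\pi^2(4n^2-1)^2]$ used for comparison is not quoted in the text; it is an assumption obtained by carrying out the integral by hand, and it is consistent with the prefactors of (5.28) and (5.30). The function names are hypothetical.

```python
import math

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def x_element_numeric(n, L=1.0):
    """<n_sin| x |0> by quadrature over the well -L <= x <= L."""
    def integrand(x):
        ground = math.sqrt(1 / L) * math.cos(math.pi * x / (2 * L))  # (5.26)
        sine_n = math.sqrt(1 / L) * math.sin(n * math.pi * x / L)    # (5.27)
        return sine_n * x * ground
    return simpson(integrand, -L, L)

def x_element_closed(n, L=1.0):
    """Hand-derived closed form (an assumption, see the lead-in)."""
    return (-1) ** (n - 1) * 32 * L * n / (math.pi**2 * (4 * n * n - 1) ** 2)

for n in range(1, 5):
    print(n, x_element_numeric(n), x_element_closed(n))
```

Squaring the closed form and dividing by $E_0^{(0)} - E_{2n-1}^{(0)} = -\hbar^2\pi^2(4n^2-1)/(8mL^2)$ gives the summand of (5.28).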

The series of the second term in (5.28) converges rapidly. Defining the partial sum of the first $N$ terms of the series as

$$S(N) \equiv \sum_{n=1}^{N}\frac{n^2}{(4n^2-1)^5},$$

we obtain $S(1) = 1/243 \approx 0.004115226$ and $S(6) \approx S(1000) \approx 0.004120685$ as well as

$$\frac{S(1000) - S(1)}{S(1000)} = 0.001324653.$$

That is, the first term of $S(N)$ alone accounts for ~99.9% of the infinite series. Thus, the stabilization energy $\lambda^2E_0^{(2)}$ due to the perturbation is satisfactorily given by

$$\lambda^2E_0^{(2)} \approx -(eF)^2\frac{32^2\cdot 8\,mL^4}{243\pi^6\hbar^2}, \tag{5.29}$$

where the notation $\lambda^2E_0^{(2)}$ is due to (5.6). Meanwhile, the corrected quantum state is given by

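$$\begin{aligned}
|\Psi_0\rangle &\approx |0\rangle + \lambda|\phi_0^{(1)}\rangle \approx |0\rangle - eF\sum_{k\neq 0}|k\rangle\frac{\langle k|x|0\rangle}{E_0^{(0)} - E_k^{(0)}} \\
&= |0\rangle + eF\frac{32\cdot 8\,mL^3}{\pi^4\hbar^2}\sum_{n=1}^{\infty}\frac{(-1)^{n-1}n}{(4n^2-1)^3}|n_{\sin}\rangle.
\end{aligned}\tag{5.30}$$

For a reason similar to the above, the approximation is satisfactory if we adopt only $n = 1$ in the second term of RHS of (5.30). That is, we obtain

$$|\Psi_0\rangle \approx |0\rangle + eF\frac{32\cdot 8\,mL^3}{27\pi^4\hbar^2}|1_{\sin}\rangle. \tag{5.31}$$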
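The truncations that lead to (5.29) and (5.31) can be reproduced with a few lines of arithmetic. The sketch below evaluates the partial sums $S(N)$ of the series in (5.28) and the coefficients of the series in (5.30); the function names are hypothetical.

```python
def energy_partial_sum(N):
    """S(N) = sum_{n=1}^{N} n^2 / (4 n^2 - 1)^5, the series in (5.28)."""
    return sum(n**2 / (4 * n**2 - 1) ** 5 for n in range(1, N + 1))

def state_coefficient(n):
    """Coefficient (-1)^(n-1) n / (4 n^2 - 1)^3 of |n_sin> in (5.30)."""
    return (-1) ** (n - 1) * n / (4 * n**2 - 1) ** 3

s1 = energy_partial_sum(1)
s6 = energy_partial_sum(6)
s1000 = energy_partial_sum(1000)

print(f"S(1)    = {s1:.9f}")
print(f"S(6)    = {s6:.9f}")
print(f"S(1000) = {s1000:.9f}")
print(f"[S(1000) - S(1)] / S(1000) = {(s1000 - s1) / s1000:.9f}")
print(f"state coefficients n = 1, 2: {state_coefficient(1):.6f}, {state_coefficient(2):.6f}")
```

The second coefficient of the state series, $-2/3375$, is already less than 2% of the first, $1/27$, which is why keeping only $n = 1$ suffices in (5.31).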

Thus, we have roughly estimated the stabilization energy and the corresponding quantum state as in (5.29) and (5.31), respectively. It may well be worth estimating rough numbers for $L$ and $F$ in an actual situation (or perhaps in an actual “nano device”). The estimation is left to readers.

Example 5.2 [1] In Chap. 2 we dealt with a quantum-mechanical harmonic oscillator. Here let us suppose that a charged particle (with charge $e$) is performing harmonic oscillation under an applied electric field. First we consider the coordinate representation of the Schrödinger equation. Without the external field, the Schrödinger equation as an eigenvalue equation reads

$$-\frac{\hbar^2}{2m}\frac{d^2u(q)}{dq^2} + \frac{1}{2}m\omega^2q^2u(q) = Eu(q). \tag{2.108}$$

Under the applied electric field, the perturbation is expressed as $-eFq$ and, hence, the equation can be written as

$$-\frac{\hbar^2}{2m}\frac{d^2u(q)}{dq^2} + \frac{1}{2}m\omega^2q^2u(q) - eF(q)\,q\,u(q) = Eu(q). \tag{5.32}$$

For simplicity we assume that the electric field is uniform, i.e., independent of $q$, so that we can treat it as a constant. Then, (5.32) can be rewritten as

$$-\frac{\hbar^2}{2m}\frac{d^2u(q)}{dq^2} + \frac{1}{2}m\omega^2\left(q - \frac{eF}{m\omega^2}\right)^2u(q) = \left[E + \frac{1}{2}m\omega^2\left(\frac{eF}{m\omega^2}\right)^2\right]u(q). \tag{5.33}$$

Changing the variable such that

$$Q \equiv q - \frac{eF}{m\omega^2}, \tag{5.34}$$

we have

$$-\frac{\hbar^2}{2m}\frac{d^2u(q)}{dQ^2} + \frac{1}{2}m\omega^2Q^2u(q) = \left[E + \frac{1}{2}m\omega^2\left(\frac{eF}{m\omega^2}\right)^2\right]u(q). \tag{5.35}$$

Taking account of the change in functional form that results from the variable transformation, we rewrite (5.35) as

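$$-\frac{\hbar^2}{2m}\frac{d^2\tilde{u}(Q)}{dQ^2} + \frac{1}{2}m\omega^2Q^2\tilde{u}(Q) = \left[E + \frac{1}{2}m\omega^2\left(\frac{eF}{m\omega^2}\right)^2\right]\tilde{u}(Q), \tag{5.36}$$

where

$$u(q) = u\left(Q + \frac{eF}{m\omega^2}\right) \equiv \tilde{u}(Q). \tag{5.37}$$

Defining $\tilde{E}$ as

$$\tilde{E} \equiv E + \frac{1}{2}m\omega^2\left(\frac{eF}{m\omega^2}\right)^2 = E + \frac{e^2F^2}{2m\omega^2}, \tag{5.38}$$

we get

$$-\frac{\hbar^2}{2m}\frac{d^2\tilde{u}(Q)}{dQ^2} + \frac{1}{2}m\omega^2Q^2\tilde{u}(Q) = \tilde{E}\tilde{u}(Q). \tag{5.39}$$

Thus, we recover exactly the same form as (2.108). Equation (5.39) can be solved analytically as already shown in Sect. 2.4. From (2.110) and (2.120), we have

$$\lambda \equiv \frac{2\tilde{E}}{\hbar\omega} \quad\text{and}\quad \lambda = 2n + 1.$$

Then, we have

$$\tilde{E} = \frac{\hbar\omega}{2}\lambda = \frac{\hbar\omega(2n+1)}{2}. \tag{5.40}$$

From (5.38), we get

$$E = \tilde{E} - \frac{e^2F^2}{2m\omega^2} = \hbar\omega\left(n + \frac{1}{2}\right) - \frac{e^2F^2}{2m\omega^2}. \tag{5.41}$$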
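The step from (5.32) to (5.33) is a completion of the square, and it can be verified pointwise. The sketch below is only a numerical check under assumed parameter values (the numbers and function names are hypothetical); the product $eF$ is passed as a single argument `eF`.

```python
def potential_original(q, m, omega, eF):
    """Potential terms on the left-hand side of (5.32)."""
    return 0.5 * m * omega**2 * q**2 - eF * q

def potential_shifted(q, m, omega, eF):
    """Same potential after completing the square, as used in (5.33)."""
    Q = q - eF / (m * omega**2)                      # (5.34)
    return 0.5 * m * omega**2 * Q**2 - eF**2 / (2 * m * omega**2)

def exact_energy(n, m, omega, eF, hbar=1.0):
    """Energy eigenvalue (5.41) of the oscillator in the uniform field."""
    return hbar * omega * (n + 0.5) - eF**2 / (2 * m * omega**2)

# Hypothetical parameter values in arbitrary units.
m, omega, eF = 2.0, 3.0, 0.7
for q in (-1.5, 0.0, 0.4, 2.0):
    print(q, potential_original(q, m, omega, eF), potential_shifted(q, m, omega, eF))
print("E_0 =", exact_energy(0, m, omega, eF))
```

Because $eF$ enters (5.41) only through its square, the shift has the same sign for either sign of the charge.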

This implies that the energy stabilization due to the applied electric field amounts to $e^2F^2/2m\omega^2$. Note that the system is always stabilized regardless of the sign of $e$.

Next, we wish to consider the problem on the basis of the matrix (or operator) representation. Let us evaluate the perturbation energy using (5.20). Conforming the notation of (5.20) to the present case, we have

$$E_n \approx E_n^{(0)} - eF\langle n|q|n\rangle + (eF)^2\sum_{k\neq n}\frac{1}{E_n^{(0)} - E_k^{(0)}}\left|\langle k|q|n\rangle\right|^2, \tag{5.42}$$

where $E_n^{(0)}$ is given by

$$E_n^{(0)} = \hbar\omega\left(n + \frac{1}{2}\right). \tag{5.43}$$

Taking account of (2.55), (2.68), and (2.62), the second term of (5.42) vanishes. With the third term, using (2.68) as well as (2.55) and (2.61) we have

$$\begin{aligned}
\left|\langle k|q|n\rangle\right|^2 &= \left|\sqrt{\frac{\hbar}{2m\omega}}\langle k|a + a^\dagger|n\rangle\right|^2 \\
&= \frac{\hbar}{2m\omega}\left|\sqrt{n}\,\langle k|n-1\rangle + \sqrt{n+1}\,\langle k|n+1\rangle\right|^2 \\
&= \frac{\hbar}{2m\omega}\left|\sqrt{n}\,\delta_{k,n-1} + \sqrt{n+1}\,\delta_{k,n+1}\right|^2 \\
&= \frac{\hbar}{2m\omega}\left[n\delta_{k,n-1} + (n+1)\delta_{k,n+1} + 2\delta_{k,n-1}\delta_{k,n+1}\sqrt{n(n+1)}\right] \\
&= \frac{\hbar}{2m\omega}\left[n\delta_{k,n-1} + (n+1)\delta_{k,n+1}\right].
\end{aligned}\tag{5.44}$$

Notice that with the last equality of (5.44) there is no $k$ that satisfies $k = n - 1 = n + 1$ at once. Also note that $\langle k|q|n\rangle$ is a real number. Hence, as the third term of (5.42) we get

$$\begin{aligned}
\sum_{k\neq n}\frac{1}{E_n^{(0)} - E_k^{(0)}}\left|\langle k|q|n\rangle\right|^2 &= \sum_{k\neq n}\frac{1}{\hbar\omega(n-k)}\cdot\frac{\hbar}{2m\omega}\left[n\delta_{k,n-1} + (n+1)\delta_{k,n+1}\right] \\
&= \frac{1}{2m\omega^2}\left[\frac{n}{n-(n-1)} + \frac{n+1}{n-(n+1)}\right] = -\frac{1}{2m\omega^2}.
\end{aligned}\tag{5.45}$$

Thus, from (5.42) we obtain

$$E_n \approx \hbar\omega\left(n + \frac{1}{2}\right) - \frac{e^2F^2}{2m\omega^2}. \tag{5.46}$$
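The sum in (5.45) can also be evaluated by brute force in the number basis. The following minimal sketch builds $\langle k|q|n\rangle$ from the ladder-operator selection rule in (5.44) and sums over a truncated set of states; the truncation `kmax` and the parameter values are assumptions made for illustration.

```python
import math

def q_element(k, n, hbar=1.0, m=1.0, omega=1.0):
    """<k| q |n> with q = sqrt(hbar / (2 m omega)) (a + a^dagger)."""
    scale = math.sqrt(hbar / (2 * m * omega))
    if k == n - 1:
        return scale * math.sqrt(n)
    if k == n + 1:
        return scale * math.sqrt(n + 1)
    return 0.0

def second_order_sum(n, kmax=50, hbar=1.0, m=1.0, omega=1.0):
    """Sum over k != n of |<k|q|n>|^2 / (E_n - E_k); requires kmax > n."""
    total = 0.0
    for k in range(kmax + 1):
        if k == n:
            continue
        gap = hbar * omega * (n - k)                 # E_n^(0) - E_k^(0)
        total += q_element(k, n, hbar, m, omega) ** 2 / gap
    return total

m, omega = 2.0, 3.0                                  # hypothetical values
for n in range(4):
    print(n, second_order_sum(n, m=m, omega=omega), -1 / (2 * m * omega**2))
```

Only $k = n \pm 1$ contribute, so the result is independent of $n$ and of the truncation as long as `kmax` exceeds $n$.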

Consequently, we find that (5.46) obtained by the perturbation method is consistent with (5.41) that was obtained as the exact solution. As already pointed out in Sect. 5.1.1, however, (5.46) does not represent an eigenenergy. In fact, using (5.17) together with (4.26) and (4.28), we have

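$$\begin{aligned}
|\Psi_0\rangle &\approx |0\rangle + \lambda|\phi_0^{(1)}\rangle = |0\rangle + \lambda\sum_{k\neq 0}|k\rangle\frac{\langle k|V|0\rangle}{E_0^{(0)} - E_k^{(0)}} = |0\rangle + \lambda|1\rangle\frac{\langle 1|V|0\rangle}{E_0^{(0)} - E_1^{(0)}} \\
&= |0\rangle - eF|1\rangle\frac{\langle 1|x|0\rangle}{E_0^{(0)} - E_1^{(0)}} = |0\rangle + \sqrt{\frac{e^2F^2}{2m\omega^3\hbar}}\,|1\rangle.
\end{aligned}\tag{5.47}$$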

Thus, $|\Psi_0\rangle$ is not an eigenstate because $|\Psi_0\rangle$ contains both $|0\rangle$ and $|1\rangle$. Hence, $|\Psi_0\rangle$ does not possess an eigenenergy. Notice also that in (5.47) the factors $\langle k|x|0\rangle$ vanish except for $\langle 1|x|0\rangle$; see Chaps. 2 and 4. To think of this point further, let us come back to the coordinate representation of the Schrödinger equation. From (5.39) and (2.106), we have

$$\tilde{u}_n(Q) = \left(\frac{m\omega}{\hbar}\right)^{1/4}\sqrt{\frac{1}{\pi^{1/2}2^n n!}}\,H_n\!\left(\sqrt{\frac{m\omega}{\hbar}}\,Q\right)e^{-\frac{m\omega}{2\hbar}Q^2} \quad (n = 0, 1, 2, \dots),$$

where $H_n\!\left(\sqrt{\frac{m\omega}{\hbar}}\,Q\right)$ is the Hermite polynomial of the $n$-th order. Using (5.34) and (5.37), we replace $Q$ with $q - \frac{eF}{m\omega^2}$ to obtain the following form:

$$\begin{aligned}
u_n(q) &= \tilde{u}_n\!\left(q - \frac{eF}{m\omega^2}\right) \\
&= \left(\frac{m\omega}{\hbar}\right)^{1/4}\sqrt{\frac{1}{\pi^{1/2}2^n n!}}\,H_n\!\left[\sqrt{\frac{m\omega}{\hbar}}\left(q - \frac{eF}{m\omega^2}\right)\right]e^{-\frac{m\omega}{2\hbar}\left(q - \frac{eF}{m\omega^2}\right)^2} \quad (n = 0, 1, 2, \dots).
\end{aligned}\tag{5.48}$$

Since $\psi_n(q)$ of (2.106) forms the CONS [2], $u_n(q)$ should be expanded using $\psi_n(q)$. That is, as a solution of (5.32) we get

$$u_n(q) = \sum_k c_{nk}\psi_k(q), \tag{5.49}$$

where $c_{nk}$ are appropriate coefficients. Since the functions $\psi_k(q)$ are nondegenerate, $u_n(q)$ expressed by (5.49) lacks a definite eigenenergy.

Example 5.3 [1] In the previous examples we studied the perturbation in one-dimensional systems, where all the energy eigenstates are nondegenerate. Here we deal with the change in the energy and quantum states of a hydrogen atom (i.e., a three-dimensional system). Since the energy eigenstates are generally degenerate (see Chap. 3), for simplicity we consider only the ground state, which is nondegenerate. As in the previous two cases, we assume that the perturbation is caused by the applied electric field. As the simplest example, we deal with the properties including the polarizability of the hydrogen atom in its ground state. The polarizability of atoms, molecules, etc. is one of the most fundamental properties in materials science.

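We assume that the electric field is applied along the direction of the $z$-axis; in this case the perturbation term $V$ is expressed as

$$V = -eFz, \tag{5.50}$$

where $e$ is the charge of an electron ($e < 0$). Then, the total Hamiltonian $\tilde{H}$ is described by

$$\tilde{H} = H_0 + V, \tag{5.51}$$

$$H_0 = -\frac{\hbar^2}{2\mu r^2}\left[\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + \frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right] - \frac{e^2}{4\pi\varepsilon_0 r}, \tag{5.52}$$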

where $H_0$ is identical with the Hamiltonian $H$ of (3.35), in which $Z = 1$. In this example, we discuss topics on the polarization and energy shift (Stark effect) caused by the applied electric field [1, 3].

(1) Polarizability of a hydrogen atom in the ground state

As in (5.5) and (5.16), the first-order perturbed state is described by

$$|\Psi_0\rangle \approx |0\rangle + \sum_{k\neq 0}|k\rangle\frac{\langle k|V|0\rangle}{E_0^{(0)} - E_k^{(0)}} = |0\rangle - eF\sum_{k\neq 0}|k\rangle\frac{\langle k|z|0\rangle}{E_0^{(0)} - E_k^{(0)}}, \tag{5.53}$$

where $|0\rangle$ denotes the ground state expressed as (3.301). Since the quantum states $|k\rangle$ represent all the quantum states of the hydrogen atom as implied in (5.3), in the second term of RHS in (5.53) we are considering all those states except for $|0\rangle$. The eigenenergy $E_0^{(0)}$ is identical with $E_1 = -\frac{\hbar^2}{2\mu a^2}$ obtained from (3.258), where $n = 1$. As already noted, we simply numbered the states $|k\rangle$ in order of increasing energy. In the case of $k \neq j$ ($k, j \neq 0$), we may accordingly have either $E_k^{(0)} = E_j^{(0)}$ or $E_k^{(0)} \neq E_j^{(0)}$.

Using (5.51), let us calculate the polarizability of a hydrogen atom in the ground state. In Sect. 4.1 we gave a definition of the electric dipole moment such that

$$\mathbf{P} \equiv e\sum_j \mathbf{x}_j, \tag{4.6}$$

where $\mathbf{x}_j$ is the position vector of the $j$-th charged particle. Placing the proton of hydrogen at the origin, by use of the notation (3.5) $\mathbf{P}$ is expressed as

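$$\mathbf{P} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} P_x \\ P_y \\ P_z \end{pmatrix} = e\mathbf{x} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} ex \\ ey \\ ez \end{pmatrix}.$$

Hence, the $z$-component of the electric dipole moment $P_z$ of the electron is given by

$$P_z = ez. \tag{5.54}$$

The quantum-mechanical analog of (5.54) is given by

$$\langle P_z\rangle = e\langle\Psi_0|z|\Psi_0\rangle, \tag{5.55}$$

where $|\Psi_0\rangle$ is taken from (5.53). Substituting (5.53) into (5.55), we get

$$\begin{aligned}
\langle\Psi_0|z|\Psi_0\rangle &\approx \left\langle\, |0\rangle - eF\sum_{k\neq 0}|k\rangle\frac{\langle k|z|0\rangle}{E_0^{(0)} - E_k^{(0)}} \,\middle|\, z \,\middle|\, |0\rangle - eF\sum_{k\neq 0}|k\rangle\frac{\langle k|z|0\rangle}{E_0^{(0)} - E_k^{(0)}} \,\right\rangle \\
&= \langle 0|z|0\rangle - eF\sum_{k\neq 0}\langle 0|z|k\rangle\frac{\langle k|z|0\rangle}{E_0^{(0)} - E_k^{(0)}} - eF\sum_{k\neq 0}\langle 0|z^\dagger|k\rangle\frac{\langle k|z|0\rangle}{E_0^{(0)} - E_k^{(0)}} \\
&\quad + (eF)^2\sum_{k\neq 0}\langle k|z|k\rangle\frac{\left|\langle k|z|0\rangle\right|^2}{\left[E_0^{(0)} - E_k^{(0)}\right]^2}.
\end{aligned}\tag{5.56}$$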

In (5.56) readers are referred to the computation rules of the inner product such as (13.20) and (13.64). Since $z$ is Hermitian (i.e., $z^\dagger = z$), the second and third terms of RHS of (5.56) are equal. Neglecting the second-order perturbation factor in the fourth term, we have

$$\langle\Psi_0|z|\Psi_0\rangle \approx \langle 0|z|0\rangle - 2eF\sum_{k\neq 0}\frac{\left|\langle k|z|0\rangle\right|^2}{E_0^{(0)} - E_k^{(0)}}.$$

Thus, from (5.55) we obtain

$$\langle P_z\rangle \approx e\langle 0|z|0\rangle - 2e^2F\sum_{k\neq 0}\frac{\left|\langle k|z|0\rangle\right|^2}{E_0^{(0)} - E_k^{(0)}}. \tag{5.57}$$

Using (3.301), the coordinate representation of $\langle 0|z|0\rangle$ is described by

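$$\begin{aligned}
\langle 0|z|0\rangle &= \frac{a^{-3}}{\pi}\iiint_{-\infty}^{\infty} z\,e^{-2r/a}\,dx\,dy\,dz \\
&= \frac{a^{-3}}{\pi}\int_0^\infty r^3e^{-2r/a}\,dr\int_0^\pi\cos\theta\sin\theta\,d\theta\int_0^{2\pi}d\phi \\
&= \frac{a^{-3}}{2\pi}\int_0^\infty r^3e^{-2r/a}\,dr\int_0^\pi\sin 2\theta\,d\theta\int_0^{2\pi}d\phi = 0,
\end{aligned}\tag{5.58}$$

where we used $\int_0^\pi\sin 2\theta\,d\theta = 0$. Then, we have

$$\langle P_z\rangle \approx -2e^2F\sum_{k\neq 0}\frac{\left|\langle k|z|0\rangle\right|^2}{E_0^{(0)} - E_k^{(0)}}. \tag{5.59}$$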

On the basis of classical electromagnetism, we define the polarizability $\alpha$ as [4]

$$\alpha \equiv \langle P_z\rangle/F. \tag{5.60}$$

Here we have an important relationship between the polarizability and the electric dipole moment. That is,

(polarizability) × (electric field) = (electric dipole moment).

From (5.59) and (5.60) we have

$$\alpha = -2e^2\sum_{k\neq 0}\frac{\left|\langle k|z|0\rangle\right|^2}{E_0^{(0)} - E_k^{(0)}}. \tag{5.61}$$

At first glance, evaluating (5.61) appears formidable, but by making the most of the fact that the total wave functions of a hydrogen atom form the CONS, the evaluation is straightforward. The discussion is as follows: First, let us seek an operator $G$ that satisfies [1]

$$z|0\rangle = (GH_0 - H_0G)|0\rangle. \tag{5.62}$$

To this end, we take account of all the quantum states $|k\rangle$ of a hydrogen atom (including $|0\rangle$) that are solutions (or eigenstates of energy) of (3.36). In Sect. 3.8 we obtained the total wave functions described by

$$\tilde{\Lambda}_{l,m}^{(n)} = Y_l^m(\theta, \phi)\,\tilde{R}_l^{(n)}(r), \tag{3.300}$$

where $\tilde{\Lambda}_{l,m}^{(n)}$ is expressed as a product of spherical surface harmonics and radial wave functions. The latter functions are described by associated Laguerre functions. It is well known [2] that the spherical surface harmonics constitute the CONS on a unit sphere and that the associated Laguerre functions form the CONS in the real one-dimensional space. Thus, their product expressed as (3.300), i.e., the aforementioned collection of $|k\rangle$, constitutes the CONS in the real three-dimensional space. This is equivalent to

$$\sum_j |j\rangle\langle j| = E, \tag{5.63}$$

where $E$ is an identity operator. The implications of (5.63) are as follows: Take any function $|f(\mathbf{r})\rangle$ and operate (5.63) on it from the left. Then, we get

$$E f(\mathbf{r}) = f(\mathbf{r}) = \sum_k |k\rangle\langle k|f(\mathbf{r})\rangle = \sum_k f_k|k\rangle. \tag{5.64}$$

In other words, (5.64) implies that any function $f(\mathbf{r})$ can be expanded into a series of $|k\rangle$. The coefficients $f_k$ are defined as

$$f_k \equiv \langle k|f(\mathbf{r})\rangle. \tag{5.65}$$

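Those are so-called “Fourier coefficients.” Related discussion is given in Sect. 10.4 as well as Chaps. 14, 18, and 20. We further impose the following conditions on the operator $G$ defined in (5.62): (i) $G$ commutes with $r$ ($= |\mathbf{r}|$). (ii) $G$ has a functional form described by $G = G(r, \theta)$. (iii) $G$ does not contain a differential operator. On those conditions, we have $\partial G/\partial\phi = 0$. From (3.301), we have $\partial|0\rangle/\partial\phi = 0$. Thus, we get

$$(GH_0 - H_0G)|0\rangle = -\frac{\hbar^2}{2\mu}\left[G\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) - \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right)G - \frac{1}{r^2}\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right)G\right]|0\rangle. \tag{5.66}$$

This calculation is somewhat complicated, and so we calculate (5.66) termwise. With the second term of (5.66), we have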

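$$\begin{aligned}
-\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right)G|0\rangle
&= -\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial G}{\partial r}|0\rangle\right) - \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2G\frac{\partial|0\rangle}{\partial r}\right) \\
&= -\frac{1}{r^2}\left(2r\frac{\partial G}{\partial r}|0\rangle + r^2\frac{\partial^2G}{\partial r^2}|0\rangle + r^2\frac{\partial G}{\partial r}\frac{\partial|0\rangle}{\partial r}\right) - \frac{G}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial|0\rangle}{\partial r}\right) - \frac{\partial G}{\partial r}\frac{\partial|0\rangle}{\partial r} \\
&= -\frac{2}{r}\frac{\partial G}{\partial r}|0\rangle - \frac{\partial^2G}{\partial r^2}|0\rangle - 2\frac{\partial G}{\partial r}\frac{\partial|0\rangle}{\partial r} - \frac{G}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial|0\rangle}{\partial r}\right) \\
&= -\frac{2}{r}\frac{\partial G}{\partial r}|0\rangle - \frac{\partial^2G}{\partial r^2}|0\rangle + \frac{2}{a}\frac{\partial G}{\partial r}|0\rangle - \frac{G}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial|0\rangle}{\partial r}\right),
\end{aligned}\tag{5.67}$$

where with the last equality we used (3.301), namely $|0\rangle \propto e^{-r/a}$, where $a$ is the Bohr radius (Sect. 3.7.1). The last term of (5.67) cancels the first term of (5.66). Then, we get

$$\begin{aligned}
(GH_0 - H_0G)|0\rangle &= \frac{\hbar^2}{2\mu}\left[2\left(\frac{1}{r} - \frac{1}{a}\right)\frac{\partial G}{\partial r}|0\rangle + \frac{\partial^2G}{\partial r^2}|0\rangle + \frac{1}{r^2}\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial G}{\partial\theta}\right)|0\rangle\right] \\
&= \frac{\hbar^2}{2\mu}\left[2\left(\frac{1}{r} - \frac{1}{a}\right)\frac{\partial G}{\partial r} + \frac{\partial^2G}{\partial r^2} + \frac{1}{r^2}\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial G}{\partial\theta}\right)\right]|0\rangle.
\end{aligned}\tag{5.68}$$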

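From (5.62) we obtain

$$\langle\mathbf{r}|z|0\rangle = \langle\mathbf{r}|(GH_0 - H_0G)|0\rangle \tag{5.69}$$

so that we can have the coordinate representation. As a result, we get

$$r\cos\theta\,\phi(1s) = \frac{\hbar^2}{2\mu}\left[2\left(\frac{1}{r} - \frac{1}{a}\right)\frac{\partial G}{\partial r} + \frac{\partial^2G}{\partial r^2} + \frac{1}{r^2}\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial G}{\partial\theta}\right)\right]\phi(1s), \tag{5.70}$$

where $\phi(1s)$ is given by (3.301). Dividing (5.70) by $\phi(1s)$ and rearranging it, we have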

$$\frac{\partial^2G}{\partial r^2} + 2\left(\frac{1}{r} - \frac{1}{a}\right)\frac{\partial G}{\partial r} + \frac{1}{r^2}\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial G}{\partial\theta}\right) = \frac{2\mu}{\hbar^2}r\cos\theta. \tag{5.71}$$

Now, we assume that $G$ has a functional form of

$$G(r, \theta) = g(r)\cos\theta. \tag{5.72}$$

Inserting (5.72) into (5.71), we have

$$\frac{d^2g}{dr^2} + 2\left(\frac{1}{r} - \frac{1}{a}\right)\frac{dg}{dr} - \frac{2g}{r^2} = \frac{2\mu}{\hbar^2}r. \tag{5.73}$$

Equation (5.73) has a regular singular point at the origin and resembles an Euler equation whose general form as a homogeneous equation is described by [5]

$$\frac{d^2g}{dr^2} + \frac{a}{r}\frac{dg}{dr} + \frac{b}{r^2}g = 0, \tag{5.74}$$

where $a$ and $b$ are arbitrary constants. In particular, if we have the following differential equation

$$\frac{d^2g}{dr^2} + \frac{a}{r}\frac{dg}{dr} - \frac{a}{r^2}g = 0, \tag{5.75}$$

we immediately see that one of the particular solutions is $g = r$. However, (5.73) differs from the general Euler equation by the presence of $-\frac{2}{a}\frac{dg}{dr}$. Then, let us assume that a particular solution $g$ has a form of

$$g = pr^2 + qr. \tag{5.76}$$

Inserting (5.76) into (5.73), we get

$$4p - \frac{4pr}{a} - \frac{2q}{a} = \frac{2\mu}{\hbar^2}r. \tag{5.77}$$

Comparing the coefficient of $r$ and the constant term, we obtain

$$p = -\frac{a\mu}{2\hbar^2} \quad\text{and}\quad q = -\frac{a^2\mu}{\hbar^2}. \tag{5.78}$$

Hence, we get

$$g(r) = -\frac{a\mu}{\hbar^2}\left(\frac{r}{2} + a\right)r. \tag{5.79}$$
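That (5.79) solves (5.73) can be confirmed by direct substitution. The sketch below evaluates the residual of (5.73) with central finite differences, which are exact for a quadratic up to rounding; the function names and the parameter values (arbitrary units with $a$, $\mu$, $\hbar$ treated as plain numbers) are assumptions for illustration.

```python
def g(r, a=1.0, mu=1.0, hbar=1.0):
    """Particular solution (5.79): g(r) = -(a mu / hbar^2) (r/2 + a) r."""
    return -(a * mu / hbar**2) * (r / 2 + a) * r

def ode_residual(r, a=1.0, mu=1.0, hbar=1.0, h=1e-2):
    """Left-hand side minus right-hand side of (5.73) at radius r."""
    g_plus, g_mid, g_minus = g(r + h, a, mu, hbar), g(r, a, mu, hbar), g(r - h, a, mu, hbar)
    first = (g_plus - g_minus) / (2 * h)             # dg/dr
    second = (g_plus - 2 * g_mid + g_minus) / h**2   # d^2 g / dr^2
    lhs = second + 2 * (1 / r - 1 / a) * first - 2 * g_mid / r**2
    rhs = 2 * mu / hbar**2 * r
    return lhs - rhs

for r in (0.5, 1.0, 2.5):
    print(r, ode_residual(r, a=1.3, mu=0.8, hbar=1.1))
```

The residual vanishes to rounding accuracy at every radius, which also confirms the signs of $p$ and $q$ in (5.78).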

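Accordingly, we have

$$G(r, \theta) = g(r)\cos\theta = -\frac{a\mu}{\hbar^2}\left(\frac{r}{2} + a\right)r\cos\theta = -\frac{a\mu}{\hbar^2}\left(\frac{r}{2} + a\right)z. \tag{5.80}$$

This is a coordinate representation of $G(r, \theta)$. Returning to (5.62) and operating $\langle j|$ from the left on both sides, we have

$$\langle j|z|0\rangle = \langle j|(GH_0 - H_0G)|0\rangle = \langle j|GH_0|0\rangle - \langle j|H_0G|0\rangle = \left[E_0^{(0)} - E_j^{(0)}\right]\langle j|G|0\rangle. \tag{5.81}$$

Notice here that

$$H_0|0\rangle = E_0^{(0)}|0\rangle \tag{5.82}$$

and

$$\langle j|H_0 = \left[H_0^\dagger|j\rangle\right]^\dagger = \left[H_0|j\rangle\right]^\dagger = \left[E_j^{(0)}|j\rangle\right]^\dagger = E_j^{(0)}\langle j|, \tag{5.83}$$

where we used a computation rule of (1.118) and with the second equality we used the fact that $H_0$ is Hermitian, i.e., $H_0^\dagger = H_0$. Rewriting (5.81), we have

$$\langle j|G|0\rangle = \frac{\langle j|z|0\rangle}{E_0^{(0)} - E_j^{(0)}}. \tag{5.84}$$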

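Multiplying both sides of (5.84) by $\langle 0|z|j\rangle$ and summing over $j$ ($\neq 0$), we obtain

$$\sum_{j\neq 0}\langle 0|z|j\rangle\langle j|G|0\rangle = \sum_{j\neq 0}\frac{\langle 0|z|j\rangle\langle j|z|0\rangle}{E_0^{(0)} - E_j^{(0)}}. \tag{5.85}$$

Adding $\langle 0|z|0\rangle\langle 0|G|0\rangle$ to both sides of (5.85), we have

$$\sum_j\langle 0|z|j\rangle\langle j|G|0\rangle = \sum_{j\neq 0}\frac{\langle 0|z|j\rangle\langle j|z|0\rangle}{E_0^{(0)} - E_j^{(0)}} + \langle 0|z|0\rangle\langle 0|G|0\rangle. \tag{5.86}$$

But, from (5.58) $\langle 0|z|0\rangle = 0$. Moreover, using the completeness of $|j\rangle$ described by (5.63) for LHS of (5.86), we get

$$\langle 0|zG|0\rangle = \sum_j\langle 0|z|j\rangle\langle j|G|0\rangle = \sum_{j\neq 0}\frac{\langle 0|z|j\rangle\langle j|z|0\rangle}{E_0^{(0)} - E_j^{(0)}} = \sum_{j\neq 0}\frac{\left|\langle j|z|0\rangle\right|^2}{E_0^{(0)} - E_j^{(0)}} = -\frac{\alpha}{2e^2}, \tag{5.87}$$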

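where with the second equality we used (5.84) and with the last equality we used (5.61). Also, we used the fact that $z$ is Hermitian. Meanwhile, using (5.80), LHS of (5.87) is expressed as

$$\langle 0|zG|0\rangle = -\frac{a\mu}{\hbar^2}\left\langle 0\left|z\left(\frac{r}{2} + a\right)z\right|0\right\rangle = -\frac{a\mu}{2\hbar^2}\langle 0|zrz|0\rangle - \frac{a^2\mu}{\hbar^2}\langle 0|z^2|0\rangle. \tag{5.88}$$

Noting that $|0\rangle$ is spherically symmetric and that $z$ and $r$ are commutative, (5.88) can be rewritten as

$$\langle 0|zG|0\rangle = -\frac{a\mu}{6\hbar^2}\langle 0|r^3|0\rangle - \frac{a^2\mu}{3\hbar^2}\langle 0|r^2|0\rangle, \tag{5.89}$$

where we used $\langle 0|x^2|0\rangle = \langle 0|y^2|0\rangle = \langle 0|z^2|0\rangle = \langle 0|r^2|0\rangle/3$. Returning to the coordinate representation and taking account of the change of variables from the Cartesian coordinates to the polar coordinates as in (5.58), we get

$$\begin{aligned}
\langle 0|zG|0\rangle &= -\frac{a\mu}{6\hbar^2}\cdot\frac{4\pi}{\pi a^3}\cdot\frac{120a^6}{64} - \frac{a^2\mu}{3\hbar^2}\cdot\frac{4\pi}{\pi a^3}\cdot\frac{3a^5}{4} = -\frac{a\mu}{\hbar^2}\cdot\frac{5a^3}{4} - \frac{a\mu}{\hbar^2}\cdot\frac{4a^3}{4} \\
&= -\frac{a\mu}{\hbar^2}\cdot\frac{9a^3}{4}.
\end{aligned}\tag{5.90}$$

To perform the definite integral calculations of (5.89) and obtain the result of (5.90), modify (3.263) and use the formula described below. That is, differentiate

$$\int_0^\infty e^{-r\xi}\,dr = \frac{1}{\xi}$$

four (or five) times with respect to $\xi$ and replace $\xi$ with $2/a$ to get the result of (5.90) appropriately. Using (5.87) and (5.90), as the polarizability $\alpha$ we finally get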

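$$\alpha = -2e^2\langle 0|zG|0\rangle = 2e^2\cdot\frac{a\mu}{\hbar^2}\cdot\frac{9a^3}{4} = \frac{e^2a\mu}{\hbar^2}\cdot\frac{9a^3}{2} = 4\pi\varepsilon_0\cdot\frac{9a^3}{2} = 18\pi\varepsilon_0a^3, \tag{5.91}$$

where we used

$$a = 4\pi\varepsilon_0\hbar^2/\mu e^2. \tag{5.92}$$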

See Sect. 3.7.1 for this relation.

(2) Stark effect of a hydrogen atom in the ground state

The energy shift of the ground state of a hydrogen atom up to the second order is given by

$$E_0 \approx E_0^{(0)} - eF\langle 0|z|0\rangle + (eF)^2\sum_{j\neq 0}\frac{1}{E_0^{(0)} - E_j^{(0)}}\left|\langle j|z|0\rangle\right|^2. \tag{5.93}$$

Using (5.58) and (5.87) obtained above, we readily get

$$E_0 \approx E_0^{(0)} - \frac{\alpha e^2F^2}{2e^2} = E_0^{(0)} - \frac{\alpha F^2}{2} = E_0^{(0)} - 9\pi\varepsilon_0a^3F^2. \tag{5.94}$$

Experimentally, the energy shift by $9\pi\varepsilon_0a^3F^2$ is well known as the Stark shift.

In the above examples, we have seen how energy levels and related properties are changed by the applied electric field. The energies of the system that result from the applied external field should not be considered as energy eigenvalues, but should be thought of as expectation values. The perturbation theory has a wide application in physical problems. Examples include the evaluation of transition probabilities between quantum states. The theory also deals with the scattering of particles. For these topics, including the treatment of the degenerate case, interested readers are referred to appropriate literature [1, 3].
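The arithmetic behind (5.90), (5.91), and (5.94) can be retraced with a short sketch. It uses the moments $\langle 0|r^p|0\rangle = \frac{4}{a^3}\int_0^\infty r^{p+2}e^{-2r/a}dr$ together with $\int_0^\infty r^ne^{-\xi r}dr = n!/\xi^{n+1}$, which follows from differentiating the formula quoted above. The function names are hypothetical, and the final numbers assume atomic units ($\hbar = \mu = e = 1$, $4\pi\varepsilon_0 = 1$, $a = 1$), which the text itself does not use.

```python
import math

def radial_moment(power, a=1.0):
    """<0| r^power |0> for the 1s state (pi a^3)^(-1/2) exp(-r/a)."""
    # (4 / a^3) * integral_0^inf r^(power+2) exp(-2 r / a) dr
    return 4 / a**3 * math.factorial(power + 2) * (a / 2) ** (power + 3)

def zG_expectation(a=1.0, mu=1.0, hbar=1.0):
    """<0| z G |0> from (5.89)."""
    return -(a * mu / hbar**2) * (radial_moment(3, a) / 6 + a * radial_moment(2, a) / 3)

def polarizability(a=1.0, mu=1.0, hbar=1.0, e=1.0):
    """alpha = -2 e^2 <0| z G |0>, following (5.87) and (5.91)."""
    return -2 * e**2 * zG_expectation(a, mu, hbar)

def stark_shift(F, alpha):
    """Second-order energy shift -alpha F^2 / 2 of (5.94)."""
    return -0.5 * alpha * F**2

alpha = polarizability()
print("<r^2> =", radial_moment(2), " <r^3> =", radial_moment(3))
print("alpha =", alpha, "(atomic units, i.e. 9/2 in units of 4 pi eps0 a^3)")
print("shift at F = 0.01:", stark_shift(0.01, alpha))
```

In these units $\alpha = 9/2$, which corresponds to $18\pi\varepsilon_0a^3$ in (5.91) once the factor $4\pi\varepsilon_0a^3$ is restored.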

5.2 Variational Method

Another approximation method is the variational method. This method also has a wide range of applications in mathematical physics. A major application of the variational method lies in seeking an eigenvalue that is appropriately approximated. Suppose that we have an Hermitian differential operator $D$ that satisfies an eigenvalue equation

$$Dy_n = \lambda_ny_n \quad (n = 1, 2, 3, \dots), \tag{5.95}$$

where $\lambda_n$ is a real eigenvalue and $y_n$ is a corresponding eigenfunction. We have already encountered typical differential operators and eigenvalue equations in (1.63) and (1.64). In (5.95) we assume that

$$\lambda_1 \le \lambda_2 \le \lambda_3 \le \cdots, \tag{5.96}$$

where corresponding eigenstates may be degenerate. We also assume that a collection $\{y_n;\ n = 1, 2, 3, \dots\}$ constitutes the CONS. Now suppose that we have a function $u$ that satisfies the same boundary conditions (BCs) as those for $y_n$ that is a solution of (5.95). Then, we have

$$\langle y_n|Du\rangle = \langle Dy_n|u\rangle = \langle\lambda_ny_n|u\rangle = \lambda_n^{*}\langle y_n|u\rangle = \lambda_n\langle y_n|u\rangle. \tag{5.97}$$

In (5.97) we used (1.132) and the computation rule of the inner product (see Sect. 13.1). Also, we used the fact that any eigenvalue of an Hermitian operator is real (Sect. 1.4). From the assumption that the functions $\{y_n\}$ constitute the CONS, $u$ can be expanded in a series described by

$$u = \sum_j c_j|y_j\rangle \tag{5.98}$$

with

$$c_j \equiv \langle y_j|u\rangle. \tag{5.99}$$

This relation is in parallel with (5.64). Similarly, $Du$ can be expanded such that

$$Du = \sum_j d_j|y_j\rangle = \sum_j\langle y_j|Du\rangle|y_j\rangle = \sum_j\lambda_j\langle y_j|u\rangle|y_j\rangle = \sum_j\lambda_jc_j|y_j\rangle, \tag{5.100}$$

where $d_j$ is an expansion coefficient and with the third equality we used (5.97) and with the last equality we used (5.99). Comparing the individual coefficients of $|y_j\rangle$ in (5.100), we get

$$d_j = \lambda_jc_j. \tag{5.101}$$

Thus, we obtain

$$Du = \sum_j\lambda_jc_j|y_j\rangle. \tag{5.102}$$

Comparing (5.98) and (5.102), we find that we may operate $D$ termwise on (5.98).

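Meanwhile, taking an inner product $\langle u|Du\rangle$ again termwise with (5.100), we get

$$\begin{aligned}
\langle u|Du\rangle &= \Big\langle u\Big|\sum_j\lambda_jc_j|y_j\rangle\Big\rangle = \sum_j\lambda_jc_j\langle u|y_j\rangle = \sum_j\lambda_jc_jc_j^{*} = \sum_j\lambda_j|c_j|^2 \\
&\ge \sum_j\lambda_1|c_j|^2 = \lambda_1\sum_j|c_j|^2,
\end{aligned}\tag{5.103}$$

where with the third equality we used (5.99) in combination with $\langle u|y_j\rangle = \langle y_j|u\rangle^{*} = c_j^{*}$. With $\langle u|u\rangle$, we have

$$\langle u|u\rangle = \Big\langle\sum_jc_jy_j\Big|\sum_jc_jy_j\Big\rangle = \sum_jc_j^{*}c_j\langle y_j|y_j\rangle = \sum_j|c_j|^2, \tag{5.104}$$

where with the last equality we used the fact that the functions $\{y_n\}$ are normalized. Comparing once again (5.103) and (5.104), we finally get

$$\langle u|Du\rangle \ge \lambda_1\langle u|u\rangle \quad\text{or}\quad \lambda_1 \le \frac{\langle u|Du\rangle}{\langle u|u\rangle}. \tag{5.105}$$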
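The inequality (5.105) has a simple finite-dimensional analog that can be tried out directly. In the sketch below a real symmetric 2×2 matrix stands in for the Hermitian operator $D$ and real vectors stand in for the trial functions $u$; this substitution, the matrix entries, and the function names are assumptions made for illustration only.

```python
import math
import random

def rayleigh_quotient(matrix, u):
    """<u|D u> / <u|u> for a real symmetric matrix and a real vector."""
    Du = [sum(row[j] * u[j] for j in range(len(u))) for row in matrix]
    return sum(ui * dui for ui, dui in zip(u, Du)) / sum(ui * ui for ui in u)

D = [[2.0, 1.0],
     [1.0, 3.0]]
lambda_1 = 2.5 - math.sqrt(1.25)      # smallest eigenvalue of D
y_1 = [1.0, lambda_1 - 2.0]           # corresponding eigenvector

random.seed(0)
trials = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(1000)]
smallest = min(rayleigh_quotient(D, u) for u in trials)

print("lambda_1               =", lambda_1)
print("quotient for u = y_1   =", rayleigh_quotient(D, y_1))
print("smallest random trial  =", smallest)
```

Every trial vector gives a quotient at or above $\lambda_1$, and the bound is attained when the trial vector is the lowest eigenvector.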

If we choose $y_1$ for $u$, we have an equality in (5.105). Thus, we reach the following important theorem.

Theorem 5.1 Suppose that we have a linear differential equation described by

$$Dy = \lambda y, \tag{5.106}$$

where $D$ is a suitable Hermitian differential operator. Suppose also that under appropriate boundary conditions (BCs), (5.106) has a series of (real) eigenvalues. Then, the smallest eigenvalue is equal to the minimum of $\frac{\langle u|Du\rangle}{\langle u|u\rangle}$.

Using this powerful theorem, we are able to find a suitably approximated smallest eigenvalue. We have simple examples next. As before, let us focus on the change in the energies and quantum states caused by the applied electric field for a particle in a one-dimensional potential well and a harmonic oscillator. The latter problem will be left for readers as an exercise. As a specific case, for simplicity, we consider only the ground state and first excited state to choose a trial function. First, we deal with the problem common to both cases. Then, we discuss the features individually.

As a general case, suppose that $H$ is the Hamiltonian of the system (either a particle in a one-dimensional potential well or a harmonic oscillator). We calculate an expectation value of energy $\langle H\rangle$ when the system is under the influence of the electric field. The Hamiltonian is described by


\[ H = H_0 - eFx, \tag{5.107} \]

where $H_0$ is the Hamiltonian without the external field $F$. We suppose that the quantum state $| u \rangle$ is expressed as

\[ | u \rangle \approx | 0 \rangle + a | 1 \rangle , \tag{5.108} \]

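where $| 0 \rangle$ and $| 1 \rangle$ denote the ground state and first excited state, respectively; $a$ is an unknown parameter to be decided after the analysis based on the variational method. In the present case, $| u \rangle$ is a trial function. Then, we have

\[ \langle H \rangle \equiv \frac{\langle u | Hu \rangle}{\langle u | u \rangle} . \tag{5.109} \]

With the numerator, we have

\[ \begin{aligned} \langle u | Hu \rangle &= \big( \langle 0 | + a^{*} \langle 1 | \big) ( H_0 - eFx ) \big( | 0 \rangle + a | 1 \rangle \big) \\ &= \langle 0 | H_0 | 0 \rangle + a \langle 0 | H_0 | 1 \rangle - eF \langle 0 | x | 0 \rangle - eFa \langle 0 | x | 1 \rangle \\ &\quad + a^{*} \langle 1 | H_0 | 0 \rangle + a^{*} a \langle 1 | H_0 | 1 \rangle - eFa^{*} \langle 1 | x | 0 \rangle - eFa^{*} a \langle 1 | x | 1 \rangle \\ &= \langle 0 | H_0 | 0 \rangle - eFa \langle 0 | x | 1 \rangle + a^{*} a \langle 1 | H_0 | 1 \rangle - eFa^{*} \langle 1 | x | 0 \rangle . \end{aligned} \tag{5.110} \]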

Note that in (5.110) four terms vanish because of the symmetry requirement of the symmetric potential well and harmonic oscillator with respect to the origin as well as because of the orthogonality between the states $| 0 \rangle$ and $| 1 \rangle$. Here $x$ is Hermitian and we assume that the coordinate representation of $| 0 \rangle$ and $| 1 \rangle$ is real as in (1.101) and (1.102) or in (2.106). Then, we have

\[ \langle 1 | x | 0 \rangle = \langle 1 | x | 0 \rangle^{*} = \langle 0 | x^{\dagger} | 1 \rangle = \langle 0 | x | 1 \rangle . \]

Also, assuming that $a$ is a real number, we rewrite (5.110) as

\[ \langle u | Hu \rangle = \langle 0 | H_0 | 0 \rangle - 2eFa \langle 0 | x | 1 \rangle + a^2 \langle 1 | H_0 | 1 \rangle . \tag{5.111} \]

As for the denominator of (5.109), we have

\[ \langle u | u \rangle = \big( \langle 0 | + a \langle 1 | \big) \big( | 0 \rangle + a | 1 \rangle \big) = 1 + a^2 , \tag{5.112} \]

where we used the normalization conditions $\langle 0 | 0 \rangle = \langle 1 | 1 \rangle = 1$ and the orthogonality conditions $\langle 0 | 1 \rangle = \langle 1 | 0 \rangle = 0$. Thus, we get


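\[ \langle H \rangle = \frac{\langle 0 | H_0 | 0 \rangle - 2eFa \langle 0 | x | 1 \rangle + a^2 \langle 1 | H_0 | 1 \rangle}{1 + a^2} . \tag{5.113} \]

Here, defining the ground state energy and first excited energy as $E_0$ and $E_1$, respectively, we put

\[ H_0 | 0 \rangle = E_0 | 0 \rangle \quad \text{and} \quad H_0 | 1 \rangle = E_1 | 1 \rangle , \tag{5.114} \]

where $E_0 \neq E_1$. As already shown in Chaps. 1 and 2, both the systems have nondegenerate energy eigenstates. Then, we have

\[ \langle H \rangle = \frac{E_0 - 2eFa \langle 0 | x | 1 \rangle + a^2 E_1}{1 + a^2} . \tag{5.115} \]

Now we wish to determine the condition where $\langle H \rangle$ has an extremum. That is, we are seeking $a$ that satisfies

\[ \frac{\partial \langle H \rangle}{\partial a} = 0 . \tag{5.116} \]

Calculating LHS of (5.116) by use of (5.115), we have

\[ \frac{\partial \langle H \rangle}{\partial a} = \frac{2eF \langle 0 | x | 1 \rangle a^2 + 2 (E_1 - E_0) a - 2eF \langle 0 | x | 1 \rangle}{(1 + a^2)^2} . \tag{5.117} \]

For $\partial \langle H \rangle / \partial a$ to be zero, we must have

\[ eF \langle 0 | x | 1 \rangle a^2 + (E_1 - E_0) a - eF \langle 0 | x | 1 \rangle = 0 . \tag{5.118} \]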

Then, solving the quadratic equation (5.118), we obtain

\[ a = \frac{E_0 - E_1 \pm (E_1 - E_0) \sqrt{1 + 4 \dfrac{[eF \langle 0 | x | 1 \rangle]^2}{(E_1 - E_0)^2}}}{2eF \langle 0 | x | 1 \rangle} . \tag{5.119} \]

Taking the plus sign in the numerator of (5.119) and using

\[ \sqrt{1 + \Delta} \approx 1 + \frac{\Delta}{2} \tag{5.120} \]

for small $\Delta$, we obtain


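\[ a \approx \frac{eF \langle 0 | x | 1 \rangle}{E_1 - E_0} . \tag{5.121} \]

Thus, with the optimized state of (5.108) we get

\[ | u \rangle \approx | 0 \rangle + \frac{eF \langle 0 | x | 1 \rangle}{E_1 - E_0} | 1 \rangle . \tag{5.122} \]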
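The step from (5.119) to (5.121) can be checked with a short numerical sketch. The values of $E_0$, $E_1$, and $k = eF\langle 0 | x | 1 \rangle$ below are illustrative assumptions in arbitrary units, and the function names are hypothetical.

```python
import math


def energy(a, E0, E1, k):
    """<H> of (5.115), with k = eF<0|x|1> and a real parameter a."""
    return (E0 - 2.0 * k * a + a * a * E1) / (1.0 + a * a)


def optimal_a(E0, E1, k):
    """Root of (5.118) obtained with the plus sign in (5.119)."""
    gap = E1 - E0
    return (-gap + math.sqrt(gap * gap + 4.0 * k * k)) / (2.0 * k)


# Illustrative numbers (assumptions, not taken from the text).
E0, E1, k = 1.0, 4.0, 0.03

a_exact = optimal_a(E0, E1, k)          # exact stationary point
a_first_order = k / (E1 - E0)           # approximation (5.121)
energy_at_minimum = energy(a_exact, E0, E1, k)
```

For a weak field ($k$ small compared with $E_1 - E_0$) the two values of $a$ agree closely, and `energy_at_minimum` lies slightly below $E_0$, as expected for the ground state in the field.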

If one compares (5.122) with (5.17), one should recognize that these two equations are the same within the first-order approximation. Notice that putting $i = 0$ and $k = 1$ in (5.17) and replacing $V$ with $-eFx$, we have the expression similar to (5.122). Notice once again that $\langle 0 | x | 1 \rangle = \langle 1 | x | 0 \rangle$ as $x$ is an Hermitian operator and that we are considering real inner products. Inserting (5.121) into (5.115) and approximating

\[ \frac{1}{1 + a^2} \approx 1 - a^2 , \tag{5.123} \]

we can estimate $\langle H \rangle$. The estimation depends upon the nature of the system we choose. The next examples deal with these specific characteristics.

Example 5.4 We adopt the same problem as Example 5.1. That is, we consider how the energy of a particle carrying a charge $e$ confined within a one-dimensional potential well is changed by the applied electric field. Here we deal with it using the variational method. We deal with the same Hamiltonian as in the case of Example 5.1. It is described as

\[ H = -\frac{\hbar^2}{2m} \frac{d^2}{dx^2} - eFx . \tag{5.22} \]

By putting

\[ H_0 \equiv -\frac{\hbar^2}{2m} \frac{d^2}{dx^2} , \tag{5.124} \]

we rewrite the Hamiltonian as

\[ H = H_0 - eFx . \tag{5.125} \]

Expressing the ground state as $| 0 \rangle$ and the first excited state as $| 1 \rangle$, we adopt a trial function $| u \rangle$ as

\[ | u \rangle = | 0 \rangle + a | 1 \rangle . \tag{5.108} \]

In the coordinate representation, we have

\[ | 0 \rangle = \sqrt{\frac{1}{L}} \cos \frac{\pi}{2L} x \tag{5.26} \]

and

\[ | 1 \rangle = \sqrt{\frac{1}{L}} \sin \frac{\pi}{L} x . \tag{5.126} \]

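From Example 1.2 of Sect. 1.3, we have

\[ \langle 0 | H_0 | 0 \rangle = E_0 = \frac{\hbar^2}{2m} \frac{\pi^2}{4L^2} , \qquad \langle 1 | H_0 | 1 \rangle = E_1 = \frac{\hbar^2}{2m} \frac{4\pi^2}{4L^2} = 4E_0 . \tag{5.127} \]

Inserting these results into (5.121), we immediately get

\[ a \approx eF \frac{\langle 0 | x | 1 \rangle}{3E_0} . \tag{5.128} \]

As in Example 5.1 of Sect. 5.1, we have

\[ \langle 0 | x | 1 \rangle = \frac{32L}{9\pi^2} . \tag{5.129} \]

For this calculation, use the coordinate representation of (5.26) and (5.126). Thus, we get

\[ a \approx eF \frac{32 \cdot 8\, mL^3}{27 \pi^4 \hbar^2} . \tag{5.130} \]

Meanwhile, from (5.108) we obtain

\[ | u \rangle \approx | 0 \rangle + eF \frac{\langle 0 | x | 1 \rangle}{3E_0} | 1 \rangle . \tag{5.131} \]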

The resulting $| u \rangle$ in (5.131) is the same as (5.31), which was obtained by the first-order approximation of (5.30). Inserting (5.130) into (5.113) and approximating the denominator as before such that

\[ \frac{1}{1 + a^2} \approx 1 - a^2 , \tag{5.123} \]

and further using (5.130) for $a$, we get

\[ \langle H \rangle \approx E_0 - (eF)^2 \frac{32^2 \cdot 8\, mL^4}{243 \pi^6 \hbar^2} . \tag{5.132} \]

Once again, this is the same result as that of (5.28) and (5.29) where only $n = 1$ is taken into account.

Example 5.5 We adopt the same problem as Example 5.2. The calculation procedures are almost the same as those of Example 5.2. We use

\[ \langle 0 | q | 1 \rangle = \sqrt{\frac{\hbar}{2m\omega}} \quad \text{and} \quad E_1 - E_0 = \hbar \omega . \tag{5.133} \]

The detailed procedures are left for readers as an exercise. We should get the same results as those obtained in Example 5.2. As discussed in the above five simple examples, we showed the calculation procedures of the perturbation method and variational method. These methods not only supply us with suitable approximation techniques, but also provide physical and mathematical insight in many fields of natural science.
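The numbers quoted in Example 5.4 can be cross-checked numerically. The sketch below assumes units with $\hbar = m = L = 1$ and a small illustrative value for the product $eF$; the helper `simpson` is a plain composite Simpson rule written for this check and is not part of the text.

```python
import math

HBAR = 1.0   # assumed units: hbar = m = L = 1
MASS = 1.0
L = 1.0
EF = 0.01    # assumed small value of the product e*F


def psi0(x):
    """Ground state (5.26) of the well extending over [-L, L]."""
    return math.cos(math.pi * x / (2.0 * L)) / math.sqrt(L)


def psi1(x):
    """First excited state (5.126)."""
    return math.sin(math.pi * x / L) / math.sqrt(L)


def simpson(f, lower, upper, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (upper - lower) / n
    total = f(lower) + f(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lower + i * h)
    return total * h / 3.0


# Matrix element <0|x|1>: numerical integral versus the closed form (5.129).
x01_numeric = simpson(lambda x: psi0(x) * x * psi1(x), -L, L)
x01_closed = 32.0 * L / (9.0 * math.pi ** 2)

E0 = HBAR ** 2 * math.pi ** 2 / (8.0 * MASS * L ** 2)   # (5.127)
E1 = 4.0 * E0
k = EF * x01_closed

# Stationary point of (5.115) from (5.119) with the plus sign, and its energy.
gap = E1 - E0
a_min = (-gap + math.sqrt(gap ** 2 + 4.0 * k ** 2)) / (2.0 * k)
energy_min = (E0 - 2.0 * k * a_min + a_min ** 2 * E1) / (1.0 + a_min ** 2)

# Approximate estimate (5.132).
energy_5_132 = E0 - EF ** 2 * 32 ** 2 * 8 * MASS * L ** 4 / (
    243.0 * math.pi ** 6 * HBAR ** 2)
```

Under these assumptions the integral reproduces $32L/9\pi^2$, and the exactly minimized energy agrees with (5.132) up to terms of higher order in $eF$.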

References
1. Sunakawa S (1991) Quantum mechanics. Iwanami, Tokyo. (in Japanese)
2. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover, New York
3. Schiff LI (1955) Quantum mechanics, 2nd edn. McGraw-Hill, New York
4. Jackson JD (1999) Classical electrodynamics, 3rd edn. Wiley, New York
5. Coddington EA (1989) An introduction to ordinary differential equations. Dover, New York

Chapter 6

Theory of Analytic Functions

Theory of analytic functions is one of the major fields of modern mathematics. Its application covers a broad range of topics of natural science. A complex function $f(z)$, or a function that takes a complex number $z$ as a variable, has various properties that often differ from those of functions that take a real number $x$ as a variable. In particular, the analytic functions hold a paramount position in the complex analysis. In this chapter we explore various features of the analytic functions accordingly. From a practical point of view, the theory of analytic functions is very frequently utilized for the calculation of real definite integrals. For this reason, we describe the related topics together with tangible examples. The complex plane (or Gaussian plane) can be dealt with as a topological space where the metric (or distance function) is defined. Since the complex plane has a two-dimensional extension, we can readily imagine and investigate its topological feature. Set theory allows us to make an axiomatic approach along with the topology. Therefore, we introduce basic notions and building blocks of the set theory and topology.

6.1 Set and Topology

A complex number $z$ is usually expressed as

\[ z = x + iy, \tag{6.1} \]

where $x$ and $y$ are real numbers and $i$ is an imaginary unit. Graphically, the number $z$ is indicated as a point in a complex plane where the real axis is drawn as abscissa and the imaginary axis is depicted as ordinate (see Fig. 6.1). Since the complex plane has a two-dimensional extension, we can readily imagine the domain of variability of $z$ and make a graphical object for it on the complex plane. A disk-like diagram that is enclosed with a closed curve $C$ is frequently dealt with in the theory of analytic functions. Such a diagram provides a good subject for study of the set theory and topology. Before developing the theory of complex numbers and analytic functions, we briefly mention the basic notions as well as notations and meanings of sets and topology.

Fig. 6.1 Complex plane where a complex number z is depicted

© Springer Nature Singapore Pte Ltd. 2020 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3_6

6.1.1 Basic Notions and Notations

A set comprises numbers (either real or complex) or mathematical objects, more generally. The latter include, e.g., vectors, their transformation, etc., as we shall see in various illustrations in Parts III and IV. The set $A$ is described such that

\[ A = \{1, 2, 3, \dots, 10\} \quad \text{or} \quad B = \{x;\ a < x < b,\ a, b \in \mathbb{R}\}, \tag{6.2} \]

where $A$ contains ten elements in the former case and uncountably infinite elements are contained in the latter set $B$. In the latter case (sometimes in the former case as well), the elements are usually called points. When we speak of the sets, a universal set [1] is implied in the background. This set contains all the sets in consideration of the study and is usually clearly defined in the context of the discussion. For example, with the real analysis the universal set is $\mathbb{R}$ and with the complex analysis it is $\mathbb{C}$ (an entire complex plane). In the latter case a two-dimensional complex plane (see Fig. 6.1) is frequently used as the universal set. We study various characteristics of sets (or subsets) that are contained in the universal set and use a Venn diagram to represent them. Figure 6.2a illustrates an example of a Venn diagram that shows a subset $A$. As is often the case, the universal set $U$ is omitted in Fig. 6.2a. To show a (sub)set $A$ we usually depict it with a closed curve. If an element $a$ is contained in $A$, we write

\[ a \in A \tag{6.3} \]

and depict it as a point inside the closed curve. To show that $b$ is not contained in $A$, we write

\[ b \notin A . \]

In that case, we depict $b$ outside the closed curve of Fig. 6.2a.

Fig. 6.2 Venn diagrams. (a) An example of the Venn diagram. (b) A subset $A$ and its complement $A^c$. $U$ shows the universal set

To show that a set $A$ is contained in another set $B$, we write

\[ A \subset B . \tag{6.4} \]

If (6.4) holds, $A$ is called a subset of $B$. Naturally, every set contained in the universal set is its subset. We need to explicitly show a set that has no element. The relevant set is called an empty set and denoted by $\emptyset$. Examples are given below.

\[ \{x;\ x^2 < 0,\ x \in \mathbb{R}\} = \emptyset, \quad \{x;\ x \neq x\} = \emptyset, \quad x \notin \emptyset, \ \text{etc.} \]

The subset $A$ may have different cardinal numbers (or cardinalities) depending on the nature of $A$. To explicitly show this, elements are sometimes indexed as, e.g., $a_i$ ($i = 1, 2, \dots$) or $a_\lambda$ ($\lambda \in \Lambda$), where $\Lambda$ can be a finite set or infinite set with different cardinalities; (6.2) is an example. A complement (or complementary set) of $A$ is defined as a difference $U - A$ and denoted by $A^c\ (\equiv U - A)$. That is, the set $A^c$ is said to be a complement of $A$ with respect to $U$ and indicated with a shaded area in Fig. 6.2b. Let $A$ and $B$ be two different subsets in $U$ (Fig. 6.3). Figure 6.3 represents a sum (or union, or union of sets) $A \cup B$, an intersection $A \cap B$, and differences $A - B$ and $B - A$. More specifically, $A - B$ is defined as the subset of $A$ that remains after the elements of $B$ are removed from $A$, and is referred to as a set difference of $A$ relative to $B$. We have

\[ A \cup B = (A - B) \cup (B - A) \cup (A \cap B) . \tag{6.5} \]

Fig. 6.3 Two different subsets $A$ and $B$ in a universal set $U$. The diagram shows the sum ($A \cup B$), intersection ($A \cap B$), and differences ($A - B$ and $B - A$)

If $A$ has no element in common with $B$, $A - B = A$ and $B - A = B$ with $A \cap B = \emptyset$. Then, (6.5) trivially holds. Notice also that $A \cup B$ is the smallest set that contains both $A$ and $B$. In (6.5) the elements must not be doubly counted. Thus, we have

\[ A \cup A = A . \]

The well-known de Morgan's law can be understood graphically using Fig. 6.3.

\[ (A \cup B)^c = A^c \cap B^c \quad \text{and} \quad (A \cap B)^c = A^c \cup B^c . \tag{6.6} \]

This can be shown as follows: With $\forall u \in U$ we have

\[ u \in (A \cup B)^c \Longleftrightarrow u \notin A \cup B \Longleftrightarrow u \notin A \ \text{and} \ u \notin B \Longleftrightarrow u \in A^c \ \text{and} \ u \in B^c \Longleftrightarrow u \in A^c \cap B^c . \]

Furthermore, we have

\[ u \in (A \cap B)^c \Longleftrightarrow u \notin A \cap B \Longleftrightarrow u \notin A \ \text{or} \ u \notin B \Longleftrightarrow u \in A^c \ \text{or} \ u \in B^c \Longleftrightarrow u \in A^c \cup B^c . \]

We have other important relations such as the distributive law described by

\[ (A \cup B) \cap C = (A \cap C) \cup (B \cap C), \tag{6.7} \]

\[ (A \cap B) \cup C = (A \cup C) \cap (B \cup C) . \tag{6.8} \]

The confirmation of (6.7) and (6.8) is left for readers. The Venn diagrams are intuitively acceptable, but we need more rigorous discussion to characterize and analyze various aspects of sets (vide infra).
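As a concrete illustration, the identities (6.5) to (6.8) can be checked on small finite sets. This is only a sketch on one example, not a proof; the particular sets `A`, `B`, `C` and the helper `complement` are assumptions chosen for illustration.

```python
U = set(range(1, 11))      # universal set, taken here as {1, 2, ..., 10}
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 6, 8, 10}


def complement(S):
    """Complement of S with respect to the universal set U."""
    return U - S


# (6.5): the union decomposes into two differences and the intersection.
decomposition_holds = (A | B) == ((A - B) | (B - A) | (A & B))

# (6.6): de Morgan's law.
de_morgan_union = complement(A | B) == (complement(A) & complement(B))
de_morgan_inter = complement(A & B) == (complement(A) | complement(B))

# (6.7) and (6.8): distributive law.
distributive_1 = ((A | B) & C) == ((A & C) | (B & C))
distributive_2 = ((A & B) | C) == ((A | C) & (B | C))
```

Python's built-in `set` type stores each element once, which mirrors the remark that elements must not be doubly counted, so that `A | A == A`.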

6.1.2 Topological Spaces and Their Building Blocks

On the basis of the aforementioned discussion, we define a topology and topological space next. Let $T$ be a universal set. Let $\tau$ be a collection of subsets of $T$. If the collection $\tau$ satisfies the following axioms, $\tau$ is called a topology on $T$. The relevant axioms are as follows:

(O1) $T \in \tau$ and $\emptyset \in \tau$, where $\emptyset$ is an empty set.
(O2) If $O_1, O_2, \dots, O_n \in \tau$, $O_1 \cap O_2 \cap \dots \cap O_n \in \tau$.
(O3) If $O_\lambda\ (\lambda \in \Lambda) \in \tau$, $\bigcup_{\lambda \in \Lambda} O_\lambda \in \tau$.

In the above axioms, $T$ is called an underlying set. The coupled object of $T$ and $\tau$ is said to be a topological space and denoted by $(T, \tau)$. The members (i.e., subsets) of $\tau$ are defined as open sets. (If we do not assume the topological space, the underlying set is equivalent to the universal set. Henceforth, however, we do not have to be too strict with the difference in this kind of terminology.)

Let $(T, \tau)$ be a topological space with a subset $O' \subset T$. Let $\tau'$ be defined such that

\[ \tau' \equiv \{O' \cap O;\ O \in \tau\} . \]

Then, $(O', \tau')$ can be regarded as a topological space. In this case $\tau'$ is called a relative topology for $O'$ [2, 3] and $O'$ is referred to as a subspace of $T$. The topological space $(O', \tau')$ shares the same topological feature as that of $(T, \tau)$. Notice that $O'$ does not necessarily need to be an open set of $T$. But, if $O'$ is an open set of $T$, then any open set belonging to $O'$ is an open set of $T$ as well. This is evident from the definition of the relative topology and Axiom (O2).

The abovementioned Axioms (O1)–(O3) sound somewhat pretentious and the definition of the open sets seems to be “descent from heaven.” Nonetheless, these axioms and terminology turn out useful soon. Once a topological space $(T, \tau)$ is given, subsets of various types arise and a variety of relationships among them ensue as well. Note that the above axioms mention nothing about a set that is different from an open set. Let $S$ be a subset (maybe an open set or maybe not) of $T$ and let us think of the properties of $S$.

Definition 6.1 Let $(T, \tau)$ be a topological space and $S$ be an open set of $\tau$; i.e., $S \in \tau$. Then, a subset defined as a complement of $S$ in $(T, \tau)$; i.e., $S^c = T - S$ is called a closed set in $(T, \tau)$.

Replacing $S$ with $S^c$ in Definition 6.1, we have $(S^c)^c = S = T - S^c$. The above definition may be rephrased as follows: Let $(T, \tau)$ be a topological space and let $A$ be a subset of $T$. Then, if $A^c = T - A \in \tau$, then $A$ is a closed set in $(T, \tau)$. In parallel with the axioms (O1) to (O3), we have the following axioms related to the closed sets of $(T, \tau)$: Let $\tilde{\tau}$ be a collection of all the closed sets of $(T, \tau)$.

(C1) $T \in \tilde{\tau}$ and $\emptyset \in \tilde{\tau}$, where $\emptyset$ is an empty set.
(C2) If $C_1, C_2, \dots, C_n \in \tilde{\tau}$, $C_1 \cup C_2 \cup \dots \cup C_n \in \tilde{\tau}$.
(C3) If $C_\lambda\ (\lambda \in \Lambda) \in \tilde{\tau}$, $\bigcap_{\lambda \in \Lambda} C_\lambda \in \tilde{\tau}$.
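For a finite underlying set the axioms can be checked by brute force. The following sketch assumes a hypothetical three-point set `T` and a hand-picked collection `tau`; for a finite collection, closure under pairwise intersections and unions is enough to cover (O2) and (O3).

```python
from itertools import combinations

T = frozenset({1, 2, 3})
# Hypothetical collection of subsets of T, chosen for illustration.
tau = {frozenset(), frozenset({1}), frozenset({1, 2}), T}


def is_topology(underlying, collection):
    """Check (O1)-(O3) for a finite collection of subsets of `underlying`."""
    if underlying not in collection or frozenset() not in collection:
        return False                              # (O1) fails
    for X, Y in combinations(collection, 2):
        if (X & Y) not in collection:             # (O2); pairwise suffices
            return False
        if (X | Y) not in collection:             # (O3) in the finite case
            return False
    return True


def closed_sets(underlying, collection):
    """Complements of the open sets (Definition 6.1)."""
    return {underlying - O for O in collection}


closed = closed_sets(T, tau)
# A collection that violates (O3): the union {1, 2} is missing.
not_a_topology = {frozenset(), frozenset({1}), frozenset({2}), T}
```

Here `is_topology(T, tau)` holds while `is_topology(T, not_a_topology)` does not. The set `closed` is again closed under unions and intersections, which is the content of (C1) to (C3) in the finite case.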


Fig. 6.4 Venn diagram that represents a neighborhood S of a. S contains an open set O that contains a

These axioms are obvious from those of (O1)–(O3) along with de Morgan's law (6.6). Next, we classify and characterize a variety of elements (or points) and subsets as well as relationships among them in terms of the open sets and closed sets. We make the most of these notions and properties to study various aspects of topological spaces and their structures. In what follows, we assume the presence of a topological space $(T, \tau)$.

(a) Neighborhoods [4]

Definition 6.2 Let $(T, \tau)$ be a topological space and $S$ be a subset of $T$. If $S$ contains an open set $\exists O$ that contains an element (or point) $a \in T$, i.e., $a \in O \subset S$, then $S$ is called a neighborhood of $a$.

Figure 6.4 gives a Venn diagram that represents a neighborhood of $a$. This simple definition occupies an important position in the study of the topological spaces and the theory of analytic functions.

(b) Interior and Closure [4]

With the notion of neighborhoods at the core, we shall see further characterization of sets and elements of the topological spaces.

Definition 6.3 Let $x$ be a point of $T$ and $S$ be a subset of $T$. If $S$ is a neighborhood of $x$, $x$ is said to be in the interior of $S$. In this case, $x$ is called an interior point of $S$. The interior of $S$ is denoted by $S^{\circ}$.

The interior as a technical term can be understood as follows: Suppose that $x \notin S$. Then, by Definition 6.2 $S$ is not a neighborhood of $x$. By Definition 6.3, this leads to $x \notin S^{\circ}$. This statement can be translated into $x \in S^c \Rightarrow x \in (S^{\circ})^c$. That is, $S^c \subset (S^{\circ})^c$. This is equivalent to

\[ S^{\circ} \subset S . \tag{6.9} \]

Definition 6.4 Let $x$ be a point of $T$ and $S$ be a subset of $T$. If for any neighborhood $N$ of $x$ we have $N \cap S \neq \emptyset$, $x$ is said to be in the closure of $S$. In this case, $x$ is called an adherent point of $S$. The closure of $S$ is denoted by $\overline{S}$.


According to the above definition, with the adherent point $x$ of $S$ we explicitly write $x \in \overline{S}$. Think of negation of the statement of Definition 6.4. That is, with some neighborhood $\exists N$ of $x$ we have (i) $N \cap S = \emptyset \Longleftrightarrow x \notin \overline{S}$. Meanwhile, (ii) $N \cap S = \emptyset \Rightarrow x \notin S$. Combining (i) and (ii), we have $x \notin \overline{S} \Rightarrow x \notin S$. Similarly to the above argument about the interior, we get

\[ S \subset \overline{S} . \tag{6.10} \]

Combining (6.9) and (6.10), we get

\[ S^{\circ} \subset S \subset \overline{S} . \tag{6.11} \]

In relation to the interior and closure, we have the following important lemmas.

Lemma 6.1 Let $S$ be a subset of $T$ and $O$ be any open set contained in $S$. Then, we have $O \subset S^{\circ}$.

Proof Suppose that with an arbitrary point $x$, $x \in O$. Since we have $x \in O \subset S$, $S$ is a neighborhood of $x$ (due to Definition 6.2). Then, from Definition 6.3 we have $x \in S^{\circ}$. Then, we have $x \in O \Rightarrow x \in S^{\circ}$. This implies $O \subset S^{\circ}$. This completes the proof. ∎

Lemma 6.2 Let $S$ be a subset of $T$. If $x \in S^{\circ}$, then there is some open set $\exists O$ that satisfies $x \in O \subset S$.

Proof Suppose that with a point $x$, $x \in S^{\circ}$. Then by Definition 6.3, $S$ is a neighborhood of $x$. Meanwhile, by Definition 6.2 there is some open set $\exists O$ that satisfies $x \in O \subset S$. This completes the proof. ∎

Lemmas 6.1 and 6.2 teach us further implications. From Lemma 6.1 and (6.11) we have

\[ O \subset S^{\circ} \subset S \tag{6.12} \]

with an arbitrary open set $\forall O$ contained in $S^{\circ}$. From Lemma 6.2 we have $x \in S^{\circ} \Rightarrow x \in O$; i.e., $S^{\circ} \subset O$ with some open set $\exists O$ containing $S^{\circ}$. Expressing this specific $O$ as $\tilde{O}$, we have $S^{\circ} \subset \tilde{O}$. Meanwhile, using Lemma 6.1 once again we must get

\[ \tilde{O} \subset S^{\circ} \subset S . \tag{6.13} \]

That is, we have $S^{\circ} \subset \tilde{O}$ and $\tilde{O} \subset S^{\circ}$ at once. This implies that with this specific open set $\tilde{O}$, we get $\tilde{O} = S^{\circ}$. That is,

\[ \tilde{O} = S^{\circ} \subset S . \tag{6.14} \]

Fig. 6.5 (a) Largest open set $S^{\circ}$ contained in $S$. $O$ denotes an open set. (b) Smallest closed set $\overline{S}$ containing $S$. $C$ denotes a closed set

Relation (6.14) obviously shows that $S^{\circ}$ is an open set and moreover that $S^{\circ}$ is the largest open set contained in $S$. Figure 6.5a depicts this relation. In this context we have the following important theorem.

Theorem 6.1 Let $S$ be a subset of $T$. The subset $S$ is open if and only if $S = S^{\circ}$.

Proof From (6.14) $S^{\circ}$ is open. Therefore, if $S = S^{\circ}$, then $S$ is open. Conversely, suppose that $S$ is open. In that event $S$ itself is an open set contained in $S$. Meanwhile, $S^{\circ}$ has been characterized as the largest open set contained in $S$. Consequently, we have $S \subset S^{\circ}$. At the same time, from (6.11) we have $S^{\circ} \subset S$. Thus, we get $S = S^{\circ}$. This completes the proof. ∎

In contrast to Lemmas 6.1 and 6.2, we have the following lemmas with respect to closures.

Lemma 6.3 Let $S$ be a subset of $T$ and $C$ be any closed set containing $S$. Then, we have $\overline{S} \subset C$.

Proof Since $C$ is a closed set, $C^c$ is an open set. Suppose that with a point $x$, $x \notin C$. Then, $x \in C^c$. Since $C \supset S$, $C^c \subset S^c$. This implies $C^c \cap S = \emptyset$. As $x \in C^c \subset C^c$, by Definition 6.2 $C^c$ is a neighborhood of $x$. Then, by Definition 6.4 $x$ is not in $\overline{S}$; i.e., $x \notin \overline{S}$. This statement can be translated into $x \in C^c \Rightarrow x \in (\overline{S})^c$; i.e., $C^c \subset (\overline{S})^c$. That is, we get $\overline{S} \subset C$. ∎

Lemma 6.4 Let $S$ be a subset of $T$ and suppose that a point $x$ does not belong to $\overline{S}$; i.e., $x \notin \overline{S}$. Then, $x \notin C$ for some closed set $\exists C$ containing $S$.

Proof If $x \notin \overline{S}$, then by Definition 6.4 we have some neighborhood $\exists N$ and some open set $\exists O$ contained in that $N$ such that $x \in O \subset N$ and $N \cap S = \emptyset$. Therefore, we must have $O \cap S = \emptyset$. Let $C = O^c$. Then $C$ is a closed set with $C \supset S$. Since $x \in O$, $x \notin C$. This completes the proof. ∎

From Lemmas 6.3 and 6.4 we obtain further implications. From Lemma 6.3 and (6.11) we have

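\[ S \subset \overline{S} \subset C \tag{6.15} \]

with an arbitrary closed set $\forall C$ containing $S$. From Lemma 6.4 we have $x \in (\overline{S})^c \Rightarrow x \in C^c$; i.e., $C \subset \overline{S}$ with $\exists C$ containing $S$. Expressing this specific $C$ as $\tilde{C}$, we have $\tilde{C} \subset \overline{S}$. Meanwhile, using Lemma 6.3 once again we must get

\[ S \subset \overline{S} \subset \tilde{C} . \tag{6.16} \]

That is, we have $\tilde{C} \subset \overline{S}$ and $\overline{S} \subset \tilde{C}$ at once. This implies that with this specific closed set $\tilde{C}$, we get $\tilde{C} = \overline{S}$. That is,

\[ S \subset \overline{S} = \tilde{C} . \tag{6.17} \]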

Relation (6.17) obviously shows that $\overline{S}$ is a closed set and moreover that $\overline{S}$ is the smallest closed set containing $S$. Figure 6.5b depicts this relation. In this context we have the following important theorem.

Theorem 6.2 Let $S$ be a subset of $T$. The subset $S$ is closed if and only if $S = \overline{S}$.

Proof The proof can be done analogously to that of Theorem 6.1. From (6.17) $\overline{S}$ is closed. Therefore, if $S = \overline{S}$, then $S$ is closed. Conversely, suppose that $S$ is closed. In that event $S$ itself is a closed set containing $S$. Meanwhile, $\overline{S}$ has been characterized as the smallest closed set containing $S$. Then, we have $S \supset \overline{S}$. At the same time, from (6.11) we have $\overline{S} \supset S$. Thus, we get $S = \overline{S}$. This completes the proof. ∎

(c) Boundary [4]

Definition 6.5 Let $(T, \tau)$ be a topological space and $S$ be a subset of $T$. If a point $x \in T$ is in both the closure of $S$ and the closure of the complement of $S$, i.e., $S^c$, $x$ is said to be in the boundary of $S$. In this case, $x$ is called a boundary point of $S$. The boundary of $S$ is denoted by $S^b$.

By this definition $S^b$ can be expressed as

\[ S^b = \overline{S} \cap \overline{S^c} . \tag{6.18} \]

Replacing $S$ with $S^c$, we get

\[ (S^c)^b = \overline{S^c} \cap \overline{(S^c)^c} = \overline{S^c} \cap \overline{S} . \tag{6.19} \]

Thus, we have

\[ S^b = (S^c)^b . \tag{6.20} \]


This means that $S$ and $S^c$ have the same boundary. Since both $\overline{S}$ and $\overline{S^c}$ are closed sets and the boundary is defined as their intersection, from Axiom (C3) the boundary is a closed set. From Definition 6.1 a closed set is defined as a complement of an open set, and vice versa. Thus, open sets and closed sets are not opposing concepts but complementary ones. In this respect we have the following lemma and theorem.

Lemma 6.5 Let $S$ be an arbitrary subset of $T$. Then, we have

\[ \overline{S^c} = (S^{\circ})^c . \]

Proof Since $S^{\circ} \subset S$, $(S^{\circ})^c \supset S^c$. As $S^{\circ}$ is open, $(S^{\circ})^c$ is closed. Hence, $(S^{\circ})^c \supset \overline{S^c}$, because $\overline{S^c}$ is the smallest closed set containing $S^c$. Next let $C$ be an arbitrary closed set that contains $S^c$. Then $C^c$ is an open set that is contained in $S$, and so we have $C^c \subset S^{\circ}$, because $S^{\circ}$ is the largest open set contained in $S$. Therefore, $C \supset (S^{\circ})^c$. Then, if we choose $\overline{S^c}$ for $C$, we must have $\overline{S^c} \supset (S^{\circ})^c$. Combining this relation with $(S^{\circ})^c \supset \overline{S^c}$ obtained above, we get $\overline{S^c} = (S^{\circ})^c$. ∎

Theorem 6.3 [4] A necessary and sufficient condition for $S$ to be both open and closed at once is $S^b = \emptyset$.

Proof (i) Necessary condition: Let $S$ be open and closed. Then, $S = \overline{S}$ and $S = S^{\circ}$. Suppose that $S^b \neq \emptyset$. Then, from Definition 6.5 we would have $\exists a \in \overline{S} \cap \overline{S^c}$ for some $a$. Meanwhile, from Lemma 6.5 we have $\overline{S^c} = (S^{\circ})^c$. This leads to $a \in \overline{S} \cap (S^{\circ})^c$. But, from the assumption we would have $a \in S \cap S^c$. We have no such $a$, however. Thus, using the proof by contradiction we must have $S^b = \emptyset$.

(ii) Sufficient condition: Suppose that $S^b = \emptyset$. Starting with $S \supset S^{\circ}$ and $S \neq S^{\circ}$, let us assume that there is $a$ such that $a \in S$ and $a \notin S^{\circ}$. That is, we have $a \in (S^{\circ})^c \Rightarrow a \in S \cap (S^{\circ})^c \subset \overline{S} \cap (S^{\circ})^c = \overline{S} \cap \overline{S^c} = S^b$, in contradiction to the supposition. Thus, we must not have such $a$, implying that $S = S^{\circ}$. Next, starting with $\overline{S} \supset S$ and $\overline{S} \neq S$, we assume that there is $a$ such that $a \in \overline{S}$ and $a \notin S$. That is, we have $a \in S^c \Rightarrow a \in \overline{S} \cap S^c \subset \overline{S} \cap \overline{S^c} = S^b$, in contradiction to the supposition. Thus, we must not have such $a$, implying that $\overline{S} = S$. Combining this relation with $S = S^{\circ}$ obtained above, we get $\overline{S} = S = S^{\circ}$. In other words, $S$ is both open and closed at once. These complete the proof. ∎

The abovementioned set is sometimes referred to as a closed-open set or a clopen set as a portmanteau word. An interesting example can be seen in Chap. 20 in relation to topological groups (or continuous groups).
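The interior, closure, and boundary can be computed explicitly in a small finite topological space. The sketch below assumes a hypothetical three-point space; it uses the characterizations stated above, namely the interior as the largest open set contained in $S$ and the closure as the smallest closed set containing $S$, together with (6.18) for the boundary.

```python
T = frozenset({1, 2, 3})
# Hypothetical topology on T, chosen for illustration.
tau = {frozenset(), frozenset({1}), frozenset({1, 2}), T}


def interior(S):
    """Union of all open sets contained in S (largest open subset of S)."""
    result = frozenset()
    for O in tau:
        if O <= S:
            result |= O
    return result


def closure(S):
    """Intersection of all closed sets containing S (smallest closed superset)."""
    result = T
    for O in tau:
        C = T - O              # closed set as the complement of an open set
        if S <= C:
            result &= C
    return result


def boundary(S):
    """Closure of S intersected with the closure of its complement, as in (6.18)."""
    return closure(S) & closure(T - S)


S = frozenset({1, 3})
S_interior = interior(S)       # frozenset({1})
S_closure = closure(S)         # the whole set T
S_boundary = boundary(S)       # frozenset({2, 3})
```

For this particular `S` the relation (6.11) holds, and `closure(T - S)` coincides with `T - interior(S)` in agreement with Lemma 6.5. The sets `T` and `frozenset()` have an empty boundary, which is consistent with Theorem 6.3 since both are clopen.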


(d) Accumulation Points and Isolated Points

We have another important class of elements and sets that include accumulation points and isolated points.

Definition 6.6 Let $(T, \tau)$ be a topological space and $S$ be a subset of $T$. Suppose that we have a point $p \in T$. If for any neighborhood $N$ of $p$ we have $N \cap (S - \{p\}) \neq \emptyset$, then $p$ is called an accumulation point of $S$. A set comprising all the accumulation points of $S$ is said to be a derived set of $S$ and denoted by $S^d$.

Note that from Definition 6.4 we have

\[ N \cap (S - \{p\}) \neq \emptyset \Longleftrightarrow p \in \overline{S - \{p\}}, \tag{6.21} \]

where $N$ is an arbitrary neighborhood of $p$.

Definition 6.7 Let $S$ be a subset of $T$. Suppose that we have a point $p \in S$. If for some neighborhood $N$ of $p$ we have $N \cap (S - \{p\}) = \emptyset$, then $p$ is called an isolated point of $S$. A set comprising only isolated points of $S$ is said to be a discrete set (or subset) of $S$ and we denote it by $S^{\mathrm{dsc}}$.

Note that if $p$ is an isolated point, $N \cap (S - \{p\}) = (N \cap S) - (N \cap \{p\}) = (N \cap S) - \{p\} = \emptyset$. That is, $\{p\} = N \cap S$. Therefore, any isolated points of $S$ are contained in $S$. Contrary to this, the accumulation points are not necessarily contained in $S$. Comparing Definitions 6.6 and 6.7, we notice that the latter definition is obtained by negation of the statement of the former definition. This implies that any point of a set is classified into one of two mutually exclusive alternatives, i.e., the accumulation points or isolated points. We divide $S^d$ into a direct sum of two sets such that

\[ S^d = (S \cup S^c) \cap S^d = (S^d \cap S) \cup (S^d \cap S^c), \tag{6.22} \]

where with the second equality we used (6.7). Now, we define

\[ (S^d)^{+} \equiv S^d \cap S \quad \text{and} \quad (S^d)^{-} \equiv S^d \cap S^c . \]

Then, (6.22) means that $S^d$ is divided into two sets consisting of the accumulation points, i.e., $(S^d)^{+}$ that belongs to $S$ and $(S^d)^{-}$ that does not belong to $S$. Thus, as another direct sum we have

\[ S = (S^d \cap S) \cup S^{\mathrm{dsc}} = (S^d)^{+} \cup S^{\mathrm{dsc}} . \tag{6.23} \]

Equation (6.23) represents the direct sum consisting of a part of the derived set and the whole discrete set. Here we have the following theorem.


Theorem 6.4 Let $S$ be a subset of $T$. Then we have the following equation:

\[ \overline{S} = S^d \cup S^{\mathrm{dsc}} \quad \left( S^d \cap S^{\mathrm{dsc}} = \emptyset \right) . \tag{6.24} \]

Proof Let us assume that $p$ is an accumulation point of $S$. Then, we have $p \in \overline{S - \{p\}} \subset \overline{S}$. Meanwhile, $S^{\mathrm{dsc}} \subset S \subset \overline{S}$. This implies that $\overline{S}$ contains both $S^d$ and $S^{\mathrm{dsc}}$. That is, we have

\[ \overline{S} \supset S^d \cup S^{\mathrm{dsc}} . \]

It is natural to ask how to deal with other points that do not belong to $S^d \cup S^{\mathrm{dsc}}$. Taking account of the above discussion on the accumulation points and isolated points, however, such remainder points should again be classified into accumulation and isolated points. Since all the isolated points are contained in $S$, they can be taken into $\overline{S}$. Suppose that among the accumulation points, some points $p$ are not contained in $S$; i.e., $p \notin S$. In that case, we have $S - \{p\} = S$, namely $\overline{S - \{p\}} = \overline{S}$. Thus, $p \in \overline{S - \{p\}} \Longleftrightarrow p \in \overline{S}$. From (6.21), in turn, this implies that in case $p$ is not contained in $S$, for $p$ to be an accumulation point of $S$ is equivalent to that $p$ is an adherent point of $S$. Then, those points $p$ can be taken into $\overline{S}$ as well. Thus, finally we get (6.24). From Definitions 6.6 and 6.7, obviously we have $S^d \cap S^{\mathrm{dsc}} = \emptyset$. This completes the proof. ∎

Rewriting (6.24), we get

\[ \overline{S} = (S^d)^{-} \cup (S^d)^{+} \cup S^{\mathrm{dsc}} = (S^d)^{-} \cup S . \tag{6.25} \]

We have other important theorems.

Theorem 6.5 Let $S$ be a subset of $T$. Then we have the following relation:

\[ \overline{S} = S^d \cup S . \tag{6.26} \]

Proof Let us assume that $p \in \overline{S}$. Then, we have the following two cases: (i) if $p \in S$, we have trivially $p \in S \subset S^d \cup S$. (ii) Suppose $p \notin S$. Since $p \in \overline{S}$, from Definition 6.4 any neighborhood of $p$ contains a point of $S$ (other than $p$). Then, from Definition 6.6, $p$ is an accumulation point of $S$. Thus, we have $p \in S^d \subset S^d \cup S$ and, hence, get $\overline{S} \subset S^d \cup S$. Conversely, suppose that $p \in S^d \cup S$. Then, (i) if $p \in S$, obviously we have $p \in \overline{S}$. (ii) Suppose $p \in S^d$. Then, from Definition 6.6 any neighborhood of $p$ contains a point of $S$ (other than $p$). Thus, from Definition 6.4, $p \in \overline{S}$. Taking into account the above two cases, we get $S^d \cup S \subset \overline{S}$. Combining this with $\overline{S} \subset S^d \cup S$ obtained above, we get (6.26). This completes the proof. ∎

Fig. 6.6 Relationship among $S$, $\overline{S}$, $S^d$, and $S^{\mathrm{dsc}}$. Regarding the symbols and notations, see text

Theorem 6.6 Let $S$ be a subset of $T$. A necessary and sufficient condition for $S$ to be a closed set is that $S$ contains all the accumulation points of $S$.

Proof From Theorem 6.2, that $S$ is a closed set is equivalent to $S = \overline{S}$. From Theorem 6.5 this is equivalent to $S = S^d \cup S$. Obviously, this is equivalent to $S^d \subset S$. In other words, $S$ contains all the accumulation points of $S$. This completes the proof. ∎

As another proof, taking account of (6.25), we have

\[ (S^d)^{-} = \emptyset \Longleftrightarrow \overline{S} = S \Longleftrightarrow S \ \text{is a closed set (from Theorem 6.2)} . \]

This relation is equivalent to the statement that $S$ contains all the accumulation points of $S$. In Fig. 6.6 we show the relationship among $S$, $\overline{S}$, $S^d$, and $S^{\mathrm{dsc}}$. From Fig. 6.6, we can readily show that

\[ S^{\mathrm{dsc}} = S - S^d = \overline{S} - S^d . \]

We have a further example of direct sum decomposition. Using Lemma 6.5, we have

\[ S^b = \overline{S} \cap \overline{S^c} = \overline{S} \cap (S^{\circ})^c = \overline{S} - S^{\circ} . \]

Since $\overline{S} \supset S^{\circ}$, we get

\[ \overline{S} = S^{\circ} \cup S^b ; \quad S^{\circ} \cap S^b = \emptyset . \tag{6.27} \]

Notice that (6.27) is a succinct expression of Theorem 6.3. That is,

\[ S^b = \emptyset \Longleftrightarrow \overline{S} = S^{\circ} \Longleftrightarrow S \ \text{is a clopen set} . \]
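The decompositions (6.23), (6.24), and (6.26) can be verified directly on a small example. The following sketch assumes a hypothetical three-point topological space and tests Definitions 6.4, 6.6, and 6.7 only against the open sets containing a point, which suffices because every neighborhood of $p$ contains such an open set.

```python
T = frozenset({1, 2, 3})
# Hypothetical topology on T, chosen for illustration.
tau = {frozenset(), frozenset({1}), frozenset({1, 2}), T}


def open_neighborhoods(p):
    """Open sets containing p; every neighborhood of p contains one of them."""
    return [O for O in tau if p in O]


def derived_set(S):
    """Accumulation points of S (Definition 6.6)."""
    return frozenset(
        p for p in T
        if all(O & (S - {p}) for O in open_neighborhoods(p))
    )


def isolated_points(S):
    """Points p of S with a neighborhood N such that N ∩ (S − {p}) is empty."""
    return frozenset(
        p for p in S
        if any(not (O & (S - {p})) for O in open_neighborhoods(p))
    )


def closure(S):
    """Adherent points of S (Definition 6.4)."""
    return frozenset(
        p for p in T
        if all(O & S for O in open_neighborhoods(p))
    )


S = frozenset({1, 3})
S_d = derived_set(S)           # frozenset({2, 3})
S_dsc = isolated_points(S)     # frozenset({1})
```

In this example `S_d` contains the point 2, which lies outside `S`, so `S` is not closed, in agreement with Theorem 6.6. For the closed set `frozenset({2, 3})`, `derived_set` returns a subset of the set itself.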


Fig. 6.7 Connectedness of subspaces. (a) Connected subspace A. Any two points z1 and z2 of A can be connected by a continuous line that is contained entirely within A. (b) Simply connected subspace A. All the points inside any closed path C within A contain only the points that belong to A. (c) Disconnected subspace A that comprises two disjoint sets B and C

(e) Connectedness

In the preceding paragraphs we studied various building blocks of the topological space and of the sets contained in the space, together with their properties. In this paragraph, we briefly mention the relationship between the sets. The connectedness is an important concept in the topology. The concept of the connectedness is associated with the subspaces defined earlier in this section. In terms of the connectedness, subspaces of a topological space can be categorized into two main types: a subspace of the one type is connected and that of the other is disconnected. Let $A$ be a subspace of $T$. Intuitively, for $A$ to be connected is defined as follows: Let $z_1$ and $z_2$ be any two points of $A$. If $z_1$ and $z_2$ can be joined by a continuous line that is contained entirely within $A$ (see Fig. 6.7a), then $A$ is said to be connected. Of the connected subspaces, suppose that the subspace $A$ has the property that the region inside any closed path (i.e., continuous closed line) $C$ within $A$ contains only the points that belong to $A$ (Fig. 6.7b). Then, that subspace $A$ is said to be simply connected. A subspace that is not simply connected but connected is said to be multiply connected. A typical example for the latter is a torus. If a subspace is given as a union of two (or more) disjoint non-empty (open) sets, that subspace is said to be disconnected. Notice that if two (or more) sets have no element in common, those sets are said to be disjoint. Figure 6.7c shows an example for which a disconnected subspace $A$ is a union of two disjoint sets $B$ and $C$. Interesting examples can be seen in Chap. 20 in relation to the connectedness.
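The criterion for a disconnected space, a union of two disjoint non-empty open sets, can be tested by brute force when the space is finite. The sketch below covers only that criterion; it does not address the intuitive path-based picture or simple connectedness, and both topologies are hypothetical examples.

```python
def is_disconnected(underlying, collection):
    """True if the space is a union of two disjoint non-empty open sets."""
    for A in collection:
        B = underlying - A     # the only set disjoint from A whose union with A is the space
        if A and B and B in collection:
            return True
    return False


T = frozenset({1, 2, 3})
# Hypothetical topologies on the same underlying set.
tau_connected = {frozenset(), frozenset({1}), frozenset({1, 2}), T}
tau_disconnected = {frozenset(), frozenset({1}), frozenset({2, 3}), T}

connected_case = is_disconnected(T, tau_connected)        # False
disconnected_case = is_disconnected(T, tau_disconnected)  # True
```

In the second topology the sets `{1}` and `{2, 3}` are both open, disjoint, and non-empty, and their union is `T`, so the space splits into two pieces in the sense of Fig. 6.7c.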

6.1.3 T1-Space

We can “enter” a variety of topologies τ into a set T to get a topological space (T, τ). We have two extremes of them. One is an indiscrete topology (or trivial topology) and the other is a discrete topology [2]. For the former the topology is characterized as

$$\tau = \{\emptyset, T\} \tag{6.28}$$

and the latter case is

$$\tau = \{\emptyset, \text{all the subsets of } T, T\}. \tag{6.29}$$

With τ of (6.29), all the subsets are open and complements to the individual subsets are closed by Definition 6.1. But such closed subsets are contained in τ, and so they are again open. Thus, all the subsets are clopen sets. Practically speaking, the aforementioned two extremes are of less interest and value. For this reason, we need some moderate separation conditions with topological spaces. These conditions are well established as the separation axioms. We will not get into details about the discussion [2], but we mention the first separation axiom (Fréchet axiom) that produces the T1-space.

Definition 6.8 Let (T, τ) be a topological space. Suppose that with respect to $\forall x, y\ (x \neq y) \in T$ there is a neighborhood N in such a way that

$$x \in N \quad \text{and} \quad y \notin N.$$

The topological space that satisfies the above separation condition is called a T1-space.

We mention two important theorems of the T1-space.

Theorem 6.7 A necessary and sufficient condition for (T, τ) to be a T1-space is that for each point $x \in T$, $\{x\}$ is a closed set of T.

Proof
(i) Necessary condition: Let (T, τ) be a T1-space. Choose any pair of elements $x, y\ (x \neq y) \in T$. Then, from Definition 6.8 there is a neighborhood $\exists N$ of $y\ (\neq x)$ that does not contain x; i.e., $y \in N$ and $N \cap \{x\} = \emptyset$. From Definition 6.4 this implies that $y \notin \overline{\{x\}}$. Thus, we have $y \neq x \Rightarrow y \notin \overline{\{x\}}$. This means that $y \in \overline{\{x\}} \Rightarrow y = x$. Namely, $\overline{\{x\}} \subset \{x\}$, but from (6.11) we have $\{x\} \subset \overline{\{x\}}$. Thus, $\overline{\{x\}} = \{x\}$. From Theorem 6.2 this shows that $\{x\}$ is a closed set of T.
(ii) Sufficient condition: Suppose that $\{x\}$ is a closed set of T. Choose any x and y such that $y \neq x$. Since $\{x\}$ is a closed set, $N = T - \{x\}$ is an open set that contains y; i.e., $y \in T - \{x\}$ and $x \notin T - \{x\}$. Since $y \in T - \{x\} \subset T - \{x\}$, where $T - \{x\}$ is an open set, from Definition 6.2 $N = T - \{x\}$ is a neighborhood of y, and moreover N does not contain x. Then, from Definition 6.8 (T, τ) is a T1-space. These complete the proof. ∎

A set comprising a unit element is called a singleton or a unit set.

Theorem 6.8 Let S be a subset of a T1-space. Then $S^d$ is a closed set in the T1-space.

Proof Let p be a point of S. Suppose $p \in \overline{S^d}$. Suppose, furthermore, $p \notin S^d$. Then, from Definition 6.6 there is some neighborhood $\exists N$ of p such that $N \cap (S - \{p\}) = \emptyset$. Since $p \in \overline{S^d}$, from Definition 6.4 we must have $N \cap S^d \neq \emptyset$ at once for this special neighborhood N of p. Then, as we have $p \notin S^d$, we can take a point q such that $q \in N \cap S^d$ and $q \neq p$. Meanwhile, we may choose an open set N for the above neighborhood of p (i.e., an open neighborhood), because $p \in N\ (= N^\circ) \subset N$. Since $N \cap (T - \{p\}) = N - \{p\}$ is an open set from Theorem 6.7 as well as Axiom (O2) and Definition 6.1, $N - \{p\}$ is an open neighborhood of q which does not contain p. Namely, we have $q \in N - \{p\}\ [= (N - \{p\})^\circ] \subset N - \{p\}$. Note that $q \in N - \{p\}$ and $p \notin N - \{p\}$, in agreement with Definition 6.8. As $q \in S^d$, again from Definition 6.6 we have

$$(N - \{p\}) \cap (S - \{q\}) \neq \emptyset. \tag{6.30}$$

This implies that $N \cap (S - \{q\}) \neq \emptyset$ as well. But this contradicts the original relation $N \cap (S - \{p\}) = \emptyset$, which is obtained from $p \notin S^d$. Then, we must have $p \in S^d$. Thus, from the original supposition we get $\overline{S^d} \subset S^d$. Meanwhile, we have $\overline{S^d} \supset S^d$ from (6.11). We have $\overline{S^d} = S^d$ accordingly. From Theorem 6.2, this means that $S^d$ is a closed set. Hence, we have proven the theorem. ∎

The above two theorems are intuitively acceptable. For, in a one-dimensional Euclidean space ℝ we express a singleton $\{x\}$ as $[x, x]$ and a closed interval as $[x, y]\ (x \neq y)$. With the latter, $[x, y]$ might well be expressed as $(x, y)^d$. Such subsets are well known as closed sets. According to the separation axioms, various topological spaces can be obtained by imposing stronger constraints upon the T1-space [2]. A typical example for this is a metric space. In this context, the T1-space has acquired primitive notions of metric that can be shared with the metric space. Definitions of the metric (or distance function) and metric space will briefly be summarized in Chap. 13 in reference to an inner product space.

In the above discussion, we have described a brief outline of the set theory and topology. We use the results in this chapter and in Part IV as well.

Fig. 6.8 Complex plane where z is expressed in a polar form using a nonnegative radial coordinate r and an angular coordinate θ

6.1.4 Complex Numbers and Complex Plane

Returning to (6.1), we further investigate how to represent a complex number. In Fig. 6.8 we redraw a complex plane and a complex number on it. The complex plane is a natural extension of a real two-dimensional orthogonal coordinate plane (Cartesian coordinate plane) that graphically represents a pair of real numbers x and y as (x, y). In the complex plane we represent a complex number as

$$z = x + iy \tag{6.1}$$

by designating x as an abscissa on the real axis and y as an ordinate on the imaginary axis. The two axes are orthogonal to each other. An absolute value (or modulus) of z is defined as a real non-negative number

$$|z| = \sqrt{x^2 + y^2}. \tag{6.31}$$

Analogously to a real two-dimensional polar coordinate, we can introduce in the complex plane a non-negative radial coordinate r and an angular coordinate θ (Fig. 6.8). In the complex analysis the angular coordinate is called an argument and denoted by arg z so as to be

$$\arg z = \theta + 2\pi n; \quad n = 0, \pm 1, \pm 2, \ldots.$$

In Fig. 6.8, x and y are given by

$$x = r\cos\theta \quad \text{and} \quad y = r\sin\theta.$$

Thus, we have a polar form of z expressed as

$$z = r(\cos\theta + i\sin\theta). \tag{6.32}$$

Using the well-known Euler’s formula (or Euler’s identity)

$$e^{i\theta} = \cos\theta + i\sin\theta, \tag{6.33}$$

we get

$$z = re^{i\theta}. \tag{6.34}$$

From (6.32) and (6.34), we have

$$z^n = r^n e^{in\theta} = r^n(\cos\theta + i\sin\theta)^n = r^n(\cos n\theta + i\sin n\theta), \tag{6.35}$$

where the last equality comes from replacing θ with nθ in (6.33). Comparing both sides with the last equality of (6.35), we have

$$(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta. \tag{6.36}$$

Equation (6.36) is called de Moivre’s theorem. This relation holds with n being integers including zero along with rational numbers including negative numbers. Euler’s formula (6.33) immediately leads to the following important formulae:

$$\cos\theta = \frac{1}{2}\left(e^{i\theta} + e^{-i\theta}\right), \quad \sin\theta = \frac{1}{2i}\left(e^{i\theta} - e^{-i\theta}\right). \tag{6.37}$$

Notice that although in (6.32) and (6.33) we assumed that θ is a real number, (6.33) and (6.37) hold with any complex number θ. Note also that (6.33) results from (6.37) and the following definition of power series expansion of the exponential function

$$e^z \equiv \sum_{k=0}^{\infty} \frac{z^k}{k!}, \tag{6.38}$$

where z is any complex number [5]. In fact, from (6.37) and (6.38) we have the following familiar expressions of power series expansion of the cosine and sine functions

$$\cos z = \sum_{k=0}^{\infty} (-1)^k \frac{z^{2k}}{(2k)!}, \quad \sin z = \sum_{k=0}^{\infty} (-1)^k \frac{z^{2k+1}}{(2k+1)!}.$$

These expressions frequently appear in this chapter.
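As a quick numerical illustration of (6.33), (6.36), and (6.38), the following minimal Python sketch may help. It is not part of the original text: the function names `de_moivre_gap` and `exp_series`, the sample angle, and the number of series terms are illustrative choices, and only the standard `cmath` and `math` modules are assumed.

```python
import cmath
import math


def de_moivre_gap(theta: float, n: int) -> float:
    """Return |(cos θ + i sin θ)^n - (cos nθ + i sin nθ)|, cf. (6.36)."""
    lhs = complex(math.cos(theta), math.sin(theta)) ** n
    rhs = complex(math.cos(n * theta), math.sin(n * theta))
    return abs(lhs - rhs)


def exp_series(z: complex, terms: int = 40) -> complex:
    """Partial sum of the power series (6.38) for e^z."""
    total = 0j
    term = 1 + 0j  # z^0 / 0!
    for k in range(terms):
        total += term
        term *= z / (k + 1)  # turn z^k / k! into z^(k+1) / (k+1)!
    return total


theta = 0.7  # arbitrary sample angle (assumption)

# Euler's formula (6.33): e^{iθ} versus cos θ + i sin θ.
euler_gap = abs(cmath.exp(1j * theta) - complex(math.cos(theta), math.sin(theta)))

# Series definition (6.38) evaluated at z = iθ versus the library exponential.
series_gap = abs(exp_series(1j * theta) - cmath.exp(1j * theta))

# (6.37) with a complex argument: cos z = (e^{iz} + e^{-iz}) / 2.
z = 0.4 + 0.9j
cos_gap = abs(cmath.cos(z) - (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2)
```

All three gaps, as well as `de_moivre_gap(theta, n)` for integer n of either sign, should come out at the level of floating-point rounding.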


Next, let us think of a pair of complex numbers z and w with w expressed as w = u + iv. Then, we have

$$z - w = (x - u) + i(y - v).$$

Using (6.31), we get

$$|z - w| = \sqrt{(x - u)^2 + (y - v)^2}. \tag{6.39}$$

Equation (6.39) represents the “distance” between z and w. If we define a function ρ(z, w) such that

$$\rho(z, w) \equiv |z - w|,$$

ρ(z, w) defines a metric (or distance function); see Chap. 13. Thus, the metric gives a distance between any arbitrarily chosen pair of elements $z, w \in \mathbb{C}$, and (ℂ, ρ) can be dealt with as a metric space. This makes it easy to view the complex plane as a topological space. Discussions about the set theory and topology already developed earlier in this chapter basically hold.

In the theory of analytic functions, a subset A depicted in Fig. 6.7 can be regarded as a part of the complex plane. Since the complex plane ℂ itself represents a topological space, the subset A may be viewed as a subspace of ℂ. A complex function f(z) of a complex variable z of (6.1) is usually defined in a connected open set called a region [5]. The said region is of practical use among various subspaces and can be the entire domain of ℂ or a subset of ℂ. The connectedness of the region can be considered in a manner similar to that described in Sect. 6.1.2.

6.2 Analytic Functions of a Complex Variable

Since the complex number and complex plane have been well characterized in the previous section, we deal with the complex functions of a complex variable in the complex plane. Taking the complex conjugate of (6.1), we have

$$z^* = x - iy. \tag{6.40}$$

Combining (6.1) and (6.40) in a matrix form, we get

$$(z \;\; z^*) = (x \;\; y)\begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}. \tag{6.41}$$

Fig. 6.9 Vector synthesis of $z$ and $z^*$

Fig. 6.10 Ways a number (real or complex) is approached. (a) Two ways a number $x_0$ is approached in a real number line. (b) Various ways a number $z_0$ is approached in a complex plane. Here, only four ways are indicated

The matrix $\begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}$ is non-singular, and so the inverse matrix exists (see Sect. 11.3) and (x y) can be described as

$$(x \;\; y) = \frac{1}{2}(z \;\; z^*)\begin{pmatrix} 1 & -i \\ 1 & i \end{pmatrix} \tag{6.42}$$

or with individual components we have

$$x = \frac{1}{2}(z + z^*) \quad \text{and} \quad y = -\frac{i}{2}(z - z^*), \tag{6.43}$$

which can be understood as the “vector synthesis” (see Fig. 6.9). Equations (6.41) and (6.42) imply that two sets of variables (x y) and (z z*) can be dealt with on an equal basis. According to the custom, however, we prioritize the notation using (x y) over that of (z z*). Hence, the complex function f is described as

$$f(x, y) = u(x, y) + iv(x, y), \tag{6.44}$$

where both u(x, y) and v(x, y) are real functions of real variables x and y.


Here let us pause for a second to consider the differentiation of a function. Suppose that a mathematical function is defined on a real domain (i.e., a real number line). Consider whether that function is differentiable at a certain point $x_0$ of the real number line. On this occasion, we can approach $x_0$ only from two directions, i.e., from the side of $x < x_0$ (from the left) or from the side of $x_0 < x$ (from the right); see Fig. 6.10a. Meanwhile, suppose that a mathematical function is defined on a complex domain (i.e., a complex plane). Also consider whether the function is differentiable at a certain point $z_0$ of the complex plane. In this case, we can approach $z_0$ from continuously varying directions; see Fig. 6.10b where only four directions are depicted. In this context let us think of a simple example.

Example 6.1 Let f(x, y) be a function described by

$$f(x, y) = 2x + iy = z + x. \tag{6.45}$$

As remarked above, substituting (6.43) into (6.45), we could have

$$h(z, z^*) = 2 \cdot \frac{1}{2}(z + z^*) + i\left[-\frac{i}{2}(z - z^*)\right] = z + z^* + \frac{1}{2}(z - z^*) = \frac{1}{2}(3z + z^*),$$

where f(x, y) and h(z, z*) denote different functional forms. The derivative of (6.45) varies depending on the way $z_0$ is approached. For instance, think of

$$\left.\frac{df}{dz}\right|_0 = \lim_{z \to 0} \frac{f(0 + z) - f(0)}{z} = \lim_{x \to 0,\, y \to 0} \frac{2x + iy}{x + iy} = \lim_{x \to 0,\, y \to 0} \frac{2x^2 + y^2 - ixy}{x^2 + y^2}.$$

Suppose that the differentiation is taken along a straight line in a complex plane represented by iy = (ik)x (k, x, y: real). Then, we have

$$\left.\frac{df}{dz}\right|_0 = \lim_{x \to 0,\, y \to 0} \frac{2x^2 + k^2x^2 - ikx^2}{x^2 + k^2x^2} = \lim_{x \to 0,\, y \to 0} \frac{2 + k^2 - ik}{1 + k^2} = \frac{2 + k^2 - ik}{1 + k^2}. \tag{6.46}$$
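The limit (6.46) can be checked numerically with a small sketch such as the following. The helper names `f`, `quotient_along_line`, and `predicted`, as well as the step size, are hypothetical choices made for this illustration; they do not appear in the original text.

```python
def f(z: complex) -> complex:
    """The function f(x, y) = 2x + iy of (6.45), with z = x + iy."""
    return complex(2 * z.real, z.imag)


def quotient_along_line(k: float, x: float = 1e-6) -> complex:
    """Difference quotient [f(0 + z) - f(0)] / z with z = x + i k x."""
    z = complex(x, k * x)
    return (f(z) - f(0j)) / z


def predicted(k: float) -> complex:
    """Right-hand side of (6.46): (2 + k^2 - i k) / (1 + k^2)."""
    return complex(2 + k * k, -k) / (1 + k * k)


# Slopes k of the straight lines y = kx along which the origin is approached.
for k in (0.0, 1.0, -2.0):
    print(k, quotient_along_line(k), predicted(k))
```

For each slope k the two printed complex numbers should agree to rounding error, while the values for different k differ from one another.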

However, this means that $\left.\frac{df}{dz}\right|_0$ takes varying values depending upon k. Namely, $\left.\frac{df}{dz}\right|_0$ cannot uniquely be defined but depends on different ways to approach the origin of the complex plane. Thus, we find that the derivative takes different values depending on straight lines along which the differentiation is taken. This means that f(x, y) is not differentiable or analytic at z = 0.

Meanwhile, think of g(z) expressed as

$$g(z) = x + iy = z. \tag{6.47}$$

In this case, we get

$$\left.\frac{dg}{dz}\right|_0 = \lim_{z \to 0} \frac{g(0 + z) - g(0)}{z} = \lim_{z \to 0} \frac{z - 0}{z} = 1.$$

As a result, the derivative takes the same value 1, regardless of what straight lines the differentiation is taken along.

Though simple, the above example gives us a heuristic method. As before, let f(z) be a function described as (6.44) such that

$$f(z) = f(x, y) = u(x, y) + iv(x, y), \tag{6.48}$$

where both u(x, y) and v(x, y) possess the first-order partial derivatives with respect to x and y. Then we have a derivative df(z)/dz such that

$$\frac{df(z)}{dz} = \lim_{\Delta z \to 0} \frac{f(z + \Delta z) - f(z)}{\Delta z} = \lim_{\Delta x \to 0,\, \Delta y \to 0} \frac{u(x + \Delta x, y + \Delta y) - u(x, y) + i[v(x + \Delta x, y + \Delta y) - v(x, y)]}{\Delta x + i\Delta y}. \tag{6.49}$$

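We wish to seek the condition that $\frac{df(z)}{dz}$ gives the same result regardless of the order of taking the limit of Δx → 0 and Δy → 0. That is, taking the limit of Δy → 0 first, we have

$$\frac{df(z)}{dz} = \lim_{\Delta x \to 0} \frac{u(x + \Delta x, y) - u(x, y) + i[v(x + \Delta x, y) - v(x, y)]}{\Delta x} = \frac{\partial u(x, y)}{\partial x} + i\frac{\partial v(x, y)}{\partial x}. \tag{6.50}$$

Next, taking the limit of Δx → 0 first, we get

$$\frac{df(z)}{dz} = \lim_{\Delta y \to 0} \frac{u(x, y + \Delta y) - u(x, y) + i[v(x, y + \Delta y) - v(x, y)]}{i\Delta y} = -i\frac{\partial u(x, y)}{\partial y} + \frac{\partial v(x, y)}{\partial y}. \tag{6.51}$$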

Consequently, by equating the real and imaginary parts of (6.50) and (6.51) we must have

$$\frac{\partial u(x, y)}{\partial x} = \frac{\partial v(x, y)}{\partial y}, \tag{6.52}$$

$$\frac{\partial v(x, y)}{\partial x} = -\frac{\partial u(x, y)}{\partial y}. \tag{6.53}$$

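These relationships (6.52) and (6.53) are called the Cauchy–Riemann conditions. Differentiating (6.52) with respect to x and (6.53) with respect to y and eliminating the mixed derivative $\partial^2 v/\partial x \partial y$ between the two results (the order of differentiation being interchangeable), we get

$$\frac{\partial^2 u(x, y)}{\partial x^2} + \frac{\partial^2 u(x, y)}{\partial y^2} = 0. \tag{6.54}$$

Similarly we have

$$\frac{\partial^2 v(x, y)}{\partial x^2} + \frac{\partial^2 v(x, y)}{\partial y^2} = 0. \tag{6.55}$$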
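The conditions (6.52) and (6.53) and the Laplace equations (6.54) and (6.55) can be probed numerically by finite differences. The following sketch is only an illustration under stated assumptions: the test function f(z) = z² (so that u = x² − y² and v = 2xy), the sample point, the step sizes, and all function names are choices made here, not taken from the text.

```python
from typing import Callable, Tuple

Real2 = Callable[[float, float], float]


def d_dx(g: Real2, x: float, y: float, h: float = 1e-5) -> float:
    """Central-difference estimate of the partial derivative of g in x."""
    return (g(x + h, y) - g(x - h, y)) / (2 * h)


def d_dy(g: Real2, x: float, y: float, h: float = 1e-5) -> float:
    """Central-difference estimate of the partial derivative of g in y."""
    return (g(x, y + h) - g(x, y - h)) / (2 * h)


def cauchy_riemann_residuals(u: Real2, v: Real2, x: float, y: float) -> Tuple[float, float]:
    """Return (u_x - v_y, v_x + u_y); both vanish when (6.52) and (6.53) hold."""
    return (d_dx(u, x, y) - d_dy(v, x, y), d_dx(v, x, y) + d_dy(u, x, y))


def laplacian(g: Real2, x: float, y: float, h: float = 1e-3) -> float:
    """Five-point estimate of g_xx + g_yy, cf. (6.54) and (6.55)."""
    return (g(x + h, y) + g(x - h, y) + g(x, y + h) + g(x, y - h) - 4 * g(x, y)) / h**2


# f(z) = z^2: u = x^2 - y^2, v = 2xy (illustrative analytic example).
u_sq = lambda x, y: x * x - y * y
v_sq = lambda x, y: 2 * x * y

# f(x, y) = 2x + iy of Example 6.1: u = 2x, v = y (not analytic).
u_ex = lambda x, y: 2 * x
v_ex = lambda x, y: y

x0, y0 = 0.3, -0.8  # arbitrary sample point (assumption)
res_sq = cauchy_riemann_residuals(u_sq, v_sq, x0, y0)
res_ex = cauchy_riemann_residuals(u_ex, v_ex, x0, y0)
lap_u = laplacian(u_sq, x0, y0)
lap_v = laplacian(v_sq, x0, y0)
```

Here `res_sq`, `lap_u`, and `lap_v` should vanish to within discretization and rounding error, whereas the first component of `res_ex` stays close to 1, reflecting that (6.52) fails for the function of Example 6.1.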

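From the above discussion, we draw several important implications.

(1) From (6.50), we have

$$\frac{df(z)}{dz} = \frac{\partial f}{\partial x}. \tag{6.56}$$

(2) Also from (6.51) we obtain

$$\frac{df(z)}{dz} = -i\frac{\partial f}{\partial y}. \tag{6.57}$$

Meanwhile, we have

$$\frac{\partial}{\partial x} = \frac{\partial z}{\partial x}\frac{\partial}{\partial z} + \frac{\partial z^*}{\partial x}\frac{\partial}{\partial z^*} = \frac{\partial}{\partial z} + \frac{\partial}{\partial z^*}. \tag{6.58}$$

Also, we get

$$\frac{\partial}{\partial y} = \frac{\partial z}{\partial y}\frac{\partial}{\partial z} + \frac{\partial z^*}{\partial y}\frac{\partial}{\partial z^*} = i\left(\frac{\partial}{\partial z} - \frac{\partial}{\partial z^*}\right). \tag{6.59}$$

As in the case of Example 6.1, we rewrite f(z) as

$$f(z) \equiv F(z, z^*), \tag{6.60}$$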

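where the change in the functional form is due to the variables transformation. Hence, from (6.56) and using (6.58) we have

$$\frac{df(z)}{dz} = \frac{\partial F(z, z^*)}{\partial x} = \left(\frac{\partial}{\partial z} + \frac{\partial}{\partial z^*}\right)F(z, z^*) = \frac{\partial F(z, z^*)}{\partial z} + \frac{\partial F(z, z^*)}{\partial z^*}. \tag{6.61}$$

Similarly, we get

$$\frac{df(z)}{dz} = -i\frac{\partial F(z, z^*)}{\partial y} = \left(\frac{\partial}{\partial z} - \frac{\partial}{\partial z^*}\right)F(z, z^*) = \frac{\partial F(z, z^*)}{\partial z} - \frac{\partial F(z, z^*)}{\partial z^*}. \tag{6.62}$$

Equating (6.61) and (6.62), we obtain

$$\frac{\partial F(z, z^*)}{\partial z^*} = 0. \tag{6.63}$$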

This clearly shows that if F(z, z*) is differentiable with respect to z, F(z, z*) does not depend on z*, but only depends upon z. Thus, we may write the differentiable function as

$$F(z, z^*) \equiv f(z).$$

This is an outstanding characteristic of the complex function that is differentiable with respect to the complex variable z. Taking the contraposition of the above statement, we say that if F(z, z*) depends on z*, it is not differentiable with respect to z. Example 6.1 is one such illustration.

Conversely, we consider what can happen if the Cauchy–Riemann conditions are satisfied [5]. Let f(z) be a complex function described by

$$f(z) = u(x, y) + iv(x, y), \tag{6.48}$$

where u(x, y) and v(x, y) satisfy the Cauchy–Riemann conditions and possess continuous first-order partial derivatives with respect to x and y in some region of ℂ. Then, we have

$$u(x + \Delta x, y + \Delta y) - u(x, y) = \frac{\partial u(x, y)}{\partial x}\Delta x + \frac{\partial u(x, y)}{\partial y}\Delta y + \varepsilon_1 \Delta x + \delta_1 \Delta y, \tag{6.64}$$

$$v(x + \Delta x, y + \Delta y) - v(x, y) = \frac{\partial v(x, y)}{\partial x}\Delta x + \frac{\partial v(x, y)}{\partial y}\Delta y + \varepsilon_2 \Delta x + \delta_2 \Delta y, \tag{6.65}$$

where four positive numbers $\varepsilon_1$, $\varepsilon_2$, $\delta_1$, and $\delta_2$ can be made arbitrarily small as Δx and Δy tend to zero. The relations (6.64) and (6.65) result from the continuity of the first-order partial derivatives of u(x, y) and v(x, y). Then, we have

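$$
\begin{aligned}
\frac{f(z + \Delta z) - f(z)}{\Delta z}
&= \frac{u(x + \Delta x, y + \Delta y) - u(x, y)}{\Delta z} + \frac{i[v(x + \Delta x, y + \Delta y) - v(x, y)]}{\Delta z} \\
&= \frac{\Delta x}{\Delta z}\left[\frac{\partial u(x, y)}{\partial x} + i\frac{\partial v(x, y)}{\partial x}\right] + \frac{\Delta y}{\Delta z}\left[\frac{\partial u(x, y)}{\partial y} + i\frac{\partial v(x, y)}{\partial y}\right] + \frac{\Delta x}{\Delta z}(\varepsilon_1 + i\varepsilon_2) + \frac{\Delta y}{\Delta z}(\delta_1 + i\delta_2) \\
&= \frac{\Delta x}{\Delta z}\left[\frac{\partial u(x, y)}{\partial x} + i\frac{\partial v(x, y)}{\partial x}\right] + \frac{\Delta y}{\Delta z}\left[-\frac{\partial v(x, y)}{\partial x} + i\frac{\partial u(x, y)}{\partial x}\right] + \frac{\Delta x}{\Delta z}(\varepsilon_1 + i\varepsilon_2) + \frac{\Delta y}{\Delta z}(\delta_1 + i\delta_2) \\
&= \frac{\Delta x + i\Delta y}{\Delta z}\frac{\partial u(x, y)}{\partial x} + i\frac{\Delta x + i\Delta y}{\Delta z}\frac{\partial v(x, y)}{\partial x} + \frac{\Delta x}{\Delta z}(\varepsilon_1 + i\varepsilon_2) + \frac{\Delta y}{\Delta z}(\delta_1 + i\delta_2) \\
&= \frac{\partial u(x, y)}{\partial x} + i\frac{\partial v(x, y)}{\partial x} + \frac{\Delta x}{\Delta z}(\varepsilon_1 + i\varepsilon_2) + \frac{\Delta y}{\Delta z}(\delta_1 + i\delta_2),
\end{aligned} \tag{6.66}
$$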

where with the third equality we used the Cauchy–Riemann conditions (6.52) and (6.53). Accordingly, we have

$$\frac{f(z + \Delta z) - f(z)}{\Delta z} - \left[\frac{\partial u(x, y)}{\partial x} + i\frac{\partial v(x, y)}{\partial x}\right] = \frac{\Delta x}{\Delta z}(\varepsilon_1 + i\varepsilon_2) + \frac{\Delta y}{\Delta z}(\delta_1 + i\delta_2). \tag{6.67}$$

Taking the absolute values of both sides of (6.67), we get

$$\left|\frac{f(z + \Delta z) - f(z)}{\Delta z} - \left[\frac{\partial u(x, y)}{\partial x} + i\frac{\partial v(x, y)}{\partial x}\right]\right| \le \left|\frac{\Delta x}{\Delta z}\right| |\varepsilon_1 + i\varepsilon_2| + \left|\frac{\Delta y}{\Delta z}\right| |\delta_1 + i\delta_2|, \tag{6.68}$$

where

$$\left|\frac{\Delta x}{\Delta z}\right| = \frac{|\Delta x|}{\sqrt{(\Delta x)^2 + (\Delta y)^2}} \le 1 \quad \text{and} \quad \left|\frac{\Delta y}{\Delta z}\right| = \frac{|\Delta y|}{\sqrt{(\Delta x)^2 + (\Delta y)^2}} \le 1.$$

Taking the limit of Δz → 0, by assumption both the terms of RHS of (6.68) approach zero. Thus,

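$$\lim_{\Delta z \to 0} \frac{f(z + \Delta z) - f(z)}{\Delta z} = \frac{\partial u(x, y)}{\partial x} + i\frac{\partial v(x, y)}{\partial x} = \frac{\partial f}{\partial x}. \tag{6.69}$$

Alternatively, using partial derivatives of u(x, y) and v(x, y) with respect to y we can rewrite (6.66) as

$$\lim_{\Delta z \to 0} \frac{f(z + \Delta z) - f(z)}{\Delta z} = -i\left[\frac{\partial u(x, y)}{\partial y} + i\frac{\partial v(x, y)}{\partial y}\right] = -i\frac{\partial f}{\partial y}. \tag{6.70}$$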

From (6.56) and (6.57), the relations (6.69) and (6.70) imply that for f(z) to be differentiable requires that the first-order partial derivatives of u(x, y) and v(x, y) with respect to x and y should exist. Meanwhile, once f(z) is found to be differentiable (or analytic), its higher order derivatives must be analytic as well (vide infra). This requires, in turn, that the above first-order partial derivatives should be continuous. (Note that the analyticity naturally leads to continuity.) Thus, the following theorem will follow.

Theorem 6.9 Let f(z) be a complex function of a complex variable z such that f(z) = u(x, y) + iv(x, y), where x and y are real variables. Then, a necessary and sufficient condition for f(z) to be differentiable is that the first-order partial derivatives of u(x, y) and v(x, y) with respect to x and y exist and are continuous and that u(x, y) and v(x, y) satisfy the Cauchy–Riemann conditions.

Now, we formally give definitions of the differentiability and analyticity of a complex function of a complex variable.

Definition 6.9 Let $z_0$ be a given point of the complex plane ℂ. Let f(z) be defined in a neighborhood containing $z_0$. If f(z) is single valued and differentiable at all points of this neighborhood containing $z_0$, f(z) is said to be analytic at $z_0$. A point at which f(z) is analytic is called a regular point of f(z). A point at which f(z) is not analytic is called a singular point of f(z).

We emphasize that the above definition of analyticity at some point $z_0$ requires the single valuedness and differentiability at all the points of a neighborhood containing $z_0$. This can be understood by Fig. 6.10b and Example 6.1. The analyticity at a point is deeply connected to how and in which direction we take the limiting process. Thus, we need detailed information about the neighborhood of the point in question to determine whether the function is analytic at that point.
The next definition is associated with a global characteristic of the analytic function.

Definition 6.10 Let $\mathcal{R}$ be a region contained in the complex plane ℂ. Let f(z) be a complex function defined in $\mathcal{R}$. If f(z) is analytic at all points of $\mathcal{R}$, f(z) is said to be analytic in $\mathcal{R}$. In this case $\mathcal{R}$ is called a domain of analyticity. If the domain of analyticity is the entire complex plane ℂ, the function is called an entire function.

To understand various characteristics of the analytic functions, it is indispensable to introduce Cauchy’s integral formula. In the next section we deal with the integration of complex functions.

6.3 Integration of Analytic Functions: Cauchy’s Integral Formula

We can define the complex integration as a natural extension of Riemann integral with regard to a real variable. Suppose that there is a curve C in the complex plane. Both ends of the curve are fixed and located at $z_a$ and $z_b$. Suppose also that individual points of the curve C are described using a parameter t such that

$$z = z(t) = x(t) + iy(t) \quad (t_0 \equiv t_a \le t \le t_n \equiv t_b), \tag{6.71}$$

where $z_0 \equiv z_a = z(t_a)$ and $z_n \equiv z_b = z(t_b)$ together with $z_i = z(t_i)\ (0 \le i \le n)$. For this parametrization, we assume that C is subdivided into n pieces designated by $z_0, z_1, \ldots, z_n$. Let f(z) be a complex function defined in a region containing C. In this situation, let us assume the following summation $S_n$:

$$S_n = \sum_{i=1}^{n} f(\zeta_i)(z_i - z_{i-1}), \tag{6.72}$$

where $\zeta_i$ lies between $z_{i-1}$ and $z_i\ (1 \le i \le n)$. Taking the limit n → ∞ and concomitantly $|z_i - z_{i-1}| \to 0$, we have

$$\lim_{n \to \infty} S_n = \lim_{n \to \infty} \sum_{i=1}^{n} f(\zeta_i)(z_i - z_{i-1}). \tag{6.73}$$

If the constant limit exists independent of choice of $z_i\ (1 \le i \le n - 1)$ and $\zeta_i\ (1 \le i \le n)$, this limit is called a contour integral of f(z) along the curve C. We write it as

$$I = \lim_{n \to \infty} S_n \equiv \int_C f(z)\,dz = \int_{z_a}^{z_b} f(z)\,dz. \tag{6.74}$$

Using (6.48) and $z_i = x_i + iy_i$, we rewrite (6.72) as

$$S_n = \sum_{i=1}^{n} [u(\zeta_i)(x_i - x_{i-1}) - v(\zeta_i)(y_i - y_{i-1})] + i\sum_{i=1}^{n} [v(\zeta_i)(x_i - x_{i-1}) + u(\zeta_i)(y_i - y_{i-1})].$$

Taking the limit of n → ∞ as well as $|x_i - x_{i-1}| \to 0$ and $|y_i - y_{i-1}| \to 0\ (0 \le i \le n)$, we have

$$I = \int_C [u(x, y)\,dx - v(x, y)\,dy] + i\int_C [v(x, y)\,dx + u(x, y)\,dy]. \tag{6.75}$$

Further rewriting (6.75), we get

$$
\begin{aligned}
I &= \int_C \left[u(x, y)\frac{dx}{dt}\,dt - v(x, y)\frac{dy}{dt}\,dt\right] + i\int_C \left[v(x, y)\frac{dx}{dt}\,dt + u(x, y)\frac{dy}{dt}\,dt\right] \\
&= \int_C [u(x, y) + iv(x, y)]\frac{dx}{dt}\,dt + i\int_C [u(x, y) + iv(x, y)]\frac{dy}{dt}\,dt \\
&= \int_C [u(x, y) + iv(x, y)]\left(\frac{dx}{dt} + i\frac{dy}{dt}\right)dt = \int_C [u(x, y) + iv(x, y)]\frac{dz}{dt}\,dt \\
&= \int_C f(z)\frac{dz}{dt}\,dt = \int_{t_a}^{t_b} f(z)\frac{dz}{dt}\,dt = \int_{z_a}^{z_b} f(z)\,dz.
\end{aligned} \tag{6.76}
$$
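The sum (6.72) translates almost directly into a few lines of code. The sketch below is a hedged illustration rather than a definitive implementation: the function name `contour_integral`, the choice of $\zeta_i$ at the midpoint of each parameter interval, the two sample paths from 0 to 1 + i, and the number of subdivisions are all assumptions introduced here.

```python
from typing import Callable


def contour_integral(
    f: Callable[[complex], complex],
    path: Callable[[float], complex],
    t_a: float,
    t_b: float,
    n: int = 20000,
) -> complex:
    """Approximate the contour integral by the sum S_n of (6.72).

    zeta_i is taken at the midpoint of each parameter interval.
    """
    dt = (t_b - t_a) / n
    total = 0j
    z_prev = path(t_a)
    for i in range(1, n + 1):
        z_i = path(t_a + i * dt)
        zeta_i = path(t_a + (i - 0.5) * dt)
        total += f(zeta_i) * (z_i - z_prev)
        z_prev = z_i
    return total


# Two sample paths from z_a = 0 to z_b = 1 + i, both parametrized by 0 <= t <= 1.
line = lambda t: complex(t, t)          # straight segment
parabola = lambda t: complex(t, t * t)  # y = x^2

identity = lambda z: z
conjugate = lambda z: z.conjugate()

i_line_z = contour_integral(identity, line, 0.0, 1.0)
i_parab_z = contour_integral(identity, parabola, 0.0, 1.0)
i_line_conj = contour_integral(conjugate, line, 0.0, 1.0)
i_parab_conj = contour_integral(conjugate, parabola, 0.0, 1.0)
```

For f(z) = z both paths should give a value close to $(1 + i)^2/2 = i$, whereas for the conjugate function the straight segment gives approximately 1 and the parabola approximately 1 + i/3.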

Notice that the contour integral I of (6.74), (6.75), or (6.76) depends in general on the paths connecting the points $z_a$ and $z_b$. If so, (6.76) is merely of secondary importance. But, if f(z) can be described by the derivative of another function, the situation becomes totally different. Suppose that we have

$$f(z) = \frac{d\tilde{F}(z)}{dz}. \tag{6.77}$$

If $\tilde{F}(z)$ is analytic, f(z) is analytic as well. It is because a derivative of an analytic function is again analytic (vide infra). Then, we have

$$f(z)\frac{dz}{dt} = \frac{d\tilde{F}(z)}{dz}\frac{dz}{dt} = \frac{d\tilde{F}[z(t)]}{dt}. \tag{6.78}$$

Inserting (6.78) into (6.76), we get

$$\int_C f(z)\frac{dz}{dt}\,dt = \int_{z_a}^{z_b} f(z)\,dz = \int_{t_a}^{t_b} \frac{d\tilde{F}[z(t)]}{dt}\,dt = \tilde{F}(z_b) - \tilde{F}(z_a). \tag{6.79}$$

In this case $\tilde{F}(z)$ is called a primitive function of f(z). It is obvious from (6.79) that

$$\int_{z_a}^{z_b} f(z)\,dz = -\int_{z_b}^{z_a} f(z)\,dz. \tag{6.80}$$

This implies that the contour integral I does not depend on the path C connecting the points $z_a$ and $z_b$. We must be careful, however, about a situation where there would be a singular point $z_s$ of f(z) in a region $\mathcal{R}$. Then, f(z) is not defined at $z_s$. If one chooses $C''$ for a contour of the return path in such a way that $z_s$ is on $C''$ (see Fig. 6.11), RHS of (6.80) cannot exist, nor can (6.80) hold. To avoid that situation, we have to “expel” such singularity from the region $\mathcal{R}$ so that $\mathcal{R}$ can wholly be contained in the domain of analyticity of f(z) and that f(z) can be analytic in the entire region $\mathcal{R}$. Moreover, we must choose $\mathcal{R}$ so that the whole region inside any closed paths within $\mathcal{R}$ can contain only the points that belong to the domain of analyticity of f(z). That is, the region $\mathcal{R}$ must be simply connected in terms of the analyticity of f(z) (see Sect. 6.1). For a region $\mathcal{R}$ to be simply connected ensures that (6.80) holds with any pair of points $z_a$ and $z_b$ in $\mathcal{R}$, so far as the integration path with respect to (6.80) is contained in $\mathcal{R}$.

Fig. 6.11 Region $\mathcal{R}$ and several contours for integration. Regarding the symbols and notations, see text

From (6.80), we further get

$$\int_{z_a}^{z_b} f(z)\,dz + \int_{z_b}^{z_a} f(z)\,dz = 0. \tag{6.81}$$

Since the integral does not depend on the paths, we can take a path from $z_b$ to $z_a$ along a curve $C'$ (see Fig. 6.11). Thus, we have

$$\int_C f(z)\,dz + \int_{C'} f(z)\,dz = \int_{C + C'} f(z)\,dz = 0, \tag{6.82}$$

where $C + C'$ constitutes a closed curve $\tilde{C}$ as shown. In this way, in accordance with (6.81) we get

$$\int_{\tilde{C}} f(z)\,dz = 0, \tag{6.83}$$

where $\tilde{C}$ is followed counterclockwise according to the custom.

In (6.83) any closed path $\tilde{C}$ can be chosen for the contour within the simply connected region $\mathcal{R}$. Hence, we reach the following important theorem.

Theorem 6.10: Cauchy’s Integral Theorem Let $\mathcal{R}$ be a simply connected region and let C be an arbitrary closed curve within $\mathcal{R}$. Let f(z) be analytic there. Then, we have

$$\oint_C f(z)\,dz = 0. \tag{6.84}$$

Fig. 6.12 Region $\mathcal{R}$ and contour C for integration. (a) A singular point at $z_s$ is contained within $\mathcal{R}$. (b) The singular point at $z_s$ is not contained within $\tilde{\mathcal{R}}$ so that f(z) can be analytic in a simply connected region $\tilde{\mathcal{R}}$. Regarding the symbols and notations, see text
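Theorem 6.10 can be illustrated numerically on a circular contour. The following sketch is an assumption-laden example, not the book's procedure: the helper `circle_integral`, the parametrization $\zeta = c + re^{i\theta}$ with $d\zeta = ire^{i\theta}d\theta$, the test functions, and the number of sample points are all chosen here for illustration.

```python
import cmath
import math
from typing import Callable


def circle_integral(
    f: Callable[[complex], complex],
    center: complex,
    radius: float,
    n: int = 2000,
) -> complex:
    """Approximate the counterclockwise integral of f over |zeta - center| = radius.

    Uses zeta = center + radius * e^{i theta} and d zeta = i * radius * e^{i theta} d theta,
    sampled at n equally spaced angles.
    """
    dtheta = 2 * math.pi / n
    total = 0j
    for k in range(n):
        w = radius * cmath.exp(1j * k * dtheta)
        total += f(center + w) * 1j * w * dtheta
    return total


reciprocal = lambda z: 1 / z

# e^z is analytic everywhere, so the closed-contour integral should vanish.
entire_case = circle_integral(cmath.exp, 0, 1.0)

# 1/z is analytic inside the circle |zeta - 3| = 1, which does not contain the origin.
regular_case = circle_integral(reciprocal, 3, 1.0)

# 1/z has a singular point at the origin, which lies inside the unit circle.
singular_case = circle_integral(reciprocal, 0, 1.0)
```

The first two values should be zero to rounding error, in line with (6.84). The third contour encloses the singular point of 1/z, so the theorem does not apply and the result is close to 2πi instead.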

There is a variation in description of Cauchy’s integral theorem. For instance, let f(z) be analytic in $\mathcal{R}$ except for a singular point at $z_s$; see Fig. 6.12a. Then, the domain of analyticity of f(z) is not identical to $\mathcal{R}$, but is defined as $\mathcal{R} - \{z_s\}$. That is, the domain of analyticity of f(z) is no longer simply connected and, hence, (6.84) is not generally true. In such a case, we may deform the integration path of f(z) so that its domain of analyticity can be simply connected and that (6.84) can hold. For example, we take $\tilde{\mathcal{R}}$ so that it can be surrounded by curves C and $\tilde{\Gamma}$ as well as lines $L_1$ and $L_2$ (Fig. 6.12b). As a result, $z_s$ has been expelled from $\tilde{\mathcal{R}}$ and, at the same time, $\tilde{\mathcal{R}}$ becomes simply connected. Then, as a tangible form of (6.84) we get

$$\oint_{C + \tilde{\Gamma} + L_1 + L_2} f(z)\,dz = 0, \tag{6.85}$$

where $\tilde{\Gamma}$ denotes the clockwise integration (see Fig. 6.12b) and the lines $L_1$ and $L_2$ are rendered as closely as possible. In (6.85) the integrations with $L_1$ and $L_2$ cancel, because they are taken in the reverse direction. Thus, we have

$$\oint_{C + \tilde{\Gamma}} f(z)\,dz = 0 \quad \text{or} \quad \oint_C f(z)\,dz = \oint_{\Gamma} f(z)\,dz, \tag{6.86}$$

where Γ stands for the counterclockwise integration. Equation (6.86) is valid as well, when there are more singular points. In that case, instead of (6.86) we have

$$\oint_C f(z)\,dz = \sum_i \oint_{\Gamma_i} f(z)\,dz, \tag{6.87}$$

where $\Gamma_i$ is taken so that it can encircle individual singular points. Reflecting the nature of singularity, we get different results with (6.87). We will come back to this point later.

On the basis of Cauchy’s integral theorem, we show an integral representation of an analytic function that is well known as Cauchy’s integral formula.

Theorem 6.11: Cauchy’s Integral Formula Let f(z) be analytic in a simply connected region $\mathcal{R}$. Let C be an arbitrary closed curve within $\mathcal{R}$ that encircles z. Then, we have

$$f(z) = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta, \tag{6.88}$$

where the contour integration along C is taken in the counterclockwise direction.

Fig. 6.13 Simply connected region $\mathcal{R}$ encircled by a closed contour C in a complex plane ζ. A circle Γ of radius ρ centered at z is contained within $\mathcal{R}$. The contour integration is taken in the counterclockwise direction along C or Γ

Proof Let Γ be a circle of a radius ρ in the complex plane so that z can be a center of the circle (see Fig. 6.13). Then, $f(\zeta)/(\zeta - z)$ has a singularity at ζ = z. Therefore, we must evaluate (6.88) using (6.86). From (6.86), we have

$$\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta = \oint_{\Gamma} \frac{f(\zeta)}{\zeta - z}\,d\zeta = f(z)\oint_{\Gamma} \frac{d\zeta}{\zeta - z} + \oint_{\Gamma} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta. \tag{6.89}$$

An arbitrary point ζ on the circle Γ is described as

$$\zeta = z + \rho e^{i\theta}. \tag{6.90}$$

Then, taking infinitesimal quantities of (6.90), we have

$$d\zeta = \rho e^{i\theta}\,i\,d\theta = (\zeta - z)\,i\,d\theta. \tag{6.91}$$

Inserting (6.91) into (6.89), we get

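$$\oint_{\Gamma} \frac{d\zeta}{\zeta - z} = \int_0^{2\pi} i\,d\theta = 2\pi i. \tag{6.92}$$

Rewriting (6.89), we obtain

$$\oint_{\Gamma} \frac{f(\zeta)}{\zeta - z}\,d\zeta = 2\pi i f(z) + \oint_{\Gamma} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta. \tag{6.93}$$

Since f(z) is analytic, f(z) is uniformly continuous [6]. Therefore, if we make ρ small enough with an arbitrary positive number ε, we have $|f(\zeta) - f(z)| < \varepsilon$, as $|\zeta - z| = \rho$. Considering this situation, we have

$$
\begin{aligned}
\left|\frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta - f(z)\right|
&= \left|\frac{1}{2\pi i}\oint_{\Gamma} \frac{f(\zeta)}{\zeta - z}\,d\zeta - f(z)\right| = \left|\frac{1}{2\pi i}\oint_{\Gamma} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta\right| \\
&\le \frac{1}{2\pi}\oint_{\Gamma} \frac{|f(\zeta) - f(z)|}{|\zeta - z|}\,|d\zeta| < \frac{\varepsilon}{2\pi}\oint_{\Gamma} \frac{|d\zeta|}{|\zeta - z|} = \frac{\varepsilon}{2\pi}\int_0^{2\pi} d\theta = \varepsilon.
\end{aligned} \tag{6.94}
$$

Therefore, we get (6.88) in the limit of ρ → 0. This completes the proof. ∎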

In the above proof, the domain of analyticity of $\frac{f(\zeta)}{\zeta - z}$ (with respect to ζ) is given by $\mathcal{R} \cap [\mathbb{C} - \{z\}] = \mathcal{R} - \{z\}$. Since both $\mathcal{R}$ and $\mathbb{C} - \{z\}$ are open sets (i.e., $\mathbb{C} - \{z\} = \{z\}^c$ is a complementary set of a closed set $\{z\}$ and, hence, an open set), this domain is an open set as well; see Sect. 6.1. Notice that $\mathcal{R} - \{z\}$ is not simply connected. For this reason, we had to evaluate (6.88) using (6.89).

Theorem 6.10 (Cauchy’s integral theorem) and Theorem 6.11 (Cauchy’s integral formula) play a crucial role in the theory of analytic functions. To derive (6.94), we can equally use the Darboux inequality [5]. This is intuitively obvious and frequently used in the theory of analytic functions.
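Cauchy’s integral formula (6.88) lends itself to a direct numerical check. The sketch below is illustrative only: the helper name `cauchy_formula`, the unit circle used as the contour C, the test function $e^z$, and the sample points are assumptions made here and are not part of the original text.

```python
import cmath
import math
from typing import Callable


def cauchy_formula(
    f: Callable[[complex], complex],
    z: complex,
    center: complex = 0j,
    radius: float = 1.0,
    n: int = 2000,
) -> complex:
    """Evaluate (1 / 2 pi i) * contour integral of f(zeta) / (zeta - z), cf. (6.88).

    The contour is the circle |zeta - center| = radius, traversed counterclockwise
    and sampled at n equally spaced angles.
    """
    dtheta = 2 * math.pi / n
    total = 0j
    for k in range(n):
        w = radius * cmath.exp(1j * k * dtheta)
        zeta = center + w
        total += f(zeta) / (zeta - z) * 1j * w * dtheta
    return total / (2j * math.pi)


z_inside = 0.3 + 0.2j   # encircled by the unit circle
z_outside = 2.0 + 0.0j  # not encircled by the unit circle

value_inside = cauchy_formula(cmath.exp, z_inside)
value_outside = cauchy_formula(cmath.exp, z_outside)
```

`value_inside` should reproduce $e^{z}$ at the interior point. For the exterior point the integrand is analytic inside the contour, so Theorem 6.10 applies and `value_outside` should vanish to rounding error.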

Theorem 6.12: Darboux Inequality Let f(z) be a function for which |f(z)| is bounded on C. Here C is a piecewise continuous path in the complex plane. Then, with the following integral I described by

$$I = \int_C f(z)\,dz,$$

we have

$$\left|\int_C f(z)\,dz\right| \le \max|f| \cdot L, \tag{6.95}$$

where L represents the arc length of the curve C for the contour integration.

Proof As discussed earlier in this section, the integral is the limit of n → ∞ of the sum described by

$$S_n = \sum_{i=1}^{n} f(\zeta_i)(z_i - z_{i-1}) \tag{6.72}$$

with

$$I = \lim_{n \to \infty} S_n = \lim_{n \to \infty} \sum_{i=1}^{n} f(\zeta_i)(z_i - z_{i-1}). \tag{6.73}$$

Denoting the maximum modulus of f(z) on C by max|f|, we have

$$|S_n| \le \sum_{i=1}^{n} |f(\zeta_i)| \cdot |z_i - z_{i-1}| \le \max|f| \sum_{i=1}^{n} |z_i - z_{i-1}|. \tag{6.96}$$

The sum $\sum_{i=1}^{n} |z_i - z_{i-1}|$ on RHS of inequality (6.96) is the length of a polygon inscribed in the curve C. It does not exceed the arc length L of the curve C. Hence, for all n we have

$$|S_n| \le \max|f| \cdot L.$$

As n → ∞, $|S_n| \to |I| = \left|\int_C f(z)\,dz\right| \le \max|f| \cdot L$. This completes the proof. ∎

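Applying the Darboux inequality to (6.94), we get

$$\left|\oint_{\Gamma} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta\right| = \left|\oint_{\Gamma} \frac{f(\zeta) - f(z)}{\rho e^{i\theta}}\,d\zeta\right| \le \max\left|\frac{f(\zeta) - f(z)}{\rho e^{i\theta}}\right| \cdot 2\pi\rho < 2\pi\varepsilon.$$

That is,

$$\left|\frac{1}{2\pi i}\oint_{\Gamma} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta\right| < \varepsilon. \tag{6.97}$$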
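The two sides of the Darboux inequality (6.95) can be compared numerically along the lines of the proof, using the sum $S_n$ and the length of the inscribed polygon. The sketch below rests on several assumptions introduced here: the function name `darboux_sides`, the choice $\zeta_i = z_i$, the sample path (the upper unit semicircle from 1 to −1), and the test function $e^z$; max|f| is estimated from the sampled points only.

```python
import cmath
import math
from typing import Callable, Tuple


def darboux_sides(
    f: Callable[[complex], complex],
    path: Callable[[float], complex],
    t_a: float,
    t_b: float,
    n: int = 5000,
) -> Tuple[float, float]:
    """Return (|S_n|, max|f| * polygon length) for the sum S_n of (6.72).

    zeta_i is taken as z_i, and max|f| is estimated over the sampled points.
    """
    dt = (t_b - t_a) / n
    points = [path(t_a + i * dt) for i in range(n + 1)]
    s_n = 0j
    polygon_length = 0.0
    for z_prev, z_i in zip(points, points[1:]):
        s_n += f(z_i) * (z_i - z_prev)
        polygon_length += abs(z_i - z_prev)
    f_max = max(abs(f(z)) for z in points)
    return abs(s_n), f_max * polygon_length


# Upper unit semicircle from z = 1 to z = -1 (arc length pi).
semicircle = lambda t: cmath.exp(1j * t)
lhs, rhs = darboux_sides(cmath.exp, semicircle, 0.0, math.pi)
```

Since $e^z$ has the primitive function $e^z$, the exact integral is $e^{-1} - e$, so `lhs` should be close to $e - 1/e \approx 2.35$, while `rhs` should be close to $e\pi \approx 8.54$, consistent with (6.95).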

So far, we dealt with the differentiation and integration as different mathematical manipulations. But Cauchy’s integral formula (or Cauchy’s integral expression) enables us to relate and unify these two manipulations. We have the following proposition for this.

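Proposition 6.1 [5] Let C be a piecewise continuous curve of finite length. (The curve C may or may not be closed.) Let f(z) be a continuous function. The contour integration of $f(\zeta)/(\zeta - z)$ gives $\tilde{f}(z)$ such that

$$\tilde{f}(z) = \frac{1}{2\pi i}\int_C \frac{f(\zeta)}{\zeta - z}\,d\zeta. \tag{6.98}$$

Then, $\tilde{f}(z)$ is analytic at any point z that does not lie on C.

Proof We consider the following expression:

$$\Delta \equiv \left|\frac{\tilde{f}(z + \Delta z) - \tilde{f}(z)}{\Delta z} - \frac{1}{2\pi i}\int_C \frac{f(\zeta)}{(\zeta - z)^2}\,d\zeta\right|, \tag{6.99}$$

where Δ is a real non-negative number. Describing (6.99) by use of (6.98) for $\tilde{f}(z + \Delta z)$ and $\tilde{f}(z)$, we get

$$\Delta = \frac{|\Delta z|}{2\pi}\left|\int_C \frac{f(\zeta)}{(\zeta - z - \Delta z)(\zeta - z)^2}\,d\zeta\right|. \tag{6.100}$$

To obtain (6.100), in combination with (6.98) we have calculated (6.99) as follows:

$$
\begin{aligned}
\Delta &= \left|\frac{1}{2\pi i\,\Delta z}\int_C \frac{f(\zeta)(\zeta - z)^2 - f(\zeta)(\zeta - z - \Delta z)(\zeta - z) - f(\zeta)\Delta z(\zeta - z - \Delta z)}{(\zeta - z - \Delta z)(\zeta - z)^2}\,d\zeta\right| \\
&= \left|\frac{1}{2\pi i\,\Delta z}\int_C \frac{f(\zeta)(\Delta z)^2}{(\zeta - z - \Delta z)(\zeta - z)^2}\,d\zeta\right|.
\end{aligned}
$$

Since z is not on C, the integrand of (6.100) is bounded. Then, as Δz → 0, |Δz| → 0 and Δ → 0 accordingly. From (6.99), this ensures the differentiability of $\tilde{f}(z)$. Then, by definition of the differentiation, $\frac{d\tilde{f}(z)}{dz}$ is given by

$$\frac{d\tilde{f}(z)}{dz} = \frac{1}{2\pi i}\int_C \frac{f(\zeta)}{(\zeta - z)^2}\,d\zeta. \tag{6.101}$$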

As $f(\zeta)$ is continuous, $\tilde f(z)$ is single valued. Thus, $\tilde f(z)$ is found to be analytic at any point $z$ that does not lie on $C$. This completes the proof. ∎

If in (6.101) we take $C$ as a closed contour that encircles $z$, from (6.98) we have

$$\tilde f(z) = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta. \tag{6.102}$$

Moreover, if $f(z)$ is analytic in a simply connected region that contains $C$, from Theorem 6.11 we must have

$$f(z) = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta. \tag{6.103}$$

Comparing (6.102) with (6.103) and taking account of the single valuedness of $f(z)$, we must have

$$f(z) \equiv \tilde f(z). \tag{6.104}$$

Then, from (6.101) we get

$$\frac{df(z)}{dz} = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - z)^2}\,d\zeta. \tag{6.105}$$

An analogous result holds with the $n$-th derivative of $f(z)$ such that [5]

$$\frac{d^n f(z)}{dz^n} = \frac{n!}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - z)^{n+1}}\,d\zeta \quad (n:\ \text{zero or positive integers}). \tag{6.106}$$
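As a numerical illustration of (6.106), the following minimal sketch approximates the contour integral on a circle centered at $z$ with the trapezoidal rule. The function name `cauchy_derivative`, the radius, the number of sample points, and the test function $e^z$ are all choices made for this illustration and are not taken from the text.

```python
import cmath
import math


def cauchy_derivative(f, z, n, radius=1.0, n_points=2000):
    """Approximate the n-th derivative of f at z using (6.106).

    The contour C is a circle of the given radius centered at z,
    parametrized as zeta = z + radius * exp(i * theta).
    """
    total = 0j
    dtheta = 2 * math.pi / n_points
    for k in range(n_points):
        e = cmath.exp(1j * k * dtheta)
        zeta = z + radius * e
        dzeta = 1j * radius * e * dtheta  # d(zeta) along the circle
        total += f(zeta) / (zeta - z) ** (n + 1) * dzeta
    return math.factorial(n) / (2j * math.pi) * total


# Illustrative test: every derivative of exp(z) is exp(z) itself.
z0 = 0.3 + 0.2j
third = cauchy_derivative(cmath.exp, z0, 3)
print(third, cmath.exp(z0))
```

For an entire function such as $e^z$ any radius may be used; for a function with singularities the circle must stay inside the region of analyticity.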

Equation (6.106) implies that an analytic function is infinitely differentiable and that the derivatives of all orders of an analytic function are again analytic. These prominent properties arise partly from the aforementioned stringent requirement on the differentiability of a function of a complex variable. So far, we have not assumed the continuity of $df(z)/dz$. However, once we have established (6.106), it assures the presence of, e.g., $d^2 f(z)/dz^2$ and, hence, the continuity of $df(z)/dz$. The same is true of $d^n f(z)/dz^n$ with any $n$ (zero or positive integers). The following theorem is important and intriguing with regard to analytic functions.

Theorem 6.13: Cauchy–Liouville Theorem A bounded entire function must be a constant.

Proof Using (6.106), we consider the first derivative of an entire function $f(z)$ described as

$$\frac{df}{dz} = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - z)^2}\,d\zeta.$$

Since $f(z)$ is an entire function, we can arbitrarily choose a large enough circle of radius $R$ centered at $z$ for a closed contour $C$. On the circle, we have

$$\zeta = z + Re^{i\theta},$$

where $\theta$ is a real number changing from 0 to $2\pi$. Then, the above equation can be rewritten as

$$\frac{df}{dz} = \frac{1}{2\pi i}\int_0^{2\pi} \frac{f(\zeta)}{(Re^{i\theta})^2}\,iRe^{i\theta}\,d\theta = \frac{1}{2\pi R}\int_0^{2\pi} \frac{f(\zeta)}{e^{i\theta}}\,d\theta. \tag{6.107}$$

Taking an absolute value of both sides, we have

$$\left|\frac{df}{dz}\right| \le \frac{1}{2\pi R}\int_0^{2\pi} |f(\zeta)|\,d\theta \le \frac{M}{2\pi R}\int_0^{2\pi} d\theta = \frac{M}{R}, \tag{6.108}$$

where $M$ is an upper bound of $|f(\zeta)|$; such a bound exists independently of $R$ because $f$ is bounded. As $R \to \infty$, the RHS of (6.108) tends to zero. Since the LHS does not depend on $R$, $df/dz$ must vanish at every $z$ and, hence, $f(z)$ is constant. This completes the proof. ∎

At first glance, the Cauchy–Liouville theorem looks astonishing from the viewpoint of real analysis. This is because we are too familiar with, e.g., $-1 \le \sin x \le 1$ for any real number $x$. Note that $\sin x$ is bounded in a real domain. In fact, in Sect. 8.6 we will show from (8.98) and (8.99) that when $a \to \pm\infty$ ($a$: real, $a \ne 0$), $\sin\left(\frac{\pi}{2} + ia\right)$ takes real values with $\sin\left(\frac{\pi}{2} + ia\right) \to \infty$. This simple example clearly shows that $\sin\phi$ is an unbounded entire function in a complex domain. As a very familiar example of a bounded entire function, we show

$$f(z) = \cos^2 z + \sin^2 z \equiv 1, \tag{6.109}$$

which is defined in the entire complex plane.
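The unboundedness of the sine function off the real axis can be checked directly. The short sketch below uses the standard identity $\sin\left(\frac{\pi}{2} + ia\right) = \cosh a$, which is quoted here as a known fact rather than derived from the text; the helper name `sin_shifted` is hypothetical.

```python
import cmath
import math


def sin_shifted(a):
    """Evaluate sin(pi/2 + i*a) for a real number a."""
    return cmath.sin(math.pi / 2 + 1j * a)


# The real part follows cosh(a) and grows without bound as |a| increases.
for a in (1.0, 5.0, 10.0):
    print(a, sin_shifted(a).real, math.cosh(a))
```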

6.4 Taylor’s Series and Laurent’s Series

Using Cauchy’s integral formula, we study Taylor’s series and Laurent’s series in relation to the power series expansion of an analytic function. Taylor’s series and Laurent’s series are fundamental tools to study various properties of complex functions in the theory of analytic functions. First, let us examine Taylor’s series of an analytic function.

Theorem 6.14 [6] Any analytic function $f(z)$ can be expressed by a uniformly convergent power series at an arbitrary regular point of the function in the domain of analyticity $\mathcal{R}$.

Fig. 6.14 Diagram to explain Taylor’s expansion that is assumed to be performed around $a$. A circle $C$ of radius $r$ centered at $a$ is inside $\mathcal{R}$. Regarding other symbols and notations, see text

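Proof Let us assume that we have a circle $C$ of radius $r$ centered at $a$ within $\mathcal{R}$. Choose $z$ inside $C$ with $|z - a| = \rho$ ($\rho < r$) and consider the contour integration of $f(z)$ along $C$ (Fig. 6.14). Then, we have

$$\frac{1}{\zeta - z} = \frac{1}{\zeta - a - (z - a)} = \frac{1}{\zeta - a}\cdot\frac{1}{1 - \frac{z - a}{\zeta - a}} = \frac{1}{\zeta - a} + \frac{z - a}{(\zeta - a)^2} + \frac{(z - a)^2}{(\zeta - a)^3} + \cdots, \tag{6.110}$$

where $\zeta$ is an arbitrary point on the circle $C$. To derive (6.110), we used the following formula:

$$\frac{1}{1 - x} = \sum_{n=0}^{\infty} x^n,$$

where $x = \frac{z - a}{\zeta - a}$. Since $\left|\frac{z - a}{\zeta - a}\right| = \frac{\rho}{r} < 1$, the geometric series of (6.110) converges. Suppose that $f(\zeta)$ is analytic on $C$. Then, we have a finite positive number $M$ on $C$ such that

$$|f(\zeta)| < M. \tag{6.111}$$

Using (6.110), we have

$$\frac{f(\zeta)}{\zeta - z} = \frac{f(\zeta)}{\zeta - a} + \frac{(z - a) f(\zeta)}{(\zeta - a)^2} + \frac{(z - a)^2 f(\zeta)}{(\zeta - a)^3} + \cdots. \tag{6.112}$$

Hence, combining (6.111) and (6.112) we get

$$\left|\sum_{\nu=n}^{\infty} \frac{(z - a)^\nu f(\zeta)}{(\zeta - a)^{\nu+1}}\right| \le \frac{M}{r}\sum_{\nu=n}^{\infty}\left(\frac{\rho}{r}\right)^{\nu} = \frac{M}{r}\left[\frac{1}{1 - \frac{\rho}{r}} - \frac{1 - \left(\frac{\rho}{r}\right)^n}{1 - \frac{\rho}{r}}\right] = \frac{M}{r - \rho}\left(\frac{\rho}{r}\right)^n. \tag{6.113}$$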

The RHS of (6.113) tends to zero when $n \to \infty$. Therefore, (6.112) is uniformly convergent with respect to $\zeta$ on $C$. In the above calculations, when we express the infinite series of the RHS in (6.112) as $\Sigma$ and the partial sum of its first $n$ terms as $\Sigma_n$, the LHS of (6.113) can be expressed as $|\Sigma - \Sigma_n|$ ($\equiv |P_n|$). Since $|P_n| \to 0$ with $n \to \infty$, this certainly shows that the series $\Sigma$ is uniformly convergent [6]. Consequently, we can perform termwise integration [6] of (6.112) on $C$ and subsequently divide the result by $2\pi i$ to get

$$\frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta = \sum_{n=0}^{\infty} \frac{(z - a)^n}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - a)^{n+1}}\,d\zeta. \tag{6.114}$$

From Theorem 6.11, the LHS of (6.114) equals $f(z)$. Putting

$$A_n \equiv \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - a)^{n+1}}\,d\zeta, \tag{6.115}$$

we get

$$f(z) = \sum_{n=0}^{\infty} A_n (z - a)^n. \tag{6.116}$$

This completes the proof. ∎

In the above proof, from (6.106) we have

$$\frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - z)^{n+1}}\,d\zeta = \frac{1}{n!}\frac{d^n f(z)}{dz^n}. \tag{6.117}$$

Hence, combining (6.117) with (6.115) we get

$$A_n = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - a)^{n+1}}\,d\zeta = \frac{1}{n!}\left.\frac{d^n f(z)}{dz^n}\right|_{z=a} = \frac{1}{n!}\frac{d^n f(a)}{dz^n}. \tag{6.118}$$

Thus, (6.116) can be rewritten as

$$f(z) = \sum_{n=0}^{\infty} \frac{1}{n!}\frac{d^n f(a)}{dz^n}(z - a)^n.$$

This is the same form as that obtained with respect to a real variable. Equation (6.116) is a uniformly convergent power series called a Taylor’s series or Taylor’s expansion of $f(z)$ with respect to $z = a$ with $A_n$ being its coefficient. In Theorem 6.14 we have assumed that a union of a circle $C$ and its inside is simply connected. This topological environment is virtually the same as that of Fig. 6.13. In Theorem 6.11 (Cauchy’s integral formula) we have shown the integral representation of an analytic function. On the basis of Cauchy’s integral formula, Theorem 6.14 demonstrates that any analytic function is given a tangible functional form.

Fig. 6.15 Diagram used for calculating the Taylor’s series of $f(z) = 1/z$ around $z = a$ ($a \ne 0$). A contour $C$ encircles the point $a$

Example 6.2 Let us think of a function $f(z) = 1/z$. This function is analytic except at $z = 0$. Therefore, it can be expanded in the Taylor’s series in the region $\mathbb{C} - \{0\}$. We consider the expansion around $z = a$ ($a \ne 0$). Using (6.115), we find that the coefficients of the expansion are

$$A_n = \frac{1}{2\pi i}\oint_C \frac{1}{z (z - a)^{n+1}}\,dz,$$

where $C$ is a closed curve that does not contain $z = 0$ on $C$ or in its inside. Therefore, $1/z$ is analytic in the simply connected region encircled with $C$ so that the point $z = 0$ may not be contained inside $C$ (Fig. 6.15). Then, from (6.106) we get

$$\frac{1}{2\pi i}\oint_C \frac{1}{z (z - a)^{n+1}}\,dz = \frac{1}{n!}\left.\frac{d^n (1/z)}{dz^n}\right|_{z=a} = \frac{1}{n!}(-1)^n\,n!\,\frac{1}{a^{n+1}} = \frac{(-1)^n}{a^{n+1}}.$$

Hence, from (6.116) we have

$$f(z) = \frac{1}{z} = \sum_{n=0}^{\infty} \frac{(-1)^n}{a^{n+1}}(z - a)^n = \frac{1}{a}\sum_{n=0}^{\infty}\left(1 - \frac{z}{a}\right)^n. \tag{6.119}$$
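The expansion (6.119) can be checked numerically by summing a finite number of terms. In the sketch below the function name `taylor_inverse`, the center $a = 1 + i$, and the two sample points are illustrative assumptions. Because (6.119) is a geometric series in $1 - z/a$, the partial sums are expected to approach $1/z$ when $|z - a| < |a|$ and to grow without bound when $|z - a| > |a|$.

```python
def taylor_inverse(z, a, n_terms):
    """Partial sum of the Taylor series (6.119) of 1/z around z = a."""
    return sum((-1) ** n / a ** (n + 1) * (z - a) ** n for n in range(n_terms))


a = 1 + 1j            # expansion center (illustrative), |a| is about 1.414
z_inside = 1.5 + 0.5j  # |z - a| is about 0.707, inside the convergence circle
z_outside = -1 + 0j    # |z - a| is about 2.236, outside the convergence circle

print(taylor_inverse(z_inside, a, 60), 1 / z_inside)
print(abs(taylor_inverse(z_outside, a, 60)))
```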

The series uniformly converges within a convergence circle called a “ball” [7] (vide infra) whose convergence radius $r$ is estimated to be

$$\frac{1}{r} = \limsup_{n\to\infty}\sqrt[n]{\left|\frac{(-1)^n}{a^{n+1}}\right|} = \limsup_{n\to\infty}\frac{1}{|a|}\sqrt[n]{\frac{1}{|a|}} = \frac{1}{|a|}. \tag{6.120}$$

That is, $r = |a|$. In (6.120) “$\limsup_{n\to\infty}$” stands for the superior limit and is sometimes said to be limes superior [6]. The estimation is due to the Cauchy–Hadamard theorem. Readers interested in the theorem and related concepts are referred to appropriate literature [6].

The concept of the convergence circle and radius is very important for defining the analyticity of a function. Formally, the convergence radius is defined as a radius of a convergence circle in a metric space, typically $\mathbb{R}^2$ and $\mathbb{C}$. In $\mathbb{C}$ a region $\mathcal{R}$ within the convergence circle of convergence radius $r$ (i.e., a real positive number) centered at $z_0$ is denoted by

$$\mathcal{R} = \{z;\ z,\ \exists z_0 \in \mathbb{C},\ \rho(z, z_0) < r\}.$$

The region $\mathcal{R}$ is an open set by definition (see Sect. 6.1) and the Taylor’s series defined in $\mathcal{R}$ is convergent. On the other hand, the Taylor’s series may or may not be convergent at $z$ that is on the convergence circle. In Example 6.2, $f(z) = \frac{1}{z}$ is not defined at $z = 0$, even though $z$ ($= 0$) is on the convergence circle. At the other points on the convergence circle $f(z)$ itself is analytic. The series (6.119), however, is a geometric series in $1 - z/a$ whose terms do not tend to zero when $|1 - z/a| = 1$, and so it does not converge at those points either. Note that a region $B^n$ in $\mathbb{R}^n$ similarly defined as

$$B^n = \{x;\ x,\ \exists x_0 \in \mathbb{R}^n,\ \rho(x, x_0) < r\}$$

is sometimes called a ball or open ball. This concept can readily be extended to any metric space.

Next, we examine Laurent’s series of an analytic function.

Theorem 6.15 [6] Let $f(z)$ be analytic in a region $\mathcal{R}$ except for a point $a$. Then, $f(z)$ can be described by a uniformly convergent power series within the region $\mathcal{R} - \{a\}$.

Proof Let us assume that we have a circle $C_1$ of radius $r_1$ centered at $a$ and another circle $C_2$ of radius $r_2$ centered at $a$, both within $\mathcal{R}$. Moreover, let another circle $\Gamma$ be centered at $z$ ($\ne a$) within $\mathcal{R}$ so that $z$ can be outside of $C_2$; see Fig. 6.16. In this situation we consider the contour integration along $C_1$, $C_2$, and $\Gamma$. Using (6.87), we have

$$\frac{1}{2\pi i}\oint_\Gamma \frac{f(\zeta)}{\zeta - z}\,d\zeta = \frac{1}{2\pi i}\oint_{C_1} \frac{f(\zeta)}{\zeta - z}\,d\zeta - \frac{1}{2\pi i}\oint_{C_2} \frac{f(\zeta)}{\zeta - z}\,d\zeta. \tag{6.121}$$

Fig. 6.16 Diagram to explain Laurent’s expansion that is performed around $a$. The circle $\Gamma$ containing the point $z$ lies on the annular region bounded by the circles $C_1$ and $C_2$

From Theorem 6.11, the LHS of (6.121) equals $f(z)$. Notice that the region containing $\Gamma$ and its inside forms a simply connected region in terms of the analyticity of $f(z)$. With the first term of the RHS of (6.121), from Theorem 6.14 we have

$$\frac{1}{2\pi i}\oint_{C_1} \frac{f(\zeta)}{\zeta - z}\,d\zeta = \sum_{n=0}^{\infty} A_n (z - a)^n \tag{6.122}$$

with

$$A_n = \frac{1}{2\pi i}\oint_{C_1} \frac{f(\zeta)}{(\zeta - a)^{n+1}}\,d\zeta \quad (n = 0, 1, 2, \cdots). \tag{6.123}$$

In fact, if $\zeta$ lies on the circle $C_1$, the Taylor’s series is uniformly convergent as in the case of the proof of Theorem 6.14. With the second term of the RHS of (6.121), however, the situation is different in such a way that $z$ lies outside the circle $C_2$. In this case, from Fig. 6.16 we have

$$\left|\frac{\zeta - a}{z - a}\right| = \frac{r_2}{\rho} < 1. \tag{6.124}$$

Then, a geometric series described by

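$$\frac{1}{\zeta - z} = \frac{1}{\zeta - a - (z - a)} = -\frac{1}{z - a}\left[1 + \frac{\zeta - a}{z - a} + \frac{(\zeta - a)^2}{(z - a)^2} + \cdots\right] \tag{6.125}$$

is uniformly convergent with respect to $\zeta$ on $C_2$. Accordingly, again we can perform termwise integration [6] of (6.125) on $C_2$. Hence, we get

$$-\oint_{C_2} \frac{f(\zeta)}{\zeta - z}\,d\zeta = \frac{1}{z - a}\oint_{C_2} f(\zeta)\,d\zeta + \frac{1}{(z - a)^2}\oint_{C_2} f(\zeta)(\zeta - a)\,d\zeta + \frac{1}{(z - a)^3}\oint_{C_2} f(\zeta)(\zeta - a)^2\,d\zeta + \cdots. \tag{6.126}$$

That is, we have

$$-\frac{1}{2\pi i}\oint_{C_2} \frac{f(\zeta)}{\zeta - z}\,d\zeta = \sum_{n=1}^{\infty} A_{-n}(z - a)^{-n}, \tag{6.127}$$

where

$$A_{-n} = \frac{1}{2\pi i}\oint_{C_2} f(\zeta)(\zeta - a)^{n-1}\,d\zeta \quad (n = 1, 2, 3, \cdots). \tag{6.128}$$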

Consequently, from (6.121), with $z$ lying on the annular region bounded by $C_1$ and $C_2$, we get

$$f(z) = \sum_{n=-\infty}^{\infty} A_n (z - a)^n. \tag{6.129}$$

In (6.129), the coefficients $A_n$ are given by (6.123) or (6.128) according to the plus (including zero) or minus sign of $n$. This completes the proof. ∎

Equation (6.129) is a uniformly convergent power series called a Laurent’s series or Laurent’s expansion of $f(z)$ with respect to $z = a$ with $A_n$ being its coefficient. Note that the above annular region bounded by $C_1$ and $C_2$ is not simply connected, but multiply connected.

We add that if the integration path $\tilde C$ is taken in the annular region formed by $C_1$ and $C_2$ so that it can be sandwiched in between them, the values of the integrals expressed as (6.123) and (6.128) remain unchanged in virtue of (6.87). That is, instead of (6.123) and (6.128) we may have the following unified formula

$$A_n = \frac{1}{2\pi i}\oint_{\tilde C} \frac{f(\zeta)}{(\zeta - a)^{n+1}}\,d\zeta \quad (n = 0, \pm 1, \pm 2, \cdots) \tag{6.130}$$

that represents the coefficients $A_n$ of the power series

$$f(z) = \sum_{n=-\infty}^{\infty} A_n (z - a)^n. \tag{6.129}$$

6.5 Zeros and Singular Points

We described the fundamental theorems of Taylor’s expansion and Laurent’s expansion. Next, we explain important concepts of zeros and singular points.

Definition 6.11 Let $f(z)$ be analytic in a region $\mathcal{R}$. If $f(z)$ vanishes at a point $z = a$, the point is called a zero of $f(z)$. When we have

$$f(a) = \left.\frac{df(z)}{dz}\right|_{z=a} = \left.\frac{d^2 f(z)}{dz^2}\right|_{z=a} = \cdots = \left.\frac{d^{n-1} f(z)}{dz^{n-1}}\right|_{z=a} = 0, \tag{6.131}$$

but

$$\left.\frac{d^n f(z)}{dz^n}\right|_{z=a} \ne 0, \tag{6.132}$$

the function is said to have a zero of order $n$ at $z = a$.

If (6.131) and (6.132) hold, the first $n$ coefficients in the Taylor’s series of the analytic function $f(z)$ at $z = a$ vanish. Hence, we have

$$f(z) = A_n (z - a)^n + A_{n+1}(z - a)^{n+1} + \cdots = (z - a)^n \sum_{k=0}^{\infty} A_{n+k}(z - a)^k = (z - a)^n h(z),$$

where we define $h(z)$ as

$$h(z) \equiv \sum_{k=0}^{\infty} A_{n+k}(z - a)^k. \tag{6.133}$$

Then, $h(z)$ is analytic and nonvanishing at $z = a$. From the analyticity $h(z)$ must be continuous at $z = a$ and differ from zero in some finite neighborhood of $z = a$. Consequently, $f(z)$ also differs from zero in that neighborhood except at $z = a$ itself. Therefore, if the set of zeros had an accumulation point at $z = a$, any neighborhood of $z = a$ would contain another zero, in contradiction to the above assumption. To avoid this contradiction, the analytic function must satisfy $f(z) \equiv 0$ throughout the region $\mathcal{R}$. Taking the contraposition of the above statement, if an analytic function is not identically zero [i.e., $f(z) \not\equiv 0$], the zeros of that function are isolated; see the relevant discussion of Sect. 6.1.2. Meanwhile, the Laurent’s series (6.129) can be rewritten as


$$f(z) = \sum_{k=0}^{\infty} A_k (z - a)^k + \sum_{k=1}^{\infty} A_{-k}\frac{1}{(z - a)^k} \tag{6.134}$$

so that the constitution of the Laurent’s series can explicitly be recognized. The second term is called a singular part (or principal part) of $f(z)$ at $z = a$. The singularity is classified as follows:

(1) If (6.134) lacks the singular part (i.e., $A_{-k} = 0$ for $k = 1, 2, \cdots$), the singular point is said to be a removable singularity. In this case, we have

$$f(z) = \sum_{k=0}^{\infty} A_k (z - a)^k. \tag{6.135}$$

If in (6.135) we suitably define $f(a) \equiv A_0$, $f(z)$ can be regarded as analytic. Examples include the following case where a sine function is expanded as the Taylor’s series:

$$\sin z = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots + \frac{(-1)^{n-1} z^{2n-1}}{(2n - 1)!} + \cdots = \sum_{n=1}^{\infty} \frac{(-1)^{n-1} z^{2n-1}}{(2n - 1)!}. \tag{6.136}$$

Hence, if we define

$$f(z) \equiv \frac{\sin z}{z} \quad \text{and} \quad f(0) \equiv \lim_{z\to 0}\frac{\sin z}{z} = 1, \tag{6.137}$$

$z = 0$ is a removable singularity.

(2) Suppose that in (6.134) we have a certain positive integer $n$ such that

$$A_{-n} \ne 0 \quad \text{but} \quad A_{-(n+1)} = A_{-(n+2)} = \cdots = 0. \tag{6.138}$$

Then, the function $f(z)$ is said to have a pole of order $n$ at $z = a$. If, in particular, $n = 1$ in (6.138), i.e.,

$$A_{-1} \ne 0 \quad \text{but} \quad A_{-2} = A_{-3} = \cdots = 0, \tag{6.139}$$

the function $f(z)$ is said to have a simple pole. This special case is important in the calculation of various integrals (vide infra). In the above cases, (6.134) can be rewritten as

$$f(z) = \frac{g(z)}{(z - a)^n}, \tag{6.140}$$

where $g(z)$ defined as

$$g(z) \equiv \sum_{k=0}^{\infty} A_{k-n}(z - a)^k \tag{6.141}$$

is analytic and nonvanishing at $z = a$, i.e., $A_{-n} \ne 0$. Explicitly writing (6.140), we have

$$f(z) = \sum_{k=0}^{\infty} A_k (z - a)^k + \sum_{k=1}^{n} A_{-k}\frac{1}{(z - a)^k}.$$

(3) If the singular part of (6.134) comprises an infinite series, the point $z = a$ is called an essential singularity. In that case, $f(z)$ exhibits complicated behavior near or at $z = a$. Interested readers are referred to suitable literature [5].

A function $f(z)$ that is analytic in a region of $\mathbb{C}$ except at a set of points of the region where the function has poles is called a meromorphic function in the said region. The above definition of meromorphic functions is true of Cases (1) and (2), but not true of Case (3). Henceforth, we will be dealing with the meromorphic functions.

In the above discussion of Cases (1) and (2), we often deal with a function $f(z)$ that has a single isolated pole at $z = a$. This implies that $f(z)$ is analytic within a certain neighborhood $N_a$ of $z = a$ but is not analytic at $z = a$. More specifically, $f(z)$ is analytic in a region $N_a - \{a\}$. Even though we consider more than one isolated pole, the situation is essentially the same. Suppose that there is another isolated pole at $z = b$. In that case, again take a certain neighborhood $N_b$ of $z = b$ ($\ne a$) and $f(z)$ is analytic in a region $N_b - \{b\}$.

Readers may well wonder why we have to discuss this trifling issue. Nevertheless, think of the situation where a set of poles has an accumulation point. Any neighborhood of the accumulation point contains another pole where the function is not analytic. This contradicts the assumption that $f(z)$ is analytic in a region, e.g., $N_a - \{a\}$. Thus, if such a function were present, it would be intractable to deal with.

6.6 Analytic Continuation

When we discussed Cauchy’s integral formula (Theorem 6.11), we learned that if a function is analytic in a certain region of $\mathbb{C}$ and on a curve $C$ that encircles the region, the values of the function within the region are determined once the values of the function on $C$ are given. We have the following theorem for this.

Fig. 6.17 Four convergence circles $C_1$, $C_2$, $C_3$, and $C_4$ for $1/z$. These convergence circles are centered at $1$, $i$, $-1$, or $-i$

Theorem 6.16 Let $f_1(z)$ and $f_2(z)$ be two functions that are analytic within a region $\mathcal{R}$. Suppose that the two functions coincide in a set chosen from among (i) a neighborhood of a point $z \in \mathcal{R}$, or (ii) a segment of a curve lying in $\mathcal{R}$, or (iii) a set of points containing an accumulation point belonging to $\mathcal{R}$. Then, the two functions $f_1(z)$ and $f_2(z)$ coincide throughout $\mathcal{R}$.

Proof Suppose that $f_1(z)$ and $f_2(z)$ coincide on the above set chosen from the three. Then, since $f_1(z) - f_2(z) = 0$ on that set, the set comprises the zeros of $f_1(z) - f_2(z)$. This implies that the zeros are not isolated in $\mathcal{R}$. Thus, as evidenced from the discussion of Sect. 6.5, we conclude that $f_1(z) - f_2(z) \equiv 0$ throughout $\mathcal{R}$. That is, $f_1(z) \equiv f_2(z)$. This completes the proof. ∎

The theorem can be understood as follows: Two different analytic functions cannot coincide in the set chosen from the above three (or more succinctly, a set that contains an accumulation point). In other words, the behavior of an analytic function in $\mathcal{R}$ is uniquely determined by the limited information of the subset belonging to $\mathcal{R}$. The above statement is reminiscent of the pre-established harmony and, hence, occasionally talked about mysteriously. Instead, we wish to give a simple example.

Example 6.3 In Example 6.2 we described a tangible illustration of the Taylor’s expansion of a function $1/z$ as follows:

$$\frac{1}{z} = \sum_{n=0}^{\infty} \frac{(-1)^n}{a^{n+1}}(z - a)^n = \frac{1}{a}\sum_{n=0}^{\infty}\left(1 - \frac{z}{a}\right)^n. \tag{6.119}$$

If we choose $1$, $i$, $-1$, or $-i$ for $a$, we can draw four convergence circles as depicted in Fig. 6.17. Let each function described by (6.119) be $f_1(z)$, $f_i(z)$, $f_{-1}(z)$, or $f_{-i}(z)$. Let $\mathcal{R}$ be a region that consists of an outer periphery and its inside of the four circles $C_1$, $C_2$, $C_3$, and $C_4$ (i.e., convergence circles of radius 1). The said region $\mathcal{R}$ is colored pale green in Fig. 6.17. There are four petals as shown.

We can view Fig. 6.17 as follows: For instance, $f_1(z)$ and $f_i(z)$ are defined in $C_1$ and its inside and $C_2$ and its inside, respectively, excluding $z = 0$. We have $f_1(z) = f_i(z)$ within the petal $P_1$ (overlapped part as shown). Consequently, from the statement of Theorem 6.16, $f_1(z) \equiv f_i(z)$ throughout the region encircled by $C_1$ and $C_2$ (excluding $z = 0$). Repeating this procedure, we get

$$f_1(z) \equiv f_i(z) \equiv f_{-1}(z) \equiv f_{-i}(z).$$

Thus, we obtain the same functional entity throughout $\mathcal{R}$ except for the origin ($z = 0$). Further continuing similar procedures, we finally reach the entire complex plane except for the origin, i.e., $\mathbb{C} - \{0\}$ regarding the domain of analyticity of $1/z$.

We further compare the above results with those for an analytic function defined in the real domain. The Taylor’s expansion of $f(x) = 1/x$ ($x$: real with $x \ne 0$) around $a$ ($a \ne 0$) reads as

$$f(x) = \frac{1}{x} = f(a) + f'(a)(x - a) + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n}{a^{n+1}}(x - a)^n. \tag{6.142}$$

This is exactly the same form as that of (6.119) aside from notation of the argument. The aforementioned procedure that determines the behavior of an analytic function outside the region where that function was originally defined is called analytic continuation or analytic prolongation. Comparing (6.119) and (6.142), we can see that (6.119) is a consequence of the analytic continuation of $1/x$ from the real domain to the complex domain.

6.7 Calculus of Residues

The theorem of residues and its application to the calculation of various integrals are among the central themes of the theory of analytic functions. Cauchy’s integral theorem (Theorem 6.10) tells us that if $f(z)$ is analytic in a simply connected region $\mathcal{R}$, we have

$$\oint_C f(z)\,dz = 0, \tag{6.84}$$

where $C$ is a closed curve within $\mathcal{R}$. As we have already seen, this is not necessarily the case if $f(z)$ has singular points in $\mathcal{R}$. Meanwhile, if we look at (6.130), we are aware of a simple but important fact. That is, replacing $n$ with $-1$, we have

$$A_{-1} = \frac{1}{2\pi i}\oint_C f(\zeta)\,d\zeta \quad \text{or} \quad \oint_C f(\zeta)\,d\zeta = 2\pi i A_{-1}. \tag{6.143}$$

This implies that if we could obtain the Laurent’s series of $f(z)$ with respect to a singularity located at $z = a$ and encircled by $C$, we can immediately estimate $\oint_C f(\zeta)\,d\zeta$ of (6.143) from the coefficient of the term $(z - a)^{-1}$, i.e., $A_{-1}$. This fact further implies that even though $f(z)$ has singularities, $\oint_C f(\zeta)\,d\zeta$ may happen to vanish. Thus, whether (6.84) vanishes depends on the nature of $f(z)$ and its singularities. To examine it, we define a residue of $f(z)$ at $z = a$ (that is encircled by $C$) as

$$\mathrm{Res}\,f(a) \equiv \frac{1}{2\pi i}\oint_C f(z)\,dz \quad \text{or} \quad \oint_C f(z)\,dz = 2\pi i\,\mathrm{Res}\,f(a), \tag{6.144}$$

where $\mathrm{Res}\,f(a)$ denotes the residue of $f(z)$ at $z = a$. From (6.143) and (6.144), we have

$$A_{-1} = \mathrm{Res}\,f(a). \tag{6.145}$$

In a formal sense, the point $a$ does not have to be a singular point. If it is a regular point of $f(z)$, we have $\mathrm{Res}\,f(a) = 0$ trivially. If there is more than one singularity, located at $z = a_j$, we have

$$\oint_C f(z)\,dz = 2\pi i\sum_j \mathrm{Res}\,f(a_j). \tag{6.146}$$
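The remark that a contour integral may vanish even when singularities are enclosed can be illustrated with $f(z) = 1/(1 + z^2)$, whose residues at $z = \pm i$ are $\pm\frac{1}{2i}$ and cancel in (6.146). The following sketch evaluates the contour integrals numerically on circles; the helper `circle_integral`, the radii, and the number of sample points are assumptions made for this illustration.

```python
import cmath
import math


def circle_integral(f, center, radius, n_points=4000):
    """Trapezoidal approximation of the counterclockwise integral of f over a circle."""
    total = 0j
    dtheta = 2 * math.pi / n_points
    for k in range(n_points):
        e = cmath.exp(1j * k * dtheta)
        total += f(center + radius * e) * 1j * radius * e * dtheta
    return total


def f(z):
    return 1 / (1 + z * z)


both_poles = circle_integral(f, 0j, 2.0)  # encloses z = i and z = -i
upper_pole = circle_integral(f, 1j, 0.5)  # encloses z = i only

print(both_poles)  # residues cancel, so the integral is close to 0
print(upper_pole)  # 2*pi*i * (1/(2i)) = pi
```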

Notice that we assume the isolated singularities with (6.144) and (6.146). These equations are associated with Cases (1) and (2) dealt with in Sect. 6.5. Within this framework, we wish to evaluate the residue of $f(z)$ at a pole of order $n$ located at $z = a$. We use (6.140), i.e.,

$$f(z) = \frac{g(z)}{(z - a)^n}, \tag{6.140}$$

where $g(z)$ is analytic and nonvanishing at $z = a$. Inserting (6.140) into (6.144), we have

$$\mathrm{Res}\,f(a) = \frac{1}{2\pi i}\oint_C \frac{g(\zeta)}{(\zeta - a)^n}\,d\zeta = \frac{1}{(n - 1)!}\left.\frac{d^{n-1} g(z)}{dz^{n-1}}\right|_{z=a} = \frac{1}{(n - 1)!}\left.\frac{d^{n-1}\left[f(z)(z - a)^n\right]}{dz^{n-1}}\right|_{z=a}, \tag{6.147}$$

where with the second equality we used (6.106).

In the special but important case where $f(z)$ has a simple pole at $z = a$, from (6.147) we get

$$\mathrm{Res}\,f(a) = f(z)(z - a)\big|_{z=a}. \tag{6.148}$$

Setting $n = 1$ in (6.147) in combination with (6.140), we obtain

$$\mathrm{Res}\,f(a) = \frac{1}{2\pi i}\oint_C f(z)\,dz = \frac{1}{2\pi i}\oint_C \frac{g(z)}{z - a}\,dz = f(z)(z - a)\big|_{z=a} = g(z)\big|_{z=a}.$$

This is nothing but Cauchy’s integral formula (6.88) regarding $g(z)$.

Summarizing the above discussion, we can calculate the residue of $f(z)$ at a pole of order $n$ located at $z = a$ using one of the following alternatives, i.e., (i) using (6.147) and (ii) picking up the coefficient $A_{-1}$ in the Laurent’s series described by (6.129). In Sect. 6.3 we mentioned that the closed contour integral (6.86) or (6.87) depends on the nature of singularity. Here we have reached a simple criterion of evaluating that integral. In fact, suppose that we have a Laurent’s expansion such that

$$f(z) = \sum_{n=-\infty}^{\infty} A_n (z - a)^n. \tag{6.129}$$

Then, let us calculate the contour integral of $f(z)$ on a closed circle $\Gamma$ of radius $\rho$ centered at $z = a$. We have

$$I = \oint_\Gamma f(\zeta)\,d\zeta = \sum_{n=-\infty}^{\infty} A_n \oint_\Gamma (\zeta - a)^n\,d\zeta = \sum_{n=-\infty}^{\infty} iA_n \int_0^{2\pi} \rho^{n+1} e^{i(n+1)\theta}\,d\theta. \tag{6.149}$$

The above integral vanishes except for $n = -1$. With $n = -1$ we get

$$I = 2\pi i A_{-1}.$$

Thus, we recover (6.145) in combination with (6.144). To get a Laurent’s series of $f(z)$, however, is not necessarily easy. In that case, we estimate a residue by (6.147). Tangible examples can be seen in the next section.
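The statement that each term of (6.149) vanishes except for $n = -1$ can be verified numerically. The sketch below integrates $(\zeta - a)^n$ on a circle for several integers $n$; the function name `power_integral`, the center $a$, and the radius $\rho$ are illustrative assumptions.

```python
import cmath
import math


def power_integral(n, a=0.5 + 0.5j, rho=1.5, n_points=2000):
    """Approximate the closed integral of (zeta - a)**n on a circle of radius rho around a."""
    total = 0j
    dtheta = 2 * math.pi / n_points
    for k in range(n_points):
        e = cmath.exp(1j * k * dtheta)
        zeta = a + rho * e
        total += (zeta - a) ** n * 1j * rho * e * dtheta
    return total


# Only n = -1 survives and yields 2*pi*i; all other powers integrate to zero.
for n in range(-3, 4):
    print(n, power_integral(n))
```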

6.8 Examples of Real Definite Integrals

The calculus of residues is of great practical importance. For instance, it is directly associated with the calculations of definite integrals. In particular, even if one finds the real integration to be hard to perform in order to solve a related problem, it is often easy to solve it using complex integration, especially the calculus of residues. Here we study several examples.

Example 6.4 Let us consider the following real definite integral:

$$I = \int_{-\infty}^{\infty} \frac{dx}{1 + x^2}. \tag{6.150}$$

To this end, we convert the real variable $x$ to the complex variable $z$ and evaluate a contour integral $I_C$ (Fig. 6.18) described by

$$I_C = \int_{-R}^{R} \frac{dz}{1 + z^2} + \int_{\Gamma_R} \frac{dz}{1 + z^2}, \tag{6.151}$$

where $R$ is a sufficiently large real positive number ($R \gg 1$); $\Gamma_R$ denotes an upper semicircle; $I_C$ stands for the contour integral along the closed curve $C$ that comprises the interval $[-R, R]$ and the upper semicircle $\Gamma_R$. Simple poles are located at $z = \pm i$ as shown. There is only one simple pole at $z = i$ within the upper semicircle of Fig. 6.18. From (6.146), therefore, we have

$$I_C = \oint_C f(z)\,dz = 2\pi i\,\mathrm{Res}\,f(i), \tag{6.152}$$

where $f(z) \equiv \frac{1}{1 + z^2}$. Using (6.148), the residue of $f(z)$ at $z = i$ can readily be estimated to be

$$\mathrm{Res}\,f(i) = f(z)(z - i)\big|_{z=i} = \frac{1}{2i}. \tag{6.153}$$

Fig. 6.18 Contour for the integration of $1/(1 + z^2)$ that appears in Example 6.4. One may equally choose the upper semicircle (denoted by $\Gamma_R$) and lower semicircle (denoted by $\tilde\Gamma_R$) for contour integration

To estimate the integral (6.150), we must calculate the second term of (6.151). To this end, we change the variable $z$ such that

$$z = Re^{i\theta}, \tag{6.154}$$

where $\theta$ is an argument. Then, using the Darboux inequality (6.95), we get

$$\left|\int_{\Gamma_R} \frac{dz}{1 + z^2}\right| = \left|\int_0^{\pi} \frac{iRe^{i\theta}}{R^2 e^{2i\theta} + 1}\,d\theta\right| \le \int_0^{\pi} \left|\frac{iRe^{i\theta}}{R^2 e^{2i\theta} + 1}\right| d\theta = \int_0^{\pi} \frac{R}{\sqrt{\left(R^2 e^{2i\theta} + 1\right)\left(R^2 e^{-2i\theta} + 1\right)}}\,d\theta$$

$$= \int_0^{\pi} \frac{R}{\sqrt{R^4 + 2R^2\cos 2\theta + 1}}\,d\theta \le \frac{R}{R^2 - 1}\int_0^{\pi} d\theta = \frac{\pi}{R\left(1 - \frac{1}{R^2}\right)}.$$

Taking $R \to \infty$, we have

$$\int_{\Gamma_R} \frac{dz}{1 + z^2} \to 0. \tag{6.155}$$

Consequently, if $R \to \infty$, we have

$$\lim_{R\to\infty} I_C = \int_{-\infty}^{\infty} \frac{dx}{1 + x^2} + \lim_{R\to\infty}\int_{\Gamma_R} \frac{dz}{1 + z^2} = \int_{-\infty}^{\infty} \frac{dx}{1 + x^2} = I = 2\pi i\,\mathrm{Res}\,f(i) = \pi, \tag{6.156}$$

where with the first equality we placed the variable back to $x$; with the last equality we used (6.152) and (6.153). Equation (6.156) gives an answer to (6.150). Notice in general that if a meromorphic function is given as a quotient of two polynomials, an integral of the type of (6.155) tends to zero with $R \to \infty$ in the case where the degree of the denominator of that function is at least two units higher than the degree of the numerator [8].

We may equally choose another contour $\tilde C$ (the closed curve comprising the interval $[-R, R]$ and the lower semicircle $\tilde\Gamma_R$) for the integration (Fig. 6.18). In that case, we have

$$\lim_{R\to\infty} I_{\tilde C} = \int_{\infty}^{-\infty} \frac{dx}{1 + x^2} + \lim_{R\to\infty}\int_{\tilde\Gamma_R} \frac{dz}{1 + z^2} = \int_{\infty}^{-\infty} \frac{dx}{1 + x^2} = 2\pi i\,\mathrm{Res}\,f(-i) = -\pi. \tag{6.157}$$

Note that in the above equation the contour integral is taken in the counterclockwise direction and, hence, that the real definite integral has been taken from $\infty$ to $-\infty$. Notice also that

$$\mathrm{Res}\,f(-i) = f(z)(z + i)\big|_{z=-i} = -\frac{1}{2i},$$

because $f(z)$ has a simple pole at $z = -i$ in the lower semicircle (Fig. 6.18) for which the residue has been taken. From (6.157), we get the same result as before such that

$$\int_{-\infty}^{\infty} \frac{dx}{1 + x^2} = \pi.$$

Alternatively, we can use the method of partial fraction decomposition. In that case, as the integrand we have

$$\frac{1}{1 + z^2} = \frac{1}{2i}\left[\frac{1}{z - i} - \frac{1}{z + i}\right]. \tag{6.158}$$

The residue of $\frac{1}{1 + z^2}$ at $z = i$ is $\frac{1}{2i}$ [i.e., the coefficient of $\frac{1}{z - i}$ in (6.158)], giving the same result as (6.153).
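The bookkeeping of Example 6.4 at a finite radius can be checked with a short numerical sketch. It integrates $1/(1 + z^2)$ over the upper semicircle with the midpoint rule, adds the exact value $2\arctan R$ of the integral over $[-R, R]$, and compares the arc contribution with the bound $\pi R/(R^2 - 1)$ obtained from the Darboux inequality. The function name `arc_integral`, the choice $R = 10$, and the number of sample points are assumptions for illustration.

```python
import cmath
import math


def arc_integral(R, n_points=20000):
    """Midpoint-rule integral of 1/(1+z^2) over the upper semicircle z = R*exp(i*theta)."""
    dtheta = math.pi / n_points
    total = 0j
    for k in range(n_points):
        theta = (k + 0.5) * dtheta
        z = R * cmath.exp(1j * theta)
        total += 1j * z / (1 + z * z) * dtheta  # dz = i*z*d(theta)
    return total


R = 10.0
arc = arc_integral(R)
segment = 2 * math.atan(R)         # exact integral of 1/(1+x^2) over [-R, R]
bound = math.pi * R / (R * R - 1)  # Darboux bound on the arc contribution

print(arc, bound)     # the arc contribution stays below the bound
print(segment + arc)  # closed-contour value 2*pi*i*Res f(i) = pi
```

Increasing `R` shrinks both the bound and the arc contribution, so that the segment integral alone approaches $\pi$.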

Example 6.5 Evaluate the following real definite integral:

$$I = \int_{-\infty}^{\infty} \frac{dx}{(1 + x^2)^3}. \tag{6.159}$$

As in the case of Example 6.4, we evaluate a contour integration described by

$$I_C = \oint_C \frac{dz}{(1 + z^2)^3} = \int_{-R}^{R} \frac{dz}{(1 + z^2)^3} + \int_{\Gamma_R} \frac{dz}{(1 + z^2)^3}, \tag{6.160}$$

where $C$ stands for the closed curve comprising the interval $[-R, R]$ and the upper semicircle $\Gamma_R$ (see Fig. 6.18). The function $f(z) \equiv \frac{1}{(1 + z^2)^3}$ has two isolated poles at $z = \pm i$. In the present case, however, the poles are of order 3. Hence, we use (6.147) to estimate the residue. The inside of the upper semicircle contains the pole of order 3 at $z = i$. Therefore, we have

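$$\mathrm{Res}\,f(i) = \frac{1}{2\pi i}\oint_C f(z)\,dz = \frac{1}{2\pi i}\oint_C \frac{g(z)}{(z - i)^3}\,dz = \frac{1}{2!}\left.\frac{d^2 g(z)}{dz^2}\right|_{z=i}, \tag{6.161}$$

where

$$g(z) \equiv \frac{1}{(z + i)^3} \quad \text{and} \quad f(z) = \frac{g(z)}{(z - i)^3}.$$

Hence, we get

$$\frac{1}{2!}\left.\frac{d^2 g(z)}{dz^2}\right|_{z=i} = \frac{1}{2}(-3)(-4)(z + i)^{-5}\Big|_{z=i} = \frac{1}{2}(-3)(-4)(2i)^{-5} = \frac{3}{16i} = \mathrm{Res}\,f(i). \tag{6.162}$$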

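For a reason similar to that of Example 6.4, the second term of (6.160) vanishes when $R \to \infty$. Then, we obtain

$$\lim_{R\to\infty} I_C = \int_{-\infty}^{\infty} \frac{dz}{(1 + z^2)^3} + \lim_{R\to\infty}\int_{\Gamma_R} \frac{dz}{(1 + z^2)^3} = \lim_{R\to\infty}\oint_C \frac{g(z)}{(z - i)^3}\,dz = 2\pi i\,\mathrm{Res}\,f(i) = 2\pi i \cdot \frac{3}{16i} = \frac{3\pi}{8}.$$

That is,

$$\int_{-\infty}^{\infty} \frac{dz}{(1 + z^2)^3} = \int_{-\infty}^{\infty} \frac{dx}{(1 + x^2)^3} = I = \frac{3\pi}{8}.$$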
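The value $3\pi/8$ can be cross-checked in two independent ways: by integrating $f(z)$ numerically on a small circle around $z = i$, which gives $2\pi i\,\mathrm{Res}\,f(i)$, and by a direct midpoint-rule quadrature of the real integral over a long but finite interval. The helper names, the circle radius, the cutoff $L = 50$, and the step counts in this sketch are all illustrative assumptions.

```python
import cmath
import math


def f(z):
    return 1 / (1 + z * z) ** 3


def circle_integral(func, center, radius, n_points=2000):
    """Trapezoidal approximation of the counterclockwise integral of func over a circle."""
    total = 0j
    dtheta = 2 * math.pi / n_points
    for k in range(n_points):
        e = cmath.exp(1j * k * dtheta)
        total += func(center + radius * e) * 1j * radius * e * dtheta
    return total


def real_integral(L=50.0, n_steps=200000):
    """Midpoint-rule integral of f(x) over [-L, L]; the tails beyond L are neglected."""
    h = 2 * L / n_steps
    return sum(f(-L + (k + 0.5) * h) for k in range(n_steps)) * h


contour_value = circle_integral(f, 1j, 1.0)  # equals 2*pi*i * Res f(i)
print(contour_value, real_integral(), 3 * math.pi / 8)
```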

If we are able to perform the Laurent’s expansion of $f(z)$, we can immediately evaluate the residue using (6.145). To this end, let us utilize the binomial expansion formula (or generalized binomial theorem). We have

$$f(z) = \frac{1}{(1 + z^2)^3} = \frac{1}{(z + i)^3 (z - i)^3}.$$

Changing the variable $z - i = \zeta$, we have

$$f(z) = \tilde f(\zeta) = \frac{1}{(\zeta + 2i)^3 \zeta^3} = \frac{1}{\zeta^3 (2i)^3}\left(1 + \frac{\zeta}{2i}\right)^{-3}$$

$$= \frac{1}{\zeta^3 (2i)^3}\left[1 + (-3)\left(\frac{\zeta}{2i}\right) + \frac{(-3)(-4)}{2!}\left(\frac{\zeta}{2i}\right)^2 + \frac{(-3)(-4)(-5)}{3!}\left(\frac{\zeta}{2i}\right)^3 + \cdots\right]$$

$$= -\frac{1}{8i}\left[\frac{1}{\zeta^3} - \frac{3}{2i\zeta^2} - \frac{3}{2\zeta} + \frac{5}{4i} + \cdots\right], \tag{6.163}$$

where $\cdots$ of the above equation denotes a power series of $\zeta$. Accompanied by the variable transformation $z$ to $\zeta$, the functional form $f$ has been changed to $\tilde f$. Getting back the original functional form, we have

$$f(z) = -\frac{1}{8i}\left[\frac{1}{(z - i)^3} - \frac{3}{2i(z - i)^2} - \frac{3}{2(z - i)} + \frac{5}{4i} + \cdots\right]. \tag{6.164}$$
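The leading coefficients of (6.164) can be compared with a numerical evaluation of the unified coefficient formula (6.130). Multiplying out the prefactor $-\frac{1}{8i}$ gives $A_{-3} = \frac{i}{8}$, $A_{-2} = -\frac{3}{16}$, $A_{-1} = -\frac{3i}{16}$ ($= \frac{3}{16i}$), and $A_0 = \frac{5}{32}$. In the sketch below the function names, the contour radius, and the number of sample points are assumptions made for illustration.

```python
import cmath
import math


def f_example(z):
    return 1 / (1 + z * z) ** 3


def laurent_coefficient(f, a, n, radius=1.0, n_points=2000):
    """Approximate A_n of (6.130) on a circle of the given radius centered at a."""
    total = 0j
    dtheta = 2 * math.pi / n_points
    for k in range(n_points):
        e = cmath.exp(1j * k * dtheta)
        zeta = a + radius * e
        total += f(zeta) / (zeta - a) ** (n + 1) * 1j * radius * e * dtheta
    return total / (2j * math.pi)


# The circle of radius 1 around z = i excludes the other pole at z = -i.
for n in (-3, -2, -1, 0):
    print(n, laurent_coefficient(f_example, 1j, n))
```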

Equation (6.164) is a Laurent’s expansion of $f(z)$. From (6.164) we see that the coefficient of $\frac{1}{z - i}$ of $f(z)$ is $\frac{3}{16i}$, in agreement with (6.145) and (6.162). To obtain the answer of (6.159), once again we may equally choose the contour $\tilde C$ (Fig. 6.18) and get the same result as in the case of Example 6.4. In that case, $f(z)$ can be expanded into a Laurent’s series around $z = -i$. Readers are encouraged to check it.

We make a few remarks about the (generalized) binomial expansion formula. We described it in (3.181) of Sect. 3.6.1 such that

$$(1 + x)^{\lambda} = \sum_{m=0}^{\infty} \binom{\lambda}{m} x^m, \tag{3.181}$$

where $\lambda$ is an arbitrary real number. In (3.181), we assumed $x$ is a real number with $|x| < 1$. But, (6.163) suggests that (3.181) holds with $x$ that can be a complex number with $|x| < 1$. Moreover, $\lambda$ in (3.181) is allowed to be any complex number. Then, on those conditions $(1 + x)^{\lambda}$ is analytic because $1 + x \ne 0$, and so $(1 + x)^{\lambda}$ must have a convergent Taylor’s series. By the same token, the factor $\left(1 + \frac{\zeta}{2i}\right)^{-3}$ of (6.163) yields a convergent Taylor’s series around $\zeta = 0$. In virtue of the factor $\frac{1}{\zeta^3}$, (6.163) gives a convergent Laurent’s series around $\zeta = 0$.

Returning to the present issue, we rewrite (3.181) as a Taylor’s series such that

$$(1 + z)^{\lambda} = \sum_{m=0}^{\infty} \binom{\lambda}{m} z^m, \tag{6.165}$$

where $\lambda$ is any complex number with $z$ also being a complex number of $|z| < 1$. We may view (6.165) as a consequence of the analytic continuation and (6.163) has been dealt with as such indeed.

Example 6.6 Evaluate the following real definite integral:

$$I = \int_{-\infty}^{\infty} \frac{dx}{1 + x^3}. \tag{6.166}$$

We put

$$f(x) \equiv \frac{1}{1 + x^3}. \tag{6.167}$$

Fig. 6.19 Graphical form of $f(x) \equiv \frac{1}{1 + x^3}$. The modulus of $f(x)$ tends to infinity at $x = -1$. Inflection points are present at $x = 0$ and $\sqrt[3]{1/2}$ ($\approx 0.7937$)

Figure 6.19 depicts a graphical form of f (x). The modulus of f (x) tends to be pffiffiffiffiffiffiffiffi infinity at x ¼ 1. Inflection points are present at x ¼ 0 and 3 1=2 ð 0:7937Þ. Now, in the former two examples, the isolated singular points (i.e., poles) existed only in the inside of a closed contour. In the present case, however, a pole is located on the real axis. Bearing in mind this situation, we wish to estimate the integral (6.166). Rewriting (6.166), we have Z

Z

1

dx I¼ 2 x þ 1Þ ð 1 þ x Þ x ð 1

or

I¼

1

dz : 2 z þ 1Þ ð 1 þ z Þ z ð 1

The polynomial of the denominator, i.e., $(1+z)(z^{2}-z+1)$, has a real root at $z = -1$ and complex roots at $z = e^{\pm i\pi/3}$ $\left(= \frac{1 \pm \sqrt{3}i}{2}\right)$. Therefore,

$$f(z) \equiv \frac{1}{(1+z)(z^{2}-z+1)}$$


6 Theory of Analytic Functions

Fig. 6.20 Contour for the integration of $\frac{1}{1+z^{3}}$ that appears in Example 6.6

has three simple poles at $z = -1$ along with $z = e^{\pm i\pi/3}$. This time, we have a contour $C$ depicted in Fig. 6.20, where the contour integration $I_C$ is performed in such a way that

$$I_C = \int_{-R}^{-1-r} \frac{dz}{(1+z)(z^{2}-z+1)} + \int_{\Gamma_{-1}} \frac{dz}{(1+z)(z^{2}-z+1)} + \int_{-1+r}^{R} \frac{dz}{(1+z)(z^{2}-z+1)} + \int_{\Gamma_{R}} \frac{dz}{(1+z)(z^{2}-z+1)}, \tag{6.168}$$

where $\Gamma_{-1}$ stands for a small semicircle of radius $r$ around $z = -1$ as shown. Of the complex roots $z = e^{\pm i\pi/3}$, only $z = e^{i\pi/3}$ is responsible for the contour integration (see Fig. 6.20). Here we define a principal value $P$ of the integral such that

$$P\int_{-\infty}^{\infty} \frac{dx}{(1+x)(x^{2}-x+1)} \equiv \lim_{r \to 0}\left[\int_{-\infty}^{-1-r} \frac{dx}{(1+x)(x^{2}-x+1)} + \int_{-1+r}^{\infty} \frac{dx}{(1+x)(x^{2}-x+1)}\right], \tag{6.169}$$

where $r$ is an arbitrary real positive number. As shown in this example, the principal value is a convenient device to traverse the poles on the contour and gives a correct answer in the contour integral. At the same time, letting $R$ go to infinity, we obtain

$$\lim_{R \to \infty} I_C = P\int_{-\infty}^{\infty} \frac{dx}{(1+x)(x^{2}-x+1)} + \int_{\Gamma_{-1}} \frac{dz}{(1+z)(z^{2}-z+1)} + \int_{\Gamma_{\infty}} \frac{dz}{(1+z)(z^{2}-z+1)}. \tag{6.170}$$

Meanwhile, the contour integral $I_C$ is given by (6.146) such that

$$\lim_{R \to \infty} I_C = \oint_{C} f(z)\,dz = 2\pi i \sum_{j} \operatorname{Res} f(a_j), \tag{6.146}$$

where in the present case $f(z) \equiv \frac{1}{(1+z)(z^{2}-z+1)}$ and we have only one simple pole at $a_j = e^{i\pi/3}$ within the contour $C$. We are going to get the answer by combining (6.170) and (6.146) such that

$$I \equiv P\int_{-\infty}^{\infty} \frac{dx}{(1+x)(x^{2}-x+1)} = 2\pi i \operatorname{Res} f\left(e^{i\pi/3}\right) - \int_{\Gamma_{-1}} \frac{dz}{(1+z)(z^{2}-z+1)} - \int_{\Gamma_{\infty}} \frac{dz}{(1+z)(z^{2}-z+1)}. \tag{6.171}$$

The third term of RHS of (6.171) tends to zero as in the case of Examples 6.4 and 6.5. Therefore, if we can properly estimate the second term as well as the residue, the integral (6.166) can be adequately determined. Note that in this case the integral is meant in the sense of the principal value. To estimate the integral of the second term, we make use of the fact that, defining $g(z)$ as

$$g(z) \equiv \frac{1}{z^{2}-z+1}, \tag{6.172}$$

$g(z)$ is analytic at and around $z = -1$. Hence, $g(z)$ can be expanded in the Taylor's series around $z = -1$ such that

$$g(z) = A_0 + \sum_{n=1}^{\infty} A_n (z+1)^{n}, \tag{6.173}$$

where $A_0 \neq 0$. In fact,

$$A_0 = g(-1) = 1/3. \tag{6.174}$$

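Then, we have

$$\int_{\Gamma_{-1}} \frac{dz}{(1+z)(z^{2}-z+1)} = \int_{\Gamma_{-1}} \frac{g(z)\,dz}{1+z} = A_0 \int_{\Gamma_{-1}} \frac{dz}{1+z} + \sum_{n=1}^{\infty} A_n \int_{\Gamma_{-1}} (z+1)^{n-1}\,dz. \tag{6.175}$$

Changing the variable $z$ to the polar form in RHS such that

$$z = -1 + re^{i\theta}, \tag{6.176}$$

with the first term of RHS we have

$$A_0 \int_{\Gamma_{-1}} \frac{ire^{i\theta}\,d\theta}{re^{i\theta}} = A_0 \int_{\Gamma_{-1}} i\,d\theta = A_0 \int_{\pi}^{0} i\,d\theta = -A_0 \int_{0}^{\pi} i\,d\theta = -A_0 i\pi. \tag{6.177}$$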

Note that in (6.177) the contour integration along $\Gamma_{-1}$ was performed in the decreasing direction of the argument $\theta$ from $\pi$ to $0$. Thus, (6.175) is rewritten as

$$\int_{\Gamma_{-1}} \frac{dz}{(1+z)(z^{2}-z+1)} = -A_0 i\pi + \sum_{n=1}^{\infty} A_n i \int_{\pi}^{0} r^{n} e^{in\theta}\,d\theta, \tag{6.178}$$

where with the last equality we exchanged the order of the summation and integration. The second term of RHS of (6.178) vanishes when $r \to 0$. Thus, (6.171) is further rewritten as

$$I = \lim_{r \to 0} I = 2\pi i \operatorname{Res} f\left(e^{i\pi/3}\right) + A_0 i\pi. \tag{6.179}$$
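The limiting value $-A_0 i\pi = -i\pi/3$ of the integral along the small semicircle $\Gamma_{-1}$, traversed from $\theta = \pi$ to $\theta = 0$, can be checked numerically. The sketch below is an illustration only: the helper name `semicircle_integral` and the chosen radii are assumptions, and Simpson's rule in $\theta$ stands in for the analytic argument of (6.175) through (6.178).

```python
import cmath
import math

def semicircle_integral(f, center, r, n=2000):
    """Integrate f along z = center + r*exp(i*theta), theta running from pi down to 0."""
    h = -math.pi / n   # negative step: theta decreases from pi to 0

    def integrand(theta):
        e = cmath.exp(1j * theta)
        return f(center + r * e) * 1j * r * e   # f(z) dz/dtheta

    total = integrand(math.pi) + integrand(0.0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(math.pi + k * h)
    return total * h / 3

def f(z):
    return 1 / (1 + z**3)

target = -1j * math.pi / 3   # -A0*i*pi with A0 = 1/3
deviations = []
for r in (1e-1, 1e-2, 1e-3):
    value = semicircle_integral(f, -1.0, r)
    deviations.append(abs(value - target))
    print(r, value, deviations[-1])
```

The deviation from $-i\pi/3$ should shrink roughly in proportion to $r$, consistent with the $n \geq 1$ terms of (6.178) vanishing as $r \to 0$.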

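We have only one simple pole for $f(z)$ at $z = e^{i\pi/3}$ within the semicircle contour $C$. Then, using (6.148), we have

$$\operatorname{Res} f\left(e^{i\pi/3}\right) = \left. f(z)\left(z - e^{i\pi/3}\right)\right|_{z = e^{i\pi/3}} = \frac{1}{\left(1 + e^{i\pi/3}\right)\left(e^{i\pi/3} - e^{-i\pi/3}\right)} = -\frac{1}{6}\left(1 + \sqrt{3}i\right). \tag{6.180}$$

The above calculation is straightforward, but (6.37) can be conveniently used. Thus, finally we get

$$I = -\frac{\pi i}{3}\left(1 + \sqrt{3}i\right) + \frac{\pi i}{3} = \frac{\sqrt{3}\pi}{3} = \frac{\pi}{\sqrt{3}}. \tag{6.181}$$

That is,

$$I \approx 1.8138. \tag{6.182}$$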

To avoid any ambiguity of the principal value, we properly define it as [8]

$$P\int_{-\infty}^{\infty} f(x)\,dx \equiv \lim_{R \to \infty} \int_{-R}^{R} f(x)\,dx.$$
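The value $\pi/\sqrt{3}$ of (6.181) can be cross-checked on the real axis without any contour. The sketch below is not the method of the text: it folds the integrand about the pole, using $f(-1+t) + f(-1-t) = 6/(t^{4} - 3t^{2} + 9)$, which is finite for all real $t$, and then maps $[0, \infty)$ onto $[0, 1)$ by $t = s/(1-s)$. The helper names are illustrative assumptions.

```python
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def folded(t):
    # f(-1 + t) + f(-1 - t) for f(x) = 1/(1 + x**3); the 1/t singularities cancel.
    return 6.0 / (t**4 - 3 * t**2 + 9)

def mapped(s):
    # Substitution t = s/(1 - s), dt = ds/(1 - s)**2, maps [0, 1) onto [0, infinity).
    if s >= 1.0:
        return 0.0   # limiting value of the mapped integrand at s = 1
    t = s / (1.0 - s)
    return folded(t) / (1.0 - s) ** 2

principal_value = simpson(mapped, 0.0, 1.0)
exact = math.pi / math.sqrt(3)
print(principal_value, exact)
```

Because the folding is symmetric about $x = -1$ and the integrand decays like $|x|^{-3}$, the folded integral agrees with the symmetric limit used in the definition above.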

Fig. 6.21 Sector of circle for contour integration of $\frac{1}{1+z^{3}}$ that appears in Example 6.7. (a) The integration range is $[0, \infty)$. (b) The integration range is $(-\infty, 0]$

The integral can also be calculated using a lower semicircle including the simple pole located at $z = e^{-i\pi/3}$ $\left(= \frac{1 - \sqrt{3}i}{2}\right)$. That gives the same answer as (6.181). The calculations are left for readers as an exercise.

Example 6.7 [9, 10] It will be interesting to compare the result of Example 6.6 with that of the following real definite integral:

$$I = \int_{0}^{\infty} \frac{dx}{1+x^{3}}. \tag{6.183}$$

To evaluate (6.183), it is convenient to use a sector of a circle for the contour (see Fig. 6.21a). In this case, the contour integral $I_C$ estimated along a sector is described by

$$I_C = \int_{0}^{R} \frac{dz}{1+z^{3}} + \int_{\Gamma_{R}} \frac{dz}{1+z^{3}} + \int_{L} \frac{dz}{1+z^{3}}. \tag{6.184}$$

In the third integral of (6.184), we take $\arg z = \theta$ (constant) so that with $z = re^{i\theta}$ we may have $dz = e^{i\theta}\,dr$. Then, that integral is given by

$$\int_{L} \frac{dz}{1+z^{3}} = \int_{L} \frac{e^{i\theta}\,dr}{1 + r^{3}e^{3i\theta}}. \tag{6.185}$$

Setting $3i\theta = 2i\pi$, namely $\theta = 2\pi/3$, we get

$$\int_{L} \frac{dz}{1+z^{3}} = e^{2\pi i/3} \int_{R}^{0} \frac{dr}{1+r^{3}} = -e^{2\pi i/3} \int_{0}^{R} \frac{dr}{1+r^{3}}.$$

Thus, taking $R \to \infty$ in (6.184) we have

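$$\lim_{R \to \infty} I_C = \int_{0}^{\infty} \frac{dx}{1+x^{3}} + \int_{\Gamma_{\infty}} \frac{dz}{1+z^{3}} - e^{2\pi i/3} \int_{0}^{\infty} \frac{dr}{1+r^{3}}. \tag{6.186}$$

The second term of (6.186) vanishes as before. Changing the variable $r \to x$ and taking account of (6.180), we obtain

$$2\pi i \operatorname{Res} f\left(e^{i\pi/3}\right) = \left(1 - e^{2\pi i/3}\right) \int_{0}^{\infty} \frac{dx}{1+x^{3}} = \left(1 - e^{2\pi i/3}\right) I. \tag{6.187}$$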

Then, we have

$$I = \frac{2\pi i \operatorname{Res} f\left(e^{i\pi/3}\right)}{1 - e^{2\pi i/3}} = \frac{2\pi i}{\left(1 - e^{2\pi i/3}\right)\left(1 + e^{i\pi/3}\right)\left(e^{i\pi/3} - e^{-i\pi/3}\right)}.$$

To calculate $\operatorname{Res} f\left(e^{i\pi/3}\right)$, we used (6.148) at $z = e^{i\pi/3}$; $f(z)$ has a simple pole at $z = e^{i\pi/3}$ within the contour $C$ of Fig. 6.21a. Noting that $\left(1 - e^{2\pi i/3}\right)\left(1 + e^{i\pi/3}\right) = 3$ and $e^{i\pi/3} - e^{-i\pi/3} = 2i\sin(\pi/3) = \sqrt{3}i$, we get

$$I = \frac{2\pi i}{3\sqrt{3}i} = \frac{2\pi}{3\sqrt{3}}. \tag{6.188}$$

That is, $I \approx 1.2092$. This number is two-thirds of (6.182). We can further extend the above calculation method to the evaluation of other real definite integrals. For instance, consider the following integral:

$$I = P\int_{-\infty}^{0} \frac{dx}{1+x^{3}}. \tag{6.189}$$

To evaluate this integral, we define the contour integral $I_{C'}$ as in Fig. 6.21b, which is expressed as

$$I_{C'} = P\int_{-R}^{0} \frac{dz}{1+z^{3}} + \int_{\Gamma_{-1}} \frac{dz}{(1+z)(z^{2}-z+1)} + \int_{L'} \frac{dz}{1+z^{3}} + \int_{\Gamma_{R}} \frac{dz}{1+z^{3}}. \tag{6.190}$$

Notice that combining $I_{C'}$ of (6.190) with $I_C$ of (6.184) makes the contour integration of (6.168). Proceeding as before and noting that there is no singularity inside the contour $C'$, we have

$$\lim_{R \to \infty} I_{C'} = 0 = P\int_{-\infty}^{0} \frac{dz}{1+z^{3}} - \frac{\pi i}{3} + e^{2\pi i/3} \int_{0}^{\infty} \frac{dr}{1+r^{3}} = P\int_{-\infty}^{0} \frac{dz}{1+z^{3}} - \frac{\pi}{3\sqrt{3}},$$

where we used (6.188) and (6.178) in combination with (6.174). Thus, we get

$$I = P\int_{-\infty}^{0} \frac{dx}{1+x^{3}} = \frac{\pi}{3\sqrt{3}}. \tag{6.191}$$
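Both (6.188) and (6.191) lend themselves to a quick numerical check. The sketch below is an illustration under stated assumptions: the residue is evaluated in the form of (6.180), the half-line integral is computed after the substitution $x = s/(1-s)$, and the principal value on $(-\infty, 0]$ is obtained by folding the integrand about $x = -1$ on $[-2, 0]$ and adding the regular tail on $(-\infty, -2]$. All helper names are hypothetical.

```python
import cmath
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

# Residue in the form of (6.180), then (6.187) solved for I.
pole = cmath.exp(1j * math.pi / 3)
residue = 1 / ((1 + pole) * (pole - pole.conjugate()))
I_residue = 2j * math.pi * residue / (1 - cmath.exp(2j * math.pi / 3))

def half_line(s):
    # x = s/(1 - s) maps [0, 1) onto [0, infinity); integrand of (6.183).
    if s >= 1.0:
        return 0.0
    x = s / (1.0 - s)
    return 1.0 / (1.0 + x**3) / (1.0 - s) ** 2

def folded(t):
    # f(-1 + t) + f(-1 - t) for f(x) = 1/(1 + x**3); regular at t = 0.
    return 6.0 / (t**4 - 3 * t**2 + 9)

def tail(s):
    # Integral over (-infinity, -2]: with x = -u, u = 2 + s/(1 - s), f(x) dx -> du/(1 - u**3).
    if s >= 1.0:
        return 0.0
    u = 2.0 + s / (1.0 - s)
    return 1.0 / (1.0 - u**3) / (1.0 - s) ** 2

I_numeric = simpson(half_line, 0.0, 1.0)                              # compare with (6.188)
pv_negative = simpson(folded, 0.0, 1.0) + simpson(tail, 0.0, 1.0)     # compare with (6.191)
print(I_residue, I_numeric, pv_negative)
```

The two half-line results should add up to the principal value $\pi/\sqrt{3}$ of (6.181).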

As a matter of course, the result of (6.191) can immediately be obtained by subtracting (6.188) from (6.181).

We frequently encounter real definite integrals including trigonometric functions. In such cases, to make a valid estimation of the integrals it is desirable to use complex exponential functions. Jordan's lemma is a powerful tool for this. The proof is given below following the literature [5].

Lemma 6.6: Jordan's Lemma [5] Let $\Gamma_R$ be a semicircle of radius $R$ centered at the origin in the upper half of the complex plane. Let $f(z)$ be an analytic function that tends uniformly to zero as $|z| \to \infty$ when $\arg z$ lies in the interval $0 \leq \arg z \leq \pi$. Then, with a positive real number $\alpha$, we have

$$\lim_{R \to \infty} \int_{\Gamma_R} e^{i\alpha\zeta} f(\zeta)\,d\zeta = 0.$$

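Proof Using polar coordinates, $\zeta$ is expressed as

$$\zeta = Re^{i\theta} = R(\cos\theta + i\sin\theta).$$

Then, the integral is denoted by

$$I_R \equiv \int_{\Gamma_R} e^{i\alpha\zeta} f(\zeta)\,d\zeta = iR \int_{0}^{\pi} f\left(Re^{i\theta}\right) e^{i\alpha R(\cos\theta + i\sin\theta)} e^{i\theta}\,d\theta = iR \int_{0}^{\pi} f\left(Re^{i\theta}\right) e^{i\alpha R\cos\theta - \alpha R\sin\theta + i\theta}\,d\theta.$$

By assumption $f(z)$ tends uniformly to zero as $|z| \to \infty$, and so we have

$$\left|f\left(Re^{i\theta}\right)\right| < \varepsilon(R),$$

where $\varepsilon(R)$ is a certain positive number that depends only on $R$ and tends to zero as $R \to \infty$. Therefore, we have

$$|I_R| < \varepsilon(R) R \int_{0}^{\pi} e^{-\alpha R\sin\theta}\,d\theta = 2\varepsilon(R) R \int_{0}^{\pi/2} e^{-\alpha R\sin\theta}\,d\theta,$$

where the last equality results from the symmetry of $\sin\theta$ about $\pi/2$. Meanwhile, we have

$$\sin\theta \geq 2\theta/\pi \quad \text{when} \quad 0 \leq \theta \leq \pi/2.$$

Hence, we have

$$|I_R| < 2\varepsilon(R) R \int_{0}^{\pi/2} e^{-2\alpha R\theta/\pi}\,d\theta = \frac{\pi\varepsilon(R)}{\alpha}\left(1 - e^{-\alpha R}\right).$$

Therefore, we get

$$\lim_{R \to \infty} I_R = 0.$$

This completes the proof. ∎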
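The bound obtained in the proof can be compared with a direct numerical evaluation of $I_R$. The sketch below is an illustration only: it assumes the test function $f(z) = 1/z$, for which $|f(Re^{i\theta})| = 1/R$ plays the role of $\varepsilon(R)$ (so the comparison is made with $\leq$ rather than a strict inequality), takes $\alpha = 1$, and integrates over $\theta$ with Simpson's rule.

```python
import cmath
import math

def arc_integral(f, alpha, R, n=4000):
    """Integral of exp(i*alpha*zeta) f(zeta) over the upper semicircle of radius R."""
    h = math.pi / n

    def g(theta):
        zeta = R * cmath.exp(1j * theta)
        return cmath.exp(1j * alpha * zeta) * f(zeta) * 1j * zeta   # d(zeta) = i*zeta*d(theta)

    total = g(0.0) + g(math.pi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(k * h)
    return total * h / 3

def f(z):
    return 1 / z   # illustrative choice; |f| = 1/R on the arc

alpha = 1.0
results = []
for R in (5.0, 10.0, 20.0, 40.0):
    value = abs(arc_integral(f, alpha, R))
    bound = math.pi * (1.0 / R) * (1 - math.exp(-alpha * R)) / alpha
    results.append((R, value, bound))
    print(R, value, bound)
```

For each radius the modulus of the arc integral should stay below the bound $\pi\varepsilon(R)(1 - e^{-\alpha R})/\alpha$, and both shrink as $R$ grows.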

Alternatively, for $f(z)$ that is analytic and tends uniformly to zero as $|z| \to \infty$ when $\arg z$ lies in the interval $\pi \leq \arg z \leq 2\pi$, similarly we have

$$\lim_{R \to \infty} \int_{\Gamma_R} e^{-i\alpha\zeta} f(\zeta)\,d\zeta = 0,$$

where $\alpha$ is a certain positive real number. In this case, we assume that a semicircle $\Gamma_R$ of radius $R$ centered at the origin lies in the lower half of the complex plane.

Using Jordan's lemma, we have several examples.

Example 6.8 Evaluate the following real definite integral:

$$I = \int_{-\infty}^{\infty} \frac{\sin x}{x}\,dx. \tag{6.192}$$

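Rewriting (6.192) as

$$I = \int_{-\infty}^{\infty} \frac{\sin z}{z}\,dz = \frac{1}{2i} \int_{-\infty}^{\infty} \frac{e^{iz} - e^{-iz}}{z}\,dz, \tag{6.193}$$

we consider the following contour integral:

$$I_C = P\int_{-R}^{R} \frac{e^{iz}}{z}\,dz + \int_{\Gamma_0} \frac{e^{iz}}{z}\,dz + \int_{\Gamma_R} \frac{e^{iz}}{z}\,dz, \tag{6.194}$$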

where $\Gamma_0$ represents an infinitesimally small semicircle around the origin and $\Gamma_R$ shows the outer semicircle of radius $R$ centered at the origin (see Fig. 6.22). The contour $C$ consists of the real axis, $\Gamma_0$, and $\Gamma_R$ as shown. The third term of (6.194) vanishes when $R \to \infty$ due to Jordan's lemma. With the second term, $e^{iz}$ is analytic in the whole complex plane (i.e., it is an entire function) and, hence, can be expanded in the Taylor's series around any complex number. Expanding it around the origin we have

$$e^{iz} = \sum_{n=0}^{\infty} \frac{(iz)^{n}}{n!} = 1 + \sum_{n=1}^{\infty} \frac{(iz)^{n}}{n!}. \tag{6.195}$$

Fig. 6.22 Contour for the integration of $\frac{\sin z}{z}$ that appears in Example 6.8

As already mentioned in (6.173) through (6.178), among the terms of RHS in (6.195) only the first term, i.e., 1, contributes to the integral of the second term of RHS in (6.194). Thus, as in (6.178) we get

$$\int_{\Gamma_0} \frac{e^{iz}}{z}\,dz = -1 \cdot i\pi = -i\pi. \tag{6.196}$$

Notice that in (6.196) the argument is decreasing from $\pi \to 0$ during the contour integration. Within the contour $C$, there is no singular point, and so $I_C = 0$ due to Theorem 6.10 (Cauchy's integral theorem). Thus, from (6.194) we obtain

$$P\int_{-\infty}^{\infty} \frac{e^{iz}}{z}\,dz = -\int_{\Gamma_0} \frac{e^{iz}}{z}\,dz = i\pi. \tag{6.197}$$

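Exchanging the variable $z \to -z$ in (6.197), we have

$$P\int_{\infty}^{-\infty} \frac{e^{-iz}}{z}\,dz = -P\int_{-\infty}^{\infty} \frac{e^{-iz}}{z}\,dz = i\pi. \tag{6.198}$$

Summing both sides of (6.197) and (6.198), we get

$$P\int_{-\infty}^{\infty} \frac{e^{iz}}{z}\,dz + P\int_{\infty}^{-\infty} \frac{e^{-iz}}{z}\,dz = 2i\pi. \tag{6.199}$$

Rewriting (6.199), we have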

$$P\int_{-\infty}^{\infty} \frac{e^{iz} - e^{-iz}}{z}\,dz = 2i P\int_{-\infty}^{\infty} \frac{\sin z}{z}\,dz = 2i\pi. \tag{6.200}$$

That is, the answer is

$$P\int_{-\infty}^{\infty} \frac{\sin z}{z}\,dz = P\int_{-\infty}^{\infty} \frac{\sin x}{x}\,dx = I = \pi. \tag{6.201}$$

Notice that as mentioned in (6.137) $z = 0$ is a removable singularity of $\frac{\sin z}{z}$, and so the symbol $P$ is superfluous in (6.201).
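The result (6.201) can be approached numerically by truncating the integral at a finite $R$. The sketch below is an illustration, not part of the original derivation: it uses Simpson's rule on $[0, R]$, exploits the evenness of $\sin x / x$, and sets the integrand to its limiting value 1 at $x = 0$. The truncated integral oscillates around $\pi$ with an amplitude of about $2/R$, so the agreement is only approximate.

```python
import math

def simpson(func, a, b, n):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def sinc(x):
    # sin(x)/x with the removable singularity at x = 0 filled in.
    return 1.0 if x == 0.0 else math.sin(x) / x

R = 200.0
truncated = 2 * simpson(sinc, 0.0, R, 40000)   # integral over [-R, R] by evenness
print(truncated, math.pi, abs(truncated - math.pi))
```

The slow, oscillatory convergence of the truncated integral is one reason why the contour argument with Jordan's lemma is the more convenient route to the exact value.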

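Example 6.9 Evaluate the following real definite integral:

$$I = \int_{-\infty}^{\infty} \frac{\sin^{2} x}{x^{2}}\,dx. \tag{6.202}$$

From the trigonometric identity, we have

$$\sin^{2} x = \frac{1}{2}(1 - \cos 2x). \tag{6.203}$$

Then, the integrand of (6.202) with the variable changed is rewritten using (6.37) as

$$\frac{\sin^{2} z}{z^{2}} = \frac{1}{4}\left(\frac{1 - e^{2iz}}{z^{2}} + \frac{1 - e^{-2iz}}{z^{2}}\right). \tag{6.204}$$

Expanding $e^{2iz}$ as before, we have

$$e^{2iz} = \sum_{n=0}^{\infty} \frac{(2iz)^{n}}{n!} = 1 + 2iz + \sum_{n=2}^{\infty} \frac{(2iz)^{n}}{n!}. \tag{6.205}$$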

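Then, we get

$$1 - e^{2iz} = -2iz - \sum_{n=2}^{\infty} \frac{(2iz)^{n}}{n!}, \qquad \frac{1 - e^{2iz}}{z^{2}} = -\frac{2i}{z} - \sum_{n=2}^{\infty} \frac{(2i)^{n} z^{n-2}}{n!}. \tag{6.206}$$

As in Example 6.8, we wish to use (6.206) to estimate the second term of the following contour integral $I_C$:

$$\lim_{R \to \infty} I_C = P\int_{-\infty}^{\infty} \frac{1 - e^{2iz}}{z^{2}}\,dz + \int_{\Gamma_0} \frac{1 - e^{2iz}}{z^{2}}\,dz + \int_{\Gamma_{\infty}} \frac{1 - e^{2iz}}{z^{2}}\,dz. \tag{6.207}$$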

With the second integral of (6.207), as before, only the first term of (6.206) contributes to the integral. The result is

$$\int_{\Gamma_0} \frac{1 - e^{2iz}}{z^{2}}\,dz = -2i \int_{\Gamma_0} \frac{1}{z}\,dz = -2i \int_{\pi}^{0} i\,d\theta = -2\pi. \tag{6.208}$$

With the third integral of (6.207), we have

$$\int_{\Gamma_{\infty}} \frac{1 - e^{2iz}}{z^{2}}\,dz = \int_{\Gamma_{\infty}} \frac{1}{z^{2}}\,dz - \int_{\Gamma_{\infty}} \frac{e^{2iz}}{z^{2}}\,dz. \tag{6.209}$$

The first term of RHS of (6.209) vanishes for a reason similar to that which we have already seen in Example 6.4. The second term of RHS vanishes as well because of Jordan's lemma. Within the contour $C$ of (6.207), again there is no singular point, and so $\lim_{R \to \infty} I_C = 0$. Hence, from (6.207) we have

$$P\int_{-\infty}^{\infty} \frac{1 - e^{2iz}}{z^{2}}\,dz = -\int_{\Gamma_0} \frac{1 - e^{2iz}}{z^{2}}\,dz = 2\pi. \tag{6.210}$$

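Exchanging the variable $z \to -z$ in (6.210) as before, we have

$$-\left[P\int_{\infty}^{-\infty} \frac{1 - e^{-2iz}}{(-z)^{2}}\,dz\right] = P\int_{-\infty}^{\infty} \frac{1 - e^{-2iz}}{z^{2}}\,dz = 2\pi. \tag{6.211}$$

Summing both sides of (6.210) and (6.211) in combination with (6.204), we get an answer described by

$$\int_{-\infty}^{\infty} \frac{\sin^{2} z}{z^{2}}\,dz = \pi. \tag{6.212}$$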

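Again, the symbol $P$ becomes superfluous once (6.210) and (6.211) are summed, because $z = 0$ is a removable singularity of $\frac{\sin^{2} z}{z^{2}}$.

Example 6.10 Evaluate the following real definite integral:

$$I = \int_{-\infty}^{\infty} \frac{\cos x}{(x-a)(x-b)}\,dx \quad (a \neq b). \tag{6.213}$$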

We rewrite the denominator of (6.213) using the method of partial fraction decomposition such that

$$\frac{1}{(x-a)(x-b)} = \frac{1}{a-b}\left(\frac{1}{x-a} - \frac{1}{x-b}\right). \tag{6.214}$$

Fig. 6.23 Contour for the integration of $\frac{\cos z}{(z-a)(z-b)}$ that appears in Example 6.10. (a) Along the upper contour $C$. (b) Along the lower contour $C'$

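Then, (6.213) is described by

$$I = \frac{1}{a-b} \int_{-\infty}^{\infty} \left(\frac{\cos z}{z-a} - \frac{\cos z}{z-b}\right) dz = \frac{1}{2(a-b)} \int_{-\infty}^{\infty} \left(\frac{e^{iz} + e^{-iz}}{z-a} - \frac{e^{iz} + e^{-iz}}{z-b}\right) dz. \tag{6.215}$$

We wish to carry out the integration by dividing the integrand and calculating each term individually. First, we estimate

$$\int_{-\infty}^{\infty} \frac{e^{iz}}{z-a}\,dz. \tag{6.216}$$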

To apply Jordan's lemma to the integration of (6.216), we use the upper contour $C$ that consists of the real axis, $\Gamma_a$, and $\Gamma_R$; see Fig. 6.23a. As usual, we consider the following integral:

$$\lim_{R \to \infty} I_C = P\int_{-\infty}^{\infty} \frac{e^{iz}}{z-a}\,dz + \int_{\Gamma_a} \frac{e^{iz}}{z-a}\,dz + \int_{\Gamma_{\infty}} \frac{e^{iz}}{z-a}\,dz, \tag{6.217}$$

where $\Gamma_a$ represents an infinitesimally small semicircle around $z = a$. Notice that (6.217) is obtained when we are considering $R \to \infty$ in Fig. 6.23a. The third term of (6.217) vanishes due to Jordan's lemma. With the second term of (6.217), we change the variable $z - a \to z$ and rewrite it as

$$\int_{\Gamma_a} \frac{e^{iz}}{z-a}\,dz = \int_{\Gamma_0} \frac{e^{i(z+a)}}{z}\,dz = e^{ia} \int_{\Gamma_0} \frac{e^{iz}}{z}\,dz = -e^{ia} i\pi, \tag{6.218}$$

where with the last equality we used (6.196). There is no singular point within the contour $C$, and so $\lim_{R \to \infty} I_C = 0$. Then, we have

$$P\int_{-\infty}^{\infty} \frac{e^{iz}}{z-a}\,dz = -\int_{\Gamma_a} \frac{e^{iz}}{z-a}\,dz = e^{ia} i\pi. \tag{6.219}$$

Next, we consider the following integral that appears in (6.215):

$$\int_{-\infty}^{\infty} \frac{e^{-iz}}{z-a}\,dz. \tag{6.220}$$

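To apply Jordan's lemma to the integration of (6.220), we use the lower contour $C'$ that consists of the real axis, $\tilde{\Gamma}_a$, and $\tilde{\Gamma}_R$ (Fig. 6.23b). This time, the integral to be evaluated is given by

$$\lim_{R \to \infty} I_{C'} = -P\int_{-\infty}^{\infty} \frac{e^{-iz}}{z-a}\,dz + \int_{\tilde{\Gamma}_a} \frac{e^{-iz}}{z-a}\,dz + \int_{\tilde{\Gamma}_{\infty}} \frac{e^{-iz}}{z-a}\,dz \tag{6.221}$$

so that we can trace the contour $C'$ counterclockwise. Notice that the minus sign is present in front of the principal value. The third term of (6.221) vanishes due to Jordan's lemma. Evaluating the second term of (6.221) in the same way as above, we get

$$\int_{\tilde{\Gamma}_a} \frac{e^{-iz}}{z-a}\,dz = \int_{\tilde{\Gamma}_0} \frac{e^{-i(z+a)}}{z}\,dz = e^{-ia} \int_{\tilde{\Gamma}_0} \frac{e^{-iz}}{z}\,dz = -e^{-ia} i\pi. \tag{6.222}$$

Hence, we obtain

$$P\int_{-\infty}^{\infty} \frac{e^{-iz}}{z-a}\,dz = \int_{\tilde{\Gamma}_a} \frac{e^{-iz}}{z-a}\,dz = -e^{-ia} i\pi. \tag{6.223}$$

Notice that in (6.222) the argument is again decreasing, from $0 \to -\pi$, during the contour integration. Summing (6.219) and (6.223), we get

$$P\int_{-\infty}^{\infty} \frac{e^{iz}}{z-a}\,dz + P\int_{-\infty}^{\infty} \frac{e^{-iz}}{z-a}\,dz = P\int_{-\infty}^{\infty} \frac{e^{iz} + e^{-iz}}{z-a}\,dz = e^{ia} i\pi - e^{-ia} i\pi = i\pi\left(e^{ia} - e^{-ia}\right) = i\pi \cdot 2i\sin a = -2\pi\sin a.$$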

In a similar manner, we also get

$$P\int_{-\infty}^{\infty} \frac{e^{iz} + e^{-iz}}{z-b}\,dz = -2\pi\sin b.$$

Thus, the final answer is

$$I = \frac{1}{2(a-b)} \int_{-\infty}^{\infty} \left(\frac{e^{iz} + e^{-iz}}{z-a} - \frac{e^{iz} + e^{-iz}}{z-b}\right) dz = \frac{2\pi(\sin b - \sin a)}{2(a-b)} = \frac{\pi(\sin b - \sin a)}{a-b}. \tag{6.224}$$

Example 6.11 Evaluate the following real definite integral:

$$I = \int_{-\infty}^{\infty} \frac{\cos x}{1+x^{2}}\,dx. \tag{6.225}$$

The calculation is left for readers. Hints: (i) Use $\cos x = (e^{ix} + e^{-ix})/2$ and Jordan's lemma. (ii) Use the contour as shown in Fig. 6.18 for integration. Besides the examples listed above, a variety of integral calculations can be seen in the literature [9–12].
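Readers who work through Example 6.11 may want a numerical target to compare against. The sketch below is one possible check, not the worked solution of the text: it assumes the upper-half-plane closure suggested by the hints, where $e^{iz}/(1+z^{2})$ has a simple pole at $z = i$ with residue $e^{-1}/(2i)$, and compares $2\pi i$ times that residue with a truncated Simpson-rule integral. The truncation radius and helper names are arbitrary illustrative choices.

```python
import cmath
import math

def simpson(func, a, b, n):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

# Residue of exp(iz)/(1 + z**2) at the simple pole z = i: exp(i*i)/(2i).
residue = cmath.exp(1j * 1j) / 2j
contour_value = 2j * math.pi * residue     # real part equals the cosine integral

def integrand(x):
    return math.cos(x) / (1.0 + x * x)

R = 200.0
truncated = 2 * simpson(integrand, 0.0, R, 40000)   # even integrand, so double [0, R]
print(contour_value, truncated)
```

Under these assumptions both numbers should be close to $\pi/e \approx 1.1557$; the truncated integral differs by a small oscillatory tail of order $1/R^{2}$.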

6.9 Multivalued Functions and Riemann Surfaces

The last topics of the theory of analytic functions are related to multivalued functions. Riemann surfaces play a central role in developing the theory of the multivalued functions.

6.9.1 Brief Outline

We give a brief outline of the notions of multivalued functions and Riemann surfaces. These notions are indispensable for investigating properties of irrational functions and logarithmic functions and for performing related calculations, especially integrations with respect to those functions.

Definition 6.12 Let $n$ be a positive integer and let $z$ be an arbitrary complex number. Suppose we have

$$z = w^{n}. \tag{6.226}$$

Then, $w$ is said to be an $n$-th root of $z$ and we denote

$$w = z^{1/n}. \tag{6.227}$$

Fig. 6.24 Multiple roots in the complex plane. (a) Square roots of $1$ and $-1$. (b) Cubic roots of $1$ and $-1$

We have expressly shown this simple thing. It is because if we think of real $z$, $z^{1/n}$ is uniquely determined only when $n$ is odd, regardless of whether $z$ is positive or negative. In case we think of even $n$, (i) if $z$ is positive, we have two $n$-th roots $\pm z^{1/n}$. (ii) If $z$ is negative, on the other hand, we have no $n$-th root. Meanwhile, if we think of complex $z$, we have a totally different situation. That is, regardless of whether $n$ is odd or even, we always have $n$ different $z^{1/n}$ for any given nonzero complex number $z$.

As a tangible example, let us think of the $n$-th roots of $\pm 1$ for $n = 2$ and $3$. These are depicted graphically in Fig. 6.24. We can readily understand the statement of the above paragraph. In Fig. 6.24a, if $z = -1$ for $n = 2$, we have either $w = (-1)^{1/2} = \left(e^{i\pi}\right)^{1/2} = i$ or $w = (-1)^{1/2} = \left(e^{-i\pi}\right)^{1/2} = -i$ with square roots. In Fig. 6.24b, if $z = -1$ for $n = 3$, we have $w = (-1)^{1/3} = -1$; $w = (-1)^{1/3} = \left(e^{i\pi}\right)^{1/3} = e^{i\pi/3}$; $w = (-1)^{1/3} = \left(e^{-i\pi}\right)^{1/3} = e^{-i\pi/3}$ with cubic roots. Moreover, we have $1^{1/2} = 1; -1$. Also, we have $1^{1/3} = 1; e^{2i\pi/3}; e^{-2i\pi/3}$.

Let us extend the above preliminary discussion to the analytic functions. Suppose we are given the following polar forms of $z$ and $w$ such that

$$z = r(\cos\theta + i\sin\theta) \tag{6.32}$$

and

$$w = \rho(\cos\varphi + i\sin\varphi).$$

Then, we have

$$z = w^{n} = \rho^{n}(\cos\varphi + i\sin\varphi)^{n} = \rho^{n}(\cos n\varphi + i\sin n\varphi), \tag{6.228}$$

where with the last equality we used de Moivre's theorem (6.36). Comparing the real and imaginary parts of (6.32) and (6.228), we have

$$r = \rho^{n} \quad \text{or} \quad \rho = r^{1/n} \tag{6.229}$$

and

$$\theta = n\varphi + 2k\pi \quad (k = 0, \pm 1, \pm 2, \cdots). \tag{6.230}$$

In (6.229), we define both $r$ and $\rho$ as being positive so that positive $\rho$ can be uniquely determined for any positive integer $n$ (i.e., either even or odd). Recall once again that if $n$ is even, we have two real $n$-th roots for $\rho$ (i.e., both positive and negative) with given positive $r$. Of these two roots, we discard the negative $\rho$. Rewriting (6.230), we have

$$\varphi = \frac{1}{n}(\theta - 2k\pi) \quad (k = 0, \pm 1, \pm 2, \cdots).$$

Further rewriting (6.227), we get

$$w_k(\theta) = z^{1/n} = r^{1/n}\left[\cos\frac{1}{n}(\theta - 2k\pi) + i\sin\frac{1}{n}(\theta - 2k\pi)\right] \quad (k = 0, 1, 2, \cdots, n-1), \tag{6.231}$$

where $w_k(\theta)$ is said to be a branch. Note that we have

$$w_k(\theta - 2n\pi) = w_k(\theta). \tag{6.232}$$

That is, $w_k(\theta)$ is a periodic function of the period $2n\pi$. The total number of branches is $n$; the index $k$ of $w_k(\theta)$ denotes these different branches. In particular, when $k = 0$, we have

$$w_0(\theta) = z^{1/n} = r^{1/n}\left(\cos\frac{1}{n}\theta + i\sin\frac{1}{n}\theta\right). \tag{6.233}$$

Comparing (6.231) and (6.233), we obtain

$$w_k(\theta) = w_0(\theta - 2k\pi). \tag{6.234}$$

From (6.234), we find that $w_k(\theta)$ is obtained by shifting (or rotating) $w_0(\theta)$ by $2k\pi$ toward the positive direction of $\theta$. The branch $w_0(\theta)$ is called a principal branch of the $n$-th root of $z$. The value of $w_0(\theta)$ is called the principal value of that branch. The implication of the presence of the branches is as follows: Suppose that the branches besides $w_0(\theta)$ were absent. Then, from (6.233) we have

$$w_0(0) = r^{1/n} \neq w_0(2\pi) = z^{1/n} = r^{1/n}\left(\cos\frac{2\pi}{n} + i\sin\frac{2\pi}{n}\right).$$

This causes inconvenience, because $\theta = 0$ and $\theta = 2\pi$ are assumed to be identical in the complex plane. Hence, the single valuedness would be broken with $w_0(\theta)$. But, in virtue of the presence of $w_1(\theta)$, we have

$$w_1(2\pi) = w_0(2\pi - 2\pi) = w_0(0).$$

This means that the value of $w_0(0)$ is recovered and, hence, the single valuedness remains intact. After another cycle around the origin, similarly we have

$$w_2(4\pi) = w_0(4\pi - 4\pi) = w_0(0).$$

Thus, in succession we get

$$w_k(2k\pi) = w_0(2k\pi - 2k\pi) = w_0(0).$$

For $k = n$, from (6.231) we have

$$w_n(\theta) = z^{1/n} = r^{1/n}\left[\cos\frac{1}{n}(\theta - 2n\pi) + i\sin\frac{1}{n}(\theta - 2n\pi)\right] = r^{1/n}\left[\cos\left(\frac{1}{n}\theta - 2\pi\right) + i\sin\left(\frac{1}{n}\theta - 2\pi\right)\right] = r^{1/n}\left(\cos\frac{1}{n}\theta + i\sin\frac{1}{n}\theta\right) = w_0(\theta).$$

Thus, we have no more new function. At the same time, the single valuedness of $w_k(\theta)$ $(k = 0, 1, 2, \cdots, n-1)$ remains intact during these processes. To summarize the above discussion, if we have $n$ planes (or sheets) as the complex planes and allocate them to $w_0(\theta), w_1(\theta), \cdots, w_{n-1}(\theta)$ individually, we have a single-valued function $w(\theta)$ as a whole throughout these planes. In a word, the following relation represents the situation:

$$w(\theta) = \begin{cases} w_0(\theta) & \text{for Plane } 0, \\ w_1(\theta) & \text{for Plane } 1, \\ \quad\vdots & \\ w_{n-1}(\theta) & \text{for Plane } n-1. \end{cases}$$
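The branch formula (6.231) and the relations (6.232) and (6.234) can be tried out directly. The sketch below is an illustration with arbitrarily chosen values of $n$, $r$, and $\theta$; the function name `branch` is hypothetical.

```python
import cmath
import math

def branch(k, r, theta, n):
    """Branch w_k(theta) of z**(1/n) for z = r*exp(i*theta), following (6.231)."""
    phi = (theta - 2 * k * math.pi) / n
    return r ** (1.0 / n) * complex(math.cos(phi), math.sin(phi))

n, r, theta = 3, 2.0, 0.7                # illustrative values (assumptions)
z = r * cmath.exp(1j * theta)

roots = [branch(k, r, theta, n) for k in range(n)]
for k, w in enumerate(roots):
    print(k, w, abs(w**n - z))           # every branch is an n-th root of z

# (6.234): w_k(theta) = w_0(theta - 2*k*pi); k = n reproduces w_0.
shift_error = max(abs(branch(k, r, theta, n) - branch(0, r, theta - 2 * k * math.pi, n))
                  for k in range(n))
wrap_error = abs(branch(n, r, theta, n) - branch(0, r, theta, n))
print(shift_error, wrap_error)
```

The $n$ values are mutually distinct, and the branch index effectively counts how many times $\theta$ has been shifted by $2\pi$, which is the bookkeeping that the $n$ planes make explicit.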

This is the essence of the Riemann surface. The superposition of these $n$ planes is called a Riemann surface and each plane is said to be a Riemann sheet [of the function $w(\theta)$]. Each single-valued function $w_0(\theta), w_1(\theta), \cdots, w_{n-1}(\theta)$ defined on each Riemann sheet is called a branch of $w(\theta)$. In the above discussion, the origin is called a branch point of $w(z)$.

For simplicity, let us think of the function $w(z)$ described by

$$w(z) \equiv z^{1/2} = \sqrt{z}.$$

In this case, for $z = re^{i\theta}$ $(0 \leq \theta \leq 2\pi)$ we have two different values $w_0$ and $w_1$ given by

$$w_0(\theta) = r^{1/2} e^{i\theta/2}, \qquad w_1(\theta) = r^{1/2} e^{i(\theta - 2\pi)/2} = w_0(\theta - 2\pi) = -w_0(\theta). \tag{6.235}$$

Then, we have

$$w(\theta) = \begin{cases} w_0(\theta) & \text{for Plane } 0, \\ w_1(\theta) & \text{for Plane } 1. \end{cases} \tag{6.236}$$

In this case, $w(z)$ is called a "two-valued" function of $z$ and the functions $w_0$ and $w_1$ are said to be branches of $w(z)$. Suppose that $z$ makes a counterclockwise circuit around the origin, starting from, e.g., a real positive number $z = z_0$ to come full circle to the original point $z = z_0$. Then, the argument of $z$ has been increased by $2\pi$. In this situation the arguments of the individual branches $w_0$ and $w_1$ are increased by $\pi$. Accordingly, $w_0$ is switched to $w_1$ and $w_1$ is switched to $w_0$. This situation can be understood more clearly from (6.232) and (6.234). That is, putting $n = 2$ in (6.232), we have

$$w_k(\theta - 4\pi) = w_k(\theta) \quad (k = 0, 1). \tag{6.237}$$

Meanwhile, putting $k = 1$ in (6.234) we get

$$w_1(\theta) = w_0(\theta - 2\pi). \tag{6.238}$$

Replacing $\theta$ with $\theta + 2\pi$ in (6.238), we have $w_1(\theta + 2\pi) = w_0(\theta)$. Also replacing $\theta$ with $\theta + 4\pi$ in (6.237), we have

$$w_1(\theta + 4\pi) = w_1(\theta) = w_0(\theta - 2\pi) = w_0(\theta + 2\pi),$$

where with the last equality we replaced $\theta$ with $\theta + 2\pi$ in (6.237). Rewriting the above, we get

$$w_1(\theta + 2\pi) = w_0(\theta) \quad \text{and} \quad w_0(\theta + 2\pi) = w_1(\theta).$$

That is, adding $2\pi$ to $\theta$, $w_0$ and $w_1$ are switched to each other.

Fig. 6.25 Simple kit that helps visualize the Riemann surface. To make it, follow the next procedures: (a) Take two sheets of paper (Planes I and II) and make slits (indicated by dashed lines) as shown. Next, put Plane I on top of Plane II so that the two slits (i.e., branch cuts) can fit in line. (b) Tape together the downside of the cut of Plane I and the foreside of the cut of Plane II. Then, also tape together the downside of the cut of Plane II and the foreside of the cut of Plane I

Let us further think of the two-valued function $w(z) = z^{1/2}$. Strictly speaking, an analytic function cannot be a two-valued function. This is because, if it were, the continuity and differentiability would be lost from the function and, hence, the analyticity would be broken. Then, we must make a suitable device to avoid it. Such a device is called a Riemann surface. Let us make a kit of the Riemann surface following Fig. 6.25.

(i) Take a sheet of paper so that it can represent a complex plane and cut it with scissors along the real axis, starting from the origin, so that the cut (or slit) can be made toward the real positive direction. This cut is called a branch cut with the origin being a branch point. Let us call this sheet Plane I.
(ii) Take another sheet of paper and call it Plane II. Also make a cut in Plane II in exactly the same way as that for Plane I; see Fig. 6.25a for the processes (i) and (ii).
(iii) Next, put Plane I on top of Plane II so that the two branch cuts can fit in line.
(iv) Tape together the downside of the cut of Plane I and the foreside of the cut of Plane II (Fig. 6.25b).
(v) Then, also tape together the downside of the cut of Plane II and the foreside of the cut of Plane I (see Fig. 6.25b once again).

Thus, what we see is that starting from, e.g., a real positive number $z = z_0$ of Plane I and coming full circle to the original point $z = z_0$, we cross the cut to enter Plane II located underneath Plane I. After another cycle within Plane II, we come back to Plane I again by crossing the cut. After all, we come back to the original Plane I after two cycles on the combined planes. This combined plane is called the Riemann surface. In this way, Planes I and II correspond to different branches $w_0$ and $w_1$, respectively, so that each branch can be single valued. In other words, $w(z) = z^{1/2}$ is a single-valued function that is defined on the whole Riemann surface.

Fig. 6.26 Graphically decided argument $\theta_a$, which is an angle between a line connecting $z$ and $a$ and another line drawn in parallel with the real axis

There are a variety of Riemann surfaces according to the nature of the complex functions. Another example of the multivalued function is expressed as

$$w(z) = \sqrt{(z-a)(z-b)},$$

where two branch points are located at $z = a$ and $z = b$. As before, we assume that

$$z - a = r_a e^{i\theta_a} \quad \text{and} \quad z - b = r_b e^{i\theta_b}.$$

Then, we have

$$w(\theta_a, \theta_b) = [(z-a)(z-b)]^{1/2} = \sqrt{r_a r_b}\, e^{i(\theta_a + \theta_b)/2}. \tag{6.239}$$

In the above, e.g., the argument $\theta_a$ can be decided graphically as shown in Fig. 6.26, where $\theta_a$ is an angle between a line connecting $z$ and $a$ and another line drawn in parallel with the real axis. We can choose $w(\theta_a, \theta_b)$ of (6.239) for the principal branch and define this as

$$w_0(\theta_a, \theta_b) \equiv \sqrt{r_a r_b}\, e^{i(\theta_a + \theta_b)/2}. \tag{6.240}$$

Also, we define $w_1(\theta_a, \theta_b)$ as

$$w_1(\theta_a, \theta_b) \equiv \sqrt{r_a r_b}\, e^{i(\theta_a + \theta_b - 2\pi)/2}. \tag{6.241}$$

Then, as in (6.235) we have

$$w_1(\theta_a, \theta_b) = w_0(\theta_a - 2\pi, \theta_b) = w_0(\theta_a, \theta_b - 2\pi) = -w_0(\theta_a, \theta_b). \tag{6.242}$$

Fig. 6.27 Geometrical relationship between contour $C$ and two branch points (located at $z = a$ and $z = b$) for $\sqrt{(z-a)(z-b)}$. (a) The contour $C$ encircles only $z = a$. (b) $C$ encircles both $z = a$ and $z = b$. (c) $C$ encircles neither $z = a$ nor $z = b$

From (6.242), we find that adding $2\pi$ to $\theta_a$ or $\theta_b$, $w_0$ and $w_1$ are switched to each other as before. We also find that after the variable $z$ has come full circle around one of $a$ and $b$, $w_0(\theta_a, \theta_b)$ changes sign as in the case of (6.235). We also have

$$w_0(\theta_a - 2\pi, \theta_b - 2\pi) = w_0(\theta_a, \theta_b), \qquad w_1(\theta_a - 2\pi, \theta_b - 2\pi) = w_1(\theta_a, \theta_b). \tag{6.243}$$

From (6.243), on the other hand, after the variable $z$ has come full circle around both $a$ and $b$, both $w_0(\theta_a, \theta_b)$ and $w_1(\theta_a, \theta_b)$ keep the original value. This is also the case where the variable $z$ comes full circle without encircling $a$ or $b$. These behaviors imply that (i) if a contour encircles one of $z = a$ and $z = b$, $w_0(\theta_a, \theta_b)$ and $w_1(\theta_a, \theta_b)$ change sign and switch to each other. Thus, $w_0(\theta_a, \theta_b)$ and $w_1(\theta_a, \theta_b)$ form branches. On the other hand, (ii) if a contour encircles both $z = a$ and $z = b$, $w_0(\theta_a, \theta_b)$ and $w_1(\theta_a, \theta_b)$ remain intact. (iii) If a contour encircles neither $z = a$ nor $z = b$, $w_0(\theta_a, \theta_b)$ and $w_1(\theta_a, \theta_b)$ remain intact as well. These three cases (i), (ii), and (iii) are depicted in Fig. 6.27a–c, respectively. In Fig. 6.27c, after $z$ comes full circle, $\arg z$ relative to $z = a$ returns to the original value keeping $\theta_1 \leq \arg z \leq \theta_2$. This is similarly the case with $\arg z$ relative to $z = b$.

Correspondingly, the branch cut(s) are depicted with doubled broken line(s), e.g., as in Fig. 6.28. Note that in the cases (ii) and (iii) the branch cut can be chosen so that the circle (or contour) may not cross the branch cut. Although a bit complicated, the Riemann surface can be shaped accordingly. Other choices of the branch cuts are shown in Sect. 6.9.2 (vide infra).

Fig. 6.28 Branch cut(s) for $\sqrt{(z-a)(z-b)}$. (a) A branch cut is shown by a line connecting $z = a$ and $z = b$. (b) Branch cuts are shown by a line connecting $z = a$ and $z = \infty$ and another line connecting $z = b$ and $z = \infty$. In both (a, b) the branch cut(s) are depicted with doubled broken line(s)

When we consider logarithm functions, we have to deal with a rather complicated situation. For polar coordinate representation, we have $z = re^{i\theta}$. That is,

$$\ln z = \ln r + i\arg z = \ln|z| + i\arg z, \tag{6.244}$$

where

$$\arg z = \theta + 2\pi n \quad (n = 0, \pm 1, \pm 2, \cdots). \tag{6.245}$$

If $n = 0$ is chosen, $\ln z$ is said to be the principal value. With the logarithm functions, the Riemann surface comprises infinitely many planes, each of which corresponds to the individual branch whose argument is given by (6.245). The point $z = 0$ is called a logarithmic branch point. Meanwhile, the branch point for, e.g., $w(z) = \sqrt{z}$ is said to be an algebraic branch point.
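The three cases (i), (ii), and (iii) for $\sqrt{(z-a)(z-b)}$ can be imitated numerically by following the function continuously along a closed circle. The sketch below is an illustration only: it assumes branch points $a = -1$ and $b = 1$, picks three circles by hand, and keeps the value continuous by choosing, at each small step, whichever of the two square roots lies closer to the previous value.

```python
import cmath
import math

def continue_sqrt(a, b, center, radius, steps=2000):
    """Follow w = sqrt((z-a)(z-b)) continuously once around a circle; return (start, end)."""
    def product(t):
        z = center + radius * cmath.exp(1j * t)
        return (z - a) * (z - b)

    w = cmath.sqrt(product(0.0))
    start = w
    for k in range(1, steps + 1):
        candidate = cmath.sqrt(product(2 * math.pi * k / steps))
        # Keep the root closest to the previous value so that w varies continuously.
        w = candidate if abs(candidate - w) <= abs(candidate + w) else -candidate
    return start, w

a, b = -1.0, 1.0                                       # illustrative branch points
s1, e1 = continue_sqrt(a, b, center=-1.0, radius=0.5)  # (i) encircles only z = a
s2, e2 = continue_sqrt(a, b, center=0.0, radius=3.0)   # (ii) encircles both points
s3, e3 = continue_sqrt(a, b, center=3.0j, radius=0.5)  # (iii) encircles neither
print(e1 / s1, e2 / s2, e3 / s3)
```

The printed ratios should be close to $-1$, $+1$, and $+1$, respectively: only the circuit around a single branch point switches the branch.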

6.9.2 Examples of Multivalued Functions

When we deal with a function having a branch point, we must remember that unless the contour crosses the branch cut (either algebraic or logarithmic), the function we are thinking of remains single valued and analytic (if that function is originally analytic) and can be differentiated or integrated in a normal way. We give some examples for this.

Example 6.12 [6] Evaluate the following real definite integral:

$$I = \int_{0}^{\infty} \frac{x^{a-1}}{1+x}\,dx \quad (0 < a < 1). \tag{6.246}$$

We rewrite (6.246) as

$$I = \int_{0}^{\infty} \frac{z^{a-1}}{1+z}\,dz. \tag{6.247}$$

We define the integrand of (6.247) as

$$f(z) \equiv \frac{z^{a-1}}{1+z},$$

where $z^{a-1}$ can be further rewritten as

$$z^{a-1} = e^{(a-1)\ln z}.$$

The function $f(z)$ has a branch point at $z = 0$ and a simple pole at $z = -1$. Bearing this situation in mind, we consider a contour for integration (see Fig. 6.29). In Fig. 6.29 we depict the branch cut with a doubled broken line. Lines $PQ$ and $Q'P'$ are contour lines located over and under the branch cut, respectively. We assume that the lines $PQ$ and $Q'P'$ are illimitably close to the real axis. Thus, starting from the point $P$ the contour integration $I_C$ is described by

$$I_C = \int_{PQ} \frac{z^{a-1}}{1+z}\,dz + \int_{\Gamma_R} \frac{z^{a-1}}{1+z}\,dz + \int_{Q'P'} \frac{z^{a-1}}{1+z}\,dz + \int_{\Gamma_0} \frac{z^{a-1}}{1+z}\,dz, \tag{6.248}$$

where $\Gamma_R$ and $\Gamma_0$ denote the outer large circle and inner small circle of radius $R$ ($\gg 1$) and $r$ ($\ll 1$), respectively. Note that with the contour integration $\Gamma_R$ is traced counterclockwise but $\Gamma_0$ is traced clockwise (see Fig. 6.29). Since the simple pole is present at $z = -1$, the related residue $\operatorname{Res} f(-1)$ is

$$\operatorname{Res} f(-1) = \left. z^{a-1}\right|_{z=-1} = (-1)^{a-1} = \left(e^{i\pi}\right)^{a-1} = -e^{ai\pi}.$$

Then, we have

$$I_C = 2\pi i \operatorname{Res} f(-1) = -2\pi i\, e^{ai\pi}. \tag{6.249}$$

Fig. 6.29 Branch cut (shown with a doubled broken line) and contour for the integration of $\frac{z^{a-1}}{1+z}$

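When $R \to \infty$ and $r \to 0$, we have

$$\left|\int_{\Gamma_R} \frac{z^{a-1}}{1+z}\,dz\right| < \frac{R^{a-1}}{R-1}\,2\pi R \to 0 \quad \text{and} \quad \left|\int_{\Gamma_0} \frac{z^{a-1}}{1+z}\,dz\right| < \frac{r^{a-1}}{1-r}\,2\pi r \to 0. \tag{6.250}$$

Thus, the second and fourth terms of (6.248) vanish. Meanwhile, we take the principal value of $\ln z$ of (6.244). Since the lines $PQ$ and $Q'P'$ are very close to the real axis, using (6.245) with $n = 0$ we can put $\theta = 0$ on $PQ$ and $\theta = 2\pi$ on $Q'P'$. Then, we have

$$\int_{PQ} \frac{z^{a-1}}{1+z}\,dz = \int_{PQ} \frac{e^{(a-1)\ln z}}{1+z}\,dz = \int_{PQ} \frac{e^{(a-1)\ln|z|}}{1+z}\,dz. \tag{6.251}$$

Moreover, we have

$$\int_{Q'P'} \frac{z^{a-1}}{1+z}\,dz = \int_{Q'P'} \frac{e^{(a-1)\ln z}}{1+z}\,dz = \int_{Q'P'} \frac{e^{(a-1)(\ln|z| + 2\pi i)}}{1+z}\,dz = e^{(a-1)2\pi i}\int_{Q'P'} \frac{e^{(a-1)\ln|z|}}{1+z}\,dz. \tag{6.252}$$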

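Notice that the argument underneath the branch cut (i.e., the line $Q'P'$) is increased by $2\pi$ relative to that over the branch cut. Considering (6.249) through (6.252) and taking the limit of $R \to \infty$ and $r \to 0$, (6.248) is rewritten as

$$\begin{aligned}
-2\pi i\, e^{a\pi i} &= \int_0^{\infty} \frac{e^{(a-1)\ln|z|}}{1+z}\,dz + e^{(a-1)2\pi i}\int_{\infty}^{0} \frac{e^{(a-1)\ln|z|}}{1+z}\,dz \\
&= \left[1 - e^{(a-1)2\pi i}\right]\int_0^{\infty} \frac{e^{(a-1)\ln|z|}}{1+z}\,dz \\
&= \left[1 - e^{(a-1)2\pi i}\right]\int_0^{\infty} \frac{z^{a-1}}{1+z}\,dz.
\end{aligned} \tag{6.253}$$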

In (6.253) we have assumed $z$ to be real and positive (see Fig. 6.29), and so we have $e^{(a-1)\ln|z|} = e^{(a-1)\ln z} = z^{a-1}$. Therefore, from (6.253) we get

$$\int_0^{\infty} \frac{z^{a-1}}{1+z}\,dz = \frac{-2\pi i\, e^{a\pi i}}{1 - e^{(a-1)2\pi i}} = \frac{-2\pi i\, e^{a\pi i}}{1 - e^{2\pi a i}}. \tag{6.254}$$

Using (6.37) for (6.254) and performing simple trigonometric calculations (after multiplying both the numerator and denominator by $1 - e^{-2\pi a i}$), we finally get

$$\int_0^{\infty} \frac{z^{a-1}}{1+z}\,dz = \int_0^{\infty} \frac{x^{a-1}}{1+x}\,dx = I = \frac{\pi}{\sin a\pi}.$$
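The closed form $\pi/\sin a\pi$ can be cross-checked numerically. The following is only a minimal sketch and not part of the derivation above: it assumes the substitution $x = e^{t}$ (our choice, which maps the integral onto the whole real line with an exponentially decaying integrand) and a plain trapezoidal rule. The function name `mellin_integral` and the parameters `h` and `decay` are hypothetical.

```python
import math

def mellin_integral(a, h=0.05, decay=60.0):
    """Approximate I(a) = int_0^inf x**(a-1)/(1+x) dx for 0 < a < 1.

    Assumption: substitute x = exp(t), which gives the integral of
    exp(a*t)/(1+exp(t)) over the real line, and use the trapezoidal rule.
    """
    if not 0.0 < a < 1.0:
        raise ValueError("a must satisfy 0 < a < 1")
    # The tails decay like exp(-min(a, 1-a)*|t|); cut off where they are ~exp(-decay).
    T = decay / min(a, 1.0 - a)
    n = int(round(2.0 * T / h))
    total = 0.0
    for j in range(n + 1):
        t = -T + j * h
        # Overflow-safe evaluation of exp(a*t)/(1+exp(t)).
        if t > 0.0:
            g = math.exp((a - 1.0) * t) / (1.0 + math.exp(-t))
        else:
            g = math.exp(a * t) / (1.0 + math.exp(t))
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * g
    return total * h

# Compare the numerical value with pi/sin(a*pi) for a few exponents.
for a in (0.25, 0.5, 0.75):
    print(a, mellin_integral(a), math.pi / math.sin(a * math.pi))
```

The two printed columns should agree closely for each $a$; for $a = 1/2$ both reduce to $\pi$.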

Example 6.13 [13] Evaluate the following real definite integral:

$$I = \int_a^b \frac{1}{\sqrt{(x-a)(b-x)}}\,dx \quad (a, b: \text{real with } a < b). \tag{6.255}$$

To this end, we wish to evaluate the following integral described by

$$I_C = \int_C \frac{1}{\sqrt{(z-a)(z-b)}}\,dz, \tag{6.256}$$

where we define the integrand of (6.256) as

$$f(z) \equiv \frac{1}{\sqrt{(z-a)(z-b)}}.$$

The total contour $C$ comprises $C_a$, $PQ$, $C_b$, and $Q'P'$ as depicted in Fig. 6.30. The function $f(z)$ has two branch points at $z = a$ and $z = b$; otherwise $f(z)$ is analytic. We draw a branch cut with a doubled broken line as shown. Since the contour $C$ does not cross the branch cut but encircles it, we can evaluate the integral in a normal manner. Starting from the point $P'$, (6.256) can be expressed by

$$I_C = \int_{C_a} \frac{dz}{\sqrt{(z-a)(z-b)}} + \int_{PQ} \frac{dz}{\sqrt{(z-a)(z-b)}} + \int_{C_b} \frac{dz}{\sqrt{(z-a)(z-b)}} + \int_{Q'P'} \frac{dz}{\sqrt{(z-a)(z-b)}}. \tag{6.257}$$

We assume that the lines $PQ$ and $Q'P'$ are arbitrarily close to the real axis. This time, $C_a$ and $C_b$ are both traced counterclockwise. Putting $z - a = r_1 e^{i\theta_1}$ and $z - b = r_2 e^{i\theta_2}$, we have

$$f(z) = \frac{1}{\sqrt{r_1 r_2}}\, e^{-i(\theta_1 + \theta_2)/2}. \tag{6.258}$$

Fig. 6.30 Branch cut (shown with a doubled broken line) and contour for the integration of $\frac{1}{\sqrt{(z-a)(z-b)}}$ ($a < b$). Only the argument $\theta_1$ is depicted. With $\theta_2$ see text

Let $C_a$ and $C_b$ be small circles of radius $\varepsilon$ centered at $z = a$ and $z = b$, respectively. When we are evaluating the integral on $C_b$, we can put

$$z - b = \varepsilon e^{i\theta_2} \quad (-\pi \le \theta_2 \le \pi); \quad \text{i.e., } r_2 = \varepsilon.$$

We also have $r_1 \approx b - a$; in Fig. 6.30 we assume $b > a$. Hence, we have

$$\left|\int_{C_b} \frac{1}{\sqrt{(z-a)(z-b)}}\,dz\right| = \left|\int_{-\pi}^{\pi} \frac{i\varepsilon}{\sqrt{r_1 \varepsilon}}\, e^{i(-\theta_1 + \theta_2)/2}\,d\theta_2\right| \le \int_{-\pi}^{\pi} \frac{\sqrt{\varepsilon}}{\sqrt{r_1}}\,d\theta_2.$$

Therefore, we get

$$\lim_{\varepsilon \to 0}\left|\int_{C_b} \frac{1}{\sqrt{(z-a)(z-b)}}\,dz\right| = 0 \quad \text{or} \quad \lim_{\varepsilon \to 0}\int_{C_b} \frac{1}{\sqrt{(z-a)(z-b)}}\,dz = 0.$$

In a similar manner, we have

$$\lim_{\varepsilon \to 0}\int_{C_a} \frac{1}{\sqrt{(z-a)(z-b)}}\,dz = 0.$$

Meanwhile, on the line $Q'P'$ we have

$$z - a = r_1 e^{i\theta_1} \quad \text{and} \quad z - b = r_2 e^{i\theta_2}. \tag{6.259}$$

In (6.259) we have $r_1 = x - a$, $\theta_1 = 0$ and $r_2 = b - x$, $\theta_2 = \pi$; see Fig. 6.30. Also, we have $dz = dx$. Hence, we have

$$\int_{Q'P'} \frac{1}{\sqrt{(z-a)(z-b)}}\,dz = \int_b^a \frac{e^{-i\pi/2}}{\sqrt{(x-a)(b-x)}}\,dx = \int_b^a \frac{-i}{\sqrt{(x-a)(b-x)}}\,dx = i\int_a^b \frac{1}{\sqrt{(x-a)(b-x)}}\,dx. \tag{6.260}$$

On the line $PQ$, in turn, we have $r_1 = x - a$ and $\theta_1 = 2\pi$. This is because when going from $P'$ to $P$, the argument is increased by $2\pi$; see Fig. 6.30 once again. Also, we have $r_2 = b - x$, $\theta_2 = \pi$, and $dz = dx$. Notice that the argument $\theta_2$ remains unchanged. Then, we get

$$\int_{PQ} \frac{1}{\sqrt{(z-a)(z-b)}}\,dz = \int_a^b \frac{e^{-3i\pi/2}}{\sqrt{(x-a)(b-x)}}\,dx = i\int_a^b \frac{1}{\sqrt{(x-a)(b-x)}}\,dx. \tag{6.261}$$

Inserting these results into (6.257), we obtain

$$I_C = \oint_C \frac{1}{\sqrt{(z-a)(z-b)}}\,dz = 2i\int_a^b \frac{1}{\sqrt{(x-a)(b-x)}}\,dx. \tag{6.262}$$

The calculation is yet to be finished, however, because (6.262) is not a real definite integral. Then, let us try another contour integration. We choose a circular contour $C$ of radius $R$ centered at the origin. We assume that $R$ is large enough this time. Meanwhile, putting

$$z = 1/w, \tag{6.263}$$

we have

$$\sqrt{(z-a)(z-b)} = \frac{1}{w}\sqrt{(1-aw)(1-bw)} \quad \text{with} \quad dz = -\frac{dw}{w^2}. \tag{6.264}$$

Thus, we have

$$I_C = \oint_C \frac{1}{\sqrt{(z-a)(z-b)}}\,dz = -\oint_{C'} \frac{1}{w\sqrt{(1-aw)(1-bw)}}\,dw, \tag{6.265}$$

where the closed contour $C'$ of radius $1/R$ comes full circle clockwise around the origin of the $w$-plane (see Fig. 6.31). This is because as in the $z$-plane the contour $C$ is

0) and $E_2$ (> 0) are amplitudes and $\mathbf{e}_1$ and $\mathbf{e}_2$ represent unit polarization vectors in the direction of the positive $x$-axis and $y$-axis; we assume that the two waves are being propagated in the direction of the positive $z$-axis; $\delta$ is a phase difference. The total electric field $\mathbf{E}$ is described as the superposition of $\mathbf{E}_1$ and $\mathbf{E}_2$ such that

$$\mathbf{E} = \mathbf{E}_1 + \mathbf{E}_2 = E_1 \mathbf{e}_1 e^{i(kz - \omega t)} + E_2 \mathbf{e}_2 e^{i(kz - \omega t + \delta)}. \tag{7.69}$$

Note that we usually discuss the polarization characteristics of an electromagnetic wave only by considering electric waves. We emphasize that an electric wave and the concomitant magnetic wave share the same phase in a uniform and infinite dielectric medium. The electric wave is taken to represent the electromagnetic wave partly because optical applications mostly involve nonmagnetic substances such as glass, water, plastics, and most semiconductors. Let us view the temporal change of $\mathbf{E}$ at a fixed point $\mathbf{x} = \mathbf{0}$, i.e., $x = y = z = 0$. Then, taking the real part of (7.69), the $x$- and $y$-components of $\mathbf{E}$, i.e., $E_x$ and $E_y$, are expressed as

$$E_x = E_1 \cos(-\omega t) \quad \text{and} \quad E_y = E_2 \cos(-\omega t + \delta). \tag{7.70}$$

First, let us briefly think of the case where $\delta = 0$. Eliminating $t$, we have

$$E_y = \frac{E_2}{E_1} E_x. \tag{7.71}$$

This is an equation of a straight line. The resulting electric field $\mathbf{E}$ is called a linearly polarized light accordingly. That is, when we are observing the electric field of the relevant light at the origin, the field is oscillating along the straight line described by (7.71), with the origin located at the center of the oscillation. If $\delta = \pi$, we have

$$E_y = -\frac{E_2}{E_1} E_x.$$

This gives a straight line as well. Therefore, if we wish to seek the relationship between $E_x$ and $E_y$, it suffices to examine it as a function of $\delta$ in a region of $-\frac{\pi}{2} \le \delta \le \frac{\pi}{2}$.

(i) Case I: $E_1 \ne E_2$. Let us consider the case where $\delta \ne 0$ in (7.70). Rewriting the second equation of (7.70) and inserting the first equation into it so that we can eliminate $t$, we have

$$E_y = E_2(\cos \omega t \cos \delta + \sin \omega t \sin \delta) = E_2\left(\cos \delta\, \frac{E_x}{E_1} \pm \sin \delta \sqrt{1 - \frac{E_x^2}{E_1^2}}\right).$$

Rearranging terms of the above equation, we have

$$\frac{E_y}{E_2} - (\cos \delta)\frac{E_x}{E_1} = \pm(\sin \delta)\sqrt{1 - \frac{E_x^2}{E_1^2}}. \tag{7.72}$$

Squaring both sides of (7.72) and arranging the equation, we get

$$\frac{E_x^2}{E_1^2 \sin^2 \delta} - \frac{2(\cos \delta) E_x E_y}{E_1 E_2 \sin^2 \delta} + \frac{E_y^2}{E_2^2 \sin^2 \delta} = 1. \tag{7.73}$$
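As a quick consistency check, the field components of (7.70) can be inserted into the left-hand side of (7.73) at several instants; the result should be unity each time. The following is a minimal sketch only. The function names and the numerical values of $E_1$, $E_2$, and $\delta$ are illustrative assumptions, and the field is taken as the real part of (7.69) at the origin.

```python
import math

def ellipse_lhs(Ex, Ey, E1, E2, delta):
    """Left-hand side of (7.73) for given field components."""
    s2 = math.sin(delta) ** 2
    return (Ex**2 / (E1**2 * s2)
            - 2.0 * math.cos(delta) * Ex * Ey / (E1 * E2 * s2)
            + Ey**2 / (E2**2 * s2))

def field(omega_t, E1, E2, delta):
    """Real field components at the origin (real part of (7.69) at z = 0)."""
    return E1 * math.cos(-omega_t), E2 * math.cos(-omega_t + delta)

E1, E2, delta = 2.0, 1.0, math.pi / 3   # illustrative values only
residuals = []
for j in range(12):
    Ex, Ey = field(2.0 * math.pi * j / 12, E1, E2, delta)
    residuals.append(abs(ellipse_lhs(Ex, Ey, E1, E2, delta) - 1.0))

# Largest deviation of the left-hand side from 1 over one period.
print(max(residuals))
```

The deviation is at the level of floating-point rounding, which indicates that every sampled point of the trace lies on the curve (7.73).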

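Using a matrix form, we have

$$\begin{pmatrix} E_x & E_y \end{pmatrix} \begin{pmatrix} \dfrac{1}{E_1^2 \sin^2 \delta} & -\dfrac{\cos \delta}{E_1 E_2 \sin^2 \delta} \\ -\dfrac{\cos \delta}{E_1 E_2 \sin^2 \delta} & \dfrac{1}{E_2^2 \sin^2 \delta} \end{pmatrix} \begin{pmatrix} E_x \\ E_y \end{pmatrix} = 1. \tag{7.74}$$

Note that the above matrix is real symmetric. In that case, to examine properties of the matrix we calculate its determinant along with principal minors. The principal minor means a minor with respect to a diagonal element. In this case, the two principal minors are $\dfrac{1}{E_1^2 \sin^2 \delta}$ and $\dfrac{1}{E_2^2 \sin^2 \delta}$. Also we have

$$\begin{vmatrix} \dfrac{1}{E_1^2 \sin^2 \delta} & -\dfrac{\cos \delta}{E_1 E_2 \sin^2 \delta} \\ -\dfrac{\cos \delta}{E_1 E_2 \sin^2 \delta} & \dfrac{1}{E_2^2 \sin^2 \delta} \end{vmatrix} = \frac{1}{E_1^2 E_2^2 \sin^2 \delta}. \tag{7.75}$$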

Evidently, the two principal minors as well as the determinant are all positive ($\delta \ne 0$). In this case, the (2, 2) matrix of (7.74) is said to be positive definite. The related discussion will be given in Part III. The positive definiteness means that in a quadratic form described by (7.74), LHS takes a positive value for any real numbers $E_x$ and $E_y$ except a unique case where $E_x = E_y = 0$, which renders LHS zero. The positive definiteness of a matrix ensures the existence of positive eigenvalues with the said matrix. Let us consider a real symmetric (2, 2) matrix that has positive principal minors and a positive determinant in a general case. Let such a matrix $M$ be

$$M = \begin{pmatrix} a & c \\ c & b \end{pmatrix},$$

where $a, b > 0$ and $\det M > 0$; i.e., $ab - c^2 > 0$. Let a corresponding quadratic form be $Q$. Then, we have

$$\begin{aligned}
Q &= \begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} a & c \\ c & b \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = ax^2 + 2cyx + by^2 = a\left[\left(x + \frac{cy}{a}\right)^2 - \frac{c^2 y^2}{a^2} + \frac{aby^2}{a^2}\right] \\
&= a\left[\left(x + \frac{cy}{a}\right)^2 + \frac{y^2}{a^2}\left(ab - c^2\right)\right].
\end{aligned}$$

Thus, $Q \ge 0$ for any real numbers $x$ and $y$. We seek a condition under which $Q = 0$. We readily find that with $M$ that has the above properties, only $x = y = 0$ makes $Q = 0$. Thus, $M$ is positive definite. We will deal with this issue from a more general standpoint in Part III. In general, it is pretty complicated to seek eigenvalues and corresponding eigenvectors in the above case. Yet, we can extract important information from (7.74). The eigenvalues $\lambda$ are estimated as follows:

$$\lambda = \frac{E_1^2 + E_2^2 \pm \sqrt{\left(E_1^2 + E_2^2\right)^2 - 4E_1^2 E_2^2 \sin^2 \delta}}{2E_1^2 E_2^2 \sin^2 \delta}. \tag{7.76}$$
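The closed form (7.76) can be checked against the trace and determinant of the matrix in (7.74), since the two eigenvalues of a (2, 2) matrix must sum to the trace and multiply to the determinant. The sketch below is illustrative only: the function names and numerical values are assumptions, and the last line uses the standard fact that the semi-axes of the ellipse $\mathbf{x}^{T} M \mathbf{x} = 1$ are $1/\sqrt{\lambda}$.

```python
import math

def eigenvalues_closed_form(E1, E2, delta):
    """Both roots of (7.76), returned as (larger, smaller)."""
    s2 = math.sin(delta) ** 2
    sum_sq = E1**2 + E2**2
    root = math.sqrt(sum_sq**2 - 4.0 * E1**2 * E2**2 * s2)
    denom = 2.0 * E1**2 * E2**2 * s2
    return (sum_sq + root) / denom, (sum_sq - root) / denom

def polarization_matrix(E1, E2, delta):
    """Entries (m11, m12, m22) of the real symmetric matrix in (7.74)."""
    s2 = math.sin(delta) ** 2
    return (1.0 / (E1**2 * s2),
            -math.cos(delta) / (E1 * E2 * s2),
            1.0 / (E2**2 * s2))

E1, E2, delta = 2.0, 1.0, math.pi / 3   # illustrative values only
lam_plus, lam_minus = eigenvalues_closed_form(E1, E2, delta)
m11, m12, m22 = polarization_matrix(E1, E2, delta)
trace = m11 + m22
det = m11 * m22 - m12**2

# Both differences should vanish up to rounding.
print(lam_plus + lam_minus - trace, lam_plus * lam_minus - det)

# Semi-axes of the ellipse: the smaller eigenvalue gives the longer axis.
semi_axes = (1.0 / math.sqrt(lam_minus), 1.0 / math.sqrt(lam_plus))
print(semi_axes)
```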

Notice that $\lambda$ in (7.76) represents two different positive eigenvalues. This is because the inside of the square root is rewritten as

$$\left(E_1^2 - E_2^2\right)^2 + 4E_1^2 E_2^2 \cos^2 \delta > 0 \quad (\delta \ne \pm\pi/2).$$

Also we have

$$E_1^2 + E_2^2 > \sqrt{\left(E_1^2 + E_2^2\right)^2 - 4E_1^2 E_2^2 \sin^2 \delta}.$$

These clearly show that the quadratic form of (7.74) gives an ellipse (i.e., elliptically polarized light). Because of the presence of the second term of LHS of (7.73), both the major and minor axes of the ellipse are tilted and diverted from the $x$- and $y$-axes. Let us inspect the ellipse described by (7.74). Inserting $E_x = E_1$ obtained at $t = 0$ in (7.70) into (7.73) and solving a quadratic equation with respect to $E_y$, we get $E_y$ as a double root such that

$$E_y = E_2 \cos \delta.$$

Similarly putting $E_y = E_2$ in (7.73), we have

$$E_x = E_1 \cos \delta.$$

These results show that an ellipse described by (7.73) or (7.74) is internally tangent to a rectangle as depicted in Fig. 7.6a. Equation (7.69) shows that the electromagnetic wave is propagated toward the positive direction of the $z$-axis. Therefore, in Fig. 7.6a we are peeking into the oncoming wave from the bottom of a plane of paper at a certain position of $z$ = constant. We set the constant to 0. Then, we find that at $t = 0$ the electric field is represented by the point P ($E_x = E_1$, $E_y = E_2 \cos \delta$); see Fig. 7.6a. From (7.70), if $\delta > 0$, P traces the ellipse counterclockwise. It reaches a maximum point of $E_y = E_2$ at $t = \delta/\omega$. Since the trace of the electric field forms an ellipse as in Fig. 7.6, the associated light is said to be an elliptically polarized light. If $\delta < 0$ in (7.70), on the other hand, P traces the ellipse clockwise. In a special case of $\delta = \pi/2$, the second term of (7.73) vanishes and we have a simple form described as

$$\frac{E_x^2}{E_1^2} + \frac{E_y^2}{E_2^2} = 1. \tag{7.77}$$

Fig. 7.6 Trace of an electric field of an elliptically polarized light. (a) The trace is internally tangent to a rectangle of $2E_1 \times 2E_2$. In the case of $\delta > 0$, starting from P at $t = 0$, the coordinate point representing the electric field traces the ellipse counterclockwise with time. (b) The trace of an elliptically polarized light for $\delta = \pi/2$

Thus, the principal axes of the ellipse coincide with the $x$- and $y$-axes. On the basis of (7.70), we see from Fig. 7.6b that starting from P at $t = 0$, again the coordinate point representing the electric field traces the ellipse counterclockwise with time; see the curved arrow of Fig. 7.6b. If $\delta < 0$, the coordinate point traces the ellipse clockwise with time.

(ii) Case II: $E_1 = E_2$. Now, let us consider a simple but important case. When $E_1 = E_2$, (7.73) is simplified to be

$$E_x^2 - 2\cos \delta\, E_x E_y + E_y^2 = E_1^2 \sin^2 \delta. \tag{7.78}$$

Using a matrix form, we have

$$\begin{pmatrix} E_x & E_y \end{pmatrix}\begin{pmatrix} 1 & -\cos \delta \\ -\cos \delta & 1 \end{pmatrix}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = E_1^2 \sin^2 \delta. \tag{7.79}$$

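We obtain eigenvalues $\lambda$ of the matrix of (7.79) such that

$$\lambda = 1 \pm |\cos \delta|. \tag{7.80}$$

Setting $-\frac{\pi}{2} \le \delta \le \frac{\pi}{2}$, we have

$$\lambda = 1 \pm \cos \delta. \tag{7.81}$$

The corresponding normalized eigenvectors $v_1$ and $v_2$ (as a column vector) are

$$v_1 = \begin{pmatrix} \dfrac{1}{\sqrt{2}} \\ -\dfrac{1}{\sqrt{2}} \end{pmatrix} \quad \text{and} \quad v_2 = \begin{pmatrix} \dfrac{1}{\sqrt{2}} \\ \dfrac{1}{\sqrt{2}} \end{pmatrix}. \tag{7.82}$$

Thus, we have a diagonalizing unitary matrix $P$ such that

$$P = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\ -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix}. \tag{7.83}$$

Defining the above matrix appearing in (7.79) as $A$ such that

$$A = \begin{pmatrix} 1 & -\cos \delta \\ -\cos \delta & 1 \end{pmatrix}, \tag{7.84}$$

we obtain

$$P^{-1} A P = \begin{pmatrix} 1 + \cos \delta & 0 \\ 0 & 1 - \cos \delta \end{pmatrix}. \tag{7.85}$$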
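The diagonalization (7.85) is easy to confirm by direct multiplication. The sketch below is illustrative only: the helper names `matmul2` and `transpose2` and the value of $\delta$ are assumptions, and it uses the fact that the real orthogonal matrix $P$ of (7.83) has its transpose as its inverse.

```python
import math

def matmul2(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(X):
    """Transpose of a 2x2 matrix given as nested lists."""
    return [[X[j][i] for j in range(2)] for i in range(2)]

delta = math.pi / 5            # illustrative value in (-pi/2, pi/2)
c = math.cos(delta)
s = 1.0 / math.sqrt(2.0)

A = [[1.0, -c], [-c, 1.0]]     # matrix (7.84)
P = [[s, s], [-s, s]]          # columns are v1 and v2 of (7.82)

# P is real orthogonal, so its inverse equals its transpose.
D = matmul2(transpose2(P), matmul2(A, P))
print(D)
```

The printed matrix should be diagonal, with entries $1 + \cos \delta$ and $1 - \cos \delta$ in that order.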

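Notice that the eigenvalues $(1 + \cos \delta)$ and $(1 - \cos \delta)$ are both positive as expected. Rewriting (7.79), we have

$$\begin{aligned}
\begin{pmatrix} E_x & E_y \end{pmatrix} P P^{-1}\begin{pmatrix} 1 & -\cos \delta \\ -\cos \delta & 1 \end{pmatrix} P P^{-1}\begin{pmatrix} E_x \\ E_y \end{pmatrix}
&= \begin{pmatrix} E_x & E_y \end{pmatrix} P \begin{pmatrix} 1 + \cos \delta & 0 \\ 0 & 1 - \cos \delta \end{pmatrix} P^{-1}\begin{pmatrix} E_x \\ E_y \end{pmatrix} \\
&= E_1^2 \sin^2 \delta.
\end{aligned} \tag{7.86}$$

Here, let us define new coordinates such that

$$\begin{pmatrix} u \\ v \end{pmatrix} \equiv P^{-1}\begin{pmatrix} E_x \\ E_y \end{pmatrix}. \tag{7.87}$$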

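This coordinate transformation corresponds to the transformation of basis vectors $(\mathbf{e}_1\ \mathbf{e}_2)$ such that

$$(\mathbf{e}_1\ \mathbf{e}_2)\begin{pmatrix} E_x \\ E_y \end{pmatrix} = (\mathbf{e}_1\ \mathbf{e}_2) P P^{-1}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = (\mathbf{e}'_1\ \mathbf{e}'_2)\begin{pmatrix} u \\ v \end{pmatrix}, \tag{7.88}$$

where the new basis vectors $(\mathbf{e}'_1\ \mathbf{e}'_2)$ are given by

$$(\mathbf{e}'_1\ \mathbf{e}'_2) = (\mathbf{e}_1\ \mathbf{e}_2) P = \left(\frac{1}{\sqrt{2}}\mathbf{e}_1 - \frac{1}{\sqrt{2}}\mathbf{e}_2 \quad \frac{1}{\sqrt{2}}\mathbf{e}_1 + \frac{1}{\sqrt{2}}\mathbf{e}_2\right). \tag{7.89}$$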

The coordinate system is depicted in Fig. 7.7 along with the basis vectors. The relevant discussion will again appear in Part III. Substituting (7.87) into (7.86) and rearranging terms, we get

$$\frac{u^2}{E_1^2 (1 - \cos \delta)} + \frac{v^2}{E_1^2 (1 + \cos \delta)} = 1. \tag{7.90}$$

Equation (7.90) indicates that a major axis and minor axis are $E_1\sqrt{1 + \cos \delta}$ and $E_1\sqrt{1 - \cos \delta}$, respectively. When $\delta = \pi/2$, (7.90) becomes

$$\frac{u^2}{E_1^2} + \frac{v^2}{E_1^2} = 1. \tag{7.91}$$

Fig. 7.7 Relationship between the basis vectors $(\mathbf{e}_1\ \mathbf{e}_2)$ and $(\mathbf{e}'_1\ \mathbf{e}'_2)$ in the case of $E_1 = E_2$; see text

Fig. 7.8 Polarized feature of light in the case of $E_1 = E_2$. (a) If $\delta > 0$, the electric field traces an ellipse from A1 via A2 to A3 (see text). (b) If $\delta = \pi/2$, the electric field traces a circle from C1 via C2 to C3 (left-circularly polarized light)

This represents a circle. For this reason, the wave described by (7.91) is called a circularly polarized light. In (7.90) where $\delta \ne \pi/2$, the wave is said to be an elliptically polarized light. Thus, we have linearly, elliptically, and circularly polarized lights depending on the magnitude of $\delta$. Let us closely examine characteristics of the elliptically and circularly polarized lights in the case of $E_1 = E_2$. When $t = 0$, from (7.70) we have

$$E_x = E_1 \quad \text{and} \quad E_y = E_1 \cos \delta. \tag{7.92}$$

This coordinate point corresponds to A1 whose $E_x$ coordinate is $E_1$ (see Fig. 7.8a). In the case of $\Delta t = \delta/2\omega$, $E_x = E_y = E_1 \cos(\pm\delta/2)$. This point corresponds to A2 in Fig. 7.8a. We have

$$\sqrt{E_x^2 + E_y^2} = \sqrt{2}\,E_1 \cos(\delta/2) = E_1\sqrt{1 + \cos \delta}. \tag{7.93}$$

This is equal to the major axis as anticipated. With $t = \Delta t$,

$$E_x = E_1 \cos(-\omega \Delta t) \quad \text{and} \quad E_y = E_1 \cos(-\omega \Delta t + \delta). \tag{7.94}$$

Notice that $E_y$ takes a maximum $E_1$ when $\Delta t = \delta/\omega$. Consequently, if $\delta$ takes a positive value, $E_y$ takes a maximum $E_1$ for a positive $\Delta t$, as is similarly the case with Fig. 7.6a. At that time, $E_x = E_1 \cos(-\delta) < E_1$. This point corresponds to A3 in Fig. 7.8a. As a result, the electric field traces the ellipse counterclockwise with time, as in the case of Fig. 7.6. If $\delta$ takes a negative value, on the other hand, the field traces the ellipse clockwise. If $\delta = \pm\pi/2$, in (7.94) we have

$$E_x = E_1 \cos(-\omega t) \quad \text{and} \quad E_y = E_1 \cos\left(-\omega t \pm \frac{\pi}{2}\right). \tag{7.95}$$

We examine the case of $\delta = \pi/2$ first. In this case, when $t = 0$, $E_x = E_1$ and $E_y = 0$ (Point C1 in Fig. 7.8b). If $\omega t = \pi/4$, $E_x = E_y = E_1/\sqrt{2}$ (Point C2 in Fig. 7.8b). In turn, if $\omega t = \pi/2$, $E_x = 0$ and $E_y = E_1$ (Point C3). Again the electric field traces the circle counterclockwise. In this situation, we see the light from above the $z$-axis. In other words, we are viewing the light against the direction of its propagation. The wave is said to be left-circularly polarized and have positive helicity. In contrast, when $\delta = -\pi/2$, starting from Point C1 the electric field traces the circle clockwise. That light is said to be right-circularly polarized and have negative helicity. With the left-circularly polarized light, (7.69) can be rewritten as

$$\mathbf{E} = \mathbf{E}_1 + \mathbf{E}_2 = E_1(\mathbf{e}_1 + i\mathbf{e}_2)e^{i(kz - \omega t)}. \tag{7.96}$$

Therefore, a complex vector $(\mathbf{e}_1 + i\mathbf{e}_2)$ characterizes the left-circular polarization. On the other hand, $(\mathbf{e}_1 - i\mathbf{e}_2)$ characterizes the right-circular polarization. To normalize them, it is convenient to use the following vectors as in the case of Sect. 4.3 [3]:

$$\mathbf{e}_{+} \equiv \frac{1}{\sqrt{2}}(\mathbf{e}_1 + i\mathbf{e}_2) \quad \text{and} \quad \mathbf{e}_{-} \equiv \frac{1}{\sqrt{2}}(\mathbf{e}_1 - i\mathbf{e}_2). \tag{4.45}$$

In the case of $\delta = 0$, we have a linearly polarized light. For this, the points A1, A2, and A3 coalesce to be a point on a straight line of $E_y = E_x$.
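The sense of rotation discussed above can be read off from the sign of the $z$-component of $\mathbf{E} \times d\mathbf{E}/dt$, which for fields of the form $E_x = E_1\cos(-\omega t)$, $E_y = E_2\cos(-\omega t + \delta)$ equals $\omega E_1 E_2 \sin \delta$. The sketch below is a hedged illustration only: the function name `rotation_sense`, the sampling instant, and the tolerance are assumptions, and the field convention is the real part of (7.69) at $z = 0$.

```python
import math

def rotation_sense(E1, E2, delta, omega_t=0.3):
    """Sign of (E x dE/dt)_z per unit omega, for Ex = E1*cos(-wt), Ey = E2*cos(-wt + delta).

    Returns +1 for a counterclockwise trace in the (Ex, Ey) plane,
    -1 for a clockwise trace, and 0 for linear polarization.
    """
    Ex = E1 * math.cos(-omega_t)
    Ey = E2 * math.cos(-omega_t + delta)
    dEx = E1 * math.sin(-omega_t)            # d/d(omega t) of cos(-omega t)
    dEy = E2 * math.sin(-omega_t + delta)    # d/d(omega t) of cos(-omega t + delta)
    cross = Ex * dEy - Ey * dEx              # equals E1*E2*sin(delta)
    if abs(cross) < 1e-12:
        return 0
    return 1 if cross > 0 else -1

for delta in (math.pi / 2, math.pi / 4, 0.0, -math.pi / 2):
    print(delta, rotation_sense(1.0, 1.0, delta))
```

With the viewing convention of the text (looking against the propagation direction), $+1$ corresponds to the left-circular or left-elliptical case and $-1$ to the right-handed case.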

References
1. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
2. Pain HJ (2005) The physics of vibrations and waves, 6th edn. Wiley, Chichester
3. Jackson JD (1999) Classical electrodynamics, 3rd edn. Wiley, New York

Chapter 8

Reflection and Transmission of Electromagnetic Waves in Dielectric Media

In Chap. 7, we considered the propagation of electromagnetic waves in an infinite uniform dielectric medium. In this chapter, we think of a situation where two (or more) dielectrics are in contact with each other at a plane interface. When two dielectric media adjoin each other with an interface, propagating electromagnetic waves are partly reflected by the interface and partly transmitted beyond the interface. We deal with these phenomena in terms of characteristic impedance of the dielectric media. In the case of an oblique incidence of a wave, we categorize it into a transverse electric (TE) wave and transverse magnetic (TM) wave. If a thin plate of a dielectric is sandwiched by a couple of metal sheets, the electromagnetic wave is confined within the dielectric. In this case, the propagating mode of the wave differs from that of a wave propagating in a free space (i.e., a space filled by a three-dimensionally infinite dielectric medium). If a thin plate of a dielectric having a large refractive index is sandwiched by a couple of dielectrics with a smaller refractive index, the electromagnetic wave is also confined within the dielectric with a larger index. In this case, we have to take account of the total reflection that causes a phase change upon the reflection. We deal with such specific modes of the electromagnetic wave propagation. These phenomena are treated both from a basic aspect and from a point of view of device application. The relevant devices are called waveguides in optics.

8.1 Electromagnetic Fields at an Interface

© Springer Nature Singapore Pte Ltd. 2020 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3_8

We start with examining a condition of an electromagnetic field at the plane interface. Suppose that two semi-infinite dielectric media D1 and D2 are in contact with each other at a plane interface. Let us take a small rectangle $S$ that straddles the interface (see Fig. 8.1). Taking a surface integral of both sides of (7.28) over the strip, we have

$$\int_S \operatorname{rot} \mathbf{E} \cdot \mathbf{n}\,dS + \int_S \frac{\partial \mathbf{B}}{\partial t} \cdot \mathbf{n}\,dS = 0, \tag{8.1}$$

where $\mathbf{n}$ is a unit vector directed to a normal of $S$ as shown. Applying Stokes' theorem to the first term of (8.1), we get

$$\oint_C \mathbf{E} \cdot d\mathbf{l} + \frac{\partial \mathbf{B}}{\partial t} \cdot \mathbf{n}\,\Delta l\,\Delta h = 0. \tag{8.2}$$

With the line integral of the first term, C is a closed loop surrounding the rectangle S and dl = tdl, where t is a unit vector directed toward the tangential direction of C (see t1 and t2 in Fig. 8.1). The line integration is performed such that C is followed counter-clockwise in the direction of t. Figure 8.2 gives an intuitive diagram that explains the Stokes’ theorem. The diagram shows an overview of a surface S encircled by a closed curve C. Suppose that we have a spiral vector field E represented by arrowed circles as shown. In that case, rot E is directed toward the upper side of the plane of paper in the individual fragments. A summation of rot E ndS forms a surface integral covering S. Meanwhile, the arrows of adjacent fragments cancel out each other and only the components on the periphery (i.e., the curve C) are nonvanishing (see Fig. 8.2). Thus, the surface integral of rot E is equivalent to the line integral of E. Accordingly, we get Stokes’ theorem described by [1]

Fig. 8.1 A small rectangle $S$ that straddles an interface formed by two semi-infinite dielectric media of D1 and D2. Let a curve $C$ be a closed loop surrounding the rectangle $S$. A unit vector $\mathbf{n}$ is directed to a normal of $S$. Unit vectors $\mathbf{t}_1$ and $\mathbf{t}_2$ are directed to a tangential line of the interface plane

Fig. 8.2 Diagram that intuitively explains the Stokes' theorem. In the diagram a surface $S$ is encircled by a closed curve $C$. An infinitesimal portion of $C$ is denoted by $d\mathbf{l}$. The surface $S$ is pertinent to the surface integration. Spiral vector field $\mathbf{E}$ is present on and near $S$

$$\int_S \operatorname{rot} \mathbf{E} \cdot \mathbf{n}\,dS = \oint_C \mathbf{E} \cdot d\mathbf{l}. \tag{8.3}$$
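Stokes' theorem can be illustrated numerically with a simple swirling field. The sketch below is an assumption-laden toy example and not taken from the text: it uses the planar field $\mathbf{E} = (-y, x, 0)$, whose rotation is $(0, 0, 2)$ everywhere, a circular loop $C$ of hypothetical radius 1.5, and a midpoint rule on chords for the line integral.

```python
import math

def circulation(field, radius=1.0, n=2000):
    """Line integral of a planar field along a counterclockwise circle (midpoint rule)."""
    total = 0.0
    for j in range(n):
        phi0 = 2.0 * math.pi * j / n
        phi1 = 2.0 * math.pi * (j + 1) / n
        phim = 0.5 * (phi0 + phi1)
        # Field is sampled at the mid-angle point; dl is the chord of the segment.
        x, y = radius * math.cos(phim), radius * math.sin(phim)
        dx = radius * (math.cos(phi1) - math.cos(phi0))
        dy = radius * (math.sin(phi1) - math.sin(phi0))
        Fx, Fy = field(x, y)
        total += Fx * dx + Fy * dy
    return total

def swirl(x, y):
    """Illustrative field E = (-y, x, 0); its rotation is (0, 0, 2) everywhere."""
    return -y, x

radius = 1.5
line_integral = circulation(swirl, radius)
surface_integral = 2.0 * math.pi * radius**2   # (rot E . n) times the disk area
print(line_integral, surface_integral)
```

The two numbers agree up to the discretization error of the line integral, in line with the statement that the surface integral of rot E equals the line integral of E along the boundary.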

Returning to Fig. 8.1 and taking $\Delta h \to 0$, we have $\frac{\partial \mathbf{B}}{\partial t} \cdot \mathbf{n}\,\Delta l\,\Delta h \to 0$. Then, the second term of (8.2) vanishes and we get

$$\oint_C \mathbf{E} \cdot d\mathbf{l} = 0.$$

This implies that

$$\Delta l\,(\mathbf{E}_1 \cdot \mathbf{t}_1 + \mathbf{E}_2 \cdot \mathbf{t}_2) = 0,$$

where $\mathbf{E}_1$ and $\mathbf{E}_2$ represent the electric field in the dielectrics D1 and D2 close to the interface, respectively. Considering $\mathbf{t}_2 = -\mathbf{t}_1$ and putting $\mathbf{t}_1 = \mathbf{t}$, we get

$$(\mathbf{E}_1 - \mathbf{E}_2) \cdot \mathbf{t} = 0, \tag{8.4}$$

where $\mathbf{t}$ represents a unit vector in the direction of a tangential line of the interface plane. Equation (8.4) means that the tangential components of the electric field are continuous on both sides of the interface. We obtain a similar result with the magnetic field. This can be shown by taking a surface integral of both sides of (7.29) as well. As a result, we get

$$(\mathbf{H}_1 - \mathbf{H}_2) \cdot \mathbf{t} = 0, \tag{8.5}$$

where $\mathbf{H}_1$ and $\mathbf{H}_2$ represent the magnetic field in D1 and D2 close to the interface, respectively. Hence, from (8.5) the tangential components of the magnetic field are continuous on both sides of the interface as well.

8.2 Basic Concepts Underlying Phenomena

When an electromagnetic wave is incident upon an interface of dielectrics, its reflection and transmission (refraction) take place at the interface. We address a question of how the nature of the dielectrics and the conditions dealt with in the previous section are associated with the optical phenomena. When we deal with the problem, we assume non-absorbing media. Notice that the complex wavenumber vector is responsible for an absorbing medium along with a complex index of refraction. Nonetheless, our approach is useful to discuss related problems in the absorbing media. Characteristic impedance plays a key role in the reflection and transmission of light.

We represent a field (either electric or magnetic) of the incident, reflected, and transmitted (or refracted) waves by $\mathbf{F}_i$, $\mathbf{F}_r$, and $\mathbf{F}_t$, respectively. We call a dielectric of the incidence side (and, hence, reflection side) D1 and another dielectric of the transmission side D2. The fields are described by

$$\mathbf{F}_i = F_i \boldsymbol{\varepsilon}_i e^{i(\mathbf{k}_i \cdot \mathbf{x} - \omega t)}, \tag{8.6}$$

$$\mathbf{F}_r = F_r \boldsymbol{\varepsilon}_r e^{i(\mathbf{k}_r \cdot \mathbf{x} - \omega t)}, \tag{8.7}$$

$$\mathbf{F}_t = F_t \boldsymbol{\varepsilon}_t e^{i(\mathbf{k}_t \cdot \mathbf{x} - \omega t)}, \tag{8.8}$$

where $F_i$, $F_r$, and $F_t$ denote an amplitude of the field; $\boldsymbol{\varepsilon}_i$, $\boldsymbol{\varepsilon}_r$, and $\boldsymbol{\varepsilon}_t$ represent a unit vector of polarization direction, i.e., the direction along which the field oscillates; $\mathbf{k}_i$, $\mathbf{k}_r$, and $\mathbf{k}_t$ are wavenumber vectors such that $\mathbf{k}_i \perp \boldsymbol{\varepsilon}_i$, $\mathbf{k}_r \perp \boldsymbol{\varepsilon}_r$, and $\mathbf{k}_t \perp \boldsymbol{\varepsilon}_t$. These wavenumber vectors represent the propagation directions of individual waves. In (8.6) to (8.8), indices of $i$, $r$, and $t$ stand for incidence, reflection, and transmission, respectively. Let $\mathbf{x}_s$ be an arbitrary position vector at the interface between the dielectrics. Also, let $\mathbf{t}$ be a unit vector paralleling the interface. Thus, tangential components of the field are described as

$$F_{it} = F_i(\mathbf{t} \cdot \boldsymbol{\varepsilon}_i)e^{i(\mathbf{k}_i \cdot \mathbf{x}_s - \omega t)}, \tag{8.9}$$

$$F_{rt} = F_r(\mathbf{t} \cdot \boldsymbol{\varepsilon}_r)e^{i(\mathbf{k}_r \cdot \mathbf{x}_s - \omega t)}, \tag{8.10}$$

$$F_{tt} = F_t(\mathbf{t} \cdot \boldsymbol{\varepsilon}_t)e^{i(\mathbf{k}_t \cdot \mathbf{x}_s - \omega t)}. \tag{8.11}$$

Note that $F_{it}$ and $F_{rt}$ represent the field in D1 just close to the interface and that $F_{tt}$ denotes the field in D2 just close to the interface. Thus, in light of (8.4) and (8.5), we have

$$F_{it} + F_{rt} = F_{tt}. \tag{8.12}$$

Notice that (8.12) holds with any position $\mathbf{x}_s$ and any time $t$. Let us think of elementary calculation of exponential functions or exponential polynomials and the relationship between individual coefficients and exponents. With respect to two functions $e^{ikx}$ and $e^{ik'x}$, we have two alternatives according to the value the Wronskian takes. Here, the Wronskian $W$ is expressed as

$$W = \begin{vmatrix} e^{ikx} & e^{ik'x} \\ \left(e^{ikx}\right)' & \left(e^{ik'x}\right)' \end{vmatrix} = -i(k - k')e^{i(k + k')x}. \tag{8.13}$$

(i) $W \ne 0$ if and only if $k \ne k'$. In this case, $e^{ikx}$ and $e^{ik'x}$ are said to be linearly independent. That is, on condition of $k \ne k'$, for any $x$ we have

$$ae^{ikx} + be^{ik'x} = 0 \iff a = b = 0. \tag{8.14}$$

(ii) $W = 0$ if $k = k'$. In that case, we have

$$ae^{ikx} + be^{ik'x} = (a + b)e^{ikx} = 0 \iff a + b = 0.$$

Notice that $e^{ikx}$ never vanishes with any $x$. To conclude, if we think of an equation of an exponential polynomial

$$ae^{ikx} + be^{ik'x} = 0,$$

we have two alternatives regarding the coefficients. One is a trivial case of $a = b = 0$ and the other is $a + b = 0$. Next, with respect to $e^{ik_1 x}$, $e^{ik_2 x}$, and $e^{ik_3 x}$, similarly we have

$$W = \begin{vmatrix} e^{ik_1 x} & e^{ik_2 x} & e^{ik_3 x} \\ \left(e^{ik_1 x}\right)' & \left(e^{ik_2 x}\right)' & \left(e^{ik_3 x}\right)' \\ \left(e^{ik_1 x}\right)'' & \left(e^{ik_2 x}\right)'' & \left(e^{ik_3 x}\right)'' \end{vmatrix} = -i(k_1 - k_2)(k_2 - k_3)(k_3 - k_1)e^{i(k_1 + k_2 + k_3)x}, \tag{8.15}$$

where $W \ne 0$ if and only if $k_1 \ne k_2$, $k_2 \ne k_3$, and $k_3 \ne k_1$. That is, on this condition for any $x$ we have

$$ae^{ik_1 x} + be^{ik_2 x} + ce^{ik_3 x} = 0 \iff a = b = c = 0. \tag{8.16}$$

If the three exponential functions are linearly dependent, at least two of $k_1$, $k_2$, and $k_3$ are equal to each other, and vice versa. On this condition, again consider the following equation of an exponential polynomial:

$$ae^{ik_1 x} + be^{ik_2 x} + ce^{ik_3 x} = 0. \tag{8.17}$$

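Without loss of generality, we assume that $k_1 = k_2$. Then, we have

$$ae^{ik_1 x} + be^{ik_2 x} + ce^{ik_3 x} = (a + b)e^{ik_1 x} + ce^{ik_3 x} = 0.$$

If $k_1 \ne k_3$, we must have

$$a + b = 0 \quad \text{and} \quad c = 0. \tag{8.18}$$

If, on the other hand, $k_1 = k_3$, i.e., $k_1 = k_2 = k_3$, we have

$$ae^{ik_1 x} + be^{ik_2 x} + ce^{ik_3 x} = (a + b + c)e^{ik_1 x} = 0.$$

That is, we have

$$a + b + c = 0. \tag{8.19}$$

Consequently, we must have $k_1 = k_2 = k_3$ so that we can get three nonzero coefficients $a$, $b$, and $c$. Returning to (8.12), its full description is

$$F_i(\mathbf{t} \cdot \boldsymbol{\varepsilon}_i)e^{i(\mathbf{k}_i \cdot \mathbf{x}_s - \omega t)} + F_r(\mathbf{t} \cdot \boldsymbol{\varepsilon}_r)e^{i(\mathbf{k}_r \cdot \mathbf{x}_s - \omega t)} - F_t(\mathbf{t} \cdot \boldsymbol{\varepsilon}_t)e^{i(\mathbf{k}_t \cdot \mathbf{x}_s - \omega t)} = 0. \tag{8.20}$$

Again, (8.20) must hold with any position $\mathbf{x}_s$ and any time $t$. Meanwhile, for (8.20) to have a physical meaning, we should have

$$F_i(\mathbf{t} \cdot \boldsymbol{\varepsilon}_i) \ne 0, \quad F_r(\mathbf{t} \cdot \boldsymbol{\varepsilon}_r) \ne 0, \quad \text{and} \quad F_t(\mathbf{t} \cdot \boldsymbol{\varepsilon}_t) \ne 0. \tag{8.21}$$

On the basis of the above consideration, we must have the following two relations:

$$\mathbf{k}_i \cdot \mathbf{x}_s - \omega t = \mathbf{k}_r \cdot \mathbf{x}_s - \omega t = \mathbf{k}_t \cdot \mathbf{x}_s - \omega t \quad \text{or} \quad \mathbf{k}_i \cdot \mathbf{x}_s = \mathbf{k}_r \cdot \mathbf{x}_s = \mathbf{k}_t \cdot \mathbf{x}_s, \tag{8.22}$$

and

$$F_i(\mathbf{t} \cdot \boldsymbol{\varepsilon}_i) + F_r(\mathbf{t} \cdot \boldsymbol{\varepsilon}_r) - F_t(\mathbf{t} \cdot \boldsymbol{\varepsilon}_t) = 0 \quad \text{or} \quad F_i(\mathbf{t} \cdot \boldsymbol{\varepsilon}_i) + F_r(\mathbf{t} \cdot \boldsymbol{\varepsilon}_r) = F_t(\mathbf{t} \cdot \boldsymbol{\varepsilon}_t). \tag{8.23}$$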

In this way, we are able to obtain a relation among amplitudes of the fields of incidence, reflection, and transmission. Notice that we get both the relations between exponents and coefficients at once. First, let us consider (8.22). Suppose that the incident light ($\mathbf{k}_i$) is propagated in a dielectric medium D1 in parallel to the $zx$-plane and that the interface is the $xy$-plane (see Fig. 8.3). Also suppose that at the interface the light is reflected partly back to D1 and transmitted (or refracted) partly into another dielectric medium D2. In Fig. 8.3, $\mathbf{k}_i$, $\mathbf{k}_r$, and $\mathbf{k}_t$ represent the incident, reflected, and transmitted lights that make an angle $\theta$, $\theta'$, and $\phi$ with the $z$-axis, respectively. Then we have

$$\mathbf{k}_i = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} k_i \sin \theta \\ 0 \\ -k_i \cos \theta \end{pmatrix}, \tag{8.24}$$

$$\mathbf{x}_s = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} x \\ y \\ 0 \end{pmatrix}, \tag{8.25}$$

where $\theta$ is said to be an incidence angle. A plane formed by $\mathbf{k}_i$ and a normal to the interface is called a plane of incidence (or incidence plane). In Fig. 8.3, the $zx$-plane forms the incidence plane. From (8.24) and (8.25), we have

$$\mathbf{k}_i \cdot \mathbf{x}_s = k_i x \sin \theta, \tag{8.26}$$

$$\mathbf{k}_r \cdot \mathbf{x}_s = k_{rx} x + k_{ry} y, \tag{8.27}$$

$$\mathbf{k}_t \cdot \mathbf{x}_s = k_{tx} x + k_{ty} y, \tag{8.28}$$

where $k_i = |\mathbf{k}_i|$; $k_{rx}$ and $k_{ry}$ are the $x$ and $y$ components of $\mathbf{k}_r$; similarly $k_{tx}$ and $k_{ty}$ are the $x$ and $y$ components of $\mathbf{k}_t$. Since (8.22) holds with any $x$ and $y$, we have

$$k_i \sin \theta = k_{rx} = k_{tx}, \tag{8.29}$$

$$k_{ry} = k_{ty} = 0. \tag{8.30}$$

Fig. 8.3 Geometry of the incident, reflected, and transmitted lights. We assume that the light is incident from a dielectric medium D1 toward another medium D2. The wavenumber vectors $\mathbf{k}_i$, $\mathbf{k}_r$, and $\mathbf{k}_t$ represent the incident, reflected, and transmitted (or refracted) lights with an angle $\theta$, $\theta'$, and $\phi$, respectively. Note here that we did not assume the equality of $\theta$ and $\theta'$ (see text)

From (8.30), neither $\mathbf{k}_r$ nor $\mathbf{k}_t$ has a y component. This means that $\mathbf{k}_i$, $\mathbf{k}_r$, and $\mathbf{k}_t$ are coplanar. That is, the incident, reflected, and transmitted waves all propagate parallel to the zx-plane. Notice that at the beginning we did not assume the coplanarity of those waves. We did not assume the equality of θ and θ′ either (vide infra). From (8.29) and Fig. 8.3, however, we have

$$k_i \sin\theta = k_r \sin\theta' = k_t \sin\phi, \tag{8.31}$$

where $k_r = |\mathbf{k}_r|$ and $k_t = |\mathbf{k}_t|$; θ′ and ϕ are said to be a reflection angle and a refraction angle, respectively. Thus, the end points of $\mathbf{k}_i$, $\mathbf{k}_r$, and $\mathbf{k}_t$ lie on a straight line parallel to the z-axis, as Fig. 8.3 clearly shows. Now, we suppose that the wavelength of the electromagnetic wave in D1 is λ1 and that in D2 is λ2. Since the incident light and reflected light are propagated in D1, we have

$$k_i = k_r = 2\pi/\lambda_1. \tag{8.32}$$

From (8.31) and (8.32), we get

$$\sin\theta = \sin\theta'. \tag{8.33}$$

Therefore, we have either θ = θ′ or θ′ = π − θ. Since 0 < θ, θ′ < π/2, we have

$$\theta = \theta'. \tag{8.34}$$

Then, returning to (8.31), we have

$$k_i \sin\theta = k_r \sin\theta = k_t \sin\phi. \tag{8.35}$$

This implies that the components of $\mathbf{k}_i$, $\mathbf{k}_r$, and $\mathbf{k}_t$ tangential to the interface are the same. Meanwhile, we have

$$k_t = 2\pi/\lambda_2. \tag{8.36}$$

Also we have

$$c = \lambda_0 \nu, \quad v_1 = \lambda_1 \nu, \quad v_2 = \lambda_2 \nu, \tag{8.37}$$

where $v_1$ and $v_2$ are phase velocities of light in D1 and D2, respectively. Since ν is common to D1 and D2, we have $c/\lambda_0 = v_1/\lambda_1 = v_2/\lambda_2$, or

$$c/v_1 = \lambda_0/\lambda_1 = n_1, \quad c/v_2 = \lambda_0/\lambda_2 = n_2, \tag{8.38}$$

where λ0 is a wavelength in vacuum; n1 and n2 are refractive indices of D1 and D2, respectively. Combining (8.35) with (8.32), (8.36), and (8.38), we have several relations such that

$$\frac{\sin\theta}{\sin\phi} = \frac{k_t}{k_i} = \frac{\lambda_1}{\lambda_2} = \frac{n_2}{n_1}\ (\equiv n), \tag{8.39}$$

where n is said to be a relative refractive index of D2 relative to D1. The relation (8.39) is called Snell's law. Notice that (8.39) reflects the kinematic aspect of light and that this characteristic comes from the exponents of (8.20).
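As a numerical illustration of Snell's law (8.39), the following minimal sketch computes the refraction angle ϕ from a given incidence angle θ. The function name `refraction_angle` and the sample indices (1.0 and 1.5) are assumptions made for this example only.

```python
import math

def refraction_angle(theta, n1, n2):
    """Solve n1*sin(theta) = n2*sin(phi) for phi (radians), cf. (8.39).

    Returns None when sin(phi) would exceed 1, i.e., no real
    refraction angle exists.
    """
    s = n1 * math.sin(theta) / n2
    if abs(s) > 1.0:
        return None
    return math.asin(s)

# Hypothetical sample values: light passing from n1 = 1.0 into n2 = 1.5.
theta = math.radians(30.0)
phi = refraction_angle(theta, 1.0, 1.5)
print(math.degrees(phi))

# Ratio of sines reproduces the relative refractive index n = n2/n1.
print(math.sin(theta) / math.sin(phi))
```

The `None` branch corresponds to the situation in which no real ϕ satisfies (8.39); it arises only for incidence from the medium of higher refractive index.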

8.3 Transverse Electric (TE) Waves and Transverse Magnetic (TM) Waves

On the basis of the above argument, we are now in a position to determine the relations among amplitudes of the electromagnetic fields of the incident, reflected, and transmitted waves. Notice that since we are dealing with non-absorbing media, the relevant amplitudes are real (i.e., positive or negative). In other words, when the phase is retained upon reflection, we have a positive amplitude due to $e^{i0} = 1$. When the phase is reversed upon reflection, on the other hand, we will be treating a negative amplitude due to $e^{i\pi} = -1$. Nevertheless, when we consider the total reflection, we deal with a complex amplitude (vide infra). We start with the discussion of the vertical incidence of an electromagnetic wave before the general oblique incidence. In Fig. 8.4a, we depict electric fields E and magnetic fields H obtained at a certain moment near the interface. We index, e.g., $E_i$ for the incident field. There we define the unit polarization vectors of the electric field $\boldsymbol{\varepsilon}_i$, $\boldsymbol{\varepsilon}_r$, and $\boldsymbol{\varepsilon}_t$ to be identical to $\mathbf{e}_1$ (a unit vector in the direction of the x-axis). In (8.6), we also define $F_i$ (both electric and magnetic fields) as positive.

Fig. 8.4 Geometry of the electromagnetic fields near the interface between dielectric media D1 and D2 in the case of vertical incidence. (a) All $E_i$, $E_r$, and $E_t$ are directed in the same direction $\mathbf{e}_1$ (i.e., a unit vector in the positive direction of the x-axis). (b) Although $E_i$ and $E_t$ are directed in the same direction, $E_r$ is reversed. In this case, we define $E_r$ as negative


We have two cases about a geometry of the fields (see Fig. 8.4). The first case is that all Ei, Er, and Et are directed in the same direction (i.e., the positive direction of the x-axis); see Fig. 8.4a. Another case is that although Ei and Et are directed in the same direction, Er is reversed (Fig. 8.4b). In this case, we define Er as negative. Notice that Ei and Et are always directed in the same direction and that Er is directed either in the same direction or in the opposite direction according to the nature of the dielectrics. The situation will be discussed soon. Meanwhile, unit polarization vectors of the magnetic fields are determined by (7.67) for the incident, reflected, and transmitted waves. In Fig. 8.4, the magnetic fields are polarized along the y-axis (i.e., perpendicular to the plane of paper). The magnetic fields Hi and Ht are always directed to the same direction as in the case of the electric fields. On the other hand, if the phase of Er is conserved, the direction of Hr is reversed and vice versa. This converse relationship with respect to the electric and magnetic fields results solely from the requirement that E, H, and the propagation unit vector n of light must constitute a right-handed system in this order. Notice that n is reversed upon reflection. Next, let us consider an oblique incidence. With the oblique incidence, electromagnetic waves are classified into two special categories, i.e., transverse electric (TE) waves (or modes) or transverse magnetic (TM) waves (or modes). The TE wave is characterized by the electric field that is perpendicular to the incidence plane, whereas the TM wave is characterized by the magnetic field that is perpendicular to the incidence plane. Here the incidence plane is a plane that is formed by the propagation direction of the incident light and the normal to the interface of the two dielectrics. Since E, H, and n form a right-handed system, in the TE wave H lies on the incidence plane. 
For the same reason, in the TM wave E lies on the incidence plane. In a general case where a field is polarized in an arbitrary direction, that field can be formed by superimposing two fields corresponding to the TE and TM waves. In other words, if we take an arbitrary field E, it can be decomposed into a component having a unit polarization vector directed perpendicular to the incidence plane and another component having the polarization vector that lies on the incidence plane. These two components are orthogonal to each other.

Example 8.1: TE Wave In Fig. 8.5 we depict the geometry of oblique incidence of a TE wave. The xy-plane defines the interface of the two dielectrics and t of (8.9) lies on that plane. The zx-plane defines the incidence plane. In this case, E is polarized along the y-axis with H polarized in the zx-plane. That is, regarding E, we choose the polarization directions $\boldsymbol{\varepsilon}_i$, $\boldsymbol{\varepsilon}_r$, and $\boldsymbol{\varepsilon}_t$ of the electric field as $\mathbf{e}_2$ (a unit vector toward the positive direction of the y-axis that is perpendicular to the plane of paper). In Fig. 8.5, the polarization direction of the electric field is denoted by a symbol ⨂. Therefore, we have

$$\mathbf{e}_2 \cdot \boldsymbol{\varepsilon}_i = \mathbf{e}_2 \cdot \boldsymbol{\varepsilon}_r = \mathbf{e}_2 \cdot \boldsymbol{\varepsilon}_t = 1. \tag{8.40}$$

Fig. 8.5 Geometry of the electromagnetic fields near the interface between dielectric media D1 and D2 in the case of oblique incidence of a TE wave. The electric field E is polarized along the y-axis (i.e., perpendicular to the plane of paper) with H polarized in the zx-plane. Polarization directions $\boldsymbol{\varepsilon}_i$, $\boldsymbol{\varepsilon}_r$, and $\boldsymbol{\varepsilon}_t$ are given for H. To avoid complication, neither $H_i$ nor $H_t$ is shown

For H we define the directions of the unit polarization vectors $\boldsymbol{\varepsilon}_i$, $\boldsymbol{\varepsilon}_r$, and $\boldsymbol{\varepsilon}_t$ so that their direction cosines relative to the x-axis are positive; see Fig. 8.5. Choosing $\mathbf{e}_1$ (a unit vector in the direction of the x-axis) for t of (8.9) with regard to H, we have

$$\mathbf{e}_1 \cdot \boldsymbol{\varepsilon}_i = \cos\theta, \quad \mathbf{e}_1 \cdot \boldsymbol{\varepsilon}_r = \cos\theta, \quad \text{and} \quad \mathbf{e}_1 \cdot \boldsymbol{\varepsilon}_t = \cos\phi. \tag{8.41}$$

According as $H_r$ is directed in the same direction as $\boldsymbol{\varepsilon}_r$ or in the opposite direction, the amplitude is defined as positive or negative, as in the case of the vertical incidence. In Fig. 8.5, we depict the case where the amplitude $H_r$ is negative. That is, the phase of the magnetic field is reversed upon reflection and, hence, $H_r$ is in the opposite direction to $\boldsymbol{\varepsilon}_r$ in Fig. 8.5. Note that $H_i$ and $H_t$ are in the same direction as $\boldsymbol{\varepsilon}_i$ and $\boldsymbol{\varepsilon}_t$, respectively. To avoid complication, neither $H_i$ nor $H_t$ is shown in Fig. 8.5. Applying (8.23) to both E and H, we have

$$E_i + E_r = E_t, \tag{8.42}$$

$$H_i \cos\theta + H_r \cos\theta = H_t \cos\phi. \tag{8.43}$$

To derive the above equations, we choose $\mathbf{t} = \mathbf{e}_2$ with E and $\mathbf{t} = \mathbf{e}_1$ with H for (8.23). Because of the abovementioned converse relationship between E and H, we have

$$E_r H_r < 0. \tag{8.44}$$

Suppose that we carry out an experiment to determine the six amplitudes in (8.42) and (8.43). Out of those quantities, we can freely choose and fix $E_i$. Then, we have five unknown amplitudes, i.e., $E_r$, $E_t$, $H_i$, $H_r$, and $H_t$. Thus, we need three more relations to determine them. Here information about the characteristic impedance Z is useful. It was defined as (7.64). From (8.6) to (8.8) as well as (8.44), we get

$$Z_1 = \sqrt{\mu_1/\varepsilon_1} = E_i/H_i = -E_r/H_r, \tag{8.45}$$

$$Z_2 = \sqrt{\mu_2/\varepsilon_2} = E_t/H_t, \tag{8.46}$$

where ε1 and μ1 are permittivity and permeability of D1, respectively; ε2 and μ2 are permittivity and permeability of D2, respectively. As an example, we have

$$\mathbf{H}_i = \mathbf{n} \times \mathbf{E}_i / Z_1 = \mathbf{n} \times E_i \boldsymbol{\varepsilon}_{i,e}\, e^{i(\mathbf{k}_i \cdot \mathbf{x} - \omega t)} / Z_1 = E_i \boldsymbol{\varepsilon}_{i,m}\, e^{i(\mathbf{k}_i \cdot \mathbf{x} - \omega t)} / Z_1 = H_i \boldsymbol{\varepsilon}_{i,m}\, e^{i(\mathbf{k}_i \cdot \mathbf{x} - \omega t)}, \tag{8.47}$$

where we distinguish the polarization vectors of the electric and magnetic fields by the subscripts e and m. Note that in the above discussion, however, we did not distinguish these vectors to avoid complication. Comparing coefficients of the last relation of (8.47), we get

$$E_i / Z_1 = H_i. \tag{8.48}$$

On the basis of (8.42) to (8.46), we are able to decide $E_r$, $E_t$, $H_i$, $H_r$, and $H_t$. What we wish to determine, however, is a ratio among those quantities. To this end, dividing (8.42) and (8.43) by $E_i$ (> 0), we define the following quantities:

$$R_E^{\perp} \equiv E_r/E_i \quad \text{and} \quad T_E^{\perp} \equiv E_t/E_i, \tag{8.49}$$

where $R_E^{\perp}$ and $T_E^{\perp}$ are said to be a reflection coefficient and transmission coefficient with the electric field, respectively; the symbol ⊥ means a quantity of the TE wave (i.e., electric field oscillating vertically with respect to the incidence plane). Thus, rewriting (8.42) and (8.43) and using $R_E^{\perp}$ and $T_E^{\perp}$, we have

$$\left.\begin{aligned} R_E^{\perp} - T_E^{\perp} &= -1, \\ R_E^{\perp}\frac{\cos\theta}{Z_1} + T_E^{\perp}\frac{\cos\phi}{Z_2} &= \frac{\cos\theta}{Z_1}. \end{aligned}\right\} \tag{8.50}$$

Using Cramer's rule of matrix algebra, we have a solution such that

$$R_E^{\perp} = \frac{\begin{vmatrix} -1 & -1 \\ \dfrac{\cos\theta}{Z_1} & \dfrac{\cos\phi}{Z_2} \end{vmatrix}}{\begin{vmatrix} 1 & -1 \\ \dfrac{\cos\theta}{Z_1} & \dfrac{\cos\phi}{Z_2} \end{vmatrix}} = \frac{Z_2\cos\theta - Z_1\cos\phi}{Z_2\cos\theta + Z_1\cos\phi}, \tag{8.51}$$

$$T_E^{\perp} = \frac{\begin{vmatrix} 1 & -1 \\ \dfrac{\cos\theta}{Z_1} & \dfrac{\cos\theta}{Z_1} \end{vmatrix}}{\begin{vmatrix} 1 & -1 \\ \dfrac{\cos\theta}{Z_1} & \dfrac{\cos\phi}{Z_2} \end{vmatrix}} = \frac{2Z_2\cos\theta}{Z_2\cos\theta + Z_1\cos\phi}. \tag{8.52}$$

Similarly, defining

$$R_H^{\perp} \equiv H_r/H_i \quad \text{and} \quad T_H^{\perp} \equiv H_t/H_i, \tag{8.53}$$

where $R_H^{\perp}$ and $T_H^{\perp}$ are said to be a reflection coefficient and transmission coefficient with the magnetic field, respectively, we get

$$R_H^{\perp} = \frac{Z_1\cos\phi - Z_2\cos\theta}{Z_2\cos\theta + Z_1\cos\phi}, \tag{8.54}$$

$$T_H^{\perp} = \frac{2Z_1\cos\theta}{Z_2\cos\theta + Z_1\cos\phi}. \tag{8.55}$$

In this case, rewrite (8.42) as a relation among $H_i$, $H_r$, and $H_t$ using (8.45) and (8.46). Derivation of (8.54) and (8.55) is left for readers. Notice also that

$$R_H^{\perp} = -R_E^{\perp}. \tag{8.56}$$

This relation can easily be derived from (8.45).

Example 8.2: TM Wave In a manner similar to that described above, we obtain information about the TM wave. Switching the roles of E and H, we assume that H is polarized along the y-axis with E polarized in the zx-plane. Following the aforementioned procedures, we have

$$E_i \cos\theta + E_r \cos\theta = E_t \cos\phi, \tag{8.57}$$

$$H_i + H_r = H_t. \tag{8.58}$$

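From (8.57) and (8.58), similarly we get

$$R_E^{\parallel} = \frac{Z_2\cos\phi - Z_1\cos\theta}{Z_1\cos\theta + Z_2\cos\phi}, \tag{8.59}$$

$$T_E^{\parallel} = \frac{2Z_2\cos\theta}{Z_1\cos\theta + Z_2\cos\phi}. \tag{8.60}$$

Also, we get

$$R_H^{\parallel} = \frac{Z_1\cos\theta - Z_2\cos\phi}{Z_1\cos\theta + Z_2\cos\phi} = -R_E^{\parallel}, \tag{8.61}$$

$$T_H^{\parallel} = \frac{2Z_1\cos\theta}{Z_1\cos\theta + Z_2\cos\phi}. \tag{8.62}$$

Table 8.1 Reflection and transmission coefficients of TE and TM waves

| ⊥ Incidence (TE) | ∥ Incidence (TM) |
| --- | --- |
| $R_E^{\perp} = \dfrac{Z_2\cos\theta - Z_1\cos\phi}{Z_2\cos\theta + Z_1\cos\phi}$ | $R_E^{\parallel} = \dfrac{Z_2\cos\phi - Z_1\cos\theta}{Z_1\cos\theta + Z_2\cos\phi}$ |
| $R_H^{\perp} = \dfrac{Z_1\cos\phi - Z_2\cos\theta}{Z_2\cos\theta + Z_1\cos\phi} = -R_E^{\perp}$ | $R_H^{\parallel} = \dfrac{Z_1\cos\theta - Z_2\cos\phi}{Z_1\cos\theta + Z_2\cos\phi} = -R_E^{\parallel}$ |
| $T_E^{\perp} = \dfrac{2Z_2\cos\theta}{Z_2\cos\theta + Z_1\cos\phi}$ | $T_E^{\parallel} = \dfrac{2Z_2\cos\theta}{Z_1\cos\theta + Z_2\cos\phi}$ |
| $T_H^{\perp} = \dfrac{2Z_1\cos\theta}{Z_2\cos\theta + Z_1\cos\phi}$ | $T_H^{\parallel} = \dfrac{2Z_1\cos\theta}{Z_1\cos\theta + Z_2\cos\phi}$ |
| $-R_E^{\perp}R_H^{\perp} + T_E^{\perp}T_H^{\perp}\dfrac{\cos\phi}{\cos\theta} = 1$ $(R^{\perp} + T^{\perp} = 1)$ | $-R_E^{\parallel}R_H^{\parallel} + T_E^{\parallel}T_H^{\parallel}\dfrac{\cos\phi}{\cos\theta} = 1$ $(R^{\parallel} + T^{\parallel} = 1)$ |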

In Table 8.1, we list the important coefficients in relation to the reflection and transmission of electromagnetic waves along with their relationship. In Examples 8.1 and 8.2, we have examined how the reflection and transmission coefficients vary as a function of characteristic impedance as well as incidence and refraction angles. Meanwhile, in a nonmagnetic substance a refractive index n can be approximated as

$$n \approx \sqrt{\varepsilon_r}, \tag{7.57}$$

assuming that $\mu_r \approx 1$. In this case, we have

$$Z = \sqrt{\mu/\varepsilon} = \sqrt{\mu_r\mu_0/\varepsilon_r\varepsilon_0} \approx Z_0/\sqrt{\varepsilon_r} \approx Z_0/n.$$

Using this relation, we can readily rewrite the reflection and transmission coefficients as a function of refractive indices of the dielectrics. The derivation is left for the readers.
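The rewriting mentioned above can be checked numerically. The sketch below evaluates the coefficients of Examples 8.1 and 8.2 from the impedances and compares the reflection coefficients with the forms obtained by substituting Z ≈ Z0/n. The function names `fresnel_te` and `fresnel_tm`, the sample indices, and the normalization Z0 = 1 (legitimate because only ratios of impedances appear) are assumptions of this illustration.

```python
import math

def fresnel_te(z1, z2, theta, phi):
    """Return (R_E, T_E, R_H, T_H) for a TE wave; cf. (8.51), (8.52), (8.54), (8.55)."""
    denom = z2 * math.cos(theta) + z1 * math.cos(phi)
    r_e = (z2 * math.cos(theta) - z1 * math.cos(phi)) / denom
    t_e = 2.0 * z2 * math.cos(theta) / denom
    t_h = 2.0 * z1 * math.cos(theta) / denom
    return r_e, t_e, -r_e, t_h

def fresnel_tm(z1, z2, theta, phi):
    """Return (R_E, T_E, R_H, T_H) for a TM wave; cf. (8.59) to (8.62)."""
    denom = z1 * math.cos(theta) + z2 * math.cos(phi)
    r_e = (z2 * math.cos(phi) - z1 * math.cos(theta)) / denom
    t_e = 2.0 * z2 * math.cos(theta) / denom
    t_h = 2.0 * z1 * math.cos(theta) / denom
    return r_e, t_e, -r_e, t_h

# Nonmagnetic media: Z = Z0 / n. Z0 cancels in every ratio, so Z0 = 1 here.
n1, n2 = 1.0, 1.5                      # hypothetical sample indices
z1, z2 = 1.0 / n1, 1.0 / n2
theta = math.radians(45.0)
phi = math.asin(n1 * math.sin(theta) / n2)   # Snell's law (8.39)

te = fresnel_te(z1, z2, theta, phi)
tm = fresnel_tm(z1, z2, theta, phi)

# The same reflection coefficients written with n = n2/n1 only.
n = n2 / n1
r_te_index = (math.cos(theta) - n * math.cos(phi)) / (math.cos(theta) + n * math.cos(phi))
r_tm_index = (math.cos(phi) - n * math.cos(theta)) / (math.cos(phi) + n * math.cos(theta))
print(te[0], r_te_index)
print(tm[0], r_tm_index)
```

At vertical incidence (θ = ϕ = 0) both functions reduce to the same reflection coefficient (1 − n)/(1 + n), as expected when TE and TM waves are physically indistinguishable.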

8.4 Energy Transport by Electromagnetic Waves

Returning to (7.58) and (7.59), let us consider energy transport in a dielectric medium by electromagnetic waves. Let us describe the electric (E) and magnetic (H) fields of the electromagnetic waves in a uniform and infinite dielectric medium such that

$$\mathbf{E} = E\boldsymbol{\varepsilon}_e\, e^{i(k\mathbf{n}\cdot\mathbf{x} - \omega t)}, \tag{8.63}$$

$$\mathbf{H} = H\boldsymbol{\varepsilon}_m\, e^{i(k\mathbf{n}\cdot\mathbf{x} - \omega t)}, \tag{8.64}$$

where $\boldsymbol{\varepsilon}_e$ and $\boldsymbol{\varepsilon}_m$ are unit polarization vectors; we assume that both E and H are positive. Notice again that $\boldsymbol{\varepsilon}_e$, $\boldsymbol{\varepsilon}_m$, and n constitute a right-handed system in this order. The energy transport is characterized by a Poynting vector S that is described by

$$\mathbf{S} = \mathbf{E} \times \mathbf{H}. \tag{8.65}$$

Since E and H have dimensions [V/m] and [A/m], respectively, S has a dimension [W/m²]. Hence, S represents an energy flow per unit time and per unit area with respect to the propagation direction. For simplicity, let us assume that the electromagnetic wave is propagating toward the z-direction. Then we have

$$\mathbf{E} = E\boldsymbol{\varepsilon}_e\, e^{i(kz - \omega t)}, \tag{8.66}$$

$$\mathbf{H} = H\boldsymbol{\varepsilon}_m\, e^{i(kz - \omega t)}. \tag{8.67}$$

To seek a time-averaged energy flow toward the z-direction, it suffices to multiply the real parts of (8.66) and (8.67) and integrate the product over a period T at a point of z = 0. Thus, a time-averaged Poynting vector $\bar{\mathbf{S}}$ is given by

$$\bar{\mathbf{S}} = \mathbf{e}_3 \frac{EH}{T} \int_0^T \cos^2 \omega t \, dt, \tag{8.68}$$

where T = 1/ν = 2π/ω. Using a trigonometric formula

$$\cos^2 \omega t = \frac{1}{2}(1 + \cos 2\omega t), \tag{8.69}$$

the integration can easily be performed. Thus, we get

$$\bar{\mathbf{S}} = \frac{1}{2} EH\,\mathbf{e}_3. \tag{8.70}$$

Equivalently, we have

$$\bar{\mathbf{S}} = \frac{1}{2}\,\mathbf{E} \times \mathbf{H}^{*}. \tag{8.71}$$

Meanwhile, an energy density W is given by

$$W = \frac{1}{2}(\mathbf{E}\cdot\mathbf{D} + \mathbf{H}\cdot\mathbf{B}), \tag{8.72}$$

where the first and second terms are pertinent to the electric and magnetic fields, respectively. Note in (8.72) that the dimension of E·D is [V/m · C/m²] = [J/m³] and that the dimension of H·B is [A/m · Vs/m²] = [Ws/m³] = [J/m³]. Using (7.7) and (7.10), we have

$$W = \frac{1}{2}\left(\varepsilon \mathbf{E}^2 + \mu \mathbf{H}^2\right). \tag{8.73}$$

As in the above case, estimating a time-averaged energy density $\bar{W}$, we get

$$\bar{W} = \frac{1}{2}\left(\frac{1}{2}\varepsilon E^2 + \frac{1}{2}\mu H^2\right) = \frac{1}{4}\varepsilon E^2 + \frac{1}{4}\mu H^2. \tag{8.74}$$

We also get this relation by integrating (8.73) over a wavelength λ at a time of t = 0. Using (7.60) and (7.61), we have

$$\varepsilon E^2 = \mu H^2. \tag{8.75}$$

This implies that the energy density resulting from the electric field and that due to the magnetic field have the same value. Thus, rewriting (8.74) we have

$$\bar{W} = \frac{1}{2}\varepsilon E^2 = \frac{1}{2}\mu H^2. \tag{8.76}$$

Moreover, using (7.43), we have for an impedance

$$Z = E/H = \sqrt{\mu/\varepsilon} = \mu v \quad \text{or} \quad E = \mu v H. \tag{8.77}$$

Using this relation along with (8.75), we get

$$\bar{\mathbf{S}} = \frac{1}{2} v \varepsilon E^2\, \mathbf{e}_3 = \frac{1}{2} v \mu H^2\, \mathbf{e}_3. \tag{8.78}$$
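The time averages above can be confirmed numerically. The following sketch averages the real fields at z = 0 over one period and compares the results with (8.70), (8.76), and (8.78). The helper `time_average`, the midpoint-rule integration, and all sample values (relative permittivity 2.25, unit field amplitude, frequency 5 × 10¹⁴ Hz) are assumptions introduced only for this check.

```python
import math

def time_average(f, period, steps=10000):
    """Average f(t) over one period using the midpoint rule."""
    dt = period / steps
    return sum(f((j + 0.5) * dt) for j in range(steps)) * dt / period

# Hypothetical nonmagnetic dielectric (assumed eps_r = 2.25, mu_r = 1).
eps = 2.25 * 8.8541878128e-12      # permittivity [F/m]
mu = 4.0e-7 * math.pi              # permeability [H/m] (approximate mu_0)
v = 1.0 / math.sqrt(eps * mu)      # phase velocity
Z = math.sqrt(mu / eps)            # characteristic impedance, cf. (8.77)

E = 1.0                            # assumed electric amplitude [V/m]
H = E / Z                          # magnetic amplitude from Z = E/H
omega = 2.0 * math.pi * 5.0e14     # assumed angular frequency
T = 2.0 * math.pi / omega

# Real parts of (8.66) and (8.67) at z = 0 are E cos(wt) and H cos(wt).
S_avg = time_average(lambda t: E * H * math.cos(omega * t) ** 2, T)
W_avg = time_average(
    lambda t: 0.5 * (eps * E ** 2 + mu * H ** 2) * math.cos(omega * t) ** 2, T
)

print(S_avg, 0.5 * E * H)          # compare with (8.70)
print(W_avg, 0.5 * eps * E ** 2)   # compare with (8.76)
print(S_avg, v * W_avg)            # compare with (8.78)
```

The last comparison expresses that the averaged energy flow equals the averaged energy density multiplied by the phase velocity, which is the content of (8.78) read together with (8.76).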

Thus, we have various relations among amplitudes of electromagnetic waves and related physical quantities together with the constants of dielectrics. Returning to Examples 8.1 and 8.2, let us further investigate the reflection and transmission properties of the electromagnetic waves. From (8.51) to (8.55) as well as (8.59) to (8.62), we get in both the cases of TE and TM waves

$$-R_E^{\perp} R_H^{\perp} + T_E^{\perp} T_H^{\perp} \frac{\cos\phi}{\cos\theta} = 1, \tag{8.79}$$

$$-R_E^{\parallel} R_H^{\parallel} + T_E^{\parallel} T_H^{\parallel} \frac{\cos\phi}{\cos\theta} = 1. \tag{8.80}$$

In both the TE and TM cases, we define reflectance R and transmittance T such that

$$R \equiv -R_E R_H = R_E^2 = 2|\bar{\mathbf{S}}_r| / 2|\bar{\mathbf{S}}_i| = |\bar{\mathbf{S}}_r| / |\bar{\mathbf{S}}_i|, \tag{8.81}$$

where $\bar{\mathbf{S}}_r$ and $\bar{\mathbf{S}}_i$ are time-averaged Poynting vectors of the reflected and incident waves, respectively. Also, we have

$$T \equiv T_E T_H \frac{\cos\phi}{\cos\theta} = \frac{2|\bar{\mathbf{S}}_t| \cos\phi}{2|\bar{\mathbf{S}}_i| \cos\theta} = \frac{|\bar{\mathbf{S}}_t| \cos\phi}{|\bar{\mathbf{S}}_i| \cos\theta}, \tag{8.82}$$

where $\bar{\mathbf{S}}_t$ is a time-averaged Poynting vector of the transmitted wave. Thus, we have

$$R + T = 1. \tag{8.83}$$

The relation (8.83) represents the energy conservation. The factor cos ϕ/cos θ can be understood from Fig. 8.6, which depicts a luminous flux near the interface. Suppose that we have an incident wave with an irradiance I [W/m²] whose incidence plane is the zx-plane. Notice that I has the same dimension as a Poynting vector. Here let us think of the luminous flux that is getting through a unit area (i.e., a unit length square) perpendicular to the propagation direction of the light. Then, this flux illuminates an area on the interface of a unit length (in the y-direction) multiplied by a length of cos ϕ/cos θ (in the x-direction). That is, the luminous flux has been widened (or squeezed) by cos ϕ/cos θ times after getting through the interface. The irradiance has been weakened (or strengthened) accordingly (see Fig. 8.6). Thus, to take a balance of income and outgo with respect to the luminous flux before and after getting through the interface, the transmission irradiance must be multiplied by a factor cos ϕ/cos θ.
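The energy balance (8.83) can be verified numerically for both polarizations. The sketch below is an illustration under stated assumptions: the function name `reflectance_transmittance`, the sample indices, and the choice Z = 1/n (nonmagnetic media with Z0 normalized to 1) are not taken from the text, and incidence from the medium of lower index is assumed so that a real refraction angle always exists.

```python
import math

def reflectance_transmittance(n1, n2, theta, mode):
    """Return (R, T) as defined in (8.81) and (8.82) for mode 'TE' or 'TM'."""
    z1, z2 = 1.0 / n1, 1.0 / n2                  # Z = Z0/n with Z0 = 1 (assumed)
    phi = math.asin(n1 * math.sin(theta) / n2)   # Snell's law (8.39)
    c, p = math.cos(theta), math.cos(phi)
    if mode == "TE":
        denom = z2 * c + z1 * p
        r_e = (z2 * c - z1 * p) / denom          # (8.51)
    elif mode == "TM":
        denom = z1 * c + z2 * p
        r_e = (z2 * p - z1 * c) / denom          # (8.59)
    else:
        raise ValueError("mode must be 'TE' or 'TM'")
    t_e = 2.0 * z2 * c / denom                   # (8.52) or (8.60)
    t_h = 2.0 * z1 * c / denom                   # (8.55) or (8.62)
    R = r_e ** 2                                 # -R_E R_H with R_H = -R_E
    T = t_e * t_h * p / c                        # includes the factor cos(phi)/cos(theta)
    return R, T

for mode in ("TE", "TM"):
    for deg in (0.0, 30.0, 60.0, 85.0):
        R, T = reflectance_transmittance(1.0, 1.5, math.radians(deg), mode)
        print(mode, deg, R, T, R + T)
```

Dropping the factor cos ϕ/cos θ from T makes the sum deviate from 1 at oblique incidence, which is one way to see that the factor is required by the flux argument above.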

Fig. 8.6 Luminous flux near the interface

8.5 Brewster Angles and Critical Angles

In this section and subsequent sections, we deal with nonmagnetic substances as dielectrics; namely, we assume $\mu_r \approx 1$. In that case, as mentioned in Sect. 8.3, we rewrite, e.g., (8.51) and (8.59) as

$$R_E^{\perp} = \frac{\cos\theta - n\cos\phi}{\cos\theta + n\cos\phi}, \tag{8.84}$$

$$R_E^{\parallel} = \frac{\cos\phi - n\cos\theta}{\cos\phi + n\cos\theta}, \tag{8.85}$$

where n (= n2/n1) is a relative refractive index of D2 relative to D1. Let us think of a condition on which $R_E^{\perp} = 0$ or $R_E^{\parallel} = 0$. First, we consider (8.84). We have

$$[\text{numerator of (8.84)}] = \cos\theta - n\cos\phi = \cos\theta - \frac{\sin\theta}{\sin\phi}\cos\phi = \frac{\sin\phi\cos\theta - \sin\theta\cos\phi}{\sin\phi} = \frac{\sin(\phi - \theta)}{\sin\phi}, \tag{8.86}$$

where with the second equality we used Snell's law; the last equality is due to a trigonometric formula. Since we assume 0 < θ < π/2 and 0 < ϕ < π/2, we have −π/2 < ϕ − θ < π/2. Therefore, sin(ϕ − θ) = 0 if and only if ϕ − θ = 0. Namely, only when ϕ = θ could $R_E^{\perp}$ vanish. For different dielectrics having different refractive indices, we have ϕ = θ only if ϕ = θ = 0 (i.e., a vertical incidence). But, in that case we have

$$\lim_{\phi \to 0,\ \theta \to 0} \frac{\sin(\phi - \theta)}{\sin\phi} = \frac{0}{0}.$$

This is a limit of indeterminate form. From (8.84), however, we have

$$R_E^{\perp} = \frac{1 - n}{1 + n}, \tag{8.87}$$

for ϕ = θ = 0. This implies that $R_E^{\perp}$ does not vanish at ϕ = θ = 0. Thus, $R_E^{\perp}$ never vanishes for any θ or ϕ. Note that for this condition, naturally we have

$$R_E^{\parallel} = \frac{1 - n}{1 + n}.$$

This is because with ϕ = θ = 0 we have no physical difference between TE and TM waves. In turn, let us examine (8.85) similarly for the case of the TM wave. We have

$$[\text{numerator of (8.85)}] = \cos\phi - n\cos\theta = \cos\phi - \frac{\sin\theta}{\sin\phi}\cos\theta = \frac{\sin\phi\cos\phi - \sin\theta\cos\theta}{\sin\phi} = \frac{\sin(\phi - \theta)\cos(\phi + \theta)}{\sin\phi}. \tag{8.88}$$

With the last equality of (8.88), we used a trigonometric formula. From (8.86) we know that sin(ϕ − θ)/sin ϕ does not vanish. Therefore, for $R_E^{\parallel}$ to vanish, we need cos(ϕ + θ) = 0. Since 0 < ϕ + θ < π, cos(ϕ + θ) = 0 if and only if

$$\phi + \theta = \pi/2. \tag{8.89}$$

In other words, for particular angles θ = θB and ϕ = ϕB that satisfy

$$\phi_B + \theta_B = \pi/2, \tag{8.90}$$

we have $R_E^{\parallel} = 0$; i.e., we do not observe a reflected wave. The particular angle θB is said to be the Brewster angle. For θB we have

$$\sin\phi_B = \sin\left(\frac{\pi}{2} - \theta_B\right) = \cos\theta_B,$$

$$n = \sin\theta_B / \sin\phi_B = \sin\theta_B / \cos\theta_B = \tan\theta_B \quad \text{or} \quad \theta_B = \tan^{-1} n, \quad \phi_B = \tan^{-1} n^{-1}. \tag{8.91}$$
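The Brewster condition can be illustrated with a short numerical check of (8.84), (8.85), and (8.91). In the sketch below, the function names `r_te` and `r_tm` and the sample value n = 1.5 are assumptions for illustration.

```python
import math

def r_te(n, theta):
    """Reflection coefficient of the TE wave, (8.84)."""
    phi = math.asin(math.sin(theta) / n)         # Snell's law (8.39)
    return (math.cos(theta) - n * math.cos(phi)) / (math.cos(theta) + n * math.cos(phi))

def r_tm(n, theta):
    """Reflection coefficient of the TM wave, (8.85)."""
    phi = math.asin(math.sin(theta) / n)
    return (math.cos(phi) - n * math.cos(theta)) / (math.cos(phi) + n * math.cos(theta))

n = 1.5                                          # hypothetical relative index n2/n1
theta_b = math.atan(n)                           # Brewster angle, (8.91)
phi_b = math.atan(1.0 / n)                       # refraction angle at theta_b, (8.91)

print(math.degrees(theta_b), math.degrees(phi_b))
print(theta_b + phi_b, math.pi / 2)              # condition (8.90)
print(r_tm(n, theta_b))                          # vanishes up to rounding
print(r_te(n, theta_b))                          # does not vanish
```

The TE coefficient stays finite at θB, in line with the conclusion that (8.84) never vanishes.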

Suppose that we have a parallel plate consisting of a dielectric D2 of a refractive index n2 sandwiched with another dielectric D1 of a refractive index n1 (Fig. 8.7). Let θB be the Brewster angle when the TM wave is incident from D1 to D2. In the above discussion, we defined a relative refractive index n of D2 relative to D1 as n = n2/n1; recall (8.39). The other way around, suppose that the TM wave is incident from D2 to D1. Then, the relative refractive index of D1 relative to D2 is $n_1/n_2 = n^{-1}$. In this situation, another Brewster angle (from D2 to D1) defined as $\tilde{\theta}_B$ is given by

$$\tilde{\theta}_B = \tan^{-1} n^{-1}. \tag{8.92}$$

This number is, however, identical to ϕB in (8.91). Thus, we have

$$\tilde{\theta}_B = \phi_B. \tag{8.93}$$

Fig. 8.7 Diagram that explains the Brewster angle. Suppose that a parallel plate consisting of a dielectric D2 of a refractive index n2 is sandwiched with another dielectric D1 of a refractive index n1. The incidence angle θB represents the Brewster angle observed when the TM wave is incident from D1 to D2. ϕB is another Brewster angle that is observed when the TM wave is getting back from D2 to D1

Thus, regarding the TM wave that is propagating in D2 after getting through the interface and is to get back to D1, $\tilde{\theta}_B = \phi_B$ is again the Brewster angle. In this way, the said TM wave propagates from D1 to D2 and then gets back from D2 to D1 without being reflected by the two interfaces. This conspicuous feature is often utilized for an optical device.

If an electromagnetic wave is incident from a dielectric of a higher refractive index to that of a lower index, the total reflection can take place. This is equally the case with both TE and TM waves. For the total reflection to take place, θ should be larger than a critical angle θc that is defined by

$$\theta_c = \sin^{-1} n. \tag{8.94}$$

This is because at θc from Snell's law we have

$$\frac{\sin\theta_c}{\sin\frac{\pi}{2}} = \sin\theta_c = \frac{n_2}{n_1}\ (\equiv n). \tag{8.95}$$

From (8.95), we have

$$\tan\theta_c = n/\sqrt{1 - n^2} > n = \tan\theta_B.$$

In the case of the TM wave, therefore, we find that

$$\theta_c > \theta_B. \tag{8.96}$$

The critical angle is always larger than the Brewster angle with TM waves.
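A brief numerical comparison of (8.94) and (8.91) illustrates the inequality (8.96). The function names `critical_angle` and `brewster_angle` and the sample values of n are assumptions of this sketch; n < 1 is required because the light is taken to be incident from the medium of higher refractive index.

```python
import math

def critical_angle(n):
    """Critical angle theta_c = arcsin(n) of (8.94); requires 0 < n < 1."""
    if not 0.0 < n < 1.0:
        raise ValueError("total reflection requires 0 < n < 1")
    return math.asin(n)

def brewster_angle(n):
    """Brewster angle theta_B = arctan(n) of (8.91)."""
    return math.atan(n)

for n in (0.3, 1.0 / 1.5, 0.9):                  # hypothetical relative indices
    theta_c = critical_angle(n)
    theta_b = brewster_angle(n)
    print(n, math.degrees(theta_b), math.degrees(theta_c), theta_c > theta_b)
```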

8.6 Total Reflection

In Sect. 8.2 we saw that Snell's law results from the kinematical requirement. For this reason, we may consider it as a universal relation that can be extended to complex refraction angles. In fact, for Snell's law to hold with θ > θc, we must have

$$\sin\phi > 1. \tag{8.97}$$

This requires us to extend ϕ to a complex domain. Putting

$$\phi = \frac{\pi}{2} + ia \quad (a: \text{real},\ a \neq 0), \tag{8.98}$$

we have

$$\sin\phi \equiv \frac{1}{2i}\left(e^{i\phi} - e^{-i\phi}\right) = \frac{1}{2}\left(e^{a} + e^{-a}\right) > 1, \tag{8.99}$$

$$\cos\phi \equiv \frac{1}{2}\left(e^{i\phi} + e^{-i\phi}\right) = \frac{i}{2}\left(e^{-a} - e^{a}\right). \tag{8.100}$$

Thus, cos ϕ is pure imaginary. Now, let us consider a transmitted wave whose electric field is described as

$$\mathbf{E}_t = E\boldsymbol{\varepsilon}_t\, e^{i(\mathbf{k}_t\cdot\mathbf{x} - \omega t)}, \tag{8.101}$$

where $\boldsymbol{\varepsilon}_t$ is the unit polarization vector and $\mathbf{k}_t$ is a wavenumber vector of the transmitted wave. Suppose that the incidence plane is the zx-plane. Then, we have

$$\mathbf{k}_t\cdot\mathbf{x} = k_{tx}x + k_{tz}z = xk_t\sin\phi + zk_t\cos\phi, \tag{8.102}$$

where $k_{tx}$ and $k_{tz}$ are the x and z components of $\mathbf{k}_t$, respectively; $k_t = |\mathbf{k}_t|$. Putting

$$\cos\phi = ib \quad (b: \text{real},\ b \neq 0), \tag{8.103}$$

we have

$$\mathbf{E}_t = E\boldsymbol{\varepsilon}_t\, e^{i(xk_t\sin\phi + ibzk_t - \omega t)} = E\boldsymbol{\varepsilon}_t\, e^{i(xk_t\sin\phi - \omega t)}\, e^{-bzk_t}. \tag{8.104}$$

With the total reflection, we must have

$$z \longrightarrow \infty \Longrightarrow e^{-bzk_t} \longrightarrow 0. \tag{8.105}$$

To meet this requirement, we have

$$b > 0. \tag{8.106}$$

Meanwhile, we have

$$\cos^2\phi = 1 - \sin^2\phi = 1 - \frac{\sin^2\theta}{n^2} = \frac{n^2 - \sin^2\theta}{n^2}, \tag{8.107}$$

where notice that n < 1, because we are dealing with the incidence of light from a medium with a higher refractive index to a medium with a lower index. When we consider the total reflection, the numerator of (8.107) is negative, and so we have two choices such that

$$\cos\phi = \pm i\,\frac{\sqrt{\sin^2\theta - n^2}}{n}. \tag{8.108}$$

From (8.103) and (8.106), we get

$$\cos\phi = i\,\frac{\sqrt{\sin^2\theta - n^2}}{n}. \tag{8.109}$$
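To give a feeling for the decay factor of (8.104), the following sketch evaluates b from (8.109) and the distance 1/(b k_t) over which the transmitted field falls to 1/e of its interface value. The function names, the sample indices, the incidence angle, and the vacuum wavelength of 633 nm are all hypothetical values chosen for illustration; k_t is taken from (8.36) and (8.38).

```python
import math

def evanescent_b(n, theta):
    """b of (8.103), evaluated from (8.109); requires sin(theta) > n."""
    s = math.sin(theta)
    if s <= n:
        raise ValueError("no total reflection: theta does not exceed the critical angle")
    return math.sqrt(s * s - n * n) / n

def decay_length(n1, n2, theta, lambda0):
    """Distance 1/(b*k_t) at which exp(-b*z*k_t) in (8.104) drops to 1/e."""
    n = n2 / n1                                  # relative index, n < 1 here
    k_t = 2.0 * math.pi * n2 / lambda0           # k_t = 2*pi/lambda_2, lambda_2 = lambda0/n2
    return 1.0 / (evanescent_b(n, theta) * k_t)

# Hypothetical example: incidence at 60 degrees from n1 = 1.5 toward n2 = 1.0.
depth = decay_length(1.5, 1.0, math.radians(60.0), 633e-9)
print(depth)
```

Since n k_t = 2π/λ1 by (8.39), the same distance can be written as λ1/(2π√(sin²θ − n²)), so it diverges as θ approaches the critical angle from above.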

Hence, inserting (8.109) into (8.84), we have for the TE wave

$$R_E^{\perp} = \frac{\cos\theta - i\sqrt{\sin^2\theta - n^2}}{\cos\theta + i\sqrt{\sin^2\theta - n^2}}. \tag{8.110}$$

Then, we have

$$R^{\perp} = -R_E^{\perp}\left(R_H^{\perp}\right)^{*} = R_E^{\perp}\left(R_E^{\perp}\right)^{*} = \frac{\cos\theta - i\sqrt{\sin^2\theta - n^2}}{\cos\theta + i\sqrt{\sin^2\theta - n^2}} \cdot \frac{\cos\theta + i\sqrt{\sin^2\theta - n^2}}{\cos\theta - i\sqrt{\sin^2\theta - n^2}} = 1. \tag{8.111}$$

As for the TM wave, substituting (8.109) into (8.85), we have

$$R_E^{\parallel} = \frac{-n^2\cos\theta + i\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta + i\sqrt{\sin^2\theta - n^2}}. \tag{8.112}$$

In this case, we also get

$$R^{\parallel} = \frac{-n^2\cos\theta + i\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta + i\sqrt{\sin^2\theta - n^2}} \cdot \frac{-n^2\cos\theta - i\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta - i\sqrt{\sin^2\theta - n^2}} = 1. \tag{8.113}$$

The relations (8.111) and (8.113) ensure that the energy flow gets back to the higher refractive index medium. Thus, the total reflection is characterized by the complex reflection coefficients expressed as (8.110) and (8.112) as well as a reflectance of 1. From (8.110) and (8.112) we can estimate a change in the phase of the electromagnetic wave that takes place by virtue of the total reflection. For this purpose, we put

$$R_E^{\perp} \equiv e^{i\alpha} \quad \text{and} \quad R_E^{\parallel} \equiv e^{i\beta}. \tag{8.114}$$

Rewriting (8.110), we have

$$R_E^{\perp} = \frac{\cos^2\theta - \left(\sin^2\theta - n^2\right) - 2i\cos\theta\sqrt{\sin^2\theta - n^2}}{1 - n^2}. \tag{8.115}$$

At a critical angle θc, from (8.95) we have

$$\sin\theta_c = n. \tag{8.116}$$

Therefore, we have

$$1 - n^2 = \cos^2\theta_c. \tag{8.117}$$

Then, as expected, we get

$$\left. R_E^{\perp} \right|_{\theta = \theta_c} = 1. \tag{8.118}$$

Note, however, that at θ = π/2 (i.e., grazing incidence) we have

$$\left. R_E^{\perp} \right|_{\theta = \pi/2} = -1. \tag{8.119}$$

From (8.115), an argument α in a complex plane is given by

$$\tan\alpha = -\frac{2\cos\theta\sqrt{\sin^2\theta - n^2}}{\cos^2\theta - \left(\sin^2\theta - n^2\right)}. \tag{8.120}$$

The argument α defines a phase shift upon the total reflection. Considering (8.115) and (8.118), we have

$$\left. \alpha \right|_{\theta = \theta_c} = 0.$$

Since 1 − n² > 0 (i.e., n < 1) and in the total reflection region sin²θ − n² > 0, the imaginary part of $R_E^{\perp}$ is negative for any θ in that region (θc < θ < π/2). On the other hand, the real part of $R_E^{\perp}$ varies from 1 to −1, as is evidenced from (8.118) and (8.119). At θ0 that satisfies the following condition:

$$\sin\theta_0 = \sqrt{\frac{1 + n^2}{2}}, \tag{8.121}$$

the real part is zero. Thus, the phase shift α varies from 0 to −π as indicated in Fig. 8.8. Comparing (8.121) with (8.116) and taking into account n < 1, we have

$$\theta_c < \theta_0 < \pi/2.$$

Fig. 8.8 Phase shift α defined in a complex plane for the total reflection of TE wave. The number n denotes a relative refractive index of D2 relative to D1. At a critical angle θc, α = 0

Similarly, we estimate the phase change for a TM wave. Rewriting (8.112), we have

$$R_E^{\parallel} = \frac{-n^4\cos^2\theta + \sin^2\theta - n^2 + 2in^2\cos\theta\sqrt{\sin^2\theta - n^2}}{n^4\cos^2\theta + \sin^2\theta - n^2} = \frac{-n^4\cos^2\theta + \sin^2\theta - n^2 + 2in^2\cos\theta\sqrt{\sin^2\theta - n^2}}{\left(1 - n^2\right)\left(\sin^2\theta - n^2\cos^2\theta\right)}. \tag{8.122}$$

Then, we have

$$\left. R_E^{\parallel} \right|_{\theta = \theta_c} = -1. \tag{8.123}$$

Also at θ = π/2 (i.e., grazing incidence) we have

$$\left. R_E^{\parallel} \right|_{\theta = \pi/2} = 1. \tag{8.124}$$

From (8.122), an argument β is given by

$$\tan\beta = \frac{2n^2\cos\theta\sqrt{\sin^2\theta - n^2}}{-n^4\cos^2\theta + \sin^2\theta - n^2}. \tag{8.125}$$

Considering (8.122) and (8.123), we have

$$\left. \beta \right|_{\theta = \theta_c} = \pi. \tag{8.126}$$

In the total reflection region we have

$$\sin^2\theta - n^2\cos^2\theta > n^2 - n^2\cos^2\theta = n^2\left(1 - \cos^2\theta\right) > 0. \tag{8.127}$$

Therefore, the denominator of (8.122) is positive and, hence, the imaginary part of $R_E^{\parallel}$ is positive as well for any θ in that region (θc < θ < π/2). From (8.123) and (8.124), on the other hand, the real part of $R_E^{\parallel}$ in (8.122) varies from −1 to 1. At $\tilde{\theta}_0$ that satisfies the following condition:

$$\cos\tilde{\theta}_0 = \sqrt{\frac{1 - n^2}{1 + n^4}}, \tag{8.128}$$

the real part of $R_E^{\parallel}$ is zero. Once again, we have

$$\theta_c < \tilde{\theta}_0 < \pi/2. \tag{8.129}$$

Thus, the phase β varies from π to 0 as depicted in Fig. 8.9. In this section, we mentioned somewhat peculiar features of complex trigonometric functions such as sinϕ > 1 in light of real functions. As a matter of course, the complex angle ϕ should be determined experimentally from (8.109). In this context readers are referred to Chap. 6 that dealt with the theory of analytic functions [2].
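The phase shifts discussed in this section can be evaluated directly with complex arithmetic. The sketch below computes (8.110) and (8.112), confirms that both have unit modulus, and evaluates the arguments α and β at the angles given by (8.121) and (8.128). The function name `total_reflection_coefficients` and the sample value n = 1/1.5 are assumptions of this illustration.

```python
import cmath
import math

def total_reflection_coefficients(n, theta):
    """Return the complex (R_E_perp, R_E_par) of (8.110) and (8.112).

    Valid only in the total reflection region, sin(theta) >= n with n < 1.
    """
    s2 = math.sin(theta) ** 2 - n * n
    if s2 < 0.0:
        raise ValueError("theta is below the critical angle")
    root = math.sqrt(s2)
    c = math.cos(theta)
    r_te = (c - 1j * root) / (c + 1j * root)                    # (8.110)
    r_tm = (-n * n * c + 1j * root) / (n * n * c + 1j * root)   # (8.112)
    return r_te, r_tm

n = 1.0 / 1.5                                                   # hypothetical relative index
theta0_te = math.asin(math.sqrt((1.0 + n * n) / 2.0))           # (8.121)
theta0_tm = math.acos(math.sqrt((1.0 - n * n) / (1.0 + n ** 4)))  # (8.128)

r_te, _ = total_reflection_coefficients(n, theta0_te)
_, r_tm = total_reflection_coefficients(n, theta0_tm)

alpha = cmath.phase(r_te)       # argument of R_E_perp, cf. (8.114)
beta = cmath.phase(r_tm)        # argument of R_E_par, cf. (8.114)
print(abs(r_te), alpha)
print(abs(r_tm), beta)
```

At these two angles the real parts vanish, so the coefficients are purely imaginary with unit modulus; the signs of the imaginary parts follow the discussion of (8.115) and (8.122).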

8.7 Waveguide Applications

There are many optical devices based upon light propagation. Among them, waveguide devices utilize the total reflection. We explain their operation principle. Suppose that we have a thin plate (usually said to be a slab) comprising a dielectric medium that infinitely spreads two-dimensionally and that the plate is sandwiched with another dielectric (or maybe air or vacuum) or metal. In this situation electromagnetic waves are confined within the slab. Moreover, only under a restricted condition are those waves allowed to propagate in parallel to the slab plane. Such electromagnetic waves are usually called propagation modes or simply "modes." An optical device thus designed is called a waveguide. These modes are characterized by repeated total reflection during the propagation. Another mode is an evanescent mode. Because of the total reflection, the energy transport is not allowed to take place vertically to the interface of two dielectrics. For this reason, the evanescent mode is thought to be propagated in parallel to the interface very close to it.

Fig. 8.9 Phase shift β for the total reflection of TM wave. At a critical angle θc, β = π

8.7.1 TE and TM Waves in a Waveguide

In a waveguide configuration, propagating waves are classified into TE and TM modes. Quality of materials that constitute a waveguide largely governs the propagation modes within the waveguide. Figure 8.10 depicts a cross-section of a slab waveguide. We assume that the electromagnetic wave is propagated toward the positive direction of the z-axis and that a waveguide infinitely spreads toward the z- and x-axes. Suppose that the said waveguide is spatially confined toward the y-axis. Let the thickness of the waveguide be d. From a point of view of material that shapes a waveguide, waveguides are classified into two types. (i) Electromagnetic waves are completely confined within the waveguide layer. This case typically happens when a dielectric forming the waveguide is sandwiched between a couple of metal layers (Fig. 8.10a). This is because the electromagnetic wave is not allowed to exist or propagate inside the metal. (ii) Electromagnetic waves are not completely confined within the waveguide. This case happens when the dielectric of the waveguide is sandwiched by a couple of

8.7 Waveguide Applications

(b) Dielectric (clad layer)

Metal d

Waveguide Metal y

Waveguide (core layer)

d

(a)

321

Dielectric (clad layer)

y z x

z x

Fig. 8.10 Cross-section of a slab waveguide comprising a dielectric medium. (a) A waveguide is sandwiched between a couple of metal layers. (b) A waveguide is sandwiched between a couple of layers consisting of another dielectric called clad layer. The sandwiched layer is called core layer

other dielectrics. We distinguish this case, the total internal reflection, from the above case (i). We further describe it in Sect. 8.7.2. For the total internal reflection to take place, the refractive index of the waveguide must be higher than those of the other dielectrics (Fig. 8.10b). The dielectric of the waveguide is called the core layer and the other dielectric is called the clad layer. In this case, electromagnetic waves are allowed to propagate inside the clad layer, although only in a region very close to the interface between the clad and core layers. Such electromagnetic waves are said to be evanescent waves. According to these two cases (i) and (ii), we have different conditions under which the allowed modes can exist.

Now, let us return to Maxwell’s equations. We have introduced the equations of wave motion (7.35) and (7.36) from Maxwell’s equations (7.28) and (7.29) along with (7.7) and (7.10). One of their simplest solutions is a plane wave described by (7.53). The plane wave is characterized by having the same phase on an infinitely spreading plane perpendicular to the propagation direction (characterized by a wavenumber vector $\mathbf{k}$). In a waveguide, however, the electromagnetic field is confined with respect to the direction parallel to the normal to the slab plane (i.e., the direction of the y-axis in Fig. 8.10). Consequently, the electromagnetic field can no longer have the same phase in that direction. Yet, as solutions of the equations of wave motion, we can have a solution that has the same phase along the direction of the x-axis. Bearing in mind such a situation, let us think of Maxwell’s equations in relation to the equations of wave motion.

Ignoring components related to partial differentiation with respect to x (i.e., the components related to $\partial/\partial x$) and rewriting (7.28) and (7.29) for individual Cartesian coordinates, we have [3]

\[ \frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z} + \frac{\partial B_x}{\partial t} = 0, \tag{8.130} \]

8 Reflection and Transmission of Electromagnetic Waves in Dielectric Media

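\[ \frac{\partial E_x}{\partial z} + \frac{\partial B_y}{\partial t} = 0, \tag{8.131} \]

\[ -\frac{\partial E_x}{\partial y} + \frac{\partial B_z}{\partial t} = 0, \tag{8.132} \]

\[ \frac{\partial H_z}{\partial y} - \frac{\partial H_y}{\partial z} - \frac{\partial D_x}{\partial t} = 0, \tag{8.133} \]

\[ \frac{\partial H_x}{\partial z} - \frac{\partial D_y}{\partial t} = 0, \tag{8.134} \]

\[ -\frac{\partial H_x}{\partial y} - \frac{\partial D_z}{\partial t} = 0. \tag{8.135} \]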

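Of the above equations, we collect those pertinent to $E_x$ and differentiate (8.131), (8.132), and (8.133) with respect to z, y, and t, respectively, to get

\[ \frac{\partial^2 E_x}{\partial z^2} + \frac{\partial^2 B_y}{\partial z\,\partial t} = 0, \tag{8.136} \]

\[ \frac{\partial^2 E_x}{\partial y^2} - \frac{\partial^2 B_z}{\partial y\,\partial t} = 0, \tag{8.137} \]

\[ \frac{\partial^2 H_z}{\partial t\,\partial y} - \frac{\partial^2 H_y}{\partial t\,\partial z} - \frac{\partial^2 D_x}{\partial t^2} = 0, \tag{8.138} \]

where the overall sign of (8.137) is reversed relative to (8.132). Multiplying (8.138) by $\mu$ and further adding (8.136) and (8.137) to it and using (7.7), we get

\[ \frac{\partial^2 E_x}{\partial y^2} + \frac{\partial^2 E_x}{\partial z^2} = \mu\varepsilon \frac{\partial^2 E_x}{\partial t^2}. \tag{8.139} \]

This is a two-dimensional equation of wave motion. In a similar manner, from (8.130), (8.134), and (8.135), we have for the magnetic field

\[ \frac{\partial^2 H_x}{\partial y^2} + \frac{\partial^2 H_x}{\partial z^2} = \mu\varepsilon \frac{\partial^2 H_x}{\partial t^2}. \tag{8.140} \]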

Equations (8.139) and (8.140) are two-dimensional wave equations with respect to the y- and z-coordinates. With the direction of the x-axis, a propagating wave has the same phase. Suppose that we have plane wave solutions for them as in the case of (7.58) and (7.59). Then, we have

\[ \mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)} = \mathbf{E}_0 e^{i(k\mathbf{n}\cdot\mathbf{x} - \omega t)}, \quad \mathbf{H} = \mathbf{H}_0 e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)} = \mathbf{H}_0 e^{i(k\mathbf{n}\cdot\mathbf{x} - \omega t)}. \tag{8.141} \]

Note that a plane wave expressed by (7.58) and (7.59) is propagated uniformly in a dielectric medium. In a waveguide, on the other hand, the electromagnetic waves undergo repeated (total) reflections from the two boundaries positioned on either side of the waveguide, while being propagated. In a three-dimensional version, the wavenumber vector has three components $k_x$, $k_y$, and $k_z$ as expressed in (7.48). In (8.141), in turn, $\mathbf{k}$ has y- and z-components such that

\[ \mathbf{k}^2 = k^2 = k_y^2 + k_z^2. \tag{8.142} \]

Equations (8.139) and (8.140) can be rewritten as

\[ \frac{\partial^2 E_x}{\partial(\pm y)^2} + \frac{\partial^2 E_x}{\partial(\pm z)^2} = \mu\varepsilon \frac{\partial^2 E_x}{\partial(\pm t)^2}, \]

\[ \frac{\partial^2 H_x}{\partial(\pm y)^2} + \frac{\partial^2 H_x}{\partial(\pm z)^2} = \mu\varepsilon \frac{\partial^2 H_x}{\partial(\pm t)^2}. \]

Accordingly, we have four wavenumber vector components

\[ k_y = \pm|k_y| \quad \text{and} \quad k_z = \pm|k_z|. \]

Figure 8.11 indicates this situation where an electromagnetic wave can be propagated within a slab waveguide in any one of the four possible directions of $\mathbf{k}$. In this section, we assume that the electromagnetic wave is propagated toward the positive direction of the z-axis, and so we define $k_z$ as positive. On the other hand, $k_y$ can be either positive or negative. Thus, we get

\[ k_z = k\sin\theta \quad \text{and} \quad k_y = \pm k\cos\theta. \tag{8.143} \]

Fig. 8.11 Four possible propagation directions k of an electromagnetic wave in a slab waveguide

Fig. 8.12 Geometries and propagation of the electromagnetic waves in a slab waveguide

Figure 8.12 shows the geometries of the electromagnetic waves within the slab waveguide. The slab plane is parallel to the zx-plane. Let the positions of the two interfaces of the slab waveguide be

\[ y = 0 \quad \text{and} \quad y = d. \tag{8.144} \]

That is, we assume that the thickness of the waveguide is d. Since (8.139) describes a wave equation for only one component $E_x$, (8.139) is suited for representing a TE wave. In (8.140), in turn, a wave equation is given only for $H_x$, and hence, it is suited for representing a TM wave. With the TE wave the electric field oscillates parallel to the slab plane and perpendicular to the propagation direction. With the TM wave, in turn, the magnetic field oscillates parallel to the slab plane and perpendicular to the propagation direction. In a general case, electromagnetic waves in a slab waveguide are formed by superposition of TE and TM waves. Notice that Fig. 8.12 is applicable to both TE and TM waves.

Let us further proceed with the waveguide analysis. The electric field $\mathbf{E}$ within the waveguide is described by superposition of the incident and reflected waves. Using the first equation of (8.141) and (8.143), we have

\[ \mathbf{E}(z, y) = \boldsymbol{\varepsilon}_e E_+ e^{i(kz\sin\theta + ky\cos\theta - \omega t)} + \boldsymbol{\varepsilon}_e' E_- e^{i(kz\sin\theta - ky\cos\theta - \omega t)}, \tag{8.145} \]

where $E_+$ ($E_-$) and $\boldsymbol{\varepsilon}_e$ ($\boldsymbol{\varepsilon}_e'$) represent an amplitude and a unit polarization vector of the incident (reflected) waves, respectively. The vector $\boldsymbol{\varepsilon}_e$ (or $\boldsymbol{\varepsilon}_e'$) is defined in (7.67). Equation (8.145) is common to both the cases of TE and TM waves. From now on, we consider the TE mode case. Suppose that the slab waveguide is sandwiched between a pair of metal sheets of high conductance. Since the electric field must be absent inside the metal, the electric field at the interface must be zero owing to the continuity condition of a tangential component of the electric field. Thus, we require that the following condition be met with (8.145):

\[ \mathbf{t}\cdot\mathbf{E}(z, 0) = 0 = \mathbf{t}\cdot\boldsymbol{\varepsilon}_e E_+ e^{i(kz\sin\theta - \omega t)} + \mathbf{t}\cdot\boldsymbol{\varepsilon}_e' E_- e^{i(kz\sin\theta - \omega t)} = \left(\mathbf{t}\cdot\boldsymbol{\varepsilon}_e E_+ + \mathbf{t}\cdot\boldsymbol{\varepsilon}_e' E_-\right) e^{i(kz\sin\theta - \omega t)}. \tag{8.146} \]

Therefore, since $e^{i(kz\sin\theta - \omega t)}$ never vanishes, we have

\[ \mathbf{t}\cdot\boldsymbol{\varepsilon}_e E_+ + \mathbf{t}\cdot\boldsymbol{\varepsilon}_e' E_- = 0, \tag{8.147} \]

where $\mathbf{t}$ is a tangential unit vector at the interface. Since $\mathbf{E}$ is polarized along the x-axis, setting $\boldsymbol{\varepsilon}_e = \boldsymbol{\varepsilon}_e' = \mathbf{e}_1$ and taking $\mathbf{t}$ as $\mathbf{e}_1$, we get

\[ E_+ + E_- = 0. \]

This means that the reflection coefficient of the electric field is $-1$. Denoting $E_+ = -E_- \equiv E_0$ (>0), we have

\[ \mathbf{E} = \mathbf{e}_1 E_0 \left[ e^{i(kz\sin\theta + ky\cos\theta - \omega t)} - e^{i(kz\sin\theta - ky\cos\theta - \omega t)} \right] = \mathbf{e}_1 E_0 \left[ e^{iky\cos\theta} - e^{-iky\cos\theta} \right] e^{i(kz\sin\theta - \omega t)} = \mathbf{e}_1 2iE_0 \sin(ky\cos\theta)\, e^{i(kz\sin\theta - \omega t)}. \tag{8.148} \]

Requiring the electric field to vanish at the other interface, $y = d$, we have

\[ \mathbf{E}(z, d) = 0 = \mathbf{e}_1 2iE_0 \sin(kd\cos\theta)\, e^{i(kz\sin\theta - \omega t)}. \]

Note that in terms of the boundary conditions we are thinking of Dirichlet conditions (see Sects. 1.3 and 10.3). In this case, we have nodes for the electric field at the interface between the metal and the dielectric. For this condition to be satisfied, we must have

\[ kd\cos\theta = m\pi \quad (m = 1, 2, \dots). \tag{8.149} \]

From (8.149), we have the following condition for m:

\[ m \le kd/\pi. \tag{8.150} \]

Meanwhile, we have

\[ k = nk_0, \tag{8.151} \]

where n is a refractive index of the dielectric that shapes the slab waveguide; the quantity $k_0$ is a wavenumber of the electromagnetic wave in vacuum. The index n is given by

\[ n = c/v, \tag{8.152} \]

where c and v are the light velocities in vacuum and in the dielectric medium, respectively. Here v is meant as a velocity in an infinitely spreading dielectric. Thus, $\theta$ is allowed to take several (or more) values depending upon k, d, and m.

Since in the z-direction no specific boundary conditions are imposed, we have propagating modes in that direction characterized by a propagation constant (vide infra). Looking at (8.148), we notice that $k\sin\theta$ plays the role of a wavenumber in a free space. For this reason, a quantity $\beta$ defined as

\[ \beta \equiv k\sin\theta = nk_0\sin\theta \tag{8.153} \]

is said to be a propagation constant. In (8.153), $k_0$ is a wavenumber in vacuum. From (8.149) and (8.153), we get

\[ \beta = \left( k^2 - \frac{m^2\pi^2}{d^2} \right)^{1/2}. \tag{8.154} \]

Thus, the allowed TE waves indexed by m are called TE modes and represented as TE$_m$. The phase velocity $v_p$ is given by

\[ v_p = \omega/\beta. \tag{8.155} \]

Meanwhile, the group velocity $v_g$ is given by

\[ v_g = \frac{d\omega}{d\beta} = \left( \frac{d\beta}{d\omega} \right)^{-1}. \tag{8.156} \]

Using (8.154) and noting that $k^2 = \omega^2/v^2$, differentiation of $\beta^2 = \omega^2/v^2 - m^2\pi^2/d^2$ with respect to $\omega$ gives $\beta\, d\beta/d\omega = \omega/v^2$, and so we get

\[ v_g = v^2\beta/\omega. \tag{8.157} \]

Thus, we have

\[ v_p v_g = v^2. \tag{8.158} \]

Note that in (1.22) of Sect. 1.1 we saw a relationship similar to (8.158). The characteristics of TM waves can be analyzed in a similar manner by examining the magnetic field $H_x$. In that case, the reflection coefficient of the magnetic field is +1 and we have antinodes for the magnetic field at the interface. Concomitantly, we adopt Neumann conditions as the boundary conditions (see Sects. 1.3 and 8.3). Regardless of the difference in the boundary conditions, however, the discussion including (8.149) to (8.158) applies to the analysis of TM waves. Once $H_x$ is determined, $E_y$ and $E_z$ can be determined as well from (8.134) and (8.135).
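The relations (8.149) to (8.158) can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `te_modes` and the sample values of n, d, and the vacuum wavelength are illustrative assumptions. It lists the TE modes of a metal-clad slab allowed by (8.150) and evaluates the propagation constant and both velocities for each of them.

```python
import math

C = 299_792_458.0  # speed of light in vacuum (m/s)


def te_modes(n, d, wavelength0):
    """List (m, theta, beta, v_p, v_g) for the TE_m modes of a metal-clad slab.

    n           : refractive index of the slab (assumed lossless)
    d           : slab thickness (m)
    wavelength0 : vacuum wavelength (m)
    """
    k0 = 2.0 * math.pi / wavelength0  # vacuum wavenumber
    k = n * k0                        # (8.151)
    v = C / n                         # (8.152)
    omega = v * k
    modes = []
    m = 1
    # Strict form of (8.150): equality would give beta = 0 (cutoff).
    while m * math.pi / d < k:
        ky = m * math.pi / d                 # k cos(theta), from (8.149)
        theta = math.acos(ky / k)
        beta = math.sqrt(k * k - ky * ky)    # (8.154)
        v_p = omega / beta                   # (8.155)
        v_g = v * v * beta / omega           # (8.157)
        modes.append((m, theta, beta, v_p, v_g))
        m += 1
    return modes


if __name__ == "__main__":
    for m, theta, beta, v_p, v_g in te_modes(1.5, 2.1e-6, 1.0e-6):
        print(m, round(math.degrees(theta), 2), v_p / C, v_g / C)
```

For every mode the product of the two velocities reproduces (8.158), with the phase velocity above and the group velocity below the velocity v of the unbounded dielectric.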

8.7.2

Total Internal Reflection and Evanescent Waves

If a slab waveguide shaped by a dielectric is sandwiched by a couple of layers of another dielectric (Fig. 8.10b), the situation differs from the metal waveguide (Fig. 8.10a) we encountered in Sect. 8.7.1. Suppose in Fig. 8.10b that the former dielectric D1 of a refractive index $n_1$ is sandwiched by the latter dielectric D2 of a refractive index $n_2$. Suppose that an electromagnetic wave is being propagated from D1 toward D2. Then, we must have

\[ n_1 > n_2 \tag{8.159} \]

so that the total internal reflection can take place at the interface of D1 and D2. In this case, the dielectrics D1 and D2 act as a core layer and a clad layer, respectively. The biggest difference from the previous waveguide is that the total internal reflection occurs in the present case. Concomitantly, an evanescent wave is present in the clad layer very close to the interface.

First, let us estimate the conditions that must be satisfied so that an electromagnetic wave can be propagated within a waveguide. Figure 8.13 depicts a cross-section of the waveguide where the light is propagated in the direction of $\mathbf{k}$. In Fig. 8.13, suppose that we have a normal N to the plane of paper at P. Then, N and a straight line XY shape a plane NXY. Also suppose that a dielectric fills a semi-infinite space situated below NXY. Further suppose that there is another virtual plane N′X′Y′ that is parallel with NXY as shown. Here N′ is parallel to N. The separation of the two parallel planes is d. We need the virtual plane N′X′Y′ just to estimate an optical path difference (or phase difference, more specifically) between two waves, i.e., a propagating wave and a reflected wave. Let $\mathbf{n}$ be a unit vector in the direction of $\mathbf{k}$; i.e.,

\[ \mathbf{n} = \mathbf{k}/|\mathbf{k}| = \mathbf{k}/k. \tag{8.160} \]

Then the electromagnetic wave is described as

\[ \mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)} = \mathbf{E}_0 e^{i(k\mathbf{n}\cdot\mathbf{x} - \omega t)}. \tag{8.161} \]

Suppose that we take a coordinate system such that

Fig. 8.13 Cross-section of the waveguide where the light is propagated in the direction of k. A dielectric fills a semi-infinite space situated below NXY. We suppose another virtual plane N′X′Y′ that is parallel with NXY. We need the plane N′X′Y′ to estimate an optical path difference (or phase difference)

3 1

;

0Þ:

ð8:174Þ

Consequently, we get

\[ \tan\tau = -\tan\frac{\delta_{TM}}{2} = \frac{\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta}. \tag{8.175} \]

Finally, the additional phase changes $\delta_{TE}$ and $\delta_{TM}$ upon the total reflection are given by [4]

\[ \delta_{TE} = -2\tan^{-1}\frac{\sqrt{\sin^2\theta - n^2}}{\cos\theta} \quad \text{and} \quad \delta_{TM} = -2\tan^{-1}\frac{\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta}. \tag{8.176} \]

We emphasize that in (8.176) both $\delta_{TE}$ and $\delta_{TM}$ are negative quantities. This phase shift has to be included in (8.167) as a negative quantity $\delta$. At first glance, (8.176) seems to differ largely from (8.120) and (8.125). Nevertheless, noting the trigonometric formula

\[ \tan 2x = \frac{2\tan x}{1 - \tan^2 x} \]

and remembering that $\delta_{TM}$ in (8.168) includes $\pi$ arising from the phase reversal, we find that both the relations are virtually identical.

Evanescent waves are attracting much attention in the fields of basic physics and applied device physics. If the total internal reflection is absent, $\phi$ is real. Under the total internal reflection, however, $\phi$ is no longer real: $\sin\phi > 1$ and $\cos\phi$ is pure imaginary ($\cos\phi = ib$ below). The electric field of the evanescent wave is described as

\[ \mathbf{E}_t = E\boldsymbol{\varepsilon}_t e^{i(k_t z\sin\phi + k_t y\cos\phi - \omega t)} = E\boldsymbol{\varepsilon}_t e^{i(k_t z\sin\phi + k_t y\, ib - \omega t)} = E\boldsymbol{\varepsilon}_t e^{i(k_t z\sin\phi - \omega t)} e^{-k_t y b}. \tag{8.177} \]

In (8.177), a unit polarization vector $\boldsymbol{\varepsilon}_t$ is either perpendicular to the plane of paper of Fig. 8.13 (the TE case) or parallel to it (the TM case). Notice that the coordinate system is different from that of (8.104). The quantity $k_t\sin\phi$ is the propagation constant. Let $v_p^{(s)}$ and $v_p^{(e)}$ be the phase velocities of the electromagnetic wave in the slab waveguide (i.e., core layer) and of the evanescent wave in the clad layer, respectively. Then, in virtue of Snell’s law we have

\[ v_1 < v_p^{(s)} = \frac{v_1}{\sin\theta} = \frac{\omega}{k\sin\theta} = \frac{\omega}{k_t\sin\phi} = v_p^{(e)} = \frac{v_2}{\sin\phi} < v_2, \tag{8.178} \]

where $v_1$ and $v_2$ are the light velocities in a free space filled by the dielectrics D1 and D2, respectively. For this, we used the relation

\[ \omega = v_1 k = v_2 k_t. \tag{8.179} \]

We also used Snell’s law with the third equality. Notice that $\sin\phi > 1$ in the evanescent region and that $k_t\sin\phi$ is a propagation constant in the clad layer. Also note that $v_p^{(s)}$ is equal to $v_p^{(e)}$ and that these phase velocities lie between the two velocities of the free space. Thus, the evanescent waves must be present, accompanying propagating waves that undergo the total internal reflections in a slab waveguide. As remarked in (6.105), the electric field of evanescent waves decays exponentially with increasing z in the coordinate system used there; in (8.177) the decay appears as the factor $e^{-k_t y b}$. This implies that the evanescent waves exist only in the clad layer very close to an interface of core and clad layers.
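A short numerical sketch may help to see the sizes involved in (8.176) and (8.178). It is only an illustration under stated assumptions: n is taken to be the relative refractive index $n_2/n_1$ (its definition lies outside this passage), the function names are hypothetical, and velocities are expressed in units of c.

```python
import math


def tir_phase_shifts(theta, n1, n2):
    """Return (delta_TE, delta_TM) of (8.176) for an incidence angle theta (rad).

    Assumes n in (8.176) is the relative index n2/n1 with n1 > n2.
    """
    n = n2 / n1
    s = math.sin(theta)
    if s <= n:
        raise ValueError("theta is below the critical angle: no total internal reflection")
    root = math.sqrt(s * s - n * n)
    delta_te = -2.0 * math.atan(root / math.cos(theta))
    delta_tm = -2.0 * math.atan(root / (n * n * math.cos(theta)))
    return delta_te, delta_tm


def phase_velocities(theta, n1, n2, c=1.0):
    """Return (v1, v_p_core, v_p_evanescent, v2) appearing in (8.178)."""
    v1, v2 = c / n1, c / n2
    sin_phi = (n1 / n2) * math.sin(theta)  # Snell's law; exceeds 1 under total reflection
    return v1, v1 / math.sin(theta), v2 / sin_phi, v2


if __name__ == "__main__":
    theta = math.radians(60.0)
    print(tir_phase_shifts(theta, 1.5, 1.0))
    print(phase_velocities(theta, 1.5, 1.0))
```

With these sample values both phase shifts come out negative, and the two phase velocities coincide at a value between $v_1$ and $v_2$, as stated in the text.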

8.8

Stationary Waves

So far, we have been dealing with propagating waves either in a free space or in a waveguide. If the dielectric shaping the waveguide is confined in another direction, the propagating waves show specific properties. Examples include optical fibers. In this section we consider a situation where the electromagnetic wave is propagating in a dielectric medium and reflected by a “wall” formed by metal or another dielectric. In such a situation, the original wave (i.e., a forward wave) causes interference with the backward wave and a stationary wave is formed as a consequence of the interference.


To approach this issue, we deal with superposition of two waves that have different phases and different amplitudes. To generalize the problem, let $\psi_1$ and $\psi_2$ be two cosine functions described as

\[ \psi_1 = a_1\cos b_1 \quad \text{and} \quad \psi_2 = a_2\cos b_2. \tag{8.180} \]

Their addition is expressed as

\[ \psi = \psi_1 + \psi_2 = a_1\cos b_1 + a_2\cos b_2. \tag{8.181} \]

Here we wish to unify (8.181) as a single cosine (or sine) function. To this end, we modify the description of $\psi_2$ such that

\[ \psi_2 = a_2\cos[b_1 + (b_2 - b_1)] = a_2[\cos b_1\cos(b_2 - b_1) - \sin b_1\sin(b_2 - b_1)] = a_2[\cos b_1\cos(b_1 - b_2) + \sin b_1\sin(b_1 - b_2)]. \tag{8.182} \]

Then, the addition is described by

\[ \psi = \psi_1 + \psi_2 = [a_1 + a_2\cos(b_1 - b_2)]\cos b_1 + a_2\sin(b_1 - b_2)\sin b_1. \tag{8.183} \]

Putting R such that

\[ R = \sqrt{[a_1 + a_2\cos(b_1 - b_2)]^2 + a_2^2\sin^2(b_1 - b_2)} = \sqrt{a_1^2 + a_2^2 + 2a_1a_2\cos(b_1 - b_2)}, \tag{8.184} \]

we get

\[ \psi = R\cos(b_1 - \theta), \tag{8.185} \]

where $\theta$ is expressed by

\[ \tan\theta = \frac{a_2\sin(b_1 - b_2)}{a_1 + a_2\cos(b_1 - b_2)}. \tag{8.186} \]
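The reduction (8.183) to (8.186) is easy to verify numerically. The sketch below is an illustration only; the helper name `combine` is hypothetical. It uses `atan2` so that $\theta$ falls in the correct quadrant, which (8.186) alone fixes only up to a multiple of $\pi$.

```python
import math


def combine(a1, b1, a2, b2):
    """Return (R, theta) with a1*cos(b1) + a2*cos(b2) == R*cos(b1 - theta)."""
    x = a1 + a2 * math.cos(b1 - b2)  # R cos(theta), cf. (8.183)
    y = a2 * math.sin(b1 - b2)       # R sin(theta), cf. (8.183)
    return math.hypot(x, y), math.atan2(y, x)


if __name__ == "__main__":
    a1, b1, a2, b2 = 1.0, 0.7, 0.5, -1.3
    R, theta = combine(a1, b1, a2, b2)
    direct = a1 * math.cos(b1) + a2 * math.cos(b2)
    print(direct, R * math.cos(b1 - theta))
```

The two printed numbers agree to rounding error, and R matches the second form of (8.184).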

Figure 8.14 represents a geometrical diagram in relation to the superposition of two waves having different amplitudes ($a_1$ and $a_2$) and different phases ($b_1$ and $b_2$) [5].

Fig. 8.14 Geometrical diagram in relation to the superposition of two waves having different amplitudes (a1 and a2) and different phases (b1 and b2)

To apply (8.185) to the superposition of two electromagnetic waves that are propagating forward and backward to collide head-on with each other, we change the variables such that

\[ b_1 = kx - \omega t \quad \text{and} \quad b_2 = -kx - \omega t, \tag{8.187} \]

where the former equation represents a forward wave, whereas the latter represents a backward wave. Then we have

\[ b_1 - b_2 = 2kx. \tag{8.188} \]

Equation (8.185) is rewritten as

\[ \psi(x, t) = R\cos(kx - \omega t - \theta), \tag{8.189} \]

with

\[ R = \sqrt{a_1^2 + a_2^2 + 2a_1a_2\cos 2kx} \tag{8.190} \]

and

\[ \tan\theta = \frac{a_2\sin 2kx}{a_1 + a_2\cos 2kx}. \tag{8.191} \]

Equation (8.189) looks simple, but since both R and $\theta$ vary as a function of x, the situation is somewhat complicated unlike a simple sinusoidal wave. Nonetheless, when t takes a special value, (8.189) is expressed by a simple function form. For example, at $t = 0$,

\[ \psi(x, 0) = (a_1 + a_2)\cos kx. \]

Fig. 8.15 Superposition of two sinusoidal waves. In (8.189) and (8.190), we put a1 = 1, a2 = 0.5, with (i) t = 0; (ii) t = T/4; (iii) t = T/2; (iv) t = 3T/4. ψ(x, t) is plotted as a function of phase kx (rad), from 0 to 6.3

This corresponds to (i) of Fig. 8.15. If $t = T/2$ (where T is a period, i.e., $T = 2\pi/\omega$), we have

\[ \psi(x, T/2) = -(a_1 + a_2)\cos kx. \]

This corresponds to (iii) of Fig. 8.15. But, the waves described by (ii) or (iv) do not have a simple function form. We characterize Fig. 8.15 below. If we have

\[ 2kx = n\pi \quad \text{or} \quad x = n\lambda/4 \tag{8.192} \]

with $\lambda$ being a wavelength, then $\theta = 0$ or $\pi$, and so $\theta$ can be eliminated. This situation occurs at every quarter of a wavelength. Let us put $t = 0$ and examine what the superposed wave looks like. For instance, putting $x = 0$, $x = \lambda/4$, and $x = \lambda/2$ we have

\[ \psi(0, 0) = |a_1 + a_2|, \quad \psi(\lambda/4, 0) = \psi(3\lambda/4, 0) = 0, \quad \psi(\lambda/2, 0) = -|a_1 + a_2|, \tag{8.193} \]

respectively. Notice that in Fig. 8.15 we took $a_1, a_2 > 0$. At another instant $t = T/4$, we have similarly

\[ \psi(0, T/4) = 0, \quad \psi(\lambda/4, T/4) = |a_1 - a_2|, \quad \psi(\lambda/2, T/4) = 0, \quad \psi(3\lambda/4, T/4) = -|a_1 - a_2|. \tag{8.194} \]
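The special values in (8.193) and (8.194) can be reproduced directly from (8.181) and (8.187). The snippet below is a small sketch with the amplitudes of Fig. 8.15; the function name `psi` and its argument convention (phase kx and phase ωt, both in radians) are choices made for this illustration.

```python
import math


def psi(kx, wt, a1=1.0, a2=0.5):
    """Forward plus backward wave, (8.181) with the phases of (8.187)."""
    return a1 * math.cos(kx - wt) + a2 * math.cos(-kx - wt)


if __name__ == "__main__":
    quarter = math.pi / 2  # kx at x = lambda/4, and omega*t at t = T/4
    for label, wt in (("t = 0", 0.0), ("t = T/4", quarter)):
        values = [round(psi(j * quarter, wt), 12) for j in range(4)]
        print(label, values)  # x = 0, lambda/4, lambda/2, 3*lambda/4
```

At t = 0 the extrema are ±(a1 + a2) and at t = T/4 they are ±(a1 − a2), which is the origin of the two envelopes discussed next.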

Thus, the waves that vary with time are characterized by two drum-shaped envelopes that have extremals $|a_1 + a_2|$ and $|a_1 - a_2|$ or those $-|a_1 + a_2|$ and $-|a_1 - a_2|$. An important implication of Fig. 8.15 is that no node is present in the superposed wave. In other words, there is no instant $t_0$ when $\psi(x, t_0) = 0$ for all x; likewise, there is no position $x_0$ at which $\psi(x_0, t) = 0$ at all times. From the aspect of energy transport of electromagnetic waves, if $|a_1| > |a_2|$ (where $a_1$ and $a_2$ represent the amplitudes of the forward and backward waves, respectively), the net energy flow takes place in the travelling direction of the forward wave. If, on the other hand, $|a_1| < |a_2|$, the net energy flow takes place in the travelling direction of the backward wave. In this respect, think of Poynting vectors.

In case $|a_1| = |a_2|$, the situation is particularly simple. No net energy flow takes place in this case. Correspondingly, we observe nodes. Such waves are called stationary waves. Let us consider this simple situation for an electromagnetic wave that is incident perpendicularly to the interface between two dielectrics (one of them may be a metal). Returning to (7.58) and (7.66), we describe two electromagnetic waves that are propagating in the positive and negative directions of the z-axis such that

\[ \mathbf{E}_1 = E_1\boldsymbol{\varepsilon}_e e^{i(kz - \omega t)} \quad \text{and} \quad \mathbf{E}_2 = E_2\boldsymbol{\varepsilon}_e e^{i(-kz - \omega t)}, \tag{8.195} \]

where $\boldsymbol{\varepsilon}_e$ is a unit polarization vector arbitrarily fixed so that it can be parallel to the interface, i.e., wall (i.e., perpendicular to the z-axis). The situation is depicted in Fig. 8.16. Notice that in Fig. 8.16 $\mathbf{E}_1$ represents the forward wave (i.e., incident wave) and $\mathbf{E}_2$ the backward wave (i.e., wave reflected at the interface). Thus, a superposed wave is described as

\[ \mathbf{E} = \mathbf{E}_1 + \mathbf{E}_2. \tag{8.196} \]

Taking account of the reflection of an electromagnetic wave perpendicularly incident on a wall, let us consider the following two cases:

(i) Syn-phase: The phase of the electric field is retained upon reflection. We assume that $E_1 = E_2$ (>0). Then, we have

\[ \mathbf{E} = E_1\boldsymbol{\varepsilon}_e \left[ e^{i(kz - \omega t)} + e^{i(-kz - \omega t)} \right] = E_1\boldsymbol{\varepsilon}_e e^{-i\omega t}\left( e^{ikz} + e^{-ikz} \right) = 2E_1\boldsymbol{\varepsilon}_e e^{-i\omega t}\cos kz. \tag{8.197} \]

Fig. 8.16 Superposition of electric fields of forward (or incident) wave E1 and backward (or reflected) wave E2

In (8.197), we put $z = 0$ at the interface for convenience. Taking a real part of (8.197), we have

\[ \mathbf{E} = 2E_1\boldsymbol{\varepsilon}_e \cos\omega t\cos kz. \tag{8.198} \]

Note that in (8.198) variables z and t have been separated. This implies that we have a stationary wave. For this case to be realized, the characteristic impedance of the dielectric of the incident wave side should be sufficiently smaller than that of the other side; see (8.51) and (8.59). In other words, the dielectric constant of the incident side should be large enough. We have nodes at positions that satisfy

\[ -kz = \frac{\pi}{2} + m\pi \quad (m = 0, 1, 2, \dots) \quad \text{or} \quad -z = \frac{1}{4}\lambda + \frac{m}{2}\lambda. \tag{8.199} \]

Note that we are thinking of the stationary wave in the region of $z < 0$. Equation (8.199) indicates that nodes are formed at a quarter wavelength from the interface and at every half wavelength from there on. A node is a position where no electric field is present. Meanwhile, antinodes are observed at positions

\[ -kz = m\pi \quad (m = 0, 1, 2, \dots) \quad \text{or} \quad -z = \frac{m}{2}\lambda. \]

Thus, the nodes and antinodes alternate with every quarter wavelength.

(ii) Anti-phase: The phase of the electric field is reversed. We assume that $E_1 = -E_2$ (>0). Then, we have

\[ \mathbf{E} = E_1\boldsymbol{\varepsilon}_e \left[ e^{i(kz - \omega t)} - e^{i(-kz - \omega t)} \right] = E_1\boldsymbol{\varepsilon}_e e^{-i\omega t}\left( e^{ikz} - e^{-ikz} \right) = 2iE_1\boldsymbol{\varepsilon}_e e^{-i\omega t}\sin kz. \tag{8.200} \]

Taking a real part of (8.200), we have

\[ \mathbf{E} = 2E_1\boldsymbol{\varepsilon}_e \sin\omega t\sin kz. \tag{8.201} \]

In (8.201) variables z and t have been separated as well. For this case to be realized, the characteristic impedance of the dielectric of the incident wave side should be sufficiently larger than that of the other side. In other words, the dielectric constant of the incident side should be small enough. Practically, this situation can easily be attained by choosing a metal of high reflectance for the wall material. We have nodes at positions that satisfy

\[ -kz = m\pi \quad (m = 0, 1, 2, \dots) \quad \text{or} \quad -z = \frac{m}{2}\lambda. \tag{8.202} \]

The nodes are formed at the interface and at every half wavelength from it. As in the case of the syn-phase, the antinodes take place at positions shifted by a quarter wavelength relative to the nodes.

If there is another interface at, say, $z = -L$ ($L > 0$), the wave goes back and forth many times. If an absolute value of the reflection coefficient at the interface is high enough (i.e., close to unity), attenuation of the wave is negligible. For both the syn-phase and anti-phase cases, we must have

\[ kL = m\pi \quad (m = 1, 2, \dots) \quad \text{or} \quad L = \frac{m}{2}\lambda \tag{8.203} \]

so that stationary waves can stand stable. For a practical purpose, an optical device having such an interface is said to be a resonator. Various geometries and constitutions of the resonator are proposed in combination with various dielectrics including semiconductors. Related discussion can be seen in Chap. 9.
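As a small worked illustration of (8.203), the following sketch lists the wavelengths $\lambda_m = 2L/m$ that fit a resonator of length L, together with the corresponding frequencies $m v/(2L)$. The function name and the sample numbers (a 1 mm resonator filled with a medium of refractive index 1.5) are assumptions made for this example only.

```python
def resonator_modes(length, v, m_max):
    """Return [(m, wavelength, frequency)] satisfying L = m * wavelength / 2."""
    modes = []
    for m in range(1, m_max + 1):
        wavelength = 2.0 * length / m      # from (8.203)
        frequency = v / wavelength         # equals m * v / (2 L)
        modes.append((m, wavelength, frequency))
    return modes


if __name__ == "__main__":
    c = 299_792_458.0
    for m, lam, nu in resonator_modes(1.0e-3, c / 1.5, 3):
        print(m, lam, nu)
```

Neighboring modes are equally spaced in frequency by $v/(2L)$, which is why the resonator length sets the mode spacing.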

References
1. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
2. Rudin W (1987) Real and complex analysis, 3rd edn. McGraw-Hill, New York
3. Smith FG, King TA, Wilkins D (2007) Optics and photonics, 2nd edn. Wiley, Chichester
4. Born M, Wolf E (2005) Principles of optics, 7th edn. Cambridge University Press, Cambridge
5. Pain HJ (2005) The physics of vibrations and waves, 6th edn. Wiley, Chichester

Chapter 9

Light Quanta: Radiation and Absorption

So far we have discussed propagation of light and its reflection and transmission (or refraction) at an interface of dielectric media. We described characteristics of light from the point of view of an electromagnetic wave. In this chapter, we describe properties of light in relation to quantum mechanics. To this end, we start with Planck’s law of radiation that successfully reproduced experimental results related to blackbody radiation. Before this law was established, the Rayleigh–Jeans law had failed to explain the experimental results in the high-frequency region of radiation (the ultraviolet catastrophe). Planck’s law of radiation led to the discovery of light quanta. Einstein interpreted Planck’s law of radiation on the basis of a model of two-level atoms. This model includes the so-called Einstein A and B coefficients that are important in optics applications, especially lasers. We derive these coefficients from a classical point of view based on a dipole oscillation. We also consider a close relationship between electromagnetic waves confined in a cavity and the motion of a harmonic oscillator.

9.1

Blackbody Radiation

© Springer Nature Singapore Pte Ltd. 2020
S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3_9

Historically, the relevant theory was first propounded by Max Planck and then Albert Einstein as briefly discussed in Chap. 1. The theory was developed on the basis of the experiments called cavity radiation or blackbody radiation. Here, however, we wish to derive Planck’s law of radiation on the assumption of the existence of quantum harmonic oscillators. As discussed in Chap. 2, the ground state of a quantum harmonic oscillator has an energy $\frac{1}{2}\hbar\omega$. Therefore, we measure energies of the oscillator in reference to that state. Let $N_0$ be the number of oscillators (i.e., light quanta) present in the ground state. Then, according to the Boltzmann distribution law the number of oscillators of the first excited state $N_1$ is

\[ N_1 = N_0 e^{-\hbar\omega/k_B T}, \tag{9.1} \]

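where $k_B$ is the Boltzmann constant and T is absolute temperature. Let $N_j$ be the number of oscillators of the j-th excited state. Then we have

\[ N_j = N_0 e^{-j\hbar\omega/k_B T}. \tag{9.2} \]

Let N be the total number of oscillators in the system. Then, we get

\[ N = N_0 + N_0 e^{-\hbar\omega/k_B T} + \cdots + N_0 e^{-j\hbar\omega/k_B T} + \cdots = N_0 \sum_{j=0}^{\infty} e^{-j\hbar\omega/k_B T}. \tag{9.3} \]

Let E be a total energy of the oscillator system in reference to the ground state. That is, we put the ground-state energy at zero. Then we have

\[ E = 0\cdot N_0 + N_0\hbar\omega e^{-\hbar\omega/k_B T} + \cdots + N_0 j\hbar\omega e^{-j\hbar\omega/k_B T} + \cdots = N_0 \sum_{j=0}^{\infty} j\hbar\omega e^{-j\hbar\omega/k_B T}. \tag{9.4} \]

Therefore, an average energy of oscillators $\bar{E}$ is

\[ \bar{E} = \frac{E}{N} = \hbar\omega \frac{\sum_{j=0}^{\infty} j e^{-j\hbar\omega/k_B T}}{\sum_{j=0}^{\infty} e^{-j\hbar\omega/k_B T}}. \tag{9.5} \]

Putting $x \equiv e^{-\hbar\omega/k_B T}$ [1], we have

\[ \bar{E} = \hbar\omega \frac{\sum_{j=0}^{\infty} j x^j}{\sum_{j=0}^{\infty} x^j}. \tag{9.6} \]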

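Since $x < 1$, we have

\[ \sum_{j=0}^{\infty} j x^j = \sum_{j=0}^{\infty} j x^{j-1} x = x \frac{d}{dx} \sum_{j=0}^{\infty} x^j = x \frac{d}{dx} \frac{1}{1 - x} = \frac{x}{(1 - x)^2}, \tag{9.7} \]

\[ \sum_{j=0}^{\infty} x^j = \frac{1}{1 - x}. \tag{9.8} \]

Therefore, we get

\[ \bar{E} = \frac{\hbar\omega x}{1 - x} = \frac{\hbar\omega e^{-\hbar\omega/k_B T}}{1 - e^{-\hbar\omega/k_B T}} = \frac{\hbar\omega}{e^{\hbar\omega/k_B T} - 1}. \tag{9.9} \]

The function

\[ \frac{1}{e^{\hbar\omega/k_B T} - 1} \tag{9.10} \]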

is a form of Bose–Einstein distribution functions; more specifically it is called the Bose–Einstein distribution function for photons today. If $(\hbar\omega/k_B T) \ll 1$, $e^{\hbar\omega/k_B T} \approx 1 + (\hbar\omega/k_B T)$. Therefore, we have

\[ \bar{E} \approx k_B T. \tag{9.11} \]

Thus, the relation (9.9) asymptotically agrees with a classical theory. In other words, according to the classical theory related to the law of equipartition of energy, an energy of $k_B T/2$ is distributed to each of two degrees of freedom of motion, i.e., a kinetic energy and a potential energy of a harmonic oscillator.
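The step from the sums in (9.6) to the closed form (9.9), and the classical limit (9.11), can be checked with a few lines of code. This is only a sketch: energies are measured in units of $\hbar\omega$, the argument `ratio` stands for $\hbar\omega/k_B T$, and the function names and the truncation at 2000 terms are choices made here for illustration.

```python
import math


def mean_energy_sum(ratio, terms=2000):
    """Average energy / (hbar*omega) from truncated sums in (9.6)."""
    x = math.exp(-ratio)  # x = exp(-hbar*omega / kB T) < 1
    numerator = sum(j * x**j for j in range(terms))
    denominator = sum(x**j for j in range(terms))
    return numerator / denominator


def mean_energy_closed(ratio):
    """Average energy / (hbar*omega) from the closed form (9.9)."""
    return 1.0 / math.expm1(ratio)


if __name__ == "__main__":
    print(mean_energy_sum(1.0), mean_energy_closed(1.0))
    # Classical limit: E_bar / (kB T) = ratio * E_bar / (hbar*omega) tends to 1.
    print(1.0e-3 * mean_energy_closed(1.0e-3))
```

`math.expm1` is used for the denominator of (9.9) because it stays accurate when $\hbar\omega/k_B T$ is small, which is exactly the regime of (9.11).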

9.2

Planck’s Law of Radiation and Mode Density of Electromagnetic Waves

Researchers at the time sought the relationship between the energy density inside the cavity and the (angular) frequency of radiation. To reach the relationship, let us introduce a concept of mode density of electromagnetic waves related to the blackbody radiation. We define the mode density $D(\omega)$ as the number of modes of electromagnetic waves per unit volume per unit angular frequency. We refer to the electromagnetic waves having allowed specific angular frequencies and polarization as modes. These modes must be described as linearly independent functions.

Determination of the mode density is related to boundary conditions (BCs) imposed on a physical system. We already dealt with this problem in Chaps. 2, 3, and 8. These BCs often appear when we find solutions of differential equations. Let us consider the following wave equation:

\[ \frac{\partial^2 \psi}{\partial x^2} = \frac{1}{v^2} \frac{\partial^2 \psi}{\partial t^2}. \tag{9.12} \]

According to the method of separation of variables, we put

\[ \psi(x, t) = X(x)T(t). \tag{9.13} \]

Substituting (9.13) into (9.12) and dividing both sides by $X(x)T(t)$, we have

\[ \frac{1}{X} \frac{d^2 X}{dx^2} = \frac{1}{v^2} \frac{1}{T} \frac{d^2 T}{dt^2} = -k^2, \tag{9.14} \]

where k is an undetermined (possibly complex) constant. For the x component, we get

\[ \frac{d^2 X}{dx^2} + k^2 X = 0. \tag{9.15} \]

Remember that k is supposed to be a complex number for the moment (see Example 1.1). Modifying Example 1.1 a little bit such that (9.15) is posed in a domain [0, L] and imposing the Dirichlet conditions such that

\[ X(0) = X(L) = 0, \tag{9.16} \]

we find a solution of

\[ X(x) = a\sin kx, \tag{9.17} \]

where a is a constant. The constant k can be determined to satisfy the BCs; i.e.,

\[ kL = m\pi \quad \text{or} \quad k = m\pi/L \quad (m = 1, 2, \dots). \tag{9.18} \]

Thus, we get real numbers for k. Then, we have a solution

\[ T(t) = b\sin kvt = b\sin\omega t. \tag{9.19} \]

The overall solution is then

\[ \psi(x, t) = c\sin kx\sin\omega t. \tag{9.20} \]

This solution has already appeared in Chap. 8 as a stationary solution. The readers are encouraged to derive these results. In a three-dimensional case, we have a wave equation

\[ \frac{\partial^2 \psi}{\partial x^2} + \frac{\partial^2 \psi}{\partial y^2} + \frac{\partial^2 \psi}{\partial z^2} = \frac{1}{v^2} \frac{\partial^2 \psi}{\partial t^2}. \tag{9.21} \]

In this case, we also assume

\[ \psi(\mathbf{x}, t) = X(x)Y(y)Z(z)T(t). \tag{9.22} \]

Similarly we get

\[ \frac{1}{X} \frac{d^2 X}{dx^2} + \frac{1}{Y} \frac{d^2 Y}{dy^2} + \frac{1}{Z} \frac{d^2 Z}{dz^2} = \frac{1}{v^2} \frac{1}{T} \frac{d^2 T}{dt^2} = -k^2. \tag{9.23} \]

Putting

\[ \frac{1}{X} \frac{d^2 X}{dx^2} = -k_x^2, \quad \frac{1}{Y} \frac{d^2 Y}{dy^2} = -k_y^2, \quad \frac{1}{Z} \frac{d^2 Z}{dz^2} = -k_z^2, \tag{9.24} \]

we have

\[ k_x^2 + k_y^2 + k_z^2 = k^2. \tag{9.25} \]

Then, we get a stationary wave solution as in the one-dimensional case such that

\[ \psi(\mathbf{x}, t) = c\sin k_x x\,\sin k_y y\,\sin k_z z\,\sin\omega t. \tag{9.26} \]

The BCs to be satisfied with $X(x)$, $Y(y)$, and $Z(z)$ are

\[ k_x L = m_x\pi, \quad k_y L = m_y\pi, \quad k_z L = m_z\pi \quad (m_x, m_y, m_z = 1, 2, \dots). \tag{9.27} \]

Returning to the main issue, let us deal with the mode density. Think of a cube with sides of length $L$ that is placed as shown in Fig. 9.1. Calculating $k$, we have

$$k = \sqrt{k_x^2 + k_y^2 + k_z^2} = \frac{\pi}{L}\sqrt{m_x^2 + m_y^2 + m_z^2} = \frac{\omega}{c}, \tag{9.28}$$

where we assumed that the inside of the cavity is vacuum and, hence, the propagation velocity of light is $c$. Rewriting (9.28), we have

$$\sqrt{m_x^2 + m_y^2 + m_z^2} = \frac{L\omega}{\pi c}. \tag{9.29}$$

Fig. 9.1 Cube of each side of L. We use this simple model to estimate mode density

The numbers $m_x$, $m_y$, and $m_z$ represent allowable modes in the cavity; the set $(m_x, m_y, m_z)$ specifies an individual mode. Note that $m_x$, $m_y$, and $m_z$ are all positive integers. If for instance $-m_x$ were allowed, this would produce $\sin(-k_x x) = -\sin k_x x$; but this function is linearly dependent on $\sin k_x x$. Then, a mode indexed by $-m_x$ should not be regarded as an independent mode. Given $\omega$, a set $(m_x, m_y, m_z)$ that satisfies (9.29) corresponds to each mode. Therefore, the number of sets that satisfy the following expression

$$\sqrt{m_x^2 + m_y^2 + m_z^2} \le \frac{L\omega}{\pi c} \tag{9.30}$$

represents the number of modes whose angular frequencies are equal to or less than the given $\omega$. Each mode has one-to-one correspondence with the lattice point indexed by $(m_x, m_y, m_z)$. Accordingly, if $m_x, m_y, m_z \gg 1$, the number of allowed modes approximately equals one-eighth of the volume of a sphere having a radius of $L\omega/\pi c$. Let $N_L$ be the number of modes whose angular frequencies are equal to or less than $\omega$. Recalling that there are two independent modes having the same index $(m_x, m_y, m_z)$ but mutually orthogonal polarizations, we have

$$N_L = \frac{4\pi}{3}\left(\frac{L\omega}{\pi c}\right)^3 \cdot \frac{1}{8} \cdot 2 = \frac{L^3\omega^3}{3\pi^2 c^3}. \tag{9.31}$$
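The continuum estimate (9.31) can be compared with a direct count of lattice points. The following is a minimal sketch, not part of the original text: the function names and the dimensionless radius values are illustrative assumptions, with `radius` standing for $L\omega/\pi c$.

```python
import math


def count_modes(radius):
    """Count modes with sqrt(mx^2 + my^2 + mz^2) <= radius by brute force.

    mx, my, mz run over positive integers only; the factor 2 accounts for
    the two polarizations that share one index triple.
    """
    limit = int(math.floor(radius))
    r2 = radius * radius
    lattice_points = 0
    for mx in range(1, limit + 1):
        for my in range(1, limit + 1):
            for mz in range(1, limit + 1):
                if mx * mx + my * my + mz * mz <= r2:
                    lattice_points += 1
    return 2 * lattice_points


def estimate_modes(radius):
    """Continuum estimate (9.31): (4*pi/3) * radius^3 * (1/8) * 2."""
    return math.pi * radius ** 3 / 3.0


# radius plays the role of L*omega/(pi*c); the values are illustrative only.
for radius in (10, 20, 40):
    exact = count_modes(radius)
    approx = estimate_modes(radius)
    print(radius, exact, round(approx), round(exact / approx, 3))
```

The exact count stays below the estimate because every counted lattice point owns a unit cube lying inside the octant of the sphere; the ratio moves toward 1 as the radius grows, which is the sense in which (9.31) holds for $m_x, m_y, m_z \gg 1$.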

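From (9.31), the mode density $D(\omega)$ is expressed as

$$D(\omega)\,d\omega = \frac{1}{L^3}\frac{dN_L}{d\omega}\,d\omega = \frac{\omega^2}{\pi^2 c^3}\,d\omega, \tag{9.32}$$

where $D(\omega)\,d\omega$ represents the number of modes per unit volume whose angular frequencies range between $\omega$ and $\omega + d\omega$. Now, we introduce a function $\rho(\omega)$ as an energy density per unit angular frequency. Then, combining $D(\omega)$ with (9.9), we get

$$\rho(\omega) = D(\omega)\,\frac{\hbar\omega}{e^{\hbar\omega/k_\mathrm{B}T} - 1} = \frac{\hbar\omega^3}{\pi^2 c^3}\,\frac{1}{e^{\hbar\omega/k_\mathrm{B}T} - 1}. \tag{9.33}$$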

The relation (9.33) is called Planck's law of radiation. Notice that $\rho(\omega)$ has a dimension [J m$^{-3}$ s].

To solve (9.15) under the Dirichlet conditions (9.16) is pertinent to analyzing an electric field within a cavity surrounded by a metal husk, because the electric field must be absent at an interface between the cavity and metal. The problem, however, can equivalently be solved using the magnetic field. This is because at the interface the reflection coefficients of the electric field and magnetic field have reversed signs (see Chap. 8). Thus, given an equation for the magnetic field, we may use the Neumann condition. This condition requires differential coefficients to vanish at the boundary (i.e., the interface between the cavity and metal). In a similar manner to the above, we get

$$\psi(x, t) = c \cos kx \cos \omega t. \tag{9.34}$$

By imposing BCs, again we have (9.18) that leads to the same result as the above. We may also impose the periodic BCs. This type of equation has already been treated in Chap. 3. In that case we have solutions of $e^{ikx}$ and $e^{-ikx}$. The BCs demand that $e^0 = 1 = e^{ikL}$. That is,

$$kL = 2\pi m \quad (m = 0, \pm 1, \pm 2, \cdots). \tag{9.35}$$

Notice that $e^{ikx}$ and $e^{-ikx}$ are linearly independent and, hence, a minus sign for $m$ is permitted. Correspondingly, we have

$$N_L = \frac{4\pi}{3}\left(\frac{L\omega}{2\pi c}\right)^3 \cdot 2 = \frac{L^3\omega^3}{3\pi^2 c^3}. \tag{9.36}$$

In other words, here we have to consider the whole volume of a sphere whose radius is half that of the previous case. Thus, we reach the same conclusion as before. If the average energy of an oscillator were described by (9.11), we would obtain the following description of $\rho(\omega)$:

$$\rho(\omega) = D(\omega)\,k_\mathrm{B}T = \frac{\omega^2}{\pi^2 c^3}\,k_\mathrm{B}T. \tag{9.37}$$

This relation is well known as the Rayleigh–Jeans law, but (9.37) disagreed with experimental results in that, according to the Rayleigh–Jeans law, $\rho(\omega)$ diverges toward infinity as $\omega$ goes to infinity. The discrepancy between the theory and experimental results was referred to as the "ultraviolet catastrophe." Planck's law of radiation described by (9.33), on the other hand, reproduces the experimental results well.
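The contrast between (9.33) and (9.37) can be made concrete with a few numbers. The sketch below is only an illustration: the function names, the temperature of 300 K, and the sampled angular frequencies are assumptions and are not taken from the text.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant [J s]
KB = 1.380649e-23       # Boltzmann constant [J/K]
C = 2.99792458e8        # speed of light in vacuum [m/s]


def mode_density(omega):
    """Mode density D(omega) of (9.32), modes per unit volume per unit omega."""
    return omega ** 2 / (math.pi ** 2 * C ** 3)


def planck(omega, temperature):
    """Energy density rho(omega) of Planck's law (9.33) [J m^-3 s]."""
    x = HBAR * omega / (KB * temperature)
    return mode_density(omega) * HBAR * omega / math.expm1(x)


def rayleigh_jeans(omega, temperature):
    """Energy density rho(omega) of the Rayleigh-Jeans law (9.37)."""
    return mode_density(omega) * KB * temperature


# Illustrative temperature and angular frequencies (assumed values).
T = 300.0
for omega in (1e12, 1e13, 1e14, 1e15):
    ratio = planck(omega, T) / rayleigh_jeans(omega, T)
    print(f"omega = {omega:.0e} rad/s, Planck / Rayleigh-Jeans = {ratio:.3e}")
```

For $\hbar\omega \ll k_\mathrm{B}T$ the two laws agree, whereas for $\hbar\omega \gg k_\mathrm{B}T$ the Planck density is exponentially suppressed while (9.37) keeps growing as $\omega^2$.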

9.3 Two-Level Atoms

Although Planck established Planck's law of radiation, researchers at that time hesitated in professing the existence of light quanta. It was Einstein who derived Planck's law by assuming two-level atoms in which light quanta play a role. His assumption comprises the following three postulates:

(i) The physical system to be addressed comprises so-called hypothetical "two-level" atoms that have only two energy levels. If two-level atoms absorb a light quantum, a ground-state electron is excited up to a higher level (i.e., the stimulated absorption).

(ii) The higher-level electron may spontaneously lose its energy and return back to the ground state (the spontaneous emission).

(iii) The higher-level electron may also lose its energy and return back to the ground state. Unlike (ii), however, the excited electron has to be stimulated by being irradiated by light quanta having an energy corresponding to the energy difference between the ground state and excited state (the stimulated emission).

Figure 9.2 schematically depicts the optical processes of the Einstein model.

Fig. 9.2 Optical processes of Einstein two-level atom model: stimulated absorption, stimulated emission, and spontaneous emission

Under those postulates, Einstein dealt with the problem probabilistically. Suppose that the ground state and excited state have energies $E_1$ and $E_2$. Einstein assumed that light quanta having an energy equaling $E_2 - E_1$ take part in all the above three transition processes. He also propounded the idea that a light quantum has an energy that is proportional to its (angular) frequency. That is, he thought that the following relation should hold:

$$\hbar\omega_{21} = E_2 - E_1, \tag{9.38}$$

where $\omega_{21}$ is an angular frequency of light that takes part in the optical transitions. For the time being, let us follow Einstein's postulates.

(i) Stimulated absorption: This process is simply said to be an "absorption." Let $W_a$ [s$^{-1}$] be the transition probability that the electron absorbs a light quantum to be excited to the excited state. $W_a$ is described as

$$W_a = N_1 B_{21}\,\rho(\omega_{21}), \tag{9.39}$$

where $N_1$ is the number of atoms occupying the ground state; $B_{21}$ is a proportionality constant; $\rho(\omega_{21})$ is due to (9.33). Note that in (9.39) we used $\omega_{21}$ instead of $\omega$ in (9.33). The coefficient $B_{21}$ is called Einstein B coefficient; more specifically, one of the Einstein B coefficients. Namely, $B_{21}$ is pertinent to the transition from the ground state to the excited state.

(ii) Emission processes: The processes include both the spontaneous and stimulated emissions. Let $W_e$ [s$^{-1}$] be the transition probability that the electron emits a light quantum and returns back to the ground state. $W_e$ is described as

$$W_e = N_2 B_{12}\,\rho(\omega_{21}) + N_2 A_{12}, \tag{9.40}$$

where $N_2$ is the number of atoms occupying the excited state; $B_{12}$ and $A_{12}$ are proportionality constants. The coefficient $A_{12}$ is called Einstein A coefficient relevant to the spontaneous emission. The coefficient $B_{12}$ is associated with the stimulated emission and is also called Einstein B coefficient together with $B_{21}$. Here, $B_{12}$ is pertinent to the transition from the excited state to the ground state. Now, we have

$$B_{12} = B_{21}. \tag{9.41}$$

The reasoning for this is as follows: The coefficients $B_{12}$ and $B_{21}$ are proportional to the matrix elements pertinent to the optical transition. Let $T$ be an operator associated with the transition. Then, a matrix element is described using the inner product notation of Chap. 1 by

$$B_{21} = \langle \psi_2 | T | \psi_1 \rangle, \tag{9.42}$$

where $\psi_1$ and $\psi_2$ are initial and final states of the system in relation to the optical transition. As a good approximation, we use $e\mathbf{r}$ for $T$ (dipole approximation), where $e$ is an elementary charge and $\mathbf{r}$ is a position operator (see Chap. 1). If (9.42) represents the absorption process (i.e., the transition from the ground state to the excited state), the corresponding emission process should be described as a reversed process by

$$B_{12} = \langle \psi_1 | T | \psi_2 \rangle. \tag{9.43}$$

Notice that in (9.43) $\psi_2$ and $\psi_1$ are initial and final states. Taking the complex conjugate of (9.42), we have

$$B_{21}^{*} = \langle \psi_1 | T^{\dagger} | \psi_2 \rangle, \tag{9.44}$$

where $T^{\dagger}$ is an operator adjoint to $T$ (see Chap. 1). With an Hermitian operator $H$, from Sect. 1.4 we have

$$H^{\dagger} = H. \tag{1.119}$$

Since $T$ is also Hermitian, we have

$$T^{\dagger} = T. \tag{9.45}$$

Thus, we get

$$B_{21}^{*} = B_{12}. \tag{9.46}$$

But, as in the cases of Sects. 4.2 and 4.3, $\psi_1$ and $\psi_2$ can be represented as real functions. Then, we have

$$B_{21}^{*} = B_{21} = B_{12}.$$

That is, we assume that the matrix $B$ is real symmetric. In the case of two-level atoms, as a matrix form we get

$$B = \begin{pmatrix} 0 & B_{12} \\ B_{12} & 0 \end{pmatrix}. \tag{9.47}$$

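Compare (9.47) with (4.28). Now, in the thermal equilibrium, we have

$$W_e = W_a. \tag{9.48}$$

That is,

$$N_2 B_{21}\,\rho(\omega_{21}) + N_2 A_{12} = N_1 B_{21}\,\rho(\omega_{21}), \tag{9.49}$$

where we used (9.41) for LHS. Assuming the Boltzmann distribution law, we get

$$\frac{N_2}{N_1} = \exp\left[-(E_2 - E_1)/k_\mathrm{B}T\right]. \tag{9.50}$$

Here if moreover we assume (9.38), we get

$$\frac{N_2}{N_1} = \exp\left(-\hbar\omega_{21}/k_\mathrm{B}T\right). \tag{9.51}$$

Combining (9.49) and (9.51), we have

$$\exp\left(-\hbar\omega_{21}/k_\mathrm{B}T\right) = \frac{B_{21}\,\rho(\omega_{21})}{B_{21}\,\rho(\omega_{21}) + A_{12}}. \tag{9.52}$$

Solving (9.52) with respect to $\rho(\omega_{21})$, we finally get

$$\rho(\omega_{21}) = \frac{A_{12}}{B_{21}}\cdot\frac{\exp\left(-\hbar\omega_{21}/k_\mathrm{B}T\right)}{1 - \exp\left(-\hbar\omega_{21}/k_\mathrm{B}T\right)} = \frac{A_{12}}{B_{21}}\cdot\frac{1}{\exp\left(\hbar\omega_{21}/k_\mathrm{B}T\right) - 1}. \tag{9.53}$$

Assuming that

$$\frac{A_{12}}{B_{21}} = \frac{\hbar\omega_{21}^{3}}{\pi^2 c^3}, \tag{9.54}$$

we have

$$\rho(\omega_{21}) = \frac{\hbar\omega_{21}^{3}}{\pi^2 c^3}\cdot\frac{1}{\exp\left(\hbar\omega_{21}/k_\mathrm{B}T\right) - 1}. \tag{9.55}$$

This is none other than Planck's law of radiation.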
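The chain (9.49) to (9.55) can be checked numerically in both directions. The sketch below is illustrative only: the function names, the value $B_{21} = 1$ (which cancels out), and the chosen angular frequency and temperature are assumptions.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant [J s]
KB = 1.380649e-23       # Boltzmann constant [J/K]
C = 2.99792458e8        # speed of light in vacuum [m/s]


def a_over_b(omega):
    """Ratio A12/B21 assumed in (9.54)."""
    return HBAR * omega ** 3 / (math.pi ** 2 * C ** 3)


def rho_from_balance(omega, temperature):
    """Energy density from (9.53), using the Boltzmann ratio (9.51)."""
    boltzmann = math.exp(-HBAR * omega / (KB * temperature))
    return a_over_b(omega) * boltzmann / (1.0 - boltzmann)


def rho_planck(omega, temperature):
    """Planck's law (9.33)."""
    x = HBAR * omega / (KB * temperature)
    return HBAR * omega ** 3 / (math.pi ** 2 * C ** 3) / math.expm1(x)


def population_ratio(omega, temperature, b21=1.0):
    """N2/N1 from the rate balance (9.49), with rho taken from Planck's law."""
    rho = rho_planck(omega, temperature)
    a12 = a_over_b(omega) * b21
    return b21 * rho / (b21 * rho + a12)


# Illustrative values (assumed): a visible-light transition at 3000 K.
omega21, T = 3.0e15, 3000.0
print(rho_from_balance(omega21, T), rho_planck(omega21, T))
print(population_ratio(omega21, T), math.exp(-HBAR * omega21 / (KB * T)))
```

The first line of output compares (9.53) with (9.33); the second confirms that feeding Planck's law into the rate balance returns the Boltzmann ratio (9.51).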

9.4 Dipole Radiation

In (9.54) we only know the ratio of $A_{12}$ to $B_{21}$. To have a good knowledge of these Einstein coefficients, we briefly examine a mechanism of the dipole radiation. The electromagnetic radiation results from an accelerated motion of a dipole. A dipole moment $\mathbf{p}(t)$ is defined as a function of time $t$ by

$$\mathbf{p}(t) = \int \mathbf{x}'\rho(\mathbf{x}', t)\,d\mathbf{x}', \tag{9.56}$$

where $\mathbf{x}'$ is a position vector in a Cartesian coordinate; an integral is taken over a whole three-dimensional space; $\rho$ is a charge density appearing in (7.1). If we consider a system comprising point charges, integration can immediately be carried out to yield

$$\mathbf{p}(t) = \sum_i q_i \mathbf{x}_i, \tag{9.57}$$

where $q_i$ is a charge of each point charge $i$ and $\mathbf{x}_i$ is a position vector of the point charge $i$. From (9.56) and (9.57), we find that $\mathbf{p}(t)$ depends on how we set up the coordinate system. However, if a total charge of the system is zero, $\mathbf{p}(t)$ does not depend on the coordinate system. Let $\mathbf{p}(t)$ and $\mathbf{p}'(t)$ be a dipole moment viewed from the frame O and O′, respectively (see Fig. 9.3). Then we have

$$\mathbf{p}'(t) = \sum_i q_i \mathbf{x}'_i = \sum_i q_i (\mathbf{x}_0 + \mathbf{x}_i) = \left(\sum_i q_i\right)\mathbf{x}_0 + \sum_i q_i \mathbf{x}_i = \sum_i q_i \mathbf{x}_i = \mathbf{p}(t). \tag{9.58}$$

Fig. 9.3 Dipole moment viewed from the frame O or O′

Fig. 9.4 Electromagnetic radiation from an accelerated motion of a dipole. (a) A dipole placed at the origin of the coordinate system is executing harmonic oscillation along the z-direction around an equilibrium position. (b) Electromagnetic radiation from a dipole in a wave zone. $\boldsymbol{\varepsilon}_e$ and $\boldsymbol{\varepsilon}_m$ are unit polarization vectors of the electric field and magnetic field, respectively. $\boldsymbol{\varepsilon}_e$, $\boldsymbol{\varepsilon}_m$, and $\mathbf{n}$ form a right-handed system

Notice that with the third equality the first term vanishes because the total charge is zero. The system comprising two point charges that have opposite charges ($\pm q$) is particularly simple but very important. In that case we have

$$\mathbf{p}(t) = q\mathbf{x}_1 + (-q)\mathbf{x}_2 = q(\mathbf{x}_1 - \mathbf{x}_2) = q\tilde{\mathbf{x}}. \tag{9.59}$$

Here we assume that $q > 0$ according to the custom and, hence, $\tilde{\mathbf{x}}$ is a vector directing from the minus point charge to the plus charge. Figure 9.4 displays geometry of an oscillating dipole and electromagnetic radiation from it. Figure 9.4a depicts the dipole. It is placed at the origin of the coordinate system and assumed to be of an atomic or molecular scale in extension; we regard a center of the dipole as the origin. Figure 9.4b represents a large-scale geometry of the dipole and its surrounding space.

For the electromagnetic radiation, an accelerated motion of the dipole is of primary importance. The electromagnetic fields produced by $\ddot{\mathbf{p}}(t)$ vary as the inverse of $r$, where $r$ is a macroscopic distance between the dipole and observation point. Namely, $r$ is much larger compared to the dipole size. There are other electromagnetic fields that result from the dipole moment. The fields result from $\mathbf{p}(t)$ and $\dot{\mathbf{p}}(t)$. Strictly speaking, we have to include those quantities that are responsible for the electromagnetic fields associated with the dipole radiation. Nevertheless, the fields produced by $\mathbf{p}(t)$ and $\dot{\mathbf{p}}(t)$ vary as a function of the inverse cube and inverse square of $r$, respectively. Therefore, the surface integral of the square of the fields associated with $\mathbf{p}(t)$ and $\dot{\mathbf{p}}(t)$ asymptotically reaches zero with large enough $r$ with respect to a sphere enveloping the dipole. Regarding $\ddot{\mathbf{p}}(t)$, on the other hand, the surface integral of the square of the fields remains finite even with large enough $r$. For this reason, we refer to the spatial region where the contribution of $\ddot{\mathbf{p}}(t)$ does not vanish as a wave zone.

Suppose that a dipole placed at the origin of the coordinate system is executing harmonic oscillation along the z-direction around an equilibrium position (see Fig. 9.4). Motion of two charges having plus and minus signs is described by

$$\mathbf{z}_{+} = z_0\mathbf{e}_3 + a e^{i\omega t}\mathbf{e}_3 \quad (z_0, a > 0), \tag{9.60}$$

$$\mathbf{z}_{-} = -z_0\mathbf{e}_3 - a e^{i\omega t}\mathbf{e}_3, \tag{9.61}$$

where $\mathbf{z}_{+}$ and $\mathbf{z}_{-}$ are position vectors of a plus charge and minus charge, respectively; $z_0$ and $-z_0$ are equilibrium positions of each charge; $a$ is an amplitude of the harmonic oscillation; $\omega$ is an angular frequency of the oscillation. Then, accelerations of the charges are given by

$$\mathbf{a}_{+} \equiv \ddot{\mathbf{z}}_{+} = -a\omega^2 e^{i\omega t}\mathbf{e}_3, \tag{9.62}$$

$$\mathbf{a}_{-} \equiv \ddot{\mathbf{z}}_{-} = a\omega^2 e^{i\omega t}\mathbf{e}_3. \tag{9.63}$$

Meanwhile, we have

$$\mathbf{p}(t) = q\mathbf{z}_{+} + (-q)\mathbf{z}_{-} = q(\mathbf{z}_{+} - \mathbf{z}_{-}) \quad (q > 0). \tag{9.64}$$

Therefore,

$$\ddot{\mathbf{p}}(t) = q(\ddot{\mathbf{z}}_{+} - \ddot{\mathbf{z}}_{-}) = -2qa\omega^2 e^{i\omega t}\mathbf{e}_3. \tag{9.65}$$

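The quantity $\ddot{\mathbf{p}}(t)$, i.e., the second derivative of $\mathbf{p}(t)$ with respect to time, produces the electric field described by [2]

$$\mathbf{E} = -\frac{\ddot{\mathbf{p}}}{4\pi\varepsilon_0 c^2 r} = \frac{qa\omega^2 e^{i\omega t}\mathbf{e}_3}{2\pi\varepsilon_0 c^2 r}, \tag{9.66}$$

where $r$ is a distance between the dipole and observation point. In (9.66) we ignored terms proportional to the inverse square and inverse cube of $r$ for the aforementioned reason. As described in (9.66), the strength of the radiation electric field in the wave zone measured at a point away from the oscillating dipole is proportional to a component of the vector of the accelerated motion of the dipole [i.e., $\ddot{\mathbf{p}}(t)$]. The radiation electric field lies in the direction perpendicular to a line connecting the observation point and the point of the dipole (Fig. 9.4). Let $\boldsymbol{\varepsilon}_e$ be a unit polarization vector of the electric field in that direction and let $\mathbf{E}_{\perp}$ be the radiation electric field. Then, we have

$$\mathbf{E}_{\perp} = \frac{qa\omega^2 e^{i\omega t}(\mathbf{e}_3\cdot\boldsymbol{\varepsilon}_e)\boldsymbol{\varepsilon}_e}{2\pi\varepsilon_0 c^2 r} = -\frac{qa\omega^2 e^{i\omega t}\sin\theta}{2\pi\varepsilon_0 c^2 r}\,\boldsymbol{\varepsilon}_e, \tag{9.67}$$

where $\theta$ is the angle between $\mathbf{e}_3$ and the line connecting the dipole and the observation point, and $\boldsymbol{\varepsilon}_e$ is taken along the direction of increasing $\theta$ so that $\mathbf{e}_3\cdot\boldsymbol{\varepsilon}_e = -\sin\theta$.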

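As shown in Sect. 7.3, $(\boldsymbol{\varepsilon}_e\cdot\mathbf{e}_3)\boldsymbol{\varepsilon}_e$ in (9.67) "extracts" from $\mathbf{e}_3$ a vector component parallel to $\boldsymbol{\varepsilon}_e$. Such an operation is said to be a projection of a vector. The related discussion can be seen in Part III.

It takes a time of $r/c$ for the emitted light from the charge to arrive at the observation point. Consequently, the acceleration of the charge has to be measured at the time when the radiation leaves the charge. Let $t$ be the instant when the electric field is measured at the measuring point. Then, it follows that the radiation leaves the charge at a time of $t - r/c$. Thus, the electric field relevant to the radiation that can be observed far enough away from the oscillating charge is described as [2]

$$\mathbf{E}_{\perp}(\mathbf{x}, t) = -\frac{qa\omega^2 e^{i\omega\left(t - \frac{r}{c}\right)}\sin\theta}{2\pi\varepsilon_0 c^2 r}\,\boldsymbol{\varepsilon}_e. \tag{9.68}$$

The radiation electric field must necessarily be accompanied by a magnetic field. Writing the radiation magnetic field as $\mathbf{H}_{\perp}(\mathbf{x}, t)$, we have [3]

$$\mathbf{H}_{\perp}(\mathbf{x}, t) = -\frac{1}{c\mu_0}\,\frac{qa\omega^2 e^{i\omega\left(t - \frac{r}{c}\right)}\sin\theta}{2\pi\varepsilon_0 c^2 r}\,\mathbf{n}\times\boldsymbol{\varepsilon}_e = -\frac{qa\omega^2 e^{i\omega\left(t - \frac{r}{c}\right)}\sin\theta}{2\pi c r}\,\mathbf{n}\times\boldsymbol{\varepsilon}_e = -\frac{qa\omega^2 e^{i\omega\left(t - \frac{r}{c}\right)}\sin\theta}{2\pi c r}\,\boldsymbol{\varepsilon}_m, \tag{9.69}$$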

where $\mathbf{n}$ represents a unit vector in the direction parallel to a line connecting the observation point and the dipole. The vector $\boldsymbol{\varepsilon}_m$ is a unit polarization vector of the magnetic field as defined by (7.67). From the above, we see that the radiation electromagnetic waves in the wave zone are transverse waves that show the same properties as those of electromagnetic waves in a free space.

Now, let us evaluate a time-averaged energy flux from an oscillating dipole. Using (8.71), we have

$$\mathbf{S}(\theta) = \frac{1}{2}\mathbf{E}\times\mathbf{H}^{*} = \frac{1}{2}\cdot\frac{qa\omega^2 e^{i\omega\left(t - \frac{r}{c}\right)}\sin\theta}{2\pi\varepsilon_0 c^2 r}\cdot\frac{qa\omega^2 e^{-i\omega\left(t - \frac{r}{c}\right)}\sin\theta}{2\pi c r}\,\mathbf{n} = \frac{\omega^4\sin^2\theta}{8\pi^2\varepsilon_0 c^3 r^2}(qa)^2\,\mathbf{n}. \tag{9.70}$$

If we are thinking of an atom or a molecule in which the dipole consists of an electron and a positive charge that compensates it, $q$ is replaced with $e$ ($e < 0$). Then (9.70) reads as

$$\mathbf{S}(\theta) = \frac{\omega^4\sin^2\theta}{8\pi^2\varepsilon_0 c^3 r^2}(ea)^2\,\mathbf{n}. \tag{9.71}$$

Let us relate the above argument to the Einstein A and B coefficients. Since we are dealing with an isolated dipole, we might well suppose that the radiation comes from the spontaneous emission. Let $P$ be a total power of emission from the oscillating dipole that gets through a sphere of radius $r$. Then we have

$$P = \int \mathbf{S}(\theta)\cdot\mathbf{n}\,dS = \int_0^{2\pi}d\phi\int_0^{\pi}S(\theta)\,r^2\sin\theta\,d\theta = \frac{\omega^4}{8\pi^2\varepsilon_0 c^3}(ea)^2\int_0^{2\pi}d\phi\int_0^{\pi}\sin^3\theta\,d\theta. \tag{9.72}$$

Changing $\cos\theta$ to $t$, the integral $I \equiv \int_0^{\pi}\sin^3\theta\,d\theta$ can be converted into

$$I = \int_{-1}^{1}\left(1 - t^2\right)dt = 4/3. \tag{9.73}$$

Thus, we have

$$P = \frac{\omega^4}{3\pi\varepsilon_0 c^3}(ea)^2. \tag{9.74}$$

A probability of the spontaneous emission is given by $N_2 A_{12}$. Since we are dealing with a single dipole, $N_2$ can be put 1. Accordingly, an expected power of emission is $A_{12}\hbar\omega_{21}$. Replacing $\omega$ in (9.74) with $\omega_{21}$ in (9.55) and equating $A_{12}\hbar\omega_{21}$ to $P$, we get

$$A_{12} = \frac{\omega_{21}^{3}}{3\pi\varepsilon_0 c^3\hbar}(ea)^2. \tag{9.75}$$

From (9.54), we also get

$$B_{12} = \frac{\pi}{3\varepsilon_0\hbar^2}(ea)^2. \tag{9.76}$$
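The numbers behind (9.73) to (9.76) are easy to reproduce. The following sketch is an illustration under stated assumptions: the amplitude $a = 0.1$ nm and the wavelength of 500 nm are hypothetical values chosen only to give an order of magnitude, and the function names are not from the text.

```python
import math

EPS0 = 8.8541878128e-12     # vacuum permittivity [F/m]
C = 2.99792458e8            # speed of light in vacuum [m/s]
HBAR = 1.054571817e-34      # reduced Planck constant [J s]
E_CHARGE = 1.602176634e-19  # magnitude of the elementary charge [C]


def sin_cubed_integral(n=20000):
    """Midpoint-rule estimate of the integral of sin^3(theta) over [0, pi]."""
    h = math.pi / n
    return sum(math.sin((i + 0.5) * h) ** 3 for i in range(n)) * h


def radiated_power(omega, dipole):
    """Total power (9.74); dipole stands for the product e*a [C m]."""
    return omega ** 4 * dipole ** 2 / (3.0 * math.pi * EPS0 * C ** 3)


def einstein_a(omega, dipole):
    """Einstein A coefficient (9.75), i.e., P / (hbar * omega) [1/s]."""
    return radiated_power(omega, dipole) / (HBAR * omega)


def einstein_b(dipole):
    """Einstein B coefficient (9.76)."""
    return math.pi * dipole ** 2 / (3.0 * EPS0 * HBAR ** 2)


# Hypothetical dipole: amplitude 0.1 nm, transition wavelength 500 nm.
omega21 = 2.0 * math.pi * C / 500e-9
dipole = E_CHARGE * 1.0e-10
print("integral of sin^3:", sin_cubed_integral())
print("A12 [1/s]:", einstein_a(omega21, dipole))
print("radiative lifetime [s]:", 1.0 / einstein_a(omega21, dipole))
```

With these assumed values the spontaneous emission rate comes out in the range of $10^7$ to $10^8$ s$^{-1}$, that is, a lifetime of the order of ten nanoseconds.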

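In order to relate these results to those of quantum mechanics, we may replace $a^2$ in the above expressions with a square of an absolute value of the matrix elements of the position operator $\mathbf{r}$. That is, representing $|1\rangle$ and $|2\rangle$ as the quantum states of the ground and excited states of a two-level atom, we define $\langle 1|\mathbf{r}|2\rangle$ as the matrix element of the position operator. Relating $|\langle 1|\mathbf{r}|2\rangle|^2$ to $a^2$, we get

$$A_{12} = \frac{\omega_{21}^{3}e^2}{3\pi\varepsilon_0 c^3\hbar}\,|\langle 1|\mathbf{r}|2\rangle|^2 \quad \text{and} \quad B_{12} = \frac{\pi e^2}{3\varepsilon_0\hbar^2}\,|\langle 1|\mathbf{r}|2\rangle|^2. \tag{9.77}$$

From (9.77), we have

$$B_{12} = \frac{\pi e^2}{3\varepsilon_0\hbar^2}\langle 1|\mathbf{r}|2\rangle\langle 1|\mathbf{r}|2\rangle^{*} = \frac{\pi e^2}{3\varepsilon_0\hbar^2}\langle 1|\mathbf{r}|2\rangle\langle 2|\mathbf{r}^{\dagger}|1\rangle = \frac{\pi e^2}{3\varepsilon_0\hbar^2}\langle 1|\mathbf{r}|2\rangle\langle 2|\mathbf{r}|1\rangle, \tag{9.78}$$

where with the second equality we used (1.116); the last equality comes from the fact that $\mathbf{r}$ is Hermitian. Meanwhile, we have

$$B_{21} = \frac{\pi e^2}{3\varepsilon_0\hbar^2}\,|\langle 2|\mathbf{r}|1\rangle|^2 = \frac{\pi e^2}{3\varepsilon_0\hbar^2}\langle 2|\mathbf{r}|1\rangle\langle 1|\mathbf{r}^{\dagger}|2\rangle = \frac{\pi e^2}{3\varepsilon_0\hbar^2}\langle 2|\mathbf{r}|1\rangle\langle 1|\mathbf{r}|2\rangle. \tag{9.79}$$

Hence, we recover (9.41).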

9.5 Lasers

9.5.1 Brief Outlook

The concept of the two-level atoms proposed by Einstein is universal and independent of materials and can be utilized for some particular purposes. Actually, in later years many researchers tested that concept and verified its validity. After the basic research of the 1950s and 1960s, the fundamentals were put into practical use as various optical devices. Typical examples are masers and lasers, abbreviations for "microwave amplification by stimulated emission of radiation" and "light amplification by stimulated emission of radiation." Of these, lasers are common and particularly important nowadays. On the basis of the universality of the concept, a lot of materials including semiconductors and dyes are used in gaseous, liquid, and solid states.

Fig. 9.5 Rectangular parallelepiped of a laser medium with a length L and a cross-section area S (not shown). I(x) denotes irradiance at a point of x from the origin

Let us consider a rectangular parallelepiped of a laser medium with a length $L$ and cross-section area $S$ (Fig. 9.5). Suppose there are $N$ two-level atoms in the rectangular parallelepiped such that

$$N = N_1 + N_2, \tag{9.80}$$

where $N_1$ and $N_2$ represent the number of atoms occupying the ground and excited states, respectively. Suppose that light is propagated from the left of the rectangular parallelepiped and enters it. Then, we expect that three processes occur simultaneously. One is the stimulated absorption and the others are the stimulated emission and spontaneous emission. Through these processes, an increment $dE$ in photon energy of the total system (i.e., the rectangular parallelepiped) during $dt$ is described by

$$dE = \{N_2[B_{21}\,\rho(\omega_{21}) + A_{12}] - N_1 B_{21}\,\rho(\omega_{21})\}\,\hbar\omega_{21}\,dt. \tag{9.81}$$

In light of (9.39) and (9.40), a dimensionless quantity $dE/\hbar\omega_{21}$ represents the number of effective photon emission events that have occurred during $dt$. Since in lasers the stimulated emission is dominant, we shall forget about the spontaneous emission and rewrite (9.81) as

$$dE = \{N_2 B_{21}\,\rho(\omega_{21}) - N_1 B_{21}\,\rho(\omega_{21})\}\,\hbar\omega_{21}\,dt = B_{21}\,\rho(\omega_{21})(N_2 - N_1)\,\hbar\omega_{21}\,dt. \tag{9.82}$$

Under a thermal equilibrium, we have $N_2 < N_1$ on the basis of the Boltzmann distribution law, and so $dE < 0$. On this occasion, therefore, the photon energy decreases. For the light amplification to take place, we must have the following condition:

$$N_2 > N_1. \tag{9.83}$$
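How far thermal equilibrium is from satisfying (9.83) can be seen from the Boltzmann ratio (9.51). The sketch below is illustrative: the wavelength of 530 nm, the temperature of 300 K, and the populations passed to the second function are assumed values, and the function names are not from the text.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant [J s]
KB = 1.380649e-23       # Boltzmann constant [J/K]
C = 2.99792458e8        # speed of light in vacuum [m/s]


def boltzmann_ratio(wavelength, temperature):
    """Equilibrium ratio N2/N1 of (9.51) for a transition at a vacuum wavelength."""
    omega21 = 2.0 * math.pi * C / wavelength
    return math.exp(-HBAR * omega21 / (KB * temperature))


def energy_increment(n1, n2, b21_rho, omega21, dt):
    """Photon-energy increment dE of (9.82); b21_rho is the product B21*rho."""
    return b21_rho * (n2 - n1) * HBAR * omega21 * dt


# Assumed example: green light (530 nm) at room temperature (300 K).
ratio = boltzmann_ratio(530e-9, 300.0)
omega21 = 2.0 * math.pi * C / 530e-9
print("N2/N1 in equilibrium:", ratio)
print("dE, equilibrium:", energy_increment(1.0, ratio, 1.0, omega21, 1e-9))
print("dE, inverted   :", energy_increment(0.4, 0.6, 1.0, omega21, 1e-9))
```

In equilibrium the excited-state population is negligible for an optical transition, so $dE$ is negative; only a pumped population with $N_2 > N_1$ makes $dE$ positive.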

The distribution that satisfies (9.83) is called an inverted distribution or population inversion. Thus, the laser oscillation is a typical nonequilibrium phenomenon. To produce the population inversion, we need an external exciting source using an electrical or optical device.

The essence of lasers rests upon the fact that stimulated emission produces a photon that possesses a wavenumber vector ($\mathbf{k}$) and a polarization ($\boldsymbol{\varepsilon}$) both the same as those of an original photon. For this reason, the laser light is monochromatic and highly directional. To understand the fundamental mechanism underlying the related phenomena, interested readers are encouraged to seek appropriate literature on the quantum theory of light for further reading [4].

To make the discussion simple and straightforward, let us assume that the light is incident parallel to the long axis of the rectangular parallelepiped. Then, the stimulated emission produces light to be propagated in the same direction. As a result, an irradiance $I$ measured in that direction is described as

$$I = \frac{E}{SL}\,c'. \tag{9.84}$$

Note that the light velocity in the laser medium $c'$ is given by

$$c' = c/n, \tag{9.85}$$

where $n$ is a refractive index of the laser medium. Taking an infinitesimal of both sides of (9.84), we have

$$dI = dE\,\frac{c'}{SL} = B_{21}\,\rho(\omega_{21})(N_2 - N_1)\,\hbar\omega_{21}\,\frac{c'}{SL}\,dt = B_{21}\,\rho(\omega_{21})\,\tilde{N}\hbar\omega_{21}c'\,dt, \tag{9.86}$$

where $\tilde{N} \equiv (N_2 - N_1)/SL$ denotes a "net" density of atoms that occupy the excited state. The energy density $\rho(\omega_{21})$ can be written as

$$\rho(\omega_{21}) = I(\omega_{21})\,g(\omega_{21})/c', \tag{9.87}$$

where $I(\omega_{21})$ [J s$^{-1}$ m$^{-2}$] represents an intensity of radiation; $g(\omega_{21})$ is a gain function [s]. The gain function is a measure that shows how favorably (or unfavorably) the transition takes place at the said angular frequency $\omega_{21}$. This is normalized in the emission range such that

$$\int_0^{\infty} g(\omega)\,d\omega = 1.$$

The quantity $I(\omega_{21})$ is an energy flux that gets through per unit area per unit time. This flux corresponds to an energy contained in a long and thin rectangular parallelepiped of a length $c'$ and a unit cross-section area. To obtain $\rho(\omega_{21})$, $I(\omega_{21})$ should be divided by $c'$ in (9.87) accordingly. Using (9.87) and replacing $c'dt$ with a distance $dx$ and $I(\omega_{21})$ with $I(x)$ as a function of $x$, we rewrite (9.86) as

$$dI(x) = \frac{B_{21}\,g(\omega_{21})\,\tilde{N}\hbar\omega_{21}}{c'}\,I(x)\,dx. \tag{9.88}$$

Dividing (9.88) by $I(x)$ and integrating both sides, we have

$$\int_{I_0}^{I}\frac{dI(x)}{I(x)} = \int_{I_0}^{I} d\ln I(x) = \frac{B_{21}\,g(\omega_{21})\,\tilde{N}\hbar\omega_{21}}{c'}\int_0^{x}dx, \tag{9.89}$$

where $I_0$ is an irradiance of light at an instant when the light is entering the laser medium from the left. Thus, we get

$$I(x) = I_0\exp\left[\frac{B_{21}\,g(\omega_{21})\,\tilde{N}\hbar\omega_{21}}{c'}\,x\right]. \tag{9.90}$$
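The closed form (9.90) can be checked against a direct numerical integration of (9.88). In the sketch below the constant coefficient multiplying $I(x)\,dx$ in (9.88) is passed as a single number `gain`; its value (100 m$^{-1}$), the path length (1 cm), and the function names are assumptions made only for illustration.

```python
import math


def irradiance_exact(i0, gain, x):
    """Closed-form solution (9.90): I(x) = I0 * exp(gain * x)."""
    return i0 * math.exp(gain * x)


def irradiance_numeric(i0, gain, x, steps=1000):
    """Integrate dI/dx = gain * I, i.e., (9.88), with the classical RK4 scheme."""
    h = x / steps
    value = i0
    for _ in range(steps):
        k1 = gain * value
        k2 = gain * (value + 0.5 * h * k1)
        k3 = gain * (value + 0.5 * h * k2)
        k4 = gain * (value + h * k3)
        value += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return value


# Hypothetical numbers: coefficient 100 per metre over a 1 cm path.
gain, length = 100.0, 0.01
print(irradiance_exact(1.0, gain, length), irradiance_numeric(1.0, gain, length))
# A negative coefficient (N2 < N1) gives attenuation instead of amplification.
print(irradiance_exact(1.0, -gain, length))
```

The sign of the coefficient follows the sign of $\tilde{N}$, so the same expression describes absorption in an uninverted medium.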

Equation (9.90) shows that an irradiance of the laser light is augmented exponentially along the path of the laser light. In (9.90), denoting the coefficient of $x$ in the exponent as $G$,

$$G \equiv \frac{B_{21}\,g(\omega_{21})\,\tilde{N}\hbar\omega_{21}}{c'}, \tag{9.91}$$

we get

$$I(x) = I_0\exp Gx.$$

The constant $G$ is said to be a gain constant. This is an index that indicates the laser performance. Large numbers $B_{21}$, $g(\omega_{21})$, and $\tilde{N}$ yield a high performance of the laser.

In Sects. 8.8 and 9.2, we sought conditions for electromagnetic waves to cause constructive interference. In a one-dimensional dielectric medium, the condition is described as

$$kL = m\pi \quad \text{or} \quad m\lambda = 2L \quad (m = 1, 2, \cdots), \tag{9.92}$$

where $k$ and $\lambda$ denote a wavenumber and wavelength in the dielectric medium, respectively. Indexing $k$ and $\lambda$ with $m$ that represents a mode, we have

$$k_m L = m\pi \quad \text{or} \quad m\lambda_m = 2L \quad (m = 1, 2, \cdots). \tag{9.93}$$

This condition can be expressed in different manners such that

$$\omega_m = 2\pi\nu_m = 2\pi c'/\lambda_m = 2\pi c/n\lambda_m = m\pi c/nL. \tag{9.94}$$
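The mode condition (9.93) and the angular frequencies (9.94) translate directly into numbers. The sketch below assumes a constant refractive index (no dispersion); the cavity length of 50 µm, the index of 3.0, the mode numbers, and the function names are hypothetical.

```python
import math

C = 2.99792458e8  # speed of light in vacuum [m/s]


def mode_omega(m, n, length):
    """Angular frequency of the m-th longitudinal mode, (9.94)."""
    return m * math.pi * C / (n * length)


def mode_wavelength_in_medium(m, n, length):
    """Wavelength inside the medium, lambda_m = 2*pi*c'/omega_m with c' = c/n."""
    c_medium = C / n
    return 2.0 * math.pi * c_medium / mode_omega(m, n, length)


# Hypothetical cavity: L = 50 micrometres, constant refractive index n = 3.0.
n, length = 3.0, 50e-6
for m in (564, 565, 566):
    lam_vacuum = n * mode_wavelength_in_medium(m, n, length)
    print(m, mode_omega(m, n, length), lam_vacuum * 1e9, "nm (vacuum)")
print("mode spacing [rad/s]:", mode_omega(566, n, length) - mode_omega(565, n, length))
```

With a constant $n$ the modes are equally spaced by $\pi c/nL$ in angular frequency; the treatment that follows replaces $n$ in this spacing once the index depends on frequency.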

It is often the case that if the laser is a long and thin rod, rectangular parallelepiped, etc., sharply resolved and regularly spaced spectral lines are observed in the emission spectrum. These lines are referred to as a longitudinal multimode. The separation between two neighboring emission lines is referred to as the free spectral range [2]. If adjacent emission lines are clearly resolved so that the free spectral range can easily be recognized, we can derive useful information from the laser oscillation spectra (vide infra). Rewriting (9.94) as, e.g.,

$$\omega_m n = \frac{\pi c}{L}\,m, \tag{9.95}$$

and taking the differential (or variation) of both sides, we get

$$n\,\delta\omega_m + \omega_m\,\delta n = n\,\delta\omega_m + \omega_m\frac{\delta n}{\delta\omega_m}\,\delta\omega_m = \left(n + \omega_m\frac{\delta n}{\delta\omega_m}\right)\delta\omega_m = \frac{\pi c}{L}\,\delta m. \tag{9.96}$$

Therefore, we get

$$\delta\omega_m = \frac{\pi c}{L}\left(n + \omega_m\frac{\delta n}{\delta\omega_m}\right)^{-1}\delta m. \tag{9.97}$$

Equation (9.97) premises the wavelength dispersion of a refractive index of a laser medium. Here, the wavelength dispersion means that the refractive index varies as a function of wavelengths of light in a matter. The laser materials often have a considerably large dispersion and relevant information is indispensable. From (9.97), we find that

$$n_g \equiv n + \omega_m\frac{\delta n}{\delta\omega_m} \tag{9.98}$$

plays a role of refractive index when the laser material has a wavelength dispersion. The quantity $n_g$ is said to be a group refractive index (or group index). Thus, (9.97) is rewritten as

$$\delta\omega = \frac{\pi c}{L n_g}\,\delta m, \tag{9.99}$$

where we omitted the index $m$ of $\omega_m$. When we need to distinguish the refractive index $n$ clearly from the group refractive index, we refer to $n$ as a phase refractive index. Rewriting (9.98) as a relation of continuous quantities and using differentiation instead of variation, we have [2]

$$n_g = n + \omega\frac{dn}{d\omega} \quad \text{or} \quad n_g = n - \lambda\frac{dn}{d\lambda}. \tag{9.100}$$

To derive the second equation of (9.100), we used the following relations: Namely, taking a variation of $\lambda\omega = 2\pi c$, we have

$$\omega\,d\lambda + \lambda\,d\omega = 0 \quad \text{or} \quad \frac{\omega}{d\omega} = -\frac{\lambda}{d\lambda}.$$

Several formulae or relation equations were proposed to describe the wavelength dispersion. One of the famous and useful formulae among them is Sellmeier's dispersion formula [5]. As an example, Sellmeier's dispersion formula can be described as

$$n = \sqrt{A + \frac{B}{1 - \left(\frac{C}{\lambda}\right)^2}}, \tag{9.101}$$

where $A$, $B$, and $C$ are appropriate constants with $A$ and $B$ being dimensionless and $C$ having a dimension [m]. In an actual case, it would be difficult to determine $n$ analytically. However, if we are able to obtain well-resolved spectra, $\delta m$ can be put as 1 and $\delta\omega_m$ can be determined from the free spectral range. Expressing it as $\delta\omega_{\mathrm{FSR}}$, from (9.99) we have

$$\delta\omega_{\mathrm{FSR}} = \frac{\pi c}{L n_g} \quad \text{or} \quad n_g = \frac{\pi c}{L\,(\delta\omega_{\mathrm{FSR}})}. \tag{9.102}$$

Thus, one can determine $n_g$ as a function of wavelengths.
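In practice (9.102) is applied to pairs of adjacent lines read off a spectrum. The following sketch shows that step under stated assumptions: the two line positions (529.8 nm and 530.2 nm), the cavity length (50 µm), and the function name are hypothetical, and the wavelengths are treated as vacuum wavelengths so that $\lambda\omega = 2\pi c$ applies.

```python
import math

C = 2.99792458e8  # speed of light in vacuum [m/s]


def group_index_from_lines(lambda_a, lambda_b, length):
    """Group index from two adjacent emission lines via (9.102).

    lambda_a, lambda_b: vacuum wavelengths of neighbouring lines (delta m = 1) [m]
    length: resonator length L [m]
    """
    omega_a = 2.0 * math.pi * C / lambda_a
    omega_b = 2.0 * math.pi * C / lambda_b
    delta_omega_fsr = abs(omega_a - omega_b)
    return math.pi * C / (length * delta_omega_fsr)


# Hypothetical reading: lines at 529.8 nm and 530.2 nm, L = 50 micrometres.
ng = group_index_from_lines(529.8e-9, 530.2e-9, 50e-6)
print("group index near 530 nm:", ng)
```

Written in wavelengths, the same relation reads $n_g = \lambda_a\lambda_b / (2L\,|\lambda_b - \lambda_a|)$; repeating the evaluation for each neighboring pair of lines gives $n_g$ as a function of wavelength.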

9.5.2 Organic Lasers

By virtue of the prominent features of lasers and related phenomena, researchers and engineers have been developing and proposing various device structures and their operation principles in the research field of device physics. Of these, organic lasers are newly emerging laser devices. Organic crystals possess a well-defined structure–property relationship and some of them exhibit peculiar light-emitting features. Therefore, those crystals are suited for studying their lasing property. In this section we study the light-emitting properties (especially the lasing property) of the organic crystals in relation to the wavelength dispersion of refractive index of materials. From the point of view of device physics, another key issue lies in designing an efficient diffraction grating or resonator. In the following tangible examples, we further investigate specific aspects of the light-emitting properties of the organic crystals in the slab waveguide configurations (see Sect. 8.8) to pursue fundamental properties of the organic light-emitting materials and incorporate them into high-performance devices.

Example 9.1 [6] Figure 9.6 [6] displays broadband emission spectra of a crystal consisting of an organic semiconductor AC'7. As another example, Fig. 9.7 [6] displays a laser oscillation spectrum of an AC'7 crystal. The structural formula of AC'7 is shown in Fig. 9.8 together with other related organic semiconductors.

Fig. 9.6 Broadband emission spectra of an organic semiconductor crystal AC'7 (intensity in 10³ counts versus wavelength in nm). (a) Full spectrum. (b) Enlarged profile of the spectrum around 530 nm. Reproduced from Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission of AIP Publishing. https://doi.org/10.1063/1.3634117

Fig. 9.7 Laser oscillation spectrum of an organic semiconductor crystal AC'7 (intensity in 10³ counts versus wavelength in nm). Reproduced from Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission of AIP Publishing. https://doi.org/10.1063/1.3634117

Once we choose an empirical formula of the wavelength dispersion [e.g., (9.101)], we can determine the constants of that empirical formula by comparing it with experimentally determined data. For such data, laser oscillation spectra (Fig. 9.7) were used in addition to the broadband emission spectra. This is because the laser oscillation spectra are essentially identical to Fig. 9.6 in the sense that both the broadband and laser emission lines gave the same free spectral range. Inserting (9.101) into (9.100) and expressing $n_g$ as a function of $\lambda$, Yamao et al. got the following expression [6]:

$$n_g = \frac{A\left[1 - \left(\frac{C}{\lambda}\right)^2\right]^2 + B}{\sqrt{\left[1 - \left(\frac{C}{\lambda}\right)^2\right]^3}\,\sqrt{A\left[1 - \left(\frac{C}{\lambda}\right)^2\right] + B}}. \tag{9.103}$$

Fig. 9.8 Structural formulae of several organic semiconductors BP1T, AC5, and AC'7
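The relation between (9.101), (9.100), and (9.103) can be verified numerically. The sketch below evaluates both closed forms and compares (9.103) with $n - \lambda\,dn/d\lambda$ obtained from a central difference; the constants are those listed for AC'7 in Table 9.1, while the function names, the evaluation wavelength of 530 nm, and the step size are assumptions.

```python
import math


def sellmeier_n(lam, a, b, c):
    """Phase refractive index from Sellmeier's formula (9.101).

    lam and c must be given in the same length unit (here nm).
    """
    u = 1.0 - (c / lam) ** 2
    return math.sqrt(a + b / u)


def group_index(lam, a, b, c):
    """Group refractive index from the closed form (9.103)."""
    u = 1.0 - (c / lam) ** 2
    return (a * u * u + b) / (math.sqrt(u ** 3) * math.sqrt(a * u + b))


def group_index_numeric(lam, a, b, c, h=1e-3):
    """n - lam * dn/dlam of (9.100), with dn/dlam from a central difference."""
    slope = (sellmeier_n(lam + h, a, b, c) - sellmeier_n(lam - h, a, b, c)) / (2.0 * h)
    return sellmeier_n(lam, a, b, c) - lam * slope


# Constants for AC'7 as listed in Table 9.1 (C in nm); 530 nm is an assumed test point.
A, B, C_NM = 6.0, 1.06, 452.0
print("n   :", sellmeier_n(530.0, A, B, C_NM))
print("n_g :", group_index(530.0, A, B, C_NM))
print("n_g (numerical derivative):", group_index_numeric(530.0, A, B, C_NM))
```

Because $dn/d\lambda < 0$ on the long-wavelength side of $C$, the group index exceeds the phase index, and the gap widens as $\lambda$ approaches $C$.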

Once the optimum constants $A$, $B$, and $C$ are determined, they yield a reliable dispersion formula in (9.101). Numerical calculations can be utilized effectively. The procedures are as follows:

(i) Tentatively choosing probable numbers for $A$, $B$, and $C$ for (9.103), $n_g$ can be expressed as a function of $\lambda$.

(ii) The resulting fitting curve is then compared with the $n_g$ data experimentally determined from (9.102). After this procedure, one can choose another set of $A$, $B$, and $C$ and again compare the fitting curve with the experimental data.

(iii) This procedure can be repeated many times through iterative numerical computations of (9.103) using different sets of $A$, $B$, and $C$.

Thus, we should be able to adjust and determine a better and better combination of $A$, $B$, and $C$ so that the refined function (9.103) can reproduce the experimental results as precisely as desired. At the same time, we can determine the most reliable combination of $A$, $B$, and $C$ for the dispersion formula of (9.101). Figure 9.9 [6] shows several examples of the wavelength dispersion for organic semiconductor crystals. Optimized constants $A$, $B$, and $C$ of (9.101) are listed in Table 9.1 [6]. The formulae (9.101) and (9.103) along with the associated procedures to determine the constants $A$, $B$, and $C$ are expected to apply to various laser and light-emitting materials consisting of semiconducting inorganic and organic materials.

Example 9.2 [7] If we wish to construct a laser device, it is highly desirable to equip the laser material with a suitable diffraction grating or resonator [8, 9]. In that case, besides the information about the dispersion of the phase refractive index, we need numerical data of the propagation constant. The propagation constant has appeared in Sect. 8.7.1 and is defined as


Fig. 9.9 Examples of the wavelength dispersion of (a) group indices and (b) refractive indices for several organic semiconductor crystals. Reproduced from Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission of AIP Publishing. https://doi.org/10.1063/1.3634117

[Figure: (a) group index $n_g$ and (b) refractive index n plotted against wavelength (nm) for BP1T, AC5, and AC'7]

Table 9.1 Optimized constants A, B, and C for Sellmeier's dispersion formula (9.101) with several organic semiconductor crystals (a)

Material    A      B       C (nm)
BP1T        5.7    1.04    397
AC5         3.9    1.44    402
AC'7        6.0    1.06    452

(a) Reproduced from Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5): 053113/7 pages, with the permission of AIP Publishing. https://doi.org/10.1063/1.3634117

$$\beta = k\sin\theta = n k_0 \sin\theta. \tag{8.153}$$

As the phase refractive index n has a dispersion, the propagation constant β has a dispersion as well. In optics, n sin θ is often referred to as an effective index. We denote it by

$$n_{\mathrm{eff}} \equiv n\sin\theta \quad \text{or} \quad \beta = k_0 n_{\mathrm{eff}}. \tag{9.104}$$

As 0 ≤ θ ≤ π/2, we have n sin θ ≤ n. In general, it is difficult to obtain an analytical description or solution for the dispersion of both the phase refractive index and the effective index. As in Example 9.1, we usually obtain the relevant data by numerical calculations.
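The iterative fitting procedure of Example 9.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the phase index is assumed to follow the Sellmeier-type form n = sqrt(A + B/[1 − (C/λ)²]), which is the form consistent with the group index (9.103); the "measured" data are synthetic values generated from the BP1T row of Table 9.1; and the helper names (`phase_index`, `group_index`, `misfit`, `fit_constants`) as well as the simple step-halving search are hypothetical choices, not the procedure used in [6].

```python
import math

def phase_index(lam, A, B, C):
    """Assumed Sellmeier-type phase index n(lambda); lam and C in nm."""
    u = 1.0 - (C / lam) ** 2
    return math.sqrt(A + B / u)

def group_index(lam, A, B, C):
    """Group index n_g = n - lam * dn/dlam, written in the form of (9.103)."""
    u = 1.0 - (C / lam) ** 2
    return (A * u * u + B) / math.sqrt(u ** 3 * (A * u + B))

def misfit(params, data):
    """Sum of squared differences between the model and the measured n_g."""
    A, B, C = params
    return sum((group_index(lam, A, B, C) - ng) ** 2 for lam, ng in data)

def fit_constants(data, start, steps, rounds=300):
    """Steps (i)-(iii): try neighbouring (A, B, C), keep any improvement,
    and halve the trial steps when no neighbour improves the fit."""
    params, steps = list(start), list(steps)
    best = misfit(params, data)
    for _ in range(rounds):
        improved = False
        for i in range(3):
            for sign in (1.0, -1.0):
                trial = list(params)
                trial[i] += sign * steps[i]
                try:
                    err = misfit(trial, data)
                except (ValueError, ZeroDivisionError):
                    continue  # trial left the physical range (e.g. C >= lam)
                if err < best:
                    params, best, improved = trial, err, True
        if not improved:
            steps = [s / 2.0 for s in steps]
    return params, best

# Synthetic "measurements" from the BP1T constants of Table 9.1
# (illustrative only; real data would come from (9.102)).
TRUE = (5.7, 1.04, 397.0)
DATA = [(float(lam), group_index(lam, *TRUE)) for lam in range(500, 660, 10)]
START = (5.0, 1.0, 380.0)

if __name__ == "__main__":
    fitted, err = fit_constants(DATA, START, steps=(0.5, 0.2, 10.0))
    print("fitted A, B, C:", fitted, "misfit:", err)
```

A coordinate search was chosen only because it mirrors the trial-and-compare wording of the text; any least-squares routine could take its place.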


[Figure 9.10 panel labels: (a) P6T; (b) AZO grating, scale bar 2 µm; (c) air, P6T crystal, optical device]

Fig. 9.10 Construction of an organic light-emitting device. (a) Structural formula of P6T. (b) Diffraction grating of an organic device. The purple arrow indicates the direction of grating wavevector K. Reproduced from Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y, Yamashita K, Ura S, Hotta S (2018) Design principle of high-performance organic single-crystal light-emitting devices. J Appl Phys 123(23): 235501/13 pages [7], with the permission of AIP Publishing. https://doi.org/10.1063/1.5030486. (c) Cross-section of P6T crystal/AZO substrate

Under these circumstances, to establish a design principle of high-performance organic light-emitting devices, Yamao et al. [7] developed a light-emitting device in which an organic crystal of P6T (see the structural formula in Fig. 9.10a) was placed onto a diffraction grating (Fig. 9.10b [7]). The compound P6T belongs to the family of thiophene/phenylene co-oligomers (TPCOs), together with BP1T, AC5, AC'7, etc. that appear in Fig. 9.8 [10, 11]. The diffraction grating was engraved on an aluminum-doped zinc oxide (AZO) substrate using a focused ion beam (FIB) apparatus. The resulting substrate was then laminated with the thin crystal of P6T (Fig. 9.10c). Using those devices, Yamao et al. excited the P6T crystal with ultraviolet light from a mercury lamp, collected the emissions from the crystal, and analyzed the emission spectra (vide infra). Detailed analysis of those spectra revealed a close relationship between the peak location of the emission lines and the emission direction. Thus, analyzing the angle-dependent emission spectra turns out to be a powerful tool for accurately determining the dispersion of the propagation constant of the laser material.

In a light-emitting device, we must consider the problem of out-coupling of light. Seeking efficient out-coupling of light is equivalent to solving the equations of electromagnetic wave motion inside and outside the slab crystal under appropriate boundary conditions. The boundary conditions are given such that the tangential components of the electromagnetic fields are continuous across the interface formed by the slab crystal and air or a device substrate.

Fig. 9.11 Geometrical relationship among the light propagation in crystal (indicated with β), the propagation outside the crystal (k0), and the grating wavevector (K). This geometry represents the phase matching between the emission inside the device (organic slab crystal) and out-coupled emission (i.e., the emission outside the device); see text

Moreover, if a diffraction grating is present in an optical device (including the laser), the diffraction grating influences the emission characteristics of the device. In other words, regarding the out-coupling of light we must impose the following condition on the device [7]:

$$\boldsymbol{\beta} - m\boldsymbol{K} = (\boldsymbol{k}_0 \cdot \tilde{\boldsymbol{e}})\,\tilde{\boldsymbol{e}}, \tag{9.105}$$

where $\boldsymbol{\beta}$ is the propagation vector with $|\boldsymbol{\beta}| \equiv \beta$ that appeared in (8.153). The quantity β is called a propagation constant. Equation (9.105) can be visualized in Fig. 9.11 as the geometrical relationship among the light propagation in crystal (indicated with $\boldsymbol{\beta}$), the propagation outside the crystal ($\boldsymbol{k}_0$), and the grating wavevector ($\boldsymbol{K}$). The grating wavevector $\boldsymbol{K}$ is defined by $|\boldsymbol{K}| \equiv K = 2\pi/\Lambda$, where Λ is the grating period and the direction of $\boldsymbol{K}$ is perpendicular to the grating grooves; that direction is indicated in Fig. 9.10b with a purple arrow. The unit vector $\tilde{\boldsymbol{e}}$ is defined as

$$\tilde{\boldsymbol{e}} = \frac{\boldsymbol{\beta} - m\boldsymbol{K}}{|\boldsymbol{\beta} - m\boldsymbol{K}|}.$$

Namely, $\tilde{\boldsymbol{e}}$ is oriented in the direction parallel to $\boldsymbol{\beta} - m\boldsymbol{K}$. In (9.105), furthermore, m is the diffraction order, a natural number (i.e., a positive integer); $\boldsymbol{k}_0$ denotes the wavenumber vector of an emission in vacuum with

$$|\boldsymbol{k}_0| \equiv k_0 = 2\pi/\lambda_p, \tag{9.106}$$

where λp is the emission peak wavelength in vacuum. Note that k0 is almost identical to the wavenumber of the emission in air (i.e., the emission to be detected). If we are dealing with the emission parallel to the substrate plane, i.e., grazing emission, (9.105) can be reduced to


$$\boldsymbol{\beta} - m\boldsymbol{K} = \boldsymbol{k}_0. \tag{9.107}$$

Equations (9.105) and (9.107) represent the relationship between $\boldsymbol{\beta}$ and $\boldsymbol{k}_0$, i.e., the phase matching conditions between the emission inside the device (organic slab crystal) and the out-coupled emission, i.e., the emission outside the device (in air). Thus, the boundary conditions can be restated as (i) the tangential component continuity of electromagnetic fields at both the planes of interface (see Sect. 8.1) and (ii) the phase matching of the electromagnetic fields inside and outside the laser medium. We examine these two factors below.

(i) Tangential component continuity of electromagnetic fields: Let us suppose that we are dealing with a dielectric medium with an anisotropic dielectric constant but an isotropic magnetic permeability. In this situation, Maxwell's equations must be formulated so that we can deal with the electromagnetic properties in an anisotropic medium. Such substances are widely available, and organic crystals are counted as typical examples. Among those crystals, P6T crystallizes in the monoclinic system, as do many other organic crystals [10, 11]. In this case, the permittivity tensor (or electric permittivity tensor) ε is written as [7]

$$\varepsilon = \begin{pmatrix} \varepsilon_{aa} & 0 & \varepsilon_{ac^*} \\ 0 & \varepsilon_{bb} & 0 \\ \varepsilon_{c^*a} & 0 & \varepsilon_{c^*c^*} \end{pmatrix}. \tag{9.108}$$

Notice that in (9.108) the permittivity tensor is described in the orthogonal coordinate of the abc*-system, where a and b coincide with the crystallographic a- and b-axes of P6T and the c*-axis is perpendicular to the ab-plane. Note that in many organic crystals the c-axis is not perpendicular to the ab-plane. Meanwhile, we assume that the magnetic permeability μ of P6T is isotropic and identical to that of vacuum. That is, in a tensor form we have

$$\mu = \begin{pmatrix} \mu_0 & 0 & 0 \\ 0 & \mu_0 & 0 \\ 0 & 0 & \mu_0 \end{pmatrix}, \tag{9.109}$$

where $\mu_0$ is the magnetic permeability of vacuum. Then, Maxwell's equations in a matrix form read as

$$\mathrm{rot}\,\boldsymbol{E} = -\mu_0 \frac{\partial \boldsymbol{H}}{\partial t}, \tag{9.110}$$


Fig. 9.12 Laboratory coordinate system (the ξηζ-system) for the device experiments. Regarding the symbols and notations, see text

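$$\mathrm{rot}\,\boldsymbol{H} = \varepsilon_0 \begin{pmatrix} \varepsilon_{aa} & 0 & \varepsilon_{ac^*} \\ 0 & \varepsilon_{bb} & 0 \\ \varepsilon_{c^*a} & 0 & \varepsilon_{c^*c^*} \end{pmatrix} \frac{\partial \boldsymbol{E}}{\partial t}. \tag{9.111}$$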

In (9.111) the equation is described in the abc*-system. It is desirable, however, to describe Maxwell's equations in the laboratory coordinate system (see Fig. 9.12) so that we can readily visualize the light propagation in an anisotropic crystal such as P6T. Figure 9.12 depicts the ξηζ-system, where the ξ-axis is in the direction of the light propagation within the crystal; namely, the ξ-axis parallels $\boldsymbol{\beta}$ in (9.107). We assume that the ξηζ-system forms an orthogonal coordinate. Here we define the dielectric constant ellipsoid $\tilde{\varepsilon}$ described by

$$(\xi \;\; \eta \;\; \zeta)\, \tilde{\varepsilon} \begin{pmatrix} \xi \\ \eta \\ \zeta \end{pmatrix} = 1, \tag{9.112}$$

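which represents an ellipsoid in the ξηζ-system. If one cuts the ellipsoid by a plane that includes the origin of the coordinate system and is perpendicular to the ξ-axis, its cross-section is an ellipse. In this situation, one can choose the η- and ζ-axes as the principal axes of the ellipse. This implies that when we put ξ = 0, we must have

$$(0 \;\; \eta \;\; \zeta)\, \tilde{\varepsilon} \begin{pmatrix} 0 \\ \eta \\ \zeta \end{pmatrix} = \varepsilon_{\eta\eta}\eta^2 + \varepsilon_{\zeta\zeta}\zeta^2 = 1. \tag{9.113}$$

In other words, we must have $\tilde{\varepsilon}$ in the form of

$$\tilde{\varepsilon} = \begin{pmatrix} \varepsilon_{\xi\xi} & \varepsilon_{\xi\eta} & \varepsilon_{\xi\zeta} \\ \varepsilon_{\eta\xi} & \varepsilon_{\eta\eta} & 0 \\ \varepsilon_{\zeta\xi} & 0 & \varepsilon_{\zeta\zeta} \end{pmatrix},$$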

where $\tilde{\varepsilon}$ is expressed in the ξηζ-system. In terms of matrix algebra, the principal submatrix with respect to the η- and ζ-components should be diagonalized (see Sect. 14.5). On this condition, the electric flux density of the electromagnetic wave is polarized in the η- and ζ-directions. Half the length of each principal axis of the ellipse is $1/\sqrt{\varepsilon_{\eta\eta}}$ or $1/\sqrt{\varepsilon_{\zeta\zeta}}$, i.e., the reciprocal of the corresponding anisotropic refractive index. Suppose that the ξηζ-system is reached by two successive rotations starting from the abc*-system in such a way that we first perform the rotation by α around the c*-axis, followed by another rotation by δ around the ξ-axis (Fig. 9.12). Then we have [7]

$$\begin{pmatrix} \varepsilon_{\xi\xi} & \varepsilon_{\xi\eta} & \varepsilon_{\xi\zeta} \\ \varepsilon_{\eta\xi} & \varepsilon_{\eta\eta} & 0 \\ \varepsilon_{\zeta\xi} & 0 & \varepsilon_{\zeta\zeta} \end{pmatrix} = R_{\xi}^{-1}(\delta)\, R_{c^*}^{-1}(\alpha) \begin{pmatrix} \varepsilon_{aa} & 0 & \varepsilon_{ac^*} \\ 0 & \varepsilon_{bb} & 0 \\ \varepsilon_{c^*a} & 0 & \varepsilon_{c^*c^*} \end{pmatrix} R_{c^*}(\alpha)\, R_{\xi}(\delta), \tag{9.114}$$

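where $R_{c^*}(\alpha)$ and $R_{\xi}(\delta)$ stand for the first and second rotation matrices of the above, respectively. In (9.114) we have

$$R_{c^*}(\alpha) = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad R_{\xi}(\delta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\delta & -\sin\delta \\ 0 & \sin\delta & \cos\delta \end{pmatrix}. \tag{9.115}$$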
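As a numerical illustration of (9.114) and (9.115), the following Python sketch rotates a permittivity tensor of the form (9.108) into the laboratory frame and picks δ so that the η-ζ off-diagonal element vanishes. The tensor entries and the angle α are hypothetical numbers chosen only for the demonstration, the helper names are not from the source, and the sign convention of the rotation matrices is the standard one assumed above.

```python
import math

def matmul(X, Y):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(X):
    return [[X[j][i] for j in range(3)] for i in range(3)]

def rot_cstar(alpha):
    """R_{c*}(alpha) of (9.115): rotation about the third axis."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_xi(delta):
    """R_xi(delta) of (9.115): rotation about the first axis."""
    c, s = math.cos(delta), math.sin(delta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def similarity(eps, R):
    """R^{-1} eps R; for a rotation matrix the inverse is the transpose."""
    return matmul(transpose(R), matmul(eps, R))

def lab_tensor(eps_abc, alpha):
    """Apply (9.114): rotate by alpha, then choose delta so that the
    eta-zeta element becomes zero (delta therefore depends on alpha)."""
    step1 = similarity(eps_abc, rot_cstar(alpha))
    p, q, r = step1[1][1], step1[1][2], step1[2][2]
    delta = 0.5 * math.atan2(2.0 * q, p - r)
    return similarity(step1, rot_xi(delta)), delta

# Hypothetical relative permittivities in the abc*-system, form of (9.108).
EPS_ABC = [[2.5, 0.0, 0.6],
           [0.0, 2.8, 0.0],
           [0.6, 0.0, 7.0]]

if __name__ == "__main__":
    eps_lab, delta = lab_tensor(EPS_ABC, alpha=0.3)
    print("delta =", delta)
    for row in eps_lab:
        print(row)
```

In this sketch the remaining off-diagonal element $\varepsilon_{\xi\eta}$ is generally nonzero; neglecting it is the additional approximation made in the text that follows.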

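For the matrix representations and related coordinate transformations in (9.114) and (9.115), see Sects. 11.4, 17.1, and 17.4.2. Note that the rotation angle δ cannot be decided independently of α. In the literature [7], furthermore, the quantity $\varepsilon_{\xi\eta} = \varepsilon_{\eta\xi}$ was ignored as small compared to the other components. As a result, (9.111) is converted into the following equation:

$$\mathrm{rot}\,\boldsymbol{H} = \varepsilon_0 \begin{pmatrix} \varepsilon_{\xi\xi} & 0 & \varepsilon_{\xi\zeta} \\ 0 & \varepsilon_{\eta\eta} & 0 \\ \varepsilon_{\zeta\xi} & 0 & \varepsilon_{\zeta\zeta} \end{pmatrix} \frac{\partial \boldsymbol{E}}{\partial t}. \tag{9.116}$$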

Notice here that the electromagnetic quantities H and E are measured in the ξηζ-coordinate system.


Meanwhile, the electromagnetic waves that are propagating within the slab crystal are described as

$$\Phi_\nu \exp[i(\beta\xi - \omega t)], \tag{9.117}$$

where $\Phi_\nu$ stands for either an electric field or a magnetic field, with ν chosen from ν = 1, 2, 3 representing each component of the ξηζ-system. Inserting (9.117) into (9.116) and separating the resulting equation into ξ, η, and ζ components, we obtain six equations with respect to the six components of H and E. Of these, we are particularly interested in $E_\xi$ and $E_\zeta$ as well as $H_\eta$, because we assume that the electromagnetic wave is propagated as a TM mode [7]. Using the set of the above six equations, for $H_\eta$ in the P6T crystal we get the following second-order linear differential equation (SOLDE):

$$\frac{d^2 H_\eta}{d\zeta^2} + 2 i n_{\mathrm{eff}} k_0 \frac{\varepsilon_{\xi\zeta}}{\varepsilon_{\zeta\zeta}} \frac{dH_\eta}{d\zeta} + \left( \varepsilon_{\xi\xi} - \frac{\varepsilon_{\xi\zeta}^{2}}{\varepsilon_{\zeta\zeta}} - \frac{\varepsilon_{\xi\xi}}{\varepsilon_{\zeta\zeta}} n_{\mathrm{eff}}^{2} \right) k_0^{2} H_\eta = 0, \tag{9.118}$$

where $n_{\mathrm{eff}} = n \sin\theta$ in (8.153) is the effective index. Assuming as a solution of (9.118)

$$H_\eta = H_{\mathrm{cryst}}\, e^{i\kappa\zeta} \cos(k\zeta + \phi_s) \quad (\kappa \ne 0,\; k \ne 0;\; H_{\mathrm{cryst}}, \phi_s: \text{constant}) \tag{9.119}$$

and inserting (9.119) into (9.118), we get an equation of the following type:

$$Z e^{i\kappa\zeta} \cos(k\zeta + \phi_s) + \Omega e^{i\kappa\zeta} \sin(k\zeta + \phi_s) = 0. \tag{9.120}$$

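In (9.119) $H_{\mathrm{cryst}}$ represents the magnetic field within the P6T crystal. The constants Z and Ω in (9.120) can be described using κ and k together with the constant coefficients of $dH_\eta/d\zeta$ and $H_\eta$ of the SOLDE (9.118). Meanwhile, we have

$$\begin{vmatrix} e^{i\kappa\zeta}\cos(k\zeta + \phi_s) & e^{i\kappa\zeta}\sin(k\zeta + \phi_s) \\ \left[e^{i\kappa\zeta}\cos(k\zeta + \phi_s)\right]' & \left[e^{i\kappa\zeta}\sin(k\zeta + \phi_s)\right]' \end{vmatrix} = k e^{2i\kappa\zeta} \ne 0, \tag{9.121}$$

where the prime denotes differentiation with respect to ζ. This shows that $e^{i\kappa\zeta}\cos(k\zeta + \phi_s)$ and $e^{i\kappa\zeta}\sin(k\zeta + \phi_s)$ are linearly independent. This in turn implies that Z ≡ Ω ≡ 0 in (9.120). Thus, from (9.120) and (9.121) we can determine κ and k such that

$$\kappa = -\beta \frac{\varepsilon_{\xi\zeta}}{\varepsilon_{\zeta\zeta}} = -n_{\mathrm{eff}} k_0 \frac{\varepsilon_{\xi\zeta}}{\varepsilon_{\zeta\zeta}}, \quad k^2 = \frac{1}{\varepsilon_{\zeta\zeta}^{2}} \left( \varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^{2} \right) \left( \varepsilon_{\zeta\zeta} - n_{\mathrm{eff}}^{2} \right) k_0^{2}. \tag{9.122}$$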
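One way to check (9.122) is to insert the trial solution (9.119) into the SOLDE (9.118) numerically and confirm that the residual vanishes. The sketch below does this with complex arithmetic; the permittivity components, effective index, wavelength, and phase $\phi_s$ are hypothetical values used only for the test, and the function names are illustrative.

```python
import cmath
import math

def kappa_and_k(e_xx, e_zz, e_xz, n_eff, k0):
    """kappa and k of (9.122); e_xx, e_zz, e_xz stand for the
    xi-xi, zeta-zeta, and xi-zeta permittivity components."""
    kappa = -n_eff * k0 * e_xz / e_zz
    k_sq = (e_xx * e_zz - e_xz ** 2) * (e_zz - n_eff ** 2) * k0 ** 2 / e_zz ** 2
    return kappa, math.sqrt(k_sq)

def solde_residual(zeta, e_xx, e_zz, e_xz, n_eff, k0, phi_s=0.4):
    """Left-hand side of (9.118) evaluated for the ansatz (9.119), H_cryst = 1."""
    kappa, k = kappa_and_k(e_xx, e_zz, e_xz, n_eff, k0)
    phase = cmath.exp(1j * kappa * zeta)
    c = math.cos(k * zeta + phi_s)
    s = math.sin(k * zeta + phi_s)
    H = phase * c
    dH = phase * (1j * kappa * c - k * s)                         # first derivative
    d2H = phase * (-(kappa ** 2 + k ** 2) * c - 2j * kappa * k * s)  # second derivative
    a1 = 2j * n_eff * k0 * e_xz / e_zz
    a0 = (e_xx - e_xz ** 2 / e_zz - e_xx * n_eff ** 2 / e_zz) * k0 ** 2
    return d2H + a1 * dH + a0 * H

# Hypothetical parameters: lengths in nm, wavenumbers in 1/nm.
PARAMS = dict(e_xx=7.0, e_zz=7.5, e_xz=0.5, n_eff=2.3, k0=2.0 * math.pi / 660.0)

if __name__ == "__main__":
    for zeta in (0.0, 100.0, 350.0):
        print(zeta, abs(solde_residual(zeta, **PARAMS)))
```

The residual stays at rounding-error level for any ζ, which is the numerical counterpart of Z ≡ Ω ≡ 0.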

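Further using the aforementioned six equations to eliminate $E_\zeta$, we obtain

$$E_\xi = i \frac{\varepsilon_{\zeta\zeta}}{\omega\varepsilon_0 \left( \varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^{2} \right)}\, k H_{\mathrm{cryst}}\, e^{i\kappa\zeta} \sin(k\zeta + \phi_s). \tag{9.123}$$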

Equations (9.119) and (9.123) define the electromagnetic fields as a function of ζ within the P6T slab crystal. Meanwhile, the fields in the AZO substrate and air can readily be determined. Assuming that both the substances are isotropic, we have $\varepsilon_{\xi\zeta} = 0$ and $\varepsilon_{\xi\xi} = \varepsilon_{\zeta\zeta} \equiv \tilde{n}^2$ and, hence, we get

$$\frac{d^2 H_\eta}{d\zeta^2} + \left( \tilde{n}^2 - n_{\mathrm{eff}}^{2} \right) k_0^{2} H_\eta = 0, \tag{9.124}$$

where $\tilde{n}$ is the refractive index of either the substrate or air. Since

$$\tilde{n} < n_{\mathrm{eff}} \le n, \tag{9.125}$$

where n is the phase refractive index of the P6T crystal, we have $\tilde{n}^2 - n_{\mathrm{eff}}^{2} < 0$. Defining a quantity

$$\gamma \equiv \sqrt{n_{\mathrm{eff}}^{2} - \tilde{n}^2}\; k_0 \tag{9.126}$$

for the substrate (i.e., AZO) and air, from (9.124) we express the electromagnetic fields as

$$H_\eta = \tilde{H} e^{\pm\gamma\zeta}, \tag{9.127}$$

where $\tilde{H}$ denotes the magnetic field relevant to the substrate or air. Considering that ζ < 0 for the substrate and ζ > d/cos δ in air (see Fig. 9.13), for the magnetic field we have

$$H_\eta = \tilde{H} e^{\gamma\zeta} \quad \text{and} \quad H_\eta = \tilde{H} e^{-\gamma\left(\zeta - \frac{d}{\cos\delta}\right)} \tag{9.128}$$

for the substrate and air, respectively. Notice that $H_\eta \to 0$ when ζ → −∞ in the substrate and ζ → ∞ in air. Correspondingly, for the electric field we get


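$$E_\xi = -i \frac{\gamma}{\omega\varepsilon_0 \tilde{n}^2} \tilde{H} e^{\gamma\zeta} \quad \text{and} \quad E_\xi = i \frac{\gamma}{\omega\varepsilon_0 \tilde{n}^2} \tilde{H} e^{-\gamma\left(\zeta - \frac{d}{\cos\delta}\right)} \tag{9.129}$$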

for the substrate and air, respectively. In the device geometry, the P6T slab crystal is sandwiched between air and the device substrate so as to form a three-layered (air/crystal/substrate) structure (Fig. 9.10c). Therefore, we impose boundary conditions on $H_\eta$ in (9.119) and (9.128) along with $E_\xi$ in (9.123) and (9.129) in such a way that their tangential components are continuous across the interfaces between the crystal and the substrate and between the crystal and air. Figure 9.13 represents the geometry of the cross-section of air/P6T crystal/AZO substrate. From (9.119) and (9.128), (a) the tangential continuity condition for the magnetic field at the crystal/substrate interface is described as

$$H_{\mathrm{cryst}}\, e^{i\kappa \cdot 0} \cos(k \cdot 0 + \phi_s)(\cos\delta) = \tilde{H} e^{\gamma \cdot 0} (\cos\delta), \tag{9.130}$$

where the factor cos δ comes from the requirement of the tangential continuity condition. (b) The tangential continuity condition for the electric field at the same interface reads as

$$i \frac{\varepsilon_{\zeta\zeta}}{\omega\varepsilon_0 \left( \varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^{2} \right)}\, k H_{\mathrm{cryst}}\, e^{i\kappa \cdot 0} \sin(k \cdot 0 + \phi_s) = -i \frac{\gamma}{\omega\varepsilon_0 \tilde{n}^2} \tilde{H} e^{\gamma \cdot 0}. \tag{9.131}$$

Thus, dividing both sides of (9.131) by both sides of (9.130), we get

$$\frac{\varepsilon_{\zeta\zeta}}{\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^{2}}\, k \tan\phi_s = -\frac{\gamma}{\tilde{n}^2} \quad \text{or} \quad \frac{\varepsilon_{\zeta\zeta}}{\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^{2}}\, k \tan(-\phi_s) = \frac{\gamma}{\tilde{n}^2}, \tag{9.132}$$

where γ and $\tilde{n}$ are substrate-related quantities; see (9.125) and (9.126). Another boundary condition at the air/crystal interface can be obtained in a similar manner. Notice that in that case ζ = d/cos δ; see Fig. 9.13. As a result, we have

Fig. 9.13 Geometry of the cross-section of air/P6T crystal/AZO substrate. The angle δ is identical with that of Fig. 9.12. The points P and P′ are located at the crystal/substrate interface and air/crystal interface, respectively. The distance between P and P′ is d/cos δ

[Figure labels: air, P6T crystal, AZO substrate]


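$$\frac{\varepsilon_{\zeta\zeta}}{\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^{2}}\, k \tan\left( \frac{kd}{\cos\delta} + \phi_s \right) = \frac{\gamma}{\tilde{n}^2}, \tag{9.133}$$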

where γ and $\tilde{n}$ are air-related quantities. The quantity $\phi_s$ can be eliminated by taking the arc tangent of (9.132) and (9.133). In combination with (9.122), we finally get the following eigenvalue equation for TM modes:

$$k_0 d \sqrt{\frac{1}{\varepsilon_{\zeta\zeta}^{2}} \left( \varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^{2} \right) \left( \varepsilon_{\zeta\zeta} - n_{\mathrm{eff}}^{2} \right)} \Big/ (\cos\delta) = l\pi + \tan^{-1}\left[ \frac{1}{n_{\mathrm{air}}^{2}} \sqrt{ \left( \varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^{2} \right) \frac{n_{\mathrm{eff}}^{2} - n_{\mathrm{air}}^{2}}{\varepsilon_{\zeta\zeta} - n_{\mathrm{eff}}^{2}} } \right] + \tan^{-1}\left[ \frac{1}{n_{\mathrm{sub}}^{2}} \sqrt{ \left( \varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^{2} \right) \frac{n_{\mathrm{eff}}^{2} - n_{\mathrm{sub}}^{2}}{\varepsilon_{\zeta\zeta} - n_{\mathrm{eff}}^{2}} } \right], \tag{9.134}$$

where $n_{\mathrm{air}}$ (= 1) is the refractive index of air, $n_{\mathrm{sub}}$ is that of the AZO substrate equipped with a diffraction grating, d is the crystal thickness, and l is the order of the transverse mode. The refractive index $n_{\mathrm{sub}}$ is estimated as the average of the refractive indices of air and AZO weighted by the volume fractions they occupy in the diffraction grating. Solving (9.134) requires iterative numerical computation. The detailed calculation procedures can be found in the literature [7] and the supplementary material therein. Note that (9.134) includes the well-known eigenvalue equation for a waveguide that comprises an isotropic core dielectric medium sandwiched by a pair of clad layers having the same refractive index (symmetric waveguide) [12]. In that case, in fact, the second and third terms on the RHS of (9.134) are the same and their sum is identical with $\delta_{\mathrm{TM}}$ of (8.176) in Sect. 8.7.2. To confirm this, in (9.134) use $\varepsilon_{\xi\xi} = \varepsilon_{\zeta\zeta} \equiv n^2$ [see (7.57)] and $\varepsilon_{\xi\zeta} = 0$ together with the relative refractive index given by (8.39).

(ii) Phase matching of the electromagnetic fields: Once we have obtained the eigenvalue equation (9.134), we are able to solve the problem and, hence, to design a high-performance optical device, especially the laser. To this end, let us return to (9.107). Taking inner products (see Chap. 13) of both sides of (9.107), we have

$$\langle \boldsymbol{\beta} - m\boldsymbol{K} \,|\, \boldsymbol{\beta} - m\boldsymbol{K} \rangle = \langle \boldsymbol{k}_0 | \boldsymbol{k}_0 \rangle.$$

That is, we get


$$\beta^2 + m^2 K^2 - 2mK\beta \cos\varphi = k_0^{2}, \tag{9.135}$$

where φ is the angle between $m\boldsymbol{K}$ and $\boldsymbol{\beta}$ (see Fig. 9.11). Inserting (9.104) into (9.135) and solving the quadratic equation, we obtain $n_{\mathrm{eff}}$ as a function of $k_0$ (or emission wavelength). This implies that we can determine $n_{\mathrm{eff}}$ from the experimental data. Yet the above process does not determine $n_{\mathrm{eff}}$ uniquely, because there is still room for the choice of the positive integer m. Using (9.125), which restricts the values of $n_{\mathrm{eff}}$, however, we can decide the most probable integer m. Thus, we can compare $n_{\mathrm{eff}}$ obtained from (9.134) with $n_{\mathrm{eff}}$ determined from the experiments.

As described above, we have explained two major factors of the out-coupling of emission. In the above discussion, we assumed that the TM modes dominate. Combining (9.107) and (9.134), we successfully assign individual emissions to the specific $\mathrm{TM}_{ml}$ modes, in which the indices m and l denote longitudinal and transverse modes, with m associated with the grating. Note that m ≥ 1 and l ≥ 0. For the index l, l = 0 is allowed because the total internal reflection (Sects. 8.5 and 8.6) is responsible.

Figure 9.14a [7] shows an example of the angle-dependent emission spectra. For this, the grazing emission was intended. There are two series of progression in which the emission peak locations are either redshifted or blueshifted with increasing angles |θ|.

[Figure 9.14: (a) emission intensity (×10³ counts) versus wavelength (nm) at rotation angles from −90° to 90°; (b) measurement geometry]

Fig. 9.14 Angle-dependent emission spectra of a P6T crystal. (a) Emission spectra. Reproduced from Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y, Yamashita K, Ura S, Hotta S (2018) Design principle of high-performance organic single-crystal light-emitting devices. J Appl Phys 123(23): 235501/13 pages [7], with the permission of AIP Publishing. https://doi.org/10.1063/1.5030486. (b) The grazing emissions (k0) were collected at various angles θ made with the grating wavevector direction (K)

[Figure 9.15: refractive index and effective index versus peak wavelength (nm), with data series labeled (m, l, s) = (1, 0, r), (1, 1, r), (2, 0, b), and (2, 1, b)]

Fig. 9.15 Wavelength dispersion of effective indices $n_{\mathrm{eff}}$. The open and closed symbols show the data pertinent to the blueshifted and redshifted peaks, respectively. These data were calculated from either (9.105) or (9.107). The colored solid curves represent the dispersions of the effective refractive indices computed from (9.134). The numbers and characters (m, l, s) denote the diffraction order (m), order of transverse mode (l), and shift direction (s) that distinguishes between blueshift (b) and redshift (r). A black solid curve at the upper right indicates the wavelength dispersion of the phase refractive indices (n) of the P6T crystal related to the one principal axis. Reproduced from Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y, Yamashita K, Ura S, Hotta S (2018) Design principle of high-performance organic single-crystal light-emitting devices. J Appl Phys 123(23): 235501/13 pages [7], with the permission of AIP Publishing. https://doi.org/10.1063/1.5030486

Figure 9.15 [7] compares the effective indices determined by the experiments with those obtained by numerical computations based on (9.134). The experimentally observed emission peaks are indicated with open circles (or open triangles) for blueshifted peaks and closed circles (or closed triangles) for redshifted peaks. The colored solid curves represent the results obtained from (9.134). As clearly recognized in Fig. 9.15 [7], the results of the experiments and computations agree satisfactorily. In particular, the $\mathrm{TM}_{20}$ (blueshifted) and $\mathrm{TM}_{10}$ (redshifted) modes are seamlessly joined so as to form an upper group of emission peaks. A lower group of emission peaks is assigned to the $\mathrm{TM}_{21}$ (blueshifted) or $\mathrm{TM}_{11}$ (redshifted) modes. Notice here that these modes may have multiple values of $n_{\mathrm{eff}}$ according to different $\varepsilon_{\xi\xi}$, $\varepsilon_{\zeta\zeta}$, and $\varepsilon_{\xi\zeta}$ that vary with the different propagation directions of light (i.e., the ξ-axis) within the slab crystal waveguide.

From a practical point of view, it is desirable to make a device so that it can strongly emit light in the direction parallel to the grating wavevector. In that case, (9.107) can be rewritten as

$$\boldsymbol{\beta} = (k_0 + mK)\,\tilde{\boldsymbol{e}},$$


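where all the vectors $\boldsymbol{\beta}$, $\boldsymbol{K}$, and $\boldsymbol{k}_0$ are parallel to $\tilde{\boldsymbol{e}}$. Then, we have two choices such that

$$\beta = k_0 + mK = n_{\mathrm{eff}} k_0, \quad k_0 = \frac{2\pi}{\lambda_p} = \frac{mK}{n_{\mathrm{eff}} - 1};$$
$$\beta = k_0 - mK = -n_{\mathrm{eff}} k_0, \quad k_0 = \frac{2\pi}{\lambda_p} = \frac{mK}{n_{\mathrm{eff}} + 1}. \tag{9.136}$$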

The first and second equations of (9.136) are associated with the spectra of redshifted and blueshifted progressions, respectively. Further rewriting (9.136) to combine the two equations, we get

$$\lambda_p = \frac{2\pi (n_{\mathrm{eff}} \mp 1)}{mK} = \frac{(n_{\mathrm{eff}} \mp 1)\Lambda}{m}, \tag{9.137}$$

where we used the relation K = 2π/Λ. From (9.134) we find that $n_{\mathrm{eff}}$ implicitly depends on l and d (crystal thickness). Thus, (9.137) tells us that if we choose m, l, and d appropriately, we can determine the most probable $n_{\mathrm{eff}}$. Then, choosing the grating period Λ appropriately in turn, we may predict the emission peak wavelength $\lambda_p$. Conversely, designating $\lambda_p$ properly, we can decide Λ. In fact, $\lambda_p$ can be determined experimentally from an emission color (or maximum of emission gain) inherent to the chemical species of organic crystals. For example, a maximum of emission gain of P6T is located around 660 nm, as can be seen from Fig. 9.14a [7]. In combination with (9.134), (9.137) finally allows us to choose the optimum combination of the device parameters m, l, d, Λ, and $\lambda_p$.

When laser oscillation is taking place, (9.136) and (9.137) should be replaced with the following expressions that represent Bragg's condition [13, 14]:

$$2\beta = mK \tag{9.138}$$

or

$$\lambda_p = \frac{2 n_{\mathrm{eff}} \Lambda}{m}. \tag{9.139}$$
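The phase-matching relations can be turned into a small calculator. The sketch below solves the quadratic (9.135) for $n_{\mathrm{eff}}$ after inserting β = $n_{\mathrm{eff}} k_0$, and evaluates the design formulae (9.137) and (9.139). The numerical inputs (a 400 nm grating period, m = 2, a 660 nm peak) are hypothetical and serve only to exercise the functions, whose names are illustrative rather than taken from [7].

```python
import math

def n_eff_from_peak(lam_p, period, m, phi):
    """Real roots of (9.135) with beta = n_eff * k0.

    Dividing (9.135) by k0^2 gives n^2 - 2 g cos(phi) n + g^2 - 1 = 0,
    where g = mK/k0 = m * lam_p / period.
    """
    g = m * lam_p / period
    disc = 1.0 - (g * math.sin(phi)) ** 2
    if disc < 0.0:
        return []  # no real solution: phase matching impossible
    root = math.sqrt(disc)
    return [g * math.cos(phi) - root, g * math.cos(phi) + root]

def admissible(roots, n_clad, n_core):
    """Keep only roots allowed by (9.125): n_clad < n_eff <= n_core."""
    return [n for n in roots if n_clad < n <= n_core]

def peak_wavelength(n_eff, period, m, sign):
    """(9.137): sign = -1 gives (n_eff - 1) * period / m,
    sign = +1 gives (n_eff + 1) * period / m."""
    return (n_eff + sign) * period / m

def bragg_wavelength(n_eff, period, m):
    """(9.139): lasing wavelength under Bragg's condition."""
    return 2.0 * n_eff * period / m

def bragg_period(lam_p, n_eff, m):
    """(9.139) solved for the grating period."""
    return m * lam_p / (2.0 * n_eff)

if __name__ == "__main__":
    roots = n_eff_from_peak(lam_p=660.0, period=400.0, m=2, phi=0.0)
    print("candidate n_eff:", roots)
    print("kept by (9.125):", admissible(roots, n_clad=1.8, n_core=3.0))
    print("Bragg period for 660 nm:", bragg_period(660.0, 2.3, 2))
```

Filtering the roots with (9.125) is what singles out the most probable diffraction order m when several orders are tried.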

As discussed above in detail, Examples 9.1 and 9.2 enable us to design a high-performance laser device using an organic crystal that is characterized by a rather complicated crystal structure associated with anisotropic refractive indices. The abovementioned design principle applies more widely to the construction of effective laser devices that consist of light-emitting materials, either organic or inorganic. At the same time, these examples are expected to provide an effective methodology in interdisciplinary fields encompassing solid-state physics and solid-state chemistry as well as device physics.
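The iterative numerical computation needed for the eigenvalue equation (9.134) can be sketched with a simple bisection, since the difference between its two sides decreases monotonically with $n_{\mathrm{eff}}$ between the cladding index and $\sqrt{\varepsilon_{\zeta\zeta}}$. All numbers below (permittivity components, δ, thickness, wavelength, and the effective substrate index) are hypothetical placeholders, and the routine is an illustration rather than the procedure of [7].

```python
import math

def mode_mismatch(n_eff, l, k0d, e_xx, e_zz, e_xz, delta, n_air, n_sub):
    """LHS minus RHS of (9.134); e_xx, e_zz, e_xz stand for the
    xi-xi, zeta-zeta, and xi-zeta permittivity components."""
    det = e_xx * e_zz - e_xz ** 2
    core = e_zz - n_eff ** 2
    lhs = k0d * math.sqrt(det * core) / (e_zz * math.cos(delta))

    def shift(n_clad):
        # Arc-tangent term of (9.134) for one cladding layer.
        ratio = det * (n_eff ** 2 - n_clad ** 2) / core
        return math.atan(math.sqrt(ratio) / n_clad ** 2)

    return lhs - l * math.pi - shift(n_air) - shift(n_sub)

def solve_n_eff(l, k0d, e_xx, e_zz, e_xz, delta,
                n_air=1.0, n_sub=1.8, tol=1e-12):
    """Bisection for the TM mode of transverse order l; None if no guided mode."""
    lo = max(n_air, n_sub) + 1e-9      # just above the larger cladding index
    hi = math.sqrt(e_zz) - 1e-9        # just below the upper bound from (9.122)

    def f(x):
        return mode_mismatch(x, l, k0d, e_xx, e_zz, e_xz, delta, n_air, n_sub)

    if f(lo) <= 0.0 or f(hi) >= 0.0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical device: d = 500 nm, emission at 660 nm.
DEVICE = dict(k0d=2.0 * math.pi * 500.0 / 660.0,
              e_xx=7.0, e_zz=7.5, e_xz=0.5, delta=0.1)

if __name__ == "__main__":
    for order in range(4):
        print("l =", order, "n_eff =", solve_n_eff(order, **DEVICE))
```

Repeating the solve over a range of wavelengths would trace out dispersion curves of the kind compared with experiment in Fig. 9.15.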

9.6 Mechanical System

As outlined above, two-level atoms have distinct characteristics in connection with lasers. Electromagnetic waves similarly confined within a one-dimensional cavity exhibit related properties and, above all, have many features in common with a harmonic oscillator [15]. We have already described several features and properties of the harmonic oscillator (Chap. 2). Meanwhile, we have briefly discussed the formation of electromagnetic stationary waves (Chap. 8). There are several resemblances and correspondences between the harmonic oscillator and electromagnetic stationary waves when we view them as mechanical systems. The point is that in a harmonic oscillator the position and momentum are in quadrature; i.e., their phase difference is π/2. For the electromagnetic stationary waves, the electric field and magnetic field are in quadrature as well. In Chap. 8, we examined the conditions under which stationary waves are formed. In a dielectric medium both sides of which are equipped with metal layers (or mirrors), the electric field is described as

$$\boldsymbol{E} = 2E_1 \boldsymbol{\varepsilon}_e \sin\omega t \sin kz. \tag{8.201}$$

In (8.201), we assumed that forward and backward electromagnetic waves are propagating in the direction of the z-axis. Here, we assume that the interfaces (or walls) are positioned at z = 0 and z = L. Within the domain [0, L], the two waves form a stationary wave. Since this expression assumed two waves, the electromagnetic energy was doubled. To normalize the energy so that a single wave is contained, the amplitude $E_1$ of (8.201) should be divided by $\sqrt{2}$. Therefore, we adopt the following description for E:

$$\boldsymbol{E} = \sqrt{2}\, E \boldsymbol{e}_1 \sin\omega t \sin kz \quad \text{or} \quad E_x = \sqrt{2}\, E \sin\omega t \sin kz, \tag{9.140}$$

where we designated the polarization vector as the direction of the x-axis. At the same time, we omitted the index from the amplitude. Thus, from the second equation of (7.65) we have

$$\frac{\partial H_y}{\partial t} = -\frac{1}{\mu} \frac{\partial E_x}{\partial z} = -\frac{\sqrt{2}\, E k}{\mu} \sin\omega t \cos kz. \tag{9.141}$$

Note that this equation appears in the second equation of (8.131) as well. Integrating both sides of (9.141), we get

$$H_y = \frac{\sqrt{2}\, E k}{\mu\omega} \cos\omega t \cos kz + C.$$

Using the relation ω = vk, we have

$$H_y = \frac{\sqrt{2}\, E}{\mu v} \cos\omega t \cos kz + C,$$

where v is the light velocity in the dielectric medium and C is an integration constant. Removing C and putting

$$H \equiv \frac{E}{\mu v},$$

we have

$$H_y = \sqrt{2}\, H \cos\omega t \cos kz.$$

Using a vector expression, we have

$$\boldsymbol{H} = \boldsymbol{e}_2 \sqrt{2}\, H \cos\omega t \cos kz.$$

Thus, $\boldsymbol{E}$ ($\parallel \boldsymbol{e}_1$), $\boldsymbol{H}$ ($\parallel \boldsymbol{e}_2$), and $\boldsymbol{n}$ ($\parallel \boldsymbol{e}_3$) form a right-handed system in this order. As noted in Sect. 8.8, at the interface (or wall) the electric field and magnetic field form nodes and antinodes, respectively. Namely, the two fields are in quadrature.

Let us calculate the electromagnetic energy of the dielectric medium within a cavity. In the present case the cavity means the dielectric sandwiched by a pair of metal layers. We have

$$W = W_e + W_m, \tag{9.142}$$

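where W is the total electromagnetic energy; $W_e$ and $W_m$ are the electric and magnetic energies, respectively. Let the length of the cavity be L. Then the energy per unit cross-section area is described as

$$W_e = \frac{\varepsilon}{2} \int_0^L \boldsymbol{E}^2\, dz \quad \text{and} \quad W_m = \frac{\mu}{2} \int_0^L \boldsymbol{H}^2\, dz. \tag{9.143}$$

Performing the integration, we get

$$W_e = \frac{\varepsilon}{2} L E^2 \sin^2 \omega t,$$
$$W_m = \frac{\mu}{2} L H^2 \cos^2 \omega t = \frac{\mu}{2} L \left( \frac{E}{\mu v} \right)^2 \cos^2 \omega t = \frac{\varepsilon}{2} L E^2 \cos^2 \omega t, \tag{9.144}$$

where we used $1/v^2 = \varepsilon\mu$ in the last equality. Thus, we have

$$W = \frac{\varepsilon}{2} L E^2 = \frac{\mu}{2} L H^2. \tag{9.145}$$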
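The time independence of the total energy (9.145) can be verified numerically by integrating the stationary-wave fields over the cavity as in (9.143). The following sketch uses a midpoint rule; the permittivity, permeability, amplitude, cavity length, and mode number are arbitrary illustrative values in arbitrary units, and the cavity is assumed to satisfy kL = (integer)π so that the walls coincide with nodes of the electric field.

```python
import math

def cavity_energies(t, E=1.0, eps=2.0, mu=1.0, L=1.0, mode=3, n=2000):
    """Electric and magnetic energies per unit cross-section, (9.143),
    for E_x = sqrt(2) E sin(wt) sin(kz) and H_y = sqrt(2) H cos(wt) cos(kz)."""
    v = 1.0 / math.sqrt(eps * mu)   # light velocity in the medium
    k = mode * math.pi / L          # assumed: walls at nodes of E, kL = mode*pi
    omega = v * k
    H = E / (mu * v)                # amplitude relation below (9.141)
    dz = L / n
    W_e = 0.0
    W_m = 0.0
    for j in range(n):
        z = (j + 0.5) * dz          # midpoint of the j-th slice
        Ex = math.sqrt(2.0) * E * math.sin(omega * t) * math.sin(k * z)
        Hy = math.sqrt(2.0) * H * math.cos(omega * t) * math.cos(k * z)
        W_e += 0.5 * eps * Ex ** 2 * dz
        W_m += 0.5 * mu * Hy ** 2 * dz
    return W_e, W_m

if __name__ == "__main__":
    for t in (0.0, 0.13, 0.5, 1.7):
        W_e, W_m = cavity_energies(t)
        print(f"t = {t}: W_e = {W_e:.6f}, W_m = {W_m:.6f}, W = {W_e + W_m:.6f}")
```

The electric and magnetic parts exchange energy as sin² ωt and cos² ωt while their sum stays at (ε/2)LE², mirroring the potential and kinetic energies of an oscillator.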

Representing the energies per unit volume as $\tilde{W}_e$, $\tilde{W}_m$, and $\tilde{W}$, we have

$$\tilde{W}_e = \frac{\varepsilon}{2} E^2 \sin^2 \omega t, \quad \tilde{W}_m = \frac{\varepsilon}{2} E^2 \cos^2 \omega t, \quad \tilde{W} = \frac{\varepsilon}{2} E^2 = \frac{\mu}{2} H^2. \tag{9.146}$$

In Chap. 2, we treated the motion of a harmonic oscillator. There, we had

$$x(t) = \frac{v_0}{\omega} \sin\omega t = x_0 \sin\omega t. \tag{2.7}$$

Here, we have defined the amplitude of the harmonic oscillation as $x_0$ (> 0):

$$x_0 \equiv \frac{v_0}{\omega}.$$

Then, the momentum is described as

$$p(t) = m\dot{x}(t) = m\omega x_0 \cos\omega t.$$

Defining

$$p_0 \equiv m\omega x_0,$$

we have

$$p(t) = p_0 \cos\omega t.$$

Let the kinetic energy and potential energy of the oscillator be K and V, respectively. Then, we have

$$K = \frac{1}{2m} p(t)^2 = \frac{1}{2m} p_0^{2} \cos^2 \omega t,$$
$$V = \frac{1}{2} m\omega^2 x(t)^2 = \frac{1}{2} m\omega^2 x_0^{2} \sin^2 \omega t,$$
$$W = K + V = \frac{1}{2m} p_0^{2} = \frac{1}{2} m v_0^{2} = \frac{1}{2} m\omega^2 x_0^{2}. \tag{9.147}$$

Comparing (9.146) and (9.147), we recognize the following relationship in energy between the electromagnetic fields and harmonic oscillator motion [15]:

$$\sqrt{m}\,\omega x_0 \longleftrightarrow \sqrt{\varepsilon}\, E \quad \text{and} \quad p_0/\sqrt{m} \longleftrightarrow \sqrt{\mu}\, H. \tag{9.148}$$
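To make the correspondence (9.148) explicit, the short derivation below squares each side and compares the result with the energies in (9.146) and (9.147). It uses only relations already given in the text, namely $p_0 = m\omega x_0$, $H = E/(\mu v)$, and $1/v^2 = \varepsilon\mu$; it is offered as a consistency check, not as part of the original argument.

```latex
\begin{align*}
\tfrac{1}{2}\left(\sqrt{m}\,\omega x_0\right)^2 = \tfrac{1}{2} m\omega^2 x_0^{2}
  &\;\longleftrightarrow\;
  \tfrac{1}{2}\left(\sqrt{\varepsilon}\,E\right)^2 = \tfrac{\varepsilon}{2}E^2, \\
\tfrac{1}{2}\left(p_0/\sqrt{m}\right)^2 = \tfrac{1}{2m}\,p_0^{2}
  &\;\longleftrightarrow\;
  \tfrac{1}{2}\left(\sqrt{\mu}\,H\right)^2 = \tfrac{\mu}{2}H^2, \\
p_0 = m\omega x_0 \;\Rightarrow\; \sqrt{m}\,\omega x_0 = \frac{p_0}{\sqrt{m}}
  &\;\longleftrightarrow\;
  \sqrt{\mu}\,H = \frac{\sqrt{\mu}\,E}{\mu v} = \sqrt{\varepsilon}\,E .
\end{align*}
```

The last line shows that the two pairings in (9.148) are mutually consistent: the oscillator identity $p_0 = m\omega x_0$ plays the same role as the field relation $H = E/(\mu v)$.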


Thus, there is an elegant correspondence between the dynamics of electromagnetic fields in a cavity and the motion of a harmonic oscillator. In fact, quantum electromagnetism is based upon the treatment of the quantum harmonic oscillator introduced in Chap. 2.

References
1. Moore WJ (1955) Physical chemistry, 3rd edn. Prentice-Hall, Englewood Cliffs
2. Smith FG, King TA, Wilkins D (2007) Optics and photonics, 2nd edn. Wiley, Chichester
3. Sunakawa S (1965) Theoretical electromagnetism. Kinokuniya, Tokyo. (in Japanese)
4. Loudon R (2000) The quantum theory of light, 3rd edn. Oxford University Press, Oxford
5. Born M, Wolf E (2005) Principles of optics, 7th edn. Cambridge University Press, Cambridge
6. Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5):053113
7. Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y, Yamashita K, Ura S, Hotta S (2018) Design principle of high-performance organic single-crystal light-emitting devices. J Appl Phys 123(23):235501
8. Siegman AE (1986) Lasers. University Science Books, Sausalito
9. Palmer C (2005) Diffraction grating handbook, 6th edn. Newport Corporation, New York
10. Hotta S, Yamao T (2011) The thiophene/phenylene co-oligomers: exotic molecular semiconductors integrating high-performance electronic and optical functionalities. J Mater Chem 21(5):1295–1304
11. Hotta S, Goto M, Azumi R, Inoue M, Ichikawa M, Taniguchi Y (2004) Crystal structures of thiophene/phenylene co-oligomers with different molecular shapes. Chem Mater 16(2):237–241
12. Yariv A, Yeh P (2003) Optical waves in crystals: propagation and control of laser radiation. Wiley, New York
13. Pain HJ (2005) The physics of vibrations and waves, 6th edn. Wiley, Chichester
14. Inada Y, Kawata Y, Kawai T, Hotta S, Yamao T (2020) Laser oscillation at an intended wavelength from a distributed feedback structure on anisotropic organic crystals. J Appl Phys 127(5):053102
15. Fox M (2006) Quantum Optics. Oxford University Press, Oxford

Chapter 10

Introductory Green’s Functions

In this chapter, we deal with various properties and characteristics of differential equations, especially first-order linear differential equations (FOLDEs) and second-order linear differential equations (SOLDEs). These differential equations are characterized by differential operators and boundary conditions (BCs). Of these, the differential operators appearing in SOLDEs are particularly important. Under appropriate conditions, these operators can be converted to Hermitian operators. The SOLDEs associated with classical orthogonal polynomials play a central role in many fields of mathematical physics including quantum mechanics and electromagnetism. We study the general principle of SOLDEs in relation to several specific SOLDEs we have studied in Part I and examine general features of an eigenvalue problem and an initial-value problem (IVP). In this context, Green's functions provide a powerful tool for solving SOLDEs. For practical purposes, we deal with the actual construction of Green's functions. In Sect. 8.8, we dealt with steady-state characteristics of electromagnetic waves in dielectrics in terms of propagation, reflection, and transmission. When we consider transient characteristics of electromagnetic and optical phenomena, we often need to deal with SOLDEs having constant coefficients. This is well known in connection with the motion of a damped harmonic oscillator. In the latter part of this chapter, we treat the initial-value problem of a SOLDE of this type.

10.1 Second-Order Linear Differential Equations (SOLDEs)

A general $n$-th order linear differential equation has the following form:

$$
a_n(x)\frac{d^n u}{dx^n} + a_{n-1}(x)\frac{d^{n-1} u}{dx^{n-1}} + \cdots + a_1(x)\frac{du}{dx} + a_0(x)u = d(x). \tag{10.1}
$$

© Springer Nature Singapore Pte Ltd. 2020 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3_10

Table 10.1 Characteristics of SOLDEs

                      Type I        Type II         Type III        Type IV
Equation              Homogeneous   Homogeneous     Inhomogeneous   Inhomogeneous
Boundary conditions   Homogeneous   Inhomogeneous   Homogeneous     Inhomogeneous

If $d(x) = 0$, the differential equation is said to be homogeneous; otherwise it is called inhomogeneous. The LHS of (10.1) is a linear function of $u$ and its derivatives. Likewise, we have a SOLDE such that

$$
a(x)\frac{d^2 u}{dx^2} + b(x)\frac{du}{dx} + c(x)u = d(x). \tag{10.2}
$$

In (10.2) we assume that the variable $x$ is real. The equation can be solved under appropriate boundary conditions (BCs). A general form of BCs is described as

$$
B_1(u) = \alpha_1 u(a) + \beta_1 \left.\frac{du}{dx}\right|_{x=a} + \gamma_1 u(b) + \delta_1 \left.\frac{du}{dx}\right|_{x=b} = \sigma_1, \tag{10.3}
$$

$$
B_2(u) = \alpha_2 u(a) + \beta_2 \left.\frac{du}{dx}\right|_{x=a} + \gamma_2 u(b) + \delta_2 \left.\frac{du}{dx}\right|_{x=b} = \sigma_2, \tag{10.4}
$$

where $\alpha_1$, $\beta_1$, $\gamma_1$, $\delta_1$, $\sigma_1$, etc. are real constants; $u(x)$ is defined in an interval $[a, b]$, where $a$ and $b$ can be infinity (i.e., $\pm\infty$). The LHS of $B_1(u)$ and $B_2(u)$ are referred to as boundary functionals [1, 2]. If $\sigma_1 = \sigma_2 = 0$, the BCs are called homogeneous; otherwise the BCs are said to be inhomogeneous. In combination with the inhomogeneous equation expressed as (10.2), Table 10.1 summarizes characteristics of SOLDEs. We have four types of SOLDEs according to homogeneity and inhomogeneity of equations and BCs.

Even though SOLDEs are mathematically tractable, they are not necessarily easy to solve, depending upon the nature of $a(x)$, $b(x)$, and $c(x)$ of (10.2). Nonetheless, if those functions are constant coefficients, the equation can readily be solved. We will deal with SOLDEs of that type in great detail later. Suppose that we find two linearly independent solutions $u_1(x)$ and $u_2(x)$ of the following homogeneous equation:

$$
a(x)\frac{d^2 u}{dx^2} + b(x)\frac{du}{dx} + c(x)u = 0. \tag{10.5}
$$

Then, any solution $u(x)$ of (10.5) can be expressed as their linear combination such that

$$
u(x) = c_1 u_1(x) + c_2 u_2(x), \tag{10.6}
$$

where $c_1$ and $c_2$ are arbitrary constants. In general, suppose that there are $n$ arbitrarily chosen functions; i.e., $f_1(x), f_2(x), \ldots, f_n(x)$. Consider the following equation with those functions:

$$
a_1 f_1(x) + a_2 f_2(x) + \cdots + a_n f_n(x) = 0, \tag{10.7}
$$

where $a_1, a_2, \ldots,$ and $a_n$ are constants. If $a_1 = a_2 = \cdots = a_n = 0$, (10.7) always holds. In this case, (10.7) is said to be a trivial linear relation. If $f_1(x), f_2(x), \ldots,$ and $f_n(x)$ satisfy a nontrivial linear relation, they are said to be linearly dependent. That is, the nontrivial expression means that in (10.7) at least one of $a_1, a_2, \ldots,$ and $a_n$ is nonzero. Suppose that $a_n \neq 0$. Then, from (10.7), $f_n(x)$ is expressed as

$$
f_n(x) = -\frac{a_1}{a_n} f_1(x) - \frac{a_2}{a_n} f_2(x) - \cdots - \frac{a_{n-1}}{a_n} f_{n-1}(x). \tag{10.8}
$$

If $f_1(x), f_2(x), \ldots,$ and $f_n(x)$ are not linearly dependent, they are called linearly independent. In other words, the statement that $f_1(x), f_2(x), \ldots,$ and $f_n(x)$ are linearly independent is equivalent to the statement that (10.7) holds if and only if $a_1 = a_2 = \cdots = a_n = 0$. We will have relevant discussion in Part III. Now suppose that with the above two linearly independent functions $u_1(x)$ and $u_2(x)$, we have

$$
a_1 u_1(x) + a_2 u_2(x) = 0. \tag{10.9}
$$

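Differentiating (10.9), we have

$$
a_1 \frac{du_1(x)}{dx} + a_2 \frac{du_2(x)}{dx} = 0. \tag{10.10}
$$

Expressing (10.9) and (10.10) in a matrix form, we get

$$
\begin{pmatrix} u_1(x) & u_2(x) \\ \dfrac{du_1(x)}{dx} & \dfrac{du_2(x)}{dx} \end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = 0. \tag{10.11}
$$

Thus, the statement that $u_1(x)$ and $u_2(x)$ are linearly independent is equivalent to the statement that the following expression holds:

$$
\begin{vmatrix} u_1(x) & u_2(x) \\ \dfrac{du_1(x)}{dx} & \dfrac{du_2(x)}{dx} \end{vmatrix} \equiv W(u_1, u_2) \neq 0, \tag{10.12}
$$

where $W(u_1, u_2)$ is called the Wronskian of $u_1(x)$ and $u_2(x)$. In fact, if $W(u_1, u_2) = 0$, then we have

$$
u_1 \frac{du_2}{dx} - u_2 \frac{du_1}{dx} = 0. \tag{10.13}
$$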

This implies that there is a functional relationship between $u_1$ and $u_2$. In fact, if we can express $u_2$ as $u_2 = u_2(u_1(x))$, then $\frac{du_2}{dx} = \frac{du_2}{du_1}\frac{du_1}{dx}$. That is,

$$
u_1 \frac{du_2}{dx} = u_1 \frac{du_2}{du_1}\frac{du_1}{dx} = u_2 \frac{du_1}{dx}
\quad \text{or} \quad u_1 \frac{du_2}{du_1} = u_2
\quad \text{or} \quad \frac{du_2}{u_2} = \frac{du_1}{u_1}, \tag{10.14}
$$

where the second equality of the first equation comes from (10.13). The third equation can easily be integrated to yield

$$
\ln\frac{u_2}{u_1} = c \quad \text{or} \quad \frac{u_2}{u_1} = e^{c} \quad \text{or} \quad u_2 = e^{c} u_1. \tag{10.15}
$$

Equation (10.15) shows that $u_1(x)$ and $u_2(x)$ are linearly dependent. It is easy to show that if $u_1(x)$ and $u_2(x)$ are linearly dependent, $W(u_1, u_2) = 0$. Thus, we have the following statement:

Two functions are linearly dependent $\Longleftrightarrow$ $W(u_1, u_2) = 0$.

Then, as the contraposition of this statement we have

Two functions are linearly independent $\Longleftrightarrow$ $W(u_1, u_2) \neq 0$.

On the other hand, suppose that we have another solution $u_3(x)$ for (10.5) besides $u_1(x)$ and $u_2(x)$. Then, we have

$$
\begin{aligned}
a(x)\frac{d^2 u_1}{dx^2} + b(x)\frac{du_1}{dx} + c(x)u_1 &= 0, \\
a(x)\frac{d^2 u_2}{dx^2} + b(x)\frac{du_2}{dx} + c(x)u_2 &= 0, \\
a(x)\frac{d^2 u_3}{dx^2} + b(x)\frac{du_3}{dx} + c(x)u_3 &= 0.
\end{aligned} \tag{10.16}
$$

Again rewriting (10.16) in a matrix form, we have

$$
\begin{pmatrix}
\dfrac{d^2 u_1}{dx^2} & \dfrac{du_1}{dx} & u_1 \\
\dfrac{d^2 u_2}{dx^2} & \dfrac{du_2}{dx} & u_2 \\
\dfrac{d^2 u_3}{dx^2} & \dfrac{du_3}{dx} & u_3
\end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix} = 0. \tag{10.17}
$$

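A necessary and sufficient condition to obtain a nontrivial solution (i.e., a solution besides $a = b = c = 0$) is that [1]

$$
\begin{vmatrix}
\dfrac{d^2 u_1}{dx^2} & \dfrac{du_1}{dx} & u_1 \\
\dfrac{d^2 u_2}{dx^2} & \dfrac{du_2}{dx} & u_2 \\
\dfrac{d^2 u_3}{dx^2} & \dfrac{du_3}{dx} & u_3
\end{vmatrix} = 0. \tag{10.18}
$$

Note here that

$$
\begin{vmatrix}
\dfrac{d^2 u_1}{dx^2} & \dfrac{du_1}{dx} & u_1 \\
\dfrac{d^2 u_2}{dx^2} & \dfrac{du_2}{dx} & u_2 \\
\dfrac{d^2 u_3}{dx^2} & \dfrac{du_3}{dx} & u_3
\end{vmatrix}
= -
\begin{vmatrix}
u_1 & u_2 & u_3 \\
\dfrac{du_1}{dx} & \dfrac{du_2}{dx} & \dfrac{du_3}{dx} \\
\dfrac{d^2 u_1}{dx^2} & \dfrac{d^2 u_2}{dx^2} & \dfrac{d^2 u_3}{dx^2}
\end{vmatrix}
\equiv -W(u_1, u_2, u_3), \tag{10.19}
$$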

where $W(u_1, u_2, u_3)$ is the Wronskian of $u_1(x)$, $u_2(x)$, and $u_3(x)$. In the above relation, we used the fact that the determinant of a matrix is identical to that of its transposed matrix and that a determinant changes sign when two row vectors are interchanged. In short, a necessary and sufficient condition to get a nontrivial solution is that $W(u_1, u_2, u_3)$ vanishes. This implies that $u_1(x)$, $u_2(x)$, and $u_3(x)$ are linearly dependent. However, we have assumed that $u_1(x)$ and $u_2(x)$ are linearly independent, and so (10.18) and (10.19) mean that $u_3(x)$ must be described as a linear combination of $u_1(x)$ and $u_2(x)$. That is, we have no third linearly independent solution. Consequently, the general solution of (10.5) must be given by (10.6). In this sense, $u_1(x)$ and $u_2(x)$ are said to be a fundamental set of solutions of (10.5).

Next, let us consider the inhomogeneous equation (10.2). Suppose that $u_p(x)$ is a particular solution of (10.2). Let us think of a function $v(x)$ such that

$$
u(x) = v(x) + u_p(x). \tag{10.20}
$$

Substituting (10.20) into (10.2), we have

$$
a(x)\frac{d^2 v}{dx^2} + b(x)\frac{dv}{dx} + c(x)v + a(x)\frac{d^2 u_p}{dx^2} + b(x)\frac{du_p}{dx} + c(x)u_p = d(x).
$$

Since $u_p(x)$ satisfies (10.2) by itself, we have

$$
a(x)\frac{d^2 v}{dx^2} + b(x)\frac{dv}{dx} + c(x)v = 0. \tag{10.21}
$$

But $v(x)$ can be described by a linear combination of $u_1(x)$ and $u_2(x)$ as in the case of (10.6). Hence, the general solution of (10.2) should be expressed as

$$
u(x) = c_1 u_1(x) + c_2 u_2(x) + u_p(x), \tag{10.22}
$$

where $c_1$ and $c_2$ are arbitrary (complex) constants.
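The Wronskian criterion of (10.12) can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `derivative` and `wronskian` are hypothetical, the derivatives are approximated by central differences, and the test functions ($\cos x$, $\sin x$ as solutions of $u'' + u = 0$, and the dependent pair $e^x$, $3e^x$) are chosen only for illustration.

```python
import math

def derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def wronskian(u1, u2, x):
    """W(u1, u2) = u1 * u2' - u2 * u1' evaluated at x, as in (10.12)."""
    return u1(x) * derivative(u2, x) - u2(x) * derivative(u1, x)

# cos x and sin x both solve u'' + u = 0 and are linearly independent.
w_indep = wronskian(math.cos, math.sin, 0.7)

# exp(x) and 3*exp(x) are linearly dependent (u2 = 3 * u1).
w_dep = wronskian(math.exp, lambda x: 3.0 * math.exp(x), 0.7)

print(f"W(cos, sin)   = {w_indep:.6f}")  # analytically cos^2 + sin^2 = 1
print(f"W(exp, 3*exp) = {w_dep:.6f}")    # analytically 0
```

A nonvanishing value signals linear independence, while a value that vanishes within the finite-difference error is consistent with linear dependence.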

10.2 First-Order Linear Differential Equations (FOLDEs)

In the discussion that follows, first-order linear differential equations (FOLDEs) supply us with useful information. A general form of FOLDEs is expressed as

$$
a(x)\frac{du}{dx} + b(x)u = d(x) \quad [a(x) \neq 0]. \tag{10.23}
$$

An associated boundary condition is given by a boundary functional $B(u)$ such that

$$
B(u) = \alpha u(a) + \beta u(b) = \sigma, \tag{10.24}
$$

where $\alpha$, $\beta$, and $\sigma$ are real constants; $u(x)$ is defined in an interval $[a, b]$. If in (10.23) $d(x) \equiv 0$, (10.23) can readily be integrated to yield a solution. Let us multiply both sides of (10.23) by $w(x)$. Then we have

$$
w(x)a(x)\frac{du}{dx} + w(x)b(x)u = w(x)d(x). \tag{10.25}
$$

We define $p(x)$ as

$$
p(x) \equiv w(x)a(x), \tag{10.26}
$$

where $w(x)$ is called a weight function. As mentioned in Sect. 2.3, the weight function is a real and non-negative function within the domain considered. Here we suppose that

$$
\frac{dp(x)}{dx} = w(x)b(x). \tag{10.27}
$$

Then, (10.25) can be rewritten as

$$
\frac{d}{dx}\left[p(x)u\right] = w(x)d(x). \tag{10.28}
$$

Thus, we can immediately integrate (10.28) to obtain a solution

$$
u = \frac{1}{p(x)}\left[\int^{x} w(x')d(x')\,dx' + C\right], \tag{10.29}
$$

where $C$ is an arbitrary integration constant. To seek $w(x)$, from (10.26) and (10.27) we have

$$
p' = (wa)' = wb = wa\,\frac{b}{a}. \tag{10.30}
$$

This can easily be integrated for $wa$ to be expressed as

$$
wa = C' \exp\left(\int \frac{b}{a}\,dx\right) \quad \text{or} \quad w = \frac{C'}{a} \exp\left(\int \frac{b}{a}\,dx\right), \tag{10.31}
$$

where $C'$ is an arbitrary integration constant. The quantity $C'/a$ must be non-negative so that $w$ can be non-negative.

Example 10.1 Let us think of the following FOLDE within an interval $[a, b]$; i.e., $a \le x \le b$.

$$
\frac{du}{dx} + xu = x. \tag{10.32}
$$

A boundary condition is set such that

$$
u(a) = \sigma. \tag{10.33}
$$

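Notice that (10.33) is obtained by setting $\alpha = 1$ and $\beta = 0$ in (10.24). Following the above argument, we obtain a solution described as

$$
u = \frac{1}{p(x)}\left[\int_a^{x} w(x')d(x')\,dx' + u(a)p(a)\right]. \tag{10.34}
$$

Also, we have

$$
p(x) = w(x) = \exp\left(\int_a^{x} x'\,dx'\right) = \exp\left[\frac{1}{2}\left(x^2 - a^2\right)\right]. \tag{10.35}
$$

The integral on the RHS of (10.34) can be performed as follows:

$$
\begin{aligned}
\int_a^{x} w(x')d(x')\,dx' &= \int_a^{x} x' \exp\left[\frac{1}{2}\left(x'^2 - a^2\right)\right] dx' \\
&= \exp\left(-\frac{a^2}{2}\right) \int_{a^2/2}^{x^2/2} \exp t\,dt \\
&= \exp\left(-\frac{a^2}{2}\right)\left[\exp\left(\frac{x^2}{2}\right) - \exp\left(\frac{a^2}{2}\right)\right] \\
&= \exp\left(-\frac{a^2}{2}\right)\exp\left(\frac{x^2}{2}\right) - 1 = p(x) - 1,
\end{aligned} \tag{10.36}
$$

where with the second equality we used an integration by substitution of $\frac{1}{2}x'^2 \to t$. Considering (10.33) and putting $p(a) = 1$, (10.34) is rewritten as

$$
u = \frac{1}{p(x)}\left[p(x) - 1 + \sigma\right] = 1 + \frac{\sigma - 1}{p(x)} = 1 + (\sigma - 1)\exp\left(\frac{a^2 - x^2}{2}\right), \tag{10.37}
$$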

where the last equality follows from (10.35). Notice that when $\sigma = 1$, $u(x) \equiv 1$. This is because $u(x) \equiv 1$ is certainly a solution of (10.32) and satisfies the BC of (10.33). Uniqueness of a solution imposes this strict condition upon (10.37). In the next two examples, we make general discussions.

Example 10.2 Let us consider the following differential operator:

$$
L_x = \frac{d}{dx}. \tag{10.38}
$$

We think of the following identity obtained through integration by parts:

$$
\int_a^b dx\,\varphi^* \left(\frac{d}{dx}\psi\right) + \int_a^b dx \left(\frac{d}{dx}\varphi^*\right)\psi = \left[\varphi^*\psi\right]_a^b. \tag{10.39}
$$

Rewriting this, we get

$$
\int_a^b dx\,\varphi^* \left(\frac{d}{dx}\psi\right) - \int_a^b dx \left(-\frac{d}{dx}\varphi\right)^{*}\psi = \left[\varphi^*\psi\right]_a^b. \tag{10.40}
$$

Looking at (10.40), we notice that LHS comprises a difference between two integrals, while RHS, referred to as a boundary term (or surface term), does not contain an integral. Recalling the expression (1.128) and defining $\widetilde{L}_x \equiv -\frac{d}{dx}$ in (10.40), we have

$$
\langle \varphi | L_x \psi \rangle - \langle \widetilde{L}_x \varphi | \psi \rangle = \left[\varphi^*\psi\right]_a^b. \tag{10.41}
$$

Here, RHS of (10.41) needs to vanish so that we can have

$$
\langle \varphi | L_x \psi \rangle = \langle \widetilde{L}_x \varphi | \psi \rangle. \tag{10.42}
$$

Meanwhile, adopting the expression (1.112) with respect to an adjoint operator, we have the following expression:

$$
\langle \varphi | L_x \psi \rangle = \langle L_x^{\dagger} \varphi | \psi \rangle. \tag{10.43}
$$

Comparing (10.42) and (10.43) and considering that $\varphi$ and $\psi$ are arbitrary functions, we have $\widetilde{L}_x = L_x^{\dagger}$. Thus, as an operator adjoint to $L_x$ we get

$$
L_x^{\dagger} = \widetilde{L}_x = -\frac{d}{dx} = -L_x.
$$

Notice that only if the surface term vanishes can the adjoint operator $L_x^{\dagger}$ be appropriately defined. We will encounter a similar expression again in Part III. We add that if with an operator $A$ we have a relation described by

$$
A^{\dagger} = -A, \tag{10.44}
$$

the operator $A$ is said to be anti-Hermitian. We have already encountered such an operator in Sect. 1.5. Let us then examine on what condition the surface term vanishes. The RHS of (10.40) and (10.41) is given by

$$
\varphi^*(b)\psi(b) - \varphi^*(a)\psi(a).
$$

For this term to vanish, we should have

$$
\varphi^*(b)\psi(b) = \varphi^*(a)\psi(a) \quad \text{or} \quad \frac{\varphi^*(b)}{\varphi^*(a)} = \frac{\psi(a)}{\psi(b)}.
$$

If $\psi(b) = 2\psi(a)$, then we should have $\varphi(b) = \frac{1}{2}\varphi(a)$ for the surface term to vanish. Recalling (10.24), the above conditions are expressed as

$$
B(\psi) = 2\psi(a) - \psi(b) = 0, \tag{10.45}
$$

$$
B'(\varphi) = \frac{1}{2}\varphi(a) - \varphi(b) = 0. \tag{10.46}
$$

The boundary functional $B'(\varphi)$ is said to be adjoint to $B(\psi)$. The two boundary functionals are admittedly different. If, however, we set $\psi(b) = \psi(a)$, then we should have $\varphi(b) = \varphi(a)$ for the surface term to vanish. That is,

$$
B(\psi) = \psi(a) - \psi(b) = 0, \tag{10.47}
$$

$$
B'(\varphi) = \varphi(a) - \varphi(b) = 0. \tag{10.48}
$$

Thus, the two functionals are identical and $\psi$ and $\varphi$ satisfy homogeneous BCs with respect to these functionals. As discussed above, a FOLDE is characterized by its differential operator as well as a BC (or boundary functional). This is the case with SOLDEs as well.

Example 10.3 Next, let us consider the following differential operator:

$$
L_x = \frac{1}{i}\frac{d}{dx}. \tag{10.49}
$$

As in the case of Example 10.2, we have

$$
\int_a^b dx\,\varphi^* \left(\frac{1}{i}\frac{d}{dx}\psi\right) - \int_a^b dx \left(\frac{1}{i}\frac{d}{dx}\varphi\right)^{*}\psi = \frac{1}{i}\left[\varphi^*\psi\right]_a^b. \tag{10.50}
$$

Also rewriting (10.50) using an inner product notation, we get

$$
\langle \varphi | L_x \psi \rangle - \langle L_x \varphi | \psi \rangle = \frac{1}{i}\left[\varphi^*\psi\right]_a^b. \tag{10.51}
$$

Apart from the factor $\frac{1}{i}$, RHS of (10.51) is again given by

$$
\varphi^*(b)\psi(b) - \varphi^*(a)\psi(a).
$$

Repeating a discussion similar to Example 10.2, when the surface term vanishes, we get

$$
\langle \varphi | L_x \psi \rangle = \langle L_x \varphi | \psi \rangle. \tag{10.52}
$$

Comparing (10.43) and (10.52), we have

$$
\langle L_x \varphi | \psi \rangle = \langle L_x^{\dagger} \varphi | \psi \rangle. \tag{10.53}
$$

Again considering that $\varphi$ and $\psi$ are arbitrary functions, we get

$$
L_x^{\dagger} = L_x. \tag{10.54}
$$

As in (10.54), if the differential operator is identical to its adjoint operator, such an operator is called self-adjoint. On the basis of (1.119), $L_x$ would apparently be Hermitian. However, we have to be careful to assure that $L_x$ is Hermitian. For a differential operator to be Hermitian, (i) the operator must be self-adjoint, and (ii) the two boundary functionals adjoint to each other must be identical. In other words, $\psi$ and $\varphi$ must satisfy the same homogeneous BCs with respect to these functionals. In this example, we must have the same boundary functionals as those described by (10.47) and (10.48). If and only if the conditions (i) and (ii) are satisfied, the operator is said to be Hermitian. This may seem a somewhat formal statement. Nonetheless, the same conditions must be satisfied by second-order differential operators for these operators to be Hermitian. In fact, the SOLDEs we studied in Part I are essentially dealt with within the framework of the aforementioned formalism.
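The role of the surface term in Example 10.3 can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than part of the original argument: the helpers `inner` and `L` are hypothetical names, the inner product of (1.128) is evaluated with the composite Simpson rule, and the test functions (trigonometric polynomials that are periodic on $[0, 2\pi]$, and the non-periodic pair $\varphi = 1$, $\psi = x$ on $[0, 1]$) are arbitrary choices.

```python
import cmath
import math

def inner(f, g, a, b, n=2000):
    """<f|g> = integral_a^b conj(f(x)) g(x) dx by the composite Simpson rule (n even)."""
    h = (b - a) / n
    total = f(a).conjugate() * g(a) + f(b).conjugate() * g(b)
    for k in range(1, n):
        x = a + k * h
        total += (4 if k % 2 else 2) * f(x).conjugate() * g(x)
    return total * h / 3

def L(df):
    """Apply L_x = (1/i) d/dx, given the analytic derivative df of a function."""
    return lambda x: df(x) / 1j

# Periodic pair on [0, 2*pi]: phi(0) = phi(2*pi) and psi(0) = psi(2*pi).
phi = lambda x: cmath.exp(1j * x) + 0.5 * cmath.exp(2j * x)
dphi = lambda x: 1j * cmath.exp(1j * x) + 1j * cmath.exp(2j * x)
psi = lambda x: cmath.exp(1j * x) - 2.0 * cmath.exp(2j * x)
dpsi = lambda x: 1j * cmath.exp(1j * x) - 4j * cmath.exp(2j * x)

lhs = inner(phi, L(dpsi), 0.0, 2.0 * math.pi)   # <phi | L psi>
rhs = inner(L(dphi), psi, 0.0, 2.0 * math.pi)   # <L phi | psi>
print(lhs, rhs)  # the two agree: the surface term of (10.51) vanishes

# Non-periodic pair on [0, 1]: the surface term (1/i)[phi* psi] survives.
one, d_one = (lambda x: 1.0), (lambda x: 0.0)
lin, d_lin = (lambda x: x), (lambda x: 1.0)
diff = inner(one, L(d_lin), 0.0, 1.0) - inner(L(d_one), lin, 0.0, 1.0)
surface = (one(1.0) * lin(1.0) - one(0.0) * lin(0.0)) / 1j
print(diff, surface)  # both equal (1/i) * 1 = -i
```

In the periodic case the two inner products coincide, as (10.52) requires; in the second case their difference reproduces the surface term of (10.51), so the operator cannot be regarded as Hermitian for that pair of functions.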

10.3 Second-Order Differential Operators

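Second-order differential operators are the most common operators and are frequently treated in mathematical physics. A general second-order differential operator is described as

$$
L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x), \tag{10.55}
$$

where $a(x)$, $b(x)$, and $c(x)$ can in general be complex functions of a real variable $x$. Let us think of the following identities [1]:

$$
\begin{aligned}
v^* a \frac{d^2 u}{dx^2} - u \frac{d^2 (a v^*)}{dx^2} &= \frac{d}{dx}\left[a v^* \frac{du}{dx} - u \frac{d(a v^*)}{dx}\right], \\
v^* b \frac{du}{dx} + u \frac{d(b v^*)}{dx} &= \frac{d}{dx}\left[b u v^*\right], \\
v^* c u - u c v^* &= 0.
\end{aligned} \tag{10.56}
$$

Summing both sides of (10.56), we have an identity

$$
\begin{aligned}
& v^*\left(a \frac{d^2 u}{dx^2} + b \frac{du}{dx} + cu\right) - u\left[\frac{d^2 (a v^*)}{dx^2} - \frac{d(b v^*)}{dx} + c v^*\right] \\
&\quad = \frac{d}{dx}\left[a v^* \frac{du}{dx} - u \frac{d(a v^*)}{dx}\right] + \frac{d}{dx}\left[b u v^*\right].
\end{aligned} \tag{10.57}
$$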

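Hence, following the expressions of Sect. 10.2, we define $L_x^{\dagger}$ such that

$$
\left[L_x^{\dagger} v\right]^* \equiv \frac{d^2 (a v^*)}{dx^2} - \frac{d(b v^*)}{dx} + c v^*. \tag{10.58}
$$

Taking a complex conjugate of both sides, we get

$$
L_x^{\dagger} v = \frac{d^2 (a^* v)}{dx^2} - \frac{d(b^* v)}{dx} + c^* v. \tag{10.59}
$$

Considering the differentiation of a product function, we have as $L_x^{\dagger}$

$$
L_x^{\dagger} = a^* \frac{d^2}{dx^2} + \left(2\frac{da^*}{dx} - b^*\right)\frac{d}{dx} + \frac{d^2 a^*}{dx^2} - \frac{db^*}{dx} + c^*. \tag{10.60}
$$

Rewriting (10.57) with (10.55) and (10.59), we have

$$
v^*(L_x u) - \left[L_x^{\dagger} v\right]^* u = \frac{d}{dx}\left[a v^* \frac{du}{dx} - u \frac{d(a v^*)}{dx} + b u v^*\right]. \tag{10.61}
$$

Assuming that the relevant SOLDE is defined in $[r, s]$ and integrating (10.61) within that interval, we get

$$
\int_r^s dx \left\{ v^*(L_x u) - \left[L_x^{\dagger} v\right]^* u \right\} = \left[a v^* \frac{du}{dx} - u \frac{d(a v^*)}{dx} + b u v^*\right]_r^s. \tag{10.62}
$$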

Using the definition of an inner product described in (1.128) and rewriting (10.62), we have

$$
\langle v | L_x u \rangle - \langle L_x^{\dagger} v | u \rangle = \left[a v^* \frac{du}{dx} - u \frac{d(a v^*)}{dx} + b u v^*\right]_r^s.
$$

Here if RHS of the above (i.e., the surface term of the above expression) vanishes, we get

$$
\langle v | L_x u \rangle = \langle L_x^{\dagger} v | u \rangle.
$$

We find that this notation is consistent with (1.112).

Bearing in mind this situation, let us seek a condition under which the differential operator $L_x$ is Hermitian. Suppose here that $a(x)$, $b(x)$, and $c(x)$ are all real and that

$$
\frac{da(x)}{dx} = b(x). \tag{10.63}
$$

Then, instead of (10.60), we have

$$
L_x^{\dagger} = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x) = L_x.
$$

Thus, we are successful in constituting a self-adjoint operator $L_x$. In that case, (10.62) can be rewritten as

$$
\int_r^s dx \left\{ v^*(L_x u) - \left[L_x v\right]^* u \right\} = \left[a\left(v^* \frac{du}{dx} - u \frac{dv^*}{dx}\right)\right]_r^s. \tag{10.64}
$$

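Notice that $b(x)$ is eliminated from (10.64). If RHS of (10.64) vanishes, we get

$$
\langle v | L_x u \rangle = \langle L_x v | u \rangle.
$$

This notation is consistent with (1.119) and the Hermiticity of $L_x$ becomes well defined. If we do not have the condition $\frac{da(x)}{dx} = b(x)$, how can we deal with the problem? The answer is that, following the procedures in Sect. 10.2, we can convert $L_x$ to a self-adjoint operator by multiplying $L_x$ by a weight function $w(x)$ introduced in (10.26), (10.27), and (10.31). Replacing $a(x)$, $b(x)$, and $c(x)$ with $w(x)a(x)$, $w(x)b(x)$, and $w(x)c(x)$, respectively, in the identity (10.57), we rewrite (10.57) as

$$
\begin{aligned}
& v^*\left(aw \frac{d^2 u}{dx^2} + bw \frac{du}{dx} + cwu\right) - u\left[\frac{d^2 (awv^*)}{dx^2} - \frac{d(bwv^*)}{dx} + cwv^*\right] \\
&\quad = \frac{d}{dx}\left[awv^* \frac{du}{dx} - u \frac{d(awv^*)}{dx}\right] + \frac{d}{dx}\left[bwuv^*\right].
\end{aligned} \tag{10.65}
$$

Let us calculate the bracketed factor $[\;]$ of the second term on the LHS of (10.65). Using (10.26) and (10.27), and denoting differentiation with respect to $x$ by a prime, we have

$$
\begin{aligned}
& \frac{d^2 (awv^*)}{dx^2} - \frac{d(bwv^*)}{dx} + cwv^* \\
&\quad = \left[(aw)' v^* + aw{v^*}'\right]' - \left[(bw)' v^* + bw{v^*}'\right] + cwv^* \\
&\quad = \left[bwv^* + aw{v^*}'\right]' - \left[(bw)' v^* + bw{v^*}'\right] + cwv^* \\
&\quad = (bw)' v^* + bw{v^*}' + (aw)'{v^*}' + aw{v^*}'' - (bw)' v^* - bw{v^*}' + cwv^* \\
&\quad = (aw)'{v^*}' + aw{v^*}'' + cwv^* = bw{v^*}' + aw{v^*}'' + cwv^* \\
&\quad = w\left(a{v^*}'' + b{v^*}' + cv^*\right) = w\left(a^*{v^*}'' + b^*{v^*}' + c^*v^*\right) \\
&\quad = w\left(av'' + bv' + cv\right)^*.
\end{aligned} \tag{10.66}
$$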

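The second last equality of (10.66) is based on the assumption that $a(x)$, $b(x)$, and $c(x)$ are real functions. Meanwhile, for RHS of (10.65) we have

$$
\begin{aligned}
& \frac{d}{dx}\left[awv^* \frac{du}{dx} - u \frac{d(awv^*)}{dx}\right] + \frac{d}{dx}\left[bwuv^*\right] \\
&\quad = \frac{d}{dx}\left[awv^* \frac{du}{dx} - u(aw)' v^* - uaw \frac{dv^*}{dx}\right] + \frac{d}{dx}\left[bwuv^*\right] \\
&\quad = \frac{d}{dx}\left[awv^* \frac{du}{dx} - ubwv^* - uaw \frac{dv^*}{dx}\right] + \frac{d}{dx}\left[bwuv^*\right] \\
&\quad = \frac{d}{dx}\left[aw\left(v^* \frac{du}{dx} - u \frac{dv^*}{dx}\right)\right] = \frac{d}{dx}\left[p\left(v^* \frac{du}{dx} - u \frac{dv^*}{dx}\right)\right].
\end{aligned} \tag{10.67}
$$

With the last equality of (10.67), we used (10.26). Using (10.66) and (10.67), we rewrite (10.65) once again as

$$
v^* w\left(a \frac{d^2 u}{dx^2} + b \frac{du}{dx} + cu\right) - uw\left(a \frac{d^2 v}{dx^2} + b \frac{dv}{dx} + cv\right)^* = \frac{d}{dx}\left[p\left(v^* \frac{du}{dx} - u \frac{dv^*}{dx}\right)\right]. \tag{10.68}
$$

Then, integrating (10.68) from $r$ to $s$, we finally get

$$
\int_r^s dx\,w(x)\left\{ v^*(L_x u) - \left[L_x v\right]^* u \right\} = \left[p\left(v^* \frac{du}{dx} - u \frac{dv^*}{dx}\right)\right]_r^s. \tag{10.69}
$$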

The relations (10.69) along with (10.62) are called the generalized Green’s identity. We emphasize that as far as the coefficients $a(x)$, $b(x)$, and $c(x)$ in (10.55) are real functions, the associated differential operator $L_x$ can be converted to a self-adjoint form following the procedures of (10.66) and (10.67). In the above, LHS of the original homogeneous differential equation (10.5), multiplied by $w(x)$, is rewritten as

$$
a(x)w(x)\frac{d^2 u}{dx^2} + b(x)w(x)\frac{du}{dx} + c(x)w(x)u = \frac{d}{dx}\left[p(x)\frac{du}{dx}\right] + c(x)w(x)u.
$$

Rewriting this, we have

$$
L_x u = \frac{1}{w(x)}\frac{d}{dx}\left[p(x)\frac{du}{dx}\right] + cu \quad [w(x) > 0], \tag{10.70}
$$

where we have

$$
p(x) = a(x)w(x) \quad \text{and} \quad \frac{dp(x)}{dx} = b(x)w(x). \tag{10.71}
$$
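As an illustration of how a weight function satisfying (10.71) can be obtained from (10.31), the following sketch evaluates $w(x) = \frac{1}{a}\exp\left(\int \frac{b}{a}\,dx\right)$ numerically. The function name `weight`, the choice $C' = 1$, the lower integration limit $x_0 = 0$, and the Hermite-type coefficients $a(x) = 1$, $b(x) = -2x$ are assumptions made only for this example; for them one expects $w(x) = e^{-x^2}$.

```python
import math

def weight(a, b, x, x0=0.0, n=1000):
    """w(x) = (1/a(x)) * exp(integral_{x0}^{x} b(t)/a(t) dt), i.e. (10.31) with C' = 1.

    The integral is evaluated with the composite Simpson rule (n even).
    """
    h = (x - x0) / n
    s = b(x0) / a(x0) + b(x) / a(x)
    for k in range(1, n):
        t = x0 + k * h
        s += (4 if k % 2 else 2) * b(t) / a(t)
    return math.exp(s * h / 3) / a(x)

# Hermite-type coefficients, assumed for illustration: a(x) = 1, b(x) = -2x.
a = lambda x: 1.0
b = lambda x: -2.0 * x

x = 1.3
w = weight(a, b, x)
print(w, math.exp(-x * x))  # w(x) agrees with exp(-x^2)

# Check the second relation of (10.71), dp/dx = b*w with p = a*w,
# using a central difference for dp/dx.
p = lambda t: a(t) * weight(a, b, t)
h = 1e-5
dp_dx = (p(x + h) - p(x - h)) / (2.0 * h)
print(dp_dx, b(x) * w)
```

With this weight function the operator $\frac{d^2}{dx^2} - 2x\frac{d}{dx}$, which does not satisfy (10.63), takes the self-adjoint form (10.70) with $p(x) = e^{-x^2}$.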

The latter equation of (10.71) corresponds to (10.63) if we assume $w(x) \equiv 1$. When the differential operator $L_x$ is defined as (10.70), $L_x$ is said to be self-adjoint with respect to a weight function $w(x)$. Now we examine boundary functionals. The homogeneous adjoint boundary functionals are described as follows:

$$
B_1^{\dagger}(u) = \alpha_1 v^*(r) + \beta_1 \left.\frac{dv^*}{dx}\right|_{x=r} + \gamma_1 v^*(s) + \delta_1 \left.\frac{dv^*}{dx}\right|_{x=s} = 0, \tag{10.72}
$$

$$
B_2^{\dagger}(u) = \alpha_2 v^*(r) + \beta_2 \left.\frac{dv^*}{dx}\right|_{x=r} + \gamma_2 v^*(s) + \delta_2 \left.\frac{dv^*}{dx}\right|_{x=s} = 0. \tag{10.73}
$$

In (10.3) putting $\alpha_1 = 1$ and $\beta_1 = \gamma_1 = \delta_1 = 0$, we have

$$
B_1(u) = u(r) = \sigma_1. \tag{10.74}
$$

Also putting $\gamma_2 = 1$ and $\alpha_2 = \beta_2 = \delta_2 = 0$, we have

$$
B_2(u) = u(s) = \sigma_2. \tag{10.75}
$$

Further putting

$$
\sigma_1 = \sigma_2 = 0, \tag{10.76}
$$

we also get homogeneous BCs of

$$
B_1(u) = B_2(u) = 0; \quad \text{i.e.,} \quad u(r) = u(s) = 0. \tag{10.77}
$$

For RHS of (10.69) to vanish, it suffices to define $B_1^{\dagger}(u)$ and $B_2^{\dagger}(u)$ such that

$$
B_1^{\dagger}(u) = v^*(r) \quad \text{and} \quad B_2^{\dagger}(u) = v^*(s). \tag{10.78}
$$

Then, homogeneous adjoint BCs read as

$$
v^*(r) = v^*(s) = 0, \quad \text{i.e.,} \quad v(r) = v(s) = 0. \tag{10.79}
$$

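In this manner, we can readily construct homogeneous adjoint BCs that are the same as those of (10.77) so that $L_x$ can be Hermitian. We list several prescriptions of typical BCs below.

$$
\begin{aligned}
&\text{(i)} \quad u(r) = u(s) = 0 \quad \text{(Dirichlet conditions)}, \\
&\text{(ii)} \quad \left.\frac{du}{dx}\right|_{x=r} = \left.\frac{du}{dx}\right|_{x=s} = 0 \quad \text{(Neumann conditions)}, \\
&\text{(iii)} \quad u(r) = u(s) \ \text{and} \ \left.\frac{du}{dx}\right|_{x=r} = \left.\frac{du}{dx}\right|_{x=s} \quad \text{(Periodic conditions)}.
\end{aligned} \tag{10.80}
$$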

Yet, care should be taken when handling RHS of (10.69), i.e., the surface terms. This is because conditions (i) to (iii) are sufficient, but not necessary, for the surface terms to vanish; other conditions can make them vanish as well. Meanwhile, we often have to deal with nonvanishing surface terms. In that case, we have to start with (10.62) instead of (10.69). In Sect. 10.2, we defined Hermiticity of a differential operator by requiring that the operator be self-adjoint and that the homogeneous BCs and homogeneous adjoint BCs be the same. In light of the above argument, however, we may relax the conditions for a differential operator to be Hermitian. This is particularly the case when $p(x) = a(x)w(x)$ in (10.69) vanishes at both the endpoints. We will encounter such a situation in Sect. 10.7.
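The last remark can be made concrete with a small numerical check. The sketch below assumes, purely for illustration, the Legendre-type operator $L_x u = \frac{d}{dx}\left[(1 - x^2)\frac{du}{dx}\right]$ on $[-1, 1]$ with $w(x) = 1$ and $c(x) = 0$, so that $p(x) = 1 - x^2$ vanishes at both endpoints; the test functions $u = x^2$ and $v = x^4$ and the helper `simpson` are likewise hypothetical choices, not taken from the text. Neither function satisfies any of the conditions (i) to (iii) of (10.80), yet the surface term of (10.69) vanishes.

```python
def simpson(f, a, b, n=2000):
    """Composite Simpson rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

p = lambda x: 1.0 - x * x            # p(x) = a(x) w(x), zero at x = -1 and x = +1

u = lambda x: x ** 2
du = lambda x: 2.0 * x
Lu = lambda x: 2.0 - 6.0 * x ** 2    # d/dx[(1 - x^2) * 2x]

v = lambda x: x ** 4
dv = lambda x: 4.0 * x ** 3
Lv = lambda x: 12.0 * x ** 2 - 20.0 * x ** 4   # d/dx[(1 - x^2) * 4x^3]

# Real functions, so complex conjugation in (10.69) can be dropped.
v_Lu = simpson(lambda x: v(x) * Lu(x), -1.0, 1.0)   # <v | L u>
Lv_u = simpson(lambda x: Lv(x) * u(x), -1.0, 1.0)   # <L v | u>

surface = lambda x: p(x) * (v(x) * du(x) - u(x) * dv(x))
print(v_Lu, Lv_u)                      # both equal -32/35
print(surface(1.0) - surface(-1.0))    # surface term of (10.69): zero
```

The two inner products agree although $u(\pm 1) \neq 0$ and $u'(\pm 1) \neq 0$, which is the sense in which the Hermiticity conditions can be relaxed when $p(x)$ vanishes at the endpoints.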

10.4 Green’s Functions

With the foregoing discussion in hand, let us proceed with the study of Green’s functions for SOLDEs. We have to mention a bit of formalism, although we keep it to a minimum. Given $L_x$ defined by (10.55), let us assume

$$
L_x u(x) = d(x) \tag{10.81}
$$

under homogeneous BCs with an inhomogeneous term $d(x)$ being an arbitrary function. We also assume that (10.81) is well defined in a domain $[r, s]$. The numbers $r$ and $s$ can be infinity. Suppose simultaneously that we have

$$
L_x^{\dagger} v(x) = h(x) \tag{10.82}
$$

under homogeneous adjoint BCs [1, 2] with an inhomogeneous term $h(x)$ being an arbitrary function as well. Let us describe the above relations as

$$
L|u\rangle = |d\rangle \quad \text{and} \quad L^{\dagger}|v\rangle = |h\rangle. \tag{10.83}
$$

Suppose that there is an inverse operator $L^{-1} \equiv G$ such that

$$
GL = LG = E, \tag{10.84}
$$

where $E$ is an identity operator. Operating $G$ on (10.83), we have

$$
GL|u\rangle = E|u\rangle = |u\rangle = G|d\rangle. \tag{10.85}
$$

This implies that (10.81) has been solved and the solution is given by $G|d\rangle$. Since an inverse operation to differentiation is integration, $G$ is expected to be an integral operator. We have

$$
\langle x|LG|y\rangle = L_x \langle x|G|y\rangle \equiv L_x G(x, y). \tag{10.86}
$$

Meanwhile, using (10.84) we get

$$
\langle x|LG|y\rangle = \langle x|E|y\rangle = \langle x|y\rangle. \tag{10.87}
$$

Using a weight function $w(x)$, we generalize an inner product of (1.128) such that

$$
\langle g|f\rangle \equiv \int_r^s w(x)g(x)^* f(x)\,dx. \tag{10.88}
$$

As we expand an arbitrary vector using basis vectors, we “expand” an arbitrary function $|f\rangle$ using basis vectors $|x\rangle$. Here, we are treating real numbers as if they formed continuous innumerable basis vectors on a real number line (see Fig. 10.1). Thus, we could expand $|f\rangle$ in terms of $|x\rangle$ such that

$$
|f\rangle = \int_r^s dx\,w(x)f(x)|x\rangle. \tag{10.89}
$$

Fig. 10.1 Function $|f\rangle$ and its coordinate representation $f(x)$

In (10.89) we considered $f(x)$ as if it were an expansion coefficient. The following notation would be reasonable accordingly:

$$
f(x) \equiv \langle x|f\rangle. \tag{10.90}
$$

In (10.90), $f(x)$ can be viewed as the coordinate representation of $|f\rangle$. Thus, from (10.89) we get

$$
\langle x'|f\rangle = f(x') = \int_r^s dx\,w(x)f(x)\langle x'|x\rangle. \tag{10.91}
$$

Alternatively, we have

$$
f(x') = \int_r^s dx\,f(x)\delta(x - x'). \tag{10.92}
$$

This comes from a property of the $\delta$ function [1] described as

$$
\int_r^s dx\,f(x)\delta(x) = f(0). \tag{10.93}
$$

Comparing (10.91) and (10.92), we have

$$
w(x)\langle x'|x\rangle = \delta(x - x') \quad \text{or} \quad \langle x'|x\rangle = \frac{\delta(x - x')}{w(x)} = \frac{\delta(x' - x)}{w(x)}. \tag{10.94}
$$

Thus comparing (10.86) and (10.87) and using (10.94), we get

$$
L_x G(x, y) = \frac{\delta(x - y)}{w(x)}. \tag{10.95}
$$

In a similar manner, we also have

$$
L_x^{\dagger} g(x, y) = \frac{\delta(x - y)}{w(x)}. \tag{10.96}
$$

To arrive at (10.96), we start the discussion assuming an operator $\left(L_x^{\dagger}\right)^{-1}$ such that $\left(L_x^{\dagger}\right)^{-1} \equiv g$ with $gL^{\dagger} = L^{\dagger}g = E$. The function $G(x, y)$ is called a Green’s function and $g(x, y)$ is said to be an adjoint Green’s function. Handling of Green’s functions and adjoint Green’s functions is based upon (10.95) and (10.96), respectively. As (10.81) is defined in a domain $r \le x \le s$, (10.95) and (10.96) are defined in a domain $r \le x \le s$ and $r \le y \le s$. Notice that except for the point $x = y$ we have

$$
L_x G(x, y) = 0 \quad \text{and} \quad L_x^{\dagger} g(x, y) = 0. \tag{10.97}
$$

That is, $G(x, y)$ and $g(x, y)$ satisfy the homogeneous equation with respect to the variable $x$. Accordingly, we require $G(x, y)$ and $g(x, y)$ to satisfy the same homogeneous BCs with respect to the variable $x$ as those imposed upon $u(x)$ and $v(x)$ of (10.81) and (10.82), respectively [1]. The relation (10.88) can be obtained as follows: Operating $\langle g|$ on (10.89), we have

$$
\langle g|f\rangle = \int_r^s dx\,w(x)f(x)\langle g|x\rangle = \int_r^s w(x)g(x)^* f(x)\,dx, \tag{10.98}
$$

where for the last equality we used

$$
\langle g|x\rangle = \langle x|g\rangle^* = g(x)^*. \tag{10.99}
$$

For this, see (1.113) where $A$ is replaced with an identity operator $E$ with regard to a complex conjugate of an inner product of two vectors. Also see (13.2) of Sect. 13.1. If in (10.69) the surface term (i.e., RHS) vanishes under appropriate conditions, e.g., (10.80), we have

$$
\int_r^s dx\,w(x)\left\{ v^*(L_x u) - \left[L_x^{\dagger} v\right]^* u \right\} = 0, \tag{10.100}
$$

which is called Green’s identity. Since (10.100) is derived from identities (10.56), (10.100) is an identity as well (as the term Green’s identity indicates). Therefore, (10.100) must hold with any functions $u$ and $v$ so far as they satisfy homogeneous BCs. Thus, replacing $v$ in (10.100) with $g(x, y)$ and using (10.96) together with (10.81), we have

$$
\begin{aligned}
& \int_r^s dx\,w(x)\left\{ g^*(x, y)\left[L_x u(x)\right] - \left[L_x^{\dagger} g(x, y)\right]^* u(x) \right\} \\
&\quad = \int_r^s dx\,w(x)\left\{ g^*(x, y)d(x) - \left[\frac{\delta(x - y)}{w(x)}\right]^* u(x) \right\} \\
&\quad = \int_r^s dx\,w(x)g^*(x, y)d(x) - u(y) = 0,
\end{aligned} \tag{10.101}
$$

where with the second last equality we used a property of the $\delta$ function. Also notice that $\frac{\delta(x - y)}{w(x)}$ is a real function. Rewriting (10.101), we get

$$
u(y) = \int_r^s dx\,w(x)g^*(x, y)d(x). \tag{10.102}
$$

Similarly, replacing $u$ in (10.100) with $G(x, y)$ and using (10.95) together with (10.82), we have

$$
v(y) = \int_r^s dx\,w(x)G^*(x, y)h(x). \tag{10.103}
$$

Replacing $u$ and $v$ in (10.100) with $G(x, q)$ and $g(x, t)$, respectively, we have

$$
\int_r^s dx\,w(x)\left\{ g^*(x, t)\left[L_x G(x, q)\right] - \left[L_x^{\dagger} g(x, t)\right]^* G(x, q) \right\} = 0. \tag{10.104}
$$

Notice that we have chosen $q$ and $t$ for the second argument $y$ in (10.95) and (10.96), respectively. Inserting (10.95) and (10.96) into the above equation after changing arguments, we have

$$
\int_r^s dx\,w(x)\left\{ g^*(x, t)\frac{\delta(x - q)}{w(x)} - \frac{\delta(x - t)}{w(x)}G(x, q) \right\} = 0. \tag{10.105}
$$

Thus, we get

$$
g^*(q, t) = G(t, q) \quad \text{or} \quad g(q, t) = G^*(t, q). \tag{10.106}
$$

This implies that $G(t, q)$ must satisfy the adjoint BCs with respect to the second argument $q$. Inserting (10.106) into (10.102), we get

$$
u(y) = \int_r^s dx\,w(x)G(y, x)d(x). \tag{10.107}
$$

Or exchanging the arguments $x$ and $y$, we have

$$
u(x) = \int_r^s dy\,w(y)G(x, y)d(y). \tag{10.108}
$$

Similarly, inserting (10.106) into (10.103), we get

$$
v(y) = \int_r^s dx\,w(x)g(y, x)h(x). \tag{10.109}
$$

Or, we have

$$
v(x) = \int_r^s dy\,w(y)g(x, y)h(y). \tag{10.110}
$$

Equations (10.107) to (10.110) clearly show that homogeneous equations [given by putting $d(x) = h(x) = 0$] have only the trivial solutions $u(x) \equiv 0$ and $v(x) \equiv 0$ under homogeneous BCs. Note that this is always the case when we are able to construct a Green’s function. This in turn implies that we can construct a Green’s function if the differential operator is accompanied by initial conditions. Conversely, if the homogeneous equation has a nontrivial solution under homogeneous BCs, Eqs. (10.107) to (10.110) will not work. If the differential operator $L$ in (10.81) is Hermitian, according to the associated remarks of Sect. 10.2 we must have $L_x = L_x^{\dagger}$, and $u(x)$ and $v(x)$ of (10.81) and (10.82) must satisfy the same homogeneous BCs. Consequently, in the case of a Hermitian operator we should have

$$
G(x, y) = g(x, y). \tag{10.111}
$$

From (10.106) and (10.111), if the operator is Hermitian we get

$$
G(x, y) = G^*(y, x). \tag{10.112}
$$

In Sect. 10.3 we assumed that the coefficients $a(x)$, $b(x)$, and $c(x)$ are real to assure that $L_x$ is Hermitian [1]. On this condition $G(x, y)$ is real as well (vide infra). Then we have

$$
G(x, y) = G(y, x). \tag{10.113}
$$

That is, $G(x, y)$ is real symmetric with respect to the arguments $x$ and $y$. To be able to apply Green’s functions to practical use, we have to examine the behavior of the Green’s function near $x = y$. This is because in light of (10.95) and (10.96) there is a “jump” at $x = y$. When we deal with a case where a self-adjoint operator is relevant, using a function $p(x)$ of (10.69) we have

$$
\frac{a(x)}{p(x)}\frac{\partial}{\partial x}\left(p\frac{\partial G}{\partial x}\right) + c(x)G = \frac{\delta(x - y)}{w(x)}. \tag{10.114}
$$

Multiplying both sides by $\frac{p(x)}{a(x)}$, we have

$$
\frac{\partial}{\partial x}\left(p\frac{\partial G}{\partial x}\right) = \frac{p(x)}{a(x)}\frac{\delta(x - y)}{w(x)} - \frac{p(x)c(x)}{a(x)}G(x, y). \tag{10.115}
$$

Using a property of the $\delta$ function expressed by

$$
f(x)\delta(x) = f(0)\delta(x) \quad \text{or} \quad f(x)\delta(x - y) = f(y)\delta(x - y), \tag{10.116}
$$

we have

$$
\frac{\partial}{\partial x}\left(p\frac{\partial G}{\partial x}\right) = \frac{p(y)}{a(y)}\frac{\delta(x - y)}{w(y)} - \frac{p(x)c(x)}{a(x)}G(x, y). \tag{10.117}
$$

Integrating (10.117) with respect to $x$, we get

$$
p\frac{\partial G(x, y)}{\partial x} = \frac{p(y)}{a(y)w(y)}\theta(x - y) - \int_r^x dt\,\frac{p(t)c(t)}{a(t)}G(t, y) + C, \tag{10.118}
$$

where $C$ is a constant. The function $\theta(x - y)$ is defined through

$$
\theta(x) = \begin{cases} 1 & (x > 0) \\ 0 & (x < 0). \end{cases} \tag{10.119}
$$

Note that we have

$$
\frac{d\theta(x)}{dx} = \delta(x). \tag{10.120}
$$

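In RHS of (10.118) the first term has a discontinuity at $x = y$ because of $\theta(x - y)$, whereas the second term is continuous with respect to $y$. Thus, we have

$$
\begin{aligned}
& \lim_{\varepsilon \to +0}\left[ p(y + \varepsilon)\left.\frac{\partial G(x, y)}{\partial x}\right|_{x = y + \varepsilon} - p(y - \varepsilon)\left.\frac{\partial G(x, y)}{\partial x}\right|_{x = y - \varepsilon} \right] \\
&\quad = \lim_{\varepsilon \to +0}\frac{p(y)}{a(y)w(y)}\left[\theta(+\varepsilon) - \theta(-\varepsilon)\right] = \frac{p(y)}{a(y)w(y)}.
\end{aligned} \tag{10.121}
$$

Since $p(y)$ is continuous with respect to the argument $y$, this factor drops off and we get

$$
\lim_{\varepsilon \to +0}\left[ \left.\frac{\partial G(x, y)}{\partial x}\right|_{x = y + \varepsilon} - \left.\frac{\partial G(x, y)}{\partial x}\right|_{x = y - \varepsilon} \right] = \frac{1}{a(y)w(y)}. \tag{10.122}
$$

Thus, $\frac{\partial G(x, y)}{\partial x}$ is accompanied by a discontinuity at $x = y$ of magnitude $\frac{1}{a(y)w(y)}$. Since RHS of (10.122) is continuous with respect to the argument $y$, integrating (10.122) again with respect to $x$, we find that $G(x, y)$ is continuous at $x = y$. These properties of $G(x, y)$ are useful for calculating Green’s functions in practice. We will encounter several examples in the next sections.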

Suppose that there are two Green’s functions that satisfy the same homogeneous BCs. Let $G(x, y)$ and $\widetilde{G}(x, y)$ be such functions. Then, we must have

$$
L_x G(x, y) = \frac{\delta(x - y)}{w(x)} \quad \text{and} \quad L_x \widetilde{G}(x, y) = \frac{\delta(x - y)}{w(x)}. \tag{10.123}
$$

Subtracting both sides of (10.123), we have

$$
L_x\left[\widetilde{G}(x, y) - G(x, y)\right] = 0. \tag{10.124}
$$

By virtue of the linearity of BCs, $G(x, y) - \widetilde{G}(x, y)$ must satisfy the same homogeneous BCs as well. But (10.124) is a homogeneous equation, and so we must have a trivial solution from the aforementioned constructability of the Green’s function such that

$$
G(x, y) - \widetilde{G}(x, y) \equiv 0 \quad \text{or} \quad G(x, y) \equiv \widetilde{G}(x, y). \tag{10.125}
$$

This indicates that a Green’s function should be unique. We have assumed in Sect. 10.3 that the coefficients $a(x)$, $b(x)$, and $c(x)$ are real. Therefore, taking the complex conjugate of (10.95) we have

$$
L_x G(x, y)^* = \frac{\delta(x - y)}{w(x)}. \tag{10.126}
$$

Notice here that both $\delta(x - y)$ and $w(x)$ are real functions. Subtracting (10.95) from (10.126), we have

$$
L_x\left[G(x, y)^* - G(x, y)\right] = 0.
$$

Again, from the uniqueness of the Green’s function, we get $G(x, y)^* = G(x, y)$; i.e., $G(x, y)$ is real accordingly. This is independent of specific structures of $L_x$. In other words, so far as we are dealing with real coefficients $a(x)$, $b(x)$, and $c(x)$, $G(x, y)$ is real whether or not $L_x$ is self-adjoint.
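The properties derived in this section can be checked on a simple case. The sketch below assumes, for illustration only, the operator $L_x = \frac{d^2}{dx^2}$ on $[0, 1]$ with $a(x) = w(x) = 1$ and the Dirichlet conditions of (10.80); the closed form used for $G(x, y)$ and the helper names `G`, `simpson`, and `solve` are assumptions of this example rather than results quoted from the text.

```python
def G(x, y):
    """Green's function assumed for d^2/dx^2 on [0, 1] with G(0, y) = G(1, y) = 0."""
    return x * (y - 1.0) if x <= y else y * (x - 1.0)

def simpson(f, a, b, n=200):
    """Composite Simpson rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def solve(d, x):
    """u(x) = integral_0^1 G(x, y) d(y) dy, i.e. (10.108) with w = 1.

    The integral is split at y = x so that no Simpson panel straddles the kink of G.
    """
    f = lambda y: G(x, y) * d(y)
    return simpson(f, 0.0, x) + simpson(f, x, 1.0)

# Symmetry (10.113).
sym_gap = G(0.2, 0.7) - G(0.7, 0.2)

# Jump of dG/dx across x = y, expected to be 1/(a w) = 1 by (10.122).
y, h = 0.4, 1e-3
right = (G(y + h, y) - G(y, y)) / h
left = (G(y, y) - G(y - h, y)) / h
jump = right - left

# Solve u'' = 1 with u(0) = u(1) = 0; the exact solution is x (x - 1) / 2.
u_num = solve(lambda t: 1.0, 0.3)
u_exact = 0.3 * (0.3 - 1.0) / 2.0
print(sym_gap, jump, u_num, u_exact)
```

The function $G(x, y)$ itself is continuous at $x = y$, while its first derivative jumps by $\frac{1}{a(y)w(y)} = 1$, in line with (10.122).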

10.5 Construction of Green’s Functions

So far we have dealt with homogeneous boundary conditions (BCs) with respect to a differential equation

$$
a(x)\frac{d^2 u}{dx^2} + b(x)\frac{du}{dx} + c(x)u = d(x), \tag{10.2}
$$

where coefficients $a(x)$, $b(x)$, and $c(x)$ are real. In this case, if $d(x) \equiv 0$ in (10.108), namely, if the SOLDE is a homogeneous equation, we have a solution $u(x) \equiv 0$ on the basis of (10.108). If, on the other hand, we have inhomogeneous boundary conditions (BCs), additional terms appear on RHS of (10.108) for both homogeneous and inhomogeneous equations. In this section, we examine how we can deal with this problem.

Following the remarks made in Sect. 10.3, we start with (10.62) or (10.69). If we deal with a self-adjoint or Hermitian operator, we can apply (10.69) to the problem. In a more general case where the operator is not self-adjoint, (10.62) is useful. We will have a good opportunity to use it in Sect. 10.6. In Sect. 10.3, we mentioned that we may relax the definition of Hermiticity of the differential operator in the case where the surface term vanishes. Meanwhile, we should bear in mind that the Green’s functions and adjoint Green’s functions are constructed using homogeneous BCs regardless of whether we are concerned with a homogeneous equation or inhomogeneous equation. Thus, even if the surface terms do not vanish, we may regard the differential operator as Hermitian. This is because we deal with essentially the same Green’s function to solve a problem for both homogeneous and inhomogeneous equations (vide infra). Notice also that whether or not RHS vanishes, we are to use the same Green’s function [1]. In this sense, we do not have to be too strict with the definition of Hermiticity. Now, suppose that for a differential (self-adjoint) operator $L_x$ we are given

$$
\int_r^s dx\,w(x)\left\{ v^*(L_x u) - \left[L_x v\right]^* u \right\} = \left[p(x)\left(v^* \frac{du}{dx} - u \frac{dv^*}{dx}\right)\right]_r^s, \tag{10.69}
$$

where $w(x) > 0$ in a domain $[r, s]$ and $p(x)$ is a real function. Note that since (10.69) is an identity, we may choose $G(x, y)$ for $v$ with an appropriate choice of $w(x)$. Then from (10.95) and (10.112), we have

\[
\int_r^s dx\, w(x)\left\{G(y,x)d(x) - \left[\frac{\delta(x-y)}{w(x)}\right]^* u\right\}
= \int_r^s dx\, w(x)\left\{G(y,x)d(x) - \frac{\delta(x-y)}{w(x)}\, u\right\}
= \left[p(x)\left(G(y,x)\frac{du(x)}{dx} - u(x)\frac{\partial G(y,x)}{\partial x}\right)\right]_{x=r}^{s}. \tag{10.127}
\]

Note that in (10.127) we used the fact that both $\delta(x-y)$ and $w(x)$ are real. Using a property of the $\delta$ function, we get

\[
u(y) = \int_r^s dx\, w(x)G(y,x)d(x) - \left[p(x)\left(G(y,x)\frac{du(x)}{dx} - u(x)\frac{\partial G(y,x)}{\partial x}\right)\right]_{x=r}^{s}. \tag{10.128}
\]

When the differential operator $L_x$ can be made Hermitian under an appropriate condition of (10.71), as a real Green's function we have

\[ G(x,y) = G(y,x). \tag{10.113} \]

The function $G(x, y)$ satisfies homogeneous BCs. Hence, if we assume, e.g., the Dirichlet BCs [see (10.80)], we have

\[ G(r,y) = G(s,y) = 0. \tag{10.129} \]

Using the symmetric property of $G(x, y)$ with respect to the arguments $x$ and $y$, from (10.129) we get

\[ G(y,r) = G(y,s) = 0. \tag{10.130} \]

Thus, the first term of the surface terms of (10.128) is eliminated to yield

\[ u(y) = \int_r^s dx\, w(x)G(y,x)d(x) + \left[p(x)u(x)\frac{\partial G(y,x)}{\partial x}\right]_{x=r}^{s}. \]

Exchanging the arguments $x$ and $y$, we get

\[ u(x) = \int_r^s dy\, w(y)G(x,y)d(y) + \left[p(y)u(y)\frac{\partial G(x,y)}{\partial y}\right]_{y=r}^{s}. \tag{10.131} \]

Then, (i) substituting the surface values $u(s)$ and $u(r)$ that are associated with the inhomogeneous BCs described as

\[ B_1(u) = \sigma_1 \quad \text{and} \quad B_2(u) = \sigma_2 \tag{10.132} \]

and (ii) calculating $\left.\frac{\partial G(x,y)}{\partial y}\right|_{y=r}$ and $\left.\frac{\partial G(x,y)}{\partial y}\right|_{y=s}$, we will be able to obtain a unique solution. Notice that even though we have formally the same differential operators, we get different Green's functions depending upon different BCs. We see tangible examples later.

On the basis of the general discussion of Sect. 10.4 and this section, we are in a position to construct the Green's functions. Except at the point $x = y$, the Green's function $G(x, y)$ must satisfy the following differential equation:

\[ L_x G(x,y) = 0, \tag{10.133} \]

where $L_x$ is given by

\[ L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x). \tag{10.55} \]

The differential equation $L_x u = d(x)$ is defined within an interval $[r, s]$, where $r$ may be $-\infty$ and $s$ may be $+\infty$. From now on, we regard $a(x)$, $b(x)$, and $c(x)$ as real functions. From (10.133), we expect the Green's function to be described as a linear combination of a fundamental set of solutions $u_1(x)$ and $u_2(x)$. Here the fundamental set of solutions consists of two linearly independent solutions of the homogeneous equation $L_x u = 0$. Then we should be able to express $G(x, y)$ as a combination of $F_1(x, y)$ and $F_2(x, y)$ that are described as

\[
\begin{aligned}
F_1(x,y) &= c_1 u_1(x) + c_2 u_2(x) \quad \text{for } r \le x < y, \\
F_2(x,y) &= d_1 u_1(x) + d_2 u_2(x) \quad \text{for } y < x \le s,
\end{aligned} \tag{10.134}
\]

where $c_1$, $c_2$, $d_1$, and $d_2$ are arbitrary (complex) constants to be determined later. These constants are given as functions of $y$. The combination has to be made such that

\[
G(x,y) = \begin{cases} F_1(x,y) & \text{for } r \le x < y, \\ F_2(x,y) & \text{for } y < x \le s. \end{cases} \tag{10.135}
\]

Thus, using the $\theta(x)$ function defined in (10.119), we describe $G(x, y)$ as

\[ G(x,y) = F_1(x,y)\theta(y-x) + F_2(x,y)\theta(x-y). \tag{10.136} \]

Notice that $F_1(x, y)$ and $F_2(x, y)$ are "ordinary" functions and that $G(x, y)$ is not, because $G(x, y)$ contains the $\theta(x)$ function. If we have

\[ F_2(x,y) = F_1(y,x), \tag{10.137} \]

then

\[ G(x,y) = F_1(x,y)\theta(y-x) + F_1(y,x)\theta(x-y). \tag{10.138} \]

Hence, we get

\[ G(x,y) = G(y,x). \tag{10.139} \]

From (10.113), $L_x$ is Hermitian. Suppose that $F_1(x, y) = (x-r)(y-s)$ and $F_2(x, y) = (x-s)(y-r)$. Then (10.137) is satisfied and, hence, if we can construct the Green's function from $F_1(x, y)$ and $F_2(x, y)$, $L_x$ should be Hermitian. However, if we had, e.g., $F_1(x, y) = x - r$ and $F_2(x, y) = y - s$, then $G(x, y) \ne G(y, x)$, and so $L_x$ would not be Hermitian.

The Green's functions must satisfy the homogeneous BCs. That is,

\[ B_1(G) = B_2(G) = 0. \tag{10.140} \]

Also, we require the continuity condition of $G(x, y)$ at $x = y$ and the discontinuity condition of $\frac{\partial G(x,y)}{\partial x}$ at $x = y$ described by (10.122). Thus, we have four conditions, comprising the BCs and the continuity and discontinuity conditions, to be satisfied by $G(x, y)$, and these four conditions determine the four constants $c_1$, $c_2$, $d_1$, and $d_2$. Now, let us inspect further details about the Green's functions by an example.

Example 10.4 Let us consider the following differential equation:

\[ \frac{d^2u}{dx^2} + u = 1. \tag{10.141} \]

We assume that the domain of the argument $x$ is $[0, L]$. We set boundary conditions such that

\[ u(0) = \sigma_1 \quad \text{and} \quad u(L) = \sigma_2. \tag{10.142} \]

Thus, if at least one of $\sigma_1$ and $\sigma_2$ is not zero, we are dealing with an inhomogeneous differential equation under inhomogeneous BCs. Next, let us seek conditions that the Green's function satisfies. We also seek a fundamental set of solutions of the homogeneous equation described by

\[ \frac{d^2u}{dx^2} + u = 0. \tag{10.143} \]

This is obtained by putting $a = c = 1$ and $b = 0$ in the general form (10.5) with the weight function being unity. The differential equation (10.143) is therefore self-adjoint according to the argument of Sect. 10.3. A fundamental set of solutions is given by $e^{ix}$ and $e^{-ix}$. Then, we have

\[
\begin{aligned}
F_1(x,y) &= c_1 e^{ix} + c_2 e^{-ix} \quad \text{for } 0 \le x < y, \\
F_2(x,y) &= d_1 e^{ix} + d_2 e^{-ix} \quad \text{for } y < x \le L.
\end{aligned} \tag{10.144}
\]

The functions $F_1(x, y)$ and $F_2(x, y)$ must satisfy the following BCs:

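\[ F_1(0,y) = c_1 + c_2 = 0 \quad \text{and} \quad F_2(L,y) = d_1 e^{iL} + d_2 e^{-iL} = 0. \tag{10.145} \]

Thus, we have

\[ F_1(x,y) = c_1\left(e^{ix} - e^{-ix}\right), \quad F_2(x,y) = d_1\left(e^{ix} - e^{2iL}e^{-ix}\right). \tag{10.146} \]

Therefore, at $x = y$ we have

\[ c_1\left(e^{iy} - e^{-iy}\right) = d_1\left(e^{iy} - e^{2iL}e^{-iy}\right). \tag{10.147} \]

The discontinuity condition of (10.122) is equivalent to

\[ \left.\frac{\partial F_2(x,y)}{\partial x}\right|_{x=y} - \left.\frac{\partial F_1(x,y)}{\partial x}\right|_{x=y} = 1. \tag{10.148} \]

This is because both $F_1(x, y)$ and $F_2(x, y)$ are ordinary functions and supposed to be differentiable at any $x$. The relation (10.148) then reads as

\[ i d_1\left(e^{iy} + e^{2iL}e^{-iy}\right) - i c_1\left(e^{iy} + e^{-iy}\right) = 1. \tag{10.149} \]

From (10.147) and (10.149), using Cramer's rule we have

\[
c_1 = \frac{\begin{vmatrix} 0 & -e^{iy} + e^{2iL}e^{-iy} \\ -i & e^{iy} + e^{2iL}e^{-iy} \end{vmatrix}}{\begin{vmatrix} e^{iy} - e^{-iy} & -e^{iy} + e^{2iL}e^{-iy} \\ -e^{iy} - e^{-iy} & e^{iy} + e^{2iL}e^{-iy} \end{vmatrix}} = \frac{i\left(e^{iy} - e^{2iL}e^{-iy}\right)}{2\left(1 - e^{2iL}\right)}, \tag{10.150}
\]

\[
d_1 = \frac{\begin{vmatrix} e^{iy} - e^{-iy} & 0 \\ -e^{iy} - e^{-iy} & -i \end{vmatrix}}{\begin{vmatrix} e^{iy} - e^{-iy} & -e^{iy} + e^{2iL}e^{-iy} \\ -e^{iy} - e^{-iy} & e^{iy} + e^{2iL}e^{-iy} \end{vmatrix}} = \frac{i\left(e^{iy} - e^{-iy}\right)}{2\left(1 - e^{2iL}\right)}. \tag{10.151}
\]

Substituting these parameters into (10.146), we get

\[
F_1(x,y) = \frac{\sin x\left(e^{2iL}e^{-iy} - e^{iy}\right)}{1 - e^{2iL}}, \quad F_2(x,y) = \frac{\sin y\left(e^{2iL}e^{-ix} - e^{ix}\right)}{1 - e^{2iL}}. \tag{10.152}
\]

Making the denominator real, we have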

\[
F_1(x,y) = \frac{\sin x\,[\cos(y-2L) - \cos y]}{2\sin^2 L}, \quad
F_2(x,y) = \frac{\sin y\,[\cos(x-2L) - \cos x]}{2\sin^2 L}. \tag{10.153}
\]

Using the $\theta(x)$ function, we get

\[
G(x,y) = \frac{\sin x\,[\cos(y-2L) - \cos y]}{2\sin^2 L}\,\theta(y-x) + \frac{\sin y\,[\cos(x-2L) - \cos x]}{2\sin^2 L}\,\theta(x-y). \tag{10.154}
\]

Thus, $G(x, y) = G(y, x)$ as expected. Notice, however, that if $L = n\pi$ ($n = 1, 2, \ldots$) the Green's function cannot be defined as an ordinary function even if $x \ne y$. We return to this point later.

The solution for (10.141) under the homogeneous BCs is then described as

\[
u(x) = \int_0^L dy\, G(x,y)
= \frac{\cos(x-2L) - \cos x}{2\sin^2 L}\int_0^x \sin y\, dy + \frac{\sin x}{2\sin^2 L}\int_x^L [\cos(y-2L) - \cos y]\, dy. \tag{10.155}
\]

This can readily be integrated to yield the solution for the inhomogeneous equation such that

\[
u(x) = \frac{\cos(x-2L) - \cos x - 2\sin L \sin x - \cos 2L + 1}{2\sin^2 L}
= \frac{\cos(x-2L) - \cos x - 2\sin L \sin x + 2\sin^2 L}{2\sin^2 L}. \tag{10.156}
\]
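As a numerical cross-check of (10.154) to (10.156), the following minimal sketch integrates the Green's function against $d(y) = 1$ with a composite Simpson rule and compares the result with the closed form. The function names and the choice of quadrature are ours for illustration only; they are not part of the original derivation, and $L$ is assumed not to be a multiple of $\pi$.

```python
import math

def green_dirichlet(x, y, L):
    """G(x, y) of (10.154) for u'' + u with u(0) = u(L) = 0 (L not a multiple of pi)."""
    denom = 2.0 * math.sin(L) ** 2
    if x < y:  # branch F1, the one multiplied by theta(y - x)
        return math.sin(x) * (math.cos(y - 2 * L) - math.cos(y)) / denom
    # branch F2, the one multiplied by theta(x - y); F1 = F2 at x = y by continuity
    return math.sin(y) * (math.cos(x - 2 * L) - math.cos(x)) / denom

def simpson(f, a, b, n=400):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def u_by_quadrature(x, L):
    """u(x) = int_0^L G(x, y) dy, split at the kink y = x as in (10.155)."""
    g = lambda y: green_dirichlet(x, y, L)
    return simpson(g, 0.0, x) + simpson(g, x, L)

def u_closed_form(x, L):
    """Right-hand side of (10.156)."""
    s = math.sin(L)
    return (math.cos(x - 2 * L) - math.cos(x) - 2 * s * math.sin(x) + 2 * s * s) / (2 * s * s)
```

Splitting the integral at $y = x$ keeps each Simpson panel on a smooth branch of $G$, which is why a modest number of subintervals already reproduces (10.156) closely.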

Next, let us consider the surface term. This is given by the second term of (10.131). We get

\[
\left.\frac{\partial F_1(x,y)}{\partial y}\right|_{y=L} = \frac{\sin x}{\sin L}, \quad
\left.\frac{\partial F_2(x,y)}{\partial y}\right|_{y=0} = \frac{\cos(x-2L) - \cos x}{2\sin^2 L}. \tag{10.157}
\]

Therefore, with the inhomogeneous BCs we have the following solution for the inhomogeneous equation:

\[
u(x) = \frac{\cos(x-2L) - \cos x - 2\sin L \sin x + 2\sin^2 L}{2\sin^2 L} + \frac{2\sigma_2 \sin L \sin x + \sigma_1[\cos x - \cos(x-2L)]}{2\sin^2 L}, \tag{10.158}
\]

where the second term is the surface term. If $\sigma_1 = \sigma_2 = 1$, we have $u(x) \equiv 1$. Looking at (10.141), we find that $u(x) \equiv 1$ is certainly a solution for (10.141) with the inhomogeneous BCs $\sigma_1 = \sigma_2 = 1$. The uniqueness of the solution then ensures that $u(x) \equiv 1$ is the sole solution under the said BCs.

From (10.154), we find that $G(x, y)$ has a singularity at $L = n\pi$ ($n$: integer). This is associated with the fact that the homogeneous equation (10.143) has a nontrivial solution, e.g., $u(x) = \sin x$, under the homogeneous BCs $u(0) = u(L) = 0$. The present situation is essentially the same as that of Example 1.1 of Sect. 1.3. In other words, when $\lambda = 1$ in (1.61), the form of the differential equation is identical to (10.143) with virtually the same Dirichlet conditions. The point is that (10.143) can be viewed as a homogeneous equation and, at the same time, as an eigenvalue equation. In such a case, a Green's function approach will fail.
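The statements about (10.158) can be checked numerically. The sketch below evaluates (10.158) directly and estimates the residual $u'' + u - 1$ by a central difference; the helper names and the step size are illustrative assumptions, not part of the text.

```python
import math

def u_inhomogeneous(x, L, sigma1, sigma2):
    """u(x) of (10.158): bulk term (10.156) plus the surface term."""
    s = math.sin(L)
    denom = 2.0 * s * s
    bulk = (math.cos(x - 2 * L) - math.cos(x) - 2 * s * math.sin(x) + denom) / denom
    surface = (2 * sigma2 * s * math.sin(x)
               + sigma1 * (math.cos(x) - math.cos(x - 2 * L))) / denom
    return bulk + surface

def residual(x, L, sigma1, sigma2, h=1e-3):
    """Central-difference estimate of u'' + u - 1; it should be close to zero."""
    u = lambda z: u_inhomogeneous(z, L, sigma1, sigma2)
    second = (u(x + h) - 2 * u(x) + u(x - h)) / (h * h)
    return second + u(x) - 1.0
```

For instance, `u_inhomogeneous(0.0, L, s1, s2)` and `u_inhomogeneous(L, L, s1, s2)` return the boundary values $\sigma_1$ and $\sigma_2$ up to rounding, and the choice $\sigma_1 = \sigma_2 = 1$ gives $u(x) \equiv 1$ at every sampled point. As $L$ approaches a multiple of $\pi$ the denominator `denom` tends to zero, which is the singularity discussed above.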

10.6 Initial Value Problems (IVPs)

10.6.1 General Remarks

The IVPs frequently appear in mathematical physics. The relevant conditions are dealt with as BCs in the theory of differential equations. With the boundary functionals $B_1(u)$ and $B_2(u)$ of (10.3) and (10.4), setting $\alpha_1 = \beta_2 = 1$ and the other coefficients to zero, we get

\[ B_1(u) = u(p) = \sigma_1 \quad \text{and} \quad B_2(u) = \left.\frac{du}{dx}\right|_{x=p} = \sigma_2. \tag{10.159} \]

In the above, note that we choose $[r, s]$ for the domain of $x$. The points $r$ and $s$ can be infinity as before. Any point $p$ within the domain $[r, s]$ may be designated as a special point at which the BCs (10.159) are imposed. Such conditions are particularly prominent among BCs because they are all set at a single point of the argument; they are usually called initial conditions. In this section, we investigate fundamental characteristics of IVPs. Suppose that we have

\[ u(p) = \left.\frac{du}{dx}\right|_{x=p} = 0 \tag{10.160} \]

as homogeneous BCs. Given a differential operator $L_x$ defined as (10.55), i.e.,

\[ L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x), \tag{10.55} \]

let a fundamental set of solutions be $u_1(x)$ and $u_2(x)$ for

\[ L_x u(x) = 0. \tag{10.161} \]

A general solution $u(x)$ for (10.161) is given by a linear combination of $u_1(x)$ and $u_2(x)$ such that

\[ u(x) = c_1 u_1(x) + c_2 u_2(x), \tag{10.162} \]

where $c_1$ and $c_2$ are arbitrary (complex) constants. Suppose that we have homogeneous BCs expressed by (10.160). Then, we have

\[ u(p) = c_1 u_1(p) + c_2 u_2(p) = 0, \quad u'(p) = c_1 u_1'(p) + c_2 u_2'(p) = 0. \]

Rewriting it in a matrix form, we have

\[ \begin{pmatrix} u_1(p) & u_2(p) \\ u_1'(p) & u_2'(p) \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = 0. \]

Since the determinant of this matrix is the Wronskian of a fundamental set of solutions $u_1(x)$ and $u_2(x)$, it never vanishes at any point $p$. That is, we have

\[ \begin{vmatrix} u_1(p) & u_2(p) \\ u_1'(p) & u_2'(p) \end{vmatrix} \ne 0. \tag{10.163} \]

Then, we necessarily have $c_1 = c_2 = 0$. From (10.162), we have the trivial solution

\[ u(x) \equiv 0 \]

under the initial conditions as homogeneous BCs. Thus, as already discussed, a Green's function can always be constructed for IVPs. To seek the Green's functions for IVPs, we return to the generalized Green's identity described as

\[
\int_r^s dx\left\{v^*(L_x u) - \left[L_x^\dagger v\right]^* u\right\} = \left[a v^* \frac{du}{dx} - u\frac{d(a v^*)}{dx} + b u v^*\right]_r^s. \tag{10.62}
\]

For the surface term (RHS) to vanish, for homogeneous BCs we have, e.g.,

\[ u(s) = \left.\frac{du}{dx}\right|_{x=s} = 0 \quad \text{and} \quad v(r) = \left.\frac{dv}{dx}\right|_{x=r} = 0 \]

for the two sets of BCs adjoint to each other. Obviously, these are not identical simply because the former is determined at $s$ and the latter is determined at a different point $r$. For this reason, the operator $L_x$ is not Hermitian, even though it is formally self-adjoint. In such a case, we would rather use $L_x$ directly than construct a self-adjoint operator because we cannot make the operator Hermitian either way. Hence, unlike the preceding sections we do not need a weight function $w(x)$. Or, we may regard $w(x) \equiv 1$.

Then, we reconsider the conditions which the Green's functions should satisfy. On the basis of the general consideration of Sect. 10.4, especially (10.86), (10.87), and (10.94), we have [2]

\[ L_x G(x,y) = \langle x | y \rangle = \delta(x-y). \tag{10.164} \]

Therefore, we have

\[
\frac{\partial^2 G(x,y)}{\partial x^2} + \frac{b(x)}{a(x)}\frac{\partial G(x,y)}{\partial x} + \frac{c(x)}{a(x)}G(x,y) = \frac{\delta(x-y)}{a(x)}.
\]

Integrating or integrating by parts the above equation, we get

\[
\frac{\partial G(x,y)}{\partial x} + \left[\frac{b(x)}{a(x)}G(x,y)\right]_{x_0}^{x} - \int_{x_0}^{x}\left(\frac{b(\xi)}{a(\xi)}\right)' G(\xi,y)\,d\xi + \int_{x_0}^{x}\frac{c(\xi)}{a(\xi)}G(\xi,y)\,d\xi = \frac{\theta(x-y)}{a(y)}.
\]

Noting that the functions other than $\frac{\partial G(x,y)}{\partial x}$ and $\frac{\theta(x-y)}{a(y)}$ are continuous, as before we have

\[
\lim_{\varepsilon \to +0}\left[\left.\frac{\partial G(x,y)}{\partial x}\right|_{x=y+\varepsilon} - \left.\frac{\partial G(x,y)}{\partial x}\right|_{x=y-\varepsilon}\right] = \frac{1}{a(y)}. \tag{10.165}
\]

10.6.2 Green's Functions for IVPs

From a practical point of view, we may set $r = 0$ in (10.62). Then, we can choose a domain $[0, s]$ (for $s > 0$) or $[s, 0]$ (for $s < 0$) with (10.2). For simplicity, we use $x$ instead of $s$. We consider the two cases $x > 0$ and $x < 0$.

(i) Case I ($x > 0$): Let $u_1(x)$ and $u_2(x)$ be a fundamental set of solutions. We define $F_1(x, y)$ and $F_2(x, y)$ as before such that

\[
\begin{aligned}
F_1(x,y) &= c_1 u_1(x) + c_2 u_2(x) \quad \text{for } 0 \le x < y, \\
F_2(x,y) &= d_1 u_1(x) + d_2 u_2(x) \quad \text{for } 0 < y < x.
\end{aligned} \tag{10.166}
\]

As before, we set

\[
G(x,y) = \begin{cases} F_1(x,y) & \text{for } 0 \le x < y, \\ F_2(x,y) & \text{for } 0 < y < x. \end{cases}
\]

Homogeneous BCs are defined as

\[ u(0) = 0 \quad \text{and} \quad u'(0) = 0. \]

Correspondingly, we have

\[ F_1(0,y) = 0 \quad \text{and} \quad F_1'(0,y) = 0, \]

where the prime denotes differentiation with respect to $x$. This is translated into

\[ c_1 u_1(0) + c_2 u_2(0) = 0 \quad \text{and} \quad c_1 u_1'(0) + c_2 u_2'(0) = 0. \]

In a matrix form, we get

\[ \begin{pmatrix} u_1(0) & u_2(0) \\ u_1'(0) & u_2'(0) \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = 0. \]

As mentioned above, since $u_1(x)$ and $u_2(x)$ are a fundamental set of solutions, we have $c_1 = c_2 = 0$. Hence, we get

\[ F_1(x,y) = 0. \]

From the continuity and discontinuity conditions (10.165) imposed upon the Green's functions, we have

\[ d_1 u_1(y) + d_2 u_2(y) = 0 \quad \text{and} \quad d_1 u_1'(y) + d_2 u_2'(y) = 1/a(y). \tag{10.167} \]

As before, we get

\[ d_1 = -\frac{u_2(y)}{a(y)W(u_1(y),u_2(y))} \quad \text{and} \quad d_2 = \frac{u_1(y)}{a(y)W(u_1(y),u_2(y))}, \]

where $W(u_1(y), u_2(y))$ is the Wronskian of $u_1(y)$ and $u_2(y)$. Thus, we get

\[ F_2(x,y) = \frac{u_2(x)u_1(y) - u_1(x)u_2(y)}{a(y)W(u_1(y),u_2(y))}. \tag{10.168} \]

(ii) Case II ($x < 0$): Next, we think of the case as below:

\[
\begin{aligned}
F_1(x,y) &= c_1 u_1(x) + c_2 u_2(x) \quad \text{for } y < x \le 0, \\
F_2(x,y) &= d_1 u_1(x) + d_2 u_2(x) \quad \text{for } x < y < 0.
\end{aligned} \tag{10.169}
\]

Proceeding similarly as above, we have $c_1 = c_2 = 0$. Also, we get

\[ F_2(x,y) = \frac{u_1(x)u_2(y) - u_2(x)u_1(y)}{a(y)W(u_1(y),u_2(y))}. \tag{10.170} \]

Here notice that the sign is reversed in (10.170) relative to (10.168). This is because for the discontinuity condition, instead of (10.167) we have to have

\[ d_1 u_1'(y) + d_2 u_2'(y) = -1/a(y). \]

This results from the fact that the magnitude relationship between the arguments $x$ and $y$ has been reversed in (10.169) relative to (10.166).

Summarizing the above argument, (10.168) is obtained in the domain $0 \le y < x$; (10.170) is obtained in the domain $x < y < 0$. Noting this characteristic, we define a function such that

\[ \Theta(x,y) \equiv \theta(x-y)\theta(y) - \theta(y-x)\theta(-y). \tag{10.171} \]

Notice that

\[ \Theta(-x,-y) = -\Theta(x,y). \]

That is, $\Theta(x, y)$ is antisymmetric with respect to the origin. Figure 10.2 shows a feature of $\Theta(x, y)$. If the "initial point" is taken at $x = a$, we can use $\Theta(x-a, y-a)$ instead; see Fig. 10.3. The function is described as

\[ \Theta(x-a,y-a) = \theta(x-y)\theta(y-a) - \theta(y-x)\theta(a-y). \]

Note that $\Theta(x-a, y-a)$ can be obtained by shifting $\Theta(x, y)$ toward the positive direction of the $x$- and $y$-axes by $a$ ($a$ can be either positive or negative; in Fig. 10.3 we assume $a > 0$).

Fig. 10.2 Graph of a function $\Theta(x, y)$. $\Theta(x, y) = 1$ or $-1$ in hatched areas, otherwise $\Theta(x, y) = 0$

Fig. 10.3 Graph of a function $\Theta(x-a, y-a)$. We assume $a > 0$

Using the $\Theta(x, y)$ function, the Green's function is described as

\[ G(x,y) = \frac{u_2(x)u_1(y) - u_1(x)u_2(y)}{a(y)W(u_1(y),u_2(y))}\,\Theta(x,y). \tag{10.172} \]

Defining a function $F$ such that

\[ F(x,y) \equiv \frac{u_2(x)u_1(y) - u_1(x)u_2(y)}{a(y)W(u_1(y),u_2(y))}, \tag{10.173} \]

we have

\[ G(x,y) = F(x,y)\Theta(x,y). \tag{10.174} \]

Notice that

\[ \Theta(x,y) \ne \Theta(y,x) \quad \text{and} \quad G(x,y) \ne G(y,x). \tag{10.175} \]

It is therefore obvious that the differential operator is not Hermitian.
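The construction (10.171) to (10.174) translates almost literally into code. The following is a minimal sketch under stated assumptions: the Wronskian is taken with the usual convention $W(u_1, u_2) = u_1 u_2' - u_2 u_1'$, the value $\theta(0) = 0$ is an arbitrary convention chosen here, and the helper names `make_F` and `make_G` are ours. As a test case we take $a(x) = 1$ with the real fundamental set $\cos x$, $\sin x$ of $u'' + u = 0$.

```python
import math

def theta(x):
    """Heaviside step function; the value at x = 0 is set to 0 here (a convention)."""
    return 1.0 if x > 0 else 0.0

def big_theta(x, y):
    """Theta(x, y) of (10.171)."""
    return theta(x - y) * theta(y) - theta(y - x) * theta(-y)

def make_F(u1, du1, u2, du2, a):
    """Return F(x, y) of (10.173) from a fundamental set, its derivatives, and a(x)."""
    def F(x, y):
        wronskian = u1(y) * du2(y) - u2(y) * du1(y)  # assumed convention for W
        return (u2(x) * u1(y) - u1(x) * u2(y)) / (a(y) * wronskian)
    return F

def make_G(F):
    """G(x, y) = F(x, y) * Theta(x, y), as in (10.174)."""
    return lambda x, y: F(x, y) * big_theta(x, y)

# Test case: u'' + u with u1 = cos, u2 = sin, a(x) = 1.
F_osc = make_F(math.cos, lambda x: -math.sin(x), math.sin, math.cos, lambda x: 1.0)
G_osc = make_G(F_osc)
```

With this choice $W = 1$ and `F_osc(x, y)` reduces to $\sin(x - y)$. Evaluating `G_osc(2.0, 1.0)` and `G_osc(1.0, 2.0)` gives $\sin 1$ and $0$, respectively, which illustrates the asymmetry stated in (10.175).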

10.6.3 Estimation of Surface Terms

To include the surface term of the inhomogeneous case, we use (10.62):

\[
\int_r^s dx\left\{v^*(L_x u) - \left[L_x^\dagger v\right]^* u\right\} = \left[a v^* \frac{du}{dx} - u\frac{d(a v^*)}{dx} + b u v^*\right]_r^s. \tag{10.62}
\]

As before, we set $r = 0$ in (10.62). Also, we classify (10.62) into two cases according as $s > 0$ or $s < 0$.

(i) Case I ($x > 0$, $y > 0$): Equation (10.62) reads as

\[
\int_0^\infty dx\left\{v^*(L_x u) - \left[L_x^\dagger v\right]^* u\right\} = \left[a v^* \frac{du}{dx} - u\frac{d(a v^*)}{dx} + b u v^*\right]_0^\infty. \tag{10.176}
\]

This time, inserting

\[ g(x,y) = G^*(y,x) = G(y,x) \tag{10.177} \]

into $v$ of (10.176) and arranging terms, we have

\[
u(y) = \int_0^\infty dx\, G(y,x)d(x) - \left[a(x)G(y,x)\frac{du(x)}{dx} - u(x)\frac{da(x)}{dx}G(y,x) - u(x)a(x)\frac{\partial G(y,x)}{\partial x} + b(x)u(x)G(y,x)\right]_{x=0}^{\infty}. \tag{10.178}
\]

Fig. 10.4 Domain of $G(y, x)$. Areas for which $G(y, x)$ does not vanish are hatched

Note that in the above we used $L_x^\dagger g(x, y) = \delta(x-y)$. In Fig. 10.4, we depict the domain in which $G(y, x)$ does not vanish. Notice that we get the domain by folding back that of $\Theta(x, y)$ (see Fig. 10.2) relative to the straight line $y = x$. Thus, we find that $G(y, x)$ vanishes at $x > y$. So does $\frac{\partial G(y,x)}{\partial x}$; see Fig. 10.4. Namely, the second term of RHS of (10.178) vanishes at $x = \infty$. In other words, $g(x, y)$ and $G(y, x)$ must satisfy the adjoint BCs; i.e.,

\[ g(\infty,y) = G(y,\infty) = 0. \tag{10.179} \]

At the same time, the upper limit of the integration range of (10.178) can be set at $y$. Noting the above, we have

\[ (\text{the first term}) \text{ of } (10.178) = \int_0^y dx\, G(y,x)d(x). \tag{10.180} \]

Also with the second term of (10.178), we get

\[
\begin{aligned}
(\text{the second term}) \text{ of } (10.178)
&= +\left[a(x)G(y,x)\frac{du(x)}{dx} - u(x)\frac{da(x)}{dx}G(y,x) - u(x)a(x)\frac{\partial G(y,x)}{\partial x} + b(x)u(x)G(y,x)\right]_{x=0} \\
&= a(0)G(y,0)\frac{du(0)}{dx} - u(0)\frac{da(0)}{dx}G(y,0) - u(0)a(0)\left.\frac{\partial G(y,x)}{\partial x}\right|_{x=0} + b(0)u(0)G(y,0).
\end{aligned} \tag{10.181}
\]

If we substitute inhomogeneous BCs

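\[ u(0) = \sigma_1 \quad \text{and} \quad \left.\frac{du}{dx}\right|_{x=0} = \sigma_2 \tag{10.182} \]

into (10.181) along with other appropriate values, we should be able to get a unique solution as

\[
u(y) = \int_0^y dx\, G(y,x)d(x) + \left[\sigma_2 a(0) - \sigma_1\frac{da(0)}{dx} + \sigma_1 b(0)\right]G(y,0) - \sigma_1 a(0)\left.\frac{\partial G(y,x)}{\partial x}\right|_{x=0}. \tag{10.183}
\]

Exchanging the arguments $x$ and $y$, we get

\[
u(x) = \int_0^x dy\, G(x,y)d(y) + \left[\sigma_2 a(0) - \sigma_1\frac{da(0)}{dy} + \sigma_1 b(0)\right]G(x,0) - \sigma_1 a(0)\left.\frac{\partial G(x,y)}{\partial y}\right|_{y=0}. \tag{10.184}
\]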

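Here, we consider that $\Theta(x, y) = 1$ in this region and use (10.174). Meanwhile, from (10.174), we have

\[ \frac{\partial G(x,y)}{\partial y} = \frac{\partial F(x,y)}{\partial y}\Theta(x,y) + \frac{\partial \Theta(x,y)}{\partial y}F(x,y). \tag{10.185} \]

In the second term,

\[
\begin{aligned}
\frac{\partial \Theta(x,y)}{\partial y}
&= \frac{\partial \theta(x-y)}{\partial y}\theta(y) + \theta(x-y)\frac{\partial \theta(y)}{\partial y} - \frac{\partial \theta(y-x)}{\partial y}\theta(-y) - \theta(y-x)\frac{\partial \theta(-y)}{\partial y} \\
&= -\delta(x-y)\theta(y) + \theta(x-y)\delta(y) - \delta(y-x)\theta(-y) + \theta(y-x)\delta(-y) \\
&= -\delta(x-y)[\theta(y) + \theta(-y)] + [\theta(x-y) + \theta(y-x)]\delta(y) \\
&= -\delta(x-y)[\theta(y) + \theta(-y)] + [\theta(x) + \theta(-x)]\delta(y) \\
&= -\delta(x-y) + \delta(y),
\end{aligned} \tag{10.186}
\]

where we used

\[ \theta(x) + \theta(-x) \equiv 1 \tag{10.187} \]

as well as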

\[ f(y)\delta(y) = f(0)\delta(y) \tag{10.188} \]

and

\[ \delta(-y) = \delta(y). \tag{10.189} \]

However, the function $-\delta(x-y) + \delta(y)$ is of secondary importance. It is because in (10.184) we may choose $[\varepsilon, x - \varepsilon]$ ($\varepsilon > 0$) for the domain of $y$ and put $\varepsilon \to +0$ after the integration and other calculations related to the surface terms. Therefore, $-\delta(x-y) + \delta(y)$ in (10.186) virtually vanishes. Thus, we can express (10.185) as

\[ \frac{\partial G(x,y)}{\partial y} = \frac{\partial F(x,y)}{\partial y}\Theta(x,y) = \frac{\partial F(x,y)}{\partial y}. \]

Then, finally we reach

\[
u(x) = \int_0^x dy\, F(x,y)d(y) + \left[\sigma_2 a(0) - \sigma_1\frac{da(0)}{dy} + \sigma_1 b(0)\right]F(x,0) - \sigma_1 a(0)\left.\frac{\partial F(x,y)}{\partial y}\right|_{y=0}. \tag{10.190}
\]

(ii) Case II ($x < 0$, $y < 0$): Similarly as above, Eq. (10.62) reads as

\[
u(y) = \int_{-\infty}^{0} dx\, G(y,x)d(x) - \left[a(x)G(y,x)\frac{du(x)}{dx} - u(x)\frac{da(x)}{dx}G(y,x) - u(x)a(x)\frac{\partial G(y,x)}{\partial x} + b(x)u(x)G(y,x)\right]_{-\infty}^{0}.
\]

Similarly as mentioned above, the lower limit of the integration range is $y$. Considering that both $G(y, x)$ and $\frac{\partial G(y,x)}{\partial x}$ vanish at $x < y$ (see Fig. 10.4), we have

\[
\begin{aligned}
u(y) &= \int_y^0 dx\, G(y,x)d(x) - \left[a(x)G(y,x)\frac{du(x)}{dx} - u(x)\frac{da(x)}{dx}G(y,x) - u(x)a(x)\frac{\partial G(y,x)}{\partial x} + b(x)u(x)G(y,x)\right]_{x=0} \\
&= -\int_0^y dx\, G(y,x)d(x) - \left[\sigma_2 a(0) - \sigma_1\frac{da(0)}{dx} + \sigma_1 b(0)\right]G(y,0) + \sigma_1 a(0)\left.\frac{\partial G(y,x)}{\partial x}\right|_{x=0}.
\end{aligned} \tag{10.191}
\]

Comparing (10.191) with (10.183), we recognize that the sign of RHS of (10.191) has been reversed relative to RHS of (10.183). This is also the case after exchanging the arguments $x$ and $y$. Note, however, that $\Theta(x, y) = -1$ in the present case. As a result, the two minus signs cancel and (10.191) takes exactly the same expression as (10.183). Proceeding with calculations similarly, for both Cases I and II we arrive at a unified solution represented by (10.190) throughout the domain $(-\infty, +\infty)$.
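To see (10.190) at work for both signs of $x$, the following sketch evaluates it by numerical quadrature. It is only an illustration under stated assumptions: the routine names are ours, Simpson's rule stands in for the exact integral, and the test equation is $u'' + u = 1$ of (10.141) treated as an IVP, for which $a = 1$, $b = 0$, and (10.173) gives $F(x, y) = \sin(x - y)$.

```python
import math

def simpson(f, a, b, n=400):
    """Composite Simpson rule; also valid for b < a (oriented integral)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def solve_ivp_green(x, F, dF_dy, d, a0, da0, b0, sigma1, sigma2):
    """Evaluate (10.190) at x, where a0 = a(0), da0 = a'(0), b0 = b(0)."""
    bulk = simpson(lambda y: F(x, y) * d(y), 0.0, x)
    surface = ((sigma2 * a0 - sigma1 * da0 + sigma1 * b0) * F(x, 0.0)
               - sigma1 * a0 * dF_dy(x, 0.0))
    return bulk + surface

# Test equation u'' + u = 1 with u(0) = sigma1, u'(0) = sigma2.
F = lambda x, y: math.sin(x - y)
dF_dy = lambda x, y: -math.cos(x - y)
d = lambda y: 1.0

def u_example(x, sigma1, sigma2):
    return solve_ivp_green(x, F, dF_dy, d, 1.0, 0.0, 0.0, sigma1, sigma2)
```

For this test equation the result should agree with $1 - \cos x + \sigma_2 \sin x + \sigma_1 \cos x$ for positive and negative $x$ alike, which reflects the unified character of (10.190).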

10.6.4 Examples

To deepen understanding of Green's functions, we deal with tangible examples of the IVP below.

Example 10.5 Let us consider the following inhomogeneous differential equation:

\[ \frac{d^2u}{dx^2} + u = 1. \tag{10.192} \]

Note that (10.192) is formally the same differential equation as (10.141). We may encounter (10.192) when we observe the motion of a charged harmonic oscillator that is placed in a static electric field. We assume that the domain of the argument $x$ is the whole range of real numbers. We set boundary conditions such that

\[ u(0) = \sigma_1 \quad \text{and} \quad u'(0) = \sigma_2. \tag{10.193} \]

As in the case of Example 10.4, a fundamental set of solutions is given by

\[ e^{ix} \quad \text{and} \quad e^{-ix}. \tag{10.194} \]

Therefore, following (10.173), we get

\[ F(x,y) = \sin(x-y). \tag{10.195} \]

Also following (10.190) we have

\[
u(x) = \int_0^x dy\, \sin(x-y) + \sigma_2 \sin x - \sigma_1\left[-\cos(x-y)\right]_{y=0}
= 1 - \cos x + \sigma_2 \sin x + \sigma_1 \cos x. \tag{10.196}
\]

In particular, if we choose $\sigma_1 = 1$ and $\sigma_2 = 0$, we have

\[ u(x) \equiv 1. \tag{10.197} \]

This also ensures that this is the unique solution under the inhomogeneous BCs described as $\sigma_1 = 1$ and $\sigma_2 = 0$.

Example 10.6: Damped Oscillator If a harmonic oscillator undergoes friction, the oscillator exhibits damped oscillation. Such an oscillator is said to be a damped oscillator. The damped oscillator is often dealt with when we think of bound electrons in a dielectric medium that are subject to a dynamic external field varying with time. This is the case when the electron is placed in an alternating electric field or an electromagnetic wave. An equation of motion of the damped oscillator is described as

\[ m\frac{d^2u}{dt^2} + r\frac{du}{dt} + ku = d(t), \tag{10.198} \]

where $m$ is the mass of an electron, $r$ is a damping constant, and $k$ is the spring constant of the damped oscillator. To seek a fundamental set of solutions of the homogeneous equation described as

\[ m\frac{d^2u}{dt^2} + r\frac{du}{dt} + ku = 0, \]

putting

\[ u = e^{i\rho t} \tag{10.199} \]

and inserting it into the homogeneous equation, we have

\[ \left(-\rho^2 + \frac{r}{m}i\rho + \frac{k}{m}\right)e^{i\rho t} = 0. \]

Since $e^{i\rho t}$ does not vanish, we have

\[ -\rho^2 + \frac{r}{m}i\rho + \frac{k}{m} = 0. \tag{10.200} \]

We call this equation a characteristic quadratic equation. Solving (10.200), we get

\[ \rho = \frac{ir}{2m} \pm \sqrt{-\frac{r^2}{4m^2} + \frac{k}{m}}. \tag{10.201} \]

We have three cases for the solution of the quadratic equation (10.200):

(i) Equation (10.201) gives two pure imaginary roots; i.e., $-\frac{r^2}{4m^2} + \frac{k}{m} < 0$ (overdamping).
(ii) The equation has a double root; $-\frac{r^2}{4m^2} + \frac{k}{m} = 0$ (critical damping).
(iii) The equation has two complex roots; $-\frac{r^2}{4m^2} + \frac{k}{m} > 0$ (weak damping).

Of these, Case (iii) is characterized by an oscillating solution and has many applications in mathematical physics. For Cases (i) and (ii), on the other hand, we do not have an oscillating solution.

Case (i): The characteristic roots are given by

\[ \rho = \frac{ir}{2m} \pm i\sqrt{\frac{r^2}{4m^2} - \frac{k}{m}}. \]

Therefore, we have a fundamental set of solutions described by

\[ u(t) = \exp\left(-\frac{rt}{2m}\right)\exp\left(\pm\sqrt{\frac{r^2}{4m^2} - \frac{k}{m}}\; t\right). \]

Then, a general solution is given by

\[ u(t) = \exp\left(-\frac{rt}{2m}\right)\left[a\exp\left(\sqrt{\frac{r^2}{4m^2} - \frac{k}{m}}\; t\right) + b\exp\left(-\sqrt{\frac{r^2}{4m^2} - \frac{k}{m}}\; t\right)\right]. \]

Case (ii): The characteristic roots are given by

\[ \rho = \frac{ir}{2m}. \]

Therefore, one of the solutions is

\[ u_1(t) = \exp\left(-\frac{rt}{2m}\right). \]

Another solution $u_2(t)$ is given by

\[ u_2(t) = c\,\frac{\partial u_1(t)}{\partial(i\rho)} = c' t \exp\left(-\frac{rt}{2m}\right), \]

where $c$ and $c'$ are appropriate constants. Thus, a general solution is given by

\[ u(t) = a\exp\left(-\frac{rt}{2m}\right) + bt\exp\left(-\frac{rt}{2m}\right). \]

The most important and interesting feature emerges as a "damped oscillator" in the next Case (iii) in many fields of natural science. We are particularly interested in this case.
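The three regimes can be told apart from the sign of the quantity under the square root in (10.201). A minimal sketch follows; the function names are illustrative, and the exact floating-point comparison with zero in the critical case is a simplification that only works for exactly representable inputs.

```python
import cmath

def characteristic_roots(m, r, k):
    """Roots rho of -rho^2 + i (r/m) rho + k/m = 0, written as in (10.201)."""
    disc = -r * r / (4.0 * m * m) + k / m
    root = cmath.sqrt(disc)          # purely imaginary when disc < 0
    shift = 1j * r / (2.0 * m)
    return shift + root, shift - root

def damping_regime(m, r, k):
    """Classify Cases (i) to (iii) by the sign of -r^2/(4 m^2) + k/m."""
    disc = -r * r / (4.0 * m * m) + k / m
    if disc < 0:
        return "overdamping"
    if disc == 0:  # exact comparison; a tolerance would be needed in practice
        return "critical damping"
    return "weak damping"
```

For example, `damping_regime(1.0, 0.006, 0.9)` falls into the weak-damping case, where the two roots returned by `characteristic_roots` have the common imaginary part $r/2m$ and opposite real parts $\pm\omega$.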

Case (iii): Suppose that the damping is relatively weak such that the characteristic equation has two complex roots. Let us examine further details of this case following the prescriptions of IVPs. We divide (10.198) by $m$ for the sake of easy handling of the differential equation such that

\[ \frac{d^2u}{dt^2} + \frac{r}{m}\frac{du}{dt} + \frac{k}{m}u = \frac{1}{m}d(t). \]

Putting

\[ \omega \equiv \sqrt{-\frac{r^2}{4m^2} + \frac{k}{m}}, \tag{10.202} \]

we get a fundamental set of solutions described as

\[ u(t) = \exp\left(-\frac{rt}{2m}\right)\exp(\pm i\omega t). \tag{10.203} \]

Given BCs, following (10.172) we get as a Green's function

\[
G(t,\tau) = \frac{u_2(t)u_1(\tau) - u_1(t)u_2(\tau)}{W(u_1(\tau),u_2(\tau))}\,\Theta(t,\tau)
= \frac{1}{\omega}e^{-\frac{r}{2m}(t-\tau)}\sin\omega(t-\tau)\,\Theta(t,\tau), \tag{10.204}
\]

where $u_1(t) = \exp\left(-\frac{rt}{2m}\right)\exp(i\omega t)$ and $u_2(t) = \exp\left(-\frac{rt}{2m}\right)\exp(-i\omega t)$. We examine whether $G(t, \tau)$ is eligible for the Green's function as follows:

\[
\begin{aligned}
\frac{dG}{dt} &= \frac{1}{\omega}\left[-\frac{r}{2m}e^{-\frac{r}{2m}(t-\tau)}\sin\omega(t-\tau) + \omega e^{-\frac{r}{2m}(t-\tau)}\cos\omega(t-\tau)\right]\Theta(t,\tau) \\
&\quad + \frac{1}{\omega}e^{-\frac{r}{2m}(t-\tau)}\sin\omega(t-\tau)\left[\delta(t-\tau)\theta(\tau) + \delta(\tau-t)\theta(-\tau)\right].
\end{aligned} \tag{10.205}
\]

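The second term of (10.205) vanishes because $\sin\omega(t-\tau)\,\delta(t-\tau) = 0\cdot\delta(t-\tau) = 0$. Thus,

\[
\begin{aligned}
\frac{d^2G}{dt^2} &= \frac{1}{\omega}\left[\left(\frac{r}{2m}\right)^2 e^{-\frac{r}{2m}(t-\tau)}\sin\omega(t-\tau) - \frac{r}{m}\,\omega e^{-\frac{r}{2m}(t-\tau)}\cos\omega(t-\tau) - \omega^2 e^{-\frac{r}{2m}(t-\tau)}\sin\omega(t-\tau)\right]\Theta(t,\tau) \\
&\quad + \frac{1}{\omega}\left[-\frac{r}{2m}e^{-\frac{r}{2m}(t-\tau)}\sin\omega(t-\tau) + \omega e^{-\frac{r}{2m}(t-\tau)}\cos\omega(t-\tau)\right]\left[\delta(t-\tau)\theta(\tau) + \delta(\tau-t)\theta(-\tau)\right].
\end{aligned} \tag{10.206}
\]

In the last term, using the properties of the $\delta$ function and the $\theta$ function, we get $\delta(t-\tau)$. Note here that

\[
\begin{aligned}
e^{-\frac{r}{2m}(t-\tau)}\cos\omega(t-\tau)\left[\delta(t-\tau)\theta(\tau) + \delta(\tau-t)\theta(-\tau)\right]
&= e^{-\frac{r}{2m}\cdot 0}\cos(\omega\cdot 0)\left\{\delta(t-\tau)[\theta(\tau) + \theta(-\tau)]\right\} \\
&= \delta(t-\tau)[\theta(\tau) + \theta(-\tau)] = \delta(t-\tau).
\end{aligned}
\]

Thus, rearranging (10.206), we get

\[
\begin{aligned}
\frac{d^2G}{dt^2} &= \delta(t-\tau) + \frac{1}{\omega}\left\{-\left[\left(\frac{r}{2m}\right)^2 + \omega^2\right]e^{-\frac{r}{2m}(t-\tau)}\sin\omega(t-\tau) - \frac{r}{m}\left[-\frac{r}{2m}e^{-\frac{r}{2m}(t-\tau)}\sin\omega(t-\tau) + \omega e^{-\frac{r}{2m}(t-\tau)}\cos\omega(t-\tau)\right]\right\}\Theta(t,\tau) \\
&= \delta(t-\tau) - \frac{k}{m}G - \frac{r}{m}\frac{dG}{dt},
\end{aligned} \tag{10.207}
\]

where we used (10.202) for the last equality. Rearranging (10.207) once again, we have

\[ \frac{d^2G}{dt^2} + \frac{r}{m}\frac{dG}{dt} + \frac{k}{m}G = \delta(t-\tau). \tag{10.208} \]

Defining the following operator

\[ L_t \equiv \frac{d^2}{dt^2} + \frac{r}{m}\frac{d}{dt} + \frac{k}{m}, \tag{10.209} \]

we get

\[ L_t G = \delta(t-\tau). \tag{10.210} \]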

Note that this expression is consistent with (10.164). Thus, we find that (10.210) satisfies the condition (10.123) of the Green's function, where the weight function is identified with unity.

Now, suppose that a sinusoidally changing external field $e^{i\Omega t}$ influences the motion of the damped oscillator. Here we assume that the amplitude of the external field is unity. Then, we have

\[ \frac{d^2u}{dt^2} + \frac{r}{m}\frac{du}{dt} + \frac{k}{m}u = \frac{1}{m}e^{i\Omega t}. \tag{10.211} \]

Thus, as a solution under the homogeneous boundary conditions [i.e., $u(0) = \dot{u}(0) = 0$] we get

\[ u(t) = \frac{1}{m\omega}\int_0^t e^{-\frac{r}{2m}(t-\tau)}e^{i\Omega\tau}\sin\omega(t-\tau)\,d\tau, \tag{10.212} \]

where $t$ is an arbitrary positive or negative time. Equation (10.212) shows that with the real part we have an external field $\cos\Omega t$ and that with the imaginary part we have an external field $\sin\Omega t$. To calculate (10.212) we use

\[ \sin\omega(t-\tau) = \frac{1}{2i}\left[e^{i\omega(t-\tau)} - e^{-i\omega(t-\tau)}\right]. \tag{10.213} \]

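Then, the equation can readily be solved by integration of exponential functions, even though we have to do somewhat lengthy (but straightforward) calculations. Thus for the real part (i.e., the external field is $\cos\Omega t$), we get a solution

\[
\begin{aligned}
Cu(t) &= \frac{1}{m}\frac{r}{m}\Omega\sin\Omega t - \frac{1}{m}\left(\Omega^2 - \omega^2\right)\cos\Omega t + \frac{1}{m}\left(\frac{r}{2m}\right)^2\cos\Omega t \\
&\quad + \frac{1}{m}e^{-\frac{r}{2m}t}\left[\left(\Omega^2 - \omega^2\right)\cos\omega t - \frac{r}{2m}\frac{1}{\omega}\left(\Omega^2 + \omega^2\right)\sin\omega t - \left(\frac{r}{2m}\right)^2\cos\omega t - \left(\frac{r}{2m}\right)^3\frac{1}{\omega}\sin\omega t\right],
\end{aligned} \tag{10.214}
\]

where $C$ is a constant [i.e., a constant denominator of $u(t)$] expressed as

\[ C = \left(\Omega^2 - \omega^2\right)^2 + \frac{r^2}{2m^2}\left(\Omega^2 + \omega^2\right) + \left(\frac{r}{2m}\right)^4. \tag{10.215} \]

For the imaginary part (i.e., the external field is $\sin\Omega t$) we get

\[
\begin{aligned}
Cu(t) &= -\frac{1}{m}\left(\Omega^2 - \omega^2\right)\sin\Omega t - \frac{1}{m}\frac{r}{m}\Omega\cos\Omega t + \frac{1}{m}\left(\frac{r}{2m}\right)^2\sin\Omega t \\
&\quad + \frac{1}{m}e^{-\frac{r}{2m}t}\left[\frac{\Omega}{\omega}\left(\Omega^2 - \omega^2\right)\sin\omega t + \frac{r}{m}\Omega\cos\omega t + \left(\frac{r}{2m}\right)^2\frac{\Omega}{\omega}\sin\omega t\right].
\end{aligned} \tag{10.216}
\]

In Fig. 10.5 we show an example that depicts the position of a damped oscillator as a function of time. In Fig. 10.5a, the amplitude of the envelope gradually diminishes with time. An enlarged diagram near the origin (Fig. 10.5b) clearly reflects the initial conditions $u(0) = \dot{u}(0) = 0$. In Fig. 10.5, we put $m = 1$ [kg], $\Omega = 1$ [1/s], $\omega = 0.94$ [1/s], and $r = 0.006$ [kg/s]. In the above calculations, if $r/m$ is small enough (i.e., damping is small enough), the third-order and fourth-order terms in $r/m$ may be ignored and the approximation is precise enough.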
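The lengthy algebra behind (10.214) can be cross-checked numerically. The sketch below compares the real part of the integral (10.212) (i.e., the external field $\cos\Omega t$), evaluated with Simpson's rule, against the closed form (10.214) divided by $C$ of (10.215). The function names and the quadrature are our own illustrative choices; the parameter values in the usage note are those quoted for Fig. 10.5.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule; also valid for b < a (oriented integral)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def u_quadrature(t, m, r, omega, Omega):
    """Real part of (10.212): response to the external field cos(Omega t)."""
    g = r / (2.0 * m)
    f = lambda tau: (math.exp(-g * (t - tau)) * math.cos(Omega * tau)
                     * math.sin(omega * (t - tau)))
    return simpson(f, 0.0, t) / (m * omega)

def u_closed_form(t, m, r, omega, Omega):
    """u(t) obtained from (10.214) after dividing by C of (10.215)."""
    g = r / (2.0 * m)  # g = r / 2m, so r/m = 2 g and r^2 / 2m^2 = 2 g^2
    C = (Omega**2 - omega**2) ** 2 + 2 * g**2 * (Omega**2 + omega**2) + g**4
    steady = (2 * g * Omega * math.sin(Omega * t)
              - (Omega**2 - omega**2) * math.cos(Omega * t)
              + g**2 * math.cos(Omega * t))
    transient = math.exp(-g * t) * (
        (Omega**2 - omega**2) * math.cos(omega * t)
        - (g / omega) * (Omega**2 + omega**2) * math.sin(omega * t)
        - g**2 * math.cos(omega * t)
        - (g**3 / omega) * math.sin(omega * t))
    return (steady + transient) / (m * C)
```

With $m = 1$, $r = 0.006$, $\omega = 0.94$, and $\Omega = 1$, the two functions should agree closely for positive as well as negative $t$, and both vanish at $t = 0$ in accordance with the homogeneous initial conditions.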

Fig. 10.5 Example of a damped oscillation as a function of t. The data are taken from (10.214). (a) Overall profile. (b) Profile enlarged near the origin. Axes: position (m) versus time (s)

In the case of inhomogeneous BCs, given $\sigma_1 = u(0)$ and $\sigma_2 = \dot{u}(0)$, we can decide the additional terms $S(t)$ using (10.190) such that

\[ S(t) = \left(\sigma_2 + \sigma_1\frac{r}{2m}\right)\frac{1}{\omega}e^{-\frac{r}{2m}t}\sin\omega t + \sigma_1 e^{-\frac{r}{2m}t}\cos\omega t. \tag{10.217} \]

This term arises from (10.190). Thus, from (10.212) and (10.217), $u(t) + S(t)$ gives a unique solution for the SOLDE with inhomogeneous BCs. Notice that $S(t)$ does not depend on the external field.
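A short numerical check of (10.217) is sketched below: $S(t)$ should carry the initial data, i.e., $S(0) = \sigma_1$ and $\dot{S}(0) = \sigma_2$, and should solve the homogeneous equation. The helper names, the finite-difference steps, and the relation $k/m = \omega^2 + (r/2m)^2$ taken from (10.202) are the only ingredients; the names and steps are illustrative assumptions.

```python
import math

def surface_term(t, m, r, omega, sigma1, sigma2):
    """S(t) of (10.217)."""
    g = r / (2.0 * m)
    decay = math.exp(-g * t)
    return ((sigma2 + sigma1 * g) * decay * math.sin(omega * t) / omega
            + sigma1 * decay * math.cos(omega * t))

def homogeneous_residual(t, m, r, omega, sigma1, sigma2, h=1e-3):
    """Finite-difference estimate of S'' + (r/m) S' + (k/m) S, with k/m from (10.202)."""
    S = lambda z: surface_term(z, m, r, omega, sigma1, sigma2)
    k_over_m = omega**2 + (r / (2.0 * m)) ** 2
    first = (S(t + h) - S(t - h)) / (2 * h)
    second = (S(t + h) - 2 * S(t) + S(t - h)) / (h * h)
    return second + (r / m) * first + k_over_m * S(t)
```

Because $S(t)$ solves the homogeneous equation, adding it to the response (10.212) changes the initial data without affecting the driven part, in line with the remark that $S(t)$ does not depend on the external field.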

10.7 Eigenvalue Problems

We often encounter eigenvalue problems in mathematical physics. Of these, those related to Hermitian differential operators have particularly interesting and important features. The eigenvalue problems we have considered in Part I are typical illustrations. Here we investigate general properties of the eigenvalue problems. Returning to the case of homogeneous BCs, we consider the following homogeneous SOLDE:

\[ a(x)\frac{d^2u}{dx^2} + b(x)\frac{du}{dx} + c(x)u = 0. \tag{10.5} \]

Defining a differential operator $L_x$ such that

\[ L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x), \tag{10.55} \]

we have a homogeneous equation

\[ L_x u(x) = 0. \tag{10.218} \]

Putting a constant $-\lambda$ instead of $c(x)$, we have

\[ a(x)\frac{d^2u}{dx^2} + b(x)\frac{du}{dx} - \lambda u = 0. \tag{10.219} \]

If we define a differential operator $L_x$ such that

\[ L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} - \lambda, \tag{10.220} \]

we have a homogeneous equation

\[ L_x u = 0 \tag{10.221} \]

to express (10.219). Instead, if we define a differential operator $L_x$ such that

\[ L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx}, \]

we have the same homogeneous equation

\[ L_x u = \lambda u \tag{10.222} \]

to express (10.219). Equations (10.221) and (10.222) are essentially the same except that the expression is different. The expression using (10.222) is familiar to us as an eigenvalue equation. The difference between (10.5) and (10.219) is that whereas c(x) in (10.5) is a given fixed function, λ in (10.219) is constant, but may be varied according to the solution of u(x). One of the most essential properties of the eigenvalue problem that is posed in the form of (10.222) is that its solution is not uniquely determined as already studied in various cases of Part I. Remember that the methods based upon the Green’s function are valid for a problem to which a homogeneous differential equation has a trivial solution (i.e., identically zero) under homogeneous BCs. In contrast to this situation, even though the eigenvalue problem is basically posed as a homogeneous equation under homogeneous BCs, nontrivial solutions are expected to be obtained. In this respect, we have seen that in Part I we rejected a trivial solution (i.e., identically zero) because of no physical meaning. As exemplified in Part I, the eigenvalue problems that appear in mathematical physics are closely connected to the Hermiticity of (differential) operators. This is because in many cases an eigenvalue is required to be real. We have already examined how we can convert a differential operator to the self-adjoint form. That is, if we define p(x) as in (10.26), we have the self-adjoint operator as described in (10.70). As a symbolic description, we have wðxÞLx u ¼

d du pð x Þ þ cðxÞwðxÞu: dx dx

ð10:223Þ

In the same way, multiplying both sides of (10.222) by w(x), we get wðxÞLx u ¼ λwðxÞu:

ð10:224Þ

For instance, the Hermite differential equation that has already appeared as (2.118) in Sect. 2.3 is described as

$$\frac{d^2u}{dx^2} - 2x\frac{du}{dx} + 2nu = 0. \tag{10.225}$$

If we express (10.225) as in (10.224), multiplying both sides of (10.225) by $e^{-x^2}$ we have

$$\frac{d}{dx}\left(e^{-x^2}\frac{du}{dx}\right) + 2ne^{-x^2}u = 0. \tag{10.226}$$
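As a quick sanity check of (10.225), the following sketch builds the first few Hermite polynomials and confirms that $H_n'' - 2xH_n' + 2nH_n$ vanishes identically. It is only an illustration: the function names are hypothetical, and the polynomials are generated with the standard recurrence $H_{k+1} = 2xH_k - 2kH_{k-1}$, which is assumed here rather than taken from this section.

```python
def hermite(n):
    """Coefficients (lowest degree first) of the Hermite polynomial H_n.

    Assumption: built with the standard recurrence
    H_{k+1} = 2x H_k - 2k H_{k-1}, starting from H_0 = 1 and H_1 = 2x.
    """
    h_prev, h = [1], [0, 2]
    if n == 0:
        return h_prev
    for k in range(1, n):
        nxt = [0] + [2 * c for c in h]      # 2x * H_k
        for i, c in enumerate(h_prev):
            nxt[i] -= 2 * k * c             # - 2k * H_{k-1}
        h_prev, h = h, nxt
    return h


def derivative(p):
    """Derivative of a polynomial given by its coefficient list."""
    return [i * c for i, c in enumerate(p)][1:] or [0]


def hermite_residual(n):
    """Coefficients of H_n'' - 2x H_n' + 2n H_n, i.e. the LHS of (10.225)."""
    h = hermite(n)
    d1 = derivative(h)
    d2 = derivative(d1)
    res = [0] * (len(h) + 1)
    for i, c in enumerate(d2):
        res[i] += c
    for i, c in enumerate(d1):
        res[i + 1] -= 2 * c                 # multiplication by x shifts the index
    for i, c in enumerate(h):
        res[i] += 2 * n * c
    return res
```

Calling `hermite_residual(n)` for small `n` should return a list of zeros, which is what (10.225) asserts for $u = H_n$.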

Notice that the differential operator has been converted to a self-adjoint form according to (10.31), which defines a real and positive weight function $e^{-x^2}$ in the present case. The domain of the Hermite differential equation is $(-\infty, +\infty)$, at the endpoints (i.e., $\pm\infty$) of which the surface term of the RHS of (10.69) approaches zero sufficiently rapidly by virtue of $e^{-x^2}$.

In Sects. 3.4 and 3.5, in turn, we dealt with the (associated) Legendre differential equation (3.127), for which the relevant differential operator is self-adjoint. The surface term corresponding to (10.69) vanishes, because $(1 - \xi^2)$ vanishes at the endpoints $\xi = \cos\theta = \pm 1$ (i.e., $\theta = 0$ or $\pi$) from (3.107). Thus, the Hermiticity is automatically ensured for the (associated) Legendre differential equation as well as for the Hermite differential equation. In those cases, even though the differential equations do not satisfy any particular BCs, the Hermiticity is still ensured.

In the theory of differential equations, the aforementioned properties of the Hermitian operators have been fully investigated as the so-called Sturm–Liouville system (or problem) in the form of a homogeneous differential equation. The related differential equations are connected to classical orthogonal polynomials bearing personal names such as Hermite, Laguerre, Jacobi, Gegenbauer, Legendre, and Tchebichef. These equations frequently appear in quantum mechanics and electromagnetism as typical examples of the Sturm–Liouville system. They can be converted to self-adjoint differential equations by multiplying the original form by a weight function. The resulting equations can be expressed as

$$\frac{d}{dx}\left[a(x)w(x)\frac{dY_n(x)}{dx}\right] + \lambda_n w(x)Y_n(x) = 0, \tag{10.227}$$

where $Y_n(x)$ is a collective representation of classical orthogonal polynomials. Equation (10.226) is an example. Conventionally, the following form is adopted instead of (10.227):

$$\frac{1}{w(x)}\frac{d}{dx}\left[a(x)w(x)\frac{dY_n(x)}{dx}\right] + \lambda_n Y_n(x) = 0, \tag{10.228}$$

where we put $(aw)' = bw$. That is, the differential equation is originally described as

$$a(x)\frac{d^2Y_n(x)}{dx^2} + b(x)\frac{dY_n(x)}{dx} + \lambda_n Y_n(x) = 0. \tag{10.229}$$
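The condition $(aw)' = bw$ that links (10.228) and (10.229) can be checked numerically for the classical cases listed in Table 10.2. The sketch below is a rough illustration only: it approximates the derivative with a central difference, and the helper name, the sample points, the step size, and the choice $\nu = 0.5$ are all arbitrary assumptions.

```python
import math


def check_weight_relation(a, b, w, x, h=1e-6):
    """Return (a*w)'(x) - b(x)*w(x); the derivative is a central difference."""
    aw = lambda t: a(t) * w(t)
    derivative = (aw(x + h) - aw(x - h)) / (2 * h)
    return derivative - b(x) * w(x)


# Hermite: a = 1, b = -2x, w = exp(-x^2)
hermite_gap = check_weight_relation(
    lambda x: 1.0, lambda x: -2.0 * x, lambda x: math.exp(-x * x), x=0.7)

# Legendre: a = 1 - x^2, b = -2x, w = 1
legendre_gap = check_weight_relation(
    lambda x: 1.0 - x * x, lambda x: -2.0 * x, lambda x: 1.0, x=0.3)

# Laguerre with nu = 0.5 (arbitrary): a = x, b = nu + 1 - x, w = x^nu exp(-x)
nu = 0.5
laguerre_gap = check_weight_relation(
    lambda x: x, lambda x: nu + 1.0 - x,
    lambda x: x ** nu * math.exp(-x), x=1.2)
```

Each `*_gap` value should be zero up to the discretization error of the finite difference.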

In the case of Hermite polynomials, for instance, $a(x) = 1$ and $w(x) = e^{-x^2}$. Since we have $(aw)' = -2xe^{-x^2} = bw$, we can put $b = -2x$. Examples including this case are tabulated in Table 10.2.

Table 10.2 Classical polynomials and their related SOLDEs

| Name of the polynomial | SOLDE form | Weight function: $w(x)$ | Domain |
|---|---|---|---|
| Hermite: $H_n(x)$ | $\frac{d^2}{dx^2}H_n(x) - 2x\frac{d}{dx}H_n(x) + 2nH_n(x) = 0$ | $e^{-x^2}$ | $(-\infty, +\infty)$ |
| Laguerre: $L_n^{\nu}(x)$ | $x\frac{d^2}{dx^2}L_n^{\nu}(x) + (\nu + 1 - x)\frac{d}{dx}L_n^{\nu}(x) + nL_n^{\nu}(x) = 0$ | $x^{\nu}e^{-x}$ $(\nu > -1)$ | $[0, +\infty)$ |
| Gegenbauer: $C_n^{\lambda}(x)$ | $(1 - x^2)\frac{d^2}{dx^2}C_n^{\lambda}(x) - (2\lambda + 1)x\frac{d}{dx}C_n^{\lambda}(x) + n(n + 2\lambda)C_n^{\lambda}(x) = 0$ | $(1 - x^2)^{\lambda - \frac{1}{2}}$ $\left(\lambda > -\frac{1}{2}\right)$ | $[-1, +1]$ |
| Legendre: $P_n(x)$ | $(1 - x^2)\frac{d^2}{dx^2}P_n(x) - 2x\frac{d}{dx}P_n(x) + n(n + 1)P_n(x) = 0$ | $1$ | $[-1, +1]$ |

The eigenvalues $\lambda_n$ are associated with real numbers that characterize the individual physical systems. The related fields have wide applications in many branches of natural science. After having converted the operator to the self-adjoint form, i.e., $L_x = L_x^{\dagger}$, instead of (10.100) we have

$$\int_r^s dx\, w(x)\left\{v^*(L_x u) - [L_x v]^* u\right\} = 0. \tag{10.230}$$

Rewriting it, we get

$$\int_r^s dx\, v^*[w(x)L_x u] = \int_r^s dx\, [w(x)L_x v]^* u. \tag{10.231}$$

If we use an inner product notation described by (10.88), we get

$$\langle v | L_x u \rangle = \langle L_x v | u \rangle. \tag{10.232}$$

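Here let us think of two eigenfunctions $\psi_i$ and $\psi_j$ that belong to the eigenvalues $\lambda_i$ and $\lambda_j$, respectively. That is,

$$w(x)L_x\psi_i = \lambda_i w(x)\psi_i \quad \text{and} \quad w(x)L_x\psi_j = \lambda_j w(x)\psi_j. \tag{10.233}$$

Inserting $\psi_i$ and $\psi_j$ into $u$ and $v$, respectively, in (10.232), we have

$$\langle \psi_j | L_x\psi_i \rangle = \langle \psi_j | \lambda_i\psi_i \rangle = \lambda_i\langle \psi_j | \psi_i \rangle = \lambda_j^*\langle \psi_j | \psi_i \rangle = \langle \lambda_j\psi_j | \psi_i \rangle = \langle L_x\psi_j | \psi_i \rangle. \tag{10.234}$$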

With the second and third equalities, we have used a rule of the inner product (see Parts I and III). Therefore, we get

$$\left(\lambda_i - \lambda_j^*\right)\langle \psi_j | \psi_i \rangle = 0. \tag{10.235}$$

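Putting $i = j$ in (10.235), we get

$$\left(\lambda_i - \lambda_i^*\right)\langle \psi_i | \psi_i \rangle = 0. \tag{10.236}$$

An inner product $\langle \psi_i | \psi_i \rangle$ vanishes if and only if $|\psi_i\rangle \equiv 0$; see the inner product calculation rules of Sect. 13.1. However, $|\psi_i\rangle \equiv 0$ is not acceptable as a physical state. Therefore, we must have $\langle \psi_i | \psi_i \rangle \neq 0$. Thus, we get

$$\lambda_i - \lambda_i^* = 0 \quad \text{or} \quad \lambda_i = \lambda_i^*. \tag{10.237}$$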

The relation (10.237) obviously indicates that $\lambda_i$ is real; i.e., we find that the eigenvalues of a Hermitian operator are real. If $\lambda_i \neq \lambda_j = \lambda_j^*$, from (10.235) we get

$$\langle \psi_j | \psi_i \rangle = 0. \tag{10.238}$$

That is, $|\psi_j\rangle$ and $|\psi_i\rangle$ are orthogonal to each other.
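The orthogonality (10.238) can be illustrated with the Hermite polynomials and the weight $e^{-x^2}$ of Table 10.2. The sketch below evaluates $\int_{-\infty}^{+\infty} H_m H_n e^{-x^2}dx$ exactly, in units of $\sqrt{\pi}$. It relies on two standard facts that are assumed here and not derived in this section: the recurrence $H_{k+1} = 2xH_k - 2kH_{k-1}$ and the Gaussian moments $\int x^{2m}e^{-x^2}dx = \sqrt{\pi}\,(2m-1)!!/2^m$. All function names are hypothetical.

```python
from fractions import Fraction


def hermite(n):
    """Coefficients (lowest degree first) of H_n via the standard recurrence."""
    h_prev, h = [1], [0, 2]
    if n == 0:
        return h_prev
    for k in range(1, n):
        nxt = [0] + [2 * c for c in h]
        for i, c in enumerate(h_prev):
            nxt[i] -= 2 * k * c
        h_prev, h = h, nxt
    return h


def gaussian_moment(k):
    """Integral of x^k exp(-x^2) over the real line, divided by sqrt(pi)."""
    if k % 2:
        return Fraction(0)                  # odd integrand
    value = Fraction(1)
    for odd in range(1, k, 2):              # (k-1)!! / 2^(k/2)
        value *= Fraction(odd, 2)
    return value


def inner(m, n):
    """<H_m|H_n> with weight exp(-x^2), in units of sqrt(pi)."""
    hm, hn = hermite(m), hermite(n)
    return sum(cm * cn * gaussian_moment(i + j)
               for i, cm in enumerate(hm)
               for j, cn in enumerate(hn))
```

For $m \neq n$, `inner(m, n)` should be exactly zero, in line with the fact that $H_m$ and $H_n$ belong to the different eigenvalues $2m$ and $2n$ of (10.225).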


We often encounter related orthogonality relationships between vectors and between functions. We saw several cases in Part I and will see other cases in Part III.

References

1. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
2. Stakgold I (1998) Green's functions and boundary value problems, 2nd edn. Wiley, New York

Part III

Linear Vector Spaces

In this part we treat vectors and their transformations in linear vector spaces so that we can address various aspects of mathematical physics systematically but intuitively. We outline general principles of linear vector spaces mostly from an algebraic point of view. Starting with the abstract definition and description of vectors, we deal with their transformation in a vector space using a matrix. An inner product is a central concept in the theory of a linear vector space, by which two vectors can be associated with each other to yield a scalar. Unlike many books on linear algebra and linear vector spaces, however, we describe canonical forms of matrices before considering the inner product. This is because we can treat the topics in light of the abstract theory of matrices and vector spaces without the concept of the inner product. Of the canonical forms of matrices, the Jordan canonical form is of paramount importance. We study how it is constructed, providing a tangible example. In relation to the inner product space, normal operators such as Hermitian operators and unitary operators frequently appear in quantum mechanics and electromagnetism. From a general aspect, we revisit the theory of Hermitian operators that often appeared in both Parts I and II. The last part deals with exponential functions of matrices. The relevant topics can be counted as one of the important branches of applied mathematics. Also, the exponential functions of matrices play an essential role in the theory of continuous groups that will be dealt with in Part IV.

Chapter 11

Vectors and Their Transformation

In this chapter, we deal with the theory of finite-dimensional linear vector spaces. Such vector spaces are spanned by a finite number of linearly independent vectors, namely, basis vectors. In conjunction with developing an abstract concept and theory, we mention the notion of mapping among mathematical elements. A linear transformation of a vector is a special kind of mapping. In particular, we focus on endomorphism within an n-dimensional vector space $V^n$. Here, the endomorphism is defined as a linear transformation $V^n \to V^n$. The endomorphism is represented by an $(n, n)$ square matrix. This is most often the case with physical and chemical applications, when we deal with matrix algebra. In this book we focus on this type of transformation. A non-singular matrix plays an important role in the endomorphism. In this connection, we consider its inverse matrix and determinant. All these fundamental concepts supply us with a sufficient basis for a better understanding of the theory of linear vector spaces. Through these processes, we should be able to get acquainted with the connection between algebraic and analytical approaches and gain a broad perspective on various aspects of mathematical physics and related fields.

11.1

Vectors

© Springer Nature Singapore Pte Ltd. 2020 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3_11

From both fundamental and practical points of view, it is desirable to define linear vector spaces in an abstract way. Suppose $V$ is a set of elements denoted by $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, etc. called vectors. The set $V$ is a linear vector space (or simply a vector space), if a sum $\mathbf{a} + \mathbf{b} \in V$ is defined for any pair of vectors $\mathbf{a}$ and $\mathbf{b}$ and the elements of $V$ satisfy the following mathematical relations:

$$(\mathbf{a} + \mathbf{b}) + \mathbf{c} = \mathbf{a} + (\mathbf{b} + \mathbf{c}), \tag{11.1}$$

$$\mathbf{a} + \mathbf{b} = \mathbf{b} + \mathbf{a}, \tag{11.2}$$

$$\mathbf{a} + \mathbf{0} = \mathbf{a}, \tag{11.3}$$

$$\mathbf{a} + (-\mathbf{a}) = \mathbf{0}. \tag{11.4}$$

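In the above, $\mathbf{0}$ is called the zero vector. Furthermore, for $\mathbf{a} \in V$, $c\mathbf{a} \in V$ is defined ($c$ is a complex number called a scalar) and we assume the following relations among vectors and scalars:

$$(cd)\mathbf{a} = c(d\mathbf{a}), \tag{11.5}$$

$$1\mathbf{a} = \mathbf{a}, \tag{11.6}$$

$$c(\mathbf{a} + \mathbf{b}) = c\mathbf{a} + c\mathbf{b}, \tag{11.7}$$

$$(c + d)\mathbf{a} = c\mathbf{a} + d\mathbf{a}. \tag{11.8}$$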

On the basis of the above relations, we can construct the following expression called a linear combination:

$$c_1\mathbf{a}_1 + c_2\mathbf{a}_2 + \cdots + c_n\mathbf{a}_n.$$

If this linear combination is equated to zero, we obtain

$$c_1\mathbf{a}_1 + c_2\mathbf{a}_2 + \cdots + c_n\mathbf{a}_n = \mathbf{0}. \tag{11.9}$$

If (11.9) holds only in the case where every $c_i = 0$ $(1 \le i \le n)$, the vectors $\mathbf{a}_1, \mathbf{a}_2, \cdots, \mathbf{a}_n$ are said to be linearly independent. In this case the relation represented by (11.9) is said to be trivial. If the relation is nontrivial (i.e., $\exists c_i \neq 0$), those vectors are said to be linearly dependent.

If in the vector space $V$ the maximum number of linearly independent vectors is $n$, $V$ is said to be an n-dimensional vector space and is sometimes denoted by $V^n$. In this case any vector $\mathbf{x}$ of $V^n$ is expressed uniquely as a linear combination of linearly independent vectors such that

$$\mathbf{x} = x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n. \tag{11.10}$$

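Suppose $\mathbf{x}$ is also denoted by

$$\mathbf{x} = x_1'\mathbf{a}_1 + x_2'\mathbf{a}_2 + \cdots + x_n'\mathbf{a}_n. \tag{11.11}$$

Subtracting both sides of (11.11) from (11.10), we obtain

$$\mathbf{0} = \left(x_1 - x_1'\right)\mathbf{a}_1 + \left(x_2 - x_2'\right)\mathbf{a}_2 + \cdots + \left(x_n - x_n'\right)\mathbf{a}_n. \tag{11.12}$$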

Linear independence of the vectors $\mathbf{a}_1, \mathbf{a}_2, \cdots, \mathbf{a}_n$ implies $x_i - x_i' = 0$; i.e., $x_i = x_i'$ $(1 \le i \le n)$. These $n$ linearly independent vectors are referred to as basis vectors.
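The definition of linear independence through (11.9) can be tested mechanically: a set of numerical vectors is linearly independent exactly when Gaussian elimination leaves as many nonzero rows as there are vectors. The sketch below is a minimal illustration with hypothetical function names; it uses exact rational arithmetic for simplicity, although the text allows complex scalars.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of coordinate vectors, by exact Gaussian elimination."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0                                   # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def linearly_independent(vectors):
    """True if c_1 a_1 + ... + c_n a_n = 0 forces every c_i = 0."""
    return rank(vectors) == len(vectors)
```

For example, `linearly_independent([[1, 0, 0], [0, 1, 0], [1, 1, 1]])` holds, whereas `[[1, 2, 3], [2, 4, 6]]` fails the test because the second vector is twice the first.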

A vector space that has a finite number of basis vectors is called finite-dimensional; otherwise it is infinite-dimensional. Alternatively, we express (11.10) as

$$\mathbf{x} = (\mathbf{a}_1 \cdots \mathbf{a}_n)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{11.13}$$

A set of coordinates $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ is called a column vector (or a numerical vector) that indicates an "address" of the vector $\mathbf{x}$ with respect to the basis vectors $\mathbf{a}_1, \mathbf{a}_2, \cdots, \mathbf{a}_n$. Any vector in $V^n$ can be expressed as a linear combination of the basis vectors and, hence, we say that $V^n$ is spanned by $\mathbf{a}_1, \mathbf{a}_2, \cdots, \mathbf{a}_n$. This is represented as

$$V^n = \mathrm{Span}\{\mathbf{a}_1, \mathbf{a}_2, \cdots, \mathbf{a}_n\}. \tag{11.14}$$

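Let us think of a subset $W$ of $V^n$ (i.e., $W \subset V^n$). If the following relations hold for $W$, $W$ is said to be a (linear) subspace of $V^n$:

$$\mathbf{a}, \mathbf{b} \in W \Longrightarrow \mathbf{a} + \mathbf{b} \in W, \qquad \mathbf{a} \in W \Longrightarrow c\mathbf{a} \in W. \tag{11.15}$$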

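These two relations ensure that the relations of (11.1)–(11.8) hold for $W$ as well. The dimension of $W$ is equal to or smaller than $n$. For instance, $W = \mathrm{Span}\{\mathbf{a}_1, \mathbf{a}_2, \cdots, \mathbf{a}_r\}$ $(r \le n)$ is a subspace of $V^n$. If $r = n$, $W = V^n$. Suppose that there are two subspaces $W_1 = \mathrm{Span}\{\mathbf{a}_1\}$ and $W_2 = \mathrm{Span}\{\mathbf{a}_2\}$. Note that in this case $W_1 \cup W_2$ is not a subspace, because $W_1 \cup W_2$ does not contain $\mathbf{a}_1 + \mathbf{a}_2$. However, a set $U$ defined by

$$U = \left\{\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_2;\ \forall\mathbf{x}_1 \in W_1,\ \forall\mathbf{x}_2 \in W_2\right\} \tag{11.16}$$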

is a subspace of $V^n$. We denote this subspace by $W_1 + W_2$. To show that this is in fact a subspace, suppose that $\mathbf{x}, \mathbf{y} \in W_1 + W_2$. Then, we may express $\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_2$ and $\mathbf{y} = \mathbf{y}_1 + \mathbf{y}_2$, where $\mathbf{x}_1, \mathbf{y}_1 \in W_1$; $\mathbf{x}_2, \mathbf{y}_2 \in W_2$. We have $\mathbf{x} + \mathbf{y} = (\mathbf{x}_1 + \mathbf{y}_1) + (\mathbf{x}_2 + \mathbf{y}_2)$, where $\mathbf{x}_1 + \mathbf{y}_1 \in W_1$ and $\mathbf{x}_2 + \mathbf{y}_2 \in W_2$ because both $W_1$ and $W_2$ are subspaces. Therefore, $\mathbf{x} + \mathbf{y} \in W_1 + W_2$. Meanwhile, with any scalar $c$, $c\mathbf{x} = c\mathbf{x}_1 + c\mathbf{x}_2 \in W_1 + W_2$. By definition (11.15), $W_1 + W_2$ is accordingly a subspace.

Suppose here $\mathbf{x}_1 \in W_1$. Then, $\mathbf{x}_1 = \mathbf{x}_1 + \mathbf{0} \in W_1 + W_2$, and so $W_1 \subset W_1 + W_2$. Similarly, we have $W_2 \subset W_1 + W_2$. Thus, $W_1 + W_2$ contains both $W_1$ and $W_2$. Conversely, let $W$ be an arbitrary subspace that contains both $W_1$ and $W_2$. Then, we have $\forall\mathbf{x}_1 \in W_1 \subset W$ and $\forall\mathbf{x}_2 \in W_2 \subset W$ and, hence, we have $\mathbf{x}_1 + \mathbf{x}_2 \in W$ by definition (11.15). But, from (11.16), $W_1 + W_2 = \{\mathbf{x}_1 + \mathbf{x}_2;\ \forall\mathbf{x}_1 \in W_1,\ \forall\mathbf{x}_2 \in W_2\}$. Hence, $W_1 + W_2 \subset W$. Consequently, any subspace that contains both $W_1$ and $W_2$ necessarily contains $W_1 + W_2$. This implies that $W_1 + W_2$ is the smallest subspace that contains both $W_1$ and $W_2$.

Fig. 11.1 Decomposition of a vector in a three-dimensional Cartesian space $\mathbb{R}^3$ into two subspaces. (a) $\mathbb{R}^3 = W_1 + W_2$; $W_1 \cap W_2 = \mathrm{Span}\{\mathbf{e}_2\}$, where $\mathbf{e}_2$ is a unit vector in the positive direction of the y-axis. (b) $\mathbb{R}^3 = W_1 + W_3$; $W_1 \cap W_3 = \{\mathbf{0}\}$

Example 11.1 Consider a three-dimensional Cartesian space $\mathbb{R}^3$ (Fig. 11.1). We regard the xy-plane and the yz-plane as subspaces $W_1$ and $W_2$, respectively, and $\mathbb{R}^3 = W_1 + W_2$. In Fig. 11.1a, a vector $\overrightarrow{OB}$ (in $\mathbb{R}^3$) is expressed as $\overrightarrow{OA} + \overrightarrow{AB}$ (i.e., a sum of a vector in $W_1$ and that in $W_2$). Alternatively, the same vector $\overrightarrow{OB}$ can be expressed as $\overrightarrow{OA'} + \overrightarrow{A'B}$. On the other hand, we can designate a subspace in a different way; i.e., in Fig. 11.1b the z-axis is chosen for a subspace $W_3$ instead of $W_2$. We have $\mathbb{R}^3 = W_1 + W_3$ as well. In this case, however, $\overrightarrow{OB}$ is uniquely expressed as $\overrightarrow{OB} = \overrightarrow{OP} + \overrightarrow{PB}$. Notice that in Fig. 11.1a $W_1 \cap W_2 = \mathrm{Span}\{\mathbf{e}_2\}$, where $\mathbf{e}_2$ is a unit vector in the positive direction of the y-axis. In Fig. 11.1b, on the other hand, we have $W_1 \cap W_3 = \{\mathbf{0}\}$. We can generalize this example to the following theorem.

Theorem 11.1 Let $W_1$ and $W_2$ be subspaces of $V$ and $V = W_1 + W_2$. Then a vector $\mathbf{x}$ in $V$ is uniquely expressed as

$$\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_2, \quad \mathbf{x}_1 \in W_1, \quad \mathbf{x}_2 \in W_2,$$

if and only if $W_1 \cap W_2 = \{\mathbf{0}\}$.

Proof Suppose $W_1 \cap W_2 = \{\mathbf{0}\}$ and $\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_2 = \mathbf{x}_1' + \mathbf{x}_2'$, with $\mathbf{x}_1, \mathbf{x}_1' \in W_1$ and $\mathbf{x}_2, \mathbf{x}_2' \in W_2$. Then $\mathbf{x}_1 - \mathbf{x}_1' = \mathbf{x}_2' - \mathbf{x}_2$. The LHS belongs to $W_1$ and the RHS belongs to $W_2$. Both sides accordingly belong to $W_1 \cap W_2$. Hence, from the supposition both sides should be equal to the zero vector. Therefore, $\mathbf{x}_1 = \mathbf{x}_1'$, $\mathbf{x}_2 = \mathbf{x}_2'$. This implies that $\mathbf{x}$


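is expressed uniquely as $\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_2$. Conversely, suppose the vector representation ($\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_2$) is unique and $\mathbf{x} \in W_1 \cap W_2$. Then, $\mathbf{x} = \mathbf{x} + \mathbf{0} = \mathbf{0} + \mathbf{x}$; $\mathbf{x}, \mathbf{0} \in W_1$ and $\mathbf{x}, \mathbf{0} \in W_2$. Uniqueness of the representation implies that $\mathbf{x} = \mathbf{0}$. Consequently, $W_1 \cap W_2 = \{\mathbf{0}\}$ follows.

In case $W_1 \cap W_2 = \{\mathbf{0}\}$, $V = W_1 + W_2$ is said to be a direct sum of $W_1$ and $W_2$, or we say that $V$ is decomposed into a direct sum of $W_1$ and $W_2$. We symbolically denote this by

$$V = W_1 \oplus W_2. \tag{11.17}$$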

In this case, the following equality holds:

$$\dim V = \dim W_1 + \dim W_2, \tag{11.18}$$

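where "dim" stands for the dimension of the vector space considered. To prove (11.18), we suppose that $V$ is an n-dimensional vector space and that $W_1$ and $W_2$ are spanned by $r_1$ and $r_2$ linearly independent vectors, respectively, such that

$$W_1 = \mathrm{Span}\left\{\mathbf{e}_1^{(1)}, \mathbf{e}_2^{(1)}, \cdots, \mathbf{e}_{r_1}^{(1)}\right\} \quad \text{and} \quad W_2 = \mathrm{Span}\left\{\mathbf{e}_1^{(2)}, \mathbf{e}_2^{(2)}, \cdots, \mathbf{e}_{r_2}^{(2)}\right\}. \tag{11.19}$$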

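This is equivalent to saying that the dimensions of $W_1$ and $W_2$ are $r_1$ and $r_2$, respectively. If $V = W_1 + W_2$ (here we do not assume that the summation is a direct sum), we have

$$V = \mathrm{Span}\left\{\mathbf{e}_1^{(1)}, \mathbf{e}_2^{(1)}, \cdots, \mathbf{e}_{r_1}^{(1)}, \mathbf{e}_1^{(2)}, \mathbf{e}_2^{(2)}, \cdots, \mathbf{e}_{r_2}^{(2)}\right\}. \tag{11.20}$$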

Then, we have $n \le r_1 + r_2$. This is almost trivial. Suppose $r_1 + r_2 < n$. Then, these $(r_1 + r_2)$ vectors cannot span $V$, and we would need additional vectors so that all the vectors, including the additional ones, span $V$. Thus, we should have $n \le r_1 + r_2$ accordingly. That is,

$$\dim V \le \dim W_1 + \dim W_2. \tag{11.21}$$

Now, let us assume that $V = W_1 \oplus W_2$. Then, $\mathbf{e}_i^{(2)}$ $(1 \le i \le r_2)$ must be linearly independent of $\mathbf{e}_1^{(1)}, \mathbf{e}_2^{(1)}, \cdots, \mathbf{e}_{r_1}^{(1)}$. If not, $\mathbf{e}_i^{(2)}$ could be described as a linear combination of $\mathbf{e}_1^{(1)}, \mathbf{e}_2^{(1)}, \cdots, \mathbf{e}_{r_1}^{(1)}$. But this would imply that $\mathbf{e}_i^{(2)} \in W_1$, i.e., $\mathbf{e}_i^{(2)} \in W_1 \cap W_2$, in contradiction to $V = W_1 \oplus W_2$, because $W_1 \cap W_2 = \{\mathbf{0}\}$ by assumption. Likewise, $\mathbf{e}_j^{(1)}$ $(1 \le j \le r_1)$ is linearly independent of $\mathbf{e}_1^{(2)}, \mathbf{e}_2^{(2)}, \cdots, \mathbf{e}_{r_2}^{(2)}$. Hence, $\mathbf{e}_1^{(1)}, \mathbf{e}_2^{(1)}, \cdots, \mathbf{e}_{r_1}^{(1)}, \mathbf{e}_1^{(2)}, \mathbf{e}_2^{(2)}, \cdots, \mathbf{e}_{r_2}^{(2)}$ must be linearly independent. Thus, $n \ge r_1 + r_2$. This is because in the vector space $V$ we may well have additional vector(s) that are independent of the above $(r_1 + r_2)$ vectors. Meanwhile, $n \le r_1 + r_2$ from the above. Consequently, we must have $n = r_1 + r_2$. Thus, we have proven that

$$V = W_1 \oplus W_2 \Longrightarrow \dim V = \dim W_1 + \dim W_2.$$

Conversely, suppose $n = r_1 + r_2$. Then, any vector $\mathbf{x}$ in $V$ is expressed uniquely as

$$\mathbf{x} = \left(a_1\mathbf{e}_1^{(1)} + \cdots + a_{r_1}\mathbf{e}_{r_1}^{(1)}\right) + \left(b_1\mathbf{e}_1^{(2)} + \cdots + b_{r_2}\mathbf{e}_{r_2}^{(2)}\right). \tag{11.22}$$

The vector described by the first term is contained in $W_1$ and that described by the second term in $W_2$. Both terms are again expressed uniquely. Therefore, we get $V = W_1 \oplus W_2$. This is a proof of

$$\dim V = \dim W_1 + \dim W_2 \Longrightarrow V = W_1 \oplus W_2.$$

∎

The above statements are summarized as the following theorem:

Theorem 11.2 Let $V$ be a vector space and let $W_1$ and $W_2$ be subspaces of $V$. Also suppose $V = W_1 + W_2$. Then, the following relation holds:

$$\dim V \le \dim W_1 + \dim W_2. \tag{11.21}$$

Furthermore, we have

$$\dim V = \dim W_1 + \dim W_2, \tag{11.23}$$

if and only if $V = W_1 \oplus W_2$.

Theorem 11.2 is readily extended to the case where there are three or more subspaces. That is, having $W_1, W_2, \cdots, W_m$ so that $V = W_1 + W_2 + \cdots + W_m$, we obtain the following relation:

$$\dim V \le \dim W_1 + \dim W_2 + \cdots + \dim W_m. \tag{11.24}$$

The equality of (11.24) holds if and only if $V = W_1 \oplus W_2 \oplus \cdots \oplus W_m$.

In light of Theorem 11.2, Example 11.1 says that $3 = \dim \mathbb{R}^3 < 2 + 2 = \dim W_1 + \dim W_2$. But, $\dim \mathbb{R}^3 = 2 + 1 = \dim W_1 + \dim W_3$. Therefore, we have $\mathbb{R}^3 = W_1 \oplus W_3$.
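The dimension counting of Example 11.1 can be reproduced with a few lines of code. The sketch below computes $\dim(W_1 + W_2)$ as the rank of the combined spanning sets and obtains $\dim(W_1 \cap W_2)$ from the standard relation $\dim(W_1 \cap W_2) = \dim W_1 + \dim W_2 - \dim(W_1 + W_2)$. That relation is assumed here; it is not stated in the text above. Function and variable names are hypothetical.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of coordinate vectors, by exact Gaussian elimination."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def dim_sum(w1, w2):
    """dim(W1 + W2): rank of the union of the two spanning sets."""
    return rank(w1 + w2)


def dim_intersection(w1, w2):
    """dim(W1 n W2), using the assumed standard dimension formula."""
    return rank(w1) + rank(w2) - dim_sum(w1, w2)


W1 = [[1, 0, 0], [0, 1, 0]]   # xy-plane
W2 = [[0, 1, 0], [0, 0, 1]]   # yz-plane
W3 = [[0, 0, 1]]              # z-axis
```

With these spanning sets, `dim_intersection(W1, W2)` is 1 (the y-axis), so the sum is not direct, while `dim_intersection(W1, W3)` is 0, in agreement with $\mathbb{R}^3 = W_1 \oplus W_3$.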

11.2

Linear Transformations of Vectors

In the previous section we introduced vectors and their calculation rules in a linear vector space. It is natural and convenient to relate a vector to another vector, just as a function $f$ relates a number (either real or complex) to another number such that $y = f(x)$, where $x$ and $y$ are two numbers. A linear transformation from the vector space $V$ to another vector space $W$ is a mapping $A: V \to W$ such that

$$A(c\mathbf{a} + d\mathbf{b}) = cA(\mathbf{a}) + dA(\mathbf{b}). \tag{11.25}$$

We will briefly discuss the concept of mapping at the end of this section. It is convenient to define addition of linear transformations. It is defined as

$$(A + B)\mathbf{a} = A\mathbf{a} + B\mathbf{a}, \tag{11.26}$$

where $\mathbf{a}$ is any vector in $V$. Since (11.25) is a broad but abstract definition, we begin with a well-known simple example of rotation of a vector within the xy-plane (Fig. 11.2). We denote an arbitrary position vector $\mathbf{x}$ in the xy-plane by

$$\mathbf{x} = x\mathbf{e}_1 + y\mathbf{e}_2 = (\mathbf{e}_1\ \mathbf{e}_2)\begin{pmatrix} x \\ y \end{pmatrix}, \tag{11.27}$$

where $\mathbf{e}_1$ and $\mathbf{e}_2$ are unit basis vectors in the xy-plane and $x$ and $y$ are coordinates of the vector $\mathbf{x}$ in reference to $\mathbf{e}_1$ and $\mathbf{e}_2$. The expression (11.27) is consistent with (11.13). The rotation represented in Fig. 11.2 is an example of a linear transformation. We call this rotation $R$. According to the definition,

$$R(x\mathbf{e}_1 + y\mathbf{e}_2) = R(\mathbf{x}) = xR(\mathbf{e}_1) + yR(\mathbf{e}_2). \tag{11.28}$$

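Putting $R(x\mathbf{e}_1 + y\mathbf{e}_2) = \mathbf{x}'$, $R(\mathbf{e}_1) = \mathbf{e}_1'$, and $R(\mathbf{e}_2) = \mathbf{e}_2'$,

$$\mathbf{x}' = x\mathbf{e}_1' + y\mathbf{e}_2' = \left(\mathbf{e}_1'\ \mathbf{e}_2'\right)\begin{pmatrix} x \\ y \end{pmatrix}. \tag{11.29}$$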

Fig. 11.2 Rotation of a vector $\mathbf{x}$ within the xy-plane

From Fig. 11.2 we readily obtain

$$\mathbf{e}_1' = \mathbf{e}_1\cos\theta + \mathbf{e}_2\sin\theta, \qquad \mathbf{e}_2' = -\mathbf{e}_1\sin\theta + \mathbf{e}_2\cos\theta. \tag{11.30}$$

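Using a matrix representation,

$$\left(\mathbf{e}_1'\ \mathbf{e}_2'\right) = (\mathbf{e}_1\ \mathbf{e}_2)\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \tag{11.31}$$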

Substituting (11.30) into (11.29), we obtain

$$\mathbf{x}' = (x\cos\theta - y\sin\theta)\mathbf{e}_1 + (x\sin\theta + y\cos\theta)\mathbf{e}_2. \tag{11.32}$$

Meanwhile, $\mathbf{x}'$ can be expressed relative to the original basis vectors $\mathbf{e}_1$ and $\mathbf{e}_2$:

$$\mathbf{x}' = x'\mathbf{e}_1 + y'\mathbf{e}_2. \tag{11.33}$$

Comparing (11.32) and (11.33), uniqueness of the representation ensures that

$$x' = x\cos\theta - y\sin\theta, \qquad y' = x\sin\theta + y\cos\theta. \tag{11.34}$$

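Using a matrix representation once again,

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}. \tag{11.35}$$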

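Further combining (11.29) and (11.31), we get

$$\mathbf{x}' = R(\mathbf{x}) = (\mathbf{e}_1\ \mathbf{e}_2)\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}. \tag{11.36}$$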
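The two readings of (11.36) can be compared numerically. In the sketch below, one function rotates the coordinates as in (11.35), while the other first rotates the basis vectors as in (11.30) and then forms $\mathbf{x}' = x\mathbf{e}_1' + y\mathbf{e}_2'$ as in (11.29). The function names are hypothetical, and all vectors are given by their components relative to the original basis $\mathbf{e}_1, \mathbf{e}_2$.

```python
import math


def rotate_coordinates(x, y, theta):
    """(11.35): the rotation matrix acts on the column vector from the left."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)


def rotate_basis(theta):
    """(11.30): components of e1' and e2' relative to e1 and e2."""
    c, s = math.cos(theta), math.sin(theta)
    return (c, s), (-s, c)


def rotate_via_basis(x, y, theta):
    """(11.29): x' = x e1' + y e2', with the coordinates x, y kept fixed."""
    e1p, e2p = rotate_basis(theta)
    return (x * e1p[0] + y * e2p[0], x * e1p[1] + y * e2p[1])
```

Both routes give the same components of $\mathbf{x}'$ for any input, which is the content of (11.32) and (11.34).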

The above example demonstrates that the linear transformation $R$ has a $(2, 2)$ matrix representation shown in (11.36). Moreover, this example shows that if a vector is expressed as a linear combination of the basis vectors, the "coordinates" (represented by a column vector) can be transformed as well by the same matrix. Regarding an abstract n-dimensional linear vector space $V^n$, the linear vector transformation $A$ is given by

$$A(\mathbf{x}) = (\mathbf{e}_1 \cdots \mathbf{e}_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \tag{11.37}$$


where $\mathbf{e}_1, \mathbf{e}_2, \cdots$, and $\mathbf{e}_n$ are basis vectors and $x_1, x_2, \cdots$, and $x_n$ are the corresponding coordinates of a vector $\mathbf{x} = \sum_{i=1}^n x_i\mathbf{e}_i$. We assume that the transformation is a mapping $A: V^n \to V^n$ (i.e., endomorphism). In this case the transformation is represented by an $(n, n)$ matrix. Note that the matrix operates on the basis vectors from the right and that it operates on the coordinates (i.e., a column vector) from the left. In (11.37) we often omit the parentheses to simply write $A\mathbf{x}$.

Here we mention matrix notation for later convenience. We often identify a linear vector transformation with its representation matrix and denote both the transformation and the matrix by $A$. On this occasion, we write

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \quad A = (A)_{ij} = \left(a_{ij}\right), \text{ etc.}, \tag{11.38}$$

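where with the second expression $(A)_{ij}$ and $(a_{ij})$ represent the matrix $A$ itself, because we frequently use indexed matrix notations such as $A^{-1}$, $A^{\dagger}$, and $\tilde{A}$. The notation (11.38) can conveniently be used in such cases. Note moreover that $a_{ij}$ represents the matrix $A$ as well.

Equation (11.37) has duality such that the matrix $A$ operates either on the basis vectors or on the coordinates. This can explicitly be written as

$$A(\mathbf{x}) = \left[(\mathbf{e}_1 \cdots \mathbf{e}_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\right]\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (\mathbf{e}_1 \cdots \mathbf{e}_n)\left[\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}\right].$$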

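That is, we assume the associative law with the above expression. Making a summation representation, we have

$$\begin{aligned} A(\mathbf{x}) &= \left(\sum_{k=1}^n \mathbf{e}_k a_{k1}\ \cdots\ \sum_{k=1}^n \mathbf{e}_k a_{kn}\right)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (\mathbf{e}_1 \cdots \mathbf{e}_n)\begin{pmatrix} \sum_{l=1}^n a_{1l}x_l \\ \vdots \\ \sum_{l=1}^n a_{nl}x_l \end{pmatrix} \\ &= \sum_{k=1}^n\sum_{l=1}^n \mathbf{e}_k a_{kl}x_l = \sum_{l=1}^n\left(\sum_{k=1}^n \mathbf{e}_k a_{kl}\right)x_l = \sum_{k=1}^n\left(\sum_{l=1}^n a_{kl}x_l\right)\mathbf{e}_k. \end{aligned} \tag{11.39}$$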

That is, the above equation can be viewed in either of two ways, i.e., as a coordinate transformation with fixed basis vectors or as a transformation of the basis vectors with fixed coordinates.


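Also, Eq. (11.37) can formally be written as

$$A(\mathbf{x}) = (\mathbf{e}_1 \cdots \mathbf{e}_n)A\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (\mathbf{e}_1 A \cdots \mathbf{e}_n A)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \tag{11.40}$$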

where we assumed that the distributive law holds with the operation of $A$ on $(\mathbf{e}_1 \cdots \mathbf{e}_n)$. Meanwhile, if in (11.37) we put $x_i = 1$, $x_j = 0$ $(j \neq i)$, from (11.39) we get

$$A(\mathbf{e}_i) = \sum_{k=1}^n \mathbf{e}_k a_{ki}.$$

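Therefore, (11.39) can be rewritten as

$$A(\mathbf{x}) = \left(A(\mathbf{e}_1) \cdots A(\mathbf{e}_n)\right)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{11.41}$$

Since $x_i$ $(1 \le i \le n)$ can arbitrarily be chosen, comparing (11.40) and (11.41) we have

$$\mathbf{e}_i A = A(\mathbf{e}_i) \quad (1 \le i \le n). \tag{11.42}$$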
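Relations (11.39) and (11.41) say that the i-th column of the representation matrix holds the coordinates of $A(\mathbf{e}_i)$, so that $A(\mathbf{x})$ can be computed either row by row or as a combination of those images. A minimal sketch with hypothetical function names:

```python
def column(matrix, i):
    """Coordinates of A(e_i): the i-th column of the representation matrix."""
    return [row[i] for row in matrix]


def apply_rowwise(matrix, coords):
    """Coordinates of A(x) as sum_l a_kl x_l, the last form in (11.39)."""
    return [sum(a * x for a, x in zip(row, coords)) for row in matrix]


def apply_via_images(matrix, coords):
    """Coordinates of A(x) as sum_i x_i A(e_i), following (11.41)."""
    result = [0] * len(matrix)
    for i, x_i in enumerate(coords):
        image = column(matrix, i)
        result = [r + x_i * c for r, c in zip(result, image)]
    return result
```

For instance, with the matrix `[[1, 2], [3, 4]]` and coordinates `[5, 6]`, both functions return the same coordinates.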

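The matrix representation is unique in reference to the same basis vectors. Suppose that there is another matrix representation of the transformation $A$ such that

$$A(\mathbf{x}) = (\mathbf{e}_1 \cdots \mathbf{e}_n)\begin{pmatrix} a_{11}' & \cdots & a_{1n}' \\ \vdots & \ddots & \vdots \\ a_{n1}' & \cdots & a_{nn}' \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{11.43}$$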

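Subtracting (11.43) from (11.37), we obtain

$$(\mathbf{e}_1 \cdots \mathbf{e}_n)\left[\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} - \begin{pmatrix} a_{11}' & \cdots & a_{1n}' \\ \vdots & \ddots & \vdots \\ a_{n1}' & \cdots & a_{nn}' \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}\right] = (\mathbf{e}_1 \cdots \mathbf{e}_n)\begin{pmatrix} \sum_{k=1}^n \left(a_{1k} - a_{1k}'\right)x_k \\ \vdots \\ \sum_{k=1}^n \left(a_{nk} - a_{nk}'\right)x_k \end{pmatrix} = \mathbf{0}.$$

On the basis of the linear independence of the basis vectors,

$$\sum_{k=1}^n \left(a_{ik} - a_{ik}'\right)x_k = 0 \quad (1 \le i \le n).$$

This relationship holds for any arbitrarily and independently chosen complex numbers $x_i$ $(1 \le i \le n)$. Therefore, we must have $a_{ik} = a_{ik}'$ $(1 \le i, k \le n)$, meaning that the matrix representation of $A$ is unique with regard to fixed basis vectors.

Nonetheless, if a set of vectors $\mathbf{e}_1, \mathbf{e}_2, \cdots$, and $\mathbf{e}_n$ does not constitute basis vectors (i.e., those vectors are linearly dependent), the aforementioned uniqueness of the matrix representation loses its meaning. For instance, in $V^2$ take vectors $\mathbf{e}_1$ and $\mathbf{e}_2$ such that $\mathbf{e}_1 = \mathbf{e}_2$ (i.e., the two vectors are linearly dependent) and let the transformation matrix be $B = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$. This means that $\mathbf{e}_1$ should be $\mathbf{e}_1$ after the transformation. At the same time, the vector $\mathbf{e}_2$ $(= \mathbf{e}_1)$ should be converted to $2\mathbf{e}_2$ $(= 2\mathbf{e}_1)$. This is impossible except for the case of $\mathbf{e}_1 = \mathbf{e}_2 = \mathbf{0}$. The above matrix $B$ in its own right is an object of matrix algebra, of course.

Putting $\mathbf{a} = \mathbf{b} = \mathbf{0}$ and $c = d = 1$ in the definition (11.25) of the linear transformation, we obtain

$$A(\mathbf{0}) = A(\mathbf{0}) + A(\mathbf{0}).$$

Combining this relation with (11.4) gives

$$A(\mathbf{0}) = \mathbf{0}. \tag{11.44}$$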

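Then do we have a vector $\mathbf{u} \neq \mathbf{0}$ for which $A(\mathbf{u}) = \mathbf{0}$? The answer is yes. This is because if a $(2, 2)$ matrix $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ is chosen for $R$, we get a linear transformation such that

$$\left(\mathbf{e}_1'\ \mathbf{e}_2'\right) = (\mathbf{e}_1\ \mathbf{e}_2)\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = (\mathbf{e}_1\ \mathbf{0}).$$

That is, we have $R(\mathbf{e}_2) = \mathbf{0}$. In general, vectors $\mathbf{x}$ $(\in V)$ satisfying $A(\mathbf{x}) = \mathbf{0}$ form a subspace in a vector space $V$. This is because $A(\mathbf{x}) = \mathbf{0}$ and $A(\mathbf{y}) = \mathbf{0}$ $\Longrightarrow$ $A(\mathbf{x} + \mathbf{y}) = A(\mathbf{x}) + A(\mathbf{y}) = \mathbf{0}$, $A(c\mathbf{x}) = cA(\mathbf{x}) = \mathbf{0}$. We call this subspace of a maximum dimension a null-space and represent it as $\mathrm{Ker}\,A$, where Ker stands for "kernel." In other words, $\mathrm{Ker}\,A = A^{-1}(\mathbf{0})$. Note that this symbolic notation does not ensure the existence of the inverse transformation $A^{-1}$ (vide infra), but represents a set comprising elements $\mathbf{x}$ that satisfy $A(\mathbf{x}) = \mathbf{0}$. In the above example $\mathrm{Ker}\,R = \mathrm{Span}\{\mathbf{e}_2\}$.

$A(V^n)$ (here $V^n$ is the vector space considered and we assume that $A$ is an endomorphism of $V^n$) also forms a subspace in $V^n$. In fact, for any $\mathbf{x}, \mathbf{y} \in V^n$, we have $A(\mathbf{x}), A(\mathbf{y}) \in A(V^n)$ $\Longrightarrow$ $A(\mathbf{x}) + A(\mathbf{y}) = A(\mathbf{x} + \mathbf{y}) \in A(V^n)$; $cA(\mathbf{x}) = A(c\mathbf{x}) \in A(V^n)$. Obviously, $A(V^n) \subset V^n$. The subspace $A(V^n)$ is said to be an image of the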


transformation $A$ and is sometimes denoted by $\mathrm{Im}\,A$. We have a so-called "dimension theorem" expressed as follows:

Theorem 11.3: Dimension Theorem Let $V^n$ be a linear vector space of dimension $n$. Also let $A$ be an endomorphism: $V^n \to V^n$. Then we have

$$\dim V^n = \dim A(V^n) + \dim \mathrm{Ker}\,A. \tag{11.45}$$

The number $\dim A(V^n)$ is said to be the rank of the linear transformation $A$. That is, we write

$$\dim A(V^n) = \mathrm{rank}\,A.$$

Also the number $\dim \mathrm{Ker}\,A$ is said to be the nullity of the linear transformation $A$. That is, we have

$$\dim \mathrm{Ker}\,A = \mathrm{nullity}\,A.$$

Thus, (11.45) can be written succinctly as

$$\dim V^n = \mathrm{rank}\,A + \mathrm{nullity}\,A.$$

Proof Let $\mathbf{e}_1, \mathbf{e}_2, \cdots$, and $\mathbf{e}_n$ be basis vectors of $V^n$. First, assume

$$A(\mathbf{e}_1) = A(\mathbf{e}_2) = \cdots = A(\mathbf{e}_n) = \mathbf{0}.$$

This implies that $\mathrm{nullity}\,A = n$. Then, $A\left(\sum_{i=1}^n x_i\mathbf{e}_i\right) = \sum_{i=1}^n x_i A(\mathbf{e}_i) = \mathbf{0}$. Since $x_i$ is arbitrarily chosen, the expression means that $A(\mathbf{x}) = \mathbf{0}$ for $\forall\mathbf{x} \in V^n$. This implies $A = 0$. That is, $\mathrm{rank}\,A = 0$. Thus, (11.45) certainly holds.

To proceed with the proof of the theorem, we think of a linear combination $\sum_{i=1}^n c_i\mathbf{e}_i$. Next, assume that $\mathrm{Ker}\,A = \mathrm{Span}\{\mathbf{e}_1, \mathbf{e}_2, \cdots, \mathbf{e}_\nu\}$ $(\nu < n)$; $\dim \mathrm{Ker}\,A = \nu$. After $A$ is operated on the above linear combination, we are left with $\sum_{i=\nu+1}^n c_i A(\mathbf{e}_i)$. We put

$$\sum_{i=1}^n c_i A(\mathbf{e}_i) = \sum_{i=\nu+1}^n c_i A(\mathbf{e}_i) = \mathbf{0}. \tag{11.46}$$

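Suppose that the $(n - \nu)$ vectors $A(\mathbf{e}_i)$ $(\nu + 1 \le i \le n)$ are linearly dependent. Then without loss of generality we can assume $c_{\nu+1} \neq 0$. Dividing (11.46) by $c_{\nu+1}$ we obtain

$$A(\mathbf{e}_{\nu+1}) + \frac{c_{\nu+2}}{c_{\nu+1}}A(\mathbf{e}_{\nu+2}) + \cdots + \frac{c_n}{c_{\nu+1}}A(\mathbf{e}_n) = \mathbf{0},$$

$$A(\mathbf{e}_{\nu+1}) + A\left(\frac{c_{\nu+2}}{c_{\nu+1}}\mathbf{e}_{\nu+2}\right) + \cdots + A\left(\frac{c_n}{c_{\nu+1}}\mathbf{e}_n\right) = \mathbf{0},$$

$$A\left(\mathbf{e}_{\nu+1} + \frac{c_{\nu+2}}{c_{\nu+1}}\mathbf{e}_{\nu+2} + \cdots + \frac{c_n}{c_{\nu+1}}\mathbf{e}_n\right) = \mathbf{0}.$$

Meanwhile, the $(\nu + 1)$ vectors $\mathbf{e}_1, \mathbf{e}_2, \cdots, \mathbf{e}_\nu$, and $\mathbf{e}_{\nu+1} + \frac{c_{\nu+2}}{c_{\nu+1}}\mathbf{e}_{\nu+2} + \cdots + \frac{c_n}{c_{\nu+1}}\mathbf{e}_n$ are linearly independent, because $\mathbf{e}_1, \mathbf{e}_2, \cdots$, and $\mathbf{e}_n$ are basis vectors of $V^n$. This would imply that the dimension of $\mathrm{Ker}\,A$ is at least $\nu + 1$, but this is in contradiction to $\mathrm{Ker}\,A = \mathrm{Span}\{\mathbf{e}_1, \mathbf{e}_2, \cdots, \mathbf{e}_\nu\}$. Thus, the $(n - \nu)$ vectors $A(\mathbf{e}_i)$ $(\nu + 1 \le i \le n)$ should be linearly independent.

Let $V^{n-\nu}$ be described as

$$V^{n-\nu} = \mathrm{Span}\{A(\mathbf{e}_{\nu+1}), A(\mathbf{e}_{\nu+2}), \cdots, A(\mathbf{e}_n)\}. \tag{11.47}$$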

Then, $V^{n-\nu}$ is a subspace of $V^n$, and so $\dim A(V^n) \ge n - \nu = \dim V^{n-\nu}$. Meanwhile, from (11.46) we have

$$A\left(\sum_{i=\nu+1}^n c_i\mathbf{e}_i\right) = \mathbf{0}.$$

From the above discussion, however, this relation holds if and only if $c_{\nu+1} = \cdots = c_n = 0$. This implies that $\mathrm{Ker}\,A \cap V^{n-\nu} = \{\mathbf{0}\}$. Then, we have

$$V^{n-\nu} + \mathrm{Ker}\,A = V^{n-\nu} \oplus \mathrm{Ker}\,A.$$

Meanwhile, from Theorem 11.2 we have

$$\dim\left[V^{n-\nu} \oplus \mathrm{Ker}\,A\right] = \dim V^{n-\nu} + \dim \mathrm{Ker}\,A = (n - \nu) + \nu = n = \dim V^n.$$

Thus, we must have $\dim A(V^n) = n - \nu = \dim V^{n-\nu}$. Since $V^{n-\nu}$ is a subspace of $V^n$ and $V^{n-\nu} \subset A(V^n)$ from (11.47), $V^{n-\nu} = A(V^n)$. To conclude, we get

$$\dim A(V^n) + \dim \mathrm{Ker}\,A = \dim V^n, \qquad V^n = A(V^n) \oplus \mathrm{Ker}\,A. \tag{11.48}$$

This completes the proof. ∎

Comparing Theorem 11.3 with Theorem 11.2, we find that Theorem 11.3 is a special case of Theorem 11.2. Equations (11.45) and (11.48) play an important role in the theory of linear vector spaces. As an exercise, we have the following example:


Example 11.2 Let $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \mathbf{e}_4$ be basis vectors of $V^4$. Let $A$ be an endomorphism of $V^4$ described by

$$A = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix}.$$

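We have

$$(\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \mathbf{e}_4)\begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix} = (\mathbf{e}_1 + \mathbf{e}_3,\ \mathbf{e}_2 + \mathbf{e}_4,\ \mathbf{e}_1 + \mathbf{e}_2 + \mathbf{e}_3 + \mathbf{e}_4,\ \mathbf{0}).$$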

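That is,

$$A(\mathbf{e}_1) = \mathbf{e}_1 + \mathbf{e}_3, \quad A(\mathbf{e}_2) = \mathbf{e}_2 + \mathbf{e}_4, \quad A(\mathbf{e}_3) = \mathbf{e}_1 + \mathbf{e}_2 + \mathbf{e}_3 + \mathbf{e}_4, \quad A(\mathbf{e}_4) = \mathbf{0}.$$

We have

$$A(-\mathbf{e}_1 - \mathbf{e}_2 + \mathbf{e}_3) = -A(\mathbf{e}_1) - A(\mathbf{e}_2) + A(\mathbf{e}_3) = \mathbf{0}.$$

Then, we find

$$A\left(V^4\right) = \mathrm{Span}\{\mathbf{e}_1 + \mathbf{e}_3,\ \mathbf{e}_2 + \mathbf{e}_4\}, \qquad \mathrm{Ker}\,A = \mathrm{Span}\{-\mathbf{e}_1 - \mathbf{e}_2 + \mathbf{e}_3,\ \mathbf{e}_4\}. \tag{11.49}$$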
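The statements in (11.49) can be verified directly. The sketch below applies the matrix of Example 11.2 to coordinate columns, computes its rank by exact Gaussian elimination, and checks that the two proposed kernel vectors are mapped to zero. Function and variable names are hypothetical; coordinates refer to the basis $\mathbf{e}_1, \cdots, \mathbf{e}_4$.

```python
from fractions import Fraction

A = [[1, 0, 1, 0],
     [0, 1, 1, 0],
     [1, 0, 1, 0],
     [0, 1, 1, 0]]


def apply(matrix, coords):
    """Coordinates of A(x): the matrix acts on the column vector from the left."""
    return [sum(a * x for a, x in zip(row, coords)) for row in matrix]


def rank(vectors):
    """Rank of a list of rows, by exact Gaussian elimination."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


image_basis = [[1, 0, 1, 0], [0, 1, 0, 1]]        # e1 + e3, e2 + e4
kernel_basis = [[-1, -1, 1, 0], [0, 0, 0, 1]]     # -e1 - e2 + e3, e4

rank_A = rank(A)                                   # dim A(V^4)
kernel_images = [apply(A, v) for v in kernel_basis]
```

Here `rank_A` is 2 and the two kernel vectors are linearly independent, so rank plus nullity gives 4, consistent with (11.45).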

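For any $\mathbf{x} \in V^4$, using scalars $c_i$ $(1 \le i \le 4)$, we have

$$\begin{aligned} \mathbf{x} &= c_1\mathbf{e}_1 + c_2\mathbf{e}_2 + c_3\mathbf{e}_3 + c_4\mathbf{e}_4 \\ &= \frac{1}{2}(c_1 + c_3)(\mathbf{e}_1 + \mathbf{e}_3) + \frac{1}{2}(2c_2 + c_3 - c_1)(\mathbf{e}_2 + \mathbf{e}_4) \\ &\quad + \frac{1}{2}(c_3 - c_1)(-\mathbf{e}_1 - \mathbf{e}_2 + \mathbf{e}_3) + \frac{1}{2}(c_1 - 2c_2 - c_3 + 2c_4)\mathbf{e}_4. \end{aligned} \tag{11.50}$$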

Thus, $\mathbf{x}$ has been uniquely represented as (11.50) with respect to the basis vectors $(\mathbf{e}_1 + \mathbf{e}_3)$, $(\mathbf{e}_2 + \mathbf{e}_4)$, $(-\mathbf{e}_1 - \mathbf{e}_2 + \mathbf{e}_3)$, and $\mathbf{e}_4$. The linear independence of these vectors can easily be checked by equating (11.50) with zero. We also confirm that

$$V^4 = A\left(V^4\right) \oplus \mathrm{Ker}\,A.$$

Fig. 11.3 Concept of mapping from a set $X$ to another set $Y$ (panels: injective; surjective; bijective: injective + surjective)

Linear transformation is a kind of mapping. Figure 11.3 depicts the concept of mapping. Suppose there are two sets $X$ and $Y$. The mapping $f$ is a correspondence between an element $x$ $(\in X)$ and $y$ $(\in Y)$. The set $f(X)$ $(\subset Y)$ is said to be the range of $f$.

(i) The mapping $f$ is injective: if $x_1 \neq x_2 \Longrightarrow f(x_1) \neq f(x_2)$.
(ii) The mapping is surjective: $f(X) = Y$. For $\forall y \in Y$ corresponding element(s) $x \in X$ exist(s).
(iii) The mapping is bijective: if the mapping $f$ is both injective and surjective, it is said to be bijective (or a reversible mapping or an invertible mapping). A mapping that is not invertible is said to be a non-invertible mapping.

If the mapping $f$ is bijective, a unique element $\exists x \in X$ exists for $\forall y \in Y$ such that $f(x) = y$. In terms of solving an equation, we say that with any given $y$ we can find a unique solution $x$ to the equation $f(x) = y$. In this case $x$ is said to be an inverse element to $y$ and this is denoted by $x = f^{-1}(y)$. The mapping $f^{-1}$ is called an inverse mapping. If the mapping is a linear transformation, the inverse mapping is said to be an inverse transformation. Here we focus on a case where both $X$ and $Y$ form a vector space and the mapping is an endomorphism. Regarding the linear transformation $A: V^n \to V^n$ (i.e., an endomorphism of $V^n$), we have the following important theorem:

Theorem 11.4 Let $A: V^n \to V^n$ be an endomorphism of $V^n$. A necessary and sufficient condition for the existence of an inverse transformation to $A$ (i.e., $A^{-1}$) is $A^{-1}(\mathbf{0}) = \{\mathbf{0}\}$.

Proof Suppose $A^{-1}(\mathbf{0}) = \{\mathbf{0}\}$. Then $A(\mathbf{x}_1) = A(\mathbf{x}_2) \Longleftrightarrow A(\mathbf{x}_1 - \mathbf{x}_2) = \mathbf{0} \Longleftrightarrow \mathbf{x}_1 - \mathbf{x}_2 = \mathbf{0}$; i.e., $\mathbf{x}_1 = \mathbf{x}_2$. This implies that the transformation $A$ is injective. The other way round, suppose that $A$ is injective. If $A^{-1}(\mathbf{0}) \neq \{\mathbf{0}\}$, there should be $\mathbf{b}$ $(\neq \mathbf{0})$ with which $A(\mathbf{b}) = \mathbf{0}$. This, however, contradicts the assumption that $A$ is injective. Then we must have $A^{-1}(\mathbf{0}) = \{\mathbf{0}\}$. Thus, $A^{-1}(\mathbf{0}) = \{\mathbf{0}\} \Longleftrightarrow A$ is injective.

Meanwhile, $A^{-1}(\mathbf{0}) = \{\mathbf{0}\} \Longleftrightarrow \dim A^{-1}(\mathbf{0}) = 0 \Longleftrightarrow \dim A(V^n) = n$ (due to Theorem 11.3); i.e., $A(V^n) = V^n$. Then $A^{-1}(\mathbf{0}) = \{\mathbf{0}\} \Longleftrightarrow A$ is surjective. Combining this with the abovementioned statement, we have $A^{-1}(\mathbf{0}) = \{\mathbf{0}\} \Longleftrightarrow A$ is bijective. This statement is equivalent to saying that an inverse transformation exists.

In the proof of Theorem 11.4, to show that $A^{-1}(\mathbf{0}) = \{\mathbf{0}\} \Longleftrightarrow A$ is surjective, we have used the dimension theorem (Theorem 11.3), for which the relevant vector


space is finite-dimensional (i.e., n-dimensional). In other words, for a finite-dimensional vector space, $A$ being surjective is equivalent to $A$ being injective. To conclude, so far as we are thinking of an endomorphism of a finite-dimensional vector space, if we can show that it is either injective or surjective, the other necessarily follows and, hence, the mapping is bijective. ∎

11.3 Inverse Matrices and Determinants

The existence of the inverse transformation plays a particularly important role in the theory of linear vector spaces. The inverse transformation is a linear transformation. Let $\mathbf{x}_1 = A^{-1}(\mathbf{y}_1)$, $\mathbf{x}_2 = A^{-1}(\mathbf{y}_2)$. Also, we have $A(c_1\mathbf{x}_1 + c_2\mathbf{x}_2) = c_1A(\mathbf{x}_1) + c_2A(\mathbf{x}_2) = c_1\mathbf{y}_1 + c_2\mathbf{y}_2$. Thus, $c_1\mathbf{x}_1 + c_2\mathbf{x}_2 = A^{-1}(c_1\mathbf{y}_1 + c_2\mathbf{y}_2) = c_1A^{-1}(\mathbf{y}_1) + c_2A^{-1}(\mathbf{y}_2)$, showing that $A^{-1}$ is a linear transformation. As already mentioned, a matrix that represents a linear transformation $A$ is uniquely determined with respect to fixed basis vectors. This should be the case with $A^{-1}$ accordingly. We have an important theorem for this.

Theorem 11.5 [1] The necessary and sufficient condition for the matrix $A^{-1}$ that represents the inverse transformation to $A$ to exist is that $\det A \neq 0$ ("det" means a determinant). Here the matrix $A$ represents the linear transformation $A$. The matrix $A^{-1}$ is uniquely determined and given by

$$\left(A^{-1}\right)_{ij} = (-1)^{i+j}(M)_{ji}/(\det A), \tag{11.51}$$

where $(M)_{ij}$ is the minor of $\det A$ corresponding to the element $A_{ij}$.

Proof First, we suppose that the matrix $A^{-1}$ exists so that it satisfies the following relation

$$\sum_{k=1}^{n}\left(A^{-1}\right)_{ik}(A)_{kj} = \delta_{ij} \quad (1 \le i, j \le n). \tag{11.52}$$

On this condition, suppose that $\det A = 0$. From the properties of determinants this implies that one of the columns of $A$ (let it be the $m$-th column) can be expressed as a linear combination of the other columns of $A$ such that

$$A_{km} = \sum_{j \neq m} A_{kj}c_j. \tag{11.53}$$

Putting $i = m$ in (11.52), multiplying by $c_j$, and summing over $j \neq m$, we get


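$$\sum_{k=1}^{n}\left(A^{-1}\right)_{mk}\sum_{j \neq m} A_{kj}c_j = \sum_{j \neq m}\delta_{mj}c_j = 0. \tag{11.54}$$

From (11.52) and (11.53), on the other hand, we obtain

$$\sum_{k=1}^{n}\left(A^{-1}\right)_{mk}\sum_{j \neq m} A_{kj}c_j = \left\{\sum_{k=1}^{n}\left(A^{-1}\right)_{ik}(A)_{km}\right\}_{i=m} = 1. \tag{11.55}$$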

There is an inconsistency between (11.54) and (11.55). The inconsistency resulted from the supposition that the matrix $A^{-1}$ exists. Therefore, we conclude that if $\det A = 0$, $A^{-1}$ does not exist. Taking the contraposition of the above statement, we say that if $A^{-1}$ exists, $\det A \neq 0$. Suppose next that $\det A \neq 0$. In this case, on the basis of the well-established result, a unique $A^{-1}$ exists and it is given by (11.51). This completes the proof. ∎

Summarizing the characteristics of the endomorphism within a finite-dimensional vector space, we have

$$\text{injective} \Longleftrightarrow \text{surjective} \Longleftrightarrow \text{bijective} \Longleftrightarrow \det A \neq 0.$$

Let a matrix $A$ be

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.$$

The determinant of a matrix $A$ is denoted by $\det A$ or by

$$\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}.$$

The determinant is defined as

$$\det A \equiv \sum_{\sigma = \begin{pmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{pmatrix}} \varepsilon(\sigma)\, a_{1i_1}a_{2i_2}\cdots a_{ni_n}, \tag{11.56}$$

where $\sigma$ means a permutation among $1, 2, \ldots, n$ and $\varepsilon(\sigma)$ denotes a sign of $+$ (in the case of even permutations) or $-$ (for odd permutations).

We deal with triangle matrices for future discussion. A triangle matrix is denoted by

$$T = \begin{pmatrix} a_{11} & & * \\ & \ddots & \\ 0 & & a_{nn} \end{pmatrix}, \tag{11.57}$$

where an asterisk ($*$) means that the upper right off-diagonal elements can take any complex numbers (including zero). A large zero shows that all the lower left off-diagonal elements are zero. Its determinant is given by

$$\det T = a_{11}a_{22}\cdots a_{nn}. \tag{11.58}$$
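The definition (11.56) and the result (11.58) can be checked by brute force for a small matrix. The sketch below is illustrative only: the function names `sign` and `det` and the sample matrix `upper` are assumptions, and the sum over all $n!$ permutations is practical only for small $n$.

```python
from itertools import permutations

def sign(perm):
    """+1 for an even permutation, -1 for an odd one (by counting inversions)."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def det(matrix):
    """Determinant as the signed sum over all permutations, as in (11.56)."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for row, col in enumerate(perm):
            term *= matrix[row][col]
        total += term
    return total

# Hypothetical upper triangle matrix; only the identity permutation survives.
upper = [[2, 7, -3],
         [0, 5, 4],
         [0, 0, -1]]
print(det(upper), 2 * 5 * (-1))  # both values are the product of the diagonal
```

Every permutation other than the identity picks at least one element below the diagonal, so its term vanishes, which is the content of (11.58).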

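In fact, focusing on $a_{ni_n}$ we notice that $a_{ni_n}$ does not vanish only if $i_n = n$. Then we get

$$\det T = \sum_{\sigma = \begin{pmatrix} 1 & 2 & \cdots & n-1 \\ i_1 & i_2 & \cdots & i_{n-1} \end{pmatrix}} \varepsilon(\sigma)\, a_{1i_1}a_{2i_2}\cdots a_{n-1\,i_{n-1}}a_{nn}. \tag{11.59}$$

Repeating this process, we finally obtain (11.58).

The endomorphic linear transformation can be described succinctly as

$$A(\mathbf{x}) = \mathbf{y}, \tag{11.60}$$

where we have vectors such that $\mathbf{x} = \sum_{i=1}^{n} x_i\mathbf{e}_i$ and $\mathbf{y} = \sum_{i=1}^{n} y_i\mathbf{e}_i$. In reference to the same set of basis vectors $(\mathbf{e}_1 \cdots \mathbf{e}_n)$ and using a matrix representation, we have

$$A(\mathbf{x}) = (\mathbf{e}_1 \cdots \mathbf{e}_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (\mathbf{e}_1 \cdots \mathbf{e}_n)\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}. \tag{11.61}$$

From the unique representation of a vector in reference to the basis vectors, (11.61) is simply expressed as

$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}. \tag{11.62}$$

With a shorthand notation, we have

$$y_i = \sum_{k=1}^{n} a_{ik}x_k \quad (1 \le i \le n). \tag{11.63}$$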

From the above discussion, for the linear transformation $A$ to be bijective, we must have $\det A \neq 0$. In terms of the system of linear equations, we say that for (11.62) to have a unique solution $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ for a given $\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$, we must have $\det A \neq 0$. Conversely, $\det A = 0$ is equivalent to the statement that (11.62) has indefinite solutions or has no solution. As far as the matrix algebra is concerned, (11.61) is symbolically described by omitting the parentheses as

$$A\mathbf{x} = \mathbf{y}. \tag{11.64}$$

However, when the vector transformation is explicitly taken into account, the full representation of (11.61) should be borne in mind. The relations (11.60) and (11.64) can be considered as a set of simultaneous equations. A necessary and sufficient condition for (11.64) to have a unique solution $\mathbf{x}$ for a given $\mathbf{y}$ is $\det A \neq 0$. In that case, the solution $\mathbf{x}$ of (11.64) can be symbolically expressed as

$$\mathbf{x} = A^{-1}\mathbf{y}, \tag{11.65}$$

where $A^{-1}$ represents an inverse matrix of $A$.

Example 11.3 Think of a three-dimensional rotation by $\theta$ in $\mathbb{R}^3$ around the $z$-axis. The relevant transformation matrix is

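$$R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

As $\det R = 1 \neq 0$, the transformation is bijective. This means that for $\forall \mathbf{y} \in \mathbb{R}^3$ there is always a corresponding $\mathbf{x} \in \mathbb{R}^3$. This $\mathbf{x}$ can be found by solving $R\mathbf{x} = \mathbf{y}$; i.e., $\mathbf{x} = R^{-1}\mathbf{y}$. Putting $\mathbf{x} = x\mathbf{e}_1 + y\mathbf{e}_2 + z\mathbf{e}_3$ and $\mathbf{y} = x'\mathbf{e}_1 + y'\mathbf{e}_2 + z'\mathbf{e}_3$, a matrix representation is given by

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = R^{-1}\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}.$$

Thus, $\mathbf{x}$ can be obtained by rotating $\mathbf{y}$ by $-\theta$.

Example 11.4 Think of the following matrix that represents a linear transformation $P$:

$$P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{11.66}$$

This matrix transforms a vector $\mathbf{x} = x\mathbf{e}_1 + y\mathbf{e}_2 + z\mathbf{e}_3$ into $\mathbf{y} = x\mathbf{e}_1 + y\mathbf{e}_2$ as follows: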

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ y \\ 0 \end{pmatrix}. \tag{11.67}$$

Fig. 11.4 Example of an endomorphism $P: \mathbb{R}^3 \to \mathbb{R}^3$

In this example we are thinking of an endomorphism $P: \mathbb{R}^3 \to \mathbb{R}^3$. Geometrically, it can be viewed as in Fig. 11.4. Let us think of (11.67) from the point of view of solving a system of linear equations and newly consider the next equation. In other words, we are thinking of finding $x$, $y$, and $z$ with given $a$, $b$, and $c$ in (11.68).

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}. \tag{11.68}$$

If $c = 0$, we can readily find a solution of $x = a$, $y = b$, but $z$ can be any (complex) number; we thus have indefinite solutions. If $c \neq 0$, we have no solution. The former situation reflects the fact that the transformation represented by $P$ is not injective. Meanwhile, the latter reflects the fact that the transformation is not surjective. Remember that as $\det P = 0$, the transformation is neither injective nor surjective.
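The cofactor formula (11.51) and the two examples can be tried out numerically. This is a minimal sketch under stated assumptions: the helper names `minor`, `det` and `inverse` and the angle `theta = 0.3` are illustrative choices, and the exact test `d == 0` is meaningful only for matrices with exact (integer) entries such as $P$.

```python
import math

def minor(m, i, j):
    """Matrix m with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def inverse(m):
    """Inverse from (11.51): (A^-1)_ij = (-1)^(i+j) M_ji / det A, or None if det A = 0."""
    d = det(m)
    if d == 0:
        return None
    n = len(m)
    return [[(-1) ** (i + j) * det(minor(m, j, i)) / d for j in range(n)]
            for i in range(n)]

theta = 0.3  # arbitrary rotation angle in radians (assumption)
c, s = math.cos(theta), math.sin(theta)
R = [[c, -s, 0], [s, c, 0], [0, 0, 1]]   # rotation of Example 11.3
P = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]    # projection of Example 11.4

R_inv = inverse(R)   # numerically the rotation by -theta
P_inv = inverse(P)   # None, because det P = 0
print(R_inv)
print(P_inv)
```

For the rotation the formula returns the rotation by $-\theta$ up to rounding, whereas for the projection no inverse exists, in line with the indefinite or missing solutions of (11.68).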

11.4 Basis Vectors and Their Transformations

In the previous sections we showed that a vector is uniquely represented as a column vector in reference to a set of fixed basis vectors. The representation, however, will be changed under a different set of basis vectors. First let us think of a linear transformation of a set of basis vectors $\mathbf{e}_1, \mathbf{e}_2, \cdots$, and $\mathbf{e}_n$. The transformation matrix $A$ representing a linear transformation $A$ is defined as follows:

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.$$

Notice here that we often denote both a linear transformation and its corresponding matrix by the same character. After the transformation, suppose that the resulting vectors are given by $\mathbf{e}'_1, \mathbf{e}'_2, \cdots$, and $\mathbf{e}'_n$. This is explicitly described as

$$(\mathbf{e}_1 \cdots \mathbf{e}_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} = (\mathbf{e}'_1 \cdots \mathbf{e}'_n). \tag{11.69}$$

With a shorthand notation, we have

$$\mathbf{e}'_i = \sum_{k=1}^{n} a_{ki}\mathbf{e}_k \quad (1 \le i \le n). \tag{11.70}$$

Care should be taken not to confuse (11.70) with (11.63). Here, a set of vectors $\mathbf{e}'_1, \mathbf{e}'_2, \cdots$, and $\mathbf{e}'_n$ may or may not be linearly independent. Let us operate both sides of (11.70) from the left on $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and equate both sides to zero. That is,

$$(\mathbf{e}_1 \cdots \mathbf{e}_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (\mathbf{e}'_1 \cdots \mathbf{e}'_n)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = 0.$$

Since $\mathbf{e}_1, \mathbf{e}_2, \cdots$, and $\mathbf{e}_n$ are the basis vectors, we get

$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = 0. \tag{11.71}$$

Meanwhile, we must have $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = 0$ so that $\mathbf{e}'_1, \mathbf{e}'_2, \cdots$, and $\mathbf{e}'_n$ can be linearly independent (i.e., so as to be a set of basis vectors). But this means that (11.71) has such a unique (and trivial) solution and, hence, $\det A \neq 0$. If conversely $\det A \neq 0$, (11.71) has a unique trivial solution and $\mathbf{e}'_1, \mathbf{e}'_2, \cdots$, and $\mathbf{e}'_n$ are linearly independent. Thus, a necessary and sufficient condition for $\mathbf{e}'_1, \mathbf{e}'_2, \cdots$, and $\mathbf{e}'_n$ to be a set of basis vectors is $\det A \neq 0$. If $\det A = 0$, (11.71) has indefinite solutions (including a trivial solution) and $\mathbf{e}'_1, \mathbf{e}'_2, \cdots$, and $\mathbf{e}'_n$ are linearly dependent, and vice versa. In case $\det A \neq 0$, an inverse matrix $A^{-1}$ exists, and so we have

$$(\mathbf{e}_1 \cdots \mathbf{e}_n) = (\mathbf{e}'_1 \cdots \mathbf{e}'_n)A^{-1}. \tag{11.72}$$
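The criterion $\det A \neq 0$ for the transformed vectors to form a basis can be sketched in a few lines. The code below is an illustration under assumptions: the names `new_basis` and `is_basis` and both sample matrices are hypothetical, and each new vector $\mathbf{e}'_i$ is stored by its coordinates in the old basis, i.e., as the $i$-th column of $A$ according to (11.70).

```python
def minor(m, i, j):
    """Matrix m with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def new_basis(a):
    """Coordinates of e'_i = sum_k a[k][i] e_k in the old basis: the columns of a."""
    n = len(a)
    return [[a[k][i] for k in range(n)] for i in range(n)]

def is_basis(a):
    """The transformed vectors are linearly independent exactly when det a != 0."""
    return det(a) != 0

# Hypothetical transformation matrices (not taken from the text).
independent = [[1, 0, -1],
               [0, 1, -1],
               [1, 0, 1]]
dependent = [[1, 0, 1],   # third column = first column + second column
             [0, 1, 1],
             [1, 0, 1]]
print(new_basis(independent), is_basis(independent))
print(new_basis(dependent), is_basis(dependent))
```

In the second matrix the third new vector is the sum of the first two, so the determinant vanishes and the set cannot serve as a basis.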

In the previous steps we saw how the linear transformation (and the corresponding matrix representation) converts a set of basis vectors $(\mathbf{e}_1 \cdots \mathbf{e}_n)$ to another set of basis vectors $(\mathbf{e}'_1 \cdots \mathbf{e}'_n)$. Is it then possible to find a suitable transformation between two sets of arbitrarily chosen basis vectors? The answer is yes. This is because any vector can be expressed uniquely by a linear combination of any set of basis vectors. A whole array of such linear combinations uniquely defines a transformation matrix between the two sets of basis vectors as expressed in (11.69) and (11.72). The matrix has a nonzero determinant and has an inverse matrix. The concept of the transformation between basis vectors is important and very often used in various fields of natural science.

Example 11.5 We revisit Example 11.2. The relation (11.49) tells us that the basis vectors of $A(V^4)$ and those of $\mathrm{Ker}\,A$ span $V^4$ in total. Therefore, in light of the above argument, there should be a linear transformation $R$ between the two sets of vectors, i.e., $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \mathbf{e}_4$ and $\mathbf{e}_1 + \mathbf{e}_3$, $\mathbf{e}_2 + \mathbf{e}_4$, $-\mathbf{e}_1 - \mathbf{e}_2 + \mathbf{e}_3$, $\mathbf{e}_4$. Moreover, the matrix $R$ associated with the linear transformation must be non-singular (i.e., $\det R \neq 0$). In fact, we find that $R$ is expressed as

$$R = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}.$$

This is because we have the following relation between the two sets of basis vectors:

$$(\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \mathbf{e}_4)\begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix} = (\mathbf{e}_1 + \mathbf{e}_3,\ \mathbf{e}_2 + \mathbf{e}_4,\ -\mathbf{e}_1 - \mathbf{e}_2 + \mathbf{e}_3,\ \mathbf{e}_4).$$

We have $\det R = 2 \neq 0$ as expected.

Next, let us consider successive linear transformations of vectors. Again we assume that the transformations are endomorphisms: $V^n \to V^n$. We have to take into account transformations of basis vectors along with the targeted vectors. First we choose a transformation by a non-singular matrix (having a nonzero determinant) for the subsequent transformation to have a unique matrix representation (vide supra). The vector transformation by $P$ is expressed as


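$$P(\mathbf{x}) = (\mathbf{e}_1 \cdots \mathbf{e}_n)\begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \tag{11.73}$$

where the non-singular matrix $P$ represents the transformation $P$. Notice here that the transformation and its matrix are represented by the same $P$. As mentioned in Sect. 11.2, the matrix $P$ can be operated either from the right on the basis vectors or from the left on the column vector. We explicitly write

$$P(\mathbf{x}) = \left[(\mathbf{e}_1 \cdots \mathbf{e}_n)\begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix}\right]\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (\mathbf{e}'_1 \cdots \mathbf{e}'_n)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \tag{11.74}$$

where $(\mathbf{e}'_1 \cdots \mathbf{e}'_n) = (\mathbf{e}_1 \cdots \mathbf{e}_n)P$ [here $P$ is the non-singular matrix defined in (11.73)]. Alternatively, we have

$$P(\mathbf{x}) = (\mathbf{e}_1 \cdots \mathbf{e}_n)\left[\begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}\right] = (\mathbf{e}_1 \cdots \mathbf{e}_n)\begin{pmatrix} x'_1 \\ \vdots \\ x'_n \end{pmatrix}, \tag{11.75}$$

where

$$\begin{pmatrix} x'_1 \\ \vdots \\ x'_n \end{pmatrix} = \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{11.76}$$

Equation (11.75) gives a column vector representation regarding the vector that has been obtained by the transformation $P$ and is viewed in reference to the basis vectors $(\mathbf{e}_1 \cdots \mathbf{e}_n)$. Combining (11.74) and (11.75), we get

$$P(\mathbf{x}) = (\mathbf{e}'_1 \cdots \mathbf{e}'_n)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (\mathbf{e}_1 \cdots \mathbf{e}_n)\begin{pmatrix} x'_1 \\ \vdots \\ x'_n \end{pmatrix}. \tag{11.77}$$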


We further make another linear transformation $A: V^n \to V^n$. In this case a corresponding matrix $A$ may be non-singular (i.e., $\det A \neq 0$) or singular ($\det A = 0$). We have to distinguish the matrix representations of the two cases. This is because the matrix representations are uniquely defined in reference to an individual set of basis vectors; see (11.37) and (11.43). Let us denote the matrices by $A_O$ and $A'$ with respect to the basis vectors $(\mathbf{e}_1 \cdots \mathbf{e}_n)$ and $(\mathbf{e}'_1 \cdots \mathbf{e}'_n)$, respectively. Then, $A[P(\mathbf{x})]$ can be described in two different ways as follows:

$$A[P(\mathbf{x})] = (\mathbf{e}'_1 \cdots \mathbf{e}'_n)A'\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (\mathbf{e}_1 \cdots \mathbf{e}_n)A_O\begin{pmatrix} x'_1 \\ \vdots \\ x'_n \end{pmatrix}. \tag{11.78}$$

This can be rewritten in reference to a linearly independent set of vectors $\mathbf{e}_1, \cdots, \mathbf{e}_n$ as

$$A[P(\mathbf{x})] = \left[(\mathbf{e}_1 \cdots \mathbf{e}_n)PA'\right]\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (\mathbf{e}_1 \cdots \mathbf{e}_n)\left[A_OP\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}\right]. \tag{11.79}$$

As (11.79) is described for a vector $\mathbf{x} = \sum_{i=1}^{n} x_i\mathbf{e}_i$ arbitrarily chosen in $V^n$, we get

$$PA' = A_OP. \tag{11.80}$$

Since $P$ is non-singular, we finally obtain

$$A' = P^{-1}A_OP. \tag{11.81}$$
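Relations (11.80) and (11.81) can be verified for a small case with exact arithmetic. The sketch below is illustrative only: the helper names `matmul` and `inverse_2x2` and the matrices `A_O` and `P` are assumptions, not quantities taken from the text.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def inverse_2x2(m):
    """Exact inverse of a (2, 2) matrix with nonzero determinant."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("singular matrix has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical matrices: A_O is the representation in the old basis,
# P is the non-singular matrix that generates the new basis.
A_O = [[1, 2], [3, 4]]
P = [[1, 1], [0, 2]]

A_prime = matmul(inverse_2x2(P), matmul(A_O, P))  # (11.81)
print(A_prime)
print(matmul(P, A_prime) == matmul(A_O, P))       # (11.80)
```

The second printed value confirms $PA' = A_OP$ for this choice; the entries of `A_prime` are kept as fractions so that the comparison is exact.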

We can see (11.79) from the point of view of successive linear transformations. When the subsequent operation is viewed in reference to the basis vectors $\mathbf{e}'_1, \cdots, \mathbf{e}'_n$ newly reached by the precedent transformation, we make it a rule to write the relevant subsequent operator $A'$ from the right. In the case where the subsequent operation is viewed in reference to the original basis vectors, on the other hand, we write the subsequent operator $A_O$ from the left. Further discussion and examples can be seen in Part IV.

We see (11.81) in a different manner. Suppose we have

$$A(\mathbf{x}) = (\mathbf{e}_1 \cdots \mathbf{e}_n)A_O\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{11.82}$$

Note that since the transformation $A$ has been performed in reference to the basis vectors $(\mathbf{e}_1 \cdots \mathbf{e}_n)$, $A_O$ should be used for the matrix representation. This is rewritten as


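$$A(\mathbf{x}) = (\mathbf{e}_1 \cdots \mathbf{e}_n)PP^{-1}A_OPP^{-1}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{11.83}$$

Meanwhile, any vector $\mathbf{x}$ in $V^n$ can be written as

$$\mathbf{x} = (\mathbf{e}_1 \cdots \mathbf{e}_n)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (\mathbf{e}_1 \cdots \mathbf{e}_n)PP^{-1}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (\mathbf{e}'_1 \cdots \mathbf{e}'_n)P^{-1}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (\mathbf{e}'_1 \cdots \mathbf{e}'_n)\begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}. \tag{11.84}$$

In (11.84) we put

$$(\mathbf{e}_1 \cdots \mathbf{e}_n)P = (\mathbf{e}'_1 \cdots \mathbf{e}'_n), \quad P^{-1}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}. \tag{11.85}$$

Equation (11.84) gives a column vector representation regarding the same vector $\mathbf{x}$ viewed in reference to the basis set of vectors $\mathbf{e}_1, \cdots, \mathbf{e}_n$ or $\mathbf{e}'_1, \cdots, \mathbf{e}'_n$. Equation (11.85) should not be confused with (11.76). That is, (11.76) relates the two coordinates $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and $\begin{pmatrix} x'_1 \\ \vdots \\ x'_n \end{pmatrix}$ of different vectors before and after the transformation viewed in reference to the same set of basis vectors. The relation (11.85), on the other hand, relates two coordinates $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and $\begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}$ of the same vector viewed in reference to different sets of basis vectors. Thus, from (11.83) we have

$$A(\mathbf{x}) = (\mathbf{e}'_1 \cdots \mathbf{e}'_n)P^{-1}A_OP\begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}. \tag{11.86}$$

Meanwhile, viewing the transformation $A$ in reference to the basis vectors $(\mathbf{e}'_1 \cdots \mathbf{e}'_n)$, we have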


$$A(\mathbf{x}) = (\mathbf{e}'_1 \cdots \mathbf{e}'_n)A'\begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}. \tag{11.87}$$

Equating (11.86) and (11.87),

$$A' = P^{-1}A_OP. \tag{11.88}$$

Thus, (11.81) is recovered. The relations expressed by (11.81) and (11.88) are called a similarity transformation on $A$. The matrices $A_O$ and $A'$ are said to be similar to each other. As mentioned earlier, if $A_O$ (and hence $A'$) is non-singular, the linear transformation $A$ produces a set of basis vectors other than $\mathbf{e}'_1, \cdots, \mathbf{e}'_n$, say $\mathbf{e}''_1, \cdots, \mathbf{e}''_n$. We write this symbolically as

$$(\mathbf{e}_1 \cdots \mathbf{e}_n)PA = (\mathbf{e}'_1 \cdots \mathbf{e}'_n)A = (\mathbf{e}''_1 \cdots \mathbf{e}''_n). \tag{11.89}$$

Therefore, such $A$ defines successive transformations of the basis vectors in conjunction with $P$ defined in (11.73). The successive transformations and resulting basis vectors supply us with important applications in the field of group theory. These topics will be dealt with in Part IV.

Reference 1. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York

Chapter 12 Canonical Forms of Matrices

In Sect. 11.4 we saw that the transformation matrices are altered depending on the basis vectors we choose. Then the following question arises. Can we convert a (transformation) matrix to as simple a form as possible by similarity transformation(s)? In Sect. 11.4 we have also shown that if we have two sets of basis vectors in a linear vector space $V^n$ we can always find a non-singular transformation matrix between the two. In conjunction with the transformation of the basis vectors, the matrix undergoes a similarity transformation. It is our task in this chapter to find a simple form or a specific form (i.e., canonical form) of a matrix as a result of the similarity transformation. For this purpose, we should first find the eigenvalue(s) and corresponding eigenvector(s) of the matrix. Depending upon the nature of matrices, we get various canonical forms of matrices such as a triangle matrix and a diagonal matrix. Regarding any form of matrices, we can treat these matrices under a unified form called the Jordan canonical form.

12.1 Eigenvalues and Eigenvectors

An eigenvalue problem is one of the important subjects of the theory of linear vector spaces. Let $A$ be a linear transformation on $V^n$. The resulting matrix gets to several different kinds of canonical forms of matrices. A typical example is a diagonal matrix. To reach a satisfactory answer, we start with a so-called eigenvalue problem. Suppose that after the transformation of $\mathbf{x}$ we have

$$A(\mathbf{x}) = \alpha\mathbf{x}, \tag{12.1}$$

where $\alpha$ is a certain (complex) number. Then we say that $\alpha$ is an eigenvalue and that $\mathbf{x}$ is an eigenvector that corresponds to the eigenvalue $\alpha$. Using the notation of (11.37) of Sect. 11.2, we have

$$A(\mathbf{x}) = (\mathbf{e}_1 \cdots \mathbf{e}_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \alpha(\mathbf{e}_1 \cdots \mathbf{e}_n)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$

© Springer Nature Singapore Pte Ltd. 2020 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3_12

From the linear independence of $\mathbf{e}_1, \mathbf{e}_2, \cdots$, and $\mathbf{e}_n$, we simply write

$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \alpha\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$

If we identify $\mathbf{x}$ with $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ at fixed basis vectors $(\mathbf{e}_1 \cdots \mathbf{e}_n)$, we may naturally rewrite (12.1) as

$$A\mathbf{x} = \alpha\mathbf{x}. \tag{12.2}$$

If $\mathbf{x}_1$ and $\mathbf{x}_2$ belong to the eigenvalue $\alpha$, so do $\mathbf{x}_1 + \mathbf{x}_2$ and $c\mathbf{x}_1$ ($c$ is an appropriate complex number). Therefore, all the eigenvectors belonging to the eigenvalue $\alpha$ along with 0 (a zero vector) form a subspace of $A$ (within $V^n$) corresponding to the eigenvalue $\alpha$. Strictly speaking, we should use terminologies such as a "proper" (or ordinary) eigenvalue, eigenvector, and eigenspace to distinguish them from a "generalized" eigenvalue, eigenvector, and eigenspace. We will return to this point later. Further rewriting (12.2) we have

$$(A - \alpha E)\mathbf{x} = 0, \tag{12.3}$$

where $E$ is an $(n, n)$ unit matrix. Equations (12.2) and (12.3) are said to be an eigenvalue equation (or eigenequation). In (12.2) or (12.3) $\mathbf{x} = 0$ always holds (as a trivial solution). Consequently, for $\mathbf{x} \neq 0$ to be a solution we must have

$$|A - \alpha E| = 0. \tag{12.4}$$

In (12.4), $|A - \alpha E|$ stands for $\det(A - \alpha E)$. Now let us define the following polynomial:

$$f_A(x) = |xE - A| = \begin{vmatrix} x - a_{11} & \cdots & -a_{1n} \\ \vdots & \ddots & \vdots \\ -a_{n1} & \cdots & x - a_{nn} \end{vmatrix}. \tag{12.5}$$


A necessary and sufficient condition for $\alpha$ to be an eigenvalue is that $\alpha$ is a root of $f_A(x) = 0$. The function $f_A(x)$ is said to be a characteristic polynomial and we call $f_A(x) = 0$ a characteristic equation. This is an $n$-th order polynomial. Putting

$$f_A(x) = x^n + a_1x^{n-1} + \cdots + a_n, \tag{12.6}$$

we have

$$a_1 = -(a_{11} + a_{22} + \cdots + a_{nn}) \equiv -\mathrm{Tr}A, \tag{12.7}$$

where Tr stands for "trace," that is, a summation of diagonal elements. Moreover,

$$a_n = (-1)^n|A|. \tag{12.8}$$

The characteristic equation $f_A(x) = 0$ has $n$ roots including possible multiple roots. Let those roots be $\alpha_1, \cdots, \alpha_n$ (some of them may be identical). Then we have

$$f_A(x) = \prod_{i=1}^{n}(x - \alpha_i). \tag{12.9}$$

Furthermore, according to relations between roots and coefficients we get

$$\alpha_1 + \cdots + \alpha_n = -a_1 = \mathrm{Tr}A, \tag{12.10}$$

$$\alpha_1\cdots\alpha_n = (-1)^na_n = |A|. \tag{12.11}$$
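For a $(2, 2)$ matrix the relations (12.6) to (12.11) can be worked out directly, because the characteristic equation is quadratic. The following sketch is illustrative: the function names `char_poly_2x2` and `eigenvalues_2x2` and the sample matrix are assumptions, and `cmath.sqrt` is used so that complex roots are handled as well.

```python
import cmath

def char_poly_2x2(m):
    """Coefficients (a1, a2) of f(x) = x^2 + a1 x + a2; a1 = -Tr A and a2 = |A|."""
    (a, b), (c, d) = m
    return -(a + d), a * d - b * c

def eigenvalues_2x2(m):
    """Both roots of the characteristic equation, from the quadratic formula."""
    a1, a2 = char_poly_2x2(m)
    root = cmath.sqrt(a1 * a1 - 4 * a2)
    return (-a1 + root) / 2, (-a1 - root) / 2

A = [[4, 1], [2, 3]]  # hypothetical sample matrix: Tr A = 7, |A| = 10
alpha1, alpha2 = eigenvalues_2x2(A)
print(alpha1 + alpha2)  # sum of the eigenvalues, compare with Tr A, see (12.10)
print(alpha1 * alpha2)  # product of the eigenvalues, compare with |A|, see (12.11)
```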

The characteristic polynomial $f_A(x)$ is invariant under a similarity transformation. In fact,

$$f_{P^{-1}AP}(x) = \left|xE - P^{-1}AP\right| = \left|P^{-1}(xE - A)P\right| = |P|^{-1}\,|xE - A|\,|P| = |xE - A| = f_A(x). \tag{12.12}$$

This leads to invariance of the trace under a similarity transformation. That is,

$$\mathrm{Tr}\left(P^{-1}AP\right) = \mathrm{Tr}A. \tag{12.13}$$
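The invariance expressed by (12.12) and (12.13) can be checked exactly for a sample pair of matrices. This is a sketch under assumptions: the helper names and the matrices `A` and `P` are illustrative. Since two monic polynomials of degree $n$ that agree at $n + 1$ points are identical, evaluating $|xE - A|$ at four values of $x$ suffices for $(3, 3)$ matrices.

```python
from fractions import Fraction

def minor(m, i, j):
    """Matrix m with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def inverse(m):
    """Exact inverse from the cofactor formula (11.51)."""
    d = Fraction(det(m))
    if d == 0:
        raise ValueError("singular matrix has no inverse")
    n = len(m)
    return [[(-1) ** (i + j) * det(minor(m, j, i)) / d for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def char_poly_at(m, x):
    """Value of the characteristic polynomial |xE - m| at the number x."""
    n = len(m)
    return det([[(x if i == j else 0) - m[i][j] for j in range(n)] for i in range(n)])

def trace(m):
    """Sum of the diagonal elements."""
    return sum(m[i][i] for i in range(len(m)))

# Hypothetical matrices; det P = 3, so P is non-singular.
A = [[1, 2, 0], [0, 3, 1], [4, 0, 1]]
P = [[1, 1, 0], [0, 1, 1], [1, 0, 2]]
B = matmul(inverse(P), matmul(A, P))  # similarity transformation of A

same_polynomial = all(char_poly_at(B, x) == char_poly_at(A, x) for x in range(4))
print(same_polynomial, trace(A), trace(B))
```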

Let us think of the following triangle matrix:

$$T = \begin{pmatrix} a_{11} & & * \\ & \ddots & \\ 0 & & a_{nn} \end{pmatrix}. \tag{12.14}$$

The matrix of this type is thought to be one of the canonical forms of matrices. Its characteristic polynomial $f_T(x)$ is

$$f_T(x) = |xE - T| = \begin{vmatrix} x - a_{11} & & * \\ & \ddots & \\ 0 & & x - a_{nn} \end{vmatrix}. \tag{12.15}$$

Therefore, we get

$$f_T(x) = \prod_{i=1}^{n}(x - a_{ii}), \tag{12.16}$$

where we used (11.58). From (12.14) and (12.16), we find that the eigenvalues of a triangle matrix are given by its diagonal elements accordingly. Our immediate task will be to examine whether and how a given matrix is converted to a triangle matrix through a similarity transformation. The following theorem is important.

Theorem 12.1 Every $(n, n)$ square matrix can be converted to a triangle matrix by similarity transformation [1].

Proof A triangle matrix is either an "upper" triangle matrix [for which all the lower left off-diagonal elements are zero; see (12.14)] or a "lower" triangle matrix (for which all the upper right off-diagonal elements are zero). In the present case we show the proof for the upper triangle matrix. Regarding the lower triangle matrix the theorem is proven in a similar manner.

The proof is by mathematical induction. First we show that the theorem is true of a $(2, 2)$ matrix. Suppose that one of the eigenvalues of $A_2$ is $\alpha_1$ and that its corresponding eigenvector is $\mathbf{x}_1$. Then we have

$$A_2\mathbf{x}_1 = \alpha_1\mathbf{x}_1, \tag{12.17}$$

where we assume that $\mathbf{x}_1$ represents a column vector. Let a non-singular matrix be

$$P_1 = (\mathbf{x}_1\ \mathbf{p}_1), \tag{12.18}$$

where $\mathbf{p}_1$ is another column vector chosen in such a way that $\mathbf{x}_1$ and $\mathbf{p}_1$ are linearly independent so that $P_1$ can be a non-singular matrix. Then, we have

$$P_1^{-1}A_2P_1 = \left(P_1^{-1}A_2\mathbf{x}_1\ \ P_1^{-1}A_2\mathbf{p}_1\right). \tag{12.19}$$

Meanwhile,


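$$\mathbf{x}_1 = (\mathbf{x}_1\ \mathbf{p}_1)\begin{pmatrix} 1 \\ 0 \end{pmatrix} = P_1\begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$

Hence, we have

$$P_1^{-1}A_2\mathbf{x}_1 = \alpha_1P_1^{-1}\mathbf{x}_1 = \alpha_1P_1^{-1}P_1\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \alpha_1\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ 0 \end{pmatrix}. \tag{12.20}$$

Thus (12.19) can be rewritten as

$$P_1^{-1}A_2P_1 = \begin{pmatrix} \alpha_1 & * \\ 0 & * \end{pmatrix}. \tag{12.21}$$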

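This shows that Theorem 12.1 is true of a $(2, 2)$ matrix $A_2$.

Now let us show that Theorem 12.1 holds in a general case, i.e., for an $(n, n)$ square matrix $A_n$. Let $\alpha_n$ be one of the eigenvalues of $A_n$. On the basis of the argument of the $(2, 2)$ matrix case, suppose that after a suitable similarity transformation by a non-singular matrix $\tilde{P}$ we have

$$\tilde{A}_n = \tilde{P}^{-1}A_n\tilde{P}. \tag{12.22}$$

Then, we can describe $\tilde{A}_n$ as

$$\tilde{A}_n = \begin{pmatrix} \alpha_n & x_1 & \cdots & x_{n-1} \\ 0 & & & \\ \vdots & & A_{n-1} & \\ 0 & & & \end{pmatrix}. \tag{12.23}$$

In (12.23), $\alpha_n$ is one of the eigenvalues of $A_n$. To show that (12.23) is valid, we use an argument similar to that of the case of a $(2, 2)$ matrix. That is, we set $\tilde{P}$ such that

$$\tilde{P} = (\mathbf{a}_n\ \mathbf{p}_1\ \mathbf{p}_2 \cdots \mathbf{p}_{n-1}),$$

where $\mathbf{a}_n$ is an eigenvector corresponding to $\alpha_n$ and $\tilde{P}$ is a non-singular matrix formed by $n$ linearly independent column vectors $\mathbf{a}_n, \mathbf{p}_1, \mathbf{p}_2, \cdots$, and $\mathbf{p}_{n-1}$. Then we have

$$\tilde{A}_n = \tilde{P}^{-1}A_n\tilde{P} = \left(\tilde{P}^{-1}A_n\mathbf{a}_n\ \ \tilde{P}^{-1}A_n\mathbf{p}_1\ \ \tilde{P}^{-1}A_n\mathbf{p}_2 \cdots \tilde{P}^{-1}A_n\mathbf{p}_{n-1}\right). \tag{12.24}$$

The vector $\mathbf{a}_n$ can be expressed as

$$\mathbf{a}_n = (\mathbf{a}_n\ \mathbf{p}_1\ \mathbf{p}_2 \cdots \mathbf{p}_{n-1})\begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \tilde{P}\begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

Therefore, we have

$$\tilde{P}^{-1}A_n\mathbf{a}_n = \tilde{P}^{-1}\alpha_n\mathbf{a}_n = \tilde{P}^{-1}\alpha_n\tilde{P}\begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \alpha_n\begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_n \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

Thus, from (12.24) we see that one can express a matrix form of $\tilde{A}_n$ as (12.23).

By the hypothesis of the mathematical induction, we assume that there exist an $(n-1, n-1)$ non-singular square matrix $P_{n-1}$ and an upper triangle matrix $\Delta_{n-1}$ such that

$$P_{n-1}^{-1}A_{n-1}P_{n-1} = \Delta_{n-1}. \tag{12.25}$$

Let us define the following matrix

$$P_n = \begin{pmatrix} d & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & P_{n-1} & \\ 0 & & & \end{pmatrix},$$

where $d \neq 0$. The matrix $P_n$ is an $(n, n)$ non-singular square matrix; remember that $\det P_n = d\,(\det P_{n-1}) \neq 0$. Operating $P_n$ on $\tilde{A}_n$ from the right, we have, in block form,

$$\begin{pmatrix} \alpha_n & \mathbf{x}_{n-1}^T \\ 0 & A_{n-1} \end{pmatrix}\begin{pmatrix} d & 0 \\ 0 & P_{n-1} \end{pmatrix} = \begin{pmatrix} \alpha_nd & \mathbf{x}_{n-1}^TP_{n-1} \\ 0 & A_{n-1}P_{n-1} \end{pmatrix}, \tag{12.26}$$

where $\mathbf{x}_{n-1}^T$ is a transpose of a column vector $\begin{pmatrix} x_1 \\ \vdots \\ x_{n-1} \end{pmatrix}$. Therefore, $\mathbf{x}_{n-1}^TP_{n-1}$ is a $(1, n-1)$ matrix (i.e., a row vector). Meanwhile, we have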

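$$\begin{pmatrix} d & 0 \\ 0 & P_{n-1} \end{pmatrix}\begin{pmatrix} \alpha_n & \mathbf{x}_{n-1}^TP_{n-1}/d \\ 0 & \Delta_{n-1} \end{pmatrix} = \begin{pmatrix} \alpha_nd & \mathbf{x}_{n-1}^TP_{n-1} \\ 0 & P_{n-1}\Delta_{n-1} \end{pmatrix}. \tag{12.27}$$

From the assumption of (12.25), we have

$$\text{LHS of (12.26)} = \text{LHS of (12.27)}. \tag{12.28}$$

That is,

$$\tilde{A}_nP_n = P_n\Delta_n. \tag{12.29}$$

In (12.29) we define $\Delta_n$ such that

$$\Delta_n = \begin{pmatrix} \alpha_n & \mathbf{x}_{n-1}^TP_{n-1}/d \\ 0 & \Delta_{n-1} \end{pmatrix}, \tag{12.30}$$

which has appeared in the LHS of (12.27). As $\Delta_{n-1}$ is a triangle matrix from the assumption, $\Delta_n$ is a triangle matrix as well. Combining (12.22) and (12.29), we finally get

$$\Delta_n = \left(\tilde{P}P_n\right)^{-1}A_n\tilde{P}P_n. \tag{12.31}$$

Notice that $\tilde{P}P_n$ is a non-singular matrix, and so $\tilde{P}P_nP_n^{-1}\tilde{P}^{-1} = E$. Hence, $P_n^{-1}\tilde{P}^{-1} = \left(\tilde{P}P_n\right)^{-1}$. The equation obviously shows that $A_n$ has been converted to a triangle matrix $\Delta_n$. This completes the proof. ∎

Equation (12.31) implies that eigenvalues are disposed on the diagonal positions of a triangle matrix. Triangle matrices can further undergo a similarity transformation.

Example 12.1 Let us think of the following triangle matrix $A$:

$$A = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}. \tag{12.32}$$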

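Eigenvalues of $A$ are 2 and 1. Remember that diagonal elements of a triangle matrix give eigenvalues. According to (12.20), a vector $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ can be chosen for an eigenvector (as a column vector representation) corresponding to the eigenvalue 2. Another eigenvector can be chosen to be $\begin{pmatrix} -1 \\ 1 \end{pmatrix}$. This is because for an eigenvalue 1, we obtain

$$\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}\mathbf{x} = 0$$

as an eigenvalue equation $(A - E)\mathbf{x} = 0$. Therefore, with an eigenvector $\begin{pmatrix} c_1 \\ c_2 \end{pmatrix}$ corresponding to the eigenvalue 1, we get $c_1 + c_2 = 0$. Therefore, we have $\begin{pmatrix} -1 \\ 1 \end{pmatrix}$ as a simple form of the eigenvector. Hence, putting $P = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}$, the similarity transformation is carried out such that

$$P^{-1}AP = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}. \tag{12.33}$$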

This is a simple example of matrix diagonalization. Regarding the eigenvalue/eigenvector problems, we have another important theorem.

Theorem 12.2 Eigenvectors corresponding to different eigenvalues of $A$ are linearly independent.

Proof We prove the theorem by mathematical induction. Let $\alpha_1$ and $\alpha_2$ be two different eigenvalues of a matrix $A$ and let $\mathbf{a}_1$ and $\mathbf{a}_2$ be eigenvectors corresponding to $\alpha_1$ and $\alpha_2$, respectively. Let us think of the following equation:

$$c_1\mathbf{a}_1 + c_2\mathbf{a}_2 = 0. \tag{12.34}$$

Suppose that $\mathbf{a}_1$ and $\mathbf{a}_2$ are linearly dependent. Then, without loss of generality we can put $c_1 \neq 0$. Accordingly, we get

$$\mathbf{a}_1 = -\frac{c_2}{c_1}\mathbf{a}_2. \tag{12.35}$$

Operating $A$ from the left of (12.35) we have

$$\alpha_1\mathbf{a}_1 = -\frac{c_2}{c_1}\alpha_2\mathbf{a}_2 = \alpha_2\mathbf{a}_1. \tag{12.36}$$

With the second equality we have used (12.35). From (12.36) we have

$$(\alpha_1 - \alpha_2)\mathbf{a}_1 = 0. \tag{12.37}$$

As $\alpha_1 \neq \alpha_2$, $\alpha_1 - \alpha_2 \neq 0$. This implies $\mathbf{a}_1 = 0$, in contradiction to the fact that $\mathbf{a}_1$ is an eigenvector. Thus, $\mathbf{a}_1$ and $\mathbf{a}_2$ must be linearly independent.

Next we assume that Theorem 12.2 is true of the case where we have $(n - 1)$ eigenvalues $\alpha_1, \alpha_2, \cdots$, and $\alpha_{n-1}$ that are different from one another and the corresponding eigenvectors $\mathbf{a}_1, \mathbf{a}_2, \cdots$, and $\mathbf{a}_{n-1}$ are linearly independent. Let us think of the following equation:

$$c_1\mathbf{a}_1 + c_2\mathbf{a}_2 + \cdots + c_{n-1}\mathbf{a}_{n-1} + c_n\mathbf{a}_n = 0, \tag{12.38}$$

where $\mathbf{a}_n$ is an eigenvector corresponding to an eigenvalue $\alpha_n$. Suppose here that $\mathbf{a}_1, \mathbf{a}_2, \cdots$, and $\mathbf{a}_n$ are linearly dependent. If $c_n = 0$, we have

$$c_1\mathbf{a}_1 + c_2\mathbf{a}_2 + \cdots + c_{n-1}\mathbf{a}_{n-1} = 0. \tag{12.39}$$

But, from the linear independence of $\mathbf{a}_1, \mathbf{a}_2, \cdots$, and $\mathbf{a}_{n-1}$ we have $c_1 = c_2 = \cdots = c_{n-1} = 0$. Thus, it follows that all the eigenvectors $\mathbf{a}_1, \mathbf{a}_2, \cdots$, and $\mathbf{a}_n$ are linearly independent. However, this is in contradiction to the assumption. We should therefore have $c_n \neq 0$. Accordingly we get

$$\mathbf{a}_n = -\left(\frac{c_1}{c_n}\mathbf{a}_1 + \frac{c_2}{c_n}\mathbf{a}_2 + \cdots + \frac{c_{n-1}}{c_n}\mathbf{a}_{n-1}\right). \tag{12.40}$$

Operating $A$ from the left of (12.38) again, we have

$$\alpha_n\mathbf{a}_n = -\left(\frac{c_1}{c_n}\alpha_1\mathbf{a}_1 + \frac{c_2}{c_n}\alpha_2\mathbf{a}_2 + \cdots + \frac{c_{n-1}}{c_n}\alpha_{n-1}\mathbf{a}_{n-1}\right). \tag{12.41}$$

Here we think of two cases of (i) $\alpha_n \neq 0$ and (ii) $\alpha_n = 0$.

(i) Case I: $\alpha_n \neq 0$. Multiplying both sides of (12.40) by $\alpha_n$ we have

$$\alpha_n\mathbf{a}_n = -\left(\frac{c_1}{c_n}\alpha_n\mathbf{a}_1 + \frac{c_2}{c_n}\alpha_n\mathbf{a}_2 + \cdots + \frac{c_{n-1}}{c_n}\alpha_n\mathbf{a}_{n-1}\right). \tag{12.42}$$

Subtracting (12.42) from (12.41) we get

$$0 = \frac{c_1}{c_n}(\alpha_n - \alpha_1)\mathbf{a}_1 + \frac{c_2}{c_n}(\alpha_n - \alpha_2)\mathbf{a}_2 + \cdots + \frac{c_{n-1}}{c_n}(\alpha_n - \alpha_{n-1})\mathbf{a}_{n-1}. \tag{12.43}$$

Since we assume that the eigenvalues are different from one another, $\alpha_n \neq \alpha_1, \alpha_n \neq \alpha_2, \cdots, \alpha_n \neq \alpha_{n-1}$. This implies that $c_1 = c_2 = \cdots = c_{n-1} = 0$. From (12.40) we have $\mathbf{a}_n = 0$. This is, however, in contradiction to the fact that $\mathbf{a}_n$ is an eigenvector. This means that our original supposition that $\mathbf{a}_1, \mathbf{a}_2, \cdots$, and $\mathbf{a}_n$ are linearly dependent was wrong. Thus, the eigenvectors $\mathbf{a}_1, \mathbf{a}_2, \cdots$, and $\mathbf{a}_n$ should be linearly independent.

(ii) Case II: $\alpha_n = 0$. Suppose again that $\mathbf{a}_1, \mathbf{a}_2, \cdots$, and $\mathbf{a}_n$ are linearly dependent. Since as before $c_n \neq 0$, we get (12.40) and (12.41) once again. Putting $\alpha_n = 0$ in (12.41) we have

$$0 = -\left(\frac{c_1}{c_n}\alpha_1\mathbf{a}_1 + \frac{c_2}{c_n}\alpha_2\mathbf{a}_2 + \cdots + \frac{c_{n-1}}{c_n}\alpha_{n-1}\mathbf{a}_{n-1}\right). \tag{12.44}$$

Since the eigenvalues are different, we should have $\alpha_1 \neq 0, \alpha_2 \neq 0, \cdots$, and $\alpha_{n-1} \neq 0$. Then, considering that $\mathbf{a}_1, \mathbf{a}_2, \cdots$, and $\mathbf{a}_{n-1}$ are linearly independent, for (12.44) to hold we must have $c_1 = c_2 = \cdots = c_{n-1} = 0$. But, from (12.40) we have $\mathbf{a}_n = 0$, again in contradiction to the fact that $\mathbf{a}_n$ is an eigenvector. Thus, the eigenvectors $\mathbf{a}_1, \mathbf{a}_2, \cdots$, and $\mathbf{a}_n$ should be linearly independent. These complete the proof. ∎
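Example 12.1 and Theorem 12.2 can be checked together with exact arithmetic. The sketch below uses the matrix of (12.32) and the eigenvectors chosen in the example; the helper names `matmul`, `apply` and `inverse_2x2` are assumptions introduced for illustration.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def apply(m, v):
    """Matrix m acting on a column vector v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def inverse_2x2(m):
    """Exact inverse of a (2, 2) matrix with nonzero determinant."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("singular matrix has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1], [0, 1]]                          # matrix of (12.32)
eigenpairs = [(2, [1, 0]), (1, [-1, 1])]      # (eigenvalue, eigenvector)

# Each pair satisfies the eigenvalue equation A x = alpha x.
pairs_ok = all(apply(A, v) == [alpha * c for c in v] for alpha, v in eigenpairs)

# P has the eigenvectors as its columns; det P != 0 means they are independent.
P = [[1, -1], [0, 1]]
det_P = P[0][0] * P[1][1] - P[0][1] * P[1][0]
D = matmul(inverse_2x2(P), matmul(A, P))      # similarity transformation (12.33)
print(pairs_ok, det_P, D)
```

Because the two eigenvalues differ, the eigenvectors are linearly independent, $P$ is non-singular, and the similarity transformation yields the diagonal matrix of (12.33).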

12.2 Eigenspaces and Invariant Subspaces

Equations (12.21), (12.30), and (12.31) show that if we adopt an eigenvector as one of basis vectors, the matrix representation of the linear transformation A in reference to such basis vectors is obtained so that the leftmost column is zero except for the left top corner on which an eigenvalue is positioned. (Note that if the said eigenvector is zero, the leftmost column is a zero vector.) Meanwhile, neither Theorem 12.1 nor Theorem 12.2 tells about multiplicity of eigenvalues. If the eigenvalues have multiplicity, we have to think about different aspects. This is a major issue of this section. Let us start with a discussion of invariant subspaces. Let A be a (n, n) square matrix. If a subspace W in V n is characterized by x 2 W⟹Ax 2 W, W is said to be invariant with respect to A (or simply A-invariant) or an invariant subspace in V n ¼ Span{a1, a2, , an}. Suppose that x is an eigenvector of A and that its corresponding eigenvalue is α. Then, Span{x} is an invariant subspace of V n. It is because A(cx) = cAx = cαx ¼ α(cx) and cx is again an eigenvector belonging to α. Suppose that dimW ¼ m (m n) and that W ¼ Span{a1, a2, , am}. If W is Ainvariant, A causes a linear transformation within W. In that case, expressing A in reference to a1, a2, , am, am + 1, , an, we have

12.2

Eigenspaces and Invariant Subspaces

469

Am ða1 a2 am amþ1 an ÞA ¼ ða1 a2 am amþ1 an Þ 0

,

ð12:45Þ

where Am is a (m, m) square matrix and “zero” denotes a (n m, m) zero matrix. Notice that the transformation A makes the remaining (n m) basis vectors am + 1, am + 2, , and an in V n be converted to a linear combination of a1, a2, , and an. The triangle matrix Δn given in (12.30) and (12.31) is an example to which Am is a (1, 1) matrix (i.e., simply a complex number). Let us examine properties of the A-invariant subspace still further. Let a be any vector in V n and think of following (n + 1) vectors [2]: a, Aa, A2 a, , An a: These vectors are linearly dependent, since there are at most n linearly independent vectors in V n. These vectors span a subspace in V n. Let us call this subspace M; i.e.,

M Span a, Aa, A2 a, , An a : Consider the following equation: c0 a þ c1 Aa þ c2 A2 a þ þ cn An a = 0:

ð12:46Þ

Not all $c_i$ ($0 \le i \le n$) are zero, because the vectors are linearly dependent. Suppose that $c_n \neq 0$. Then, from (12.46) we have

\[
A^n a = -\frac{1}{c_n}\left(c_0 a + c_1 Aa + c_2 A^2 a + \cdots + c_{n-1}A^{n-1}a\right).
\]

Operating $A$ on the above equation from the left, we have

\[
A^{n+1} a = -\frac{1}{c_n}\left(c_0 Aa + c_1 A^2 a + c_2 A^3 a + \cdots + c_{n-1}A^{n}a\right).
\]

Thus, $A^{n+1}a$ is contained in $M$. That is, we have $A^{n+1}a \in \mathrm{Span}\{a, Aa, A^2 a, \cdots, A^n a\}$. Next, suppose that $c_n = 0$. Then, at least one of $c_i$ ($0 \le i \le n-1$) is not zero. Suppose that $c_{n-1} \neq 0$. From (12.46), we have

\[
A^{n-1} a = -\frac{1}{c_{n-1}}\left(c_0 a + c_1 Aa + c_2 A^2 a + \cdots + c_{n-2}A^{n-2}a\right).
\]

Operating $A^2$ on the above equation from the left, we have

\[
A^{n+1} a = -\frac{1}{c_{n-1}}\left(c_0 A^2 a + c_1 A^3 a + c_2 A^4 a + \cdots + c_{n-2}A^{n}a\right).
\]

Again, $A^{n+1}a$ is contained in $M$. Repeating similar procedures, we reach

\[
c_0 a + c_1 Aa = 0.
\]

If $c_1 = 0$, then we must have $c_0 \neq 0$. If so, $a = 0$. This is impossible, however, because we should have $a \neq 0$. Then, we have $c_1 \neq 0$ and, hence,

\[
Aa = -\frac{c_0}{c_1}a.
\]

Operating $A^n$ on the above equation from the left, we have

\[
A^{n+1}a = -\frac{c_0}{c_1}A^n a.
\]

Thus, once again $A^{n+1}a$ is contained in $M$. From the above discussion, we get $AM \subset M$. Further operating $A$, we have $A^2M \subset AM \subset M$, $A^3M \subset A^2M \subset AM \subset M$, and so on. Then we have

\[
a,\ Aa,\ A^2 a,\ \cdots,\ A^n a,\ A^{n+1}a,\ \cdots \in \mathrm{Span}\{a, Aa, A^2 a, \cdots, A^n a\}.
\]

That is, $M$ is an $A$-invariant subspace. We also have

\[
m \equiv \dim M \le \dim V^n = n.
\]

There are $m$ basis vectors in $M$, and so representing $A$ in a matrix form in reference to the $n$ basis vectors of $V^n$ including these $m$ vectors, we have

\[
A = \begin{pmatrix} A_m & * \\ 0 & * \end{pmatrix}.
\tag{12.47}
\]
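The construction of $M$ can be checked numerically. The following is a minimal sketch in plain Python, assuming a hypothetical $(3, 3)$ matrix `A` and starting vector `a` chosen only for illustration; the helper names `mat_vec`, `rank`, and `krylov_vectors` are likewise not from the text. It builds $a, Aa, \cdots, A^n a$, computes $m = \dim M$, and confirms $AM \subset M$ by checking that adding the images $A(A^k a)$ does not raise the rank.

```python
from fractions import Fraction

def mat_vec(A, v):
    """Return the product A v for a square matrix A given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def krylov_vectors(A, a):
    """Return the (n + 1) vectors [a, Aa, ..., A^n a] for an (n, n) matrix A."""
    vecs = [a]
    for _ in range(len(A)):
        vecs.append(mat_vec(A, vecs[-1]))
    return vecs

# Hypothetical data: A acts on V^3 and a is a nonzero starting vector.
A = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 5]]
a = [0, 1, 0]

M = krylov_vectors(A, a)              # spanning set of M
m = rank(M)                           # m = dim M
images = [mat_vec(A, v) for v in M]   # spanning set of AM
m_with_images = rank(M + images)      # equals m exactly when AM lies inside M
```

In this example the vectors never leave $\mathrm{Span}\{e_1, e_2\}$, so $M$ is a proper $A$-invariant subspace of $V^3$.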

Note again that in (12.45) $V^n$ is spanned by the $m$ basis vectors in $M$ together with other $(n-m)$ linearly independent vectors. We may encounter a situation where two subspaces $W_1$ and $W_2$ are $A$-invariant at once. Here we can take basis vectors $a_1, a_2, \cdots, a_m$ for $W_1$ and $a_{m+1}, a_{m+2}, \cdots, a_n$ for $W_2$. In reference to such $a_1, a_2, \cdots, a_n$ as basis vectors of $V^n$, we have

\[
A = \begin{pmatrix} A_m & 0 \\ 0 & A_{n-m} \end{pmatrix},
\tag{12.48}
\]

where $A_{n-m}$ is an $(n-m, n-m)$ square matrix and "zero" denotes either an $(n-m, m)$ or an $(m, n-m)$ zero matrix. Alternatively, we denote

\[
A = A_m \oplus A_{n-m}.
\]

In this situation, the matrix $A$ is said to be reducible. As stated above, $A$ causes a linear transformation within both $W_1$ and $W_2$. In other words, $A_m$ and $A_{n-m}$ cause a linear transformation within $W_1$ and $W_2$ in reference to $a_1, a_2, \cdots, a_m$ and $a_{m+1}, a_{m+2}, \cdots, a_n$, respectively. In this case $V^n$ can be represented as a direct sum of $W_1$ and $W_2$ such that

\[
V^n = W_1 \oplus W_2 = \mathrm{Span}\{a_1, a_2, \cdots, a_m\} \oplus \mathrm{Span}\{a_{m+1}, a_{m+2}, \cdots, a_n\}.
\tag{12.49}
\]

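This is because $\mathrm{Span}\{a_1, a_2, \cdots, a_m\} \cap \mathrm{Span}\{a_{m+1}, a_{m+2}, \cdots, a_n\} = \{0\}$. In fact, if the two subspaces possessed $x$ ($\neq 0$) in common, $a_1, a_2, \cdots, a_n$ would become linearly dependent, in contradiction. The vector space $V^n$ may well be decomposed further into subspaces with lower dimensions. For further discussion we need the following theorem and the concepts of a generalized eigenvector and a generalized eigenspace.

Theorem 12.3: Hamilton–Cayley Theorem [3, 4] Let $f_A(x)$ be the characteristic polynomial pertinent to a linear transformation $A: V^n \to V^n$. Then $f_A(A)(x) = 0$ for $\forall x \in V^n$. That is, $\mathrm{Ker}\, f_A(A) = V^n$.

Proof To prove the theorem we use the following well-known property of determinants. Namely, let $A$ be an $(n, n)$ square matrix expressed as

\[
A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.
\]

Let $\tilde{A}$ be the cofactor matrix of $A$, namely

\[
\tilde{A} = \begin{pmatrix} \Delta_{11} & \cdots & \Delta_{1n} \\ \vdots & \ddots & \vdots \\ \Delta_{n1} & \cdots & \Delta_{nn} \end{pmatrix},
\]

where $\Delta_{ij}$ is a cofactor of $a_{ij}$. Then

\[
\tilde{A}^T A = A\tilde{A}^T = |A|E,
\tag{12.50}
\]

where $\tilde{A}^T$ is the transposed matrix of $\tilde{A}$. We now apply (12.50) to the characteristic polynomial to get the following equation:

\[
\widetilde{(xE - A)}^T (xE - A) = (xE - A)\widetilde{(xE - A)}^T = |xE - A|E = f_A(x)E,
\tag{12.51}
\]

where $\widetilde{(xE - A)}$ is the cofactor matrix of $xE - A$. Let the cofactor of the $(i, j)$-element of $(xE - A)$ be $\Delta_{ij}$. Note in this case that $\Delta_{ij}$ is a polynomial of $x$ of at most $(n-1)$-th order. Let us put accordingly

\[
\Delta_{ij} = b_{ij,0}x^{n-1} + b_{ij,1}x^{n-2} + \cdots + b_{ij,n-1}.
\tag{12.52}
\]

Also, we put $B_k = (b_{ij,k})$. Then we have

\[
\begin{aligned}
\widetilde{(xE - A)} &= \begin{pmatrix} \Delta_{11} & \cdots & \Delta_{1n} \\ \vdots & \ddots & \vdots \\ \Delta_{n1} & \cdots & \Delta_{nn} \end{pmatrix} \\
&= \begin{pmatrix} b_{11,0}x^{n-1} + \cdots + b_{11,n-1} & \cdots & b_{1n,0}x^{n-1} + \cdots + b_{1n,n-1} \\ \vdots & \ddots & \vdots \\ b_{n1,0}x^{n-1} + \cdots + b_{n1,n-1} & \cdots & b_{nn,0}x^{n-1} + \cdots + b_{nn,n-1} \end{pmatrix} \\
&= x^{n-1}\begin{pmatrix} b_{11,0} & \cdots & b_{1n,0} \\ \vdots & \ddots & \vdots \\ b_{n1,0} & \cdots & b_{nn,0} \end{pmatrix} + \cdots + \begin{pmatrix} b_{11,n-1} & \cdots & b_{1n,n-1} \\ \vdots & \ddots & \vdots \\ b_{n1,n-1} & \cdots & b_{nn,n-1} \end{pmatrix} \\
&= x^{n-1}B_0 + x^{n-2}B_1 + \cdots + B_{n-1}.
\end{aligned}
\tag{12.53}
\]

Thus, we get

\[
|xE - A|E = (xE - A)\left(x^{n-1}B_0^T + x^{n-2}B_1^T + \cdots + B_{n-1}^T\right).
\tag{12.54}
\]

Replacing $x$ with $A$ and considering that $B_l^T$ ($l = 0, \cdots, n-1$) is commutative with $A$, we have

\[
f_A(A) = (A - A)\left(A^{n-1}B_0^T + A^{n-2}B_1^T + \cdots + B_{n-1}^T\right) = 0.
\tag{12.55}
\]

This means that $f_A(A)(x) = 0$ for $\forall x \in V^n$. This completes the proof. ∎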
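The theorem can be verified on a concrete matrix. The sketch below is not the method of the proof above: as a swapped-in technique it obtains the coefficients of $f_A(x) = |xE - A|$ from the Faddeev–LeVerrier recursion and then evaluates $f_A(A)$ by Horner's scheme, using exact rational arithmetic. The matrix `A` and the helper names are assumptions made for illustration.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two (n, n) matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(A):
    """Coefficients [c_0, ..., c_n] of f_A(x) = |xE - A| (Faddeev-LeVerrier)."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    coeffs = [Fraction(0)] * (n + 1)
    coeffs[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        # M_k = A M_{k-1} + c_{n-k+1} E
        M = [[AM[i][j] + (coeffs[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AMk = mat_mul(A, M)
        # c_{n-k} = -tr(A M_k) / k
        coeffs[n - k] = -sum(AMk[i][i] for i in range(n)) / k
    return coeffs

def poly_at_matrix(coeffs, A):
    """Evaluate c_0 E + c_1 A + ... + c_n A^n by Horner's scheme."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, A)
        result = [[result[i][j] + (c if i == j else 0) for j in range(n)]
                  for i in range(n)]
    return result

# Hypothetical example with f_A(x) = (x - 2)^2 (x - 3) = x^3 - 7x^2 + 16x - 12.
A = [[2, 1, 0],
     [0, 2, 0],
     [1, 0, 3]]

coeffs = char_poly(A)
f_of_A = poly_at_matrix(coeffs, A)   # zero matrix by the Hamilton-Cayley theorem
```

Exact fractions are used instead of floating-point numbers so that the zero matrix is obtained exactly rather than up to rounding error.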

In relation to Hamilton–Cayley theorem, we consider an important concept of a minimal polynomial.


Definition 12.1 Let $f(x)$ be a polynomial of $x$ with scalar coefficients such that $f(A) = 0$, where $A$ is an $(n, n)$ matrix. Among such $f(x)$, a lowest-order polynomial with the highest-order coefficient of one is said to be a minimal polynomial. We denote it by $\varphi_A(x)$; i.e., $\varphi_A(A) = 0$.

We have an important property for this. Namely, a minimal polynomial $\varphi_A(x)$ is a divisor of $f(x)$. In fact, suppose that dividing $f(x)$ by $\varphi_A(x)$ we have

\[
f(x) = g(x)\varphi_A(x) + h(x),
\]

where the remainder $h(x)$ is a polynomial whose order is lower than that of $\varphi_A(x)$. Inserting $A$ into $x$, we have

\[
f(A) = g(A)\varphi_A(A) + h(A) = h(A) = 0.
\]

If $h(x)$ were not identically zero, the presence of such $h(x)$ would be in contradiction to the definition of the minimal polynomial. This implies that $h(x) \equiv 0$. Thus, we get

\[
f(x) = g(x)\varphi_A(x).
\]

That is, $\varphi_A(x)$ must be a divisor of $f(x)$.

Suppose that $\varphi'_A(x)$ is another minimal polynomial. If the order of $\varphi'_A(x)$ were lower than that of $\varphi_A(x)$, we could choose $\varphi'_A(x)$ for a minimal polynomial from the beginning. Thus we assume that $\varphi_A(x)$ and $\varphi'_A(x)$ are of the same order. We have

\[
f(x) = g(x)\varphi_A(x) = g'(x)\varphi'_A(x).
\]

Then, we have

\[
\varphi_A(x)/\varphi'_A(x) = g'(x)/g(x) = c,
\]

where $c$ is a constant, because $\varphi_A(x)$ and $\varphi'_A(x)$ are of the same order. But $c$ should be one, since the highest-order coefficient of the minimal polynomial is one. Thus, $\varphi_A(x)$ is uniquely defined.
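As a small worked illustration (the two matrices $A$ and $B$ below are chosen here for the example and are assumptions of this sketch), the minimal polynomial may or may not coincide with the characteristic polynomial even when the latter is the same:

```latex
\[
A = \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix}: \quad
f_A(x) = (x-3)^2, \quad
A - 3E = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \neq 0, \quad
(A - 3E)^2 = 0
\;\Longrightarrow\; \varphi_A(x) = (x-3)^2,
\]
\[
B = \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix}: \quad
f_B(x) = (x-3)^2, \quad
B - 3E = 0
\;\Longrightarrow\; \varphi_B(x) = x - 3 .
\]
```

In both cases the minimal polynomial divides the characteristic polynomial, in agreement with the divisor property shown above applied to $f(x) = f_A(x)$.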

12.3

Generalized Eigenvectors and Nilpotent Matrices

Equation (12.4) ensures that an eigenvalue is accompanied by an eigenvector. Therefore, if a matrix $A: V^n \to V^n$ has $n$ different eigenvalues without multiple roots, the vector space $V^n$ is decomposed into a direct sum of one-dimensional subspaces spanned by the individual linearly independent eigenvectors (see the discussion of Sect. 12.1). Thus, we have

\[
V^n = W_1 \oplus W_2 \oplus \cdots \oplus W_n = \mathrm{Span}\{a_1\} \oplus \mathrm{Span}\{a_2\} \oplus \cdots \oplus \mathrm{Span}\{a_n\},
\tag{12.56}
\]

where $a_i$ ($1 \le i \le n$) are eigenvectors corresponding to the $n$ different eigenvalues. The situation, however, is not always simple. Even though a matrix has eigenvalues of multiple roots, we may still have a simple case, as shown in the next example.

Example 12.2 Let us think of the following matrix $A: V^3 \to V^3$ described by

\[
A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}.
\tag{12.57}
\]

The matrix has a triple root 2. As can easily be seen below, the individual eigenvectors $a_1$, $a_2$, and $a_3$ form basis vectors of each invariant subspace, indicating that $V^3$ can be decomposed into a direct sum of the three invariant subspaces as in (12.56):

\[
(a_1\ a_2\ a_3)\begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix} = (2a_1\ 2a_2\ 2a_3).
\tag{12.58}
\]

Let us think of another simple example.

Example 12.3 Let us think of a linear transformation $A: V^2 \to V^2$ expressed as

\[
A(x) = (a_1\ a_2)\begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix},
\tag{12.59}
\]

where $a_1$ and $a_2$ are basis vectors and for $\forall x \in V^2$, $x = x_1a_1 + x_2a_2$. This example has a double root 3. We have

\[
(a_1\ a_2)\begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix} = (3a_1\ \ a_1 + 3a_2).
\tag{12.60}
\]

Thus, $\mathrm{Span}\{a_1\}$ is $A$-invariant, but this is not the case with $\mathrm{Span}\{a_2\}$. This implies that $a_1$ is an eigenvector corresponding to the eigenvalue 3 but $a_2$ is not. Detailed discussion about matrices of this kind follows below.

Nilpotent matrices play an important role in matrix algebra. These matrices are defined as follows.

Definition 12.2 Let $N$ be a linear transformation in a vector space $V^n$. Suppose that we have

\[
N^k = 0 \quad \text{and} \quad N^{k-1} \neq 0,
\tag{12.61}
\]


where $N$ is an $(n, n)$ square matrix and $k$ ($\ge 2$) is a certain natural number. Then, $N$ is said to be a nilpotent matrix of order $k$ or a $k$-th order nilpotent matrix. If (12.61) holds with $k = 1$, $N$ is a zero matrix. Nilpotent matrices have the following properties:

(i) Eigenvalues of a nilpotent matrix are zero. Let $N$ be a $k$-th order nilpotent matrix. Suppose that

\[
Nx = \alpha x,
\tag{12.62}
\]

where $\alpha$ is an eigenvalue and $x$ ($\neq 0$) is its corresponding eigenvector. Operating $N$ $(k-1)$ more times from the left on both sides of (12.62), we have

\[
N^k x = \alpha^k x.
\tag{12.63}
\]

Meanwhile, $N^k = 0$ by definition, and so $\alpha^k = 0$, namely $\alpha = 0$. Conversely, suppose that the eigenvalues of an $(n, n)$ matrix $N$ are zero. From Theorem 12.1, via a suitable similarity transformation with $P$ we have

\[
P^{-1}NP = \tilde{N},
\]

where $\tilde{N}$ is a triangle matrix. Then, using (12.12) and (12.16) we have

\[
f_{P^{-1}NP}(x) = f_{\tilde{N}}(x) = f_N(x) = x^n.
\]

From Theorem 12.3, we have

\[
f_N(N) = N^n = 0.
\]

Namely, $N$ is a nilpotent matrix. In a trivial case, we have $N = 0$ (zero matrix). By Definition 12.2, we have $N^k = 0$ with a $k$-th order nilpotent $(n, n)$ matrix $N$. Then, its minimal polynomial is $\varphi_N(x) = x^k$ ($k \le n$).

(ii) A nilpotent matrix $N$ is not diagonalizable (except for a zero matrix). Suppose that $N$ is diagonalizable. Then $N$ can be diagonalized by a non-singular matrix $P$ such that

\[
P^{-1}NP = 0.
\]

The above equation holds, because $N$ only takes eigenvalues of zero. Operating $P$ from the left of the above equation and $P^{-1}$ from the right, we have

\[
N = 0.
\]


This means that $N$ would be a zero matrix, in contradiction. Thus, a nilpotent matrix $N$ is not diagonalizable.

Example 12.4 Let $N$ be a matrix of the following form:

\[
N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.
\]

Then, we can easily check that $N^2 = 0$. Therefore, $N$ is a nilpotent matrix of second order. Note that $N$ is an upper triangle matrix, and so its eigenvalues are given by the diagonal elements. In the present case the eigenvalue is zero (as a double root), consistent with the aforementioned property.

With a nilpotent matrix of order $k$, we have at least one vector $x$ such that $N^{k-1}x \neq 0$. Here we add that a zero transformation $A$ (or matrix) can be defined as

\[
A = 0 \iff Ax = 0 \ \text{for} \ \forall x \in V^n.
\tag{12.64}
\]

Taking the contraposition of this, we have

\[
A \neq 0 \iff Ax \neq 0 \ \text{for} \ \exists x \in V^n.
\]

In relation to a nilpotent matrix, we have the following important theorem.

Theorem 12.4 If $N$ is a $k$-th order nilpotent matrix, then for $\exists x \in V^n$ we have the following $k$ linearly independent vectors:

\[
x,\ Nx,\ N^2x,\ \cdots,\ N^{k-1}x.
\]

Proof Choose $x$ such that $N^{k-1}x \neq 0$, which is possible because $N^{k-1} \neq 0$, and think of the following equation:

\[
\sum_{i=0}^{k-1} c_i N^i x = 0.
\tag{12.65}
\]

Multiplying (12.65) by $N^{k-1}$ from the left and using $N^k = 0$, we get

\[
c_0 N^{k-1}x = 0.
\tag{12.66}
\]

This implies that $c_0 = 0$. Thus, we are left with

\[
\sum_{i=1}^{k-1} c_i N^i x = 0.
\tag{12.67}
\]

Next, multiplying (12.67) by $N^{k-2}$ from the left, we get similarly

\[
c_1 N^{k-1}x = 0,
\tag{12.68}
\]

implying that $c_1 = 0$. Continuing this procedure, we finally get $c_i = 0$ ($0 \le i \le k-1$). This completes the proof. ∎

This immediately tells us that for a $k$-th order nilpotent matrix $N: V^n \to V^n$, we should have $k \le n$. This is because the number of linearly independent vectors is at most $n$. In Example 12.4, $N$ causes a transformation of basis vectors in $V^2$ such that

\[
(e_1\ e_2)N = (e_1\ e_2)\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = (0\ e_1).
\]

That is, $Ne_2 = e_1$. Then, the two linearly independent vectors $e_2$ and $Ne_2$ correspond to the case of Theorem 12.4.

So far we have shown simple cases where matrices can be diagonalized via similarity transformation. This is equivalent to the relevant vector space being decomposed into a direct sum of (invariant) subspaces. Nonetheless, if a characteristic polynomial of the matrix has multiple root(s), it remains uncertain whether the vector space is decomposed into such a direct sum. To answer this question, we need the following lemma.

Lemma 12.1 Let $f_1(x), f_2(x), \cdots$, and $f_s(x)$ be polynomials without a common factor. Then we have $s$ polynomials $M_1(x), M_2(x), \cdots, M_s(x)$ that satisfy the following relation:

\[
M_1(x)f_1(x) + M_2(x)f_2(x) + \cdots + M_s(x)f_s(x) = 1.
\tag{12.69}
\]

Proof Let $M_i(x)$ ($1 \le i \le s$) be arbitrarily chosen polynomials and deal with the set of $g(x)$ that can be expressed as

\[
g(x) = M_1(x)f_1(x) + M_2(x)f_2(x) + \cdots + M_s(x)f_s(x).
\tag{12.70}
\]

Let the whole set of such $g(x)$ be $H$. Then $H$ has the following two properties:

(i) $g_1(x), g_2(x) \in H \Longrightarrow g_1(x) + g_2(x) \in H$,
(ii) $g(x) \in H$, $M(x)$: an arbitrarily chosen polynomial $\Longrightarrow M(x)g(x) \in H$.

Now let the lowest-order nonzero polynomial out of the set $H$ be $g_0(x)$. Then every $g(x)$ ($\in H$) is a multiple of $g_0(x)$. To see this, suppose that dividing $g(x)$ by $g_0(x)$, we have

\[
g(x) = M(x)g_0(x) + h(x),
\tag{12.71}
\]

where $h(x)$ is a certain polynomial. Since $g(x), g_0(x) \in H$, we have $h(x) \in H$ from the above properties (i) and (ii). If $h(x) \neq 0$, the order of $h(x)$ is lower than that of $g_0(x)$ from (12.71). This is, however, in contradiction to the definition of $g_0(x)$. Therefore, $h(x) = 0$. Thus,

\[
g(x) = M(x)g_0(x).
\tag{12.72}
\]

This implies that $H$ is identical to the whole set of polynomials comprising multiples of $g_0(x)$. In particular, the polynomials $f_i(x)$ ($1 \le i \le s$) belong to $H$. To show this, put $M_i(x) = 1$ with the other $M_j(x) = 0$ ($j \neq i$). Hence, the polynomial $g_0(x)$ should be a common factor of the $f_i(x)$. Meanwhile, $g_0(x) \in H$, and so by virtue of (12.70) we have

\[
g_0(x) = M_1(x)f_1(x) + M_2(x)f_2(x) + \cdots + M_s(x)f_s(x).
\tag{12.73}
\]

By assumption, the greatest common factor of $f_1(x), f_2(x), \cdots$, and $f_s(x)$ is 1. This implies that $g_0(x) = 1$. Thus, we finally get (12.69) and complete the proof. ∎
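A small worked instance of Lemma 12.1 may help. The two polynomials below are chosen for illustration only and are not taken from the text; they have no common factor, and a single division with remainder already yields the required $M_1(x)$ and $M_2(x)$:

```latex
\[
f_1(x) = x - 2, \qquad f_2(x) = (x - 1)^2 = x^2 - 2x + 1 ,
\]
\[
f_2(x) = x\, f_1(x) + 1
\;\Longrightarrow\;
\underbrace{(-x)}_{M_1(x)} f_1(x) + \underbrace{1}_{M_2(x)}\, f_2(x) = 1 .
\]
```

The choice of $M_1(x)$ and $M_2(x)$ is not unique: adding $q(x)f_2(x)$ to $M_1(x)$ and subtracting $q(x)f_1(x)$ from $M_2(x)$ for any polynomial $q(x)$ leaves the sum unchanged.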

12.4

Idempotent Matrices and Generalized Eigenspaces

In Sect. 12.1 we have shown that eigenvectors corresponding to different eigenvalues of $A$ are linearly independent. Also, if those eigenvalues do not possess multiple roots, the vector space comprises a direct sum of the subspaces spanned by the corresponding eigenvectors. However, how do we have to treat a situation where eigenvalues possess multiple roots? Even in this case there is at least one eigenvector that corresponds to each eigenvalue. To adequately address the question, we need the concepts of generalized eigenvectors and generalized eigenspaces.

For this purpose we extend and generalize the concept of eigenvectors. For a certain natural number $l$, if a vector $x$ ($\in V^n$) satisfies the following relation, $x$ is said to be a generalized eigenvector of rank $l$ that corresponds to an eigenvalue $\alpha$:

\[
(A - \alpha E)^l x = 0, \quad (A - \alpha E)^{l-1} x \neq 0.
\]

Thus, an eigenvector in the sense of (12.3) may be said to be a generalized eigenvector of rank 1. When we need to distinguish it from generalized eigenvectors of rank $l$ ($l \ge 2$), we call it a "proper" eigenvector. Thus far we have only been concerned with the proper eigenvectors. A theorem related to the generalized eigenvectors is given later in this section (Theorem 12.5).

In this section, idempotent operators play a role. The definition is simple.

Definition 12.3 An operator $A$ is said to be idempotent if $A^2 = A$.

From this simple definition, we can draw several important pieces of information. Let $A$ be an idempotent operator that operates on $V^n$. Let $x$ be an arbitrary vector in $V^n$. Then, $A^2x = A(Ax) = Ax$. That is,

\[
A(Ax - x) = 0.
\]

Then, we have

\[
Ax = x \quad \text{or} \quad Ax = 0.
\tag{12.74}
\]

From Theorem 11.3 (dimension theorem), we have

\[
V^n = A(V^n) \oplus \mathrm{Ker}\,A,
\tag{12.75}
\]

where $A(V^n) = \{x;\ x \in V^n, Ax = x\}$ and $\mathrm{Ker}\,A = \{x;\ x \in V^n, Ax = 0\}$. Thus, we find that $A$ decomposes $V^n$ into a direct sum of $A(V^n)$ and $\mathrm{Ker}\,A$. Conversely, we can readily verify that if there exists an operator $A$ that satisfies (12.75), such $A$ must be an idempotent operator. The verification is left for readers.

Meanwhile, we have

\[
(E - A)^2 = E - 2A + A^2 = E - 2A + A = E - A.
\]

Hence, $E - A$ is an idempotent matrix as well. Moreover, we have

\[
A(E - A) = (E - A)A = 0.
\]

Putting $E - A \equiv B$ and following a procedure similar to the above, we have

\[
Bx - x = (E - A)x - x = x - Ax - x = -Ax.
\]

Therefore, $B(V^n) = \{x;\ x \in V^n, Bx = x\}$ is identical to $\mathrm{Ker}\,A$. Writing

\[
W_A = \{x;\ x \in V^n, Ax = x\}, \quad \overline{W}_A = \{x;\ x \in V^n, Ax = 0\},
\tag{12.76}
\]

we get

\[
W_A = AV^n, \quad \overline{W}_A = (E - A)V^n = BV^n = V^n - AV^n = \mathrm{Ker}\,A.
\]

That is,

\[
V^n = W_A + \overline{W}_A.
\]

Suppose that $\exists u \in W_A \cap \overline{W}_A$. Then, from (12.76) $Au = u$ and $Au = 0$, namely $u = 0$, and so $V^n$ is a direct sum of $W_A$ and $\overline{W}_A$. That is,

\[
V^n = W_A \oplus \overline{W}_A.
\]

Notice that if we consider an identity operator $E$ as an idempotent operator, we are thinking of a trivial case. That is, $V^n = V^n \oplus \{0\}$.

The result can immediately be extended to the case where more idempotent operators take part in the vector space. Let us define operators such that

\[
A_1 + A_2 + \cdots + A_s = E,
\]

where $E$ is an $(n, n)$ identity matrix. Also,

\[
A_iA_j = A_i\delta_{ij}.
\]

Moreover, we define $W_i$ such that

\[
W_i = A_iV^n = \{x;\ x \in V^n, A_ix = x\}.
\tag{12.77}
\]

Then

\[
V^n = W_1 \oplus W_2 \oplus \cdots \oplus W_s.
\tag{12.78}
\]

In fact, suppose that $\exists x$ ($\in V^n$) $\in W_i \cap W_j$ ($i \neq j$). Then $A_ix = x = A_jx$. Operating $A_j$ ($j \neq i$) from the left, $A_jA_ix = A_jx = A_jA_jx$. That is, $0 = x = A_jx$, implying that $W_i \cap W_j = \{0\}$. Meanwhile,

\[
V^n = (A_1 + A_2 + \cdots + A_s)V^n = A_1V^n + A_2V^n + \cdots + A_sV^n = W_1 + W_2 + \cdots + W_s.
\tag{12.79}
\]

As $W_i \cap W_j = \{0\}$ ($i \neq j$), (12.79) is a direct sum. Thus, (12.78) will follow.

Example 12.5 Think of the following transformation $A$:

\[
(e_1\ e_2\ e_3\ e_4)A = (e_1\ e_2\ e_3\ e_4)\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} = (e_1\ e_2\ 0\ 0).
\]

Put $x = \sum_{i=1}^{4} x_ie_i$. Then, we have

\[
A(x) = \sum_{i=1}^{4} x_iA(e_i) = \sum_{i=1}^{2} x_ie_i \in W_A,
\]

\[
(E - A)(x) = x - A(x) = \sum_{i=3}^{4} x_ie_i \in \overline{W}_A.
\]

In the above,

\[
(e_1\ e_2\ e_3\ e_4)(E - A) = (e_1\ e_2\ e_3\ e_4)\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = (0\ 0\ e_3\ e_4).
\]

Thus, we have

\[
W_A = \mathrm{Span}\{e_1, e_2\}, \quad \overline{W}_A = \mathrm{Span}\{e_3, e_4\}, \quad V^4 = W_A \oplus \overline{W}_A.
\]

The properties of idempotent matrices can easily be checked. This is left for readers.

Using the aforementioned idempotent operators, let us introduce the following theorem.

Theorem 12.5 [3, 4] Let $A$ be a linear transformation $V^n \to V^n$. Suppose that a vector $x$ ($\in V^n$) satisfies the following relation:

\[
(A - \alpha E)^l x = 0,
\tag{12.80}
\]

where $l$ is a sufficiently large natural number. Then the set comprising such $x$ forms an $A$-invariant subspace that corresponds to an eigenvalue $\alpha$. Let $\alpha_1, \alpha_2, \cdots, \alpha_s$ be eigenvalues of $A$ different from one another. Then $V^n$ is decomposed into a direct sum of the $A$-invariant subspaces that correspond individually to $\alpha_1, \alpha_2, \cdots, \alpha_s$. This is succinctly expressed as follows:

\[
V^n = \widetilde{W}_{\alpha_1} \oplus \widetilde{W}_{\alpha_2} \oplus \cdots \oplus \widetilde{W}_{\alpha_s}.
\tag{12.81}
\]

Here $\widetilde{W}_{\alpha_i}$ ($1 \le i \le s$) is given by

\[
\widetilde{W}_{\alpha_i} = \{x;\ x \in V^n, (A - \alpha_iE)^{l_i}x = 0\},
\tag{12.82}
\]

where $l_i$ is a sufficiently large natural number. If the multiplicity of $\alpha_i$ is $n_i$, $\dim \widetilde{W}_{\alpha_i} = n_i$.

Proof Let us define the aforementioned $A$-invariant subspaces as $\widetilde{W}_{\alpha_k}$ ($1 \le k \le s$). Let $f_A(x)$ be the characteristic polynomial of $A$. Factorizing $f_A(x)$ into a product of powers of first-degree polynomials, we have

\[
f_A(x) = \prod_{i=1}^{s}(x - \alpha_i)^{n_i},
\tag{12.83}
\]

where $n_i$ is the multiplicity of $\alpha_i$. Let us put

\[
f_i(x) = f_A(x)/(x - \alpha_i)^{n_i} = \prod_{j \neq i}^{s}(x - \alpha_j)^{n_j}.
\tag{12.84}
\]


Then $f_1(x), f_2(x), \cdots$, and $f_s(x)$ do not have a common factor. Consequently, Lemma 12.1 tells us that there are polynomials $M_1(x), M_2(x), \cdots$, and $M_s(x)$ such that

\[
M_1(x)f_1(x) + \cdots + M_s(x)f_s(x) = 1.
\tag{12.85}
\]

Replacing $x$ with a matrix $A$, we get

\[
M_1(A)f_1(A) + \cdots + M_s(A)f_s(A) = E.
\tag{12.86}
\]

Or, defining $M_i(A)f_i(A) \equiv A_i$,

\[
A_1 + A_2 + \cdots + A_s = E,
\tag{12.87}
\]

where $E$ is an $(n, n)$ identity matrix. Moreover, we have

\[
A_iA_j = A_i\delta_{ij}.
\tag{12.88}
\]

In fact, if $i \neq j$,

\[
\begin{aligned}
A_iA_j &= M_i(A)f_i(A)M_j(A)f_j(A) = M_i(A)M_j(A)f_i(A)f_j(A) \\
&= M_i(A)M_j(A)\prod_{k \neq i}^{s}(A - \alpha_kE)^{n_k}\prod_{l \neq j}^{s}(A - \alpha_lE)^{n_l} \\
&= M_i(A)M_j(A)f_A(A)\prod_{k \neq i, j}^{s}(A - \alpha_kE)^{n_k} = 0.
\end{aligned}
\tag{12.89}
\]

The second equality results from the fact that $M_j(A)$ and $f_i(A)$ are commutable since both are polynomials of $A$. The last equality follows from the Hamilton–Cayley theorem. On the basis of (12.87) and (12.89),

\[
A_i = A_iE = A_i(A_1 + A_2 + \cdots + A_s) = A_i^2.
\tag{12.90}
\]

Thus, we find that $A_i$ is an idempotent matrix.

Next, let us show that using $A_i$ determined above, $A_iV^n$ is identical to $\widetilde{W}_{\alpha_i}$ ($1 \le i \le s$). To this end, we define $W_i$ such that

\[
W_i = A_iV^n = \{x;\ x \in V^n, A_ix = x\}.
\tag{12.91}
\]

We have

\[
(x - \alpha_i)^{n_i}f_i(x) = f_A(x).
\tag{12.92}
\]

Therefore, from the Hamilton–Cayley theorem we have

\[
(A - \alpha_iE)^{n_i}f_i(A) = 0.
\tag{12.93}
\]

Operating $M_i(A)$ from the left and using the fact that $M_i(A)$ commutes with $(A - \alpha_iE)^{n_i}$, we get

\[
(A - \alpha_iE)^{n_i}A_i = 0,
\tag{12.94}
\]

where we used $M_i(A)f_i(A) = A_i$. Operating both sides of this equation on $V^n$, furthermore, we have

\[
(A - \alpha_iE)^{n_i}A_iV^n = 0.
\]

This means that

\[
A_iV^n \subset \widetilde{W}_{\alpha_i} \quad (1 \le i \le s).
\tag{12.95}
\]

Conversely, suppose that $x \in \widetilde{W}_{\alpha_i}$. Then $(A - \alpha_iE)^lx = 0$ holds for a certain natural number $l$. If $M_i(x)f_i(x)$ were divisible by $x - \alpha_i$, the LHS of (12.85) would be divisible by $x - \alpha_i$ as well, because every $f_j(x)$ with $j \neq i$ contains the factor $x - \alpha_i$; this leads to a contradiction. Thus, it follows that $(x - \alpha_i)^l$ and $M_i(x)f_i(x)$ do not have a common factor. Consequently, Lemma 12.1 ensures that we have polynomials $M(x)$ and $N(x)$ such that

\[
M(x)(x - \alpha_i)^l + N(x)M_i(x)f_i(x) = 1
\]

and, hence,

\[
M(A)(A - \alpha_iE)^l + N(A)M_i(A)f_i(A) = E.
\tag{12.96}
\]

Operating both sides of (12.96) on $x$, we get

\[
M(A)(A - \alpha_iE)^lx + N(A)M_i(A)f_i(A)x = N(A)A_ix = x.
\tag{12.97}
\]

Notice that the first term of (12.97) vanishes from (12.82). As $A_i$ is a polynomial of $A$, it commutes with $N(A)$. Hence, we have

\[
x = A_i[N(A)x] \in A_iV^n.
\tag{12.98}
\]

Thus, we get

\[
\widetilde{W}_{\alpha_i} \subset A_iV^n \quad (1 \le i \le s).
\tag{12.99}
\]

From (12.95) and (12.99), we conclude that

\[
\widetilde{W}_{\alpha_i} = A_iV^n \quad (1 \le i \le s).
\tag{12.100}
\]

In other words, $W_i$ defined as (12.91) is identical to $\widetilde{W}_{\alpha_i}$ that is defined as (12.82). Thus, we have

\[
V^n = W_1 \oplus W_2 \oplus \cdots \oplus W_s
\]

or

\[
V^n = \widetilde{W}_{\alpha_1} \oplus \widetilde{W}_{\alpha_2} \oplus \cdots \oplus \widetilde{W}_{\alpha_s}.
\tag{12.101}
\]

This completes the former half of the proof. The latter half is as follows.

Suppose that $\dim \widetilde{W}_{\alpha_i} = n'_i$. In parallel to the decomposition of $V^n$ into the direct sum of (12.81), $A$ can be reduced as

\[
A \sim \begin{pmatrix} A^{(1)} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & A^{(s)} \end{pmatrix},
\tag{12.102}
\]

where $A^{(i)}$ ($1 \le i \le s$) is an $(n'_i, n'_i)$ matrix and the symbol $\sim$ indicates that $A$ has been transformed by a suitable similarity transformation. The matrix $A^{(i)}$ represents the linear transformation that $A$ causes on $\widetilde{W}_{\alpha_i}$. We denote the $n'_i$-th order identity matrix by $E_{n'_i}$. Equation (12.82) implies that the matrix represented by

\[
N_i = A^{(i)} - \alpha_iE_{n'_i}
\tag{12.103}
\]

is a nilpotent matrix. The order of a nilpotent matrix is at most $n'_i$ (vide supra) and, hence, $l_i$ can be $n'_i$. With $N_i$ we have

\[
\begin{aligned}
f_{N_i}(x) &= \left|xE_{n'_i} - N_i\right| = \left|xE_{n'_i} - \left[A^{(i)} - \alpha_iE_{n'_i}\right]\right| \\
&= \left|xE_{n'_i} - A^{(i)} + \alpha_iE_{n'_i}\right| = \left|(x + \alpha_i)E_{n'_i} - A^{(i)}\right| = x^{n'_i}.
\end{aligned}
\tag{12.104}
\]

The last equality is because the eigenvalues of a nilpotent matrix are all zero. Meanwhile,

\[
f_{A^{(i)}}(x) = \left|xE_{n'_i} - A^{(i)}\right| = f_{N_i}(x - \alpha_i) = (x - \alpha_i)^{n'_i}.
\tag{12.105}
\]

Equation (12.105) implies that

\[
f_A(x) = \prod_{i=1}^{s} f_{A^{(i)}}(x) = \prod_{i=1}^{s}(x - \alpha_i)^{n'_i} = \prod_{i=1}^{s}(x - \alpha_i)^{n_i}.
\tag{12.106}
\]

The last equality comes from (12.83). Thus, $n'_i = n_i$. These procedures complete the proof. ∎

At the same time, we may equate $l_i$ in (12.82) to $n_i$.

Theorem 12.1 shows that any square matrix can be converted to an (upper) triangle matrix by a similarity transformation. Theorem 12.5 demonstrates that the matrix can further be segmented according to individual eigenvalues. Considering Theorem 12.1 again, $A^{(i)}$ ($1 \le i \le s$) can be described as an upper triangle matrix by

\[
A^{(i)} \sim \begin{pmatrix} \alpha_i & \cdots & * \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_i \end{pmatrix}.
\tag{12.107}
\]

Therefore, denoting $N^{(i)}$ such that

\[
N^{(i)} = A^{(i)} - \alpha_iE_{n_i} = \begin{pmatrix} 0 & \cdots & * \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix},
\tag{12.108}
\]

we find that $N^{(i)}$ is nilpotent. This is because all the eigenvalues of $N^{(i)}$ are zero. From (12.108), we have

\[
\left[N^{(i)}\right]^{\mu_i} = 0 \quad (\mu_i \le n_i).
\]

12.5

Decomposition of Matrix

To investigate canonical forms of matrices, it is convenient if a matrix can be decomposed into appropriate forms. To this end, the following definition is important.

Definition 12.4 A matrix similar to a diagonal matrix is said to be semi-simple.

In the above definition, if a matrix is related to another matrix by a similarity transformation, those matrices are said to be similar to each other. When two matrices $A$ and $A'$ are similar, we express this by $A \sim A'$ as stated above. This relation satisfies the equivalence law. That is,

(i) $A \sim A$, (ii) $A \sim A' \Rightarrow A' \sim A$, (iii) $A \sim A'$, $A' \sim A'' \Rightarrow A \sim A''$.

Readers, check this. We have the following important theorem on the matrix decomposition.

Theorem 12.6 [3, 4] Any $(n, n)$ square matrix $A$ is expressed uniquely as

\[
A = S + N,
\tag{12.109}
\]

where $S$ is semi-simple and $N$ is nilpotent; $S$ and $N$ are commutable, i.e., $SN = NS$. Furthermore, $S$ and $N$ are polynomials of $A$ with scalar coefficients.

Proof Using (12.86) and (12.87), we write

\[
S = \alpha_1A_1 + \cdots + \alpha_sA_s = \sum_{i=1}^{s}\alpha_iM_i(A)f_i(A).
\tag{12.110}
\]

Then, Eq. (12.110) is a polynomial of $A$. From Theorem 12.1 and Theorem 12.5, $A^{(i)}$ ($1 \le i \le s$) in (12.102) is characterized by the facts that $A^{(i)}$ is a triangle matrix whose diagonal elements are all equal to the eigenvalue $\alpha_i$ and that the order of $A^{(i)}$ is identical to the multiplicity of $\alpha_i$. Since $A_i$ ($1 \le i \le s$) is an idempotent matrix, it should be diagonalized through a similarity transformation (see Sect. 12.7). In fact, $S$ is transformed via the same similarity transformation as in (12.102) into

\[
S \sim \begin{pmatrix} \alpha_1E_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_sE_{n_s} \end{pmatrix},
\tag{12.111}
\]

where $E_{n_i}$ ($1 \le i \le s$) is an identity matrix of order $n_i$ that is identical to the multiplicity of $\alpha_i$. This expression is equivalent to, e.g.,

\[
A_1 \sim \begin{pmatrix} E_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix}
\]

in (12.110). Thus, $S$ is obviously semi-simple. Putting $N = A - S$, $N$ is described after the above transformation as

\[
N \sim \begin{pmatrix} N^{(1)} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & N^{(s)} \end{pmatrix}, \quad N^{(i)} = A^{(i)} - \alpha_iE_{n_i}.
\tag{12.112}
\]

Since each $N^{(i)}$ is nilpotent as stated in Sect. 12.4, $N$ is nilpotent as well. Also, $N = A - S$ is a polynomial of $A$ as in the case of $S$. Therefore, $S$ and $N$ are commutable.

To prove the uniqueness of the decomposition, we show the following:

(i) Let $S$ and $S'$ be commutable semi-simple matrices. Then, those matrices are simultaneously diagonalized. That is, with a certain non-singular matrix $P$, $P^{-1}SP$ and $P^{-1}S'P$ are diagonalized at once. Hence, $S \pm S'$ is semi-simple as well.
(ii) Let $N$ and $N'$ be commutable nilpotent matrices. Then, $N \pm N'$ is nilpotent as well.
(iii) A matrix that is both semi-simple and nilpotent is a zero matrix.

(i) Let the different eigenvalues of $S$ be $\alpha_1, \cdots, \alpha_s$. Then, since $S$ is semi-simple, the vector space $V^n$ is decomposed into a direct sum of eigenspaces $W_{\alpha_i}$ ($1 \le i \le s$). That is, we have

\[
V^n = W_{\alpha_1} \oplus \cdots \oplus W_{\alpha_s}.
\]

Since $S$ and $S'$ are commutable, for any $x \in W_{\alpha_i}$ we have $SS'x = S'Sx = S'(\alpha_ix) = \alpha_iS'x$. Hence, we have $S'x \in W_{\alpha_i}$. Namely, $W_{\alpha_i}$ is $S'$-invariant. Therefore, if we adopt the basis vectors $\{a_1, \cdots, a_n\}$ with respect to the direct sum decomposition, we get

\[
S \sim \begin{pmatrix} \alpha_1E_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_sE_{n_s} \end{pmatrix}, \quad
S' \sim \begin{pmatrix} S'_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & S'_s \end{pmatrix}.
\]

Since $S'$ is semi-simple, $S'_i$ ($1 \le i \le s$) must be semi-simple as well. Here, let $\{e_1, \cdots, e_n\}$ be the original basis vectors before the basis vector transformation and let $P$ be a representation matrix of the said transformation. Then, we have

\[
(e_1 \cdots e_n)P = (a_1 \cdots a_n).
\]

Thus, we get

\[
P^{-1}SP = \begin{pmatrix} \alpha_1E_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_sE_{n_s} \end{pmatrix}, \quad
P^{-1}S'P = \begin{pmatrix} S'_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & S'_s \end{pmatrix}.
\]

Each $S'_i$ can be diagonalized by a further change of basis within $W_{\alpha_i}$, which leaves the block $\alpha_iE_{n_i}$ unchanged. Choosing $P$ accordingly, both $P^{-1}SP$ and $P^{-1}S'P$ are diagonal. That is, $P^{-1}SP \pm P^{-1}S'P = P^{-1}(S \pm S')P$ is diagonal, indicating that $S \pm S'$ is semi-simple as well.

(ii) Suppose that $N^{\nu} = 0$ and $N'^{\nu'} = 0$. From the assumption, $N$ and $N'$ are commutable. Consequently, using the binomial theorem we have

\[
(N \pm N')^m = N^m \pm mN^{m-1}N' + \cdots + (\pm 1)^i\frac{m!}{i!(m-i)!}N^{m-i}N'^{\,i} + \cdots + (\pm 1)^mN'^{\,m}.
\tag{12.113}
\]

Putting $m = \nu + \nu' - 1$, if $i \ge \nu'$, $N'^{\,i} = 0$ from the supposition. If $i < \nu'$, we have $m - i > m - \nu' = \nu + \nu' - 1 - \nu' = \nu - 1$; i.e., $m - i \ge \nu$. Therefore, $N^{m-i} = 0$. Consequently, we have $N^{m-i}N'^{\,i} = 0$ with any $i$ in (12.113). Thus, we get $(N \pm N')^m = 0$, indicating that $N \pm N'$ is nilpotent.

(iii) Let $S$ be a semi-simple and nilpotent matrix. We describe $S$ as

\[
S \sim \begin{pmatrix} \alpha_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_n \end{pmatrix},
\tag{12.114}
\]

where some of $\alpha_i$ ($1 \le i \le n$) may be identical. Since $S$ is nilpotent, all $\alpha_i$ ($1 \le i \le n$) are zero. We have then $S \sim 0$; i.e., $S = 0$ accordingly.

Now, suppose that a matrix $A$ is decomposed differently from (12.109). That is, we have

\[
A = S + N = S' + N' \quad \text{or} \quad S - S' = N' - N.
\tag{12.115}
\]

From the assumption, $S'$ and $N'$ are commutable. Hence, each of them commutes with $A = S' + N'$ and, therefore, with $S$ and $N$, which are polynomials of $A$. Thus, $S$, $S'$, $N$, and $N'$ are commutable with one another. Hence, from (i) and (ii) along with the second equation of (12.115), $S - S'$ and $N' - N$ are both semi-simple and nilpotent at once. Consequently, from (iii) $S - S' = N' - N = 0$. Thus, we finally get $S = S'$ and $N = N'$. That is, the decomposition is unique. This completes the proof. ∎

Theorem 12.6 implies that the matrix decomposition of (12.109) is unique. On the basis of Theorem 12.6, we investigate Jordan canonical forms of matrices in the next section.
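The construction of $S$ and $N$ through the idempotent matrices $A_i = M_i(A)f_i(A)$ can be traced on a small case. The following sketch assumes a hypothetical matrix `A` with $f_A(x) = (x-1)^2(x-2)$, so that $f_1(x) = x - 2$ and $f_2(x) = (x-1)^2$; the polynomials $M_1(x) = -x$ and $M_2(x) = 1$ follow from the identity $(-x)(x-2) + (x-1)^2 = 1$. The helper names are illustrative and not from the text.

```python
def mat_mul(X, Y):
    """Product of two (n, n) matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    """Scalar multiple c X."""
    return [[c * x for x in row] for row in X]

def identity(n):
    """(n, n) identity matrix E."""
    return [[int(i == j) for j in range(n)] for i in range(n)]

# Hypothetical example: alpha_1 = 1 (multiplicity 2), alpha_2 = 2 (multiplicity 1).
A = [[1, 1, 1],
     [0, 1, 1],
     [0, 0, 2]]
E = identity(3)
ZERO = scale(0, E)

A_minus_E = mat_add(A, scale(-1, E))     # A - alpha_1 E
A_minus_2E = mat_add(A, scale(-2, E))    # A - alpha_2 E

# A_i = M_i(A) f_i(A) with M_1(x) = -x, f_1(x) = x - 2, M_2(x) = 1, f_2(x) = (x - 1)^2.
A1 = scale(-1, mat_mul(A, A_minus_2E))
A2 = mat_mul(A_minus_E, A_minus_E)

S = mat_add(scale(1, A1), scale(2, A2))  # S = alpha_1 A_1 + alpha_2 A_2, cf. (12.110)
N = mat_add(A, scale(-1, S))             # N = A - S
```

Here `A1` and `A2` are the idempotent matrices of (12.87) and (12.88), `S` is semi-simple with eigenvalues 1, 1, 2, and `N` is nilpotent and commutes with `S`, in line with Theorem 12.6.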

12.6

Jordan Canonical Form

Once the vector space $V^n$ has been decomposed into a direct sum of generalized eigenspaces with the matrix reduced in parallel, we are able to deal with the individual eigenspaces $\widetilde{W}_{\alpha_i}$ ($1 \le i \le s$) and the corresponding $A^{(i)}$ ($1 \le i \le s$) separately.

12.6.1 Canonical Form of Nilpotent Matrix

To avoid complication of notation, we think of the following example where we assume an $(n, n)$ matrix that operates on $V^n$. Let the nilpotent matrix be $N$. Suppose that $N^{\nu-1} \neq 0$ and $N^{\nu} = 0$ ($1 \le \nu \le n$). If $\nu = 1$, the nilpotent matrix $N$ is a zero matrix. Notice that since the characteristic polynomial is described by $f_N(x) = x^n$, $f_N(N) = N^n = 0$ from the Hamilton–Cayley theorem. Let $W^{(i)}$ be given such that

\[
W^{(i)} = \{x;\ x \in V^n, N^ix = 0\}.
\]

Then we have

\[
V^n = W^{(\nu)} \supset W^{(\nu-1)} \supset \cdots \supset W^{(1)} \supset W^{(0)} \equiv \{0\}.
\tag{12.116}
\]

Note that when $\nu = 1$, we have trivially $V^n = W^{(\nu)} \supset W^{(0)} \equiv \{0\}$. Let us put $\dim W^{(i)} = m_i$, $m_i - m_{i-1} = r_i$ ($1 \le i \le \nu$), $m_0 \equiv 0$. Then we can add $r_\nu$ linearly independent vectors $a_1, a_2, \cdots$, and $a_{r_\nu}$ to the basis vectors of $W^{(\nu-1)}$ so that those $r_\nu$ vectors can be basis vectors of $W^{(\nu)}$. Unless $N = 0$ (i.e., zero matrix), we must have at least one such vector; from the supposition, with $\exists x \neq 0$ we have $N^{\nu-1}x \neq 0$, and so $x \notin W^{(\nu-1)}$. At least one such vector $x$ is present and it is eligible for a basis vector of $W^{(\nu)}$. Hence, $r_\nu \ge 1$ and we have

\[
W^{(\nu)} = \mathrm{Span}\{a_1, a_2, \cdots, a_{r_\nu}\} \oplus W^{(\nu-1)}.
\tag{12.117}
\]

Note that (12.117) is expressed as a direct sum. Meanwhile, $Na_1, Na_2, \cdots, Na_{r_\nu} \in W^{(\nu-1)}$. In fact, suppose that $x \in W^{(\nu)}$, i.e., $N^{\nu}x = N^{\nu-1}(Nx) = 0$. That is, $Nx \in W^{(\nu-1)}$. According to a reasoning similar to that made above, we have

\[
\mathrm{Span}\{Na_1, Na_2, \cdots, Na_{r_\nu}\} \cap W^{(\nu-2)} = \{0\}.
\]

Moreover, these $r_\nu$ vectors $Na_1, Na_2, \cdots$, and $Na_{r_\nu}$ are linearly independent. Suppose that

\[
c_1Na_1 + c_2Na_2 + \cdots + c_{r_\nu}Na_{r_\nu} = 0.
\tag{12.118}
\]

Operating $N^{\nu-2}$ from the left, we have

\[
N^{\nu-1}(c_1a_1 + c_2a_2 + \cdots + c_{r_\nu}a_{r_\nu}) = 0.
\]

This would imply that $c_1a_1 + c_2a_2 + \cdots + c_{r_\nu}a_{r_\nu} \in W^{(\nu-1)}$. On the basis of the above argument, however, we must have $c_1 = c_2 = \cdots = c_{r_\nu} = 0$. In other words, if some $c_i$ ($1 \le i \le r_\nu$) were nonzero, a nonzero linear combination of the $a_i$ would belong to $W^{(\nu-1)}$, in contradiction to the direct sum (12.117). From (12.118) this means linear independence of $Na_1, Na_2, \cdots$, and $Na_{r_\nu}$.

As $W^{(\nu-1)} \supset W^{(\nu-2)}$, we may well have additional linearly independent vectors within the basis vectors of $W^{(\nu-1)}$. Let those vectors be $a_{r_\nu+1}, \cdots, a_{r_{\nu-1}}$. Here we assume that the number of such vectors is $r_{\nu-1} - r_\nu$. We have $r_{\nu-1} - r_\nu \ge 0$ accordingly. In this way we can construct basis vectors of $W^{(\nu-1)}$ by including $a_{r_\nu+1}, \cdots$, and $a_{r_{\nu-1}}$ along with $Na_1, Na_2, \cdots$, and $Na_{r_\nu}$. As a result, we get

\[
W^{(\nu-1)} = \mathrm{Span}\{Na_1, \cdots, Na_{r_\nu}, a_{r_\nu+1}, \cdots, a_{r_{\nu-1}}\} \oplus W^{(\nu-2)}.
\]

We can repeat these processes to construct $W^{(\nu-2)}$ such that

\[
W^{(\nu-2)} = \mathrm{Span}\{N^2a_1, \cdots, N^2a_{r_\nu}, Na_{r_\nu+1}, \cdots, Na_{r_{\nu-1}}, a_{r_{\nu-1}+1}, \cdots, a_{r_{\nu-2}}\} \oplus W^{(\nu-3)}.
\]

For $W^{(i)}$, furthermore, we have

\[
W^{(i)} = \mathrm{Span}\{N^{\nu-i}a_1, \cdots, N^{\nu-i}a_{r_\nu}, N^{\nu-i-1}a_{r_\nu+1}, \cdots, N^{\nu-i-1}a_{r_{\nu-1}}, \cdots, a_{r_{i+1}+1}, \cdots, a_{r_i}\} \oplus W^{(i-1)}.
\tag{12.119}
\]

Further repeating the procedures, we exhaust all the $n$ basis vectors of $W^{(\nu)} = V^n$. These vectors are given as follows:

\[
N^ka_{r_{i+1}+1}, \cdots, N^ka_{r_i} \quad (1 \le i \le \nu;\ 0 \le k \le i-1).
\]

At the same time we have

\[
0 \equiv r_{\nu+1} < 1 \le r_\nu \le r_{\nu-1} \le \cdots \le r_1.
\tag{12.120}
\]

Table 12.1 [3, 4] shows the resulting structure of these basis vectors pertinent to Jordan blocks. In Table 12.1, if we count the basis vectors laterally, from the top we have $r_\nu, r_{\nu-1}, \cdots, r_1$ vectors. Their sum is $n$. This is the same number as that counted vertically. The dimension $n$ of the vector space $V^n$ is thus given by

\[
n = \sum_{i=1}^{\nu} r_i = \sum_{i=1}^{\nu} i(r_i - r_{i+1}).
\tag{12.121}
\]

Let us examine the structure of Table 12.1 more closely. More specifically, let us inspect the $i$-layered structures of $(r_i - r_{i+1})$ vectors. Picking up a vector from among $a_{r_{i+1}+1}, \cdots, a_{r_i}$, we call it $a_\rho$. Then, we get the following set of vectors $a_\rho, Na_\rho, N^2a_\rho, \cdots, N^{i-1}a_\rho$ in the $i$-layered structure. These $i$ vectors are displayed "vertically" in Table 12.1. These vectors are linearly independent (see Theorem 12.4) and form an $i$-dimensional $N$-invariant subspace; i.e., $\mathrm{Span}\{N^{i-1}a_\rho, N^{i-2}a_\rho, \cdots, Na_\rho, a_\rho\}$, where $r_{i+1} + 1 \le \rho \le r_i$. Matrix representation of the linear transformation $N$ with respect to the set of these $i$ vectors is

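Table 12.1 Jordan blocks and structure of basis vectors for a nilpotent matrix^a

\[
\begin{array}{lllllll|l}
r_\nu \downarrow & r_{\nu-1}-r_\nu \downarrow & \cdots & r_i-r_{i+1} \downarrow & \cdots & r_2-r_3 \downarrow & r_1-r_2 \downarrow & \text{Row count} \\ \hline
a_1, \ldots, a_{r_\nu} & & & & & & & r_\nu \\
Na_1, \ldots, Na_{r_\nu} & a_{r_\nu+1}, \ldots, a_{r_{\nu-1}} & & & & & & r_{\nu-1} \\
N^2a_1, \ldots, N^2a_{r_\nu} & Na_{r_\nu+1}, \ldots, Na_{r_{\nu-1}} & \cdots & & & & & r_{\nu-2} \\
\vdots & \vdots & & & & & & \vdots \\
N^{\nu-i}a_1, \ldots, N^{\nu-i}a_{r_\nu} & N^{\nu-i-1}a_{r_\nu+1}, \ldots, N^{\nu-i-1}a_{r_{\nu-1}} & \cdots & a_{r_{i+1}+1}, \ldots, a_{r_i} & & & & r_i \\
\vdots & \vdots & & \vdots & & & & \vdots \\
N^{\nu-2}a_1, \ldots, N^{\nu-2}a_{r_\nu} & N^{\nu-3}a_{r_\nu+1}, \ldots, N^{\nu-3}a_{r_{\nu-1}} & \cdots & N^{i-2}a_{r_{i+1}+1}, \ldots, N^{i-2}a_{r_i} & \cdots & a_{r_3+1}, \ldots, a_{r_2} & & r_2 \\
N^{\nu-1}a_1, \ldots, N^{\nu-1}a_{r_\nu} & N^{\nu-2}a_{r_\nu+1}, \ldots, N^{\nu-2}a_{r_{\nu-1}} & \cdots & N^{i-1}a_{r_{i+1}+1}, \ldots, N^{i-1}a_{r_i} & \cdots & Na_{r_3+1}, \ldots, Na_{r_2} & a_{r_2+1}, \ldots, a_{r_1} & r_1 \\ \hline
\nu r_\nu & (\nu-1)(r_{\nu-1}-r_\nu) & \cdots & i(r_i-r_{i+1}) & \cdots & 2(r_2-r_3) & r_1-r_2 & n = \sum_{k=1}^{\nu} r_k
\end{array}
\]

^a Adapted from Satake I (1974) Linear algebra (Mathematics Library 1: in Japanese) [4], with the permission of Shokabo Co., Ltd., Tokyo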

\[
\left(N^{i-1}a_\rho \ \ N^{i-2}a_\rho \ \cdots \ Na_\rho \ \ a_\rho\right) N
= \left(N^{i-1}a_\rho \ \ N^{i-2}a_\rho \ \cdots \ Na_\rho \ \ a_\rho\right)
\begin{pmatrix}
0 & 1 & & & \\
 & 0 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & 0 & 1 \\
 & & & & 0
\end{pmatrix}. \tag{12.122}
\]

These (i, i) matrices of (12.122) are called i-th order Jordan blocks. Notice that the number of those Jordan blocks is $(r_i - r_{i+1})$. Let us expressly define this number as [3, 4]
\[
J_i = r_i - r_{i+1}, \tag{12.123}
\]
where $J_i$ is the number of the i-th order Jordan blocks. The total number of Jordan blocks within a whole vector space $V^n = W^{(\nu)}$ is
\[
\sum_{i=1}^{\nu} J_i = \sum_{i=1}^{\nu} (r_i - r_{i+1}) = r_1. \tag{12.124}
\]

Recalling the dimension theorem mentioned in (11.45), we have
\[
\dim V^n = \dim \mathrm{Ker}\, N^i + \dim N^i(V^n) = \dim \mathrm{Ker}\, N^i + \mathrm{rank}\, N^i. \tag{12.125}
\]
Meanwhile, since $W^{(i)} = \mathrm{Ker}\, N^i$, $\dim W^{(i)} = m_i = \dim \mathrm{Ker}\, N^i$. From (12.116), $m_0 \equiv 0$. Then (11.45) is now read as
\[
\dim V^n = m_i + \mathrm{rank}\, N^i. \tag{12.126}
\]
That is
\[
n = m_i + \mathrm{rank}\, N^i \tag{12.127}
\]
or
\[
m_i = n - \mathrm{rank}\, N^i. \tag{12.128}
\]

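Meanwhile, from Table 12.1 [3, 4] we have
\[
\dim W^{(i)} = m_i = \sum_{k=1}^{i} r_k, \qquad \dim W^{(i-1)} = m_{i-1} = \sum_{k=1}^{i-1} r_k. \tag{12.129}
\]
Hence, we have
\[
r_i = m_i - m_{i-1}. \tag{12.130}
\]
Then we get [3, 4]
\[
\begin{aligned}
J_i = r_i - r_{i+1} &= (m_i - m_{i-1}) - (m_{i+1} - m_i) = 2m_i - m_{i-1} - m_{i+1} \\
&= 2\left(n - \mathrm{rank}\, N^i\right) - \left(n - \mathrm{rank}\, N^{i-1}\right) - \left(n - \mathrm{rank}\, N^{i+1}\right) \\
&= \mathrm{rank}\, N^{i-1} + \mathrm{rank}\, N^{i+1} - 2\,\mathrm{rank}\, N^i.
\end{aligned} \tag{12.131}
\]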
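As a small illustration of (12.131), consider a hypothetical (4, 4) nilpotent matrix N (not taken from the text) that consists of one third-order and one first-order Jordan block. For such a matrix rank N^0 = 4, rank N = 2, rank N^2 = 1, and rank N^3 = rank N^4 = 0. Under this assumption the rank formula recovers the block structure:

```latex
\begin{aligned}
J_1 &= \mathrm{rank}\, N^{0} + \mathrm{rank}\, N^{2} - 2\,\mathrm{rank}\, N = 4 + 1 - 2 \cdot 2 = 1, \\
J_2 &= \mathrm{rank}\, N + \mathrm{rank}\, N^{3} - 2\,\mathrm{rank}\, N^{2} = 2 + 0 - 2 \cdot 1 = 0, \\
J_3 &= \mathrm{rank}\, N^{2} + \mathrm{rank}\, N^{4} - 2\,\mathrm{rank}\, N^{3} = 1 + 0 - 2 \cdot 0 = 1.
\end{aligned}
```

The total J_1 + J_2 + J_3 = 2 agrees with n - rank N = 4 - 2 = 2, and the block sizes add up to 1 + 3 = 4 = n.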

The number $J_i$ is therefore defined uniquely by N. The total number of Jordan blocks $r_1$ is also computed using (12.128) and (12.130) as
\[
r_1 = m_1 - m_0 = m_1 = n - \mathrm{rank}\, N = \dim \mathrm{Ker}\, N, \tag{12.132}
\]
where the last equality arises from the dimension theorem expressed as (11.45). In Table 12.1 [3, 4], moreover, we have two extreme cases. That is, if $\nu = 1$ in (12.116), i.e., $N = 0$, from (12.132) we have $r_1 = n$ and $r_2 = \cdots = r_n = 0$; see Fig. 12.1a. Also we confirm $n = \sum_{i=1}^{\nu} r_i$ in (12.121). In this case, all the eigenvectors are proper eigenvectors with multiplicity of n and we have n first-order Jordan blocks. The other is the case of $\nu = n$. In that case, we have
\[
r_1 = r_2 = \cdots = r_n = 1. \tag{12.133}
\]
In the latter case, we also have $n = \sum_{i=1}^{\nu} r_i$ in (12.121). We have only one proper eigenvector and $(n-1)$ generalized eigenvectors; see Fig. 12.1b. From (12.132), we have this special case where we have only one n-th order Jordan block, when $\mathrm{rank}\, N = n - 1$.

Fig. 12.1 Examples of the structure of Jordan blocks. (a) $r_1 = n$. (b) $r_1 = r_2 = \cdots = r_n = 1$

12.6.2 Jordan Blocks

Let us think of (12.81) on the basis of (12.102). Picking up $A^{(i)}$ from (12.102) and considering (12.108), we put
\[
N_i = A^{(i)} - \alpha_i E_{n_i} \quad (1 \le i \le s), \tag{12.134}
\]
where $E_{n_i}$ denotes the $(n_i, n_i)$ identity matrix. We express a nilpotent matrix as $N_i$ as before. In (12.134) the number $n_i$ corresponds to n in $V^n$ of Sect. 12.6.1. As $N_i$ is a $(n_i, n_i)$ matrix, we have
\[
N_i^{\,n_i} = 0.
\]
Here we are speaking of $\nu$-th order nilpotent matrices $N_i$ such that $N_i^{\,\nu-1} \ne 0$ and $N_i^{\,\nu} = 0$ ($1 \le \nu \le n_i$). We can deal with $N_i$ in a manner fully consistent with the theory we developed in Sect. 12.6.1. Each $A^{(i)}$ comprises one or more Jordan blocks $A^{(\kappa)}$ that is expressed as
\[
A^{(\kappa)} = N_{\kappa_i} + \alpha_i E_{\kappa_i} \quad (1 \le \kappa_i \le n_i), \tag{12.135}
\]

where $A^{(\kappa)}$ denotes the $\kappa$-th Jordan block in $A^{(i)}$. In $A^{(\kappa)}$, $N_{\kappa_i}$ and $E_{\kappa_i}$ are a nilpotent $(\kappa_i, \kappa_i)$ matrix and a $(\kappa_i, \kappa_i)$ identity matrix, respectively. In $N_{\kappa_i}$, $\kappa_i$ zeros are displayed on the principal diagonal and entries of 1 are positioned on the matrix element next above the principal diagonal. All other entries are zero; see, e.g., a matrix of (12.122). As in Sect. 12.6.1, the number $\kappa_i$ is called a dimension of the Jordan block. Thus, $A^{(i)}$ of (12.102) can further be reduced to segmented matrices $A^{(\kappa)}$. Our next task is to find out how many Jordan blocks are contained in individual $A^{(i)}$ and what is the dimension of those Jordan blocks. Corresponding to (12.122), the matrix representation of the linear transformation by $A^{(\kappa)}$ with respect to the set of $\kappa_i$ vectors is

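\[
\begin{aligned}
&\left(\left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]^{\kappa_i-1}a_\sigma \ \ \left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]^{\kappa_i-2}a_\sigma \ \cdots \ a_\sigma\right)\left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right] \\
&\quad = \left(\left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]^{\kappa_i-1}a_\sigma \ \ \left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]^{\kappa_i-2}a_\sigma \ \cdots \ a_\sigma\right)
\begin{pmatrix}
0 & 1 & & & \\
 & 0 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & 0 & 1 \\
 & & & & 0
\end{pmatrix}.
\end{aligned} \tag{12.136}
\]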

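A vector $a_\sigma$ stands for a vector associated with the $\kappa$-th Jordan block of $A^{(i)}$. From (12.136) we obtain
\[
\left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]\left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]^{\kappa_i-1}a_\sigma = 0. \tag{12.137}
\]
Namely,
\[
A^{(\kappa)}\left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]^{\kappa_i-1}a_\sigma = \alpha_i\left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]^{\kappa_i-1}a_\sigma. \tag{12.138}
\]
This shows that $\left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]^{\kappa_i-1}a_\sigma$ is a proper eigenvector of $A^{(\kappa)}$ that corresponds to an eigenvalue $\alpha_i$. On the other hand, $a_\sigma$ is a generalized eigenvector of rank $\kappa_i$. There are another $(\kappa_i - 2)$ generalized eigenvectors $\left[A^{(\kappa)} - \alpha_i E_{\kappa_i}\right]^{\mu}a_\sigma$ ($1 \le \mu \le \kappa_i - 2$). In total, there are $\kappa_i$ eigenvectors [a proper eigenvector and $(\kappa_i - 1)$ generalized eigenvectors]. Also we see that the sole proper eigenvector can be found for each Jordan block. In reference to these $\kappa_i$ eigenvectors as the basis vectors, the $(\kappa_i, \kappa_i)$-matrix $A^{(\kappa)}$ (i.e., a Jordan block) is expressed as
\[
A^{(\kappa)} =
\begin{pmatrix}
\alpha_i & 1 & & & \\
 & \alpha_i & 1 & & \\
 & & \ddots & \ddots & \\
 & & & \alpha_i & 1 \\
 & & & & \alpha_i
\end{pmatrix}. \tag{12.139}
\]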
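The defining property of a Jordan block, namely that subtracting the eigenvalue leaves a nilpotent matrix whose order equals the block dimension, can be checked with a few lines of plain Python. The sketch below is only an illustration: the helper names `jordan_block`, `matmul`, `shifted_power`, and `is_zero`, and the sample values alpha = 2 and dimension 3, are assumptions introduced here and do not come from the text.

```python
def jordan_block(alpha, size):
    """Build a (size, size) Jordan block: alpha on the diagonal, 1 just above it."""
    return [[alpha if r == c else (1 if c == r + 1 else 0)
             for c in range(size)] for r in range(size)]


def matmul(X, Y):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


def shifted_power(block, alpha, power):
    """Return (block - alpha*E)^power, starting from the identity matrix."""
    size = len(block)
    shifted = [[block[r][c] - (alpha if r == c else 0) for c in range(size)]
               for r in range(size)]
    result = [[int(r == c) for c in range(size)] for r in range(size)]
    for _ in range(power):
        result = matmul(result, shifted)
    return result


def is_zero(M):
    """True if every entry of M vanishes."""
    return all(v == 0 for row in M for v in row)


block = jordan_block(2, 3)                   # illustrative values only
print(block)                                 # [[2, 1, 0], [0, 2, 1], [0, 0, 2]]
print(is_zero(shifted_power(block, 2, 2)))   # False: (block - 2E)^2 is not yet zero
print(is_zero(shifted_power(block, 2, 3)))   # True:  (block - 2E)^3 vanishes
```

The second power still carries a single 1 in the upper right corner, which corresponds to the proper eigenvector reached from the rank-3 generalized eigenvector in the sense of (12.137).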

A $(n_i, n_i)$ matrix $A^{(i)}$ of (12.102) pertinent to an eigenvalue $\alpha_i$ contains a direct sum of Jordan blocks whose dimension ranges from 1 to $n_i$. The largest possible number of Jordan blocks of dimension d (that satisfies $\left[\frac{n_i}{2}\right] + 1 \le d \le n_i$, where $[\mu]$ denotes a largest integer that does not exceed $\mu$) is at most one. An example depicted below is a matrix $A^{(i)}$ that explicitly includes two one-dimensional Jordan blocks, a (3, 3) three-dimensional Jordan block, and a (5, 5) five-dimensional Jordan block:
\[
A^{(i)} \sim \left(\begin{array}{cccccccccc}
\alpha_i & 0 & & & & & & & & \\
 & \alpha_i & 0 & & & & & & & \\
 & & \alpha_i & 1 & & & & & & \\
 & & & \alpha_i & 1 & & & & & \\
 & & & & \alpha_i & 0 & & & & \\
 & & & & & \alpha_i & 1 & & & \\
 & & & & & & \alpha_i & 1 & & \\
 & & & & & & & \alpha_i & 1 & \\
 & & & & & & & & \alpha_i & 1 \\
 & & & & & & & & & \alpha_i
\end{array}\right),
\]

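where $A^{(i)}$ is a (10, 10) upper triangle matrix in which $\alpha_i$ is displayed on the principal diagonal with entries 0 or 1 on the matrix element next above the principal diagonal with all other entries being zero. Theorem 12.1 shows that every (n, n) square matrix can be converted to a triangle matrix by suitable similarity transformation. Diagonal elements give eigenvalues. Furthermore, Theorem 12.5 ensures that A can be reduced to generalized eigenspaces $\widetilde{W}_{\alpha_i}$ ($1 \le i \le s$) according to individual eigenvalues. Suppose for example that after a suitable similarity transformation a full matrix A is represented as
\[
A \sim \left(\begin{array}{cccccccccccc}
\alpha_1 & & & & & & & & & & & \\
 & \alpha_2 & * & * & & & & & & & & \\
 & & \alpha_2 & * & & & & & & & & \\
 & & & \alpha_2 & & & & & & & & \\
 & & & & \ddots & & & & & & & \\
 & & & & & \alpha_i & * & * & * & & & \\
 & & & & & & \alpha_i & * & * & & & \\
 & & & & & & & \alpha_i & * & & & \\
 & & & & & & & & \alpha_i & & & \\
 & & & & & & & & & \ddots & & \\
 & & & & & & & & & & \alpha_s & * \\
 & & & & & & & & & & & \alpha_s
\end{array}\right), \tag{12.140}
\]
where asterisks mark entries of the triangle segments that may be nonzero and
\[
A = A^{(1)} \oplus A^{(2)} \oplus \cdots \oplus A^{(i)} \oplus \cdots \oplus A^{(s)}. \tag{12.141}
\]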

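In (12.141), $A^{(1)}$ is a (1, 1) matrix (i.e., simply a number); $A^{(2)}$ is a (3, 3) matrix; $A^{(i)}$ is a (4, 4) matrix; $A^{(s)}$ is a (2, 2) matrix. The above matrix form allows us to further deal with segmented triangle matrices separately. In the case of (12.140) we may use the following matrix for similarity transformation:
\[
\widetilde{P} = \left(\begin{array}{cccccccccccc}
1 & & & & & & & & & & & \\
 & 1 & & & & & & & & & & \\
 & & 1 & & & & & & & & & \\
 & & & 1 & & & & & & & & \\
 & & & & \ddots & & & & & & & \\
 & & & & & p_{11} & p_{12} & p_{13} & p_{14} & & & \\
 & & & & & p_{21} & p_{22} & p_{23} & p_{24} & & & \\
 & & & & & p_{31} & p_{32} & p_{33} & p_{34} & & & \\
 & & & & & p_{41} & p_{42} & p_{43} & p_{44} & & & \\
 & & & & & & & & & \ddots & & \\
 & & & & & & & & & & 1 & \\
 & & & & & & & & & & & 1
\end{array}\right),
\]
where a (4, 4) matrix P given by
\[
P = \begin{pmatrix}
p_{11} & p_{12} & p_{13} & p_{14} \\
p_{21} & p_{22} & p_{23} & p_{24} \\
p_{31} & p_{32} & p_{33} & p_{34} \\
p_{41} & p_{42} & p_{43} & p_{44}
\end{pmatrix}
\]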

is a non-singular matrix. The matrix P is to be operated on $A^{(i)}$ so that we can separately perform the similarity transformation with respect to a (4, 4) nilpotent matrix $A^{(i)} - \alpha_i E_4$ following the procedures mentioned in Sect. 12.6.1. Thus, only an $\alpha_i$-associated segment can be treated with other segments left unchanged. In a similar fashion, we can consecutively deal with matrix segments related to other eigenvalues. In a practical case, however, it is more convenient to seek different eigenvalues and corresponding (generalized) eigenvectors at once and convert the matrix to Jordan canonical form. To make a guess about the structure of a matrix, however, the following argument will be useful. Let us think of an example after that. Using (12.140), we have
\[
A - \alpha_i E \sim \left(\begin{array}{cccccccccccc}
\alpha_1 - \alpha_i & & & & & & & & & & & \\
 & \alpha_2 - \alpha_i & * & * & & & & & & & & \\
 & & \alpha_2 - \alpha_i & * & & & & & & & & \\
 & & & \alpha_2 - \alpha_i & & & & & & & & \\
 & & & & \ddots & & & & & & & \\
 & & & & & 0 & * & * & * & & & \\
 & & & & & & 0 & * & * & & & \\
 & & & & & & & 0 & * & & & \\
 & & & & & & & & 0 & & & \\
 & & & & & & & & & \ddots & & \\
 & & & & & & & & & & \alpha_s - \alpha_i & * \\
 & & & & & & & & & & & \alpha_s - \alpha_i
\end{array}\right).
\]

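Here we are treating the (n, n) matrix A on $V^n$. Note that a matrix $A - \alpha_i E$ is not nilpotent as a whole. Suppose that the multiplicity of $\alpha_i$ is $n_i$; in (12.140) $n_i = 4$. Since eigenvalues $\alpha_1, \alpha_2, \ldots,$ and $\alpha_s$ take different values from one another, $\alpha_p - \alpha_i \ne 0$ for every $p \ne i$. In a triangle matrix diagonal elements give eigenvalues and, hence, these differences $\alpha_p - \alpha_i$ ($p \ne i$) are nonzero eigenvalues of $A - \alpha_i E$. We rewrite (12.140) as
\[
A - \alpha_i E \sim
\begin{pmatrix}
M^{(1)} & & & & & \\
 & M^{(2)} & & & & \\
 & & \ddots & & & \\
 & & & M^{(i)} & & \\
 & & & & \ddots & \\
 & & & & & M^{(s)}
\end{pmatrix}, \tag{12.142}
\]
where
\[
M^{(1)} = (\alpha_1 - \alpha_i), \quad
M^{(2)} = \begin{pmatrix}
\alpha_2 - \alpha_i & * & * \\
 & \alpha_2 - \alpha_i & * \\
 & & \alpha_2 - \alpha_i
\end{pmatrix}, \ \ldots,
\]
\[
M^{(i)} = \begin{pmatrix}
0 & * & * & * \\
 & 0 & * & * \\
 & & 0 & * \\
 & & & 0
\end{pmatrix}, \ \ldots, \quad
M^{(s)} = \begin{pmatrix}
\alpha_s - \alpha_i & * \\
 & \alpha_s - \alpha_i
\end{pmatrix}.
\]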

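Thus, $M^{(p)}$ ($p \ne i$) is a non-singular matrix and $M^{(i)}$ is a nilpotent matrix. Note that if we can find $\mu_i$ such that $\left[M^{(i)}\right]^{\mu_i-1} \ne 0$ and $\left[M^{(i)}\right]^{\mu_i} = 0$, with a minimal polynomial $\varphi_{M^{(i)}}(x)$ for $M^{(i)}$, we have $\varphi_{M^{(i)}}(x) = x^{\mu_i}$. Consequently, we get
\[
(A - \alpha_i E)^{\mu_i} \sim
\begin{pmatrix}
\left[M^{(1)}\right]^{\mu_i} & & & & & \\
 & \left[M^{(2)}\right]^{\mu_i} & & & & \\
 & & \ddots & & & \\
 & & & 0 & & \\
 & & & & \ddots & \\
 & & & & & \left[M^{(s)}\right]^{\mu_i}
\end{pmatrix}. \tag{12.143}
\]
In (12.143) diagonal elements of non-singular triangle matrices $\left[M^{(p)}\right]^{\mu_i}$ ($p \ne i$) are $(\alpha_p - \alpha_i)^{\mu_i}$ ($\ne 0$). Thus, we have a "perforated" matrix $(A - \alpha_i E)^{\mu_i}$ where $\left[M^{(i)}\right]^{\mu_i} = 0$ in (12.143). Putting
\[
\Phi_A(x) \equiv \prod_{i=1}^{s} (x - \alpha_i)^{\mu_i},
\]
we get
\[
\Phi_A(A) \equiv \prod_{i=1}^{s} (A - \alpha_i E)^{\mu_i} = 0.
\]
A polynomial $\Phi_A(x)$ gives a minimal polynomial for $f_A(x)$. From the above argument, we can choose $\mu_i$ for $l_i$ in (12.82). Meanwhile, $M^{(i)}$ in (12.142) is identical to $N_i$ in (12.134). Rewriting (12.134), we get
\[
A^{(i)} = M^{(i)} + \alpha_i E_{n_i}. \tag{12.144}
\]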

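Let us think of matrices $\left[A^{(i)} - \alpha_i E_{n_i}\right]^k$ and $(A - \alpha_i E)^k$ ($k \le \mu_i$). From (11.45), we find
\[
\dim V^n = n = \dim \mathrm{Ker}\,(A - \alpha_i E)^k + \mathrm{rank}\,(A - \alpha_i E)^k, \tag{12.145}
\]
and
\[
\dim \widetilde{W}_{\alpha_i} = n_i = \dim \mathrm{Ker}\left[A^{(i)} - \alpha_i E_{n_i}\right]^k + \mathrm{rank}\left[A^{(i)} - \alpha_i E_{n_i}\right]^k, \tag{12.146}
\]
where $\mathrm{rank}\,(A - \alpha_i E)^k = \dim\,(A - \alpha_i E)^k(V^n)$ and $\mathrm{rank}\left[A^{(i)} - \alpha_i E_{n_i}\right]^k = \dim \left[A^{(i)} - \alpha_i E_{n_i}\right]^k\left(\widetilde{W}_{\alpha_i}\right)$. Noting that
\[
\dim \mathrm{Ker}\,(A - \alpha_i E)^k = \dim \mathrm{Ker}\left[A^{(i)} - \alpha_i E_{n_i}\right]^k, \tag{12.147}
\]
we get
\[
n - \mathrm{rank}\,(A - \alpha_i E)^k = n_i - \mathrm{rank}\left[A^{(i)} - \alpha_i E_{n_i}\right]^k. \tag{12.148}
\]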

This notable property comes from the non-singularity of $\left[M^{(p)}\right]^k$ ($p \ne i$, k: a positive integer); i.e., all the eigenvalues of $\left[M^{(p)}\right]^k$ are nonzero. In particular, as $\mathrm{rank}\left[A^{(i)} - \alpha_i E_{n_i}\right]^{\mu_i} = 0$, from (12.148) we have
\[
\mathrm{rank}\,(A - \alpha_i E)^l = n - n_i \quad (l \ge \mu_i). \tag{12.149}
\]
Meanwhile, putting $k = 1$ in (12.148) and using (12.132) we get
\[
\dim \mathrm{Ker}\left[A^{(i)} - \alpha_i E_{n_i}\right] = n_i - \mathrm{rank}\left[A^{(i)} - \alpha_i E_{n_i}\right] = n - \mathrm{rank}\,(A - \alpha_i E) = \dim \mathrm{Ker}\,(A - \alpha_i E) = r_1^{(i)}. \tag{12.150}
\]
The value $r_1^{(i)}$ gives the number of Jordan blocks with an eigenvalue $\alpha_i$. Moreover, we must consider the following situation. We know how the matrix $A^{(i)}$ in (12.134) is reduced to Jordan blocks of lower dimension. To get detailed information about it, however, we have to get the information about (generalized) eigenvectors corresponding to eigenvalues other than $\alpha_i$. In this context, (12.148) is useful. Equation (12.131) tells how the number of Jordan blocks in a nilpotent matrix is determined. If we can get this knowledge before we have found out all the (generalized) eigenvectors, it will be easier to address the problem. Let us rewrite (12.131) as
\[
J_q^{(i)} = r_q - r_{q+1} = \mathrm{rank}\left[A^{(i)} - \alpha_i E_{n_i}\right]^{q-1} + \mathrm{rank}\left[A^{(i)} - \alpha_i E_{n_i}\right]^{q+1} - 2\,\mathrm{rank}\left[A^{(i)} - \alpha_i E_{n_i}\right]^{q}, \tag{12.151}
\]

where we define $J_q^{(i)}$ as the number of the q-th order Jordan blocks within $A^{(i)}$. Note that these blocks are expressed as (q, q) matrices. Meanwhile, using (12.148) $J_q^{(i)}$ is expressed as
\[
J_q^{(i)} = \mathrm{rank}\,(A - \alpha_i E)^{q-1} + \mathrm{rank}\,(A - \alpha_i E)^{q+1} - 2\,\mathrm{rank}\,(A - \alpha_i E)^{q}. \tag{12.152}
\]
This relation can be obtained by replacing k in (12.148) with $q - 1$, $q + 1$, and q, respectively, and deleting n and $n_i$ from these three relations. This enables us to gain access to a whole structure of the linear transformation represented by the (n, n) matrix A without reducing it to subspaces. To enrich our understanding of Jordan canonical forms, the following tangible example will be beneficial.

12.6.3 Example of Jordan Canonical Form

Let us think of the following matrix A:
\[
A = \begin{pmatrix}
1 & -1 & 0 & 0 \\
1 & 3 & 0 & 0 \\
0 & 0 & 2 & 0 \\
-3 & -1 & -2 & 4
\end{pmatrix}. \tag{12.153}
\]
The characteristic polynomial $f_A(x)$ is given by
\[
f_A(x) = \begin{vmatrix}
x - 1 & 1 & 0 & 0 \\
-1 & x - 3 & 0 & 0 \\
0 & 0 & x - 2 & 0 \\
3 & 1 & 2 & x - 4
\end{vmatrix} = (x - 4)(x - 2)^3. \tag{12.154}
\]
Equating (12.154) to zero, we get an eigenvalue 4 as a simple root and an eigenvalue 2 as a triple root. The vector space $V^4$ is then decomposed into two invariant subspaces. The first is a one-dimensional kernel (or null-space) of the transformation $(A - 4E)$ and the other is a three-dimensional kernel of the transformation $(A - 2E)^3$. We have to seek eigenvectors that span these invariant subspaces.
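The factorization in (12.154) can be checked by expanding the determinant along its fourth column, whose only nonzero entry is x - 4. The intermediate steps below are a sketch and assume the entries of A are as given in (12.153), with the determinant taken as det(xE - A):

```latex
\begin{aligned}
f_A(x) &= (x - 4)
\begin{vmatrix}
x - 1 & 1 & 0 \\
-1 & x - 3 & 0 \\
0 & 0 & x - 2
\end{vmatrix}
= (x - 4)(x - 2)\left[(x - 1)(x - 3) + 1\right] \\
&= (x - 4)(x - 2)\left(x^2 - 4x + 4\right) = (x - 4)(x - 2)^3 .
\end{aligned}
```

The quadratic factor comes entirely from the upper-left (2, 2) segment of A, which is why the eigenvalue 2 appears with multiplicity three once the diagonal entry 2 of the third row is included.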

(i) x = 4: An eigenvector belonging to the first invariant subspace must satisfy a proper eigenvalue equation since the eigenvalue 4 is a simple root. This equation is expressed as
\[
(A - 4E)x = 0.
\]
This reads in a matrix form as
\[
\begin{pmatrix}
-3 & -1 & 0 & 0 \\
1 & -1 & 0 & 0 \\
0 & 0 & -2 & 0 \\
-3 & -1 & -2 & 0
\end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix} = 0. \tag{12.155}
\]
This is equivalent to the following set of four equations:
\[
-3c_1 - c_2 = 0, \quad c_1 - c_2 = 0, \quad -2c_3 = 0, \quad -3c_1 - c_2 - 2c_3 = 0.
\]
These are equivalent to $c_3 = 0$ and $c_1 = c_2 = -3c_1$. Therefore, $c_1 = c_2 = c_3 = 0$ with an arbitrarily chosen number of $c_4$, which is chosen as 1 as usual. Hence, designating the proper eigenvector as $e_1^{(4)}$, its column vector representation is
\[
e_1^{(4)} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.
\]
A (4, 4) matrix in (12.155) representing $(A - 4E)$ has a rank 3. The number of Jordan blocks for an eigenvalue 4 is given by (12.150) as
\[
r_1^{(4)} = 4 - \mathrm{rank}\,(A - 4E) = 1. \tag{12.156}
\]
In this case, the Jordan block is naturally one-dimensional. In fact, using (12.152) we have
\[
J_1^{(4)} = \mathrm{rank}\,(A - 4E)^0 + \mathrm{rank}\,(A - 4E)^2 - 2\,\mathrm{rank}\,(A - 4E) = 4 + 3 - 2 \times 3 = 1. \tag{12.157}
\]

In (12.157), $J_1^{(4)}$ gives the number of the first-order Jordan blocks for an eigenvalue 4. We used
\[
(A - 4E)^2 = \begin{pmatrix}
8 & 4 & 0 & 0 \\
-4 & 0 & 0 & 0 \\
0 & 0 & 4 & 0 \\
8 & 4 & 4 & 0
\end{pmatrix} \tag{12.158}
\]
and confirmed that $\mathrm{rank}\,(A - 4E)^2 = 3$.

(ii) x = 2: The eigenvalue 2 is a triple root. Therefore, we must examine how the invariant subspaces can further be decomposed to subspaces of lower dimension. To this end we first start with a secular equation expressed as
\[
(A - 2E)x = 0. \tag{12.159}
\]

The matrix representation is
\[
\begin{pmatrix}
-1 & -1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 \\
-3 & -1 & -2 & 2
\end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix} = 0.
\]
This is equivalent to the following set of two equations:
\[
c_1 + c_2 = 0, \quad -3c_1 - c_2 - 2c_3 + 2c_4 = 0.
\]
From the above, we can put $c_1 = c_2 = 0$ and $c_3 = c_4$ (= 1). The equations allow the existence of another proper eigenvector. For this we have $c_1 = -c_2 = 1$, $c_3 = 0$, and $c_4 = 1$. Thus, for the two proper eigenvectors corresponding to an eigenvalue 2, we get
\[
e_1^{(2)} = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}, \qquad
e_2^{(2)} = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 1 \end{pmatrix}.
\]

A dimension of the invariant subspace corresponding to an eigenvalue 2 is three (due to the triple root) and, hence, there should be one generalized eigenvector. To determine it, we examine the following matrix equation:
\[
(A - 2E)^2 x = 0. \tag{12.160}
\]
The matrix representation is
\[
\begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
-4 & 0 & -4 & 4
\end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix} = 0.
\]
That is,
\[
-c_1 - c_3 + c_4 = 0. \tag{12.161}
\]
Furthermore, we have
\[
(A - 2E)^3 = \begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
-8 & 0 & -8 & 8
\end{pmatrix}. \tag{12.162}
\]

Moreover, $\mathrm{rank}\,(A - 2E)^l$ (= 1) remains unchanged for $l \ge 2$ as expected from (12.149). It will be convenient to examine a structure of the invariant subspace. For this purpose, we seek the number of Jordan blocks $r_1^{(2)}$ and their order. Using (12.150), we have
\[
r_1^{(2)} = 4 - \mathrm{rank}\,(A - 2E) = 4 - 2 = 2. \tag{12.163}
\]
The number of first-order Jordan blocks is
\[
J_1^{(2)} = \mathrm{rank}\,(A - 2E)^0 + \mathrm{rank}\,(A - 2E)^2 - 2\,\mathrm{rank}\,(A - 2E) = 4 + 1 - 2 \times 2 = 1. \tag{12.164}
\]
In turn, the number of second-order Jordan blocks is
\[
J_2^{(2)} = \mathrm{rank}\,(A - 2E) + \mathrm{rank}\,(A - 2E)^3 - 2\,\mathrm{rank}\,(A - 2E)^2 = 2 + 1 - 2 \times 1 = 1. \tag{12.165}
\]
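The rank bookkeeping of (12.156) to (12.165) can be reproduced mechanically. The following sketch applies the rank formula (12.152) to the matrix of this example, using exact rational arithmetic from the standard library. The function names `rank`, `matmul`, and `block_count` are introduced here for illustration only, and the signs of the entries of A follow the reading of (12.153) that yields the characteristic polynomial (x - 4)(x - 2)^3.

```python
from fractions import Fraction


def matmul(X, Y):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


def rank(M):
    """Rank of M by Gauss-Jordan elimination over exact fractions."""
    rows = [[Fraction(v) for v in row] for row in M]
    found = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(len(rows)):
            if r != found and rows[r][col] != 0:
                factor = rows[r][col] / rows[found][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def block_count(A, alpha, q):
    """Number of q-th order Jordan blocks for eigenvalue alpha, following (12.152)."""
    n = len(A)
    shifted = [[A[r][c] - (alpha if r == c else 0) for c in range(n)] for r in range(n)]
    powers = [[[int(r == c) for c in range(n)] for r in range(n)]]  # (A - alpha E)^0 = E
    for _ in range(q + 1):
        powers.append(matmul(powers[-1], shifted))
    return rank(powers[q - 1]) + rank(powers[q + 1]) - 2 * rank(powers[q])


A = [[1, -1, 0, 0],
     [1, 3, 0, 0],
     [0, 0, 2, 0],
     [-3, -1, -2, 4]]

print(block_count(A, 4, 1))  # 1 first-order block for eigenvalue 4
print(block_count(A, 2, 1))  # 1 first-order block for eigenvalue 2
print(block_count(A, 2, 2))  # 1 second-order block for eigenvalue 2
print(block_count(A, 2, 3))  # 0 third-order blocks for eigenvalue 2
```

Exact fractions are used instead of floating-point numbers so that the rank test `!= 0` is reliable; for larger or non-integer matrices a numerical library with a tolerance-based rank would be the usual choice.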

In the above, $J_1^{(2)}$ and $J_2^{(2)}$ are obtained from (12.152). Thus, Fig. 12.2 gives a constitution of Jordan blocks for eigenvalues 4 and 2. The overall number of Jordan blocks is three; the number of the first-order and second-order Jordan blocks is two and one, respectively. The proper eigenvector $e_1^{(2)}$ is related to $J_1^{(2)}$ of (12.164). A set of the proper eigenvector $e_2^{(2)}$ and the corresponding generalized eigenvector $g_2^{(2)}$ is pertinent to $J_2^{(2)}$ of (12.165). We must have the generalized eigenvector $g_2^{(2)}$ in such a way that
\[
(A - 2E)^2 g_2^{(2)} = (A - 2E)\left[(A - 2E)g_2^{(2)}\right] = (A - 2E)e_2^{(2)} = 0. \tag{12.166}
\]
From (12.161), we can put $c_1 = c_3 = c_4 = 0$ and $c_2 = -1$. Thus, the matrix representation of the generalized eigenvector $g_2^{(2)}$ is
\[
g_2^{(2)} = \begin{pmatrix} 0 \\ -1 \\ 0 \\ 0 \end{pmatrix}. \tag{12.167}
\]
We stress here that $e_1^{(2)}$ is not eligible for a proper pair with $g_2^{(2)}$ in $J_2$. It is because from (12.166) we have
\[
(A - 2E)g_2^{(2)} = e_2^{(2)}, \qquad (A - 2E)g_2^{(2)} \ne e_1^{(2)}. \tag{12.168}
\]

Thus, we have determined a set of (generalized) eigenvectors. The matrix representation R for the basis vectors transformation is given by
\[
R = \begin{pmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & -1 & -1 \\
0 & 1 & 0 & 0 \\
1 & 1 & 1 & 0
\end{pmatrix}
\equiv \left(\tilde{e}_1^{(4)} \ \ \tilde{e}_1^{(2)} \ \ \tilde{e}_2^{(2)} \ \ \tilde{g}_2^{(2)}\right), \tag{12.169}
\]
where the symbol ~ denotes the column vector representation; $e_1^{(4)}$, $e_1^{(2)}$, and $e_2^{(2)}$ represent proper eigenvectors and $g_2^{(2)}$ is a generalized eigenvector. Performing similarity transformation using this R, we get the following Jordan canonical form:

\[
R^{-1}AR =
\begin{pmatrix}
-1 & 0 & -1 & 1 \\
0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 \\
-1 & -1 & 0 & 0
\end{pmatrix}
\begin{pmatrix}
1 & -1 & 0 & 0 \\
1 & 3 & 0 & 0 \\
0 & 0 & 2 & 0 \\
-3 & -1 & -2 & 4
\end{pmatrix}
\begin{pmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & -1 & -1 \\
0 & 1 & 0 & 0 \\
1 & 1 & 1 & 0
\end{pmatrix}
=
\begin{pmatrix}
4 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 \\
0 & 0 & 2 & 1 \\
0 & 0 & 0 & 2
\end{pmatrix}. \tag{12.170}
\]

Fig. 12.2 Structure of Jordan blocks of a matrix shown in (12.170)
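The similarity transformation of (12.170) involves only integer matrices, so it can be verified exactly. The sketch below multiplies the three matrices and also checks that the stated inverse really is the inverse of R. The helper `matmul` is an assumption of this illustration, and the signs of the entries are those that make A R equal R times the Jordan form.

```python
def matmul(X, Y):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


A = [[1, -1, 0, 0],
     [1, 3, 0, 0],
     [0, 0, 2, 0],
     [-3, -1, -2, 4]]

# Columns of R: e_1^(4), e_1^(2), e_2^(2), g_2^(2) in this order.
R = [[0, 0, 1, 0],
     [0, 0, -1, -1],
     [0, 1, 0, 0],
     [1, 1, 1, 0]]

R_inv = [[-1, 0, -1, 1],
         [0, 0, 1, 0],
         [1, 0, 0, 0],
         [-1, -1, 0, 0]]

identity = [[int(r == c) for c in range(4)] for r in range(4)]
jordan = matmul(matmul(R_inv, A), R)

print(matmul(R_inv, R) == identity)  # True: R_inv is the inverse of R
for row in jordan:
    print(row)
# [4, 0, 0, 0]
# [0, 2, 0, 0]
# [0, 0, 2, 1]
# [0, 0, 0, 2]

# The trace is invariant under the similarity transformation.
print(sum(A[k][k] for k in range(4)), sum(jordan[k][k] for k in range(4)))  # 10 10
```

Reordering the columns of R permutes the Jordan blocks along the diagonal but does not change their number or sizes.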

The structure of Jordan blocks is shown in Fig. 12.2. Notice here that the trace of A remains unchanged before and after similarity transformation. Next, we consider column vector representations. According to (11.37), let us view the matrix A as a linear transformation over $V^4$. Then A is given by
\[
A(x) = (e_1 \ e_2 \ e_3 \ e_4)\, A \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= (e_1 \ e_2 \ e_3 \ e_4)
\begin{pmatrix}
1 & -1 & 0 & 0 \\
1 & 3 & 0 & 0 \\
0 & 0 & 2 & 0 \\
-3 & -1 & -2 & 4
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}, \tag{12.171}
\]
where $e_1$, $e_2$, $e_3$, and $e_4$ are basis vectors and $x_1$, $x_2$, $x_3$, and $x_4$ are corresponding coordinates of a vector $x = \sum_{i=1}^{n} x_i e_i \in V^4$. We rewrite (12.171) as

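\[
A(x) = (e_1 \ e_2 \ e_3 \ e_4)\, R R^{-1} A R R^{-1} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= \left(e_1^{(4)} \ e_1^{(2)} \ e_2^{(2)} \ g_2^{(2)}\right)
\begin{pmatrix}
4 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 \\
0 & 0 & 2 & 1 \\
0 & 0 & 0 & 2
\end{pmatrix}
\begin{pmatrix} x_1' \\ x_2' \\ x_3' \\ x_4' \end{pmatrix}, \tag{12.172}
\]
where we have
\[
\left(e_1^{(4)} \ e_1^{(2)} \ e_2^{(2)} \ g_2^{(2)}\right) = (e_1 \ e_2 \ e_3 \ e_4)\, R, \qquad
\begin{pmatrix} x_1' \\ x_2' \\ x_3' \\ x_4' \end{pmatrix} = R^{-1} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}. \tag{12.173}
\]
After (11.84), we have
\[
x = (e_1 \ e_2 \ e_3 \ e_4) \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= (e_1 \ e_2 \ e_3 \ e_4)\, R R^{-1} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= \left(e_1^{(4)} \ e_1^{(2)} \ e_2^{(2)} \ g_2^{(2)}\right) \begin{pmatrix} x_1' \\ x_2' \\ x_3' \\ x_4' \end{pmatrix}. \tag{12.174}
\]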

As for $V^n$ in general, let us put $R = (p)_{ij}$ and $R^{-1} = (q)_{ij}$. Also represent j-th (generalized) eigenvectors by a column vector and denote them by $p^{(j)}$. There we display individual (generalized) eigenvectors in the order of $(e^{(1)} \ e^{(2)} \cdots e^{(j)} \cdots e^{(n)})$, where $e^{(j)}$ ($1 \le j \le n$) denotes either a proper eigenvector or a generalized eigenvector according to (12.174). Each $p^{(j)}$ is represented in reference to original basis vectors $(e_1 \cdots e_n)$. Then we have
\[
R^{-1}p^{(j)} = \sum_{k=1}^{n} q_{ik}\, p_k^{(j)} = \delta_i^{(j)}, \tag{12.175}
\]
where $\delta_i^{(j)}$ denotes a column vector to which only the j-th row is 1, otherwise 0. Thus, a column vector $\delta_i^{(j)}$ is an "address" of $e^{(j)}$ in reference to $(e^{(1)} \ e^{(2)} \cdots e^{(j)} \cdots e^{(n)})$ taken as basis vectors. In our present case, in fact, we have
\[
\begin{aligned}
R^{-1}p^{(1)} &= \begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, &
R^{-1}p^{(2)} &= \begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \\
R^{-1}p^{(3)} &= \begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 1 \\ -1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, &
R^{-1}p^{(4)} &= \begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 \\ -1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.
\end{aligned} \tag{12.176}
\]

In (12.176) $p^{(1)}$ is a column vector representation of $e_1^{(4)}$; $p^{(2)}$ is a column vector representation of $e_1^{(2)}$, and so on. A minimal polynomial $\Phi_A(x)$ is expressed as
\[
\Phi_A(x) = (x - 4)(x - 2)^2.
\]
Readers can easily make sure of it. We remark that a non-singular matrix R pertinent to the similarity transformation is not uniquely determined, but we have arbitrariness. In (12.169), for example, if we adopt $-g_2^{(2)}$ instead of $g_2^{(2)}$, we should adopt $-e_2^{(2)}$ instead of $e_2^{(2)}$ accordingly. Thus, instead of R in (12.169) we may choose
\[
R' = \begin{pmatrix}
0 & 0 & -1 & 0 \\
0 & 0 & 1 & 1 \\
0 & 1 & 0 & 0 \\
1 & 1 & -1 & 0
\end{pmatrix}. \tag{12.177}
\]

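In this case, we also get the same Jordan canonical form as before. That is,
\[
R'^{-1}AR' = \begin{pmatrix}
4 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 \\
0 & 0 & 2 & 1 \\
0 & 0 & 0 & 2
\end{pmatrix}.
\]
Suppose that we choose $R''$ such that
\[
(e_1 \ e_2 \ e_3 \ e_4)\, R'' = (e_1 \ e_2 \ e_3 \ e_4)
\begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & -1 & -1 & 0 \\
1 & 0 & 0 & 0 \\
1 & 1 & 0 & 1
\end{pmatrix}
= \left(e_1^{(2)} \ e_2^{(2)} \ g_2^{(2)} \ e_1^{(4)}\right).
\]
In this case, we have
\[
R''^{-1}AR'' = \begin{pmatrix}
2 & 0 & 0 & 0 \\
0 & 2 & 1 & 0 \\
0 & 0 & 2 & 0 \\
0 & 0 & 0 & 4
\end{pmatrix}.
\]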

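Note that we get a different disposition of the matrix elements from that of (12.172). Next, we decompose A into a semi-simple matrix and a nilpotent matrix. In (12.172), we had
\[
R^{-1}AR = \begin{pmatrix}
4 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 \\
0 & 0 & 2 & 1 \\
0 & 0 & 0 & 2
\end{pmatrix}.
\]
Defining
\[
S = \begin{pmatrix}
4 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 \\
0 & 0 & 2 & 0 \\
0 & 0 & 0 & 2
\end{pmatrix}
\quad \text{and} \quad
N = \begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0
\end{pmatrix},
\]
we have
\[
R^{-1}AR = S + N, \quad \text{i.e.,} \quad A = R(S + N)R^{-1} = RSR^{-1} + RNR^{-1}.
\]
Performing the above matrix calculations and putting $\widetilde{S} = RSR^{-1}$ and $\widetilde{N} = RNR^{-1}$, we get
\[
A = \widetilde{S} + \widetilde{N}
\]
with
\[
\widetilde{S} = \begin{pmatrix}
2 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 \\
0 & 0 & 2 & 0 \\
-2 & 0 & -2 & 4
\end{pmatrix}
\quad \text{and} \quad
\widetilde{N} = \begin{pmatrix}
-1 & -1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 \\
-1 & -1 & 0 & 0
\end{pmatrix}.
\]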

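That is, we have
\[
A = \begin{pmatrix}
1 & -1 & 0 & 0 \\
1 & 3 & 0 & 0 \\
0 & 0 & 2 & 0 \\
-3 & -1 & -2 & 4
\end{pmatrix}
= \begin{pmatrix}
2 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 \\
0 & 0 & 2 & 0 \\
-2 & 0 & -2 & 4
\end{pmatrix}
+ \begin{pmatrix}
-1 & -1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 \\
-1 & -1 & 0 & 0
\end{pmatrix}. \tag{12.178}
\]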
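The properties that characterize the decomposition (12.178) can be tested directly: the two parts add up to A, they commute, the nilpotent part squares to zero, and the semi-simple part is annihilated by a polynomial without multiple roots. The sketch below performs these checks with integer arithmetic; the helper names `matmul`, `shift`, and `is_zero` are assumptions made for this illustration, and the matrix entries use the signs for which the sum reproduces A. The last two lines also confirm the minimal polynomial (x - 4)(x - 2)^2 of A.

```python
def matmul(X, Y):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


def shift(M, c):
    """Return M - c*E."""
    return [[M[r][k] - (c if r == k else 0) for k in range(len(M))]
            for r in range(len(M))]


def is_zero(M):
    """True if every entry of M vanishes."""
    return all(v == 0 for row in M for v in row)


A = [[1, -1, 0, 0], [1, 3, 0, 0], [0, 0, 2, 0], [-3, -1, -2, 4]]
S = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [-2, 0, -2, 4]]    # semi-simple part
N = [[-1, -1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [-1, -1, 0, 0]]  # nilpotent part

total = [[S[r][c] + N[r][c] for c in range(4)] for r in range(4)]
print(total == A)                                   # True: A = S + N
print(matmul(S, N) == matmul(N, S))                 # True: S and N commute
print(is_zero(matmul(N, N)))                        # True: N^2 = 0
print(is_zero(matmul(shift(S, 4), shift(S, 2))))    # True: (S - 4E)(S - 2E) = 0

A4, A2 = shift(A, 4), shift(A, 2)
print(is_zero(matmul(A4, A2)))                      # False: (x - 4)(x - 2) is not enough
print(is_zero(matmul(A4, matmul(A2, A2))))          # True: (A - 4E)(A - 2E)^2 = 0
```

Commutativity of the two parts is what singles out this decomposition among the many ways of writing A as a sum of a diagonalizable and a nilpotent matrix.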

Even though matrix forms S and N differ depending on the choice of different matrix forms of similarity transformation R (namely, $R'$, $R''$, etc. represented above), the decomposition (12.178) is unique. That is, $\widetilde{S}$ and $\widetilde{N}$ are uniquely determined, once a matrix A is given. The confirmation is left for readers as an exercise. We present another simple example. Let A be a matrix described as
\[
A = \begin{pmatrix} 0 & -4 \\ 1 & 4 \end{pmatrix}.
\]
Eigenvalues of A are 2 as a double root. According to routine, we have an eigenvector $e_1^{(2)}$ as a column vector, e.g.,
\[
e_1^{(2)} = \begin{pmatrix} 2 \\ -1 \end{pmatrix}.
\]
Another eigenvector is a generalized eigenvector $g_1^{(2)}$ of rank 2. This can be decided such that
\[
(A - 2E)\,g_1^{(2)} = \begin{pmatrix} -2 & -4 \\ 1 & 2 \end{pmatrix} g_1^{(2)} = e_1^{(2)}.
\]
As an option, we get
\[
g_1^{(2)} = \begin{pmatrix} 1 \\ -1 \end{pmatrix}.
\]
Thus, we can choose R for a transformation matrix together with an inverse matrix $R^{-1}$ such that
\[
R = \begin{pmatrix} 2 & 1 \\ -1 & -1 \end{pmatrix}, \quad \text{and} \quad
R^{-1} = \begin{pmatrix} 1 & 1 \\ -1 & -2 \end{pmatrix}.
\]
Therefore, with a Jordan canonical form we have
\[
R^{-1}AR = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}. \tag{12.179}
\]

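As before, putting
\[
S = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \quad \text{and} \quad
N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},
\]
we have
\[
R^{-1}AR = S + N, \quad \text{i.e.,} \quad A = R(S + N)R^{-1} = RSR^{-1} + RNR^{-1}.
\]
Putting $\widetilde{S} = RSR^{-1}$ and $\widetilde{N} = RNR^{-1}$, we get
\[
A = \widetilde{S} + \widetilde{N}
\]
with
\[
\widetilde{S} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \quad \text{and} \quad
\widetilde{N} = \begin{pmatrix} -2 & -4 \\ 1 & 2 \end{pmatrix}. \tag{12.180}
\]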
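For this (2, 2) example the claims (12.179) and (12.180) can be confirmed by hand. The sketch below assumes the signs A = ((0, -4), (1, 4)) and R = ((2, 1), (-1, -1)), which are the ones consistent with a double eigenvalue 2, and checks the equivalent relation AR = R(S + N) together with the nilpotency of the second part:

```latex
\begin{aligned}
AR &= \begin{pmatrix} 0 & -4 \\ 1 & 4 \end{pmatrix}
      \begin{pmatrix} 2 & 1 \\ -1 & -1 \end{pmatrix}
    = \begin{pmatrix} 4 & 4 \\ -2 & -3 \end{pmatrix}, \\
R\,(S + N) &= \begin{pmatrix} 2 & 1 \\ -1 & -1 \end{pmatrix}
      \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}
    = \begin{pmatrix} 4 & 4 \\ -2 & -3 \end{pmatrix}, \\
\widetilde{N}^{2} &= \begin{pmatrix} -2 & -4 \\ 1 & 2 \end{pmatrix}^{2}
    = \begin{pmatrix} 4 - 4 & 8 - 8 \\ -2 + 2 & -4 + 4 \end{pmatrix}
    = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
\end{aligned}
```

Because the semi-simple part here is 2E, which commutes with every matrix, the nilpotent part is simply A - 2E regardless of which admissible R is chosen.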

We may also choose $R'$ for a transformation matrix together with an inverse matrix $R'^{-1}$ instead of R and $R^{-1}$, respectively, such that we have, e.g.,
\[
R' = \begin{pmatrix} 2 & -3 \\ -1 & 1 \end{pmatrix}, \quad \text{and} \quad
R'^{-1} = \begin{pmatrix} -1 & -3 \\ -1 & -2 \end{pmatrix}.
\]
Using these matrices, we get exactly the same Jordan canonical form and the matrix decomposition as (12.179) and (12.180). Thus, again we find that the matrix decomposition is unique. Another simple example is a lower triangle matrix
\[
A = \begin{pmatrix}
2 & 0 & 0 \\
-2 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}.
\]

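Following now familiar procedures, as a diagonalizing matrix we have, e.g.,
\[
R = \begin{pmatrix}
1 & 0 & 0 \\
-2 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}
\quad \text{and} \quad
R^{-1} = \begin{pmatrix}
1 & 0 & 0 \\
2 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}.
\]
Then, we get
\[
R^{-1}AR = S = \begin{pmatrix}
2 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}.
\]
Therefore, the "decomposition" is
\[
A = RSR^{-1} = \begin{pmatrix}
2 & 0 & 0 \\
-2 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}
+ \begin{pmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{pmatrix},
\]
where the first term is a semi-simple matrix and the second is a nilpotent matrix (i.e., zero matrix). Thus, the decomposition is once again unique.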

12.7 Diagonalizable Matrices

Among canonical forms of matrices, the simplest form is a diagonalizable matrix. Here we define a diagonalizable matrix as a matrix that can be converted to that whose off-diagonal elements are zero. In Sect. 12.5 we have investigated different properties of the matrices. In this section we examine basic properties of diagonalizable matrices.

In Sect. 12.6.1 we have shown that $\mathrm{Span}\{N^{i-1}a_\rho, N^{i-2}a_\rho, \ldots, Na_\rho, a_\rho\}$ forms a N-invariant subspace of a dimension i, where $a_\rho$ satisfies the relations $N^i a_\rho = 0$ and $N^{i-1}a_\rho \ne 0$ ($r_{i+1} + 1 \le \rho \le r_i$) as in (12.122). Of these vectors, only $N^{i-1}a_\rho$ is a sole proper eigenvector that is accompanied by $(i - 1)$ generalized eigenvectors. Note that only the proper eigenvector can construct a one-dimensional N-invariant subspace by itself. This is because regarding other generalized eigenvectors g (here g stands for all of generalized eigenvectors), $Ng$ ($\ne 0$) and g are linearly independent. Note that with a proper eigenvector e, we have $Ne = 0$. A corresponding Jordan block is represented by a matrix as given in (12.122) in reference to the basis vectors comprising these i eigenvectors.

Therefore, if a (n, n) matrix A has only proper eigenvectors, all Jordan blocks are one-dimensional. This means that A is diagonalizable. That A has only proper eigenvectors is equivalent to that those eigenvectors form a one-dimensional subspace and that $V^n$ is a direct sum of the subspaces spanned by individual proper eigenvectors. In other words, if $V^n$ is a direct sum of subspaces (i.e., eigenspaces) spanned by individual proper eigenvectors of A, A is diagonalizable.

Next, suppose that A is diagonalizable. Then, after an appropriate similarity transformation with a non-singular matrix P, A has the following form:
\[
P^{-1}AP = \begin{pmatrix}
\alpha_1 & & & & & & & & & \\
 & \ddots & & & & & & & & \\
 & & \alpha_1 & & & & & & & \\
 & & & \alpha_2 & & & & & & \\
 & & & & \ddots & & & & & \\
 & & & & & \alpha_2 & & & & \\
 & & & & & & \ddots & & & \\
 & & & & & & & \alpha_s & & \\
 & & & & & & & & \ddots & \\
 & & & & & & & & & \alpha_s
\end{pmatrix}. \tag{12.181}
\]

In this case, let us examine what form a minimal polynomial $\varphi_A(x)$ for A takes. A characteristic polynomial $f_A(x)$ for A is invariant through similarity transformation, so is $\varphi_A(x)$. That is,
\[
\varphi_{P^{-1}AP}(x) = \varphi_A(x). \tag{12.182}
\]
From (12.181), we find that $A - \alpha_i E$ ($1 \le i \le s$) has a "perforated" form such as (12.143) with the diagonalized form unchanged. Then we have
\[
(A - \alpha_1 E)(A - \alpha_2 E) \cdots (A - \alpha_s E) = 0. \tag{12.183}
\]
This is because a product of matrices having only diagonal elements is merely a product of individual diagonal elements. Meanwhile, in virtue of the Hamilton–Cayley theorem, we have
\[
f_A(A) = \prod_{i=1}^{s} (A - \alpha_i E)^{n_i} = 0.
\]
Rewriting this expression, we have
\[
(A - \alpha_1 E)^{n_1}(A - \alpha_2 E)^{n_2} \cdots (A - \alpha_s E)^{n_s} = 0. \tag{12.184}
\]
In light of (12.183), this implies that a minimal polynomial $\varphi_A(x)$ is expressed as
\[
\varphi_A(x) = (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_s). \tag{12.185}
\]
Surely $\varphi_A(x)$ of (12.185) has a lowest order polynomial among those satisfying $f(A) = 0$ and is a divisor of $f_A(x)$. Also $\varphi_A(x)$ has a highest-order coefficient of 1. Thus, $\varphi_A(x)$ should be a minimal polynomial of A and we conclude that $\varphi_A(x)$ does not have a multiple root.

Then let us think how $V^n$ is characterized in case $\varphi_A(x)$ does not have a multiple root. This is equivalent to that $\varphi_A(x)$ is described by (12.185). To see this, suppose that we have two matrices A and B and let $BV^n = W$. We wish to use the following relation:
\[
\begin{aligned}
\mathrm{rank}\,(AB) &= \dim ABV^n = \dim AW = \dim W - \dim\left(A^{-1}\{0\} \cap W\right) \\
&\ge \dim W - \dim A^{-1}\{0\} = \dim BV^n - (n - \dim AV^n) \\
&= \mathrm{rank}\, A + \mathrm{rank}\, B - n.
\end{aligned} \tag{12.186}
\]

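In (12.186), the third equality comes from the fact that the domain of A is restricted to W. Concomitantly, $A^{-1}\{0\}$ is restricted to $A^{-1}\{0\} \cap W$ as well; notice that $A^{-1}\{0\} \cap W$ is a subspace. Considering these situations, we use a relation corresponding to that of (11.45). The fourth equality is due to the dimension theorem of (11.45). Applying (12.186) to (12.183) successively, we have
\[
\begin{aligned}
0 &= \mathrm{rank}\left[(A - \alpha_1 E)(A - \alpha_2 E) \cdots (A - \alpha_s E)\right] \\
&\ge \mathrm{rank}\,(A - \alpha_1 E) + \mathrm{rank}\left[(A - \alpha_2 E) \cdots (A - \alpha_s E)\right] - n \\
&\ge \mathrm{rank}\,(A - \alpha_1 E) + \mathrm{rank}\,(A - \alpha_2 E) + \mathrm{rank}\left[(A - \alpha_3 E) \cdots (A - \alpha_s E)\right] - 2n \\
&\ge \cdots \\
&\ge \mathrm{rank}\,(A - \alpha_1 E) + \cdots + \mathrm{rank}\,(A - \alpha_s E) - (s - 1)n \\
&= \sum_{i=1}^{s} \left[\mathrm{rank}\,(A - \alpha_i E) - n\right] + n.
\end{aligned}
\]
Finally we get
\[
\sum_{i=1}^{s} \left[n - \mathrm{rank}\,(A - \alpha_i E)\right] \ge n. \tag{12.187}
\]
As $n - \mathrm{rank}\,(A - \alpha_i E) = \dim W_{\alpha_i}$, we have
\[
\sum_{i=1}^{s} \dim W_{\alpha_i} \ge n. \tag{12.188}
\]
Meanwhile, we have
\[
V^n \supset W_{\alpha_1} \oplus W_{\alpha_2} \oplus \cdots \oplus W_{\alpha_s}, \qquad
n \ge \dim\left(W_{\alpha_1} \oplus W_{\alpha_2} \oplus \cdots \oplus W_{\alpha_s}\right) = \sum_{i=1}^{s} \dim W_{\alpha_i}. \tag{12.189}
\]
The equality results from the property of a direct sum. From (12.188) and (12.189), we get
\[
\sum_{i=1}^{s} \dim W_{\alpha_i} = n. \tag{12.190}
\]
Hence,
\[
V^n = W_{\alpha_1} \oplus W_{\alpha_2} \oplus \cdots \oplus W_{\alpha_s}. \tag{12.191}
\]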

Thus, we have proven that if the minimal polynomial does not have a multiple root, $V^n$ is decomposed into direct sum of eigenspaces as in (12.191). If in turn $V^n$ is decomposed into direct sum of eigenspaces as in (12.191), A can be diagonalized by a similarity transformation. The proof is as follows: Suppose that (12.191) holds. Then, we can take only eigenvectors for the basis vectors of $V^n$. Suppose that $\dim W_{\alpha_i} = n_i$. Then, we can take vectors $a_k$ $\left(\sum_{j=1}^{i-1} n_j + 1 \le k \le \sum_{j=1}^{i} n_j\right)$ so that $a_k$ can be the basis vectors of $W_{\alpha_i}$. In reference to this basis set, we describe a vector $x \in V^n$ such that
\[
x = (a_1 \ a_2 \cdots a_n) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.
\]
Operating A on x, we get
\[
\begin{aligned}
A(x) &= (a_1 \ a_2 \cdots a_n)\, A \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= (a_1 A \ \ a_2 A \cdots a_n A) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= (\alpha_1 a_1 \ \ \alpha_2 a_2 \cdots \alpha_n a_n) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \\
&= (a_1 \ a_2 \cdots a_n)
\begin{pmatrix}
\alpha_1 & & & \\
 & \alpha_2 & & \\
 & & \ddots & \\
 & & & \alpha_n
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix},
\end{aligned} \tag{12.192}
\]

where with the second equality we used the notation (11.40); with the third equality some of $\alpha_i$ ($1 \le i \le n$) may be identical; $a_i$ is an eigenvector that corresponds to an eigenvalue $\alpha_i$. Suppose that $a_1, a_2, \ldots,$ and $a_n$ are obtained by transforming an "original" basis set $e_1, e_2, \ldots,$ and $e_n$ by R. Then, we have
\[
A(x) = (e_1 \ e_2 \cdots e_n)\, R
\begin{pmatrix}
\alpha_1 & & & \\
 & \alpha_2 & & \\
 & & \ddots & \\
 & & & \alpha_n
\end{pmatrix}
R^{-1}
\begin{pmatrix} x_1^{(0)} \\ x_2^{(0)} \\ \vdots \\ x_n^{(0)} \end{pmatrix}.
\]
We denote the transformation A with respect to a basis set $e_1, e_2, \ldots,$ and $e_n$ by $A_0$; see (11.82) with the notation. Then, we have
\[
A(x) = (e_1 \ e_2 \cdots e_n)\, A_0
\begin{pmatrix} x_1^{(0)} \\ x_2^{(0)} \\ \vdots \\ x_n^{(0)} \end{pmatrix}.
\]
Therefore, we get
\[
R^{-1}A_0R = \begin{pmatrix}
\alpha_1 & & & \\
 & \alpha_2 & & \\
 & & \ddots & \\
 & & & \alpha_n
\end{pmatrix}. \tag{12.193}
\]

Thus, A is similar to a diagonal matrix as represented in (12.192) and (12.193). It is straightforward to show that a minimal polynomial of a diagonalizable matrix has no multiple root. The proof is left for readers. Summarizing the above arguments, we have the following theorem:

Theorem 12.7 [3, 4] The following three statements related to A are equivalent:
(i) The matrix A is similar to a diagonal matrix.
(ii) The minimal polynomial $\varphi_A(x)$ does not have a multiple root.
(iii) The vector space $V^n$ is decomposed into a direct sum of eigenspaces.

In Example 12.1 we showed the diagonalization of a matrix. There A has two different eigenvalues. Since with a (n, n) matrix having n different eigenvalues its characteristic polynomial does not have a multiple root, the minimal polynomial necessarily has no multiple root. The above theorem therefore ensures that a matrix whose characteristic polynomial has no multiple root must be diagonalizable.

Another consequence of this theorem is that an idempotent matrix is diagonalizable. The matrix is characterized by $A^2 = A$. Then $A(A - E) = 0$. Taking its determinant, $(\det A)[\det(A - E)] = 0$. Therefore, we have either $\det A = 0$ or $\det(A - E) = 0$. Hence, eigenvalues of A are zero or 1. Think of $f(x) = x(x - 1)$. As $f(A) = 0$, the minimal polynomial of A is a divisor of $f(x)$. It therefore has no multiple root, and so the matrix is diagonalizable.

Example 12.6 Let us revisit Example 12.1, where we dealt with
\[
A = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}. \tag{12.32}
\]

From (12.33), $f_A(x) = (x - 2)(x - 1)$. Note that $f_A(x) = f_{P^{-1}AP}(x)$. Let us treat the problem according to Theorem 12.5. We also use the notation of (12.85). Given $f_1(x) = x - 1$ and $f_2(x) = x - 2$, let us determine $M_1(x)$ and $M_2(x)$ such that they satisfy

$$
M_1(x) f_1(x) + M_2(x) f_2(x) = 1. \tag{12.194}
$$

We find $M_1(x) = 1$ and $M_2(x) = -1$. Thus, using the notation of Theorem 12.5, Sect. 12.4 we have

$$
A_1 = M_1(A) f_1(A) = A - E = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
A_2 = M_2(A) f_2(A) = -A + 2E = \begin{pmatrix} 0 & -1 \\ 0 & 1 \end{pmatrix}.
$$

We also have

$$
A_1 + A_2 = E, \qquad A_i A_j = A_j A_i = A_i \delta_{ij}. \tag{12.195}
$$

Thus, we find that $A_1$ and $A_2$ are idempotent matrices. As both $A_1$ and $A_2$ are expressed as polynomials of $A$, they commute with $A$. We find that $A$ is represented by

$$
A = \alpha_1 A_1 + \alpha_2 A_2, \tag{12.196}
$$

where $\alpha_1$ and $\alpha_2$ denote eigenvalues 2 and 1, respectively. Thus, choosing proper eigenvectors for basis vectors, we have decomposed a vector space $V^n$ into a direct sum of invariant subspaces comprising the proper eigenvectors. Concomitantly, $A$ is represented as in (12.196). The relevant decomposition is always possible for a diagonalizable matrix. Thus, idempotent matrices play an important role in the theory of linear vector spaces.

Example 12.7 Let us think of the following matrix:

$$
A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}. \tag{12.197}
$$

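This is a triangular matrix, and so its diagonal elements give the eigenvalues. We have an eigenvalue 1 as a double root and an eigenvalue 0 as a simple root. The matrix can be diagonalized using $P$ such that

$$
\tilde{A} = P^{-1} A P = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{12.198}
$$

As can be checked easily, $\tilde{A}^2 = \tilde{A}$. We also have

$$
E - \tilde{A} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \tag{12.199}
$$

where $(E - \tilde{A})^2 = E - \tilde{A}$ holds as well. Moreover, $\tilde{A}(E - \tilde{A}) = (E - \tilde{A})\tilde{A} = 0$. Thus $\tilde{A}$ and $E - \tilde{A}$ behave like $A_1$ and $A_2$ of (12.195).

Next, suppose that $\mathbf{x} \in V^n$ is expressed as a linear combination of basis vectors $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$. Then

$$
\mathbf{x} = c_1 \mathbf{a}_1 + c_2 \mathbf{a}_2 + \cdots + c_{n-1} \mathbf{a}_{n-1} + c_n \mathbf{a}_n. \tag{12.200}
$$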

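Here let us define the following linear transformation $P^{(k)}$ such that $P^{(k)}$ “extracts” the $k$-th component of $\mathbf{x}$. That is,

$$
P^{(k)}(\mathbf{x}) = P^{(k)}\Big(\sum_{j=1}^{n} c_j \mathbf{a}_j\Big) = \sum_{j=1}^{n} \sum_{i=1}^{n} p^{[k]}_{ij} c_j \mathbf{a}_i = c_k \mathbf{a}_k, \tag{12.201}
$$

where $p^{[k]}_{ij}$ is the matrix representation of $P^{(k)}$. In fact, suppose that there is another arbitrarily chosen vector $\mathbf{y}$ such that

$$
\mathbf{y} = d_1 \mathbf{a}_1 + d_2 \mathbf{a}_2 + \cdots + d_{n-1} \mathbf{a}_{n-1} + d_n \mathbf{a}_n. \tag{12.202}
$$

Then we have

$$
P^{(k)}(a\mathbf{x} + b\mathbf{y}) = (a c_k + b d_k)\mathbf{a}_k = a c_k \mathbf{a}_k + b d_k \mathbf{a}_k = a P^{(k)}(\mathbf{x}) + b P^{(k)}(\mathbf{y}). \tag{12.203}
$$

Thus $P^{(k)}$ is a linear transformation. In (12.201), for the third equality to hold, we should have

$$
p^{[k]}_{ij} = \delta_i^{(k)} \delta_{j(k)}, \tag{12.204}
$$

where $\delta_i^{(j)}$ has been defined in (12.175). Meanwhile, $\delta_{j(k)}$ denotes a row vector in which only the $k$-th column is 1, otherwise 0. Note that $\delta_i^{(k)}$ represents an $(n, 1)$ matrix and that $\delta_{j(k)}$ denotes a $(1, n)$ matrix. Therefore, $\delta_i^{(k)} \delta_{j(k)}$ represents an $(n, n)$ matrix whose $(k, k)$ element is 1, otherwise 0. Thus, $P^{(k)}(\mathbf{x})$ is denoted by

$$
P^{(k)}(\mathbf{x}) = (\mathbf{a}_1 \cdots \mathbf{a}_n) \begin{pmatrix} 0 & & & & & & \\ & \ddots & & & & & \\ & & 0 & & & & \\ & & & 1 & & & \\ & & & & 0 & & \\ & & & & & \ddots & \\ & & & & & & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = x_k \mathbf{a}_k, \tag{12.205}
$$

where only the $(k, k)$ element is 1, otherwise 0. Then $P^{(k)}[P^{(k)}(\mathbf{x})] = P^{(k)}(\mathbf{x})$. That is

$$
\big[P^{(k)}\big]^2 = P^{(k)}. \tag{12.206}
$$

Also $P^{(k)}[P^{(l)}(\mathbf{x})] = 0$ if $k \ne l$. Meanwhile, we have $P^{(1)}(\mathbf{x}) + \cdots + P^{(n)}(\mathbf{x}) = \mathbf{x}$. Hence, $P^{(1)}(\mathbf{x}) + \cdots + P^{(n)}(\mathbf{x}) = [P^{(1)} + \cdots + P^{(n)}](\mathbf{x}) = \mathbf{x}$. Since this relation holds with any $\mathbf{x} \in V^n$, we get

$$
P^{(1)} + \cdots + P^{(n)} = E. \tag{12.207}
$$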
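The properties (12.206) and (12.207) can be checked with a minimal Python sketch. The helper names (`projector`, `matmul`, and so on) and the choice n = 4 are illustrative assumptions, not part of the text; the index k is 0-based in the code, whereas the text counts components from 1.

```python
def identity(n):
    """The (n, n) identity matrix E."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matadd(a, b):
    """Entrywise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def projector(n, k):
    """Matrix of P^(k): a single 1 at the (k, k) position, 0 elsewhere."""
    return [[1 if i == j == k else 0 for j in range(n)] for i in range(n)]


n = 4  # illustrative dimension
P = [projector(n, k) for k in range(n)]
zero = [[0] * n for _ in range(n)]

# (12.206): each P^(k) is idempotent.
idempotent = all(matmul(P[k], P[k]) == P[k] for k in range(n))

# P^(k) P^(l) = 0 for k != l.
mutually_orthogonal = all(matmul(P[k], P[l]) == zero
                          for k in range(n) for l in range(n) if k != l)

# (12.207): the projectors sum to the identity.
total = zero
for Pk in P:
    total = matadd(total, Pk)
resolves_identity = (total == identity(n))

print(idempotent, mutually_orthogonal, resolves_identity)
```

Integer entries keep every comparison exact, so no numerical tolerance is needed here.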

As shown above, an idempotent matrix such as $P^{(k)}$ always exists. In particular, if the basis vectors comprise only proper eigenvectors, the decomposition as expressed in (12.196) is possible. In that case, it is described as

$$
A = \alpha_1 A_1 + \cdots + \alpha_n A_n, \tag{12.208}
$$

where $\alpha_1, \dots, \alpha_n$ are eigenvalues (some of which may be identical) and $A_1, \dots, A_n$ are idempotent matrices such as those represented by (12.205). Yet, we have to be careful to construct idempotent matrices according to the formalism described in Theorem 12.5. This is because we often encounter a situation where different matrices give an identical characteristic polynomial. We briefly mention this in the next example.

Example 12.8 Let us think about the following two matrices:

$$
A = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \qquad
B = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}. \tag{12.209}
$$

Then, following Theorem 12.5, Sect. 12.4, we have


$$
f_A(x) = f_B(x) = (x - 3)(x - 2)^2, \tag{12.210}
$$

with eigenvalues $\alpha_1 = 3$ and $\alpha_2 = 2$. Also we have $f_1(x) = (x - 2)^2$ and $f_2(x) = x - 3$. Following the procedures of (12.85) and (12.86), we obtain

$$
M_1(x) = x - 2 \quad \text{and} \quad M_2(x) = -x^2 + 3x - 3.
$$

Therefore, we have

$$
M_1(x) f_1(x) = (x - 2)^3, \qquad M_2(x) f_2(x) = (x - 3)\left(-x^2 + 3x - 3\right). \tag{12.211}
$$

Hence, we get

$$
A_1 \equiv M_1(A) f_1(A) = (A - 2E)^3, \qquad A_2 \equiv M_2(A) f_2(A) = (A - 3E)\left(-A^2 + 3A - 3E\right). \tag{12.212}
$$

Similarly, we get $B_1$ and $B_2$ by replacing $A$ with $B$ in (12.212). Thus, we have

$$
A_1 = B_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
A_2 = B_2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{12.213}
$$

Notice that we get the same idempotent matrices of (12.213), even though the matrix forms of $A$ and $B$ differ. Also we have

$$
A_1 + A_2 = B_1 + B_2 = E.
$$

Then, we have

$$
A = (A_1 + A_2)A = A_1 A + A_2 A, \qquad B = (B_1 + B_2)B = B_1 B + B_2 B.
$$

Nonetheless, although $A = 3A_1 + 2A_2$ holds, $B \ne 3B_1 + 2B_2$. That is, the decomposition of the form of (12.208) does not hold for $B$. A decomposition of this kind is possible only with diagonalizable matrices.

In summary, an $(n, n)$ matrix with $s$ $(1 \le s \le n)$ different eigenvalues has at least $s$ proper eigenvectors. (Note that a diagonalizable matrix has $n$ proper eigenvectors.) In the case of $s < n$, the characteristic polynomial has multiple root(s) and the matrix may have generalized eigenvectors. If the matrix has a generalized eigenvector of rank $\nu$, the matrix is accompanied by $(\nu - 1)$ generalized eigenvectors along with a sole proper eigenvector. Those vectors form an invariant subspace along with the proper eigenvector(s). In total, such $n$ (generalized) eigenvectors span a whole vector space $V^n$.

With the eigenvalue equation $A(\mathbf{x}) = \alpha \mathbf{x}$, we have an indefinite but nontrivial solution $\mathbf{x} \ne 0$ for only restricted numbers $\alpha$ (i.e., eigenvalues) in a complex plane. However, we have a unique but trivial solution $\mathbf{x} = 0$ for complex numbers $\alpha$ other than eigenvalues. This is characteristic of the eigenvalue problem.
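The point of Example 12.8 can be reproduced with exact integer arithmetic. The following is a minimal sketch, assuming the matrices of (12.209) and the polynomials of (12.212); the helper names `matmul`, `lincomb`, and `idempotents` are illustrative and not taken from the text.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def lincomb(terms):
    """Linear combination sum_k c_k * M_k for a list of (c_k, M_k) pairs."""
    n = len(terms[0][1])
    return [[sum(c * m[i][j] for c, m in terms) for j in range(n)]
            for i in range(n)]


E = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A = [[3, 0, 0], [0, 2, 0], [0, 0, 2]]   # diagonal matrix of (12.209)
B = [[3, 0, 0], [0, 2, 1], [0, 0, 2]]   # non-diagonalizable matrix of (12.209)


def idempotents(M):
    """Return ((M - 2E)^3, (M - 3E)(-M^2 + 3M - 3E)) following (12.212)."""
    m2 = lincomb([(1, M), (-2, E)])
    first = matmul(matmul(m2, m2), m2)
    second = matmul(lincomb([(1, M), (-3, E)]),
                    lincomb([(-1, matmul(M, M)), (3, M), (-3, E)]))
    return first, second


A1, A2 = idempotents(A)
B1, B2 = idempotents(B)

print(A1 == B1, A2 == B2)                   # same idempotents for A and B
print(lincomb([(3, A1), (2, A2)]) == A)     # decomposition (12.208) holds for A
print(lincomb([(3, B1), (2, B2)]) == B)     # but fails for B
```

The last comparison fails only because of the off-diagonal element of $B$, which no combination of the two idempotent matrices can reproduce.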

References

1. Mirsky L (1990) An introduction to linear algebra. Dover, New York
2. Hassani S (2006) Mathematical physics. Springer, New York
3. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
4. Satake I (1974) Linear algebra. Shokabo, Tokyo (in Japanese)

Chapter 13

Inner Product Space

Thus far we have treated the theory of linear vector spaces. The vector spaces, however, were somewhat “structureless,” and so it is desirable to introduce a concept of metric or measure into the linear vector spaces. We call a linear vector space where the inner product is defined an inner product space. By virtue of the concept of the inner product, the linear vector space is given a variety of structures. For instance, introduction of the inner product to the linear vector space immediately leads to the definition of adjoint operators and Gram matrices. Above all, the concept of the inner product can readily be extended to a functional space and facilitates understanding of, e.g., orthogonalization of functions, as was exemplified in Parts I and II. Moreover, definition of the inner product allows us to relate matrix operators and differential operators. In particular, it is key to understanding the logical structure of quantum mechanics. This can easily be understood from the fact that Paul Dirac, who is known as one of the prominent founders of quantum mechanics, invented bra and ket vectors to represent an inner product.

13.1

Inner Product and Metric

An inner product relates two vectors to a complex number. To do this, we introduce the notation $|a\rangle$ and $\langle b|$ to represent the vectors. This notation is due to Dirac and widely used in physics and mathematical physics. Usually $|a\rangle$ and $\langle b|$ are called a “ket” vector and a “bra” vector, respectively, again due to Dirac. Alternatively, we may call $\langle a|$ an adjoint vector of $|a\rangle$. Or we denote $\langle a| \equiv |a\rangle^{\dagger}$. The symbol “$\dagger$” (dagger) means that for a matrix its transposed matrix should be taken with complex conjugate matrix elements. That is, $(a_{ij})^{\dagger} = (a_{ji}^{*})$. If we represent a full matrix, we have

© Springer Nature Singapore Pte Ltd. 2020 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3_13

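$$
A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \qquad
A^{\dagger} = \begin{pmatrix} a_{11}^{*} & \cdots & a_{n1}^{*} \\ \vdots & \ddots & \vdots \\ a_{1n}^{*} & \cdots & a_{nn}^{*} \end{pmatrix}. \tag{13.1}
$$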

We call $A^{\dagger}$ an adjoint matrix or adjoint to $A$; see (1.106). The operations of transposition and complex conjugation commute. A further remark will be made below after showing the definition of the inner product. The symbols $|a\rangle$ and $\langle b|$ represent vectors and, hence, we do not need to use bold characters to show that those are vectors. The definition of the inner product is as follows:

$$
\langle b|a\rangle = \langle a|b\rangle^{*}, \tag{13.2}
$$

$$
\langle a|\left(\beta|b\rangle + \gamma|c\rangle\right) = \beta\langle a|b\rangle + \gamma\langle a|c\rangle, \tag{13.3}
$$

$$
\langle a|a\rangle \ge 0. \tag{13.4}
$$
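As a concrete illustration of (13.2) to (13.4), the sketch below uses the standard inner product on column vectors of complex numbers, $\langle a|b\rangle = \sum_i a_i^{*} b_i$. This particular realization (which presupposes an orthonormal basis), the function names, and the sample vectors are assumptions made for the example only.

```python
def inner(a, b):
    """<a|b> = sum_i conj(a_i) * b_i: conjugate-linear in a, linear in b."""
    return sum(complex(x).conjugate() * complex(y) for x, y in zip(a, b))


def lin(beta, b, gamma, c):
    """Components of the vector beta|b> + gamma|c>."""
    return [beta * x + gamma * y for x, y in zip(b, c)]


# Arbitrary sample vectors and scalars.
a = [1 + 2j, 3 - 1j]
b = [2 - 1j, 1j]
c = [0.5j, -1 + 1j]
beta, gamma = 2 - 3j, 1 + 1j

# (13.2): <b|a> equals the complex conjugate of <a|b>.
conj_symmetric = abs(inner(b, a) - inner(a, b).conjugate()) < 1e-12

# (13.3): linearity with respect to the ket.
linear_in_ket = abs(inner(a, lin(beta, b, gamma, c))
                    - (beta * inner(a, b) + gamma * inner(a, c))) < 1e-12

# Consequence of (13.2) and (13.3): scalars leave the bra as complex conjugates.
antilinear_in_bra = abs(
    inner(lin(beta, b, gamma, c), a)
    - (beta.conjugate() * inner(b, a) + gamma.conjugate() * inner(c, a))
) < 1e-12

# (13.4): <a|a> is real and non-negative.
norm_sq = inner(a, a)

print(conj_symmetric, linear_in_ket, antilinear_in_bra, norm_sq)
```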

In (13.4), equality holds only if $|a\rangle = 0$. Note here that two vectors are said to be orthogonal to each other if their inner product vanishes, i.e., $\langle b|a\rangle = \langle a|b\rangle = 0$. In particular, if a vector $|a\rangle \in V^n$ is orthogonal to all the vectors in $V^n$, i.e., $\langle x|a\rangle = 0$ for $\forall |x\rangle \in V^n$, then $|a\rangle = 0$. This is because if we choose $|a\rangle$ for $|x\rangle$, we have $\langle a|a\rangle = 0$. This means that $|a\rangle = 0$. We call a linear vector space in which the inner product is defined an inner product space.

We can give another structure to a vector space. An example is a metric (or distance function). Suppose that there is an arbitrary set $Q$. If a real non-negative number $\rho(a, b)$ is defined as follows with any arbitrary elements $a, b, c \in Q$, the set $Q$ is called a metric space [1]:

$$
\rho(a, b) = \rho(b, a), \tag{13.5}
$$

$$
\rho(a, b) \ge 0 \ \text{for} \ \forall a, b; \quad \rho(a, b) = 0 \ \text{if and only if} \ a = b, \tag{13.6}
$$

$$
\rho(a, b) + \rho(b, c) \ge \rho(a, c). \tag{13.7}
$$

In our study a vector space is chosen for the set $Q$. Here let us define a norm for each vector $a$. The norm is defined as

$$
\|a\| = \sqrt{\langle a|a\rangle}. \tag{13.8}
$$

If we define $\rho(a, b) \equiv \|a - b\|$, $\|a - b\|$ satisfies the definitions of a metric. Equations (13.5) and (13.6) are obvious. For (13.7) let us consider a vector $|c\rangle$ as $|c\rangle = |a\rangle - x\langle b|a\rangle|b\rangle$ with real $x$. Since $\langle c|c\rangle \ge 0$, we have


$$
x^2 \langle a|b\rangle\langle b|a\rangle\langle b|b\rangle - 2x\langle a|b\rangle\langle b|a\rangle + \langle a|a\rangle \ge 0
$$

or

$$
x^2 |\langle a|b\rangle|^2 \langle b|b\rangle - 2x|\langle a|b\rangle|^2 + \langle a|a\rangle \ge 0. \tag{13.9}
$$

The inequality (13.9), being a quadratic inequality in $x$ with real coefficients, requires that

$$
\langle a|a\rangle\langle b|b\rangle \ge \langle a|b\rangle\langle b|a\rangle = |\langle a|b\rangle|^2. \tag{13.10}
$$

That is,

$$
\sqrt{\langle a|a\rangle} \cdot \sqrt{\langle b|b\rangle} \ge |\langle a|b\rangle|. \tag{13.11}
$$

Namely,

$$
\|a\| \cdot \|b\| \ge |\langle a|b\rangle| \ge \mathrm{Re}\,\langle a|b\rangle. \tag{13.12}
$$

The relations (13.11) and (13.12) are known as the Cauchy–Schwarz inequality. Meanwhile, we have

$$
\|a + b\|^2 = \langle a + b|a + b\rangle = \|a\|^2 + \|b\|^2 + 2\,\mathrm{Re}\,\langle a|b\rangle, \tag{13.13}
$$

$$
(\|a\| + \|b\|)^2 = \|a\|^2 + \|b\|^2 + 2\|a\| \cdot \|b\|. \tag{13.14}
$$

Comparing (13.13) and (13.14) and using (13.12), we have

$$
\|a\| + \|b\| \ge \|a + b\|. \tag{13.15}
$$

The inequality (13.15) is known as the triangle inequality. In (13.15) replacing $a \to a - b$ and $b \to b - c$, we get

$$
\|a - b\| + \|b - c\| \ge \|a - c\|. \tag{13.16}
$$
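The chain of inequalities (13.12), (13.15), and (13.16) can be spot-checked numerically. The sketch below again assumes the standard inner product on complex column vectors; the sample vectors and helper names are arbitrary choices for illustration, and a check on a few vectors is of course not a proof.

```python
import math


def inner(a, b):
    """<a|b> = sum_i conj(a_i) * b_i (orthonormal basis assumed)."""
    return sum(complex(x).conjugate() * complex(y) for x, y in zip(a, b))


def norm(a):
    """||a|| = sqrt(<a|a>), as in (13.8)."""
    return math.sqrt(inner(a, a).real)


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


a = [1 + 2j, 3 - 1j]
b = [2 - 1j, 1j]
c = [0.5j, -1 + 1j]

# (13.12): ||a|| ||b|| >= |<a|b>| >= Re <a|b>.
cauchy_schwarz = norm(a) * norm(b) >= abs(inner(a, b)) >= inner(a, b).real

# (13.15): triangle inequality for the norm.
triangle = norm(a) + norm(b) >= norm(add(a, b))

# (13.16): the same inequality written for the metric rho(a, b) = ||a - b||.
metric_triangle = norm(sub(a, b)) + norm(sub(b, c)) >= norm(sub(a, c))

print(cauchy_schwarz, triangle, metric_triangle)
```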

Thus, (13.16) is equivalent to (13.7). At the same time, the norm defined in (13.8) may be regarded as a “length” of a vector $a$. As $\beta|b\rangle + \gamma|c\rangle$ represents a vector, we use a shorthand notation for it as

$$
|\beta b + \gamma c\rangle \equiv \beta|b\rangle + \gamma|c\rangle. \tag{13.17}
$$

According to the definition (13.3)

$$
\langle a|\beta b + \gamma c\rangle = \beta\langle a|b\rangle + \gamma\langle a|c\rangle. \tag{13.18}
$$

Also from (13.2),

$$
\langle \beta b + \gamma c|a\rangle = \langle a|\beta b + \gamma c\rangle^{*} = \left[\beta\langle a|b\rangle + \gamma\langle a|c\rangle\right]^{*} = \beta^{*}\langle b|a\rangle + \gamma^{*}\langle c|a\rangle. \tag{13.19}
$$

That is,

$$
\langle \beta b + \gamma c| = \beta^{*}\langle b| + \gamma^{*}\langle c|. \tag{13.20}
$$

Therefore, when we take out a scalar from a bra vector, it should be a complex conjugate. When the scalar is taken out from a ket vector, on the other hand, it is unaffected by definition (13.3). To show this, we have $|\alpha a\rangle = \alpha|a\rangle$. Taking its adjoint, $\langle \alpha a| = \alpha^{*}\langle a|$.

We can view (13.17) as a linear transformation in a vector space. In other words, if we regard $|\cdot\rangle$ as a linear transformation of a vector $a \in V^n$ to $|a\rangle \in \widetilde{V^n}$, $|\cdot\rangle$ is that of $V^n$ to $\widetilde{V^n}$. On the other hand, $\langle\cdot|$ could not be regarded as a linear transformation of $a \in V^n$ to $\langle a| \in \widetilde{V^n}{}'$. Sometimes the said transformation is referred to as “antilinear” or “sesquilinear.” From the point of view of formalism, the inner product can be viewed as an operation: $\widetilde{V^n} \times \widetilde{V^n}{}' \to \mathbb{C}$.

Let us consider $|x\rangle = |y\rangle$ in an inner product space $\widetilde{V^n}$. Then $|x\rangle - |y\rangle = 0$. That is, $|x - y\rangle = 0$. Therefore, we have $x = y$, or $|0\rangle = 0$. This means that the linear transformation $|\cdot\rangle$ converts $0 \in V^n$ to $|0\rangle \in \widetilde{V^n}$. This is a characteristic of a linear transformation represented in (11.44). Similarly we have $\langle 0| = 0$. However, we do not have to get into further details in this book. Also if we have to specify a vector space, we simply do so by designating it as $V^n$.

13.2

Gram Matrices

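Once we have defined an inner product between any pair of vectors $|a\rangle$ and $|b\rangle$ of a vector space $V^n$, we can define and calculate various quantities related to inner products. As an example, let $|a_1\rangle, \dots, |a_n\rangle$ and $|b_1\rangle, \dots, |b_n\rangle$ be two sets of vectors in $V^n$. The vectors $|a_1\rangle, \dots, |a_n\rangle$ may or may not be linearly independent. This is true of $|b_1\rangle, \dots, |b_n\rangle$ as well. Let us think of the following matrix $M$ defined as below:

$$
M = \begin{pmatrix} \langle a_1| \\ \vdots \\ \langle a_n| \end{pmatrix} \left(|b_1\rangle \cdots |b_n\rangle\right) = \begin{pmatrix} \langle a_1|b_1\rangle & \cdots & \langle a_1|b_n\rangle \\ \vdots & \ddots & \vdots \\ \langle a_n|b_1\rangle & \cdots & \langle a_n|b_n\rangle \end{pmatrix}. \tag{13.21}
$$

We assume the following cases.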


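(i) Suppose that in (13.21) $|b_1\rangle, \dots, |b_n\rangle$ are linearly dependent. Then, without loss of generality we can put

$$
|b_1\rangle = c_2|b_2\rangle + c_3|b_3\rangle + \cdots + c_n|b_n\rangle. \tag{13.22}
$$

Then we have

$$
M = \begin{pmatrix} c_2\langle a_1|b_2\rangle + c_3\langle a_1|b_3\rangle + \cdots + c_n\langle a_1|b_n\rangle & \langle a_1|b_2\rangle & \cdots & \langle a_1|b_n\rangle \\ \vdots & \vdots & \ddots & \vdots \\ c_2\langle a_n|b_2\rangle + c_3\langle a_n|b_3\rangle + \cdots + c_n\langle a_n|b_n\rangle & \langle a_n|b_2\rangle & \cdots & \langle a_n|b_n\rangle \end{pmatrix}. \tag{13.23}
$$

Multiplying the second column, $\dots$, and the $n$-th column by $(-c_2), \dots,$ and $(-c_n)$, respectively, and adding them to the first column, which leaves the determinant unchanged, we get

$$
\det M = \begin{vmatrix} 0 & \langle a_1|b_2\rangle & \cdots & \langle a_1|b_n\rangle \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \langle a_n|b_2\rangle & \cdots & \langle a_n|b_n\rangle \end{vmatrix}. \tag{13.24}
$$

Hence, $\det M = 0$.

(ii) Suppose in turn that in (13.21) $|a_1\rangle, \dots, |a_n\rangle$ are linearly dependent. In that case, again, without loss of generality we can put

$$
|a_1\rangle = d_2|a_2\rangle + d_3|a_3\rangle + \cdots + d_n|a_n\rangle. \tag{13.25}
$$

Focusing attention on individual rows and following a procedure similar to that described above, we have

$$
\det M = \begin{vmatrix} 0 & \cdots & 0 \\ \langle a_2|b_1\rangle & \cdots & \langle a_2|b_n\rangle \\ \vdots & \ddots & \vdots \\ \langle a_n|b_1\rangle & \cdots & \langle a_n|b_n\rangle \end{vmatrix}. \tag{13.26}
$$

Again, $\det M = 0$. Next let us examine the case where $\det M = 0$. In that case, the $n$ column vectors of $M$ in (13.21) are linearly dependent. Without loss of generality, we suppose that the first column is expressed as a linear combination of the other $(n - 1)$ columns such that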

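$$
\begin{pmatrix} \langle a_1|b_1\rangle \\ \vdots \\ \langle a_n|b_1\rangle \end{pmatrix} = c_2 \begin{pmatrix} \langle a_1|b_2\rangle \\ \vdots \\ \langle a_n|b_2\rangle \end{pmatrix} + \cdots + c_n \begin{pmatrix} \langle a_1|b_n\rangle \\ \vdots \\ \langle a_n|b_n\rangle \end{pmatrix}. \tag{13.27}
$$

Rewriting this, we have

$$
\begin{pmatrix} \langle a_1|b_1 - c_2 b_2 - \cdots - c_n b_n\rangle \\ \vdots \\ \langle a_n|b_1 - c_2 b_2 - \cdots - c_n b_n\rangle \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}. \tag{13.28}
$$

Multiplying the first row, $\dots$, and the $n$-th row of (13.28) by appropriate complex numbers $p_1^{*}, \dots,$ and $p_n^{*}$, respectively, we have

$$
\langle p_1 a_1|b_1 - c_2 b_2 - \cdots - c_n b_n\rangle = 0, \ \dots, \ \langle p_n a_n|b_1 - c_2 b_2 - \cdots - c_n b_n\rangle = 0. \tag{13.29}
$$

Adding all the above, we get

$$
\langle p_1 a_1 + \cdots + p_n a_n|b_1 - c_2 b_2 - \cdots - c_n b_n\rangle = 0. \tag{13.30}
$$

Now, suppose that $|a_1\rangle, \dots, |a_n\rangle$ are the basis vectors. Then, $|p_1 a_1 + \cdots + p_n a_n\rangle$ represents any vector in the vector space. This implies that $|b_1 - c_2 b_2 - \cdots - c_n b_n\rangle = 0$; for this see the remarks after (13.4). That is,

$$
|b_1\rangle = c_2|b_2\rangle + \cdots + c_n|b_n\rangle. \tag{13.31}
$$

Thus, $|b_1\rangle, \dots, |b_n\rangle$ are linearly dependent. Meanwhile, $\det M = 0$ implies that the $n$ row vectors of $M$ in (13.21) are linearly dependent. In that case, performing similar calculations to the above, we can readily show that if $|b_1\rangle, \dots, |b_n\rangle$ are the basis vectors, $|a_1\rangle, \dots, |a_n\rangle$ are linearly dependent.

We summarize the above discussion by the following statement: Suppose that we have two sets of vectors $|a_1\rangle, \dots, |a_n\rangle$ and $|b_1\rangle, \dots, |b_n\rangle$.

At least one of the sets of vectors is linearly dependent. $\Longleftrightarrow$ $\det M = 0$.
Both the sets of vectors are linearly independent. $\Longleftrightarrow$ $\det M \ne 0$.

The latter statement is obtained by considering the contraposition of the former statement. We restate the above in the following theorem:

Theorem 13.1 Let $|a_1\rangle, \dots, |a_n\rangle$ and $|b_1\rangle, \dots, |b_n\rangle$ be two sets of vectors defined in a vector space $V^n$. A necessary and sufficient condition for both these sets of vectors to be linearly independent is that for a matrix $M$ defined below, $\det M \ne 0$.

$$
M = \begin{pmatrix} \langle a_1| \\ \vdots \\ \langle a_n| \end{pmatrix} \left(|b_1\rangle \cdots |b_n\rangle\right) = \begin{pmatrix} \langle a_1|b_1\rangle & \cdots & \langle a_1|b_n\rangle \\ \vdots & \ddots & \vdots \\ \langle a_n|b_1\rangle & \cdots & \langle a_n|b_n\rangle \end{pmatrix}.
$$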
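Theorem 13.1 can be illustrated in two dimensions with a short Python sketch. It assumes the standard inner product on complex column vectors; the specific vectors and the helper names `overlap_matrix` and `det2` are illustrative choices, not taken from the text.

```python
def inner(a, b):
    """<a|b> = sum_i conj(a_i) * b_i (orthonormal basis assumed)."""
    return sum(complex(x).conjugate() * complex(y) for x, y in zip(a, b))


def overlap_matrix(bras, kets):
    """Matrix M of (13.21) with elements M_ij = <a_i|b_j>."""
    return [[inner(a, b) for b in kets] for a in bras]


def det2(m):
    """Determinant of a (2, 2) matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


a_set = [[1, 0], [1, 1]]             # linearly independent
b_independent = [[1, 1j], [0, 1]]    # linearly independent
b_dependent = [[1, 1j], [2, 2j]]     # second vector is twice the first

det_independent = det2(overlap_matrix(a_set, b_independent))
det_dependent = det2(overlap_matrix(a_set, b_dependent))

print(det_independent)   # nonzero: both sets are linearly independent
print(det_dependent)     # zero: one of the sets is linearly dependent
```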

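Next we consider a norm of a vector expressed in reference to a set of basis vectors $|e_1\rangle, \dots, |e_n\rangle$ of $V^n$. Let us express a vector $|x\rangle$ in an inner product space as follows, as in the case of (11.10) and (11.13):

$$
|x\rangle = x_1|e_1\rangle + x_2|e_2\rangle + \cdots + x_n|e_n\rangle = |x_1 e_1 + x_2 e_2 + \cdots + x_n e_n\rangle = \left(|e_1\rangle \cdots |e_n\rangle\right) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{13.32}
$$

A bra vector $\langle x|$ is then denoted by

$$
\langle x| = \left(x_1^{*} \cdots x_n^{*}\right) \begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}. \tag{13.33}
$$

Thus, we have an inner product described as

$$
\langle x|x\rangle = \left(x_1^{*} \cdots x_n^{*}\right) \begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix} \left(|e_1\rangle \cdots |e_n\rangle\right) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \left(x_1^{*} \cdots x_n^{*}\right) \begin{pmatrix} \langle e_1|e_1\rangle & \cdots & \langle e_1|e_n\rangle \\ \vdots & \ddots & \vdots \\ \langle e_n|e_1\rangle & \cdots & \langle e_n|e_n\rangle \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{13.34}
$$

Here the matrix expressed as follows is called a Gram matrix [2–4]:

$$
G = \begin{pmatrix} \langle e_1|e_1\rangle & \cdots & \langle e_1|e_n\rangle \\ \vdots & \ddots & \vdots \\ \langle e_n|e_1\rangle & \cdots & \langle e_n|e_n\rangle \end{pmatrix}. \tag{13.35}
$$

As $\langle e_j|e_i\rangle = \langle e_i|e_j\rangle^{*}$, we have $G = G^{\dagger}$. From Theorem 13.1, $\det G \ne 0$. With a shorthand notation, we write $(G)_{ij} = (g_{ij}) = (\langle e_i|e_j\rangle)$. As already mentioned in Sect. 1.4, if for a matrix $H$ we have a relation described by

$$
H = H^{\dagger}, \tag{1.119}
$$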

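it is said to be an Hermitian matrix or a self-adjoint matrix. We often say that such a matrix is Hermitian. Since the Gram matrices frequently appear in matrix algebra and play an important role, their properties are worth examining. Since $G$ is an Hermitian matrix, it can be diagonalized through a similarity transformation using a unitary matrix. We will give its proof later (see Sect. 14.3). Let us deal with (13.34) further. We have

$$
\langle x|x\rangle = \left(x_1^{*} \cdots x_n^{*}\right) U U^{\dagger} \begin{pmatrix} \langle e_1|e_1\rangle & \cdots & \langle e_1|e_n\rangle \\ \vdots & \ddots & \vdots \\ \langle e_n|e_1\rangle & \cdots & \langle e_n|e_n\rangle \end{pmatrix} U U^{\dagger} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \tag{13.36}
$$

where $U$ is defined as $UU^{\dagger} = U^{\dagger}U = E$. Such a matrix $U$ is called a unitary matrix. We represent a matrix form of $U$ as

$$
U = \begin{pmatrix} u_{11} & \cdots & u_{1n} \\ \vdots & \ddots & \vdots \\ u_{n1} & \cdots & u_{nn} \end{pmatrix}, \qquad
U^{\dagger} = \begin{pmatrix} u_{11}^{*} & \cdots & u_{n1}^{*} \\ \vdots & \ddots & \vdots \\ u_{1n}^{*} & \cdots & u_{nn}^{*} \end{pmatrix}. \tag{13.37}
$$

Here, putting

$$
U^{\dagger} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix} \quad \text{or equivalently} \quad \sum_{k=1}^{n} x_k u_{ki}^{*} \equiv \xi_i \tag{13.38}
$$

and taking its adjoint such that

$$
\left(x_1^{*} \cdots x_n^{*}\right) U = \left(\xi_1^{*} \cdots \xi_n^{*}\right) \quad \text{or equivalently} \quad \sum_{k=1}^{n} x_k^{*} u_{ki} = \xi_i^{*},
$$

we have

$$
\langle x|x\rangle = \left(\xi_1^{*} \cdots \xi_n^{*}\right) U^{\dagger} \begin{pmatrix} \langle e_1|e_1\rangle & \cdots & \langle e_1|e_n\rangle \\ \vdots & \ddots & \vdots \\ \langle e_n|e_1\rangle & \cdots & \langle e_n|e_n\rangle \end{pmatrix} U \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix}. \tag{13.39}
$$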

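We assume that the Gram matrix is diagonalized by a similarity transformation by $U$. After being diagonalized, similarly to (12.192) and (12.193) the Gram matrix has the following form $G'$:

$$
U^{\dagger} G U = G' = \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{pmatrix}. \tag{13.40}
$$

Thus, we get

$$
\langle x|x\rangle = \left(\xi_1^{*} \cdots \xi_n^{*}\right) \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{pmatrix} \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix} = \lambda_1 |\xi_1|^2 + \cdots + \lambda_n |\xi_n|^2. \tag{13.41}
$$

From the relation (13.4), $\langle x|x\rangle \ge 0$. This implies that in (13.41) we have

$$
\lambda_i \ge 0 \quad (1 \le i \le n). \tag{13.42}
$$

To show this, suppose that for $\exists \lambda_i$, $\lambda_i < 0$. Suppose also that for $\exists \xi_i$, $\xi_i \ne 0$. Then, with the column vector $(0, \dots, \xi_i, \dots, 0)^{T}$ we have $\langle x|x\rangle = \lambda_i |\xi_i|^2 < 0$, in contradiction.

Since we have $\det G \ne 0$, taking a determinant of (13.40) we have

$$
\det G' = \det\left(U^{\dagger} G U\right) = \det\left(U^{\dagger} U G\right) = \det E \det G = \det G = \prod_{i=1}^{n} \lambda_i \ne 0. \tag{13.43}
$$

Combining (13.42) and (13.43), all the eigenvalues $\lambda_i$ are positive; i.e.,

$$
\lambda_i > 0 \quad (1 \le i \le n). \tag{13.44}
$$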

The squared norm $\langle x|x\rangle = 0$ if and only if $\xi_1 = \xi_2 = \cdots = \xi_n = 0$, which corresponds to $x_1 = x_2 = \cdots = x_n = 0$ from (13.38).

For further study, we generalize the aforementioned feature a little further. That is, if $|e_1\rangle, \dots,$ and $|e_n\rangle$ are linearly dependent, from Theorem 13.1 we get $\det G = 0$. This implies that there is at least one eigenvalue $\lambda_i = 0$ $(1 \le i \le n)$ and that for a vector $(0, \dots, \xi_i, \dots, 0)^{T}$ with $\exists \xi_i \ne 0$ we have $\langle x|x\rangle = 0$. (Suppose that in $V^2$ we have linearly dependent vectors $|e_1\rangle$ and $|e_2\rangle = -|e_1\rangle$; see Example 13.2 below.)

Let $H$ be an Hermitian matrix [i.e., $(H)_{ij} = (H)_{ji}^{*}$]. If we have $\phi$ as a function of complex variables $x_1, \dots, x_n$ such that

$$
\phi(x_1, \dots, x_n) = \sum_{i,j=1}^{n} x_i^{*} (H)_{ij} x_j, \tag{13.45}
$$

where $(H)_{ij}$ is a matrix element of an Hermitian matrix, $\phi(x_1, \dots, x_n)$ is said to be an Hermitian quadratic form. Suppose that $\phi(x_1, \dots, x_n)$ satisfies $\phi(x_1, \dots, x_n) = 0$ if and only if $x_1 = x_2 = \cdots = x_n = 0$, and otherwise $\phi(x_1, \dots, x_n) > 0$ with any other sets of $x_i$ $(1 \le i \le n)$. Then, the said Hermitian quadratic form is called positive definite and we write

$$
H > 0. \tag{13.46}
$$

If $\phi(x_1, \dots, x_n) \ge 0$ for any $x_i$ $(1 \le i \le n)$ and $\phi(x_1, \dots, x_n) = 0$ for at least one set of $(x_1, \dots, x_n)$ in which $\exists x_i \ne 0$, $\phi(x_1, \dots, x_n)$ is said to be positive semi-definite or non-negative. In that case, we write

$$
H \ge 0. \tag{13.47}
$$

From the above argument, a Gram matrix comprising linearly independent vectors is positive definite, whereas that comprising linearly dependent vectors is non-negative. On the basis of the above argument including (13.36) to (13.44), we have

$$
H > 0 \Longleftrightarrow \lambda_i > 0 \ (1 \le i \le n), \ \det H > 0; \qquad
H \ge 0 \Longleftrightarrow \lambda_i \ge 0 \ \text{with} \ \exists \lambda_i = 0 \ (1 \le i \le n), \ \det H = 0. \tag{13.48}
$$
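The criterion (13.48) is easy to try out on $(2, 2)$ Hermitian matrices, whose eigenvalues have a closed form. The sketch below is only an illustration: the two sample matrices, the tolerance, and the function names are assumptions made here, not part of the text.

```python
import math


def hermitian_eigenvalues_2x2(h):
    """Eigenvalues (ascending) of a Hermitian matrix [[a, b], [conj(b), d]]."""
    a, d = h[0][0].real, h[1][1].real
    b = h[0][1]
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return mean - radius, mean + radius


def classify(h, tol=1e-12):
    """Classify a (2, 2) Hermitian matrix by its smallest eigenvalue, cf. (13.48)."""
    lowest, _ = hermitian_eigenvalues_2x2(h)
    if lowest > tol:
        return "positive definite"
    if abs(lowest) <= tol:
        return "positive semi-definite"
    return "not non-negative"


H1 = [[2, 1 + 1j], [1 - 1j, 2]]   # sample matrix with eigenvalues 2 -/+ sqrt(2)
H2 = [[1, -1], [-1, 1]]           # sample matrix with eigenvalues 0 and 2

for name, h in (("H1", H1), ("H2", H2)):
    print(name, hermitian_eigenvalues_2x2(h), classify(h))
```

For larger matrices one would normally rely on a numerical linear algebra library rather than a closed form; the point here is only the link between the sign of the eigenvalues and the classification.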

Notice here that the eigenvalues $\lambda_i$ remain unchanged after a (unitary) similarity transformation. Namely, the eigenvalues are inherent to $H$.

We have already encountered several examples of positive definite and non-negative operators. A typical example of the former case is the Hamiltonian of a quantum-mechanical harmonic oscillator (see Chap. 2). In this case, energy eigenvalues are all positive (i.e., positive definite). Orbital angular momenta $L^2$ of hydrogen-like atoms, on the other hand, are non-negative operators and, hence, an eigenvalue of zero is permitted.

Alternatively, the Gram matrix is defined as $B^{\dagger}B$, where $B$ is any $(n, n)$ matrix. If we take an orthonormal basis $|\eta_1\rangle, |\eta_2\rangle, \dots, |\eta_n\rangle$, $|e_i\rangle$ can be expressed as

$$
|e_i\rangle = \sum_{j=1}^{n} b_{ji} |\eta_j\rangle,
$$

$$
\langle e_k|e_i\rangle = \sum_{j=1}^{n} \sum_{l=1}^{n} b_{lk}^{*} b_{ji} \langle \eta_l|\eta_j\rangle = \sum_{j=1}^{n} \sum_{l=1}^{n} b_{lk}^{*} b_{ji} \delta_{lj} = \sum_{j=1}^{n} b_{jk}^{*} b_{ji} = \sum_{j=1}^{n} \left(B^{\dagger}\right)_{kj} (B)_{ji} = \left(B^{\dagger} B\right)_{ki}. \tag{13.49}
$$

For the second equality of (13.49), we used the orthonormal condition $\langle \eta_i|\eta_j\rangle = \delta_{ij}$. Thus, the Gram matrix $G$ defined in (13.35) can be regarded as identical to $B^{\dagger}B$.

In Sect. 11.4 we have dealt with a linear transformation of a set of basis vectors $\mathbf{e}_1, \mathbf{e}_2, \dots,$ and $\mathbf{e}_n$ by a matrix $A$ defined in (11.69) and examined whether the transformed vectors $\mathbf{e}'_1, \mathbf{e}'_2, \dots,$ and $\mathbf{e}'_n$ are linearly independent. As a result, a necessary and sufficient condition for $\mathbf{e}'_1, \mathbf{e}'_2, \dots,$ and $\mathbf{e}'_n$ to be linearly independent (i.e., to be a set of basis vectors) is $\det A \ne 0$. Thus, we notice that $B$ plays the same role as $A$ of (11.69) and, hence, $\det B \ne 0$ if and only if the set of vectors $|e_1\rangle, |e_2\rangle, \dots,$ and $|e_n\rangle$ defined in (13.32) are linearly independent. By the same token, we conclude that the eigenvalues of $B^{\dagger}B$ are all positive only if $\det B \ne 0$ (i.e., $B$ is non-singular). Alternatively, if $B$ is singular, $\det B^{\dagger}B = \det B^{\dagger} \det B = 0$. In that case at least one of the eigenvalues of $B^{\dagger}B$ must be zero. The Gram matrices appearing in (13.35) are frequently dealt with in the field of mathematical physics in conjunction with quadratic forms. Further topics can be seen in the next chapter.

Example 13.1 Let us take two vectors $|\varepsilon_1\rangle$ and $|\varepsilon_2\rangle$ that are expressed as

$$
|\varepsilon_1\rangle = |e_1\rangle + |e_2\rangle, \qquad |\varepsilon_2\rangle = |e_1\rangle + i|e_2\rangle. \tag{13.50}
$$

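Here we have $\langle e_i|e_j\rangle = \delta_{ij}$ $(1 \le i, j \le 2)$. Then we have a Gram matrix expressed as

$$
G = \begin{pmatrix} \langle \varepsilon_1|\varepsilon_1\rangle & \langle \varepsilon_1|\varepsilon_2\rangle \\ \langle \varepsilon_2|\varepsilon_1\rangle & \langle \varepsilon_2|\varepsilon_2\rangle \end{pmatrix} = \begin{pmatrix} 2 & 1 + i \\ 1 - i & 2 \end{pmatrix}. \tag{13.51}
$$

Principal minors of $G$ are $|2| = 2 > 0$ and $\begin{vmatrix} 2 & 1 + i \\ 1 - i & 2 \end{vmatrix} = 4 - (1 + i)(1 - i) = 2 > 0$. Therefore, according to Theorem 14.11, $G > 0$ (vide infra).

Let us diagonalize the matrix $G$. To this end, we find roots of the characteristic equation. That is,

$$
\det|G - \lambda E| = \begin{vmatrix} 2 - \lambda & 1 + i \\ 1 - i & 2 - \lambda \end{vmatrix} = 0, \qquad \lambda^2 - 4\lambda + 2 = 0. \tag{13.52}
$$

We have $\lambda = 2 \pm \sqrt{2}$. Then as a diagonalizing unitary matrix $U$ we get

$$
U = \begin{pmatrix} \dfrac{1 + i}{2} & \dfrac{1 + i}{2} \\[2mm] \dfrac{\sqrt{2}}{2} & -\dfrac{\sqrt{2}}{2} \end{pmatrix}. \tag{13.53}
$$

Thus, we get

$$
U^{\dagger} G U = \begin{pmatrix} \dfrac{1 - i}{2} & \dfrac{\sqrt{2}}{2} \\[2mm] \dfrac{1 - i}{2} & -\dfrac{\sqrt{2}}{2} \end{pmatrix} \begin{pmatrix} 2 & 1 + i \\ 1 - i & 2 \end{pmatrix} \begin{pmatrix} \dfrac{1 + i}{2} & \dfrac{1 + i}{2} \\[2mm] \dfrac{\sqrt{2}}{2} & -\dfrac{\sqrt{2}}{2} \end{pmatrix} = \begin{pmatrix} 2 + \sqrt{2} & 0 \\ 0 & 2 - \sqrt{2} \end{pmatrix}. \tag{13.54}
$$

The eigenvalues $2 + \sqrt{2}$ and $2 - \sqrt{2}$ are real and positive as expected. That is, the Gram matrix is positive definite.

Example 13.2 Let $|e_1\rangle$ and $|e_2\rangle$ $(= -|e_1\rangle)$ be two vectors, where $|e_1\rangle$ is taken to be normalized. Then, we have a Gram matrix expressed as

$$
G = \begin{pmatrix} \langle e_1|e_1\rangle & \langle e_1|e_2\rangle \\ \langle e_2|e_1\rangle & \langle e_2|e_2\rangle \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}. \tag{13.55}
$$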

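As in the case of Example 13.1, we have as an eigenvalue equation

$$
\det|G - \lambda E| = \begin{vmatrix} 1 - \lambda & -1 \\ -1 & 1 - \lambda \end{vmatrix} = 0, \qquad \lambda^2 - 2\lambda = 0. \tag{13.56}
$$

We have $\lambda = 2$ or $0$. As a diagonalizing unitary matrix $U$ we get

$$
U = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\[2mm] -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix}. \tag{13.57}
$$

Thus, we get

$$
U^{\dagger} G U = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & -\dfrac{1}{\sqrt{2}} \\[2mm] \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\[2mm] -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}. \tag{13.58}
$$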
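The diagonalizations of Examples 13.1 and 13.2 can be verified numerically. The sketch below assumes the Gram matrices $G = \begin{pmatrix} 2 & 1+i \\ 1-i & 2 \end{pmatrix}$ and $G = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$ (the latter corresponding to $|e_2\rangle = -|e_1\rangle$ with normalized $|e_1\rangle$) and unitary matrices built from their normalized eigenvectors; the helper names are illustrative.

```python
import math


def dagger(m):
    """Conjugate transpose of a matrix given as nested lists."""
    return [[complex(m[j][i]).conjugate() for j in range(len(m))]
            for i in range(len(m[0]))]


def matmul(a, b):
    """Matrix product for nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def close(a, b, tol=1e-12):
    """Entrywise comparison within a tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


r = math.sqrt(2)

# Example 13.1
G1 = [[2, 1 + 1j], [1 - 1j, 2]]
U1 = [[(1 + 1j) / 2, (1 + 1j) / 2], [r / 2, -r / 2]]
D1 = matmul(matmul(dagger(U1), G1), U1)

# Example 13.2
G2 = [[1, -1], [-1, 1]]
U2 = [[1 / r, 1 / r], [-1 / r, 1 / r]]
D2 = matmul(matmul(dagger(U2), G2), U2)

E = [[1, 0], [0, 1]]
print(close(matmul(dagger(U1), U1), E), close(D1, [[2 + r, 0], [0, 2 - r]]))
print(close(matmul(dagger(U2), U2), E), close(D2, [[2, 0], [0, 0]]))
```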


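As expected, we have a diagonal matrix, one of whose eigenvalues is zero. That is, the Gram matrix is non-negative.

In the present case, let us think of the following Hermitian quadratic form:

$$
\phi(x_1, x_2) = \sum_{i,j=1}^{2} x_i^{*} (G)_{ij} x_j = \left(x_1^{*}\ x_2^{*}\right) U U^{\dagger} G U U^{\dagger} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \left(x_1^{*}\ x_2^{*}\right) U \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} U^{\dagger} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \left(\xi_1^{*}\ \xi_2^{*}\right) \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix} = 2|\xi_1|^2.
$$

Then, if we take $\begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix} = \begin{pmatrix} 0 \\ \xi_2 \end{pmatrix}$ with $\xi_2 \ne 0$ as a column vector, we get $\phi(x_1, x_2) = 0$. With this type of column vector, we have

$$
U^{\dagger} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ \xi_2 \end{pmatrix}, \quad \text{i.e.,} \quad \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = U \begin{pmatrix} 0 \\ \xi_2 \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\[2mm] -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} 0 \\ \xi_2 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} \xi_2 \\ \xi_2 \end{pmatrix},
$$

where $\xi_2$ is an arbitrary complex number. Thus, even for $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \ne \begin{pmatrix} 0 \\ 0 \end{pmatrix}$ we may have $\phi(x_1, x_2) = 0$. To be more specific, if we take, e.g., $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ or $\begin{pmatrix} 1 \\ 2 \end{pmatrix}$, we get

$$
\phi(1, 1) = (1\ 1) \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = (1\ 1) \begin{pmatrix} 0 \\ 0 \end{pmatrix} = 0,
$$

$$
\phi(1, 2) = (1\ 2) \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = (1\ 2) \begin{pmatrix} -1 \\ 1 \end{pmatrix} = 1 > 0.
$$

13.3

Adjoint Operators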

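A linear transformation $A$ is similarly defined as before and $A$ transforms $|x\rangle$ of (13.32) such that

$$
A(|x\rangle) = \left(|e_1\rangle \cdots |e_n\rangle\right) \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \tag{13.59}
$$

where $(a_{ij})$ is a matrix representation of $A$. Defining $A(|x\rangle) = |A(x)\rangle$ and $\langle (x)A^{\dagger}| = (\langle x|)A^{\dagger} = [A(|x\rangle)]^{\dagger}$ to be consistent with (1.117), we have

$$
\langle (x)A^{\dagger}| = \left(x_1^{*} \cdots x_n^{*}\right) \begin{pmatrix} a_{11}^{*} & \cdots & a_{n1}^{*} \\ \vdots & \ddots & \vdots \\ a_{1n}^{*} & \cdots & a_{nn}^{*} \end{pmatrix} \begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}. \tag{13.60}
$$

Therefore, putting $|y\rangle = \sum_{i=1}^{n} y_i |e_i\rangle$, we have

$$
\langle (x)A^{\dagger}|y\rangle = \left(x_1^{*} \cdots x_n^{*}\right) \begin{pmatrix} a_{11}^{*} & \cdots & a_{n1}^{*} \\ \vdots & \ddots & \vdots \\ a_{1n}^{*} & \cdots & a_{nn}^{*} \end{pmatrix} \begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix} \left(|e_1\rangle \cdots |e_n\rangle\right) \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \left(x_1^{*} \cdots x_n^{*}\right) A^{\dagger} G \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}. \tag{13.61}
$$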

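Meanwhile, we get

$$
\langle y|A(x)\rangle = \left(y_1^{*} \cdots y_n^{*}\right) \begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix} \left(|e_1\rangle \cdots |e_n\rangle\right) \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \left(y_1^{*} \cdots y_n^{*}\right) G A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{13.62}
$$

Hence, we have

$$
\langle y|A(x)\rangle^{*} = \left(y_1 \cdots y_n\right) G^{*} A^{*} \begin{pmatrix} x_1^{*} \\ \vdots \\ x_n^{*} \end{pmatrix} = \left(y_1 \cdots y_n\right) G^{T} \left(A^{\dagger}\right)^{T} \begin{pmatrix} x_1^{*} \\ \vdots \\ x_n^{*} \end{pmatrix}. \tag{13.63}
$$

With the second equality, we used $G^{*} = (G^{\dagger})^{T} = G^{T}$ (note that $G$ is Hermitian) and $A^{*} = (A^{\dagger})^{T}$. A complex conjugate matrix $A^{*}$ is defined as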

$$
A^{*} \equiv \begin{pmatrix} a_{11}^{*} & \cdots & a_{1n}^{*} \\ \vdots & \ddots & \vdots \\ a_{n1}^{*} & \cdots & a_{nn}^{*} \end{pmatrix}.
$$

Comparing (13.61) and (13.63), we find that one is the transposed matrix of the other. Also note that $(AB)^{T} = B^{T}A^{T}$, $(ABC)^{T} = C^{T}B^{T}A^{T}$, etc. Since an inner product can be viewed as a $(1, 1)$ matrix, two mutually transposed $(1, 1)$ matrices are identical. Hence, we get

$$
\langle (x)A^{\dagger}|y\rangle = \langle y|A(x)\rangle^{*} = \langle A(x)|y\rangle, \tag{13.64}
$$

where the second equality is due to (13.2). The other way around, we may use (13.64) for the definition of an adjoint operator of a linear transformation $A$. In fact, on the basis of (13.64) we have

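$$
\begin{aligned}
\sum_{i,j,k} x_i^{*} \left(A^{\dagger}\right)_{ik} (G)_{kj} y_j - \sum_{i,j,k} y_j \left(G^{*}\right)_{jk} \left(A^{*}\right)_{ki} x_i^{*}
&= \sum_{i,j,k} x_i^{*} y_j \left[\left(A^{\dagger}\right)_{ik} (G)_{kj} - \left(G^{*}\right)_{jk} \left(A^{*}\right)_{ki}\right] \\
&= \sum_{i,j,k} x_i^{*} y_j \left[\left(A^{\dagger}\right)_{ik} - \left(A^{*}\right)_{ki}\right] (G)_{kj} = 0.
\end{aligned} \tag{13.65}
$$

With the third equality of (13.65), we used $(G^{*})_{jk} = (G)_{kj}$, i.e., $G^{\dagger} = G$ (Hermitian matrix). Thanks to the freedom in choice of basis vectors as well as $x_i$ and $y_i$, we must have

$$
\left(A^{\dagger}\right)_{ik} = \left(A^{*}\right)_{ki}. \tag{13.66}
$$

Adopting the matrix representation of (13.59) for $A$, we get [1]

$$
\left(A^{\dagger}\right)_{ik} = a_{ki}^{*}. \tag{13.67}
$$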

Thus, we confirm that the adjoint operator $A^{\dagger}$ is represented by a complex conjugate transposed matrix of $A$, in accordance with (13.1).

Taking the complex conjugate of (13.64), we have

$$
\langle (x)A^{\dagger}|y\rangle^{*} = \left\langle y\middle|\left(A^{\dagger}\right)^{\dagger}(x)\right\rangle = \langle y|A(x)\rangle.
$$

Comparing both sides of the second equality, we get

$$
\left(A^{\dagger}\right)^{\dagger} = A. \tag{13.68}
$$
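The component forms (13.61) and (13.62) and the relation (13.64) can be checked for a non-orthonormal basis with a few lines of Python. In this sketch the Gram matrix, the matrix representation of $A$, and the coordinate columns are arbitrary sample values chosen for illustration; only the Hermiticity of $G$ matters.

```python
def dagger(m):
    """Conjugate transpose of a square matrix given as nested lists."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]


def row_times(row, m):
    """Row vector times matrix."""
    return [sum(row[i] * m[i][j] for i in range(len(row)))
            for j in range(len(m[0]))]


def dot(row, col):
    """Row vector times column vector (no conjugation)."""
    return sum(r * c for r, c in zip(row, col))


def bra(x):
    """Row of complex-conjugated coordinates (x_1^*, ..., x_n^*)."""
    return [complex(v).conjugate() for v in x]


G = [[2, 1 + 1j], [1 - 1j, 2]]          # sample Hermitian Gram matrix
A = [[1 + 1j, 2], [0.5j, 3 - 2j]]       # sample matrix representation of A
x = [1 - 1j, 2j]                        # coordinates of |x>
y = [3, -1 + 0.5j]                      # coordinates of |y>

# (13.61): <(x)A†|y> = x† A† G y
lhs = dot(row_times(row_times(bra(x), dagger(A)), G), y)

# (13.62): <y|A(x)> = y† G A x
rhs = dot(row_times(row_times(bra(y), G), A), x)

# (13.64): the two are complex conjugates of each other.
print(abs(lhs - rhs.conjugate()) < 1e-12)

# (13.68): taking the adjoint twice returns the original matrix.
print(dagger(dagger(A)) == A)
```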

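In Chap. 11, we have seen how a matrix representation of a linear transformation $A$ is changed from $A_0$ to $A'$ by the basis vectors transformation. We have

$$
A' = P^{-1} A_0 P. \tag{11.88}
$$

In a similar manner, taking the adjoint of (11.88) we get

$$
(A')^{\dagger} = P^{\dagger} A^{\dagger} \left(P^{-1}\right)^{\dagger} = P^{\dagger} A^{\dagger} \left(P^{\dagger}\right)^{-1}. \tag{13.69}
$$

In (13.69), we denote the adjoint operator before the basis vectors transformation simply by $A^{\dagger}$ to avoid complicated notation. We also have

$$
\left(A^{\dagger}\right)' = (A')^{\dagger}. \tag{13.70}
$$

Meanwhile, suppose that $(A^{\dagger})' = Q^{-1} A^{\dagger} Q$. Then, from (13.69) and (13.70) we have

$$
P^{\dagger} = Q^{-1}.
$$

Next let us perform a calculation as below:

$$
\langle (\alpha u + \beta v)A^{\dagger}|y\rangle = \langle y|A(\alpha u + \beta v)\rangle^{*} = \alpha^{*}\langle y|A(u)\rangle^{*} + \beta^{*}\langle y|A(v)\rangle^{*} = \alpha^{*}\langle (u)A^{\dagger}|y\rangle + \beta^{*}\langle (v)A^{\dagger}|y\rangle = \langle \alpha(u)A^{\dagger} + \beta(v)A^{\dagger}|y\rangle.
$$

As $y$ is an element arbitrarily chosen from a relevant vector space, we have

$$
\langle (\alpha u + \beta v)A^{\dagger}| = \langle \alpha(u)A^{\dagger} + \beta(v)A^{\dagger}|, \tag{13.71}
$$

or

$$
(\alpha u + \beta v)A^{\dagger} = \alpha(u)A^{\dagger} + \beta(v)A^{\dagger}. \tag{13.72}
$$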

Equation (13.71) states the equality of two vectors in an inner product space on both sides, whereas (13.72) states that in a vector space where the inner product is not defined. In either case, both (13.71) and (13.72) show that $A^{\dagger}$ is indeed a linear transformation. In fact, the matrix representation of (13.66) and (13.67) is independent of the concept of the inner product.

Suppose that there are two (or more) adjoint operators $B$ and $C$ that correspond to $A$. Then, from (13.64) we have

$$
\langle (x)B|y\rangle = \langle (x)C|y\rangle = \langle y|A(x)\rangle^{*}. \tag{13.73}
$$

Also we have

$$
\langle (x)B - (x)C|y\rangle = \langle (x)(B - C)|y\rangle = 0. \tag{13.74}
$$

As $x$ and $y$ are arbitrarily chosen elements, we get $B = C$, indicating the uniqueness of the adjoint operator.

It is of importance to examine how the norm of a vector is changed by the linear transformation. To this end, let us perform a calculation as below:

$$
\langle (x)A^{\dagger}|A(x)\rangle = \left(x_1^{*} \cdots x_n^{*}\right) \begin{pmatrix} a_{11}^{*} & \cdots & a_{n1}^{*} \\ \vdots & \ddots & \vdots \\ a_{1n}^{*} & \cdots & a_{nn}^{*} \end{pmatrix} \begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix} \left(|e_1\rangle \cdots |e_n\rangle\right) \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \left(x_1^{*} \cdots x_n^{*}\right) A^{\dagger} G A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{13.75}
$$

Equation (13.75) gives the squared norm of the vector after its transformation. We may have a case where the norm is conserved before and after the transformation. Actually, comparing (13.34) and (13.75), we notice that if $A^{\dagger}GA = G$, $\langle x|x\rangle = \langle (x)A^{\dagger}|A(x)\rangle$. Let us look at the following example of this.

Example 13.3 Let us take two mutually orthogonal vectors $|e_1\rangle$ and $|e_2\rangle$ with $\|e_2\| = 2\|e_1\|$ as basis vectors in the $xy$-plane (Fig. 13.1). We assume that $|e_1\rangle$ is a unit vector. Hence, the set of vectors $|e_1\rangle$ and $|e_2\rangle$ does not constitute an orthonormal basis. Let $|x\rangle$ be an arbitrary position vector there expressed as

$$
|x\rangle = \left(|e_1\rangle\ |e_2\rangle\right) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \tag{13.76}
$$

Fig. 13.1 Basis vectors $|e_1\rangle$ and $|e_2\rangle$ in the $xy$-plane and their linear transformation by $R$

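Then we have

$$
\langle x|x\rangle = (x_1\ x_2) \begin{pmatrix} \langle e_1| \\ \langle e_2| \end{pmatrix} \left(|e_1\rangle\ |e_2\rangle\right) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = (x_1\ x_2) \begin{pmatrix} \langle e_1|e_1\rangle & \langle e_1|e_2\rangle \\ \langle e_2|e_1\rangle & \langle e_2|e_2\rangle \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = (x_1\ x_2) \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1^2 + 4x_2^2. \tag{13.77}
$$

Next let us think of the following linear transformation $R$ whose matrix representation is given by

$$
R = \begin{pmatrix} \cos\theta & -2\sin\theta \\ (\sin\theta)/2 & \cos\theta \end{pmatrix}. \tag{13.78}
$$

The transformation matrix $R$ is geometrically represented in Fig. 13.1. Following (11.36) we have

$$
R(|x\rangle) = \left(|e_1\rangle\ |e_2\rangle\right) \begin{pmatrix} \cos\theta & -2\sin\theta \\ (\sin\theta)/2 & \cos\theta \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \tag{13.79}
$$

As a result of the transformation $R$, the basis vectors $|e_1\rangle$ and $|e_2\rangle$ are transformed into $|e_1'\rangle$ and $|e_2'\rangle$, respectively, as in Fig. 13.1 such that

$$
\left(|e_1'\rangle\ |e_2'\rangle\right) = \left(|e_1\rangle\ |e_2\rangle\right) \begin{pmatrix} \cos\theta & -2\sin\theta \\ (\sin\theta)/2 & \cos\theta \end{pmatrix}.
$$

Taking an inner product of (13.79), we have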

13.4

Orthonormal Basis

ðxÞR{ jRðxÞ

541

cos θ

ð sin θÞ=2

!

1

0

2 sin θ cos θ 0 ! ! 1 0 x1 ¼ ðx1 x2 Þ ¼ x21 þ 4x22 0 4 x2

4

¼ ðx1 x2 Þ

!

cos θ

2 sin θ

ð sin θÞ=2

cos θ

!

x1

!

x2

ð13:80Þ Putting G¼

1

0

0

4

,

ð13:81Þ

we have R{GR ¼ G. Comparing (13.77) and (13.80), we notice that a norm of jxi remains unchanged after the transformation R. This means that R is virtually a unitary transformation. A somewhat unfamiliar matrix form of R resulted from the choice of basis vectors other than an orthonormal basis.
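The conservation of the norm in Example 13.3 can also be checked numerically. The following is a minimal sketch in plain Python, not part of the original text; the helper names (`matmul`, `transpose`, `r_matrix`, `squared_norm`) and the sample values of θ and of the coordinates are assumptions chosen only for illustration.

```python
import math

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def r_matrix(theta):
    """Matrix (13.78) with respect to the basis |e1>, |e2> with ||e2|| = 2||e1||."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -2.0 * s], [s / 2.0, c]]

def squared_norm(x, gram):
    """<x|x> = x^T G x for real coordinates x, as in (13.77)."""
    col = [[v] for v in x]
    return matmul(matmul(transpose(col), gram), col)[0][0]

G = [[1.0, 0.0], [0.0, 4.0]]           # Gram matrix (13.81)
R = r_matrix(0.7)                       # theta = 0.7 rad is an arbitrary choice
RtGR = matmul(matmul(transpose(R), G), R)

x = [1.5, -0.25]                        # arbitrary coordinates (x1, x2)
Rx = [row[0] for row in matmul(R, [[v] for v in x])]

print(RtGR)                             # should agree with G up to rounding
print(squared_norm(x, G), squared_norm(Rx, G))
```

Since R is real here, the adjoint reduces to the transpose, which is why `transpose` is used in place of a Hermitian conjugate.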

13.4

Orthonormal Basis

Now we introduce an orthonormal basis, the simplest and most important basis set in an inner product space. If we choose the orthonormal basis so that $\langle e_i|e_j\rangle = \delta_{ij}$, the Gram matrix $G = E$. Thus, $R^{\dagger}GR = G$ reads as $R^{\dagger}R = E$. In that case a linear transformation is represented by a unitary matrix and it conserves the norm of a vector and the inner product of two arbitrary vectors.

So far we assumed that an adjoint operator $A^{\dagger}$ operates only on a row vector from the right, as is evident from (13.61). At the same time, A operates only on the column vector from the left as in (13.62). To render the notation of (13.61) and (13.62) consistent with the associative law, we have to examine the commutability of $A^{\dagger}$ with G. In this context, choice of the orthonormal basis enables us to get through such a troublesome situation and largely eases matrix calculation. Thus,

\[
\langle (x)A^{\dagger}|A(x)\rangle
= (x_1^{*} \cdots x_n^{*})
\begin{pmatrix} a_{11}^{*} & \cdots & a_{n1}^{*} \\ \vdots & \ddots & \vdots \\ a_{1n}^{*} & \cdots & a_{nn}^{*} \end{pmatrix}
\begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}
(|e_1\rangle \cdots |e_n\rangle)
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= (x_1^{*} \cdots x_n^{*})\, A^{\dagger} E A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= (x_1^{*} \cdots x_n^{*})\, A^{\dagger} A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{13.82}
\]

At the same time, we adopt a simple notation as below instead of (13.75)

\[ \langle xA^{\dagger}|Ax\rangle = (x_1^{*} \cdots x_n^{*})\, A^{\dagger} A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{13.83} \]

This notation has now become consistent with the associative law. Note that $A^{\dagger}$ and A operate on either a column or row vector. We can also do without the symbol "|" in (13.83) and express it as

\[ \langle xA^{\dagger}Ax\rangle = \langle xA^{\dagger}|Ax\rangle. \tag{13.84} \]

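Thus, we can freely operate $A^{\dagger}$ and A from both the left and right. By the same token, we rewrite (13.62) as

\[
\langle y|Ax\rangle = \langle yAx\rangle
= (y_1^{*} \cdots y_n^{*})
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= (y_1^{*} \cdots y_n^{*})\, A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{13.85}
\]

Here, notice that a vector $|x\rangle$ is represented by a column vector with respect to the orthonormal basis. Using (13.64), we have

\[
\langle y|Ax\rangle^{*} = \langle xA^{\dagger}|y\rangle
= (x_1^{*} \cdots x_n^{*})\, A^{\dagger} \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
= (x_1^{*} \cdots x_n^{*})
\begin{pmatrix} a_{11}^{*} & \cdots & a_{n1}^{*} \\ \vdots & \ddots & \vdots \\ a_{1n}^{*} & \cdots & a_{nn}^{*} \end{pmatrix}
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}. \tag{13.86}
\]

If in (13.83) we put $|y\rangle = |Ax\rangle$, then $\langle xA^{\dagger}|Ax\rangle = \langle y|y\rangle \geq 0$. Thus, we define a norm of $|Ax\rangle$ as

\[ \|Ax\| = \sqrt{\langle xA^{\dagger}|Ax\rangle}. \tag{13.87} \]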

Now we are in a position to construct an orthonormal basis in $V^n$ using n linearly independent vectors $|i\rangle$ $(1 \leq i \leq n)$. The following theorem is well known as the Gram–Schmidt orthonormalization.

Theorem 13.2: Gram–Schmidt Orthonormalization Theorem [5] Suppose that there is a set of linearly independent vectors $|i\rangle$ $(1 \leq i \leq n)$ in $V^n$. Then one can construct an orthonormal basis $|e_i\rangle$ $(1 \leq i \leq n)$ so that $\langle e_i|e_j\rangle = \delta_{ij}$ $(1 \leq j \leq n)$ and each vector $|e_i\rangle$ can be a linear combination of the vectors $|i\rangle$.

Proof First let us take $|1\rangle$. This can be normalized such that

\[ |e_1\rangle = \frac{|1\rangle}{\sqrt{\langle 1|1\rangle}}, \quad \langle e_1|e_1\rangle = 1. \tag{13.88} \]

Next let us take $|2\rangle$ and then make the following vector:

\[ |e_2\rangle = \frac{1}{L_2}\left[\, |2\rangle - \langle e_1|2\rangle |e_1\rangle \,\right], \tag{13.89} \]

where $L_2$ is a normalization constant such that $\langle e_2|e_2\rangle = 1$. Note that $|e_2\rangle$ cannot be a zero vector. This is because if $|e_2\rangle$ were a zero vector, $|2\rangle$ and $|e_1\rangle$ (or $|1\rangle$) would be linearly dependent, in contradiction to the assumption. We have $\langle e_1|e_2\rangle = 0$. Thus, $|e_1\rangle$ and $|e_2\rangle$ are orthonormal.

After this, the proof is based upon mathematical induction. Suppose that the theorem is true of $(n - 1)$ vectors. That is, let $|e_i\rangle$ $(1 \leq i \leq n - 1)$ be constructed so that $\langle e_i|e_j\rangle = \delta_{ij}$ $(1 \leq j \leq n - 1)$ and each vector $|e_i\rangle$ can be a linear combination of the vectors $|i\rangle$. Meanwhile, let us define

\[ |\tilde{n}\rangle \equiv |n\rangle - \sum_{j=1}^{n-1} \langle e_j|n\rangle |e_j\rangle. \tag{13.90} \]

Again, the vector $|\tilde{n}\rangle$ cannot be a zero vector as asserted above. We have

\[ \langle e_k|\tilde{n}\rangle = \langle e_k|n\rangle - \sum_{j=1}^{n-1} \langle e_j|n\rangle \langle e_k|e_j\rangle = \langle e_k|n\rangle - \sum_{j=1}^{n-1} \langle e_j|n\rangle \delta_{kj} = 0, \tag{13.91} \]

where $1 \leq k \leq n - 1$. The second equality comes from the assumption of the induction. The vector $|\tilde{n}\rangle$ can always be normalized such that

\[ |e_n\rangle = \frac{|\tilde{n}\rangle}{\sqrt{\langle \tilde{n}|\tilde{n}\rangle}}, \quad \langle e_n|e_n\rangle = 1. \tag{13.92} \]

Thus, the theorem is proven. ∎

In (13.92) a phase factor $e^{i\theta}$ ($\theta$: an arbitrarily chosen real number) can be added such that

\[ |e_n\rangle = \frac{e^{i\theta}|\tilde{n}\rangle}{\sqrt{\langle \tilde{n}|\tilde{n}\rangle}}. \tag{13.93} \]

To prove Theorem 13.2, we have used the following simple but important theorem.

Theorem 13.3 Let us have any n vectors $|i\rangle \neq 0$ $(1 \leq i \leq n)$ in $V^n$ and let these vectors $|i\rangle$ be orthogonal to one another. Then the vectors $|i\rangle$ are linearly independent.

Proof Let us think of the following equation:

\[ c_1|1\rangle + c_2|2\rangle + \cdots + c_n|n\rangle = 0. \tag{13.94} \]

Multiplying (13.94) by $\langle i|$ from the left and considering the orthogonality among the vectors, we have

\[ c_i \langle i|i\rangle = 0. \tag{13.95} \]

Since $\langle i|i\rangle \neq 0$, $c_i = 0$. The above is true of any $c_i$ and $|i\rangle$. Then $c_1 = c_2 = \cdots = c_n = 0$. Thus, (13.94) implies that $|1\rangle, |2\rangle, \cdots, |n\rangle$ are linearly independent. ∎
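The constructive steps (13.88)–(13.92) of the Gram–Schmidt orthonormalization translate almost directly into code. The sketch below is an illustration only, written in plain Python under the assumption that vectors are given as lists of complex coordinates with respect to an orthonormal basis; the function names `inner` and `gram_schmidt` and the sample vectors are hypothetical.

```python
import math

def inner(u, v):
    """<u|v>, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Build an orthonormal list |e_1>, ..., |e_n> from linearly independent vectors."""
    basis = []
    for v in vectors:
        # |n~> = |n> - sum_j <e_j|n> |e_j>, cf. (13.90)
        f = [complex(c) for c in v]
        for e in basis:
            coeff = inner(e, v)
            f = [fc - coeff * ec for fc, ec in zip(f, e)]
        norm = math.sqrt(inner(f, f).real)
        if norm < tol:
            # a (numerically) zero vector signals linear dependence
            raise ValueError("input vectors are linearly dependent")
        basis.append([fc / norm for fc in f])   # normalization, cf. (13.92)
    return basis

vectors = [[1, 1j, 0], [1, 0, 1], [0, 1, 1]]    # arbitrary independent vectors
basis = gram_schmidt(vectors)
for i, ei in enumerate(basis):
    print([round(abs(inner(ei, ej)), 12) for ej in basis])
```

The phase freedom of (13.93) is not exercised here; multiplying any returned vector by $e^{i\theta}$ would leave the orthonormality intact.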

References

1. Hassani S (2006) Mathematical physics. Springer, New York
2. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
3. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
4. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering, 3rd edn. Cambridge University Press, Cambridge
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York

Chapter 14

Hermitian Operators and Unitary Operators

Hermitian operators and unitary operators are quite often encountered in mathematical physics and, in particular, quantum physics. In this chapter we investigate their basic properties. Both Hermitian operators and unitary operators fall under the category of normal operators. Normal matrices are characterized by the important fact that they can be diagonalized by a unitary matrix. Moreover, Hermitian matrices always possess real eigenvalues. This fact largely eases the mathematical treatment of quantum mechanics. In relation to these topics, in this chapter we investigate projection operators systematically. We find their important application to physicochemical problems in Part IV. We further investigate Hermitian quadratic forms and real symmetric quadratic forms as an important branch of matrix algebra. In connection with this topic, positive definiteness and the non-negative property of a matrix are important concepts. These characteristics are readily applicable to the theory of differential operators, thus rendering this chapter closely related to basic concepts of quantum physics.

14.1

Projection Operators

© Springer Nature Singapore Pte Ltd. 2020 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3_14

In Chap. 12 we considered the decomposition of a vector space into a direct sum of invariant subspaces. We also mentioned properties of idempotent operators. Moreover, we have shown how an orthonormal basis can be constructed from a set of linearly independent vectors. In this section an orthonormal basis set is implied as basis vectors in an inner product space $V^n$. Let us start with the concept of an orthogonal complement. Let W be a subspace in $V^n$. Let us think of a set of vectors $|x\rangle$ such that

\[ \left\{\, |x\rangle ;\ \langle x|y\rangle = 0 \ \text{for}\ \forall\, |y\rangle \in W \,\right\}. \]

We name this set $W^{\perp}$ and call it an orthogonal complement of W. The set $W^{\perp}$ forms a subspace of $V^n$. In fact, if $|a\rangle, |b\rangle \in W^{\perp}$, then $\langle a|y\rangle = 0$ and $\langle b|y\rangle = 0$, so that $(\langle a| + \langle b|)|y\rangle = \langle a|y\rangle + \langle b|y\rangle = 0$. Therefore, $|a\rangle + |b\rangle \in W^{\perp}$. Also, $\langle \alpha a|y\rangle = \alpha^{*}\langle a|y\rangle = 0$. Hence, $|\alpha a\rangle = \alpha|a\rangle \in W^{\perp}$. Then, $W^{\perp}$ is a subspace of $V^n$.

Theorem 14.1 Let W be a subspace and $W^{\perp}$ be its orthogonal complement in $V^n$. Then,

\[ V^n = W \oplus W^{\perp}. \tag{14.1} \]

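Proof Suppose that an orthonormal basis comprising $|e_1\rangle, |e_2\rangle, \cdots$, and $|e_n\rangle$ spans $V^n$;

\[ V^n = \mathrm{Span}\{|e_1\rangle, |e_2\rangle, \ldots, |e_n\rangle\}. \tag{14.2} \]

Of the orthonormal basis, let $|e_1\rangle, |e_2\rangle, \cdots$, and $|e_r\rangle$ $(r < n)$ span W. Let an arbitrarily chosen vector from $V^n$ be $|x\rangle$. Then we have

\[ |x\rangle = x_1|e_1\rangle + x_2|e_2\rangle + \cdots + x_n|e_n\rangle = \sum_{i=1}^{n} x_i |e_i\rangle. \tag{14.3} \]

Multiplying $\langle e_j|$ on (14.3) from the left, we have

\[ \langle e_j|x\rangle = \sum_{i=1}^{n} x_i \langle e_j|e_i\rangle = \sum_{i=1}^{n} x_i \delta_{ij} = x_j. \tag{14.4} \]

That is,

\[ |x\rangle = \sum_{i=1}^{n} \langle e_i|x\rangle |e_i\rangle. \tag{14.5} \]

Meanwhile, put

\[ |x'\rangle = \sum_{i=1}^{r} \langle e_i|x\rangle |e_i\rangle. \tag{14.6} \]

Then we have $|x'\rangle \in W$. Also putting $|x''\rangle = |x\rangle - |x'\rangle$ and multiplying $\langle e_i|$ $(1 \leq i \leq r)$ on it from the left, we get

\[ \langle e_i|x''\rangle = \langle e_i|x\rangle - \langle e_i|x'\rangle = \langle e_i|x\rangle - \langle e_i|x\rangle = 0. \tag{14.7} \]

Taking account of $W = \mathrm{Span}\{|e_1\rangle, |e_2\rangle, \cdots, |e_r\rangle\}$, we get $|x''\rangle \in W^{\perp}$. That is, for $\forall\, |x\rangle \in V^n$

\[ |x\rangle = |x'\rangle + |x''\rangle. \tag{14.8} \]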

This means that $V^n = W + W^{\perp}$. Meanwhile, we have $W \cap W^{\perp} = \{0\}$. In fact, suppose that $|x\rangle \in W \cap W^{\perp}$. Then $\langle x|x\rangle = 0$ because of the orthogonality. However, this implies that $|x\rangle = 0$. Consequently, we have $V^n = W \oplus W^{\perp}$. This completes the proof.

The consequence of Theorem 14.1 is that the dimension of $W^{\perp}$ is $(n - r)$. In other words, we have

\[ \dim V^n = n = \dim W + \dim W^{\perp}. \]

Moreover, the contents of Theorem 14.1 can readily be generalized to more subspaces such that

\[ V^n = W_1 \oplus W_2 \oplus \cdots \oplus W_r, \tag{14.9} \]

where $W_1, W_2, \cdots$, and $W_r$ $(r \leq n)$ are mutually orthogonal complements. In this case $\forall\, |x\rangle \in V^n$ can be expressed uniquely as the sum of individual vectors $|w_1\rangle, |w_2\rangle, \cdots$, and $|w_r\rangle$ of each subspace; i.e.,

\[ |x\rangle = |w_1\rangle + |w_2\rangle + \cdots + |w_r\rangle = |w_1 + w_2 + \cdots + w_r\rangle. \tag{14.10} \]

Let us define the following operators similarly to the case of (12.201):

\[ P_i(|x\rangle) = |w_i\rangle \quad (1 \leq i \leq r). \tag{14.11} \]

Thus, the operator $P_i$ extracts a vector $|w_i\rangle$ in a subspace $W_i$. Then we have

\[ (P_1 + P_2 + \cdots + P_r)(|x\rangle) = P_1|x\rangle + P_2|x\rangle + \cdots + P_r|x\rangle = |w_1\rangle + |w_2\rangle + \cdots + |w_r\rangle = |x\rangle. \tag{14.12} \]

Since $|x\rangle$ is an arbitrarily chosen vector, we get

\[ P_1 + P_2 + \cdots + P_r = E. \tag{14.13} \]

Moreover,

\[ P_i[P_i(|x\rangle)] = P_i(|w_i\rangle) = |w_i\rangle \quad (1 \leq i \leq r). \tag{14.14} \]

Therefore, from (14.11) and (14.14) we have

\[ P_i[P_i(|x\rangle)] = P_i(|x\rangle). \tag{14.15} \]

The vector $|x\rangle$ is arbitrarily chosen, and so we get

\[ P_i^{2} = P_i. \tag{14.16} \]

Choose another arbitrary vector $|y\rangle \in V^n$ such that

\[ |y\rangle = |u_1\rangle + |u_2\rangle + \cdots + |u_r\rangle = |u_1 + u_2 + \cdots + u_r\rangle. \tag{14.17} \]

Then, we have

\[ \langle x|P_i y\rangle = \langle w_1 + w_2 + \cdots + w_r|P_i|u_1 + u_2 + \cdots + u_r\rangle = \langle w_1 + w_2 + \cdots + w_r|u_i\rangle = \langle w_i|u_i\rangle. \tag{14.18} \]

With the last equality, we used the mutual orthogonality of the subspaces. Meanwhile, we have

\[ \langle y|P_i x\rangle^{*} = \langle u_1 + u_2 + \cdots + u_r|P_i|w_1 + w_2 + \cdots + w_r\rangle^{*} = \langle u_1 + u_2 + \cdots + u_r|w_i\rangle^{*} = \langle u_i|w_i\rangle^{*} = \langle w_i|u_i\rangle. \tag{14.19} \]

Comparing (14.18) and (14.19), we get

\[ \langle x|P_i y\rangle = \langle y|P_i x\rangle^{*} = \langle xP_i^{\dagger}|y\rangle, \tag{14.20} \]

where we used (13.64) with the second equality. Since $|x\rangle$ and $|y\rangle$ are arbitrarily chosen, we get

\[ P_i^{\dagger} = P_i. \tag{14.21} \]

Equation (14.21) shows that $P_i$ is Hermitian. The above discussion parallels that made in Sect. 12.4 with an idempotent operator. We have the following definition of a projection operator.

Definition 14.1 An operator P is said to be a projection operator if $P^{2} = P$ and $P^{\dagger} = P$. That is, an idempotent and Hermitian operator is a projection operator.

As described above, a projection operator is characterized by (14.16) and (14.21). An idempotent operator does not presuppose an inner product space; it only requires a direct sum of subspaces. In contrast, if we deal with the projection operator, we are thinking of orthogonal complements as subspaces and their direct sum. The projection operator can adequately be defined in an inner product vector space having an orthonormal basis. From (14.13) we have

\[ (P_1 + P_2 + \cdots + P_r)(P_1 + P_2 + \cdots + P_r) = \sum_{i=1}^{r} P_i^{2} + \sum_{i \neq j} P_i P_j = \sum_{i=1}^{r} P_i + \sum_{i \neq j} P_i P_j = E + \sum_{i \neq j} P_i P_j = E. \tag{14.22} \]

In (14.22) we used (14.13) and (14.16). Therefore, we get

\[ \sum_{i \neq j} P_i P_j = 0. \]

In particular, we have

\[ P_i P_j = 0, \quad P_j P_i = 0 \quad (i \neq j). \tag{14.23} \]

In fact, we have

\[ P_i P_j(|x\rangle) = P_i|w_j\rangle = 0 \quad (i \neq j,\ 1 \leq i, j \leq n). \tag{14.24} \]

The second equality comes from $W_i \cap W_j = \{0\}$. Notice that in (14.24), indices i and j are interchangeable. Again, $|x\rangle$ is arbitrarily chosen, and so (14.23) holds. Combining (14.16) and (14.23), we write

\[ P_i P_j = \delta_{ij} P_i. \tag{14.25} \]

In virtue of the relation (14.23), $P_i + P_j$ $(i \neq j)$ is a projection operator as well [1]. In fact, we have

\[ (P_i + P_j)^{2} = P_i^{2} + P_i P_j + P_j P_i + P_j^{2} = P_i^{2} + P_j^{2} = P_i + P_j, \]

where the second equality comes from (14.23). Also we have

\[ (P_i + P_j)^{\dagger} = P_i^{\dagger} + P_j^{\dagger} = P_i + P_j. \]

The following notation is often used:

\[ P_i = \frac{|w_i\rangle\langle w_i|}{\|w_i\| \cdot \|w_i\|}. \tag{14.26} \]

Then we have

\[
P_i|x\rangle = \frac{|w_i\rangle\langle w_i|}{\|w_i\| \cdot \|w_i\|}\,(|w_1\rangle + |w_2\rangle + \cdots + |w_s\rangle)
= \left[\, |w_i\rangle(\langle w_i|w_1\rangle + \langle w_i|w_2\rangle + \cdots + \langle w_i|w_s\rangle) \,\right]/\langle w_i|w_i\rangle
= \left[\, |w_i\rangle\langle w_i|w_i\rangle \,\right]/\langle w_i|w_i\rangle = |w_i\rangle.
\]

Furthermore, we have

\[ P_i^{2}|x\rangle = P_i|w_i\rangle = |w_i\rangle = P_i|x\rangle. \tag{14.27} \]

Equation (14.16) is recovered accordingly. Meanwhile,

\[ (|w_i\rangle\langle w_i|)^{\dagger} = (\langle w_i|)^{\dagger}(|w_i\rangle)^{\dagger} = |w_i\rangle\langle w_i|. \tag{14.28} \]

Hence, we recover

\[ P_i^{\dagger} = P_i. \tag{14.21} \]
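The defining properties of a projection operator can be verified directly for the dyadic form (14.26). The following plain-Python sketch is illustrative only; the helper names (`dagger`, `matmul`, `add`, `close`, `projector`) and the two sample vectors, chosen to be mutually orthogonal, are assumptions and not part of the original text.

```python
def dagger(a):
    """Hermitian conjugate of a nested-list matrix."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def close(a, b, tol=1e-12):
    return all(abs(p - q) < tol for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def projector(w):
    """P = |w><w| / <w|w>, cf. (14.26); element (a, b) is w_a w_b^* / <w|w>."""
    norm_sq = sum(abs(c) ** 2 for c in w)
    return [[a * b.conjugate() / norm_sq for b in w] for a in w]

w1 = [1, 1j, 0]          # sample vectors with <w1|w2> = 0
w2 = [1j, 1, 2]
P1, P2 = projector(w1), projector(w2)
zero = [[0] * 3 for _ in range(3)]
P12 = add(P1, P2)

print(close(matmul(P1, P1), P1))     # idempotent, (14.16)
print(close(dagger(P1), P1))         # Hermitian, (14.21)
print(close(matmul(P1, P2), zero))   # mutual orthogonality, (14.23)
print(close(matmul(P12, P12), P12))  # P1 + P2 is again idempotent
```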

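In (14.28) we used $(AB)^{\dagger} = B^{\dagger}A^{\dagger}$. In fact, we have

\[ \langle xB^{\dagger}A^{\dagger}|y\rangle = \langle xB^{\dagger}|A^{\dagger}y\rangle = \langle yA|Bx\rangle^{*} = \langle y|AB|x\rangle^{*} = \langle x(AB)^{\dagger}|y\rangle. \tag{14.29} \]

With the second equality of (14.29), we used (13.86) where A is replaced with B and $|y\rangle$ is replaced with $A^{\dagger}|y\rangle$. Since (14.29) holds for arbitrarily chosen vectors $|x\rangle$ and $|y\rangle$, comparing the first and last sides of (14.29) we have

\[ (AB)^{\dagger} = B^{\dagger}A^{\dagger}. \tag{14.30} \]

We can express (14.29) alternatively as follows:

\[ \langle xB^{\dagger}A^{\dagger}y\rangle = \langle y(A^{\dagger})^{\dagger}(B^{\dagger})^{\dagger}x\rangle^{*} = \langle yA|Bx\rangle^{*} = \langle yABx\rangle^{*}, \tag{14.31} \]

where with the second equality we used (13.68). Also recall the remarks after (13.83) with the expressions of (14.29) and (14.31). Other notations can be adopted.

We can view a projection operator under a stricter condition. Related operators can be defined as well. As in (14.26), let us define an operator such that

\[ \tilde{P}_k \equiv |e_k\rangle\langle e_k|. \tag{14.32} \]

Operating $\tilde{P}_k$ on $|x\rangle = x_1|e_1\rangle + x_2|e_2\rangle + \cdots + x_n|e_n\rangle$ from the left, we get

\[ \tilde{P}_k|x\rangle = |e_k\rangle\langle e_k| \left( \sum_{j=1}^{n} x_j|e_j\rangle \right) = |e_k\rangle \sum_{j=1}^{n} x_j \langle e_k|e_j\rangle = |e_k\rangle \sum_{j=1}^{n} x_j \delta_{kj} = x_k|e_k\rangle. \]

Thus, we find that $\tilde{P}_k$ plays the same role as $P^{(k)}$ defined in (12.201). Represented by a matrix, $\tilde{P}_k$ has the same structure as that denoted in (12.205). Evidently,

\[ \tilde{P}_k^{\,2} = \tilde{P}_k, \quad \tilde{P}_k^{\dagger} = \tilde{P}_k, \quad \tilde{P}_1 + \tilde{P}_2 + \cdots + \tilde{P}_n = E. \tag{14.33} \]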

Now let us modify $P^{(k)}$ in (12.201). There $P^{(k)} = \delta_i^{(k)}\delta_j^{(k)}$, where only the (k, k) element is 1, otherwise 0, in the (n, n) matrix. We define a matrix $P^{(k)}_{(m)} \equiv \delta_i^{(k)}\delta_j^{(m)}$. A full matrix representation for it is

\[
P^{(k)}_{(m)} =
\begin{pmatrix}
0 & & & & \\
 & \ddots & & 1 & \\
 & & 0 & & \\
 & & & \ddots & \\
 & & & & 0
\end{pmatrix}, \tag{14.34}
\]

where only the (k, m) element is 1, otherwise 0. In the example of (14.34), $P^{(k)}_{(m)}$ is an upper triangular matrix $(k < m)$. Therefore, its eigenvalues are all zero, and so $P^{(k)}_{(m)}$ is a nilpotent matrix. If $k > m$, the matrix is a lower triangular matrix and nilpotent as well. Such a matrix is not Hermitian (nor a projection operator), as can be immediately seen from the matrix form of (14.34). Because of the properties of nilpotent matrices mentioned in Sect. 12.3, $P^{(k)}_{(m)}$ $(k \neq m)$ is not diagonalizable either. Various relations can be extracted. As an example, we have

\[ P^{(k)}_{(m)} P^{(l)}_{(n)} = \sum_{q} \delta_i^{(k)}\delta_q^{(m)}\delta_q^{(l)}\delta_j^{(n)} = \delta_i^{(k)}\delta_{ml}\delta_j^{(n)} = \delta_{ml} P^{(k)}_{(n)}. \tag{14.35} \]

Note that $P^{(k)}_{(k)} \equiv P^{(k)}$ defined in (12.201). From (14.35), moreover, we have

\[
P^{(k)}_{(m)} P^{(m)}_{(n)} = P^{(k)}_{(n)}, \quad
P^{(n)}_{(m)} P^{(m)}_{(n)} = P^{(n)}_{(n)}, \quad
P^{(m)}_{(n)} P^{(n)}_{(m)} = P^{(m)}_{(m)}, \quad
P^{(k)}_{(m)} P^{(k)}_{(m)} = \delta_{mk} P^{(k)}_{(m)}, \quad
P^{(m)}_{(m)} P^{(m)}_{(m)} = \left[ P^{(m)}_{(m)} \right]^{2} = P^{(m)}_{(m)}, \quad \text{etc.}
\]
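The product rule (14.35) is easy to confirm by brute force for a small dimension. The sketch below is a hypothetical illustration in plain Python; `matrix_unit(k, m, n)` stands for the (n, n) matrix whose only nonzero element is a 1 at position (k, m), with 1-based indices, and the dimension 4 is an arbitrary choice.

```python
def matrix_unit(k, m, n):
    """(n, n) matrix with 1 at the (k, m) element (1-based) and 0 elsewhere."""
    return [[1 if (i == k and j == m) else 0 for j in range(1, n + 1)]
            for i in range(1, n + 1)]

def matmul(a, b):
    return [[sum(a[i][q] * b[q][j] for q in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def scale(c, a):
    return [[c * v for v in row] for row in a]

def product_rule_holds(n):
    """Check P(k,m) P(l,q) == delta_{ml} P(k,q) for all index combinations."""
    idx = range(1, n + 1)
    for k in idx:
        for m in idx:
            for l in idx:
                for q in idx:
                    lhs = matmul(matrix_unit(k, m, n), matrix_unit(l, q, n))
                    rhs = scale(1 if m == l else 0, matrix_unit(k, q, n))
                    if lhs != rhs:
                        return False
    return True

n = 4
rule_ok = product_rule_holds(n)
# an off-diagonal unit (k != m) squares to zero, i.e., it is nilpotent
offdiag_square = matmul(matrix_unit(1, 3, n), matrix_unit(1, 3, n))
print(rule_ok, offdiag_square == scale(0, matrix_unit(1, 1, n)))
```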

These relations remain unchanged after the unitary similarity transformation by U. For instance, taking the first equation of the above, we have

\[ U^{\dagger}\left[ P^{(k)}_{(m)} P^{(m)}_{(n)} \right]U = \left[ U^{\dagger}P^{(k)}_{(m)}U \right]\left[ U^{\dagger}P^{(m)}_{(n)}U \right] = U^{\dagger}P^{(k)}_{(n)}U. \]

Among these operators, only $P^{(k)}_{(k)}$ is eligible for a projection operator. We will encounter further examples in Part IV. Using $P^{(k)}_{(k)}$ in (13.62), we have

\[ \left\langle y \middle| P^{(k)}_{(k)}(x) \right\rangle = (y_1^{*} \cdots y_n^{*})\, G P^{(k)}_{(k)} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{14.36} \]

Within a framework of an orthonormal basis where $G = E$, the representation is largely simplified to be

\[ \left\langle y P^{(k)}_{(k)} x \right\rangle = (y_1^{*} \cdots y_n^{*})\, P^{(k)}_{(k)} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = y_k^{*} x_k. \tag{14.37} \]

14.2

Normal Operators

There is a large group of operators called normal operators that play an important role in mathematical physics, especially quantum physics. A normal operator is defined as an operator on an inner product space that commutes with its adjoint operator. That is, let A be a normal operator. Then, we have

\[ AA^{\dagger} = A^{\dagger}A. \tag{14.38} \]

The normal operators include a Hermitian operator H defined as $H^{\dagger} = H$ as well as a unitary operator U defined as $UU^{\dagger} = U^{\dagger}U = E$. In this condition let us estimate the norm of $|A^{\dagger}x\rangle$ together with $|Ax\rangle$ defined by (13.87). If A is a normal operator,

\[ \|A^{\dagger}x\| = \sqrt{\langle x(A^{\dagger})^{\dagger}|A^{\dagger}x\rangle} = \sqrt{\langle xAA^{\dagger}x\rangle} = \sqrt{\langle xA^{\dagger}Ax\rangle} = \|Ax\|. \tag{14.39} \]
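Relation (14.39) can be illustrated with a small numerical experiment. In the plain-Python sketch below, the matrices `A` (normal, but neither Hermitian nor unitary) and `B` (not normal) as well as the test vector `x` are arbitrary choices made for illustration, and the helper names are assumptions.

```python
import math

def dagger(a):
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply(a, x):
    """Matrix acting on a column vector given as a flat list."""
    return [sum(a[i][k] * x[k] for k in range(len(x))) for i in range(len(a))]

def norm(x):
    return math.sqrt(sum(abs(c) ** 2 for c in x))

def is_normal(a, tol=1e-12):
    """Check A A^dagger == A^dagger A element by element, cf. (14.38)."""
    left, right = matmul(a, dagger(a)), matmul(dagger(a), a)
    return all(abs(p - q) < tol for rl, rr in zip(left, right) for p, q in zip(rl, rr))

A = [[1, 1j], [1j, 1]]      # normal: A A^dagger = A^dagger A = 2E
B = [[1, 1], [0, 1]]        # not normal
x = [1 + 2j, -0.5j]

print(is_normal(A), norm(apply(A, x)), norm(apply(dagger(A), x)))
print(is_normal(B), norm(apply(B, x)), norm(apply(dagger(B), x)))
```

For the normal matrix the two norms coincide, whereas for the non-normal one they differ for this particular vector.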

The other way around suppose that $\|Ax\| = \|A^{\dagger}x\|$. Then, since $\|Ax\|^{2} = \|A^{\dagger}x\|^{2}$, $\langle xA^{\dagger}Ax\rangle = \langle xAA^{\dagger}x\rangle$. That is, $\langle x|A^{\dagger}A - AA^{\dagger}|x\rangle = 0$ for an arbitrarily chosen vector $|x\rangle$. To assert $A^{\dagger}A - AA^{\dagger} = 0$, i.e., $A^{\dagger}A = AA^{\dagger}$ on the assumption that $\langle x|A^{\dagger}A - AA^{\dagger}|x\rangle = 0$, we need the following theorems:

Theorem 14.2 [2] A linear transformation A on an inner product space is the zero transformation if and only if $\langle y|Ax\rangle = 0$ for any vectors $|x\rangle$ and $|y\rangle$.

Proof If $A = 0$, then $\langle y|Ax\rangle = \langle y|0\rangle = 0$. This is because in (13.3) putting $\beta = 1 = -\gamma$ and $|b\rangle = |c\rangle$, we get $\langle a|0\rangle = 0$. Conversely, suppose that $\langle y|Ax\rangle = 0$ for any vectors $|x\rangle$ and $|y\rangle$. Then, putting $|y\rangle = |Ax\rangle$, $\langle xA^{\dagger}|Ax\rangle = 0$ and $\langle y|y\rangle = 0$. This implies that $|y\rangle = |Ax\rangle = 0$. For $|Ax\rangle = 0$ to hold for any $|x\rangle$ we must have $A = 0$. Note here that if A is a singular matrix, for some vectors $|x\rangle$, $|Ax\rangle = 0$. However, even though A is singular, for $|Ax\rangle = 0$ to hold for any $|x\rangle$, $A = 0$. ∎

We have another important theorem under a further restricted condition.

Theorem 14.3 [2] A linear transformation A on an inner product space is the zero transformation if and only if $\langle x|Ax\rangle = 0$ for any vectors $|x\rangle$.

Proof As in the case of Theorem 14.2, a necessary condition is trivial. To prove a sufficient condition, let us consider the following:

\[
\begin{aligned}
\langle x + y|A(x + y)\rangle &= \langle x|Ax\rangle + \langle y|Ay\rangle + \langle x|Ay\rangle + \langle y|Ax\rangle, \\
\langle x|Ay\rangle + \langle y|Ax\rangle &= \langle x + y|A(x + y)\rangle - \langle x|Ax\rangle - \langle y|Ay\rangle.
\end{aligned} \tag{14.40}
\]

From the assumption that $\langle x|Ax\rangle = 0$ with any vectors $|x\rangle$, we have

\[ \langle x|Ay\rangle + \langle y|Ax\rangle = 0. \tag{14.41} \]

Meanwhile, replacing $|y\rangle$ by $|iy\rangle$ in (14.41), we get

\[ \langle x|Aiy\rangle + \langle iy|Ax\rangle = i\left[ \langle x|Ay\rangle - \langle y|Ax\rangle \right] = 0. \tag{14.42} \]

That is,

\[ \langle x|Ay\rangle - \langle y|Ax\rangle = 0. \tag{14.43} \]

Combining (14.41) and (14.43), we get

\[ \langle x|Ay\rangle = 0. \tag{14.44} \]

Theorem 14.2 means that $A = 0$, indicating that the sufficient condition holds. This completes the proof. ∎

Thus, returning to the beginning, i.e., remarks made after (14.39), we establish the following theorem:

Theorem 14.4 A necessary and sufficient condition for a linear transformation A on an inner product space to be a normal operator is

\[ \|A^{\dagger}x\| = \|Ax\|. \tag{14.45} \]

14.3

Unitary Diagonalization of Matrices

A normal operator has a distinct property. The normal operator can be diagonalized by a similarity transformation by a unitary matrix. The transformation is said to be a unitary similarity transformation. Let us prove the following theorem.

Theorem 14.5 [3] A necessary and sufficient condition for a matrix A to be diagonalized by unitary similarity transformation is that the matrix A is a normal matrix.

Proof To prove the necessary condition, suppose that A can be diagonalized by a unitary matrix U. That is,

\[ U^{\dagger}AU = D, \ \text{i.e.,}\ A = UDU^{\dagger} \ \text{and}\ A^{\dagger} = UD^{\dagger}U^{\dagger}, \tag{14.46} \]

where D is a diagonal matrix. Then

\[ AA^{\dagger} = UDU^{\dagger}UD^{\dagger}U^{\dagger} = UDD^{\dagger}U^{\dagger} = UD^{\dagger}DU^{\dagger} = UD^{\dagger}U^{\dagger}UDU^{\dagger} = A^{\dagger}A. \tag{14.47} \]

For the third equality, we used $DD^{\dagger} = D^{\dagger}D$ (i.e., D and $D^{\dagger}$ are commutable). This shows that A is a normal matrix.

To prove the sufficient condition, let us show that a normal matrix can be diagonalized by unitary similarity transformation. The proof is due to mathematical induction, as is the case with Theorem 12.1. First we show that the theorem is true of a (2, 2) matrix. Suppose that one of the eigenvalues of $A_2$ is $\alpha_1$ and that its corresponding eigenvector is $|x_1\rangle$. Following procedures of the proof for Theorem 12.1 and remembering the Gram–Schmidt orthonormalization theorem, we can construct a unitary matrix $U_1$ such that

\[ U_1 = (|x_1\rangle\ |p_1\rangle), \tag{14.48} \]

where $|x_1\rangle$ represents a column vector and $|p_1\rangle$ is another arbitrarily determined column vector. Then we can convert $A_2$ to a triangular matrix such that

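\[ \widetilde{A_2} \equiv U_1^{\dagger}A_2U_1 = \begin{pmatrix} \alpha_1 & x \\ 0 & y \end{pmatrix}. \tag{14.49} \]

Then we have

\[
\widetilde{A_2}\left[\widetilde{A_2}\right]^{\dagger}
= \left(U_1^{\dagger}A_2U_1\right)\left(U_1^{\dagger}A_2U_1\right)^{\dagger}
= U_1^{\dagger}A_2U_1U_1^{\dagger}A_2^{\dagger}U_1
= U_1^{\dagger}A_2A_2^{\dagger}U_1
= U_1^{\dagger}A_2^{\dagger}A_2U_1
= U_1^{\dagger}A_2^{\dagger}U_1U_1^{\dagger}A_2U_1
= \left[\widetilde{A_2}\right]^{\dagger}\widetilde{A_2}. \tag{14.50}
\]

With the fourth equality, we used the supposition that $A_2$ is a normal matrix. Equation (14.50) means that $\widetilde{A_2}$ defined in (14.49) is a normal operator. Via simple matrix calculations, we have

\[
\widetilde{A_2}\left[\widetilde{A_2}\right]^{\dagger} = \begin{pmatrix} |\alpha_1|^{2} + |x|^{2} & xy^{*} \\ x^{*}y & |y|^{2} \end{pmatrix}, \quad
\left[\widetilde{A_2}\right]^{\dagger}\widetilde{A_2} = \begin{pmatrix} |\alpha_1|^{2} & \alpha_1^{*}x \\ \alpha_1 x^{*} & |x|^{2} + |y|^{2} \end{pmatrix}. \tag{14.51}
\]

For (14.50) to hold, we must have $x = 0$ in (14.51). Accordingly, we get

\[ \widetilde{A_2} = \begin{pmatrix} \alpha_1 & 0 \\ 0 & y \end{pmatrix}. \tag{14.52} \]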

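This implies that a normal matrix $A_2$ has been diagonalized by the unitary similarity transformation.

Now let us examine a general case where we consider a (n, n) square normal matrix $A_n$. Let $\alpha_n$ be one of the eigenvalues of $A_n$. On the basis of the argument of the (2, 2) matrix case, after a suitable similarity transformation by a unitary matrix $\tilde{U}$ we first have

\[ \widetilde{A_n} = \tilde{U}^{\dagger}A_n\tilde{U}, \tag{14.53} \]

where we can put

\[ \widetilde{A_n} = \begin{pmatrix} \alpha_n & x^{T} \\ 0 & B \end{pmatrix}, \tag{14.54} \]

where x is a column vector of order $(n - 1)$, 0 is a zero column vector of order $(n - 1)$, and B is a $(n - 1, n - 1)$ matrix. Then we have

\[ \left[\widetilde{A_n}\right]^{\dagger} = \begin{pmatrix} \alpha_n^{*} & 0 \\ x^{*} & B^{\dagger} \end{pmatrix}, \tag{14.55} \]

where $x^{*}$ is a complex column vector. Performing matrix calculations, we have

\[
\widetilde{A_n}\left[\widetilde{A_n}\right]^{\dagger} = \begin{pmatrix} |\alpha_n|^{2} + x^{T}x^{*} & x^{T}B^{\dagger} \\ Bx^{*} & BB^{\dagger} \end{pmatrix}, \quad
\left[\widetilde{A_n}\right]^{\dagger}\widetilde{A_n} = \begin{pmatrix} |\alpha_n|^{2} & \alpha_n^{*}x^{T} \\ \alpha_n x^{*} & x^{*}x^{T} + B^{\dagger}B \end{pmatrix}. \tag{14.56}
\]

For $\widetilde{A_n}\left[\widetilde{A_n}\right]^{\dagger} = \left[\widetilde{A_n}\right]^{\dagger}\widetilde{A_n}$ to hold with (14.56), we must have $x = 0$. Thus we get

\[ \widetilde{A_n} = \begin{pmatrix} \alpha_n & 0 \\ 0 & B \end{pmatrix}. \tag{14.57} \]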

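Since $\widetilde{A_n}$ is a normal matrix, so is B. According to mathematical induction, let us assume that the theorem holds with a $(n - 1, n - 1)$ matrix, i.e., B. Then, also from the assumption there exist a unitary matrix C and a diagonal matrix D, both of order $(n - 1)$, such that $BC = CD$. Hence,

\[ \begin{pmatrix} \alpha_n & 0 \\ 0 & B \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & C \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & C \end{pmatrix} \begin{pmatrix} \alpha_n & 0 \\ 0 & D \end{pmatrix}. \tag{14.58} \]

Here putting

\[ \widetilde{C_n} = \begin{pmatrix} 1 & 0 \\ 0 & C \end{pmatrix} \quad \text{and} \quad \widetilde{D_n} = \begin{pmatrix} \alpha_n & 0 \\ 0 & D \end{pmatrix}, \tag{14.59} \]

we get

\[ \widetilde{A_n}\widetilde{C_n} = \widetilde{C_n}\widetilde{D_n}. \tag{14.60} \]

As $\widetilde{C_n}$ is a (n, n) unitary matrix, $\left[\widetilde{C_n}\right]^{\dagger}\widetilde{C_n} = \widetilde{C_n}\left[\widetilde{C_n}\right]^{\dagger} = E$. Hence, $\left[\widetilde{C_n}\right]^{\dagger}\widetilde{A_n}\widetilde{C_n} = \widetilde{D_n}$. Thus, from (14.53) finally we get

\[ \left[\widetilde{C_n}\right]^{\dagger}\tilde{U}^{\dagger}A_n\tilde{U}\widetilde{C_n} = \widetilde{D_n}. \tag{14.61} \]

Putting $\tilde{U}\widetilde{C_n} = V$, V being another unitary operator,

\[ V^{\dagger}A_nV = \widetilde{D_n}. \tag{14.62} \]

This completes the proof. ∎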
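As a concrete illustration of Theorem 14.5, consider the real (2, 2) matrix with rows (0, −1) and (1, 0), which is normal (indeed unitary) but not Hermitian. The sketch below, in plain Python, checks that a unitary matrix V built from its normalized eigenvectors brings it to diagonal form. The example matrix, the matrix V, and the helper names are choices made here for illustration and do not come from the text.

```python
import math

def dagger(a):
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def close(a, b, tol=1e-12):
    return all(abs(p - q) < tol for ra, rb in zip(a, b) for p, q in zip(ra, rb))

A = [[0, -1], [1, 0]]                  # normal, eigenvalues +i and -i
s = 1 / math.sqrt(2)
V = [[s, s], [-1j * s, 1j * s]]        # columns: normalized eigenvectors of A

E = [[1, 0], [0, 1]]
D = matmul(matmul(dagger(V), A), V)    # V^dagger A V, cf. (14.62)

print(close(matmul(A, dagger(A)), matmul(dagger(A), A)))  # A is normal
print(close(matmul(dagger(V), V), E))                     # V is unitary
print(D)                                                  # diagonal up to rounding
```

The two columns of V are orthogonal to each other, in line with the statement that eigenvectors of a normal matrix belonging to different eigenvalues can be chosen orthonormal.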

A direct consequence of Theorem 14.5 is that with any normal matrix we can find a set of orthonormal eigenvectors corresponding to individual eigenvalues whether or not those are degenerate.

In Sect. 12.2 we dealt with a decomposition of a linear vector space and relevant reduction of an operator when discussing canonical forms of matrices. In this context Theorem 14.5 gives a simple and clear criterion for this. Equation (14.57) implies that a $(n - 1, n - 1)$ submatrix B can further be reduced to matrices having lower dimensions. Considering that a diagonal matrix is a special case of triangular matrices, a normal matrix that has been diagonalized by the unitary similarity transformation gives eigenvalues by its diagonal elements.

From the point of view of the aforementioned aspect, let us consider the characteristics of normal matrices, starting with the discussion about the invariant subspaces. We have the following important theorem.

Theorem 14.6 Let A be a normal matrix and let one of its eigenvalues be $\alpha$. Let $W_{\alpha}$ be an eigenspace corresponding to $\alpha$. Then, $W_{\alpha}$ is both A-invariant and $A^{\dagger}$-invariant. Also $W_{\alpha}^{\perp}$ is both A-invariant and $A^{\dagger}$-invariant.

Proof Theorem 14.5 ensures that a normal matrix is diagonalized by unitary similarity transformation. Therefore, we deal with only "proper" eigenvalues and eigenvectors here. First we show that if a subspace W is A-invariant, then its orthogonal complement $W^{\perp}$ is $A^{\dagger}$-invariant. In fact, suppose that $|x\rangle \in W$ and $|x'\rangle \in W^{\perp}$. Then, from (13.64) and (13.86) we have

\[ \langle x'|Ax\rangle^{*} = 0 = \langle xA^{\dagger}|x'\rangle = \langle x|A^{\dagger}x'\rangle. \tag{14.63} \]

The first equality comes from the fact that $|x\rangle \in W \Longrightarrow A|x\rangle\,(= |Ax\rangle) \in W$ as W is A-invariant. From the last equality of (14.63), we have

\[ A^{\dagger}|x'\rangle = |A^{\dagger}x'\rangle \in W^{\perp}. \tag{14.64} \]

That is, $W^{\perp}$ is $A^{\dagger}$-invariant. Next suppose that $|x\rangle \in W_{\alpha}$. Then we have

\[ AA^{\dagger}|x\rangle = A^{\dagger}A|x\rangle = A^{\dagger}(\alpha|x\rangle) = \alpha A^{\dagger}|x\rangle. \tag{14.65} \]

Therefore, $A^{\dagger}|x\rangle \in W_{\alpha}$. This means that $W_{\alpha}$ is $A^{\dagger}$-invariant. From the above remark, $W_{\alpha}^{\perp}$ is $(A^{\dagger})^{\dagger}$-invariant, i.e., A-invariant accordingly. This completes the proof. ∎

From Theorem 14.5, we know that the resulting diagonal matrix $\widetilde{D_n}$ in (14.62) has a form with n eigenvalues $(\alpha_n)$, some of which may be multiple roots, arranged in diagonal elements. After diagonalizing the matrix, those eigenvalues can be sorted out according to different eigenvalues $\alpha_1, \alpha_2, \cdots$, and $\alpha_s$. This can also be done by unitary similarity transformation. The relevant unitary matrix U is represented as

\[
U = \begin{pmatrix}
1 & & & & & & \\
 & \ddots & & & & & \\
 & & 0 & \cdots & 1 & & \\
 & & \vdots & \ddots & \vdots & & \\
 & & 1 & \cdots & 0 & & \\
 & & & & & \ddots & \\
 & & & & & & 1
\end{pmatrix}, \tag{14.66}
\]

where except (i, j) and (j, i) elements equal to 1, all the off-diagonal elements are zero. If operated from the left, U exchanges the i-th and j-th rows of the matrix. If operated from the right, U exchanges the i-th and j-th columns of the matrix. Note that U is at once unitary and Hermitian with eigenvalue 1 or −1. Note that $U^{2} = E$. This is because exchanging two columns (or two rows) two times produces the identity transformation. Thus performing such unitary similarity transformations appropriate times, we get

\[
\widetilde{D_n} \sim \begin{pmatrix}
\alpha_1 & & & & & & & & \\
 & \ddots & & & & & & & \\
 & & \alpha_1 & & & & & & \\
 & & & \alpha_2 & & & & & \\
 & & & & \ddots & & & & \\
 & & & & & \alpha_2 & & & \\
 & & & & & & \ddots & & \\
 & & & & & & & \alpha_s & \\
 & & & & & & & & \ddots \\
\end{pmatrix}, \tag{14.67}
\]

where the last diagonal block ends with $\alpha_s$. The matrix is identical to that represented in (12.181). In parallel, $V^n$ is decomposed into mutually orthogonal subspaces associated with different eigenvalues $\alpha_1, \alpha_2, \cdots$, and $\alpha_s$ such that

\[ V^n = W_{\alpha_1} \oplus W_{\alpha_2} \oplus \cdots \oplus W_{\alpha_s}. \tag{14.68} \]

This expression is formally identical to that represented in (12.191). Note, however, that in (12.181) orthogonal subspaces are not implied. At the same time, $A_n$ is reduced to

\[
A_n \sim \begin{pmatrix}
A^{(1)} & & & \\
 & A^{(2)} & & \\
 & & \ddots & \\
 & & & A^{(s)}
\end{pmatrix}, \tag{14.69}
\]

according to the different eigenvalues. A normal operator has other distinct properties. The following theorems are good examples.

Theorem 14.7 Let A be a normal operator on $V^n$. Then $|x\rangle$ is an eigenvector of A with an eigenvalue $\alpha$, if and only if $|x\rangle$ is an eigenvector of $A^{\dagger}$ with an eigenvalue $\alpha^{*}$.

Proof We apply (14.45) for the proof. Both $(A - \alpha E)^{\dagger} = A^{\dagger} - \alpha^{*}E$ and $(A - \alpha E)$ are normal, since A is normal. Consequently, we have $\|(A - \alpha E)x\| = 0$ if and only if $\|(A^{\dagger} - \alpha^{*}E)x\| = 0$. Since only the zero vector has a zero norm, we get

\[ (A - \alpha E)|x\rangle = 0 \ \text{if and only if}\ (A^{\dagger} - \alpha^{*}E)|x\rangle = 0. \]

This completes the proof. ∎

Theorem 14.8 Let A be a normal operator on $V^n$. Then, eigenvectors corresponding to different eigenvalues are mutually orthogonal.

Proof Let A be a normal operator on $V^n$. Let $|u\rangle$ be an eigenvector corresponding to an eigenvalue $\alpha$ and $|v\rangle$ be an eigenvector corresponding to an eigenvalue $\beta$ with $\alpha \neq \beta$. Then we have

\[ \alpha\langle v|u\rangle = \langle v|\alpha u\rangle = \langle v|Au\rangle = \langle u|A^{\dagger}v\rangle^{*} = \langle u|\beta^{*}v\rangle^{*} = \langle \beta^{*}v|u\rangle = \beta\langle v|u\rangle, \tag{14.70} \]

where with the fourth equality we used Theorem 14.7. Then we get

\[ (\alpha - \beta)\langle v|u\rangle = 0. \]

Since $\alpha - \beta \neq 0$, $\langle v|u\rangle = 0$. Namely, the eigenvectors $|u\rangle$ and $|v\rangle$ are mutually orthogonal.

In (12.208) we mentioned the decomposition of diagonalizable matrices. As for the normal matrices, we have a related matrix decomposition. Let A be a normal operator. Then, according to Theorem 14.5, A can be diagonalized and expressed as (14.67). This is equivalently expressed as the following succinct relation. That is, if we choose U for a diagonalizing unitary matrix, we have

\[ U^{\dagger}AU = \alpha_1P_1 + \alpha_2P_2 + \cdots + \alpha_sP_s, \tag{14.71} \]

where $\alpha_1, \alpha_2, \cdots$, and $\alpha_s$ are different eigenvalues of A; $P_l$ $(1 \leq l \leq s)$ is described such that, e.g.,

\[
P_1 = \begin{pmatrix}
E_{n_1} & & & \\
 & 0_{n_2} & & \\
 & & \ddots & \\
 & & & 0_{n_s}
\end{pmatrix}, \tag{14.72}
\]

where $E_{n_1}$ stands for a $(n_1, n_1)$ identity matrix with $n_1$ corresponding to the multiplicity of $\alpha_1$. A matrix represented by $0_{n_2}$ is a $(n_2, n_2)$ zero matrix, and so forth. This expression is in accordance with (14.69). From the matrix form (14.72), obviously $P_l$ $(1 \leq l \leq s)$ is a projection operator. Thus, operating U and $U^{\dagger}$ on both sides of (14.71) from the left and right, respectively, we obtain

\[ A = \alpha_1UP_1U^{\dagger} + \alpha_2UP_2U^{\dagger} + \cdots + \alpha_sUP_sU^{\dagger}. \tag{14.73} \]

Defining $\tilde{P}_l \equiv UP_lU^{\dagger}$ $(1 \leq l \leq s)$, we have

\[ A = \alpha_1\tilde{P}_1 + \alpha_2\tilde{P}_2 + \cdots + \alpha_s\tilde{P}_s. \tag{14.74} \]

In (14.74), we can easily check that $\tilde{P}_l$ is a projection operator with $\alpha_1, \alpha_2, \cdots$, and $\alpha_s$ being different eigenvalues of A. If $\alpha_l$ $(1 \leq l \leq s)$ is degenerate, we express $\tilde{P}_l$ as $\tilde{P}_l^{\mu}$ $(1 \leq \mu \leq m_l)$, where $m_l$ is the multiplicity of $\alpha_l$. In that case, we may write

\[ \tilde{P}_l = \tilde{P}_l^{1} \oplus \cdots \oplus \tilde{P}_l^{m_l}. \tag{14.75} \]

Also we have

\[ \tilde{P}_k\tilde{P}_l = UP_kU^{\dagger}UP_lU^{\dagger} = UP_kEP_lU^{\dagger} = UP_kP_lU^{\dagger} = 0 \quad (k \neq l,\ 1 \leq k, l \leq s). \]

The last equality comes from (14.23). Similarly, we have

\[ \tilde{P}_l\tilde{P}_k = 0. \]

Thus, we have

\[ \tilde{P}_k\tilde{P}_l = \delta_{kl}\tilde{P}_l. \]

If the operator is decomposed as in the case of (14.75), we can express

\[ \tilde{P}_l^{\mu}\tilde{P}_l^{\nu} = \delta_{\mu\nu}\tilde{P}_l^{\mu} \quad (1 \leq \mu, \nu \leq m_l). \]

Conversely, if an operator A is expressed by (14.74), that operator is a normal operator. In fact, we have

\[
\begin{aligned}
A^{\dagger}A &= \left( \sum_i \alpha_i^{*}\tilde{P}_i^{\dagger} \right)\left( \sum_j \alpha_j\tilde{P}_j \right) = \sum_{i,j} \alpha_i^{*}\alpha_j\tilde{P}_i\tilde{P}_j = \sum_{i,j} \alpha_i^{*}\alpha_j\delta_{ij}\tilde{P}_i = \sum_i |\alpha_i|^{2}\tilde{P}_i, \\
AA^{\dagger} &= \left( \sum_j \alpha_j\tilde{P}_j \right)\left( \sum_i \alpha_i^{*}\tilde{P}_i^{\dagger} \right) = \sum_{i,j} \alpha_j\alpha_i^{*}\tilde{P}_j\tilde{P}_i = \sum_{i,j} \alpha_j\alpha_i^{*}\delta_{ji}\tilde{P}_j = \sum_i |\alpha_i|^{2}\tilde{P}_i.
\end{aligned} \tag{14.76}
\]

Hence, $A^{\dagger}A = AA^{\dagger}$. If projection operators are further decomposed as in the case of (14.75), we have a related expression to (14.76). Thus, a necessary and sufficient condition for an operator to be a normal operator is that the said operator is expressed as (14.74). The relation (14.74) is well known as the spectral decomposition theorem. Thus, the spectral decomposition theorem is equivalent to Theorem 14.5.

The relations (12.208) and (14.74) are virtually the same, aside from the fact that whereas (14.74) premises an inner product space, (12.208) does not premise it. Correspondingly, whereas the related operators are called projection operators in the case of (14.74), those operators are said to be idempotent operators for (12.208).

Example 14.1 Let us think of a Gram matrix of Example 13.1, as shown below.

\[ G = \begin{pmatrix} 2 & 1 + i \\ 1 - i & 2 \end{pmatrix}. \tag{13.51} \]

After a unitary similarity transformation, we got

\[ U^{\dagger}GU = \begin{pmatrix} 2 + \sqrt{2} & 0 \\ 0 & 2 - \sqrt{2} \end{pmatrix}. \tag{13.54} \]

Putting $\tilde{G} = U^{\dagger}GU$ and rewriting (13.54), we have

564

14

Hermitian Operators and Unitary Operators

! pffiffiffi pffiffiffi 1 2þ 2 0 pffiffiffi ¼ 2 þ 2 0 0 2 2

e¼ G

0 0

pffiffiffi 0 0 þ 2 2 : 0 1

e { ¼ G, we get By a back calculation of U GU 0 1 pffiffiffi 2ð1 þ iÞ 1

pffiffiffi B pffiffiffi

C 4 2 Cþ 2 2 G¼ 2þ 2 B p ffiffi ffi @ A 2ð 1 i Þ 1 4 2 0 1 pffiffiffi 2ð 1 þ i Þ 1 B C 4 2 C: B @ pffiffiffi A 2ð 1 i Þ 1 4 2 pffiffiffi pffiffiffi Putting eigenvalues α1 ¼ 2 þ 2 and α2 ¼ 2 2 along with

0

1 B 2 A1 ¼ B p ffiffi ffi @ 2ð1 iÞ 4

1 pffiffiffi 2ð1 þ iÞ C 4 C, A 1 2

0

1 B 2 A2 ¼ B p ffiffi ffi @ 2ð1 iÞ 4

1 pffiffiffi 2ð1 þ iÞ C 4 C, A 1 2

ð14:77Þ

ð14:78Þ

we get G ¼ α1 A1 þ α2 A2 :

ð14:79Þ

In the above, $A_1$ and $A_2$ are projection operators. In fact, as anticipated we have

\[ A_1^2 = A_1, \quad A_2^2 = A_2, \quad A_1 A_2 = A_2 A_1 = 0, \quad A_1 + A_2 = E. \tag{14.80} \]

Moreover, (14.78) obviously shows that both $A_1$ and $A_2$ are Hermitian. Thus, Eqs. (14.77) and (14.79) are an example of the spectral decomposition. The decomposition is unique. Example 14.1 can be dealt with in parallel to Example 12.5. In Example 12.5, however, an inner product space is not implied, and so we used an idempotent matrix instead of a projection operator. Note that, as can be seen in Example 12.5, that idempotent matrix was not Hermitian.
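The relations (14.79) and (14.80) of Example 14.1 can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names (`matmul`, `dagger`, `close`) are introduced here for illustration and do not come from the text.

```python
import math

def matmul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(X):
    """Hermitian conjugate (conjugate transpose)."""
    n = len(X)
    return [[X[j][i].conjugate() for j in range(n)] for i in range(n)]

def close(X, Y, tol=1e-12):
    """Entrywise comparison within a tolerance."""
    return all(abs(x - y) < tol
               for row_x, row_y in zip(X, Y)
               for x, y in zip(row_x, row_y))

r2 = math.sqrt(2.0)
G = [[2, 1 + 1j], [1 - 1j, 2]]                       # Gram matrix (13.51)
alpha1, alpha2 = 2 + r2, 2 - r2                      # eigenvalues
A1 = [[0.5, r2 * (1 + 1j) / 4], [r2 * (1 - 1j) / 4, 0.5]]    # (14.78)
A2 = [[0.5, -r2 * (1 + 1j) / 4], [-r2 * (1 - 1j) / 4, 0.5]]  # (14.78)
E = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]

recombined = [[alpha1 * A1[i][j] + alpha2 * A2[i][j] for j in range(2)]
              for i in range(2)]
total = [[A1[i][j] + A2[i][j] for j in range(2)] for i in range(2)]

checks = {
    "G == alpha1*A1 + alpha2*A2": close(recombined, G),   # (14.79)
    "A1^2 == A1": close(matmul(A1, A1), A1),              # (14.80)
    "A2^2 == A2": close(matmul(A2, A2), A2),
    "A1 A2 == 0": close(matmul(A1, A2), ZERO),
    "A2 A1 == 0": close(matmul(A2, A1), ZERO),
    "A1 + A2 == E": close(total, E),
    "A1 Hermitian": close(dagger(A1), A1),
    "A2 Hermitian": close(dagger(A2), A2),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Every check compares floating-point entries within a small tolerance rather than exactly, since $\sqrt{2}$ is not representable exactly.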

14.4

Hermitian Matrices and Unitary Matrices

Of normal matrices, Hermitian matrices and unitary matrices play a crucial role both in fundamental and applied science. Let us think of several topics and examples.


In quantum physics, one frequently treats an expectation value of an operator. In general, such an operator is Hermitian, more strictly an observable. Moreover, a vector on an inner product space is interpreted as a state on a Hilbert space. Suppose that there is a linear operator $A$ (an observable that represents a physical quantity) that has discrete (or countable) eigenvalues $\alpha_1, \alpha_2, \dots$. The number of the eigenvalues may be finite or infinite, but here we assume a finite number; i.e., let us suppose that we have eigenvalues $\alpha_1, \alpha_2, \dots$, and $\alpha_s$, consistent with our previous discussion.

In quantum physics, we have the well-known Born probability rule. The rule says the following: Suppose that we carry out a physical measurement on $A$ with respect to a physical state $|u\rangle$. Here we assume that $|u\rangle$ has been normalized. Then, the probability $\wp_l$ that $A$ takes $\alpha_l$ $(1 \le l \le s)$ is given by

\[ \wp_l = \big\| \tilde{P}_l u \big\|^2 , \tag{14.81} \]

where $\tilde{P}_l$ is a projection operator that projects $|u\rangle$ to an eigenspace $W_{\alpha_l}$ spanned by $|\alpha_l, k\rangle$. Here $k$ $(1 \le k \le n_l)$ reflects the multiplicity $n_l$ of an eigenvalue $\alpha_l$. Hence, we express the $n_l$-dimensional eigenspace $W_{\alpha_l}$ as

\[ W_{\alpha_l} = \mathrm{Span}\{ |\alpha_l, 1\rangle, |\alpha_l, 2\rangle, \dots, |\alpha_l, n_l\rangle \}. \tag{14.82} \]

Now, we define an expectation value $\langle A \rangle$ of $A$ such that

\[ \langle A \rangle \equiv \sum_{l=1}^{s} \alpha_l \wp_l . \tag{14.83} \]

From (14.81) we have

\[ \wp_l = \big\| \tilde{P}_l u \big\|^2 = \big\langle u \tilde{P}_l^{\dagger} \big| \tilde{P}_l u \big\rangle = \big\langle u \tilde{P}_l \big| \tilde{P}_l u \big\rangle = \langle u | \tilde{P}_l \tilde{P}_l | u \rangle = \langle u | \tilde{P}_l | u \rangle . \tag{14.84} \]

For the third equality we used the fact that $\tilde{P}_l$ is Hermitian; for the last equality we used $\tilde{P}_l^{\,2} = \tilde{P}_l$. Summing (14.84) with the index $l$, we have

\[ \sum_l \wp_l = \sum_l \langle u | \tilde{P}_l | u \rangle = \Big\langle u \Big| \sum_l \tilde{P}_l \Big| u \Big\rangle = \langle u | E | u \rangle = \langle u | u \rangle = 1, \]

where with the third equality we used (14.33). Meanwhile, from the spectral decomposition theorem, we have

\[ A = \alpha_1 \tilde{P}_1 + \alpha_2 \tilde{P}_2 + \cdots + \alpha_s \tilde{P}_s . \tag{14.74} \]

Operating $\langle u|$ and $|u\rangle$ on both sides of (14.74), we get

\[ \langle u | A | u \rangle = \alpha_1 \langle u | \tilde{P}_1 | u \rangle + \alpha_2 \langle u | \tilde{P}_2 | u \rangle + \cdots + \alpha_s \langle u | \tilde{P}_s | u \rangle = \alpha_1 \wp_1 + \alpha_2 \wp_2 + \cdots + \alpha_s \wp_s . \tag{14.85} \]

Equating (14.83) and (14.85), we have

\[ \langle A \rangle = \langle u | A | u \rangle . \tag{14.86} \]
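As a numerical illustration of (14.83) to (14.86), the sketch below reuses the Gram matrix $G$ and the projectors $A_1$, $A_2$ of Example 14.1 as a stand-in observable. The test state `u` and the helper `expval` are assumptions chosen only for this example.

```python
import math

r2 = math.sqrt(2.0)
G = [[2, 1 + 1j], [1 - 1j, 2]]                       # observable (Example 14.1)
alphas = [2 + r2, 2 - r2]                            # its eigenvalues
projectors = [
    [[0.5, r2 * (1 + 1j) / 4], [r2 * (1 - 1j) / 4, 0.5]],     # A1
    [[0.5, -r2 * (1 + 1j) / 4], [-r2 * (1 - 1j) / 4, 0.5]],   # A2
]

def expval(u, M):
    """Return <u|M|u> for a column vector u and a square matrix M."""
    n = len(u)
    return sum(u[i].conjugate() * M[i][j] * u[j]
               for i in range(n) for j in range(n))

# A normalized test state (an arbitrary choice for this sketch).
u = [1 / r2, 1j / r2]

# Born probabilities via the last expression of (14.84).
probs = [expval(u, P).real for P in projectors]

mean_from_probs = sum(a * p for a, p in zip(alphas, probs))   # (14.83)
mean_direct = expval(u, G).real                               # (14.86)

print("probabilities:", probs)
print("sum of probabilities:", sum(probs))
print("expectation value (two ways):", mean_from_probs, mean_direct)
```

The two expectation values agree up to rounding, and the probabilities sum to one because $A_1 + A_2 = E$.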

In quantum physics, a real number is required for an expectation value of an observable (i.e., a physical quantity). To warrant this, we have the following theorems.

Theorem 14.9 A linear transformation $A$ on an inner product space is Hermitian if and only if $\langle u|A|u\rangle$ is real for all $|u\rangle$ of that inner product space.

Proof If $A = A^{\dagger}$, then $\langle u|A|u\rangle^{*} = \langle u|A^{\dagger}|u\rangle = \langle u|A|u\rangle$. Therefore, $\langle u|A|u\rangle$ is real. Conversely, if $\langle u|A|u\rangle$ is real for all $|u\rangle$, we have

\[ \langle u|A|u\rangle = \langle u|A|u\rangle^{*} = \langle u|A^{\dagger}|u\rangle . \]

Hence,

\[ \langle u| A - A^{\dagger} |u\rangle = 0. \tag{14.87} \]

From Theorem 14.3, we get $A - A^{\dagger} = 0$, i.e., $A = A^{\dagger}$. This completes the proof. ∎

Theorem 14.10 The eigenvalues of an Hermitian operator $A$ are real.

Proof Let $\alpha$ be an eigenvalue of $A$ and let $|u\rangle$ be a corresponding eigenvector. Then, $A|u\rangle = \alpha|u\rangle$. Operating $\langle u|$ from the left, $\langle u|A|u\rangle = \alpha \langle u|u\rangle = \alpha \|u\|^2$. Thus,

\[ \alpha = \langle u|A|u\rangle / \|u\|^2 . \tag{14.88} \]

Since $A$ is Hermitian, $\langle u|A|u\rangle$ is real. Then, $\alpha$ is real as well.

Unitary matrices have the following conspicuous features:

(i) An inner product is held invariant under unitary transformation: Suppose that $|x'\rangle = U|x\rangle$ and $|y'\rangle = U|y\rangle$, where $U$ is a unitary operator. Then $\langle y'|x'\rangle = \langle yU^{\dagger}|Ux\rangle = \langle y|x\rangle$. A norm of any vector is held invariant under unitary transformation as well. This is easily checked by replacing $|y\rangle$ with $|x\rangle$ in the above.

(ii) Let $U$ be a unitary matrix and suppose that $\lambda$ is an eigenvalue of that matrix with $|\lambda\rangle$ being its corresponding eigenvector. Then we have

\[ |\lambda\rangle = U^{\dagger} U |\lambda\rangle = U^{\dagger} (\lambda |\lambda\rangle) = \lambda U^{\dagger} |\lambda\rangle = \lambda \lambda^{*} |\lambda\rangle , \tag{14.89} \]

where with the last equality we used Theorem 14.7. Thus

\[ (1 - \lambda\lambda^{*}) |\lambda\rangle = 0. \tag{14.90} \]

As $|\lambda\rangle \ne 0$ is assumed, $1 - \lambda\lambda^{*} = 0$. That is

\[ \lambda\lambda^{*} = |\lambda|^2 = 1. \tag{14.91} \]

Thus, eigenvalues of a unitary matrix have unit absolute value. Let those eigenvalues be $\lambda_k = e^{i\theta_k}$ $(k = 1, 2, \dots;\ \theta_k: \text{real})$. Then, from (12.11) we have

\[ \det U = \prod_k \lambda_k = e^{i(\theta_1 + \theta_2 + \cdots)} \quad \text{and} \quad |\det U| = \prod_k |\lambda_k| = \prod_k \big| e^{i\theta_k} \big| = 1. \]

That is, any unitary matrix has a determinant of unit absolute value. ∎

Example 14.2 Let us think of the following unitary matrix $R$:

\[ R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \tag{14.92} \]

A characteristic equation is

\[ \begin{vmatrix} \cos\theta - \lambda & -\sin\theta \\ \sin\theta & \cos\theta - \lambda \end{vmatrix} = \lambda^2 - 2\lambda\cos\theta + 1 = 0. \tag{14.93} \]

Solving (14.93), we have

\[ \lambda = \cos\theta \pm \sqrt{\cos^2\theta - 1} = \cos\theta \pm i\,|\sin\theta| . \tag{14.94} \]

(i) $\theta = 0$: This is a trivial case. The matrix $R$ has automatically been diagonalized to be an identity matrix. Eigenvalues are 1 (double root).

(ii) $\theta = \pi$: This is a trivial case. Eigenvalues are $-1$ (double root).

(iii) $\theta \ne 0, \pi$: Let us think of $0 < \theta < \pi$. Then $\lambda = \cos\theta \pm i\sin\theta$. As a diagonalizing unitary matrix $U$, we get

\[ U = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\[2mm] -\dfrac{i}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \end{pmatrix}, \qquad U^{\dagger} = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \\[2mm] \dfrac{1}{\sqrt{2}} & -\dfrac{i}{\sqrt{2}} \end{pmatrix}. \tag{14.95} \]

Therefore, we have

\[ U^{\dagger} R U = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \\[2mm] \dfrac{1}{\sqrt{2}} & -\dfrac{i}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\[2mm] -\dfrac{i}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{pmatrix}. \tag{14.96} \]

A trace of the resulting matrix is $2\cos\theta$. In the case of $\pi < \theta < 2\pi$, we get a diagonal matrix similarly. The confirmation is left for readers.
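Equation (14.96) and the unit-modulus property (14.91) can be confirmed numerically for a sample angle. This is a minimal sketch with the standard library only; the value `theta = 0.7` and the helper names are arbitrary choices made for illustration.

```python
import cmath
import math

def matmul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(X):
    """Hermitian conjugate (conjugate transpose)."""
    n = len(X)
    return [[X[j][i].conjugate() for j in range(n)] for i in range(n)]

theta = 0.7                      # any angle with 0 < theta < pi
c, s = math.cos(theta), math.sin(theta)
R = [[c, -s], [s, c]]            # rotation matrix (14.92)

h = 1 / math.sqrt(2.0)
U = [[h, h], [-1j * h, 1j * h]]  # diagonalizing unitary matrix (14.95)

D = matmul(matmul(dagger(U), R), U)          # left-hand side of (14.96)
expected = [[cmath.exp(1j * theta), 0],
            [0, cmath.exp(-1j * theta)]]     # right-hand side of (14.96)

max_err = max(abs(D[i][j] - expected[i][j])
              for i in range(2) for j in range(2))
det_R = R[0][0] * R[1][1] - R[0][1] * R[1][0]
trace_D = D[0][0] + D[1][1]

print("max deviation from diag(e^{i theta}, e^{-i theta}):", max_err)
print("|det R| =", abs(det_R))
print("trace =", trace_D, " 2 cos(theta) =", 2 * c)
```

The diagonal entries have absolute value one, and the trace equals $2\cos\theta$ as stated in the example.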

14.5

Hermitian Quadratic Forms

The Hermitian quadratic forms appeared in, e.g., (13.34) and (13.83) in relation to Gram matrices in Sect. 13.2. The Hermitian quadratic forms have wide applications in the field of mathematical physics and materials science.

Let $H$ be an Hermitian operator and $|x\rangle$ a vector on an inner product space. We define the Hermitian quadratic form in an arbitrary orthonormal basis as follows:

\[ \langle x|H|x\rangle \equiv \left( x_1^{*} \ \cdots \ x_n^{*} \right) H \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \sum_{i,j} x_i^{*} (H)_{ij} x_j , \]

where $|x\rangle$ is represented as a column vector, as already mentioned in Sect. 13.4. Let us start with unitary diagonalization of (13.40), where a Gram matrix is a kind of Hermitian matrix. Following similar procedures, as in the case of (13.36) we obtain a diagonal matrix and an inner product such that

\[ \langle x|H|x\rangle = \left( \xi_1^{*} \ \cdots \ \xi_n^{*} \right) \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{pmatrix} \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix} = \lambda_1 |\xi_1|^2 + \cdots + \lambda_n |\xi_n|^2 . \tag{14.97} \]

Notice, however, that the Gram matrix comprising basis vectors (that are linearly independent) is positive definite. Remember that if a Gram matrix is constructed by $A^{\dagger}A$ (Hermitian as well), then according to whether $A$ is non-singular or singular, $A^{\dagger}A$ is either positive definite or non-negative. The Hermitian matrix we are dealing with here, in general, does not necessarily possess the positive definiteness or non-negative feature. Yet, remember that $\langle x|H|x\rangle$ and the eigenvalues $\lambda_1, \dots, \lambda_n$ are real.

Positive definiteness of matrices is an important concept in relation to the Hermitian (and real) quadratic forms (see Sect. 13.2). In particular, in the case where all the matrix elements are real and $|x\rangle$ is defined in a real domain, we are dealing with the quadratic form with respect to a real symmetric matrix. In the case of the real quadratic forms, we sometimes adopt the following notation [4, 5]:

\[ A[x] \equiv x^{T} A x = \sum_{i,j=1}^{n} a_{ij} x_i x_j ; \qquad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \]

where $A = (a_{ij})$ is a real symmetric matrix and $x_i$ $(1 \le i \le n)$ are real numbers.

The positive definiteness is invariant under a transformation $P^{T}AP$, where $P$ is non-singular. In fact, if $A > 0$, for $x = Px'$ (i.e., $x' = P^{-1}x$) and $A' = P^{T}AP$ we have

\[ x^{T} A x = x'^{T} P^{T} A P x' = x'^{T} A' x' . \]

Since $P$ is non-singular, $x' = P^{-1}x$ represents any arbitrary vector. Hence,

\[ A' = P^{T} A P > 0, \tag{14.98} \]

where, for the notation $P^{T}AP > 0$, see (13.46). In particular, using a suitable orthogonal matrix $O$, we obtain

\[ O^{T} A O = \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{pmatrix}. \]

From the above argument, $O^{T}AO > 0$. We have

\[ \det O^{T} A O = \det O^{T} \det A \det O = (\pm 1) \det A\, (\pm 1) = \det A. \]

Therefore, from (14.97), we have $\lambda_i > 0$ $(1 \le i \le n)$. Thus, we get

\[ \det A = \prod_{i=1}^{n} \lambda_i > 0. \]

Notice that in the above discussion, $P^{T}AP$ is said to be an equivalent transformation of $A$ by $P$ and that we distinguish the equivalent transformation from the similarity transformation $P^{-1}AP$. Nonetheless, if we choose an orthogonal matrix $O$ for $P$, the two transformations are the same because $O^{T} = O^{-1}$.

We often encounter real quadratic forms in the field of electromagnetism. A typical example is a trace of electromagnetic fields observed with elliptically or circularly polarized light (see Sect. 7.3). A permittivity tensor of an anisotropic medium such as a crystal (either inorganic or organic) is another example. A tangible example for this appeared in Sect. 9.5.2 with an anisotropic organic crystal. Regarding the real quadratic forms, we have the following important theorem.


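Theorem 14.11 [4, 5] Let $A$ be an $n$-dimensional real symmetric matrix $A = (a_{ij})$. Let $A^{(k)}$ be $k$-dimensional principal submatrices described by

\[ A^{(k)} = \begin{pmatrix} a_{i_1 i_1} & a_{i_1 i_2} & \cdots & a_{i_1 i_k} \\ a_{i_2 i_1} & a_{i_2 i_2} & \cdots & a_{i_2 i_k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{i_k i_1} & a_{i_k i_2} & \cdots & a_{i_k i_k} \end{pmatrix} \quad (1 \le i_1 < i_2 < \cdots < i_k \le n), \]

where a principal submatrix means a matrix made by striking out the rows and columns that pass through chosen diagonal elements. Alternatively, a principal submatrix of a matrix $A$ can be defined as a matrix whose diagonal elements are a part of the diagonal elements of $A$. As a special case, those include $a_{11}, \dots,$ or $a_{nn}$ as a (1, 1) matrix (i.e., merely a number) as well as $A$ itself. Then, we have

\[ A > 0 \Longleftrightarrow \det A^{(k)} > 0 \quad (1 \le k \le n). \tag{14.99} \]

Proof First, suppose that $A > 0$. Then, in a quadratic form $A[x]$, equating $(n - k)$ variables $x_l = 0$ $(l \ne i_1, i_2, \dots, i_k)$, we obtain a quadratic form of

\[ \sum_{\mu,\nu=1}^{k} a_{i_\mu i_\nu} x_{i_\mu} x_{i_\nu} . \]

Since $A > 0$, this (partial) quadratic form is positive definite as well; i.e., we have

\[ A^{(k)} > 0. \]

Therefore, we get

\[ \det A^{(k)} > 0. \]

This is due to (13.48). Notice that $\det A^{(k)}$ is said to be a principal minor. Thus, we have proven $\Longrightarrow$ of (14.99).

To prove $\Longleftarrow$, in turn, we use mathematical induction. If $n = 1$, we have a trivial case; i.e., $A$ is merely a real positive number. Suppose that $\Longleftarrow$ is true of $n - 1$. Then, we have $A^{(n-1)} > 0$ by supposition. Thus, it will suffice to show $A > 0$ on condition that $A^{(n-1)} > 0$ and $\det A > 0$ in addition. Let $A$ be an $n$-dimensional real symmetric non-singular matrix such that

\[ A = \begin{pmatrix} A^{(n-1)} & a \\ a^{T} & a_n \end{pmatrix}, \]

where $A^{(n-1)}$ is a symmetric matrix and non-singular as well. We define $P$ such that

\[ P = \begin{pmatrix} E & \left[A^{(n-1)}\right]^{-1} a \\ 0 & 1 \end{pmatrix}, \]

where $E$ is an $(n-1, n-1)$ unit matrix. Notice that $\det P = \det E \cdot 1 = 1 \ne 0$, indicating that $P$ is non-singular. We have

\[ P^{T} = \begin{pmatrix} E & 0 \\ a^{T} \left[A^{(n-1)}\right]^{-1} & 1 \end{pmatrix}. \]

For this expression, consider a non-singular matrix $S$. Then, we have $SS^{-1} = E$. Taking its transposition, we have $(S^{-1})^{T} S^{T} = E$. Therefore, if $S$ is a symmetric matrix, $(S^{-1})^{T} S = E$; i.e., $(S^{-1})^{T} = S^{-1}$. Hence, an inverse matrix of a symmetric matrix is symmetric as well. Then, for a symmetric matrix $A^{(n-1)}$ we have

\[ a^{T} \left[A^{(n-1)}\right]^{-1} = \left\{ \left( \left[A^{(n-1)}\right]^{-1} \right)^{T} a \right\}^{T} = \left\{ \left[A^{(n-1)}\right]^{-1} a \right\}^{T} . \]

Therefore, $A$ can be expressed as

\[
\begin{aligned}
A &= P^{T} \begin{pmatrix} A^{(n-1)} & 0 \\ 0 & a_n - \left[A^{(n-1)}\right]^{-1}[a] \end{pmatrix} P \\
&= \begin{pmatrix} E & 0 \\ a^{T} \left[A^{(n-1)}\right]^{-1} & 1 \end{pmatrix} \begin{pmatrix} A^{(n-1)} & 0 \\ 0 & a_n - \left[A^{(n-1)}\right]^{-1}[a] \end{pmatrix} \begin{pmatrix} E & \left[A^{(n-1)}\right]^{-1} a \\ 0 & 1 \end{pmatrix}.
\end{aligned}
\tag{14.100}
\]

Here $\left[A^{(n-1)}\right]^{-1}[a] = a^{T} \left[A^{(n-1)}\right]^{-1} a$ in the notation $A[x] \equiv x^{T}Ax$. Now, taking a determinant of (14.100), we have

\[ \det A = \det P^{T} \left[ \det A^{(n-1)} \right] \left\{ a_n - \left[A^{(n-1)}\right]^{-1}[a] \right\} \det P = \left[ \det A^{(n-1)} \right] \left\{ a_n - \left[A^{(n-1)}\right]^{-1}[a] \right\} . \]

By supposition, we have $\det A^{(n-1)} > 0$ and $\det A > 0$. Hence, we have

\[ a_n - \left[A^{(n-1)}\right]^{-1}[a] > 0. \]

Putting $\tilde{a}_n \equiv a_n - \left[A^{(n-1)}\right]^{-1}[a]$ and $x \equiv \begin{pmatrix} x^{(n-1)} \\ x_n \end{pmatrix}$, we get

\[ \begin{pmatrix} A^{(n-1)} & 0 \\ 0 & \tilde{a}_n \end{pmatrix} [x] = A^{(n-1)}\left[ x^{(n-1)} \right] + \tilde{a}_n x_n^{\,2} . \]

Since $A^{(n-1)} > 0$ and $\tilde{a}_n > 0$, we have

\[ \tilde{A} \equiv \begin{pmatrix} A^{(n-1)} & 0 \\ 0 & \tilde{a}_n \end{pmatrix} > 0. \]

Meanwhile, $A$ is expressed as

\[ A = P^{T} \tilde{A} P. \]

From (14.98), $A > 0$. These complete the proof.

∎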
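The criterion (14.99) can be tried out on small matrices by enumerating all principal minors directly. The following is a minimal sketch under the assumption that the matrices are small enough for a Laplace-expansion determinant; the function names and the test matrices `K` and `T` are illustrative choices, not taken from the text.

```python
from itertools import combinations

def det(M):
    """Determinant by Laplace expansion along the first row (small n only)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def principal_minors(A):
    """Map each index set (i1 < ... < ik) to det A^(k) as in Theorem 14.11."""
    n = len(A)
    minors = {}
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            sub = [[A[i][j] for j in idx] for i in idx]
            minors[idx] = det(sub)
    return minors

def is_positive_definite(A):
    """Apply (14.99): A > 0 iff every principal minor is positive."""
    return all(m > 0 for m in principal_minors(A).values())

H = [[5, 2], [2, 1]]                       # positive definite
K = [[1, 2], [2, 1]]                       # indefinite: det K = -3
T = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]  # positive definite, det T = 4

for name, M in (("H", H), ("K", K), ("T", T)):
    print(name, principal_minors(M), is_positive_definite(M))
```

The number of principal minors grows as $2^n - 1$, so this brute-force check is meant only for illustration on small matrices.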

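We also have a related theorem (the following Theorem 14.12) for an Hermitian quadratic form.

Theorem 14.12 Let $A = (a_{ij})$ be an $n$-dimensional Hermitian matrix. Let $\tilde{A}^{(k)}$ be $k$-dimensional principal submatrices. Then, we have

\[ A > 0 \Longleftrightarrow \det \tilde{A}^{(k)} > 0 \quad (1 \le k \le n), \]

where $\tilde{A}^{(k)}$ is described as

\[ \tilde{A}^{(k)} = \begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & \ddots & \vdots \\ a_{k1} & \cdots & a_{kk} \end{pmatrix}. \]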

The proof is left for readers.

Example 14.3 Let us consider the following Hermitian (real symmetric) matrix and corresponding Hermitian quadratic form.

\[ H = \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix}, \qquad \langle x|H|x\rangle = (x_1 \ x_2) \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \tag{14.101} \]

Principal minors of $H$ are 5 (>0) and 1 (>0) and $\det H = 5 - 4 = 1$ (>0). Therefore, from Theorem 14.11 we have $H > 0$. A characteristic equation gives the following eigenvalues; i.e.,

\[ \lambda_1 = 3 + 2\sqrt{2}, \qquad \lambda_2 = 3 - 2\sqrt{2}. \]

Both the eigenvalues are positive as anticipated. As a diagonalizing matrix $R$, we get

\[ R = \begin{pmatrix} 1 + \sqrt{2} & 1 - \sqrt{2} \\ 1 & 1 \end{pmatrix}. \]

To obtain a unitary matrix, we have to seek norms of column vectors. Corresponding to $\lambda_1$ and $\lambda_2$, we estimate their norms to be $\sqrt{4 + 2\sqrt{2}}$ and $\sqrt{4 - 2\sqrt{2}}$, respectively. Using them, as a unitary matrix $U$ we get

\[ U = \begin{pmatrix} \dfrac{1}{\sqrt{4 - 2\sqrt{2}}} & -\dfrac{1}{\sqrt{4 + 2\sqrt{2}}} \\[3mm] \dfrac{1}{\sqrt{4 + 2\sqrt{2}}} & \dfrac{1}{\sqrt{4 - 2\sqrt{2}}} \end{pmatrix}. \tag{14.102} \]

Thus, performing the matrix diagonalization, we obtain a diagonal matrix $D$ such that

\[ D = U^{\dagger} H U = \begin{pmatrix} 3 + 2\sqrt{2} & 0 \\ 0 & 3 - 2\sqrt{2} \end{pmatrix}. \tag{14.103} \]

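Let us view the above unitary diagonalization in terms of coordinate transformation. Using the above matrix $U$ and changing (14.101) as in (13.36),

\[
\begin{aligned}
\langle x|H|x\rangle &= (x_1 \ x_2)\, U U^{\dagger} \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix} U U^{\dagger} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \\
&= (x_1 \ x_2)\, U \begin{pmatrix} 3 + 2\sqrt{2} & 0 \\ 0 & 3 - 2\sqrt{2} \end{pmatrix} U^{\dagger} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.
\end{aligned}
\tag{14.104}
\]

Making an argument analogous to that of Sect. 13.2 and using similar notation, we have

\[ \langle \tilde{x}|H'|\tilde{x}\rangle = (\tilde{x}_1 \ \tilde{x}_2)\, H' \begin{pmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{pmatrix} = (x_1 \ x_2)\, U U^{\dagger} H U U^{\dagger} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \langle x|H|x\rangle . \tag{14.105} \]

That is, we have

\[ \begin{pmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{pmatrix} = U^{\dagger} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \tag{14.106} \]

Or, taking an adjoint of (14.106), we have equivalently $(\tilde{x}_1 \ \tilde{x}_2) = (x_1 \ x_2)\, U$. Notice here that $\begin{pmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{pmatrix}$ and $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ are real and that $U$ is real (i.e., an orthogonal matrix). Likewise,

\[ H' = U^{\dagger} H U. \tag{14.107} \]

Thus, it follows that $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ and $\begin{pmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{pmatrix}$ are different column vector representations of the identical vector that is viewed in reference to two different sets of orthonormal bases (i.e., different coordinate systems). From (14.102), as an approximation we have

\[ U \cong \begin{pmatrix} 0.92 & -0.38 \\ 0.38 & 0.92 \end{pmatrix} \cong \begin{pmatrix} \cos 22.5^{\circ} & -\sin 22.5^{\circ} \\ \sin 22.5^{\circ} & \cos 22.5^{\circ} \end{pmatrix}. \tag{14.108} \]

Equating $\langle x|H|x\rangle$ to a constant, we get a hypersurface in a plane. Choosing 1 for the constant, we get an equation of the hypersurface (i.e., an ellipse) as a function of $\tilde{x}_1$ and $\tilde{x}_2$ such that

\[ \frac{\tilde{x}_1^{\,2}}{\left( \sqrt{\dfrac{1}{3 + 2\sqrt{2}}} \right)^{2}} + \frac{\tilde{x}_2^{\,2}}{\left( \sqrt{\dfrac{1}{3 - 2\sqrt{2}}} \right)^{2}} = 1. \tag{14.109} \]

Figure 14.1 depicts the ellipse.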
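The numbers in Example 14.3 can be reproduced with a few lines of arithmetic. The sketch below (standard library only; variable names are illustrative) builds $U$ from (14.102), checks (14.103), and recovers the rotation angle of (14.108) together with the semi-axes of the ellipse (14.109).

```python
import math

def matmul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    """Transpose; for a real matrix this equals the Hermitian conjugate."""
    n = len(X)
    return [[X[j][i] for j in range(n)] for i in range(n)]

r2 = math.sqrt(2.0)
H = [[5.0, 2.0], [2.0, 1.0]]                 # (14.101)
lam1, lam2 = 3 + 2 * r2, 3 - 2 * r2          # eigenvalues

c = 1 / math.sqrt(4 - 2 * r2)                # about 0.92
s = 1 / math.sqrt(4 + 2 * r2)                # about 0.38
U = [[c, -s], [s, c]]                        # (14.102)

D = matmul(matmul(transpose(U), H), U)       # (14.103)
angle_deg = math.degrees(math.atan2(s, c))   # rotation angle in (14.108)
semi_axes = (1 / math.sqrt(lam1), 1 / math.sqrt(lam2))  # ellipse (14.109)

print("D =", D)
print("rotation angle (degrees):", angle_deg)
print("semi-axes along the new coordinates:", semi_axes)
```

The shorter semi-axis lies along $\tilde{x}_1$, the direction belonging to the larger eigenvalue $\lambda_1$.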

14.6

Simultaneous Eigenstates and Diagonalization

Fig. 14.1 Ellipse obtained through diagonalization of a Hermitian quadratic form. The angle θ is about 22.5°

In quantum physics, a concept of simultaneous eigenstate is important and has been briefly mentioned in Sect. 3.3. To rephrase this concept, suppose that there are two operators $B$ and $C$ and ask whether the two operators (or more) possess a common set of eigenvectors. The question boils down to whether the two operators commute. To address this question, the following theorem is important.

Theorem 14.13 [2] Two Hermitian matrices $B$ and $C$ commute if and only if there exists a complete orthonormal set of common eigenvectors.

Proof Suppose that there exists a complete orthonormal set of common eigenvectors $\{|x_i; m_i\rangle\}$ that span the linear vector space, where $i$ and $m_i$ are positive integers and $|x_i\rangle$ corresponds to an eigenvalue $b_i$ of $B$ and $c_i$ of $C$. Note that if $m_i$ is 1, we say that the spectrum is nondegenerate and that if $m_i$ is equal to two or more, the spectrum is said to be degenerate. Then we have

\[ B(|x_i\rangle) = b_i |x_i\rangle, \qquad C(|x_i\rangle) = c_i |x_i\rangle . \tag{14.110} \]

Therefore, $BC(|x_i\rangle) = B(c_i|x_i\rangle) = c_i B(|x_i\rangle) = c_i b_i |x_i\rangle$. Similarly, $CB(|x_i\rangle) = c_i b_i |x_i\rangle$. Consequently, $(BC - CB)(|x_i\rangle) = 0$ for any $|x_i\rangle$. As the whole set of $|x_i\rangle$ spans the vector space, $BC - CB = 0$, namely $BC = CB$.

In turn, assume that $BC = CB$ and that $B(|x_i\rangle) = b_i |x_i\rangle$, where $|x_i\rangle$ are orthonormal. Then, we have

\[ CB(|x_i\rangle) = b_i C(|x_i\rangle) \Longrightarrow B[C(|x_i\rangle)] = b_i C(|x_i\rangle). \tag{14.111} \]

This implies that $C(|x_i\rangle)$ is an eigenvector of $B$ corresponding to the eigenvalue $b_i$. We have two cases.

(i) The spectrum is nondegenerate: The spectrum is said to be nondegenerate if only one eigenvector belongs to an eigenvalue. In other words, multiplicity of $b_i$ is one. Then, $C(|x_i\rangle)$ must be equal to some constant times $|x_i\rangle$; i.e., $C(|x_i\rangle) = c_i |x_i\rangle$. That is, $|x_i\rangle$ is an eigenvector of $C$ corresponding to an eigenvalue $c_i$. That is, $|x_i\rangle$ is a common eigenvector to $B$ and $C$. This completes the proof.

(ii) The spectrum is degenerate: The spectrum is said to be degenerate if two or more eigenvectors belong to an eigenvalue. Multiplicity of $b_i$ is two or more; here suppose that the multiplicity is $m$ $(m \ge 2)$. Let $|x_i^{(1)}\rangle, |x_i^{(2)}\rangle, \dots,$ and $|x_i^{(m)}\rangle$ be linearly independent vectors that belong to the eigenvalue $b_i$ of $B$. Then, from the assumption we have $m$ eigenvectors

\[ C\big(|x_i^{(1)}\rangle\big),\ C\big(|x_i^{(2)}\rangle\big),\ \dots,\ C\big(|x_i^{(m)}\rangle\big) \]

that belong to an eigenvalue $b_i$ of $B$. This means that individual $C\big(|x_i^{(\mu)}\rangle\big)$ $(1 \le \mu \le m)$ are described by linear combination of $|x_i^{(1)}\rangle, |x_i^{(2)}\rangle, \dots,$ and $|x_i^{(m)}\rangle$. What we want to show is that a suitable linear combination of these $m$ vectors constitutes an eigenvector corresponding to an eigenvalue $c_\mu$ of $C$. Here, to avoid complexity we denote the multiplicity by $m$ instead of the abovementioned $m_i$.

The vectors $C\big(|x_i^{(\mu)}\rangle\big)$ $(1 \le \mu \le m)$ can be described as

\[ C\big(|x_i^{(1)}\rangle\big) = \sum_{j=1}^{m} \alpha_{j1} |x_i^{(j)}\rangle, \ \dots, \ C\big(|x_i^{(m)}\rangle\big) = \sum_{j=1}^{m} \alpha_{jm} |x_i^{(j)}\rangle . \tag{14.112} \]

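Using full matrix representation, we have

\[
\begin{aligned}
C\Big( \sum_{k=1}^{m} \gamma_k |x_i^{(k)}\rangle \Big) &= \big( |x_i^{(1)}\rangle \ \cdots \ |x_i^{(m)}\rangle \big)\, C \begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_m \end{pmatrix} \\
&= \big( |x_i^{(1)}\rangle \ \cdots \ |x_i^{(m)}\rangle \big) \begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1m} \\ \vdots & \ddots & \vdots \\ \alpha_{m1} & \cdots & \alpha_{mm} \end{pmatrix} \begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_m \end{pmatrix}.
\end{aligned}
\tag{14.113}
\]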

In (14.113), we adopt the notation of (11.37). Since $(\alpha_{ij})$ is a matrix representation of an Hermitian operator $C$, $(\alpha_{ij})$ is Hermitian as well. More specifically, if we take an inner product of a vector expressed in (14.112) with $|x_i^{(l)}\rangle$, then we have

\[ \big\langle x_i^{(l)} \big| C x_i^{(k)} \big\rangle = \Big\langle x_i^{(l)} \Big| \sum_{j=1}^{m} \alpha_{jk} x_i^{(j)} \Big\rangle = \sum_{j=1}^{m} \alpha_{jk} \big\langle x_i^{(l)} \big| x_i^{(j)} \big\rangle = \sum_{j=1}^{m} \alpha_{jk} \delta_{lj} = \alpha_{lk} \quad (1 \le k, l \le m), \tag{14.114} \]

where the third equality comes from the orthonormality of the basis vectors. Meanwhile,

\[ \big\langle x_i^{(l)} \big| C x_i^{(k)} \big\rangle = \big\langle x_i^{(k)} \big| C^{\dagger} x_i^{(l)} \big\rangle^{*} = \big\langle x_i^{(k)} \big| C x_i^{(l)} \big\rangle^{*} = \alpha_{kl}^{*} , \tag{14.115} \]

where the second equality comes from the Hermiticity of $C$. From (14.114) and (14.115), we get

\[ \alpha_{lk} = \alpha_{kl}^{*} \quad (1 \le k, l \le m). \tag{14.116} \]

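This indicates that $(\alpha_{ij})$ is in fact Hermitian.

We are seeking the condition under which linear combinations of the eigenvectors $|x_i^{(k)}\rangle$ $(1 \le k \le m)$ for $B$ are simultaneously eigenvectors of $C$. If the linear combination $\sum_{k=1}^{m} \gamma_k |x_i^{(k)}\rangle$ is to be an eigenvector of $C$, we must have

\[ C\Big( \sum_{k=1}^{m} \gamma_k |x_i^{(k)}\rangle \Big) = c \sum_{j=1}^{m} \gamma_j |x_i^{(j)}\rangle . \tag{14.117} \]

Considering (14.113), we have

\[ \big( |x_i^{(1)}\rangle \ \cdots \ |x_i^{(m)}\rangle \big) \begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1m} \\ \vdots & \ddots & \vdots \\ \alpha_{m1} & \cdots & \alpha_{mm} \end{pmatrix} \begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_m \end{pmatrix} = \big( |x_i^{(1)}\rangle \ \cdots \ |x_i^{(m)}\rangle \big)\, c \begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_m \end{pmatrix}. \tag{14.118} \]

The vectors $|x_i^{(j)}\rangle$ $(1 \le j \le m)$ span an invariant subspace (i.e., an eigenspace corresponding to an eigenvalue $b_i$). Let us call this subspace $W_m$. Consequently, in (14.118) we can equate the scalar coefficients of individual $|x_i^{(j)}\rangle$ $(1 \le j \le m)$. Then, we get

\[ \begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1m} \\ \vdots & \ddots & \vdots \\ \alpha_{m1} & \cdots & \alpha_{mm} \end{pmatrix} \begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_m \end{pmatrix} = c \begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_m \end{pmatrix}. \tag{14.119} \]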

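This is nothing other than an eigenvalue equation. Since $(\alpha_{ij})$ is an Hermitian matrix, there should be $m$ eigenvalues $c_\mu$, some of which may be identical (the degenerate case). Moreover, we can always decide $m$ orthonormal column vectors by solving (14.119). We denote them by $\gamma^{(\mu)}$ $(1 \le \mu \le m)$ that belong to $c_\mu$. Rewriting (14.119), we get

\[ \begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1m} \\ \vdots & \ddots & \vdots \\ \alpha_{m1} & \cdots & \alpha_{mm} \end{pmatrix} \begin{pmatrix} \gamma_1^{(\mu)} \\ \vdots \\ \gamma_m^{(\mu)} \end{pmatrix} = c_\mu \begin{pmatrix} \gamma_1^{(\mu)} \\ \vdots \\ \gamma_m^{(\mu)} \end{pmatrix}. \tag{14.120} \]

Equation (14.120) implies that we can construct an $(m, m)$ unitary matrix from the $m$ orthonormal column vectors $\gamma^{(\mu)}$. Using the said unitary matrix, we will be able to diagonalize $(\alpha_{ij})$ according to Theorem 14.5. Having determined $m$ eigenvectors $\gamma^{(\mu)}$ $(1 \le \mu \le m)$, we can construct a set of eigenvectors such that

\[ |y_i^{(\mu)}\rangle = \sum_{k=1}^{m} \gamma_k^{(\mu)} |x_i^{(k)}\rangle . \tag{14.121} \]

Finally let us confirm that $|y_i^{(\mu)}\rangle$ $(1 \le \mu \le m)$ in fact constitute an orthonormal basis. To show this, we have

\[
\begin{aligned}
\big\langle y_i^{(\nu)} \big| y_i^{(\mu)} \big\rangle &= \Big\langle \sum_{k=1}^{m} \gamma_k^{(\nu)} x_i^{(k)} \Big| \sum_{l=1}^{m} \gamma_l^{(\mu)} x_i^{(l)} \Big\rangle \\
&= \sum_{k=1}^{m} \sum_{l=1}^{m} \big[ \gamma_k^{(\nu)} \big]^{*} \gamma_l^{(\mu)} \big\langle x_i^{(k)} \big| x_i^{(l)} \big\rangle \\
&= \sum_{k=1}^{m} \sum_{l=1}^{m} \big[ \gamma_k^{(\nu)} \big]^{*} \gamma_l^{(\mu)} \delta_{kl} \\
&= \sum_{k=1}^{m} \big[ \gamma_k^{(\nu)} \big]^{*} \gamma_k^{(\mu)} = \delta_{\nu\mu} .
\end{aligned}
\tag{14.122}
\]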

The last equality comes from the fact that a matrix comprising $m$ orthonormal column vectors $\gamma^{(\mu)}$ $(1 \le \mu \le m)$ forms a unitary matrix. Thus, $|y_i^{(\mu)}\rangle$ $(1 \le \mu \le m)$ certainly constitute an orthonormal basis. The above completes the proof. ∎

Theorem 14.13 can be restated as follows: Two Hermitian matrices $B$ and $C$ can be simultaneously diagonalized by a unitary similarity transformation. As mentioned above, we can construct a unitary matrix $U$ such that

\[ U = \begin{pmatrix} \gamma_1^{(1)} & \cdots & \gamma_1^{(m)} \\ \vdots & \ddots & \vdots \\ \gamma_m^{(1)} & \cdots & \gamma_m^{(m)} \end{pmatrix}. \]

Then, using $U$, matrices $B$ and $C$ are diagonalized such that

\[ U^{\dagger} B U = \begin{pmatrix} b_i & & & \\ & b_i & & \\ & & \ddots & \\ & & & b_i \end{pmatrix}, \qquad U^{\dagger} C U = \begin{pmatrix} c_1 & & & \\ & c_2 & & \\ & & \ddots & \\ & & & c_m \end{pmatrix}. \tag{14.123} \]

Note that both $B$ and $C$ are represented in an invariant subspace $W_m$.

As we have already seen in Part I that dealt with an eigenvalue problem of a hydrogen-like atom, squared angular momentum ($L^2$) and the z-component of angular momentum ($L_z$) possess a mutual set of eigenvectors and, hence, their eigenvalues are determined at once. Related matrix representations to (14.123) were given in (3.159). On the other hand, this was not the case with a set of operators $L_x$, $L_y$, and $L_z$; see (3.30). Yet, we pointed out the exceptional case where these three operators along with $L^2$ take an eigenvalue zero in common, to which the eigenstate $Y_0^0(\theta, \phi) \equiv \sqrt{1/4\pi}$ corresponds. Nonetheless, no complete orthonormal set of common eigenvectors exists with the set of operators $L_x$, $L_y$, and $L_z$. This fact is equivalent to the statement that these three operators do not commute with one another. In contrast, $L^2$ and $L_z$ share a complete orthonormal set of common eigenvectors and, hence, commute.
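The degenerate case (ii) of the proof can be made concrete with a small example. In the sketch below, the matrices `B` and `C` are hypothetical choices made for illustration: `B` has a doubly degenerate eigenvalue 2 whose eigenspace is spanned by the first two basis vectors, and `C` is not diagonal there, so suitable linear combinations as in (14.121) are needed.

```python
import math

def matmul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(X):
    """Hermitian conjugate (conjugate transpose)."""
    n = len(X)
    return [[X[j][i].conjugate() for j in range(n)] for i in range(n)]

def max_offdiag(X):
    """Largest absolute off-diagonal entry."""
    n = len(X)
    return max(abs(X[i][j]) for i in range(n) for j in range(n) if i != j)

# Two commuting Hermitian matrices; B is degenerate on span{e1, e2}.
B = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 5.0]]
C = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 3.0]]

BC, CB = matmul(B, C), matmul(C, B)
commutator_norm = max(abs(BC[i][j] - CB[i][j])
                      for i in range(3) for j in range(3))

# Columns of U: eigenvectors of the block (alpha_ij) = [[0, 1], [1, 0]]
# in the degenerate subspace, cf. (14.119), plus the third basis vector.
h = 1 / math.sqrt(2.0)
U = [[h, h, 0.0], [h, -h, 0.0], [0.0, 0.0, 1.0]]

B_diag = matmul(matmul(dagger(U), B), U)
C_diag = matmul(matmul(dagger(U), C), U)

print("max |BC - CB| entry:", commutator_norm)
print("diagonal of U^dagger B U:", [B_diag[i][i] for i in range(3)])
print("diagonal of U^dagger C U:", [C_diag[i][i] for i in range(3)])
```

Within the degenerate subspace $U^{\dagger}BU$ stays $b_i E$ for any unitary mixing, which is why the freedom can be used to diagonalize $C$, as in (14.123).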

Notice that $C\big(|x_i^{(1)}\rangle\big), C\big(|x_i^{(2)}\rangle\big), \dots,$ and $C\big(|x_i^{(m)}\rangle\big)$ are not necessarily linearly independent (see Sect. 9.4). Suppose that among $m$ eigenvalues $c_\mu$ $(1 \le \mu \le m)$, some $c_\mu = 0$. Then, $\det C = 0$ according to (13.48). This means that $C$ is singular. In that case, $C\big(|x_i^{(1)}\rangle\big), C\big(|x_i^{(2)}\rangle\big), \dots,$ and $C\big(|x_i^{(m)}\rangle\big)$ are linearly dependent. In Sect. 3.3, in fact, we had $L_z |Y_0^0(\theta, \phi)\rangle = L^2 |Y_0^0(\theta, \phi)\rangle = 0$. But, this special situation does not affect the proof of Theorem 14.13.

We know that any matrix $A$ can be decomposed such that

\[ A = \frac{1}{2}\left( A + A^{\dagger} \right) + i \left[ \frac{1}{2i}\left( A - A^{\dagger} \right) \right], \tag{14.124} \]

where we put $B \equiv \frac{1}{2}\left( A + A^{\dagger} \right)$ and $C \equiv \frac{1}{2i}\left( A - A^{\dagger} \right)$; both $B$ and $C$ are Hermitian. That is, any matrix $A$ can be decomposed into two Hermitian matrices in such a way that

\[ A = B + iC. \tag{14.125} \]

Note here that $B$ and $C$ commute if and only if $A$ and $A^{\dagger}$ commute, that is, $A$ is a normal matrix. In fact, from (14.125) we get

\[ A A^{\dagger} - A^{\dagger} A = 2i\,(CB - BC). \tag{14.126} \]

From (14.126), if and only if $B$ and $C$ commute (i.e., $B$ and $C$ can be diagonalized simultaneously), $AA^{\dagger} - A^{\dagger}A = 0$, i.e., $AA^{\dagger} = A^{\dagger}A$. This indicates that $A$ is a normal matrix. Thus, the following theorem will follow.

Theorem 14.14 A matrix can be diagonalized by a unitary similarity transformation, if and only if it is a normal matrix.

Thus, Theorem 14.13 is naturally generalized so that it can be stated as follows: Two normal matrices $B$ and $C$ commute if and only if there exists a complete orthonormal set of common eigenvectors.

In Sects. 14.1 and 14.3, we mentioned the spectral decomposition. There, we showed a special case where projection operators commute with one another; see (14.23). Thus, in light of Theorem 14.13, those projection operators can be diagonalized at once to be expressed as, e.g., (14.72). This is a conspicuous feature of the projection operators.
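The decomposition (14.124) to (14.126) lends itself to a quick numerical check. In the following sketch the sample matrix `A` is an arbitrary non-normal choice and `R` is a rotation matrix (which is normal); both, along with the helper names, are assumptions made only for illustration.

```python
import math

def matmul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(X):
    """Hermitian conjugate (conjugate transpose)."""
    n = len(X)
    return [[X[j][i].conjugate() for j in range(n)] for i in range(n)]

def sub(X, Y):
    """Entrywise difference X - Y."""
    n = len(X)
    return [[X[i][j] - Y[i][j] for j in range(n)] for i in range(n)]

def max_abs(X):
    """Largest absolute entry of a matrix."""
    return max(abs(x) for row in X for x in row)

def hermitian_parts(A):
    """Return (B, C) with A = B + iC and B, C Hermitian, cf. (14.124)."""
    Ad = dagger(A)
    n = len(A)
    B = [[(A[i][j] + Ad[i][j]) / 2 for j in range(n)] for i in range(n)]
    C = [[(A[i][j] - Ad[i][j]) / 2j for j in range(n)] for i in range(n)]
    return B, C

def identity_14_126_error(A):
    """Deviation between both sides of AA^+ - A^+A = 2i(CB - BC)."""
    B, C = hermitian_parts(A)
    lhs = sub(matmul(A, dagger(A)), matmul(dagger(A), A))
    comm = sub(matmul(C, B), matmul(B, C))
    rhs = [[2j * x for x in row] for row in comm]
    return max_abs(sub(lhs, rhs))

A = [[1 + 2j, 3 + 0j], [-1j, 4 - 1j]]          # generic, not normal
theta = 0.7
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]       # unitary, hence normal

B, C = hermitian_parts(A)
err_hermitian = max(max_abs(sub(B, dagger(B))), max_abs(sub(C, dagger(C))))
err_identity = identity_14_126_error(A)
normal_defect_A = max_abs(sub(matmul(A, dagger(A)), matmul(dagger(A), A)))
normal_defect_R = max_abs(sub(matmul(R, dagger(R)), matmul(dagger(R), R)))

print("Hermiticity error of B, C:", err_hermitian)
print("error in (14.126):", err_identity)
print("normal defect of A and R:", normal_defect_A, normal_defect_R)
```

For the non-normal `A` the commutator of `B` and `C` is nonzero, whereas for the rotation matrix it vanishes, in line with Theorem 14.14.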

References
1. Hassani S (2006) Mathematical physics. Springer, New York
2. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover, New York
3. Mirsky L (1990) An introduction to linear algebra. Dover, New York
4. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
5. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)

Chapter 15

Exponential Functions of Matrices

In Chap. 12, we dealt with a function of matrices. In this chapter we study several important definitions and characteristics of functions of matrices. If elements of matrices consist of analytic functions of a real variable, such matrices are of particular importance. For instance, differentiation can naturally be defined with the functions of matrices. Of these, exponential functions of matrices are widely used in various fields of mathematical physics. These functions frequently appear in a system of differential equations. In Chap. 10, we showed that SOLDEs with suitable BCs can be solved using Green’s functions. In the present chapter, in parallel, we show a solving method based on resolvent matrices. The exponential functions of matrices have broad applications in group theory that we will study in Part IV. In preparation for it, we study how the collection of matrices forms a linear vector space. In accordance with Chap. 13, we introduce basic notions of inner product and norm to the matrices.

15.1

Functions of Matrices

We consider a matrix in which individual components are differentiable with respect to a real variable such that

\[ A(t) = \left( a_{ij}(t) \right). \tag{15.1} \]

We define the differentiation as

\[ A'(t) = \frac{dA(t)}{dt} \equiv \left( a'_{ij}(t) \right). \tag{15.2} \]

© Springer Nature Singapore Pte Ltd. 2020 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3_15

We have

\[ [A(t) + B(t)]' = A'(t) + B'(t), \tag{15.3} \]

\[ [A(t)B(t)]' = A'(t)B(t) + A(t)B'(t). \tag{15.4} \]

To get (15.4), putting $A(t)B(t) \equiv C(t)$ we have

\[ c_{ij}(t) = \sum_k a_{ik}(t) b_{kj}(t), \tag{15.5} \]

where $C(t) = (c_{ij}(t))$, $B(t) = (b_{ij}(t))$. Differentiating (15.5), we get

\[ c'_{ij}(t) = \sum_k a'_{ik}(t) b_{kj}(t) + \sum_k a_{ik}(t) b'_{kj}(t), \tag{15.6} \]

namely, we have $C'(t) = A'(t)B(t) + A(t)B'(t)$, i.e., (15.4) holds.

Analytic functions are usually expanded as an infinite power series. As an analogue, a function of a matrix is expected to be expanded as such. Among various functions of matrices, exponential functions are particularly important and have a wide field of applications. As in the case of a real or complex number $x$, the exponential function of matrix $A$ can be defined as

\[ \exp A \equiv E + A + \frac{1}{2!} A^2 + \cdots + \frac{1}{\nu!} A^{\nu} + \cdots , \tag{15.7} \]

where $A$ is an $(n, n)$ square matrix and $E$ is an $(n, n)$ identity matrix. Note that $E$ is used instead of a number 1 that appears in the power series expansion of $\exp x$ described as

\[ \exp x \equiv 1 + x + \frac{1}{2!} x^2 + \cdots + \frac{1}{\nu!} x^{\nu} + \cdots . \]

Here we define the convergence of a series of matrices as in the following definition:

Definition 15.1 Let $A_0, A_1, \dots, A_\nu, \dots$ $[A_\nu \equiv (a_{ij,\nu})\ (1 \le i, j \le n)]$ be a sequence of matrices. Let $a_{ij,0}, a_{ij,1}, \dots, a_{ij,\nu}, \dots$ be the corresponding sequence of each matrix component. Let us define a series of matrices as $\tilde{A} \equiv (\tilde{a}_{ij}) = A_0 + A_1 + \cdots + A_\nu + \cdots$ and a series of individual matrix components as $\tilde{a}_{ij} \equiv a_{ij,0} + a_{ij,1} + \cdots + a_{ij,\nu} + \cdots$. Then, if each $\tilde{a}_{ij}$ converges, we say that $\tilde{A}$ converges.

Regarding the matrix notation in Definition 15.1, readers are referred to (11.38). Now, let us show that the power series of (15.7) converges [1, 2].

Let $A$ be an $(n, n)$ square matrix. Putting $A = (a_{ij})$ and $\max_{i,j} |a_{ij}| = M$ and defining $A^{\nu} \equiv \big( a_{ij}^{(\nu)} \big)$, where $a_{ij}^{(\nu)}$ is defined as the $(i, j)$ component of $A^{\nu}$, we get

\[ \big| a_{ij}^{(\nu)} \big| \le n^{\nu} M^{\nu} \quad (1 \le i, j \le n). \tag{15.8} \]

Note that $A^0 = E$. Hence, $a_{ij}^{(0)} = \delta_{ij}$. Then, if $\nu = 0$, (15.8) routinely holds. When $\nu = 1$, (15.8) obviously holds as well from the definition of $M$ and assuming $n \ge 2$; we are considering $(n, n)$ matrices with $n \ge 2$. We wish to show that (15.8) holds with any $\nu$ $(\ge 2)$ using mathematical induction. Suppose that (15.8) holds with $\nu$ replaced by $\nu - 1$. That is,

\[ \big| a_{ij}^{(\nu-1)} \big| \le n^{\nu-1} M^{\nu-1} \quad (1 \le i, j \le n). \]

Then, we have

\[ (A^{\nu})_{ij} \equiv a_{ij}^{(\nu)} = \left( A^{\nu-1} A \right)_{ij} = \sum_{k=1}^{n} \left( A^{\nu-1} \right)_{ik} (A)_{kj} = \sum_{k=1}^{n} a_{ik}^{(\nu-1)} a_{kj} . \tag{15.9} \]

Thus, we get

\[ \big| a_{ij}^{(\nu)} \big| = \Big| \sum_{k=1}^{n} a_{ik}^{(\nu-1)} a_{kj} \Big| \le \sum_{k=1}^{n} \big| a_{ik}^{(\nu-1)} \big| \cdot |a_{kj}| \le n \cdot n^{\nu-1} M^{\nu-1} \cdot M = n^{\nu} M^{\nu} . \tag{15.10} \]

Consequently, (15.8) holds for $\nu$ as well. Meanwhile, we have the following relation:

\[ e^{nM} = \sum_{\nu=0}^{\infty} \frac{1}{\nu!} n^{\nu} M^{\nu} . \tag{15.11} \]

The series (15.11) certainly converges; note that $nM$ is not a matrix but a number. From (15.10) we obtain

\[ \sum_{\nu=0}^{\infty} \frac{1}{\nu!} \big| a_{ij}^{(\nu)} \big| \le \sum_{\nu=0}^{\infty} \frac{1}{\nu!} n^{\nu} M^{\nu} = e^{nM} . \]

This implies that $\sum_{\nu=0}^{\infty} \frac{1}{\nu!} a_{ij}^{(\nu)}$ converges (i.e., absolute convergence [3]). Thus, from Definition 15.1, (15.7) converges.

To further examine the exponential functions of matrices, in the following proposition we show that a collection of matrices forms a linear vector space.

Proposition 15.1 Let $\mathcal{M}$ be a collection of all $(n, n)$ square matrices. Then, $\mathcal{M}$ forms a linear vector space.

Proof Let $A = (a_{ij}) \in \mathcal{M}$ and $B = (b_{ij}) \in \mathcal{M}$ be $(n, n)$ square matrices. Then, $\alpha A = (\alpha a_{ij})$ and $A + B = (a_{ij} + b_{ij})$ are again $(n, n)$ square matrices, and so we have $\alpha A \in \mathcal{M}$ and $A + B \in \mathcal{M}$. Thus, $\mathcal{M}$ forms a linear vector space.
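The series definition (15.7) and the bound (15.8) used in the convergence argument can be tried numerically. The sketch below sums a finite number of terms of (15.7); the truncation at 30 terms, the test matrix, and the function names are assumptions chosen for illustration, and a truncated series is not a recommended general-purpose algorithm.

```python
import math

def matmul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    """(n, n) identity matrix E."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_exp(A, terms=30):
    """Partial sum E + A + A^2/2! + ... of the series (15.7)."""
    n = len(A)
    result = identity(n)
    term = identity(n)                       # holds A^nu / nu!
    for nu in range(1, terms):
        term = [[x / nu for x in row] for row in matmul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

theta = 0.7
A = [[0.0, -theta], [theta, 0.0]]            # generator of a plane rotation
expA = mat_exp(A)
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
series_err = max(abs(expA[i][j] - rotation[i][j])
                 for i in range(2) for j in range(2))

# Check the bound (15.8): |a_ij^(nu)| <= n^nu M^nu for the first few powers.
n = len(A)
M = max(abs(x) for row in A for x in row)
power = identity(n)                          # A^0 = E
bound_holds = True
for nu in range(10):
    largest = max(abs(x) for row in power for x in row)
    bound_holds = bound_holds and largest <= (n * M) ** nu + 1e-15
    power = matmul(power, A)

print("deviation of exp(A) from the rotation matrix:", series_err)
print("bound (15.8) holds for nu = 0..9:", bound_holds)
```

For this particular `A` the exponential is the rotation matrix of Example 14.2, which gives an independent reference value for the truncated sum.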

In Proposition 15.1, $\mathcal{M}$ forms an $n^2$-th order vector space $V^{n^2}$. To define an inner product in this space, we introduce $\big| P^{(k)}_{(l)} \big\rangle$ $(k, l = 1, 2, \dots, n)$ as a basis set of $n^2$ vectors. Now, let $A = (a_{ij})$ be an $(n, n)$ square matrix. To explicitly show that $A$ is a vector in the inner product space, we express it as

\[ |A\rangle = \sum_{k,l=1}^{n} a_{kl} \big| P^{(k)}_{(l)} \big\rangle . \]

Then, an adjoint matrix of $|A\rangle$ is written as

\[ \langle A| \equiv |A\rangle^{\dagger} = \sum_{k,l=1}^{n} a_{kl}^{*} \big\langle P^{(k)}_{(l)} \big| , \tag{15.12} \]

where $\big\langle P^{(k)}_{(l)} \big|$ is an adjoint matrix of $\big| P^{(k)}_{(l)} \big\rangle$. With the notations in (15.12), see Sects. 1.4 and 13.3; for $P^{(k)}_{(l)}$ see Sect. 14.1. Meanwhile, let $B = (b_{ij})$ be another $(n, n)$ square matrix denoted by

\[ |B\rangle = \sum_{s,t=1}^{n} b_{st} \big| P^{(s)}_{(t)} \big\rangle . \]

Here, let us define an orthonormal basis set $\big| P^{(k)}_{(l)} \big\rangle$ and an inner product between vectors described in a form of matrices such that

\[ \big\langle P^{(k)}_{(l)} \big| P^{(s)}_{(t)} \big\rangle \equiv \delta_{ks} \delta_{lt} . \]

Then, from (15.12) the inner product between $A$ and $B$ is given by

\[ \langle A|B\rangle \equiv \sum_{k,l,s,t=1}^{n} a_{kl}^{*} b_{st} \big\langle P^{(k)}_{(l)} \big| P^{(s)}_{(t)} \big\rangle = \sum_{k,l,s,t=1}^{n} a_{kl}^{*} b_{st} \delta_{ks} \delta_{lt} = \sum_{k,l=1}^{n} a_{kl}^{*} b_{kl} . \tag{15.13} \]

The relation (15.13) leads to a norm already introduced in (13.8). The norm of a matrix is defined as

\[ \|A\| \equiv \sqrt{\langle A|A\rangle} = \sqrt{ \sum_{i,j=1}^{n} |a_{ij}|^2 } . \tag{15.14} \]

We have the following theorem regarding two matrices.

Theorem 15.1 Let $A = (a_{ij})$ and $B = (b_{ij})$ be two square matrices. Then, we have the following inequality:

\[ \|AB\| \le \|A\|\,\|B\|. \tag{15.15} \]

Proof Let $C = AB$. Then, from the Cauchy–Schwarz inequality (Sect. 13.1) we get

\[ \|C\|^{2} = \sum_{i,j} |c_{ij}|^{2} = \sum_{i,j} \Big| \sum_{k} a_{ik} b_{kj} \Big|^{2} \le \sum_{i,j} \Big( \sum_{k} |a_{ik}|^{2} \Big)\Big( \sum_{l} |b_{lj}|^{2} \Big) = \sum_{i,k} |a_{ik}|^{2} \sum_{j,l} |b_{lj}|^{2} = \|A\|^{2} \|B\|^{2}. \tag{15.16} \]

This leads to (15.15). ∎
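As a numerical illustration of Theorem 15.1, the following minimal sketch evaluates the norm (15.14) for two randomly chosen complex matrices and compares $\|AB\|$ with $\|A\|\,\|B\|$. The helper names (`mat_mul`, `norm`, `random_matrix`) and the fixed seed are illustrative assumptions, not part of the text.

```python
import math
import random

def mat_mul(X, Y):
    """Product of two (n, n) matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def norm(A):
    """Matrix norm of (15.14): square root of the sum of |a_ij|^2."""
    return math.sqrt(sum(abs(a) ** 2 for row in A for a in row))

def random_matrix(n, rng):
    """Hypothetical helper: an (n, n) matrix with complex entries."""
    return [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
            for _ in range(n)]

rng = random.Random(0)        # fixed seed, so the run is reproducible
A = random_matrix(3, rng)
B = random_matrix(3, rng)
lhs = norm(mat_mul(A, B))     # ||AB||
rhs = norm(A) * norm(B)       # ||A|| ||B||
print(lhs <= rhs)             # inequality (15.15)
```

A single random pair does not prove anything, of course; the sketch only shows how the two sides of (15.15) are computed.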

15.2 Exponential Functions of Matrices and Their Manipulations

Using Theorem 15.1, we explore important characteristics of exponential functions of matrices. For this we have the next theorem.

Theorem 15.2 [4] Let A and B be mutually commutative matrices; i.e., $AB = BA$. Then, we have

\[ \exp(A + B) = \exp A \exp B = \exp B \exp A. \tag{15.17} \]

Proof Since A and B are mutually commutative, $(A + B)^n$ can be expanded through the binomial theorem such that

\[ \frac{1}{n!}(A + B)^{n} = \sum_{k=0}^{n} \frac{A^{k} B^{n-k}}{k!\,(n-k)!}. \tag{15.18} \]

Meanwhile, with an arbitrary non-negative integer m, we have

\[ \sum_{n=0}^{2m} \frac{1}{n!}(A + B)^{n} = \left( \sum_{n=0}^{m} \frac{A^{n}}{n!} \right)\left( \sum_{n=0}^{m} \frac{B^{n}}{n!} \right) + R_m, \tag{15.19} \]

where $R_m$ has the form $\sum_{(k,l)} A^{k} B^{l} / k!\,l!$, in which the summation is taken over all combinations (k, l) that satisfy $\max(k, l) \ge m + 1$ and $k + l \le 2m$. The number of all such combinations (k, l) is $m(m + 1)$ accordingly; see Fig. 15.1 [4]. We remark that if in (15.19) we put $m = 0$, (15.19) trivially holds with $R_m = 0$ (i.e., zero matrix). The matrix $R_m$, however, changes with increasing m, and so we must evaluate it at $m \to \infty$. Putting $C = \max(\|A\|, \|B\|, 1)$ and using Theorem 15.1, we get

\[ \|R_m\| \le \sum_{(k,l)} \frac{\|A\|^{k}\, \|B\|^{l}}{k!\,l!}. \tag{15.20} \]

Fig. 15.1 Number of possible combinations (k, l). Such combinations are indicated with green points. The number of the green points of the upper left triangle is m(m + 1)/2. That for the lower right triangle is m(m + 1)/2 as well. The total number of the green points is m(m + 1) accordingly. Adapted from Yamanouchi T, Sugiura M (1960) Introduction to continuous groups (New Mathematics Series 18: in Japanese) [4], with the permission of Baifukan Co., Ltd., Tokyo

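In (15.20) we have $\max(k, l) \ge m + 1$ and $\min(k, l) \ge 0$, and so $k!\,l! \ge (m + 1)!$. Since $\|A\|^{k} \|B\|^{l} \le C^{k+l} \le C^{2m}$, we have

\[ \sum_{(k,l)} \frac{\|A\|^{k}\, \|B\|^{l}}{k!\,l!} \le \frac{m(m+1)\, C^{2m}}{(m+1)!} = \frac{C^{2m}}{(m-1)!}, \tag{15.21} \]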

where $m(m + 1)$ in the numerator comes from the number of combinations (k, l) as pointed out above. From Stirling's interpolation formula [5], we have

\[ \Gamma(z + 1) \approx \sqrt{2\pi}\, e^{-z} z^{z + \frac{1}{2}}. \tag{15.22} \]

Replacing z with an integer $m - 1$, we get

\[ \Gamma(m) = (m - 1)! \approx \sqrt{2\pi}\, e^{-(m-1)} (m - 1)^{m - \frac{1}{2}}. \tag{15.23} \]

Then, we have

\[ [\text{RHS of (15.21)}] = \frac{C^{2m}}{(m-1)!} \approx \frac{C^{2m} e^{m-1}}{\sqrt{2\pi}\,(m-1)^{m-\frac{1}{2}}} = \frac{C}{\sqrt{2\pi e}} \left( \frac{eC^{2}}{m-1} \right)^{m-\frac{1}{2}} \tag{15.24} \]

and

\[ \lim_{m\to\infty} \frac{C}{\sqrt{2\pi e}} \left( \frac{eC^{2}}{m-1} \right)^{m-\frac{1}{2}} = 0. \tag{15.25} \]

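Thus, from (15.20) and (15.21) we get

\[ \lim_{m\to\infty} \|R_m\| = 0. \tag{15.26} \]

Remember that $\|R_m\| = 0$ if and only if $R_m = 0$ (zero matrix); see (13.4) and (13.8) in Sect. 13.1. Finally, taking the limit ($m \to \infty$) of both sides of (15.19) we obtain

\[ \lim_{m\to\infty} \sum_{n=0}^{2m} \frac{1}{n!}(A + B)^{n} = \lim_{m\to\infty} \left[ \left( \sum_{n=0}^{m} \frac{A^{n}}{n!} \right)\left( \sum_{n=0}^{m} \frac{B^{n}}{n!} \right) + R_m \right]. \]

Considering (15.26), we get

\[ \lim_{m\to\infty} \sum_{n=0}^{2m} \frac{1}{n!}(A + B)^{n} = \lim_{m\to\infty} \left( \sum_{n=0}^{m} \frac{A^{n}}{n!} \right)\left( \sum_{n=0}^{m} \frac{B^{n}}{n!} \right). \tag{15.27} \]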

This implies that (15.17) holds. It is obvious that if A and B commute, then exp A and exp B commute. That is, $\exp A \exp B = \exp B \exp A$. This completes the proof. ∎

From Theorem 15.2, we immediately get the following property.

\[ (1)\quad (\exp A)^{-1} = \exp(-A). \tag{15.28} \]

To show this, it suffices to find that A and $-A$ commute. That is, $A(-A) = -A^{2} = (-A)A$. Replacing B with $-A$ in (15.17), we have

\[ \exp(A - A) = \exp 0 = E = \exp A \exp(-A) = \exp(-A) \exp A. \tag{15.29} \]

This leads to (15.28). Notice here that $\exp 0 = E$ from (15.7). We further have important properties of exponential functions of matrices.

\[ (2)\quad P^{-1}(\exp A)P = \exp\left(P^{-1}AP\right). \tag{15.30} \]

\[ (3)\quad \exp A^{T} = (\exp A)^{T}. \tag{15.31} \]

\[ (4)\quad \exp A^{\dagger} = (\exp A)^{\dagger}. \tag{15.32} \]

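To show (2) we have

\[ \left(P^{-1}AP\right)^{n} = \left(P^{-1}AP\right)\left(P^{-1}AP\right)\cdots\left(P^{-1}AP\right) = P^{-1}A\left(PP^{-1}\right)A\left(PP^{-1}\right)\cdots\left(PP^{-1}\right)AP = P^{-1}AEAE\cdots EAP = P^{-1}A^{n}P. \tag{15.33} \]

Applying (15.33) to (15.7), we have

\[ \begin{aligned} \exp\left(P^{-1}AP\right) &= E + P^{-1}AP + \frac{1}{2!}\left(P^{-1}AP\right)^{2} + \cdots + \frac{1}{n!}\left(P^{-1}AP\right)^{n} + \cdots \\ &= P^{-1}EP + P^{-1}AP + P^{-1}\frac{A^{2}}{2!}P + \cdots + P^{-1}\frac{A^{n}}{n!}P + \cdots \\ &= P^{-1}\left( E + A + \frac{A^{2}}{2!} + \cdots + \frac{A^{n}}{n!} + \cdots \right)P = P^{-1}(\exp A)P. \end{aligned} \tag{15.34} \]

With (3) and (4), use the following equations:

\[ \left(A^{T}\right)^{n} = \left(A^{n}\right)^{T} \quad \text{and} \quad \left(A^{\dagger}\right)^{n} = \left(A^{n}\right)^{\dagger}. \tag{15.35} \]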
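Theorem 15.2 and Properties (1) and (2) can be checked numerically. The sketch below sums the power series (15.7) up to a fixed number of terms (an assumption that is adequate only for matrices of small norm) and compares both sides of (15.17), (15.29), and (15.30). The matrices and helper names are illustrative choices, not taken from the text.

```python
def mat_mul(X, Y):
    """Product of two (n, n) matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_exp(A, terms=40):
    """exp A from the power series (15.7), truncated after `terms` terms."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # E
    term = [row[:] for row in result]                               # A^0 / 0!
    for nu in range(1, terms):
        term = [[x / nu for x in row] for row in mat_mul(term, A)]  # A^nu / nu!
        result = mat_add(result, term)
    return result

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

E = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0], [2.0, 1.0]]
B = [[0.0, 1.0], [1.0, 0.0]]          # AB = BA
C = [[0.0, 1.0], [0.0, 0.0]]
D = [[0.0, 0.0], [1.0, 0.0]]          # CD != DC
P = [[1.0, 1.0], [0.0, 1.0]]
P_inv = [[1.0, -1.0], [0.0, 1.0]]
minus_A = [[-a for a in row] for row in A]

# (15.17) holds for the commuting pair and fails for the non-commuting pair.
commuting_gap = max_abs_diff(mat_exp(mat_add(A, B)),
                             mat_mul(mat_exp(A), mat_exp(B)))
noncommuting_gap = max_abs_diff(mat_exp(mat_add(C, D)),
                                mat_mul(mat_exp(C), mat_exp(D)))
# Property (1): exp A exp(-A) = E.
inverse_gap = max_abs_diff(mat_mul(mat_exp(A), mat_exp(minus_A)), E)
# Property (2): P^{-1} (exp A) P = exp(P^{-1} A P).
similarity_gap = max_abs_diff(mat_mul(mat_mul(P_inv, mat_exp(A)), P),
                              mat_exp(mat_mul(mat_mul(P_inv, A), P)))
print(commuting_gap, noncommuting_gap, inverse_gap, similarity_gap)
```

For the non-commuting pair, $\exp C \exp D$ has the (1, 1) entry 2, whereas $\exp(C + D)$ has $\cosh 1$ there, so the commutativity assumption in Theorem 15.2 cannot simply be dropped.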

The confirmation of (15.31) and (15.32) is left for readers. For future purposes, we describe another important theorem and further properties of the exponential functions of matrices.

Theorem 15.3 Let $a_1, a_2, \dots, a_n$ be eigenvalues of an (n, n) square matrix A. Then, $\exp a_1, \exp a_2, \dots, \exp a_n$ are eigenvalues of exp A.

Proof From Theorem 12.1 we know that every (n, n) square matrix can be converted to a triangle matrix by similarity transformation. Also, we know that eigenvalues of a triangle matrix are given by its diagonal elements (Sect. 12.1). Let P be a non-singular matrix used for such a similarity transformation. Then, we have

\[ P^{-1}AP = \begin{pmatrix} a_1 & & * \\ & \ddots & \\ 0 & & a_n \end{pmatrix}. \tag{15.36} \]

Meanwhile, let T be a triangle matrix described by

\[ T = \begin{pmatrix} t_1 & & * \\ & \ddots & \\ 0 & & t_n \end{pmatrix}. \tag{15.37} \]

Then, from a property of the triangle matrix we have

\[ T^{m} = \begin{pmatrix} t_1^{m} & & * \\ & \ddots & \\ 0 & & t_n^{m} \end{pmatrix}. \tag{15.38} \]

Applying this property to (15.36), we get

\[ \begin{pmatrix} a_1^{m} & & * \\ & \ddots & \\ 0 & & a_n^{m} \end{pmatrix} = \left(P^{-1}AP\right)^{m} = P^{-1}A^{m}P, \tag{15.39} \]

where the last equality is due to (15.33). Summing (15.39) over m with the weights $1/m!$, following (15.34), we obtain another triangle matrix expressed as

\[ P^{-1}(\exp A)P = \begin{pmatrix} \exp a_1 & & * \\ & \ddots & \\ 0 & & \exp a_n \end{pmatrix}. \tag{15.40} \]

Once again considering that eigenvalues of a triangle matrix are given by its diagonal elements and that the eigenvalues are invariant under a similarity transformation, (15.40) implies that $\exp a_1, \exp a_2, \dots, \exp a_n$ are eigenvalues of exp A. This completes the proof. ∎

Whether an eigenvalue is a proper eigenvalue or a generalized eigenvalue is irrelevant to Theorem 15.3. In other words, the eigenvalue may be a proper one or a generalized one (Sect. 12.6). That depends solely upon the nature of A in question. Theorem 15.3 immediately leads to the next important relation. Taking a determinant of (15.40), we get

\[ \det\left[P^{-1}(\exp A)P\right] = \det P^{-1} \det(\exp A) \det P = \det(\exp A) = (\exp a_1)(\exp a_2)\cdots(\exp a_n) = \exp(a_1 + a_2 + \cdots + a_n) = \exp(\mathrm{Tr}A). \tag{15.41} \]

To derive (15.41), refer to (11.58) and (12.10) as well. We list (15.41) as an important property of the exponential functions of matrices.

\[ (5)\quad \det(\exp A) = \exp(\mathrm{Tr}A). \tag{15.41} \]
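Theorem 15.3 and Property (5) lend themselves to a quick numerical check. The sketch below uses an arbitrarily chosen (2, 2) matrix with eigenvalues 2 and −5 (an illustrative assumption) and a truncated power series for exp A. It compares det(exp A) with exp(Tr A), and the trace of exp A with the sum of the exponentials of the eigenvalues.

```python
import math

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=60):
    """exp A from the power series (15.7), truncated after `terms` terms."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for nu in range(1, terms):
        term = [[x / nu for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

# Characteristic polynomial: (lambda - 2)(lambda + 5); Tr A = -3.
A = [[1.0, 2.0], [3.0, -4.0]]
X = mat_exp(A)

det_X = X[0][0] * X[1][1] - X[0][1] * X[1][0]
trace_X = X[0][0] + X[1][1]

print(det_X, math.exp(-3.0))                     # Property (5)
print(trace_X, math.exp(2.0) + math.exp(-5.0))   # consistent with Theorem 15.3
```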

We have examined general aspects until now. We have not yet studied, however, the relationship between the exponential functions of matrices and specific types of matrices. Regarding the applications of exponential functions of matrices to mathematical physics, especially with the transformations of functions and vectors, we frequently encounter orthogonal matrices and unitary matrices. For this purpose, we wish to examine the following properties.

(6) Let A be a real skew-symmetric matrix. Then, exp A is a real orthogonal matrix.
(7) Let A be an anti-Hermitian matrix. Then, exp A is a unitary matrix.

To show (6), we note that a skew-symmetric matrix is described as

\[ A^{T} = -A \quad \text{or} \quad A^{T} + A = 0. \tag{15.42} \]

Then, we have $A^{T}A = -A^{2} = AA^{T}$ and, hence, A and $A^{T}$ commute. Therefore, from Theorem 15.2, we have

\[ \exp\left(A + A^{T}\right) = \exp 0 = E = \exp A \exp A^{T} = \exp A\,(\exp A)^{T}, \tag{15.43} \]

where the last equality comes from Property (3) of (15.31). Equation (15.43) implies that exp A is a real orthogonal matrix. Since an anti-Hermitian matrix A is expressed as

\[ A^{\dagger} = -A \quad \text{or} \quad A^{\dagger} + A = 0, \tag{15.44} \]

we have $A^{\dagger}A = -A^{2} = AA^{\dagger}$, and so A and $A^{\dagger}$ commute. Using Theorem 15.2 again, we have

\[ \exp\left(A + A^{\dagger}\right) = \exp 0 = E = \exp A \exp A^{\dagger} = \exp A\,(\exp A)^{\dagger}, \tag{15.45} \]

where the last equality comes from Property (4) of (15.32). This implies that exp A is a unitary matrix. Regarding Properties (6) and (7), if A is replaced with tA (t: real), the relations (15.42) through (15.45) hold unchanged. Then, Properties (6) and (7) are rewritten as

(6)′ Let A be a real skew-symmetric matrix. Then, exp tA (t: real) is a real orthogonal matrix.
(7)′ Let A be an anti-Hermitian matrix. Then, exp tA (t: real) is a unitary matrix.

Now, let a function F(t) be expressed as $F(t) = \exp tx$ with real numbers t and x. Then, we have a well-known formula

\[ \frac{d}{dt}F(t) = xF(t). \tag{15.46} \]

Next, we extend (15.46) to the exponential functions of matrices. Let us define the differentiation of an exponential function of a matrix with respect to a real parameter t. Then, in concert with (15.46), we have the following important theorem.

Theorem 15.4 [1, 2] Let $F(t) \equiv \exp tA$ with t being a real number and A being a matrix that does not depend on t. Then, we have

\[ \frac{d}{dt}F(t) = AF(t). \tag{15.47} \]

In (15.47) we assume that individual matrix components of F(t) are differentiable with respect to t and that the differentiation of a matrix is defined as in (15.2).

Proof We have

\[ F(t + \Delta t) = \exp[(t + \Delta t)A] = \exp(tA + \Delta tA) = (\exp tA)(\exp \Delta tA) = F(t)F(\Delta t) = (\exp \Delta tA)(\exp tA) = F(\Delta t)F(t), \tag{15.48} \]

where we considered that tA and ΔtA are commutative and used Theorem 15.2. Therefore, we get

\[ \begin{aligned} \frac{1}{\Delta t}[F(t + \Delta t) - F(t)] &= \frac{1}{\Delta t}[F(\Delta t) - E]F(t) \\ &= \frac{1}{\Delta t}\left[ \sum_{\nu=0}^{\infty} \frac{1}{\nu!}(\Delta tA)^{\nu} - E \right]F(t) \\ &= \left[ \sum_{\nu=1}^{\infty} \frac{1}{\nu!}(\Delta t)^{\nu-1}A^{\nu} \right]F(t). \end{aligned} \tag{15.49} \]

Taking the limit of both sides of (15.49), we get

\[ \lim_{\Delta t \to 0} \frac{1}{\Delta t}[F(t + \Delta t) - F(t)] \equiv \frac{d}{dt}F(t) = AF(t). \]

Notice that in (15.49) only the first term (i.e., ν = 1) of the RHS is nonvanishing as $\Delta t \to 0$. Then, (15.47) follows. This completes the proof. ∎

On the basis of the power series expansion of the exponential function of a matrix defined in (15.7), A and F(t) are commutative. Therefore, instead of (15.47) we may write

\[ \frac{d}{dt}F(t) = F(t)A. \tag{15.50} \]
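The relations (15.47) and (15.50) can be checked with a central finite difference. The following is a minimal sketch; the matrix A, the point t, and the step h are illustrative assumptions, and exp tA is again evaluated from a truncated power series.

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=40):
    """exp A from the power series (15.7), truncated after `terms` terms."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for nu in range(1, terms):
        term = [[x / nu for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

A = [[0.0, 1.0], [-2.0, -3.0]]        # constant matrix, independent of t

def F(t):
    """F(t) = exp tA."""
    return mat_exp([[t * a for a in row] for row in A])

t, h = 0.7, 1e-5
F_plus, F_minus = F(t + h), F(t - h)
# Central difference approximation to dF/dt, taken component by component.
derivative = [[(F_plus[i][j] - F_minus[i][j]) / (2 * h) for j in range(2)]
              for i in range(2)]

gap_left = max_abs_diff(derivative, mat_mul(A, F(t)))    # (15.47)
gap_right = max_abs_diff(derivative, mat_mul(F(t), A))   # (15.50)
print(gap_left, gap_right)
```

Both gaps are of the order of the finite-difference error, which reflects that A and F(t) commute.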

Using Theorem 15.4, we show the following important properties. These are converse propositions to Properties (6) and (7).

(8) Let exp tA be a real orthogonal matrix for any real number t. Then, A is a real skew-symmetric matrix.
(9) Let exp tA be a unitary matrix for any real number t. Then, A is an anti-Hermitian matrix.

To show (8), from the assumption we have, for any real number t,

\[ \exp tA\,(\exp tA)^{T} = \exp tA \exp tA^{T} = E. \tag{15.51} \]

Differentiating (15.51) with respect to t by use of (15.4), we have

\[ (A \exp tA) \exp tA^{T} + (\exp tA)A^{T} \exp tA^{T} = 0. \tag{15.52} \]

Since (15.52) must hold for any real number t, it must hold in the limit $t \to 0$ as well. On this condition, from (15.52) we get $A + A^{T} = 0$. That is, A is a real skew-symmetric matrix. In the case of (9), similarly we have

\[ \exp tA\,(\exp tA)^{\dagger} = \exp tA \exp tA^{\dagger} = E. \tag{15.53} \]

Differentiating (15.53) with respect to t and taking $t \to 0$, we get $A + A^{\dagger} = 0$. That is, A is an anti-Hermitian matrix. Properties (6)–(9), including Properties (6)′ and (7)′, are frequently used later in Chap. 20 in relation to Lie groups and Lie algebras.
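A familiar instance of Properties (6)′ and (8) is the plane rotation. The sketch below takes the skew-symmetric matrix with rows (0, −1) and (1, 0) as an illustrative choice, evaluates exp tA from a truncated series, and checks that the result is the rotation matrix through the angle t and is orthogonal. In the converse direction, a finite-difference derivative of exp tA at t = 0 gives back A.

```python
import math

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=40):
    """exp A from the power series (15.7), truncated after `terms` terms."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for nu in range(1, terms):
        term = [[x / nu for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

A = [[0.0, -1.0], [1.0, 0.0]]          # real skew-symmetric: A^T = -A
E = [[1.0, 0.0], [0.0, 1.0]]

def F(t):
    return mat_exp([[t * a for a in row] for row in A])

t = 0.9
R = F(t)
R_T = [[R[j][i] for j in range(2)] for i in range(2)]
rotation = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

orth_gap = max_abs_diff(mat_mul(R, R_T), E)   # exp tA is orthogonal
rotation_gap = max_abs_diff(R, rotation)      # and equals a rotation by t

# Converse direction: the derivative of exp tA at t = 0 returns A.
h = 1e-6
Fp, Fm = F(h), F(-h)
generator = [[(Fp[i][j] - Fm[i][j]) / (2 * h) for j in range(2)]
             for i in range(2)]
generator_gap = max_abs_diff(generator, A)
print(orth_gap, rotation_gap, generator_gap)
```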

15.3 System of Differential Equations

In this section we describe important applications of the exponential functions of matrices to systems of differential equations.

15.3.1 Introduction

In Chap. 10 we dealt with SOLDEs using Green's functions. In the present section we examine the properties of systems of differential equations. There is a close relationship between the two topics, which we first briefly outline. As an example of a system of differential equations, let us think of the following general equations:

\[ \dot{x}(t) = a(t)x(t) + b(t)y(t), \tag{15.54} \]

\[ \dot{y}(t) = c(t)x(t) + d(t)y(t), \tag{15.55} \]

where x(t) and y(t) vary as a function of t; a(t), b(t), c(t), and d(t) are coefficients. Differentiating (15.54) with respect to t, we get

\[ \begin{aligned} \ddot{x}(t) &= \dot{a}(t)x(t) + a(t)\dot{x}(t) + \dot{b}(t)y(t) + b(t)\dot{y}(t) \\ &= \dot{a}(t)x(t) + a(t)\dot{x}(t) + \dot{b}(t)y(t) + b(t)[c(t)x(t) + d(t)y(t)] \\ &= [\dot{a}(t) + b(t)c(t)]x(t) + a(t)\dot{x}(t) + [\dot{b}(t) + b(t)d(t)]y(t), \end{aligned} \tag{15.56} \]

where with the second equality we used (15.55). To eliminate y(t) from (15.56), we multiply both sides of (15.56) by b(t), multiply both sides of (15.54) by $\dot{b}(t) + b(t)d(t)$, and subtract the latter equation from the former. As a result, we obtain

\[ b\ddot{x} - \left[\dot{b} + (a + d)b\right]\dot{x} + \left[a\left(\dot{b} + bd\right) - b(\dot{a} + bc)\right]x = 0. \tag{15.57} \]

In the above derivation, we assume $b \ne 0$. Solving (15.57) and substituting the resulting solution into (15.54), we can get a solution for y(t). If, on the other hand, $b = 0$, we have $\dot{x}(t) = a(t)x(t)$. This is a simple FOLDE and can readily be integrated to yield

\[ x = C \exp\left[ \int^{t} a(t')\,dt' \right], \tag{15.58} \]

where C is an integration constant. Meanwhile, differentiating (15.55) with respect to t, we have

\[ \ddot{y} = \dot{c}x + c\dot{x} + \dot{d}y + d\dot{y} = \dot{c}x + cax + \dot{d}y + d\dot{y} = (\dot{c} + ca)\, C \exp\left[ \int^{t} a(t')\,dt' \right] + \dot{d}y + d\dot{y}, \]

where we used (15.58) as well as $\dot{x} = ax$. Thus, we are going to solve the following inhomogeneous SOLDE:

\[ \ddot{y} - d\dot{y} - \dot{d}y = (\dot{c} + ca)\, C \exp\left[ \int^{t} a(t')\,dt' \right]. \tag{15.59} \]

In Sect. 10.2 we have shown how we are able to solve an inhomogeneous FOLDE using a weight function. For a later purpose, let us consider the next FOLDE, assuming $a(x) \equiv 1$ in (10.23):

\[ \dot{x}(t) + p(t)x = q(t). \tag{15.60} \]

As another method to solve (15.60), we examine the method of variation of constants. The homogeneous equation corresponding to (15.60) is

\[ \dot{x}(t) + p(t)x = 0. \tag{15.61} \]

It can be solved as just before such that

\[ x(t) = C \exp\left[ -\int^{t} p(t')\,dt' \right] \equiv u(t). \tag{15.62} \]

Now we assume that a solution of (15.60) can be sought by putting

\[ x(t) = k(t)u(t), \tag{15.63} \]

where we assume that the functional form u(t) remains unchanged in the inhomogeneous equation (15.60) and that instead the "constant" k(t) may change as a function of t. Inserting (15.63) into (15.60), we have

\[ \dot{x} + px = \dot{k}u + k\dot{u} + pku = \dot{k}u + k(\dot{u} + pu) = \dot{k}u + k(-pu + pu) = \dot{k}u = q, \tag{15.64} \]

where with the third equality we used (15.61) and (15.62) to get $\dot{u} = -pu$. In this way, (15.64) can easily be integrated to give

\[ k(t) = \int^{t} \frac{q(t')}{u(t')}\,dt' + C, \]

where C is an integration constant. Thus, as a solution of (15.60) we have

\[ x(t) = k(t)u(t) = u(t)\int^{t} \frac{q(t')}{u(t')}\,dt' + Cu(t). \tag{15.65} \]
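The formula (15.65) can be tried out on a concrete FOLDE. The sketch below assumes the lower limit of integration to be 0 and normalizes u(0) = 1, so that the constant C equals the initial value x(0); it evaluates the integrals with a composite Simpson rule. The test problem $\dot{x} + x = t$ with x(0) = 2, whose solution is $x(t) = t - 1 + 3e^{-t}$, and all function names are illustrative assumptions.

```python
import math

def integrate(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def solve_folde(p, q, x0, t):
    """x(t) from (15.65) with lower limit 0 and u(0) = 1, so that C = x0."""
    def u(s):
        # Homogeneous solution (15.62) with C = 1.
        return math.exp(-integrate(p, 0.0, s))
    # Variation of constants: k(t) = x0 + integral of q/u from 0 to t.
    k = x0 + integrate(lambda s: q(s) / u(s), 0.0, t)
    return k * u(t)

p = lambda t: 1.0          # coefficient p(t) in (15.60)
q = lambda t: t            # inhomogeneous term q(t)
x0, t_end = 2.0, 1.5

x_numeric = solve_folde(p, q, x0, t_end)
x_exact = t_end - 1.0 + (x0 + 1.0) * math.exp(-t_end)
print(x_numeric, x_exact)
```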

Comparing (15.65) with (10.29), we notice that these two expressions are related. In other words, the method of variation of constants in (15.65) is essentially the same as that using the weight function in (10.29). Another important point to bear in mind is that in Chap. 10 we have solved SOLDEs of constant coefficients using the method of Green’s functions. In that case, we have separately estimated a homogeneous term and inhomogeneous term (i.e., surface term). In the present section we are going to seek a method corresponding to that using the Green’s functions. In the above discussion we see that a system of differential equations with two unknowns can be translated into a SOLDE. Then, we expect such a system of differential equations of constant coefficients to be solved somehow or other. We will hereinafter describe it in detail giving examples.

15.3.2 System of Differential Equations in a Matrix Form: Resolvent Matrix

The equations (15.54) and (15.55) are unified in a single homogeneous matrix equation such that

\[ (\dot{x}(t)\ \ \dot{y}(t)) = (x(t)\ \ y(t)) \begin{pmatrix} a(t) & c(t) \\ b(t) & d(t) \end{pmatrix}. \tag{15.66} \]

In (15.66), x(t) and y(t) and their derivatives represent the functions (of t), and so we describe them as row matrices. For this form of expression, see Sect. 11.2 and, e.g., (11.37) therein. Another expression is a transposition of (15.66) described by

\[ \begin{pmatrix} \dot{x}(t) \\ \dot{y}(t) \end{pmatrix} = \begin{pmatrix} a(t) & b(t) \\ c(t) & d(t) \end{pmatrix} \begin{pmatrix} x(t) \\ y(t) \end{pmatrix}. \tag{15.67} \]

Notice that the coefficient matrix has been transposed in (15.67) accordingly. We are most interested in an equation with constant coefficients and, hence, we rewrite (15.66) once again explicitly as

\[ (\dot{x}(t)\ \ \dot{y}(t)) = (x(t)\ \ y(t)) \begin{pmatrix} a & c \\ b & d \end{pmatrix} \tag{15.68} \]

or

\[ \frac{d}{dt}(x(t)\ \ y(t)) = (x(t)\ \ y(t)) \begin{pmatrix} a & c \\ b & d \end{pmatrix}. \tag{15.69} \]

Putting $F(t) \equiv (x(t)\ \ y(t))$, we have

\[ \frac{dF(t)}{dt} = F(t) \begin{pmatrix} a & c \\ b & d \end{pmatrix}. \tag{15.70} \]

This is exactly the same as (15.50) if we put

\[ A = \begin{pmatrix} a & c \\ b & d \end{pmatrix}. \tag{15.71} \]

Then, we get

\[ F(t) = \exp tA. \tag{15.72} \]

As already shown in (15.7) of Sect. 15.1, an exponential function of a matrix, i.e., exp tA, can be defined as

\[ \exp tA \equiv E + tA + \frac{1}{2!}(tA)^{2} + \cdots + \frac{1}{\nu!}(tA)^{\nu} + \cdots. \tag{15.73} \]

From Property (5) of (15.41), we have $\det(\exp A) \ne 0$. This is because no matter what number (either real or complex) Tr A may take, exp(Tr A) never vanishes. That is, exp A is non-singular, and so an inverse matrix of exp A must exist. This is the case with exp tA as well; see the equation below:

\[ (5)'\quad \det(\exp tA) = \exp \mathrm{Tr}(tA) = \exp[t(\mathrm{Tr}A)]. \tag{15.74} \]

Note that if A (or tA) is an (n, n) matrix, F(t) of (15.72) is an (n, n) matrix as well. Since in that general case F(t) is non-singular, it consists of n linearly independent column vectors, i.e., (n, 1) matrices; see Chap. 11. Then, F(t) is symbolically described as

\[ F(t) = (\mathbf{x}_1(t)\ \ \mathbf{x}_2(t)\ \cdots\ \mathbf{x}_n(t)), \]

where $\mathbf{x}_i(t)$ $(1 \le i \le n)$ represents the i-th column vector. Note that since $F(0) = E$, $\mathbf{x}_i(0)$ has 1 as its i-th row component and 0 elsewhere. In the case of, e.g., (15.66), A is a (2, 2) matrix, which implies that x(t) and y(t) represent two linearly independent column vectors, i.e., (2, 1) matrices. Furthermore, if we look at the constitution of (15.70), x(t) and y(t) represent two linearly independent solutions of (15.70). We will come back to this point later.

On the basis of the method of variation of constants, we deal with inhomogeneous equations and describe below the procedures for solving the system of equations with two unknowns. These procedures can readily be adapted to the general case where equations with n unknowns are dealt with.

Now, we rewrite (15.68) symbolically such that

\[ \dot{\mathbf{x}}(t) = \mathbf{x}(t)A, \tag{15.75} \]

where $\mathbf{x}(t) \equiv (x(t)\ \ y(t))$, $\dot{\mathbf{x}}(t) \equiv (\dot{x}(t)\ \ \dot{y}(t))$, and $A \equiv \begin{pmatrix} a & c \\ b & d \end{pmatrix}$. Then, the inhomogeneous equation can be described as

\[ \dot{\mathbf{x}}(t) = \mathbf{x}(t)A + \mathbf{b}(t), \tag{15.76} \]

where we define the inhomogeneous term as $\mathbf{b} \equiv (p\ \ q)$. This is equivalent to

\[ (\dot{x}(t)\ \ \dot{y}(t)) = (x(t)\ \ y(t)) \begin{pmatrix} a & c \\ b & d \end{pmatrix} + (p\ \ q). \tag{15.77} \]

Using the same solution F(t) that appears in the homogeneous equation, we assume that the solution of the inhomogeneous equation is described by

\[ \mathbf{x}(t) = \mathbf{k}(t)F(t), \tag{15.78} \]

where $\mathbf{k}(t)$ is a variable "constant." Replacing $\mathbf{x}(t)$ in (15.76) with $\mathbf{x}(t) = \mathbf{k}(t)F(t)$ of (15.78), we obtain

\[ \dot{\mathbf{x}} = \dot{\mathbf{k}}F + \mathbf{k}\dot{F} = \dot{\mathbf{k}}F + \mathbf{k}FA = \mathbf{k}FA + \mathbf{b}, \tag{15.79} \]

where with the second equality we used $\dot{F} = FA$ in (15.50). Then, we have

\[ \dot{\mathbf{k}}F = \mathbf{b}. \tag{15.80} \]

This can be integrated so that we may have

\[ \mathbf{k} = \int^{t} ds \left[ \mathbf{b}(s)F(s)^{-1} \right]. \tag{15.81} \]

As previously noted, $F(s)^{-1}$ exists because F(s) is non-singular. Thus, as a solution we get

\[ \mathbf{x}(t) = \mathbf{k}(t)F(t) = \int_{t_0}^{t} ds \left[ \mathbf{b}(s)F(s)^{-1}F(t) \right]. \tag{15.82} \]

Here we define

\[ \mathcal{R}(s, t) \equiv F(s)^{-1}F(t). \tag{15.83} \]

The matrix $\mathcal{R}(s, t)$ is said to be a resolvent matrix [6]. This matrix plays an essential role in solving the system of differential equations, both homogeneous and inhomogeneous, with either homogeneous or inhomogeneous boundary conditions (BCs). Rewriting (15.82), we get

\[ \mathbf{x}(t) = \mathbf{k}(t)F(t) = \int_{t_0}^{t} ds \left[ \mathbf{b}(s)\mathcal{R}(s, t) \right]. \tag{15.84} \]

We summarize principal characteristics of the resolvent matrix.

\[ (1)\quad \mathcal{R}(s, s) = F(s)^{-1}F(s) = E, \tag{15.85} \]

where E is a (2, 2) unit matrix, i.e., $E = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, for an equation having two unknowns.

\[ (2)\quad \mathcal{R}(s, t)^{-1} = \left[ F(s)^{-1}F(t) \right]^{-1} = F(t)^{-1}F(s) = \mathcal{R}(t, s). \tag{15.86} \]

\[ (3)\quad \mathcal{R}(s, t)\mathcal{R}(t, u) = F(s)^{-1}F(t)F(t)^{-1}F(u) = F(s)^{-1}F(u) = \mathcal{R}(s, u). \tag{15.87} \]
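The three characteristics (15.85) through (15.87) follow from the definition (15.83) alone, which a short numerical sketch makes concrete. The coefficient matrix A and the arguments s, t, u below are arbitrary illustrative choices; F(t) = exp tA is summed from a truncated series and inverted with the explicit (2, 2) inverse formula.

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=40):
    """exp A from the power series (15.7), truncated after `terms` terms."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for nu in range(1, terms):
        term = [[x / nu for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def inverse_2x2(X):
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det],
            [-X[1][0] / det, X[0][0] / det]]

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

A = [[0.0, 1.0], [-2.0, -3.0]]        # hypothetical constant coefficients
E = [[1.0, 0.0], [0.0, 1.0]]

def F(t):
    return mat_exp([[t * a for a in row] for row in A])

def R(s, t):
    """Resolvent matrix (15.83): F(s)^{-1} F(t)."""
    return mat_mul(inverse_2x2(F(s)), F(t))

s, t, u = 0.3, 1.1, -0.4
gap1 = max_abs_diff(R(s, s), E)                             # (15.85)
gap2 = max_abs_diff(inverse_2x2(R(s, t)), R(t, s))          # (15.86)
gap3 = max_abs_diff(mat_mul(R(s, t), R(t, u)), R(s, u))     # (15.87)
print(gap1, gap2, gap3)
```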

It is easy to include a term in the solution related to the inhomogeneous BCs. As already seen in (10.33) of Sect. 10.2, a boundary condition for the FOLDEs is set such that, e.g.,

\[ u(a) = \sigma, \tag{15.88} \]

where u(t) is a solution of a FOLDE. In that case, the BC can be translated into the "initial" condition. In the present case, we describe the BCs as

\[ \mathbf{x}(t_0) \equiv \mathbf{x}_0 \equiv (\sigma\ \ \tau), \tag{15.89} \]

where $\sigma \equiv x(t_0)$ and $\tau \equiv y(t_0)$. Then, the said term associated with the inhomogeneous BCs is expected to be written as $\mathbf{x}(t_0)\mathcal{R}(t_0, t)$. In fact, since $\mathbf{x}(t_0)\mathcal{R}(t_0, t_0) = \mathbf{x}(t_0)E = \mathbf{x}(t_0)$, this satisfies the BCs. Then, the full expression of the solution of the inhomogeneous equation that takes account of the BCs is assumed to be expressed as

\[ \mathbf{x}(t) = \mathbf{x}(t_0)\mathcal{R}(t_0, t) + \int_{t_0}^{t} ds \left[ \mathbf{b}(s)\mathcal{R}(s, t) \right]. \tag{15.90} \]

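Let us confirm that (15.90) certainly gives a proper solution of (15.76). First we give a formula on the differentiation of

\[ Q(t) \equiv \frac{d}{dt} \int_{t_0}^{t} P(t, s)\,ds, \tag{15.91} \]

where P(t, s) stands for a (1, 2) matrix whose general form is described by $(q(t, s)\ \ r(t, s))$, in which q(t, s) and r(t, s) are functions of t and s. We rewrite (15.91) as

\[ \begin{aligned} Q(t) &= \lim_{\Delta \to 0} \frac{1}{\Delta} \left[ \int_{t_0}^{t+\Delta} P(t + \Delta, s)\,ds - \int_{t_0}^{t} P(t, s)\,ds \right] \\ &= \lim_{\Delta \to 0} \frac{1}{\Delta} \left[ \int_{t_0}^{t+\Delta} P(t + \Delta, s)\,ds - \int_{t_0}^{t} P(t + \Delta, s)\,ds + \int_{t_0}^{t} P(t + \Delta, s)\,ds - \int_{t_0}^{t} P(t, s)\,ds \right] \\ &= \lim_{\Delta \to 0} \frac{1}{\Delta} \left[ \int_{t}^{t+\Delta} P(t + \Delta, s)\,ds + \int_{t_0}^{t} P(t + \Delta, s)\,ds - \int_{t_0}^{t} P(t, s)\,ds \right] \\ &= \lim_{\Delta \to 0} \frac{1}{\Delta} P(t, t)\Delta + \lim_{\Delta \to 0} \frac{1}{\Delta} \left[ \int_{t_0}^{t} P(t + \Delta, s)\,ds - \int_{t_0}^{t} P(t, s)\,ds \right] \\ &= P(t, t) + \int_{t_0}^{t} \frac{\partial P(t, s)}{\partial t}\,ds. \end{aligned} \tag{15.92} \]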

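Replacing P(t, s) of (15.92) with $\mathbf{b}(s)\mathcal{R}(s, t)$, we have

\[ \frac{d}{dt} \int_{t_0}^{t} \mathbf{b}(s)\mathcal{R}(s, t)\,ds = \mathbf{b}(t)\mathcal{R}(t, t) + \int_{t_0}^{t} \mathbf{b}(s) \frac{\partial \mathcal{R}(s, t)}{\partial t}\,ds = \mathbf{b}(t) + \int_{t_0}^{t} \mathbf{b}(s) \frac{\partial \mathcal{R}(s, t)}{\partial t}\,ds, \tag{15.93} \]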

where with the last equality we used (15.85). Considering (15.83) and (15.93) and differentiating (15.90) with respect to t, we get

\[ \begin{aligned} \dot{\mathbf{x}}(t) &= \mathbf{x}(t_0)F(t_0)^{-1}\dot{F}(t) + \int_{t_0}^{t} ds \left[ \mathbf{b}(s)F(s)^{-1}\dot{F}(t) \right] + \mathbf{b}(t) \\ &= \mathbf{x}(t_0)F(t_0)^{-1}F(t)A + \int_{t_0}^{t} ds \left[ \mathbf{b}(s)F(s)^{-1}F(t) \right]A + \mathbf{b}(t) \\ &= \mathbf{x}(t_0)\mathcal{R}(t_0, t)A + \int_{t_0}^{t} ds \left[ \mathbf{b}(s)\mathcal{R}(s, t) \right]A + \mathbf{b}(t) \\ &= \left\{ \mathbf{x}(t_0)\mathcal{R}(t_0, t) + \int_{t_0}^{t} ds \left[ \mathbf{b}(s)\mathcal{R}(s, t) \right] \right\}A + \mathbf{b}(t) \\ &= \mathbf{x}(t)A + \mathbf{b}(t), \end{aligned} \tag{15.94} \]

where with the second equality we used (15.70) and with the last equality we used (15.90). Thus, we have certainly recovered the original differential equation of (15.76). Consequently, (15.90) is the proper solution for the given inhomogeneous equation with inhomogeneous BCs.

The above discussion and formulation equally apply to the general case of the differential equation with n unknowns, even though the calculation procedures become increasingly complicated with the increasing number of unknowns.

15.3.3 Several Examples

To deepen our understanding of the essence and characteristics of the system of differential equations, we deal with several examples regarding the equations of two unknowns with constant coefficients. The constant matrices A of specific types (e.g., anti-Hermitian matrices and skew-symmetric matrices) have a wide field of applications in mathematical physics, especially in Lie groups and Lie algebras (see Chap. 20). In the subsequent examples, however, we deal with various types of matrices.

Example 15.1 Solve the following equation

\[ \dot{x}(t) = x + 2y + 1, \qquad \dot{y}(t) = 2x + y + 2, \tag{15.95} \]

under the BCs $x(0) = c$ and $y(0) = d$. In a matrix form, (15.95) can be written as

\[ (\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y) \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} + (1\ \ 2). \tag{15.96} \]

The homogeneous equation corresponding to (15.96) is expressed as

\[ (\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y) \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}. \tag{15.97} \]

As discussed in Sect. 15.3.2, this can readily be solved to produce a solution F(t) such that

\[ F(t) = \exp tA, \tag{15.98} \]

where $A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$. Since A is a real symmetric (i.e., Hermitian) matrix, it can be diagonalized by a unitary similarity transformation (see Sect. 14.3). For the purpose of getting an exponential function of a matrix, it is easier to estimate exp tD (where D is a diagonal matrix) than to directly calculate exp tA. That is, we rewrite (15.97) as

\[ (\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\, UU^{\dagger} \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} UU^{\dagger}, \tag{15.99} \]

where U is a unitary matrix; i.e., $U^{\dagger} = U^{-1}$. Following routine calculation procedures (as in, e.g., Example 14.3), we readily get the following diagonal matrix and the corresponding unitary matrix for the diagonalization. That is, we obtain

\[ U = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} \quad \text{and} \quad D = U^{-1} \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} U = U^{-1}AU = \begin{pmatrix} -1 & 0 \\ 0 & 3 \end{pmatrix}. \tag{15.100} \]

Or we have

\[ A = UDU^{-1}. \tag{15.101} \]

Hence, we get

\[ F(t) = \exp tA = \exp\left(tUDU^{-1}\right) = U(\exp tD)U^{-1}, \tag{15.102} \]

where with the last equality we used (15.30). From Theorem 15.3, we have

\[ \exp tD = \begin{pmatrix} e^{-t} & 0 \\ 0 & e^{3t} \end{pmatrix}. \tag{15.103} \]

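In turn, we get

\[ F(t) = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} e^{-t} & 0 \\ 0 & e^{3t} \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} = \frac{1}{2} \begin{pmatrix} e^{-t} + e^{3t} & -e^{-t} + e^{3t} \\ -e^{-t} + e^{3t} & e^{-t} + e^{3t} \end{pmatrix}. \tag{15.104} \]

With an inverse matrix, we have

\[ F(t)^{-1} = \frac{1}{2} \begin{pmatrix} e^{-3t} + e^{t} & e^{-3t} - e^{t} \\ e^{-3t} - e^{t} & e^{-3t} + e^{t} \end{pmatrix}. \tag{15.105} \]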

Therefore, as a resolvent matrix we get

\[ \mathcal{R}(s, t) = F(s)^{-1}F(t) = \frac{1}{2} \begin{pmatrix} e^{-(t-s)} + e^{3(t-s)} & -e^{-(t-s)} + e^{3(t-s)} \\ -e^{-(t-s)} + e^{3(t-s)} & e^{-(t-s)} + e^{3(t-s)} \end{pmatrix}. \tag{15.106} \]

Thus, as a solution of (15.95) we obtain

\[ \mathbf{x}(t) = \mathbf{x}(0)\mathcal{R}(0, t) + \int_{0}^{t} ds \left[ \mathbf{b}(s)\mathcal{R}(s, t) \right], \tag{15.107} \]

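where $\mathbf{x}(0) = (x(0)\ \ y(0)) = (c\ \ d)$ represents the BCs and $\mathbf{b}(s) = (1\ \ 2)$ comes from (15.96). This can easily be calculated so that we can have

\[ \mathbf{x}(0)\mathcal{R}(0, t) = \left( \tfrac{1}{2}\left[ (c - d)e^{-t} + (c + d)e^{3t} \right] \quad \tfrac{1}{2}\left[ (d - c)e^{-t} + (c + d)e^{3t} \right] \right) \tag{15.108} \]

and

\[ \int_{0}^{t} ds \left[ \mathbf{b}(s)\mathcal{R}(s, t) \right] = \left( \tfrac{1}{2}\left( e^{-t} + e^{3t} \right) - 1 \quad -\tfrac{1}{2}\left( e^{-t} - e^{3t} \right) \right). \tag{15.109} \]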

Note that the first component of (15.108) and (15.109) represents the x component of the solution for (15.95) and that the second component represents the y component. From (15.108), we have

\[ \mathbf{x}(0)\mathcal{R}(0, 0) = \mathbf{x}(0)E = \mathbf{x}(0) = (c\ \ d). \tag{15.110} \]

Thus, we find that the BCs are certainly satisfied. Putting $t = 0$ in (15.109), we have

\[ \int_{0}^{0} ds \left[ \mathbf{b}(s)\mathcal{R}(s, 0) \right] = 0, \tag{15.111} \]

confirming that the second term of (15.107) vanishes at $t = 0$. The summation of (15.108) and (15.109) gives an overall solution of (15.107). Separating individual components of the solution, we describe them as

\[ x(t) = \tfrac{1}{2}\left[ (c - d + 1)e^{-t} + (c + d + 1)e^{3t} \right] - 1, \qquad y(t) = \tfrac{1}{2}\left[ (d - c - 1)e^{-t} + (c + d + 1)e^{3t} \right]. \tag{15.112} \]
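The results of Example 15.1 can be cross-checked numerically. The sketch below compares the closed form (15.104) with a truncated power series for exp tA, and then inserts the solution (15.112) into (15.95) using a central finite difference. The particular values of c, d, t, and the step h are illustrative assumptions.

```python
import math

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=40):
    """exp A from the power series (15.7), truncated after `terms` terms."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for nu in range(1, terms):
        term = [[x / nu for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

A = [[1.0, 2.0], [2.0, 1.0]]

def F_closed(t):
    """Closed form (15.104) of F(t) = exp tA."""
    p, m = math.exp(3 * t), math.exp(-t)
    return [[0.5 * (m + p), 0.5 * (p - m)], [0.5 * (p - m), 0.5 * (m + p)]]

def solution(t, c, d):
    """Components x(t), y(t) of (15.112)."""
    p, m = math.exp(3 * t), math.exp(-t)
    x = 0.5 * ((c - d + 1) * m + (c + d + 1) * p) - 1.0
    y = 0.5 * ((d - c - 1) * m + (c + d + 1) * p)
    return x, y

c, d, t, h = 1.0, 2.0, 0.5, 1e-5
series_gap = max_abs_diff(mat_exp([[t * a for a in row] for row in A]),
                          F_closed(t))

x, y = solution(t, c, d)
(xp, yp), (xm, ym) = solution(t + h, c, d), solution(t - h, c, d)
residual_x = (xp - xm) / (2 * h) - (x + 2 * y + 1)   # first equation of (15.95)
residual_y = (yp - ym) / (2 * h) - (2 * x + y + 2)   # second equation of (15.95)
print(series_gap, residual_x, residual_y, solution(0.0, c, d))
```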

Moreover, differentiating (15.112) with respect to t, we recover the original form of (15.95). The confirmation is left for readers.

Remarks on Example 15.1: We become aware of several points in Example 15.1.

(i) The resolvent matrix given by (15.106) can be obtained by replacing t of (15.104) with $t - s$. This is one of the important characteristics of the resolvent matrix that is derived from an exponential function of a constant matrix. To see it, note that if A is a constant matrix, tA and $-sA$ commute. By the definition (15.7) of the exponential function we have

\[ \exp tA = E + tA + \frac{1}{2!}t^{2}A^{2} + \cdots + \frac{1}{\nu!}t^{\nu}A^{\nu} + \cdots, \tag{15.113} \]

\[ \exp(-sA) = E + (-s)A + \frac{1}{2!}(-s)^{2}A^{2} + \cdots + \frac{1}{\nu!}(-s)^{\nu}A^{\nu} + \cdots. \tag{15.114} \]

Both (15.113) and (15.114) are power series in a constant matrix A and, hence, exp tA and exp(−sA) commute as well. Then, from Theorem 15.2 and Property (1) of (15.28), we get

\[ \exp(tA - sA) = (\exp tA)[\exp(-sA)] = [\exp(-sA)](\exp tA) = \exp(-sA + tA) = \exp[(t - s)A]. \tag{15.115} \]

That is, using (15.28), (15.72), and (15.83), we get

\[ \mathcal{R}(s, t) = F(s)^{-1}F(t) = \exp(sA)^{-1}\exp(tA) = \exp(-sA)\exp(tA) = \exp[(t - s)A] = F(t - s). \tag{15.116} \]

This implies that once we get exp tA, we can safely obtain the resolvent matrix by automatically replacing t of (15.72) with $t - s$. Also, exp tA and exp(−sA) are commutative. In the present case, therefore, by exchanging the order of products of $F(s)^{-1}$ and F(t) in (15.106), we have the same result as (15.106) such that

\[ \mathcal{R}(s, t) = F(t)F(s)^{-1} = \frac{1}{2} \begin{pmatrix} e^{-(t-s)} + e^{3(t-s)} & -e^{-(t-s)} + e^{3(t-s)} \\ -e^{-(t-s)} + e^{3(t-s)} & e^{-(t-s)} + e^{3(t-s)} \end{pmatrix}. \tag{15.117} \]

(ii) The resolvent matrix of the present example is real symmetric. This is because if A is real symmetric, i.e., $A^{T} = A$, we have

\[ \left(A^{n}\right)^{T} = \left(A^{T}\right)^{n} = A^{n}, \tag{15.118} \]

that is, $A^{n}$ is real symmetric as well. From (15.7), exp A is real symmetric accordingly. In a similar manner, if A is Hermitian, exp A is also Hermitian.

(iii) Let us think of a case where A is not a constant matrix but varies as a function of t. In that case, we have

\[ A(t) = \begin{pmatrix} a(t) & b(t) \\ c(t) & d(t) \end{pmatrix}. \tag{15.119} \]

Also, we have

\[ A(s) = \begin{pmatrix} a(s) & b(s) \\ c(s) & d(s) \end{pmatrix}. \tag{15.120} \]

We can easily check that in a general case

\[ A(t)A(s) \ne A(s)A(t), \tag{15.121} \]

namely, A(t) and A(s) are not generally commutative. The matrix tA(t) does not commute with sA(s), either. In turn, (15.115) does not hold either. Thus, we need a more elaborate treatment for this case.

Example 15.2 Find the resolvent matrix of the following homogeneous equation:

\[ \dot{x}(t) = 0, \qquad \dot{y}(t) = x + y. \tag{15.122} \]

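In a matrix form, (15.122) can be written as

\[ (\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y) \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}. \tag{15.123} \]

The matrix $A = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}$ is characterized as an idempotent matrix (see Sect. 12.4). As in the case of Example 15.1, we get

\[ (\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y)\, PP^{-1} \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} PP^{-1}, \tag{15.124} \]

where we choose $P = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ and, hence, $P^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}$. Notice that in this example, since A is not a symmetric matrix, we do not use a unitary matrix for the similarity transformation. As a diagonal matrix D, we have

\[ D = P^{-1} \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} P = P^{-1}AP = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}. \tag{15.125} \]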

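Or we have

\[ A = PDP^{-1}. \tag{15.126} \]

Hence, we get

\[ F(t) = \exp tA = \exp\left(tPDP^{-1}\right) = P(\exp tD)P^{-1}, \tag{15.127} \]

where we used (15.30) again. From Theorem 15.3, we have

\[ \exp tD = \begin{pmatrix} 1 & 0 \\ 0 & e^{t} \end{pmatrix}. \tag{15.128} \]

In turn, we get

\[ F(t) = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & e^{t} \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & -1 + e^{t} \\ 0 & e^{t} \end{pmatrix}. \tag{15.129} \]

With an inverse matrix, we have

\[ F(t)^{-1} = \begin{pmatrix} 1 & e^{-t} - 1 \\ 0 & e^{-t} \end{pmatrix}. \tag{15.130} \]

Therefore, as a resolvent matrix we get

\[ \mathcal{R}(s, t) = F(s)^{-1}F(t) = \begin{pmatrix} 1 & -1 + e^{t-s} \\ 0 & e^{t-s} \end{pmatrix}. \tag{15.131} \]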

Using the resolvent matrix, we can include the inhomogeneous term along with BCs (or initial conditions). Once again, by exchanging the order of products of $F(s)^{-1}$ and F(t) in (15.131), we have the same resolvent matrix. This can readily be checked. Moreover, we get $\mathcal{R}(s, t) = F(t - s)$.

Example 15.3 Solve the following equation

\[ \dot{x}(t) = x + c, \qquad \dot{y}(t) = x + y + d, \tag{15.132} \]

under the BCs $x(0) = \sigma$ and $y(0) = \tau$. In a matrix form, (15.132) can be written as

\[ (\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y) \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} + (c\ \ d). \tag{15.133} \]

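The matrix of the type $M = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ has been fully investigated in Sect. 12.6 and characterized as a matrix that cannot be diagonalized. In such a case, we directly calculate exp tM. Here, remember that a product of triangle matrices of the same kind (i.e., upper triangle matrices or lower triangle matrices; see Sect. 12.1) is a triangle matrix of the same kind. We have

\[ M^{2} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, \qquad M^{3} = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix}. \]

Repeating this, we get

\[ M^{n} = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix}. \tag{15.134} \]

Thus, we have

\[ \begin{aligned} \exp tM = \mathcal{R}(0, t) &= E + tM + \frac{1}{2!}t^{2}M^{2} + \cdots + \frac{1}{\nu!}t^{\nu}M^{\nu} + \cdots \\ &= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + t \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} + \frac{1}{2!}t^{2} \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} + \cdots + \frac{1}{\nu!}t^{\nu} \begin{pmatrix} 1 & \nu \\ 0 & 1 \end{pmatrix} + \cdots \\ &= \begin{pmatrix} e^{t} & t\left[ 1 + t + \frac{1}{2!}t^{2} + \cdots + \frac{1}{(\nu-1)!}t^{\nu-1} + \cdots \right] \\ 0 & e^{t} \end{pmatrix} = \begin{pmatrix} e^{t} & te^{t} \\ 0 & e^{t} \end{pmatrix}. \end{aligned} \tag{15.135} \]

With the resolvent matrix we get

\[ \mathcal{R}(s, t) = \exp(-sM)\exp tM = \begin{pmatrix} e^{t-s} & (t - s)e^{t-s} \\ 0 & e^{t-s} \end{pmatrix}. \tag{15.136} \]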

Using (15.107), we get

\[ \mathbf{x}(t) = (\sigma\ \ \tau)\mathcal{R}(0, t) + \int_{0}^{t} ds \left[ (c\ \ d)\mathcal{R}(s, t) \right]. \tag{15.137} \]

Thus, we obtain the solution described by

\[ x(t) = (c + \sigma)e^{t} - c, \qquad y(t) = (c + \sigma)te^{t} + (d - c + \tau)e^{t} + c - d. \tag{15.138} \]

From (15.138), we find that the original inhomogeneous equations (15.132) and the BCs $x(0) = \sigma$ and $y(0) = \tau$ have been recovered.

Example 15.4 Find the resolvent matrix of the following homogeneous equation:

\[ \dot{x}(t) = 0, \qquad \dot{y}(t) = x. \tag{15.139} \]

In a matrix form, (15.139) can be written as

\[ (\dot{x}(t)\ \ \dot{y}(t)) = (x\ \ y) \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}. \tag{15.140} \]

The matrix $N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ is characterized as a nilpotent matrix (Sect. 12.3). Since $N^{2}$ and higher powers of N vanish, we have

\[ \exp tN = E + tN = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}. \tag{15.141} \]

We have a corresponding resolvent matrix such that

\[ \mathcal{R}(s, t) = \begin{pmatrix} 1 & t - s \\ 0 & 1 \end{pmatrix}. \tag{15.142} \]

Using the resolvent matrix, we can include the inhomogeneous term along with BCs.

Example 15.5 As a trivial case, let us consider the following homogeneous equation:

\[ (\dot{x}(t) \ \ \dot{y}(t)) = (x \ \ y) \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}, \tag{15.143} \]

where one of a and b can be zero or both of them can be zero. Even though (15.143) merely juxtaposes two FOLDEs, we can deal with such cases in an automatic manner. From Theorem 15.3, as eigenvalues of exp A, where $A = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$, we have $e^a$ and $e^b$. That is, we have

\[ F(t) = \exp tA = \begin{pmatrix} e^{at} & 0 \\ 0 & e^{bt} \end{pmatrix}. \tag{15.144} \]

In particular, if a = b (including a = b = 0), we merely have the same FOLDE twice, with the argument x exchanged for y. Readers may well wonder why we discuss such a trivial case at all. The reason is that the theory we developed in this chapter holds widely, so that we are able to make a clear forecast about the structure of the solution of first-order differential equations. For instance, (15.144) immediately tells us that a fundamental solution for

\[ \dot{x}(t) = a x(t) \tag{15.145} \]

is $e^{at}$. We only have to perform routine calculations on the component x(t) [or y(t)] of (x(t) y(t)) using (15.107) to solve an inhomogeneous differential equation under BCs.

Returning to Example 15.1, we had

\[ (\dot{x}(t) \ \ \dot{y}(t)) = (x \ \ y) \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} + (1 \ \ 2). \tag{15.96} \]

This equation can be converted to

\[ (\dot{x}(t) \ \ \dot{y}(t)) U = (x \ \ y) U U^{-1} \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} U + (1 \ \ 2) U, \tag{15.146} \]

where $U = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}$. Defining (X Y) as

\[ (X \ \ Y) \equiv (x \ \ y) U = (x \ \ y) \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}, \tag{15.147} \]

we have

\[ (\dot{X}(t) \ \ \dot{Y}(t)) = (X \ \ Y) \begin{pmatrix} -1 & 0 \\ 0 & 3 \end{pmatrix} + (1 \ \ 2) U. \tag{15.148} \]

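Then, according to (15.90), we get a solution

\[ \boldsymbol{X}(t) = \boldsymbol{X}(t_0) \widetilde{\mathfrak{R}}(t_0, t) + \int_{t_0}^{t} ds \left[ \tilde{\boldsymbol{b}}(s) \widetilde{\mathfrak{R}}(s, t) \right], \tag{15.149} \]

where $\boldsymbol{X}(t_0) = (X(t_0) \ \ Y(t_0))$ and $\tilde{\boldsymbol{b}}(s) = \boldsymbol{b}(s) U$. The resolvent matrix $\widetilde{\mathfrak{R}}(s, t)$ is given by

\[ \widetilde{\mathfrak{R}}(s, t) = \begin{pmatrix} e^{-(t - s)} & 0 \\ 0 & e^{3(t - s)} \end{pmatrix}. \tag{15.150} \]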

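To convert (X Y) to (x y), operating $U^{-1}$ on both sides of (15.149) from the right, we obtain

\[ \boldsymbol{X}(t) U^{-1} = \boldsymbol{X}(t_0) U^{-1} U \widetilde{\mathfrak{R}}(t_0, t) U^{-1} + \int_{t_0}^{t} ds \left[ \tilde{\boldsymbol{b}}(s) U^{-1} U \widetilde{\mathfrak{R}}(s, t) \right] U^{-1}, \tag{15.151} \]

where in the first and second terms we insert $U^{-1} U = E$. Then, we get

\[ \boldsymbol{x}(t) = \boldsymbol{x}(t_0) \left[ U \widetilde{\mathfrak{R}}(t_0, t) U^{-1} \right] + \int_{t_0}^{t} ds \, \boldsymbol{b}(s) \left[ U \widetilde{\mathfrak{R}}(s, t) U^{-1} \right]. \tag{15.152} \]

We calculate $U \widetilde{\mathfrak{R}}(s, t) U^{-1}$ to have

\[
\begin{aligned}
U \widetilde{\mathfrak{R}}(s, t) U^{-1} &= \frac{1}{2} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} e^{-(t - s)} & 0 \\ 0 & e^{3(t - s)} \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \\
&= \frac{1}{2} \begin{pmatrix} e^{-(t - s)} + e^{3(t - s)} & -e^{-(t - s)} + e^{3(t - s)} \\ -e^{-(t - s)} + e^{3(t - s)} & e^{-(t - s)} + e^{3(t - s)} \end{pmatrix} = \mathfrak{R}(s, t).
\end{aligned}
\tag{15.153}
\]

Thus, we recover (15.90), i.e., we have

\[ \boldsymbol{x}(t) = \boldsymbol{x}(t_0) \mathfrak{R}(t_0, t) + \int_{t_0}^{t} ds \, [\boldsymbol{b}(s) \mathfrak{R}(s, t)]. \tag{15.90} \]

The equation (15.153) shows that $\mathfrak{R}(s, t)$ and $\widetilde{\mathfrak{R}}(s, t)$ are connected to each other through the unitary similarity transformation such that

\[ \widetilde{\mathfrak{R}}(s, t) = U^{-1} \mathfrak{R}(s, t) U = U^{\dagger} \mathfrak{R}(s, t) U. \tag{15.154} \]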
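The similarity transformation above can be checked numerically. The sketch below is illustrative only: it assumes U has the eigenvector for the eigenvalue −1 in its first column and that for 3 in its second (the sign placement inside U is an assumption), and it tests the defining property dℜ(s, t)/dt = ℜ(s, t)A with a central finite difference. All function names are hypothetical.

```python
import math

A = [[1.0, 2.0], [2.0, 1.0]]                  # matrix of Example 15.1
r = 1.0 / math.sqrt(2.0)
U = [[r, r], [-r, r]]                         # assumed ordering: eigenvalue -1 first, then 3
U_DAGGER = [[r, -r], [r, r]]                  # transpose of the real orthogonal U

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def resolvent_diag(s, t):
    """Diagonal resolvent of (15.150)."""
    return [[math.exp(-(t - s)), 0.0], [0.0, math.exp(3.0 * (t - s))]]

def resolvent(s, t):
    """U R~(s, t) U^{-1}, the left-hand side of (15.153)."""
    return matmul(matmul(U, resolvent_diag(s, t)), U_DAGGER)

def closed_form(s, t):
    """Right-hand side of (15.153)."""
    em, ep = math.exp(-(t - s)), math.exp(3.0 * (t - s))
    return [[(em + ep) / 2, (-em + ep) / 2], [(-em + ep) / 2, (em + ep) / 2]]

def time_derivative(s, t, h=1e-6):
    """Central-difference approximation of dR(s, t)/dt."""
    rp, rm = resolvent(s, t + h), resolvent(s, t - h)
    return [[(rp[i][j] - rm[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

def max_abs_diff(a, b):
    """Largest entrywise deviation between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

s, t = 0.2, 0.6
print(max_abs_diff(resolvent(s, t), closed_form(s, t)))                  # round-off level
print(max_abs_diff(time_derivative(s, t), matmul(resolvent(s, t), A)))   # small finite-difference error
```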


Example 15.6 Let us consider the following system of differential equations:

\[ (\dot{x}(t) \ \ \dot{y}(t)) = (x \ \ y) \begin{pmatrix} 0 & -\omega \\ \omega & 0 \end{pmatrix} + (a \ \ 0). \tag{15.155} \]

The matrix $\begin{pmatrix} 0 & -\omega \\ \omega & 0 \end{pmatrix}$ is characterized as an anti-Hermitian operator. This type of operator frequently appears in Lie groups and Lie algebras that we will deal with in Chap. 20. Here we define an operator D as

\[ D = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \quad \text{or} \quad \omega D = \begin{pmatrix} 0 & -\omega \\ \omega & 0 \end{pmatrix}. \tag{15.156} \]

The calculation procedures will be seen in Chap. 20, and so we only show the result here. That is, we have

\[ \exp t\omega D = \begin{pmatrix} \cos \omega t & -\sin \omega t \\ \sin \omega t & \cos \omega t \end{pmatrix}. \tag{15.157} \]

We thus obtain the resolvent matrix $\mathfrak{R}(s, t)$ such that

\[ \mathfrak{R}(s, t) = \begin{pmatrix} \cos \omega (t - s) & -\sin \omega (t - s) \\ \sin \omega (t - s) & \cos \omega (t - s) \end{pmatrix}. \tag{15.158} \]

The implication of (15.158) is simple; the composition of a rotation by ωt and the inverse rotation by ωs is a rotation by ω(t − s). Physically, (15.155) represents the motion of a point mass under an external field. We routinely obtain the solution using (15.90). If, for example, we solve (15.155) under the BCs for which the point mass is placed at rest at the origin of the xy-plane at t = 0, we get a solution described by

\[ x = \frac{a}{\omega} \sin \omega t, \qquad y = \frac{a}{\omega} (\cos \omega t - 1). \tag{15.159} \]

Eliminating t from (15.159), we get

\[ x^2 + \left( y + \frac{a}{\omega} \right)^2 = \left( \frac{a}{\omega} \right)^2. \tag{15.160} \]
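The route from (15.90) to (15.159) can be retraced numerically. The following sketch evaluates the integral in (15.90) by Simpson's rule with x(0) = (0 0) and b(s) = (a 0); the numerical values of ω and a and the function names are hypothetical choices made only for illustration.

```python
import math

OMEGA, A0 = 2.0, 3.0            # hypothetical values of omega and a

def resolvent(s, t):
    """R(s, t) of (15.158): a rotation by omega * (t - s)."""
    c, sn = math.cos(OMEGA * (t - s)), math.sin(OMEGA * (t - s))
    return [[c, -sn], [sn, c]]

def solve_at_rest(t, steps=2000):
    """Evaluate (15.90) for x(0) = (0 0), b(s) = (a 0) by Simpson's rule (steps must be even)."""
    h = t / steps
    x = y = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        r = resolvent(i * h, t)
        # The row vector (a 0) times R(s, t) picks out a times the first row of R(s, t).
        x += weight * A0 * r[0][0]
        y += weight * A0 * r[0][1]
    return x * h / 3.0, y * h / 3.0

def closed_form(t):
    """Solution (15.159)."""
    return (A0 / OMEGA) * math.sin(OMEGA * t), (A0 / OMEGA) * (math.cos(OMEGA * t) - 1.0)

t = 1.3
x, y = solve_at_rest(t)
print(x - closed_form(t)[0], y - closed_form(t)[1])          # both close to zero
print(x**2 + (y + A0 / OMEGA)**2 - (A0 / OMEGA)**2)          # circle relation (15.160)
```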

The relation (15.160) shows that the point mass performs a circular motion as shown in Fig. 15.2. In Example 15.1 we mentioned that if the matrix A that defines the system of differential equations varies as a function of t, then the solution method based upon exp tA does not work. Yet, there is an important case where A is not a constant matrix and the system can still be solved. Let us consider the next illustration.

Fig. 15.2 Circular motion of a point mass

15.4 Motion of a Charged Particle in Polarized Electromagnetic Wave

In Sect. 4.5 we dealt with the motion of an electron under circularly polarized light. Here we consider the motion of a charged particle in a linearly polarized electromagnetic wave. Suppose that a charged particle is placed in vacuum. Suppose also that an electromagnetic wave (light) linearly polarized along the x-direction propagates in the positive direction of the z-axis. Unlike the case of Sect. 4.5, we have to take account of the influence of both the electric field and magnetic field components of the Lorentz force. From (7.58) we have

\[ \boldsymbol{E} = \boldsymbol{E}_0 e^{i(\boldsymbol{k} \cdot \boldsymbol{x} - \omega t)} = E_0 \boldsymbol{e}_1 e^{i(kz - \omega t)}, \tag{15.161} \]

where $\boldsymbol{e}_1$, $\boldsymbol{e}_2$, and $\boldsymbol{e}_3$ are the unit vectors of the x-, y-, and z-directions. (The latter two unit vectors appear just below.) If the extent of motion of the charged particle around the origin is narrow enough compared to the wavelength of the electromagnetic field, we can ignore kz in the exponent (see Sect. 4.5). Then, we have

\[ \boldsymbol{E} \approx E_0 \boldsymbol{e}_1 e^{-i \omega t}. \tag{15.162} \]

Taking the real part of (15.162), we get

\[ \boldsymbol{E} = E_0 \boldsymbol{e}_1 \cos \omega t. \tag{15.163} \]

Meanwhile, from (7.59) we have

\[ \boldsymbol{H} = \boldsymbol{H}_0 e^{i(kz - \omega t)} \approx \boldsymbol{H}_0 \cos \omega t. \tag{15.164} \]

From (7.60) furthermore, we get

\[ \boldsymbol{H}_0 = \boldsymbol{e}_3 \times \boldsymbol{E}_0 / \sqrt{\mu_0 / \varepsilon_0} = \boldsymbol{e}_3 \times E_0 \boldsymbol{e}_1 / \mu_0 c = E_0 \boldsymbol{e}_2 / \mu_0 c. \tag{15.165} \]

Thus, as the magnetic flux density we have

\[ \boldsymbol{B} = \mu_0 \boldsymbol{H} \approx \mu_0 \boldsymbol{H}_0 \cos \omega t = E_0 \boldsymbol{e}_2 \cos \omega t / c. \tag{15.166} \]

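The Lorentz force F exerted on the charged particle is then described by

\[ \boldsymbol{F} = e \boldsymbol{E} + e \dot{\boldsymbol{x}} \times \boldsymbol{B}, \tag{15.167} \]

where $\boldsymbol{x}$ $(= x \boldsymbol{e}_1 + y \boldsymbol{e}_2 + z \boldsymbol{e}_3)$ is a position vector of the charged particle having a charge e. In (15.167) the first term represents the electric Lorentz force and the second term represents the magnetic Lorentz force; see Sect. 4.5. Replacing E and B in (15.167) with those in (15.163) and (15.166), we get

\[
\begin{aligned}
\boldsymbol{F} &= \left( 1 - \frac{1}{c} \dot{z} \right) e E_0 \boldsymbol{e}_1 \cos \omega t + \frac{e E_0}{c} \dot{x} \boldsymbol{e}_3 \cos \omega t \\
&= m \ddot{\boldsymbol{x}} = m \ddot{x} \boldsymbol{e}_1 + m \ddot{y} \boldsymbol{e}_2 + m \ddot{z} \boldsymbol{e}_3,
\end{aligned}
\tag{15.168}
\]

where m is the mass of the charged particle. Comparing the individual vector components, we have

\[ m \ddot{x} = \left( 1 - \frac{1}{c} \dot{z} \right) e E_0 \cos \omega t, \qquad m \ddot{y} = 0, \qquad m \ddot{z} = \frac{e E_0}{c} \dot{x} \cos \omega t. \tag{15.169} \]

Note that the Lorentz force is not exerted in the direction of the y-axis. Putting $a \equiv e E_0 / m$ and $b \equiv e E_0 / mc$, we get

\[ \ddot{x} = a \cos \omega t - b \dot{z} \cos \omega t, \qquad \ddot{z} = b \dot{x} \cos \omega t. \tag{15.170} \]

Further putting $\dot{x} = \xi$ and $\dot{z} = \zeta$, we obtain

\[ \dot{\xi} = -b \zeta \cos \omega t + a \cos \omega t, \qquad \dot{\zeta} = b \xi \cos \omega t. \tag{15.171} \]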

Writing (15.171) in a matrix form, we obtain the following system of inhomogeneous differential equations:

\[ (\dot{\xi}(t) \ \ \dot{\zeta}(t)) = (\xi \ \ \zeta) \begin{pmatrix} 0 & b \cos \omega t \\ -b \cos \omega t & 0 \end{pmatrix} + (a \cos \omega t \ \ 0), \tag{15.172} \]

where the inhomogeneous term is given by (a cos ωt 0). First, we think of a homogeneous equation described by

\[ (\dot{\xi}(t) \ \ \dot{\zeta}(t)) = (\xi \ \ \zeta) \begin{pmatrix} 0 & b \cos \omega t \\ -b \cos \omega t & 0 \end{pmatrix}. \tag{15.173} \]

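Meanwhile, we consider the following equation:

\[
\begin{aligned}
\frac{d}{dt} \begin{pmatrix} \cos f(t) & \sin f(t) \\ -\sin f(t) & \cos f(t) \end{pmatrix} &= \begin{pmatrix} -\dot{f}(t) \sin f(t) & \dot{f}(t) \cos f(t) \\ -\dot{f}(t) \cos f(t) & -\dot{f}(t) \sin f(t) \end{pmatrix} \\
&= \begin{pmatrix} \cos f(t) & \sin f(t) \\ -\sin f(t) & \cos f(t) \end{pmatrix} \begin{pmatrix} 0 & \dot{f}(t) \\ -\dot{f}(t) & 0 \end{pmatrix}.
\end{aligned}
\tag{15.174}
\]

Closely inspecting (15.173) and (15.174), we find that if we can determine f(t) so that it satisfies

\[ \dot{f}(t) = b \cos \omega t, \tag{15.175} \]

we might well be able to solve (15.172). In that case, moreover, we expect that the two linearly independent vectors, written here as column vectors with components (ξ, ζ),

\[ \begin{pmatrix} \cos f(t) \\ \sin f(t) \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} -\sin f(t) \\ \cos f(t) \end{pmatrix} \tag{15.176} \]

give a fundamental set of solutions of the homogeneous equation (15.173) associated with (15.172). A function f(t) that satisfies (15.175) can be given by

\[ f(t) = (b / \omega) \sin \omega t = \tilde{b} \sin \omega t, \tag{15.177} \]

where $\tilde{b} \equiv b / \omega$. Notice that at t = 0 the two column vectors of (15.176) give

\[ \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \]

if we choose $f(t) = \tilde{b} \sin \omega t$ for f(t) in (15.176).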

Thus, defining F(t) as

\[ F(t) \equiv \begin{pmatrix} \cos \left( \tilde{b} \sin \omega t \right) & \sin \left( \tilde{b} \sin \omega t \right) \\ -\sin \left( \tilde{b} \sin \omega t \right) & \cos \left( \tilde{b} \sin \omega t \right) \end{pmatrix}, \tag{15.178} \]

and taking account of (15.174), we recast the homogeneous equation (15.173) as

\[ \frac{dF(t)}{dt} = F(t) A, \qquad A \equiv \begin{pmatrix} 0 & b \cos \omega t \\ -b \cos \omega t & 0 \end{pmatrix}. \tag{15.179} \]

Equation (15.179) is formally the same as (15.47), even though F(t) is not given in the form of exp tA. This enables us to apply the general scheme mentioned in Sect. 15.3.2 to the present case. That is, we are going to address the given problem using the method of variation of constants such that

\[ \boldsymbol{x}(t) = \boldsymbol{k}(t) F(t), \tag{15.78} \]

where we define $\boldsymbol{x}(t) \equiv (\xi(t) \ \ \zeta(t))$ and $\boldsymbol{k}(t) \equiv (u(t) \ \ v(t))$ as the variable constants. Hence, using (15.80)–(15.90) we should be able to find the solution of (15.172) described by

\[ \boldsymbol{x}(t) = \boldsymbol{x}(t_0) \mathfrak{R}(t_0, t) + \int_{t_0}^{t} ds \, [\boldsymbol{b}(s) \mathfrak{R}(s, t)], \tag{15.90} \]

where $\boldsymbol{b}(s) \equiv (a \cos \omega s \ \ 0)$, i.e., the inhomogeneous term in (15.172) evaluated at s. The resolvent matrix $\mathfrak{R}(s, t)$ is expressed as

\[ \mathfrak{R}(s, t) = F(s)^{-1} F(t) = \begin{pmatrix} \cos [f(t) - f(s)] & \sin [f(t) - f(s)] \\ -\sin [f(t) - f(s)] & \cos [f(t) - f(s)] \end{pmatrix}, \tag{15.180} \]

where f(t) is given by (15.177). For $F(s)^{-1}$, we simply take the transpose of (15.178) with t replaced by s, because F(s) is an orthogonal matrix. Thus, we obtain

\[ F(s)^{-1} = \begin{pmatrix} \cos \left( \tilde{b} \sin \omega s \right) & -\sin \left( \tilde{b} \sin \omega s \right) \\ \sin \left( \tilde{b} \sin \omega s \right) & \cos \left( \tilde{b} \sin \omega s \right) \end{pmatrix}. \]

The resolvent matrix (15.180) possesses the properties described as (15.85)–(15.87). These can readily be checked. Also, we notice that F(t) of (15.178) represents a rotation matrix in ℝ²; see Sect. 11.2. Therefore, F(t) and $F(s)^{-1}$ are commutative. Notice, however, that we do not have a resolvent matrix in the simple form that appears in (15.116). In fact, from (15.178) we find $\mathfrak{R}(s, t) \neq F(t - s)$. This is because the matrix A given in (15.179) is not a constant but depends on t. Rewriting (15.90) as

\[ \boldsymbol{x}(t) = \boldsymbol{x}(0) \mathfrak{R}(0, t) + \int_0^t ds \, [(a \cos \omega s \ \ 0) \mathfrak{R}(s, t)] \tag{15.181} \]

and setting BCs as $\boldsymbol{x}(0) = (\sigma \ \ \tau)$, for the individual components we finally obtain

\[
\begin{aligned}
\xi(t) &= \sigma \cos \left( \tilde{b} \sin \omega t \right) - \tau \sin \left( \tilde{b} \sin \omega t \right) + \frac{a}{b} \sin \left( \tilde{b} \sin \omega t \right), \\
\zeta(t) &= \sigma \sin \left( \tilde{b} \sin \omega t \right) + \tau \cos \left( \tilde{b} \sin \omega t \right) - \frac{a}{b} \cos \left( \tilde{b} \sin \omega t \right) + \frac{a}{b}.
\end{aligned}
\tag{15.182}
\]

Note that differentiating both sides of (15.182) with respect to t, we recover the original equation (15.172) to be solved. Further setting ξ(0) = σ = 0 and ζ(0) = τ = 0 as BCs, we have

\[ \xi(t) = \frac{a}{b} \sin \left( \tilde{b} \sin \omega t \right), \qquad \zeta(t) = \frac{a}{b} \left[ 1 - \cos \left( \tilde{b} \sin \omega t \right) \right]. \tag{15.183} \]

Notice that the above BCs mean that the charged particle is initially placed at the origin at rest (i.e., with zero velocity). Eliminating the argument t, we get

\[ \xi^2 + \left( \zeta - \frac{a}{b} \right)^2 = \left( \frac{a}{b} \right)^2. \tag{15.184} \]
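As a cross-check that does not rely on the resolvent matrix, (15.171) can be integrated directly. The sketch below uses a classical fourth-order Runge-Kutta step (a technique not used in the text) and compares the result with (15.183) and (15.184); the numerical values of a, b, and ω and the function names are hypothetical.

```python
import math

A_COEF, B_COEF, OMEGA = 1.5, 0.8, 2.0     # hypothetical values of a, b, and omega

def rhs(t, xi, zeta):
    """Right-hand side of (15.171)."""
    c = math.cos(OMEGA * t)
    return (-B_COEF * zeta * c + A_COEF * c, B_COEF * xi * c)

def rk4(t_end, steps=4000):
    """Integrate (15.171) from rest, xi(0) = zeta(0) = 0, with the classical RK4 scheme."""
    h = t_end / steps
    t, xi, zeta = 0.0, 0.0, 0.0
    for _ in range(steps):
        k1 = rhs(t, xi, zeta)
        k2 = rhs(t + h / 2, xi + h / 2 * k1[0], zeta + h / 2 * k1[1])
        k3 = rhs(t + h / 2, xi + h / 2 * k2[0], zeta + h / 2 * k2[1])
        k4 = rhs(t + h, xi + h * k3[0], zeta + h * k3[1])
        xi += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        zeta += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return xi, zeta

def closed_form(t):
    """Velocities (15.183)."""
    f = (B_COEF / OMEGA) * math.sin(OMEGA * t)
    return (A_COEF / B_COEF) * math.sin(f), (A_COEF / B_COEF) * (1.0 - math.cos(f))

xi, zeta = rk4(5.0)
print(xi - closed_form(5.0)[0], zeta - closed_form(5.0)[1])   # both close to zero
```

Since ζ(t) in (15.183) is a/b times 1 − cos(·), it can never become negative, which is the non-negative z-velocity discussed next.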

Figure 15.3 depicts the particle velocities ξ and ζ in the x- and z-directions, respectively. It is interesting that although the velocity in the x-direction alternates between negative and positive values, that in the z-direction is kept non-negative. This implies that the particle is oscillating along the x-direction, but continually drifting toward the z-direction while oscillating. In fact, if we integrate ζ(t), we have a continually drifting term $\frac{a}{b} t$. Figure 15.4 shows the Lorentz force F as a function of time in the case where the charge of the particle is positive. Again, the electric Lorentz force (represented by E) switches from positive to negative in the x-direction, but the magnetic Lorentz force (represented by $\dot{\boldsymbol{x}} \times \boldsymbol{B}$) stays non-negative in the z-direction all the time.

Fig. 15.3 Velocities ξ and ζ of a charged particle in the x- and z-directions, respectively. The charged particle undergoes a periodic motion under the influence of a linearly polarized electromagnetic wave

Fig. 15.4 Lorentz force as a function of time. In (a, b), the phase of electromagnetic fields E and B is reversed. As a result, the electric Lorentz force (represented by E) switches from positive to negative in the x-direction, but the magnetic Lorentz force (represented by $\dot{\boldsymbol{x}} \times \boldsymbol{B}$) stays non-negative all the time in the z-direction

To precisely analyze the positional change of the particle, we need to integrate (15.182) once again. This requires a detailed numerical calculation. For this purpose the following relation is useful [7]:

\[ \cos (x \sin t) = \sum_{m = -\infty}^{\infty} J_m(x) \cos mt, \qquad \sin (x \sin t) = \sum_{m = -\infty}^{\infty} J_m(x) \sin mt, \tag{15.185} \]

where the functions $J_m(x)$ are called the Bessel functions that satisfy the following equation:

\[ \frac{d^2 J_n(x)}{dx^2} + \frac{1}{x} \frac{dJ_n(x)}{dx} + \left( 1 - \frac{n^2}{x^2} \right) J_n(x) = 0. \tag{15.186} \]
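The expansion (15.185) can be verified numerically for a sample argument. The sketch below assumes the standard power series of the Bessel functions of integer order together with $J_{-m}(x) = (-1)^m J_m(x)$, neither of which is derived in the text; the truncation limits and function names are hypothetical.

```python
import math

def bessel_j(m, x, terms=40):
    """J_m(x) for integer m from its power series; uses J_{-m} = (-1)^m J_m for m < 0."""
    if m < 0:
        return (-1) ** (-m) * bessel_j(-m, x, terms)
    return sum((-1) ** k / (math.factorial(k) * math.factorial(m + k))
               * (x / 2.0) ** (2 * k + m) for k in range(terms))

def expansion(x, t, m_max=25):
    """Truncated sums on the right-hand sides of (15.185)."""
    orders = range(-m_max, m_max + 1)
    cos_sum = sum(bessel_j(m, x) * math.cos(m * t) for m in orders)
    sin_sum = sum(bessel_j(m, x) * math.sin(m * t) for m in orders)
    return cos_sum, sin_sum

x, t = 1.7, 0.9
cos_sum, sin_sum = expansion(x, t)
print(cos_sum - math.cos(x * math.sin(t)))   # close to zero
print(sin_sum - math.sin(x * math.sin(t)))   # close to zero
```

In the cosine sum the odd orders cancel in pairs and in the sine sum the even orders cancel, so that only Bessel functions of one parity survive in each series.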

Although we do not examine the properties of the Bessel functions in this book, the related topics are dealt with in detail in the literature [5, 7–9]. The Bessel functions are widely investigated as one of the special functions in many branches of mathematical physics.

To conclude this section, we have described an example in which the matrix A of (15.179) that characterizes the system of differential equations is not a constant matrix but depends on the parameter t. Unlike the case of the constant matrix A, it may well be difficult to find a fundamental set of solutions. Once we can find the fundamental set of solutions, however, it is always possible to construct the resolvent matrix and solve the problem using it. In this respect, we have already encountered a similar situation in Chap. 10 where the Green's functions were constructed from a fundamental set of solutions. In this section, a fundamental set of solutions could be found and, as a consequence, we could construct the resolvent matrix. It is, however, not always the case that we can recast the system of differential equations (15.66) in the form of (15.179). Nonetheless, whenever we are successful in finding the fundamental set of solutions, we can construct a resolvent matrix and, hence, solve the problems. This is a generic and powerful tool.

References
1. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
2. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
3. Rudin W (1987) Real and complex analysis, 3rd edn. McGraw-Hill, New York
4. Yamanouchi T, Sugiura M (1960) Introduction to continuous groups. Baifukan, Tokyo. (in Japanese)
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
6. Inami T (1998) Ordinary differential equations. Iwanami, Tokyo. (in Japanese)
7. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering, 3rd edn. Cambridge University Press, Cambridge
8. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
9. Hassani S (2006) Mathematical physics. Springer, New York

Part IV

Group Theory and Its Chemical Applications

The universe comprises space and matter. These two mutually stipulate their modality of existence. We often comprehend various related aspects as manifestations of symmetry. In this part we deal with symmetry from the point of view of group theory. In this last part of the book, we outline and emphasize chemical applications of the methods of mathematical physics.

This part supplies an introductory description of group theory. Group theory forms an important field of both pure and applied mathematics. Starting with the definition of groups, we cover a variety of topics related to group theory. Of these, symmetry groups are familiar to chemists, because they deal with a variety of matter and molecules that are characterized by different types of symmetry. The symmetry group is a kind of finite group and is called a point group as well. Meanwhile, we have various infinite groups that include the rotation group as a typical example. We also give an introductory theory of the rotation group SO(3) that deals with important topics such as the Euler angles. We also treat successive coordinate transformations.

Next, we describe the representation theory of groups. Schur's lemmas and the related grand orthogonality theorem underlie the representation theory of groups. In parallel, characters and irreducible representations are important concepts that support the representation theory. We present various representations, e.g., the regular representation, direct-product representation, and symmetric and antisymmetric representations. These have wide applications in quantum mechanics, quantum chemistry, and so forth. On the basis of the above topics, we highlight quantum chemical applications of group theory in relation to the method of molecular orbitals. As tangible examples, we adopt aromatic molecules and methane.

The last part deals with the theory of continuous groups. The relevant theory has wide applications in many fields of both pure and applied physics and chemistry. We highlight the topics of SU(2) and SO(3) that very often appear there. Tangible examples help readers understand the essence.

Chapter 16 Introductory Group Theory

A group comprises mathematical elements that satisfy four simple axioms. A great variety of groups exist under these simple definitions, which makes group theory a distinctive field of mathematics. To get familiar with various concepts of groups, we first show several tangible examples. Group elements can be numbers (both real and complex) and matrices. More abstract mathematical elements can be included as well. Examples include transformations, operations, etc. as already studied in previous parts. Once those mathematical elements form a group, they share several common notions such as classes, subgroups, and direct-product groups. In this context, readers are encouraged to think of different kinds of groups that are familiar to them. Mapping is an important concept, as in the case of vector spaces. In particular, isomorphism and homomorphism frequently appear in group theory. These concepts are closely related to the representation theory that is an important pillar of group theory.

16.1 Definition of Groups

In contrast to its broad range of applications, the definition of the group is simple. Let ℊ be a set of elements $g_\nu$, where ν is an index that is either countable (e.g., integers) or uncountable (e.g., real numbers), and the number of elements may be finite or infinite. We denote this by ℊ = {$g_\nu$}. If a group is a finite group, we express it as

\[ ℊ = \{ g_1, g_2, \cdots, g_n \}, \tag{16.1} \]

where n is said to be the order of the group. The definition of the group comprises the following four axioms with respect to a well-defined "multiplication" rule between any pair of elements. The multiplication is denoted by a symbol "⋄" below. Note that the symbol ⋄ may stand for an ordinary multiplication, an ordinary addition, etc.

© Springer Nature Singapore Pte Ltd. 2020 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3_16

(A1) If a and b are any elements of the set ℊ, then so is a ⋄ b. (We sometimes say that the set is "closed" regarding the multiplication.)
(A2) Multiplication is associative; i.e., a ⋄ (b ⋄ c) = (a ⋄ b) ⋄ c.
(A3) The set ℊ contains an element e called the identity element such that we have a ⋄ e = e ⋄ a = a with any element a of ℊ.
(A4) For any a of ℊ, we have an element b such that a ⋄ b = b ⋄ a = e. The element b is said to be the inverse element of a. We denote $b \equiv a^{-1}$.

In the above definitions, we assume that the commutative law does not necessarily hold, that is, in general a ⋄ b ≠ b ⋄ a. In that case the group ℊ is said to be a noncommutative group. However, we have a case where the commutative law holds, i.e., a ⋄ b = b ⋄ a for all elements. If so, the group ℊ is called a commutative group or an Abelian group. Let us think of some examples of groups. Henceforth, we follow the convention and write ab to express a ⋄ b.

Example 16.1 We present several examples of groups below. Examples (i)–(iv) are simple, but Example (v) is general.

(i) ℊ = {1, −1}. The set ℊ makes a group with respect to the multiplication. This is an example of a finite group.
(ii) ℊ = {⋯, −3, −2, −1, 0, 1, 2, 3, ⋯}. The set ℊ makes a group with respect to the addition. This is an infinite group. For instance, take a (> 0) and make a + 1, (a + 1) + 1, [(a + 1) + 1] + 1, and so on, again and again. Thus, no finite set of integers that contains a is closed under the addition and, hence, we must have an infinite group.
(iii) Let us start with a matrix $a = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. Then, the inverse is $a^{-1} = a^3 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$. We have $a^2 = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$. Its inverse is $a^2$ itself. These four elements make a group. That is, ℊ = {e, a, a², a³}. This is an example of cyclic groups.
(iv) ℊ = {1}. It is the most trivial case, but sometimes the trivial case is very important as well. We will come back later to this point.
(v) Let us think of a more general case. In Chap. 11, we discussed endomorphism on a vector space and showed the necessary and sufficient condition for the existence of an inverse transformation. In this relation, we consider a set that comprises matrices such that

\[ GL(n, ℂ) \equiv \{ A = (a_{ij}) \mid i, j = 1, 2, \cdots, n; \ a_{ij} \in ℂ, \ \det A \neq 0 \}. \]

A group made up of such matrices may be either a finite group or an infinite group. The former can be a symmetry group and the latter can be a rotation group. This group is characterized by a set of invertible and endomorphic linear transformations over a vector space $V^n$; it is called a linear transformation group or a general linear group and is denoted by GL(n, ℂ), GL($V^n$), GL(V), etc. The relevant transformations are bijective. We can readily make sure that axioms (A1) to (A4) are satisfied with GL(n, ℂ). Here, the vector space can be ℂⁿ or a function space.

The structure of a finite group is tabulated in a multiplication table. This is made up such that group elements are arranged in a first row and a first column and that the intersection of an element $g_i$ in the row and $g_j$ in the column is designated as a product $g_i ⋄ g_j$. Choosing the above (iii) for an example, we make its multiplication table (see Table 16.1). There we define a² ≡ b and a³ ≡ c.

Table 16.1 Multiplication table of ℊ = {e, a, a², a³}

ℊ         e    a    b ≡ a²   c ≡ a³
e         e    a    b        c
a         a    b    c        e
b ≡ a²    b    c    e        a
c ≡ a³    c    e    a        b

Having a look at Table 16.1, we notice that in the individual rows and columns each group element appears once and only once. This is well known as the rearrangement theorem.

Theorem 16.1: Rearrangement Theorem [1] In each row or each column in the group multiplication table, individual group elements appear once and only once. From this, each row and each column list merely rearranged group elements.

Proof Let a set ℊ = {$g_1 \equiv e, g_2, \cdots, g_n$} be a group. Arbitrarily choosing any element h from ℊ and multiplying individual elements by h, we obtain a set $\mathcal{H} \equiv \{ h g_1, h g_2, \cdots, h g_n \}$. Then, all the group elements of ℊ appear in $\mathcal{H}$ once and only once. To see this, choosing any group element $g_i$, let us multiply $g_i$ by $h^{-1}$ to get $h^{-1} g_i$. Since $h^{-1} g_i$ must be a certain element $g_k$ of ℊ, we put $h^{-1} g_i = g_k$. Multiplying both sides by h, we have $g_i = h g_k$. Therefore, we are able to find this very element $h g_k$ in $\mathcal{H}$, i.e., $g_i$ in $\mathcal{H}$. This implies that the element $g_i$ necessarily appears in $\mathcal{H}$. Suppose in turn that $g_i$ appears more than once. Then, we must have $g_i = h g_k = h g_l$ (k ≠ l). Multiplying the relation by $h^{-1}$, we would get $h^{-1} g_i = g_k = g_l$, in contradiction to the supposition. This means that $g_i$ appears in $\mathcal{H}$ once and only once. This confirms that the theorem is true of each row of the group multiplication table. A similar argument applies to a set $\mathcal{H}' \equiv \{ g_1 h, g_2 h, \cdots, g_n h \}$. This confirms in turn that the theorem is true of each column of the group multiplication table. These complete the proof. ∎
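The multiplication table of Example (iii) and the rearrangement theorem can be checked mechanically. The following is a minimal sketch, assuming the generator a = ((0, −1), (1, 0)) for concreteness (its transpose generates the same table); the names `matmul`, `table`, and `is_rearrangement` are hypothetical.

```python
from itertools import product

def matmul(p, q):
    """Product of two 2x2 integer matrices stored as nested tuples."""
    return tuple(tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

e = ((1, 0), (0, 1))
a = ((0, -1), (1, 0))            # assumed sign convention for the generator
b = matmul(a, a)                 # b = a^2
c = matmul(b, a)                 # c = a^3
elements = {"e": e, "a": a, "b": b, "c": c}
names = {matrix: label for label, matrix in elements.items()}

# table[(x, y)] is the label of the product x y.
table = {(x, y): names[matmul(elements[x], elements[y])]
         for x, y in product(elements, repeat=2)}

def is_rearrangement(table, labels):
    """True if every row and every column is a permutation of the group elements."""
    rows_ok = all(sorted(table[(x, y)] for y in labels) == sorted(labels) for x in labels)
    cols_ok = all(sorted(table[(x, y)] for x in labels) == sorted(labels) for y in labels)
    return rows_ok and cols_ok

for x in elements:
    print(x, [table[(x, y)] for y in elements])
print(is_rearrangement(table, list(elements)))
```

Building `names` would fail with a `KeyError` in `table` if some product left the set, so the construction also confirms closure (A1) for these four matrices.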

16.2 Subgroups

As we think of subspaces in a linear vector space, we have subgroups in a group. The definition of the subgroup is that a subset $\mathcal{H}$ of a group ℊ makes a group with respect to the multiplication ⋄ that is defined for the group ℊ. The identity element makes a group by itself. Both {e} and ℊ are subgroups as well. We often call subgroups other than {e} and ℊ "proper" subgroups. A necessary and sufficient condition for the subset $\mathcal{H}$ to be a subgroup is the following:

(1) $h_i, h_j \in \mathcal{H} \Longrightarrow h_i ⋄ h_j \in \mathcal{H}$.
(2) $h \in \mathcal{H} \Longrightarrow h^{-1} \in \mathcal{H}$.

If $\mathcal{H}$ is a subgroup of ℊ, it is obvious that the relations (1) and (2) hold. Conversely, if (1) and (2) hold, $\mathcal{H}$ is a subgroup. In fact, (1) ensures the aforementioned relation (A1). Since $\mathcal{H}$ is a subset of ℊ, the associative law (A2) is guaranteed. The relation (2) ensures (A4). Finally, in virtue of (1) and (2), $h ⋄ h^{-1} = e$ is contained in $\mathcal{H}$; this implies that (A3) is satisfied. Thus, $\mathcal{H}$ is a subgroup, because $\mathcal{H}$ satisfies the axioms (A1) to (A4). Of the above examples, (iii) has a subgroup $\mathcal{H} = \{ e, a^2 \}$.

It is important to decompose a set into subsets that do not mutually contain an element (except for a special element) among them. We saw this in Part III when we decomposed a linear vector space into subspaces. In that case the said special element was a zero vector. Here let us consider a related question in its similar aspects.

Let $\mathcal{H} = \{ h_1 \equiv e, h_2, \cdots, h_s \}$ be a subgroup of ℊ. Also let us consider $a\mathcal{H}$ where $a \in ℊ$ and $a \notin \mathcal{H}$. Suppose that $a\mathcal{H}$ is a subset of ℊ such that $a\mathcal{H} = \{ a h_1, a h_2, \cdots, a h_s \}$. Then, we have another subset $\mathcal{H} + a\mathcal{H}$. If $\mathcal{H}$ contains s elements, so does $a\mathcal{H}$. In fact, if it were not the case, namely, if $a h_i = a h_j$ (i ≠ j), multiplying both sides by $a^{-1}$ from the left we would have $h_i = h_j$, in contradiction. Next, let us take b such that $b \notin \mathcal{H}$ and $b \notin a\mathcal{H}$ and make up $b\mathcal{H}$ and $\mathcal{H} + a\mathcal{H} + b\mathcal{H}$ successively. Our question is whether these procedures decompose ℊ into subsets that are mutually exclusive and collectively exhaustive. Suppose that we can succeed in such a decomposition and get

\[ ℊ = g_1 \mathcal{H} + g_2 \mathcal{H} + \cdots + g_k \mathcal{H}, \tag{16.2} \]

where $g_1, g_2, \cdots, g_k$ are mutually different elements with $g_1$ being the identity e. In that case (16.2) is said to be the left coset decomposition of ℊ by $\mathcal{H}$. Similarly, the right coset decomposition can be done to give

\[ ℊ = \mathcal{H} g_1 + \mathcal{H} g_2 + \cdots + \mathcal{H} g_k. \tag{16.3} \]

In general, however,

\[ g_k \mathcal{H} \neq \mathcal{H} g_k \quad \text{or} \quad g_k \mathcal{H} g_k^{-1} \neq \mathcal{H}. \tag{16.4} \]

Taking the case of the left coset as an example, let us examine whether different cosets mutually contain a common element. Suppose that $g_i \mathcal{H}$ and $g_j \mathcal{H}$ (i ≠ j) mutually contain a common element. Then, that element would be expressed as $g_i h_p = g_j h_q$ (1 ≤ i, j ≤ k; 1 ≤ p, q ≤ s). Thus, we have $g_i h_p h_q^{-1} = g_j$. Since $\mathcal{H}$ is a subgroup of ℊ, $h_p h_q^{-1} \in \mathcal{H}$. This implies that $g_j \in g_i \mathcal{H}$. It is in contradiction to the way the left cosets were constructed. Thus, we conclude that different cosets do not mutually contain a common element. Suppose that the orders of ℊ and $\mathcal{H}$ are n and s, respectively. The k different cosets comprise s elements individually and different cosets do not mutually possess a common element and, hence, we must have

\[ n = sk, \tag{16.5} \]

where k is called the index of $\mathcal{H}$. We will have many examples afterward.
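The coset decomposition can be illustrated with a small noncommutative group. The sketch below is only an illustration under assumed choices: ℊ is taken to be the group of permutations of three objects (order 6), the subgroup is {e, (0 1)}, and the helper names are hypothetical. It checks n = sk and shows that left and right cosets need not coincide.

```python
from itertools import permutations

def compose(p, q):
    """Composition (p q)(i) = p[q[i]] of two permutations given as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

G = list(permutations(range(3)))        # all permutations of {0, 1, 2}; order n = 6
H = [(0, 1, 2), (1, 0, 2)]              # subgroup {e, transposition of 0 and 1}; order s = 2

def left_cosets(group, subgroup):
    """Collect the distinct left cosets gH, taking a new g outside the cosets found so far."""
    cosets, covered = [], set()
    for g in group:
        if g in covered:
            continue
        coset = frozenset(compose(g, h) for h in subgroup)
        cosets.append(coset)
        covered |= coset
    return cosets

def right_cosets(group, subgroup):
    """Collect the distinct right cosets Hg in the same manner."""
    cosets, covered = [], set()
    for g in group:
        if g in covered:
            continue
        coset = frozenset(compose(h, g) for h in subgroup)
        cosets.append(coset)
        covered |= coset
    return cosets

left = left_cosets(G, H)
right = right_cosets(G, H)
print(len(G), len(H), len(left))        # n, s, and the index k
print(set(left) == set(right))          # left and right decompositions differ here
```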

16.3

Classes

Another method to decompose a group into subsets is called conjugacy classes. A conjugate element is defined as follows: Let a be an element arbitrarily chosen from a group. Then an element gag1 is called a conjugate element or conjugate to a. If c is conjugate to b and b is conjugate to a, then c is conjugate to a. It is because 1

c ¼ g0 bg0 , b ¼ gag1 ⟹c ¼ g0 bg0

1

¼ g0 gag1 g0

1

1

¼ g0 gaðg0 gÞ :

ð16:6Þ

In the above a set containing a and all the elements conjugate to a is said to be a (conjugate) class of a. Denoting this set by ∁a, we have 1 1 : ∁a ¼ a, g2 ag1 2 , g3 ag3 , , gn agn

ð16:7Þ

In ∁a a same element may appear repeatedly. It is obvious that in every group the identity element e forms a class by itself. That is, ∁e ¼ feg:

ð16:8Þ

As in the case of the decomposition of a group into (left or right) cosets, we can decompose a group to classes. If group elements are not exhausted by a set comprising ∁e or ∁a, let us take b such that b 6¼ e and b 2 = ∁a and make ∁b similarly to (16.7). Repeating this procedure, we should be able to decompose a group into classes. In fact, if group elements have not yet exhausted after these procedures, take

626

16

Introductory Group Theory

remaining element z and make a class. If the remaining element is only z in this moment, z can make a class by itself (as in the case of e). Notice that for an Abelian group every element makes a class by itself. Thus, with a finite group we have a decomposition such that ℊ ¼ ∁e þ ∁a þ ∁b þ þ ∁z :

ð16:9Þ

To show that (16.9) is really a decomposition, suppose that for instance a set ∁a \ ∁b is not an empty set and that x 2 ∁a \ ∁b. Then we must have α and β that satisfy a following relation: x ¼ αaα1 ¼ βbβ1, i.e., b ¼ β1 αaα1β ¼ β1 αa(β1 α)1. This implies that b has already been included in ∁a, in contradiction to the supposition. Thus, (16.9) is in fact a decomposition of ℊ into a finite number of classes. In the above we thought of a class conjugate to a single element. This notion can be extended to a class conjugate to a subgroup. Let H be a subgroup of ℊ. Let g be an element of ℊ. Let us now consider a set H 0 ¼ gH g1. The set H 0 is a subgroup of ℊ and is called a conjugate subgroup. In fact, let hi and hj be any two elements of H , that is, let ghig1 and ghjg1 be any tow elements of H 0 . Then, we have

ghi g1 ghj g1 ¼ ghi hj g1 ¼ ghk g1 ,

ð16:10Þ 1

1 where hk ¼ hi hj 2 H . Hence, ghk g1 2 H 0 . Meanwhile, ðghi g1 Þ ¼ gh1 2 i g 0 0 H . Thus, conditions (1) and (2) of Sect. 16.2 are satisfied with H . Therefore, H 0 is a subgroup of ℊ. The subgroup H 0 has a same order as H . This is because with any two different elements hi and hj ghig1 6¼ ghjg1. If for 8g2ℊ and a subgroup H we have a following equality

g1 H g ¼ H ,

ð16:11Þ

such a subgroup H is said to be an invariant subgroup. If (16.11) holds, H should be a sum of classes (reader, please show this). A set comprising only the identity, i.e., {e} forms a class. Therefore, if H is a proper subgroup, H must contain two or more classes. The relation (16.11) can be rewritten as gH ¼ H g:

ð16:12Þ

This implies that the left coset is identical to the right coset. Thus, as far as we are dealing with a coset pertinent to an invariant subgroup, we do not have to distinguish left and right cosets. Now let us anew consider the (left) coset decomposition of ℊ by an invariant subgroup H

16.4

Isomorphism and Homomorphism

627

$$ℊ = g_1\mathcal{H} + g_2\mathcal{H} + \cdots + g_k\mathcal{H}, \tag{16.13}$$

where we have $\mathcal{H} = \{h_1 \equiv e, h_2, \ldots, h_s\}$. Then multiplication of two elements that belong to the cosets $g_i\mathcal{H}$ and $g_j\mathcal{H}$ is expressed as

$$(g_i h_l)(g_j h_m) = g_i g_j g_j^{-1} h_l g_j h_m = g_i g_j \left(g_j^{-1} h_l g_j\right) h_m = g_i g_j h_p h_m = g_i g_j h_q, \tag{16.14}$$

where the third equality comes from (16.11). That is, we should have $\exists h_p$ such that $g_j^{-1} h_l g_j = h_p$, and $h_p h_m = h_q$. In (16.14) $h_\alpha \in \mathcal{H}$ ($\alpha$ stands for $l, m, p, q$, etc. with $1 \le \alpha \le s$). Note that $g_i g_j h_q \in g_i g_j \mathcal{H}$. Accordingly, a product of elements belonging to $g_i\mathcal{H}$ and $g_j\mathcal{H}$ belongs to $g_i g_j \mathcal{H}$. We rewrite (16.14) as a relation between the sets

$$(g_i\mathcal{H})(g_j\mathcal{H}) = g_i g_j \mathcal{H}. \tag{16.15}$$

Viewing the LHS of (16.15) as a product of two cosets, we find that the said product is a coset as well. This implies that a collection of the cosets forms a group. Such a group that possesses cosets as elements is said to be a factor group or quotient group. In this context, the multiplication is a product of cosets. We denote the factor group by $ℊ/\mathcal{H}$. An identity element of this factor group is $\mathcal{H}$. This is because in (16.15) putting $g_i = e$, we get $\mathcal{H}(g_j\mathcal{H}) = g_j\mathcal{H}$. Alternatively, putting $g_j = e$, we have $(g_i\mathcal{H})\mathcal{H} = g_i\mathcal{H}$. In (16.15) moreover putting $g_j = g_i^{-1}$, we get

$$(g_i\mathcal{H})\left(g_i^{-1}\mathcal{H}\right) = g_i g_i^{-1}\mathcal{H} = \mathcal{H}. \tag{16.16}$$

Hence, $(g_i\mathcal{H})^{-1} = g_i^{-1}\mathcal{H}$. That is, the inverse element of $g_i\mathcal{H}$ is $g_i^{-1}\mathcal{H}$.
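The coset relations above can be checked on a small concrete case. The following is a minimal sketch, assuming the symmetric group S3 (permutations of three objects) as ℊ and its even permutations as the invariant subgroup; the group choice and all function names are illustrative and do not come from the text.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'p after q' (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def parity(p):
    """Return 0 for an even permutation and 1 for an odd one."""
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j]) % 2

G = list(permutations(range(3)))                 # S3, order 6 (assumed example)
H = frozenset(p for p in G if parity(p) == 0)    # even permutations, order 3

def left_coset(g, subgroup):
    return frozenset(compose(g, h) for h in subgroup)

def right_coset(subgroup, g):
    return frozenset(compose(h, g) for h in subgroup)

def coset_product(c1, c2):
    """Set of all products a*b with a in c1 and b in c2."""
    return frozenset(compose(a, b) for a in c1 for b in c2)

# (16.11): g^{-1} H g = H for every g
is_invariant = all(
    frozenset(compose(compose(inverse(g), h), g) for h in H) == H for g in G
)
# (16.12): left and right cosets coincide
left_equals_right = all(left_coset(g, H) == right_coset(H, g) for g in G)
# (16.15): (g_i H)(g_j H) = g_i g_j H
products_are_cosets = all(
    coset_product(left_coset(gi, H), left_coset(gj, H))
    == left_coset(compose(gi, gj), H)
    for gi in G for gj in G
)
cosets = {left_coset(g, H) for g in G}           # elements of the factor group
```

Here the factor group has two elements, the subgroup itself (acting as the identity) and the coset of odd permutations.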

16.4 Isomorphism and Homomorphism

As in the case of the linear vector space, we consider the mapping between group elements. Of these, the notions of isomorphism and homomorphism are important.

Definition 16.1 Let $ℊ = \{x, y, \ldots\}$ and $ℊ' = \{x', y', \ldots\}$ be groups and let a mapping $ℊ \to ℊ'$ exist. Suppose that there is a one-to-one correspondence (i.e., injective mapping)

$$x \leftrightarrow x', \quad y \leftrightarrow y', \quad \ldots$$

between the elements such that $xy = z$ implies that $x'y' = z'$ and vice versa. Meanwhile, any element in $ℊ'$ must be the image of some element of ℊ. That is, the mapping is surjective as well and, hence, the mapping is bijective. Then, the two groups ℊ and $ℊ'$ are said to be isomorphic. The relevant mapping is called an isomorphism. We symbolically denote this relation by $ℊ \cong ℊ'$.

Note that the aforementioned groups can be either finite or infinite. We did not designate identity elements. Suppose that $x$ is the identity $e$. Then, from the relations $xy = z$ and $x'y' = z'$ we have

$$ey = z = y, \quad x'y' = z' = y'. \tag{16.17}$$

Then, we get

$$x' = e', \quad \text{i.e.,} \quad e \leftrightarrow e'. \tag{16.18}$$

Also let us put $y = x^{-1}$. Then,

$$xx^{-1} = z = e, \quad x'\left(x^{-1}\right)' = e', \quad x'y' = z' = e'. \tag{16.19}$$

Comparing the second and third equations of (16.19), we get

$$y' = x'^{-1} = \left(x^{-1}\right)'. \tag{16.20}$$

The bijective character mentioned above can be loosened somewhat in such a way that the one-to-one correspondence is replaced with an n-to-one correspondence. We have the following definition.

Definition 16.2 Let $ℊ = \{x, y, \ldots\}$ and $ℊ' = \{x', y', \ldots\}$ be groups and let a mapping $\rho: ℊ \to ℊ'$ exist such that for any two arbitrarily chosen elements $x$ and $y$ the following relation holds:

$$\rho(x)\rho(y) = \rho(xy). \tag{16.21}$$

Then, the two groups ℊ and $ℊ'$ are said to be homomorphic. The relevant mapping is called a homomorphism. We symbolically denote this relation by $ℊ \sim ℊ'$. In this case, we have

$$\rho(e)\rho(e) = \rho(ee) = \rho(e), \quad \text{i.e.,} \quad \rho(e) = e',$$

where $e'$ is the identity element of $ℊ'$. Also, we have

$$\rho(x)\rho\left(x^{-1}\right) = \rho\left(xx^{-1}\right) = \rho(e) = e'.$$

Therefore, $[\rho(x)]^{-1} = \rho\left(x^{-1}\right)$. The two groups can be either finite or infinite. Note that in the above the mapping is not necessarily injective. The mapping may or may not be surjective. Regarding the identity and inverse elements, we have the same relations as (16.18) and (16.20). From Definitions 16.1 and 16.2, we say that a bijective homomorphism is an isomorphism. Let us introduce the important notion of the kernel of a mapping. In this regard, we have the following definition.

Definition 16.3 Let $ℊ = \{e, x, y, \ldots\}$ and $ℊ' = \{e', x', y', \ldots\}$ be groups and let $e$ and $e'$ be the identity elements. Suppose that there exists a homomorphic mapping $\rho: ℊ \to ℊ'$. Also let $\mathcal{F}$ be the subset of ℊ consisting of all elements that are mapped to $e'$, i.e.,

$$\rho(\mathcal{F}) = e'. \tag{16.22}$$

Then, $\mathcal{F}$ is said to be a kernel of $\rho$. Regarding the kernel, we have the following important theorems.

Theorem 16.2 Let $ℊ = \{e, x, y, \ldots\}$ and $ℊ' = \{e', x', y', \ldots\}$ be groups, where $e$ and $e'$ are identity elements. A necessary and sufficient condition for a surjective and homomorphic mapping $\rho: ℊ \to ℊ'$ to be isomorphic is that the kernel $\mathcal{F} = \{e\}$.

Proof We assume that $\mathcal{F} = \{e\}$. Suppose that $\rho(x) = \rho(y)$. Then, we have

$$\rho(x)[\rho(y)]^{-1} = \rho(x)\rho\left(y^{-1}\right) = \rho\left(xy^{-1}\right) = e'. \tag{16.23}$$

The first and second equalities result from the homomorphism of $\rho$. Since $\mathcal{F} = \{e\}$, $xy^{-1} = e$, i.e., $x = y$. Therefore, $\rho$ is injective (i.e., one-to-one correspondence). As $\rho$ is surjective from the assumption, $\rho$ is bijective. The mapping $\rho$ is isomorphic accordingly. Conversely, suppose that $\rho$ is isomorphic. Also suppose that for $\exists x \in ℊ$, $\rho(x) = e'$. From (16.18), $\rho(e) = e'$. We have $\rho(x) = \rho(e) = e' \Longrightarrow x = e$ due to the isomorphism of $\rho$ (i.e., one-to-one correspondence). This implies $\mathcal{F} = \{e\}$. This completes the proof. ∎

We become aware of a close relationship between Theorem 16.1 and the linear transformation versus kernel already mentioned in Sect. 11.2 of Part III. Figure 16.1 shows this relationship. Figure 16.1a represents a homomorphic mapping $\rho: ℊ \to ℊ'$ between two groups, whereas Fig. 16.1b shows a linear transformation (endomorphism) $A: V^n \to V^n$ in a vector space $V^n$.

Fig. 16.1 Mapping in a group and vector space. (a) Homomorphic mapping $\rho: ℊ \to ℊ'$ between two groups. (b) Linear transformation (endomorphism) $A: V^n \to V^n$ in a vector space $V^n$

Theorem 16.3 Suppose that there exists a homomorphic mapping $\rho: ℊ \to ℊ'$, where ℊ and $ℊ'$ are groups. Then, a kernel $\mathcal{F}$ of $\rho$ is an invariant subgroup of ℊ.

Proof Let $k_i$ and $k_j$ be any two arbitrarily chosen elements of $\mathcal{F}$. Then,

$$\rho(k_i) = \rho(k_j) = e', \tag{16.24}$$

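where $e'$ is the identity element of $ℊ'$. From (16.21) we have

$$\rho(k_i k_j) = \rho(k_i)\rho(k_j) = e'e' = e'. \tag{16.25}$$

Therefore, $k_i k_j \in \mathcal{F}$. Meanwhile, from (16.20) we have

$$\rho\left(k_i^{-1}\right) = [\rho(k_i)]^{-1} = e'^{-1} = e'. \tag{16.26}$$

Then, $k_i^{-1} \in \mathcal{F}$. Thus, $\mathcal{F}$ is a subgroup of ℊ. Next, for $\forall g \in ℊ$ we have

$$\rho\left(g k_i g^{-1}\right) = \rho(g)\rho(k_i)\rho\left(g^{-1}\right) = \rho(g)\,e'\,\rho\left(g^{-1}\right) = e'. \tag{16.27}$$

Accordingly we have $g k_i g^{-1} \in \mathcal{F}$. Thus, $g\mathcal{F}g^{-1} \subset \mathcal{F}$. Since $g$ is chosen arbitrarily, replacing it with $g^{-1}$ we have $g^{-1}\mathcal{F}g \subset \mathcal{F}$. Multiplying by $g$ and $g^{-1}$ on both sides from the left and right, respectively, we get $\mathcal{F} \subset g\mathcal{F}g^{-1}$. Consequently, we get

$$g\mathcal{F}g^{-1} = \mathcal{F}. \tag{16.28}$$

This implies that the kernel $\mathcal{F}$ of $\rho$ is an invariant subgroup of ℊ. ∎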


Theorem 16.4: Homomorphism Theorem Let $ℊ = \{x, y, \ldots\}$ and $ℊ' = \{x', y', \ldots\}$ be groups and let a homomorphic (and surjective) mapping $\rho: ℊ \to ℊ'$ exist. Also let $\mathcal{F}$ be the kernel of $\rho$. Let us define a surjective mapping $\tilde{\rho}: ℊ/\mathcal{F} \to ℊ'$ such that

$$\tilde{\rho}(g_i\mathcal{F}) = \rho(g_i). \tag{16.29}$$

Then, $\tilde{\rho}$ is an isomorphic mapping.

Proof From (16.15) and (16.21), it is obvious that $\tilde{\rho}$ is homomorphic. The confirmation is left for readers. Let $g_i\mathcal{F}$ and $g_j\mathcal{F}$ be two different cosets. Suppose here that $\rho(g_i) = \rho(g_j)$. Then we have

$$\rho\left(g_i^{-1}g_j\right) = \rho\left(g_i^{-1}\right)\rho(g_j) = [\rho(g_i)]^{-1}\rho(g_j) = [\rho(g_i)]^{-1}\rho(g_i) = e'. \tag{16.30}$$

This implies that $g_i^{-1}g_j \in \mathcal{F}$. That is, we would have $g_j \in g_i\mathcal{F}$. This contradicts the assumption that $g_i\mathcal{F}$ and $g_j\mathcal{F}$ are different cosets, because different cosets share no element. Thus, we should have $\rho(g_i) \neq \rho(g_j)$. In other words, the different cosets $g_i\mathcal{F}$ and $g_j\mathcal{F}$ have been mapped into different elements $\rho(g_i)$ and $\rho(g_j)$ in $ℊ'$. That is, $\tilde{\rho}$ is isomorphic; i.e., $ℊ/\mathcal{F} \cong ℊ'$. ∎
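The homomorphism theorem can be illustrated with a small additive example. The sketch below assumes the cyclic groups of integers modulo 12 and modulo 4 and the mapping "reduce modulo 4"; these choices and all names are illustrative assumptions, not part of the text.

```python
n, m = 12, 4                      # assumed example: Z_12 -> Z_4
G = range(n)

def rho(x):
    """Candidate homomorphism: reduction modulo m."""
    return x % m

def add_G(x, y):
    return (x + y) % n

def add_G_prime(x, y):
    return (x + y) % m

# (16.21): rho(x) rho(y) = rho(xy), written additively
is_homomorphism = all(
    rho(add_G(x, y)) == add_G_prime(rho(x), rho(y)) for x in G for y in G
)

# (16.22): the kernel collects everything mapped to the identity 0
kernel = frozenset(x for x in G if rho(x) == 0)

# Coset decomposition of G by the kernel
cosets = {frozenset(add_G(g, k) for k in kernel) for g in G}

# (16.29): rho_tilde(g F) = rho(g); check it does not depend on the representative
rho_tilde = {}
for coset in cosets:
    images = {rho(g) for g in coset}
    if len(images) != 1:
        raise ValueError("rho is not constant on a coset")
    rho_tilde[coset] = images.pop()

is_injective = len(set(rho_tilde.values())) == len(cosets)
is_surjective = set(rho_tilde.values()) == set(range(m))
```

In this example the kernel is {0, 4, 8}, there are four cosets, and the induced mapping sends them bijectively onto the four elements of the image group.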

16.5 Direct-Product Groups

So far we have investigated basic properties of groups. In Sect. 16.4 we examined factor groups. The homomorphism theorem shows that the factor group is characterized by division. In the case of a finite group, the order of the group is reduced. In this section, we study the opposite character, i.e., properties of the direct product of groups, or direct-product groups. Let $\mathcal{H} = \{h_1 \equiv e, h_2, \ldots, h_m\}$ and $\mathcal{H}' = \{h'_1 \equiv e, h'_2, \ldots, h'_n\}$ be groups of order $m$ and $n$, respectively. Suppose that (i) $\forall h_i$ $(1 \le i \le m)$ and $\forall h'_j$ $(1 \le j \le n)$ commute, i.e., $h_i h'_j = h'_j h_i$, and that (ii) $\mathcal{H} \cap \mathcal{H}' = \{e\}$. Under these conditions let us construct a set ℊ such that

$$ℊ = \left\{h_1 h'_1 \equiv e,\ h_i h'_j\ (1 \le i \le m,\ 1 \le j \le n)\right\}. \tag{16.31}$$

In other words, ℊ is a set comprising $mn$ elements $h_i h'_j$. A product of elements is defined as

$$\left(h_i h'_j\right)\left(h_k h'_l\right) = h_i h_k h'_j h'_l = h_p h'_q, \tag{16.32}$$

where $h_p = h_i h_k$ and $h'_q = h'_j h'_l$. The identity element is $ee = e$; $e h_i h_k = h_i h_k e$. The inverse element is $\left(h_i h'_j\right)^{-1} = {h'_j}^{-1} h_i^{-1} = h_i^{-1} {h'_j}^{-1}$. The associative law is obvious from $h_i h'_j = h'_j h_i$. Thus, ℊ forms a group. This is said to be a direct product of groups, or a direct-product group. The groups $\mathcal{H}$ and $\mathcal{H}'$ are called direct factors of ℊ. In this case, we succinctly represent

$$ℊ = \mathcal{H} \otimes \mathcal{H}'.$$

In the above, condition (ii) is equivalent to the statement that $\forall g \in ℊ$ is uniquely represented as

$$g = hh'; \quad h \in \mathcal{H},\ h' \in \mathcal{H}'. \tag{16.33}$$

In fact, suppose that $\mathcal{H} \cap \mathcal{H}' = \{e\}$ and that $g$ can be represented in two ways such that

$$g = h_1 h'_1 = h_2 h'_2; \quad h_1, h_2 \in \mathcal{H},\ h'_1, h'_2 \in \mathcal{H}'. \tag{16.34}$$

Then, we have

$$h_2^{-1} h_1 = h'_2 {h'_1}^{-1}; \quad h_2^{-1} h_1 \in \mathcal{H},\ h'_2 {h'_1}^{-1} \in \mathcal{H}'. \tag{16.35}$$

From the supposition, we get

$$h_2^{-1} h_1 = h'_2 {h'_1}^{-1} = e. \tag{16.36}$$

That is, $h_2 = h_1$ and $h'_2 = h'_1$. This means that the representation is unique. Conversely, suppose that the representation is unique and that $x \in \mathcal{H} \cap \mathcal{H}'$. Then we must have

$$x = xe = ex. \tag{16.37}$$

Thanks to uniqueness of the representation, $x = e$. This implies $\mathcal{H} \cap \mathcal{H}' = \{e\}$. Now suppose $h \in \mathcal{H}$. Then for $\forall g \in ℊ$, putting $g = h_\nu h'_\mu$, we have

$$g h g^{-1} = h_\nu h'_\mu h {h'_\mu}^{-1} h_\nu^{-1} = h_\nu h h'_\mu {h'_\mu}^{-1} h_\nu^{-1} = h_\nu h h_\nu^{-1} \in \mathcal{H}. \tag{16.38}$$

Then we have $g\mathcal{H}g^{-1} \subset \mathcal{H}$. Similarly to the proof of Theorem 16.3, we get

$$g\mathcal{H}g^{-1} = \mathcal{H}. \tag{16.39}$$

This shows that H is an invariant subgroup of ℊ. Similarly, H 0 is an invariant subgroup as well.
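The conditions for a direct product can be tested on a small matrix example. The sketch below assumes, purely for illustration, that the first factor is the group of six 3×3 permutation matrices and the second factor is the pair consisting of the identity and its negative; neither choice nor any function name is taken from the text.

```python
from itertools import permutations, product

def matmul(a, b):
    """Product of two 3x3 integer matrices stored as tuples of tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )

def transpose(a):
    return tuple(zip(*a))

E = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
MINUS_E = tuple(tuple(-x for x in row) for row in E)

# Assumed direct factors: permutation matrices (order 6) and {E, -E} (order 2)
H = {
    tuple(tuple(1 if p[i] == j else 0 for j in range(3)) for i in range(3))
    for p in permutations(range(3))
}
H_PRIME = {E, MINUS_E}

# Condition (i): elements of the two factors commute
commute = all(matmul(h, hp) == matmul(hp, h) for h in H for hp in H_PRIME)
# Condition (ii): the factors share only the identity
trivial_intersection = (H & H_PRIME) == {E}

# Build the set (16.31) and record every pair (h, h') that yields each element
representations = {}
for h, hp in product(H, H_PRIME):
    representations.setdefault(matmul(h, hp), []).append((h, hp))
G = set(representations)

unique_representation = all(len(pairs) == 1 for pairs in representations.values())
closed = all(matmul(a, b) in G for a in G for b in G)
# All matrices here are orthogonal, so the inverse equals the transpose (16.38)
H_is_invariant = all(matmul(matmul(g, h), transpose(g)) in H for g in G for h in H)
```

The product set has 6 × 2 = 12 elements, each with exactly one representation as in (16.33).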


Regarding the unique representation of the group element of a direct-product group, we become again aware of the close relationship between the direct product and direct sum that was mentioned earlier in Part III.

Reference

1. Cotton FA (1990) Chemical applications of group theory. Wiley, New York

Chapter 17

Symmetry Groups

We have many opportunities to observe symmetry, both macroscopic and microscopic, in the natural world. First, we need to formulate the symmetry appropriately. For this purpose, we must regard various symmetry operations as mathematical elements and classify these operations under several categories. In Part III we examined various properties of vectors and their transformations. We also showed that the vector transformation can be viewed as the coordinate transformation. On these topics, we focused upon abstract concepts in various ways. On another front, however, we have not paid attention to specific geometric objects, especially molecules. In this chapter, we study the symmetry of these concrete objects. For this, it will be indispensable to correctly understand a variety of symmetry operations. At the same time, we deal with the vector and coordinate transformations as group elements. Among such transformations, rotations occupy a central place in group theory and related fields of mathematics. Regarding the three-dimensional Euclidean space, SO(3) is particularly important. This is an infinite group, in contrast to the various symmetry groups (or point groups) we investigate in the former parts of this chapter.

17.1 A Variety of Symmetry Operations

To understand various aspects of symmetry operations, it is convenient and essential to consider a general point that is fixed in a three-dimensional Euclidean space and to examine how this point is transformed in the space. In parallel to the description in Part III, we express the coordinate of the general point P as

© Springer Nature Singapore Pte Ltd. 2020 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3_17

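$$P = \begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{17.1}$$

Note that P may be on or within or outside a geometric object or molecule that we are dealing with. The relevant position vector $\mathbf{x}$ for P is expressed as

$$\mathbf{x} = x\mathbf{e}_1 + y\mathbf{e}_2 + z\mathbf{e}_3 = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} x \\ y \\ z \end{pmatrix}, \tag{17.2}$$

where $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ denote orthonormal basis vectors pointing in the positive directions of the x-, y-, and z-axes, respectively. Similarly we denote a linear transformation A by

$$A(\mathbf{x}) = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{17.3}$$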

Among various linear transformations that are represented by matrices, orthogonal transformations are the simplest and most widely used. We use orthogonal matrices to represent the orthogonal transformations accordingly. Let us think of a movement or translation of a geometric object and an operation that causes such a movement. First suppose that the geometric object is fixed on a coordinate system. Then the object is moved (or translated) to another place. If before and after such a movement (or translation) one could not tell whether the object has been moved, we say that the object possesses a “symmetry” in a certain sense. Thus, we have to specify this symmetry. In that context, group theory deals with the symmetry and defines it clearly. To tell whether the object has been moved (to another place), we usually distinguish it by a change in (i) positional relationship and (ii) attribute or property. To make the situation simple, let us consider the following example:

Example 17.1 We have two round disks, i.e., Disk A and Disk B. Suppose that Disk A is a solid white disk, whereas Disk B is partly painted black (see Fig. 17.1). In Fig. 17.1 we are thinking of a rotation of an object (e.g., a round disk) around an axis standing on its center and stretching perpendicularly to the object plane. If an arbitrarily chosen positional vector fixed on the object before the rotation is moved to another position that was not originally occupied by the object, then we recognize that the object has certainly been moved. For instance, imagine that a round disk having a through-hole located away from the center is rotating. What about the case where that position was originally occupied by the object, then? We have

Fig. 17.1 Rotation of an object around its center C. (a) Disk A: case where we cannot recognize that the object has been moved. (b) Disk B: case where we can recognize that the object has been moved because of its attribute (i.e., because the round disk is partly painted black)

two possibilities. The first alternative is that we cannot recognize that the object has been moved. The second one is that we can still recognize that the object has been moved. Figure 17.1a and b correspond to the former case and the latter case, respectively. In the latter case, we have recognized the movement of the object by its attribute, i.e., by the fact that the object is partly painted black. However, we do not have to be rigorous here. We have a clear intuitive criterion for a judgment of whether a geometric object has been moved. From now on, we assume that the geometric character of an object is pertinent to both its positional relationship and attribute. Thus, we define the equivalent (or indistinguishable) disposition of an object and the operation that yields such an equivalent disposition as follows:

Definition 17.1
(i) Symmetry operation: A geometric operation that produces an equivalent (or indistinguishable) disposition of an object.
(ii) Equivalent (or indistinguishable) disposition: Suppose that regarding a geometric operation of an object, we cannot recognize that the object has been moved before and after that geometric operation. In that case, the original disposition of the object and the resulting disposition reached after the geometric operation are referred to as an equivalent disposition. The relevant geometric operation is the symmetry operation.

Here we should clearly distinguish translation (i.e., parallel displacement) from the abovementioned symmetry operations. This is because for a geometric object to possess the translation symmetry the object must be infinite in extent, typically an infinite crystal lattice. The relevant discipline is widely studied as space group and has a broad class of applications in physics and chemistry. However, we will not deal with the space group or associated topics, but focus our attention upon symmetry groups in this book. In the above example, the rotation is a symmetry operation with Fig. 17.1a, but the said geometric operation is not a symmetry operation with Fig. 17.1b. Let us further inspect properties of the symmetry operation. Let us consider a set $\mathcal{H}$ consisting of symmetry operations. Let $a$ and $b$ be any two symmetry operations of $\mathcal{H}$. Then, (i) $a \diamond b$ is a symmetry operation as well. (ii) Multiplication of


successive symmetry operations $a$, $b$, $c$ is associative; i.e., $a \diamond (b \diamond c) = (a \diamond b) \diamond c$. (iii) The set $\mathcal{H}$ contains an element $e$ called the identity element such that we have $a \diamond e = e \diamond a = a$ with any element $a$ of $\mathcal{H}$. Operating “nothing” should be $e$. If the rotation is relevant, a $2\pi$ rotation is thought to be $e$. These are intuitively acceptable. (iv) For any $a$ of $\mathcal{H}$, we have an element $b$ such that $a \diamond b = b \diamond a = e$. The element $b$ is said to be the inverse element of $a$. We denote it by $b \equiv a^{-1}$. The inverse element corresponds to an operation that brings the disposition of a geometric object back to the original disposition. Thus, $\mathcal{H}$ forms a group. We call $\mathcal{H}$ satisfying the above criteria a symmetry group. A symmetry group is called a point group as well. This is because a point group comprises symmetry operations of geometric objects as group elements and those objects have at least one fixed point after the relevant symmetry operation. The name of a point group comes from this fact. As mentioned above, the symmetry operation is best characterized by a (3, 3) orthogonal matrix. In Example 17.1, e.g., the $\pi$ rotation is represented by an orthogonal matrix A such that

$$A = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.4}$$

This operation represents a $\pi$ rotation around the z-axis. Let us think of another symmetry operation described by

$$B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}. \tag{17.5}$$

This produces a mirror symmetry with respect to the xy-plane. Then, we have

$$C \equiv AB = BA = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}. \tag{17.6}$$

The operation C shows an inversion about the origin. Thus, A, B, and C along with an identity element E form a group. Here E is expressed as

$$E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.7}$$
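The group formed by E, A, B, and C can be tabulated directly. The following is a minimal sketch in which each diagonal matrix of (17.4) to (17.7) is stored by its three diagonal entries, so that matrix multiplication reduces to an entrywise product; this storage choice and the helper names are assumptions made for brevity.

```python
from itertools import product

# Diagonal entries of the matrices in (17.4)-(17.7)
E = (1, 1, 1)       # identity
A = (-1, -1, 1)     # pi rotation around the z-axis
B = (1, 1, -1)      # mirror symmetry with respect to the xy-plane
C = (-1, -1, -1)    # inversion about the origin

names = {E: "E", A: "A", B: "B", C: "C"}

def mul(p, q):
    """Product of two diagonal matrices given by their diagonals."""
    return tuple(a * b for a, b in zip(p, q))

# Multiplication table: table[(row, column)] is the product row * column
table = {
    (names[p], names[q]): names[mul(p, q)]
    for p, q in product(names, repeat=2)
}

is_commutative = all(table[(x, y)] == table[(y, x)] for x, y in table)
each_is_own_inverse = all(table[(x, x)] == "E" for x in names.values())

for row in names.values():
    print(row, [table[(row, col)] for col in names.values()])
```

The lookup `names[mul(p, q)]` would raise a `KeyError` if the set were not closed, so building the table also confirms closure.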

The above group is represented by four three-dimensional diagonal matrices whose elements are $1$ or $-1$. Therefore, it is evident that the inverse element of A, B, and C is A, B, and C itself, respectively. The said group is commutative (Abelian) and said to be a four group [1]. Meanwhile, we have a number of noncommutative groups. From the point of view of matrix structure, noncommutativity comes from off-diagonal elements of the matrix. A typical example is a rotation matrix with a rotation angle different from zero or $n\pi$ ($n$: integer). For later use, let us have a matrix form that expresses a $\theta$ rotation around the z-axis. Figure 17.2 depicts a graphical illustration for this. A matrix R has the following form:

$$R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.8}$$

Fig. 17.2 Rotation by $\theta$ around the z-axis

Note that R has been reduced. This implies that the three-dimensional Euclidean space is decomposed into a two-dimensional subspace (xy-plane) and a one-dimensional subspace (z-axis). The xy-coordinates are not mixed with the z-component after the rotation R. Note, however, that if the rotation axis is oblique against the xy-plane, this is not the case. We will come back to this point later. Taking only the xy-coordinates in Fig. 17.3 we make a calculation. Using an addition theorem of trigonometric functions, we get

$$x' = r\cos(\theta + \alpha) = r(\cos\alpha\cos\theta - \sin\alpha\sin\theta) = x\cos\theta - y\sin\theta, \tag{17.9}$$

$$y' = r\sin(\theta + \alpha) = r(\sin\alpha\cos\theta + \cos\alpha\sin\theta) = y\cos\theta + x\sin\theta, \tag{17.10}$$

where we used $x = r\cos\alpha$ and $y = r\sin\alpha$. Combining (17.9) and (17.10), as a matrix form we get

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}. \tag{17.11}$$

Fig. 17.3 Transformation of the xy-coordinates by a $\theta$ rotation

Equation (17.11) is the same as (11.31) and represents a transformation matrix of a rotation angle $\theta$ within the xy-plane. Whereas in Chap. 11 we considered this from the point of view of the transformation of basis vectors, here we deal with the transformations of coordinates in the fixed coordinate system. If $\theta$ is not an integer multiple of $\pi$, off-diagonal elements do not vanish. Including the z-component we get (17.8). Let us summarize symmetry operations and their (3, 3) matrix representations. The coordinates before and after a symmetry operation are expressed as

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}, \text{ respectively.}$$
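The rotation matrix of (17.8) and the addition-theorem result (17.9) and (17.10) can be checked numerically. The sketch below uses arbitrarily chosen values for the angles and radius, and helper names that are illustrative only; it also tests the earlier remark that a rotation with nonvanishing off-diagonal elements need not commute with other operations, using a mirror in the zx-plane as the assumed second operation.

```python
import math

def rot_z(theta):
    """Matrix of (17.8): rotation by theta around the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, v):
    """Apply a 3x3 matrix to a column vector given as a list."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices within a tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
M_ZX = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]  # mirror in zx-plane

theta, alpha, r = 0.7, 0.3, 2.0          # arbitrary test values
point = [r * math.cos(alpha), r * math.sin(alpha), 1.5]

rotated = apply(rot_z(theta), point)
# (17.9), (17.10): the polar angle increases from alpha to theta + alpha
expected = [r * math.cos(theta + alpha), r * math.sin(theta + alpha), 1.5]

R = rot_z(theta)
is_orthogonal = close(matmul(R, transpose(R)), IDENTITY)
commutes_with_mirror = close(matmul(R, M_ZX), matmul(M_ZX, R))
commutes_at_pi = close(matmul(rot_z(math.pi), M_ZX), matmul(M_ZX, rot_z(math.pi)))
```

For the generic angle the rotation and the mirror do not commute, whereas at a rotation angle of π the matrix becomes diagonal and the two operations commute.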

(i) Identity transformation: To leave a geometric object or a coordinate system unchanged (or unmoved). By convention, we denote it by a capital letter E. It is represented by a (3, 3) identity matrix.

(ii) Rotation symmetry around a rotation axis: Here a “proper” rotation is intended. We denote a rotation by a rotation axis and its magnitude (i.e., rotation angle). Thus, we have

$$R_{z\theta} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad R_{y\phi} = \begin{pmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{pmatrix}, \quad R_{x\varphi} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\varphi & -\sin\varphi \\ 0 & \sin\varphi & \cos\varphi \end{pmatrix}. \tag{17.12}$$

Fig. 17.4 Rotation by $\phi$ around the y-axis

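With $R_{y\phi}$ we first consider the following coordinate transformation:

$$\begin{pmatrix} z' \\ x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} z \\ x \\ y \end{pmatrix}. \tag{17.13}$$

This can easily be visualized in Fig. 17.4; consider that cyclic permutation of $\begin{pmatrix} x \\ y \\ z \end{pmatrix}$ produces $\begin{pmatrix} z \\ x \\ y \end{pmatrix}$. Shuffling the order of coordinates, we get

$$\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{17.14}$$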

By convention of the symmetry groups, the notation $C_n$ is used to denote a rotation. The subscript $n$ of $C_n$ represents the order of the rotation axis. The order means the largest number $n$ such that the rotation through $2\pi/n$ gives an equivalent configuration. Successive rotations of $m$ times ($m < n$) are denoted by $C_n^m$. If $m = n$, the successive rotations produce an equivalent configuration same as the beginning; i.e., $C_n^n = E$. The rotation angles $\theta$, $\varphi$, etc. used above are restricted to $2\pi m/n$ accordingly.

(iii) Mirror symmetry with respect to a plane of mirror symmetry: We denote a mirror symmetry by a mirror symmetry plane (e.g., xy-plane, yz-plane). We have

$$M_{xy} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \quad M_{yz} = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad M_{zx} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.15}$$

The mirror symmetry is usually denoted by $\sigma_v$, $\sigma_h$, and $\sigma_d$, whose subscripts stand for “vertical,” “horizontal,” and “dihedral,” respectively. Among these symmetry operations, $\sigma_v$ and $\sigma_d$ include a rotation axis in the symmetry plane, while $\sigma_h$ is perpendicular to the rotation axis if such an axis exists. Notice that a group belonging to $C_s$ symmetry possesses only E and $\sigma_h$. Although $\sigma_h$ can exist by itself as a mirror symmetry, neither $\sigma_v$ nor $\sigma_d$ can exist as a mirror symmetry by itself. We will come back to this point later.

(iv) Inversion symmetry with respect to a center of inversion: We specify an inversion center if necessary, e.g., an origin of a coordinate system O. The operation of inversion is denoted by

$$I_O = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}. \tag{17.16}$$

Note that, as is obvious from the matrix form, $I_O$ commutes with any other symmetry operation. Note also that $I_O$ can be expressed as successive symmetry operations or a product of symmetry operations. For instance, we have

$$I_O = R_{z\pi} M_{xy} = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}. \tag{17.17}$$

Note that $R_{z\pi}$ and $M_{xy}$ are commutable; i.e., $R_{z\pi}M_{xy} = M_{xy}R_{z\pi}$.


(v) Improper rotation: This is a combination of a proper rotation and a reflection by a mirror symmetry plane. That is, rotation around an axis is performed first and then reflection is carried out by a mirror plane that is perpendicular to the rotation axis. For instance, an improper rotation is expressed as

$$M_{xy}R_{z\theta} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & -1 \end{pmatrix}. \tag{17.18}$$

As mentioned just above, the inversion symmetry I can be viewed as an improper rotation. Note that in this case the reflection and rotation operations are commutable. However, we will follow a conventional custom that considers the inversion symmetry as an independent symmetry operation. Readers may well wonder why we need to consider the improper rotation. The answer is simple; it solely rests upon the axiom (A1) of the group theory. A group must be closed with respect to the multiplication. The improper rotations are usually denoted by Sn. A subscript n again stands for an order of rotation.
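The relations among the operations listed in (i) to (v) can be verified numerically. The following sketch builds the improper rotation of (17.18) for an axis of order n and checks a few consequences; the function names and the particular checks are illustrative assumptions rather than material from the text.

```python
import math

def rot_z(theta):
    """Proper rotation by theta around the z-axis, as in (17.12)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def power(m, k):
    """k-fold product of the matrix m with itself (k >= 0)."""
    result = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for _ in range(k):
        result = matmul(result, m)
    return result

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
M_XY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]       # (17.15)
INVERSION = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]  # (17.16)

def improper_rot_z(n):
    """S_n about z: rotation by 2*pi/n followed by reflection in the xy-plane."""
    return matmul(M_XY, rot_z(2.0 * math.pi / n))

inversion_from_product = close(matmul(rot_z(math.pi), M_XY), INVERSION)  # (17.17)
s2_is_inversion = close(improper_rot_z(2), INVERSION)
s4 = improper_rot_z(4)
s4_squared_is_c2 = close(power(s4, 2), rot_z(math.pi))
s4_fourth_is_identity = close(power(s4, 4), IDENTITY)
s4_determinant = det3(s4)
```

The determinant of an improper rotation is −1, in contrast to +1 for a proper rotation, which is why products of improper rotations must be included for the set of operations to be closed.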

17.2 Successive Symmetry Operations

Let us now consider successive reflections in different planes and successive rotations about different axes [2]. Figure 17.5a displays two reflections with respect to the planes $\sigma$ and $\tilde{\sigma}$, both perpendicular to the xy-plane. The said planes make a dihedral angle $\theta$ with their intersection line identical to the z-axis. Also, the plane $\sigma$ is identical with the zx-plane. Suppose that an arrow lies on the xy-plane perpendicularly to the zx-plane. As in (17.15), an operation $\sigma$ is represented as

$$\sigma = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.19}$$

To determine a matrix representation of $\tilde{\sigma}$, we calculate a matrix again as in the above case. As a result we have

$$\tilde{\sigma} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ \sin 2\theta & -\cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.20}$$

Fig. 17.5 Successive two reflections about two planes $\sigma$ and $\tilde{\sigma}$ that make an angle $\theta$. (a) Reflections $\sigma$ and $\tilde{\sigma}$ with respect to two planes. (b) Successive operations of $\sigma$ and $\tilde{\sigma}$ in this order. The combined operation is denoted by $\tilde{\sigma}\sigma$. The operations result in a $2\theta$ rotation around the z-axis. (c) Successive operations of $\tilde{\sigma}$ and $\sigma$ in this order. The combined operation is denoted by $\sigma\tilde{\sigma}$. The operations result in a $-2\theta$ rotation around the z-axis

Notice that this matrix representation is referred to the original xyz-coordinate system; see discussion of Sect. 11.4. Hence, we describe the successive transformations $\sigma$ followed by $\tilde{\sigma}$ as

$$\tilde{\sigma}\sigma = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.21}$$

The expression (17.21) means that the multiplication should be done first by $\sigma$ and then by $\tilde{\sigma}$; see Fig. 17.5b. Note that $\sigma$ and $\tilde{\sigma}$ are conjugate to each other (Sect. 16.3). In this case, the combined operations produce a $2\theta$ rotation around the z-axis. If, on the other hand, the multiplication is made first by $\tilde{\sigma}$ and then by $\sigma$, we have a $-2\theta$ rotation around the z-axis; see Fig. 17.5c. As a matrix representation, we have

$$\sigma\tilde{\sigma} = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ -\sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.22}$$

Thus, successive operations of reflection by the planes that make a dihedral angle $\theta$ yield a rotation $\pm 2\theta$ around the z-axis (i.e., the intersection line of $\sigma$ and $\tilde{\sigma}$). The operation $\sigma\tilde{\sigma}$ is an inverse to $\tilde{\sigma}\sigma$. That is,

$$(\sigma\tilde{\sigma})(\tilde{\sigma}\sigma) = E. \tag{17.23}$$

We have $\det \sigma\tilde{\sigma} = \det \tilde{\sigma}\sigma = 1$. Meanwhile, putting

$$R_{2\theta} = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \tag{17.24}$$

we have

$$\tilde{\sigma}\sigma = R_{2\theta} \quad \text{or} \quad \tilde{\sigma} = R_{2\theta}\sigma. \tag{17.25}$$

This implies the following: Suppose a plane $\sigma$ and a straight line on it. Also suppose that one first makes a reflection about $\sigma$ and then makes a $2\theta$ rotation around the said straight line. Then, the resulting transformation is equivalent to a reflection about $\tilde{\sigma}$ that makes an angle $\theta$ with $\sigma$. At the same time, the said straight line is an intersection line of $\sigma$ and $\tilde{\sigma}$. Note that a dihedral angle between the two planes $\sigma$ and $\tilde{\sigma}$ is half an angle of the rotation. Thus, any two of the symmetry operations related to (17.25) are mutually dependent; any two of them produce the third symmetry operation. In the above illustration, we did not take account of the presence of a symmetry axis. If the aforementioned axis is a symmetry axis $C_n$, we must have

$$2\theta = 2\pi/n \quad \text{or} \quad n = \pi/\theta. \tag{17.26}$$

Fig. 17.6 Successive two $\pi$ rotations around two $C_2$ axes of $C_{x\pi}$ and $C_{a\pi}$ that intersect at an angle $\theta$. Of these, $C_{x\pi}$ is identical to the x-axis (not shown)
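The statement that two reflections in planes meeting at a dihedral angle θ amount to a rotation by ±2θ can be checked numerically. The sketch below parametrizes a mirror plane containing the z-axis by its angle from the zx-plane, following the form of (17.20); the function names and the choice n = 5 are illustrative assumptions.

```python
import math

def rot_z(theta):
    """Rotation by theta around the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def mirror_through_z(phi):
    """Reflection in the plane that contains the z-axis and makes an angle
    phi with the zx-plane; phi = 0 reproduces (17.19), phi = theta (17.20)."""
    c2, s2 = math.cos(2.0 * phi), math.sin(2.0 * phi)
    return [[c2, s2, 0.0], [s2, -c2, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def power(m, k):
    result = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for _ in range(k):
        result = matmul(result, m)
    return result

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

n = 5                          # assumed order of the rotation axis
theta = math.pi / n            # dihedral angle from (17.26)
sigma = mirror_through_z(0.0)
sigma_tilde = mirror_through_z(theta)

forward = matmul(sigma_tilde, sigma)     # sigma first, then sigma_tilde
backward = matmul(sigma, sigma_tilde)    # sigma_tilde first, then sigma

forward_is_rotation = close(forward, rot_z(2.0 * theta))        # (17.21)
backward_is_inverse_rotation = close(backward, rot_z(-2.0 * theta))  # (17.22)
product_is_identity = close(matmul(backward, forward), IDENTITY)     # (17.23)
n_fold_returns_to_identity = close(power(forward, n), IDENTITY)
```

With θ = π/n the combined operation is a rotation by 2π/n, so applying it n times restores the original configuration.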

From a symmetry requirement, there should be $n$ planes of mirror symmetry in combination with the $C_n$ axis. Moreover, an intersection line of these $n$ mirror symmetry planes should coincide with that $C_n$ axis. This can be seen in various $C_{nv}$ groups. Next we consider other successive symmetry operations. Suppose that there are two $C_2$ axes ($C_{x\pi}$ and $C_{a\pi}$ as shown) that intersect at an angle $\theta$ (Fig. 17.6). There, $C_{x\pi}$ is identical to the x-axis and the other ($C_{a\pi}$) lies on the xy-plane making an angle $\theta$ with $C_{x\pi}$. Following procedures similar to the above, we have matrix representations of the successive $C_2$ operations in reference to the xyz-system such that

$$C_{x\pi} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix},$$

$$C_{a\pi} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ \sin 2\theta & -\cos 2\theta & 0 \\ 0 & 0 & -1 \end{pmatrix}, \tag{17.27}$$

where $C_{a\pi}$ can be calculated similarly to (17.20). Again we get

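$$C_{a\pi}C_{x\pi} = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad C_{x\pi}C_{a\pi} = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ -\sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.28}$$

Notice that $(C_{x\pi}C_{a\pi})(C_{a\pi}C_{x\pi}) = E$. Once again putting

$$R_{2\theta} = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \tag{17.29}$$

we have [1, 2]

$$C_{a\pi}C_{x\pi} = R_{2\theta} \quad \text{or} \quad C_{a\pi} = R_{2\theta}C_{x\pi}. \tag{17.30}$$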
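The relations (17.28) to (17.30) for two intersecting $C_2$ axes can be checked in the same numerical manner. The sketch below parametrizes a $C_2$ axis lying in the xy-plane by its angle from the x-axis, using the form of (17.27); the function names and the test angle are illustrative assumptions.

```python
import math

def rot_z(theta):
    """Rotation by theta around the z-axis, as in (17.29) with theta -> 2*theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def c2_in_xy_plane(phi):
    """Pi rotation around an axis lying in the xy-plane at angle phi from the
    x-axis; phi = 0 gives C_x_pi and phi = theta gives C_a_pi of (17.27)."""
    c2, s2 = math.cos(2.0 * phi), math.sin(2.0 * phi)
    return [[c2, s2, 0.0], [s2, -c2, 0.0], [0.0, 0.0, -1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

theta = 0.4                      # arbitrary test angle between the two axes
c_x = c2_in_xy_plane(0.0)
c_a = c2_in_xy_plane(theta)

ca_cx = matmul(c_a, c_x)         # C_x_pi first, then C_a_pi
cx_ca = matmul(c_x, c_a)         # C_a_pi first, then C_x_pi

ca_cx_is_rotation = close(ca_cx, rot_z(2.0 * theta))            # (17.30)
cx_ca_is_inverse_rotation = close(cx_ca, rot_z(-2.0 * theta))   # (17.28)
products_are_inverse = close(matmul(cx_ca, ca_cx), IDENTITY)
c2_squared_is_identity = close(matmul(c_a, c_a), IDENTITY)
c_a_determinant = det3(c_a)
```

Each $C_2$ operation is a proper rotation with determinant +1 and is its own inverse, and the product of the two is a rotation by 2θ around the axis perpendicular to both.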

Note that in the above two illustrations for the successive symmetry operations, both the relevant operators have been represented in reference to the original xyz-system. For this reason, the latter operation was done from the left in (17.28). From a symmetry requirement, once again the aforementioned C2 axes must be present in combination with the Cn axis. Moreover, those C2 axes should be perpendicular to the Cn axis. This can be seen in various Dn groups. Another illustration of successive symmetry operations is an improper rotation. If the rotation angle is π, this causes an inversion symmetry. In this illustration, reflection, rotation, and inversion symmetries coexist. A C2h symmetry is a typical example. Equations (17.21), (17.22), and (17.29) demonstrate the same relation. Namely, two successive mirror symmetry operations about a couple of planes and two successive π-rotations about a couple of C2 axes cause the same effect with regard to the geometric transformation. In this relation, we emphasize that two successive reflection operations make a determinant of relevant matrices 1. These aspects cause an interesting effect and we will briefly discuss it in relation to O and Td groups. If furthermore the abovementioned mirror symmetry planes and C2 axes coexist, the symmetry planes coincide with or bisect the C2 axes and vice versa. If these were not the case, another mirror plane or C2 axis would be generated from the symmetry requirement and the newly generated plane or axis would be coupled with the original plane or axis. From the above argument, these processes again produce another Cn axis. That must be prohibited. Next, suppose that a Cn axis intersects obliquely with a plane of mirror symmetry. A rotation of 2π/n around such an axis produces another mirror symmetry plane.

Fig. 17.7 Mirror symmetry plane $\sigma_h$ and a sole rotation axis perpendicular to it

This newly generated plane intersects with the original mirror plane and produces a different $C_n$ axis according to the above discussion. Thus, in this situation a mirror symmetry plane cannot coexist with a sole rotation axis. In a geometric object with higher symmetry such as $O_h$, however, several mirror symmetry planes can coexist with several rotation axes in such a way that the axes intersect with the mirror planes obliquely. In case the $C_n$ axis intersects perpendicularly with a mirror symmetry plane, that plane can coexist with a sole rotation axis (see Fig. 17.7). This is actually the case with a geometric object having a $C_{2h}$ symmetry. The mirror symmetry plane is denoted by $\sigma_h$. Now, let us examine simple examples of molecules and associated symmetry operations.

Example 17.2 Figure 17.8 shows chemical structural formulae of thiophene, bithiophene, biphenyl, and naphthalene. These molecules belong to $C_{2v}$, $C_{2h}$, $D_2$, and $D_{2h}$, respectively. Note that these symbols are normally used to show specific point groups. Notice also that in biphenyl two benzene rings are twisted relative to the molecular axis. As an example, a multiplication table is shown in Table 17.1 for a $C_{2v}$ group. Table 17.1 clearly demonstrates that the group constitution of $C_{2v}$ differs from that of the group appearing in Example 16.1 (iii), even though the order is four in both cases. Similar tables are given with $C_{2h}$ and $D_2$. This is left for readers as an exercise. We will find that the multiplication tables of these groups have the same structure and that $C_{2v}$, $C_{2h}$, and $D_2$ are all isomorphic to one another as a four group. Table 17.2 gives the matrix representation of symmetry operations for $C_{2v}$. The representation is defined as the transformation by the symmetry operations of a set of basis vectors (x y z) in $\mathbb{R}^3$. Meanwhile, Table 17.3 gives the multiplication table of $D_{2h}$. We recognize that the multiplication table of $C_{2v}$ appears on the upper left and lower right blocks.
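The multiplication table of $C_{2v}$ and the subgroup structure of $D_{2h}$ discussed in this example can be reproduced by brute force. The sketch below stores each diagonal matrix by its diagonal entries and treats $D_{2h}$ as the set of all eight diagonal matrices with entries ±1; the labels and helper names are illustrative assumptions.

```python
from itertools import combinations, product

def mul(p, q):
    """Product of two diagonal matrices given by their diagonals."""
    return tuple(a * b for a, b in zip(p, q))

# Diagonals of the C2v operations (z is the C2 axis)
C2V = {
    "E": (1, 1, 1),
    "C2(z)": (-1, -1, 1),
    "sigma_v(zx)": (1, -1, 1),
    "sigma_v'(yz)": (-1, 1, 1),
}
name_of = {diag: name for name, diag in C2V.items()}

# table[(row, column)] is the product row * column
c2v_table = {
    (a, b): name_of[mul(C2V[a], C2V[b])] for a in C2V for b in C2V
}

# D2h: identity, inversion, three C2 axes, three mirror planes
E = (1, 1, 1)
D2H = set(product((1, -1), repeat=3))

def is_closed(subset):
    """For a finite set containing E, closure is enough to form a subgroup."""
    return all(mul(p, q) in subset for p in subset for q in subset)

candidates = [
    {E, *triple} for triple in combinations(sorted(D2H - {E}), 3)
]
order_four_subgroups = [s for s in candidates if is_closed(s)]
number_of_candidates = len(candidates)
```

Out of the 35 candidate sets that contain the identity, exactly seven are closed under multiplication and hence form subgroups of order four.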
Fig. 17.8 Chemical structural formulae and point groups of (a) thiophene, (b) bithiophene, (c) biphenyl, and (d) naphthalene

Table 17.1 Multiplication table of C2v

| C2v | E | C2(z) | σv(zx) | σv′(yz) |
|---|---|---|---|---|
| E | E | C2 | σv | σv′ |
| C2(z) | C2 | E | σv′ | σv |
| σv(zx) | σv | σv′ | E | C2 |
| σv′(yz) | σv′ | σv | C2 | E |

Table 17.2 Matrix representation of symmetry operations for C2v

$$E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad C_2\ (\text{around } z\text{-axis}) = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \sigma_v(zx) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \sigma_v'(yz) = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

If we suitably rearrange the order of group elements, we can make another multiplication table so that, e.g., C2h may appear on the upper left and lower right blocks. As in the case of Table 17.2, Table 17.4 summarizes the matrix representation of symmetry operations for D2h. There are eight group elements, i.e., an identity, an inversion, three mutually perpendicular C2 axes, and three mutually perpendicular planes of mirror symmetry (σ). Here, we consider a possibility of constructing subgroups of D2h. The order of the subgroups must be a divisor of eight, and so let us list subgroups whose order is four and examine how many subgroups exist. We have ${}_8C_4 = 70$ combinations, but those allowed should be restricted from the requirement

of forming a group. Since every group must contain the identity element, the number allowed is no greater than ${}_7C_3 = 35$.

(i) In light of the aforementioned discussion, two C2 axes mutually intersecting at π/2 yield another C2 axis around the normal to a plane defined by the intersecting axes. Thus, three C2 axes have been chosen and a D2 symmetry results. In this case, we have only one choice.

(ii) In the case of C2v, two planes mutually intersecting at π/2 yield a C2 axis around their line of intersection. There are three possibilities of choosing two planes out of three (i.e., ${}_3C_2 = 3$).

(iii) If we choose the inversion (i) along with, e.g., one of the three C2 axes, a σ necessarily results. This is also the case when we first combine a σ with i to obtain a C2 axis. We have three possibilities (i.e., ${}_3C_1 = 3$) as well.

Thus, we have only seven choices to construct subgroups of D2h having an order of four. This is summarized in Table 17.5. An inverse of any element is that element itself. Therefore, if with any above subgroup one chooses any element out of the remaining four elements and combines it with the identity, one can construct a subgroup Cs, C2, or Ci of an order of 2. Since all those subgroups of an order of 4 and 2 are commutative with D2h, these subgroups are invariant subgroups. Thus, in terms of a direct-product group, D2h can be expressed by various direct factors. Conversely, we can construct factor groups from coset decomposition. For instance, we have

$$D_{2h}/C_{2v} \cong C_s, \quad D_{2h}/C_{2v} \cong C_2, \quad D_{2h}/C_{2v} \cong C_i. \tag{17.31}$$

Table 17.3 Multiplication table of D2h

| D2h | E | C2(z) | σv(zx) | σv′(yz) | i | σv″(xy) | C2′(y) | C2″(x) |
|---|---|---|---|---|---|---|---|---|
| E | E | C2 | σv | σv′ | i | σv″ | C2′ | C2″ |
| C2(z) | C2 | E | σv′ | σv | σv″ | i | C2″ | C2′ |
| σv(zx) | σv | σv′ | E | C2 | C2′ | C2″ | i | σv″ |
| σv′(yz) | σv′ | σv | C2 | E | C2″ | C2′ | σv″ | i |
| i | i | σv″ | C2′ | C2″ | E | C2 | σv | σv′ |
| σv″(xy) | σv″ | i | C2″ | C2′ | C2 | E | σv′ | σv |
| C2′(y) | C2′ | C2″ | i | σv″ | σv | σv′ | E | C2 |
| C2″(x) | C2″ | C2′ | σv″ | i | σv′ | σv | C2 | E |
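The counting of order-four subgroups of D2h can be checked by brute force. The following is only a minimal sketch: it assumes the diagonal matrices of Table 17.4 (stored by their diagonals), and the names `D2H`, `multiply`, and `is_closed` are illustrative choices, not notation from the text. Because every element of D2h is its own inverse, a set containing E is a subgroup as soon as it is closed under multiplication.

```python
from itertools import combinations

# Diagonal of each D2h matrix in the (x, y, z) basis, as listed in Table 17.4.
D2H = {
    "E": (1, 1, 1),
    "C2(z)": (-1, -1, 1),
    "C2(y)": (-1, 1, -1),
    "C2(x)": (1, -1, -1),
    "i": (-1, -1, -1),
    "sigma(xy)": (1, 1, -1),
    "sigma(zx)": (1, -1, 1),
    "sigma(yz)": (-1, 1, 1),
}
NAME_OF = {diag: name for name, diag in D2H.items()}


def multiply(a, b):
    """Product of two operations; diagonal matrices multiply entry by entry."""
    return NAME_OF[tuple(p * q for p, q in zip(D2H[a], D2H[b]))]


def is_closed(names):
    """True if the product of any two listed operations stays in the list."""
    return all(multiply(a, b) in names for a in names for b in names)


# Every candidate contains E plus three of the seven remaining operations.
non_identity = [name for name in D2H if name != "E"]
candidates = [("E",) + trio for trio in combinations(non_identity, 3)]
order4_subgroups = [c for c in candidates if is_closed(c)]

print(len(candidates))        # candidate sets containing E
print(len(order4_subgroups))  # sets that actually form a group
for subgroup in order4_subgroups:
    print(subgroup)
```

Of the 35 candidate sets, seven are closed: one D2, three C2v, and three C2h, in agreement with the choices (i) to (iii).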

In turn, we express direct-product groups as, e.g.,

$$D_{2h} = C_{2v} \otimes C_s, \quad D_{2h} = C_{2v} \otimes C_2, \quad D_{2h} = C_{2v} \otimes C_i.$$

Example 17.3 Figure 17.9 shows an equilateral triangle placed on the xy-plane of a three-dimensional Cartesian coordinate system. Orthonormal basis vectors e1 and e2 are designated as shown. A chemical example is boron trifluoride (BF3). In the molecule, a boron atom is positioned at the molecular center with three fluorine atoms located at the vertices of the equilateral triangle. The boron atom and fluorine atoms form a planar molecule (Fig. 17.10). The point group D3h comprises twelve symmetry operations such that

$$D_{3h} = \{E, C_3, C_3', C_2, C_2', C_2'', \sigma_h, S_3, S_3', \sigma_v, \sigma_v', \sigma_v''\}, \tag{17.32}$$

where symmetry operations of the same species but distinct operation are denoted by a “prime” or “double prime.” Representing these operations by matrices is straightforward in most cases. For instance, a matrix for $\sigma_v$ is given by $M_{yz}$ of (17.15). However, we should make some matrix calculations about $C_2'$, $C_2''$, $\sigma_v'$, and $\sigma_v''$. To determine a matrix representation of, e.g., $\sigma_v'$ in reference to the xy-coordinate system with orthonormal basis vectors $e_1$ and $e_2$, we consider the x′y′-coordinate system with orthonormal basis vectors $e_1'$ and $e_2'$ (see Fig. 17.9). A transformation matrix between the two sets of basis vectors is represented by

$$(e_1'\ e_2'\ e_3') = (e_1\ e_2\ e_3) R_{z\frac{\pi}{6}} = (e_1\ e_2\ e_3) \begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} & 0 \\ \frac{1}{2} & \frac{\sqrt{3}}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.33}$$

This representation corresponds to (11.69). Let $\Sigma_v$ be a reflection with respect to the z′x′-plane. This is the same operation as $\sigma_v'$. However, its matrix representation is different. This is because $\Sigma_v$ is represented in reference to the x′y′z′-system, while $\sigma_v'$ is in reference to the xyz-system. The matrix representation of $\Sigma_v$ is simple and expressed as

$$\Sigma_v = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.34}$$

Referring to (11.80), we have

$$R_{z\frac{\pi}{6}} \Sigma_v = \sigma_v' R_{z\frac{\pi}{6}} \quad \text{or} \quad \sigma_v' = R_{z\frac{\pi}{6}} \Sigma_v \left[ R_{z\frac{\pi}{6}} \right]^{-1}. \tag{17.35}$$

Table 17.4 Matrix representation of symmetry operations for D2h

$$E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad C_2(z) = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad C_2(y) = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \quad C_2(x) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix},$$

$$i = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \quad \sigma(xy) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \quad \sigma(zx) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \sigma(yz) = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

Table 17.5 Choice of symmetry operations for construction of subgroups of D2h

| Subgroup | E | C2(z) | σ | i | Choice |
|---|---|---|---|---|---|
| D2 | 1 | 3 | 0 | 0 | 1 |
| C2v | 1 | 1 | 2 | 0 | 3 |
| C2h | 1 | 1 | 1 | 1 | 3 |

Thus, we see that in the first equation of (17.35) the order of multiplication is reversed depending on whether the latter operation is expressed in the xyz-system or in the x′y′z′-system. As a full matrix representation, we get

$$\sigma_v' = \begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} & 0 \\ \frac{1}{2} & \frac{\sqrt{3}}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \frac{\sqrt{3}}{2} & \frac{1}{2} & 0 \\ -\frac{1}{2} & \frac{\sqrt{3}}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} & 0 \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.36}$$

Notice that this matrix representation is referred to the original xyz-coordinate system as before. Graphically, (17.35) corresponds to the multiplication of the symmetry operations done in the order of (i) −π/6 rotation, (ii) reflection (denoted by $\Sigma_v$), and (iii) π/6 rotation (Fig. 17.9). The associated matrices are multiplied from the left. Similarly, with $C_2'$ we have

$$C_2' = \begin{pmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} & 0 \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} & 0 \\ 0 & 0 & -1 \end{pmatrix}. \tag{17.37}$$

Fig. 17.9 Equilateral triangle placed on the xy-plane. Several symmetry operations of D3h are shown. We consider the successive operations (i), (ii), and (iii) to represent $\sigma_v'$ in reference to the xy-coordinate system (see text)

Fig. 17.10 Boron trifluoride (BF3) belonging to a D3h point group

The matrix form of (17.37) can also be decided in a manner similar to the above according to three successive operations shown in Fig. 17.9. In terms of classes, $\sigma_v$, $\sigma_v'$, and $\sigma_v''$ form a conjugacy class and $C_2$, $C_2'$, and $C_2''$ form another conjugacy class. With regard to the reflection and rotation, we have $\det \sigma_v' = -1$ and $\det C_2 = 1$, respectively.
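The similarity transformation of (17.35) is easy to verify numerically. The sketch below uses only the standard library; the helper names (`matmul`, `rot_z`, `det3`, `conjugate_by_rotation`) are illustrative, and it assumes that the C2′ axis coincides with the x′-axis of Fig. 17.9, which is an assumption made here for the check.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rot_z(theta):
    """Counterclockwise rotation by theta around the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def conjugate_by_rotation(op, theta):
    """Return R op R^{-1} with R = rot_z(theta), following (17.35)."""
    return matmul(matmul(rot_z(theta), op), rot_z(-theta))


SIGMA_ZX = [[1, 0, 0], [0, -1, 0], [0, 0, 1]]  # reflection in the z'x'-plane
C2_X = [[1, 0, 0], [0, -1, 0], [0, 0, -1]]     # pi rotation around the x'-axis (assumed)

sigma_v_prime = conjugate_by_rotation(SIGMA_ZX, math.pi / 6)
c2_prime = conjugate_by_rotation(C2_X, math.pi / 6)

for row in sigma_v_prime:
    print([round(x, 4) for x in row])
print(round(det3(sigma_v_prime)), round(det3(c2_prime)))
```

The two results differ only in the sign of the (3, 3) element, and the determinants come out as −1 for the reflection and 1 for the rotation.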

17.3 O and Td Groups

As a geometric object (or a molecule) has a higher symmetry, we have to deal with many symmetry operations and the relationships between them. As an example, we consider O and Td groups. Both groups have 24 symmetry operations and are isomorphic. Let us think of the group O first. We start with considering rotations of π/2 around the x-, y-, and z-axes. The matrices representing these rotations are obtained from (17.12) to give

$$R_{x\frac{\pi}{2}} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \quad R_{y\frac{\pi}{2}} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \quad R_{z\frac{\pi}{2}} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.38}$$

We continue multiplications of these matrices so that the matrices can make up a complete set (i.e., a closure). Counting those matrices, we have 24 of them and they form a group termed O. The group O is a pure rotation group. Here, the pure rotation group is defined as a group whose group elements only comprise proper rotations (with a determinant of 1). An example is shown in Fig. 17.11, where individual vertices of the cube have three arrows so that the cube does not possess mirror symmetries or inversion symmetry. The group O has five conjugacy classes. Figure 17.12 summarizes them. Geometrical characteristics of individual classes are sketched as well. These classes are categorized by the trace of the matrix. This is because the trace is kept unchanged by a similarity transformation. (Remember that elements of the same conjugacy class are connected with a similarity transformation.)

Having a look at the sketches, we notice that each operation switches the basis vectors e1, e2, and e3, i.e., the x-, y-, and z-axes. Therefore, the presence of diagonal elements (either 1 or −1) implies that the matrix takes the basis vector(s) as an eigenvector with respect to the rotation. The corresponding eigenvalue(s) are either 1 or −1 accordingly. This is expected from the fact that the matrix is an orthogonal matrix (i.e., unitary). The trace, namely a summation of diagonal elements, is closely related to the geometrical feature of the operation. The operations of a π rotation around the x-, y-, and z-axes and those of a π rotation around an axis bisecting any two of the three axes have a trace −1. The former operations take all the basis vectors as eigenvectors; that is, all the diagonal elements are nonvanishing. With the latter operations, however, only one diagonal element is −1. This feature comes from the fact that the bisected axes are switched by the rotation, whereas the remaining axis is reversed by the rotation.
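The closure described above can be carried out mechanically. The sketch below multiplies the three matrices of (17.38) until no new matrix appears and then tallies the traces; the function and variable names (`generate`, `GROUP_O`, `trace_counts`) are illustrative only.

```python
from collections import Counter

# pi/2 rotations around the x-, y-, and z-axes, as in (17.38)
RX = ((1, 0, 0), (0, 0, -1), (0, 1, 0))
RY = ((0, 0, 1), (0, 1, 0), (-1, 0, 0))
RZ = ((0, -1, 0), (1, 0, 0), (0, 0, 1))
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))


def matmul(a, b):
    """Product of two 3x3 integer matrices stored as tuples of tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))


def trace(m):
    return m[0][0] + m[1][1] + m[2][2]


def generate(generators):
    """Multiply by the generators repeatedly until the set of matrices is closed."""
    group = {IDENTITY}
    pending = [IDENTITY]
    while pending:
        g = pending.pop()
        for h in generators:
            product = matmul(g, h)
            if product not in group:
                group.add(product)
                pending.append(product)
    return group


GROUP_O = generate([RX, RY, RZ])
trace_counts = Counter(trace(g) for g in GROUP_O)

print(len(GROUP_O))
print(sorted(trace_counts.items(), reverse=True))
```

The set closes at 24 matrices. The traces 3, 1, 0, and −1 occur 1, 6, 8, and 9 times; the nine matrices with trace −1 are the two classes of π rotations (three and six elements) taken together, so the trace alone does not separate those two classes.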

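Fig. 17.11 Cube whose individual vertices have three arrows for the cube not to possess mirror symmetries. This object belongs to a point group O called pure rotation group

Fig. 17.12 Point group O and its five conjugacy classes. Geometrical characteristics of individual classes are briefly sketched. The matrices of each class are as follows.

χ = 3 (identity):

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

χ = 1 (π/2 rotations around the x-, y-, and z-axes):

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

χ = 0 (2π/3 rotations):

$$\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & 1 \\ -1 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & -1 \\ 1 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & -1 \\ -1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},$$

$$\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & -1 \\ 1 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & -1 \\ -1 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & 1 \\ -1 & 0 & 0 \end{pmatrix}$$

χ = −1 (π rotations around the x-, y-, and z-axes):

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

χ = −1 (π rotations around axes bisecting two of the three axes):

$$\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \begin{pmatrix} 0 & -1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \begin{pmatrix} 0 & 0 & 1 \\ 0 & -1 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & -1 \\ 0 & -1 & 0 \\ -1 & 0 & 0 \end{pmatrix}$$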

Another characteristic is the generation of eight rotation axes that trisect the x-, y-, and z-axes, more specifically, a solid angle π/2 formed by the x-, y-, and z-axes. Since the rotation switches all the x-, y-, and z-axes, the trace is zero. At the same time, we find that this operation belongs to C3. This operation is generated by two successive π/2 rotations around two mutually orthogonal axes. To inspect this situation more closely, we consider a conjugacy class of π/2 rotation that belongs to the C4 symmetry and includes six elements, i.e., $R_{x\frac{\pi}{2}}$, $R_{y\frac{\pi}{2}}$, $R_{z\frac{\pi}{2}}$, $R_{\bar{x}\frac{\pi}{2}}$, $R_{\bar{y}\frac{\pi}{2}}$, and $R_{\bar{z}\frac{\pi}{2}}$. With these notations, e.g., $R_{x\frac{\pi}{2}}$ stands for a π/2 counterclockwise rotation around the x-axis; $R_{\bar{x}\frac{\pi}{2}}$ denotes a π/2 counterclockwise rotation around the −x-axis. Consequently, $R_{\bar{x}\frac{\pi}{2}}$ implies a π/2 clockwise rotation around the x-axis and, hence, an inverse element of $R_{x\frac{\pi}{2}}$. Namely, we have

$$R_{\bar{x}\frac{\pi}{2}} = \left( R_{x\frac{\pi}{2}} \right)^{-1}. \tag{17.39}$$

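Now let us consider two successive rotations. This is denoted by the multiplication of matrices that represent the related rotations. For instance, the multiplication of $R_{x\frac{\pi}{2}}$ and $R'_{y\frac{\pi}{2}}$ produces the following:

$$R_{xyz\frac{2\pi}{3}} = R_{x\frac{\pi}{2}} R'_{y\frac{\pi}{2}} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \tag{17.40}$$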

In (17.40), we define $R_{xyz\frac{2\pi}{3}}$ as a 2π/3 counterclockwise rotation around an axis that trisects the x-, y-, and z-axes. The prime “′” of $R'_{y\frac{\pi}{2}}$ means that the operation is carried out in reference to the new coordinate system reached by the previous operation $R_{x\frac{\pi}{2}}$. For this reason, $R'_{y\frac{\pi}{2}}$ is operated (i.e., multiplied) from the right in (17.40). Compare this with the remark made just after (17.30). Changing the order of $R_{x\frac{\pi}{2}}$ and $R'_{y\frac{\pi}{2}}$, we have

$$R_{xy\bar{z}\frac{2\pi}{3}} = R_{y\frac{\pi}{2}} R'_{x\frac{\pi}{2}} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & -1 \\ -1 & 0 & 0 \end{pmatrix}, \tag{17.41}$$

where $R_{xy\bar{z}\frac{2\pi}{3}}$ is a 2π/3 counterclockwise rotation around an axis that trisects the x-, y-, and −z-axes. Notice that we used $R'_{x\frac{\pi}{2}}$ this time, because it was performed after $R_{y\frac{\pi}{2}}$. Thus, we notice that there are eight related operations whose axes trisect the eight octants of the coordinate system. These operations are further categorized into four sets in which the two elements are inverse elements of each other. For instance, we have

$$R_{\bar{x}\bar{y}\bar{z}\frac{2\pi}{3}} = \left( R_{xyz\frac{2\pi}{3}} \right)^{-1}. \tag{17.42}$$

Notice that a 2π/3 counterclockwise rotation around an axis that trisects the −x-, −y-, and −z-axes is equivalent to a 2π/3 clockwise rotation around an axis that trisects the x-, y-, and z-axes. Also, we have $R_{\bar{x}\bar{y}z\frac{2\pi}{3}} = \left( R_{xy\bar{z}\frac{2\pi}{3}} \right)^{-1}$, etc. Moreover, we have “cyclic” relations such as

$$R_{x\frac{\pi}{2}} R'_{y\frac{\pi}{2}} = R_{y\frac{\pi}{2}} R'_{z\frac{\pi}{2}} = R_{z\frac{\pi}{2}} R'_{x\frac{\pi}{2}} = R_{xyz\frac{2\pi}{3}}. \tag{17.43}$$
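The products in (17.40), (17.41), and (17.43) can be checked with a few lines of integer arithmetic. This is a minimal sketch; `matmul` and `apply` are illustrative helper names, and the vectors (1, 1, 1) and (1, 1, −1) are used as representatives of the two trisecting axes.

```python
# pi/2 rotations around the x-, y-, and z-axes, as in (17.38)
RX = ((1, 0, 0), (0, 0, -1), (0, 1, 0))
RY = ((0, 0, 1), (0, 1, 0), (-1, 0, 0))
RZ = ((0, -1, 0), (1, 0, 0), (0, 0, 1))
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))


def matmul(a, b):
    """Product of two 3x3 integer matrices stored as tuples of tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))


def apply(m, v):
    """Matrix m acting on the column vector v."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


C3_XYZ = matmul(RX, RY)                                   # product of (17.40)
CYCLIC = [matmul(RX, RY), matmul(RY, RZ), matmul(RZ, RX)]  # products of (17.43)
REVERSED = matmul(RY, RX)                                 # product of (17.41)

print(C3_XYZ)
print(all(p == C3_XYZ for p in CYCLIC))
print(REVERSED)
print(apply(C3_XYZ, (1, 1, 1)), apply(REVERSED, (1, 1, -1)))
print(matmul(C3_XYZ, matmul(C3_XYZ, C3_XYZ)) == IDENTITY)
```

The three cyclic products coincide, the reversed product is a different matrix, each product leaves its own trisecting axis unmoved, and applying the product three times returns the identity, as expected of a C3 operation.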

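Returning to Sect. 11.4, we had

$$A[P(x)] = \left[ (e_1 \cdots e_n) P A' \right] \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n) \left[ A_O P \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \right]. \tag{11.79}$$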

The implication of (11.79) is that the LHS is related to the transformation of basis vectors while retaining the coordinates $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and that transformation matrices should be operated on the basis vectors from the right. Meanwhile, the RHS describes the transformation of coordinates, while retaining the basis vectors. In that case, transformation matrices should be operated on the coordinates from the left. Thus, the order of operator multiplication is reversed. Following (11.80), we describe

$$R_{x\frac{\pi}{2}} R'_{y\frac{\pi}{2}} = R_O R_{x\frac{\pi}{2}}, \quad \text{i.e.,} \quad R_O = R_{x\frac{\pi}{2}} R'_{y\frac{\pi}{2}} \left[ R_{x\frac{\pi}{2}} \right]^{-1}, \tag{17.44}$$

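where $R_O$ is viewed in reference to the original (or fixed) coordinate system and is conjugate to $R'_{y\frac{\pi}{2}}$. Thus, we have

$$R_O = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.45}$$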

Note that (17.45) is identical to the matrix representation of a π/2 rotation around the z-axis. This is evident from the fact that the y-axis is converted to the original z-axis by $R_{x\frac{\pi}{2}}$; readers are encouraged to picture this.

We have two conjugacy classes of π rotation (the C2 symmetry). One of them includes six elements, i.e., $R_{xy\pi}$, $R_{yz\pi}$, $R_{zx\pi}$, $R_{x\bar{y}\pi}$, $R_{y\bar{z}\pi}$, and $R_{z\bar{x}\pi}$. For these notations a subscript, e.g., xy stands for an axis that bisects the angle formed by the x- and y-axes. A subscript $x\bar{y}$ denotes an axis bisecting the angle formed by the x- and −y-axes. Another class includes three elements, i.e., $R_{x\pi}$, $R_{y\pi}$, and $R_{z\pi}$.

As for $R_{x\pi}$, $R_{y\pi}$, and $R_{z\pi}$, a combination of these operations should yield a C2 rotation axis as discussed in Sect. 17.2. Of these three rotation axes, in fact, any two produce a C2 rotation around the remaining axis, as is the case with naphthalene belonging to the D2h symmetry (see Sect. 17.2). Regarding the class comprising six π rotation elements, a combination of, e.g., $R_{xy\pi}$ and $R_{x\bar{y}\pi}$ crossing each other at a right angle causes a related effect. For the other combinations, the two C2 axes intersect each other at π/3; see Fig. 17.6 and put θ = π/3 there. In this respect, elementary analytic geometry teaches the positional relationship among planes and straight lines. The argument is as follows: A plane determined by three points $\begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix}$, $\begin{pmatrix} x_2 \\ y_2 \\ z_2 \end{pmatrix}$, and $\begin{pmatrix} x_3 \\ y_3 \\ z_3 \end{pmatrix}$ that do not sit on a line is expressed by the following equation:

$$\begin{vmatrix} x & y & z & 1 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{vmatrix} = 0. \tag{17.46}$$

Fig. 17.13 Rotation axes of C2 and C2′ along with another rotation axis C3 in a point group O. The C2, C2′, and C3 axes pass through the origin O and the points (1, 1, 0), (0, 1, 1), and (1, −1, 1), respectively; the C2 and C2′ axes make an angle π/3

Fig. 17.14 Simple kit that helps visualize the positional relationship among planes and straight lines in three-dimensional space. To make it, follow the next procedures: (i) Take three thick sheets of paper (Sheets 1, 2, and 3) and make slits (dashed lines) as shown. (ii) Insert Sheet 2 into Sheet 1 so that the two sheets can make a right angle. (iii) Insert Sheet 3 into combined Sheets 1 and 2

Substituting $\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}$, and $\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$ for the three points, we have

$$x - y + z = 0. \tag{17.47}$$

Taking account of direction cosines and using Hesse’s normal form, we get

$$\frac{1}{\sqrt{3}} (x - y + z) = 0, \tag{17.48}$$

where the normal to the plane expressed in (17.48) has direction cosines of $\frac{1}{\sqrt{3}}$, $-\frac{1}{\sqrt{3}}$, and $\frac{1}{\sqrt{3}}$ in relation to the x-, y-, and z-axes, respectively.

Therefore, the normal is given by a straight line connecting the origin and $\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}$. In other words, a line connecting the origin and a corner of a cube is the normal to the plane described by (17.48). That plane is formed by two intersecting lines, i.e., rotation axes of C2 and C2′ (see Fig. 17.13 that depicts a cube of each side of 2). These axes make an angle π/3; this can easily be checked by taking an inner product between $\begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}$. These column vectors are the direction cosines of C2 and C2′. On the basis of the discussion of Sect. 17.2, we must have a

rotation axis of C3. That is, this axis trisects a solid angle π/2 shaped by three intersecting sides.

It is sometimes hard to visualize the positional relationship among planes and straight lines in three-dimensional space. It will therefore be useful to make a simple kit to help visualize it. Figure 17.14 gives an illustration.

Another typical example having 24 group elements is Td. A molecule of methane belongs to this symmetry. Table 17.6 collects the relevant symmetry operations and their (3, 3) matrix representations. As in the case of Fig. 17.12, the matrices show how a set of vectors (x y z) are transformed according to the symmetry operations. Comparing it with Fig. 17.12, we immediately recognize that a close relationship between Td and O exists and that these point groups share notable characteristics. (i) Both Td and O consist of five conjugacy classes, each of which contains the same number of symmetry species. (ii) Both Td and O contain a pure rotation group T as a subgroup. The subgroup T consists of 12 group elements E, 8C3, and 3C2. The remaining twelve group elements of Td are symmetry species related to reflection: S4 and σd. The elements 6S4 and 6σd correspond to 6C4 and 6C2 of O, respectively. That is, successive operations of S4 cause similar effects to those of C4 of O. Meanwhile, successive operations of σd are related to those of 6C2 of O.

Table 17.6 Symmetry operations and their matrix representation of Td

E:

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

8C3:

$$\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & 1 \\ -1 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & -1 \\ 1 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & -1 \\ -1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},$$

$$\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & -1 \\ 1 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & -1 \\ -1 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & 1 \\ -1 & 0 & 0 \end{pmatrix}$$

3C2:

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

6S4:

$$\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & 1 \\ 0 & -1 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & -1 \\ 0 & -1 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}$$

6σd:

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & -1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$$

Let us imagine in Fig. 17.15 that a regular tetrahedron is inscribed in a cube. As an example of symmetry operations, suppose that three pairs of planes of σd (six planes in total) are given by equations of x = ±y, y = ±z, and z = ±x. Their Hesse’s normal forms are represented as

$$\frac{1}{\sqrt{2}} (x \pm y) = 0, \tag{17.49}$$

$$\frac{1}{\sqrt{2}} (y \pm z) = 0, \tag{17.50}$$

$$\frac{1}{\sqrt{2}} (z \pm x) = 0. \tag{17.51}$$

Then, a dihedral angle α of the two planes is given by

$$\cos \alpha = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} = \frac{1}{2} \tag{17.52}$$

or

$$\cos \alpha = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} = 0. \tag{17.53}$$

That is, α = π/3 or α = π/2. On the basis of the discussion of Sect. 17.2, the intersection of the two planes must be a rotation axis of C3 or C2. Once again, in the case of C3 the intersection is a straight line connecting the origin and a vertex of the cube. This can readily be verified as follows: For instance, two planes given by x = y and y = z make an angle π/3 and produce an intersection line x = y = z. This line, in turn, connects the origin and a vertex of the cube. If we choose, e.g., two planes x = ±y from the above, these planes make a right angle and their intersection must be a C2 axis. The three C2 axes coincide with the x-, y-, and z-axes. In this light, σd functions similarly to 6C2 of O in that their combinations produce 8C3 or 3C2. Thus, the constitution and operation of Td and O are related.

Fig. 17.15 Regular tetrahedron inscribed in a cube. As an example of symmetry operations, we can choose three pairs of planes of σd (six planes in total) given by equations of x = ±y, y = ±z, and z = ±x

Let us more closely inspect the structure and constitution of O and Td. First we construct a mapping ρ between group elements of O and Td such that

$$\rho : g \in O \longleftrightarrow g' \in T_d, \quad \text{if } g \in T,$$

$$\rho : g \in O \longleftrightarrow -(g')^{-1} \in T_d, \quad \text{if } g \notin T.$$

In the above relation, the minus sign indicates that with an inverse representation matrix R must be replaced with −R. Then, ρ(g) = g′ is an isomorphic mapping. In fact, comparing Fig. 17.12 and Table 17.6, ρ gives identical matrix representation for O and Td. For example, taking the first matrix of S4 of Td, we have

$$-\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}^{-1} = -\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}.$$

The resulting matrix is identical to the first matrix of C4 of O. Thus, we find that Td and O are isomorphic to each other. Both O and Td consist of 24 group elements and are isomorphic to the symmetric group S4; do not confuse it with the same symbol S4 used for a group element of Td. The subgroup T consists of three conjugacy classes E, 8C3, and 3C2. Since T is constructed only by entire classes, it is an invariant subgroup; in this respect see the discussion of Sect. 16.3. The groups O, Td, and T along with Th and Oh form cubic groups [2]. In Table 17.7, we list these cubic groups together with their name and order. Of these, O is a pure rotation subgroup of Oh and T is a pure rotation subgroup of Td and Th.

Symmetric groups are related to permutations of n elements (or objects or numbers). The permutation has already appeared in (11.56) when we defined a determinant of a matrix. That was defined as

$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{pmatrix},$$

where σ means the permutation of the numbers 1, 2, ⋯, n. Therefore, the symmetric group Sn has n! group elements (i.e., different ways of rearrangement). Although we do not dwell on symmetric groups much, we state the following important theorem related to finite groups without proof. Interested readers are referred to the literature [1].

Table 17.7 Several cubic groups and their characteristics

| Notation | Group name | Order | Remark |
|---|---|---|---|
| T | Tetrahedral rotation | 12 | Subgroup of Th, Td |
| Th, Td | Tetrahedral | 24 | |
| O | Octahedral rotation | 24 | Subgroup of Oh |
| Oh | Octahedral | 48 | |

Theorem 17.1: Cayley’s Theorem [1] Every finite group ℊ of order n is isomorphic to a subgroup (containing a whole group) of the symmetric group Sn.
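Cayley's theorem can be made concrete with the four-element group C2v of Table 17.1. In the sketch below each element g is mapped to the permutation that left multiplication by g induces on the list of elements; this "regular representation" construction is a standard way to realize the theorem, although the text itself gives no proof, and the names `TABLE`, `left_permutation`, and `compose` are illustrative.

```python
ELEMENTS = ["E", "C2", "sv", "sv'"]

# TABLE[g][h] is the product g h, copied from the C2v multiplication table.
TABLE = {
    "E":   {"E": "E",   "C2": "C2",  "sv": "sv",  "sv'": "sv'"},
    "C2":  {"E": "C2",  "C2": "E",   "sv": "sv'", "sv'": "sv"},
    "sv":  {"E": "sv",  "C2": "sv'", "sv": "E",   "sv'": "C2"},
    "sv'": {"E": "sv'", "C2": "sv",  "sv": "C2",  "sv'": "E"},
}


def left_permutation(g):
    """Permutation of the indices 0..3 induced by h -> g h."""
    return tuple(ELEMENTS.index(TABLE[g][h]) for h in ELEMENTS)


def compose(p, q):
    """Permutation obtained by applying q first and p afterwards."""
    return tuple(p[q[k]] for k in range(len(q)))


PERMS = {g: left_permutation(g) for g in ELEMENTS}

for g in ELEMENTS:
    print(g, PERMS[g])

# The map g -> permutation respects products and is one-to-one.
is_homomorphism = all(
    PERMS[TABLE[g][h]] == compose(PERMS[g], PERMS[h])
    for g in ELEMENTS for h in ELEMENTS
)
print(is_homomorphism, len(set(PERMS.values())) == len(ELEMENTS))
```

The four permutations are distinct and multiply in the same way as the group elements, so C2v is realized as a subgroup of the symmetric group on four objects.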

17.4

Special Orthogonal Group SO(3)

In Part III and Part IV thus far, we have dealt with a broad class of linear transformations. The related groups are finite groups. Here we will describe characteristics of the special orthogonal group SO(3), a kind of infinite group. The group SO(3) represents rotations in three-dimensional Euclidean space ℝ³. Rotations are made around an axis (a line through the origin) with the origin fixed. The rotation is defined by an azimuth of direction and a magnitude (angle) of rotation. The azimuth of direction is defined by two parameters and the magnitude is defined by one parameter, a rotation angle. Hence, the rotation is defined by three independent parameters. Since those parameters are continuously variable, SO(3) is one of the continuous groups.

Two rotations result in another rotation with the origin again fixed. A reverse rotation is unambiguously defined. An identity transformation is naturally defined. An associative law holds as well. Thus, the relevant rotations form a group, i.e., SO(3). The rotation is represented by a real (3, 3) matrix whose determinant is 1. A matrix representation is uniquely determined once an orthonormal basis is set in ℝ³. Any rotation is represented by a rotation matrix accordingly. The rotation matrix R is defined by

$$R^T R = R R^T = E, \tag{17.54}$$

where R is a real matrix with det R = 1. Notice that we exclude the case where det R = −1. Matrices that satisfy (17.54) with det R = ±1 are referred to as orthogonal matrices that cause orthogonal transformations. Correspondingly, orthogonal groups (represented by orthogonal matrices) contain rotation groups as a special case. In other words, the orthogonal groups contain the rotation groups as a subgroup. An orthogonal group in ℝ³ denoted O(3) contains SO(3) as a subgroup. By the same token, orthogonal matrices contain rotation matrices as a special case. In Sect. 17.2 we treated reflection and improper rotation with a determinant of their matrices being −1. In this section these transformations are excluded and only rotations are dealt with. We focus on geometric characteristics of the rotation groups.


Readers are referred to more detailed representation theory of SO(3) in appropriate literature [1].
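The defining properties of SO(3) stated above (three parameters, $R^T R = E$, det R = 1, closure under multiplication) can be tried out numerically. The sketch below builds a rotation matrix from an axis and an angle with Rodrigues' formula, which is a standard construction brought in here for illustration and not taken from the text; all function names are likewise illustrative.

```python
import math


def rotation(axis, angle):
    """Rotation about `axis` by `angle`: R = c E + (1 - c) n n^T + s [n]x."""
    norm = math.sqrt(sum(a * a for a in axis))
    n = [a / norm for a in axis]
    c, s = math.cos(angle), math.sin(angle)
    cross = [[0.0, -n[2], n[1]], [n[2], 0.0, -n[0]], [-n[1], n[0], 0.0]]
    return [[c * (i == j) + (1 - c) * n[i] * n[j] + s * cross[i][j]
             for j in range(3)] for i in range(3)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def is_rotation(m, tol=1e-12):
    """Check R^T R = E and det R = 1 within a tolerance."""
    product = matmul(transpose(m), m)
    orthogonal = all(abs(product[i][j] - (i == j)) < tol
                     for i in range(3) for j in range(3))
    return orthogonal and abs(det3(m) - 1.0) < tol


r1 = rotation((1.0, 1.0, 1.0), 2 * math.pi / 3)
r2 = rotation((0.3, -1.2, 0.5), 0.7)

print([[round(x, 6) + 0.0 for x in row] for row in r1])
print(is_rotation(r1), is_rotation(r2), is_rotation(matmul(r1, r2)))
```

The axis direction supplies two parameters and the angle the third. The first example reproduces the C3 matrix of (17.40), and the product of the two rotations again satisfies (17.54) with determinant 1.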

17.4.1 Rotation Axis and Rotation Matrix

In this section we represent a vector such as |x⟩. We start by showing that any rotation has a unique rotation axis. The rotation axis is defined as follows: Suppose that there is a rigid body with some point within the body fixed. Here the said rigid body can be one of infinite extent. Then the rigid body undergoes rotation. The rotation axis is a line on which every point is unmoved during the rotation. As a matter of course, the identity matrix E has three linearly independent rotation axes. (Practically, this represents no rotation.)

Theorem 17.2 Any rotation matrix R is accompanied by at least one rotation axis. Unless the rotation matrix is the identity, the rotation matrix is accompanied by one and only one rotation axis.

Proof As R is an orthogonal matrix of a determinant 1, so are $R^T$ and $R^{-1}$. Then we have

$$(R - E)^T = R^T - E = R^{-1} - E. \tag{17.55}$$

Hence, we get

$$\det (R - E) = \det \left( R^T - E \right) = \det \left( R^{-1} - E \right) = \det \left[ R^{-1} (E - R) \right] = \det \left( R^{-1} \right) \det (E - R) = \det (E - R) = -\det (R - E). \tag{17.56}$$

Note here that with any (3, 3) matrix A of ℝ³,

$$\det A = -(-1)^3 \det A = -\det (-A).$$

This equality holds with ℝⁿ (n: odd), but in ℝⁿ (n: even) we have det A = det (−A). Then, (17.56) results in a trivial equation 0 = 0 accordingly. Therefore, the discussion made below only applies to ℝⁿ (n: odd). Thus, from (17.56) we have

$$\det (R - E) = 0. \tag{17.57}$$

This implies that for ∃|x₀⟩ ≠ 0

$$(R - E) | x_0 \rangle = 0. \tag{17.58}$$

Therefore, we get

$$R (a | x_0 \rangle) = a | x_0 \rangle, \tag{17.59}$$

where a is an arbitrarily chosen real number. In this case an eigenvalue of R is 1, to which an eigenvector a|x₀⟩ corresponds. Thus, as a rotation axis, we have a straight line expressed as

$$l = \mathrm{Span}\{ a | x_0 \rangle ;\ a \in \mathbb{R} \}. \tag{17.60}$$

This proves the presence of a rotation axis. Next suppose that there are two (or more) rotation axes. The presence of two rotation axes naturally implies that there are two linearly independent vectors (i.e., two straight lines that mutually intersect at the fixed point). Suppose that such vectors are |u⟩ and |v⟩. Then we have

$$(R - E) | u \rangle = 0, \tag{17.61}$$

$$(R - E) | v \rangle = 0. \tag{17.62}$$

Let us consider a vector ∀|y⟩ that is chosen from Span{|u⟩, |v⟩}, to which we assume that

$$\mathrm{Span}\{ | u \rangle, | v \rangle \} \equiv \mathrm{Span}\{ a | u \rangle, b | v \rangle ;\ a, b \in \mathbb{R} \}. \tag{17.63}$$

That is, Span{|u⟩, |v⟩} represents a plane P formed by two mutually intersecting straight lines. Then we have

$$| y \rangle = s | u \rangle + t | v \rangle, \tag{17.64}$$

where s and t are some real numbers. Operating R − E on (17.64), we have

$$(R - E) | y \rangle = (R - E)(s | u \rangle + t | v \rangle) = s (R - E) | u \rangle + t (R - E) | v \rangle = 0. \tag{17.65}$$

This indicates that any vector in P can be an eigenvector of R, implying that an infinite number of rotation axes exist. Now, take another vector |w⟩ that is perpendicular to the plane P (see Fig. 17.16). Let us consider an inner product ⟨y|Rw⟩. Since R|u⟩ = |u⟩,

$$\langle u | R^\dagger = \langle u | R^T = \langle u | . \tag{17.66}$$

Similarly, we have

$$\langle v | R^T = \langle v | . \tag{17.67}$$

Fig. 17.16 Plane P formed by two mutually intersecting straight lines represented by |u⟩ and |v⟩. Another vector |w⟩ is perpendicular to the plane P

Therefore, using the relation (17.64), we get

$$\langle y | R^T = \langle y | . \tag{17.68}$$

Here we are dealing with real numbers and, hence, we have

$$R^\dagger = R^T. \tag{17.69}$$

Now we have

$$\langle y | R w \rangle = \langle y R^T | R w \rangle = \langle y | R^T R w \rangle = \langle y | E w \rangle = \langle y | w \rangle = 0. \tag{17.70}$$

In (17.70) the second equality comes from the associative law; the third is due to (17.54). The last equality comes from the fact that |w⟩ is perpendicular to P. From (17.70) we have

$$\langle y | (R - E) w \rangle = 0. \tag{17.71}$$

However, we should be careful not to conclude immediately from (17.71) that (R − E)|w⟩ = 0; i.e., R|w⟩ = |w⟩. This is because in (17.70) |y⟩ does not represent all vectors in ℝ³, but merely represents all the vectors in Span{|u⟩, |v⟩}. Nonetheless, both |w⟩ and |Rw⟩ are perpendicular to P, and so

$$| R w \rangle = a | w \rangle, \tag{17.72}$$

where a is a real number. From (17.72), we have

$$\langle w R^\dagger | R w \rangle = \langle w R^T | R w \rangle = \langle w | w \rangle = |a|^2 \langle w | w \rangle, \quad \text{i.e.,} \quad a = \pm 1,$$

where the first equality comes from the fact that R is an orthogonal matrix. Since det R = 1, a = 1. Thus, from (17.72) this time around we have |Rw⟩ = |w⟩; that is,

$$(R - E) | w \rangle = 0. \tag{17.73}$$

Equations (17.65) and (17.73) imply that for any vector |x⟩ arbitrarily chosen from ℝ³ we have

$$(R - E) | x \rangle = 0. \tag{17.74}$$

Consequently, we get

$$R - E = 0 \quad \text{or} \quad R = E. \tag{17.75}$$

The above procedures represented by (17.61) to (17.75) indicate that the presence of two rotation axes necessarily requires the transformation matrix to be the identity. This implies that all the vectors in ℝ³ are eigenvectors of the rotation matrix. Taking the contraposition of the above, unless the rotation matrix is the identity, the relevant rotation cannot have two rotation axes. Meanwhile, the proof of the former half ensures the presence of at least one rotation axis. Consequently, any rotation is characterized by a unique rotation axis except for the identity transformation. This completes the proof. ∎

An immediate consequence of Theorem 17.2 is that the rotation matrix has an eigenvalue 1, to which an eigenvector representing the rotation axis corresponds. This statement includes the trivial fact that all the eigenvalues of the identity matrix are 1. In Sect. 14.4, we calculated eigenvalues of a two-dimensional rotation matrix. The eigenvalues were $e^{i\theta}$ or $e^{-i\theta}$, where θ is a rotation angle. Let us consider rotation matrices that we dealt with in Sect. 17.1. The matrix representing the rotation around the z-axis by a rotation angle θ is expressed by

$$R = \begin{pmatrix} \cos \theta & -\sin \theta & 0 \\ \sin \theta & \cos \theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.8}$$

In Sect. 14.4 we treated diagonalization of a rotation matrix. As R is reduced (i.e., block-diagonal), the diagonalization can be performed in a manner essentially the same as (14.92). That is, as a diagonalizing unitary matrix, we have

$$U = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad U^{\dagger} = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & -\frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.76}$$

As a result of the unitary similarity transformation, we get

$$U^{\dagger} R U = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & -\frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} e^{i\theta} & 0 & 0 \\ 0 & e^{-i\theta} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.77}$$

Thus, the eigenvalues are 1, $e^{i\theta}$, and $e^{-i\theta}$. The eigenvalue 1 results from the existence of the unique rotation axis. When θ = 0, (17.77) gives an identity matrix with all the eigenvalues 1 as expected. When θ = π, the eigenvalues are −1, −1, and 1. The eigenvalue 1 is again associated with the unique rotation axis. The (unitary) similarity transformation keeps a trace unchanged, that is, the trace χ is

$$\chi = 1 + 2\cos\theta. \tag{17.78}$$
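The diagonalization (17.77) and the trace (17.78) can be checked numerically. The following is only a minimal sketch in plain Python; the helper names `matmul`, `dagger`, and `rot_z` and the test angle are assumptions made for illustration and do not appear in the text.

```python
import cmath
import math


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dagger(a):
    """Return the conjugate transpose of a matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]


def rot_z(theta):
    """Rotation about the z-axis by theta, as in (17.8)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


theta = 0.7                      # arbitrary test angle (assumption)
r = 1.0 / math.sqrt(2.0)
U = [[r, r, 0.0],
     [-1j * r, 1j * r, 0.0],
     [0.0, 0.0, 1.0]]            # diagonalizing matrix of (17.76)

# U^dagger R U should be diag(e^{i theta}, e^{-i theta}, 1), cf. (17.77).
diag = matmul(dagger(U), matmul(rot_z(theta), U))
expected = [cmath.exp(1j * theta), cmath.exp(-1j * theta), 1.0]
trace = sum(diag[i][i] for i in range(3))   # should equal 1 + 2 cos(theta)
```

Comparing `diag` with `expected` entry by entry, and `trace` with `1 + 2 * math.cos(theta)`, reproduces (17.77) and (17.78) up to rounding errors.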

As R is a normal operator, spectral decomposition can be done as in the case of Example 14.1. Here we only show the result below.

$$R = e^{i\theta}\begin{pmatrix} \frac{1}{2} & \frac{i}{2} & 0 \\ -\frac{i}{2} & \frac{1}{2} & 0 \\ 0 & 0 & 0 \end{pmatrix} + e^{-i\theta}\begin{pmatrix} \frac{1}{2} & -\frac{i}{2} & 0 \\ \frac{i}{2} & \frac{1}{2} & 0 \\ 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

The three matrices of the above equation are projection operators.
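As a hedged illustration of the spectral decomposition of the z-axis rotation matrix, the sketch below checks that the three matrices are idempotent, mutually orthogonal, and sum to the identity, and that the weighted sum with $e^{i\theta}$, $e^{-i\theta}$, and 1 rebuilds the rotation matrix. The names `P1`, `P2`, `P3`, `close`, and the test angle are illustrative assumptions.

```python
import cmath
import math


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices of equal shape."""
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(len(a)) for j in range(len(a[0])))


P1 = [[0.5, 0.5j, 0], [-0.5j, 0.5, 0], [0, 0, 0]]   # eigenvalue e^{+i theta}
P2 = [[0.5, -0.5j, 0], [0.5j, 0.5, 0], [0, 0, 0]]   # eigenvalue e^{-i theta}
P3 = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]              # eigenvalue 1
projectors = [P1, P2, P3]
zero = [[0] * 3 for _ in range(3)]
identity = [[1 if i == j else 0 for j in range(3)] for i in range(3)]

theta = 1.1  # arbitrary test angle (assumption)
weights = [cmath.exp(1j * theta), cmath.exp(-1j * theta), 1.0]

# Weighted sum of the projectors; should rebuild the rotation matrix.
R = [[sum(w * P[i][j] for w, P in zip(weights, projectors))
      for j in range(3)] for i in range(3)]
c, s = math.cos(theta), math.sin(theta)
R_expected = [[c, -s, 0], [s, c, 0], [0, 0, 1]]

# Resolution of identity: the projectors should add up to E.
total = [[sum(P[i][j] for P in projectors) for j in range(3)]
         for i in range(3)]
```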

17.4.2 Euler Angles and Related Topics

Euler angles are well known and have long been used in various fields of science. We wish to connect the above discussion with Euler angles. In Part III we dealt with successive linear transformations. This can be extended to the case of three or more successive transformations. Suppose that we have three successive transformations $R_1$, $R_2$, and $R_3$ and that the coordinate system (a three-dimensional orthonormal basis) is transformed from O ⟶ I ⟶ II ⟶ III accordingly. The symbol "O" stands for the original coordinate system and I, II, and III represent successively transformed systems. In the discussion that follows, let us denote the transformations by $R_2'$, $R_3'$, $R_3''$, etc. in reference to the transformed coordinate systems. For example, $R_3'$ means that the third transformation is viewed from the system I. The transformation $R_3''$ indicates that the third transformation is viewed from the system II. That is, the number of primes (′) denotes the number of the coordinate system and distinguishes the systems I and II. Let $R_2$ (without prime) stand for the second transformation viewed from the system O. Meanwhile, we have

$$R_1 R_2' = R_2 R_1. \tag{17.79}$$

This notation is in parallel to (11.80). Similarly, we have

$$R_2' R_3'' = R_3' R_2' \quad \text{and} \quad R_1 R_3' = R_3 R_1. \tag{17.80}$$

Therefore, we get [3]

$$R_1 R_2' R_3'' = R_1 R_3' R_2' = R_3 R_1 R_2' = R_3 R_2 R_1. \tag{17.81}$$

Also combining (17.79) and (17.80), we have

$$R_3'' = (R_2 R_1)^{-1} R_3 (R_2 R_1). \tag{17.82}$$

Let us call $R_2'$, $R_3''$, etc. transformations on a "moving" coordinate system (i.e., the systems I, II, III, …). On the other hand, we call $R_1$, $R_2$, etc. transformations on a "fixed" system (i.e., the original coordinate system O). Thus, (17.81) shows that the multiplication order is reversed between the moving system and the fixed system [3].

For practical purposes it would be enough to consider three successive transformations. Let us think, however, of a general case where n successive transformations are involved (n denotes a positive integer). For the purpose of succinct notation, let us define the linear transformations and relevant coordinate systems as those in Fig. 17.17.

Fig. 17.17 Successive orthogonal transformations and relevant coordinate systems (O ⟶ I ⟶ II ⟶ III ⟶ ⋯)

Also, we define the following orthogonal transformation $R_j^{(i)}$:

$$R_j^{(i)} \ (0 \le i < j \le n), \quad \text{and} \quad R_i^{(0)} \equiv R_i, \tag{17.83}$$

where $R_j^{(i)}$ is defined as a transformation $R_j$ described in reference to the coordinate system i; $R_i^{(0)}$ means that $R_i$ is referred to the original coordinate system (i.e., the fixed coordinate system). Then we have

$$R_{i-1}^{(i-2)} R_k^{(i-1)} = R_k^{(i-2)} R_{i-1}^{(i-2)} \quad (k > i - 1). \tag{17.84}$$

Particularly, when i = 3

$$R_2^{(1)} R_k^{(2)} = R_k^{(1)} R_2^{(1)} \quad (k > 2). \tag{17.85}$$

For i = 2 we have

$$R_1 R_k^{(1)} = R_k R_1 \quad (k > 1). \tag{17.86}$$

We define n successive transformations on a moving coordinate system as $\tilde{R}_n$ such that

$$\tilde{R}_n \equiv R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)} R_n^{(n-1)}. \tag{17.87}$$

Applying (17.84) on $R_{n-1}^{(n-2)} R_n^{(n-1)}$ and rewriting (17.87), we have

$$\tilde{R}_n = R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_n^{(n-2)} R_{n-1}^{(n-2)}. \tag{17.88}$$

Applying (17.84) again on $R_{n-2}^{(n-3)} R_n^{(n-2)}$, we get

$$\tilde{R}_n = R_1 R_2^{(1)} R_3^{(2)} \cdots R_n^{(n-3)} R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}. \tag{17.89}$$

Proceeding similarly, we have

$$\tilde{R}_n = R_1 R_n^{(1)} R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)} = R_n R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}, \tag{17.90}$$

where with the last equality we used (17.86). In this case, we have

$$R_1 R_n^{(1)} = R_n R_1. \tag{17.91}$$

To reach the RHS of (17.90), we applied (17.84) (n − 1) times in total. Then we repeat the above procedures with respect to $R_{n-1}^{(n-2)}$ another (n − 2) times to get

$$\tilde{R}_n = R_n R_{n-1} R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)}. \tag{17.92}$$

Further proceeding similarly, we finally get

$$\tilde{R}_n = R_n R_{n-1} R_{n-2} \cdots R_3 R_2 R_1. \tag{17.93}$$
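The reversal of the multiplication order in (17.93) can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the four fixed-frame rotations are chosen arbitrarily, and each moving-frame matrix is built from its fixed-frame counterpart by conjugating with the product of the preceding fixed-frame rotations, which presupposes that the moving-frame and fixed-frame descriptions are related by such a similarity transformation. All function and variable names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose; for an orthogonal matrix this is also the inverse."""
    return [list(row) for row in zip(*a)]


def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Fixed-frame rotations R_1, ..., R_n (arbitrary choice, n = 4).
fixed = [rot_z(0.3), rot_y(0.5), rot_x(0.7), rot_z(1.1)]

identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
P = identity                # running product R_{j-1} ... R_1 (starts as E)
moving_product = identity   # R_1 R_2^{(1)} ... built from left to right
for R in fixed:
    # Moving-frame counterpart: P^{-1} R P, with P^{-1} = P^T.
    R_moving = matmul(transpose(P), matmul(R, P))
    moving_product = matmul(moving_product, R_moving)
    P = matmul(R, P)        # now P = R_j ... R_1

fixed_product = P           # R_n ... R_2 R_1, i.e., the RHS of (17.93)
max_diff = max(abs(moving_product[i][j] - fixed_product[i][j])
               for i in range(3) for j in range(3))
```

Within rounding errors `max_diff` vanishes, so the left-to-right product of moving-frame rotations equals the right-to-left product of fixed-frame rotations.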

In total, we have applied the permutation of (17.84) n(n − 1)/2 times. When n is 2, n(n − 1)/2 = 1. This is the case with (17.79). If n is 3, n(n − 1)/2 = 3. This is the case with (17.81). Thus, (17.93) once again confirms that the multiplication order is reversed between the moving system and the fixed system.

Meanwhile, we define $\tilde{P}$ as

$$\tilde{P} \equiv R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}. \tag{17.94}$$

Alternatively, we describe

$$\tilde{P} = R_{n-1} R_{n-2} \cdots R_3 R_2 R_1. \tag{17.95}$$

Then, from (17.87) and (17.90) we get

$$\tilde{R}_n = \tilde{P} R_n^{(n-1)} = R_n \tilde{P}. \tag{17.96}$$

Equivalently, we have

$$R_n = \tilde{P} R_n^{(n-1)} \tilde{P}^{-1} \tag{17.97}$$

or

$$R_n^{(n-1)} = \tilde{P}^{-1} R_n \tilde{P}. \tag{17.98}$$

Moreover, we have

$$\begin{aligned} \tilde{P}^{T} &= \left[R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}\right]^{T} = \left[R_{n-1}^{(n-2)}\right]^{T} \left[R_{n-2}^{(n-3)}\right]^{T} \cdots \left[R_2^{(1)}\right]^{T} [R_1]^{T} \\ &= \left[R_{n-1}^{(n-2)}\right]^{-1} \left[R_{n-2}^{(n-3)}\right]^{-1} \cdots \left[R_2^{(1)}\right]^{-1} R_1^{-1} \\ &= \left[R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}\right]^{-1} = \tilde{P}^{-1}. \end{aligned} \tag{17.99}$$

The third equality of (17.99) comes from the fact that the matrices $R_{n-1}^{(n-2)}$, $R_{n-2}^{(n-3)}$, …, $R_2^{(1)}$, and $R_1$ are orthogonal matrices. Then, we get

$$\tilde{P}^{T} \tilde{P} = \tilde{P} \tilde{P}^{T} = E. \tag{17.100}$$

Thus, $\tilde{P}$ is an orthogonal matrix.

From the viewpoint of practical application, (17.97) and (17.98) are very useful. This is because $R_n$ and $R_n^{(n-1)}$ are conjugate to each other. Consequently, $R_n$ has the same eigenvalues and trace as $R_n^{(n-1)}$. In light of (11.81), we see that (17.97) and (17.98) relate $R_n$ (i.e., viewed in reference to the original coordinate system) to $R_n^{(n-1)}$ [i.e., the same transformation viewed in reference to the coordinate system reached after the (n − 1) transformations]. Since the transformation $R_n^{(n-1)}$ is usually described in a simple form, matrix calculations to compute $R_n$ can readily be done. Now let us consider an example.

Example 17.4: Successive Rotations A typical illustration of three successive transformations in moving coordinate systems is well known in association with Euler angles. This contains the following three steps:

(i) Rotation by α around the z-axis in the original coordinate system (O).
(ii) Rotation by β around the y′-axis in the transferred coordinate system (I).
(iii) Rotation by γ around the z″-axis (the same as the z′-axis) in the transferred coordinate system (II).

The above three steps are represented by matrices of $R_{z\alpha}$, $R'_{y'\beta}$, and $R''_{z''\gamma}$ in (17.12). That is, as a total transformation $\tilde{R}_3$ we have

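$$\begin{aligned} \tilde{R}_3 &= \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} \\ &= \begin{pmatrix} \cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma & -\cos\alpha\cos\beta\sin\gamma - \sin\alpha\cos\gamma & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\cos\beta\sin\gamma + \cos\alpha\cos\gamma & \sin\alpha\sin\beta \\ -\sin\beta\cos\gamma & \sin\beta\sin\gamma & \cos\beta \end{pmatrix}. \end{aligned} \tag{17.101}$$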
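The closed form of (17.101) can be compared with the explicit product of a z-rotation by α, a y-rotation by β, and a z-rotation by γ. The following is a minimal sketch with arbitrarily chosen test angles; the function names are assumptions introduced for illustration only.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def euler_closed_form(a, b, g):
    """Entries of the product matrix as written out in (17.101)."""
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    cg, sg = math.cos(g), math.sin(g)
    return [[ca * cb * cg - sa * sg, -ca * cb * sg - sa * cg, ca * sb],
            [sa * cb * cg + ca * sg, -sa * cb * sg + ca * cg, sa * sb],
            [-sb * cg, sb * sg, cb]]


alpha, beta, gamma = 0.4, 1.0, 2.0   # arbitrary test angles (assumption)
product = matmul(rot_z(alpha), matmul(rot_y(beta), rot_z(gamma)))
closed = euler_closed_form(alpha, beta, gamma)
max_diff = max(abs(product[i][j] - closed[i][j])
               for i in range(3) for j in range(3))
```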

This matrix corresponds to (17.87), where n = 3. The angles α, β, and γ in (17.101) are called Euler angles and their domains are usually taken as 0 ≤ α ≤ 2π, 0 ≤ β ≤ π, 0 ≤ γ ≤ 2π. The matrix (17.101) is widely used in quantum mechanics and related fields of natural science. The matrix notation, however, differs from literature to literature, and so care should be taken [3–5]. Using the notation of (17.87), we have

$$R_1 = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}, \tag{17.102}$$

$$R_2^{(1)} = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}, \tag{17.103}$$

$$R_3^{(2)} = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.104}$$

From (17.94) we also get

$$\tilde{P} \equiv R_1 R_2^{(1)} = \begin{pmatrix} \cos\alpha\cos\beta & -\sin\alpha & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta & \cos\alpha & \sin\alpha\sin\beta \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}. \tag{17.105}$$

Corresponding to (17.97), we have

$$R_3 = \tilde{P} R_3^{(2)} \tilde{P}^{-1} = R_1 R_2^{(1)} R_3^{(2)} \left[R_2^{(1)}\right]^{-1} R_1^{-1}. \tag{17.106}$$

Now matrix calculations are readily performed such that

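$$\begin{aligned} R_3 &= \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\beta & 0 & -\sin\beta \\ 0 & 1 & 0 \\ \sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} \\ &= \begin{pmatrix} (\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma + \cos^2\alpha\sin^2\beta & \cos\alpha\sin\alpha\sin^2\beta(1 - \cos\gamma) - \cos\beta\sin\gamma & \cos\alpha\cos\beta\sin\beta(1 - \cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\ \cos\alpha\sin\alpha\sin^2\beta(1 - \cos\gamma) + \cos\beta\sin\gamma & (\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma + \sin^2\alpha\sin^2\beta & \sin\alpha\cos\beta\sin\beta(1 - \cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\ \cos\alpha\cos\beta\sin\beta(1 - \cos\gamma) - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta\sin\beta(1 - \cos\gamma) + \cos\alpha\sin\beta\sin\gamma & \sin^2\beta\cos\gamma + \cos^2\beta \end{pmatrix}. \end{aligned} \tag{17.107}$$

Notice that in (17.107) we have a trace χ described as

$$\chi = 1 + 2\cos\gamma. \tag{17.108}$$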
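The conjugation in (17.106) and the trace (17.108) lend themselves to a quick numerical check. The sketch below, with arbitrarily chosen angles and hypothetical helper names, builds $R_3$ as $\tilde{P} R_3^{(2)} \tilde{P}^{T}$, compares two of its entries with (17.107), and checks that the vector $(\cos\alpha\sin\beta, \sin\alpha\sin\beta, \cos\beta)$ is left unchanged.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose; for an orthogonal matrix this is also the inverse."""
    return [list(row) for row in zip(*a)]


def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


alpha, beta, gamma = 0.4, 1.0, 2.0          # arbitrary test angles (assumption)
P = matmul(rot_z(alpha), rot_y(beta))       # P-tilde of (17.105)
R3 = matmul(P, matmul(rot_z(gamma), transpose(P)))   # (17.106)

trace = sum(R3[i][i] for i in range(3))     # compare with 1 + 2 cos(gamma)

ca, sa = math.cos(alpha), math.sin(alpha)
cb, sb = math.cos(beta), math.sin(beta)
cg, sg = math.cos(gamma), math.sin(gamma)
entry_12 = ca * sa * sb * sb * (1 - cg) - cb * sg   # (1, 2) entry of (17.107)
entry_33 = sb * sb * cg + cb * cb                   # (3, 3) entry of (17.107)

# Direction cosines of the rotation axis; R3 should leave them unchanged.
axis = [ca * sb, sa * sb, cb]
rotated_axis = [sum(R3[i][k] * axis[k] for k in range(3)) for i in range(3)]
```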

The trace is the same as that of $R_3^{(2)}$, as expected from (17.106) and (17.97). Equation (17.107) seems complicated at first glance but has a simple and well-defined meaning. The rotation represented by $R_3^{(2)}$ is characterized by a rotation by γ around the z″-axis. Figure 17.18 represents the orientation of the z″-axis viewed in reference to the original xyz-system. That is, the z″-axis (identical to the rotation axis A) is designated by an azimuthal angle α and a zenithal angle β as shown. The operation $R_3$ is represented by a rotation by γ around the axis A in the xyz-system. The angles α, β, and γ coincide with the Euler angles designated with the same independent parameters α, β, and γ.

From (17.77) and (17.107), a diagonalizing matrix for $R_3$ is $\tilde{P}U$. That is,

$$\left(\tilde{P}U\right)^{\dagger} R_3 \tilde{P}U = U^{\dagger} \tilde{P}^{-1} R_3 \tilde{P} U = U^{\dagger} R_3^{(2)} U = \begin{pmatrix} e^{i\gamma} & 0 & 0 \\ 0 & e^{-i\gamma} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.109}$$

Note that as $\tilde{P}$ is a real matrix, we have

$$\tilde{P}^{\dagger} = \tilde{P}^{T} = \tilde{P}^{-1} \tag{17.110}$$

and

$$\begin{aligned} \tilde{P}U &= \begin{pmatrix} \cos\alpha\cos\beta & -\sin\alpha & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta & \cos\alpha & \sin\alpha\sin\beta \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix} \\ &= \begin{pmatrix} \frac{1}{\sqrt{2}}\cos\alpha\cos\beta + \frac{i}{\sqrt{2}}\sin\alpha & \frac{1}{\sqrt{2}}\cos\alpha\cos\beta - \frac{i}{\sqrt{2}}\sin\alpha & \cos\alpha\sin\beta \\ \frac{1}{\sqrt{2}}\sin\alpha\cos\beta - \frac{i}{\sqrt{2}}\cos\alpha & \frac{1}{\sqrt{2}}\sin\alpha\cos\beta + \frac{i}{\sqrt{2}}\cos\alpha & \sin\alpha\sin\beta \\ -\frac{1}{\sqrt{2}}\sin\beta & -\frac{1}{\sqrt{2}}\sin\beta & \cos\beta \end{pmatrix}. \end{aligned} \tag{17.111}$$

Fig. 17.18 Rotation γ around the rotation axis A. The orientation of A is defined by angles α and β as shown

A vector representing the rotation axis corresponds to an eigenvalue 1. The direction cosines of the x-, y-, and z-components for the rotation axis A are cos α sin β, sin α sin β, and cos β (see Fig. 3.1), respectively, when viewed in reference to the original xyz-coordinate system. This can directly be shown as follows: The characteristic equation of $R_3$ is expressed as

$$|R_3 - \lambda E| = 0. \tag{17.112}$$

Using (17.107) we have

$$|R_3 - \lambda E| = \begin{vmatrix} (\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma + \cos^2\alpha\sin^2\beta - \lambda & \cos\alpha\sin\alpha\sin^2\beta(1 - \cos\gamma) - \cos\beta\sin\gamma & \cos\alpha\cos\beta\sin\beta(1 - \cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\ \cos\alpha\sin\alpha\sin^2\beta(1 - \cos\gamma) + \cos\beta\sin\gamma & (\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma + \sin^2\alpha\sin^2\beta - \lambda & \sin\alpha\cos\beta\sin\beta(1 - \cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\ \cos\alpha\cos\beta\sin\beta(1 - \cos\gamma) - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta\sin\beta(1 - \cos\gamma) + \cos\alpha\sin\beta\sin\gamma & \sin^2\beta\cos\gamma + \cos^2\beta - \lambda \end{vmatrix}. \tag{17.113}$$

When λ = 1, we must have the direction cosines of the rotation axis as an eigenvector. That is, we get

$$\begin{pmatrix} (\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma + \cos^2\alpha\sin^2\beta - 1 & \cos\alpha\sin\alpha\sin^2\beta(1 - \cos\gamma) - \cos\beta\sin\gamma & \cos\alpha\cos\beta\sin\beta(1 - \cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\ \cos\alpha\sin\alpha\sin^2\beta(1 - \cos\gamma) + \cos\beta\sin\gamma & (\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma + \sin^2\alpha\sin^2\beta - 1 & \sin\alpha\cos\beta\sin\beta(1 - \cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\ \cos\alpha\cos\beta\sin\beta(1 - \cos\gamma) - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta\sin\beta(1 - \cos\gamma) + \cos\alpha\sin\beta\sin\gamma & \sin^2\beta\cos\gamma + \cos^2\beta - 1 \end{pmatrix} \begin{pmatrix} \cos\alpha\sin\beta \\ \sin\alpha\sin\beta \\ \cos\beta \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \tag{17.114}$$

Carrying out the matrix multiplication verifies that (17.114) holds; readers are encouraged to check it. As an application of (17.107) to an illustrative example, let us consider a 2π/3 rotation around an axis trisecting a solid angle π/2 formed by the x-, y-, and z-axes (see Fig. 17.19 and Sect. 17.3). Then we have

$$\cos\alpha = \sin\alpha = \frac{1}{\sqrt{2}}, \quad \cos\beta = \frac{1}{\sqrt{3}}, \quad \sin\beta = \frac{\sqrt{2}}{\sqrt{3}}, \quad \cos\gamma = -\frac{1}{2}, \quad \sin\gamma = \frac{\sqrt{3}}{2}. \tag{17.115}$$

Substituting (17.115) into (17.107), we get

$$R_3 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \tag{17.116}$$

Fig. 17.19 Rotation axis of C3 that permutates basis vectors. (a) The C3 axis is a straight line that connects the origin and each vertex of a cube. (b) Cube viewed along the C3 axis that connects the origin and vertex

If we write a linear transformation $R_3$ following (11.37), we get

$$R_3(|x\rangle) = (|e_1\rangle \ |e_2\rangle \ |e_3\rangle) \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \tag{17.117}$$

where $|e_1\rangle$, $|e_2\rangle$, and $|e_3\rangle$ represent unit basis vectors in the direction of the x-, y-, and z-axes, respectively. We have $|x\rangle = x_1|e_1\rangle + x_2|e_2\rangle + x_3|e_3\rangle$. Thus, we get

$$R_3(|x\rangle) = (|e_2\rangle \ |e_3\rangle \ |e_1\rangle) \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}. \tag{17.118}$$

This implies a cyclic permutation of the basis vectors. This is well expressed in terms of the column vector (i.e., coordinate) transformation as

$$R_3(|x\rangle) = (|e_1\rangle \ |e_2\rangle \ |e_3\rangle) \begin{pmatrix} x_3 \\ x_1 \\ x_2 \end{pmatrix}. \tag{17.119}$$

Care should be taken as to which linear transformation is intended, the transformation of the basis vectors or that of the coordinates. As mentioned above, we have compared geometric features on the moving coordinate system and the fixed coordinate system. The features seem to differ at first glance, but are essentially the same.
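The 2π/3 rotation of (17.115) to (17.119) can be reproduced numerically. The sketch below uses α = π/4, β = arccos(1/√3), and γ = 2π/3, which correspond to (17.115); the helper names and the sample coordinates are assumptions for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose; for an orthogonal matrix this is also the inverse."""
    return [list(row) for row in zip(*a)]


def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


alpha = math.pi / 4                       # cos(alpha) = sin(alpha) = 1/sqrt(2)
beta = math.acos(1.0 / math.sqrt(3.0))    # cos(beta) = 1/sqrt(3)
gamma = 2.0 * math.pi / 3.0               # cos(gamma) = -1/2

P = matmul(rot_z(alpha), rot_y(beta))
R3 = matmul(P, matmul(rot_z(gamma), transpose(P)))

# Round to the nearest integers to expose the permutation matrix of (17.116).
R3_rounded = [[round(x) for x in row] for row in R3]
rounding_error = max(abs(R3[i][j] - R3_rounded[i][j])
                     for i in range(3) for j in range(3))

# Coordinate transformation of (17.119): (x1, x2, x3) -> (x3, x1, x2).
coords = [1.0, 2.0, 3.0]                  # sample coordinates (assumption)
new_coords = [sum(R3[i][k] * coords[k] for k in range(3)) for i in range(3)]
```

With these inputs `R3_rounded` is the cyclic permutation matrix, and `new_coords` is close to `[3.0, 1.0, 2.0]`.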

References

1. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
2. Cotton FA (1990) Chemical applications of group theory. Wiley, New York
3. Edmonds AR (1957) Angular momentum in quantum mechanics. Princeton University Press, Princeton
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World Scientific, Singapore
5. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham

Chapter 18

Representation Theory of Groups

Representation theory is an important pillar of group theory. As we shall see soon, the word "representation" and its definition sound a bit daunting. In short, however, we need "numbers" or "matrices" to do a mathematical calculation. Therefore, we may think of a representation as merely numbers and matrices. Individual representations have their dimension. If that dimension is one, we treat a representation as a number (real or complex). If the dimension is two or more, we are going to deal with matrices; in the case of dimension n, it is an (n, n) square matrix. In this chapter we focus on the representation theory of finite groups. In this case, we have an important theorem stating that a representation of any finite group can be converted to a unitary representation by a similarity transformation. That is, the elements of a finite group can be represented by unitary matrices. According to the dimension of the representation, we have the same number of basis vectors. Bearing these things firmly in mind, we can pretty easily understand this important notion of representation.

© Springer Nature Singapore Pte Ltd. 2020 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3_18

18.1

Definition of Representation

In Sect. 16.4 we dealt with various aspects of the mapping between group elements. Of these, we studied fundamental properties of isomorphism and homomorphism. In this section we introduce the notion of representation of groups and study it. If we deal with a finite group consisting of n elements, we describe it as ℊ = $\{g_1 \equiv e, g_2, \dots, g_n\}$ as in the case of Sect. 16.1.

Definition 18.1 Let ℊ = $\{g_\nu\}$ be a group comprising elements $g_\nu$. Suppose that a (d, d) matrix $D(g_\nu)$ is given for each group element $g_\nu$. Suppose also that in correspondence with

$$g_\mu g_\nu = g_\rho, \tag{18.1}$$

$$D(g_\mu) D(g_\nu) = D(g_\rho) \tag{18.2}$$

holds. Then a set L consisting of $D(g_\nu)$, that is, $L = \{D(g_\nu)\}$, is said to be a representation.

Although the definition seems somewhat daunting, the representation is, as already seen, merely a homomorphism. We call individual matrices $D(g_\nu)$ representation matrices. The dimension d of the matrix is said to be the dimension of the representation as well. In correspondence with $g_\mu e = g_\mu$, we have

$$D(g_\mu) D(e) = D(g_\mu). \tag{18.3}$$

Therefore, we get

$$D(e) = E, \tag{18.4}$$

where E is an identity matrix of dimension d. Also in correspondence with $g_\nu g_\nu^{-1} = g_\nu^{-1} g_\nu = e$, we have

$$D(g_\nu) D\left(g_\nu^{-1}\right) = D\left(g_\nu^{-1}\right) D(g_\nu) = D(e) = E. \tag{18.5}$$

That is,

$$D\left(g_\nu^{-1}\right) = [D(g_\nu)]^{-1}. \tag{18.6}$$

Namely, an inverse matrix corresponds to an inverse element. If the representation has one-to-one correspondence, the representation is said to be faithful. In the case of n-to-one correspondence, the representation is homomorphic. In particular, representing all group elements by 1 [as (1, 1) matrix] is called an “identity representation.” Illustrative examples of the representation have already appeared in Sects. 17.2 and 17.3. In that case the representation was faithful. For instance, in Tables 17.2, 17.4, and 17.6 as well as Fig. 17.12, the number of representation matrices is the same as that of group elements. In Sects. 17.2 and 17.3, in most cases we have used real orthogonal matrices. Such matrices are included among unitary matrices. Representation using unitary matrices is said to be a “unitary representation.” In fact, a representation of a finite group can be converted to a unitary representation.


Theorem 18.1 [1, 2] A representation of any finite group can be converted to a unitary representation by a similarity transformation.

Proof Let ℊ = $\{g_1, g_2, \dots, g_n\}$ be a finite group of order n. Let $D(g_\nu)$ be a representation matrix of $g_\nu$ of dimension d. Here we suppose that $D(g_\nu)$ is not unitary, but non-singular. Using $D(g_\nu)$, we construct the following matrix H such that

$$H = \sum_{i=1}^{n} D(g_i)^{\dagger} D(g_i). \tag{18.7}$$

Then, for an arbitrarily chosen group element $g_j$ we have

$$\begin{aligned} D(g_j)^{\dagger} H D(g_j) &= \sum_{i=1}^{n} D(g_j)^{\dagger} D(g_i)^{\dagger} D(g_i) D(g_j) \\ &= \sum_{i=1}^{n} \left[D(g_i) D(g_j)\right]^{\dagger} D(g_i) D(g_j) \\ &= \sum_{i=1}^{n} D(g_i g_j)^{\dagger} D(g_i g_j) \\ &= \sum_{k=1}^{n} D(g_k)^{\dagger} D(g_k) \\ &= H, \end{aligned} \tag{18.8}$$

where the third equality comes from the homomorphism of the representation and the second last equality is due to the rearrangement theorem (see Sect. 16.1). Note that each matrix $D(g_i)^{\dagger} D(g_i)$ is a Hermitian Gram matrix constructed from a non-singular matrix and that H is a summation of such Gram matrices. Consequently, on the basis of the argument of Sect. 13.2, H is positive definite and all the eigenvalues of H are positive. Then, using an appropriate unitary matrix U, we can get a diagonal matrix Λ such that

$$U^{\dagger} H U = \Lambda. \tag{18.9}$$

Here, the diagonal matrix Λ is given by

$$\Lambda = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_d \end{pmatrix}, \tag{18.10}$$

with $\lambda_i > 0$ (1 ≤ i ≤ d). We define $\Lambda^{1/2}$ such that

$$\Lambda^{1/2} = \begin{pmatrix} \sqrt{\lambda_1} & & & \\ & \sqrt{\lambda_2} & & \\ & & \ddots & \\ & & & \sqrt{\lambda_d} \end{pmatrix}, \tag{18.11}$$

$$\left(\Lambda^{1/2}\right)^{-1} = \begin{pmatrix} \sqrt{\lambda_1^{-1}} & & & \\ & \sqrt{\lambda_2^{-1}} & & \\ & & \ddots & \\ & & & \sqrt{\lambda_d^{-1}} \end{pmatrix}. \tag{18.12}$$

Notice that both $\Lambda^{1/2}$ and $\left(\Lambda^{1/2}\right)^{-1}$ are non-singular. Furthermore, we define a matrix V such that

$$V = U \left(\Lambda^{1/2}\right)^{-1}. \tag{18.13}$$

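Then, multiplying both sides of (18.8) by $V^{-1}$ from the left and by V from the right and inserting $VV^{-1} = E$ in between, we get

$$V^{-1} D(g_j)^{\dagger} V V^{-1} H V V^{-1} D(g_j) V = V^{-1} H V. \tag{18.14}$$

Meanwhile, we have

$$V^{-1} H V = \Lambda^{1/2} U^{\dagger} H U \left(\Lambda^{1/2}\right)^{-1} = \Lambda^{1/2} \Lambda \left(\Lambda^{1/2}\right)^{-1} = \Lambda. \tag{18.15}$$

With the second equality of (18.15), we used (18.9). Inserting (18.15) into (18.14), we get

$$V^{-1} D(g_j)^{\dagger} V \Lambda V^{-1} D(g_j) V = \Lambda. \tag{18.16}$$

Multiplying both sides of (18.16) by $\Lambda^{-1}$ from the left, we get

$$\Lambda^{-1} V^{-1} D(g_j)^{\dagger} (V\Lambda) \, V^{-1} D(g_j) V = E. \tag{18.17}$$

Using (18.13), we have

$$\Lambda^{-1} V^{-1} = (V\Lambda)^{-1} = \left[U\Lambda^{1/2}\right]^{-1} = \left(\Lambda^{1/2}\right)^{-1} U^{-1} = \left(\Lambda^{1/2}\right)^{-1} U^{\dagger}.$$

Meanwhile, taking the adjoint of (18.13) and noting (18.12), we get

$$V^{\dagger} = \left[\left(\Lambda^{1/2}\right)^{-1}\right]^{\dagger} U^{\dagger} = \left(\Lambda^{1/2}\right)^{-1} U^{\dagger}.$$

Using (18.13) once again, we have

$$V\Lambda = U\Lambda^{1/2}.$$

Also using (18.13) we have

$$\left(V^{-1}\right)^{\dagger} = \left\{\left[U\left(\Lambda^{1/2}\right)^{-1}\right]^{-1}\right\}^{\dagger} = \left[\Lambda^{1/2} U^{\dagger}\right]^{\dagger} = \left(U^{\dagger}\right)^{\dagger} \left(\Lambda^{1/2}\right)^{\dagger} = U\Lambda^{1/2},$$

where we used the unitarity of U and (18.11). Using the above relations and rewriting (18.17), finally we get

$$V^{\dagger} D(g_j)^{\dagger} \left(V^{-1}\right)^{\dagger} \cdot V^{-1} D(g_j) V = E. \tag{18.18}$$

Defining $\tilde{D}(g_j)$ as follows

$$\tilde{D}(g_j) \equiv V^{-1} D(g_j) V, \tag{18.19}$$

and taking the adjoint of both sides of (18.19), we get

$$V^{\dagger} D(g_j)^{\dagger} \left(V^{-1}\right)^{\dagger} = \tilde{D}(g_j)^{\dagger}. \tag{18.20}$$

Then, from (18.18) we have

$$\tilde{D}(g_j)^{\dagger} \tilde{D}(g_j) = E. \tag{18.21}$$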

Equation (18.21) implies that a representation of any finite group can be converted to a unitary representation by a similarity transformation of (18.19). This completes the proof. ∎
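The construction used in the proof can be followed step by step on a small case. The sketch below is a hedged illustration, not part of the original proof: it assumes a two-element group {e, g} with a real, non-orthogonal (2, 2) representation chosen by hand, so that the adjoint reduces to the transpose and the unitary matrix U of (18.9) can be taken as a plane rotation. All names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices of equal shape."""
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(len(a)) for j in range(len(a[0])))


E = [[1.0, 0.0], [0.0, 1.0]]
Dg = [[1.0, -2.0], [0.0, -1.0]]   # D(g): Dg Dg = E, but Dg is not orthogonal
rep = [E, Dg]                     # assumed representation of the group {e, g}

# (18.7): H = sum_i D(g_i)^dagger D(g_i); here dagger is the transpose.
grams = [matmul(transpose(D), D) for D in rep]
H = [[sum(G[i][j] for G in grams) for j in range(2)] for i in range(2)]

# (18.9): diagonalize the real symmetric H with a plane rotation U.
phi = 0.5 * math.atan2(2.0 * H[0][1], H[0][0] - H[1][1])
c, s = math.cos(phi), math.sin(phi)
U = [[c, -s], [s, c]]
Lam = matmul(transpose(U), matmul(H, U))
lam = [Lam[0][0], Lam[1][1]]      # eigenvalues of H, expected to be positive

# (18.13): V = U (Lambda^{1/2})^{-1}, hence V^{-1} = Lambda^{1/2} U^T.
V = [[U[i][j] / math.sqrt(lam[j]) for j in range(2)] for i in range(2)]
V_inv = [[math.sqrt(lam[i]) * U[j][i] for j in range(2)] for i in range(2)]

# (18.19): the transformed matrices should satisfy (18.21).
unitary_rep = [matmul(V_inv, matmul(D, V)) for D in rep]
```

In this toy case each matrix in `unitary_rep` is orthogonal within rounding errors, and the group multiplication is preserved by the similarity transformation.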

18.2

Basis Functions of Representation

In Part III we considered a linear transformation of a vector. In that case we defined vectors as abstract elements for which the operation laws of (11.1)–(11.8) hold. We assumed that the operation is addition. In this part, so far we have dealt with vectors mostly in ℝ³. Therefore, vectors naturally possess geometric features. In this section, we extend the notion of vectors so that they can be treated under a wider scope. More specifically, we include mathematical functions treated in analysis as vectors.

To this end, let us think of a basis of a representation. According to the dimension d of the representation, we have d basis vectors. Here we adopt d linearly independent basis vectors for the representation. Let $\psi_1, \psi_2, \dots, \psi_d$ be linearly independent vectors in a vector space $V^d$. Let ℊ = $\{g_1, g_2, \dots, g_n\}$ be a finite group of order n. Here we assume that $g_i$ (1 ≤ i ≤ n) is a linear transformation (or operator) such as a symmetry operation dealt with in Chap. 17. Suppose that the following relation holds with $g_i$ ∈ ℊ (1 ≤ i ≤ n):

$$g_i(\psi_\nu) = \sum_{\mu=1}^{d} \psi_\mu D_{\mu\nu}(g_i) \quad (1 \le \nu \le d). \tag{18.22}$$

Here we followed the notation of (11.37) that represented a linear transformation of a vector. Rewriting (18.22) more explicitly, we have

$$g_i(\psi_\nu) = (\psi_1 \ \psi_2 \ \cdots \ \psi_d) \begin{pmatrix} D_{11}(g_i) & \cdots & D_{1d}(g_i) \\ \vdots & \ddots & \vdots \\ D_{d1}(g_i) & \cdots & D_{dd}(g_i) \end{pmatrix}. \tag{18.23}$$

Comparing (18.23) with (11.37), we notice that $\psi_1, \psi_2, \dots, \psi_d$ act as vectors. The corresponding coordinates of the vectors (or a column vector), i.e., $x_1, x_2, \dots, x_d$, have been omitted.

Now let us make sure that a set L consisting of $\{D(g_1), D(g_2), \dots, D(g_n)\}$ forms a representation. Operating $g_j$ on (18.22), we have

$$\begin{aligned} g_j g_i(\psi_\nu) = g_j[g_i(\psi_\nu)] &= g_j\left[\sum_{\mu=1}^{d} \psi_\mu D_{\mu\nu}(g_i)\right] = \sum_{\mu=1}^{d} g_j(\psi_\mu) D_{\mu\nu}(g_i) \\ &= \sum_{\mu=1}^{d} \sum_{\lambda=1}^{d} \psi_\lambda D_{\lambda\mu}(g_j) D_{\mu\nu}(g_i) \\ &= \sum_{\lambda=1}^{d} \psi_\lambda \left[D(g_j) D(g_i)\right]_{\lambda\nu}. \end{aligned} \tag{18.24}$$

Putting $g_j g_i = g_k$ according to multiplication of group elements, we get

$$g_k(\psi_\nu) = \sum_{\lambda=1}^{d} \psi_\lambda \left[D(g_j) D(g_i)\right]_{\lambda\nu}. \tag{18.25}$$

Meanwhile, replacing $g_i$ in (18.22) with $g_k$, we have

$$g_k(\psi_\nu) = \sum_{\lambda=1}^{d} \psi_\lambda D_{\lambda\nu}(g_k). \tag{18.26}$$

Comparing (18.25) and (18.26) and considering the uniqueness of vector representation based on linear independence of the basis vectors (see Sect. 11.1), we get

$$\left[D(g_j) D(g_i)\right]_{\lambda\nu} = D_{\lambda\nu}(g_k) \equiv [D(g_k)]_{\lambda\nu}, \tag{18.27}$$

where the last identity follows the notation (11.38) of Sect. 11.2. Hence, we have

$$D(g_j) D(g_i) = D(g_k). \tag{18.28}$$

Thus, the set L consisting of $\{D(g_1), D(g_2), \dots, D(g_n)\}$ is certainly a representation of the group ℊ = $\{g_1, g_2, \dots, g_n\}$. In such a case, the set $\mathcal{B}$ consisting of d linearly independent functions (i.e., vectors)

$$\mathcal{B} = \{\psi_1, \psi_2, \dots, \psi_d\} \tag{18.29}$$

is said to be basis functions of the representation D. The number d equals the dimension of the representation. Correspondingly, the representation matrix is a (d, d) square matrix. As remarked above, the correspondence between the elements of L and ℊ is not necessarily one-to-one (isomorphic), but may be n-to-one (homomorphic).

Definition 18.2 Let D and D′ be two representations of ℊ = $\{g_1, g_2, \dots, g_n\}$. Suppose that these representations are related to each other by a similarity transformation such that

$$D'(g_i) = T^{-1} D(g_i) T \quad (1 \le i \le n), \tag{18.30}$$

where T is a non-singular matrix. Then, D and D′ are said to be equivalent representations, or simply equivalent. If the representations are not equivalent, they are called inequivalent.

Suppose that $\mathcal{B} = \{\psi_1, \psi_2, \dots, \psi_d\}$ is a basis of a representation D of ℊ = $\{g_1, g_2, \dots, g_n\}$. Then we have (18.22). Let T be a non-singular matrix. Using T, we want to transform a basis of the representation D from $\mathcal{B}$ to a new set $\mathcal{B}' = \{\psi_1', \psi_2', \dots, \psi_d'\}$. Individual elements of the new basis are expressed as

$$\psi_\nu' = \sum_{\lambda=1}^{d} \psi_\lambda T_{\lambda\nu}. \tag{18.31}$$

Since T is non-singular, this ensures that $\psi_1', \psi_2', \dots, \psi_d'$ are linearly independent and, hence, that $\mathcal{B}'$ forms another basis set of D (see Discussion of Sect. 11.4). Thus, we can describe $\psi_\mu$ (1 ≤ μ ≤ d) in terms of $\psi_\nu'$. That is,

$$\psi_\mu = \sum_{\lambda=1}^{d} \psi_\lambda' \left(T^{-1}\right)_{\lambda\mu}. \tag{18.32}$$

Operating $g_i$ on both sides of (18.31), we have

$$\begin{aligned} g_i(\psi_\nu') &= \sum_{\lambda=1}^{d} g_i(\psi_\lambda) T_{\lambda\nu} = \sum_{\lambda=1}^{d} \sum_{\mu=1}^{d} \psi_\mu D_{\mu\lambda}(g_i) T_{\lambda\nu} \\ &= \sum_{\lambda=1}^{d} \sum_{\mu=1}^{d} \sum_{\kappa=1}^{d} \psi_\kappa' \left(T^{-1}\right)_{\kappa\mu} D_{\mu\lambda}(g_i) T_{\lambda\nu} = \sum_{\kappa=1}^{d} \psi_\kappa' \left[T^{-1} D(g_i) T\right]_{\kappa\nu}. \end{aligned} \tag{18.33}$$

Let D′ be a representation of ℊ in reference to $\mathcal{B}' = \{\psi_1', \psi_2', \dots, \psi_d'\}$. Then we have

$$g_i(\psi_\nu') = \sum_{\kappa=1}^{d} \psi_\kappa' D_{\kappa\nu}'(g_i). \tag{18.34}$$
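To make the similarity transformation of (18.30) and (18.33) concrete, the sketch below transforms a small integer-valued representation with a hand-picked matrix T and checks that the transformed matrices still multiply like the group elements. The group (cyclic of order 3), the generator matrix, and T are assumptions chosen only because they keep the arithmetic exact.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


E = [[1, 0], [0, 1]]
g = [[0, -1], [1, -1]]            # assumed generator matrix; g g g = E

# Representation D of the cyclic group of order 3; element k stands for g^k.
rep = {0: E, 1: g, 2: matmul(g, g)}

T = [[2, 1], [1, 1]]              # non-singular matrix (det T = 1)
T_inv = [[1, -1], [-1, 2]]        # its inverse, written out by hand

# (18.30): D'(g_i) = T^{-1} D(g_i) T
rep_prime = {k: matmul(T_inv, matmul(D, T)) for k, D in rep.items()}

# The equivalent representation keeps the multiplication rule and the traces.
homomorphic = all(
    matmul(rep_prime[i], rep_prime[j]) == rep_prime[(i + j) % 3]
    for i in rep for j in rep)
traces = {k: rep[k][0][0] + rep[k][1][1] for k in rep}
traces_prime = {k: rep_prime[k][0][0] + rep_prime[k][1][1] for k in rep_prime}
```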

Hence, (18.30) follows in virtue of the linear independence of $\psi_1', \psi_2', \dots, \psi_d'$. Thus, we see that the transformation of basis vectors via (18.31) causes a similarity transformation between representation matrices. This is in parallel with (11.81) and (11.88).

Below we show several important notions as definitions. We assume that a group is a finite group.

Definition 18.3 Let D and $\tilde{D}$ be two representations of ℊ = $\{g_1, g_2, \dots, g_n\}$. Let $D(g_i)$ and $\tilde{D}(g_i)$ be two representation matrices of $g_i$ (1 ≤ i ≤ n). If we construct $\Delta(g_i)$ such that

$$\Delta(g_i) = \begin{pmatrix} D(g_i) & \\ & \tilde{D}(g_i) \end{pmatrix}, \tag{18.35}$$

then $\Delta(g_i)$ is a representation as well. The representation $\Delta(g_i)$ is said to be a direct sum of $D(g_i)$ and $\tilde{D}(g_i)$. We denote it by

$$\Delta(g_i) = D(g_i) \oplus \tilde{D}(g_i). \tag{18.36}$$

The dimension of $\Delta(g_i)$ is the sum of that of $D(g_i)$ and that of $\tilde{D}(g_i)$.

Definition 18.4 Let D be a representation of ℊ and let $D(g_i)$ be a representation matrix of a group element $g_i$. If we can convert $D(g_i)$ to a block matrix such as (18.35) by a similarity transformation, D is said to be a reducible representation, or completely reducible. If the representation is not reducible, it is irreducible.
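The direct sum of Definition 18.3 can be sketched in a few lines. The example below assumes a two-element group {e, g} with a one-dimensional and a two-dimensional representation picked for illustration, assembles the block-diagonal matrices of (18.35), and checks that they again multiply like the group elements. The helper `direct_sum` is a hypothetical name.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def direct_sum(a, b):
    """Block-diagonal matrix with a in the upper left and b in the lower right."""
    n, m = len(a), len(b)
    top = [list(row) + [0] * m for row in a]
    bottom = [[0] * n + list(row) for row in b]
    return top + bottom


# Two assumed representations of the group {e, g} with g g = e;
# index 0 stands for e and index 1 for g.
D = {0: [[1]], 1: [[-1]]}                               # one-dimensional
D_tilde = {0: [[1, 0], [0, 1]], 1: [[0, 1], [1, 0]]}    # two-dimensional

# (18.35)/(18.36): Delta(g_i) = D(g_i) (+) D_tilde(g_i)
Delta = {k: direct_sum(D[k], D_tilde[k]) for k in D}

dimension = len(Delta[0])          # 1 + 2 = 3
homomorphic = all(
    matmul(Delta[i], Delta[j]) == Delta[(i + j) % 2]
    for i in Delta for j in Delta)
```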

A reducible representation can be decomposed (or reduced) to a direct sum of plural irreducible representations. Such an operation (or procedure) is called reduction.

Definition 18.5 Let ℊ = $\{g_1, g_2, \dots, g_n\}$ be a group and let V be a vector space. Then, we denote a representation D of ℊ operating on V by

$$D : ℊ \to GL(V).$$

We call V a representation space (or carrier space) of D [3, 4].

From Definition 18.5, the dimension of the representation is identical with the dimension of V. Suppose that there is a subspace W in V. If W is $D(g_i)$-invariant (see Sect. 12.2) for all $g_i$ ∈ ℊ, W is said to be an invariant subspace of V. Here, we say that W is $D(g_i)$-invariant if we have $|x\rangle \in W \Longrightarrow D(g_i)|x\rangle \in W$; see Sect. 12.2. Notice that $|x\rangle$ may well represent a function $\psi_\nu$ of (18.22). In this context, we have the following important theorem.

Theorem 18.2 [3] Let D: ℊ → GL(V) be a unitary representation over V. Let W be a $D(g_i)$-invariant subspace of V, where $g_i$ (1 ≤ i ≤ n) ∈ ℊ = $\{g_1, g_2, \dots, g_n\}$. Then, $W^{\perp}$ is a $D(g_i)$-invariant subspace of V as well.

Proof Suppose that $|a\rangle \in W^{\perp}$ and let $|b\rangle \in W$. Then, from (13.86) we have

$$\langle b|D(g_i)|a\rangle^{*} = \langle a|D(g_i)^{\dagger}|b\rangle = \langle a|[D(g_i)]^{-1}|b\rangle = \langle a|D\left(g_i^{-1}\right)|b\rangle,$$

where we used the fact that $D(g_i)$ is a unitary matrix; see (18.6). But, since W is $D(g_i)$-invariant from the supposition, $D\left(g_i^{-1}\right)|b\rangle \in W$. Here notice that $g_i^{-1}$ must be one of the elements $g_i$ (1 ≤ i ≤ n) of ℊ. Therefore, we should have

$$\langle b|D(g_i)|a\rangle^{*} = \langle a|D\left(g_i^{-1}\right)|b\rangle = 0.$$

This implies that $D(g_i)|a\rangle \in W^{\perp}$. This means that $W^{\perp}$ is a $D(g_i)$-invariant subspace of V. This completes the proof. ∎

From Theorem 14.1, we have

$$V = W \oplus W^{\perp}.$$

Correspondingly, as in the case of (12.102), $D(g_i)$ is reduced as follows:

$$D(g_i) = \begin{pmatrix} D^{(W)}(g_i) & \\ & D^{(W^{\perp})}(g_i) \end{pmatrix},$$

where $D^{(W)}(g_i)$ and $D^{(W^{\perp})}(g_i)$ are representation matrices associated with the subspaces W and $W^{\perp}$, respectively. This is the converse of the additive representations that appeared in Definition 18.3. From (11.17) and (11.18), we have

$$\dim V = \dim W + \dim W^{\perp}. \tag{18.37}$$

If V is a d-dimensional vector space, V is spanned by d linearly independent basis vectors (or functions) $\psi_\mu$ (1 ≤ μ ≤ d). Suppose that the dimensions of W and $W^{\perp}$ are $d^{(W)}$ and $d^{(W^{\perp})}$, respectively. Then, W and $W^{\perp}$ are spanned by $d^{(W)}$ linearly independent vectors and $d^{(W^{\perp})}$ linearly independent vectors, respectively. The subspaces W and $W^{\perp}$ may well further be decomposed into orthogonal complements [i.e., $D(g_i)$-invariant subspaces of V]. In this situation, in general, we write

$$D(g_i) = D^{(1)}(g_i) \oplus D^{(2)}(g_i) \oplus \cdots \oplus D^{(\omega)}(g_i). \tag{18.38}$$

We will develop further detailed discussion in Sect. 18.4. If the aforementioned decomposition cannot be done, D is irreducible. Therefore, it will be of great importance to examine the dimensions of the irreducible representations contained in (18.38). This is closely related to the choice of basis vectors and properties of the invariant subspaces. Notice that the same irreducible representation may occur repeatedly in (18.38).

Before advancing to the next section, however, let us think of an example to get used to abstract concepts. The example also serves as a preview of the contents of Chap. 19.

Example 18.1 Figure 18.1 shows structural formulae including resonance structures for the allyl radical. Thanks to the resonance structures, the allyl radical belongs to $C_{2v}$; see the multiplication table in Table 17.1. The molecule lies on the yz-plane and a line connecting C1 and a bonded H is the $C_2$ axis (see Fig. 18.1). The $C_2$ axis is identical to the z-axis. The molecule has mirror symmetries with respect to the yz- and zx-planes. We denote π-orbitals of C1, C2, and C3 by $\phi_1$, $\phi_2$, and $\phi_3$, respectively. We suppose that these orbitals extend toward the x-direction with a positive sign on the upper side and a negative sign on the lower side relative to the plane of the paper. Notice that we follow the custom of the group theory notation with the coordinate setting in Fig. 18.1.

We consider an inner vector space $V^3$ spanned by $\phi_1$, $\phi_2$, and $\phi_3$. To explicitly show that these are vectors of the inner vector space, we express them as $|\phi_1\rangle$, $|\phi_2\rangle$, and $|\phi_3\rangle$ in this example. Then, according to (11.19) of Sect. 11.1, we write

$$V^3 = \mathrm{Span}\{|\phi_1\rangle, |\phi_2\rangle, |\phi_3\rangle\}.$$

The vector space $V^3$ is a representation space pertinent to a representation D of the present example. Also, in parallel to (13.32) of Sect. 13.2 we express an arbitrary vector $|\psi\rangle \in V^3$ as

18.2

Basis Functions of Representation

689

H

H

C1

H

H

C1

H

H

C3

C2

C3

C2

H

H

H

H

z

x

y

Fig. 18.1 Allyl radical and its resonance structure. The molecule is placed on the yz-plane. The zaxis is identical with a straight line connecting C1 and H (i.e., C2 axis)

jψi ¼ c1 jϕ1 i þ c2 jϕ2 i þ c3 jϕ3 i ¼ jc1 ϕ1 þ c2 ϕ2 þ c3 ϕ3 i 0 1 c1 B C C ¼ ðjϕ1 i jϕ2 i jϕ3 iÞB @ c2 A: c3 Let us now operate a group element of C2v. For example, choosing C2(z), we have 0

1

B C2 ðzÞðjψ Þ ¼ ðjϕ1 i jϕ2 i jϕ3 iÞ@ 0 0

0 0 1

0

10

c1

1

CB C 1 A@ c2 A: 0 c3

ð18:39Þ

Thus, C2(z) is represented by a (3, 3) matrix. Other group elements are represented similarly. These results are collected in Table 18.1, where each matrix is given with respect to |ϕ1⟩, |ϕ2⟩, and |ϕ3⟩ as basis vectors. Notice that the matrix representations differ from those of Table 17.2, where we chose |x⟩, |y⟩, and |z⟩ for the basis vectors. From Table 18.1, we immediately see that the representation matrices are reduced to an upper (1, 1) diagonal matrix (i.e., just a number) and a lower (2, 2) diagonal matrix. In Table 18.1, we find that all the matrices are Hermitian (as well as unitary). Since C2v is an Abelian group (i.e., commutative), in light of Theorem 14.13, we should be able to diagonalize these matrices by a single unitary similarity transformation all at once. In fact, E and σv′(yz) are invariant with respect to a unitary similarity transformation, and so we only have to diagonalize C2(z) and σv(zx) at once.

Table 18.1 Matrix representation of symmetry operations given with respect to |ϕ1⟩, |ϕ2⟩, and |ϕ3⟩

$$\begin{array}{cccc} E & C_2(z) & \sigma_v(zx) & \sigma_v'(yz) \\[4pt] \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} & \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix} & \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} & \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix} \end{array}$$

As a characteristic equation of the above (3, 3) matrix of (18.39), we have

$$\begin{vmatrix} -\lambda - 1 & 0 & 0 \\ 0 & -\lambda & -1 \\ 0 & -1 & -\lambda \end{vmatrix} = 0.$$

Solving the above equation, we get λ = 1 or λ = −1 (as a double root). Also as a diagonalizing unitary matrix U, we get

$$U = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix} = U^\dagger. \tag{18.40}$$

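Thus, (18.39) can be rewritten as

$$\begin{aligned} C_2(z)(|\psi\rangle) &= \left(|\phi_1\rangle\ |\phi_2\rangle\ |\phi_3\rangle\right) U U^\dagger \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix} U U^\dagger \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} \\ &= \left(|\phi_1\rangle\ \ \tfrac{1}{\sqrt{2}}|\phi_2 + \phi_3\rangle\ \ \tfrac{1}{\sqrt{2}}|\phi_2 - \phi_3\rangle\right) \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} c_1 \\ \frac{1}{\sqrt{2}}(c_2 + c_3) \\ \frac{1}{\sqrt{2}}(c_2 - c_3) \end{pmatrix}. \end{aligned}$$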

The diagonalization of the representation matrix σv(zx) using the same U as above is left to the reader as an exercise. Table 18.2 shows the results of the diagonalized representations with regard to the vectors |ϕ1⟩, $\frac{1}{\sqrt{2}}|\phi_2 + \phi_3\rangle$, and $\frac{1}{\sqrt{2}}|\phi_2 - \phi_3\rangle$. Notice that the traces of the matrices remain unchanged in Tables 18.1 and 18.2, i.e., before and after the unitary similarity transformation. From Table 18.2, we find that the vectors are eigenfunctions of the individual symmetry operations. Each diagonal element is the corresponding eigenvalue of those operations. Of the three vectors, |ϕ1⟩ and $\frac{1}{\sqrt{2}}|\phi_2 + \phi_3\rangle$ have the same eigenvalues with respect to the individual symmetry operations. With the symmetry operations C2 and σv(zx), the other vector $\frac{1}{\sqrt{2}}|\phi_2 - \phi_3\rangle$ has an eigenvalue of a sign opposite to that of the former two vectors. Thus, we have arrived at "symmetry-adapted" vectors by taking linear combinations of the original vectors.

Table 18.2 Matrix representation of symmetry operations given with respect to |ϕ1⟩, $\frac{1}{\sqrt{2}}|\phi_2 + \phi_3\rangle$, and $\frac{1}{\sqrt{2}}|\phi_2 - \phi_3\rangle$

$$\begin{array}{cccc} E & C_2(z) & \sigma_v(zx) & \sigma_v'(yz) \\[4pt] \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} & \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix} & \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix} & \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix} \end{array}$$

Returning to Table 17.2, we see that the representation matrices have already been diagonalized. In terms of the representation space, we constructed the representation matrices with respect to x, y, and z as symmetry-adapted basis vectors. In the next chapter, we make the most of such vectors constructed by the symmetry-adapted linear combination.

Meanwhile, allocating appropriate numbers to the coefficients c1, c2, and c3, we represent any vector in V3 spanned by |ϕ1⟩, |ϕ2⟩, and |ϕ3⟩. In the next chapter, we deal with molecular orbital (MO) calculations. Including the present case of the allyl radical, we solve the energy eigenvalue problem by appropriately determining those coefficients (i.e., eigenvectors) and the corresponding energy eigenvalues in a representation space. The dimension of the representation space depends on the number of molecular orbitals. The representation space is decomposed into several or more (but finitely many) orthogonal complements according to (18.38). We will come back to the present example in Sect. 19.4.4 and further investigate the problems there.

As can be seen in the above example, the representation space accommodates various types of vectors, e.g., mathematical functions in the present case. If the representation space is decomposed into invariant subspaces, we can choose appropriate basis vectors for each subspace, the number of which is equal to the dimensionality of each irreducible representation [4]. In this context, the situation is related to that of Part III, where we have studied how a linear vector space is decomposed into a direct sum of invariant subspaces that are spanned by associated basis vectors.
In particular, it will be of great importance to construct mutually orthogonal symmetry-adapted vectors through linear combination of original basis vectors in each subspace associated with the irreducible representation. We further study these important subjects in the following several sections.
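The simultaneous diagonalization of Example 18.1 can be checked numerically. The following is a minimal sketch in plain Python; the names `TABLE_18_1`, `matmul`, and `transform` are illustrative helpers introduced here, not notation from the text, and the matrices are those of Table 18.1 and (18.40).

```python
import math

# Representation matrices of C2v on the basis (phi1, phi2, phi3), cf. Table 18.1.
TABLE_18_1 = {
    "E":            [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    "C2(z)":        [[-1, 0, 0], [0, 0, -1], [0, -1, 0]],
    "sigma_v(zx)":  [[1, 0, 0], [0, 0, 1], [0, 1, 0]],
    "sigma_v'(yz)": [[-1, 0, 0], [0, -1, 0], [0, 0, -1]],
}

s = 1 / math.sqrt(2)
U = [[1, 0, 0], [0, s, s], [0, s, -s]]  # real and symmetric, hence U^dagger = U


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transform(d):
    """Return U^dagger D U, rounded to suppress floating-point noise."""
    m = matmul(matmul(U, d), U)
    return [[round(x, 12) + 0.0 for x in row] for row in m]


diagonalized = {name: transform(d) for name, d in TABLE_18_1.items()}
for name, m in diagonalized.items():
    # Print the diagonal, i.e., the eigenvalues on the symmetry-adapted basis.
    print(name, [m[i][i] for i in range(3)])
```

The same U is applied to all four matrices, so off-diagonal elements vanishing for every operation is what "diagonalized all at once" means here; the diagonals can be compared with Table 18.2.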

18.3 Schur’s Lemmas and Grand Orthogonality Theorem (GOT)

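Pivotal notions of the representation theory of finite groups rest upon Schur’s lemmas (the first lemma and the second lemma).

Schur’s First Lemma [1, 2] Let D and D̃ be two irreducible representations of ℊ. Let the dimensions of D and D̃ be m and n, respectively. Suppose that for ∀g ∈ ℊ the following relation holds:

$$D(g)M = M\tilde{D}(g), \tag{18.41}$$

where M is an (m, n) matrix. Then we must have Case (i): M = 0 or Case (ii): M is a square matrix (i.e., m = n) with det M ≠ 0. In Case (i), the representations D and D̃ are inequivalent. In Case (ii), on the other hand, D and D̃ are equivalent.

Proof (a) First, suppose that m > n. Let B = {ψ1, ψ2, ⋯, ψm} be a basis set of the representation space related to D. Then we have

$$g(\psi_\nu) = \sum_{\mu=1}^{m} \psi_\mu D_{\mu\nu}(g) \quad (1 \le \nu \le m).$$

Next we form linear combinations of ψ1, ψ2, ⋯, and ψm such that

$$\phi_\nu = \sum_{\mu=1}^{m} \psi_\mu M_{\mu\nu} \quad (1 \le \nu \le n). \tag{18.42}$$

Operating g on both sides of (18.42), we have

$$\begin{aligned} g(\phi_\nu) &= \sum_{\mu=1}^{m} g(\psi_\mu) M_{\mu\nu} = \sum_{\mu=1}^{m}\Big[\sum_{\lambda=1}^{m} \psi_\lambda D_{\lambda\mu}(g)\Big] M_{\mu\nu} = \sum_{\lambda=1}^{m} \psi_\lambda \Big[\sum_{\mu=1}^{m} D_{\lambda\mu}(g) M_{\mu\nu}\Big] \\ &= \sum_{\lambda=1}^{m} \psi_\lambda \Big[\sum_{\mu=1}^{n} M_{\lambda\mu} \tilde{D}_{\mu\nu}(g)\Big] = \sum_{\mu=1}^{n} \Big[\sum_{\lambda=1}^{m} \psi_\lambda M_{\lambda\mu}\Big] \tilde{D}_{\mu\nu}(g) = \sum_{\mu=1}^{n} \phi_\mu \tilde{D}_{\mu\nu}(g). \end{aligned} \tag{18.43}$$

With the fourth equality of (18.43), we used (18.41). Therefore, B̃ = {ϕ1, ϕ2, ⋯, ϕn} transforms according to D̃. If m > n, we would have been able to construct a basis of the representation D using n (< m) functions: the vectors ϕν span a subspace of the representation space of D that is invariant under every g and whose dimension is at most n. Since D is irreducible, this is possible only if that subspace is {0}, i.e., only if every ϕν vanishes. As ψ1, ψ2, ⋯, and ψm are linearly independent, we must then have M = 0.

(b) Next, suppose that m < n. Taking the adjoint of (18.41) and using the unitary representations, for which D(g)† = D(g−1), we get M†D(g−1) = D̃(g−1)M†. Since g−1 runs through all the group elements as g does, we have

$$\tilde{D}(g) M^\dagger = M^\dagger D(g), \tag{18.46}$$

where M† is an (n, m) matrix. With the roles of D and D̃ exchanged, (18.46) corresponds to the case of m > n with an (m, n) matrix M. Figure 18.2 graphically shows the magnitude relationship in representation dimensions between representation matrices and M. Thus, in parallel with the above argument of (a), we exclude the case where m < n as well.

(c) We consider the third case of m = n. Similarly as before, we make linear combinations of the vectors contained in the basis set B = {ψ1, ψ2, ⋯, ψm} such that

$$\phi_\nu = \sum_{\mu=1}^{m} \psi_\mu M_{\mu\nu} \quad (1 \le \nu \le m), \tag{18.47}$$

where M is an (m, m) square matrix. If det M = 0, then ϕ1, ϕ2, ⋯, and ϕm are linearly dependent (see Sect. 11.4). With the number of linearly independent vectors p, we have p < m accordingly. As in (18.43) again, unless M = 0 [i.e., Case (i)], this implies that we would have obtained a representation of a smaller dimension p for D̃, in contradiction to the supposition that D̃ is irreducible. To avoid this contradiction, we must have det M ≠ 0. These complete the proof. ∎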

Fig. 18.2 Magnitude relationship between dimensions of representation matrices and M. (a) m > n: D(g) (m, m) × M (m, n) = M (m, n) × D̃(g) (n, n). (b) m < n: D̃(g) (n, n) × M† (n, m) = M† (n, m) × D(g) (m, m). The diagram is based on (18.41) and (18.46) of the text

Schur’s Second Lemma [1, 2] Let D be a representation of ℊ. Suppose that for ∀g ∈ ℊ we have

$$D(g)M = MD(g). \tag{18.48}$$

Then, if D is irreducible, M = cE with c being a complex number, where E is an identity matrix.

Proof Let c be an arbitrarily chosen complex number. From (18.48) we have

$$D(g)(M - cE) = (M - cE)D(g). \tag{18.49}$$

If D is irreducible, Schur’s First Lemma implies that we must have either (i) M − cE = 0 or (ii) det(M − cE) ≠ 0. A matrix M has at least one proper eigenvalue λ (Sect. 12.1), and so choosing λ for c in (18.49), we have det(M − λE) = 0. Consequently, only the former case is allowed. That is, with c = λ we have

$$M = cE. \tag{18.50}$$

∎

Schur’s lemmas lead to an important orthogonality theorem that plays a fundamental role in many scientific fields. The orthogonality theorem includes that of matrices and their traces (or characters).

Theorem 18.3: Grand Orthogonality Theorem (GOT) [5] Let D(1), D(2), ⋯ be all inequivalent irreducible representations of a group ℊ = {g1, g2, ⋯, gn} of order n. Let D(α) and D(β) be two irreducible representations chosen from among D(1), D(2), ⋯. Then, regarding their matrix representations, we have the following relationship:

$$\sum_{g} D^{(\alpha)}_{ij}(g)^{*} D^{(\beta)}_{kl}(g) = \frac{n}{d_\alpha}\,\delta_{\alpha\beta}\,\delta_{ik}\,\delta_{jl}, \tag{18.51}$$

where Σg means that the summation should be taken over all n group elements; dα denotes the dimension of the representation D(α). The symbol δαβ means that δαβ = 1 when D(α) and D(β) are equivalent and that δαβ = 0 when D(α) and D(β) are inequivalent.

Proof First we prove the case where D(α) = D(β). For the sake of simple expression, we omit the superscript and denote D(α) simply by D. Let us construct a matrix A such that

$$A = \sum_{g} D(g) X D(g^{-1}), \tag{18.52}$$

where X is an arbitrary matrix. Hence,

$$\begin{aligned} D(g')A &= \sum_{g} D(g')D(g)XD(g^{-1}) = \sum_{g} D(g')D(g)XD(g^{-1})D(g'^{-1})D(g') \\ &= \sum_{g} D(g')D(g)XD(g^{-1}g'^{-1})D(g') = \Big[\sum_{g} D(g'g)XD\big((g'g)^{-1}\big)\Big]D(g'). \end{aligned} \tag{18.53}$$

Thanks to the rearrangement theorem, for fixed g′ the element g′g runs through all the group elements as g does so. Therefore, we have

$$\sum_{g} D(g'g)XD\big((g'g)^{-1}\big) = \sum_{g} D(g)XD(g^{-1}) = A. \tag{18.54}$$

Thus,

$$D(g)A = AD(g). \tag{18.55}$$

According to Schur’s Second Lemma, we have

$$A = \lambda E. \tag{18.56}$$

The value of the constant λ depends upon the choice of X. Let X be $\delta_i^{(l)}\delta_j^{(m)}$, where all the matrix elements are zero except for the (l, m)-component that takes 1 (Sects. 12.5 and 12.6). Thus, from (18.52) and (18.56) we have

$$\sum_{g,p,q} D_{ip}(g)\,\delta_p^{(l)}\delta_q^{(m)}\,D_{qj}(g^{-1}) = \sum_{g} D_{il}(g)D_{mj}(g^{-1}) = \lambda_{lm}\delta_{ij}, \tag{18.57}$$

where λlm is a constant to be determined. Using the unitary representation, we have

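$$\sum_{g} D_{il}(g)D_{jm}(g)^{*} = \lambda_{lm}\delta_{ij}. \tag{18.58}$$

Next, we wish to determine the coefficients λlm. To this end, setting i = j and summing over i in (18.57), we get for the LHS

$$\begin{aligned} \sum_{g}\sum_{i} D_{il}(g)D_{mi}(g^{-1}) &= \sum_{g}\sum_{i} D_{mi}(g^{-1})D_{il}(g) = \sum_{g}\big[D(g^{-1})D(g)\big]_{ml} = \sum_{g}\big[D(g^{-1}g)\big]_{ml} \\ &= \sum_{g}\big[D(e)\big]_{ml} = \sum_{g}\delta_{ml} = n\delta_{ml}, \end{aligned} \tag{18.59}$$

where n is equal to the order of the group. As for the RHS, we have

$$\sum_{i}\lambda_{lm}\delta_{ii} = \lambda_{lm}d, \tag{18.60}$$

where d is equal to the dimension of D. From (18.59) and (18.60), we get

$$\lambda_{lm}d = n\delta_{lm} \quad\text{or}\quad \lambda_{lm} = \frac{n}{d}\,\delta_{lm}. \tag{18.61}$$

Therefore, from (18.58)

$$\sum_{g} D_{il}(g)D_{jm}(g)^{*} = \frac{n}{d}\,\delta_{lm}\delta_{ij}. \tag{18.62}$$

Specifying a species of the irreducible representation, we get

$$\sum_{g} D^{(\alpha)}_{il}(g)D^{(\alpha)}_{jm}(g)^{*} = \frac{n}{d_\alpha}\,\delta_{lm}\delta_{ij}, \tag{18.63}$$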

where dα is the dimension of D(α).

Next, we examine the relationship between two inequivalent irreducible representations. Let D(α) and D(β) be such representations with dimensions dα and dβ, respectively. Let us construct a matrix B such that

$$B = \sum_{g} D^{(\alpha)}(g) X D^{(\beta)}(g^{-1}), \tag{18.64}$$

where X is again an arbitrary matrix. Hence,

$$\begin{aligned} D^{(\alpha)}(g')B &= \sum_{g} D^{(\alpha)}(g')D^{(\alpha)}(g)XD^{(\beta)}(g^{-1}) \\ &= \sum_{g} D^{(\alpha)}(g')D^{(\alpha)}(g)XD^{(\beta)}(g^{-1})D^{(\beta)}(g'^{-1})D^{(\beta)}(g') \\ &= \Big[\sum_{g} D^{(\alpha)}(g'g)XD^{(\beta)}\big((g'g)^{-1}\big)\Big]D^{(\beta)}(g') = BD^{(\beta)}(g'). \end{aligned} \tag{18.65}$$

According to Schur’s First Lemma, we have

$$B = 0. \tag{18.66}$$

Putting $X = \delta_i^{(l)}\delta_j^{(m)}$ as before and rewriting (18.64), we get

$$\sum_{g} D^{(\alpha)}_{il}(g)D^{(\beta)}_{jm}(g)^{*} = 0. \tag{18.67}$$

Combining (18.63) and (18.67), we get (18.51). These procedures complete the proof. ∎
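The content of (18.51) can be made concrete with a small numerical check. The sketch below assumes the familiar two-dimensional irreducible representation of C3v (three rotations and three mirror operations acting on a plane); this group and the helper names `rot`, `matmul`, `irrep_E`, and `got_sum` are choices made here for illustration and do not appear in the text. Because the matrices are real, the complex conjugation in (18.51) has no effect.

```python
import math
from itertools import product


def rot(theta):
    """2x2 rotation matrix by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


mirror = [[1.0, 0.0], [0.0, -1.0]]
rotations = [rot(2 * math.pi * k / 3) for k in range(3)]
# Six matrices: E, C3, C3^2 and the three mirror operations (rotation times mirror).
irrep_E = rotations + [matmul(r, mirror) for r in rotations]

n, d = len(irrep_E), 2  # group order and dimension of the representation


def got_sum(i, j, k, l):
    """Left-hand side of (18.51) with alpha = beta (real matrices, so D* = D)."""
    return sum(D[i][j] * D[k][l] for D in irrep_E)


# Largest deviation from (n/d) * delta_ik * delta_jl over all index choices.
max_dev = max(abs(got_sum(i, j, k, l) - (n / d) * (i == k) * (j == l))
              for i, j, k, l in product(range(2), repeat=4))

# Orthogonality to the totally symmetric representation (all matrices equal to 1).
a1_overlap = max(abs(sum(D[i][j] for D in irrep_E))
                 for i in range(2) for j in range(2))

print(max_dev, a1_overlap)
```

Both printed numbers should be zero up to rounding error: the first tests the case α = β of (18.63), the second the inequivalent case (18.67).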

18.4 Characters

Representation matrices of a group are square matrices. In Part III we examined properties of a trace, i.e., a sum of diagonal elements of a square matrix. In group theory the trace is called a character.

Definition 18.6 Let D be a (matrix) representation of a group ℊ = {g1, g2, ⋯, gn}. The sum of diagonal elements χ(g) is defined as follows:

$$\chi(g) \equiv \mathrm{Tr}\,D(g) = \sum_{i=1}^{d} D_{ii}(g), \tag{18.68}$$

where g stands for group elements g1, g2, ⋯, and gn; Tr stands for “trace”; d is the dimension of the representation D. Let ∁ be a set defined as

$$∁ = \{\chi(g_1), \chi(g_2), \cdots, \chi(g_n)\}. \tag{18.69}$$

Then, the set ∁ is called a character of D. A character of an irreducible representation is said to be an irreducible character.

Let us describe several properties of the character or trace.

(i) A character of the identity element χ(e) is equal to the dimension d of a representation. This is because the identity element is given by a unit matrix.

(ii) Let P and Q be two square matrices. Then, we have

$$\mathrm{Tr}(PQ) = \mathrm{Tr}(QP). \tag{18.70}$$

This is because

$$\sum_{i}(PQ)_{ii} = \sum_{i}\sum_{j} P_{ij}Q_{ji} = \sum_{j}\sum_{i} Q_{ji}P_{ij} = \sum_{j}(QP)_{jj}. \tag{18.71}$$

Putting Q = SP−1 in (18.70), we get

$$\mathrm{Tr}\big(PSP^{-1}\big) = \mathrm{Tr}\big(SP^{-1}P\big) = \mathrm{Tr}(S). \tag{18.72}$$

Therefore, we have the following property:

(iii) Characters of group elements that are conjugate to each other are equal. If gi and gj are conjugate, these elements are connected by means of a suitable element g such that

$$g g_i g^{-1} = g_j. \tag{18.73}$$

Accordingly, a representation matrix is expressed as

$$D(g)D(g_i)D(g^{-1}) = D(g)D(g_i)[D(g)]^{-1} = D(g_j). \tag{18.74}$$

Taking a trace of both sides of (18.74), we have

$$\chi(g_i) = \chi(g_j). \tag{18.75}$$

(iv) Any two equivalent representations have the same trace. This immediately follows from (18.30).

There are several orthogonality theorems about a trace. Among them, the following theorem is well known.

Theorem 18.4 A trace of irreducible representations satisfies the following orthogonality relation:

$$\sum_{g} \chi^{(\alpha)}(g)^{*}\chi^{(\beta)}(g) = n\delta_{\alpha\beta}, \tag{18.76}$$

where χ(α) and χ(β) are traces of the irreducible representations D(α) and D(β), respectively.

Proof In (18.51), putting i = j and k = l on both sides and summing over all i and k, we have

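$$\sum_{g}\sum_{i,k} D^{(\alpha)}_{ii}(g)^{*} D^{(\beta)}_{kk}(g) = \sum_{i,k}\frac{n}{d_\alpha}\,\delta_{\alpha\beta}\delta_{ik}\delta_{ik} = \sum_{i,k}\frac{n}{d_\alpha}\,\delta_{\alpha\beta}\delta_{ik} = \frac{n}{d_\alpha}\,\delta_{\alpha\beta}\,d_\alpha = n\delta_{\alpha\beta}. \tag{18.77}$$

From (18.68) and (18.77), we get (18.76). This completes the proof. ∎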

Since the character takes the same value for all group elements belonging to the same conjugacy class Kl, we may write it as χ(Kl) and rewrite the summation of (18.76) as a summation over the conjugacy classes. Thus, we have

$$\sum_{l=1}^{n_c} \chi^{(\alpha)}(K_l)^{*}\chi^{(\beta)}(K_l)\,k_l = n\delta_{\alpha\beta}, \tag{18.78}$$

where nc denotes the number of conjugacy classes in the group and kl indicates the number of group elements contained in the class Kl.

We have seen a case where a representation matrix can be reduced to two (or more) block matrices as in (18.35). Also, as already seen in Part III, the block matrix decomposition takes place with normal matrices (including unitary matrices). The character is often used to examine the constitution of a reducible representation or reducible matrix. Alternatively, if a unitary matrix (a normal matrix, more widely) is decomposed into block matrices, we say that the unitary matrix comprises a direct sum of those block matrices. In physics and chemistry dealing with atoms, molecules, crystals, etc., we very often encounter such a situation. Extending (18.36), the relation can generally be summarized as

$$D(g_i) = D^{(1)}(g_i) \oplus D^{(2)}(g_i) \oplus \cdots \oplus D^{(\omega)}(g_i), \tag{18.79}$$

where D(gi) is a reducible representation for a group element gi; D(1)(gi), D(2)(gi), ⋯, and D(ω)(gi) are irreducible representations of the group. The notation D(ω)(gi) means that D(ω)(gi) may be equivalent (or identical) to D(1)(gi), D(2)(gi), etc. or may be inequivalent to them. More specifically, the same irreducible representation may well appear several times. To make the above situation clear, we usually use the following equation instead:

$$D(g_i) = \sum_{\alpha} q_\alpha D^{(\alpha)}(g_i), \tag{18.80}$$

where qα is zero or a positive integer and the D(α) are different types of irreducible representations; the summation is understood as a direct sum. If the same D(α) appears repeatedly in the direct sum, then qα specifies how many times D(α) appears in it. If D(α) does not appear, qα is zero. Bearing the above in mind, we take a trace of (18.80). Then we have

$$\chi(g) = \sum_{\alpha} q_\alpha \chi^{(\alpha)}(g), \tag{18.81}$$

where we omitted the subscript i indicating an element. To find qα, let us multiply both sides of (18.81) by χ(α)(g)* and take the summation over the group elements. That is,

$$\sum_{g} \chi^{(\alpha)}(g)^{*}\chi(g) = \sum_{\beta} q_\beta \sum_{g} \chi^{(\alpha)}(g)^{*}\chi^{(\beta)}(g) = \sum_{\beta} q_\beta\, n\delta_{\alpha\beta} = q_\alpha n, \tag{18.82}$$

where we used (18.76) with the second equality. Thus, we get

$$q_\alpha = \frac{1}{n}\sum_{g} \chi^{(\alpha)}(g)^{*}\chi(g). \tag{18.83}$$

The integer qα explicitly gives the number of times D(α)(g) appears in a reducible representation D(g). The expression pertinent to the classes is

$$q_\alpha = \frac{1}{n}\sum_{i} \chi^{(\alpha)}(K_i)^{*}\chi(K_i)\,k_i, \tag{18.84}$$

where Ki and ki denote the i-th class of the group and the number of elements belonging to Ki, respectively.
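As an illustration of (18.83), the reducible representation of Example 18.1 can be reduced from its characters alone. The sketch below takes the traces of the matrices in Table 18.1, namely (3, −1, 1, −3) in the order E, C2(z), σv(zx), σv′(yz), and uses the standard character table of C2v. The names `CHARACTER_TABLE` and `multiplicities` are illustrative and not from the text; since all characters of C2v are real, the complex conjugation is omitted in the code.

```python
from fractions import Fraction

# Characters of C2v in the order E, C2(z), sigma_v(zx), sigma_v'(yz).
CHARACTER_TABLE = {
    "A1": (1, 1, 1, 1),
    "A2": (1, 1, -1, -1),
    "B1": (1, -1, 1, -1),
    "B2": (1, -1, -1, 1),
}


def multiplicities(chi, table=CHARACTER_TABLE):
    """Return q_alpha = (1/n) sum_g chi_alpha(g) chi(g), cf. (18.83).

    The characters used here are real, so conjugation is a no-op.
    """
    n = len(chi)  # every class of C2v holds one element, so n = number of entries
    return {name: Fraction(sum(a * b for a, b in zip(chi_alpha, chi)), n)
            for name, chi_alpha in table.items()}


chi_allyl = (3, -1, 1, -3)  # traces of the matrices of Table 18.1
q = multiplicities(chi_allyl)
print({name: str(value) for name, value in q.items()})
```

The result corresponds to the decomposition A2 + 2B1, which agrees with Table 18.2: |ϕ1⟩ and the combination ϕ2 + ϕ3 share one set of eigenvalues, whereas ϕ2 − ϕ3 has another.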

18.5 Regular Representation and Group Algebra

Now, the readers may wonder how many different irreducible representations exist for a group. To answer this question, let us introduce a special representation called the regular representation.

Definition 18.7 Let ℊ = {g1, g2, ⋯, gn} be a group. Let us define an (n, n) square matrix D(R)(gν) for an arbitrary group element gν (1 ≤ ν ≤ n) such that

$$\big[D^{(R)}(g_\nu)\big]_{ij} = \delta\big(g_i^{-1} g_\nu g_j\big) \quad (1 \le i, j \le n), \tag{18.85}$$

where

$$\delta(g_\nu) = \begin{cases} 1 & \text{for } g_\nu = e \ (\text{i.e., identity}), \\ 0 & \text{for } g_\nu \ne e. \end{cases} \tag{18.86}$$

Let us consider a set

$$R = \big\{D^{(R)}(g_1), D^{(R)}(g_2), \cdots, D^{(R)}(g_n)\big\}. \tag{18.87}$$

Then, the set R is said to be a regular representation of the group ℊ.

In fact, R is a representation. This is confirmed as follows: In (18.85), if $g_i^{-1} g_\nu g_j = e$, then $\delta(g_i^{-1} g_\nu g_j) = 1$. This occurs when gνgj = gi (A). Meanwhile, let us consider a situation where $g_j^{-1} g_\mu g_k = e$. This occurs when gμgk = gj (B). Replacing gj in (A) with that in (B), we have

$$g_\nu g_\mu g_k = g_i. \tag{18.88}$$

That is,

$$g_i^{-1} g_\nu g_\mu g_k = e. \tag{18.89}$$

If we choose gi and gν, gj is uniquely decided from (A). If gμ is separately chosen, then gk is uniquely decided from (B) as well, because gj has already been uniquely decided. Thus, performing the following matrix calculations, we get

$$\sum_{j}\big[D^{(R)}(g_\nu)\big]_{ij}\big[D^{(R)}(g_\mu)\big]_{jk} = \sum_{j}\delta\big(g_i^{-1} g_\nu g_j\big)\,\delta\big(g_j^{-1} g_\mu g_k\big) = \delta\big(g_i^{-1} g_\nu g_\mu g_k\big) = \big[D^{(R)}(g_\nu g_\mu)\big]_{ik}. \tag{18.90}$$

Rewriting (18.90) in a matrix product form, we get

$$D^{(R)}(g_\nu)\,D^{(R)}(g_\mu) = D^{(R)}(g_\nu g_\mu). \tag{18.91}$$

Thus, D(R) is certainly a representation. To further confirm this, let us think of an example.

Example 18.2 Let us consider a thiophene molecule that we have already examined in Sect. 17.2. In a multiplication table, we arrange E, C2, σv, σv′ in a first column and their inverse elements E, C2, σv, σv′ in a first row. In this case, the inverse element is the same as the original element itself. Paying attention, e.g., to C2, we allocate the number 1 to the places where C2 appears and the number 0 otherwise. Then, that matrix is a regular representation of C2; see Table 18.3. Thus as D(R)(C2), we get

$$D^{(R)}(C_2) = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}. \tag{18.92}$$

As evidenced in (18.92), the rearrangement theorem ensures that the number 1 appears once and only once in each column and each row in such a way that individual column and row vectors become linearly independent. Thus, at the same time, we confirm that the matrix is unitary.

Another characteristic of the regular representation is that the identity is represented by an identity matrix. In this example, we have

$$D^{(R)}(E) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{18.93}$$

For the other symmetry operations, we have

$$D^{(R)}(\sigma_v) = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}, \qquad D^{(R)}(\sigma_v') = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}.$$
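The construction of Definition 18.7 is mechanical and can be sketched in a few lines. In the sketch below the elements of C2v are encoded as pairs of bits so that the group product becomes a bitwise XOR; this encoding (C2v being isomorphic to Z2 × Z2) and the names `ELEMENTS`, `CODE`, `mul`, `regular`, and `matmul` are assumptions made for illustration only.

```python
from itertools import product

ELEMENTS = ["E", "C2", "sv", "sv'"]  # ordering used for rows and columns
# Encode C2v as Z2 x Z2: sv -> (1, 0), sv' -> (0, 1), C2 = sv sv' -> (1, 1).
CODE = {"E": (0, 0), "C2": (1, 1), "sv": (1, 0), "sv'": (0, 1)}
NAME = {v: k for k, v in CODE.items()}


def mul(a, b):
    """Group product of two elements of C2v."""
    (a1, a2), (b1, b2) = CODE[a], CODE[b]
    return NAME[(a1 ^ b1, a2 ^ b2)]


def regular(g):
    """[D(g)]_ij = 1 if g_i^{-1} g g_j = e, i.e., g_i = g g_j; otherwise 0 (18.85)."""
    return [[1 if gi == mul(g, gj) else 0 for gj in ELEMENTS] for gi in ELEMENTS]


def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


D = {g: regular(g) for g in ELEMENTS}

# Check the homomorphism property (18.91) for every pair of elements.
is_homomorphism = all(matmul(D[a], D[b]) == D[mul(a, b)]
                      for a, b in product(ELEMENTS, repeat=2))

# Characters of the regular representation: trace of each matrix.
characters = {g: sum(D[g][i][i] for i in range(4)) for g in ELEMENTS}

for row in D["C2"]:
    print(row)
print(is_homomorphism, characters)
```

The printed matrix for C2 can be compared with (18.92), and the traces show the pattern expected of a regular representation: the group order for the identity and zero for every other element.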

Table 18.3 How to make a regular representation of C2v

$$\begin{array}{c|cccc} C_{2v} & E^{-1} & C_2(z)^{-1} & \sigma_v(zx)^{-1} & \sigma_v'(yz)^{-1} \\ \hline E & E & C_2 & \sigma_v & \sigma_v' \\ C_2(z) & C_2 & E & \sigma_v' & \sigma_v \\ \sigma_v(zx) & \sigma_v & \sigma_v' & E & C_2 \\ \sigma_v'(yz) & \sigma_v' & \sigma_v & C_2 & E \end{array}$$

Let χ(R)(gν) be a character of the regular representation. Then, according to the definition of (18.85),

$$\chi^{(R)}(g_\nu) = \sum_{i=1}^{n} \delta\big(g_i^{-1} g_\nu g_i\big) = \begin{cases} n & \text{for } g_\nu = e, \\ 0 & \text{for } g_\nu \ne e. \end{cases} \tag{18.94}$$

As can be seen from (18.92), the regular representation is reducible because the matrix is decomposed into block matrices. Therefore, the representation can be reduced to a direct sum of irreducible representations such that

$$D^{(R)} = \sum_{\alpha} q_\alpha D^{(\alpha)}, \tag{18.95}$$

where qα is a positive integer or zero and D(α) is an irreducible representation. Then, from (18.81), we have

$$\chi^{(R)}(g_\nu) = \sum_{\alpha} q_\alpha \chi^{(\alpha)}(g_\nu), \tag{18.96}$$

where χ(α) is a trace of the irreducible representation D(α). Using (18.83) and (18.94),

$$q_\alpha = \frac{1}{n}\sum_{g} \chi^{(\alpha)}(g)^{*}\chi^{(R)}(g) = \frac{1}{n}\,\chi^{(\alpha)}(e)^{*}\chi^{(R)}(e) = \frac{1}{n}\,\chi^{(\alpha)}(e)^{*}\,n = \chi^{(\alpha)}(e)^{*} = d_\alpha. \tag{18.97}$$

Note that the dimension dα of the representation D(α) is equal to the trace of its identity matrix. Also notice from (18.95) and (18.97) that D(R) contains every irreducible representation D(α) dα times. To show it more clearly, Table 18.4 gives a character table of C2v. The regular representation matrices are given in Example 18.2. For this, we have

$$D^{(R)}(C_{2v}) = A_1 + A_2 + B_1 + B_2. \tag{18.98}$$

This relation indicates that every irreducible representation of C2v is contained once (which is equal to the dimension of each irreducible representation of C2v). Returning to (18.94) and (18.96) and replacing qα with dα there, we get

$$\sum_{\alpha} d_\alpha \chi^{(\alpha)}(g_\nu) = \begin{cases} n & \text{for } g_\nu = e, \\ 0 & \text{for } g_\nu \ne e. \end{cases} \tag{18.99}$$

In particular, when gν = e, again we have χ(α)(e) = dα (dα is a real number!). That is,

$$\sum_{\alpha} d_\alpha^{\,2} = n. \tag{18.100}$$

This is a very important relation in the representation theory of a finite group in that (18.100) sets an upper limit to the number of irreducible representations and their dimensions. That number cannot exceed the order of the group.
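Relations (18.97) and (18.100) can also be tried on a non-Abelian group. The sketch below assumes the standard character table of C3v (classes {E}, {C3, C3²}, and the three σv; irreducible representations A1, A2, and E), which is not part of the present section; the names `CLASS_SIZES`, `CHARACTERS`, and `multiplicity` are illustrative. It uses the class form (18.84), and the characters are real so that no conjugation is needed.

```python
from fractions import Fraction

CLASS_SIZES = (1, 2, 3)  # {E}, {C3, C3^2}, {three sigma_v}
CHARACTERS = {"A1": (1, 1, 1), "A2": (1, 1, -1), "E": (2, -1, 0)}
order = sum(CLASS_SIZES)  # n = 6

# Character of the regular representation, cf. (18.94): n for e, 0 otherwise.
chi_regular = (order, 0, 0)


def multiplicity(chi_alpha, chi):
    """q_alpha = (1/n) sum_i k_i chi_alpha(K_i) chi(K_i), cf. (18.84)."""
    total = sum(k * a * b for k, a, b in zip(CLASS_SIZES, chi_alpha, chi))
    return Fraction(total, order)


q = {name: multiplicity(c, chi_regular) for name, c in CHARACTERS.items()}
dims = {name: c[0] for name, c in CHARACTERS.items()}  # d_alpha = chi_alpha(e)
sum_of_squares = sum(d * d for d in dims.values())

print({name: str(v) for name, v in q.items()}, sum_of_squares)
```

Each multiplicity coincides with the dimension of the corresponding irreducible representation, and the squared dimensions add up to the group order, as (18.100) requires.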

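Table 18.4 Character table of C2v

$$\begin{array}{c|rrrr|l} C_{2v} & E & C_2(z) & \sigma_v(zx) & \sigma_v'(yz) & \\ \hline A_1 & 1 & 1 & 1 & 1 & z;\ x^2,\ y^2,\ z^2 \\ A_2 & 1 & 1 & -1 & -1 & xy \\ B_1 & 1 & -1 & 1 & -1 & x;\ zx \\ B_2 & 1 & -1 & -1 & 1 & y;\ yz \end{array}$$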

In (18.76) and (18.78), we have shown the orthogonality relationship between traces. We have another important orthogonality relationship between them. To prove this theorem, we need the notion of a group algebra [5]. The argument is as follows:

(a) Let us think of a set comprising group elements expressed as

$$ℵ = \sum_{g} a_g\, g, \tag{18.101}$$

where g is a group element of a group ℊ = {g1, g2, ⋯, gn}; ag is an arbitrarily chosen complex number; Σg means that the summation should be taken over the group elements. Let ℵ′ be another set similarly defined as (18.101). That is,

$$ℵ' = \sum_{g'} a'_{g'}\, g'. \tag{18.102}$$

Then we can define the following sum:

$$ℵ + ℵ' = \sum_{g} a_g\, g + \sum_{g'} a'_{g'}\, g' = \sum_{g}\big(a_g\, g + a'_g\, g\big) = \sum_{g}\big(a_g + a'_g\big)\, g. \tag{18.103}$$

Also, we get

$$\begin{aligned} ℵ \cdot ℵ' &= \Big(\sum_{g} a_g\, g\Big)\Big(\sum_{g'} a'_{g'}\, g'\Big) = \sum_{g}\sum_{g'} a_g a'_{g'}\, gg' = \sum_{g}\sum_{g'^{-1}} a_g a'_{g'^{-1}}\, gg'^{-1} \\ &= \sum_{gg'}\sum_{g'^{-1}} a_{gg'} a'_{g'^{-1}}\, gg'g'^{-1} = \sum_{g}\sum_{g'} a_{gg'} a'_{g'^{-1}}\, g = \sum_{g}\Big(\sum_{g'} a_{gg'} a'_{g'^{-1}}\Big) g, \end{aligned} \tag{18.104}$$

where we used the rearrangement theorem and a suitable exchange of group elements. Thus, we see that the above-defined set ℵ is closed under summations (i.e., linear combinations) and multiplications. A set closed under summations and multiplications is said to be an algebra. If the set forms a group, the said set is called a group algebra.

So far we have treated calculations as a multiplication between two elements g and g′, i.e., g ⋄ g′ (see Sect. 16.1). Now we start regarding the calculations as a summation as well. In that case, group elements act as basis vectors in a vector space. Bearing this in mind, let us further define a specific group algebra.

(b) Let Ki be the i-th conjugacy class of a group ℊ = {g1, g2, ⋯, gn}. Also let Ki be such that

$$K_i = \big\{A_1^{(i)}, A_2^{(i)}, \cdots, A_{k_i}^{(i)}\big\}, \tag{18.105}$$

where ki is the number of elements belonging to Ki. Now think of a set gKig−1 (∀g ∈ ℊ). Then, due to the definition of a class, we have

$$gK_ig^{-1} \subset K_i. \tag{18.106}$$

Multiplying g−1 from the left and g from the right of both sides, we get Ki ⊂ g−1Kig. Since g is arbitrarily chosen, replacing g with g−1 we have

$$K_i \subset gK_ig^{-1}. \tag{18.107}$$

Therefore, we get

$$gK_ig^{-1} = K_i. \tag{18.108}$$

Meanwhile, for $A_\alpha^{(i)}, A_\beta^{(i)} \in K_i$; $A_\alpha^{(i)} \ne A_\beta^{(i)}$ (1 ≤ α, β ≤ ki) we have

$$gA_\alpha^{(i)}g^{-1} \ne gA_\beta^{(i)}g^{-1} \quad (\forall g \in ℊ). \tag{18.109}$$

This is because if the equality held in (18.109), we would have $A_\alpha^{(i)} = A_\beta^{(i)}$, in contradiction.

(c) Let K be a set collecting several classes and described as

$$K = \sum_{i} a_i K_i, \tag{18.110}$$

where ai is a positive integer or zero. Thanks to (18.108) and (18.110), we have

$$gKg^{-1} = K. \tag{18.111}$$

Conversely, if a group algebra K satisfies (18.111), K can be expressed as a sum of classes such as (18.110). Here suppose that K is not expressed by (18.110), but described by

$$K = \sum_{i} a_i K_i + Q, \tag{18.112}$$

where Q is an “incomplete” set that does not form a class. Then,

$$gKg^{-1} = g\Big(\sum_{i} a_i K_i + Q\Big)g^{-1} = g\Big(\sum_{i} a_i K_i\Big)g^{-1} + gQg^{-1} = \sum_{i} a_i K_i + gQg^{-1} = K = \sum_{i} a_i K_i + Q, \tag{18.113}$$

where the equality before the last comes from (18.111). Thus, from (18.113) we get

$$gQg^{-1} = Q \quad (\forall g \in ℊ). \tag{18.114}$$

By definition of the classes, this implies that Q must form a “complete” class, in contradiction to the supposition. Thus, (18.110) holds. In Sect. 16.3, we described several characteristics of an invariant subgroup. As readily seen from (16.11) and (18.111), any invariant subgroup consists of two or more entire classes. Conversely, if a subgroup comprises entire classes, it must be an invariant subgroup.

(d) Let us think of a product of classes. Let Ki and Kj be sets described as (18.105). We define a product KiKj as a set containing the products $A_\alpha^{(i)}A_\beta^{(j)}$ (1 ≤ α ≤ ki, 1 ≤ β ≤ kj). That is,

$$K_iK_j = \sum_{l=1}^{k_i}\sum_{m=1}^{k_j} A_l^{(i)}A_m^{(j)}. \tag{18.115}$$

Multiplying (18.115) by g (∀g ∈ ℊ) from the left and by g−1 from the right, we get

$$gK_iK_jg^{-1} = \big(gK_ig^{-1}\big)\big(gK_jg^{-1}\big) = K_iK_j. \tag{18.116}$$

From the above discussion, we get

$$K_iK_j = \sum_{l} c_{ijl}K_l, \tag{18.117}$$

where cijl is a positive integer or zero. In fact, when we take gKiKjg−1, we merely permute the terms in (18.117).

(e) If two group elements of the group ℊ = {g1, g2, ⋯, gn} are conjugate to each other, their inverse elements are conjugate to each other as well. In fact, suppose that for gμ, gν ∈ Ki, we have

$$g_\mu = gg_\nu g^{-1}. \tag{18.118}$$

Then, taking the inverse of (18.118), we get

$$g_\mu^{-1} = gg_\nu^{-1}g^{-1}. \tag{18.119}$$

Thus, given a class Ki, there exists another class Ki′ that consists of the inverses of the elements of Ki. If gμ ≠ gν, then gμ−1 ≠ gν−1. Therefore, Ki and Ki′ are of the same order. Suppose that the number of elements contained in Ki is ki and that in Ki′ is ki′. Then, we have

$$k_i = k_{i'}. \tag{18.120}$$

If Kj ≠ Ki′ (i.e., Kj is not identical with Ki′ as a set), KiKj does not contain e. In fact, suppose that for gρ ∈ Ki there were a group element b ∈ Kj such that bgρ = e. Then, we would have

$$b = g_\rho^{-1} \quad\text{and}\quad b \in K_{i'}. \tag{18.121}$$

This would mean that b ∈ Kj ∩ Ki′, implying that Kj = Ki′ by definition of a class. This is in contradiction to Kj ≠ Ki′. Consequently, KiKj does not contain e. Taking the product of the classes Ki and Ki′, we obtain the identity e precisely ki times. Rewriting (18.117), we have

$$K_iK_j = c_{ij1}K_1 + \sum_{l \ne 1} c_{ijl}K_l, \tag{18.122}$$

where K1 = {e}. As mentioned below, we are most interested in the first term in (18.122). In (18.122), if Kj = Ki′, we have

$$c_{ij1} = k_i. \tag{18.123}$$

On the other hand, if Kj ≠ Ki′, cij1 = 0. Summarizing the above arguments, we get

$$c_{ij1} = \begin{cases} k_i & \text{for } K_j = K_{i'}, \\ 0 & \text{for } K_j \ne K_{i'}. \end{cases} \tag{18.124}$$

Or, symbolically we write

$$c_{ij1} = k_i\,\delta_{ji'}. \tag{18.125}$$
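The class multiplication coefficients cijl of (18.117) can be computed by brute force for a small group. The sketch below uses the permutation group S3, which is isomorphic to C3v; this choice of group and the names `G`, `mul`, `inv`, `conjugacy_class`, and `class_product` are assumptions made for illustration. Note that the code counts classes from 0, so that the class K1 = {e} of the text has index 0 here. In S3 every class coincides with the class of its inverses, so (18.125) reduces to cij1 = ki δij.

```python
from collections import Counter
from itertools import permutations

G = list(permutations(range(3)))  # the six elements of S3


def mul(p, q):
    """Composite permutation: apply q first, then p."""
    return tuple(p[q[i]] for i in range(3))


def inv(p):
    """Inverse permutation."""
    r = [0] * 3
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)


def conjugacy_class(a):
    return frozenset(mul(mul(g, a), inv(g)) for g in G)


# Classes sorted by size: {e}, the two 3-cycles, the three transpositions.
classes = sorted({conjugacy_class(a) for a in G}, key=len)


def class_product(i, j):
    """Return [c_ij0, c_ij1, c_ij2] with K_i K_j = sum_l c_ijl K_l, cf. (18.117)."""
    counts = Counter(mul(a, b) for a in classes[i] for b in classes[j])
    coeffs = []
    for K in classes:
        values = {counts[x] for x in K}
        assert len(values) == 1  # all elements of a class occur equally often
        coeffs.append(values.pop())
    return coeffs


c = {(i, j): class_product(i, j) for i in range(3) for j in range(3)}
for key, value in c.items():
    print(key, value)
```

The internal assertion checks the statement that a product of classes is again a sum of entire classes, and the first entry of each coefficient list shows how often the identity occurs in KiKj.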

18.6 Classes and Irreducible Representations

After the aforementioned considerations, we have the following theorem:

Theorem 18.5 Let ℊ = {g1, g2, ⋯, gn} be a group. A trace of irreducible representations satisfies the following orthogonality relation:

$$\sum_{\alpha=1}^{n_r} \chi^{(\alpha)}(K_i)\,\chi^{(\alpha)}(K_j)^{*} = \frac{n}{k_i}\,\delta_{ij}, \tag{18.126}$$

where the summation over α is taken over all nr inequivalent irreducible representations; Ki and Kj indicate conjugacy classes; ki denotes the number of elements contained in the i-th class Ki.

Proof Rewriting (18.108), we have

$$gK_i = K_ig \quad (\forall g \in ℊ). \tag{18.127}$$

Since a homomorphic correspondence holds between a group element and its representation matrix, a similar correspondence holds as well with (18.127). Let K̂i be a sum of ki matrices of the α-th irreducible representation D(α) and let K̂i be expressed as

$$\hat{K}_i = \sum_{g \in K_i} D^{(\alpha)}(g). \tag{18.128}$$

Note that in (18.128) D(α) functions as a linear transformation with respect to a group algebra. From (18.127), we have

$$D^{(\alpha)}(g)\,\hat{K}_i = \hat{K}_i\,D^{(\alpha)}(g) \quad (\forall g \in ℊ). \tag{18.129}$$

Since D(α) is an irreducible representation, K̂i must be expressed on the basis of Schur’s Second Lemma as

$$\hat{K}_i = \lambda E. \tag{18.130}$$

To determine λ, we take a trace of both sides of (18.130). Then, from (18.128) and (18.130) we get

$$k_i\,\chi^{(\alpha)}(K_i) = \lambda d_\alpha. \tag{18.131}$$

Thus, we get

$$\hat{K}_i = \frac{k_i}{d_\alpha}\,\chi^{(\alpha)}(K_i)\,E. \tag{18.132}$$

Next, corresponding to (18.122), we have

$$\hat{K}_i\hat{K}_j = \sum_{l} c_{ijl}\,\hat{K}_l. \tag{18.133}$$

Replacing K̂l in (18.133) with that of (18.132), we get

$$k_ik_j\,\chi^{(\alpha)}(K_i)\,\chi^{(\alpha)}(K_j) = d_\alpha\sum_{l} c_{ijl}\,k_l\,\chi^{(\alpha)}(K_l). \tag{18.134}$$

Returning to (18.99) and rewriting it, we have

$$\sum_{\alpha} d_\alpha\,\chi^{(\alpha)}(K_i) = n\delta_{i1}, \tag{18.135}$$

where again we have K1 = {e}. With respect to (18.134), we sum over all the irreducible representations α. Then we have

$$k_ik_j\sum_{\alpha}\chi^{(\alpha)}(K_i)\,\chi^{(\alpha)}(K_j) = \sum_{l} c_{ijl}\,k_l\Big[\sum_{\alpha} d_\alpha\,\chi^{(\alpha)}(K_l)\Big] = \sum_{l} c_{ijl}\,k_l\,n\delta_{l1} = c_{ij1}\,n. \tag{18.136}$$

In (18.136) we remark that k1 = 1, meaning that the number of group elements that K1 = {e} contains is 1. Rewriting (18.136), we get

$$k_ik_j\sum_{\alpha}\chi^{(\alpha)}(K_i)\,\chi^{(\alpha)}(K_j) = c_{ij1}\,n = k_i\,n\,\delta_{ji'}, \tag{18.137}$$

where we used (18.125) with the last equality. Moreover, replacing Kj with Kj′ in (18.137), noting that kj′ = kj from (18.120), and using

$$\chi^{(\alpha)}(K_{i'}) = \chi^{(\alpha)}(K_i)^{*}, \tag{18.138}$$

we get

$$k_ik_j\sum_{\alpha}\chi^{(\alpha)}(K_i)\,\chi^{(\alpha)}(K_j)^{*} = c_{ij'1}\,n = k_i\,n\,\delta_{ji}. \tag{18.139}$$

Rewriting (18.139), we finally get

$$\sum_{\alpha=1}^{n_r} \chi^{(\alpha)}(K_i)\,\chi^{(\alpha)}(K_j)^{*} = \frac{n}{k_i}\,\delta_{ij}. \tag{18.126}$$

Equations (18.78) and (18.126) are well known as orthogonality relations. So far, we have no idea about the mutual relationship between the two numbers $n_c$ and $n_r$ in magnitude. In (18.78), let us consider the following set $S$:

$$
S = \left\{ \sqrt{k_1}\,\chi^{(\alpha)}(K_1),\ \sqrt{k_2}\,\chi^{(\alpha)}(K_2),\ \dots,\ \sqrt{k_{n_c}}\,\chi^{(\alpha)}(K_{n_c}) \right\}. \tag{18.140}
$$

Viewing individual components in $S$ as coordinates of an $n_c$-dimensional vector, (18.78) can be considered as an inner product as expressed using (complex) coordinates of the two vectors. At the same time, (18.78) represents an orthogonal relationship between the vectors. Since we can obtain at most $n_c$ mutually orthogonal (i.e., linearly independent) vectors in an $n_c$-dimensional space, for the number ($n_r$) of such vectors we have

$$
n_r \le n_c. \tag{18.141}
$$

Here $n_r$ is equal to the number of different $\alpha$, i.e., the number of irreducible representations. Meanwhile, in (18.126) we consider the following set $S'$:

$$
S' = \left\{ \chi^{(1)}(K_i),\ \chi^{(2)}(K_i),\ \dots,\ \chi^{(n_r)}(K_i) \right\}. \tag{18.142}
$$

Similarly, individual components in $S'$ can be considered as coordinates of an $n_r$-dimensional vector. Again, (18.126) implies the orthogonality relation among vectors. Therefore, as for the number ($n_c$) of mutually orthogonal vectors we have

$$
n_c \le n_r. \tag{18.143}
$$

Thus, from (18.141) and (18.143) we finally reach a simple but very important conclusion about the relationship between $n_c$ and $n_r$ such that

$$
n_r = n_c. \tag{18.144}
$$

That is, the number ($n_r$) of inequivalent irreducible representations is equal to that ($n_c$) of conjugacy classes of the group. An immediate and important consequence of (18.144) together with (18.100) is that every irreducible representation of an Abelian group is one-dimensional. This is because in an Abelian group each individual group element constitutes a conjugacy class by itself. We have $n_c = n$ accordingly, so the group has $n$ inequivalent irreducible representations, which (18.100) can accommodate only if each of them is one-dimensional.
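The two orthogonality relations and the equality $n_r = n_c$ can be checked numerically on a small group. The sketch below is only an illustration: it assumes the standard character table of $C_{3v}$ (classes $\{E\}$, $\{C_3, C_3^2\}$, $\{3\sigma_v\}$), which is not taken from this section, and it writes the first relation in the class-summed form $\sum_i k_i \chi^{(\alpha)}(K_i)^*\chi^{(\beta)}(K_i) = n\delta_{\alpha\beta}$. The names `CLASS_SIZES`, `CHARS`, `row_sum`, and `column_sum` are hypothetical.

```python
from fractions import Fraction

# Character table of C3v, used here as an assumed worked example.
# Classes: K1 = {E}, K2 = {C3, C3^2}, K3 = {three sigma_v}.
CLASS_SIZES = [1, 2, 3]                      # k_i
CHARS = {                                    # chi^(alpha)(K_i); all real for C3v
    "A1": [1, 1, 1],
    "A2": [1, 1, -1],
    "E": [2, -1, 0],
}
N = sum(CLASS_SIZES)                         # order n of the group


def row_sum(a, b):
    """Sum over classes of k_i * chi^(a)(K_i)^* * chi^(b)(K_i)."""
    return sum(k * x * y for k, x, y in zip(CLASS_SIZES, CHARS[a], CHARS[b]))


def column_sum(i, j):
    """Sum over irreducible representations of chi(K_i)^* * chi(K_j), cf. (18.126)."""
    return sum(c[i] * c[j] for c in CHARS.values())


# Row orthogonality: n if alpha == beta, otherwise 0.
rows_ok = all(row_sum(a, b) == (N if a == b else 0) for a in CHARS for b in CHARS)
# Column orthogonality: n / k_i if i == j, otherwise 0.
cols_ok = all(
    column_sum(i, j) == (Fraction(N, CLASS_SIZES[i]) if i == j else 0)
    for i in range(3)
    for j in range(3)
)
print(rows_ok, cols_ok, len(CHARS) == len(CLASS_SIZES))
```

The character table is square here (three irreducible representations, three classes), which is the content of (18.144) for this particular group.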

18.7

Projection Operators

In Sect. 18.2 we have described how basis vectors (or functions) are transformed by a symmetry operation. In that case the symmetry operation is performed by a group element that belongs to a transformation group. More specifically, given a group ℊ $= \{g_1, g_2, \dots, g_n\}$ and a set of basis vectors $\psi_1, \psi_2, \dots,$ and $\psi_d$, the basis vectors are transformed by $g_i \in$ ℊ $(1 \le i \le n)$ such that

$$
g_i(\psi_\nu) = \sum_{\mu=1}^{d} \psi_\mu D_{\mu\nu}(g_i). \tag{18.22}
$$

We may well ask then how we can construct such basis vectors. In this section we address this question. A central concept here is the projection operator. We have already studied the definition and basic properties of projection operators. In this section, we deal with them bearing in mind that we apply group theory to molecular science, especially to quantum chemical calculations. In Sect. 18.6, we examined the permissible number of irreducible representations. We have reached the conclusion that the number ($n_r$) of inequivalent irreducible representations is equal to that ($n_c$) of conjugacy classes of the group. According to this conclusion, we modify the above equation (18.22) such that

$$
g\big(\psi_i^{(\alpha)}\big) = \sum_{j=1}^{d_\alpha} \psi_j^{(\alpha)} D_{ji}^{(\alpha)}(g), \tag{18.145}
$$

where $\alpha$ and $d_\alpha$ denote the $\alpha$-th irreducible representation and its dimension, respectively; the subscript $i$ is omitted from $g_i$ for simplicity. This naturally leads to the next question of how the basis vectors $\psi_j^{(\alpha)}$ are related to $\psi_j^{(\beta)}$ that are basis vectors as well, but belong to a different irreducible representation $\beta$. Suppose that we have an arbitrarily chosen function $f$. Then, $f$ is assumed to contain various components of different irreducible representations. Thus, let us assume that $f$ is decomposed into the components such that

$$
f = \sum_\alpha \sum_m c_m^{(\alpha)} \psi_m^{(\alpha)}, \tag{18.146}
$$

where $c_m^{(\alpha)}$ is a coefficient of the expansion and $\psi_m^{(\alpha)}$ is the $m$-th component of the $\alpha$-th irreducible representation. Now, let us define the following operator:

$$
P_{l(m)}^{(\alpha)} \equiv \frac{d_\alpha}{n} \sum_g D_{lm}^{(\alpha)}(g)^*\, g, \tag{18.147}
$$

where $\sum_g$ means that the summation should be taken over all $n$ group elements $g$; $d_\alpha$ denotes the dimension of the representation $D^{(\alpha)}$. Operating $P_{l(m)}^{(\alpha)}$ on $f$, we have

$$
\begin{aligned}
P_{l(m)}^{(\alpha)} f &= \frac{d_\alpha}{n} \sum_g D_{lm}^{(\alpha)}(g)^*\, g f = \frac{d_\alpha}{n} \sum_g D_{lm}^{(\alpha)}(g)^*\, g \sum_\nu \sum_k c_k^{(\nu)} \psi_k^{(\nu)} \\
&= \frac{d_\alpha}{n} \sum_g D_{lm}^{(\alpha)}(g)^* \sum_\nu \sum_k c_k^{(\nu)}\, g \psi_k^{(\nu)} \\
&= \frac{d_\alpha}{n} \sum_g D_{lm}^{(\alpha)}(g)^* \sum_\nu \sum_k c_k^{(\nu)} \sum_{j=1}^{d_\nu} \psi_j^{(\nu)} D_{jk}^{(\nu)}(g) \\
&= \frac{d_\alpha}{n} \sum_\nu \sum_k \sum_{j=1}^{d_\nu} c_k^{(\nu)} \psi_j^{(\nu)} \Big[ \sum_g D_{lm}^{(\alpha)}(g)^* D_{jk}^{(\nu)}(g) \Big] \\
&= \frac{d_\alpha}{n} \sum_\nu \sum_k \sum_{j=1}^{d_\nu} c_k^{(\nu)} \psi_j^{(\nu)}\, \frac{n}{d_\alpha}\, \delta_{\alpha\nu} \delta_{lj} \delta_{mk} = c_m^{(\alpha)} \psi_l^{(\alpha)},
\end{aligned}
\tag{18.148}
$$

where we used (18.51) for the equality before the last.

Thus, an implication of the operator $P_{l(m)}^{(\alpha)}$ is that if $f$ contains the component $\psi_m^{(\alpha)}$, i.e., $c_m^{(\alpha)} \ne 0$, $P_{l(m)}^{(\alpha)}$ plays a role in extracting the component $c_m^{(\alpha)} \psi_m^{(\alpha)}$ from $f$ and then converting it to $c_m^{(\alpha)} \psi_l^{(\alpha)}$. If the $\psi_m^{(\alpha)}$ component is not contained in $f$, that means from (18.148) $P_{l(m)}^{(\alpha)} f = 0$. In that case we choose a more suitable function for $f$. In this context, we will investigate an example of a quantum chemical calculation later.

In the above case, let us call $P_{l(m)}^{(\alpha)}$ a projection operator sensu lato. In Sect. 14.1, we dealt with several aspects of the projection operators. There we have mentioned that a projection operator should be idempotent and Hermitian in a rigorous sense (Definition 14.1). To address another important aspect of the projection operator, let us prove an important relation in the following theorem.

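Theorem 18.6 [1, 2] Let $P_{l(m)}^{(\alpha)}$ and $P_{s(t)}^{(\beta)}$ be projection operators defined in (18.147). Then, the following equation holds:

$$
P_{l(m)}^{(\alpha)} P_{s(t)}^{(\beta)} = \delta_{\alpha\beta}\, \delta_{ms}\, P_{l(t)}^{(\alpha)}. \tag{18.149}
$$

Proof We have

$$
\begin{aligned}
P_{l(m)}^{(\alpha)} P_{s(t)}^{(\beta)} &= \frac{d_\alpha}{n} \sum_g D_{lm}^{(\alpha)}(g)^*\, g\; \frac{d_\beta}{n} \sum_{g'} D_{st}^{(\beta)}(g')^*\, g' \\
&= \frac{d_\alpha}{n} \sum_g D_{lm}^{(\alpha)}(g)^*\, g\; \frac{d_\beta}{n} \sum_{g'} D_{st}^{(\beta)}(g^{-1}g')^*\, g^{-1}g' \\
&= \frac{d_\alpha}{n} \frac{d_\beta}{n} \sum_g \sum_{g'} D_{lm}^{(\alpha)}(g)^* D_{st}^{(\beta)}(g^{-1}g')^*\, g g^{-1} g' \\
&= \frac{d_\alpha}{n} \frac{d_\beta}{n} \sum_g \sum_{g'} D_{lm}^{(\alpha)}(g)^* \Big\{ \big[D^{(\beta)}(g)\big]^\dagger D^{(\beta)}(g') \Big\}_{st}^{*}\, e g' \\
&= \frac{d_\alpha}{n} \frac{d_\beta}{n} \sum_g \sum_{g'} D_{lm}^{(\alpha)}(g)^* \sum_k D_{ks}^{(\beta)}(g)\, D_{kt}^{(\beta)}(g')^*\, g' \\
&= \frac{d_\alpha}{n} \frac{d_\beta}{n} \sum_k \sum_{g'} \Big[ \sum_g D_{lm}^{(\alpha)}(g)^* D_{ks}^{(\beta)}(g) \Big] D_{kt}^{(\beta)}(g')^*\, g' \\
&= \frac{d_\alpha}{n} \frac{d_\beta}{n} \sum_k \sum_{g'} \frac{n}{d_\alpha}\, \delta_{\alpha\beta} \delta_{lk} \delta_{ms}\, D_{kt}^{(\beta)}(g')^*\, g' = \frac{d_\beta}{n}\, \delta_{\alpha\beta} \delta_{ms} \sum_{g'} D_{lt}^{(\beta)}(g')^*\, g' \\
&= \delta_{\alpha\beta}\, \delta_{ms}\, \frac{d_\beta}{n} \sum_{g'} D_{lt}^{(\beta)}(g')^*\, g' = \delta_{\alpha\beta}\, \delta_{ms}\, P_{l(t)}^{(\alpha)}.
\end{aligned}
\tag{18.150}
$$

This completes the proof. ∎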
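Relation (18.149) can be checked numerically once the abstract group elements are replaced by concrete matrices. The following is a minimal sketch under stated assumptions, none of which come from the text: the group is $C_{3v}$ realized as the six permutations of three triangle vertices, the operator $g$ is represented by a 3×3 permutation matrix, and the two-dimensional irreducible representation $E$ is built from the real orthogonal matrices that permute the vertices (so complex conjugation is trivial). All function and variable names are hypothetical.

```python
import math
from itertools import permutations

S, C = math.sqrt(3) / 2, -0.5
VERTS = [(1.0, 0.0), (C, S), (C, -S)]      # vertices v0, v1, v2 of a triangle
GROUP = list(permutations(range(3)))       # p sends vertex i to vertex p[i]


def irrep_e(p):
    """2x2 orthogonal matrix M with M v_i = v_p[i] (the irrep E of C3v)."""
    (a, b), (c, d) = VERTS[p[0]], VERTS[p[1]]
    # M = [v_p0 v_p1] [v0 v1]^(-1), with [v0 v1]^(-1) = [[1, -C/S], [0, 1/S]]
    return [[a, (c - a * C) / S], [b, (d - b * C) / S]]


def sign(p):
    """+1 for rotations (even permutations), -1 for reflections (odd ones)."""
    inversions = sum(p[i] > p[j] for i in range(3) for j in range(i + 1, 3))
    return -1.0 if inversions % 2 else 1.0


IRREPS = {"A1": lambda p: [[1.0]], "A2": lambda p: [[sign(p)]], "E": irrep_e}
DIM = {"A1": 1, "A2": 1, "E": 2}


def gamma(p):
    """3x3 permutation matrix standing in for the operator g."""
    return [[1.0 if p[i] == r else 0.0 for i in range(3)] for r in range(3)]


def proj(alpha, l, m):
    """P^(alpha)_{l(m)} = (d_alpha / n) * sum_g D_lm(g)^* g, cf. (18.147)."""
    out = [[0.0] * 3 for _ in range(3)]
    for p in GROUP:
        coeff = DIM[alpha] / len(GROUP) * IRREPS[alpha](p)[l][m]
        for r, row in enumerate(gamma(p)):
            for c, value in enumerate(row):
                out[r][c] += coeff * value
    return out


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]


def close(a, b, tol=1e-12):
    return all(abs(a[r][c] - b[r][c]) < tol for r in range(3) for c in range(3))


def theorem_holds():
    """Check P_l(m)^(a) P_s(t)^(b) = delta_ab delta_ms P_l(t)^(a) for all indices."""
    zero = [[0.0] * 3 for _ in range(3)]
    for a in IRREPS:
        for b in IRREPS:
            for l in range(DIM[a]):
                for m in range(DIM[a]):
                    for s in range(DIM[b]):
                        for t in range(DIM[b]):
                            lhs = matmul(proj(a, l, m), proj(b, s, t))
                            rhs = proj(a, l, t) if (a == b and m == s) else zero
                            if not close(lhs, rhs):
                                return False
    return True


print(theorem_holds())
```

Because the check runs in the three-dimensional permutation representation, which does not contain $A_2$, the $A_2$ operator is simply the zero matrix there; the relation then holds trivially for every product involving it.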

In the above proof, we used the rearrangement theorem and grand orthogonality theorem (GOT) as well as the homomorphism and unitarity of the representation matrices.

Comparing (18.146) and (18.148), we notice that the term $c_m^{(\alpha)} \psi_m^{(\alpha)}$ is not extracted entirely, but $c_m^{(\alpha)} \psi_l^{(\alpha)}$ is given instead. This is due to the linearity of $P_{l(m)}^{(\alpha)}$. Nonetheless, this is somewhat inconvenient for a practical purpose. To overcome this inconvenience, we modify (18.149). In (18.149) putting $s = m$, we have

$$
P_{l(m)}^{(\alpha)} P_{m(t)}^{(\beta)} = \delta_{\alpha\beta}\, P_{l(t)}^{(\alpha)}. \tag{18.151}
$$

We further modify the relation. Putting $\beta = \alpha$, we have

$$
P_{l(m)}^{(\alpha)} P_{m(t)}^{(\alpha)} = P_{l(t)}^{(\alpha)}. \tag{18.152}
$$

Putting $t = l$ furthermore,

$$
P_{l(m)}^{(\alpha)} P_{m(l)}^{(\alpha)} = P_{l(l)}^{(\alpha)}. \tag{18.153}
$$

In particular, putting $m = l$ moreover, we get

$$
P_{l(l)}^{(\alpha)} P_{l(l)}^{(\alpha)} = \Big[ P_{l(l)}^{(\alpha)} \Big]^2 = P_{l(l)}^{(\alpha)}. \tag{18.154}
$$

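In fact, putting $m = l$ in (18.148), we get

$$
P_{l(l)}^{(\alpha)} f = c_l^{(\alpha)} \psi_l^{(\alpha)}. \tag{18.155}
$$

This means that the term $c_l^{(\alpha)} \psi_l^{(\alpha)}$ has been extracted entirely. Moreover, in (18.149) putting $\beta = \alpha$, $s = l$, and $t = m$, we have

$$
P_{l(m)}^{(\alpha)} P_{l(m)}^{(\alpha)} = \Big[ P_{l(m)}^{(\alpha)} \Big]^2 = \delta_{ml}\, P_{l(m)}^{(\alpha)}.
$$

Therefore, for $P_{l(m)}^{(\alpha)}$ to be an idempotent operator, we must have $m = l$. That is, of various operators $P_{l(m)}^{(\alpha)}$, only $P_{l(l)}^{(\alpha)}$ is eligible for an idempotent operator.

Meanwhile, fully describing $P_{l(l)}^{(\alpha)}$ we have

$$
P_{l(l)}^{(\alpha)} = \frac{d_\alpha}{n} \sum_g D_{ll}^{(\alpha)}(g)^*\, g. \tag{18.156}
$$

Taking complex conjugate transposition (i.e., adjoint) of (18.156), we have

$$
\Big[ P_{l(l)}^{(\alpha)} \Big]^\dagger = \frac{d_\alpha}{n} \sum_g D_{ll}^{(\alpha)}(g)\, g^\dagger = \frac{d_\alpha}{n} \sum_{g} D_{ll}^{(\alpha)}(g^{-1})^*\, g^{-1} = \frac{d_\alpha}{n} \sum_g D_{ll}^{(\alpha)}(g)^*\, g = P_{l(l)}^{(\alpha)}, \tag{18.157}
$$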

where we used the unitarity of $g$ (i.e., $g^\dagger = g^{-1}$) and the equivalence of summation over $g$ and $g^{-1}$. Notice that the notation $g^\dagger$ is less common. This should be interpreted as meaning that $g^\dagger$ operates on a vector constituting a representation space. Thus, the notation $g^\dagger$ implies that $g$ is identified with its unitary representation matrix $D^{(\alpha)}(g)$. Note also that $D_{ll}^{(\alpha)}$ is not a matrix but a (complex) number. Namely, in (18.156) $D_{ll}^{(\alpha)}(g)^*$ is a coefficient of an operator $g$.

Equations (18.154) and (18.157) establish that $P_{l(l)}^{(\alpha)}$ is a projection operator in a rigorous sense. Let us call $P_{l(l)}^{(\alpha)}$ a projection operator sensu stricto accordingly. Also, we notice that $c_m^{(\alpha)} \psi_m^{(\alpha)}$ is entirely extracted from an arbitrary function $f$ including a coefficient. This situation resembles that of (12.205).

Regarding $P_{l(m)}^{(\alpha)}$ $(l \ne m)$, on the other hand, we have

$$
\Big[ P_{l(m)}^{(\alpha)} \Big]^\dagger = \frac{d_\alpha}{n} \sum_g D_{lm}^{(\alpha)}(g)\, g^\dagger = \frac{d_\alpha}{n} \sum_{g} D_{ml}^{(\alpha)}(g^{-1})^*\, g^{-1} = \frac{d_\alpha}{n} \sum_g D_{ml}^{(\alpha)}(g)^*\, g = P_{m(l)}^{(\alpha)}. \tag{18.158}
$$

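Hence, $P_{l(m)}^{(\alpha)}$ is not Hermitian. We have many other related equations. For instance,

$$
\Big[ P_{l(m)}^{(\alpha)} P_{m(l)}^{(\alpha)} \Big]^\dagger = \Big[ P_{m(l)}^{(\alpha)} \Big]^\dagger \Big[ P_{l(m)}^{(\alpha)} \Big]^\dagger = P_{l(m)}^{(\alpha)} P_{m(l)}^{(\alpha)}. \tag{18.159}
$$

Therefore, $P_{l(m)}^{(\alpha)} P_{m(l)}^{(\alpha)}$ is Hermitian, recovering the relation (18.153). In (18.149) putting $m = l$ and $t = s$, we get

$$
P_{l(l)}^{(\alpha)} P_{s(s)}^{(\beta)} = \delta_{\alpha\beta}\, \delta_{ls}\, P_{l(s)}^{(\alpha)}. \tag{18.160}
$$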

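As in the case of (18.146), we assume that $h$ is described as

$$
h = \sum_\alpha \sum_m d_m^{(\alpha)} \phi_m^{(\alpha)}, \tag{18.161}
$$

where $\phi_m^{(\alpha)}$ is transformed in the same manner as $\psi_m^{(\alpha)}$ in (18.146), but is linearly independent of $\psi_m^{(\alpha)}$. Namely, we have

$$
g\big(\phi_i^{(\alpha)}\big) = \sum_{j=1}^{d_\alpha} \phi_j^{(\alpha)} D_{ji}^{(\alpha)}(g). \tag{18.162}
$$

Tangible examples can be seen in Chap. 19.

Operating $P_{l(l)}^{(\alpha)}$ on both sides of (18.155), we have

$$
\Big[ P_{l(l)}^{(\alpha)} \Big]^2 f = P_{l(l)}^{(\alpha)} \Big[ c_l^{(\alpha)} \psi_l^{(\alpha)} \Big] = P_{l(l)}^{(\alpha)} f = c_l^{(\alpha)} \psi_l^{(\alpha)}, \tag{18.163}
$$

where with the second equality we used $\big[ P_{l(l)}^{(\alpha)} \big]^2 = P_{l(l)}^{(\alpha)}$. That is, we have

$$
P_{l(l)}^{(\alpha)} \Big[ c_l^{(\alpha)} \psi_l^{(\alpha)} \Big] = c_l^{(\alpha)} \psi_l^{(\alpha)}. \tag{18.164}
$$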

This equation means that $c_l^{(\alpha)} \psi_l^{(\alpha)}$ is an eigenfunction corresponding to an eigenvalue 1 of $P_{l(l)}^{(\alpha)}$. In other words, once $c_l^{(\alpha)} \psi_l^{(\alpha)}$ is extracted from $f$, it belongs to the “position” $l$ of an irreducible representation $\alpha$. Furthermore, for some constants $c$ and $d$ as well as the functions $f$ and $h$ that appeared in (18.146) and (18.161), we consider the following equation:

$$
P_{l(l)}^{(\alpha)} \Big[ c\, c_l^{(\alpha)} \psi_l^{(\alpha)} + d\, d_l^{(\alpha)} \phi_l^{(\alpha)} \Big] = c\, P_{l(l)}^{(\alpha)} c_l^{(\alpha)} \psi_l^{(\alpha)} + d\, P_{l(l)}^{(\alpha)} d_l^{(\alpha)} \phi_l^{(\alpha)} = c\, c_l^{(\alpha)} \psi_l^{(\alpha)} + d\, d_l^{(\alpha)} \phi_l^{(\alpha)}, \tag{18.165}
$$

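where with the last equality we used (18.164). This means that an arbitrary linear combination of $c_l^{(\alpha)} \psi_l^{(\alpha)}$ and $d_l^{(\alpha)} \phi_l^{(\alpha)}$ again belongs to the position $l$ of an irreducible representation $\alpha$. If $\psi_l^{(\alpha)}$ and $\phi_l^{(\alpha)}$ are linearly independent, we can construct two orthonormal basis vectors following Theorem 13.2 (Gram–Schmidt orthonormalization theorem). If there are other linearly independent vectors $p_l^{(\alpha)} \xi_l^{(\alpha)}, q_l^{(\alpha)} \varphi_l^{(\alpha)}, \dots$, where $\xi_l^{(\alpha)}$, $\varphi_l^{(\alpha)}$, etc. belong to the position $l$ of an irreducible representation $\alpha$, then $c\, c_l^{(\alpha)} \psi_l^{(\alpha)} + d\, d_l^{(\alpha)} \phi_l^{(\alpha)} + p\, p_l^{(\alpha)} \xi_l^{(\alpha)} + q\, q_l^{(\alpha)} \varphi_l^{(\alpha)} + \cdots$ again belongs to the position $l$ of an irreducible representation $\alpha$. Thus, we can construct orthonormal basis vectors of the representation space according to Theorem 13.2.

Regarding arbitrary functions $f$ and $h$ as vectors and using (18.160), we make an inner product of (18.160) such that

$$
\big\langle h P_{l(l)}^{(\alpha)} \big| P_{s(s)}^{(\beta)} \big| f \big\rangle = \big\langle h P_{l(l)}^{(\alpha)} \big| P_{l(l)}^{(\alpha)} P_{s(s)}^{(\beta)} \big| f \big\rangle = \big\langle h P_{l(l)}^{(\alpha)} \big| \delta_{\alpha\beta}\, \delta_{ls}\, P_{l(s)}^{(\alpha)} f \big\rangle = \delta_{\alpha\beta}\, \delta_{ls} \big\langle h P_{l(l)}^{(\alpha)} \big| P_{l(s)}^{(\alpha)} f \big\rangle, \tag{18.166}
$$

where with the first equality we used $P_{l(l)}^{(\alpha)} = P_{l(l)}^{(\alpha)} P_{l(l)}^{(\alpha)}$ (18.154). Meanwhile, from (18.148) we have

$$
P_{l(s)}^{(\alpha)} | f \rangle = c_s^{(\alpha)} \big| \psi_l^{(\alpha)} \big\rangle. \tag{18.167}
$$

Also using (18.155), we get

$$
P_{l(l)}^{(\alpha)} | h \rangle = d_l^{(\alpha)} \big| \phi_l^{(\alpha)} \big\rangle. \tag{18.168}
$$

Taking the adjoint of (18.168), we have

$$
\langle h | \Big[ P_{l(l)}^{(\alpha)} \Big]^\dagger = \big\langle h P_{l(l)}^{(\alpha)} \big| = \Big[ d_l^{(\alpha)} \Big]^* \big\langle \phi_l^{(\alpha)} \big|, \tag{18.169}
$$

where we used $\big[ P_{l(l)}^{(\alpha)} \big]^\dagger = P_{l(l)}^{(\alpha)}$ (18.157). The relation (18.169) is due to the notation of Sect. 13.3. Substituting (18.167) and (18.168) for (18.166),

$$
\big\langle h P_{l(l)}^{(\alpha)} \big| P_{s(s)}^{(\beta)} f \big\rangle = \delta_{\alpha\beta}\, \delta_{ls} \Big[ d_l^{(\alpha)} \Big]^* c_s^{(\alpha)} \big\langle \phi_l^{(\alpha)} \big| \psi_l^{(\alpha)} \big\rangle.
$$

Meanwhile, (18.166) can also be described as

$$
\big\langle h P_{l(l)}^{(\alpha)} \big| P_{s(s)}^{(\beta)} f \big\rangle = \big\langle d_l^{(\alpha)} \phi_l^{(\alpha)} \big| c_s^{(\beta)} \psi_s^{(\beta)} \big\rangle = \Big[ d_l^{(\alpha)} \Big]^* c_s^{(\beta)} \big\langle \phi_l^{(\alpha)} \big| \psi_s^{(\beta)} \big\rangle. \tag{18.170}
$$

For (18.166) and (18.170) to be identical, we must have

$$
\delta_{\alpha\beta}\, \delta_{ls} \Big[ d_l^{(\alpha)} \Big]^* c_s^{(\alpha)} \big\langle \phi_l^{(\alpha)} \big| \psi_l^{(\alpha)} \big\rangle = \Big[ d_l^{(\alpha)} \Big]^* c_s^{(\beta)} \big\langle \phi_l^{(\alpha)} \big| \psi_s^{(\beta)} \big\rangle.
$$

Deleting coefficients, we get

$$
\big\langle \phi_l^{(\alpha)} \big| \psi_s^{(\beta)} \big\rangle = \delta_{\alpha\beta}\, \delta_{ls} \big\langle \phi_l^{(\alpha)} \big| \psi_l^{(\alpha)} \big\rangle. \tag{18.171}
$$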

The relation (18.171) is frequently used to estimate whether definite integrals vanish. Functional forms depend upon actual problems we encounter in various situations. We will deal with this problem in Chap. 19 in relation to, e.g., discussion on optical transition and evaluation of overlap integrals.

To evaluate (18.170), if $\alpha \ne \beta$ or $l \ne s$, we get simply

$$
\big\langle h P_{l(l)}^{(\alpha)} \big| P_{s(s)}^{(\beta)} f \big\rangle = 0 \quad (\alpha \ne \beta \ \text{or}\ l \ne s). \tag{18.172}
$$

That is, under a given condition $\alpha \ne \beta$ or $l \ne s$, $P_{s(s)}^{(\beta)} | f \rangle$ and $P_{l(l)}^{(\alpha)} | h \rangle$ are orthogonal (see Theorem 13.3 of Sect. 13.4). The relation clearly indicates that functions belonging to different irreducible representations ($\alpha \ne \beta$) are mutually orthogonal. Even though the functions belong to the same irreducible representation, the functions are orthogonal if they are allocated to a different “place” as a basis vector designated by $l$, $s$, etc. Here the place means the index $j$ of $\psi_j^{(\alpha)}$ in (18.145) or $\phi_j^{(\alpha)}$ in (18.162) that designates the “order” of $\psi_j^{(\alpha)}$ within $\psi_1^{(\alpha)}, \psi_2^{(\alpha)}, \dots, \psi_{d_\alpha}^{(\alpha)}$ or $\phi_j^{(\alpha)}$ within $\phi_1^{(\alpha)}, \phi_2^{(\alpha)}, \dots, \phi_{d_\alpha}^{(\alpha)}$. This takes place if the representation is multidimensional (i.e., dimensionality: $d_\alpha$).

In (18.160) putting $\alpha = \beta$, we get

$$
P_{l(l)}^{(\alpha)} P_{s(s)}^{(\alpha)} = \delta_{ls}\, P_{l(s)}^{(\alpha)}.
$$

In the above relation, furthermore, unless $l = s$ we have

$$
P_{l(l)}^{(\alpha)} P_{s(s)}^{(\alpha)} = 0.
$$

Therefore, on the basis of the discussion of Sect. 14.1, $P_{l(l)}^{(\alpha)} + P_{s(s)}^{(\alpha)}$ is a projection operator as well in the case of $l \ne s$. Notice, however, that if $l = s$, $P_{l(l)}^{(\alpha)} + P_{s(s)}^{(\alpha)}$ is not a projection operator. Readers can readily show it. Moreover, let us define $P^{(\alpha)}$ as below:

$$
P^{(\alpha)} \equiv \sum_{l=1}^{d_\alpha} P_{l(l)}^{(\alpha)}. \tag{18.173}
$$

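Similarly, $P^{(\alpha)}$ is again a projection operator as well. From (18.156), $P^{(\alpha)}$ in (18.173) can be rewritten as

$$
P^{(\alpha)} = \frac{d_\alpha}{n} \sum_g \Big[ \sum_{l=1}^{d_\alpha} D_{ll}^{(\alpha)}(g) \Big]^* g = \frac{d_\alpha}{n} \sum_g \chi^{(\alpha)}(g)^*\, g. \tag{18.174}
$$

Returning to (18.155) and taking summation over $l$ there, we have

$$
\sum_{l=1}^{d_\alpha} P_{l(l)}^{(\alpha)} f = \sum_{l=1}^{d_\alpha} c_l^{(\alpha)} \psi_l^{(\alpha)}.
$$

Using (18.173), we get

$$
P^{(\alpha)} f = \sum_{l=1}^{d_\alpha} c_l^{(\alpha)} \psi_l^{(\alpha)}. \tag{18.175}
$$

Thus, an operator $P^{(\alpha)}$ has a clear meaning. That is, $P^{(\alpha)}$ plays a role in extracting all the vectors (or functions) that belong to the irreducible representation $\alpha$, including their coefficients. Defining those functions as

$$
\psi^{(\alpha)} \equiv \sum_{l=1}^{d_\alpha} c_l^{(\alpha)} \psi_l^{(\alpha)}, \tag{18.176}
$$

we succinctly rewrite (18.175) as

$$
P^{(\alpha)} f = \psi^{(\alpha)}. \tag{18.177}
$$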

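In turn, let us calculate $P^{(\beta)} P^{(\alpha)}$. This can be done with (18.174) as follows:

$$
P^{(\beta)} P^{(\alpha)} = \frac{d_\beta}{n} \sum_g \big[ \chi^{(\beta)}(g) \big]^* g\; \frac{d_\alpha}{n} \sum_{g'} \big[ \chi^{(\alpha)}(g') \big]^* g'. \tag{18.178}
$$

To carry out the calculation, (i) first we replace $g'$ with $g^{-1}g'$ and rewrite the summation over $g'$ as that over $g^{-1}g'$. (ii) Using the homomorphism and unitarity of the representation, we rewrite $\big[ \chi^{(\alpha)}(g') \big]^*$, with $g'$ replaced by $g^{-1}g'$, as

$$
\big[ \chi^{(\alpha)}(g^{-1}g') \big]^* = \sum_{i,j} \big[ D_{ji}^{(\alpha)}(g) \big] \big[ D_{ji}^{(\alpha)}(g') \big]^*. \tag{18.179}
$$

Rewriting (18.178) we have

$$
\begin{aligned}
P^{(\beta)} P^{(\alpha)} &= \frac{d_\beta}{n} \frac{d_\alpha}{n} \sum_g \sum_{g'} \sum_{k,i,j} \big[ D_{kk}^{(\beta)}(g) \big]^* \big[ D_{ji}^{(\alpha)}(g) \big] \big[ D_{ji}^{(\alpha)}(g') \big]^* g' \\
&= \frac{d_\beta}{n} \frac{d_\alpha}{n} \sum_{k,i,j} \sum_{g'} \frac{n}{d_\beta}\, \delta_{\alpha\beta} \delta_{kj} \delta_{ki} \big[ D_{ji}^{(\alpha)}(g') \big]^* g' = \delta_{\alpha\beta}\, \frac{d_\alpha}{n} \sum_k \sum_{g'} \big[ D_{kk}^{(\alpha)}(g') \big]^* g' \\
&= \delta_{\alpha\beta}\, \frac{d_\alpha}{n} \sum_{g'} \big[ \chi^{(\alpha)}(g') \big]^* g' = \delta_{\alpha\beta}\, P^{(\alpha)}.
\end{aligned}
\tag{18.180}
$$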

These relationships are anticipated from the fact that both $P^{(\alpha)}$ and $P^{(\beta)}$ are projection operators. As in the case of (18.160), (18.180) is useful to evaluate an inner product of functions. Again taking an inner product of (18.180) with arbitrary functions $f$ and $g$, we have

$$
\big\langle g \big| P^{(\beta)} P^{(\alpha)} \big| f \big\rangle = \big\langle g \big| \delta_{\alpha\beta}\, P^{(\alpha)} \big| f \big\rangle = \delta_{\alpha\beta} \big\langle g \big| P^{(\alpha)} \big| f \big\rangle. \tag{18.181}
$$

If we define $P$ such that

$$
P \equiv \sum_{\alpha=1}^{n_r} P^{(\alpha)}, \tag{18.182}
$$

then $P$ is once again a projection operator (see Sect. 14.1). Now taking summation in (18.177) over all the irreducible representations $\alpha$, we have

$$
\sum_{\alpha=1}^{n_r} P^{(\alpha)} f = \sum_{\alpha=1}^{n_r} \psi^{(\alpha)} = f. \tag{18.183}
$$

The function $f$ has been arbitrarily taken and, hence, we get

$$
P = E. \tag{18.184}
$$
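The character-based operators $P^{(\alpha)}$ of (18.174) are easy to build explicitly, since only characters are needed. The sketch below is an illustration under assumptions not made in the text: the group is $C_{3v}$ acting by permutation on three equivalent sites (for instance three hydrogen 1s orbitals), the operator $g$ is a 3×3 permutation matrix, and the characters are read off from the number of sites each operation leaves fixed. The helper names are hypothetical. Exact rational arithmetic is used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction
from itertools import permutations

GROUP = list(permutations(range(3)))        # C3v acting on three equivalent sites
IRREPS = ("A1", "A2", "E")


def character(alpha, p):
    """chi^(alpha)(g), read off from the number of sites left fixed by g."""
    fixed = sum(p[i] == i for i in range(3))  # 3: identity, 0: C3, 1: sigma_v
    if alpha == "A1":
        return 1
    if alpha == "A2":
        return -1 if fixed == 1 else 1
    return fixed - 1                          # E: 2, -1, 0


def projector(alpha):
    """P^(alpha) = (d_alpha / n) * sum_g chi^(alpha)(g)^* g as a 3x3 matrix."""
    d = character(alpha, (0, 1, 2))           # dimension = character of identity
    out = [[Fraction(0)] * 3 for _ in range(3)]
    for p in GROUP:
        weight = Fraction(d * character(alpha, p), len(GROUP))
        for i in range(3):
            out[p[i]][i] += weight            # g sends site i to site p[i]
    return out


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]


def apply(mat, vec):
    return [sum(mat[r][c] * vec[c] for c in range(3)) for r in range(3)]


P = {alpha: projector(alpha) for alpha in IRREPS}
ZERO = [[0] * 3 for _ in range(3)]
IDENTITY = [[int(r == c) for c in range(3)] for r in range(3)]

# (18.180): P^(beta) P^(alpha) = delta_{alpha beta} P^(alpha)
orthogonal = all(
    matmul(P[b], P[a]) == (P[a] if a == b else ZERO) for a in IRREPS for b in IRREPS
)
# (18.184): the sum of all P^(alpha) is the identity
total = [[sum(P[a][r][c] for a in IRREPS) for c in range(3)] for r in range(3)]

f = [1, 0, 0]                                 # a function localized on site 0
parts = {alpha: apply(P[alpha], f) for alpha in IRREPS}
print(orthogonal, total == IDENTITY, parts)
```

In this example the $A_2$ part of `f` vanishes, which is the situation described after (18.148): the chosen function simply contains no component of that irreducible representation.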

As mentioned in Example 18.1, the concept of the representation space is not only important but also very useful for addressing various problems of physics and chemistry. For instance, molecular orbital methods to be dealt with in Chap. 19 consider a representation space whose dimension is equal to the number of electrons of a molecule about which we wish to know, e.g., energy eigenvalues of those electrons. In that case, the dimension of representation space is equal to the number of molecular orbitals. According to a symmetry species of the molecule, the representation matrix is decomposed into a direct sum of invariant eigenspaces relevant to individual irreducible representations. If the basis vectors belong to different irreducible representations, such vectors are orthogonal to each other by virtue of (18.171). Even though those vectors belong to the same irreducible representation, they are orthogonal if they are allocated to a different place. However, it is often the case that the vectors belong to the same place of the same irreducible representation. Then, it is always possible according to Theorem 13.2 to make them mutually orthogonal by taking their linear combination. In such a way, we can construct an orthonormal basis set throughout the representation space. In fact, the method is a powerful tool for solving an energy eigenvalue problem and for determining associated eigenfunctions (or molecular orbitals).

18.8

Direct-Product Representation

In Sect. 16.5 we have studied basic properties of direct-product groups. Correspondingly, we examine in this section properties of direct-product representation. This notion is very useful to investigate optical transitions in molecular systems and selection rules relevant to those transitions.

Let $D^{(\alpha)}$ and $D^{(\beta)}$ be two different irreducible representations whose dimensions are $d_\alpha$ and $d_\beta$, respectively. Then, operating a group element $g$ on the basis functions we have

$$
g(\psi_i) = \sum_{k=1}^{d_\alpha} \psi_k D_{ki}^{(\alpha)}(g) \quad (1 \le i \le d_\alpha), \tag{18.185}
$$

$$
g(\phi_j) = \sum_{l=1}^{d_\beta} \phi_l D_{lj}^{(\beta)}(g) \quad (1 \le j \le d_\beta), \tag{18.186}
$$

where $\psi_k$ $(1 \le k \le d_\alpha)$ and $\phi_l$ $(1 \le l \le d_\beta)$ are basis functions of $D^{(\alpha)}$ and $D^{(\beta)}$, respectively. We can construct $d_\alpha d_\beta$ new basis vectors using $\psi_k \phi_l$. These functions are transformed according to $g$ such that

$$
g(\psi_i \phi_j) = g(\psi_i)\, g(\phi_j) = \Big[ \sum_k \psi_k D_{ki}^{(\alpha)}(g) \Big] \Big[ \sum_l \phi_l D_{lj}^{(\beta)}(g) \Big] = \sum_k \sum_l \psi_k \phi_l\, D_{ki}^{(\alpha)}(g) D_{lj}^{(\beta)}(g). \tag{18.187}
$$

Here let us define the following matrix $\big[ D^{(\alpha \times \beta)}(g) \big]_{kl,ij}$ such that

$$
\big[ D^{(\alpha \times \beta)}(g) \big]_{kl,ij} \equiv D_{ki}^{(\alpha)}(g)\, D_{lj}^{(\beta)}(g). \tag{18.188}
$$

Then we have

$$
g(\psi_i \phi_j) = \sum_k \sum_l \psi_k \phi_l \big[ D^{(\alpha \times \beta)}(g) \big]_{kl,ij}. \tag{18.189}
$$

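The notation using double scripts is somewhat complicated. We notice, however, that in (18.188) the order of subscripts $kilj$ is converted to $kl, ij$; i.e., the subscripts $i$ and $l$ have been interchanged. We write $D^{(\alpha)}$ and $D^{(\beta)}$ in explicit forms as follows:

$$
D^{(\alpha)}(g) = \begin{pmatrix} d_{1,1}^{(\alpha)} & \cdots & d_{1,d_\alpha}^{(\alpha)} \\ \vdots & \ddots & \vdots \\ d_{d_\alpha,1}^{(\alpha)} & \cdots & d_{d_\alpha,d_\alpha}^{(\alpha)} \end{pmatrix}, \quad
D^{(\beta)}(g) = \begin{pmatrix} d_{1,1}^{(\beta)} & \cdots & d_{1,d_\beta}^{(\beta)} \\ \vdots & \ddots & \vdots \\ d_{d_\beta,1}^{(\beta)} & \cdots & d_{d_\beta,d_\beta}^{(\beta)} \end{pmatrix}. \tag{18.190}
$$

Thus,

$$
D^{(\alpha \times \beta)}(g) = D^{(\alpha)}(g) \otimes D^{(\beta)}(g) = \begin{pmatrix} d_{1,1}^{(\alpha)} D^{(\beta)}(g) & \cdots & d_{1,d_\alpha}^{(\alpha)} D^{(\beta)}(g) \\ \vdots & \ddots & \vdots \\ d_{d_\alpha,1}^{(\alpha)} D^{(\beta)}(g) & \cdots & d_{d_\alpha,d_\alpha}^{(\alpha)} D^{(\beta)}(g) \end{pmatrix}. \tag{18.191}
$$

To get familiar with the double-scripted notation, we describe a case of (2, 2) matrices. Denoting

$$
D^{(\alpha)}(g) = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \quad \text{and} \quad D^{(\beta)}(g) = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}, \tag{18.192}
$$

we get

$$
D^{(\alpha \times \beta)}(g) = D^{(\alpha)}(g) \otimes D^{(\beta)}(g) = \begin{pmatrix}
a_{11}b_{11} & a_{11}b_{12} & a_{12}b_{11} & a_{12}b_{12} \\
a_{11}b_{21} & a_{11}b_{22} & a_{12}b_{21} & a_{12}b_{22} \\
a_{21}b_{11} & a_{21}b_{12} & a_{22}b_{11} & a_{22}b_{12} \\
a_{21}b_{21} & a_{21}b_{22} & a_{22}b_{21} & a_{22}b_{22}
\end{pmatrix}. \tag{18.193}
$$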

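Corresponding to (18.22), (18.189) describes the transformation of $\psi_i \phi_j$ regarding the double script. At the same time, a set $\{\psi_i \phi_j;\ (1 \le i \le d_\alpha,\ 1 \le j \le d_\beta)\}$ is a basis of $D^{(\alpha \times \beta)}$. In fact,

$$
\begin{aligned}
\big[ D^{(\alpha \times \beta)}(gg') \big]_{kl,ij} &= D_{ki}^{(\alpha)}(gg')\, D_{lj}^{(\beta)}(gg') \\
&= \Big[ \sum_\mu D_{k\mu}^{(\alpha)}(g) D_{\mu i}^{(\alpha)}(g') \Big] \Big[ \sum_\nu D_{l\nu}^{(\beta)}(g) D_{\nu j}^{(\beta)}(g') \Big] \\
&= \sum_\mu \sum_\nu D_{k\mu}^{(\alpha)}(g) D_{l\nu}^{(\beta)}(g)\, D_{\mu i}^{(\alpha)}(g') D_{\nu j}^{(\beta)}(g') \\
&= \sum_\mu \sum_\nu \big[ D^{(\alpha \times \beta)}(g) \big]_{kl,\mu\nu} \big[ D^{(\alpha \times \beta)}(g') \big]_{\mu\nu,ij}.
\end{aligned}
\tag{18.194}
$$

Equation (18.194) shows that the rule of matrix calculation with respect to the double subscript is satisfied. Consequently, we get

$$
D^{(\alpha \times \beta)}(gg') = D^{(\alpha \times \beta)}(g)\, D^{(\alpha \times \beta)}(g'). \tag{18.195}
$$

The relation (18.195) indicates that $D^{(\alpha \times \beta)}$ is certainly a representation of a group. This representation is said to be a direct-product representation. A character of the direct-product representation is given by putting $k = i$ and $l = j$ in (18.188). That is,

$$
\big[ D^{(\alpha \times \beta)}(g) \big]_{ij,ij} = D_{ii}^{(\alpha)}(g)\, D_{jj}^{(\beta)}(g). \tag{18.196}
$$

Denoting

$$
\chi^{(\alpha \times \beta)}(g) \equiv \sum_{i,j} \big[ D^{(\alpha \times \beta)}(g) \big]_{ij,ij}, \tag{18.197}
$$

we have

$$
\chi^{(\alpha \times \beta)}(g) = \chi^{(\alpha)}(g)\, \chi^{(\beta)}(g). \tag{18.198}
$$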
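The double-subscript bookkeeping of (18.188) is the Kronecker product of matrices, and both (18.195) and (18.198) can be confirmed numerically. The sketch below is an illustration only. It assumes, beyond the text, the two-dimensional irreducible representation $E$ of $C_{3v}$ realized by real orthogonal matrices that permute the vertices of a triangle, and it takes $D^{(\alpha)} = D^{(\beta)} = E$ for brevity. The pair $(k, l)$ is flattened to the single index $k\,d_\beta + l$, which reproduces the layout of (18.193). All names are hypothetical.

```python
import math
from itertools import permutations

S, C = math.sqrt(3) / 2, -0.5
VERTS = [(1.0, 0.0), (C, S), (C, -S)]      # vertices v0, v1, v2 of a triangle
GROUP = list(permutations(range(3)))       # p sends vertex i to vertex p[i]


def irrep_e(p):
    """2x2 orthogonal matrix M with M v_i = v_p[i] (the irrep E of C3v)."""
    (a, b), (c, d) = VERTS[p[0]], VERTS[p[1]]
    return [[a, (c - a * C) / S], [b, (d - b * C) / S]]


def compose(p, q):
    """Group product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(3))


def kron(a, b):
    """[A x B]_{kl,ij} = A_ki * B_lj with (k, l) flattened to k * len(b) + l."""
    nb = len(b)
    size = len(a) * nb
    return [
        [a[r // nb][c // nb] * b[r % nb][c % nb] for c in range(size)]
        for r in range(size)
    ]


def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def direct_product(p):
    """D^(E x E)(g) for the group element p."""
    d = irrep_e(p)
    return kron(d, d)


# (18.195): the direct-product matrices multiply like the group elements.
homomorphism = all(
    close(direct_product(compose(p, q)), matmul(direct_product(p), direct_product(q)))
    for p in GROUP
    for q in GROUP
)
# (18.198): the character of the product is the product of the characters.
characters = all(
    abs(trace(direct_product(p)) - trace(irrep_e(p)) ** 2) < 1e-12 for p in GROUP
)
print(homomorphism, characters)
```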

Even though $D^{(\alpha)}$ and $D^{(\beta)}$ are both irreducible, $D^{(\alpha \times \beta)}$ is not necessarily irreducible. Suppose that

$$
D^{(\alpha)}(g) \otimes D^{(\beta)}(g) = \sum_\omega q_\omega D^{(\omega)}(g), \tag{18.199}
$$

where $q_\gamma$ is given by (18.83). Then, we have

$$
q_\gamma = \frac{1}{n} \sum_g \chi^{(\gamma)}(g)^*\, \chi^{(\alpha \times \beta)}(g) = \frac{1}{n} \sum_g \chi^{(\gamma)}(g)^*\, \chi^{(\alpha)}(g)\, \chi^{(\beta)}(g), \tag{18.200}
$$

where $n$ is the order of the group. This relation is often used to perform quantum mechanical or chemical calculations, especially to evaluate optical transitions of matter including atoms and molecular systems. This is also useful to examine whether a definite integral of a product of functions (or an inner product of vectors) vanishes.

In Sect. 16.5, we investigated the definition and properties of direct-product groups. Similarly to the case of the direct-product representation, we consider a representation of the direct-product groups. Let us consider two groups ℊ and $H$ and assume that a direct-product group ℊ $\times\, H$ is defined (Sect. 16.5). Also let $D^{(\alpha)}$ and $D^{(\beta)}$ be $d_\alpha$- and $d_\beta$-dimensional representations of groups ℊ and $H$, respectively. Furthermore, let us define a matrix $D^{(\alpha \times \beta)}(ab)$ as in (18.188) such that

$$
\big[ D^{(\alpha \times \beta)}(ab) \big]_{kl,ij} \equiv D_{ki}^{(\alpha)}(a)\, D_{lj}^{(\beta)}(b), \tag{18.201}
$$

where $a$ and $b$ are arbitrarily chosen from ℊ and $H$, respectively, and $ab \in$ ℊ $\times\, H$. Then a set comprising $D^{(\alpha \times \beta)}(ab)$ forms a representation of ℊ $\times\, H$. (Readers, please verify it.) The dimension of $D^{(\alpha \times \beta)}$ is $d_\alpha d_\beta$. The character is given in (18.69), and so in the present case by putting $i = k$ and $j = l$ in (18.201) we get

$$
\chi^{(\alpha \times \beta)}(ab) = \sum_k \sum_l \big[ D^{(\alpha \times \beta)}(ab) \big]_{kl,kl} = \sum_k \sum_l D_{kk}^{(\alpha)}(a)\, D_{ll}^{(\beta)}(b) = \chi^{(\alpha)}(a)\, \chi^{(\beta)}(b). \tag{18.202}
$$

Equation (18.202) resembles (18.198). Hence, we should be careful not to confuse them. In (18.198), we were thinking of a direct-product representation within a sole group ℊ. In (18.202), however, we are considering a representation of the direct-product group comprising two different groups. In fact, even though a character is computed with respect to a sole group element $g$ in (18.198), in (18.202) we evaluate a character regarding two group elements $a$ and $b$ chosen from different groups ℊ and $H$, respectively.
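Returning to the single-group case, the reduction formula (18.200) needs only a character table. The sketch below evaluates it class by class, i.e., as $q_\gamma = \frac{1}{n}\sum_i k_i\, \chi^{(\gamma)}(K_i)^*\chi^{(\alpha)}(K_i)\chi^{(\beta)}(K_i)$. It assumes the standard character table of $C_{3v}$, which is not part of this section; since those characters are real, the complex conjugation is omitted in the code. The names `CLASS_SIZES`, `CHARS`, and `reduce_product` are hypothetical.

```python
from fractions import Fraction

# Assumed character table of C3v; classes {E}, {C3, C3^2}, {three sigma_v}.
CLASS_SIZES = [1, 2, 3]
CHARS = {"A1": [1, 1, 1], "A2": [1, 1, -1], "E": [2, -1, 0]}
N = sum(CLASS_SIZES)                          # order of the group


def reduce_product(alpha, beta):
    """Multiplicities q_gamma of (18.200) for D^(alpha) x D^(beta)."""
    # (18.198): characters of the direct product, class by class
    product = [x * y for x, y in zip(CHARS[alpha], CHARS[beta])]
    return {
        gamma: Fraction(
            sum(k * c * x for k, c, x in zip(CLASS_SIZES, chars, product)), N
        )
        for gamma, chars in CHARS.items()
    }


print(reduce_product("E", "E"))
print(reduce_product("A2", "E"))
```

For this table, $E \times E$ contains each of $A_1$, $A_2$, and $E$ once, whereas $A_2 \times E$ is again $E$. A vanishing multiplicity of the totally symmetric representation is what signals that the corresponding definite integral vanishes.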

18.9

Symmetric Representation and Antisymmetric Representation

As mentioned in Sect. 18.2, we have viewed a group element $g$ as a linear transformation over a vector space $V$. There we dealt with widely chosen functions as vectors. In this section we introduce other useful ideas so that we can apply them to molecular science and atomic physics.

In the previous section we defined the direct-product representation. In $D^{(\alpha \times \beta)}(g) = D^{(\alpha)}(g) \otimes D^{(\beta)}(g)$ we can freely consider a case where $D^{(\alpha)}(g) = D^{(\beta)}(g)$. Then we have

$$
g(\psi_i \phi_j) = g(\psi_i)\, g(\phi_j) = \sum_k \sum_l \psi_k \phi_l\, D_{ki}^{(\alpha)}(g) D_{lj}^{(\alpha)}(g). \tag{18.203}
$$

Regarding a product function $\psi_j \phi_i$ we can get a similar equation such that

$$
g(\psi_j \phi_i) = g(\psi_j)\, g(\phi_i) = \sum_k \sum_l \psi_k \phi_l\, D_{kj}^{(\alpha)}(g) D_{li}^{(\alpha)}(g). \tag{18.204}
$$

On the basis of the linearity of the relations (18.203) and (18.204), let us construct a linear combination of the product functions. That is,

$$
\begin{aligned}
g(\psi_i \phi_j \pm \psi_j \phi_i) &= g(\psi_i \phi_j) \pm g(\psi_j \phi_i) \\
&= \sum_k \sum_l \psi_k \phi_l\, D_{ki}^{(\alpha)}(g) D_{lj}^{(\alpha)}(g) \pm \sum_k \sum_l \psi_k \phi_l\, D_{kj}^{(\alpha)}(g) D_{li}^{(\alpha)}(g) \\
&= \sum_k \sum_l (\psi_k \phi_l \pm \psi_l \phi_k)\, D_{ki}^{(\alpha)}(g) D_{lj}^{(\alpha)}(g).
\end{aligned}
\tag{18.205}
$$

Here, defining $\Psi_{ij}^{\pm}$ as

$$
\Psi_{ij}^{\pm} = \psi_i \phi_j \pm \psi_j \phi_i, \tag{18.206}
$$

we rewrite (18.205) as

$$
g \Psi_{ij}^{\pm} = \sum_k \sum_l \Psi_{kl}^{\pm} \left\{ \frac{1}{2} \Big[ D_{ki}^{(\alpha)}(g) D_{lj}^{(\alpha)}(g) \pm D_{li}^{(\alpha)}(g) D_{kj}^{(\alpha)}(g) \Big] \right\}. \tag{18.207}
$$

Notice that $\Psi_{ij}^{+}$ and $\Psi_{ij}^{-}$ in (18.206) are symmetric and antisymmetric with respect to the interchange of subscripts $i$ and $j$, respectively. That is,

$$
\Psi_{ij}^{\pm} = \pm \Psi_{ji}^{\pm}. \tag{18.208}
$$

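We may naturally ask how we can constitute representation (matrices) with (18.207). To answer this question, we have to carry out calculations by replacing $g$ with $gg'$ in (18.207) and following the procedures of Sect. 18.2. Thus, we have

$$
\begin{aligned}
gg' \Psi_{ij}^{\pm} &= \sum_k \sum_l \Psi_{kl}^{\pm} \left\{ \frac{1}{2} \Big[ D_{ki}^{(\alpha)}(gg') D_{lj}^{(\alpha)}(gg') \pm D_{li}^{(\alpha)}(gg') D_{kj}^{(\alpha)}(gg') \Big] \right\} \\
&= \sum_k \sum_l \Psi_{kl}^{\pm} \left\{ \frac{1}{2} \Big[ \sum_\mu D_{k\mu}^{(\alpha)}(g) D_{\mu i}^{(\alpha)}(g') \sum_\nu D_{l\nu}^{(\alpha)}(g) D_{\nu j}^{(\alpha)}(g') \pm \sum_\mu D_{l\mu}^{(\alpha)}(g) D_{\mu i}^{(\alpha)}(g') \sum_\nu D_{k\nu}^{(\alpha)}(g) D_{\nu j}^{(\alpha)}(g') \Big] \right\} \\
&= \sum_k \sum_l \Psi_{kl}^{\pm} \left\{ \frac{1}{2} \sum_\mu \sum_\nu \Big[ D_{k\mu}^{(\alpha)}(g) D_{l\nu}^{(\alpha)}(g) \pm D_{l\mu}^{(\alpha)}(g) D_{k\nu}^{(\alpha)}(g) \Big] D_{\mu i}^{(\alpha)}(g') D_{\nu j}^{(\alpha)}(g') \right\} \\
&= \sum_k \sum_l \Psi_{kl}^{\pm} \left\{ \frac{1}{2} \sum_\mu \sum_\nu \Big\{ \big[ D^{(\alpha \times \alpha)}(g) \big]_{kl,\mu\nu} \pm \big[ D^{(\alpha \times \alpha)}(g) \big]_{lk,\mu\nu} \Big\} D_{\mu i}^{(\alpha)}(g') D_{\nu j}^{(\alpha)}(g') \right\},
\end{aligned}
\tag{18.209}
$$

where the last equality follows from the definition of a direct-product representation (18.188). We notice that the terms $\big[ D^{(\alpha \times \alpha)}(g) \big]_{kl,\mu\nu} \pm \big[ D^{(\alpha \times \alpha)}(g) \big]_{lk,\mu\nu}$ in (18.209) are symmetric and antisymmetric with respect to the interchange of subscripts $k$ and $l$ together with subscripts $\mu$ and $\nu$, respectively. Comparing both sides of (18.209), we see that this must be the case with $i$ and $j$ as well. Then, the last factor of (18.209) should be rewritten as:

$$
D_{\mu i}^{(\alpha)}(g') D_{\nu j}^{(\alpha)}(g') \ \to\ \frac{1}{2} \Big[ D_{\mu i}^{(\alpha)}(g') D_{\nu j}^{(\alpha)}(g') \pm D_{\nu i}^{(\alpha)}(g') D_{\mu j}^{(\alpha)}(g') \Big].
$$

Now, we define the following notations accordingly:

$$
\big\{ D^{[\alpha \times \alpha]}(g) \big\}_{kl,\mu\nu} \equiv \frac{1}{2} \Big\{ \big[ D^{(\alpha \times \alpha)}(g) \big]_{kl,\mu\nu} + \big[ D^{(\alpha \times \alpha)}(g) \big]_{lk,\mu\nu} \Big\} = \frac{1}{2} \Big[ D_{k\mu}^{(\alpha)}(g) D_{l\nu}^{(\alpha)}(g) + D_{l\mu}^{(\alpha)}(g) D_{k\nu}^{(\alpha)}(g) \Big], \tag{18.210}
$$

$$
\big\{ D^{\{\alpha \times \alpha\}}(g) \big\}_{kl,\mu\nu} \equiv \frac{1}{2} \Big\{ \big[ D^{(\alpha \times \alpha)}(g) \big]_{kl,\mu\nu} - \big[ D^{(\alpha \times \alpha)}(g) \big]_{lk,\mu\nu} \Big\} = \frac{1}{2} \Big[ D_{k\mu}^{(\alpha)}(g) D_{l\nu}^{(\alpha)}(g) - D_{l\mu}^{(\alpha)}(g) D_{k\nu}^{(\alpha)}(g) \Big]. \tag{18.211}
$$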

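Meanwhile, using $D_{\mu i}^{(\alpha)}(g') D_{\nu j}^{(\alpha)}(g') = \big[ D^{(\alpha \times \alpha)}(g') \big]_{\mu\nu,ij}$ and considering the exchange of summation with respect to the subscripts $\mu$ and $\nu$, we can also define $D^{[\alpha \times \alpha]}(g')$ and $D^{\{\alpha \times \alpha\}}(g')$ according to the symmetric and antisymmetric cases, respectively. Thus, for the symmetric case, we have

$$
gg' \Psi_{ij}^{+} = \sum_k \sum_l \Psi_{kl}^{+} \sum_\mu \sum_\nu \big\{ D^{[\alpha \times \alpha]}(g) \big\}_{kl,\mu\nu} \big\{ D^{[\alpha \times \alpha]}(g') \big\}_{\mu\nu,ij} = \sum_k \sum_l \Psi_{kl}^{+} \big\{ D^{[\alpha \times \alpha]}(g)\, D^{[\alpha \times \alpha]}(g') \big\}_{kl,ij}. \tag{18.212}
$$

Using (18.210), (18.207) can be rewritten as

$$
g \Psi_{ij}^{+} = \sum_k \sum_l \Psi_{kl}^{+} \big\{ D^{[\alpha \times \alpha]}(g) \big\}_{kl,ij}. \tag{18.213}
$$

Then we have

$$
gg' \Psi_{ij}^{+} = \sum_k \sum_l \Psi_{kl}^{+} \big\{ D^{[\alpha \times \alpha]}(gg') \big\}_{kl,ij}. \tag{18.214}
$$

Comparing (18.212) and (18.214), we finally get

$$
D^{[\alpha \times \alpha]}(gg') = D^{[\alpha \times \alpha]}(g)\, D^{[\alpha \times \alpha]}(g'). \tag{18.215}
$$

Similarly, for the antisymmetric case, we have

$$
gg' \Psi_{ij}^{-} = \sum_k \sum_l \Psi_{kl}^{-} \big\{ D^{\{\alpha \times \alpha\}}(g)\, D^{\{\alpha \times \alpha\}}(g') \big\}_{kl,ij}, \tag{18.216}
$$

$$
g \Psi_{ij}^{-} = \sum_k \sum_l \Psi_{kl}^{-} \big\{ D^{\{\alpha \times \alpha\}}(g) \big\}_{kl,ij}, \tag{18.217}
$$

$$
D^{\{\alpha \times \alpha\}}(gg') = D^{\{\alpha \times \alpha\}}(g)\, D^{\{\alpha \times \alpha\}}(g'). \tag{18.218}
$$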

Thus, both $D^{[\alpha \times \alpha]}(g)$ and $D^{\{\alpha \times \alpha\}}(g)$ produce well-defined representations. Letting the dimension of the representation $\alpha$ be $d_\alpha$, we have $d_\alpha(d_\alpha + 1)/2$ functions belonging to the symmetric representation and $d_\alpha(d_\alpha - 1)/2$ functions belonging to the antisymmetric representation. With the two-dimensional representation, for instance, functions belonging to the symmetric representation are

$$
\psi_1 \phi_1, \quad \psi_1 \phi_2 + \psi_2 \phi_1, \quad \text{and} \quad \psi_2 \phi_2.
$$

A function belonging to the antisymmetric representation is

$$
\psi_1 \phi_2 - \psi_2 \phi_1.
$$

Note that these vectors have not yet been normalized.

From (18.210) and (18.211), we can readily get useful expressions with characters of symmetric and antisymmetric representations. In (18.210) putting $\mu = k$ and $\nu = l$ and summing over $k$ and $l$,

$$
\begin{aligned}
\chi^{[\alpha \times \alpha]}(g) &= \sum_k \sum_l \big\{ D^{[\alpha \times \alpha]}(g) \big\}_{kl,kl} \\
&= \sum_k \sum_l \frac{1}{2} \Big[ D_{kk}^{(\alpha)}(g) D_{ll}^{(\alpha)}(g) + D_{lk}^{(\alpha)}(g) D_{kl}^{(\alpha)}(g) \Big] \\
&= \frac{1}{2} \Big\{ \big[ \chi^{(\alpha)}(g) \big]^2 + \chi^{(\alpha)}(g^2) \Big\}.
\end{aligned}
\tag{18.219}
$$

Similarly we have

$$
\chi^{\{\alpha \times \alpha\}}(g) = \frac{1}{2} \Big\{ \big[ \chi^{(\alpha)}(g) \big]^2 - \chi^{(\alpha)}(g^2) \Big\}. \tag{18.220}
$$
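The character formulas (18.219) and (18.220) can be tried out on a concrete case. The sketch below is an illustration under assumptions that go beyond the text: the group is $C_{3v}$ realized as the permutations of three sites, $g^2$ is obtained by composing a permutation with itself, and the characters of $A_1$, $A_2$, and $E$ are read off from the number of fixed sites. The resulting symmetric and antisymmetric characters of $E \times E$ are then reduced with the usual multiplicity formula. All names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

GROUP = list(permutations(range(3)))        # C3v as permutations of three sites
IRREPS = ("A1", "A2", "E")


def character(alpha, p):
    """chi^(alpha)(g), read off from the number of sites left fixed by g."""
    fixed = sum(p[i] == i for i in range(3))  # 3: identity, 0: C3, 1: sigma_v
    if alpha == "A1":
        return 1
    if alpha == "A2":
        return -1 if fixed == 1 else 1
    return fixed - 1                          # E: 2, -1, 0


def square(p):
    """The group element g^2."""
    return tuple(p[p[i]] for i in range(3))


def symmetrized_characters(alpha):
    """Characters of [alpha x alpha] and {alpha x alpha} for every g."""
    sym, anti = [], []
    for p in GROUP:
        chi, chi_sq = character(alpha, p), character(alpha, square(p))
        sym.append(Fraction(chi * chi + chi_sq, 2))    # (18.219)
        anti.append(Fraction(chi * chi - chi_sq, 2))   # (18.220)
    return sym, anti


def multiplicities(values):
    """q_gamma = (1/n) sum_g chi^(gamma)(g) * values(g); characters are real."""
    return {
        gamma: sum(character(gamma, p) * v for p, v in zip(GROUP, values)) / len(GROUP)
        for gamma in IRREPS
    }


sym, anti = symmetrized_characters("E")
print(multiplicities(sym))
print(multiplicities(anti))
```

For $E$ of $C_{3v}$ the symmetric part has dimension $d_\alpha(d_\alpha+1)/2 = 3$ and reduces to $A_1 + E$, while the antisymmetric part has dimension $d_\alpha(d_\alpha-1)/2 = 1$ and is $A_2$.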

References

1. Inui T, Tanabe Y, Onodera Y (1990) Group theory and its applications in physics. Springer, Berlin
2. Inui T, Tanabe Y, Onodera Y (1980) Group theory and its applications in physics, expanded ed. Shokabo, Tokyo. (in Japanese)
3. Hassani S (2006) Mathematical physics. Springer, New York
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World Scientific, Singapore
5. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York

Chapter 19

Applications of Group Theory to Physical Chemistry

On the basis of studies of group theory, now in this last chapter we apply the knowledge to the molecular orbital (MO) calculations (or quantum chemical calculations). As tangible examples, we adopt aromatic hydrocarbons (ethylene, cyclopropenyl radical, benzene, and allyl radical) and methane. The approach is based upon a method of linear combination of atomic orbitals (LCAO). To seek an appropriate LCAO MO, we make the most of a method based on a symmetry-adapted linear combination (SALC). Projection operators are a powerful tool for this purpose. For the sake of correct understanding, it is desirable to consider the transformation of functions. To this end, we first show several examples. In the process of carrying out MO calculations, we encounter a secular equation as an eigenvalue equation. Using a SALC eases the calculations of the secular equation. Molecular science relies largely on spectroscopic measurements, and researchers need to assign individual spectral lines to a specific transition between the relevant molecular states. Representation theory works well in this situation. Thus, group theory finds a perfect fit with its applications in molecular science.

19.1

Transformation of Functions

Before showing individual examples, let us consider the transformation of functions (or vectors) by the symmetry operation. Here we consider scalar functions. For example, let us suppose an arbitrary function $f(x, y)$ on a Cartesian $xy$-coordinate. Figure 19.1 shows a contour map of $f(x, y) =$ constant. Then suppose that the map is rotated around the origin. More specifically, the position vector $\mathbf{r}_0$ fixed on a “summit” [i.e., a point that gives a maximal value of $f(x, y)$] undergoes a symmetry operation, namely, rotation around the origin. As a result, $\mathbf{r}_0$ is transformed to $\mathbf{r}_0'$. Here, we assume that a “mountain” represented by $f(x, y)$ is a rigid body before and after the transformation. Concomitantly, a general point $\mathbf{r}$ (see Sect. 17.1) is transformed to $\mathbf{r}'$ in exactly the same way as $\mathbf{r}_0$. A new function $f'$ gives a new contour map after the rotation.

Fig. 19.1 Contour map of $f(x, y)$ and $f'(x', y')$. The function $f'(x', y')$ is obtained by rotating a map of $f(x, y)$ around the $z$-axis

© Springer Nature Singapore Pte Ltd. 2020. S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3_19

Let us describe $f'$ as

$$
f' \equiv O_R f. \tag{19.1}
$$

The RHS of (19.1) means that $f'$ is produced as a result of operating the rotation $R$ on $f$. We describe the position vector after being transformed as $\mathbf{r}_0'$. Following the notation of Sect. 11.1, $\mathbf{r}_0'$ and $\mathbf{r}'$ are expressed as

$$
\mathbf{r}_0' = R(\mathbf{r}_0) \quad \text{and} \quad \mathbf{r}' = R(\mathbf{r}). \tag{19.2}
$$

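The matrix representation for $R$ is given by, e.g., (11.35). In (19.2) we have

$$
\mathbf{r}_0 = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x_0 \\ y_0 \end{pmatrix} \quad \text{and} \quad \mathbf{r}_0' = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x_0' \\ y_0' \end{pmatrix}, \tag{19.3}
$$

where $\mathbf{e}_1$ and $\mathbf{e}_2$ are orthonormal basis vectors in the $xy$-plane. Also we have

$$
\mathbf{r} = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x \\ y \end{pmatrix} \quad \text{and} \quad \mathbf{r}' = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x' \\ y' \end{pmatrix}. \tag{19.4}
$$

Meanwhile, the following equation must hold:

$$
f'(x', y') = f(x, y). \tag{19.5}
$$

Or, using (19.1) we have

$$
O_R f(x', y') = f(x, y). \tag{19.6}
$$

The above argument can be extended to a three-dimensional (or higher dimensional) space. In that case, similarly we have

$$
O_R f(x', y', z') = f(x, y, z), \quad O_R f(\mathbf{r}') = f(\mathbf{r}), \quad \text{or} \quad f'(\mathbf{r}') = f(\mathbf{r}), \tag{19.7}
$$

where

$$
\mathbf{r} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3) \begin{pmatrix} x \\ y \\ z \end{pmatrix} \quad \text{and} \quad \mathbf{r}' = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3) \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}. \tag{19.8}
$$

The last relation of (19.7) comes from (19.1). Using (19.2) and (19.3), we rewrite (19.7) as

$$
O_R f[R(\mathbf{r})] = f(\mathbf{r}). \tag{19.9}
$$

Replacing $\mathbf{r}$ with $R^{-1}(\mathbf{r})$, we get

$$
O_R f\big\{ R\big[ R^{-1}(\mathbf{r}) \big] \big\} = f\big[ R^{-1}(\mathbf{r}) \big]. \tag{19.10}
$$

That is,

$$
O_R f(\mathbf{r}) = f\big[ R^{-1}(\mathbf{r}) \big]. \tag{19.11}
$$

More succinctly, (19.11) may be rewritten as

$$
R f(\mathbf{r}) = f\big[ R^{-1}(\mathbf{r}) \big]. \tag{19.12}
$$

Comparing (19.11) with (19.12) and considering (19.1), we have

$$
f' \equiv O_R f \equiv R f. \tag{19.13}
$$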
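The rule (19.11) translates directly into code: to evaluate the transformed function at a point, rotate the point backwards and evaluate the original function there. The following is a minimal numerical sketch. It assumes a rotation about the $z$-axis restricted to the $xy$-plane and, as a test function, a paraboloid whose minimum sits at a hypothetical point $(A, B) = (2, 1)$; the names `rotate` and `transformed` are illustrative and not part of the text.

```python
import math


def rotate(theta, x, y):
    """Rotate the point (x, y) about the origin by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y


def transformed(f, theta):
    """Return O_R f, with (O_R f)(r) = f(R^{-1} r) as in (19.11)."""
    return lambda x, y: f(*rotate(-theta, x, y))


A, B = 2.0, 1.0                      # hypothetical position of the minimum of f


def f(x, y):
    """Assumed test function: a paraboloid with its minimum at (A, B)."""
    return (x - A) ** 2 + (y - B) ** 2


theta = math.pi / 2
f_new = transformed(f, theta)

x, y = 0.3, -1.7                     # an arbitrary sample point r
x_new, y_new = rotate(theta, x, y)   # r' = R(r)

# (19.5): the new function at the rotated point equals the old one at r.
print(abs(f_new(x_new, y_new) - f(x, y)) < 1e-12)
# The minimum is carried from (A, B) to (-B, A) by the pi/2 rotation.
print(abs(f_new(-B, A)) < 1e-12)
```

The inverse rotation inside `transformed` is the point that is easy to get wrong: rotating the landscape forwards is the same as reading the old landscape at the backward-rotated position.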

To gain a good understanding of the function transformation, let us think of some examples.

Example 19.1 Let $f(x, y)$ be a function described by

$$
f(x, y) = (x - a)^2 + (y - b)^2; \quad a, b > 0. \tag{19.14}
$$

A contour is shown in Fig. 19.2. We consider a $\pi/2$ rotation around the $z$-axis. Then, $f(x, y)$ is transformed into $Rf(x, y) = f'(x, y)$ such that

$$
R f(x, y) = f'(x, y) = (x + b)^2 + (y - a)^2. \tag{19.15}
$$

Fig. 19.2 Contour map of $f(x, y) = (x - a)^2 + (y - b)^2$ and $f'(x', y') = (x' + b)^2 + (y' - a)^2$ before and after a $\pi/2$ rotation around the $z$-axis

We also have

$$
\mathbf{r}_0 = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} a \\ b \end{pmatrix}, \quad
\mathbf{r}_0' = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} -b \\ a \end{pmatrix},
$$

where we define $R$ as

$$
R = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.
$$

Similarly,

$$
\mathbf{r} = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} x \\ y \end{pmatrix} \quad \text{and} \quad \mathbf{r}' = (\mathbf{e}_1\ \mathbf{e}_2) \begin{pmatrix} -y \\ x \end{pmatrix}.
$$

From (19.15), we have

$$
f'(x', y') = (x' + b)^2 + (y' - a)^2 = (-y + b)^2 + (x - a)^2 = f(x, y). \tag{19.16}
$$

This ensures that (19.5) holds. The implication of (19.16) combined with (19.5) is that a view of $f'(x', y')$ from $(-b, a)$ is the same as that of $f(x, y)$ from $(a, b)$. Imagine that if we are standing at $(-b, a)$ of $f'(x', y')$, we are in the bottom of the “valley” of $f'(x', y')$. Exactly in the same manner, if we are standing at $(a, b)$ of $f(x, y)$, we are in the bottom of the valley of $f(x, y)$ as well. Notice that for $f(x, y)$ and $f'(x', y')$, $(a, b)$ and $(-b, a)$ are the lowest points, respectively.

Fig. 19.3 Contour map of $f(\mathbf{r}) = e^{-2[(x-a)^2 + y^2 + z^2]} + e^{-2[(x+a)^2 + y^2 + z^2]}$ $(a > 0)$. The function form remains unchanged by a reflection with respect to the yz-plane

Meanwhile, we have

$$R^{-1} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad R^{-1}(\mathbf{r}) = (\mathbf{e}_1\ \mathbf{e}_2)\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = (\mathbf{e}_1\ \mathbf{e}_2)\begin{pmatrix} y \\ -x \end{pmatrix}.$$

Then we have

$$f[R^{-1}(\mathbf{r})] = [(y) - a]^2 + [(-x) - b]^2 = (x + b)^2 + (y - a)^2 = Rf(\mathbf{r}). \tag{19.17}$$

Thus, (19.12) certainly holds.

Example 19.2 Let $f(\mathbf{r})$ and $g(\mathbf{r})$ be functions described by

$$f(\mathbf{r}) = e^{-2[(x-a)^2 + y^2 + z^2]} + e^{-2[(x+a)^2 + y^2 + z^2]} \quad (a > 0), \tag{19.18}$$

$$g(\mathbf{r}) = e^{-2[(x-a)^2 + y^2 + z^2]} - e^{-2[(x+a)^2 + y^2 + z^2]} \quad (a > 0). \tag{19.19}$$

Figure 19.3 shows an outline of the contour of $f(\mathbf{r})$. We consider the following symmetry operation in a three-dimensional coordinate system:

$$R = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad R^{-1} = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{19.20}$$

This represents a reflection with respect to the yz-plane. Then we have

$$f[R^{-1}(\mathbf{r})] = Rf(\mathbf{r}) = f(\mathbf{r}), \tag{19.21}$$

$$g[R^{-1}(\mathbf{r})] = Rg(\mathbf{r}) = -g(\mathbf{r}). \tag{19.22}$$

Fig. 19.4 Plots of $f(\mathbf{r}) = e^{-2[(x-a)^2 + y^2 + z^2]} + e^{-2[(x+a)^2 + y^2 + z^2]}$ and $g(\mathbf{r}) = e^{-2[(x-a)^2 + y^2 + z^2]} - e^{-2[(x+a)^2 + y^2 + z^2]}$ $(a > 0)$ as a function of $x$ on the x-axis. (a) $f(\mathbf{r})$. (b) $g(\mathbf{r})$

Plotting $f(\mathbf{r})$ and $g(\mathbf{r})$ as a function of $x$ on the x-axis, we depict results in Fig. 19.4. Looking at (19.21) and (19.22), we find that $f(\mathbf{r})$ and $g(\mathbf{r})$ are solutions of an eigenvalue equation for an operator $R$. Corresponding eigenvalues are $1$ and $-1$ for $f(\mathbf{r})$ and $g(\mathbf{r})$, respectively. In particular, $f(\mathbf{r})$ holds the functional form after the transformation $R$. In this case, $f(\mathbf{r})$ is said to be invariant with the transformation $R$. Moreover, $f(\mathbf{r})$ is invariant with the following eight transformations:

$$R = \begin{pmatrix} \pm 1 & 0 & 0 \\ 0 & \pm 1 & 0 \\ 0 & 0 & \pm 1 \end{pmatrix}. \tag{19.23}$$

These transformations form a group that is isomorphic to D2h. Therefore, f(r) is eligible for a basis function of the totally symmetric representation Ag of D2h. Notice that f(r) is invariant as well with a rotation of an arbitrary angle around the x-axis. On the other hand, g(r) belongs to B3u.
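The statements about $f(\mathbf{r})$ and $g(\mathbf{r})$ can be tested numerically. The sketch below is illustrative only: the value of `A` (standing for $a$), the sample points, and the helper names `transformed` and `eigenvalue` are assumptions. For each of the eight matrices of (19.23) it checks whether the transformed function equals plus or minus the original at a few sample points.

```python
import math
from itertools import product

A = 1.0  # the parameter a > 0 of (19.18) and (19.19); the value is illustrative

def f(r):
    x, y, z = r
    return (math.exp(-2 * ((x - A) ** 2 + y ** 2 + z ** 2))
            + math.exp(-2 * ((x + A) ** 2 + y ** 2 + z ** 2)))

def g(r):
    x, y, z = r
    return (math.exp(-2 * ((x - A) ** 2 + y ** 2 + z ** 2))
            - math.exp(-2 * ((x + A) ** 2 + y ** 2 + z ** 2)))

def transformed(func, signs):
    """(R func)(r) = func(R^{-1} r) for R = diag(signs); here R^{-1} = R."""
    def rf(r):
        return func(tuple(s * c for s, c in zip(signs, r)))
    return rf

def eigenvalue(func, signs, samples):
    """Return +1 or -1 if R func = (+/-) func at every sample point, else None."""
    rf = transformed(func, signs)
    for sign in (1, -1):
        if all(abs(rf(r) - sign * func(r)) < 1e-12 for r in samples):
            return sign
    return None

samples = [(0.4, -0.2, 0.7), (-1.3, 0.5, 0.1), (0.9, 0.9, -0.6)]
for signs in product((1, -1), repeat=3):
    print(signs, eigenvalue(f, signs, samples), eigenvalue(g, signs, samples))
```

In this sketch the eigenvalue found for $g$ depends only on the sign attached to $x$, which is the behavior of the coordinate $x$ itself; this is consistent with assigning $g(\mathbf{r})$ to B3u.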

19.2

Method of Molecular Orbitals (MOs)

Bearing in mind these arguments, we examine several examples of quantum chemical calculations. Our approach is based upon the molecular orbital theory. The theory assumes the existence of molecular orbitals (MOs) in a molecule, just as the notion of atomic orbitals has been well established in atomic physics. Furthermore, we assume that the molecular orbitals comprise a linear combination of atomic orbitals (LCAO). This notion is equivalent to assuming that individual electrons in a molecule move independently in a potential field produced by nuclei and other electrons. In other words, we assume that each electron is moving along an MO that is extended over the whole molecule. The electronic state of the molecule is formed by different MOs of various energies that are occupied by electrons. As in the case of an atom, an MO $\psi_i(\mathbf{r})$ occupied by an electron is described as

$$H\psi_i(\mathbf{r}) \equiv \left[-\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{r})\right]\psi_i(\mathbf{r}) = E_i\psi_i(\mathbf{r}), \tag{19.24}$$

where $H$ is the Hamiltonian of a molecule; $m$ is the mass of an electron (note that we do not use a reduced mass $\mu$ here); $\mathbf{r}$ is a position vector of the electron; $\nabla^2$ is the Laplacian (Laplace operator); $V(\mathbf{r})$ is a potential of the molecule at $\mathbf{r}$; $E_i$ is an energy of the electron occupying $\psi_i$ and is said to be a molecular orbital energy. We assume that $V(\mathbf{r})$ possesses a symmetry the same as that of the molecule. Let ℊ be a symmetry group and let a group element arbitrarily chosen from among ℊ be $R$. Suppose that an arbitrary position vector $\mathbf{r}$ is moved to another position $\mathbf{r}'$. This transformation is expressed as (19.2). Since $V(\mathbf{r})$ has the same symmetry as the molecule, an electron “feels” the same potential field at $\mathbf{r}'$ as that at $\mathbf{r}$. That is,

$$V(\mathbf{r}) = V'(\mathbf{r}') = V(\mathbf{r}'). \tag{19.25}$$

Or we have

$$V(\mathbf{r})\psi(\mathbf{r}) = V(\mathbf{r}')\psi'(\mathbf{r}'), \tag{19.26}$$

where $\psi$ is an arbitrary function. Defining

$$V(\mathbf{r})\psi(\mathbf{r}) \equiv [V\psi](\mathbf{r}) = V\psi(\mathbf{r}), \tag{19.27}$$

and recalling (19.1) and (19.7), we get

$$[RV\psi](\mathbf{r}') = R[V\psi(\mathbf{r}')] = V'\psi'(\mathbf{r}') = V'(\mathbf{r}')\psi'(\mathbf{r}') = V(\mathbf{r}')\psi'(\mathbf{r}') = V(\mathbf{r}')R\psi(\mathbf{r}') = VR\psi(\mathbf{r}') = [VR\psi](\mathbf{r}'). \tag{19.28}$$

Comparing the first and last sides and remembering that $\psi$ is an arbitrary function, we get

$$RV = VR. \tag{19.29}$$

Next, let us examine the symmetry of the Laplacian $\nabla^2$. The Laplacian is defined in Sect. 1.2 as

$$\nabla^2 \equiv \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}, \tag{1.24}$$

where $x$, $y$, and $z$ denote the Cartesian coordinates. Let $S$ be an orthogonal matrix that transforms the xyz-coordinate system to the x'y'z'-coordinate system. We suppose that $S$ is expressed as

$$S = \begin{pmatrix} s_{11} & s_{12} & s_{13} \\ s_{21} & s_{22} & s_{23} \\ s_{31} & s_{32} & s_{33} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} s_{11} & s_{12} & s_{13} \\ s_{21} & s_{22} & s_{23} \\ s_{31} & s_{32} & s_{33} \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{19.30}$$

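Since $S$ is an orthogonal matrix, we have

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} s_{11} & s_{21} & s_{31} \\ s_{12} & s_{22} & s_{32} \\ s_{13} & s_{23} & s_{33} \end{pmatrix}\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}. \tag{19.31}$$

The equation is due to

$$SS^{T} = S^{T}S = E, \tag{19.32}$$

where $E$ is a unit matrix. Then we have

$$\frac{\partial}{\partial x'} = \frac{\partial x}{\partial x'}\frac{\partial}{\partial x} + \frac{\partial y}{\partial x'}\frac{\partial}{\partial y} + \frac{\partial z}{\partial x'}\frac{\partial}{\partial z} = s_{11}\frac{\partial}{\partial x} + s_{12}\frac{\partial}{\partial y} + s_{13}\frac{\partial}{\partial z}. \tag{19.33}$$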

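Partially differentiating (19.33), we have

$$\begin{aligned}
\frac{\partial^2}{\partial x'^2} &= s_{11}\frac{\partial}{\partial x'}\frac{\partial}{\partial x} + s_{12}\frac{\partial}{\partial x'}\frac{\partial}{\partial y} + s_{13}\frac{\partial}{\partial x'}\frac{\partial}{\partial z} \\
&= s_{11}\left(s_{11}\frac{\partial}{\partial x} + s_{12}\frac{\partial}{\partial y} + s_{13}\frac{\partial}{\partial z}\right)\frac{\partial}{\partial x} + s_{12}\left(s_{11}\frac{\partial}{\partial x} + s_{12}\frac{\partial}{\partial y} + s_{13}\frac{\partial}{\partial z}\right)\frac{\partial}{\partial y} \\
&\quad + s_{13}\left(s_{11}\frac{\partial}{\partial x} + s_{12}\frac{\partial}{\partial y} + s_{13}\frac{\partial}{\partial z}\right)\frac{\partial}{\partial z} \\
&= s_{11}^2\frac{\partial^2}{\partial x^2} + s_{11}s_{12}\frac{\partial^2}{\partial y\,\partial x} + s_{11}s_{13}\frac{\partial^2}{\partial z\,\partial x} \\
&\quad + s_{12}s_{11}\frac{\partial^2}{\partial x\,\partial y} + s_{12}^2\frac{\partial^2}{\partial y^2} + s_{12}s_{13}\frac{\partial^2}{\partial z\,\partial y} \\
&\quad + s_{13}s_{11}\frac{\partial^2}{\partial x\,\partial z} + s_{13}s_{12}\frac{\partial^2}{\partial y\,\partial z} + s_{13}^2\frac{\partial^2}{\partial z^2}.
\end{aligned} \tag{19.34}$$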


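Calculating, e.g., the terms of $\partial^2/\partial y'^2$ and $\partial^2/\partial z'^2$, we have similar results. Then, collecting all those 27 terms, we get, e.g.,

$$\left(s_{11}^2 + s_{21}^2 + s_{31}^2\right)\frac{\partial^2}{\partial x^2} = \frac{\partial^2}{\partial x^2}, \tag{19.35}$$

where we used an orthogonal relationship of $S$. Cross terms of $\frac{\partial^2}{\partial x\,\partial y}$, $\frac{\partial^2}{\partial y\,\partial x}$, etc. all vanish. Consequently, we get

$$\frac{\partial^2}{\partial x'^2} + \frac{\partial^2}{\partial y'^2} + \frac{\partial^2}{\partial z'^2} = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}. \tag{19.36}$$

Defining $\nabla'^2$ as

$$\nabla'^2 \equiv \frac{\partial^2}{\partial x'^2} + \frac{\partial^2}{\partial y'^2} + \frac{\partial^2}{\partial z'^2}, \tag{19.37}$$

we have

$$\nabla'^2 = \nabla^2. \tag{19.38}$$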
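The bookkeeping behind (19.35) and (19.36) can be condensed into one matrix statement: with $\partial/\partial x'_i = \sum_j s_{ij}\,\partial/\partial x_j$, the coefficient of $\partial^2/\partial x_j\,\partial x_k$ in $\nabla'^2$ is $\sum_i s_{ij}s_{ik} = (S^{T}S)_{jk}$. The following minimal sketch checks this for one orthogonal matrix. The particular matrix (two rotations followed by a reflection, with arbitrary angles) and the helper names are assumptions made only for illustration.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

# An orthogonal matrix chosen for illustration: two rotations and a reflection.
reflect_yz = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
S = matmul(reflect_yz, matmul(rot_z(0.7), rot_x(-1.2)))

# Coefficient of d^2/(dx_j dx_k) in the primed Laplacian: (S^T S)_jk.
coeff = matmul(transpose(S), S)
for row in coeff:
    print([round(c, 12) for c in row])
```

The diagonal entries reproduce relations of the type (19.35), and the vanishing off-diagonal entries correspond to the cancellation of the cross terms.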

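Notice that (19.38) holds with not only the symmetry operation of the molecule, but also any rotation operation in ℝ³ (see Sect. 17.4). As in the case of (19.28), we have

$$[R\nabla^2\psi](\mathbf{r}') = R[\nabla^2\psi(\mathbf{r}')] = \nabla'^2\psi'(\mathbf{r}') = \nabla^2\psi'(\mathbf{r}') = \nabla^2 R\psi(\mathbf{r}') = [\nabla^2 R\psi](\mathbf{r}'). \tag{19.39}$$

Consequently, we get

$$R\nabla^2 = \nabla^2 R. \tag{19.40}$$

Adding both sides of (19.29) and (19.40), we get

$$R\left(\nabla^2 + V\right) = \left(\nabla^2 + V\right)R. \tag{19.41}$$

Because $R$ is a linear operator, the relation is unaffected when $\nabla^2$ is multiplied by the constant $-\hbar^2/2m$. From (19.24), we therefore have

$$RH = HR. \tag{19.42}$$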


Thus, we confirm that the Hamiltonian $H$ commutes with any symmetry operation $R$. In other words, $H$ is invariant with the coordinate transformation relevant to the symmetry operation. Now we consider matrix representation of (19.42). Also let $D$ be an irreducible representation of the symmetry group ℊ. Then, we have

$$D(g)H = HD(g) \quad (g \in ℊ).$$

Let $\{\psi_1, \psi_2, \cdots, \psi_d\}$ be a set of basis vectors that span a representation space $L_S$ associated with $D$. Then, on the basis of Schur's Second Lemma of Sect. 18.3, (19.42) immediately leads to an important conclusion that if $H$ is represented by a matrix, we must have

$$H = \lambda E, \tag{19.43}$$

where $\lambda$ is an arbitrary complex constant. That is, $H$ is represented such that

$$H = \begin{pmatrix} \lambda & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda \end{pmatrix}, \tag{19.44}$$

where the above matrix is a $(d, d)$ square diagonal matrix. This implies that $H$ is reduced to

$$H = \lambda D^{(0)} \oplus \cdots \oplus \lambda D^{(0)},$$

where $D^{(0)}$ denotes the totally symmetric representation; notice that it is given by 1 for any symmetry operation. Thus, the commutability of an operator with all symmetry operations is equivalent to the statement that the operator belongs to the totally symmetric representation. Since $H$ is Hermitian, $\lambda$ should be real. Operating both sides of (19.42) on $\{\psi_1, \psi_2, \cdots, \psi_d\}$ from the right, we have

$$\begin{aligned}
(\psi_1\ \psi_2\ \cdots\ \psi_d)RH &= (\psi_1\ \psi_2\ \cdots\ \psi_d)HR = (\psi_1\ \psi_2\ \cdots\ \psi_d)\begin{pmatrix} \lambda & & & \\ & \lambda & & \\ & & \ddots & \\ & & & \lambda \end{pmatrix}R \\
&= (\lambda\psi_1\ \lambda\psi_2\ \cdots\ \lambda\psi_d)R = \lambda(\psi_1\ \psi_2\ \cdots\ \psi_d)R.
\end{aligned} \tag{19.45}$$

In particular putting $R = E$, we get

$$(\psi_1\ \psi_2\ \cdots\ \psi_d)H = \lambda(\psi_1\ \psi_2\ \cdots\ \psi_d).$$

Simplifying this equation, we have

$$\psi_i H = \lambda\psi_i \quad (1 \le i \le d). \tag{19.46}$$

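Thus, $\lambda$ is found to be an energy eigenvalue of $H$ and $\psi_i$ $(1 \le i \le d)$ are eigenfunctions belonging to the eigenvalue $\lambda$. Meanwhile, using

$$R(\psi_i) = \sum_{k=1}^{d}\psi_k D_{ki}(R),$$

we rewrite (19.45) as

$$\begin{aligned}
(\psi_1\ \psi_2\ \cdots\ \psi_d)RH &= (\psi_1 R\ \ \psi_2 R\ \cdots\ \psi_d R)H \\
&= \left(\sum_{k=1}^{d}\psi_k D_{k1}(R)\ \ \sum_{k=1}^{d}\psi_k D_{k2}(R)\ \cdots\ \sum_{k=1}^{d}\psi_k D_{kd}(R)\right)H \\
&= \lambda\left(\sum_{k=1}^{d}\psi_k D_{k1}(R)\ \ \sum_{k=1}^{d}\psi_k D_{k2}(R)\ \cdots\ \sum_{k=1}^{d}\psi_k D_{kd}(R)\right),
\end{aligned}$$

where with the second equality we used the relation (11.42); i.e., $\psi_i R = R(\psi_i)$. Thus, we get

$$\sum_{k=1}^{d}\psi_k D_{ki}(R)H = \lambda\sum_{k=1}^{d}\psi_k D_{ki}(R) \quad (1 \le i \le d). \tag{19.47}$$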
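The link between the commutation relation (19.42) and degeneracy can be illustrated with a small numerical model. The sketch below is an assumed toy example, not one taken from the text at this point: three equivalent sites carry one basis function each, the model Hamiltonian matrix has `ALPHA` on the diagonal and `BETA` between every pair of sites (the numbers are arbitrary), and the symmetry operations are represented by the six permutation matrices of the three sites. The code checks $RH = HR$ for every operation and then verifies that two linearly independent trial vectors share one eigenvalue.

```python
from itertools import permutations

ALPHA, BETA = -1.0, -0.5   # illustrative numbers, not values from the text

# Assumed toy Hamiltonian matrix on three equivalent sites.
H = [[ALPHA if i == j else BETA for j in range(3)] for i in range(3)]

def perm_matrix(p):
    """Matrix R with R[p[j]][j] = 1, i.e. basis function j is sent to p[j]."""
    return [[1 if p[j] == i else 0 for j in range(3)] for i in range(3)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def close(a, b):
    return all(abs(a[i][j] - b[i][j]) < 1e-12 for i in range(3) for j in range(3))

group = [perm_matrix(p) for p in permutations(range(3))]
commutes = all(close(matmul(R, H), matmul(H, R)) for R in group)
print("RH = HR for all six operations:", commutes)

# Trial vectors: one totally symmetric, two spanning a two-dimensional subspace.
vectors = {"sym": [1, 1, 1], "e1": [1, -1, 0], "e2": [1, 1, -2]}
eigs = {}
for name, v in vectors.items():
    hv = matvec(H, v)
    lam = hv[0] / v[0]
    is_eigenvector = all(abs(hv[i] - lam * v[i]) < 1e-12 for i in range(3))
    eigs[name] = lam
    print(name, lam, is_eigenvector)
```

In this model the vectors `e1` and `e2` are mixed among themselves by the permutations, and both belong to the same eigenvalue, in line with the conclusion $H = \lambda E$ within an irreducible representation space.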

The relations (19.45) to (19.47) imply that $\psi_1, \psi_2, \cdots,$ and $\psi_d$ as well as their linear combinations using a representation matrix $D(R)$ are eigenfunctions that belong to the same eigenvalue $\lambda$. That is, $\psi_1, \psi_2, \cdots,$ and $\psi_d$ are said to be degenerate with a multiplicity $d$. After the remarks of Theorem 14.5, we can construct an orthonormal basis set $\{\tilde{\psi}_1, \tilde{\psi}_2, \cdots, \tilde{\psi}_d\}$ as eigenfunctions. These vectors can be constructed via linear combinations of $\psi_1, \psi_2, \cdots,$ and $\psi_d$. After transforming the vectors $\tilde{\psi}_i$ $(1 \le i \le d)$ by $R$, we get

$$\begin{aligned}
\left\langle \sum_{k=1}^{d}\tilde{\psi}_k D_{ki}(R)\,\middle|\,\sum_{l=1}^{d}\tilde{\psi}_l D_{lj}(R)\right\rangle &= \sum_{k=1}^{d}\sum_{l=1}^{d} D_{ki}(R)^{*}D_{lj}(R)\langle\tilde{\psi}_k|\tilde{\psi}_l\rangle = \sum_{k=1}^{d} D_{ki}(R)^{*}D_{kj}(R) \\
&= \sum_{k=1}^{d}\left[D^{\dagger}(R)\right]_{ik}[D(R)]_{kj} = \left[D^{\dagger}(R)D(R)\right]_{ij} = \delta_{ij}.
\end{aligned}$$

With the last equality, we used unitarity of $D(R)$. Thus, the orthonormal basis set is retained after unitary transformation of $\{\tilde{\psi}_1, \tilde{\psi}_2, \cdots, \tilde{\psi}_d\}$.

In the above discussion, we have assumed that $\psi_1, \psi_2, \cdots,$ and $\psi_d$ belong to a certain irreducible representation $D^{(\nu)}$. Since $\tilde{\psi}_1, \tilde{\psi}_2, \cdots,$ and $\tilde{\psi}_d$ consist of their linear combinations, $\tilde{\psi}_1, \tilde{\psi}_2, \cdots,$ and $\tilde{\psi}_d$ belong to $D^{(\nu)}$ as well. Also, we have assumed that $\tilde{\psi}_1, \tilde{\psi}_2, \cdots,$ and $\tilde{\psi}_d$ form an orthonormal basis and, hence, according to Theorem 11.3 these vectors constitute basis vectors that belong to $D^{(\nu)}$. In particular, functions derived using projection operators share these characteristics. In fact, the above principles underlie molecular orbital calculations dealt with in Sects. 19.4 and 19.5. We will go into more details in subsequent sections.

Bearing the aforementioned argument in mind, we make the most of the relation expressed by (18.171) to evaluate inner products of functions (or vectors). In this connection, we often need to calculate matrix elements of an operator. One of the most typical examples is matrix elements of the Hamiltonian. In the field of molecular science we have to estimate an overlap integral, Coulomb integral, resonance integral, etc. Other examples include transition matrix elements pertinent to, e.g., electric dipole transition. To this end, we deal with direct-product representation (see Sects. 18.7 and 18.8). Let $O^{(\gamma)}$ be a Hermitian operator belonging to the $\gamma$-th irreducible representation. Let us think of the following inner product:

$$\left\langle \phi_l^{(\beta)}\middle|O^{(\gamma)}\psi_s^{(\alpha)}\right\rangle, \tag{19.48}$$

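where $\psi_s^{(\alpha)}$ and $\phi_l^{(\beta)}$ are the $s$-th component of the $\alpha$-th irreducible representation and the $l$-th component of the $\beta$-th irreducible representation, respectively; see (18.146) and (18.161).

(i) Let us think of first the case where $O^{(\gamma)}$ is the Hamiltonian $H$. Suppose that $\psi_s^{(\alpha)}$ belongs to an eigenvalue $\lambda$ of $H$. Then, we have

$$H\,|\psi_l^{(\alpha)}\rangle = \lambda\,|\psi_l^{(\alpha)}\rangle. \tag{19.49}$$

In that case, with (19.48) we get

$$\left\langle \phi_l^{(\beta)}\middle|H\psi_s^{(\alpha)}\right\rangle = \lambda\left\langle \phi_l^{(\beta)}\middle|\psi_s^{(\alpha)}\right\rangle. \tag{19.50}$$

This equation is essentially the same as (18.171). That is,

$$\left\langle \phi_s^{(\beta)}\middle|\psi_l^{(\alpha)}\right\rangle = \delta_{\alpha\beta}\delta_{ls}\left\langle \phi_s^{(\beta)}\middle|\psi_l^{(\alpha)}\right\rangle. \tag{18.171}$$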

At first glance this equation seems trivial, but it gives us a powerful tool that saves a lot of troublesome calculations. In fact, if we encounter a series of inner product calculations (i.e., definite integrals), we can ignore many of them. This is because matrix elements of the Hamiltonian are nonvanishing only if $\alpha = \beta$ and $l = s$. That is, we only have to estimate $\left\langle \phi_s^{(\alpha)}\middle|\psi_s^{(\alpha)}\right\rangle$. Otherwise the inner products vanish. The functional forms of $\phi_s^{(\alpha)}$ and $\psi_s^{(\alpha)}$ are determined depending upon individual practical problems. We will show tangible examples later (Sect. 19.4).

(ii) Next, let us consider matrix elements of the optical transition. In this case, we are thinking of transition probability between quantum states. Assuming the dipole approximation, we use $\boldsymbol{\varepsilon}_e \cdot \mathbf{P}^{(\gamma)}$ for $O^{(\gamma)}$ in $\left\langle \phi_l^{(\beta)}\middle|O^{(\gamma)}\psi_s^{(\alpha)}\right\rangle$ of (19.48). The quantity $\mathbf{P}^{(\gamma)}$ represents an electric dipole moment associated with the position vector. Normally, we can readily find it in a character table, in which we examine which irreducible representation $\gamma$ the position vector components $x$, $y$, and $z$ indicated in a rightmost column correspond to. Table 18.4 is an example. If we take a unit polarization vector $\boldsymbol{\varepsilon}_e$ in parallel to the position vector component that the character table designates, $\boldsymbol{\varepsilon}_e \cdot \mathbf{P}^{(\gamma)}$ is nonvanishing. Suppose that $\psi_s^{(\alpha)}$ and $\phi_l^{(\beta)}$ are an initial state and final state, respectively. At first glance, we would wish to use (18.200) and count how many times a representation $\beta$ occurs for a direct-product representation $D^{(\gamma \times \alpha)}$. It is often the case, however, that $\beta$ is not an irreducible representation. Even in such a case, if either $\alpha$ or $\beta$ belongs to a totally symmetric representation, the handling will be easier. This situation corresponds to an initial or a final electronic configuration that forms a closed shell, the electronic state of which is totally symmetric [1]. The former occurs when we consider the optical absorption that takes place, e.g., in a molecule of a ground electronic state. The latter corresponds to an optical emission that ends up with a ground state. Let us consider the former case. Since $O_j^{(\gamma)}$ is Hermitian, we rewrite (19.48) as

$$\left\langle \phi_l^{(\beta)}\middle|O_j^{(\gamma)}\psi_s^{(\alpha)}\right\rangle = \left\langle O_j^{(\gamma)}\phi_l^{(\beta)}\middle|\psi_s^{(\alpha)}\right\rangle = \left\langle \psi_s^{(\alpha)}\middle|O_j^{(\gamma)}\phi_l^{(\beta)}\right\rangle^{*}, \tag{19.51}$$

where we assume that $\psi_s^{(\alpha)}$ is a ground state having a closed shell electronic configuration. Therefore, $\psi_s^{(\alpha)}$ belongs to a totally symmetric irreducible representation. For (19.48) not to vanish, therefore, we may alternatively state that it is necessary for

$$D^{(\gamma \times \beta)} = D^{(\gamma)} \otimes D^{(\beta)}$$

to contain $D^{(\alpha)}$ belonging to a totally symmetric representation. Note that in group theory we usually write $D^{(\gamma)} \times D^{(\beta)}$ instead of $D^{(\gamma)} \otimes D^{(\beta)}$; see (18.191). If $\phi_l^{(\beta)}$ belongs to a reducible representation, from (18.80) we have

$$D^{(\beta)} = \sum_{\omega} q_{\omega} D^{(\omega)},$$

where $D^{(\omega)}$ belongs to an irreducible representation $\omega$. Then, we get

$$D^{(\gamma)} \otimes D^{(\beta)} = \sum_{\omega} q_{\omega}\, D^{(\gamma)} \otimes D^{(\omega)}.$$

Thus, applying (18.199) and (18.200), we can obtain a direct sum of irreducible representations. After that, we examine whether an irreducible representation $\alpha$ is contained in $D^{(\gamma)} \otimes D^{(\beta)}$. Since the totally symmetric representation is one-dimensional, $s = 1$ in (19.51).
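The symmetry test for a dipole matrix element can be sketched in a few lines when all irreducible representations are one-dimensional. The example below uses the standard character table of D2h (operations ordered as E, C2(z), C2(y), C2(x), i, σ(xy), σ(zx), σ(yz)). The function names and the choice of D2h as the working group are assumptions for illustration. The code multiplies characters to form the direct product of the final state, the dipole component, and the initial state, and then counts how often the totally symmetric representation Ag occurs.

```python
# Characters of D2h in the order E, C2(z), C2(y), C2(x), i, s(xy), s(zx), s(yz).
D2H = {
    "Ag":  [1,  1,  1,  1,  1,  1,  1,  1],
    "B1g": [1,  1, -1, -1,  1,  1, -1, -1],
    "B2g": [1, -1,  1, -1,  1, -1,  1, -1],
    "B3g": [1, -1, -1,  1,  1, -1, -1,  1],
    "Au":  [1,  1,  1,  1, -1, -1, -1, -1],
    "B1u": [1,  1, -1, -1, -1, -1,  1,  1],
    "B2u": [1, -1,  1, -1, -1,  1, -1,  1],
    "B3u": [1, -1, -1,  1, -1,  1,  1, -1],
}
# Irreducible representations of the position vector components in D2h.
DIPOLE = {"x": "B3u", "y": "B2u", "z": "B1u"}

def direct_product(*irreps):
    """Characters of a direct product: the product of the characters."""
    chars = [1] * 8
    for name in irreps:
        chars = [c * d for c, d in zip(chars, D2H[name])]
    return chars

def multiplicity(chars, irrep):
    """Number of times irrep occurs in the representation with characters chars."""
    return sum(c * d for c, d in zip(chars, D2H[irrep])) // len(chars)

def allowed_polarizations(initial, final):
    """Dipole components for which the matrix element need not vanish by symmetry."""
    return [axis for axis, gamma in DIPOLE.items()
            if multiplicity(direct_product(final, gamma, initial), "Ag") > 0]

print(allowed_polarizations("Ag", "B1u"))
print(allowed_polarizations("Ag", "B3g"))
print(allowed_polarizations("B3g", "B1u"))
```

Characters of D2h are real, so no complex conjugation is needed here; for groups with complex characters the counting formula would require the conjugate.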

19.3

Calculation Procedures of Molecular Orbitals (MOs)

We describe a brief outline of the MO method based on LCAO (LCAOMO). Suppose that a molecule consists of $n$ atomic orbitals and that each MO $\psi_i$ $(1 \le i \le n)$ comprises a linear combination of those $n$ atomic orbitals $\phi_k$ $(1 \le k \le n)$. That is, we have

$$\psi_i = \sum_{k=1}^{n} c_{ki}\phi_k \quad (1 \le i \le n), \tag{19.52}$$

where $c_{ki}$ are complex coefficients. We assume that $\phi_k$ are normalized. That is,

$$\langle\phi_k|\phi_k\rangle \equiv \int \phi_k^{*}\phi_k\,d\tau = 1, \tag{19.53}$$

where $d\tau$ implies that an integration should be taken over ℝ³. The notation $\langle g|f\rangle$ means an inner product defined in Sect. 13.1. The inner product is usually defined by a definite integral of $g^{*}f$ whose integration range covers a part or all of ℝ³ depending on a constitution of a physical system.

First we try to solve the Schrödinger equation given as an eigenvalue equation. The said equation is described as

$$H\psi = \lambda\psi \quad \text{or} \quad (H - \lambda)\psi = 0. \tag{19.54}$$

Replacing $\psi$ with (19.52), we have

$$\sum_{k=1}^{n} c_k(H - \lambda)\phi_k = 0, \tag{19.55}$$

where the subscript $i$ in (19.52) has been omitted for simplicity. Multiplying $\phi_j^{*}$ from the left and integrating over whole ℝ³, we have

$$\sum_{k=1}^{n} c_k\int \phi_j^{*}(H - \lambda)\phi_k\,d\tau = 0. \tag{19.56}$$

Rewriting (19.56), we get

$$\sum_{k=1}^{n} c_k\int \left(\phi_j^{*}H\phi_k - \lambda\phi_j^{*}\phi_k\right)d\tau = 0. \tag{19.57}$$

Here let us define the following quantities:

$$H_{jk} = \int \phi_j^{*}H\phi_k\,d\tau \quad \text{and} \quad S_{jk} = \int \phi_j^{*}\phi_k\,d\tau, \tag{19.58}$$

where $S_{ii} = 1$ $(1 \le i \le n)$ because $\phi_i$ is normalized. Then we get

$$\sum_{k=1}^{n} c_k\left(H_{jk} - \lambda S_{jk}\right) = 0. \tag{19.59}$$

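Rewriting (19.59) in a matrix form, we get

$$\begin{pmatrix} H_{11} - \lambda & \cdots & H_{1n} - \lambda S_{1n} \\ \vdots & \ddots & \vdots \\ H_{n1} - \lambda S_{n1} & \cdots & H_{nn} - \lambda \end{pmatrix}\begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} = 0, \tag{19.60}$$

where note that $S_{ii} = 1$ $(1 \le i \le n)$ because of normalization of $\phi_i$. Suppose that by solving (19.59) or (19.60), we get $\lambda_i$ $(1 \le i \le n)$, some of which may be identical (i.e., the degenerate case), and obtain $n$ different column eigenvectors corresponding to $n$ eigenvalues $\lambda_i$. In light of (12.4) of Sect. 12.1, the following condition must be satisfied to get eigenvectors for which not all $c_k$ are zero:

$$\begin{vmatrix} H_{11} - \lambda & \cdots & H_{1n} - \lambda S_{1n} \\ \vdots & \ddots & \vdots \\ H_{n1} - \lambda S_{n1} & \cdots & H_{nn} - \lambda \end{vmatrix} = 0. \tag{19.61}$$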


Equation (19.61) is called a secular equation. This equation is pertinent to a determinant of order $n$, and so we are expected to get $n$ roots for this, some of which may be identical (i.e., a degenerate case). In the above discussion, it is useful to introduce the following notation:

$$\langle\phi_j|H\phi_k\rangle \equiv H_{jk} = \int \phi_j^{*}H\phi_k\,d\tau \quad \text{and} \quad \langle\phi_j|\phi_k\rangle \equiv S_{jk} = \int \phi_j^{*}\phi_k\,d\tau. \tag{19.62}$$

The above notation has already been introduced in Sect. 1.4. Equation (19.62) certainly satisfies the definition of the inner product described in (13.2)–(13.4); readers, please check it. On the basis of (13.64), we have

$$\langle y|Hx\rangle = \langle xH^{\dagger}|y\rangle^{*}. \tag{19.63}$$

Since $H$ is Hermitian, $H^{\dagger} = H$. That is,

$$\langle y|Hx\rangle = \langle xH|y\rangle^{*}. \tag{19.64}$$
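For $n = 2$ the secular equation (19.61) is a quadratic in $\lambda$ and can be solved in closed form. The following is a minimal sketch under stated assumptions: $H$ and $S$ are real and symmetric, $S_{11} = S_{22} = 1$, $|S_{12}| < 1$, and the numerical values are arbitrary. The function names `secular_roots_2x2` and `coefficients` are hypothetical helpers, not part of the text.

```python
import math

def secular_roots_2x2(h11, h22, h12, s12):
    """Roots of det(H - lam*S) = 0 for real symmetric 2x2 H and S, S11 = S22 = 1."""
    # Expanding (h11 - lam)(h22 - lam) - (h12 - lam*s12)**2 = 0
    # gives a*lam**2 + b*lam + c = 0 with the coefficients below.
    a = 1.0 - s12 ** 2
    b = -(h11 + h22) + 2.0 * h12 * s12
    c = h11 * h22 - h12 ** 2
    disc = math.sqrt(b * b - 4.0 * a * c)
    return sorted(((-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)))

def coefficients(h11, h12, s12, lam):
    """An unnormalized solution (c1, c2) of the first row of (19.60) for a root lam."""
    # (h11 - lam)*c1 + (h12 - lam*s12)*c2 = 0
    return (h12 - lam * s12, -(h11 - lam))

# Illustrative numbers only.
h11 = h22 = -1.0
h12, s12 = -0.5, 0.25

roots = secular_roots_2x2(h11, h22, h12, s12)
for lam in roots:
    print(lam, coefficients(h11, h12, s12, lam))
```

With equal diagonal elements the two coefficient pairs come out symmetric and antisymmetric, which anticipates the role of symmetry in simplifying the secular equation. For larger $n$ a numerical generalized eigenvalue solver would normally be used instead of an explicit polynomial.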

Solving (19.61) for a sufficiently large $n$ is usually formidable. However, if we can find appropriate conditions, (19.61) can be solved quite easily. An essential point rests upon how we deal with the off-diagonal elements $H_{jk}$ and $S_{jk}$. If we are able to choose basis vectors appropriately so that we can get

$$H_{jk} = 0 \quad \text{and} \quad S_{jk} = 0 \quad \text{for } j \ne k, \tag{19.65}$$

the secular equation is reduced to a simple form

$$\begin{vmatrix} \tilde{H}_{11} - \lambda\tilde{S}_{11} & & & \\ & \tilde{H}_{22} - \lambda\tilde{S}_{22} & & \\ & & \ddots & \\ & & & \tilde{H}_{nn} - \lambda\tilde{S}_{nn} \end{vmatrix} = 0, \tag{19.66}$$

where all the off-diagonal elements are zero and an eigenvalue $\lambda$ is given by

$$\lambda_i = \tilde{H}_{ii}/\tilde{S}_{ii}. \tag{19.67}$$

This means that the eigenvalue equation has automatically been solved. The best way to achieve this is to choose basis functions (vectors) such that the functions conform to the symmetry which the molecule belongs to. That is, we “shuffle” the atomic orbitals as follows:

$$\xi_i = \sum_{k=1}^{n} d_{ki}\phi_k \quad (1 \le i \le n), \tag{19.68}$$

where $\xi_i$ are new functions chosen instead of $\psi_i$ of (19.52). That is, we construct a linear combination of the atomic orbitals that belongs to an individual irreducible representation. The said linear combination is called a symmetry-adapted linear combination (SALC). We remark that not all atomic orbitals are necessarily included in the SALC. Then we have

$$\tilde{H}_{jk} = \int \xi_j^{*}H\xi_k\,d\tau \quad \text{and} \quad \tilde{S}_{jk} = \int \xi_j^{*}\xi_k\,d\tau. \tag{19.69}$$

Thus, instead of (19.61), we have a new secular equation of

$$\det\left(\tilde{H}_{jk} - \lambda\tilde{S}_{jk}\right) = 0. \tag{19.70}$$

Since the Hamiltonian $H$ is totally symmetric, in terms of the direct-product representation $H\xi_k$ in (19.69) belongs to the irreducible representation which $\xi_k$ belongs to. This can intuitively be understood. But, to assert this, use (18.76) and (18.200) along with the fact that characters of the totally symmetric representation are 1. In light of (18.171) and (18.181), if $\xi_k$ and $\xi_j$ belong to different irreducible representations, $\tilde{H}_{jk}$ and $\tilde{S}_{jk}$ both vanish at once. From (18.172), at the same time, $\xi_k$ and $\xi_j$ are orthogonal to each other. The most ideal situation is to get (19.66) with all the off-diagonal elements vanishing. Regarding the diagonal elements of (19.70), we always have the direct product of the same representation and, hence, the integrals (19.69) do not vanish. Thus, we get a powerful guideline for the evaluation of (19.70) in as simple a form as possible.

If the SALC orbitals $\xi_j$ and $\xi_k$ belong to the same irreducible representation, the relevant matrix elements are generally nonvanishing. Even in that case, however, according to Theorem 13.2 we can construct a set of orthonormal vectors by taking appropriate linear combinations of $\xi_j$ and $\xi_k$. The resulting vectors naturally belong to the same irreducible representation. This can be done by solving the secular equation as described below. On top of it, Hermiticity of the Hamiltonian ensures that those vectors can be rendered orthogonal (see Theorem 14.5). If $n$ electrons are present in a molecule, we are to deal with $n$ SALC orbitals which we view as vectors accordingly. In terms of representation theory, these vectors span a representation space where the vectors undergo symmetry operations. Thus, we can construct orthonormal basis vectors belonging to various irreducible representations throughout the representation space.

To address our problems, we take the following procedures:
(i) First we have to determine a symmetry species (i.e., point group) of a molecule.
(ii) Next, we pick up atomic orbitals contained in a molecule and examine how those orbitals are transformed by symmetry operations.
(iii) We examine how those atomic orbitals are transformed according to symmetry operations. Since the symmetry operations are represented by an $(n, n)$ unitary matrix, we can readily decide a trace of the matrix. Generally, that matrix representation is reducible, and so we reduce the representation according to the procedures of (18.81)–(18.83). Thus, we are able to determine how many irreducible representations are contained in the original reducible representation.
(iv) After having determined the irreducible representations, we construct SALCs and constitute a secular equation using them.
(v) Solving the secular equation, we determine molecular orbital energies and decide functional forms of the corresponding MOs.
(vi) We examine physicochemical properties such as optical transition within a molecule.

In the procedure (iii) of the above, if the same irreducible representation appears more than once, we have to solve a secular equation of an order of two or more. Even in that case, we can render a set of resulting eigenvectors orthogonal to one another during the process of solving a problem. To construct the abovementioned SALC, we make the most of projection operators that are defined in Sect. 14.1. In (18.147) putting $m = l$, we have

$$P^{(\alpha)}_{l(l)} = \frac{d_\alpha}{n}\sum_{g} D^{(\alpha)}_{ll}(g)^{*}\,g. \tag{19.71}$$

Or we can choose a projection operator $P^{(\alpha)}$ described as

$$P^{(\alpha)} = \frac{d_\alpha}{n}\sum_{g}\left[\sum_{l=1}^{d_\alpha} D^{(\alpha)}_{ll}(g)\right]^{*}g = \frac{d_\alpha}{n}\sum_{g}\left[\chi^{(\alpha)}(g)\right]^{*}g. \tag{18.174}$$

In the one-dimensional representation, $P^{(\alpha)}_{l(l)}$ and $P^{(\alpha)}$ are identical. As expressed in (18.155) and (18.175), these projection operators act on an arbitrary function and extract specific component(s) pertinent to a specific irreducible representation of the point group which the molecule belongs to.

At first glance the definition of $P^{(\alpha)}_{l(l)}$ and $P^{(\alpha)}$ looks daunting, but use of character tables relieves a calculation task. In particular, all the irreducible representations are one-dimensional (i.e., just a number!) with Abelian groups as mentioned in Sect. 18.6. For this, notice that individual group elements form a class by themselves. Therefore, utilization of character tables becomes easier for Abelian groups. Even though we encounter a case where a dimension of representation is more than one (i.e., the case of noncommutative groups), $D^{(\alpha)}_{ll}$ can be determined without much difficulty.

19.4


MO Calculations Based on π-Electron Approximation


On the basis of the general argument developed in Sects. 19.2 and 19.3, we perform molecular orbital calculations of individual molecules. First, we apply group theory to the molecular orbital calculations of unsaturated hydrocarbons such as ethylene, cyclopropenyl radical and cation as well as benzene and allyl radical. In these cases, in addition to adoption of the molecular orbital theory, we adopt the so-called “π-electron approximation.” With the first three molecules, we will not have to “solve” a secular equation. But, for allyl radical we deal with two SALC orbitals belonging to the same irreducible representation and, hence, the final molecular orbitals must be obtained by solving the secular equation.

19.4.1 Ethylene

We start with one of the simplest examples, ethylene. Ethylene is a planar molecule and belongs to D2h symmetry (see Sect. 17.2). In the molecule two π-electrons extend vertically to the molecular plane toward upper and lower directions (Fig. 19.5). The molecular plane forms a node to atomic 2pz orbitals; that is, those atomic orbitals change a sign relative to the molecular plane (i.e., the xy-plane). In Fig. 19.5, two pz atomic orbitals of carbon are denoted by $\phi_1$ and $\phi_2$. We should be able to construct basis vectors using $\phi_1$ and $\phi_2$. Corresponding to the two atomic orbitals, we are dealing with a two-dimensional vector space. Let us consider how $\phi_1$ and $\phi_2$ are transformed by a symmetry operation. First we examine an operation $C_2(z)$. This operation exchanges $\phi_1$ and $\phi_2$. That is, we have

$$C_2(z)(\phi_1) = \phi_2 \tag{19.72}$$

and

$$C_2(z)(\phi_2) = \phi_1. \tag{19.73}$$

Fig. 19.5 Ethylene molecule placed on the xyz-coordinate system. Two pz atomic orbitals of carbon are denoted by $\phi_1$ and $\phi_2$. The atomic orbitals change a sign relative to the molecular plane (i.e., the xy-plane)

Equations (19.72) and (19.73) can be combined into the following equation:

$$(\phi_1\ \phi_2)C_2(z) = (\phi_2\ \phi_1). \tag{19.74}$$

Thus, using a matrix representation, we have

$$C_2(z) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \tag{19.75}$$

In Sect. 17.2, we had

$$R_{z\theta} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{19.76}$$

or

$$C_2(z) = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{19.77}$$

Whereas (19.76) and (19.77) show the transformation of a position vector in ℝ³, (19.75) represents the transformation of functions in a two-dimensional vector space. A vector space composed of functions may be finite-dimensional or infinite-dimensional. We have already encountered the latter case in Part I where we dealt with the quantum mechanics of a harmonic oscillator. Such a function space is often referred to as a Hilbert space. The essence of (19.75) is that the trace (or character) of the matrix is zero.

Let us consider another transformation $C_2(y)$. In this case the situation is different from the above case in that $\phi_1$ is converted to $-\phi_1$ by $C_2(y)$ and that $\phi_2$ is converted to $-\phi_2$. Notice again that the molecular plane forms a node to atomic p-orbitals. Thus, $C_2(y)$ is represented by

$$C_2(y) = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{19.78}$$

The trace of the matrix $C_2(y)$ is $-2$. In this way, choosing $\phi_1$ and $\phi_2$ for the basis functions for the representation of D2h we can determine the characters for individual symmetry transformations of atomic 2pz orbitals in ethylene belonging to D2h. We collect the results in Table 19.1.

Table 19.1 Characters for individual symmetry transformations of atomic 2pz orbitals in ethylene

D2h   E   C2(z)  C2(y)  C2(x)  i   σ(xy)  σ(zx)  σ(yz)
Γ     2   0      -2     0      0   -2     0      2

Table 19.2 Character table of D2h

D2h   E   C2(z)  C2(y)  C2(x)  i    σ(xy)  σ(zx)  σ(yz)
Ag    1   1      1      1      1    1      1      1      x², y², z²
B1g   1   1      -1     -1     1    1      -1     -1     xy
B2g   1   -1     1      -1     1    -1     1      -1     zx
B3g   1   -1     -1     1      1    -1     -1     1      yz
Au    1   1      1      1      -1   -1     -1     -1
B1u   1   1      -1     -1     -1   -1     1      1      z
B2u   1   -1     1      -1     -1   1      -1     1      y
B3u   1   -1     -1     1      -1   1      1      -1     x

Next, we examine what kind of irreducible representations is contained in our present representation of ethylene. To do this, we need a character table of D2h (see Table 19.2). If a specific kind of irreducible representation is contained, then we want to examine how many times that specific representation takes place. Equation (18.83) is very useful for this purpose. In the present case, $n = 8$ in (18.83). Also taking into account (18.79) and (18.80), we get

$$\Gamma = B_{1u} \oplus B_{3g}, \tag{19.79}$$

where Γ is a reducible representation for a set consisting of two pz atomic orbitals of ethylene. Equation (19.79) clearly shows that Γ is a direct sum of two irreducible representations B1u and B3g that belong to D2h. Instead of (19.79), we usually simply express the direct sum in quantum chemical notation as

$$\Gamma = B_{1u} + B_{3g}. \tag{19.80}$$
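The characters of Γ and its reduction can be reproduced mechanically. In the sketch below, each operation is described by where it sends $\phi_1$ and $\phi_2$ and with which sign. Only the actions of E, $C_2(z)$, and $C_2(y)$ are stated explicitly in the text; the remaining entries are inferred here from the geometry those statements imply (carbon atoms on the y-axis, 2pz orbitals odd under $z \to -z$) and should be read as an assumption. The names `ACTION`, `character`, and `reduce_representation` are hypothetical.

```python
OPS = ["E", "C2(z)", "C2(y)", "C2(x)", "i", "s(xy)", "s(zx)", "s(yz)"]

# For each operation: (index of image orbital, sign) for phi1 and for phi2.
ACTION = {
    "E":     [(0, 1), (1, 1)],
    "C2(z)": [(1, 1), (0, 1)],
    "C2(y)": [(0, -1), (1, -1)],
    "C2(x)": [(1, -1), (0, -1)],   # inferred from the geometry (assumption)
    "i":     [(1, -1), (0, -1)],   # inferred from the geometry (assumption)
    "s(xy)": [(0, -1), (1, -1)],   # inferred from the geometry (assumption)
    "s(zx)": [(1, 1), (0, 1)],     # inferred from the geometry (assumption)
    "s(yz)": [(0, 1), (1, 1)],     # inferred from the geometry (assumption)
}

D2H = {
    "Ag":  [1,  1,  1,  1,  1,  1,  1,  1],
    "B1g": [1,  1, -1, -1,  1,  1, -1, -1],
    "B2g": [1, -1,  1, -1,  1, -1,  1, -1],
    "B3g": [1, -1, -1,  1,  1, -1, -1,  1],
    "Au":  [1,  1,  1,  1, -1, -1, -1, -1],
    "B1u": [1,  1, -1, -1, -1, -1,  1,  1],
    "B2u": [1, -1,  1, -1, -1,  1, -1,  1],
    "B3u": [1, -1, -1,  1, -1,  1,  1, -1],
}

def character(op):
    """Trace of the 2x2 matrix representing op on (phi1, phi2)."""
    return sum(sign for k, (target, sign) in enumerate(ACTION[op]) if target == k)

def reduce_representation(chars):
    """Multiplicities q = (1/n) * sum_g chi_irrep(g) * chi(g); characters are real."""
    result = {}
    for name, irrep in D2H.items():
        q = sum(a * b for a, b in zip(irrep, chars)) // len(chars)
        if q:
            result[name] = q
    return result

gamma = [character(op) for op in OPS]
print(gamma)
print(reduce_representation(gamma))
```

The computed characters agree with Table 19.1, and the multiplicities pick out the two irreducible representations appearing in (19.80).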

As a next step, we are going to find an appropriate basis function belonging to two irreducible representations of (19.80). For this purpose, we use projection operators expressed as

$$P^{(\alpha)} = \frac{d_\alpha}{n}\sum_{g}\left[\chi^{(\alpha)}(g)\right]^{*}g. \tag{18.174}$$

Taking $\phi_1$ for instance, we apply (18.174) to $\phi_1$. That is,

$$\begin{aligned}
P^{(B_{1u})}\phi_1 &= \frac{1}{8}\sum_{g}\left[\chi^{(B_{1u})}(g)\right]^{*}g\phi_1 \\
&= \frac{1}{8}\left[1\cdot\phi_1 + 1\cdot\phi_2 + (-1)(-\phi_1) + (-1)(-\phi_2) + (-1)(-\phi_2) + (-1)(-\phi_1) + 1\cdot\phi_2 + 1\cdot\phi_1\right] \\
&= \frac{1}{2}(\phi_1 + \phi_2).
\end{aligned} \tag{19.81}$$

Also with the B3g, we apply (18.174) to $\phi_1$ and get

$$\begin{aligned}
P^{(B_{3g})}\phi_1 &= \frac{1}{8}\sum_{g}\left[\chi^{(B_{3g})}(g)\right]^{*}g\phi_1 \\
&= \frac{1}{8}\left[1\cdot\phi_1 + (-1)\phi_2 + (-1)(-\phi_1) + 1\cdot(-\phi_2) + 1\cdot(-\phi_2) + (-1)(-\phi_1) + (-1)\phi_2 + 1\cdot\phi_1\right] \\
&= \frac{1}{2}(\phi_1 - \phi_2).
\end{aligned} \tag{19.82}$$

Thus, after going through routine but sure procedures, we have reached appropriate basis functions that belong to each irreducible representation.

As mentioned earlier (see Table 17.5), D2h can be expressed as a direct-product group and the C2 group is contained in D2h as a subgroup. We can use it for the present analysis. Suppose now that we have a group ℊ and that $\mathcal{H}$ is a subgroup of ℊ. Let $g$ be an arbitrary element of ℊ and let $D(g)$ be a representation of $g$. Meanwhile, let $h$ be an arbitrary element of $\mathcal{H}$. Then with $\forall h \in \mathcal{H}$, a collection of $D(h)$ is a representation of $\mathcal{H}$. We write this relation as

$$D \downarrow \mathcal{H}. \tag{19.83}$$

This representation is called a subduced representation of $D$ to $\mathcal{H}$. Table 19.3 shows a character table of irreducible representations of C2. In the present case, we are thinking of $C_2(z)$ as C2; see Table 17.5 and Fig. 19.5. Then we have

$$B_{1u} \downarrow C_2 = A \quad \text{and} \quad B_{3g} \downarrow C_2 = B. \tag{19.84}$$

Table 19.3 Character table of C2

C2   E   C2
A    1   1     z; x², y², z², xy
B    1   -1    x, y; yz, zx

The expression of (19.84) is called a compatibility relation. Note that in (19.84) C2 is not a symmetry operation, but means a subgroup of D2h. Thus, (19.81) is reduced to

$$P^{(A)}\phi_1 = \frac{1}{2}\sum_{g}\left[\chi^{(A)}(g)\right]^{*}g\phi_1 = \frac{1}{2}\left[1\cdot\phi_1 + 1\cdot\phi_2\right] = \frac{1}{2}(\phi_1 + \phi_2). \tag{19.85}$$

Also, we have

$$P^{(B)}\phi_1 = \frac{1}{2}\sum_{g}\left[\chi^{(B)}(g)\right]^{*}g\phi_1 = \frac{1}{2}\left[1\cdot\phi_1 + (-1)\phi_2\right] = \frac{1}{2}(\phi_1 - \phi_2). \tag{19.86}$$
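The projections (19.81), (19.82), (19.85), and (19.86) follow one recipe and can be sketched as a single routine. The code below is illustrative: the function `project` and the tables `ACTION`, `D2H_CHARS`, and `C2_CHARS` are hypothetical names, only one-dimensional representations with real characters are handled (so $d_\alpha = 1$ and no conjugation is needed), and the actions of the operations other than E, $C_2(z)$, and $C_2(y)$ are inferred from the geometry rather than quoted from the text.

```python
from fractions import Fraction

# (index of image orbital, sign) for phi1 and phi2 under each operation.
ACTION = {
    "E":     [(0, 1), (1, 1)],
    "C2(z)": [(1, 1), (0, 1)],
    "C2(y)": [(0, -1), (1, -1)],
    "C2(x)": [(1, -1), (0, -1)],   # inferred (assumption)
    "i":     [(1, -1), (0, -1)],   # inferred (assumption)
    "s(xy)": [(0, -1), (1, -1)],   # inferred (assumption)
    "s(zx)": [(1, 1), (0, 1)],     # inferred (assumption)
    "s(yz)": [(0, 1), (1, 1)],     # inferred (assumption)
}

D2H_OPS = ["E", "C2(z)", "C2(y)", "C2(x)", "i", "s(xy)", "s(zx)", "s(yz)"]
D2H_CHARS = {
    "Ag":  [1,  1,  1,  1,  1,  1,  1,  1],
    "B3g": [1, -1, -1,  1,  1, -1, -1,  1],
    "B1u": [1,  1, -1, -1, -1, -1,  1,  1],
}
C2_OPS = ["E", "C2(z)"]
C2_CHARS = {"A": [1, 1], "B": [1, -1]}

def project(ops, chars, start):
    """Coefficients of (phi1, phi2) in P phi_start for a one-dimensional irrep."""
    coeffs = [Fraction(0), Fraction(0)]
    for op, chi in zip(ops, chars):
        target, sign = ACTION[op][start]
        coeffs[target] += Fraction(chi * sign, len(ops))
    return coeffs

for name in ("B1u", "B3g", "Ag"):
    print(name, project(D2H_OPS, D2H_CHARS[name], 0))
for name in ("A", "B"):
    print(name, project(C2_OPS, C2_CHARS[name], 0))
```

A projection that returns only zero coefficients signals that the irreducible representation is not contained in the space spanned by $\phi_1$ and $\phi_2$.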

The relations of (19.85) and (19.86) are essentially the same as (19.81) and (19.82), respectively. We can easily construct a character table (see Table 19.3). There should be two irreducible representations. Regarding the totally symmetric representation, we allocate 1 to each symmetry operation. For the other representation we allocate 1 to the identity element E and −1 to the element C2 so that the rows and columns of the character table are orthogonal to each other.

Readers might well ask why we bother to make circuitous approaches to reaching predictable results such as (19.81) and (19.82) or (19.85) and (19.86). This question seems natural when we are dealing with a case where the number of basis vectors (i.e., the dimension of the vector space) is small, typically 2 as in the present case. With increasing dimension of the vector space, however, seeking and determining appropriate SALCs becomes increasingly complicated and difficult. Under such circumstances, a projection operator is an indispensable tool to address the problems.

We have a two-dimensional secular equation to be solved such that

$$\begin{vmatrix} \tilde{H}_{11} - \lambda\tilde{S}_{11} & \tilde{H}_{12} - \lambda\tilde{S}_{12} \\ \tilde{H}_{21} - \lambda\tilde{S}_{21} & \tilde{H}_{22} - \lambda\tilde{S}_{22} \end{vmatrix} = 0. \tag{19.87}$$

In the above argument, $\frac{1}{2}(\phi_1 + \phi_2)$ belongs to B1u and $\frac{1}{2}(\phi_1 - \phi_2)$ belongs to B3g. Since they belong to different irreducible representations, we have

$$\tilde{H}_{12} = \tilde{S}_{12} = 0. \tag{19.88}$$

Thus, the secular equation (19.87) is reduced to

$$\begin{vmatrix} \tilde{H}_{11} - \lambda\tilde{S}_{11} & 0 \\ 0 & \tilde{H}_{22} - \lambda\tilde{S}_{22} \end{vmatrix} = 0. \tag{19.89}$$

As expected, (19.89) has automatically been solved to give a solution

$$\lambda_1 = \tilde{H}_{11}/\tilde{S}_{11} \quad \text{and} \quad \lambda_2 = \tilde{H}_{22}/\tilde{S}_{22}. \tag{19.90}$$

The next step is to determine the energy eigenvalue of the molecule. Note here that the role of SALCs is to determine a suitable irreducible representation that corresponds to a “direction” of a vector. As a coefficient keeps the direction of a vector unaltered, it is of secondary importance. The final form of the normalized MOs can be decided last; that procedure includes the normalization of a vector. Thus, we tentatively choose the following functions for SALCs, i.e.,

$$\xi_1 = \phi_1 + \phi_2 \quad \text{and} \quad \xi_2 = \phi_1 - \phi_2.$$

Then, we have

$$\begin{aligned} \tilde{H}_{11} &= \int \xi_1^{*} H \xi_1 \, d\tau = \int (\phi_1 + \phi_2)^{*} H (\phi_1 + \phi_2) \, d\tau \\ &= \int \phi_1^{*} H \phi_1 \, d\tau + \int \phi_1^{*} H \phi_2 \, d\tau + \int \phi_2^{*} H \phi_1 \, d\tau + \int \phi_2^{*} H \phi_2 \, d\tau \\ &= H_{11} + H_{12} + H_{21} + H_{22} = H_{11} + 2H_{12} + H_{22}. \end{aligned} \tag{19.91}$$

Similarly, we have

$$\tilde{H}_{22} = \int \xi_2^{*} H \xi_2 \, d\tau = H_{11} - 2H_{12} + H_{22}. \tag{19.92}$$

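The last equality comes from the fact that we have chosen real functions for ϕ1 and ϕ2 as studied in Part I. Moreover, we have

$$\begin{aligned} H_{11} &\equiv \int \phi_1^{*} H \phi_1 \, d\tau = \int \phi_1 H \phi_1 \, d\tau = \int \phi_2 H \phi_2 \, d\tau = H_{22}, \\ H_{12} &\equiv \int \phi_1^{*} H \phi_2 \, d\tau = \int \phi_1 H \phi_2 \, d\tau = \int \phi_2 H \phi_1 \, d\tau = \int \phi_2^{*} H \phi_1 \, d\tau = H_{21}. \end{aligned} \tag{19.93}$$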

The first equation comes from the fact that both H11 and H22 are calculated using the same 2pz atomic orbital of carbon. The second equation results from the fact that H is Hermitian. Notice that both ϕ1 and ϕ2 are real functions. Following the convention, we denote

$$\alpha \equiv H_{11} = H_{22} \quad \text{and} \quad \beta \equiv H_{12}, \tag{19.94}$$

where α is called the Coulomb integral and β is called the resonance integral. Then, we have

$$\tilde{H}_{11} = 2(\alpha + \beta). \tag{19.95}$$

In a similar manner, we get

$$\tilde{H}_{22} = 2(\alpha - \beta). \tag{19.96}$$

Meanwhile, we have

$$\begin{aligned} \tilde{S}_{11} &= \langle \xi_1 | \xi_1 \rangle = \int (\phi_1 + \phi_2)^{*} (\phi_1 + \phi_2) \, d\tau = \int (\phi_1 + \phi_2)^2 \, d\tau \\ &= \int \phi_1^2 \, d\tau + \int \phi_2^2 \, d\tau + 2\int \phi_1 \phi_2 \, d\tau = 2 + 2\int \phi_1 \phi_2 \, d\tau, \end{aligned} \tag{19.97}$$

where we used the fact that ϕ1 and ϕ2 have been normalized. Also following the convention, we denote

$$S \equiv \int \phi_1 \phi_2 \, d\tau = S_{12}, \tag{19.98}$$

where S is called the overlap integral. Thus, we have

$$\tilde{S}_{11} = 2(1 + S). \tag{19.99}$$

Similarly, we get

$$\tilde{S}_{22} = 2(1 - S). \tag{19.100}$$

Substituting (19.95) and (19.96) along with (19.99) and (19.100) for (19.90), we get as the energy eigenvalues

$$\lambda_1 = \frac{\alpha + \beta}{1 + S} \quad \text{and} \quad \lambda_2 = \frac{\alpha - \beta}{1 - S}. \tag{19.101}$$
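To get a feeling for how the overlap integral shifts the two levels, (19.101) can be evaluated directly. The sketch below is only illustrative: the function name `ethylene_levels` and the numerical values of α, β, and S are hypothetical and are not taken from the text.

```python
def ethylene_levels(alpha, beta, overlap=0.0):
    """Evaluate the two energy eigenvalues of (19.101).

    lambda_1 belongs to the SALC phi_1 + phi_2 (B1u) and
    lambda_2 to phi_1 - phi_2 (B3g).
    """
    lam1 = (alpha + beta) / (1.0 + overlap)
    lam2 = (alpha - beta) / (1.0 - overlap)
    return lam1, lam2

# Hypothetical parameter values (arbitrary energy units), for illustration only.
ALPHA, BETA, S = -6.0, -3.0, 0.25

print(ethylene_levels(ALPHA, BETA))     # overlap neglected (simplest Hueckel level)
print(ethylene_levels(ALPHA, BETA, S))  # overlap retained
```

With these assumed values the level spacing changes when S is retained, while the assignment of the lower level to ϕ1 + ϕ2 is unaffected.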

From (19.97), we get

$$\|\xi_1\| = \sqrt{\langle \xi_1 | \xi_1 \rangle} = \sqrt{2(1 + S)}. \tag{19.102}$$

Thus, for one of MOs corresponding to an energy eigenvalue λ1, we get

$$\Psi_1 = \frac{|\xi_1\rangle}{\|\xi_1\|} = \frac{\phi_1 + \phi_2}{\sqrt{2(1 + S)}}. \tag{19.103}$$

Also, we have

$$\|\xi_2\| = \sqrt{\langle \xi_2 | \xi_2 \rangle} = \sqrt{2(1 - S)}. \tag{19.104}$$

For another MO corresponding to an energy eigenvalue λ2, we get

$$\Psi_2 = \frac{|\xi_2\rangle}{\|\xi_2\|} = \frac{\phi_1 - \phi_2}{\sqrt{2(1 - S)}}. \tag{19.105}$$

Fig. 19.6 HOMO and LUMO energy levels and their assignments of ethylene

Note that both the normalized MOs and the energy eigenvalues depend upon whether we ignore the overlap integral, as is the case with the simplest Hückel approximation. Nonetheless, the MO functional forms (in this case either ϕ1 + ϕ2 or ϕ1 − ϕ2) remain the same regardless of the level of approximation adopted for the overlap integral. Regarding quantitative evaluation of α, β, and S, we will briefly mention it later.

Once we have decided the symmetry of an MO (or the irreducible representation to which the orbital belongs) and its energy eigenvalue, we will be in a position to examine various physicochemical properties of the molecule. One of them is an optical transition within a molecule, particularly an electric dipole transition. In most cases, the most important transition is that occurring between the highest-occupied molecular orbital (HOMO) and the lowest-unoccupied molecular orbital (LUMO). In the case of ethylene, those levels are depicted in Fig. 19.6. In the ground state (i.e., the most stable state), two electrons are positioned in a B1u state (HOMO). An excited state is assigned to B3g (LUMO). The ground state that consists only of fully occupied MOs belongs to a totally symmetric representation. In the case of ethylene, the ground state belongs to Ag accordingly.

If a photon is absorbed by a molecule, that molecule is excited by the energy of the photon. In ethylene, this process takes place by exciting an electron from B1u to B3g. The resulting electronic state ends up as one electron remaining in B1u and another excited to B3g (a final state). Thus, the representation of the final-state electronic configuration (denoted by Γf) is described as

$$\Gamma_f = B_{1u} \otimes B_{3g}. \tag{19.106}$$

That is, the final excited state is expressed as a direct product of the states associated with the optical transition. To determine the symmetry of the final state, we use (18.198) and (18.200). If Γf in (19.106) is reducible, using (18.200) we can determine the number of times that individual irreducible representations take place. Calculating $\chi^{(B_{1u})}(g)\,\chi^{(B_{3g})}(g)$ for each group element, we can readily get the result. Using a character table, we have

$$\Gamma_f = B_{2u}. \tag{19.107}$$
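The identification of the direct product can be reproduced by multiplying characters element by element and then counting how often each irreducible representation occurs. The Python sketch below assumes the standard character table of D2h with the operations ordered as stated in the comment; the names `direct_product` and `decompose` are illustrative.

```python
# Characters of D2h; the operations are ASSUMED to be ordered as
# E, C2(z), C2(y), C2(x), i, sigma(xy), sigma(zx), sigma(yz).
D2H = {
    "Ag":  [1,  1,  1,  1,  1,  1,  1,  1],
    "B1g": [1,  1, -1, -1,  1,  1, -1, -1],
    "B2g": [1, -1,  1, -1,  1, -1,  1, -1],
    "B3g": [1, -1, -1,  1,  1, -1, -1,  1],
    "Au":  [1,  1,  1,  1, -1, -1, -1, -1],
    "B1u": [1,  1, -1, -1, -1, -1,  1,  1],
    "B2u": [1, -1,  1, -1, -1,  1, -1,  1],
    "B3u": [1, -1, -1,  1, -1,  1,  1, -1],
}

def direct_product(a, b):
    """Characters of the direct-product representation of irreps a and b."""
    return [x * y for x, y in zip(D2H[a], D2H[b])]

def decompose(chars):
    """Multiplicity of each irreducible representation contained in `chars`.

    All characters of D2h are real and every class has a single element,
    so the multiplicity is a plain sum divided by the group order.
    """
    order = len(chars)
    result = {}
    for name, row in D2H.items():
        multiplicity = sum(r * c for r, c in zip(row, chars)) // order
        if multiplicity:
            result[name] = multiplicity
    return result

print(decompose(direct_product("B1u", "B3g")))  # {'B2u': 1}
```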

Thus, we find that the transition is Ag ⟶ B2u, where Ag is called an initial-state electronic configuration and B2u is called a final-state electronic configuration. This transition is characterized by an electric dipole transition moment operator P. Here we have an important physical quantity, the transition matrix element. This quantity Tfi is approximated by

$$T_{fi} \approx \langle \Theta_f | \boldsymbol{\varepsilon}_e \cdot \boldsymbol{P} | \Theta_i \rangle, \tag{19.108}$$

where εe is a unit polarization vector of the electric field; Θi and Θf are the initial-state and final-state electronic configurations, respectively. The description (19.108) is in parallel with (4.5). Notice that in (4.5) we dealt with a single-electron system such as a particle confined in a square-well potential, a sole one-dimensional harmonic oscillator, and an electron in a hydrogen atom. In the present case, however, we are dealing with a two-electron system, ethylene. Consequently, we cannot describe Θf by a simple wave function, but must use a more elaborate function. Nonetheless, when we study optical absorption or optical emission of a molecule, we often first wish to know whether such a phenomenon truly takes place. In such a case, a qualitative prediction is of great importance. This can be done by judging whether the integral (19.108) vanishes. If the integral does not vanish, the relevant transition is said to be allowed. If, on the other hand, the integral vanishes, the transition is called forbidden. In this context, a systematic approach based on group theory is a powerful tool.

Let us consider optical absorption of ethylene. With the transition matrix element for this, we have

$$T_{fi} = \left\langle \Theta_f^{(B_{2u})} \middle| \boldsymbol{\varepsilon}_e \cdot \boldsymbol{P} \middle| \Theta_i^{(A_g)} \right\rangle. \tag{19.109}$$

Again, note that a closed-shell electronic configuration belongs to a totally symmetric representation [1]. Suppose that the position vector x belongs to an irreducible representation η. A necessary condition for (19.109) not to vanish is that $D^{(\eta)} \otimes D^{(A_g)}$ contains the irreducible representation $D^{(B_{2u})}$. In the present case, all the representations are one-dimensional, and so we can use χ(ω) instead of D(ω), where ω shows an arbitrary irreducible representation of D2h. This procedure is straightforward as seen in Sect. 19.2. However, if the character is real (it is the case with many symmetry groups and with the point group D2h as well), the situation will be easier. Suppose in general that we are examining whether a following matrix element vanishes:

$$M_{fi} = \left\langle \Phi_f^{(\beta)} \middle| O^{(\gamma)} \middle| \Phi_i^{(\alpha)} \right\rangle, \tag{19.110}$$

where α, β, and γ stand for irreducible representations and O is an appropriate operator. In this case, (18.200) can be rewritten as

$$\begin{aligned} q_\alpha &= \frac{1}{n}\sum_g \chi^{(\alpha)}(g)^{*} \chi^{(\gamma \otimes \beta)}(g) = \frac{1}{n}\sum_g \chi^{(\alpha)}(g) \chi^{(\gamma \otimes \beta)}(g) \\ &= \frac{1}{n}\sum_g \chi^{(\alpha)}(g) \chi^{(\gamma)}(g) \chi^{(\beta)}(g) = \frac{1}{n}\sum_g \chi^{(\gamma)}(g) \chi^{(\alpha)}(g) \chi^{(\beta)}(g) \\ &= \frac{1}{n}\sum_g \chi^{(\gamma)}(g) \chi^{(\alpha \otimes \beta)}(g) = \frac{1}{n}\sum_g \chi^{(\gamma)}(g)^{*} \chi^{(\alpha \otimes \beta)}(g) = q_\gamma. \end{aligned} \tag{19.111}$$

Consequently, the number of times that D(γ) appears in D(α⊗β) is identical to the number of times that D(α) appears in D(γ⊗β). Thus, it suffices to examine whether D(γ⊗β) contains D(α). In other words, we only have to examine whether qα ≠ 0 in (19.111). Thus, applying (19.111) to (19.109), we examine whether $D^{(B_{2u} \otimes A_g)}$ contains the irreducible representation D(η) that is related to x. We easily get

$$B_{2u} = B_{2u} \otimes A_g. \tag{19.112}$$

Therefore, if εe · P (or x) belongs to B2u, the transition is allowed. Consulting the character table, we find that y belongs to B2u. In this case, in fact (19.111) reads as

$$q_{B_{2u}} = \frac{1}{8}\left[1\cdot 1 + (-1)(-1) + 1\cdot 1 + (-1)(-1) + (-1)(-1) + 1\cdot 1 + (-1)(-1) + 1\cdot 1\right] = 1.$$

Equivalently, we simply write

$$B_{2u} \otimes B_{2u} = A_g.$$

This means that if light polarized along the y-axis is incident, i.e., εe is parallel to the y-axis, the transition is allowed. In that situation, ethylene is said to be polarized along the y-axis or polarized in the direction of the y-axis. As the molecular axis is parallel to the y-axis, ethylene is polarized in the direction of the molecular axis. This is often the case with molecules having a well-defined molecular long axis, such as ethylene.

Let us next examine whether ethylene is polarized along, e.g., the x-axis. From a character table of D2h (see Table 19.2), x belongs to B3u. In that case, using (19.111) we have

$$q_{B_{3u}} = \frac{1}{8}\left[1\cdot 1 + (-1)(-1) + 1\cdot(-1) + (-1)\cdot 1 + (-1)(-1) + 1\cdot 1 + (-1)\cdot 1 + 1\cdot(-1)\right] = 0.$$

This implies that B3u is not contained in B1u ⊗ B3g (= B2u).

The above results on the optical transitions are quite obvious. Once we get used to using a character table, quick estimation will be done.
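The polarization test can be written as a small routine that forms the triple product of characters for the final state, the dipole component, and the initial state. This is a minimal sketch under the assumption that x, y, and z transform as B3u, B2u, and B1u of D2h with the operation ordering given in the comment; the helper name `is_allowed` is hypothetical.

```python
# Only the rows of the D2h character table needed here; the operations are
# ASSUMED to be ordered E, C2(z), C2(y), C2(x), i, sigma(xy), sigma(zx), sigma(yz).
CHI = {
    "Ag":  [1,  1,  1,  1,  1,  1,  1,  1],
    "B1u": [1,  1, -1, -1, -1, -1,  1,  1],
    "B2u": [1, -1,  1, -1, -1,  1, -1,  1],
    "B3u": [1, -1, -1,  1, -1,  1,  1, -1],
}

# Irreducible representation to which each Cartesian component belongs.
AXIS_IRREP = {"x": "B3u", "y": "B2u", "z": "B1u"}

def is_allowed(initial, final, axis):
    """True if <final| r_axis |initial> can be nonzero by symmetry.

    The integrand must contain the totally symmetric representation, i.e.
    the averaged triple product of (real) characters must not vanish.
    """
    dipole = CHI[AXIS_IRREP[axis]]
    total = sum(f * d * i for f, d, i in zip(CHI[final], dipole, CHI[initial]))
    return total // len(dipole) != 0

for axis in "xyz":
    print(axis, is_allowed("Ag", "B2u", axis))
```

Only the y component survives for the Ag ⟶ B2u transition, in line with the two character sums evaluated above.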

19.4.2 Cyclopropenyl Radical [1]

Let us think of another example, cyclopropenyl radical, which has three resonant structures (Fig. 19.7). It is a planar molecule and three carbon atoms form an equilateral triangle. Hence, the molecule belongs to D3h symmetry (see Sect. 17.2). In the molecule three π-electrons extend vertically to the molecular plane toward upper and lower directions. Suppose that cyclopropenyl radical is placed on the xy-plane. Three pz atomic orbitals of carbons located at vertices of an equilateral triangle are denoted by ϕ1, ϕ2, and ϕ3 in Fig. 19.8. The orbitals are numbered clockwise so that the calculations can be consistent with the conventional notation of a character table (vide infra). We assume that these π-orbitals take positive and negative signs on the upper and lower sides of the plane of paper, respectively, with a nodal plane lying on the xy-plane. The situation is similar to that of ethylene and the problem can be treated in parallel to the case of ethylene.

As in the case of ethylene, we can choose ϕ1, ϕ2, and ϕ3 as real functions. We construct basis vectors using these vectors. What we want to do to address the problem is as follows:

(i) We examine how ϕ1, ϕ2, and ϕ3 are transformed by the symmetry operations of D3h. According to the analysis, we can determine to what irreducible representations the SALCs should be assigned.

(ii) On the basis of knowledge obtained in (i), we construct proper MOs.

In Table 19.4 we list a character table of D3h along with symmetry species. First we examine traces (characters) of representation matrices. As in the case of ethylene, a subgroup C3 of D3h plays an essential role (vide infra). This subgroup contains three group elements such that

$$C_3 = \{E, C_3, C_3^2\}.$$

In the above, we use the same notation for the group name and group element, and so we should be careful not to confuse them.

Fig. 19.7 Three resonant structures of cyclopropenyl radical

Fig. 19.8 Three pz atomic orbitals of carbons for cyclopropenyl radical that is placed on the xy-plane. The carbon atoms are located at vertices of an equilateral triangle. The atomic orbitals are denoted by ϕ1, ϕ2 and ϕ3

Table 19.4 Character table of D3h

D3h      E   2C3    σh   3C2   2S3   3σv
A1′      1     1     1     1     1     1    x² + y², z²
A2′      1     1     1    -1     1    -1
E′       2    -1     2     0    -1     0    (x, y); (x² − y², xy)
A1″      1     1    -1     1    -1    -1
A2″      1     1    -1    -1    -1     1    z
E″       2    -1    -2     0     1     0    (yz, zx)

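The orbitals ϕ1, ϕ2, and ϕ3 are transformed by C3(z) and C3²(z) as follows:

$$C_3(z)(\phi_1) = \phi_3, \quad C_3(z)(\phi_2) = \phi_1, \quad \text{and} \quad C_3(z)(\phi_3) = \phi_2; \tag{19.113}$$

$$C_3^2(z)(\phi_1) = \phi_2, \quad C_3^2(z)(\phi_2) = \phi_3, \quad \text{and} \quad C_3^2(z)(\phi_3) = \phi_1. \tag{19.114}$$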

Equation (19.113) can be combined into a following form:

$$(\phi_1 \ \phi_2 \ \phi_3)\, C_3(z) = (\phi_3 \ \phi_1 \ \phi_2). \tag{19.115}$$

Using a matrix representation, we have

$$C_3(z) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}. \tag{19.116}$$

In turn, (19.114) is expressed as

$$C_3^2(z) = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \tag{19.117}$$
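The passage from the images of the basis functions to a representation matrix is purely mechanical, as the following Python sketch illustrates. It assumes the row-vector convention of (19.115), in which each column of the matrix holds the expansion of the image of one basis function; the helper names are illustrative.

```python
def matrix_from_images(images):
    """Build the representation matrix of an operation R from its action on the basis.

    images[j] = (sign, k) means that R sends phi_{j+1} to sign * phi_k.
    With the row-vector convention (phi_1 phi_2 phi_3) R, column j holds the
    expansion coefficients of the image of phi_{j+1}.
    """
    n = len(images)
    matrix = [[0] * n for _ in range(n)]
    for col, (sign, k) in enumerate(images):
        matrix[k - 1][col] = sign
    return matrix

def matmul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def trace(matrix):
    return sum(matrix[i][i] for i in range(len(matrix)))

C3 = matrix_from_images([(1, 3), (1, 1), (1, 2)])     # images listed in (19.113)
C3_SQ = matrix_from_images([(1, 2), (1, 3), (1, 1)])  # images listed in (19.114)

print(C3, trace(C3))
print(matmul(C3, C3) == C3_SQ)
```

A sign of −1 in an image entry handles operations that invert the pz orbitals, so the same helper can be used for the remaining operations of D3h.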

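Both traces of (19.116) and (19.117) are zero. Similarly, let us check the representation matrices of the other symmetry operations. Of these, e.g., for C2 related to the y-axis (see Fig. 19.8) we have

$$(\phi_1 \ \phi_2 \ \phi_3)\, C_2 = (-\phi_1 \ \ {-\phi_3} \ \ {-\phi_2}). \tag{19.118}$$

Therefore,

$$C_2 = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix}. \tag{19.119}$$

We have a trace −1 accordingly. Regarding σh, we have

$$(\phi_1 \ \phi_2 \ \phi_3)\, \sigma_h = (-\phi_1 \ \ {-\phi_2} \ \ {-\phi_3}) \quad \text{and} \quad \sigma_h = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}. \tag{19.120}$$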

In this way we can determine the trace for individual symmetry transformations of basis functions ϕ1, ϕ2, and ϕ3. We collect the results of characters of a reducible representation Γ in Table 19.5. It can be reduced to a summation of irreducible representations according to the procedures given in (18.81)–(18.83) and using a character table of D3h (Table 19.4). As a result, we get

$$\Gamma = A_2'' + E''. \tag{19.121}$$
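The reduction can be carried out with the usual multiplicity formula, in which the sum over group elements is replaced by a sum over classes weighted by the class sizes. The sketch below assumes the standard character table of D3h and the characters of Γ with the signs appropriate to pz orbitals (which are inverted by σh and by the in-plane C2 axes); the function name `reduce_representation` is illustrative.

```python
from fractions import Fraction

# Classes of D3h in the order E, 2C3, sigma_h, 3C2, 2S3, 3sigma_v.
CLASS_SIZES = [1, 2, 1, 3, 2, 3]

D3H = {
    "A1'":  [1,  1,  1,  1,  1,  1],
    "A2'":  [1,  1,  1, -1,  1, -1],
    "E'":   [2, -1,  2,  0, -1,  0],
    "A1''": [1,  1, -1,  1, -1, -1],
    "A2''": [1,  1, -1, -1, -1,  1],
    "E''":  [2, -1, -2,  0,  1,  0],
}

# Characters of the representation spanned by the three 2pz orbitals.
GAMMA = [3, 0, -3, -1, 0, 1]

def reduce_representation(chars):
    """Multiplicities q = (1/n) sum_classes size * chi_irrep * chi_Gamma.

    All characters of D3h are real, so no complex conjugation is needed.
    """
    order = sum(CLASS_SIZES)  # n = 12
    result = {}
    for name, row in D3H.items():
        q = Fraction(sum(s * r * c for s, r, c in zip(CLASS_SIZES, row, chars)),
                     order)
        if q:
            result[name] = q
    return result

print(reduce_representation(GAMMA))
```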

As in the case of ethylene, we make the best use of the information of a subgroup C3 of D3h. Let us consider a subduced representation of D of D3h to C3. For this, in Table 19.6 we show a character table of irreducible representations of C3. We can readily construct this character table. There should be three irreducible representations. Regarding the totally symmetric representation, we allocate 1 to each symmetry operation. For the other representations we allocate 1 to the identity element E and the two other cube roots of 1, i.e., ε and ε* [where ε = exp(i2π/3)], to the elements C3 and C3² as shown, so that the row and column vectors of the character table are orthogonal to each other. Returning to the construction of the subduced representation, we have

$$A_2'' \downarrow C_3 = A \quad \text{and} \quad E'' \downarrow C_3 = 2E. \tag{19.122}$$

Table 19.5 Characters for individual symmetry transformations of 2pz orbitals in cyclopropenyl radical

D3h      E   2C3    σh   3C2   2S3   3σv
Γ        3     0    -3    -1     0     1

Table 19.6 Character table of C3

C3       E    C3   C3²
A        1     1     1    z; x² + y², z²
E        1     ε    ε*    (x, y); (x² − y², xy), (yz, zx)
         1    ε*     ε

ε = exp(i2π/3)

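Then, corresponding to (18.147) we have

$$P^{(A)}\phi_1 = \frac{1}{3}\sum_g \left[\chi^{(A)}(g)\right]^{*} g\phi_1 = \frac{1}{3}\left[1\cdot\phi_1 + 1\cdot\phi_2 + 1\cdot\phi_3\right] = \frac{1}{3}(\phi_1 + \phi_2 + \phi_3). \tag{19.123}$$

Also, we have

$$P^{[E^{(1)}]}\phi_1 = \frac{1}{3}\sum_g \left[\chi^{[E^{(1)}]}(g)\right]^{*} g\phi_1 = \frac{1}{3}\left[1\cdot\phi_1 + \varepsilon^{*}\phi_3 + (\varepsilon^{*})^{*}\phi_2\right] = \frac{1}{3}(\phi_1 + \varepsilon\phi_2 + \varepsilon^{*}\phi_3). \tag{19.124}$$

Also, we get

$$P^{[E^{(2)}]}\phi_1 = \frac{1}{3}\sum_g \left[\chi^{[E^{(2)}]}(g)\right]^{*} g\phi_1 = \frac{1}{3}\left[1\cdot\phi_1 + (\varepsilon^{*})^{*}\phi_3 + \varepsilon^{*}\phi_2\right] = \frac{1}{3}(\phi_1 + \varepsilon^{*}\phi_2 + \varepsilon\phi_3). \tag{19.125}$$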

Here is the best place to mention an eigenvalue of a symmetry operator. Let us designate SALCs as follows:

$$\xi_1 = \phi_1 + \phi_2 + \phi_3, \quad \xi_2 = \phi_1 + \varepsilon\phi_2 + \varepsilon^{*}\phi_3, \quad \text{and} \quad \xi_3 = \phi_1 + \varepsilon^{*}\phi_2 + \varepsilon\phi_3. \tag{19.126}$$

Let us choose C3 for a symmetry operator. Then we have

$$C_3(\xi_1) = C_3(\phi_1 + \phi_2 + \phi_3) = C_3\phi_1 + C_3\phi_2 + C_3\phi_3 = \phi_3 + \phi_1 + \phi_2 = \xi_1, \tag{19.127}$$

where for the second equality we used the fact that C3 is a linear operator. That is, regarding a SALC ξ1, an eigenvalue of C3 is 1. Similarly, we have

$$C_3(\xi_2) = C_3(\phi_1 + \varepsilon\phi_2 + \varepsilon^{*}\phi_3) = C_3\phi_1 + \varepsilon C_3\phi_2 + \varepsilon^{*} C_3\phi_3 = \phi_3 + \varepsilon\phi_1 + \varepsilon^{*}\phi_2 = \varepsilon(\phi_1 + \varepsilon\phi_2 + \varepsilon^{*}\phi_3) = \varepsilon\xi_2. \tag{19.128}$$

Furthermore, we get

$$C_3(\xi_3) = \varepsilon^{*}\xi_3. \tag{19.129}$$
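The eigenvalue relations can be verified numerically by letting C3 act on the coefficient vectors of the SALCs. The following sketch uses the action of C3 given in (19.113); the representation of a SALC as a tuple of coefficients and the function names are illustrative assumptions.

```python
import cmath

EPS = cmath.exp(2j * cmath.pi / 3)  # epsilon = exp(i 2 pi / 3)

def apply_c3(coeffs):
    """Act with C3 on c1*phi_1 + c2*phi_2 + c3*phi_3.

    C3 sends phi_1 -> phi_3, phi_2 -> phi_1, phi_3 -> phi_2, see (19.113),
    so the new coefficients of (phi_1, phi_2, phi_3) are (c2, c3, c1).
    """
    c1, c2, c3 = coeffs
    return (c2, c3, c1)

def c3_eigenvalue(coeffs, tol=1e-12):
    """Return the factor by which C3 multiplies the SALC, if it is an eigenvector."""
    image = apply_c3(coeffs)
    ratio = image[0] / coeffs[0]
    if any(abs(i - ratio * c) > tol for i, c in zip(image, coeffs)):
        raise ValueError("not an eigenvector of C3")
    return ratio

XI1 = (1, 1, 1)
XI2 = (1, EPS, EPS.conjugate())
XI3 = (1, EPS.conjugate(), EPS)

for name, xi in (("xi1", XI1), ("xi2", XI2), ("xi3", XI3)):
    print(name, c3_eigenvalue(xi))
```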

Thus, we find that regarding SALCs ξ2 and ξ3, eigenvalues of C3 are ε and ε*, respectively. These pieces of information imply that if we appropriately choose proper functions for basis vectors, a character of a symmetry operation for a one-dimensional representation is identical to an eigenvalue of the said symmetry operation (see Table 19.6).

Regarding the last parts of the calculations, we follow the procedures described in the case of ethylene. Using the above functions ξ1, ξ2, and ξ3, we construct the secular equation such that

$$\begin{vmatrix} \tilde{H}_{11} - \lambda\tilde{S}_{11} & & \\ & \tilde{H}_{22} - \lambda\tilde{S}_{22} & \\ & & \tilde{H}_{33} - \lambda\tilde{S}_{33} \end{vmatrix} = 0. \tag{19.130}$$

Since we have obtained three SALCs that are assigned to individual irreducible representations A, E(1), and E(2), these SALCs span the representation space V3. This makes off-diagonal elements of the secular equation vanish and it is simplified as in (19.130). Here, we have

$$\tilde{H}_{11} = \int \xi_1^{*} H \xi_1 \, d\tau = \int (\phi_1 + \phi_2 + \phi_3)^{*} H (\phi_1 + \phi_2 + \phi_3) \, d\tau = 3(\alpha + 2\beta), \tag{19.131}$$

where we used the same α and β as defined in (19.94). Strictly speaking, α and β appearing in (19.131) should be slightly different from those of (19.94), because a Hamiltonian is different. This approximation, however, would be enough for the present studies. In a similar manner, we get

$$\tilde{H}_{22} = \int \xi_2^{*} H \xi_2 \, d\tau = 3(\alpha - \beta) \quad \text{and} \quad \tilde{H}_{33} = \int \xi_3^{*} H \xi_3 \, d\tau = 3(\alpha - \beta). \tag{19.132}$$

Meanwhile, we have

$$\tilde{S}_{11} = \langle \xi_1 | \xi_1 \rangle = \int (\phi_1 + \phi_2 + \phi_3)^{*} (\phi_1 + \phi_2 + \phi_3) \, d\tau = \int (\phi_1 + \phi_2 + \phi_3)^2 \, d\tau = 3(1 + 2S). \tag{19.133}$$

Similarly, we get

$$\tilde{S}_{22} = \tilde{S}_{33} = 3(1 - S). \tag{19.134}$$

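Readers are urged to verify (19.133) and (19.134). Substituting (19.131) through (19.134) for (19.130), we get as the energy eigenvalues

$$\lambda_1 = \frac{\alpha + 2\beta}{1 + 2S}, \quad \lambda_2 = \frac{\alpha - \beta}{1 - S}, \quad \text{and} \quad \lambda_3 = \frac{\alpha - \beta}{1 - S}. \tag{19.135}$$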
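The matrix elements (19.131) through (19.134) can be verified by evaluating the quadratic forms explicitly. In the sketch below every pair of distinct orbitals of the three-membered ring is treated as adjacent, so that the Hamiltonian matrix has α on the diagonal and β elsewhere, and the overlap matrix has 1 and S. The numerical values of α, β, and S are hypothetical, and the function names are illustrative.

```python
import cmath

def ring3_matrix(diagonal, off_diagonal):
    """3 x 3 matrix for a three-membered ring, where every pair of sites is adjacent."""
    return [[diagonal if j == k else off_diagonal for k in range(3)]
            for j in range(3)]

def expectation(coeffs, matrix):
    """Return sum_jk conj(c_j) M_jk c_k for the SALC sum_j c_j phi_j."""
    return sum(complex(cj).conjugate() * matrix[j][k] * ck
               for j, cj in enumerate(coeffs)
               for k, ck in enumerate(coeffs))

EPS = cmath.exp(2j * cmath.pi / 3)
SALCS = {
    "xi1": (1, 1, 1),
    "xi2": (1, EPS, EPS.conjugate()),
    "xi3": (1, EPS.conjugate(), EPS),
}

# Hypothetical parameter values (arbitrary energy units), for illustration only.
ALPHA, BETA, S = -6.0, -3.0, 0.25
H = ring3_matrix(ALPHA, BETA)
OVERLAP = ring3_matrix(1.0, S)

def level(name):
    """Energy <xi|H|xi> / <xi|xi> of the named SALC."""
    c = SALCS[name]
    return (expectation(c, H) / expectation(c, OVERLAP)).real

for name in SALCS:
    print(name, round(level(name), 6))
```

The two complex SALCs give the same value, which is the degeneracy expressed by λ2 = λ3 in (19.135).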

Notice that two MOs belonging to E″ have the same energy. These MOs are said to be energetically degenerate. This situation is characteristic of a two-dimensional representation. Actually, even though the group C3 has only one-dimensional representations (because it is an Abelian group), the two complex conjugate representations labelled E behave as if they were a two-dimensional representation [2]. We will again encounter the same situation in the next example, benzene.

From (19.126), we get

$$\|\xi_1\| = \sqrt{\langle \xi_1 | \xi_1 \rangle} = \sqrt{3(1 + 2S)}. \tag{19.136}$$

Thus, as one of MOs corresponding to an energy eigenvalue λ1, i.e., Ψ1, we get

$$\Psi_1 = \frac{|\xi_1\rangle}{\|\xi_1\|} = \frac{\phi_1 + \phi_2 + \phi_3}{\sqrt{3(1 + 2S)}}. \tag{19.137}$$

Also, we have

$$\|\xi_2\| = \sqrt{\langle \xi_2 | \xi_2 \rangle} = \sqrt{3(1 - S)} \quad \text{and} \quad \|\xi_3\| = \sqrt{\langle \xi_3 | \xi_3 \rangle} = \sqrt{3(1 - S)}. \tag{19.138}$$

Thus, for another MO corresponding to an energy eigenvalue λ2, we get

$$\Psi_2 = \frac{|\xi_2\rangle}{\|\xi_2\|} = \frac{\phi_1 + \varepsilon\phi_2 + \varepsilon^{*}\phi_3}{\sqrt{3(1 - S)}}. \tag{19.139}$$

Also, with a MO corresponding to λ3 (= λ2), we have

$$\Psi_3 = \frac{|\xi_3\rangle}{\|\xi_3\|} = \frac{\phi_1 + \varepsilon^{*}\phi_2 + \varepsilon\phi_3}{\sqrt{3(1 - S)}}. \tag{19.140}$$

Equations (19.139) and (19.140) include complex numbers, which is inconvenient for computational analysis. In that case, we can convert them to real functions. In Part III, we examined properties of unitary transformations. Since a unitary transformation keeps the norm of a vector unchanged, it is suited to our present purpose. The conversion can be done using the following unitary matrix U:

$$U = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & -\dfrac{i}{\sqrt{2}} \\ \dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \end{pmatrix}. \tag{19.141}$$

Then we have

$$(\Psi_2 \ \Psi_3)\, U = \left( \frac{1}{\sqrt{2}}(\Psi_2 + \Psi_3) \quad \frac{i}{\sqrt{2}}(-\Psi_2 + \Psi_3) \right). \tag{19.142}$$

Thus, defining $\tilde{\Psi}_2$ and $\tilde{\Psi}_3$ as

$$\tilde{\Psi}_2 = \frac{1}{\sqrt{2}}(\Psi_2 + \Psi_3) \quad \text{and} \quad \tilde{\Psi}_3 = \frac{i}{\sqrt{2}}(-\Psi_2 + \Psi_3), \tag{19.143}$$

we get

$$\tilde{\Psi}_2 = \frac{1}{\sqrt{6(1 - S)}}\left[2\phi_1 + (\varepsilon + \varepsilon^{*})\phi_2 + (\varepsilon^{*} + \varepsilon)\phi_3\right] = \frac{1}{\sqrt{6(1 - S)}}(2\phi_1 - \phi_2 - \phi_3). \tag{19.144}$$

Also, we have

$$\tilde{\Psi}_3 = \frac{i}{\sqrt{6(1 - S)}}\left[(\varepsilon^{*} - \varepsilon)\phi_2 + (\varepsilon - \varepsilon^{*})\phi_3\right] = \frac{i}{\sqrt{6(1 - S)}}\left[-i\sqrt{3}\,\phi_2 + i\sqrt{3}\,\phi_3\right] = \frac{1}{\sqrt{2(1 - S)}}(\phi_2 - \phi_3). \tag{19.145}$$
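The conversion to real functions can be reproduced numerically by multiplying the pair of coefficient vectors by U. In the sketch below the common normalization factor 1/√(3(1 − S)) is omitted, since it does not affect whether the resulting coefficients are real, and the norm check refers to the coefficient vectors only (overlap between atomic orbitals is not included). The function names are illustrative.

```python
import cmath
import math

EPS = cmath.exp(2j * cmath.pi / 3)

# Coefficients of (phi_1, phi_2, phi_3) in xi_2 and xi_3 of (19.126).
XI2 = [1, EPS, EPS.conjugate()]
XI3 = [1, EPS.conjugate(), EPS]

R = 1 / math.sqrt(2)
U = [[R, -1j * R],
     [R,  1j * R]]  # the unitary matrix of (19.141)

def transform(v2, v3, u):
    """Return the two columns of (v2 v3) U as lists of coefficients."""
    first = [a * u[0][0] + b * u[1][0] for a, b in zip(v2, v3)]
    second = [a * u[0][1] + b * u[1][1] for a, b in zip(v2, v3)]
    return first, second

def squared_norm(v):
    """Sum of |c_j|^2 over the coefficients."""
    return sum(abs(c) ** 2 for c in v)

NEW2, NEW3 = transform(XI2, XI3, U)

# Real parts are proportional to (2, -1, -1) and (0, 1, -1); imaginary parts vanish.
print([round(c.real, 6) for c in NEW2], max(abs(c.imag) for c in NEW2))
print([round(c.real, 6) for c in NEW3], max(abs(c.imag) for c in NEW3))
```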


Thus, we have successfully converted complex functions to real functions. In the above unitary transformation, notice that a norm of the vectors remains unchanged before and after the unitary transformation.

As cyclopropenyl radical has three π electrons, two occupy the lowest energy level of A2″. Another electron occupies a level E″. Since this level possesses an energy higher than α, the electron occupying this level is anticipated to be unstable. Under such a circumstance, a molecule tends to lose the said electron so as to be a cation.

Following the argument given in the previous case of ethylene, it is easy to make sure that the allowed transition of the cyclopropenyl radical takes place when the light is polarized parallel to the molecular plane (i.e., the xy-plane in Fig. 19.8). The proof is left for readers as an exercise. This polarizing feature is typical of planar molecules with high molecular symmetry.

19.4.3 Benzene

The structural formula of benzene is shown in Fig. 19.9. It is a planar molecule and six carbon atoms form a regular hexagon. Hence, the molecule belongs to D6h symmetry. In the molecule six π-electrons extend vertically to the molecular plane toward upper and lower directions as in the case of ethylene and cyclopropenyl radical. This is a standard illustration of quantum chemistry and is dealt with in many textbooks. The problem can be treated in the same way as ethylene and cyclopropenyl radical.

As before, six equivalent pz atomic orbitals of carbon are denoted by ϕ1 to ϕ6 in Fig. 19.10. These vectors or their linear combinations span a six-dimensional representation space. We construct basis vectors using these vectors. Following the previous procedures, we construct proper SALC orbitals along with MOs. As before, a subgroup C6 of D6h plays an essential role. This subgroup contains six group elements such that

$$C_6 = \{E, C_6, C_3, C_2, C_3^2, C_6^5\}. \tag{19.146}$$

Taking C6(z) as an example, we have

$$(\phi_1 \ \phi_2 \ \phi_3 \ \phi_4 \ \phi_5 \ \phi_6)\, C_6(z) = (\phi_6 \ \phi_1 \ \phi_2 \ \phi_3 \ \phi_4 \ \phi_5). \tag{19.147}$$

Using a matrix representation, we have

$$C_6(z) = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}. \tag{19.148}$$

Fig. 19.9 Structural formula of benzene. It belongs to D6h symmetry

Fig. 19.10 Six equivalent pz atomic orbitals of carbon of benzene

Once again, we can determine the trace for individual symmetry transformations belonging to D6h. We collect the results in Table 19.7. The representation is reducible and is reduced using a character table of D6h (Table 19.8). As a result, we get

$$\Gamma = A_{2u} + B_{2g} + E_{1g} + E_{2u}. \tag{19.149}$$
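For a group as large as D6h it is convenient to let a short program do the bookkeeping. The sketch below builds the character table of D6h from that of D6 by using the direct-product structure D6h = D6 × Ci, and then applies the multiplicity formula to Γ. The class ordering, the signs entered for Γ (pz orbitals are inverted by σh and by the C2′ axes, which are assumed to pass through atoms), and the function name are assumptions made for this illustration.

```python
from fractions import Fraction

# Character table of D6; classes E, 2C6, 2C3, C2, 3C2', 3C2''.
D6 = {
    "A1": [1,  1,  1,  1,  1,  1],
    "A2": [1,  1,  1,  1, -1, -1],
    "B1": [1, -1,  1, -1,  1, -1],
    "B2": [1, -1,  1, -1, -1,  1],
    "E1": [2,  1, -1, -2,  0,  0],
    "E2": [2, -1, -1,  2,  0,  0],
}
D6_SIZES = [1, 2, 2, 1, 3, 3]

# D6h = D6 x Ci: for the classes i, 2S3, 2S6, sigma_h, 3sigma_d, 3sigma_v
# the g-type representations repeat the D6 characters and the u-type
# representations reverse their sign.
D6H = {}
for label, row in D6.items():
    D6H[label + "g"] = row + row
    D6H[label + "u"] = row + [-x for x in row]
SIZES = D6_SIZES + D6_SIZES

# Characters of the representation spanned by the six 2pz orbitals, in the order
# E, 2C6, 2C3, C2, 3C2', 3C2'', i, 2S3, 2S6, sigma_h, 3sigma_d, 3sigma_v.
GAMMA = [6, 0, 0, 0, -2, 0, 0, 0, 0, -6, 0, 2]

def reduce_representation(chars):
    """Multiplicity of every irreducible representation of D6h in `chars`."""
    order = sum(SIZES)  # n = 24
    result = {}
    for label, row in D6H.items():
        q = Fraction(sum(s * r * c for s, r, c in zip(SIZES, row, chars)), order)
        if q:
            result[label] = q
    return result

print(reduce_representation(GAMMA))
```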

As a subduced representation of D of D6h to C6, we have

$$A_{2u} \downarrow C_6 = A, \quad B_{2g} \downarrow C_6 = B, \quad E_{1g} \downarrow C_6 = 2E_1, \quad E_{2u} \downarrow C_6 = 2E_2. \tag{19.150}$$

Here, we used Table 19.9 that shows a character table of irreducible representations of C6.

Table 19.7 Characters for individual symmetry transformations of 2pz orbitals in benzene

D6h      E   2C6   2C3    C2  3C2′  3C2″     i   2S3   2S6    σh   3σd   3σv
Γ        6     0     0     0    -2     0     0     0     0    -6     0     2

Table 19.8 Character table of D6h

D6h      E   2C6   2C3    C2  3C2′  3C2″     i   2S3   2S6    σh   3σd   3σv
A1g      1     1     1     1     1     1     1     1     1     1     1     1    x² + y², z²
A2g      1     1     1     1    -1    -1     1     1     1     1    -1    -1
B1g      1    -1     1    -1     1    -1     1    -1     1    -1     1    -1
B2g      1    -1     1    -1    -1     1     1    -1     1    -1    -1     1
E1g      2     1    -1    -2     0     0     2     1    -1    -2     0     0    (yz, zx)
E2g      2    -1    -1     2     0     0     2    -1    -1     2     0     0    (x² − y², xy)
A1u      1     1     1     1     1     1    -1    -1    -1    -1    -1    -1
A2u      1     1     1     1    -1    -1    -1    -1    -1    -1     1     1    z
B1u      1    -1     1    -1     1    -1    -1     1    -1     1    -1     1
B2u      1    -1     1    -1    -1     1    -1     1    -1     1     1    -1
E1u      2     1    -1    -2     0     0    -2    -1     1     2     0     0    (x, y)
E2u      2    -1    -1     2     0     0    -2     1     1    -2     0     0

Following the previous procedures, as SALCs we have

$$\begin{aligned} 6P^{(A)}\phi_1 &\equiv \xi_1 = \phi_1 + \phi_2 + \phi_3 + \phi_4 + \phi_5 + \phi_6, \\ 6P^{(B)}\phi_1 &\equiv \xi_2 = \phi_1 - \phi_2 + \phi_3 - \phi_4 + \phi_5 - \phi_6, \\ 6P^{[E_1^{(1)}]}\phi_1 &\equiv \xi_3 = \phi_1 + \varepsilon\phi_2 - \varepsilon^{*}\phi_3 - \phi_4 - \varepsilon\phi_5 + \varepsilon^{*}\phi_6, \\ 6P^{[E_1^{(2)}]}\phi_1 &\equiv \xi_4 = \phi_1 + \varepsilon^{*}\phi_2 - \varepsilon\phi_3 - \phi_4 - \varepsilon^{*}\phi_5 + \varepsilon\phi_6, \\ 6P^{[E_2^{(1)}]}\phi_1 &\equiv \xi_5 = \phi_1 - \varepsilon^{*}\phi_2 - \varepsilon\phi_3 + \phi_4 - \varepsilon^{*}\phi_5 - \varepsilon\phi_6, \\ 6P^{[E_2^{(2)}]}\phi_1 &\equiv \xi_6 = \phi_1 - \varepsilon\phi_2 - \varepsilon^{*}\phi_3 + \phi_4 - \varepsilon\phi_5 - \varepsilon^{*}\phi_6, \end{aligned} \tag{19.151}$$

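where ε = exp(iπ/3). Correspondingly, we have a diagonal secular equation of a sixth-order such that

$$\begin{vmatrix} \tilde{H}_{11} - \lambda\tilde{S}_{11} & & & & & \\ & \tilde{H}_{22} - \lambda\tilde{S}_{22} & & & & \\ & & \tilde{H}_{33} - \lambda\tilde{S}_{33} & & & \\ & & & \tilde{H}_{44} - \lambda\tilde{S}_{44} & & \\ & & & & \tilde{H}_{55} - \lambda\tilde{S}_{55} & \\ & & & & & \tilde{H}_{66} - \lambda\tilde{S}_{66} \end{vmatrix} = 0. \tag{19.152}$$

Table 19.9 Character table of C6

C6       E    C6    C3    C2   C3²   C6⁵
A        1     1     1     1     1     1    z; x² + y², z²
B        1    -1     1    -1     1    -1
E1       1     ε   -ε*    -1    -ε    ε*    (x, y); (yz, zx)
         1    ε*    -ε    -1   -ε*     ε
E2       1   -ε*    -ε     1   -ε*    -ε    (x² − y², xy)
         1    -ε   -ε*     1    -ε   -ε*

ε = exp(iπ/3)

Here, we have, for example,

$$\tilde{H}_{11} = \int \xi_1^{*} H \xi_1 \, d\tau = \langle \xi_1 | H \xi_1 \rangle = 6(\alpha + 2\beta + 2\beta' + \beta''). \tag{19.153}$$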

In (19.153), we used the same α and β defined in (19.94) as in the case of cyclopropenyl radical. That is, α is a Coulomb integral and β is a resonance integral between two adjacent 2pz orbitals of carbon. Meanwhile, β′ is a resonance integral between orbitals at “meta” positions such as ϕ1 and ϕ3. The quantity β″ is a resonance integral between orbitals at “para” positions such as ϕ1 and ϕ4. It is unusual to include resonance integrals such as β′ and β″ at a simple π-electron approximation level; they are normally ignored for the practical purpose of simplifying the calculations. However, we have no reason to exclude them. Rather, the use of appropriate SALCs makes it feasible to include β′ and β″. In a similar manner, we get

$$\begin{aligned} \tilde{H}_{22} &= \int \xi_2^{*} H \xi_2 \, d\tau = \langle \xi_2 | H \xi_2 \rangle = 6(\alpha - 2\beta + 2\beta' - \beta''), \\ \tilde{H}_{33} &= \tilde{H}_{44} = 6(\alpha + \beta - \beta' - \beta''), \\ \tilde{H}_{55} &= \tilde{H}_{66} = 6(\alpha - \beta - \beta' + \beta''). \end{aligned} \tag{19.154}$$

Meanwhile, we have

$$\begin{aligned} \tilde{S}_{11} &= \langle \xi_1 | \xi_1 \rangle = 6(1 + 2S + 2S' + S''), \\ \tilde{S}_{22} &= \langle \xi_2 | \xi_2 \rangle = 6(1 - 2S + 2S' - S''), \\ \tilde{S}_{33} &= \tilde{S}_{44} = 6(1 + S - S' - S''), \\ \tilde{S}_{55} &= \tilde{S}_{66} = 6(1 - S - S' + S''), \end{aligned} \tag{19.155}$$

where S, S′, and S″ are overlap integrals between the ortho, meta, and para positions, respectively. Substituting (19.153) through (19.155) for (19.152), the energy eigenvalues are readily obtained as

$$\begin{aligned} \lambda_1 &= \frac{\alpha + 2\beta + 2\beta' + \beta''}{1 + 2S + 2S' + S''}, & \lambda_2 &= \frac{\alpha - 2\beta + 2\beta' - \beta''}{1 - 2S + 2S' - S''}, \\ \lambda_3 = \lambda_4 &= \frac{\alpha + \beta - \beta' - \beta''}{1 + S - S' - S''}, & \lambda_5 = \lambda_6 &= \frac{\alpha - \beta - \beta' + \beta''}{1 - S - S' + S''}. \end{aligned} \tag{19.156}$$

Notice that the MOs ξ3 and ξ4 are degenerate, as are ξ5 and ξ6; that is, the levels λ3 = λ4 and λ5 = λ6 are each doubly degenerate. From (19.151), we get for instance

$$\|\xi_1\| = \sqrt{\langle \xi_1 | \xi_1 \rangle} = \sqrt{6(1 + 2S + 2S' + S'')}. \tag{19.157}$$

Thus, for one of normalized MOs corresponding to an energy eigenvalue λ1, we get

$$\Psi_1 = \frac{|\xi_1\rangle}{\|\xi_1\|} = \frac{\phi_1 + \phi_2 + \phi_3 + \phi_4 + \phi_5 + \phi_6}{\sqrt{6(1 + 2S + 2S' + S'')}}. \tag{19.158}$$

Following the previous examples, we have other normalized MOs. That is, we have

$$\begin{aligned} \Psi_2 &= \frac{|\xi_2\rangle}{\|\xi_2\|} = \frac{\phi_1 - \phi_2 + \phi_3 - \phi_4 + \phi_5 - \phi_6}{\sqrt{6(1 - 2S + 2S' - S'')}}, \\ \Psi_3 &= \frac{|\xi_3\rangle}{\|\xi_3\|} = \frac{\phi_1 + \varepsilon\phi_2 - \varepsilon^{*}\phi_3 - \phi_4 - \varepsilon\phi_5 + \varepsilon^{*}\phi_6}{\sqrt{6(1 + S - S' - S'')}}, \\ \Psi_4 &= \frac{|\xi_4\rangle}{\|\xi_4\|} = \frac{\phi_1 + \varepsilon^{*}\phi_2 - \varepsilon\phi_3 - \phi_4 - \varepsilon^{*}\phi_5 + \varepsilon\phi_6}{\sqrt{6(1 + S - S' - S'')}}, \\ \Psi_5 &= \frac{|\xi_5\rangle}{\|\xi_5\|} = \frac{\phi_1 - \varepsilon^{*}\phi_2 - \varepsilon\phi_3 + \phi_4 - \varepsilon^{*}\phi_5 - \varepsilon\phi_6}{\sqrt{6(1 - S - S' + S'')}}, \\ \Psi_6 &= \frac{|\xi_6\rangle}{\|\xi_6\|} = \frac{\phi_1 - \varepsilon\phi_2 - \varepsilon^{*}\phi_3 + \phi_4 - \varepsilon\phi_5 - \varepsilon^{*}\phi_6}{\sqrt{6(1 - S - S' + S'')}}. \end{aligned} \tag{19.159}$$

The eigenfunction Ψi (1 ≤ i ≤ 6) corresponds to the eigenvalue λi. As in the case of cyclopropenyl radical, Ψ3 and Ψ4 can be transformed to $\tilde{\Psi}_3$ and $\tilde{\Psi}_4$, respectively, through a unitary matrix of (19.141). Thus, we get

$$\begin{aligned} \tilde{\Psi}_3 &= \frac{2\phi_1 + \phi_2 - \phi_3 - 2\phi_4 - \phi_5 + \phi_6}{\sqrt{12(1 + S - S' - S'')}}, \\ \tilde{\Psi}_4 &= \frac{\phi_2 + \phi_3 - \phi_5 - \phi_6}{2\sqrt{1 + S - S' - S''}}. \end{aligned} \tag{19.160}$$

Similarly, transforming Ψ5 and Ψ6 to $\tilde{\Psi}_5$ and $\tilde{\Psi}_6$, respectively, we have

$$\begin{aligned} \tilde{\Psi}_5 &= \frac{2\phi_1 - \phi_2 - \phi_3 + 2\phi_4 - \phi_5 - \phi_6}{\sqrt{12(1 - S - S' + S'')}}, \\ \tilde{\Psi}_6 &= \frac{\phi_2 - \phi_3 + \phi_5 - \phi_6}{2\sqrt{1 - S - S' + S''}}. \end{aligned} \tag{19.161}$$

Fig. 19.11 Energy diagram and MO assignments of benzene. Energy eigenvalues λ1 to λ6 are given in (19.156)

Figure 19.11 shows an energy diagram and MO assignments of benzene. A major optical transition takes place between the HOMO (E1g) and LUMO (E2u) levels. In the case of optical absorption, an initial electronic configuration is assigned to the totally symmetric representation A1g and the symmetry of the final electronic configuration is described as

$$\Gamma = E_{1g} \otimes E_{2u}.$$

Therefore, a transition matrix element is expressed as

$$\left\langle \Phi^{(E_{1g} \otimes E_{2u})} \middle| \boldsymbol{\varepsilon}_e \cdot \boldsymbol{P} \middle| \Phi^{(A_{1g})} \right\rangle, \tag{19.162}$$

where $\Phi^{(A_{1g})}$ stands for the totally symmetric ground-state electronic configuration; $\Phi^{(E_{1g} \otimes E_{2u})}$ denotes an excited-state electronic configuration represented by a direct-product representation. This representation is reducible and expressed as a direct sum of irreducible representations such that

$$\Gamma = B_{1u} + B_{2u} + E_{1u}. \tag{19.163}$$
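The decomposition of the direct product and the resulting polarization rule can be checked in a few lines. As a sketch under stated assumptions, the character table of D6h is again generated from that of D6 through D6h = D6 × Ci, and x, y are taken to transform as E1u and z as A2u; the function names are illustrative.

```python
from fractions import Fraction

# Character table of D6; classes E, 2C6, 2C3, C2, 3C2', 3C2''.
D6 = {
    "A1": [1,  1,  1,  1,  1,  1],
    "A2": [1,  1,  1,  1, -1, -1],
    "B1": [1, -1,  1, -1,  1, -1],
    "B2": [1, -1,  1, -1, -1,  1],
    "E1": [2,  1, -1, -2,  0,  0],
    "E2": [2, -1, -1,  2,  0,  0],
}

# D6h = D6 x Ci (classes i, 2S3, 2S6, sigma_h, 3sigma_d, 3sigma_v appended).
D6H = {}
for label, row in D6.items():
    D6H[label + "g"] = row + row
    D6H[label + "u"] = row + [-x for x in row]
SIZES = [1, 2, 2, 1, 3, 3] * 2

def direct_product(a, b):
    """Characters of the direct-product representation of irreps a and b."""
    return [x * y for x, y in zip(D6H[a], D6H[b])]

def decompose(chars):
    """Multiplicities of the irreducible representations of D6h in `chars`."""
    order = sum(SIZES)  # n = 24
    result = {}
    for label, row in D6H.items():
        q = Fraction(sum(s * r * c for s, r, c in zip(SIZES, row, chars)), order)
        if q:
            result[label] = q
    return result

FINAL = decompose(direct_product("E1g", "E2u"))
print(FINAL)
print("in-plane (x, y) allowed:", "E1u" in FINAL)
print("out-of-plane (z) allowed:", "A2u" in FINAL)
```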

Notice that, unlike the case of ethylene, the direct-product representation associated with the final state is reducible. To examine whether (19.162) is nonvanishing, as in the case of (19.109) we examine whether the direct-product representation A1g ⊗ E1g ⊗ E2u = E1g ⊗ E2u = B1u + B2u + E1u contains an irreducible representation to which εe · P belongs. Consulting a character table of D6h, we find that x and y belong to the irreducible representation E1u and that z belongs to A2u. Since the direct sum Γ in (19.163) contains E1u, benzene is expected to be polarized along both the x- and y-axes (see Fig. 19.10). Since (19.163) does not contain A2u, the transition along the z-axis is forbidden. Accordingly, the transition takes place when the light is polarized parallel to the molecular plane (i.e., the xy-plane). This is a common feature among planar aromatic molecules including benzene and cyclopropenyl radical.

19.4.4 Allyl Radical [1]

We revisit the allyl radical and perform its MO calculations. As already noted, Tables 18.1 and 18.2 of Example 18.1 collected representation matrices of individual symmetry operations in reference to the basis vectors comprising three atomic orbitals of allyl radical. As usual, we examine traces (or characters) of those matrices. Table 19.10 collects them. The representation is readily reduced according to the character table of C2v (see Table 18.4) so that we have

$$\Gamma = A_2 + 2B_1. \tag{19.164}$$

We have two SALC orbitals that belong to the same irreducible representation of B1. As noted in Sect. 19.3, the orbitals obtained by a linear combination of these two SALCs belong to B1 as well. Such a linear combination is given by a unitary transformation. In the present case, it is convenient to transform the basis vectors two times. The first transformation is carried out to get SALCs and the second one will be done in the process of solving a secular equation using the SALCs. Schematically showing the procedures, we have Table 19.10 Characters for individual symmetry transformations of 2pz orbitals in allyl radical

C2v Γ

E 3

C2(z) 1

σ v(zx) 1

σ 0v ðyzÞ 3

19.4

MO Calculations Based on π-Electron Approximation

771

8 8 8 > > > < ϕ1 < Ψ1 < Φ1 ϕ2 ! Ψ 2 ! Φ2 , > > > : : : ϕ3 Ψ3 Φ3 where ϕ1, ϕ2, ϕ3 show the original atomic orbitals; Ψ 1, Ψ 2, Ψ 3 the SALCs; Φ1, Φ2, Φ3 the final MOs. Thus, the three sets of vectors are connected through unitary transformations. Starting with ϕ1 and following the previous cases, we have, e.g., PðB1 Þ ϕ1 ¼

h i 1 X ðB1 Þ χ ð g Þ gϕ1 ¼ ϕ1 : g 4

Also starting with ϕ2, we have PðB1 Þ ϕ2 ¼

h i 1 X ðB1 Þ 1 χ ðgÞ gϕ2 ¼ ðϕ2 þ ϕ3 Þ: g 4 2

Meanwhile, we get PðA2 Þ ϕ1 ¼ 0, h i 1 X ðA2 Þ 1 PðA2 Þ ϕ2 ¼ χ ð g Þ gϕ2 ¼ ðϕ2 ϕ3 Þ: g 4 2 Thus, we recovered the results of Example 18.1. Notice that ϕ1 does not participate in A2, but take part in B1 by itself. Normalized SALCs are given as follows: Ψ 1 ¼ ϕ1 , Ψ 2 ¼ ðϕ2 þ ϕ3 Þ=

qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2ð1 þ S0 Þ, Ψ 3 ¼ ðϕ2 ϕ3 Þ= 2ð1 S0 Þ,

where we define $S' \equiv \int \phi_2^* \phi_3 \, d\tau$. If $\Psi_1$, $\Psi_2$, and $\Psi_3$ belonged to different irreducible representations (as in the cases of the previous three examples of ethylene, cyclopropenyl radical, and benzene), the secular equation would be fully reduced to a form of (19.66). In the present case, however, $\Psi_1$ and $\Psi_2$ belong to the same irreducible representation $B_1$, making the situation a bit complicated. Nonetheless, we can use (19.61) and the secular equation is "partially" reduced. Defining

$$H_{jk} = \int \Psi_j^* \mathfrak{H} \Psi_k \, d\tau \quad \text{and} \quad S_{jk} = \int \Psi_j^* \Psi_k \, d\tau,$$

we have the following secular equation of the same form as (19.61):

$$\det\left( H_{jk} - \lambda S_{jk} \right) = 0.$$

More specifically, we have

$$\begin{vmatrix} \alpha - \lambda & \sqrt{\dfrac{2}{1 + S'}}\,(\beta - S\lambda) & 0 \\ \sqrt{\dfrac{2}{1 + S'}}\,(\beta - S\lambda) & \dfrac{\alpha + \beta'}{1 + S'} - \lambda & 0 \\ 0 & 0 & \dfrac{\alpha - \beta'}{1 - S'} - \lambda \end{vmatrix} = 0,$$

where α, β, and S are similarly defined as (19.94) and (19.98). The quantity β′ is defined as

$$\beta' \equiv \int \phi_2^* \mathfrak{H} \phi_3 \, d\tau.$$

Thus, the secular equation is separated into the following two:

$$\begin{vmatrix} \alpha - \lambda & \sqrt{\dfrac{2}{1 + S'}}\,(\beta - S\lambda) \\ \sqrt{\dfrac{2}{1 + S'}}\,(\beta - S\lambda) & \dfrac{\alpha + \beta'}{1 + S'} - \lambda \end{vmatrix} = 0 \quad \text{and} \quad \frac{\alpha - \beta'}{1 - S'} - \lambda = 0. \tag{19.165}$$

The second equation immediately gives

$$\lambda = \frac{\alpha - \beta'}{1 - S'}.$$

The first equation of (19.165) is somewhat complicated, and so we adopt the next approximation. That is,

$$S' = \beta' = 0. \tag{19.166}$$

This approximation is justified, because the two carbon atoms C2 and C3 are rather remote from each other, and so the interaction between them is likely to be weak. Thus, we rewrite the first equation of (19.165) as

$$\begin{vmatrix} \alpha - \lambda & \sqrt{2}(\beta - S\lambda) \\ \sqrt{2}(\beta - S\lambda) & \alpha - \lambda \end{vmatrix} = 0. \tag{19.167}$$

Moreover, since S is a small quantity compared to 1, we assume that $S^2 \ll 1$. Hence, we ignore $S^2$. Using the approximation of (19.166) and assuming $S^2 \approx 0$, from (19.167) we have the following quadratic equation:

$$\lambda^2 - 2(\alpha - 2\beta S)\lambda + \alpha^2 - 2\beta^2 = 0.$$

Solving this equation, we have

$$\lambda = \alpha - 2\beta S \pm \sqrt{2}\beta \sqrt{1 - \frac{2\alpha S}{\beta}} \approx \alpha - 2\beta S \pm \sqrt{2}\beta \left( 1 - \frac{\alpha S}{\beta} \right),$$

where the last approximation is based on

$$\sqrt{1 - x} \approx 1 - \frac{1}{2}x$$

with a small quantity x. Thus, we get

$$\lambda_L \approx \left( \alpha + \sqrt{2}\beta \right)\left( 1 - \sqrt{2}S \right) \quad \text{and} \quad \lambda_H \approx \left( \alpha - \sqrt{2}\beta \right)\left( 1 + \sqrt{2}S \right), \tag{19.168}$$

where $\lambda_L < \lambda_H$. To determine the corresponding eigenfunctions Φ (i.e., MOs), we use a linear combination of the two SALCs $\Psi_1$ and $\Psi_2$. That is, putting

$$\Phi = c_1 \Psi_1 + c_2 \Psi_2 \tag{19.169}$$

and from (19.167), we obtain

$$(\alpha - \lambda)c_1 + \sqrt{2}(\beta - S\lambda)c_2 = 0.$$

Thus, with $\lambda = \lambda_L$ we get $c_1 = c_2$ and for $\lambda = \lambda_H$ we get $c_1 = -c_2$. Consequently, as a normalized eigenfunction $\Phi_L$ corresponding to $\lambda_L$ we get

$$\Phi_L = \frac{1}{\sqrt{2}}\left( 1 + \sqrt{2}S \right)^{-1/2}(\Psi_1 + \Psi_2) = \frac{1}{2}\left( 1 + \sqrt{2}S \right)^{-1/2}\left( \sqrt{2}\phi_1 + \phi_2 + \phi_3 \right). \tag{19.170}$$

Likewise, as a normalized eigenfunction $\Phi_H$ corresponding to $\lambda_H$ we get

$$\Phi_H = \frac{1}{\sqrt{2}}\left( 1 - \sqrt{2}S \right)^{-1/2}(\Psi_1 - \Psi_2) = \frac{1}{2}\left( 1 - \sqrt{2}S \right)^{-1/2}\left( \sqrt{2}\phi_1 - \phi_2 - \phi_3 \right). \tag{19.171}$$

As another eigenvalue $\lambda_0$ and corresponding eigenfunction $\Phi_0$, we have

$$\lambda_0 \approx \alpha \quad \text{and} \quad \Phi_0 = \frac{1}{\sqrt{2}}(\phi_2 - \phi_3), \tag{19.172}$$

where we have $\lambda_L < \lambda_0 < \lambda_H$. The eigenfunction $\Phi_0$ does not participate in chemical bonding and, hence, is said to be a nonbonding orbital. It is worth noting that the eigenfunctions $\Phi_L$, $\Phi_0$, and $\Phi_H$ have the same functional forms as those obtained by the simple Hückel theory that ignores the overlap integral S. This is because the interaction between $\phi_2$ and $\phi_3$, which construct the SALCs $\Psi_2$ and $\Psi_3$, is weak. Notice that within the framework of the simple Hückel theory, the two sets of basis vectors $|\phi_1\rangle, \frac{1}{\sqrt{2}}|\phi_2 + \phi_3\rangle, \frac{1}{\sqrt{2}}|\phi_2 - \phi_3\rangle$ and $|\Phi_L\rangle, |\Phi_H\rangle, |\Phi_0\rangle$ are connected by the following unitary matrix V:

$$\left( |\Phi_L\rangle \; |\Phi_H\rangle \; |\Phi_0\rangle \right) = \left( |\phi_1\rangle \;\; \tfrac{1}{\sqrt{2}}|\phi_2 + \phi_3\rangle \;\; \tfrac{1}{\sqrt{2}}|\phi_2 - \phi_3\rangle \right) V = \left( |\phi_1\rangle \;\; \tfrac{1}{\sqrt{2}}|\phi_2 + \phi_3\rangle \;\; \tfrac{1}{\sqrt{2}}|\phi_2 - \phi_3\rangle \right) \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{19.173}$$

Although both SALCs $\Psi_1$ and $\Psi_2$ belong to the same irreducible representation $B_1$, they are not orthogonal. As can be seen from (19.170) and (19.171), however, $\Phi_L$ and $\Phi_H$ obtained by solving the secular equation (19.167) are mutually orthogonal. Starting from $|\phi_1\rangle$, $|\phi_2\rangle$, and $|\phi_3\rangle$ of Example 18.1, we reached $|\Phi_L\rangle$, $|\Phi_H\rangle$, and $|\Phi_0\rangle$ via the two-step unitary transformations (18.40) and (19.173). The combined unitary transformation W = UV is unitary again. That is, we have

$$\left( |\phi_1\rangle \; |\phi_2\rangle \; |\phi_3\rangle \right) UV = \left( |\phi_1\rangle \; |\phi_2\rangle \; |\phi_3\rangle \right) \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ \frac{1}{2} & -\frac{1}{2} & \frac{1}{\sqrt{2}} \\ \frac{1}{2} & -\frac{1}{2} & -\frac{1}{\sqrt{2}} \end{pmatrix} = \left( |\Phi_L\rangle \; |\Phi_H\rangle \; |\Phi_0\rangle \right). \tag{19.174}$$
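The allyl results above can be cross-checked with a few lines of arithmetic. The sketch below reduces the characters of Table 19.10 in $C_{2v}$, verifies that the matrix W of (19.174) is orthogonal and diagonalizes the simple Hückel matrix, and compares the first-order eigenvalues (19.168) with the exact roots of (19.167). The numerical values of `ALPHA`, `BETA`, and `S` are illustrative assumptions in arbitrary units, not data from the text, and the helper names are hypothetical.

```python
import math

# C2v character table; class order: E, C2(z), sigma_v(zx), sigma_v'(yz)
C2V = {"A1": (1, 1, 1, 1), "A2": (1, 1, -1, -1),
       "B1": (1, -1, 1, -1), "B2": (1, -1, -1, 1)}
GAMMA = (3, -1, 1, -3)  # characters of Table 19.10

# q_alpha = (1/4) * sum_g chi_alpha(g) * chi(g); every class has one element
multiplicity = {name: sum(a * b for a, b in zip(chi, GAMMA)) // 4
                for name, chi in C2V.items()}

# Illustrative (assumed) parameters in arbitrary energy units
ALPHA, BETA, S = -6.0, -3.0, 0.05
R2 = math.sqrt(2.0)


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


# Simple Hueckel matrix in the basis (phi1, phi2, phi3); C2 and C3 do not interact
H = [[ALPHA, BETA, BETA],
     [BETA, ALPHA, 0.0],
     [BETA, 0.0, ALPHA]]

# Combined transformation W = UV of (19.174); columns are Phi_L, Phi_H, Phi_0
W = [[1 / R2, 1 / R2, 0.0],
     [0.5, -0.5, 1 / R2],
     [0.5, -0.5, -1 / R2]]

WtW = matmul(transpose(W), W)           # should be the identity matrix
D = matmul(matmul(transpose(W), H), W)  # should be diagonal

# Exact roots of (19.167) with S retained, and the first-order forms (19.168)
exact_L = (ALPHA + R2 * BETA) / (1 + R2 * S)
exact_H = (ALPHA - R2 * BETA) / (1 - R2 * S)
approx_L = (ALPHA + R2 * BETA) * (1 - R2 * S)
approx_H = (ALPHA - R2 * BETA) * (1 + R2 * S)

print(multiplicity)
print([round(D[i][i], 6) for i in range(3)])
print(exact_L, approx_L, exact_H, approx_H)
```

The diagonal of `D` gives $\alpha + \sqrt{2}\beta$, $\alpha - \sqrt{2}\beta$, and α, i.e., the S → 0 limits of $\lambda_L$, $\lambda_H$, and $\lambda_0$; the exact and approximate roots differ only at order $S^2$.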

The optical transition of allyl radical represents general features of the optical transitions of molecules. To keep the discussion simple, let us consider the case of allyl cation. Figure 19.12 shows electronic configurations together with symmetry species of individual eigenstates and their corresponding energy eigenvalues for the allyl cation. In Fig. 19.13, we redraw its geometry where the origin is located at the center of a line segment connecting C2 and C3 ($\mathbf{r}_2 + \mathbf{r}_3 = 0$).

Fig. 19.12 Electronic configurations and symmetry species of individual eigenstates along with their corresponding energy eigenvalues for the allyl cation. (a) Ground state. (b) First excited state. (c) Second excited state

Fig. 19.13 Geometry and position vectors of carbon atoms of the allyl cation. The origin is located at the center of a line segment connecting C2 and C3 ($\mathbf{r}_2 + \mathbf{r}_3 = 0$)

Major optical transitions (or optical absorption) are $\Phi_L \rightarrow \Phi_0$ and $\Phi_L \rightarrow \Phi_H$.

(i) $\Phi_L \rightarrow \Phi_0$: In this case, following (19.109) the transition matrix element $T_{fi}$ is described by

$$T_{fi} = \left\langle \Theta_{f(B_1 \otimes A_2)} \middle| \boldsymbol{\varepsilon}_e \cdot \mathbf{P} \middle| \Theta_{i(A_1)} \right\rangle. \tag{19.175}$$

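In the above equation, we designate the irreducible representations of the eigenstates according to (19.164). The symmetry of the final electronic configuration is described as

$$\Gamma = B_1 \otimes A_2 = B_2.$$

Accordingly, the direct product of the initial electronic configuration [$\Theta_{i(A_1)}$] and the final configuration [$\Theta_{f(B_1 \otimes A_2)}$] is $B_1 \otimes A_2 \otimes A_1 = B_2$. Then, if $\boldsymbol{\varepsilon}_e \cdot \mathbf{P}$ belongs to the same irreducible representation $B_2$, the associated optical transition should be allowed. Consulting a character table of $C_{2v}$, we find that y belongs to $B_2$. Thus, the allowed optical transition is polarized along the y-direction.

(ii) $\Phi_L \rightarrow \Phi_H$: In parallel with the above case, $T_{fi}$ is described by

$$T_{fi} = \left\langle \Theta_{f(B_1 \otimes B_1)} \middle| \boldsymbol{\varepsilon}_e \cdot \mathbf{P} \middle| \Theta_{i(A_1)} \right\rangle. \tag{19.176}$$

Thus, the transition is characterized by

$$A_1 \rightarrow A_1,$$

where the former $A_1$ indicates the electronic ground state and the latter $A_1$ indicates the excited state given by $B_1 \otimes B_1 = A_1$. The direct product of them is simply described as $B_1 \otimes B_1 \otimes A_1 = A_1$. Consulting a character table again, we find that z belongs to $A_1$. This implies that the allowed transition is polarized along the z-direction.

Next, we investigate the above optical transitions in a semiquantitative manner. In principle, the transition matrix element $T_{fi}$ should be estimated from (19.175) and (19.176), which use electronic configurations of a two-electron system. Nonetheless, the formulation of (4.7) of Sect. 4.1 based upon one-electron states serves our present purpose well. For the $\Phi_L \rightarrow \Phi_0$ transition, we have

$$\begin{aligned} T_{fi}(\Phi_L \rightarrow \Phi_0) &= \int \Phi_0^* \, \boldsymbol{\varepsilon}_e \cdot \mathbf{P} \, \Phi_L \, d\tau = e\boldsymbol{\varepsilon}_e \cdot \int \Phi_0^* \, \mathbf{r} \, \Phi_L \, d\tau \\ &= e\boldsymbol{\varepsilon}_e \cdot \int \frac{1}{2}\left( \sqrt{2}\phi_1 + \phi_2 + \phi_3 \right) \mathbf{r} \, \frac{1}{\sqrt{2}}(\phi_2 - \phi_3) \, d\tau \\ &= \frac{e\boldsymbol{\varepsilon}_e}{2\sqrt{2}} \cdot \int \left( \sqrt{2}\phi_1 \mathbf{r}\phi_2 - \sqrt{2}\phi_1 \mathbf{r}\phi_3 + \phi_2 \mathbf{r}\phi_2 - \phi_2 \mathbf{r}\phi_3 + \phi_3 \mathbf{r}\phi_2 - \phi_3 \mathbf{r}\phi_3 \right) d\tau \\ &\approx \frac{e\boldsymbol{\varepsilon}_e}{2\sqrt{2}} \cdot \int \left( \phi_2 \mathbf{r}\phi_2 - \phi_3 \mathbf{r}\phi_3 \right) d\tau \\ &\approx \frac{e\boldsymbol{\varepsilon}_e}{2\sqrt{2}} \cdot \left( \mathbf{r}_2 \int |\phi_2|^2 \, d\tau - \mathbf{r}_3 \int |\phi_3|^2 \, d\tau \right) \\ &= \frac{e\boldsymbol{\varepsilon}_e}{2\sqrt{2}} \cdot (\mathbf{r}_2 - \mathbf{r}_3) = \frac{e\boldsymbol{\varepsilon}_e}{\sqrt{2}} \cdot \mathbf{r}_2, \end{aligned}$$

where with the first near equality we ignored the integrals $\int \phi_i \mathbf{r} \phi_j \, d\tau$ ($i \neq j$); with the last near equality we put $\mathbf{r} \approx \mathbf{r}_2$ or $\mathbf{r}_3$. For these approximations, we assumed that the electron density is very high near C2 or C3, with negligible density at places remote from them. Choosing $\boldsymbol{\varepsilon}_e$ in the direction of $\mathbf{r}_2$, we get

$$T_{fi}(\Phi_L \rightarrow \Phi_0) \approx \frac{e}{\sqrt{2}} |\mathbf{r}_2|.$$

For the $\Phi_L \rightarrow \Phi_H$ transition, similarly we have

$$T_{fi}(\Phi_L \rightarrow \Phi_H) = \int \Phi_H^* \, \boldsymbol{\varepsilon}_e \cdot \mathbf{P} \, \Phi_L \, d\tau \approx \frac{e\boldsymbol{\varepsilon}_e}{4} \cdot (2\mathbf{r}_1 - \mathbf{r}_2 - \mathbf{r}_3) = \frac{e\boldsymbol{\varepsilon}_e}{2} \cdot \mathbf{r}_1.$$

Choosing $\boldsymbol{\varepsilon}_e$ in the direction of $\mathbf{r}_1$, we get

$$T_{fi}(\Phi_L \rightarrow \Phi_H) \approx \frac{e}{2} |\mathbf{r}_1|.$$

The transition probability is proportional to the square of the absolute value of $T_{fi}$. Using $|\mathbf{r}_2| \approx \sqrt{3}\,|\mathbf{r}_1|$, we have

$$\left| T_{fi}(\Phi_L \rightarrow \Phi_0) \right|^2 \approx \frac{e^2}{2} |\mathbf{r}_2|^2 = \frac{3e^2}{2} |\mathbf{r}_1|^2, \qquad \left| T_{fi}(\Phi_L \rightarrow \Phi_H) \right|^2 \approx \frac{e^2}{4} |\mathbf{r}_1|^2.$$

Thus, we obtain

$$\left| T_{fi}(\Phi_L \rightarrow \Phi_0) \right|^2 \approx 6 \left| T_{fi}(\Phi_L \rightarrow \Phi_H) \right|^2. \tag{19.177}$$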

Thus, the transition probability of $\Phi_L \rightarrow \Phi_0$ is about six times that of $\Phi_L \rightarrow \Phi_H$. Note that in the above simple estimation we ignored the overlap integral S. From the above discussion, we conclude that (i) the $\Phi_L \rightarrow \Phi_0$ transition is polarized along the $\mathbf{r}_2$ direction (i.e., the molecular long axis), whereas the $\Phi_L \rightarrow \Phi_H$ transition is polarized along the $\mathbf{r}_1$ direction (i.e., the molecular short axis); and (ii) the transition probability of $\Phi_L \rightarrow \Phi_0$ is about six times that of $\Phi_L \rightarrow \Phi_H$. Note that the polarization characteristics are consistent with those obtained from the discussion based on the group theory. The conclusion reached by the semiquantitative estimation for the simple allyl cation typifies the general optical features of more complicated molecules having a well-defined molecular long axis, such as polyenes.
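The factor of six in (19.177) follows directly from the geometry. A minimal sketch under stated assumptions is given below: the C2-C1-C3 angle is taken as 120° and the bond length as 1 in arbitrary units (both are illustrative assumptions), the electron density is placed at the carbon positions as in the text, and the MO coefficients are those of simple Hückel theory. The function name `transition_dipole` is hypothetical.

```python
import math

BOND = 1.0                    # C-C bond length, arbitrary units (assumption)
ANGLE = math.radians(120.0)   # C2-C1-C3 bond angle (assumption)
E_CHARGE = 1.0                # elementary charge in arbitrary units

# Carbon positions as (y, z), origin at the midpoint of C2-C3 so that r2 + r3 = 0
half = ANGLE / 2.0
r1 = (0.0, BOND * math.cos(half))
r2 = (BOND * math.sin(half), 0.0)
r3 = (-r2[0], 0.0)
positions = (r1, r2, r3)

# Simple Hueckel MO coefficients on (phi1, phi2, phi3)
PHI_L = (math.sqrt(2) / 2, 0.5, 0.5)
PHI_0 = (0.0, 1 / math.sqrt(2), -1 / math.sqrt(2))
PHI_H = (math.sqrt(2) / 2, -0.5, -0.5)


def transition_dipole(final, initial):
    """e * sum_i c_i(final) * c_i(initial) * r_i, neglecting two-center terms."""
    y = sum(cf * ci * r[0] for cf, ci, r in zip(final, initial, positions))
    z = sum(cf * ci * r[1] for cf, ci, r in zip(final, initial, positions))
    return (E_CHARGE * y, E_CHARGE * z)


t_l0 = transition_dipole(PHI_0, PHI_L)  # polarized along y (long axis)
t_lh = transition_dipole(PHI_H, PHI_L)  # polarized along z (short axis)

prob_l0 = t_l0[0] ** 2 + t_l0[1] ** 2
prob_lh = t_lh[0] ** 2 + t_lh[1] ** 2
ratio = prob_l0 / prob_lh
print(t_l0, t_lh, ratio)
```

The two dipole vectors come out along y and z, respectively, in agreement with the group-theoretical assignment ($B_2$ and $A_1$), and their squared magnitudes are in the ratio 6 : 1.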

19.5 MO Calculations of Methane

So far, we have investigated MO calculations of unsaturated and aromatic molecules based upon the π-electron approximation. Each of these is a homogeneous system in the sense that all the electrons considered are of the same kind. Here we deal with methane, which comprises a carbon atom and four surrounding hydrogen atoms. The hydrogen atoms form a regular tetrahedron with the carbon atom positioned at the center of the tetrahedron. Methane is therefore regarded as a heterogeneous system. The calculation principle, however, is the same; namely, we make the most of projection operators and construct appropriate SALCs of methane. We deal with four 1s electrons of hydrogen along with two 2s electrons and two 2p electrons of carbon. Regarding basis functions of carbon, however, we consider the 2s atomic orbital and three 2p orbitals (i.e., the 2px, 2py, and 2pz orbitals). That is, we deal with eight atomic orbitals altogether. These are depicted in Fig. 19.14. The dimension of the vector space (i.e., representation space) is eight accordingly.

Fig. 19.14 Four 1s atomic orbitals of hydrogen and a 2pz orbital of carbon. The former orbitals are represented by H1 to H4. 2px and 2py orbitals of carbon are omitted for simplicity

As before, we wish to determine the irreducible representations to which the individual MOs belong. As already mentioned in Sect. 17.3, there are 24 symmetry operations in the point group $T_d$ to which methane belongs (see Table 17.6). According to the symmetry operations, we determine the transformation matrix related to each operation. For example, $C_3^{xyz}$ transforms the basis functions as follows:

$$\left( H_1 \; H_2 \; H_3 \; H_4 \; \mathrm{C}2s \; \mathrm{C}2p_x \; \mathrm{C}2p_y \; \mathrm{C}2p_z \right) C_3^{xyz} = \left( H_1 \; H_3 \; H_4 \; H_2 \; \mathrm{C}2s \; \mathrm{C}2p_y \; \mathrm{C}2p_z \; \mathrm{C}2p_x \right),$$

where by the above notations we denoted atomic species and their atomic orbitals. Hence, as a matrix representation we have

$$C_3^{xyz} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}, \tag{19.178}$$

where $C_3^{xyz}$ is the same operation as $R_{xyz}^{2\pi/3}$ that appeared in Sect. 17.3. Therefore,

$$\chi\left( C_3^{xyz} \right) = 2.$$

As another example, for $\sigma_d^{yz}$ we have

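$$\left( H_1 \; H_2 \; H_3 \; H_4 \; \mathrm{C}2s \; \mathrm{C}2p_x \; \mathrm{C}2p_y \; \mathrm{C}2p_z \right) \sigma_d^{yz} = \left( H_1 \; H_2 \; H_4 \; H_3 \; \mathrm{C}2s \; \mathrm{C}2p_x \; \mathrm{C}2p_z \; \mathrm{C}2p_y \right),$$

where $\sigma_d^{yz}$ represents a mirror symmetry with respect to the plane that includes the x-axis and bisects the angle formed by the y- and z-axes. Thus, we have

$$\sigma_d^{yz} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}. \tag{19.179}$$

Then, we have

$$\chi\left( \sigma_d^{yz} \right) = 4.$$

Taking some more examples, for $C_2^z$ we get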

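$$\left( H_1 \; H_2 \; H_3 \; H_4 \; \mathrm{C}2s \; \mathrm{C}2p_x \; \mathrm{C}2p_y \; \mathrm{C}2p_z \right) C_2^z = \left( H_4 \; H_3 \; H_2 \; H_1 \; \mathrm{C}2s \; {-\mathrm{C}2p_x} \; {-\mathrm{C}2p_y} \; \mathrm{C}2p_z \right),$$

where $C_2^z$ means a rotation by π around the z-axis. Also, we have

$$C_2^z = \begin{pmatrix} 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \tag{19.180}$$

$$\chi\left( C_2^z \right) = 0.$$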

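With $S_4^{z\pi/2}$ (i.e., an improper rotation by π/2 around the z-axis), we get

$$\left( H_1 \; H_2 \; H_3 \; H_4 \; \mathrm{C}2s \; \mathrm{C}2p_x \; \mathrm{C}2p_y \; \mathrm{C}2p_z \right) S_4^{z\pi/2} = \left( H_3 \; H_1 \; H_4 \; H_2 \; \mathrm{C}2s \; \mathrm{C}2p_y \; {-\mathrm{C}2p_x} \; {-\mathrm{C}2p_z} \right),$$

$$S_4^{z\pi/2} = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 \end{pmatrix}, \tag{19.181}$$

$$\chi\left( S_4^{z\pi/2} \right) = 0.$$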

As for the identity matrix E, we have

$$\chi(E) = 8.$$

Thus, Table 19.11 collects characters of individual symmetry transformations with respect to hydrogen 1s and carbon 2s and 2p orbitals in methane. From the above examples, we notice that all the symmetry operators R reduce the eight-dimensional representation space $V^8$ to the subspaces Span{H1 H2 H3 H4} and Span{C2s C2px C2py C2pz}. In terms of the notation of Sect. 12.2, we have

$$V^8 = \mathrm{Span}\{ H_1 \; H_2 \; H_3 \; H_4 \} \oplus \mathrm{Span}\{ \mathrm{C}2s \; \mathrm{C}2p_x \; \mathrm{C}2p_y \; \mathrm{C}2p_z \}. \tag{19.182}$$
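The representation matrices (19.178) to (19.181) can be generated mechanically from the lists of transformed basis functions, which also makes the characters and the block structure of (19.182) easy to verify. The following is a small sketch; the encoding of each operation as a list of target indices and signs, and the helper names `rep_matrix`, `character`, and `is_block_diagonal`, are illustrative choices rather than notation from the text.

```python
# Basis order: H1, H2, H3, H4, C2s, C2px, C2py, C2pz (indices 0..7)
BASIS = ["H1", "H2", "H3", "H4", "C2s", "C2px", "C2py", "C2pz"]

# For each operation: basis function j is sent to sign[j] * BASIS[image[j]]
OPERATIONS = {
    "E":          ([0, 1, 2, 3, 4, 5, 6, 7], [1, 1, 1, 1, 1, 1, 1, 1]),
    "C3_xyz":     ([0, 2, 3, 1, 4, 6, 7, 5], [1, 1, 1, 1, 1, 1, 1, 1]),
    "sigma_d_yz": ([0, 1, 3, 2, 4, 5, 7, 6], [1, 1, 1, 1, 1, 1, 1, 1]),
    "C2_z":       ([3, 2, 1, 0, 4, 5, 6, 7], [1, 1, 1, 1, 1, -1, -1, 1]),
    "S4_z":       ([2, 0, 3, 1, 4, 6, 5, 7], [1, 1, 1, 1, 1, 1, -1, -1]),
}


def rep_matrix(image, sign):
    """Matrix M with (basis row vector) * M = (transformed basis row vector)."""
    n = len(image)
    m = [[0] * n for _ in range(n)]
    for j in range(n):
        m[image[j]][j] = sign[j]  # column j holds the image of basis function j
    return m


def character(m):
    return sum(m[i][i] for i in range(len(m)))


def is_block_diagonal(m, split=4):
    """True if hydrogen (first 4) and carbon (last 4) functions never mix."""
    n = len(m)
    return all(m[i][j] == 0
               for i in range(n) for j in range(n)
               if (i < split) != (j < split))


matrices = {name: rep_matrix(*op) for name, op in OPERATIONS.items()}
characters = {name: character(m) for name, m in matrices.items()}
print(characters)
```

Each matrix is block diagonal with respect to the hydrogen and carbon subspaces, and the traces reproduce the characters 8, 2, 4, 0, and 0 quoted for E, $C_3^{xyz}$, $\sigma_d^{yz}$, $C_2^z$, and $S_4^{z\pi/2}$.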

Table 19.11 Characters for individual symmetry transformations of hydrogen 1s orbitals and carbon 2s and 2p orbitals in methane

Td    E    8C3    3C2    6S4    6σd
Γ     8    2      0      0      4

In other words, $V^8$ is decomposed into the above two R-invariant subspaces (see Part III); one is the hydrogen-related subspace and the other is the carbon-related subspace. Correspondingly, a representation D comprising the above representation matrices should be reduced to a direct sum of irreducible representations $D^{(\alpha)}$. That is, we should have

$$D = \sum_\alpha q_\alpha D^{(\alpha)},$$

where $q_\alpha$ is a positive integer or zero. Here we are thinking of the decomposition of (8, 8) matrices such as (19.178) into submatrices. We estimate $q_\alpha$ using (18.83). That is,

$$q_\alpha = \frac{1}{n} \sum_g \chi^{(\alpha)}(g)^* \chi(g). \tag{18.83}$$

With the irreducible representation $A_1$ of $T_d$, for instance, we have

$$q_{A_1} = \frac{1}{24} \sum_g \chi^{(A_1)}(g)^* \chi(g) = \frac{1}{24}(1 \cdot 8 + 1 \cdot 8 \cdot 2 + 1 \cdot 6 \cdot 4) = 2.$$

As for $A_2$, we have

$$q_{A_2} = \frac{1}{24}\left[ 1 \cdot 8 + 1 \cdot 8 \cdot 2 + (-1) \cdot 6 \cdot 4 \right] = 0.$$

Regarding $T_2$, we get

$$q_{T_2} = \frac{1}{24}(3 \cdot 8 + 1 \cdot 6 \cdot 4) = 2.$$

For the other irreducible representations of $T_d$, we get $q_\alpha = 0$. Consequently, we have

$$D = 2D^{(A_1)} + 2D^{(T_2)}. \tag{19.183}$$
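The multiplicities entering (19.183) can be recomputed for all five irreducible representations of $T_d$ at once. The sketch below applies (18.83) to the characters of Table 19.11 and, as an additional check, to the characters of the hydrogen and carbon subspaces taken separately; those subspace characters (4, 1, 0, 0, 2) are read off from the diagonal blocks of the matrices (19.178) to (19.181) and are stated here as a derived assumption. The function name `reduce_td` is illustrative.

```python
from fractions import Fraction

# Td character table; class order: E, 8C3, 3C2, 6S4, 6sigma_d
CLASS_SIZES = (1, 8, 3, 6, 6)
TD_TABLE = {
    "A1": (1, 1, 1, 1, 1),
    "A2": (1, 1, 1, -1, -1),
    "E":  (2, -1, 2, 0, 0),
    "T1": (3, 0, -1, 1, -1),
    "T2": (3, 0, -1, -1, 1),
}


def reduce_td(chi):
    """Apply q_alpha = (1/n) sum_g chi_alpha(g)* chi(g), summing class by class."""
    order = sum(CLASS_SIZES)  # n = 24
    result = {}
    for name, irr in TD_TABLE.items():
        q = Fraction(sum(k * a * b for k, a, b in zip(CLASS_SIZES, irr, chi)), order)
        assert q.denominator == 1, "multiplicity must be an integer"
        result[name] = int(q)
    return result


full = reduce_td((8, 2, 0, 0, 4))      # Table 19.11
hydrogen = reduce_td((4, 1, 0, 0, 2))  # trace of the H1..H4 block
carbon = reduce_td((4, 1, 0, 0, 2))    # trace of the C2s, C2p block
print(full, hydrogen, carbon)
```

The full eight-dimensional representation contains $A_1$ and $T_2$ twice each, and each four-dimensional subspace contributes $A_1 + T_2$.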

Evidently, both for the hydrogen-related representation $D^{(H)}$ and the carbon-related representation $D^{(C)}$, we have individually

$$D^{(H)} = D^{(A_1)} + D^{(T_2)} \quad \text{and} \quad D^{(C)} = D^{(A_1)} + D^{(T_2)}, \tag{19.184}$$

where $D = D^{(H)} + D^{(C)}$. In fact, taking (19.180) for example, in the subspace Span{H1 H2 H3 H4}, $C_2^z$ is expressed as

$$C_2^z = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}.$$

Following routine procedures based on the characteristic polynomial, we get eigenvalues +1 (as a double root) and -1 (a double root as well). A unitary similarity transformation using a unitary matrix P expressed as

$$P = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & -\frac{1}{2} & -\frac{1}{2} \\ \frac{1}{2} & -\frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \\ \frac{1}{2} & -\frac{1}{2} & -\frac{1}{2} & \frac{1}{2} \end{pmatrix}$$

yields a diagonal matrix such that

$$P^{-1} C_2^z P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Note that this diagonal matrix is identical to the submatrix of (19.180) for Span{C2s C2px C2py C2pz}. Similarly, $S_4^{z\pi/2}$ of (19.181) gives eigenvalues ±1 and ±i for both the subspaces. Notice that since $S_4^{z\pi/2}$ is unitary, its eigenvalues are complex numbers with an absolute value of 1. Since these symmetry operation matrices are unitary, they can be diagonalized according to Theorem 14.5. Using unitary matrices whose column vectors are chosen from eigenvectors of the matrices, the diagonal elements are identical to the eigenvalues including their multiplicity. Writing the representation matrices of symmetry operations for the hydrogen-associated subspace and carbon-associated subspace as H and C, we find that H and C have the same eigenvalues in common. Notice here that different types of transformation matrices (e.g., $C_2^z$, $S_4^{z\pi/2}$) give different sets of eigenvalues. Via unitary similarity transformations using unitary matrices P and Q, we get

$$P^{-1} H P = Q^{-1} C Q \quad \text{or} \quad \left( PQ^{-1} \right)^{-1} H \left( PQ^{-1} \right) = C.$$
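The statement that the hydrogen block H and the carbon block C share their eigenvalues can be checked without computing any eigenvalue explicitly. In the sketch below, equality of the traces of the first four matrix powers is used as the criterion; for (4, 4) matrices this fixes the characteristic polynomial through Newton's identities, which is a standard fact introduced here for the check and not part of the text. The diagonalization of the hydrogen block of $C_2^z$ by a matrix P with entries ±1/2 is verified as well. All helper names are illustrative.

```python
from fractions import Fraction


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def perm_matrix(image):
    """Column j carries a 1 in row image[j] (basis j is sent to basis image[j])."""
    n = len(image)
    m = [[0] * n for _ in range(n)]
    for j, i in enumerate(image):
        m[i][j] = 1
    return m


def power_traces(m, kmax=4):
    """Traces of m, m^2, ..., m^kmax."""
    traces, p = [], m
    for _ in range(kmax):
        traces.append(sum(p[i][i] for i in range(len(p))))
        p = matmul(p, m)
    return traces


# Hydrogen blocks (basis H1..H4) and carbon blocks (basis C2s, C2px, C2py, C2pz)
C2_H = perm_matrix([3, 2, 1, 0])
C2_C = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]]
S4_H = perm_matrix([2, 0, 3, 1])
S4_C = [[1, 0, 0, 0], [0, 0, -1, 0], [0, 1, 0, 0], [0, 0, 0, -1]]

# Real symmetric orthogonal matrix, so that P^{-1} = P
h = Fraction(1, 2)
P = [[h, h, h, h], [h, h, -h, -h], [h, -h, h, -h], [h, -h, -h, h]]
diagonalized = matmul(matmul(P, C2_H), P)

print(power_traces(C2_H), power_traces(C2_C))
print(power_traces(S4_H), power_traces(S4_C))
```

Identical power traces for the two blocks confirm that they are similar matrices for each operation, in line with the equivalence of $D^{(H)}$ and $D^{(C)}$.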

Namely, H and C are similar, i.e., the representations are equivalent. Thus, recalling Schur's First Lemma, Eq. (19.184) results. Our next task is to construct SALCs, i.e., proper basis vectors, using projection operators. From the above, we anticipate that the MOs comprise a linear combination of the hydrogen-associated SALC and the carbon-associated SALC that belong to the same irreducible representation (i.e., $A_1$ or $T_2$). For this purpose, we first find proper SALCs using projection operators described as

$$P^{(\alpha)}_{l(l)} = \frac{d_\alpha}{n} \sum_g D^{(\alpha)}_{ll}(g)^* \, g. \tag{18.156}$$

We apply this operator to Span{H1 H2 H3 H4}. Taking, e.g., $H_1$ and operating both sides of (18.156) on $H_1$, as a basis vector corresponding to the one-dimensional representation $A_1$ we have

$$\begin{aligned} P^{(A_1)}_{1(1)} H_1 &= \frac{d_{A_1}}{n} \sum_g D^{(A_1)}_{11}(g)^* \, gH_1 \\ &= \frac{1}{24}\big[ (1 \cdot H_1) + (1 \cdot H_1 + 1 \cdot H_1 + 1 \cdot H_3 + 1 \cdot H_4 + 1 \cdot H_3 + 1 \cdot H_2 + 1 \cdot H_4 + 1 \cdot H_2) \\ &\quad + (1 \cdot H_4 + 1 \cdot H_3 + 1 \cdot H_2) + (1 \cdot H_3 + 1 \cdot H_2 + 1 \cdot H_2 + 1 \cdot H_4 + 1 \cdot H_4 + 1 \cdot H_3) \\ &\quad + (1 \cdot H_1 + 1 \cdot H_4 + 1 \cdot H_1 + 1 \cdot H_2 + 1 \cdot H_1 + 1 \cdot H_3) \big] \\ &= \frac{1}{24}(6 \cdot H_1 + 6 \cdot H_2 + 6 \cdot H_3 + 6 \cdot H_4) = \frac{1}{4}(H_1 + H_2 + H_3 + H_4). \end{aligned} \tag{19.185}$$

The case of C2s is simple, because all the symmetry operations convert C2s to itself. That is, we have

$$P^{(A_1)}_{1(1)} \mathrm{C}2s = \frac{1}{24}(24 \cdot \mathrm{C}2s) = \mathrm{C}2s.$$

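Regarding C2p, taking $\mathrm{C}2p_x$ for instance, we have

$$\begin{aligned} P^{(A_1)}_{1(1)} \mathrm{C}2p_x = \frac{1}{24}\big\{ & (1 \cdot \mathrm{C}2p_x) \\ & + \left[ 1 \cdot \mathrm{C}2p_y + 1 \cdot \mathrm{C}2p_z + 1 \cdot \mathrm{C}2p_y + 1 \cdot \mathrm{C}2p_z + 1 \cdot (-\mathrm{C}2p_y) + 1 \cdot (-\mathrm{C}2p_z) + 1 \cdot (-\mathrm{C}2p_y) + 1 \cdot (-\mathrm{C}2p_z) \right] \\ & + \left[ 1 \cdot \mathrm{C}2p_x + 1 \cdot (-\mathrm{C}2p_x) + 1 \cdot (-\mathrm{C}2p_x) \right] \\ & + \left[ 1 \cdot (-\mathrm{C}2p_x) + 1 \cdot (-\mathrm{C}2p_x) + 1 \cdot \mathrm{C}2p_z + 1 \cdot (-\mathrm{C}2p_z) + 1 \cdot \mathrm{C}2p_y + 1 \cdot (-\mathrm{C}2p_y) \right] \\ & + \left[ 1 \cdot \mathrm{C}2p_y + 1 \cdot \mathrm{C}2p_x + 1 \cdot \mathrm{C}2p_z + 1 \cdot (-\mathrm{C}2p_y) + 1 \cdot \mathrm{C}2p_x + 1 \cdot (-\mathrm{C}2p_z) \right] \big\} = 0. \end{aligned} \tag{19.186}$$

Table 19.12 Character table of Td

Td    E    8C3    3C2    6S4    6σd
A1    1    1      1      1      1      x² + y² + z²
A2    1    1      1      -1     -1
E     2    -1     2      0      0      (2z² - x² - y², x² - y²)
T1    3    0      -1     1      -1
T2    3    0      -1     -1     1      (x, y, z); (xy, yz, zx)

The calculation is somewhat tedious, but it is natural that since C2s is spherically symmetric, it belongs to the totally symmetric representation. Conversely, it is natural to think that $\mathrm{C}2p_x$ is totally unlikely to contain a totally symmetric representation. This is also the case with $\mathrm{C}2p_y$ and $\mathrm{C}2p_z$. Table 19.12 shows the character table of $T_d$, in which the three-dimensional irreducible representation $T_2$ is spanned by the basis vectors (x y z). Since in Table 17.6 each (3, 3) matrix is given in reference to the vectors (x y z), it can directly be utilized to represent $T_2$. More specifically, we can directly choose the diagonal elements (1, 1), (2, 2), and (3, 3) of the individual (3, 3) matrices for the $D^{(T_2)}_{11}$, $D^{(T_2)}_{22}$, and $D^{(T_2)}_{33}$ elements of the projection operators, respectively. Thus, we can construct SALCs using the projection operators explained in Sect. 18.7. For example, using $H_1$ we obtain SALCs that belong to $T_2$ such that

$$\begin{aligned} P^{(T_2)}_{1(1)} H_1 &= \frac{3}{24} \sum_g D^{(T_2)}_{11}(g)^* \, gH_1 \\ &= \frac{3}{24}\big\{ (1 \cdot H_1) + \left[ (-1) \cdot H_4 + (-1) \cdot H_3 + 1 \cdot H_2 \right] \\ &\quad + \left[ 0 \cdot H_3 + 0 \cdot H_2 + 0 \cdot H_2 + 0 \cdot H_4 + (-1) \cdot H_4 + (-1) \cdot H_3 \right] \\ &\quad + (0 \cdot H_1 + 0 \cdot H_4 + 1 \cdot H_1 + 1 \cdot H_2 + 0 \cdot H_1 + 0 \cdot H_3) \big\} \\ &= \frac{3}{24}\left[ 2 \cdot H_1 + 2 \cdot H_2 + (-2) \cdot H_3 + (-2) \cdot H_4 \right] = \frac{1}{4}(H_1 + H_2 - H_3 - H_4), \end{aligned} \tag{19.187}$$

where the eight $C_3$ operations do not appear because the corresponding diagonal elements $D^{(T_2)}_{11}(g)$ vanish. Similarly, we have

$$P^{(T_2)}_{2(2)} H_1 = \frac{1}{4}(H_1 - H_2 + H_3 - H_4), \tag{19.188}$$

$$P^{(T_2)}_{3(3)} H_1 = \frac{1}{4}(H_1 - H_2 - H_3 + H_4). \tag{19.189}$$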

Now we can easily guess that $P^{(T_2)}_{1(1)} \mathrm{C}2p_x$ solely contains $\mathrm{C}2p_x$. Likewise, $P^{(T_2)}_{2(2)} \mathrm{C}2p_y$ and $P^{(T_2)}_{3(3)} \mathrm{C}2p_z$ contain only $\mathrm{C}2p_y$ and $\mathrm{C}2p_z$, respectively. In fact, we obtain what we anticipate. That is,

$$P^{(T_2)}_{1(1)} \mathrm{C}2p_x = \mathrm{C}2p_x, \quad P^{(T_2)}_{2(2)} \mathrm{C}2p_y = \mathrm{C}2p_y, \quad P^{(T_2)}_{3(3)} \mathrm{C}2p_z = \mathrm{C}2p_z. \tag{19.190}$$
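The projections (19.185) and (19.187) to (19.190) can be reproduced by brute force once the 24 operations of $T_d$ are available as (3, 3) matrices. The sketch below rests on several assumptions that go beyond the text: the hydrogen atoms are placed at the tetrahedron vertices (1,1,1), (1,-1,-1), (-1,1,-1), and (-1,-1,1), a choice consistent with the permutations quoted for $C_3^{xyz}$, $\sigma_d^{yz}$, $C_2^z$, and $S_4^{z\pi/2}$; the group is generated from the two matrices `C3_XYZ` and `S4_Z`; and the (3, 3) matrices themselves serve as the $T_2$ representation. All function names are illustrative.

```python
from fractions import Fraction

IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
C3_XYZ = ((0, 0, 1), (1, 0, 0), (0, 1, 0))    # x -> y -> z -> x
S4_Z = ((0, -1, 0), (1, 0, 0), (0, 0, -1))    # rotation by pi/2 about z, then z -> -z

# Assumed vertex positions of the four hydrogen atoms
HYDROGENS = {"H1": (1, 1, 1), "H2": (1, -1, -1), "H3": (-1, 1, -1), "H4": (-1, -1, 1)}
LABEL_OF = {v: k for k, v in HYDROGENS.items()}


def matmul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))


def apply(m, v):
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def generate_group(generators):
    """Close the generators under multiplication (finite group)."""
    elements, frontier = {IDENTITY}, [IDENTITY]
    while frontier:
        g = frontier.pop()
        for h in generators:
            new = matmul(h, g)
            if new not in elements:
                elements.add(new)
                frontier.append(new)
    return elements


def project_hydrogen(group, irrep, l=0, start="H1"):
    """Coefficients of H1..H4 in P_{l(l)}^{(irrep)} applied to the start orbital."""
    coeffs = {name: Fraction(0) for name in HYDROGENS}
    for m in group:
        image = LABEL_OF[apply(m, HYDROGENS[start])]
        weight = Fraction(1, 24) if irrep == "A1" else Fraction(3 * m[l][l], 24)
        coeffs[image] += weight
    return coeffs


def project_p(group, l):
    """Components (px, py, pz) of P_{l(l)}^{(T2)} applied to the l-th 2p orbital."""
    vec = [Fraction(0)] * 3
    for m in group:
        for i in range(3):
            vec[i] += Fraction(3 * m[l][l], 24) * m[i][l]  # column l is the image
    return vec


TD = generate_group([C3_XYZ, S4_Z])
a1 = project_hydrogen(TD, "A1")
t2 = [project_hydrogen(TD, "T2", l) for l in range(3)]
p_proj = [project_p(TD, l) for l in range(3)]
print(len(TD), a1, t2, p_proj)
```

Since only diagonal elements $D_{ll}(g)$ enter, the result does not depend on whether the matrices are taken to act on column or row vectors.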

This gives a good example to illustrate the general concept of projection operators and related calculations of inner products discussed in Sect. 18.7. For example, we have a nonvanishing inner product $\left\langle P^{(T_2)}_{1(1)} H_1 \middle| P^{(T_2)}_{1(1)} \mathrm{C}2p_x \right\rangle$, whereas an inner product such as $\left\langle P^{(T_2)}_{2(2)} H_1 \middle| P^{(T_2)}_{1(1)} \mathrm{C}2p_x \right\rangle$ is zero. This significantly reduces the effort needed to solve a secular equation (vide infra). Notice that the functions $P^{(T_2)}_{1(1)} H_1$, $P^{(T_2)}_{1(1)} \mathrm{C}2p_x$, etc. are linearly independent of one another. Recall also that $P^{(T_2)}_{1(1)} H_1$ and $P^{(T_2)}_{1(1)} \mathrm{C}2p_x$ correspond to $\phi^{(\alpha)}_l$ and $\psi^{(\alpha)}_l$ in (18.171), respectively. That is, these functions are linearly independent, while they belong to the same place "1" of the same three-dimensional irreducible representation $T_2$.

Equations (19.187) to (19.190) seem intuitively obvious. This is because if we draw the molecular geometry of methane (see Fig. 19.14), we can immediately recognize the relationship between the directionality in ℝ³ and the "directionality" of the SALCs represented by the signs of the hydrogen atomic orbitals (or $\mathrm{C}2p_x$, $\mathrm{C}2p_y$, and $\mathrm{C}2p_z$ of carbon).

As stated above, we have successfully obtained the SALCs relevant to methane. Therefore, our next task is to solve an eigenvalue problem and construct appropriate MOs. To do this, let us first normalize the SALCs. We assume that the carbon-based SALCs have already been normalized as well-studied atomic orbitals. We suppose that all the functions are real. For the hydrogen-based SALCs,

$$\langle H_1 + H_2 + H_3 + H_4 | H_1 + H_2 + H_3 + H_4 \rangle = 4\langle H_1 | H_1 \rangle + 12\langle H_1 | H_2 \rangle = 4 + 12S^{HH} = 4\left( 1 + 3S^{HH} \right), \tag{19.191}$$

where the second equality comes from the fact that $|H_1\rangle$, i.e., a 1s atomic orbital, is normalized. We define $\langle H_1 | H_2 \rangle \equiv S^{HH}$, i.e., an overlap integral between two adjacent hydrogen atoms. Note also that $\langle H_i | H_j \rangle$ ($1 \leq i, j \leq 4$; $i \neq j$) is the same for every pair because of the symmetry requirement. Thus, as a normalized SALC we have

$$H^{(A_1)} \equiv \frac{| H_1 + H_2 + H_3 + H_4 \rangle}{2\sqrt{1 + 3S^{HH}}}. \tag{19.192}$$

Defining the denominator as $c = 2\sqrt{1 + 3S^{HH}}$, we have

$$H^{(A_1)} = | H_1 + H_2 + H_3 + H_4 \rangle / c. \tag{19.193}$$

Also, we have

$$\langle H_1 + H_2 - H_3 - H_4 | H_1 + H_2 - H_3 - H_4 \rangle = 4\langle H_1 | H_1 \rangle - 4\langle H_1 | H_2 \rangle = 4 - 4S^{HH} = 4\left( 1 - S^{HH} \right). \tag{19.194}$$

Thus, we have

$$H^{(T_2)}_1 \equiv \frac{| H_1 + H_2 - H_3 - H_4 \rangle}{2\sqrt{1 - S^{HH}}}. \tag{19.195}$$

Also defining the denominator as $d = 2\sqrt{1 - S^{HH}}$, we have

$$H^{(T_2)}_1 \equiv | H_1 + H_2 - H_3 - H_4 \rangle / d.$$

Similarly, we define the other hydrogen-based SALCs as

$$H^{(T_2)}_2 \equiv | H_1 - H_2 + H_3 - H_4 \rangle / d, \qquad H^{(T_2)}_3 \equiv | H_1 - H_2 - H_3 + H_4 \rangle / d.$$
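The normalization constants c and d follow from counting diagonal and off-diagonal overlaps, as the short sketch below illustrates. The numerical value of `S_HH` is an arbitrary illustrative assumption, and `salc_norm_squared` is a hypothetical helper name.

```python
import math

S_HH = 0.2  # overlap between two hydrogen 1s orbitals (illustrative assumption)


def salc_norm_squared(signs, s_hh):
    """<sum_i s_i H_i | sum_j s_j H_j> with <H_i|H_i> = 1 and <H_i|H_j> = s_hh."""
    total = 0.0
    for i, si in enumerate(signs):
        for j, sj in enumerate(signs):
            total += si * sj * (1.0 if i == j else s_hh)
    return total


norm_a1 = salc_norm_squared((1, 1, 1, 1), S_HH)
norm_t2 = [salc_norm_squared(s, S_HH)
           for s in ((1, 1, -1, -1), (1, -1, 1, -1), (1, -1, -1, 1))]

c = math.sqrt(norm_a1)     # equals 2 * sqrt(1 + 3 S_HH)
d = math.sqrt(norm_t2[0])  # equals 2 * sqrt(1 - S_HH)
print(norm_a1, norm_t2, c, d)
```

The twelve off-diagonal pairs all enter with a plus sign for $A_1$, whereas for each $T_2$ SALC four pairs enter with a plus sign and eight with a minus sign, which gives the net factor of -4 in (19.194).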

The next step is to construct MOs using the above SALCs. To this end, we make linear combinations of SALCs belonging to the same irreducible representation. In the case of $A_1$, we choose $H^{(A_1)}$ and C2s. Naturally, we anticipate two linear combinations of the form

$$a_1 H^{(A_1)} + b_1 \mathrm{C}2s, \tag{19.196}$$

where $a_1$ and $b_1$ are arbitrary constants. On the basis of the discussion of projection operators in Sect. 18.7, both of the above two linear combinations belong to $A_1$ as well. Similarly, according to the projection operators $P^{(T_2)}_{1(1)}$, $P^{(T_2)}_{2(2)}$, and $P^{(T_2)}_{3(3)}$, we make three sets of linear combinations

$$q_1 P^{(T_2)}_{1(1)} H_1 + r_1 \mathrm{C}2p_x; \quad q_2 P^{(T_2)}_{2(2)} H_1 + r_2 \mathrm{C}2p_y; \quad q_3 P^{(T_2)}_{3(3)} H_1 + r_3 \mathrm{C}2p_z, \tag{19.197}$$

where $q_1$, $r_1$, etc. are arbitrary constants. These three sets of linear combinations belong to the individual "addresses" 1, 2, and 3 of $T_2$. What we have to do is to determine the coefficients of the above MOs and to normalize them by solving the secular equations. With two different energy eigenvalues, we get two orthogonal (i.e., linearly independent) MOs for each of the four sets of linear combinations of (19.196) and (19.197). Thus, eight linear combinations in total constitute the MOs of methane.

In light of (19.183), the secular equation can be reduced as follows according to the representations $A_1$ and $T_2$. There we have changed the order of entries in the equation so that we can deal with the equation easily. Then we have

$$\begin{vmatrix} H_{11} - \lambda & H_{12} - \lambda S_{12} & & & & & & \\ H_{21} - \lambda S_{21} & H_{22} - \lambda & & & & & & \\ & & G_{11} - \lambda & G_{12} - \lambda T_{12} & & & & \\ & & G_{21} - \lambda T_{21} & G_{22} - \lambda & & & & \\ & & & & F_{11} - \lambda & F_{12} - \lambda V_{12} & & \\ & & & & F_{21} - \lambda V_{21} & F_{22} - \lambda & & \\ & & & & & & K_{11} - \lambda & K_{12} - \lambda W_{12} \\ & & & & & & K_{21} - \lambda W_{21} & K_{22} - \lambda \end{vmatrix} = 0, \tag{19.198}$$

where all the elements outside the diagonal blocks (left blank) are zero. This is because of the symmetry requirement (see Sect. 19.2). Thus, in (19.198) the secular equation is decomposed into four (2, 2) blocks. The first block is pertinent to $A_1$, with the hydrogen-based component and the carbon-based component from the left, respectively. The lower three blocks are pertinent to $T_2$, with the hydrogen-based and carbon-based components from the left, respectively, in order of the $P^{(T_2)}_{1(1)}$, $P^{(T_2)}_{2(2)}$, and $P^{(T_2)}_{3(3)}$ SALCs from the top. The notations follow those of (19.59). We compute these equations. The calculations are equivalent to solving the following four two-dimensional secular equations:

$$\begin{aligned} \begin{vmatrix} H_{11} - \lambda & H_{12} - \lambda S_{12} \\ H_{21} - \lambda S_{21} & H_{22} - \lambda \end{vmatrix} &= 0, \\ \begin{vmatrix} G_{11} - \lambda & G_{12} - \lambda T_{12} \\ G_{21} - \lambda T_{21} & G_{22} - \lambda \end{vmatrix} &= 0, \\ \begin{vmatrix} F_{11} - \lambda & F_{12} - \lambda V_{12} \\ F_{21} - \lambda V_{21} & F_{22} - \lambda \end{vmatrix} &= 0, \\ \begin{vmatrix} K_{11} - \lambda & K_{12} - \lambda W_{12} \\ K_{21} - \lambda W_{21} & K_{22} - \lambda \end{vmatrix} &= 0. \end{aligned} \tag{19.199}$$

Notice that these four secular equations have the same form as the (2, 2) determinant of (19.59). Note, at the same time, that while (19.59) did not assume SALCs, (19.199) takes account of SALCs. That is, (19.199) expresses a secular equation with respect to two SALCs that belong to the same irreducible representation. The first equation of (19.199) reads as

$$\left( 1 - S_{12}^{\,2} \right)\lambda^2 - (H_{11} + H_{22} - 2H_{12}S_{12})\lambda + H_{11}H_{22} - H_{12}^{\,2} = 0. \tag{19.200}$$

In (19.200) we define quantities as follows:

$$S_{12} \equiv \int H^{(A_1)*} \, \mathrm{C}2s \, d\tau \equiv \left\langle H^{(A_1)} \middle| \mathrm{C}2s \right\rangle = \frac{\langle H_1 + H_2 + H_3 + H_4 | \mathrm{C}2s \rangle}{2\sqrt{1 + 3S^{HH}}} = \frac{2\langle H_1 | \mathrm{C}2s \rangle}{\sqrt{1 + 3S^{HH}}} = \frac{2S^{CH}_{A_1}}{\sqrt{1 + 3S^{HH}}}. \tag{19.201}$$

In (19.201), the overlap integrals between the individual hydrogen atomic orbitals and C2s are identical because of the symmetry requirement, and their common value is defined as

$$S^{CH}_{A_1} \equiv \langle H_1 | \mathrm{C}2s \rangle. \tag{19.202}$$

Also in (19.200), the other quantities are defined as follows:

$$H_{11} \equiv \left\langle H^{(A_1)} \middle| \mathfrak{H} H^{(A_1)} \right\rangle = \frac{\alpha_H + 3\beta_{HH}}{1 + 3S^{HH}}, \tag{19.203}$$

$$H_{22} \equiv \langle \mathrm{C}2s | \mathfrak{H} \, \mathrm{C}2s \rangle, \tag{19.204}$$

$$H_{12} \equiv \left\langle H^{(A_1)} \middle| \mathfrak{H} \, \mathrm{C}2s \right\rangle = \frac{\langle H_1 + H_2 + H_3 + H_4 | \mathfrak{H} \, \mathrm{C}2s \rangle}{2\sqrt{1 + 3S^{HH}}} = \frac{2\langle H_1 | \mathfrak{H} \, \mathrm{C}2s \rangle}{\sqrt{1 + 3S^{HH}}} = \frac{2\beta^{CH}_{A_1}}{\sqrt{1 + 3S^{HH}}}, \tag{19.205}$$

where $\mathfrak{H}$ is the Hamiltonian of a methane molecule. In (19.203) and (19.205), moreover, we define the quantities as

$$\alpha_H \equiv \langle H_1 | \mathfrak{H} H_1 \rangle, \quad \beta_{HH} \equiv \langle H_1 | \mathfrak{H} H_2 \rangle, \quad \beta^{CH}_{A_1} \equiv \langle H_1 | \mathfrak{H} \, \mathrm{C}2s \rangle. \tag{19.206}$$

The quantity $H_{11}$ is a "Coulomb" integral of the hydrogen-based SALC that involves four hydrogen atoms. Solving the first equation of (19.199), we get

$$\lambda = \frac{H_{11} + H_{22} - 2H_{12}S_{12} \pm \sqrt{(H_{11} - H_{22})^2 + 4\left[ H_{12}^{\,2} + H_{11}H_{22}S_{12}^{\,2} - H_{12}S_{12}(H_{11} + H_{22}) \right]}}{2\left( 1 - S_{12}^{\,2} \right)}. \tag{19.207}$$
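The closed form (19.207) can be verified by substituting its two roots back into the (2, 2) secular determinant. The sketch below does so for arbitrary trial numbers; the values assigned to $H_{11}$, $H_{22}$, $H_{12}$, and $S_{12}$ are purely illustrative assumptions (no physical units implied), and the function names are hypothetical.

```python
import math


def secular_roots(h11, h22, h12, s12):
    """Roots of (1 - s12^2) L^2 - (h11 + h22 - 2 h12 s12) L + h11 h22 - h12^2 = 0."""
    a = 1.0 - s12 ** 2
    b = h11 + h22 - 2.0 * h12 * s12
    disc = (h11 - h22) ** 2 + 4.0 * (h12 ** 2 + h11 * h22 * s12 ** 2
                                     - h12 * s12 * (h11 + h22))
    root = math.sqrt(disc)
    return (b - root) / (2.0 * a), (b + root) / (2.0 * a)


def secular_det(lam, h11, h22, h12, s12):
    """Determinant of the (2, 2) block with H21 = H12 and S21 = S12."""
    return (h11 - lam) * (h22 - lam) - (h12 - lam * s12) ** 2


# Illustrative trial values (assumptions)
params = dict(h11=-0.6, h22=-0.9, h12=-0.5, s12=0.4)
lam_low, lam_high = secular_roots(**params)
residuals = [secular_det(lam, **params) for lam in (lam_low, lam_high)]

# Degenerate diagonal elements: the roots reduce to (H11 + H12)/(1 + S12)
# and (H11 - H12)/(1 - S12)
deg = dict(h11=-0.8, h22=-0.8, h12=-0.5, s12=0.4)
deg_roots = secular_roots(**deg)
expected = sorted([(deg["h11"] + deg["h12"]) / (1 + deg["s12"]),
                   (deg["h11"] - deg["h12"]) / (1 - deg["s12"])])
print(lam_low, lam_high, residuals, deg_roots, expected)
```

The same function applies unchanged to the $T_2$ blocks after replacing (H, S) with (G, T), (F, V), or (K, W).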

Similarly, we obtain related solutions for the latter three eigenvalue equations of (19.199). With the second equation of (19.199), for instance, we have

$$\left( 1 - T_{12}^{\,2} \right)\lambda^2 - (G_{11} + G_{22} - 2G_{12}T_{12})\lambda + G_{11}G_{22} - G_{12}^{\,2} = 0. \tag{19.208}$$

Solving this, we get

$$\lambda = \frac{G_{11} + G_{22} - 2G_{12}T_{12} \pm \sqrt{(G_{11} - G_{22})^2 + 4\left[ G_{12}^{\,2} + G_{11}G_{22}T_{12}^{\,2} - G_{12}T_{12}(G_{11} + G_{22}) \right]}}{2\left( 1 - T_{12}^{\,2} \right)}. \tag{19.209}$$

In (19.208) and (19.209) we define these quantities as follows:

$$T_{12} \equiv \int H^{(T_2)*}_1 \, \mathrm{C}2p_x \, d\tau \equiv \left\langle H^{(T_2)}_1 \middle| \mathrm{C}2p_x \right\rangle = \frac{\langle H_1 + H_2 - H_3 - H_4 | \mathrm{C}2p_x \rangle}{2\sqrt{1 - S^{HH}}} = \frac{2\langle H_1 | \mathrm{C}2p_x \rangle}{\sqrt{1 - S^{HH}}} = \frac{2S^{CH}_{T_2}}{\sqrt{1 - S^{HH}}}, \tag{19.210}$$

$$G_{11} \equiv \left\langle H^{(T_2)}_1 \middle| \mathfrak{H} H^{(T_2)}_1 \right\rangle = \frac{\alpha_H - \beta_{HH}}{1 - S^{HH}}, \tag{19.211}$$

$$G_{22} \equiv \langle \mathrm{C}2p_x | \mathfrak{H} \, \mathrm{C}2p_x \rangle, \tag{19.212}$$

$$G_{12} \equiv \left\langle H^{(T_2)}_1 \middle| \mathfrak{H} \, \mathrm{C}2p_x \right\rangle = \frac{\langle H_1 + H_2 - H_3 - H_4 | \mathfrak{H} \, \mathrm{C}2p_x \rangle}{2\sqrt{1 - S^{HH}}} = \frac{2\langle H_1 | \mathfrak{H} \, \mathrm{C}2p_x \rangle}{\sqrt{1 - S^{HH}}} = \frac{2\beta^{CH}_{T_2}}{\sqrt{1 - S^{HH}}}. \tag{19.213}$$

In the above equations, we further define integrals such that

$$S^{CH}_{T_2} \equiv \langle H_1 | \mathrm{C}2p_x \rangle, \quad \beta^{CH}_{T_2} \equiv \langle H_1 | \mathfrak{H} \, \mathrm{C}2p_x \rangle. \tag{19.214}$$

In (19.210), the overlap integral between each of the four hydrogen atomic orbitals and $C2p_x$ is again identical owing to the symmetry requirement. That is, the integrals of (19.213) are additive with respect to the product of the plus components of the hydrogen atomic orbitals of (19.187) and $C2p_x$, as well as the product of the minus components of the hydrogen atomic orbitals of (19.187) and $C2p_x$; see Fig. 19.14. Notice that $C2p_x$ has a node on the yz-plane. The third and fourth equations of (19.199) give exactly the same eigenvalues as those given in (19.209). This is obvious from the fact that all of the latter three equations of (19.199) are associated with the irreducible representation $T_2$. The corresponding three MOs are triply degenerate. In (19.207) and (19.209), the plus sign gives a higher orbital energy and the minus sign gives a lower energy. Equations (19.207) and (19.209), however, look somewhat complicated. To simplify the situation, (i) in, e.g., (19.207) let us consider a case where $|H_{11}| \gg |H_{22}|$ or $|H_{11}| \ll |H_{22}|$. In that case, $(H_{11} - H_{22})^2$ dominates inside the square root, and so ignoring $2H_{12}S_{12}$ and $S_{12}^2$ we have $\lambda \approx H_{11}$ or $\lambda \approx H_{22}$. Inserting these values into (19.199), we have either $\Psi \approx H^{(A_1)}$ or $\Psi \approx C2s$, where $\Psi$ is a resulting MO. This implies that no interaction would arise between $H^{(A_1)}$ and $C2s$. (ii) If, on the other hand, $H_{11} = H_{22}$, we would get

$$\lambda = \frac{H_{11} - H_{12}S_{12} \pm |H_{12} - H_{11}S_{12}|}{1 - S_{12}^2}. \tag{19.215}$$

Since $H_{12} - H_{11}S_{12}$ is either positive or negative, we have the following alternatives:

Case I: $H_{12} - H_{11}S_{12} > 0$. We have $\lambda_H = \frac{H_{11} + H_{12}}{1 + S_{12}}$ and $\lambda_L = \frac{H_{11} - H_{12}}{1 - S_{12}}$, where $\lambda_H > \lambda_L$.

Case II: $H_{12} - H_{11}S_{12} < 0$. We have $\lambda_H = \frac{H_{11} - H_{12}}{1 - S_{12}}$ and $\lambda_L = \frac{H_{11} + H_{12}}{1 + S_{12}}$, where $\lambda_H > \lambda_L$.

In this case, we would anticipate maximum orbital mixing between $H^{(A_1)}$ and $C2s$. (iii) If $H_{11}$ and $H_{22}$ are moderately different, in between the above (i) and (ii), the orbital mixing is likely to be moderate. This is the actual situation. With (19.209) we have eigenvalues expressed similarly to those of (19.207). Therefore, classifications related to the above Cases I and II hold with $G_{12} - G_{11}T_{12}$, $F_{12} - F_{11}V_{12}$, and $K_{12} - K_{11}W_{12}$ as well. In spite of the simplicity of (19.215), the quantities $H_{11}$, $H_{12}$, and $S_{12}$ are hard to calculate. In general cases including the present example (i.e., methane), the difficulty in calculating these quantities results essentially from the fact that we are dealing with many-particle interactions that include electron repulsion. Nonetheless, for the simplest case of a hydrogen molecular ion $H_2^+$ the estimation is feasible [3]. Let us estimate $H_{12} - H_{11}S_{12}$ quantitatively according to Atkins and Friedman [3]. The Hamiltonian of $H_2^+$ is described as

$$H = -\frac{\hbar^2}{2m}\nabla^2 - \frac{e^2}{4\pi\varepsilon_0 r_A} - \frac{e^2}{4\pi\varepsilon_0 r_B} + \frac{e^2}{4\pi\varepsilon_0 R}, \tag{19.216}$$

where m is an electron rest mass and other symbols are defined in Fig. 19.15; the last term represents the repulsive interaction between the two hydrogen nuclei. To estimate the quantities in (19.215), it is convenient to use dimensionless ellipsoidal coordinates $\begin{pmatrix} \mu \\ \nu \\ \phi \end{pmatrix}$ such that [3]

$$\mu = \frac{r_A + r_B}{R} \quad \text{and} \quad \nu = \frac{r_A - r_B}{R}. \tag{19.217}$$


Fig. 19.15 Configuration of electron (e) and hydrogen nuclei (A and B). rA and rB denote a separation between the electron and A and that between the electron and B, respectively. R denotes a separation between A and B

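The quantity $\phi$ is an azimuthal angle around the molecular axis (i.e., the straight line connecting the two nuclei). Then, we have

$$\begin{aligned} H_{11} = \langle A|H|A\rangle &= \left\langle A \left| -\frac{\hbar^2}{2m}\nabla^2 - \frac{e^2}{4\pi\varepsilon_0 r_A} \right| A \right\rangle + \left\langle A \left| -\frac{e^2}{4\pi\varepsilon_0 r_B} \right| A \right\rangle + \left\langle A \left| \frac{e^2}{4\pi\varepsilon_0 R} \right| A \right\rangle \\ &= E_{1s} - \frac{e^2}{4\pi\varepsilon_0}\left\langle A \left| \frac{1}{r_B} \right| A \right\rangle + \frac{e^2}{4\pi\varepsilon_0 R}, \end{aligned} \tag{19.218}$$

where $E_{1s}$ is the same as that given in (3.258) with $Z = n = 1$ and $\mu$ replaced with $m$ (i.e., $E_{1s} = -\frac{\hbar^2}{2ma^2}$). Using a coordinate representation of (3.301), we have

$$|A\rangle = \sqrt{\frac{1}{\pi}}\, a^{-3/2} e^{-r_A/a}. \tag{19.219}$$

Moreover, considering (19.217), we have

$$\left\langle A \left| \frac{1}{r_B} \right| A \right\rangle = \frac{1}{\pi a^3}\int d\tau\, e^{-2r_A/a}\frac{1}{r_B}.$$

Converting Cartesian coordinates to ellipsoidal coordinates [3] such that

$$\int d\tau = \left(\frac{R}{2}\right)^3 \int_0^{2\pi} d\phi \int_1^{\infty} d\mu \int_{-1}^{1} d\nu \left(\mu^2 - \nu^2\right),$$

we have

$$\begin{aligned} \left\langle A \left| \frac{1}{r_B} \right| A \right\rangle &= \frac{R^3}{8\pi a^3}\, 2\pi \int_1^{\infty} d\mu \int_{-1}^{1} d\nu \left(\mu^2 - \nu^2\right) e^{-(\mu+\nu)R/a}\, \frac{2}{R(\mu - \nu)} \\ &= \frac{R^2}{2a^3}\int_1^{\infty} d\mu \int_{-1}^{1} d\nu\, (\mu + \nu)\, e^{-(\mu+\nu)R/a}. \end{aligned} \tag{19.220}$$

Putting $I = \int_1^{\infty} d\mu \int_{-1}^{1} d\nu\, (\mu + \nu)\, e^{-(\mu+\nu)R/a}$, we obtain

$$I = \int_1^{\infty} \mu e^{-\mu R/a}\, d\mu \int_{-1}^{1} e^{-\nu R/a}\, d\nu + \int_1^{\infty} e^{-\mu R/a}\, d\mu \int_{-1}^{1} \nu e^{-\nu R/a}\, d\nu. \tag{19.221}$$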

The above definite integrals can readily be calculated using the methods described in Sect. 3.7.2; see, e.g., (3.262) and (3.263). For instance, we have

$$\int_{-1}^{1} e^{-c\nu}\, d\nu = \left[-\frac{e^{-cx}}{c}\right]_{-1}^{1} = \frac{1}{c}\left(e^{c} - e^{-c}\right). \tag{19.222}$$

Differentiating (19.222) with respect to the parameter c, we get

$$\int_{-1}^{1} \nu e^{-c\nu}\, d\nu = \frac{1}{c^2}\left[(1 - c)e^{c} - (1 + c)e^{-c}\right]. \tag{19.223}$$

In the present case, c is to be replaced with $R/a$. Other calculations of the definite integrals are left for readers. Thus, we obtain

$$I = \frac{2a^3}{R^3}\left[1 - \left(1 + \frac{R}{a}\right)e^{-2R/a}\right].$$

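In turn, we have

$$\left\langle A \left| \frac{1}{r_B} \right| A \right\rangle = \frac{R^2}{2a^3}\, I = \frac{1}{R}\left[1 - \left(1 + \frac{R}{a}\right)e^{-2R/a}\right].$$

Introducing a symbol according to Atkins and Friedman [3]

$$j_0 \equiv \frac{e^2}{4\pi\varepsilon_0},$$

we finally obtain

$$H_{11} = E_{1s} - \frac{j_0}{R}\left[1 - \left(1 + \frac{R}{a}\right)e^{-2R/a}\right] + \frac{j_0}{R}. \tag{19.224}$$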
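The closed form of the integral I that enters (19.224) can be cross-checked numerically. The following is a minimal sketch, assuming a plain midpoint rule and a truncation of the semi-infinite μ range; the function names and the cutoff are illustrative choices, not part of the original derivation.

```python
import math

def closed_form_I(c):
    """Closed form I = (2/c^3) [1 - (1 + c) exp(-2c)], with c = R/a."""
    return 2.0 / c**3 * (1.0 - (1.0 + c) * math.exp(-2.0 * c))

def midpoint_I(c, n_mu=2000, n_nu=200):
    """Midpoint-rule estimate of the double integral that defines I.

    The mu range [1, infinity) is truncated at 1 + 40/c (an assumption),
    where the integrand has decayed by roughly exp(-40).
    """
    mu_max = 1.0 + 40.0 / c
    h_mu = (mu_max - 1.0) / n_mu
    h_nu = 2.0 / n_nu
    total = 0.0
    for i in range(n_mu):
        mu = 1.0 + (i + 0.5) * h_mu
        for k in range(n_nu):
            nu = -1.0 + (k + 0.5) * h_nu
            total += (mu + nu) * math.exp(-c * (mu + nu))
    return total * h_mu * h_nu

if __name__ == "__main__":
    c = 2.0  # R/a, close to the experimental value for the H2+ ion
    print(closed_form_I(c), midpoint_I(c))
```

The two numbers should agree to several digits; a finer grid tightens the agreement at the cost of run time.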

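The quantity $S_{12}$ can be obtained as follows:

$$S_{12} = \langle B | A \rangle = \frac{R^3}{8\pi a^3}\int_0^{2\pi} d\phi \int_1^{\infty} d\mu \int_{-1}^{1} d\nu \left(\mu^2 - \nu^2\right) e^{-\mu R/a}, \tag{19.225}$$

where we used $|B\rangle = \sqrt{\frac{1}{\pi}}\, a^{-3/2} e^{-r_B/a}$ and (19.217). Noting that

$$\int_1^{\infty} d\mu \int_{-1}^{1} d\nu \left(\mu^2 - \nu^2\right) e^{-\mu R/a} = \int_1^{\infty} d\mu \left(2\mu^2 - \frac{2}{3}\right) e^{-\mu R/a}$$

and following procedures similar to those described above, we get

$$S_{12} = \left[1 + \frac{R}{a} + \frac{1}{3}\left(\frac{R}{a}\right)^2\right] e^{-R/a}. \tag{19.226}$$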

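In turn, for $H_{12}$ we have

$$\begin{aligned} H_{12} = \langle A|H|B\rangle &= \left\langle A \left| -\frac{\hbar^2}{2m}\nabla^2 - \frac{e^2}{4\pi\varepsilon_0 r_B} \right| B \right\rangle + \left\langle A \left| -\frac{e^2}{4\pi\varepsilon_0 r_A} \right| B \right\rangle + \left\langle A \left| \frac{e^2}{4\pi\varepsilon_0 R} \right| B \right\rangle \\ &= E_{1s}\langle A|B\rangle - \frac{e^2}{4\pi\varepsilon_0}\left\langle A \left| \frac{1}{r_A} \right| B \right\rangle + \frac{e^2}{4\pi\varepsilon_0 R}\langle A|B\rangle. \end{aligned} \tag{19.227}$$

Note that in (19.227) $|B\rangle$ is an eigenfunction belonging to $E_{1s} = -\frac{\hbar^2}{2ma^2}$, which is an eigenvalue of the operator $-\frac{\hbar^2}{2m}\nabla^2 - \frac{e^2}{4\pi\varepsilon_0 r_B}$. From (3.258), we estimate $E_{1s}$ to be −13.61 eV. Using (19.217) and converting Cartesian coordinates to ellipsoidal coordinates once again, we get

$$\left\langle A \left| \frac{1}{r_A} \right| B \right\rangle = \frac{1}{a}\left(1 + \frac{R}{a}\right)e^{-R/a}.$$

Thus, we have

$$H_{12} = \left(E_{1s} + \frac{j_0}{R}\right)S_{12} - \frac{j_0}{a}\left(1 + \frac{R}{a}\right)e^{-R/a}. \tag{19.228}$$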

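We are now in a position to evaluate $H_{12} - H_{11}S_{12}$ in (19.215). According to Atkins and Friedman [3], we define the following notations:

$$j' \equiv j_0\left\langle A \left| \frac{1}{r_B} \right| A \right\rangle \quad \text{and} \quad k' \equiv j_0\left\langle A \left| \frac{1}{r_A} \right| B \right\rangle.$$

Then, we have

$$H_{12} - H_{11}S_{12} = -k' + j'S_{12}. \tag{19.229}$$

The calculation of this quantity is straightforward. The result is

$$\begin{aligned} H_{12} - H_{11}S_{12} &= j_0\left(\frac{1}{R} - \frac{2R}{3a^2}\right)e^{-R/a} - \frac{j_0}{R}\left(1 + \frac{R}{a}\right)\left[1 + \frac{R}{a} + \frac{1}{3}\left(\frac{R}{a}\right)^2\right]e^{-3R/a} \\ &= \frac{j_0}{a}\left(\frac{a}{R} - \frac{2R}{3a}\right)e^{-R/a} - \frac{j_0}{a}\,\frac{a}{R}\left(1 + \frac{R}{a}\right)\left[1 + \frac{R}{a} + \frac{1}{3}\left(\frac{R}{a}\right)^2\right]e^{-3R/a}. \end{aligned} \tag{19.230}$$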

In (19.230) we notice that whereas the second term is always negative, the first term may be negative or positive depending upon R. Accordingly, we cannot tell a priori whether $H_{12} - H_{11}S_{12}$ is negative. If we had $R \ll a$, the first term of (19.230) would become positive. Let us then make a quantitative estimation. The Bohr radius a is about 52.9 pm (using an electron rest mass). As an experimental result, R is approximately 106 pm [3]. Hence, for an $H_2^+$ ion we have

$$R/a \approx 2.0.$$

Using this number, we get

$$H_{12} - H_{11}S_{12} \approx -0.13\, j_0/a < 0.$$

We estimate $H_{12} - H_{11}S_{12}$ to be approximately −3.5 eV. Therefore, from (19.215) we get

$$\lambda_L = \frac{H_{11} + H_{12}}{1 + S_{12}} \quad \text{and} \quad \lambda_H = \frac{H_{11} - H_{12}}{1 - S_{12}}, \tag{19.231}$$

where $\lambda_L$ and $\lambda_H$ indicate lower and higher energy eigenvalues, respectively. Namely, Case II in the above is more likely. Correspondingly, for MOs we have

$$\Psi_L = \frac{|A\rangle + |B\rangle}{\sqrt{2(1 + S_{12})}} \quad \text{and} \quad \Psi_H = \frac{|A\rangle - |B\rangle}{\sqrt{2(1 - S_{12})}}, \tag{19.232}$$
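The numerical estimate above can be reproduced with a few lines of code. The sketch below works in units of $j_0/a$, in which $E_{1s} = -1/2$; the function name and the conversion factor 27.2114 eV for $j_0/a$ are assumptions introduced for illustration.

```python
import math

HARTREE_EV = 27.2114  # j0/a expressed in eV (assumed conversion factor)

def h2plus_matrix_elements(x):
    """Return (S12, H11, H12) in units of j0/a for x = R/a."""
    e1s = -0.5                                        # E_1s = -j0/(2a)
    s12 = (1.0 + x + x * x / 3.0) * math.exp(-x)      # (19.226)
    j_prime = (1.0 - (1.0 + x) * math.exp(-2.0 * x)) / x   # j0 <A|1/rB|A>
    k_prime = (1.0 + x) * math.exp(-x)                # j0 <A|1/rA|B>
    h11 = e1s - j_prime + 1.0 / x                     # (19.224)
    h12 = (e1s + 1.0 / x) * s12 - k_prime             # (19.228)
    return s12, h11, h12

def energy_levels(x):
    """Return (H12 - H11*S12, lambda_L, lambda_H) in units of j0/a."""
    s12, h11, h12 = h2plus_matrix_elements(x)
    delta = h12 - h11 * s12
    lam_plus = (h11 + h12) / (1.0 + s12)
    lam_minus = (h11 - h12) / (1.0 - s12)
    return delta, min(lam_plus, lam_minus), max(lam_plus, lam_minus)

if __name__ == "__main__":
    delta, lam_low, lam_high = energy_levels(2.0)
    print(delta, delta * HARTREE_EV, lam_low, lam_high)
```

With $R/a = 2.0$ the sign of `delta` is negative, so the lower level is the one with the plus signs, in line with Case II.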

where $\Psi_L$ and $\Psi_H$ belong to $\lambda_L$ and $\lambda_H$, respectively. The results are virtually the same as those given in (19.101) to (19.105) of Sect. 19.4.1. In (19.101) and Fig. 19.6, however, we merely assumed that $\lambda_1 = \frac{\alpha + \beta}{1 + S}$ is lower than $\lambda_2 = \frac{\alpha - \beta}{1 - S}$. Here, we have confirmed that this is truly the case. A chemical bond is formed in such a way that an electron is distributed as much as possible along the molecular axis (see Fig. 19.4 for a schematic) and that in this configuration a minimized orbital energy is achieved. In our present case, $H_{11}$ in (19.203) and $H_{22}$ in (19.204) should differ. This is also the case with $G_{11}$ in (19.211) and $G_{22}$ in (19.212). Here we return to (19.199). Suppose that we get an MO by solving, e.g., the first secular equation of (19.199) such that

$$\Psi = a_1 H^{(A_1)} + b_1 C2s.$$

From the secular equation, we get

$$a_1 = -\frac{H_{12} - \lambda S_{12}}{H_{11} - \lambda}\, b_1, \tag{19.233}$$

where $S_{12}$ and $H_{12}$ were defined as in (19.201) and (19.205), respectively. A normalized MO $\tilde{\Psi}$ is described by

$$\tilde{\Psi} = \frac{a_1 H^{(A_1)} + b_1 C2s}{\sqrt{a_1^2 + b_1^2 + 2a_1 b_1 S_{12}}}. \tag{19.234}$$

Thus, according to the two different energy eigenvalues $\lambda$ we will get two linearly independent MOs. The other three secular equations are dealt with similarly. From the energetic consideration of $H_2^+$ we infer that $a_1 b_1 > 0$ with a bonding MO and $a_1 b_1 < 0$ with an antibonding MO. Meanwhile, since both $H^{(A_1)}$ and $C2s$ belong to the irreducible representation $A_1$, so does $\tilde{\Psi}$ on the basis of the discussion on the projection operators of Sect. 18.7. Thus, we should be able to construct proper MOs that belong to $A_1$. Similarly, we get proper MOs belonging to $T_2$ by solving the other three secular equations of (19.199). In this case, three bonding MOs are triply degenerate, and so are three antibonding MOs. All these six MOs belong to the irreducible representation $T_2$. Thus, we can get a complete set of MOs for methane. These eight MOs span the representation space $V^8$.

Fig. 19.16 Probable energy diagram and MO symmetry species of methane. Adapted from http://www.science.oregonstate.edu/~gablek/CH334/Chapter1/methane_MOs.htm with kind permission of Professor Kevin P. Gable

To precisely determine the energy levels, we need to take more elaborate approaches to approximate and calculate various parameters that appear in (19.199). At the same time, we need to perform detailed experiments including spectroscopic measurements and interpret those results carefully [4]. Taking account of these situations, Fig. 19.16 [5] displays, as an example, a result of MO calculations that gives a probable energy diagram and MO symmetry species of methane. The diagram comprises a ground-state bonding $a_1$ state and its corresponding antibonding state $a_1^*$ along with triply degenerate bonding $t_2$ states and their corresponding antibonding states $t_2^*$. We emphasize that the said elaborate approaches ensue truly from the "paper-and-pencil" methods based upon group theory. Group theory thus supplies us with a powerful tool and a clear guideline for addressing various quantum chemical problems, a few of which we are introducing as examples in this book. Finally, let us examine the optical transition of methane. In this case, we have to consider electronic configurations of the initial and final states. If we are dealing with optical absorption, the initial state is the ground state $A_1$ that is described by the totally symmetric representation. The final state, on the other hand, will be an excited state, which is described by a direct-product representation related to the two states that are associated with the optical transition. The matrix element is expressed as

$$\left\langle \Theta^{(\alpha \times \beta)} \middle| \boldsymbol{\varepsilon}_e \cdot \boldsymbol{P} \middle| \Theta^{(A_1)} \right\rangle, \tag{19.235}$$

where $\Theta^{(A_1)}$ stands for an electronic configuration of the totally symmetric ground state; $\Theta^{(\alpha \times \beta)}$ denotes an electronic configuration of an excited state represented by a direct-product representation pertinent to irreducible representations $\alpha$ and $\beta$; $\boldsymbol{P}$ is an electric dipole operator and $\boldsymbol{\varepsilon}_e$ is a unit polarization vector of the electric field. From a character table for $T_d$, we find that $\boldsymbol{\varepsilon}_e \cdot \boldsymbol{P}$ belongs to the irreducible representation $T_2$ (see Table 19.12). The ground-state electronic configuration is $A_1$ (totally symmetric). It is denoted by

$$a_1^2\, t_2^2\, t_2'^2\, t_2''^2,$$

where three $T_2$ states are distinguished by a prime and double prime. For possible configurations of excited states, we have

$$\begin{aligned} & a_1^2\, t_2\, t_2'^2\, t_2''^2\, t_2^* \quad (A_1 \to T_2 \times T_2), \\ & a_1^2\, t_2\, t_2'^2\, t_2''^2\, a_1^* \quad (A_1 \to T_2 \times A_1 = T_2), \\ & a_1\, t_2^2\, t_2'^2\, t_2''^2\, t_2^* \quad (A_1 \to A_1 \times T_2 = T_2), \\ & a_1\, t_2^2\, t_2'^2\, t_2''^2\, a_1^* \quad (A_1 \to A_1 \times A_1 = A_1). \end{aligned} \tag{19.236}$$

In (19.236), we have optical excitations of $t_2 \to t_2^*$, $t_2 \to a_1^*$, $a_1 \to t_2^*$, and $a_1 \to a_1^*$, respectively. Consequently, the excited states are denoted by $T_2 \times T_2$, $T_2$, $T_2$, and $A_1$, respectively. Since $\boldsymbol{\varepsilon}_e \cdot \boldsymbol{P}$ is Hermitian as seen in (19.51) of Sect. 19.2, we have

$$\left\langle \Theta^{(\alpha \times \beta)} \middle| \boldsymbol{\varepsilon}_e \cdot \boldsymbol{P} \middle| \Theta^{(A_1)} \right\rangle = \left\langle \Theta^{(A_1)} \middle| \boldsymbol{\varepsilon}_e \cdot \boldsymbol{P} \middle| \Theta^{(\alpha \times \beta)} \right\rangle^{*}. \tag{19.237}$$

Therefore, according to the general theory of Sect. 19.2, we need to examine whether $T_2 \times D^{(\alpha)} \times D^{(\beta)}$ contains $A_1$ to judge whether the optical transition is allowed. As mentioned above, $D^{(\alpha)} \times D^{(\beta)}$ is chosen from among $T_2 \times T_2$, $T_2$, $T_2$, and $A_1$. The results are given as below:

$$T_2 \times T_2 \times T_2 = 3T_1 + 4T_2 + 2E + A_2 + A_1, \tag{19.238}$$

$$T_2 \times T_2 = T_1 + T_2 + E + A_1, \tag{19.239}$$

$$T_2 \times A_1 = T_2. \tag{19.240}$$

Both (19.238) and (19.239) contain $A_1$, and so the corresponding transitions are allowed. As for (19.240), however, the transition is forbidden because it does not contain $A_1$. In light of the character table of $T_d$ (Table 19.12), we find that the allowed transitions (i.e., $t_2 \to t_2^*$, $t_2 \to a_1^*$, and $a_1 \to t_2^*$) equally take place in the directions polarized along the x-, y-, and z-axes. This is often the case with molecules having higher symmetries such as methane. The transition $a_1 \to a_1^*$, on the other hand, is forbidden. In the above argument including the energetic consideration, we could not determine the relative magnitudes of the MO energies or of the photon energies associated with the optical transitions. Once again, this requires more accurate calculations and experiments. Yet, the discussion we have developed gives us strong guiding principles in the investigation of molecular science.
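The decompositions (19.238) to (19.240) follow from the standard reduction formula applied to the characters of $T_d$. The following minimal sketch assumes the usual character table of $T_d$ with classes ordered as E, 8C3, 3C2, 6S4, 6σd; the function names are illustrative.

```python
# Characters of T_d; class order: E, 8C3, 3C2, 6S4, 6sigma_d (standard table).
CLASS_SIZES = [1, 8, 3, 6, 6]
CHARACTERS = {
    "A1": [1, 1, 1, 1, 1],
    "A2": [1, 1, 1, -1, -1],
    "E": [2, -1, 2, 0, 0],
    "T1": [3, 0, -1, 1, -1],
    "T2": [3, 0, -1, -1, 1],
}

def direct_product(*names):
    """Characters of a direct product are products of the characters."""
    chars = [1] * len(CLASS_SIZES)
    for name in names:
        chars = [c * x for c, x in zip(chars, CHARACTERS[name])]
    return chars

def decompose(chars):
    """Multiplicity of each irreducible representation in `chars`."""
    order = sum(CLASS_SIZES)  # 24 for T_d
    result = {}
    for name, irrep in CHARACTERS.items():
        total = sum(g * c * x for g, c, x in zip(CLASS_SIZES, chars, irrep))
        multiplicity, remainder = divmod(total, order)
        assert remainder == 0  # multiplicities must be integers
        result[name] = multiplicity
    return result

def transition_allowed(*excited_state):
    """Allowed if T2 x (excited-state representation) contains A1."""
    return decompose(direct_product("T2", *excited_state))["A1"] > 0

if __name__ == "__main__":
    print(decompose(direct_product("T2", "T2", "T2")))
    print(transition_allowed("T2", "T2"), transition_allowed("A1"))
```

All characters of $T_d$ are real, so no complex conjugation is needed in the reduction formula.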


References

1. Cotton FA (1990) Chemical applications of group theory. Wiley, New York
2. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
3. Atkins P, Friedman R (2005) Molecular quantum mechanics, 4th edn. Oxford University Press, Oxford
4. Anslyn EV, Dougherty DA (2006) Modern physical organic chemistry. University Science Books, Sausalito
5. Gable KP (2014) Molecular orbitals: Methane, http://www.science.oregonstate.edu/~gablek/CH334/Chapter1/methane_MOs.htm

Chapter 20

Theory of Continuous Groups

In Chap. 16, we classified the groups into finite groups and infinite groups. We focused mostly on finite groups and their representations in the preceding chapters. Of the infinite groups, continuous groups and their properties have been widely investigated. In this chapter we think of the continuous groups as a collective term that includes Lie groups and topological groups. The continuous groups are also viewed as transformation groups. Aside from the strict definition, we will see the continuous group as a natural extension of the rotation group or SO(3) that we studied briefly in Chap. 17. Here we reconstruct SO(3) on the basis of the notion of infinitesimal rotation. We make the most of the exponential functions of matrices, the theory of which we explored in Chap. 15. In this context, we study the basic properties and representations of the special unitary group SU(2), which has close relevance to SO(3). Thus, we focus our attention on the representation theory of SU(2) and SO(3). The results are associated with the (generalized) angular momenta which we dealt with in Chap. 3. In particular, we show that the spherical surface harmonics constitute the basis functions of the representation space of SO(3). Finally, we study various important properties of SU(2) and SO(3) within a framework of Lie groups and Lie algebras. The last sections comprise the description of abstract ideas, but those are useful for us to comprehend the constitution of the group theory from a higher perspective.

20.1

Introduction: Operators of Rotation and Infinitesimal Rotation

To characterize the continuous transformation appropriately, we have to describe an infinitesimal transformation of rotation. This is because a rotation with a finite angle is attained by an infinite number of infinitesimal rotations.

© Springer Nature Singapore Pte Ltd. 2020 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3_20


First let us consider how a function $\Psi$ is transformed by the rotation (see Sect. 19.1). Here we express $\Psi$ by

$$\Psi = (\psi_1\ \psi_2\ \cdots\ \psi_d)\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_d \end{pmatrix}, \tag{20.1}$$

where $\{\psi_1, \cdots, \psi_d\}$ span the representation space relevant to the rotation; d is the dimension of the representation space; $c_1, c_2, \cdots, c_d$ are coefficients (or coordinates). Henceforth, we make it a rule to denote a vector in a representation space as in (20.1). This is a natural extension of (11.13). We regard $\psi_\nu$ ($\nu = 1, 2, \cdots, d$) as a vector component, and $\psi_\nu$ is normally expressed as a function of a position vector $\boldsymbol{x}$ in a real three-dimensional space (see Chap. 18). Considering for simplicity a one-dimensional representation space (i.e., d = 1) and omitting the index $\nu$, as a transformation of $\psi$ we have

$$R_\theta \psi(\boldsymbol{x}) = \psi\left[R_\theta^{-1}(\boldsymbol{x})\right], \tag{20.2}$$

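where the index $\theta$ indicates the rotation angle in a real space whose dimension is two or three. Note that the dimensions of the real space and representation space may or may not be the same. Equation (20.2) is essentially the same as (19.11). As a three-dimensional matrix representation of rotation, we have, e.g.,

$$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.3}$$

The matrix $R_\theta$ represents a rotation around the z-axis in $\mathbb{R}^3$; see Fig. 11.2 and (11.31). Borrowing the notation of (17.3) and (17.12), we have

$$R_\theta(\boldsymbol{x}) = (\boldsymbol{e}_1\ \boldsymbol{e}_2\ \boldsymbol{e}_3)\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{11.36}$$

Therefore, we get

$$R_\theta^{-1}(\boldsymbol{x}) = (\boldsymbol{e}_1\ \boldsymbol{e}_2\ \boldsymbol{e}_3)\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{20.4}$$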

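Let us think of an infinitesimal rotation. Taking a first-order quantity of an infinitesimal angle $\theta$, (20.4) can be approximated as

$$\begin{aligned} R_\theta^{-1}(\boldsymbol{x}) &\approx (\boldsymbol{e}_1\ \boldsymbol{e}_2\ \boldsymbol{e}_3)\begin{pmatrix} 1 & \theta & 0 \\ -\theta & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = (\boldsymbol{e}_1\ \boldsymbol{e}_2\ \boldsymbol{e}_3)\begin{pmatrix} x + \theta y \\ y - \theta x \\ z \end{pmatrix} \\ &= x\boldsymbol{e}_1 + \theta y\boldsymbol{e}_1 + y\boldsymbol{e}_2 - \theta x\boldsymbol{e}_2 + z\boldsymbol{e}_3 = [x + \theta y]\boldsymbol{e}_1 + [y - \theta x]\boldsymbol{e}_2 + z\boldsymbol{e}_3. \end{aligned} \tag{20.5}$$

Putting $\psi(\boldsymbol{x}) = \psi(x\boldsymbol{e}_1 + y\boldsymbol{e}_2 + z\boldsymbol{e}_3) \equiv \psi(x, y, z)$, we have

$$R_\theta \psi(\boldsymbol{x}) = \psi\left[R_\theta^{-1}(\boldsymbol{x})\right] \tag{20.2}$$

or

$$\begin{aligned} R_\theta \psi(x, y, z) &\approx \psi(x + \theta y,\ y - \theta x,\ z) = \psi(x, y, z) + \theta y\frac{\partial\psi}{\partial x} + (-\theta x)\frac{\partial\psi}{\partial y} \\ &= \psi(x, y, z) - i\theta(-i)\left(x\frac{\partial}{\partial y} - y\frac{\partial}{\partial x}\right)\psi(x, y, z). \end{aligned} \tag{20.6}$$

Now, using a dimensionless operator $\mathcal{M}_z$ ($= L_z/\hbar$) such that

$$\mathcal{M}_z \equiv -i\left(x\frac{\partial}{\partial y} - y\frac{\partial}{\partial x}\right), \tag{20.7}$$

we get

$$R_\theta \psi(x, y, z) \approx \psi(x, y, z) - i\theta\mathcal{M}_z\psi(x, y, z) = (1 - i\theta\mathcal{M}_z)\psi(x, y, z). \tag{20.8}$$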
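The first-order relation (20.8) can be checked on a concrete function. The sketch below compares the exactly rotated function of (20.2) with $(1 - i\theta\mathcal{M}_z)\psi$; the test function, the sample point, and the finite-difference step are arbitrary assumptions made only for illustration.

```python
import math

def psi(x, y, z):
    # Arbitrary smooth test function (an assumption made for illustration).
    return x + 2.0 * y * y + x * y

def rotate_exact(theta, x, y, z):
    """R_theta psi(x) = psi(R_theta^{-1} x), using the full inverse rotation (20.4)."""
    c, s = math.cos(theta), math.sin(theta)
    return psi(c * x + s * y, -s * x + c * y, z)

def rotate_first_order(theta, x, y, z, h=1e-5):
    """(1 - i*theta*M_z) psi with M_z = -i (x d/dy - y d/dx), as in (20.7)."""
    dpsi_dx = (psi(x + h, y, z) - psi(x - h, y, z)) / (2.0 * h)
    dpsi_dy = (psi(x, y + h, z) - psi(x, y - h, z)) / (2.0 * h)
    # -i*theta*M_z psi = -theta * (x dpsi/dy - y dpsi/dx); the factors of i cancel.
    return psi(x, y, z) - theta * (x * dpsi_dy - y * dpsi_dx)

def first_order_error(theta, point=(0.3, -0.7, 0.1)):
    x, y, z = point
    return abs(rotate_exact(theta, x, y, z) - rotate_first_order(theta, x, y, z))

if __name__ == "__main__":
    for theta in (4e-3, 2e-3, 1e-3):
        print(theta, first_order_error(theta))
```

Halving θ should reduce the discrepancy by roughly a factor of four, as expected for an approximation that is exact to first order in θ.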

Note that the operator $\mathcal{M}_z$ has already appeared in Sect. 3.5 [also see (3.19), Sect. 3.2], but in this section we denote it by $\mathcal{M}_z$ to distinguish it from the previous notation $M_z$. This is because $\mathcal{M}_z$ and $M_z$ are connected through a unitary similarity transformation (vide infra). From (20.8), we express an infinitesimal rotation $\theta$ around the z-axis as

$$R_\theta = 1 - i\theta\mathcal{M}_z. \tag{20.9}$$

Next, let us think of a rotation of a finite angle $\theta$. We have

$$R_\theta = \left(R_{\theta/n}\right)^n, \tag{20.10}$$

where the RHS of (20.10) implies that the rotation $R_\theta$ of a finite angle $\theta$ is attained by n successive infinitesimal rotations of $R_{\theta/n}$ with a large enough n. Taking the limit $n \to \infty$ [1, 2], we get

$$R_\theta = \lim_{n\to\infty}\left(R_{\theta/n}\right)^n = \lim_{n\to\infty}\left(1 - i\frac{\theta}{n}\mathcal{M}_z\right)^n = \exp(-i\theta\mathcal{M}_z). \tag{20.11}$$

Recall the following definition of an exponential function other than (15.7) [3]:

$$e^x \equiv \lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n. \tag{20.12}$$

Note that as in the case of (15.7), (20.12) holds when x represents a matrix. Thus, if the dimension d of the representation space is two or more, (20.9) should be read as

$$R_\theta = E - i\theta\mathcal{M}_z, \tag{20.13}$$

where E is a (d, d) identity matrix; $\mathcal{M}_z$ is represented by a (d, d) square matrix accordingly. As already shown in Chap. 15, an exponential function of a matrix is well defined. Since $\mathcal{M}_z$ is Hermitian, $-i\mathcal{M}_z$ is anti-Hermitian (see Chaps. 2 and 3). Thus, using (20.11) $R_\theta$ can be expressed as

$$R_\theta = \exp(\theta A_z), \tag{20.14}$$

with

$$A_z \equiv -i\mathcal{M}_z. \tag{20.15}$$

Operators of this type play an essential role in continuous groups. This is because in (20.14) the operator $A_z$ represents a Lie algebra, which in turn generates a Lie group. The Lie group is categorized as one of the continuous groups along with a topological group. In this chapter we further explore various characteristics of SO(3) and SU(2), both of which are Lie groups and have been fully investigated from various aspects. The Lie algebras frequently appear in the theory of continuous groups in combination with the Lie groups. A brief outline of the theory of Lie groups and Lie algebras will be given at the end of this chapter.
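The limit in (20.10) and (20.11) can be illustrated numerically. The sketch below takes the first-order form of the rotation matrix (20.3), obtained with cos θ → 1 and sin θ → θ, and multiplies n = 2^k copies for the angle θ/n by repeated squaring; the function names and the choice n = 2^k are assumptions made for convenience.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_z(theta):
    """Finite rotation around the z-axis, as in (20.3)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def infinitesimal_rotation_z(theta):
    """First-order form of (20.3): cos(theta) -> 1, sin(theta) -> theta."""
    return [[1.0, -theta, 0.0], [theta, 1.0, 0.0], [0.0, 0.0, 1.0]]

def product_of_small_rotations(theta, k):
    """n-fold product of the first-order rotation by theta/n, with n = 2**k."""
    m = infinitesimal_rotation_z(theta / 2**k)
    for _ in range(k):  # k squarings give the (2**k)-th power
        m = matmul(m, m)
    return m

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

if __name__ == "__main__":
    for k in (5, 10, 20):
        print(k, max_abs_diff(product_of_small_rotations(1.0, k), rotation_z(1.0)))
```

The discrepancy shrinks roughly in proportion to 1/n, which is the behavior expected from the definition (20.12).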

Let us further consider implications of (20.9) and (20.13). From those equations we have

$$\mathcal{M}_z = (1 - R_\theta)/i\theta \tag{20.16}$$

or

$$\mathcal{M}_z = (E - R_\theta)/i\theta. \tag{20.17}$$

As a very trivial case, we may think of a one-dimensional real space. Let the space extend along the z-axis and a unit vector of that space be $\boldsymbol{e}_3$. Then, any rotation $R_\theta$ around the z-axis leaves $\boldsymbol{e}_3$ unchanged (or invariant). That is, we have

$$R_\theta(\boldsymbol{e}_3) = \boldsymbol{e}_3.$$

This means $R_\theta = E$. Putting this into (20.17), we have $\mathcal{M}_z = 0$. From (20.15), we have $A_z = 0$ as well. In fact, (20.14) shows that this is the case. With the three-dimensional case, we have

$$E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad R_\theta = \begin{pmatrix} 1 & -\theta & 0 \\ \theta & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.18}$$

Therefore, from (20.17) we get

$$\mathcal{M}_z = \begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{20.19}$$

Next, viewing x, y, and z as vector components (i.e., functions), we think of the following equation:

$$\mathcal{M}_z(\mathrm{X}) = (x\ y\ z)\begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}. \tag{20.20}$$

The notation of (20.20) denotes the vector transformation in the representation space, in accordance with (11.37). In (20.20) we consider (x y z) as basis vectors as in the case of ($\boldsymbol{e}_1\ \boldsymbol{e}_2\ \boldsymbol{e}_3$) of (17.2) and (17.3). In other words, we are thinking of a vector transformation in a three-dimensional representation space spanned by a set of basis vectors (x y z). In this case, $\mathcal{M}_z$ represented by (20.19) operates on a vector (i.e., a function) $\mathrm{X}$ expressed by

$$\mathrm{X} = (x\ y\ z)\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}, \tag{20.21}$$

where $c_1$, $c_2$, and $c_3$ are coefficients (or coordinates). Again, this notation is a natural extension of (11.13). We emphasize once more that (x y z) does not represent coordinates but functions. As an example, we depict a function f(x, y) = ay (a > 0) in Fig. 20.1. The function ay (drawn green in Fig. 20.1) behaves as a vector with respect to a linear transformation like a basis vector $\boldsymbol{e}_2$ of the Cartesian coordinate. The "directionality" as a vector can easily be visualized with the function ay.

Fig. 20.1 Function f(x, y) = ay (a > 0). The function ay is drawn green. Only a part of f(x, y) > 0 is shown

Since $\mathcal{M}_z$ of (20.19) is Hermitian, it can be diagonalized by a unitary similarity transformation (Chap. 14). As in the case of, e.g., (14.96), we find a diagonalizing unitary matrix P with respect to $\mathcal{M}_z$ such that

$$P = \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & -\frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.22}$$

Using (20.22), we rewrite (20.20) as

$$\begin{aligned} \mathcal{M}_z(\mathrm{X}) &= (x\ y\ z)\,P\,P^{-1}\begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}P\,P^{-1}\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} \\ &= (x\ y\ z)\begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & -\frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ -\frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} \\ &= \left(\tfrac{1}{\sqrt{2}}x - \tfrac{i}{\sqrt{2}}y \quad -\tfrac{1}{\sqrt{2}}x - \tfrac{i}{\sqrt{2}}y \quad z\right)\begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}}c_1 + \frac{i}{\sqrt{2}}c_2 \\ -\frac{1}{\sqrt{2}}c_1 + \frac{i}{\sqrt{2}}c_2 \\ c_3 \end{pmatrix}. \end{aligned} \tag{20.23}$$

Here we define $\widetilde{M}_z$ such that

$$\widetilde{M}_z \equiv P^{-1}\mathcal{M}_z P = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{20.24}$$

Equation (20.24) was obtained by putting l = 1 and shuffling the rows and columns of (3.158) in Chap. 3. That is, choosing a (real) unitary matrix $R_3$ given by

$$R_3 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \tag{17.116}$$

we have

$$R_3^{-1}\widetilde{M}_z R_3 = R_3^{-1}P^{-1}\mathcal{M}_z P R_3 = (PR_3)^{-1}\mathcal{M}_z(PR_3) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix} = M_z.$$

Hence, $M_z$ that appeared as a diagonal matrix in (3.158) has been obtained by a unitary similarity transformation of $\mathcal{M}_z$ using a unitary operator $PR_3$. Comparing (20.20) and (20.23), we obtain useful information. If we choose (x y z) for basis vectors, we could

not have a representation that diagonalizes $\mathcal{M}_z$. If, on the other hand, we choose $\left(\frac{1}{\sqrt{2}}x - \frac{i}{\sqrt{2}}y \quad -\frac{1}{\sqrt{2}}x - \frac{i}{\sqrt{2}}y \quad z\right)$ for the basis vectors, we get a representation that diagonalizes $\mathcal{M}_z$. These basis vectors are sometimes referred to as a spherical basis. If we carefully look at the latter basis vectors (i.e., spherical basis), we notice that these vectors have the same transformation properties as the spherical harmonics $Y_1^{-1}$, $Y_1^{1}$, and $Y_1^{0}$, respectively. In fact, using (3.216) and converting the spherical coordinates to Cartesian coordinates in (3.216), we get [4]

$$\left(Y_1^{1}\ \ Y_1^{-1}\ \ Y_1^{0}\right) = \sqrt{\frac{3}{4\pi}}\,\frac{1}{r}\left(-\frac{1}{\sqrt{2}}(x + iy) \quad \frac{1}{\sqrt{2}}(x - iy) \quad z\right), \tag{20.25}$$

where r is a radial coordinate (i.e., a positive number) that appears in the polar coordinate (r, θ, ϕ). Note that the factor r contained in $\left(Y_1^{1}\ Y_1^{-1}\ Y_1^{0}\right)$ is invariant with respect to the rotation. We can generalize the results and conform them to the discussion of related characteristics in representation spaces of higher dimensions. The spherical surface harmonics occupy an important position in this respect (vide infra). From Sect. 3.5, (3.150) we have

$$M_z Y_0^0 = 0,$$

where $Y_0^0(\theta, \phi) = \sqrt{1/4\pi}$. The above relation obviously shows that $Y_0^0(\theta, \phi)$ belongs to the eigenvalue zero of $M_z$. This is consistent with the earlier comments made on the one-dimensional real space.
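The diagonalization of $\mathcal{M}_z$ by P in (20.22) to (20.24), followed by the reordering with $R_3$, can be verified with a short calculation. This is a minimal sketch using nested lists of complex numbers; the helper names are hypothetical, and the matrices are those of (20.19), (20.22), and (17.116) as reconstructed in this section.

```python
import math

S = 1.0 / math.sqrt(2.0)
M_Z = [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]]        # (20.19)
P = [[S, -S, 0], [-1j * S, -1j * S, 0], [0, 0, 1]]  # (20.22)
R3 = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]            # (17.116)

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def dagger(a):
    """Hermitian conjugate (conjugate transpose)."""
    return [[complex(a[j][i]).conjugate() for j in range(3)] for i in range(3)]

def similarity(u, m):
    """Unitary similarity transformation u^dagger m u."""
    return matmul(dagger(u), matmul(m, u))

def is_close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(3) for j in range(3))

M_TILDE = similarity(P, M_Z)              # expected: diag(-1, 1, 0)
M_DIAG = similarity(matmul(P, R3), M_Z)   # expected: diag(1, 0, -1)

if __name__ == "__main__":
    print([M_TILDE[i][i] for i in range(3)])
    print([M_DIAG[i][i] for i in range(3)])
```

Because P is unitary, its Hermitian conjugate serves as the inverse, so no explicit matrix inversion is required.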

20.2

Rotation Groups: SU(2) and SO(3)

Bearing in mind the aforementioned background knowledge, we further develop the theory of the rotation groups SU(2) and SO(3) and explore in detail their various properties. First, we deal with the topics through conventional approaches. Later we will describe them systematically. Following the argument developed above, we wish to relate an anti-Hermitian operator to a rotation operator of a finite angle $\theta$ represented by (17.12). We express Hermitian operators related to the x- and y-axes as

$$\mathcal{M}_x = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -i \\ 0 & i & 0 \end{pmatrix} \quad \text{and} \quad \mathcal{M}_y = \begin{pmatrix} 0 & 0 & i \\ 0 & 0 & 0 \\ -i & 0 & 0 \end{pmatrix}.$$

Then, anti-Hermitian operators corresponding to (20.15) are described by

$$A_z = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad A_x = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \quad A_y = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \tag{20.26}$$

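where $A_x \equiv -i\mathcal{M}_x$ and $A_y \equiv -i\mathcal{M}_y$. These are characterized by real skew-symmetric matrices. Using (15.7), we calculate $\exp(\theta A_z)$ such that

$$\exp(\theta A_z) = E + \theta A_z + \frac{1}{2!}(\theta A_z)^2 + \frac{1}{3!}(\theta A_z)^3 + \frac{1}{4!}(\theta A_z)^4 + \frac{1}{5!}(\theta A_z)^5 + \cdots. \tag{20.27}$$

We note that

$$A_z^2 = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix} = A_z^6, \quad A_z^4 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} = A_z^8, \ \text{etc.};$$

$$A_z^3 = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = A_z^7, \quad A_z^5 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = A_z^9, \ \text{etc.}$$

Thus, we get

$$\begin{aligned} \exp(\theta A_z) &= \left[E + \frac{1}{2!}(\theta A_z)^2 + \frac{1}{4!}(\theta A_z)^4 + \cdots\right] + \left[\theta A_z + \frac{1}{3!}(\theta A_z)^3 + \frac{1}{5!}(\theta A_z)^5 + \cdots\right] \\ &= \begin{pmatrix} 1 - \frac{1}{2!}\theta^2 + \frac{1}{4!}\theta^4 - \cdots & 0 & 0 \\ 0 & 1 - \frac{1}{2!}\theta^2 + \frac{1}{4!}\theta^4 - \cdots & 0 \\ 0 & 0 & 1 \end{pmatrix} + \begin{pmatrix} 0 & -\left(\theta - \frac{1}{3!}\theta^3 + \frac{1}{5!}\theta^5 - \cdots\right) & 0 \\ \theta - \frac{1}{3!}\theta^3 + \frac{1}{5!}\theta^5 - \cdots & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \\ &= \begin{pmatrix} \cos\theta & 0 & 0 \\ 0 & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} + \begin{pmatrix} 0 & -\sin\theta & 0 \\ \sin\theta & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \end{aligned} \tag{20.28}$$

Hence, we recover the matrix form of (17.12). Similarly, we get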
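The series (20.27) can also be summed numerically and compared with the closed form (20.28). The sketch below truncates the series after a fixed number of terms; the truncation length and the helper names are assumptions made for illustration, and the same routine is applied to $A_y$ of (20.26).

```python
import math

A_Z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
A_Y = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def exp_series(generator, theta, terms=30):
    """Partial sum of E + theta*A + (theta*A)^2/2! + ... with `terms` terms."""
    result = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term_k = term_{k-1} * (theta * A) / k
        term = [[entry * theta / k for entry in row]
                for row in matmul(term, generator)]
        result = [[result[i][j] + term[i][j] for j in range(3)] for i in range(3)]
    return result

def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rotation_y(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

if __name__ == "__main__":
    print(max_abs_diff(exp_series(A_Z, 0.7), rotation_z(0.7)))
    print(max_abs_diff(exp_series(A_Y, 0.7), rotation_y(0.7)))
```

For moderate angles thirty terms are far more than needed; the factorial in the denominator makes the series converge quickly.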

$$\exp(\varphi A_x) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\varphi & -\sin\varphi \\ 0 & \sin\varphi & \cos\varphi \end{pmatrix}, \tag{20.29}$$

$$\exp(\phi A_y) = \begin{pmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{pmatrix}. \tag{20.30}$$

Of the above operators, using (20.28) and (20.30) we recover (17.101) related to the Euler angles such that

$$\widetilde{R}_3 = \exp(\alpha A_z)\exp(\beta A_y)\exp(\gamma A_z). \tag{20.31}$$

Using exponential functions of matrices, (20.31) gives a transformation matrix containing the Euler angles and supplies us with a direct method to be able to make a matrix representation of SO(3). Although (20.31) is useful for the matrix representation in the real space, its extension to abstract representation spaces would somewhat be limited. In this respect, the method developed in SU(2) can widely be used to produce representation matrices for a representation space of an arbitrary dimension. Let us start with constructing those representation matrices using (2, 2) complex matrices.

20.2.1 Construction of SU(2) Matrices

In Sect. 16.1 we mentioned a general linear group GL(n, ℂ). There are many subgroups of GL(n, ℂ). Among them, SU(n), in particular SU(2), plays a central role in theoretical physics and chemistry. In this section we examine how to construct SU(2) matrices.

Definition 20.1 A group consisting of (n, n) unitary matrices with a determinant 1 is called a special unitary group of degree n and denoted by SU(n).

In Chap. 14 we mentioned that the "absolute value" of the determinant of a unitary matrix is 1. In Definition 20.1, on the other hand, there is an additional constraint in that we are dealing with unitary matrices whose determinant is 1. The SU(2) matrices U have the following general form:

$$U = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \tag{20.32}$$

where a, b, c, and d are complex numbers. According to Definition 20.1 we seek conditions for these numbers. The unitarity condition $UU^\dagger = U^\dagger U = E$ reads as

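$$\begin{aligned} \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} a^* & c^* \\ b^* & d^* \end{pmatrix} &= \begin{pmatrix} |a|^2 + |b|^2 & ac^* + bd^* \\ a^*c + b^*d & |c|^2 + |d|^2 \end{pmatrix} \\ = \begin{pmatrix} a^* & c^* \\ b^* & d^* \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix} &= \begin{pmatrix} |a|^2 + |c|^2 & a^*b + c^*d \\ ab^* + cd^* & |b|^2 + |d|^2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \end{aligned} \tag{20.33}$$

Then we have (i) $|a|^2 + |b|^2 = 1$, (ii) $|c|^2 + |d|^2 = 1$, (iii) $|a|^2 + |c|^2 = 1$, (iv) $|b|^2 + |d|^2 = 1$, (v) $a^*c + b^*d = 0$, (vi) $a^*b + c^*d = 0$, (vii) $ad - bc = 1$. Of the above conditions, (vii) comes from det U = 1. Condition (iv) can be eliminated by conditions (i)–(iii). From (v) and (vii), using Cramer's rule we have

$$c = \frac{\begin{vmatrix} 0 & b^* \\ 1 & a \end{vmatrix}}{\begin{vmatrix} a^* & b^* \\ -b & a \end{vmatrix}} = \frac{-b^*}{|a|^2 + |b|^2} = -b^*, \quad d = \frac{\begin{vmatrix} a^* & 0 \\ -b & 1 \end{vmatrix}}{\begin{vmatrix} a^* & b^* \\ -b & a \end{vmatrix}} = \frac{a^*}{|a|^2 + |b|^2} = a^*. \tag{20.34}$$

From (20.34) we see that (i)–(iv) are equivalent and that (v) and (vi) are redundant. Thus, as a general form of U we get

$$U = \begin{pmatrix} a & b \\ -b^* & a^* \end{pmatrix} \quad \text{with} \quad |a|^2 + |b|^2 = 1. \tag{20.35}$$

As a result, we have four degrees of freedom with a and b, subject to the restricting condition $|a|^2 + |b|^2 = 1$. That is, we finally get three degrees of freedom in the choice of parameters. Putting

$$a = p + iq \quad \text{and} \quad b = r + is \quad (p, q, r, s: \text{real}),$$

we have

$$p^2 + q^2 + r^2 + s^2 = 1. \tag{20.36}$$

Therefore, the restricting condition of (20.35) is equivalent to regarding p, q, r, and s as coordinates of a point positioned on the unit sphere of $\mathbb{R}^4$. From (20.35), we have

$$U^{-1} = U^\dagger = \begin{pmatrix} a^* & -b \\ b^* & a \end{pmatrix}.$$

Also putting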

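$$U_1 = \begin{pmatrix} a_1 & b_1 \\ -b_1^* & a_1^* \end{pmatrix} \quad \text{and} \quad U_2 = \begin{pmatrix} a_2 & b_2 \\ -b_2^* & a_2^* \end{pmatrix}, \tag{20.37}$$

we get

$$U_1 U_2 = \begin{pmatrix} a_1 a_2 - b_1 b_2^* & a_1 b_2 + b_1 a_2^* \\ -a_2 b_1^* - a_1^* b_2^* & -b_1^* b_2 + a_1^* a_2^* \end{pmatrix}. \tag{20.38}$$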

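Thus, both $U^{-1}$ and $U_1U_2$ satisfy criteria (20.35) and, hence, the unitary matrices having a matrix form of (20.35) constitute a group; i.e., SU(2). Next we wish to connect the SU(2) matrices to the matrices that represent the angular momentum. In Sect. 3.4 we developed the theory of generalized angular momentum. In the case of j = 1/2 we can obtain accurate information about the spin states. Using (3.101) as a starting point and defining the states belonging to the spin angular momentum $\mu = 1/2$ and $\mu = -1/2$ as $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, respectively, we obtain

$$J^{(+)} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad J^{(-)} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}. \tag{20.39}$$

Note that $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ are in accordance with (2.64) as a column vector representation. Using (3.72), we have

$$J_x = \frac{1}{2}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad J_y = \frac{1}{2}\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad J_z = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{20.40}$$

In (20.40) $J_z$ was given so that we may have $J_z\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $J_z\begin{pmatrix} 0 \\ 1 \end{pmatrix} = -\frac{1}{2}\begin{pmatrix} 0 \\ 1 \end{pmatrix}$. Deleting the coefficient $\frac{1}{2}$ from (20.40), we get Pauli spin matrices such that

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{20.41}$$

Multiplying (20.40) by i, we have anti-Hermitian operators described by

$$\zeta_x = \frac{1}{2}\begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}, \quad \zeta_y = \frac{1}{2}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad \zeta_z = \frac{1}{2}\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}. \tag{20.42}$$

Notice that these are traceless anti-Hermitian matrices.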

20.2

Rotation Groups: SU(2) and SO(3)

811

Now let us seek rotation matrices in a manner related to that deriving (20.31) but at a more general level. From Property (7)0 of Chap. 15 (see Sect. 15.2), we know that exp ðtAÞ with an anti-Hermitian operator A combined with a real number t produces a unitary matrix. Let us apply this to the anti-Hermitian matrices described by (20.42). Using ζ z of (20.42), we calculate exp(aζ z) as in (20.27). We show only the results described by 0 α α α α @ cos 2 þ i sin 2 exp ðαζ z Þ ¼ E cos iσ z sin ¼ 2 2 0 ! iα e2 0 ¼ : iα 0 e 2

1

0 cos

α α i sin 2 2

A

ð20:43Þ

Readers are recommended to derive this. Using $\zeta_y$ and applying similar calculation procedures to the above, we also get

$$\exp(\beta\zeta_y) = \begin{pmatrix} \cos\frac{\beta}{2} & \sin\frac{\beta}{2} \\ -\sin\frac{\beta}{2} & \cos\frac{\beta}{2} \end{pmatrix}. \tag{20.44}$$

The representation matrices describe “complex rotations” and (20.43) and (20.44) correspond to (20.28) and (20.30), respectively. We define $D^{(1/2)}_{\alpha,\beta,\gamma}$ as the representation matrix that describes the successive rotations $\alpha$, $\beta$, and $\gamma$ and corresponds to $\widetilde{R_3}$ in (17.101) and (20.31) including Euler angles. As a result, we have

$$D^{(1/2)}_{\alpha,\beta,\gamma} = \exp(\alpha\zeta_z)\exp(\beta\zeta_y)\exp(\gamma\zeta_z) = \begin{pmatrix} e^{\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2} & e^{\frac{i}{2}(\alpha-\gamma)}\sin\frac{\beta}{2} \\ -e^{\frac{i}{2}(\gamma-\alpha)}\sin\frac{\beta}{2} & e^{-\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2} \end{pmatrix}, \tag{20.45}$$

where the superscript $\frac{1}{2}$ of $D^{(1/2)}_{\alpha,\beta,\gamma}$ corresponds to the non-negative generalized angular momentum $j = \frac{1}{2}$. Equation (20.45) can be obtained by putting $a = e^{\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2}$ and $b = e^{\frac{i}{2}(\alpha-\gamma)}\sin\frac{\beta}{2}$ in the matrix $U$ of (20.35). Using this general SU(2) representation matrix, we construct representation matrices of higher orders.
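The product in (20.45) can be checked numerically. The following is a minimal sketch, not part of the original derivation: it exponentiates the generators of (20.42) with a truncated power series and compares the product with the closed form built from $a$ and $b$. The helper names (`expm_series`, `d_half`, `closed_form`) are hypothetical, and the sign conventions are those reconstructed in (20.41) to (20.45), i.e., $\sigma_z = \mathrm{diag}(-1, 1)$.

```python
import cmath
import math

def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def expm_series(M, terms=40):
    """exp(M) by a truncated power series (adequate for small matrices)."""
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

# Anti-Hermitian generators zeta_y and zeta_z of (20.42).
ZETA_Y = [[0.0, 0.5], [-0.5, 0.0]]
ZETA_Z = [[0.5j, 0.0], [0.0, -0.5j]]

def scaled(t, M):
    return [[t * x for x in row] for row in M]

def d_half(alpha, beta, gamma):
    """exp(alpha zeta_z) exp(beta zeta_y) exp(gamma zeta_z), left side of (20.45)."""
    return matmul(matmul(expm_series(scaled(alpha, ZETA_Z)),
                         expm_series(scaled(beta, ZETA_Y))),
                  expm_series(scaled(gamma, ZETA_Z)))

def closed_form(alpha, beta, gamma):
    """Matrix U of (20.35) with a and b chosen as in the text after (20.45)."""
    a = cmath.exp(0.5j * (alpha + gamma)) * math.cos(beta / 2)
    b = cmath.exp(0.5j * (alpha - gamma)) * math.sin(beta / 2)
    return [[a, b], [-b.conjugate(), a.conjugate()]]

def max_diff(A, B):
    return max(abs(A[i][j] - B[i][j])
               for i in range(len(A)) for j in range(len(A[0])))

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]
```

For any real Euler angles, `max_diff(d_half(...), closed_form(...))` should be at rounding level and `det2` should equal 1, in line with the matrix belonging to SU(2).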

812

20

Theory of Continuous Groups

20.2.2 SU(2) Representation Matrices: Wigner Formula

Suppose that we have two functions (or vectors) $v$ and $u$. We regard $v$ and $u$ as linearly independent vectors that undergo a unitary transformation by a unitary matrix $U$ of (20.35). That is, we have

$$(v' \; u') = (v \; u)U = (v \; u)\begin{pmatrix} a & b \\ -b^* & a^* \end{pmatrix}, \tag{20.46}$$

where $(v \; u)$ are transformed into $(v' \; u')$. Namely, we have

$$v' = av - b^*u, \quad u' = bv + a^*u.$$

Let us further define according to Hamermesh [5] an orthonormal set of $2j + 1$ functions such that

$$f_m = \frac{u^{j+m}v^{j-m}}{\sqrt{(j+m)!(j-m)!}} \quad (m = -j, -j+1, \ldots, j-1, j), \tag{20.47}$$

where $j$ is an integer or a half-odd-integer. Equation (20.47) implies that $2j + 1$ monomials $f_m$ of degree $2j$ with respect to $v$ and $u$ constitute the orthonormal basis. Related discussion can be seen in Sect. 20.3.2. The description that follows includes contents related to generalized angular momenta (see Sect. 3.4). We examine how these functions $f_m$ (or vectors) are transformed by (20.45). We denote this transformation by $\mathfrak{R}(a, b)$. That is, we have

$$\mathfrak{R}(a, b)(f_m) \equiv \sum_{m'} f_{m'} D^{(j)}_{m'm}(a, b), \tag{20.48}$$

where $D^{(j)}_{m'm}(a, b)$ denotes matrix elements with respect to the transformation. Replacing $u$ and $v$ with $u'$ and $v'$, respectively, we get

$$\begin{aligned}
\mathfrak{R}(a,b) f_m &= \frac{1}{\sqrt{(j+m)!(j-m)!}}\,(bv + a^*u)^{j+m}(av - b^*u)^{j-m} \\
&= \sum_{\mu,\nu} \frac{(j+m)!(j-m)!}{(j+m-\mu)!\,\mu!\,(j-m-\nu)!\,\nu!}\,\frac{1}{\sqrt{(j+m)!(j-m)!}}\,(a^*u)^{j+m-\mu}(bv)^{\mu}(-b^*u)^{j-m-\nu}(av)^{\nu} \\
&= \sum_{\mu,\nu} \frac{\sqrt{(j+m)!(j-m)!}}{(j+m-\mu)!\,\mu!\,(j-m-\nu)!\,\nu!}\,(a^*)^{j+m-\mu}\,b^{\mu}\,(-b^*)^{j-m-\nu}\,a^{\nu}\,u^{2j-\mu-\nu}\,v^{\mu+\nu}.
\end{aligned} \tag{20.49}$$


Here, to eliminate $\nu$, putting

$$2j - \mu - \nu = j + m', \quad \mu + \nu = j - m',$$

we get

$$\mathfrak{R}(a, b) f_m = \sum_{\mu, m'} \frac{\sqrt{(j+m)!(j-m)!}}{(j+m-\mu)!\,\mu!\,(m'-m+\mu)!\,(j-m'-\mu)!}\,(a^*)^{j+m-\mu}\,b^{\mu}\,(-b^*)^{m'-m+\mu}\,a^{j-m'-\mu}\,u^{j+m'}\,v^{j-m'}. \tag{20.50}$$

Expressing $f_{m'}$ as

$$f_{m'} = \frac{u^{j+m'}v^{j-m'}}{\sqrt{(j+m')!(j-m')!}} \quad (m' = -j, -j+1, \ldots, j-1, j), \tag{20.51}$$

we obtain

$$\mathfrak{R}(a, b)(f_m) = \sum_{\mu, m'} f_{m'}\,\frac{\sqrt{(j+m)!(j-m)!(j+m')!(j-m')!}}{(j+m-\mu)!\,\mu!\,(m'-m+\mu)!\,(j-m'-\mu)!}\,(a^*)^{j+m-\mu}\,b^{\mu}\,(-b^*)^{m'-m+\mu}\,a^{j-m'-\mu}. \tag{20.52}$$

Equating (20.52) with (20.48), we get

$$D^{(j)}_{m'm}(a, b) = \sum_{\mu} \frac{\sqrt{(j+m)!(j-m)!(j+m')!(j-m')!}}{(j+m-\mu)!\,\mu!\,(m'-m+\mu)!\,(j-m'-\mu)!}\,(a^*)^{j+m-\mu}\,b^{\mu}\,(-b^*)^{m'-m+\mu}\,a^{j-m'-\mu}. \tag{20.53}$$

In (20.53), factorials of negative integers must be avoided; see (3.216). Finally, replacing $a$ (or $a^*$) and $b$ (or $b^*$) with $e^{\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2}$ [or $e^{-\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2}$] and $e^{\frac{i}{2}(\alpha-\gamma)}\sin\frac{\beta}{2}$ [or $e^{\frac{i}{2}(\gamma-\alpha)}\sin\frac{\beta}{2}$] by use of (20.45), respectively, we get

$$D^{(j)}_{m'm}(\alpha, \beta, \gamma) = \sum_{\mu} \frac{(-1)^{m'-m+\mu}\sqrt{(j+m)!(j-m)!(j+m')!(j-m')!}}{(j+m-\mu)!\,\mu!\,(m'-m+\mu)!\,(j-m'-\mu)!}\; e^{-i(\alpha m' + \gamma m)} \left(\cos\frac{\beta}{2}\right)^{2j+m-m'-2\mu} \left(\sin\frac{\beta}{2}\right)^{m'-m+2\mu}. \tag{20.54}$$

Equation (20.54) is called the Wigner formula [5–7].

To confirm that $D^{(j)}_{m'm}$ is indeed unitary, we carry out the following calculations. From (20.47) we have

$$\sum_{m=-j}^{j} |f_m|^2 = \sum_{m=-j}^{j} \frac{\left|u^{j+m}v^{j-m}\right|^2}{(j+m)!(j-m)!} = \sum_{m=-j}^{j} \frac{|u|^{2(j+m)}|v|^{2(j-m)}}{(j+m)!(j-m)!}.$$

Meanwhile, we have

$$\left[|u|^2 + |v|^2\right]^{2j} = \sum_{k=0}^{2j} \frac{(2j)!\,|u|^{2(2j-k)}|v|^{2k}}{(2j-k)!\,k!} = \sum_{m=-j}^{j} \frac{(2j)!\,|u|^{2(j+m)}|v|^{2(j-m)}}{(j+m)!(j-m)!},$$

where with the last equality $k$ was replaced with $j - m$. Comparing the above two equations, we get

$$\sum_{m=-j}^{j} |f_m|^2 = \frac{1}{(2j)!}\left[|u|^2 + |v|^2\right]^{2j}. \tag{20.55}$$

Viewing (20.46) as a transformation of variables and taking the adjoint of (20.46), we have

$$\begin{pmatrix} v'^* \\ u'^* \end{pmatrix} = U^{\dagger}\begin{pmatrix} v^* \\ u^* \end{pmatrix}, \tag{20.56}$$

where we defined $U = \begin{pmatrix} a & b \\ -b^* & a^* \end{pmatrix}$ (i.e., a unitary matrix). Operating (20.56) on both sides of (20.46) from the right, we get

$$(v' \; u')\begin{pmatrix} v'^* \\ u'^* \end{pmatrix} = (v \; u)\,UU^{\dagger}\begin{pmatrix} v^* \\ u^* \end{pmatrix} = (v \; u)\begin{pmatrix} v^* \\ u^* \end{pmatrix}.$$

That is, we have a quadratic form described by

$$|u'|^2 + |v'|^2 = |u|^2 + |v|^2. \tag{20.57}$$

From (20.55) and (20.57), we conclude that $\sum_{m=-j}^{j} |f_m|^2$ (i.e., the length of the vector) is invariant under the transformation $D^{(j)}_{m'm}$. This implies that the transformation is unitary and, hence, that $D^{(j)}_{m'm}$ is a unitary matrix [5].
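As an illustration, the Wigner formula (20.54) translates almost line by line into code. The sketch below is an assumption-laden aid rather than the book's own procedure: the function names (`wigner_D`, `wigner_matrix`, `unitarity_defect`) are hypothetical, rows and columns are ordered $m', m = -j, \ldots, j$ as in (20.47), and the summation range of $\mu$ is chosen so that no factorial of a negative integer occurs.

```python
import cmath
import math
from fractions import Fraction

def wigner_D(j, mp, m, alpha, beta, gamma):
    """Matrix element D^{(j)}_{m'm}(alpha, beta, gamma) of (20.54)."""
    j, mp, m = Fraction(j), Fraction(mp), Fraction(m)
    f = math.factorial
    pref = math.sqrt(f(int(j + m)) * f(int(j - m)) * f(int(j + mp)) * f(int(j - mp)))
    c, s = math.cos(beta / 2), math.sin(beta / 2)
    # Keep every factorial argument non-negative.
    mu_min = max(0, int(m - mp))
    mu_max = min(int(j + m), int(j - mp))
    total = 0.0
    for mu in range(mu_min, mu_max + 1):
        sign = (-1) ** int(mp - m + mu)
        denom = (f(int(j + m) - mu) * f(mu)
                 * f(int(mp - m) + mu) * f(int(j - mp) - mu))
        total += (sign * pref / denom
                  * c ** int(2 * j + m - mp - 2 * mu)
                  * s ** int(mp - m + 2 * mu))
    return cmath.exp(-1j * (alpha * float(mp) + gamma * float(m))) * total

def wigner_matrix(j, alpha, beta, gamma):
    """Full (2j+1, 2j+1) matrix with indices running from -j to j."""
    j = Fraction(j)
    ms = [-j + k for k in range(int(2 * j) + 1)]
    return [[wigner_D(j, mp, m, alpha, beta, gamma) for m in ms] for mp in ms]

def unitarity_defect(D):
    """Largest deviation of D^dagger D from the unit matrix."""
    n = len(D)
    return max(abs(sum(D[k][i].conjugate() * D[k][j] for k in range(n))
                   - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))
```

With `j = Fraction(1, 2)` the matrix should reproduce (20.45), and `unitarity_defect` should stay at rounding level for any $j$, consistent with the conclusion drawn from (20.55) and (20.57).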

20.2.3 SO(3) Representation Matrices and Spherical Surface Harmonics

Once representation matrices $D^{(j)}(\alpha, \beta, \gamma)$ have been obtained, we are able to get further useful information. We immediately notice that (20.54) is related to (3.216) that is explicitly described by trigonometric functions. Putting $m = 0$ in (20.54) we have

$$D^{(j)}_{m'0}(\alpha, \beta, \gamma) = D^{(j)}_{m'0}(\alpha, \beta) = \sum_{\mu} \frac{(-1)^{m'+\mu}\, j!\,\sqrt{(j+m')!(j-m')!}}{(j-\mu)!\,\mu!\,(m'+\mu)!\,(j-m'-\mu)!}\; e^{-i\alpha m'} \left(\cos\frac{\beta}{2}\right)^{2j-m'-2\mu} \left(\sin\frac{\beta}{2}\right)^{m'+2\mu}. \tag{20.58}$$

Note that (20.58) does not depend on the variable $\gamma$, because $m = 0$ leads to $e^{-im\gamma} = e^{-i \cdot 0 \cdot \gamma} = e^0 = 1$ in (20.54). Notice also that the event of $m = 0$ in (20.54) never happens with the case where $j$ is a half-odd-integer, but happens with the case where $j$ is an integer $l$. In fact, $D^{(l)}(\alpha, \beta, \gamma)$ is a $(2l + 1)$-degree representation of SO(3). Later in this section, we will give a full matrix form in the case of $l = 1$. Replacing $m'$ with $m$ in (20.58) we get

$$D^{(j)}_{m0}(\alpha, \beta, \gamma) = \sum_{\mu} \frac{(-1)^{m+\mu}\, j!\,\sqrt{(j+m)!(j-m)!}}{(j-\mu)!\,\mu!\,(m+\mu)!\,(j-m-\mu)!}\; e^{-i\alpha m} \left(\cos\frac{\beta}{2}\right)^{2j-m-2\mu} \left(\sin\frac{\beta}{2}\right)^{m+2\mu}. \tag{20.59}$$

For (20.59) to be consistent with (3.216), by changing notations of variables and replacing $j$ with an integer $l$ we further rewrite (20.59) to get

$$D^{(l)}_{m0}(\phi, \theta) = \sum_{r} \frac{(-1)^{m+r}\, l!\,\sqrt{(l+m)!(l-m)!}}{r!\,(l-m-r)!\,(l-r)!\,(m+r)!}\; e^{-im\phi} \left(\cos\frac{\theta}{2}\right)^{2l-2r-m} \left(\sin\frac{\theta}{2}\right)^{2r+m}. \tag{20.60}$$

Notice that in (20.60) we deleted the variable $\gamma$ as it was redundant. Comparing (20.60) with (3.216), we get [5]

$$\left[D^{(l)}_{m0}(\phi, \theta)\right]^* = \sqrt{\frac{4\pi}{2l+1}}\; Y_l^m(\theta, \phi). \tag{20.61}$$
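Relation (20.61) lends itself to a quick numerical spot check. The sketch below evaluates the sum (20.60) directly and compares it with explicit low-order harmonics. It assumes the usual Condon–Shortley phase convention for $Y_1^m$ (taken here to be what (3.216) yields) and the Legendre polynomial $P_2$; the function names `D_m0` and `Y1` are hypothetical.

```python
import cmath
import math

def D_m0(l, m, phi, theta):
    """D^{(l)}_{m0}(phi, theta) evaluated from the sum (20.60)."""
    f = math.factorial
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    total = 0.0
    # r runs over values that keep every factorial argument non-negative.
    for r in range(max(0, -m), min(l, l - m) + 1):
        total += ((-1) ** (m + r) * f(l) * math.sqrt(f(l + m) * f(l - m))
                  / (f(r) * f(l - m - r) * f(l - r) * f(m + r))
                  * c ** (2 * l - 2 * r - m) * s ** (2 * r + m))
    return cmath.exp(-1j * m * phi) * total

def Y1(m, theta, phi):
    """Y_1^m in the Condon-Shortley convention (an assumption of this sketch)."""
    if m == 0:
        return math.sqrt(3 / (4 * math.pi)) * math.cos(theta)
    amp = math.sqrt(3 / (8 * math.pi)) * math.sin(theta)
    return -m * amp * cmath.exp(1j * m * phi)  # m = +1 or -1
```

For $l = 1$, the complex conjugate of `D_m0(1, m, phi, theta)` should equal $\sqrt{4\pi/3}\,Y_1^m(\theta, \phi)$, and `D_m0(2, 0, phi, theta)` should equal $(3\cos^2\theta - 1)/2$.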

From a general point of view, let us return to a discussion of SU(2) and introduce the following important theorem in relation to the basis functions of SU(2).

Theorem 20.1 [1] Let $\{\psi_{-j}, \psi_{-j+1}, \ldots, \psi_j\}$ be a set of $(2j + 1)$ functions. Let $D^{(j)}$ be a representation of the special unitary group SU(2). Suppose that we have the following expression for $\psi_{m,k}$ $(m = -j, \ldots, j)$ such that

$$\psi_{m,k}(\alpha, \beta, \gamma) = \left[D^{(j)}_{mk}(\alpha, \beta, \gamma)\right]^*, \tag{20.62}$$

where $\alpha$, $\beta$, and $\gamma$ are Euler angles that appeared in (17.101). Then, the above functions are basis functions of the representation of $D^{(j)}$.

Fig. 20.2 Coordinate systems O, I, and II and their transformation. Regarding the symbols and notations, see text

Proof In Fig. 20.2 we show coordinate systems O, I, and II. Let $P$, $Q$, $R$ be operators of transformation (i.e., rotation) between the coordinate systems as shown. The transformation $Q$ is defined as the coordinate transformation that transforms I to II. Note that their inverse transformations, e.g., $Q^{-1}$, exist. In Fig. 20.2, $P$ and $R$ are specified as $P(\alpha', \beta', \gamma')$ and $R(\alpha, \beta, \gamma)$, respectively. Then, we have [1, 2]

$$Q^{-1}R = P. \tag{20.63}$$

According to the expression (19.12), we have

$$Q\psi_{m,k}(\alpha, \beta, \gamma) = \psi_{m,k}\left[Q^{-1}(\alpha, \beta, \gamma)\right] = \psi_{m,k}(\alpha', \beta', \gamma') = \left[D^{(j)}_{mk}\left(Q^{-1}R\right)\right]^*,$$

where the last equality comes from (20.62) and (20.63); $(\alpha, \beta, \gamma)$ and $(\alpha', \beta', \gamma')$ stand for the transformations $R(\alpha, \beta, \gamma)$ and $P(\alpha', \beta', \gamma')$, respectively. The matrix element $D^{(j)}_{mk}\left(Q^{-1}R\right)$ can be written as

$$D^{(j)}_{mk}\left(Q^{-1}R\right) = \sum_n D^{(j)}_{mn}\left(Q^{-1}\right) D^{(j)}_{nk}(R) = \sum_n \left[D^{(j)}_{nm}(Q)\right]^* D^{(j)}_{nk}(R),$$

where with the last equality we used the unitarity of the representation matrix $D^{(j)}$. Taking the complex conjugate of the above expression, we get

$$\begin{aligned}
Q\psi_{m,k}(\alpha, \beta, \gamma) = \psi_{m,k}(\alpha', \beta', \gamma') &= \sum_n D^{(j)}_{nm}(Q)\left[D^{(j)}_{nk}(R)\right]^* \\
&= \sum_n D^{(j)}_{nm}(Q)\left[D^{(j)}_{nk}(\alpha, \beta, \gamma)\right]^* = \sum_n \psi_{n,k}(\alpha, \beta, \gamma)\,D^{(j)}_{nm}(Q),
\end{aligned} \tag{20.64}$$

where with the last equality we used the supposition (20.62). Equation (20.64) implies that $\psi_{m,k}$ $(m = -j, \ldots, 0, \ldots, j)$ are basis functions of the representation of $D^{(j)}$. This completes the proof. ∎

If we restrict Theorem 20.1 to SO(3), we get an important result. Replacing $j$ with an integer $l$ and putting $k = 0$ in (20.62) and (20.64), we get

$$\begin{aligned}
Q\psi_{m,0}(\alpha, \beta, \gamma) &= \psi_{m,0}(\alpha', \beta', \gamma') = \sum_n \psi_{n,0}(\alpha, \beta, \gamma)\,D^{(l)}_{nm}(Q), \\
\psi_{m,0}(\alpha, \beta, \gamma) &= \left[D^{(l)}_{m0}(\alpha, \beta, \gamma)\right]^*.
\end{aligned} \tag{20.65}$$

This expression shows that the complex conjugates of individual components for the “center” column vector of the representation matrix give the basis functions of the representation of $D^{(l)}$. Removing $\gamma$ as it is redundant and by the aid of (20.61), we get

$$\psi_{m,0}(\alpha, \beta) \equiv \sqrt{\frac{4\pi}{2l+1}}\; Y_l^m(\beta, \alpha).$$

Thus, in combination with (20.61), Theorem 20.1 immediately indicates that the spherical surface harmonics $Y_l^m(\theta, \phi)$ $(m = -l, \ldots, 0, \ldots, l)$ span the representation space with respect to $D^{(l)}$. That is, we have

$$QY_l^m(\beta, \alpha) = \sum_n Y_l^n(\beta, \alpha)\,D^{(l)}_{nm}(Q). \tag{20.66}$$

Now, let us get back to (20.63). From (20.63), we have

$$Q = RP^{-1}. \tag{20.67}$$

Since $P$ represents an arbitrary transformation, we may choose any (orthogonal) coordinate system I in $\mathbb{R}^3$. Meanwhile, a transformation which transforms I into II is necessarily present and given by $Q$ that is expressed as (20.67); see (11.72) and the relevant argument of Sect. 11.4. Consequently, $Q$ represents an arbitrary transformation in $\mathbb{R}^3$ in turn.

Now, in (20.67) putting $P = E$ we get $Q \equiv R(\alpha, \beta, \gamma)$. Then, replacing $Q$ with $R$ in (20.66) as a special case we have

$$R(\alpha, \beta, \gamma)\,Y_l^m(\beta, \alpha) = \sum_n Y_l^n(\beta, \alpha)\,D^{(l)}_{nm}(\alpha, \beta, \gamma). \tag{20.68}$$

The LHS of (20.68) is associated with three successive transformations. To describe it appropriately, let us recall (19.12) again. We have

$$Rf(\mathbf{r}) = f\left[R^{-1}(\mathbf{r})\right]. \tag{19.12}$$

Now, we are thinking of the case where $R$ in (19.12) is described by $R(\alpha, \beta, \gamma)$ or $\widetilde{R_3} = R_{z\alpha}R'_{y'\beta}R''_{z''\gamma}$ in (17.101). That is, we have

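$$R(\alpha, \beta, \gamma)f(\mathbf{r}) = f\left[R(\alpha, \beta, \gamma)^{-1}(\mathbf{r})\right].$$

Considering the constitution of $\widetilde{R_3}$ of (17.101), we readily get [5]

$$R(\alpha, \beta, \gamma)^{-1} = R(-\gamma, -\beta, -\alpha).$$

Thus, we have

$$R(\alpha, \beta, \gamma)f(\mathbf{r}) = f\left[R(-\gamma, -\beta, -\alpha)(\mathbf{r})\right]. \tag{20.69}$$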

Applying (20.69) to (20.68), we get the successive (inverse) transformations of the functions such that

$$Y_l^m(\beta, \alpha) \to Y_l^m(\beta, \alpha - \gamma) \to Y_l^m(\beta - \beta, \alpha - \gamma) \to Y_l^m(\beta - \beta, \alpha - \gamma - \alpha).$$

The last entry in the above is [5]

$$Y_l^m(\beta - \beta, \alpha - \gamma - \alpha) = Y_l^m(0, -\gamma).$$

Then, from (20.68) we get

$$Y_l^m(0, -\gamma) = \sum_n Y_l^n(\beta, \alpha)\,D^{(l)}_{nm}(\alpha, \beta, \gamma). \tag{20.70}$$

Meanwhile, from (20.61) we have

$$Y_l^m(0, -\gamma) = \sqrt{\frac{2l+1}{4\pi}}\left[D^{(l)}_{m0}(-\gamma, 0)\right]^*.$$

Returning to (20.60), if we have a $\sin\frac{\theta}{2}$ factor in (20.60) or in (3.216), the relevant terms vanish with $D^{(l)}_{m0}(-\gamma, 0)$ except for special cases where $2r + m = 0$ in (3.216) or (20.60). But, to avoid having a negative integer inside the factorial in the denominator of (20.60), we must have $r + m \geq 0$. Since $r \geq 0$, we must have $2r + m = (r + m) + r \geq 0$ as well. In other words, only if $r = m = 0$ in (20.60), we have $2r + m = 0$, for which $D^{(l)}_{m0}(-\gamma, 0)$ does not vanish. Using these conditions in (20.60), (20.60) is greatly simplified to be

$$D^{(l)}_{m0}(-\gamma, 0) = \delta_{m0}.$$

From (20.61) we get [5]

$$Y_l^m(0, -\gamma) = \sqrt{\frac{2l+1}{4\pi}}\;\delta_{0m}. \tag{20.71}$$

Notice that this expression is consistent with (3.147) and (3.148). Meanwhile, replacing $\phi$ and $\theta$ in (20.61) with $\alpha$ and $\beta$, respectively, multiplying the resulting equation by $D^{(l)}_{mk}(\alpha, \beta, \gamma)$, and further summing over $m$, we get

$$\begin{aligned}
\sum_m \sqrt{\frac{4\pi}{2l+1}}\; Y_l^m(\beta, \alpha)\,D^{(l)}_{mk}(\alpha, \beta, \gamma) &= \sum_m \left[D^{(l)}_{m0}(\alpha, \beta, \gamma)\right]^* D^{(l)}_{mk}(\alpha, \beta, \gamma) \\
&= \delta_{0k} = \sqrt{\frac{4\pi}{2l+1}}\; Y_l^k(0, -\gamma),
\end{aligned} \tag{20.72}$$

where with the second equality we used the unitarity of the matrices $D^{(l)}_{nm}(\alpha, \beta, \gamma)$ (see Sect. 20.2.2) and with the last equality we used (20.70) multiplied by the constant $\sqrt{\frac{4\pi}{2l+1}}$. At the same time, we recovered (20.71) by comparing the last two sides of (20.72).

Rewriting (20.48) in a $(2j + 1, 2j + 1)$ matrix form, we get

$$\mathfrak{R}(a, b)\left(f_{-j}, f_{-j+1}, \ldots, f_{j-1}, f_j\right) = \left(f_{-j}, f_{-j+1}, \ldots, f_{j-1}, f_j\right)\begin{pmatrix} D_{-j,-j} & \cdots & D_{-j,j} \\ \vdots & \ddots & \vdots \\ D_{j,-j} & \cdots & D_{j,j} \end{pmatrix}, \tag{20.73}$$

where $f_m$ $(m = -j, -j+1, \ldots, j-1, j)$ is described as in (20.47). If $j$ is an integer $l$, we can choose $Y_l^m(\beta, \alpha)$ for individual $f_m$ (i.e., basis functions).

To get used to the abstract representation theory in continuous groups, we wish to think of the following trivial case: Putting $j = 0$ in (20.54), (20.54) becomes trivial, but still valid. In that case, we have $m' = m = 0$ so that the inside of the factorial can be non-negative. In turn, we must have $\mu = 0$ as well. Thus, we have

$$D^{(0)}_{00}(\alpha, \beta, \gamma) = 1.$$

Or, inserting $l = m = 0$ into (20.61), we have

$$\left[D^{(0)}_{00}(\phi, \theta)\right]^* = 1 = \sqrt{4\pi}\; Y_0^0(\theta, \phi), \quad \text{i.e.,} \quad Y_0^0(\theta, \phi) = \sqrt{1/4\pi}.$$

Thus, we recover (3.150).

Next, let us reproduce the three-dimensional representation $D^{(1)}$ using (20.54). A full matrix form is expressed as

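$$D^{(1)} = \begin{pmatrix} D_{-1,-1} & D_{-1,0} & D_{-1,1} \\ D_{0,-1} & D_{0,0} & D_{0,1} \\ D_{1,-1} & D_{1,0} & D_{1,1} \end{pmatrix} = \begin{pmatrix} e^{i(\alpha+\gamma)}\cos^2\frac{\beta}{2} & \frac{1}{\sqrt{2}}e^{i\alpha}\sin\beta & e^{i(\alpha-\gamma)}\sin^2\frac{\beta}{2} \\ -\frac{1}{\sqrt{2}}e^{i\gamma}\sin\beta & \cos\beta & \frac{1}{\sqrt{2}}e^{-i\gamma}\sin\beta \\ e^{i(\gamma-\alpha)}\sin^2\frac{\beta}{2} & -\frac{1}{\sqrt{2}}e^{-i\alpha}\sin\beta & e^{-i(\alpha+\gamma)}\cos^2\frac{\beta}{2} \end{pmatrix}. \tag{20.74}$$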

Note that (20.74) is a unitary matrix. The confirmation is left for readers. The complex conjugate of the column vector $m = 0$ in (20.61) with $l = 1$ is described by

$$\left[D^{(1)}_{m0}(\phi, \theta)\right]^* = \begin{pmatrix} \frac{1}{\sqrt{2}}e^{-i\alpha}\sin\beta \\ \cos\beta \\ -\frac{1}{\sqrt{2}}e^{i\alpha}\sin\beta \end{pmatrix} \tag{20.75}$$

that is related to the spherical surface harmonics (see Sect. 3.6); also see p. 760, Table 15.4 of Ref. [4].

Now, (20.25), (20.61), (20.65), and (20.73) provide a clue to relating $D^{(1)}$ to the rotation matrix of $\mathbb{R}^3$, i.e., $\widetilde{R_3}$ of (17.101) that includes Euler angles. The argument is as follows: As already shown in (20.61) and (20.65), we have related the basis vectors $(f_{-1} \; f_0 \; f_1)$ of $D^{(1)}$ to the spherical harmonics $\left(Y_1^{-1} \; Y_1^0 \; Y_1^1\right)$. Denoting the spherical basis by

$$\tilde{f}_{-1} \equiv \frac{1}{\sqrt{2}}(x - iy), \quad \tilde{f}_0 \equiv z, \quad \tilde{f}_1 \equiv -\frac{1}{\sqrt{2}}(x + iy), \tag{20.76}$$

we have

$$\left(\tilde{f}_{-1} \; \tilde{f}_0 \; \tilde{f}_1\right) = (x \; z \; y)\,U = (x \; z \; y)\begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ -\frac{i}{\sqrt{2}} & 0 & -\frac{i}{\sqrt{2}} \end{pmatrix}, \tag{20.77}$$

where we define a (unitary) matrix $U$ as

$$U \equiv \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ -\frac{i}{\sqrt{2}} & 0 & -\frac{i}{\sqrt{2}} \end{pmatrix}.$$

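Equation (20.77) describes the relationship between the two sets of functions $\left(\tilde{f}_{-1} \; \tilde{f}_0 \; \tilde{f}_1\right)$ and $(x \; z \; y)$. We should not regard $(x \; z \; y)$ as coordinates in $\mathbb{R}^3$, as mentioned in Sect. 20.1. The (3, 3) matrix of (20.77) is essentially the same as $P$ of (20.22). Following (20.73) and taking account of (20.25), we express $\left(\tilde{f}'_{-1} \; \tilde{f}'_0 \; \tilde{f}'_1\right)$ as

$$\left(\tilde{f}'_{-1} \; \tilde{f}'_0 \; \tilde{f}'_1\right) = \left(\tilde{f}_{-1} \; \tilde{f}_0 \; \tilde{f}_1\right)\mathfrak{R}(a, b) = \left(\tilde{f}_{-1} \; \tilde{f}_0 \; \tilde{f}_1\right)D^{(1)}. \tag{20.78}$$

This means that the set of functions $\left(\tilde{f}_{-1} \; \tilde{f}_0 \; \tilde{f}_1\right)$ forms the basis vectors of $D^{(1)}$ as well. Meanwhile, the relation (20.77) holds after the transformation, because (20.77) is independent of the choice of specific coordinate systems. That is, we have

$$\left(\tilde{f}'_{-1} \; \tilde{f}'_0 \; \tilde{f}'_1\right) = (x' \; z' \; y')\,U.$$

Then, from (20.78) we get

$$(x' \; z' \; y') = \left(\tilde{f}'_{-1} \; \tilde{f}'_0 \; \tilde{f}'_1\right)U^{-1} = \left(\tilde{f}_{-1} \; \tilde{f}_0 \; \tilde{f}_1\right)D^{(1)}U^{-1} = (x \; z \; y)\,UD^{(1)}U^{-1}. \tag{20.79}$$

Defining a unitary matrix $V$ as

$$V = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$$

and operating $V$ on both sides of (20.79) from the right, we get

$$(x' \; z' \; y')\,V = (x \; z \; y)\,V \cdot V^{-1}UD^{(1)}U^{-1}V.$$

That is, we have

$$(x' \; y' \; z') = (x \; y \; z)\left(U^{-1}V\right)^{-1}D^{(1)}\,U^{-1}V. \tag{20.80}$$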

Note that $(x' \; z' \; y')V = (x' \; y' \; z')$ and $(x \; z \; y)V = (x \; y \; z)$; that is, $V$ exchanges the order of $z$ and $y$ in $(x \; z \; y)$. Meanwhile, regarding $(x \; y \; z)$ as the basis vectors in $\mathbb{R}^3$ we have

$$(x' \; y' \; z') = (x \; y \; z)\,R, \tag{20.81}$$

where $R$ represents a (3, 3) orthogonal matrix. Equation (20.81) represents a rotation in SO(3). Since $x$, $y$, and $z$ are linearly independent, comparing (20.80) and (20.81) we get

$$R \equiv \left(U^{-1}V\right)^{-1}D^{(1)}\,U^{-1}V = \left(U^{-1}V\right)^{\dagger}D^{(1)}\,U^{-1}V. \tag{20.82}$$

More explicitly, using $a = e^{\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2}$ and $b = e^{\frac{i}{2}(\alpha-\gamma)}\sin\frac{\beta}{2}$ as in the case of (20.45), we describe $R$ as

$$R = \begin{pmatrix} \operatorname{Re}\left(a^2 - b^2\right) & -\operatorname{Im}\left(a^2 + b^2\right) & 2\operatorname{Re}(ab) \\ \operatorname{Im}\left(a^2 - b^2\right) & \operatorname{Re}\left(a^2 + b^2\right) & 2\operatorname{Im}(ab) \\ -2\operatorname{Re}(ab^*) & 2\operatorname{Im}(ab^*) & |a|^2 - |b|^2 \end{pmatrix}. \tag{20.83}$$

Comparing (20.83) with (17.101), we find out that $R$ is identical to $\widetilde{R_3}$ of (17.101). Equations (20.82) and (20.83) clearly show that $D^{(1)}$ and $R \equiv \widetilde{R_3}$ are connected to each other through the unitary similarity transformation, even though these matrices differ in that the former matrix is unitary and the latter is a real orthogonal matrix. That is, $D^{(1)}$ and $\widetilde{R_3}$ are equivalent in terms of representation; see Schur’s First Lemma of Sect. 18.3. Thus, $D^{(1)}$ is certainly a representation matrix of SO(3). The trace of $D^{(1)}$ and $\widetilde{R_3}$ is given by

$$\operatorname{Tr}D^{(1)} = \operatorname{Tr}\widetilde{R_3} = \cos(\alpha + \gamma)(1 + \cos\beta) + \cos\beta.$$

Let us continue the discussion still further. We think of the quantity $(x/r \;\; y/r \;\; z/r)$, where $r$ is the radial coordinate of (20.25). Operating (20.82) on $(x/r \;\; y/r \;\; z/r)$ from the right, we have

$$(x/r \;\; y/r \;\; z/r)\,R = (x/r \;\; y/r \;\; z/r)\,V^{-1}UD^{(1)}U^{-1}V = \left(\tilde{f}_{-1}/r \;\; \tilde{f}_0/r \;\; \tilde{f}_1/r\right)D^{(1)}U^{-1}V.$$

From (20.25), (20.61), (20.75), and (20.76), we get

$$\left(\tilde{f}_{-1}/r \;\; \tilde{f}_0/r \;\; \tilde{f}_1/r\right) = \left(\frac{1}{\sqrt{2}}e^{-i\alpha}\sin\beta \;\;\; \cos\beta \;\;\; -\frac{1}{\sqrt{2}}e^{i\alpha}\sin\beta\right). \tag{20.84}$$

Using (20.74) and (20.84), we obtain

$$\left(\tilde{f}_{-1}/r \;\; \tilde{f}_0/r \;\; \tilde{f}_1/r\right)D^{(1)} = (0 \;\; 1 \;\; 0).$$

This is a tangible example of (20.72). Then, we have

$$\left(\tilde{f}_{-1}/r \;\; \tilde{f}_0/r \;\; \tilde{f}_1/r\right)D^{(1)}U^{-1}V = (x/r \;\; y/r \;\; z/r)\,R = (0 \;\; 0 \;\; 1). \tag{20.85}$$

For the confirmation of (20.85), use the spherical coordinate representation (3.17) to denote $x$, $y$, and $z$ in the above equation. This seems to be merely a confirmation of the unitarity of representation matrices; see (20.72). Nonetheless, we emphasize that the simple relation (20.85) holds only with a special case where the parameters $\alpha$ and $\beta$ in $D^{(1)}$ (or $R$) are identical with the angular components of spherical coordinates associated with the Cartesian coordinates $x$, $y$, and $z$. Equation (20.85), however, does not hold in a general case where the parameters $\alpha$ and $\beta$ are different from those angular components. In other words, (20.68) that describes the special case where $Q \equiv R(\alpha, \beta, \gamma)$ leads to (20.85); see Fig. 20.2. Equation (20.66), on the other hand, is used for the general case where $Q$ represents an arbitrary transformation in $\mathbb{R}^3$.

Regarding further detailed discussion on the topics, readers are referred to the literature [1, 2, 5, 6].
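The similarity transformation (20.82) and the special relation (20.85) can be confirmed numerically. The sketch below is illustrative only: it builds $D^{(1)}$ from (20.74), forms $\left(U^{-1}V\right)^{\dagger}D^{(1)}U^{-1}V$ with $U^{-1} = U^{\dagger}$, and compares the result with the standard z-y-z Euler rotation matrix, which is assumed here to be the form of $\widetilde{R_3}$ in (17.101). All function names are hypothetical.

```python
import cmath
import math

S = 1 / math.sqrt(2)
# U of (20.77); rows correspond to (x, z, y), columns to m = -1, 0, 1.
U = [[S, 0, -S],
     [0, 1, 0],
     [-1j * S, 0, -1j * S]]
# V exchanges the order of z and y.
V = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(M):
    return [[complex(M[j][i]).conjugate() for j in range(len(M))]
            for i in range(len(M[0]))]

def D1(alpha, beta, gamma):
    """D^{(1)}(alpha, beta, gamma) of (20.74)."""
    c2, s2 = math.cos(beta / 2) ** 2, math.sin(beta / 2) ** 2
    sb, e = S * math.sin(beta), cmath.exp
    return [[e(1j * (alpha + gamma)) * c2, e(1j * alpha) * sb, e(1j * (alpha - gamma)) * s2],
            [-e(1j * gamma) * sb, math.cos(beta), e(-1j * gamma) * sb],
            [e(1j * (gamma - alpha)) * s2, -e(-1j * alpha) * sb, e(-1j * (alpha + gamma)) * c2]]

def rotation_from_D1(alpha, beta, gamma):
    """R of (20.82): (U^{-1}V)^dagger D^{(1)} U^{-1}V."""
    W = matmul(dagger(U), V)  # U^{-1}V, using the unitarity of U
    return matmul(matmul(dagger(W), D1(alpha, beta, gamma)), W)

def euler_zyz(alpha, beta, gamma):
    """Standard z-y-z Euler rotation matrix (assumed form of (17.101))."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [[ca * cb * cg - sa * sg, -ca * cb * sg - sa * cg, ca * sb],
            [sa * cb * cg + ca * sg, -sa * cb * sg + ca * cg, sa * sb],
            [-sb * cg, sb * sg, cb]]
```

When the unit vector $(x/r \;\; y/r \;\; z/r)$ is taken at polar angle $\beta$ and azimuth $\alpha$, multiplying it from the left onto `rotation_from_D1(alpha, beta, gamma)` should return $(0 \;\; 0 \;\; 1)$, which is the content of (20.85).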

20.2.4 Irreducible Representations of SU(2) and SO(3)

In this section we further explore characteristics of the representation of SU(2) and SO(3) and show the irreducibility of the representation matrices of SU(2) and SO(3).

To prove the irreducibility of the SU(2) representation matrices $D^{(j)}_{m'm}$, we use Schur’s lemmas already explained in Sect. 18.3. For this purpose we use special types of matrices of [5]. In (20.54) we consider a special case where $m' = j$. Then, the factor $(j - m' - \mu)! = (-\mu)!$ in the denominator survives only when $\mu = 0$. Hence,

$$D^{(j)}_{jm}(\alpha, \beta, \gamma) = (-1)^{j-m}\sqrt{\frac{(2j)!}{(j+m)!(j-m)!}}\; e^{-i(\alpha j + \gamma m)}\left(\cos\frac{\beta}{2}\right)^{j+m}\left(\sin\frac{\beta}{2}\right)^{j-m}. \tag{20.86}$$

Since in (20.86) the exponential term never vanishes, $D^{(j)}_{jm}(\alpha, \beta, \gamma)$ does not vanish either except for special values of $\beta$.

Next, we consider $D^{(j)}_{m'm}(\alpha, 0, \gamma)$. For this term not to vanish, the power of $\sin\frac{\beta}{2}$ in (20.54) must be zero; i.e., we have

$$m' - m + 2\mu = 0. \tag{20.87}$$

Meanwhile, no condition is imposed on the power of $\cos\frac{\beta}{2}$ in (20.54); note that $\left(\cos\frac{0}{2}\right)^{2j+m-m'-2\mu} = 1$. In the denominator of (20.54), to avoid a negative number in factorials we must have $m' - m + \mu \geq 0$ and $\mu \geq 0$. If $\mu > 0$, $m' - m + 2\mu = (m' - m + \mu) + \mu > 0$ as well. Therefore, for (20.87) to hold, we must have $\mu = 0$. From (20.87), in turn, this means $m' - m = 0$, i.e., $m' = m$. Then, we have a greatly simplified equation

$$D^{(j)}_{m'm}(\alpha, 0, \gamma) = \delta_{m'm}\,e^{-i(\alpha m' + \gamma m)}. \tag{20.88}$$

This implies that $D^{(j)}_{m'm}(\alpha, 0, \gamma)$ is diagonal. Suppose that we have a matrix $A = (A_{m'm})$. If $A$ commutes with $D^{(j)}_{m'm}(\alpha, 0, \gamma)$, it must be diagonal unless $D^{(j)}_{m'm}(\alpha, 0, \gamma) = c\delta_{m'm}$, where $c$ is a constant independent of $m'$ and $m$. But, $D^{(j)}_{m'm}(\alpha, 0, \gamma) \neq c\delta_{m'm}$ because of (20.88). That is, $A$ should have a form $A_{m'm} = a_m\delta_{m'm}$. If $A$ commutes with $D^{(j)}_{m'm}(\alpha, \beta, \gamma)$, then taking the $(j, m)$ component of the product we have

$$\left(AD^{(j)}\right)_{jm} = \sum_k A_{jk}D^{(j)}_{km} = \sum_k a_k\delta_{jk}D^{(j)}_{km} = a_jD^{(j)}_{jm}. \tag{20.89}$$

Meanwhile,

$$\left(D^{(j)}A\right)_{jm} = \sum_k D^{(j)}_{jk}A_{km} = \sum_k D^{(j)}_{jk}\,a_m\delta_{km} = a_mD^{(j)}_{jm}. \tag{20.90}$$

From (20.89) and (20.90), we have

$$D^{(j)}_{jm}(\alpha, \beta, \gamma)\left(a_j - a_m\right) = 0.$$

As already mentioned above, the matrix elements $D^{(j)}_{jm}(\alpha, \beta, \gamma)$ of (20.86) do not vanish except for special values of $\beta$. Therefore, for $A$ to commute with $D^{(j)}_{m'm}(\alpha, \beta, \gamma)$, we must have $a_j = a_m \equiv a$ for all $m$. This implies

$$A_{m'm} = a\delta_{m'm}. \tag{20.91}$$

Thus, from Schur’s Second Lemma $D^{(j)}_{m'm}(\alpha, \beta, \gamma)$ is irreducible.

We must be careful, however, about the application of Schur’s Second Lemma. This is because Schur’s Second Lemma described in Sect. 18.3 says the following:

A representation $D$ is irreducible ⟹ The matrix $M$ that is commutative with all $D(g)$ is limited to a form of $M = cE$.

Nonetheless, we are uncertain of the truth or falsehood of the converse proposition. Fortunately, however, the converse proposition is true if the representation is unitary. We prove the converse proposition by its contraposition:

A representation $D$ is not irreducible (i.e., reducible) ⟹ Of the matrices that are commutative with all $D(g)$, we can find at least one matrix $M$ of a type other than $M = cE$.

In this context, Theorem 18.2 (Sect. 18.2) as well as (18.38) teaches us that the unitary representation $D(g)$ is completely reducible so that we can describe it as a direct sum such that

$$D(g) = D^{(1)}(g) \oplus D^{(2)}(g) \oplus \cdots \oplus D^{(\omega)}(g), \tag{20.92}$$

where $g$ is any group element. Then, we can choose the following matrix $A$ for a linear transformation that is commutative with $D(g)$:

$$A = \alpha^{(1)}E_1 \oplus \alpha^{(2)}E_2 \oplus \cdots \oplus \alpha^{(\omega)}E_\omega, \tag{20.93}$$

where $\alpha^{(1)}, \ldots, \alpha^{(\omega)}$ are complex constants that may be different from one another; $E_1, \ldots, E_\omega$ are unit matrices having the same dimension as $D^{(1)}(g), \ldots, D^{(\omega)}(g)$, respectively. Then, even though $A$ is not of the type $A = \alpha\delta_{m'm}$, $A$ commutes with $D(g)$ for any $g$.

Here we rephrase Schur’s Second Lemma as the following theorem.

Theorem 20.2 Let $D$ be a unitary representation of a group $G$. Suppose that with $\forall g \in G$ we have

$$D(g)M = MD(g). \tag{18.48}$$

A necessary and sufficient condition for the said unitary representation to be irreducible is that linear transformations that are commutative with $D(g)$ $(\forall g \in G)$ are limited to those of a form described by $A = \alpha\delta_{m'm}$, where $\alpha$ is a (complex) constant.

Next, we examine the irreducibility of the SO(3) representation matrices. We develop the discussion in conjunction with explanation about the important properties of the representation matrices of SU(2) as well as SO(3). We rewrite (20.54) when $j = l$ ($l$: zero or positive integers) to relate it to the rotation in $\mathbb{R}^3$:

$$\left[D^{(l)}(\alpha, \beta, \gamma)\right]_{m'm} = \sum_{\mu} \frac{(-1)^{m'-m+\mu}\sqrt{(l+m)!(l-m)!(l+m')!(l-m')!}}{(l+m-\mu)!\,\mu!\,(m'-m+\mu)!\,(l-m'-\mu)!}\; e^{-i(\alpha m' + \gamma m)}\left(\cos\frac{\beta}{2}\right)^{2l+m-m'-2\mu}\left(\sin\frac{\beta}{2}\right)^{m'-m+2\mu}. \tag{20.94}$$

Meanwhile, as mentioned in Sect. 20.2.3 the transformation $\widetilde{R_3}$ (appearing in Sect. 17.4.2) described by

$$\widetilde{R_3} = R_{z\alpha}R'_{y'\beta}R''_{z''\gamma}$$

may be regarded as $R(\alpha, \beta, \gamma)$ of (20.63). Consequently, the representation matrix $D^{(l)}(\alpha, \beta, \gamma)$ is described by

$$D^{(l)}(\alpha, \beta, \gamma) \equiv D^{(l)}\left(\widetilde{R_3}\right) = D^{(l)}_{\alpha}\left(R_{z\alpha}\right)D^{(l)}_{\beta}\left(R'_{y'\beta}\right)D^{(l)}_{\gamma}\left(R''_{z''\gamma}\right). \tag{20.95}$$

Equation (20.95) is based upon the definition of representation (18.2). This notation is in parallel with (20.45) in which $D^{(1/2)}_{\alpha,\beta,\gamma}$ is described by the product of three exponential functions of matrices each of which was associated with the rotation characterized by Euler angles of (17.101). Denoting $D^{(l)}_{\alpha}\left(R_{z\alpha}\right) = D^{(l)}(\alpha, 0, 0)$, $D^{(l)}_{\beta}\left(R'_{y'\beta}\right) = D^{(l)}(0, \beta, 0)$, and $D^{(l)}_{\gamma}\left(R''_{z''\gamma}\right) = D^{(l)}(0, 0, \gamma)$, we have

$$D^{(l)}(\alpha, \beta, \gamma) = D^{(l)}(\alpha, 0, 0)\,D^{(l)}(0, \beta, 0)\,D^{(l)}(0, 0, \gamma), \tag{20.96}$$

where with each representation two out of three Euler angles are zero. In light of (20.94), we estimate each factor of (20.96). By the same token as before, putting $\beta = \gamma = 0$ in (20.94) let us examine on what conditions $\left(\sin\frac{0}{2}\right)^{m'-m+2\mu}$ survives. For this factor to survive, we must have $m' - m + 2\mu = 0$ as in (20.87). Then, following the argument as before, we should have $\mu = 0$ and $m' = m$. Thus, $D^{(l)}(\alpha, 0, 0)$ must be a diagonal matrix. This is the case with $D^{(l)}(0, 0, \gamma)$ as well.

Equation (3.216) implies that the functions $Y_l^m(\beta, \alpha)$ $(m = -l, \ldots, 0, \ldots, l)$ are eigenfunctions with regard to the rotation about the z-axis. In other words, we have

$$R_{z\alpha}Y_l^m(\theta, \phi) = Y_l^m(\theta, \phi - \alpha) = e^{-im\alpha}\,Y_l^m(\theta, \phi).$$

This is because from (3.216) $Y_l^m(\theta, \phi)$ can be described as

$$Y_l^m(\theta, \phi) = F(\theta)\,e^{im\phi},$$

where $F(\theta)$ is a function of $\theta$. Hence, we have

$$Y_l^m(\theta, \phi - \alpha) = F(\theta)\,e^{im(\phi - \alpha)} = e^{-im\alpha}F(\theta)\,e^{im\phi} = e^{-im\alpha}\,Y_l^m(\theta, \phi).$$

Therefore, with a full matrix representation we get

$$\begin{aligned}
\left(Y_l^{-l} \; Y_l^{-l+1} \cdots Y_l^0 \cdots Y_l^l\right)R_{z\alpha} &= \left(Y_l^{-l} \; Y_l^{-l+1} \cdots Y_l^0 \cdots Y_l^l\right)\begin{pmatrix} e^{-i(-l)\alpha} & & & \\ & e^{-i(-l+1)\alpha} & & \\ & & \ddots & \\ & & & e^{-il\alpha} \end{pmatrix} \\
&= \left(Y_l^{-l} \; Y_l^{-l+1} \cdots Y_l^0 \cdots Y_l^l\right)\begin{pmatrix} e^{il\alpha} & & & \\ & e^{i(l-1)\alpha} & & \\ & & \ddots & \\ & & & e^{-il\alpha} \end{pmatrix}.
\end{aligned} \tag{20.97}$$

With a shorthand notation, we have

$$D^{(l)}(\alpha, 0, 0) = e^{-im\alpha}\,\delta_{m'm} \quad (m = -l, \ldots, l). \tag{20.98}$$

From (20.96), with the $(m', m)$ matrix elements of $D^{(l)}(\alpha, \beta, \gamma)$ we get

$$\begin{aligned}
\left[D^{(l)}(\alpha, \beta, \gamma)\right]_{m'm} &= \sum_{s,t}\left[D^{(l)}(\alpha, 0, 0)\right]_{m's}\left[D^{(l)}(0, \beta, 0)\right]_{st}\left[D^{(l)}(0, 0, \gamma)\right]_{tm} \\
&= \sum_{s,t} e^{-is\alpha}\,\delta_{m's}\left[D^{(l)}(0, \beta, 0)\right]_{st}e^{-im\gamma}\,\delta_{tm} \\
&= e^{-im'\alpha}\left[D^{(l)}(0, \beta, 0)\right]_{m'm}e^{-im\gamma}.
\end{aligned} \tag{20.99}$$

Meanwhile, from (20.94) we have

$$\left[D^{(l)}(0, \beta, 0)\right]_{m'm} = \sum_{\mu} \frac{(-1)^{m'-m+\mu}\sqrt{(l+m)!(l-m)!(l+m')!(l-m')!}}{(l+m-\mu)!\,\mu!\,(m'-m+\mu)!\,(l-m'-\mu)!}\left(\cos\frac{\beta}{2}\right)^{2l+m-m'-2\mu}\left(\sin\frac{\beta}{2}\right)^{m'-m+2\mu}. \tag{20.100}$$

Thus, we find that $\left[D^{(l)}(\alpha, \beta, \gamma)\right]_{m'm}$ of (20.99) has been factorized into three factors. Equation (20.94) has such an implication. The same characteristic is shared with the SU(2) representation matrices (20.54) more generally. This can readily be understood from the aforementioned discussion. Taking $D^{(1)}$ of (20.74) as an example, we have

$$\begin{aligned}
D^{(1)}(\alpha, \beta, \gamma) &= D^{(1)}(\alpha, 0, 0)\,D^{(1)}(0, \beta, 0)\,D^{(1)}(0, 0, \gamma) \\
&= \begin{pmatrix} e^{i\alpha} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{-i\alpha} \end{pmatrix}\begin{pmatrix} \cos^2\frac{\beta}{2} & \frac{1}{\sqrt{2}}\sin\beta & \sin^2\frac{\beta}{2} \\ -\frac{1}{\sqrt{2}}\sin\beta & \cos\beta & \frac{1}{\sqrt{2}}\sin\beta \\ \sin^2\frac{\beta}{2} & -\frac{1}{\sqrt{2}}\sin\beta & \cos^2\frac{\beta}{2} \end{pmatrix}\begin{pmatrix} e^{i\gamma} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{-i\gamma} \end{pmatrix} \\
&= \begin{pmatrix} e^{i(\alpha+\gamma)}\cos^2\frac{\beta}{2} & \frac{1}{\sqrt{2}}e^{i\alpha}\sin\beta & e^{i(\alpha-\gamma)}\sin^2\frac{\beta}{2} \\ -\frac{1}{\sqrt{2}}e^{i\gamma}\sin\beta & \cos\beta & \frac{1}{\sqrt{2}}e^{-i\gamma}\sin\beta \\ e^{i(\gamma-\alpha)}\sin^2\frac{\beta}{2} & -\frac{1}{\sqrt{2}}e^{-i\alpha}\sin\beta & e^{-i(\alpha+\gamma)}\cos^2\frac{\beta}{2} \end{pmatrix}.
\end{aligned} \tag{20.74}$$

In this way, we have reproduced the result of (20.74). Once $D^{(l)}(\alpha, \beta, \gamma)$ has been factorized as in (20.94), we can readily examine whether the representation is irreducible. To know what kinds of matrices commute with $D^{(l)}(\alpha, \beta, \gamma)$, it suffices to examine whether the matrices commute with individual factors. The argument is due to Hamermesh [5].

(i) Let $A$ be a matrix having the same dimension as that of $D^{(l)}(\alpha, \beta, \gamma)$; namely, $A$ is a $(2l + 1, 2l + 1)$ square matrix with a form of $A = (a_{ij})$. First, we examine the commutativity of $D^{(l)}(\alpha, 0, 0)$ and $A$. With the matrix elements we have

$$\left[AD^{(l)}(\alpha, 0, 0)\right]_{m'm} = \sum_k a_{m'k}\left[D^{(l)}(\alpha, 0, 0)\right]_{km} = \sum_k a_{m'k}\,e^{-im\alpha}\,\delta_{km} = a_{m'm}\,e^{-im\alpha}, \tag{20.101}$$

where the second equality comes from (20.98). Also, we have

$$\left[D^{(l)}(\alpha, 0, 0)A\right]_{m'm} = \sum_k \left[D^{(l)}(\alpha, 0, 0)\right]_{m'k}a_{km} = a_{m'm}\,e^{-im'\alpha}. \tag{20.102}$$

From (20.101) and (20.102), for a condition of commutativity we have

$$a_{m'm}\left(e^{-im\alpha} - e^{-im'\alpha}\right) = 0.$$

If $m' \neq m$, we should have $a_{m'm} = 0$. If $m' = m$, $a_{m'm}$ does not have to vanish, but may take an arbitrary complex number. Hence, $A$ is a diagonal matrix, namely

$$A = a_m\delta_{m'm}, \tag{20.103}$$

where $a_m$ is an arbitrary complex number.

(ii) Next we examine the commutativity of $D^{(l)}(0, \beta, 0)$ and $A$ of (20.103). Using (20.66), we have

$$QY_l^m(\beta, \alpha) = \sum_n Y_l^n(\beta, \alpha)\,D^{(l)}_{nm}(Q). \tag{20.66}$$

Choosing $R(0, \theta, 0)$ for $Q$ and putting $\alpha = 0$ in (20.66), we have

$$R(0, \theta, 0)\,Y_l^m(\beta, 0) = \sum_n Y_l^n(\beta, 0)\,D^{(l)}_{nm}(0, \theta, 0). \tag{20.104}$$

Meanwhile, from (20.69) we get

$$R(0, \theta, 0)\,Y_l^m(\beta, 0) = Y_l^m\left[R(0, \theta, 0)^{-1}(\beta, 0)\right] = Y_l^m\left[R(0, -\theta, 0)(\beta, 0)\right] = Y_l^m(\beta - \theta, 0). \tag{20.105}$$

Combining (20.104) and (20.105) we get

$$Y_l^m(\beta - \theta, 0) = \sum_n Y_l^n(\beta, 0)\,D^{(l)}_{nm}(0, \theta, 0). \tag{20.106}$$

Further putting $\beta = 0$ in (20.106), we have

$$Y_l^m(-\theta, 0) = \sum_n Y_l^n(0, 0)\,D^{(l)}_{nm}(0, \theta, 0). \tag{20.107}$$

Taking account of (20.71), the RHS does not vanish in (20.107) only when $n = 0$. On this condition, we obtain

$$Y_l^m(-\theta, 0) = Y_l^0(0, 0)\,D^{(l)}_{0m}(0, \theta, 0). \tag{20.108}$$

Meanwhile, from (20.71) with $m = 0$, we have

$$Y_l^0(0, 0) = \sqrt{\frac{2l+1}{4\pi}}. \tag{20.109}$$

Inserting this into (20.108), we have

$$Y_l^m(-\theta, 0) = \sqrt{\frac{2l+1}{4\pi}}\;D^{(l)}_{0m}(0, \theta, 0). \tag{20.110}$$

Replacing $\theta$ with $-\theta$ in (20.110), we get

830

20

ðlÞ D0m ð0, θ, 0Þ

Theory of Continuous Groups

rffiffiffiffiffiffiffiffiffiffiffiffi 4π m ¼ Y ðθ, 0Þ: 2l þ 1 l

ð20:111Þ

Compare (20.111) with (20.61) and confirm (20.111) using (20.74) with $l = 1$. Viewing (20.111) as a "row vector," (20.111) with $l = 1$ can be expressed as

\[
\left(D^{(1)}_{0,-1}(0,-\theta,0)\;\; D^{(1)}_{0,0}(0,-\theta,0)\;\; D^{(1)}_{0,1}(0,-\theta,0)\right)
= \sqrt{\frac{4\pi}{3}}\left(Y_1^{-1}(\theta,0)\;\; Y_1^{0}(\theta,0)\;\; Y_1^{1}(\theta,0)\right)
= \left(\frac{1}{\sqrt{2}}\sin\theta\;\; \cos\theta\;\; -\frac{1}{\sqrt{2}}\sin\theta\right).
\]

Getting back to our topic, with the matrix elements we have

\[
\left[AD^{(l)}(0,\theta,0)\right]_{0m} = \sum_k a_k\delta_{0k}\left[D^{(l)}(0,\theta,0)\right]_{km} = a_0\left[D^{(l)}(0,\theta,0)\right]_{0m}
\]

and

\[
\left[D^{(l)}(0,\theta,0)A\right]_{0m} = \sum_k \left[D^{(l)}(0,\theta,0)\right]_{0k} a_m\delta_{km} = \left[D^{(l)}(0,\theta,0)\right]_{0m} a_m,
\]

where we assume $A = a_m\delta_{m'm}$ in (20.103). Since from (20.108) $[D^{(l)}(0,\theta,0)]_{0m}$ does not vanish except for specific values of $\theta$, for A and $D^{(l)}(0,\theta,0)$ to commute we must have $a_m = a_0$ for all $m = -l, -l+1, \dots, 0, \dots, l-1, l$. Then, from (20.103) we get

\[
A = a_0\,\delta_{m'm}, \tag{20.112}
\]

where $a_0$ is an arbitrary complex number. Thus, a matrix that commutes with both $D^{(l)}(\alpha,0,0)$ and $D^{(l)}(0,\beta,0)$ must be of the form of (20.112). The same discussion holds with $D^{(l)}(\alpha,\beta,\gamma)$ of (20.94) as a whole with regard to the commutativity. On the basis of Schur's Second Lemma (Sect. 18.3), we conclude that the representation $D^{(l)}(\alpha,\beta,\gamma)$ is again irreducible. This is one of the prominent features of SO(3) as well as SU(2).

As can be seen clearly in (20.97), the dimension of the representation matrix $D^{(l)}(\alpha,\beta,\gamma)$ is $2l+1$. Suppose that there is another representation matrix $D^{(l')}(\alpha,\beta,\gamma)$ $(l' \neq l)$. Then, $D^{(l)}(\alpha,\beta,\gamma)$ and $D^{(l')}(\alpha,\beta,\gamma)$ are inequivalent (see Sect. 18.3). Meanwhile, the unitary representation of SO(3) is completely reducible and, hence, any reducible unitary representation D can be described by

\[
D = \sum_l a_l\,D^{(l)}(\alpha,\beta,\gamma), \tag{20.113}
\]

where $a_l$ is zero or a positive integer. If $a_l \geq 2$, this means that the same representation $D^{(l)}(\alpha,\beta,\gamma)$ appears repeatedly. Equation (20.113) provides a tangible example of (20.92) expressed as

\[
D(g) = D^{(1)}(g) \oplus D^{(2)}(g) \oplus \cdots \oplus D^{(\omega)}(g) \tag{20.92}
\]

and applies to the representation matrices of SU(2) more generally. We will encounter related discussion in Sect. 20.3 in connection with the direct-product representation.

20.2.5 Parameter Space of SO(3)

As already discussed in Sect. 17.4.2, we need three angles α, β, and γ to specify the rotation in ℝ³. Their domains are usually taken as follows:

\[
0 \leq \alpha \leq 2\pi, \quad 0 \leq \beta \leq \pi, \quad 0 \leq \gamma \leq 2\pi.
\]

The domains defined above are referred to as a parameter space. Yet, there are different implications depending upon the choice of coordinate system. The first choice is a moving coordinate system where α, β, and γ are Euler angles (see Sect. 17.4.2). The second choice is a fixed coordinate system where α is an azimuthal angle, β is a zenithal angle, and γ is a rotation angle. Notice that in the former case α, β, and γ represent equivalent rotation angles. In the latter case, however, although α and β define the orientation of a rotation axis, γ is designated as a rotation angle. In the present section we further discuss characteristics of the parameter space.

First, we study several characteristics of (3, 3) real orthogonal matrices.

(i) The (3, 3) real orthogonal matrices have eigenvalues 1, $e^{i\gamma}$, and $e^{-i\gamma}$ $(0 \leq \gamma \leq \pi)$. This is a direct consequence of (17.77) and invariance of the characteristic equation of the matrix. The rotation axis is associated with an eigenvector that belongs to the eigenvalue 1. Let $A = (a_{ij})$ $(1 \leq i, j \leq 3)$ be a (3, 3) real orthogonal matrix. Let u be an eigenvector belonging to the eigenvalue 1. We suppose that when we are thinking of the rotation on the fixed coordinate system, u is given by a column vector such that

\[
u = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}.
\]

Then, we have

\[
Au = 1u = u. \tag{20.114}
\]

Operating $A^{T}$ on both sides of (20.114), we get

\[
A^{T}Au = Eu = u = A^{T}u, \tag{20.115}
\]

where we used the property of an orthogonal matrix; i.e., $A^{T}A = E$. From (20.114) and (20.115), we have

\[
\left(A - A^{T}\right)u = 0. \tag{20.116}
\]

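Writing (20.116) in matrix components, we have [5]

\[
\left[\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} - \begin{pmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{pmatrix}\right]\begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} = 0. \tag{20.117}
\]

That is,

\[
\begin{aligned}
(a_{12}-a_{21})u_2 + (a_{13}-a_{31})u_3 &= 0,\\
(a_{21}-a_{12})u_1 + (a_{23}-a_{32})u_3 &= 0,\\
(a_{31}-a_{13})u_1 + (a_{32}-a_{23})u_2 &= 0.
\end{aligned} \tag{20.118}
\]

Solving (20.118), we get

\[
u_1 : u_2 : u_3 = (a_{32}-a_{23}) : (a_{13}-a_{31}) : (a_{21}-a_{12}). \tag{20.119}
\]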
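The ratio (20.119) can be tried out numerically. The following is a minimal sketch, not part of the original text: the helper names `rodrigues` and `rotation_axis` are hypothetical, the test matrix is built with Rodrigues' rotation formula (an assumption made here only to have a known axis), and the degenerate case of a symmetric A (rotation angle 0 or π), where the differences in (20.119) all vanish, is rejected explicitly.

```python
import numpy as np

def rodrigues(axis, angle):
    """Build a proper rotation matrix about `axis` by `angle` (Rodrigues' formula)."""
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)
    # Antisymmetric matrix K such that K @ v equals the cross product n x v.
    K = np.array([[0.0, -n[2], n[1]],
                  [n[2], 0.0, -n[0]],
                  [-n[1], n[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def rotation_axis(a):
    """Unit eigenvector for eigenvalue 1, read off from the ratio (20.119)."""
    u = np.array([a[2, 1] - a[1, 2], a[0, 2] - a[2, 0], a[1, 0] - a[0, 1]])
    norm = np.linalg.norm(u)
    if norm < 1e-12:
        # A - A^T vanishes (angle 0 or pi); the ratio is then undetermined.
        raise ValueError("symmetric matrix: (20.119) does not fix the axis")
    return u / norm

axis = np.array([1.0, 2.0, 2.0]) / 3.0   # hypothetical unit axis
A = rodrigues(axis, 0.7)                 # hypothetical rotation angle
u = rotation_axis(A)
print(u)          # direction cosines of the rotation axis
print(A @ u - u)  # residual of A u = u, cf. (20.114)
```

For a rotation angle between 0 and π the recovered vector points along the chosen axis; for a negative angle it comes out with the opposite sign, which describes the same rotation.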

If we normalize u, i.e., $u_1^2 + u_2^2 + u_3^2 = 1$, the column vector $(u_1\ u_2\ u_3)^{T}$ gives direction cosines of the eigenvector, namely the rotation axis. Equation (20.119) applies to any orthogonal matrix. A simple example is (20.3); a more complicated example can be seen in (17.107). Confirmation is left for readers. We use this property soon below.

Let us consider infinitesimal transformations of rotation. As already implied in (20.5), an infinitesimal rotation θ around the z-axis is described by

\[
R_{z\theta} = \begin{pmatrix} 1 & -\theta & 0 \\ \theta & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.120}
\]

Now, we newly introduce infinitesimal rotation operators as below:

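\[
R_{x\xi} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -\xi \\ 0 & \xi & 1 \end{pmatrix}, \quad
R_{y\eta} = \begin{pmatrix} 1 & 0 & \eta \\ 0 & 1 & 0 \\ -\eta & 0 & 1 \end{pmatrix}, \quad
R_{z\zeta} = \begin{pmatrix} 1 & -\zeta & 0 \\ \zeta & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.121}
\]

These operators represent infinitesimal rotations ξ, η, and ζ around the x-, y-, and z-axes, respectively (see Fig. 20.3). Note that these operators commute with one another to the first order of infinitesimal quantities ξ, η, and ζ. That is, we have

\[
R_{x\xi}R_{y\eta} = R_{y\eta}R_{x\xi}, \quad R_{y\eta}R_{z\zeta} = R_{z\zeta}R_{y\eta}, \quad R_{x\xi}R_{y\eta}R_{z\zeta} = R_{z\zeta}R_{y\eta}R_{x\xi}, \ \text{etc.}
\]

with, e.g.,

\[
R_{x\xi}R_{y\eta}R_{z\zeta} = \begin{pmatrix} 1 & -\zeta & \eta \\ \zeta & 1 & -\xi \\ -\eta & \xi & 1 \end{pmatrix} \tag{20.122}
\]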

to the first order. Notice that the order of operations $R_{x\xi}$, $R_{y\eta}$, and $R_{z\zeta}$ is disregarded.

Next, let us consider a successive transformation with a finite rotation angle ω that follows the infinitesimal rotations. Readers may well ask why and for what purpose we need to make such elaborate calculations. It is partly because when we were dealing with finite groups in the previous chapters, group elements were finite and relevant calculations were straightforward accordingly. But, now we are thinking of continuous groups that possess an infinite number of group elements and, hence, we have to consider a "density" of group elements. In this respect, we are now thinking of how the density of group elements in the parameter space is changed according to the rotation of a finite angle ω. The results are used for various group calculations including orthogonalization of characters.

Fig. 20.3 Infinitesimal rotations ξ, η, and ζ around the x-, y-, and z-axes, respectively

Getting back to our subject, we further operate a finite rotation of an angle ω around the z-axis subsequent to the aforementioned infinitesimal rotations. The discussion developed below is due to Hamermesh [5]. We denote this rotation by $R_\omega$. Since we are dealing with a spherically symmetric system, any rotation axis can equivalently be chosen without loss of generality. Notice also that the rotations of the same rotation angle belong to the same conjugacy class [see (17.106) and Fig. 17.18]. Defining $R_{x\xi}R_{y\eta}R_{z\zeta} \equiv R_\Delta$ of (20.122), the rotation R that combines $R_\Delta$ and subsequently occurring $R_\omega$ is described by

\[
R = R_\Delta R_\omega
= \begin{pmatrix} 1 & -\zeta & \eta \\ \zeta & 1 & -\xi \\ -\eta & \xi & 1 \end{pmatrix}
\begin{pmatrix} \cos\omega & -\sin\omega & 0 \\ \sin\omega & \cos\omega & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} \cos\omega - \zeta\sin\omega & -\sin\omega - \zeta\cos\omega & \eta \\ \sin\omega + \zeta\cos\omega & \cos\omega - \zeta\sin\omega & -\xi \\ \xi\sin\omega - \eta\cos\omega & \xi\cos\omega + \eta\sin\omega & 1 \end{pmatrix}. \tag{20.123}
\]

Hence, we have

\[
\begin{aligned}
R_{32} - R_{23} &= \xi\cos\omega + \eta\sin\omega - (-\xi) = \xi(1+\cos\omega) + \eta\sin\omega,\\
R_{13} - R_{31} &= \eta - (\xi\sin\omega - \eta\cos\omega) = \eta(1+\cos\omega) - \xi\sin\omega,\\
R_{21} - R_{12} &= \sin\omega + \zeta\cos\omega - (-\sin\omega - \zeta\cos\omega) = 2(\sin\omega + \zeta\cos\omega).
\end{aligned} \tag{20.124}
\]

Since (20.124) gives a relative directional ratio of the rotation axis [see (20.119)], we should normalize a vector whose components are given by (20.124) to get the direction cosines of the rotation axis. Note that the direction of the rotation axis related to R of (20.123) should be close to that of $R_\omega$ and that only $R_{21} - R_{12}$ in (20.124) has a term (i.e., $2\sin\omega$) lacking the infinitesimal quantities ξ, η, ζ. Therefore, it suffices to normalize with respect to $R_{21} - R_{12}$ to seek the direction cosines to the first order. Hence, dividing (20.124) by $2\sin\omega$, as components of the direction cosines we get

\[
\frac{\xi(1+\cos\omega)}{2\sin\omega} + \frac{\eta}{2}, \quad \frac{\eta(1+\cos\omega)}{2\sin\omega} - \frac{\xi}{2}, \quad 1. \tag{20.125}
\]

Meanwhile, combining (17.78) and (20.123), we can find a rotation angle ω′ for R in (20.123). The trace χ of (20.123) is written as

\[
\chi = 1 + 2(\cos\omega - \zeta\sin\omega). \tag{20.126}
\]

From (17.78) we have

\[
\chi = 1 + 2\cos\omega'. \tag{20.127}
\]

Equating (20.126) and (20.127), we obtain

\[
\cos\omega' = \cos\omega - \zeta\sin\omega. \tag{20.128}
\]

Approximating (20.128), we get

\[
1 - \frac{1}{2}\omega'^2 \approx 1 - \frac{1}{2}\omega^2 - \zeta\omega. \tag{20.129}
\]

From (20.129), we have the following approximate expression:

\[
(\omega' + \omega)(\omega' - \omega) \approx 2\omega(\omega' - \omega) \approx 2\zeta\omega.
\]

Hence, we obtain

\[
\omega' \approx \omega + \zeta. \tag{20.130}
\]
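The first-order results (20.125) and (20.130) can be checked with a small numerical experiment. This is only a sketch under stated assumptions: the values of ξ, η, ζ, and ω are arbitrary choices made here, $R_\Delta$ is taken in its first-order form (20.122), and agreement is expected only up to second order in the infinitesimal quantities.

```python
import numpy as np

# Hypothetical small rotations and a hypothetical finite angle.
xi, eta, zeta = 2e-4, -3e-4, 1e-4
omega = 0.9
c, s = np.cos(omega), np.sin(omega)

# First-order product of the infinitesimal rotations, cf. (20.122).
R_delta = np.array([[1.0, -zeta, eta],
                    [zeta, 1.0, -xi],
                    [-eta, xi, 1.0]])
# Finite rotation about the z-axis.
R_omega = np.array([[c, -s, 0.0],
                    [s, c, 0.0],
                    [0.0, 0.0, 1.0]])
R = R_delta @ R_omega                      # cf. (20.123)

# Rotation angle from the trace, cf. (20.126)-(20.128).
omega_prime = np.arccos((np.trace(R) - 1.0) / 2.0)

# Rotation axis from the antisymmetric part, cf. (20.119) and (20.124).
v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
axis = v / np.linalg.norm(v)

# First-order prediction of the direction cosines, cf. (20.125).
predicted_axis = np.array([xi * (1 + c) / (2 * s) + eta / 2,
                           eta * (1 + c) / (2 * s) - xi / 2,
                           1.0])

print(omega_prime - (omega + zeta))   # second-order small
print(axis - predicted_axis)          # second-order small
```

Both printed differences should be of the order of the squares of ξ, η, ζ, which is what "to the first order" means in the text.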

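Combining (20.125) and (20.130), we get their product to the first order such that

\[
\omega\left[\frac{\xi(1+\cos\omega)}{2\sin\omega} + \frac{\eta}{2}\right], \quad \omega\left[\frac{\eta(1+\cos\omega)}{2\sin\omega} - \frac{\xi}{2}\right], \quad \omega + \zeta. \tag{20.131}
\]

Since these quantities are products of individual direction cosines and the rotation angle ω + ζ, we introduce variables $e_x$, $e_y$, and $e_z$ as the x-, y-, and z-related quantities, respectively. Namely, we have

\[
\begin{aligned}
e_x &= \omega\left[\frac{\xi(1+\cos\omega)}{2\sin\omega} + \frac{\eta}{2}\right],\\
e_y &= \omega\left[\frac{\eta(1+\cos\omega)}{2\sin\omega} - \frac{\xi}{2}\right],\\
e_z &= \omega + \zeta.
\end{aligned} \tag{20.132}
\]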

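To evaluate how the density of group elements is changed as a function of the rotation angle ω, we are interested in variations of $e_x$, $e_y$, and $e_z$ that depend on ξ, η, and ζ. To this end, we calculate the Jacobian J as

\[
J = \frac{\partial(e_x, e_y, e_z)}{\partial(\xi, \eta, \zeta)}
= \begin{vmatrix}
\dfrac{\partial e_x}{\partial\xi} & \dfrac{\partial e_x}{\partial\eta} & \dfrac{\partial e_x}{\partial\zeta}\\[2mm]
\dfrac{\partial e_y}{\partial\xi} & \dfrac{\partial e_y}{\partial\eta} & \dfrac{\partial e_y}{\partial\zeta}\\[2mm]
\dfrac{\partial e_z}{\partial\xi} & \dfrac{\partial e_z}{\partial\eta} & \dfrac{\partial e_z}{\partial\zeta}
\end{vmatrix}
= \begin{vmatrix}
\dfrac{\omega(1+\cos\omega)}{2\sin\omega} & \dfrac{\omega}{2} & 0\\[2mm]
-\dfrac{\omega}{2} & \dfrac{\omega(1+\cos\omega)}{2\sin\omega} & 0\\[2mm]
0 & 0 & 1
\end{vmatrix}
= \frac{\omega^2}{4}\left[\frac{(1+\cos\omega)^2}{\sin^2\omega} + 1\right]
= \frac{\omega^2}{4\sin^2\frac{\omega}{2}}, \tag{20.133}
\]

where we used formulae of trigonometric functions with the last equality. Note that (20.133) does not depend on the sign of ω. This is because J represents the relative volume density ratio between two parameter spaces of the $e_xe_ye_z$-system and ξηζ-system and this ratio is solely determined by the modulus of the rotation angle. Taking ω → 0, we have

\[
\lim_{\omega\to 0} J = \lim_{\omega\to 0}\frac{\omega^2}{4\sin^2\frac{\omega}{2}}
= \lim_{\omega\to 0}\frac{(\omega^2)'}{\left(4\sin^2\frac{\omega}{2}\right)'}
= \lim_{\omega\to 0}\frac{2\omega}{4\cdot 2\sin\frac{\omega}{2}\cdot\frac{1}{2}\cos\frac{\omega}{2}}
= \lim_{\omega\to 0}\frac{\omega}{\sin\omega}
= \lim_{\omega\to 0}\frac{(\omega)'}{(\sin\omega)'}
= \lim_{\omega\to 0}\frac{1}{\cos\omega} = 1.
\]

Thus, the relative volume density ratio in the limit of ω → 0 is 1, as expected. Let $dV = de_x\,de_y\,de_z$ and $d\Pi = d\xi\,d\eta\,d\zeta$ be volume elements of each coordinate system. Then we have

\[
dV = J\,d\Pi. \tag{20.134}
\]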
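The closed form of the Jacobian in (20.133), its independence of the sign of ω, and its limit at ω → 0 can be confirmed numerically. The sketch below is an illustration only; the function names and the sample angles are choices made here, and the matrix entries are the partial derivatives of (20.132).

```python
import numpy as np

def jacobian_matrix(omega):
    """Matrix of partial derivatives of (e_x, e_y, e_z) with respect to (xi, eta, zeta)."""
    a = omega * (1.0 + np.cos(omega)) / (2.0 * np.sin(omega))
    b = omega / 2.0
    return np.array([[a, b, 0.0],
                     [-b, a, 0.0],
                     [0.0, 0.0, 1.0]])

def jacobian_closed_form(omega):
    """Right-hand side of (20.133)."""
    return omega**2 / (4.0 * np.sin(omega / 2.0)**2)

omegas = np.array([-2.5, -1.0, 0.3, 1.0, 2.5])   # hypothetical sample angles
dets = np.array([np.linalg.det(jacobian_matrix(w)) for w in omegas])
closed = jacobian_closed_form(omegas)
small = jacobian_closed_form(1e-4)               # value close to omega = 0

print(dets - closed)   # determinant agrees with the closed form
print(small)           # close to 1
```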

We assume that

\[
J = \frac{dV}{d\Pi} = \frac{\rho_{\xi\eta\zeta}}{\rho_{e_xe_ye_z}},
\]

where $\rho_{e_xe_ye_z}$ and $\rho_{\xi\eta\zeta}$ are a "number density" of group elements in each coordinate system. We suppose that the infinitesimal rotations in the ξηζ-coordinate are converted by a finite rotation $R_\omega$ of (20.123) into the $e_xe_ye_z$-coordinate. In this way, J can be viewed as an "expansion" coefficient as a function of the rotation angle |ω|. Hence, we have

\[
dV = \frac{\rho_{\xi\eta\zeta}}{\rho_{e_xe_ye_z}}\,d\Pi \quad \text{or} \quad \rho_{e_xe_ye_z}\,dV = \rho_{\xi\eta\zeta}\,d\Pi. \tag{20.135}
\]

Equation (20.135) implies that the total (infinite) number of group elements that are contained in the group SO(3) is invariant with respect to the transformation. Let us calculate the total volume of SO(3). This can be measured such that

\[
\int d\Pi = \int \frac{1}{J}\,dV. \tag{20.136}
\]

Converting dV to the polar coordinate representation, we get

\[
\Pi[SO(3)] = \int d\Pi = \int_0^{\pi} d\tilde{\omega}\int_0^{\pi} d\theta\int_0^{2\pi} d\phi\;\frac{4\sin^2\frac{\tilde{\omega}}{2}}{\tilde{\omega}^2}\,\tilde{\omega}^2\sin\theta = 8\pi^2, \tag{20.137}
\]
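The value 8π² in (20.137) follows from three elementary integrals, which can also be evaluated numerically. The following sketch uses a simple midpoint rule; the helper `midpoint_integral` and the number of subintervals are choices made here for illustration.

```python
import numpy as np

def midpoint_integral(func, a, b, n=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    x = a + (np.arange(n) + 0.5) * h
    return h * np.sum(func(x))

# Radial part: the factor omega^2 of the polar volume element cancels against 1/J.
radial = midpoint_integral(lambda w: 4.0 * np.sin(w / 2.0)**2, 0.0, np.pi)
# Zenithal part.
polar = midpoint_integral(np.sin, 0.0, np.pi)
# Azimuthal part is trivial.
azimuthal = 2.0 * np.pi

volume = radial * polar * azimuthal
print(volume, 8.0 * np.pi**2)
```

The radial integral gives 2π and the zenithal integral gives 2, so that the product with 2π reproduces 8π².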

where we denote the total volume of SO(3) by Π[SO(3)]. Note that ω in (20.133) is replaced with $\tilde{\omega} \equiv |\omega|$ so that the radial coordinate can be positive. As already noted, (20.133) does not depend on the sign of ω. In other words, among the angles α, β, and ω (usually γ is used instead of ω; see Sect. 17.4.2), ω is taken as $-\pi \leq \omega < \pi$ (instead of $0 \leq \omega < 2\pi$) so that it can be conformed to the radial coordinate.

We utilize (20.137) for calculation of various functions. Let f(ϕ, θ, ω) be an arbitrary function on a parameter space. We can readily estimate a "mean value" of f(ϕ, θ, ω) on that space. That is, we have

\[
\overline{f(\phi,\theta,\omega)} \equiv \frac{\int f(\phi,\theta,\omega)\,d\Pi}{\int d\Pi}, \tag{20.138}
\]

where $\overline{f(\phi,\theta,\omega)}$ represents a mean value of the function and is given by

\[
\overline{f(\phi,\theta,\omega)} = 4\int_0^{\pi} d\tilde{\omega}\int_0^{\pi} d\theta\int_0^{2\pi} d\phi\; f(\phi,\theta,\omega)\sin^2\frac{\tilde{\omega}}{2}\sin\theta \Big/ \int d\Pi. \tag{20.139}
\]

There would be some inconvenience in using (20.139) because of the mixture of variables ω and $\tilde{\omega}$. Yet, that causes no problem if f(ϕ, θ, ω) is an even function with respect to ω (see Sect. 20.2.6).

20.2.6 Irreducible Characters of SO(3) and Their Orthogonality

To evaluate (20.139), let us get back to the irreducible representations $D^{(l)}(\alpha,\beta,\gamma)$ of Sect. 20.2.3. Also, we recall how successive coordinate transformations produce changes in the orthogonal matrices of transformation (Sect. 17.4.2). Since the parameters α, β, and γ uniquely specify an individual rotation, different sets of α, β, and γ (in this section we use ω instead of γ) cause different irreducible representations. A corresponding rotation matrix is given by (17.101). In fact, the said matrix is a representation of the rotation. Keeping these points in mind, we discuss irreducible representations and their characters particularly in relation to the orthogonality of the irreducible characters.

Fig. 20.4 Geometrical arrangement of two rotation axes A and A′ accompanied by the same rotation angle ω

Figure 20.4 depicts a geometrical arrangement of two rotation axes accompanied by the same rotation angle ω. Let $R_\omega$ and $R'_\omega$ be such two rotations around the rotation axes A and A′, respectively. Let Q be another rotation that transforms the rotation axis A to A′. Then we have [1, 2]

\[
R'_\omega = QR_\omega Q^{-1}. \tag{20.140}
\]

This implies that $R_\omega$ and $R'_\omega$ belong to the same conjugacy class. To consider the situation more clearly, let us view the two rotations $R_\omega$ and $R'_\omega$ from two different coordinate systems, say some xyz-coordinate system and another x′y′z′-coordinate system. Suppose also that A coincides with the z-axis and that A′ coincides with the z′-axis. Meanwhile, if we describe $R_\omega$ in reference to the xyz-system and $R'_\omega$ in reference to the x′y′z′-system, the representation of these rotations must be identical in reference to the individual coordinate systems. Let this representation matrix be $\tilde{R}_\omega$.

(i) If we describe $R_\omega$ and $R'_\omega$ with respect to the xyz-system, we have

\[
R_\omega = \tilde{R}_\omega, \quad R'_\omega = \left(Q^{-1}\right)^{-1}\tilde{R}_\omega\,Q^{-1} = Q\tilde{R}_\omega Q^{-1}. \tag{20.141}
\]

Namely, we reproduce

\[
R'_\omega = QR_\omega Q^{-1}. \tag{20.140}
\]

(ii) If we describe $R_\omega$ and $R'_\omega$ with respect to the x′y′z′-system, we have

\[
R'_\omega = \tilde{R}_\omega, \quad R_\omega = Q^{-1}\tilde{R}_\omega\,Q. \tag{20.142}
\]

Hence, again we reproduce (20.140). That is, the relation (20.140) is independent of specific choices of the coordinate system.

Taking the representation indexed by l of Sect. 20.2.4 with respect to (20.140), we have

\[
D^{(l)}(R'_\omega) = D^{(l)}\!\left(QR_\omega Q^{-1}\right) = D^{(l)}(Q)\,D^{(l)}(R_\omega)\,D^{(l)}\!\left(Q^{-1}\right) = D^{(l)}(Q)\,D^{(l)}(R_\omega)\left[D^{(l)}(Q)\right]^{-1}, \tag{20.143}
\]

where with the last equality we used (18.6). Operating $D^{(l)}(Q)$ on (20.143) from the right, we get

\[
D^{(l)}(R'_\omega)\,D^{(l)}(Q) = D^{(l)}(Q)\,D^{(l)}(R_\omega). \tag{20.144}
\]

Since both $D^{(l)}(R'_\omega)$ and $D^{(l)}(R_\omega)$ are irreducible, from (18.41) of Schur's First Lemma $D^{(l)}(R'_\omega)$ and $D^{(l)}(R_\omega)$ are equivalent. An infinite number of rotations determined by the orientation of the rotation axis A that is accompanied by an azimuthal angle α $(0 \leq \alpha < 2\pi)$ and a zenithal angle β $(0 \leq \beta < \pi)$ (see Fig. 17.18) form a conjugacy class with each specific ω shared. Thus, we have classified various representations $D^{(l)}(R_\omega)$ according to ω and l. Meanwhile, we may identify $D^{(l)}(R_\omega)$ with $D^{(l)}(\alpha,\beta,\gamma)$ of Sect. 20.2.4.

In Sect. 20.2.2 we know that the spherical surface harmonics $Y_l^m(\theta,\phi)$ $(m = -l, \dots, 0, \dots, l)$ constitute basis functions of $D^{(l)}$ and span the representation space. The dimension of the matrix (or the representation space) is $2l+1$. Therefore, $D^{(l)}$ and $D^{(l')}$ for different l and l′ have a different dimension. From (18.41) of Schur's First Lemma, such $D^{(l)}$ and $D^{(l')}$ are inequivalent.

Returning to (20.139), let us evaluate a mean value $\overline{f(\phi,\theta,\omega)}$. Let the traces of $D^{(l)}(R'_\omega)$ and $D^{(l)}(R_\omega)$ be $\chi^{(l)}(R'_\omega)$ and $\chi^{(l)}(R_\omega)$, respectively. Remembering (12.13), the trace is invariant under a similarity transformation, and so $\chi^{(l)}(R'_\omega) = \chi^{(l)}(R_\omega)$. Then, we put

\[
\chi^{(l)}(\omega) \equiv \chi^{(l)}(R_\omega) = \chi^{(l)}(R'_\omega). \tag{20.145}
\]

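Consequently, it suffices to evaluate the trace $\chi^{(l)}(\omega)$ using $D^{(l)}(\alpha,0,0)$ of (20.96) whose representation matrix is given by (20.97). We have

\[
\chi^{(l)}(\omega) = \sum_{m=-l}^{l} e^{im\omega} = \frac{e^{-il\omega}\left[1 - e^{i\omega(2l+1)}\right]}{1 - e^{i\omega}} = \frac{e^{-il\omega}\left(1 - e^{-i\omega}\right)\left[1 - e^{i\omega(2l+1)}\right]}{\left(1 - e^{i\omega}\right)\left(1 - e^{-i\omega}\right)}, \tag{20.146}
\]

where

\[
[\text{numerator of (20.146)}] = e^{-il\omega} + e^{il\omega} - e^{-i\omega(l+1)} - e^{i\omega(l+1)} = 2[\cos l\omega - \cos(l+1)\omega] = 4\sin\left(l+\frac{1}{2}\right)\omega\,\sin\frac{\omega}{2}
\]

and

\[
[\text{denominator of (20.146)}] = 2(1 - \cos\omega) = 4\sin^2\frac{\omega}{2}.
\]

Thus, we get

\[
\chi^{(l)}(\omega) = \frac{\sin\left(l+\frac{1}{2}\right)\omega}{\sin\frac{\omega}{2}}. \tag{20.147}
\]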
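As a quick sanity check on (20.146) and (20.147), the finite sum of exponentials and the closed form can be compared numerically. The function names and sample angles below are choices made here for illustration only.

```python
import numpy as np

def character_sum(l, omega):
    """Trace of the diagonal matrix D^(l)(omega, 0, 0): sum of exp(i m omega), m = -l..l."""
    m = np.arange(-l, l + 1)
    return np.sum(np.exp(1j * m * omega)).real

def character_closed(l, omega):
    """Closed form (20.147)."""
    return np.sin((l + 0.5) * omega) / np.sin(omega / 2.0)

sample_angles = [0.3, 1.1, 2.0, 3.0]   # hypothetical angles away from omega = 0
differences = [character_sum(l, w) - character_closed(l, w)
               for l in range(5) for w in sample_angles]
print(max(abs(d) for d in differences))

# Near omega = 0 the character approaches the dimension 2l + 1.
near_zero = [character_closed(l, 1e-6) for l in range(5)]
print(near_zero)
```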

To adapt (18.76) to the present formulation, we rewrite it as

\[
\frac{1}{n}\sum_g \left[\chi^{(\alpha)}(g)\right]^{*}\chi^{(\beta)}(g) = \delta_{\alpha\beta}. \tag{20.148}
\]

A summation over a finite number n of group elements in (20.148) should be read as integration in the case of the continuous groups. Instead of a finite number of irreducible representations in a finite group, we are thinking of an infinite number of irreducible representations with continuous groups. In (20.139) the denominator $\int d\Pi\ (= 8\pi^2)$ corresponds to n of (20.148). If f(ϕ, θ, ω) or f(α, β, ω) is an even function with respect to ω $(-\pi \leq \omega < \pi)$ as in the case of (20.147), the numerator of (20.139) can be expressed in the form of

\[
\int f(\alpha,\beta,\omega)\,d\Pi = \int_0^{\pi} d\omega\int_0^{\pi} d\beta\int_0^{2\pi} d\alpha\; f(\alpha,\beta,\omega)\,\frac{4\sin^2\frac{\omega}{2}}{\omega^2}\,\omega^2\sin\beta
= 4\int_0^{\pi} d\omega\int_0^{\pi} d\beta\int_0^{2\pi} d\alpha\; f(\alpha,\beta,\omega)\sin^2\frac{\omega}{2}\sin\beta. \tag{20.149}
\]

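If, moreover, f(α, β, ω) depends only on ω, again as in the case of the character described by (20.147), the calculation is further simplified such that

\[
\int f(\omega)\,d\Pi = 16\pi\int_0^{\pi} d\omega\, f(\omega)\sin^2\frac{\omega}{2}. \tag{20.150}
\]

Replacing f(ω) with $\left[\chi^{(l')}(\omega)\right]^{*}\chi^{(l)}(\omega)$, we have

\[
\begin{aligned}
\int \left[\chi^{(l')}(\omega)\right]^{*}\chi^{(l)}(\omega)\,d\Pi
&= 16\pi\int_0^{\pi} d\omega \left[\chi^{(l')}(\omega)\right]^{*}\chi^{(l)}(\omega)\sin^2\frac{\omega}{2}\\
&= 16\pi\int_0^{\pi} d\omega\,\frac{\sin\left(l'+\frac{1}{2}\right)\omega\,\sin\left(l+\frac{1}{2}\right)\omega}{\sin\frac{\omega}{2}\,\sin\frac{\omega}{2}}\,\sin^2\frac{\omega}{2}
= 16\pi\int_0^{\pi} d\omega\,\sin\left(l'+\frac{1}{2}\right)\omega\,\sin\left(l+\frac{1}{2}\right)\omega\\
&= 8\pi\int_0^{\pi} d\omega\,[\cos(l'-l)\omega - \cos(l'+l+1)\omega].
\end{aligned} \tag{20.151}
\]

Since $l' + l + 1 > 0$, the integral of the $\cos(l'+l+1)\omega$ term vanishes. Only if $l' - l = 0$, the integral of the $\cos(l'-l)\omega$ term does not vanish, but takes a value of π. Therefore, we have

\[
\int \left[\chi^{(l')}(\omega)\right]^{*}\chi^{(l)}(\omega)\,d\Pi = 8\pi^2\delta_{l'l}. \tag{20.152}
\]

Finally, with $\overline{\left[\chi^{(l')}(\omega)\right]^{*}\chi^{(l)}(\omega)}$ defined as in (20.139), we get

\[
\overline{\left[\chi^{(l')}(\omega)\right]^{*}\chi^{(l)}(\omega)} = \delta_{l'l}. \tag{20.153}
\]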
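The orthogonality relation (20.153) can be illustrated numerically by evaluating the mean value with the weight $\sin^2(\omega/2)$ of (20.150). This is a sketch under stated assumptions: the characters are taken real as in (20.147), the prefactor $16\pi/8\pi^2 = 2/\pi$ combines (20.150) with (20.137), and the midpoint rule and the range of l are choices made here.

```python
import numpy as np

def character(l, omega):
    """Irreducible character of SO(3), closed form (20.147)."""
    return np.sin((l + 0.5) * omega) / np.sin(omega / 2.0)

def mean_product(l_prime, l, n=20000):
    """Mean value of chi^(l') chi^(l) over the parameter space.

    Uses (16 pi / 8 pi^2) times the integral over [0, pi] of
    chi^(l') chi^(l) sin^2(omega / 2), evaluated by the midpoint rule.
    """
    h = np.pi / n
    w = (np.arange(n) + 0.5) * h          # midpoints never hit omega = 0
    integrand = character(l_prime, w) * character(l, w) * np.sin(w / 2.0)**2
    return (2.0 / np.pi) * h * np.sum(integrand)

# Table of mean values for l, l' = 0, 1, 2, 3; it should approximate the identity matrix.
gram = np.array([[mean_product(lp, l) for l in range(4)] for lp in range(4)])
print(np.round(gram, 6))
```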

This relation gives the orthogonalization condition in concert with (20.148) obtained in the case of a finite group. In Sect. 18.6 we have shown that the number of inequivalent irreducible representations of a finite group is well defined and given by the number of conjugacy classes of that group. This is clearly demonstrated in (18.144). In the case of the continuous groups, however, the situation is somewhat complicated and we are uncertain of the number of the inequivalent irreducible representations. We address this problem as follows: Suppose that Ω(α, β, ω) would be an irreducible representation that is inequivalent to individual $D^{(l)}(\alpha,\beta,\omega)$ $(l = 0, 1, 2, \dots)$. Let K(ω) be a character of Ω(α, β, ω). Defining $f(\omega) \equiv \left[\chi^{(l)}(\omega)\right]^{*}K(\omega)$, (20.150) reads as

\[
\int \left[\chi^{(l)}(\omega)\right]^{*}K(\omega)\,d\Pi = 16\pi\int_0^{\pi} d\omega \left[\chi^{(l)}(\omega)\right]^{*}K(\omega)\sin^2\frac{\omega}{2}. \tag{20.154}
\]

The orthogonalization condition (20.153) demands that the integral of (20.154) vanishes. Similarly, we would have

\[
\int \left[\chi^{(l+1)}(\omega)\right]^{*}K(\omega)\,d\Pi = 16\pi\int_0^{\pi} d\omega \left[\chi^{(l+1)}(\omega)\right]^{*}K(\omega)\sin^2\frac{\omega}{2}, \tag{20.155}
\]

which must vanish as well. Subtracting RHS of (20.154) from RHS of (20.155) and taking account of the fact that $\chi^{(l)}(\omega)$ is real [see (20.147)], we have

\[
\int_0^{\pi} d\omega \left[\chi^{(l+1)}(\omega) - \chi^{(l)}(\omega)\right]K(\omega)\sin^2\frac{\omega}{2} = 0. \tag{20.156}
\]

Invoking a trigonometric formula of

\[
\sin a - \sin b = 2\cos\frac{a+b}{2}\sin\frac{a-b}{2}
\]

and applying it to (20.147), we get $\chi^{(l+1)}(\omega) - \chi^{(l)}(\omega) = 2\cos(l+1)\omega$, and so (20.156) can be rewritten as

\[
2\int_0^{\pi} d\omega\,[\cos(l+1)\omega]\,K(\omega)\sin^2\frac{\omega}{2} = 0 \quad (l = 0, 1, 2, \dots). \tag{20.157}
\]

Putting $l = 0$ in LHS of (20.154) and from the assumption that the integral of (20.154) vanishes, we also have

\[
\int_0^{\pi} d\omega\,K(\omega)\sin^2\frac{\omega}{2} = 0, \tag{20.158}
\]

where we used $\chi^{(0)}(\omega) = 1$. Then, (20.157) and (20.158) are combined to give

\[
\int_0^{\pi} d\omega\,(\cos l\omega)\,K(\omega)\sin^2\frac{\omega}{2} = 0 \quad (l = 0, 1, 2, \dots). \tag{20.159}
\]

This implies that all the Fourier cosine coefficients of $K(\omega)\sin^2\frac{\omega}{2}$ vanish. Considering that $\cos l\omega$ $(l = 0, 1, 2, \dots)$ forms a complete orthogonal system in the region [0, π], we must have

\[
K(\omega)\sin^2\frac{\omega}{2} \equiv 0. \tag{20.160}
\]

Requiring K(ω) to be continuous, (20.160) is equivalent to K(ω) ≡ 0. This implies that there is no other inequivalent irreducible representation than $D^{(l)}(\alpha,\beta,\omega)$ $(l = 0, 1, 2, \dots)$. In other words, the representations $D^{(l)}(\alpha,\beta,\omega)$ $(l = 0, 1, 2, \dots)$ constitute a complete set of irreducible representations. The spherical surface harmonics $Y_l^m(\theta,\phi)$ $(m = -l, \dots, 0, \dots, l)$ constitute basis functions of $D^{(l)}$ and span the representation space of $D^{(l)}(\alpha,\beta,\omega)$ accordingly.

In the above discussion, we assumed that K(ω) is an even function with respect to ω as in the case of $\chi^{(l)}(\omega)$ in (20.147). Hence, $K(\omega)\sin^2\frac{\omega}{2}$ is an even function as well. Therefore, we assumed that $K(\omega)\sin^2\frac{\omega}{2}$ can be expanded in a Fourier cosine series.

20.3 Clebsch–Gordan Coefficients of Rotation Groups

In Sect. 18.8, we studied direct-product representations. We know that even though $D^{(\alpha)}$ and $D^{(\beta)}$ are both irreducible, $D^{(\alpha\times\beta)}$ is not necessarily irreducible. This is a common feature for both finite and infinite groups. Moreover, the reducible unitary representation can be expressed as a direct sum of irreducible representations such that

\[
D^{(\alpha)}(g) \otimes D^{(\beta)}(g) = \sum_\omega q_\omega D^{(\omega)}(g). \tag{18.199}
\]

In the infinite groups, typically the continuous groups, this situation often appears when we deal with the addition of two angular momenta. Examples include the addition of two (or more) orbital angular momenta and that of the orbital angular momentum and spin angular momentum [6, 8]. Meanwhile, there is a set of basis vectors that spans the representation space with respect to individual irreducible representations. There, we have the following question: What are the basis vectors that span the total reduced representation space described by (18.199)? An answer to this question is that these vectors must be constructed from the basis vectors relevant to the individual irreducible representations. The Clebsch–Gordan coefficients appear as coefficients with respect to a linear combination of the basis vectors associated with those irreducible representations. In a word, to find the Clebsch–Gordan coefficients is equivalent to the calculation of proper coefficients of the basis functions that span the representation space of the direct-product groups. The relevant continuous groups which we are interested in are SU(2) and SO(3).

20.3.1 Direct-Product of SU(2) and Clebsch–Gordan Coefficients

In Sect. 20.2.6 we have explained the orthogonality of irreducible characters of SO(3) by deriving (20.153). It takes advanced theory to show the orthogonality of irreducible characters of SU(2). Fortunately, however, we have an expression similar to (20.153) with SU(2) as well. Readers are encouraged to look up appropriate literature for this [9]. On the basis of this important expression, we further develop the representation theory of the continuous groups. Within this framework, we calculate the Clebsch–Gordan coefficients.

As in the case of the previous section, the character of the irreducible representation $D^{(j)}(\alpha,\beta,\gamma)$ of SU(2) is described only as a function of a rotation angle. Similarly to (20.147), in SU(2) we get

\[
\chi^{(j)}(\omega) = \sum_{m=-j}^{j} e^{im\omega} = \frac{\sin\left(j+\frac{1}{2}\right)\omega}{\sin\frac{\omega}{2}}, \tag{20.161}
\]

where $\chi^{(j)}(\omega)$ is an irreducible character of $D^{(j)}(\alpha,\beta,\gamma)$. It is because we can align the rotation axis in the direction of, e.g., the z-axis and the representation matrix of a rotation angle ω is typified by $D^{(j)}(\omega,0,0)$ or $D^{(j)}(0,0,\omega)$. Or, we can equally choose any direction of the rotation axis at will.

We are interested in calculating angular momenta of a coupled system, i.e., addition of angular momenta of that system [5, 6]. The term "coupled system" needs some explanation. This is because, on the one hand, we may deal with, e.g., a sum of an orbital angular momentum and a spin angular momentum of a "single" electron, but, on the other hand, we might calculate, e.g., a sum of two angular momenta of "two" electrons. In either case, we will deal with the total angular momenta of the coupled system.

Now, let us consider a direct-product group for this kind of problem. Suppose that one system is characterized by a quantum number $j_1$ and the other by $j_2$. Here these numbers are supposed to be those of (3.89), namely the highest positive number of the generalized angular momentum in the z-direction. That is, the representation matrices of rotation for systems 1 and 2 are assumed to be $D^{(j_1)}(\omega,0,0)$ and $D^{(j_2)}(\omega,0,0)$, respectively. Then, we are to deal with the direct-product representation $D^{(j_1)} \otimes D^{(j_2)}$ (see Sect. 18.8). As in (18.193), we describe

\[
D^{(j_1\times j_2)}(\omega) = D^{(j_1)}(\omega) \otimes D^{(j_2)}(\omega). \tag{20.162}
\]

In (20.162), the group is represented by a rotation angle ω, more strictly the rotation operation of an angle ω. The quantum numbers $j_1$ and $j_2$ label the irreducible representations of the rotations for systems 1 and 2, respectively. According to (20.161), we have

\[
\chi^{(j_1\times j_2)}(\omega) = \chi^{(j_1)}(\omega)\,\chi^{(j_2)}(\omega), \tag{20.163}
\]

where $\chi^{(j_1\times j_2)}(\omega)$ gives a character of the direct-product representation; see (18.198). Furthermore, we have [5]

\[
\chi^{(j_1\times j_2)}(\omega) = \sum_{m_1=-j_1}^{j_1} e^{im_1\omega}\sum_{m_2=-j_2}^{j_2} e^{im_2\omega}
= \sum_{m_1=-j_1}^{j_1}\sum_{m_2=-j_2}^{j_2} e^{i(m_1+m_2)\omega}
= \sum_{J=|j_1-j_2|}^{j_1+j_2}\;\sum_{M=-J}^{J} e^{iM\omega}
= \sum_{J=|j_1-j_2|}^{j_1+j_2}\chi^{(J)}(\omega), \tag{20.164}
\]

where a positive number J can be chosen from among

\[
J = |j_1-j_2|,\ |j_1-j_2|+1,\ \dots,\ j_1+j_2.
\]

Rewriting (20.164) more explicitly, we have

\[
\chi^{(j_1\times j_2)}(\omega) = \chi^{(|j_1-j_2|)}(\omega) + \chi^{(|j_1-j_2|+1)}(\omega) + \cdots + \chi^{(j_1+j_2)}(\omega). \tag{20.165}
\]
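The character identity (20.165) holds for integer as well as half-odd-integer quantum numbers, and a numerical spot check is straightforward. In the sketch below the function names, the choice $j_1 = 3/2$, $j_2 = 1$, and the sample angles are illustrative assumptions; quantum numbers are handled through the integers 2j to avoid rounding issues.

```python
import numpy as np

def character(j, omega):
    """chi^(j)(omega) as the sum of exp(i m omega) over m = -j, -j+1, ..., j, cf. (20.161)."""
    two_j = int(round(2 * j))
    m = np.arange(-two_j, two_j + 1, 2) / 2.0
    return np.sum(np.exp(1j * np.outer(omega, m)), axis=1).real

def coupled_values(j1, j2):
    """List J = |j1 - j2|, |j1 - j2| + 1, ..., j1 + j2."""
    two_min = int(round(2 * abs(j1 - j2)))
    two_max = int(round(2 * (j1 + j2)))
    return [t / 2.0 for t in range(two_min, two_max + 1, 2)]

omega = np.linspace(0.1, 3.0, 7)   # hypothetical sample angles
j1, j2 = 1.5, 1.0                  # hypothetical quantum numbers

lhs = character(j1, omega) * character(j2, omega)                 # cf. (20.163)
rhs = sum(character(J, omega) for J in coupled_values(j1, j2))    # cf. (20.165)
print(coupled_values(j1, j2))
print(np.max(np.abs(lhs - rhs)))
```

For the values chosen here J runs over 1/2, 3/2, and 5/2, and the two sides agree to rounding error.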

If both $j_1$ and $j_2$ are integers, from (20.165) we can immediately derive an important result. In light of (18.199) and (18.200), we multiply both sides of (20.165) by $\left[\chi^{(k)}(\omega)\right]^{*}$ and then integrate it to get

\[
\begin{aligned}
\int \left[\chi^{(k)}(\omega)\right]^{*}\chi^{(j_1\times j_2)}(\omega)\,d\Pi
&= \int \left[\chi^{(k)}(\omega)\right]^{*}\chi^{(|j_1-j_2|)}(\omega)\,d\Pi + \int \left[\chi^{(k)}(\omega)\right]^{*}\chi^{(|j_1-j_2|+1)}(\omega)\,d\Pi + \cdots + \int \left[\chi^{(k)}(\omega)\right]^{*}\chi^{(j_1+j_2)}(\omega)\,d\Pi\\
&= 8\pi^2\left(\delta_{k,|j_1-j_2|} + \delta_{k,|j_1-j_2|+1} + \cdots + \delta_{k,j_1+j_2}\right),
\end{aligned} \tag{20.166}
\]

where k is zero or a positive integer and with the last equality we used (20.152). Dividing both sides by $\int d\Pi$, we get

\[
\overline{\left[\chi^{(k)}(\omega)\right]^{*}\chi^{(j_1\times j_2)}(\omega)} = \delta_{k,|j_1-j_2|} + \delta_{k,|j_1-j_2|+1} + \cdots + \delta_{k,j_1+j_2}, \tag{20.167}
\]

where we used (20.153). Equation (20.166) implies that only if k is identical to one out of $|j_1-j_2|, |j_1-j_2|+1, \dots, j_1+j_2$, $\overline{\left[\chi^{(k)}(\omega)\right]^{*}\chi^{(j_1\times j_2)}(\omega)}$ does not vanish. If, for example, we choose $|j_1-j_2|$ for k in (20.167), we have

\[
\overline{\left[\chi^{(|j_1-j_2|)}(\omega)\right]^{*}\chi^{(j_1\times j_2)}(\omega)} = \delta_{|j_1-j_2|,|j_1-j_2|} = 1. \tag{20.168}
\]

This relation corresponds to (18.200) where a finite group is relevant. Note that the volume of SO(3), i.e., $\int d\Pi = 8\pi^2$, corresponds to the order n of a finite group in (20.148). Considering (18.199) and (18.200) once again, it follows that in the above case $D^{(|j_1-j_2|)}$ takes place once and only once in the direct-product representation $D^{(j_1\times j_2)}$. Thus, we obtain the following important relation:

\[
D^{(j_1\times j_2)} = D^{(j_1)}(\omega) \otimes D^{(j_2)}(\omega) = \sum_{J=|j_1-j_2|}^{j_1+j_2} D^{(J)}. \tag{20.169}
\]
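For integer $j_1$ and $j_2$, the multiplicities in (20.167) can be obtained by direct numerical integration, and the dimensions on both sides of (20.169) can be compared. The sketch below rests on assumptions made here: the characters are real as in (20.147), the prefactor 2/π comes from (20.150) divided by 8π², and the values $j_1 = 3$, $j_2 = 1$ and the midpoint rule are illustrative choices.

```python
import numpy as np

def character(l, omega):
    """Closed-form character (20.147) for integer l."""
    return np.sin((l + 0.5) * omega) / np.sin(omega / 2.0)

def multiplicity(k, j1, j2, n=20000):
    """Mean value of chi^(k) chi^(j1) chi^(j2), cf. (20.167), by the midpoint rule."""
    h = np.pi / n
    w = (np.arange(n) + 0.5) * h
    integrand = (character(k, w) * character(j1, w) * character(j2, w)
                 * np.sin(w / 2.0)**2)
    return (2.0 / np.pi) * h * np.sum(integrand)

j1, j2 = 3, 1   # hypothetical integer quantum numbers
mults = [int(round(multiplicity(k, j1, j2))) for k in range(j1 + j2 + 2)]
print(mults)    # one entry per k = 0, 1, ..., j1 + j2 + 1

# Dimension count: sum over k of (multiplicity) x (2k + 1) versus (2 j1 + 1)(2 j2 + 1).
dimension = sum((2 * k + 1) * q for k, q in enumerate(mults))
print(dimension, (2 * j1 + 1) * (2 * j2 + 1))
```

Each multiplicity comes out as either zero or one, which is the numerical counterpart of the statement that every $D^{(J)}$ in (20.169) takes place once and only once.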

Any group (finite or infinite) is said to be simply reducible, if each irreducible representation takes place at most once when the direct-product representation of the group in question is reduced. Equation (20.169) clearly shows that SO(3) is simply reducible. With respect to SU(2) we also have the same expression as (20.169) [9].

In Sect. 18.8 we examined the direct-product representation of two (irreducible) representations. Let $D^{(\mu)}(g)$ and $D^{(\nu)}(g)$ be two different irreducible representations of the group G. Here G may be either a finite or infinite group. Note that in Sect. 18.8 we have focused on the finite groups and that now we are thinking of an infinite group, typically SU(2) and SO(3). Let $n_\mu$ and $n_\nu$ be the dimensions of the representation spaces of the irreducible representations μ and ν, respectively. We assume that each representation space is spanned by the following basis functions:

\[
\psi_j^{(\mu)}\ (j = 1, \dots, n_\mu) \quad \text{and} \quad \phi_l^{(\nu)}\ (l = 1, \dots, n_\nu).
\]

We know from Sect. 18.8 that $n_\mu n_\nu$ new basis functions can be constructed using the functions that are described by $\psi_j^{(\mu)}\phi_l^{(\nu)}$. Our next task will be to classify these $n_\mu n_\nu$ functions into those belonging to different irreducible representations. For this, we need the relation (18.199). According to the custom, we rewrite it as follows:

\[
D^{(\mu)}(g) \otimes D^{(\nu)}(g) = \sum_\omega (\mu\nu\omega)\,D^{(\omega)}(g), \tag{20.170}
\]

where (μνω) is identical with $q_\omega$ of RHS of (18.199) and shows how many times the same representation ω takes place in the direct-product representation. Naturally, we have

\[
(\mu\nu\omega) = (\nu\mu\omega).
\]

Then, we get

\[
\sum_\omega (\mu\nu\omega)\,n_\omega = n_\mu n_\nu.
\]

Meanwhile, we should be able to constitute the functions that have the same transformation properties as those of the irreducible representation ω. In other words, these functions should make the basis functions belonging to ω. Let the function be $\Psi_s^{(\omega\tau_\omega)}$. Then, we have

\[
\Psi_s^{(\omega\tau_\omega)} = \sum_{j,l} (\mu j, \nu l\,|\,\omega\tau_\omega s)\,\psi_j^{(\mu)}\phi_l^{(\nu)}, \tag{20.171}
\]

where $\tau_\omega = 1, \dots, (\mu\nu\omega)$ and s designates the functions contained in the irreducible representation ω; i.e., $s = 1, \dots, n_\omega$. If ω takes place more than once, we label ω as $\omega\tau_\omega$. The coefficients $(\mu j, \nu l\,|\,\omega\tau_\omega s)$ with respect to the above linear combination are called Clebsch–Gordan coefficients. Readers might well be bewildered by the notations of abstract algebra, but Examples of Sect. 20.3.3 should greatly relieve them of anxiety about it.

20.3

Clebsch–Gordan Coefficients of Rotation Groups

847

As we shall see later in this chapter, the functions Ψðsωτω Þ are going to be ðμÞ ðvÞ normalized along with the product functions ψ j ϕl . The ClebschGordan coefficients must form a unitary matrix accordingly. Notice that a transformation between two orthonormal basis sets is unitary (see Chap. 14). According as there are nμnv product functions as the basis vectors, the ClebschGordan coefficients are associated with (nμnv, nμnv) square matrices. The said coefficients can be defined with any group either finite or infinite. Or rather, difficulty in determining the ClebschGordan coefficients arises when τω is more than one. This is because we should take account of linear combinations described by X

c Ψðωτω Þ τ ω ωτ ω s

that belong to the irreducible representation $\omega$. This would cause arbitrariness and ambiguity [5]. Nevertheless, we do not need to get into further discussion about this problem. From now on, we focus on the Clebsch–Gordan coefficients of SU(2) and SO(3), typical simply reducible groups; namely, $\tau_\omega$ is at most one.

That SU(2) and SO(3) are simply reducible saves us a lot of labor. The argument for this is as follows: Equation (20.169) shows how the dimension $(2j_1+1)(2j_2+1)$ of the representation space of the direct product of two representations is redistributed to individual irreducible representations, each of which takes place only once. The arithmetic based on this fact is as follows:

\[
\begin{aligned}
(2j_1+1)(2j_2+1) &= 2(j_1-j_2)(2j_2+1) + 2\left\{\tfrac{1}{2}\left[2j_2(2j_2+1)\right]\right\} + (2j_2+1)\\
&= 2(j_1-j_2)(2j_2+1) + 2(0+1+2+\cdots+2j_2) + (2j_2+1)\\
&= [2(j_1-j_2)+0] + 2[(j_1-j_2)+1] + 2[(j_1-j_2)+2] + \cdots + 2[(j_1-j_2)+2j_2] + (2j_2+1)\\
&= [2(j_1-j_2)+1] + \{2[(j_1-j_2)+1]+1\} + \cdots + [2(j_1+j_2)+1],
\end{aligned}
\tag{20.172}
\]

where with the second equality we used the formula $1+2+\cdots+n=\frac{1}{2}n(n+1)$; the resulting numbers $0, 1, 2, \dots, 2j_2$ are distributed to the $2j_2+1$ terms of the subsequent RHS of (20.172), one each; the third term $2j_2+1$ is likewise distributed to the $2j_2+1$ terms of the rightmost side, one each. Equation (20.172) is symmetric with respect to $j_1$ and $j_2$, but if we assume that $j_1 \ge j_2$, each term of the rightmost side of (20.172) is positive. Therefore, for convenience we assume $j_1 \ge j_2$ without loss of generality. To clarify the situation, we list several tables (Tables 20.1, 20.2, and 20.3). Tables 20.1 and 20.2 show specific cases, but Table 20.3 represents a general case.

Table 20.1 Summation of angular momenta. $J = 0$, 1, or 2; $M = -J, -J+1, \dots, J$. Each entry is $M = m_1 + m_2$. Regarding a dotted aqua line, see text

\[
\begin{array}{c|ccc}
m_2 \backslash m_1 & -1 & 0 & 1\ (= j_1)\\ \hline
-1 & -2 & -1 & 0\\
0 & -1 & 0 & 1\\
1\ (= j_2) & 0 & 1 & 2
\end{array}
\]

Table 20.2 Summation of angular momenta. $J = 1$, 2, or 3; $M = -J, -J+1, \dots, J$. Each entry is $M = m_1 + m_2$

\[
\begin{array}{c|ccccc}
m_2 \backslash m_1 & -2 & -1 & 0 & 1 & 2\ (= j_1)\\ \hline
-1 & -3 & -2 & -1 & 0 & 1\\
0 & -2 & -1 & 0 & 1 & 2\\
1\ (= j_2) & -1 & 0 & 1 & 2 & 3
\end{array}
\]

Table 20.3 Summation of angular momenta in a general case. We suppose $j_1 > 2j_2$. $J = |j_1-j_2|, \dots, j_1+j_2$; $M = -J, -J+1, \dots, J$. Each entry is $M = m_1 + m_2$ (schematic)

\[
\begin{array}{c|ccccc}
m_2 \backslash m_1 & -j_1 & -j_1+1 & \cdots & j_1-1 & j_1\\ \hline
-j_2 & -j_1-j_2 & -j_1-j_2+1 & \cdots & j_1-j_2-1 & j_1-j_2\\
-j_2+1 & -j_1-j_2+1 & -j_1-j_2+2 & \cdots & j_1-j_2 & j_1-j_2+1\\
\vdots & \vdots & \vdots & & \vdots & \vdots\\
j_2 & -j_1+j_2 & -j_1+j_2+1 & \cdots & j_1+j_2-1 & j_1+j_2
\end{array}
\]

In these tables the topmost layer has a diagram that corresponds to the case of $J = |j_1-j_2|$ and contains $2|j_1-j_2|+1$ different quantum states. The lower layers include diagrams that correspond to the cases of $J = |j_1-j_2|+1, \dots, j_1+j_2$. For each $J$, $2J+1$ functions (or quantum states) are included and labelled according to different numbers $M = -J, -J+1, \dots, J$. The rightmost column displays $M \equiv m_1+m_2 = |j_1-j_2|, |j_1-j_2|+1, \dots, j_1+j_2$ from the topmost row through the bottommost row. Tables 20.1, 20.2, and 20.3 comprise $2j_2+1$ layers, where in total $(2j_1+1)(2j_2+1)$ functions that form basis vectors of $D^{(j_1)}(\omega)\otimes D^{(j_2)}(\omega)$ are redistributed according to the $M$ number. The same numbers that appear from the topmost row through the bottommost row are marked red and connected with dotted aqua lines. These lines form a parallelogram together with the top and bottom horizontal lines as shown. The upper and lower sides of the parallelogram are $2|j_1-j_2|$ in width. The parallelogram gets flattened with decreasing $|j_1-j_2|$. If $j_1 = j_2$, the parallelogram coalesces into a line (Table 20.1). Regarding the notations $M$ and $J$, see Sects. 20.3.2 and 20.3.3 (vide infra).

Bearing in mind the above remarks, we focus on the coupling of two angular momenta as a typical example. Within this framework we deal with the Clebsch–Gordan coefficients with regard to SU(2). We develop the discussion according to literature [5]. The relevant approach helps address a more complicated situation where the coupling of three or more angular momenta, including $j$–$j$ coupling and $L$–$S$ coupling, is involved [6]. From (20.47) we know that $f_m$ forms a basis set that spans a $(2j+1)$-dimensional representation space associated with $D^{(j)}$ such that

\[
f_m = \frac{u^{j+m}v^{j-m}}{\sqrt{(j+m)!\,(j-m)!}} \quad (m = -j, -j+1, \dots, j-1, j).
\tag{20.47}
\]

The functions $f_m$ are transformed according to $\mathfrak{R}(a,b)$ so that we have

\[
\mathfrak{R}(a,b)(f_m) \equiv \sum_{m'} f_{m'}\,D^{(j)}_{m'm}(a,b),
\tag{20.48}
\]

where $j$ can be either an integer or a half-odd-integer. We henceforth work on $f_m$ of (20.47).
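The dimension count of (20.172) and the way each value of $M = m_1 + m_2$ recurs in Tables 20.1 to 20.3 can be checked by direct enumeration. The following is a minimal sketch; the function names (`j_values`, `m_values`, `m_multiplicities`, `dimension_check`) are illustrative choices and do not come from the text.

```python
from collections import Counter
from fractions import Fraction


def j_values(j1, j2):
    """Allowed total J: |j1 - j2|, |j1 - j2| + 1, ..., j1 + j2."""
    j1, j2 = Fraction(j1), Fraction(j2)
    values = []
    J = abs(j1 - j2)
    while J <= j1 + j2:
        values.append(J)
        J += 1
    return values


def m_values(j):
    """Magnetic quantum numbers m = -j, -j + 1, ..., j."""
    j = Fraction(j)
    return [-j + k for k in range(int(2 * j) + 1)]


def m_multiplicities(j1, j2):
    """Count how often each M = m1 + m2 occurs among the product functions."""
    return Counter(m1 + m2 for m1 in m_values(j1) for m2 in m_values(j2))


def dimension_check(j1, j2):
    """Return (left side, right side) of (20.172) as integers."""
    lhs = (2 * Fraction(j1) + 1) * (2 * Fraction(j2) + 1)
    rhs = sum(2 * J + 1 for J in j_values(j1, j2))
    return int(lhs), int(rhs)


if __name__ == "__main__":
    for j1, j2 in [(1, 1), (2, 1), (Fraction(3, 2), Fraction(1, 2))]:
        print(j1, j2, dimension_check(j1, j2), dict(m_multiplicities(j1, j2)))
```

For $j_1 \ge j_2$ no value of $M$ occurs more than $2j_2+1$ times, which is the number of layers in the tables.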

20.3.2 Calculation Procedures of Clebsch–Gordan Coefficients

In the previous section we have described how systematically the quantum states are made up of two angular momenta. We express those states in terms of a linear combination of the basis functions related to the direct-product representations of (20.169). To determine those coefficients, we follow calculation procedures due to Hamermesh [5]. The calculation procedures are rather lengthy, and so we summarize each item separately.

(1) Invariant quantities $A_J$ and $B_J$: We start with seeking invariant quantities under the unitary transformation $U$ of (20.35) expressed as

\[
U = \begin{pmatrix} a & b\\ -b^* & a^* \end{pmatrix} \quad \text{with} \quad |a|^2+|b|^2 = 1.
\tag{20.35}
\]

As in (20.46) let us consider a combination of variables $u \equiv (u_2\ u_1)$ and $v \equiv (v_2\ v_1)$ that are transformed such that

\[
(u_2'\ u_1') = (u_2\ u_1)\begin{pmatrix} a & b\\ -b^* & a^* \end{pmatrix}, \qquad
(v_2'\ v_1') = (v_2\ v_1)\begin{pmatrix} a & b\\ -b^* & a^* \end{pmatrix}.
\tag{20.173}
\]

For a shorthand notation we have

\[
u' = um, \qquad v' = vm,
\tag{20.174}
\]

where we define $u'$, $v'$, and $m$ as

\[
u' \equiv (u_2'\ u_1'), \qquad v' \equiv (v_2'\ v_1'), \qquad m \equiv \begin{pmatrix} a & b\\ -b^* & a^* \end{pmatrix}.
\tag{20.175}
\]

As in (20.47) we also define the following functions as

\[
\begin{aligned}
\psi^{j_1}_{m_1} &= \frac{u_1^{\,j_1+m_1}u_2^{\,j_1-m_1}}{\sqrt{(j_1+m_1)!\,(j_1-m_1)!}} \quad (m_1 = -j_1, -j_1+1, \dots, j_1-1, j_1),\\
\phi^{j_2}_{m_2} &= \frac{v_1^{\,j_2+m_2}v_2^{\,j_2-m_2}}{\sqrt{(j_2+m_2)!\,(j_2-m_2)!}} \quad (m_2 = -j_2, -j_2+1, \dots, j_2-1, j_2).
\end{aligned}
\tag{20.176}
\]

Meanwhile, we also consider a combination of variables $x \equiv (x_2\ x_1)$ that are transformed as

\[
(x_2'\ x_1') = (x_2\ x_1)\begin{pmatrix} a^* & b^*\\ -b & a \end{pmatrix}.
\tag{20.177}
\]

For a shorthand notation we have

\[
x' = xm^*,
\tag{20.178}
\]

where we have

\[
x' \equiv (x_2'\ x_1'), \qquad x \equiv (x_2\ x_1), \qquad m^* = \begin{pmatrix} a^* & b^*\\ -b & a \end{pmatrix}.
\tag{20.179}
\]

The unitary matrix $m^*$ is said to be a complex conjugate matrix of $m$. We say that in (20.174) $u$ and $v$ are transformed covariantly and that in (20.178) $x$ is transformed contravariantly. Defining a unitary operator $g$ as

\[
g \equiv \begin{pmatrix} 0 & -1\\ 1 & 0 \end{pmatrix}
\tag{20.180}
\]

and operating $g$ on both sides of (20.174) from the right, we have

\[
u'g = u\left(gg^\dagger\right)mg = ug\left(g^\dagger mg\right) = ugm^*.
\tag{20.181}
\]

Rewriting (20.181) as

\[
u'g = (ug)m^*,
\tag{20.182}
\]

we find that $ug$ is transformed contravariantly. Taking transposition of (20.178), we have

\[
x'^{T} = (m^*)^{T}x^{T} = m^\dagger x^{T}.
\]

Then, we get

\[
u'x'^{T} = (um)\,m^\dagger x^{T} = u\left(mm^\dagger\right)x^{T} = ux^{T} = u_2x_2 + u_1x_1,
\tag{20.183}
\]

where with the third equality we used the unitarity of $m$. This implies that $ux^{T}$ is an invariant quantity under the unitary transformation by $m$. We have

\[
v'x'^{T} = vx^{T},
\tag{20.184}
\]

meaning that $vx^{T}$ is an invariant as well. In a similar manner, we have

\[
\begin{aligned}
u_1'v_2' - u_2'v_1' &= v'(u'g)^{T} = v'g^{T}u'^{T} = vmg^{T}m^{T}u^{T} = vg^{T}u^{T} = v(ug)^{T}\\
&= (v_2\ v_1)\begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix}\begin{pmatrix} u_2\\ u_1 \end{pmatrix} = u_1v_2 - u_2v_1.
\end{aligned}
\tag{20.185}
\]

Thus, $v(ug)^{T}$ (or $u_1v_2 - u_2v_1$) is another invariant under the unitary transformation by $m$. In the above discussion we can view (20.183) to (20.185) as quadratic forms of two variables (see Sect. 14.5). Using these invariants, we define other important invariants $A_J$ and $B_J$ as follows [5]:

\[
A_J \equiv (u_1v_2 - u_2v_1)^{j_1+j_2-J}\,(u_2x_2 + u_1x_1)^{j_1-j_2+J}\,(v_2x_2 + v_1x_1)^{j_2-j_1+J},
\tag{20.186}
\]

\[
B_J \equiv (u_2x_2 + u_1x_1)^{2J}.
\tag{20.187}
\]
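The invariance of $ux^{T}$, $vx^{T}$, and $u_1v_2 - u_2v_1$ in (20.183) to (20.185), and hence of $A_J$ in (20.186), can be spot-checked numerically. The sketch below assumes the matrix $m$ of (20.175) with $a = e^{i\varphi}\cos\theta$ and $b = e^{i\chi}\sin\theta$, a parametrization chosen here only to satisfy $|a|^2+|b|^2=1$; all function names are illustrative.

```python
import cmath
import math
import random


def su2_matrix(theta, phi, chi):
    """m = [[a, b], [-b*, a*]] with |a|^2 + |b|^2 = 1, cf. (20.35)."""
    a = cmath.exp(1j * phi) * math.cos(theta)
    b = cmath.exp(1j * chi) * math.sin(theta)
    return [[a, b], [-b.conjugate(), a.conjugate()]]


def conjugate(mat):
    """Entry-wise complex conjugate, i.e. m* of (20.179)."""
    return [[z.conjugate() for z in row] for row in mat]


def row_times_matrix(vec, mat):
    """Row vector with two entries times a 2x2 matrix."""
    return [vec[0] * mat[0][0] + vec[1] * mat[1][0],
            vec[0] * mat[0][1] + vec[1] * mat[1][1]]


def basic_invariants(u, v, x):
    """Vectors are stored as (z2, z1). Returns (u x^T, v x^T, u1 v2 - u2 v1)."""
    ux = u[0] * x[0] + u[1] * x[1]
    vx = v[0] * x[0] + v[1] * x[1]
    w = u[1] * v[0] - u[0] * v[1]
    return ux, vx, w


def a_j(u, v, x, j1, j2, J):
    """A_J of (20.186); the three exponents are non-negative integers."""
    ux, vx, w = basic_invariants(u, v, x)
    return (w ** round(j1 + j2 - J)) * (ux ** round(j1 - j2 + J)) * (vx ** round(j2 - j1 + J))


def transformed(u, v, x, m):
    """u and v transform with m (covariantly), x with m* (contravariantly)."""
    return (row_times_matrix(u, m), row_times_matrix(v, m),
            row_times_matrix(x, conjugate(m)))


def random_vector(rng):
    """A pair of complex numbers with real and imaginary parts in [-1, 1]."""
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(2)]
```

Comparing `basic_invariants(u, v, x)` with `basic_invariants(*transformed(u, v, x, m))` for random inputs should give agreement up to rounding error.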

The polynomial $A_J$ is of degree $2j_1$ with respect to the covariant variables $u_1$ and $u_2$ and is of degree $2j_2$ with respect to $v_1$ and $v_2$. The degree of the contravariant variables $x_1$ and $x_2$ is $2J$. Expanding $A_J$ in powers of $x_1$ and $x_2$, we get

\[
A_J = \sum_{M=-J}^{J} W^J_M X^J_M,
\tag{20.188}
\]

where $X^J_M$ is defined as

\[
X^J_M \equiv \frac{x_1^{\,J+M}x_2^{\,J-M}}{\sqrt{(J+M)!\,(J-M)!}} \quad (M = -J, -J+1, \dots, J-1, J)
\tag{20.189}
\]

and the coefficients $W^J_M$ are polynomials of $u_1$, $u_2$, $v_1$, and $v_2$. The coefficients $W^J_M$ will be determined shortly in combination with $X^J_M$. Meanwhile, we have

\[
\begin{aligned}
B_J &= \sum_{k=0}^{2J} \frac{(2J)!}{k!\,(2J-k)!}\,(u_1x_1)^{k}(u_2x_2)^{2J-k}\\
&= (2J)! \sum_{M=-J}^{J} \frac{1}{(J+M)!\,(J-M)!}\,(u_1x_1)^{J+M}(u_2x_2)^{J-M}\\
&= (2J)! \sum_{M=-J}^{J} \frac{1}{(J+M)!\,(J-M)!}\,u_1^{\,J+M}u_2^{\,J-M}x_1^{\,J+M}x_2^{\,J-M}\\
&= (2J)! \sum_{M=-J}^{J} \Phi^J_M X^J_M,
\end{aligned}
\tag{20.190}
\]

where $\Phi^J_M$ is defined as

\[
\Phi^J_M \equiv \frac{u_1^{\,J+M}u_2^{\,J-M}}{\sqrt{(J+M)!\,(J-M)!}} \quad (M = -J, -J+1, \dots, J-1, J).
\tag{20.191}
\]

Implications of the above calculations of abstract algebra are as follows: In Sect. 20.2.2 we determined the representation matrices $D^{(j)}_{m'm}$ of SU(2) using (20.47) and (20.48). We see that the functions $f_m$ $(m = -j, -j+1, \dots, j-1, j)$ of (20.47) are transformed as basis vectors by the unitary transformation $D^{(j)}$. Meanwhile, the functions $\Phi^J_M$ $(M = -J, -J+1, \dots, J-1, J)$ of (20.191) are transformed as the basis vectors by the unitary transformation $D^{(J)}$. Here let us compare (20.188) and (20.190). Then, we see that whereas $\Phi^J_M$ associates $X^J_M$ with the invariant $B_J$, $W^J_M$ associates $X^J_M$ with the invariant $A_J$. Thus, $W^J_M$ plays the same role as $\Phi^J_M$ and, hence, we expect that the $W^J_M$ are eligible as basis vectors for $D^{(J)}$ as well.

(2) Binomial expansion of $A_J$: To calculate $W^J_M$, using the binomial theorem we expand the factors of $A_J$ such that [5]

\[
\begin{aligned}
(u_1v_2 - u_2v_1)^{j_1+j_2-J} &= \sum_{\lambda=0}^{j_1+j_2-J} (-1)^{\lambda}\binom{j_1+j_2-J}{\lambda}(u_1v_2)^{j_1+j_2-J-\lambda}(u_2v_1)^{\lambda},\\
(u_2x_2 + u_1x_1)^{j_1-j_2+J} &= \sum_{\mu=0}^{j_1-j_2+J} \binom{j_1-j_2+J}{\mu}(u_1x_1)^{j_1-j_2+J-\mu}(u_2x_2)^{\mu},\\
(v_2x_2 + v_1x_1)^{j_2-j_1+J} &= \sum_{\nu=0}^{j_2-j_1+J} \binom{j_2-j_1+J}{\nu}(v_1x_1)^{j_2-j_1+J-\nu}(v_2x_2)^{\nu}.
\end{aligned}
\tag{20.192}
\]

Therefore, we have

\[
A_J = \sum_{\lambda,\mu,\nu} (-1)^{\lambda}\binom{j_1+j_2-J}{\lambda}\binom{j_1-j_2+J}{\mu}\binom{j_2-j_1+J}{\nu}\,
u_1^{\,2j_1-\lambda-\mu}u_2^{\,\lambda+\mu}\,v_1^{\,j_2-j_1+J-\nu+\lambda}v_2^{\,j_1+j_2-J-\lambda+\nu}\,x_1^{\,2J-\mu-\nu}x_2^{\,\mu+\nu}.
\tag{20.193}
\]

Introducing new summation variables $m_1 = j_1 - \lambda - \mu$ and $m_2 = J - j_1 + \lambda - \nu$, we obtain

\[
A_J = \sum_{\lambda,\mu,\nu} (-1)^{\lambda}\binom{j_1+j_2-J}{\lambda}\binom{j_1-j_2+J}{j_1-\lambda-m_1}\binom{j_2-j_1+J}{j_2-\lambda+m_2}\,
u_1^{\,j_1+m_1}u_2^{\,j_1-m_1}\,v_1^{\,j_2+m_2}v_2^{\,j_2-m_2}\,x_1^{\,J+m_1+m_2}x_2^{\,J-m_1-m_2},
\tag{20.194}
\]

where to derive $\binom{j_2-j_1+J}{j_2-\lambda+m_2}$ in RHS we used

\[
\binom{a}{b} = \binom{a}{a-b}.
\tag{20.195}
\]

Further using (20.176) and (20.189), we modify $A_J$ such that

\[
A_J = \sum_{m_1,m_2,\lambda} \frac{(-1)^{\lambda}\,(j_1+j_2-J)!\,(j_1-j_2+J)!\,(j_2-j_1+J)!\,\left[(j_1+m_1)!\,(j_1-m_1)!\,(j_2+m_2)!\,(j_2-m_2)!\,(J+m_1+m_2)!\,(J-m_1-m_2)!\right]^{1/2}}{\lambda!\,(j_1+j_2-J-\lambda)!\,(j_1-\lambda-m_1)!\,(J-j_2+\lambda+m_1)!\,(j_2-\lambda+m_2)!\,(J-j_1+\lambda-m_2)!}\;\psi^{j_1}_{m_1}\phi^{j_2}_{m_2}X^J_{m_1+m_2}.
\]

Setting $m_1 + m_2 = M$, we have

\[
A_J = \sum_{m_1,m_2,\lambda} \frac{(-1)^{\lambda}\,(j_1+j_2-J)!\,(j_1-j_2+J)!\,(j_2-j_1+J)!\,\left[(j_1+m_1)!\,(j_1-m_1)!\,(j_2+m_2)!\,(j_2-m_2)!\,(J+M)!\,(J-M)!\right]^{1/2}}{\lambda!\,(j_1+j_2-J-\lambda)!\,(j_1-\lambda-m_1)!\,(J-j_2+\lambda+m_1)!\,(j_2-\lambda+m_2)!\,(J-j_1+\lambda-m_2)!}\;\psi^{j_1}_{m_1}\phi^{j_2}_{m_2}X^J_{M}.
\tag{20.196}
\]

In this way, we get $W^J_M$ for (20.188) expressed as

\[
W^J_M = (j_1+j_2-J)!\,(j_1-j_2+J)!\,(j_2-j_1+J)! \sum_{m_1,m_2,\;m_1+m_2=M} C^J_{m_1,m_2}\,\psi^{j_1}_{m_1}\phi^{j_2}_{m_2},
\tag{20.197}
\]

where we define $C^J_{m_1,m_2}$ as

\[
C^J_{m_1,m_2} \equiv \sum_{\lambda} \frac{(-1)^{\lambda}\left[(j_1+m_1)!\,(j_1-m_1)!\,(j_2+m_2)!\,(j_2-m_2)!\,(J+M)!\,(J-M)!\right]^{1/2}}{\lambda!\,(j_1+j_2-J-\lambda)!\,(j_1-\lambda-m_1)!\,(J-j_2+\lambda+m_1)!\,(j_2-\lambda+m_2)!\,(J-j_1+\lambda-m_2)!}.
\tag{20.198}
\]

From (20.197), we assume that the functions $W^J_M$ $(M = -J, -J+1, \dots, J-1, J)$ are suited for forming basis vectors of $D^{(J)}$. Then, any basis set $\Lambda^J_M$ should be connected to $W^J_M$ via a unitary transformation such that $\Lambda^J_M = UW^J_M$, where $U$ is a unitary matrix. Then, we get $W^J_M = U^\dagger\Lambda^J_M$. What remains is only to normalize $W^J_M$. If we carefully look at the functional form of (20.197), we become aware that we only have to adjust the first factor of (20.197), which is a function of only $J$. Hence, as the proper normalized functions we expect to have

\[
\Psi^J_M = C(J)\,W^J_M = C(J)\,U^\dagger\Lambda^J_M,
\tag{20.199}
\]

where $C(J)$ is a constant that depends only on $J$ with given numbers $j_1$ and $j_2$. An implication of (20.199) is that we can get suitably normalized functions $\Psi^J_M$ using arbitrary basis vectors $\Lambda^J_M$. Putting

\[
\rho_J \equiv C(J)\,(j_1+j_2-J)!\,(j_1-j_2+J)!\,(j_2-j_1+J)!,
\tag{20.200}
\]

we get

\[
\Psi^J_M = \rho_J \sum_{m_1,m_2,\;m_1+m_2=M} C^J_{m_1,m_2}\,\psi^{j_1}_{m_1}\phi^{j_2}_{m_2}.
\tag{20.201}
\]

The combined states are completely decided by $J$ and $M$. If we fix $J$ at a certain number, (20.201) is expressed as a linear combination of different functions $\psi^{j_1}_{m_1}\phi^{j_2}_{m_2}$ with varying $m_1$ and $m_2$ but fixed $m_1 + m_2 = M$. Returning to Tables 20.1, 20.2, and 20.3, the same $M$ can be found in at most $2j_2+1$ places. This implies that $\Psi^J_M$ of (20.201) has at most $2j_2+1$ terms on condition that $j_1 \ge j_2$. In (20.201) the Clebsch–Gordan coefficients are given by $\rho_J C^J_{m_1,m_2}$. The descriptions of the Clebsch–Gordan coefficients differ from literature to literature [5, 6]. We adopt the description due to Hamermesh [5] such that

\[
\begin{aligned}
(j_1m_1j_2m_2|JM) &\equiv \rho_J C^J_{m_1,m_2},\\
\Psi^J_M &= \sum_{m_1,m_2,\;m_1+m_2=M} (j_1m_1j_2m_2|JM)\,\psi^{j_1}_{m_1}\phi^{j_2}_{m_2}.
\end{aligned}
\tag{20.202}
\]

(3) Normalization of $\Psi^J_M$: The normalization condition for $\Psi^J_M$ of (20.201) or (20.202) is given by

\[
|\rho_J|^2 \sum_{m_1,m_2,\;m_1+m_2=M} \left|C^J_{m_1,m_2}\right|^2 = 1.
\tag{20.203}
\]

To obtain (20.203), we assume the following normalization condition for the basis vectors $\psi^{j_1}_{m_1}\phi^{j_2}_{m_2}$ that span the representation space. That is, for an appropriate pair of complex variables $z_1$ and $z_2$, we define their inner product as follows [9]:

\[
\left\langle z_1^{l}z_2^{m-l}\,\middle|\,z_1^{k}z_2^{m-k}\right\rangle \equiv \sqrt{l!\,(m-l)!}\,\sqrt{k!\,(m-k)!}\;\delta_{lk}
\quad\text{or}\quad
\left\langle \frac{z_1^{l}z_2^{m-l}}{\sqrt{l!\,(m-l)!}}\,\middle|\,\frac{z_1^{k}z_2^{m-k}}{\sqrt{k!\,(m-k)!}}\right\rangle = \delta_{lk}.
\tag{20.204}
\]

Equation (20.204) implies that the $m+1$ monomials of degree $m$ with respect to $z_1$ and $z_2$ constitute an orthonormal basis. Since $\rho_J$ defined in (20.200) is independent of $M$, we can evaluate it by choosing $M$ conveniently in (20.203). Setting $M\,(= m_1+m_2) = J$ and substituting it in $J - j_1 + \lambda - m_2$ of (20.198), we obtain $(m_1+m_2) - j_1 + \lambda - m_2 = \lambda + m_1 - j_1$. So far as we are dealing with integers, the argument of a factorial must be nonnegative, and so we have

\[
\lambda + m_1 - j_1 \ge 0 \quad\text{or}\quad \lambda \ge j_1 - m_1.
\tag{20.205}
\]

Meanwhile, looking at another factorial $(j_1 - \lambda - m_1)!$, we get

\[
\lambda \le j_1 - m_1.
\tag{20.206}
\]

From the above equations, we obtain only one choice for $\lambda$ such that [5]

\[
\lambda = j_1 - m_1.
\tag{20.207}
\]

Notice that $j_1 - m_1$ is an integer, whether $j_1$ is an integer or a half-odd-integer. Then, we get

\[
\begin{aligned}
C^J_{m_1,m_2} &= (-1)^{j_1-m_1}\,\frac{\sqrt{(2J)!}}{(J-j_2+j_1)!\,(J-j_1+j_2)!}\,\sqrt{\frac{(j_1+m_1)!\,(j_2+m_2)!}{(j_1-m_1)!\,(j_2-m_2)!}}\\
&= (-1)^{j_1-m_1}\,\sqrt{\frac{(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!}}\,\sqrt{\binom{j_1+m_1}{j_2-m_2}\binom{j_2+m_2}{j_1-m_1}}.
\end{aligned}
\tag{20.208}
\]

To confirm (20.208), calculate $\binom{j_1+m_1}{j_2-m_2}\binom{j_2+m_2}{j_1-m_1}$ and use, e.g.,

\[
j_1 - j_2 + m_1 + m_2 = j_1 - j_2 + M = j_1 - j_2 + J.
\tag{20.209}
\]

Thus, we get

\[
\sum_{m_1,m_2,\;m_1+m_2=M} \left|C^J_{m_1,m_2}\right|^2 = \sum_{m_1,m_2,\;m_1+m_2=M} \frac{(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!}\binom{j_1+m_1}{j_2-m_2}\binom{j_2+m_2}{j_1-m_1}.
\tag{20.210}
\]

To evaluate the sum, we use the following relations [4, 5, 10]:

\[
\Gamma(z)\Gamma(1-z) = \pi/(\sin \pi z).
\tag{20.211}
\]

Replacing $z$ with $x+1$ in (20.211), we have

\[
\Gamma(x+1)\Gamma(-x) = \pi/[\sin \pi(x+1)].
\tag{20.212}
\]

Meanwhile, using gamma functions, we have

\[
\begin{aligned}
\binom{x}{y} &= \frac{x!}{y!\,(x-y)!} = \frac{\Gamma(x+1)}{y!\,\Gamma(x-y+1)} = \frac{\Gamma(x+1)\Gamma(-x)}{\Gamma(x-y+1)\Gamma(y-x)}\cdot\frac{\Gamma(y-x)}{y!\,\Gamma(-x)}\\
&= \frac{\sin \pi(x-y+1)}{\pi}\cdot\frac{\pi}{\sin \pi(x+1)}\cdot\frac{\Gamma(y-x)}{y!\,\Gamma(-x)}\\
&= \frac{\sin \pi(x-y+1)}{\sin \pi(x+1)}\cdot\frac{\Gamma(y-x)}{y!\,\Gamma(-x)} = (-1)^{y}\,\frac{\Gamma(y-x)}{y!\,\Gamma(-x)}.
\end{aligned}
\tag{20.213}
\]

Replacing $x$ in (20.213) with $y - x - 1$, we have

\[
\binom{y-x-1}{y} = (-1)^{y}\,\frac{\Gamma(x+1)}{y!\,\Gamma(x-y+1)} = (-1)^{y}\binom{x}{y},
\]

where with the last equality we used (20.213). That is, we get

\[
\binom{x}{y} = (-1)^{y}\binom{y-x-1}{y}.
\tag{20.214}
\]

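Applying (20.214) to (20.210), we get

\[
\begin{aligned}
\sum_{m_1,m_2,\;m_1+m_2=M} \left|C^J_{m_1,m_2}\right|^2
&= \frac{(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!} \sum_{m_1,m_2,\;m_1+m_2=M} (-1)^{j_2-m_2}(-1)^{j_1-m_1}\binom{j_2-m_2-j_1-m_1-1}{j_2-m_2}\binom{j_1-m_1-j_2-m_2-1}{j_1-m_1}\\
&= \frac{(-1)^{j_1+j_2-J}\,(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!} \sum_{m_1,m_2,\;m_1+m_2=M} \binom{j_2-j_1-J-1}{j_2-m_2}\binom{j_1-j_2-J-1}{j_1-m_1}.
\end{aligned}
\tag{20.215}
\]

Meanwhile, from the binomial theorem, we have

\[
(1+x)^{r}(1+x)^{s} = \sum_{\alpha}\binom{r}{\alpha}x^{\alpha}\sum_{\beta}\binom{s}{\beta}x^{\beta} = \sum_{\alpha,\beta}\binom{r}{\alpha}\binom{s}{\beta}x^{\alpha+\beta} = (1+x)^{r+s} = \sum_{\gamma}\binom{r+s}{\gamma}x^{\gamma}.
\tag{20.216}
\]

Comparing the coefficients of the $\gamma$-th power monomials of the last three sides, we get

\[
\binom{r+s}{\gamma} = \sum_{\alpha,\beta,\;\alpha+\beta=\gamma}\binom{r}{\alpha}\binom{s}{\beta}.
\tag{20.217}
\]

Applying (20.217) to (20.215) and employing (20.214) once again, we obtain

\[
\begin{aligned}
\sum_{m_1,m_2,\;m_1+m_2=M} \left|C^J_{m_1,m_2}\right|^2
&= \frac{(-1)^{j_1+j_2-J}\,(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!}\binom{-2J-2}{j_1+j_2-J}\\
&= \frac{(-1)^{j_1+j_2-J}(-1)^{j_1+j_2-J}\,(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!}\binom{j_1+j_2+J+1}{j_1+j_2-J}\\
&= \frac{(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!}\binom{j_1+j_2+J+1}{j_1+j_2-J}\\
&= \frac{(2J)!}{(J-j_2+j_1)!\,(J-j_1+j_2)!}\cdot\frac{(j_1+j_2+J+1)!}{(j_1+j_2-J)!\,(2J+1)!}\\
&= \frac{(j_1+j_2+J+1)!}{(2J+1)\,(J-j_2+j_1)!\,(J-j_1+j_2)!\,(j_1+j_2-J)!}.
\end{aligned}
\tag{20.218}
\]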
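The closed form (20.218) can be compared with the direct sum (20.210) for small quantum numbers. The following sketch uses exact rational arithmetic and takes $M = J$, as in the derivation above; the function names are illustrative only.

```python
from fractions import Fraction
from math import comb, factorial


def sum_c_squared(j1, j2, J):
    """Direct sum of |C^J_{m1,m2}|^2 over m1 + m2 = J, using (20.210)."""
    j1, j2, J = Fraction(j1), Fraction(j2), Fraction(J)
    prefactor = Fraction(factorial(int(2 * J)),
                         factorial(int(J - j2 + j1)) * factorial(int(J - j1 + j2)))
    total = 0
    m1 = -j1
    while m1 <= j1:
        m2 = J - m1
        if abs(m2) <= j2:
            total += comb(int(j1 + m1), int(j2 - m2)) * comb(int(j2 + m2), int(j1 - m1))
        m1 += 1
    return prefactor * total


def closed_form(j1, j2, J):
    """Right-hand side of (20.218)."""
    j1, j2, J = Fraction(j1), Fraction(j2), Fraction(J)
    numerator = factorial(int(j1 + j2 + J + 1))
    denominator = (int(2 * J + 1) * factorial(int(J - j2 + j1))
                   * factorial(int(J - j1 + j2)) * factorial(int(j1 + j2 - J)))
    return Fraction(numerator, denominator)


if __name__ == "__main__":
    half = Fraction(1, 2)
    for j1, j2, J in [(half, half, 0), (half, half, 1), (1, 1, 0), (1, 1, 1), (2, 1, 1)]:
        print(j1, j2, J, sum_c_squared(j1, j2, J), closed_form(j1, j2, J))
```

The reciprocal of either quantity is $\rho_J^2$, by (20.203).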

Inserting (20.218) into (20.203), as a positive number $\rho_J$ we have

\[
\rho_J = \sqrt{\frac{(2J+1)\,(J-j_2+j_1)!\,(J-j_1+j_2)!\,(j_1+j_2-J)!}{(j_1+j_2+J+1)!}}.
\tag{20.219}
\]

From (20.200), we find

\[
C(J) = \sqrt{\frac{2J+1}{(J-j_2+j_1)!\,(J-j_1+j_2)!\,(j_1+j_2-J)!\,(j_1+j_2+J+1)!}}.
\tag{20.220}
\]

Thus, as the Clebsch–Gordan coefficients $(j_1m_1j_2m_2|JM)$, at last we get

\[
\begin{aligned}
(j_1m_1j_2m_2|JM) &\equiv \rho_J C^J_{m_1,m_2}\\
&= \sqrt{\frac{(2J+1)\,(J-j_2+j_1)!\,(J-j_1+j_2)!\,(j_1+j_2-J)!}{(j_1+j_2+J+1)!}}\\
&\quad\times \sum_{\lambda} \frac{(-1)^{\lambda}\left[(j_1+m_1)!\,(j_1-m_1)!\,(j_2+m_2)!\,(j_2-m_2)!\,(J+M)!\,(J-M)!\right]^{1/2}}{\lambda!\,(j_1+j_2-J-\lambda)!\,(j_1-\lambda-m_1)!\,(J-j_2+\lambda+m_1)!\,(j_2-\lambda+m_2)!\,(J-j_1+\lambda-m_2)!}.
\end{aligned}
\tag{20.221}
\]

During the course of the above calculations, we encountered, e.g., $\Gamma(-x)$ and factorials involving a negative integer. Under ordinary circumstances such things must be avoided. Nonetheless, they are convenient for practical use if we properly recognize that we first choose a number $x$ close to (but not identical with) a negative integer and take the limit of $x$ at the negative integer only after the related calculations have been finished.

Choosing, e.g., $M\,(= m_1+m_2) = -J$ in (20.203) instead of setting $M = J$ as in the above, we will see that only the term with $\lambda = j_2 + m_2$ in (20.198) survives. Yet we get the same result as the above. The confirmation is left for readers.
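Formula (20.221) translates almost line by line into code. The sketch below is one possible transcription, not a reference implementation: it works with exact fractions and returns the signed square of the coefficient so that values such as $1/\sqrt{2}$ can be compared exactly. The names `cg_signed_square` and `clebsch_gordan` are illustrative, and the inputs are assumed to be valid quantum numbers (each $j - m$ an integer, and $J$ differing from $j_1 + j_2$ by an integer).

```python
from fractions import Fraction
from math import factorial, sqrt


def _fact(x):
    """Factorial of a non-negative integer stored as a Fraction."""
    return factorial(int(x))


def cg_signed_square(j1, m1, j2, m2, J, M):
    """Return sign(c) * c**2 exactly, where c = (j1 m1 j2 m2 | J M) of (20.221)."""
    j1, m1, j2, m2, J, M = (Fraction(v) for v in (j1, m1, j2, m2, J, M))
    if m1 + m2 != M or not abs(j1 - j2) <= J <= j1 + j2:
        return Fraction(0)
    if abs(m1) > j1 or abs(m2) > j2 or abs(M) > J:
        return Fraction(0)
    # rho_J squared, from (20.219)
    rho_sq = Fraction((2 * J + 1) * _fact(J - j2 + j1) * _fact(J - j1 + j2)
                      * _fact(j1 + j2 - J), _fact(j1 + j2 + J + 1))
    # the product of factorials under the power 1/2 in (20.198)
    root_sq = (_fact(j1 + m1) * _fact(j1 - m1) * _fact(j2 + m2)
               * _fact(j2 - m2) * _fact(J + M) * _fact(J - M))
    total = Fraction(0)
    for lam in range(int(j1 + j2 - J) + 1):
        args = (lam, j1 + j2 - J - lam, j1 - lam - m1,
                J - j2 + lam + m1, j2 - lam + m2, J - j1 + lam - m2)
        if min(args) < 0:
            continue  # a factorial of a negative integer removes this term
        denominator = 1
        for a in args:
            denominator *= _fact(a)
        total += Fraction((-1) ** lam, denominator)
    sign = (total > 0) - (total < 0)
    return sign * rho_sq * root_sq * total ** 2


def clebsch_gordan(j1, m1, j2, m2, J, M):
    """Floating-point value of (j1 m1 j2 m2 | J M)."""
    s = cg_signed_square(j1, m1, j2, m2, J, M)
    return sqrt(s) if s >= 0 else -sqrt(-s)
```

For instance, `cg_signed_square(1, 0, 1, 0, 2, 0)` corresponds to the coefficient $2/\sqrt{6}$ that multiplies $\psi^1_0\phi^1_0$ in $\Psi^2_0$.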

20.3.3 Examples of Calculation of Clebsch–Gordan Coefficients

Simple examples mentioned below help us understand the calculation procedures of the Clebsch–Gordan coefficients and their implications.

Example 20.1 As the simplest example, we examine the case of $D^{(1/2)}\otimes D^{(1/2)}$. At the same time, it is an example of the symmetric and antisymmetric representations discussed in Sect. 18.9. Taking two sets of complex variables $u \equiv (u_2\ u_1)$ and $v \equiv (v_2\ v_1)$ and rewriting them according to (20.176), we have

\[
\psi^{1/2}_{-1/2} = u_2, \quad \psi^{1/2}_{1/2} = u_1, \quad \phi^{1/2}_{-1/2} = v_2, \quad \phi^{1/2}_{1/2} = v_1
\]

so that we can get the basis functions of the direct-product representation described by

\[
(u_2v_2\ \ u_1v_2\ \ u_2v_1\ \ u_1v_1).
\]

Notice that we have four functions above; i.e., $(2j_1+1)(2j_2+1) = 4$, $j_1 = j_2 = 1/2$. Using (20.221) we can determine the proper basis functions with respect to $\Psi^J_M$ of (20.202). For example, for $\Psi^1_{-1}$ we have only one choice, $m_1 = m_2 = -\frac{1}{2}$, to get $m_1 + m_2 = M = -1$. In (20.221) we have no other choice but to take $\lambda = 0$. In this way, we have

\[
\Psi^1_{-1} = u_2v_2 = \psi^{1/2}_{-1/2}\phi^{1/2}_{-1/2}.
\]

Similarly, we get

\[
\Psi^1_{1} = u_1v_1 = \psi^{1/2}_{1/2}\phi^{1/2}_{1/2}.
\]

With two other functions, using (20.221) we obtain

\[
\begin{aligned}
\Psi^1_0 &= \frac{1}{\sqrt{2}}(u_2v_1 + u_1v_2) = \frac{1}{\sqrt{2}}\left(\psi^{1/2}_{-1/2}\phi^{1/2}_{1/2} + \psi^{1/2}_{1/2}\phi^{1/2}_{-1/2}\right),\\
\Psi^0_0 &= \frac{1}{\sqrt{2}}(u_1v_2 - u_2v_1) = \frac{1}{\sqrt{2}}\left(\psi^{1/2}_{1/2}\phi^{1/2}_{-1/2} - \psi^{1/2}_{-1/2}\phi^{1/2}_{1/2}\right).
\end{aligned}
\]

We put the above results in a simple matrix form representing the basis vector transformation such that

\[
(u_2v_2\ \ u_1v_2\ \ u_1v_1\ \ u_2v_1)
\begin{pmatrix}
1 & 0 & 0 & 0\\
0 & \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}}\\
0 & 0 & 1 & 0\\
0 & \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}}
\end{pmatrix}
= \left(\Psi^1_{-1}\ \ \Psi^1_0\ \ \Psi^1_1\ \ \Psi^0_0\right).
\tag{20.222}
\]

In this way, (20.222) shows how the basis functions that possess a proper irreducible representation and span the direct-product representation space are constructed using the original product functions. The Clebsch–Gordan coefficients play a role as a linker (i.e., a unitary operator) of those two sets of functions. In this respect these coefficients resemble those appearing in the symmetry-adapted linear combinations (SALCs) mentioned in Sects. 18.2 and 19.3. Expressing the unitary transformation according to (20.173) and choosing $\left(\Psi^1_{-1}\ \Psi^1_0\ \Psi^1_1\ \Psi^0_0\right)$ as the basis vectors, we describe the transformation in a similar manner to (20.48) such that

\[
\mathfrak{R}(a,b)\left(\Psi^1_{-1}\ \ \Psi^1_0\ \ \Psi^1_1\ \ \Psi^0_0\right) = \left(\Psi^1_{-1}\ \ \Psi^1_0\ \ \Psi^1_1\ \ \Psi^0_0\right) D^{(1/2)}\otimes D^{(1/2)}.
\tag{20.223}
\]

Then, as the matrix representation of $D^{(1/2)}\otimes D^{(1/2)}$ we get

\[
D^{(1/2)}\otimes D^{(1/2)} =
\begin{pmatrix}
a^2 & \sqrt{2}\,ab & b^2 & 0\\
-\sqrt{2}\,ab^* & aa^* - bb^* & \sqrt{2}\,a^*b & 0\\
(b^*)^2 & -\sqrt{2}\,a^*b^* & (a^*)^2 & 0\\
0 & 0 & 0 & 1
\end{pmatrix}.
\tag{20.224}
\]
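The block form (20.224) can be reproduced numerically from the matrix $m$ of (20.175) and the Clebsch–Gordan matrix of (20.222). The sketch below assumes the row ordering $(u_2v_2,\ u_1v_2,\ u_1v_1,\ u_2v_1)$ of (20.222) and an arbitrary parametrization of $a$ and $b$ with $|a|^2+|b|^2=1$; the helper names are illustrative.

```python
import cmath
import math

S = 1 / math.sqrt(2)

# Product functions in the row order of (20.222): u2v2, u1v2, u1v1, u2v1.
# Each pair is (index of u, index of v) with 0 -> subscript 2 and 1 -> subscript 1,
# matching the row vectors (u2 u1) and (v2 v1).
PRODUCTS = [(0, 0), (1, 0), (1, 1), (0, 1)]

# Clebsch-Gordan matrix of (20.222); columns: Psi^1_{-1}, Psi^1_0, Psi^1_1, Psi^0_0.
CG = [[1, 0, 0, 0],
      [0, S, 0, S],
      [0, 0, 1, 0],
      [0, S, 0, -S]]


def su2_matrix(theta, phi, chi):
    """m = [[a, b], [-b*, a*]] with |a|^2 + |b|^2 = 1, cf. (20.35)."""
    a = cmath.exp(1j * phi) * math.cos(theta)
    b = cmath.exp(1j * chi) * math.sin(theta)
    return [[a, b], [-b.conjugate(), a.conjugate()]]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def product_representation(m):
    """Matrix K with (products)' = (products) K when u' = u m and v' = v m."""
    return [[m[c][a] * m[d][b] for (a, b) in PRODUCTS] for (c, d) in PRODUCTS]


def reduced_representation(m):
    """CG^T K CG: the representation on (Psi^1_{-1}, Psi^1_0, Psi^1_1, Psi^0_0)."""
    return matmul(transpose(CG), matmul(product_representation(m), CG))


def matrix_20_224(a, b):
    """The right-hand side of (20.224) written out entry by entry."""
    r2 = math.sqrt(2)
    ac, bc = a.conjugate(), b.conjugate()
    return [[a * a, r2 * a * b, b * b, 0],
            [-r2 * a * bc, a * ac - b * bc, r2 * ac * b, 0],
            [bc * bc, -r2 * ac * bc, ac * ac, 0],
            [0, 0, 0, 1]]
```

Since the Clebsch–Gordan matrix here is real and orthogonal, its transpose serves as its inverse in `reduced_representation`.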

Notice that (20.224) is a block matrix; namely, (20.224) has been reduced according to the symmetric and antisymmetric representations. Symbolically writing (20.223) and (20.224), we have

\[
D^{(1/2)}\otimes D^{(1/2)} = D^{(1)}\oplus D^{(0)}.
\]

The block matrix of (20.224) is unitary, and so the (3, 3) submatrix and the (1, 1) submatrix (i.e., the number 1) are unitary accordingly. To show it, use the conditions (20.35) and (20.173). The confirmation is left for readers. The (3, 3) submatrix is a symmetric representation, whose representation space is spanned by $\Psi^1_{-1}$, $\Psi^1_0$, and $\Psi^1_1$. Note that these functions are symmetric with the exchange $m_1 \leftrightarrow m_2$. Or, we may think that these functions are symmetric with the exchange $\psi \leftrightarrow \phi$. Though trivial, the basis function $\Psi^0_0$ spans the antisymmetric representation space. The corresponding submatrix is merely the number 1. This is directly related to the fact that $v(ug)^{T}$ (or $u_1v_2 - u_2v_1$) of (20.185) is an invariant. The function $\Psi^0_0$ changes its sign (i.e., is antisymmetric) with the exchange $m_1 \leftrightarrow m_2$ or $\psi \leftrightarrow \phi$.

Once again, readers might well ask why we need to work out such an elaborate means to pick up a small pebble. With increasing dimension of the vector space, however, seeking an appropriate set of proper basis functions becomes as difficult as lifting a huge rock. Though simple, the next example gives us a feel for it. Under such situations, a projection operator is an indispensable tool to address the problems.

Example 20.2 We examine a direct product $D^{(1)}\otimes D^{(1)}$, where $j_1 = j_2 = 1$. This is another example of the symmetric and antisymmetric representations. Taking two sets of complex variables $u \equiv (u_2\ u_1)$ and $v \equiv (v_2\ v_1)$ and rewriting them according to (20.176) again, we have nine, i.e., $(2j_1+1)(2j_2+1)$, product functions expressed as

\[
\psi^1_{-1}\phi^1_{-1},\ \psi^1_{0}\phi^1_{-1},\ \psi^1_{1}\phi^1_{-1},\ \psi^1_{-1}\phi^1_{0},\ \psi^1_{0}\phi^1_{0},\ \psi^1_{1}\phi^1_{0},\ \psi^1_{-1}\phi^1_{1},\ \psi^1_{0}\phi^1_{1},\ \psi^1_{1}\phi^1_{1}.
\tag{20.225}
\]

These product functions form the basis functions of the direct-product representation. Consequently, we will be able to construct proper eigenfunctions by means of linear combinations of these functions. This method is again related to that based on SALCs discussed in Chap. 19. In the present case, it is accomplished by finding the Clebsch–Gordan coefficients described by (20.221). As implied in Table 20.1, we need to determine the individual coefficients of the nine functions of (20.225) with respect to the following functions that are constituted by linear combinations of the above nine functions:

\[
\Psi^2_{-2},\ \Psi^2_{-1},\ \Psi^2_{0},\ \Psi^2_{1},\ \Psi^2_{2},\ \Psi^1_{-1},\ \Psi^1_{0},\ \Psi^1_{1},\ \Psi^0_{0}.
\]

(i) With the first five functions, we have $J = j_1 + j_2 = 2$. In this case, the determination procedures of the coefficients are considerably simplified. Substituting $J = j_1 + j_2$ for the second factor of the denominator of (20.221), we get $(-\lambda)!$ as well as $\lambda!$ in the first factor of the denominator. Therefore, for the argument of the factorials not to be negative we must have $\lambda = 0$. Consequently, we have [5]

\[
\rho_J = \sqrt{\frac{(2j_1)!\,(2j_2)!}{(2J)!}}, \qquad
C^J_{m_1,m_2} = \left[\frac{(J+M)!\,(J-M)!}{(j_1+m_1)!\,(j_1-m_1)!\,(j_2+m_2)!\,(j_2-m_2)!}\right]^{1/2}.
\]

With $\Psi^2_{\pm 2}$ we have only one choice for $\rho_J$ and $C^J_{m_1,m_2}$. That is, we have $m_1 = \pm j_1$ and $m_2 = \pm j_2$. Then, $\Psi^2_{\pm 2} = \psi^1_{\pm 1}\phi^1_{\pm 1}$. In the case of $M = -1$, we have two choices in (20.221); i.e., $m_1 = -1$, $m_2 = 0$ and $m_1 = 0$, $m_2 = -1$. Then, we get

\[
\Psi^2_{-1} = \frac{1}{\sqrt{2}}\left(\psi^1_{-1}\phi^1_{0} + \psi^1_{0}\phi^1_{-1}\right).
\tag{20.226}
\]

In the case of $M = 1$, similarly we have two choices in (20.221); i.e., $m_1 = 1$, $m_2 = 0$ and $m_1 = 0$, $m_2 = 1$. We get

\[
\Psi^2_{1} = \frac{1}{\sqrt{2}}\left(\psi^1_{1}\phi^1_{0} + \psi^1_{0}\phi^1_{1}\right).
\tag{20.227}
\]

Moreover, in the case of $M = 0$, we have three choices in (20.221); i.e., $m_1 = -1$, $m_2 = 1$ and $m_1 = 0$, $m_2 = 0$ along with $m_1 = 1$, $m_2 = -1$ (see Table 20.1). As a result, we get

\[
\Psi^2_{0} = \frac{1}{\sqrt{6}}\left(\psi^1_{-1}\phi^1_{1} + 2\psi^1_{0}\phi^1_{0} + \psi^1_{1}\phi^1_{-1}\right).
\tag{20.228}
\]

(ii) With $J = 1$, i.e., $\Psi^1_{-1}$, $\Psi^1_{0}$, and $\Psi^1_{1}$, we start with (20.221). Noting the $\lambda!$ and $(1-\lambda)!$ in the first two factors of the denominator, we have $\lambda = 0$ or $\lambda = 1$. In the case of $M = -1$, we have only one choice of $m_1 = 0$, $m_2 = -1$ for $\lambda = 0$. Similarly, $m_1 = -1$, $m_2 = 0$ for $\lambda = 1$. Hence, we get

\[
\Psi^1_{-1} = \frac{1}{\sqrt{2}}\left(\psi^1_{0}\phi^1_{-1} - \psi^1_{-1}\phi^1_{0}\right).
\tag{20.229}
\]

In a similar manner, for $M = 0$ and $M = 1$, respectively, we have

\[
\Psi^1_{0} = \frac{1}{\sqrt{2}}\left(\psi^1_{1}\phi^1_{-1} - \psi^1_{-1}\phi^1_{1}\right),
\tag{20.230}
\]

\[
\Psi^1_{1} = \frac{1}{\sqrt{2}}\left(\psi^1_{1}\phi^1_{0} - \psi^1_{0}\phi^1_{1}\right).
\tag{20.231}
\]

(iii) With $J = 0$ (i.e., $\Psi^0_0$), we have $J = j_1 - j_2\ (= 0)$. In this case, we have $J - j_1 + \lambda - m_2 = -j_2 + \lambda - m_2$ in the denominator of (20.221) [5]. Since this factor is inside the factorial, we have $-j_2 + \lambda - m_2 \ge 0$. Also, we have $j_2 - \lambda + m_2 \ge 0$ in the denominator of (20.221). Hence, we have only one choice of $\lambda = j_2 + m_2$. Therefore, we get

\[
\rho_J = \sqrt{\frac{(2J+1)!\,(2j_2)!}{(2j_1+1)!}}, \qquad
C^J_{m_1,m_2} = (-1)^{j_2+m_2}\left[\frac{(j_1+m_1)!\,(j_1-m_1)!}{(J+M)!\,(J-M)!\,(j_2+m_2)!\,(j_2-m_2)!}\right]^{1/2}.
\]

In the above, we have three choices of $m_1 = -1$, $m_2 = 1$; $m_1 = m_2 = 0$; $m_1 = 1$, $m_2 = -1$. As a result, we get

\[
\Psi^0_{0} = \frac{1}{\sqrt{3}}\left(\psi^1_{-1}\phi^1_{1} - \psi^1_{0}\phi^1_{0} + \psi^1_{1}\phi^1_{-1}\right).
\]

Summarizing the above results, we have constructed the proper eigenfunctions with respect to the combined angular momenta using the basis functions of the direct-product representation. An example of the relevant matrix representations is described as follows:

\[
\left(\psi^1_{-1}\phi^1_{-1}\ \ \psi^1_{0}\phi^1_{-1}\ \ \psi^1_{-1}\phi^1_{1}\ \ \psi^1_{0}\phi^1_{1}\ \ \psi^1_{1}\phi^1_{1}\ \ \psi^1_{-1}\phi^1_{0}\ \ \psi^1_{1}\phi^1_{-1}\ \ \psi^1_{1}\phi^1_{0}\ \ \psi^1_{0}\phi^1_{0}\right)
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0\\
0 & 0 & \frac{1}{\sqrt{6}} & 0 & 0 & 0 & -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{3}}\\
0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0 & -\frac{1}{\sqrt{2}} & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0 & -\frac{1}{\sqrt{2}} & 0 & 0 & 0\\
0 & 0 & \frac{1}{\sqrt{6}} & 0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{3}}\\
0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0\\
0 & 0 & \frac{2}{\sqrt{6}} & 0 & 0 & 0 & 0 & 0 & -\frac{1}{\sqrt{3}}
\end{pmatrix}
= \left(\Psi^2_{-2}\ \ \Psi^2_{-1}\ \ \Psi^2_{0}\ \ \Psi^2_{1}\ \ \Psi^2_{2}\ \ \Psi^1_{-1}\ \ \Psi^1_{0}\ \ \Psi^1_{1}\ \ \Psi^0_{0}\right).
\tag{20.232}
\]

In (20.232) the matrix elements represent the Clebsch–Gordan coefficients and their combination in (20.232) forms a unitary matrix. In (20.232) we have arbitrariness with the disposition of the elements of both row and column vectors. In other words, we have the arbitrariness with the unitary similarity transformation of the matrix, but the unitary matrix that underwent the unitary similarity transformation is again a unitary matrix (see Chap. 14). To show that the determinant of (20.232) is 1, use the expansion of the determinant and use the fact that when (a multiple of) a certain column (or row) is added to another column (or row), the determinant is unchanged (see Sect. 11.3).

We find that the functions belonging to $J = 2$ and $J = 0$ are the basis functions of the symmetric representations. Meanwhile, those belonging to $J = 1$ are the basis functions of the antisymmetric representations (see Sect. 18.9). Expressing the unitary transformation of this example according to (20.173) and choosing $\left(\Psi^2_{-2}\ \Psi^2_{-1}\ \Psi^2_{0}\ \Psi^2_{1}\ \Psi^2_{2}\ \Psi^1_{-1}\ \Psi^1_{0}\ \Psi^1_{1}\ \Psi^0_{0}\right)$ as the basis vectors, we describe the transformation as

\[
\mathfrak{R}(a,b)\left(\Psi^2_{-2}\ \Psi^2_{-1}\ \Psi^2_{0}\ \Psi^2_{1}\ \Psi^2_{2}\ \Psi^1_{-1}\ \Psi^1_{0}\ \Psi^1_{1}\ \Psi^0_{0}\right) = \left(\Psi^2_{-2}\ \Psi^2_{-1}\ \Psi^2_{0}\ \Psi^2_{1}\ \Psi^2_{2}\ \Psi^1_{-1}\ \Psi^1_{0}\ \Psi^1_{1}\ \Psi^0_{0}\right) D^{(1)}\otimes D^{(1)}.
\tag{20.233}
\]

Notice again that the representation matrix of $D^{(1)}\otimes D^{(1)}$ can be converted (or reduced) to a block unitary matrix according to the symmetric and antisymmetric representations such that

\[
D^{(1)}\otimes D^{(1)} = D^{(2)}\oplus D^{(1)}\oplus D^{(0)},
\tag{20.234}
\]

where $D^{(2)}$ and $D^{(0)}$ are the symmetric representations and $D^{(1)}$ is the antisymmetric representation. Recall that SO(3) is simply reducible. To show (20.234), we need somewhat lengthy calculations, but the computation procedures are straightforward. The set of basis functions $\left(\Psi^2_{-2}\ \Psi^2_{-1}\ \Psi^2_{0}\ \Psi^2_{1}\ \Psi^2_{2}\right)$ spans a five-dimensional symmetric representation space. In turn, $\Psi^0_0$ spans a one-dimensional symmetric representation space. Another set of basis functions $\left(\Psi^1_{-1}\ \Psi^1_{0}\ \Psi^1_{1}\right)$ spans a three-dimensional antisymmetric representation space.

We add that some special Clebsch–Gordan coefficients are easy to find. For example, we can readily find them in the case of $J = j_1 + j_2 = M$ or $J = j_1 + j_2 = -M$. The final result of the normalized basis functions is

\[
\Psi^{j_1+j_2}_{j_1+j_2} = \psi^{j_1}_{j_1}\phi^{j_2}_{j_2}, \qquad
\Psi^{j_1+j_2}_{-(j_1+j_2)} = \psi^{j_1}_{-j_1}\phi^{j_2}_{-j_2}.
\]

That is, among the related coefficients, only

\[
(j_1, m_1 = j_1, j_2, m_2 = j_2\,|\,J = j_1+j_2, M = J) \quad\text{and}\quad (j_1, m_1 = -j_1, j_2, m_2 = -j_2\,|\,J = j_1+j_2, M = -J)
\]

survive and take a value of 1; see the corresponding parts of the matrix elements in (20.221). This can readily be seen in Tables 20.1, 20.2, and 20.3.
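The unitarity of the coefficient matrix, of which (20.232) is one instance, can be checked for arbitrary $j_1$ and $j_2$ by assembling all coefficients from (20.221). The following is a self-contained sketch with illustrative names (`cg`, `cg_matrix`, `gram`); the ordering of rows and columns is a free choice made here and is not the ordering of (20.232).

```python
from fractions import Fraction
from math import factorial, sqrt


def cg(j1, m1, j2, m2, J, M):
    """(j1 m1 j2 m2 | J M) from (20.221) as a float; inputs are assumed valid."""
    j1, m1, j2, m2, J, M = (Fraction(v) for v in (j1, m1, j2, m2, J, M))
    if m1 + m2 != M:
        return 0.0
    f = lambda x: factorial(int(x))
    rho_sq = Fraction((2 * J + 1) * f(J - j2 + j1) * f(J - j1 + j2) * f(j1 + j2 - J),
                      f(j1 + j2 + J + 1))
    root_sq = f(j1 + m1) * f(j1 - m1) * f(j2 + m2) * f(j2 - m2) * f(J + M) * f(J - M)
    total = Fraction(0)
    for lam in range(int(j1 + j2 - J) + 1):
        args = (lam, j1 + j2 - J - lam, j1 - lam - m1,
                J - j2 + lam + m1, j2 - lam + m2, J - j1 + lam - m2)
        if min(args) < 0:
            continue  # terms with a negative factorial argument drop out
        den = 1
        for a in args:
            den *= f(a)
        total += Fraction((-1) ** lam, den)
    return sqrt(rho_sq * root_sq) * float(total)


def m_range(j):
    """m = -j, -j + 1, ..., j."""
    j = Fraction(j)
    return [-j + k for k in range(int(2 * j) + 1)]


def j_range(j1, j2):
    """J = |j1 - j2|, ..., j1 + j2."""
    j1, j2 = Fraction(j1), Fraction(j2)
    return [abs(j1 - j2) + k for k in range(int(j1 + j2 - abs(j1 - j2)) + 1)]


def cg_matrix(j1, j2):
    """Rows: product functions (m1, m2); columns: coupled functions (J, M)."""
    rows = [(m1, m2) for m2 in m_range(j2) for m1 in m_range(j1)]
    cols = [(J, M) for J in reversed(j_range(j1, j2)) for M in m_range(J)]
    return [[cg(j1, m1, j2, m2, J, M) for (J, M) in cols] for (m1, m2) in rows]


def gram(matrix):
    """C^T C for a real square matrix C; the identity if C is orthogonal."""
    n = len(matrix)
    return [[sum(matrix[k][i] * matrix[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

With $j_1 = j_2 = 1$ the matrix is 9 by 9 and `gram(cg_matrix(1, 1))` should equal the identity up to rounding, as required of a unitary (here real orthogonal) matrix.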

20.4 Lie Groups and Lie Algebras

In Sect. 20.2 we have developed a practical approach to dealing with various aspects and characteristics of SU(2) and SO(3). In this section we introduce an elegant theory of Lie groups and Lie algebras that enables us to systematically study SU(2) and SO(3).

20.4.1 Definition of Lie Groups and Lie Algebras: One-Parameter Groups We start with the following discussion. Let A(t) be a non-singular matrix whose elements vary as a function of a real number t. Here we think of a matrix as a function of a real number t. If under such conditions A(t) constitutes a group, A(t) is said to be a one-parameter group. In particular, we are interested in a situation where we have Aðs þ t Þ ¼ AðsÞAðt Þ:

ð20:235Þ

AðsÞAðt Þ ¼ Aðs þ t Þ ¼ Aðt þ sÞ ¼ Aðt ÞAðsÞ:

ð20:236Þ

We further get

Therefore, any pair of A(t) and A(s) is commutative. Putting t ¼ s ¼ 0 in (20.236), we have Að0Þ ¼ Að0ÞAð0Þ:

ð20:237Þ

Since A(0) is non-singular, A(0)1 must exist. Then, multiplying A(0)1 on both sides of (20.237), we obtain Að0Þ ¼ E,

ð20:238Þ

where E is an identity matrix. Also, we have Aðt ÞAðt Þ ¼ Að0Þ ¼ E,

ð20:239Þ

meaning that Aðt Þ ¼ Aðt Þ1 : Differentiating both sides of (20.235) with respect to s at s ¼ 0, we get A0 ðt Þ ¼ A0 ð0ÞAðt Þ:

ð20:240Þ

X A0 ð0Þ,

ð20:241Þ

Defining

we obtain

866

20

Theory of Continuous Groups

A0 ðt Þ ¼ XAðt Þ:

ð20:242Þ

Aðt Þ ¼ exp ðtX Þ

ð20:243Þ

Integrating (20.242), we have

under an initial condition $A(0) = E$. Thus, any one-parameter group $A(t)$ is described by (20.243) on condition that $A(t)$ is represented by (20.235). In Sect. 20.1 we mentioned the Lie algebra in relation to (20.14). In this context (20.243) shows how the Lie algebra is related to the one-parameter group. For a group to be a continuous group is thus deeply connected to this one-parameter group. Equation (20.45) of Sect. 20.2 is an example of a product of one-parameter groups. Bearing in mind the above situation, we describe definitions of the Lie group and Lie algebra.

Definition 20.2 [9] Let $G$ be a subgroup of $GL(n, ℂ)$. Suppose that $A_k \in G$ ($k = 1, 2, \cdots$) and that $\lim_{k\to\infty} A_k = A$ $[\in GL(n, ℂ)]$. If $A \in G$, then $G$ is called a linear Lie group of dimension $n$.

We usually call a linear Lie group simply a Lie group. Note that $GL(n, ℂ)$ has appeared in Sect. 16.1. The definition again is most likely to bewilder chemists in that they read it as though the definition itself, including the word "Lie group", were written in the air. Nevertheless, the next example is again expected to relieve the chemists of useless anxiety.

Example 20.3 Let $G$ be a group and let $A_k \in G$ with $\det A_k = 1$ such that

$$A_k = \begin{pmatrix} \cos\theta_k & -\sin\theta_k \\ \sin\theta_k & \cos\theta_k \end{pmatrix} \quad (\theta_k: \text{real number}). \tag{20.244}$$

Suppose that $\lim_{k\to\infty}\theta_k = \theta$ and then we have

$$\lim_{k\to\infty} A_k = \lim_{k\to\infty}\begin{pmatrix} \cos\theta_k & -\sin\theta_k \\ \sin\theta_k & \cos\theta_k \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \equiv A. \tag{20.245}$$

Certainly, we have $A \in G$, because $\det A_k = \det A = 1$. Such a group is called $SO(2)$.

In a word, a Lie group is a group whose components consist of continuous and differentiable (or analytic) functions. In that sense the groups such as $SU(2)$ and $SO(3)$ we have already encountered are Lie groups. Next, the following is in turn a definition of a Lie algebra.
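The relations (20.236), (20.239), and (20.243) can be checked numerically for $SO(2)$. The following is a minimal sketch in plain Python, assuming the generator $X = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ for the rotation matrix of (20.244); the helper names `mat_mul`, `mat_exp`, and `max_diff` are hypothetical and introduced only for this illustration, and the matrix exponential is approximated by a truncated Taylor series.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(x, terms=40):
    """exp(x) from a truncated Taylor series (adequate for small norms)."""
    n = len(x)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def max_diff(a, b):
    """Largest absolute entrywise difference of two matrices."""
    n = len(a)
    return max(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n))

X = [[0.0, -1.0], [1.0, 0.0]]   # assumed generator, X = A'(0)

def A(t):
    """One-parameter group element A(t) = exp(tX)."""
    return mat_exp([[t * v for v in row] for row in X])

s, t = 0.7, -1.3
E = [[1.0, 0.0], [0.0, 1.0]]
rotation = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

group_law_error = max_diff(mat_mul(A(s), A(t)), A(s + t))   # (20.236)
inverse_error = max_diff(mat_mul(A(t), A(-t)), E)           # (20.239)
rotation_error = max_diff(A(t), rotation)                   # (20.243) vs (20.244)
det_A = A(t)[0][0] * A(t)[1][1] - A(t)[0][1] * A(t)[1][0]
```

All three error values stay at round-off level and `det_A` stays at 1, which is the numerical counterpart of the statement that $\exp(tX)$ traces out $SO(2)$.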


Definition 20.3 [9] Let $G$ be a linear Lie group. If with $\forall t$ ($t$: real number) we have

$$\exp(tX) \in G, \tag{20.246}$$

all $X$ are called a Lie algebra of the corresponding Lie group $G$. We denote the Lie algebra of the Lie group $G$ by $\mathfrak{g}$.

The relation (20.246) includes the case where $X$ is a (1, 1) matrix (i.e., merely a complex number) as the trivial case. Most importantly, (20.246) is directly connected to (20.243), and Definition 20.3 shows the direct relevance between the Lie group and Lie algebra. The two theories of Lie group and Lie algebra underlie continuous groups. The Lie algebra has the following properties:

(i) If $X \in \mathfrak{g}$, $aX \in \mathfrak{g}$ as well, where $a$ is any real number.
(ii) If $X, Y \in \mathfrak{g}$, $X + Y \in \mathfrak{g}$ as well.
(iii) If $X, Y \in \mathfrak{g}$, $[X, Y]\ (\equiv XY - YX) \in \mathfrak{g}$ as well.

In fact, $\exp[t(aX)] = \exp[(ta)X]$ and $ta$ is a real number so that we have (i). Regarding (ii) we should examine individual properties of $X$. As already shown in Property (7)′ of Chap. 15 (Sect. 15.2), e.g., a unitary matrix of $\exp(tX)$ is associated with an anti-Hermitian matrix of $X$; we are interested in this particular case. In the case of $SU(2)$, $X$ is a traceless anti-Hermitian matrix and as to $SO(3)$, $X$ is a real skew-symmetric matrix. In the former case, for example, traceless anti-Hermitian matrices $X$ and $Y$ can be expressed as

$$X = \begin{pmatrix} ic & a + ib \\ -a + ib & -ic \end{pmatrix}, \quad Y = \begin{pmatrix} ih & f + ig \\ -f + ig & -ih \end{pmatrix}, \tag{20.247}$$

where $a$, $b$, $c$, etc. are real. Then, we have

$$X + Y = \begin{pmatrix} i(c + h) & (a + f) + i(b + g) \\ -(a + f) + i(b + g) & -i(c + h) \end{pmatrix}. \tag{20.248}$$

Thus, $X + Y$ is a traceless anti-Hermitian matrix as well. With the property (iii), we need some explanation. Suppose $X, Y \in \mathfrak{g}$. Then, with any real numbers $s$ and $t$, $\exp(sX)$, $\exp(tY)$, $\exp(-sX)$, $\exp(-tY) \in G$ by Definition 20.3. Then, their product $\exp(sX)\exp(tY)\exp(-sX)\exp(-tY) \in G$ as well. Taking an infinitesimal transformation of these four products within the first order of $s$ and $t$, we have

$$\begin{aligned}
\exp(sX)\exp(tY)\exp(-sX)\exp(-tY) &\approx (1 + sX)(1 + tY)(1 - sX)(1 - tY) \\
&= (1 + tY + sX + stXY)(1 - tY - sX + stXY) \\
&\approx (1 - tY - sX + stXY) + (tY - stYX) + (sX - stXY) + stXY \\
&= 1 + st(XY - YX).
\end{aligned} \tag{20.249}$$


Notice that we ignored the terms having $s^2$, $t^2$, $st^2$, $s^2t$, and $s^2t^2$ as a coefficient. Defining

$$[X, Y] \equiv XY - YX, \tag{20.250}$$

we have

$$\exp(sX)\exp(tY)\exp(-sX)\exp(-tY) \approx 1 + st[X, Y] \approx \exp(st[X, Y]). \tag{20.251}$$

Since LHS represents an element of $G$ and $st$ is a real number, by Definition 20.3 $[X, Y] \in \mathfrak{g}$. The quantity $[X, Y]$ is called a commutator. The commutator and its definition appeared in (1.139).

From properties (i) and (ii), the Lie algebra $\mathfrak{g}$ forms a vector space (see Chap. 11). An element of $\mathfrak{g}$ is usually expressed by a matrix. The zero vector corresponds to the zero matrix (see Proposition 15.1). As already seen in Chap. 15, we defined an inner product between any $A, B \in \mathfrak{g}$ such that

$$\langle A | B \rangle \equiv \sum_{i,j} a_{ij}^{*} b_{ij}. \tag{20.252}$$

Thus, $\mathfrak{g}$ constitutes an inner product (vector) space.
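The approximation (20.251) lends itself to a quick numerical check. The sketch below is only an illustration under stated assumptions: it takes two traceless anti-Hermitian (2, 2) matrices of the form (20.247) as `X` and `Y`, uses small parameters `s` and `t`, and approximates the exponential with a truncated Taylor series; all helper names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(c, a):
    return [[c * v for v in row] for row in a]

def add(a, b):
    n = len(a)
    return [[a[i][j] + b[i][j] for j in range(n)] for i in range(n)]

def sub(a, b):
    n = len(a)
    return [[a[i][j] - b[i][j] for j in range(n)] for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_exp(x, terms=30):
    """exp(x) from a truncated Taylor series (adequate for small norms)."""
    result, term = identity(len(x)), identity(len(x))
    for k in range(1, terms):
        term = scale(1.0 / k, mat_mul(term, x))
        result = add(result, term)
    return result

def max_abs(a):
    return max(abs(v) for row in a for v in row)

# Two traceless anti-Hermitian matrices (assumed sample elements of su(2)).
X = [[0j, -0.5j], [-0.5j, 0j]]
Y = [[0j, 0.5 + 0j], [-0.5 + 0j, 0j]]
s, t = 1e-3, 2e-3

# LHS of (20.251): exp(sX) exp(tY) exp(-sX) exp(-tY)
product = mat_mul(mat_mul(mat_exp(scale(s, X)), mat_exp(scale(t, Y))),
                  mat_mul(mat_exp(scale(-s, X)), mat_exp(scale(-t, Y))))

commutator = sub(mat_mul(X, Y), mat_mul(Y, X))             # [X, Y]
first_order = add(identity(2), scale(s * t, commutator))   # 1 + st[X, Y]

residual = max_abs(sub(product, first_order))   # neglected higher-order terms
signal = max_abs(scale(s * t, commutator))      # size of the st[X, Y] term
```

With these values `residual` is roughly three orders of magnitude smaller than `signal`, in line with the neglect of terms beyond $st$. The commutator itself is again traceless and anti-Hermitian, as property (iii) requires.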

20.4.2 Properties of Lie Algebras

Let us further continue our discussion of Lie algebras by thinking of the following proposition about the inner product.

Proposition 20.1 Let $f(t)$ and $g(t)$ be continuous and differentiable functions with respect to a real number $t$. Then, we have

$$\frac{d}{dt}\langle f(t) | g(t) \rangle = \left\langle \frac{df(t)}{dt} \,\Big|\, g(t) \right\rangle + \left\langle f(t) \,\Big|\, \frac{dg(t)}{dt} \right\rangle. \tag{20.253}$$

Proof We calculate the following equation:


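$$\begin{aligned}
\langle f(t+\Delta) | g(t+\Delta) \rangle - \langle f(t) | g(t) \rangle
&= \langle f(t+\Delta) | g(t+\Delta) \rangle - \langle f(t) | g(t+\Delta) \rangle + \langle f(t) | g(t+\Delta) \rangle - \langle f(t) | g(t) \rangle \\
&= \langle f(t+\Delta) - f(t) | g(t+\Delta) \rangle + \langle f(t) | g(t+\Delta) - g(t) \rangle.
\end{aligned} \tag{20.254}$$

Dividing both sides of (20.254) by a real number $\Delta$ and taking a limit as below, we have

$$\lim_{\Delta\to 0}\frac{1}{\Delta}\left[\langle f(t+\Delta) | g(t+\Delta) \rangle - \langle f(t) | g(t) \rangle\right]
= \lim_{\Delta\to 0}\left[\left\langle \frac{f(t+\Delta) - f(t)}{\Delta} \,\Big|\, g(t+\Delta) \right\rangle + \left\langle f(t) \,\Big|\, \frac{g(t+\Delta) - g(t)}{\Delta} \right\rangle\right], \tag{20.255}$$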

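where we used the calculation rules for inner products of (13.3) and (13.20). Then, we get (20.253). This completes the proof. ∎

Now, let us apply (20.253) to a unitary operator $A(t)$ we dealt with in the previous section. Replacing $f(t)$ and $g(t)$ in (20.253) with $\psi A(t)^{\dagger}$ and $A(t)\chi$, respectively, we have

$$\frac{d}{dt}\left\langle \psi A(t)^{\dagger} \,\big|\, A(t)\chi \right\rangle = \left\langle \psi \frac{dA(t)^{\dagger}}{dt} \,\Big|\, A(t)\chi \right\rangle + \left\langle \psi A(t)^{\dagger} \,\Big|\, \frac{dA(t)}{dt}\chi \right\rangle, \tag{20.256}$$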

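where $\psi$ and $\chi$ are arbitrarily chosen vectors that do not depend on $t$. Then, we have

$$\text{LHS of (20.256)} = \frac{d}{dt}\left\langle \psi | A(t)^{\dagger} A(t) | \chi \right\rangle = \frac{d}{dt}\langle \psi | E | \chi \rangle = \frac{d}{dt}\langle \psi | \chi \rangle = 0, \tag{20.257}$$

where with the second equality we used the unitarity of $A(t)$. Taking the limit $t \to 0$ in (20.256), we have

$$\lim_{t\to 0}\,[\text{RHS of (20.256)}] = \left\langle \psi \frac{dA(0)^{\dagger}}{dt} \,\Big|\, A(0)\chi \right\rangle + \left\langle \psi A(0)^{\dagger} \,\Big|\, \frac{dA(0)}{dt}\chi \right\rangle.$$

Meanwhile, taking a limit of $t \to 0$ in the following relation, we have

$$\left[\frac{dA(t)^{\dagger}}{dt}\right]_{t\to 0} = \left\{\left[\frac{dA(t)}{dt}\right]_{t\to 0}\right\}^{\dagger} = [A'(0)]^{\dagger},$$

where we assumed that operations of the adjoint and $t \to 0$ are commutable. Then, we get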


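$$\begin{aligned}
\lim_{t\to 0}\,[\text{RHS of (20.256)}] &= \left\langle \psi [A'(0)]^{\dagger} \,\big|\, A(0)\chi \right\rangle + \left\langle \psi A(0)^{\dagger} \,\Big|\, \frac{dA(0)}{dt}\chi \right\rangle \\
&= \left\langle \psi X^{\dagger} | \chi \right\rangle + \langle \psi | X\chi \rangle = \left\langle \psi \,\big|\, (X^{\dagger} + X)\chi \right\rangle,
\end{aligned} \tag{20.258}$$

where we used $X \equiv A'(0)$ of (20.241) and $A(0) = A(0)^{\dagger} = E$. From (20.257) and (20.258), we have

$$0 = \left\langle \psi \,\big|\, (X^{\dagger} + X)\chi \right\rangle.$$

Since $\psi$ and $\chi$ are arbitrary, from Theorem 14.2 we get

$$X^{\dagger} + X \equiv 0, \quad \text{i.e.,} \quad X^{\dagger} = -X. \tag{20.259}$$

This indicates that $X$ is an anti-Hermitian operator. Let $X$ be expressed as $(x_{ij})$. Then, from (20.259) $x_{ij}^{*} = -x_{ji}$. As for diagonal elements, $x_{ii}^{*} = -x_{ii}$; i.e., $x_{ii} + x_{ii}^{*} = 0$. Hence, $x_{ii}$ is a pure imaginary number or zero. We have the following theorem accordingly.

Theorem 20.3 Let $A(t)$ be a one-parameter unitary group with $A(0) = E$ that is described by $A(t) = \exp(tX)$ and satisfies (20.235). Then, $X \equiv A'(0)$ is an anti-Hermitian operator.

Differentiating both sides of $A(t) = \exp(tX)$ with respect to $t$ and using (15.47) of Theorem 15.4 [11, 12], we have

$$A'(t) = X\exp(tX) = XA(t) = A(t)X. \tag{20.260}$$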

Putting $t = 0$ in (20.260), we restore $X = A'(0)$. If we require $A(t)$ to be unitary, from Theorem 20.3 again $X$ should be anti-Hermitian. Thus, once we have a one-parameter group in a form of a unitary operator $A(t) = \exp(tX)$, the exponent can be separated into a product of a parameter $t$ and a $t$-independent constant anti-Hermitian operator $X$. In fact, all the one-parameter groups that have appeared in Sect. 20.2 are of a type of $\exp(tX)$.

Conversely, let us consider what type of operator $\exp(tX)$ would be if $X$ is anti-Hermitian. Here we assume that $X$ is an $(n, n)$ matrix. With an arbitrary real number $t$ we have

$$[\exp(tX)][\exp(tX)]^{\dagger} = [\exp(tX)][\exp(tX^{\dagger})] = [\exp(tX)][\exp(-tX)] = \exp(tX - tX) = \exp 0 = E, \tag{20.261}$$

where we used (15.32) and Theorem 15.2 as well as the assumption that $X$ is anti-Hermitian. Note that $\exp(tX)$ and $\exp(-tX)$ are commutative; see (15.29) of Sect. 15.2. Equation (20.261) implies that $\exp(tX)$ is unitary, i.e., $\exp(tX) \in U(n)$. The notation $U(n)$ means a unitary group. Hence, $X \in \mathfrak{u}(n)$, where $\mathfrak{u}(n)$ means the Lie algebra corresponding to $U(n)$.

Next, let us seek a Lie algebra that corresponds to a special unitary group $SU(n)$. By Definition 20.3, it is obvious that since $U(n) \supset SU(n)$, $\mathfrak{u}(n) \supset \mathfrak{su}(n)$, where $\mathfrak{su}(n)$ means the Lie algebra corresponding to $SU(n)$. It suffices to find the condition under which $\det[\exp(tX)] = 1$ holds with any real number $t$ and the elements $X \in \mathfrak{u}(n)$. From (15.41) [9], we have

$$\det[\exp(tX)] = \exp \mathrm{Tr}(tX) = \exp\{t[\mathrm{Tr}(X)]\} = 1, \tag{20.262}$$

where Tr stands for trace (see Chap. 12). This is equivalent to that

$$t[\mathrm{Tr}(X)] = 2mi\pi \quad (m: \text{zero or integers}) \tag{20.263}$$

holds with any real $t$. For this, we must have $\mathrm{Tr}(X) = 0$ with $m = 0$. This implies that $\mathfrak{su}(n)$ comprises anti-Hermitian matrices with its trace being zero (i.e., traceless). Consequently, the Lie algebra $\mathfrak{su}(n)$ is certainly a subset (or subspace) of $\mathfrak{u}(n)$ that consists of anti-Hermitian matrices.

In relation to (20.256) let us think of a real orthogonal matrix $A(t)$. If $A(t)$ is real and orthogonal, (20.256) can be rewritten as

$$\frac{d}{dt}\left\langle \psi A(t)^{T} \,\big|\, A(t)\chi \right\rangle = \left\langle \psi \frac{dA(t)^{T}}{dt} \,\Big|\, A(t)\chi \right\rangle + \left\langle \psi A(t)^{T} \,\Big|\, \frac{dA(t)}{dt}\chi \right\rangle. \tag{20.264}$$

Following the procedure similar to the case of (20.259), we get

$$X^{T} + X \equiv 0, \quad \text{i.e.,} \quad X^{T} = -X. \tag{20.265}$$

In this case we obtain a skew-symmetric matrix. All its diagonal elements are zero and, hence, the skew-symmetric matrix is traceless. Other typical Lie groups are an orthogonal group $O(n)$ and a special orthogonal group $SO(n)$, and the corresponding Lie algebras are denoted by $\mathfrak{o}(n)$ and $\mathfrak{so}(n)$, respectively. Note that both $\mathfrak{o}(n)$ and $\mathfrak{so}(n)$ consist of skew-symmetric matrices.

Conversely, let us consider what type of operator $\exp(tX)$ would be if $X$ is a skew-symmetric matrix. Here we assume that $X$ is an $(n, n)$ matrix. With an arbitrary real number $t$ we have

$$[\exp(tX)][\exp(tX)]^{T} = [\exp(tX)][\exp(tX^{T})] = [\exp(tX)][\exp(-tX)] = \exp(tX - tX) = \exp 0 = E, \tag{20.266}$$

where we used (15.31). This implies that $\exp(tX)$ is an orthogonal matrix, i.e., $\exp(tX) \in O(n)$. Hence, $X \in \mathfrak{o}(n)$. Notice that $\exp(tX)$ and $\exp(-tX)$ are commutative. Summarizing the above, we have the following theorem.

Theorem 20.4 The $n$-th order Lie algebra $\mathfrak{u}(n)$ corresponding to the Lie group $U(n)$ (i.e., unitary group) consists of all the anti-Hermitian $(n, n)$ matrices. The $n$-th order Lie algebra $\mathfrak{su}(n)$ corresponding to the Lie group $SU(n)$ (i.e., special unitary group) comprises all the anti-Hermitian $(n, n)$ matrices with the trace zero. The $n$-th order Lie algebras $\mathfrak{o}(n)$ and $\mathfrak{so}(n)$ corresponding to the Lie groups $O(n)$ and $SO(n)$, respectively, are all the real skew-symmetric $(n, n)$ matrices.

Notice that if $A$ and $B$ are anti-Hermitian matrices, so are $A + B$ and $cA$ ($c$: real number). This is true of skew-symmetric matrices as well. Hence, $\mathfrak{u}(n)$, $\mathfrak{su}(n)$, $\mathfrak{o}(n)$, and $\mathfrak{so}(n)$ form a linear vector space.

In Sect. 20.4.1 we introduced a commutator. The commutators are ubiquitous in quantum mechanics, especially as commutation relations (see Part I). Major properties are as follows:

$$\begin{aligned}
&\text{(i)} \quad [X + Y, Z] = [X, Z] + [Y, Z], \\
&\text{(ii)} \quad [aX, Y] = a[X, Y] \quad (a \in ℝ), \\
&\text{(iii)} \quad [X, Y] = -[Y, X], \\
&\text{(iv)} \quad [X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y]] = 0.
\end{aligned} \tag{20.267}$$

The last equation of (20.267) is well known as Jacobi's identity. Readers are encouraged to check it. Since the Lie algebra forms a linear vector space of a finite dimension, we denote its basis vectors by $X_1, X_2, \cdots, X_d$. Then, we have

$$\mathfrak{g}(d) = \mathrm{Span}\{X_1, X_2, \cdots, X_d\}, \tag{20.268}$$

where $\mathfrak{g}(d)$ stands for a Lie algebra of dimension $d$. As their commutators belong to $\mathfrak{g}(d)$ as well, with an arbitrary pair of $X_i$ and $X_j$ ($i, j = 1, 2, \cdots, d$) we have

$$[X_i, X_j] = \sum_{k=1}^{d} f_{ijk} X_k, \tag{20.269}$$

where a set of real coefficients $f_{ijk}$ is said to be structure constants, which define the structure of the Lie algebra.

Example 20.4 In (20.42) we write $\zeta_1 \equiv \zeta_x$, $\zeta_2 \equiv \zeta_y$, and $\zeta_3 \equiv \zeta_z$. Rewriting it explicitly, we have

$$\zeta_1 = \frac{1}{2}\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}, \quad \zeta_2 = \frac{1}{2}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad \zeta_3 = \frac{1}{2}\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}. \tag{20.270}$$

Then, we get

$$[\zeta_i, \zeta_j] = \sum_{k=1}^{3} \varepsilon_{ijk}\zeta_k, \tag{20.271}$$

where $\varepsilon_{ijk}$ is called the Levi-Civita symbol [4] and denoted by

$$\varepsilon_{ijk} = \begin{cases} +1 & (i, j, k) = (1, 2, 3), (2, 3, 1), (3, 1, 2) \\ -1 & (i, j, k) = (3, 2, 1), (1, 3, 2), (2, 1, 3) \\ 0 & \text{otherwise}. \end{cases} \tag{20.272}$$

Notice that in (20.272) if $(i, j, k)$ represents an even permutation of (1, 2, 3), $\varepsilon_{ijk} = 1$, but if $(i, j, k)$ is an odd permutation of (1, 2, 3), $\varepsilon_{ijk} = -1$. Otherwise, for, e.g., $i = j$, $j = k$, or $k = i$, etc., $\varepsilon_{ijk} = 0$. The relation of (20.271) is essentially the same as (3.30) and (3.69) of Chap. 3. We have

$$\mathfrak{su}(2) = \mathrm{Span}\{\zeta_1, \zeta_2, \zeta_3\}. \tag{20.273}$$

Example 20.5 In (20.26) we write $A_1 \equiv A_x$, $A_2 \equiv A_y$, and $A_3 \equiv A_z$. Rewriting it, we have

$$A_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{20.274}$$

Again, we have

$$[A_i, A_j] = \sum_{k=1}^{3} \varepsilon_{ijk} A_k. \tag{20.275}$$

This is of the same form as that of (20.271). Namely,

$$\mathfrak{so}(3) = \mathfrak{o}(3) = \mathrm{Span}\{A_1, A_2, A_3\}.$$

Equation (20.274) clearly shows that diagonal elements of $A_i$ ($i = 1, 2, 3$) are zero.

From Examples 20.4 and 20.5, $\mathfrak{su}(2)$ and $\mathfrak{so}(3)$ [or $\mathfrak{o}(3)$] structurally resemble each other. This fact is directly related to the similarity between $SU(2)$ and $SO(3)$. Note also that the dimensionality of $\mathfrak{su}(2)$ and $\mathfrak{so}(3)$ is the same in terms of linear vector spaces.

In Chap. 3, we dealt with the generalized angular momentum using the relation (3.30). Using (20.15), we can readily obtain the same relation as (20.271) and (20.275). That is, from (3.30), defining the following anti-Hermitian operators as

$$\tilde{J}_x \equiv -iJ_x/\hbar, \quad \tilde{J}_y \equiv -iJ_y/\hbar, \quad \tilde{J}_z \equiv -iJ_z/\hbar,$$

we get

$$[\tilde{J}_l, \tilde{J}_m] = \sum_{n} \varepsilon_{lmn}\tilde{J}_n, \tag{20.276}$$

where $l$, $m$, $n$ stand for $x$, $y$, $z$. The derivation is left for readers.
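The commutation relations (20.271) and (20.275), together with Jacobi's identity of (20.267), can be verified mechanically. The sketch below assumes that the $\zeta_k$ are $-i/2$, $+i/2$, and $+i/2$ times the Pauli matrices $\sigma_x$, $\sigma_y$, $\sigma_z$, respectively, and that the $A_k$ are the standard generators of rotations about the $x$-, $y$-, and $z$-axes; the function names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    """[a, b] = ab - ba."""
    ab, ba = mat_mul(a, b), mat_mul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

def levi_civita(i, j, k):
    """Levi-Civita symbol for zero-based indices 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) // 2

def structure_error(basis):
    """Largest entry of [B_i, B_j] - sum_k eps_ijk B_k over all i, j."""
    n = len(basis[0])
    worst = 0.0
    for i in range(3):
        for j in range(3):
            lhs = commutator(basis[i], basis[j])
            for r in range(n):
                for c in range(n):
                    rhs = sum(levi_civita(i, j, k) * basis[k][r][c]
                              for k in range(3))
                    worst = max(worst, abs(lhs[r][c] - rhs))
    return worst

def combine(coeffs, basis):
    """Real linear combination sum_k coeffs[k] * basis[k]."""
    n = len(basis[0])
    return [[sum(c * b[r][s] for c, b in zip(coeffs, basis))
             for s in range(n)] for r in range(n)]

zeta = [  # assumed basis of su(2)
    [[0j, -0.5j], [-0.5j, 0j]],
    [[0j, 0.5 + 0j], [-0.5 + 0j, 0j]],
    [[0.5j, 0j], [0j, -0.5j]],
]
A = [  # assumed basis of so(3)
    [[0, 0, 0], [0, 0, -1], [0, 1, 0]],
    [[0, 0, 1], [0, 0, 0], [-1, 0, 0]],
    [[0, -1, 0], [1, 0, 0], [0, 0, 0]],
]

su2_error = structure_error(zeta)
so3_error = structure_error(A)

# Jacobi identity for three arbitrary elements of su(2).
X = combine([1.0, 2.0, 0.0], zeta)
Y = combine([0.0, 1.0, -1.0], zeta)
Z = combine([1.0, 0.0, 3.0], zeta)
terms = [commutator(X, commutator(Y, Z)),
         commutator(Y, commutator(Z, X)),
         commutator(Z, commutator(X, Y))]
jacobi_error = max(abs(sum(t[r][c] for t in terms))
                   for r in range(2) for c in range(2))
```

Both bases give the same structure constants $\varepsilon_{ijk}$, which is the computational content of the statement that $\mathfrak{su}(2)$ and $\mathfrak{so}(3)$ structurally resemble each other.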

20.4.3 Adjoint Representation of Lie Groups

In (20.252) of Sect. 20.4.1 we defined an inner product of $\mathfrak{g}$ as

$$\langle A | B \rangle \equiv \sum_{i,j} a_{ij}^{*} b_{ij}. \tag{20.252}$$

We can readily check that the definition (20.252) satisfies those of the inner product of (13.2), (13.3), and (13.4). In fact, we have

$$\langle B | A \rangle = \sum_{i,j} b_{ij}^{*} a_{ij} = \sum_{i,j}\left(a_{ij}^{*} b_{ij}\right)^{*} = \Big(\sum_{i,j} a_{ij}^{*} b_{ij}\Big)^{*} = \langle A | B \rangle^{*}. \tag{20.277}$$

Equation (13.3) is obvious from the calculation rules of a matrix. With (13.4), we get

$$\langle A | A \rangle = \mathrm{Tr}\left(A^{\dagger} A\right) = \sum_{i,j} \left|a_{ij}\right|^{2} \ge 0, \tag{20.278}$$

where we have $A = (a_{ij})$ and the equality holds if and only if all $a_{ij} = 0$, i.e., $A = 0$. Thus, $\langle A | A \rangle$ gives a positive definite inner product on $\mathfrak{g}$.

From (20.278), we may equally define an inner product of $\mathfrak{g}$ as

$$\langle A | B \rangle = \mathrm{Tr}\left(A^{\dagger} B\right). \tag{20.279}$$

In fact, we have

$$\mathrm{Tr}\left(A^{\dagger} B\right) = \sum_{i,j}\left(A^{\dagger}\right)_{ij}(B)_{ji} = \sum_{i,j} a_{ji}^{*} b_{ji} = \langle A | B \rangle.$$

In another way, we can readily compare (20.279) with (20.252) using, e.g., a (2, 2) matrix and find that both equations give the same result. It is left for readers as an exercise.
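The comparison of (20.252) with (20.279) for (2, 2) matrices can be carried out directly. The following sketch uses arbitrarily chosen sample matrices (an assumption of this illustration, not data from the text) and the hypothetical helper names `inner_sum`, `inner_trace`, and `u2_element`; the last builds a general anti-Hermitian (2, 2) matrix from four real numbers.

```python
def inner_sum(a, b):
    """Inner product in the form of (20.252): sum of conj(a_ij) * b_ij."""
    n = len(a)
    return sum(a[i][j].conjugate() * b[i][j]
               for i in range(n) for j in range(n))

def inner_trace(a, b):
    """Inner product in the form of (20.279): Tr(A^dagger B)."""
    n = len(a)
    a_dagger = [[a[j][i].conjugate() for j in range(n)] for i in range(n)]
    return sum(a_dagger[i][k] * b[k][i] for i in range(n) for k in range(n))

def u2_element(a, b, c, d):
    """Anti-Hermitian (2, 2) matrix built from real a, b, c, d."""
    return [[1j * a, c + 1j * d], [-c + 1j * d, 1j * b]]

# Two arbitrary complex matrices: both definitions should agree.
A = [[1 + 2j, -3j], [0.5 + 0j, 2 - 1j]]
B = [[-1j, 4 + 0j], [2 + 2j, 1 + 0j]]
definition_gap = abs(inner_sum(A, B) - inner_trace(A, B))

# Two anti-Hermitian matrices: the inner product should be real.
a, b, c, d = 1.0, -2.0, 0.5, 3.0
p, q, r, s = -1.5, 0.25, 2.0, -1.0
X1 = u2_element(a, b, c, d)
X2 = u2_element(p, q, r, s)
value = inner_sum(X1, X2)
closed_form = a * p + 2 * (c * r + d * s) + b * q
```

Here `definition_gap` vanishes up to round-off, and for the anti-Hermitian pair the imaginary part of `value` is zero while its real part agrees with `closed_form`.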

From Theorem 20.4, $\mathfrak{u}(n)$ corresponding to the unitary group $U(n)$ consists of all the anti-Hermitian $(n, n)$ matrices. With these matrices, we have

$$\langle B | A \rangle = \mathrm{Tr}\left(B^{\dagger} A\right) = \sum_{i,j} b_{ji}^{*} a_{ji} = \sum_{i,j} b_{ij} a_{ij}^{*} = \sum_{i,j} a_{ij}^{*} b_{ij} = \langle A | B \rangle. \tag{20.280}$$

Then, comparing (20.277) and (20.280) we have

$$\langle A | B \rangle^{*} = \langle A | B \rangle.$$

This means that $\langle A | B \rangle$ is real. For example, using $\mathfrak{u}(2)$, let us evaluate an inner product. We denote arbitrary $X_1, X_2 \in \mathfrak{u}(2)$ by

$$X_1 = \begin{pmatrix} ia & c + id \\ -c + id & ib \end{pmatrix}, \quad X_2 = \begin{pmatrix} ip & r + is \\ -r + is & iq \end{pmatrix},$$

where $a, b, c, d$; $p, q, r, s$ are real. Then, according to the calculation rule of an inner product of the Lie algebra, we have

$$\langle X_1 | X_2 \rangle = ap + 2(cr + ds) + bq,$$

which gives a real inner product. Notice that this is the case with $\mathfrak{su}(2)$ as well.

Next, let us think of the following inner product:

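$$\left\langle gXg^{-1} \,\big|\, gYg^{-1} \right\rangle = \mathrm{Tr}\left[\left(gXg^{-1}\right)^{\dagger}\left(gYg^{-1}\right)\right], \tag{20.281}$$

where $g$ is any non-singular matrix. Bearing in mind that we are particularly interested in the continuous groups $U(n)$ [or its subgroup $SU(n)$] and $O(n)$ [or its subgroup $SO(n)$], we assume that $g$ is an element of those groups and represented by a unitary matrix (including an orthogonal matrix); i.e., $g^{-1} = g^{\dagger}$ (or $g^{T}$). Then, if $g$ is a unitary matrix, we have

$$\mathrm{Tr}\left[\left(gXg^{-1}\right)^{\dagger} gYg^{-1}\right] = \mathrm{Tr}\left[\left(gXg^{\dagger}\right)^{\dagger} gYg^{\dagger}\right] = \mathrm{Tr}\left[gX^{\dagger}g^{\dagger}gYg^{\dagger}\right] = \mathrm{Tr}\left[gX^{\dagger}Yg^{\dagger}\right] = \mathrm{Tr}\left(X^{\dagger}Y\right) = \langle X | Y \rangle, \tag{20.282}$$

where with the second last equality we used (12.13), namely invariance of the trace under the (unitary) similarity transformation. If $g$ is represented by a real orthogonal matrix, instead of (20.282) we have

$$\begin{aligned}
\mathrm{Tr}\left[\left(gXg^{-1}\right)^{\dagger} gYg^{-1}\right] &= \mathrm{Tr}\left[\left(gXg^{T}\right)^{\dagger} gYg^{T}\right] = \mathrm{Tr}\left[\left(g^{T}\right)^{\dagger}X^{\dagger}g^{\dagger}gYg^{T}\right] \\
&= \mathrm{Tr}\left[g^{*}X^{\dagger}g^{T}gYg^{T}\right] = \mathrm{Tr}\left[gX^{\dagger}Yg^{T}\right] = \langle X | Y \rangle,
\end{aligned} \tag{20.283}$$

where $g^{*}$ is a complex conjugate matrix of $g$ (with $g^{*} = g$ for a real matrix). Combining (20.281) and (20.282) or (20.283), we have

$$\left\langle gXg^{-1} \,\big|\, gYg^{-1} \right\rangle = \langle X | Y \rangle. \tag{20.284}$$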

This relation clearly shows that the real inner product of (20.284) remains unchanged by the operation $X \longmapsto gXg^{-1}$, where $X$ is an anti-Hermitian (or skew-symmetric) matrix and $g$ is a unitary (or orthogonal) matrix. Now, we give the following definition and related theorems in relation to both the Lie groups and Lie algebras.

Definition 20.4 Let $G$ be a Lie group chosen from $U(n)$ [including $SU(n)$] and $O(n)$ [including $SO(n)$]. Let $\mathfrak{g}$ be a Lie algebra corresponding to $G$. Let $g \in G$ and $X \in \mathfrak{g}$. We define the following transformation on $\mathfrak{g}$ such that

$$\mathrm{Ad}[g](X) \equiv gXg^{-1}. \tag{20.285}$$

Then, $\mathrm{Ad}[g]$ is said to be an adjoint representation of $G$ on $\mathfrak{g}$.

We write the relation (20.285) as

$$\mathrm{Ad}[g]: \mathfrak{g} \to \mathfrak{g}.$$

That is, $\mathrm{Ad}[g](X) \in \mathfrak{g}$. The operator $\mathrm{Ad}[g]$ is a kind of mapping (i.e., endomorphism) discussed in Sect. 11.2. Notice that both $g$ and $X$ are represented by $(n, n)$ matrices. The matrix $g$ is either a unitary matrix or an orthogonal matrix. The matrix $X$ is either an anti-Hermitian matrix or a skew-symmetric matrix.

Theorem 20.5 Let $g$ be an element of a Lie group $G$ and $\mathfrak{g}$ be a Lie algebra of $G$. Then, $\mathrm{Ad}[g]$ is a linear transformation on $\mathfrak{g}$.

Proof Let $X, Y \in \mathfrak{g}$. Then, we have

$$\mathrm{Ad}[g](aX + bY) \equiv g(aX + bY)g^{-1} = a\left(gXg^{-1}\right) + b\left(gYg^{-1}\right) = a\,\mathrm{Ad}[g](X) + b\,\mathrm{Ad}[g](Y). \tag{20.286}$$

Thus, we find that $\mathrm{Ad}[g]$ is a linear transformation. By Definition 20.3, $\exp(tX) \in G$ with an arbitrary real number $t$. Meanwhile, we have

$$\exp\{t\,\mathrm{Ad}[g](X)\} = \exp\left(tgXg^{-1}\right) = \exp\left(g\,tX\,g^{-1}\right) = g\exp(tX)g^{-1},$$

where with the last equality we used (15.30). Then, if $g \in G$, $g\exp(tX)g^{-1} \in G$. Again from Definition 20.3, $\mathrm{Ad}[g](X) \in \mathfrak{g}$. That is, $\mathrm{Ad}[g]$ is a linear transformation on $\mathfrak{g}$. This completes the proof. ∎

Notice that in Theorem 20.5, $G$ may be any linear Lie group chosen from among $GL(n, ℂ)$. The Lie algebra corresponding to $GL(n, ℂ)$ is denoted by $\mathfrak{gl}(n, ℂ)$. The following theorem shows that $\mathrm{Ad}[g]$ is a representation.

Theorem 20.6 Let $g$ be an element of a Lie group $G$. Then, $g \longmapsto \mathrm{Ad}[g]$ is a representation of $G$ on $\mathfrak{g}$.

Proof From (20.285), we have

$$\begin{aligned}
\mathrm{Ad}[g_1 g_2](X) &\equiv (g_1 g_2)X(g_1 g_2)^{-1} = g_1 g_2 X g_2^{-1} g_1^{-1} = g_1\,\mathrm{Ad}[g_2](X)\,g_1^{-1} \\
&= \mathrm{Ad}[g_1](\mathrm{Ad}[g_2](X)) = (\mathrm{Ad}[g_1]\mathrm{Ad}[g_2])(X).
\end{aligned} \tag{20.287}$$

Comparing the first and last sides of (20.287), we get

$$\mathrm{Ad}[g_1 g_2] = \mathrm{Ad}[g_1]\mathrm{Ad}[g_2].$$

That is, $g \longmapsto \mathrm{Ad}[g]$ is a representation of $G$ on $\mathfrak{g}$.

By virtue of Theorem 20.6, we call $g \longmapsto \mathrm{Ad}[g]$ an adjoint representation of $G$ on $\mathfrak{g}$. Once again, we remark that

$$X \longmapsto \mathrm{Ad}[g](X) \equiv X' \quad \text{or} \quad \mathrm{Ad}[g]: \mathfrak{g} \to \mathfrak{g} \tag{20.288}$$

is a linear transformation $\mathfrak{g} \to \mathfrak{g}$, namely an endomorphism that operates on a vector space $\mathfrak{g}$ (see Sect. 11.2). Figure 20.5 schematically depicts it.

Fig. 20.5 Linear transformation $\mathrm{Ad}[g]: \mathfrak{g} \to \mathfrak{g}$ (i.e., endomorphism). $\mathrm{Ad}[g]$ is an endomorphism that operates on a vector space $\mathfrak{g}$

The notation of the adjoint representation Ad is somewhat confusing. That is, (20.288) shows that $\mathrm{Ad}[g]$ is a linear transformation $\mathfrak{g} \to \mathfrak{g}$ (i.e., endomorphism). On the other hand, Ad is thought to be a mapping $G \to G'$, where $G$ and $G'$ mean two groups. Namely, we express it symbolically as [9]

$$g \longmapsto \mathrm{Ad}[g] \quad \text{or} \quad \mathrm{Ad}: G \to G'. \tag{20.289}$$

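The $G$ and $G'$ may or may not be identical. Examples can be seen in, e.g., (20.294) and (20.302) later.

As mentioned above, if $\mathfrak{g}$ is either $\mathfrak{u}(n)$ or $\mathfrak{o}(n)$, $\mathrm{Ad}[g]$ is an orthogonal transformation on $\mathfrak{g}$. The representation of $\mathrm{Ad}[g]$ is real accordingly. An immediate implication of this is that the transformation of the basis vectors of $\mathfrak{g}$ has a connection with the corresponding Lie groups such as $U(n)$ and $O(n)$. Moreover, it is well known and studied that there is a close relationship between $\mathrm{Ad}[SU(2)]$ and $SO(3)$. First, we wish to seek basis vectors of $\mathfrak{su}(2)$. A general form of $X \in \mathfrak{su}(2)$ is the following traceless anti-Hermitian matrix described by

$$X = \begin{pmatrix} ic & a + ib \\ -a + ib & -ic \end{pmatrix},$$

where $a$, $b$, and $c$ are real. We have encountered this type of matrix in (20.42) and (20.270). Using the inner product described in (20.252), we determine an orthonormal basis set of $\mathfrak{su}(2)$ such that

$$e_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}, \quad e_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad e_3 = \frac{1}{\sqrt{2}}\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}, \tag{20.290}$$

where we have $\langle e_i | e_j \rangle = \delta_{ij}$ ($i, j = 1, 2, 3$). Choosing the $SU(2)$ elements of

$$g_\alpha \equiv \begin{pmatrix} e^{i\alpha/2} & 0 \\ 0 & e^{-i\alpha/2} \end{pmatrix} \quad \text{and} \quad g_\beta \equiv \begin{pmatrix} \cos\frac{\beta}{2} & \sin\frac{\beta}{2} \\ -\sin\frac{\beta}{2} & \cos\frac{\beta}{2} \end{pmatrix}$$

that appeared in (20.43) and (20.44), we have [9]

$$\begin{aligned}
\mathrm{Ad}[g_\alpha](e_1) = g_\alpha e_1 g_\alpha^{-1} &= \frac{1}{\sqrt{2}}\begin{pmatrix} e^{i\alpha/2} & 0 \\ 0 & e^{-i\alpha/2} \end{pmatrix}\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}\begin{pmatrix} e^{-i\alpha/2} & 0 \\ 0 & e^{i\alpha/2} \end{pmatrix} \\
&= \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & \sin\alpha - i\cos\alpha \\ -\sin\alpha - i\cos\alpha & 0 \end{pmatrix} = e_1\cos\alpha + e_2\sin\alpha.
\end{aligned} \tag{20.291}$$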

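Similarly, we get

$$\mathrm{Ad}[g_\alpha](e_2) = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & \cos\alpha + i\sin\alpha \\ -\cos\alpha + i\sin\alpha & 0 \end{pmatrix} = -e_1\sin\alpha + e_2\cos\alpha, \tag{20.292}$$

$$\mathrm{Ad}[g_\alpha](e_3) = \frac{1}{\sqrt{2}}\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} = e_3. \tag{20.293}$$

Notice that in (20.291) to (20.293) the outer factors $g_\alpha$ and $g_\alpha^{-1}$ are diagonal matrices. Thus, we obtain

$$(e_1\ e_2\ e_3)\,\mathrm{Ad}[g_\alpha] = (e_1\ e_2\ e_3)\begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.294}$$

From (20.294), as the real representation we get

$$\mathrm{Ad}[g_\alpha] = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.295}$$

Calculating $(e_1\ e_2\ e_3)\,\mathrm{Ad}[g_\beta]$ likewise, we get

$$\mathrm{Ad}[g_\beta] = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}. \tag{20.296}$$

Moreover, with

$$g_\gamma = \begin{pmatrix} e^{i\gamma/2} & 0 \\ 0 & e^{-i\gamma/2} \end{pmatrix}, \tag{20.297}$$

we have

$$\mathrm{Ad}[g_\gamma] = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.298}$$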

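Finally, let us reproduce (17.101) by calculating

$$\mathrm{Ad}[g_\alpha]\mathrm{Ad}[g_\beta]\mathrm{Ad}[g_\gamma] = \mathrm{Ad}[g_\alpha g_\beta g_\gamma], \tag{20.299}$$

where we used the fact that $g_\omega \longmapsto \mathrm{Ad}[g_\omega]$ ($\omega = \alpha, \beta, \gamma$) is a representation of $SU(2)$ on $\mathfrak{su}(2)$. To obtain an explicit form of the representation matrix of $\mathrm{Ad}[g]$ [$g \in SU(2)$], we use

$$g_{\alpha\beta\gamma} \equiv g_\alpha g_\beta g_\gamma = \begin{pmatrix} e^{\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2} & e^{\frac{i}{2}(\alpha-\gamma)}\sin\frac{\beta}{2} \\ -e^{\frac{i}{2}(\gamma-\alpha)}\sin\frac{\beta}{2} & e^{-\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2} \end{pmatrix} \tag{20.300}$$

and calculate $\mathrm{Ad}[g_{\alpha\beta\gamma}](e_1)$ such that

$$\begin{aligned}
\mathrm{Ad}[g_{\alpha\beta\gamma}](e_1) &= g_{\alpha\beta\gamma}\, e_1\, g_{\alpha\beta\gamma}^{-1} \\
&= \frac{1}{\sqrt{2}}\begin{pmatrix} e^{\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2} & e^{\frac{i}{2}(\alpha-\gamma)}\sin\frac{\beta}{2} \\ -e^{\frac{i}{2}(\gamma-\alpha)}\sin\frac{\beta}{2} & e^{-\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2} \end{pmatrix}\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}\begin{pmatrix} e^{-\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2} & -e^{\frac{i}{2}(\alpha-\gamma)}\sin\frac{\beta}{2} \\ e^{\frac{i}{2}(\gamma-\alpha)}\sin\frac{\beta}{2} & e^{\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2} \end{pmatrix} \\
&= \frac{1}{\sqrt{2}}\begin{pmatrix} -i\sin\beta\cos\gamma & \begin{matrix} -i(\cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma) \\ + \sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma \end{matrix} \\ \begin{matrix} -i(\cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma) \\ - (\sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma) \end{matrix} & i\sin\beta\cos\gamma \end{pmatrix} \\
&= (e_1\ e_2\ e_3)\begin{pmatrix} \cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma \\ \sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma \\ -\sin\beta\cos\gamma \end{pmatrix}.
\end{aligned} \tag{20.301}$$

Note that the matrix of (20.300) is identical with (20.45). Similarly calculating $\mathrm{Ad}[g_{\alpha\beta\gamma}](e_2)$ and $\mathrm{Ad}[g_{\alpha\beta\gamma}](e_3)$, we obtain

$$(e_1\ e_2\ e_3)\,\mathrm{Ad}[g_{\alpha\beta\gamma}] = (e_1\ e_2\ e_3)\begin{pmatrix} \cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma & -\cos\alpha\cos\beta\sin\gamma - \sin\alpha\cos\gamma & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\cos\beta\sin\gamma + \cos\alpha\cos\gamma & \sin\alpha\sin\beta \\ -\sin\beta\cos\gamma & \sin\beta\sin\gamma & \cos\beta \end{pmatrix}. \tag{20.302}$$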

The matrix of RHS of (20.302) is the same as (17.101) that contains Euler angles. The calculations are somewhat lengthy but straightforward. As mentioned just above and as can be seen, e.g., in (20.294) and (20.302), we may symbolically express the adjoint representation as [9]

$$\mathrm{Ad}: SU(2) \to SO(3) \tag{20.303}$$

in parallel to (20.289). Accordingly, from (20.302) we may identify $\mathrm{Ad}[g_{\alpha\beta\gamma}]$ with the $SO(3)$ matrix (17.101) and write

$$\mathrm{Ad}[g_{\alpha\beta\gamma}] = \begin{pmatrix} \cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma & -\cos\alpha\cos\beta\sin\gamma - \sin\alpha\cos\gamma & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\cos\beta\sin\gamma + \cos\alpha\cos\gamma & \sin\alpha\sin\beta \\ -\sin\beta\cos\gamma & \sin\beta\sin\gamma & \cos\beta \end{pmatrix}. \tag{20.304}$$
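The identification of $\mathrm{Ad}[g_{\alpha\beta\gamma}]$ with the Euler-angle matrix can be reproduced numerically. In the sketch below the matrix elements are obtained as $R_{ij} = \langle e_i | g e_j g^{-1} \rangle$, which follows from the orthonormality of $(e_1\ e_2\ e_3)$. The explicit signs of the basis matrices and of $g_\alpha$, $g_\beta$ are assumptions of this illustration (they are chosen so that $\mathrm{Ad}[g_\alpha]$ is a rotation about the third axis), and the function names are hypothetical.

```python
import cmath
import math

def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Hermitian conjugate; equals the inverse for a unitary matrix."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def inner(a, b):
    """Inner product sum_ij conj(a_ij) * b_ij."""
    n = len(a)
    return sum(a[i][j].conjugate() * b[i][j]
               for i in range(n) for j in range(n))

R2 = 1 / math.sqrt(2)
e = [  # assumed orthonormal basis of su(2)
    [[0j, -1j * R2], [-1j * R2, 0j]],
    [[0j, R2 + 0j], [-R2 + 0j, 0j]],
    [[1j * R2, 0j], [0j, -1j * R2]],
]

def g_diag(angle):
    return [[cmath.exp(0.5j * angle), 0j], [0j, cmath.exp(-0.5j * angle)]]

def g_beta(beta):
    c, s = math.cos(beta / 2), math.sin(beta / 2)
    return [[c + 0j, s + 0j], [-s + 0j, c + 0j]]

def ad_matrix(g):
    """Real (3, 3) matrix of X -> g X g^{-1} in the basis e."""
    g_inv = dagger(g)
    images = [mat_mul(mat_mul(g, ej), g_inv) for ej in e]
    return [[inner(e[i], images[j]).real for j in range(3)] for i in range(3)]

def euler_matrix(a, b, c):
    ca, sa, cb, sb = math.cos(a), math.sin(a), math.cos(b), math.sin(b)
    cg, sg = math.cos(c), math.sin(c)
    return [[ca * cb * cg - sa * sg, -ca * cb * sg - sa * cg, ca * sb],
            [sa * cb * cg + ca * sg, -sa * cb * sg + ca * cg, sa * sb],
            [-sb * cg, sb * sg, cb]]

alpha, beta, gamma = 0.4, 1.1, -0.7
g = mat_mul(mat_mul(g_diag(alpha), g_beta(beta)), g_diag(gamma))
minus_g = [[-v for v in row] for row in g]

R = ad_matrix(g)
expected = euler_matrix(alpha, beta, gamma)
euler_error = max(abs(R[i][j] - expected[i][j])
                  for i in range(3) for j in range(3))
sign_error = max(abs(R[i][j] - ad_matrix(minus_g)[i][j])
                 for i in range(3) for j in range(3))
```

Both `euler_error` and `sign_error` stay at round-off level: the adjoint action of $g_\alpha g_\beta g_\gamma$ reproduces the Euler-angle rotation matrix, and $g$ and $-g$ give the same rotation.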

In the above discussion, we make several remarks. (i) $g \in SU(2)$ is described by (2, 2) complex matrices, but $\mathrm{Ad}[SU(2)]$ is expressed by (3, 3) real orthogonal matrices. Thus, the dimension of the matrices is different. (ii) Notice again that $(e_1\ e_2\ e_3)$ of (20.302) is an orthonormal basis set of $\mathfrak{su}(2)$. Hence, (20.302) can be viewed as the orthogonal transformation of $(e_1\ e_2\ e_3)$. That is, we may regard $\mathrm{Ad}[SU(2)]$ as all the matrix representations of $SO(3)$. From (12.64) and (20.302), we write [9]

$$\mathrm{Ad}[SU(2)] - SO(3) = 0 \quad \text{or} \quad \mathrm{Ad}[SU(2)] = SO(3).$$

Regarding the adjoint representation of $G$, we have the following important theorem.

Theorem 20.7 The adjoint representation of $\mathrm{Ad}[SU(2)]$ is a surjective homomorphism of $SU(2)$ on $SO(3)$. The kernel $F$ of the representation is $\{e, -e\} \in SU(2)$. With any rotation $R \in SO(3)$, two elements $\pm g \in SU(2)$ satisfy $\mathrm{Ad}[g] = \mathrm{Ad}[-g] = R$.

Proof We have shown that by (20.302) the adjoint representation of $\mathrm{Ad}[SU(2)]$ is a surjective mapping of $SU(2)$ on $SO(3)$. Since $\mathrm{Ad}[g]$ is a representation, namely a homomorphism mapping from $SU(2)$ to $SO(3)$, we seek its kernel on the basis of Theorem 16.3 (Sect. 16.4). Let $h$ be an element of the kernel of $SU(2)$ such that

$$\mathrm{Ad}[h] = E \in SO(3), \tag{20.305}$$

where $E$ is the identity transformation of $SO(3)$. Operating (20.305) on $e_3 \in \mathfrak{su}(2)$, we have the following expression:

$$\mathrm{Ad}[h](e_3) = he_3h^{-1} = E(e_3) = e_3 \quad \text{or} \quad he_3 = e_3h,$$

where $e_3$ is given by (20.290). Expressing $h$ as $\begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix}$, we get

$$\begin{pmatrix} ih_{11} & -ih_{12} \\ ih_{21} & -ih_{22} \end{pmatrix} = \begin{pmatrix} ih_{11} & ih_{12} \\ -ih_{21} & -ih_{22} \end{pmatrix}.$$

From the above relation, we must have $h_{12} = h_{21} = 0$. As $h \in SU(2)$, $\det h = 1$. Hence, for $h$ we choose $h = \begin{pmatrix} a & 0 \\ 0 & a^{-1} \end{pmatrix}$, where $a \neq 0$. Moreover, considering $he_2 = e_2h$ we get

$$\begin{pmatrix} 0 & a \\ -a^{-1} & 0 \end{pmatrix} = \begin{pmatrix} 0 & a^{-1} \\ -a & 0 \end{pmatrix}. \tag{20.306}$$

This implies that $a = a^{-1}$, or $a = \pm 1$. That is, $h = \pm e$, where $e$ denotes the identity element of $SU(2)$. (Note that we get no further conditions from $he_1 = e_1h$.) Thus, we obtain

$$F = \{e, -e\}. \tag{20.307}$$

In fact, with respect to $X \in \mathfrak{su}(2)$ we have

$$\mathrm{Ad}[\pm e](X) = (\pm e)X(\pm e)^{-1} = (\pm e)X(\pm e) = X. \tag{20.308}$$

Therefore, certainly we get

$$\mathrm{Ad}[\pm e] = E. \tag{20.309}$$

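Meanwhile, suppose that with $g_1, g_2 \in SU(2)$ we have $\mathrm{Ad}[g_1] = \mathrm{Ad}[g_2] \in SO(3)$. Then, we have

$$\mathrm{Ad}\left[g_1 g_2^{-1}\right] = \mathrm{Ad}[g_1]\mathrm{Ad}\left[g_2^{-1}\right] = \mathrm{Ad}[g_2]\mathrm{Ad}\left[g_2^{-1}\right] = \mathrm{Ad}\left[g_2 g_2^{-1}\right] = \mathrm{Ad}[e] = E. \tag{20.310}$$

Therefore, from (20.307) and (20.309) we get

$$g_1 g_2^{-1} = \pm e \quad \text{or} \quad g_1 = \pm g_2. \tag{20.311}$$

Conversely, we have

$$\mathrm{Ad}[-g] = \mathrm{Ad}[-e]\mathrm{Ad}[g] = \mathrm{Ad}[g], \tag{20.312}$$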

where with the first equality we used Theorem 20.6 and with the second equality we used (20.309). The relations (20.311) and (20.312) imply that with any $g \in SU(2)$ there are two elements $\pm g$ that satisfy $\mathrm{Ad}[g] = \mathrm{Ad}[-g]$. These complete the proof. ∎

In the above proof of Theorem 20.7, (20.309) is a simple, but important expression. In view of Theorem 16.4 (Homomorphism theorem), we summarize the above discussion as follows: Let Ad be a homomorphism mapping such that

$$\mathrm{Ad}: SU(2) \longrightarrow SO(3) \tag{20.313}$$

with a kernel $F = \{e, -e\}$. The relation is schematically represented in Fig. 20.6. Notice the resemblance between Fig. 20.6 and Fig. 16.1a. Correspondingly, from Theorem 16.4 of Sect. 16.4 we have an isomorphic mapping $\widetilde{\mathrm{Ad}}$ such that

$$\widetilde{\mathrm{Ad}}: SU(2)/F \longrightarrow SO(3), \tag{20.314}$$

where $F = \{e, -e\}$. Symbolically writing (20.314) as in Sect. 16.4, we have $SU(2)/F \cong SO(3)$. Notice that $F$ is an invariant subgroup of $SU(2)$.

Fig. 20.6 Homomorphism mapping $\mathrm{Ad}: SU(2) \longrightarrow SO(3)$ with a kernel $F = \{e, -e\}$. $E$ is the identity transformation of $SO(3)$

Using a general form of $SU(2)$ of (20.300) and a general form of $SO(3)$ of (20.302), we can readily show $\mathrm{Ad}[g_{\alpha\beta\gamma}] = \mathrm{Ad}[-g_{\alpha\beta\gamma}]$. That is, from (20.300) we have

$$g_{\alpha+2\pi,\,\beta+2\pi,\,\gamma+2\pi} = -g_{\alpha\beta\gamma}.$$

Both the above two elements $g_{\alpha\beta\gamma}$ and $-g_{\alpha\beta\gamma}$ produce an identical orthogonal matrix given in RHS of (20.302).

To associate the above results of abstract algebra with the tangible matrix form obtained earlier, we rewrite (20.54) by applying a simple trigonometric formula of

$$\sin\beta = 2\sin\frac{\beta}{2}\cos\frac{\beta}{2}.$$

That is, we get

$$\begin{aligned}
D^{(j)}_{m'm}(\alpha, \beta, \gamma) &= \sum_{\mu}\frac{(-1)^{m'-m+\mu}\sqrt{(j+m)!\,(j-m)!\,(j+m')!\,(j-m')!}}{(j+m-\mu)!\,\mu!\,(m'-m+\mu)!\,(j-m'-\mu)!} \\
&\qquad \times e^{-i(\alpha m' + \gamma m)}\left(\cos\frac{\beta}{2}\right)^{2j}\left[\left(\cos\frac{\beta}{2}\right)^{m-m'-2\mu}\left(\sin\frac{\beta}{2}\right)^{m'-m+2\mu}\right] \\
&= \sum_{\mu}\frac{(-1)^{m'-m+\mu}\sqrt{(j+m)!\,(j-m)!\,(j+m')!\,(j-m')!}}{(j+m-\mu)!\,\mu!\,(m'-m+\mu)!\,(j-m'-\mu)!}\,2^{m-m'-2\mu} \\
&\qquad \times e^{-i(\alpha m' + \gamma m)}(\sin\beta)^{m'-m+2\mu}\left(\cos\frac{\beta}{2}\right)^{2j+2(m-m'-2\mu)}.
\end{aligned} \tag{20.315}$$

From (20.315), we clearly see that (i) if $j$ is zero or a positive integer, $m$ and $m'$ are integers as well. Therefore, (20.315) is a periodic function with period $2\pi$ with respect to $\alpha$, $\beta$, and $\gamma$. But (ii) if $j$ is a half-odd-integer, so are $m$ and $m'$. Then, (20.315) is a periodic function with period $4\pi$. Note that in the case of (ii) $2j = 2n + 1$ ($n = 0, 1, 2, \cdots$). Therefore, in (20.315) we have

$$\left(\cos\frac{\beta}{2}\right)^{2j+2(m-m'-2\mu)} = \cos\frac{\beta}{2}\left(\cos\frac{\beta}{2}\right)^{2(n+m-m'-2\mu)}. \tag{20.316}$$

As studied in Sect. 20.2.3, in the former case (i) the spherical surface harmonics span the representation space of $D^{(l)}$ ($l = 0, 1, 2, \cdots$). The simplest case for it is $D^{(0)} = 1$; a (1, 1) identity matrix, i.e., merely a number 1. The second simplest case was given by (20.74) that describes $D^{(1)}$. Meanwhile, the simplest case for (ii) is $D^{(1/2)}$ whose matrix representation was given by $D^{(1/2)}_{\alpha,\beta,\gamma}$ in (20.45) or by $g_{\alpha\beta\gamma}$ in (20.300). With $\pm g_{\alpha\beta\gamma}$, both $\mathrm{Ad}[\pm g_{\alpha\beta\gamma}]$ produce a real orthogonal (3, 3) matrix expressed as (20.302). This representation matrix accordingly belongs to $SO(3)$.
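The $2\pi$ versus $4\pi$ periodicity can be made tangible by evaluating the sum over $\mu$ directly. The following sketch rests on two assumptions that the reader should check against (20.54): the phase factor is taken as $e^{-i(\alpha m' + \gamma m)}$, and for $j = 1/2$ the rows and columns are ordered as $m', m = -1/2, +1/2$ when comparing with (20.300). The function `wigner_d` is a hypothetical helper that takes $2j$, $2m'$, and $2m$ as integers to avoid half-integer arithmetic.

```python
import cmath
import math

def wigner_d(j2, mp2, m2, alpha, beta, gamma):
    """D^(j)_{m'm}(alpha, beta, gamma); j2, mp2, m2 stand for 2j, 2m', 2m."""
    jp_m = (j2 + m2) // 2      # j + m
    jm_m = (j2 - m2) // 2      # j - m
    jp_mp = (j2 + mp2) // 2    # j + m'
    jm_mp = (j2 - mp2) // 2    # j - m'
    diff = (mp2 - m2) // 2     # m' - m
    norm = math.sqrt(math.factorial(jp_m) * math.factorial(jm_m)
                     * math.factorial(jp_mp) * math.factorial(jm_mp))
    c, s = math.cos(beta / 2), math.sin(beta / 2)
    total = 0.0
    # mu runs over all values that keep every factorial argument non-negative
    for mu in range(max(0, -diff), min(jp_m, jm_mp) + 1):
        denom = (math.factorial(jp_m - mu) * math.factorial(mu)
                 * math.factorial(diff + mu) * math.factorial(jm_mp - mu))
        total += ((-1) ** (diff + mu) * norm / denom
                  * c ** (jp_m + jm_mp - 2 * mu) * s ** (diff + 2 * mu))
    phase = cmath.exp(-1j * (alpha * mp2 + gamma * m2) / 2)
    return phase * total

alpha, beta, gamma = 0.3, 1.1, -0.8
two_pi = 2 * math.pi

# j = 1/2: compare with the SU(2) matrix g_{alpha beta gamma} of (20.300).
half = [[wigner_d(1, mp2, m2, alpha, beta, gamma) for m2 in (-1, 1)]
        for mp2 in (-1, 1)]
c, s = math.cos(beta / 2), math.sin(beta / 2)
g = [[cmath.exp(0.5j * (alpha + gamma)) * c, cmath.exp(0.5j * (alpha - gamma)) * s],
     [-cmath.exp(0.5j * (gamma - alpha)) * s, cmath.exp(-0.5j * (alpha + gamma)) * c]]
su2_gap = max(abs(half[i][k] - g[i][k]) for i in range(2) for k in range(2))

# Shift beta by 2*pi: sign flip for j = 1/2, no change for j = 1.
half_flip = max(abs(wigner_d(1, mp2, m2, alpha, beta + two_pi, gamma)
                    + wigner_d(1, mp2, m2, alpha, beta, gamma))
                for mp2 in (-1, 1) for m2 in (-1, 1))
one_same = max(abs(wigner_d(2, mp2, m2, alpha, beta + two_pi, gamma)
                   - wigner_d(2, mp2, m2, alpha, beta, gamma))
               for mp2 in (-2, 0, 2) for m2 in (-2, 0, 2))
```

Under these assumptions `su2_gap`, `half_flip`, and `one_same` all vanish up to round-off: the $j = 1/2$ matrix changes sign under $\beta \to \beta + 2\pi$, whereas the $j = 1$ matrix is unchanged.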

20.5 Connectedness of Lie Groups

As already discussed in Chap. 6, connectedness reflects topological characteristics and is an important concept in the Lie groups. When we think of the analytic functions on the complex plane ℂ, the connectedness is easily envisaged geometrically. To deal with the connectedness in general Lie groups, on the other hand, we need abstract concepts. In this section we take a down-to-earth approach as much as possible.

20.5.1 Several Definitions and Examples

To discuss this topic, let us give several definitions and examples.

Definition 20.5 Suppose that $A$ is a subset of $GL(n, ℂ)$ and that we have any pair of elements $a, b \in A \subset GL(n, ℂ)$. Meanwhile, let $f(t)$ be a continuous function defined in an interval $0 \le t \le 1$ that takes values within $A$; that is, $f(t) \in A$ with $\forall t$. If $f(0) = a$ and $f(1) = b$, then $a$ and $b$ are said to be connected within $A$.

Major parts of the discussion that follows are based mostly upon literature [9]. A simple example for the above is given below.

Example 20.6 Let $a = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and $b = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$ be elements of $SO(2)$. Let $f(t)$ be described by

$$f(t) = \begin{pmatrix} \cos\pi t & -\sin\pi t \\ \sin\pi t & \cos\pi t \end{pmatrix}.$$

Then, $f(t)$ is a continuous matrix function of all real numbers $t$. We have $f(0) = a$ and $f(1) = b$ and, hence, $a$ and $b$ are connected to each other within $SO(2)$.

For $a$ and $b$ to be connected within $A$ is denoted by $a \sim b$. Then, the symbol $\sim$ satisfies the equivalence relation. That is, we have the following proposition.

Proposition 20.2 We have $a, b, c \in A \subset GL(n, ℂ)$. Then, we have the following three relations. (i) $a \sim a$. (ii) If $a \sim b$, $b \sim a$. (iii) If $a \sim b$ and $b \sim c$, then we have $a \sim c$.

Proof (i) Let $f(t) = a$ ($0 \le t \le 1$). Then, $f(0) = f(1) = a$ so that we have $a \sim a$. (ii) Let $f(t)$ be a continuous curve that connects $a$ and $b$ such that $f(0) = a$ and $f(1) = b$ ($0 \le t \le 1$), and so $a \sim b$. Define $g(t) \equiv f(1 - t)$. Then, $g(t)$ is also a continuous curve within $A$ with $g(0) = b$ and $g(1) = a$. This implies $b \sim a$. (iii) Let $f(t)$ and $g(t)$ be two continuous curves that connect $a$ and $b$ and $b$ and $c$, respectively. Meanwhile, we define $h(t)$ as

$$h(t) \equiv \begin{cases} f(2t) & \left(0 \le t \le \frac{1}{2}\right) \\ g(2t - 1) & \left(\frac{1}{2} \le t \le 1\right) \end{cases}$$

so that $h(t)$ can be a third continuous curve. Then, $h(1/2) = f(1) = g(0)$. From the supposition we have $h(0) = f(0) = a$, $h(1/2) = f(1) = g(0) = b$, and $h(1) = g(1) = c$. This implies that if $a \sim b$ and $b \sim c$, then we have $a \sim c$.

Let us give another related definition.

Definition 20.6 Let $A$ be a subset of $GL(n, ℂ)$. Let a point $a \in A$. Then, a collection of points that are connected to $a$ within $A$ is called a connected component of $a$ and denoted by $C(a)$.

From Proposition 20.2 (i), $a \in C(a)$. Let $C(a)$ and $C(b)$ be two connected components. We have two alternatives with $C(a)$ and $C(b)$: (I) $C(a) \cap C(b) = \emptyset$; (II) $C(a) = C(b)$. Suppose $c \in C(a) \cap C(b)$. Then, it follows that $c \sim a$ and $c \sim b$. From Proposition 20.2 (ii) and (iii), we have $a \sim b$. Hence, we get $C(a) = C(b)$. Thus, the subset $A$ is a direct sum of several connected components such that

$$A = C(a) \cup C(b) \cup \cdots \cup C(z) \quad \text{with any } C(p) \cap C(q) = \emptyset \ (p \neq q), \tag{20.317}$$

where $p, q$ are taken from among $a, b, \cdots, z$. In particular, the connected component containing $e$ (i.e., the identity element) is of major importance.

Definition 20.7 Let $A$ be a subset of $GL(n, \mathbb{C})$. If any pair of elements $a, b$ $(\in A)$ is connected to each other within $A$, then $A$ is connected or $A$ is called a connected set.

A connected set is nothing but a set that consists of a single connected component. With a connected set, we have the following important theorem.

Theorem 20.8 An image of a connected set $A$ obtained by a continuous mapping $f$ is a connected set as well.

Proof The proof is almost self-evident. Let $f: A \longrightarrow B$ be a continuous mapping from a connected set $A$ to $B$. Suppose $f(A) = B$. Take $\forall a, a' \in A$ and $b, b' \in B$ that satisfy $f(a) = b$ and $f(a') = b'$. Then, from Definition 20.5 there is a continuous function $g(t)$ $(0 \le t \le 1)$ within $A$ that satisfies $g(0) = a$ and $g(1) = a'$. Now, define a function $h(t) \equiv f[g(t)]$. Then, $h(t)$ is a continuous mapping with $h(0) = f[g(0)] = f(a) = b$ and $h(1) = f[g(1)] = f(a') = b'$. Again from Definition 20.5, $h(t)$ is a continuous curve that connects $b$ and $b'$ within $B$. Meanwhile, from the supposition of $f(A) = B$, with $\forall b, b' \in B$ we must have $\exists a_0, a_0' \in A$ that satisfy $f(a_0) = b$ and $f(a_0') = b'$. Consequently, from Definition 20.7, $B$ is a connected set as well. This completes the proof. ∎

Applications of Theorem 20.8 are given below as examples.

Example 20.7 Let $f(x)$ be a real continuous function defined on a subset $A \subset GL(n, \mathbb{R})$, where $x \in A$ with $A$ being a connected set. Suppose that for a pair of $a, b \in A$, $f(a) = p$ and $f(b) = q$ ($p, q$: real) with $p < q$. Then, $f(x)$ takes every real number on the interval $[p, q]$ as its value. In other words, we have $f(A) \supset [p, q]$. This is well known as the intermediate value theorem.

Example 20.8 [9] Suppose that $A$ is a connected subset of $GL(n, \mathbb{R})$. Then, with any pair of elements $a, b \in A \subset GL(n, \mathbb{R})$, we must have a continuous function $g(t)$ $(0 \le t \le 1)$ within $A$ such that $g(0) = a$ and $g(1) = b$. Meanwhile, suppose that we can define a real determinant for $\forall g \in A$ as a real continuous function $\det g(t)$. Now, suppose that $\det a = p < 0$ and $\det b = q > 0$ with $p < q$. Then, from Theorem 20.8 (see also Example 20.7) we would have $\det(A) \supset [p, q]$. Consequently, we must have some $x \in A$ such that $\det(x) = 0$. But, this is in contradiction to $A \subset GL(n, \mathbb{R})$, because we must have $\det x \ne 0$ for any $x \in A$. Thus, within a connected set the sign of the determinant should be constant, if the determinant is defined as a real continuous function.

In relation to the discussion of the connectedness, we have the following general theorem.

Theorem 20.9 [9] Let $G_0$ be a connected component within a linear Lie group $G$ that contains the identity element $e$. Then, $G_0$ is an invariant subgroup of $G$. A connected component $C(g)$ containing $g \in G$ is identical with $gG_0 = G_0 g$.

Proof Let $a, b \in G_0$. Since $G_0 = C(e)$, from the supposition there are continuous curves $f(t)$ and $g(t)$ within $G_0$ that connect $a$ with $e$ and $b$ with $e$, respectively, such that $f(0) = a$, $f(1) = e$ and $g(0) = b$, $g(1) = e$. Meanwhile, let $h(t) \equiv f(t)[g(t)]^{-1}$. Then, $h(t)$ is another continuous curve. We have $h(0) = f(0)[g(0)]^{-1} = ab^{-1}$ and $h(1) = f(1)[g(1)]^{-1} = e(e^{-1}) = ee = e$. That is, $ab^{-1}$ and $e$ are connected, being indicative of $ab^{-1} \in G_0$. This implies that $G_0$ is a subgroup of $G$. Next, with $\forall g \in G$ and $\forall a \in G_0$ we have $f(t)$ in the same sense as the above. Defining $k(t)$ as $k(t) = g f(t) g^{-1}$, $k(t)$ is a continuous curve within $G$ with $k(0) = gag^{-1}$ and $k(1) = geg^{-1} = e$. Hence, we have $gag^{-1} \in G_0$.
Since $a$ is an arbitrary point of $G_0$, we have $gG_0g^{-1} \subset G_0$ accordingly. Similarly, $g$ is an arbitrary point of $G$, and so replacing $g$ with $g^{-1}$ in the above, we get $g^{-1}G_0g \subset G_0$. Operating $g$ and $g^{-1}$ from the left and right, respectively, we obtain $G_0 \subset gG_0g^{-1}$. Combining this with the above relation $gG_0g^{-1} \subset G_0$, we obtain $gG_0g^{-1} = G_0$ or $gG_0 = G_0g$. This means that $G_0$ is an invariant subgroup of $G$; see Sect. 16.3.

Taking $\forall a \in G_0$ and $f(t)$ in the same sense as the above again, $l(t) = g f(t)$ is a continuous curve within $G$ with $l(0) = ga$ and $l(1) = ge = g$. Then, $ga$ and $g$ are connected within $G$. That is, $ga \in C(g)$. Since $a$ is an arbitrary point of $G_0$, $gG_0 \subset C(g)$. Meanwhile, choosing $\forall p \in C(g)$, we set a continuous curve $m(t)$ $(0 \le t \le 1)$ that connects $p$ with $g$ within $G$. This implies that $m(0) = p$ and $m(1) = g$. Also setting a continuous curve $n(t) = g^{-1}m(t)$, we have $n(0) = g^{-1}m(0) = g^{-1}p$ and $n(1) = g^{-1}m(1) = g^{-1}g = e$. This means that $g^{-1}p \in G_0$. Multiplying both sides by $g$ from the left, we get $p \in gG_0$. Since $p$ is arbitrarily chosen from $C(g)$, we get $C(g) \subset gG_0$. Combining this with the above relation $gG_0 \subset C(g)$, we obtain $C(g) = gG_0 = G_0g$. These complete the proof. ∎

To the above proof, we add the following remark. In Sect. 16.2, as the necessary and sufficient condition for a subset $H$ to be a subgroup, we described (1) $h_i, h_j \in H \Longrightarrow h_i \diamond h_j \in H$ and (2) $h \in H \Longrightarrow h^{-1} \in H$. The proof above used instead the single condition $h_i, h_j \in H \Longrightarrow h_i \diamond h_j^{-1} \in H$. This is sufficient for the following reason. If we replace $h_j$ with $h_i$, we get $h_i \diamond h_i^{-1} = e \in H$. In turn, if we replace $h_i$ with $e$, we get $h_i \diamond h_j^{-1} = e \diamond h_j^{-1} = h_j^{-1} \in H$. Finally, if we replace $h_j$ with $h_j^{-1}$, we get $h_i \diamond (h_j^{-1})^{-1} = h_i \diamond h_j \in H$. Thus, $H$ satisfies the axioms (A1), (A3), and (A4) of Sect. 16.1 and, hence, forms a (sub)group. In other words, the conditions (1) and (2) are combined so as to be

$$ h_i \in H, \ h_j \in H \Longrightarrow h_i \diamond h_j^{-1} \in H. $$

The above general theorem can immediately be applied to an important class of Lie groups $SO(n)$. We discuss this topic below.
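The combined subgroup condition used above can be tried out on a small finite example. The following is only a sketch under stated assumptions: the helper names (`matmul`, `transpose`, `is_subgroup`) and the two test sets are chosen here for illustration, and the check applies to finite subsets of orthogonal matrices, for which the transpose serves as the inverse.

```python
from itertools import product

def matmul(a, b):
    """Multiply two square matrices given as tuples of tuples."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )

def transpose(a):
    """For an orthogonal matrix the transpose is the inverse."""
    return tuple(zip(*a))

def is_subgroup(subset):
    """One-step test: h_i h_j^{-1} must stay in the subset for every pair."""
    members = set(subset)
    return all(matmul(hi, transpose(hj)) in members
               for hi, hj in product(subset, repeat=2))

E = ((1, 0), (0, 1))
R90 = ((0, -1), (1, 0))      # rotation by 90 degrees, an element of SO(2)
R180 = matmul(R90, R90)
R270 = matmul(R180, R90)

c4 = [E, R90, R180, R270]    # the four rotations by multiples of 90 degrees
not_a_group = [E, R90]       # E * R90^{-1} = R270 is missing from this set

print(is_subgroup(c4))           # True
print(is_subgroup(not_a_group))  # False
```

Integer matrix entries are used so that set membership can be tested exactly; with floating-point entries a tolerance-based comparison would be needed instead.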

20.5.2 O(3) and SO(3)

In Chap. 17 we dealt with finite groups related to $O(3)$ and $SO(3)$, i.e., their subgroups. In this section, we examine several important properties in terms of Lie groups. The groups $O(3)$ and $SO(3)$ are characterized by their determinants $\det O(3) = \pm 1$ and $\det SO(3) = 1$. Example 20.8 implies that in $O(3)$ there should be two different connected components according to $\det O(3) = \pm 1$. Also, Theorem 20.9 tells us that the connected component $C(E)$ is an invariant subgroup $G_0$ in $O(3)$, where $E$ is the identity element of $O(3)$. Obviously, $G_0$ is $SO(3)$. In turn, the other connected component is $SO(3)^c$, where $SO(3)^c$ denotes the complementary set of $SO(3)$; for the notation see Sect. 6.1. Remembering (20.317), we have

$$ O(3) = SO(3) \cup SO(3)^c. \tag{20.318} $$
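The role of the determinant in separating the two components can be illustrated numerically. The sketch below is an illustration only, not a proof: the function names and the sampling grid are assumptions made here. It follows the determinant along a curve that stays inside $SO(3)$ and along a naive entrywise interpolation from $E$ (determinant $+1$) to $-E$ (determinant $-1$); the latter has to pass through a singular matrix and therefore leaves $O(3)$.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def rot_z(angle):
    """Proper rotation about the z-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def straight_line(a, b, t):
    """Entrywise interpolation (1 - t) a + t b; in general this leaves O(3)."""
    return [[(1 - t) * x + t * y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]

E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
MINUS_E = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
ts = [k / 100 for k in range(101)]

# A curve inside SO(3): the determinant stays at +1 all the way.
dets_inside = [det3(rot_z(math.pi * t)) for t in ts]

# A naive curve from E to -E: the determinant changes sign, so it hits zero.
dets_naive = [det3(straight_line(E, MINUS_E, t)) for t in ts]

print(min(dets_inside), max(dets_inside))
print(min(abs(d) for d in dets_naive))
```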

This can be rewritten as a direct sum such that

$$ O(3) = C(E) \cup C(-E) \quad \text{with} \quad C(E) \cap C(-E) = \emptyset, \tag{20.319} $$

where $E$ denotes a (3, 3) unit matrix and $-E = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}$. In the above, $SO(3)$ is identical to $C(E)$ with $SO(3)$ itself being a connected set; $SO(3)^c$ is identical with $C(-E)$ that is another connected set. Another description of $O(3)$ is [13]

$$ O(3) = SO(3) \otimes \{E, -E\}, \tag{20.320} $$

where the symbol $\otimes$ denotes a direct-product group (Sect. 16.5). An alternative description for this is

$$ O(3) = SO(3) \cup \{(-E)SO(3)\}, \tag{20.321} $$

where the direct sum is implied.

In Sect. 20.2.6, we discussed two rotations around different rotation axes with the same rotation angle. These rotations belong to the same conjugacy class; see (20.140). With $Q$ as another rotation that transforms the rotation axis of $R_\omega$ to that of $R'_\omega$, the two rotations $R_\omega$ and $R'_\omega$ having the same rotation angle $\omega$ are associated with each other through the following relation:

$$ R'_\omega = Q R_\omega Q^{-1}. \tag{20.140} $$
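Relation (20.140) can be checked numerically for a concrete choice of $Q$. In the sketch below the angles and the helper names are arbitrary choices made for illustration. It confirms that $QR_\omega Q^{-1}$ has the same trace $1 + 2\cos\omega$ (hence the same rotation angle) and that it leaves the transformed axis $Q\,e_z$ fixed.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def rot_z(w):
    c, s = math.cos(w), math.sin(w)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(w):
    c, s = math.cos(w), math.sin(w)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

omega = 0.7                           # common rotation angle (arbitrary choice)
R = rot_z(omega)                      # rotation about the z-axis
Q = matmul(rot_z(0.4), rot_y(1.1))    # some rotation that tilts the axis
R_prime = matmul(matmul(Q, R), transpose(Q))   # Q R Q^{-1}, with Q^{-1} = Q^T

new_axis = matvec(Q, [0.0, 0.0, 1.0])          # image of the old axis under Q
trace = sum(R_prime[i][i] for i in range(3))

print(trace, 1 + 2 * math.cos(omega))          # the two numbers agree
print(matvec(R_prime, new_axis), new_axis)     # R' leaves the new axis fixed
```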

Notice that this relation is independent of the choice of specific coordinate system. In Fig. 20.4 let us choose the z-axis for the rotation axis $A$ with respect to the rotation $R_\omega$. Then, the rotation matrix $R_\omega$ is expressed in reference to the Cartesian coordinate system as

$$ R_\omega = \begin{pmatrix} \cos\omega & -\sin\omega & 0 \\ \sin\omega & \cos\omega & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.322} $$

Meanwhile, multiplying $-E$ on both sides of (20.140), we have

$$ -R'_\omega = Q(-R_\omega)Q^{-1}. \tag{20.323} $$

Note that $-E$ is commutative with any $Q$ (or $Q^{-1}$). Also, notice that from (20.323) $-R'_\omega$ and $-R_\omega$ are again connected via a unitary similarity transformation. From (20.322), we have

$$ -R_\omega = \begin{pmatrix} -\cos\omega & \sin\omega & 0 \\ -\sin\omega & -\cos\omega & 0 \\ 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} \cos(\omega + \pi) & -\sin(\omega + \pi) & 0 \\ \sin(\omega + \pi) & \cos(\omega + \pi) & 0 \\ 0 & 0 & -1 \end{pmatrix}, \tag{20.324} $$

where $-R_\omega$ represents an improper rotation of an angle $\omega + \pi$ (see Sect. 17.1). Referring to Fig. 17.12 and Table 17.6 (Sect. 17.3), as another tangible example we further get

$$ \begin{pmatrix} 0 & -1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \tag{20.325} $$

where the first component of the LHS belongs to O (octahedral rotation group) and the RHS belongs to $T_d$ (tetrahedral group) and gives a mirror symmetry. Combining (20.324) and (20.325), we get the following schematic representation:

(Rotation) × (Inversion) = (Improper rotation) or (Mirror symmetry).

This is a finite group version of (20.320).
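The schematic relation above can be verified directly. The following sketch makes two assumptions: the angle $\omega$ is an arbitrary value, and the twofold rotation `c2` about the axis $(1, -1, 0)$ is one element of the octahedral rotation group picked here for illustration. It checks the two forms of $-R_\omega$ in (20.324) against each other and shows that multiplying a proper rotation by the inversion flips the sign of the determinant.

```python
import math

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def negate(m):
    """Multiply by the inversion -E."""
    return [[-x for x in row] for row in m]

def rot_z(w):
    c, s = math.cos(w), math.sin(w)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

omega = 0.9
improper = negate(rot_z(omega))      # first form in (20.324)
shifted = rot_z(omega + math.pi)
shifted[2][2] = -1.0                 # second form in (20.324)

# A twofold rotation about the axis (1, -1, 0), picked here for illustration.
c2 = [[0, -1, 0], [-1, 0, 0], [0, 0, -1]]
mirror = negate(c2)                  # exchanges x and y: a mirror plane

print(det3(rot_z(omega)), det3(improper))
print(det3(c2), det3(mirror), mirror)
```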

20.5.3 Simply Connected Lie Groups: Local Properties and Global Properties

As a final topic of Lie groups, we outline the characteristics of simply connected groups. In (20.318), (20.319), and (20.321) we showed three different decompositions of $O(3)$. This can be generalized to $O(n)$. That is, instead of $-E$ we may take $F$ described by [9]

$$ F = \begin{pmatrix} 1 & & & & \\ & 1 & & & \\ & & \ddots & & \\ & & & 1 & \\ & & & & -1 \end{pmatrix}, $$

where $\det F = -1$. Then, we have

$$ O(n) = SO(n) \cup SO(n)^c = C(E) \cup C(F) = SO(n) + F\,SO(n), \tag{20.326} $$

where the last side represents the coset decomposition (Sect. 16.2). This is essentially the same as (20.321). In terms of the isomorphism, we have

$$ O(n)/SO(n) \cong \{E, F\}. \tag{20.327} $$

If in particular $n$ is odd $(n \ge 3)$, we can choose $-E$ instead of $F$, because $\det(-E) = -1$. Then, similarly to (20.320) we have

$$ O(n) = SO(n) \otimes \{E, -E\} \quad (n: \text{odd}), \tag{20.328} $$

where $E$ is an $(n, n)$ identity matrix. Note that since $-E$ is commutative with any element of $O(n)$, the direct product of (20.328) is allowed.
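The restriction to odd $n$ in (20.328) comes from $\det(-E) = (-1)^n$. A short check, with helper names chosen here for illustration, is sketched below; for even $n$ the matrix $-E$ has determinant $+1$ and already lies in $SO(n)$, whereas $F$ has determinant $-1$ for every $n$.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def diag(entries):
    """Diagonal matrix with the given diagonal entries."""
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]

for n in range(2, 6):
    minus_E = diag([-1] * n)
    F = diag([1] * (n - 1) + [-1])
    print(n, det(minus_E), det(F))
# prints:
# 2 1 -1
# 3 -1 -1
# 4 1 -1
# 5 -1 -1
```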

Of the connected components, that containing the identity element is of particular importance. This is because, as already mentioned in Sect. 20.3.1, the "initial" condition of the one-parameter group $A(t)$ is set at $t = 0$ such that

$$ A(0) = E. \tag{20.238} $$

Under this condition, it is important to examine how $A(t)$ evolves with $t$, starting from $E$ at $t = 0$. In this connection we have the following theorems that give good grounds to further discussion of Sect. 20.2. We describe them without proof. Interested readers are encouraged to look up literature [9].

Theorem 20.10 Let $G$ be a linear Lie group. Let $G_0$ be the connected component of the identity element $e \in G$. Then, $G_0$ is another linear Lie group and the Lie algebras of $G$ and $G_0$ are identical.

Theorem 20.10 shows the significance of the connected component of the identity. From this theorem the Lie algebras $\mathfrak{o}(n)$ and $\mathfrak{so}(n)$ of, e.g., $O(n)$ and $SO(n)$ are identical, namely both of the Lie algebras are given by real skew-symmetric matrices. This is inherently related to the properties of Lie algebras. The zero matrix (i.e., zero vector) as an element of the Lie algebra corresponds to the identity element of the Lie group. This is obvious from the relation $e = \exp 0$, where $e$ is the identity element of the Lie group and $0$ represents the zero matrix. Basically, Lie algebras are suited for describing local properties associated with infinitesimal transformations around the identity element $e$. More specifically, the exponential functions of the real skew-symmetric matrices always have determinant $+1$ and, hence, cannot produce $-E$ when $n$ is odd (e.g., $n = 3$). Considering that one of the connected components of $O(3)$ is $SO(3)$ that contains the identity element, it is intuitively understandable that $\mathfrak{o}(n)$ and $\mathfrak{so}(n)$ are the same.

Another interesting point is that $SO(3)$ is at once an open set and a closed set (i.e., clopen set; see Sect. 6.1). This is relevant to Theorem 6.3 which says that a necessary and sufficient condition for a subset $S$ of a topological space to be both open and closed at once is $S^b = \emptyset$. Since the determinant of any element of $O(3)$ is alternatively $\pm 1$, it is natural that $SO(3)$ has no boundary; if there were a boundary, the determinant would be zero there, in contradiction to $SO(3) \subset GL(n, \mathbb{R})$; see Example 20.8.
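The statement that exponentials of real skew-symmetric matrices stay within the component of the identity can be illustrated numerically. The sketch below rests on assumptions made here: the matrix `X` is an arbitrary skew-symmetric example, and the exponential is evaluated by a truncated power series, which is adequate only for matrices of moderate norm. It checks that $\exp X$ is orthogonal with determinant $+1$.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm(x, terms=40):
    """Matrix exponential by a truncated power series (small norms only)."""
    n = len(x)
    result = identity(n)
    term = identity(n)
    for k in range(1, terms):
        # term becomes X^k / k!
        term = [[v / k for v in row] for row in matmul(term, x)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# An arbitrary real skew-symmetric matrix, i.e. an element of so(3) = o(3).
X = [[0.0, -0.3, 0.5], [0.3, 0.0, -1.2], [-0.5, 1.2, 0.0]]
A = expm(X)
AtA = matmul([list(col) for col in zip(*A)], A)   # A^T A

print(det3(A))   # close to +1, never -1
print(AtA)       # close to the identity matrix
```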
Theorem 20.11 Let $G$ be a connected linear Lie group. (The Lie group $G$ may be $G_0$ in the sense of Theorem 20.10.) Then, $\forall g \in G$ can be described by

$$ g = (\exp t_1 X_1)(\exp t_2 X_2) \cdots (\exp t_d X_d), \tag{20.329} $$

where $X_i$ $(i = 1, 2, \cdots, d)$ is a basis set of the Lie algebra corresponding to $G$ and $t_i$ $(i = 1, 2, \cdots, d)$ is an arbitrary real number.

Originally, the notion of Lie algebra was introduced to deal with infinitesimal transformations near the identity element. Yet, Theorem 20.11 says that any transformation (or element) of a connected linear Lie group can be described by a finite number of elements of the corresponding Lie algebra. That is, the Lie algebra determines the global characteristics of the connected Lie group.

Strengthening the condition of connectedness, we need to deal with simply connected groups. In this context we have the following definitions.

Definition 20.8 Let $G$ be a linear Lie group and let $I = [0, 1]$ be an interval. Let $f$ be a continuous function such that

$$ f : I \longrightarrow G \quad \text{or} \quad f(I) \subset G. \tag{20.330} $$

Then, the function $f$ is said to be a path, in which $f(0)$ and $f(1)$ are an initial point and an end point, respectively. If $f(0) = f(1) = x_0$, $f$ is called a loop (i.e., a closed path) at $x_0$ [14]. If $f(t) \equiv x_0$ $(0 \le t \le 1)$, $f$ is said to be a constant loop.

Definition 20.9 Let $f$ and $g$ be two paths. If $f$ and $g$ can continuously be deformed from one to the other, they are said to be homotopic.

To be more specific with Definition 20.9, let us define a function $h(s, t)$ that is continuous with respect to $s$ and $t$ in a region $I \times I = \{(s, t); 0 \le s \le 1, 0 \le t \le 1\}$. If furthermore $h(0, t) = f(t)$ and $h(1, t) = g(t)$ hold, $f$ and $g$ are homotopic [9].

Definition 20.10 Let $G$ be a connected Lie group. If all the loops at an arbitrarily chosen point $\forall x \in G$ are homotopic to a constant loop, $G$ is said to be simply connected.

This would be a somewhat abstract concept. Put briefly, for $G$ to be simply connected means that loops at any $x \in G$ can continuously be contracted to that point $x$. The next example helps understand the meaning of being simply connected.

Example 20.9 Let $S^2$ be a spherical surface of $\mathbb{R}^3$ such that

$$ S^2 = \{x \in \mathbb{R}^3; \langle x | x \rangle = 1\}. \tag{20.331} $$

Any $x$ is equivalent in virtue of the spherical symmetry of $S^2$. Let us think of any loop at $x$. This loop can be contracted to that point $x$. Hence, $S^2$ is simply connected. By the same token, a spherical hypersurface (or hypersphere) given by $S^n = \{x \in \mathbb{R}^{n+1}; \langle x | x \rangle = 1\}$ $(n \ge 2)$ is simply connected. Thus, from (20.36) $SU(2)$ is simply connected. Note that the parameter space of $SU(2)$ is $S^3$.

Example 20.10 In Sect. 20.4.2 we showed that $SO(3)$ is a connected set. But, it is not simply connected. To see this, we return to Sect. 17.4.2 that dealt with the three-dimensional rotation matrices. Rewriting $\widetilde{R}_3$ in (17.101) as $\widetilde{R}_3(\alpha, \beta, \gamma)$, we have

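$$ \widetilde{R}_3(\alpha, \beta, \gamma) = \begin{pmatrix} \cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma & -\cos\alpha\cos\beta\sin\gamma - \sin\alpha\cos\gamma & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\cos\beta\sin\gamma + \cos\alpha\cos\gamma & \sin\alpha\sin\beta \\ -\sin\beta\cos\gamma & \sin\beta\sin\gamma & \cos\beta \end{pmatrix}. \tag{20.332} $$

Note that in (20.332) we represent the rotation in the moving coordinate system. Putting $\beta = 0$ in (20.332), we get

$$ \widetilde{R}_3(\alpha, 0, \gamma) = \begin{pmatrix} \cos\alpha\cos\gamma - \sin\alpha\sin\gamma & -\cos\alpha\sin\gamma - \sin\alpha\cos\gamma & 0 \\ \sin\alpha(\cos 0)\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha(\cos 0)\sin\gamma + \cos\alpha\cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos(\alpha + \gamma) & -\sin(\alpha + \gamma) & 0 \\ \sin(\alpha + \gamma) & \cos(\alpha + \gamma) & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.333} $$

If in (20.333) we replace $\alpha$ with $\alpha + \phi_0$ and $\gamma$ with $\gamma - \phi_0$, we obtain the same result as (20.333). That is, different sets of parameters give the same result (i.e., the same rotation) such that

$$ (\alpha, 0, \gamma) \longleftrightarrow (\alpha + \phi_0, 0, \gamma - \phi_0). \tag{20.334} $$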
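The degeneracy (20.334) is easy to confirm numerically. The sketch below builds the matrix of (20.332) as the product $R_z(\alpha)R_y(\beta)R_z(\gamma)$, whose entries coincide with those of (20.332); the numerical values of $\alpha$, $\gamma$, and $\phi_0$ and the helper names are arbitrary choices made here. The same script also tries the analogous shift at $\beta = \pi$ and a generic $\beta$, for which the shift does change the rotation.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_z(w):
    c, s = math.cos(w), math.sin(w)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(w):
    c, s = math.cos(w), math.sin(w)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def euler(alpha, beta, gamma):
    """R_z(alpha) R_y(beta) R_z(gamma); its entries reproduce (20.332)."""
    return matmul(matmul(rot_z(alpha), rot_y(beta)), rot_z(gamma))

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

alpha, gamma, phi0 = 0.8, 1.9, 0.35

# beta = 0: only alpha + gamma matters, cf. (20.334).
same_at_zero = close(euler(alpha, 0.0, gamma),
                     euler(alpha + phi0, 0.0, gamma - phi0))
# beta = pi: only alpha - gamma matters.
same_at_pi = close(euler(alpha, math.pi, gamma),
                   euler(alpha + phi0, math.pi, gamma + phi0))
# generic beta: the same shift produces a different rotation.
generic = close(euler(alpha, 1.0, gamma),
                euler(alpha + phi0, 1.0, gamma - phi0))

print(same_at_zero, same_at_pi, generic)   # True True False
```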

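Putting $\beta = \pi$ in (20.332), we get

$$ \widetilde{R}_3(\alpha, \pi, \gamma) = \begin{pmatrix} -\cos(\alpha - \gamma) & -\sin(\alpha - \gamma) & 0 \\ -\sin(\alpha - \gamma) & \cos(\alpha - \gamma) & 0 \\ 0 & 0 & -1 \end{pmatrix}. \tag{20.335} $$

If, in turn, we replace $\alpha$ with $\alpha + \phi_0$ and $\gamma$ with $\gamma + \phi_0$ in (20.335), we obtain the same result as (20.335). Again, different sets of parameters give the same result (i.e., the same rotation) such that

$$ (\alpha, \pi, \gamma) \longleftrightarrow (\alpha + \phi_0, \pi, \gamma + \phi_0). \tag{20.336} $$

Meanwhile, $R_3$ of (17.107) is expressed as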

$$ R_3 = \begin{pmatrix} (\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma + \cos^2\alpha\sin^2\beta & \cos\alpha\sin\alpha\sin^2\beta(1 - \cos\gamma) - \cos\beta\sin\gamma & \cos\alpha\cos\beta\sin\beta(1 - \cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\ \cos\alpha\sin\alpha\sin^2\beta(1 - \cos\gamma) + \cos\beta\sin\gamma & (\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma + \sin^2\alpha\sin^2\beta & \sin\alpha\cos\beta\sin\beta(1 - \cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\ \cos\beta\sin\beta\cos\alpha(1 - \cos\gamma) - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta\sin\beta(1 - \cos\gamma) + \cos\alpha\sin\beta\sin\gamma & \sin^2\beta\cos\gamma + \cos^2\beta \end{pmatrix}. \tag{20.337} $$

Fig. 20.7 Geometry of the rotation axis (A) accompanied by a rotation $\gamma$. (a) Geometry viewed in parallel to the equator. (b, c) Geometry viewed from above the north pole (N). If the azimuthal angle $\alpha$ is measured at the antipodal point $\widetilde{P}$, $\alpha$ should be replaced with (b) $\alpha + \pi$ (in the case of $0 \le \alpha \le \pi$) or (c) $\alpha - \pi$ (in the case of $\pi < \alpha \le 2\pi$)

Note that in (20.337) we represent the rotation in the fixed coordinate system. In this case, again we have arbitrariness with the choice of parameters. Figure 20.7 shows a geometry of the rotation axis (A) accompanied by a rotation $\gamma$; see Fig. 17.18 once again. In this parameter space, specifying $SO(3)$, the rotation is characterized by the azimuthal angle $\alpha$, the zenithal angle $\beta$, and the magnitude of rotation $\gamma$. The domains of variability of the parameters $(\alpha, \beta, \gamma)$ are

$$ 0 \le \alpha \le 2\pi, \quad 0 \le \beta \le \pi, \quad 0 \le \gamma \le 2\pi. $$

Figure 20.7a shows a geometry viewed in parallel to the equator that includes the zenithal angle $\beta$ as well as a point $P$ (i.e., a point of intersection between the rotation axis and geosphere) and its antipodal point $\widetilde{P}$. If $\beta$ is measured at $\widetilde{P}$, $\beta$ and $\gamma$ should be replaced with $\pi - \beta$ and $2\pi - \gamma$, respectively. Meanwhile, Fig. 20.7b, c depict the azimuthal angle $\alpha$ viewed from above the north pole (N). If the azimuthal angle $\alpha$ is measured at the antipodal point $\widetilde{P}$, $\alpha$ should be replaced with either $\alpha + \pi$ (in the case of $0 \le \alpha \le \pi$; see Fig. 20.7b) or $\alpha - \pi$ (in the case of $\pi < \alpha \le 2\pi$; see Fig. 20.7c). Consequently, we have the following correspondence:

$$ (\alpha, \beta, \gamma) \longleftrightarrow (\alpha \pm \pi, \pi - \beta, 2\pi - \gamma). \tag{20.338} $$
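The correspondence (20.338) can be tested with a few lines of code. The sketch below assumes that the rotation of (20.337) is the rotation by $\gamma$ about the unit vector $n = (\sin\beta\cos\alpha, \sin\beta\sin\alpha, \cos\beta)$, written through Rodrigues' formula $R = \cos\gamma\, E + (1 - \cos\gamma)\, n n^{T} + \sin\gamma\, [n]_\times$; the function name and the numerical angles are illustrative choices. Passing to the antipodal parameters reverses both the axis and the sense of rotation, so the matrix is unchanged.

```python
import math

def axis_angle(alpha, beta, gamma):
    """Rotation by gamma about the axis with azimuth alpha and zenith angle beta
    (Rodrigues' formula), expressed in a fixed coordinate system."""
    n = (math.sin(beta) * math.cos(alpha),
         math.sin(beta) * math.sin(alpha),
         math.cos(beta))
    c, s = math.cos(gamma), math.sin(gamma)
    cross = [[0.0, -n[2], n[1]], [n[2], 0.0, -n[0]], [-n[1], n[0], 0.0]]
    return [[c * (i == j) + (1 - c) * n[i] * n[j] + s * cross[i][j]
             for j in range(3)] for i in range(3)]

def max_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

beta, gamma = 1.1, 2.0

# Case 0 <= alpha <= pi: use alpha + pi.
alpha = 0.6
diff_plus = max_diff(axis_angle(alpha, beta, gamma),
                     axis_angle(alpha + math.pi, math.pi - beta, 2 * math.pi - gamma))

# Case pi < alpha <= 2 pi: use alpha - pi.
alpha2 = 4.0
diff_minus = max_diff(axis_angle(alpha2, beta, gamma),
                      axis_angle(alpha2 - math.pi, math.pi - beta, 2 * math.pi - gamma))

print(diff_plus, diff_minus)   # both at the level of rounding error
```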

Thus, we confirm once again that the different sets of parameters give the same rotation. In other words, two different points in the parameter space correspond to the same transformation (i.e., rotation). In fact, using these different sets of parameters, the matrix form of (20.337) is held unchanged. The confirmation is left for readers.

Because of the aforementioned characteristic of the parameter space, $SO(3)$ contains loops that cannot be contracted to a single point. For this reason, $SO(3)$ is not simply connected.

In contrast to Example 20.10, the mapping from the parameter space $S^3$ to $SU(2)$ is bijective. That is, any two different points of $S^3$ give different transformations of $SU(2)$ (i.e., injective), and any element of $SU(2)$ has a corresponding point in $S^3$ (i.e., surjective). This can be rephrased such that $SU(2)$ and $S^3$ are isomorphic. In this context, let us give important concepts. Let $(T, \tau)$ and $(S, \sigma)$ be two topological spaces. Suppose that $f : (T, \tau) \longrightarrow (S, \sigma)$ is a bijective mapping. If, furthermore, both $f$ and $f^{-1}$ are continuous mappings, $(T, \tau)$ and $(S, \sigma)$ are said to be homeomorphic [14]. In our present case, $SU(2)$ and $S^3$ are homeomorphic.

Apart from being simply connected or not, $SU(2)$ and $SO(3)$ resemble each other in many aspects. The resemblance has already shown up in Chap. 3 when we considered the (orbital) angular momentum and generalized angular momentum. As shown in (3.30) and (3.69), the commutation relation was virtually the same. It is obvious when we compare (20.271) and (20.275). That is, in terms of Lie algebras their structure constants are the same and, eventually, the structures of the corresponding Lie groups are related. This became well understood when we considered the adjoint representation of the Lie group $G$ on its Lie algebra $\mathfrak{g}$. Of the groups whose Lie algebras are characterized by the same structure constants, the simply connected group is said to be a universal covering group. For example, among the Lie groups $O(3)$, $SO(3)$, and $SU(2)$, only $SU(2)$ is simply connected and, hence, is a universal covering group.

The understanding of the Lie algebra is indispensable to explicitly describing the elements of the Lie group, especially the connected component of the identity element $e \in G$. Then, we are able to understand the global characteristics of the Lie group. In this chapter, we have focused on $SU(2)$ and $SO(3)$ as typical examples of linear Lie groups.

Finally, though formal, we describe a definition of a topological group.

Definition 20.11 [15] Let $G$ be a set. If $G$ satisfies the following conditions, $G$ is called a topological group.
(i) The set $G$ is a group.
(ii) The set $G$ is a $T_1$-space.
(iii) Group operations (or calculations) of $G$ are continuous. That is, let $P$ and $Q$ be two mappings such that

$$ P : G \times G \longrightarrow G; \quad P(g, h) = gh \quad (g, h \in G), \tag{20.339} $$

$$ Q : G \longrightarrow G; \quad Q(g) = g^{-1} \quad (g \in G). \tag{20.340} $$

Then, both the mappings P and Q are continuous. By Definition 20.11, a topological group combines the structures of group and topological space. Usually, the continuous group is a collective term of Lie groups and topological groups. The continuous groups are very frequently dealt with in various fields of natural science and mathematical physics.
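Returning to the remark above that $SU(2)$ is the universal covering group among $O(3)$, $SO(3)$, and $SU(2)$, the relation between $SU(2)$ and $SO(3)$ can be made tangible with a small computation. The sketch below uses the standard formula $R_{ij} = \frac{1}{2}\mathrm{Tr}(\sigma_i U \sigma_j U^{\dagger})$ with the Pauli matrices $\sigma_i$; this formula, the parameterization in `su2`, and the numerical values are brought in here as assumptions for illustration and are not taken from the text above. It shows that $U$ and $-U$, two different points of $SU(2)$, are mapped to the same rotation.

```python
import cmath
import math

# Pauli matrices sigma_x, sigma_y, sigma_z.
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Complex conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def su2(theta, phi, psi):
    """A generic SU(2) matrix [[a, -conj(b)], [b, conj(a)]], |a|^2 + |b|^2 = 1."""
    a = cmath.exp(1j * phi) * math.cos(theta)
    b = cmath.exp(1j * psi) * math.sin(theta)
    return [[a, -b.conjugate()], [b, a.conjugate()]]

def to_so3(u):
    """R_ij = (1/2) Tr(sigma_i U sigma_j U^dagger)."""
    ud = dagger(u)
    rot = []
    for si in SIGMA:
        row = []
        for sj in SIGMA:
            m = matmul2(matmul2(si, u), matmul2(sj, ud))
            row.append(((m[0][0] + m[1][1]) / 2).real)
        rot.append(row)
    return rot

U = su2(0.5, 0.3, 1.2)
minus_U = [[-z for z in row] for row in U]

R_plus = to_so3(U)
R_minus = to_so3(minus_U)
gap = max(abs(R_plus[i][j] - R_minus[i][j]) for i in range(3) for j in range(3))

print(R_plus)
print(gap)   # U and -U give the same rotation matrix
```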

References

1. Inui T, Tanabe Y, Onodera Y (1990) Group theory and its applications in physics. Springer, Berlin
2. Inui T, Tanabe Y, Onodera Y (1980) Group theory and its applications in physics, expanded edn. Shokabo, Tokyo. (in Japanese)
3. Takagi T (2010) Introduction to analysis, Standard edn. Iwanami, Tokyo. (in Japanese)
4. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
5. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
6. Rose ME (1995) Elementary theory of angular momentum. Dover, New York
7. Hassani S (2006) Mathematical physics. Springer, New York
8. Edmonds AR (1957) Angular momentum in quantum mechanics. Princeton University Press, Princeton
9. Yamanouchi T, Sugiura M (1960) Introduction to continuous groups. Baifukan, Tokyo. (in Japanese)
10. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
11. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
12. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
13. Steeb W-H (2007) Continuous symmetries, Lie algebras, differential equations and computer algebra, 2nd edn. World Scientific, Singapore
14. McCarty G (1988) Topology. Dover, New York
15. Murakami S (2004) Foundation of continuous groups, revived edn. Asakura Publishing, Tokyo. (in Japanese)

Index

A Abelian group, 622, 626, 689, 710, 746, 762 Abscissa axis, 181, 197 Absolute convergence, 583 Absolute temperature, 340 Absolute value, 127, 138, 141, 197, 205, 216, 337, 354, 567, 777, 782, 808 Abstract algebra, 846, 852, 884 Abstract concept, 433, 635, 688, 885, 892 AC5, 361–363 AC’7, 359–363 Accumulation points, 191–193, 223, 225, 226 Adherent point, 186, 192 Adjoint boundary functionals, 393 Adjoint Green’s functions, 396, 402 Adjoint matrix, 22, 524, 584 Adjoint operator, 109, 387, 389, 523, 535–539, 541, 554 Adjoint representation, 874–884, 895 Algebraic branch point, 256 Algebraic equation, 15 Algebraic method, 1 Allowed transition, 138, 147, 764, 776 Allyl radical, 688, 689, 691, 747, 770–777 Aluminum-doped zinc oxide (AZO), 363, 369–371 Ampère, A.-M., 273 Ampère’s circuital law, 273 Amplitude, 278, 284, 285, 298, 300, 303, 305, 310, 324, 332, 333, 335, 351, 375, 377, 422, 423 Analytical equation, 1 Analytical method, 77, 83, 91 Analytical solutions, 44, 107, 151

© Springer Nature Singapore Pte Ltd. 2020 S. Hotta, Mathematical Physical Chemistry, https://doi.org/10.1007/978-981-15-2225-3

Analytic continuation, 225–227, 234, 264 Analytic functions, 181–265, 319, 581, 582, 866, 885 Analytic prolongation, 227 Angular coordinate, 197 Angular frequency, 4, 6, 32, 127, 129, 133, 139, 278, 341, 344, 346, 351, 356 Angular momentum operator, 29, 77–91, 108, 142, 145 Anisotropic crystal, 366 dielectric constant, 365, 366 media, 569 medium, 365 Annihilation operators, 31, 41, 111, 112 Anti-Hermitian, 28, 29, 109, 116, 117, 387, 590, 592, 600, 610, 802, 806, 810, 867, 870–873, 875, 876, 878 Anti-Hermitian operators, 28, 109, 610, 806, 810, 870, 873 Antilinear, 526 Anti-phase, 336, 337 Antipodal point, 895 Antisymmetric, 412, 723–726, 858, 860, 863, 864 Antisymmetric representation, 723–726, 858, 860, 863, 864 Approximation methods, 151–179 Arc tangent, 63, 371 Arguments, 30, 55, 66, 152, 187, 197, 227, 231, 238, 243, 247, 252, 254, 256, 258, 260, 261, 264, 303, 317, 319, 353, 385, 394, 398–400, 403, 405, 408, 412, 416, 418, 454, 463, 489, 498, 499, 517, 532, 557,


900 569, 573, 608, 615, 623, 647, 658, 681, 693, 704, 707, 731, 734, 740, 747, 751, 764, 797, 806, 817, 820, 826, 828, 847 Aromatic hydrocarbons, 729, 747 Aromatic molecules, 756, 770, 777 Associated Laguerre polynomials, 57, 108, 116–122 Associated Legendre differential equation, 83, 91–103, 427 Associated Legendre functions, 57, 83, 94, 103–107 Associative law, 23, 441, 541, 624, 631, 663, 666 Atomic orbitals, 729, 734, 742, 744, 745, 747, 749, 752, 757, 764, 770, 771, 777, 785, 788, 789 Attribute, 636, 637 Azimuthal angle, 29, 70, 674, 791, 831, 839, 895 Azimuthal quantum numbers, 120, 142, 147 Azimuth of direction, 663

B Backward wave, 278, 331, 333, 335 Ball, 220 Basis functions, 88, 683–691, 720, 734, 744, 748–750, 777, 799, 815–817, 819, 839, 842, 843, 846, 849, 859, 860, 863, 864 set, 457, 515, 516, 541, 547, 584, 685, 692, 693, 720, 739, 847, 849, 854, 878, 881, 892 vectors, 8, 41, 59, 60, 132, 140, 155, 291, 395, 433, 435, 439–446, 448, 450, 452–460, 468–470, 474, 477, 487, 489–491, 495, 505–507, 513, 515, 518–520, 528, 529, 533, 537–540, 547, 568, 576, 636, 640, 648, 650, 654, 657, 677, 679, 684–686, 688, 689, 691, 704, 710, 711, 716, 717, 719, 730, 738, 740, 744, 745, 747, 751, 757, 761, 764, 770, 783, 784, 803, 805, 820, 822, 843, 847, 848, 852, 854, 855, 859, 860, 863, 872, 878 Benzene, 648, 729, 747, 762, 764–771 Bijective, 447, 450, 451, 623, 628, 629, 895 Bijective mapping, 895 Binomial expansion, 233, 234, 852 Binomial theorem generalized, 95, 233 Biphenyl, 648

Index Bithiophene, 648 Blackbody, 339–341 Blackbody radiation, 339–341 Block matrix, 686, 860 Bohr radius of hydrogen, 133 of hydrogen-like atoms, 108 Boltzmann distribution law, 339, 348, 355 Born probability rule, 565 Boron trifluoride (BF3), 650 Bose–Einstein distribution functions, 341 Boundary, 14, 71, 109, 117, 173, 174, 189–190, 325–327, 341, 344, 363, 365, 370, 379, 380, 384, 385, 388, 389, 393, 401, 405, 408, 418, 423, 597, 891 functionals, 380, 384, 388, 389, 393, 408 term, 386 Boundary conditions (BCs), 14–17, 19, 20, 25–27, 30, 32, 36, 56, 71, 72, 109, 116, 117, 173, 174, 285, 325–327, 341–343, 345, 363, 365, 370, 379, 380, 384–386, 388, 389, 393–395, 397–399, 401–403, 405, 407–411, 415, 418, 419, 421, 423–427, 581, 597–600, 602, 605, 607, 608, 610, 615 adjoint, 410, 415 homogeneous, 380, 388, 389, 393, 394, 397, 399, 401–403, 405, 407–409, 411, 423, 426, 427, 597, 600 homogeneous adjoint, 393–395 inhomogeneous, 380, 394, 395, 402, 403, 405, 407, 415, 418, 419, 424, 597–599, 605, 607, 608 Bounded, 25, 29, 74, 79, 212, 214–216, 221, 222 BP1T, 361–363 Bragg’s condition, 374 Branch, 250, 252, 253, 255, 256, 427, 547, 617 cut, 253, 255–259, 262, 263 point, 252–257, 259, 262–264 Bra vector, 22, 523, 526, 529 Brewster angles, 312–314

C Canonical commutation relation, 3, 27–30, 34, 43, 44, 54, 59, 65 Canonical coordinate, 27 Canonical forms of matrices, 459–522, 559 Canonical momentum, 27 Cardinalities, 183 Cardinal numbers, 183

Index Carrier space, 687 Cartesian coordinates, 45, 61, 109, 171, 197, 283, 321, 349, 650, 736, 791, 793, 804, 806, 823, 889 Cartesian space, 59, 436 Cauchy–Hadamard theorem, 220 Cauchy–Liouville theorem, 215–216 Cauchy Riemann conditions, 203–206 Cauchy–Schwarz inequality, 525, 585 Cauchy’s integral formula, 206–216, 219, 225, 229 Cauchy’s integral theorem, 209–212, 227, 243 Cavity, 33, 339, 341, 343–345, 375, 376, 378 Cavity radiation, 339 Center of inversion, 642 Center-of-mass coordinates, 57, 58 Central force fields, 57, 107 Characteristic equation, 421, 461, 462, 501, 533, 567, 573, 675, 689, 831 Characteristic impedance of dielectric media, 295, 336 Characteristic polynomial, 461, 471, 477, 481, 489, 513, 517, 520, 782 Characters, 373, 453, 524, 628, 631, 637, 694, 697–700, 702, 703, 722, 723, 726, 741, 745, 746, 748, 750, 751, 755–757, 759, 761, 765, 770, 775, 776, 780, 784, 796, 797, 834, 837–844 Chemical species, 374, 650 Circularly polarized light left-, 136, 138, 147–149, 293 right-, 136, 138 Clad layers, 321, 327, 331, 371 Classes, 191, 625–627, 637, 653, 654, 656, 657, 659, 662, 663, 699, 700, 704–711, 746, 834, 838, 839, 841, 888, 889 Classical Hamiltonian, 33 Classical orthogonal polynomials, 47, 94, 379, 427 Clebsch Gordan Coefficients, 843 Clopen sets, 190, 193, 195, 891 Closed interval, 196 Closed-open set, 190 Closed paths, 194, 209, 892 Closed sets, 185, 186, 188–190, 193, 195, 196, 212, 891 Closed shell, 741, 755 Closed surface, 273, 274 Closures, 186–189, 654 C-numbers, 10, 22 Cofactor, 471, 472 Cofactor matrix, 471, 472 Coherent state, 126–128, 133, 139, 149

901 Column vectors, 8, 21, 41, 88, 90, 290, 435, 440, 441, 452, 455, 457, 462–465, 502, 505, 507, 508, 510, 527, 535, 541, 542, 556, 557, 568, 573, 574, 577, 596, 613, 658, 677, 684, 759, 782, 810, 817, 820, 831, 863 Common factor, 477, 478, 482, 483 Commutable, 27, 482, 486–488, 524, 556, 579, 642, 643, 869 Commutation relation, 3, 27–30, 34, 43, 44, 54, 59, 65, 73, 77, 142–144, 872, 895 Commutative, 27, 171, 472, 518, 585, 591, 603, 604, 615, 622, 639, 649, 689, 824, 825, 865, 870, 871, 889, 890 Commutative groups, 622 Commutative law, 622 Commutator, 27–30, 144, 868, 872 Compatibility relation, 750 Complementary set, 183, 212, 888 Complements, 183, 185, 189, 190, 195, 212, 547–549, 559, 688, 691, 888 Complete orthonormal system (CONS), 151, 155, 163, 166, 167, 173, 842 Complete system, 155 Completely reducible, 825, 830 Complex amplitude, 303 Complex analysis, 181, 182, 197 Complex conjugate, 22, 24, 25, 28, 34, 117, 137, 199, 347, 390, 397, 401, 523, 524, 526, 536, 537, 693, 714, 762, 816, 817, 820, 850, 876 Complex conjugate representation, 762, 816, 817 Complex conjugate transposed matrix, 22, 537 Complex domain, 19, 201, 216, 227, 263–265, 315 Complex function, 14, 181, 199, 200, 204, 206, 207, 216, 254, 389, 764 Complex phase shift, 329 Complex plane, 14, 181, 182, 197–199, 201, 206, 207, 211, 212, 216, 227, 241, 242, 249, 251, 253, 263, 317, 318, 521, 885 Complex roots, 235, 420, 421 Complex variables, 15, 199–206, 215, 230, 532, 855 Composite function, 63 Compton, A., 4, 6 Compton effect, 4 Compton wavelength, 5 Condon–Shortley phase, 101, 102, 106, 123, 138, 140, 141 Conjugacy classes, 625, 654, 657, 659, 662, 699, 707, 710, 711, 841

902 Conjugate element, 625 Conjugate representation, 762 Connected components, 886–888, 891, 896 set, 886–888, 892 Connectedness, 194, 199, 887, 892, 896 Conservation of charge, 274 Conservation of energy, 5 Constant loop, 892 Constant matrix, 603, 604, 610, 617 Constant term, 169 Constructability, 401 Constructive interference, 357 Continuous function, 214, 885–887, 892 Continuous groups, 190, 663, 804, 816, 833, 838, 848, 877, 883, 894, 896 Continuous mapping, 886, 895 Contour integral, 207, 208, 229, 230, 232, 236, 239, 240, 242, 244 Contour integration, 211, 213, 214, 217, 220, 230, 232, 236, 238–240, 243, 247, 257, 261, 262, 265 Contour map, 729 Contraposition, 204, 223, 382, 449, 476, 528, 667, 825 Contravariant, 850, 851 Convergence circle, 220, 226 Convergence radius, 220 Coordinate point, 289, 292 Coordinate representation, 3, 31, 44–51, 112, 122, 129, 132, 136, 146, 157, 160, 165, 168, 170, 171, 175, 178, 256, 395, 396, 791, 823, 837 Coordinate transformation, 291, 367, 441, 573, 635, 641, 677, 738, 816, 837 Core layer, 321, 327, 329, 331 Corpuscular beam, 6, 7 Correction terms, 153–156 Coset decomposition, 624, 626, 649, 890 left, 624–626 right, 624–626 Coulomb integral, 740, 752, 767, 788 Coulomb potential, 57, 67, 107 Coulomb’s law, 270 Countable, 14, 565, 621 Coupling, 849 Covariant, 850, 851 Covering group, 896 Cramer’s rule, 306, 406, 809 Creation operator, 41 Critical angles, 312–314, 317, 318, 320, 329 Critical damping, 419

Cross-section area, 355, 356, 376 Crystal anisotropic, 366, 569 organic, 359, 360, 362, 363, 365, 372–374, 569 Cubic roots, 249 Current continuity equation, 274 Cyclic groups, 622 Cyclic permutation, 641, 677 Cyclopropenyl radical, 729, 747, 757–764, 767, 768, 770, 771

D Damped oscillator, 419–424 Damping critical, 419 over, 419 weak, 420 Damping constant, 419 Darboux inequality, 212–213, 231 de Broglie, L.-V., 6 de Broglie wave, 6 Definite integrals, 26, 98, 112, 132, 137, 171, 181, 230–248, 257, 259, 261, 717, 723, 741, 742, 792 de Moivre’s theorem, 198, 250 de Morgan’s law, 184, 186 Degeneracy, 152 Degenerate doubly, 768 triply, 789, 795, 796 Derived set, 191 Determinants, 42, 59, 287, 383, 409, 433, 448–452, 454, 471, 517, 531, 567, 571, 589, 647, 654, 662–664, 744, 787, 808, 863, 887, 888, 891 Device physics, 330, 359, 374 Device substrate, 363, 370 Diagonal elements, 87, 88, 90, 287, 450, 461, 462, 465, 476, 496, 498, 499, 512, 514, 518, 559, 560, 570, 588, 589, 639, 640, 654, 690, 697, 744, 745, 761, 782, 784, 787, 870, 871, 873 Diagonalizable, 475, 553 Diagonalizable matrices, 512–522, 561 Diagonalization, 466, 517, 573–580, 601, 667, 690 Diagonalized, 43, 54, 367, 475, 477, 486, 487, 513, 515, 530, 547, 556, 557, 559, 561, 567, 578–580, 600, 606, 690, 691, 782, 804

Diagonal matrices, 89, 459, 485, 517, 535, 556, 558, 559, 568, 573, 600, 601, 604, 638, 681, 689, 738, 782, 805, 826, 828, 879 Diagonal positions, 465, 486 Dielectric constant, 270, 271, 336, 365 Dielectric constant ellipsoid, 366 Dielectric medium, 269–272, 275, 276, 280, 285, 286, 295–337, 339, 357, 365, 371, 375, 376, 419 Differentiable, 65, 201, 204, 206, 215, 275, 406, 581, 591, 866, 868 Differential equations, 3, 14, 15, 17, 19, 26, 44, 50, 67, 83, 91–103, 107, 108, 169, 174, 269, 341, 368, 379–389, 392, 401, 403, 405, 408, 418, 421, 426, 427, 581, 592–600, 613, 617 Differential operators, 9, 14, 19, 26, 27, 50, 68, 72, 77, 167, 172–174, 270, 379, 386, 388–394, 399, 402, 403, 409, 414, 425, 426, 523, 547 Diffraction grating, 359, 361, 363, 371 Diffraction order, 364, 373 Dimensionless, 72, 77, 79, 108, 270–272, 278, 355, 359, 790, 801 Dimensionless parameter, 50, 152 Dimension theorem, 444, 445, 447, 479, 492, 493, 514 Dipole approximation, 127, 142, 347 isolated, 353 oscillation, 339 radiation, 349–354 Dirac equation, 12 Dirac, P., 11, 12, 22, 523–526 Direct factors, 632, 649 Direct sum, 191, 193, 437, 471, 473, 474, 477–481, 484, 487–489, 496, 513, 515, 517, 518, 547, 550, 633, 686, 687, 691, 699, 702, 719, 742, 749, 769, 770, 781, 825, 843, 886, 888, 889 Direction cosines, 279, 305, 658, 675, 676, 832, 834, 835 Direct-product groups, 621, 631–633, 649, 720, 723, 750, 843, 844, 888 Direct-product representations, 720–723, 725, 740, 741, 745, 769, 770, 796, 831, 843–846, 849, 859, 861, 863 Dirichlet conditions, 14, 19, 26, 71, 325, 342, 344, 408 Disconnected, 194 Discrete set, 191 Discrete topology, 195 Disjoint sets, 194

Dispersion of refractive index, 358, 359, 361, 362 Displacement current, 273, 275 Distance function, 181, 196, 199, 524 Distributive law, 184, 442 Divisor, 473, 514, 648 Domain of analyticity, 206, 209, 210, 212, 216, 227 complex, 19, 227, 263–265, 315 of variability, 181, 895

E Effective index, 362, 368 Eigenenergy, 43, 111, 121, 151–156, 162–164 Eigenequation, 460 Eigenfunctions, 17, 18, 24, 26, 36, 37, 40, 41, 67–69, 72, 86, 110, 112, 137, 152, 173, 429, 690, 715, 720, 739, 768, 773, 774, 793, 826, 861, 863 Eigenspace, 460, 468–473, 478–485, 487, 488, 496, 513, 515, 517, 559, 565, 577, 719 Eigenstate energy, 156, 157, 163, 166, 176, 774 Eigenvalue equation, 13, 24, 29, 33, 50, 51, 58, 126, 151, 152, 160, 172, 371, 408, 426, 460, 466, 502, 521, 534, 577, 729, 734, 742, 744, 789 problems, 14, 16, 18, 21, 24, 120, 125, 379, 425–430, 459, 522, 579, 691, 720, 785 Eigenvectors, 26, 41, 74, 152, 153, 155, 288, 290, 459–468, 471, 473–478, 493, 495, 497, 500–503, 505, 507, 510, 513, 515, 516, 518, 520, 521, 556, 559, 561, 566, 575–580, 654, 665, 667, 676, 691, 743, 746, 782, 831, 832 Einstein, A., 3, 4, 6, 7, 339, 341, 345–347, 349, 353, 354 Einstein coefficients, 349 Electric dipole approximation, 142 moment, 127, 164, 166, 741, 755 operator, 796 transition, 125–127, 129, 132, 141, 146, 740, 754, 755 Electric displacement, 270 Electric field, 127, 147, 149, 151, 156, 160, 161, 163, 164, 172, 174, 177, 269, 270, 272, 284–286, 289, 292, 293, 297, 303–306, 310, 315, 324, 325, 328, 329, 331, 335, 336, 344, 350–352, 368–370, 375, 376, 418, 419, 611, 755, 796

Electric flux density, 270, 367 Electric permittivity, 365 Electric permittivity tensor, 365 Electric waves, 284–286 Electromagnetic fields, 33, 127, 148, 295–297, 303, 305, 321, 351, 363, 365, 369, 371, 377, 569, 611 Electromagnetic induction, 272 Electromagnetic plane wave, 281 Electromagnetic waves, 127, 151, 269, 280–293, 295–337, 339, 341–345, 352, 357, 363, 367, 368, 375, 379, 419, 595–600 Electromagnetism, 166, 269, 270, 279, 378, 379, 427, 569 Electronic configuration, 741, 754, 755, 769, 774–776, 796, 797 Element, 45, 83, 127, 182, 287, 347, 433, 447, 523, 553, 581, 621, 635, 679, 735, 812 Elementary charge, 58, 127, 148, 347 Ellipse, 288, 289, 292, 366, 574, 575 Ellipsoidal coordinates, 790, 791, 793 Elliptic polarizations, 269 Elliptically polarized light, 288, 289, 292 Emission color, 374 Emission gain, 374 Empty set, 183, 185, 626 Endomorphism, 433, 441, 443, 444, 446–449, 452, 454, 622, 630, 876, 878 Energy conservation, 5, 311 Energy eigenvalues, 20, 31, 36, 37, 51, 133, 151, 152, 154, 158, 172, 532, 691, 719, 739, 751, 753, 754, 762, 767, 768, 774, 786, 795 Energy level, 41, 153–156, 172, 345, 764, 796 Energy transport, 308–311, 320, 335 Entire function, 206, 215, 216, 242 Equation of motion, 58, 148, 419 Equation of wave motion, 269, 276–280, 322 Equilateral triangle, 650, 757 Equivalence law, 485 Equivalence relation, 885 Equivalent transformation, 569 Essential singularity, 225 Ethylene, 729, 747–757, 759, 761, 764, 770, 771 Euclidean space, 196, 635, 639, 663 Euler angles, 669–678, 808, 811, 815, 820, 826, 831, 881 Euler equation, 169 Euler’s formula, 198 Euler’s identity, 198

Evanescent mode, 320 Evanescent wave, 321, 327–331 Even function, 18, 49, 132, 157, 837, 840, 842 Even permutations, 449, 873 Excited state, 37, 128, 132, 152, 174, 175, 177, 339, 346, 347, 354–356, 754, 769, 776, 796, 797 Existence probability, 21, 32, 139 Existence probability density, 139 Expansion coefficient, 173, 396, 836 Expectation value, 24, 25, 33–36, 51, 52, 67, 172, 174, 565, 566 Exponential functions, 15, 198, 241, 281, 298, 299, 423, 581–617, 799, 802, 808, 826, 891 Exponential polynomials, 299 External force fields, 58 Extremals, 129, 334 Extremum, 176

F Factor group, 627, 631, 649 Faraday, M., 272 Faraday’s law of electromagnetic induction, 272 Finite group, 621–623, 626, 628, 629, 631, 662, 663, 679–681, 683, 684, 686, 692, 703, 799, 833, 840, 841, 845, 846, 888, 890 Finite set, 183 First-order approximation, 177, 178 First-order linear differential equation (FOLDE), 44, 379, 384–389, 593, 598, 607, 608 First-order partial differential equations, 269 Fixed coordinate system, 640, 657, 678, 831, 895 Flux, 148, 271, 272, 311, 352, 356, 612 Focused ion beam (FIB), 363 Forbidden, 125, 755, 770, 797 Forward wave, 278, 331, 333, 335 Four group, 639, 648 Fourier coefficients, 167 Fourier cosine coefficients, 842 Fourier cosine series, 842 Fréchet axiom, 195 Free space, 295, 326, 331, 352 Functionals, 13, 15, 100, 101, 118, 122, 134, 139, 160, 167, 169, 201, 203, 219, 227, 234, 278, 380, 382, 384, 388, 389, 393, 408, 523, 594, 717, 734, 741, 746, 754, 854

Function space, 25, 623, 728 Fundamental set of solutions, 15, 383, 404, 405, 409, 411, 418–421, 613, 617 Fundamental solution, 608

G Gain function, 356 Gamma functions, 95, 97, 98, 856 Gaussian plane, 181 Gauss’ law, 270, 271 Gegenbauer polynomials, 94, 99 Generalized angular momentum, 72–77, 810, 811, 844, 873, 895 Generalized binomial theorem, 95, 233 Generalized eigenspaces, 471, 478–485, 488, 496 Generalized eigenvalues, 460, 500, 589 Generalized eigenvectors, 471, 473–478, 493, 495, 497, 500, 504, 505, 507, 511, 513, 521 Generalized Green’s identity, 392, 409 General linear group, 623, 808 General point, 635, 730, 815 General solution, 15, 32, 278, 384, 409, 420 Generating function, 95, 264 Geometric object, 635–638, 640, 648, 654 Geometric series, 217, 221 Glass, 269, 280, 286 Global properties, 890–896 Gram matrix, 529, 530, 532–535, 541, 563, 568, 681 Gram–Schmidt orthonormalization, 543, 556 Grand orthogonality theorem (GOT), 692–697, 713 Grating period, 364, 374 Grating wavevector, 363, 364, 372, 373 Grazing emissions, 364, 372 Grazing incidence, 317, 318 Greatest common factor, 478 Green’s functions, 379–430, 581, 592, 594, 617 Green’s identity, 392, 397 Ground state, 36, 128, 132, 152, 153, 156, 157, 163, 164, 172, 174, 176, 177, 339, 340, 345–347, 741, 754, 769, 776, 796, 797 Group algebra, 700–708 representation, 679–726 theory, 581, 621–633, 635, 636, 643, 679–726, 729–797 Group refractive index, 358 Group velocity, 7, 12, 326

H Half-integer, 75, 77 Half-odd-integers, 75, 77, 79, 82, 812, 815, 849, 855, 884 Hamilton–Cayley theorem, 471–472, 482, 489, 514 Hamiltonian, 11, 21, 29, 31, 33, 36, 43, 57–67, 107, 152, 157, 164, 174, 177, 532, 735, 738, 740, 741, 745, 761, 788, 790 Harmonic oscillator, 30–52, 57, 107–109, 125, 126, 130–132, 151, 152, 160, 174, 175, 339, 341, 375, 377, 379, 418, 419, 532, 748, 755 Heisenberg, W., 11 Hermite differential equation, 51, 426, 427 Hermite polynomials, 47, 49, 57, 163, 427 Hermitian, 23, 52, 67, 130, 152, 347, 379, 530, 547, 600, 681, 738, 802 Hermitian differential operators, 172, 174, 425 Hermitian matrix, 24, 131, 530, 532, 537, 568, 572, 577, 600 Hermitian operator, 23–25, 52, 53, 68, 77, 156, 173, 177, 347, 379, 399, 402, 427, 429, 547–580, 740, 806 Hermitian quadratic form, 532, 535, 547, 568–575 Hermiticity, 3, 19, 25, 26, 117, 391, 394, 402, 426, 427, 577, 745 Hesse’s normal forms, 280, 658, 661 Highest-occupied molecular orbital (HOMO), 754, 769 Hilbert space, 25, 565, 748 Homeomorphic, 895 Homogeneous BCs, 388, 389, 393, 394, 397, 399, 401–403, 405, 407–409, 411, 423, 425, 426 Homogeneous equation, 169, 380, 397, 399, 401, 402, 404, 405, 408, 419, 425, 426, 594, 597, 600, 604, 607, 613, 614 Homomorphic, 628–631, 680, 685, 708 Homomorphic mapping, 629, 630 Homomorphism, 621, 627–631, 679–681, 713, 718, 881, 883 Homomorphism theorem, 631, 883 Homotopic, 892 Hückel approximation, 754 Hydrogen, 57, 122, 125, 132–142, 163–172, 755, 777, 780, 785, 788–790 Hydrogen-like atom, 57–123, 125, 126, 132, 140, 142, 151, 153, 532, 579 Hypersphere, 892 Hypersurface, 574, 892

I Idempotent matrix, 479, 482, 486, 517, 520, 521, 564, 604 Identity element, 622, 624, 625, 627–631, 638, 649, 697, 751, 759, 882, 886–888, 891, 892, 896 matrix, 43, 480, 482, 484, 486, 494, 562, 567, 582, 640, 664, 667, 668, 680, 694, 702, 703, 780, 802, 865, 884, 890 operator, 35, 55, 155, 167, 395, 397, 480 transformation, 640, 663, 667, 882 Image of transformation, 443 Imaginary axis, 181, 197 Imaginary unit, 181 Improper rotation, 643, 647, 663, 780, 889, 890 Incidence angle, 301, 313 Incidence plane, 301, 304, 306, 311, 315 Incident light, 300, 302, 304 Indeterminate constant, 18 Indiscrete topology, 195 Inequivalent, 685, 692, 694–696, 699, 707, 710, 711, 830, 839, 841, 842 Infinite dielectric media, 285, 286 Infinite-dimensional, 41, 43, 435, 748 Infinite group, 622, 628, 629, 635, 663, 799, 843, 846 Infinite power series, 582 Infinite series, 159, 218, 225 Infinite set, 182 Inflection points, 235 Inhomogeneous boundary conditions (BCs), 402, 403, 405, 407, 415, 419, 424, 597–599 Inhomogeneous equation, 380, 383, 402, 407, 594, 596–599, 607 Initial value problem (IVP), 379, 408–424 Injective, 447, 448, 452, 627, 629, 895 Injective mapping, 627 Inner product of functions, 719, 740 space, 196, 523–544, 547, 550, 554–556, 563–566, 584 of vectors, 397, 523, 576 vector space, 523, 551, 868 In-quadrature, 375, 376 Integrand, 214, 232, 244, 246, 257, 259, 262, 265 Integration clockwise, 210, 257, 262 complex, 206 constant, 149, 376, 385, 593, 594

contour, 211, 213, 214, 217, 220, 230, 232, 236, 238, 239, 243, 247, 257, 261, 262, 265 counter-clockwise, 210, 211, 232, 257 definite, 132 by parts, 25, 29, 83, 130, 386 path, 209, 210, 222 range, 25, 239, 415, 417, 742 real, 135 surface, 274 termwise, 218, 222 volume, 274 Interface, 295–298, 300–305, 311, 314, 320, 321, 324–327, 331, 335–337, 339, 344, 363, 365, 370, 375, 376 Interference, 331, 357 Interior point, 186 Intermediate state, 156 Intermediate value theorem, 887 Intersection, 183, 184, 190, 623, 643, 645, 646, 649, 661, 895 Invariant, 461, 468–474, 481, 487, 501, 513, 559, 566, 569, 589, 626, 632, 687, 689, 691, 706, 719, 734, 738, 803, 806, 814, 837, 839, 849, 851, 852, 860 Invariant subgroups, 626, 630, 632, 649, 662, 706, 883, 887, 888 Invariant subspace, 468–474, 477, 481, 490, 501, 503, 504, 513, 518, 521, 547, 559, 577, 579, 687, 688, 691, 781 Inverse element, 447, 622, 627, 629, 631, 638, 656, 680, 701, 706 Inverse mapping, 447 Inverse matrix, 42, 200, 433, 448–452, 454, 511, 512, 571, 596, 601, 605, 680 Inverse transformations, 443, 447, 448, 622, 816, 818 Inversion symmetry, 642, 643, 647, 654 Inverted distribution, 355 Invertible, 447, 622 Invertible mapping, 447 Irradiance, 311, 355–357 Irrational functions, 248 Irreducible, 686, 688, 691–694, 696–700, 703, 707–711, 715–717, 719, 720, 722, 738, 740–742, 745–751, 754–757, 759, 761, 765, 769–771, 774, 775, 778, 781, 783–786, 788, 789, 795–797, 823–831, 837–847, 859 Irreducible character, 697, 842–844 Irreducible representations, 687, 688, 692, 694, 696, 698, 699, 702, 703, 707–711, 715–717, 719, 720, 738, 740–742,

745–748, 750, 751, 754–757, 759, 761, 765, 769–771, 774, 775, 778, 781, 783–786, 788, 789, 795–797, 823–831, 837, 840–847, 859 Isolated point, 191–193 Isomorphic, 629, 631, 648, 654, 661–663, 685, 734, 883, 895 Isomorphism, 621, 627–631, 679, 890

J Jacobi’s identity, 872 jj coupling, 849 Jordan blocks, 490–502, 504–506, 513 Jordan canonical forms, 459, 488–512

K Kernel, 443, 501, 629–631, 881, 883 Ket vector, 22, 523, 526

L Laboratory coordinate system, 366 Ladder operators, 74, 90, 91, 112 Laguerre polynomials, 57, 108, 116–122 Laser material, 358, 361, 363 medium, 355–358, 365 organic, 359–374 oscillation, 355, 358, 360, 374 Lasing property, 359 Laurent’s expansion, 221–223, 229, 233, 234 Laurent’s series, 216–224, 228, 229, 234 Law of conservation of charge, 274 Law of electromagnetic induction, 272 Law of equipartition of energy, 341 Left-circularly polarized light, 136, 138, 147–149, 293 Left-circular polarization, 293 Legendre differential equation, 83, 91–103, 427 Legendre polynomials, 57, 85, 91, 93, 98, 100, 104, 106, 107, 264 Leibniz rule, 92, 94, 101, 118 Leibniz rule about differentiation, 92, 94 Levi-Civita symbol, 873 Lie algebra, 592, 600, 610, 799, 802, 864, 891, 892, 895, 896 Lie group, 891 linear, 866, 867, 877, 887, 891, 892 Light amplification, 354, 355 Light amplification by stimulated emission of radiation (laser), 354

Light-emitting device, 363, 372, 373 Light-emitting material organic, 359, 361, 374 Light propagation, 148, 269, 319, 364, 366 Light quantum, 3, 4, 345–347 Limes superior, 220 Limit, 202, 205, 207, 212, 213, 220, 258, 312, 415, 417, 587, 591, 703, 802, 836, 858, 869 Linear algebra, 491 Linear combination of basis vectors, 155, 435, 440, 519, 691, 843 of functions, 126, 849, 854, 861 of a fundamental set of solutions, 404 LCAO, 734 Linear combination of atomic orbitals (LCAO), 729, 734, 742 Linearly dependent, 17, 93, 100, 107, 299, 344, 381–383, 434, 443, 444, 454, 466, 468, 469, 471, 527, 528, 531, 532, 543, 693 Linearly independent, 15–17, 32, 71, 298, 341, 345, 368, 380–383, 404, 433, 434, 437, 445, 453, 456, 462, 463, 466–470, 473, 476–478, 489, 490, 513, 526, 528, 532, 533, 543, 544, 547, 568, 576, 579, 596, 613, 664, 665, 684, 685, 688, 693, 701, 710, 715, 716, 785, 786, 795, 812, 822 Linearly polarized light, 142, 144, 286, 293 Linear transformation group, 623 Linear vector space finite-dimensional, 872 Line integral, 296 Local properties, 890–896 Logarithmic branch point, 256 Logarithmic functions, 248 Longitudinal multimode, 358 Loop, 296, 892, 895 Lorentz force, 147, 611, 612, 615 Lowering operators, 74, 90, 108, 112, 122 Lower left off-diagonal elements, 450, 462 Lower triangle matrix, 462, 512, 553, 606 Lowest-unoccupied molecular orbital (LUMO), 754, 769 LS coupling, 849 Luminous flux, 311

M Magnetic field, 147, 269, 271–273, 275, 283–285, 297, 303–307, 310, 322, 324, 326, 344, 350, 352, 368–370, 375, 376, 611

Magnetic flux, 272 Magnetic flux density, 148, 271, 612 Magnetic permeability, 271, 365 Magnetic quantum number, 140, 142 Magnetic wave, 284, 286 Magnetostatics, 271 Magnitude relationship, 412, 693, 694, 797 Major axis, 291, 292 Mapping inverse, 447 invertible, 447 non-invertible, 447 reversible, 447 Mathematical induction, 48, 80, 113, 114, 462, 464, 466, 543, 556, 558, 570, 583 Matrix algebra, 43, 306, 367, 433, 443, 451, 474, 530, 547 decomposition, 485, 488, 512, 561 representation, 3, 31, 41–44, 86, 89, 90, 440, 442, 443, 450, 451, 454, 456, 468, 490, 494, 503–505, 519, 536–538, 540, 553, 576, 579, 640, 643–646, 648, 650, 652, 657, 659, 661, 663, 689–691, 694, 697, 730, 738, 746, 748, 758, 764, 778, 800, 808, 826, 860, 863, 881, 884 Matrix element of electric dipole, 127, 129 of Hamiltonian, 131, 532, 740, 741 Matrix equation, 504 Matrix function, 885 Matter wave, 6, 7 Maximum number, 434 Maxwell, J.C., 275 Maxwell’s equations, 269–293, 321, 365, 366 Mechanical system, 33, 375–378 Meromorphic, 225, 231 Meta, 767 Methane, 659, 729, 777–797 Method of variation of constants, 594, 596, 614 Metric, 181, 196, 199, 523 Metric space, 196, 199, 220, 524 Minimal polynomial, 472, 473, 475, 499, 508, 513–515, 517 Minor axis, 291 Mirror symmetry plane of, 642, 643, 646–648 Mode density, 341–345 Modulus, 197, 213, 235, 836 Molecular axis, 648, 756, 791, 795 Molecular long axis, 756, 777 Molecular orbital energy, 735, 746

Molecular orbitals (MOs), 691, 719, 720, 729, 734–797 Molecular science, 3, 711, 723, 729, 740, 797 Molecular short axis, 777 Molecular states, 729 Molecular systems, 720, 723 Momentum, 4–6, 27, 55, 59, 60, 72–77, 79, 82, 86, 91–107, 125, 132, 138, 147–150, 375, 377, 810, 843, 844, 873, 895 Momentum operators, 3, 29, 31, 33, 61, 77–91, 108, 142, 145 Monoclinic, 365 Monomials, 812, 855, 857 Moving coordinate systems, 669, 670, 672, 678, 831, 893 Multiple roots, 249, 461, 473, 474, 477, 478, 514, 515, 517, 521, 559 Multiplication table, 623, 648, 688, 701 Multiplicity, 468, 481, 486, 493, 498, 562, 565, 575, 576, 782 Multiply connected, 194, 222 Multivalued function, 248–265

N Nabla, 61 Nano capacitor, 157 Nano device, 160 Naphthalene, 648, 658 Necessary and sufficient condition, 53, 190, 193, 195, 206, 382, 383, 447, 448, 451, 453, 461, 528, 533, 556, 563, 622, 624, 629, 825, 887, 891 Negation, 187, 191 Negative helicity, 293 Neumann conditions, 27, 327, 344, 394 Newtonian equation, 31 Nilpotent, 475, 485–488, 498, 553 Nilpotent matrices, 90, 91, 473–478, 484, 487–494, 497, 499, 500, 509, 512, 553, 607 Nodes, 325, 335–337, 376, 747, 748, 789 Non-commutative, 3, 27, 579 Non-commutative group, 622, 639, 746 Non-degenerate, 151–154, 163, 176, 575 Non-equilibrium phenomenon, 355 Non-invertible mapping, 447 Non-magnetic substance, 280, 286, 308, 312 Non-negative, 27, 34, 49, 52, 68, 69, 74, 93, 146, 197, 214, 241, 242, 384, 385, 524, 532, 535, 547, 568, 615, 811, 819, 855 Non-negative operator, 72, 74, 532

Non-relativistic approximation, 7, 11 Non-singular matrix, 433, 454, 455, 462, 463, 465, 475, 486, 497, 499, 508, 513, 570, 571, 588, 681, 685, 865, 875 Non-trivial, 381, 434 Non-trivial solution, 382, 383, 399, 408, 426, 521 Non-vanishing, 15, 16, 90, 91, 128, 131, 137, 146, 223, 225, 228, 274, 296, 394, 591, 654, 741, 745, 770, 785 Non-zero coefficient, 300 determinant, 454 eigenvalue, 498 element, 87, 88 vector, 67 Norm, 24, 53, 524, 525, 529, 531, 539, 541, 543, 554, 561, 566, 573, 581, 584, 763, 764 Normalization constant, 37, 45, 51, 112, 140, 543 Normalized, 13, 18, 24, 25, 28, 34, 35, 37, 39, 47, 53, 54, 67, 71, 76, 86, 106, 110, 113, 115, 121, 122, 128, 132–134, 139, 140, 153, 154, 174, 175, 356, 543, 544, 565, 726, 742, 743, 751, 753, 754, 768, 771, 785, 795, 847, 854, 864 Normalized eigenfunction, 112, 773 Normalized eigenstate, 53 Normalized eigenvectors, 37, 153, 290 Normal operator, 547, 554–557, 561, 563, 668 n-th roots, 248–251 Nullity, 444 Null-space, 443, 501 Number operator, 40 Numerical vector, 435

O O(3), 280, 642, 647, 654–663, 888–889 Oblique coordinate, 639 Oblique incidence of wave, 295 Observable, 565, 566 Octants, 656 Odd functions, 18, 49, 132, 157 Odd permutations, 449, 873 Off-diagonal elements, 450, 512, 560, 639, 640, 744, 745, 761, 787 O(n), 875 One-dimensional, 33, 49, 128, 130, 151, 152, 157, 167, 174, 177, 196, 277, 357, 375,

473, 496, 501, 502, 513, 639, 710, 742, 746, 755, 803, 806, 864 case, 343 harmonic oscillator, 31, 57, 126, 130–132, 755 infinite potential well, 19 position coordinate, 33 representation, 746, 761, 762, 783, 800 system, 128–132, 163 One-parameter group, 865, 870, 891 Open ball, 220 Open neighborhood, 196 Operator, 31, 61, 130, 151, 270, 347, 379, 456, 478, 523, 547, 610, 647, 684, 729, 801 Operator method, 33–41, 132 Optical devices, 314, 319, 320, 337, 354, 355, 363, 371 Optical path, 327, 328 Optical process, 125, 136, 346 Optical transition, 125–151, 346, 347, 717, 720, 722, 741, 746, 754, 755, 757, 769, 770, 774–776, 796, 797 Optical waveguide, 329 Optics, 295, 339, 362 Orbital angular momentum, 58, 77–107, 843, 844, 895 Orbital angular momentum quantum numbers, 108, 120, 122, 142 Orbital energy, 789, 795 Order of group, 648, 696 Ordinate axis, 129 Organic laser, 359–374 Organic materials, 361 Orthogonal, 47–49, 94, 107, 175, 304, 344, 379, 427, 429, 524, 539, 544, 548, 551, 560, 561, 636, 655, 691, 710, 717, 719, 737, 745, 746, 751, 759, 774, 786, 871, 875, 884 complements, 547–549, 559, 688, 691 coordinate, 197, 365, 366, 817 group, 663–678, 871 matrix, 569, 574, 590, 592, 614, 636, 638, 654, 663, 664, 667, 672, 680, 736, 822, 831, 832, 837, 871, 875, 876, 881, 884 transformation, 636, 663, 669, 878, 881 Orthonormal base, 8, 532, 540–543 Orthonormal basis set, 547, 584, 720, 739, 740, 847, 878, 881 Orthonormal basis vectors, 59, 60, 636, 650, 716, 730, 745 Orthonormal eigenfunctions, 40 Orthonormal eigenvectors, 153, 559

Orthonormal system, 107, 151, 155, 842 Oscillating electromagnetic field, 127 Out-coupling of emission, 372 of light, 363, 364 Over damping, 419 Overlap integrals, 717, 740, 753, 754, 767, 774, 777, 785, 788, 789

P Paper-and-pencil, 796 Para, 767 Parallelepiped, 355–357 Parallelogram, 848 Parallel translation, 278 Parameter space, 831–837, 892, 895 Parity, 18, 49, 130 Partial differentiation, 8–10, 283, 321 Partial fraction decomposition method of, 232, 245 Partial sum, 159, 218 Particular solutions, 169, 383 Path, 194, 208–210, 212, 222, 327, 328, 357, 892 Pauli spin matrices, 810 Periodic conditions, 27, 394 Periodic function, 250, 884 Periodic wave, 278 Permeability, 148, 271, 306, 365 Permittivity, 58, 108, 270, 365, 569 Permittivity tensor, 365, 569 Permutation, 383, 449, 641, 662, 671, 677, 873 Perturbation method, 151–172, 179 Perturbed state, 164 Phase change upon reflection, 295, 330 Phase difference, 286, 327, 328, 375 Phase factor, 18, 40, 74, 77, 84, 120, 133, 141, 284, 544 Phase matching, 364, 365, 371 Phase matching conditions, 365 Phase refractive index, 358, 361, 369 Phase shift, 317, 318, 329, 330 Phase velocity, 7, 8, 278, 280, 326, 331 Photoabsorption, 132 Photon absorption, 132, 135, 138 emission, 132, 134, 138 energy, 355 Photonics, ix

Physicochemical properties, 746, 754 Planar molecule, 650, 747, 757, 764 Planck, M., 3, 127, 339, 341–345, 349 Planck’s law of radiation, 339, 341–345, 349 Plane electromagnetic waves, 283 Plane figure, 643 Plane of incidence, 301 Plane of mirror symmetry, 642, 647 Plastics, 286 Point group, 635, 638, 648, 659, 745, 746, 755, 778 Polar coordinate, 29, 45, 58, 62, 146, 171, 197, 241, 806, 837 Polar coordinate representation, 146, 256, 837 Polar form, 197, 237, 249 Polarizability, 163–171 Polarization vector, 127, 134, 135, 138, 284, 285, 303–306, 309, 315, 324, 331, 335, 350, 352, 375, 741, 755, 796 Polyenes, 777 Polymers, 269, 280 Polynomials, 38, 47–49, 57, 85, 91, 93–95, 98–100, 104–107, 116–122, 163, 231, 235, 264, 298, 299, 379, 427, 428, 460, 461, 471–473, 475, 477, 478, 481–483, 486, 488, 489, 499, 508, 513–515, 517, 518, 520, 603, 782, 851, 852 Population inversion, 355 Portmanteau word, 190 Position coordinate, 33 Position operator, 29, 347, 354 Position vector, 8, 59, 127, 129, 134, 149, 164, 275, 280, 298, 349, 351, 439, 540, 612, 636, 730, 735, 741, 748, 755, 800 Positive definite, 17, 26, 27, 34, 36, 287, 288, 532, 534, 568, 570, 874, 981 Positive definiteness, 17, 19, 26, 27, 31, 34, 36, 287, 288, 532, 534, 547, 568–570, 681, 874 Positive-definite operator, 26 Positive helicity, 293 Positive semi-definite, 27, 532 Power series expansion, 33, 108, 116, 117, 121, 122, 198, 216, 582, 591 Poynting vector, 309, 311, 335 Primitive function, 208 Principal axis, 366, 373 Principal branch, 251, 254 Principal diagonal, 494, 496 Principal minors, 287, 533, 570, 573 Principal part, 224 Principal submatrix, 570 Principal value

of branch, 251 of integral, 236, 237 of ln z, 258 Probability, 21, 32, 126, 127, 133, 138, 172, 346, 353, 565, 741, 777 density, 126, 127, 129, 139 distribution, 126 distribution density, 128, 129 Projection operator sensu lato, 712 sensu stricto, 714 Proof by contradiction, 190 Propagating electromagnetic waves, 295 Propagation constant, 326, 331, 361, 363, 364 Propagation vector, 364 Propagation velocity, 7, 278, 343 Proper rotation, 640, 643, 654 Proposition converse, 591, 825 P6T, 363, 365, 366, 368–370, 373, 374 Pure imaginary, 28, 29, 315, 331, 419, 870 Pure rotation group, 654, 659

Q Q-numbers, 10 Quadratic forms, 53, 287, 288, 532, 533, 535, 547, 568–575, 814, 851 Quantum chemical calculations, 711, 712, 729, 734 Quantum chemistry, 764 Quantum electromagnetism, 378 Quantum-mechanical, 3, 21, 30–52, 54, 55, 57, 107–109, 122, 151, 152, 160, 165, 532 Quantum-mechanical harmonic oscillator, 31–52, 55, 107–109, 152, 160, 532 Quantum mechanics, 3, 6, 11, 12, 18, 27, 29, 31, 51, 57, 58, 151–179, 339, 354, 427, 523, 547, 673, 748, 872 Quantum number azimuthal, 120, 121, 142, 147 magnetic, 140, 142 orbital angular momentum, 108, 120, 122, 142, 895 principal, 120, 121, 142 Quantum operator, 29 Quantum state, 21, 32, 52, 74, 90, 111, 121, 125–127, 132, 142, 143, 145, 151–159, 163, 164, 166, 172, 174, 354, 741, 847, 849 Quantum theory of light, 356 Quartic equation, 145 Quotient group, 627

R Radial coordinate, 107–115, 197, 806, 822, 837 Radial wave function, 107–115, 120–122, 167 Radiation field, 127, 136, 138 Raising and lowering operators, 74, 90 Raising operator, 74, 90 Rational number, 198 Rayleigh–Jeans law, 339, 345 Real analysis, 182, 216 Real axis, 181, 197, 235, 242, 246, 247, 253, 254, 257–259 Real functions, 49, 128, 141, 142, 200, 319, 348, 392, 397, 401, 402, 404, 752, 757, 764 Real number line, 200, 201, 395 Real space, 800, 803, 806, 808 Real symmetric matrix, 287, 348, 569, 570, 572, 600, 603 Real symmetric quadratic form, 547, 569 Real variable, 14, 25, 200, 206, 207, 219, 230, 389, 581 Rearrangement theorem, 623, 681, 695, 701, 704, 713 Rectangular parallelepiped, 355–357 Recurrence relation, 84 Redshifted, 4, 372–374 Reduced mass, 58, 59, 108, 133, 735 Reduced Planck constant, 4, 127 Reducible, 471, 686, 699, 700, 702, 742, 746, 749, 754, 759, 765, 769, 825, 830, 843, 845, 847, 864 Reducible representation, 686, 699, 700, 742, 746, 749, 759 Reduction, 559, 687 Reflectance, 311, 317, 337 Reflected light, 300, 302 Reflected wave, 311, 313, 324, 327 Reflections angle, 300, 302, 313, 314, 317, 319, 329, 643, 645 coefficient, 299, 300, 306–308, 317, 325, 326, 337, 344, 379 Refraction angles, 302, 308, 315 Refractive index of dielectric, 280, 295, 308, 313, 314, 321, 326, 327 group, 358 of laser medium, 356, 358 phase, 295, 358, 361, 362, 369 relative, 303, 312, 314, 318, 371 Region, 148, 199, 204, 206–209, 211, 215, 219–223, 225–227, 263, 285, 286, 318,

319, 321, 331, 336, 339, 351, 416, 842, 892 Regular hexagon, 764 Regular point, 206, 216, 228 Regular representation, 700–707 Regular singular point, 169 Regular tetrahedron, 661, 777 Relations between roots and coefficients, 461 Relative coordinate, 57–59 Relative permeability, 271 Relative permittivity, 271 Relative refractive index, 303, 312, 314, 318, 371 Relativistic quantum mechanics, 12 Removable singularity, 224, 244 Representation antisymmetric, 723–726, 858, 860, 863, 864 complex conjugate, 523, 762, 816, 817 conjugate, 762, 816, 817 coordinate, 3, 31, 44–51, 108, 112, 122, 127, 129, 132, 136, 146, 157, 160, 163, 165, 168, 170, 171, 175, 178, 256, 395, 396, 440, 574, 640, 644, 650, 652, 791, 800, 823, 837, 838 dimension of, 679, 680, 685, 687, 688, 691, 695–697, 703, 711, 719, 723, 726, 746, 777, 800, 802, 830, 839, 846, 847 direct-product, 631–633, 720–723, 725, 740, 741, 745, 750, 769, 796, 831, 843–846, 849, 859, 861, 863 of group, 679–726, 840 irreducible, 686, 688, 691–694, 696–700, 702, 703, 707–711, 715–717, 719, 720, 738, 740–742, 745–751, 754–757, 759, 761, 765, 769–771, 774, 775, 778, 781, 783–786, 788, 789, 795–797, 823, 825, 828, 830, 840–847, 859 matrix, 3, 31, 34, 54, 86, 161, 440–443, 450, 451, 454, 456, 468, 487, 536, 553, 661, 680, 681, 685, 686, 690, 698, 699, 708, 714, 719, 739, 811, 816, 817, 822, 826, 830, 838, 839, 844, 864, 880, 884 of point group, 746 reducible, 686, 699, 700, 742, 749, 759 regular, 700–707 space, 687, 688, 691, 692, 714, 716, 719, 738, 745, 761, 764, 777, 780, 796, 799, 800, 802, 803, 806, 808, 817, 839, 842, 843, 846, 847, 849, 855, 859, 860, 864, 884 subduced, 750, 759, 765 symmetric, 723–726, 734, 738, 741, 742, 745, 751, 754, 755, 759, 769, 784, 796, 858, 860, 863, 864

theory, 621, 664, 679–726, 729, 745, 799, 819, 843 unitary, 679–681, 683, 687, 695, 714, 825, 830, 843 Residues, 227–230, 232, 233, 237, 257 Resolvent, 581, 595–600, 602–607, 609, 610, 614, 617 Resolvent matrix, 595–600, 602–607, 609, 610, 614, 617 Resonance integral, 740, 752, 767 Resonance structure, 688, 689, 757 Resonator, 337, 359, 361 Reversible mapping, 447 Riemann sheet, 252 Riemann surface, 248–265 Right-circular polarization, 136, 138, 293 Right-handed system, 284, 304, 309, 350, 376 Rigid body, 664, 729 Rodrigues formula, 85, 93, 94, 99, 116, 264 Rotation angles, 367, 639, 640, 642, 647, 663, 667, 800, 831, 833–836, 838, 843, 844, 889 axis, 639–643, 648, 657, 659, 661, 664–668, 674–676, 831, 832, 834, 838, 839, 844, 889, 895 groups, 622, 654, 659, 663, 799, 806, 815, 843, 890 improper, 643, 647, 663, 780, 889 matrix, 615, 639, 663–668, 820, 838, 889 proper, 640, 643, 654 symmetry, 640, 642 transformation, 367, 439, 451, 622, 635, 640, 663, 667, 672–678, 799, 816, 832, 833, 837, 889, 895 Row vectors, 8, 383, 464, 519, 528, 541, 542, 702, 830

S Scalar, 270, 276–278, 434, 435, 446, 473, 486, 526, 577, 729 Scalar function, 277, 278, 729 Schrödinger, E., 3, 7, 11, 14 Schrödinger equation of motion, 58 time-evolved, 126 Schur’s First Lemma, 692–694, 697, 783, 822, 839 Schur’s lemmas, 692–697, 823 Schur’s Second Lemma, 694, 695, 708, 738, 824, 825, 830 Second-order differential operators, 389–394 Second-order linear differential equations (SOLDEs), 3, 14, 15, 25, 33, 44, 69–71,

82, 368, 379–384, 388–390, 394, 402, 424, 428, 581, 592–594 Secular equation, 503, 729, 744–747, 751, 761, 766, 770, 771, 774, 786, 787, 795 Selection rules, 125–149, 720 Self-adjoint, 389, 391–394, 399, 401, 402, 405, 410, 426, 427, 530 Self-adjoint matrix, 530 Self-adjoint operator, 391, 399, 402, 410, 426 Sellmeier’s dispersion formula, 359, 362 Semicircle, 230–232, 236, 238, 239, 241, 242, 246 Semiclassical, 125, 147 Semiclassical theory, 147 Semiconductors, 286, 337, 354, 359–362 Semi-infinite dielectric media, 295, 296 Semi-simple, 485–488, 509, 512 Separation axioms, 195, 196 Separation conditions, 195 Separation of variables, 12, 67–72, 107, 125, 341 Sesquilinear, 526 Set difference, 183 Set of points, 225, 226 Set theory, 181, 182, 196, 199 Similarity transformation, 458, 459, 461–463, 465, 466, 475, 477, 484–486, 496, 497, 505, 506, 508, 510, 513, 515, 530, 532, 554, 556, 557, 559, 560, 563, 569, 578, 580, 588, 589, 600, 604, 609, 654, 668, 679, 681, 683, 685, 686, 689, 782, 801, 804, 805, 822, 839, 863, 875, 889 Simple harmonic motion, 32 Simple root, 501, 502, 518 Simply connected, 194, 209–212, 215, 219, 221, 222, 227, 890–896 Simply reducible, 686, 845, 847, 864 Simultaneous eigenfunction, 68, 86 Simultaneous eigenstate, 66, 68, 73, 79, 86, 90, 574–580 Singleton, 196 Single-valuedness, 206, 215, 251 Singularity, 208, 211, 224, 225, 228, 229, 240, 244, 408 Singular matrix, 555 Singular part, 224, 225 Singular point isolated, 235 regular, 169, 206, 228 Sinusoidal oscillation, 127, 129, 133 Skew-symmetric, 590, 592, 600, 807, 867, 871, 872, 876, 891

Skew-symmetric matrix, 590, 592, 600, 807, 867, 871, 872, 876, 891 Slab waveguide, 320, 321, 323, 324, 326, 327, 331, 359 Solid-state chemistry, 374 Solid-state physics, 374 Space group, 637 Span, 435–437, 443–446, 454, 468, 469, 474, 481, 489, 490, 501, 513, 521, 548, 549, 575, 577, 665, 738, 745, 761, 764, 780, 782, 796, 800, 817, 839, 842, 843, 849, 855, 859, 860, 864, 872, 873, 884 Special functions, 57, 107, 617 Special orthogonal group SO(3), 635, 663–678, 799, 802, 806, 808, 809, 815, 816, 822–843, 845, 847, 864, 866, 867, 873, 884, 888–889, 892, 895, 896 Special orthogonal group SO(n), 871, 875 Special solution, 8 Special unitary group SU(2), 799, 802, 806, 808, 810, 812, 815, 823, 825, 827, 830, 831, 843–849, 852, 864, 866, 867, 873, 892, 895, 896 Special unitary group SU(n), 808, 871, 872, 875 Spectral decomposition, 563–565, 580, 668 Spectral lines, 127, 357, 729 Spherical basis, 806, 820 Spherical coordinate, 58, 60, 109, 806, 823 Spherical surface harmonics, 86, 92–103, 107, 167, 799, 806, 817, 820, 823, 839, 842, 884 Spherical symmetry, 57, 58, 892 Spin angular momentum, 77, 843, 844 Spin states, 121, 810 Spontaneous emission, 346, 347, 353, 355 Spring constant, 31, 419 Square-integrable, 25 Square matrix, 433, 462–464, 468, 469, 471, 475, 485, 486, 496, 582, 584, 588, 679, 685, 692, 693, 697, 700, 802, 828 Square roots, 249, 288, 367, 790 Standard deviation, 52 Stark effect, 164, 172 State vectors, 127, 131, 133, 153 Stationary current, 273, 275 Stationary waves, 331–337, 343, 375 Stimulated absorption, 346, 355 Stimulated emission, 346, 347, 354, 355 Stirling’s interpolation formula, 586 Stokes’ theorem, 296 Strictly positive, 17 Structure constants, 872, 895

Structure–property relationship, 359 Sturm–Liouville system, 427 Subduced representation, 750, 759, 765 Subgroups, 621, 624–626, 630, 632, 648, 649, 659, 662, 663, 706, 750, 757, 759, 764, 808, 866, 875, 883, 887, 888 Submatrix, 367, 559, 570, 782, 860 Subsets, 182, 183, 185–193, 195, 196, 199, 226, 435, 624, 625, 629, 871, 885–887, 891 Subspaces, 185, 194, 199, 435, 436, 438, 443, 445, 460, 468–474, 477, 478, 481, 490, 501, 503, 504, 513, 514, 518, 521, 548–550, 559, 560, 577, 579, 624, 639, 687, 688, 691, 780, 782, 871 Superior limit, 220 Superposed wave, 334, 335 Superposition of waves, 285–293 Surface integral, 295–297, 351 Surface term, 386–388, 390, 394, 397, 402, 403, 407, 410, 414–418, 427, 594 Surjective, 447, 448, 452, 628, 629, 631, 881, 895 Surjective mapping, 631, 881 Symmetric, 67, 131, 171, 175, 287, 348, 371, 399, 403, 547, 570–572, 600, 603, 604, 637, 638, 662, 723–726, 734, 738, 741, 742, 745, 751, 754, 755, 759, 769, 784, 796, 797, 834, 847, 858, 860, 863, 864 Symmetric matrix, 131, 569–572, 604 Symmetric representation, 723–726, 734, 738, 741, 742, 745, 751, 754, 755, 759, 769, 784, 796, 860, 863, 864 Symmetry groups, 622, 637, 638, 641, 650, 662, 735, 738, 755 operations, 635, 637, 638, 640, 642–654, 659, 661, 684, 690, 691, 702, 710, 729, 733, 737, 738, 745–747, 750, 751, 757, 759, 761, 770, 778, 782, 783 requirement, 175, 646, 647, 785, 787–789 species, 650, 659, 719, 745, 757, 759, 774, 796 transformations, 748, 759, 765, 780 Symmetry-adapted linear combination (SALC), 691, 729, 745–747, 751, 757, 760, 761, 764, 765, 767, 770, 771, 773, 774, 777, 783–788, 860, 861 Symmetry groups, 635–678 Syn-phase, 335, 337 System of differential equations, 581, 592–600, 610, 617 System of linear equations, 450, 452

T Taylor’s expansion, 217, 219, 223, 226 Taylor’s series, 96, 216–222, 224, 234, 237, 243, 263 Td, 647, 654, 659, 661, 662, 778, 781, 784, 797, 890 Tensor electric permittivity, 365 permittivity, 365, 569 Termwise integration, 218, 222 Thiophene, 648, 701 Thiophene/phenylene co-oligomers (TPCOs), 360, 362, 363 Three-dimensional, 57, 58, 132–142, 163, 167, 273, 278, 280, 323, 342, 349, 436, 451, 496, 501, 635, 638, 639, 659, 669, 731, 733, 784, 785, 800, 803, 819, 864, 892 Three-dimensional Cartesian coordinate, 8, 650 Time-averaged Poynting vectors, 309, 311 Time-evolved Schrödinger equation, 126 Topological groups, 190, 799, 802, 896 Topological spaces, 181, 185–196, 199, 891, 895, 896 Topology, 181–199 Torus, 194 Total internal reflection, 321, 327–331, 372 Total reflection, 295, 303, 314–320, 323, 330 Totally symmetric, 741, 745, 797 Totally symmetric ground state, 769, 796 Totally symmetric representation, 734, 738, 741, 742, 745, 751, 754, 755, 759, 769, 784 Trace, 247, 257, 259, 261, 262, 289, 292, 293, 461, 506, 568, 569, 654, 668, 672, 674, 690, 694, 697–699, 703, 704, 707, 708, 746, 748, 757, 759, 765, 770, 822, 834, 839, 871, 872, 875 Traceless, 810, 867, 871, 878 Transformation coordinate, 641 group, 623, 635, 710, 799 linear, 433, 438–440, 443, 444, 447, 448, 450–452, 454, 456, 458, 459, 468, 471, 474, 481, 490, 494, 501, 506, 519, 526, 533, 535, 537–541, 555, 556, 566, 623, 629, 636, 663, 669, 677, 678, 683, 684, 708, 723, 804, 825, 876, 878 matrix, 441, 443, 451, 452, 454, 455, 459, 540, 782, 808, 812 successive, 454, 456, 458, 669, 670, 672, 817, 818, 833, 837 Transition dipole moments, 127, 132 Transition electric dipole moment, 127

Transition matrix elements, 131, 740, 755, 769, 775, 776 Transition probability, 127, 133, 138, 172, 346, 347, 741, 777 Translation symmetry, 637 Transmission coefficient, 306–308 of electromagnetic wave, 295–337 irradiance, 311 of light, 297 Transmittance, 311 Transpose complex conjugate, 22, 537 Transposed matrix, 22, 383, 471, 523, 537 Transverse electric (TE) mode, 304, 324, 326, 329 Transverse electric (TE) wave, 295, 303–312, 314, 316, 320–331 Transverse magnetic (TM) mode, 304, 368, 371, 372 Transverse magnetic (TM) wave, 295, 303–308, 310, 312–314, 316, 318, 320–331 Transverse waves, 281, 282, 352 Trial function, 174, 175, 177 Triangle inequality, 525 Triangle matrix lower, 462, 512, 553, 606 upper, 462, 464, 476, 485, 496, 553, 606 Trigonometric formula, 101, 130, 309, 312, 313, 330, 842, 884 Trigonometric functions, 103, 140, 241, 319, 639, 815, 836 Triple root, 474, 501, 503, 504, 759 Triply degenerate, 789, 795, 796 Trivial, 21, 26, 184, 192, 299, 381, 434, 437, 475, 480, 489, 555, 567, 570, 585, 607, 622, 664, 667, 693, 741, 803, 819, 860, 867 Trivial solution, 399, 401, 409, 426, 453, 460, 522 Trivial topology, 195 T1-space, 195–196, 896 Two-level atoms, 339, 345–349, 354, 355, 375

U Ultraviolet catastrophe, 339, 345 Ultraviolet light, 363 Unbounded, 25, 29, 216 Uncertainty principle, 29, 51–55 Uncountable, 182, 621 Underlying set, 185 Undetermined constants, 74, 75, 84 Uniformly convergent, 216, 218–222

Unique solution, 403, 416, 419, 424, 447, 450, 451 Uniqueness of decomposition, 486, 488, 512, 564 of Green’s function, 401 of representation, 440, 450, 454, 632, 633 of solution, 18 Unitarity, 683, 713, 714, 718, 740, 808, 816, 819, 823, 851, 869 Unitary diagonalization, 290, 556–564, 568, 573, 667, 804 group, 799, 870–872, 875 matrix, 135, 290, 530, 533, 534, 541, 547, 556–559, 561, 564–568, 573, 578, 590, 592, 601, 604, 667, 679–681, 687, 690, 699, 701, 714, 746, 763, 768, 774, 782, 804, 805, 808, 810–812, 814, 820, 821, 847, 850, 854, 863, 867, 875, 876 operator, 547–580, 860, 869, 870 representation, 679–681, 683, 687, 695, 714, 825, 830, 843 transformation, 135, 140, 541, 566, 740, 763, 764, 770, 771, 774, 812, 849, 851, 852, 854, 860, 863 Unitary similarity transformation, 532, 554, 556, 557, 559, 560, 563, 578, 580, 600, 609, 668, 689, 782, 801, 804, 805, 822, 863, 875, 889 Unit polarization vector of electric field, 127, 284, 303, 350, 352, 755, 796 of magnetic field, 304, 352 Unit set, 196 Unit vectors, 4, 129, 130, 273, 279, 280, 296–298, 303–305, 325, 327, 352, 364, 436, 540, 611, 803 Universal covering group, 896 Universal set, 182–185 Unknowns, 153, 284, 305, 596, 598, 600 Unperturbed state, 153, 158 system, 152, 153 Upper right off-diagonal elements, 450, 462 Upper triangle matrix, 462, 464, 476, 485, 496, 553, 606

V Variable transformation, 46, 47, 78, 140, 160, 234, 278 Variance, 51–55 Variance operator, 51 Variational method, 151, 172–179

Variational principle, 36, 110 Vector transformation, 182, 433–454, 459, 487, 505, 538, 635, 678, 803 Vector analysis, 269, 276, 281 Vector space complex, 136 finite-dimensional, 433, 435, 448, 449, 748 linear, 433, 438, 440, 444, 445, 448, 459, 518, 523, 524, 559, 575, 581, 583, 624, 627, 691, 872 n-dimensional, 433, 434, 437, 440, 448 Venn diagrams, 182–184, 186 Vertical incidence, 303, 305, 312

W Wave equations, 8, 269, 279, 284, 322, 324, 341, 342 Wave front, 328 Waveguide, 295, 319–331, 359, 371, 373 Wavelength dispersion, 358–362, 373 Wavelengths, 4–6, 149, 302, 303, 310, 334, 336, 337, 357–359, 364, 372, 374, 611 Wavenumber, 4–6, 278, 297, 298, 301, 315, 321, 323, 326, 328, 357, 364 Wavenumber vectors, 4, 5, 279, 297, 298, 301, 315, 321, 323, 355, 364 Wavevector, 363, 364, 372, 373 Wave zone, 350–352 Weak damping, 420 Weight function, 49, 384, 391, 393, 395, 405, 410, 422, 426–428, 593, 594

Wigner formula, 814 Wronskian, 15, 298, 381, 383, 409, 412

X X-ray, 4–6

Y Yukawa potential, 107

Z Zenithal angle, 70, 674, 831, 839, 895 Zero matrix, 27, 469, 471, 475, 476, 487–489, 512, 562, 585, 587, 868, 891 Zeros, 5, 48, 69, 132, 176, 204, 270, 318, 340, 408, 434, 460, 532, 553, 607, 624, 639, 695, 743, 806 Δ δ functions, 396, 397, 399, 402, 422 Θ θ function, 422 Π π-electron approximation, 767, 777