Physical Mathematics [2nd ed.] ISBN 9781108470032


English, 779 pages, 2019




PHYSICAL MATHEMATICS Second edition

Unique in its clarity, examples, and range, Physical Mathematics explains simply and succinctly the mathematics that graduate students and professional physicists need to succeed in their courses and research. The book illustrates the mathematics with numerous physical examples drawn from contemporary research. This second edition has new chapters on vector calculus, special relativity, and artificial intelligence, and many new sections and examples. In addition to basic subjects such as linear algebra, Fourier analysis, complex variables, differential equations, Bessel functions, and spherical harmonics, the book explains topics such as the singular value decomposition, Lie algebras and group theory, tensors and general relativity, the central limit theorem and Kolmogorov's theorems, Monte Carlo methods of experimental and theoretical physics, Feynman's path integrals, and the standard model of cosmology.

KEVIN CAHILL is Professor of Physics and Astronomy at the University of New Mexico. He has carried out research at NIST, Saclay, Ecole Polytechnique, Orsay, Harvard University, NIH, LBL, and SLAC, and has worked in quantum optics, quantum field theory, lattice gauge theory, and biophysics. Physical Mathematics is based on courses taught by the author at the University of New Mexico and at Fudan University in Shanghai.

PHYSICAL MATHEMATICS
Second edition

KEVIN CAHILL
University of New Mexico

University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge. It furthers the University's mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781108470032
DOI: 10.1017/9781108555814

First edition © K. Cahill 2013
Second edition © Cambridge University Press 2019

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2013
Reprinted with corrections 2014
Second edition 2019

Printed in the United Kingdom by TJ International Ltd, Padstow, Cornwall, 2019

A catalog record for this publication is available from the British Library.

Library of Congress Cataloging-in-Publication Data
Names: Cahill, Kevin, 1941– author.
Title: Physical mathematics / Kevin Cahill (University of New Mexico).
Description: Second edition. | Cambridge ; New York, NY : Cambridge University Press, 2019. | Includes bibliographical references and index.
Identifiers: LCCN 2019008214 | ISBN 9781108470032 (alk. paper)
Subjects: LCSH: Mathematical physics. | Mathematical physics – Textbooks. | Mathematics – Study and teaching (Higher)
Classification: LCC QC20 .C24 2019 | DDC 530.15–dc23
LC record available at https://lccn.loc.gov/2019008214

ISBN 978-1-108-47003-2 Hardback
Additional resources for this publication at www.cambridge.org/Cahill2ed

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

To Ginette, Michael, Sean, Peter, Micheon, Danielle, Rachel, Mia, James, Dylan, Christopher, and Liam

Brief Contents

Preface    page xxi
1 Linear Algebra    1
2 Vector Calculus    84
3 Fourier Series    93
4 Fourier and Laplace Transforms    128
5 Infinite Series    158
6 Complex-Variable Theory    185
7 Differential Equations    248
8 Integral Equations    334
9 Legendre Polynomials and Spherical Harmonics    343
10 Bessel Functions    365
11 Group Theory    390
12 Special Relativity    451
13 General Relativity    466
14 Forms    536
15 Probability and Statistics    564
16 Monte Carlo Methods    632
17 Artificial Intelligence    643
18 Order, Chaos, and Fractals    647
19 Functional Derivatives    661
20 Path Integrals    669
21 Renormalization Group    718
22 Strings    727
References    737
Index    744

Contents

Preface    page xxi

1 Linear Algebra    1
1.1 Numbers    1
1.2 Arrays    2
1.3 Matrices    4
1.4 Vectors    7
1.5 Linear Operators    10
1.6 Inner Products    12
1.7 Cauchy–Schwarz Inequalities    15
1.8 Linear Independence and Completeness    16
1.9 Dimension of a Vector Space    17
1.10 Orthonormal Vectors    18
1.11 Outer Products    19
1.12 Dirac Notation    20
1.13 Adjoints of Operators    25
1.14 Self-Adjoint or Hermitian Linear Operators    26
1.15 Real, Symmetric Linear Operators    27
1.16 Unitary Operators    27
1.17 Hilbert Spaces    29
1.18 Antiunitary, Antilinear Operators    30
1.19 Symmetry in Quantum Mechanics    30
1.20 Determinants    30
1.21 Jacobians    38
1.22 Systems of Linear Equations    40
1.23 Linear Least Squares    41
1.24 Lagrange Multipliers    41
1.25 Eigenvectors and Eigenvalues    43
1.26 Eigenvectors of a Square Matrix    44
1.27 A Matrix Obeys Its Characteristic Equation    48
1.28 Functions of Matrices    49
1.29 Hermitian Matrices    54
1.30 Normal Matrices    59
1.31 Compatible Normal Matrices    61
1.32 Singular-Value Decompositions    65
1.33 Moore–Penrose Pseudoinverses    70
1.34 Tensor Products and Entanglement    72
1.35 Density Operators    76
1.36 Schmidt Decomposition    77
1.37 Correlation Functions    78
1.38 Rank of a Matrix    80
1.39 Software    80
Exercises    81

2 Vector Calculus    84
2.1 Derivatives and Partial Derivatives    84
2.2 Gradient    85
2.3 Divergence    86
2.4 Laplacian    88
2.5 Curl    89
Exercises    92

3 Fourier Series    93
3.1 Fourier Series    93
3.2 The Interval    96
3.3 Where to Put the 2pi's    97
3.4 Real Fourier Series for Real Functions    98
3.5 Stretched Intervals    102
3.6 Fourier Series of Functions of Several Variables    103
3.7 Integration and Differentiation of Fourier Series    104
3.8 How Fourier Series Converge    104
3.9 Measure and Lebesgue Integration    108
3.10 Quantum-Mechanical Examples    110
3.11 Dirac's Delta Function    117
3.12 Harmonic Oscillators    120
3.13 Nonrelativistic Strings    122
3.14 Periodic Boundary Conditions    123
Exercises    125

4 Fourier and Laplace Transforms    128
4.1 Fourier Transforms    128
4.2 Fourier Transforms of Real Functions    131
4.3 Dirac, Parseval, and Poisson    132
4.4 Derivatives and Integrals of Fourier Transforms    136
4.5 Fourier Transforms of Functions of Several Variables    141
4.6 Convolutions    142
4.7 Fourier Transform of a Convolution    144
4.8 Fourier Transforms and Green's Functions    145
4.9 Laplace Transforms    146
4.10 Derivatives and Integrals of Laplace Transforms    148
4.11 Laplace Transforms and Differential Equations    149
4.12 Inversion of Laplace Transforms    150
4.13 Application to Differential Equations    150
Exercises    156

5 Infinite Series    158
5.1 Convergence    158
5.2 Tests of Convergence    159
5.3 Convergent Series of Functions    161
5.4 Power Series    162
5.5 Factorials and the Gamma Function    163
5.6 Euler's Beta Function    168
5.7 Taylor Series    168
5.8 Fourier Series as Power Series    169
5.9 Binomial Series    170
5.10 Logarithmic Series    172
5.11 Dirichlet Series and the Zeta Function    172
5.12 Bernoulli Numbers and Polynomials    174
5.13 Asymptotic Series    175
5.14 Fractional and Complex Derivatives    177
5.15 Some Electrostatic Problems    178
5.16 Infinite Products    181
Exercises    182

6 Complex-Variable Theory    185
6.1 Analytic Functions    185
6.2 Cauchy–Riemann Conditions    186
6.3 Cauchy's Integral Theorem    187
6.4 Cauchy's Integral Formula    193
6.5 Harmonic Functions    196
6.6 Taylor Series for Analytic Functions    198
6.7 Cauchy's Inequality    199
6.8 Liouville's Theorem    199
6.9 Fundamental Theorem of Algebra    200
6.10 Laurent Series    201
6.11 Singularities    203
6.12 Analytic Continuation    205
6.13 Calculus of Residues    207
6.14 Ghost Contours    209
6.15 Logarithms and Cuts    220
6.16 Powers and Roots    221
6.17 Conformal Mapping    225
6.18 Cauchy's Principal Value    227
6.19 Dispersion Relations    233
6.20 Kramers–Kronig Relations    235
6.21 Phase and Group Velocities    236
6.22 Method of Steepest Descent    239
6.23 Applications to String Theory    241
Further Reading    243
Exercises    243

7 Differential Equations    248
7.1 Ordinary Linear Differential Equations    248
7.2 Linear Partial Differential Equations    250
7.3 Separable Partial Differential Equations    251
7.4 First-Order Differential Equations    257
7.5 Separable First-Order Differential Equations    257
7.6 Hidden Separability    260
7.7 Exact First-Order Differential Equations    260
7.8 Meaning of Exactness    261
7.9 Integrating Factors    263
7.10 Homogeneous Functions    264
7.11 Virial Theorem    265
7.12 Legendre's Transform    267
7.13 Principle of Stationary Action in Mechanics    270
7.14 Symmetries and Conserved Quantities in Mechanics    272
7.15 Homogeneous First-Order Ordinary Differential Equations    273
7.16 Linear First-Order Ordinary Differential Equations    274
7.17 Small Oscillations    277
7.18 Systems of Ordinary Differential Equations    278
7.19 Exact Higher-Order Differential Equations    280
7.20 Constant-Coefficient Equations    280
7.21 Singular Points of Second-Order Ordinary Differential Equations    281
7.22 Frobenius's Series Solutions    282
7.23 Fuchs's Theorem    285
7.24 Even and Odd Differential Operators    285
7.25 Wronski's Determinant    286
7.26 Second Solutions    287
7.27 Why Not Three Solutions?    288
7.28 Boundary Conditions    289
7.29 A Variational Problem    290
7.30 Self-Adjoint Differential Operators    291
7.31 Self-Adjoint Differential Systems    293
7.32 Making Operators Formally Self-Adjoint    295
7.33 Wronskians of Self-Adjoint Operators    296
7.34 First-Order Self-Adjoint Differential Operators    298
7.35 A Constrained Variational Problem    299
7.36 Eigenfunctions and Eigenvalues of Self-Adjoint Systems    304
7.37 Unboundedness of Eigenvalues    307
7.38 Completeness of Eigenfunctions    309
7.39 Inequalities of Bessel and Schwarz    314
7.40 Green's Functions    315
7.41 Eigenfunctions and Green's Functions    318
7.42 Green's Functions in One Dimension    319
7.43 Principle of Stationary Action in Field Theory    321
7.44 Symmetries and Conserved Quantities in Field Theory    322
7.45 Nonlinear Differential Equations    324
7.46 Nonlinear Differential Equations in Cosmology    325
7.47 Nonlinear Differential Equations in Particle Physics    328
Further Reading    331
Exercises    331

8 Integral Equations    334
8.1 Differential Equations as Integral Equations    334
8.2 Fredholm Integral Equations    335
8.3 Volterra Integral Equations    335
8.4 Implications of Linearity    336
8.5 Numerical Solutions    337
8.6 Integral Transformations    339
Exercises    341

9 Legendre Polynomials and Spherical Harmonics    343
9.1 Legendre's Polynomials    343
9.2 The Rodrigues Formula    344
9.3 Generating Function for Legendre Polynomials    346
9.4 Legendre's Differential Equation    347
9.5 Recurrence Relations    349
9.6 Special Values of Legendre Polynomials    351
9.7 Schlaefli's Integral    351
9.8 Orthogonal Polynomials    352
9.9 Azimuthally Symmetric Laplacians    354
9.10 Laplace's Equation in Two Dimensions    355
9.11 Helmholtz's Equation in Spherical Coordinates    356
9.12 Associated Legendre Polynomials    356
9.13 Spherical Harmonics    358
9.14 Cosmic Microwave Background Radiation    360
Further Reading    362
Exercises    363

10 Bessel Functions    365
10.1 Cylindrical Bessel Functions of the First Kind    365
10.2 Spherical Bessel Functions of the First Kind    376
10.3 Bessel Functions of the Second Kind    382
10.4 Spherical Bessel Functions of the Second Kind    384
Further Reading    386
Exercises    386

11 Group Theory    390
11.1 What Is a Group?    390
11.2 Representations of Groups    393
11.3 Representations Acting in Hilbert Space    394
11.4 Subgroups    395
11.5 Cosets    397
11.6 Morphisms    397
11.7 Schur's Lemma    398
11.8 Characters    399
11.9 Direct Products    400
11.10 Finite Groups    401
11.11 Regular Representations    402
11.12 Properties of Finite Groups    403
11.13 Permutations    404
11.14 Compact and Noncompact Lie Groups    404
11.15 Generators    404
11.16 Lie Algebra    405
11.17 Yang and Mills Invent Local Nonabelian Symmetry    409
11.18 Rotation Group    409
11.19 Rotations and Reflections in 2n Dimensions    413
11.20 Defining Representation of SU(2)    413
11.21 The Lie Algebra and Representations of SU(2)    414
11.22 How a Field Transforms Under a Rotation    417
11.23 Addition of Two Spin-One-Half Systems    418
11.24 Jacobi Identity    421
11.25 Adjoint Representations    421
11.26 Casimir Operators    422
11.27 Tensor Operators for the Rotation Group    423
11.28 Simple and Semisimple Lie Algebras    424
11.29 SU(3)    425
11.30 SU(3) and Quarks    425
11.31 Fierz Identity for SU(n)    426
11.32 Cartan Subalgebra    427
11.33 Symplectic Group Sp(2n)    427
11.34 Quaternions    429
11.35 Quaternions and Symplectic Groups    431
11.36 Compact Simple Lie Groups    433
11.37 Group Integration    433
11.38 Lorentz Group    435
11.39 Left-Handed Representation of the Lorentz Group    439
11.40 Right-Handed Representation of the Lorentz Group    442
11.41 Dirac's Representation of the Lorentz Group    444
11.42 Poincaré Group    446
11.43 Homotopy Groups    447
Further Reading    447
Exercises    447

12 Special Relativity    451
12.1 Inertial Frames and Lorentz Transformations    451
12.2 Special Relativity    453
12.3 Kinematics    455
12.4 Electrodynamics    456
12.5 Principle of Stationary Action in Special Relativity    459
12.6 Differential Forms    460
Exercises    464

13 General Relativity    466
13.1 Points and Their Coordinates    466
13.2 Scalars    467
13.3 Contravariant Vectors    467
13.4 Covariant Vectors    467
13.5 Tensors    468
13.6 Summation Convention and Contractions    469
13.7 Symmetric and Antisymmetric Tensors    470
13.8 Quotient Theorem    470
13.9 Tensor Equations    471
13.10 Comma Notation for Derivatives    471
13.11 Basis Vectors and Tangent Vectors    472
13.12 Metric Tensor    472
13.13 Inverse of Metric Tensor    476
13.14 Dual Vectors, Cotangent Vectors    477
13.15 Covariant Derivatives of Contravariant Vectors    477
13.16 Covariant Derivatives of Covariant Vectors    479
13.17 Covariant Derivatives of Tensors    479
13.18 The Covariant Derivative of the Metric Tensor Vanishes    481
13.19 Covariant Curls    482
13.20 Covariant Derivatives and Antisymmetry    482
13.21 What Is the Affine Connection?    482
13.22 Parallel Transport    483
13.23 Curvature    484
13.24 Maximally Symmetric Spaces    490
13.25 Principle of Equivalence    492
13.26 Tetrads    493
13.27 Scalar Densities and g = |det(gik)|    494
13.28 Levi-Civita's Symbol and Tensor    495
13.29 Divergence of a Contravariant Vector    496
13.30 Covariant Laplacian    498
13.31 Principle of Stationary Action in General Relativity    498
13.32 Equivalence Principle and Geodesic Equation    501
13.33 Weak Static Gravitational Fields    502
13.34 Gravitational Time Dilation    503
13.35 Einstein's Equations    504
13.36 Energy–Momentum Tensor    506
13.37 Perfect Fluids    507
13.38 Gravitational Waves    508
13.39 Schwarzschild's Solution    508
13.40 Black Holes    509
13.41 Rotating Black Holes    510
13.42 Spatially Symmetric Spacetimes    511
13.43 Friedmann–Lemaître–Robertson–Walker Cosmologies    513
13.44 Density and Pressure    515
13.45 How the Scale Factor Evolves with Time    516
13.46 The First Hundred Thousand Years    518
13.47 The Next Ten Billion Years    520
13.48 Era of Dark Energy    522
13.49 Before the Big Bang    522
13.50 Yang–Mills Theory    523
13.51 Cartan's Spin Connection and Structure Equations    525
13.52 Spin-One-Half Fields in General Relativity    529
13.53 Gauge Theory and Vectors    529
Further Reading    531
Exercises    531

14 Forms    536
14.1 Exterior Forms    536
14.2 Differential Forms    538
14.3 Exterior Differentiation    543
14.4 Integration of Forms    548
14.5 Are Closed Forms Exact?    553
14.6 Complex Differential Forms    555
14.7 Hodge's Star    556
14.8 Theorem of Frobenius    560
Further Reading    562
Exercises    562

15 Probability and Statistics    564
15.1 Probability and Thomas Bayes    564
15.2 Mean and Variance    568
15.3 Binomial Distribution    572
15.4 Coping with Big Factorials    574
15.5 Poisson's Distribution    575
15.6 Gauss's Distribution    577
15.7 The Error Function erf    581
15.8 Error Analysis    583
15.9 Maxwell–Boltzmann Distribution    585
15.10 Fermi–Dirac and Bose–Einstein Distributions    586
15.11 Diffusion    587
15.12 Langevin's Theory of Brownian Motion    588
15.13 Einstein–Nernst Relation    591
15.14 Fluctuation and Dissipation    592
15.15 Fokker–Planck Equation    595
15.16 Characteristic and Moment-Generating Functions    597
15.17 Fat Tails    600
15.18 Central Limit Theorem and Jarl Lindeberg    602
15.19 Random-Number Generators    607
15.20 Illustration of the Central Limit Theorem    609
15.21 Measurements, Estimators, and Friedrich Bessel    611
15.22 Information and Ronald Fisher    614
15.23 Maximum Likelihood    618
15.24 Karl Pearson's Chi-Squared Statistic    619
15.25 Kolmogorov's Test    622
Further Reading    628
Exercises    628

16 Monte Carlo Methods    632
16.1 The Monte Carlo Method    632
16.2 Numerical Integration    632
16.3 Quasirandom Numbers    633
16.4 Applications to Experiments    633
16.5 Statistical Mechanics    634
16.6 Simulated Annealing    640
16.7 Solving Arbitrary Problems    640
16.8 Evolution    640
Further Reading    641
Exercises    641

17 Artificial Intelligence    643
17.1 Steps Toward Artificial Intelligence    643
17.2 Slagle's Symbolic Automatic Integrator    643
17.3 Neural Networks    644
17.4 A Linear Unbiased Neural Network    645
Further Reading    646

18 Order, Chaos, and Fractals    647
18.1 Hamilton Systems    647
18.2 Autonomous Systems of Ordinary Differential Equations    649
18.3 Attractors    650
18.4 Chaos    651
18.5 Maps    653
18.6 Fractals    656
Further Reading    659
Exercises    660

19 Functional Derivatives    661
19.1 Functionals    661
19.2 Functional Derivatives    661
19.3 Higher-Order Functional Derivatives    663
19.4 Functional Taylor Series    665
19.5 Functional Differential Equations    665
Exercises    668

20 Path Integrals    669
20.1 Path Integrals and Richard Feynman    669
20.2 Gaussian Integrals and Trotter's Formula    669
20.3 Path Integrals in Quantum Mechanics    670
20.4 Path Integrals for Quadratic Actions    674
20.5 Path Integrals in Statistical Mechanics    679
20.6 Boltzmann Path Integrals for Quadratic Actions    683
20.7 Mean Values of Time-Ordered Products    685
20.8 Quantum Field Theory on a Lattice    688
20.9 Finite-Temperature Field Theory    692
20.10 Perturbation Theory    694
20.11 Application to Quantum Electrodynamics    698
20.12 Fermionic Path Integrals    702
20.13 Application to Nonabelian Gauge Theories    709
20.14 Faddeev–Popov Trick    709
20.15 Ghosts    712
20.16 Effective Field Theories    713
20.17 Complex Path Integrals    714
Further Reading    714
Exercises    714

21 Renormalization Group    718
21.1 Renormalization and Interpolation    718
21.2 Renormalization Group in Quantum Field Theory
21.3 Renormalization Group in Lattice Field Theory
21.4 Renormalization Group in Condensed-Matter Physics
Further Reading
Exercises

22 Strings    727
22.1 The Nambu–Goto String Action
22.2 Static Gauge and Regge Trajectories
22.3 Light-Cone Coordinates
22.4 Light-Cone Gauge
22.5 Quantized Open Strings
22.6 Superstrings
22.7 Covariant and Polyakov Actions
22.8 D-branes or P-branes
22.9 String–String Scattering
22.10 Riemann Surfaces and Moduli
Further Reading
Exercises

References    737
Index    744

719 723 724 726 726 727 727 729 731 732 732 734 734 734 735 735 736 736 737 744

Preface

To the student
You will find some physics crammed in amongst the mathematics. Don't let the physics bother you. As you study the math, you'll learn some physics without extra effort. The physics is a freebie. I have tried to explain the math you need for physics and have left out the rest. You can find codes and scripts for the simulations and figures of the book in the repositories of its chapters at github.com/kevinecahill.

To the professor
The book is for students who also are taking mechanics, electrodynamics, quantum mechanics, and statistical mechanics nearly simultaneously and who soon may use probability or path integrals in their research. Linear algebra and Fourier analysis are the keys to physics, so the book starts with them, but you may prefer to skip the algebra or postpone the Fourier analysis. The book is intended to support a one- or two-semester course for graduate students or advanced undergraduates.

My purpose in this book is to cover the mathematics that graduate students and working physicists need to know. I began the book with linear algebra, vector calculus, and Fourier analysis because these ideas are assumed without proof and often without mention in many courses on quantum mechanics. The chapter on infinite series lays the basis for the one that follows on the theory of functions of a complex variable, which is another set of ideas that students are assumed to know. The chapters on differential equations and on special functions cover concepts used in courses on quantum mechanics and on electrodynamics. These nine chapters make up material for a one-semester course on the basic mathematics of physics for undergraduates and beginning graduate students.

The second half of the book is about more advanced and more interesting topics. Most theoretical physicists use group theory, tensors and general relativity, and probability and statistics. Some use forms, Monte Carlo methods, path integrals, and the renormalization group. Most experimental physicists use probability and statistics and Monte Carlo methods. Chapters 10–21 provide material for a second-semester course on physical mathematics.

I have been guided by two principles. The first is to write simply so as to add no extra mental work to what the students need to do to understand the mathematics. My guide here is George Orwell's six rules for writing English. The second principle is to describe the simple ideas that are the essence of the mathematics of physics. These ideas are so simple that their descriptions may serve as their proofs.

A list of errata is maintained at panda.unm.edu/cahill, and solutions to all the exercises are available for instructors at www.cambridge.org/Cahill2ed.

Acknowledgments
Several friends and colleagues – Rouzbeh Allahverdi, Susan Atlas, Bernard Becker, Steven Boyd, Charles Boyer, Robert Burckel, Marie Cahill, Sean Cahill, Colston Chandler, Vageli Coutsias, David Dunlap, Daniel Finley, Franco Giuliani, Roy Glauber, Pablo Gondolo, André de Gouvêa, Igor Gorelov, Kurt Hinterbichler, Jiaxing Hong, Fang Huang, Dinesh Loomba, Don Lichtenberg, Yin Luo, Lei Ma, Michael Malik, Richard Matzner, Kent Morrison, Sudhakar Prasad, Randy Reeder, Zhixiang Ren, Dmitri Sergatskov, Shashank Shalgar, David Spergel, Dimiter Vassilev, David Waxman, Edward Witten, and James Yorke – have helped me.
Students have supplied questions, ideas, and corrections, notably David Amdahl, Thomas Beechem, Chris Cesare, Yihong Cheng, Charles Cherqui, Robert Cordwell, Austin Daniel, Amo-Kwao Godwin, Aram Gragossian, Aaron Hankin, Kangbo Hao, Tiffany Hayes, Yiran Hu, Shanshan Huang, Tyler Keating, Joshua Koch, Zilong Li, Miao Lin, ZuMou Lin, Sheng Liu, Yue Liu, Manuel Munoz Arias, Ben Oliker, Boleszek Osinski, Ravi Raghunathan, Akash Rakholia, Christian Roberts, Xingyue Tian, Toby Tolley, Jiqun Tu, Christopher Vergien, Weizhen Wang, James Wendelberger, Xukun Xu, Huimin Yang, Zhou Yang, Changhao Yi, Daniel Young, Mengzhen Zhang, Lu Zheng, Lingjun Zhou, and Daniel Zirzow. I should also like to thank the copy editor, Jon Billam, for his excellent work.

1 Linear Algebra

1.1 Numbers

The natural numbers are the positive integers and zero. Rational numbers are ratios of integers. Irrational numbers have decimal digits d_n

    x = \sum_{n=m_x}^{\infty} \frac{d_n}{10^n}    (1.1)

that do not repeat. Thus the repeating decimals 1/2 = 0.50000... and 1/3 = 0.3̄ ≡ 0.33333... are rational, while π = 3.141592654... is irrational. Decimal arithmetic was invented in India over 1500 years ago but was not widely adopted in Europe until the seventeenth century.

The real numbers R include the rational numbers and the irrational numbers; they correspond to all the points on an infinite line called the real line.

The complex numbers C are the real numbers with one new number i whose square is −1. A complex number z is a linear combination of a real number x and a real multiple iy of i

    z = x + iy.    (1.2)

Here x = Re z is the real part of z, and y = Im z is its imaginary part. One adds complex numbers by adding their real and imaginary parts

    z_1 + z_2 = x_1 + iy_1 + x_2 + iy_2 = x_1 + x_2 + i(y_1 + y_2).    (1.3)

Since i^2 = −1, the product of two complex numbers is

    z_1 z_2 = (x_1 + iy_1)(x_2 + iy_2) = x_1 x_2 − y_1 y_2 + i(x_1 y_2 + y_1 x_2).    (1.4)

The polar representation of z = x + iy is

    z = r e^{i\theta} = r(\cos\theta + i \sin\theta)    (1.5)


in which r is the modulus or absolute value of z

    r = |z| = \sqrt{x^2 + y^2}    (1.6)

and θ is its phase or argument

    \theta = \arctan(y/x).    (1.7)

Since exp(2πi) = 1, there is an inevitable ambiguity in the definition of the phase of any complex number z = r e^{iθ}: for any integer n, the phase θ + 2πn gives the same z as θ. In various computer languages, the function atan2(y, x) returns the angle θ in the interval −π < θ ≤ π for which (x, y) = r(cos θ, sin θ).

There are two common notations z^* and z̄ for the complex conjugate of a complex number z = x + iy

    z^* = \bar{z} = x − iy.    (1.8)

The square of the modulus of a complex number z = x + iy is

    |z|^2 = x^2 + y^2 = (x + iy)(x − iy) = \bar{z} z = z^* z.    (1.9)

The inverse of a complex number z = x + iy is

    z^{-1} = (x + iy)^{-1} = \frac{x − iy}{(x − iy)(x + iy)} = \frac{x − iy}{x^2 + y^2} = \frac{z^*}{z^* z} = \frac{z^*}{|z|^2}.    (1.10)

Grassmann numbers θ_i are anticommuting numbers, that is, the anticommutator of any two Grassmann numbers vanishes

    \{\theta_i, \theta_j\} \equiv [\theta_i, \theta_j]_+ \equiv \theta_i \theta_j + \theta_j \theta_i = 0.    (1.11)

So the square of any Grassmann number is zero, θ_i^2 = 0. These numbers have amusing properties (used in Chapter 20). For example, because θ_1 θ_2 = −θ_2 θ_1 and θ_1^2 = θ_2^2 = 0, the most general function of two Grassmann numbers is

    f(\theta_1, \theta_2) = a + b\,\theta_1 + c\,\theta_2 + d\,\theta_1 \theta_2    (1.12)

and 1/(1 + a θ_i) = 1 − a θ_i, in which a, b, c, d are complex numbers (Hermann Grassmann, 1809–1877).
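The arithmetic of equations (1.5)–(1.10) is easy to check numerically. Here is a short Python sketch (an illustration added in this edit, not from the book) using the standard cmath and math modules; the particular number z = 3 + 4i is an arbitrary choice:

```python
import cmath
import math

# A complex number, its modulus r = |z| (1.6), and its phase θ (1.7).
z = 3.0 + 4.0j
r = abs(z)                          # sqrt(x² + y²) = 5.0
theta = math.atan2(z.imag, z.real)  # atan2 returns θ in (−π, π]

# Polar representation (1.5): z = r e^{iθ}.
z_polar = r * cmath.exp(1j * theta)

# Conjugate (1.8), squared modulus (1.9), and inverse (1.10).
zbar = z.conjugate()                # x − iy
mod2 = (zbar * z).real              # z̄ z = x² + y² = 25.0
z_inv = zbar / mod2                 # z⁻¹ = z*/|z|²

print(r, theta, z * z_inv)          # z · z⁻¹ should be 1
```

The round trip through the polar form reproduces z, and z times its inverse is 1 up to floating-point rounding.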

1.2 Arrays

An array is an ordered set of numbers. Arrays play big roles in computer science, physics, and mathematics. They can be of any (integral) dimension.

A 1-dimensional array (a_1, a_2, ..., a_n) is variously called an n-tuple, a row vector when written horizontally, a column vector when written vertically, or an n-vector. The numbers a_k are its entries or components.

A 2-dimensional array a_{ik} with i running from 1 to n and k from 1 to m is an n × m matrix. The numbers a_{ik} are its entries, elements, or matrix elements. One can think of a matrix as a stack of row vectors or as a queue of column vectors. The entry a_{ik} is in the ith row and the kth column.

One can add together arrays of the same dimension and shape by adding their entries. Two n-tuples add as

    (a_1, \ldots, a_n) + (b_1, \ldots, b_n) = (a_1 + b_1, \ldots, a_n + b_n)    (1.13)

and two n × m matrices a and b add as

    (a + b)_{ik} = a_{ik} + b_{ik}.    (1.14)

One can multiply arrays by numbers: thus z times the 3-dimensional array a_{ijk} is the array with entries z a_{ijk}. One can multiply two arrays together no matter what their shapes and dimensions. The outer product of an n-tuple a and an m-tuple b is an n × m matrix with elements

    (a\,b)_{ik} = a_i b_k    (1.15)

or an m × n matrix with entries (b a)_{ki} = b_k a_i. If a and b are complex, then one also can form the outer products (a \bar{b})_{ik} = a_i \bar{b}_k, (\bar{b} a)_{ki} = \bar{b}_k a_i, and (b \bar{a})_{ki} = b_k \bar{a}_i. The outer product of a matrix a_{ik} and a 3-dimensional array b_{j\ell m} is a 5-dimensional array

    (a\,b)_{ikj\ell m} = a_{ik} b_{j\ell m}.    (1.16)

An inner product is possible when two arrays are of the same size in one of their dimensions. Thus the inner product (a, b) ≡ ⟨a|b⟩ or dot product a · b of two real n-tuples a and b is

    (a, b) = \langle a|b\rangle = a \cdot b = (a_1, \ldots, a_n) \cdot (b_1, \ldots, b_n) = a_1 b_1 + \cdots + a_n b_n.    (1.17)

The inner product of two complex n-tuples often is defined as

    (a, b) = \langle a|b\rangle = \bar{a} \cdot b = (\bar{a}_1, \ldots, \bar{a}_n) \cdot (b_1, \ldots, b_n) = \bar{a}_1 b_1 + \cdots + \bar{a}_n b_n    (1.18)

or as its complex conjugate

    (a, b)^* = \langle a|b\rangle^* = (\bar{a} \cdot b)^* = (b, a) = \langle b|a\rangle = \bar{b} \cdot a.    (1.19)

The inner product of a vector with itself is nonnegative, (a, a) ≥ 0.

The product of an m × n matrix a_{ik} times an n-tuple b_k is the m-tuple b′ whose ith component is

    b'_i = a_{i1} b_1 + a_{i2} b_2 + \cdots + a_{in} b_n = \sum_{k=1}^n a_{ik} b_k.    (1.20)

This product is b′ = a b in matrix notation. If the size n of the second dimension of a matrix a matches that of the first dimension of a matrix b, then their product a b is a matrix with entries

    (a\,b)_{i\ell} = a_{i1} b_{1\ell} + \cdots + a_{in} b_{n\ell} = \sum_{k=1}^n a_{ik} b_{k\ell}.    (1.21)
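These array operations are exactly what numerical libraries implement. As an illustration (added in this edit, not part of the book's text), NumPy provides the outer product (1.15), the conjugating inner product (1.18), and the matrix products (1.20–1.21) directly:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])        # a 3-tuple
b = np.array([4.0, 5.0])             # a 2-tuple

# Outer product (1.15): a 3 × 2 matrix with entries a_i b_k.
ab = np.outer(a, b)

# Inner product (1.17) of two real n-tuples.
c = np.array([6.0, 7.0, 8.0])
dot = np.dot(a, c)                   # a₁c₁ + a₂c₂ + a₃c₃ = 44

# Complex inner product (1.18): np.vdot conjugates its first argument.
u = np.array([1 + 1j, 2 - 1j])
v = np.array([3 + 0j, 1j])
inner = np.vdot(u, v)                # ā₁b₁ + ā₂b₂

# Matrix times n-tuple (1.20) and matrix times matrix (1.21).
m = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])      # a 2 × 3 matrix
print(m @ a)                         # a 2-tuple
print((m @ ab).shape)                # (2, 3) times (3, 2) gives (2, 2)
```

Note the design choice in NumPy: `np.dot` on complex vectors does not conjugate, so `np.vdot` (or an explicit `conj()`) is needed to get the physicist's inner product (1.18).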

1.3 Matrices

Matrices are 2-dimensional arrays. The trace of a square n × n matrix a is the sum of its diagonal elements

    \mathrm{Tr}\,a = \mathrm{tr}\,a = a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^n a_{ii}.    (1.22)

The trace of the product of two matrices is independent of their order

    \mathrm{Tr}(a\,b) = \sum_{i=1}^n \sum_{k=1}^n a_{ik} b_{ki} = \sum_{k=1}^n \sum_{i=1}^n b_{ki} a_{ik} = \mathrm{Tr}(b\,a)    (1.23)

as long as the matrix elements are numbers that commute with each other. It follows that the trace is cyclic

    \mathrm{Tr}(a\,b\,c \ldots z) = \mathrm{Tr}(b\,c \ldots z\,a) = \mathrm{Tr}(c \ldots z\,a\,b) = \ldots    (1.24)

The transpose of an n × ℓ matrix a is an ℓ × n matrix a^T with entries

    (a^T)_{ij} = a_{ji}.    (1.25)

Mathematicians often use a prime to mean transpose, as in a′ = a^T, but physicists tend to use primes to label different objects or to indicate differentiation. One may show that transposition inverts the order of multiplication

    (a\,b)^T = b^T a^T.    (1.26)

A matrix that is equal to its transpose

    a = a^T    (1.27)

is symmetric, a_{ij} = a_{ji}.

The (hermitian) adjoint of a matrix is the complex conjugate of its transpose. That is, the (hermitian) adjoint a† of an N × L complex matrix a is the L × N matrix with entries

    (a^\dagger)_{ij} = a^*_{ji}.    (1.28)

One may show that

    (a\,b)^\dagger = b^\dagger a^\dagger.    (1.29)

A matrix that is equal to its adjoint

    a_{ij} = (a^\dagger)_{ij} = a^*_{ji}    (1.30)

(and which must be a square matrix) is hermitian or self adjoint

    a = a^\dagger    (1.31)

(Charles Hermite 1822–1901).

Example 1.1 (The Pauli matrices) All three of Pauli's matrices

    \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \text{and} \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}    (1.32)

are hermitian (Wolfgang Pauli 1900–1958).
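The hermiticity (1.31) of the Pauli matrices and the cyclicity (1.23–1.24) of the trace can be checked in a few lines; a NumPy sketch (an illustration added in this edit, not from the book):

```python
import numpy as np

# Pauli's matrices (1.32).
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

# Hermiticity (1.31): each matrix equals its conjugate transpose.
for s in (s1, s2, s3):
    assert np.allclose(s, s.conj().T)

# Cyclicity of the trace (1.24) for a product of three matrices.
t1 = np.trace(s1 @ s2 @ s3)
t2 = np.trace(s2 @ s3 @ s1)
t3 = np.trace(s3 @ s1 @ s2)
print(t1, t2, t3)   # all three traces are equal
```

Here the common value is 2i, since σ₁σ₂σ₃ = iI for the matrices (1.32).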

A real hermitian matrix is symmetric. If a matrix a is hermitian, then the quadratic form

    \langle v|a|v\rangle = \sum_{i=1}^N \sum_{j=1}^N v_i^* a_{ij} v_j \in \mathbb{R}    (1.33)

is real for all complex n-tuples v.

The Kronecker delta δ_{ik} is defined to be unity if i = k and zero if i ≠ k

    \delta_{ik} = \begin{cases} 1 & \text{if } i = k \\ 0 & \text{if } i \ne k \end{cases}    (1.34)

(Leopold Kronecker 1823–1891). The identity matrix I has entries I_{ik} = δ_{ik}.

The inverse a^{-1} of an n × n matrix a is a square matrix that satisfies

    a^{-1} a = a\,a^{-1} = I    (1.35)

in which I is the n × n identity matrix.

So far we have been writing n-tuples and matrices and their elements with lowercase letters. It is equally common to use capital letters, and we will do so for the rest of this section.

A matrix U whose adjoint U† is its inverse

    U^\dagger U = U U^\dagger = I    (1.36)


is unitary. Unitary matrices are square.

A real unitary matrix O is orthogonal and obeys the rule

    O^T O = O O^T = I.    (1.37)

Orthogonal matrices are square. An N × N hermitian matrix A is nonnegative

    A \ge 0    (1.38)

if for all complex vectors V the quadratic form

    \langle V|A|V\rangle = \sum_{i=1}^N \sum_{j=1}^N V_i^* A_{ij} V_j \ge 0    (1.39)

is nonnegative. It is positive or positive definite if

    \langle V|A|V\rangle > 0    (1.40)

for all nonzero vectors |V⟩.

Example 1.2 (Kinds of positivity) The nonsymmetric, nonhermitian 2 × 2 matrix

    \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}    (1.41)

is positive on the space of all real 2-vectors but not on the space of all complex 2-vectors.

Example 1.3 (Representations of imaginary and grassmann numbers) The 2 × 2 matrix

    \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}    (1.42)

can represent the number i since

    \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = -I.    (1.43)

The 2 × 2 matrix

    \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}    (1.44)

can represent a Grassmann number since

    \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} = 0.    (1.45)
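The claims of Examples 1.2 and 1.3 can be tested numerically. The following NumPy sketch (an illustration added in this edit, not from the book) evaluates the quadratic form of the matrix (1.41) on a real and on a complex 2-vector, and squares the matrix (1.42):

```python
import numpy as np

# The matrix (1.41) of Example 1.2.
A = np.array([[1.0, 1.0], [-1.0, 1.0]])

# On real 2-vectors the quadratic form is x·(Ax) = x₁² + x₂² > 0 ...
x = np.array([2.0, -3.0])
qf_real = x @ A @ x              # 13.0, positive

# ... but on complex 2-vectors ⟨v|A|v⟩ need not even be real.
v = np.array([1.0, 1j])
qf_complex = np.vdot(v, A @ v)   # 2 + 2i, not real
print(qf_real, qf_complex)

# The matrix (1.42) squares to −I, so it can represent i, as in (1.43).
J = np.array([[0.0, -1.0], [1.0, 0.0]])
assert np.allclose(J @ J, -np.eye(2))
```

The real quadratic form collapses to x₁² + x₂² because the off-diagonal ±1 terms cancel; with complex components they no longer cancel, which is the point of Example 1.2.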


To represent two Grassmann numbers, one needs 4 × 4 matrices, such as

    \theta_1 = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad \theta_2 = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.    (1.46)

The matrices that represent n Grassmann numbers are 2^n × 2^n; they have 2^n rows and 2^n columns.

Example 1.4 (Fermions) The matrices (1.46) also can represent lowering or annihilation operators for a system of two fermionic states. The operators a_1 = θ_1 and a_2 = θ_2 and their adjoints a_1† and a_2†, the creation operators, satisfy the anticommutation relations

    \{a_i, a_k^\dagger\} = \delta_{ik} \quad \text{and} \quad \{a_i, a_k\} = \{a_i^\dagger, a_k^\dagger\} = 0    (1.47)

where i and k take the values 1 or 2. In particular, the relation (a_i^\dagger)^2 = 0 implements Pauli's exclusion principle, the rule that no state of a fermion can be doubly occupied.
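The anticommutation relations (1.11) and (1.47) can be verified directly from the matrices (1.46). A NumPy sketch (an illustration added in this edit, not from the book); since these matrices are real, the adjoint is just the transpose:

```python
import numpy as np

# The 4 × 4 matrices (1.46) representing two Grassmann numbers.
t1 = np.zeros((4, 4)); t1[0, 2], t1[1, 3] = 1.0, -1.0
t2 = np.zeros((4, 4)); t2[0, 1], t2[2, 3] = 1.0, 1.0

anti = lambda a, b: a @ b + b @ a   # the anticommutator {a, b}

# θ² = 0 and {θ₁, θ₂} = 0, as in (1.11).
assert np.allclose(t1 @ t1, 0) and np.allclose(t2 @ t2, 0)
assert np.allclose(anti(t1, t2), 0)

# As annihilation operators a₁ = θ₁, a₂ = θ₂ they obey (1.47):
# {a_i, a_k†} = δ_ik and {a_i, a_k} = {a_i†, a_k†} = 0.
for i, ai in enumerate((t1, t2)):
    for k, ak in enumerate((t1, t2)):
        delta = np.eye(4) if i == k else np.zeros((4, 4))
        assert np.allclose(anti(ai, ak.T), delta)   # ak.T is ak† here
print("anticommutation relations (1.47) verified")
```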

1.4 Vectors

Vectors are things that can be multiplied by numbers and added together to form other vectors in the same vector space. So if U and V are vectors in a vector space S over a set F of numbers x and y and so forth, then

    W = x\,U + y\,V    (1.48)

also is a vector in the vector space S.

A basis for a vector space S is a set B of vectors B_k for k = 1, ..., n in terms of which every vector U in S can be expressed as a linear combination

    U = u_1 B_1 + u_2 B_2 + \cdots + u_n B_n    (1.49)

with numbers u_k in F. The numbers u_k are the components of the vector U in the basis B. If the basis vectors B_k are orthonormal, that is, if their inner products are (B_k, B_ℓ) = ⟨B_k|B_ℓ⟩ = B̄_k · B_ℓ = δ_{kℓ}, then we might represent the vector U as the n-tuple (u_1, u_2, ..., u_n) with u_k = ⟨B_k|U⟩ or as the corresponding column vector.

Example 1.5 (Hardware store) Suppose the vector W represents a certain kind of washer and the vector N represents a certain kind of nail. Then if n and m are natural numbers, the vector

    H = n\,W + m\,N    (1.50)

would represent a possible inventory of a very simple hardware store. The vector space of all such vectors H would include all possible inventories of the store. That space is a 2-dimensional vector space over the natural numbers, and the two vectors W and N form a basis for it.

Example 1.6 (Complex numbers) The complex numbers are a vector space. Two of its vectors are the number 1 and the number i; the vector space of complex numbers is then the set of all linear combinations

    z = x\,1 + y\,i = x + iy.    (1.51)

The complex numbers are a 2-dimensional vector space over the real numbers, and the vectors 1 and i are a basis for it.

The complex numbers also form a 1-dimensional vector space over the complex numbers. Here any nonzero real or complex number, for instance the number 1, can be a basis consisting of the single vector 1. This 1-dimensional vector space is the set of all z = z1 for arbitrary complex z.

Example 1.7 (2-space) Ordinary flat 2-dimensional space is the set of all linear combinations

    r = x\,\hat{x} + y\,\hat{y}    (1.52)

in which x and y are real numbers and x̂ and ŷ are perpendicular vectors of unit length (unit vectors with x̂ · x̂ = 1 = ŷ · ŷ and x̂ · ŷ = 0). This vector space, called R², is a 2-d space over the reals. The vector r can be described by the basis vectors x̂ and ŷ and also by any other set of basis vectors, such as −ŷ and x̂

    r = x\,\hat{x} + y\,\hat{y} = -y\,(-\hat{y}) + x\,\hat{x}.    (1.53)

The components of the vector r are (x, y) in the {x̂, ŷ} basis and (−y, x) in the {−ŷ, x̂} basis. Each vector is unique, but its components depend upon the basis.

Example 1.8 (3-space) Ordinary flat 3-dimensional space is the set of all linear combinations

    r = x\,\hat{x} + y\,\hat{y} + z\,\hat{z}    (1.54)

in which x, y, and z are real numbers. It is a 3-d space over the reals.

Example 1.9 (Matrices) Arrays of a given dimension and size can be added and multiplied by numbers, and so they form a vector space. For instance, all complex 3-dimensional arrays a_{ijk} in which 1 ≤ i ≤ 3, 1 ≤ j ≤ 4, and 1 ≤ k ≤ 5 form a vector space over the complex numbers.

Example 1.10 (Partial derivatives) Derivatives are vectors; so are partial derivatives. For instance, the linear combinations of x and y partial derivatives taken at x = y = 0

    a\,\frac{\partial}{\partial x} + b\,\frac{\partial}{\partial y}    (1.55)

form a vector space.

Example 1.11 (Functions) The space of all linear combinations of a set of functions f_i(x) defined on an interval [a, b]

    f(x) = \sum_i z_i f_i(x)    (1.56)

is a vector space over the natural N, real R, or complex C numbers {z_i}.

Example 1.12 (States in quantum mechanics) In quantum mechanics, if the properties of a system have been measured as completely as possible, then the system (or our knowledge of it) is said to be in a state, often called a pure state, and is represented by a vector ψ or |ψ⟩ in Dirac's notation. If the properties of a system have not been measured as completely as possible, then the system (or our knowledge of it) is said to be in a mixture or a mixed state, and is represented by a density operator (Section 1.35). If c_1 and c_2 are complex numbers, and |ψ_1⟩ and |ψ_2⟩ are any two states, then the linear combination

    |\psi\rangle = c_1 |\psi_1\rangle + c_2 |\psi_2\rangle    (1.57)

also is a possible state of the system. A harmonic oscillator in its kth excited state is in a state described by a vector |k⟩. A particle exactly at position q is in a state described by a vector |q⟩. An electron moving with momentum p and spin σ is in a state represented by a vector |p, σ⟩. A hydrogen atom at rest in its ground state is in a state |E_0⟩.

Example 1.13 (Polarization of photons and gravitons) The general state of a photon of momentum k⃗ is one of elliptical polarization

    |\vec{k}, \theta, \phi\rangle = \cos\theta\, e^{i\phi} |\vec{k}, +\rangle + \sin\theta\, e^{-i\phi} |\vec{k}, -\rangle    (1.58)

in which the states of positive and negative helicity |k⃗, ±⟩ represent a photon whose angular momentum ±ℏ is parallel or antiparallel to its momentum k⃗. If θ = π/4 + nπ, the polarization is linear, and the electric field is parallel to an axis that depends upon φ and is perpendicular to k⃗.

The general state of a graviton of momentum k⃗ also is one of elliptical polarization (1.58), but now the states of positive and negative helicity |k⃗, ±⟩ have angular momentum ±2ℏ parallel or antiparallel to the momentum k⃗. Linear polarization again is θ = π/4 + nπ. The state |k⃗, +⟩ represents space being stretched and squeezed along one axis while being squeezed and stretched along another axis, both axes perpendicular to each other and to k⃗. In the state |k⃗, ×⟩, the stretching and squeezing axes are rotated by 45° about k⃗ relative to those of |k⃗, +⟩.


1.5 Linear Operators

A linear operator A maps each vector V in its domain into a vector V′ = A(V) ≡ AV in its range in a way that is linear. So if V and W are two vectors in its domain and b and c are numbers, then

    A(bV + cW) = b\,A(V) + c\,A(W) = b\,AV + c\,AW.    (1.59)

If the domain and the range are the same vector space S, then A maps each basis vector B_i of S into a linear combination of the basis vectors B_k

    A B_i = a_{1i} B_1 + a_{2i} B_2 + \cdots + a_{ni} B_n = \sum_{k=1}^n a_{ki} B_k    (1.60)

a formula that is clearer in Dirac's notation (Section 1.12). The square matrix a_{ki} represents the linear operator A in the B_k basis. The effect of A on any vector V = u_1 B_1 + u_2 B_2 + · · · + u_n B_n in S then is

    A V = A \sum_{i=1}^n u_i B_i = \sum_{i=1}^n u_i\,A B_i = \sum_{i,k=1}^n u_i a_{ki} B_k = \sum_{i,k=1}^n a_{ki} u_i B_k.    (1.61)

So the kth component u′_k of the vector V′ = AV is

    u'_k = a_{k1} u_1 + a_{k2} u_2 + \cdots + a_{kn} u_n = \sum_{i=1}^n a_{ki} u_i.    (1.62)

Thus the column vector u′ of the components u′_k of the vector V′ = AV is the product u′ = a u of the matrix with elements a_{ki} that represents the linear operator A in the B_k basis and the column vector with components u_i that represents the vector V in that basis. In each basis, vectors and linear operators are represented by column vectors and matrices.

Each linear operator is unique, but its matrix depends upon the basis. If we change from the B_k basis to another basis B′_i

    B'_i = \sum_{k=1}^n u_{ki} B_k    (1.63)

in which the n × n matrix u_{ki} has an inverse matrix u^{-1}_{ki} so that

    \sum_{k=1}^n u^{-1}_{ki} B'_k = \sum_{k=1}^n u^{-1}_{ki} \sum_{\ell=1}^n u_{\ell k} B_\ell = \sum_{\ell=1}^n \left( \sum_{k=1}^n u_{\ell k} u^{-1}_{ki} \right) B_\ell = \sum_{\ell=1}^n \delta_{\ell i} B_\ell = B_i    (1.64)


then the old basis vectors B_i are given by

    B_i = \sum_{k=1}^n u^{-1}_{ki} B'_k.    (1.65)

Thus (Exercise 1.9) the linear operator A maps the basis vector B′_i to

    A B'_i = \sum_{k=1}^n u_{ki}\,A B_k = \sum_{j,k=1}^n u_{ki} a_{jk} B_j = \sum_{j,k,\ell=1}^n u_{ki} a_{jk} u^{-1}_{\ell j} B'_\ell.    (1.66)

So the matrix a′ that represents A in the B′ basis is related to the matrix a that represents it in the B basis by a similarity transformation

    a'_{\ell i} = \sum_{j,k=1}^n u^{-1}_{\ell j} a_{jk} u_{ki} \quad \text{or} \quad a' = u^{-1} a\,u    (1.67)

in matrix notation. If the matrix u is unitary, then its inverse is its hermitian adjoint

    u^{-1} = u^\dagger    (1.68)

and the similarity transformation (1.67) is

    a'_{\ell i} = \sum_{j,k=1}^n u^\dagger_{\ell j} a_{jk} u_{ki} = \sum_{j,k=1}^n u^*_{j\ell} a_{jk} u_{ki} \quad \text{or} \quad a' = u^\dagger a\,u.    (1.69)

Because traces are cyclic, they are invariant under similarity transformations

    \mathrm{Tr}(a') = \mathrm{Tr}(u^{-1} a\,u) = \mathrm{Tr}(a\,u\,u^{-1}) = \mathrm{Tr}(a).    (1.70)

Example 1.14 (Change of basis) Let the action of the linear operator A on the basis vectors {B_1, B_2} be A B_1 = B_2 and A B_2 = 0. If the column vectors

    b_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{and} \quad b_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}    (1.71)

represent the basis vectors B_1 and B_2, then the matrix

    a = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}    (1.72)

represents the linear operator A. But if we use the basis vectors

    B'_1 = \frac{1}{\sqrt{2}}\,(B_1 + B_2) \quad \text{and} \quad B'_2 = \frac{1}{\sqrt{2}}\,(B_1 - B_2)    (1.73)

then the vectors

    b'_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad \text{and} \quad b'_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}    (1.74)

would represent B′_1 and B′_2, and the matrix

    a' = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ -1 & -1 \end{pmatrix}    (1.75)

would represent the linear operator A (Exercise 1.10).
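Example 1.14 can be reproduced numerically: the columns of the change-of-basis matrix u are the vectors (1.74), and the similarity transformation (1.67) turns the matrix (1.72) into the matrix (1.75). A NumPy sketch (an illustration added in this edit, not from the book):

```python
import numpy as np

# The matrix (1.72) of A in the B basis: A B₁ = B₂, A B₂ = 0.
a = np.array([[0.0, 0.0], [1.0, 0.0]])

# Columns of u are the new basis vectors (1.74) in the old basis.
u = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

# Similarity transformation (1.67): a' = u⁻¹ a u.
a_prime = np.linalg.inv(u) @ a @ u
print(a_prime)            # ½ [[1, 1], [−1, −1]], matching (1.75)

# The trace is invariant under the transformation, as in (1.70).
assert np.isclose(np.trace(a_prime), np.trace(a))
```

Here u happens to be real, symmetric, and orthogonal, so u⁻¹ = uᵀ = u; `np.linalg.inv` is used anyway to mirror the general formula (1.67).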

A linear operator A also may map a vector space S with basis B_k into a different vector space T with its own basis C_k

    A B_i = \sum_{k=1}^M a_{ki} C_k.    (1.76)

It then maps an arbitrary vector V = u_1 B_1 + · · · + u_n B_n in S into the vector

    A V = \sum_{k=1}^M \left( \sum_{i=1}^n a_{ki} u_i \right) C_k    (1.77)

in T.

1.6 Inner Products

Most of the vector spaces used by physicists have an inner product. A positive-definite inner product associates a number (f, g) with every ordered pair of vectors f and g in the vector space V and obeys the rules

    (f, g) = (g, f)^*    (1.78)
    (f, z\,g + w\,h) = z\,(f, g) + w\,(f, h)    (1.79)
    (f, f) \ge 0 \quad \text{and} \quad (f, f) = 0 \iff f = 0    (1.80)

in which f, g, and h are vectors, and z and w are numbers. The first rule says that the inner product is hermitian; the second rule says that it is linear in the second vector z g + w h of the pair; and the third rule says that it is positive definite. The first two rules imply that (Exercise 1.11) the inner product is antilinear in the first vector of the pair

    (z\,g + w\,h, f) = z^*\,(g, f) + w^*\,(h, f).    (1.81)

A Schwarz inner product obeys the first two rules (1.78, 1.79) for an inner product and the fourth (1.81) but only the first part of the third (1.80)

    (f, f) \ge 0.    (1.82)

This condition of nonnegativity implies (Exercise 1.15) that a vector f of zero length must be orthogonal to all vectors g in the vector space V

    (f, f) = 0 \implies (g, f) = 0 \text{ for all } g \in V.    (1.83)


So a Schwarz inner product is almost positive definite.

Inner products of 4-vectors can be negative. To accommodate them we define an indefinite inner product without regard to positivity as one that satisfies the first two rules (1.78 and 1.79) and therefore also the fourth rule (1.81) and that instead of being positive definite is nondegenerate

    (f, g) = 0 \text{ for all } f \in V \implies g = 0.    (1.84)

This rule says that only the zero vector is orthogonal to all the vectors of the space. The positive-definite condition (1.80) is stronger than and implies nondegeneracy (1.84) (Exercise 1.14).

Apart from the indefinite inner products of 4-vectors in special and general relativity, most of the inner products physicists use are Schwarz inner products or positive-definite inner products. For such inner products, we can define the norm |f| = ‖f‖ of a vector f as the square root of the nonnegative inner product (f, f)

    \|f\| = \sqrt{(f, f)}.    (1.85)

A vector f̂ = f/‖f‖ has unit norm and is said to be normalized. Two measures of the distance between two normalized vectors f and g are the norm of their difference and the Bures distance

    D(f, g) = \|f - g\| \quad \text{and} \quad D_B(f, g) = \arccos(|(f, g)|).    (1.86)
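Both distances (1.86) are easy to compute. A NumPy sketch (an illustration added in this edit, not from the book) for two arbitrarily chosen normalized complex 2-vectors:

```python
import numpy as np

# Two normalized complex vectors.
f = np.array([1.0, 1j]) / np.sqrt(2)
g = np.array([1.0, 0.0 + 0j])

# The norm (1.85): ‖v‖ = √(v, v), with np.vdot conjugating the first slot.
norm = lambda v: np.sqrt(np.vdot(v, v).real)
assert np.isclose(norm(f), 1.0) and np.isclose(norm(g), 1.0)

# The two distances of (1.86).
D = norm(f - g)                       # norm of the difference
DB = np.arccos(abs(np.vdot(f, g)))    # Bures distance
print(D, DB)
```

For this pair |(f, g)| = 1/√2, so the Bures distance is arccos(1/√2) = π/4.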

Example 1.15 (Euclidean space) The space of real vectors U, V with n components U_i, V_i forms an n-dimensional vector space over the real numbers with an inner product

    (U, V) = \sum_{i=1}^n U_i V_i    (1.87)

that is nonnegative when the two vectors are the same

    (U, U) = \sum_{i=1}^n U_i U_i = \sum_{i=1}^n U_i^2 \ge 0    (1.88)

and vanishes only if all the components U_i are zero, that is, if the vector U = 0. Thus the inner product (1.87) is positive definite. When (U, V) is zero, the vectors U and V are orthogonal.

Example 1.16 (Complex Euclidean space) The space of complex vectors with n components U_i, V_i forms an n-dimensional vector space over the complex numbers with inner product

    (U, V) = \sum_{i=1}^n U_i^* V_i = (V, U)^*.    (1.89)


The inner product (U, U) is nonnegative and vanishes

    (U, U) = \sum_{i=1}^n U_i^* U_i = \sum_{i=1}^n |U_i|^2 \ge 0    (1.90)

only if U = 0. So the inner product (1.89) is positive definite. If (U, V) is zero, then U and V are orthogonal.

Example 1.17 (Complex matrices) For the vector space of n × m complex matrices A, B, ..., the trace of the adjoint (1.28) of A multiplied by B is an inner product

    (A, B) = \mathrm{Tr}\,A^\dagger B = \sum_{i=1}^n \sum_{j=1}^m (A^\dagger)_{ji} B_{ij} = \sum_{i=1}^n \sum_{j=1}^m A^*_{ij} B_{ij}    (1.91)

that is nonnegative when the matrices are the same

    (A, A) = \mathrm{Tr}\,A^\dagger A = \sum_{i=1}^n \sum_{j=1}^m A^*_{ij} A_{ij} = \sum_{i=1}^n \sum_{j=1}^m |A_{ij}|^2 \ge 0    (1.92)

and zero only when A = 0. So this inner product is positive definite.

A vector space with a positive-definite inner product (1.78–1.81) is called an inner-product space, a metric space, or a pre-Hilbert space.

A sequence of vectors f_n is a Cauchy sequence if for every ε > 0 there is an integer N(ε) such that ‖f_n − f_m‖ < ε whenever both n and m exceed N(ε). A sequence of vectors f_n converges to a vector f if for every ε > 0 there is an integer N(ε) such that ‖f − f_n‖ < ε whenever n exceeds N(ε). An inner-product space with a norm defined as in (1.85) is complete if each of its Cauchy sequences converges to a vector in that space. A Hilbert space is a complete inner-product space. Every finite-dimensional inner-product space is complete and so is a Hilbert space. An infinite-dimensional complete inner-product space, such as the space of all square-integrable functions, also is a Hilbert space (David Hilbert, 1862–1943).

Example 1.18 (Hilbert space of square-integrable functions) For the vector space of functions (1.56), a natural inner product is

    (f, g) = \int_a^b dx\, f^*(x)\, g(x).    (1.93)

The squared norm ‖f‖ of a function f(x) is

    \|f\|^2 = \int_a^b dx\, |f(x)|^2.    (1.94)
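The inner product (1.93) and the squared norm (1.94) can be approximated by discretizing the interval. A Python sketch (an illustration added in this edit, not from the book, with the arbitrary choice f(x) = x on [0, 1]):

```python
import numpy as np

# Discretize [0, 1] and approximate the integral (1.93) by a Riemann sum.
x = np.linspace(0.0, 1.0, 200001)
dx = x[1] - x[0]
inner = lambda f, g: np.sum(np.conj(f) * g) * dx

f = x                      # sampled values of f(x) = x
norm2 = inner(f, f).real   # ‖f‖² ≈ ∫₀¹ x² dx = 1/3, as in (1.94)
print(norm2)
```

The sum converges to the integral as dx shrinks; with this grid the error is of order dx.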


A function is square integrable if its norm is finite. The space of all square-integrable functions is an inner-product space; it also is complete and so is a Hilbert space.

Example 1.19 (Minkowski inner product) The Minkowski or Lorentz inner product (p, x) of two 4-vectors p = (E/c, p_1, p_2, p_3) and x = (ct, x_1, x_2, x_3) is p⃗ · x⃗ − Et. It is indefinite, nondegenerate (1.84), and invariant under Lorentz transformations, and often is written as p · x or as p x. If p is the 4-momentum of a freely moving physical particle of mass m, then

    p \cdot p = \vec{p} \cdot \vec{p} - E^2/c^2 = -c^2 m^2 \le 0.    (1.95)

The Minkowski inner product satisfies the rules (1.78, 1.79, and 1.84), but it is not positive definite, and it does not satisfy the Schwarz inequality (Hermann Minkowski 1864–1909, Hendrik Lorentz 1853–1928).

Example 1.20 (Inner products in quantum mechanics) The probability P(φ|ψ) that a system in the state |ψ⟩ will be measured to be in the state |φ⟩ is the absolute value squared of the inner product ⟨φ|ψ⟩ divided by the squared norms of the two states

    P(\phi|\psi) = \frac{|\langle\phi|\psi\rangle|^2}{\langle\phi|\phi\rangle \langle\psi|\psi\rangle}.    (1.96)

If the two states are normalized, then the probability is just the absolute value squared of their inner product, P(φ|ψ) = |⟨φ|ψ⟩|².

1.7 Cauchy–Schwarz Inequalities

For any two vectors f and g, the Schwarz inequality

    (f, f)(g, g) ≥ |(f, g)|²                                        (1.97)

holds for any Schwarz inner product (and so for any positive-definite inner product). The condition (1.82) of nonnegativity ensures that for any complex number λ the inner product of the vector f − λg with itself is nonnegative

    (f − λg, f − λg) = (f, f) − λ*(g, f) − λ(f, g) + |λ|²(g, g) ≥ 0.   (1.98)

Now if (g, g) = 0, then for (f − λg, f − λg) to remain nonnegative for all complex values of λ it is necessary that (f, g) = 0 also vanish (Exercise 1.15). Thus if (g, g) = 0, then the Schwarz inequality (1.97) is trivially true because both sides of it vanish. So we assume that (g, g) > 0 and set λ = (g, f)/(g, g). The inequality (1.98) then gives us

    (f − λg, f − λg) = (f, f) − (f, g)(g, f)/(g, g) ≥ 0

which is the Schwarz inequality (1.97)

    (f, f)(g, g) ≥ |(f, g)|².                                       (1.99)

Taking the square root of each side, we have

    ‖f‖ ‖g‖ ≥ |(f, g)|                                              (1.100)

(Hermann Schwarz 1843–1921).

Example 1.21 (Some Schwarz inequalities) For the dot product of two real 3-vectors r and R, the Cauchy–Schwarz inequality is

    (r · r)(R · R) ≥ (r · R)² = (r · r)(R · R) cos²θ                (1.101)

where θ is the angle between r and R. The Schwarz inequality for two real n-vectors x and y is

    (x · x)(y · y) ≥ (x · y)² = (x · x)(y · y) cos²θ                (1.102)

and it implies (Exercise 1.16) that

    ‖x‖ + ‖y‖ ≥ ‖x + y‖.                                            (1.103)

For two complex n-vectors u and v, the Schwarz inequality is

    (u* · u)(v* · v) ≥ |u* · v|² = (u* · u)(v* · v) cos²θ           (1.104)

and it implies (Exercise 1.17) that

    ‖u‖ + ‖v‖ ≥ ‖u + v‖.                                            (1.105)

The inner product (1.93) of two complex functions f and g provides another example

    ∫_a^b dx |f(x)|² ∫_a^b dx |g(x)|² ≥ | ∫_a^b dx f*(x) g(x) |²     (1.106)

of the Schwarz inequality.
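The complex-vector inequality (1.104) and the triangle inequality (1.105) it implies can be checked numerically. This sketch uses NumPy with randomly drawn vectors; the seed and dimension are arbitrary choices, not from the text.

```python
import numpy as np

# two random complex 5-vectors (illustrative; any vectors work)
rng = np.random.default_rng(0)
u = rng.standard_normal(5) + 1j * rng.standard_normal(5)
v = rng.standard_normal(5) + 1j * rng.standard_normal(5)

lhs = np.vdot(u, u).real * np.vdot(v, v).real   # (u* . u)(v* . v)
rhs = abs(np.vdot(u, v)) ** 2                   # |u* . v|^2 of (1.104)
```

Equality holds only when one vector is a multiple of the other, so for generic random vectors the inequality is strict.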

1.8 Linear Independence and Completeness

A set of n vectors V₁, V₂, . . . , Vₙ is linearly dependent if there exist numbers cᵢ, not all zero, such that the linear combination

    c₁V₁ + · · · + cₙVₙ = 0                                          (1.107)

vanishes. A set of vectors is linearly independent if it is not linearly dependent.


A set {Vᵢ} of linearly independent vectors is maximal in a vector space S if the addition of any other vector U in S to the set {Vᵢ} makes the enlarged set {U, Vᵢ} linearly dependent. A set of n linearly independent vectors V₁, V₂, . . . , Vₙ that is maximal in a vector space S can represent any vector U in the space S as a linear combination of its vectors, U = u₁V₁ + · · · + uₙVₙ. For if we enlarge the maximal set {Vᵢ} by including in it any vector U not already in it, then the bigger set {U, Vᵢ} will be linearly dependent. Thus there will be numbers c₀, c₁, . . . , cₙ, not all zero, that make the sum

    c₀U + c₁V₁ + · · · + cₙVₙ = 0                                    (1.108)

vanish. Now if c₀ were 0, then the set {Vᵢ} would be linearly dependent. Thus c₀ ≠ 0, and so we may divide by c₀ and express the arbitrary vector U as a linear combination of the vectors Vᵢ

    U = −(1/c₀)(c₁V₁ + · · · + cₙVₙ) = u₁V₁ + · · · + uₙVₙ           (1.109)

with uₖ = −cₖ/c₀. Thus a set of linearly independent vectors {Vᵢ} that is maximal in a space S can represent every vector U in S as a linear combination U = u₁V₁ + · · · + uₙVₙ of its vectors. Such a set {Vᵢ} of linearly independent vectors that is maximal in a space S is called a basis for S; it spans S; it is a complete set of vectors in S.

1.9 Dimension of a Vector Space

If V₁, . . . , Vₙ and W₁, . . . , Wₘ are any two bases for a vector space S, then n = m. To see why, suppose that the n vectors C₁, C₂, . . . , Cₙ are complete in a vector space S, and that the m vectors L₁, L₂, . . . , Lₘ in S are linearly independent (Halmos, 1958, sec. 1.8). Since the C's are complete, the set of vectors Lₘ, C₁, . . . , Cₙ is linearly dependent. So we can omit one of the C's, and the remaining set Lₘ, C₁, . . . , Cᵢ₋₁, Cᵢ₊₁, . . . , Cₙ still spans S. Repeating this argument, we find that the vectors

    Lₘ₋₁, Lₘ, C₁, . . . , Cᵢ₋₁, Cᵢ₊₁, . . . , Cₙ                      (1.110)

are linearly dependent, and that the vectors

    Lₘ₋₁, Lₘ, C₁, . . . , Cᵢ₋₁, Cᵢ₊₁, . . . , Cⱼ₋₁, Cⱼ₊₁, . . . , Cₙ   (1.111)

still span S. We continue to repeat these steps until we run out of L's or C's. If n were less than m, then we'd end up with a set of vectors Lₖ, . . . , Lₘ that would be complete, and therefore each of the vectors L₁, . . . , Lₖ₋₁ would have to be a linear combination of the vectors Lₖ, . . . , Lₘ. But the L's by assumption are linearly independent. So n ≥ m. Thus if both the C's and the L's are bases for the same space S, and so are both complete and linearly independent in it, then both n ≥ m and m ≥ n. So all the bases of a vector space consist of the same number of vectors. This number is the dimension of the space. The steps of the above demonstration stop for n = m when the m linearly independent L's have replaced the n complete C's, leaving us with n = m linearly independent L's that are complete. Thus in a vector space of n dimensions, every set of n linearly independent vectors is complete and so forms a basis for the space.

1.10 Orthonormal Vectors

Suppose the vectors V₁, V₂, . . . , Vₙ are linearly independent. Then we can make out of them a set of n vectors Uᵢ that are orthonormal

    (Uᵢ, Uⱼ) = δᵢⱼ.                                                  (1.112)

There are many ways to do this, because there are many such sets of orthonormal vectors. We will use the Gram–Schmidt method. We set

    U₁ = V₁ / √(V₁, V₁)                                              (1.113)

so the first vector U₁ is normalized. Next we set u₂ = V₂ + c₁₂U₁ and require that u₂ be orthogonal to U₁

    0 = (U₁, u₂) = (U₁, c₁₂U₁ + V₂) = c₁₂ + (U₁, V₂).                (1.114)

Thus c₁₂ = −(U₁, V₂), and so

    u₂ = V₂ − (U₁, V₂) U₁.                                           (1.115)

The normalized vector U₂ then is

    U₂ = u₂ / √(u₂, u₂).                                             (1.116)

We next set u₃ = V₃ + c₁₃U₁ + c₂₃U₂ and ask that u₃ be orthogonal to U₁

    0 = (U₁, u₃) = (U₁, c₁₃U₁ + c₂₃U₂ + V₃) = c₁₃ + (U₁, V₃)          (1.117)

and also to U₂

    0 = (U₂, u₃) = (U₂, c₁₃U₁ + c₂₃U₂ + V₃) = c₂₃ + (U₂, V₃).         (1.118)

So c₁₃ = −(U₁, V₃) and c₂₃ = −(U₂, V₃), and we have

    u₃ = V₃ − (U₁, V₃) U₁ − (U₂, V₃) U₂.                              (1.119)

The normalized vector U₃ then is

    U₃ = u₃ / √(u₃, u₃).                                              (1.120)

We may continue in this way until we reach the last of the n linearly independent vectors. We require the kth unnormalized vector uₖ

    uₖ = Vₖ + Σ_{i=1}^{k−1} c_{ik} Uᵢ                                 (1.121)

to be orthogonal to the k − 1 vectors Uᵢ and find that c_{ik} = −(Uᵢ, Vₖ), so that

    uₖ = Vₖ − Σ_{i=1}^{k−1} (Uᵢ, Vₖ) Uᵢ.                              (1.122)

The normalized vector then is

    Uₖ = uₖ / √(uₖ, uₖ).                                              (1.123)
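The recursion (1.121–1.123) can be sketched in a few lines of NumPy (an illustration, not part of the text); the particular input vectors below are hypothetical values chosen to be linearly independent.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize the columns of `vectors` via (1.121)-(1.123)."""
    basis = []
    for v in vectors.T:
        u = v.astype(complex)
        for q in basis:
            u = u - q * np.vdot(q, u)          # subtract (U_i, V_k) U_i
        basis.append(u / np.sqrt(np.vdot(u, u).real))
    return np.column_stack(basis)

# three linearly independent complex 3-vectors as columns (illustrative)
V = np.array([[1.0, 1.0, 0.0],
              [1j,  0.0, 1.0],
              [0.0, 1.0, 2.0]])
U = gram_schmidt(V)
```

The columns of the result satisfy (Uᵢ, Uⱼ) = δᵢⱼ, i.e. U†U is the identity matrix.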

A basis is more convenient if its vectors are orthonormal.

1.11 Outer Products

From any two vectors f and g, we may make an outer-product operator A that maps any vector h into the vector f multiplied by the inner product (g, h)

    A h = f (g, h) = (g, h) f.                                        (1.124)

The operator A is linear because for any vectors e, h and numbers z, w

    A(zh + we) = (g, zh + we) f = z(g, h) f + w(g, e) f = z A h + w A e.   (1.125)

If f, g, and h are vectors with components fᵢ, gᵢ, and hᵢ in some basis, then the linear transformation is

    (A h)ᵢ = Σ_{j=1}^n Aᵢⱼ hⱼ = fᵢ Σ_{j=1}^n gⱼ* hⱼ                    (1.126)

and in that basis A is the matrix with entries

    Aᵢⱼ = fᵢ gⱼ*.                                                     (1.127)

It is the outer product of the vectors f and g*. The outer product of g and f* is different, Bᵢⱼ = gᵢ fⱼ*.


Example 1.22 (Outer product) If in some basis the vectors f and g are

    f = ⎛ 2 ⎞                 ⎛ i ⎞
        ⎝ 3i⎠    and    g =   ⎜ 1 ⎟                                    (1.128)
                              ⎝ 3i⎠

then their outer products are the matrices

    A = ⎛ 2 ⎞ (−i  1  −3i) = ⎛−2i   2  −6i⎞
        ⎝ 3i⎠                ⎝ 3   3i    9⎠                            (1.129)

and

    B = ⎛ i ⎞ (2  −3i) = ⎛2i    3⎞
        ⎜ 1 ⎟            ⎜ 2  −3i⎟
        ⎝ 3i⎠            ⎝6i    9⎠.                                    (1.130)
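The matrices of Example 1.22 follow from the rule Aᵢⱼ = fᵢgⱼ* of (1.127); a NumPy check (not part of the text) is:

```python
import numpy as np

f = np.array([2, 3j])          # the 2-vector f of (1.128)
g = np.array([1j, 1, 3j])      # the 3-vector g of (1.128)

A = np.outer(f, g.conj())      # A_ij = f_i g_j*, equation (1.127)
B = np.outer(g, f.conj())      # B_ij = g_i f_j*
```

Note that `np.outer` does not conjugate its second argument, so the conjugation of (1.127) must be done explicitly.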

Example 1.23 (Dirac's outer products) Dirac's notation for outer products is neat. If the vectors f = |f⟩ and g = |g⟩ are

    |f⟩ = ⎛ a ⎞                  ⎛ z ⎞
          ⎜ b ⎟    and    |g⟩ =  ⎝ w ⎠                                 (1.131)
          ⎝ c ⎠

then their outer products are

    |f⟩⟨g| = ⎛ az*  aw* ⎞                ⎛ za*  zb*  zc* ⎞
             ⎜ bz*  bw* ⎟   and  |g⟩⟨f| = ⎝ wa*  wb*  wc* ⎠            (1.132)
             ⎝ cz*  cw* ⎠

as well as

    |f⟩⟨f| = ⎛ aa*  ab*  ac* ⎞                ⎛ zz*  zw* ⎞
             ⎜ ba*  bb*  bc* ⎟   and  |g⟩⟨g| = ⎝ wz*  ww* ⎠.           (1.133)
             ⎝ ca*  cb*  cc* ⎠

1.12 Dirac Notation

Outer products are important in quantum mechanics, and so Dirac invented a notation for linear algebra that makes them easy to write. In his notation, a vector f is a ket f = |f⟩. The new thing in his notation is the bra ⟨g|. The inner product of two vectors (g, f) is the bracket (g, f) = ⟨g|f⟩. A matrix element (g, cf) of an operator c then is (g, cf) = ⟨g|c|f⟩ in which the bra and ket bracket the operator c. In Dirac notation, an outer product like (1.124) A h = (g, h) f = f (g, h) reads A|h⟩ = |f⟩⟨g|h⟩, and the outer product A itself is A = |f⟩⟨g|.


The bra ⟨g| is the adjoint of the ket |g⟩, and the ket |f⟩ is the adjoint of the bra ⟨f|

    ⟨g| = (|g⟩)†  and  |f⟩ = (⟨f|)†,  so  ⟨g|†† = ⟨g|  and  |f⟩†† = |f⟩.   (1.134)

The adjoint of an outer product is

    (z |f⟩⟨g|)† = z* |g⟩⟨f|.                                          (1.135)

In Dirac's notation, the most general linear operator is an arbitrary linear combination of outer products

    A = Σ_{k,ℓ} z_{kℓ} |k⟩⟨ℓ|.                                        (1.136)

Its adjoint is

    A† = Σ_{k,ℓ} z*_{kℓ} |ℓ⟩⟨k|.                                      (1.137)

The adjoint of a ket |h⟩ = A|f⟩ is

    (|h⟩)† = (A|f⟩)† = ( Σ_{k,ℓ} z_{kℓ} |k⟩⟨ℓ|f⟩ )† = Σ_{k,ℓ} z*_{kℓ} ⟨f|ℓ⟩⟨k| = ⟨f| A†.   (1.138)

Before Dirac, bras were implicit in the definition of the inner product, but they did not appear explicitly; there was no simple way to write the bra ⟨g| or the outer product |f⟩⟨g|. If the kets |k⟩ form an orthonormal basis in an n-dimensional vector space, then we can expand an arbitrary ket in the space as

    |f⟩ = Σ_{k=1}^n cₖ |k⟩.                                           (1.139)

Since the basis vectors are orthonormal, ⟨ℓ|k⟩ = δ_{ℓk}, we can identify the coefficients cₖ by forming the inner product

    ⟨ℓ|f⟩ = Σ_{k=1}^n cₖ ⟨ℓ|k⟩ = Σ_{k=1}^n cₖ δ_{ℓk} = c_ℓ.            (1.140)

The original expansion (1.139) then must be

    |f⟩ = Σ_{k=1}^n cₖ |k⟩ = Σ_{k=1}^n ⟨k|f⟩ |k⟩ = Σ_{k=1}^n |k⟩⟨k|f⟩ = ( Σ_{k=1}^n |k⟩⟨k| ) |f⟩.   (1.141)


Since this equation must hold for every vector |f⟩ in the space, it follows that the sum of outer products within the parentheses is the identity operator for the space

    I = Σ_{k=1}^n |k⟩⟨k|.                                             (1.142)

Every set of kets |αⱼ⟩ that forms an orthonormal basis, ⟨αⱼ|α_ℓ⟩ = δ_{jℓ}, for the space gives us an equivalent representation of the identity operator

    I = Σ_{j=1}^n |αⱼ⟩⟨αⱼ| = Σ_{k=1}^n |k⟩⟨k|.                        (1.143)

These resolutions of the identity operator give every vector |f⟩ in the space the expansions

    |f⟩ = Σ_{j=1}^n |αⱼ⟩⟨αⱼ|f⟩ = Σ_{k=1}^n |k⟩⟨k|f⟩.                  (1.144)
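The completeness relation (1.142) says that summing the outer products |k⟩⟨k| over any orthonormal basis gives the identity. A NumPy sketch (illustrative, not from the text) builds a random orthonormal basis with a QR factorization and checks this:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(M)          # the columns of Q form an orthonormal basis |k>

# sum of the outer products |k><k| over the basis, as in (1.142)
ident = sum(np.outer(Q[:, k], Q[:, k].conj()) for k in range(4))
```

Any other orthonormal basis of the same space would give the same sum, which is the content of (1.143).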

Example 1.24 (Linear operators represented as matrices) The equations (1.60–1.67) that relate linear operators to the matrices that represent them are much clearer in Dirac's notation. If the kets |Bₖ⟩ are n orthonormal basis vectors, that is, if ⟨Bₖ|B_ℓ⟩ = δ_{kℓ}, for a vector space S, then a linear operator A acting on S maps the basis vector |Bᵢ⟩ into (1.60)

    A|Bᵢ⟩ = Σ_{k=1}^n |Bₖ⟩⟨Bₖ|A|Bᵢ⟩ = Σ_{k=1}^n a_{ki} |Bₖ⟩,          (1.145)

and the matrix that represents the linear operator A in the |Bₖ⟩ basis is a_{ki} = ⟨Bₖ|A|Bᵢ⟩. If a unitary operator U maps these basis vectors into |Bₖ′⟩ = U|Bₖ⟩, then in this new basis the matrix that represents A as in (1.138) is

    a′_{ℓi} = ⟨B_ℓ′|A|Bᵢ′⟩ = ⟨B_ℓ|U† A U|Bᵢ⟩
            = Σ_{j=1}^n Σ_{k=1}^n ⟨B_ℓ|U†|Bⱼ⟩⟨Bⱼ|A|Bₖ⟩⟨Bₖ|U|Bᵢ⟩ = Σ_{j=1}^n Σ_{k=1}^n u†_{ℓj} a_{jk} u_{ki}   (1.146)

or a′ = u† a u in matrix notation.
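The basis change a′ = u†au of (1.146) preserves basis-independent quantities such as the trace and determinant. A NumPy sketch (illustrative values, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
# a random unitary u from the QR factorization of a complex matrix
U, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))

A_new = U.conj().T @ A @ U      # a' = u† a u, equation (1.146)
```

The operator itself has not changed; only the numbers representing it in the new basis have.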

Example 1.25 (Inner-product rules) In Dirac's notation, the rules (1.78–1.81) of a positive-definite inner product are

    ⟨f|g⟩ = ⟨g|f⟩*
    ⟨f|z₁g₁ + z₂g₂⟩ = z₁⟨f|g₁⟩ + z₂⟨f|g₂⟩
    ⟨z₁f₁ + z₂f₂|g⟩ = z₁*⟨f₁|g⟩ + z₂*⟨f₂|g⟩                           (1.147)
    ⟨f|f⟩ ≥ 0  and  ⟨f|f⟩ = 0 ⟺ f = 0.


States in Dirac notation often are labeled |ψ⟩ or by their quantum numbers |n, l, m⟩, and one rarely sees plus signs or complex numbers or operators inside bras or kets. But one should.

Example 1.26 (Gram–Schmidt) In Dirac notation, the formula (1.122) for the kth orthogonal linear combination of the vectors |V_ℓ⟩ is

    |uₖ⟩ = |Vₖ⟩ − Σ_{i=1}^{k−1} |Uᵢ⟩⟨Uᵢ|Vₖ⟩ = ( I − Σ_{i=1}^{k−1} |Uᵢ⟩⟨Uᵢ| ) |Vₖ⟩   (1.148)

and the formula (1.123) for the kth orthonormal linear combination of the vectors |V_ℓ⟩ is

    |Uₖ⟩ = |uₖ⟩ / √⟨uₖ|uₖ⟩.                                           (1.149)

The vectors |Uₖ⟩ are not unique; they vary with the order of the |Vₖ⟩.

Vectors and linear operators are abstract. The numbers we compute with are inner products like ⟨g|f⟩ and ⟨g|A|f⟩. In terms of n orthonormal basis vectors |j⟩ with fⱼ = ⟨j|f⟩ and gⱼ* = ⟨g|j⟩, we can use the expansion (1.142) of the identity operator to write these inner products as

    ⟨g|f⟩ = ⟨g|I|f⟩ = Σ_{j=1}^n ⟨g|j⟩⟨j|f⟩ = Σ_{j=1}^n gⱼ* fⱼ
    ⟨g|A|f⟩ = ⟨g|I A I|f⟩ = Σ_{j,ℓ=1}^n ⟨g|j⟩⟨j|A|ℓ⟩⟨ℓ|f⟩ = Σ_{j,ℓ=1}^n gⱼ* A_{jℓ} f_ℓ   (1.150)

in which A_{jℓ} = ⟨j|A|ℓ⟩. We often gather the inner products f_ℓ = ⟨ℓ|f⟩ into a column vector f with components f_ℓ = ⟨ℓ|f⟩

    f = ⎛ ⟨1|f⟩ ⎞   ⎛ f₁ ⎞
        ⎜ ⟨2|f⟩ ⎟ = ⎜ f₂ ⎟
        ⎜   ⋮   ⎟   ⎜ ⋮  ⎟                                             (1.151)
        ⎝ ⟨n|f⟩ ⎠   ⎝ fₙ ⎠

and the ⟨j|A|ℓ⟩ into a matrix A with matrix elements A_{jℓ} = ⟨j|A|ℓ⟩. If we also line up the inner products ⟨g|j⟩ = ⟨j|g⟩* in a row vector that is the transpose of the complex conjugate of the column vector g

    g† = (⟨1|g⟩*, ⟨2|g⟩*, . . . , ⟨n|g⟩*) = (g₁*, g₂*, . . . , gₙ*)    (1.152)

then we can write inner products in matrix notation as ⟨g|f⟩ = g†f and as ⟨g|A|f⟩ = g†A f. One can compute the inner product ⟨g|f⟩ of two vectors f and g by doing the sum (1.150) of gⱼ* fⱼ over the index j only if one knows their components


fⱼ and gⱼ, which are their inner products fⱼ = ⟨j|f⟩ and gⱼ = ⟨j|g⟩ with the orthonormal states |j⟩ of some basis. Thus an inner product implies the existence of an orthonormal basis and a representation of the identity operator

    I = Σ_{j=1}^n |j⟩⟨j|.                                             (1.153)

If we switch to a different basis, say from |k⟩'s to |αₖ⟩'s, then the components of the column vectors change from fₖ = ⟨k|f⟩ to fₖ′ = ⟨αₖ|f⟩, and similarly those of the row vectors g† and of the matrix A change, but the bras, the kets, the linear operators, and the inner products ⟨g|f⟩ and ⟨g|A|f⟩ do not change because the identity operator is basis independent (1.143)

    ⟨g|f⟩ = Σ_{k=1}^n ⟨g|k⟩⟨k|f⟩ = Σ_{k=1}^n ⟨g|αₖ⟩⟨αₖ|f⟩
    ⟨g|A|f⟩ = Σ_{k,ℓ=1}^n ⟨g|k⟩⟨k|A|ℓ⟩⟨ℓ|f⟩ = Σ_{k,ℓ=1}^n ⟨g|αₖ⟩⟨αₖ|A|α_ℓ⟩⟨α_ℓ|f⟩.   (1.154)

Dirac's outer products show how to change from one basis to another. The sum of outer products

    U = Σ_{k=1}^n |αₖ⟩⟨k|                                             (1.155)

maps the ket |ℓ⟩ of one orthonormal basis into that |α_ℓ⟩ of another

    U|ℓ⟩ = Σ_{k=1}^n |αₖ⟩⟨k|ℓ⟩ = Σ_{k=1}^n |αₖ⟩ δ_{kℓ} = |α_ℓ⟩.        (1.156)

Example 1.27 (Simple change of basis) If the ket |αₖ⟩ of the new basis is simply |αₖ⟩ = |k + 1⟩ with |αₙ⟩ = |n + 1⟩ ≡ |1⟩, then the operator that maps the n kets |k⟩ into the kets |αₖ⟩ is

    U = Σ_{k=1}^n |αₖ⟩⟨k| = Σ_{k=1}^n |k + 1⟩⟨k|.                     (1.157)

The square U² of U also changes the basis; it sends |k⟩ to |k + 2⟩. The set of operators U^ℓ for ℓ = 1, 2, . . . , n forms a group known as Zₙ.
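The cyclic shift of Example 1.27 is a permutation matrix; a NumPy sketch (not part of the text, with n = 5 chosen arbitrarily) shows that it is unitary and that its nth power is the identity, as the group Zₙ requires:

```python
import numpy as np

n = 5
U = np.zeros((n, n))
for k in range(n):
    U[(k + 1) % n, k] = 1.0     # U = sum_k |k+1><k| with |n+1> = |1>, as in (1.157)
```

Being real, this unitary matrix is orthogonal: U Uᵀ = I.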

To compute the inner product (U, V) of two vectors U and V, one needs the components Uᵢ and Vᵢ of these vectors in order to do the sum (1.89) of Uᵢ* Vᵢ over the index i.


1.13 Adjoints of Operators

In Dirac's notation, the most general linear operator (1.136) on an n-dimensional vector space is a sum of outer products z|k⟩⟨ℓ| in which z is a complex number and the kets |k⟩ and |ℓ⟩ are two of the n orthonormal kets that make up a basis for the space. The adjoint (1.135) of this basic linear operator is

    (z|k⟩⟨ℓ|)† = z*|ℓ⟩⟨k|.                                            (1.158)

Thus with z = ⟨k|A|ℓ⟩, the most general linear operator on the space is

    A = I A I = Σ_{k,ℓ=1}^n |k⟩⟨k|A|ℓ⟩⟨ℓ|                             (1.159)

and its adjoint A† is the operator I A† I

    A† = Σ_{k,ℓ=1}^n |ℓ⟩⟨ℓ|A†|k⟩⟨k| = Σ_{k,ℓ=1}^n |ℓ⟩⟨k|A|ℓ⟩*⟨k|.      (1.160)

It follows that ⟨ℓ|A†|k⟩ = ⟨k|A|ℓ⟩*, so that the matrix A†_{ℓk} that represents A† in this basis is

    A†_{ℓk} = ⟨ℓ|A†|k⟩ = ⟨k|A|ℓ⟩* = A*_{kℓ} = A^{*T}_{ℓk}              (1.161)

in agreement with our definition (1.28) of the adjoint of a matrix as the transpose of its complex conjugate, A† = A*ᵀ. We also have

    ⟨g|A†f⟩ = ⟨g|A†|f⟩ = ⟨f|A|g⟩* = ⟨f|Ag⟩* = ⟨Ag|f⟩.                 (1.162)

Taking the adjoint of the adjoint is by (1.158)

    [ (z|k⟩⟨ℓ|)† ]† = [ z*|ℓ⟩⟨k| ]† = z|k⟩⟨ℓ|                         (1.163)

the same as doing nothing at all. This also follows from the matrix formula (1.161) because both (A*)* = A and (Aᵀ)ᵀ = A, and so

    (A†)† = (A^{*T})^{*T} = A,                                        (1.164)

the adjoint of the adjoint of a matrix is the original matrix. Before Dirac, the adjoint A† of a linear operator A was defined by

    (g, A†f) = (Ag, f) = (f, Ag)*.                                    (1.165)

This definition also implies that A†† = A since

    (g, A††f) = (A†g, f) = (f, A†g)* = (Af, g)* = (g, Af).            (1.166)

We also have (g, Af) = (g, A††f) = (A†g, f).
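The matrix form A† = A*ᵀ of (1.161) and the defining property ⟨g|A†f⟩ = ⟨Ag|f⟩ of (1.162) can be checked numerically. A NumPy sketch with random illustrative values (not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
f = rng.standard_normal(3) + 1j * rng.standard_normal(3)
g = rng.standard_normal(3) + 1j * rng.standard_normal(3)

A_dag = A.conj().T              # A† = A*T, equation (1.161)
lhs = np.vdot(g, A_dag @ f)     # <g|A† f>
rhs = np.vdot(A @ g, f)         # <A g|f>, equation (1.162)
```

Taking the conjugate transpose twice returns the original matrix, which is (1.164).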


1.14 Self-Adjoint or Hermitian Linear Operators

An operator A that is equal to its adjoint, A† = A, is self adjoint or hermitian. In view of (1.161), the matrix elements of a self-adjoint linear operator A satisfy ⟨k|A†|ℓ⟩ = ⟨ℓ|A|k⟩* = ⟨k|A|ℓ⟩ in any orthonormal basis. So a matrix that represents a hermitian operator is equal to the transpose of its complex conjugate

    A_{kℓ} = ⟨k|A|ℓ⟩ = ⟨k|A†|ℓ⟩ = ⟨ℓ|A|k⟩* = A^{*T}_{kℓ} = A†_{kℓ}.    (1.167)

We also have

    ⟨g|A|f⟩ = ⟨Ag|f⟩ = ⟨f|Ag⟩* = ⟨f|A|g⟩*                             (1.168)

and in pre-Dirac notation

    (g, Af) = (Ag, f) = (f, Ag)*.                                     (1.169)
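As the text notes just below, hermitian matrices have real eigenvalues and complete sets of orthonormal eigenvectors (Section 1.29). A NumPy sketch (illustrative, not from the text) builds a hermitian matrix and checks both properties:

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = M + M.conj().T              # H† = H by construction

# eigh is NumPy's solver for hermitian matrices: w is real,
# and the columns of v are orthonormal eigenvectors
w, v = np.linalg.eigh(H)
```

The relation H v = v diag(w) below is the eigenvalue equation A|a′⟩ = a′|a′⟩ written for all eigenvectors at once.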

A matrix Aᵢⱼ that is real and symmetric, or imaginary and antisymmetric, is hermitian. But a self-adjoint linear operator A that is represented by a matrix Aᵢⱼ that is real and symmetric (or imaginary and antisymmetric) in one orthonormal basis will not in general be represented by a matrix that is real and symmetric (or imaginary and antisymmetric) in a different orthonormal basis, but it will be represented by a hermitian matrix in every orthonormal basis. A ket |a′⟩ is an eigenvector of a linear operator A with eigenvalue a′ if A|a′⟩ = a′|a′⟩. As we'll see in Section 1.29, hermitian matrices have real eigenvalues and complete sets of orthonormal eigenvectors. Hermitian operators and matrices represent physical variables in quantum mechanics.

Example 1.28 (Fierz identities for n × n hermitian matrices) The n² n × n hermitian matrices tᵃ form a vector space with an inner product ⟨a|b⟩ (Section 1.6) defined by the trace (1.22), ⟨a|b⟩ = Tr(tᵃtᵇ). One can use the Gram–Schmidt method (Section 1.10) to make them orthonormal, so that

    ⟨a|b⟩ = Tr(tᵃtᵇ) = Σ_{i,k=1}^n tᵃ_{ik} tᵇ_{ki} = δ_{ab}.          (1.170)

Then the sum of their n² outer products (1.22) is the identity matrix of the n²-dimensional vector space

    ( Σ_{a=1}^{n²} |a⟩⟨a| )_{ij,kℓ} = Σ_{a=1}^{n²} tᵃ_{ij} tᵃ_{kℓ} = I_{ij,kℓ} = δ_{iℓ} δ_{kj}   (1.171)


because

    tᵇ_{ij} = (|b⟩)_{ij} = Σ_{a=1}^{n²} (|a⟩)_{ij} ⟨a|b⟩ = Σ_{a=1}^{n²} tᵃ_{ij} Tr(tᵃtᵇ) = Σ_{a=1}^{n²} Σ_{k,ℓ=1}^n tᵃ_{ij} tᵃ_{kℓ} tᵇ_{ℓk}.   (1.172)

(Markus Fierz, 1912–2006)
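For n = 2, an orthonormal hermitian basis in the sense of (1.170) is the identity and the three Pauli matrices, each divided by √2; the Fierz completeness sum (1.171) then can be checked directly. A NumPy sketch (the choice of basis is standard, but the check itself is an illustration, not from the text):

```python
import numpy as np

# orthonormal hermitian basis of the 2x2 matrices: (I, sigma_1..3) / sqrt(2)
s0 = np.eye(2)
s1 = np.array([[0, 1], [1, 0]])
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]])
t = [s / np.sqrt(2) for s in (s0, s1, s2, s3)]

# completeness sum of (1.171): F_{ij,kl} = sum_a t^a_ij t^a_kl
F = sum(np.einsum('ij,kl->ijkl', ta, ta) for ta in t)
delta = np.einsum('il,kj->ijkl', np.eye(2), np.eye(2))   # delta_il delta_kj
```

The equality F = δ_{iℓ}δ_{kj} is the n = 2 case of (1.171).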

1.15 Real, Symmetric Linear Operators

In quantum mechanics, we usually consider complex vector spaces, that is, spaces in which the vectors |f⟩ are complex linear combinations

    |f⟩ = Σ_{k=1}^n zₖ |k⟩                                            (1.173)

of complex orthonormal basis vectors |k⟩. But real vector spaces also are of interest. A real vector space is a vector space in which the vectors |f⟩ are real linear combinations

    |f⟩ = Σ_{k=1}^n xₖ |k⟩                                            (1.174)

of real orthonormal basis vectors, xₖ* = xₖ and |k⟩* = |k⟩. A real linear operator A on a real vector space

    A = Σ_{k,ℓ=1}^n |k⟩⟨k|A|ℓ⟩⟨ℓ| = Σ_{k,ℓ=1}^n |k⟩ A_{kℓ} ⟨ℓ|        (1.175)

is represented by a real matrix, A*_{kℓ} = A_{kℓ}. A real linear operator A that is self adjoint on a real vector space satisfies the condition (1.169) of hermiticity but with the understanding that complex conjugation has no effect

    (g, Af) = (Ag, f) = (f, Ag)* = (f, Ag).                           (1.176)

Thus its matrix elements are symmetric, ⟨g|A|f⟩ = ⟨f|A|g⟩. Since A is hermitian as well as real, the matrix A_{kℓ} that represents it (in a real basis) is real and hermitian, and so is symmetric, A_{kℓ} = A*_{ℓk} = A_{ℓk}.

1.16 Unitary Operators

A unitary operator U is one whose adjoint is its inverse

    U U† = U† U = I.                                                  (1.177)

Any operator that maps one orthonormal basis |k⟩ to another |αₖ⟩

    U = Σ_{k=1}^n |αₖ⟩⟨k|                                             (1.178)

is unitary since

    U U† = Σ_{k=1}^n |αₖ⟩⟨k| Σ_{ℓ=1}^n |ℓ⟩⟨α_ℓ| = Σ_{k,ℓ=1}^n |αₖ⟩⟨k|ℓ⟩⟨α_ℓ|
         = Σ_{k,ℓ=1}^n |αₖ⟩ δ_{kℓ} ⟨α_ℓ| = Σ_{k=1}^n |αₖ⟩⟨αₖ| = I      (1.179)

as well as

    U† U = Σ_{ℓ=1}^n |ℓ⟩⟨α_ℓ| Σ_{k=1}^n |αₖ⟩⟨k| = Σ_{k,ℓ=1}^n |ℓ⟩⟨α_ℓ|αₖ⟩⟨k| = Σ_{k=1}^n |k⟩⟨k| = I.   (1.180)

A unitary operator maps every orthonormal basis |k⟩ into another orthonormal basis |αₖ⟩. For if |αₖ⟩ = U|k⟩, then the vectors |αₖ⟩ are orthonormal, ⟨αₖ|α_ℓ⟩ = δ_{kℓ} (Exercise 1.22). They also are complete because they provide a resolution of the identity operator

    Σ_{k=1}^n |αₖ⟩⟨αₖ| = Σ_{k=1}^n U|k⟩⟨k|U† = U I U† = U U† = I.      (1.181)

If we multiply the relation |αₖ⟩ = U|k⟩ by the bra ⟨k| and then sum over the index k, we get

    Σ_{k=1}^n |αₖ⟩⟨k| = Σ_{k=1}^n U|k⟩⟨k| = U Σ_{k=1}^n |k⟩⟨k| = U.    (1.182)

Every unitary operator maps every orthonormal basis into another orthonormal basis or into itself. Inner products do not change under unitary transformations because ⟨g|f⟩ = ⟨g|U†U|f⟩ = ⟨Ug|Uf⟩, which in pre-Dirac notation is (g, f) = (g, U†Uf) = (Ug, Uf). Unitary matrices have unimodular determinants, |det U| = 1, because the determinant of the product of two matrices is the product of their determinants (1.222) and because transposition doesn't change the value of a determinant (1.205)

    1 = det I = det(U U†) = det U det U† = det U (det Uᵀ)* = |det U|².   (1.183)

A unitary matrix that is real is orthogonal and satisfies

    O Oᵀ = Oᵀ O = I.                                                  (1.184)
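The identities UU† = I and |det U| = 1 of (1.177) and (1.183) can be checked on a random unitary matrix; the QR factorization of a complex matrix supplies one. A NumPy sketch (illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(M)          # the Q factor of a QR factorization is unitary

det = np.linalg.det(U)
```

The determinant itself is a complex number of unit modulus; only its absolute value is fixed by unitarity.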


1.17 Hilbert Spaces

We have mainly been talking about linear operators that act on finite-dimensional vector spaces and that can be represented by matrices. But infinite-dimensional vector spaces and the linear operators that act on them play central roles in electrodynamics and quantum mechanics. For instance, the Hilbert space H of all "wave" functions ψ(x, t) that are square integrable over 3-dimensional space at all times t is of infinite dimension. In one space dimension, the state |x′⟩ represents a particle at position x′ and is an eigenstate of the hermitian position operator x with eigenvalue x′, that is, x|x′⟩ = x′|x′⟩. These states form a basis that is orthogonal in the sense that ⟨x|x′⟩ = 0 for x ≠ x′ and normalized in the sense that ⟨x|x′⟩ = δ(x − x′) in which δ(x − x′) is Dirac's delta function. The delta function δ(x − x′) actually is a functional δ_{x′} that maps any suitably smooth function f into its value at x′

    δ_{x′}[f] = ∫ δ(x − x′) f(x) dx = f(x′).                          (1.185)

Another basis for the Hilbert space of 1-dimensional quantum mechanics is made of the states |p⟩ of well-defined momentum. The state |p′⟩ represents a particle or system with momentum p′. It is an eigenstate of the hermitian momentum operator p with eigenvalue p′, that is, p|p′⟩ = p′|p′⟩. The momentum states also are orthonormal in Dirac's sense, ⟨p|p′⟩ = δ(p − p′). The operator that translates a system in space by a distance a is

    U(a) = ∫ |x + a⟩⟨x| dx.                                           (1.186)

It maps the state |x′⟩ to the state |x′ + a⟩ and is unitary (Exercise 1.23). Remarkably, this translation operator is an exponential of the momentum operator, U(a) = exp(−ipa/ℏ), in which ℏ = h/2π = 1.054 × 10⁻³⁴ J s is Planck's constant divided by 2π. In 2 dimensions, with basis states |x, y⟩ that are orthonormal in Dirac's sense, ⟨x, y|x′, y′⟩ = δ(x − x′)δ(y − y′), the unitary operator

    U(θ) = ∫ |x cos θ − y sin θ, x sin θ + y cos θ⟩⟨x, y| dx dy        (1.187)

rotates a system in space by the angle θ. This rotation operator is the exponential U(θ) = exp(−iθL_z/ℏ) in which the z component of the angular momentum is L_z = x p_y − y p_x. We may carry most of our intuition about matrices over to these unitary transformations that change from one infinite basis to another. But we must use common sense and keep in mind that infinite sums and integrals do not always converge.


1.18 Antiunitary, Antilinear Operators

Certain maps on states |ψ⟩ → |ψ′⟩, such as those involving time reversal, are implemented by operators K that are antilinear

    K(zψ + wφ) = K(z|ψ⟩ + w|φ⟩) = z*K|ψ⟩ + w*K|φ⟩ = z*Kψ + w*Kφ       (1.188)

and antiunitary

    (Kφ, Kψ) = ⟨Kφ|Kψ⟩ = (φ, ψ)* = ⟨φ|ψ⟩* = ⟨ψ|φ⟩ = (ψ, φ).           (1.189)

The adjoint K† of an antiunitary operator K is defined by ⟨K†φ|ψ⟩ = ⟨φ|K|ψ⟩* so that ⟨K†Kφ|ψ⟩ = ⟨Kφ|Kψ⟩* = ⟨φ|ψ⟩** = ⟨φ|ψ⟩.

1.19 Symmetry in Quantum Mechanics

In quantum mechanics, a symmetry is a map of states |ψ⟩ → |ψ′⟩ and |φ⟩ → |φ′⟩ that preserves probabilities

    |⟨φ′|ψ′⟩|² = |⟨φ|ψ⟩|².                                            (1.190)

Eugene Wigner (1902–1995) showed that every symmetry in quantum mechanics can be represented either by an operator U that is linear and unitary or by an operator K that is antilinear and antiunitary. The antilinear, antiunitary case occurs when a symmetry involves time reversal. Most symmetries are represented by operators that are linear and unitary. Unitary operators are of great importance in quantum mechanics. We use them to represent rotations, translations, Lorentz transformations, and internal-symmetry transformations.

1.20 Determinants

The determinant of a 2 × 2 matrix A is

    det A = |A| = A₁₁A₂₂ − A₂₁A₁₂.                                     (1.191)

In terms of the 2 × 2 antisymmetric (eᵢⱼ = −eⱼᵢ) matrix with e₁₂ = 1 = −e₂₁ and e₁₁ = e₂₂ = 0, this determinant is

    det A = Σ_{i=1}^2 Σ_{j=1}^2 eᵢⱼ Aᵢ₁ Aⱼ₂ = Σ_{i=1}^2 Σ_{j=1}^2 eᵢⱼ A₁ᵢ A₂ⱼ.   (1.192)

It's also true that

    e_{kℓ} det A = Σ_{i=1}^2 Σ_{j=1}^2 eᵢⱼ A_{ik} A_{jℓ}.              (1.193)
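The antisymmetric-symbol formula (1.192) for a 2 × 2 determinant can be checked with a small NumPy sketch (illustrative values, not from the text):

```python
import numpy as np

e = np.array([[0.0, 1.0], [-1.0, 0.0]])    # e_12 = 1 = -e_21, e_11 = e_22 = 0
A = np.array([[2.0, 3.0], [5.0, 7.0]])

# sum_ij e_ij A_i1 A_j2, the first form of (1.192)
det = np.einsum('ij,i,j->', e, A[:, 0], A[:, 1])
```

Here `einsum` carries out the double sum over i and j explicitly.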


Example 1.29 (Area of a parallelogram) Two 2-vectors V = (V₁, V₂) and W = (W₁, W₂) define a parallelogram whose area is the absolute value of a 2 × 2 determinant

    area(V, W) = | det ⎛V₁ W₁⎞ | = |V₁W₂ − V₂W₁|.                      (1.194)
                       ⎝V₂ W₂⎠

To check this formula, rotate the coordinates so that the 2-vector V runs from the origin along the x-axis. Then V₂ = 0, and the determinant is V₁W₂, which is the base V₁ of the parallelogram times its height W₂.

These definitions (1.191–1.193) extend to any square matrix. If A is a 3 × 3 matrix, then its determinant is

    det A = Σ_{i,j,k=1}^3 e_{ijk} A_{i1} A_{j2} A_{k3} = Σ_{i,j,k=1}^3 e_{ijk} A_{1i} A_{2j} A_{3k}   (1.195)

in which e_{ijk} is the totally antisymmetric Levi-Civita symbol whose nonzero values are

    e₁₂₃ = e₂₃₁ = e₃₁₂ = 1   and   e₂₁₃ = e₁₃₂ = e₃₂₁ = −1.            (1.196)

The symbol vanishes whenever an index appears twice, thus

    e₁₁₁ = e₁₁₂ = e₁₁₃ = e₂₂₂ = e₂₂₁ = e₂₂₃ = e₃₃₃ = e₃₃₁ = e₃₃₂ = 0    (1.197)

and so forth. The sums over i, j, and k run from 1 to 3

    det A = Σ_{i=1}^3 A_{i1} Σ_{j,k=1}^3 e_{ijk} A_{j2} A_{k3}
          = A₁₁(A₂₂A₃₃ − A₃₂A₂₃) + A₂₁(A₃₂A₁₃ − A₁₂A₃₃) + A₃₁(A₁₂A₂₃ − A₂₂A₁₃).   (1.198)

The minor M_{iℓ} of the matrix A is the 2 × 2 determinant of the matrix A without row i and column ℓ, and the cofactor C_{iℓ} is the minor M_{iℓ} multiplied by (−1)^{i+ℓ}. Thus det A is the sum

    det A = A₁₁(−1)²(A₂₂A₃₃ − A₃₂A₂₃) + A₂₁(−1)³(A₁₂A₃₃ − A₃₂A₁₃) + A₃₁(−1)⁴(A₁₂A₂₃ − A₂₂A₁₃)
          = A₁₁C₁₁ + A₂₁C₂₁ + A₃₁C₃₁                                   (1.199)

of the products A_{i1}C_{i1} = A_{i1}(−1)^{i+1} M_{i1} where


    C₁₁ = (−1)² M₁₁ = A₂₂A₃₃ − A₂₃A₃₂
    C₂₁ = (−1)³ M₂₁ = A₃₂A₁₃ − A₁₂A₃₃                                  (1.200)
    C₃₁ = (−1)⁴ M₃₁ = A₁₂A₂₃ − A₂₂A₁₃.

Example 1.30 (Volume of a parallelepiped) The determinant of a 3 × 3 matrix is the dot product of the vector of its first row with the cross-product of the vectors of its second and third rows

    | U₁ U₂ U₃ |
    | V₁ V₂ V₃ | = Σ_{i,j,k=1}^3 e_{ijk} Uᵢ Vⱼ Wₖ = Σ_{i=1}^3 Uᵢ (V × W)ᵢ = U · (V × W).   (1.201)
    | W₁ W₂ W₃ |

The absolute value of this scalar triple product is the volume of the parallelepiped defined by U, V, and W, as one can see by placing the parallelepiped so the vector U runs from the origin along the x-axis. The 3 × 3 determinant (1.201) then is U₁(V₂W₃ − V₃W₂), which is the height of the parallelepiped times the area (1.194) of its base.
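The triple-product identity (1.201) is quick to check numerically. A NumPy sketch with illustrative vectors (not from the text):

```python
import numpy as np

U = np.array([1.0, 2.0, 3.0])
V = np.array([0.0, 1.0, 4.0])
W = np.array([5.0, 6.0, 0.0])

triple = np.dot(U, np.cross(V, W))          # U . (V x W)
det = np.linalg.det(np.array([U, V, W]))    # determinant with rows U, V, W
```

The absolute value of either number is the volume of the parallelepiped spanned by U, V, and W.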

Laplace used the totally antisymmetric symbol e_{i₁i₂...iₙ} with n indices and with e₁₂₃...ₙ = 1 to define the determinant of an n × n matrix A as

    det A = Σ_{i₁i₂...iₙ=1}^n e_{i₁i₂...iₙ} A_{i₁1} A_{i₂2} · · · A_{iₙn}   (1.202)

in which the sums over i₁ . . . iₙ run from 1 to n. In terms of cofactors, two forms of his expansion of this determinant are

    det A = Σ_{i=1}^n A_{ik} C_{ik} = Σ_{k=1}^n A_{ik} C_{ik}           (1.203)

in which the first sum is over the row index i but not the (arbitrary) column index k, and the second sum is over the column index k but not the (arbitrary) row index i. The cofactor C_{ik} is (−1)^{i+k} M_{ik} in which the minor M_{ik} is the determinant of the (n − 1) × (n − 1) matrix A without its ith row and kth column. It's also true that

    e_{k₁k₂...kₙ} det A = Σ_{i₁i₂...iₙ=1}^n e_{i₁i₂...iₙ} A_{i₁k₁} A_{i₂k₂} · · · A_{iₙkₙ}
                        = Σ_{i₁i₂...iₙ=1}^n e_{i₁i₂...iₙ} A_{k₁i₁} A_{k₂i₂} · · · A_{kₙiₙ}.   (1.204)

In particular, since e₁₂...ₙ = 1, the determinant of the transpose of a matrix is equal to the determinant (1.202) of the matrix

    det Aᵀ = Σ_{i₁i₂...iₙ=1}^n e_{i₁i₂...iₙ} A_{1i₁} A_{2i₂} · · · A_{niₙ} = det A.   (1.205)
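Laplace's cofactor expansion (1.203) can be implemented in a few lines. A NumPy sketch (illustrative, not from the text); the matrix and the expansion column are arbitrary choices:

```python
import numpy as np

def cofactor(A, i, k):
    """C_ik = (-1)**(i+k) times the minor of A without row i and column k."""
    minor = np.delete(np.delete(A, i, axis=0), k, axis=1)
    return (-1) ** (i + k) * np.linalg.det(minor)

A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 2.0],
              [4.0, 1.0, 5.0]])

k = 0   # expand along the first column, the first form of (1.203)
det = sum(A[i, k] * cofactor(A, i, k) for i in range(3))
```

Expanding along any other column or row gives the same value, which is the content of (1.203).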

The interchange A → Aᵀ of the rows and columns of a matrix has no effect on its determinant. The key feature of a determinant is that it is an antisymmetric combination of products of the elements A_{ik} of a matrix A. One implication of this antisymmetry is that the interchange of any two rows or any two columns changes the sign of the determinant. Another is that if one adds a multiple of one column to another column, for example a multiple x A_{i2} of column 2 to column 1, then the determinant

    det A′ = Σ_{i₁i₂...iₙ=1}^n e_{i₁i₂...iₙ} (A_{i₁1} + x A_{i₁2}) A_{i₂2} · · · A_{iₙn}   (1.206)

is unchanged. The reason is that the extra term δ det A vanishes

    δ det A = Σ_{i₁i₂...iₙ=1}^n x e_{i₁i₂...iₙ} A_{i₁2} A_{i₂2} · · · A_{iₙn} = 0   (1.207)

because it is proportional to a sum of products of a factor e_{i₁i₂...iₙ} that is antisymmetric in i₁ and i₂ and a factor A_{i₁2} A_{i₂2} that is symmetric in these indices. For instance, when i₁ and i₂ are 5 and 7 and 7 and 5, the two terms cancel

    e_{57...iₙ} A₅₂ A₇₂ · · · A_{iₙn} + e_{75...iₙ} A₇₂ A₅₂ · · · A_{iₙn} = 0   (1.208)

because e_{57...iₙ} = −e_{75...iₙ}. By repeated additions of x₂A_{i2}, x₃A_{i3}, and so forth to A_{i1}, we can change the first column of the matrix A to a linear combination of all the columns

    A_{i1} → A_{i1} + Σ_{k=2}^n xₖ A_{ik}                              (1.209)

without changing det A. In this linear combination, the coefficients xₖ are arbitrary. The analogous operation with arbitrary yₖ

    A_{iℓ} → A_{iℓ} + Σ_{k=1, k≠ℓ}^n yₖ A_{ik}                         (1.210)

replaces the ℓth column by a linear combination of all the columns without changing det A. Suppose that the columns of an n × n matrix A are linearly dependent (Section 1.8), so that the linear combination of columns

    Σ_{k=1}^n yₖ A_{ik} = 0   for i = 1, . . . , n                     (1.211)
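The column-operation invariance of (1.206–1.210) is easy to verify numerically. A NumPy sketch (illustrative matrix and coefficients, not from the text):

```python
import numpy as np

A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 2.0],
              [4.0, 1.0, 5.0]])

B = A.copy()
# add arbitrary multiples of columns 2 and 3 to column 1, as in (1.209)
B[:, 0] += 7.0 * A[:, 1] - 2.0 * A[:, 2]
```

The determinant of B equals that of A, even though the coefficients 7 and −2 were arbitrary.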


vanishes for some coefficients yₖ not all zero. Suppose y₁ ≠ 0. Then by adding suitable linear combinations of columns 2 through n to column 1, we could make all the modified elements A′_{i1} of column 1 vanish without changing det A. But then det A as given by (1.202) would vanish. Thus the determinant of any matrix whose columns are linearly dependent must vanish. Now suppose that the columns of an n × n matrix are linearly independent. Then the determinant of the matrix cannot vanish because any linearly independent set of n vectors in a vector space of n dimensions is complete (Section 1.8). Thus if the columns of a matrix A are linearly independent and therefore complete, some linear combination of all columns 2 through n when added to column 1 will convert column 1 into a nonzero multiple of the n-dimensional column vector (1, 0, 0, . . . , 0), say (c₁, 0, 0, . . . , 0). Similar operations will convert column 2 into a nonzero multiple of the column vector (0, 1, 0, . . . , 0), say (0, c₂, 0, . . . , 0). Continuing in this way, we may convert the matrix A to a matrix with nonzero entries cᵢ along the main diagonal and zeros everywhere else. The determinant det A then is the product c₁c₂ · · · cₙ of the nonzero diagonal entries cᵢ's, and so det A cannot vanish. We may extend these arguments to the rows of a matrix. The addition to row k of a linear combination of the other rows

    A_{ki} → A_{ki} + Σ_{ℓ=1, ℓ≠k}^n z_ℓ A_{ℓi}                        (1.212)

does not change the value of the determinant. In this way, one may show that the determinant of a matrix vanishes if and only if its rows are linearly dependent. The reason why these results apply to the rows as well as to the columns is that the determinant of a matrix A may be defined either in terms of the columns or in terms of the rows as in the definitions (1.202 and 1.204). These and other properties of determinants follow from a study of permutations (Section 11.13). Detailed proofs are in (Aitken, 1959). Let us return for a moment to Laplace's expansion (1.203) of the determinant det A of an n × n matrix A as a sum of A_{ik}C_{ik} over the row index i with the column index k held fixed

    det A = Σ_{i=1}^n A_{ik} C_{ik} = Σ_{i=1}^n A_{ki} C_{ki}          (1.213)

in order to prove that

    δ_{kℓ} det A = Σ_{i=1}^n A_{ik} C_{iℓ} = Σ_{i=1}^n A_{ki} C_{ℓi}.   (1.214)


For $k = \ell$, this formula just repeats Laplace's expansion (1.213). But for $k \neq \ell$, it is Laplace's expansion for the determinant of a matrix that has two copies of its $k$th column. Since the determinant of a matrix with two identical columns vanishes, the rule (1.214) also is true for $k \neq \ell$.

The rule (1.214) provides a formula for the inverse of a matrix $A$ whose determinant does not vanish. Such matrices are said to be nonsingular. The inverse $A^{-1}$ of an $n\times n$ nonsingular matrix $A$ is the transpose of the matrix of cofactors divided by the determinant of the matrix

$$\left(A^{-1}\right)_{\ell i} = \frac{C_{i\ell}}{\det A} \quad\text{or}\quad A^{-1} = \frac{C^{\mathsf T}}{\det A}. \tag{1.215}$$

To verify this formula, we use it for $A^{-1}$ in the product $A^{-1} A$ and note that by (1.214) the $\ell k$th entry of the product $A^{-1} A$ is just $\delta_{\ell k}$

$$\left(A^{-1} A\right)_{\ell k} = \sum_{i=1}^{n} \left(A^{-1}\right)_{\ell i} A_{ik} = \sum_{i=1}^{n} \frac{C_{i\ell}\, A_{ik}}{\det A} = \delta_{\ell k}. \tag{1.216}$$

Example 1.31 (Inverting a 2 × 2 matrix). Our formula (1.215) for the inverse of the general $2\times 2$ matrix

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \tag{1.217}$$

gives

$$A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \tag{1.218}$$

which is the correct inverse as long as $ad \neq bc$.

The simple example of matrix multiplication

$$\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \begin{pmatrix} 1 & x & y \\ 0 & 1 & z \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} a & xa + b & ya + zb + c \\ d & xd + e & yd + ze + f \\ g & xg + h & yg + zh + i \end{pmatrix} \tag{1.219}$$

shows that the operations (1.210) on columns that don't change the value of the determinant can be written as matrix multiplication from the right by a matrix that has unity on its main diagonal and zeros below. Now consider the matrix product

$$\begin{pmatrix} A & 0 \\ -I & B \end{pmatrix} \begin{pmatrix} I & B \\ 0 & I \end{pmatrix} = \begin{pmatrix} A & AB \\ -I & 0 \end{pmatrix} \tag{1.220}$$

in which $A$ and $B$ are $n\times n$ matrices, $I$ is the $n\times n$ identity matrix, and $0$ is the $n\times n$ matrix of all zeros. The second matrix on the left-hand side has unity on its main diagonal and zeros below, and so it does not change the value of the determinant of the matrix to its left, which then must equal that of the matrix on the right-hand side:

$$\det\begin{pmatrix} A & 0 \\ -I & B \end{pmatrix} = \det\begin{pmatrix} A & AB \\ -I & 0 \end{pmatrix}. \tag{1.221}$$

By using Laplace's expansion (1.203) along the first column to evaluate the determinant on the left-hand side and his expansion along the last row to compute the determinant on the right-hand side, one finds that the determinant of the product of two matrices is the product of the determinants

$$\det A \, \det B = \det AB. \tag{1.222}$$
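The block-matrix argument lends itself to a quick numerical check. The following sketch is my own illustration, not part of the text: it builds the matrices of (1.220) for random $3\times 3$ blocks $A$ and $B$ and compares the determinants.

```python
# Sketch (my illustration, not from the text): checking the block-matrix
# identity (1.220) and the product rule (1.222) numerically.
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
I = np.eye(n)
Z = np.zeros((n, n))

left = np.block([[A, Z], [-I, B]])
upper = np.block([[I, B], [Z, I]])         # unity on the diagonal, zeros below
right = np.block([[A, A @ B], [-I, Z]])

prod_ok = np.allclose(left @ upper, right)  # equation (1.220)
det_left = np.linalg.det(left)              # equals det A det B
det_right = np.linalg.det(right)            # equals det AB
det_AB = np.linalg.det(A @ B)
```

Multiplying by the unit upper-triangular block matrix leaves the determinant unchanged, so `det_left` and `det_right` agree, which is the content of (1.221).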

Example 1.32 (Two 2 × 2 matrices). When the matrices $A$ and $B$ are both $2\times 2$, the two sides of (1.221) are

$$\det\begin{pmatrix} A & 0 \\ -I & B \end{pmatrix} = \det\begin{pmatrix} a_{11} & a_{12} & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 \\ -1 & 0 & b_{11} & b_{12} \\ 0 & -1 & b_{21} & b_{22} \end{pmatrix} = a_{11} a_{22} \det B - a_{21} a_{12} \det B = \det A \, \det B \tag{1.223}$$

and

$$\det\begin{pmatrix} A & AB \\ -I & 0 \end{pmatrix} = \det\begin{pmatrix} a_{11} & a_{12} & (ab)_{11} & (ab)_{12} \\ a_{21} & a_{22} & (ab)_{21} & (ab)_{22} \\ -1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{pmatrix} = (-1)\, C_{42} = (-1)(-1) \det AB = \det AB \tag{1.224}$$

and so they give the product rule $\det A \, \det B = \det AB$.

Often one uses the notation $|A| = \det A$ to denote a determinant. In this more compact notation, the obvious generalization of the product rule is

$$|ABC \cdots Z| = |A|\,|B| \cdots |Z|. \tag{1.225}$$

The product rule (1.222) implies that $\det\left(A^{-1}\right)$ is $1/\det A$ since

$$1 = \det I = \det\left(A A^{-1}\right) = \det A \, \det\left(A^{-1}\right). \tag{1.226}$$
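The cofactor formula (1.215) and the rule (1.226) can both be checked numerically. This sketch is my illustration, not the book's: it assembles the matrix of cofactors of a $3\times 3$ matrix element by element and divides its transpose by the determinant.

```python
# Sketch (my illustration): building an inverse from the transposed matrix
# of cofactors, equation (1.215), and checking det(A^-1) = 1/det A (1.226).
import numpy as np

A = np.array([[1.0, 2.0, 1.0],
              [-2.0, -6.0, 3.0],
              [4.0, 2.0, -5.0]])
n = len(A)

C = np.zeros((n, n))                     # matrix of cofactors
for i in range(n):
    for k in range(n):
        minor = np.delete(np.delete(A, i, axis=0), k, axis=1)
        C[i, k] = (-1) ** (i + k) * np.linalg.det(minor)

detA = np.linalg.det(A)
Ainv = C.T / detA                        # equation (1.215)
```

The test matrix here happens to be the one whose determinant, 48, is worked out by row reduction in Example 1.34.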


Example 1.33 (Derivative of the logarithm of a determinant). We see from our formula (1.213) for $\det A$ that its derivative with respect to any given element $A_{ik}$ is the corresponding cofactor $C_{ik}$

$$\frac{\partial \det A}{\partial A_{ik}} = C_{ik} \tag{1.227}$$

because the cofactors $C_{ij}$ and $C_{jk}$ for all $j$ are independent of $A_{ik}$. Thus the derivative of the logarithm of this determinant with respect to any parameter $\beta$ is

$$\frac{\partial \ln \det A}{\partial \beta} = \frac{1}{\det A} \sum_{ik} \frac{\partial \det A}{\partial A_{ik}}\, \frac{\partial A_{ik}}{\partial \beta} = \sum_{ik} \frac{C_{ik}}{\det A}\, \frac{\partial A_{ik}}{\partial \beta} = \sum_{ik} A^{-1}_{ki}\, \frac{\partial A_{ik}}{\partial \beta} = \operatorname{Tr}\left(A^{-1}\, \frac{\partial A}{\partial \beta}\right). \tag{1.228}$$
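A finite-difference check of (1.228) is easy to run. The family $A(\beta) = A_0 + \beta A_1$ below is an arbitrary choice of mine, used only for illustration.

```python
# Sketch (my illustration): finite-difference check of (1.228),
# d(ln det A)/d(beta) = Tr(A^{-1} dA/dbeta), for A(beta) = A0 + beta*A1.
import numpy as np

A0 = np.array([[4.0, 1.0, 0.0],
               [1.0, 3.0, 1.0],
               [0.0, 1.0, 2.0]])
A1 = np.array([[0.5, 0.2, 0.0],
               [0.1, 0.3, 0.0],
               [0.0, 0.0, 0.4]])

def A(beta):
    return A0 + beta * A1

beta, h = 0.7, 1e-6
numeric = (np.log(np.linalg.det(A(beta + h)))
           - np.log(np.linalg.det(A(beta - h)))) / (2 * h)
analytic = np.trace(np.linalg.inv(A(beta)) @ A1)   # Tr(A^-1 dA/dbeta)
```

The two numbers agree to the accuracy of the central difference.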

Example 1.34 (Numerical tricks). Adding multiples of rows to other rows does not change the value of a determinant, and interchanging two rows only changes a determinant by a minus sign. So we can use these operations, which leave the absolute values of determinants invariant, to make a matrix upper triangular, a form in which its determinant is just the product of the factors on its diagonal. Thus to make the matrix

$$A = \begin{pmatrix} 1 & 2 & 1 \\ -2 & -6 & 3 \\ 4 & 2 & -5 \end{pmatrix} \tag{1.229}$$

upper triangular, we add twice the first row to the second row

$$\begin{pmatrix} 1 & 2 & 1 \\ 0 & -2 & 5 \\ 4 & 2 & -5 \end{pmatrix}$$

and then subtract four times the first row from the third

$$\begin{pmatrix} 1 & 2 & 1 \\ 0 & -2 & 5 \\ 0 & -6 & -9 \end{pmatrix}. \tag{1.230}$$

Next, we subtract three times the second row from the third

$$\begin{pmatrix} 1 & 2 & 1 \\ 0 & -2 & 5 \\ 0 & 0 & -24 \end{pmatrix}.$$

We now find as the determinant of $A$ the product of its diagonal elements:

$$|A| = 1(-2)(-24) = 48. \tag{1.231}$$
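The elimination steps of Example 1.34 can be automated. The function below is a minimal sketch of mine (it omits the pivoting that a robust implementation would need) and reproduces the determinant 48 for the matrix (1.229).

```python
# Sketch (my illustration): reducing a matrix to upper triangular form by
# row operations, as in Example 1.34, and reading off the determinant as
# the product of the pivots.  No pivoting, so this is for illustration only.

A = [[1.0, 2.0, 1.0],
     [-2.0, -6.0, 3.0],
     [4.0, 2.0, -5.0]]

def det_by_elimination(M):
    """Gaussian elimination without pivoting; returns product of pivots."""
    M = [row[:] for row in M]            # work on a copy
    n = len(M)
    for j in range(n):
        for i in range(j + 1, n):
            factor = M[i][j] / M[j][j]   # assumes nonzero pivot
            M[i] = [M[i][k] - factor * M[j][k] for k in range(n)]
    d = 1.0
    for j in range(n):
        d *= M[j][j]
    return d

det_A = det_by_elimination(A)            # the text finds |A| = 48
```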


Incidentally, Gauss, Jordan, and modern mathematicians have developed much faster ways of computing determinants and matrix inverses than those (1.203 & 1.215) due to Laplace. Sage, Octave, Matlab, Maple, Mathematica, and Python use these modern techniques, which are freely available as programs in C and FORTRAN from www.netlib.org/lapack.

Example 1.35 (Using Matlab). The Matlab command to make the matrix (1.229) is A = [ 1 2 1; -2 -6 3; 4 2 -5 ]. The command d = det(A) gives its determinant, d = 48, and Ainv = A^(-1) gives its inverse

    Ainv =    0.5000    0.2500    0.2500
              0.0417   -0.1875   -0.1042
              0.4167    0.1250   -0.0417

The permanent of a square $n\times n$ matrix $A_{ik}$ is the sum over all permutations $1, 2, \ldots, n \to s_1, s_2, \ldots, s_n$ of the products $A_{s_1 1} A_{s_2 2} \cdots A_{s_n n}$

$$\operatorname{perm}(A) = \sum_{s} A_{s_1 1}\, A_{s_2 2} \cdots A_{s_n n}. \tag{1.232}$$
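Definition (1.232) translates directly into code. This brute-force version is my own sketch; it is exponential in $n$ and only sensible for small matrices.

```python
# Sketch (my illustration): the permanent (1.232) computed directly as a
# sum over all permutations; like the determinant but without the signs.
from itertools import permutations

def permanent(A):
    """perm(A) = sum over permutations s of A[s_1][0] A[s_2][1] ... (1.232)."""
    n = len(A)
    total = 0
    for s in permutations(range(n)):
        p = 1
        for col in range(n):
            p *= A[s[col]][col]
        total += p
    return total

M = [[1, 2], [3, 4]]
perm_M = permanent(M)   # 1*4 + 2*3 = 10, whereas det M = -2
```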

1.21 Jacobians

When one changes variables in a multiple integral from coordinates $x_1, x_2$ and area element $dx_1\, dx_2$, one must find the new element of area in terms of the new variables $y_1, y_2$. If $\hat{x}_1$ and $\hat{x}_2$ are unit vectors in the $x_1$ and $x_2$ directions, then as the new coordinates $(y_1, y_2)$ change by $dy_1$ and $dy_2$, the point they represent moves by

$$d\mathbf{y}_1 = \left(\frac{\partial x_1}{\partial y_1}\, \hat{x}_1 + \frac{\partial x_2}{\partial y_1}\, \hat{x}_2\right) dy_1 \quad\text{and by}\quad d\mathbf{y}_2 = \left(\frac{\partial x_1}{\partial y_2}\, \hat{x}_1 + \frac{\partial x_2}{\partial y_2}\, \hat{x}_2\right) dy_2. \tag{1.233}$$

These vectors $d\mathbf{y}_1$ and $d\mathbf{y}_2$ define a parallelogram whose area (1.194) is the absolute value of a determinant

$$\text{area}(d\mathbf{y}_1, d\mathbf{y}_2) = \left|\det\begin{pmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_2}{\partial y_1} \\[1ex] \dfrac{\partial x_1}{\partial y_2} & \dfrac{\partial x_2}{\partial y_2} \end{pmatrix}\right| dy_1\, dy_2. \tag{1.234}$$

The determinant itself is a jacobian

$$J = J(x/y) = \frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} = \det\begin{pmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_2}{\partial y_1} \\[1ex] \dfrac{\partial x_1}{\partial y_2} & \dfrac{\partial x_2}{\partial y_2} \end{pmatrix}. \tag{1.235}$$

The two equal integrals are

$$\iint_{R_x} f(x_1, x_2)\, dx_1 dx_2 = \iint_{R_y} f \circ x\,(y_1, y_2) \left|\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)}\right| dy_1 dy_2 \tag{1.236}$$

in which $f \circ x\,(y_1, y_2) = f(x_1(y_1, y_2),\, x_2(y_1, y_2))$ and $R_x$ and $R_y$ are the same region in the two coordinate systems.

In 3 dimensions, with $j = 1, 2,$ and 3, the 3 vectors

$$d\mathbf{y}_j = \left(\frac{\partial x_1}{\partial y_j}\, \hat{x}_1 + \frac{\partial x_2}{\partial y_j}\, \hat{x}_2 + \frac{\partial x_3}{\partial y_j}\, \hat{x}_3\right) dy_j \tag{1.237}$$

define a parallelepiped whose volume (1.201) is the absolute value of the determinant

$$\text{volume}(d\mathbf{y}_1, d\mathbf{y}_2, d\mathbf{y}_3) = \left|\det\begin{pmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_3}{\partial y_1} \\[1ex] \dfrac{\partial x_1}{\partial y_2} & \dfrac{\partial x_2}{\partial y_2} & \dfrac{\partial x_3}{\partial y_2} \\[1ex] \dfrac{\partial x_1}{\partial y_3} & \dfrac{\partial x_2}{\partial y_3} & \dfrac{\partial x_3}{\partial y_3} \end{pmatrix}\right| dy_1\, dy_2\, dy_3. \tag{1.238}$$

The equal integrals are

$$\iiint_{R_x} f(\vec{x})\, d^3x = \iiint_{R_y} f \circ x\,(\vec{y}) \left|\frac{\partial(x_1, x_2, x_3)}{\partial(y_1, y_2, y_3)}\right| d^3y \tag{1.239}$$

in which $d^3x = dx_1 dx_2 dx_3$, $d^3y = dy_1 dy_2 dy_3$, $f \circ x\,(\vec{y}) = f(x_1(\vec{y}),\, x_2(\vec{y}),\, x_3(\vec{y}))$, and $R_x$ and $R_y$ are the same region in the two coordinate systems.

For $n$-dimensional integrals over $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$, the rule is similar

$$\int_{R_x} f(x)\, d^n x = \int_{R_y} f \circ x\,(y) \left|\frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)}\right| d^n y \tag{1.240}$$

and uses the absolute value of the $n$-dimensional jacobian

$$J = J(x/y) = \frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)} = \det\begin{pmatrix} \dfrac{\partial x_1}{\partial y_1} & \cdots & \dfrac{\partial x_n}{\partial y_1} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial x_1}{\partial y_n} & \cdots & \dfrac{\partial x_n}{\partial y_n} \end{pmatrix}. \tag{1.241}$$
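The standard example is polar coordinates, $x_1 = r\cos\phi$, $x_2 = r\sin\phi$, whose jacobian (1.235) is $r$. The sketch below, my own illustration, computes the jacobian numerically from the partial derivatives and integrates $|J|$ over the unit disk to recover the area $\pi$.

```python
# Sketch (my illustration): the jacobian (1.235) for polar coordinates
# x1 = r cos(phi), x2 = r sin(phi) is r; integrating |J| dr dphi over
# 0 < r < 1, 0 < phi < 2 pi gives the area pi of the unit disk (1.236).
import math

def jacobian_polar(r, phi):
    # rows are d(x1, x2)/dr and d(x1, x2)/dphi, as in (1.235)
    dx_dr = (math.cos(phi), math.sin(phi))
    dx_dphi = (-r * math.sin(phi), r * math.cos(phi))
    return dx_dr[0] * dx_dphi[1] - dx_dr[1] * dx_dphi[0]   # equals r

# midpoint-rule integral of |J| over the (r, phi) rectangle
n = 200
dr, dphi = 1.0 / n, 2.0 * math.pi / n
area = sum(jacobian_polar((i + 0.5) * dr, (j + 0.5) * dphi) * dr * dphi
           for i in range(n) for j in range(n))
```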

Since the determinant of the transpose of a matrix is the same (1.205) as the determinant of the matrix, some people write jacobians with their rows and columns interchanged.

1.22 Systems of Linear Equations

Suppose we wish to solve the system of $n$ linear equations

$$\sum_{k=1}^{n} A_{ik}\, x_k = y_i \tag{1.242}$$

for $n$ unknowns $x_k$. In matrix notation, with $A$ an $n\times n$ matrix and $x$ and $y$ $n$-vectors, this system of equations is $A x = y$. If the matrix $A$ is nonsingular, that is, if $\det(A) \neq 0$, then it has an inverse $A^{-1}$ given by (1.215), and we may multiply both sides of $A x = y$ by $A^{-1}$ and so find $x = A^{-1} y$. When $A$ is nonsingular, this is the unique solution to (1.242).

When $A$ is singular, its determinant vanishes, $\det(A) = 0$, and so its columns are linearly dependent (Section 1.20). In this case, the linear dependence of the columns of $A$ implies that $A z = 0$ for some nonzero vector $z$. Thus if $x$ satisfies $A x = y$, then so does $x + c z$ for any constant $c$ because $A(x + c z) = A x + c A z = y$. So if $\det(A) = 0$, then the equation $A x = y$ may have solutions, but they will not be unique. Whether equation (1.242) has any solutions when $\det(A) = 0$ depends on whether the vector $y$ can be expressed as a linear combination of the columns of $A$. Since these columns are linearly dependent, they span a subspace of fewer than $n$ dimensions, and so (1.242) has solutions only when the $n$-vector $y$ lies in that subspace.

A system of $m < n$ equations

$$\sum_{k=1}^{n} A_{ik}\, x_k = y_i \quad\text{for } i = 1, 2, \ldots, m \tag{1.243}$$

in $n$ unknowns is underdetermined. As long as at least $m$ of the $n$ columns $A_{ik}$ of the matrix $A$ are linearly independent, such a system always has solutions, but they may not be unique.


1.23 Linear Least Squares

Suppose we have a system of $m > n$ equations in $n$ unknowns $x_k$

$$\sum_{k=1}^{n} A_{ik}\, x_k = y_i \quad\text{for } i = 1, 2, \ldots, m. \tag{1.244}$$

This problem is overdetermined and, in general, has no solution, but it does have an approximate solution due to Carl Gauss (1777–1855).

If the matrix $A$ and the vector $y$ are real, then Gauss's solution is the $n$ values $x_k$ that minimize the sum $E$ of the squares of the errors

$$E = \sum_{i=1}^{m} \left( y_i - \sum_{k=1}^{n} A_{ik}\, x_k \right)^2. \tag{1.245}$$

The minimizing values $x_k$ make the $n$ derivatives of $E$ vanish

$$\frac{\partial E}{\partial x_\ell} = 0 = \sum_{i=1}^{m} 2 \left( y_i - \sum_{k=1}^{n} A_{ik}\, x_k \right) \left( -A_{i\ell} \right) \tag{1.246}$$

or in matrix notation $A^{\mathsf T} y = A^{\mathsf T} A\, x$. Since $A$ is real, the matrix $A^{\mathsf T} A$ is nonnegative (1.39); if it also is positive (1.40), then it has an inverse, and our least-squares solution is

$$x = \left( A^{\mathsf T} A \right)^{-1} A^{\mathsf T}\, y. \tag{1.247}$$

If the matrix $A$ and the vector $y$ are complex, and if the matrix $A^\dagger A$ is positive, then one may derive (Exercise 1.25) Gauss's solution

$$x = \left( A^\dagger A \right)^{-1} A^\dagger\, y. \tag{1.248}$$

The operators $\left( A^{\mathsf T} A \right)^{-1} A^{\mathsf T}$ and $\left( A^\dagger A \right)^{-1} A^\dagger$ are pseudoinverses (Section 1.33).
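A short numerical sketch, my illustration rather than the book's, compares the normal-equations solution (1.247) with the least-squares routine that numpy provides; the random system is an arbitrary choice.

```python
# Sketch (my illustration): Gauss's least-squares solution (1.247),
# x = (A^T A)^{-1} A^T y, for an overdetermined real system, compared
# with numpy's built-in least-squares solver.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))          # m = 8 equations, n = 3 unknowns
y = rng.standard_normal(8)

x_normal = np.linalg.inv(A.T @ A) @ (A.T @ y)     # equation (1.247)
x_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]    # library solution
```

In practice one prefers `lstsq` (or a QR or SVD factorization) to forming $A^{\mathsf T} A$ explicitly, since the normal equations square the condition number.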

1.24 Lagrange Multipliers

The maxima and minima of a function $f(x)$ of $x = (x_1, x_2, \ldots, x_n)$ are among the points at which its gradient vanishes, $\nabla f(x) = 0$, that is,

$$\frac{\partial f(x)}{\partial x_j} = 0 \tag{1.249}$$

for $j = 1, \ldots, n$. These are stationary points of $f$.


Example 1.36 (Minimum). For instance, if $f(x) = x_1^2 + 2 x_2^2 + 3 x_3^2$, then its minimum is at

$$\nabla f(x) = (2 x_1,\, 4 x_2,\, 6 x_3) = 0 \tag{1.250}$$

that is, at $x_1 = x_2 = x_3 = 0$.

How do we find the extrema of $f(x)$ if $x$ also must satisfy a constraint? We use a Lagrange multiplier (Joseph-Louis Lagrange 1736–1813). In the case of one constraint $c(x) = 0$, we expect the gradient $\nabla f(x)$ to vanish in those directions $dx$ that preserve the constraint. So $dx \cdot \nabla f(x) = 0$ for all $dx$ that make the dot product $dx \cdot \nabla c(x)$ vanish. That is, $\nabla f(x)$ and $\nabla c(x)$ must be parallel. So the extrema of $f(x)$ subject to the constraint $c(x) = 0$ satisfy the equations

$$\nabla f(x) = \lambda\, \nabla c(x) \quad\text{and}\quad c(x) = 0. \tag{1.251}$$

These $n + 1$ equations define the extrema of the unconstrained function

$$L(x, \lambda) = f(x) - \lambda\, c(x) \tag{1.252}$$

of the $n + 1$ variables $x_1, \ldots, x_n, \lambda$

$$\frac{\partial L(x, \lambda)}{\partial x_j} = \frac{\partial \left( f(x) - \lambda\, c(x) \right)}{\partial x_j} = 0 \quad\text{and}\quad \frac{\partial L(x, \lambda)}{\partial \lambda} = -c(x) = 0. \tag{1.253}$$

The variable $\lambda$ is a Lagrange multiplier.

In the case of $k$ constraints $c_1(x) = 0, \ldots, c_k(x) = 0$, the projection of $\nabla f$ must vanish in those directions $dx$ that preserve all the constraints. So $dx \cdot \nabla f(x) = 0$ for all $dx$ that make all $dx \cdot \nabla c_j(x) = 0$ for $j = 1, \ldots, k$. The gradient $\nabla f$ will satisfy this requirement if it's a linear combination

$$\nabla f = \lambda_1 \nabla c_1 + \cdots + \lambda_k \nabla c_k \tag{1.254}$$

of the $k$ gradients because then $dx \cdot \nabla f$ will vanish if $dx \cdot \nabla c_j = 0$ for $j = 1, \ldots, k$. The extrema also must satisfy the constraints

$$c_1(x) = 0, \ldots, c_k(x) = 0. \tag{1.255}$$

The $n + k$ equations (1.254 & 1.255) define the extrema of the unconstrained function

$$L(x, \lambda) = f(x) - \lambda_1 c_1(x) - \cdots - \lambda_k c_k(x) \tag{1.256}$$

of the $n + k$ variables $x$ and $\lambda$

$$\nabla L(x, \lambda) = \nabla f(x) - \lambda_1 \nabla c_1(x) - \cdots - \lambda_k \nabla c_k(x) = 0 \tag{1.257}$$


and

$$\frac{\partial L(x, \lambda)}{\partial \lambda_j} = -c_j(x) = 0 \quad\text{for } j = 1, \ldots, k. \tag{1.258}$$

Example 1.37 (Constrained extrema and eigenvectors). Suppose we want to find the extrema of a real, symmetric quadratic form

$$f(x) = x^{\mathsf T} A\, x = \sum_{i,j=1}^{n} x_i\, A_{ij}\, x_j \tag{1.259}$$

subject to the constraint $c(x) = x \cdot x - 1$, which says that the $n$-vector $x$ is of unit length. We form the function

$$L(x, \lambda) = x^{\mathsf T} A\, x - \lambda\, (x \cdot x - 1) \tag{1.260}$$

and since the matrix $A$ is real and symmetric, we find its unconstrained extrema as

$$\nabla L(x, \lambda) = 2 A x - 2 \lambda x = 0 \quad\text{and}\quad x \cdot x = 1. \tag{1.261}$$

The extrema of $f(x) = x^{\mathsf T} A x$ subject to the constraint $c(x) = x \cdot x - 1$ are the normalized eigenvectors

$$A x = \lambda x \quad\text{and}\quad x \cdot x = 1 \tag{1.262}$$

of the real, symmetric matrix $A$.
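Example 1.37 can be checked numerically: the constrained maximum of $x^{\mathsf T} A x$ on the unit sphere is the largest eigenvalue, attained at the corresponding normalized eigenvector. The matrix below is an arbitrary symmetric choice of mine.

```python
# Sketch (my illustration): the extrema of x^T A x on the unit sphere
# (Example 1.37) are eigenvectors; the largest value of the quadratic
# form is the largest eigenvalue of the real symmetric matrix A.
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

evals, evecs = np.linalg.eigh(A)       # symmetric eigenproblem, ascending
x = evecs[:, -1]                       # eigenvector of the largest eigenvalue
quad = x @ A @ x                       # x^T A x with x . x = 1
lam_max = evals[-1]
```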

1.25 Eigenvectors and Eigenvalues

If a linear operator $A$ maps a nonzero vector $|u\rangle$ into a multiple of itself

$$A |u\rangle = \lambda\, |u\rangle \tag{1.263}$$

then the vector $|u\rangle$ is an eigenvector of $A$ with eigenvalue $\lambda$. (The German adjective eigen means own, special, or proper.)

If the vectors $|k\rangle$ for $k = 1, \ldots, n$ form an orthonormal basis for the vector space in which $A$ acts, then we can write the identity operator for the space as $I = |1\rangle\langle 1| + \cdots + |n\rangle\langle n|$. By inserting this formula for $I$ into the eigenvector equation (1.263), we get

$$\sum_{\ell=1}^{n} \langle k| A |\ell\rangle \langle \ell|u\rangle = \lambda\, \langle k|u\rangle. \tag{1.264}$$

In matrix notation, with $A_{k\ell} = \langle k| A |\ell\rangle$ and $u_\ell = \langle \ell|u\rangle$, this is $A\, u = \lambda\, u$.

A subspace $c_\ell |u_\ell\rangle + \cdots + c_r |u_r\rangle$ spanned by any set $S$ of eigenvectors $|u_k\rangle$ of a matrix $A$ is left invariant by its action, that is

$$A \left( \sum_{k \in S} c_k |u_k\rangle \right) = \sum_{k \in S} c_k\, A |u_k\rangle = \sum_{k \in S} c_k \lambda_k |u_k\rangle = \sum_{k \in S} c'_k |u_k\rangle \tag{1.265}$$

with $c'_k = c_k \lambda_k$. Eigenvectors span invariant subspaces.

Example 1.38 (Eigenvalues of an orthogonal matrix). The matrix equation

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} 1 \\ \mp i \end{pmatrix} = e^{\pm i\theta} \begin{pmatrix} 1 \\ \mp i \end{pmatrix} \tag{1.266}$$

tells us that the eigenvectors of this $2\times 2$ orthogonal matrix are $(1, \mp i)$ with eigenvalues $e^{\pm i\theta}$. The eigenvalues $\lambda$ of a unitary (and of an orthogonal) matrix are unimodular, $|\lambda| = 1$ (Exercise 1.26).

Example 1.39 (Eigenvalues of an antisymmetric matrix). Let us consider an eigenvector equation for a matrix $A$ that is antisymmetric

$$\sum_{k=1}^{n} A_{ik}\, u_k = \lambda\, u_i. \tag{1.267}$$

The antisymmetry $A_{ik} = -A_{ki}$ of $A$ implies that

$$\sum_{i,k=1}^{n} u_i\, A_{ik}\, u_k = 0. \tag{1.268}$$

Thus the last two relations imply that

$$0 = \sum_{i,k=1}^{n} u_i\, A_{ik}\, u_k = \lambda \sum_{i=1}^{n} u_i^2. \tag{1.269}$$

Thus either the eigenvalue $\lambda$ or the dot product of the eigenvector with itself vanishes.
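Both examples are easy to verify numerically. This sketch of mine checks that the eigenvalues of the rotation matrix of (1.266) are unimodular and that an eigenvector of a real antisymmetric matrix with nonzero eigenvalue has vanishing (unconjugated) dot product with itself.

```python
# Sketch (my illustration): numerical checks of Examples 1.38 and 1.39.
import numpy as np

theta = 0.4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
rot_evals = sorted(np.linalg.eigvals(R), key=lambda z: z.imag)
mods = [abs(z) for z in rot_evals]           # should all be 1

S = np.array([[0.0, 2.0], [-2.0, 0.0]])      # antisymmetric, eigenvalues +-2i
lam, V = np.linalg.eig(S)
u = V[:, 0]
u_dot_u = u @ u                              # plain dot product, no conjugation
```

Note that `u @ u` sums $u_i^2$ without complex conjugation, which is exactly the quantity that (1.269) says must vanish when $\lambda \neq 0$.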

1.26 Eigenvectors of a Square Matrix

Let $A$ be an $n\times n$ matrix with complex entries $A_{ik}$. A vector $V$ with $n$ entries $V_k$ (not all zero) is an eigenvector of $A$ with eigenvalue $\lambda$ if

$$\sum_{k=1}^{n} A_{ik}\, V_k = \lambda\, V_i \quad\text{or}\quad A V = \lambda V \tag{1.270}$$

in matrix notation. Every $n\times n$ matrix $A$ has $n$ eigenvectors $V^{(\ell)}$ and eigenvalues $\lambda_\ell$

$$A V^{(\ell)} = \lambda_\ell\, V^{(\ell)} \tag{1.271}$$

for $\ell = 1, \ldots, n$. To see why, we write the top equation (1.270) as

$$\sum_{k=1}^{n} \left( A_{ik} - \lambda\, \delta_{ik} \right) V_k = 0 \tag{1.272}$$

or in matrix notation as $(A - \lambda I)\, V = 0$ in which $I$ is the $n\times n$ matrix with entries $I_{ik} = \delta_{ik}$. This equation and (1.272) say that the columns of the matrix $A - \lambda I$, considered as vectors, are linearly dependent (Section 1.8). The columns of a matrix $A - \lambda I$ are linearly dependent if and only if the determinant $|A - \lambda I|$ vanishes (Section 1.20). Thus a solution of the eigenvalue equation (1.270) exists if and only if the determinant of $A - \lambda I$ vanishes

$$\det(A - \lambda I) = |A - \lambda I| = 0. \tag{1.273}$$

This vanishing of the determinant of $A - \lambda I$ is the characteristic equation of the matrix $A$. For an $n\times n$ matrix $A$, it is a polynomial equation of the $n$th degree in the unknown eigenvalue $\lambda$

$$0 = |A - \lambda I| = |A| + \cdots + (-1)^{n-1} \lambda^{n-1}\, \mathrm{Tr}A + (-1)^n \lambda^n = P(\lambda, A) = \sum_{k=0}^{n} p_k\, \lambda^k \tag{1.274}$$

in which $p_0 = |A|$, $p_{n-1} = (-1)^{n-1}\, \mathrm{Tr}A$, and $p_n = (-1)^n$. All the $p_k$'s are basis independent. For if $S$ is any nonsingular matrix, then the multiplication rules (1.222 and 1.226) for determinants imply that the determinant $|A - \lambda I|$ is invariant when $A$ undergoes a similarity transformation (1.67 & 1.284) $A \to A' = S^{-1} A S$

$$P(\lambda, A') = P(\lambda, S^{-1} A S) = |S^{-1} A S - \lambda I| = |S^{-1} (A - \lambda I)\, S| = |S^{-1}|\, |A - \lambda I|\, |S| = |A - \lambda I| = P(\lambda, A). \tag{1.275}$$

By the fundamental theorem of algebra (Section 6.9), the characteristic equation (1.274) always has $n$ roots or solutions $\lambda_\ell$ lying somewhere in the complex plane. Thus the characteristic polynomial $P(\lambda, A)$ has the factored form

$$P(\lambda, A) = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda). \tag{1.276}$$

For every root $\lambda_\ell$, there is a nonzero eigenvector $V^{(\ell)}$ whose components $V_k^{(\ell)}$ are the coefficients that make the $n$ vectors $A_{ik} - \lambda_\ell\, \delta_{ik}$ that are the columns of the matrix $A - \lambda_\ell I$ sum to zero in (1.272). Thus every $n\times n$ matrix has $n$ eigenvalues $\lambda_\ell$ and $n$ eigenvectors $V^{(\ell)}$.

The $n\times n$ diagonal matrix $A^{(d)}_{k\ell} = \delta_{k\ell}\, \lambda_\ell$ is the canonical form of the matrix $A$; the matrix $V_{k\ell} = V_k^{(\ell)}$ whose columns are the eigenvectors $V^{(\ell)}$ of $A$ is the modal matrix; and $A V = V A^{(d)}$, or more explicitly

$$\sum_{k=1}^{n} A_{ik}\, V_{k\ell} = \sum_{k=1}^{n} A_{ik}\, V_k^{(\ell)} = \lambda_\ell\, V_i^{(\ell)} = \sum_{k=1}^{n} V_{ik}\, \delta_{k\ell}\, \lambda_\ell = \sum_{k=1}^{n} V_{ik}\, A^{(d)}_{k\ell}. \tag{1.277}$$

If the eigenvectors $V_{k\ell}$ are linearly independent, then the matrix $V$, of which they are the columns, is nonsingular and has an inverse $V^{-1}$. The similarity transformation

$$V^{-1} A\, V = A^{(d)} \tag{1.278}$$

diagonalizes the matrix $A$.

Example 1.40 (The Canonical Form of a 3 × 3 Matrix). If in Matlab we set A = [0 1 2; 3 4 5; 6 7 8] and enter [V, D] = eig(A), then we get

$$V = \begin{pmatrix} 0.1648 & 0.7997 & 0.4082 \\ 0.5058 & 0.1042 & -0.8165 \\ 0.8468 & -0.5913 & 0.4082 \end{pmatrix} \quad\text{and}\quad A^{(d)} = \begin{pmatrix} 13.3485 & 0 & 0 \\ 0 & -1.3485 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

and one may check that $A V = V A^{(d)}$ and that $V^{-1} A V = A^{(d)}$.
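The same computation in Python, a sketch of mine paralleling the Matlab commands of Example 1.40:

```python
# Sketch (my illustration): the numpy analog of Example 1.40; the columns
# of V are eigenvectors of A, A V = V A^(d), and V^{-1} A V = A^(d).
import numpy as np

A = np.array([[0.0, 1.0, 2.0],
              [3.0, 4.0, 5.0],
              [6.0, 7.0, 8.0]])

evals, V = np.linalg.eig(A)
Ad = np.diag(evals)                 # canonical (diagonal) form
check = np.allclose(A @ V, V @ Ad)  # A V = V A^(d), equation (1.277)
diag = np.linalg.inv(V) @ A @ V     # similarity transformation (1.278)
```

The trace of $A$ is 12, and indeed the eigenvalues $13.3485$, $-1.3485$, and $0$ sum to 12.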

Setting $\lambda = 0$ in the factored form (1.276) of $P(\lambda, A)$ and in the characteristic equation (1.274), we see that the determinant of every $n\times n$ matrix is the product of its $n$ eigenvalues

$$P(0, A) = |A| = p_0 = \lambda_1 \lambda_2 \cdots \lambda_n. \tag{1.279}$$

These $n$ roots usually are all different, and when they are, the eigenvectors $V^{(\ell)}$ are linearly independent. The first eigenvector is trivially linearly independent. Let's assume that the first $k < n$ eigenvectors are linearly independent; we'll show that the first $k + 1$ eigenvectors are linearly independent. If they were linearly dependent, then there would be $k + 1$ numbers $c_\ell$, not all zero, such that

$$\sum_{\ell=1}^{k+1} c_\ell\, V^{(\ell)} = 0. \tag{1.280}$$

First we multiply this equation from the left by the linear operator $A$ and use the eigenvalue equation (1.271)

$$A \sum_{\ell=1}^{k+1} c_\ell\, V^{(\ell)} = \sum_{\ell=1}^{k+1} c_\ell\, A\, V^{(\ell)} = \sum_{\ell=1}^{k+1} c_\ell\, \lambda_\ell\, V^{(\ell)} = 0. \tag{1.281}$$

Now we multiply the same equation (1.280) by $\lambda_{k+1}$

$$\sum_{\ell=1}^{k+1} c_\ell\, \lambda_{k+1}\, V^{(\ell)} = 0 \tag{1.282}$$

and subtract the product (1.282) from (1.281). The terms with $\ell = k + 1$ cancel, leaving

$$\sum_{\ell=1}^{k} c_\ell\, (\lambda_\ell - \lambda_{k+1})\, V^{(\ell)} = 0 \tag{1.283}$$

in which all the factors $(\lambda_\ell - \lambda_{k+1})$ are different from zero since by assumption all the eigenvalues are different. But this last equation says that the first $k$ eigenvectors are linearly dependent, which contradicts our assumption that they were linearly independent. This contradiction tells us that if all $n$ eigenvectors of an $n\times n$ square matrix have different eigenvalues, then they are linearly independent. Similarly, if any $k < n$ eigenvectors of an $n\times n$ square matrix have different eigenvalues, then they are linearly independent.

An eigenvalue $\lambda$ that is a single root of the characteristic equation (1.274) is associated with a single eigenvector; it is called a simple eigenvalue. An eigenvalue $\lambda$ that is a root of multiplicity $n$ of the characteristic equation is associated with $n$ eigenvectors; it is said to be an $n$-fold degenerate eigenvalue or to have algebraic multiplicity $n$. Its geometric multiplicity is the number $n' \leq n$ of linearly independent eigenvectors with eigenvalue $\lambda$. A matrix with $n' < n$ for any eigenvalue $\lambda$ is defective. Thus an $n\times n$ matrix with fewer than $n$ linearly independent eigenvectors is defective.

Every nondefective square matrix $A$ can be diagonalized by a similarity transformation (1.278)

$$V^{-1} A V = A^{(d)}. \tag{1.284}$$

The elements of the main diagonal of the matrix $A^{(d)}$ are the eigenvalues of the matrix $A$. Thus the trace of every nondefective matrix $A$ is the sum of its eigenvalues, $\mathrm{Tr}A = \mathrm{Tr}A^{(d)} = \lambda_1 + \cdots + \lambda_n$. The columns of the matrix $V$ are the eigenvectors of the matrix $A$. Since the determinant of every matrix $A$ is the product (1.279) of its eigenvalues, $\det A = |A| = \lambda_1 \lambda_2 \cdots \lambda_n$, the determinant of every nondefective matrix $A = e^L$ is the exponential of the trace of its logarithm

$$\det A = \exp\left[ \mathrm{Tr}(\log A) \right] \quad\text{and}\quad \det A = \det(e^L) = \exp\left[ \mathrm{Tr}(L) \right]. \tag{1.285}$$

Example 1.41 (A defective 2 × 2 matrix). Each of the $2\times 2$ matrices

$$\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \tag{1.286}$$

has only one linearly independent eigenvector and so is defective.


1.27 A Matrix Obeys Its Characteristic Equation

Every square matrix obeys its characteristic equation (1.274). That is, the characteristic equation

$$P(\lambda, A) = |A - \lambda I| = \sum_{k=0}^{n} p_k\, \lambda^k = 0 \tag{1.287}$$

remains true when the matrix $A$ replaces the variable $\lambda$

$$P(A, A) = \sum_{k=0}^{n} p_k\, A^k = 0. \tag{1.288}$$

To see why, we use the formula (1.215) for the inverse of the matrix $A - \lambda I$

$$(A - \lambda I)^{-1} = \frac{C(\lambda, A)^{\mathsf T}}{|A - \lambda I|} \tag{1.289}$$

in which $C(\lambda, A)^{\mathsf T}$ is the transpose of the matrix of cofactors of the matrix $A - \lambda I$. Since $|A - \lambda I| = P(\lambda, A)$, we have, rearranging,

$$(A - \lambda I)\, C(\lambda, A)^{\mathsf T} = |A - \lambda I|\, I = P(\lambda, A)\, I. \tag{1.290}$$

The transpose of the matrix of cofactors of the matrix $A - \lambda I$ is a polynomial in $\lambda$ with matrix coefficients

$$C(\lambda, A)^{\mathsf T} = C_0 + C_1 \lambda + \cdots + C_{n-1} \lambda^{n-1}. \tag{1.291}$$

Combining these last two equations (1.290 and 1.291) with the characteristic equation (1.287), we have

$$(A - \lambda I)\, C(\lambda, A)^{\mathsf T} = A C_0 + (A C_1 - C_0)\lambda + (A C_2 - C_1)\lambda^2 + \cdots + (A C_{n-1} - C_{n-2})\lambda^{n-1} - C_{n-1}\lambda^n = \sum_{k=0}^{n} p_k\, \lambda^k. \tag{1.292}$$

Equating equal powers of $\lambda$ on both sides of this equation, we find

$$\begin{aligned} A C_0 &= p_0\, I \\ A C_1 - C_0 &= p_1\, I \\ A C_2 - C_1 &= p_2\, I \\ &\;\;\vdots \\ A C_{n-1} - C_{n-2} &= p_{n-1}\, I \\ -C_{n-1} &= p_n\, I. \end{aligned} \tag{1.293}$$


We now multiply from the left the first of these equations by $I$, the second by $A$, the third by $A^2$, ..., and the last by $A^n$ and then add the resulting equations. All the terms on the left-hand sides cancel, while the sum of those on the right gives $P(A, A)$. Thus a square matrix $A$ obeys its characteristic equation $0 = P(A, A)$ or

$$0 = \sum_{k=0}^{n} p_k\, A^k = |A|\, I + p_1 A + \cdots + (-1)^{n-1} (\mathrm{Tr}A)\, A^{n-1} + (-1)^n A^n \tag{1.294}$$

a result known as the Cayley–Hamilton theorem (Arthur Cayley, 1821–1895, and William Hamilton, 1805–1865). This derivation is due to Israel Gelfand (1913–2009) (Gelfand, 1961, pp. 89–90).

Because every $n\times n$ matrix $A$ obeys its characteristic equation, its $n$th power $A^n$ can be expressed as a linear combination of its lesser powers

$$A^n = (-1)^{n-1} \left( |A|\, I + p_1 A + p_2 A^2 + \cdots + (-1)^{n-1} (\mathrm{Tr}A)\, A^{n-1} \right). \tag{1.295}$$

For instance, the square $A^2$ of every $2\times 2$ matrix is given by

$$A^2 = -|A|\, I + (\mathrm{Tr}A)\, A. \tag{1.296}$$

Example 1.42 (Spin-one-half rotation matrix). If $\theta$ is a real 3-vector and $\sigma$ is the 3-vector of Pauli matrices (1.32), then the square of the traceless $2\times 2$ matrix $A = \theta \cdot \sigma$ is

$$(\theta \cdot \sigma)^2 = -|\theta \cdot \sigma|\, I = -\begin{vmatrix} \theta_3 & \theta_1 - i\theta_2 \\ \theta_1 + i\theta_2 & -\theta_3 \end{vmatrix}\, I = \theta^2\, I \tag{1.297}$$

in which $\theta^2 = \theta \cdot \theta$. One may use this identity to show (Exercise 1.28) that

$$\exp\left( -i\, \theta \cdot \sigma / 2 \right) = \cos(\theta/2)\, I - i\, \hat{\theta} \cdot \sigma\, \sin(\theta/2) \tag{1.298}$$

in which $\hat{\theta}$ is a unit 3-vector. For a spin-one-half object, this matrix represents an active right-handed rotation of $\theta$ radians about the axis $\hat{\theta}$.
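Both (1.296) and the Pauli identity (1.297) can be verified directly. The matrix and the vector $\theta$ below are arbitrary choices of mine.

```python
# Sketch (my illustration): the 2x2 Cayley-Hamilton result (1.296),
# A^2 = -|A| I + (Tr A) A, and the Pauli identity (theta.sigma)^2
# = (theta.theta) I of (1.297).
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
lhs = A @ A
rhs = -np.linalg.det(A) * np.eye(2) + np.trace(A) * A

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]
theta = np.array([0.3, -0.2, 0.5])
t_dot_s = sum(t * s for t, s in zip(theta, sigma))
square = t_dot_s @ t_dot_s          # should equal (theta . theta) I
```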

1.28 Functions of Matrices

What sense can we make of a function $f$ of an $n\times n$ matrix $A$? And how would we compute it? One way is to use the characteristic equation to express (1.295) every power of $A$ in terms of $I, A, \ldots, A^{n-1}$ and the coefficients $p_0 = |A|$, $p_1, p_2, \ldots, p_{n-2}$, and $p_{n-1} = (-1)^{n-1}\, \mathrm{Tr}A$. Then if $f(x)$ is a polynomial or a function with a convergent power series

$$f(x) = \sum_{k=0}^{\infty} c_k\, x^k \tag{1.299}$$

in principle we may express $f(A)$ in terms of $n$ functions $f_k(p)$ of the coefficients $p \equiv (p_0, \ldots, p_{n-1})$ as

$$f(A) = \sum_{k=0}^{n-1} f_k(p)\, A^k. \tag{1.300}$$

The identity (1.298) for $\exp(-i\, \theta \cdot \sigma / 2)$ is an $n = 2$ example of this technique, which can become challenging when $n > 3$.

Example 1.43 (The 3 × 3 rotation matrix). In Exercise 1.29, one finds the characteristic equation (1.294) for the $3\times 3$ matrix $-i\, \theta \cdot J$ in which $(J_k)_{ij} = i\, \epsilon_{ikj}$, and $\epsilon_{ijk}$ is totally antisymmetric with $\epsilon_{123} = 1$. The generators $J_k$ satisfy the commutation relations $[J_i, J_j] = i\, \epsilon_{ijk} J_k$ in which sums over repeated indices from 1 to 3 are understood. In Exercise 1.30, one uses the characteristic equation for $-i\, \theta \cdot J$ to show that the $3\times 3$ real orthogonal matrix $\exp(-i\, \theta \cdot J)$, which represents a right-handed rotation by $\theta$ radians about the axis $\hat{\theta}$, is

$$\exp(-i\, \theta \cdot J) = \cos\theta\, I - i\, \hat{\theta} \cdot J\, \sin\theta + (1 - \cos\theta)\, \hat{\theta}\, (\hat{\theta})^{\mathsf T} \tag{1.301}$$

or

$$\exp(-i\, \theta \cdot J)_{ij} = \delta_{ij} \cos\theta - \sin\theta\, \epsilon_{ijk}\, \hat{\theta}_k + (1 - \cos\theta)\, \hat{\theta}_i\, \hat{\theta}_j \tag{1.302}$$

in terms of indices.
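The closed form (1.301) can be checked against a brute-force power-series evaluation of the matrix exponential. This is my own sketch; the particular vector $\theta$ is arbitrary.

```python
# Sketch (my illustration): checking the rotation formula (1.301) against
# a direct power-series sum for exp(-i theta.J), with (J_k)_ij = i eps_ikj.
import numpy as np

eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1.0
J = np.array([[[1j * eps[i, k, j] for j in range(3)] for i in range(3)]
              for k in range(3)])          # J[k][i][j] = i eps_{ikj}

theta_vec = np.array([0.3, -0.5, 0.4])
theta = np.linalg.norm(theta_vec)
that = theta_vec / theta

M = -1j * sum(theta_vec[k] * J[k] for k in range(3))   # -i theta . J
series = np.eye(3, dtype=complex)
term = np.eye(3, dtype=complex)
for n in range(1, 40):                     # power series for exp(M)
    term = term @ M / n
    series = series + term

formula = (np.cos(theta) * np.eye(3)
           - 1j * np.sin(theta) * sum(that[k] * J[k] for k in range(3))
           + (1 - np.cos(theta)) * np.outer(that, that))
```

The series also confirms that the result is a real orthogonal matrix.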

Direct use of the characteristic equation can become unwieldy for larger values of $n$. Fortunately, another trick is available if $A$ is a nondefective square matrix, and if the power series (1.299) for $f(x)$ converges. For then $A$ is related to its diagonal form $A^{(d)}$ by a similarity transformation (1.278), and we may define $f(A)$ as

$$f(A) = S\, f(A^{(d)})\, S^{-1} \tag{1.303}$$

in which $f(A^{(d)})$ is the diagonal matrix with entries $f(a_\ell)$

$$f(A^{(d)}) = \begin{pmatrix} f(a_1) & 0 & \cdots & 0 \\ 0 & f(a_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f(a_n) \end{pmatrix} \tag{1.304}$$

and $a_1, a_2, \ldots, a_n$ are the eigenvalues of the matrix $A$. This definition makes sense if $f(A)$ is a series in powers of $A$ because then

$$f(A) = \sum_{k=0}^{\infty} c_k\, A^k = \sum_{k=0}^{\infty} c_k \left( S A^{(d)} S^{-1} \right)^k. \tag{1.305}$$

So since $S^{-1} S = I$, we have $\left( S A^{(d)} S^{-1} \right)^k = S \left( A^{(d)} \right)^k S^{-1}$ and thus

$$f(A) = S \left[ \sum_{k=0}^{\infty} c_k \left( A^{(d)} \right)^k \right] S^{-1} = S\, f(A^{(d)})\, S^{-1} \tag{1.306}$$

which is (1.303).
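The prescription (1.303) is how one computes matrix functions in practice. This sketch of mine applies it to $f = \exp$ for a small nondefective matrix and compares the result with the power series (1.299) summed term by term.

```python
# Sketch (my illustration): f(A) = S f(A^(d)) S^{-1}, equation (1.303),
# with f = exp, checked against the power series sum_k A^k / k!.
import numpy as np

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # eigenvalues -1 and -2

evals, S = np.linalg.eig(A)
expA = S @ np.diag(np.exp(evals)) @ np.linalg.inv(S)   # equation (1.303)

series_sum = np.eye(2)
term = np.eye(2)
for k in range(1, 30):                     # I + A + A^2/2! + ...
    term = term @ A / k
    series_sum = series_sum + term
```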

Example 1.44 (Momentum operators generate spatial translations). The position operator $x$ and the momentum operator $p$ obey the commutation relation $[x, p] = x p - p x = i\hbar$. Thus the $a$-derivative $\dot{x}(a)$ of the operator $x(a) = e^{iap/\hbar}\, x\, e^{-iap/\hbar}$ is unity

$$\dot{x}(a) = e^{iap/\hbar}\, (i/\hbar)\, [p, x]\, e^{-iap/\hbar} = e^{iap/\hbar}\, e^{-iap/\hbar} = 1. \tag{1.307}$$

Since $x(0) = x$, we see that the unitary transformation $U(a) = e^{iap/\hbar}$ moves $x$ to $x + a$

$$e^{iap/\hbar}\, x\, e^{-iap/\hbar} = x(a) = x(0) + \int_0^a \dot{x}(a')\, da' = x + a. \tag{1.308}$$

Example 1.45(Glauber’s identity) The commutator of the annihilation operator a and the creation operator a† for a given mode is the number 1

[a , a †] = a a† − a† a = 1. (1.309) Thus a and a † commute with their commutator [a, a†] = 1 just as x and p commute with their commutator [x , p] = i ±. Suppose that A and B are any two operators that commute with their commutator

[ A, B ] = AB − B A

[ A, [ A, B]] = [ B, [ A, B]] = 0. (1.310) As in the [ x , p] example (1.44), we define A B (t ) = e−t B A et B and note that because [ B , [ A , B ]] = 0, its t -derivative is simply A˙ B (t ) = e−t B [ A , B ] et B = [ A , B ] . (1.311) Since A B (0) = A, an integration gives ºt ºt A B (t ) = A + A˙ (t ) dt = A + [ A, B] dt = A + t [ A, B ]. (1.312) 0

0

Multiplication from the left by e

tB

now gives et B A B (t ) as

e t B A B ( t ) = A et B Now we define

= et B ( A + t [ A, B ]) .

G (t ) = et A et B e−t ( A+ B )

(1.313) (1.314)

52

1 Linear Algebra

and use our formula (1.313) to compute its t -derivative as

À

G˙ (t ) = et A A et B

= et A

À tB e

Á + et B B − et B ( A + B ) e−t A+ B

(A

(

)

+ t [ A , B ] ) + et B B − et B ( A + B )

Á

e−t ( A+ B )

(1.315)

= et A et B t [ A, B ] et A + B = t [ A, B ] G (t ) = t G(t ) [ A, B]. Since G˙ (t ), G (t ), and [ A , B ] all commute with each other, we can integrate this (

operator equation

)

d G˙ (t ) log G (t ) = dt G (t )

= t [ A, B ]

(1.316)

from 0 to 1 and get since G (0) = 1 log G(1) − log G (0) = log G (1) = Thus G (1) = e[ A, B ]/2 and so

e A e B e−( A +B )

=e

1 2

1 [ A, B ]. 2

(1.317)

[ A ,B ] or e A e B = e A+ B + 12 [ A ,B ]

(1.318)

which is Glauber’s identity. Example 1.46(Chemical reactions) The chemical reactions [A] − → [B], [B ] −→ [A], c and [B] − → [C] make the concentrations [ A] ≡ A, [ B ] ≡ B , and [C ] ≡ C of three kinds of molecules vary with time as a



= − a A + bB, B˙ = a A − (b + c)B,

and C˙

= cB.

b

(1.319)

We can group these concentrations into a 3-vector V = ( A , B , C ) and write the three equations (1.319) as V˙ = K V in which K is the matrix

⎛−a b −b − c K= ⎝a 0

c



0 0⎠ . 0

(1.320)

The solution to the differential equation V˙ = K V is V (t ) = eK t V (0). The eigenvalues of the matrix K are the roots of the cubic equation det( K − λ I ) = 0. One root vanishes, and the other two are the roots of the quadratic equation λ2 + (a + b + c)λ + ac = 0. Their sum is the trace TrK = −(a + b + c). They are real when a, b, and c are positive but are complex when 4ac > (a + b + c)2 . The eigenvectors are complete unless 4ac = (a + b + c)2, but are not orthogonal unless c = 0. The time evolution of the concentrations [ A] (dashdot), [ B ] (solid), and [C ] (dashes) are plotted in Fig. 1.1 for the initial conditions [ A] = 1 and [ B ] = [C ] = 0 and rates a = 0.15, b = 0. 1, and c = 0. 1. The Matlab code is in the repository Linear_algebra at github.com/kevinecahill.
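A Python version of the computation behind Fig. 1.1, a sketch of mine rather than the author's Matlab code, evaluates $V(t) = e^{Kt} V(0)$ by diagonalizing $K$:

```python
# Sketch (my illustration): solving V'(t) = K V(t) of Example 1.46 as
# V(t) = e^{Kt} V(0), using the eigendecomposition of K; the total
# concentration A + B + C is conserved because each column of K sums to 0.
import numpy as np

a, b, c = 0.15, 0.1, 0.1
K = np.array([[-a, b, 0.0],
              [a, -b - c, 0.0],
              [0.0, c, 0.0]])

evals, S = np.linalg.eig(K)
Sinv = np.linalg.inv(S)
V0 = np.array([1.0, 0.0, 0.0])           # [A] = 1, [B] = [C] = 0

def V(t):
    """Concentrations at time t, V(t) = S e^{lambda t} S^{-1} V(0)."""
    return (S @ np.diag(np.exp(evals * t)) @ Sinv @ V0).real

total = V(50.0).sum()                    # conserved, stays equal to 1
```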


Figure 1.1 The concentrations [A] (dashdot), [B] (solid), and [C] (dashes) as given by the matrix equation V(t) = e^{Kt} V(0) for the initial conditions [A] = 1 and [B] = [C] = 0 and rates a = 0.15, b = 0.1, and c = 0.1.

Example 1.47 (Time-evolution operator) In quantum mechanics, the time-evolution operator is the exponential exp(−iHt/ℏ) where H = H† is a hermitian linear operator, the hamiltonian (William Rowan Hamilton 1805–1865), and ℏ = h/(2π) = 1.054 × 10⁻³⁴ J s where h is Planck's constant (Max Planck 1858–1947). As we'll see in the next section, hermitian operators are never defective, so H can be diagonalized by a similarity transformation

    H = S H^(d) S⁻¹.   (1.321)

The diagonal elements of the diagonal matrix H^(d) are the energies Eₗ of the states of the system described by the hamiltonian H. The time-evolution operator U(t) then is

    U(t) = S exp(−i H^(d) t/ℏ) S⁻¹.   (1.322)

For a three-state system with angular frequencies ωᵢ = Eᵢ/ℏ, it is

    U(t) = S diag(e^{−iω₁t}, e^{−iω₂t}, e^{−iω₃t}) S⁻¹.   (1.323)
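As a numerical illustration of (1.321)–(1.322), the sketch below uses an invented 3-state hermitian hamiltonian and units in which ℏ = 1 (both assumptions, not from the text) and checks that U(t) built from the eigendecomposition is unitary:

```python
import numpy as np

hbar = 1.0                       # units with hbar = 1 (an assumption, not the SI value)
H = np.array([[2.0, 1.0, 0.0],   # a made-up 3-state hermitian hamiltonian
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])

E, S = np.linalg.eigh(H)         # H = S diag(E) S^dagger, as in (1.321)

def U(t):
    """Time-evolution operator U(t) = S exp(-i H^(d) t/hbar) S^dagger, as in (1.322)."""
    return (S * np.exp(-1j * E * t / hbar)) @ S.conj().T

# U(t) is unitary: U(t) U(t)^dagger = I.
t = 0.7
print(np.allclose(U(t) @ U(t).conj().T, np.eye(3)))
```

Because H is hermitian, S is unitary, so S⁻¹ = S†, which the code uses.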

Example 1.48 (Entropy) The entropy S of a system described by a density operator ρ is the trace S = −k Tr(ρ ln ρ) in which k = 1.38 × 10⁻²³ J/K is the constant named after Ludwig Boltzmann (1844–1906). The density operator ρ is hermitian, nonnegative, and of unit trace. Since ρ is hermitian, the matrix that represents it is never defective (section 1.29), and so it can be diagonalized by a similarity transformation ρ = S ρ^(d) S⁻¹. By (1.24), Tr ABC = Tr BCA, so we can write S as

    S = −k Tr[S ρ^(d) S⁻¹ S ln(ρ^(d)) S⁻¹] = −k Tr[ρ^(d) ln(ρ^(d))].   (1.324)

A vanishing eigenvalue ρ_k = 0 contributes nothing to this trace since lim_{x→0} x ln x = 0. If the system has three states, populated with probabilities ρᵢ, the elements of ρ^(d), then the sum

    S = −k (ρ₁ ln ρ₁ + ρ₂ ln ρ₂ + ρ₃ ln ρ₃) = k [ρ₁ ln(1/ρ₁) + ρ₂ ln(1/ρ₂) + ρ₃ ln(1/ρ₃)]   (1.325)

is its entropy.

Example 1.49 (Logarithm of a determinant) Since every nondefective n × n matrix A may be diagonalized by a similarity transformation, its determinant is the product of its eigenvalues and its trace is the sum of them, and so the logarithm of its determinant is the trace of its logarithm

    ln det A = ln(λ₁ ⋯ λₙ) = ln(λ₁) + ⋯ + ln(λₙ) = Tr(ln A).   (1.326)

When none of A's eigenvalues vanishes, this relation implies the earlier result (1.228) that the variation of A's determinant is

    δ det A = det A Tr(A⁻¹ δA).   (1.327)
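A quick numerical check of (1.326), using a random symmetric positive-definite matrix as an invented test case (for a positive-definite matrix the eigenvalues are positive, so both sides are real):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
A = X @ X.T + 4.0 * np.eye(4)    # a random symmetric positive-definite matrix

lam = np.linalg.eigvalsh(A)      # its real, positive eigenvalues

# ln det A = sum of ln(lambda_i) = Tr(ln A), equation (1.326)
lhs = np.log(np.linalg.det(A))
rhs = np.sum(np.log(lam))
print(lhs, rhs)
```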

1.29 Hermitian Matrices

Hermitian matrices have very nice properties. By definition (1.30), a hermitian matrix A is square and unchanged by hermitian conjugation, A† = A. Since it is square, the results of section 1.26 ensure that an n × n hermitian matrix A has n eigenvectors |k⟩ with eigenvalues a_k

    A|k⟩ = a_k |k⟩.   (1.328)

In fact, all its eigenvalues are real. To see why, we take the adjoint

    ⟨k|A† = a_k* ⟨k|   (1.329)

and use the property A† = A to find

    ⟨k|A† = ⟨k|A = a_k* ⟨k|.   (1.330)

We now form the inner product of both sides of this equation with the ket |k⟩ and use the eigenvalue equation (1.328) to get

    ⟨k|A|k⟩ = a_k ⟨k|k⟩ = a_k* ⟨k|k⟩   (1.331)

which (since ⟨k|k⟩ > 0) tells us that the eigenvalues are real

    a_k* = a_k.   (1.332)

Since A† = A, the matrix elements of A between two of its eigenvectors satisfy

    a_m* ⟨m|k⟩ = (a_m ⟨k|m⟩)* = ⟨k|A|m⟩* = ⟨m|A†|k⟩ = ⟨m|A|k⟩ = a_k ⟨m|k⟩   (1.333)

which implies that

    (a_m* − a_k) ⟨m|k⟩ = 0.   (1.334)

But by (1.332), the eigenvalues a_m are real, and so we have

    (a_m − a_k) ⟨m|k⟩ = 0   (1.335)

which tells us that when the eigenvalues are different, the eigenvectors are orthogonal. In the absence of a symmetry, all n eigenvalues usually are different, and so the eigenvectors usually are mutually orthogonal.

When two or more eigenvectors |k_α⟩ of a hermitian matrix have the same eigenvalue a_k, their eigenvalues are said to be degenerate. In this case, any linear combination of the degenerate eigenvectors also will be an eigenvector with the same eigenvalue a_k

    A (Σ_{α∈D} c_α |k_α⟩) = a_k (Σ_{α∈D} c_α |k_α⟩)   (1.336)

where D is the set of labels α of the eigenvectors with the same eigenvalue. If the degenerate eigenvectors |k_α⟩ are linearly independent, then we may use the Gram–Schmidt procedure (1.113–1.123) to choose the coefficients c_α so as to construct degenerate eigenvectors that are orthogonal to each other and to the nondegenerate eigenvectors. We then may normalize these mutually orthogonal eigenvectors.

But two related questions arise: Are the degenerate eigenvectors |k_α⟩ linearly independent? And if so, what orthonormal linear combinations of them should we choose for a given physical problem? Let's consider the second question first. We know that unitary transformations preserve the orthonormality of a basis (section 1.16). Any unitary transformation that commutes with the matrix A

    [A, U] = 0   (1.337)

represents a symmetry of A and maps each set of orthonormal degenerate eigenvectors of A into another set of orthonormal degenerate eigenvectors of A with the same eigenvalue because

    A U|k_α⟩ = U A|k_α⟩ = a_k U|k_α⟩.   (1.338)


So there's a huge spectrum of choices for the orthonormal degenerate eigenvectors of A with the same eigenvalue. What is the right set for a given physical problem? A sensible way to proceed is to add to the matrix A a second hermitian matrix B multiplied by a tiny, real scale factor ε

    A(ε) = A + εB.   (1.339)

The matrix B must completely break whatever symmetry led to the degeneracy in the eigenvalues of A. Ideally, the matrix B should be one that represents a modification of A that is physically plausible and relevant to the problem at hand. The hermitian matrix A(ε) then will have n different eigenvalues a_{kβ}(ε) and n orthonormal nondegenerate eigenvectors

    A(ε)|k_β, ε⟩ = a_{kβ}(ε)|k_β, ε⟩.   (1.340)

These eigenvectors |k_β, ε⟩ of A(ε) are orthogonal to each other

    ⟨k_β, ε|k_{β′}, ε⟩ = δ_{β,β′}   (1.341)

and to the eigenvectors of A(ε) with other eigenvalues, and they remain so as we take the limit

    |k_β⟩ = lim_{ε→0} |k_β, ε⟩.   (1.342)

We may choose them as the orthogonal degenerate eigenvectors of A. Since one can always find a crooked hermitian matrix B that breaks any particular symmetry, it follows that every n × n hermitian matrix A possesses n orthonormal eigenvectors, which are complete in the vector space in which A acts. (Any n linearly independent vectors span their n-dimensional vector space, as explained in section 1.9.)

Now let's return to the first question and show by a different argument that an n × n hermitian matrix has n orthogonal eigenvectors. To do this, we first note that the space S⊥,k of vectors |y⟩ orthogonal to an eigenvector |k⟩ of a hermitian operator A

    A|k⟩ = a_k |k⟩   (1.343)

is invariant under the action of A, that is, ⟨k|y⟩ = 0 implies

    ⟨k|A|y⟩ = a_k ⟨k|y⟩ = 0.   (1.344)

Thus if the vector |y⟩ is in the space S⊥,k of vectors orthogonal to an eigenvector |k⟩ of a hermitian operator A, then the vector A|y⟩ also is in the space S⊥,k. This space is invariant under the action of A.

Now a hermitian operator A acting on an n-dimensional vector space S is represented by an n × n hermitian matrix, and so it has at least one eigenvector |1⟩. The subspace S⊥,1 of S consisting of all vectors orthogonal to |1⟩ is an


(n − 1)-dimensional vector space S_{n−1} that is invariant under the action of A. On this space S_{n−1}, the operator A is represented by an (n − 1) × (n − 1) hermitian matrix A_{n−1}. This matrix has at least one eigenvector |2⟩. The subspace S⊥,2 of S_{n−1} consisting of all vectors orthogonal to |2⟩ is an (n − 2)-dimensional vector space S_{n−2} that is invariant under the action of A. On S_{n−2}, the operator A is represented by an (n − 2) × (n − 2) hermitian matrix A_{n−2} which has at least one eigenvector |3⟩. By construction, the vectors |1⟩, |2⟩, and |3⟩ are mutually orthogonal. Continuing in this way, we see that A has n orthogonal eigenvectors |k⟩ for k = 1, 2, ..., n. Thus hermitian matrices are nondefective.

The n orthogonal eigenvectors |k⟩ of an n × n matrix A can be normalized and used to write the n × n identity operator I as

    I = Σ_{k=1}^n |k⟩⟨k|.   (1.345)

On multiplying from the left by the matrix A, we find

    A = A I = A Σ_{k=1}^n |k⟩⟨k| = Σ_{k=1}^n a_k |k⟩⟨k|   (1.346)

which is the diagonal form of the hermitian matrix A. This expansion of A as a sum over outer products of its eigenstates multiplied by their eigenvalues exhibits the possible values a_k of the physical quantity represented by the matrix A when selective, nondestructive measurements |k⟩⟨k| of the quantity A are made.

The hermitian matrix A is diagonal in the basis of its eigenstates |k⟩

    A_{kj} = ⟨k|A|j⟩ = a_k δ_{kj}.   (1.347)

But in any other basis |α_k⟩, the matrix A appears as

    A_{kℓ} = ⟨α_k|A|α_ℓ⟩ = Σ_{n=1}^n ⟨α_k|n⟩ a_n ⟨n|α_ℓ⟩.   (1.348)

The unitary matrix U_{kn} = ⟨α_k|n⟩ relates the matrix A_{kℓ} in an arbitrary basis to its diagonal form A = U A^(d) U† in which A^(d) is the diagonal matrix A^(d)_{nm} = a_n δ_{nm}. An arbitrary n × n hermitian matrix A can be diagonalized by a unitary transformation.

A matrix that is real and symmetric is hermitian; so is one that is imaginary and antisymmetric. A real, symmetric matrix R can be diagonalized by an orthogonal transformation

    R = O R^(d) Oᵀ   (1.349)

in which the matrix O is a real unitary matrix, that is, an orthogonal matrix (1.184).
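Numerically, the diagonalization A = U A^(d) U† of a hermitian matrix is a one-liner; the sketch below (with an invented random hermitian matrix) checks that the eigenvalues are real, the eigenvectors orthonormal, and the decomposition reproduces A:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = X + X.conj().T               # a random 4x4 hermitian matrix

a, U = np.linalg.eigh(A)         # real eigenvalues a_k and unitary U of eigenvectors

# U is unitary and A = U A^(d) U^dagger, as in (1.346)-(1.348).
print(np.allclose(U.conj().T @ U, np.eye(4)),
      np.allclose(U @ np.diag(a) @ U.conj().T, A))
```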


Example 1.50 (The seesaw mechanism) Suppose we wish to find the eigenvalues of the real, symmetric mass matrix

        ⎛ 0   m ⎞
    M = ⎝ m   M ⎠   (1.350)

in which m is an ordinary mass and M is a huge mass. The eigenvalues μ of this hermitian mass matrix satisfy det(M − μI) = μ(μ − M) − m² = 0 with solutions μ± = (M ± √(M² + 4m²))/2. The larger mass μ₊ ≈ M + m²/M is approximately the huge mass M and the smaller mass μ₋ ≈ −m²/M is tiny. The physical mass of a fermion is the absolute value of its mass parameter, here m²/M. The product of the two eigenvalues is the constant μ₊μ₋ = det M = −m², so as μ₋ goes down, μ₊ must go up. Minkowski, Yanagida, and Gell-Mann, Ramond, and Slansky invented this "seesaw" mechanism as an explanation of why neutrinos have such small masses, less than 1 eV/c². If mc² = 10 MeV and μ₋c² ≈ 0.01 eV, which is a plausible light-neutrino mass, then the rest energy of the huge mass would be Mc² = 10⁷ GeV, suggesting new physics at that scale. But if we set mc² = 0.28 MeV and use m_ν = 0.45 eV as an average neutrino mass, then the big mass is only Mc² = 173 GeV, the mass of the top. Also, the small masses of the neutrinos may be related to the weakness of their interactions.
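The seesaw pattern is easy to see numerically; with the toy values m = 1 and M = 10⁴ (invented for illustration), the exact eigenvalues of (1.350) sit very close to M + m²/M and −m²/M, and their product is −m²:

```python
import numpy as np

m, M = 1.0, 1.0e4                # a toy ordinary mass and a huge mass
mass = np.array([[0.0, m],
                 [m,   M]])      # the seesaw matrix of equation (1.350)

mu = np.linalg.eigvalsh(mass)    # exact eigenvalues, mu[0] = mu_minus < mu[1] = mu_plus

# mu_plus is approximately M + m^2/M, mu_minus is approximately -m^2/M,
# and their product is det M = -m^2.
print(mu, mu[0] * mu[1])
```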

If we return to the orthogonal transformation (1.349) and multiply column ℓ of the matrix O and row ℓ of the matrix Oᵀ by √|R^(d)_ℓ|, then we arrive at the congruency transformation of Sylvester's theorem

    R = C R̂^(d) Cᵀ   (1.351)

in which the diagonal entries R̂^(d)_ℓ are either ±1 or 0 because the matrices C_{kℓ} = √|R^(d)_ℓ| O_{kℓ} and Cᵀ have absorbed the factors √|R^(d)_ℓ|.

Example 1.51 (Principle of equivalence) If G is a real, symmetric 4 × 4 matrix, then there's a real 4 × 4 matrix D = (Cᵀ)⁻¹ such that

    G^(d) = Dᵀ G D = diag(g₁, g₂, g₃, g₄)   (1.352)

in which the diagonal entries gᵢ are ±1 or 0. Thus there's a real 4 × 4 matrix D that casts any real symmetric metric g_{ik} of spacetime with three positive and one negative eigenvalues into the diagonal metric η_{jℓ} of flat spacetime by the congruence

    g^(d) = Dᵀ g D = diag(−1, 1, 1, 1) = η   (1.353)

at any given point x of spacetime. Usually one needs different Ds at different points. The principle of equivalence (section 13.25) says that in the new free-fall coordinates, all physical laws take the same form as in special relativity without acceleration or gravitation in a suitably small region of spacetime about the point x.

1.30 Normal Matrices

The largest set of matrices that can be diagonalized by a unitary transformation is the set of normal matrices. These are square matrices that commute with their adjoints

    [V, V†] = V V† − V† V = 0.   (1.354)

This broad class of matrices includes not only hermitian matrices but also unitary matrices since

    [U, U†] = U U† − U† U = I − I = 0.   (1.355)

A matrix V = U† V^(d) U that can be diagonalized by a unitary transformation U commutes with its adjoint V† = U† V^(d)* U and so is normal because the commutator of any two diagonal matrices vanishes

    [V, V†] = [U† V^(d) U, U† V^(d)* U] = U† [V^(d), V^(d)*] U = 0.   (1.356)

To see why a normal matrix can be diagonalized by a unitary transformation, we consider an n × n normal matrix V which since it is square has n eigenvectors |k⟩ with eigenvalues v_k

    (V − v_k I)|k⟩ = 0   (1.357)

(section 1.26). The square of the norm (1.85) of this vector must vanish

    ‖(V − v_k I)|k⟩‖² = ⟨k|(V − v_k I)†(V − v_k I)|k⟩ = 0.   (1.358)

But since V is normal, we also have

    ⟨k|(V − v_k I)†(V − v_k I)|k⟩ = ⟨k|(V − v_k I)(V − v_k I)†|k⟩.   (1.359)

So the square of the norm of the vector (V† − v_k* I)|k⟩ = (V − v_k I)†|k⟩ also vanishes, ‖(V† − v_k* I)|k⟩‖² = 0, which tells us that |k⟩ also is an eigenvector of V† with eigenvalue v_k*

    V†|k⟩ = v_k* |k⟩   and so   ⟨k|V = v_k ⟨k|.   (1.360)


If now |m⟩ is an eigenvector of V with eigenvalue v_m

    V|m⟩ = v_m |m⟩   (1.361)

then

    ⟨k|V|m⟩ = v_m ⟨k|m⟩   (1.362)

and also by (1.360)

    ⟨k|V|m⟩ = v_k ⟨k|m⟩.   (1.363)

Subtracting (1.362) from (1.363), we get

    (v_k − v_m) ⟨k|m⟩ = 0   (1.364)

which shows that any two eigenvectors of a normal matrix V with different eigenvalues are orthogonal.

To see that a normal n × n matrix V has n orthogonal eigenvectors, we first note that if |y⟩ is any vector that is orthogonal to an eigenvector |k⟩ of the matrix V, that is, both ⟨k|y⟩ = 0 and V|k⟩ = v_k |k⟩, then the property (1.360) implies that

    ⟨k|V|y⟩ = v_k ⟨k|y⟩ = 0.   (1.365)

Thus the space of vectors orthogonal to an eigenvector of a normal matrix V is invariant under the action of V. The argument following the analogous equation (1.344) applies also to normal matrices and shows that every n × n normal matrix has n orthonormal eigenvectors. It follows then from the argument of equations (1.345–1.348) that every n × n normal matrix V can be diagonalized by an n × n unitary matrix U

    V = U V^(d) U†   (1.366)

whose kth column U_{ℓk} = ⟨α_ℓ|k⟩ is the eigenvector |k⟩,

    V|k⟩ = v_k |k⟩,   (1.367)

in the arbitrary basis |α_ℓ⟩ of the matrix V_{mℓ} = ⟨α_m|V|α_ℓ⟩ as in (1.348).

Since the eigenstates |k⟩ of a normal matrix V are complete and orthonormal, we can write the identity operator I as

    I = Σ_{k=1}^n |k⟩⟨k|.   (1.368)

The product V I is V itself, so

    V = V I = V Σ_{k=1}^n |k⟩⟨k| = Σ_{k=1}^n v_k |k⟩⟨k|.   (1.369)

It follows therefore that if f is a function, then f(V) is

    f(V) = Σ_{k=1}^n f(v_k) |k⟩⟨k|   (1.370)

which is simpler than the corresponding formula (1.303) for an arbitrary nondefective matrix. This is a good way to think about functions of normal matrices.

Example 1.52 (Time-evolution operator) How do we handle the operator exp(−iHt/ℏ) that translates states in time by t? The hamiltonian H is hermitian and so is normal. Its orthonormal eigenstates |k⟩ have energy E_k

    H|k⟩ = E_k |k⟩.   (1.371)

So we apply (1.370) with V → H and get

    e^{−iHt/ℏ} = Σ_{k=1}^n e^{−iE_k t/ℏ} |k⟩⟨k|   (1.372)

which lets us compute the time evolution of any state |ψ⟩ as

    e^{−iHt/ℏ} |ψ⟩ = Σ_{k=1}^n e^{−iE_k t/ℏ} |k⟩⟨k|ψ⟩   (1.373)

if we know the eigenstates |k⟩ and eigenvalues E_k of the hamiltonian H.
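Formula (1.370) translates directly into code. The sketch below (with an invented real symmetric H and units in which ℏ = 1) forms f(H) = Σ_k f(E_k)|k⟩⟨k| for f(x) = e^{−ixt} and checks that the result multiplies an eigenstate by the expected phase and is unitary:

```python
import numpy as np

# A made-up hermitian (hence normal) 3-state "hamiltonian", with hbar = 1.
H = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 2.0]])
E, S = np.linalg.eigh(H)         # eigenvalues E_k and orthonormal eigenvectors S[:, k]
t = 1.3

# f(V) = sum_k f(v_k) |k><k|, equation (1.370), with f(x) = exp(-i x t).
fV = sum(np.exp(-1j * E[k] * t) * np.outer(S[:, k], S[:, k].conj())
         for k in range(3))

# Acting on an eigenstate just multiplies it by the phase exp(-i E_k t).
psi = S[:, 0]
print(np.allclose(fV @ psi, np.exp(-1j * E[0] * t) * psi))
```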

The determinant |V| of a normal matrix V satisfies the identities

    |V| = exp[Tr(ln V)],   ln|V| = Tr(ln V),   and   δ ln|V| = Tr(V⁻¹ δV).   (1.374)

1.31 Compatible Normal Matrices

Two normal matrices A and B that commute

    [A, B] ≡ AB − BA = 0   (1.375)

are said to be compatible. Since these operators are normal, they have complete sets of orthonormal eigenvectors. If |u⟩ is an eigenvector of A with eigenvalue z, then so is B|u⟩ since

    A B|u⟩ = B A|u⟩ = B z|u⟩ = z B|u⟩.   (1.376)

We have seen that any normal matrix A can be written as a sum (1.369) of outer products

    A = Σ_{k=1}^n |a_k⟩ a_k ⟨a_k|   (1.377)


of its orthonormal eigenvectors |a_k⟩ which are complete in the n-dimensional vector space S on which A acts. Suppose now that the eigenvalues a_k of A are nondegenerate, and that B is another normal matrix acting on S and that the matrices A and B are compatible. Then in the basis provided by the eigenvectors (or eigenstates) |a_k⟩ of the matrix A, the matrix B must satisfy

    0 = ⟨a_ℓ|AB − BA|a_k⟩ = (a_ℓ − a_k) ⟨a_ℓ|B|a_k⟩   (1.378)

which says that ⟨a_ℓ|B|a_k⟩ is zero unless a_ℓ = a_k. Thus if the eigenvalues a_ℓ of the operator A are nondegenerate, then the operator B is diagonal

    B = I B I = Σ_{ℓ=1}^n |a_ℓ⟩⟨a_ℓ| B Σ_{k=1}^n |a_k⟩⟨a_k| = Σ_{ℓ=1}^n |a_ℓ⟩⟨a_ℓ|B|a_ℓ⟩⟨a_ℓ|   (1.379)

in the |a_ℓ⟩ basis. Moreover B maps each eigenket |a_k⟩ of A into

    B|a_k⟩ = Σ_{ℓ=1}^n |a_ℓ⟩⟨a_ℓ|B|a_ℓ⟩⟨a_ℓ|a_k⟩ = Σ_{ℓ=1}^n |a_ℓ⟩⟨a_ℓ|B|a_ℓ⟩ δ_{ℓk} = ⟨a_k|B|a_k⟩ |a_k⟩   (1.380)

which says that each eigenvector |a_k⟩ of the matrix A also is an eigenvector of the matrix B with eigenvalue ⟨a_k|B|a_k⟩. Thus two compatible normal matrices can be simultaneously diagonalized if one of them has nondegenerate eigenvalues.

If A's eigenvalues a_ℓ are degenerate, each eigenvalue a_ℓ may have d_ℓ orthonormal eigenvectors |a_ℓ, k⟩ for k = 1, ..., d_ℓ. In this case, the matrix elements ⟨a_ℓ, k|B|a_m, k′⟩ of B are zero unless the eigenvalues are the same, a_ℓ = a_m. The matrix representing the operator B in this basis consists of square, d_ℓ × d_ℓ, normal submatrices ⟨a_ℓ, k|B|a_ℓ, k′⟩ arranged along its main diagonal; it is said to be in block-diagonal form. Since each submatrix is a d_ℓ × d_ℓ normal matrix, we may find linear combinations |a_ℓ, b_k⟩ of the degenerate eigenvectors |a_ℓ, k⟩ that are orthonormal eigenvectors of both compatible operators

    A|a_ℓ, b_k⟩ = a_ℓ |a_ℓ, b_k⟩   and   B|a_ℓ, b_k⟩ = b_k |a_ℓ, b_k⟩.   (1.381)

Thus one can simultaneously diagonalize any two compatible operators.

The converse also is true: If the operators A and B can be simultaneously diagonalized as in (1.381), then they commute

    AB|a_ℓ, b_k⟩ = A b_k|a_ℓ, b_k⟩ = a_ℓ b_k|a_ℓ, b_k⟩ = a_ℓ B|a_ℓ, b_k⟩ = BA|a_ℓ, b_k⟩   (1.382)

and so are compatible. Normal matrices can be simultaneously diagonalized if and only if they are compatible, that is, if and only if they commute.

In quantum mechanics, compatible hermitian operators represent physical observables that can be measured simultaneously to arbitrary precision (in principle). A set of compatible hermitian operators {A, B, C, ...} is said to be complete


if to every set of eigenvalues {a_j, b_k, c_ℓ, ...} there is only a single eigenvector |a_j, b_k, c_ℓ, ...⟩.

Example 1.53 (Compatible photon observables) For example, the state of a photon is completely characterized by its momentum and its angular momentum about its direction of motion. For a photon, the momentum operator P and the dot product J · P of the angular momentum J with the momentum form a complete set of compatible hermitian observables. Incidentally, because its mass is zero, the angular momentum J of a photon about its direction of motion can have only two values ±ℏ, which correspond to its two possible states of circular polarization.

Example 1.54 (Thermal density operator) A density operator ρ is the most general description of a quantum-mechanical system. It is hermitian, positive definite, and of unit trace. Since it is hermitian, it can be diagonalized (section 1.29)

    ρ = Σ_n |n⟩⟨n|ρ|n⟩⟨n|   (1.383)

and its eigenvalues ρ_n = ⟨n|ρ|n⟩ are real. Each ρ_n is the probability that the system is in the state |n⟩ and so is nonnegative. The unit-trace rule

    Σ_n ρ_n = 1   (1.384)

ensures that these probabilities add up to one – the system is in some state.

The mean value of an operator F is the trace, ⟨F⟩ = Tr(ρF). So the average energy E is the trace, E = ⟨H⟩ = Tr(ρH). The entropy operator S is the negative logarithm of the density operator multiplied by Boltzmann's constant, S = −k ln ρ, and the mean entropy S is S = ⟨S⟩ = −k Tr(ρ ln ρ).

A density operator that describes a system in thermal equilibrium at a constant temperature T is time independent and so commutes with the hamiltonian, [ρ, H] = 0. Since ρ and H commute, they are compatible operators (1.375), and so they can be simultaneously diagonalized. Each eigenstate |n⟩ of ρ is an eigenstate of H; its energy E_n is its eigenvalue, H|n⟩ = E_n|n⟩.

If we have no information about the state of the system other than its mean energy E, then we take ρ to be the density operator that maximizes the mean entropy S while respecting the constraints c₁ = Σ_n ρ_n − 1 = 0 and c₂ = Tr(ρH) − E = 0. We introduce two Lagrange multipliers (section 1.24) and maximize the unconstrained function

    L(ρ, λ₁, λ₂) = S − λ₁c₁ − λ₂c₂ = −k Σ_n ρ_n ln ρ_n − λ₁ (Σ_n ρ_n − 1) − λ₂ (Σ_n ρ_n E_n − E)   (1.385)

by setting its derivatives with respect to ρ_n, λ₁, and λ₂ equal to zero

    ∂L/∂ρ_n = −k (ln ρ_n + 1) − λ₁ − λ₂E_n = 0   (1.386)

    ∂L/∂λ₁ = Σ_n ρ_n − 1 = 0   (1.387)

    ∂L/∂λ₂ = Σ_n ρ_n E_n − E = 0.   (1.388)

The first (1.386) of these conditions implies that

    ρ_n = exp[−(λ₁ + λ₂E_n + k)/k].   (1.389)

We satisfy the second condition (1.387) by choosing λ₁ so that

    ρ_n = exp(−λ₂E_n/k) / Σ_n exp(−λ₂E_n/k).   (1.390)

Setting λ₂ = 1/T, we define the temperature T so that ρ satisfies the third condition (1.388). Its eigenvalue ρ_n then is

    ρ_n = exp(−E_n/kT) / Σ_n exp(−E_n/kT).   (1.391)

In terms of the inverse temperature β ≡ 1/(kT), the density operator is

    ρ = e^{−βH} / Tr(e^{−βH})   (1.392)

which is the Boltzmann distribution, also called the canonical ensemble.

Example 1.55 (Grand canonical ensemble) Lagrange's function for the density operator of a system of maximum entropy S = −k Tr(ρ ln ρ) given a fixed mean energy E = Tr(ρH) and a fixed mean number of particles ⟨N⟩ = Tr(ρN), in which N is the number operator N|n⟩ = N_n|n⟩, is

    L(ρ, λ₁, λ₂, λ₃) = −k Σ_n ρ_n ln ρ_n − λ₁ (Σ_n ρ_n − 1) − λ₂ (Σ_n ρ_n E_n − E) − λ₃ (Σ_n ρ_n N_n − ⟨N⟩).   (1.393)

Setting the partial derivative of L with respect to ρ_n

    ∂L/∂ρ_n = −k (ln ρ_n + 1) − λ₁ − λ₂E_n − λ₃N_n = 0   (1.394)

as well as the partial derivatives of L with respect to the three Lagrange multipliers λᵢ equal to zero, we get

    ρ = e^{−β(H−μN)} / Tr(e^{−β(H−μN)})   (1.395)

in which μ is the chemical potential.
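The canonical probabilities (1.391) are easy to evaluate numerically; the sketch below uses three invented energy levels measured in units of kT and checks that the ρ_n sum to one and that the entropy lies below its maximum value k ln 3:

```python
import numpy as np

# Made-up energy levels in units of kT, so that E_n/kT is just E below.
E = np.array([0.0, 1.0, 2.5])
beta = 1.0

w = np.exp(-beta * E)
rho = w / w.sum()                # equation (1.391): rho_n = e^{-E_n/kT} / sum_m e^{-E_m/kT}

mean_E = rho @ E                 # the mean energy E = Tr(rho H) in the energy eigenbasis
entropy = -np.sum(rho * np.log(rho))   # S/k = -sum_n rho_n ln rho_n, as in (1.325)
print(rho, mean_E, entropy)
```

Lower-lying levels are more probable, and the entropy would reach k ln 3 only for equal populations.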


1.32 Singular-Value Decompositions

Every complex m × n rectangular matrix A is the product of an m × m unitary matrix U, an m × n rectangular matrix Σ that is zero except on its main diagonal, which consists of A's nonnegative singular values S_k, and an n × n unitary matrix V†

    A = U Σ V†   or   A_{ik} = Σ_{ℓ=1}^{min(m,n)} U_{iℓ} S_ℓ V†_{ℓk}.   (1.396)

This singular-value decomposition is a key theorem of matrix algebra.

Suppose A is a linear operator that maps vectors in an n-dimensional vector space V_n into vectors in an m-dimensional vector space V_m. The spaces V_n and V_m will have infinitely many orthonormal bases {|a_j⟩ ∈ V_n} and {|b_k⟩ ∈ V_m} labeled by parameters a and b. Each pair of bases provides a resolution of the identity operator I_n for V_n and I_m for V_m

    I_n = Σ_{j=1}^n |a_j⟩⟨a_j|   and   I_m = Σ_{k=1}^m |b_k⟩⟨b_k|   (1.397)

and lets us write the linear operator A as

    A = I_m A I_n = Σ_{k=1}^m Σ_{j=1}^n |b_k⟩⟨b_k|A|a_j⟩⟨a_j|   (1.398)

in which the ⟨b_k|A|a_j⟩ are the elements of a complex m × n matrix. The singular-value decomposition of the linear operator A is a choice of two special bases {|a_j⟩} and {|b_j⟩} that make ⟨b_k|A|a_j⟩ = S_j δ_{kj} and so express A as

    A = Σ_j |b_j⟩ S_j ⟨a_j|   (1.399)

in which the sum is over the nonzero singular values S_j, which will turn out to be positive.

The kets of the special basis {|a_j⟩} are the eigenstates of the hermitian operator A†A

    A†A|a_j⟩ = e_j |a_j⟩.   (1.400)

These states {|a_j⟩} are orthogonal because A†A is hermitian, and we may choose them to be normalized. The eigenvalue e_j is the squared length of the ket A|a_j⟩ and so is positive or zero

    ⟨a_j|A†A|a_j⟩ = e_j ⟨a_j|a_j⟩ = e_j ≥ 0.   (1.401)

The singular values are the square roots of these eigenvalues

    S_j = √e_j = √⟨a_j|A†A|a_j⟩.   (1.402)


For S_j > 0, the special ket |b_j⟩ is the suitably normalized image A|a_j⟩ of the special ket |a_j⟩

    |b_j⟩ = A|a_j⟩ / S_j ;   (1.403)

for S_j = 0, the ket |b_j⟩ vanishes. The nonzero special kets |b_j⟩ are orthonormal

    ⟨b_k|b_j⟩ = (1/(S_k S_j)) ⟨a_k|A†A|a_j⟩ = (e_j/(S_k S_j)) ⟨a_k|a_j⟩ = (e_j/(S_k S_j)) δ_{kj} = δ_{kj}.   (1.404)

The number of positive singular values, S_j > 0, is at most n. It also is at most m because each nonzero ket |b_j⟩ is an orthonormal vector in the space V_m which has only m dimensions. So the number of positive singular values, S_j > 0, is at most min(m, n), the smaller of m and n. The singular-value decomposition of the linear operator A then is the sum

    A = A I_n = A Σ_{j=1}^n |a_j⟩⟨a_j| = Σ_{j=1}^n A|a_j⟩⟨a_j| = Σ_{j=1}^n |b_j⟩ S_j ⟨a_j|   (1.405)

in which at most min(m, n) of the singular values are positive. In terms of any two bases, |k⟩ for k = 1, ..., m for the space V_m and |ℓ⟩ for ℓ = 1, ..., n for the space V_n, and their identity operators

    I_m = Σ_{k=1}^m |k⟩⟨k|   and   I_n = Σ_{ℓ=1}^n |ℓ⟩⟨ℓ|   (1.406)

the singular-value decomposition of the linear operator A is

    A = Σ_{k=1}^m Σ_{ℓ=1}^n |k⟩⟨k|A|ℓ⟩⟨ℓ| = Σ_{k=1}^m Σ_{j=1}^n Σ_{ℓ=1}^n |k⟩⟨k|b_j⟩ S_j ⟨a_j|ℓ⟩⟨ℓ| = U Σ V†.   (1.407)

In this expansion, the k, i matrix element of the m × m unitary matrix U is U_{ki} = ⟨k|b_i⟩, the i, j element of the m × n matrix Σ is Σ_{ij} = S_j δ_{ij}, and the j, ℓ matrix element of the n × n unitary matrix V† is V†_{jℓ} = ⟨a_j|ℓ⟩. Thus V*_{ℓj} = ⟨a_j|ℓ⟩, and so V_{ℓj} = ⟨a_j|ℓ⟩* = ⟨ℓ|a_j⟩. The vectors |b_j⟩ and |a_j⟩ respectively are the left and right singular vectors.

Incidentally, the singular-value decomposition (1.405) shows that the left singular vectors |b_j⟩ are the eigenvectors of AA†

    AA† = Σ_{j=1}^n |b_j⟩ S_j ⟨a_j| Σ_{k=1}^n |a_k⟩ S_k ⟨b_k| = Σ_{j,k=1}^n |b_j⟩ S_j ⟨a_j|a_k⟩ S_k ⟨b_k| = Σ_{j,k=1}^n |b_j⟩ S_j δ_{jk} S_k ⟨b_k| = Σ_{j=1}^n |b_j⟩ S_j² ⟨b_j|   (1.408)

|b j ² S2j ±b j |

just as (1.400) the right singular vectors |a j ² are the eigenvectors of A† A. The kets |a j ² whose singular values vanish, S j = 0, span the null space or kernel of the linear operator A. Example 1.56(Singular-value decomposition of a 2 × 3 matrix) A=

³0

1

1 0

´ 0

1

(1.409)

1

then the positive hermitian matrix A † A is

⎛1 A† A = ⎝0

If A is



0 1 0

1 0⎠ . 1

(1.410)

The normalized eigenvectors and eigenvalues of A † A are

⎛0⎞ ⎛1⎞ ⎛−1⎞ |a1 ² = √1 ⎝0⎠ , |a2 ² = ⎝1⎠ , |a3 ² = √1 ⎝ 0 ⎠ 2

2

0

1

(1.411)

1

and their eigenvalues are e1 = 2, e2 = 1, and e3 = 0. The third eigenvalue e3 had to vanish because A is a 3 × 2 matrix. √ √ The vector A |a1² (as a row vector) is (0, 2), and its norm is 2, so the normalized vector is |b1² = (0, 1). Similarly, the vector |b2 ² is A |a2 ² = (1, 0). The singular-value decomposition (SVD) of A then is A where Sn

=

2 ± n=1

|b j ²S j ±a j | = U ³ V †

(1.412)

= √en . The unitary matrices are Uk n = ±k|bn ² and V j = ±±|a j ² are ⎛ 1 0 −1⎞ ³0 1´ √ 1 ⎝ (1.413) 0 U= and V = √ 2 0⎠ ,

1

0

±,

2

and the diagonal matrix ³ is ³

=

³ √2 0

0 1

0 0

1

´ .

0

1

(1.414)

68

1 Linear Algebra

= U ³ V † is ³ ´ ³√

So finally the SVD of A A=

0 1

1 0

2 0

0 1

0 0

´

⎛ 1 0 1⎞ √ √ ⎝ 0 2 0⎠ . 2 −1 0 1 1

(1.415)

The null space or kernel of A is the set of vectors that are real multiples c| a3 ² of the eigenvector | a3 ² which has a zero eigenvalue, e3 = 0. It is the third column of the matrix V displayed in (1.413). Example 1.57(Matlab’s singular value decomposition) Matlab’s command [U,S,V] = svd(X) performs the singular-value decomposition (SVD) of the matrix X. For instance

>> X = rand(3,3) + i*rand(3,3) 0.6551 X = 0.1626 0.1190 >> [U,S,V]

+ + + =

0.2551i 0.5060i 0.6991i svd(X)

0.4984 + 0.8909i 0.9597 + 0.9593i 0.3404 + 0.5472i

-0.3689 - 0.4587i U = -0.3766 - 0.5002i -0.2178 - 0.4626i S =

2.2335 0 0

0.5853 + 0.1386i 0.2238 + 0.1493i 0.7513 + 0.2575i

0.4056 - 0.2075i 0.4362 - 0.5055i -0.5792 - 0.2810i 0.0646 + 0.4351i 0.1142 + 0.6041i -0.5938 - 0.0901i

0 0.7172 0

0 0 0.3742

-0.4577 V = -0.7885 - 0.0255i -0.3229 - 0.2527i

0.5749 0.6783 -0.6118 - 0.0497i -0.0135 + 0.0249i 0.3881 + 0.3769i -0.5469 - 0.4900i .

The singular values are 2.2335, 0.7172, and 0.3742.

We may use the SVD to solve, when possible, the matrix equation A |x ² = | y ²

(1.416)

|b j ² S j ±a j |x ² = | y ².

(1.417)

for the n-dimensional vector |x ² in terms of the m-dimensional vector | y ² and the m × n matrix A. Using the SVD expansion (1.405), we have

±

min(m ,n ) j =1

The orthonormality (1.404) of the vectors |b j ² then tells us that S j ±a j | x ² = ±b j | y ².

If the singular value is positive, S j ±b j | y²/ S j and so find the solution

>

(1.418)

0, then we may divide by it to get ±a j |x ²

=

1.32 Singular-Value Decompositions

|x ² =

± ±b j | y ² |a ² .

69

min(m ,n)

j

Sj

j =1

(1.419)

But this solution is not always available or unique. For instance, if for some ± the inner product ±b ± | y ² ´ = 0 while the singular value S± = 0, then there is no solution to equation (1.416). This problem occurs when m > n because there are at most n < m nonzero singular values. Example 1.58 Suppose A is the 3 × 2 matrix

⎛r 1 A = ⎝r2



p1 p 2⎠ p3

r3

(1.420)

and the vector | y ² is the cross-product | y ² = L = r × p. Then no solution | x ² exists to the equation A |x ² = | y ² (unless r and p are parallel) because A |x ² is a linear combination of the vectors r and p while | y² = L is perpendicular to both r and p.

Even when the matrix A is square, the equation (1.416) sometimes has no solutions. For instance, if A is a square defective matrix (section 1.26), then A| x ² = | y ² will fail to have a solution when the vector | y ² lies outside the space spanned by the linearly dependent eigenvectors of the matrix A. And when n > m, as in for instance

³a d

b e

c f

´ ⎛ x 1⎞ ³ y ´ ⎝ x 2⎠ = 1 x3

y2

(1.421)

the solution (1.419) is never unique, for we may add to it any linear combination of the vectors |a j ² that have zero as their singular values

|x ² =

± ± ±b j | y² | aj² + x j |a j ² S

min( m ,n) j =1

of which there are at least n − m .

n

j, S j =0

(1.422)

Example 1.59(CKM matrix) In the standard model, the mass matrices of the u, c, t and d, s, b quarks are 3 × 3 complex matrices Mu and Md with singular-value decompositions Mu = Uu ³u Vu† and Md = Ud ³d Vd† whose singular-values are the quark masses. The unitary CKM matrix Uu† Ud (Cabibbo, Kobayashi, Maskawa) describes transitions among the quarks mediated by the W ¶ gauge bosons. By redefining the quark fields, one may make the CKM matrix real, apart from a phase that violates charge-conjugation-parity (C P) symmetry.

70

1 Linear Algebra

The adjoint of a complex symmetric matrix M is its complex conjugate, M † = M ∗. So by (1.400), its right singular vectors |n ² are the eigenstates of M ∗ M M ∗ M |n ² = Sn2 |n ²

(1.423)

and by (1.408) its left singular vectors |m n ² are the eigenstates of M M ∗

(

)∗ |m ² = S2 |m ² .

M M ∗ |m n ² = M ∗ M

n

n

n

(1.424)

Thus its left singular vectors are the complex conjugates of its right singular vectors, |m n ² = |n ²∗ . So the unitary matrix V is the complex conjugate of the unitary matrix U , and the SVD of M is (Autonne, 1915) M

= U ³U

T

.

(1.425)

1.33 Moore–Penrose Pseudoinverses

Although a matrix $A$ has an inverse $A^{-1}$ if and only if it is square and has a nonzero determinant, one may use the singular-value decomposition to make a pseudoinverse $A^+$ for an arbitrary $m\times n$ matrix $A$. If the singular-value decomposition of the matrix $A$ is

$$
A = U\,\Sigma\,V^\dagger
\tag{1.426}
$$

then the Moore–Penrose pseudoinverse (Eliakim H. Moore 1862–1932, Roger Penrose 1931–) is

$$
A^+ = V\,\Sigma^+\,U^\dagger
\tag{1.427}
$$

in which $\Sigma^+$ is the transpose of the matrix $\Sigma$ with every nonzero entry replaced by its inverse (and the zeros left as they are). One may show that the pseudoinverse $A^+$ satisfies the four relations

$$
A A^+ A = A, \qquad (A A^+)^\dagger = A A^+, \qquad A^+ A A^+ = A^+, \qquad\text{and}\qquad (A^+ A)^\dagger = A^+ A,
\tag{1.428}
$$

and that it is the only matrix that does so.

Suppose that all the singular values of the $m\times n$ matrix $A$ are positive. In this case, if $A$ has more rows than columns, so that $m > n$, then the product $A^+ A$ is the $n\times n$ identity matrix $I_n$

$$
A^+ A = V\,\Sigma^+\Sigma\,V^\dagger = V\,I_n\,V^\dagger = I_n
\tag{1.429}
$$


and $A A^+$ is an $m\times m$ matrix that is not the identity matrix $I_m$. If instead $A$ has more columns than rows, so that $n > m$, then $A A^+$ is the $m\times m$ identity matrix $I_m$

$$
A A^+ = U\,\Sigma\Sigma^+\,U^\dagger = U\,I_m\,U^\dagger = I_m
\tag{1.430}
$$

and $A^+ A$ is an $n\times n$ matrix that is not the identity matrix $I_n$.

If the matrix $A$ is square with positive singular values, then it has a true inverse $A^{-1}$ which is equal to its pseudoinverse

$$
A^{-1} = A^+.
\tag{1.431}
$$

If the columns of $A$ are linearly independent, then the matrix $A^\dagger A$ has an inverse, and the pseudoinverse is

$$
A^+ = \left(A^\dagger A\right)^{-1} A^\dagger.
\tag{1.432}
$$

The solution (1.248) to the complex least-squares method used this pseudoinverse. If the rows of $A$ are linearly independent, then the matrix $A A^\dagger$ has an inverse, and the pseudoinverse is

$$
A^+ = A^\dagger \left(A A^\dagger\right)^{-1}.
\tag{1.433}
$$

If both the rows and the columns of $A$ are linearly independent, then the matrix $A$ has an inverse $A^{-1}$ which is its pseudoinverse

$$
A^{-1} = A^+.
\tag{1.434}
$$

Example 1.60 (The pseudoinverse of a 2 × 3 matrix) The pseudoinverse $A^+$ of the matrix

$$
A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\tag{1.435}
$$

with singular-value decomposition (1.415) is

$$
A^+ = V\,\Sigma^+ U^\dagger
= \frac{1}{\sqrt 2}
\begin{pmatrix} 1 & 0 & 1 \\ 0 & \sqrt 2 & 0 \\ 1 & 0 & -1 \end{pmatrix}
\begin{pmatrix} 1/\sqrt 2 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & 1/2 \\ 1 & 0 \\ 0 & 1/2 \end{pmatrix}
\tag{1.436}
$$

which satisfies the four conditions (1.428). The product $A A^+$ gives the 2 × 2 identity matrix

$$
A A^+ = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 & 1/2 \\ 1 & 0 \\ 0 & 1/2 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\tag{1.437}
$$

which is an instance of (1.430). Moreover, the rows of $A$ are linearly independent, and so the simple rule (1.433) works:

$$
A^+ = A^\dagger\left(A A^\dagger\right)^{-1}
= \begin{pmatrix} 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}
\left[\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}\right]^{-1}
= \begin{pmatrix} 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}^{-1}
= \begin{pmatrix} 0 & 1/2 \\ 1 & 0 \\ 0 & 1/2 \end{pmatrix}
\tag{1.438}
$$

which is (1.436). The columns of the matrix $A$ are not linearly independent, however, and so the simple rule (1.432) fails. Thus the product $A^+ A$

$$
A^+ A = \begin{pmatrix} 0 & 1/2 \\ 1 & 0 \\ 0 & 1/2 \end{pmatrix}
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
= \frac{1}{2}\begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\tag{1.439}
$$

is not the 3 × 3 identity matrix, which it would be if (1.432) held.
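As a numerical check (a NumPy sketch, an illustration rather than part of the text), one can compute the pseudoinverse of this matrix and verify the four Penrose conditions (1.428):

```python
import numpy as np

A = np.array([[0., 1., 0.],
              [1., 0., 1.]])

# pseudoinverse via NumPy's SVD-based routine
Ap = np.linalg.pinv(A)
assert np.allclose(Ap, [[0., 0.5], [1., 0.], [0., 0.5]])  # matches (1.436)

# the four Penrose conditions (1.428)
assert np.allclose(A @ Ap @ A, A)
assert np.allclose(Ap @ A @ Ap, Ap)
assert np.allclose((A @ Ap).conj().T, A @ Ap)
assert np.allclose((Ap @ A).conj().T, Ap @ A)

# A A+ is the 2x2 identity (1.437), but A+ A is not the 3x3 identity (1.439)
assert np.allclose(A @ Ap, np.eye(2))
assert not np.allclose(Ap @ A, np.eye(3))
```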

1.34 Tensor Products and Entanglement

Tensor products are used to describe composite systems, such as the spins of two electrons. The terms direct product and tensor product sometimes are used interchangeably. If $A$ is an $n\times n$ matrix with elements $A_{ij}$ and $B$ is an $m\times m$ matrix with elements $B_{k\ell}$, then their tensor product $C = A\otimes B$ is an $nm\times nm$ matrix with elements $C_{ik,j\ell} = A_{ij} B_{k\ell}$. This tensor-product matrix $A\otimes B$ maps a vector $V_{j\ell}$ into the vector

$$
W_{ik} = \sum_{j=1}^{n}\sum_{\ell=1}^{m} C_{ik,j\ell}\,V_{j\ell}
= \sum_{j=1}^{n}\sum_{\ell=1}^{m} A_{ij} B_{k\ell}\,V_{j\ell}
\tag{1.440}
$$

in which the second double index $j\ell$ of $C$ and the second indices of $A$ and $B$ match the double index $j\ell$ of the vector $V$.

A tensor-product operator is a product of two operators that act on two different vector spaces. Suppose that an operator $A$ acts on a space $S$ spanned by $n$ kets $|i\rangle$, that an operator $B$ acts on a space $T$ spanned by $m$ kets $|k\rangle$, and that both operators map vectors into their spaces $S$ and $T$. Then we may write $A$ as

$$
A = I_S\,A\,I_S = \sum_{i,j=1}^{n} |i\rangle\langle i|\,A\,|j\rangle\langle j|
\tag{1.441}
$$

and $B$ as

$$
B = I_T\,B\,I_T = \sum_{k,\ell=1}^{m} |k\rangle\langle k|\,B\,|\ell\rangle\langle\ell|.
\tag{1.442}
$$

Their tensor product $C = A\otimes B$ is

$$
C = A\otimes B = \sum_{i,j=1}^{n}\sum_{k,\ell=1}^{m}
|i\rangle\otimes|k\rangle\;\langle i|A|j\rangle\,\langle k|B|\ell\rangle\;\langle j|\otimes\langle\ell|
\tag{1.443}
$$

and it acts on the tensor-product vector space $S\otimes T$, which is spanned by the tensor-product kets $|i,k\rangle = |i\rangle\,|k\rangle = |i\rangle\otimes|k\rangle$ and has dimension $nm$. An arbitrary vector in the space $S\otimes T$ is of the form

$$
|\psi\rangle = \sum_{i=1}^{n}\sum_{k=1}^{m} \psi(i,k)\,|i\rangle\otimes|k\rangle
= \sum_{i=1}^{n}\sum_{k=1}^{m} |i,k\rangle\langle i,k|\psi\rangle.
\tag{1.444}
$$

Vectors $|\phi_S,\chi_T\rangle$ that are tensor products $|\phi_S\rangle\otimes|\chi_T\rangle$ of two vectors $|\phi_S\rangle\in S$ and $|\chi_T\rangle\in T$

$$
|\phi_S\rangle\otimes|\chi_T\rangle
= \left(\sum_{i=1}^{n}\phi_i\,|i\rangle\right)\otimes\left(\sum_{k=1}^{m}\chi_k\,|k\rangle\right)
= \sum_{i=1}^{n}\sum_{k=1}^{m}\phi_i\,\chi_k\,|i,k\rangle
\tag{1.445}
$$

are separable. States represented by vectors that are not separable are said to be entangled. Most states in a tensor-product space are entangled.

In the simpler notation $|i,k\rangle$ for $|i\rangle\otimes|k\rangle$, a tensor-product operator $A\otimes B$ maps an arbitrary vector (1.444) to

$$
(A\otimes B)\,|\psi\rangle = \sum_{i,j=1}^{n}\sum_{k,\ell=1}^{m}
|i,k\rangle\,\langle i|A|j\rangle\,\langle k|B|\ell\rangle\,\langle j,\ell|\psi\rangle.
\tag{1.446}
$$
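The element formula $C_{ik,j\ell} = A_{ij} B_{k\ell}$ is what NumPy's kron routine computes when the double index $ik$ is flattened to $i\,m + k$. The sketch below (an illustration with hypothetical $2\times2$ operators, not part of the text) checks that a tensor-product operator acting on a separable vector factorizes:

```python
import numpy as np

# hypothetical operators A on S and B on T (both 2x2 here, so n = m = 2)
A = np.array([[1., 2.], [3., 4.]])
B = np.array([[0., 1.], [1., 0.]])

C = np.kron(A, B)          # C[i*m + k, j*m + l] = A[i, j] * B[k, l]

phi = np.array([1., 2.])   # |phi> in S
chi = np.array([3., 5.])   # |chi> in T
v = np.kron(phi, chi)      # the separable vector |phi> ⊗ |chi>

# (A ⊗ B)(|phi> ⊗ |chi>) = (A|phi>) ⊗ (B|chi>)
assert C.shape == (4, 4)
assert np.allclose(C @ v, np.kron(A @ phi, B @ chi))
```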

Direct-product operators are special. An arbitrary linear operator on the space $S\otimes T$

$$
D = \sum_{i,j=1}^{n}\sum_{k,\ell=1}^{m} |i,k\rangle\langle i,k|\,D\,|j,\ell\rangle\langle j,\ell|
\tag{1.447}
$$

maps an arbitrary vector (1.444) into the vector

$$
D\,|\psi\rangle = \sum_{i,j=1}^{n}\sum_{k,\ell=1}^{m}
|i,k\rangle\langle i,k|\,D\,|j,\ell\rangle\langle j,\ell|\psi\rangle.
\tag{1.448}
$$

Example 1.61 (States of the hydrogen atom) Suppose the state $|n,\ell,m\rangle$ is an eigenvector of the hamiltonian $H$, of the square $L^2$ of the orbital angular momentum $\boldsymbol L$, and of the third component $L_3$ of the orbital angular momentum of a hydrogen atom without spin:

$$
H|n,\ell,m\rangle = E_n|n,\ell,m\rangle, \quad
L^2|n,\ell,m\rangle = \hbar^2\ell(\ell+1)\,|n,\ell,m\rangle, \quad
L_3|n,\ell,m\rangle = \hbar m\,|n,\ell,m\rangle.
\tag{1.449}
$$

The state $|n,\ell,m\rangle = |n\rangle\otimes|\ell,m\rangle$ is separable. Suppose the states $|\sigma\rangle$ for $\sigma = \pm$ are eigenstates of the third component $S_3$ of the operator $\boldsymbol S$ that represents the spin of the electron

$$
S_3|\sigma\rangle = \sigma\,\frac{\hbar}{2}\,|\sigma\rangle.
\tag{1.450}
$$

The separable, tensor-product states

$$
|n,\ell,m,\sigma\rangle \equiv |n,\ell,m\rangle\otimes|\sigma\rangle \equiv |n,\ell,m\rangle|\sigma\rangle
\tag{1.451}
$$

represent a hydrogen atom including the spin of its electron. These separable states are eigenvectors of all four operators $H$, $L^2$, $L_3$, and $S_3$:

$$
H|n,\ell,m,\sigma\rangle = E_n|n,\ell,m,\sigma\rangle, \qquad
L^2|n,\ell,m,\sigma\rangle = \hbar^2\ell(\ell+1)\,|n,\ell,m,\sigma\rangle,
$$
$$
L_3|n,\ell,m,\sigma\rangle = \hbar m\,|n,\ell,m,\sigma\rangle, \qquad
S_3|n,\ell,m,\sigma\rangle = \tfrac{1}{2}\sigma\hbar\,|n,\ell,m,\sigma\rangle.
\tag{1.452}
$$

Suitable linear combinations of these states are eigenstates of the square $J^2$ of the composite angular momentum $\boldsymbol J = \boldsymbol L + \boldsymbol S$ as well as of $J_3$, $L_3$, and $S_3$. Many of these states are entangled.

Example 1.62 (Adding two spins) The smallest positive value of angular momentum is $\hbar/2$. The spin-one-half angular-momentum operators $\boldsymbol S$ are represented by three $2\times2$ matrices, $S_a = \tfrac{1}{2}\hbar\,\sigma_a$, in which the $\sigma_a$ are the Pauli matrices

$$
\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
\sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad\text{and}\qquad
\sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
\tag{1.453}
$$

which are both hermitian and unitary. They map the basis vectors

$$
|+\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}
\qquad\text{and}\qquad
|-\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}
\tag{1.454}
$$

to $\sigma_1|\pm\rangle = |\mp\rangle$, $\sigma_2|\pm\rangle = \pm i\,|\mp\rangle$, and $\sigma_3|\pm\rangle = \pm|\pm\rangle$.

Suppose two spin operators $\boldsymbol S^{(1)}$ and $\boldsymbol S^{(2)}$ act on two spin-one-half systems, with states $|\pm\rangle_1$ that are eigenstates of $S_3^{(1)}$ and states $|\pm\rangle_2$ that are eigenstates of $S_3^{(2)}$

$$
S_3^{(1)}|\pm\rangle_1 = \pm\tfrac{1}{2}\hbar\,|\pm\rangle_1
\qquad\text{and}\qquad
S_3^{(2)}|\pm\rangle_2 = \pm\tfrac{1}{2}\hbar\,|\pm\rangle_2.
\tag{1.455}
$$

Then the tensor-product states $|\pm,\pm\rangle = |\pm\rangle_1|\pm\rangle_2 = |\pm\rangle_1\otimes|\pm\rangle_2$ are eigenstates of both $S_3^{(1)}$ and $S_3^{(2)}$

$$
S_3^{(1)}|\pm, s_2\rangle = \pm\tfrac{1}{2}\hbar\,|\pm, s_2\rangle
\qquad\text{and}\qquad
S_3^{(2)}|s_1, \pm\rangle = \pm\tfrac{1}{2}\hbar\,|s_1, \pm\rangle.
\tag{1.456}
$$

These states also are eigenstates of the third component of the spin operator of the combined system, $S_3 = S_3^{(1)} + S_3^{(2)}$; that is,

$$
S_3|s_1, s_2\rangle = \tfrac{1}{2}\hbar\,(s_1 + s_2)\,|s_1, s_2\rangle.
\tag{1.457}
$$

Thus $S_3|+,+\rangle = \hbar\,|+,+\rangle$ and $S_3|-,-\rangle = -\hbar\,|-,-\rangle$, while $S_3|+,-\rangle = 0$ and $S_3|-,+\rangle = 0$.

Using the notation (1.454), we can compute the effect of the operator $S^2$ on the state $|{+}{+}\rangle$. We find for $S_1^2$

$$
S_1^2\,|{+}{+}\rangle = \left(S_1^{(1)} + S_1^{(2)}\right)^2|{+}{+}\rangle
= \frac{\hbar^2}{4}\left(\sigma_1^{(1)} + \sigma_1^{(2)}\right)^2|{+}{+}\rangle
= \tfrac{1}{2}\hbar^2\left(1 + \sigma_1^{(1)}\sigma_1^{(2)}\right)|{+}{+}\rangle
$$
$$
= \tfrac{1}{2}\hbar^2\left(|{+}{+}\rangle + \sigma_1^{(1)}|+\rangle\,\sigma_1^{(2)}|+\rangle\right)
= \tfrac{1}{2}\hbar^2\left(|{+}{+}\rangle + |{-}{-}\rangle\right)
\tag{1.458}
$$

and leave $S_2^2$ and $S_3^2$ to Exercise 1.36.

Example 1.63 (Entangled states) A neutral pion $\pi^0$ has zero angular momentum and negative parity. Its mass is 135 MeV/$c^2$, and 99% of them decay into two photons with a mean lifetime of $8.5\times10^{-17}$ s. A $\pi^0$ at rest decays into two photons moving in opposite directions along the same axis; the spins of the photons must be either parallel to their momenta, $|+,+\rangle$, positive helicity, or antiparallel to their momenta, $|-,-\rangle$, negative helicity. Parity reverses helicity, and so the state of negative parity and zero angular momentum is

$$
|\gamma,\gamma\rangle = \frac{1}{\sqrt 2}\left(|+,+\rangle - |-,-\rangle\right).
\tag{1.459}
$$

The two photons have the same helicity. If the helicity of one photon is measured to be positive, then a measurement of the other photon will show it to have positive helicity. The state is entangled.

One $\pi^0$ in 17 million will decay into a positron and an electron in a state of zero angular momentum. The spin part of the final state is

$$
|e^+, e^-\rangle = \frac{1}{\sqrt 2}\left(|+,-\rangle - |-,+\rangle\right).
\tag{1.460}
$$

If the spin along any axis of one of the electrons is measured to be positive, then a measurement of the spin of the other electron along the same axis will be negative. The state is entangled.
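A quick numerical check of this spin addition (a NumPy sketch, an illustration rather than part of the text) builds $S^2 = (\boldsymbol S^{(1)} + \boldsymbol S^{(2)})^2$ from Kronecker products of the Pauli matrices and confirms that its eigenvalues are $s(s+1)\hbar^2$ with $s = 0$ once (the singlet) and $s = 1$ three times, taking $\hbar = 1$:

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

# S_a = S_a(1) + S_a(2) on the 4-dimensional tensor-product space (hbar = 1)
S = [0.5 * (np.kron(s, I2) + np.kron(I2, s)) for s in (s1, s2, s3)]
S2 = sum(Sa @ Sa for Sa in S)

# eigenvalues s(s+1): 0 for s = 0 and 2 (three times) for s = 1
vals = np.sort(np.linalg.eigvalsh(S2))
assert np.allclose(vals, [0, 2, 2, 2])

# the singlet state (1.460), in the basis |++>, |+->, |-+>, |-->
singlet = np.array([0, 1, -1, 0]) / np.sqrt(2)
assert np.allclose(S2 @ singlet, 0)
```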

1.35 Density Operators

A general quantum-mechanical system is represented by a density operator $\rho$ that is hermitian, $\rho^\dagger = \rho$, of unit trace, $\mathrm{Tr}\,\rho = 1$, and positive, $\langle\psi|\rho|\psi\rangle \ge 0$ for all kets $|\psi\rangle$. If the state $|\psi\rangle$ is normalized, then $\langle\psi|\rho|\psi\rangle$ is the nonnegative probability that the system is in that state. This probability is real because the density matrix is hermitian. If $\{|k\rangle\}$ is any complete set of orthonormal states

$$
I = \sum_k |k\rangle\langle k|
\tag{1.461}
$$

then the probability that the system is in the state $|k\rangle$ is

$$
p_k = \langle k|\rho|k\rangle = \mathrm{Tr}\left(\rho\,|k\rangle\langle k|\right).
\tag{1.462}
$$

Since $\mathrm{Tr}\,\rho = 1$, the sum of these probabilities is unity

$$
\sum_k p_k = \sum_k \langle k|\rho|k\rangle
= \mathrm{Tr}\left(\rho\sum_k |k\rangle\langle k|\right)
= \mathrm{Tr}\left(\rho\,I\right) = \mathrm{Tr}\,\rho = 1.
\tag{1.463}
$$

A system that is measured to be in a state $|k\rangle$ cannot simultaneously be measured to be in an orthogonal state $|\ell\rangle$. The probabilities sum to unity because the system must be in some state.

Since the density operator $\rho$ is hermitian and positive, it has a complete, orthonormal set of eigenvectors $|k\rangle$, all of which have nonnegative eigenvalues $\rho_k$

$$
\rho\,|k\rangle = \rho_k\,|k\rangle.
\tag{1.464}
$$

They afford for it an expansion in their outer products

$$
\rho = \sum_k \rho_k\,|k\rangle\langle k|
\tag{1.465}
$$

each weighted by the probability $\rho_k$ that the system is in the state $|k\rangle$.

A system composed of two systems, one with basis kets $|i\rangle$ and the other with basis kets $|k\rangle$, has basis states $|i,k\rangle = |i\rangle|k\rangle$ and can be described by the density operator

$$
\rho = \sum_{ijk\ell} |i,k\rangle\langle i,k|\,\rho\,|j,\ell\rangle\langle j,\ell|.
\tag{1.466}
$$

The density operator for the first system is the trace of $\rho$ over the states $|k\rangle$ of the second system

$$
\rho_1 = \sum_k \langle k|\rho|k\rangle
= \sum_{ijk} |i\rangle\langle i,k|\,\rho\,|j,k\rangle\langle j|
\tag{1.467}
$$

and similarly the density operator for the second system is the trace of $\rho$ over the states $|i\rangle$ of the first system

$$
\rho_2 = \sum_i \langle i|\rho|i\rangle
= \sum_{ik\ell} |k\rangle\langle i,k|\,\rho\,|i,\ell\rangle\langle\ell|.
\tag{1.468}
$$
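These partial traces are easy to carry out numerically. A NumPy sketch (an illustration, not part of the text) for a two-qubit pure state, using the singlet (1.460) as the example:

```python
import numpy as np

# a pure two-qubit density operator rho = |psi><psi| for the singlet (1.460),
# in the basis ordering |++>, |+->, |-+>, |-->
psi = np.array([0, 1, -1, 0]) / np.sqrt(2)
rho = np.outer(psi, psi.conj())

assert np.isclose(np.trace(rho), 1)        # unit trace
assert np.allclose(rho, rho.conj().T)      # hermitian

# partial traces (1.467) and (1.468): view rho with indices (i, k, j, l)
r = rho.reshape(2, 2, 2, 2)
rho1 = np.einsum('ikjk->ij', r)   # trace over the second system
rho2 = np.einsum('ikil->kl', r)   # trace over the first system

# each half of the entangled singlet is maximally mixed
assert np.allclose(rho1, np.eye(2) / 2)
assert np.allclose(rho2, np.eye(2) / 2)
```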

Classical entropy is an extensive quantity like volume, mass, and energy. The classical entropy of a composite system is the sum of the classical entropies of its parts. But quantum entropy $S = -k\,\mathrm{Tr}(\rho\log\rho)$ is not necessarily extensive. The quantum entropy of an entangled system can be less than the sum of the quantum entropies of its parts. The quantum entropy of each of the states $|\gamma,\gamma\rangle$ and $|e^+,e^-\rangle$ of Example 1.63 is zero, but the sum of the quantum entropies of their parts is in both cases $2k\log 2$.

1.36 Schmidt Decomposition

Suppose $|\psi\rangle$ is an arbitrary vector in the tensor product of the vector spaces $B$ and $C$

$$
|\psi\rangle = \sum_{i=1}^{n}\sum_{k=1}^{m} A_{ik}\,|i\rangle\otimes|k\rangle.
\tag{1.469}
$$

The arbitrary matrix $A$ has a singular-value decomposition (1.396)

$$
A_{ik} = \sum_{\ell=1}^{\min(n,m)} U_{i\ell}\,S_\ell\,V^\dagger_{\ell k}.
\tag{1.470}
$$

In terms of this SVD, the vector $|\psi\rangle$ is

$$
|\psi\rangle = \sum_{i=1}^{n}\sum_{k=1}^{m}\,\sum_{\ell=1}^{\min(n,m)} U_{i\ell}\,S_\ell\,V^\dagger_{\ell k}\,|i\rangle\otimes|k\rangle
= \sum_{\ell=1}^{\min(n,m)} S_\ell\,|U,\ell\rangle\otimes|V^\dagger,\ell\rangle
\tag{1.471}
$$

where the state $|U,\ell\rangle$ is

$$
|U,\ell\rangle = \sum_{i=1}^{n} U_{i\ell}\,|i\rangle
\tag{1.472}
$$

and is in the vector space $B$, and the state $|V^\dagger,\ell\rangle$ is

$$
|V^\dagger,\ell\rangle = \sum_{k=1}^{m} V^\dagger_{\ell k}\,|k\rangle
\tag{1.473}
$$

and is in the vector space $C$. The states $|U,\ell\rangle$ and $|V^\dagger,\ell\rangle$ are orthonormal

$$
\langle U,\ell|U,\ell'\rangle = \delta_{\ell\ell'}
\qquad\text{and}\qquad
\langle V^\dagger,\ell|V^\dagger,\ell'\rangle = \delta_{\ell\ell'}
\tag{1.474}
$$

because the matrices $U$ and $V$ are unitary.


The outer product of a tensor-product state (1.469) is a pure-state density operator

$$
\rho = |\psi\rangle\langle\psi|
= \sum_{\ell\ell'} S_\ell\,S_{\ell'}
\left(|U,\ell\rangle\otimes|V^\dagger,\ell\rangle\right)
\left(\langle U,\ell'|\otimes\langle V^\dagger,\ell'|\right).
\tag{1.475}
$$

Taking the trace over a complete set of orthonormal states in $C$ or $B$, we get the density operators in the spaces $B$ and $C$

$$
\rho_B = \mathrm{Tr}_C(\rho) = \sum_{\ell''}\langle V^\dagger,\ell''|\rho|V^\dagger,\ell''\rangle
= \sum_{\ell} S_\ell^2\,|U,\ell\rangle\langle U,\ell|
$$
$$
\rho_C = \mathrm{Tr}_B(\rho) = \sum_{\ell''}\langle U,\ell''|\rho|U,\ell''\rangle
= \sum_{\ell} S_\ell^2\,|V^\dagger,\ell\rangle\langle V^\dagger,\ell|.
\tag{1.476}
$$

The density operators $\rho_B$ and $\rho_C$ have the same eigenvalues and therefore the same von Neumann entropy

$$
s(\rho_B) = -k\,\mathrm{Tr}_B\!\left(\rho_B\log\rho_B\right)
= -k\sum_\ell S_\ell^2\log\!\left(S_\ell^2\right)
= s(\rho_C) = -k\,\mathrm{Tr}_C\!\left(\rho_C\log\rho_C\right).
\tag{1.477}
$$
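The Schmidt coefficients of a bipartite state are just the singular values of its coefficient matrix $A_{ik}$. As a numerical illustration (a NumPy sketch, not part of the text), consider the singlet spin state of Example 1.63:

```python
import numpy as np

# coefficient matrix A_ik of the singlet (|+,-> - |-,+>)/sqrt(2)
A = np.array([[0., 1.], [-1., 0.]]) / np.sqrt(2)

S = np.linalg.svd(A, compute_uv=False)      # the Schmidt coefficients S_l
assert np.allclose(S, [1 / np.sqrt(2)] * 2)

# Schmidt rank 2 > 1, so the state is entangled
assert np.sum(S > 1e-12) == 2

# eigenvalues of rho_B and rho_C are S_l^2; von Neumann entropy (k = 1)
p = S**2
entropy = -np.sum(p * np.log(p))
assert np.isclose(entropy, np.log(2))       # maximally entangled: log 2
```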

The nonzero, positive singular values $S_\ell$ are called Schmidt coefficients. The number of them is the Schmidt rank or Schmidt number of the state $|\psi\rangle$. The state $|\psi\rangle$ is entangled if and only if its Schmidt rank is greater than unity.

1.37 Correlation Functions

We can define two Schwarz inner products for a density matrix $\rho$. If $|f\rangle$ and $|g\rangle$ are two states, then the inner product

$$
(f,g) \equiv \langle f|\rho|g\rangle
\tag{1.478}
$$

for $g = f$ is nonnegative, $(f,f) = \langle f|\rho|f\rangle \ge 0$, and satisfies the other conditions (1.78, 1.79, and 1.81) for a Schwarz inner product. The second Schwarz inner product applies to operators $A$ and $B$ and is defined (Titulaer and Glauber, 1965) as

$$
(A,B) = \mathrm{Tr}\left(\rho\,A^\dagger B\right)
= \mathrm{Tr}\left(B\,\rho\,A^\dagger\right)
= \mathrm{Tr}\left(A^\dagger B\,\rho\right).
\tag{1.479}
$$

This inner product is nonnegative when $A = B$ and obeys the other rules (1.78, 1.79, and 1.81) for a Schwarz inner product.

These two degenerate inner products are not inner products in the strict sense of (1.78–1.84), but they are Schwarz inner products, and so (1.98–1.99) they satisfy the Schwarz inequality (1.99)

$$
(f,f)\,(g,g) \ge |(f,g)|^2.
\tag{1.480}
$$

Applied to the first, vector, Schwarz inner product (1.478), the Schwarz inequality gives

$$
\langle f|\rho|f\rangle\,\langle g|\rho|g\rangle \ge |\langle f|\rho|g\rangle|^2
\tag{1.481}
$$

which is a useful property of density matrices. Application of the Schwarz inequality to the second, operator, Schwarz inner product (1.479) gives (Titulaer and Glauber, 1965)

$$
\mathrm{Tr}\left(\rho\,A^\dagger A\right)\,\mathrm{Tr}\left(\rho\,B^\dagger B\right)
\ge \left|\mathrm{Tr}\left(\rho\,A^\dagger B\right)\right|^2.
\tag{1.482}
$$
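The operator inequality (1.482) can be spot-checked numerically with a random density matrix (an illustrative NumPy sketch, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# a random density matrix: positive, hermitian, and of unit trace
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = M @ M.conj().T
rho /= np.trace(rho)

A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))

# the Schwarz inequality (1.482)
lhs = np.trace(rho @ A.conj().T @ A) * np.trace(rho @ B.conj().T @ B)
rhs = abs(np.trace(rho @ A.conj().T @ B))**2
assert lhs.real >= rhs - 1e-9
```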

The operator $E_i(x)$ that represents the $i$th component of the electric field at the point $x$ is the hermitian sum of the "positive-frequency" part $E_i^{(+)}(x)$ and its adjoint $E_i^{(-)}(x) = \left(E_i^{(+)}(x)\right)^\dagger$

$$
E_i(x) = E_i^{(+)}(x) + E_i^{(-)}(x).
\tag{1.483}
$$

Glauber has defined the first-order correlation function $G^{(1)}_{ij}(x,y)$ as (Glauber, 1963b)

$$
G^{(1)}_{ij}(x,y) = \mathrm{Tr}\left(\rho\,E_i^{(-)}(x)\,E_j^{(+)}(y)\right)
\tag{1.484}
$$

or in terms of the operator inner product (1.479) as

$$
G^{(1)}_{ij}(x,y) = \left(E_i^{(+)}(x),\,E_j^{(+)}(y)\right).
\tag{1.485}
$$

By setting $A = E_i^{(+)}(x)$ and $B = E_j^{(+)}(y)$ in the Schwarz inequality (1.482), we find that the correlation function $G^{(1)}_{ij}(x,y)$ is bounded by (Titulaer and Glauber, 1965)

$$
\left|G^{(1)}_{ij}(x,y)\right|^2 \le G^{(1)}_{ii}(x,x)\,G^{(1)}_{jj}(y,y).
\tag{1.486}
$$

Interference fringes are sharpest when this inequality is saturated

$$
\left|G^{(1)}_{ij}(x,y)\right|^2 = G^{(1)}_{ii}(x,x)\,G^{(1)}_{jj}(y,y)
\tag{1.487}
$$

which can occur only if the correlation function $G^{(1)}_{ij}(x,y)$ factorizes (Titulaer and Glauber, 1965)

$$
G^{(1)}_{ij}(x,y) = \mathcal E_i^*(x)\,\mathcal E_j(y)
\tag{1.488}
$$

as it does when the density operator is an outer product of coherent states

$$
\rho = |\{\alpha_k\}\rangle\langle\{\alpha_k\}|
\tag{1.489}
$$

which are eigenstates of $E_i^{(+)}(x)$ with eigenvalue $\mathcal E_i(x)$ (Glauber, 1963b,a)

$$
E_i^{(+)}(x)\,|\{\alpha_k\}\rangle = \mathcal E_i(x)\,|\{\alpha_k\}\rangle.
\tag{1.490}
$$

The higher-order correlation functions

$$
G^{(n)}_{i_1\ldots i_{2n}}(x_1\cdots x_{2n})
= \mathrm{Tr}\left(\rho\,E_{i_1}^{(-)}(x_1)\cdots E_{i_n}^{(-)}(x_n)\,
E_{i_{n+1}}^{(+)}(x_{n+1})\cdots E_{i_{2n}}^{(+)}(x_{2n})\right)
\tag{1.491}
$$

satisfy similar inequalities (Glauber, 1963b) which also follow from the Schwarz inequality (1.482).

1.38 Rank of a Matrix

Four equivalent definitions of the rank $R(A)$ of an $m\times n$ matrix $A$ are:

1. the number of its linearly independent rows,
2. the number of its linearly independent columns,
3. the number of its nonzero singular values, and
4. the number of rows in its biggest square nonsingular submatrix.

A matrix of rank zero has no nonzero singular values and so is zero.

Example 1.64 (Rank) The 3 × 4 matrix

$$
A = \begin{pmatrix} 1 & 0 & 1 & -2 \\ 2 & 2 & 0 & 2 \\ 4 & 3 & 1 & 1 \end{pmatrix}
\tag{1.492}
$$

has three rows, so its rank can be at most 3. But twice the first row added to thrice the second row equals twice the third row, $2r_1 + 3r_2 - 2r_3 = 0$, so $R(A) \le 2$. The first two rows obviously are not parallel, so they are linearly independent. Thus the number of linearly independent rows of $A$ is 2, and so $A$ has rank 2.
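This rank can be checked numerically (a NumPy sketch, an illustration rather than part of the text); NumPy's matrix_rank routine counts the singular values above a tolerance, which is definition 3:

```python
import numpy as np

A = np.array([[1., 0., 1., -2.],
              [2., 2., 0.,  2.],
              [4., 3., 1.,  1.]])

assert np.linalg.matrix_rank(A) == 2

# definition 3 directly: count the nonzero singular values
s = np.linalg.svd(A, compute_uv=False)
assert np.sum(s > 1e-12 * s.max()) == 2

# the row relation 2 r1 + 3 r2 - 2 r3 = 0
assert np.allclose(2 * A[0] + 3 * A[1] - 2 * A[2], 0)
```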

1.39 Software

High-quality software for virtually all numerical problems in linear algebra is available in the linear-algebra package Lapack. Lapack codes in Fortran and C++ are available at netlib.org/lapack/ and at math.nist.gov/tnt/. Apple's Xcode command -framework accelerate links this software into gnu executables. The Basic Linear Algebra Subprograms (BLAS) on which Lapack is based are also available in Java at icl.cs.utk.edu/f2j/ and at math.nist.gov/javanumerics/. Matlab solves a wide variety of numerical problems. A free gnu version is available at gnu.org/software/octave/. Maple and Mathematica are good commercial programs for numerical and symbolic problems. Python (python.org), Scientific Python (scipy.org), and Sage (sagemath.org) are websites of free software of broad applicability. Maxima, xMaxima, and wxMaxima (maxima.sourceforge.net) are free Lisp programs that excel at computer algebra. Intel gives software to students and teachers (software.intel.com).

Exercises

1.1 What is the most general function of three Grassmann numbers $\theta_1, \theta_2, \theta_3$?
1.2 Derive the cyclicity (1.24) of the trace from Eq. (1.23).
1.3 Show that $(AB)^{\mathsf T} = B^{\mathsf T} A^{\mathsf T}$, which is Eq. (1.26).
1.4 Show that a real hermitian matrix is symmetric.
1.5 Show that $(AB)^\dagger = B^\dagger A^\dagger$, which is Eq. (1.29).
1.6 Show that the matrix (1.41) is positive on the space of all real 2-vectors but not on the space of all complex 2-vectors.
1.7 Show that the two $4\times4$ matrices (1.46) satisfy Grassmann's algebra (1.11) for $n = 2$.
1.8 Show that the operators $a_i = \theta_i$ defined in terms of the Grassmann matrices (1.46) and their adjoints $a_i^\dagger = \theta_i^\dagger$ satisfy the anticommutation relations (1.47) of the creation and annihilation operators for a system with two fermionic states.
1.9 Derive (1.66) from (1.63–1.65).
1.10 Fill in the steps leading to the formulas (1.74) for the vectors $b'_1$ and $b'_2$ and the formula (1.75) for the matrix $a'$.
1.11 Show that the antilinearity (1.81) of the inner product follows from its first two properties (1.78 & 1.79).
1.12 Show that the Minkowski product $(x,y) = \boldsymbol x\cdot\boldsymbol y - x^0 y^0$ of two 4-vectors $x$ and $y$ is an inner product obeying the rules (1.78, 1.79, and 1.84).
1.13 Show that if $f = 0$, then the linearity (1.79) of the inner product implies that $(f,f)$ and $(g,f)$ vanish.
1.14 Show that the condition (1.80) of being positive definite implies nondegeneracy (1.84).
1.15 Show that the nonnegativity (1.82) of the Schwarz inner product implies the condition (1.83). Hint: the inequality $(f - \lambda g, f - \lambda g) \ge 0$ must hold for every complex $\lambda$ and for all vectors $f$ and $g$.
1.16 Show that the inequality (1.103) follows from the Schwarz inequality (1.102).
1.17 Show that the inequality (1.105) follows from the Schwarz inequality (1.104).
1.18 Use the Gram–Schmidt method to find orthonormal linear combinations of the three vectors

$$
s_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \qquad
s_2 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \qquad
s_3 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.
\tag{1.493}
$$

1.19 Now use the Gram–Schmidt method to find orthonormal linear combinations of the same three vectors but in a different order

$$
s'_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \qquad
s'_2 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \qquad
s'_3 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}.
\tag{1.494}
$$

Did you get the same orthonormal vectors as in the previous exercise?
1.20 Derive the linearity (1.125) of the outer product from its definition (1.124).
1.21 Show that a linear operator $A$ that is represented by a hermitian matrix (1.167) in an orthonormal basis satisfies $(g, Af) = (Ag, f)$.
1.22 Show that a unitary operator maps one orthonormal basis into another.
1.23 Show that the integral (1.186) defines a unitary operator that maps the state $|x'\rangle$ to the state $|x' + a\rangle$.
1.24 For the $2\times2$ matrices

$$
A = \begin{pmatrix} 1 & 2 \\ 3 & -4 \end{pmatrix}
\qquad\text{and}\qquad
B = \begin{pmatrix} 2 & -1 \\ 4 & -3 \end{pmatrix}
\tag{1.495}
$$

verify equations (1.220–1.222).
1.25 Derive the least-squares solution (1.248) for complex $A$, $x$, and $y$ when the matrix $A^\dagger A$ is positive.
1.26 Show that the eigenvalues $\lambda$ of a unitary matrix are unimodular, that is, $|\lambda| = 1$.
1.27 What are the eigenvalues and eigenvectors of the two defective matrices (1.286)?
1.28 Use (1.297) to derive expression (1.298) for the $2\times2$ rotation matrix $\exp(-i\,\boldsymbol\theta\cdot\boldsymbol\sigma/2)$.
1.29 Compute the characteristic equation for the matrix $-i\,\boldsymbol\theta\cdot\boldsymbol J$ in which the generators are $(J_k)_{ij} = i\,\epsilon_{ikj}$ and $\epsilon_{ijk}$ is totally antisymmetric with $\epsilon_{123} = 1$.
1.30 Use the characteristic equation of exercise 1.29 to derive identities (1.301) and (1.302) for the $3\times3$ real orthogonal matrix $\exp(-i\,\boldsymbol\theta\cdot\boldsymbol J)$.
1.31 Show that the sum of the eigenvalues of a normal antisymmetric matrix vanishes.
1.32 Consider the $2\times3$ matrix $A$

$$
A = \begin{pmatrix} 1 & 2 & 3 \\ -3 & 0 & 1 \end{pmatrix}.
\tag{1.496}
$$

Perform the singular value decomposition $A = U S V^{\mathsf T}$, where $V^{\mathsf T}$ is the transpose of $V$. Use Matlab or another program to find the singular values and the real orthogonal matrices $U$ and $V$.
1.33 Consider the $6\times9$ matrix $A$ with elements $A_{j,k} = x + x\,j + i\,(y - y\,k)$ in which $x = 1.1$ and $y = 1.02$. Use Matlab or another program to find the singular values and the first left and right singular vectors.


1.34 Show that the totally antisymmetric Levi-Civita symbol $\epsilon_{ijk}$, where $\epsilon_{123} = 1$, satisfies the useful relation

$$
\sum_{i=1}^{3} \epsilon_{ijk}\,\epsilon_{inm} = \delta_{jn}\delta_{km} - \delta_{jm}\delta_{kn}.
\tag{1.497}
$$

1.35 Consider the hamiltonian $H = \tfrac{1}{2}\hbar\omega\,\sigma_3$ where $\sigma_3$ is defined in (1.453). The entropy $S$ of this system at temperature $T$ is $S = -k\,\mathrm{Tr}[\rho\ln\rho]$ in which the density operator $\rho$ is

$$
\rho = \frac{e^{-H/(kT)}}{\mathrm{Tr}\left[e^{-H/(kT)}\right]}.
\tag{1.498}
$$

Find expressions for the density operator $\rho$ and its entropy $S$.
1.36 Use example 1.62 to find the action of the operator $S^2 = \left(\boldsymbol S^{(1)} + \boldsymbol S^{(2)}\right)^2$ on the four states $|\pm\,\pm\rangle$ and then find the eigenstates and eigenvalues of $S^2$ in the space spanned by these four states.
1.37 A system that has three fermionic states has three creation operators $a_i^\dagger$ and three annihilation operators $a_k$ which satisfy the anticommutation relations $\{a_i, a_k^\dagger\} = \delta_{ik}$ and $\{a_i, a_k\} = \{a_i^\dagger, a_k^\dagger\} = 0$ for $i, k = 1, 2, 3$. The eight states of the system are $|t,u,v\rangle \equiv (a_1^\dagger)^t\,(a_2^\dagger)^u\,(a_3^\dagger)^v\,|0,0,0\rangle$. We can represent them by eight 8-vectors, each of which has seven 0's with a 1 in position $4t + 2u + v + 1$. How big should the matrices that represent the creation and annihilation operators be? Write down the three matrices that represent the three creation operators.
1.38 Show that the Schwarz inner product (1.478) is degenerate because it can violate (1.84) for certain density operators and certain pairs of states.
1.39 Show that the Schwarz inner product (1.479) is degenerate because it can violate (1.84) for certain density operators and certain pairs of operators.
1.40 The coherent state $|\{\alpha_k\}\rangle$ is an eigenstate of the annihilation operator $a_k$ with eigenvalue $\alpha_k$ for each mode $k$ of the electromagnetic field, $a_k|\{\alpha_k\}\rangle = \alpha_k|\{\alpha_k\}\rangle$. The positive-frequency part $E_i^{(+)}(x)$ of the electric field is a linear combination of the annihilation operators

$$
E_i^{(+)}(x) = \sum_k a_k\,\mathcal E_i^{(+)}(k)\,e^{i(kx - \omega t)}.
\tag{1.499}
$$

Show that $|\{\alpha_k\}\rangle$ is an eigenstate of $E_i^{(+)}(x)$ as in (1.490) and find its eigenvalue $\mathcal E_i(x)$.
1.41 Show that if $X$ is a nondefective, nonsingular square matrix, then the variation of the logarithm of its determinant is $\delta\ln(\det X) = \mathrm{Tr}\left(X^{-1}\delta X\right)$.

2 Vector Calculus

2.1 Derivatives and Partial Derivatives

The derivative of a function $f(x)$ at a point $x$ is the limit of the ratio

$$
\frac{df(x)}{dx} = \lim_{x'\to x} \frac{f(x') - f(x)}{x' - x}.
\tag{2.1}
$$

Example 2.1 (Derivative of a monomial) Setting $x' = x + \epsilon$ and letting $\epsilon\to 0$, we compute the derivative of $x^n$ as

$$
\frac{dx^n}{dx} = \lim_{\epsilon\to 0}\frac{(x+\epsilon)^n - x^n}{\epsilon}
\approx \frac{x^n + \epsilon\,n x^{n-1} - x^n}{\epsilon} = n\,x^{n-1}.
\tag{2.2}
$$

Similarly, adding fractions, we find

$$
\frac{dx^{-n}}{dx} = \lim_{\epsilon\to 0}\frac{(x+\epsilon)^{-n} - x^{-n}}{\epsilon}
\approx \frac{x^n - (x^n + \epsilon\,n x^{n-1})}{\epsilon\,x^{2n}} = -n\,x^{-n-1}.
\tag{2.3}
$$

The partial derivative of a function with respect to a given variable is the whole derivative of the function with all its other variables held constant. For instance, the partial derivatives of the function $f(x,y,z) = x^\ell y^n / z^m$ with respect to $x$ and $z$ are

$$
\frac{\partial f(x,y,z)}{\partial x} = \frac{\ell\,x^{\ell-1}\,y^n}{z^m}
\qquad\text{and}\qquad
\frac{\partial f(x,y,z)}{\partial z} = -\,\frac{m\,x^\ell\,y^n}{z^{m+1}}.
\tag{2.4}
$$
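These two partial derivatives can be checked symbolically (a SymPy sketch, an illustration rather than part of the text, with $\ell$ written as l):

```python
import sympy as sp

x, y, z = sp.symbols('x y z', positive=True)
l, n, m = sp.symbols('l n m', positive=True)

f = x**l * y**n / z**m

# the partial derivatives of (2.4)
assert sp.simplify(sp.diff(f, x) - l * x**(l - 1) * y**n / z**m) == 0
assert sp.simplify(sp.diff(f, z) + m * x**l * y**n / z**(m + 1)) == 0
```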

One often uses primes or dots to denote derivatives, as in

$$
f' = \frac{df}{dx}, \qquad
f'' = \frac{d^2 f}{dx^2} = \frac{d}{dx}\left(\frac{df}{dx}\right), \qquad
\dot f = \frac{df}{dt}, \qquad\text{and}\qquad
\ddot f = \frac{d^2 f}{dt^2}.
\tag{2.5}
$$

For higher or partial derivatives, one sometimes uses superscripts

$$
f^{(k)} = \frac{d^k f}{dx^k}
\qquad\text{and}\qquad
f^{(k,\ell)} = \frac{\partial^{k+\ell} f}{\partial x^k\,\partial y^\ell}
\tag{2.6}
$$

or subscripts, sometimes preceded by commas

$$
f_x = f_{,x} = \frac{\partial f}{\partial x}
\qquad\text{and}\qquad
f_{xyy} = f_{,xyy} = \frac{\partial^3 f}{\partial x\,\partial y^2}.
\tag{2.7}
$$

If variables $x = x_1, \ldots, x_n$ are labeled by indexes, derivatives can be labeled by subscripted indexes, sometimes preceded by commas

$$
f_{,k} = \partial_k f = \frac{\partial f}{\partial x_k}
\qquad\text{and}\qquad
f_{,k\ell} = \partial_k\partial_\ell f = \frac{\partial^2 f}{\partial x_k\,\partial x_\ell}.
\tag{2.8}
$$

2.2 Gradient

The change $dp$ in a point $p$ due to changes $du_1, du_2, du_3$ in its orthogonal coordinates $u_1, u_2, u_3$ is a linear combination

$$
dp = \frac{\partial p}{\partial u_1}\,du_1 + \frac{\partial p}{\partial u_2}\,du_2 + \frac{\partial p}{\partial u_3}\,du_3
= \boldsymbol e_1\,du_1 + \boldsymbol e_2\,du_2 + \boldsymbol e_3\,du_3
\tag{2.9}
$$

of vectors $\boldsymbol e_1, \boldsymbol e_2, \boldsymbol e_3$ that are orthogonal

$$
\boldsymbol e_i\cdot\boldsymbol e_k = h_i h_k\,\delta_{ik}.
\tag{2.10}
$$

In terms of the orthonormal vectors $\hat{\boldsymbol e}_j = \boldsymbol e_j / h_j$, the change $dp$ is

$$
dp = h_1\,\hat{\boldsymbol e}_1\,du_1 + h_2\,\hat{\boldsymbol e}_2\,du_2 + h_3\,\hat{\boldsymbol e}_3\,du_3.
\tag{2.11}
$$

The orthonormal vectors $\hat{\boldsymbol e}_j$ have cyclic cross products

$$
\hat{\boldsymbol e}_i\times\hat{\boldsymbol e}_j = \sum_{k=1}^{3}\epsilon_{ijk}\,\hat{\boldsymbol e}_k
\tag{2.12}
$$

in which $\epsilon_{ijk}$ is the antisymmetric Levi-Civita symbol (1.196) with $\epsilon_{123} = 1$.

In rectangular coordinates, the change $dp$ in a physical point $p$ due to changes $dx$, $dy$, and $dz$ in its coordinates is $dp = \hat{\boldsymbol x}\,dx + \hat{\boldsymbol y}\,dy + \hat{\boldsymbol z}\,dz$, and the scale factors are all unity, $h_x = h_y = h_z = 1$. In cylindrical coordinates, the change $dp$ in a point $p$ due to changes $d\rho$, $d\phi$, and $dz$ in its coordinates is $dp = \hat{\boldsymbol\rho}\,d\rho + \rho\,\hat{\boldsymbol\phi}\,d\phi + \hat{\boldsymbol z}\,dz$, and the scale factors are $h_\rho = 1$, $h_\phi = \rho$, and $h_z = 1$. In spherical coordinates, the change is $dp = \hat{\boldsymbol r}\,dr + r\,\hat{\boldsymbol\theta}\,d\theta + r\sin\theta\,\hat{\boldsymbol\phi}\,d\phi$, and the scale factors are $h_r = 1$, $h_\theta = r$, and $h_\phi = r\sin\theta$. In these orthogonal coordinates, the change in a point is

$$
dp = \begin{cases}
\hat{\boldsymbol x}\,dx + \hat{\boldsymbol y}\,dy + \hat{\boldsymbol z}\,dz \\
\hat{\boldsymbol\rho}\,d\rho + \rho\,\hat{\boldsymbol\phi}\,d\phi + \hat{\boldsymbol z}\,dz \\
\hat{\boldsymbol r}\,dr + r\,\hat{\boldsymbol\theta}\,d\theta + r\sin\theta\,\hat{\boldsymbol\phi}\,d\phi.
\end{cases}
\tag{2.13}
$$

The gradient $\nabla f$ of a scalar function $f$ is defined so that its dot product $\nabla f\cdot dp$ with the change $dp$ in the point $p$ is the change $df$ in $f$

$$
\nabla f\cdot dp = \left(\nabla f_1\,\hat{\boldsymbol e}_1 + \nabla f_2\,\hat{\boldsymbol e}_2 + \nabla f_3\,\hat{\boldsymbol e}_3\right)\cdot
\left(\hat{\boldsymbol e}_1\,h_1 du_1 + \hat{\boldsymbol e}_2\,h_2 du_2 + \hat{\boldsymbol e}_3\,h_3 du_3\right)
$$
$$
= \nabla f_1\,h_1 du_1 + \nabla f_2\,h_2 du_2 + \nabla f_3\,h_3 du_3
= df = \frac{\partial f}{\partial u_1}\,du_1 + \frac{\partial f}{\partial u_2}\,du_2 + \frac{\partial f}{\partial u_3}\,du_3.
\tag{2.14}
$$

Thus the gradient in orthogonal coordinates is

$$
\nabla f = \frac{\hat{\boldsymbol e}_1}{h_1}\frac{\partial f}{\partial u_1}
+ \frac{\hat{\boldsymbol e}_2}{h_2}\frac{\partial f}{\partial u_2}
+ \frac{\hat{\boldsymbol e}_3}{h_3}\frac{\partial f}{\partial u_3},
\tag{2.15}
$$

and in rectangular, cylindrical, and spherical coordinates it is

$$
\nabla f = \begin{cases}
\hat{\boldsymbol x}\,\dfrac{\partial f}{\partial x} + \hat{\boldsymbol y}\,\dfrac{\partial f}{\partial y} + \hat{\boldsymbol z}\,\dfrac{\partial f}{\partial z} \\[2ex]
\hat{\boldsymbol\rho}\,\dfrac{\partial f}{\partial\rho} + \dfrac{\hat{\boldsymbol\phi}}{\rho}\,\dfrac{\partial f}{\partial\phi} + \hat{\boldsymbol z}\,\dfrac{\partial f}{\partial z} \\[2ex]
\hat{\boldsymbol r}\,\dfrac{\partial f}{\partial r} + \dfrac{\hat{\boldsymbol\theta}}{r}\,\dfrac{\partial f}{\partial\theta} + \dfrac{\hat{\boldsymbol\phi}}{r\sin\theta}\,\dfrac{\partial f}{\partial\phi}.
\end{cases}
\tag{2.16}
$$

In particular, the gradient of $1/r$ is

$$
\nabla\left(\frac{1}{r}\right) = -\,\frac{\hat{\boldsymbol r}}{r^2}
\qquad\text{and}\qquad
\nabla\left(\frac{1}{|\boldsymbol r - \boldsymbol r'|}\right)
= -\,\frac{\boldsymbol r - \boldsymbol r'}{|\boldsymbol r - \boldsymbol r'|^3}.
\tag{2.17}
$$

(2.17)

In both of these formulas, the differentiation is with respect to r, not r ± . 2.3 Divergence The divergence of a vector v in an infinitesimal cube C is defined as the integral S of v over the surface of the cube divided by its volume V = h 1h 2 h 3 du1 du 2du 3 . The surface integral S is the sum of the integrals of v1 , v 2, and v3 over the cube’s three forward faces v1h 2du 2 h 3du 3 + v2 h 1du 1 h 3du 3 + v3 h 1du 1 h 2 du2 minus the sum of the integrals of v1 , v2 , and v3 over the cube’s three opposite faces. The surface integral is then S=

´ ∂ (v h h ) 1 2 3

∂u1

+

∂ (v2 h 1 h 3) ∂ u2

+

∂ (v3h 1h 2) ∂ u3

µ

du 1du 2du 3 .

(2.18)

2.3 Divergence

87

So the divergence ∇ · v is the ratio S / V

∇·v=

S V

1 h1 h2 h3

=

´ ∂ (v h h ) 1 2 3

∂u1

∂ (v2 h 1h 3)

+

∂ u2

+

∂ (v 3h 1h 2 )

µ

∂ u3

.

(2.19)

In rectangular coordinates, the divergence is

∇ · v = ∂∂vxx + ∂∂vyy + ∂∂vzz . In cylindrical coordinates, it is 1

∇ ·v = ρ

´ ∂ (v ρ) ρ

∂ρ

+

+

∂ vφ ∂φ

∂ (v z ρ) ∂z

µ

(2.20)

) 1 ∂v ∂ vz = ρ1 ∂ (ρv + + , ∂ρ ρ ∂φ ∂z ρ

φ

(2.21)

and in spherical coordinates it is 2

1 ∂ (v sin θ ) 1 ∂v ∇ · v = r12 ∂ (v∂rrr ) + r sin + . θ ∂θ r sin θ ∂ φ θ

φ

(2.22)

By assembling a suitable number of infinitesimally small cubes, one may create a three-dimensional region of arbitrary shape and volume. The sum of the products of the divergence ∇ · v in each cube times its volume d V is the sum of the surface integrals d S over the faces of these tiny cubes. The integrals over the interior faces cancel leaving just the integral over the surface ∂ V of the whole volume V . Thus we arrive at Gauss’s theorem



V

∇ · v dV =



∂V

v

· da

(2.23)

in which da is an infinitesimal, outward, area element of the surface that is the boundary ∂ V of the volume V . Example 2.2(Delta function) The integral of the divergence of rˆ / r 2 over any sphere, however small, centered at the origin is 4π



± ² ¶ ¶ ¶ ∇ · rrˆ2 d V = rrˆ2 · da = rrˆ2 · r 2 rˆ d ³ =

d ³ = 4π.

(2.24)

Similarly, the integral of the divergence of r − r ± /| r − r ± |3 over any sphere, however small, centered at r ± is 4π



¶ r − r± ± r − r± ² ∇ · |r − r ± |3 d V = |r − r ± |3 · da ¶ ¶ ± ± = |rr −− rr± |3 · | r − r ± |2 |rr −− rr ± | d ³ =

d ³ = 4π. (2.25)

88

2 Vector Calculus

²= 0 and for r − r ± ²= 0, are delta functions ± r − r± ² 3 ± and ∇ · | r − r ± | 3 = 4π δ ( r − r )

These divergences, vanishing for r

± ² ∇ · rrˆ2 = 4π δ 3 (r )

(2.26)

because if f (r ) is any suitably smooth function, then the integral over any volume that includes the point r ± is

± r − r± ² 3 ± f (r ) ∇ · (2.27) |r − r ±|3 d r = 4π f ( r ). Example 2.3(Gauss’s law) The divergence ∇ · E of the electric field is the charge density ρ divided by the electric constant ±0 = 8.854 × 10−12 F/m ∇ · E = ±ρ . (2.28) ¶

0

So by Gauss’s theorem, the integral of the electric field over a surface ∂ V that bounds a volume V is the charge inside divided by ±0



∂V

E · da =



V



∇ · E dV =

ρ

V ±0

=

dV

QV ±0

(2.29)

.

2.4 Laplacian The laplacian is the divergence (2.19) of the gradient (2.15). So in orthogonal coordinates it is

² f ≡ ∇2 f ≡ ∇ · ∇ f =

1 h1 h2 h3

·³ 3

In rectangular coordinates, the laplacian is 2

k =1



±h h h

1 2 3 ∂f h 2k ∂ u k

∂uk

2

²¸

.

2

² f = ∂∂ xf2 + ∂∂ y f2 + ∂∂ z2f . In cylindrical coordinates, it is 1

²f = ρ

´∂± ∂ρ

ρ

∂f ∂ρ

²

1 ∂2 f

∂2

= =

µ

1

+ ρ ∂ φ2 + ρ ∂ z2 = ρ ∂ρ

and in spherical coordinates it is ´∂± ² 1 ∂f 2 ² f = r 2 sin θ ∂r r sin θ ∂ r +

±

f

²

∂ ∂θ

±

±



sin θ

±

∂f ∂θ

²

ρ

(2.30)

(2.31) ∂f ∂ρ

²

2

2

+ ρ12 ∂∂ φf2 + ∂∂ z2f ,

(2.32)

²

+ ∂φ ∂

±

1 ∂f sin θ ∂ φ

∂2 f 1 ∂ ∂f 1 1 ∂ 2 ∂f r + sin θ + r 2 ∂r ∂r r 2 sin θ ∂ θ ∂θ r 2 sin2 θ ∂ φ 2 ± ² 1 ∂2 ( ) ∂2 f 1 ∂ ∂f 1 r f + . sin θ + r ∂r 2 r 2 sin θ ∂ θ ∂θ r 2 sin2 θ ∂ φ2

²µ (2.33)


Example 2.4 (Delta function as laplacian of $1/r$) By combining the gradient (2.17) of $1/r$ with the representation (2.26) of the delta function as a divergence, we can write delta functions as laplacians (with respect to $\boldsymbol r$)

$$
-\triangle\left(\frac{1}{r}\right) = 4\pi\,\delta^3(\boldsymbol r)
\qquad\text{and}\qquad
-\triangle\left(\frac{1}{|\boldsymbol r - \boldsymbol r'|}\right) = 4\pi\,\delta^3(\boldsymbol r - \boldsymbol r').
\tag{2.34}
$$

Example 2.5 (Electric field of a uniformly charged sphere): The electric field E = −∇φ − Ȧ in static problems is just the gradient E = −∇φ of the scalar potential φ. Gauss's law (2.28) then gives us Poisson's equation ∇ · E = −△φ = ρ/ε₀. Writing the laplacian in spherical coordinates (2.33) and using spherical symmetry, we form the differential equation

$$-\frac{1}{r}\frac{d^2}{dr^2}(r\phi)=\frac{\rho}{\epsilon_0} \tag{2.35}$$

in which ρ is the uniform charge density of the sphere. Integrating twice and letting the constant a be φ(r) at r = 0, we find the potential inside the sphere to be φ(r) = a − ρr²/(6ε₀). Outside the sphere, the charge density vanishes, and so the second r-derivative of rφ vanishes, (rφ)″ = 0. Integrating twice, we get for the potential outside the sphere φ(r) = b/r after dropping a constant term because φ(r) → 0 as r → ∞. The electric field E = −∇φ is the negative gradient (2.16) of the potential, so on the surface of the sphere, where the interior and exterior solutions meet at r = R, it is

$$\mathbf E=\hat{\mathbf r}\,\frac{\rho R}{3\epsilon_0}=\hat{\mathbf r}\,\frac{b}{R^2}. \tag{2.36}$$

Thus b = ρR³/(3ε₀). Matching the interior potential to the exterior potential on the surface of the sphere at r = R gives a = ρR²/(2ε₀). So the potential of a uniformly charged sphere of radius R is

$$\phi(r)=\begin{cases}\rho\,(R^2-r^2/3)/(2\epsilon_0)&r\le R\\[2pt]\rho R^3/(3\epsilon_0 r)&r\ge R\end{cases}. \tag{2.37}$$
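A symbolic sketch (not from the text; it assumes SymPy) checking the two branches of (2.37): the interior potential obeys Poisson's equation, the exterior obeys Laplace's equation, and potential and field match at r = R.

```python
import sympy as sp

r, R, rho, eps0 = sp.symbols("r R rho epsilon_0", positive=True)

phi_in = rho * (R**2 - r**2 / 3) / (2 * eps0)
phi_out = rho * R**3 / (3 * eps0 * r)

def radial_laplacian(phi):
    # spherically symmetric laplacian (1/r) d^2(r phi)/dr^2 from (2.33)
    return sp.diff(r * phi, r, 2) / r

assert sp.simplify(-radial_laplacian(phi_in) - rho / eps0) == 0   # Poisson inside
assert sp.simplify(radial_laplacian(phi_out)) == 0                # Laplace outside
assert sp.simplify((phi_in - phi_out).subs(r, R)) == 0            # phi continuous
assert sp.simplify(sp.diff(phi_in - phi_out, r).subs(r, R)) == 0  # E continuous
print("potential (2.37) verified")
```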

2.5 Curl

The directed area dS of an infinitesimal rectangle whose sides are the tiny perpendicular vectors h_i ê_i du_i and h_j ê_j du_j (fixed i and j) is their cross-product (2.12)

$$d\mathbf S=h_i\,\hat e_i\,du_i\times h_j\,\hat e_j\,du_j=\sum_{k=1}^{3}\epsilon_{ijk}\,\hat e_k\,h_ih_j\,du_i\,du_j. \tag{2.38}$$


The line integral of the vector f along the perimeter of this infinitesimal rectangle is

$$\oint\mathbf f\cdot d\mathbf l=\left[\frac{\partial(h_jf_j)}{\partial u_i}-\frac{\partial(h_if_i)}{\partial u_j}\right]du_i\,du_j. \tag{2.39}$$

The curl ∇ × f of a vector f is defined to be the vector whose dot product with the area (2.38) is the line integral (2.39)

$$(\nabla\times\mathbf f)\cdot d\mathbf S=(\nabla\times\mathbf f)_k\,h_ih_j\,du_i\,du_j=\left[\frac{\partial(h_jf_j)}{\partial u_i}-\frac{\partial(h_if_i)}{\partial u_j}\right]du_i\,du_j \tag{2.40}$$

in which i, j, k are 1, 2, 3 or a cyclic permutation of 1, 2, 3. Thus the kth component of the curl is

$$(\nabla\times\mathbf f)_k=\frac{1}{h_ih_j}\left[\frac{\partial(h_jf_j)}{\partial u_i}-\frac{\partial(h_if_i)}{\partial u_j}\right]\quad\text{(no sum)}, \tag{2.41}$$

and the curl as a vector field is the sum over i, j, and k from 1 to 3

$$\nabla\times\mathbf f=\sum_{i,j,k=1}^{3}\epsilon_{ijk}\,\frac{\hat e_k}{h_ih_j}\,\frac{\partial(h_jf_j)}{\partial u_i}. \tag{2.42}$$

In rectangular coordinates, the scale factors are all unity, and the ith component of the curl ∇ × f is

$$(\nabla\times\mathbf f)_i=\sum_{j,k=1}^{3}\epsilon_{ijk}\,\frac{\partial f_k}{\partial x_j}=\sum_{j,k=1}^{3}\epsilon_{ijk}\,\partial_j f_k. \tag{2.43}$$

We can write the curl as a determinant

$$\nabla\times\mathbf f=\frac{1}{h_1h_2h_3}\begin{vmatrix}h_1\hat e_1&h_2\hat e_2&h_3\hat e_3\\ \partial_1&\partial_2&\partial_3\\ h_1f_1&h_2f_2&h_3f_3\end{vmatrix}. \tag{2.44}$$

In rectangular coordinates, the curl is

$$\nabla\times\mathbf f=\begin{vmatrix}\hat x&\hat y&\hat z\\ \partial_x&\partial_y&\partial_z\\ f_x&f_y&f_z\end{vmatrix}. \tag{2.45}$$

In cylindrical coordinates, it is

$$\nabla\times\mathbf f=\frac{1}{\rho}\begin{vmatrix}\hat\rho&\rho\,\hat\phi&\hat z\\ \partial_\rho&\partial_\phi&\partial_z\\ f_\rho&\rho f_\phi&f_z\end{vmatrix} \tag{2.46}$$

and in spherical coordinates, it is

$$\nabla\times\mathbf f=\frac{1}{r^2\sin\theta}\begin{vmatrix}\hat r&r\,\hat\theta&r\sin\theta\,\hat\phi\\ \partial_r&\partial_\theta&\partial_\phi\\ f_r&rf_\theta&r\sin\theta\,f_\phi\end{vmatrix}. \tag{2.47}$$

Sums of products of two Levi-Civita symbols (1.196) yield useful identities

$$\sum_{i=1}^{3}\epsilon_{ijk}\,\epsilon_{imn}=\delta_{jm}\delta_{kn}-\delta_{jn}\delta_{km}\quad\text{and}\quad\sum_{i,j=1}^{3}\epsilon_{ijk}\,\epsilon_{ijn}=2\,\delta_{kn} \tag{2.48}$$

in which δ_jm is Kronecker's delta (1.34). Thus the curl of a curl is

$$[\nabla\times(\nabla\times\mathbf A)]_i=\sum_{j,k,m,n=1}^{3}\epsilon_{ijk}\,\partial_j\,\epsilon_{kmn}\,\partial_m A_n=\sum_{j,m,n=1}^{3}(\delta_{im}\delta_{jn}-\delta_{in}\delta_{jm})\,\partial_j\partial_m A_n=\partial_i\,\nabla\cdot\mathbf A-\triangle A_i$$

or

$$\nabla\times(\nabla\times\mathbf A)=\nabla(\nabla\cdot\mathbf A)-\triangle\mathbf A. \tag{2.49}$$
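The Levi-Civita identities (2.48) can be checked exactly by brute force; a sketch (not from the text, assuming NumPy's `einsum`):

```python
import numpy as np

# Build the Levi-Civita symbol and verify both contractions in (2.48).
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k], eps[k, j, i] = 1.0, -1.0

delta = np.eye(3)
lhs = np.einsum("ijk,imn->jkmn", eps, eps)
rhs = (np.einsum("jm,kn->jkmn", delta, delta)
       - np.einsum("jn,km->jkmn", delta, delta))
assert np.allclose(lhs, rhs)                                        # first identity
assert np.allclose(np.einsum("ijk,ijn->kn", eps, eps), 2 * delta)   # second identity
print("Levi-Civita identities (2.48) verified")
```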

By assembling a suitable set of infinitesimal rectangles dS, we may create an arbitrary surface S. The surface integral of the dot product ∇ × f · dS over the tiny rectangles dS that make up the surface S is the sum of the line integrals along the sides of these tiny rectangles. The line integrals over the interior sides cancel, leaving just the line integral along the boundary ∂S of the finite surface S. Thus the integral of the curl ∇ × f of a vector f over a surface S is the line integral of the vector f along the boundary of the surface

$$\int_S(\nabla\times\mathbf f)\cdot d\mathbf S=\oint_{\partial S}\mathbf f\cdot d\boldsymbol\ell \tag{2.50}$$

which is one of Stokes's theorems.

Example 2.6 (Maxwell's equations): In empty space, Maxwell's equations in SI units are ∇ · E = 0, ∇ · B = 0, ∇ × E = −Ḃ, and c²∇ × B = Ė. They imply that the voltage induced in a loop is the negative of the rate of change of the magnetic flux through the loop

$$V=\oint_{\partial S}\mathbf E\cdot d\mathbf x=-\dot\Phi_B=-\int_S\dot{\mathbf B}\cdot d\mathbf a \tag{2.51}$$

and that the line integral of the magnetic induction around a loop is the rate of change of the electric flux through the loop divided by c²

$$\oint_{\partial S}\mathbf B\cdot d\mathbf x=\frac{1}{c^2}\,\dot\Phi_E=\frac{1}{c^2}\int_S\dot{\mathbf E}\cdot d\mathbf a. \tag{2.52}$$

Maxwell's equations in empty space and the curl identity (2.49) imply that

$$\nabla\times(\nabla\times\mathbf E)=\nabla(\nabla\cdot\mathbf E)-\triangle\mathbf E=-\triangle\mathbf E=-\nabla\times\dot{\mathbf B}=-\ddot{\mathbf E}/c^2 \tag{2.53}$$


$$\nabla\times(\nabla\times\mathbf B)=\nabla(\nabla\cdot\mathbf B)-\triangle\mathbf B=-\triangle\mathbf B=\nabla\times\dot{\mathbf E}/c^2=-\ddot{\mathbf B}/c^2 \tag{2.54}$$

or

$$\triangle\mathbf E=\ddot{\mathbf E}/c^2\quad\text{and}\quad\triangle\mathbf B=\ddot{\mathbf B}/c^2. \tag{2.55}$$

The exponentials E(k, ω) = ε e^{i(k·r−ωt)} and B(k, ω) = (k̂ × ε/c) e^{i(k·r−ωt)} with ω = |k|c and k̂ · ε = 0 obey these wave equations.
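A symbolic sketch (not from the text; it assumes SymPy) that a plane wave with ω = |k|c obeys the vacuum wave equation (2.55); the wavevector below is an arbitrary example.

```python
import sympy as sp

x, y, z, t = sp.symbols("x y z t", real=True)
c = sp.symbols("c", positive=True)
kx, ky, kz = 1, 2, 2                            # example wavevector, |k| = 3
omega = sp.sqrt(kx**2 + ky**2 + kz**2) * c      # dispersion relation omega = |k| c
E = sp.cos(kx*x + ky*y + kz*z - omega*t)        # one Cartesian component of E

lap_E = sp.diff(E, x, 2) + sp.diff(E, y, 2) + sp.diff(E, z, 2)
assert sp.simplify(lap_E - sp.diff(E, t, 2) / c**2) == 0
print("plane wave obeys the wave equation (2.55)")
```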

Exercises

2.1 Derive the Levi-Civita identity

$$\sum_{i=1}^{3}\epsilon_{ijk}\,\epsilon_{imn}=\delta_{jm}\delta_{kn}-\delta_{jn}\delta_{km}. \tag{2.56}$$

2.2 Derive the Levi-Civita identity

$$\sum_{i,j=1}^{3}\epsilon_{ijk}\,\epsilon_{ijn}=2\,\delta_{kn}. \tag{2.57}$$

2.3 Show that

$$\nabla\times(\mathbf a\times\mathbf b)=\mathbf a\,\nabla\cdot\mathbf b-\mathbf b\,\nabla\cdot\mathbf a+(\mathbf b\cdot\nabla)\mathbf a-(\mathbf a\cdot\nabla)\mathbf b. \tag{2.58}$$

2.4 Simplify ∇ × ∇φ and ∇ · (∇ × a) in which φ is a scalar field and a is a vector field.

2.5 Simplify ∇ · (∇φ × ∇ψ) in which φ and ψ are scalar fields.

2.6 Let B = ∇ × A and E = −∇φ − Ȧ, and show that Maxwell's equations in vacuum (example 2.6) and the Lorentz gauge condition

$$\nabla\cdot\mathbf A+\dot\phi/c^2=0 \tag{2.59}$$

imply that A and φ obey the wave equations

$$\triangle\phi-\ddot\phi/c^2=0\quad\text{and}\quad\triangle\mathbf A-\ddot{\mathbf A}/c^2=0. \tag{2.60}$$

3 Fourier Series

3.1 Fourier Series

The phases exp(inx)/√(2π) for integer n are orthonormal on an interval of length 2π

$$\int_0^{2\pi}\frac{e^{-imx}}{\sqrt{2\pi}}\,\frac{e^{inx}}{\sqrt{2\pi}}\,dx=\int_0^{2\pi}\frac{e^{i(n-m)x}}{2\pi}\,dx=\delta_{m,n}=\begin{cases}1&\text{if }m=n\\0&\text{if }m\neq n\end{cases} \tag{3.1}$$

in which δ_{n,m} is Kronecker's delta (1.34). So if a function f(x) is a sum of these phases, called a Fourier series,

$$f(x)=\sum_{n=-\infty}^{\infty}f_n\,\frac{e^{inx}}{\sqrt{2\pi}}, \tag{3.2}$$

then the orthonormality (3.1) of these phases exp(inx)/√(2π) gives the nth coefficient f_n as the integral

$$\int_0^{2\pi}\frac{e^{-inx}}{\sqrt{2\pi}}\,f(x)\,dx=\int_0^{2\pi}\frac{e^{-inx}}{\sqrt{2\pi}}\sum_{m=-\infty}^{\infty}f_m\,\frac{e^{imx}}{\sqrt{2\pi}}\,dx=\sum_{m=-\infty}^{\infty}\delta_{n,m}\,f_m=f_n. \tag{3.3}$$
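A numerical sketch of (3.2) and (3.3) (not from the text; it assumes NumPy): compute the coefficients f_n by quadrature for an arbitrary smooth 2π-periodic example function and rebuild f(x) from the series.

```python
import numpy as np

x = np.linspace(0, 2*np.pi, 4000, endpoint=False)
dx = x[1] - x[0]
f = np.exp(np.cos(x)) * np.sin(2*x)            # example periodic function

ns = np.arange(-20, 21)
# f_n from (3.3) by the (spectrally accurate) rectangle rule
coeffs = np.array([np.sum(np.exp(-1j*n*x) * f) * dx / np.sqrt(2*np.pi)
                   for n in ns])
# rebuild f(x) from the truncated series (3.2)
f_rebuilt = sum(c * np.exp(1j*n*x) / np.sqrt(2*np.pi)
                for n, c in zip(ns, coeffs))
print(np.max(np.abs(f - f_rebuilt.real)))      # tiny truncation error
```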

Fourier series can represent functions f(x) that are square integrable on the interval 0 < x < 2π (Joseph Fourier 1768–1830). In Dirac's notation, we interpret the phases

$$\langle x|n\rangle=\frac{e^{inx}}{\sqrt{2\pi}} \tag{3.4}$$

as the components of the vector |n⟩ in the |x⟩ basis. These components are inner products ⟨x|n⟩ of |n⟩ and |x⟩. The orthonormality integral (3.1) shows that the inner product of |n⟩ and |m⟩ is unity when n = m and zero when n ≠ m

$$\langle m|n\rangle=\langle m|I|n\rangle=\int_0^{2\pi}\langle m|x\rangle\langle x|n\rangle\,dx=\int_0^{2\pi}\frac{e^{i(n-m)x}}{2\pi}\,dx=\delta_{m,n}. \tag{3.5}$$


Here I is the identity operator of the space spanned by the vectors |x⟩

$$I=\int_0^{2\pi}|x\rangle\langle x|\,dx. \tag{3.6}$$

Since the vectors |n⟩ are orthonormal, a sum of their outer products |n⟩⟨n| also represents the identity operator

$$I=\sum_{n=-\infty}^{\infty}|n\rangle\langle n| \tag{3.7}$$

of the space they span. This representation of the identity operator, together with the formula (3.4) for ⟨x|n⟩, shows that the inner product f(x) = ⟨x|f⟩, which is the component of the vector |f⟩ in the |x⟩ basis, is given by the Fourier series (3.2)

$$f(x)=\langle x|f\rangle=\langle x|I|f\rangle=\sum_{n=-\infty}^{\infty}\langle x|n\rangle\langle n|f\rangle=\sum_{n=-\infty}^{\infty}\frac{e^{inx}}{\sqrt{2\pi}}\,\langle n|f\rangle=\sum_{n=-\infty}^{\infty}\frac{e^{inx}}{\sqrt{2\pi}}\,f_n. \tag{3.8}$$

Similarly, the other representation (3.6) of the identity operator shows that the inner products f_n = ⟨n|f⟩, which are the components of the vector |f⟩ in the |n⟩ basis, are the Fourier integrals (3.3)

$$f_n=\langle n|f\rangle=\langle n|I|f\rangle=\int_0^{2\pi}\langle n|x\rangle\langle x|f\rangle\,dx=\int_0^{2\pi}\frac{e^{-inx}}{\sqrt{2\pi}}\,f(x)\,dx. \tag{3.9}$$

The two representations (3.6 and 3.7) of the identity operator also give two ways of writing the inner product ⟨g|f⟩ of two vectors |f⟩ and |g⟩

$$\langle g|f\rangle=\sum_{n=-\infty}^{\infty}\langle g|n\rangle\langle n|f\rangle=\sum_{n=-\infty}^{\infty}g_n^*\,f_n=\int_0^{2\pi}\langle g|x\rangle\langle x|f\rangle\,dx=\int_0^{2\pi}g^*(x)\,f(x)\,dx. \tag{3.10}$$

When the vectors are the same, this identity shows that the sum of the squared absolute values of the Fourier coefficients f_n is equal to the integral of the squared absolute value |f(x)|²

$$\langle f|f\rangle=\sum_{n=-\infty}^{\infty}|\langle n|f\rangle|^2=\sum_{n=-\infty}^{\infty}|f_n|^2=\int_0^{2\pi}|\langle x|f\rangle|^2\,dx=\int_0^{2\pi}|f(x)|^2\,dx. \tag{3.11}$$
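A numerical sketch of the Parseval relation (3.11) (not from the text; it assumes NumPy) for an arbitrary trigonometric example: the sum of |f_n|² matches the integral of |f(x)|² over one period.

```python
import numpy as np

x = np.linspace(0, 2*np.pi, 4096, endpoint=False)
dx = x[1] - x[0]
f = np.cos(x) + 0.5*np.sin(3*x) + 0.25*np.cos(7*x)   # example function

ns = np.arange(-10, 11)
fn = np.array([np.sum(np.exp(-1j*n*x) * f) * dx / np.sqrt(2*np.pi) for n in ns])
lhs = np.sum(np.abs(fn)**2)            # sum of |f_n|^2
rhs = np.sum(np.abs(f)**2) * dx        # integral of |f(x)|^2
print(lhs, rhs)                        # the two sides agree
```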

Fourier series (3.2 and 3.8) are periodic with period 2π because the phases ⟨x|n⟩ are periodic with period 2π, exp(in(x + 2π)) = exp(inx). Thus even if the function f(x) which we use in (3.3 and 3.9) to make the Fourier coefficients f_n = ⟨n|f⟩ is not periodic, its Fourier series (3.2 and 3.8) will nevertheless be strictly periodic, as illustrated by Figs. 3.2 and 3.4.

The complex conjugate of the Fourier series (3.2 and 3.8) is

$$f^*(x)=\sum_{n=-\infty}^{\infty}f_n^*\,\frac{e^{-inx}}{\sqrt{2\pi}}=\sum_{n=-\infty}^{\infty}f_{-n}^*\,\frac{e^{inx}}{\sqrt{2\pi}} \tag{3.12}$$

so the nth Fourier coefficient f_n(f*) for f*(x) is the complex conjugate of the −nth Fourier coefficient for f(x)

$$f_n(f^*)=f_{-n}^*(f). \tag{3.13}$$

Thus if the function f(x) is real, then

$$f_n(f)=f_n(f^*)=f_{-n}^*(f)\quad\text{or}\quad f_n=f_{-n}^*. \tag{3.14}$$

Example 3.1 (Fourier series by inspection): The doubly exponential function exp(exp(ix)) has the Fourier series

$$\exp\left(e^{ix}\right)=\sum_{n=0}^{\infty}\frac{1}{n!}\,e^{inx} \tag{3.15}$$

in which n! = n(n − 1) ⋯ 1 is n-factorial with 0! ≡ 1.

Example 3.2 (Beats): The sum of two sines f(x) = sin ω₁x + sin ω₂x of similar frequencies ω₁ ≈ ω₂ is the product (exercise 3.1)

$$f(x)=2\cos\tfrac12(\omega_1-\omega_2)x\,\sin\tfrac12(\omega_1+\omega_2)x. \tag{3.16}$$

The first factor cos ½(ω₁ − ω₂)x is the beat; it modulates the second factor sin ½(ω₁ + ω₂)x, as illustrated by Fig. 3.1.
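The series of Example 3.1 can be checked numerically (a sketch, not from the text; it assumes NumPy). Written without the √(2π) normalization, (3.15) says the plain coefficients c_n = (1/2π)∫₀^{2π} e^{−inx} exp(e^{ix}) dx equal 1/n!.

```python
import numpy as np
from math import factorial

x = np.linspace(0, 2*np.pi, 4096, endpoint=False)
f = np.exp(np.exp(1j*x))                 # the doubly exponential function
errs = [abs(np.mean(np.exp(-1j*n*x) * f) - 1/factorial(n)) for n in range(6)]
print(max(errs))                         # essentially zero
```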

Example 3.3 (Laplace's equation): The Fourier series (exercise 3.2)

$$f(\rho,\theta)=\sum_{n=-\infty}^{\infty}\left(\frac{\rho}{a}\right)^{|n|}\left[\int_0^{2\pi}\frac{e^{-in\theta'}}{\sqrt{2\pi}}\,h(\theta')\,d\theta'\right]\frac{e^{in\theta}}{\sqrt{2\pi}} \tag{3.17}$$

(Ritt, 1970, p. 3) obeys Laplace's equation (7.23)

$$\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\,\frac{\partial f}{\partial\rho}\right)+\frac{1}{\rho^2}\frac{\partial^2 f}{\partial\theta^2}=0$$

for ρ > 0, but its
0, but its

= s −1 k

(4.130)

(

) + e−kt = s2 −s k 2

(4.131)

(

) − e−kt = s 2 −k k2 .

(4.132)

dt e−( s −k )t

is well defined for Re s > k with a simple pole at s = k (section 6.10) and is square integrable for Re s > k + ³ . The Laplace transforms of cosh kt and sinh kt are f (s ) = and f (s ) =

²∞ 0

²∞ 0

dt e−st cosh kt

dt e−st sinh kt

=

1 2

=

1 2

²∞ 0

²∞ 0

dt e−st e kt

dt e−st ekt

4.9 Laplace Transforms

The Laplace transform of cos ωt is f (s ) =

²

∞ 0

1 dt e −st cos ω t = 2

and that of sin ωt is f (s ) =

²

∞ 0

dt e −st sin ωt

=

1 2i

²∞ 0

²∞ 0

147

(

) + e−i t = s 2 +s ω2

(4.133)

(

) − e−i t = s2 +ω ω2 .

(4.134)

dt e−st e i ωt

dt e−st ei ωt

ω

ω

Example 4.13(Lifetime of a fluorophore) Fluorophores are molecules that emit visible light when excited by photons. The probability P (t , t µ ) that a fluorophore with a lifetime τ will emit a photon at time t if excited by a photon at time t µ is P (t , t µ ) = τ e−(t −t

µ )/τ

θ (t

− t µ)

(4.135)

in which θ (t − t µ) = (t − t µ + |t − t µ | )/2|t − t µ | is the Heaviside function. One way to measure the lifetime τ of a fluorophore is to modulate the exciting laser beam at a frequency ν = 2π ω of the order of 60 MHz and to detect the phase-shift φ in the light L (t ) emitted by the fluorophore. That light is the integral of P (t , t µ ) times the modulated beam sin ωt or equivalently the convolution of e−t /τ θ (t ) with sin ω t

²∞

²∞ µ µ µ −(t −t µ)/τ θ (t − t µ ) sin(ωt µ ) dt µ L (t ) = P (t , t ) sin(ωt ) dt = τe −∞ ²−∞ t = τ e−(t −t µ)/τ sin(ωt µ ) dt µ . (4.136) −∞ Letting u = t − t µ and using the trigonometric formula sin(a − b) = sin a cos b − cos a sin b

(4.137)

we may relate this integral to the Laplace transforms of a sine (4.134) and a cosine (4.133) L (t ) =

−τ

²∞

e−u/τ sin ω (u − t ) du

²0 ∞

= − τ e−u (sin ωu cos ωt − cos ω u sin ωt ) du ´ sin0(ωt )/τ ω cos ω t µ = τ 1/τ 2 + ω2 − 1/τ 2 + ω2 . (4.138) ½ ½ Setting cos φ = (1/τ )/ 1/τ 2 + ω 2 and sin φ = ω/ 1/τ 2 + ω2, we have τ τ L (t ) = ½ sin(ωt − φ). (sin ωt cos φ − cos ω t sin φ) = ½ 2 2 1/τ + ω 1/τ 2 + ω 2 /τ

(4.139)

148

4 Fourier and Laplace Transforms

The phase-shift φ then is given by

= arcsin ½

φ

ω

1/τ 2

+

≤ π2 .

ω2

(4.140)

So by inverting this formula, we get the lifetime of the fluorophore τ = (1/ω) tan φ in terms of the phase-shift φ which is much easier to measure.

(4.141)

4.10 Derivatives and Integrals of Laplace Transforms The derivatives of a Laplace transform f (s ) are by its definition (4.128) d n f (s ) ds n

=

²∞ 0

dt (−t ) n e −st F (t ).

(4.142)

They usually are well defined if f (s ) is well defined. For instance, if we differentiate the Laplace transform (4.129) of the function F (t ) = 1 which is f ( s ) = 1/s, then we get (

−1)

n

d n s −1 ds n

²∞

n!

= s n+1 =

0

dt e−st t n

(4.143)

which tells us that the Laplace transform of t n is n !/s n+1 . The result of differentiating the function F (t ) also has a simple form. Integrating by parts, we find for the Laplace transform of F µ (t )

² ∞ ¾d » −st

²∞ 0

dt e−st F µ (t ) =

0

dt

dt

²

e

¼

F (t )





d F (t ) e −st dt

= − F ( 0) + dt F (t ) s e−st 0 = − F ( 0) + s f (s) as long as e−st F (t ) → 0 as t → ∞.

¿

(4.144)

The indefinite integral of the Laplace transform (4.128) is (1)

f (s ) ≡

²

ds1 f ( s1 ) =

and its nth indefinite integral is (n)

f (s ) ≡

²

ds n ¸ ¸ ¸

²

ds1 f (s 1) =

²

∞ 0

dt

²∞ 0

dt

e−st F (t ) (−t )

(4.145)

e −st F (t ). (−t ) n

(4.146)

If f ( s ) is a well-behaved function, then these indefinite integrals usually are well defined for s > 0 as long as F (t ) → 0 suitably as t → 0.

4.11 Laplace Transforms and Differential Equations

149

4.11 Laplace Transforms and Differential Equations Suppose we wish to solve the differential equation P (d /ds ) f (s ) = j (s ). By writing f (s ) and j (s ) as Laplace transforms f ( s) =

²∞ 0

e−st F (t ) dt

j (s ) =

and

²

(4.147)

∞ 0

e −st J (t ) dt

(4.148)

and using the formula (4.142) for the n th derivative of a Laplace transform, we see that the differential equation (4.147) amounts to

²∞

P (d /ds ) f (s ) =

0

e −st P (−t ) F (t ) dt

=

²



0

e −st J (t ) dt

(4.149)

which is equivalent to the algebraic equation F (t ) =

J (t ) . P ( −t )

(4.150)

A particular solution to the inhomogeneous differential equation (4.147) is then the Laplace transform of this ratio f (s ) =

²

∞ 0

e −st

J (t ) dt . P (−t )

(4.151)

A fairly general solution of the associated homogeneous equation P (d /ds ) f (s ) = 0 is the Laplace transform f ( s) = because P ( d /ds ) f (s ) =

²∞ 0

²∞ 0

(4.152)

e−st δ( P (−t )) H ( t ) dt

e−st P ( −t ) δ( P (−t )) H (t ) dt

(4.153)

=0

(4.154)

as long as the function H (t ) is suitably smooth but otherwise arbitrary. Thus our solution of the inhomogeneous equation (4.147) is the sum of the two f (s ) =

²

∞ 0

J (t ) e −st dt + P (−t )

²

∞ 0

e −st δ( P (−t )) H (t ) dt .

(4.155)

One may generalize this method to differential equations in n variables. But to carry out this procedure, one must be able to find the inverse Laplace transform J (t ) of the source function j (s ) as outlined in the next section.

150

4 Fourier and Laplace Transforms

4.12 Inversion of Laplace Transforms How do we invert the Laplace transform f (s ) =

²∞ 0

dt e−st F (t )?

First we extend the Laplace transform from real s to s f (s

+ iu ) =

²

∞ 0

(4.156)

+ iu

dt e−( s +iu )t F (t )

(4.157)

and choose s to be sufficiently positive that f (s + iu) is suitably smooth and bounded. Then we apply the delta-function formula (4.36) to the integral

² ∞ du

−∞ 2π

e

iut

f (s

+ iu ) = = =

² ∞ du ²

2π ²−∞ ∞

∞ 0

µ dt µ eiut e −(s +iu )t F (t µ )

²

∞ du µ eiu (t −t ) −∞ 2π ²0 ∞ µ dt µ e −st F (t µ ) δ( t − t µ ) = e−st F (t ). µ dt µ e −st F (t µ )

(4.158)

0

So our inversion formula is F(t ) = e

st

² ∞ du −∞ 2π

e iut f ( s + iu )

(4.159)

for sufficiently large s. Some call this inversion formula a Bromwich integral, others a Fourier–Mellin integral. 4.13 Application to Differential Equations Let us consider a linear partial differential equation in n variables P (∂1 , . . . , ∂n ) f ( x 1, . . . , x n ) = g ( x1 , . . . , x n )

(4.160)

in which P is a polynomial in the derivatives ∂j

≡ ∂∂x

j

(4.161)

with constant coefficients. If g = 0, the equation is homogeneous; otherwise it is inhomogeneous. We expand the solution and source as integral transforms f ( x 1, . . . , x n ) = g ( x 1, . . . , x n ) =

²

²

f˜(k 1, . . . , k n ) ei (k1 x 1+¸¸¸+kn xn ) d n k g˜( k1 , . . . , kn ) e i ( k1 x1+¸¸¸+kn xn ) d n k

(4.162)

4.13 Application to Differential Equations

151

in which the k integrals may run from −∞ to ∞ as in a Fourier transform or up the imaginary axis from 0 to ∞ as in a Laplace transform. The correspondence (4.58) between differentiation with respect to x j and multiplication by ik j tells us that ∂ m j acting on f gives m ∂j

f (x 1 , . . . , x n ) =

²

f˜(k 1, . . . , kn ) (ik j )m e i ( k1 x1 +¸¸¸+k n xn ) d n k .

(4.163)

If we abbreviate f (x 1 , . . . , x n ) by f ( x ) and do the same for g, then we may write our partial differential equation (4.160) as P (∂1, . . . , ∂n ) f ( x ) =

²

²

f˜(k ) P (ik1 , . . . , ikn ) ei (k1 x1 +¸¸¸+k n xn ) d n k

= g˜ (k ) e

i (k 1 x 1 +¸¸¸+kn x n )

n

(4.164)

d k.

Thus the inhomogeneous partial differential equation P (∂1 , . . . , ∂n ) fi ( x 1, . . . , xn ) = g ( x 1, . . . , xn )

(4.165)

becomes an algebraic equation in k-space P (ik1 , . . . , ikn ) f˜i (k 1 , . . . , k n ) = g˜ (k 1, . . . , k n )

(4.166)

where g˜ (k 1 , . . . , k n ) is the mixed Fourier–Laplace transform of g ( x 1, . . . , x n ). So one solution of the inhomogeneous differential equation (4.160) is fi (x 1 , . . . , x n ) =

²

e i (k 1x 1+¸¸¸+kn x n)

g˜( k1 , . . . , k n ) n d k. P (ik1 , . . . , ikn )

(4.167)

The space of solutions to the homogeneous form of equation (4.160) P (∂1 , . . . , ∂n ) f h (x 1 , . . . , x n ) = 0

(4.168)

is vast. We will focus on those that satisfy the algebraic equation P (ik1 , . . . , ikn ) f˜h (k1 , . . . , kn ) = 0

(4.169)

and that we can write in terms of Dirac’s delta function as f˜h (k 1, . . . , k n ) = δ( P (ik1, . . . , ikn )) h ( k1 , . . . , k n )

(4.170)

in which the function h (k ) is arbitrary. That is f h ( x) =

²

e i (k 1x 1+¸¸¸+kn x n ) δ( P (ik1 , . . . , ikn )) h (k ) d n k .

(4.171)

152

4 Fourier and Laplace Transforms

Our solution to the differential equation (4.160) then is a sum of a particular solution (4.167) of the inhomogeneous equation (4.166) and our solution (4.171) of the associated homogeneous equation (4.168) f (x 1 , . . . , x n ) =

²

i (k1 x1 +¸¸¸+k n xn )

e

À g˜ (k , . . . , k ) 1 n

P (ik1, . . . , ikn )

Á + δ(P (ik1 , . . . , ikn )) h(k1, . . . , kn ) d n k

(4.172)

in which h (k 1 , . . . , k n ) is an arbitrary function. The wave equation and the diffusion equation will provide examples of this formula f ( x) =

²

e

ik ¸x

À g˜(k)

P (ik )

Á

+ δ( P (ik))h(k )

dn k .

(4.173)

Example 4.14(Wave equation for a scalar field) A free scalar field φ (x ) of mass m in flat spacetime obeys the wave equation

¹ 2 2 2º ∇ − ∂t − m φ ( x ) = 0

(4.174)

in natural units (± = c = 1). We may use a 4-dimensional Fourier transform to represent the field φ (x ) as φ (x )

=

²

˜ k) eik ¸ x φ(

d4 k (2π ) 2

(4.175)

in which k ¸ x = k · x − k 0t is the Lorentz-invariant inner product. The homogeneous wave equation (4.174) then says

²¹ ¹ 2 2 2º º 4 ˜ k) d k2 = 0 ∇ − ∂t − m φ (x ) = − k2 + (k0 )2 − m 2 eik¸ x φ( (2π )

(4.176)

which implies the algebraic equation

¹ 2 0 2 2º ˜ k) = 0 − k + (k ) − m φ(

(4.177)

an instance of (4.169). Our solution (4.171) is φ (x )

=

² ¹ º ik¸x d 4k 2 0 2 2 δ − k + (k ) − m e h(k ) (2 π )2

(4.178)

in which h (k) is an arbitrary function. The argument of the delta function

´

Â

µ´

− k −m = k − k + ½ has zeros at k0 = ± k2 + m 2 ≡ ±ωk with ³³ d P (±ω ) ³³ k ³ ³³ ³ = 2 ωk . 0 P (ik ) =

0 2 (k )

2

2

0

dk

2

m2

k

0

+

Â

k

2

+

m2

µ

(4.179)

(4.180)

4.13 Application to Differential Equations

153

So using our formula (4.47) for integrals involving delta functions of functions, we have φ(x)

=

²·

ei ( k· x−ωk t ) h+ (k) + ei ( k·x +ωk t ) h− (k)

¸

d 3k (2π ) 22 ωk

(4.181)

where h ±( k) ≡ h(±ωk , k). Since ωk is an even function of k, we can write φ (x )

=

²·

ei ( k·x−ωk t ) h + (k) + e−i ( k· x−ωk t ) h− (−k)

¸

d3k . (2 π )2 2ωk

(4.182)

If φ ( x ) = φ (x , t ) is a real-valued classical field, then its Fourier transform h(k) must obey the relation (4.25) which says that h− (−k) = h + (k)∗. If φ is a hermitian quantum field, then h− (−k) = h†+ (k). In terms of the annihilation operator √ a( k) ≡ h+ (k)/ 4π ωk and its adjoint a †(k), a creation operator, the field φ ( x ) is the integral φ(x)

=

²·

ei ( k·x −ωk t ) a(k) + e −i ( k·x− ωk t ) a †(k)

¸

d3 k

½

(2π ) 32 ωk

.

(4.183)

The momentum π canonically conjugate to the field is its time derivative π( x )

= −i

²·

e

i ( k· x−ω k t )

¸¶

a(k) − e−i ( k·x −ωk t ) a†( k)

ωk

2(2π ) 3

d 3k .

(4.184)

If the operators a and a† obey the commutation relations

[ a(k), a †(k² )] = δ(k − k²)

[a( k), a(k² )] = [a† (k), a †(k² )] = 0

and

(4.185)

then the field φ (x , t ) and its conjugate momentum π( y, t ) satisfy (Exercise 4.17) the equal-time commutation relations

[φ ( x, t ), π( y, t )] = i δ( x − y)

and

[φ ( x, t ), φ ( y, t )] = [π( x, t ), π( y, t )] = 0

(4.186)

which generalize the commutation relations of quantum mechanics

[q j , p ] = i ±δ j ±



and

[q j , q ] = [ p j , p ] = 0 ±

±

(4.187)

for a set of coordinates q j and conjugate momenta p± . Example 4.15 (Fourier series for a scalar field) For a field defined in a cube of volume V = L 3 , one often imposes periodic boundary conditions (Section 3.14) in which a displacement of any spatial coordinate by ±L does not change the value of the field. A Fourier series can represent a periodic field. Using the relationship (4.99) between Fourier-transform and Fourier-series representations in 3 dimensions, we expect the Fourier series representation for the field (4.183) to be

154

4 Fourier and Laplace Transforms

φ (x )

= (2Vπ ) =

± k

±

3



k

½

(2π ) 32 ωk

Ã

1

·

1

(2 π )3

+ a† (k)e−i k·x −

ωk t )

(

·

a( k)ei (k· x−ωk t ) + a †(k)e−i ( k· x−ωk t )

V

2ωk V

a(k)ei ( k· x−ωk t )

¸

¸

(4.188)

in which the sum over k = (2π/ L )(±, n, m ) is over all (positive and negative) integers ±, n, and m. One can set ak and write the field as φ (x )

=

± k





Ã

(2π ) 3

V

a( k)

(4.189)

·

1 2ωk V

ak ei (k ·x−ωk t ) + ak† e−i ( k·x −ωk t )

¸

.

(4.190)

The commutator of Fourier-series annihilation and creation operators is by (4.36, 4.185, and 4.189) 3

3

[ak , a†k² ] = (2Vπ ) [a(k), a† (k²)] = (2Vπ ) δ(k − k² ) 3 ² 3 3 = (2Vπ ) ei k− k² ¸ x (d2πx)3 = (2Vπ ) (2Vπ )3 δk k² = δk k² (

)

,

(4.191)

,

in which the Kronecker delta δ k,k² is δ±,±µ δn ,n µ δm ,m µ . Example 4.16(Diffusion) The flow rate J (per unit area, per unit time) of a fixed number of randomly moving particles, such as molecules of a gas or a liquid, is proportional to the negative gradient of their density ρ ( x, t ) J ( x, t ) = − D ∇ ρ ( x , t )

(4.192)

where D is the diffusion constant, an equation known as Fick’s law (Adolf Fick 1829–1901). Since the number of particles is conserved, the 4-vector J = (ρ , J ) obeys the conservation law

²

∂ ∂t

ρ (x , t ) d

3

x

=−

Ä

J (x , t ) ¸ da = −

²

∇ ¸ J (x , t )d 3 x

which with Fick’s law (4.192) gives the diffusion equation

˙

ρ(x , t )

= −∇ ¸ J ( x, t ) = D ∇ 2ρ ( x, t )

or

¹

D ∇ 2 − ∂t

º

ρ (x , t )

(4.193)

= 0.

(4.194)

Fourier had in mind such equations when he invented his transform. If we write the density ρ ( x, t ) as the transform ρ ( x, t )

=

²

ei k¸ x +i ωt ρ( ˜ k, ω) d3 kdω

(4.195)

4.13 Application to Differential Equations then the diffusion equation becomes

¹

D∇

2

− ∂t

º

ρ ( x, t )

=

²

¹ 2 º − D k − i ω ρ(˜ k, ω) d 3 k d ω = 0

ei k¸ x+i ωt

which implies the algebraic equation

¹

155

D k2 + i ω

º

˜

ρ(k , ω)

= 0.

(4.196)

(4.197)

Our solution (4.171) of this homogeneous equation is ρ (x , t )

=

²

ei k¸ x+i ωt

δ

¹

− D k2 − i ω

º

h (k, ω) d 3 k dω

(4.198)

in which h (k, ω) is an arbitrary function. Dirac’s delta function requires ω to be imaginary ω = i Dk2 , with Dk2 > 0. So the ω-integration is up the imaginary axis. It is a Laplace transform, and we have ρ ( x, t )

=

²∞

−∞

ei k¸x − Dk

in which ρ( ˜ k) ≡ h(k, i Dk2). Thus the function initial density ρ ( x , 0) ρ ( x, 0)

=

²∞

−∞

2t

˜

ρ(k ) d

˜

ρ(k )

3

k

(4.199)

is the Fourier transform of the

ei k¸ x ρ( ˜ k) d 3 k.

(4.200)

So if the initial density ρ (x , 0) is concentrated at y ρ ( x , 0)

= δ( x − y) =

²∞

−∞

then its Fourier transform ρ( ˜ k) is

˜

ρ(k )

ei k¸( x− y)

d3 k (2 π )3

(4.201)

−i k¸ y

= e(2π )3

(4.202)

and at later times the density ρ ( x , t ) is given by (4.199) as ρ (x , t )

=

²∞

−∞

ei k¸( x − y )− Dk

2t

d3k . (2π )3

(4.203)

Using our formula (4.19) for the Fourier transform of a gaussian, we find ρ (x , t )

= (4π Dt1 )3 2 e− x− y (

) 2 /(4Dt )

/

(4.204)

.

Since the diffusion equation is linear, it follows (Exercise 4.18) that an arbitrary initial distribution ρ ( y, 0) evolves to the convolution (Section 4.6) ρ (x , t )

=

1 (4 π Dt )3/ 2

²

e−( x− y)

2 /(4 Dt )

ρ ( y, 0) d

3

y.

(4.205)

156

4 Fourier and Laplace Transforms

Exercises

4.1 Show that the Fourier integral formula (4.26) for real functions follows from (4.9) and (4.25). 4.2 Show that the Fourier integral formula (4.26) for real functions implies (4.27) if f is even and (4.28) if it is odd. 4.3 Derive the formula (4.30) for the square wave (4.29). 4.4 By using the Fourier-transform formulas (4.27 and 4.28), derive the formulas (4.31) and (4.32) for the even and odd extensions of the exponential exp(−β | x |) . 4.5 For the state |ψ, t ´ given by Eqs. (4.83 and 4.88), find the wave function ψ ( x , t ) = ³ x |ψ, t ´ at time t. Then find the variance of the position operator at that time. Does it grow as time goes by? How? 4.6 At time t = 0, a particle of mass m is in a gaussian superposition of momentum eigenstates centered at p = ± K ψ (x , 0 )

=N

²

∞ −∞

eikx e − L

2 (k

−K )2 dk .

(4.206)

(a) Shift k by K and do the integral. Where is the particle most likely to be found? (b) At time t , the wave function ψ (x , t ) is ψ (x , 0 ) but with ikx replaced by ikx − i ±k 2 t /2m. Shift k by K and do the integral. Where is the particle√most likely to be found? (c) Does the wave packet spread out like t or like t as in classical diffusion? 4.7 Express the characteristic function (4.90) of a probability distribution P ( x ) as its Fourier transform. 4.8 Express the characteristic function (4.90) of a probability distribution as a power series in its moments (4.92). 4.9 Find the characteristic function (4.90) of the gaussian probability distribution PG (x , µ, σ )

1

= √ σ



´ (x − µ)2 µ . exp − 2σ 2

(4.207)

4.10 Find the moments µn = E [x n ] for n = 0 , . . . , 3 of the gaussian probability distribution PG ( x , µ, σ ). 4.11 Derive (4.115) from B = ∇ × A and Ampère’s law ∇ × B = µ0 J. 4.12 Derive (4.116) from (4.115). 4.13 Derive (4.117) from (4.116). 4.14 Use the Green’s function relations (4.110) and (4.111) to show that (4.117) satisfies (4.115).

Exercises

157

4.15 Show that the Laplace transform of t z−1 is the gamma function (5.58) divided by s z f (s ) =

²



0

e −st t z −1 dt



= s −z ´(z).

(4.208)

4.16 Compute the Laplace transform of 1/ t. Hint: let t = u 2 . 4.17 Show that the commutation relations (4.185) of the annihilation and creation operators imply the equal-time commutation relations (4.186) for the field φ and its conjugate momentum π . 4.18 Use the linearity of the diffusion equation and Equations (4.201–4.204) to derive the general solution (4.205) of the diffusion equation.

5 Infinite Series

5.1 Convergence A sequence of partial sums SN

=

converges to a number S if for every ±

| S − SN | < ±

N ± n =0

>

cn

(5.1)

0, there exists an integer N (±) such that

for all

N

>

N (±).

(5.2)

The number S is then said to be the limit of the convergent infinite series S

=

∞ ± n=0

cn

= Nlim S = lim →∞ N N →∞

N ± n=0

cn .

(5.3)

Some series converge; others wander or oscillate; and others diverge. A series whose absolute values converges S

=

∞ ± n=0

|cn |

(5.4)

is said to converge absolutely. A convergent series that is not absolutely convergent is said to converge conditionally. Example 5.1(Two infinite series) number e = 2.718281828 . . .

The series of inverse factorials converges to the

∞ 1 ± = e. n! n =0

(5.5)

But the harmonic series of inverse integers diverges

∞ 1 ± k =1

158

k

→∞

(5.6)

5.2 Tests of Convergence

159

as one may see by grouping its terms 1+

1 + 2

²1

³ ²1 1 1 1³ 1 1 1 1 + + + + + + · · · ≥ 1 + + + + ··· 3 4 5 6 7 8 2 2 2

(5.7)

to form a series that obviously diverges. This series up to 1/ n approaches the natural logarithm ln n to within a constant γ

= nlim →∞

´± n 1 k =1

k

µ

− ln n = 0.5772156649 . . .

(5.8)

known as the Euler–Mascheroni constant (Leonhard Euler 1707–1783, Lorenzo Mascheroni 1750–1800). Example 5.2(Geometric series) (1

For any positive integer N , the identity

− z )(1 + z + z 2 + · · · + z N ) = 1 − z N +1

implies that S N ( z) = For | z| < 1, the term | z N +1| converges to



N ± n=0

zn

N +1

.

(5.10)

→ ∞, and so the geometric series SN (z)

0 as N

S ( z) =

= 1 −1 −z z

(5.9)

∞ ± n=0

zn

= 1 −1 z

(5.11)

as long as the absolute-value of z is less than unity. A useful approximation for

|z| ± 1 is

1 1²z

≈ 1 ∓ z.

(5.12)

5.2 Tests of Convergence The Cauchy criterionfor the convergence of a sequence S N is that for every ± there is an integer N (±) such that for N > N (±) and M > N (±) one has

|SN − SM | < ±.

>

0

(5.13)

Cauchy’s criterion is equivalent to the defining condition (5.2). The comparison test: Suppose the convergent series

∞ ± n=0

bn

(5.14)

160

has only positive terms b n

5 Infinite Series

≥ 0, and that |cn | ≤ bn for all n. Then the series ∞ ± n=0

cn

(5.15)

converges absolutely. Similarly, if for all n, the inequality 0 ≤ c n ≤ b n holds and the series of numbers cn diverges, then so does the series of numbers bn . The Cauchy root test : If for some N , the terms cn satisfy

|cn |1 n ≤ x < 1 (5.16) for all n > N , then for such n we have |cn | ≤ x n and therefore since the geometric series (5.11) converges for | z | < 1, we know that ∞ ± 1 xn = . (5.17) 1 − x n=0 /

Thus by the comparison test (5.14 and 5.15), the series

∞ ± n=0

cn

(5.18)

converges absolutely. ∑ The ratio test of d’Alembert : The series n cn converges if

¶¶ c ¶¶ n +1 ¶ =r 1. The integral test: If the terms cn are positive and monotonically decreasing, 0 ≤ c n+1 ≤ cn , and f ( x ) is a monotonically decreasing function with f (n ) = cn , then c n+1



·

n+1

n

f (x ) d x

≤ cn ,

(5.20)

and so by the comparison test (5.14 and 5.15), the series converges (diverges) according to whether the integral

·∞ N

f (x ) d x

(5.21)

converges (diverges) for suitably large but fixed N . The sum test: One can use Fortran, C, Python, Sage, Matlab, or Mathematica to sum the first N terms of one’s series for N = 100, N = 10, 000, N = 1, 000, 000, and so forth as seems appropriate. For simple series, Mathematica’s Sum and

5.3 Convergent Series of Functions

161

NSum commands work, as in sumZeta.nb. For more complicated series, it may be easier to write one’s own code. For instance, the Fortran program doSum.f95 sums sin (sin(n ))/(n log(n )) from 2 to 109 and gets 0.4333495. The codes sumZeta.nb and doSum.f95 as well as the Matlab scripts of this chapter are in the GitHub repository Infinite_series at github.com/kevinecahill. The sum test has the advantage that one gets partial sums of the series, but if the series converges or diverges slowly, the partial sums may be tricky to interpret. 5.3 Convergent Series of Functions A sequence of partial sums S N (z ) =

N ± n=0

fn ( z )

(5.22)

of functions fn ( z ) converges to a function S (z ) on a set D if for every ± every z ∈ D, there exists an integer N (±, z ) such that

|S( z) − SN ( z)| < ±

for all

N

>

N (±, z ).

>

0 and (5.23)

The numbers z may be real or complex. The function S (z ) is said to be the limit on D of the convergent infinite series of functions S( z) =

∞ ± n=0

fn ( z ).

(5.24)

A sequence of partial sums S N (z ) of functions converges uniformly on the set D if the integers N (±, z ) can be chosen independently of the point z ∈ D, that is, if for every ± > 0 and every z ∈ D, there exists an integer N (±) such that

| S( z) − SN ( z)| < ±

for all

N

>

N (±).

(5.25)
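Partial sums like S_N are easy to inspect numerically, in the spirit of the sum test of Section 5.2. The following minimal Python sketch (my own illustration, not code from the text) sums the first N terms of ∑ 1/n², whose limit is ζ(2) = π²/6:

```python
import math

def partial_sum(terms, N):
    """Sum the first N terms c_1, ..., c_N of a series."""
    return sum(terms(n) for n in range(1, N + 1))

# Sum test: examine partial sums of sum 1/n^2 at increasing N.
for N in (100, 10_000, 1_000_000):
    print(N, partial_sum(lambda n: 1.0 / n**2, N))

# The partial sums creep up toward zeta(2) = pi^2/6.
print(math.pi**2 / 6)
```

Because this series converges like 1/N, the partial sums approach the limit only slowly, which is exactly the caveat the sum test carries.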

The limit (3.65) of the integral over a closed interval a ≤ x ≤ b of a uniformly convergent sequence of partial sums S_N(x) of continuous functions is equal to the integral of the limit
$$\lim_{N\to\infty}\int_a^b S_N(x)\,dx = \int_a^b S(x)\,dx. \tag{5.26}$$
A real or complex-valued function f(x) of a real variable x is square integrable on an interval [a, b] if the Riemann or Lebesgue (3.81) integral
$$\int_a^b |f(x)|^2\,dx \tag{5.27}$$
exists and is finite. A sequence of partial sums
$$S_N(x) = \sum_{n=0}^{N} f_n(x) \tag{5.28}$$
of square-integrable functions f_n(x) converges in the mean to a function S(x) if
$$\lim_{N\to\infty}\int_a^b |S(x) - S_N(x)|^2\,dx = 0. \tag{5.29}$$
Convergence in the mean sometimes is defined as
$$\lim_{N\to\infty}\int_a^b \rho(x)\,|S(x) - S_N(x)|^2\,dx = 0 \tag{5.30}$$
in which ρ(x) ≥ 0 is a weight function that is positive except at isolated points where it may vanish. If the functions f_n are real, then this definition of convergence in the mean is more simply
$$\lim_{N\to\infty}\int_a^b \rho(x)\,\left(S(x) - S_N(x)\right)^2\,dx = 0. \tag{5.31}$$
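Convergence in the mean is also easy to watch numerically. A small sketch (my own illustration, using a crude midpoint rule) takes the geometric series on [−1/2, 1/2], where S(x) = 1/(1−x), and shows the integrated squared error falling toward zero:

```python
def mean_square_error(N, points=1000):
    """Midpoint-rule estimate of the integral of |S(x) - S_N(x)|^2
    on [-1/2, 1/2], where S(x) = 1/(1-x) and S_N is the geometric
    partial sum 1 + x + ... + x^N."""
    a, b = -0.5, 0.5
    h = (b - a) / points
    total = 0.0
    for k in range(points):
        x = a + (k + 0.5) * h
        S = 1.0 / (1.0 - x)
        S_N = sum(x**n for n in range(N + 1))
        total += (S - S_N)**2 * h
    return total

# The integrated squared error shrinks rapidly as N grows.
for N in (2, 5, 10, 20):
    print(N, mean_square_error(N))
```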

5.4 Power Series

A power series is a series of functions with f_n(z) = c_n z^n
$$S(z) = \sum_{n=0}^{\infty} c_n z^{n}. \tag{5.32}$$
By the ratio test (5.19), this power series converges if
$$\lim_{n\to\infty}\left|\frac{c_{n+1}z^{n+1}}{c_n z^{n}}\right| = |z|\lim_{n\to\infty}\left|\frac{c_{n+1}}{c_n}\right| \equiv |z|\,R^{-1} < 1 \tag{5.33}$$
that is, for |z| < R, where R is the radius of convergence of the series.

5.5 Factorials and the Gamma Function

The factorial integral
$$z! = \int_0^\infty e^{-t}\,t^{z}\,dt \tag{5.56}$$
converges for Re z > −1. In particular
$$0! = \int_0^\infty e^{-t}\,dt = 1 \tag{5.57}$$

which explains the definition (5.40). The factorial function (z − 1)! in turn defines the gamma function for Re z > 0 as
$$\Gamma(z) = \int_0^\infty e^{-t}\,t^{z-1}\,dt = (z-1)! \tag{5.58}$$
or equivalently as
$$z! = \Gamma(z+1). \tag{5.59}$$
By differentiating the definition (5.58) of the gamma function and integrating by parts, we see that the gamma function satisfies the key identity
$$\Gamma(z+1) = \int_0^\infty e^{-t}\,t^{z}\,dt = \int_0^\infty \left(-\frac{d}{dt}\,e^{-t}\right)t^{z}\,dt = \int_0^\infty e^{-t}\,\frac{d}{dt}\,t^{z}\,dt = z\int_0^\infty e^{-t}\,t^{z-1}\,dt = z\,\Gamma(z). \tag{5.60}$$
Iterating, we get
$$\Gamma(z+n) = (z+n-1)\,\Gamma(z+n-1) = (z+n-1)(z+n-2)\cdots z\,\Gamma(z). \tag{5.61}$$
Thus for any complex z and any integers n and m with n > m, ratios of factorials work as one would think
$$\frac{\Gamma(z+n)}{\Gamma(z+m)} = \frac{(z+n-1)!}{(z+m-1)!} = (z+n-1)(z+n-2)\cdots(z+m) \tag{5.62}$$
as long as z + m avoids the negative integers. We can use the identities (5.60 and 5.61) to extend the definition (5.58) of the gamma function in unit steps into the left half-plane
$$\Gamma(z) = \frac{1}{z}\,\Gamma(z+1) = \frac{1}{z}\,\frac{1}{z+1}\,\Gamma(z+2) = \frac{1}{z}\,\frac{1}{z+1}\,\frac{1}{z+2}\,\Gamma(z+3) = \cdots \tag{5.63}$$
as long as we avoid the negative integers and zero. This extension leads to Euler's definition
$$\Gamma(z) = \lim_{n\to\infty}\frac{1\cdot 2\cdot 3\cdots n}{z(z+1)(z+2)\cdots(z+n)}\,n^{z} \tag{5.64}$$
and to Weierstrass's (Exercise 5.6)
$$\Gamma(z) = \frac{1}{z}\,e^{-\gamma z}\prod_{n=1}^{\infty}\left[\left(1+\frac{z}{n}\right)e^{-z/n}\right]^{-1} \tag{5.65}$$
(Karl Theodor Wilhelm Weierstrass, 1815–1897), and is an example of analytic continuation (Section 6.12). One may show (Exercise 5.8) that another formula for Γ(z) is

$$\Gamma(z) = 2\int_0^\infty e^{-t^{2}}\,t^{2z-1}\,dt \tag{5.66}$$
for Re z > 0 and that
$$\Gamma\!\left(n+\tfrac{1}{2}\right) = \frac{(2n)!}{n!\,2^{2n}}\,\sqrt{\pi} \tag{5.67}$$
which implies (Exercise 5.11) that
$$\Gamma\!\left(n+\tfrac{1}{2}\right) = \frac{(2n-1)!!}{2^{n}}\,\sqrt{\pi} \tag{5.68}$$
and in particular that Γ(1/2) = √π.
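These half-integer values are easy to verify with Python's math.gamma. A quick check of (5.67) and (5.68) (my own snippet, not from the text):

```python
import math

def double_factorial(n):
    """(2n-1)!! = 1 * 3 * 5 * ... * (2n-1); equals 1 when n = 0."""
    result = 1
    for k in range(1, 2 * n, 2):
        result *= k
    return result

# Gamma(1/2) = sqrt(pi)
assert abs(math.gamma(0.5) - math.sqrt(math.pi)) < 1e-12

# Gamma(n + 1/2) = (2n)! sqrt(pi) / (n! 2^(2n)) = (2n-1)!! sqrt(pi) / 2^n
for n in range(1, 8):
    g = math.gamma(n + 0.5)
    f1 = math.factorial(2 * n) * math.sqrt(math.pi) / (math.factorial(n) * 2**(2 * n))
    f2 = double_factorial(n) * math.sqrt(math.pi) / 2**n
    assert abs(g - f1) < 1e-9 * g and abs(g - f2) < 1e-9 * g

print("formulas (5.67) and (5.68) agree with math.gamma")
```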

Example 5.8 (Bessel function of nonintegral index). We can use the gamma-function formula (5.58) for n! to extend the definition (5.52) of the Bessel function of the first kind J_n(ρ) to nonintegral values ν of the index n. Replacing n by ν and (m+n)! by Γ(m+ν+1), we get
$$J_\nu(\rho) = \left(\frac{\rho}{2}\right)^{\nu}\sum_{m=0}^{\infty}\frac{(-1)^{m}}{m!\,\Gamma(m+\nu+1)}\left(\frac{\rho}{2}\right)^{2m} \tag{5.69}$$
which makes sense even for complex values of ν.

Example 5.9 (Spherical Bessel function). The spherical Bessel function is defined as
$$j_\ell(\rho) \equiv \sqrt{\frac{\pi}{2\rho}}\;J_{\ell+1/2}(\rho). \tag{5.70}$$
For small values of its argument |ρ| ≪ 1, the first term in the series (5.69) dominates, and so (Exercise 5.7)
$$j_\ell(\rho) \approx \frac{\sqrt{\pi}}{2}\left(\frac{\rho}{2}\right)^{\ell}\frac{1}{\Gamma(\ell+3/2)} = \frac{\ell!\,(2\rho)^{\ell}}{(2\ell+1)!} = \frac{\rho^{\ell}}{(2\ell+1)!!} \tag{5.71}$$
as one may show by repeatedly using the key identity Γ(z+1) = zΓ(z).

Example 5.10 (Numerical gamma functions). The function gamma(x) gives Γ(x) in Fortran, C, Matlab, and Python. As n increases past 100, n! = Γ(n+1) quickly becomes too big for computers to handle, and it becomes essential to work with the logarithm of the gamma function. The Fortran function log_gamma(x), the C function lgamma(x), the Matlab function gammaln(x), and the Python function loggamma(x) all give log(Γ(x)) = log((x−1)!) for real x.
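In Python, for instance, the standard-library function math.lgamma keeps large factorials manageable (a small illustration of the point made in Example 5.10, my own snippet):

```python
import math

# 1000! overflows a double, but its logarithm is a modest number.
n = 1000
log_factorial = math.lgamma(n + 1)   # log(n!) = log(Gamma(n+1))
print(log_factorial)                  # about 5912.128

# Ratios of huge factorials are best formed in log space:
# 1000! / 998! = 1000 * 999 = 999000
ratio = math.exp(math.lgamma(1001) - math.lgamma(999))
print(ratio)
```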

5.6 Euler's Beta Function

The beta function is the ratio
$$B(x,y) = \frac{\Gamma(x)\,\Gamma(y)}{\Gamma(x+y)} \tag{5.72}$$
which is symmetric, B(x,y) = B(y,x). We can get an explicit formula for it by using the formula (5.66) for the gamma function (5.58) in the product
$$\Gamma(x)\,\Gamma(y) = 4\int_0^{\infty}e^{-u^{2}}u^{2x-1}\,du\int_0^{\infty}e^{-v^{2}}v^{2y-1}\,dv \tag{5.73}$$
and switching to polar coordinates u = t cos θ, v = t sin θ, and t² = u² + v²
$$\Gamma(x)\,\Gamma(y) = 2\int_0^{\infty}e^{-t^{2}}t^{2(x+y)-1}\,dt\;\,2\int_0^{\pi/2}\cos^{2x-1}\theta\,\sin^{2y-1}\theta\,d\theta = \Gamma(x+y)\,B(x,y). \tag{5.74}$$
Thus the beta function is the integral
$$B(x,y) = 2\int_0^{\pi/2}\cos^{2x-1}\theta\,\sin^{2y-1}\theta\,d\theta. \tag{5.75}$$
Setting t = cos²θ, we find
$$B(x,y) = \int_0^{1}t^{x-1}(1-t)^{y-1}\,dt. \tag{5.76}$$
The ratio
$$B(z_1,z_2,\ldots,z_n) = \frac{\Gamma(z_1)\,\Gamma(z_2)\cdots\Gamma(z_n)}{\Gamma(z_1+z_2+\cdots+z_n)} \tag{5.77}$$
naturally generalizes the beta function (5.72).

5.7 Taylor Series

If f(x) is a real-valued function of a real variable x with a continuous Nth derivative, then Taylor's expansion for it is

$$f(x+a) = f(x) + a\,f'(x) + \frac{a^{2}}{2}\,f''(x) + \cdots + \frac{a^{N-1}}{(N-1)!}\,f^{(N-1)}(x) + E_N = \sum_{n=0}^{N-1}\frac{a^{n}}{n!}\,f^{(n)}(x) + E_N \tag{5.78}$$
in which the error E_N is
$$E_N = \frac{a^{N}}{N!}\,f^{(N)}(x+y) \tag{5.79}$$
for some 0 ≤ y ≤ a. For many functions f(x) the errors go to zero, E_N → 0, as N → ∞; for these functions, the infinite Taylor series converges:
$$f(x+a) = \sum_{n=0}^{\infty}\frac{a^{n}}{n!}\,f^{(n)}(x) = \exp\!\left(a\,\frac{d}{dx}\right)f(x). \tag{5.80}$$
In quantum mechanics, the momentum operator p acts from the right on ⟨x|
$$\langle x|\,p = \frac{\hbar}{i}\,\frac{d}{dx}\,\langle x| \tag{5.81}$$
and generates translations in space
$$\psi(x+a) = \langle x+a|\psi\rangle = \langle x|\,e^{iap/\hbar}\,|\psi\rangle = \exp\!\left(a\,\frac{d}{dx}\right)\langle x|\psi\rangle. \tag{5.82}$$
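The translation formula (5.80) can be tried out numerically for f = sin, whose nth derivative at x is simply sin(x + nπ/2). The sketch below is my own check, not code from the text:

```python
import math

def translate(f_derivs, x, a, N):
    """Partial Taylor sum sum_{n<N} a^n/n! f^(n)(x), approximating f(x+a)."""
    return sum(a**n / math.factorial(n) * f_derivs(n, x) for n in range(N))

# For f = sin, the nth derivative is sin(x + n*pi/2).
sin_derivs = lambda n, x: math.sin(x + n * math.pi / 2)

x, a = 0.3, 1.1
approx = translate(sin_derivs, x, a, 20)
print(approx, math.sin(x + a))   # the two agree to machine precision
```

Twenty terms suffice here because a^n/n! dies off superexponentially, which is the statement E_N → 0 of (5.79).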

5.8 Fourier Series as Power Series

The Fourier series (3.46)
$$f(x) = \sum_{n=-\infty}^{\infty} c_n\,\frac{e^{i2\pi nx/L}}{\sqrt{L}} \tag{5.83}$$
with coefficients (3.54)
$$c_n = \int_{-L/2}^{L/2}\frac{e^{-i2\pi nx/L}}{\sqrt{L}}\,f(x)\,dx \tag{5.84}$$
is a pair of power series
$$f(x) = \frac{1}{\sqrt{L}}\left[\sum_{n=0}^{\infty} c_n z^{n} + \sum_{n=1}^{\infty} c_{-n}\,(z^{-1})^{n}\right] \tag{5.85}$$
in the variables
$$z = e^{i2\pi x/L} \quad\text{and}\quad z^{-1} = e^{-i2\pi x/L}. \tag{5.86}$$
Formula (5.35) tells us that the radii of convergence of these two power series are given by
$$R_{+}^{-1} = \lim_{n\to\infty}\frac{|c_{n+1}|}{|c_n|} \quad\text{and}\quad R_{-}^{-1} = \lim_{n\to\infty}\frac{|c_{-n-1}|}{|c_{-n}|}. \tag{5.87}$$
Thus the pair of power series (5.85) will converge uniformly and absolutely as long as z satisfies the two inequalities
$$|z| < R_{+} \quad\text{and}\quad \frac{1}{|z|} < R_{-}. \tag{5.88}$$

5.11 Dirichlet Series and the Zeta Function

For Re z > 1, the Riemann zeta function is the infinite product
$$\zeta(z) = \prod_{p}\frac{1}{1-p^{-z}} \tag{5.106}$$

over all prime numbers p = 2, 3, 5, 7, 11, …

Example 5.14 (Planck's distribution). Max Planck (1858–1947) showed that the electromagnetic energy in a closed cavity of volume V at a temperature T in the frequency interval dν about ν is
$$dU(\beta,\nu,V) = \frac{8\pi h V}{c^{3}}\,\frac{\nu^{3}\,d\nu}{e^{\beta h\nu}-1} \tag{5.107}$$
in which β = 1/(kT), k = 1.3806503 × 10⁻²³ J/K is Boltzmann's constant, and h = 6.626068 × 10⁻³⁴ J s is Planck's constant. The total energy then is the integral
$$U(\beta,V) = \frac{8\pi h V}{c^{3}}\int_0^{\infty}\frac{\nu^{3}\,d\nu}{e^{\beta h\nu}-1} \tag{5.108}$$
which we may do by letting x = βhν and using the geometric series (5.11)
$$U(\beta,V) = \frac{8\pi (kT)^{4}V}{(hc)^{3}}\int_0^{\infty}\frac{x^{3}\,dx}{e^{x}-1} = \frac{8\pi (kT)^{4}V}{(hc)^{3}}\int_0^{\infty}\frac{x^{3}e^{-x}}{1-e^{-x}}\,dx = \frac{8\pi (kT)^{4}V}{(hc)^{3}}\int_0^{\infty}x^{3}e^{-x}\sum_{n=0}^{\infty}e^{-nx}\,dx. \tag{5.109}$$
The geometric series is absolutely and uniformly convergent for x > 0, and we may interchange the limits of summation and integration. Another change of variables and the definition (5.58) of the gamma function give
$$U(\beta,V) = \frac{8\pi (kT)^{4}V}{(hc)^{3}}\sum_{n=0}^{\infty}\int_0^{\infty}x^{3}e^{-(n+1)x}\,dx = \frac{8\pi (kT)^{4}V}{(hc)^{3}}\sum_{n=0}^{\infty}\frac{1}{(n+1)^{4}}\int_0^{\infty}y^{3}e^{-y}\,dy = \frac{8\pi (kT)^{4}V}{(hc)^{3}}\,3!\,\zeta(4) = \frac{8\pi^{5}(kT)^{4}V}{15\,(hc)^{3}}. \tag{5.110}$$
The power radiated by a "black body" is proportional to the fourth power of its temperature and to its area A
$$P = \sigma A T^{4} \tag{5.111}$$
in which
$$\sigma = \frac{2\pi^{5}k^{4}}{15\,h^{3}c^{2}} = 5.670400(40)\times 10^{-8}\ \text{W m}^{-2}\,\text{K}^{-4} \tag{5.112}$$
is Stefan's constant.

The number of photons in the black-body distribution (5.107) at inverse temperature β in the volume V is
$$N(\beta,V) = \frac{8\pi V}{c^{3}}\int_0^{\infty}\frac{\nu^{2}\,d\nu}{e^{\beta h\nu}-1} = \frac{8\pi V}{(c\beta h)^{3}}\int_0^{\infty}\frac{x^{2}\,dx}{e^{x}-1} = \frac{8\pi V}{(c\beta h)^{3}}\sum_{n=0}^{\infty}\int_0^{\infty}x^{2}e^{-(n+1)x}\,dx = \frac{8\pi V}{(c\beta h)^{3}}\sum_{n=0}^{\infty}\frac{1}{(n+1)^{3}}\int_0^{\infty}y^{2}e^{-y}\,dy = \frac{8\pi V}{(c\beta h)^{3}}\,\zeta(3)\,2! = \frac{8\pi (kT)^{3}V}{(ch)^{3}}\,\zeta(3)\,2!. \tag{5.113}$$
The mean energy ⟨E⟩ of a photon in the black-body distribution (5.107) is the energy U(β,V) divided by the number of photons N(β,V)
$$\langle E\rangle = \langle h\nu\rangle = \frac{3!\,\zeta(4)}{2!\,\zeta(3)}\,kT = \frac{\pi^{4}}{30\,\zeta(3)}\,kT \tag{5.114}$$

or ⟨E⟩ ≈ 2.70118 kT since Apéry's constant ζ(3) is 1.2020569032… (Roger Apéry, 1916–1994).

Example 5.15 (Lerch transcendent). The Lerch transcendent is the series
$$\Phi(z,s,\alpha) = \sum_{n=0}^{\infty}\frac{z^{n}}{(n+\alpha)^{s}} \tag{5.115}$$
which converges for |z| < 1 and Re s > 0 and reduces to Riemann's zeta function (5.105) when z = 1 and α = 1, Φ(1, s, 1) = ζ(s).
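The numbers in Example 5.14 are easy to reproduce with crude partial sums of the zeta series (a quick Python check of my own, not code from the text):

```python
import math

def zeta(s, terms=200_000):
    """Crude partial sum of the Riemann zeta function for Re s > 1."""
    return sum(n**(-s) for n in range(1, terms + 1))

# Apery's constant zeta(3) and zeta(4) = pi^4/90
z3, z4 = zeta(3), zeta(4)
print(z3)          # ~ 1.2020569 (Apery's constant)

# Mean photon energy <E> = 3! zeta(4) / (2! zeta(3)) kT = pi^4/(30 zeta(3)) kT
mean_E = 3 * z4 / z3
print(mean_E)      # ~ 2.70118 (in units of kT)
```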

5.12 Bernoulli Numbers and Polynomials

The Bernoulli numbers B_n are defined by the infinite series
$$\frac{x}{e^{x}-1} = \sum_{n=0}^{\infty}\frac{B_n}{n!}\,x^{n} = \sum_{n=0}^{\infty}\frac{x^{n}}{n!}\left[\frac{d^{n}}{dx^{n}}\,\frac{x}{e^{x}-1}\right]_{x=0} \tag{5.116}$$
for the generating function x/(e^x − 1). They are the successive derivatives
$$B_n = \left.\frac{d^{n}}{dx^{n}}\,\frac{x}{e^{x}-1}\right|_{x=0}. \tag{5.117}$$
So B₀ = 1 and B₁ = −1/2. The remaining odd Bernoulli numbers vanish
$$B_{2n+1} = 0 \quad\text{for}\ n > 0 \tag{5.118}$$
and the remaining even ones are given by Euler's zeta function (5.105) as
$$B_{2n} = \frac{(-1)^{n-1}\,2\,(2n)!}{(2\pi)^{2n}}\,\zeta(2n) \quad\text{for}\ n > 0. \tag{5.119}$$
The Bernoulli numbers occur in the power series for many transcendental functions, for instance
$$\coth x = \frac{1}{x} + \sum_{k=1}^{\infty}\frac{2^{2k}\,B_{2k}}{(2k)!}\,x^{2k-1} \quad\text{for}\ x^{2} < \pi^{2}.$$
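Multiplying the generating-function series (5.116) by e^x − 1 and matching powers of x yields the standard recursion ∑_{k=0}^{m} C(m+1, k) B_k = 0 for m ≥ 1, with B₀ = 1. That recursion (a standard consequence, not spelled out in the text) produces the Bernoulli numbers exactly; a short Python sketch with rational arithmetic:

```python
from fractions import Fraction
from math import comb

def bernoulli(nmax):
    """Bernoulli numbers B_0..B_nmax from the recursion
    sum_{k=0}^{m} C(m+1, k) B_k = 0 for m >= 1, with B_0 = 1."""
    B = [Fraction(1)]
    for m in range(1, nmax + 1):
        s = sum(Fraction(comb(m + 1, k)) * B[k] for k in range(m))
        B.append(-s / (m + 1))
    return B

B = bernoulli(8)
print(B)   # [1, -1/2, 1/6, 0, -1/30, 0, 1/42, 0, -1/30]
```

Note that the odd numbers beyond B₁ come out exactly zero, as (5.118) requires.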

… are nearly decoupled from those in the cytosol. Real plasma membranes are phospholipid bilayers. The lipids avoid the water and so are on the inside. The phosphate groups are dipoles (and phosphatidylserine is negatively charged). So a real membrane is a 4 nm thick lipid layer bounded on each side by dipole layers, each about 0.5 nm thick. The net effect is to weakly attract ions that are within 0.5 nm of the membrane.

5.16 Infinite Products

Weierstrass's definition (5.65) of the gamma function, Euler's formula (5.106) for the zeta function, and Mermin's formula (5.45) for n! are useful infinite products. Other examples are the expansions of the trigonometric functions (3.79 and 3.80)
$$\sin z = z\prod_{n=1}^{\infty}\left(1-\frac{z^{2}}{\pi^{2}n^{2}}\right) \quad\text{and}\quad \cos z = \prod_{n=1}^{\infty}\left(1-\frac{z^{2}}{\pi^{2}(n-1/2)^{2}}\right) \tag{5.158}$$
which imply these of the hyperbolic functions
$$\sinh z = z\prod_{n=1}^{\infty}\left(1+\frac{z^{2}}{\pi^{2}n^{2}}\right) \quad\text{and}\quad \cosh z = \prod_{n=1}^{\infty}\left(1+\frac{z^{2}}{\pi^{2}(n-1/2)^{2}}\right). \tag{5.159}$$
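The product formula (5.158) for sin z converges, although slowly; a truncated product already tracks the sine closely. The sketch below is my own numerical check, not code from the text:

```python
import math

def sin_product(z, terms=100_000):
    """Truncated infinite product z * prod_{n<=terms} (1 - z^2/(pi^2 n^2)),
    which converges (slowly) to sin z."""
    p = z
    for n in range(1, terms + 1):
        p *= 1.0 - z * z / (math.pi**2 * n**2)
    return p

z = 1.3
print(sin_product(z), math.sin(z))   # close, with a small truncation error ~ z^2/(pi^2 N)
```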

Exercises

5.1 Test the following series for convergence:
$$\text{(a)}\ \sum_{n=2}^{\infty}\frac{1}{(\ln n)^{2}}, \qquad \text{(b)}\ \sum_{n=1}^{\infty}\frac{n!}{20^{n}}, \qquad \text{(c)}\ \sum_{n=1}^{\infty}\frac{1}{n(n+2)}, \qquad \text{(d)}\ \sum_{n=2}^{\infty}\frac{1}{n\ln n}.$$
In each case, say whether the series converges and how you found out.

5.2 Olbers's paradox: Assume a static universe with a uniform density of stars. With you at the origin, divide space into successive shells of thickness t, and assume that the stars in each shell subtend the same solid angle ω (as follows from the first assumption). Take into account the occulting of distant stars by nearer ones and show that the total solid angle subtended by all the stars would be 4π. The sky would be dazzlingly bright at night.

5.3 Use the geometric formula (5.10) to derive the trigonometric summation formula
$$\frac{1}{2} + \cos\alpha + \cos 2\alpha + \cdots + \cos n\alpha = \frac{\sin\!\left(n+\tfrac{1}{2}\right)\alpha}{2\sin\tfrac{1}{2}\alpha}. \tag{5.160}$$
Hint: write cos nα as [exp(inα) + exp(−inα)]/2.

5.4 Show that
$$\binom{n-1}{k} + \binom{n-1}{k-1} = \binom{n}{k} \tag{5.161}$$
and then use mathematical induction to prove Leibniz's rule (5.49).

5.5 (a) Find the radius of convergence of the series (5.52) for the Bessel function J_n(ρ). (b) Show that this series converges even faster than the one (5.50) for the exponential function.

5.6 Use the formula (5.8) for the Euler–Mascheroni constant to show that Euler's definition (5.64) of the gamma function implies Weierstrass's (5.65).

5.7 Derive the approximation (5.71) for j_ℓ(ρ) for |ρ| ≪ 1.

5.8 Derive formula (5.66) for the gamma function from its definition (5.58).

5.9 Use formula (5.66) to compute Γ(1/2).

5.10 Show that z! = Γ(z+1) diverges when z is a negative integer.

5.11 Derive formula (5.68) for Γ(n + ½).

5.12 Show that the area of the surface of the unit sphere in d dimensions is
$$A_d = \frac{2\pi^{d/2}}{\Gamma(d/2)}. \tag{5.162}$$
Hint: Compute the integral of the gaussian exp(−x²) in d dimensions using both rectangular and spherical coordinates. This formula (5.162) is used in dimensional regularization (Weinberg, 1995, p. 477).

5.13 Derive (5.93) from (5.91) and (5.92) from (5.90).

5.14 Derive the expansions (5.97 and 5.98) for √(1+x) and 1/√(1+x).

5.15 Find the radii of convergence of the series (5.97) and (5.98).

5.16 Find the first three Bernoulli polynomials B_n(y) by using their generating function (5.121).

5.17 How are the two definitions (5.119) and (5.122) of the Bernoulli numbers related?

5.18 Show that the Lerch transcendent Φ(z, s, α) defined by the series (5.115) converges when |z| < 1, Re s > 0, and Re α > 0.

5.19 Langevin's classical formula for the electrical polarization of a gas or liquid of molecules of electric dipole moment p is
$$P(x) = N p\left(\frac{\cosh x}{\sinh x} - \frac{1}{x}\right) \tag{5.163}$$
where x = pE/(kT), E is the applied electric field, and N is the number density of the molecules per unit volume. (a) Expand P(x) for small x as an infinite power series involving the Bernoulli numbers. (b) What are the first three terms expressed in terms of familiar constants? (c) Find the saturation limit of P(x) as x → ∞.

5.20 By using repeatedly the identity (5.60), which is zΓ(z) = Γ(z+1), show that
$$\frac{\Gamma(-n+\alpha+1)}{\Gamma(-n+\alpha-k+1)} = (\alpha-n)(\alpha-n-1)\cdots(\alpha-n-k+1). \tag{5.164}$$

5.21 Show that the energy of a charge q spread on the surface of a sphere of radius a in an infinite lipid of permittivity ε_ℓ is W = q²/(8π ε_ℓ a).

5.22 If the lipid of Exercise 5.21 has finite thickness t and is surrounded on both sides by water of permittivity ε_w, then the image charges change the energy W by (Parsegian, 1969)
$$\Delta W = \frac{q^{2}}{4\pi\,\epsilon_{\ell}\,t}\sum_{n=1}^{\infty}\frac{1}{n}\left(\frac{\epsilon_{\ell}-\epsilon_{w}}{\epsilon_{\ell}+\epsilon_{w}}\right)^{n}. \tag{5.165}$$
Sum this series. Hint: read Section 5.10 carefully.


5.23 Consider a stack of three dielectrics of infinite extent in the x–y plane separated by the two infinite x–y planes z = t/2 and z = −t/2. Suppose the upper region z > t/2 is a uniform linear dielectric of permittivity ε₁, the central region −t/2 < z < t/2 is a uniform linear dielectric of permittivity ε₂, and the lower region z < −t/2 is a uniform linear dielectric of permittivity ε₃. Suppose the lower infinite x–y plane z = −t/2 has a uniform surface charge density −σ, while the upper plane z = t/2 has a uniform surface charge density σ. What is the energy per unit area of this system? What is the pressure on the second dielectric? What is the capacitance per unit area of the stack?

6 Complex-Variable Theory

6.1 Analytic Functions

A complex-valued function f(z) of a complex variable z is differentiable at z with derivative f′(z) if the limit
$$f'(z) = \lim_{z'\to z}\frac{f(z')-f(z)}{z'-z} \tag{6.1}$$
exists and is unique as z′ approaches z from any direction in the complex plane. The limit must exist no matter how or from what direction z′ approaches z. If the function f(z) is differentiable in a small disk around a point z₀, then f(z) is said to be analytic (or equivalently holomorphic) at z₀ (and at all points inside the disk).

Example 6.1 (Polynomials). If f(z) = zⁿ for some integer n, then for tiny dz and z′ = z + dz, the difference f(z′) − f(z) is
$$f(z')-f(z) = (z+dz)^{n}-z^{n} \approx n\,z^{n-1}\,dz \tag{6.2}$$
and so the limit
$$\lim_{z'\to z}\frac{f(z')-f(z)}{z'-z} = \lim_{dz\to 0}\frac{n\,z^{n-1}\,dz}{dz} = n\,z^{n-1} \tag{6.3}$$
exists and is n z^{n−1} independently of how z′ approaches z. Thus the function zⁿ is analytic at z for all z with derivative
$$\frac{dz^{n}}{dz} = n\,z^{n-1}. \tag{6.4}$$
A function that is analytic everywhere is entire. All polynomials
$$P(z) = \sum_{n=0}^{N}c_n z^{n} \tag{6.5}$$
are entire.

Example 6.2 (A function that's not analytic). To see what can go wrong when a function is not analytic, consider the function f(x,y) = x² + y² = z z̄ for z = x + iy. If we compute its derivative at (x,y) = (1,0) by setting x = 1 + ε and y = 0, then the limit is
$$\lim_{\epsilon\to 0}\frac{f(1+\epsilon,0)-f(1,0)}{\epsilon} = \lim_{\epsilon\to 0}\frac{(1+\epsilon)^{2}-1}{\epsilon} = 2 \tag{6.6}$$
while if we instead set x = 1 and y = ε, then the limit is
$$\lim_{\epsilon\to 0}\frac{f(1,\epsilon)-f(1,0)}{i\epsilon} = \lim_{\epsilon\to 0}\frac{1+\epsilon^{2}-1}{i\epsilon} = -i\lim_{\epsilon\to 0}\epsilon = 0. \tag{6.7}$$
So the derivative depends upon the direction through which z → 1.
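The direction dependence in Example 6.2 is easy to see numerically with finite difference quotients (my own illustration, not code from the text):

```python
def f(z):
    """f = |z|^2 = z * conj(z), which is not analytic away from z = 0."""
    return (z * z.conjugate()).real

z0 = 1.0 + 0.0j
h = 1e-6

# Approach along the real axis: the difference quotient tends to 2.
along_x = (f(z0 + h) - f(z0)) / h
# Approach along the imaginary axis: the difference quotient tends to 0.
along_y = (f(z0 + 1j * h) - f(z0)) / (1j * h)

print(along_x, along_y)   # roughly 2 and roughly 0
```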

6.2 Cauchy–Riemann Conditions

How do we know whether a complex function f(x,y) = u(x,y) + iv(x,y) of two real variables x and y is analytic? We apply the criterion (6.1) of analyticity and require that the change df in the function f(x,y) be proportional to the change dz = dx + i dy in the complex variable z = x + iy
$$\left(\frac{\partial u}{\partial x}+i\,\frac{\partial v}{\partial x}\right)dx + \left(\frac{\partial u}{\partial y}+i\,\frac{\partial v}{\partial y}\right)dy = f'(z)\,(dx+i\,dy). \tag{6.8}$$
Setting first dy and then dx equal to zero, we have
$$f'(z) = \frac{\partial u}{\partial x}+i\,\frac{\partial v}{\partial x} = \frac{1}{i}\left(\frac{\partial u}{\partial y}+i\,\frac{\partial v}{\partial y}\right). \tag{6.9}$$
This complex equation implies the two real equations
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad\text{and}\quad \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y} \tag{6.10}$$

which are the Cauchy–Riemann conditions. In a notation in which partial derivatives are labeled by subscripts, the Cauchy–Riemann conditions are u_x = v_y and v_x = −u_y.

Example 6.3 (A function analytic except at a point). The real and imaginary parts of the function
$$f(z) = \frac{1}{z-z_0} = \frac{z^{*}-z_0^{*}}{|z-z_0|^{2}} = \frac{x-x_0+i\,(y_0-y)}{(x-x_0)^{2}+(y-y_0)^{2}} \tag{6.11}$$
are
$$u(x,y) = \frac{x-x_0}{(x-x_0)^{2}+(y-y_0)^{2}} \quad\text{and}\quad v(x,y) = \frac{y_0-y}{(x-x_0)^{2}+(y-y_0)^{2}}. \tag{6.12}$$
They satisfy the Cauchy–Riemann conditions (6.10)
$$\frac{\partial u(x,y)}{\partial x} = \frac{(y-y_0)^{2}-(x-x_0)^{2}}{\left[(x-x_0)^{2}+(y-y_0)^{2}\right]^{2}} = \frac{\partial v(x,y)}{\partial y} \tag{6.13}$$
and
$$\frac{\partial v(x,y)}{\partial x} = \frac{2\,(x-x_0)(y-y_0)}{\left[(x-x_0)^{2}+(y-y_0)^{2}\right]^{2}} = -\frac{\partial u(x,y)}{\partial y} \tag{6.14}$$
except at the point z = z₀ where x = x₀ and y = y₀.
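The Cauchy–Riemann conditions (6.10) can also be verified by central finite differences for the u and v of Example 6.3. The sketch below is my own illustrative check, not code from the text:

```python
def u(x, y, x0=0.3, y0=-0.7):
    """Real part of 1/(z - z0)."""
    return (x - x0) / ((x - x0)**2 + (y - y0)**2)

def v(x, y, x0=0.3, y0=-0.7):
    """Imaginary part of 1/(z - z0)."""
    return (y0 - y) / ((x - x0)**2 + (y - y0)**2)

def d(g, x, y, wrt, h=1e-6):
    """Central finite difference of g with respect to 'x' or 'y'."""
    if wrt == 'x':
        return (g(x + h, y) - g(x - h, y)) / (2 * h)
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

# Check u_x = v_y and v_x = -u_y away from the singular point (x0, y0).
x, y = 1.2, 0.4
print(d(u, x, y, 'x'), d(v, x, y, 'y'))    # equal
print(d(v, x, y, 'x'), -d(u, x, y, 'y'))   # equal
```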

6.3 Cauchy's Integral Theorem

The Cauchy–Riemann conditions imply that the integral of a function along a closed contour (one that ends where it starts) vanishes if the function is analytic on the contour and everywhere inside it. To keep the notation simple, let's consider a rectangle R of length ℓ and height h with one corner at the origin and edges running along the x and y axes of the z plane. The integral along the four sides of the rectangle is
$$\oint_R f(z)\,dz = \oint_R \left(u(x,y)+iv(x,y)\right)(dx+i\,dy)$$
$$= \int_0^{\ell}\left[u(x,0)+iv(x,0)\right]dx + \int_0^{h}\left[u(\ell,y)+iv(\ell,y)\right]i\,dy + \int_{\ell}^{0}\left[u(x,h)+iv(x,h)\right]dx + \int_{h}^{0}\left[u(0,y)+iv(0,y)\right]i\,dy. \tag{6.15}$$
The real and imaginary parts of this contour integral are
$$\mathrm{Re}\left[\oint_R f(z)\,dz\right] = \int_0^{\ell}\left[u(x,0)-u(x,h)\right]dx - \int_0^{h}\left[v(\ell,y)-v(0,y)\right]dy$$
$$\mathrm{Im}\left[\oint_R f(z)\,dz\right] = \int_0^{\ell}\left[v(x,0)-v(x,h)\right]dx + \int_0^{h}\left[u(\ell,y)-u(0,y)\right]dy. \tag{6.16}$$
The differences u(x,0) − u(x,h) and v(ℓ,y) − v(0,y) in the real part are integrals of the y derivative u_y(x,y) and of the x derivative v_x(x,y)
$$u(x,0)-u(x,h) = -\int_0^{h}u_y(x,y)\,dy \quad\text{and}\quad v(\ell,y)-v(0,y) = \int_0^{\ell}v_x(x,y)\,dx. \tag{6.17}$$
The real part of the contour integral therefore vanishes due to the second, v_x = −u_y, of the Cauchy–Riemann conditions (6.10)
$$\mathrm{Re}\left[\oint_R f(z)\,dz\right] = -\int_0^{\ell}\!\!\int_0^{h}u_y(x,y)\,dy\,dx - \int_0^{h}\!\!\int_0^{\ell}v_x(x,y)\,dx\,dy = -\int_0^{\ell}\!\!\int_0^{h}\left[u_y(x,y)+v_x(x,y)\right]dy\,dx = 0. \tag{6.18}$$
Similarly, the differences v(x,0) − v(x,h) and u(ℓ,y) − u(0,y) in the imaginary part are integrals of the y derivative v_y(x,y) and of the x derivative u_x(x,y)
$$v(x,0)-v(x,h) = -\int_0^{h}v_y(x,y)\,dy \quad\text{and}\quad u(\ell,y)-u(0,y) = \int_0^{\ell}u_x(x,y)\,dx. \tag{6.19}$$
Thus the imaginary part of the contour integral vanishes due to the first, u_x = v_y, of the Cauchy–Riemann conditions (6.10)
$$\mathrm{Im}\left[\oint_R f(z)\,dz\right] = -\int_0^{\ell}\!\!\int_0^{h}v_y(x,y)\,dy\,dx + \int_0^{h}\!\!\int_0^{\ell}u_x(x,y)\,dx\,dy = \int_0^{\ell}\!\!\int_0^{h}\left[u_x(x,y)-v_y(x,y)\right]dy\,dx = 0. \tag{6.20}$$
A similar argument shows that the contour integral along the four sides of any rectangle vanishes as long as the function f(z) is analytic on and within the rectangle, whether or not the rectangle has one corner at the origin z = 0.

Suppose a function f(z) is analytic along a closed contour C and also at every point inside it. We can tile the inside area A with a suitable collection of contiguous rectangles, some of which might be very small. The integral of f(z) along the perimeter of each rectangle will vanish because each rectangle lies entirely within the region in which f(z) is analytic. Now consider two adjacent rectangles like the two squares in Fig. 6.1. The sum of the two contour integrals around the two adjacent squares is equal to the contour integral around the perimeter of the two squares because the up integral along the right side (dots) of the left square cancels the down integral along the left side of the right square. Thus the sum of the contour integrals around the perimeters of all the rectangles that tile the inside area A amounts to just the integral along the outer contour C. The integral around each rectangle vanishes. So the integral of f(z) along the contour C also must vanish because it is the sum of these vanishing integrals around the rectangles that tile the

Figure 6.1 The sum of two contour integrals around two adjacent squares is equal to the contour integral around the perimeter of the two squares because the up integral along the right side (dots) of the left square cancels the down integral along the left side (dots) of the right square. A contour integral around a big square is equal to the sum of the contour integrals around the smaller interior squares that tile the big square. Matlab scripts for this chapter's figures are in Complex-variable_theory at github.com/kevinecahill.

inside area A. This is Cauchy's integral theorem: The integral of a function f(z) along a closed contour vanishes
$$\oint_C f(z)\,dz = 0 \tag{6.21}$$

if the function f(z) is analytic on the contour and at every point inside it.

What could go wrong? The area A inside the contour might have a hole in it in which the function f(z) is not analytic. To exclude this possibility, we require that the area A inside the contour be simply connected, that is, we insist that we be able to shrink every loop in A to a point while keeping the loop inside A. A slice of American cheese is simply connected, a slice of Swiss Emmenthal is not. A dime is simply connected, a washer isn't. The surface of a sphere is simply connected, the surface of a bagel isn't. So another version of Cauchy's integral theorem is that the integral of a function f(z) along a closed contour vanishes if the contour lies within a simply connected region in which f(z) is analytic (Augustin-Louis Cauchy, 1789–1857).

Example 6.4 (Polynomials). Since dz^{n+1} = (n+1) zⁿ dz, the integral of the entire function zⁿ along any contour C that ends and starts at the same point z₀ must vanish for any integer n ≥ 0
$$\oint_C z^{n}\,dz = \frac{1}{n+1}\oint_C dz^{\,n+1} = \frac{1}{n+1}\left(z_0^{n+1}-z_0^{n+1}\right) = 0. \tag{6.22}$$
Thus the integral of any polynomial P(z) = c₀ + c₁z + c₂z² + ⋯ along any closed contour C also vanishes
$$\oint_C P(z)\,dz = \oint_C \sum_{n=0}^{m} c_n z^{n}\,dz = 0. \tag{6.23}$$

Example 6.5 (Tiny circular contour). If f(z) is analytic at z₀, then the definition (6.1) of the derivative f′(z) shows that f(z) ≈ f(z₀) + f′(z₀)(z − z₀) near z₀ to first order in z − z₀. The points of a small circle of radius ε and center z₀ are z = z₀ + ε e^{iθ}. Since z − z₀ = ε e^{iθ} and dz = iε e^{iθ} dθ, the closed contour integral around the circle is
$$\oint_\epsilon f(z)\,dz = \int_0^{2\pi}\left[f(z_0)+f'(z_0)\,(z-z_0)\right]i\epsilon e^{i\theta}\,d\theta = \int_0^{2\pi}f(z_0)\,i\epsilon e^{i\theta}\,d\theta + \int_0^{2\pi}f'(z_0)\,\epsilon e^{i\theta}\,i\epsilon e^{i\theta}\,d\theta \tag{6.24}$$
which vanishes because the θ-integrals are zero. Thus the contour integral of an analytic function f(z) around a tiny circle, lying within the region in which f(z) is analytic, vanishes.

Example 6.6 (Tiny square contour). The analyticity of f(z) at z = z₀ lets us expand f(z) near z₀ as f(z) ≈ f(z₀) + f′(z₀)(z − z₀). A tiny square contour consists of four complex segments dz₁ = ε, dz₂ = iε, dz₃ = −ε, and dz₄ = −iε. The integral of the constant f(z₀) around the square vanishes

$$\oint_\epsilon f(z_0)\,dz = f(z_0)\oint_\epsilon dz = f(z_0)\left[\epsilon + i\epsilon + (-\epsilon) + (-i\epsilon)\right] = 0. \tag{6.25}$$
The integral of the second term f′(z₀)(z − z₀) also vanishes. It is the sum of four integrals along the four sides of the tiny square. Like the integral of the constant f(z₀), the integral of the constant −f′(z₀) z₀ also vanishes. Dropping that term, we are left with the integral of f′(z₀) z along the four sides of the square, which, since the constant term has been dropped, we may take to be centered at z = 0. The integral from left to right along the bottom of the square where z = x − iε/2 is
$$I_1 = f'(z_0)\int_{-\epsilon/2}^{\epsilon/2}\left(x - i\,\frac{\epsilon}{2}\right)dx = -\,i\,\frac{\epsilon^{2}}{2}\,f'(z_0). \tag{6.26}$$
The integral up the right side of the square where z = ε/2 + iy is
$$I_2 = f'(z_0)\int_{-\epsilon/2}^{\epsilon/2}\left(\frac{\epsilon}{2} + iy\right)i\,dy = i\,\frac{\epsilon^{2}}{2}\,f'(z_0). \tag{6.27}$$
The integral backwards along the top of the square where z = x + iε/2 is
$$I_3 = f'(z_0)\int_{\epsilon/2}^{-\epsilon/2}\left(x + i\,\frac{\epsilon}{2}\right)dx = -\,i\,\frac{\epsilon^{2}}{2}\,f'(z_0). \tag{6.28}$$
Finally, the integral down the left side where z = −ε/2 + iy is
$$I_4 = f'(z_0)\int_{\epsilon/2}^{-\epsilon/2}\left(-\frac{\epsilon}{2} + iy\right)i\,dy = i\,\frac{\epsilon^{2}}{2}\,f'(z_0). \tag{6.29}$$
These integrals cancel in pairs. Thus the contour integral of an analytic function f(z) around a tiny square of side ε is zero to order ε² as long as the square lies inside the region in which f(z) is analytic.
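Both examples can be mimicked numerically: a discretized circular contour integral of an entire function such as e^z is as small as roundoff and discretization allow. The sketch below is my own illustration, not code from the text:

```python
import cmath

def contour_integral(f, center, radius, points=2000):
    """Approximate the closed contour integral of f around a circle
    by summing f(chord midpoint) * dz over small chords."""
    zs = [center + radius * cmath.exp(2j * cmath.pi * k / points)
          for k in range(points + 1)]
    return sum(f((zs[k] + zs[k + 1]) / 2) * (zs[k + 1] - zs[k])
               for k in range(points))

# The integral of an entire function around a closed contour vanishes.
I = contour_integral(cmath.exp, 0.5 + 0.2j, 1.0)
print(abs(I))   # essentially zero
```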

Suppose a function f(z) is analytic in a simply connected region R and that C and C′ are two contours that lie inside R and that both run from z₁ to z₂. The difference of the two contour integrals is an integral along a closed contour C″ that runs from z₁ to z₂ and back to z₁ and that vanishes by Cauchy's theorem
$$\int_{z_1\,C}^{z_2} f(z)\,dz - \int_{z_1\,C'}^{z_2} f(z)\,dz = \int_{z_1\,C}^{z_2} f(z)\,dz + \int_{z_2\,C'}^{z_1} f(z)\,dz = \oint_{C''} f(z)\,dz = 0. \tag{6.30}$$
It follows that any two contour integrals that lie within a simply connected region in which f(z) is analytic are equal if they start at the same point z₁ and end at the same point z₂. Thus we may continuously deform the contour of an integral of an analytic function f(z) from C to C′ without changing the value of the contour integral as long as these contours and all the intermediate contours lie entirely within the region R and have the same fixed endpoints z₁ and z₂ as in Fig. 6.2
$$\int_{z_1\,C}^{z_2} f(z)\,dz = \int_{z_1\,C'}^{z_2} f(z)\,dz. \tag{6.31}$$
So a contour integral depends upon its endpoints and upon the function f(z) but not upon the actual contour as long as the deformations of the contour do not push it outside of the region R in which f(z) is analytic.

If the endpoints z₁ and z₂ are the same, then the contour C is closed, and we write the integral as
$$\oint_{z_1\,z_1\,C} f(z)\,dz \equiv \oint_{C} f(z)\,dz \tag{6.32}$$
with a little circle to denote that the contour is a closed loop. The value of that integral is independent of the contour as long as our deformations of the contour keep it within the domain of analyticity of the function and as long as the contour starts and ends at z₁ = z₂. Now suppose that the function f(z) is analytic along the contour and at all points within it. Then we can shrink the

Figure 6.2 As long as the four contours are within the domain of analyticity of f(z) and have the same endpoints, the four contour integrals of that function are all equal.

contour, staying within the domain of analyticity of the function, until the area enclosed is zero and the contour is of zero length – all this without changing the value of the integral. But the value of the integral along such a null contour of zero length is zero. Thus the value of the original contour integral also must be zero
$$\oint_{z_1\,z_1\,C} f(z)\,dz = 0. \tag{6.33}$$
And so we again arrive at Cauchy's integral theorem: The contour integral of a function f(z) around a closed contour C lying entirely within the domain of analyticity of the function vanishes
$$\oint_{C} f(z)\,dz = 0 \tag{6.34}$$

as long as the function f(z) is analytic at all points within the contour.

Example 6.7 (A pole). The function f(z) = 1/(z − z₀) is analytic in a region that is not simply connected because its derivative
$$f'(z) = \lim_{dz\to 0}\frac{1}{dz}\left(\frac{1}{z+dz-z_0}-\frac{1}{z-z_0}\right) = -\frac{1}{(z-z_0)^{2}} \tag{6.35}$$
exists in the whole complex plane except for the point z = z₀.


6.4 Cauchy's Integral Formula

Suppose that f(z) is analytic in a simply connected region R and that z₀ is a point inside this region. We first will integrate the function f(z)/(z − z₀) along a tiny closed counterclockwise contour around the point z₀. The contour is a circle of radius ε with center at z₀ with points z = z₀ + ε e^{iθ} for 0 ≤ θ ≤ 2π, and dz = iε e^{iθ} dθ. Since z − z₀ = ε e^{iθ}, the contour integral in the limit ε → 0 is
$$\oint_\epsilon \frac{f(z)}{z-z_0}\,dz = \int_0^{2\pi}\frac{f(z_0)+f'(z_0)\,(z-z_0)}{z-z_0}\;i\epsilon e^{i\theta}\,d\theta = \int_0^{2\pi}\frac{f(z_0)+f'(z_0)\,\epsilon e^{i\theta}}{\epsilon e^{i\theta}}\;i\epsilon e^{i\theta}\,d\theta = \int_0^{2\pi}\left[f(z_0)+f'(z_0)\,\epsilon e^{i\theta}\right]i\,d\theta = 2\pi i\,f(z_0) \tag{6.36}$$
since the θ-integral involving f′(z₀) vanishes. Thus f(z₀) is the integral
$$f(z_0) = \frac{1}{2\pi i}\oint_\epsilon \frac{f(z)}{z-z_0}\,dz \tag{6.37}$$
which is a miniature version of Cauchy's integral formula.

Now consider the counterclockwise contour C′ in Fig. 6.3, which is a big counterclockwise circle, a small clockwise circle, and two parallel straight lines, all within a simply connected region R in which f(z) is analytic. As we saw in Examples 6.3 and 6.7, the function 1/(z − z₀) is analytic except at z = z₀. Thus since the product

Figure 6.3 The full contour is the sum of a big counterclockwise contour C′ and a small clockwise contour, both around z₀, and two straight lines which cancel.

194

6 Complex-Variable Theory

of two analytic functions is analytic (Exercise 6.3), the function f(z)/(z - z_0) is analytic everywhere in R except at the point z_0. We can withdraw the contour C′ to the left of the point z_0 and shrink it to a point without having the contour C′ cross z_0. During this process, the integral of f(z)/(z - z_0) does not change. Its final value is zero, so its initial value also is zero:

0 = \frac{1}{2\pi i} \oint_{C'} \frac{f(z)}{z-z_0}\,dz.    (6.38)

We let the two straight-line segments approach each other so that they cancel. What remains of contour C′ is a big counterclockwise contour C around z_0 and a tiny clockwise circle of radius ε around z_0. The tiny clockwise circle integral is the negative of the counterclockwise integral (6.37), so we have

0 = \frac{1}{2\pi i} \oint_{C'} \frac{f(z)}{z-z_0}\,dz = \frac{1}{2\pi i} \oint_{C} \frac{f(z)}{z-z_0}\,dz - \frac{1}{2\pi i} \oint_{\varepsilon} \frac{f(z)}{z-z_0}\,dz.    (6.39)

Using the miniature result (6.37), we find

f(z_0) = \frac{1}{2\pi i} \oint_C \frac{f(z)}{z-z_0}\,dz    (6.40)

which is Cauchy's integral formula. We can use this formula to compute the first derivative f'(z) of f(z):

f'(z) = \frac{f(z+dz) - f(z)}{dz}
     = \frac{1}{2\pi i}\,\frac{1}{dz} \oint dz'\, f(z') \left[ \frac{1}{z'-z-dz} - \frac{1}{z'-z} \right]
     = \frac{1}{2\pi i} \oint dz'\, \frac{f(z')}{(z'-z-dz)(z'-z)}.    (6.41)

So in the limit dz → 0, we get

f'(z) = \frac{1}{2\pi i} \oint dz'\, \frac{f(z')}{(z'-z)^2}.    (6.42)

The second derivative f^{(2)}(z) of f(z) then is

f^{(2)}(z) = \frac{2}{2\pi i} \oint dz'\, \frac{f(z')}{(z'-z)^3}.    (6.43)

And its nth derivative f^{(n)}(z) is

f^{(n)}(z) = \frac{n!}{2\pi i} \oint dz'\, \frac{f(z')}{(z'-z)^{n+1}}.    (6.44)

In these formulas, the contour runs counterclockwise about the point z and lies within the simply connected domain R in which f(z) is analytic.
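Formulas (6.40) and (6.44) lend themselves to a direct numerical test. The sketch below is mine, not the book's (it assumes numpy and a helper name `contour_derivative` of my own invention): it approximates the contour integral by the trapezoid rule on a circle, which converges spectrally for analytic integrands. For the entire function f(z) = e^z every derivative equals e^z.

```python
import math
import numpy as np

def contour_derivative(f, z, n=0, radius=1.0, npts=2000):
    """Approximate f^(n)(z) from Cauchy's formula (6.44) on a circle of given radius."""
    theta = np.linspace(0.0, 2.0 * np.pi, npts, endpoint=False)
    zp = z + radius * np.exp(1j * theta)   # points z' on the contour
    dzp = 1j * (zp - z)                    # dz'/dtheta = i (z' - z)
    integral = np.sum(f(zp) / (zp - z) ** (n + 1) * dzp) * (2.0 * np.pi / npts)
    return math.factorial(n) * integral / (2j * np.pi)

z0 = 0.3 + 0.2j
for n in range(4):
    assert abs(contour_derivative(np.exp, z0, n) - np.exp(z0)) < 1e-10
```

The trapezoid rule on a periodic grid is exact up to aliasing terms that fall geometrically with `npts`, which is why so simple a quadrature reproduces f^{(n)}(z_0) to near machine precision.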


Thus a function f(z) that is analytic in a region R is infinitely differentiable there.

Example 6.8 (Schlaefli's formula for the Legendre polynomials)  Rodrigues showed (Section 9.2) that the Legendre polynomial P_n(x) is the nth derivative

P_n(x) = \frac{1}{2^n\, n!} \left( \frac{d}{dx} \right)^{\!n} (x^2 - 1)^n.    (6.45)

Schlaefli used this expression and Cauchy's integral formula (6.44) to represent P_n(z) as the contour integral (Exercise 6.9)

P_n(z) = \frac{1}{2^n}\,\frac{1}{2\pi i} \oint \frac{(z'^2-1)^n}{(z'-z)^{n+1}}\,dz'    (6.46)

in which the contour encircles the complex point z counterclockwise. This formula tells us that at z = 1 the Legendre polynomial is

P_n(1) = \frac{1}{2^n}\,\frac{1}{2\pi i} \oint \frac{(z'^2-1)^n}{(z'-1)^{n+1}}\,dz' = \frac{1}{2^n}\,\frac{1}{2\pi i} \oint \frac{(z'+1)^n}{z'-1}\,dz' = 1    (6.47)
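Schlaefli's representation (6.46) is easy to check numerically — a sketch of mine, not from the text, using a helper name `legendre_schlaefli` of my own — against the explicit polynomial P_4(x) = (35x⁴ − 30x² + 3)/8:

```python
import numpy as np

def legendre_schlaefli(n, z, radius=1.0, npts=4000):
    """P_n(z) from Schlaefli's contour integral (6.46); contour: circle about z."""
    theta = np.linspace(0.0, 2.0 * np.pi, npts, endpoint=False)
    w = z + radius * np.exp(1j * theta)
    dw = 1j * (w - z)
    integral = np.sum((w**2 - 1.0)**n / (w - z)**(n + 1) * dw) * (2.0 * np.pi / npts)
    return (integral / (2j * np.pi) / 2**n).real

x = 0.5
p4 = (35 * x**4 - 30 * x**2 + 3) / 8       # explicit P_4
assert abs(legendre_schlaefli(4, x) - p4) < 1e-8
```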

in which we applied Cauchy's integral formula (6.40) to f(z) = (z+1)^n.

Example 6.9 (Bessel functions of the first kind)  The counterclockwise integral around the unit circle z = e^{iθ} of the ratio z^m/z^n, in which both m and n are integers, is

\frac{1}{2\pi i} \oint dz\, \frac{z^m}{z^n} = \frac{1}{2\pi i} \int_0^{2\pi} i e^{i\theta}\,d\theta\; e^{i(m-n)\theta} = \frac{1}{2\pi} \int_0^{2\pi} d\theta\; e^{i(m+1-n)\theta}.    (6.48)

If m + 1 − n ≠ 0, this integral vanishes because exp[2πi(m+1−n)] = 1:

\frac{1}{2\pi} \int_0^{2\pi} d\theta\; e^{i(m+1-n)\theta} = \frac{1}{2\pi} \left[ \frac{e^{i(m+1-n)\theta}}{i(m+1-n)} \right]_0^{2\pi} = 0.    (6.49)

If m + 1 − n = 0, the exponential is unity, e^{i(m+1-n)\theta} = 1, and the integral is 2π/2π = 1. Thus the original integral is the Kronecker delta

\frac{1}{2\pi i} \oint dz\, \frac{z^m}{z^n} = \delta_{m+1,\,n}.    (6.50)

The generating function (10.5) for Bessel functions J_m of the first kind is

e^{t(z-1/z)/2} = \sum_{m=-\infty}^{\infty} z^m\, J_m(t).    (6.51)

Applying our integral formula (6.50) to it, we find

\frac{1}{2\pi i} \oint dz\, \frac{e^{t(z-1/z)/2}}{z^{n+1}} = \frac{1}{2\pi i} \oint dz \sum_{m=-\infty}^{\infty} \frac{z^m}{z^{n+1}}\, J_m(t) = \sum_{m=-\infty}^{\infty} \delta_{m+1,\,n+1}\, J_m(t) = J_n(t).    (6.52)

Thus letting z = e^{iθ}, we have

J_n(t) = \frac{1}{2\pi} \int_0^{2\pi} d\theta \exp\!\left[ t\,\frac{e^{i\theta}-e^{-i\theta}}{2} - in\theta \right] = \frac{1}{2\pi} \int_0^{2\pi} d\theta\; e^{i(t\sin\theta - n\theta)}    (6.53)

or more simply

J_n(t) = \frac{1}{\pi} \int_0^{\pi} d\theta \cos(t\sin\theta - n\theta)    (6.54)

(see Exercise 6.4).
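The representation (6.54) can be tested numerically against the defining power series J_n(t) = Σ_k (−1)^k (t/2)^{n+2k}/[k!(n+k)!] — a quick check of mine, not part of the text, with helper names of my own:

```python
import math
import numpy as np

def jn_integral(n, t, npts=20000):
    """J_n(t) from the integral representation (6.54), midpoint rule on [0, pi]."""
    theta = (np.arange(npts) + 0.5) * (np.pi / npts)
    return np.mean(np.cos(t * np.sin(theta) - n * theta))

def jn_series(n, t, kmax=40):
    """J_n(t) from its power series, adequate for modest t."""
    return sum((-1)**k * (t / 2)**(n + 2 * k) / (math.factorial(k) * math.factorial(n + k))
               for k in range(kmax))

for n in (0, 1, 5):
    assert abs(jn_integral(n, 2.0) - jn_series(n, 2.0)) < 1e-6
```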

6.5 Harmonic Functions

The Cauchy–Riemann conditions (6.10)

u_x = v_y \quad \text{and} \quad u_y = -v_x    (6.55)

tell us something about the laplacian of the real part u of an analytic function f = u + iv. First, the second x-derivative u_xx is u_xx = v_yx = v_xy = −u_yy. So the real part u of an analytic function is a harmonic function,

u_{xx} + u_{yy} = 0    (6.56)

that is, one with a vanishing laplacian. Similarly v_xx = −u_yx = −u_xy = −v_yy, so the imaginary part of an analytic function also is a harmonic function:

v_{xx} + v_{yy} = 0.    (6.57)

A harmonic function h(x, y) can have saddle points, but not local minima or maxima, because at a local minimum both h_xx > 0 and h_yy > 0, while at a local maximum both h_xx < 0 and h_yy < 0. So in its domain of analyticity, the real and imaginary parts of an analytic function f have neither minima nor maxima.
For static fields, the electrostatic potential φ(x, y, z) is a harmonic function of the three spatial variables x, y, and z in regions that are free of charge, because the electric field is E = −∇φ, and its divergence ∇·E = 0 vanishes where the charge density is zero. Thus the laplacian of the electrostatic potential φ(x, y, z) vanishes,

\nabla \cdot \nabla \phi = \phi_{xx} + \phi_{yy} + \phi_{zz} = 0    (6.58)

and φ(x, y, z) is harmonic where there is no charge. The location of each positive charge is a local maximum of the electrostatic potential φ(x, y, z), and the location of each negative charge is a local minimum of φ(x, y, z). But in the absence of charges, the electrostatic potential has neither local maxima nor local minima. Thus


one cannot trap charged particles with an electrostatic potential, a result known as Earnshaw's theorem.
The Cauchy–Riemann conditions imply that the real and imaginary parts of an analytic function are harmonic functions whose 2-dimensional gradients are mutually perpendicular:

(u_x, u_y) \cdot (v_x, v_y) = v_y v_x - v_x v_y = 0.    (6.59)

And we know that the electrostatic potential is a harmonic function. Thus the real part u(x, y) (or the imaginary part v(x, y)) of any analytic function f(z) = u(x, y) + i v(x, y) describes the electrostatic potential φ(x, y) for some electrostatic problem that does not involve the third spatial coordinate z. The surfaces of constant u(x, y) are the equipotential surfaces, and since the two gradients are orthogonal, the surfaces of constant v(x, y) are the electric field lines.

Example 6.10 (Two-dimensional potentials)  The function

f(z) = u + iv = E\,z = E\,x + iE\,y    (6.60)

can represent a potential V(x, y, z) = E x for which the electric-field lines E = −E x̂ are lines of constant y. It also can represent a potential V(x, y, z) = E y in which E points in the negative y-direction, which is to say along lines of constant x. Another simple example is the function

f(z) = u + iv = z^2 = x^2 - y^2 + 2ixy    (6.61)

for which u = x² − y² and v = 2xy. This function gives us a potential V(x, y, z) whose equipotentials are the hyperbolas u = x² − y² = c² and whose electric-field lines are the perpendicular hyperbolas v = 2xy = d². Equivalently, we may take these last hyperbolas 2xy = d² to be the equipotentials and the other ones x² − y² = c² to be the lines of the electric field. For a third example, we write the variable z as z = r e^{iθ} = exp(ln r + iθ) and use the function

f(z) = u(x,y) + i\,v(x,y) = -\frac{\lambda}{2\pi\epsilon_0}\, \ln z = -\frac{\lambda}{2\pi\epsilon_0}\, (\ln r + i\theta)    (6.62)

which describes the potential V(x, y, z) = −(λ/2πϵ₀) ln √(x² + y²) due to a line of charge per unit length λ = q/L. The electric-field lines are the lines of constant v, or equivalently of constant θ; the electric field is

E = \frac{\lambda}{2\pi\epsilon_0}\, \frac{(x, y, 0)}{x^2 + y^2}.    (6.63)
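A small numerical illustration of my own (not in the text): central finite differences confirm that u = Re z² and v = Im z² are harmonic, (6.56)–(6.57), with orthogonal gradients, (6.59):

```python
import numpy as np

def u(x, y): return x**2 - y**2        # Re z^2
def v(x, y): return 2.0 * x * y        # Im z^2

h = 1e-4
x0, y0 = 0.7, -0.3

def lap(f):
    """Five-point finite-difference laplacian at (x0, y0)."""
    return (f(x0+h, y0) + f(x0-h, y0) + f(x0, y0+h) + f(x0, y0-h) - 4*f(x0, y0)) / h**2

def grad(f):
    """Central-difference gradient at (x0, y0)."""
    return np.array([(f(x0+h, y0) - f(x0-h, y0)) / (2*h),
                     (f(x0, y0+h) - f(x0, y0-h)) / (2*h)])

assert abs(lap(u)) < 1e-5 and abs(lap(v)) < 1e-5   # both harmonic
assert abs(grad(u) @ grad(v)) < 1e-8               # gradients perpendicular
```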


6.6 Taylor Series for Analytic Functions

Let's consider the contour integral of the function f(z')/(z' − z) along a circle C inside a simply connected region R in which f(z) is analytic. For any point z inside the circle, Cauchy's integral formula (6.40) tells us that

f(z) = \frac{1}{2\pi i} \oint_C \frac{f(z')}{z'-z}\,dz'.    (6.64)

We add and subtract the center z_0 from the denominator z' − z,

f(z) = \frac{1}{2\pi i} \oint_C \frac{f(z')}{z'-z_0-(z-z_0)}\,dz'    (6.65)

and then factor the denominator:

f(z) = \frac{1}{2\pi i} \oint_C \frac{f(z')}{(z'-z_0)\left[1 - \frac{z-z_0}{z'-z_0}\right]}\,dz'.    (6.66)

From Fig. 6.4, we see that the modulus of the ratio (z − z_0)/(z' − z_0) is less than unity, and so the power series

\left[ 1 - \frac{z-z_0}{z'-z_0} \right]^{-1} = \sum_{n=0}^{\infty} \left( \frac{z-z_0}{z'-z_0} \right)^{\!n}    (6.67)

Figure 6.4 (Taylor-series contour around z_0) Contour of integral for the Taylor series (6.69).


by (5.32–5.35) converges absolutely and uniformly on the circle. We therefore are allowed to integrate the series

f(z) = \frac{1}{2\pi i} \oint_C \frac{f(z')}{z'-z_0} \sum_{n=0}^{\infty} \left( \frac{z-z_0}{z'-z_0} \right)^{\!n} dz'    (6.68)

term by term:

f(z) = \sum_{n=0}^{\infty} (z-z_0)^n\, \frac{1}{2\pi i} \oint_C \frac{f(z')}{(z'-z_0)^{n+1}}\,dz'.    (6.69)

Cauchy's integral formula (6.44) tells us that the integral is just the nth derivative f^{(n)}(z_0) divided by n-factorial. Thus the function f(z) possesses the Taylor series

f(z) = \sum_{n=0}^{\infty} \frac{(z-z_0)^n}{n!}\, f^{(n)}(z_0)    (6.70)

which converges as long as the point z is inside a circle centered at z_0 that lies within a simply connected region R in which f(z) is analytic.
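The coefficient integrals in (6.69) can also be evaluated numerically — a sketch of mine, not from the text, with a helper name `taylor_coeff` of my own. For f(z) = 1/(1 − z) about z_0 = 0, every Taylor coefficient is 1:

```python
import numpy as np

def taylor_coeff(f, z0, n, radius, npts=4096):
    """a_n = (1/2 pi i) * contour integral of f(z')/(z'-z0)^(n+1) over a circle."""
    theta = np.linspace(0.0, 2.0 * np.pi, npts, endpoint=False)
    zp = z0 + radius * np.exp(1j * theta)
    dzp = 1j * (zp - z0)                      # dz'/dtheta
    return np.mean(f(zp) / (zp - z0)**(n + 1) * dzp) / 1j

f = lambda z: 1.0 / (1.0 - z)
for n in range(6):
    assert abs(taylor_coeff(f, 0.0, n, radius=0.5) - 1.0) < 1e-9
```

The radius 0.5 keeps the contour inside the disk of analyticity |z| < 1, as the convergence statement after (6.70) requires.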

6.7 Cauchy’s Inequality

Suppose a function f(z) is analytic in a region that includes the disk |z| ≤ R and that f(z) is bounded by |f(z)| ≤ M on the circle |z| = R, which is the perimeter of the disk. Then by using Cauchy's integral formula (6.44), we may bound the nth derivative f^{(n)}(0) of f(z) at z = 0 by

|f^{(n)}(0)| \le \frac{n!}{2\pi} \oint \frac{|f(z)|\,|dz|}{|z|^{n+1}} \le \frac{n!}{2\pi}\,\frac{M}{R^{n+1}} \int_0^{2\pi} R\,d\theta = \frac{n!\,M}{R^n}    (6.71)

which is Cauchy's inequality. This inequality bounds the terms of the Taylor series (6.70):

\sum_{n=0}^{\infty} \frac{|z-z_0|^n}{n!}\, |f^{(n)}(z_0)| \le M \sum_{n=0}^{\infty} \frac{|z-z_0|^n}{R^n}    (6.72)

showing that it converges (5.35) absolutely and uniformly for |z − z_0| < R.

6.8 Liouville's Theorem

Suppose now that f(z) is analytic everywhere (entire) and bounded by

|f(z)| \le M \quad \text{for all} \quad |z| \ge R_0.    (6.73)


Then by applying Cauchy's inequality (6.71) at successively larger values of R, we have

|f^{(n)}(0)| \le \lim_{R\to\infty} \frac{n!\,M}{R^n} = 0    (6.74)

which shows that for n ≥ 1 every derivative f^{(n)} vanishes at z = 0:

f^{(n)}(0) = 0.    (6.75)

But then the Taylor series (5.78) about z = 0 for the function f(z) consists of only a single term, and f(z) is a constant:

f(z) = \sum_{n=0}^{\infty} \frac{z^n}{n!}\, f^{(n)}(0) = f^{(0)}(0) = f(0).    (6.76)

So every bounded entire function is a constant (Joseph Liouville, 1809–1882).

6.9 Fundamental Theorem of Algebra

Gauss applied Liouville's theorem to the function

f(z) = \frac{1}{P_n(z)} = \frac{1}{c_0 + c_1 z + c_2 z^2 + \cdots + c_n z^n}    (6.77)

which is the inverse of an arbitrary polynomial of order n. Suppose that the polynomial P_n(z) had no zero, that is, no root anywhere in the complex plane. Then f(z) would be analytic everywhere. Moreover, for sufficiently large |z|, the polynomial P_n(z) is approximately P_n(z) ≈ c_n z^n, and so f(z) would be bounded by something like

|f(z)| \le \frac{1}{|c_n|\, R_0^n} \equiv M \quad \text{for all} \quad |z| \ge R_0.    (6.78)

So if P_n(z) had no root, then the function f(z) would be a bounded entire function and so would be a constant by Liouville's theorem (6.76). But of course, f(z) = 1/P_n(z) is not a constant unless n = 0. Thus any polynomial P_n(z) that is not a constant must have a root, a pole of f(z), so that f(z) is not entire. This is the only exit from the contradiction.
If the root of P_n(z) is at z = z_1, then P_n(z) = (z − z_1) P_{n−1}(z), in which P_{n−1}(z) is a polynomial of order n − 1, and we may repeat the argument for its reciprocal f_1(z) = 1/P_{n−1}(z). In this way, one arrives at the fundamental theorem of algebra: Every polynomial P_n(z) = c_0 + c_1 z + ⋯ + c_n z^n has n roots somewhere in the complex plane:

P_n(z) = c_n (z - z_1)(z - z_2) \cdots (z - z_n).    (6.79)
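A quick numerical illustration of the factorization (6.79) — mine, not the book's — using numpy's root finder on the cubic P_3(z) = 2 − 3z + z³:

```python
import numpy as np

c = [2.0, -3.0, 0.0, 1.0]                # coefficients c_0 .. c_3
roots = np.roots(c[::-1])                # np.roots wants the highest-degree coefficient first
assert len(roots) == 3                   # n roots, as (6.79) asserts

z = 0.3 + 0.8j                           # arbitrary test point
lhs = sum(ck * z**k for k, ck in enumerate(c))
rhs = c[-1] * np.prod([z - r for r in roots])
assert abs(lhs - rhs) < 1e-10            # P_n(z) = c_n (z - z_1)...(z - z_n)
```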


6.10 Laurent Series

Consider a function f(z) that is analytic in an annulus that contains an outer circle C_1 of radius R_1 and an inner circle C_2 of radius R_2, as in Fig. 6.5. We integrate f(z) along a contour C_{12} within the annulus that encircles the point z in a counterclockwise fashion by following C_1 counterclockwise and C_2 clockwise and a line joining them in both directions. By Cauchy's integral formula (6.40), this contour integral yields f(z):

f(z) = \frac{1}{2\pi i} \oint_{C_{12}} \frac{f(z')}{z'-z}\,dz'.    (6.80)

The integrations in opposite directions along the line joining C_1 and C_2 cancel, and we are left with a counterclockwise integral around the outer circle C_1 and a clockwise one around C_2, or minus a counterclockwise integral around C_2:

f(z) = \frac{1}{2\pi i} \oint_{C_1} \frac{f(z')}{z'-z}\,dz' - \frac{1}{2\pi i} \oint_{C_2} \frac{f(z'')}{z''-z}\,dz''.    (6.81)

Now from the figure (6.5), the center z_0 of the two concentric circles is closer to the points z'' on the inner circle C_2 than it is to z, and also closer to z than to the points z' on C_1:

|z'' - z_0| < |z - z_0| < |z' - z_0|.    (6.82)

[…]

A function f(z) has a pole of order n > 0 at a point z_0 if (z − z_0)^n f(z) is analytic at z_0 but (z − z_0)^{n−1} f(z) has an isolated singularity at z_0. A pole of order n = 1 is called a simple pole. Poles are isolated singularities. A function is meromorphic if it is analytic for all z except for poles.

Example 6.11 (Poles)  The function

f(z) = \sum_{j=1}^{n} \frac{1}{(z-j)^j}    (6.92)

has a pole of order j at z = j for j = 1, 2, …, n. It is meromorphic.

An essential singularity is a pole of infinite order. If a function f(z) has an essential singularity at z_0, then its Laurent series (6.86) really runs from n = −∞ and not from n = −ℓ as in (6.89). Essential singularities are spooky: if a function f(z) has an essential singularity at w, then inside every disk around w, f(z) takes on every complex number, with at most one exception, an infinite number of times (Émile Picard, 1856–1941).


Example 6.12 (Essential singularity)  The function f(z) = exp(1/z) has an essential singularity at z = 0 because its Laurent series (6.86)

f(z) = e^{1/z} = \sum_{m=0}^{\infty} \frac{1}{m!}\,\frac{1}{z^m} = \sum_{n=-\infty}^{0} \frac{1}{|n|!}\, z^n    (6.93)

runs from n = −∞. Near z = 0, f(z) = exp(1/z) takes on every complex number except 0 an infinite number of times.

Example 6.13 (Meromorphic function with two poles)  The function f(z) = 1/[z(z + 1)] has poles at z = 0 and at z = −1 but otherwise is analytic; it is meromorphic. We may expand it in a Laurent series (6.87–6.88)

f(z) = \frac{1}{z(z+1)} = \sum_{n=-\infty}^{\infty} a_n\, z^n    (6.94)

about z = 0 for |z| < 1. The coefficient a_n is the integral

a_n = \frac{1}{2\pi i} \oint_C \frac{dz}{z^{n+2}\,(z+1)}    (6.95)

in which the contour C is a counterclockwise circle of radius r < 1. We may expand 1/(1 + z) as the power series

\frac{1}{1+z} = \sum_{m=0}^{\infty} (-z)^m.

Doing the integrals, we find

a_n = \sum_{m=0}^{\infty} \frac{1}{2\pi i} \oint_C \frac{(-z)^m}{z^{n+2}}\,dz

[…]
is analytic for Re z > −3 apart from simple poles at z = 0, −1, and −2. Proceeding in this way, we may analytically continue the gamma function to the whole complex plane apart from the negative integers and zero. The analytically continued gamma function is represented by Weierstrass's formula

\Gamma(z) = \frac{1}{z}\, e^{-\gamma z} \prod_{n=1}^{\infty} \left[ \left( 1 + \frac{z}{n} \right) e^{-z/n} \right]^{-1}.    (6.110)

Example 6.17 (Riemann's zeta function)  Riemann's zeta function is the sum

\zeta(z) = \sum_{n=1}^{\infty} \frac{1}{n^z}.

Ser found an analytic continuation

\zeta(z) = \frac{1}{z-1} \sum_{n=0}^{\infty} \frac{1}{n+1} \sum_{k=0}^{n} \binom{n}{k} \frac{(-1)^k}{(k+1)^{z-1}}    (6.111)

of Riemann's zeta function (5.105) to the whole complex plane except for the point z = 1 (Joseph Ser, 1875–1954).

Example 6.18 (Dimensional regularization)  The loop diagrams of quantum field theory involve badly divergent integrals like

I(4) = \int \frac{d^4q}{(2\pi)^4}\, \frac{(q^2)^a}{(q^2+\alpha^2)^b}    (6.112)

where often a = 0 and b = 2 and α² > 0. Gerardus 't Hooft (1946–) and Martinus J. G. Veltman (1931–) promoted the number of spacetime dimensions from 4 to a complex number d. The resulting integral has the value (Srednicki, 2007, p. 102)

I(d) = \int \frac{d^d q}{(2\pi)^d}\, \frac{(q^2)^a}{(q^2+\alpha^2)^b} = \frac{\Gamma(b-a-d/2)\,\Gamma(a+d/2)}{(4\pi)^{d/2}\,\Gamma(b)\,\Gamma(d/2)}\, (\alpha^2)^{a-b+d/2}    (6.113)

and so defines a function of the complex variable d that is analytic everywhere except for simple poles at d = 2(n − a + b) for n = 0, 1, 2, …. At these poles, the formula

\Gamma(-n+z) = \frac{(-1)^n}{n!} \left[ \frac{1}{z} - \gamma + \sum_{k=1}^{n} \frac{1}{k} + O(z) \right]    (6.114)

where γ = 0.5772… is the Euler–Mascheroni constant (5.8), can be useful.

6.13 Calculus of Residues

A contour integral of an analytic function f(z) does not change unless the endpoints move or the contour crosses a singularity or leaves the region of analyticity (Section 6.3). Let us consider the integral of a function f(z) along a counterclockwise contour C that encircles n poles at z_k for k = 1, …, n in a simply connected region R in which f(z) is meromorphic. We may shrink the area within the contour C without changing the value of the integral until the area is infinitesimal and the contour is a sum of n tiny counterclockwise circles C_k around the n poles:

\oint_C f(z)\,dz = \sum_{k=1}^{n} \oint_{C_k} f(z)\,dz.    (6.115)

These tiny counterclockwise integrals around the poles at z_k are 2πi times the residues a_{−1}(z_k) defined by Laurent's formula (6.88) with n = −1. So the whole counterclockwise integral is 2πi times the sum of the residues of the enclosed poles of the function f(z):

\oint_C f(z)\,dz = 2\pi i \sum_{k=1}^{n} a_{-1}(z_k) = 2\pi i \sum_{k=1}^{n} \mathrm{Res}(f, z_k)    (6.116)

a result that is known as the residue theorem.
In general, one must do each tiny counterclockwise integral about each pole z_k, but simple poles are an important special case. If w is a simple pole of the function f(z), then near it f(z) is given by its Laurent series (6.87) as

f(z) = \frac{a_{-1}(w)}{z-w} + \sum_{n=0}^{\infty} a_n(w)\,(z-w)^n.    (6.117)

In this case, its residue is by (6.90) with −ℓ = −1

a_{-1}(w) = \lim_{z\to w}\, (z-w)\, f(z)    (6.118)


which usually is easier to do than the integral (6.88)

a_{-1}(w) = \frac{1}{2\pi i} \oint_C f(z)\,dz.    (6.119)

Example 6.19 (A function with simple poles)  The integral of the function

f(z) = \sum_{n=1}^{\infty} \frac{z}{z - n^{-s}}    (6.120)

along a circle of radius 2 with center at z = 0 is, for s > 1, just the sum of its residues:

\frac{1}{2\pi i} \oint f(z)\,dz = \sum_{n=1}^{\infty} \lim_{z\to n^{-s}} (z - n^{-s})\, f(z) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \zeta(s)    (6.121)

which is the zeta function (5.105).

Example 6.20 (Cauchy's integral formula)  Suppose the function f(z) is analytic within a region R and that C is a counterclockwise contour that encircles a point w in R. Then the counterclockwise contour C encircles the simple pole at w of the function f(z)/(z − w), which is its only singularity in R. By applying the residue theorem and formula (6.118) for the residue a_{−1}(w) of the function f(z)/(z − w), we find

\oint_C \frac{f(z)}{z-w}\,dz = 2\pi i\, a_{-1}(w) = 2\pi i \lim_{z\to w} (z-w)\, \frac{f(z)}{z-w} = 2\pi i\, f(w).    (6.122)

So Cauchy's integral formula (6.40) is an example of the calculus of residues.

Example 6.21 (A meromorphic function)  By the residue theorem (6.116), the integral of the function

f(z) = \frac{1}{z-1}\,\frac{1}{(z-2)^2}    (6.123)

along the circle C = 4e^{iθ} for 0 ≤ θ ≤ 2π is the sum of the residues at z = 1 and z = 2:

\oint_C f(z)\,dz = 2\pi i \left[ a_{-1}(1) + a_{-1}(2) \right].    (6.124)

The function f(z) has a simple pole at z = 1, and so we may use the formula (6.118) to evaluate the residue a_{−1}(1) as

a_{-1}(1) = \lim_{z\to 1}\, (z-1)\, f(z) = \lim_{z\to 1} \frac{1}{(z-2)^2} = 1    (6.125)

instead of using Cauchy's integral formula (6.40) to do the integral of f(z) along a tiny circle about z = 1, which gives the same result:

a_{-1}(1) = \frac{1}{2\pi i} \oint \frac{dz}{z-1}\,\frac{1}{(z-2)^2} = \frac{1}{(1-2)^2} = 1.    (6.126)


The residue a_{−1}(2) is the integral of f(z) along a tiny circle about z = 2, which we do by using Cauchy's integral formula (6.42):

a_{-1}(2) = \frac{1}{2\pi i} \oint \frac{dz}{(z-2)^2}\,\frac{1}{z-1} = \left. \frac{d}{dz}\,\frac{1}{z-1} \right|_{z=2} = -\,\frac{1}{(2-1)^2} = -1    (6.127)

getting the same answer as if we had used the recipe (6.90) for a_{−2},

a_{-2}(2) = \lim_{z\to 2}\, (z-2)^2\, \frac{1}{(z-1)(z-2)^2} = 1    (6.128)

and (6.91) for a_{−1},

a_{-1}(2) = \lim_{z\to 2}\, (z-2) \left[ \frac{1}{(z-1)(z-2)^2} - \frac{a_{-2}(2)}{(z-2)^2} \right] = -1.    (6.129)

The sum of these two residues is zero, and so the integral (6.124) vanishes.
Another way of evaluating this integral is to deform it, not into two tiny circles about the two poles, but rather into a huge circle z = Re^{iθ}, and to notice that as R → ∞ the modulus of this integral vanishes:

\left| \oint f(z)\,dz \right| \approx \frac{2\pi}{R^2} \to 0.    (6.130)

This contour is an example of a ghost contour.
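The residues of Example 6.21 are easy to confirm numerically — my check, not the book's, with a helper name `contour_integral` of my own: the integral of f over |z| = 4 is ≈ 0 because the two residues cancel, while a small circle about z = 1 picks up 2πi × 1:

```python
import numpy as np

def contour_integral(f, center, radius, npts=20000):
    """Trapezoid-rule contour integral of f around a counterclockwise circle."""
    theta = np.linspace(0.0, 2.0 * np.pi, npts, endpoint=False)
    z = center + radius * np.exp(1j * theta)
    dz = 1j * (z - center)
    return np.mean(f(z) * dz) * 2.0 * np.pi

f = lambda z: 1.0 / ((z - 1.0) * (z - 2.0)**2)

big = contour_integral(f, 0.0, 4.0)        # encloses both poles: residues sum to zero
small = contour_integral(f, 1.0, 0.25)     # encloses only the simple pole at z = 1
assert abs(big) < 1e-8
assert abs(small - 2j * np.pi) < 1e-8
```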

6.14 Ghost Contours

Often one needs to do an integral that is not a closed counterclockwise contour. Integrals along the real axis occur frequently. One sometimes can convert a line integral into a closed contour by adding a contour along which the integral vanishes, a ghost contour. We have just seen an example (6.130) of a ghost contour, and we shall see more of them in what follows.

Example 6.22 (Using ghost contours)  Consider the integral

I_b = \int_{-\infty}^{\infty} \frac{dx}{(x-i)(x-2i)(x-3i)}.    (6.131)

We could do the integral by adding a contour Re^{iθ} from θ = 0 to θ = π. In the limit R → ∞, the integral of 1/[(z − i)(z − 2i)(z − 3i)] along this contour vanishes; it is

a ghost contour. The original integral I_b and the ghost contour together encircle the three poles, and so we could compute I_b by evaluating the residues at those poles. But we also could add a ghost contour around the lower half-plane. This contour and the real line encircle no poles. So we get I_b = 0 without doing any work at all.

Example 6.23 (Fourier transform of a gaussian)  During our computation of the Fourier transform of a gaussian (4.16–4.19), we promised to justify the shift in the


variable of integration from x to x + ik/2m² in this chapter. So let us consider the contour integral of the entire function f(z) = exp(−m²z²) over a rectangular closed contour along the real axis from −R to R and then from z = R to z = R + ic and then from there to z = −R + ic and then to z = −R. Since f(z) is analytic within the contour, the integral is zero:

\oint dz\, e^{-m^2 z^2} = \int_{-R}^{R} dz\, e^{-m^2 z^2} + \int_{R}^{R+ic} dz\, e^{-m^2 z^2} + \int_{R+ic}^{-R+ic} dz\, e^{-m^2 z^2} + \int_{-R+ic}^{-R} dz\, e^{-m^2 z^2} = 0

for all finite positive values of R and so also in the limit R → ∞. The two contours in the imaginary direction are of length c and are damped by the factor exp(−m²R²), and so they vanish in the limit R → ∞. They are ghost contours. It follows then from this last equation in the limit R → ∞ that

\int_{-\infty}^{\infty} dx\, e^{-m^2(x+ic)^2} = \int_{-\infty}^{\infty} dx\, e^{-m^2 x^2} = \frac{\sqrt{\pi}}{m}    (6.132)

which is the promised result (4.18). Setting c = k/(2m²) and dividing both sides of (6.132) by √(2π), we see that the Fourier transform of a gaussian is a gaussian (4.19):

\tilde{f}(k) = \int_{-\infty}^{\infty} \frac{dx}{\sqrt{2\pi}}\, e^{-ikx}\, e^{-m^2 x^2} = \frac{1}{\sqrt{2}\,m}\, e^{-k^2/4m^2}.    (6.133)
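Formula (6.133) checks out numerically — a sketch of mine, not from the text; the cutoff ±40 and grid size are arbitrary choices:

```python
import numpy as np

m, k = 1.3, 2.0
x = np.linspace(-40.0, 40.0, 400001)
dx = x[1] - x[0]
integrand = np.exp(-1j * k * x - m**2 * x**2) / np.sqrt(2.0 * np.pi)
ft = np.sum(integrand) * dx          # the integrand is utterly negligible at the cutoffs
exact = np.exp(-k**2 / (4.0 * m**2)) / (np.sqrt(2.0) * m)
assert abs(ft - exact) < 1e-12
```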

Dividing both sides of this formula by 2π and setting x = p/\hbar, k = -\varepsilon \dot q, and m^2 = \varepsilon\hbar/(2m), we get

\int_{-\infty}^{\infty} \frac{dp}{2\pi\hbar}\, \exp\!\left( \frac{i\,\dot q\, p\,\varepsilon}{\hbar} - \frac{\varepsilon\, p^2}{2m\hbar} \right) = \sqrt{\frac{m}{2\pi\hbar\varepsilon}}\; \exp\!\left( -\frac{\varepsilon\, m\, \dot q^2}{2\hbar} \right)    (6.134)

a formula we'll use in Section 20.5 to derive path integrals for partition functions. The earlier relation (6.132) implies (Exercise 6.22) that

\int_{-\infty}^{\infty} dx\, e^{-m^2(x+z)^2} = \int_{-\infty}^{\infty} dx\, e^{-m^2 x^2} = \frac{\sqrt{\pi}}{m}    (6.135)

for m > 0 and arbitrary complex z.

Example 6.24 (A cosine integral)  To compute the integral

I_c = \int_0^{\infty} \frac{\cos x}{q^2 + x^2}\,dx, \qquad q > 0,    (6.136)

we use the evenness of the integrand to extend the integration:

I_c = \frac{1}{2} \int_{-\infty}^{\infty} \frac{\cos x}{q^2 + x^2}\,dx.    (6.137)

We write the cosine as [exp(ix) + exp(−ix)]/2 and factor the denominators:

I_c = \frac{1}{4} \int_{-\infty}^{\infty} \frac{e^{ix}}{(x-iq)(x+iq)}\,dx + \frac{1}{4} \int_{-\infty}^{\infty} \frac{e^{-ix}}{(x-iq)(x+iq)}\,dx.    (6.138)


We promote x to a complex variable z and add the contours z = Re^{iθ} and z = Re^{−iθ}, as θ goes from 0 to π, respectively to the first and second integrals. The term exp(iz) dz/(q² + z²) = exp(iR cos θ − R sin θ) iRe^{iθ} dθ/(q² + R²e^{2iθ}) vanishes in the limit R → ∞, so the first contour is a counterclockwise ghost contour. A similar argument applies to the second (clockwise) contour, and we have

I_c = \frac{1}{4} \oint \frac{e^{iz}}{(z-iq)(z+iq)}\,dz + \frac{1}{4} \oint \frac{e^{-iz}}{(z-iq)(z+iq)}\,dz.    (6.139)

The first integral picks up the pole at iq and the second the pole at −iq:

I_c = \frac{i\pi}{2} \left( \frac{e^{-q}}{2iq} + \frac{e^{-q}}{2iq} \right) = \frac{\pi\, e^{-q}}{2q}.    (6.140)

So the value of the integral is π e^{−q}/2q.
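A brute-force numerical check of (6.140) — mine, not the book's; the cutoff X and grid size N are arbitrary choices, and the oscillatory tail beyond X is O(1/X²) after integration by parts:

```python
import numpy as np

q = 1.0
X, N = 2000.0, 2_000_000
x = (np.arange(N) + 0.5) * (X / N)             # midpoint grid on [0, X]
ic = np.sum(np.cos(x) / (q**2 + x**2)) * (X / N)
exact = np.pi * np.exp(-q) / (2.0 * q)
assert abs(ic - exact) < 1e-5
```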

Example 6.25 (Third-harmonic microscopy)  An ultrashort laser pulse intensely focused in a medium generates a third-harmonic electric field E_3 in the forward direction proportional to the integral (Boyd, 2000)

E_3 \propto \chi^{(3)}\, E_0^3 \int_{-\infty}^{\infty} \frac{e^{i\Delta k\, z}}{(1 + 2iz/b)^2}\,dz    (6.141)

along the axis of the beam as in Fig. 6.6. Here b = 2π t_0² n/λ = k t_0², in which n = n(ω) is the index of refraction of the medium, λ is the wavelength of the laser light in the medium, and t_0 is the transverse or waist radius of the gaussian beam, defined by E(r) = E exp(−r²/t_0²). When the dispersion is normal, that is when dn(ω)/dω > 0, the shift in the wave vector Δk = 3ω[n(ω) − n(3ω)]/c is negative. Since Δk < 0, the exponential is damped when z = x + iy is in the lower half-plane (LHP):

Figure 6.6 (Third-harmonic microscopy; the axis runs from −L to L through regions marked unseen, visible, unseen) In the limit in which the distance L is much larger than the wavelength λ, the integral (6.141) is nonzero when an edge (solid line) lies where the beam is focused but not when a feature (…) lies where the beam is not focused. Only features within the focused region are visible.

e^{i\Delta k\, z} = e^{i\Delta k\,(x+iy)} = e^{i\Delta k\, x}\, e^{-\Delta k\, y}.    (6.142)

So as we did in Example 6.24, we will add a contour around the lower half-plane (z = Re^{iθ}, π ≤ θ ≤ 2π, and dz = iRe^{iθ} dθ) because in the limit R → ∞, the integral along it vanishes; it is a ghost contour. The function f(z) = exp(iΔk z)/(1 + 2iz/b)² has a double pole at z = ib/2, which is in the UHP since the length b > 0, but no singularity in the LHP y < 0. So the integral of f(z) along the closed contour from z = −R to z = R and then along the ghost contour vanishes. But since the integral along the ghost contour vanishes, so does the integral from −R to R. Thus when the dispersion is normal, the third-harmonic signal vanishes, E_3 = 0, as long as the medium with constant χ^{(3)}(z) effectively extends from −∞ to ∞ so that its edges are in the unfocused region like the dotted lines of Fig. 6.6. But an edge with Δk > 0 in the focused region like the solid line of the figure does make a third-harmonic signal E_3. Third-harmonic microscopy lets us see features instead of background.

Example 6.26 (Green and Bessel)

Let us evaluate the Fourier transform

I(x) = \int_{-\infty}^{\infty} dk\, \frac{e^{ikx}}{k^2 + m^2}    (6.143)

of the function 1/(k² + m²). If x > 0, then the exponential decreases with Im k in the upper half-plane. So as in Example 6.24, the semicircular contour k = Re^{iθ} for 0 ≤ θ ≤ π, on which dk = iRe^{iθ} dθ, is a ghost contour. So if x > 0, then we can add this contour to the integral I(x) without changing it. Thus I(x) is equal to the closed contour integral along the real axis and the semicircular ghost contour:

I(x) = \oint dk\, \frac{e^{ikx}}{k^2 + m^2} = \oint dk\, \frac{e^{ikx}}{(k+im)(k-im)}.    (6.144)

This closed contour encircles the simple pole at k = im and no other singularity, and so we may shrink the contour into a tiny circle around the pole. Along that tiny circle, the function e^{ikx}/(k + im) is simply e^{−mx}/2im, and so

I(x) = \frac{e^{-mx}}{2im} \oint \frac{dk}{k-im} = 2\pi i\, \frac{e^{-mx}}{2im} = \frac{\pi\, e^{-mx}}{m} \qquad \text{for } x > 0.    (6.145)

Similarly if x < 0, we can add the semicircular ghost contour k = Re^{iθ}, π ≤ θ ≤ 2π, dk = iRe^{iθ} dθ, with k running around the perimeter of the lower half-plane. So if x < 0, then we can write the integral I(x) as a shrunken closed contour that runs clockwise around the pole at k = −im:

I(x) = \frac{e^{mx}}{-2im} \oint \frac{dk}{k+im} = -2\pi i\, \frac{e^{mx}}{-2im} = \frac{\pi\, e^{mx}}{m} \qquad \text{for } x < 0.    (6.146)

We combine the two cases (6.145) and (6.146) into the result

I(x) = \int_{-\infty}^{\infty} dk\, \frac{e^{ikx}}{k^2 + m^2} = \frac{\pi}{m}\, e^{-m|x|}.    (6.147)
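The Fourier transform (6.143) and its closed form (π/m) e^{−m|x|} can be verified directly — a check of mine, not from the text; the cutoff ±2000 and grid density are arbitrary, and the oscillation makes the truncated tail negligible:

```python
import numpy as np

m, x0 = 1.5, 0.7
k = np.linspace(-2000.0, 2000.0, 2_000_001)
vals = np.exp(1j * k * x0) / (k**2 + m**2)
integral = np.sum(vals) * (k[1] - k[0])
exact = (np.pi / m) * np.exp(-m * abs(x0))
assert abs(integral - exact) < 1e-4
```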


0, we add a ghost contour in the UHP and find J (x ) =

´

eikx dk (k + im )2(k − im )2

= 2m 2 π

If x




[…] To evaluate the integral

I_s = \int_0^{\infty} \frac{dx}{(x+a)^2\,\sqrt{x}}, \qquad a > 0,    (6.205)

we put the cut on the positive real axis. The integral backwards along and just below the positive real axis,

I_- = \int_{\infty}^{0} \frac{dx}{(x+a)^2\,\sqrt{x-i\epsilon}} = -\int_{\infty}^{0} \frac{dx}{(x+a)^2\,\sqrt{x}} = I_s,    (6.206)

is the same as I_s, since a minus sign from the square root cancels the minus sign due to the backwards direction.

6.16 Powers and Roots

Since

\lim_{|z|\to\infty} \frac{|z|}{|z+a|^2\,\sqrt{|z|}} = 0,    (6.207)

the integrals of f(z) = 1/[(z+a)²√z] along the contours z = R exp(iθ) for 0 < θ < π and for π < θ < 2π vanish as R → ∞. So these contours are ghost contours. We then add a pair of cancelling integrals along the negative real axis up to the pole at z = −a and then add a clockwise loop C around it. As in Fig. 6.8, the integral along this collection of contours encloses no singularity and therefore vanishes:

0 = I_s + I_- + I_{G+} + I_{G-} + I_C.    (6.208)

Thus 2I_s = −I_C, and so from Cauchy's integral formula (6.44) for n = 1, we have

I_s = -\frac{1}{2}\, I_C = -\frac{1}{2} \oint_C \frac{dz}{(z+a)^2\,\sqrt{z}} = i\pi \left. \frac{d}{dz}\, z^{-1/2} \right|_{z=-a} = \frac{\pi}{2\,a^{3/2}}    (6.209)

which one may check with the Mathematica command Assuming[a > 0, Integrate[1/((x + a)^2*Sqrt[x]), {x, 0, Infinity}]].

Example 6.37 (Contour integral with a cut)  Let's compute the integral

I = \int_0^{\infty} \frac{x^a}{(x+1)^2}\,dx    (6.210)

for −1 < a < 1. We promote x to a complex variable z and put the cut on the positive real axis. Since

\lim_{|z|\to\infty} \frac{|z|^{a+1}}{|z+1|^2} = 0,    (6.211)

the integrand vanishes faster than 1/|z|, and we may add two ghost contours, G_+ counterclockwise around the upper half-plane and G_− counterclockwise around the lower half-plane, as shown in Fig. 6.8. We add a contour C that runs from −∞ to the double pole at z = −1, loops around that pole, and then runs back to −∞; the two long contours along the negative real axis cancel because the cut in θ lies on the positive real axis. So the contour integral along C is just the clockwise integral around the double pole, which by Cauchy's integral formula (6.42) is

\oint_C \frac{z^a}{(z-(-1))^2}\,dz = -2\pi i \left. \frac{d\,z^a}{dz} \right|_{z=-1} = 2\pi i\, a\, e^{\pi a i}.    (6.212)

We also add the integral I_− from ∞ to 0 just below the real axis,

I_- = \int_{\infty}^{0} \frac{(x-i\epsilon)^a}{(x-i\epsilon+1)^2}\,dx = \int_{\infty}^{0} \frac{\exp\!\big(a(\ln(x) + 2\pi i)\big)}{(x+1)^2}\,dx    (6.213)

which is

I_- = -\,e^{2\pi a i} \int_0^{\infty} \frac{x^a}{(x+1)^2}\,dx = -\,e^{2\pi a i}\, I.    (6.214)

Figure 6.8 (Ghost contours and a cut; contours G_+, I_+, C, I_−, G_−) The integrals of f(z) = 1/[(z+a)²√z] as well as that of f(z) = z^a/(z+1)² along the ghost contours G_+ and G_− and the contours C, I_−, and I_+ vanish because the combined contour encircles no poles of either f(z). The cut (solid line) runs from the origin to infinity along the positive real axis.

Now the sum of all these contour integrals is zero because it is a closed contour that encloses no singularity. So we have

0 = \left( 1 - e^{2\pi a i} \right) I + 2\pi i\, a\, e^{\pi a i}    (6.215)

or

I = \int_0^{\infty} \frac{x^a}{(x+1)^2}\,dx = \frac{\pi a}{\sin(\pi a)}    (6.216)

as the value of the integral (6.210).

Example 6.38 (Euler's reflection formula)  The beta function (5.76) for x = z and y = 1 − z is the integral

B(z, 1-z) = \Gamma(z)\,\Gamma(1-z) = \int_0^1 t^{z-1}\,(1-t)^{-z}\,dt.    (6.217)

Setting t = u/(1+u), so that u = t/(1−t) and dt = du/(1+u)², we have

B(z, 1-z) = \int_0^{\infty} \frac{u^{z-1}}{1+u}\,du.    (6.218)

We integrate f(u) = u^{z−1}/(1+u) along the contour of the preceding example (6.37), which includes the ghost contour G = G_+ ∪ G_− and runs down both sides of the

cut along the positive real axis. Since f(u) is analytic inside the contour, the integral vanishes:

0 = \int_{I_+} f(u)\,du + \int_{G} f(u)\,du + \int_{C} f(u)\,du + \int_{I_-} f(u)\,du.    (6.219)

The clockwise contour C gives

\int_C \frac{u^{z-1}}{1+u}\,du = -2\pi i\,(-1)^{z-1} = 2\pi i\, e^{i\pi z}.    (6.220)

The contour I_+ runs just above the positive real axis, and the integral of f(u) along it is the desired integral B(z, 1−z). The contour I_− runs backwards and just below the cut, where u = |u| − iε:

\int_{I_-} f(u)\,du = -\int_0^{\infty} \frac{\left(|u|\,e^{2\pi i}\right)^{z-1}}{1+u}\,du = -\,e^{2\pi i z} \int_0^{\infty} \frac{u^{z-1}}{1+u}\,du.    (6.221)

Thus the vanishing (6.219) of the contour integral,

0 = B(z, 1-z) + 2\pi i\, e^{i\pi z} - e^{2\pi i z}\, B(z, 1-z)    (6.222)

gives us Euler's reflection formula

B(z, 1-z) = \Gamma(z)\,\Gamma(1-z) = \frac{\pi}{\sin \pi z}.    (6.223)

Example 6.39 (A Matthews and Walker integral)  To do the integral

I = \int_0^{\infty} \frac{dx}{1+x^3}    (6.224)

´

f (z ) dz

=

´

ln(z) dz (z − 1 )( z − ei π/3 )(z − e2i π/3)

2

= − 4π√ i . 3 3

(6.225)

Since ln(x + i ±) = ln(x ) and ln( x − i ±) = ln(x ) + 2π i , while | ± ln(±)| → 0 as ± → 0, that same integral approaches −2π iI as R → ∞. Thus the integral (6.224) is √ I = 2π/(3 3).
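The result of Example 6.39 is simple to confirm by direct quadrature; a minimal sketch (an illustration, not from the text):

```python
import math

# Numerical check of (6.224): integral_0^inf dx/(1 + x^3) = 2*pi/(3*sqrt(3)).
def I(L=2000.0, n=400_000):
    h = L / n
    # midpoint rule; the tail beyond L is of order 1/(2 L^2)
    return sum(h / (1.0 + ((j + 0.5) * h) ** 3) for j in range(n))

print(I(), 2 * math.pi / (3 * math.sqrt(3)))  # both close to 1.2092
```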

6.17 Conformal Mapping

An analytic function f(z) maps curves in the z plane into curves in the f(z) plane. In general, this mapping preserves angles. To see why, we consider the angle d\theta between two tiny complex lines dz = \epsilon \exp(i\theta) and dz' = \epsilon \exp(i\theta') that radiate from the same point z. The angle d\theta = \theta' - \theta is the phase of the ratio

\frac{dz'}{dz} = \frac{\epsilon e^{i\theta'}}{\epsilon e^{i\theta}} = e^{i(\theta' - \theta)}.    (6.226)

Let's use w = \rho e^{i\phi} for f(z). Then the analytic function f(z) maps dz into

dw = f(z + dz) - f(z) \approx f'(z)\, dz    (6.227)

and dz' into

dw' = f(z + dz') - f(z) \approx f'(z)\, dz'.    (6.228)

The angle d\phi = \phi' - \phi between dw and dw' is the phase of the ratio

\frac{dw'}{dw} = \frac{e^{i\phi'}}{e^{i\phi}} = \frac{f'(z)\, dz'}{f'(z)\, dz} = \frac{dz'}{dz} = e^{i(\theta' - \theta)}.    (6.229)

So as long as the derivative f'(z) does not vanish, the angle in the w-plane is the same as the angle in the z-plane

d\phi = d\theta.    (6.230)

Analytic functions preserve angles. They are conformal maps.

What if f'(z) = 0? In this case, dw \approx f''(z)\, dz^2/2 and dw' \approx f''(z)\, dz'^2/2, and so the angle d\phi = \phi' - \phi between these two tiny complex lines is the phase of the ratio

\frac{dw'}{dw} = \frac{e^{i\phi'}}{e^{i\phi}} = \frac{f''(z)\, dz'^2}{f''(z)\, dz^2} = \frac{dz'^2}{dz^2} = e^{2i(\theta' - \theta)}.    (6.231)

So angles are doubled, d\phi = 2\, d\theta. In general, if the first nonzero derivative is f^{(n)}(z), then

\frac{dw'}{dw} = \frac{e^{i\phi'}}{e^{i\phi}} = \frac{f^{(n)}(z)\, dz'^n}{f^{(n)}(z)\, dz^n} = \frac{dz'^n}{dz^n} = e^{ni(\theta' - \theta)}    (6.232)

and so d\phi = n\, d\theta. The angles increase by a factor of n.

Example 6.40 (z^n) The function f(z) = z^n has only one nonzero derivative at the origin z = 0, f^{(k)}(0) = n!\, \delta_{nk}. So at z = 0 the map z \to z^n scales angles by n, d\phi = n\, d\theta, but at z \ne 0 the first derivative f^{(1)}(z) = n z^{n-1} is not equal to zero. So z^n is conformal except at the origin.

Example 6.41 (Möbius transformation) The function

f(z) = \frac{az + b}{cz + d}    (6.233)

maps (straight) lines into lines and circles and maps circles into circles and lines, unless ad = bc, in which case it is the constant b/d.
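The two behaviors, angle preservation away from a critical point and angle doubling at one, can be seen numerically; a minimal sketch for f(z) = z^2 (an illustration, not from the text):

```python
import cmath

# The analytic map f(z) = z^2 preserves angles where f'(z) != 0 (6.230)
# and doubles them at z = 0 where f'(0) = 0 (6.231).
def angle_between(f, z, dz1, dz2, eps=1e-6):
    # angle between the images of two short segments leaving z
    w1 = f(z + eps * dz1) - f(z)
    w2 = f(z + eps * dz2) - f(z)
    return cmath.phase(w2 / w1)

f = lambda z: z * z
dz1, dz2 = 1.0, cmath.exp(0.3j)        # two directions 0.3 radians apart

at_origin = angle_between(f, 0.0, dz1, dz2)        # f'(0) = 0: angle doubles
away = angle_between(f, 1.0 + 1.0j, dz1, dz2)      # f' != 0: angle preserved
print(at_origin, away)  # approximately 0.6 and 0.3
```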


6.18 Cauchy's Principal Value

Suppose that f(x) is differentiable or analytic at and near the point x = 0, and that we wish to evaluate the integral

K = \lim_{\epsilon \to 0} \int_{-a}^{b} \frac{f(x)}{x - i\epsilon}\, dx    (6.234)

for a > 0 and b > 0. First we regularize the pole at x = 0 by using a method devised by Cauchy

K = \lim_{\epsilon \to 0} \lim_{\delta \to 0} \left[ \int_{-a}^{-\delta} \frac{f(x)}{x - i\epsilon}\, dx + \int_{-\delta}^{\delta} \frac{f(x)}{x - i\epsilon}\, dx + \int_{\delta}^{b} \frac{f(x)}{x - i\epsilon}\, dx \right].    (6.235)

In the first and third integrals, since |x| \ge \delta, we may set \epsilon = 0

K = \lim_{\delta \to 0} \left[ \int_{-a}^{-\delta} \frac{f(x)}{x}\, dx + \int_{\delta}^{b} \frac{f(x)}{x}\, dx \right] + \lim_{\delta \to 0} \lim_{\epsilon \to 0} \int_{-\delta}^{\delta} \frac{f(x)}{x - i\epsilon}\, dx.    (6.236)

We'll discuss the first two integrals before analyzing the last one. The limit of the first two integrals is called Cauchy's principal value

P \int_{-a}^{b} \frac{f(x)}{x}\, dx \equiv \lim_{\delta \to 0} \left[ \int_{-a}^{-\delta} \frac{f(x)}{x}\, dx + \int_{\delta}^{b} \frac{f(x)}{x}\, dx \right].    (6.237)

If the function f(x) is nearly constant near x = 0, then the large negative values of 1/x for x slightly less than zero cancel the large positive values of 1/x for x slightly greater than zero. The point x = 0 is not special; Cauchy's principal value about x = y is defined by the limit

P \int_{-a}^{b} \frac{f(x)}{x - y}\, dx \equiv \lim_{\delta \to 0} \left[ \int_{-a}^{y-\delta} \frac{f(x)}{x - y}\, dx + \int_{y+\delta}^{b} \frac{f(x)}{x - y}\, dx \right].    (6.238)

Using Cauchy's principal value, we may write the quantity K as

K = P \int_{-a}^{b} \frac{f(x)}{x}\, dx + \lim_{\delta \to 0} \lim_{\epsilon \to 0} \int_{-\delta}^{\delta} \frac{f(x)}{x - i\epsilon}\, dx.    (6.239)

To evaluate the second integral, we use the differentiability of f(x) near x = 0 to write f(x) = f(0) + x f'(0) and then extract the constants f(0) and f'(0)

\lim_{\delta \to 0} \lim_{\epsilon \to 0} \int_{-\delta}^{\delta} \frac{f(x)}{x - i\epsilon}\, dx = \lim_{\delta \to 0} \lim_{\epsilon \to 0} \int_{-\delta}^{\delta} \frac{f(0) + x f'(0)}{x - i\epsilon}\, dx
= f(0) \lim_{\delta \to 0} \lim_{\epsilon \to 0} \int_{-\delta}^{\delta} \frac{dx}{x - i\epsilon} + f'(0) \lim_{\delta \to 0} \lim_{\epsilon \to 0} \int_{-\delta}^{\delta} \frac{x\, dx}{x - i\epsilon}
= f(0) \lim_{\delta \to 0} \lim_{\epsilon \to 0} \int_{-\delta}^{\delta} \frac{dx}{x - i\epsilon} + f'(0) \lim_{\delta \to 0} 2\delta
= f(0) \lim_{\delta \to 0} \lim_{\epsilon \to 0} \int_{-\delta}^{\delta} \frac{dx}{x - i\epsilon}.    (6.240)

Since 1/(z - i\epsilon) is analytic in the lower half-plane, we may deform the straight contour from x = -\delta to x = \delta into a tiny semicircle that avoids the point x = 0 by setting z = \delta e^{i\theta} and letting \theta run from \pi to 2\pi

K = P \int_{-a}^{b} \frac{f(x)}{x}\, dx + f(0) \lim_{\delta \to 0} \lim_{\epsilon \to 0} \int_{\pi}^{2\pi} \frac{i \delta e^{i\theta}\, d\theta}{\delta e^{i\theta} - i\epsilon}.    (6.241)

We now can set \epsilon = 0 and so write K as

K = P \int_{-a}^{b} \frac{f(x)}{x}\, dx + f(0) \lim_{\delta \to 0} \int_{\pi}^{2\pi} \frac{i \delta e^{i\theta}\, d\theta}{\delta e^{i\theta}} = P \int_{-a}^{b} \frac{f(x)}{x}\, dx + i\pi f(0).    (6.242)

−a

Recalling the definition (6.234) of K , we have

µb

f (x ) lim dx ± →0 −a x − i ±

=P

µb

−a

dx

for any function f ( x ) that is differentiable at x 1

1 = P + i π δ( x ) x −i± x

and

f (x ) + i π f (0) x

(6.243)

= 0. Physicists write this as

1

1 = P − i π δ( x ) x + i± x

(6.244)

or as 1 x − y ´ i±

= P x −1 y ∓ i π δ( x − y).

(6.245)

Example 6.42 (An application of Cauchy's trick) We use (6.244) to evaluate the integral

I = \int_{-\infty}^{\infty} \frac{1}{x + i\epsilon}\, \frac{1}{1 + x^2}\, dx    (6.246)

as

I = P \int_{-\infty}^{\infty} \frac{1}{x}\, \frac{1}{1 + x^2}\, dx - i\pi \int_{-\infty}^{\infty} \frac{\delta(x)}{1 + x^2}\, dx.    (6.247)

Because the function 1/[x(1 + x^2)] is odd, the principal part is zero. The integral over the delta function gives unity, so we have I = -i\pi.
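Cauchy's trick is easy to see at work numerically: with a small but finite epsilon, the real part of the integrand is odd and integrates to nearly zero, while the imaginary part is a narrow Lorentzian that supplies -i pi. A minimal sketch (an illustration, not from the text):

```python
import math

# Example 6.42 numerically: as eps -> 0,
# integral dx / [(x + i*eps)(1 + x^2)] -> -i*pi.
def I(eps, L=200.0, n=400_000):
    h = 2 * L / n
    total = 0.0 + 0.0j
    for k in range(n):
        x = -L + (k + 0.5) * h          # midpoint rule on a symmetric grid
        total += h / ((x + 1j * eps) * (1 + x * x))
    return total

val = I(0.01)
print(val)   # close to -i*pi
```

The symmetric grid makes the odd real part cancel pairwise, which is the discrete analog of the vanishing principal part.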

Example 6.43 (Cubic form of Cauchy's principal value) Cauchy's principal value of the integral

P \int_{-a}^{b} \frac{f(x)}{x^3}\, dx    (6.248)

is finite as long as f(z) is analytic at z = 0 with a vanishing first derivative there, f'(0) = 0. In this case Cauchy's integral formula (6.43) says that

\int_{-a}^{b} \frac{f(x)}{(x - i\epsilon)^3}\, dx = P \int_{-a}^{b} \frac{f(x)}{x^3}\, dx + \lim_{\delta \to 0} \int_{\pi}^{2\pi} i \delta e^{i\theta}\, d\theta\, \frac{f(\delta e^{i\theta})}{(\delta e^{i\theta})^3} = P \int_{-a}^{b} \frac{f(x)}{x^3}\, dx + i \frac{\pi}{2} f''(0).    (6.249)

Example 6.44 (Cauchy's principal value) By explicit use of the formula

\int \frac{dx}{x^2 - a^2} = -\frac{1}{2a} \ln \left| \frac{x + a}{x - a} \right|    (6.250)

one may show (Exercise 6.32) that

P \int_0^{\infty} \frac{dx}{x^2 - a^2} = \lim_{\delta \to 0} \left[ \int_0^{a-\delta} \frac{dx}{x^2 - a^2} + \int_{a+\delta}^{\infty} \frac{dx}{x^2 - a^2} \right] = 0,    (6.251)
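The identity (6.251) can be checked numerically; a minimal sketch (an illustration, not from the text) that folds the interval (0, 2a) symmetrically about the pole at x = a, which turns the principal value into an ordinary integral of the smooth function 2/(t^2 - 4a^2):

```python
import math

# Numerical check of (6.251): P integral_0^inf dx/(x^2 - a^2) = 0.
def pv_integral(a, n=200_000):
    # fold (0, 2a) about the pole: P int_0^{2a} -> int_0^{a} 2 dt/(t^2 - 4a^2)
    h1 = a / n
    folded = sum(2.0 * h1 / (((j + 0.5) * h1) ** 2 - 4 * a * a) for j in range(n))
    # map (2a, inf) to (0, 1/(2a)) with u = 1/x: int du/(1 - a^2 u^2)
    h2 = 1.0 / (2 * a * n)
    outer = sum(h2 / (1.0 - (a * (j + 0.5) * h2) ** 2) for j in range(n))
    return folded + outer

print(pv_integral(2.0))  # close to 0
```

The folded piece equals -ln(3)/(2a) and the outer piece +ln(3)/(2a), so the cancellation is exact in the limit.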

This principal-part identity (6.251) will be used in Section 6.21.

Example 6.45 (sin k/k) To compute the integral

I_s = \int_0^{\infty} \frac{dk}{k} \sin k    (6.252)

which we used to derive the formula (4.113) for the Green's function of the laplacian in 3 dimensions, we first express I_s as an integral along the whole real axis

I_s = \int_0^{\infty} \frac{dk}{2ik} \left( e^{ik} - e^{-ik} \right) = \int_{-\infty}^{\infty} \frac{dk}{2ik}\, e^{ik}    (6.253)

by which we actually mean the Cauchy principal part

I_s = \lim_{\delta \to 0} \left[ \int_{-\infty}^{-\delta} \frac{e^{ik}}{2ik}\, dk + \int_{\delta}^{\infty} \frac{e^{ik}}{2ik}\, dk \right] = P \int_{-\infty}^{\infty} \frac{e^{ik}}{2ik}\, dk.    (6.254)

Using Cauchy's trick (6.244), we have

I_s = P \int_{-\infty}^{\infty} \frac{e^{ik}}{2ik}\, dk = \int_{-\infty}^{\infty} \frac{e^{ik}}{2i(k + i\epsilon)}\, dk + \int_{-\infty}^{\infty} i\pi \delta(k)\, \frac{e^{ik}}{2i}\, dk.    (6.255)

To the first integral, we add a ghost contour around the upper half-plane. For the contour from k = L to k = L + iH and then to k = -L + iH and then down to k = -L, one may show (Exercise 6.35) that the integral of \exp(ik)/k vanishes in the double limit L \to \infty and H \to \infty. With this ghost contour, the first integral therefore vanishes because the pole at k = -i\epsilon is in the lower half-plane. The delta function in the second integral then gives \pi/2, so that

I_s = \oint \frac{e^{ik}}{2i(k + i\epsilon)}\, dk + \frac{\pi}{2} = \frac{\pi}{2}    (6.256)
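The value pi/2 can be checked by direct quadrature; a minimal sketch (an illustration, not from the text) that truncates at a zero of sin k, where the remaining tail is of order 1/(2 N pi):

```python
import math

# Numerical check that Is = integral_0^inf (sin k / k) dk = pi/2.
def Is(N=200, pts_per_period=1000):
    K = 2 * N * math.pi           # truncate at a zero of sin k
    n = N * pts_per_period
    h = K / n
    total = 0.0
    for j in range(n):
        k = (j + 0.5) * h
        total += math.sin(k) / k
    return total * h

print(Is(), math.pi / 2)  # both close to 1.5708
```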

This result agrees with (4.112).

Example 6.46 (The Feynman propagator) Adding \mp i\epsilon to the denominator of a pole term of an integral formula for a function f(x) can slightly shift the pole into the upper or lower half-plane, causing the pole to contribute if a ghost contour goes around the upper half-plane or the lower half-plane. Such an i\epsilon can impose a boundary condition on a Green's function. The Feynman propagator \Delta_F(x) is a Green's function for the Klein–Gordon differential operator (Weinberg, 1995, pp. 274–280)

(m^2 - \Box)\, \Delta_F(x) = \delta^4(x)    (6.257)

in which x = (x^0, \mathbf{x}) and

\Box = \triangle - \frac{\partial^2}{\partial t^2} = \triangle - \frac{\partial^2}{\partial (x^0)^2}    (6.258)

is the 4-dimensional version of the laplacian \triangle \equiv \nabla \cdot \nabla. Here \delta^4(x) is the 4-dimensional Dirac delta function (4.36)

\delta^4(x) = \int \frac{d^4 q}{(2\pi)^4} \exp[i(\mathbf{q} \cdot \mathbf{x} - q^0 x^0)] = \int \frac{d^4 q}{(2\pi)^4}\, e^{iqx}    (6.259)

in which qx = \mathbf{q} \cdot \mathbf{x} - q^0 x^0 is the Lorentz-invariant inner product of the 4-vectors q and x. There are many Green's functions that satisfy Equation (6.257). Feynman's propagator

\Delta_F(x) = \int \frac{d^4 q}{(2\pi)^4} \frac{\exp(iqx)}{q^2 + m^2 - i\epsilon} = \int \frac{d^3 q}{(2\pi)^3} \int_{-\infty}^{\infty} \frac{dq^0}{2\pi} \frac{e^{i\mathbf{q} \cdot \mathbf{x} - iq^0 x^0}}{q^2 + m^2 - i\epsilon}    (6.260)

is the one that satisfies boundary conditions that will become evident when we analyze the effect of its i\epsilon. The quantity E_q = \sqrt{\mathbf{q}^2 + m^2} is the energy of a particle of mass m and momentum \mathbf{q} in natural units with the speed of light c = 1. Using this abbreviation and setting \epsilon' = \epsilon/2E_q, we may write the denominator as

q^2 + m^2 - i\epsilon = \mathbf{q} \cdot \mathbf{q} - (q^0)^2 + m^2 - i\epsilon = (E_q - i\epsilon' - q^0)(E_q - i\epsilon' + q^0) + \epsilon'^2    (6.261)

in which \epsilon'^2 is negligible. Dropping the prime on \epsilon, we do the q^0 integral

I(q) = -\int_{-\infty}^{\infty} \frac{dq^0}{2\pi} \frac{e^{-iq^0 x^0}}{[q^0 - (E_q - i\epsilon)][q^0 - (-E_q + i\epsilon)]}.    (6.262)

Figure 6.9 (Ghost contours and the Feynman propagator) In Equation (6.263), the integrand has poles at \pm(E_q - i\epsilon), and the function \exp(-iq^0 x^0) is exponentially suppressed in the lower half-plane if x^0 > 0 and in the upper half-plane if x^0 < 0. So we can add a ghost contour in the lower half-plane if x^0 > 0 and in the upper half-plane if x^0 < 0.

As shown in Fig. 6.9, the integrand

\frac{e^{-iq^0 x^0}}{[q^0 - (E_q - i\epsilon)][q^0 - (-E_q + i\epsilon)]}    (6.263)

has poles at E_q - i\epsilon and at -E_q + i\epsilon. When x^0 > 0, we can add a ghost contour that goes clockwise around the lower half-plane and get

I(q) = i e^{-iE_q x^0} \frac{1}{2E_q}, \qquad x^0 > 0.    (6.264)

When x^0 < 0, our ghost contour goes counterclockwise around the upper half-plane, and we get

I(q) = i e^{iE_q x^0} \frac{1}{2E_q}, \qquad x^0 < 0.    (6.265)

Using the step function \theta(x) = (x + |x|)/(2|x|), we combine (6.264) and (6.265)

-i I(q) = \frac{1}{2E_q} \left[ \theta(x^0)\, e^{-iE_q x^0} + \theta(-x^0)\, e^{iE_q x^0} \right].    (6.266)

In terms of the Lorentz-invariant function

\Delta_+(x) = \frac{1}{(2\pi)^3} \int \frac{d^3 q}{2E_q} \exp[i(\mathbf{q} \cdot \mathbf{x} - E_q x^0)]    (6.267)

and with a factor of -i, Feynman's propagator (6.260) is

-i \Delta_F(x) = \theta(x^0)\, \Delta_+(x) + \theta(-x^0)\, \Delta_+(\mathbf{x}, -x^0).    (6.268)

The integral (6.267) defining \Delta_+(x) is insensitive to the sign of \mathbf{q}, and so

\Delta_+(-x) = \frac{1}{(2\pi)^3} \int \frac{d^3 q}{2E_q} \exp[i(-\mathbf{q} \cdot \mathbf{x} + E_q x^0)] = \frac{1}{(2\pi)^3} \int \frac{d^3 q}{2E_q} \exp[i(\mathbf{q} \cdot \mathbf{x} + E_q x^0)] = \Delta_+(\mathbf{x}, -x^0).    (6.269)

Thus we arrive at the standard form of the Feynman propagator

-i \Delta_F(x) = \theta(x^0)\, \Delta_+(x) + \theta(-x^0)\, \Delta_+(-x).    (6.270)

The annihilation operators a(\mathbf{q}) and the creation operators a^\dagger(\mathbf{p}) of a scalar field \phi(x) satisfy in natural units the commutation relations

[a(\mathbf{q}), a^\dagger(\mathbf{p})] = \delta^3(\mathbf{q} - \mathbf{p}) \quad \text{and} \quad [a(\mathbf{q}), a(\mathbf{p})] = [a^\dagger(\mathbf{q}), a^\dagger(\mathbf{p})] = 0.    (6.271)

Thus the commutator of the positive-frequency part

\phi^+(x) = \int \frac{d^3 p}{\sqrt{(2\pi)^3\, 2p^0}} \exp[i(\mathbf{p} \cdot \mathbf{x} - p^0 x^0)]\, a(\mathbf{p})    (6.272)

of a scalar field \phi = \phi^+ + \phi^- with its negative-frequency part

\phi^-(y) = \int \frac{d^3 q}{\sqrt{(2\pi)^3\, 2q^0}} \exp[-i(\mathbf{q} \cdot \mathbf{y} - q^0 y^0)]\, a^\dagger(\mathbf{q})    (6.273)

is the Lorentz-invariant function \Delta_+(x - y)

[\phi^+(x), \phi^-(y)] = \int \frac{d^3 p\, d^3 q}{(2\pi)^3\, 2\sqrt{q^0 p^0}}\, e^{ipx - iqy}\, [a(\mathbf{p}), a^\dagger(\mathbf{q})] = \int \frac{d^3 p}{(2\pi)^3\, 2p^0}\, e^{ip(x-y)} = \Delta_+(x - y)    (6.274)

in which p(x - y) = \mathbf{p} \cdot (\mathbf{x} - \mathbf{y}) - p^0 (x^0 - y^0). At points x that are space-like, that is, for which x^2 = \mathbf{x}^2 - (x^0)^2 \equiv r^2 > 0, the Lorentz-invariant function \Delta_+(x) depends only upon r = +\sqrt{x^2} and has the value (Weinberg, 1995, p. 202)

\Delta_+(x) = \frac{m}{4\pi^2 r}\, K_1(mr)    (6.275)

in which the Hankel function K_1 is

K_1(z) = -\frac{\pi}{2} [J_1(iz) + i N_1(iz)] = \frac{1}{z} + \frac{z}{2} \left[ \ln\left(\frac{z}{2}\right) + \gamma - \frac{1}{2} \right] + \cdots    (6.276)

where J_1 is the first Bessel function, N_1 is the first Neumann function, and \gamma = 0.57721\ldots is the Euler–Mascheroni constant.

The Feynman propagator arises most simply as the mean value in the vacuum of the time-ordered product of the fields \phi(x) and \phi(y)

T\{\phi(x)\phi(y)\} \equiv \theta(x^0 - y^0)\, \phi(x)\phi(y) + \theta(y^0 - x^0)\, \phi(y)\phi(x).    (6.277)

The operators a(\mathbf{p}) and a^\dagger(\mathbf{p}) respectively annihilate the vacuum ket, a(\mathbf{p})|0\rangle = 0, and bra, \langle 0|a^\dagger(\mathbf{p}) = 0, and so by (6.272 and 6.273) do the positive- and negative-frequency parts of the field, \phi^+(z)|0\rangle = 0 and \langle 0|\phi^-(z) = 0. Thus the mean value in the vacuum of the time-ordered product is

\langle 0|T\{\phi(x)\phi(y)\}|0\rangle = \langle 0|\theta(x^0 - y^0)\, \phi(x)\phi(y) + \theta(y^0 - x^0)\, \phi(y)\phi(x)|0\rangle
= \langle 0|\theta(x^0 - y^0)\, \phi^+(x)\phi^-(y) + \theta(y^0 - x^0)\, \phi^+(y)\phi^-(x)|0\rangle
= \langle 0|\theta(x^0 - y^0)\, [\phi^+(x), \phi^-(y)] + \theta(y^0 - x^0)\, [\phi^+(y), \phi^-(x)]|0\rangle.    (6.278)

But by (6.274), these commutators are \Delta_+(x - y) and \Delta_+(y - x). Thus the mean value in the vacuum of the time-ordered product

\langle 0|T\{\phi(x)\phi(y)\}|0\rangle = \theta(x^0 - y^0)\, \Delta_+(x - y) + \theta(y^0 - x^0)\, \Delta_+(y - x) = -i \Delta_F(x - y)    (6.279)

is the Feynman propagator (6.268) multiplied by -i.

6.19 Dispersion Relations

In many physical contexts, functions occur that are analytic in the upper half-plane. Suppose for instance that \hat f(t) is a transfer function that determines an effect e(t) due to a cause c(t)

e(t) = \int_{-\infty}^{\infty} dt'\, \hat f(t - t')\, c(t').    (6.280)

If the system is causal, then the transfer function \hat f(t - t') is zero for t - t' < 0, and so its Fourier transform

f(z) = \int_{-\infty}^{\infty} \frac{dt}{\sqrt{2\pi}}\, \hat f(t)\, e^{izt} = \int_{0}^{\infty} \frac{dt}{\sqrt{2\pi}}\, \hat f(t)\, e^{izt}    (6.281)

will be analytic in the upper half-plane and will shrink as the imaginary part of z = x + iy increases. So let us assume that the function f(z) is analytic in the upper half-plane and on the real axis and further that

\lim_{r \to \infty} |f(r e^{i\theta})| = 0 \quad \text{for} \quad 0 \le \theta \le \pi.    (6.282)

By Cauchy's integral formula (6.40), if z_0 lies in the upper half-plane, then f(z_0) is given by the closed counterclockwise contour integral

f(z_0) = \frac{1}{2\pi i} \oint \frac{f(z)}{z - z_0}\, dz    (6.283)

in which the contour runs along the real axis and then loops over the semicircle

\lim_{r \to \infty} r e^{i\theta} \quad \text{for} \quad 0 \le \theta \le \pi.    (6.284)

Our assumption (6.282) about the behavior of f(z) in the upper half-plane implies that this semicircle (6.284) is a ghost contour because its modulus is bounded by

\lim_{r \to \infty} \frac{1}{2\pi} \int_0^{\pi} \frac{|f(r e^{i\theta})|\, r\, d\theta}{r} = \lim_{r \to \infty} |f(r e^{i\theta})| = 0.    (6.285)

So we may drop the ghost contour and write f(z_0) as

f(z_0) = \frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{f(x)}{x - z_0}\, dx.    (6.286)

Letting the imaginary part y_0 of z_0 = x_0 + iy_0 shrink to \epsilon

f(x_0) = \frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{f(x)}{x - x_0 - i\epsilon}\, dx    (6.287)

and using Cauchy's trick (6.245), we get

f(x_0) = \frac{1}{2\pi i}\, P \int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\, dx + \frac{i\pi}{2\pi i} \int_{-\infty}^{\infty} f(x)\, \delta(x - x_0)\, dx    (6.288)

or

f(x_0) = \frac{1}{2\pi i}\, P \int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\, dx + \frac{1}{2} f(x_0)    (6.289)

which is the dispersion relation

f(x_0) = \frac{1}{\pi i}\, P \int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\, dx.    (6.290)

If we break f(z) = u(z) + iv(z) into its real u(z) and imaginary v(z) parts, then this dispersion relation (6.290)

u(x_0) + i v(x_0) = \frac{1}{\pi i}\, P \int_{-\infty}^{\infty} \frac{u(x) + iv(x)}{x - x_0}\, dx = \frac{1}{\pi}\, P \int_{-\infty}^{\infty} \frac{v(x)}{x - x_0}\, dx - \frac{i}{\pi}\, P \int_{-\infty}^{\infty} \frac{u(x)}{x - x_0}\, dx    (6.291)

breaks into its real and imaginary parts

u(x_0) = \frac{1}{\pi}\, P \int_{-\infty}^{\infty} \frac{v(x)}{x - x_0}\, dx \quad \text{and} \quad v(x_0) = -\frac{1}{\pi}\, P \int_{-\infty}^{\infty} \frac{u(x)}{x - x_0}\, dx    (6.292)

which express u and v as Hilbert transforms of each other.

In applications of dispersion relations, the function f(x) for x < 0 sometimes is either physically meaningless or experimentally inaccessible. In such cases, there may be a symmetry that relates f(-x) to f(x). For instance, if f(x) is the Fourier transform of a real function \hat f(k), then by Eq. (4.25) it obeys the symmetry relation

f^*(x) = u(x) - iv(x) = f(-x) = u(-x) + iv(-x),    (6.293)

which says that u is even, u(-x) = u(x), and v odd, v(-x) = -v(x). Using these symmetries, one may show (Exercise 6.38) that the Hilbert transformations (6.292) become

u(x_0) = \frac{2}{\pi}\, P \int_0^{\infty} \frac{x\, v(x)}{x^2 - x_0^2}\, dx \quad \text{and} \quad v(x_0) = -\frac{2x_0}{\pi}\, P \int_0^{\infty} \frac{u(x)}{x^2 - x_0^2}\, dx    (6.294)

which do not require input at negative values of x.

6.20 Kramers–Kronig Relations

If we use J = \sigma E for the current density and E(t) = e^{-i\omega t} E for the electric field, then Maxwell's equation \nabla \times B = \mu J + \epsilon\mu \dot E becomes

\nabla \times B = -i\omega \epsilon\mu \left( 1 + i \frac{\sigma}{\epsilon\omega} \right) E \equiv -i\omega n^2 \epsilon_0 \mu_0 E    (6.295)

in which the squared index of refraction is

n^2(\omega) = \frac{\epsilon\mu}{\epsilon_0\mu_0} \left( 1 + i \frac{\sigma}{\epsilon\omega} \right).    (6.296)

The imaginary part of n^2 represents the scattering of light mainly by electrons. At high frequencies in nonmagnetic materials n^2(\omega) \to 1, and so Kramers and Kronig applied the Hilbert-transform relations (6.294) to the function n^2(\omega) - 1 in order to satisfy condition (6.282). Their relations are

Re(n^2(\omega_0)) = 1 + \frac{2}{\pi}\, P \int_0^{\infty} \frac{\omega\, Im(n^2(\omega))}{\omega^2 - \omega_0^2}\, d\omega    (6.297)

and

Im(n^2(\omega_0)) = -\frac{2\omega_0}{\pi}\, P \int_0^{\infty} \frac{Re(n^2(\omega)) - 1}{\omega^2 - \omega_0^2}\, d\omega.    (6.298)

What Kramers and Kronig actually wrote was slightly different from these dispersion relations (6.297 and 6.298). H. A. Lorentz had shown that the index of refraction n(\omega) is related to the forward scattering amplitude f(\omega) for the scattering of light by a density N of scatterers (Sakurai, 1982)

n(\omega) = 1 + \frac{2\pi c^2}{\omega^2}\, N f(\omega).    (6.299)

They used this formula to infer that the real part of the index of refraction approached unity in the limit of infinite frequency and applied the Hilbert transform (6.294)

Re[n(\omega)] = 1 + \frac{2}{\pi}\, P \int_0^{\infty} \frac{\omega'\, Im[n(\omega')]}{\omega'^2 - \omega^2}\, d\omega'.    (6.300)

The Lorentz relation (6.299) expresses the imaginary part Im[n(\omega)] of the index of refraction in terms of the imaginary part of the forward scattering amplitude f(\omega)

Im[n(\omega)] = 2\pi (c/\omega)^2 N\, Im[f(\omega)].    (6.301)

And the optical theorem relates Im[f(\omega)] to the total cross-section

\sigma_{tot} = \frac{4\pi}{|k|}\, Im[f(\omega)] = \frac{4\pi c}{\omega}\, Im[f(\omega)].    (6.302)

Thus we have Im[n(\omega)] = cN\sigma_{tot}/(2\omega), and by the Lorentz relation (6.299), Re[n(\omega)] = 1 + 2\pi (c/\omega)^2 N\, Re[f(\omega)]. Insertion of these formulas into the Kramers–Kronig integral (6.300) gives a dispersion relation for the real part of the forward scattering amplitude f(\omega) in terms of the total cross-section

Re[f(\omega)] = \frac{\omega^2}{2\pi^2 c}\, P \int_0^{\infty} \frac{\sigma_{tot}(\omega')}{\omega'^2 - \omega^2}\, d\omega'.    (6.303)
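A Kramers–Kronig relation of the form (6.297) can be verified numerically on a simple model. The sketch below (an illustration, not from the text) uses the damped-oscillator susceptibility n^2(omega) - 1 = 1/(omega_0^2 - omega^2 - i gamma omega), whose poles lie in the lower half-plane, and handles the principal value by subtracting a constant as in the identity (6.251):

```python
import math

# Kramers-Kronig check (6.297) for n^2(w) - 1 = 1/(w0^2 - w^2 - i*gamma*w).
w0, gamma = 2.0, 0.5

def chi(w):                      # n^2(w) - 1 for this model
    return 1.0 / (w0 * w0 - w * w - 1j * gamma * w)

def kk_real(a, L=500.0, n=500_000):
    # P int g(w)/(w^2-a^2) dw = int [g(w)-g(a)]/(w^2-a^2) dw, by (6.251)
    g = lambda w: w * chi(w).imag
    ga = a * chi(a).imag
    h = L / n
    total = 0.0
    for j in range(n):
        w = (j + 0.5) * h
        total += (g(w) - ga) / (w * w - a * a)
    return (2.0 / math.pi) * total * h

a = 1.5
print(kk_real(a), chi(a).real)   # both close to 0.483
```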

6.21 Phase and Group Velocities

Suppose A(x, t) is the amplitude

A(x, t) = \int e^{i(p \cdot x - Et)/\hbar}\, A(p)\, d^3 p = \int e^{i(k \cdot x - \omega t)}\, B(k)\, d^3 k    (6.304)

where B(k) = \hbar^3 A(\hbar k) varies slowly compared to the phase \exp[i(k \cdot x - \omega t)]. The phase velocity v_p is the linear relation x = v_p t between x and t that keeps the phase \phi = p \cdot x - Et constant as a function of the time

0 = p \cdot dx - E\, dt = (p \cdot v_p - E)\, dt \iff v_p = \frac{E}{p}\, \hat p = \frac{\omega}{k}\, \hat k    (6.305)

in which p = |p| and k = |k|. For light in the vacuum, v_p = c = (\omega/k)\, \hat k. For a particle of mass m > 0, the phase velocity exceeds the speed of light, v_p = c \sqrt{p^2 + m^2 c^2}/p \ge c.

The more physical group velocity v_g is the linear relation x = v_g t between x and t that maximizes the amplitude A(x, t) by keeping the phase \phi = p \cdot x - Et constant as a function of the momentum p

\nabla_p (p \cdot x - Et) = x - \nabla_p E(p)\, t = 0    (6.306)

at the maximum of A(p). This condition of stationary phase gives the group velocity as

v_g = \nabla_p E(p) = \nabla_k \omega(k).    (6.307)

If E = p^2/(2m), then v_g = p/m. For a relativistic particle with E = \sqrt{c^2 p^2 + m^2 c^4}, the group velocity is v_g = c^2 p/E, and v_g \le c.

When light traverses a medium with a complex index of refraction n(k), the wave vector k becomes complex, and its (positive) imaginary part represents the scattering of photons in the forward direction, typically by the electrons of the medium. For simplicity, we'll consider the propagation of light through a medium in 1 dimension, that of the forward direction of the beam. Then the (real) frequency \omega(k) and the (complex) wave number k are related by k = n(k)\, \omega(k)/c, and the phase velocity of the light is

v_p = \frac{\omega}{Re(k)} = \frac{c}{Re(n(k))}.    (6.308)

If we regard the index of refraction as a function of the frequency \omega, instead of the wave number k, then by differentiating the real part of the relation \omega n(\omega) = ck with respect to \omega, we find

n_r(\omega) + \omega \frac{dn_r(\omega)}{d\omega} = c \frac{dk_r}{d\omega}    (6.309)

in which the subscript r means real part. Thus the group velocity (6.307) of the light is

v_g = \frac{d\omega}{dk_r} = \frac{c}{n_r(\omega) + \omega\, dn_r/d\omega}.    (6.310)

Optical physicists call the denominator the group index of refraction

n_g(\omega) = n_r(\omega) + \omega \frac{dn_r(\omega)}{d\omega}    (6.311)

so that, as in the expression (6.308) for the phase velocity v_p = c/n_r(\omega), the group velocity is v_g = c/n_g(\omega).

In some media, the derivative dn_r/d\omega is large and positive, and the group velocity v_g of light there can be much less than c (Steinberg et al., 1993; Wang and Zhang, 1995), as slow as 17 m/s (Hau et al., 1999). This effect is called slow light. In certain other media, the derivative dn_r/d\omega is so negative that the group index of refraction n_g(\omega) is less than unity, and in them the group velocity v_g exceeds c! This effect is called fast light. In some media, the derivative dn_r/d\omega is so negative that dn_r/d\omega < -n_r(\omega)/\omega, and then n_g(\omega) is not only less than unity but also less than zero. In such a medium, the group velocity v_g of light is negative! This effect is called backwards light.

Sommerfeld and Brillouin (Brillouin, 1960, ch. II & III) anticipated fast light and concluded that it would not violate special relativity as long as the signal velocity, defined as the speed of the front of a square pulse, remained less than c. Fast light does not violate special relativity (Stenner et al., 2003; Brunner et al., 2004) (Léon Brillouin 1889–1969, Arnold Sommerfeld 1868–1951).

Slow, fast, and backwards light can occur when the frequency \omega of the light is near a peak or resonance in the total cross-section \sigma_{tot} for the scattering of light by the atoms of the medium. To see why, recall that the index of refraction n(\omega) is related to the forward scattering amplitude f(\omega) and the density N of scatterers by the formula (6.299)

n(\omega) = 1 + \frac{2\pi c^2}{\omega^2}\, N f(\omega)    (6.312)

and that the real part of the forward scattering amplitude is given by the Kramers–Kronig integral (6.303) of the total cross-section

Re(f(\omega)) = \frac{\omega^2}{2\pi^2 c}\, P \int_0^{\infty} \frac{\sigma_{tot}(\omega')\, d\omega'}{\omega'^2 - \omega^2}.    (6.313)

So the real part of the index of refraction is

n_r(\omega) = 1 + \frac{cN}{\pi}\, P \int_0^{\infty} \frac{\sigma_{tot}(\omega')\, d\omega'}{\omega'^2 - \omega^2}.    (6.314)

If the amplitude for forward scattering is of the Breit–Wigner form

f(\omega) = f_0 \frac{\Gamma/2}{\omega_0 - \omega - i\Gamma/2}    (6.315)

then by (6.312) the real part of the index of refraction is

n_r(\omega) = 1 + \frac{\pi c^2 N f_0 \Gamma (\omega_0 - \omega)}{\omega^2 \left[ (\omega - \omega_0)^2 + \Gamma^2/4 \right]}    (6.316)

and by (6.310) the group velocity is

v_g = c \left\{ 1 + \pi c^2 N f_0 \Gamma \omega_0 \frac{(\omega - \omega_0)^2 - \Gamma^2/4}{\omega^2 \left[ (\omega - \omega_0)^2 + \Gamma^2/4 \right]^2} \right\}^{-1}.    (6.317)

This group velocity v_g is less than c whenever (\omega - \omega_0)^2 > \Gamma^2/4. But we get fast light, v_g > c, if (\omega - \omega_0)^2 < \Gamma^2/4, and even backwards light, v_g < 0, if \omega \approx \omega_0 with 4\pi c^2 N f_0/\Gamma\omega_0 \gg 1. Robert W. Boyd's papers explain how to make slow and fast light (Bigelow et al., 2003) and backwards light (Gehring et al., 2006).

We can use the principal-part identity (6.251) to subtract

0 = \frac{cN}{\pi}\, \sigma_{tot}(\omega)\, P \int_0^{\infty} \frac{d\omega'}{\omega'^2 - \omega^2}    (6.318)

from the Kramers–Kronig integral (6.314) so as to write the index of refraction in the regularized form

n_r(\omega) = 1 + \frac{cN}{\pi}\, P \int_0^{\infty} \frac{\sigma_{tot}(\omega') - \sigma_{tot}(\omega)}{\omega'^2 - \omega^2}\, d\omega'    (6.319)

which we can differentiate and use in the group-velocity formula (6.310)

v_g(\omega) = c \left\{ 1 + \frac{cN}{\pi}\, P \int_0^{\infty} \frac{\left[ \sigma_{tot}(\omega') - \sigma_{tot}(\omega) \right] (\omega'^2 + \omega^2)}{(\omega'^2 - \omega^2)^2}\, d\omega' \right\}^{-1}.    (6.320)
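The slow, fast, and backwards regimes of (6.316) and (6.317) can be explored numerically; a minimal sketch (an illustration, not from the text) in units with c = 1, using arbitrary model values and writing A for the combination pi c^2 N f_0. The numerical derivative of (6.316) is compared with the near-resonance bracket of (6.317):

```python
# Group index for the Breit-Wigner n_r of (6.316), units c = 1.
w0, G, A = 10.0, 0.2, 1.0          # omega_0, Gamma, and A = pi c^2 N f_0

def n_r(w):                        # equation (6.316)
    return 1.0 + A * G * (w0 - w) / (w * w * ((w - w0) ** 2 + G * G / 4))

def n_g(w, dw=1e-5):               # group index n_r + w dn_r/dw, numerically
    return n_r(w) + w * (n_r(w + dw) - n_r(w - dw)) / (2 * dw)

def n_g_formula(w):                # the bracket of (6.317)
    return 1.0 + A * G * w0 * ((w - w0) ** 2 - G * G / 4) / (
        w * w * ((w - w0) ** 2 + G * G / 4) ** 2)

for w in (w0, w0 + G, w0 + 5 * G):
    print(w, n_g(w), n_g_formula(w))
# at w = w0 the group index is negative (backwards light);
# just off resonance it exceeds 1 (slow light)
```

With these values 4A/(Gamma omega_0) = 2 > 1, so the group index at resonance is 1 - 2 = -1: backwards light. The small discrepancy between the two columns off resonance reflects the near-resonance approximation built into (6.317).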

6.22 Method of Steepest Descent

Integrals like

I(x) = \int_a^b dz\, h(z)\, \exp(x f(z))    (6.321)

often are dominated by the exponential. We'll assume that x is real and that the functions h(z) and f(z) are analytic in a simply connected region in which a and b are interior points. Then the value of the integral I(x) is independent of the contour between the endpoints a and b but is sensitive to the real part u(z) of f(z) = u(z) + iv(z). But since f(z) is analytic, its real and imaginary parts u(z) and v(z) are harmonic functions which have no minima or maxima, only saddle points (6.56).

For simplicity, we'll assume that the real part u(z) of f(z) has only one saddle point between the points a and b. (If it has more than one, then we must repeat the computation that follows.) If w is the saddle point, then u_x = u_y = 0, which by the Cauchy–Riemann equations (6.10) implies that v_x = v_y = 0. Thus the derivative of the function f also vanishes at the saddle point, f'(w) = 0, and so near w we may approximate f(z) as

f(z) \approx f(w) + \frac{1}{2} (z - w)^2 f''(w).    (6.322)

Let's write the second derivative as f''(w) = \rho e^{i\phi} and choose our contour through the saddle point w to be a straight line z = w + y e^{i\theta} with \theta fixed for z near w. As we vary y along this line, we want

(z - w)^2 f''(w) = y^2 \rho\, e^{2i\theta} e^{i\phi}

to be real and negative, so we keep 2\theta + \phi = \pi.

Example 6.47 (The Virasoro algebra) The radial-order symbol R applies the order U(z)V(w) to operators with |z| > |w| and the order V(w)U(z) to operators with |z| < |w|. The operator-product expansion of the energy–momentum tensor T(z) with itself is

R\{T(z) T(w)\} = \frac{c/2}{(z - w)^4} + \frac{2 T(w)}{(z - w)^2} + \frac{T'(w)}{z - w} + \cdots    (6.337)

in which the prime means derivative, c is a constant, and the dots denote terms that are analytic in z and w. The commutator introduces a minus sign that cancels most of the two contour integrals and converts what remains into an integral along a tiny circle C_w about the point w as in Fig. 6.10

[L_m, L_n] = \oint \frac{dw}{2\pi i}\, w^{n+1} \oint_{C_w} \frac{dz}{2\pi i}\, z^{m+1} \left[ \frac{c/2}{(z - w)^4} + \frac{2 T(w)}{(z - w)^2} + \frac{T'(w)}{z - w} \right].    (6.338)

After doing the z-integral, which is left as a homework exercise (6.41), one may use the Laurent series (6.334) for T(w) to do the w-integral, which one may choose to be along a tiny circle about w = 0, and so find the commutator

[L_m, L_n] = (m - n) L_{m+n} + \frac{c}{12}\, m (m^2 - 1)\, \delta_{m+n,0}    (6.339)

of the Virasoro algebra.
c /2

Example 6.48(Using ghost contours to sum series) I

=

´

C

csc π z dz (z − a )2

2T (w) ( z − w)2

Consider the integral

along the counterclockwise rectangular contour C from z = N + 1/2 − iY to z = N +1/2+iY to z = − N −1/ 2+ iY to z = − N −1/2−iY and back to z = N + 1/2−iY

Exercises

243

in which N is a positive integer, and a is not an integer. In the twin limits N → ∞ and Y → ∞ , the integral vanishes because on the contour 1/| z − a|2 ≈ 1/ N 2 or 1/ Y 2 while | csc π z | ≤ 1. We now shrink the contour down to tiny circles about the poles of csc π z at all the integers, z = n, and about the nonintegral value, z = a. By Cauchy’s integral formula (6.42), the tiny contour integral around z = a is

´

a

csc π z dz (z − a)2

=

½

d csc π z ½½ 2π i dz ½z =a

= − 2 π 2i

cos π a sin2 π a

.

In the twin limits N → ∞ and Y → ∞, the tiny counterclockwise integrals around the poles of 1/ sin π z at z = nπ are (Exercise 6.44)

∞ ´ ±

n=−∞ n

csc π z dz (z − a)2

= 2i

∞ ±

n=−∞

−1)n (n −1 a)2 .

(

We thus have the sum rule

∞ ±

n=−∞

−1)n (n −1 a)2 = π 2 cot π a csc π a.

(

Further Reading For examples of conformal mappings see (Lin, 2011, section 3.5.7).

Exercises

6.1 Compute the two limits (6.6) and (6.7) of Example 6.2 but for the function f (x , y ) = x 2 − y 2 + 2i x y. Do the limits now agree? Explain. 6.2 Show that if f (z ) is analytic in a disk, then the integral of f ( z) around a tiny (isosceles) triangle of side ± ¹ 1 inside the disk is zero to order ± 2 . 6.3 Show that the product f ( z) g ( z) of two functions is analytic at z if both f ( z) and g (z ) are analytic at z. 6.4 Derive the two integral representations (6.54) for Bessel’s functions J n (t ) of the first kind from the integral formula (6.53). Hint: Think of the integral (6.53) as running from −π to π . 6.5 Do the integral

´

dz C

z2

−1

in which the contour C is counterclockwise about the circle |z| = 2. 6.6 The function f ( z) = 1/z is analytic in the region |z| > 0. Compute the integral of f ( z) counterclockwise along the unit circle z = e i θ for 0 ≤ θ ≤ 2π .

244

6 Complex-Variable Theory

The contour lies entirely within the domain of analyticity of the function f ( z). Did you get zero? Why? If not, why not? 6.7 Let P ( z) be the polynomial P ( z) = (z − a 1 )(z − a 2 )( z − a 3 )

(6.340)

with roots a1 , a 2 , and a3 . Let R be the maximum of the three moduli |ak |. (a) If the three roots are all different, evaluate the integral I

=

´

C

dz P ( z)

(6.341)

along the counterclockwise contour z = 2Rei θ for 0 ≤ θ ≤ 2 π . (b) Same exercise, but for a 1 = a2 ²= a 3 . 6.8 Compute the integral of the function f ( z) = e az /( z2 − 3z + 2 ) along the counterclockwise contour C± that follows the perimeter of a square of side 6 centered at the origin. That is, find

=

I

´

e az dz . 2 C± z − 3z + 2

(6.342)

6.9 Use Cauchy’s integral formula (6.44) and Rodrigues’s expression (6.45) for Legendre’s polynomial Pn (x ) to derive Schlaefli’s formula (6.46). 6.10 Use Schlaefli’s formula (6.46) for the Legendre polynomials and Cauchy’s integral formula (6.40) to compute the value of Pn (−1) . 6.11 Evaluate the counterclockwise integral around the unit circle |z| = 1

´ ¸

3 sinh2 2z − 4 cosh3 z

¹ dz z

.

(6.343)

6.12 Evaluate the counterclockwise integral around the circle |z| = 2

´

z3

z4 − 1

dz .

(6.344)

6.13 Evaluate the contour integral of the function f ( z) = sin w z /(z − 5) 3 along the curve z = 6 + 4 (cos t + i sin t ) for 0 ≤ t ≤ 2π . 6.14 Evaluate the contour integral of the function f ( z) = sin w z /(z − 5) 3 along the curve z = − 6 + 4(cos t + i sin t ) for 0 ≤ t ≤ 2π . 6.15 Is the function f ( x , y ) = x 2 + i y 2 analytic? 6.16 Is the function f ( x , y ) = x 3 − 3x y 2 + 3i x 2 y − i y3 analytic? Is the function x 3 − 3x y2 harmonic? Does it have a minimum or a maximum? If so, what are they? 6.17 Is the function f ( x , y ) = x 2 + y 2 + i ( x 2 + y 2 ) analytic? Is x 2 + y 2 a harmonic function? What is its minimum, if it has one?

Exercises

245

6.18 Derive the first three nonzero terms of the Laurent series for f ( z) = 1/(e z −1) about z = 0. 6.19 Assume that a function g (z ) is meromorphic in R and has a Laurent series (6.103) about a point w ∈ R. Show that as z → w , the ratio g ± (z)/ g ( z) becomes (6.101). 6.20 Use a contour integral to evaluate the integral Ia

=

µ

π

0

dθ , a + cos θ

a

>

1.

(6.345)

6.21 Find the poles and residues of the functions 1/ sin z and 1/ cos z. 6.22 Derive the integral formula (6.135) from (6.132). 6.23 Show that if Re w < 0, then for arbitrary complex z

µ



e

−∞

w(x

Ã

+z ) 2 d x

= −πw .

(6.346)

6.24 Use a ghost contour to evaluate the integral

µ∞

x sin x dx. −∞ x 2 + a 2 Show your work; do not just quote the result of a commercial math program. 6.25 For a > 0 and b2 − 4ac < 0, use a ghost contour to do the integral

µ



−∞ 6.26 Show that

µ∞ 0

6.27 Show that

6.28 Evaluate the integral

ax 2

dx + bx + c .

cos ax e −x d x 2

= 21 √ π e−a

µ∞

dx −∞ 1 + x 4

µ

∞ cos x 0

= √π

1 + x4

2

dx.

.

(6.347)

2 /4

.

(6.348)

(6.349)

(6.350)

6.29 Show that the Yukawa Green’s function (6.164) reproduces the Yukawa √ potential (6.154) when n = 3. Use K 1/2 ( x ) = π/2x e −x (10.109). 6.30 Derive the two explicit formulas (6.201) and (6.202) for the square root of a complex number. 6.31 What is (−i )i ? What is the most general value of this expression? 6.32 Use the indefinite integral (6.250) to derive the principal-part formula (6.251).


6.33 The Bessel function $J_n(x)$ is given by the integral
$$ J_n(x) = \frac{1}{2\pi i} \oint_C e^{(x/2)(z - 1/z)}\, \frac{dz}{z^{n+1}} \tag{6.351} $$
along a counterclockwise contour about the origin. Find the generating function for these Bessel functions, that is, the function $G(x, z)$ whose Laurent series has the $J_n(x)$'s as coefficients
$$ G(x, z) = \sum_{n=-\infty}^{\infty} J_n(x)\, z^n. \tag{6.352} $$

6.34 Show that the Heaviside function $\theta(y) = (y + |y|)/(2|y|)$ is given by the integral
$$ \theta(y) = \frac{1}{2\pi i} \int_{-\infty}^{\infty} e^{iyx}\, \frac{dx}{x - i\epsilon} \tag{6.353} $$
in which $\epsilon$ is an infinitesimal positive number.
6.35 Show that the integral of $\exp(ik)/k$ along the contour from $k = L$ to $k = L + iH$ and then to $k = -L + iH$ and then down to $k = -L$ vanishes in the double limit $L \to \infty$ and $H \to \infty$.
6.36 Use a ghost contour and a cut to evaluate the integral
$$ I = \int_{-1}^{1} \frac{dx}{(x^2 + 1)\sqrt{1 - x^2}} \tag{6.354} $$
by imitating Example 6.39. Be careful when picking up the poles at $z = \pm i$. If necessary, use the explicit square-root formulas (6.201) and (6.202).
6.37 Redo the previous exercise (6.36) by defining the square roots so that the cuts run from $-\infty$ to $-1$ and from $1$ to $\infty$. Take advantage of the evenness of the integrand and integrate on a contour that is slightly above the whole real axis. Then add a ghost contour around the upper half-plane.
6.38 Show that if $u$ is even and $v$ is odd, then the Hilbert transforms (6.292) imply (6.294).
6.39 Show why the principal-part identity (6.251) lets one write the Kramers–Kronig integral (6.314) for the index of refraction in the regularized form (6.319).
6.40 Use the formula (6.310) for the group velocity and the regularized expression (6.319) for the real part of the index of refraction $n_r(\omega)$ to derive formula (6.320) for the group velocity.
6.41 (a) Perform the $z$-integral in Eq. (6.338). (b) Use the result of part (a) to find the commutator $[L_m, L_n]$ of the Virasoro algebra. Hint: use the Laurent series (6.334).


6.42 Assume that $\epsilon(z)$ is analytic in a disk that contains a tiny circular contour $C_w$ about the point $w$ as in Fig. 6.10. Do the contour integral
$$ \oint_{C_w} \epsilon(z) \left[ \frac{c/2}{(z - w)^4} + \frac{2\, T(w)}{(z - w)^2} + \frac{T'(w)}{z - w} \right] \frac{dz}{2\pi i} \tag{6.355} $$
and express your result in terms of $\epsilon(w)$, $T(w)$, and their derivatives.
6.43 Show that if the coefficients $a_k$ of the equation $0 = a_0 + a_1 z + \cdots + a_n z^n$ are real, then its $n$ roots $z_k$ are real or come in pairs $z_\ell$ and $z_\ell^*$ that are complex conjugates of each other.
6.44 Show that if $a$ is not an integer, then the sum of the tiny counterclockwise integrals about the points $z = n$ of Example 6.48 is
$$ \sum_{n=-\infty}^{\infty} \oint_n \frac{\csc \pi z\; dz}{(z - a)^2} = 2i \sum_{n=-\infty}^{\infty} \frac{(-1)^n}{(n - a)^2}. $$
6.45 Use the trick of Example 6.48 with $\csc \pi z \to \cot \pi z$ to show that
$$ \sum_{n=-\infty}^{\infty} \frac{1}{(n - a)^2} = \frac{\pi^2}{\sin^2 \pi a} $$
as long as $a$ is not an integer.
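The identity in Exercise 6.45 converges fast enough to test numerically. Below is a standard-library sketch (the truncation $N$, the integral estimate of the discarded tails, and the test point $a = 0.3$ are arbitrary choices) comparing the truncated two-sided sum with $\pi^2/\sin^2 \pi a$.

```python
import math

def sum_inverse_squares(a, N=100_000):
    # two-sided sum over integers |n| <= N, plus the integral estimate
    # 1/(N - a) + 1/(N + a) for the two discarded tails
    total = sum(1.0 / (n - a) ** 2 for n in range(-N, N + 1))
    return total + 1.0 / (N - a) + 1.0 / (N + a)

a = 0.3
closed_form = (math.pi / math.sin(math.pi * a)) ** 2
print(sum_inverse_squares(a), closed_form)
```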

7 Differential Equations

7.1 Ordinary Linear Differential Equations

There are many kinds of differential equations – linear and nonlinear, ordinary and partial, homogeneous and inhomogeneous. Any way of correctly solving any of them is fine. We start our overview with some definitions. An operator of the form
$$ L = \sum_{m=0}^{n} h_m(x)\, \frac{d^m}{dx^m} \tag{7.1} $$
is an nth-order, ordinary, linear differential operator. It is nth order because the highest derivative is $d^n/dx^n$. It is ordinary because all the derivatives are with respect to the same independent variable $x$. It is linear because derivatives are linear operators
$$ L \left[ a_1 f_1(x) + a_2 f_2(x) \right] = a_1 L f_1(x) + a_2 L f_2(x). \tag{7.2} $$
If all the $h_m(x)$ in the operator $L$ are constants, independent of $x$, then $L$ is an nth-order, ordinary, linear differential operator with constant coefficients.

Example 7.1 (Second-order linear differential operators) The operator
$$ L = -\frac{d^2}{dx^2} - k^2 \tag{7.3} $$
is a second-order, linear differential operator with constant coefficients. The second-order linear differential operator
$$ L = -\frac{d}{dx} \left[ p(x)\, \frac{d}{dx} \right] + q(x) \tag{7.4} $$
is in self-adjoint form (Section 7.30).

The differential equation $L f(x) = 0$ is homogeneous because each of its terms is linear in $f$ or one of its derivatives $f^{(m)}$ – there is no term that is not proportional to $f$ or one of its derivatives. The equation $L f(x) = s(x)$ is inhomogeneous because of the source term $s(x)$. If a differential equation is linear and homogeneous, then we can add solutions. If $f_1(x)$ and $f_2(x)$ are two solutions of the same linear homogeneous differential equation, $L f_1(x) = 0$ and $L f_2(x) = 0$, then any linear combination of these solutions $f(x) = a_1 f_1(x) + a_2 f_2(x)$ with constant coefficients $a_1$ and $a_2$ also is a solution since
$$ L f(x) = L \left[ a_1 f_1(x) + a_2 f_2(x) \right] = a_1 L f_1(x) + a_2 L f_2(x) = 0. \tag{7.5} $$

This additivity of solutions makes it possible to find general solutions of linear homogeneous differential equations.

Example 7.2 (Sines and cosines) Two solutions of the second-order, linear, homogeneous, ordinary differential equation (ODE)
$$ \left[ \frac{d^2}{dx^2} + k^2 \right] f(x) = 0 \tag{7.6} $$
are $\sin kx$ and $\cos kx$, and the most general solution is the linear combination $f(x) = a_1 \sin kx + a_2 \cos kx$.

The functions $y_1(x), \ldots, y_n(x)$ are linearly independent if the only numbers $k_1, \ldots, k_n$ for which the linear combination
$$ k_1 y_1(x) + k_2 y_2(x) + \cdots + k_n y_n(x) = 0 \tag{7.7} $$
vanishes for all $x$ are $k_1 = \cdots = k_n = 0$. Otherwise they are linearly dependent. Suppose that an nth-order linear, homogeneous, ordinary differential equation $L f(x) = 0$ has $n$ linearly independent solutions $f_j(x)$, and that all other solutions to this ODE are linear combinations of these $n$ solutions. Then these $n$ solutions are complete in the space of solutions of this equation and form a basis for this space. The general solution to $L f(x) = 0$ is then a linear combination of the $f_j$'s with $n$ arbitrary constant coefficients
$$ f(x) = \sum_{j=1}^{n} a_j f_j(x). \tag{7.8} $$
With a source term $s(x)$, the differential equation $L f(x) = 0$ becomes an inhomogeneous linear ordinary differential equation
$$ L f_i(x) = s(x). \tag{7.9} $$


If $f_{i1}(x)$ and $f_{i2}(x)$ are any two solutions of this inhomogeneous differential equation, then their difference $f_{i1}(x) - f_{i2}(x)$ is a solution of the associated homogeneous equation $L f(x) = 0$
$$ L \left[ f_{i1}(x) - f_{i2}(x) \right] = L f_{i1}(x) - L f_{i2}(x) = s(x) - s(x) = 0. \tag{7.10} $$
Thus this difference must be given by the general solution (7.8) of the homogeneous equation for some constants $a_j$
$$ f_{i1}(x) - f_{i2}(x) = \sum_{j=1}^{n} a_j f_j(x). \tag{7.11} $$
It follows therefore that every solution $f_{i1}(x)$ of the inhomogeneous differential equation (7.9) is the sum of a particular solution $f_{i2}(x)$ of that equation and some solution (7.8) of the associated homogeneous equation $L f = 0$
$$ f_{i1}(x) = f_{i2}(x) + \sum_{j=1}^{n} a_j f_j(x). \tag{7.12} $$

Thus the general solution of a linear inhomogeneous equation is a particular solution of that inhomogeneous equation plus the general solution of the associated homogeneous equation. A nonlinear differential equation is one in which a power $f^n(x)$ of the unknown function, or a power $\left( f^{(k)}(x) \right)^n$ of one of its derivatives, with $n$ other than 1 or 0 appears, or in which the unknown function $f$ appears in some other nonlinear way. For instance, the equations
$$ -f''(x) = f^3(x), \qquad \left( f'(x) \right)^2 = f(x), \qquad \text{and} \qquad f'(x) = e^{-f(x)} \tag{7.13} $$

are nonlinear differential equations. We can't add two solutions of a nonlinear equation and expect to get a third solution. Nonlinear equations are much harder to solve.

7.2 Linear Partial Differential Equations

An equation of the form
$$ L f(x) = \sum_{m_1, \ldots, m_k = 0}^{n_1, \ldots, n_k} g_{m_1, \ldots, m_k}(x)\, \frac{\partial^{m_1 + \cdots + m_k}}{\partial x_1^{m_1} \cdots \partial x_k^{m_k}}\, f(x) = 0 \tag{7.14} $$
in which $x$ stands for $x_1, \ldots, x_k$ is a linear partial differential equation of order $n = n_1 + \cdots + n_k$ in the $k$ variables $x_1, \ldots, x_k$. (A partial differential equation is a differential equation that has partial derivatives.)


Linear combinations of solutions of a linear homogeneous partial differential equation also are solutions of the equation. So if $f_1$ and $f_2$ are solutions of $L f = 0$, and $a_1$ and $a_2$ are constants, then $f = a_1 f_1 + a_2 f_2$ is a solution since $L f = a_1 L f_1 + a_2 L f_2 = 0$. Additivity of solutions is a property of all linear homogeneous differential equations, whether ordinary or partial. The general solution $f(x) = f(x_1, \ldots, x_k)$ of a linear homogeneous partial differential equation (7.14) is a sum $f(x) = \sum_j a_j f_j(x)$ over a complete set of solutions $f_j(x)$ of the equation with arbitrary coefficients $a_j$. A linear partial differential equation $L f_i(x) = s(x)$ with a source term $s(x) = s(x_1, \ldots, x_k)$ is an inhomogeneous linear partial differential equation because of the added source term. Just as with ordinary differential equations, the difference $f_{i1} - f_{i2}$ of two solutions of the inhomogeneous linear partial differential equation $L f_i = s$ is a solution of the associated homogeneous equation (7.14) $L f = 0$
$$ L \left[ f_{i1}(x) - f_{i2}(x) \right] = s(x) - s(x) = 0. \tag{7.15} $$
So we can expand this difference in terms of the complete set of solutions $f_j$ of the homogeneous linear partial differential equation $L f = 0$
$$ f_{i1}(x) - f_{i2}(x) = \sum_j a_j f_j(x). \tag{7.16} $$
Thus the general solution of the inhomogeneous linear partial differential equation $L f = s$ is the sum of a particular solution $f_{i2}$ of $L f = s$ and the general solution $\sum_j a_j f_j$ of the associated homogeneous equation $L f = 0$
$$ f_{i1}(x) = f_{i2}(x) + \sum_j a_j f_j(x). \tag{7.17} $$

7.3 Separable Partial Differential Equations

A homogeneous linear partial differential equation (PDE) in $n$ variables $x_1, \ldots, x_n$ is separable if it can be decomposed into $n$ ordinary differential equations (ODEs), one in each of the $n$ variables $x_1, \ldots, x_n$, and if a suitable product of the solutions $X_i(x_i)$ of these $n$ ODEs
$$ f(x_1, \ldots, x_n) = X_1(x_1)\, X_2(x_2) \cdots X_n(x_n) \tag{7.18} $$
is a solution of the original PDE. The general solution to the PDE is then a sum of all of its linearly independent solutions $f$ with arbitrary coefficients. In general, the separability of a partial differential equation depends upon the choice of coordinates. Several of the fundamental PDEs of classical and quantum


field theory are separable in several coordinate systems because they usually involve the laplacian $-\nabla \cdot \nabla$ owing to their rotational symmetry.

Example 7.3 (Laplace's equation) The equation for the electrostatic potential in empty space is Laplace's equation
$$ L \phi(x, y, z) = \nabla \cdot \nabla\, \phi(x, y, z) = \left[ \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} \right] \phi(x, y, z) = 0. \tag{7.19} $$

It is a second-order linear homogeneous partial differential equation.

Example 7.4 (Poisson's equation) Poisson's equation for the electrostatic potential $\phi$ is
$$ -\triangle \phi(x, y, z) \equiv -\left[ \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} \right] \phi(x, y, z) = \frac{\rho(x, y, z)}{\epsilon_0} \tag{7.20} $$

in which $\rho$ is the charge density and $\epsilon_0$ is the electric constant. It is a second-order linear inhomogeneous partial differential equation.

Example 7.5 (Maxwell's equations) In empty space, Maxwell's equations are $\nabla \cdot E = 0$, $\nabla \cdot B = 0$, $\nabla \times E = -\dot{B}$, and $c^2\, \nabla \times B = \dot{E}$. They imply (Exercise 7.6) the wave equations
$$ \triangle E = \ddot{E}/c^2 \qquad \text{and} \qquad \triangle B = \ddot{B}/c^2 \tag{7.21} $$
which are separable in rectangular, cylindrical, and spherical coordinates among others. For instance, the exponentials $E(k, \omega) = \epsilon\, e^{i(k \cdot r - \omega t)}$ and $B(k, \omega) = (\hat{k} \times \epsilon/c)\, e^{i(k \cdot r - \omega t)}$ with $\omega = |k|\, c$ and $\hat{k} \cdot \epsilon = 0$ are solutions.

Example 7.6 (Helmholtz's equation in 2 dimensions) In 2 dimensions and in rectangular coordinates $(x, y)$, Helmholtz's linear homogeneous partial differential equation

$$ -\nabla \cdot \nabla f(x, y) = -\left[ \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right] f(x, y) = k^2 f(x, y) \tag{7.22} $$
is separable. The function $f(x, y) = X(x)\, Y(y)$ is a solution
$$ -\left[ \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right] X(x)\, Y(y) = -Y(y)\, \frac{\partial^2 X(x)}{\partial x^2} - X(x)\, \frac{\partial^2 Y(y)}{\partial y^2} = k^2 X(x)\, Y(y) $$
as long as $X$ and $Y$ satisfy $-X''(x) = a^2 X(x)$ and $-Y''(y) = b^2 Y(y)$ with $a^2 + b^2 = k^2$. For instance, if $X_a(x) = e^{iax}$ and $Y_b(y) = e^{iby}$, then any linear combination of the products $X_a(x)\, Y_b(y)$ with $a^2 + b^2 = k^2$ will be a solution of Helmholtz's equation (7.22). Helmholtz's equation (7.22) also is separable in polar coordinates $(\rho, \phi)$ in 2 dimensions with a laplacian that is the 3-dimensional laplacian (2.32) without the $z$ derivative

$$ \nabla \cdot \nabla f = \frac{\partial^2 f}{\partial \rho^2} + \frac{1}{\rho} \frac{\partial f}{\partial \rho} + \frac{1}{\rho^2} \frac{\partial^2 f}{\partial \phi^2} = \frac{1}{\rho} \frac{d}{d\rho} \left[ \rho\, \frac{df}{d\rho} \right] + \frac{1}{\rho^2} \frac{\partial^2 f}{\partial \phi^2}. \tag{7.23} $$

Substituting the product $f(\rho, \phi) = P(\rho)\, \Phi(\phi)$ into Helmholtz's equation (7.22) and multiplying both sides by $\rho^2 / P \Phi$, we get
$$ \frac{\rho^2 P''}{P} + \frac{\rho P'}{P} + \rho^2 k^2 = -\frac{\Phi''}{\Phi} = n^2. \tag{7.24} $$
The first three terms are functions of $\rho$, the fourth term $-\Phi''/\Phi$ is a function of $\phi$, and the last term $n^2$ is a constant. The constant $n$ must be an integer if $\Phi_n(\phi) = e^{in\phi}$ is to be single valued on the interval $[0, 2\pi]$. The function $P_{kn}(\rho) = J_n(k\rho)$ satisfies
$$ \rho^2 P_{kn}'' + \rho P_{kn}' + \rho^2 k^2 P_{kn} = n^2 P_{kn} \tag{7.25} $$
because the Bessel function of the first kind $J_n(x)$ obeys Bessel's equation (10.4)
$$ x^2 J_n'' + x J_n' + x^2 J_n = n^2 J_n \tag{7.26} $$
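Bessel's equation (7.26) is easy to verify numerically. The sketch below is a standard-library check (the quadrature resolution, the finite-difference step, and the sample values of $n$ and $x$ are arbitrary choices) that builds $J_n$ from the standard integral representation $J_n(x) = \frac{1}{\pi} \int_0^{\pi} \cos(nt - x \sin t)\, dt$ and verifies that $x^2 J_n'' + x J_n' + (x^2 - n^2) J_n$ nearly vanishes.

```python
import math

def bessel_j(n, x, m=4000):
    # J_n(x) = (1/pi) * integral_0^pi cos(n t - x sin t) dt, trapezoidal rule
    h = math.pi / m
    s = 0.5 * (math.cos(0.0) + math.cos(n * math.pi))
    for i in range(1, m):
        t = i * h
        s += math.cos(n * t - x * math.sin(t))
    return s * h / math.pi

n, x, h = 2, 3.7, 1e-4
d1 = (bessel_j(n, x + h) - bessel_j(n, x - h)) / (2.0 * h)
d2 = (bessel_j(n, x + h) - 2.0 * bessel_j(n, x) + bessel_j(n, x - h)) / h**2
residual = x * x * d2 + x * d1 + (x * x - n * n) * bessel_j(n, x)
print(residual)  # close to zero
```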

(Friedrich Bessel 1784–1846). Thus the product $f_{kn}(\rho, \phi) = J_n(k\rho)\, e^{in\phi}$ is a solution to Helmholtz's equation (7.22), as is any linear combination of such products for different $n$'s.

Example 7.7 (Helmholtz's equation in 3 dimensions) In 3 dimensions and in rectangular coordinates $(x, y, z)$, Helmholtz's equation
$$ -\nabla \cdot \nabla f(x, y, z) = -\left[ \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} \right] f(x, y, z) = k^2 f(x, y, z) \tag{7.27} $$
is separable. The function $f(x, y, z) = X(x)\, Y(y)\, Z(z)$ is a solution
$$ -\triangle (XYZ) = -X'' Y Z - X Y'' Z - X Y Z'' = k^2 XYZ \tag{7.28} $$
as long as $X$, $Y$, and $Z$ satisfy $-X'' = a^2 X$, $-Y'' = b^2 Y$, and $-Z'' = c^2 Z$ with $a^2 + b^2 + c^2 = k^2$. Thus if $X_a(x) = e^{iax}$, $Y_b(y) = e^{iby}$, and $Z_c(z) = e^{icz}$, then any linear combination of the products $X_a Y_b Z_c$ with $a^2 + b^2 + c^2 = k^2$ will be a solution of Helmholtz's equation (7.27). Helmholtz's equation (7.27) also is separable in polar coordinates $(\rho, \phi, z)$ with laplacian (2.32)

$$ \nabla \cdot \nabla f = \triangle f = \frac{1}{\rho} \left[ \left( \rho\, f_{,\rho} \right)_{,\rho} + \frac{1}{\rho}\, f_{,\phi\phi} + \rho\, f_{,zz} \right]. \tag{7.29} $$

Setting $f(\rho, \phi, z) = P(\rho)\, \Phi(\phi)\, Z(z)$ and multiplying both sides of Helmholtz's equation (7.27) by $-\rho^2 / P \Phi Z$, we get
$$ \frac{\rho^2 P''}{P} + \frac{\rho P'}{P} + \frac{\Phi''}{\Phi} + \rho^2\, \frac{Z''}{Z} = -k^2 \rho^2. \tag{7.30} $$

If we set $Z_\alpha(z) = e^{\alpha z}$, then this equation becomes (7.24) with $k^2$ replaced by $\alpha^2 + k^2$. Its solution is


$$ f(\rho, \phi, z) = J_n\!\left( \sqrt{k^2 + \alpha^2}\; \rho \right) e^{in\phi}\, e^{\alpha z} \tag{7.31} $$
in which $n$ must be an integer if $f(\rho, \phi, z)$ is to apply to the full range of $\phi$ from 0 to $2\pi$. The case $k = 0$ corresponds to Laplace's equation, with solution $f(\rho, \phi, z) = J_n(\alpha\rho)\, e^{in\phi}\, e^{\alpha z}$. The alternative choice $Z'' = -\alpha^2 Z$ leads to the solution
$$ f(\rho, \phi, z) = J_n\!\left( \sqrt{k^2 - \alpha^2}\; \rho \right) e^{in\phi}\, e^{i\alpha z}. \tag{7.32} $$
But if $k^2 < \alpha^2$, this solution involves the modified Bessel function $I_n(x) = i^{-n} J_n(ix)$ (Section 10.3)
$$ f(\rho, \phi, z) = I_n\!\left( \sqrt{\alpha^2 - k^2}\; \rho \right) e^{in\phi}\, e^{i\alpha z}. \tag{7.33} $$

Helmholtz’s equation (7.27) also is separable in spherical coordinates with the laplacian (2.33)

² f = r12 ∂∂r

´

r2

∂f ∂r

µ

1 ∂ + r 2 sin θ ∂θ

´

sin θ

∂f

µ

∂θ

+

1



2

r 2 sin

2

f

2 θ ∂φ

(7.34)

the first term of which can be written as r −1 (r f ),rr . Setting f (r, θ , φ) = R (r ) ³(θ ) ²m (φ) where ²m = eimφ and multiplying both sides by −r 2 / R ³², we get

(r 2 R± )± (sin θ ³± )± m2 + − = −k 2 r 2 . R sin θ ³ sin2 θ

(7.35)

The first term is a function of r, the next two terms are functions of θ , and the last term is a constant. We set the r -dependent terms equal to a constant ´(´ + 1) − k2 and the θ -dependent terms equal to −´(´ + 1 ). The associated Legendre function³´m (θ ) satisfies (9.93)

(sin θ ³± )± / sin θ + ¹´(´ + 1) − m 2/ sin2 θ º ³ = 0. (7.36) m m If ²(φ) = eim is to be single valued for 0 ≤ φ ≤ 2π , then the parameter m must be an integer. The constant ´ also must be an integer with −´ ≤ m ≤ ´ (Example 7.39, Section 9.12) if ³ m (θ ) is to be single valued and finite for 0 ≤ θ ≤ π . The product f = R ³ ² then will obey Helmholtz’s equation (7.27) if the radial function Rk (r ) = j (kr ) satisfies » 2 ± ¼± ¹ 2 2 º r Rk + k r − ´(´ + 1) Rk = 0 (7.37) ´

´

φ

´

´

´

´

´

which it will because the spherical Bessel functionj´ (x ) obeys Bessel’s equation (10.67)

»

¼± x 2 j´± + [ x 2 − ´(´ + 1)] j´

= 0.

(7.38)

In 3 dimensions, Helmholtz’s equation separates in 11 standard coordinate systems (Morse and Feshbach, 1953, pp. 655–664).
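The radial equation (7.38) can be checked with the standard closed form $j_1(x) = \sin x / x^2 - \cos x / x$ of the first spherical Bessel function. The sketch below (standard-library Python; the sample point and step are arbitrary choices) expands $(x^2 j_1')' = x^2 j_1'' + 2x j_1'$ and verifies that the left-hand side of (7.38) nearly vanishes.

```python
import math

def j1(x):
    # spherical Bessel function j_1(x) = sin(x)/x^2 - cos(x)/x
    return math.sin(x) / x**2 - math.cos(x) / x

ell, x, h = 1, 2.5, 1e-4
d1 = (j1(x + h) - j1(x - h)) / (2.0 * h)
d2 = (j1(x + h) - 2.0 * j1(x) + j1(x - h)) / h**2
# (x^2 j')' + [x^2 - l(l+1)] j  =  x^2 j'' + 2 x j' + [x^2 - l(l+1)] j
residual = x * x * d2 + 2.0 * x * d1 + (x * x - ell * (ell + 1)) * j1(x)
print(residual)  # close to zero
```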


In special relativity, the time and space coordinates $ct$ and $\mathbf{x}$ often are written as $x^0, x^1, x^2$, and $x^3$ or as the 4-vector $(x^0, \mathbf{x})$. To form the invariant inner product $px \equiv \mathbf{p} \cdot \mathbf{x} - p^0 x^0 = \mathbf{p} \cdot \mathbf{x} - Et$ as $p_a x^a$ with $a$ summed from 0 to 3, one attaches a minus sign to the time components of 4-vectors with lowered indexes so that $p_0 = -p^0$ and $x_0 = -x^0$. The derivatives $\partial_a f$ and $\partial^a f$ are
$$ \partial_a f = \frac{\partial f}{\partial x^a} \qquad \text{and} \qquad \partial^0 f = \frac{\partial f}{\partial x_0} = -\frac{\partial f}{\partial x^0} = -\partial_0 f. \tag{7.39} $$
The opposite metric, with $px$ set equal to $p^0 x^0 - \mathbf{p} \cdot \mathbf{x}$, is also in use.

Example 7.8 (Klein–Gordon equation) In Minkowski space, the analog of the laplacian in natural units ($\hbar = c = 1$) is (summing over $a$ from 0 to 3)
$$ \Box = \partial_a \partial^a = \triangle - \frac{\partial^2}{\partial (x^0)^2} = \nabla \cdot \nabla - \frac{\partial^2}{\partial t^2} = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} - \frac{\partial^2}{\partial t^2} \tag{7.40} $$
in which for time derivatives $\partial^0 = -\partial_0$, but for spatial derivatives $\partial^i = \partial_i$. The Klein–Gordon equation (4.174) is
$$ \left[ \Box - m^2 \right] A(x) = \left[ \triangle - \frac{\partial^2}{\partial t^2} - m^2 \right] A(x) = 0. \tag{7.41} $$
If we set $A(x) = B(px)$ where $px = p_a x^a = \mathbf{p} \cdot \mathbf{x} - p^0 x^0$, then the $k$th partial derivative of $A$ is $p_k$ times the first derivative of $B$
$$ \frac{\partial}{\partial x^k} A(x) = \frac{\partial}{\partial x^k} B(px) = p_k\, B'(px) \tag{7.42} $$
and so the Klein–Gordon equation (7.41) becomes
$$ \left[ \Box - m^2 \right] A = \left( \mathbf{p}^2 - (p^0)^2 \right) B'' - m^2 B = p^2 B'' - m^2 B = 0 \tag{7.43} $$
in which $p^2 = \mathbf{p}^2 - (p^0)^2$. Thus if $B(px) = \exp(i\, px)$ so that $B'' = -B$, and if the energy–momentum 4-vector $(p^0, \mathbf{p})$ satisfies $p^2 + m^2 = 0$, then $A(x)$ will satisfy the Klein–Gordon equation. The condition $p^2 + m^2 = 0$ relates the energy $p^0 = \sqrt{\mathbf{p}^2 + m^2}$ to the momentum $\mathbf{p}$ for a particle of mass $m$.
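A quick numeric check of Example 7.8: for the real plane wave $A = \cos(\mathbf{p} \cdot \mathbf{x} - p^0 t)$ with $p^0 = \sqrt{\mathbf{p}^2 + m^2}$, finite differences should make $(\Box - m^2) A$ vanish. The mass, momentum, sample point, and step below are arbitrary choices (standard-library Python).

```python
import math

m = 1.3
p = (0.4, -0.7, 1.1)
p0 = math.sqrt(sum(c * c for c in p) + m * m)   # energy fixed by p^2 + m^2 = 0

def A(t, x, y, z):
    # real part of exp(i px) with px = p.x - p0 t
    return math.cos(p[0] * x + p[1] * y + p[2] * z - p0 * t)

h = 1e-4
def second_derivative(i, pt):
    up, dn = list(pt), list(pt)
    up[i] += h
    dn[i] -= h
    return (A(*up) - 2.0 * A(*pt) + A(*dn)) / h**2

pt = (0.3, 0.2, -0.5, 0.9)   # (t, x, y, z)
box_A = (second_derivative(1, pt) + second_derivative(2, pt)
         + second_derivative(3, pt) - second_derivative(0, pt))
residual = box_A - m * m * A(*pt)
print(residual)  # close to zero
```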

Example 7.9 (Field of a spinless boson) The quantum field (4.183)
$$ \phi(x) = \int \frac{d^3 p}{\sqrt{2 p^0 (2\pi)^3}} \left[ a(\mathbf{p})\, e^{ipx} + a^\dagger(\mathbf{p})\, e^{-ipx} \right] \tag{7.44} $$
describes spinless bosons of mass $m$. It satisfies the Klein–Gordon equation $\left( \Box - m^2 \right) \phi(x) = 0$ because $p^0 = \sqrt{\mathbf{p}^2 + m^2}$. The operators $a(\mathbf{p})$ and $a^\dagger(\mathbf{p})$ respectively represent the annihilation and creation of the bosons and obey the commutation relations
$$ [a(\mathbf{p}), a^\dagger(\mathbf{p}')] = \delta^3(\mathbf{p} - \mathbf{p}') \qquad \text{and} \qquad [a(\mathbf{p}), a(\mathbf{p}')] = [a^\dagger(\mathbf{p}), a^\dagger(\mathbf{p}')] = 0 \tag{7.45} $$


in units with $\hbar = c = 1$. These relations make the field $\phi(x)$ and its time derivative $\dot{\phi}(y)$ satisfy the canonical equal-time commutation relations $[\phi(\mathbf{x}, t), \dot{\phi}(\mathbf{y}, t)] = i\, \delta^3(\mathbf{x} - \mathbf{y})$ and $[\phi(\mathbf{x}, t), \phi(\mathbf{y}, t)] = [\dot{\phi}(\mathbf{x}, t), \dot{\phi}(\mathbf{y}, t)] = 0$ in which a dot means a time derivative.

Example 7.10 (Field of the photon) The electromagnetic field has four components, but in the Coulomb or radiation gauge $\nabla \cdot A(x) = 0$, the component $A^0$ is a function of the charge density, and the vector potential $A$ in the absence of charges and currents satisfies the wave equation $\Box A(x) = 0$ for a spin-one massless particle. We write it as
$$ A(x) = \sum_{s=1}^{2} \int \frac{d^3 p}{\sqrt{2 p^0 (2\pi)^3}} \left[ e(\mathbf{p}, s)\, a(\mathbf{p}, s)\, e^{ipx} + e^*(\mathbf{p}, s)\, a^\dagger(\mathbf{p}, s)\, e^{-ipx} \right] \tag{7.46} $$
in which the sum is over the two possible polarizations $s$. The energy $p^0$ is equal to the modulus $|\mathbf{p}|$ of the momentum because the photon is massless, $p^2 = 0$. The dot product of the polarization vectors $e(\mathbf{p}, s)$ with the momentum vanishes, $\mathbf{p} \cdot e(\mathbf{p}, s) = 0$, so as to respect the gauge condition $\nabla \cdot A(x) = 0$. The annihilation and creation operators obey the commutation relation
$$ [a(\mathbf{p}, s), a^\dagger(\mathbf{p}', s')] = \delta^3(\mathbf{p} - \mathbf{p}')\, \delta_{s s'} \tag{7.47} $$
as well as $[a(\mathbf{p}, s), a(\mathbf{p}', s')] = 0$ and $[a^\dagger(\mathbf{p}, s), a^\dagger(\mathbf{p}', s')] = 0$. Because of the Coulomb-gauge condition $\nabla \cdot A(x) = 0$, the commutation relations of the vector potential $A(x)$ involve the transverse delta function
$$ \left[ A_i(t, \mathbf{x}), \dot{A}_j(t, \mathbf{y}) \right] = i\, \delta_{ij}\, \delta^3(\mathbf{x} - \mathbf{y}) + i\, \frac{\partial^2}{\partial x_i \partial x_j} \frac{1}{4\pi |\mathbf{x} - \mathbf{y}|} = i \int e^{i \mathbf{k} \cdot (\mathbf{x} - \mathbf{y})} \left( \delta_{ij} - \frac{k_i k_j}{\mathbf{k}^2} \right) \frac{d^3 k}{(2\pi)^3}. \tag{7.48} $$

(7.49)

in which repeated indices are summed over – b, c from 1 to 4 and a from 0 to 3. In matrix notation, the Dirac equationis

(

a γ ∂a

+m

)

χ (x )

= 0.

(7.50)

The four Dirac gamma matrices are defined by the 10 rules

{γ a , γ b} ≡ γ a γ b + γ b γ a = 2ηab (7.51) in which η is the 4 × 4 diagonal matrix η00 = η00 = −1 and ηbc = ηbc = δbc for b, c = 1, 2, or 3.

7.5 Separable First-Order Differential Equations

257

is a 4-component field that satisfies the Klein–Gordon equation (± − m 2) φ = 0, then the field χ (x ) = (γ b ∂ b − m )φ ( x ) satisfies (Exercise 7.7) the Dirac equation (7.50) If

φ (x )

(γ a ∂ + m) χ ( x ) = (γ a ∂ + m ) (γ b∂ − m )φ (x ) = »γ a γ b ∂ ∂ − m 2 ¼ φ (x ) a a b a b · ¶1 » ¼ (7.52) = 2 {γ a , γ b } + [γ a , γ b ] ∂a ∂b − m 2 φ ( x ) » ab ¼ = η ∂a ∂b − m 2 φ (x ) = (± − m 2)φ (x ) = 0.

The simplest Dirac field is the Majorana field χb ( x )

=

½

d3 p (2 π )3/2

¸

±¹ s

u b( p, s ) a( p, s)ei px

+ vb ( p, s ) a †( p, s )e−i px

º

(7.53)

in which p0 = p2 + m 2 , s labels the two spin states, and the operators a and a † obey the anticommutation relations

{a( p, s ), a †( p², s ± )} ≡ a ( p, s ) a† ( p², s± ) + a† ( p² , s ± ) a ( p, s ) = δss ± δ( p − p² ) {a( p, s ), a( p², s ± )} = {a† ( p, s ), a †( p², s ± )} = 0. (7.54) It describes a neutral particle of mass m. If two Majorana fields χ1 and χ2 represent particles of the same mass, then one may combine them into one Dirac field

= √1

[χ1 (x ) + i χ2(x )] 2 which describes a charged particle such as a quark or a lepton. ψ (x )

(7.55)

7.4 First-Order Differential Equations The equation dy dx

=

f (x , y ) = −

or system P ( x , y) d x

P (x , y ) Q ( x , y)

+ Q ( x, y) dy = 0

(7.56)

(7.57)

is a first-order ordinary differential equation . 7.5 Separable First-Order Differential Equations If in a first-order ordinary differential equation like (7.57) one can separate the dependent variable y from the independent variable x F ( x ) d x + G ( y ) dy

=0

(7.58)

258

7 Differential Equations

then the equation (7.57) is separable, and (7.58) is separated. Once the variables are separated, one can integrate and obtain an equation, called the general integral, that relates y to x 0=

½

x x0

F (x ± ) d x ± +

½

y y0

G ( y ± ) dy ±

(7.59)

and so provides a solution y ( x ) of the differential equation. Example 7.12 (Zipf’s law) In 1913, Auerbach noticed that many quantities are distributed as (Gell-Mann, 1994, pp. 92–100) (7.60) = −a xdx k +1 an ODE that is separable and separated. For k ³ = 0, we may integrate this to n + c = a/ kx k or ´ a µ1 k (7.61) x= k ( n + c) dn

/

in which c is a constant. The case k = 1 occurs frequently x = a /(n + c) and is called Zipf’s law. With c = 0, it applies approximately to the populations of cities: if the largest city (n = 1) has population x, then the populations of the second, third, fourth cities (n = 2, 3, 4) will be x /2, x /3, and x / 4. Again with c = 0, Zipf’s law applies to the occurrence of numbers x in a table of some sort. Since x = a/ n, the rank n of the number x is approximately n = a/ x. So the number of numbers that occur with first digit d and, say, 4 trailing digits will be n(d0000 ) − n(d9999) = a

≈a

´

´

1 d0000



1 d9999

104 d(d + 1) 10 8

µ

µ

=a

´

9999 d0000 × d9999

−4

= da(d10+ 1) .

µ

(7.62)

The ratio of the number of numbers with first digit d to the number with first digit d ± is then d ±(d ± + 1)/d(d + 1). For example, the first digit is more likely to be 1 than 9 by a factor of 45. The German government uses such formulas to catch tax evaders. Example 7.13(Logistic equation) dy dt

= ay

»

1−

y¼ Y

(7.63)

is separable and separated. It describes a wide range of phenomena whose evolution with time t is sigmoidal such as (Gell-Mann, 2008) the cumulative number of casualties in a war, the cumulative number of deaths in London’s great plague, and the cumulative number of papers in an academic’s career. It also describes the effect y on an animal of a given dose t of a drug.

7.5 Separable First-Order Differential Equations With f

= y / Y , the logistic equation (7.63) is f˙ = a f (1 −

259

f ) or

= dff + 1 d−f f (7.64) ² ³ which we may integrate to a(t − th ) = ln f /(1 − f ) . Taking the exponential of both sides, we find exp[a(t − th )] = f /(1 − f ) which we can solve for f a dt

=

df f (1 − f )

f (t ) =

ea (t −t h ) . 1 + ea(t −th )

(7.65)

The sigmoidal shape of f (t ) is like a smoothed Heaviside function. In terms of y0 = y (0), the value of y (t ) is Y y0 eat y (t ) = . (7.66) Y + y0 (eat − 1) Example 7.14(Lattice QCD) In lattice field theory, the beta-function β( g )

≡ − d dg ln a

(7.67)

tells us how we must adjust the coupling constant g in order to keep the physical predictions of the theory constant as we vary the lattice spacing a. In quantum chromodynamics β(g ) = −β0 g 3 − β1 g5 + · · · where β0

1

= (4π )2

´

11 −

2 nf 3

µ

and

β1

1

= (4π )4

´

102 − 10 n f



8 nf 3

µ

(7.68)

in which n f is the number of light quark flavors. Combining the definition (7.67) of the β -function with the first term of its expansion β(g ) = − β0 g3 for small g, one arrives at the differential equation dg d ln a

= β0 g3

which one may integrate

½

d ln a = ln a + c =

to find µ a (g)

½

dg β0 g3

= e−1 2

/ β 0 g2

(7.69)

= − 2β1g2 0

(7.70)

(7.71)

in which µ is a constant of integration. As g approaches 0, which is an essential singularity (Section 6.11), the lattice spacing a(g) goes to zero ¸ very fast2 (as2 long as n f ≤ 16). The inverse of this relation g(a) ≈ 1/ β0 ln(1/a µ ) shows that the coupling constant g(a) slowly goes to zero as the lattice spacing (or shortest wave length) a goes to zero. The strength of the interaction shrinks logarithmically as the energy 1 /a increases in this lattice version of asymptotic freedom.

260

7 Differential Equations

7.6 Hidden Separability As long as each of the functions P (x , y ) and Q (x , y ) in the ODE

+ Q ( x , y )dy = U ( x )V ( y) d x + R ( x ) S( y )dy = 0 (7.72) can be factored P ( x , y ) = U ( x ) V ( y ) and Q (x , y ) = R (x ) S ( y ) into the product of P (x , y )d x

a function of x times a function of y, then the ODE is separable. Following (Ince, 1956), we divide the ODE by R ( x )V ( y ), separate the variables

+ VS((yy)) dy = 0

U (x ) dx R ( x) and integrate

½

U ( x) dx R(x )

+

½

S(y ) dy V ( y)

=C

(7.73)

(7.74)

in which C is a constant of integration. Example 7.15(Hidden separability)

We separate the variables in

x ( y2 − 1) dx

− y (x 2 − 1) dy = 0

by dividing by ( y2 − 1)( x 2 − 1) so as to get y x dx − 2 dy 2 x −1 y −1

= 0.

(7.75)

(7.76)

Integrating, we find ln¸ (x 2 − 1 ) − ln( y2 − 1) = − ln C or C (x 2 − 1) = y 2 − 1 which we solve for y (x ) = 1 + C (x 2 − 1).

7.7 Exact First-Order Differential Equations The differential equation P ( x , y) d x

+ Q (x , y ) dy = 0

(7.77)

is exact if its left-hand side is the differential of some function φ (x , y ) P dx

+ Q dy = d φ = φx d x + φ y dy.

(7.78)

We’ll have more to say about the exterior derivatived in section 12.6. The criteria of exactnessare ∂ φ (x , y) ∂ φ (x , y ) P(x , y ) = ≡ φ x ( x , y ) and Q ( x , y ) = ≡ φy ( x, y). (7.79) ∂x ∂y Thus, if the ODE (7.77) is exact, then Py (x , y ) = φ yx (x , y ) = φx y ( x , y ) = Q x ( x , y )

(7.80)

7.8 Meaning of Exactness

261

which is called the condition of integrability. This condition implies that the ODE (7.77) is exact and integrable, as we’ll see in Section 7.8. A first-order ODE that is separable and separated P (x )d x is exact because Py

+ Q ( y) dy = 0

(7.81)

= 0 = Qx .

(7.82)

But a first-order ODE may be exact without being separable. Example 7.16(Boyle’s law) At a fixed temperature T , changes in the pressure P and volume V of an ideal gas are related by PdV + VdP = 0. This ODE is exact because PdV + VdP = d(PV). Its integrated form is the ideal-gas lawPV = NkT in which N is the number of molecules in the gas and k is Boltzmann’s constant, k = 1.38066 × 10−23 J/K = 8.617385×10− 5 eV/K. A more accurate formula, due to van der Waals (1837–1923) is

¾

´ N µ2 ¿ ( ) a ± V − N b± = N kT P+ V

(7.83)

in which a± > 0 represents the mutual attraction of the molecules and b± > 0 is the effective volume of a single molecule. This equation was a sign that molecules were real particles, a fact finally accepted after Einstein in 1905 related the viscous-friction coefficient ζ and the diffusion constant D to the energy kT of a thermal fluctuation by the equation ζ D = kT , as explained in section 15.12 (Albert Einstein 1879–1955). Example 7.17 (Human population growth) If the number of people rises as the square of the population, then N˙ = N 2/b. The separated and hence exact form of this differential equation is d N / N 2 = dt /b which we integrate to N (t ) = b/(T − t ) where T is the time at which the population becomes infinite. With T = 2025 years and b = 2 × 1011 years, this formula is a fair model of the world’s population between the years 1 and 1970. For a more accurate account, see (von Foerster et al., 1960).

7.8 Meaning of Exactness We can integrate the differentials of a first-order ODE P ( x , y) d x

+ Q ( x, y) dy = 0

(7.84)

along any contour C in the x– y-plane, but in general we’d get a functional φ ( x , y , C , x 0, y0 )

=

½

(x, y) (x 0 , y0 )C

P ( x ±, y ± ) d x ± + Q ( x ± , y ± ) dy ±

(7.85)

262

7 Differential Equations

that depends upon the contour C of integration as well as upon the endpoints (x 0 , y 0) and (x , y ). But if the differential Pd x + Qdy is exact, then it’s the differential or exterior derivative d φ = P ( x , y ) d x + Q ( x , y ) dy of a function φ ( x , y ) that depends upon the variables x and y without any reference to a contour of integration. Thus if Pd x + Qdy = d φ , then the contour integral (7.85) is

½

( x , y) (x 0 , y0 )C

P (x ± , y ± ) d x ± + Q ( x ±, y ± ) dy ±

=

½

(x ,y )



( x0 ,y0 )

= φ (x , y) − φ (x0, y0).

(7.86)

This integral defines a function φ ( x , y ; x0 , y 0) ≡ φ ( x , y ) − φ (x 0 , y0 ) whose differential vanishes d φ = Pd x + Qdy = 0 according to the original differential equation (7.84). Thus the ODE and its exactness leads to an equation φ (x,

y ; x 0 , y0 ) = B

(7.87)

that we can solve for y, our solution of the ODE (7.84) d φ (x , y ; x 0 , y0 ) = P ( x , y ) d x Example 7.18(Explicit use of exactness) exactness P ( x , y) =

∂ φ (x ,

y)

∂x

≡ φ x( x , y )

+ Q ( x, y) dy = 0.

(7.88)

We’ll now explicitly use the criteria of

and Q (x , y) =

∂φ (x, y) ∂y

≡ φy ( x , y)

(7.89)

to integrate the general exact differential equation P (x , y ) dx

+ Q (x , y ) dy = 0.

(7.90)

We use the first criterion P = φx to integrate the condition φx = P in the x-direction getting a known integral R (x , y ) and an unknown function C ( y ) φ (x ,

y) =

½

P (x , y) dx

+ C (y ) = R(x , y ) + C ( y ).

(7.91)

= φy says that Q (x , y ) = φ y (x , y) = R y (x , y ) + C y (y ). (7.92) We get C ( y) by integrating its known derivative C y = Q − R y ½ C ( y ) = Q (x , y) − R y (x , y ) dy + D. (7.93) We now put C into the formula φ = R + C which is (7.91). Setting φ = E a constant, we find an equation that we can solve for y ½ φ (x , y ) = R ( x , y ) + C ( y ) = R ( x , y ) + Q (x , y ) − R y (x , y ) dy + D = E . The second criterion Q


Example 7.19 (Using exactness) The functions P and Q in the differential equation

\[
P(x, y)\, dx + Q(x, y)\, dy = \ln(y^2 + 1)\, dx + \frac{2y\,(x - 1)}{y^2 + 1}\, dy = 0 \tag{7.94}
\]

are factorized, so the ODE is separable. It's also exact since

\[
P_y = \frac{2y}{y^2 + 1} = Q_x \tag{7.95}
\]

and so we can apply the method just outlined. First, as in (7.91), we integrate φ_x = P in the x-direction

\[
\phi(x, y) = \int \ln(y^2 + 1)\, dx + C(y) = x \ln(y^2 + 1) + C(y). \tag{7.96}
\]

Then, as in (7.92), we use φ_y = Q

\[
\phi_y(x, y) = \frac{2xy}{y^2 + 1} + C_y(y) = Q(x, y) = \frac{2y\,(x - 1)}{y^2 + 1} \tag{7.97}
\]

to find that C_y = −2y/(y² + 1). We integrate C_y in the y-direction as in (7.93), getting C(y) = −ln(y² + 1) + D. We now put C(y) into our formula (7.96) for φ(x, y)

\[
\phi(x, y) = (x - 1) \ln(y^2 + 1) + D \tag{7.98}
\]

which we set equal to a constant, φ(x, y) = (x − 1) ln(y² + 1) + D = E, or more simply (x − 1) ln(y² + 1) = F. Unraveling this equation, we find

\[
y(x) = \bigl[ e^{F/(x-1)} - 1 \bigr]^{1/2} \tag{7.99}
\]

as our solution to the differential equation (7.94).
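A quick numerical check of this example is easy in pure Python (this sketch is not from the text; the value F = 1 and the sample points are arbitrary choices). Along the solution curve (7.99), the combination P + Q y′ should vanish, since φ = (x − 1) ln(y² + 1) is constant there.

```python
import math

F = 1.0  # arbitrary integration constant for the check

def y(x):
    # solution (7.99) of the exact ODE (7.94)
    return math.sqrt(math.exp(F / (x - 1.0)) - 1.0)

def residual(x, h=1e-6):
    # P + Q*y' with P = ln(y^2+1) and Q = 2y(x-1)/(y^2+1)
    yp = (y(x + h) - y(x - h)) / (2 * h)  # central-difference derivative
    yy = y(x)
    P = math.log(yy * yy + 1.0)
    Q = 2.0 * yy * (x - 1.0) / (yy * yy + 1.0)
    return P + Q * yp

max_res = max(abs(residual(x)) for x in [1.5, 2.0, 3.0, 5.0])
print(max_res)
```

The residual is limited only by the finite-difference error, a few parts in 10⁶ here.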

7.9 Integrating Factors

With great luck, one might invent an integrating factor α(x, y) that makes an ordinary differential equation P dx + Q dy = 0 exact

\[
\alpha P\, dx + \alpha Q\, dy = d\phi \tag{7.100}
\]

and therefore integrable. Such an integrating factor α must satisfy both

\[
\alpha P = \phi_x \quad \text{and} \quad \alpha Q = \phi_y \tag{7.101}
\]

so that

\[
(\alpha P)_y = \phi_{xy} = (\alpha Q)_x. \tag{7.102}
\]


Example 7.20 (Two simple integrating factors) The ODE y dx − x dy = 0 is not exact, but α(x, y) = 1/x² is an integrating factor. For after multiplying by α, we have

\[
-\frac{y}{x^2}\, dx + \frac{1}{x}\, dy = 0 \tag{7.103}
\]

so that P = −y/x², Q = 1/x, and

\[
P_y = -\frac{1}{x^2} = Q_x \tag{7.104}
\]

which shows that (7.103) is exact. Another integrating factor is α(x, y) = 1/xy, which separates the variables

\[
\frac{dx}{x} = \frac{dy}{y} \tag{7.105}
\]

so that we can integrate and get ln(y/y₀) = ln(x/x₀) or ln(yx₀/xy₀) = 0, which implies that y = (y₀/x₀) x.
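The exactness criterion of Example 7.20 can be verified numerically with finite differences (a sketch, not from the text; the sample points are arbitrary). Before multiplying by α = 1/x², the pair P = y, Q = −x fails the test; after multiplying, it passes.

```python
def d_dy(f, x, y, h=1e-6):
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def d_dx(f, x, y, h=1e-6):
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def P(x, y): return -y / x**2   # alpha*P with alpha = 1/x^2, original P = y
def Q(x, y): return 1.0 / x     # alpha*Q with original Q = -x

pts = [(1.0, 2.0), (2.5, -1.0), (0.5, 3.0)]
gap = max(abs(d_dy(P, x, y) - d_dx(Q, x, y)) for x, y in pts)          # ~0
not_exact_gap = abs(d_dy(lambda x, y: y, 1.0, 2.0)
                    - d_dx(lambda x, y: -x, 1.0, 2.0))                 # = 2
print(gap, not_exact_gap)
```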

7.10 Homogeneous Functions

A function f(x) = f(x₁, …, x_k) of k variables x_i is homogeneous of degree n if

\[
f(tx) = f(tx_1, \ldots, tx_k) = t^n f(x). \tag{7.106}
\]

For instance, z² ln(x/y) is homogeneous of degree 2 because

\[
(tz)^2 \ln(tx/ty) = t^2 \bigl( z^2 \ln(x/y) \bigr). \tag{7.107}
\]

By differentiating (7.106) with respect to t, we find

\[
\frac{d f(tx)}{dt} = \sum_{i=1}^{k} \frac{d(t x_i)}{dt}\, \frac{\partial f(tx)}{\partial (t x_i)} = \sum_{i=1}^{k} x_i\, \frac{\partial f(tx)}{\partial (t x_i)} = n\, t^{n-1} f(x). \tag{7.108}
\]

Setting t = 1, we see that a function that is homogeneous of degree n satisfies

\[
\sum_{i=1}^{k} x_i\, \frac{\partial f(x)}{\partial x_i} = n\, f(x) \tag{7.109}
\]

which is one of Euler's many theorems.

Example 7.21 (Internal energy and entropy) The internal energy U is a first-degree homogeneous function of the entropy S, the volume V, and the numbers N_j of molecules of kind j because U(tS, tV, tN) = t U(S, V, N). The temperature T, pressure p, and chemical potential μ_j are defined as

\[
T = \left.\frac{\partial U}{\partial S}\right|_{V, N}, \quad p = -\left.\frac{\partial U}{\partial V}\right|_{S, N}, \quad \text{and} \quad \mu_j = \left.\frac{\partial U}{\partial N_j}\right|_{S, V, N_{i \neq j}} \tag{7.110}
\]

so the change dU in the internal energy is

\[
dU = T\, dS - p\, dV + \sum_j \mu_j\, dN_j. \tag{7.111}
\]

Euler's theorem (7.109) for n = 1 expresses the internal energy as

\[
U = TS - pV + \sum_j \mu_j N_j. \tag{7.112}
\]

If the energy density of empty space is positive, then the internal energy of the universe rises as it expands, and the definition (7.110) of the pressure implies that the pressure on the universe is negative. Thus dark energy accelerates the expansion of the universe.

The formula (7.111) for the change dU implies that the change in the entropy is

\[
dS = dU/T + (p/T)\, dV - \sum_j (\mu_j/T)\, dN_j \tag{7.113}
\]

which leads to relations

\[
\frac{1}{T} = \left.\frac{\partial S}{\partial U}\right|_{V, N}, \quad \frac{p}{T} = \left.\frac{\partial S}{\partial V}\right|_{U, N}, \quad \text{and} \quad \frac{\mu_j}{T} = -\left.\frac{\partial S}{\partial N_j}\right|_{U, V, N_{i \neq j}} \tag{7.114}
\]

analogous to the definitions (7.110). Unlike the internal energy, however, the entropy S(U, V, N) is not a homogeneous function of U, V, and N. The Sackur–Tetrode formula for the entropy of an ideal gas is

\[
S \approx kN \left[ \ln\!\left( \frac{V}{N} \left( \frac{4\pi m U}{3N h^2} \right)^{3/2} \right) + \frac{5}{2} \right] \tag{7.115}
\]

in which h is Planck's constant and m is the mass of a molecule of the gas.
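Euler's theorem (7.109) is easy to test numerically on the degree-2 example of (7.107), f(x, y, z) = z² ln(x/y) (this check is not from the text; the evaluation point is an arbitrary choice).

```python
import math

def f(x, y, z):
    # homogeneous of degree 2, as in (7.107)
    return z * z * math.log(x / y)

def grad(x, y, z, h=1e-6):
    # central-difference gradient
    fx = (f(x + h, y, z) - f(x - h, y, z)) / (2 * h)
    fy = (f(x, y + h, z) - f(x, y - h, z)) / (2 * h)
    fz = (f(x, y, z + h) - f(x, y, z - h)) / (2 * h)
    return fx, fy, fz

x, y, z = 2.0, 0.7, 1.3
fx, fy, fz = grad(x, y, z)
euler_lhs = x * fx + y * fy + z * fz   # should equal n*f with n = 2
euler_rhs = 2.0 * f(x, y, z)
print(euler_lhs, euler_rhs)
```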

7.11 Virial Theorem

Consider N particles moving nonrelativistically in a potential V(x) of 3N variables that is homogeneous of degree n. Their virial is the sum of the products of the coordinates x_i multiplied by the momenta p_i

\[
G = \sum_{i=1}^{3N} x_i\, p_i. \tag{7.116}
\]

In terms of the kinetic energy T = (v₁p₁ + ··· + v_{3N} p_{3N})/2, the time derivative of the virial is


\[
\frac{dG}{dt} = \sum_{i=1}^{3N} (v_i\, p_i + x_i F_i) = 2T + \sum_{i=1}^{3N} x_i F_i \tag{7.117}
\]

in which the time derivative of a momentum, ṗ_i = F_i, is a component of the force. We now form the infinite-time average of both sides of this equation

\[
\lim_{t \to \infty} \frac{G(t) - G(0)}{t} = \left\langle \frac{dG}{dt} \right\rangle = 2\,\langle T \rangle + \Bigl\langle \sum_{i=1}^{3N} x_i F_i \Bigr\rangle. \tag{7.118}
\]

If the particles are bound by a potential V, then it is reasonable to assume that the positions and momenta of the particles and their virial G(t) are bounded for all times, and we will make this assumption. It follows that as t → ∞, the time average of the time derivative Ġ of the virial must vanish

\[
0 = 2\,\langle T \rangle + \Bigl\langle \sum_{i=1}^{3N} x_i F_i \Bigr\rangle. \tag{7.119}
\]

Newton's law

\[
F_i = -\frac{\partial V(x)}{\partial x_i} \tag{7.120}
\]

now implies that

\[
2\,\langle T \rangle = \Bigl\langle \sum_{i=1}^{3N} x_i\, \frac{\partial V(x)}{\partial x_i} \Bigr\rangle. \tag{7.121}
\]

If, further, the potential V(x) is a homogeneous function of degree n, then Euler's theorem (7.109) gives us Σ_i x_i ∂_i V = nV and the virial theorem

\[
\langle T \rangle = \frac{n}{2}\, \langle V \rangle. \tag{7.122}
\]

The long-term time average of the kinetic energy of particles trapped in a homogeneous potential of degree n is n/2 times the long-term time average of their potential energy.

Example 7.22 (Coulomb forces) A 1/r gravitational or electrostatic potential is homogeneous of degree −1, and so the virial theorem asserts that particles bound in such wells must have long-term time averages that satisfy

\[
\langle T \rangle = -\tfrac{1}{2}\, \langle V \rangle. \tag{7.123}
\]

In natural units (ℏ = c = 1), the energy of an electron of momentum p a distance r from a proton is E = p²/2m − e²/r, in which e is the charge of the electron. The uncertainty principle (Example 4.7) gives us an approximate lower bound on the product, rp ≳ 1, which we will use in the form rp = 1 to estimate the energy E of the ground state of the hydrogen atom. Using 1/r = p, we have E = p²/2m − e²p. Differentiating, we find the minimum of E is at 0 = p/m − e². Thus the kinetic energy of the ground state is T = p²/2m = me⁴/2 while its potential energy is V = −e²p = −me⁴. Since T = −V/2, these values satisfy the virial theorem. They give the ground-state energy as E = T + V = −me⁴/2 = −mc²(e²/ℏc)²/2 = −13.6 eV.

Example 7.23 (Dark matter) In 1933, Zwicky applied the gravitational version of the virial theorem (7.123) to the galaxies of the Coma cluster. He used his observations of their luminosities to estimate the gravitational term −½⟨V⟩ and found it to be much less than their mean kinetic energy ⟨T⟩. He called the transparent mass dark matter (Fritz Zwicky, 1898–1974).

Example 7.24 (Harmonic forces) Particles confined in a harmonic potential V(r) = Σ_k m_k ω_k² r_k²/2, which is homogeneous of degree 2, must have long-term time averages that satisfy ⟨T⟩ = ⟨V(x)⟩.
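The harmonic case ⟨T⟩ = ⟨V⟩ of Example 7.24 can be checked with a short simulation (a sketch, not from the text; m = ω = 1 and the initial conditions are arbitrary). A leapfrog integrator evolves a one-dimensional oscillator and accumulates the time averages.

```python
def time_averages(x=1.0, v=0.0, dt=1e-3, steps=200_000):
    # leapfrog (kick-drift-kick) for m = omega = 1, F = -x
    sumT = sumV = 0.0
    for _ in range(steps):
        v += 0.5 * dt * (-x)
        x += dt * v
        v += 0.5 * dt * (-x)
        sumT += 0.5 * v * v   # kinetic energy
        sumV += 0.5 * x * x   # potential energy
    return sumT / steps, sumV / steps

T_avg, V_avg = time_averages()
print(T_avg, V_avg)   # both near E/2 = 0.25 for these initial conditions
```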

7.12 Legendre's Transform

The change in a function A(x, y) of two independent variables x and y is

\[
dA = \frac{\partial A}{\partial x}\, dx + \frac{\partial A}{\partial y}\, dy. \tag{7.124}
\]

Suppose we define a variable v as

\[
v \equiv \frac{\partial A}{\partial y} \tag{7.125}
\]

and a new function B as

\[
B \equiv A(x, y) - v\, y. \tag{7.126}
\]

Then the change in B is

\[
dB = \frac{\partial A}{\partial x}\, dx + \frac{\partial A}{\partial y}\, dy - y\, dv - \frac{\partial A}{\partial y}\, dy = \frac{\partial A}{\partial x}\, dx - y\, dv \tag{7.127}
\]

which says that B is properly a function B(x, v) of x and v. To really make B a function of x and v, however, we must invert the definition (7.125) of v and express y as a function of x and v so that we can write

\[
B(x, v) = A(x, y(x, v)) - v\, y(x, v). \tag{7.128}
\]

We also could define a new variable u as

\[
u \equiv \frac{\partial A}{\partial x} \tag{7.129}
\]

and a new function C as

\[
C \equiv A(x, y) - u\, x. \tag{7.130}
\]

The change in C is

\[
dC = \frac{\partial A}{\partial x}\, dx + \frac{\partial A}{\partial y}\, dy - x\, du - \frac{\partial A}{\partial x}\, dx = \frac{\partial A}{\partial y}\, dy - x\, du \tag{7.131}
\]

which says that C is properly a function C(u, y) of u and y. To really make C a function of u and y, however, we must invert the definition (7.129) of u and express x as a function of u and y so that we can write

\[
C(u, y) = A(x(u, y), y) - u\, x(u, y). \tag{7.132}
\]

We also could combine the two transformations by defining two variables

\[
u \equiv \frac{\partial A}{\partial x} \quad \text{and} \quad v \equiv \frac{\partial A}{\partial y} \tag{7.133}
\]

and the new function

\[
D \equiv A(x, y) - u\, x - v\, y. \tag{7.134}
\]

The change in D is

\[
dD = \frac{\partial A}{\partial x}\, dx + \frac{\partial A}{\partial y}\, dy - \frac{\partial A}{\partial x}\, dx - x\, du - \frac{\partial A}{\partial y}\, dy - y\, dv = -x\, du - y\, dv \tag{7.135}
\]

which says that D is properly a function D(u, v) of u and v. To really make D a function of u and v, however, we must invert the definitions (7.133) and express x and y in terms of u and v so that

\[
D(u, v) = A(x(u, v), y(u, v)) - u\, x(u, v) - v\, y(u, v). \tag{7.136}
\]

Example 7.25 (The functions of Lagrange and Hamilton) The lagrangian L(q, q̇) of a time-independent system is a function of the two independent variables q and q̇, and the change in L is

\[
dL = \frac{\partial L}{\partial q}\, dq + \frac{\partial L}{\partial \dot q}\, d\dot q. \tag{7.137}
\]

One defines a new variable called the momentum p as

\[
p \equiv \frac{\partial L}{\partial \dot q} \tag{7.138}
\]

and the hamiltonian H as

\[
H \equiv p\, \dot q - L(q, \dot q) \tag{7.139}
\]

in which an overall minus sign makes H positive in most cases. The change in H is

\[
dH = p\, d\dot q + \dot q\, dp - dL = \frac{\partial L}{\partial \dot q}\, d\dot q + \dot q\, dp - \frac{\partial L}{\partial q}\, dq - \frac{\partial L}{\partial \dot q}\, d\dot q = \dot q\, dp - \frac{\partial L}{\partial q}\, dq \tag{7.140}
\]

which says that H is properly a function H(q, p) of q and p. When L is quadratic in the time derivative q̇, it's easy to invert the definition (7.138) of the momentum and express the hamiltonian H as a function H(q, p). For the harmonic oscillator of mass m and angular frequency ω, the lagrangian is

\[
L = \frac{m \dot q^2}{2} - \frac{m \omega^2 q^2}{2} \tag{7.141}
\]

and the momentum is

\[
p \equiv \frac{\partial L}{\partial \dot q} = m \dot q. \tag{7.142}
\]

We easily invert this definition and get q̇ = p/m. Inserting this formula for q̇ into the definition

\[
H \equiv p\, \dot q - L = p\, \dot q - \frac{m \dot q^2}{2} + \frac{m \omega^2 q^2}{2}, \tag{7.143}
\]

we have as the hamiltonian

\[
H = \frac{p^2}{m} - \frac{p^2}{2m} + \frac{m \omega^2 q^2}{2} = \frac{p^2}{2m} + \frac{m \omega^2 q^2}{2}. \tag{7.144}
\]

One also can go backwards, from H to L. Define q̇ as

\[
\dot q \equiv \frac{\partial H}{\partial p} \tag{7.145}
\]

and set

\[
L = p\, \dot q - H. \tag{7.146}
\]

Then the change in L is

\[
dL = \dot q\, dp + p\, d\dot q - dH = \frac{\partial H}{\partial p}\, dp + p\, d\dot q - \frac{\partial H}{\partial q}\, dq - \frac{\partial H}{\partial p}\, dp = p\, d\dot q - \frac{\partial H}{\partial q}\, dq \tag{7.147}
\]

which shows that L is a function of q and q̇.
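The Legendre transform of Example 7.25 can also be carried out numerically (a sketch, not from the text; the values of m, ω, q, and p are arbitrary). The code inverts p = ∂L/∂q̇ by bisection, since ∂L/∂q̇ = m q̇ is monotonic, and compares p q̇ − L with the closed-form hamiltonian (7.144).

```python
m, w = 2.0, 3.0  # sample mass and angular frequency

def L(q, qdot):
    # harmonic-oscillator lagrangian (7.141)
    return 0.5 * m * qdot**2 - 0.5 * m * w**2 * q**2

def legendre_H(q, p, h=1e-6):
    # find qdot with dL/dqdot = p by bisection, then H = p*qdot - L
    def dL_dqdot(qd):
        return (L(q, qd + h) - L(q, qd - h)) / (2 * h)
    lo, hi = -100.0, 100.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if dL_dqdot(mid) < p:
            lo = mid
        else:
            hi = mid
    qdot = 0.5 * (lo + hi)
    return p * qdot - L(q, qdot)

q, p = 0.7, 1.3
H_num = legendre_H(q, p)
H_exact = p**2 / (2 * m) + 0.5 * m * w**2 * q**2   # eq. (7.144)
print(H_num, H_exact)
```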

Example 7.26 (Thermodynamic potentials) The internal energy U varies with the entropy S, the volume V, and the numbers of molecules N₁, N₂, … as (7.111)

\[
dU = T\, dS - p\, dV + \sum_j \mu_j\, dN_j \tag{7.148}
\]

in which T is the temperature, p is the pressure, and μ_j is a chemical potential. To change the independent variables from S, V, and N to S, p, and N, we add pV and get the enthalpy

\[
H \equiv U + pV. \tag{7.149}
\]

Its differential dH is

\[
dH = T\, dS - p\, dV + \sum_j \mu_j\, dN_j + p\, dV + V\, dp = T\, dS + V\, dp + \sum_j \mu_j\, dN_j \tag{7.150}
\]

which shows that the enthalpy depends upon the entropy S, the pressure p, and the number N_j of each kind of molecule. Subtracting TS from the enthalpy H, we get the Gibbs free energy

\[
G \equiv U + pV - TS = H - TS \tag{7.151}
\]

whose differential is

\[
dG = dH - T\, dS - S\, dT = T\, dS + V\, dp + \sum_j \mu_j\, dN_j - T\, dS - S\, dT = V\, dp - S\, dT + \sum_j \mu_j\, dN_j \tag{7.152}
\]

which shows that the Gibbs free energy is a function G(p, T, N) of the pressure p, the temperature T, and the numbers N₁, N₂, … of molecules. Finally, Helmholtz's free energy F is

\[
F \equiv U - TS \tag{7.153}
\]

so its differential is

\[
dF = dU - T\, dS - S\, dT = T\, dS - p\, dV + \sum_j \mu_j\, dN_j - T\, dS - S\, dT = -p\, dV - S\, dT + \sum_j \mu_j\, dN_j \tag{7.154}
\]

which shows that the Helmholtz free energy is a function F(V, T, N) of the volume V, the temperature T, and the numbers N of molecules.

7.13 Principle of Stationary Action in Mechanics

In classical mechanics, the motion of n particles in 3 dimensions is described by an action density or lagrangian L(q, q̇, t) in which q stands for the 3n generalized coordinates q₁, q₂, …, q_{3n} and q̇ for their time derivatives. The action of a motion q(t) is the time integral

\[
S = \int_{t_1}^{t_2} L(q, \dot q, t)\, dt. \tag{7.155}
\]


If q(t) changes slightly by δq(t), then the first-order change in the action is

\[
\delta S = \int_{t_1}^{t_2} \sum_{i=1}^{3n} \Bigl[ \frac{\partial L(q, \dot q, t)}{\partial q_i}\, \delta q_i(t) + \frac{\partial L(q, \dot q, t)}{\partial \dot q_i}\, \delta \dot q_i(t) \Bigr]\, dt. \tag{7.156}
\]

The change in q̇_i is the time derivative of the change δq_i

\[
\delta \dot q_i = \frac{d(q_i + \delta q_i)}{dt} - \frac{dq_i}{dt} = \frac{d\, \delta q_i}{dt}, \tag{7.157}
\]

so we have

\[
\delta S = \int_{t_1}^{t_2} \sum_i \Bigl[ \frac{\partial L(q, \dot q, t)}{\partial q_i}\, \delta q_i(t) + \frac{\partial L(q, \dot q, t)}{\partial \dot q_i}\, \frac{d\, \delta q_i(t)}{dt} \Bigr]\, dt. \tag{7.158}
\]

Integrating by parts, we find

\[
\delta S = \int_{t_1}^{t_2} \sum_i \Bigl[ \frac{\partial L}{\partial q_i} - \frac{d}{dt} \frac{\partial L}{\partial \dot q_i} \Bigr]\, \delta q_i(t)\, dt + \Bigl[ \sum_i \delta q_i(t)\, \frac{\partial L}{\partial \dot q_i} \Bigr]_{t_1}^{t_2}. \tag{7.159}
\]

According to the principle of stationary action, a classical process is one that makes the action stationary to first order in δq(t) for changes that vanish at the endpoints, δq(t₁) = 0 = δq(t₂). Thus a classical process satisfies Lagrange's equations

\[
\frac{d}{dt} \frac{\partial L}{\partial \dot q_i} - \frac{\partial L}{\partial q_i} = 0 \quad \text{for } i = 1, \ldots, 3n. \tag{7.160}
\]

Moreover, if the lagrangian L does not depend explicitly on the time t, as in autonomous systems, then the energy

\[
E = \sum_i \frac{\partial L}{\partial \dot q_i}\, \dot q_i - L \tag{7.161}
\]

does not change with time because its time derivative is the vanishing explicit time dependence of the lagrangian, −∂L/∂t = 0. That is, the energy is conserved

\[
\dot E = \sum_i \Bigl[ \Bigl( \frac{d}{dt} \frac{\partial L}{\partial \dot q_i} \Bigr) \dot q_i + \frac{\partial L}{\partial \dot q_i}\, \ddot q_i \Bigr] - \dot L = \sum_i \Bigl[ \frac{\partial L}{\partial q_i}\, \dot q_i + \frac{\partial L}{\partial \dot q_i}\, \ddot q_i \Bigr] - \dot L = -\frac{\partial L}{\partial t} = 0. \tag{7.162}
\]

The momentum p_i canonically conjugate to the coordinate q_i is

\[
p_i = \frac{\partial L}{\partial \dot q_i}. \tag{7.163}
\]


If we can write the time derivatives q̇_i of the coordinates in terms of the q_k's and p_k's, that is, q̇_i = q̇_i(q, p), then the hamiltonian is a Legendre transform of the lagrangian (Example 7.25)

\[
H(q, p) = \sum_{i=1}^{3n} p_i\, \dot q_i(q, p) - L(q, p). \tag{7.164}
\]

This rewriting of the velocities q̇_i in terms of the q's and p's is easy to do when the lagrangian is quadratic in the q̇_i's but not so easy in most other cases.

The change (7.159) in the action due to a tiny detour δq(t) that differs from zero only at t₂ is proportional to the momenta (7.163)

\[
\delta S = \sum_i \frac{\partial L}{\partial \dot q_i}\, \delta q_i(t_2) = \sum_i p_i\, \delta q_i(t_2) \tag{7.165}
\]

whence

\[
\frac{\partial S}{\partial q_i} = p_i. \tag{7.166}
\]

We can write the total time derivative of the action S, which by construction (7.155) is the lagrangian L, in terms of the 3n momenta (7.166) as

\[
\frac{dS}{dt} = L = \frac{\partial S}{\partial t} + \sum_i \frac{\partial S}{\partial q_i}\, \dot q_i = \frac{\partial S}{\partial t} + \sum_i p_i\, \dot q_i. \tag{7.167}
\]

Thus apart from a minus sign, the partial time derivative of the action S is the energy function (7.161) or the hamiltonian (7.164) (if we can find it)

\[
\frac{\partial S}{\partial t} = L - \sum_i p_i\, \dot q_i = -E = -H. \tag{7.168}
\]
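Stationarity of the action can be seen directly by discretizing (7.155) for a harmonic oscillator with m = ω = 1 (a sketch, not from the text; the time interval and the shape of the variation are arbitrary choices). The first-order change of the discretized action vanishes around the classical path q(t) = cos t but not around a non-classical straight-line path with the same endpoints.

```python
import math

T, N = 1.0, 2000
dt = T / N
ts = [i * dt for i in range(N + 1)]

def action(q):
    # discretized S = sum dt * (0.5*qdot^2 - 0.5*q^2)
    S = 0.0
    for i in range(N):
        qd = (q[i + 1] - q[i]) / dt          # forward-difference velocity
        qm = 0.5 * (q[i + 1] + q[i])         # midpoint position
        S += dt * (0.5 * qd**2 - 0.5 * qm**2)
    return S

def variation(path, eps=1e-3):
    # symmetric-difference dS/d(eps) along a bump vanishing at the endpoints
    bump = [math.sin(math.pi * t / T) for t in ts]
    Sp = action([c + eps * b for c, b in zip(path, bump)])
    Sm = action([c - eps * b for c, b in zip(path, bump)])
    return (Sp - Sm) / (2 * eps)

classical = [math.cos(t) for t in ts]                        # solves q'' = -q
line = [1.0 + (math.cos(T) - 1.0) * t / T for t in ts]       # same endpoints
first_variation = variation(classical)
line_variation = variation(line)
print(first_variation, line_variation)
```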

7.14 Symmetries and Conserved Quantities in Mechanics

A transformation q_i′(t) = q_i(t) + δq_i(t) and its time derivative

\[
\dot q_i'(t) = \frac{dq_i'(t)}{dt} = \frac{dq_i(t)}{dt} + \frac{d\, \delta q_i(t)}{dt} = \dot q_i(t) + \delta \dot q_i(t) \tag{7.169}
\]

is a symmetry of a lagrangian L if the resulting change δL vanishes

\[
\delta L = \sum_i \frac{\partial L}{\partial q_i(t)}\, \delta q_i(t) + \frac{\partial L}{\partial \dot q_i(t)}\, \delta \dot q_i(t) = 0. \tag{7.170}
\]

This symmetry and Lagrange's equations (7.160) imply that the quantity

\[
Q = \sum_i \frac{\partial L}{\partial \dot q_i}\, \delta q_i \tag{7.171}
\]


is conserved. That is, the time derivative of Q vanishes

\[
\frac{dQ}{dt} = \frac{d}{dt} \Bigl( \sum_i \frac{\partial L}{\partial \dot q_i}\, \delta q_i \Bigr) = \sum_i \Bigl( \frac{d}{dt} \frac{\partial L}{\partial \dot q_i} \Bigr)\, \delta q_i + \frac{\partial L}{\partial \dot q_i}\, \frac{d\, \delta q_i}{dt} = \sum_i \frac{\partial L}{\partial q_i}\, \delta q_i + \frac{\partial L}{\partial \dot q_i}\, \delta \dot q_i = 0. \tag{7.172}
\]

Example 7.27 (Conservation of momentum and angular momentum) Suppose the coordinates q_i are the spatial coordinates r_i = (x_i, y_i, z_i) of a system of particles with time derivatives v_i = (ẋ_i, ẏ_i, ż_i). If the lagrangian is unchanged, δL = 0, by spatial displacement or spatial translation by a constant vector d = (a, b, c), that is, by δx_i = a, δy_i = b, δz_i = c, then the momentum in the direction d

\[
\mathbf{P} \cdot \mathbf{d} = \sum_i \frac{\partial L}{\partial \mathbf{v}_i} \cdot \mathbf{d} = \sum_i \mathbf{p}_i \cdot \mathbf{d} \tag{7.173}
\]

is conserved. If the lagrangian is unchanged, δL = 0, when the system is rotated by an angle θ, that is, if δr_i = θ × r_i is a symmetry of the lagrangian, then the angular momentum J about the axis θ

\[
\mathbf{J} \cdot \boldsymbol{\theta} = \sum_i \frac{\partial L}{\partial \mathbf{v}_i} \cdot (\boldsymbol{\theta} \times \mathbf{r}_i) = \sum_i \mathbf{p}_i \cdot (\boldsymbol{\theta} \times \mathbf{r}_i) = \Bigl( \sum_i \mathbf{r}_i \times \mathbf{p}_i \Bigr) \cdot \boldsymbol{\theta} \tag{7.174}
\]

is conserved.
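Conservation of angular momentum in a rotationally invariant potential is easy to watch in a simulation (a sketch, not from the text; the potential V = ½r², the unit mass, and the initial conditions are arbitrary choices). For this central force a leapfrog integrator happens to conserve L_z = x p_y − y p_x essentially to machine precision.

```python
def evolve(x, y, px, py, dt=1e-3, steps=50_000):
    # planar motion in V(r) = 0.5*r^2, so F = -(x, y); m = 1
    Lz0 = x * py - y * px
    max_drift = 0.0
    for _ in range(steps):
        px -= 0.5 * dt * x
        py -= 0.5 * dt * y
        x += dt * px
        y += dt * py
        px -= 0.5 * dt * x
        py -= 0.5 * dt * y
        max_drift = max(max_drift, abs(x * py - y * px - Lz0))
    return max_drift

max_drift = evolve(1.0, 0.0, 0.0, 0.7)
print(max_drift)
```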

7.15 Homogeneous First-Order Ordinary Differential Equations

Suppose the functions P(x, y) and Q(x, y) in the first-order ODE

\[
P(x, y)\, dx + Q(x, y)\, dy = 0 \tag{7.175}
\]

are homogeneous of degree n (Ince, 1956). We change variables from x and y to x and y(x) = x v(x) so that dy = x dv + v dx, and

\[
P(x, xv)\, dx + Q(x, xv)\, (x\, dv + v\, dx) = 0. \tag{7.176}
\]

The homogeneity of P(x, y) and Q(x, y) implies that

\[
x^n P(1, v)\, dx + x^n Q(1, v)\, (x\, dv + v\, dx) = 0. \tag{7.177}
\]

Rearranging this equation, we are able to separate the variables

\[
\frac{dx}{x} + \frac{Q(1, v)}{P(1, v) + v\, Q(1, v)}\, dv = 0. \tag{7.178}
\]


We integrate this equation

\[
\ln x + \int^{v} \frac{Q(1, v')\, dv'}{P(1, v') + v'\, Q(1, v')} = C \tag{7.179}
\]

and find v(x) and so too the solution y(x) = x v(x).

Example 7.28 (Using homogeneity) In the differential equation

\[
(x^2 - y^2)\, dx + 2xy\, dy = 0 \tag{7.180}
\]

the coefficients of the differentials P(x, y) = x² − y² and Q(x, y) = 2xy are homogeneous functions of degree n = 2, so the above method applies. With y(x) = x v(x), we have

\[
x^2 (1 - v^2)\, dx + 2x^2 v\, (v\, dx + x\, dv) = 0 \tag{7.181}
\]

in which x² cancels out, leaving (1 + v²) dx + 2vx dv = 0. Separating variables and integrating, we find

\[
\int \frac{dx}{x} + \int \frac{2v\, dv}{1 + v^2} = \ln C \tag{7.182}
\]

or ln x + ln(1 + v²) = ln C. So (1 + v²) x = C, which leads to the general integral x² + y² = Cx and so to y(x) = \sqrt{Cx - x^2} as the solution of the ODE (7.180).

7.16 Linear First-Order Ordinary Differential Equations

The general form of a linear first-order ODE is

\[
\frac{dy}{dx} + r(x)\, y = s(x). \tag{7.183}
\]

We always can find an integrating factor α(x) that makes

\[
0 = \alpha\, (r y - s)\, dx + \alpha\, dy \tag{7.184}
\]

exact. With P ≡ α(ry − s) and Q ≡ α, the condition (7.80) for this equation to be exact is P_y = αr = Q_x = α_x, or α_x/α = r. So

\[
\frac{d \ln \alpha}{dx} = r \tag{7.185}
\]

which we integrate and exponentiate

\[
\alpha(x) = \alpha(x_0)\, \exp\Bigl( \int_{x_0}^{x} r(x')\, dx' \Bigr). \tag{7.186}
\]

Now since αr = α_x, the original equation (7.183) multiplied by this integrating factor is

\[
\alpha y_x + \alpha r y = \alpha y_x + \alpha_x y = (\alpha y)_x = \alpha s. \tag{7.187}
\]

Integrating, we find

\[
\alpha(x)\, y(x) = \alpha(x_0)\, y(x_0) + \int_{x_0}^{x} \alpha(x')\, s(x')\, dx' \tag{7.188}
\]

so that

\[
y(x) = \frac{\alpha(x_0)\, y(x_0)}{\alpha(x)} + \frac{1}{\alpha(x)} \int_{x_0}^{x} \alpha(x')\, s(x')\, dx' \tag{7.189}
\]

in which α(x) is the exponential (7.186). More explicitly, y(x) is

\[
y(x) = \exp\Bigl( -\int_{x_0}^{x} r(x')\, dx' \Bigr) \Bigl[ y(x_0) + \int_{x_0}^{x} \exp\Bigl( \int_{x_0}^{x'} r(x'')\, dx'' \Bigr)\, s(x')\, dx' \Bigr]. \tag{7.190}
\]

The first term in the square brackets multiplied by the prefactor α(x₀)/α(x) is the general solution of the homogeneous equation y_x + ry = 0. The second term in the square brackets multiplied by the prefactor α(x₀)/α(x) is a particular solution of the inhomogeneous equation y_x + ry = s. Thus Equation (7.190) expresses the general solution of the inhomogeneous equation (7.183) as the sum of a particular solution of the inhomogeneous equation and the general solution of the associated homogeneous equation, as in Section 7.1.

We were able to find an integrating factor α because the original equation (7.183) was linear in y. So we could set P = α(ry − s) and Q = α. When P and Q are more complicated, integrating factors are harder to find or nonexistent.

Example 7.29 (Bodies falling in air) The downward speed v of a mass m in a gravitational field of constant acceleration g is described by the inhomogeneous first-order ODE m v_t = mg − bv, in which b represents air resistance. This equation is like (7.183) but with t instead of x as the independent variable, r = b/m, and s = g. Thus by (7.190), its solution is

\[
v(t) = \frac{mg}{b} + \Bigl[ v(0) - \frac{mg}{b} \Bigr]\, e^{-bt/m}. \tag{7.191}
\]

The terminal speed mg/b is nearly 200 km/h for a falling man. A diving Peregrine falcon can exceed 320 km/h; so can a falling bullet. But mice can fall down mine shafts and run off unhurt, and insects and birds can fly.

If the falling bodies are microscopic, a statistical model is appropriate. The potential energy of a mass m at height h is V = mgh. The heights of particles at temperature T follow Boltzmann's distribution (1.392)

\[
P(h) = P(0)\, e^{-mgh/kT} \tag{7.192}
\]

in which k = 1.380 6504 × 10⁻²³ J/K = 8.617 343 × 10⁻⁵ eV/K is Boltzmann's constant. The probability depends exponentially upon the mass m and drops by a factor of e with the scale height S = kT/mg, which can be a few kilometers for a small molecule.

Example 7.30 (R-C circuit) The capacitance C of a capacitor is the charge Q it holds (on each plate) divided by the applied voltage V, that is, C = Q/V. The current I through the capacitor is the time derivative of the charge, I = Q̇ = CV̇. The voltage across a resistor of R ohms through which a current I flows is V = IR by Ohm's law. So if a time-dependent voltage V(t) is applied to a capacitor in series with a resistor, then V(t) = Q/C + IR. The current I therefore obeys the first-order differential equation

\[
\dot I + I/RC = \dot V / R \tag{7.193}
\]

or (7.183) with x → t, y → I, r → 1/RC, and s → V̇/R. Since r is a constant, the integrating factor α(x) → α(t) is

\[
\alpha(t) = \alpha(t_0)\, e^{(t - t_0)/RC}. \tag{7.194}
\]

Our general solution (7.190) of linear first-order ODEs gives us the expression

\[
I(t) = e^{-(t - t_0)/RC} \Bigl[ I(t_0) + \int_{t_0}^{t} e^{(t' - t_0)/RC}\, \frac{\dot V(t')}{R}\, dt' \Bigr] \tag{7.195}
\]

for the current I(t).

Example 7.31 (Emission rate from fluorophores) A fluorophore is a molecule that emits light when illuminated. The frequency of the emitted photon usually is less than that of the incident one. Consider a population of N fluorophores of which N₊ are excited and can emit light and N₋ = N − N₊ are unexcited. If the fluorophores are exposed to an illuminating photon flux I, and the cross-section for the excitation of an unexcited fluorophore is σ, then the rate at which unexcited fluorophores become excited is IσN₋. The time derivative of the number of excited fluorophores is then

\[
\dot N_+ = I\sigma N_- - \frac{1}{\tau}\, N_+ = -\frac{1}{\tau}\, N_+ + I\sigma\, (N - N_+) \tag{7.196}
\]

in which 1/τ is the decay rate (also the emission rate) of the excited fluorophores. Using the shorthand a = Iσ + 1/τ, we have Ṅ₊ = −aN₊ + IσN, which we solve using the general formula (7.190) with r = a and s = IσN

\[
N_+(t) = e^{-at} \Bigl[ N_+(0) + \int_0^t e^{at'}\, I(t')\, \sigma N\, dt' \Bigr]. \tag{7.197}
\]

If the illumination I(t) is constant, then by doing the integral we find

\[
N_+(t) = \frac{I\sigma N}{a}\, \bigl( 1 - e^{-at} \bigr) + N_+(0)\, e^{-at}. \tag{7.198}
\]

The emission rate E = N₊(t)/τ of photons from the N₊(t) excited fluorophores then is

\[
E = \frac{I\sigma N}{a\tau}\, \bigl( 1 - e^{-at} \bigr) + \frac{N_+(0)}{\tau}\, e^{-at} \tag{7.199}
\]

which with a = Iσ + 1/τ gives for the emission rate per fluorophore

\[
\frac{E}{N} = \frac{I\sigma}{1 + I\sigma\tau}\, \Bigl( 1 - e^{-(I\sigma + 1/\tau)\, t} \Bigr) \tag{7.200}
\]

if no fluorophores were excited at t = 0, so that N₊(0) = 0.
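The constant-illumination formula (7.198) can be checked against a direct Euler integration of the rate equation (7.196) (a sketch, not from the text; all parameter values below are arbitrary sample choices).

```python
import math

I, sigma, tau, N = 2.0, 0.5, 1.5, 1000.0  # sample values
a = I * sigma + 1.0 / tau

def Nplus_formula(t, Nplus0=0.0):
    # closed form (7.198)
    return I * sigma * N / a * (1.0 - math.exp(-a * t)) + Nplus0 * math.exp(-a * t)

def Nplus_ode(t, Nplus0=0.0, steps=100_000):
    # Euler integration of dN+/dt = -a*N+ + I*sigma*N
    dt = t / steps
    n = Nplus0
    for _ in range(steps):
        n += dt * (-a * n + I * sigma * N)
    return n

t = 3.0
n_formula = Nplus_formula(t)
n_ode = Nplus_ode(t)
print(n_formula, n_ode)
```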

7.17 Small Oscillations

Actual physical problems often involve systems of differential equations. The motion of n particles in 3 dimensions is described by a system (7.160) of 3n differential equations. Electrodynamics involves the four Maxwell equations (12.34 and 12.35). Thousands of coupled differential equations describe the chemical reactions among the many molecular species in a living cell. Despite this bewildering complexity, many systems merely execute small oscillations about the minimum of their potential energy. The lagrangian

\[
L = \sum_{i=1}^{3n} \frac{m_i}{2}\, \dot x_i^2 - U(x) \tag{7.201}
\]

describes n particles of mass m_i interacting through a potential U(x) that has no explicit time dependence. By letting q_i = \sqrt{m_i/m}\, x_i we may scale the masses to the same value m and set V(q) = U(x), so that we have

\[
L = \frac{m}{2} \sum_{i=1}^{3n} \dot q_i^2 - V(q) = \frac{m}{2}\, \dot q \cdot \dot q - V(q) \tag{7.202}
\]

which describes n particles of mass m interacting through a potential V(q). Since U(x) and V(q) depend upon time only because the variables x_i and q_i vary with the time, the energy or equivalently the hamiltonian

\[
H = \sum_{i=1}^{3n} \frac{p_i^2}{2m} + V(q) \tag{7.203}
\]


is conserved. It has a minimum energy E₀ at q₀, and so its first derivatives there vanish. So near q₀, the potential V(q) to lowest order is a quadratic form in the displacements r_i ≡ q_i − q_i⁰ from their minima q_i⁰, and the lagrangian, apart from the constant V(q₀), is

\[
L \approx \frac{m}{2} \sum_{i=1}^{3n} \dot r_i^2 - \frac{1}{2} \sum_{j,k=1}^{3n} r_j r_k\, \frac{\partial^2 V(q_0)}{\partial q_j\, \partial q_k}. \tag{7.204}
\]

The matrix V″ of second derivatives is real and symmetric, and so we may diagonalize it, V″ = Oᵀ V″_d O, by an orthogonal transformation O. The lagrangian is diagonal in the new coordinates s = O r

\[
L = \frac{1}{2} \sum_{i=1}^{3n} \bigl( m\, \dot s_i^2 - V''_{di}\, s_i^2 \bigr) \tag{7.205}
\]

and Lagrange's equations are m s̈_i = −V″_{di} s_i. These normal modes are uncoupled harmonic oscillators s_i(t) = a_i cos(\sqrt{V''_{di}/m}\, t) + b_i sin(\sqrt{V''_{di}/m}\, t) with frequencies that are real because q₀ is the minimum of the potential.

7.18 Systems of Ordinary Differential Equations

An autonomous system of n first-order ordinary differential equations

\[
\begin{aligned}
\dot x_1 &= F_1(x_1, x_2, \ldots, x_n) \\
\dot x_2 &= F_2(x_1, x_2, \ldots, x_n) \\
&\;\;\vdots \\
\dot x_n &= F_n(x_1, x_2, \ldots, x_n)
\end{aligned} \tag{7.206}
\]

is one in which the functions F_i(x₁, …, x_n) do not depend upon the time t. First-order autonomous systems are very general. They can even represent systems that do depend upon time and systems of higher-order differential equations. To make a time-dependent system of n equations with time derivatives F_i(x₁, …, x_n, t) autonomous, one sets t = x_{n+1} so that F_i(x₁, …, x_n, t) becomes F_i(x₁, …, x_n, x_{n+1}), and F_{n+1}(x₁, …, x_{n+1}) = ẋ_{n+1} = ṫ = 1. For example, one can turn the explicitly time-dependent differential equation ẋ = −x³ + sin³(t) into the autonomous system ẋ₁ = −x₁³ + sin³(x₂) and ẋ₂ = 1 by setting x₁ = x and x₂ = t.

A similar trick turns a system of n higher-order ordinary differential equations into a system of more than n first-order differential equations. For instance, we can replace the third-order differential equation \( \dddot x = x^3 + \dot x^2 \) by the first-order system

\[
\dot x_1 = x_3^3 + x_2^2, \quad \dot x_2 = x_1, \quad \text{and} \quad \dot x_3 = x_2 \tag{7.207}
\]

in which x₁ = ẍ, x₂ = ẋ, and x₃ = x.

Example 7.32 (First-order form of a second-order system) One can turn the second-order nonlinear differential equations

\[
\ddot x = -r x\, (x^2 - a^2) - g x y^2 \quad \text{and} \quad \ddot y = -r y\, (y^2 - b^2) - g y x^2 \tag{7.208}
\]

into the first-order system

\[
\dot x_1 = -r x_3 (x_3^2 - a^2) - g x_3 x_4^2, \quad \dot x_2 = -r x_4 (x_4^2 - b^2) - g x_4 x_3^2, \quad \dot x_3 = x_1, \quad \dot x_4 = x_2 \tag{7.209}
\]

by setting x₁ = ẋ, x₂ = ẏ, x₃ = x, and x₄ = y. A representative trajectory (x(t), y(t)) is plotted in Fig. 7.1.

Figure 7.1 (Trajectory with four corners.) A trajectory (x(t), y(t)) of the autonomous system (7.209) for 0 ≤ t ≤ 400 with r = 0.2, a = 10, b = 10, g = 1 and initial conditions x(0) = 5, y(0) = 5, ẋ(0) = 10, ẏ(0) = 0. Matlab scripts for this chapter's figures are in Differential_equations at github.com/kevinecahill.
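The system (7.209) is easy to integrate with a classical fourth-order Runge–Kutta step (a sketch, not from the text; the book's figures use Matlab). Since (7.208) derives from a potential, the energy E = (x₁² + x₂²)/2 + V(x₃, x₄) is conserved and serves as a consistency check on the integration.

```python
r, a, b, g = 0.2, 10.0, 10.0, 1.0  # parameters of Fig. 7.1

def f(s):
    x1, x2, x3, x4 = s
    return (-r * x3 * (x3**2 - a**2) - g * x3 * x4**2,
            -r * x4 * (x4**2 - b**2) - g * x4 * x3**2,
            x1,
            x2)

def rk4_step(s, h):
    k1 = f(s)
    k2 = f(tuple(si + 0.5 * h * ki for si, ki in zip(s, k1)))
    k3 = f(tuple(si + 0.5 * h * ki for si, ki in zip(s, k2)))
    k4 = f(tuple(si + h * ki for si, ki in zip(s, k3)))
    return tuple(si + h / 6.0 * (c1 + 2 * c2 + 2 * c3 + c4)
                 for si, c1, c2, c3, c4 in zip(s, k1, k2, k3, k4))

def energy(s):
    x1, x2, x3, x4 = s
    V = (r * (x3**4 / 4 - a**2 * x3**2 / 2)
         + r * (x4**4 / 4 - b**2 * x4**2 / 2)
         + g * x3**2 * x4**2 / 2)
    return 0.5 * (x1**2 + x2**2) + V

s = (10.0, 0.0, 5.0, 5.0)   # xdot(0) = 10, ydot(0) = 0, x(0) = 5, y(0) = 5
E0 = energy(s)
h = 1e-3
for _ in range(10_000):     # integrate to t = 10
    s = rk4_step(s, h)
drift = abs(energy(s) - E0)
print(drift)
```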


7.19 Exact Higher-Order Differential Equations

An nth-order ordinary differential equation of the form

\[
0 = \frac{d}{dx} \Bigl[ P_{n-1}(x)\, \frac{d^{n-1} y(x)}{dx^{n-1}} + P_{n-2}(x)\, \frac{d^{n-2} y(x)}{dx^{n-2}} + \cdots + P_0(x)\, y(x) \Bigr] \tag{7.210}
\]

is exact (Bender and Orszag, 1978, p. 13).

Example 7.33 (Exact second-order equation) The differential equation

\[
0 = y'' + x^2 y' + 2xy = \frac{d}{dx} \bigl( y' + x^2 y \bigr) \tag{7.211}
\]

is exact. We can solve it by integrating the first-order equation y′ + x²y = c, where c is a constant. Using the method of Section 7.16 with r(x) = x² and s(x) = c, we find

\[
y(x) = e^{-x^3/3} \Bigl[ c' + c \int_0^x e^{x'^3/3}\, dx' \Bigr] \tag{7.212}
\]

where c′ is another constant.

Example 7.34 (Integrating factor) The exponential eˣ is an integrating factor for the differential equation y″ + (1 + 1/x) y′ + (1/x − 1/x²) y = 0 because

\[
0 = e^x y'' + \frac{x+1}{x}\, e^x y' + \frac{x-1}{x^2}\, e^x y = \frac{d}{dx} \Bigl( e^x y' + \frac{e^x}{x}\, y \Bigr). \tag{7.213}
\]

Solving the equation y′ + y/x = c e⁻ˣ by using the method of Section 7.16 with r(x) = 1/x and s(x) = c e⁻ˣ, we find y(x) = −c₁ (1 + 1/x) e⁻ˣ + c₂/x.
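A finite-difference check confirms the solution of Example 7.34 (a sketch, not from the text; the constants c₁, c₂ and the sample points are arbitrary). The residual of y″ + (1 + 1/x) y′ + (1/x − 1/x²) y should vanish along y(x) = −c₁(1 + 1/x)e⁻ˣ + c₂/x.

```python
import math

c1, c2 = 0.7, -1.3  # arbitrary constants of integration

def y(x):
    return -c1 * (1.0 + 1.0 / x) * math.exp(-x) + c2 / x

def residual(x, h=1e-4):
    yp = (y(x + h) - y(x - h)) / (2 * h)             # y'
    ypp = (y(x + h) - 2 * y(x) + y(x - h)) / (h * h)  # y''
    return ypp + (1 + 1 / x) * yp + (1 / x - 1 / x**2) * y(x)

max_res = max(abs(residual(x)) for x in [0.5, 1.0, 2.0, 4.0])
print(max_res)
```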

7.20 Constant-Coefficient Equations

The exact, higher-order differential equations

\[
0 = \sum_{k=0}^{n} c_k\, \frac{d^k y}{dt^k} \tag{7.214}
\]

with coefficients c_k that are constants independent of t are soluble. We try y(t) = e^{zt} and find that the complex number z must satisfy the equation

\[
0 = \sum_{k=0}^{n} c_k z^k\, e^{zt} \quad \text{or} \quad 0 = \sum_{k=0}^{n} c_k z^k. \tag{7.215}
\]

The fundamental theorem of algebra (Section 6.9) tells us that this equation always has n solutions z₁, z₂, …, z_n. For each root z_i of the polynomial P(z) = c₀ + c₁z + ··· + c_n zⁿ, the function y(t) = e^{z_i t} is a solution of the differential equation (7.214).

The roots need not all be different. But since partial derivatives commute, the jth z-derivative of the differential equation (7.214) is by Leibniz's rule (5.49)

\[
\sum_{k=0}^{n} c_k\, \frac{\partial^k}{\partial t^k} \bigl( t^j e^{zt} \bigr) = \frac{\partial^j}{\partial z^j} \sum_{k=0}^{n} c_k\, \frac{\partial^k}{\partial t^k}\, e^{zt} = \frac{\partial^j}{\partial z^j} \sum_{k=0}^{n} c_k z^k e^{zt} = \frac{\partial^j}{\partial z^j} \bigl( P(z)\, e^{zt} \bigr) = \sum_{\ell=0}^{j} \binom{j}{\ell} \Bigl( \frac{\partial^\ell P(z)}{\partial z^\ell} \Bigr)\, t^{j-\ell}\, e^{zt}. \tag{7.216}
\]

Thus if z = z′ is a multiple root of order m, then the ℓth z-derivative of P(z) will vanish at z = z′ for ℓ < m, and so for j < m, the functions e^{z′t}, t e^{z′t}, …, t^j e^{z′t} will be solutions of the differential equation (7.214).

Example 7.35 (A fourth-order constant-coefficient differential equation) The differential equation d⁴y/dt⁴ − 4 d³y/dt³ + 6ÿ − 4ẏ + y = 0 has four solutions eᵗ, teᵗ, t²eᵗ, and t³eᵗ.

Example 7.36 (Euler equations) An equation of the form

\[
0 = \sum_{k=0}^{n} c_k\, t^{n-k}\, \frac{d^k y}{dt^k} \tag{7.217}
\]

is called an Euler equation or an equidimensional equation. It is an exact, constant-coefficient differential equation in the variable x, with t = eˣ,

\[
0 = \sum_{k=0}^{n} c_k'\, \frac{d^k y}{dx^k} \tag{7.218}
\]

whose solutions y(x) = e^{z_i x} or y(t) = t^{z_i} are given in terms of the roots z_i of the polynomial P′(z) = c₀′ + c₁′z + ··· + c_n′zⁿ. If z′ is a root of order m of this polynomial, then the functions (ln t) t^{z′}, …, (ln t)^{m−1} t^{z′} also are solutions.
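Example 7.35 can be checked numerically: z = 1 is a fourth-order root of P(z) = (z − 1)⁴, so t³eᵗ should solve the equation (a sketch, not from the text; the step size and sample points are arbitrary, and the tolerance reflects the finite-difference truncation error of a fourth derivative).

```python
import math

def y(t):
    return t**3 * math.exp(t)

def deriv(f, t, k, h=0.005):
    # kth derivative by recursively applied central differences
    if k == 0:
        return f(t)
    return (deriv(f, t + h, k - 1, h) - deriv(f, t - h, k - 1, h)) / (2 * h)

def residual(t):
    # y'''' - 4 y''' + 6 y'' - 4 y' + y
    return (deriv(y, t, 4) - 4 * deriv(y, t, 3) + 6 * deriv(y, t, 2)
            - 4 * deriv(y, t, 1) + y(t))

max_res = max(abs(residual(t)) for t in [0.3, 0.7, 1.0])
print(max_res)
```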

7.21 Singular Points of Second-Order Ordinary Differential Equations

If in the ODE y″ = f(x, y, y′), the acceleration y″ = f(x₀, y, y′) is finite for all finite y and y′, then x₀ is a regular point of the ODE. If y″ = f(x₀, y, y′) is infinite for any finite y and y′, then x₀ is a singular point of the ODE.

If a second-order ODE y″ + P(x) y′ + Q(x) y = 0 is linear and homogeneous and both P(x₀) and Q(x₀) are finite, then x₀ is a regular point of the ODE. But if P(x₀) or Q(x₀) or both are infinite, then x₀ is a singular point. Some singular points are regular. If P(x) or Q(x) diverges as x → x₀, but both (x − x₀) P(x) and (x − x₀)² Q(x) remain finite as x → x₀, then x₀ is a regular singular point or equivalently a nonessential singular point. But if either (x − x₀) P(x) or (x − x₀)² Q(x) diverges as x → x₀, then x₀ is an irregular singular point or equivalently an essential singularity.

To treat the point at infinity, one sets z = 1/x. Then if (2z − P(1/z))/z² and Q(1/z)/z⁴ remain finite as z → 0, the point x₀ = ∞ is a regular point of the ODE. If they don't remain finite, but (2z − P(1/z))/z and Q(1/z)/z² do remain finite as z → 0, then x₀ = ∞ is a regular singular point. Otherwise the point at infinity is an irregular singular point or an essential singularity.

Example 7.37 (Legendre's equation)

¹»

Its self-adjoint form is

¼ ± º± 2

+ ´(´ + 1) y = 0 which is (1 − x 2 )y ±± − 2x y ± + ´(´ + 1) y = 0 or ´(´ + 1 ) 2x y± + y = 0. y±± − 2 1−x 1 − x2 It has regular singular points at x = ¶1 and x = ∞ (Exercise 7.15 ). 1−x

y

(7.219)

(7.220)

7.22 Frobenius's Series Solutions

Frobenius showed how to find a power-series solution of a second-order linear homogeneous ordinary differential equation $y'' + P(x)\,y' + Q(x)\,y = 0$ at any of its regular or regular singular points. We set $p(x) = x\,P(x)$ and $q(x) = x^2\,Q(x)$ and assume that $p$ and $q$ are polynomials or analytic functions, and that $x = 0$ is a regular or regular singular point of the ODE so that $p(0)$ and $q(0)$ are both finite. Then writing the differential equation as
$$x^2\,y'' + x\,p(x)\,y' + q(x)\,y = 0, \tag{7.221}$$
we expand $y$ as a power series in $x$ about $x = 0$
$$y(x) = x^r \sum_{n=0}^{\infty} a_n x^n \tag{7.222}$$
in which $a_0 \neq 0$ is the coefficient of the lowest power of $x$ in $y(x)$. Differentiating, we have
$$y'(x) = \sum_{n=0}^{\infty} (r + n)\,a_n\,x^{r+n-1} \tag{7.223}$$
and
$$y''(x) = \sum_{n=0}^{\infty} (r + n)(r + n - 1)\,a_n\,x^{r+n-2}. \tag{7.224}$$
When we substitute the three series (7.222–7.224) into our differential equation (7.221), we find
$$\sum_{n=0}^{\infty} \Big[ (n + r)(n + r - 1) + (n + r)\,p(x) + q(x) \Big]\, a_n\,x^{n+r} = 0. \tag{7.225}$$
If this equation is to be satisfied for all $x$, then the coefficient of every power of $x$ must vanish. The lowest power of $x$ is $x^r$, and it occurs when $n = 0$ with coefficient $\big[ r(r - 1 + p(0)) + q(0) \big]\,a_0$. Thus since $a_0 \neq 0$, we have
$$r\,(r - 1 + p(0)) + q(0) = 0. \tag{7.226}$$

This quadratic indicial equation has two roots $r_1$ and $r_2$. To analyze higher powers of $x$, we introduce the notation
$$p(x) = \sum_{j=0}^{\infty} p_j\,x^j \quad \text{and} \quad q(x) = \sum_{j=0}^{\infty} q_j\,x^j \tag{7.227}$$
in which $p_0 = p(0)$ and $q_0 = q(0)$. The requirement (Exercise 7.16) that the coefficient of $x^{r+k}$ vanish gives us a recurrence relation
$$a_k = - \frac{1}{(r + k)(r + k - 1 + p_0) + q_0} \sum_{j=0}^{k-1} \big[ (j + r)\,p_{k-j} + q_{k-j} \big]\, a_j \tag{7.228}$$
that expresses $a_k$ in terms of $a_0, a_1, \ldots, a_{k-1}$. When $p(x)$ and $q(x)$ are polynomials of low degree, these equations become much simpler. When the roots $r_1$ and $r_2$ are complex, the coefficients $a_n$ also are complex, and the real and imaginary parts of the complex solution $y(x)$ are two solutions of the differential equation.

Example 7.38 (Sines and cosines) To apply Frobenius's method to the ODE $y'' + \omega^2 y = 0$, we first write it in the form $x^2 y'' + x\,p(x)\,y' + q(x)\,y = 0$ in which $p(x) = 0$ and $q(x) = \omega^2 x^2$. So both $p(0) = p_0 = 0$ and $q(0) = q_0 = 0$, and the indicial equation (7.226) is $r(r - 1) = 0$ with roots $r_1 = 0$ and $r_2 = 1$. We first set $r = r_1 = 0$. Since the $p$'s and $q$'s vanish except for $q_2 = \omega^2$, the recurrence relation (7.228) is $a_k = -q_2\,a_{k-2}/k(k-1) = -\omega^2 a_{k-2}/k(k-1)$. Thus $a_2 = -\omega^2 a_0/2$, and $a_{2n} = (-1)^n \omega^{2n} a_0/(2n)!$. The recurrence relation (7.228) gives no information about $a_1$, so to find the simplest solution, we set $a_1 = 0$. The recurrence relation $a_k = -\omega^2 a_{k-2}/k(k-1)$ then makes all the terms $a_{2n+1}$ of odd index vanish. Our solution for the first root $r_1 = 0$ then is
$$y(x) = \sum_{n=0}^{\infty} a_n x^n = a_0 \sum_{n=0}^{\infty} (-1)^n\, \frac{(\omega x)^{2n}}{(2n)!} = a_0 \cos \omega x. \tag{7.229}$$

Similarly, the recurrence relation (7.228) for the second root $r_2 = 1$ is $a_k = -\omega^2 a_{k-2}/k(k+1)$, so that $a_{2n} = (-1)^n \omega^{2n} a_0/(2n+1)!$, and we again set all the terms of odd index equal to zero. Thus we have
$$y(x) = x \sum_{n=0}^{\infty} a_n x^n = \frac{a_0}{\omega} \sum_{n=0}^{\infty} (-1)^n\, \frac{(\omega x)^{2n+1}}{(2n+1)!} = \frac{a_0}{\omega}\, \sin \omega x \tag{7.230}$$
as our solution for the second root $r_2 = 1$.
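The recurrence of Example 7.38 is easy to run numerically. The sketch below (plain Python, with the same simplest choice $a_0 = 1$, $a_1 = 0$ as in the text) builds the coefficients for the root $r_1 = 0$ and compares the truncated series with $\cos \omega x$.

```python
# Hedged sketch: the Frobenius recurrence for y'' + w^2 y = 0 with root r = 0.
# With p(x) = 0 and q(x) = (w x)^2, the recurrence a_k = -w^2 a_{k-2}/(k(k-1))
# should reproduce the Taylor coefficients of a0 * cos(w x).
import math

w = 2.0
a = [1.0, 0.0]                      # a0 = 1, a1 = 0 (the simplest choice)
for k in range(2, 20):
    a.append(-w**2 * a[k - 2] / (k*(k - 1)))

x = 0.3
series = sum(ak * x**k for k, ak in enumerate(a))
assert abs(series - math.cos(w*x)) < 1e-9
```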

Frobenius's method sometimes shows that solutions exist only when a parameter in the ODE assumes a special value called an eigenvalue.

Example 7.39 (Legendre's equation) If one rewrites Legendre's equation $(1 - x^2)\,y'' - 2x\,y' + \lambda y = 0$ as $x^2 y'' + x\,p\,y' + q\,y = 0$, then one finds $p(x) = -2x^2/(1 - x^2)$ and $q(x) = x^2 \lambda/(1 - x^2)$, which are analytic but not polynomials. In this case, it is simpler to substitute the expansions (7.222–7.224) directly into Legendre's equation $(1 - x^2)\,y'' - 2x\,y' + \lambda y = 0$. We then find
$$\sum_{n=0}^{\infty} \Big[ (n + r)(n + r - 1)(1 - x^2)\,x^{n+r-2} - 2(n + r)\,x^{n+r} + \lambda\,x^{n+r} \Big]\, a_n = 0.$$
The coefficient of the lowest power of $x$ is $r(r - 1)\,a_0$, and so the indicial equation is $r(r - 1) = 0$. For $r = 0$, we shift the index $n$ on the term $n(n - 1)\,x^{n-2} a_n$ to $n = j + 2$ and replace $n$ by $j$ in the other terms:
$$\sum_{j=0}^{\infty} \Big[ (j + 2)(j + 1)\,a_{j+2} - \big( j(j - 1) + 2j - \lambda \big)\, a_j \Big]\, x^j = 0. \tag{7.231}$$
Since the coefficient of $x^j$ must vanish, we get the recursion relation
$$a_{j+2} = \frac{j(j + 1) - \lambda}{(j + 2)(j + 1)}\, a_j \tag{7.232}$$
which for big $j$ says that $a_{j+2} \approx a_j$. Thus the series (7.222) does not converge for $|x| \geq 1$ unless $\lambda = j(j + 1)$ for some integer $j$, in which case the series (7.222) is a Legendre polynomial (Chapter 9).
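The termination of the series at the eigenvalues $\lambda = \ell(\ell+1)$ can be seen directly by iterating the recursion (7.232). The sketch below (plain Python, starting from $a_0 = 1$, $a_1 = 0$) uses $\ell = 2$, for which the series stops after $1 - 3x^2$, a multiple of the Legendre polynomial $P_2(x) = (3x^2 - 1)/2$.

```python
# Hedged sketch: the Legendre recursion a_{j+2} = (j(j+1) - lam)/((j+2)(j+1)) a_j
# terminates when lam = l(l+1); here l = 2 gives a polynomial ~ P_2(x).
l = 2
lam = l*(l + 1)
a = [1.0, 0.0]                       # a0 = 1, a1 = 0
for j in range(0, 10):
    a.append((j*(j + 1) - lam) / ((j + 2)*(j + 1)) * a[j])

# a = [1, 0, -3, 0, 0, ...]: the series stops, giving 1 - 3x^2 = -2 * P_2(x).
assert a[2] == -3.0
assert all(abs(c) < 1e-15 for c in a[4:])
```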

Frobenius's method also allows one to expand solutions about $x_0 \neq 0$
$$y(x) = (x - x_0)^r \sum_{n=0}^{\infty} a_n\,(x - x_0)^n. \tag{7.233}$$

7.23 Fuchs's Theorem

The method of Frobenius can run amok, especially if one expands about a singular point $x_0$. One can get only one solution or none at all. But Fuchs showed that if one applies Frobenius's method to a linear homogeneous second-order ODE and expands about a regular point or a regular singular point, then one always gets at least one power-series solution:

1. If the two roots of the indicial equation are equal, one gets only one solution.
2. If the two roots differ by a noninteger, one gets two solutions.
3. If the two roots differ by an integer, then the bigger root yields a solution.

Example 7.40 (Roots that differ by an integer) If one applies the method of Frobenius to Legendre's equation as in Example 7.39, then one finds (Exercise 7.18) that the $r = 0$ and $r = 1$ roots lead to the same solution.

7.24 Even and Odd Differential Operators

Under the parity transformation $x \to -x$, a typical term transforms as
$$x^n \left( \frac{d}{dx} \right)^{\!p} x^k = \frac{k!}{(k-p)!}\, x^{n+k-p} \;\to\; (-1)^{n+k-p}\, \frac{k!}{(k-p)!}\, x^{n+k-p} \tag{7.234}$$
and so the corresponding differential operator transforms as
$$x^n \left( \frac{d}{dx} \right)^{\!p} \;\to\; (-1)^{n-p}\, x^n \left( \frac{d}{dx} \right)^{\!p}. \tag{7.235}$$
The reflected form of the second-order linear differential operator
$$L(x) = h_0(x) + h_1(x)\, \frac{d}{dx} + h_2(x)\, \frac{d^2}{dx^2} \tag{7.236}$$
therefore is
$$L(-x) = h_0(-x) - h_1(-x)\, \frac{d}{dx} + h_2(-x)\, \frac{d^2}{dx^2}. \tag{7.237}$$
The operator $L(x)$ is even if it is unchanged by reflection, that is, if $h_0(-x) = h_0(x)$, $h_1(-x) = -h_1(x)$, and $h_2(-x) = h_2(x)$, so that
$$L(-x) = L(x). \tag{7.238}$$
It is odd if it changes sign under reflection, that is, if $h_0(-x) = -h_0(x)$, $h_1(-x) = h_1(x)$, and $h_2(-x) = -h_2(x)$, so that
$$L(-x) = -L(x). \tag{7.239}$$
Not every differential operator $L(x)$ is even or odd. But just as we can write every function $f(x)$ whose reflected form $f(-x)$ is well defined as the sum of $[f(x) + f(-x)]/2$, which is even, and $[f(x) - f(-x)]/2$, which is odd,
$$f(x) = \frac{1}{2}\big[ f(x) + f(-x) \big] + \frac{1}{2}\big[ f(x) - f(-x) \big] \tag{7.240}$$
so too we can write every differential operator $L(x)$ whose reflected form $L(-x)$ is well defined as the sum of one that is even and one that is odd
$$L(x) = \frac{1}{2}\big[ L(x) + L(-x) \big] + \frac{1}{2}\big[ L(x) - L(-x) \big]. \tag{7.241}$$
Many of the standard differential operators have $h_0 = 1$ and are even.

If $y(x)$ is a solution of the ODE $L(x)\,y(x) = 0$ and $L(-x)$ and $y(-x)$ are well defined, then we have $L(-x)\,y(-x) = 0$. If further $L(-x) = \pm L(x)$, then $y(-x)$ also is a solution, $L(x)\,y(-x) = 0$. Thus if a differential operator $L(x)$ has a definite parity, that is, if $L(x)$ is either even or odd, then $y(-x)$ is a solution if $y(x)$ is, and solutions come in pairs $y(x) \pm y(-x)$, one even, one odd.

7.25 Wronski's Determinant

If the $N$ functions $y_1(x), \ldots, y_N(x)$ are linearly dependent, then by (7.7) there is a set of coefficients $k_1, \ldots, k_N$, not all zero, such that the sum
$$0 = k_1 y_1(x) + \cdots + k_N y_N(x) \tag{7.242}$$
vanishes for all $x$. Differentiating $i$ times, we get
$$0 = k_1 y_1^{(i)}(x) + \cdots + k_N y_N^{(i)}(x) \tag{7.243}$$
for all $x$. So if we use the $y_j$ and their derivatives to define the matrix
$$Y_{ij}(x) \equiv y_j^{(i-1)}(x) \tag{7.244}$$
then we may express the linear dependence (7.242) and (7.243) of the functions $y_1, \ldots, y_N$ in matrix notation as $0 = Y(x)\,k$ for some nonzero vector $k = (k_1, k_2, \ldots, k_N)$. Since the matrix $Y(x)$ maps the nonzero vector $k$ to zero, its determinant must vanish: $\det(Y(x)) \equiv |Y(x)| = 0$. This determinant
$$W(x) = |Y(x)| = \left| y_j^{(i-1)}(x) \right| \tag{7.245}$$
is called Wronski's determinant or the wronskian. It vanishes on an interval if and only if the functions $y_j(x)$ or their derivatives are linearly dependent on the interval.
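For two functions the wronskian is a $2 \times 2$ determinant, and a computer algebra system makes the independence test concrete. The sketch below, assuming SymPy, checks that $\sin x$ and $\cos x$ are independent while $x$ and $2x$ are not.

```python
# Hedged sketch (assumes SymPy): the 2x2 wronskian | f g ; f' g' | as a
# linear-independence test.
import sympy as sp

x = sp.symbols('x')

def wronskian(f, g):
    # determinant f g' - g f'
    return sp.simplify(f*sp.diff(g, x) - g*sp.diff(f, x))

assert wronskian(sp.sin(x), sp.cos(x)) == -1   # independent: W = -sin^2 - cos^2
assert wronskian(x, 2*x) == 0                  # dependent: W vanishes identically
```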


7.26 Second Solutions

If we have one solution to a second-order linear homogeneous ODE, then we may use the wronskian to find a second solution. Here's how: If $y_1$ and $y_2$ are two linearly independent solutions of the second-order linear homogeneous ordinary differential equation
$$y''(x) + P(x)\,y'(x) + Q(x)\,y(x) = 0 \tag{7.246}$$
then their wronskian does not vanish
$$W(x) = \begin{vmatrix} y_1(x) & y_2(x) \\ y_1'(x) & y_2'(x) \end{vmatrix} = y_1(x)\,y_2'(x) - y_2(x)\,y_1'(x) \neq 0 \tag{7.247}$$
except perhaps at isolated points. Its derivative $W'$ must obey
$$W' = y_1'\,y_2' + y_1\,y_2'' - y_2'\,y_1' - y_2\,y_1'' = y_1\,y_2'' - y_2\,y_1'' \tag{7.248}$$
$$W' = -y_1 \big( P\,y_2' + Q\,y_2 \big) + y_2 \big( P\,y_1' + Q\,y_1 \big) = -P \big( y_1\,y_2' - y_2\,y_1' \big) \tag{7.249}$$
or $W'(x) = -P(x)\,W(x)$, which we integrate to
$$W(x) = W(x_0) \exp\left( - \int_{x_0}^{x} P(x')\,dx' \right). \tag{7.250}$$
This is Abel's formula for the wronskian (Niels Abel, 1802–1829).

Having expressed the wronskian in terms of the known function $P(x)$, we now use it to find $y_2(x)$ from $y_1(x)$. We note that
$$W = y_1\,y_2' - y_2\,y_1' = y_1^2\, \frac{d}{dx}\left( \frac{y_2}{y_1} \right). \tag{7.251}$$
So
$$\frac{d}{dx}\left( \frac{y_2}{y_1} \right) = \frac{W}{y_1^2} \tag{7.252}$$
which we integrate to
$$y_2(x) = y_1(x) \left[ \int^{x} \frac{W(x')}{y_1^2(x')}\, dx' + c \right]. \tag{7.253}$$
Using our formula (7.250) for the wronskian, we find as the second solution
$$y_2(x) = y_1(x) \int^{x} \frac{1}{y_1^2(x')} \exp\left( - \int^{x'} P(x'')\,dx'' \right) dx' \tag{7.254}$$
apart from additive and multiplicative constants. In the important special case in which $P(x) = 0$, the wronskian is a constant, $W'(x) = 0$, and the second solution is simply
$$y_2(x) = y_1(x) \int^{x} \frac{dx'}{y_1^2(x')}. \tag{7.255}$$
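The $P = 0$ formula can be checked numerically. The sketch below (plain Python, crude midpoint quadrature) takes $y_1 = \cos x$ for $y'' + y = 0$; the integral of $1/\cos^2$ is $\tan x$, so the formula should return $\cos x \tan x = \sin x$ up to the usual constants.

```python
# Hedged sketch: second solution y2 = y1 * Int dx'/y1^2 for the P = 0 case.
# For y'' + y = 0 with y1 = cos x, this should give y2 = sin x.
import math

def y2(x, n=50000):
    # midpoint rule for Int_0^x dx' / cos^2(x')
    h = x / n
    integral = sum(h / math.cos((i + 0.5)*h)**2 for i in range(n))
    return math.cos(x) * integral

x = 0.7
assert abs(y2(x) - math.sin(x)) < 1e-6
```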

By Fuchs's theorem, Frobenius's expansion about a regular point or a regular singular point yields at least one solution. From this solution, we can use Wronski's trick to find a (linearly independent) second solution. So we always get two linearly independent solutions if we expand a second-order linear homogeneous ODE about a regular point or a regular singular point.

7.27 Why Not Three Solutions?

We have seen that a second-order linear homogeneous ODE has two linearly independent solutions. Why not three? If $y_1$, $y_2$, and $y_3$ were three linearly independent solutions of the second-order linear homogeneous ODE
$$0 = y_j'' + P\,y_j' + Q\,y_j, \tag{7.256}$$
then their third-order wronskian
$$W = \begin{vmatrix} y_1 & y_2 & y_3 \\ y_1' & y_2' & y_3' \\ y_1'' & y_2'' & y_3'' \end{vmatrix} \tag{7.257}$$
would not vanish except at isolated points. But the ODE (7.256) relates the second derivatives $y_j'' = -(P\,y_j' + Q\,y_j)$ to the $y_j'$ and the $y_j$, and so the third row of this third-order wronskian is a linear combination of the first two rows. Thus it vanishes identically
$$W = \begin{vmatrix} y_1 & y_2 & y_3 \\ y_1' & y_2' & y_3' \\ -P y_1' - Q y_1 & -P y_2' - Q y_2 & -P y_3' - Q y_3 \end{vmatrix} = 0 \tag{7.258}$$
and so any three solutions of a second-order ODE (7.256) are linearly dependent.

One may extend this argument to show that an $n$th-order linear homogeneous ODE can have at most $n$ linearly independent solutions. To do so, we'll use superscript notation (2.6) in which $y^{(n)}$ denotes the $n$th derivative of $y(x)$ with respect to $x$
$$y^{(n)} \equiv \frac{d^n y}{dx^n}. \tag{7.259}$$
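The vanishing of the third-order wronskian can be verified symbolically. The sketch below, assuming SymPy, takes three solutions of $y'' + y = 0$, namely $\sin x$, $\cos x$, and $\sin(x + 1)$, and checks that their $3 \times 3$ wronskian is identically zero.

```python
# Hedged sketch (assumes SymPy): any three solutions of a second-order linear
# homogeneous ODE have a vanishing third-order wronskian. Here y'' + y = 0.
import sympy as sp

x = sp.symbols('x')
ys = [sp.sin(x), sp.cos(x), sp.sin(x + 1)]   # all solve y'' + y = 0

# rows are the 0th, 1st, and 2nd derivatives of the three solutions
W = sp.Matrix([[sp.diff(y, x, i) for y in ys] for i in range(3)]).det()
assert sp.simplify(sp.expand_trig(W)) == 0
```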


Suppose there were $n + 1$ linearly independent solutions $y_j$ of the ODE
$$y^{(n)} + P_1\,y^{(n-1)} + P_2\,y^{(n-2)} + \cdots + P_{n-1}\,y' + P_n\,y = 0 \tag{7.260}$$
in which the $P_k$'s are functions of $x$. Then we could form a Wronskian of order $(n + 1)$ in which row 1 would be $y_1, \ldots, y_{n+1}$, row 2 would be the first derivatives $y_1', \ldots, y_{n+1}'$, and row $n + 1$ would be the $n$th derivatives $y_1^{(n)}, \ldots, y_{n+1}^{(n)}$. We could then replace each term $y_k^{(n)}$ in the last row by
$$y_k^{(n)} = -P_1\,y_k^{(n-1)} - P_2\,y_k^{(n-2)} - \cdots - P_{n-1}\,y_k' - P_n\,y_k. \tag{7.261}$$
But then the last row would be a linear combination of the first $n$ rows, the determinant would vanish, and the $n + 1$ solutions would be linearly dependent. This is why an $n$th-order linear homogeneous ODE can have at most $n$ linearly independent solutions.

7.28 Boundary Conditions

Since an $n$th-order linear homogeneous ordinary differential equation can have at most $n$ linearly independent solutions, it follows that we can make a solution unique by requiring it to satisfy $n$ boundary conditions. We'll see that the $n$ arbitrary coefficients $c_k$ of the general solution
$$y(x) = \sum_{k=1}^{n} c_k\,y_k(x) \tag{7.262}$$
of the differential equation (7.260) are fixed by the $n$ boundary conditions
$$y(x_1) = b_1, \quad y(x_2) = b_2, \quad \ldots, \quad y(x_n) = b_n \tag{7.263}$$
as long as the functions $y_k(x)$ are linearly independent, and as long as the matrix $Y$ with entries $Y_{jk} = y_k(x_j)$ is nonsingular, that is, $\det Y \neq 0$. In matrix notation, with $B$ a vector with components $b_j$ and $C$ a vector with components $c_k$, the $n$ boundary conditions (7.263) are
$$y(x_j) = \sum_{k=1}^{n} c_k\,y_k(x_j) = b_j \quad \text{or} \quad Y C = B. \tag{7.264}$$
Thus since $\det Y \neq 0$, the coefficients are uniquely given by $C = Y^{-1} B$.

The boundary conditions can involve the derivatives $y_k^{(\ell_j)}(x_j)$. We then define the matrix $Y_{jk} = y_k^{(\ell_j)}(x_j)$ and write the $n$ boundary conditions as
$$y^{(\ell_j)}(x_j) = \sum_{k=1}^{n} c_k\,y_k^{(\ell_j)}(x_j) = b_j \tag{7.265}$$


or as $Y C = B$, and show (Exercise 7.20) that as long as the matrix $Y$ is nonsingular, the $n$ coefficients are uniquely $C = Y^{-1} B$.

But what if all the $b_j$ are zero? If all the boundary conditions are homogeneous, $Y C = 0$, and $\det Y \neq 0$, then $Y^{-1} Y C = C = 0$, and $y(x) \equiv 0$ is the only solution. So there is no nonzero solution if $B = 0$ and the matrix $Y$ is nonsingular. But if the $n \times n$ matrix $Y$ has rank $n - 1$, then (Section 1.38) it maps a unique vector $C$ to zero (apart from an overall factor). So if all the boundary conditions are homogeneous, and the matrix $Y$ has rank $n - 1$, then the solution $y = \sum_k c_k\,y_k$ is unique. But if the rank of $Y$ is less than $n - 1$, the solution is not unique. Since a matrix of rank zero vanishes identically, any nonzero $2 \times 2$ matrix $Y$ must be of rank 1 or 2. Thus a second-order ODE with two homogeneous boundary conditions either has a unique solution or none at all.

Example 7.41 (Boundary conditions and eigenvalues) The solutions $y_k$ of the differential equation $-y'' = k^2 y$ are $y_1(x) = \sin kx$ and $y_2(x) = \cos kx$. If we impose the boundary conditions $y(-a) = 0$ and $y(a) = 0$, then the matrix $Y_{jk} = y_k(x_j)$ is
$$Y = \begin{pmatrix} -\sin ka & \cos ka \\ \sin ka & \cos ka \end{pmatrix} \tag{7.266}$$
with determinant $\det Y = -2 \sin ka \cos ka = -\sin 2ka$. This determinant vanishes only if $ka = n\pi/2$ for some integer $n$, so if $ka \neq n\pi/2$, then no solution $y$ of the differential equation $-y'' = k^2 y$ satisfies the boundary conditions $y(-a) = 0 = y(a)$. But if $ka = n\pi/2$, then there is a solution, and it is unique because for even (odd) $n$, the first (second) column of $Y$ vanishes, but not the second (first), which implies that $Y$ has rank 1. One may regard the condition $ka = n\pi/2$ either as determining the eigenvalue $k^2$ or as telling us what interval to use.
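The eigenvalue condition of Example 7.41 can be checked numerically. The sketch below (plain Python) builds the determinant of the boundary matrix and confirms that it vanishes exactly at $ka = n\pi/2$ and is nonzero in between.

```python
# Hedged sketch: det Y = -2 sin(ka) cos(ka) = -sin(2ka) for the boundary
# conditions y(-a) = y(a) = 0 of Example 7.41.
import math

def detY(k, a):
    ka = k*a
    # Y = [[-sin ka, cos ka], [sin ka, cos ka]]
    return (-math.sin(ka))*math.cos(ka) - math.cos(ka)*math.sin(ka)

a = 1.0
for n in range(1, 5):                        # eigenvalue condition ka = n*pi/2
    assert abs(detY(n*math.pi/2/a, a)) < 1e-12
assert abs(detY(1.0, a)) > 0.1               # generic ka: no solution exists
```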

7.29 A Variational Problem

For what functions $u(x)$ is the "energy" functional
$$E[u] \equiv \int_a^b \big[ p(x)\,u'^2(x) + q(x)\,u^2(x) \big]\, dx \tag{7.267}$$
stationary? That is, for what functions $u$ is $E[u + \delta u]$ unchanged to first order in $\delta u$ when $u(x)$ is changed by an arbitrary but tiny function $\delta u(x)$ to $u(x) + \delta u(x)$? Our equations will be less cluttered if we drop explicit mention of the $x$-dependence of $p$, $q$, and $u$, which we assume to be real functions of $x$. The first-order change in $E$ is
$$\delta E[u] \equiv \int_a^b \big( p\,2u'\,\delta u' + q\,2u\,\delta u \big)\, dx \tag{7.268}$$


in which the change in the derivative of $u$ is $\delta u' = u' + (\delta u)' - u' = (\delta u)'$. Setting $\delta E = 0$ and integrating by parts, we have
$$
\begin{aligned}
0 = \delta E &= \int_a^b \big[ p\,u'\,(\delta u)' + q\,u\,\delta u \big]\, dx \\
&= \int_a^b \Big[ \big( p\,u'\,\delta u \big)' - \big( p\,u' \big)'\,\delta u + q\,u\,\delta u \Big]\, dx \\
&= \int_a^b \Big[ -\big( p\,u' \big)' + q\,u \Big]\, \delta u\, dx + \Big[ p\,u'\,\delta u \Big]_a^b.
\end{aligned} \tag{7.269}
$$
So if $E$ is to be stationary with respect to all tiny changes $\delta u$ that vanish at the endpoints $a$ and $b$, then $u$ must satisfy the differential equation
$$L u = -\big( p\,u' \big)' + q\,u = 0. \tag{7.270}$$
If instead $E$ is to be stationary with respect to all tiny changes $\delta u$, then $u$ must satisfy the differential equation (7.270) as well as the natural boundary conditions
$$0 = p(b)\,u'(b) \quad \text{and} \quad 0 = p(a)\,u'(a). \tag{7.271}$$

If $p(a) \neq 0 \neq p(b)$, then these natural boundary conditions imply Neumann's boundary conditions
$$u'(a) = 0 \quad \text{and} \quad u'(b) = 0 \tag{7.272}$$
(Carl Neumann, 1832–1925).

7.30 Self-Adjoint Differential Operators

If $p(x)$ and $q(x)$ are real, then the differential operator
$$L = -\frac{d}{dx}\left( p(x)\,\frac{d}{dx} \right) + q(x) \tag{7.273}$$
is formally self adjoint. Such operators are interesting because if we take any two functions $u$ and $v$ that are twice differentiable on an interval $[a, b]$ and integrate $v\,L u$ twice by parts over the interval, we get
$$
\begin{aligned}
(v, L u) = \int_a^b v\,L u\, dx &= \int_a^b v \Big[ -\big( p\,u' \big)' + q\,u \Big]\, dx \\
&= \int_a^b \big( p\,u'\,v' + u\,q\,v \big)\, dx - \big[ v\,p\,u' \big]_a^b \\
&= \int_a^b \big[ -(p\,v')' + q\,v \big]\,u\, dx + \big[ p\,u\,v' - v\,p\,u' \big]_a^b \\
&= \int_a^b (L v)\,u\, dx + \big[ p\,(u\,v' - v\,u') \big]_a^b
\end{aligned} \tag{7.274}
$$


which is Green's formula
$$\int_a^b \big( v\,L u - u\,L v \big)\, dx = \big[ p\,(u\,v' - v\,u') \big]_a^b = \big[ p\,W(u, v) \big]_a^b \tag{7.275}$$
(George Green, 1793–1841). Its differential form is Lagrange's identity
$$v\,L u - u\,L v = \big[ p\,W(u, v) \big]' \tag{7.276}$$
(Joseph-Louis Lagrange, 1736–1813).

Thus if the twice-differentiable functions $u$ and $v$ satisfy boundary conditions at $x = a$ and $x = b$ that make the boundary term (7.275) vanish
$$\big[ p\,(u\,v' - v\,u') \big]_a^b = \big[ p\,W(u, v) \big]_a^b = 0 \tag{7.277}$$
then the real differential operator $L$ is symmetric
$$(v, L u) = \int_a^b v\,L u\, dx = \int_a^b u\,L v\, dx = (u, L v). \tag{7.278}$$
A real linear operator $A$ that acts in a real vector space and satisfies the analogous relation (1.176)
$$(g, A f) = (f, A g) = (A g, f) \tag{7.279}$$

for all vectors in the space is said to be symmetric and self adjoint. In this sense, the differential operator (7.273) is self adjoint on the space of functions that satisfy the boundary condition (7.277).

In quantum mechanics, we often deal with wave functions that are complex. So keeping $L$ real, let's replace $u$ and $v$ by twice-differentiable, complex-valued functions $\psi = u_1 + i u_2$ and $\chi = v_1 + i v_2$. If $u_1$, $u_2$, $v_1$, and $v_2$ satisfy boundary conditions at $x = a$ and $x = b$ that make the boundary terms (7.277) vanish
$$\big[ p\,(u_i\,v_j' - v_j\,u_i') \big]_a^b = \big[ p\,W(u_i, v_j) \big]_a^b = 0 \quad \text{for } i, j = 1, 2 \tag{7.280}$$
then (7.278) implies that
$$\int_a^b v_j\,L u_i\, dx = \int_a^b \big( L v_j \big)\,u_i\, dx \quad \text{for } i, j = 1, 2. \tag{7.281}$$
Under these assumptions, one may show (Exercise 7.21) that the boundary condition (7.280) makes the complex boundary term vanish
$$\big[ p\,W(\psi, \chi^*) \big]_a^b = \big[ p \big( \psi\,\chi^{*\prime} - \psi'\,\chi^* \big) \big]_a^b = 0 \tag{7.282}$$
and (Exercise 7.22) that since $L$ is real, the identity (7.281) holds for complex functions
$$(\chi, L \psi) = \int_a^b \chi^*\,L \psi\, dx = \int_a^b (L \chi)^*\,\psi\, dx = (L \chi, \psi). \tag{7.283}$$


A linear operator $A$ that satisfies the analogous relation (1.169)
$$(g, A f) = (A g, f) \tag{7.284}$$
is said to be self adjoint or hermitian. In this sense, the differential operator (7.273) is self adjoint on the space of functions that satisfy the boundary condition (7.282).

The formally self-adjoint differential operator (7.273) will satisfy the inner-product integral equations (7.278 or 7.283) only when the function $p$ and the twice-differentiable functions $u$ and $v$ or $\psi$ and $\chi$ conspire to make the boundary terms (7.277 or 7.282) vanish. This requirement leads us to define a self-adjoint differential system.

7.31 Self-Adjoint Differential Systems

A self-adjoint differential system consists of a real formally self-adjoint differential operator, a differential equation on an interval, boundary conditions, and a set of twice differentiable functions that obey them. A second-order differential equation needs two boundary conditions to make a solution unique (Section 7.28). In a self-adjoint differential system, the two boundary conditions are linear and homogeneous so that the set of all twice differentiable functions $u$ that satisfy them is a vector space. This space $D$ is the domain of the system.

For an interval $[a, b]$, Dirichlet's boundary conditions (Johann Dirichlet, 1805–1859) are
$$u(a) = 0 \quad \text{and} \quad u(b) = 0 \tag{7.285}$$
and Neumann's (7.272) are
$$u'(a) = 0 \quad \text{and} \quad u'(b) = 0. \tag{7.286}$$
We will require that all the functions in the domain $D$ either obey Dirichlet boundary conditions or obey Neumann boundary conditions.

The adjoint domain $D^*$ of a differential system is the set of all twice-differentiable functions $v$ that make the boundary term (7.277) vanish
$$\big[ p\,(u\,v' - v\,u') \big]_a^b = \big[ p\,W(u, v) \big]_a^b = 0 \tag{7.287}$$
for all functions $u$ that are in the domain $D$, that is, that satisfy either Dirichlet or Neumann boundary conditions.

A differential system is regular and self adjoint if the differential operator $L u = -(p u')' + q u$ is formally self adjoint, if the interval $[a, b]$ is finite, if $p$, $p'$, and $q$ are continuous real functions of $x$ on the interval, if $p(x) > 0$ on $[a, b]$, and if the two domains $D$ and $D^*$ coincide, $D = D^*$.


One may show (Exercises 7.23 and 7.24) that if $D$ is the set of all twice-differentiable functions $u(x)$ on $[a, b]$ that satisfy either Dirichlet's boundary conditions (7.285) or Neumann's boundary conditions (7.286), and if the function $p(x)$ is continuous and positive on $[a, b]$, then the adjoint set $D^*$ is the same as $D$. A real formally self-adjoint differential operator $L u = -(p u')' + q u$ together with Dirichlet (7.285) or Neumann (7.286) boundary conditions therefore forms a regular and self-adjoint system if $p$, $p'$, and $q$ are real and continuous on a finite interval $[a, b]$, and $p$ is positive on $[a, b]$.

Since any two functions $u$ and $v$ in the domain $D$ of a regular and self-adjoint differential system make the boundary term (7.287) vanish, a real formally self-adjoint differential operator $L$ is symmetric and self adjoint (7.278) on all functions in its domain
$$(v, L u) = \int_a^b v\,L u\, dx = \int_a^b u\,L v\, dx = (u, L v). \tag{7.288}$$

If functions in the domain are complex, then by (7.282 and 7.283) the operator $L$ is self adjoint or hermitian
$$(\chi, L \psi) = \int_a^b \chi^*\,L \psi\, dx = \int_a^b (L \chi)^*\,\psi\, dx = (L \chi, \psi) \tag{7.289}$$
on all complex functions $\psi$ and $\chi$ in its domain.

Example 7.42 (Sines and cosines) The differential system with the formally self-adjoint differential operator
$$L = -\frac{d^2}{dx^2} \tag{7.290}$$
on an interval $[a, b]$ and the differential equation $L u = -u'' = \lambda u$ has the function $p(x) = 1$. If we choose the interval to be $[-\pi, \pi]$ and the domain $D$ to be the set of all functions that are twice differentiable on this interval and satisfy Dirichlet boundary conditions (7.285), then we get a self-adjoint differential system in which the domain includes linear combinations of $u_n(x) = \sin nx$. If instead we impose Neumann boundary conditions (7.286), then the domain $D$ contains linear combinations of $u_n(x) = \cos nx$. In both cases, the system is regular and self adjoint.

Some important differential systems are self adjoint but singular because the function $p(x)$ vanishes at one or both of the endpoints of the interval $[a, b]$ or because the interval is infinite, for instance $[0, \infty)$ or $(-\infty, \infty)$. In these singular, self-adjoint differential systems, the boundary term (7.287) vanishes if $u$ and $v$ are in the domain $D = D^*$.

Example 7.43 (Legendre's system) Legendre's formally self-adjoint differential operator is
$$L = -\frac{d}{dx}\left[ (1 - x^2)\,\frac{d}{dx} \right] \tag{7.291}$$
and his differential equation is
$$L u = -\big[ (1 - x^2)\,u' \big]' = \ell(\ell+1)\,u \tag{7.292}$$
on the interval $[-1, 1]$. The function $p(x) = 1 - x^2$ vanishes at both endpoints $x = \pm 1$, and so this self-adjoint system is singular. Because $p(\pm 1) = 0$, the boundary term (7.287) is zero as long as the functions $u$ and $v$ are differentiable on the interval. The domain $D$ is the set of all functions that are twice differentiable on the interval $[-1, 1]$.

Example 7.44 (Hermite's system) Hermite's formally self-adjoint differential operator is
$$L = -\frac{d^2}{dx^2} + x^2 \tag{7.293}$$
and his differential equation is
$$L u = -u'' + x^2\,u = (2n + 1)\,u \tag{7.294}$$
on the interval $(-\infty, \infty)$. This system has $p(x) = 1$ and $q(x) = x^2$. It is self adjoint but singular because the interval is infinite. The domain $D$ consists of all functions that are twice-differentiable and that go to zero as $x \to \pm\infty$ faster than $1/x^{3/2}$, which ensures that the relevant integrals converge and that the boundary term (7.287) vanishes.

7.32 Making Operators Formally Self-Adjoint

We can make a generic real second-order linear homogeneous differential operator
$$L_0 = h_2\,\frac{d^2}{dx^2} + h_1\,\frac{d}{dx} + h_0 \tag{7.295}$$
formally self adjoint
$$L = -\frac{d}{dx}\left( p(x)\,\frac{d}{dx} \right) + q(x) = -p(x)\,\frac{d^2}{dx^2} - p'(x)\,\frac{d}{dx} + q(x) \tag{7.296}$$
by first dividing through by $-h_2(x)$
$$L_1 = -\frac{1}{h_2}\,L_0 = -\frac{d^2}{dx^2} - \frac{h_1}{h_2}\,\frac{d}{dx} - \frac{h_0}{h_2} \tag{7.297}$$


and then by multiplying $L_1$ by the positive prefactor
$$p(x) = \exp\left( \int^x \frac{h_1(y)}{h_2(y)}\,dy \right) > 0. \tag{7.298}$$
The product $p\,L_1$ then is formally self adjoint
$$
\begin{aligned}
L = p(x)\,L_1 &= -\exp\left( \int^x \frac{h_1(y)}{h_2(y)}\,dy \right) \left[ \frac{d^2}{dx^2} + \frac{h_1(x)}{h_2(x)}\,\frac{d}{dx} + \frac{h_0(x)}{h_2(x)} \right] \\
&= -\frac{d}{dx}\left[ \exp\left( \int^x \frac{h_1(y)}{h_2(y)}\,dy \right) \frac{d}{dx} \right] - \exp\left( \int^x \frac{h_1(y)}{h_2(y)}\,dy \right) \frac{h_0(x)}{h_2(x)} \\
&= -\frac{d}{dx}\left( p\,\frac{d}{dx} \right) + q
\end{aligned} \tag{7.299}
$$
with $q(x) = -p(x)\,h_0(x)/h_2(x)$.

So we may turn any second-order linear homogeneous differential operator $L_0$ (7.295) into a formally self-adjoint operator $L$ by multiplying it by
$$\rho(x) = -\frac{\exp\left( \int^x h_1(y)/h_2(y)\,dy \right)}{h_2(x)} = -\frac{p(x)}{h_2(x)}. \tag{7.300}$$
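The prefactor construction can be tried out symbolically. The sketch below, assuming SymPy, builds $p(x) = \exp\big(\int h_1/h_2\,dx\big)$ for Legendre's equation in its standard form, where $h_2 = 1 - x^2$ and $h_1 = -2x$; the defining property $p'/p = h_1/h_2$ is checked rather than the closed form $1 - x^2$ itself, since SymPy may pick a different branch of the logarithm.

```python
# Hedged sketch (assumes SymPy): the self-adjointing prefactor of (7.298)
# for (1 - x^2) y'' - 2x y' + lam y = 0, i.e. h2 = 1 - x^2, h1 = -2x.
import sympy as sp

x = sp.symbols('x')
h2 = 1 - x**2
h1 = -2*x
p = sp.exp(sp.integrate(h1/h2, x))

# p must satisfy p'/p = h1/h2; for Legendre, p is 1 - x^2 up to a constant.
assert sp.simplify(sp.diff(p, x)/p - h1/h2) == 0
```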

The two differential equations $L_0 u = 0$ and $L u = \rho\,L_0 u = 0$ have the same solutions, but under the transformation (7.300), an eigenvalue equation $L_0 u = \lambda u$ becomes $L u = \rho\,L_0 u = \rho \lambda u$, which is an eigenvalue equation
$$L u = -(p\,u')' + q\,u = \lambda\,\rho\,u \tag{7.301}$$

with a weight function $\rho(x)$. Such an eigenvalue problem is known as a Sturm–Liouville problem (Jacques Sturm, 1803–1855; Joseph Liouville, 1809–1882). If $h_2(x)$ is negative (as for many positive operators), then the weight function $\rho(x) = -p(x)/h_2(x)$ is positive.

7.33 Wronskians of Self-Adjoint Operators

We saw in (7.246–7.250) that if $y_1(x)$ and $y_2(x)$ are two linearly independent solutions of the ODE
$$y''(x) + P(x)\,y'(x) + Q(x)\,y(x) = 0 \tag{7.302}$$
then their wronskian $W(x) = y_1(x)\,y_2'(x) - y_2(x)\,y_1'(x)$ is
$$W(x) = W(x_0) \exp\left( -\int_{x_0}^{x} P(x')\,dx' \right). \tag{7.303}$$
Thus if we convert the ODE (7.302) to its formally self-adjoint form
$$-\big[ p(x)\,y'(x) \big]' + q(x)\,y(x) = -p(x)\,\frac{d^2 y(x)}{dx^2} - p'(x)\,\frac{dy(x)}{dx} + q(x)\,y(x) = 0 \tag{7.304}$$
then $P(x) = p'(x)/p(x)$, and so the wronskian (7.303) is
$$W(x) = W(x_0) \exp\left( -\int_{x_0}^{x} \frac{p'(x')}{p(x')}\,dx' \right) \tag{7.305}$$
which we may integrate directly to
$$W(x) = W(x_0) \exp\Big[ -\ln\big( p(x)/p(x_0) \big) \Big] = W(x_0)\,\frac{p(x_0)}{p(x)}. \tag{7.306}$$
We learned in (7.246–7.254) that if we had one solution $y_1(x)$ of the ODE (7.302 or 7.304), then we could find another solution $y_2(x)$ that is linearly independent of $y_1(x)$ as
$$y_2(x) = y_1(x) \int^x \frac{W(x')}{y_1^2(x')}\,dx'. \tag{7.307}$$
In view of (7.303), this is an iterated integral. But if the ODE is formally self adjoint, then the formula (7.306) reduces it to
$$y_2(x) = y_1(x) \int^x \frac{1}{p(x')\,y_1^2(x')}\,dx' \tag{7.308}$$

apart from a constant factor.

Example 7.45 (Legendre functions of the second kind) Legendre's self-adjoint differential equation (9.4) is
$$-\big[ (1 - x^2)\,y' \big]' = \ell(\ell+1)\,y \tag{7.309}$$
and an obvious solution for $\ell = 0$ is $y(x) \equiv P_0(x) = 1$. Since $p(x) = 1 - x^2$, the integral formula (7.308) gives us as a second solution
$$Q_0(x) = P_0(x) \int^x \frac{1}{p(x')\,P_0^2(x')}\,dx' = \int^x \frac{dx'}{1 - x'^2} = \frac{1}{2} \ln\left( \frac{1 + x}{1 - x} \right). \tag{7.310}$$
This second solution $Q_0(x)$ is singular at both ends of the interval $[-1, 1]$ and so does not satisfy the Dirichlet (7.285) or Neumann (7.286) boundary conditions that make the system self adjoint or hermitian.
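The closed form for $Q_0$ can be checked against the integral formula directly. The sketch below (plain Python, crude midpoint quadrature) evaluates $\int_0^x dx'/(1 - x'^2)$ and compares it with $\tfrac{1}{2}\ln\big((1+x)/(1-x)\big)$.

```python
# Hedged sketch: Q0(x) from the integral formula (7.308) versus its closed form.
import math

def Q0(x, n=50000):
    # midpoint rule for Int_0^x dx' / (1 - x'^2)
    h = x / n
    return sum(h / (1 - ((i + 0.5)*h)**2) for i in range(n))

x = 0.5
exact = 0.5 * math.log((1 + x) / (1 - x))
assert abs(Q0(x) - exact) < 1e-8
```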


7.34 First-Order Self-Adjoint Differential Operators

The first-order differential operator
$$L = u\,\frac{d}{dx} + v \tag{7.311}$$
will be self adjoint if
$$\int_a^b \chi^*\,L \psi\, dx = \int_a^b (L \chi)^*\,\psi\, dx. \tag{7.312}$$
Starting from the first term, we find
$$
\begin{aligned}
\int_a^b \chi^*\,L \psi\, dx &= \int_a^b \chi^* \big( u\,\psi' + v\,\psi \big)\, dx \\
&= \int_a^b \big[ (-\chi^* u)' + \chi^* v \big]\,\psi\, dx + \big[ \chi^* u\,\psi \big]_a^b \\
&= \int_a^b \big[ (-\chi\,u^*)' + \chi\,v^* \big]^*\,\psi\, dx + \big[ \chi^* u\,\psi \big]_a^b \\
&= \int_a^b \big[ -u^*\chi' + (v^* - u^{*\prime})\,\chi \big]^*\,\psi\, dx + \big[ \chi^* u\,\psi \big]_a^b.
\end{aligned} \tag{7.313}
$$

So if the boundary terms vanish
$$\big[ \chi^* u\,\psi \big]_a^b = 0 \tag{7.314}$$
and if both $u^* = -u$ and $v^* - u^{*\prime} = v$, then
$$\int_a^b \chi^*\,L \psi\, dx = \int_a^b \big[ u\,\chi' + v\,\chi \big]^*\,\psi\, dx = \int_a^b (L \chi)^*\,\psi\, dx \tag{7.315}$$
and so $L$ will be self adjoint or hermitian, $L^\dagger = L$. The general form of a first-order self-adjoint linear operator is then
$$L = i\,r(x)\,\frac{d}{dx} + s(x) + \frac{i}{2}\,r'(x) \tag{7.316}$$
in which $r$ and $s$ are arbitrary real functions of $x$.

Example 7.46 (Momentum and angular momentum) The momentum operator
$$p = \frac{\hbar}{i}\,\frac{d}{dx} \tag{7.317}$$
has $r = -\hbar$, which is real, and $s = 0$, and so is formally self adjoint. The boundary terms (7.314) are zero if the functions $\psi$ and $\chi$ vanish at $a$ and $b$, which often are $\pm\infty$. The angular-momentum operators $L_i = \epsilon_{ijk}\,x_j\,p_k$, where $p_k = -i\hbar\,\partial_k$, also are formally self adjoint because the total antisymmetry of $\epsilon_{ijk}$ ensures that $j$ and $k$ are different as they are summed from 1 to 3.
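The hermiticity of the momentum operator on functions that vanish at the endpoints can be tested numerically. The sketch below (plain Python, with $\hbar = 1$ and the hypothetical test functions $\psi = \sin x$ and $\chi = \sin 2x$ on $[0, \pi]$, both zero at the boundary) compares $(\chi, p\,\psi)$ with $(p\,\chi, \psi)$.

```python
# Hedged sketch: (chi, p psi) = (p chi, psi) for p = -i d/dx (hbar = 1)
# when psi and chi vanish at the endpoints. Here psi = sin x, chi = sin 2x
# on [0, pi]; both boundary terms (7.314) are zero.
import math

n = 50000
h = math.pi / n
xs = [(i + 0.5)*h for i in range(n)]

# (chi, p psi) = -i Int chi psi' dx ;  (p chi, psi) = +i Int chi' psi dx
lhs = -1j * sum(math.sin(2*x) * math.cos(x) for x in xs) * h
rhs = 1j * sum(2*math.cos(2*x) * math.sin(x) for x in xs) * h
assert abs(lhs - rhs) < 1e-8
```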


Example 7.47 (Momentum in a magnetic field) In a magnetic field $B = \nabla \times A$, the differential operator
$$\frac{\hbar}{i}\,\nabla - e A \tag{7.318}$$
that (in SI units) represents the kinetic momentum $m v$ is formally self adjoint, as is its Yang–Mills analog (13.308) when divided by $i$.

7.35 A Constrained Variational Problem

In quantum mechanics, we usually deal with normalizable wave functions. So let's find the function $u(x)$ that minimizes the energy functional
$$E[u] = \int_a^b \big[ p(x)\,u'^2(x) + q(x)\,u^2(x) \big]\,dx \tag{7.319}$$
subject to the constraint that $u(x)$ be normalized on $[a, b]$ with respect to a positive weight function $\rho(x)$
$$N[u] = \|u\|^2 = \int_a^b \rho(x)\,u^2(x)\,dx = 1. \tag{7.320}$$
Introducing $\lambda$ as a Lagrange multiplier (Section 1.24) and suppressing explicit mention of the $x$-dependence of the real functions $p$, $q$, $\rho$, and $u$, we minimize the unconstrained functional
$$E[u, \lambda] = \int_a^b \big( p\,u'^2 + q\,u^2 \big)\,dx - \lambda \left( \int_a^b \rho\,u^2\,dx - 1 \right) \tag{7.321}$$
which will be stationary at the function $u$ that minimizes it. The first-order change in $E[u, \lambda]$ is
$$\delta E[u, \lambda] = \int_a^b \big( p\,2u'\,\delta u' + q\,2u\,\delta u - \lambda\,\rho\,2u\,\delta u \big)\,dx \tag{7.322}$$
in which the change in the derivative of $u$ is $\delta u' = u' + (\delta u)' - u' = (\delta u)'$. Setting $\delta E = 0$ and integrating by parts, we have
$$
\begin{aligned}
0 = \tfrac{1}{2}\,\delta E &= \int_a^b \big[ p\,u'\,(\delta u)' + (q - \lambda \rho)\,u\,\delta u \big]\,dx \\
&= \int_a^b \Big[ \big( p\,u'\,\delta u \big)' - \big( p\,u' \big)'\,\delta u + (q - \lambda \rho)\,u\,\delta u \Big]\,dx \\
&= \int_a^b \Big[ -\big( p\,u' \big)' + (q - \lambda \rho)\,u \Big]\,\delta u\,dx + \Big[ p\,u'\,\delta u \Big]_a^b.
\end{aligned} \tag{7.323}
$$
So if $E$ is to be stationary with respect to all tiny changes $\delta u$, then $u$ must satisfy both the self-adjoint differential equation
$$0 = -\big( p\,u' \big)' + (q - \lambda \rho)\,u \tag{7.324}$$


7 Differential Equations

and the natural boundary conditions

0 = p(b) u′(b)  and  0 = p(a) u′(a).   (7.325)

If instead we require E[u, λ] to be stationary with respect to all variations δu that vanish at the endpoints, δu(a) = δu(b) = 0, then u must satisfy the differential equation (7.324) but need not satisfy the natural boundary conditions (7.325). In both cases, the function u(x) that minimizes the energy E[u] subject to the normalization condition N[u] = 1 is an eigenfunction of the formally self-adjoint differential operator

L = − d/dx ( p(x) d/dx ) + q(x)   (7.326)

with eigenvalue λ

Lu = −(p u′)′ + q u = λ ρ u.   (7.327)

The Lagrange multiplier λ has become an eigenvalue of a Sturm–Liouville equation (7.301). Is the eigenvalue λ related to E[u] and N[u]? To keep things simple, we restrict ourselves to a regular and self-adjoint differential system (Section 7.31) consisting of the self-adjoint differential operator (7.326), the differential equation (7.327), and a domain D = D* of functions u(x) that are twice differentiable on [a, b] and that satisfy two homogeneous Dirichlet (7.285) or Neumann (7.286) boundary conditions on [a, b]. All functions u in the domain D therefore satisfy

[ u p u′ ]_a^b = 0.   (7.328)

We multiply the Sturm–Liouville equation (7.327) from the left by u and integrate by parts from a to b. Noting the vanishing of the boundary terms (7.328), we find

λ ∫_a^b ρ u² dx = ∫_a^b u Lu dx = ∫_a^b u [ −(p u′)′ + q u ] dx
  = ∫_a^b ( p u′² + q u² ) dx − [ u p u′ ]_a^b
  = ∫_a^b ( p u′² + q u² ) dx = E[u].   (7.329)

Thus in view of the normalization constraint (7.320), we see that the eigenvalue λ is the ratio of the energy E[u] to the norm N[u]

λ = ∫_a^b ( p u′² + q u² ) dx / ∫_a^b ρ u² dx = E[u]/N[u].   (7.330)
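The ratio (7.330) is a Rayleigh quotient, so any trial function in the domain bounds the lowest eigenvalue from above. A minimal numerical sketch, assuming the simple system p = 1, q = 0, ρ = 1 on [0, π], whose lowest eigenvalue is λ₁ = 1 with u₁ = sin x:

```python
import numpy as np

# The Rayleigh quotient R[u] = E[u]/N[u] of any trial u bounds lambda_1 from
# above.  Assumed system: p = 1, q = 0, rho = 1 on [0, pi], where lambda_1 = 1.
def integrate(h, x):
    return np.sum((h[1:] + h[:-1]) * np.diff(x)) / 2.0  # trapezoid rule

x = np.linspace(0.0, np.pi, 20001)
u = x * (np.pi - x)                  # trial function with u(0) = u(pi) = 0
du = np.gradient(u, x)
R = integrate(du**2, x) / integrate(u**2, x)
print(R)  # 10/pi^2 = 1.0132..., just above the true lambda_1 = 1
```

The trial parabola comes within about 1.3 percent of the exact lowest eigenvalue, which is why Rayleigh quotients are a practical way to estimate ground-state energies.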

But is the function that minimizes the ratio

R[u] ≡ E[u]/N[u]   (7.331)

an eigenfunction u of the Sturm–Liouville equation (7.327)? And is the minimum of R[u] the least eigenvalue λ of the Sturm–Liouville equation (7.327)? To see that the answers are yes and yes, we require δR[u] to vanish

δR[u] = δE[u]/N[u] − E[u] δN[u]/N²[u] = 0   (7.332)

to first order in tiny changes δu(x) that are zero at the endpoints of the interval, δu(a) = δu(b) = 0. Multiplying both sides by N[u], we have

δE[u] = R[u] δN[u].   (7.333)

Referring back to our derivation (7.321–7.323) of the Sturm–Liouville equation, we see that since δu(a) = δu(b) = 0, the change δE is

δE[u] = 2 ∫_a^b [ −(p u′)′ + q u ] δu dx + 2 [ p u′ δu ]_a^b
  = 2 ∫_a^b [ −(p u′)′ + q u ] δu dx   (7.334)

while δN is

δN[u] = 2 ∫_a^b ρ u δu dx.   (7.335)

Substituting these changes (7.334) and (7.335) into the condition (7.333) that R[u] be stationary, we find that the integral

∫_a^b [ −(p u′)′ + (q − R[u] ρ) u ] δu dx = 0   (7.336)

must vanish for all tiny changes δu(x) that are zero at the endpoints of the interval. Thus on [a, b], the function u that minimizes the ratio R[u] must satisfy the Sturm–Liouville equation (7.327)

−(p u′)′ + q u = R[u] ρ u   (7.337)

with an eigenvalue λ ≡ R[u] that is the minimum value of the ratio R[u].

So the eigenfunction u 1 with the smallest eigenvalue λ1 is the one that minimizes the ratio R [u ], and λ1 = R [u 1 ]. What about other eigenfunctions with larger eigenvalues? How do we find the eigenfunction u 2 with the next smallest eigenvalue λ2 ? Simple: we minimize R [u ] with respect to all functions u that are in the domain D and that are orthogonal to u 1.


Example 7.48 (Infinite square well) Let us consider a particle of mass m trapped in an interval [a, b] by a potential that is V for a < x < b but infinite for x < a and for x > b. Because the potential is infinite outside the interval, the wave function u(x) will satisfy the boundary conditions

u(a) = u(b) = 0.   (7.338)

The mean value of the hamiltonian is then the energy functional

⟨u|H|u⟩ = E[u] = ∫_a^b [ p(x) u′²(x) + q(x) u²(x) ] dx   (7.339)

in which p(x) = ℏ²/2m and q(x) = V, a constant independent of x. Wave functions in quantum mechanics are normalized when possible. So we need to minimize the functional

E[u] = ∫_a^b [ (ℏ²/2m) u′²(x) + V u²(x) ] dx   (7.340)

subject to the constraint

c = ∫_a^b u²(x) dx − 1 = 0   (7.341)

for all tiny variations δu that vanish at the endpoints of the interval. The weight function is ρ(x) = 1, and the eigenvalue equation (7.327) is

−(ℏ²/2m) u″ + V u = λ u.   (7.342)

For any positive integer n, the normalized function

u_n(x) = (2/(b − a))^{1/2} sin( nπ (x − a)/(b − a) )   (7.343)

satisfies the boundary conditions (7.338) and the eigenvalue equation (7.342) with energy eigenvalue

λ_n = E[u_n] = (1/2m) ( nπℏ/(b − a) )² + V.   (7.344)
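The eigenvalues (7.344) can be checked by discretizing the eigenvalue equation (7.342); a sketch with the assumed values ℏ = m = 1, V = 0, and [a, b] = [0, 1], so that λ_n = (nπ)²/2:

```python
import numpy as np

# Finite-difference check of the square-well eigenvalues (7.344), assuming
# hbar = m = 1, V = 0, and [a, b] = [0, 1], so that lambda_n = (n pi)^2 / 2.
N = 1000
h = 1.0 / N
# -(1/2) u'' on the interior grid points, with u(0) = u(1) = 0 built in
H = (np.diag(np.full(N - 1, 1.0 / h**2)) +
     np.diag(np.full(N - 2, -0.5 / h**2), 1) +
     np.diag(np.full(N - 2, -0.5 / h**2), -1))
lam = np.linalg.eigvalsh(H)          # eigenvalues in ascending order
for n in (1, 2, 3):
    print(n, lam[n - 1], (n * np.pi)**2 / 2)  # nearly equal
```

The Dirichlet conditions (7.338) are imposed simply by keeping only interior grid points, and the computed eigenvalues match (nπ)²/2 to a few parts in a million.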

The second eigenfunction u₂ minimizes the energy functional E[u] over the space of normalized functions that satisfy the boundary conditions (7.338) and are orthogonal to the first eigenfunction u₁. The eigenvalue λ₂ is higher than λ₁ (its kinetic term is 4 times higher). As the quantum number n increases, the energy λ_n = E[u_n] goes to infinity as n². That λ_n → ∞ as n → ∞ is related (Section 7.38) to the completeness of the eigenfunctions u_n.

Example 7.49 (Harmonic oscillator) We'll minimize the energy

E[u] = ∫_{−∞}^{∞} [ (ℏ²/2m) u′²(x) + (1/2) m ω² x² u²(x) ] dx   (7.345)

subject to the normalization condition

N[u] = ‖u‖² = ∫_{−∞}^{∞} u²(x) dx = 1.   (7.346)

We introduce λ as a Lagrange multiplier and find the minimum of the unconstrained functional E[u] − λ(N[u] − 1). Following Equations (7.319–7.327), we find that u must satisfy Schrödinger's equation

−(ℏ²/2m) u″ + (1/2) m ω² x² u = λ u   (7.347)

which we write as

ℏω [ (mω/2ℏ) ( x − (ℏ/mω) d/dx ) ( x + (ℏ/mω) d/dx ) + 1/2 ] u = λ u.   (7.348)

The lowest eigenfunction u₀ is mapped to zero by the second factor

( x + (ℏ/mω) d/dx ) u₀(x) = 0   (7.349)

so its eigenvalue λ₀ is ℏω/2. Integrating this differential equation, we get

u₀(x) = (mω/πℏ)^{1/4} exp( −mωx²/2ℏ )   (7.350)

in which the prefactor is a normalization constant. As in Section 3.12, one may get the higher eigenfunctions by acting on u₀ with powers of the first factor inside the square brackets (7.348)

u_n(x) = (1/√n!) (mω/2ℏ)^{n/2} ( x − (ℏ/mω) d/dx )ⁿ u₀(x).   (7.351)
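A finite-difference check of these eigenvalues, with the assumed units ℏ = m = ω = 1 so that λ_n = n + 1/2:

```python
import numpy as np

# Finite-difference check that the eigenvalues of -u''/2 + (x^2/2) u = lambda u
# are lambda_n = n + 1/2, assuming hbar = m = omega = 1.
x = np.linspace(-8.0, 8.0, 1601)
h = x[1] - x[0]
H = (np.diag(1.0 / h**2 + 0.5 * x**2) +
     np.diag(np.full(len(x) - 1, -0.5 / h**2), 1) +
     np.diag(np.full(len(x) - 1, -0.5 / h**2), -1))
lam = np.linalg.eigvalsh(H)
print(lam[:4])  # close to 0.5, 1.5, 2.5, 3.5
```

Truncating the line to [−8, 8] is harmless here because the low eigenfunctions fall off like exp(−x²/2) and are negligible at the artificial walls.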

The eigenvalue of u_n is λ_n = ℏω(n + 1/2). Because the eigenfunctions are complete, the eigenvalues increase without limit, λ_n → ∞ as n → ∞.

Example 7.50 (Bessel's system) Bessel's energy functional is

E[u] = ∫_0^1 [ x u′²(x) + (n²/x) u²(x) ] dx   (7.352)

in which n ≥ 0 is an integer. We seek the minimum of this functional over the set of twice differentiable functions u(x) on [0, 1] that are normalized

N[u] = ‖u‖² = ∫_0^1 x u²(x) dx = 1   (7.353)

and that satisfy the boundary conditions u(0) = 0 for n > 0 and u(1) = 0. We'll use a Lagrange multiplier λ (Section 1.24) and minimize the unconstrained functional E[u] − λ(N[u] − 1). Proceeding as in (7.319–7.327), we find that u must obey the formally self-adjoint differential equation

Lu = −(x u′)′ + (n²/x) u = λ x u.   (7.354)

The ratio formula (7.330) and the positivity of Bessel's energy functional (7.352) tell us that the eigenvalues λ = E[u]/N[u] are positive (Exercise 7.25). As we'll see in a moment, the boundary conditions largely determine these eigenvalues λ_{n,m} ≡ k²_{n,m}. By changing variables to ρ = k_{n,m} x and letting u_n(x) = J_n(ρ), we arrive (Exercise 7.26) at

d²J_n/dρ² + (1/ρ) dJ_n/dρ + ( 1 − n²/ρ² ) J_n = 0   (7.355)

which is Bessel's equation. The eigenvalues are determined by the condition u_n(1) = J_n(k_{n,m}) = 0; they are the squares of the zeros of J_n(ρ). The eigenfunction of the self-adjoint differential equation (7.354) with eigenvalue λ_{n,m} = k²_{n,m} (and u_n(0) = 0 for n > 0) is u_m(x) = J_n(k_{n,m} x). The parameter n labels the differential system; it is not an eigenvalue. Asymptotically as m → ∞, one has (Courant and Hilbert, 1955, p. 416)

lim_{m→∞} λ_{n,m} / (m²π²) = 1   (7.356)

which shows that the eigenvalues λ_{n,m} rise like m² as m → ∞.
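The zeros of J_n, and hence the eigenvalues λ_{n,m} = k²_{n,m} and the asymptotic ratio (7.356), are easy to compute; a sketch assuming SciPy is available:

```python
import numpy as np
from scipy.special import jn_zeros

# The eigenvalues of Bessel's system are the squared zeros of J_n; check the
# asymptotic ratio (7.356) lambda_{n,m}/(m^2 pi^2) -> 1 for n = 0.
z = jn_zeros(0, 200)               # first 200 zeros z_{0,m} of J_0
lam = z**2                         # eigenvalues lambda_{0,m} = k_{0,m}^2
m = np.arange(1, 201)
ratio = lam / (m * np.pi)**2
print(ratio[0], ratio[9], ratio[199])  # rises toward 1
```

The ratio climbs from about 0.59 at m = 1 to within a fraction of a percent of 1 by m = 200, in line with the asymptotic formula z_{0,m} ≈ (m − 1/4)π.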

7.36 Eigenfunctions and Eigenvalues of Self-Adjoint Systems

A regular Sturm–Liouville system is a set of regular and self-adjoint differential systems (Section 7.31) that have the same differential operator, interval [a, b], boundary conditions, and domain, and whose differential equations are of Sturm–Liouville (7.327) type

Lψ = −(p ψ′)′ + q ψ = λ ρ ψ   (7.357)

each distinguished by an eigenvalue λ. The functions p, q, and ρ are real and continuous; p and ρ are positive on [a, b], but the weight function ρ may vanish at isolated points of the interval. Since the differential systems are self adjoint, the real or complex functions in the common domain D are twice differentiable on the interval [a, b] and satisfy two homogeneous boundary conditions that make the boundary terms (7.287) vanish

p W(ψ*, ψ) |_a^b = 0   (7.358)

and so the differential operator L obeys the condition (7.289)

(χ, Lψ) = ∫_a^b χ* Lψ dx = ∫_a^b (Lχ)* ψ dx = (Lχ, ψ)   (7.359)

of being self adjoint or hermitian.


Let ψ_i and ψ_j be eigenfunctions of L with eigenvalues λ_i and λ_j

Lψ_i = λ_i ρ ψ_i  and  Lψ_j = λ_j ρ ψ_j   (7.360)

in a regular Sturm–Liouville system. Multiplying the first of these eigenvalue equations by ψ_j* and the complex conjugate of the second by ψ_i, we get

ψ_j* Lψ_i = ψ_j* λ_i ρ ψ_i  and  ψ_i (Lψ_j)* = ψ_i λ_j* ρ ψ_j*.   (7.361)

Integrating the difference of these equations over the interval [a, b] and using the hermiticity (7.359) of L in the form ∫_a^b ψ_j* Lψ_i dx = ∫_a^b (Lψ_j)* ψ_i dx, we have

0 = ∫_a^b [ ψ_j* Lψ_i − (Lψ_j)* ψ_i ] dx = ( λ_i − λ_j* ) ∫_a^b ψ_j* ψ_i ρ dx.   (7.362)

Setting i = j, we find

0 = ( λ_i − λ_i* ) ∫_a^b ρ |ψ_i|² dx   (7.363)

which, since the integral is positive, shows that the eigenvalue λ_i must be real. All the eigenvalues of a regular Sturm–Liouville system are real. Using λ_j* = λ_j in (7.362), we see that eigenfunctions that have different eigenvalues are orthogonal on the interval [a, b] with weight function ρ(x)

0 = ( λ_i − λ_j ) ∫_a^b ψ_j* ρ ψ_i dx.   (7.364)

Since the differential operator L, the eigenvalues λ_i, and the weight function ρ are all real, we may write the first of the eigenvalue equations in (7.360) both as Lψ_i = λ_i ρ ψ_i and as Lψ_i* = λ_i ρ ψ_i*. By adding these two equations, we see that the real part of ψ_i satisfies them, and by subtracting them, we see that the imaginary part of ψ_i also satisfies them. So it might seem that ψ_i = u_i + i v_i is made of two real eigenfunctions with the same eigenvalue. But each eigenfunction u_i in the domain D satisfies two homogeneous boundary conditions as well as its second-order differential equation

−(p u_i′)′ + q u_i = λ_i ρ u_i   (7.365)

and so u_i is the unique solution in D to this equation. There can be no other eigenfunction in D with the same eigenvalue. In a regular Sturm–Liouville system, there is no degeneracy. All the eigenfunctions u_i are orthogonal and can be normalized on the interval [a, b] with weight function ρ(x)

∫_a^b u_j* ρ u_i dx = δ_ij.   (7.366)

They may be taken to be real.


It is true that the eigenfunctions of a second-order differential equation come in pairs because one can use Wronski's formula (7.308)

y₂(x) = y₁(x) ∫^x dx′ / ( p(x′) y₁²(x′) )   (7.367)

to find a linearly independent second solution with the same eigenvalue. But the second solutions don't obey the boundary conditions of the domain. Bessel functions of the second kind, for example, are infinite at the origin. A set of eigenfunctions u_i

−(p u_i′)′ + q u_i = λ_i ρ u_i   (7.368)

is complete in the mean in the space L²(a, b) of functions f(x) that are square integrable with weight function ρ on the interval [a, b]

∫_a^b |f(x)|² ρ(x) dx < ∞   (7.369)

(in the sense of Lebesgue, Section 3.9) if every such function f(x) can be represented by a Fourier series

f(x) = Σ_{i=1}^∞ a_i u_i(x)   (7.370)

that converges in the mean

lim_{n→∞} ∫_a^b | f(x) − Σ_{i=1}^n a_i u_i(x) |² ρ(x) dx = 0.   (7.371)

The orthonormality (7.366) of the eigenfunctions u_i(x) implies that the coefficients a_i are the integrals

a_i = ∫_a^b f(x) u_i(x) ρ(x) dx.   (7.372)

Putting this formula into the Fourier series (7.370) and interchanging the orders of summation and integration, we find that

f(x) = Σ_{i=1}^∞ a_i u_i(x) = ∫_a^b f(y) [ Σ_{i=1}^∞ u_i(y) u_i(x) ρ(y) ] dy   (7.373)

which gives us a representation for Dirac's delta function

δ(x − y) = Σ_{i=1}^∞ u_i(y) u_i(x) ρ(y).   (7.374)


The orthonormal eigenfunctions of every regular Sturm–Liouville system on an interval [a, b] are complete in the mean in L²(a, b). The completeness of these eigenfunctions follows (Section 7.38) from the fact that the eigenvalues λ_n of a regular Sturm–Liouville system are unbounded: when arranged in ascending order λ_n < λ_{n+1}, they go to infinity with the index n

lim_{n→∞} λ_n = ∞   (7.375)

as we’ll see in the next section.

7.37 Unboundedness of Eigenvalues

We have seen (Section 7.35) that the function u(x) that minimizes the ratio

R[u] = E[u]/N[u] = ∫_a^b ( p u′² + q u² ) dx / ∫_a^b ρ u² dx   (7.376)

is a solution of the Sturm–Liouville equation

Lu = −(p u′)′ + q u = λ ρ u   (7.377)

with eigenvalue

λ = E[u]/N[u].   (7.378)

Let us call this least value of the ratio (7.376) λ₁; it also is the smallest eigenvalue of the differential equation (7.377). The second smallest eigenvalue λ₂ is the minimum of the same ratio (7.376) but for functions that are orthogonal to u₁

∫_a^b ρ u₁ u₂ dx = 0.   (7.379)

And λ₃ is the minimum of the ratio R[u] but for functions that are orthogonal to both u₁ and u₂. Continuing in this way, we make a sequence of orthogonal eigenfunctions u_n(x) (which we can normalize, N[u_n] = 1) with eigenvalues λ₁ ≤ λ₂ ≤ λ₃ ≤ · · · ≤ λ_n. How do the eigenvalues λ_n behave as n → ∞? Since the function p(x) is positive for a < x < b, it is clear that the energy functional (7.319)

E[u] = ∫_a^b ( p u′² + q u² ) dx   (7.380)

Figure 7.2 The energy functional E[u] of Equation (7.319) assigns a much higher energy to the function u_ω(x) = u(x)(1 + 0.2 sin(ωx)) (zigzag curve with ω = 100) than to the function u(x) = sin(x) (smooth curve). As the frequency ω → ∞, the energy E[u_ω] → ∞.

gets bigger as u′² increases. In fact, if we let the function u(x) zigzag up and down about a given curve ū, then the kinetic energy ∫ p u′² dx will rise, but the potential energy ∫ q u² dx will remain approximately constant. Thus by increasing the frequency of the zigzags, we can drive the energy E[u] to infinity. For instance, if u(x) = sin x, then its zigzag version u_ω(x) = u(x)(1 + 0.2 sin ωx) will have higher energy. The case of ω = 100 is illustrated in Fig. 7.2. As ω → ∞, its energy E[u_ω] → ∞. It is therefore intuitively clear (or at least plausible) that if the real functions p(x), q(x), and ρ(x) are continuous on [a, b] and if p(x) > 0 and ρ(x) > 0 on (a, b), then there are infinitely many energy eigenvalues λ_n, and that they increase without limit as n → ∞

lim_{n→∞} λ_n = ∞.   (7.381)
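The zigzag argument can be made quantitative; a sketch with p = 1, q = 0 and u(x) = sin x on [0, π], showing that E[u_ω] grows like ω²:

```python
import numpy as np

# Energies E[u] of (7.319) with p = 1, q = 0 on [0, pi] for u(x) = sin x and
# for its zigzag versions u_w(x) = u(x)(1 + 0.2 sin(w x)): E[u_w] grows ~ w^2.
def integrate(h, x):
    return np.sum((h[1:] + h[:-1]) * np.diff(x)) / 2.0  # trapezoid rule

x = np.linspace(0.0, np.pi, 400001)
u = np.sin(x)
E_u = integrate(np.gradient(u, x)**2, x)     # pi/2 = 1.5707...
E_w = {}
for w in (10, 100, 1000):
    uw = u * (1.0 + 0.2 * np.sin(w * x))
    E_w[w] = integrate(np.gradient(uw, x)**2, x)
print(E_u, E_w)
```

Each factor of 10 in the zigzag frequency multiplies the energy by roughly 100, while the norm of u_ω barely changes, which is the mechanism behind the unbounded eigenvalues.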

Courant and Hilbert (Richard Courant 1888–1972 and David Hilbert 1862–1943) provide several proofs of this result (Courant and Hilbert, 1955, pp. 397–429). One of their proofs involves the change of variables f = (pρ)^{1/4} and v = f u, after which the eigenvalue equation

Lu = −(p u′)′ + q u = λ ρ u   (7.382)

becomes L_f v = −v″ + r v = λ_v v with r = f″/f + q/ρ. Were this r(x) a constant, the eigenfunctions of L_f would be v_n(x) = sin( nπ(x − a)/(b − a) ) with eigenvalues

λ_{v,n} = ( nπ/(b − a) )² + r   (7.383)

rising as n². Courant and Hilbert show that as long as r(x) is bounded for a ≤ x ≤ b, the actual eigenvalues of L_f are λ_{v,n} = c n² + d_n, in which d_n is bounded, and that the eigenvalues λ_n of L differ from the λ_{v,n} by a scale factor, so that they too diverge as n → ∞

lim_{n→∞} n²/λ_n = g   (7.384)

where g is a constant.

7.38 Completeness of Eigenfunctions

We have seen in Section 7.37 that the eigenvalues of every regular Sturm–Liouville system, when arranged in ascending order, tend to infinity with the index n

lim_{n→∞} λ_n = ∞.   (7.385)

We'll now use this property to show that the corresponding eigenfunctions u_n(x) are complete in the mean (7.371) in the domain D of the system. To do so, we follow Courant and Hilbert (Courant and Hilbert, 1955, pp. 397–428) and extend the energy E and norm N functionals to inner products on the domain of the system

E[f, g] ≡ ∫_a^b [ p(x) f′(x) g′(x) + q(x) f(x) g(x) ] dx   (7.386)

N[f, g] ≡ ∫_a^b ρ(x) f(x) g(x) dx   (7.387)

for any f and g in D. Integrating E[f, g] by parts, we have

E[f, g] = ∫_a^b [ (p f g′)′ − f (p g′)′ + q f g ] dx = ∫_a^b [ −f (p g′)′ + f q g ] dx + [ p f g′ ]_a^b   (7.388)

or in terms of the self-adjoint differential operator L of the system

E[f, g] = ∫_a^b f Lg dx + [ p f g′ ]_a^b.   (7.389)


Since the boundary term vanishes (7.328) when the functions f and g are in the domain D of the system, it follows that for f and g in D

E[f, g] = ∫_a^b f Lg dx.   (7.390)

We can use the first n orthonormal eigenfunctions u_k of the system

L u_k = λ_k ρ u_k   (7.391)

to approximate an arbitrary function f ∈ D as the linear combination

f(x) ∼ Σ_{k=1}^n c_k u_k(x)   (7.392)

with coefficients c_k given by

c_k = N[f, u_k] = ∫_a^b ρ f u_k dx.   (7.393)

We'll show that this series converges in the mean to the function f. By construction (7.393), the remainder or error of the nth sum

r_n(x) = f(x) − Σ_{k=1}^n c_k u_k(x)   (7.394)

is orthogonal to the first n eigenfunctions

N[r_n, u_k] = 0  for k = 1, . . . , n.   (7.395)

The next eigenfunction u_{n+1} minimizes the ratio

R[φ] = E[φ, φ] / N[φ, φ]   (7.396)

over all φ that are orthogonal to the first n eigenfunctions u_k in the sense that N[φ, u_k] = 0 for k = 1, . . . , n. That minimum is the eigenvalue λ_{n+1}

R[u_{n+1}] = λ_{n+1}   (7.397)

which therefore can be no greater than the ratio R[r_n] of the remainder r_n

λ_{n+1} ≤ R[r_n] = E[r_n, r_n] / N[r_n, r_n].   (7.398)

Thus the square of the norm of the remainder is bounded by the ratio

‖r_n‖² ≡ N[r_n, r_n] ≤ E[r_n, r_n] / λ_{n+1}.   (7.399)


So since λ_{n+1} → ∞ as n → ∞, we're done if we can show that the energy E[r_n, r_n] of the remainder is bounded. This energy is

E[r_n, r_n] = E[ f − Σ_{k=1}^n c_k u_k , f − Σ_{k=1}^n c_k u_k ]
  = E[f, f] − Σ_{k=1}^n c_k ( E[f, u_k] + E[u_k, f] ) + Σ_{k=1}^n Σ_{ℓ=1}^n c_k c_ℓ E[u_k, u_ℓ]
  = E[f, f] − 2 Σ_{k=1}^n c_k E[f, u_k] + Σ_{k=1}^n Σ_{ℓ=1}^n c_k c_ℓ E[u_k, u_ℓ].   (7.400)

Since f and all the u_k are in the domain of the system, they satisfy the boundary condition (7.287 or 7.358), and so (7.389, 7.391, and 7.366) imply that

E[f, u_k] = ∫_a^b f L u_k dx = λ_k ∫_a^b ρ f u_k dx = λ_k c_k   (7.401)

and that

E[u_k, u_ℓ] = ∫_a^b u_k L u_ℓ dx = λ_ℓ ∫_a^b ρ u_k u_ℓ dx = λ_k δ_{kℓ}.   (7.402)

Using these relations to simplify our formula (7.400) for E[r_n, r_n], we find

E[r_n, r_n] = E[f, f] − Σ_{k=1}^n λ_k c_k².   (7.403)

Since λ_n → ∞ as n → ∞, we can be sure that for high enough n the sum

Σ_{k=1}^n λ_k c_k² > 0  for n > N   (7.404)

is positive. It follows from (7.403) that the energy of the remainder r_n is bounded by that of the function f

E[r_n, r_n] = E[f, f] − Σ_{k=1}^n λ_k c_k² ≤ E[f, f].   (7.405)

By substituting this upper bound E[f, f] on E[r_n, r_n] into our upper bound (7.399) on the squared norm ‖r_n‖² of the remainder, we find

‖r_n‖² ≤ E[f, f] / λ_{n+1}.   (7.406)


Thus since λ_n → ∞ as n → ∞, we see that the series (7.392) converges in the mean (Section 5.3) to f

lim_{n→∞} ‖r_n‖² = lim_{n→∞} ‖ f − Σ_{k=1}^n c_k u_k ‖² ≤ lim_{n→∞} E[f, f] / λ_{n+1} = 0.   (7.407)

The eigenfunctions u_k of a regular Sturm–Liouville system are therefore complete in the mean in the domain D of the system. They span D. It is a short step from spanning D to spanning the space L²(a, b) of functions that are square integrable on the interval [a, b] of the system. To take this step, we assume that the domain D is dense in (a, b), that is, that for every function g ∈ L²(a, b) there is a sequence of functions f_n ∈ D that converges to it in the mean, so that for any ε > 0 there is an integer N₁ such that

‖g − f_n‖² ≡ ∫_a^b |g(x) − f_n(x)|² ρ(x) dx < ε  for n > N₁.   (7.408)

Since f_n ∈ D, we can find a series of eigenfunctions u_k of the system that converges in the mean to f_n, so that for any ε > 0 there is an integer N₂ such that

‖ f_n − Σ_{k=1}^N c_{n,k} u_k ‖² ≡ ∫_a^b | f_n(x) − Σ_{k=1}^N c_{n,k} u_k(x) |² ρ(x) dx < ε  for N > N₂.   (7.409)

The Schwarz inequality (1.105) applies to these inner products, and so

‖ g − Σ_{k=1}^N c_{n,k} u_k ‖ ≤ ‖ g − f_n ‖ + ‖ f_n − Σ_{k=1}^N c_{n,k} u_k ‖.   (7.410)

Combining the last three inequalities, we have for n > N₁ and N > N₂

‖ g − Σ_{k=1}^N c_{n,k} u_k ‖ < 2√ε.   (7.411)

So the eigenfunctions u_k of a regular Sturm–Liouville system span the space L²(a, b) of functions that are square integrable on its interval. One may further show (Courant and Hilbert 1955, p. 360; Stakgold 1967, p. 220) that the eigenfunctions u_k(x) of any regular Sturm–Liouville system form a complete orthonormal set in the sense that every function f(x) that satisfies Dirichlet (7.285) or Neumann (7.286) boundary conditions and has a continuous first and a piecewise continuous second derivative may be expanded in a series

f(x) = Σ_{k=1}^∞ a_k u_k(x)   (7.412)

that converges absolutely and uniformly on the interval [a, b] of the system.


Our discussion (7.385–7.407) of the completeness of the eigenfunctions of a regular Sturm–Liouville system was insensitive to the finite length of the interval [a, b] and to the positivity of p(x) on [a, b]. What was essential was the vanishing of the boundary terms (7.287), which can happen if p vanishes at the endpoints of a finite interval or if the functions u and v tend to zero as |x| → ∞ on an infinite one. This is why the results of this section have been extended to singular Sturm–Liouville systems made of self-adjoint differential systems that are singular because the interval is infinite or has p vanishing at one or both of its ends. The eigenfunctions u_i(x) of a Sturm–Liouville system provide a representation (7.374) of Dirac's delta function δ(x − y) as a sum of the terms ρ(y) u_i(x) u_i(y). Since this series is nonzero only for x = y, the weight function ρ(y) is just a scale factor, and we can write for 0 ≤ α ≤ 1

δ(x − y) = ρ^α(x) ρ^{1−α}(y) Σ_{i=1}^∞ u_i(x) u_i(y).   (7.413)

These representations of the delta functional are suitable for functions f in the domain D of the regular Sturm–Liouville system.

Example 7.51 (A Bessel representation of the delta function) Bessel's nth system Lu = −(x u′)′ + n² u/x = λ x u has eigenvalues λ = z²_{n,k} that are the squares of the zeros of the Bessel function J_n(x). The eigenfunctions (Section 6.9) that are orthonormal with weight function ρ(x) = x are u_k^{(n)}(x) = √2 J_n(z_{n,k} x) / J_{n+1}(z_{n,k}). Thus by (7.413), we can represent Dirac's delta functional for functions in the domain D of Bessel's system as

δ(x − y) = x^α y^{1−α} Σ_{k=1}^∞ u_k^{(n)}(x) u_k^{(n)}(y).   (7.414)

For n = 0, this Bessel representation is

δ(x − y) = 2 x^α y^{1−α} Σ_{k=1}^∞ J₀(z_{0,k} x) J₀(z_{0,k} y) / J₁²(z_{0,k}).   (7.415)

Figure 7.3 plots the sum of the first 10,000 terms of this series for α = 0 and y = 0.47, for α = 1/2 and y = 1/2, and for α = 1 and y = 0.53. This plot illustrates the Sturm–Liouville representation (7.413) of the delta function and its validity for 0 ≤ α ≤ 1.


Figure 7.3 The sum of the first 10,000 terms of the Bessel representation (7.415) for the Dirac delta function δ(x − y) is plotted for y = 0.47 and α = 0, for y = 1/2 and α = 1/2, and for y = 0.53 and α = 1.
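The eigenfunction content of the representation (7.415) can also be checked by expanding a test function in the eigenfunctions of the n = 0 system; a sketch assuming SciPy and the test function f(y) = 1 − y², which obeys the domain condition f(1) = 0:

```python
import numpy as np
from scipy.special import j0, j1, jn_zeros

# Expand f(y) = 1 - y^2 (which obeys the domain condition f(1) = 0) in the
# orthonormal eigenfunctions of Bessel's n = 0 system; the partial sums of the
# series converge to f, which is what the representation (7.415) encodes.
def integrate(h, y):
    return np.sum((h[1:] + h[:-1]) * np.diff(y)) / 2.0  # trapezoid rule

z = jn_zeros(0, 100)                       # zeros z_{0,k} of J_0
y = np.linspace(0.0, 1.0, 20001)
f = 1.0 - y**2
# coefficients a_k = [2/J_1(z_k)^2] * int_0^1 f(y) J_0(z_k y) y dy
a = np.array([2.0 / j1(zk)**2 * integrate(f * j0(zk * y) * y, y) for zk in z])
for xp in (0.2, 0.5, 0.8):
    print(xp, np.sum(a * j0(z * xp)), 1.0 - xp**2)  # series vs f
```

One hundred terms already reproduce f to better than a part in a thousand at interior points.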

7.39 Inequalities of Bessel and Schwarz

The inequality

∫_a^b ρ(x) | f(x) − Σ_{k=1}^n a_k u_k(x) |² dx ≥ 0   (7.416)

and the formula (7.372) for a_k lead (Exercise 7.27) to Bessel's inequality

∫_a^b ρ(x) |f(x)|² dx ≥ Σ_{k=1}^∞ |a_k|².   (7.417)
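A numerical illustration of Bessel's inequality, assuming the sine eigenfunctions u_k(x) = √2 sin(kπx) on [0, 1] with ρ = 1 and the test function f(x) = x:

```python
import numpy as np

# Bessel's inequality (7.417) for u_k(x) = sqrt(2) sin(k pi x) on [0, 1]
# with rho = 1 and the test function f(x) = x.
def integrate(h, x):
    return np.sum((h[1:] + h[:-1]) * np.diff(x)) / 2.0  # trapezoid rule

x = np.linspace(0.0, 1.0, 20001)
f = x.copy()
norm2 = integrate(f**2, x)                   # = 1/3
partial = 0.0
for k in range(1, 201):
    u_k = np.sqrt(2.0) * np.sin(k * np.pi * x)
    partial += integrate(f * u_k, x)**2      # |a_k|^2
print(partial, norm2)  # the partial sums never exceed the norm
```

Because this sine system is complete, the partial sums in fact converge to the norm, turning Bessel's inequality into Parseval's equality in the limit.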

The argument we used to derive the Schwarz inequality (1.100) for vectors applies also to functions and leads to the Schwarz inequality

∫_a^b ρ(x) |f(x)|² dx ∫_a^b ρ(x) |g(x)|² dx ≥ | ∫_a^b ρ(x) g*(x) f(x) dx |².   (7.418)


7.40 Green's Functions

Physics is full of equations of the form

L G(x) = δ⁽ⁿ⁾(x)   (7.419)

in which L is a differential operator in n variables. The solution G(x) is a Green's function (Section 4.8) for the operator L.

Example 7.52 (Poisson's Green's function) Probably the most important Green's function arises when the interaction is of long range, as in gravity and electrodynamics. The divergence of the electric field is related to the charge density ρ by Gauss's law ∇ · E = ρ/ε₀, where ε₀ = 8.854 × 10⁻¹² F/m is the electric constant. The electric field is E = −∇φ − Ȧ, in which φ is the scalar potential. In the Coulomb or radiation gauge, the divergence of A vanishes, ∇ · A = 0, and so −∇²φ = −∇ · ∇φ = ρ/ε₀. The needed Green's function satisfies

−∇²G(x) = −∇ · ∇G(x) = δ⁽³⁾(x)   (7.420)

and expresses the scalar potential φ as the integral

φ(t, x) = ∫ G(x − x′) ρ(t, x′)/ε₀ d³x′.   (7.421)

For when we apply (minus) the laplacian to it, we get

−∇²φ(t, x) = ∫ −∇²G(x − x′) ρ(t, x′)/ε₀ d³x′ = ∫ δ⁽³⁾(x − x′) ρ(t, x′)/ε₀ d³x′ = ρ(t, x)/ε₀   (7.422)

which is Poisson's equation. The reader might wonder how the potential φ(t, x) can depend upon the charge density ρ(t, x′) at different points at the same time. The scalar potential is instantaneous because the Coulomb gauge condition ∇ · A = 0 is not Lorentz invariant. The gauge-invariant physical fields E and B are not instantaneous and do describe Lorentz-invariant electrodynamics. It is easy to find the Green's function G(x) by expressing it as a Fourier transform

G(x) = ∫ e^{ik·x} g(k) d³k   (7.423)

and by using the 3-dimensional version

δ⁽³⁾(x) = ∫ d³k/(2π)³ e^{ik·x}   (7.424)


of Dirac's delta function (4.36). If we insert these Fourier transforms into the equation (7.420) that defines the Green's function G(x), then we find

−∇²G(x) = ∫ −∇² e^{ik·x} g(k) d³k = ∫ e^{ik·x} k² g(k) d³k = δ⁽³⁾(x) = ∫ e^{ik·x} d³k/(2π)³.   (7.425)

Thus the Green's function G(x) is the Fourier transform (4.109)

G(x) = ∫ e^{ik·x}/k² d³k/(2π)³   (7.426)

which we may integrate to (4.113)

G(x) = 1/(4π|x|) = 1/(4πr)   (7.427)

where r = |x| is the length of the vector x. This formula is generalized to n dimensions in Example 6.28.

Example 7.53 (Helmholtz's Green's functions) The Green's function for the Helmholtz equation (−∇² − m²)V(x) = ρ(x) must satisfy

(−∇² − m²) G_H(x) = δ⁽³⁾(x).   (7.428)

By using the Fourier-transform method of the previous example, one may show that G_H is

G_H(x) = e^{imr}/(4πr)   (7.429)

in which r = |x| and m has units of inverse length. Similarly, the Green's function G_{mH} for the modified Helmholtz equation

(−∇² + m²) G_{mH}(x) = δ⁽³⁾(x)   (7.430)

is (Example 6.27)

G_{mH}(x) = e^{−mr}/(4πr)   (7.431)

which is a Yukawa potential.

Of these Green's functions, probably the most important is G(x), which has the expansion

G(x − x′) = 1/(4π|x − x′|) = Σ_{ℓ=0}^∞ Σ_{m=−ℓ}^{ℓ} [1/(2ℓ+1)] (r_<^ℓ / r_>^{ℓ+1}) Y_{ℓ,m}(θ, φ) Y_{ℓ,m}*(θ′, φ′)   (7.432)

in terms of the spherical harmonics Y_{ℓ,m}(θ, φ). Here r, θ, and φ are the spherical coordinates of the point x, and r′, θ′, and φ′ are those of the point x′; r_> is the larger of r and r′, and r_< is the smaller of r and r′. If we substitute this expansion (7.432) into the formula (7.421) for the potential φ, then we arrive at the multipole expansion

φ(t, x) = ∫ G(x − x′) ρ(t, x′)/ε₀ d³x′
  = Σ_{ℓ=0}^∞ Σ_{m=−ℓ}^{ℓ} [1/(2ℓ+1)] ∫ (r_<^ℓ / r_>^{ℓ+1}) Y_{ℓ,m}(θ, φ) Y_{ℓ,m}*(θ′, φ′) ρ(t, x′)/ε₀ d³x′.   (7.433)

Physicists often use this expansion to compute the potential at x due to a localized, remote distribution of charge ρ(t, x′). In this case, the integration is only over the restricted region where ρ(t, x′) ≠ 0, and so r_< = r′ and r_> = r, and the multipole expansion is

φ(t, x) = Σ_{ℓ=0}^∞ Σ_{m=−ℓ}^{ℓ} [1/(2ℓ+1)] [Y_{ℓ,m}(θ, φ)/r^{ℓ+1}] ∫ r′^ℓ Y_{ℓ,m}*(θ′, φ′) ρ(t, x′)/ε₀ d³x′.   (7.434)

In terms of the multipoles

Q_ℓ^m = ∫ r′^ℓ Y_{ℓ,m}*(θ′, φ′) ρ(t, x′)/ε₀ d³x′   (7.435)

the potential is

φ(t, x) = Σ_{ℓ=0}^∞ Σ_{m=−ℓ}^{ℓ} [1/(2ℓ+1)] Q_ℓ^m Y_{ℓ,m}(θ, φ)/r^{ℓ+1}.   (7.436)

The spherical harmonics provide for the Legendre polynomial the expansion

P_ℓ(x̂ · x̂′) = [4π/(2ℓ+1)] Σ_{m=−ℓ}^{ℓ} Y_{ℓ,m}(θ, φ) Y_{ℓ,m}*(θ′, φ′)   (7.437)

which abbreviates the Green's function formula (7.432) to

G(x − x′) = 1/(4π|x − x′|) = (1/4π) Σ_{ℓ=0}^∞ (r_<^ℓ / r_>^{ℓ+1}) P_ℓ(x̂ · x̂′).   (7.438)
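The expansion (7.438) is easy to verify numerically at a pair of points; a sketch with arbitrary assumed coordinates, assuming SciPy:

```python
import numpy as np
from scipy.special import eval_legendre

# Numerical check of the Legendre expansion (7.438) of 1/(4 pi |x - x'|)
# at one arbitrary pair of points (assumed coordinates).
r1, th1, ph1 = 1.0, 0.7, 1.2      # spherical coordinates of x
r2, th2, ph2 = 0.6, 2.1, -0.4     # spherical coordinates of x'
cosg = (np.cos(th1) * np.cos(th2) +
        np.sin(th1) * np.sin(th2) * np.cos(ph1 - ph2))  # cos of the angle
rl, rg = min(r1, r2), max(r1, r2)                       # r_< and r_>
series = sum(rl**l / rg**(l + 1) * eval_legendre(l, cosg)
             for l in range(61)) / (4 * np.pi)
direct = 1.0 / (4 * np.pi * np.sqrt(r1**2 + r2**2 - 2 * r1 * r2 * cosg))
print(series, direct)  # agree to many digits
```

The series converges geometrically in the ratio r_</r_>, so sixty terms suffice here.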

Example 7.54 (Feynman's propagator) The Feynman propagator

Δ_F(x) = ∫ d⁴q/(2π)⁴ exp(iqx) / (q² + m² − iε)   (7.439)

is a Green's function (6.257) for the operator L = m² − □

(m² − □) Δ_F(x) = δ⁴(x).   (7.440)

By integrating over q⁰ while respecting the iε (Example 4.100), one may write the propagator in terms of the Lorentz-invariant function

Δ₊(x) = 1/(2π)³ ∫ d³q/(2E_q) exp[ i(q · x − E_q x⁰) ]   (7.441)

as (6.268)

Δ_F(x) = iθ(x⁰) Δ₊(x) + iθ(−x⁰) Δ₊(x, −x⁰)   (7.442)

which for space-like x, that is, for x² = x · x − (x⁰)² ≡ r² > 0, depends only upon r = +√(x²) and has the value Δ₊(x) = (m/(4π²r)) K₁(mr), in which K₁ is the Hankel function (6.276) (Weinberg, 1995, p. 202).

7.41 Eigenfunctions and Green's Functions

The Green's function

G(x − y) = ∫ dk/(2π) e^{ik(x−y)} / (k² + m²)   (7.443)

is based on the resolution of the delta function

δ(x − y) = ∫ dk/(2π) e^{ik(x−y)}   (7.444)

in terms of the eigenfunctions exp(ikx) of the differential operator −∂²ₓ + m² with eigenvalues k² + m². We may generalize this way of making Green's functions to a regular Sturm–Liouville system (Section 7.36) with a differential operator L, eigenvalues λ_n

L u_n(x) = λ_n ρ(x) u_n(x)   (7.445)

and eigenfunctions u_n(x) that are orthonormal with respect to a positive weight function ρ(x)

δ_{nk} = (u_n, u_k) = ∫ ρ(x) u_n(x) u_k(x) dx   (7.446)

and that span in the mean the domain D of the system. To make a Green's function G(x − y) that satisfies

L G(x − y) = δ(x − y)   (7.447)

we write G(x − y) in terms of the complete set of eigenfunctions u_k as

G(x − y) = Σ_{k=1}^∞ u_k(x) u_k(y) / λ_k   (7.448)


so that the action L u_k = λ_k ρ u_k turns G into

L G(x − y) = Σ_{k=1}^∞ L u_k(x) u_k(y) / λ_k = ρ(x) Σ_{k=1}^∞ u_k(x) u_k(y) = δ(x − y)   (7.449)

an α = 1 series expansion (7.413) of the delta function.

7.42 Green's Functions in One Dimension

In 1 dimension, we can explicitly solve the inhomogeneous ordinary differential equation L f(x) = g(x) in which

L = − (d/dx) [ p(x) d/dx ] + q(x)   (7.450)

is formally self adjoint. We'll build a Green's function from two solutions u and v of the homogeneous equation L u(x) = L v(x) = 0 as

G(x, y) = (1/A) [ θ(x − y) u(y) v(x) + θ(y − x) u(x) v(y) ]   (7.451)

in which θ(x) = (x + |x|)/(2|x|) is the Heaviside step function (Oliver Heaviside 1850–1925), and A is a constant which we'll presently identify. We'll show that the expression

f(x) = ∫_a^b G(x, y) g(y) dy = (v(x)/A) ∫_a^x u(y) g(y) dy + (u(x)/A) ∫_x^b v(y) g(y) dy

solves L f(x) = g(x). Differentiating f(x), we find after a cancellation

f′(x) = (v′(x)/A) ∫_a^x u(y) g(y) dy + (u′(x)/A) ∫_x^b v(y) g(y) dy.   (7.452)

Differentiating again, we have

f″(x) = (v″(x)/A) ∫_a^x u(y) g(y) dy + (u″(x)/A) ∫_x^b v(y) g(y) dy + v′(x)u(x)g(x)/A − u′(x)v(x)g(x)/A
      = (v″(x)/A) ∫_a^x u(y) g(y) dy + (u″(x)/A) ∫_x^b v(y) g(y) dy + (W(x)/A) g(x)   (7.453)

in which W(x) is the wronskian W(x) = u(x)v′(x) − u′(x)v(x). The result (7.306) for the wronskian of two linearly independent solutions of a self-adjoint


homogeneous ODE gives us W(x) = W(x₀) p(x₀)/p(x). We set the constant A = −W(x₀) p(x₀) so that the last term in (7.453) is −g(x)/p(x). It follows that

L f(x) = ([L v(x)]/A) ∫_a^x u(y) g(y) dy + ([L u(x)]/A) ∫_x^b v(y) g(y) dy + g(x) = g(x).   (7.454)

But L u(x) = L v(x) = 0, so we see that f satisfies our inhomogeneous equation L f(x) = g(x).

Example 7.55 (Green's functions with boundary conditions) To use the Green's function (7.451) with A = −W(x₀) p(x₀) to solve the ODE (7.450) subject to the Dirichlet boundary conditions f(a) = 0 = f(b), we choose solutions u(x) and v(x) of the homogeneous equations L u(x) = 0 = L v(x) that obey these boundary conditions, u(a) = 0 = v(b). For then our formula f(x) = ∫_a^b G(x, y) g(y) dy gives

f(a) = (u(a)/A) ∫_a^b v(y) g(y) dy = 0 = f(b) = (v(b)/A) ∫_a^b u(y) g(y) dy.   (7.455)

Similarly, to impose the Neumann boundary conditions f′(a) = 0 = f′(b), we choose solutions u(x) and v(x) of the homogeneous equations L u(x) = 0 = L v(x) that obey these boundary conditions, u′(a) = 0 = v′(b), so that our formula (7.452) for f′(x) gives

f′(a) = (u′(a)/A) ∫_a^b v(y) g(y) dy = 0 = f′(b) = (v′(b)/A) ∫_a^b u(y) g(y) dy.   (7.456)

For instance, to solve the equation −f″(x) − f(x) = exp x, with the mixed boundary conditions f(−π) = 0 and f′(π) = 0, we choose from among the solutions α cos x + β sin x of the homogeneous equation −f″ − f = 0 the functions u(x) = sin x and v(x) = cos x. Substituting them into the formula (7.451) and setting p(x) = 1 and A = −W(x₀) = sin²(x₀) + cos²(x₀) = 1, we find as the Green's function

G(x, y) = θ(x − y) sin y cos x + θ(y − x) sin x cos y.   (7.457)

The solution f(x) = ∫_{−π}^π G(x, y) e^y dy then is

f(x) = ∫_{−π}^π [ θ(x − y) sin y cos x + θ(y − x) sin x cos y ] e^y dy
     = − (1/2) ( e^{−π} cos x + e^π sin x + e^x ).   (7.458)
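Example 7.55's closed form lends itself to a quick machine check. The following SymPy sketch (not part of the text) verifies that the solution (7.458) satisfies the ODE −f″ − f = eˣ and the mixed boundary conditions:

```python
import sympy as sp

x = sp.symbols('x')

# Solution (7.458) of -f'' - f = exp(x) with f(-pi) = 0 and f'(pi) = 0
f = -sp.Rational(1, 2)*(sp.exp(-sp.pi)*sp.cos(x)
                        + sp.exp(sp.pi)*sp.sin(x)
                        + sp.exp(x))

# The ODE residual -f'' - f - exp(x) should vanish identically
residual = sp.simplify(-sp.diff(f, x, 2) - f - sp.exp(x))
print(residual)                                    # 0
print(sp.simplify(f.subs(x, -sp.pi)))              # 0
print(sp.simplify(sp.diff(f, x).subs(x, sp.pi)))   # 0
```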


7.43 Principle of Stationary Action in Field Theory

If φ(x) is a scalar field, and L(φ) is its action density, then its action S[φ] is the integral over all of spacetime

S[φ] = ∫ L(φ(x)) d⁴x.   (7.459)

The principle of least (or stationary) action says that the field φ(x) that satisfies the classical equation of motion is the one for which the first-order change in the action due to any tiny variation δφ(x) in the field vanishes, δS[φ] = 0. To keep things simple, we'll assume that the action (or Lagrange) density L(φ) is a function only of the field φ and its first derivatives ∂_a φ = ∂φ/∂x^a. The first-order change in the action then is

δS[φ] = ∫ [ (∂L/∂φ) δφ + (∂L/∂(∂_a φ)) δ(∂_a φ) ] d⁴x   (7.460)

in which we sum over the repeated index a from 0 to 3. Now δ(∂_a φ) = ∂_a(φ + δφ) − ∂_a φ = ∂_a δφ. So we may integrate by parts and drop the surface terms because we set δφ = 0 on the surface at infinity

δS[φ] = ∫ [ (∂L/∂φ) δφ + (∂L/∂(∂_a φ)) ∂_a(δφ) ] d⁴x = ∫ [ ∂L/∂φ − ∂_a ( ∂L/∂(∂_a φ) ) ] δφ d⁴x.

This first-order variation is zero for arbitrary δφ only if the field φ(x) satisfies Lagrange's equation

∂_a ( ∂L/∂(∂_a φ) ) ≡ (∂/∂x^a) ( ∂L/∂(∂φ/∂x^a) ) = ∂L/∂φ   (7.461)

which is the classical equation of motion.

Example 7.56 (Theory of a scalar field) The action density of a single scalar field φ of mass m is L = ½(φ̇)² − ½(∇φ)² − ½m²φ² or equivalently L = −½ ∂_a φ ∂^a φ − ½ m²φ². Lagrange's equation (7.461) is then

∇²φ − φ̈ = ∂_a ∂^a φ = m²φ   (7.462)

which is the Klein–Gordon equation (7.41).
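For readers who like to automate such variational calculations, SymPy's `euler_equations` can reproduce Example 7.56 in 1 + 1 dimensions; this is an illustrative sketch, not part of the text:

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t, x, m = sp.symbols('t x m')
phi = sp.Function('phi')(t, x)

# Action density of Example 7.56, restricted to 1 + 1 dimensions
L = (sp.Rational(1, 2)*sp.diff(phi, t)**2
     - sp.Rational(1, 2)*sp.diff(phi, x)**2
     - sp.Rational(1, 2)*m**2*phi**2)

eq, = euler_equations(L, [phi], [t, x])   # Lagrange's equation (7.461)

# Klein-Gordon form (7.462): phi'' - phiddot = m^2 phi
kg = sp.diff(phi, x, 2) - sp.diff(phi, t, 2) - m**2*phi
print(sp.simplify(eq.lhs - kg))  # 0
```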

In a theory of several fields φ₁, …, φₙ with action density L(φₖ, ∂_a φₖ), the fields obey n copies of Lagrange's equation, one for each field φₖ

(∂/∂x^a) ( ∂L/∂(∂_a φₖ) ) = ∂L/∂φₖ.   (7.463)


7.44 Symmetries and Conserved Quantities in Field Theory

A transformation of the coordinates x^a or of the fields φᵢ and their derivatives ∂_a φᵢ that leaves the action density L(φᵢ, ∂_a φᵢ) invariant is a symmetry of the theory. Such a symmetry implies that something is conserved or time independent. Suppose that an action density L(φᵢ, ∂_a φᵢ) is unchanged when the fields φᵢ and their derivatives ∂_a φᵢ change by δφᵢ and by δ(∂_a φᵢ) = ∂_a(δφᵢ)

0 = δL = Σᵢ [ (∂L/∂φᵢ) δφᵢ + (∂L/∂(∂_a φᵢ)) ∂_a δφᵢ ].   (7.464)

Then using Lagrange's equations (7.463) to rewrite ∂L/∂φᵢ, we find

0 = Σᵢ [ ∂_a ( ∂L/∂(∂_a φᵢ) ) δφᵢ + (∂L/∂(∂_a φᵢ)) ∂_a δφᵢ ] = ∂_a Σᵢ (∂L/∂(∂_a φᵢ)) δφᵢ   (7.465)

which says that the current

J^a = Σᵢ (∂L/∂(∂_a φᵢ)) δφᵢ   (7.466)

has zero divergence, ∂_a J^a = 0. Thus the time derivative of the volume integral of the charge density J⁰

Q_V = ∫_V J⁰ d³x   (7.467)

is the flux of current J entering through the boundary S of the volume V

Q̇_V = ∫_V ∂₀ J⁰ d³x = − ∫_V ∂ₖ J^k d³x = − ∫_S J^k d²Sₖ.   (7.468)

If no current enters V, then the charge Q inside V is conserved. When the volume V is the whole universe, the charge is the integral over all of space

Q = ∫ J⁰ d³x = ∫ Σᵢ (∂L/∂φ̇ᵢ) δφᵢ d³x = ∫ Σᵢ πᵢ δφᵢ d³x   (7.469)

in which πᵢ is the momentum conjugate to the field φᵢ

πᵢ = ∂L/∂φ̇ᵢ.   (7.470)

Example 7.57 (O(n) symmetry and its charge) Suppose the action density L is the sum of n copies of the quadratic action density

L = Σ_{i=1}^n ½(φ̇ᵢ)² − ½(∇φᵢ)² − ½m²φᵢ² = −½ ∂_a φ ∂^a φ − ½ m²φ²,   (7.471)

and Aᵢⱼ is any constant antisymmetric matrix, Aᵢⱼ = −Aⱼᵢ. Then if the fields change by δφᵢ = ε Σⱼ Aᵢⱼ φⱼ, the change (7.464) in the action density

δL = −ε Σ_{i,j=1}^n ( m² φᵢ Aᵢⱼ φⱼ + ∂^a φᵢ Aᵢⱼ ∂_a φⱼ ) = 0   (7.472)

vanishes. Thus the charge (7.469) associated with the matrix A

Q_A = ∫ Σᵢ πᵢ δφᵢ d³x = ε ∫ Σ_{i,j} πᵢ Aᵢⱼ φⱼ d³x   (7.473)

is conserved. There are n(n − 1)/2 antisymmetric n × n imaginary matrices; they generate the group O(n) of n × n orthogonal matrices (Example 11.3).
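The invariance (7.472) is easy to verify mechanically for the n = 2 case. In this illustrative SymPy sketch (the names f1, f2 and the restriction to 1 + 1 dimensions are assumptions for brevity), the first-order change of the action density under an infinitesimal O(2) rotation vanishes:

```python
import sympy as sp

t, x, eps, m = sp.symbols('t x epsilon m')
f1 = sp.Function('f1')(t, x)
f2 = sp.Function('f2')(t, x)

def density(fields):
    # n = 2 copies of the quadratic action density (7.471), 1 space dimension
    return sum(sp.Rational(1, 2)*(sp.diff(f, t)**2 - sp.diff(f, x)**2
                                  - m**2*f**2) for f in fields)

A = sp.Matrix([[0, 1], [-1, 0]])      # antisymmetric O(2) generator
phi = sp.Matrix([f1, f2])
rotated = phi + eps*A*phi             # delta phi_i = eps A_ij phi_j

dL = sp.expand(density(rotated) - density(phi))
print(sp.simplify(dL.coeff(eps, 1)))  # 0: the first-order change vanishes
```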

An action density L(φᵢ, ∂_a φᵢ) that is invariant under a spacetime translation, x′^a = x^a + δx^a, depends upon x^a only through the fields φᵢ and their derivatives ∂_a φᵢ

∂L/∂x^a = Σᵢ [ (∂L/∂φᵢ) (∂φᵢ/∂x^a) + (∂L/∂(∂_b φᵢ)) (∂²φᵢ/∂x^a ∂x^b) ].   (7.474)

Using Lagrange's equations (7.463) to rewrite ∂L/∂φᵢ, we find

0 = Σᵢ [ ∂_b ( ∂L/∂(∂_b φᵢ) ) ∂_a φᵢ + (∂L/∂(∂_b φᵢ)) (∂²φᵢ/∂x^b ∂x^a) ] − ∂L/∂x^a
0 = ∂_b [ Σᵢ (∂L/∂(∂_b φᵢ)) (∂φᵢ/∂x^a) − δ^b_a L ]   (7.475)

that the energy–momentum tensor

T^b_a = Σᵢ (∂L/∂(∂_b φᵢ)) (∂φᵢ/∂x^a) − δ^b_a L   (7.476)

has zero divergence, ∂_b T^b_a = 0. Thus the time derivative of the 4-momentum P_a^V inside a volume V

P_a^V = ∫_V [ Σᵢ (∂L/∂(∂₀ φᵢ)) (∂φᵢ/∂x^a) − δ⁰_a L ] d³x = ∫_V T⁰_a d³x   (7.477)

is equal to the flux entering through V's boundary S

∂₀ P_a^V = ∫_V ∂₀ T⁰_a d³x = − ∫_V ∂ₖ T^k_a d³x = − ∫_S T^k_a d²Sₖ.   (7.478)

The invariance of the action density L under spacetime translations implies the conservation of energy P₀ and momentum P.


The momentum πᵢ(x) that is canonically conjugate to the field φᵢ(x) is the derivative of the action density L with respect to the time derivative of the field

πᵢ = ∂L/∂φ̇ᵢ.   (7.479)

If one can express the time derivatives φ̇ᵢ of the fields in terms of the fields φᵢ and their momenta πᵢ, then the hamiltonian of the theory is the spatial integral of

H = P⁰ = T⁰⁰ = Σ_{i=1}^n πᵢ φ̇ᵢ − L   (7.480)

in which φ̇ᵢ = φ̇ᵢ(φ, π).

Example 7.58 (Hamiltonian of a scalar field) For the lagrangian of Example 7.56, the hamiltonian density (7.480) is H = ½π² + ½(∇φ)² + ½m²φ².

Example 7.59 (Euler's theorem and the Nambu–Goto string) When the action density is a first-degree homogeneous function (Section 7.10) of the time derivatives of the fields, as is that of the Nambu–Goto string

L = − (T₀/c) √( (Ẋ · X′)² − (Ẋ)² (X′)² ),   (7.481)

Euler's theorem (7.109) implies that the energy density (7.480) vanishes identically, independently of the equations of motion,

E₀ = (∂L/∂Ẋ^μ) Ẋ^μ − L = 0.   (7.482)

7.45 Nonlinear Differential Equations

Nonlinear differential equations are very interesting. Because linear combinations of the solutions to a given nonlinear differential equation are not also solutions, the actual solutions tend to have intrinsic sizes and shapes. But for the same reason, finding the general solution to a nonlinear differential equation is difficult. Fortunately, modern computers give us numerical solutions in most cases. And sometimes we get lucky.

Example 7.60 (Riccati's equation) The nonlinear differential equation y′ = a²x⁻⁴ − y² has the solution y(x) = x⁻¹ + ax⁻² (c − e^{2a/x})/(c + e^{2a/x}).

Example 7.61 (Equidimensional equation) The second-order nonlinear ordinary differential equation y″ = y y′/x has a general solution given by y(x) = 2c tan(c ln(x) + c′) − 1 and two special solutions y(x) = c″ and y(x) = −2/(c‴ + ln x) − 1 (Bender and Orszag, 1978, p. 30).
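Both closed-form solutions of Examples 7.60 and 7.61 can be checked by direct substitution. A SymPy sketch (the numerical sample point is an arbitrary choice):

```python
import sympy as sp

x, a, c, c1 = sp.symbols('x a c c1', positive=True)

# Riccati's equation y' = a^2 x^-4 - y^2 and its printed solution
y = 1/x + (a/x**2)*(c - sp.exp(2*a/x))/(c + sp.exp(2*a/x))
riccati = sp.diff(y, x) - (a**2/x**4 - y**2)

# Equidimensional equation y'' = y y'/x and its general solution
z = 2*c*sp.tan(c*sp.log(x) + c1) - 1
equidim = sp.diff(z, x, 2) - z*sp.diff(z, x)/x

# Both residuals vanish identically; evaluate at an arbitrary point
vals = {a: 1.3, c: 0.7, c1: 0.2, x: 2.1}
print(float(riccati.subs(vals)), float(equidim.subs(vals)))  # both ~ 0
```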

Example 7.62 (Fluid mechanics) The continuity equation (4.193) for a fluid of mass density ρ, velocity v, and current density j = ρv is

(∂/∂t) ∫ ρ(x, t) d³x = − ∮ j(x, t) · da = − ∫ ∇ · j(x, t) d³x = − ∫ ∇ · (ρv) d³x.   (7.483)

Its differential form is ρ̇ = −∇ · (ρv) = −ρ∇ · v − v · ∇ρ so that the total time derivative of the density is

dρ/dt ≡ ∂ρ/∂t + v · ∇ρ = −ρ∇ · v.   (7.484)

An incompressible fluid is one for which both sides of this continuity equation vanish

dρ/dt = 0   and   ∇ · v = 0.   (7.485)

In the absence of viscosity, the force F acting on a tiny volume dV of a fluid is the integral of the pressure p over the surface dA of dV

F = − ∮ p dA = − ∫ ∇p dV   (7.486)

in which dA is the outward normal to the surface. Equating this force per unit volume to the density times the acceleration, we find

ρ dv/dt = ρ [ ∂v/∂t + (v · ∇) v ] = − ∇p   (7.487)

which is Euler's equation for a fluid without viscosity (Leonhard Euler 1707–1783). An incompressible fluid with a constant viscosity η obeys the Navier–Stokes equation

ρ [ ∂v/∂t + (v · ∇) v ] = − ∇p + η ∇²v   (7.488)

(Claude-Louis Navier 1785–1836, George Stokes 1819–1903).
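A standard way to build velocity fields that satisfy the incompressibility condition (7.485) in 2 dimensions is to differentiate a stream function, a device not introduced in the text but easy to verify with SymPy:

```python
import sympy as sp

x, y = sp.symbols('x y')
psi = sp.Function('psi')(x, y)   # an arbitrary stream function

# v = (d psi/dy, -d psi/dx) is automatically divergence free
vx = sp.diff(psi, y)
vy = -sp.diff(psi, x)
div_v = sp.diff(vx, x) + sp.diff(vy, y)
print(sp.simplify(div_v))  # 0, so the flow satisfies (7.485)
```

The divergence vanishes because mixed partial derivatives commute, so any smooth ψ yields an incompressible flow.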

7.46 Nonlinear Differential Equations in Cosmology

On large scales our universe is homogeneous and isotropic in space but not in time. The invariant squared distance between nearby points is

ds² = − c²dt² + a²(t) [ dr²/(1 − kr²/L²) + r² dθ² + r² sin²θ dφ² ]   (7.489)


in which the magnitude of the scale factor a(t) describes the expansion of space, and k = 0 or ±1, corresponding to flat space (k = 0), a closed universe (k = 1), or an open universe (k = −1). The Friedmann equations of general relativity (13.273 and 13.288) for the dimensionless scale factor a(t) are

ä/a = − (4πG/3) ( ρ + 3p/c² )   and   (ȧ/a)² = (8πG/3) ρ − c²k/(L²a²)   (7.490)

in which L > 2 × 10¹³ ly, and k = ±1 or 0.

These equations are simpler when the pressure p is related to the mass density ρ by an equation of state p = c²wρ. This happens when the mass density ρ is due to a single constituent – radiation (w = 1/3), matter (w = 0), or dark energy (w = −1). Conservation of energy ρ̇ = −3(ρ + p/c²) ȧ/a (13.279–13.284) then gives (Exercise 7.32) the mass density as ρ = ρ₀ a^{−3(1+w)} in which ρ₀ is its present value, and the present value a₀ of the scale factor has been set equal to unity, a₀ = 1. Friedmann's equations then are

ä/a = − (4πG/3)(1 + 3w) ρ₀ a^{−3(1+w)}   and   (ȧ/a)² = (8πG/3) ρ₀ a^{−3(1+w)} − c²k/(L²a²).   (7.491)

Example 7.63 (de Sitter spacetimes) For a universe dominated by a cosmological constant Λ = 8πGρ₀, the equation-of-state parameter w is −1, and Friedmann's equations (7.491) are

ä/a = (8πG/3) ρ₀   and   ȧ² = (8πG/3) ρ₀ a² − c²k/L².   (7.492)

If Λ = 8πGρ₀ < 0 and k = −1, then the scale factor a(t) satisfies the harmonic equation ä = −λ²a where λ² = 8πG|ρ₀|/3. So the scale factor is a periodic function, a(t) = α e^{iλt} + ᾱ e^{−iλt}. The nonlinear first-order equation

−λ² ( α² e^{2iλt} − 2|α|² + ᾱ² e^{−2iλt} ) = −λ² ( α² e^{2iλt} + 2|α|² + ᾱ² e^{−2iλt} ) + c²/L²   (7.493)

fixes the magnitude of α to be |α| = c/(2λL). With the initial condition a(0) = 0, we get a(t) = (c/(Lλ)) sin(λt) in which the sign of a(t) is irrelevant because only a² appears in the invariant distance (7.489). This maximally symmetric (Section 13.24) spacetime is called anti-de Sitter space. There are no k = 0 or k = 1 solutions for ρ₀ < 0.

If Λ = 8πGρ₀ > 0, the scale factor a(t) obeys the equation ä = λ²a with λ² = 8πGρ₀/3 > 0. So a(t) = α e^{λt} + β e^{−λt} with α, β real. The nonlinear first-order equation ȧ² = λ²a² − c²k/L² implies that αβ = c²k/(4λ²L²). The solutions


Figure 7.4 The absolute value |a(t)| of the scale factor is plotted for de Sitter universes (7.494) with Λ > 0 and k = 1 (solid) and k = −1 (dashed) as well as for an anti-de Sitter universe a(t) = (c/(Lλ)) sin(λt) with Λ < 0 and k = −1 (dotted).

a(t) = (c/(λL)) cosh(λt)   if k = 1
a(t) = a(0) e^{±λt}        if k = 0
a(t) = (c/(λL)) sinh(λt)   if k = −1   (7.494)

remarkably represent in different coordinates the same maximally symmetric spacetime known as de Sitter space.

Figure 7.4 plots the absolute value |a(t)| of the scale factor for de Sitter universes (7.494) with Λ > 0 and k = 1 (solid) and k = −1 (dashed) as well as for an anti-de Sitter universe a(t) = (c/(Lλ)) sin(λt) with Λ < 0 and k = −1 (dotted). The present value of the dark-energy mass density ρ₀ = 5.924 × 10⁻²⁷ kg/m³ is used with L = 2.1 × 10¹³ ly.

Example 7.64 (Universe of radiation) For a universe in which the mass density ρ is entirely due to radiation, the parameter w = 1/3, and the Friedmann equations (7.491) are

ä = − (8πGρ₀/3) a⁻³   and   ȧ² = (8πGρ₀/3) a⁻² − c²k/L².   (7.495)


In terms of r⁴ ≡ 32πGρ₀/3 > 0, the first-order k = 0 Friedmann equation (7.495) is ȧ = r²/(2a) or 2a da = r² dt so that a² = r²t. Thus the k = 0 solution for a universe of radiation is a(t) = r√t. In our universe the term −c²k/L² is so small as to be consistent with zero, so this solution is a good approximation during the first 50,000 years, which is the era of radiation. The k = ±1 solutions are

a(t) = (Lr²/(2c)) [ 1 − ( 1 − 2c²t/(L²r²) )² ]^{1/2}   for k = 1
a(t) = (Lr²/(2c)) [ ( 1 + 2c²t/(L²r²) )² − 1 ]^{1/2}   for k = −1.   (7.496)

The scale factor of a k = 1 universe reaches a maximum size

a_max = Lr²/(2c) = (L/c) √(8πGρ₀/3)   (7.497)

and collapses back to zero at t = L²r²/c² > 5.6 Gyr.
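The k = 1 solution in (7.496) can be verified by substituting it into the first-order Friedmann equation it solves; an illustrative SymPy sketch (not part of the text):

```python
import sympy as sp

t, r, c, L = sp.symbols('t r c L', positive=True)

# k = 1 solution (7.496) of the radiation-era Friedmann equation
# adot^2 = r^4/(4 a^2) - c^2/L^2   (with r^4 = 32 pi G rho_0 / 3)
a = (L*r**2/(2*c)) * sp.sqrt(1 - (1 - 2*c**2*t/(L**2*r**2))**2)

residual = sp.simplify(sp.diff(a, t)**2 - (r**4/(4*a**2) - c**2/L**2))
print(residual)                          # 0

# The closed universe collapses back to a = 0 at t = L^2 r^2 / c^2
print(a.subs(t, L**2*r**2/c**2))         # 0
```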

Example 7.65 (Universe of matter) A universe made entirely of matter has no pressure, so w = 0. Its Friedmann equations (7.491) are

a² ä = − (4πG/3) ρ₀   and   ȧ² = (8πG/3) ρ₀ (1/a) − c²k/L².   (7.498)

In terms of m³ = 6πGρ₀ > 0, the first-order k = 0 Friedmann equation (7.498) is (3/2)√a ȧ = m^{3/2} or (3/2)√a da = m^{3/2} dt, which integrates to a^{3/2} = m^{3/2}t. So the k = 0 scale factor for a universe of matter rises as the two-thirds power of the time, a(t) = m t^{2/3}, which is a reasonable approximation during the first few Gyr of the era of matter because the curvature term −c²k/L² is tiny. The scale factor of a k = 1 universe reaches a maximum size

a_max = 8πGρ₀L²/(3c²)   (7.499)

and then collapses to zero.


Figure 7.5 plots the k = 0 scale factors a(t) = r√t for a universe of radiation (dash-dot) and a(t) = m t^{2/3} for a universe of matter (dashed) as well as the scale factor for a multi-component universe (13.296, solid) based on the parameters of the Planck collaboration (Aghanim et al., 2018).

7.47 Nonlinear Differential Equations in Particle Physics

The equations of particle physics are nonlinear. Physicists usually use perturbation theory to cope with the nonlinearities. But occasionally they focus on the nonlinearities and treat the fields classically or semi-classically. To keep things relatively simple, we'll work in a spacetime of only 2 dimensions and consider a model field theory described by the action density


Figure 7.5 The k = 0 scale factors a(t) = m t^{2/3} for a universe of matter (dashes) and a(t) = r√t for a universe of radiation (dash-dot) are compared with a multicomponent scale factor (13.296, solid) based on the parameters of the Planck collaboration (Aghanim et al., 2018).

L = ½ ( φ̇² − φ′² ) − V(φ)   (7.500)

in which a dot means a time derivative, a prime means a spatial derivative, and V is a function of the field φ. Lagrange's equation for this theory is

φ̈ − φ″ = − dV/dφ.   (7.501)

We can convert this partial differential equation to an ordinary one by making the field φ depend only upon the combination u = x − vt rather than upon both x and t. We then have φ̇ = −v φ_u. With this restriction to traveling-wave solutions, Lagrange's equation reduces to

(1 − v²) φ_uu = dV/dφ = V,φ.   (7.502)

We multiply both sides of this equation by φ_u, (1 − v²) φ_u φ_uu = φ_u V,φ, and integrate to get (1 − v²) ½φ_u² = V + E in which E is a constant of integration, E = ½(1 − v²) φ_u² − V(φ). We can convert (Exercise 7.36) this equation into a problem of integration


u − u₀ = ∫ √(1 − v²) / √( 2(E + V(φ)) ) dφ.   (7.503)

By inverting the resulting equation relating u to φ, we may find the soliton solution φ(u − u₀), which is a lump of energy traveling with speed v.

Example 7.66 (Soliton of the φ⁴ theory) To simplify the integration (7.503), we take as the action density

L = ½ ( φ̇² − φ′² ) − (λ²/2) ( φ² − φ₀² )²   (7.504)

and set E = 0. Our formal solution (7.503) gives

u − u₀ = ± (√(1 − v²)/λ) ∫ dφ/(φ² − φ₀²) = ∓ (√(1 − v²)/(λφ₀)) tanh⁻¹(φ/φ₀)   (7.505)

or

φ(x − vt) = ∓ φ₀ tanh [ λφ₀ ( x − x₀ − v(t − t₀) ) / √(1 − v²) ]   (7.506)

which is a soliton (or an antisoliton) at x₀ + v(t − t₀). A unit soliton at rest is plotted in Fig. 7.6. Its energy is concentrated at x = 0 where |φ² − φ₀²| is maximal.


Figure 7.6 The field φ(x) of the soliton (7.506) at rest (v = 0) at position x₀ = 0 for λ = 1 = φ₀. The energy density of the field vanishes when φ = ±φ₀ = ±1. The energy of this soliton is concentrated at x = 0.
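One can confirm by direct substitution that the kink (7.506) solves the full field equation (7.501) with V = (λ²/2)(φ² − φ₀²)²; a SymPy sketch (with x₀ = t₀ = 0, an assumption made for brevity):

```python
import sympy as sp

x, t, v, lam, phi0 = sp.symbols('x t v lambda phi0', positive=True)

# Kink (7.506) with x0 = t0 = 0; the argument involves u = x - v t
phi = phi0*sp.tanh(lam*phi0*(x - v*t)/sp.sqrt(1 - v**2))

# Field equation (7.501) with V = (lam^2/2) (phi^2 - phi0^2)^2
V_phi = 2*lam**2*phi*(phi**2 - phi0**2)          # dV/dphi
res = sp.simplify(sp.diff(phi, t, 2) - sp.diff(phi, x, 2) + V_phi)
print(res)  # 0
```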


Further Reading

One can learn more about differential equations in Advanced Mathematical Methods for Scientists and Engineers (Bender and Orszag, 1978).

Exercises

7.1 In rectangular coordinates, the curl of a curl is by definition (2.43)

(∇ × (∇ × E))ᵢ = Σ_{j,k=1}³ ε_{ijk} ∂ⱼ (∇ × E)ₖ = Σ_{j,k,ℓ,m=1}³ ε_{ijk} ∂ⱼ ε_{kℓm} ∂_ℓ E_m.   (7.507)

Use Levi-Civita's identity (1.497) to show that

∇ × (∇ × E) = ∇(∇ · E) − △E.   (7.508)

This formula defines △E in any system of orthogonal coordinates.

7.2 Show that since the Bessel function Jₙ(x) satisfies Bessel's equation (7.26), the function Pₙ(ρ) = Jₙ(kρ) satisfies (7.25).
7.3 Show that (7.38) implies that R_{k,ℓ}(r) = j_ℓ(kr) satisfies (7.37).
7.4 Use (7.36, 7.37), and Φ″_m = −m²Φ_m to show in detail that the product f(r, θ, φ) = R_{k,ℓ}(r) Θ_{ℓm}(θ) Φ_m(φ) satisfies −△f = k²f.
7.5 Replacing Helmholtz's k² by 2m(E − V(r))/ℏ², we get Schrödinger's equation −(ℏ²/2m)△ψ(r, θ, φ) + V(r)ψ(r, θ, φ) = Eψ(r, θ, φ). Now let ψ(r, θ, φ) = R_{nℓ}(r)Θ_{ℓm}(θ)e^{imφ} in which Θ_{ℓm} satisfies (7.36) and show that the radial function R_{nℓ} must obey

− (r²R′_{nℓ})′/r² + [ ℓ(ℓ + 1)/r² + 2mV/ℏ² ] R_{nℓ} = (2mE_{nℓ}/ℏ²) R_{nℓ}.   (7.509)

7.6 Use the empty-space Maxwell's equations ∇ · B = 0, ∇ × E + Ḃ = 0, ∇ · E = 0, and ∇ × B − Ė/c² = 0 and the formula (7.508) to show that in vacuum △E = Ë/c² and △B = B̈/c².
7.7 Argue from symmetry and antisymmetry that [γ^a, γ^b] ∂_a ∂_b = 0 in which the sums over a and b run from 0 to 3.
7.8 Suppose a voltage V(t) = V sin(ωt) is applied to a resistor of R (Ω) in series with a capacitor of capacitance C (F). If the current through the circuit at time t = 0 is zero, what is the current at time t?
7.9 (a) Is (1 + x² + y²)^{−3/2} [ (1 + y²) y dx + (1 + x²) x dy ] = 0 exact? (b) Find its general integral and solution y(x). Use Section 7.8.
7.10 (a) Separate the variables of the ODE (1 + y²) y dx + (1 + x²) x dy = 0. (b) Find its general integral and solution y(x).
7.11 Find the general solution to the differential equation y′ + y/x = c/x.


7.12 Find the general solution to the differential equation y′ + xy = ce^{−x²/2}.
7.13 James Bernoulli studied ODEs of the form y′ + py = qyⁿ in which p and q are functions of x. Division by yⁿ and the substitution v = y^{1−n} gives us the equation v′ + (1 − n)pv = (1 − n)q, which is soluble as shown in Section 7.16. Use this method to solve the ODE y′ − y/2x = 5x²y⁵.
7.14 Integrate the ODE (xy + 1) dx + 2x²(2xy − 1) dy = 0. Hint: Use the variable v(x) = x y(x) instead of y(x).
7.15 Show that the points x = ±1 and ∞ are regular singular points of Legendre's equation (7.220).
7.16 Use the vanishing of the coefficient of every power of x in (7.225) and the notation (7.227) to derive the recurrence relation (7.228).
7.17 In Example 7.39, derive the recursion relation for r = 1 and discuss the resulting eigenvalue equation.
7.18 In Example 7.39, show that the solutions associated with the roots r = 0 and r = 1 are the same.
7.19 For a hydrogen atom with V(r) = −e²/4πε₀r ≡ −q²/r, Equation (7.509) is (r²R′_{nℓ})′ + [ (2m/ℏ²)(E_{nℓ} + q²/r) r² − ℓ(ℓ + 1) ] R_{nℓ} = 0. So at big r, R″_{nℓ} ≈ −2mE_{nℓ}R_{nℓ}/ℏ² and R_{nℓ} ∼ exp(−√(−2mE_{nℓ}) r/ℏ). At tiny r, (r²R′_{nℓ})′ ≈ ℓ(ℓ + 1)R_{nℓ} and R_{nℓ}(r) ∼ r^ℓ. Set R_{nℓ}(r) = r^ℓ exp(−√(−2mE_{nℓ}) r/ℏ) P_{nℓ}(r) and apply the method of Frobenius to find the values of E_{nℓ} for which R_{nℓ} is suitably normalizable.
7.20 Show that as long as the matrix Y_{kj} = y_k^{(ℓ_j)}(x_j) is nonsingular, the n boundary conditions b_j = y^{(ℓ_j)}(x_j) = c₁y₁^{(ℓ_j)}(x_j) + ··· + cₙyₙ^{(ℓ_j)}(x_j) determine the n coefficients c_k of the expansion (7.262) to be

C^T = B^T Y⁻¹   or   C_k = Σ_{j=1}^n b_j (Y⁻¹)_{jk}.   (7.510)

7.21 Show that if the real and imaginary parts u₁, u₂, v₁, and v₂ of ψ and χ satisfy boundary conditions at x = a and x = b that make the boundary term (7.280) vanish, then its complex analog (7.282) also vanishes.
7.22 Show that if the real and imaginary parts u₁, u₂, v₁, and v₂ of ψ and χ satisfy boundary conditions at x = a and x = b that make the boundary term (7.280) vanish, and if the differential operator L is real and self adjoint, then (7.278) implies (7.283).
7.23 Show that if D is the set of all twice-differentiable functions u(x) on [a, b] that satisfy Dirichlet's boundary conditions (7.285) and if the function p(x) is continuous and positive on [a, b], then the adjoint set D*, defined as the set of all twice-differentiable functions v(x) that make the boundary term (7.287) vanish for all functions u ∈ D, is D itself.
7.24 Same as Exercise 7.23 but for Neumann boundary conditions (7.286).
7.25 Use Bessel's equation (7.354) and the boundary conditions u(0) = 0 for n > 0 and u(1) = 0 to show that the eigenvalues λ are all positive.
7.26 Show that after the change of variables u(x) = Jₙ(kx) = Jₙ(ρ), the self-adjoint differential equation (7.354) becomes Bessel's equation (7.355).
7.27 Derive Bessel's inequality (7.417) from the inequality (7.416).
7.28 Repeat Example 7.51 using J₁'s instead of J₀'s. Hint: the Mathematica command Do[Print[N[BesselJZero[1, k], 10]], {k, 1, 100, 1}] gives the first 100 zeros z_{1,k} of the Bessel function J₁(x) to 10 significant figures.
7.29 Derive the Yukawa potential (7.431) as the Green's function for the modified Helmholtz equation (7.430).
7.30 Use Lagrange's equation to derive the Klein–Gordon equation (7.462).
7.31 Derive the formula for the hamiltonian density of the theory of Example 7.58.
7.32 Derive the relation ρ = ρ₀(a₀/a)^{3(1+w)} between the energy density ρ and the scale factor a(t) from the conservation law dρ/da = −3(ρ + p)/a and the equation of state p = wρ.
7.33 Derive the three de Sitter solutions (7.494).
7.34 Derive the two solutions (7.496) for a universe of radiation.
7.35 How do we know that the maximum value of the scale factor of a closed universe of matter is (7.499)?
7.36 Use E = ½(1 − v²)φ_u² − V(φ) to derive the soliton solution (7.503).
7.37 Find the solution of the differential equation −f″(x) − f(x) = 1 that satisfies the boundary conditions f(−π) = 0 = f′(π). Hint: Use Example 7.55.
7.38 Show that the sum of exponentials f(x) = c₁e^{z₁x} + c₂e^{z₂x} + ··· + cₙe^{zₙx}, in which the z_k's are the n roots of the algebraic equation 0 = a₀ + a₁z + a₂z² + ··· + aₙzⁿ and the c_k's are any constants, is a solution of the homogeneous ordinary differential equation 0 = a₀f(x) + a₁f′(x) + a₂f″(x) + ··· + aₙf⁽ⁿ⁾(x) with constant coefficients a_k. When the roots are all different, f(x) is the most general solution.
7.39 Find two linearly independent solutions of the ODE f″ − 2f′ + f = 0.

8 Integral Equations

8.1 Differential Equations as Integral Equations

Differential equations when integrated become integral equations with built-in boundary conditions. Thus if we integrate the first-order ODE

du(x)/dx ≡ u_x(x) = p(x) u(x) + q(x)   (8.1)

then we get the integral equation

u(x) = ∫_a^x p(y) u(y) dy + ∫_a^x q(y) dy + u(a).   (8.2)

To transform a second-order differential equation into an integral equation, one uses Cauchy's identity (Exercise 8.1)

∫_a^x dz ∫_a^z dy f(y) = ∫_a^x (x − y) f(y) dy,   (8.3)

which is a special case of his formula for repeated integration

∫_a^x ∫_a^{x₁} ··· ∫_a^{x_{n−1}} f(xₙ) dxₙ ··· dx₂ dx₁ = (1/(n − 1)!) ∫_a^x (x − y)^{n−1} f(y) dy.   (8.4)

Using the special case (8.3), one may integrate (Exercise 8.2) the second-order ODE u″ = pu′ + qu + r to

u(x) = f(x) + ∫_a^x k(x, y) u(y) dy   (8.5)

with

k(x, y) = p(y) + (x − y) [ q(y) − p′(y) ]   and   f(x) = u(a) + (x − a) [ u′(a) − p(a) u(a) ] + ∫_a^x (x − y) r(y) dy.   (8.6)
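Cauchy's identity (8.3) is easy to test symbolically; the test function sin is an arbitrary choice in this SymPy sketch:

```python
import sympy as sp

x, y, z, a = sp.symbols('x y z a')
f = sp.sin   # an arbitrary test function

# Left side of (8.3): the repeated integral
lhs = sp.integrate(sp.integrate(f(y), (y, a, z)), (z, a, x))
# Right side: a single integral weighted by (x - y)
rhs = sp.integrate((x - y)*f(y), (y, a, x))

print(sp.simplify(lhs - rhs))  # 0
```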


In some physical problems, integral equations arise independently of differential equations. Whatever their origin, integral equations tend to have properties more suitable to mathematical analysis because derivatives are unbounded operators.

8.2 Fredholm Integral Equations

An equation of the form

∫_a^b k(x, y) u(y) dy = λ u(x) + f(x)   (8.7)

for a ≤ x ≤ b with a given kernel k(x, y) and a specified function f(x) is an inhomogeneous Fredholm equation of the second kind for the function u(x) and the parameter λ (Erik Ivar Fredholm, 1866–1927). If f(x) = 0, then it is a homogeneous Fredholm equation of the second kind

∫_a^b k(x, y) u(y) dy = λ u(x),   a ≤ x ≤ b.   (8.8)

Such an equation typically has nontrivial solutions only for certain eigenvalues λ. Each solution u(x) is an eigenfunction. If λ = 0 but f(x) ≠ 0, then Equation (8.7) is an inhomogeneous Fredholm equation of the first kind

∫_a^b k(x, y) u(y) dy = f(x),   a ≤ x ≤ b.   (8.9)

Finally, if both λ = 0 and f(x) = 0, then (8.7) is a homogeneous Fredholm equation of the first kind

∫_a^b k(x, y) u(y) dy = 0,   a ≤ x ≤ b.   (8.10)

These Fredholm equations (8.7–8.10) are linear because they involve only the first (and zeroth) power of the unknown function u(x).

8.3 Volterra Integral Equations

If the kernel k(x, y) in the Equations (8.7–8.10) that define the Fredholm integral equations is causal, that is, if k(x, y) = k(x, y) θ(x − y), in which θ(x) = (x + |x|)/2|x| is the Heaviside function, then the corresponding equations bear the name Volterra (Vito Volterra, 1860–1941). Thus an equation of the form

∫_a^x k(x, y) u(y) dy = λ u(x) + f(x)   (8.11)


in which the kernel k(x, y) and the function f(x) are given, is an inhomogeneous Volterra equation of the second kind for the function u(x) and the parameter λ. If f(x) = 0, then it is a homogeneous Volterra equation of the second kind

∫_a^x k(x, y) u(y) dy = λ u(x).   (8.12)

Such an equation typically has nontrivial solutions only for certain eigenvalues λ. The solutions u(x) are the eigenfunctions. If λ = 0 but f(x) ≠ 0, then Equation (8.11) is an inhomogeneous Volterra equation of the first kind

∫_a^x k(x, y) u(y) dy = f(x).   (8.13)

Finally, if both λ = 0 and f(x) = 0, then it is a homogeneous Volterra equation of the first kind

∫_a^x k(x, y) u(y) dy = 0.   (8.14)

These Volterra equations (8.11–8.14) are linear because they involve only the first (and zeroth power) of the unknown function u(x). In what follows, we'll mainly discuss Fredholm integral equations, since those of the Volterra type are a special case of the Fredholm type.

8.4 Implications of Linearity

Because the Fredholm and Volterra integral equations are linear, one may add solutions of the homogeneous equations (8.8, 8.10, 8.12, and 8.14) and get new solutions. Thus if u₁, u₂, … are eigenfunctions

∫_a^b k(x, y) uⱼ(y) dy = λ uⱼ(x),   a ≤ x ≤ b   (8.15)

with the same eigenvalue λ, then the sum Σⱼ aⱼuⱼ(x) also is an eigenfunction with the same eigenvalue

∫_a^b k(x, y) [ Σⱼ aⱼuⱼ(y) ] dy = Σⱼ aⱼ ∫_a^b k(x, y) uⱼ(y) dy = Σⱼ aⱼ λ uⱼ(x) = λ [ Σⱼ aⱼuⱼ(x) ].   (8.16)


It also is true that the difference between any two solutions u_i1(x) and u_i2(x) of one of the inhomogeneous Fredholm (8.7, 8.9) or Volterra (8.11, 8.13) equations is a solution of the associated homogeneous equation (8.8, 8.10, 8.12, or 8.14). Thus if u_i1(x) and u_i2(x) satisfy the inhomogeneous Fredholm equation of the second kind

∫_a^b k(x, y) u_ij(y) dy = λ u_ij(x) + f(x),     j = 1, 2     (8.17)

then their difference u_i1(x) − u_i2(x) satisfies the homogeneous Fredholm equation of the second kind

∫_a^b k(x, y) [u_i1(y) − u_i2(y)] dy = λ [u_i1(x) − u_i2(x)].     (8.18)

Thus the most general solution u_i(x) of the inhomogeneous Fredholm equation of the second kind (8.17) is a particular solution u_ip(x) of that equation plus the general solution of the homogeneous Fredholm equation of the second kind (8.15)

u_i(x) = u_ip(x) + ∑_j a_j u_j(x).     (8.19)

Linear integral equations are much easier to solve than nonlinear ones.

8.5 Numerical Solutions

Let us break the real interval [a, b] into N segments [y_k, y_{k+1}] of equal length Δy = (b − a)/N with y_0 = a, y_k = a + k Δy, and y_N = b. Let's also set x_k = y_k and define U as the vector with entries U_k = u(y_k) and K as the (N + 1) × (N + 1) square matrix with elements K_{kℓ} = k(x_k, y_ℓ) Δy. Then we may approximate the homogeneous Fredholm equation of the second kind (8.8)

∫_a^b k(x, y) u(y) dy = λ u(x),     a ≤ x ≤ b     (8.20)

as the algebraic equation

∑_{ℓ=0}^{N} K_{k,ℓ} U_ℓ = λ U_k     (8.21)

or in matrix notation K U = λ U. We saw in Section 1.26 that every such equation has N + 1 eigenvectors U^(α) and eigenvalues λ^(α), and that the eigenvalues λ^(α) are the solutions of the characteristic equation (1.273)

det(K − λ^(α) I) = |K − λ^(α) I| = 0.     (8.22)
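The discretization (8.20–8.22) is easy to try in a few lines of Python. A minimal sketch, assuming the separable kernel k(x, y) = x y on [a, b] = [0, 1] (an illustrative choice, not from the text):

```python
import numpy as np

# Discretize the homogeneous Fredholm equation (8.8) on [a, b] = [0, 1]
# for the separable kernel k(x, y) = x*y (an illustrative assumption).
a, b, N = 0.0, 1.0, 100
y = np.linspace(a, b, N + 1)       # y_k = a + k*Dy, y_N = b
Dy = (b - a) / N
x = y                              # x_k = y_k

# K_{kl} = k(x_k, y_l) * Dy is the (N+1) x (N+1) matrix of (8.21)
K = np.outer(x, y) * Dy

# K U = lambda U, the matrix form of the integral equation
lams, U = np.linalg.eig(K)

# k(x, y) = x*y has rank 1: the continuum problem has the single nonzero
# eigenvalue ∫_0^1 y^2 dy = 1/3 with eigenfunction u(x) ∝ x.
lam_max = lams[np.argmax(np.abs(lams))].real
print(lam_max)   # close to 1/3
```

For this rank-one kernel the discrete eigenvalue agrees with 1/3 to about a percent at N = 100, and improves as N grows.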


In general, as N → ∞ and Δy → 0, the number N + 1 of eigenvalues λ^(α) and eigenvectors U^(α) becomes infinite. We may apply the same technique to the inhomogeneous Fredholm equation of the first kind

∫_a^b k(x, y) u(y) dy = f(x)     for a ≤ x ≤ b.     (8.23)

The resulting matrix equation is K U = F in which the kth entry in the vector F is F_k = f(x_k). This equation has the solution U = K^{-1} F as long as the matrix K is nonsingular, that is, as long as det K ≠ 0. This technique applied to the inhomogeneous Fredholm equation of the second kind

∫_a^b k(x, y) u(y) dy = λ u(x) + f(x)     (8.24)

leads to the matrix equation K U = λ U + F. The associated homogeneous matrix equation K U = λ U has N + 1 eigenvalues λ^(α) and eigenvectors U^(α) ≡ |α⟩. For any value of λ that is not one of the eigenvalues λ^(α), the matrix K − λ I has a nonzero determinant and hence an inverse, and so the vector U_i = (K − λ I)^{-1} F is a solution of the inhomogeneous matrix equation K U = λ U + F. If λ = λ^(β) is one of the eigenvalues λ^(α) of the homogeneous matrix equation K U = λ U, then the matrix K − λ^(β) I will not have an inverse, but it will have a pseudoinverse (Section 1.33). If its singular-value decomposition (1.396) is

K − λ^(β) I = ∑_{n=1}^{N+1} |m_n⟩ S_n ⟨n|     (8.25)

then its pseudoinverse (1.427) is

(K − λ^(β) I)^+ = ∑_{n=1, S_n≠0}^{N+1} |n⟩ S_n^{-1} ⟨m_n|     (8.26)

in which the sum is over the positive singular values. So if the vector F is a linear combination of the left singular vectors |m_n⟩ whose singular values are positive

F = ∑_{n=1, S_n≠0}^{N+1} f_n |m_n⟩     (8.27)

then the vector

U_i = (K − λ^(β) I)^+ F     (8.28)


will be a solution of K U = λ^(β) U + F. For in this case

(K − λ^(β) I) U_i = (K − λ^(β) I)(K − λ^(β) I)^+ F
= ∑_{n=1}^{N+1} |m_n⟩ S_n ⟨n| ∑_{n′=1, S_{n′}≠0}^{N+1} |n′⟩ S_{n′}^{-1} ⟨m_{n′}| ∑_{n″=1, S_{n″}≠0}^{N+1} f_{n″} |m_{n″}⟩
= ∑_{n=1, S_n≠0}^{N+1} f_n |m_n⟩ = F.     (8.29)

The most general solution will be the sum of this particular solution of the inhomogeneous equation K U = λ U + F and the most general solution of the homogeneous equation K U = λ U

U = U_i + ∑_k f_{β,k} U^(β,k) = (K − λ^(β) I)^+ F + ∑_k f_{β,k} U^(β,k).     (8.30)
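The pseudoinverse solution (8.28) can be checked with numpy's SVD-based pinv. In this sketch the kernel, the chosen eigenvalue, and the right-hand side are all illustrative assumptions:

```python
import numpy as np

# Solve K U = lam*U + F when lam is an eigenvalue of K, using the
# pseudoinverse as in (8.28). Kernel and F below are assumptions.
N = 50
y = np.linspace(0.0, 1.0, N + 1)
Dy = 1.0 / N
K = np.exp(-np.abs(y[:, None] - y[None, :])) * Dy   # k(x,y) = e^{-|x-y|}

lam = np.linalg.eigvalsh(K)[-1]     # pick lam = lam^(beta), K is symmetric

A = K - lam * np.eye(N + 1)         # singular matrix K - lam I
F = A @ np.sin(np.pi * y)           # F in the range of A, so (8.27) holds
U_i = np.linalg.pinv(A) @ F         # U_i = (K - lam I)^+ F, as in (8.28)

# A @ U_i reproduces F, i.e. K U_i = lam U_i + F
print(np.allclose(A @ U_i, F))      # True
```

Because A pinv(A) A = A, any F in the range of A is reproduced exactly, which is the content of Equation (8.29).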

Open-source programs are available in C++ (math.nist.gov/tnt/) and in FORTRAN (www.netlib.org/lapack/) that can solve such equations for the N + 1 eigenvalues λ^(α) and eigenvectors U^(α) and for the inverse K^{-1} for N = 100, 1000, 10,000, and so forth in milliseconds on a PC.

8.6 Integral Transformations

Integral transformations (Courant and Hilbert, 1955, chap. VII) help us solve linear homogeneous differential equations like

L u + c u = 0     (8.31)

in which L is a linear operator involving derivatives of u(z) with respect to its complex argument z = x + iy and c is a constant. We choose a kernel K(z, w) analytic in both variables and write u(z) as an integral along a contour in the complex w-plane weighted by an unknown function v(w)

u(z) = ∫_C K(z, w) v(w) dw.     (8.32)

If the differential operator L commutes with the contour integration, as it usually would, then our differential equation (8.31) is

∫_C [L K(z, w) + c K(z, w)] v(w) dw = 0.     (8.33)

The next step is to find a linear operator M that acting on K(z, w) with w-derivatives (but no z-derivatives) gives L acting on K(z, w)

M K(z, w) = L K(z, w).     (8.34)

We then get an integral equation

∫_C [M K(z, w) + c K(z, w)] v(w) dw = 0     (8.35)

involving w-derivatives, which we can integrate by parts. We choose the contour C so that the resulting boundary terms vanish. By using our freedom to pick the kernel and the contour, we often can make the resulting differential equation for v simpler than the one (8.31) we started with.

Example 8.1 (Fourier, Laplace, and Euler kernels) The kernel K(z, w) = exp(izw) leads to the Fourier transform (Chapter 4)

u(z) = ∫_{−∞}^{∞} e^{izw} v(w) dw.     (8.36)

The kernel K(z, w) = exp(−zw) gives us the Laplace transform (Section 4.9)

u(z) = ∫_0^{∞} e^{−zw} v(w) dw.     (8.37)

Euler's kernel K(z, w) = (z − w)^a occurs in applications of Cauchy's integral theorem (6.34) and integral formula (6.44).

Example 8.2 (Bessel functions) The differential operator L for Bessel's equation (7.355)

z² u″ + z u′ + z² u − λ² u = 0     (8.38)

is

L = z² d²/dz² + z d/dz + z²     (8.39)

and the constant c is −λ². We choose M = −d²/dw² and seek a suitable kernel K that satisfies M K = L K (8.34)

−K_ww = z² K_zz + z K_z + z² K     (8.40)

in which subscripts indicate differentiation as in (2.7). The kernel

K(z, w) = e^{±iz sin w}     (8.41)

is a solution of (8.40) that is entire in both variables (Exercise 8.3). In terms of it, our integral equation (8.35) is

∫_C [K_ww(z, w) + λ² K(z, w)] v(w) dw = 0.     (8.42)


We now integrate by parts once

∫_C [−K_w v′ + λ² K v + d(K_w v)/dw] dw     (8.43)

and then again

∫_C {K [v″ + λ² v] + d(K_w v − K v′)/dw} dw.     (8.44)

If we choose the contour so that K_w v − K v′ vanishes at both ends, then the unknown function v need only satisfy the differential equation

v″ + λ² v = 0     (8.45)

which is much simpler than Bessel's equation (7.355). The solution v(w) = exp(iλw) is an entire function of w for every complex λ. The contour integral (8.32) now gives us Bessel's function as the integral transform

u(z) = ∫_C K(z, w) v(w) dw = ∫_C e^{±iz sin w} e^{iλw} dw.     (8.46)

For Re(z) > 0 and any complex λ, the contour C_1 that runs from −i∞ to the origin w = 0, then to w = −π, and finally up to −π + i∞ has K_w v − K v′ = 0 at its ends (Exercise 8.4) provided we use the minus sign in the exponential. The function defined by this choice

H_λ^(1)(z) = −(1/π) ∫_{C_1} e^{−iz sin w + iλw} dw     (8.47)

is the first Hankel function (Hermann Hankel, 1839–1873). The second Hankel function is defined for Re(z) > 0 and any complex λ by a contour C_2 that runs from π + i∞ to w = π, then to w = 0, and lastly to −i∞

H_λ^(2)(z) = −(1/π) ∫_{C_2} e^{−iz sin w + iλw} dw.     (8.48)

Because the integrand exp(−iz sin w + iλw) is an entire function of z and w, one may deform the contours C_1 and C_2 and analytically continue the Hankel functions beyond the right half-plane (Courant and Hilbert, 1955, chap. VII). One may verify (Exercise 8.5) that the Hankel functions are related by complex conjugation

H_λ^(1)(z) = H_λ^(2)*(z)     (8.49)

when both z > 0 and λ are real.

Exercises

8.1 Show that

∫_a^x dz ∫_a^z dy f(y) = ∫_a^x (x − y) f(y) dy.     (8.50)

Hint: differentiate both sides with respect to x.
8.2 Use this identity (8.50) to integrate u″ = pu′ + qu + r and derive Equations (8.5), k(x, y) = p(y) + (x − y)[q(y) − p′(y)], and (8.6).
8.3 Show that the kernel K(z, w) = exp(±iz sin w) satisfies the differential equation (8.40).
8.4 Show that for Re z > 0 and arbitrary complex λ, the boundary terms in the integral (8.44) vanish for the two contours C_1 and C_2 that define the two Hankel functions.
8.5 Show that the Hankel functions are related by complex conjugation (8.49) when both z > 0 and λ are real.
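The identity of Exercise 8.1 is easy to spot-check numerically. A small sketch with an assumed integrand f(y) = cos y on [a, x] = [0, 1], using a hand-rolled trapezoid rule:

```python
import numpy as np

# Numerical check of Exercise 8.1:
#   int_a^x dz int_a^z dy f(y)  =  int_a^x (x - y) f(y) dy.
# The choices f(y) = cos(y), a = 0, x = 1 are illustrative.

def trap(vals, grid):
    """Composite trapezoid rule on a uniform grid."""
    h = grid[1] - grid[0]
    return h * (vals[0]/2 + vals[1:-1].sum() + vals[-1]/2)

a, x, n = 0.0, 1.0, 2001
z = np.linspace(a, x, n)

# left side: the inner integral of cos is sin, integrated once more
double = trap(np.sin(z), z)                 # = 1 - cos(1)
# right side
single = trap((x - z) * np.cos(z), z)

print(abs(double - single))   # tiny
```

Both sides come out close to 1 − cos 1 ≈ 0.4597, as the identity requires.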

9 Legendre Polynomials and Spherical Harmonics

9.1 Legendre's Polynomials

The monomials x^n span the space of functions f(x) that have power-series expansions on an interval about the origin

f(x) = ∑_{n=0}^{∞} c_n x^n = ∑_{n=0}^{∞} (f^(n)(0)/n!) x^n.     (9.1)

They are complete but not orthogonal or normalized. We can make them into real, orthogonal polynomials P_n(x) of degree n on the interval [−1, 1]

(P_n, P_m) = ∫_{−1}^{1} P_n(x) P_m(x) dx = 0,     n ≠ m     (9.2)

by requiring that each P_n(x) be orthogonal to all monomials x^m for m < n

∫_{−1}^{1} P_n(x) x^m dx = 0,     m < n.     (9.3)


… for α > −1 and β > −1 to the Jacobi polynomials

P_n^(α,β)(x) = ((1 − x)^{−α} (1 + x)^{−β} / (2^n n!)) (d^n/dx^n) [(1 − x)^α (1 + x)^β (x² − 1)^n]     (9.60)

which are orthogonal on [−1, 1]

∫_{−1}^{1} P_n^(α,β)(x) P_m^(α,β)(x) w(x) dx = (2^{α+β+1} / (2n + α + β + 1)) (Γ(n + α + 1) Γ(n + β + 1) / (n! Γ(n + α + β + 1))) δ_nm     (9.61)

and satisfy the normalization condition

P_n^(α,β)(1) = (n + α choose n)     (9.62)

and the differential equation

(1 − x²) y″ + (β − α − (α + β + 2) x) y′ + n(n + α + β + 1) y = 0.     (9.63)

In terms of R(x, y) = √(1 − 2xy + y²), their generating function is

2^{α+β} / [(1 − y + R(x, y))^α (1 + y + R(x, y))^β R(x, y)] = ∑_{n=0}^{∞} P_n^(α,β)(x) y^n.     (9.64)

9.8 Orthogonal Polynomials

When α = β, they are the Gegenbauer polynomials, which for α = β = ±1/2 are the Chebyshev polynomials (of the second and first kind, respectively). For α = β = 0, they are Legendre's polynomials. The recursion relations

T_0(x) = 1                  U_0(x) = 1
T_1(x) = x                  U_1(x) = 2x     (9.65)
T_{n+1}(x) = 2x T_n(x) − T_{n−1}(x)          U_{n+1}(x) = 2x U_n(x) − U_{n−1}(x)

define the Chebyshev polynomials of the first T_n(x) and second U_n(x) kinds.

Example 9.6 (Hermite's polynomials) The choice q(x) = 1 with weight function w(x) = exp(−x²) leads to the Hermite polynomials

H_n(x) = (−1)^n e^{x²} (d^n/dx^n) e^{−x²} = e^{x²/2} (x − d/dx)^n e^{−x²/2} = 2^n e^{−D²/4} x^n     (9.66)

where D = d/dx is the x-derivative. They are orthogonal on the real line

∫_{−∞}^{∞} H_n(x) H_m(x) e^{−x²} dx = √π 2^n n! δ_nm     (9.67)

and satisfy the differential equation

y″ − 2x y′ + 2n y = 0.     (9.68)

Their generating function is

e^{2xy − y²} = ∑_{n=0}^{∞} H_n(x) y^n / n!.     (9.69)

The nth excited state of the harmonic oscillator of mass m and angular frequency ω is proportional to H_n(x) in which x = √(mω/ℏ) q is the dimensionless position of the oscillator (Section 3.12).

Example 9.7 (Laguerre's polynomials) The choices q(x) = x and weight function w(x) = x^α e^{−x} lead to the generalized Laguerre polynomials

L_n^(α)(x) = (e^x / (n! x^α)) (d^n/dx^n) (e^{−x} x^{n+α}).     (9.70)

They are orthogonal on the interval [0, ∞)

∫_0^{∞} L_n^(α)(x) L_m^(α)(x) x^α e^{−x} dx = (Γ(n + α + 1)/n!) δ_nm,     (9.71)

and satisfy the differential equation

x y″ + (α + 1 − x) y′ + n y = 0.     (9.72)

Their generating function is

(1 − y)^{−α−1} exp(−xy/(1 − y)) = ∑_{n=0}^{∞} L_n^(α)(x) y^n.     (9.73)

The radial wave function for the state of the nonrelativistic hydrogen atom with quantum numbers n and ℓ is ρ^ℓ L_{n−ℓ−1}^{2ℓ+1}(ρ) e^{−ρ/2} in which ρ = 2r/(n a_0) and a_0 = 4π ε_0 ℏ²/(m_e e²) is the Bohr radius.

9.9 Azimuthally Symmetric Laplacians

We saw in Section 7.3 that the laplacian Δ = ∇ · ∇ separates in spherical coordinates r, θ, φ. A system with no dependence on the angle φ is said to have azimuthal symmetry. An azimuthally symmetric function

f(r, θ, φ) = R_{k,ℓ}(r) Θ_ℓ(θ)     (9.74)

will be a solution of Helmholtz's equation

−Δ f = k² f     (9.75)

if the functions R_{k,ℓ}(r) and Θ_ℓ(θ) satisfy

(1/r²) (d/dr) (r² dR_{k,ℓ}/dr) + [k² − ℓ(ℓ + 1)/r²] R_{k,ℓ} = 0     (9.76)

for a nonnegative integer ℓ and Legendre's equation (9.34)

(1/sin θ) (d/dθ) (sin θ dΘ_ℓ/dθ) + ℓ(ℓ + 1) Θ_ℓ = 0     (9.77)

so that we may set Θ_ℓ(θ) = P_ℓ(cos θ). For k > 0, the solutions of the radial equation (9.76) that are finite at r = 0 are the spherical Bessel functions

R_{k,ℓ}(r) = j_ℓ(kr)     (9.78)

which are given by Rayleigh's formula (10.71)

j_ℓ(x) = (−1)^ℓ x^ℓ ((1/x) d/dx)^ℓ (sin x / x).     (9.79)

So the general azimuthally symmetric solution of the Helmholtz equation (9.75) that is finite at r = 0 is

f(r, θ) = ∑_{ℓ=0}^{∞} a_{k,ℓ} j_ℓ(kr) P_ℓ(cos θ)     (9.80)


in which the a_{k,ℓ} are constants. If the solution can be infinite at the origin, then the Neumann functions

n_ℓ(x) = −(−1)^ℓ x^ℓ ((1/x) d/dx)^ℓ (cos x / x)     (9.81)

must be included, and the general solution then is

f(r, θ) = ∑_{ℓ=0}^{∞} [a_{k,ℓ} j_ℓ(kr) + b_{k,ℓ} n_ℓ(kr)] P_ℓ(cos θ)     (9.82)

in which the a_{k,ℓ} and b_{k,ℓ} are constants. When k = 0, Helmholtz's equation reduces to Laplace's equation

Δ f = 0     (9.83)

which describes the Coulomb-gauge electrostatic potential in the absence of charge and the Newtonian gravitational potential in the absence of mass. Now the radial equation is simply

(d/dr) (r² dR_ℓ/dr) = ℓ(ℓ + 1) R_ℓ     (9.84)

since k = 0. Setting

R_ℓ(r) = r^n     (9.85)

we get n(n + 1) = ℓ(ℓ + 1) so that n = ℓ or n = −(ℓ + 1). Thus the general solution to (9.83) is

f(r, θ) = ∑_{ℓ=0}^{∞} [a_ℓ r^ℓ + b_ℓ r^{−ℓ−1}] P_ℓ(cos θ).     (9.86)

If the solution must be finite at r = 0, then all the b_ℓ's must vanish.
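The exterior terms b_ℓ r^{−ℓ−1} P_ℓ(cos θ) of (9.86) can be tested numerically: for a unit point charge at distance r′ on the z-axis, the multipole series with b_ℓ = r′^ℓ sums to 1/|r − r′| for r > r′. A minimal sketch with assumed field and source points:

```python
import numpy as np
from numpy.polynomial import legendre as Leg

# Check: 1/|r - r'| = sum_l rp^l / r^{l+1} P_l(cos theta)  for r > rp,
# i.e. the exterior solutions r^{-l-1} P_l(cos theta) of (9.86).
r, rp, theta = 2.0, 0.5, 0.7        # assumed field and source points
x = np.cos(theta)

# P_l(x) from a unit coefficient on degree l of a Legendre series
series = sum(rp**l / r**(l + 1) * Leg.legval(x, [0]*l + [1])
             for l in range(40))
exact = 1.0 / np.sqrt(r**2 + rp**2 - 2*r*rp*x)
print(abs(series - exact) < 1e-12)  # True
```

The series converges geometrically, like (r′/r)^ℓ, so 40 terms are far more than enough at r′/r = 1/4.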

9.10 Laplace's Equation in Two Dimensions

In Section 7.3, we saw that Helmholtz's equation separates in cylindrical coordinates, and that the equation for P(ρ) is Bessel's equation (7.25). But if α = 0, Helmholtz's equation (7.30) reduces to Laplace's equation Δf = 0, and if the potential f also is independent of z, then simpler solutions exist. For now α = 0 = k, and so if Φ″_m = −m² Φ_m, then Equation (7.30) becomes

ρ (d/dρ) (ρ dP_m/dρ) = m² P_m.     (9.87)

The function Φ(φ) may be taken to be Φ(φ) = exp(imφ) or a linear combination of cos(mφ) and sin(mφ). If the whole range of φ from 0 to 2π is physically relevant, then Φ(φ) must be periodic, and so m must be an integer. To solve the equation (9.87) for P_m, we set P_m = ρ^n and get

n² ρ^n = m² ρ^n     (9.88)

which says that n = ±m. The general z-independent solution of Laplace's equation in cylindrical coordinates then is

f(ρ, φ) = ∑_{m=0}^{∞} (c_m ρ^m + d_m ρ^{−m}) e^{imφ}.     (9.89)

9.11 Helmholtz's Equation in Spherical Coordinates

The laplacian Δ separates in spherical coordinates, as we saw in Section 7.3. Thus a function

f(r, θ, φ) = R_{k,ℓ}(r) Θ_{ℓ,m}(θ) Φ_m(φ)     (9.90)

will be a solution of the Helmholtz equation −Δf = k² f if R_{k,ℓ} is a linear combination of the spherical Bessel functions j_ℓ (9.79) and n_ℓ (9.81)

R_{k,ℓ}(r) = a_{k,ℓ} j_ℓ(kr) + b_{k,ℓ} n_ℓ(kr)     (9.91)

if Φ_m = e^{imφ}, and if Θ_{ℓ,m} satisfies the associated Legendre equation

(1/sin θ) (d/dθ) (sin θ dΘ_{ℓ,m}/dθ) + [ℓ(ℓ + 1) − m²/sin²θ] Θ_{ℓ,m} = 0.     (9.92)

9.12 Associated Legendre Polynomials

The associated Legendre functions P_ℓ^m(x) ≡ P_{ℓ,m}(x) are polynomials in sin θ and cos θ. They arise as solutions of the separated θ equation (9.92)

(1/sin θ) (d/dθ) (sin θ dP_{ℓ,m}/dθ) + [ℓ(ℓ + 1) − m²/sin²θ] P_{ℓ,m} = 0     (9.93)

of the laplacian in spherical coordinates. In terms of x = cos θ, this self-adjoint ordinary differential equation is

[(1 − x²) P′_{ℓ,m}(x)]′ + [ℓ(ℓ + 1) − m²/(1 − x²)] P_{ℓ,m}(x) = 0.     (9.94)

The associated Legendre function P_{ℓ,m}(x), for m = 0, 1, 2, ..., is simply related to the mth derivative P_ℓ^(m)(x)

P_{ℓ,m}(x) ≡ (1 − x²)^{m/2} P_ℓ^(m)(x).     (9.95)


To see why this function satisfies the differential equation (9.94), we differentiate

P_ℓ^(m)(x) = (1 − x²)^{−m/2} P_{ℓ,m}(x)     (9.96)

twice, getting

P_ℓ^(m+1)(x) = (1 − x²)^{−m/2} [P′_{ℓ,m} + m x P_{ℓ,m}/(1 − x²)]     (9.97)

and

P_ℓ^(m+2)(x) = (1 − x²)^{−m/2} [P″_{ℓ,m} + 2m x P′_{ℓ,m}/(1 − x²) + m P_{ℓ,m}/(1 − x²) + m(m + 2) x² P_{ℓ,m}/(1 − x²)²].

(9.98)

Next we use Leibniz's rule (5.49) to differentiate Legendre's equation (9.27)

[(1 − x²) P′_ℓ]′ + ℓ(ℓ + 1) P_ℓ = 0     (9.99)

m times, obtaining

(1 − x²) P_ℓ^(m+2) − 2x(m + 1) P_ℓ^(m+1) + (ℓ − m)(ℓ + m + 1) P_ℓ^(m) = 0.     (9.100)

Now we put the formulas for the three derivatives (9.96–9.98) into this equation (9.100) and find that the P_{ℓ,m}(x) as defined (9.95) obey the desired differential equation (9.94). Thus the associated Legendre functions are

P_{ℓ,m}(x) = (1 − x²)^{m/2} P_ℓ^(m)(x) = (1 − x²)^{m/2} (d^m/dx^m) P_ℓ(x).     (9.101)

They are simple polynomials in x = cos θ and sin θ = √(1 − x²)

P_{ℓ,m}(cos θ) = sin^m θ (d^m/d(cos θ)^m) P_ℓ(cos θ).     (9.102)

One can use Rodrigues's formula (9.8) for the Legendre polynomial P_ℓ(x) to write the definition (9.95) of P_{ℓ,m}(x) as

P_{ℓ,m}(x) = ((1 − x²)^{m/2} / (2^ℓ ℓ!)) (d^{ℓ+m}/dx^{ℓ+m}) (x² − 1)^ℓ.     (9.103)

This formula extends the associated Legendre polynomial P_{ℓ,m}(x) to negative values of the integer m and also tells us that under parity P_{ℓ,m}(x) changes by (−1)^{ℓ+m}

P_{ℓ,m}(−x) = (−1)^{ℓ+m} P_{ℓ,m}(x).     (9.104)

Rodrigues's formula (9.103) for the associated Legendre function makes sense as long as ℓ + m ≥ 0. This last condition is the requirement in quantum mechanics that m not be less than −ℓ. And if m exceeds ℓ, then P_{ℓ,m}(x) is given by more than 2ℓ derivatives of a polynomial of degree 2ℓ; so P_{ℓ,m}(x) = 0 if m > ℓ. This last condition is the requirement in quantum mechanics that m not be greater than ℓ. So we have

−ℓ ≤ m ≤ ℓ.     (9.105)

One may show that

P_{ℓ,−m}(x) = (−1)^m ((ℓ − m)!/(ℓ + m)!) P_{ℓ,m}(x).     (9.106)

In fact, since m occurs only as m² in the ordinary differential equation (9.94), P_{ℓ,−m}(x) must be proportional to P_{ℓ,m}(x). Under reflections, the parity of P_{ℓ,m} is (−1)^{ℓ+m}, that is,

P_{ℓ,m}(−x) = (−1)^{ℓ+m} P_{ℓ,m}(x).     (9.107)

If m ≠ 0, then P_{ℓ,m}(x) has a power of √(1 − x²) in it, so

P_{ℓ,m}(±1) = 0     for m ≠ 0.     (9.108)

We may consider either ℓ(ℓ + 1) or m² as the eigenvalue in the ODE (9.94)

[(1 − x²) P′_{ℓ,m}(x)]′ + [ℓ(ℓ + 1) − m²/(1 − x²)] P_{ℓ,m}(x) = 0.     (9.109)

If ℓ(ℓ + 1) is the eigenvalue, then the weight function is unity, and since this ODE is self adjoint on the interval [−1, 1] (at the ends of which p(x) = (1 − x²) = 0), the eigenfunctions P_{ℓ,m}(x) and P_{ℓ′,m}(x) must be orthogonal on that interval when ℓ ≠ ℓ′. The full integral formula is

∫_{−1}^{1} P_{ℓ,m}(x) P_{ℓ′,m}(x) dx = (2/(2ℓ + 1)) ((ℓ + m)!/(ℓ − m)!) δ_{ℓ,ℓ′}.     (9.110)

If m² for fixed ℓ is the eigenvalue, then the weight function is 1/(1 − x²), and the eigenfunctions P_{ℓ,m}(x) and P_{ℓ,m′}(x) must be orthogonal on [−1, 1] when m ≠ m′. The full formula is

∫_{−1}^{1} P_{ℓ,m}(x) P_{ℓ,m′}(x) dx/(1 − x²) = ((ℓ + m)!/(|m| (ℓ − m)!)) δ_{m,m′}.     (9.111)
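The orthogonality relation (9.110) can be verified with scipy's associated Legendre function lpmv(m, ℓ, x); the values of ℓ, ℓ′, and m below are arbitrary test choices:

```python
from math import factorial
from scipy.special import lpmv
from scipy.integrate import quad

# Verify the orthogonality relation (9.110) numerically, using
# scipy's lpmv(m, l, x) = P_{l,m}(x).
def overlap(l1, l2, m):
    return quad(lambda x: lpmv(m, l1, x) * lpmv(m, l2, x), -1, 1)[0]

l, lp, m = 3, 5, 2
norm = 2/(2*l + 1) * factorial(l + m)/factorial(l - m)

print(abs(overlap(l, lp, m)) < 1e-10)        # l != l': integral vanishes
print(abs(overlap(l, l, m) - norm) < 1e-8)   # l = l': (9.110) normalization
```

Since the integrands are polynomials, quad evaluates both integrals essentially exactly.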

9.13 Spherical Harmonics

The spherical harmonic Y_ℓ^m(θ, φ) ≡ Y_{ℓ,m}(θ, φ) is the product

Y_{ℓ,m}(θ, φ) = Θ_{ℓ,m}(θ) Φ_m(φ)     (9.112)

in which Θ_{ℓ,m}(θ) is proportional to the associated Legendre function P_{ℓ,m}

Θ_{ℓ,m}(θ) = (−1)^m √[((2ℓ + 1)/2) ((ℓ − m)!/(ℓ + m)!)] P_{ℓ,m}(cos θ)     (9.113)

and

Φ_m(φ) = e^{imφ}/√(2π).     (9.114)

The big square root in the definition (9.113) ensures that

∫_0^{2π} dφ ∫_0^{π} sin θ dθ Y*_{ℓ,m}(θ, φ) Y_{ℓ′,m′}(θ, φ) = δ_{ℓℓ′} δ_{mm′}.     (9.115)

In spherical coordinates, the parity transformation

x′ = −x     (9.116)

is r′ = r, θ′ = π − θ, and φ′ = φ ± π. So under parity, cos θ′ = −cos θ and exp(imφ′) = (−1)^m exp(imφ). This factor of (−1)^m cancels the m-dependence (9.104) of P_{ℓ,m}(θ) under parity, so that under parity

Y_{ℓ,m}(θ′, φ′) = Y_{ℓ,m}(π − θ, φ ± π) = (−1)^ℓ Y_{ℓ,m}(θ, φ).     (9.117)

Thus the parity of the state |n, ℓ, m⟩ is (−1)^ℓ. The spherical harmonics are complete on the unit sphere. They may be used to expand any smooth function f(θ, φ) as

f(θ, φ) = ∑_{ℓ=0}^{∞} ∑_{m=−ℓ}^{ℓ} a_{ℓm} Y_{ℓ,m}(θ, φ).     (9.118)

The orthonormality relation (9.115) says that the coefficients a_{ℓm} are

a_{ℓm} = ∫_0^{2π} dφ ∫_0^{π} sin θ dθ Y*_{ℓ,m}(θ, φ) f(θ, φ).     (9.119)

Putting the last two equations together, we find

f(θ, φ) = ∫_0^{2π} dφ′ ∫_0^{π} sin θ′ dθ′ [∑_{ℓ=0}^{∞} ∑_{m=−ℓ}^{ℓ} Y*_{ℓ,m}(θ′, φ′) Y_{ℓ,m}(θ, φ)] f(θ′, φ′)     (9.120)

and so we may identify the sum within the brackets as an angular delta function

∑_{ℓ=0}^{∞} ∑_{m=−ℓ}^{ℓ} Y*_{ℓ,m}(θ′, φ′) Y_{ℓ,m}(θ, φ) = (1/sin θ) δ(θ − θ′) δ(φ − φ′)     (9.121)

which sometimes is abbreviated as

∑_{ℓ=0}^{∞} ∑_{m=−ℓ}^{ℓ} Y*_{ℓ,m}(Ω′) Y_{ℓ,m}(Ω) = δ^(2)(Ω − Ω′).     (9.122)

The spherical-harmonic expansion (9.118) of the Legendre polynomial P_ℓ(n̂ · n̂′) of the cosine n̂ · n̂′ in which the polar angles of the unit vectors respectively are θ, φ and θ′, φ′ is the addition theorem (Example 11.21)

P_ℓ(n̂ · n̂′) = (4π/(2ℓ + 1)) ∑_{m=−ℓ}^{ℓ} Y_{ℓ,m}(θ, φ) Y*_{ℓ,m}(θ′, φ′).     (9.123)
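The addition theorem (9.123) can be checked numerically by building the Y_{ℓ,m} from (9.112–9.114) with scipy's lpmv (which includes the Condon–Shortley phase); the angles below are arbitrary test values:

```python
import numpy as np
from math import factorial, pi
from scipy.special import lpmv, eval_legendre

# Y_{l,m} assembled from (9.112)-(9.114); Y_{l,-m} = (-1)^m conj(Y_{l,m}).
def Y(l, m, theta, phi):
    am = abs(m)
    norm = np.sqrt((2*l + 1)/(4*pi) * factorial(l - am)/factorial(l + am))
    y = norm * lpmv(am, l, np.cos(theta)) * np.exp(1j*am*phi)
    return y if m >= 0 else (-1)**am * np.conj(y)

l = 3
th, ph, thp, php = 0.6, 1.1, 1.9, 2.3          # arbitrary angles
cosg = np.cos(th)*np.cos(thp) + np.sin(th)*np.sin(thp)*np.cos(ph - php)

lhs = eval_legendre(l, cosg)                   # P_l(n . n')
rhs = 4*pi/(2*l + 1) * sum(Y(l, m, th, ph) * np.conj(Y(l, m, thp, php))
                           for m in range(-l, l + 1))
print(abs(lhs - rhs) < 1e-10)                  # True
```

The sum over m is real up to rounding, as it must be, since the left side is a real Legendre polynomial.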

9.14 Cosmic Microwave Background Radiation

Instruments on the Wilkinson Microwave Anisotropy Probe (WMAP) and on the Planck satellite in orbit at the Lagrange point L2 (in the Earth's shadow, 1.5 × 10^6 km farther from the Sun) have measured the temperature T(θ, φ) of the cosmic microwave background (CMB) radiation as a function of the polar angles θ and φ in the sky as shown in Fig. 9.3. This radiation is photons last scattered when the visible universe became transparent at an age of 380,000 years and a temperature (3000 K) cool enough for hydrogen atoms to be stable. This initial transparency is usually (and inexplicably) called recombination.

Figure 9.3 CMB temperature fluctuations over the celestial sphere as measured by the Planck satellite. The average temperature is 2.7255 K. White regions are cooler and black ones warmer by about 300 µK. © ESA and the Planck Collaboration, 2018.

Since the spherical harmonics Y_{ℓ,m}(θ, φ) are complete on the sphere, we can expand the temperature as

T(θ, φ) = ∑_{ℓ=0}^{∞} ∑_{m=−ℓ}^{ℓ} a_{ℓ,m} Y_{ℓ,m}(θ, φ)     (9.124)

in which the coefficients are by (9.119)

a_{ℓ,m} = ∫_0^{2π} dφ ∫_0^{π} sin θ dθ Y*_{ℓ,m}(θ, φ) T(θ, φ).     (9.125)

The average temperature T̄ = 2.7255 K contributes only to a_{0,0}. The other coefficients describe the difference ΔT(θ, φ) = T(θ, φ) − T̄. The angular power spectrum is

C_ℓ = (1/(2ℓ + 1)) ∑_{m=−ℓ}^{ℓ} |a_{ℓ,m}|².     (9.126)

If we let the unit vector n̂ point in the direction θ, φ and use the addition theorem (9.123), then we can write the angular power spectrum as

C_ℓ = (1/4π) ∫ d²n̂ ∫ d²n̂′ P_ℓ(n̂ · n̂′) T(n̂) T(n̂′).     (9.127)

Figure 9.4 The power spectrum D_ℓ^{TT} = ℓ(ℓ + 1) C_ℓ/2π of the CMB temperature fluctuations in µK² is plotted against the multipole moment ℓ. The solid curve is the ΛCDM prediction. (Source: Planck Collaboration, arXiv:1807.06205)

In Fig. 9.4, the measured values (arXiv:1807.06205) of the power spectrum D_ℓ = ℓ(ℓ + 1) C_ℓ/2π are plotted against ℓ for 1 < ℓ < 2500 with the angles decreasing with ℓ as θ ∼ 180°/ℓ. The power spectrum is a snapshot at the moment of initial transparency of the temperature distribution of the rapidly expanding plasma of photons, electrons, and nuclei undergoing tiny (2 × 10^{−4})


acoustic oscillations. In these oscillations, gravity opposes radiation pressure, and |ΔT(θ, φ)| is maximal both when the oscillations are most compressed and when they are most rarefied. Regions that gravity has squeezed to maximum compression at transparency form the first and highest peak. Regions that have bounced off their first maximal compression and that radiation pressure has expanded to minimum density at transparency form the second peak. Those at their second maximum compression at transparency form the third peak, and so forth. The solid curve is the prediction of a model with inflation, cold dark matter, ordinary matter, and a cosmological constant Λ. In this model, the age of the visible universe is 13.8 Gyr; the Hubble constant is H_0 = 67.7 km/(s Mpc); the energy density of the universe is enough to make the universe flat; and the fractions of the energy density due to ordinary matter, dark matter, and dark energy are 5%, 26%, and 69% (Edwin Hubble 1889–1953).

We can learn a lot from the data in the CMB Figure 9.4. The radius of the maximum causally connected region at transparency is 380,000 light-years. The radius of the maximum compressed region is smaller by a factor of √3 because the speed of "sound" in the plasma is nearly c/√3. The expansion of the universe since transparency has stretched the wavelength of light and reduced its frequency from that of 3000 K to 2.7255 K, an expansion factor z ≈ 3000/2.7255 ≈ 1100. The diameter of the maximum compressed region is now bigger by 1100, which (too simply) suggests an angle of about

2 × 3.8 × 10^5 × 1100 × 180° / (√3 × 13.8 × 10^9 × π) = 2°.     (9.128)

A more accurate estimate of that angle and of the location of the first peak in Figure 9.4 is about one degree. This result tells us that space is flat. For if the universe were closed (k = 1), then the angle would appear bigger than 1°; and if it were open (k = −1), the angle would appear smaller than 1°. The heights and locations of the peaks in Figure 9.4 also tell us about the density of dark matter and the density of dark energy (Aghanim et al., 2018).

Further Reading

Much is known about Legendre functions. The books A Course of Modern Analysis (Whittaker and Watson, 1927, chap. XV) and Methods of Mathematical Physics (Courant and Hilbert, 1955) are classics. The NIST Digital Library of Mathematical Functions (dlmf.nist.gov) and the companion NIST Handbook of Mathematical Functions (Olver et al., 2010) are outstanding. You can learn more about the CMB in Steven Weinberg's book Cosmology (Weinberg, 2010, chap. 7) and at the website camb.info.


Exercises

9.1 Use conditions (9.6) and (9.7) to find P_0(x) and P_1(x).
9.2 Using the Gram–Schmidt method (Section 1.10) to turn the functions x^n into a set of functions L_n(x) that are orthonormal on the interval [−1, 1] with inner product (9.2), find L_n(x) for n = 0, 1, 2, and 3. Isn't Rodrigues's formula (9.8) easier to use?
9.3 Derive the conditions (9.6–9.7) on the coefficients a_k of the Legendre polynomial P_n(x) = a_0 + a_1 x + ··· + a_n x^n.
9.4 Use equations (9.6–9.7) to find P_3(x) and P_4(x).
9.5 In superscript notation (2.6), Leibniz's rule (5.49) for derivatives of products uv of functions is

(uv)^(n) = ∑_{k=0}^{n} (n choose k) u^(n−k) v^(k).     (9.129)

Use it and Rodrigues's formula (9.8) to derive the explicit formula (9.9).
9.6 The product rule for derivatives in superscript notation (2.6) is

(uv)^(n) = ∑_{k=0}^{n} (n choose k) u^(n−k) v^(k).     (9.130)

Apply it to Rodrigues's formula (9.8) with x² − 1 = (x − 1)(x + 1) and show that the Legendre polynomials satisfy P_n(1) = 1.
9.7 Use Cauchy's integral formula (6.44) and Rodrigues's formula (9.55) to derive Schlaefli's integral formula (9.56).
9.8 Show that the polynomials (9.57) are orthogonal (9.58) as long as they satisfy the endpoint condition (9.59).
9.9 Derive the orthogonality relation (9.2) from Rodrigues's formula (9.8).
9.10 (a) Use the fact that the quantities w = x² − 1 and w_n = w^n vanish at the endpoints ±1 to show by repeated integrations by parts that in superscript notation (2.6)

∫_{−1}^{1} w_n^(n) w_n^(n) dx = −∫_{−1}^{1} w_n^(n−1) w_n^(n+1) dx = (−1)^n ∫_{−1}^{1} w_n w_n^(2n) dx.     (9.131)

(b) Show that the final integral is equal to

I_n = (2n)! ∫_{−1}^{1} (1 − x)^n (1 + x)^n dx.     (9.132)

9.11 (a) Show by integrating by parts that I_n = (n!)² 2^{2n+1}/(2n + 1). (b) Prove (9.13).


9.12 Suppose that P_n(x) and Q_n(x) are two solutions of (9.27). Find an expression for their wronskian, apart from an overall constant.
9.13 Use the method of sections (7.26 and 7.33) and the solution f(r) = r^ℓ to find a second solution of the ODE (9.84).
9.14 For a uniformly charged circle of radius a, find the resulting scalar potential φ(r, θ) for r < a.
9.15 (a) Find the electrostatic potential V(r, θ) outside an uncharged perfectly conducting sphere of radius R in a vertical uniform static electric field that tends to E = E ẑ as r → ∞. (b) Find the potential if the free charge on the sphere is q_f.
9.16 Derive (9.127) from (9.125) and (9.126).
9.17 Find the electrostatic potential V(r, θ) inside a hollow sphere of radius R if the potential on the sphere is V(R, θ) = V_0 cos² θ.
9.18 Find the electrostatic potential V(r, θ) outside a hollow sphere of radius R if the potential on the sphere is V(R, θ) = V_0 cos² θ.

10 Bessel Functions

10.1 Cylindrical Bessel Functions of the First Kind

The cylindrical Bessel functions are defined for any integer n ≥ 0 by the series

J_n(z) = (z^n/(2^n n!)) [1 − z²/(2(2n + 2)) + z⁴/(2·4(2n + 2)(2n + 4)) − ···]
       = (z/2)^n ∑_{m=0}^{∞} ((−1)^m/(m! (m + n)!)) (z/2)^{2m}     (10.1)

(Friedrich Bessel, 1784–1846). The first term of this series tells us that for small |z| ≪ 1

J_n(z) ≈ z^n/(2^n n!).     (10.2)

The alternating signs in the series (10.1) make the waves plotted in Fig. 10.1 and for |z| ≫ 1 give us the approximation (Courant and Hilbert, 1955, chap. VII)

J_n(z) ≈ √(2/(πz)) cos(z − nπ/2 − π/4) + O(|z|^{−3/2}).     (10.3)
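The series (10.1) can be compared directly with scipy's Bessel function jv. A minimal sketch, with 30 terms and an assumed test point:

```python
from math import factorial
from scipy.special import jv

# Compare the defining series (10.1) with scipy's jv; 30 terms are
# plenty for moderate |z|.  n and z are arbitrary test values.
def J_series(n, z, terms=30):
    return (z/2)**n * sum((-1)**m * (z/2)**(2*m)
                          / (factorial(m) * factorial(m + n))
                          for m in range(terms))

n, z = 2, 3.7
print(abs(J_series(n, z) - jv(n, z)) < 1e-12)   # True
```

Because the terms fall off like (z/2)^{2m}/(m!)², the truncation error at 30 terms is far below machine precision here.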

The J_n(z) are entire functions. They obey Bessel's equation (7.355)

d²J_n/dz² + (1/z) dJ_n/dz + (1 − n²/z²) J_n = 0     (10.4)

as one may show (Exercise 10.1) by substituting the series (10.1) into the differential equation (10.4). Their generating function is

exp[(z/2)(u − 1/u)] = ∑_{n=−∞}^{∞} u^n J_n(z)     (10.5)


Figure 10.1 Top: Plots of J0(ρ) (solid curve), J1 (ρ) (dot-dash), and J2 (ρ) (dashed) for real ρ . Bottom: Plots of J3 (ρ) (solid curve), J4 (ρ) (dot-dash), and J5 (ρ)(dashed). The points at which Bessel functions cross the ρ -axis are called zeros or roots; we use them to satisfy boundary conditions. This chapter’s Matlab scripts are in Bessel_functions at github.com/kevinecahill.

from which one may derive (Exercise 10.5) the series expansion (10.1) and (Exercise 10.6) the integral representation (6.54) Jn ( z ) =

1 π

»

0

π

cos(z sin θ − n θ ) d θ

= J−n (−z ) = (−1)n J−n (z)

(10.6)

for all complex z. For n = 0, this integral is (Exercise 10.7) more simply J0 (z ) =

1 2π

»

2π 0

e

i z cos θ



=

1 2π

»2

π

0

ei z sin θ d θ .

(10.7)

These integrals (Exercise 10.8) give $J_n(0) = 0$ for $n \neq 0$, and $J_0(0) = 1$. By differentiating the generating function (10.5) with respect to u and identifying the coefficients of powers of u, one finds the recursion relation
$$
J_{n-1}(z) + J_{n+1}(z) = \frac{2n}{z}\, J_n(z).
\tag{10.8}
$$
Similar reasoning after taking the z derivative gives (Exercise 10.10)
$$
J_{n-1}(z) - J_{n+1}(z) = 2\, J_n'(z).
\tag{10.9}
$$
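The recursion (10.8) is easy to verify numerically from the integral representation (10.6); here is a Python sketch (the trapezoid rule and its 20000-point resolution are assumptions of this illustration, not the text's method):

```python
from math import cos, pi, sin

def J(n, z, N=20000):
    """J_n(z) from the integral representation (10.6), trapezoid rule."""
    h = pi / N
    s = 0.5 * (cos(0.0) + cos(z * sin(pi) - n * pi))
    s += sum(cos(z * sin(k * h) - n * k * h) for k in range(1, N))
    return s * h / pi

z = 3.0
lhs = J(0, z) + J(2, z)      # J_{n-1} + J_{n+1} with n = 1
rhs = (2 / z) * J(1, z)      # (2n/z) J_n
print(lhs, rhs)              # the two sides of (10.8) agree
```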

10.1 Cylindrical Bessel Functions of the First Kind

By using the gamma function (Section 6.16), one may extend Bessel's equation (10.4) and its solutions $J_n(z)$ to nonintegral values of n
$$
J_\nu(z) = \sum_{m=0}^{\infty} \frac{(-1)^m}{m!\,\Gamma(m+\nu+1)} \left(\frac{z}{2}\right)^{2m} \left(\frac{z}{2}\right)^{\nu}.
\tag{10.10}
$$

The self-adjoint form (7.354) of Bessel's equation (10.4) with z = kx is (Exercise 10.11)
$$
-\frac{d}{dx}\left[x\, \frac{d J_n(kx)}{dx}\right] + \frac{n^2}{x}\, J_n(kx) = k^2\, x\, J_n(kx).
\tag{10.11}
$$

In the notation of Equation (7.327), $p(x) = x$, $k^2$ is an eigenvalue, and $\rho(x) = x$ is a weight function. To have a self-adjoint system (Section 7.31) on an interval [0, b], we need the boundary condition (7.287)
$$
0 = \left[\, p\, \big(J_n v' - J_n' v\big)\, \right]_0^b = \left[\, x\, \big(J_n(kx)\, v'(kx) - J_n'(kx)\, v(kx)\big)\, \right]_0^b
\tag{10.12}
$$
for all functions $v(kx)$ in the domain D of the system. The definition (10.1) of $J_n(z)$ implies that $J_0(0) = 1$ and that $J_n(0) = 0$ for integers $n > 0$. Thus since $p(x) = x$, the terms in this boundary condition vanish at x = 0 as long as the domain consists of functions $v(kx)$ that are twice differentiable on the interval [0, b]. To make these terms vanish at x = b, we require that $J_n(kb) = 0$ and that $v(kb) = 0$. Thus kb must be a zero $z_{n,m}$ of $J_n(z)$, and so $J_n(kb) = J_n(z_{n,m}) = 0$. With $k = z_{n,m}/b$, Bessel's equation (10.11) is

$$
-\frac{d}{dx}\left[x\, \frac{d}{dx}\, J_n(z_{n,m}\, x/b)\right] + \frac{n^2}{x}\, J_n(z_{n,m}\, x/b) = \frac{z_{n,m}^2}{b^2}\; x\, J_n(z_{n,m}\, x/b).
\tag{10.13}
$$
For fixed n, the eigenvalue $k^2 = z_{n,m}^2/b^2$ is different for each positive integer m. Moreover as $m \to \infty$, the zeros $z_{n,m}$ of $J_n(x)$ rise like $m\pi$, as one might expect since the leading term of the asymptotic form (10.3) of $J_n(x)$ is proportional to $\cos(x - n\pi/2 - \pi/4)$, which has zeros at $m\pi + (n+1)\pi/2 + \pi/4$. It follows that the eigenvalues $k^2 \approx (m\pi)^2/b^2$ increase without limit as $m \to \infty$, in accordance with the general result of Section 7.37. It follows then from the argument of Section 7.38 and from the orthogonality relation (7.366) that for every fixed n, the eigenfunctions $J_n(z_{n,m}\, x/b)$, one for each zero, are complete in the mean, orthogonal, and normalizable on the interval [0, b] with weight function $\rho(x) = x$

$$
\int_0^b x\, J_n\!\left(\frac{z_{n,m}\, x}{b}\right) J_n\!\left(\frac{z_{n,m'}\, x}{b}\right) dx
= \delta_{m,m'}\, \frac{b^2}{2}\, J_n'^{\,2}(z_{n,m})
= \delta_{m,m'}\, \frac{b^2}{2}\, J_{n+1}^2(z_{n,m})
\tag{10.14}
$$
and a normalization constant (Exercise 10.12) that depends upon the first derivative of the Bessel function or the square of the next Bessel function at the zero.
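A numerical sketch of the orthogonality relation (10.14) for n = 0 and b = 1 in Python; the series truncation, the midpoint quadrature, and the quoted zeros of J0 are standard values and numerical choices assumed only for this illustration:

```python
from math import factorial

def J(n, z, terms=30):
    """J_n(z) from the series (10.1)."""
    return sum((-1)**m / (factorial(m) * factorial(m + n))
               * (z / 2)**(2 * m + n) for m in range(terms))

z1, z2 = 2.404825557695773, 5.520078110286311   # first two zeros of J0

def integral(za, zb, N=2000):
    """Midpoint rule for the weighted inner product (10.14) with b = 1."""
    h = 1.0 / N
    return sum((k + 0.5) * h * J(0, za * (k + 0.5) * h)
               * J(0, zb * (k + 0.5) * h) for k in range(N)) * h

print(integral(z1, z2))                   # ~ 0 : orthogonality
print(integral(z1, z1), J(1, z1)**2 / 2)  # both ~ (b^2/2) J_1^2(z_{0,1})
```

The cross integral vanishes to quadrature accuracy, and the diagonal integral matches $b^2 J_1^2(z_{0,1})/2$.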


Because they are complete, sums of Bessel functions $J_n(z_{n,k}\, x/b)$ can represent Dirac's delta function on the interval [0, b] as in the sum (7.413)
$$
\delta(x - y) = \frac{2\, x^{\alpha}\, y^{1-\alpha}}{b^2} \sum_{k=1}^{\infty} \frac{J_n(z_{n,k}\, x/b)\; J_n(z_{n,k}\, y/b)}{J_{n+1}^2(z_{n,k})}.
\tag{10.15}
$$
For n = 1 and b = 1, this sum is
$$
\delta(x - y) = 2\, x^{\alpha}\, y^{1-\alpha} \sum_{k=1}^{\infty} \frac{J_1(z_{1,k}\, x)\; J_1(z_{1,k}\, y)}{J_2^2(z_{1,k})}.
\tag{10.16}
$$

Figure 10.2 plots the sum of the first 100,000 terms of this series for α = 1/2 and y = 1/2. In the figure the scales of the two axes differ by a factor of 10^8. This series adequately represents δ(x − 1/2) for functions that vary little on a scale of Δx = 0.001. The expansion (10.15) of the delta function expresses the completeness of the eigenfunctions $J_n(z_{n,k}\, x/b)$ for fixed n and lets us write any twice differentiable

Figure 10.2 The sum of the first 100,000 terms of the α = 1/2 Bessel J1 series (10.16) for the Dirac delta function δ(x − 1/2) is plotted from x = 0.499 to x = 0.501. The scales of the axes differ by 10^8.

function f(x) that vanishes at x = b as the Fourier series
$$
f(x) = x^{\alpha} \sum_{k=1}^{\infty} a_k\, J_n(z_{n,k}\, x/b)
\tag{10.17}
$$
where
$$
a_k = \frac{2}{b^2\, J_{n+1}^2(z_{n,k})} \int_0^b y^{1-\alpha}\, J_n(z_{n,k}\, y/b)\, f(y)\, dy.
\tag{10.18}
$$

The orthogonality relation on an infinite interval is
$$
\delta(x - y) = x \int_0^{\infty} k\, J_n(kx)\, J_n(ky)\, dk
\tag{10.19}
$$
for positive values of x and y. One may generalize these relations (10.11–10.19) from integral n to complex ν with Re ν > −1.

Example 10.1 (Bessel's drum) The top of a drum is a circular membrane with a fixed circumference. The membrane's potential energy is approximately proportional to the extra area it has when it is not flat. Let h(x, y) be the displacement of the membrane in the z direction normal to the x–y plane of the flat membrane, and let $h_x$ and $h_y$ denote its partial derivatives (2.7). The extra length of a line segment dx on the stretched membrane is $\sqrt{1 + h_x^2}\, dx$, and so the extra area of an element dx dy is
$$
dA \approx \left[\sqrt{1 + h_x^2}\, \sqrt{1 + h_y^2} - 1\right] dx\, dy \approx \frac{1}{2}\left(h_x^2 + h_y^2\right) dx\, dy.
\tag{10.20}
$$

The (nonrelativistic) kinetic energy of the area element is proportional to the square of its speed. So if σ is the surface tension and µ the surface mass density of the membrane, then to lowest order in the displacement h the action functional (Sections 7.13 and 7.43) is
$$
S[h] = \int \left[\frac{\mu}{2}\, h_t^2 - \frac{\sigma}{2}\left(h_x^2 + h_y^2\right)\right] dx\, dy\, dt.
\tag{10.21}
$$
We minimize this action for h's that vanish on the boundary $x^2 + y^2 = r_d^2$
$$
0 = \delta S[h] = \int \left[\mu\, h_t\, \delta h_t - \sigma \left(h_x\, \delta h_x + h_y\, \delta h_y\right)\right] dx\, dy\, dt.
\tag{10.22}
$$
Since (7.157) $\delta h_t = (\delta h)_t$, $\delta h_x = (\delta h)_x$, and $\delta h_y = (\delta h)_y$, we can integrate by parts and get
$$
0 = \delta S[h] = -\int \left[\mu\, h_{tt} - \sigma \left(h_{xx} + h_{yy}\right)\right] \delta h\; dx\, dy\, dt
\tag{10.23}
$$
apart from a surface term proportional to δh which vanishes because δh = 0 on the circumference of the membrane. The membrane therefore obeys the wave equation
$$
\mu\, h_{tt} = \sigma \left(h_{xx} + h_{yy}\right) \equiv \sigma\, \triangle h.
$$

This equation is separable, and so letting h(x, y, t) = s(t) v(x, y), we have
$$
\frac{s_{tt}}{s} = \frac{\sigma}{\mu}\, \frac{\triangle v}{v} = -\omega^2.
\tag{10.24}
$$
The eigenvalues λ of the Helmholtz equation $-\triangle v = \lambda v$ give the angular frequencies as $\omega = \sqrt{\sigma\lambda/\mu}$. The time dependence then is
$$
s(t) = a \sin\!\left(\sqrt{\sigma\lambda/\mu}\; (t - t_0)\right)
\tag{10.25}
$$
in which a and $t_0$ are constants. In polar coordinates, Helmholtz's equation is separable (7.23–7.26)
$$
-\triangle v = -v_{rr} - r^{-1}\, v_r - r^{-2}\, v_{\theta\theta} = \lambda v.
\tag{10.26}
$$
We set v(r, θ) = u(r) h(θ) and find $-u'' h - r^{-1} u' h - r^{-2} u h'' = \lambda u h$. After multiplying both sides by $r^2/uh$, we get
$$
r^2\, \frac{u''}{u} + r\, \frac{u'}{u} + \lambda r^2 = -\frac{h''}{h} = n^2.
\tag{10.27}
$$
The general solution for h then is $h(\theta) = b \sin(n(\theta - \theta_0))$ in which b and $\theta_0$ are constants and n must be an integer so that h is single valued on the circumference of the membrane. The displacement u(r) must obey the differential equation
$$
-\left(r\, u'\right)' + n^2\, u/r = \lambda\, r\, u
\tag{10.28}
$$

and vanish at $r = r_d$, the rim of the drum. Therefore it is an eigenfunction $u(r) = J_n(z_{n,m}\, r/r_d)$ of the self-adjoint differential equation (10.13)
$$
-\frac{d}{dr}\left[r\, \frac{d}{dr}\, J_n(z_{n,m}\, r/r_d)\right] + \frac{n^2}{r}\, J_n(z_{n,m}\, r/r_d) = \frac{z_{n,m}^2}{r_d^2}\; r\, J_n(z_{n,m}\, r/r_d)
\tag{10.29}
$$
with an eigenvalue $\lambda = z_{n,m}^2/r_d^2$ in which $z_{n,m}$ is the mth zero of the nth Bessel function, $J_n(z_{n,m}) = 0$. For each integer n ≥ 0, there are infinitely many zeros $z_{n,m}$ at which the Bessel function vanishes, and for each zero the frequency is $\omega = (z_{n,m}/r_d)\sqrt{\sigma/\mu}$. The general solution to the wave equation of the membrane $\mu h_{tt} = \sigma (h_{xx} + h_{yy}) \equiv \sigma \triangle h$ thus is
$$
h(r, \theta, t) = \sum_{n=0}^{\infty} \sum_{m=1}^{\infty} c_{n,m}\, \sin\!\left[\frac{z_{n,m}}{r_d} \sqrt{\frac{\sigma}{\mu}}\; (t - t_0)\right] \sin\left[n(\theta - \theta_0)\right] J_n\!\left(z_{n,m}\, \frac{r}{r_d}\right)
\tag{10.30}
$$
in which $t_0$ and $\theta_0$ can depend upon n and m. For each n and big m, the zero $z_{n,m}$ is near $m\pi + (n+1)\pi/2 + \pi/4$.
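Unlike a string, whose overtones are integer multiples of its fundamental, the drum's frequencies are proportional to the zeros $z_{n,m}$ and so are inharmonic. A small Python sketch (the hard-coded zeros are standard values quoted to six decimal places; the unit values of σ, µ, and $r_d$ are assumed purely for illustration):

```python
from math import sqrt

# first zeros of J0 and J1 (standard values)
z01, z02, z11 = 2.404826, 5.520078, 3.831706

sigma, mu, r_d = 1.0, 1.0, 1.0                   # illustrative units (assumed)
omega = lambda z: (z / r_d) * sqrt(sigma / mu)   # frequency of a drum mode

print(omega(z11) / omega(z01))   # ~ 1.5933, not 2 as for a string
print(omega(z02) / omega(z01))   # ~ 2.2954
```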

Helmholtz's equation $-\triangle V = \alpha^2 V$ separates in cylindrical coordinates in 3 dimensions (Section 7.3). Thus the function $V(\rho, \phi, z) = B(\rho)\, \Phi(\phi)\, Z(z)$ satisfies the equation
$$
-\triangle V = -\frac{1}{\rho}\left(\rho\, V_{,\rho}\right)_{,\rho} - \frac{1}{\rho^2}\, V_{,\phi\phi} - V_{,zz} = \alpha^2 V
\tag{10.31}
$$
both if B(ρ) obeys Bessel's equation
$$
\rho\, \frac{d}{d\rho}\left(\rho\, \frac{dB}{d\rho}\right) + \left[\left(\alpha^2 + k^2\right)\rho^2 - n^2\right] B = 0
\tag{10.32}
$$
while Φ and Z satisfy
$$
-\frac{d^2 \Phi}{d\phi^2} = n^2\, \Phi(\phi)
\qquad \text{and} \qquad
\frac{d^2 Z}{dz^2} = k^2\, Z(z),
\tag{10.33}
$$
and if B(ρ) obeys the Bessel equation
$$
\rho\, \frac{d}{d\rho}\left(\rho\, \frac{dB}{d\rho}\right) + \left[\left(\alpha^2 - k^2\right)\rho^2 - n^2\right] B = 0
\tag{10.34}
$$
while Φ and Z satisfy
$$
-\frac{d^2 \Phi}{d\phi^2} = n^2\, \Phi(\phi)
\qquad \text{and} \qquad
\frac{d^2 Z}{dz^2} = -k^2\, Z(z).
\tag{10.35}
$$
In the first case (10.32 and 10.33), the solution is
$$
V_{k,n}(\rho, \phi, z) = J_n\!\left(\sqrt{\alpha^2 + k^2}\; \rho\right) e^{\pm i n \phi}\, e^{\pm k z}.
\tag{10.36}
$$
In the second case (10.34 and 10.35), it is
$$
V_{k,n}(\rho, \phi, z) = J_n\!\left(\sqrt{\alpha^2 - k^2}\; \rho\right) e^{\pm i n \phi}\, e^{\pm i k z}.
\tag{10.37}
$$

In both cases, n must be an integer if the solution is to be single valued on the full range of φ from 0 to 2π. When α = 0, Helmholtz's equation reduces to Laplace's equation $\triangle V = 0$, which is satisfied by the simpler functions
$$
V_{k,n}(\rho, \phi, z) = J_n(k\rho)\, e^{\pm i n \phi}\, e^{\pm k z}
\qquad \text{and} \qquad
V_{k,n}(\rho, \phi, z) = J_n(ik\rho)\, e^{\pm i n \phi}\, e^{\pm i k z}.
\tag{10.38}
$$
The product $i^{-\nu} J_\nu(ik\rho)$ is real and is known as the modified Bessel function
$$
I_\nu(k\rho) \equiv i^{-\nu}\, J_\nu(ik\rho).
\tag{10.39}
$$
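Dropping the alternating sign in the series (10.1), as the definition (10.39) implies, gives a quick Python sketch of $I_n$ (the truncation at 40 terms is an assumed numerical choice):

```python
from math import factorial

def I(n, x, terms=40):
    """Modified Bessel function I_n(x) = i^{-n} J_n(ix): the series (10.1)
    for J_n with the factor (-1)^m removed, so every term is positive."""
    return sum((x / 2)**(2 * m + n) / (factorial(m) * factorial(m + n))
               for m in range(terms))

print(I(0, 1.0))   # ~ 1.266066
print(I(1, 2.0))   # ~ 1.590637
```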

It occurs in various solutions of the diffusion equation $D\, \triangle V = \dot{V}$. The function $V(\rho, \phi, z) = B(\rho)\, \Phi(\phi)\, Z(z)$ satisfies
$$
\triangle V = \frac{1}{\rho}\left(\rho\, V_{,\rho}\right)_{,\rho} + \frac{1}{\rho^2}\, V_{,\phi\phi} + V_{,zz} = \alpha^2 V
\tag{10.40}
$$
both if B(ρ) obeys Bessel's equation
$$
\rho\, \frac{d}{d\rho}\left(\rho\, \frac{dB}{d\rho}\right) - \left[\left(\alpha^2 - k^2\right)\rho^2 + n^2\right] B = 0
\tag{10.41}
$$
while Φ and Z satisfy
$$
-\frac{d^2 \Phi}{d\phi^2} = n^2\, \Phi(\phi)
\qquad \text{and} \qquad
\frac{d^2 Z}{dz^2} = k^2\, Z(z)
\tag{10.42}
$$
and if B(ρ) obeys the Bessel equation
$$
\rho\, \frac{d}{d\rho}\left(\rho\, \frac{dB}{d\rho}\right) - \left[\left(\alpha^2 + k^2\right)\rho^2 + n^2\right] B = 0
\tag{10.43}
$$
while Φ and Z satisfy
$$
-\frac{d^2 \Phi}{d\phi^2} = n^2\, \Phi(\phi)
\qquad \text{and} \qquad
\frac{d^2 Z}{dz^2} = -k^2\, Z(z).
\tag{10.44}
$$
In the first case (10.41 and 10.42), the solution is
$$
V_{k,n}(\rho, \phi, z) = I_n\!\left(\sqrt{\alpha^2 - k^2}\; \rho\right) e^{\pm i n \phi}\, e^{\pm k z}.
\tag{10.45}
$$
In the second case (10.43 and 10.44), it is
$$
V_{k,n}(\rho, \phi, z) = I_n\!\left(\sqrt{\alpha^2 + k^2}\; \rho\right) e^{\pm i n \phi}\, e^{\pm i k z}.
\tag{10.46}
$$

In both cases, n must be an integer if the solution is to be single valued on the full range of φ from 0 to 2π.

Example 10.2 (Charge near a membrane) We will use ρ to denote the density of free charges – those that are free to move in or out of a dielectric medium, as opposed to those that are part of the medium, bound in it by molecular forces. The time-independent Maxwell equations are Gauss's law ∇ · D = ρ for the divergence of the electric displacement D, and the static form ∇ × E = 0 of Faraday's law, which implies that the electric field E is the gradient of an electrostatic potential, E = −∇V. Across an interface between two dielectrics with normal vector n̂, the tangential electric field is continuous, n̂ × E₂ = n̂ × E₁, while the normal component of the electric displacement jumps by the surface density σ of free charge, n̂ · (D₂ − D₁) = σ. In a linear dielectric, the electric displacement D is the electric field multiplied by the permittivity ε of the material, D = εE.

The membrane of a eukaryotic cell is a phospholipid bilayer whose area is some 3 × 10^8 nm², and whose thickness t is about 5 nm. On a scale of nanometers, the membrane is flat. We will take it to be a slab extending to infinity in the x and y directions. If the interface between the lipid bilayer and the extracellular salty water is at z = 0, then the cytosol extends thousands of nm down from z = −t = −5 nm. We will ignore the phosphate head groups and set the permittivity $\varepsilon_\ell$ of the lipid bilayer to twice that of the vacuum, $\varepsilon_\ell \approx 2\varepsilon_0$; the permittivity of the extracellular water and that of the cytosol are $\varepsilon_w \approx \varepsilon_c \approx 80\, \varepsilon_0$.


We will compute the electrostatic potential V due to a charge q at a point (0, 0, h) on the z-axis above the membrane. This potential is cylindrically symmetric about the z-axis, so V = V(ρ, z). The functions $J_n(k\rho)\, e^{in\phi}\, e^{\pm kz}$ form a complete set of solutions (10.38) of Laplace's equation, but due to the symmetry, we only need the n = 0 functions $J_0(k\rho)\, e^{\pm kz}$. Since there are no free charges in the lipid bilayer or in the cytosol, we may express the potential in the lipid bilayer $V_\ell$ and in the cytosol $V_c$ as
$$
V_\ell(\rho, z) = \int_0^{\infty} dk\, J_0(k\rho)\left[m(k)\, e^{kz} + f(k)\, e^{-kz}\right]
\qquad
V_c(\rho, z) = \int_0^{\infty} dk\, J_0(k\rho)\, d(k)\, e^{kz}.
\tag{10.47}
$$

The Green's function (4.113) for Poisson's equation, $-\triangle G(\mathbf{x}) = \delta^{(3)}(\mathbf{x})$, in cylindrical coordinates is (6.152)
$$
G(\mathbf{x}) = \frac{1}{4\pi |\mathbf{x}|} = \frac{1}{4\pi \sqrt{\rho^2 + z^2}} = \int_0^{\infty} \frac{dk}{4\pi}\, J_0(k\rho)\, e^{-k|z|}.
\tag{10.48}
$$

Thus we may expand the potential in the salty water as
$$
V_w(\rho, z) = \int_0^{\infty} dk\, J_0(k\rho) \left[\frac{q}{4\pi \varepsilon_w}\, e^{-k|z-h|} + u(k)\, e^{-kz}\right].
\tag{10.49}
$$
Using $\hat{n} \times E_2 = \hat{n} \times E_1$ and $\hat{n} \cdot (D_2 - D_1) = \sigma$, suppressing k, and setting $\beta \equiv q\, e^{-kh}/4\pi \varepsilon_w$ and $y = e^{2kt}$, we get four equations
$$
m + f - u = \beta,
\qquad
\varepsilon_\ell\, m - \varepsilon_\ell\, f + \varepsilon_w\, u = \varepsilon_w\, \beta,
\tag{10.50}
$$
$$
\varepsilon_\ell\, m - \varepsilon_\ell\, y\, f - \varepsilon_c\, d = 0,
\qquad \text{and} \qquad
m + y\, f - d = 0.
$$
In terms of the abbreviations $\varepsilon_{w\ell} = (\varepsilon_w + \varepsilon_\ell)/2$ and $\varepsilon_{c\ell} = (\varepsilon_c + \varepsilon_\ell)/2$ as well as $p = (\varepsilon_w - \varepsilon_\ell)/(\varepsilon_w + \varepsilon_\ell)$ and $p' = (\varepsilon_c - \varepsilon_\ell)/(\varepsilon_c + \varepsilon_\ell)$, the solutions are
$$
u(k) = \beta\, \frac{p - p'/y}{1 - p p'/y},
\qquad
m(k) = \beta\, \frac{\varepsilon_w}{\varepsilon_{w\ell}}\, \frac{1}{1 - p p'/y},
\tag{10.51}
$$
$$
f(k) = -\beta\, \frac{\varepsilon_w}{\varepsilon_{w\ell}}\, \frac{p'/y}{1 - p p'/y},
\qquad \text{and} \qquad
d(k) = \beta\, \frac{\varepsilon_w\, \varepsilon_\ell}{\varepsilon_{w\ell}\, \varepsilon_{c\ell}}\, \frac{1}{1 - p p'/y}.
$$

Inserting these solutions into the Bessel expansions (10.47) for the potentials, expanding their denominators
$$
\frac{1}{1 - p p'/y} = \sum_{n=0}^{\infty} \left(p\, p'\right)^n e^{-2nkt}
\tag{10.52}
$$
and using the integral (10.48), we find that the potential $V_w$ in the extracellular water of a charge q at (0, 0, h) in the water is

$$
V_w(\rho, z) = \frac{q}{4\pi \varepsilon_w} \left[\frac{1}{r} + \frac{p}{\sqrt{\rho^2 + (z+h)^2}} + \sum_{n=1}^{\infty} \frac{p'\left(1 - p^2\right)\left(p\, p'\right)^{n-1}}{\sqrt{\rho^2 + (z + 2nt + h)^2}}\right]
\tag{10.53}
$$
in which $r = \sqrt{\rho^2 + (z - h)^2}$ is the distance to the charge q. The principal image charge pq is at (0, 0, −h). Similarly, the potential $V_\ell$ in the lipid bilayer is
$$
V_\ell(\rho, z) = \frac{q}{4\pi \varepsilon_{w\ell}} \sum_{n=0}^{\infty} \left[\frac{\left(p\, p'\right)^n}{\sqrt{\rho^2 + (z - 2nt - h)^2}} - \frac{p^n\, p'^{\,n+1}}{\sqrt{\rho^2 + (z + 2(n+1)t + h)^2}}\right]
\tag{10.54}
$$
and that in the cytosol is
$$
V_c(\rho, z) = \frac{q\, \varepsilon_\ell}{4\pi\, \varepsilon_{w\ell}\, \varepsilon_{c\ell}} \sum_{n=0}^{\infty} \frac{\left(p\, p'\right)^n}{\sqrt{\rho^2 + (z - 2nt - h)^2}}.
\tag{10.55}
$$

These potentials are the same as those of Example 5.19, but this derivation is much simpler and less error prone than the method of images. Since $p = (\varepsilon_w - \varepsilon_\ell)/(\varepsilon_w + \varepsilon_\ell) > 0$, the principal image charge pq at (0, 0, −h) has the same sign as the charge q and so contributes to the energy a positive term proportional to $pq^2$. So a lipid slab repels a nearby charge in water no matter what the sign of the charge.

A cell membrane is a phospholipid bilayer. The lipids avoid water and form a 4 nm-thick layer that lies between two 0.5 nm layers of phosphate groups, which are electric dipoles. These electric dipoles cause the cell membrane to weakly attract ions that are within 0.5 nm of the membrane.

Example 10.3 (Cylindrical waveguides) An electromagnetic wave traveling in the z-direction down a cylindrical waveguide looks like
$$
E\, e^{in\phi}\, e^{i(kz - \omega t)} \qquad \text{and} \qquad B\, e^{in\phi}\, e^{i(kz - \omega t)}
\tag{10.56}
$$
in which E and B depend upon ρ
$$
E = E_\rho\, \hat{\rho} + E_\phi\, \hat{\phi} + E_z\, \hat{z}
\qquad \text{and} \qquad
B = B_\rho\, \hat{\rho} + B_\phi\, \hat{\phi} + B_z\, \hat{z}
\tag{10.57}
$$
in cylindrical coordinates. If the waveguide is an evacuated, perfectly conducting cylinder of radius r, then on the surface of the waveguide the parallel components of E and the normal component of B must vanish, which leads to the boundary conditions
$$
E_z(r) = 0, \qquad E_\phi(r) = 0, \qquad \text{and} \qquad B_\rho(r) = 0.
\tag{10.58}
$$

In a notation (2.7) in which subscripts denote differentiation, the vacuum forms $\nabla \times E = -\dot{B}$ and $\nabla \times B = \dot{E}/c^2$ of the Faraday and Maxwell–Ampère laws give us (Exercise 10.15) the field equations
$$
\begin{aligned}
\frac{in\, E_z}{\rho} - ik\, E_\phi &= i\omega B_\rho,
&\qquad
\frac{in\, B_z}{\rho} - ik\, B_\phi &= -\frac{i\omega E_\rho}{c^2}, \\
ik\, E_\rho - E_{z,\rho} &= i\omega B_\phi,
&\qquad
ik\, B_\rho - B_{z,\rho} &= -\frac{i\omega E_\phi}{c^2}, \\
\frac{1}{\rho}\left[\left(\rho E_\phi\right)_{,\rho} - in\, E_\rho\right] &= i\omega B_z,
&\qquad
\frac{1}{\rho}\left[\left(\rho B_\phi\right)_{,\rho} - in\, B_\rho\right] &= -\frac{i\omega E_z}{c^2}.
\end{aligned}
\tag{10.59}
$$

Solving them for the ρ and φ components of E and B in terms of their z components (Exercise 10.16), we find
$$
\begin{aligned}
E_\rho &= \frac{-ik\, E_{z,\rho} + n\omega\, B_z/\rho}{k^2 - \omega^2/c^2},
&\qquad
E_\phi &= \frac{nk\, E_z/\rho + i\omega\, B_{z,\rho}}{k^2 - \omega^2/c^2}, \\
B_\rho &= \frac{-ik\, B_{z,\rho} - n\omega\, E_z/(c^2\rho)}{k^2 - \omega^2/c^2},
&\qquad
B_\phi &= \frac{nk\, B_z/\rho - i\omega\, E_{z,\rho}/c^2}{k^2 - \omega^2/c^2}.
\end{aligned}
\tag{10.60}
$$
The fields $E_z$ and $B_z$ obey the wave equations (12.43, Exercise 7.6)
$$
-\triangle E_z = -\ddot{E}_z/c^2 = \omega^2 E_z/c^2
\qquad \text{and} \qquad
-\triangle B_z = -\ddot{B}_z/c^2 = \omega^2 B_z/c^2.
\tag{10.61}
$$

Because their z-dependence (10.56) is periodic, they are (Exercise 10.17) linear combinations of $J_n\big(\sqrt{\omega^2/c^2 - k^2}\; \rho\big)\, e^{in\phi}\, e^{i(kz - \omega t)}$. Modes with $B_z = 0$ are transverse magnetic or TM modes. For them the boundary conditions (10.58) will be satisfied if $\sqrt{\omega^2/c^2 - k^2}\; r$ is a zero $z_{n,m}$ of $J_n$. So the frequency $\omega_{n,m}(k)$ of the n, m TM mode is
$$
\omega_{n,m}(k) = c\, \sqrt{k^2 + z_{n,m}^2/r^2}.
\tag{10.62}
$$
Since the first zero of $J_0$ is $z_{0,1} \approx 2.4048$, the minimum frequency $\omega_{0,1}(0) = c\, z_{0,1}/r \approx 2.4048\, c/r$ occurs for n = 0 and k = 0. If the radius of the waveguide is r = 1 cm, then $\omega_{0,1}(0)/2\pi$ is about 11 GHz, which is a microwave frequency with a wavelength of 2.6 cm. In terms of the frequencies (10.62), the field of a pulse moving in the +z-direction is
$$
E_z(\rho, \phi, z, t) = \sum_{n=0}^{\infty} \sum_{m=1}^{\infty} \int_0^{\infty} c_{n,m}(k)\, J_n\!\left(\frac{z_{n,m}\, \rho}{r}\right) e^{in\phi} \exp\!\left[i\left(kz - \omega_{n,m}(k)\, t\right)\right] dk.
\tag{10.63}
$$
Modes with $E_z = 0$ are transverse electric or TE modes. Their boundary conditions (10.58) are satisfied (Exercise 10.19) when $\sqrt{\omega^2/c^2 - k^2}\; r$ is a zero $z'_{n,m}$ of $J_n'$. Their frequencies are $\omega_{n,m}(k) = c\sqrt{k^2 + z_{n,m}'^{\,2}/r^2}$. Since the first zero of a first derivative of a Bessel function is $z'_{1,1} \approx 1.8412$, the minimum frequency $\omega_{1,1}(0) = c\, z'_{1,1}/r \approx 1.8412\, c/r$ occurs for n = 1 and k = 0. If the radius of the waveguide is r = 1 cm, then $\omega_{1,1}(0)/2\pi$ is about 8.8 GHz, which is a microwave frequency with a wavelength of 3.4 cm.

Example 10.4 (Cylindrical cavity) The modes of an evacuated, perfectly conducting cylindrical cavity of radius r and height h are like those of a cylindrical waveguide (Example 10.3) but with extra boundary conditions
$$
\begin{aligned}
B_z(\rho, \phi, 0, t) &= E_\rho(\rho, \phi, 0, t) = E_\phi(\rho, \phi, 0, t) = 0 \\
B_z(\rho, \phi, h, t) &= E_\rho(\rho, \phi, h, t) = E_\phi(\rho, \phi, h, t) = 0
\end{aligned}
\tag{10.64}
$$
at the two ends of the cylinder. If $\ell$ is an integer and if $\sqrt{\omega^2/c^2 - \pi^2 \ell^2/h^2}\; r$ is a zero $z'_{n,m}$ of $J_n'$, then the TE fields $E_z = 0$ and
$$
B_z = J_n(z'_{n,m}\, \rho/r)\; e^{in\phi}\, \sin(\pi \ell z/h)\; e^{-i\omega t}
\tag{10.65}
$$
satisfy both these boundary conditions (10.64) at z = 0 and h and those (10.58) at ρ = r, as well as the separable wave equations (10.61). The frequencies of the resonant TE modes then are $\omega_{n,m,\ell} = c\sqrt{z_{n,m}'^{\,2}/r^2 + \pi^2 \ell^2/h^2}$. The TM modes are $B_z = 0$ and $E_z = J_n(z_{n,m}\, \rho/r)\, e^{in\phi} \sin(\pi \ell z/h)\, e^{-i\omega t}$ with resonant frequencies $\omega_{n,m,\ell} = c\sqrt{z_{n,m}^2/r^2 + \pi^2 \ell^2/h^2}$.

10.2 Spherical Bessel Functions of the First Kind

The spherical Bessel function $j_\ell(x)$ is proportional to the cylindrical Bessel function $J_{\ell+1/2}(x)$ divided by the square root $\sqrt{x}$
$$
j_\ell(x) \equiv \sqrt{\frac{\pi}{2x}}\; J_{\ell+1/2}(x).
\tag{10.66}
$$
By setting $n = \ell + 1/2$ and $J_{\ell+1/2}(x) = \sqrt{2x/\pi}\; j_\ell(x)$ in Bessel's equation (10.4), one may derive (Exercise 10.22) the equation
$$
x^2\, j_\ell''(x) + 2x\, j_\ell'(x) + \left[x^2 - \ell(\ell+1)\right] j_\ell(x) = 0
\tag{10.67}
$$
for the spherical Bessel function $j_\ell$. We saw in Example 7.7 that by setting $V(r, \theta, \phi) = R_{k,\ell}(r)\, \Theta_{\ell,m}(\theta)\, \Phi_m(\phi)$ we could separate the variables of Helmholtz's equation $-\triangle V = k^2 V$ in spherical coordinates
$$
\frac{r^2\, \triangle V}{V} = \frac{\left(r^2 R_{k,\ell}'\right)'}{R_{k,\ell}} + \frac{\left(\sin\theta\; \Theta_{\ell,m}'\right)'}{\sin\theta\; \Theta_{\ell,m}} + \frac{\Phi_m''}{\sin^2\theta\; \Phi_m} = -k^2 r^2.
\tag{10.68}
$$
Thus if $\Phi_m(\phi) = e^{im\phi}$ so that $\Phi_m'' = -m^2 \Phi_m$, and if $\Theta_{\ell,m}$ satisfies the associated Legendre equation (9.93)
$$
\sin\theta\, \left(\sin\theta\; \Theta_{\ell,m}'\right)' + \left[\ell(\ell+1)\, \sin^2\theta - m^2\right] \Theta_{\ell,m} = 0,
\tag{10.69}
$$
then the product $V(r, \theta, \phi) = R_k(r)\, \Theta_{\ell,m}(\theta)\, \Phi_m(\phi)$ will obey (10.68) because by Bessel's equation (10.67) the radial function $R_k(r) = j_\ell(kr)$ satisfies
$$
\left(r^2 R_k'\right)' + \left[k^2 r^2 - \ell(\ell+1)\right] R_k = 0.
\tag{10.70}
$$
In terms of the spherical harmonic $Y_{\ell,m}(\theta, \phi) = \Theta_{\ell,m}(\theta)\, \Phi_m(\phi)$, the solution is $V(r, \theta, \phi) = j_\ell(kr)\, Y_{\ell,m}(\theta, \phi)$.


Rayleigh's formula gives the spherical Bessel function $j_\ell(x)$ as the $\ell$th derivative of $\sin x/x$
$$
j_\ell(x) = (-1)^\ell\, x^\ell \left(\frac{1}{x}\frac{d}{dx}\right)^{\!\ell} \left(\frac{\sin x}{x}\right).
\tag{10.71}
$$
In particular, $j_0(x) = \sin x/x$ and $j_1(x) = \sin x/x^2 - \cos x/x$ (Lord Rayleigh (John William Strutt), 1842–1919). His formula (10.71) leads (Exercise 10.23) to the recursion relation
$$
j_{\ell+1}(x) = \frac{\ell}{x}\, j_\ell(x) - j_\ell'(x)
\tag{10.72}
$$
with which one can show (Exercise 10.24) that the spherical Bessel functions $j_\ell(x)$ defined by (10.71) satisfy the differential equation (10.70) with x = kr. The series expansion (10.1) for $J_n$ and the definition (10.66) of $j_\ell$ give us for small $|\rho| \ll 1$ the approximation
$$
j_\ell(\rho) \approx \frac{\ell!\, (2\rho)^\ell}{(2\ell+1)!} = \frac{\rho^\ell}{(2\ell+1)!!}.
\tag{10.73}
$$
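A quick Python check of the recursion (10.72) and the small-ρ limit (10.73), using the closed forms of j0, j1, and j2 (the finite-difference step h is an assumed numerical choice):

```python
from math import cos, sin

j0 = lambda x: sin(x) / x
j1 = lambda x: sin(x) / x**2 - cos(x) / x
j2 = lambda x: (3 / x**2 - 1) * sin(x) / x - 3 * cos(x) / x**2

# recursion (10.72) with l = 1:  j2 = (1/x) j1 - j1'
x, h = 2.0, 1e-5
j1p = (j1(x + h) - j1(x - h)) / (2 * h)      # central difference for j1'
print(j1(x) / x - j1p, j2(x))                # the two agree

# small-rho limit (10.73): j2(rho) ~ rho^2 / 15
print(j2(0.01), 0.01**2 / 15)
```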

To see how $j_\ell(\rho)$ behaves for large $|\rho| \gg 1$, we use Rayleigh's formula (10.71) to compute
$$
j_1(\rho) = -\frac{d}{d\rho}\left(\frac{\sin\rho}{\rho}\right) = -\frac{\cos\rho}{\rho} + \frac{\sin\rho}{\rho^2}
\tag{10.74}
$$
and notice that the derivative $d/d\rho$ adds a factor of $1/\rho$ when it acts on $1/\rho$ but not when it acts on $\sin\rho$. Thus the dominant term is the one in which all the derivatives act on the sine, and so for large $|\rho| \gg 1$, we have approximately
$$
j_\ell(\rho) = (-1)^\ell\, \rho^\ell \left(\frac{1}{\rho}\frac{d}{d\rho}\right)^{\!\ell} \left(\frac{\sin\rho}{\rho}\right)
\approx \frac{(-1)^\ell}{\rho}\, \frac{d^\ell \sin\rho}{d\rho^\ell}
= \frac{\sin(\rho - \ell\pi/2)}{\rho}
\tag{10.75}
$$
with an error that falls off as $1/\rho^2$. The quality of the approximation, which is exact for $\ell = 0$, is illustrated for $\ell = 1$ and 2 in Fig. 10.3. The spherical Bessel functions $j_\ell(kr)$ satisfy the self-adjoint Sturm–Liouville (7.377) equation (10.70)

$$
-\left(r^2\, j_\ell'\right)' + \ell(\ell+1)\, j_\ell = k^2 r^2\, j_\ell
\tag{10.76}
$$
with eigenvalue $k^2$ and weight function $\rho = r^2$. If $j_\ell(z_{\ell,n}) = 0$, then the functions $j_\ell(kr) = j_\ell(z_{\ell,n}\, r/a)$ vanish at r = a and form an orthogonal basis
$$
\int_0^a j_\ell(z_{\ell,n}\, r/a)\; j_\ell(z_{\ell,m}\, r/a)\; r^2\, dr = \frac{a^3}{2}\, j_{\ell+1}^2(z_{\ell,n})\; \delta_{n,m}
\tag{10.77}
$$

Figure 10.3 Top: Plot of j1(ρ) (solid curve) and its approximations ρ/3 for small ρ (10.73, dashes) and sin(ρ − π/2)/ρ for big ρ (10.75, dot-dash). Bottom: Plot of j2(ρ) (solid curve) and its approximations ρ²/15 for small ρ (10.73, dashed) and sin(ρ − π)/ρ for big ρ (10.75, dot-dash). The values of ρ at which j_ℓ(ρ) = 0 are the zeros or roots of j_ℓ; we use them to fit boundary conditions.

for a self-adjoint system on the interval [0, a]. Moreover, since as $n \to \infty$ the eigenvalues $k_{\ell,n}^2 = z_{\ell,n}^2/a^2 \approx \left[(n + \ell/2)\pi\right]^2/a^2 \to \infty$, the eigenfunctions $j_\ell(z_{\ell,n}\, r/a)$ also are complete in the mean (Section 7.38) as shown by the expansion of the delta function
$$
\delta(r - r') = \frac{2\, r^{\alpha}\, r'^{\,2-\alpha}}{a^3} \sum_{n=1}^{\infty} \frac{j_\ell(z_{\ell,n}\, r/a)\; j_\ell(z_{\ell,n}\, r'/a)}{j_{\ell+1}^2(z_{\ell,n})}
\tag{10.78}
$$
for $0 \le \alpha \le 2$. This formula lets us expand any twice-differentiable function f(r) that vanishes at r = a as
$$
f(r) = r^{\alpha} \sum_{n=1}^{\infty} a_n\, j_\ell(z_{\ell,n}\, r/a)
\tag{10.79}
$$
where
$$
a_n = \frac{2}{a^3\, j_{\ell+1}^2(z_{\ell,n})} \int_0^a r'^{\,2-\alpha}\; j_\ell(z_{\ell,n}\, r'/a)\; f(r')\, dr'.
\tag{10.80}
$$

On an infinite interval, the delta-function formulas are, for $k, k' > 0$,
$$
\delta(k - k') = \frac{2 k k'}{\pi} \int_0^{\infty} j_\ell(kr)\; j_\ell(k'r)\; r^2\, dr
\tag{10.81}
$$
and, for $r, r' > 0$,
$$
\delta(r - r') = \frac{2 r r'}{\pi} \int_0^{\infty} j_\ell(kr)\; j_\ell(kr')\; k^2\, dk.
\tag{10.82}
$$

One may iterate the trick
$$
\frac{1}{z}\frac{d}{dz} \int_{-1}^{1} e^{izx}\, dx
= \frac{i}{z} \int_{-1}^{1} x\, e^{izx}\, dx
= \frac{i}{2z} \int_{-1}^{1} e^{izx}\, d(x^2 - 1)
= -\frac{i}{2z} \int_{-1}^{1} (x^2 - 1)\, d e^{izx}
= \frac{1}{2} \int_{-1}^{1} (x^2 - 1)\, e^{izx}\, dx
\tag{10.83}
$$
to show (Exercise 10.25) that (Schwinger et al., 1998, p. 227)
$$
\left(\frac{1}{z}\frac{d}{dz}\right)^{\!\ell} \int_{-1}^{1} e^{izx}\, dx
= \int_{-1}^{1} \frac{(x^2 - 1)^\ell}{2^\ell\, \ell!}\; e^{izx}\, dx.
\tag{10.84}
$$
Using this relation and the integral
$$
j_0(z) = \frac{\sin z}{z} = \frac{1}{2} \int_{-1}^{1} e^{izx}\, dx,
\tag{10.85}
$$

we can write Rayleigh's formula (10.71) for the spherical Bessel function as
$$
j_\ell(z) = (-z)^\ell \left(\frac{1}{z}\frac{d}{dz}\right)^{\!\ell} \left(\frac{\sin z}{z}\right)
= (-z)^\ell \left(\frac{1}{z}\frac{d}{dz}\right)^{\!\ell} \frac{1}{2} \int_{-1}^{1} e^{izx}\, dx
= \int_{-1}^{1} \frac{(1 - x^2)^\ell}{2^{\ell+1}\, \ell!}\; (-i)^\ell\, \frac{d^\ell}{dx^\ell}\, e^{izx}\, dx.
\tag{10.86}
$$
Now integrating $\ell$ times by parts and using Rodrigues's formula (9.8) for the Legendre polynomial $P_\ell(x)$, we get
$$
j_\ell(z) = \frac{(-i)^\ell}{2} \int_{-1}^{1} e^{izx}\; \frac{1}{2^\ell\, \ell!}\, \frac{d^\ell}{dx^\ell}\left(x^2 - 1\right)^\ell dx
= \frac{(-i)^\ell}{2} \int_{-1}^{1} P_\ell(x)\; e^{izx}\, dx.
\tag{10.87}
$$
This formula with z = kr and x = cos θ
$$
i^\ell\, j_\ell(kr) = \frac{1}{2} \int_{-1}^{1} P_\ell(\cos\theta)\; e^{ikr\cos\theta}\, d\cos\theta
\tag{10.88}
$$


turns the Fourier–Legendre expansion (9.31) for $e^{ikr\cos\theta}$ into
$$
e^{ikr\cos\theta} = \sum_{\ell=0}^{\infty} \frac{2\ell+1}{2}\; P_\ell(\cos\theta) \int_{-1}^{1} P_\ell(\cos\theta')\; e^{ikr\cos\theta'}\, d\cos\theta'
= \sum_{\ell=0}^{\infty} (2\ell+1)\; P_\ell(\cos\theta)\; i^\ell\, j_\ell(kr).
\tag{10.89}
$$
If θ, φ and θ', φ' are the polar angles of the vectors r and k, then by using the addition theorem (9.123) we get the plane-wave expansion
$$
e^{i\mathbf{k}\cdot\mathbf{r}} = 4\pi \sum_{\ell=0}^{\infty} \sum_{m=-\ell}^{\ell} i^\ell\, j_\ell(kr)\; Y_{\ell,m}(\theta, \phi)\; Y_{\ell,m}^{*}(\theta', \phi').
\tag{10.90}
$$
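The expansion (10.89) — and hence (10.90), after the addition theorem sums the spherical harmonics back into $P_\ell$ — can be verified numerically. A Python sketch using upward recurrences for $j_\ell$ and $P_\ell$; the cutoff at ℓ = 15 is an assumption that suffices at kr = 5:

```python
from cmath import exp
from math import cos, sin

def sph_j(lmax, x):
    """j_0 .. j_lmax by the upward recurrence j_{l+1} = (2l+1)/x j_l - j_{l-1}."""
    j = [sin(x) / x, sin(x) / x**2 - cos(x) / x]
    for l in range(1, lmax):
        j.append((2 * l + 1) / x * j[l] - j[l - 1])
    return j

def legendre(lmax, u):
    """P_0 .. P_lmax from (l+1) P_{l+1} = (2l+1) u P_l - l P_{l-1}."""
    P = [1.0, u]
    for l in range(1, lmax):
        P.append(((2 * l + 1) * u * P[l] - l * P[l - 1]) / (l + 1))
    return P

kr, u = 5.0, 0.3          # u = cos(theta)
lmax = 15
j, P = sph_j(lmax, kr), legendre(lmax, u)
series = sum((2 * l + 1) * (1j)**l * j[l] * P[l] for l in range(lmax + 1))
print(series, exp(1j * kr * u))   # the two sides of (10.89) agree
```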

Example 10.5 (Partial waves) Spherical Bessel functions occur in the wave functions of free particles with well-defined angular momentum. The hamiltonian $H_0 = p^2/2m$ for a free particle of mass m and the square $L^2$ of the orbital angular-momentum operator are both invariant under rotations; thus they commute with the orbital angular-momentum operator L. Since the operators $H_0$, $L^2$, and $L_z$ commute with each other, simultaneous eigenstates $|k, \ell, m\rangle$ of these compatible operators (Section 1.31) exist
$$
H_0\, |k, \ell, m\rangle = \frac{p^2}{2m}\, |k, \ell, m\rangle = \frac{(\hbar k)^2}{2m}\, |k, \ell, m\rangle
\tag{10.91}
$$
$$
L^2\, |k, \ell, m\rangle = \hbar^2\, \ell(\ell+1)\, |k, \ell, m\rangle
\qquad \text{and} \qquad
L_z\, |k, \ell, m\rangle = \hbar\, m\, |k, \ell, m\rangle.
$$
By (10.67–10.70), their wave functions are products of spherical Bessel functions and spherical harmonics (9.112)
$$
\langle \mathbf{r} | k, \ell, m \rangle = \langle r, \theta, \phi | k, \ell, m \rangle = \sqrt{\frac{2}{\pi}}\; k\, j_\ell(kr)\; Y_{\ell,m}(\theta, \phi).
\tag{10.92}
$$
They satisfy the normalization condition
$$
\langle k, \ell, m | k', \ell', m' \rangle = \frac{2 k k'}{\pi} \int_0^{\infty} j_\ell(kr)\, j_\ell(k'r)\, r^2\, dr \int Y_{\ell,m}^{*}(\theta, \phi)\, Y_{\ell',m'}(\theta, \phi)\, d\Omega
= \delta(k - k')\; \delta_{\ell,\ell'}\; \delta_{m,m'}
\tag{10.93}
$$
and the completeness relation
$$
1 = \int_0^{\infty} dk \sum_{\ell=0}^{\infty} \sum_{m=-\ell}^{\ell} |k, \ell, m\rangle \langle k, \ell, m|.
\tag{10.94}
$$


Their inner products with an eigenstate $|\vec{k}\,\rangle$ of a free particle of momentum $\vec{p} = \hbar \vec{k}$ are
$$
\langle k, \ell, m | \vec{k}\, \rangle = \frac{i^\ell}{k}\; \delta(k - k')\; Y_{\ell,m}^{*}(\theta', \phi')
\tag{10.95}
$$
in which the polar coordinates of $\vec{k}$ are θ', φ'. Using the resolution (10.94) of the identity operator and the inner-product formulas (10.92 and 10.95), we recover the expansion (10.90)
$$
\frac{e^{i\vec{k}\cdot\vec{r}}}{(2\pi)^{3/2}} = \langle \mathbf{r} | \vec{k}\, \rangle
= \int_0^{\infty} dk \sum_{\ell=0}^{\infty} \sum_{m=-\ell}^{\ell} \langle \mathbf{r} | k, \ell, m \rangle \langle k, \ell, m | \vec{k}\, \rangle
= \sum_{\ell=0}^{\infty} \sum_{m=-\ell}^{\ell} \sqrt{\frac{2}{\pi}}\; i^\ell\, j_\ell(kr)\; Y_{\ell,m}(\theta, \phi)\; Y_{\ell,m}^{*}(\theta', \phi').
\tag{10.96}
$$

π

The small kr approximation (10.73), the definition (10.92), and the normalization (9.115) of the spherical harmonics tell us that the probability that a particle with angular momentum ±µ about the origin has r = | r | ± 1/ k is P (r ) =

2k2 π

»r 0

jµ (kr ´ )r ´2dr ´ ≈ 2

2

[

+ 1)!!]2

π (2µ

»r 0

(kr )

2µ+2

dr

2µ+3

) = (π4µ[(+2µ6+)(kr 3)!!]2 k

(10.97)

which is very small for big $\ell$ and tiny k. So a short-range potential can only affect partial waves of low angular momentum. When physicists found that nuclei scattered low-energy hadrons into s-waves, they knew that the range of the nuclear force was short, about $10^{-15}$ m. If the potential V(r) that scatters a particle is of short range, then at big r the radial wave function $u_\ell(r)$ of the scattered wave should look like that of a free particle (10.96), which by the big kr approximation (10.75) is
$$
u_\ell^{(0)}(r) = j_\ell(kr) \approx \frac{\sin(kr - \ell\pi/2)}{kr} = \frac{1}{2ikr}\left[e^{i(kr - \ell\pi/2)} - e^{-i(kr - \ell\pi/2)}\right].
\tag{10.98}
$$
Thus at big r the radial wave function $u_\ell(r)$ differs from $u_\ell^{(0)}(r)$ only by a phase shift $\delta_\ell$
$$
u_\ell(r) \approx \frac{\sin(kr - \ell\pi/2 + \delta_\ell)}{kr} = \frac{1}{2ikr}\left[e^{i(kr - \ell\pi/2 + \delta_\ell)} - e^{-i(kr - \ell\pi/2 + \delta_\ell)}\right].
\tag{10.99}
$$
The phase shifts determine the cross-section σ to be (Cohen-Tannoudji et al., 1977, chap. VIII)
$$
\sigma = \frac{4\pi}{k^2} \sum_{\ell=0}^{\infty} (2\ell+1)\, \sin^2 \delta_\ell.
\tag{10.100}
$$
If the potential V(r) is negligible for $r > r_0$, then for momenta $k \ll 1/r_0$ the cross-section is $\sigma \approx 4\pi \sin^2\delta_0 / k^2$.

Example 10.6 (Quantum dots) The active region of some quantum dots is a CdSe sphere whose radius a is less than 2 nm. Photons from a laser excite electron–hole pairs which fluoresce in nanoseconds. I will model a quantum dot simply as an electron trapped in a sphere of radius a. Its wave function ψ(r, θ, φ) satisfies Schrödinger's equation
$$
-\frac{\hbar^2}{2m}\, \triangle \psi + V \psi = E \psi
\tag{10.101}
$$
with the boundary condition ψ(a, θ, φ) = 0 and the potential V constant and negative for r < a and infinitely positive for r > a. With $k^2 = 2m(E - V)/\hbar^2 = z_{\ell,n}^2/a^2$, the unnormalized eigenfunctions are
$$
\psi_{n,\ell,m}(r, \theta, \phi) = j_\ell(z_{\ell,n}\, r/a)\; Y_{\ell,m}(\theta, \phi)\; \theta(a - r)
\tag{10.102}
$$
in which the Heaviside function θ(a − r) makes ψ vanish for r > a, and ℓ and m are integers with $-\ell \le m \le \ell$ because ψ must be single valued for all angles θ and φ. The zeros $z_{\ell,n}$ of $j_\ell(x)$ fix the energy levels as $E_{n,\ell,m} = (\hbar z_{\ell,n}/a)^2/2m + V$. For $j_0(x) = \sin x/x$, they are $z_{0,n} = n\pi$. So $E_{n,0,0} = (\hbar n\pi/a)^2/2m + V$. If the coupling to a photon is via a term like $p \cdot A$, then one expects $\Delta\ell = 1$. The energy gap from the n, ℓ = 1 state to the n = 1, ℓ = 0 ground state is
$$
\Delta E_n = E_{n,1,0} - E_{1,0,0} = \left(z_{1,n}^2 - \pi^2\right) \frac{\hbar^2}{2 m a^2}.
\tag{10.103}
$$
Inserting factors of $c^2$ and using $\hbar c = 197$ eV nm and $mc^2 = 0.511$ MeV, we find from the zero $z_{1,2} = 7.72525$ that $\Delta E_2 = 1.89\, (\mathrm{nm}/a)^2$ eV, which is red light if a = 1 nm. The next zero $z_{1,3} = 10.90412$ gives $\Delta E_3 = 4.14\, (\mathrm{nm}/a)^2$ eV, which is in the visible if 1.2 < a < 1.5 nm. The Mathematica command Do[Print[N[BesselJZero[1.5, k]]], {k, 1, 5, 1}] gives the first five zeros of $j_1(x)$ to six significant figures.
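The zeros of $j_1$ and the gaps (10.103) can also be found with a few lines of Python; the bisection brackets were read off a plot of $j_1$ and are assumptions of this sketch:

```python
from math import cos, pi, sin

j1 = lambda x: sin(x) / x**2 - cos(x) / x

def bisect(f, a, b, it=80):
    """Simple bisection; f(a) and f(b) must have opposite signs."""
    fa = f(a)
    for _ in range(it):
        m = 0.5 * (a + b)
        if fa * f(m) <= 0:
            b = m
        else:
            a, fa = m, f(m)
    return 0.5 * (a + b)

z12 = bisect(j1, 7.0, 8.0)      # second zero of j_1
z13 = bisect(j1, 10.0, 11.0)    # third zero of j_1

hbarc, mc2 = 197.0, 0.511e6     # eV nm and eV, as in the text
dE = lambda z: (z**2 - pi**2) * hbarc**2 / (2 * mc2)   # eV (nm/a)^2

print(z12, dE(z12))   # ~ 7.72525, ~ 1.89 eV (nm/a)^2
print(z13, dE(z13))   # ~ 10.90412, ~ 4.14 eV (nm/a)^2
```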

10.3 Bessel Functions of the Second Kind

In Section 8.6, we derived integral representations (8.47 and 8.48) for the Hankel functions $H_\lambda^{(1)}(z)$ and $H_\lambda^{(2)}(z)$ for Re z > 0. One may analytically continue them (Courant and Hilbert, 1955, chap. VII) to the upper half z-plane
$$
H_\lambda^{(1)}(z) = \frac{e^{-i\lambda\pi/2}}{\pi i} \int_{-\infty}^{\infty} e^{iz\cosh x - \lambda x}\, dx,
\qquad \mathrm{Im}\, z \ge 0,
\tag{10.104}
$$
and to the lower half z-plane
$$
H_\lambda^{(2)}(z) = -\frac{e^{i\lambda\pi/2}}{\pi i} \int_{-\infty}^{\infty} e^{-iz\cosh x - \lambda x}\, dx,
\qquad \mathrm{Im}\, z \le 0.
\tag{10.105}
$$
When both z = ρ and λ = ν are real, the two Hankel functions are complex conjugates of each other
$$
H_\nu^{(1)}(\rho) = H_\nu^{(2)*}(\rho).
\tag{10.106}
$$


Hankel functions, called Bessel functions of the third kind, are linear combinations of Bessel functions of the first kind $J_\lambda(z)$ and of the second kind $Y_\lambda(z)$
$$
H_\lambda^{(1)}(z) = J_\lambda(z) + i\, Y_\lambda(z)
\qquad
H_\lambda^{(2)}(z) = J_\lambda(z) - i\, Y_\lambda(z).
\tag{10.107}
$$
Bessel functions of the second kind also are called Neumann functions; the symbols $Y_\lambda(z) = N_\lambda(z)$ refer to the same function. They are infinite at z = 0, as illustrated in Fig. 10.4. When z = ix is imaginary, we get the modified Bessel functions
$$
I_\alpha(x) = i^{-\alpha}\, J_\alpha(ix) = \sum_{m=0}^{\infty} \frac{1}{m!\, \Gamma(m+\alpha+1)} \left(\frac{x}{2}\right)^{2m+\alpha}
\qquad
K_\alpha(x) = \frac{\pi}{2}\, i^{\alpha+1}\, H_\alpha^{(1)}(ix) = \int_0^{\infty} e^{-x\cosh t}\; \cosh \alpha t\; dt.
\tag{10.108}
$$

Figure 10.4 Top: Y0(ρ) (solid curve), Y1(ρ) (dot-dash), and Y2(ρ) (dashed) for 0 < ρ < 12. Bottom: Y3(ρ) (solid curve), Y4(ρ) (dot-dash), and Y5(ρ) (dashed) for 2 < ρ < 14. Bessel functions cross the ρ-axis at zeros or roots.

Some simple cases are
$$
I_{-1/2}(z) = \sqrt{\frac{2}{\pi z}}\, \cosh z,
\qquad
I_{1/2}(z) = \sqrt{\frac{2}{\pi z}}\, \sinh z,
\qquad \text{and} \qquad
K_{1/2}(z) = \sqrt{\frac{\pi}{2z}}\; e^{-z}.
\tag{10.109}
$$
When do we need to use these functions? If we are representing functions that are finite at the origin ρ = 0, then we don't need them. But if the point ρ = 0 lies outside the region of interest, or if the function we are representing is infinite at that point, then we do need the $Y_\nu(\rho)$'s.

Example 10.7 (Coaxial waveguides) An ideal coaxial waveguide is perfectly conducting for ρ < r_0 and ρ > r, and the waves occupy the region r_0 < ρ < r. Since points with ρ = 0 are not in the physical domain of the problem, the electric field E(ρ, φ) exp(i(kz − ωt)) is a linear combination of Bessel functions of the first and second kinds with

E_z(ρ, φ) = a J_n(√(ω²/c² − k²) ρ) + b Y_n(√(ω²/c² − k²) ρ)   (10.110)

in the notation of Example 10.3. A similar equation represents the magnetic field B_z. The fields E and B obey the equations and boundary conditions of Example 10.3 as well as

E_z(r_0, φ) = 0,   E_φ(r_0, φ) = 0,   and   B_ρ(r_0, φ) = 0   (10.111)

at ρ = r_0. In TM modes with B_z = 0, one may show (Exercise 10.28) that the boundary conditions E_z(r_0, φ) = 0 and E_z(r, φ) = 0 can be satisfied if

J_n(x) Y_n(vx) − J_n(vx) Y_n(x) = 0   (10.112)

in which v = r/r_0 and x = √(ω²/c² − k²) r_0. The Matlab code JYequation.m in Bessel_functions at github.com/kevinecahill shows that for n = 0 and v = 10, the first three solutions are x_{0,1} = 0.3314, x_{0,2} = 0.6858, and x_{0,3} = 1.0377. Setting n = 1 and adjusting the guesses in the code, one finds x_{1,1} = 0.3941, x_{1,2} = 0.7331, and x_{1,3} = 1.0748. The corresponding dispersion relations are ω_{n,i}(k) = c √(k² + x_{n,i}²/r_0²).
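The roots x_{n,i} of (10.112) are easy to reproduce. The sketch below is a Python stand-in for the book's Matlab code JYequation.m (our own function names); it assumes the standard logarithmic series for Y_0, which is not derived in this section, and bisects the n = 0, v = 10 equation to recover x_{0,1} ≈ 0.3314:

```python
import math

def j0(x):
    # power series (10.1) for J0
    term, s = 1.0, 1.0
    for k in range(1, 60):
        term *= -(x * x / 4) / (k * k)
        s += term
    return s

def y0(x):
    # standard logarithmic series for Y0 (an assumption here, not derived in this section)
    g = 0.5772156649015329          # Euler's constant
    s, H, term = 0.0, 0.0, 1.0
    for k in range(1, 60):
        H += 1.0 / k
        term *= (x * x / 4) / (k * k)
        s += (-1) ** (k + 1) * H * term
    return (2 / math.pi) * ((math.log(x / 2) + g) * j0(x) + s)

def f(x, v=10.0):                   # left-hand side of (10.112) for n = 0
    return j0(x) * y0(v * x) - j0(v * x) * y0(x)

a, b = 0.3, 0.4                     # bracket around the first root
assert f(a) * f(b) < 0
for _ in range(60):                 # bisection
    m = 0.5 * (a + b)
    if f(a) * f(m) <= 0:
        b = m
    else:
        a = m
assert abs(0.5 * (a + b) - 0.3314) < 5e-4   # x_{0,1} of Example 10.7
```

Changing the bracket reproduces the higher roots x_{0,2} and x_{0,3} in the same way.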

10.4 Spherical Bessel Functions of the Second Kind

Spherical Bessel functions of the second kind are defined as

y_µ(ρ) = √(π/(2ρ)) Y_{µ+1/2}(ρ)   (10.113)

Figure 10.5 Top: Plot of y_1(ρ) (solid curve) and its approximations −1/ρ² for small ρ (10.116, dot-dash) and −cos(ρ − π/2)/ρ for big ρ (10.115, dashed). Bottom: Plot of y_2(ρ) (solid curve) and its approximations −3/ρ³ for small ρ (10.116, dot-dash) and −cos(ρ − π)/ρ for big ρ (10.115, dashed). The values of ρ at which y_µ(ρ) = 0 are the zeros or roots of y_µ; we use them to fit boundary conditions. All six plots run from ρ = 1 to ρ = 12.

and Rayleigh formulas express them as

y_µ(ρ) = (−1)^{µ+1} ρ^µ ( (1/ρ) d/dρ )^µ ( cos ρ / ρ ).   (10.114)

The term in which all the derivatives act on the cosine dominates at big ρ

y_µ(ρ) ≈ (−1)^{µ+1} (1/ρ) d^µ cos ρ / dρ^µ = − cos(ρ − µπ/2)/ρ.   (10.115)

The second kind of spherical Bessel functions at small ρ are approximately

y_µ(ρ) ≈ − (2µ − 1)!!/ρ^{µ+1}.   (10.116)

They all are infinite at ρ = 0 as illustrated in Fig. 10.5.
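One can verify the limits (10.115) and (10.116) with the closed forms of y_1 and y_2 that follow from Rayleigh's formula (10.114). A short Python check (our own helper names):

```python
import math

def y1(r):
    # closed form from Rayleigh's formula (10.114) with mu = 1
    return -math.cos(r) / r**2 - math.sin(r) / r

def y2(r):
    # closed form with mu = 2
    return (1 / r - 3 / r**3) * math.cos(r) - 3 * math.sin(r) / r**2

# small-rho limit (10.116): y_mu ~ -(2 mu - 1)!!/rho^(mu+1)
r = 0.01
assert abs(y1(r) / (-1 / r**2) - 1) < 1e-2
assert abs(y2(r) / (-3 / r**3) - 1) < 1e-2

# big-rho limit (10.115): y_mu ~ -cos(rho - mu*pi/2)/rho
r = 200.0
assert abs(y1(r) - (-math.cos(r - math.pi / 2) / r)) < 1e-3
assert abs(y2(r) - (-math.cos(r - math.pi) / r)) < 1e-3
```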

Example 10.8 (Scattering off a hard sphere) In the notation of Example 10.5, the potential of a hard sphere of radius r0 is V (r ) = ∞ θ (r0 − r ) in which θ (x ) = (x + |x |)/2|x | is Heaviside’s function. Since the point r = 0 is not in the physical region,


the scattered wave function is a linear combination of spherical Bessel functions of the first and second kinds

u_µ(r) = c_µ j_µ(kr) + d_µ y_µ(kr).   (10.117)

The boundary condition u_µ(r_0) = 0 fixes the ratio v_µ = d_µ/c_µ of the constants c_µ and d_µ. Thus for µ = 0, Rayleigh’s formulas (10.71 and 10.114) and the boundary condition say that kr_0 u_0(r_0) = c_0 sin(kr_0) − d_0 cos(kr_0) = 0 or d_0/c_0 = tan kr_0. The s-wave then is u_0(kr) = c_0 sin(kr − kr_0)/(kr cos kr_0), which tells us that the phase shift is δ_0(k) = −kr_0. By (10.100), the cross-section at low energy is σ ≈ 4πr_0², or four times the classical value. Similarly, one finds (Exercise 10.29) that the tangent of the p-wave phase shift is

tan δ_1(k) = (kr_0 cos kr_0 − sin kr_0)/(cos kr_0 + kr_0 sin kr_0).   (10.118)

For kr_0 ≪ 1, we have δ_1(k) ≈ −(kr_0)³/3; more generally, for a potential of range r_0 at low energy k ≪ 1/r_0, the µth phase shift is δ_µ(k) ≈ −(kr_0)^{2µ+1}/[(2µ + 1)[(2µ − 1)!!]²].
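A quick numerical check of these phase shifts. This Python sketch uses our own function names, and the s-wave cross-section formula σ = (4π/k²) sin²δ_0 is the standard partial-wave result, assumed here rather than taken from this section:

```python
import math

def tan_delta0(x):          # s-wave: delta_0 = -k r0, so tan(delta_0) = -tan(k r0)
    return -math.tan(x)

def tan_delta1(x):          # p-wave phase shift (10.118), with x = k r0
    return (x * math.cos(x) - math.sin(x)) / (math.cos(x) + x * math.sin(x))

x = 0.01                    # low energy, k r0 << 1
assert abs(tan_delta1(x) - (-(x ** 3) / 3)) < 1e-9   # delta_1 ~ -(k r0)^3/3

# low-energy cross-section: sigma ~ 4 pi r0^2, four times the classical value
r0, k = 1.0, 1e-3
sigma = 4 * math.pi / k**2 * math.sin(k * r0) ** 2   # s-wave dominates
assert abs(sigma / (4 * math.pi * r0**2) - 1) < 1e-5
```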

Further Reading

A great deal is known about Bessel functions. Students may find Mathematical Methods for Physics and Engineering (Riley et al., 2006) as well as the classics A Treatise on the Theory of Bessel Functions (Watson, 1995), A Course of Modern Analysis (Whittaker and Watson, 1927, chap. XVII), and Methods of Mathematical Physics (Courant and Hilbert, 1955) of special interest. The NIST Digital Library of Mathematical Functions (dlmf.nist.gov) and the companion NIST Handbook of Mathematical Functions (Olver et al., 2010) are superb.

Exercises

10.1 Show that the series (10.1) for J_n(ρ) satisfies Bessel’s equation (10.4).
10.2 Show that the generating function exp(z(u − 1/u)/2) for Bessel functions is invariant under the substitution u → −1/u.
10.3 Use the invariance of exp(z(u − 1/u)/2) under u → −1/u to show that J_{−n}(z) = (−1)^n J_n(z).
10.4 By writing the generating function (10.5) as the product of the exponentials exp(zu/2) and exp(−z/2u), derive the expansion

exp[(z/2)(u − u^{−1})] = Σ_{n=0}^{∞} Σ_{m=−n}^{∞} ( (z/2)^{m+n} u^{m+n} / (m + n)! ) ( (−1)^n (z/2)^n u^{−n} / n! ).   (10.119)


10.5 From this expansion (10.119) of the generating function (10.5), derive the power-series expansion (10.1) for J_n(z).
10.6 In the formula (10.5) for the generating function exp(z(u − 1/u)/2), replace u by exp(iθ) and then derive the integral representation (10.6) for J_n(z). Start with the interval [−π, π].
10.7 From the general integral representation (10.6) for J_n(z), derive the two integral formulas (10.7) for J_0(z).
10.8 Show that the integral representations (10.6 and 10.7) imply that for any integer n ≠ 0, J_n(0) = 0, while J_0(0) = 1.
10.9 By differentiating the generating function (10.5) with respect to u and identifying the coefficients of powers of u, derive the recursion relation

J_{n−1}(z) + J_{n+1}(z) = (2n/z) J_n(z).   (10.120)

10.10 By differentiating the generating function (10.5) with respect to z and identifying the coefficients of powers of u, derive the recursion relation

J_{n−1}(z) − J_{n+1}(z) = 2 J_n′(z).   (10.121)

10.11 Change variables to z = ax and turn Bessel’s equation (10.4) into the self-adjoint form (10.11).
10.12 If y = J_n(ax), then Equation (10.11) is (xy′)′ + (xa² − n²/x) y = 0. Multiply this equation by xy′, integrate from 0 to b, and so show that if ab = z_{n,m} and J_n(z_{n,m}) = 0, then

∫_0^b x J_n²(ax) dx = (b²/2) J_n′²(z_{n,m}),   (10.122)

which is the normalization condition (10.14).
10.13 Use the expansion (10.15) of the delta function to find formulas for the coefficients a_k in the expansion for 0 < α < 1

f(x) = Σ_{k=1}^{∞} a_k x^α J_n(z_{n,k} x/b)   (10.123)

of twice-differentiable functions that vanish at x = b.
10.14 Show that with λ ≡ z²/r_d², the change of variables ρ = zr/r_d and u(r) = J_n(ρ) turns −(ru′)′ + n²u/r = λru into Bessel’s equation (10.29).
10.15 Use the formula (2.46) for the curl in cylindrical coordinates and the vacuum forms ∇ × E = −Ḃ and ∇ × B = Ė/c² of the laws of Faraday and Maxwell–Ampère to derive the field equations (10.59).


10.16 Derive Equations (10.60) from (10.59).
10.17 Show that J_n(√(ω²/c² − k²) ρ) e^{inφ} e^{i(kz−ωt)} is a traveling-wave solution (10.56) of the wave equations (10.61).
10.18 Find expressions for the nonzero TM fields in terms of the formula (10.63) for E_z.
10.19 Show that the TE field E_z = 0 and B_z = J_n(√(ω²/c² − k²) ρ) e^{inφ} e^{i(kz−ωt)} will satisfy the boundary conditions (10.58) if √(ω²/c² − k²) r is a zero z′_{n,m} of J_n′.
10.20 Show that if µ is an integer and if √(ω²/c² − π²µ²/h²) r is a zero z′_{n,m} of J_n′, then the fields E_z = 0 and B_z = J_n(z′_{n,m} ρ/r) e^{inφ} sin(µπz/h) e^{−iωt} satisfy both the boundary conditions (10.58) at ρ = r and those (10.64) at z = 0 and h as well as the wave equations (10.61). Hint: Use Maxwell’s equations ∇ × E = −Ḃ and ∇ × B = Ė/c² as in (10.59).
10.21 Show that the resonant frequencies of the TM modes of the cavity of Example 10.4 are ω_{n,m,µ} = c √(z_{n,m}²/r² + π²µ²/h²).

10.22 By setting n = µ + 1/2 and j_µ = √(π/2x) J_{µ+1/2}, show that Bessel’s equation (10.4) implies that the spherical Bessel function j_µ satisfies (10.67).
10.23 Show that Rayleigh’s formula (10.71) implies the recursion relation (10.72).
10.24 Use the recursion relation (10.72) to show by induction that the spherical Bessel functions j_µ(x) as given by Rayleigh’s formula (10.71) satisfy their differential equation (10.70), which with x = kr is

−x² j_µ″ − 2x j_µ′ + µ(µ + 1) j_µ = x² j_µ.   (10.124)

Hint: start by showing that j_0(x) = sin(x)/x satisfies this equation. This problem involves some tedium.
10.25 Iterate the trick (10.83) and derive the identity (10.84).
10.26 Use the expansions (10.89 and 10.90) to show that the inner product of the ket |r⟩ that represents a particle at r with polar angles θ and φ and the one |k⟩ that represents a particle with momentum p = ħk with polar angles θ′ and φ′ is, with k · r = kr cos θ,

⟨r|k⟩ = (2π)^{−3/2} e^{i k·r} = (2π)^{−3/2} e^{ikr cos θ} = (2π)^{−3/2} Σ_{µ=0}^{∞} (2µ + 1) P_µ(cos θ) i^µ j_µ(kr)
      = √(2/π) Σ_{µ=0}^{∞} Σ_{m=−µ}^{µ} i^µ j_µ(kr) Y_{µ,m}(θ, φ) Y_{µ,m}^{∗}(θ′, φ′).   (10.125)


10.27 Show that (−1)^µ d^µ sin ρ/dρ^µ = sin(ρ − πµ/2) and so complete the derivation of the approximation (10.75) for j_µ(ρ) for big ρ.
10.28 In the context of Examples 10.3 and 10.7, show that the boundary conditions E_z(r_0, φ) = 0 and E_z(r, φ) = 0 imply (10.112).
10.29 Show that for scattering off a hard sphere of radius r_0 as in Example 10.8, the p-wave phase shift is given by (10.118).

11 Group Theory

11.1 What Is a Group?

A group G is a set of elements f, g, h, . . . , and an operation called multiplication such that for all elements f, g, and h in the group G:

1. the product f g is in the group G (closure);
2. f (gh) = ( f g)h (associativity);
3. there is an identity element e in the group G such that ge = eg = g; and
4. every g in G has an inverse g^{−1} in G such that gg^{−1} = g^{−1}g = e.

Physical transformations naturally form groups. The elements of a group might be all physical transformations on a given set of objects that leave invariant a chosen property of the set of objects. For instance, the objects might be the points (x, y) in a plane. The chosen property could be their distances √(x² + y²) from the origin. The physical transformations that leave unchanged these distances are the rotations about the origin

\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.   (11.1)

These rotations form the special orthogonal group in 2 dimensions, SO(2). More generally, suppose the transformations T, T′, T″, . . . change a set of objects in ways that leave invariant a chosen property of the objects. Suppose the product T′T of the transformations T and T′ represents the action of T followed by the action of T′ on the objects. Since both T and T′ leave the chosen property unchanged, so will their product T′T. Thus the closure condition is satisfied. The triple products T″(T′T) and (T″T′)T both represent the action of T followed by the action of T′ followed by the action of T″. Thus the action of T″(T′T) is the same as the action of (T″T′)T, and so the transformations are associative. The identity element e is the null transformation, the one that does nothing. The inverse T^{−1} is


the transformation that reverses the action of T. Thus physical transformations that leave a chosen property unchanged naturally form a group.

Example 11.1 (Permutations) A permutation of an ordered set of n objects changes the order but leaves the set unchanged.

Example 11.2 (Groups of coordinate transformations) The set of all transformations that leave invariant the distance from the origin of every point in n-dimensional space is the group O(n) of rotations and reflections. The rotations in n-space form the special orthogonal group SO(n). The transformations x′ = x + a, for different n-dimensional vectors a, leave invariant the spatial difference x − y between every pair of points x and y in n-dimensional space. They form the group of translations. Here, group multiplication is vector addition. The set of all linear transformations that leave invariant the square of the Minkowski distance x₁² + x₂² + x₃² − x₀² between any 4-vector x and the origin is the Lorentz group (Hermann Minkowski 1864–1909, Hendrik Lorentz 1853–1928). The set of all linear transformations that leave invariant the square of the Minkowski distance (x₁ − y₁)² + (x₂ − y₂)² + (x₃ − y₃)² − (x₀ − y₀)² between any two 4-vectors x and y is the Poincaré group, which includes Lorentz transformations and translations (Henri Poincaré 1854–1912).
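The group axioms are easy to verify numerically for SO(2). This Python sketch (with our own helper functions) checks invariance of x² + y², closure, commutativity, the identity, and inverses for the rotations (11.1):

```python
import math

def rot(t):
    """2x2 rotation matrix of (11.1)."""
    return [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

a, b = 0.7, 1.9
x, y = 3.0, -4.0

# rotations preserve the distance from the origin
xp = rot(a)[0][0] * x + rot(a)[0][1] * y
yp = rot(a)[1][0] * x + rot(a)[1][1] * y
assert abs((xp**2 + yp**2) - (x**2 + y**2)) < 1e-12

# closure and commutativity: R(a) R(b) = R(a + b) = R(b) R(a)
AB = mul(rot(a), rot(b))
BA = mul(rot(b), rot(a))
for i in range(2):
    for j in range(2):
        assert abs(AB[i][j] - rot(a + b)[i][j]) < 1e-12
        assert abs(AB[i][j] - BA[i][j]) < 1e-12

# identity and inverses: R(-a) R(a) = R(0) = I
I2 = [[1.0, 0.0], [0.0, 1.0]]
invA = mul(rot(-a), rot(a))
for i in range(2):
    for j in range(2):
        assert abs(invA[i][j] - I2[i][j]) < 1e-12
```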

In the group of translations, the order of multiplication (which is vector addition) does not matter. A group whose elements all commute

[g, h] ≡ gh − hg = 0   (11.2)

is said to be abelian. Except for the group of translations and the group SO(2), the order of the physical transformations in these examples does matter: the transformation T′T is not in general the same as TT′. Such groups are nonabelian (Niels Abel 1802–1829).

Matrices naturally form groups with group multiplication defined as matrix multiplication. Since matrix multiplication is associative, any set of n × n nonsingular matrices D that includes the inverse D^{−1} of every matrix in the set as well as the identity matrix I automatically satisfies three of the four properties that characterize a group, leaving only the closure property uncertain. Such a set {D} of matrices will form a group as long as the product of any two matrices is in the set. As with physical transformations, one way to ensure closure is to have every matrix leave something unchanged.

Example 11.3 (Orthogonal groups) The group of all real matrices that leave unchanged the squared distance x₁² + · · · + xₙ² of a point x = (x₁, . . . , xₙ) from the origin is the group O(n) of all n × n orthogonal (1.37) matrices (Exercises 11.1 and 11.2).


The group O(2n) leaves unchanged the anticommutation relations (Section 11.19) of the real and imaginary parts of n complex fermionic operators ψ₁, . . . , ψₙ. The n × n orthogonal matrices of unit determinant form the special orthogonal group SO(n). The group SO(3) describes rotations in 3-space.

Example 11.4 (Unitary groups) The set of all n × n complex matrices that leave invariant the quadratic form z₁*z₁ + z₂*z₂ + · · · + zₙ*zₙ forms the unitary group U(n) of all n × n unitary (1.36) matrices (Exercises 11.3 and 11.4). Those of unit determinant form the special unitary group SU(n) (Exercise 11.5). Like SO(3), the group SU(2) represents rotations. The group SU(3) is the symmetry group of the strong interactions, quantum chromodynamics. Physicists have used the groups SU(5) and SO(10) to unify the electroweak and strong interactions; whether nature also does so is unclear.

Example 11.5 (Symplectic groups) The set of all 2n × 2n real matrices R that leave invariant the commutation relations [qᵢ, pₖ] = iħδᵢₖ, [qᵢ, qₖ] = 0, and [pᵢ, pₖ] = 0 of quantum mechanics is the symplectic group Sp(2n, R).

The number of elements in a group is the order of the group. A finite group is a group with a finite number of elements, or equivalently a group of finite order.

Example 11.6 (Z₂, Zₙ, and Z) The parity group whose elements are 1 and −1 under ordinary multiplication is the cyclic group Z₂. It is abelian and of order 2. The cyclic group Zₙ for any positive integer n is made of the phases exp(i2kπ/n) for k = 1, 2, . . . , n. It is abelian and of order n. The integers Z form a group with multiplication defined as addition.
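A tiny numerical check of the cyclic group Zₙ (a Python sketch with our own helper names) confirms closure and inverses for the phases exp(i2kπ/n):

```python
import cmath, math

n = 6
Zn = [cmath.exp(2j * math.pi * k / n) for k in range(1, n + 1)]   # includes 1 at k = n

def in_group(z):
    return any(abs(z - w) < 1e-9 for w in Zn)

# closure: every product of two elements is again an element
assert all(in_group(a * b) for a in Zn for b in Zn)
# every element has an inverse in the group (its complex conjugate)
assert all(in_group(a.conjugate()) for a in Zn)
# the order of the group is n
assert len(Zn) == 6
```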

A group whose elements g = g({α}) depend continuously upon a set of parameters αₖ is a continuous group or a Lie group. Continuous groups are of infinite order. A group G of matrices D is compact if the (squared) norm as given by the trace

Tr(D†D) ≤ M   (11.3)

is bounded for all the D ∈ G.

Example 11.7 (SO(n), O(n), SU(n), and U(n)) The groups SO(n), O(n), SU(n), and U(n) are continuous Lie groups of infinite order. Since for any matrix D in one of these groups

Tr(D†D) = Tr I = n ≤ M,   (11.4)

these groups also are compact.


Example 11.8 (Noncompact groups) The set of all real n × n matrices forms the general linear group GL(n, R); those of unit determinant form the special linear group SL(n, R). The corresponding groups of matrices with complex entries are GL(n, C) and SL(n, C). These four groups and the symplectic groups Sp(2n, R) and Sp(2n, C) have matrix elements that are unbounded; they are noncompact. They are continuous Lie groups of infinite order like the orthogonal and unitary groups. The group SL(2, C) represents Lorentz transformations.

Incidentally, a semigroup is a set of elements G = { f, g, h, . . . } and an operation called multiplication that is closed, f g ∈ G, and associative, f (gh) = ( f g)h, but that may lack an identity element and inverses.

11.2 Representations of Groups

If one can associate with every element g of a group G a square matrix D(g) and have matrix multiplication imitate group multiplication

D( f ) D(g) = D( f g)   (11.5)

for all elements f and g of the group G, then the set of matrices D(g) is said to form a representation of the group G. If the matrices of the representation are n × n, then n is the dimension of the representation. The dimension of a representation also is the dimension of the vector space on which the matrices act. If the matrices D(g) are unitary, D†(g) = D^{−1}(g), then they form a unitary representation of the group.

Example 11.9 (Representations of the groups SU(2) and SO(3)) The defining representations of SU(2) and SO(3) are the 2 × 2 complex matrix

D(θ) = \begin{pmatrix} \cos\frac{\theta}{2} + i\hat\theta_3 \sin\frac{\theta}{2} & i(\hat\theta_1 - i\hat\theta_2)\sin\frac{\theta}{2} \\ i(\hat\theta_1 + i\hat\theta_2)\sin\frac{\theta}{2} & \cos\frac{\theta}{2} - i\hat\theta_3 \sin\frac{\theta}{2} \end{pmatrix}   (11.6)

and the 3 × 3 real matrix

D(θ) = \begin{pmatrix} c + \hat\theta_1^2(1-c) & \hat\theta_1\hat\theta_2(1-c) - \hat\theta_3 s & \hat\theta_1\hat\theta_3(1-c) + \hat\theta_2 s \\ \hat\theta_2\hat\theta_1(1-c) + \hat\theta_3 s & c + \hat\theta_2^2(1-c) & \hat\theta_2\hat\theta_3(1-c) - \hat\theta_1 s \\ \hat\theta_3\hat\theta_1(1-c) - \hat\theta_2 s & \hat\theta_3\hat\theta_2(1-c) + \hat\theta_1 s & c + \hat\theta_3^2(1-c) \end{pmatrix}   (11.7)

in which θ = |θ|, θ̂ᵢ = θᵢ/θ, c = cos θ, and s = sin θ.
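Both defining representations can be checked numerically. The Python sketch below (our own function names) builds the matrices (11.6) and (11.7) for an arbitrary axis and angle and verifies that the first is unitary and the second is orthogonal with unit determinant:

```python
import math

def d_half(theta, n):
    """2x2 SU(2) matrix of (11.6)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    n1, n2, n3 = n
    return [[c + 1j * n3 * s, 1j * (n1 - 1j * n2) * s],
            [1j * (n1 + 1j * n2) * s, c - 1j * n3 * s]]

def d_vec(theta, n):
    """3x3 SO(3) matrix of (11.7)."""
    c, s = math.cos(theta), math.sin(theta)
    n1, n2, n3 = n
    return [[c + n1*n1*(1-c), n1*n2*(1-c) - n3*s, n1*n3*(1-c) + n2*s],
            [n2*n1*(1-c) + n3*s, c + n2*n2*(1-c), n2*n3*(1-c) - n1*s],
            [n3*n1*(1-c) - n2*s, n3*n2*(1-c) + n1*s, c + n3*n3*(1-c)]]

theta = 1.234
n = (1 / math.sqrt(3),) * 3          # unit axis along (1, 1, 1)

# unitarity of the 2x2 representation: D D^dagger = I
D = d_half(theta, n)
for i in range(2):
    for j in range(2):
        z = sum(D[i][k] * D[j][k].conjugate() for k in range(2))
        assert abs(z - (1 if i == j else 0)) < 1e-12

# orthogonality and unit determinant of the 3x3 representation
R = d_vec(theta, n)
for i in range(3):
    for j in range(3):
        z = sum(R[i][k] * R[j][k] for k in range(3))
        assert abs(z - (1 if i == j else 0)) < 1e-12
det = sum(R[0][i] * (R[1][(i+1) % 3] * R[2][(i+2) % 3]
                     - R[1][(i+2) % 3] * R[2][(i+1) % 3]) for i in range(3))
assert abs(det - 1) < 1e-12
```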

Compact groups possess finite-dimensional unitary representations; noncompact groups do not. A group of bounded (11.3) matrices is compact. An abstract group of elements g({α}) is compact if its space of parameters {α} is closed and bounded. (A set is closed if the limit of every convergent sequence of its points lies in the set. A set is open if each of its elements lies in a neighborhood that lies in the set. For example, the interval [a, b] ≡ {x | a ≤ x ≤ b} is closed, while (a, b) ≡ {x | a < x < b} is open.) The group of rotations is compact, but the group of translations and the Lorentz group are noncompact.

Every n × n matrix S that is nonsingular (det S ≠ 0) maps any n × n representation D(g) of a group G into an equivalent representation D′(g) through the similarity transformation

D′(g) = S^{−1} D(g) S   (11.8)

which preserves the law of multiplication

D′( f ) D′(g) = S^{−1} D( f ) S S^{−1} D(g) S = S^{−1} D( f ) D(g) S = S^{−1} D( f g) S = D′( f g).   (11.9)

A proper subspace W of a vector space V is a subspace of lower (but not zero) dimension. A proper subspace W is invariant under the action of a representation D(g) if D(g) maps every vector v ∈ W to a vector D(g)v = v′ ∈ W. A representation that has a proper invariant subspace is reducible. A representation that is not reducible is irreducible. A representation D(g) is completely reducible if it is equivalent to a representation whose matrices are in block-diagonal form

S^{−1} D(g) S = \begin{pmatrix} D_1(g) & 0 & \cdots \\ 0 & D_2(g) & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}   (11.10)

in which each representation Dᵢ(g) is irreducible. A representation in block-diagonal form is said to be a direct sum of its irreducible representations

S^{−1} D S = D₁ ⊕ D₂ ⊕ · · · .   (11.11)

11.3 Representations Acting in Hilbert Space

A symmetry transformation g is a map (1.190) of states ψ → ψ′ that preserves probabilities

|⟨φ′|ψ′⟩|² = |⟨φ|ψ⟩|².   (11.12)

The action of a group G of symmetry transformations g on the Hilbert space of a quantum theory can be represented either by operators U (g ) that are linear and unitary (the usual case) or by ones K ( g ) that are antilinear (1.188) and antiunitary (1.189), as in the case of time reversal. Wigner proved this theorem in the 1930s,


and Weinberg improved it in his 1995 classic (Weinberg, 1995, p. 51) (Eugene Wigner, 1902–1995; Steven Weinberg, 1933–).

Two operators F₁ and F₂ that commute, F₁F₂ = F₂F₁, are compatible (1.375). A set of compatible operators F₁, F₂, . . . is complete if to every set of eigenvalues there belongs only a single eigenvector (Section 1.31).

Example 11.10 (Rotation operators) Suppose that the hamiltonian H, the square of the angular momentum J², and its z-component J_z form a complete set of compatible observables, so that the identity operator can be expressed as a sum over the eigenvectors of these operators

I = Σ_{E,j,m} |E, j, m⟩⟨E, j, m|.   (11.13)

Then the matrix element of a unitary operator U(g) between two states |ψ⟩ and |φ⟩ is

⟨φ|U(g)|ψ⟩ = ⟨φ| Σ_{E′,j′,m′} |E′, j′, m′⟩⟨E′, j′, m′| U(g) Σ_{E,j,m} |E, j, m⟩⟨E, j, m|ψ⟩.   (11.14)

Let H and J² be invariant under the action of U(g) so that U†(g)HU(g) = H and U†(g)J²U(g) = J². Then HU(g) = U(g)H and J²U(g) = U(g)J², and so if H|E, j, m⟩ = E|E, j, m⟩ and J²|E, j, m⟩ = j(j + 1)|E, j, m⟩, we have

HU(g)|E, j, m⟩ = U(g)H|E, j, m⟩ = E U(g)|E, j, m⟩
J²U(g)|E, j, m⟩ = U(g)J²|E, j, m⟩ = j(j + 1) U(g)|E, j, m⟩.   (11.15)

Thus U(g) cannot change E or j, and so

⟨E′, j′, m′|U(g)|E, j, m⟩ = δ_{E′E} δ_{j′j} ⟨j, m′|U(g)|j, m⟩ = δ_{E′E} δ_{j′j} D^{(j)}_{m′m}(g).   (11.16)

The matrix element (11.14) is a single sum over E and j in which the irreducible representations D^{(j)}_{m′m}(g) of the rotation group SU(2) appear

⟨φ|U(g)|ψ⟩ = Σ_{E,j,m′,m} ⟨φ|E, j, m′⟩ D^{(j)}_{m′m}(g) ⟨E, j, m|ψ⟩.   (11.17)

This is how the block-diagonal form (11.10) usually appears in calculations. The matrices D^{(j)}_{m′m}(g) inherit the unitarity of the operator U(g).

11.4 Subgroups

If all the elements of a group S also are elements of a group G, then S is a subgroup of G. Every group G has two trivial subgroups – the identity element e and the whole group G itself. Many groups have more interesting subgroups. For example, the rotations about a fixed axis form an abelian subgroup of the group of all rotations in 3-dimensional space.

A subgroup S ⊂ G is an invariant subgroup if every element s of the subgroup S is left inside the subgroup under the action of every element g of the whole group G, that is, if

g^{−1} s g = s′ ∈ S   for all g ∈ G.   (11.18)

This condition often is written as g^{−1}Sg = S for all g ∈ G or as

Sg = gS   for all g ∈ G.   (11.19)

Invariant subgroups also are called normal subgroups.

A set C ⊂ G is called a conjugacy class if it’s invariant under the action of the whole group G, that is, if

Cg = gC   or   g^{−1}Cg = C   for all g ∈ G.   (11.20)

A subgroup that is the union of a set of conjugacy classes is invariant. The center C of a group G is the set of all elements c ∈ G that commute with every element g of the group, that is, their commutators

[c, g] ≡ cg − gc = 0   (11.21)

vanish for all g ∈ G.

Example 11.11 (Centers are abelian subgroups) Does the center C always form an abelian subgroup of its group G? The product c₁c₂ of any two elements c₁ and c₂ of the center commutes with every element g of G since c₁c₂g = c₁gc₂ = gc₁c₂. So the center is closed under multiplication. The identity element e commutes with every g ∈ G, so e ∈ C. If c′ ∈ C, then c′g = gc′ for all g ∈ G, and so multiplication of this equation from the left and the right by c′^{−1} gives gc′^{−1} = c′^{−1}g, which shows that c′^{−1} ∈ C. The subgroup C is abelian because each of its elements commutes with all the elements of G including those of C itself.

So the center of any group always is one of its abelian invariant subgroups. The center may be trivial, however, consisting either of the identity or of the whole group. But a group with a nontrivial center can not be simple or semisimple (Section 11.28).


11.5 Cosets

If H is a subgroup of a group G, then for every element g ∈ G the set of elements Hg ≡ {hg | h ∈ H} is a right coset of the subgroup H ⊂ G. (Here ⊂ means is a subset of or equivalently is contained in.) If H is a subgroup of a group G, then for every element g ∈ G the set of elements gH is a left coset of the subgroup H ⊂ G. The number of elements in a coset is the same as the number of elements of H, which is the order of H.

An element g of a group G is in one and only one right coset (and in one and only one left coset) of the subgroup H ⊂ G. For suppose instead that g were in two right cosets, g ∈ Hg₁ and g ∈ Hg₂, so that g = h₁g₁ = h₂g₂ for suitable h₁, h₂ ∈ H and g₁, g₂ ∈ G. Then since H is a (sub)group, we have g₂ = h₂^{−1}h₁g₁ = h₃g₁, which says that g₂ ∈ Hg₁. But this means that every element hg₂ ∈ Hg₂ is of the form hg₂ = hh₃g₁ = h₄g₁ ∈ Hg₁. So every element hg₂ ∈ Hg₂ is in Hg₁: the two right cosets are identical, Hg₁ = Hg₂.

The right (or left) cosets are the points of the quotient coset space G/H. If H is an invariant subgroup of G, then by definition (11.19) Hg = gH for all g ∈ G, and so the left cosets are the same sets as the right cosets. In this case, the coset space G/H is itself a group with multiplication defined by

(Hg₁)(Hg₂) = {hᵢg₁hⱼg₂ | hᵢ, hⱼ ∈ H}
           = {hᵢ g₁hⱼg₁^{−1} g₁g₂ | hᵢ, hⱼ ∈ H}
           = {hᵢhₖg₁g₂ | hᵢ, hₖ ∈ H}
           = {hg₁g₂ | h ∈ H} = Hg₁g₂   (11.22)

which is the multiplication rule of the group G. This group G/H is called the quotient or factor group of G by H.

11.6 Morphisms

An isomorphism is a one-to-one map between groups that respects their multiplication laws. For example, a similarity transformation (11.8) relates two equivalent representations

D′(g) = S^{−1} D(g) S   (11.23)

and is an isomorphism (Exercise 11.8). An automorphism is an isomorphism between a group and itself. The map gᵢ → g gᵢ g^{−1} is one to one because g g₁ g^{−1} = g g₂ g^{−1} implies that g g₁ = g g₂, and so that g₁ = g₂. This map


also preserves the law of multiplication since g g₁ g^{−1} g g₂ g^{−1} = g g₁ g₂ g^{−1}. So the map

G → gGg^{−1}   (11.24)

is an automorphism. It is called an inner automorphism because g is an element of G. An automorphism not of this form (11.24) is an outer automorphism.

11.7 Schur’s Lemma

Part 1: If D₁ and D₂ are inequivalent, irreducible representations of a group G, and if D₁(g)A = AD₂(g) for some matrix A and for all g ∈ G, then the matrix A must vanish, A = 0.
Proof: First suppose that A annihilates some vector |x⟩, that is, A|x⟩ = 0. Let P be the projection operator onto the subspace that A annihilates, which is of at least 1 dimension. This subspace, incidentally, is called the null space N(A) or the kernel of the matrix A. The representation D₂ must leave this null space N(A) invariant since

A D₂(g) P = D₁(g) A P = 0.   (11.25)

If N(A) were a proper subspace, then it would be a proper invariant subspace of the representation D₂, and so D₂ would be reducible, which is contrary to our assumption that D₁ and D₂ are irreducible. So the null space N(A) must be the whole space upon which A acts, that is, A = 0. A similar argument shows that if ⟨y|A = 0 for some bra ⟨y|, then A = 0. So either A is zero or it annihilates no ket and no bra. In the latter case, A must be square and invertible, which would imply that D₂(g) = A^{−1}D₁(g)A, that is, that D₁ and D₂ are equivalent representations, which is contrary to our assumption that they are inequivalent. The only way out is that A vanishes.

Part 2: If for a finite-dimensional, irreducible representation D(g) of a group G, we have D(g)A = AD(g) for some matrix A and for all g ∈ G, then A = cI. That is, any matrix that commutes with every element of a finite-dimensional, irreducible representation must be a multiple of the identity matrix.
Proof: Every square matrix A has at least one eigenvector |x⟩ and eigenvalue c so that A|x⟩ = c|x⟩ because its characteristic equation det(A − cI) = 0 always has at least one root by the fundamental theorem of algebra (6.79). So the null space N(A − cI) has dimension greater than zero. The assumption D(g)A = AD(g) for all g ∈ G implies that D(g)(A − cI) = (A − cI)D(g) for all g ∈ G. Let P be the projection operator onto the null space N(A − cI). Then we have (A − cI)D(g)P = D(g)(A − cI)P = 0 for all g ∈ G, which implies that D(g)P maps vectors into the null space N(A − cI). This null space therefore is a subspace that is invariant under D(g), which means that D is reducible unless the null space N(A − cI) is the whole space. Since by assumption D is irreducible, it follows that N(A − cI) is the whole space, that is, that A = cI (Issai Schur, 1875–1941).

Example 11.12 (Schur, Wigner, and Eckart) Suppose an arbitrary observable O is invariant under the action of the rotation group SU(2) represented by unitary operators U(g) for g ∈ SU(2)

U†(g) O U(g) = O   or   [O, U(g)] = 0.   (11.26)

These unitary rotation operators commute with the square J² of the angular momentum, [J², U] = 0. Suppose that they also leave the hamiltonian H unchanged, [H, U] = 0. Then as shown in Example 11.10, the state U|E, j, m⟩ is a sum of states all with the same values of j and E. It follows that

Σ_{m′} ⟨E, j, m|O|E′, j′, m′⟩⟨E′, j′, m′|U(g)|E′, j′, m″⟩ = Σ_{m′} ⟨E, j, m|U(g)|E, j, m′⟩⟨E, j, m′|O|E′, j′, m″⟩   (11.27)

or in the notation of (11.16)

Σ_{m′} ⟨E, j, m|O|E′, j′, m′⟩ D^{(j′)}(g)_{m′m″} = Σ_{m′} D^{(j)}(g)_{mm′} ⟨E, j, m′|O|E′, j′, m″⟩.   (11.28)

Now Part 1 of Schur’s lemma tells us that the matrix ⟨E, j, m|O|E′, j′, m′⟩ must vanish unless the representations are equivalent, which is to say unless j = j′. So we have

Σ_{m′} ⟨E, j, m|O|E′, j, m′⟩ D^{(j)}(g)_{m′m″} = Σ_{m′} D^{(j)}(g)_{mm′} ⟨E, j, m′|O|E′, j, m″⟩.   (11.29)

Now Part 2 of Schur’s lemma tells us that the matrix ⟨E, j, m|O|E′, j, m′⟩ must be a multiple of the identity. Thus the symmetry of O under rotations simplifies the matrix element to

⟨E, j, m|O|E′, j′, m′⟩ = δ_{jj′} δ_{mm′} O_j(E, E′).   (11.30)

This result is a special case of the Wigner–Eckart theorem (Eugene Wigner 1902–1995, Carl Eckart 1902–1973).

11.8 Characters

Suppose the n × n matrices D_{ij}(g) form a representation of a group G ∋ g. The character χ_D(g) of the matrix D(g) is the trace

χ_D(g) = Tr D(g) = Σ_{i=1}^{n} D_{ii}(g).   (11.31)

Traces are cyclic, that is, Tr ABC = Tr BCA = Tr CAB. So if two representations D and D′ are equivalent, so that D′(g) = S^{−1}D(g)S, then they have the same characters because

χ_{D′}(g) = Tr D′(g) = Tr(S^{−1}D(g)S) = Tr(D(g)SS^{−1}) = Tr D(g) = χ_D(g).   (11.32)

If two group elements g₁ and g₂ are in the same conjugacy class, that is, if g₂ = gg₁g^{−1} for some g ∈ G, then they have the same character in a given representation D(g) because

χ_D(g₂) = Tr D(g₂) = Tr D(gg₁g^{−1}) = Tr(D(g)D(g₁)D(g^{−1})) = Tr(D(g₁)D^{−1}(g)D(g)) = Tr D(g₁) = χ_D(g₁).   (11.33)
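In SO(3), rotations by the same angle θ about different axes are conjugate to one another, so by (11.33) they share the character χ(θ) = 1 + 2 cos θ, the trace of the matrix (11.7). A Python sketch (our own helper name):

```python
import math

def rotation(theta, n):
    """3x3 rotation (11.7) by angle theta about the unit axis n."""
    c, s = math.cos(theta), math.sin(theta)
    n1, n2, n3 = n
    return [[c + n1*n1*(1-c), n1*n2*(1-c) - n3*s, n1*n3*(1-c) + n2*s],
            [n2*n1*(1-c) + n3*s, c + n2*n2*(1-c), n2*n3*(1-c) - n1*s],
            [n3*n1*(1-c) - n2*s, n3*n2*(1-c) + n1*s, c + n3*n3*(1-c)]]

theta = 0.8
axes = [(1.0, 0.0, 0.0),
        (0.0, 1.0, 0.0),
        (1 / math.sqrt(2), 0.0, 1 / math.sqrt(2))]
for n in axes:
    R = rotation(theta, n)
    chi = R[0][0] + R[1][1] + R[2][2]        # the character is the trace
    assert abs(chi - (1 + 2 * math.cos(theta))) < 1e-12
```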

11.9 Direct Products

Suppose D^{(a)}(g) is a k-dimensional representation of a group G, and D^{(b)}(g) is an n-dimensional representation of the same group. Then their product

D^{(a,b)}_{im,jℓ}(g) = D^{(a)}_{ij}(g) D^{(b)}_{mℓ}(g)   (11.34)

is a (kn)-dimensional direct-product representation of the group G. Direct products are also called tensor products. They occur in quantum systems that have two or more parts, each described by a different space of vectors. Suppose the vectors |i⟩ for i = 1, . . . , k are the basis vectors of the k-dimensional space V_k on which the representation D^{(a)}(g) acts, and that the vectors |m⟩ for m = 1, . . . , n are the basis vectors of the n-dimensional space V_n on which D^{(b)}(g) acts. The kn vectors |i, m⟩ are basis vectors for the kn-dimensional tensor-product space V_{kn}. The matrices D^{(a,b)}(g) defined as

⟨i, m|D^{(a,b)}(g)|j, ℓ⟩ = ⟨i|D^{(a)}(g)|j⟩ ⟨m|D^{(b)}(g)|ℓ⟩   (11.35)

(11.35)

act in this kn-dimensional space Vkn and form a representation of the group G; this direct-product representation usually is reducible. Many tricks help one to decompose reducible tensor-product representations into direct sums of irreducible representations (Georgi, 1999; Zee, 2016). Example 11.13 (Adding angular momenta) The addition of angular momenta illustrates both the direct product and its reduction to a direct sum of irreducible representations. Let D ( j1) (g) and D ( j2) (g) respectively be the (2 j1 + 1) × (2 j1 + 1) and the

11.10 Finite Groups

401

(2 j2 + 1) × (2 j2 + 1 ) representations of the rotation group SU (2 ). The direct-product representation D ( j1 , j2 )

³m±1 , m±2 |D j

( 1 , j 2)

|m 1 , m 2 ´ = ³m ±1 |D j

( 1) (

g)| m 1´³m ±2 | D ( j2 ) (g)|m 2´

(11.36)

is reducible into a direct sum of all the irreducible representations of SU (2) from D ( j1+ j2 ) (g) down to D( | j1− j2| ) (g) in integer steps: D( j1, j2)

= D j + j ⊕ D j + j −1 ⊕ · · · ⊕ D | j − j |+1 ⊕ D | j − j | ( 1

2)

( 1

2

)

(

1

)

2

( 1

2 )

(11.37)

each irreducible representation occurring once in the direct sum. Example 11.14(Adding two spins) When one adds j1 = 1/2 to j2 = 1/2, one finds that the tensor-product matrix D (1/2,1/2) is equivalent to the direct sum D (1) ⊕ D (0) D

( 1/2,1/ 2)

²

−1 D (1) (θ ) (θ ) = S 0

0

D (0) (θ )

³

S

(11.38)

where the matrices S, D( 1) , and D (0) are 4 × 4, 3 × 3, and 1 × 1.
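One can check the decomposition (11.38) through characters without constructing S explicitly. A small numpy sketch (mine, not the book's): for a rotation about the z-axis the matrix D^{(1/2)}(θ) is diagonal, the direct-product matrix is a Kronecker product as in (11.34), and its trace must equal χ^{(1)}(θ) + χ^{(0)}(θ) = (1 + 2 cos θ) + 1:

```python
import numpy as np

theta = 0.7  # an arbitrary rotation angle about the z-axis

# Spin-1/2 rotation about z: D^(1/2)(theta) = diag(e^{-i theta/2}, e^{i theta/2}).
D_half = np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

# The direct-product representation D^(1/2,1/2) is the Kronecker product (11.34).
D_prod = np.kron(D_half, D_half)

# Equivalent matrices have equal traces, so the character of D^(1/2,1/2) must be
# chi_1 + chi_0 = (1 + 2 cos theta) + 1 if D^(1/2,1/2) ~ D^(1) (+) D^(0).
chi_prod = np.trace(D_prod)
chi_sum = (1 + 2 * np.cos(theta)) + 1
assert np.isclose(chi_prod.real, chi_sum) and np.isclose(chi_prod.imag, 0.0)
```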

11.10 Finite Groups

A finite group is one that has a finite number of elements. The number of elements in a group is the order of the group.

Example 11.15 (Z_2) The group Z_2 consists of two elements e and p with multiplication rules

ee = e,   ep = pe = p,   and   pp = e.    (11.39)

Clearly, Z_2 is abelian, and its order is 2. The identification e \to 1 and p \to -1 gives a 1-dimensional representation of the group Z_2 in terms of 1 × 1 matrices, which are just numbers.

It is tedious to write the multiplication rules as individual equations. Normally people compress them into a multiplication table like this:

×  e  p
e  e  p
p  p  e    (11.40)

A simple generalization of Z_2 is the group Z_n whose elements may be represented as \exp(i 2\pi m/n) for m = 1, \ldots, n. This group is also abelian, and its order is n.

Example 11.16 (Z_3) The multiplication table for Z_3 is

×  e  a  b
e  e  a  b
a  a  b  e
b  b  e  a    (11.41)

which says that a^2 = b, b^2 = a, and ab = ba = e.

11.11 Regular Representations

For any finite group G we can associate an orthonormal vector |g_i\rangle with each element g_i of the group, so that \langle g_i|g_j\rangle = \delta_{ij}. These orthonormal vectors |g_i\rangle form a basis for a vector space whose dimension is the order of the group. The matrix D(g_k) of the regular representation of G is defined to map any vector |g_i\rangle into the vector |g_k g_i\rangle associated with the product g_k g_i

D(g_k)\,|g_i\rangle = |g_k g_i\rangle.    (11.42)

Since group multiplication is associative, we have

D(g_j)\, D(g_k)\, |g_i\rangle = D(g_j)\, |g_k g_i\rangle = |g_j (g_k g_i)\rangle = |(g_j g_k) g_i\rangle = D(g_j g_k)\,|g_i\rangle.    (11.43)

Because the vector |g_i\rangle was an arbitrary basis vector, it follows that

D(g_j)\, D(g_k) = D(g_j g_k)    (11.44)

which means that the matrices D(g) satisfy the criterion (11.5) for their being a representation of the group G. The matrix D(g) has entries

[D(g)]_{ij} = \langle g_i|\, D(g)\, |g_j\rangle.    (11.45)

The sum of dyadics |g_\ell\rangle\langle g_\ell| over all the elements g_\ell of a finite group G is the unit matrix

\sum_{g_\ell \in G} |g_\ell\rangle\langle g_\ell| = I_n    (11.46)

in which n is the order of G, that is, the number of elements in G. The matrix (11.45) respects the product law (11.44) and so is a representation of the group

[D(g_j g_k)]_{mn} = \langle g_m|\, D(g_j g_k)\, |g_n\rangle = \langle g_m|\, D(g_j)\, D(g_k)\, |g_n\rangle = \sum_{g_\ell \in G} \langle g_m|\, D(g_j)\, |g_\ell\rangle \langle g_\ell|\, D(g_k)\, |g_n\rangle = \sum_{g_\ell \in G} [D(g_j)]_{m\ell}\, [D(g_k)]_{\ell n}.    (11.47)
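The defining rule (11.42) makes the regular representation easy to build from a multiplication table. A numpy sketch (mine; the element ordering (e, a, b) for Z_3 anticipates Example 11.17):

```python
import numpy as np

# Multiplication table of Z3 with elements ordered (e, a, b): table[i][j] = index of g_i g_j.
table = [[0, 1, 2],
         [1, 2, 0],
         [2, 0, 1]]

def regular_rep(k, table):
    """Matrix of D(g_k): column j has a 1 in row i whenever g_k g_j = g_i, per (11.42)."""
    n = len(table)
    D = np.zeros((n, n))
    for j in range(n):
        D[table[k][j], j] = 1.0
    return D

D = [regular_rep(k, table) for k in range(len(table))]

# Product law (11.44): D(g_j) D(g_k) = D(g_j g_k) for every pair of elements.
for j in range(3):
    for k in range(3):
        assert np.array_equal(D[j] @ D[k], D[table[j][k]])
```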

Example 11.17 (Z_3's regular representation) The regular representation of Z_3 is

D(e) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad D(a) = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \quad D(b) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}    (11.48)

so D(a)^2 = D(b), D(b)^2 = D(a), and D(a)\, D(b) = D(b)\, D(a) = D(e).

11.12 Properties of Finite Groups

In his book (Georgi, 1999, chap. 1), Georgi proves the following theorems:

1. Every representation of a finite group is equivalent to a unitary representation.
2. Every representation of a finite group is completely reducible.
3. The irreducible representations of a finite abelian group are 1 dimensional.
4. If D^{(a)}(g) and D^{(b)}(g) are two unitary irreducible representations of dimensions n_a and n_b of a group G of N elements g_1, \ldots, g_N, then the functions

\sqrt{\frac{n_a}{N}}\; D^{(a)}_{jk}(g)    (11.49)

are orthonormal and complete in the sense that

\frac{n_a}{N} \sum_{j=1}^N D^{(a)*}_{ik}(g_j)\, D^{(b)}_{\ell m}(g_j) = \delta_{ab}\, \delta_{i\ell}\, \delta_{km}.    (11.50)

5. The order N of a finite group is the sum of the squares of the dimensions of its inequivalent irreducible representations

N = \sum_a n_a^2.    (11.51)

Example 11.18 (Z_N) The abelian cyclic group Z_N with elements

g_j = e^{2\pi i j/N}    (11.52)

has N 1-dimensional irreducible representations

D^{(a)}(g_j) = e^{2\pi i a j/N}    (11.53)

for a = 1, 2, \ldots, N. Their orthonormality relation (11.50) is the Fourier formula

\frac{1}{N} \sum_{j=1}^N e^{-2\pi i a j/N}\, e^{2\pi i b j/N} = \delta_{ab}.    (11.54)

The n_a are all unity, there are N of them, and the sum of the n_a^2 is N as required by the sum rule (11.51).
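The Fourier orthonormality (11.54) is easy to confirm numerically. A numpy sketch (mine, with N = 8 chosen arbitrarily):

```python
import numpy as np

N = 8
js = np.arange(1, N + 1)

def D(a):
    """1-dimensional irreducible representation (11.53): D^(a)(g_j) = e^{2 pi i a j / N}."""
    return np.exp(2j * np.pi * a * js / N)

# Orthonormality (11.54): (1/N) sum_j D^(a)(g_j)* D^(b)(g_j) = delta_ab.
gram = np.array([[np.sum(np.conj(D(a)) * D(b)) / N for b in range(1, N + 1)]
                 for a in range(1, N + 1)])
assert np.allclose(gram, np.eye(N))

# Sum rule (11.51): N irreducible representations, each of dimension 1.
assert sum(1 ** 2 for _ in range(N)) == N
```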

11.13 Permutations

The permutation group on n objects is called S_n. Permutations are made of cycles that change the order of the n objects. For instance, the permutation (1 2) = (2 1) is a 2-cycle that means x_1 \to x_2 \to x_1; the unitary operator U((1\,2)) that represents it interchanges states like this:

U((1\,2))\,|+, -\rangle = U((1\,2))\,|+, 1\rangle\, |-, 2\rangle = |-, 1\rangle\,|+, 2\rangle = |-, +\rangle.    (11.55)

The 2-cycle (3 4) means x_3 \to x_4 \to x_3; it changes (a, b, c, d) into (a, b, d, c). The 3-cycle (1 2 3) = (2 3 1) = (3 1 2) means x_1 \to x_2 \to x_3 \to x_1; it changes (a, b, c, d) into (b, c, a, d). The 4-cycle (1 3 2 4) means x_1 \to x_3 \to x_2 \to x_4 \to x_1 and changes (a, b, c, d) into (c, d, b, a). The 1-cycle (2) means x_2 \to x_2 and leaves everything unchanged. The identity element of S_n is the product of 1-cycles e = (1)(2) \cdots (n). The inverse of the cycle (1 3 2 4) must invert x_1 \to x_3 \to x_2 \to x_4 \to x_1, so it must be (1 4 2 3), which means x_1 \to x_4 \to x_2 \to x_3 \to x_1, so that it changes (c, d, b, a) back into (a, b, c, d). Every element of S_n has each integer from 1 to n in one and only one cycle. So an arbitrary element of S_n with \ell_k k-cycles must satisfy

\sum_{k=1}^n k\, \ell_k = n.    (11.56)
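Cycles are easy to manipulate in code. A short Python sketch (mine; it adopts the convention used above, in which x_i → x_j means that position i receives the entry that was at position j):

```python
def apply_cycle(cycle, seq):
    """Apply a cycle like (1, 3, 2, 4), meaning x1 -> x3 -> x2 -> x4 -> x1:
    position i receives the entry at the next position j of the cycle."""
    out = list(seq)
    n = len(cycle)
    for idx, i in enumerate(cycle):
        j = cycle[(idx + 1) % n]     # i -> j within the cycle
        out[i - 1] = seq[j - 1]
    return tuple(out)

# The 3-cycle (1 2 3) changes (a, b, c, d) into (b, c, a, d).
assert apply_cycle((1, 2, 3), tuple("abcd")) == tuple("bcad")

# The 4-cycle (1 3 2 4) changes (a, b, c, d) into (c, d, b, a),
# and its inverse (1 4 2 3) changes (c, d, b, a) back into (a, b, c, d).
forward = apply_cycle((1, 3, 2, 4), tuple("abcd"))
assert forward == tuple("cdba")
assert apply_cycle((1, 4, 2, 3), forward) == tuple("abcd")
```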

11.14 Compact and Noncompact Lie Groups

Imagine rotating an object repeatedly. Notice that the biggest rotation is by an angle of ±π about some axis. The possible angles form a circle; the space of parameters is a circle. The parameter space of a compact group is compact – closed and bounded. The rotations form a compact group.

Now consider the translations. Imagine moving a pebble to the Sun, then moving it to the next-nearest star, then moving it to the nearest galaxy. If space is flat, then there is no limit to how far one can move a pebble. The parameter space of a noncompact group is not compact. The translations form a noncompact group.

We'll see that compact Lie groups possess unitary representations, with n × n unitary matrices D(\alpha), while noncompact ones don't. Here \alpha stands for the parameters \alpha_1, \ldots, \alpha_n that label the elements of the group, three for the rotation group. The \alpha's usually are real, but can be complex.

11.15 Generators

To study continuous groups, we will use calculus and algebra, and we will focus on the simplest part of the group – the elements g(\alpha) for \alpha \approx 0 which are near

the identity e = g(0) for which all \alpha_a = 0. Each element g(\alpha) of the group is represented by a matrix D(\alpha) \equiv D(g(\alpha)) in the D representation of the group and by another matrix D'(\alpha) \equiv D'(g(\alpha)) in any other D' representation of the group. Every representation respects the multiplication law of the group. So if g(\beta)\, g(\alpha) = g(\gamma), then the matrices of the D representation must satisfy D(\beta)\, D(\alpha) = D(\gamma), and those of any other representation D' must satisfy D'(\beta)\, D'(\alpha) = D'(\gamma).

A generator t_a of a representation D is the partial derivative of the matrix D(\alpha) with respect to the component \alpha_a of \alpha evaluated at \alpha = 0

t_a = -i \left. \frac{\partial D(\alpha)}{\partial \alpha_a} \right|_{\alpha=0}.    (11.57)

When all the parameters \alpha_a are infinitesimal, |\alpha_a| \ll 1, the matrix D(\alpha) is very close to the identity matrix I

D(\alpha) \approx I + i \sum_a \alpha_a\, t_a.    (11.58)

Replacing \alpha by \alpha/n, we get a relation that becomes exact as n \to \infty

D\!\left(\frac{\alpha}{n}\right) = I + i \sum_a \frac{\alpha_a}{n}\, t_a.    (11.59)

The nth power of this equation is the matrix D(\alpha) that represents the group element g(\alpha) in the exponential parametrization

D(\alpha) = D\!\left(\frac{\alpha}{n}\right)^{\!n} = \lim_{n \to \infty} \left(I + i \sum_a \frac{\alpha_a}{n}\, t_a\right)^{\!n} = \exp\!\left(i \sum_a \alpha_a\, t_a\right).    (11.60)
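The limit in (11.60) can be watched converging numerically. A numpy sketch (mine; the generator here is the hermitian matrix σ_1, used only as an example):

```python
import numpy as np

t = np.array([[0.0, 1.0], [1.0, 0.0]])   # a hermitian generator (sigma_1)
alpha = 0.9

# exp(i alpha t) computed exactly from the eigendecomposition of the hermitian exponent.
w, v = np.linalg.eigh(alpha * t)
exact = v @ np.diag(np.exp(1j * w)) @ v.conj().T

# (I + i alpha t / n)^n approaches exp(i alpha t) as n grows, eq. (11.60).
errs = []
for n in (10, 100, 1000):
    approx = np.linalg.matrix_power(np.eye(2) + 1j * alpha * t / n, n)
    errs.append(np.linalg.norm(approx - exact))

assert errs[0] > errs[1] > errs[2] and errs[2] < 1e-2
```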

The i's appear in these equations so that when the generators t_a are hermitian matrices, (t_a)^\dagger = t_a, and the \alpha's are real, the matrices D(\alpha) are unitary

D^{-1}(\alpha) = \exp\!\left(-i \sum_a \alpha_a\, t_a\right) = \exp\!\left(-i \sum_a \alpha_a\, t_a^\dagger\right) = D^\dagger(\alpha).    (11.61)

Compact groups have finite-dimensional, unitary representations with hermitian generators.

11.16 Lie Algebra

If t_a and t_b are any two generators of a representation D, then the matrices

D(\alpha) = e^{i\epsilon t_a}   and   D(\beta) = e^{i\epsilon t_b}    (11.62)

represent the group elements g(\alpha) and g(\beta) with infinitesimal exponential parameters \alpha_i = \epsilon\, \delta_{ia} and \beta_i = \epsilon\, \delta_{ib}. The inverses of these group elements g^{-1}(\alpha) = g(-\alpha) and g^{-1}(\beta) = g(-\beta) are represented by the matrices D(-\alpha) = e^{-i\epsilon t_a} and D(-\beta) = e^{-i\epsilon t_b}. The multiplication law of the group determines the parameters \gamma(\alpha, \beta) of the product

g(\beta)\, g(\alpha)\, g(-\beta)\, g(-\alpha) = g(\gamma(\alpha, \beta)).    (11.63)

The matrices of any two representations D with generators t_a and D' with generators t_a' obey the same multiplication law

D(\beta)\, D(\alpha)\, D(-\beta)\, D(-\alpha) = D(\gamma(\alpha, \beta))
D'(\beta)\, D'(\alpha)\, D'(-\beta)\, D'(-\alpha) = D'(\gamma(\alpha, \beta))    (11.64)

with the same infinitesimal exponential parameters \alpha, \beta, and \gamma(\alpha, \beta). To order \epsilon^2, the product of the four D's is

e^{i\epsilon t_b}\, e^{i\epsilon t_a}\, e^{-i\epsilon t_b}\, e^{-i\epsilon t_a} \approx \left(1 + i\epsilon t_b - \tfrac{\epsilon^2}{2} t_b^2\right)\left(1 + i\epsilon t_a - \tfrac{\epsilon^2}{2} t_a^2\right)\left(1 - i\epsilon t_b - \tfrac{\epsilon^2}{2} t_b^2\right)\left(1 - i\epsilon t_a - \tfrac{\epsilon^2}{2} t_a^2\right) \approx 1 + \epsilon^2 (t_a t_b - t_b t_a) = 1 + \epsilon^2 [t_a, t_b].    (11.65)

The other representation gives the same result but with primes

e^{i\epsilon t_b'}\, e^{i\epsilon t_a'}\, e^{-i\epsilon t_b'}\, e^{-i\epsilon t_a'} \approx 1 + \epsilon^2 [t_a', t_b'].    (11.66)

The products (11.65 and 11.66) represent the same group element g(\gamma(\alpha, \beta)), so they have the same infinitesimal parameters \gamma(\alpha, \beta) and therefore are the same linear combinations of their respective generators t_c and t_c'

D(\gamma(\alpha, \beta)) \approx 1 + \epsilon^2 [t_a, t_b] = 1 + i\epsilon^2 \sum_{c=1}^n f^c_{ab}\, t_c
D'(\gamma(\alpha, \beta)) \approx 1 + \epsilon^2 [t_a', t_b'] = 1 + i\epsilon^2 \sum_{c=1}^n f^c_{ab}\, t_c'    (11.67)

which in turn imply the Lie algebra formulas with the same f^c_{ab}

[t_a, t_b] = i \sum_{c=1}^n f^c_{ab}\, t_c   and   [t_a', t_b'] = i \sum_{c=1}^n f^c_{ab}\, t_c'.    (11.68)

The commutator of any two generators is a linear combination of the generators. The coefficients f^c_{ab} are the structure constants of the group. They are the same for all representations of the group. Unless the parameters \alpha_a are redundant, the generators are linearly independent. They span a vector space, and any linear combination may be called a generator.
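The group-commutator expansion (11.65) can be verified numerically. A numpy sketch (mine), using the generators t_a = σ_1/2 and t_b = σ_2/2 of the defining representation of SU(2):

```python
import numpy as np

ta = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)      # sigma_1 / 2
tb = 0.5 * np.array([[0, -1j], [1j, 0]])                  # sigma_2 / 2

def expi(h):
    """exp(i h) for a hermitian matrix h, via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return v @ np.diag(np.exp(1j * w)) @ v.conj().T

eps = 1e-3
prod = expi(eps * tb) @ expi(eps * ta) @ expi(-eps * tb) @ expi(-eps * ta)
approx = np.eye(2) + eps ** 2 * (ta @ tb - tb @ ta)       # 1 + eps^2 [t_a, t_b]

# They agree to order eps^2; the difference is of order eps^3.
assert np.linalg.norm(prod - approx) < 1e-7
```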

By using the Gram–Schmidt procedure (Section 1.10), we may make the generators t_a orthogonal with respect to the inner product (1.91)

(t_a, t_b) = \mathrm{Tr}\left(t_a^\dagger\, t_b\right) = k\, \delta_{ab}    (11.69)

in which k is a nonnegative normalization constant that depends upon the representation. We can't normalize the generators, making k unity, because the structure constants f^c_{ab} are the same in all representations.

In what follows, I will often omit the summation symbol \sum when an index is repeated. In this notation, the structure-constant formulas (11.68) are

[t_a, t_b] = i f^c_{ab}\, t_c   and   [t_a', t_b'] = i f^c_{ab}\, t_c'.    (11.70)

This summation convention avoids unnecessary summation symbols.

By multiplying both sides of the first of the two Lie algebra formulas (11.68) by t_d^\dagger and using the orthogonality (11.69) of the generators, we find

\mathrm{Tr}\left([t_a, t_b]\, t_d^\dagger\right) = i f^c_{ab}\, \mathrm{Tr}\left(t_c\, t_d^\dagger\right) = i f^c_{ab}\, k\, \delta_{cd} = i k\, f^d_{ab}    (11.71)

which implies that the structure constant f^c_{ab} is the trace

f^c_{ab} = -\frac{i}{k}\, \mathrm{Tr}\left([t_a, t_b]\, t_c^\dagger\right).    (11.72)

Because of the antisymmetry of the commutator [t_a, t_b], structure constants are antisymmetric in their lower indices

f^c_{ab} = -f^c_{ba}.    (11.73)
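The trace formula (11.72) can be checked against the known structure constants of SU(2). A numpy sketch (mine), using t_a = σ_a/2, for which the normalization constant is k = Tr(t_a† t_a) = 1/2:

```python
import numpy as np

sigma = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)
t = 0.5 * sigma

# Normalization (11.69): Tr(t_a^dagger t_b) = k delta_ab with k = 1/2 here.
k = np.trace(t[0].conj().T @ t[0]).real
assert np.isclose(k, 0.5)

# Structure constants from the trace formula (11.72):
# f^c_ab = -(i/k) Tr([t_a, t_b] t_c^dagger).
f = np.zeros((3, 3, 3))
for a in range(3):
    for b in range(3):
        comm = t[a] @ t[b] - t[b] @ t[a]
        for c in range(3):
            f[c, a, b] = (-1j / k * np.trace(comm @ t[c].conj().T)).real

# For SU(2) the structure constants are the Levi-Civita symbol.
assert np.isclose(f[2, 0, 1], 1.0) and np.isclose(f[2, 1, 0], -1.0)
assert np.isclose(f[0, 1, 2], 1.0) and np.isclose(f[1, 2, 0], 1.0)
```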

From any n × n matrix A, one may make a hermitian matrix A + A^\dagger and an antihermitian one A - A^\dagger. Thus one may separate the n_G generators into a set that are hermitian t_a^{(h)} and a set that are antihermitian t_a^{(ah)}. The exponential of any imaginary linear combination of n × n hermitian generators D(\alpha) = \exp\!\left(i \alpha_a\, t_a^{(h)}\right) is an n × n unitary matrix since

D^\dagger(\alpha) = \exp\!\left(-i \alpha_a\, t_a^{(h)\dagger}\right) = \exp\!\left(-i \alpha_a\, t_a^{(h)}\right) = D^{-1}(\alpha).    (11.74)

A group with only hermitian generators is compact and has finite-dimensional unitary representations.

On the other hand, the exponential of any imaginary linear combination of antihermitian generators D(\alpha) = \exp\!\left(i \alpha_a\, t_a^{(ah)}\right) is a real exponential of their hermitian counterparts i\, t_a^{(ah)} whose squared norm

\| D(\alpha) \|^2 = \mathrm{Tr}\left[ D(\alpha)^\dagger\, D(\alpha) \right] = \mathrm{Tr}\left[ \exp\!\left(2 \alpha_a\, i\, t_a^{(ah)}\right) \right]    (11.75)

grows exponentially and without limit as the parameters \alpha_a \to \pm\infty. A group with some antihermitian generators is noncompact and does not have finite-dimensional unitary representations. (The unitary representations of the translations and of the Lorentz and Poincaré groups are infinite dimensional.)

Compact Lie groups have hermitian generators, and so the structure-constant formula (11.72) reduces in this case to

f^c_{ab} = (-i/k)\, \mathrm{Tr}\left([t_a, t_b]\, t_c^\dagger\right) = (-i/k)\, \mathrm{Tr}\left([t_a, t_b]\, t_c\right).    (11.76)

Now, since the trace is cyclic, we have

f^b_{ac} = (-i/k)\, \mathrm{Tr}\left([t_a, t_c]\, t_b\right) = (-i/k)\, \mathrm{Tr}\left(t_a t_c t_b - t_c t_a t_b\right) = (-i/k)\, \mathrm{Tr}\left(t_b t_a t_c - t_a t_b t_c\right) = (-i/k)\, \mathrm{Tr}\left([t_b, t_a]\, t_c\right) = f^c_{ba} = -f^c_{ab}.    (11.77)

Interchanging a and b, we get

f^a_{bc} = f^c_{ab} = -f^c_{ba}.    (11.78)

Finally, interchanging b and c in (11.77) gives

f^c_{ab} = f^b_{ca} = -f^b_{ac}.    (11.79)

Combining (11.77, 11.78, and 11.79), we see that the structure constants of a compact Lie group are totally antisymmetric

f^b_{ac} = -f^b_{ca} = f^c_{ba} = -f^c_{ab} = -f^a_{bc} = f^a_{cb}.    (11.80)

Because of this antisymmetry, it is usual to lower the upper index

f^c_{ab} = f_{cab} = f_{abc}    (11.81)

and write the antisymmetry of the structure constants of compact Lie groups as

f_{acb} = -f_{cab} = f_{bac} = -f_{abc} = -f_{bca} = f_{cba}.    (11.82)

For compact Lie groups, the generators are hermitian, and so the structure constants f_{abc} are real, as we may see by taking the complex conjugate of the formula (11.76) for f_{abc}

f_{abc}^* = (i/k)\, \mathrm{Tr}\left(t_c\, [t_b, t_a]\right) = (-i/k)\, \mathrm{Tr}\left([t_a, t_b]\, t_c\right) = f_{abc}.    (11.83)

It follows from (11.68 and 11.81–11.83) that the commutator of any two generators of a Lie group is a linear combination

[t_a, t_b] = i f^c_{ab}\, t_c    (11.84)

of its generators t_c, and that the structure constants f_{abc} = f^c_{ab} are real and totally antisymmetric if the group is compact.

11.17 Yang and Mills Invent Local Nonabelian Symmetry

The action of a Yang–Mills theory is unchanged when a spacetime-dependent unitary matrix U(x) = \exp(-i t_a \theta^a(x)) maps a vector \psi(x) of matter fields to \psi'(x) = U(x)\,\psi(x). The symmetry \psi^\dagger(x)\, U^\dagger(x)\, U(x)\, \psi(x) = \psi^\dagger(x)\,\psi(x) is obvious, but how can kinetic terms like \partial_i \psi^\dagger\, \partial_i \psi be made invariant? Yang and Mills introduced matrices A_i = -i t_a A_i^a of gauge fields, replaced ordinary derivatives \partial_i by covariant derivatives D_i \equiv \partial_i + A_i, and required that covariant derivatives of fields transform like fields

\left(\partial_i + A_i'\right) U \psi = \left((\partial_i U) + U \partial_i + A_i' U\right) \psi = U \left(\partial_i + A_i\right) \psi.    (11.85)

Their nonabelian gauge transformation is

A_i'(x) = U(x)\, A_i(x)\, U^\dagger(x) - \left(\partial_i U(x)\right) U^\dagger(x).    (11.86)

Their Faraday tensor F_{ik} = [D_i, D_k] = \partial_i A_k - \partial_k A_i + [A_i, A_k] transforms as

F_{ik}'(x) = U(x)\, F_{ik}\, U^{-1}(x) = U(x)\, [D_i, D_k]\, U^{-1}(x).    (11.87)

11.18 Rotation Group

The rotations and reflections in 3-dimensional space form a compact group O(3) whose elements R are 3 × 3 real matrices that leave invariant the dot product of any two 3-vectors

(R x) \cdot (R y) = x^{\mathsf{T}} R^{\mathsf{T}} R\, y = x^{\mathsf{T}} I\, y = x \cdot y.    (11.88)

These matrices therefore are orthogonal (1.184)

R\, R^{\mathsf{T}} = I.    (11.89)

Taking the determinant of both sides and using the transpose (1.205) and product (1.225) rules, we have

(\det R)^2 = 1    (11.90)

whence \det R = \pm 1. The group O(3) contains reflections as well as rotations and so is not connected. The subgroup with \det R = 1 is the group SO(3). An SO(3) element near the identity R = I + \omega must satisfy

(I + \omega)^{\mathsf{T}}\, (I + \omega) = I.    (11.91)

Neglecting the tiny quadratic term, we find that the infinitesimal matrix \omega is antisymmetric

\omega^{\mathsf{T}} = -\omega.    (11.92)

One complete set of real 3 × 3 antisymmetric matrices is

\omega_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \quad \omega_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \quad \omega_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}    (11.93)

which we may write as

[\omega_b]_{ac} = \epsilon_{abc}    (11.94)

in which \epsilon_{abc} is the Levi-Civita symbol, which is totally antisymmetric with \epsilon_{123} = 1 (Tullio Levi-Civita 1873–1941). The \omega_b are antihermitian, but we make them hermitian by multiplying by i

t_b = i\, \omega_b   so that   [t_b]_{ac} = i\, \epsilon_{abc}    (11.95)

and R = I - i\, \theta_b\, t_b. The three hermitian generators t_a satisfy (Exercise 11.15) the commutation relations

[t_a, t_b] = i f_{abc}\, t_c    (11.96)

in which the structure constants are given by the Levi-Civita symbol \epsilon_{abc}

f_{abc} = \epsilon_{abc}    (11.97)

so that

[t_a, t_b] = i\, \epsilon_{abc}\, t_c.    (11.98)

They are the generators of the defining representation of SO(3) (and also of the adjoint representation of SU(2) (Section 11.25)). Physicists usually scale the generators by \hbar and define the angular-momentum generator L_a as

L_a = \hbar\, t_a    (11.99)

so that the eigenvalues of the angular-momentum operators are the physical values of the angular momenta. With \hbar, the commutation relations are

[L_a, L_b] = i \hbar\, \epsilon_{abc}\, L_c.    (11.100)

The matrix that represents a right-handed rotation (of an object) by an angle \theta = |\boldsymbol{\theta}| about an axis \boldsymbol{\theta} is

D(\boldsymbol{\theta}) = e^{-i \boldsymbol{\theta} \cdot \boldsymbol{t}} = e^{-i \boldsymbol{\theta} \cdot \boldsymbol{L}/\hbar}.    (11.101)

By using the fact (1.294) that a matrix obeys its characteristic equation, one may show (Exercise 11.17) that the 3 × 3 matrix D(\boldsymbol{\theta}) that represents a right-handed rotation of \theta radians about the axis \boldsymbol{\theta} is the matrix \exp(-i \boldsymbol{\theta} \cdot \boldsymbol{t}) (11.7) whose i, jth entry is

D_{ij}(\boldsymbol{\theta}) = \cos\theta\, \delta_{ij} - \sin\theta\, \epsilon_{ijk}\, \theta_k/\theta + (1 - \cos\theta)\, \theta_i\, \theta_j/\theta^2    (11.102)

in which a sum over k = 1, 2, 3 is understood.

A set of generators J_a equivalent to the antisymmetric L's and \omega's (11.93) but with J_3 diagonal is

J_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad J_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 & -i & 0 \\ i & 0 & -i \\ 0 & i & 0 \end{pmatrix}, \quad J_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}.    (11.103)
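Formula (11.102) is easy to test. A numpy sketch (mine), implementing the right-handed rotation matrix and checking that it is a proper orthogonal matrix:

```python
import numpy as np

def rotation(theta_vec):
    """3x3 matrix of eq. (11.102): right-handed rotation by |theta_vec| about theta_vec."""
    theta = np.linalg.norm(theta_vec)
    if theta == 0:
        return np.eye(3)
    n = np.asarray(theta_vec) / theta
    eps = np.zeros((3, 3, 3))
    eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1
    eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1
    return (np.cos(theta) * np.eye(3)
            - np.sin(theta) * np.einsum('ijk,k->ij', eps, n)
            + (1 - np.cos(theta)) * np.outer(n, n))

# A right-handed rotation by pi/2 about z carries x-hat into y-hat.
D = rotation([0, 0, np.pi / 2])
assert np.allclose(D @ [1, 0, 0], [0, 1, 0])

# Every such matrix is orthogonal with unit determinant: an element of SO(3).
D2 = rotation([0.3, -1.2, 0.7])
assert np.allclose(D2 @ D2.T, np.eye(3)) and np.isclose(np.linalg.det(D2), 1.0)
```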

Example 11.19 (Demonstration of commutation relations) Take a big sphere with a distinguished point and orient the sphere so that the point lies in the y-direction from the center of the sphere. Now rotate the sphere by a small angle, say 15 degrees or \epsilon = \pi/12, right-handedly about the x-axis, then right-handedly about the y-axis by the same angle, then left-handedly about the x-axis, and then left-handedly about the y-axis. Using the approximations (11.65 and 11.67) for the product of these four rotation matrices and the definitions (11.99) of the generators and of their structure constants (11.100), we have \hbar t_a = L_1 = L_x, \hbar t_b = L_2 = L_y, \hbar f_{abc}\, t_c = \epsilon_{12c}\, L_c = L_3 = L_z, and

e^{i\epsilon L_y/\hbar}\, e^{i\epsilon L_x/\hbar}\, e^{-i\epsilon L_y/\hbar}\, e^{-i\epsilon L_x/\hbar} \approx 1 + \frac{\epsilon^2}{\hbar^2}\, [L_x, L_y] = 1 + \frac{i\epsilon^2}{\hbar}\, L_z \approx e^{i\epsilon^2 L_z/\hbar}    (11.104)

which is a left-handed rotation about the (vertical) z-axis. The magnitude of that rotation should be about \epsilon^2 = (\pi/12)^2 \approx 0.069 or about 3.9 degrees. Photographs of an actual demonstration are displayed in Fig. 11.1. The demonstrated equation (11.104) shows (Exercise 11.16) that the generators L_x and L_y satisfy the commutation relation

[L_x, L_y] = i \hbar\, L_z    (11.105)

of the rotation group.

Figure 11.1 Demonstration of Equation (11.104) and the commutation relation (11.105). Upper left: black ball with a white stick pointing in the y-direction; the x-axis is to the reader's left, the z-axis is vertical. Upper right: ball after a small right-handed rotation about the x-axis. Center left: ball after that rotation is followed by a small right-handed rotation about the y-axis. Center right: ball after these rotations are followed by a small left-handed rotation about the x-axis. Bottom: ball after these rotations are followed by a small left-handed rotation about the y-axis. The net effect is approximately a small left-handed rotation about the z-axis.
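The four-rotation demonstration can also be run numerically. A numpy sketch (mine; it builds each rotation from the closed form (11.102) specialized to a coordinate axis, and uses a smaller angle than the 15 degrees of the photographs to tighten the O(ε³) error):

```python
import numpy as np

def rot(axis, angle):
    """Right-handed rotation by `angle` about coordinate axis 0, 1, or 2 (x, y, z)."""
    i, j = (axis + 1) % 3, (axis + 2) % 3
    R = np.eye(3)
    R[i, i] = R[j, j] = np.cos(angle)
    R[i, j], R[j, i] = -np.sin(angle), np.sin(angle)
    return R

eps = 0.1
# Rotate right-handedly about x, then about y, then left-handedly about x,
# then left-handedly about y (later rotations act from the left).
net = rot(1, -eps) @ rot(0, -eps) @ rot(1, eps) @ rot(0, eps)

# The net effect approximates a left-handed rotation by eps^2 about z, as in (11.104).
assert np.allclose(net, rot(2, -eps ** 2), atol=1e-2)
# And it is measurably different from the identity.
assert not np.allclose(net, np.eye(3), atol=eps ** 2 / 2)
```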

11.19 Rotations and Reflections in 2n Dimensions

The orthogonal group O(2n) of rotations and reflections in 2n dimensions is the group of all real 2n × 2n matrices O whose transposes O^{\mathsf{T}} are their inverses

O^{\mathsf{T}} O = O\, O^{\mathsf{T}} = I    (11.106)

in which I is the 2n × 2n identity matrix. These orthogonal matrices leave unchanged the distances from the origin of points in 2n dimensions. Those with unit determinant, \det O = 1, constitute the subgroup SO(2n) of rotations in 2n dimensions.

A symmetric sum \{A, B\} = AB + BA is called an anticommutator. Complex fermionic variables \psi_i obey the anticommutation relations

\{\psi_i, \psi_k^\dagger\} = \hbar\, \delta_{ik}, \quad \{\psi_i, \psi_k\} = 0, \quad \text{and} \quad \{\psi_i^\dagger, \psi_k^\dagger\} = 0.    (11.107)

Their real x_i and imaginary y_i parts

x_i = \frac{1}{\sqrt{2}}\left(\psi_i + \psi_i^\dagger\right)   and   y_i = \frac{1}{i\sqrt{2}}\left(\psi_i - \psi_i^\dagger\right)    (11.108)

obey the anticommutation relations

\{x_i, x_k\} = \hbar\, \delta_{ik}, \quad \{y_i, y_k\} = \hbar\, \delta_{ik}, \quad \text{and} \quad \{x_i, y_k\} = 0.    (11.109)

More simply, the anticommutation relations of these 2n hermitian variables v = (x_1, \ldots, x_n, y_1, \ldots, y_n) are

\{v_i, v_k\} = \hbar\, \delta_{ik}.    (11.110)

If the real linear transformation v_i' = L_{i1} v_1 + L_{i2} v_2 + \cdots + L_{i\,2n} v_{2n} preserves these anticommutation relations, then the matrix L must satisfy

\hbar\, \delta_{ik} = \{v_i', v_k'\} = L_{ij}\, L_{k\ell}\, \{v_j, v_\ell\} = L_{ij}\, L_{k\ell}\, \hbar\, \delta_{j\ell} = \hbar\, L_{ij}\, L_{kj}    (11.111)

which is the statement that it is orthogonal, L L^{\mathsf{T}} = I. Thus the group O(2n) is the largest group of linear transformations that preserve the anticommutation relations of the 2n hermitian real and imaginary parts v_i of n complex fermionic variables \psi_i.

11.20 Defining Representation of SU(2)

The smallest positive value of angular momentum is \hbar/2. The spin-one-half angular-momentum operators are represented by three 2 × 2 matrices

S_a = \frac{\hbar}{2}\, \sigma_a    (11.112)

in which the \sigma_a are the Pauli matrices

\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \text{and} \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}    (11.113)

which obey the multiplication law

\sigma_i\, \sigma_j = \delta_{ij} + i\, \epsilon_{ijk}\, \sigma_k    (11.114)

summed over k from 1 to 3. Since the symbol \epsilon_{ijk} is totally antisymmetric in i, j, and k, the Pauli matrices obey the commutation and anticommutation relations

[\sigma_i, \sigma_j] \equiv \sigma_i \sigma_j - \sigma_j \sigma_i = 2 i\, \epsilon_{ijk}\, \sigma_k
\{\sigma_i, \sigma_j\} \equiv \sigma_i \sigma_j + \sigma_j \sigma_i = 2\, \delta_{ij}.    (11.115)

The Pauli matrices divided by 2 satisfy the commutation relations (11.98) of the rotation group

\left[\tfrac{1}{2}\sigma_a, \tfrac{1}{2}\sigma_b\right] = i\, \epsilon_{abc}\, \tfrac{1}{2}\sigma_c    (11.116)

and generate the elements of the group SU(2)

\exp\!\left(i\, \frac{\boldsymbol{\sigma} \cdot \boldsymbol{\theta}}{2}\right) = I \cos\frac{\theta}{2} + i\, \hat{\boldsymbol{\theta}} \cdot \boldsymbol{\sigma}\, \sin\frac{\theta}{2}    (11.117)

in which I is the 2 × 2 identity matrix, \theta = \sqrt{\boldsymbol{\theta}^2}, and \hat{\boldsymbol{\theta}} = \boldsymbol{\theta}/\theta.

It follows from (11.116) that the spin operators (11.112) satisfy

[S_a, S_b] = i \hbar\, \epsilon_{abc}\, S_c.    (11.118)
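Both the multiplication law (11.114) and the closed form (11.117) are quick to verify with numpy. A sketch (mine), comparing the closed form against the exponential computed by eigendecomposition:

```python
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

# Multiplication law (11.114): sigma_i sigma_j = delta_ij I + i eps_ijk sigma_k.
assert np.allclose(sigma[0] @ sigma[1], 1j * sigma[2])
assert np.allclose(sigma[0] @ sigma[0], np.eye(2))

# Closed form (11.117): exp(i sigma.theta/2) = I cos(theta/2) + i theta-hat.sigma sin(theta/2).
theta_vec = np.array([0.3, -0.8, 0.5])
theta = np.linalg.norm(theta_vec)
n = theta_vec / theta
h = sum(0.5 * theta_vec[a] * sigma[a] for a in range(3))  # hermitian exponent sigma.theta/2
w, v = np.linalg.eigh(h)
series = v @ np.diag(np.exp(1j * w)) @ v.conj().T          # exp(i h) by eigendecomposition
closed = (np.cos(theta / 2) * np.eye(2)
          + 1j * np.sin(theta / 2) * sum(n[a] * sigma[a] for a in range(3)))
assert np.allclose(series, closed)

# The result is unitary with unit determinant: an element of SU(2).
assert np.allclose(closed @ closed.conj().T, np.eye(2))
assert np.isclose(np.linalg.det(closed), 1.0)
```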

11.21 The Lie Algebra and Representations of SU(2)

The three generators of SU(2) in its 2 × 2 defining representation are the Pauli matrices divided by 2, t_a = \sigma_a/2. The structure constants of SU(2) are f_{abc} = \epsilon_{abc}, which is totally antisymmetric with \epsilon_{123} = 1

[t_a, t_b] = i f_{abc}\, t_c = \left[\tfrac{1}{2}\sigma_a, \tfrac{1}{2}\sigma_b\right] = i\, \epsilon_{abc}\, \tfrac{1}{2}\sigma_c.    (11.119)

For every half-integer

j = \frac{n}{2}   for   n = 0, 1, 2, 3, \ldots    (11.120)

there is an irreducible representation of SU(2)

D^{(j)}(\boldsymbol{\theta}) = e^{-i \boldsymbol{\theta} \cdot \boldsymbol{J}^{(j)}}    (11.121)

in which the three generators t_a \equiv J_a^{(j)} are (2j+1) × (2j+1) square hermitian matrices. In a basis in which J_3^{(j)} is diagonal, the matrix elements of the complex linear combinations J_\pm^{(j)} \equiv J_1^{(j)} \pm i J_2^{(j)} are

\left(J_1^{(j)} \pm i J_2^{(j)}\right)_{s',s} = \delta_{s',\, s\pm 1}\, \sqrt{(j \mp s)(j \pm s + 1)}    (11.122)

where s and s' run from -j to j in integer steps, and those of J_3^{(j)} are

\left(J_3^{(j)}\right)_{s',s} = s\, \delta_{s',s}.    (11.123)

Borrowing a trick from Section 11.26, one may show that the square J^{(j)} \cdot J^{(j)} of the angular-momentum matrix commutes with every generator J_a^{(j)}. Thus J^{(j)\,2} commutes with D^{(j)}(\boldsymbol{\theta}) for every element of the group. Part 2 of Schur's lemma (Section 11.7) then implies that J^{(j)\,2} must be a multiple of the (2j+1) × (2j+1) identity matrix. The coefficient turns out to be j(j+1)

J^{(j)} \cdot J^{(j)} = j(j+1)\, I.    (11.124)

Combinations of generators that are a multiple of the identity are called Casimir operators.

Example 11.20 (Spin 2) For j = 2, the spin-two matrices J_+^{(2)} and J_3^{(2)} are

J_+^{(2)} = \begin{pmatrix} 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & \sqrt{6} & 0 & 0 \\ 0 & 0 & 0 & \sqrt{6} & 0 \\ 0 & 0 & 0 & 0 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}   and   J_3^{(2)} = \begin{pmatrix} 2 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & -2 \end{pmatrix}    (11.125)

and J_-^{(2)} = \left(J_+^{(2)}\right)^\dagger.
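The matrix elements (11.122) and (11.123) can be turned into code for any j, and the Casimir relation (11.124) then checked directly. A numpy sketch (mine):

```python
import numpy as np

def spin_matrices(j):
    """J_+, J_-, J_3 for spin j from (11.122) and (11.123), basis ordered s = j, ..., -j."""
    s = np.arange(j, -j - 1, -1)          # diagonal of J_3
    d = int(2 * j + 1)
    Jp = np.zeros((d, d))
    for col in range(1, d):               # (J_+)_{s', s} is nonzero for s' = s + 1
        m = s[col]
        Jp[col - 1, col] = np.sqrt((j - m) * (j + m + 1))
    return Jp, Jp.T, np.diag(s)

Jp, Jm, J3 = spin_matrices(2)

# Matches the spin-2 matrices (11.125).
assert np.isclose(Jp[0, 1], 2) and np.isclose(Jp[1, 2], np.sqrt(6))

# Casimir operator (11.124): J.J = j(j+1) I, using J^2 = (J+J- + J-J+)/2 + J3^2.
J2 = 0.5 * (Jp @ Jm + Jm @ Jp) + J3 @ J3
assert np.allclose(J2, 2 * 3 * np.eye(5))

# Commutation relations (in units with hbar = 1): [J3, J+] = J+ and [J+, J-] = 2 J3.
assert np.allclose(J3 @ Jp - Jp @ J3, Jp)
assert np.allclose(Jp @ Jm - Jm @ Jp, 2 * J3)
```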

The tensor product of any two irreducible representations D^{(j)} and D^{(k)} of SU(2) is equivalent to the direct sum of all the irreducible representations D^{(\ell)} for |j-k| \le \ell \le j+k

D^{(j)} \otimes D^{(k)} = \bigoplus_{\ell=|j-k|}^{j+k} D^{(\ell)}    (11.126)

each D^{(\ell)} occurring once.

Example 11.21 (Addition theorem) The spherical harmonics Y_{\ell m}(\theta, \phi) = \langle \theta, \phi|\ell, m\rangle of Section 9.13 transform under the (2\ell+1)-dimensional representation D^{(\ell)} of the rotation group. If a rotation R takes \theta, \phi into the vector \theta', \phi', so that |\theta', \phi'\rangle = U(R)\,|\theta, \phi\rangle, then summing over m' from -\ell to \ell, we get

Y_{\ell,m}^*(\theta', \phi') = \langle \ell, m|\theta', \phi'\rangle = \langle \ell, m|\, U(R)\, |\theta, \phi\rangle = \langle \ell, m|\, U(R)\, |\ell, m'\rangle \langle \ell, m'|\theta, \phi\rangle = D^{(\ell)}(R)_{m,m'}\, Y_{\ell,m'}^*(\theta, \phi).

Suppose now that a rotation R maps |\theta_1, \phi_1\rangle and |\theta_2, \phi_2\rangle into |\theta_1', \phi_1'\rangle = U(R)\,|\theta_1, \phi_1\rangle and |\theta_2', \phi_2'\rangle = U(R)\,|\theta_2, \phi_2\rangle. Then summing over the repeated indices m, m', and m'' from -\ell to \ell, we find

Y_{\ell,m}(\theta_1', \phi_1')\, Y_{\ell,m}^*(\theta_2', \phi_2') = D^{(\ell)}(R)^*_{m,m'}\, Y_{\ell,m'}(\theta_1, \phi_1)\, D^{(\ell)}(R)_{m,m''}\, Y_{\ell,m''}^*(\theta_2, \phi_2).

In this equation, the matrix element D^{(\ell)}(R)^*_{m,m'} is

D^{(\ell)}(R)^*_{m,m'} = \langle \ell, m|U(R)|\ell, m'\rangle^* = \langle \ell, m'|U^\dagger(R)|\ell, m\rangle = D^{(\ell)}(R^{-1})_{m',m}.

Thus since D^{(\ell)} is a representation of the rotation group, the product of the two D^{(\ell)}'s is

D^{(\ell)}(R)^*_{m,m'}\, D^{(\ell)}(R)_{m,m''} = D^{(\ell)}(R^{-1})_{m',m}\, D^{(\ell)}(R)_{m,m''} = D^{(\ell)}(R^{-1} R)_{m',m''} = D^{(\ell)}(I)_{m',m''} = \delta_{m',m''}.

So as long as the same rotation R maps \theta_1, \phi_1 into \theta_1', \phi_1' and \theta_2, \phi_2 into \theta_2', \phi_2', we have

\sum_{m=-\ell}^{\ell} Y_{\ell,m}(\theta_1', \phi_1')\, Y_{\ell,m}^*(\theta_2', \phi_2') = \sum_{m=-\ell}^{\ell} Y_{\ell,m}(\theta_1, \phi_1)\, Y_{\ell,m}^*(\theta_2, \phi_2).

We choose the rotation R as the product of a rotation that maps the unit vector \hat{n}(\theta_2, \phi_2) into \hat{n}(\theta_2', \phi_2') = \hat{z} = (0, 0, 1) and a rotation about the z axis that maps \hat{n}(\theta_1, \phi_1) into \hat{n}(\theta_1', \phi_1') = (\sin\theta, 0, \cos\theta) in the x-z plane where it makes an angle \theta with \hat{n}(\theta_2', \phi_2') = \hat{z}. We then have Y_{\ell,m}^*(\theta_2', \phi_2') = Y_{\ell,m}^*(0, 0) and Y_{\ell,m}(\theta_1', \phi_1') = Y_{\ell,m}(\theta, 0), in which \theta is the angle between the unit vectors \hat{n}(\theta_1', \phi_1') and \hat{n}(\theta_2', \phi_2'), which is the same as the angle between the unit vectors \hat{n}(\theta_1, \phi_1) and \hat{n}(\theta_2, \phi_2). The vanishing (9.108) at \theta = 0 of the associated Legendre functions P_{\ell,m} for m \neq 0 and the definitions (9.4, 9.101, and 9.112–9.114) say that Y_{\ell,m}^*(0, 0) = \sqrt{(2\ell+1)/4\pi}\, \delta_{m,0} and that Y_{\ell,0}(\theta, 0) = \sqrt{(2\ell+1)/4\pi}\, P_\ell(\cos\theta). Thus the identity above gives us the spherical harmonics addition theorem (9.123)

P_\ell(\cos\theta) = \frac{4\pi}{2\ell+1} \sum_{m=-\ell}^{\ell} Y_{\ell,m}(\theta_1, \phi_1)\, Y_{\ell,m}^*(\theta_2, \phi_2).

11.22 How a Field Transforms Under a Rotation

Under a rotation R, a field \psi_s(x) that transforms under the D^{(j)} representation of SU(2) responds as

U(R)\, \psi_s(x)\, U^{-1}(R) = D^{(j)}_{s,s'}(R^{-1})\, \psi_{s'}(Rx).    (11.127)

Example 11.22 (Spin and statistics) Suppose |a, m\rangle and |b, m\rangle are any eigenstates of the rotation operator J_3 with eigenvalue m (in units with \hbar = c = 1). If u and v are any two space-like points, then in some Lorentz frame they have spacetime coordinates u = (t, x, 0, 0) and v = (t, -x, 0, 0). Let U be the unitary operator that represents a right-handed rotation by \pi about the 3-axis or z-axis of this Lorentz frame. Then

U\,|a, m\rangle = e^{-im\pi}\,|a, m\rangle   and   \langle b, m|\,U^{-1} = \langle b, m|\,e^{im\pi}.    (11.128)

And by (11.127), U transforms a field \psi of spin j with x \equiv (x, 0, 0) to

U(R)\, \psi_s(t, x)\, U^{-1}(R) = D^{(j)}_{ss'}(R^{-1})\, \psi_{s'}(t, -x) = e^{is\pi}\, \psi_s(t, -x).    (11.129)

Thus by inserting the identity operator in the form I = U^{-1} U and using both (11.128) and (11.129), we find, since the phase factors \exp(-im\pi) and \exp(im\pi) cancel,

\langle b, m|\,\psi_s(t, x)\, \psi_s(t, -x)\,|a, m\rangle = \langle b, m|\,U \psi_s(t, x) U^{-1}\, U \psi_s(t, -x) U^{-1}\,|a, m\rangle = e^{2i\pi s}\, \langle b, m|\,\psi_s(t, -x)\, \psi_s(t, x)\,|a, m\rangle.    (11.130)

Now if j is an integer, then so is s, and the phase factor \exp(2i\pi s) = 1 is unity. In this case, we find that the mean value of the equal-time commutator vanishes

\langle b, m|\,[\psi_s(t, x), \psi_s(t, -x)]\,|a, m\rangle = 0    (11.131)

which suggests that fields of integral spin commute at space-like separations. They represent bosons. On the other hand, if j is half an odd integer, that is, j = (2n+1)/2, where n is an integer, then the phase factor \exp(2i\pi s) = -1 is minus one. In this case, the mean value of the equal-time anticommutator vanishes

\langle b, m|\,\{\psi_s(t, x), \psi_s(t, -x)\}\,|a, m\rangle = 0    (11.132)

which suggests that fields of half-odd-integral spin anticommute at space-like separations. They represent fermions. This argument shows that the behavior of fields under rotations is related to their equal-time commutation or anticommutation relations

\psi_s(t, x)\, \psi_{s'}(t, x') + (-1)^{2j}\, \psi_{s'}(t, x')\, \psi_s(t, x) = 0    (11.133)

and their statistics.

11 Group Theory

11.23 Addition of Two Spin-One-Half Systems The spin operators (11.112) Sa

= ±2 σa

(11.134)

obey the commutation relation (11.118)

[Sa , Sb ] = i ± ²abc Sc .

(11.135)

The raising and lowering operators S¶

= S1 ¶ i S2

(11.136)

have simple commutators with S3

[ S3, S¶] = ¶ ± S¶ . (11.137) This relation implies that if the state | 21 , m ´ is an eigenstate of S3 with eigenvalue ±m, then the states S¶ | 21 , m ´ either vanish or are eigenstates of S3 with eigenvalues ±(m ¶ 1) 1 1 1 1 , m ´ = S¶ S3 | , m ´ ¶ ±S ¶| , m ´ = ±(m ¶ 1 ) S¶ | , m ´. (11.138) 2 2 2 2 Thus the raising and lowering operators raise and lower the eigenvalues of S3 . The eigenvalues of S 3 = ±σ3/2 are ¶±/2. So with the usual sign and normalization conventions S3 S¶|

while

S+ |−´ = ±|+´

and

S− |+´ = ±|−´

(11.139)

S +|+´ = 0

and

S− |−´ = 0.

(11.140)

The square of the total spin operator is simply related to the raising and lowering operators and to $S_3$
$$S^2 = S_1^2 + S_2^2 + S_3^2 = \tfrac{1}{2}\, S_+ S_- + \tfrac{1}{2}\, S_- S_+ + S_3^2. \tag{11.141}$$
But the squares of the Pauli matrices are unity, and so $S_a^2 = (\hbar/2)^2$ for all three values of $a$. Thus
$$S^2 = \tfrac{3}{4}\, \hbar^2 \tag{11.142}$$
is a Casimir operator (11.124) for a spin-one-half system.
Consider two spin operators $S^{(1)}$ and $S^{(2)}$ as defined by (11.112) acting on two spin-one-half systems. Let the tensor-product states
$$|\pm, \pm\rangle = |\pm\rangle_1 |\pm\rangle_2 = |\pm\rangle_1 \otimes |\pm\rangle_2 \tag{11.143}$$

be eigenstates of $S_3^{(1)}$ and $S_3^{(2)}$ so that
$$\begin{aligned}
S_3^{(1)} |+, \pm\rangle &= \tfrac{\hbar}{2}\, |+, \pm\rangle \quad &\text{and} \quad S_3^{(2)} |\pm, +\rangle &= \tfrac{\hbar}{2}\, |\pm, +\rangle \\
S_3^{(1)} |-, \pm\rangle &= -\tfrac{\hbar}{2}\, |-, \pm\rangle \quad &\text{and} \quad S_3^{(2)} |\pm, -\rangle &= -\tfrac{\hbar}{2}\, |\pm, -\rangle.
\end{aligned} \tag{11.144}$$

The total spin of the system is the sum of the two spins $S = S^{(1)} + S^{(2)}$, so
$$S^2 = \left(S^{(1)} + S^{(2)}\right)^2 \quad \text{and} \quad S_3 = S_3^{(1)} + S_3^{(2)}. \tag{11.145}$$
The state $|+, +\rangle$ is an eigenstate of $S_3$ with eigenvalue $\hbar$
$$S_3 |+, +\rangle = S_3^{(1)} |+, +\rangle + S_3^{(2)} |+, +\rangle = \tfrac{\hbar}{2}\, |+, +\rangle + \tfrac{\hbar}{2}\, |+, +\rangle = \hbar\, |+, +\rangle. \tag{11.146}$$
So the state of angular momentum $\hbar$ in the 3-direction is $|1, 1\rangle = |+, +\rangle$. Similarly, the state $|-, -\rangle$ is an eigenstate of $S_3$ with eigenvalue $-\hbar$
$$S_3 |-, -\rangle = S_3^{(1)} |-, -\rangle + S_3^{(2)} |-, -\rangle = -\tfrac{\hbar}{2}\, |-, -\rangle - \tfrac{\hbar}{2}\, |-, -\rangle = -\hbar\, |-, -\rangle \tag{11.147}$$
and so the state of angular momentum $\hbar$ in the negative 3-direction is $|1, -1\rangle = |-, -\rangle$. The states $|+, -\rangle$ and $|-, +\rangle$ are eigenstates of $S_3$ with eigenvalue 0
$$\begin{aligned}
S_3 |+, -\rangle &= S_3^{(1)} |+, -\rangle + S_3^{(2)} |+, -\rangle = \tfrac{\hbar}{2}\, |+, -\rangle - \tfrac{\hbar}{2}\, |+, -\rangle = 0 \\
S_3 |-, +\rangle &= S_3^{(1)} |-, +\rangle + S_3^{(2)} |-, +\rangle = -\tfrac{\hbar}{2}\, |-, +\rangle + \tfrac{\hbar}{2}\, |-, +\rangle = 0.
\end{aligned} \tag{11.148}$$
To see which states are eigenstates of $S^2$, we use the lowering operator for the combined system $S_- = S_-^{(1)} + S_-^{(2)}$ and the rules (11.122, 11.139, and 11.140) to lower the state $|1, 1\rangle$
$$S_- |+, +\rangle = \left(S_-^{(1)} + S_-^{(2)}\right) |+, +\rangle = \hbar\, (|-, +\rangle + |+, -\rangle) = \hbar \sqrt{2}\, |1, 0\rangle.$$
Thus the state $|1, 0\rangle$ is
$$|1, 0\rangle = \tfrac{1}{\sqrt{2}}\, (|+, -\rangle + |-, +\rangle). \tag{11.149}$$

The orthogonal and normalized combination of $|+, -\rangle$ and $|-, +\rangle$ must be the state of spin zero
$$|0, 0\rangle = \tfrac{1}{\sqrt{2}}\, (|+, -\rangle - |-, +\rangle) \tag{11.150}$$
with the usual sign convention. To check that the states $|1, 0\rangle$ and $|0, 0\rangle$ really are eigenstates of $S^2$, we use (11.141 and 11.142) to write $S^2$ as
$$S^2 = \left(S^{(1)} + S^{(2)}\right)^2 = \tfrac{3}{2}\, \hbar^2 + 2\, S^{(1)} \!\cdot S^{(2)} = \tfrac{3}{2}\, \hbar^2 + S_+^{(1)} S_-^{(2)} + S_-^{(1)} S_+^{(2)} + 2\, S_3^{(1)} S_3^{(2)}. \tag{11.151}$$
Now the sum $S_+^{(1)} S_-^{(2)} + S_-^{(1)} S_+^{(2)}$ merely interchanges the states $|+, -\rangle$ and $|-, +\rangle$ and multiplies them by $\hbar^2$, so
$$S^2 |1, 0\rangle = \tfrac{3}{2}\, \hbar^2 |1, 0\rangle + \hbar^2 |1, 0\rangle - \tfrac{1}{2}\, \hbar^2 |1, 0\rangle = 2 \hbar^2 |1, 0\rangle = s(s+1)\, \hbar^2 |1, 0\rangle \tag{11.152}$$
which confirms that $s = 1$. Because of the relative minus sign in formula (11.150) for the state $|0, 0\rangle$, we have
$$S^2 |0, 0\rangle = \tfrac{3}{2}\, \hbar^2 |0, 0\rangle - \hbar^2 |0, 0\rangle - \tfrac{1}{2}\, \hbar^2 |0, 0\rangle = 0 = s(s+1)\, \hbar^2 |0, 0\rangle \tag{11.153}$$
which confirms that $s = 0$.
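The triplet and singlet eigenvalue equations can be confirmed directly by building $S^2$ on the 4-dimensional tensor-product space with Kronecker products. This NumPy sketch (ours, not the text's; $\hbar = 1$) does so:

```python
import numpy as np

hbar = 1.0
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
I2 = np.eye(2)

# Total spin S_a = S_a^(1) + S_a^(2) on the tensor-product space (11.145)
S = [hbar / 2 * (np.kron(si, I2) + np.kron(I2, si)) for si in sigma]
S2 = sum(Sa @ Sa for Sa in S)

up, dn = np.array([1.0, 0.0]), np.array([0.0, 1.0])
triplet0 = (np.kron(up, dn) + np.kron(dn, up)) / np.sqrt(2)  # |1,0>  (11.149)
singlet  = (np.kron(up, dn) - np.kron(dn, up)) / np.sqrt(2)  # |0,0>  (11.150)

# S^2 eigenvalues are s(s+1) hbar^2: 2 hbar^2 for the triplet, 0 for the singlet
print(np.allclose(S2 @ triplet0, 2 * hbar**2 * triplet0))  # (11.152)
print(np.allclose(S2 @ singlet, 0))                        # (11.153)
```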

Example 11.23 (Two equivalent representations of SU(2)) The identity
$$\left[\exp\left(i\, \boldsymbol{\theta} \cdot \frac{\boldsymbol{\sigma}}{2}\right)\right]^* = \sigma_2\, \exp\left(i\, \boldsymbol{\theta} \cdot \frac{\boldsymbol{\sigma}}{2}\right) \sigma_2 \tag{11.154}$$
shows that the defining representation of SU(2) (Section 11.20) and its complex conjugate are equivalent (11.8) representations. To prove this identity, we expand the exponential on the right-hand side in powers of its argument
$$\sigma_2\, \exp\left(i\, \boldsymbol{\theta} \cdot \frac{\boldsymbol{\sigma}}{2}\right) \sigma_2 = \sigma_2 \left[\sum_{n=0}^{\infty} \frac{1}{n!} \left(i\, \boldsymbol{\theta} \cdot \frac{\boldsymbol{\sigma}}{2}\right)^n\right] \sigma_2 \tag{11.155}$$
and use the fact that $\sigma_2$ is its own inverse to get
$$\sigma_2\, \exp\left(i\, \boldsymbol{\theta} \cdot \frac{\boldsymbol{\sigma}}{2}\right) \sigma_2 = \sum_{n=0}^{\infty} \frac{1}{n!} \left[\sigma_2 \left(i\, \boldsymbol{\theta} \cdot \frac{\boldsymbol{\sigma}}{2}\right) \sigma_2\right]^n. \tag{11.156}$$
Since the Pauli matrices obey the anticommutation relation (11.115), and since both $\sigma_1$ and $\sigma_3$ are real while $\sigma_2$ is imaginary, we can write the $2 \times 2$ matrix within the square brackets as
$$\sigma_2 \left(i\, \boldsymbol{\theta} \cdot \frac{\boldsymbol{\sigma}}{2}\right) \sigma_2 = -i \theta_1 \frac{\sigma_1}{2} + i \theta_2 \frac{\sigma_2}{2} - i \theta_3 \frac{\sigma_3}{2} = \left(i\, \boldsymbol{\theta} \cdot \frac{\boldsymbol{\sigma}}{2}\right)^* \tag{11.157}$$
which implies the identity (11.154)
$$\sigma_2\, \exp\left(i\, \boldsymbol{\theta} \cdot \frac{\boldsymbol{\sigma}}{2}\right) \sigma_2 = \sum_{n=0}^{\infty} \frac{1}{n!} \left[\left(i\, \boldsymbol{\theta} \cdot \frac{\boldsymbol{\sigma}}{2}\right)^*\right]^n = \left[\exp\left(i\, \boldsymbol{\theta} \cdot \frac{\boldsymbol{\sigma}}{2}\right)\right]^*. \tag{11.158}$$
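The identity (11.154) can be checked numerically using the closed form $\exp(i\,\boldsymbol{\theta}\cdot\boldsymbol{\sigma}/2) = I \cos(\theta/2) + i\,\hat{\boldsymbol{\theta}}\cdot\boldsymbol{\sigma}\,\sin(\theta/2)$ (equation 11.250 below). A short NumPy sketch, ours rather than the book's:

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

def u(theta):
    """exp(i theta.sigma/2) via the closed form (11.250)."""
    t = np.linalg.norm(theta)
    n = theta / t
    return np.cos(t / 2) * np.eye(2) + 1j * np.sin(t / 2) * (n[0]*s1 + n[1]*s2 + n[2]*s3)

U = u(np.array([0.3, -1.1, 0.7]))
# sigma_2 exp(i theta.sigma/2) sigma_2 = [exp(i theta.sigma/2)]^*   (11.154)
print(np.allclose(s2 @ U @ s2, U.conj()))
```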


11.24 Jacobi Identity

Any three square matrices $A$, $B$, and $C$ satisfy the commutator-product rule
$$[A, BC] = ABC - BCA = ABC - BAC + BAC - BCA = [A, B]\, C + B\, [A, C]. \tag{11.159}$$
Interchanging $B$ and $C$ gives
$$[A, CB] = [A, C]\, B + C\, [A, B]. \tag{11.160}$$
Subtracting the second equation from the first, we get the Jacobi identity
$$[A, [B, C]] = [[A, B], C] + [B, [A, C]] \tag{11.161}$$
and its equivalent cyclic form
$$[A, [B, C]] + [B, [C, A]] + [C, [A, B]] = 0. \tag{11.162}$$
Another Jacobi identity uses the anticommutator $\{A, B\} \equiv AB + BA$
$$\{[A, B], C\} + \{[A, C], B\} + [\{B, C\}, A] = 0. \tag{11.163}$$
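Both identities hold for arbitrary square matrices, so they are easy to spot-check with random matrices. A minimal NumPy sketch (our own check, not part of the text):

```python
import numpy as np

def comm(X, Y):
    """Commutator [X, Y]."""
    return X @ Y - Y @ X

def acomm(X, Y):
    """Anticommutator {X, Y}."""
    return X @ Y + Y @ X

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((4, 4)) for _ in range(3))

# Cyclic Jacobi identity (11.162)
jacobi = comm(A, comm(B, C)) + comm(B, comm(C, A)) + comm(C, comm(A, B))
print(np.allclose(jacobi, 0))

# Anticommutator form (11.163)
mixed = acomm(comm(A, B), C) + acomm(comm(A, C), B) + comm(acomm(B, C), A)
print(np.allclose(mixed, 0))
```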

11.25 Adjoint Representations

Any three generators $t_a$, $t_b$, and $t_c$ satisfy the Jacobi identity (11.162)
$$[t_a, [t_b, t_c]] + [t_b, [t_c, t_a]] + [t_c, [t_a, t_b]] = 0. \tag{11.164}$$
By using the structure-constant formula (11.84), we may express each of these double commutators as a linear combination of the generators
$$\begin{aligned}
[t_a, [t_b, t_c]] &= [t_a, i f_{bcd}\, t_d] = -f_{bcd}\, f_{ade}\, t_e \\
[t_b, [t_c, t_a]] &= [t_b, i f_{cad}\, t_d] = -f_{cad}\, f_{bde}\, t_e \\
[t_c, [t_a, t_b]] &= [t_c, i f_{abd}\, t_d] = -f_{abd}\, f_{cde}\, t_e.
\end{aligned} \tag{11.165}$$
So the Jacobi identity (11.164) implies that
$$\left(f_{bcd}\, f_{ade} + f_{cad}\, f_{bde} + f_{abd}\, f_{cde}\right) t_e = 0 \tag{11.166}$$
or, since the generators are linearly independent,
$$f_{bcd}\, f_{ade} + f_{cad}\, f_{bde} + f_{abd}\, f_{cde} = 0. \tag{11.167}$$
If we define a set of matrices $T_a$ by
$$(T_b)_{ac} = i f_{abc} \tag{11.168}$$
then, since the structure constants are antisymmetric in their indices, we may write the three terms in the preceding equation (11.167) as
$$f_{bcd}\, f_{ade} = f_{cbd}\, f_{dae} = (-T_b T_a)_{ce} \tag{11.169}$$
$$f_{cad}\, f_{bde} = -f_{cad}\, f_{dbe} = (T_a T_b)_{ce} \tag{11.170}$$
$$f_{abd}\, f_{cde} = -i f_{abd}\, (T_d)_{ce} \tag{11.171}$$
or in matrix notation
$$[T_a, T_b] = i f_{abc}\, T_c. \tag{11.172}$$
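For SU(2), whose structure constants are $f_{abc} = \epsilon_{abc}$, the rule (11.168) and the adjoint-representation algebra (11.172) can be checked in a few lines of NumPy (a sketch of ours, not the book's):

```python
import numpy as np

# Structure constants of SU(2): f_abc = epsilon_abc
eps = np.zeros((3, 3, 3))
for a, b, c in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[a, b, c], eps[a, c, b] = 1.0, -1.0

# Adjoint generators (T_b)_ac = i f_abc, the rule (11.168)
T = [1j * eps[:, b, :] for b in range(3)]

# They obey the same algebra (11.172): [T_a, T_b] = i f_abc T_c
ok = all(
    np.allclose(T[a] @ T[b] - T[b] @ T[a],
                1j * sum(eps[a, b, c] * T[c] for c in range(3)))
    for a in range(3) for b in range(3)
)
print(ok)
```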

So the matrices $T_a$, which we made out of the structure constants by the rule $(T_b)_{ac} = i f_{abc}$ (11.168), obey the same algebra (11.68) as do the generators $t_a$. They are the generators in the adjoint representation of the Lie algebra. If the Lie algebra has $N$ generators $t_a$, then the $N$ generators $T_a$ in the adjoint representation are $N \times N$ matrices.

11.26 Casimir Operators

For any compact Lie algebra, the sum of the squares of all the generators
$$C = \sum_{a=1}^{N} t_a t_a \equiv t_a t_a \tag{11.173}$$
commutes with every generator $t_b$
$$[C, t_b] = [t_a t_a, t_b] = [t_a, t_b]\, t_a + t_a\, [t_a, t_b] = i f_{abc}\, t_c t_a + t_a\, i f_{abc}\, t_c = i \left(f_{abc} + f_{cba}\right) t_c t_a = 0 \tag{11.174}$$
because of the total antisymmetry (11.82) of the structure constants. This sum, called a Casimir operator, commutes with every matrix
$$[C, D(\alpha)] = [C, \exp(i \alpha_a t_a)] = 0 \tag{11.175}$$
of the representation generated by the $t_a$'s. Thus by part 2 of Schur's lemma (Section 11.7), it must be a multiple of the identity matrix
$$C = t_a t_a = c I. \tag{11.176}$$
The constant $c$ depends upon the representation $D(\alpha)$ and is called the quadratic Casimir
$$C_2(D) = \mathrm{Tr}\left(t_a^2\right) \big/ \mathrm{Tr}\, I. \tag{11.177}$$


Example 11.24 (Quadratic Casimirs of SU(2)) The quadratic Casimir $C_2(2)$ of the defining representation of SU(2) with generators $t_a = \sigma_a/2$ (11.113) is
$$C_2(2) = \mathrm{Tr}\left[\sum_{a=1}^{3} \left(\frac{\sigma_a}{2}\right)^2\right] \Big/ \mathrm{Tr}(I) = \frac{3 \cdot 2 \cdot (1/2)^2}{2} = \frac{3}{4}. \tag{11.178}$$
That of the adjoint representation (11.95) is
$$C_2(3) = \mathrm{Tr}\left[\sum_{b=1}^{3} t_b^2\right] \Big/ \mathrm{Tr}(I) = \sum_{a,b,c=1}^{3} \frac{i \epsilon_{abc}\, i \epsilon_{cba}}{3} = 2. \tag{11.179}$$

The generators of some noncompact groups come in pairs $t_a$ and $i t_a$, and so the sum of the squares of these generators vanishes, $C = t_a t_a - t_a t_a = 0$.

11.27 Tensor Operators for the Rotation Group

Suppose $A_m^{(j)}$ is a set of $2j + 1$ operators whose commutation relations with the generators $J_i$ of rotations are
$$[J_i, A_m^{(j)}] = A_s^{(j)}\, (J_i^{(j)})_{sm} \tag{11.180}$$
in which the sum over $s$ runs from $-j$ to $j$. Then $A^{(j)}$ is said to be a spin-$j$ tensor operator for the group SU(2).

Example 11.25 (A spin-one tensor operator) For instance, if $j = 1$, then $(J_i^{(1)})_{sm} = i \hbar\, \epsilon_{sim}$, and so a spin-1 tensor operator of SU(2) is a vector $A_m^{(1)}$ that transforms as
$$[J_i, A_m^{(1)}] = A_s^{(1)}\, i \hbar\, \epsilon_{sim} = i \hbar\, \epsilon_{ims}\, A_s^{(1)} \tag{11.181}$$
under rotations.

Let's rewrite the definition (11.180) as
$$J_i\, A_m^{(j)} = A_s^{(j)}\, (J_i^{(j)})_{sm} + A_m^{(j)}\, J_i \tag{11.182}$$
and specialize to the case $i = 3$ so that $(J_3^{(j)})_{sm}$ is diagonal, $(J_3^{(j)})_{sm} = \hbar m\, \delta_{sm}$
$$J_3\, A_m^{(j)} = A_s^{(j)}\, (J_3^{(j)})_{sm} + A_m^{(j)}\, J_3 = A_s^{(j)}\, \hbar m\, \delta_{sm} + A_m^{(j)}\, J_3 = A_m^{(j)}\, (\hbar m + J_3). \tag{11.183}$$


Thus if the state $|j, s, E\rangle$ is an eigenstate of $J_3$ with eigenvalue $\hbar s$, then the state $A_m^{(j)} |j, s, E\rangle$ is an eigenstate of $J_3$ with eigenvalue $\hbar(m + s)$
$$J_3\, A_m^{(j)} |j, s, E\rangle = A_m^{(j)}\, (\hbar m + J_3)\, |j, s, E\rangle = \hbar\, (m + s)\, A_m^{(j)} |j, s, E\rangle. \tag{11.184}$$
The $J_3$ eigenvalues of the tensor operator $A_m^{(j)}$ and the state $|j, s, E\rangle$ add.

11.28 Simple and Semisimple Lie Algebras

An invariant subalgebra is a set of generators $t_a^{(i)}$ whose commutator with every generator $t_b$ of the group is a linear combination of the generators $t_c^{(i)}$ of the invariant subalgebra
$$[t_a^{(i)}, t_b] = i f_{abc}\, t_c^{(i)}. \tag{11.185}$$
The whole algebra and the null algebra are trivial invariant subalgebras. An algebra with no nontrivial invariant subalgebras is a simple algebra. A simple algebra generates a simple group. An algebra that has no nontrivial abelian invariant subalgebras is a semisimple algebra. A semisimple algebra generates a semisimple group.

Example 11.26 (Some simple Lie groups) The groups of unitary matrices of unit determinant SU(2), SU(3), . . . are simple. So are the groups of orthogonal matrices of unit determinant SO(n) (except SO(4), which is semisimple) and the groups of symplectic matrices Sp(2n) (Section 11.33).

Example 11.27 (Unification and grand unification) The symmetry group of the standard model of particle physics is a direct product of an SU(3) group that acts on colored fields, an SU(2) group that acts on left-handed quark and lepton fields, and a U(1) group that acts on fields that carry hypercharge. Each of these three groups is an invariant subgroup of the full symmetry group $SU(3)_c \otimes SU(2)_\ell \otimes U(1)_Y$, and the last one is abelian. Thus the symmetry group of the standard model is neither simple nor semisimple. In theories of grand unification, the strong and electroweak interactions unify at very high energies and are described by a simple group which makes all its charges simple multiples of each other. Georgi and Glashow suggested the group SU(5) in 1974 (Howard Georgi, 1947– ; Sheldon Glashow, 1932– ). Others have proposed SO(10) and even bigger groups.


11.29 SU(3)

The Gell-Mann matrices are
$$\begin{aligned}
\lambda_1 &= \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, &
\lambda_2 &= \begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, &
\lambda_3 &= \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \\
\lambda_4 &= \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}, &
\lambda_5 &= \begin{pmatrix} 0 & 0 & -i \\ 0 & 0 & 0 \\ i & 0 & 0 \end{pmatrix}, &
\lambda_6 &= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \\
\lambda_7 &= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -i \\ 0 & i & 0 \end{pmatrix}, &
\lambda_8 &= \frac{1}{\sqrt{3}} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{pmatrix}.
\end{aligned} \tag{11.186}$$
The generators $t_a$ of the $3 \times 3$ defining representation of SU(3) are these Gell-Mann matrices divided by 2
$$t_a = \lambda_a / 2 \tag{11.187}$$
(Murray Gell-Mann, 1929–). The eight generators $t_a$ are orthogonal with $k = 1/2$
$$\mathrm{Tr}\, (t_a t_b) = \tfrac{1}{2}\, \delta_{ab} \tag{11.188}$$
and satisfy the commutation relation
$$[t_a, t_b] = i f_{abc}\, t_c. \tag{11.189}$$
The trace formula (11.72) gives us the SU(3) structure constants as
$$f_{abc} = -2i\, \mathrm{Tr}\left([t_a, t_b]\, t_c\right). \tag{11.190}$$
They are real and totally antisymmetric with $f_{123} = 1$, $f_{458} = f_{678} = \sqrt{3}/2$, and $f_{147} = -f_{156} = f_{246} = f_{257} = f_{345} = -f_{367} = 1/2$.
While no two generators of SU(2) commute, two generators of SU(3) do. In the representation (11.186, 11.187), $t_3$ and $t_8$ are diagonal and so commute
$$[t_3, t_8] = 0. \tag{11.191}$$
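The trace formula (11.190) and the quoted values of the structure constants can be reproduced numerically from the matrices (11.186). A NumPy sketch of ours (array indices run from 0, so `t[0]` is $t_1$, etc.):

```python
import numpy as np

# Gell-Mann matrices (11.186)
lam = np.zeros((8, 3, 3), dtype=complex)
lam[0][0, 1] = lam[0][1, 0] = 1
lam[1][0, 1], lam[1][1, 0] = -1j, 1j
lam[2][0, 0], lam[2][1, 1] = 1, -1
lam[3][0, 2] = lam[3][2, 0] = 1
lam[4][0, 2], lam[4][2, 0] = -1j, 1j
lam[5][1, 2] = lam[5][2, 1] = 1
lam[6][1, 2], lam[6][2, 1] = -1j, 1j
lam[7] = np.diag([1, 1, -2]) / np.sqrt(3)

t = lam / 2  # generators t_a = lambda_a / 2 (11.187)

def comm(X, Y):
    return X @ Y - Y @ X

# Structure constants from the trace formula (11.190)
def f(a, b, c):
    return (-2j * np.trace(comm(t[a], t[b]) @ t[c])).real

print(np.isclose(f(0, 1, 2), 1.0))            # f_123 = 1
print(np.isclose(f(3, 4, 7), np.sqrt(3) / 2)) # f_458 = sqrt(3)/2
print(np.isclose(f(0, 3, 6), 0.5))            # f_147 = 1/2
print(np.allclose(comm(t[2], t[7]), 0))       # [t_3, t_8] = 0  (11.191)
```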

They generate the Cartan subalgebra (11.199) of SU(3).

11.30 SU(3) and Quarks

The generators (11.186 and 11.187) give us the $3 \times 3$ representation
$$D(\alpha) = \exp(i \alpha_a t_a) \tag{11.192}$$
in which the sum over $a = 1, 2, \ldots, 8$ runs over the eight generators $t_a$. This representation acts on complex 3-vectors and is called the $\mathbf{3}$. Note that if
$$D(\alpha_1)\, D(\alpha_2) = D(\alpha_3) \tag{11.193}$$
then the complex conjugates of these matrices obey the same multiplication rule
$$D^*(\alpha_1)\, D^*(\alpha_2) = D^*(\alpha_3) \tag{11.194}$$
and so form another representation of SU(3). It turns out that (unlike in SU(2)) this representation is inequivalent to the $\mathbf{3}$; it is the $\overline{\mathbf{3}}$.
There are three quarks with masses less than about 100 MeV/$c^2$ – the u, d, and s quarks. The other three quarks c, b, and t are more massive; $m_c = 1.28$ GeV, $m_b = 4.18$ GeV, and $m_t = 173.1$ GeV. Nobody knows why. Gell-Mann and Zweig suggested that the low-energy strong interactions were approximately invariant under unitary transformations of the three light quarks, which they represented by a $\mathbf{3}$, and of the three light antiquarks, which they represented by a $\overline{\mathbf{3}}$. They imagined that the eight light pseudoscalar mesons, that is, the three pions $\pi^-, \pi^0, \pi^+$, the neutral $\eta$, and the four kaons $K^0, K^+, K^-, \overline{K}^0$, were composed of a quark and an antiquark. So they should transform as the tensor product
$$\mathbf{3} \otimes \overline{\mathbf{3}} = \mathbf{8} \oplus \mathbf{1}. \tag{11.195}$$
They put the eight pseudoscalar mesons into an $\mathbf{8}$. They imagined that the eight light baryons – the two nucleons $N$ and $P$, the three sigmas $\Sigma^-, \Sigma^0, \Sigma^+$, the neutral lambda $\Lambda$, and the two cascades $\Xi^-$ and $\Xi^0$ – were each made of three quarks. They should transform as the tensor product
$$\mathbf{3} \otimes \mathbf{3} \otimes \mathbf{3} = \mathbf{10} \oplus \mathbf{8} \oplus \mathbf{8} \oplus \mathbf{1}. \tag{11.196}$$

They put the eight light baryons into one of these $\mathbf{8}$'s. When they were writing these papers, there were nine spin-3/2 resonances with masses somewhat heavier than 1200 MeV/$c^2$ – four $\Delta$'s, three $\Sigma^*$'s, and two $\Xi^*$'s. They put these into the $\mathbf{10}$ and predicted the tenth and its mass. In 1964, a tenth spin-3/2 resonance, the $\Omega^-$, was found with a mass close to their prediction of 1680 MeV/$c^2$, and by 1973 an MIT–SLAC team had discovered quarks inside protons and neutrons. (George Zweig, 1937–)

11.31 Fierz Identity for SU(n)

In terms of the $n \times n$ matrices $t^a$ that are the $n^2 - 1$ hermitian generators of the fundamental (defining) representation of SU(n) with
$$\mathrm{Tr}\left(t^a t^b\right) = k\, \delta_{ab}, \tag{11.197}$$


Fierz’s identity for SU (n ) (Nishi, 2005) 1 k

¸

n2 −1 a=1

1 δi j δk ± n

tiaj tka± +

= δi

±

(11.198)

δk j

follows from his identity (1.171) for the generators of U(n).

11.32 Cartan Subalgebra

In any Lie group, the maximum set of mutually commuting generators $H_a$ generate the Cartan subalgebra
$$[H_a, H_b] = 0 \tag{11.199}$$
which is an abelian subalgebra. The number of generators in the Cartan subalgebra is the rank of the Lie algebra. The Cartan generators $H_a$ can be simultaneously diagonalized, and their eigenvalues or diagonal elements are the weights
$$H_a\, |\mu, x, D\rangle = \mu_a\, |\mu, x, D\rangle \tag{11.200}$$
in which $D$ labels the representation and $x$ whatever other variables are needed to specify the state. The vector $\mu$ is the weight vector. The roots are the weights of the adjoint representation.

11.33 Symplectic Group Sp(2n)

The real symplectic group Sp(2n, ℝ) is the group of real linear transformations $R_{i\ell}$ that preserve the canonical commutation relations of quantum mechanics

$$[q_i, p_k] = i \hbar\, \delta_{ik}, \quad [q_i, q_k] = 0, \quad \text{and} \quad [p_i, p_k] = 0 \tag{11.201}$$
for $i, k = 1, \ldots, n$. In terms of the $2n$-vector $v = (q_1, \ldots, q_n, p_1, \ldots, p_n)$ of quantum variables, these commutation relations are $[v_i, v_k] = i \hbar\, J_{ik}$ where $J$ is the $2n \times 2n$ real matrix
$$J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix} \tag{11.202}$$
in which $I$ is the $n \times n$ identity matrix. The real linear transformation
$$v_i' = \sum_{\ell=1}^{2n} R_{i\ell}\, v_\ell \tag{11.203}$$
will preserve the quantum-mechanical commutation relations (11.201) if
$$[v_i', v_k'] = \left[\sum_{\ell=1}^{2n} R_{i\ell}\, v_\ell,\; \sum_{m=1}^{2n} R_{km}\, v_m\right] = i \hbar \sum_{\ell, m=1}^{2n} R_{i\ell}\, J_{\ell m}\, R_{km} = i \hbar\, J_{ik} \tag{11.204}$$
which in matrix notation is just the condition
$$R\, J\, R^{\mathsf{T}} = J \tag{11.205}$$

that the matrix $R$ be in the real symplectic group Sp(2n, ℝ). The transpose and product rules (1.205 and 1.222) for determinants imply that $\det R = \pm 1$, but the condition (11.205) itself implies that $\det R = 1$ (Zee, 2016, p. 281).
In terms of the matrix $J$ and the hamiltonian $H(v) = H(q, p)$, Hamilton's equations have the symplectic form
$$\dot{q}_i = \frac{\partial H(q, p)}{\partial p_i} \quad \text{and} \quad \dot{p}_i = -\frac{\partial H(q, p)}{\partial q_i} \quad \text{or} \quad \frac{d v_i}{dt} = \sum_{\ell=1}^{2n} J_{i\ell}\, \frac{\partial H(v)}{\partial v_\ell}. \tag{11.206}$$
A matrix $R = e^t$ obeys the defining condition (11.205) if $t J = -J t^{\mathsf{T}}$ (Exercise 11.22) or equivalently if $J t J = t^{\mathsf{T}}$. It follows (Exercise 11.23) that the generator $t$ must be
$$t = \begin{pmatrix} b & s_1 \\ s_2 & -b^{\mathsf{T}} \end{pmatrix} \tag{11.207}$$

in which the matrices $b$, $s_1$, $s_2$ are real, and both $s_1$ and $s_2$ are symmetric. The group Sp(2n, ℝ) is noncompact.

Example 11.28 (Squeezed states) A coherent state $|\alpha\rangle$ is an eigenstate of the annihilation operator $a = (\lambda q + i p/\lambda)/\sqrt{2\hbar}$ with a complex eigenvalue $\alpha$ (3.146), $a |\alpha\rangle = \alpha |\alpha\rangle$. In a coherent state with $\lambda = \sqrt{m \omega}$, the variances are $(\Delta q)^2 = \langle \alpha |(q - \bar{q})^2| \alpha \rangle = \hbar/(2 m \omega)$ and $(\Delta p)^2 = \langle \alpha |(p - \bar{p})^2| \alpha \rangle = \hbar m \omega/2$. Thus coherent states have minimum uncertainty, $\Delta q\, \Delta p = \hbar/2$.
A squeezed state $|\alpha\rangle'$ is an eigenstate of $a' = (\lambda q' + i p'/\lambda)/\sqrt{2\hbar}$ in which $q'$ and $p'$ are related to $q$ and $p$ by an Sp(2) transformation
$$\begin{pmatrix} q' \\ p' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} q \\ p \end{pmatrix} \quad \text{with inverse} \quad \begin{pmatrix} q \\ p \end{pmatrix} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \begin{pmatrix} q' \\ p' \end{pmatrix} \tag{11.208}$$
in which $a d - b c = 1$. The standard deviations of the variables $q$ and $p$ in the squeezed state $|\alpha'\rangle$ are
$$\Delta q = \sqrt{\frac{\hbar}{2} \left(\frac{d^2}{m \omega} + m \omega\, b^2\right)} \quad \text{and} \quad \Delta p = \sqrt{\frac{\hbar}{2} \left(\frac{c^2}{m \omega} + m \omega\, a^2\right)}. \tag{11.209}$$
Thus by making $b$ and $d$ tiny, one can reduce the uncertainty $\Delta q$ by any factor, but then $\Delta p$ will increase by the same factor since the determinant of the Sp(2) transformation must remain equal to unity, $a d - b c = 1$.


Example 11.29 (Sp(2, ℝ)) The matrices (Exercise 11.27)
$$\begin{pmatrix} \cosh\theta & \sinh\theta \\ \sinh\theta & \cosh\theta \end{pmatrix} \tag{11.210}$$
are elements of the noncompact symplectic group Sp(2, ℝ).
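The membership condition (11.205) is quick to verify for these matrices numerically. A NumPy sketch of ours:

```python
import numpy as np

# The n = 1 symplectic matrix J of (11.202)
J = np.array([[0.0, 1.0], [-1.0, 0.0]])

def M(theta):
    """The Sp(2,R) matrix of (11.210)."""
    return np.array([[np.cosh(theta), np.sinh(theta)],
                     [np.sinh(theta), np.cosh(theta)]])

R = M(0.7)
print(np.allclose(R @ J @ R.T, J))            # membership condition (11.205)
print(np.isclose(np.linalg.det(R), 1.0))      # det R = 1
print(np.allclose(M(0.3) @ M(0.4), M(0.7)))   # they form a one-parameter subgroup
```

The last check shows that the parameters $\theta$ add under matrix multiplication, as expected for $e^{\theta t}$ with a fixed generator $t$.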

A dynamical map that takes a $2n$-vector $v = (q_1, \ldots, q_n, p_1, \ldots, p_n)$ from $v(t_1)$ to $v(t_2)$ has a jacobian (Section 1.21)
$$M_{ab} = \frac{\partial v_a(t_2)}{\partial v_b(t_1)} \tag{11.211}$$
in Sp(2n, ℝ) if and only if its dynamics are hamiltonian (11.206, Section 18.1) (Carl Jacobi 1804–1851, William Hamilton 1805–1865).
The complex symplectic group Sp(2n, ℂ) consists of all $2n \times 2n$ complex matrices $C$ that satisfy the condition
$$C\, J\, C^{\mathsf{T}} = J. \tag{11.212}$$
The group Sp(2n, ℂ) also is noncompact. The unitary symplectic group USp(2n) consists of all $2n \times 2n$ unitary matrices $U$ that satisfy the condition
$$U\, J\, U^{\mathsf{T}} = J. \tag{11.213}$$
It is compact.

11.34 Quaternions

It is compact. 11.34 Quaternions

If z and w are any two complex numbers, then the 2 × 2 matrix

² z q= −w ∗

w

³

z∗

(11.214)

is a quaternion. The quaternions are closed under addition and multiplication and under multiplication by a real number (Exercise 11.21), but not under multiplication by an arbitrary complex number. The squared norm of q is its determinant

ºqº2 = |z |2 + |w|2 = det q.

(11.215)

The matrix products q † q and q q † are the squared norm ºq º2 multiplied by the 2 × 2 identity matrix q† q

= q q† = ºq º2 I .

(11.216)

430

11 Group Theory

The 2 × 2 matrix

² 0 1³ i σ2 = (11.217) −1 0 provides another expression for ºq º2 in terms of q and its transpose q q i σ 2 q = ºq º 2 i σ 2 . (11.218) Clearly ºq º = 0 implies q = 0. The norm of a product of quaternions is the product of their norms ± ± (11.219) ºq1 q2 º = det(q1q2) = det q1 det q2 = ºq1 ººq2º. T

T

The quaternions therefore form an associative division algebra(over the real numbers); the only others are the real numbers and the complex numbers; the octonions are a nonassociative division algebra. One may use the Pauli matrices to define for any real 4-vector x a quaternion q ( x ) as q( x ) = x0

² x+ +i σki xxk = xx0 ++ ii σx ·³x = − 0x + i x3 x2 − i x 1 2 1 0 3

with squared norm

ºq (x )º2 = x02 + x12 + x22 + x32.

(11.220)

(11.221)

The product rule (11.114) for the Pauli matrices tells us that the product of two quaternions is q ( x ) q ( y ) = (x 0

+ i σ · x)(y0 + i σ · y) (11.222) = x0 y0 + i σ · ( y0 x + x0 y) − i ( x × y) · σ − x · y

so their commutator is

[q( x ), q ( y) ] = − 2 i ( x × y) · σ .

(11.223)

Example 11.30(Lack of analyticity) One may define a function f (q) of a quaternionic variable and then ask what functions are analytic in the sense that the (one-sided) derivative f ± (q ) = lim ± q

→0

½

f (q

+ q± ) −

¾

f (q ) q ±−1

(11.224)

exists and is independent of the direction through which q ± → 0. This space of functions is extremely limited and does not even include the function f (q ) = q2 (exercise 11.24).
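The matrix form (11.214) makes the algebraic properties (11.215–11.219) of quaternions easy to confirm numerically. A NumPy sketch of ours:

```python
import numpy as np

def quaternion(z, w):
    """The 2x2 matrix form (11.214) of a quaternion."""
    return np.array([[z, w], [-np.conj(w), np.conj(z)]])

def norm2(q):
    """Squared norm: ||q||^2 = det q (11.215)."""
    return np.linalg.det(q).real

q1 = quaternion(1 + 2j, 3 - 1j)
q2 = quaternion(-0.5j, 2 + 2j)

# q^dagger q = q q^dagger = ||q||^2 I  (11.216)
print(np.allclose(q1.conj().T @ q1, norm2(q1) * np.eye(2)))
# The norm of a product is the product of the norms (11.219)
print(np.isclose(norm2(q1 @ q2), norm2(q1) * norm2(q2)))
# Closure: the product is again of the form (11.214)
p = q1 @ q2
print(np.isclose(p[1, 1], np.conj(p[0, 0])) and np.isclose(p[1, 0], -np.conj(p[0, 1])))
```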


11.35 Quaternions and Symplectic Groups

This section is optional on a first reading.
One may regard the unitary symplectic group USp(2n) as made of $2n \times 2n$ unitary matrices $W$ that map $n$-tuples $q$ of quaternions into $n$-tuples $q' = W q$ of quaternions with the same value of the quadratic quaternionic form
$$\|q'\|^2 = \|q_1'\|^2 + \|q_2'\|^2 + \cdots + \|q_n'\|^2 = \|q_1\|^2 + \|q_2\|^2 + \cdots + \|q_n\|^2 = \|q\|^2. \tag{11.225}$$
By (11.216), the quadratic form $\|q'\|^2$ times the $2 \times 2$ identity matrix $I$ is equal to the hermitian form $q'^\dagger q'$
$$\|q'\|^2\, I = q'^\dagger q' = q_1'^\dagger q_1' + \cdots + q_n'^\dagger q_n' = q^\dagger W^\dagger W q \tag{11.226}$$
and so any matrix $W$ that is both a $2n \times 2n$ unitary matrix and an $n \times n$ matrix of quaternions keeps $\|q'\|^2 = \|q\|^2$
$$\|q'\|^2\, I = q^\dagger W^\dagger W q = q^\dagger q = \|q\|^2\, I. \tag{11.227}$$
The group USp(2n) thus consists of all $2n \times 2n$ unitary matrices that also are $n \times n$ matrices of quaternions. (This last requirement is needed so that $q' = W q$ is an $n$-tuple of quaternions.)
The generators $t_a$ of the symplectic group USp(2n) are $2n \times 2n$ direct-product matrices of the form
$$I \otimes A, \quad \sigma_1 \otimes S_1, \quad \sigma_2 \otimes S_2, \quad \text{and} \quad \sigma_3 \otimes S_3 \tag{11.228}$$
in which $I$ is the $2 \times 2$ identity matrix, the three $\sigma_i$'s are the Pauli matrices, $A$ is an imaginary $n \times n$ antisymmetric matrix, and the $S_i$ are $n \times n$ real symmetric matrices. These generators $t_a$ close under commutation
$$[t_a, t_b] = i f_{abc}\, t_c. \tag{11.229}$$
Any imaginary linear combination $i \alpha_a t_a$ of these generators is not only a $2n \times 2n$ antihermitian matrix but also an $n \times n$ matrix of quaternions. Thus the matrices
$$D(\alpha) = e^{i \alpha_a t_a} \tag{11.230}$$
are both unitary $2n \times 2n$ matrices and $n \times n$ quaternionic matrices and so are elements of the group Sp(2n).

Example 11.31 (USp(2) ≅ SU(2)) There is no $1 \times 1$ antisymmetric matrix, and there is only one $1 \times 1$ symmetric matrix. So the generators $t_a$ of the group Sp(2) are the Pauli matrices $t_a = \sigma_a$, and Sp(2) = SU(2). The elements $g(\alpha)$ of SU(2) are quaternions of unit norm (Exercise 11.20), and so the product $g(\alpha)\, q$ is a quaternion
$$\|g(\alpha)\, q\|^2 = \det(g(\alpha)\, q) = \det(g(\alpha))\, \det q = \det q = \|q\|^2 \tag{11.231}$$

with the same squared norm.

Example 11.32 (SO(4) ≅ SU(2) ⊗ SU(2)) If $g$ and $h$ are any two elements of the group SU(2), then the squared norm (11.221) of the quaternion $q(x) = x_0 + i\, \boldsymbol{\sigma} \cdot \boldsymbol{x}$ is invariant under the transformation $q(x') = g\, q(x)\, h^{-1}$, that is, $x_0'^2 + x_1'^2 + x_2'^2 + x_3'^2 = x_0^2 + x_1^2 + x_2^2 + x_3^2$. So $x \to x'$ is an SO(4) rotation of the 4-vector $x$. The Lie algebra of SO(4) thus contains two commuting invariant SU(2) subalgebras and so is semisimple.

Example 11.33 (USp(4) ≅ SO(5)) Apart from scale factors, there are three real symmetric $2 \times 2$ matrices $S_1 = \sigma_1$, $S_2 = I$, and $S_3 = \sigma_3$ and one imaginary antisymmetric $2 \times 2$ matrix $A = \sigma_2$. So there are 10 generators of USp(4) = SO(5)
$$\begin{aligned}
t_1 &= I \otimes \sigma_2 = \begin{pmatrix} 0 & -i I \\ i I & 0 \end{pmatrix}, &
t_{k1} &= \sigma_k \otimes \sigma_1 = \begin{pmatrix} 0 & \sigma_k \\ \sigma_k & 0 \end{pmatrix}, \\
t_{k2} &= \sigma_k \otimes I = \begin{pmatrix} \sigma_k & 0 \\ 0 & \sigma_k \end{pmatrix}, &
t_{k3} &= \sigma_k \otimes \sigma_3 = \begin{pmatrix} \sigma_k & 0 \\ 0 & -\sigma_k \end{pmatrix}
\end{aligned} \tag{11.232}$$
where $k$ runs from 1 to 3.

Another way of looking at USp(2n) is to use (11.218) to write the quadratic form $\|q\|^2$ as
$$\|q\|^2\, J = q^{\mathsf{T}}\, J\, q \tag{11.233}$$
in which the $2n \times 2n$ matrix $J$ has $n$ copies of $i \sigma_2$ on its $2 \times 2$ diagonal
$$J = \begin{pmatrix} i \sigma_2 & 0 & \cdots & 0 \\ 0 & i \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & i \sigma_2 \end{pmatrix} \tag{11.234}$$
and is the matrix $J$ (11.202) in a different basis. Thus any $n \times n$ matrix of quaternions $W$ that satisfies
$$W^{\mathsf{T}}\, J\, W = J \tag{11.235}$$
also satisfies
$$\|W q\|^2\, J = q^{\mathsf{T}}\, W^{\mathsf{T}}\, J\, W\, q = q^{\mathsf{T}}\, J\, q = \|q\|^2\, J \tag{11.236}$$


and so leaves invariant the quadratic form (11.225). The group USp(2n) therefore consists of all $2n \times 2n$ unitary matrices $W$ that satisfy (11.235) and that also are $n \times n$ matrices of quaternions.

11.36 Compact Simple Lie Groups

Élie Cartan (1869–1951) showed that all compact, simple Lie groups fall into four infinite classes and five discrete cases. For $n = 1, 2, \ldots$, his four classes are:

● $A_n = SU(n+1)$, which are $(n+1) \times (n+1)$ unitary matrices with unit determinant,
● $B_n = SO(2n+1)$, which are $(2n+1) \times (2n+1)$ orthogonal matrices with unit determinant,
● $C_n = USp(2n)$, which are the unitary $2n \times 2n$ symplectic matrices, and
● $D_n = SO(2n)$, which are $2n \times 2n$ orthogonal matrices with unit determinant.

The five discrete cases are the exceptional groups $G_2$, $F_4$, $E_6$, $E_7$, and $E_8$. The exceptional groups are associated with the octonions
$$a + b_\alpha\, i_\alpha \tag{11.237}$$
where the $\alpha$-sum runs from 1 to 7; the eight numbers $a$ and $b_\alpha$ are real; and the seven $i_\alpha$'s obey the multiplication law
$$i_\alpha\, i_\beta = -\delta_{\alpha\beta} + g_{\alpha\beta\gamma}\, i_\gamma \tag{11.238}$$
in which $g_{\alpha\beta\gamma}$ is totally antisymmetric with
$$g_{123} = g_{247} = g_{451} = g_{562} = g_{634} = g_{375} = g_{716} = 1. \tag{11.239}$$
Like the quaternions and the complex numbers, the octonions form a division algebra with an absolute value
$$|a + b_\alpha i_\alpha| = \left(a^2 + b_\alpha^2\right)^{1/2} \tag{11.240}$$
that satisfies
$$|AB| = |A|\, |B| \tag{11.241}$$
but they lack associativity. The group $G_2$ is the subgroup of SO(7) that leaves the $g_{\alpha\beta\gamma}$'s of (11.238) invariant.

11.37 Group Integration

Suppose we need to integrate some function $f(g)$ over a group. Naturally, we want to do so in a way that gives equal weight to every element of the group. In particular,


if $g'$ is any group element, we want the integral of the shifted function $f(g' g)$ to be the same as the integral of $f(g)$
$$\int f(g)\, dg = \int f(g' g)\, dg. \tag{11.242}$$
Such a measure $dg$ is said to be left invariant (Creutz, 1983, chap. 8).
Let's use the letters $a = a_1, \ldots, a_n$, $b = b_1, \ldots, b_n$, and so forth to label the elements $g(a)$, $g(b)$, so that an integral over the group is
$$\int f(g)\, dg = \int f(g(a))\, m(a)\, d^n a \tag{11.243}$$
in which $m(a)$ is the left-invariant measure and the integration is over the $n$-space of $a$'s that label all the elements of the group.
To find the left-invariant measure $m(a)$, we use the multiplication law of the group
$$g(a(c, b)) \equiv g(c)\, g(b) \tag{11.244}$$
and impose the requirement (11.242) of left invariance with $g' \equiv g(c)$
$$\int f(g(b))\, m(b)\, d^n b = \int f(g(c)\, g(b))\, m(b)\, d^n b = \int f(g(a(c, b)))\, m(b)\, d^n b. \tag{11.245}$$
We change variables from $b$ to $a = a(c, b)$ by using the jacobian $\det(\partial b/\partial a)$, which gives us $d^n b = \det(\partial b/\partial a)\, d^n a$
$$\int f(g(b))\, m(b)\, d^n b = \int f(g(a))\, \det(\partial b/\partial a)\, m(b)\, d^n a. \tag{11.246}$$
Replacing $b$ by $a = a(c, b)$ on the left-hand side of this equation, we find
$$m(a) = \det(\partial b/\partial a)\, m(b) \tag{11.247}$$
or, since $\det(\partial b/\partial a) = 1/\det(\partial a(c, b)/\partial b)$,
$$m(a(c, b)) = m(b) \big/ \det(\partial a(c, b)/\partial b). \tag{11.248}$$
So if we let $g(b) \to g(0) = e$, the identity element of the group, and set $m(e) = 1$, then we find for the measure
$$m(a) = m(c) = m(a(c, b))\big|_{b=0} = 1 \big/ \det(\partial a(c, b)/\partial b)\big|_{b=0}. \tag{11.249}$$

Example 11.34 (The invariant measure for SU(2)) A general element of the group SU(2) is given by (11.117) as
$$\exp\left(i\, \boldsymbol{\theta} \cdot \frac{\boldsymbol{\sigma}}{2}\right) = I \cos\frac{\theta}{2} + i\, \hat{\boldsymbol{\theta}} \cdot \boldsymbol{\sigma}\, \sin\frac{\theta}{2}. \tag{11.250}$$


Setting $a_0 = \cos(\theta/2)$ and $\boldsymbol{a} = \hat{\boldsymbol{\theta}}\, \sin(\theta/2)$, we have
$$g(a) = a_0 + i\, \boldsymbol{a} \cdot \boldsymbol{\sigma} \tag{11.251}$$
in which $a^2 \equiv a_0^2 + \boldsymbol{a} \cdot \boldsymbol{a} = 1$. Thus, the parameter space for SU(2) is the unit sphere $S^3$ in 4 dimensions. Its invariant measure is
$$\int \delta(1 - a^2)\, d^4 a = \int \delta(1 - a_0^2 - \boldsymbol{a}^2)\, d^4 a = \int (1 - \boldsymbol{a}^2)^{-1/2}\, d^3 a \tag{11.252}$$
or
$$m(\boldsymbol{a}) = (1 - \boldsymbol{a}^2)^{-1/2} = \frac{1}{|\cos(\theta/2)|}. \tag{11.253}$$
We also can write the arbitrary element (11.251) of SU(2) as
$$g(a) = \pm\sqrt{1 - \boldsymbol{a}^2} + i\, \boldsymbol{a} \cdot \boldsymbol{\sigma} \tag{11.254}$$
and the group-multiplication law (11.244) as
$$\sqrt{1 - \boldsymbol{a}^2} + i\, \boldsymbol{a} \cdot \boldsymbol{\sigma} = \left(\sqrt{1 - \boldsymbol{c}^2} + i\, \boldsymbol{c} \cdot \boldsymbol{\sigma}\right) \left(\sqrt{1 - \boldsymbol{b}^2} + i\, \boldsymbol{b} \cdot \boldsymbol{\sigma}\right). \tag{11.255}$$
Thus, by multiplying both sides of this equation by $\sigma_i$ and taking the trace, we find (Exercise 11.28) that the parameters $\boldsymbol{a}(c, b)$ that describe the product $g(c)\, g(b)$ are
$$\boldsymbol{a}(c, b) = \sqrt{1 - \boldsymbol{c}^2}\; \boldsymbol{b} + \sqrt{1 - \boldsymbol{b}^2}\; \boldsymbol{c} - \boldsymbol{c} \times \boldsymbol{b}. \tag{11.256}$$
To compute the jacobian of our formula (11.249) for the invariant measure, we differentiate this expression (11.256) at $\boldsymbol{b} = 0$ and so find (Exercise 11.29)
$$m(\boldsymbol{a}) = 1 \big/ \det(\partial \boldsymbol{a}(c, b)/\partial \boldsymbol{b})\big|_{b=0} = (1 - \boldsymbol{a}^2)^{-1/2} \tag{11.257}$$
as the left-invariant measure in agreement with (11.253).
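The composition law (11.256) can be confirmed numerically by multiplying two SU(2) elements in the parametrization (11.254) (with the positive root, valid for small parameters). A NumPy sketch of ours:

```python
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def g(a):
    """Group element (11.254), positive root: g(a) = sqrt(1 - a^2) + i a.sigma."""
    return np.sqrt(1 - a @ a) * np.eye(2) + 1j * sum(ai * si for ai, si in zip(a, sigma))

# Two small parameter vectors (kept small so the positive root applies)
b = np.array([0.10, -0.20, 0.15])
c = np.array([0.05, 0.10, -0.12])

# Composition law (11.256): a(c,b) = sqrt(1-c^2) b + sqrt(1-b^2) c - c x b
a = np.sqrt(1 - c @ c) * b + np.sqrt(1 - b @ b) * c - np.cross(c, b)
print(np.allclose(g(c) @ g(b), g(a)))
```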

11.38 Lorentz Group

The Lorentz group O(3, 1) is the set of all linear transformations $L$ that leave invariant the Minkowski inner product
$$x\, y \equiv \boldsymbol{x} \cdot \boldsymbol{y} - x^0 y^0 = x^{\mathsf{T}}\, \eta\, y \tag{11.258}$$
in which $\eta$ is the diagonal matrix
$$\eta = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{11.259}$$
So $L$ is in O(3, 1) if for all 4-vectors $x$ and $y$
$$(L x)^{\mathsf{T}}\, \eta\, L y = x^{\mathsf{T}}\, L^{\mathsf{T}}\, \eta\, L\, y = x^{\mathsf{T}}\, \eta\, y. \tag{11.260}$$


Since $x$ and $y$ are arbitrary, this condition amounts to
$$L^{\mathsf{T}}\, \eta\, L = \eta \quad \text{or} \quad L^a{}_b\, \eta_{ac}\, L^c{}_d = \eta_{bd}. \tag{11.261}$$
Taking the determinant of both sides and using the transpose (1.205) and product (1.225) rules, we have
$$(\det L)^2 = 1. \tag{11.262}$$
So $\det L = \pm 1$, and every Lorentz transformation $L$ has an inverse. Multiplying (11.261) by $\eta$, we get
$$\eta\, L^{\mathsf{T}}\, \eta\, L = \eta^2 = I \quad \text{or} \quad \eta^{eb}\, L^a{}_b\, \eta_{ac}\, L^c{}_d = \eta^{eb}\, \eta_{bd} = \delta^e{}_d \tag{11.263}$$
which identifies $L^{-1}$ as
$$L^{-1} = \eta\, L^{\mathsf{T}}\, \eta \quad \text{or} \quad (L^{-1})^e{}_c = \eta^{eb}\, L^a{}_b\, \eta_{ac}. \tag{11.264}$$
The subgroup of O(3, 1) with $\det L = 1$ is the proper Lorentz group SO(3, 1). The subgroup of SO(3, 1) that leaves invariant the sign of the time component of timelike vectors is the proper orthochronous Lorentz group $SO^+(3, 1)$.
To find the Lie algebra of $SO^+(3, 1)$, we take a Lorentz matrix $L = I + \omega$ that differs from the identity matrix $I$ by a tiny matrix $\omega$ and require $L$ to obey the condition (11.261) for membership in the Lorentz group
$$(I + \omega^{\mathsf{T}})\, \eta\, (I + \omega) = \eta + \omega^{\mathsf{T}}\, \eta + \eta\, \omega + \omega^{\mathsf{T}}\, \eta\, \omega = \eta. \tag{11.265}$$
Neglecting $\omega^{\mathsf{T}}\, \eta\, \omega$, we have $\omega^{\mathsf{T}}\, \eta = -\eta\, \omega$, or since $\eta^2 = I$,
$$\omega^{\mathsf{T}} = -\eta\, \omega\, \eta. \tag{11.266}$$
This equation implies that the matrix $\omega_{ab}$ is antisymmetric when both indexes are down
$$\omega_{ab} = -\omega_{ba}. \tag{11.267}$$
To see why, we write (11.266) as $\omega_{ea} = -\eta_{ab}\, \omega_{bc}\, \eta_{ce}$ and then multiply both sides by $\eta_{de}$ so as to get $\omega_{da} = \eta_{de}\, \omega_{ea} = -\eta_{ab}\, \omega_{bc}\, \eta_{ce}\, \eta_{de} = -\omega_{ac}\, \delta_{cd} = -\omega_{ad}$.
The key equation (11.266) also tells us (Exercise 11.31) that under transposition the time–time and space–space elements of $\omega$ change sign, while the time–space and space–time elements do not. That is, the tiny matrix $\omega$ is for infinitesimal $\boldsymbol{\theta}$ and $\boldsymbol{\lambda}$ a linear combination
$$\omega = \boldsymbol{\theta} \cdot \boldsymbol{R} + \boldsymbol{\lambda} \cdot \boldsymbol{B} \tag{11.268}$$

of three antisymmetric space–space matrices
$$R_1 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \quad
R_2 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{pmatrix} \quad
R_3 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \tag{11.269}$$
and of three symmetric time–space matrices
$$B_1 = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \quad
B_2 = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \quad
B_3 = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} \tag{11.270}$$
all of which satisfy condition (11.266). The three $R_\ell$ are $4 \times 4$ versions of the rotation generators (11.93); the three $B_\ell$ generate Lorentz boosts.
If we write $L = I + \omega$ as
$$L = I - i \theta_\ell\, i R_\ell - i \lambda_\ell\, i B_\ell \equiv I - i\, \boldsymbol{\theta} \cdot \boldsymbol{J} - i\, \boldsymbol{\lambda} \cdot \boldsymbol{K} \tag{11.271}$$
then the three matrices $J_\ell = i R_\ell$ are imaginary and antisymmetric, and therefore hermitian. But the three matrices $K_\ell = i B_\ell$ are imaginary and symmetric, and so are antihermitian. The $4 \times 4$ matrix $L = \exp(-i\, \boldsymbol{\theta} \cdot \boldsymbol{J} - i\, \boldsymbol{\lambda} \cdot \boldsymbol{K})$ is not unitary because the Lorentz group is not compact.
One may verify (Exercise 11.32) that the six generators $J_\ell$ and $K_\ell$ satisfy three sets of commutation relations:
$$[J_i, J_j] = i \epsilon_{ijk}\, J_k \tag{11.272}$$
$$[J_i, K_j] = i \epsilon_{ijk}\, K_k \tag{11.273}$$
$$[K_i, K_j] = -i \epsilon_{ijk}\, J_k. \tag{11.274}$$

The first (11.272) says that the three $J_\ell$ generate the rotation group SO(3); the second (11.273) says that the three boost generators transform as a 3-vector under SO(3); and the third (11.274) implies that four cancelling infinitesimal boosts can amount to a rotation. These three sets of commutation relations form the Lie algebra of the Lorentz group SO(3, 1). Incidentally, one may show (Exercise 11.33) that if $J$ and $K$ satisfy these commutation relations (11.272–11.274), then so do
$$J \quad \text{and} \quad -K. \tag{11.275}$$
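The three sets of commutation relations (11.272–11.274) are mechanical to verify once the six $4 \times 4$ generators are written out. A NumPy sketch of ours, building $R_\ell$ and $B_\ell$ from (11.269–11.270) and setting $J_\ell = i R_\ell$, $K_\ell = i B_\ell$:

```python
import numpy as np

def R(l):
    """Rotation generator R_{l+1} of (11.269); indices 0..3, time first."""
    m = np.zeros((4, 4))
    i, j = [(2, 3), (3, 1), (1, 2)][l]
    m[i, j], m[j, i] = -1.0, 1.0
    return m

def B(l):
    """Boost generator B_{l+1} of (11.270)."""
    m = np.zeros((4, 4))
    m[0, l + 1] = m[l + 1, 0] = 1.0
    return m

eps = np.zeros((3, 3, 3))
for a, b, c in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[a, b, c], eps[a, c, b] = 1.0, -1.0

Jm = [1j * R(l) for l in range(3)]   # J_l = i R_l
Km = [1j * B(l) for l in range(3)]   # K_l = i B_l
comm = lambda X, Y: X @ Y - Y @ X
lin = lambda coef, M: sum(coef[c] * M[c] for c in range(3))

ok = all(
    np.allclose(comm(Jm[i], Jm[j]),  1j * lin(eps[i, j], Jm)) and  # (11.272)
    np.allclose(comm(Jm[i], Km[j]),  1j * lin(eps[i, j], Km)) and  # (11.273)
    np.allclose(comm(Km[i], Km[j]), -1j * lin(eps[i, j], Jm))      # (11.274)
    for i in range(3) for j in range(3)
)
print(ok)
```

The same loop with $K \to -K$ also passes, illustrating (11.275).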

The infinitesimal Lorentz transformation (11.271) is the $4 \times 4$ matrix
$$L = I + \omega = I + \theta_\ell\, R_\ell + \lambda_\ell\, B_\ell = \begin{pmatrix} 1 & \lambda_1 & \lambda_2 & \lambda_3 \\ \lambda_1 & 1 & -\theta_3 & \theta_2 \\ \lambda_2 & \theta_3 & 1 & -\theta_1 \\ \lambda_3 & -\theta_2 & \theta_1 & 1 \end{pmatrix}. \tag{11.276}$$


11 Group Theory

It moves any 4-vector $x$ to $x' = Lx$, or in components $x'^a = L^a{}_b\, x^b$
\[
\begin{aligned}
x'^0 &= x^0 + \lambda_1 x^1 + \lambda_2 x^2 + \lambda_3 x^3 \\
x'^1 &= \lambda_1 x^0 + x^1 - \theta_3 x^2 + \theta_2 x^3 \\
x'^2 &= \lambda_2 x^0 + \theta_3 x^1 + x^2 - \theta_1 x^3 \\
x'^3 &= \lambda_3 x^0 - \theta_2 x^1 + \theta_1 x^2 + x^3.
\end{aligned} \tag{11.277}
\]
More succinctly with $t = x^0$, this is
\[
t' = t + \lambda\cdot\mathbf x \qquad\text{and}\qquad \mathbf x' = \mathbf x + t\,\lambda + \theta \wedge \mathbf x \tag{11.278}
\]
in which $\wedge \equiv \times$ means cross-product.

For arbitrary real $\theta$ and $\lambda$, the matrices
\[
L = e^{-i\theta\cdot J - i\lambda\cdot K} \tag{11.279}
\]
form the subgroup of $O(3,1)$ that is connected to the identity matrix $I$. The matrices of this subgroup have unit determinant and preserve the sign of the time of time-like vectors, that is, if $x^2 < 0$ and $y = Lx$, then $y^0 x^0 > 0$. This is the proper orthochronous Lorentz group $SO^+(3,1)$. The rest of the (homogeneous) Lorentz group can be obtained from it by space $P$, time $T$, and spacetime $PT$ reflections.

The task of finding all the finite-dimensional irreducible representations of the proper orthochronous homogeneous Lorentz group becomes vastly simpler when we write the commutation relations (11.272–11.274) in terms of the hermitian matrices
\[
J^\pm_\ell = \tfrac{1}{2}\left( J_\ell \pm i\, K_\ell \right) \tag{11.280}
\]
which generate two independent rotation groups
\[
[J^+_i, J^+_j] = i\,\epsilon_{ijk}\, J^+_k \qquad
[J^-_i, J^-_j] = i\,\epsilon_{ijk}\, J^-_k \qquad
[J^+_i, J^-_j] = 0. \tag{11.281}
\]
Thus the Lie algebra of the Lorentz group is equivalent to two copies of the Lie algebra (11.119) of $SU(2)$. The hermitian generators of the rotation subgroup $SU(2)$ are by (11.280)
\[
J = J^+ + J^-. \tag{11.282}
\]
The antihermitian generators of the boosts are (also by 11.280)
\[
K = -\,i \left( J^+ - J^- \right). \tag{11.283}
\]
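The decomposition (11.280) and the commutation relations (11.281) can be verified with the same 4 × 4 generators; this short NumPy sketch (ours, for illustration only) confirms that $J^+$ and $J^-$ each satisfy an $su(2)$ algebra and commute with one another.

```python
import numpy as np

# 4x4 rotation and boost generators, J = iR and K = iB (11.269-11.271)
R = np.zeros((3, 4, 4))
B = np.zeros((3, 4, 4))
R[0, 2, 3], R[0, 3, 2] = -1, 1
R[1, 1, 3], R[1, 3, 1] = 1, -1
R[2, 1, 2], R[2, 2, 1] = -1, 1
for l in range(3):
    B[l, 0, l + 1] = B[l, l + 1, 0] = 1
J, K = 1j * R, 1j * B

def comm(a, b):
    return a @ b - b @ a

Jp = [(J[l] + 1j * K[l]) / 2 for l in range(3)]   # J+ of (11.280)
Jm = [(J[l] - 1j * K[l]) / 2 for l in range(3)]   # J- of (11.280)

# two commuting su(2) algebras, eq. (11.281)
assert np.allclose(comm(Jp[0], Jp[1]), 1j * Jp[2])
assert np.allclose(comm(Jm[0], Jm[1]), 1j * Jm[2])
for i in range(3):
    for j in range(3):
        assert np.allclose(comm(Jp[i], Jm[j]), 0)
print("J+ and J- generate two independent su(2) algebras")
```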


Since $J^+$ and $J^-$ commute, the finite-dimensional irreducible representations of the Lorentz group are the direct products
\[
D^{(j,j')}(\theta, \lambda) = e^{-i\theta\cdot J - i\lambda\cdot K}
= e^{(-i\theta - \lambda)\cdot J^+ + (-i\theta + \lambda)\cdot J^-}
= e^{(-i\theta - \lambda)\cdot J^+}\, e^{(-i\theta + \lambda)\cdot J^-} \tag{11.284}
\]
of the nonunitary representations
\[
D^{(j,0)}(\theta, \lambda) = e^{(-i\theta - \lambda)\cdot J^+}
\quad\text{and}\quad
D^{(0,j')}(\theta, \lambda) = e^{(-i\theta + \lambda)\cdot J^-} \tag{11.285}
\]
generated by the three $(2j+1)\times(2j+1)$ matrices $J^+_\ell$ and by the three $(2j'+1)\times(2j'+1)$ matrices $J^-_\ell$.

Under a Lorentz transformation $L$, a field $\psi^{(j,j')}_{m,m'}(x)$ that transforms under the $D^{(j,j')}$ representation of the Lorentz group responds as
\[
U(L)\, \psi^{(j,j')}_{m,m'}(x)\, U^{-1}(L) = D^{(j,0)}_{m m''}(L^{-1})\, D^{(0,j')}_{m' m'''}(L^{-1})\, \psi^{(j,j')}_{m'',m'''}(Lx). \tag{11.286}
\]
The representation $D^{(j,j')}$ describes objects of the spins $s$ that can arise from the direct product of spin-$j$ with spin-$j'$ (Weinberg, 1995, p. 231)
\[
s = j + j',\ j + j' - 1,\ \ldots,\ |j - j'|. \tag{11.287}
\]
For instance, $D^{(0,0)}$ describes a spinless field or particle, while $D^{(1/2,0)}$ and $D^{(0,1/2)}$ respectively describe left-handed and right-handed spin-1/2 fields or particles. The representation $D^{(1/2,1/2)}$ describes objects of spin 1 and spin 0 – the spatial and time components of a 4-vector. The interchange of $J^+$ and $J^-$ replaces the generators $J$ and $K$ with $J$ and $-K$, a substitution that we know (11.275) is legitimate.

11.39 Left-Handed Representation of the Lorentz Group

The generators of the 2-dimensional representation $D^{(1/2,0)}$ with $j = 1/2$ and $j' = 0$ are given by (11.282 and 11.283) with $J^+ = \sigma/2$ and $J^- = 0$. They are
\[
J = \tfrac{1}{2}\,\sigma \quad\text{and}\quad K = -\,i\,\tfrac{1}{2}\,\sigma. \tag{11.288}
\]
The 2 × 2 matrix $D^{(1/2,0)}$ that represents the Lorentz transformation (11.279)
\[
L = e^{-i\theta\cdot J - i\lambda\cdot K} \tag{11.289}
\]
is
\[
D^{(1/2,0)}(\theta, \lambda) = \exp\left( -i\theta\cdot\sigma/2 - \lambda\cdot\sigma/2 \right). \tag{11.290}
\]
And so the generic $D^{(1/2,0)}$ matrix is
\[
D^{(1/2,0)}(\theta, \lambda) = e^{-z\cdot\sigma/2} \tag{11.291}
\]


with $\lambda = \mathrm{Re}\,z$ and $\theta = \mathrm{Im}\,z$. It is nonunitary and of unit determinant; it is a member of the group $SL(2,\mathbb C)$ of complex unimodular 2 × 2 matrices. The (covering) group $SL(2,\mathbb C)$ relates to the Lorentz group $SO(3,1)$ as $SU(2)$ relates to the rotation group $SO(3)$.

Example 11.35 (The standard left-handed boost) For a particle of mass $m > 0$, the standard boost that takes the 4-vector $k = (m, \mathbf 0)$ to $p = (p^0, \mathbf p)$, where $p^0 = \sqrt{m^2 + \mathbf p^2}$, is a boost in the $\hat p$ direction. It is the 4 × 4 matrix
\[
B(p) = R(\hat p)\, B_3(p^0)\, R^{-1}(\hat p) = \exp\left( \alpha\, \hat p\cdot B \right) \tag{11.292}
\]
in which $\cosh\alpha = p^0/m$ and $\sinh\alpha = |\mathbf p|/m$, as one may show by expanding the exponential (Exercise 11.35). This standard boost is represented by $D^{(1/2,0)}(0, \lambda)$, the 2 × 2 matrix (11.289), with $\lambda = \alpha\,\hat p$. The power-series expansion of this matrix is (Exercise 11.36)
\[
\begin{aligned}
D^{(1/2,0)}(0, \alpha\hat p) &= e^{-\alpha\,\hat p\cdot\sigma/2} = I\cosh(\alpha/2) - \hat p\cdot\sigma\,\sinh(\alpha/2) \\
&= I\,\sqrt{(p^0 + m)/(2m)} - \hat p\cdot\sigma\,\sqrt{(p^0 - m)/(2m)} \\
&= \frac{(p^0 + m)\, I - \mathbf p\cdot\sigma}{\sqrt{2m\,(p^0 + m)}}
\end{aligned} \tag{11.293}
\]
in which $I$ is the 2 × 2 identity matrix.

Under $D^{(1/2,0)}$, the vector $(-I, \sigma)$ transforms like a 4-vector. For tiny $\theta$ and $\lambda$, one may show (Exercise 11.38) that the vector $(-I, \sigma)$ transforms as
\[
\begin{aligned}
D^{\dagger(1/2,0)}(\theta, \lambda)\,(-I)\, D^{(1/2,0)}(\theta, \lambda) &= -\,I + \lambda\cdot\sigma \\
D^{\dagger(1/2,0)}(\theta, \lambda)\,\sigma\, D^{(1/2,0)}(\theta, \lambda) &= \sigma + (-I)\,\lambda + \theta\wedge\sigma
\end{aligned} \tag{11.294}
\]
which is how the 4-vector $(t, \mathbf x)$ transforms (11.278). Under a finite Lorentz transformation $L$, the 4-vector $S^a \equiv (-I, \sigma)$ goes to
\[
D^{\dagger(1/2,0)}(L)\, S^a\, D^{(1/2,0)}(L) = L^a{}_b\, S^b. \tag{11.295}
\]
A massless field $u(x)$ that responds to a unitary Lorentz transformation $U(L)$ like
\[
U(L)\, u(x)\, U^{-1}(L) = D^{(1/2,0)}(L^{-1})\, u(Lx) \tag{11.296}
\]
is called a left-handed Weyl spinor. The action density
\[
\mathcal L_\ell(x) = i\, u^\dagger(x)\, (\partial_0 I - \nabla\cdot\sigma)\, u(x) \tag{11.297}
\]
is Lorentz covariant, that is
\[
U(L)\, \mathcal L_\ell(x)\, U^{-1}(L) = \mathcal L_\ell(Lx). \tag{11.298}
\]


Example 11.36 (Why $\mathcal L_\ell$ is Lorentz covariant) We first note that the derivatives $\partial'_b$ in $\mathcal L_\ell(Lx)$ are with respect to $x' = Lx$. Since the inverse matrix $L^{-1}$ takes $x'$ back to $x = L^{-1}x'$, or in tensor notation $x^a = L^{-1\,a}{}_b\, x'^b$, the derivative $\partial'_b$ is
\[
\partial'_b = \frac{\partial}{\partial x'^b} = \frac{\partial x^a}{\partial x'^b}\,\frac{\partial}{\partial x^a} = L^{-1\,a}{}_b\,\frac{\partial}{\partial x^a} = \partial_a\, L^{-1\,a}{}_b. \tag{11.299}
\]
Now using the abbreviation $\partial_0 I - \nabla\cdot\sigma \equiv -\,\partial_a S^a$ and the transformation laws (11.295 and 11.296), we have
\[
\begin{aligned}
U(L)\,\mathcal L_\ell(x)\,U^{-1}(L) &= i\,u^\dagger(Lx)\, D^{(1/2,0)\dagger}(L^{-1})\,(-\partial_a S^a)\, D^{(1/2,0)}(L^{-1})\, u(Lx) \\
&= i\,u^\dagger(Lx)\,\left(-\partial_a\, L^{-1\,a}{}_b\, S^b\right) u(Lx) = i\,u^\dagger(Lx)\,\left(-\partial'_b\, S^b\right) u(Lx) = \mathcal L_\ell(Lx)
\end{aligned} \tag{11.300}
\]
which shows that $\mathcal L_\ell$ is Lorentz covariant.

Incidentally, the rule (11.299) ensures, among other things, that the divergence $\partial_a V^a$ is invariant
\[
\partial_a V^a = \partial'_a V'^a = \partial_b\, L^{-1\,b}{}_a\, L^a{}_c\, V^c = \partial_b\, \delta^b{}_c\, V^c = \partial_b V^b. \tag{11.301}
\]

Example 11.37 (Why $u$ is left handed) The spacetime integral $S$ of the action density $\mathcal L_\ell$ is stationary when $u(x)$ satisfies the wave equation
\[
(\partial_0 I - \nabla\cdot\sigma)\, u(x) = 0 \tag{11.302}
\]
or in momentum space
\[
(E + \mathbf p\cdot\sigma)\, u(p) = 0. \tag{11.303}
\]
Multiplying from the left by $(E - \mathbf p\cdot\sigma)$, we see that the energy of a particle created or annihilated by the field $u$ is the same as its momentum, $E = |\mathbf p|$. The particles of the field $u$ are massless because the action density $\mathcal L_\ell$ has no mass term. The spin of the particle is represented by the matrix $J = \sigma/2$, so the momentum-space relation (11.303) says that $u(p)$ is an eigenvector of $\hat p\cdot J$ with eigenvalue $-1/2$
\[
\hat p\cdot J\; u(p) = -\,\tfrac{1}{2}\, u(p). \tag{11.304}
\]
A particle whose spin is opposite to its momentum is said to have negative helicity or to be left handed. Nearly massless neutrinos are nearly left handed.
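A quick numerical illustration of this helicity argument (ours, not from the text): for an arbitrary momentum, the solution of $(E + \mathbf p\cdot\sigma)\,u = 0$ is the eigenvector of $\hat p\cdot\sigma$ with eigenvalue $-1$, i.e. helicity $-1/2$ with respect to $J = \sigma/2$.

```python
import numpy as np

sigma = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]])

p = np.array([1.0, 2.0, 2.0])          # arbitrary momentum
E = np.linalg.norm(p)                  # massless: E = |p|
phat_sigma = np.einsum('a,aij->ij', p / E, sigma)

# (phat.sigma)^2 = I, so its eigenvalues are +1 and -1; u is the -1 eigenvector
vals, vecs = np.linalg.eigh(phat_sigma)
u = vecs[:, np.argmin(vals)]

assert np.allclose((E * np.eye(2) + E * phat_sigma) @ u, 0)  # wave eq. (11.303)
assert np.allclose(phat_sigma @ u, -u)                       # eq. (11.304): phat.J u = -u/2
print("u(p) has helicity -1/2")
```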

One may add to this action density the Majorana mass term
\[
\mathcal L_M(x) = -\,\tfrac{1}{2}\, m\, u^\dagger(x)\, \sigma_2\, u^*(x) - \tfrac{1}{2}\, m^*\, u^{\mathsf T}(x)\, \sigma_2\, u(x) \tag{11.305}
\]


which is Lorentz covariant because the matrices $\sigma_1$ and $\sigma_3$ anticommute with $\sigma_2$, which is antisymmetric (Exercise 11.41). This term would vanish if $u_1 u_2$ were equal to $u_2 u_1$. Since charge is conserved, only neutral fields like neutrinos can have Majorana mass terms. The action density of a left-handed field of mass $m$ is the sum $\mathcal L = \mathcal L_\ell + \mathcal L_M$ of the kinetic one (11.297) and the Majorana mass term (11.305). The resulting equations of motion
\[
\begin{aligned}
0 &= i\,(\partial_0 - \nabla\cdot\sigma)\, u - m\,\sigma_2\, u^* \\
0 &= \left( \partial_0^2 - \nabla^2 + |m|^2 \right) u
\end{aligned} \tag{11.306}
\]
show that the field $u$ represents particles of mass $|m|$.

11.40 Right-Handed Representation of the Lorentz Group

The generators of the 2-dimensional representation $D^{(0,1/2)}$ with $j = 0$ and $j' = 1/2$ are given by (11.282 and 11.283) with $J^+ = 0$ and $J^- = \sigma/2$; they are
\[
J = \tfrac{1}{2}\,\sigma \quad\text{and}\quad K = i\,\tfrac{1}{2}\,\sigma. \tag{11.307}
\]
Thus the 2 × 2 matrix $D^{(0,1/2)}(\theta, \lambda)$ that represents the Lorentz transformation (11.279)
\[
L = e^{-i\theta\cdot J - i\lambda\cdot K} \tag{11.308}
\]
is
\[
D^{(0,1/2)}(\theta, \lambda) = \exp\left( -i\theta\cdot\sigma/2 + \lambda\cdot\sigma/2 \right) = D^{(1/2,0)}(\theta, -\lambda) \tag{11.309}
\]
which differs from $D^{(1/2,0)}(\theta, \lambda)$ only by the sign of $\lambda$. The generic $D^{(0,1/2)}$ matrix is the complex unimodular 2 × 2 matrix
\[
D^{(0,1/2)}(\theta, \lambda) = e^{z^*\cdot\sigma/2} \tag{11.310}
\]
with $\lambda = \mathrm{Re}\,z$ and $\theta = \mathrm{Im}\,z$.

Example 11.38 (The standard right-handed boost) For a particle of mass $m > 0$, the "standard" boost (11.292) that transforms $k = (m, \mathbf 0)$ to $p = (p^0, \mathbf p)$ is the 4 × 4 matrix $B(p) = \exp\left( \alpha\, \hat p\cdot B \right)$ in which $\cosh\alpha = p^0/m$ and $\sinh\alpha = |\mathbf p|/m$. This Lorentz transformation with $\theta = 0$ and $\lambda = \alpha\hat p$ is represented by the matrix (Exercise 11.37)
\[
\begin{aligned}
D^{(0,1/2)}(0, \alpha\hat p) &= e^{\alpha\,\hat p\cdot\sigma/2} = I\cosh(\alpha/2) + \hat p\cdot\sigma\,\sinh(\alpha/2) \\
&= I\,\sqrt{(p^0 + m)/(2m)} + \hat p\cdot\sigma\,\sqrt{(p^0 - m)/(2m)} \\
&= \frac{p^0 + m + \mathbf p\cdot\sigma}{\sqrt{2m\,(p^0 + m)}}
\end{aligned} \tag{11.311}
\]
in the third line of which the 2 × 2 identity matrix $I$ is suppressed.


Under $D^{(0,1/2)}$, the vector $(I, \sigma)$ transforms as a 4-vector; for tiny $z$
\[
\begin{aligned}
D^{\dagger(0,1/2)}(\theta, \lambda)\, I\, D^{(0,1/2)}(\theta, \lambda) &= I + \lambda\cdot\sigma \\
D^{\dagger(0,1/2)}(\theta, \lambda)\, \sigma\, D^{(0,1/2)}(\theta, \lambda) &= \sigma + I\,\lambda + \theta\wedge\sigma
\end{aligned} \tag{11.312}
\]
as in (11.278). A massless field $v(x)$ that responds to a unitary Lorentz transformation $U(L)$ as
\[
U(L)\, v(x)\, U^{-1}(L) = D^{(0,1/2)}(L^{-1})\, v(Lx) \tag{11.313}
\]
is called a right-handed Weyl spinor. One may show (Exercise 11.40) that the action density
\[
\mathcal L_r(x) = i\, v^\dagger(x)\, (\partial_0 I + \nabla\cdot\sigma)\, v(x) \tag{11.314}
\]
is Lorentz covariant
\[
U(L)\, \mathcal L_r(x)\, U^{-1}(L) = \mathcal L_r(Lx). \tag{11.315}
\]

Example 11.39 (Why $v$ is right handed) An argument like that of Example 11.37 shows that the field $v(x)$ satisfies the wave equation
\[
(\partial_0 I + \nabla\cdot\sigma)\, v(x) = 0 \tag{11.316}
\]
or in momentum space
\[
(E - \mathbf p\cdot\sigma)\, v(p) = 0. \tag{11.317}
\]
Thus $E = |\mathbf p|$, and $v(p)$ is an eigenvector of $\hat p\cdot J$
\[
\hat p\cdot J\; v(p) = \tfrac{1}{2}\, v(p) \tag{11.318}
\]
with eigenvalue $1/2$. A particle whose spin is parallel to its momentum is said to have positive helicity or to be right handed. Nearly massless antineutrinos are nearly right handed.

The Majorana mass term
\[
\mathcal L_M(x) = -\,\tfrac{1}{2}\, m\, v^\dagger(x)\, \sigma_2\, v^*(x) - \tfrac{1}{2}\, m^*\, v^{\mathsf T}(x)\, \sigma_2\, v(x) \tag{11.319}
\]
like (11.305) is Lorentz covariant. The action density of a right-handed field of mass $m$ is the sum $\mathcal L = \mathcal L_r + \mathcal L_M$ of the kinetic one (11.314) and this Majorana mass term (11.319). The resulting equations of motion
\[
\begin{aligned}
0 &= i\,(\partial_0 + \nabla\cdot\sigma)\, v - m\,\sigma_2\, v^* \\
0 &= \left( \partial_0^2 - \nabla^2 + |m|^2 \right) v
\end{aligned} \tag{11.320}
\]
show that the field $v$ represents particles of mass $|m|$.


11.41 Dirac’s Representation of the Lorentz Group

Dirac’s representation of SO (3, 1 ) is the direct sum D (1/ 2,0) ⊕ D (0,1/2) of D (1/2,0) and D (0,1/2) . Its generators are the 4 × 4 matrices J

=

1 2

²σ

0

0

³

²−σ K = 0 i 2

and

σ

0

³

σ

.

(11.321)

Dirac’s representation uses the Clifford algebraof the gamma matrices γ a which satisfy the anticommutation relation

{γ a , γ b } ≡ γ a γ b + γ b γ a = 2ηab I in which η is the 4 × 4 diagonal matrix (11.259) with η00 j = 1, 2, and 3, and I is the 4 × 4 identity matrix. Remarkably, the generators of the Lorentz group Jij

= ²i jk Jk

J0 j

and

(11.322)

= −1 and η j j = 1 for

= Kj

(11.323)

may be represented as commutators of gamma matrices J ab

= − 4i [γ a , γ b].

(11.324)

They transform the gamma matrices as a 4-vector

[ J ab , γ c ] = −i γ a ηbc + i γ b ηac

(11.325)

(Exercise 11.42) and satisfy the commutation relations i [ J ab , J cd ] = ηbc J ad

− ηac J bd − ηda J cb + ηdb J ca

(11.326)

of the Lorentz group (Weinberg, 1995, pp. 213–217) (Exercise 11.43). The gamma matrices $\gamma^a$ are not unique; if $S$ is any 4 × 4 matrix with an inverse, then the matrices $\gamma'^a \equiv S\,\gamma^a\, S^{-1}$ also satisfy the definition (11.322). The choice
\[
\gamma^0 = -\,i \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\quad\text{and}\quad
\boldsymbol\gamma = -\,i \begin{pmatrix} 0 & \sigma \\ -\sigma & 0 \end{pmatrix} \tag{11.327}
\]
makes $J$ and $K$ block diagonal (11.321) and lets us assemble a left-handed spinor $u$ and a right-handed spinor $v$ neatly into a 4-component spinor
\[
\psi = \begin{pmatrix} u \\ v \end{pmatrix}. \tag{11.328}
\]
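The Clifford algebra (11.322) and the 4-vector law (11.325) can be checked mechanically for the chiral choice (11.327); here is a NumPy sketch of such a check (our own illustration, not part of the text).

```python
import numpy as np

s = np.array([[[0, 1], [1, 0]],
              [[0, -1j], [1j, 0]],
              [[1, 0], [0, -1]]])
I2, Z2 = np.eye(2), np.zeros((2, 2))
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

# chiral gamma matrices, eq. (11.327)
gamma = [-1j * np.block([[Z2, I2], [I2, Z2]])]
for k in range(3):
    gamma.append(-1j * np.block([[Z2, s[k]], [-s[k], Z2]]))

# Clifford algebra (11.322): {gamma^a, gamma^b} = 2 eta^{ab} I
for a in range(4):
    for b in range(4):
        anti = gamma[a] @ gamma[b] + gamma[b] @ gamma[a]
        assert np.allclose(anti, 2 * eta[a, b] * np.eye(4))

# generators J^{ab} = -(i/4) [gamma^a, gamma^b], eq. (11.324)
Jab = [[-0.25j * (gamma[a] @ gamma[b] - gamma[b] @ gamma[a]) for b in range(4)]
       for a in range(4)]

# the gammas transform as a 4-vector, eq. (11.325)
for a in range(4):
    for b in range(4):
        for c in range(4):
            lhs = Jab[a][b] @ gamma[c] - gamma[c] @ Jab[a][b]
            rhs = -1j * gamma[a] * eta[b, c] + 1j * gamma[b] * eta[a, c]
            assert np.allclose(lhs, rhs)
print("Clifford algebra (11.322) and 4-vector law (11.325) verified")
```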

Dirac’s action density for a 4-spinor is

L= −ψ

(

a

γ ∂a

+m

)

ψ

≡ − ψ (² ∂ + m) ψ

(11.329)

11.41 Dirac’s Representation of the Lorentz Group

in which
\[
\overline\psi \equiv i\,\psi^\dagger\, \gamma^0 = \psi^\dagger \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \left( v^\dagger \;\; u^\dagger \right). \tag{11.330}
\]
The kinetic part is the sum of the left-handed and right-handed action densities $\mathcal L_\ell$ and $\mathcal L_r$ (11.297 and 11.314)
\[
-\,\overline\psi\, \gamma^a \partial_a\, \psi = i\,u^\dagger (\partial_0 I - \nabla\cdot\sigma)\, u + i\,v^\dagger (\partial_0 I + \nabla\cdot\sigma)\, v. \tag{11.331}
\]

If $u$ is a left-handed spinor transforming as (11.296), then the spinor
\[
v = \sigma_2\, u^* \qquad\text{or}\qquad
\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \begin{pmatrix} u_1^* \\ u_2^* \end{pmatrix} \tag{11.332}
\]
transforms as a right-handed spinor (11.313), that is (Exercise 11.44)
\[
e^{z^*\cdot\sigma/2}\, \sigma_2\, u^* = \sigma_2 \left( e^{-z\cdot\sigma/2}\, u \right)^*. \tag{11.333}
\]
Similarly, if $v$ is right handed, then $u = -\,\sigma_2\, v^*$ is left handed.

A Majorana 4-spinor obeys the Majorana condition
\[
\psi_M = \begin{pmatrix} u \\ \sigma_2\, u^* \end{pmatrix} = \begin{pmatrix} -\,\sigma_2\, v^* \\ v \end{pmatrix} = -\,i\,\gamma^2\, \psi_M^*. \tag{11.334}
\]
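One can verify numerically that this condition is consistent: with the chiral $\gamma^2$ of (11.327), the spinor $(u, \sigma_2 u^*)$ satisfies $\psi_M = -i\gamma^2\psi_M^*$ for any $u$, and the map $\psi \mapsto -i\gamma^2\psi^*$ squares to the identity. A short sketch (ours, for illustration):

```python
import numpy as np

s2 = np.array([[0, -1j], [1j, 0]])
Z2 = np.zeros((2, 2))
gamma2 = -1j * np.block([[Z2, s2], [-s2, Z2]])    # chiral gamma^2, eq. (11.327)

rng = np.random.default_rng(0)
u = rng.normal(size=2) + 1j * rng.normal(size=2)  # an arbitrary left-handed spinor

psi = np.concatenate([u, s2 @ u.conj()])          # psi_M = (u, sigma2 u*)

# Majorana condition (11.334): psi_M = -i gamma^2 psi_M*
assert np.allclose(psi, -1j * gamma2 @ psi.conj())

# the map C: psi -> -i gamma^2 psi* is an involution
C = lambda x: -1j * gamma2 @ x.conj()
chi = rng.normal(size=4) + 1j * rng.normal(size=4)
assert np.allclose(C(C(chi)), chi)
print("Majorana condition (11.334) is consistent")
```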

Its particles are the same as its antiparticles.

If two Majorana spinors $\psi_M^{(1)}$ and $\psi_M^{(2)}$ have the same mass, then one may combine them into a Dirac spinor
\[
\psi_D = \frac{1}{\sqrt 2}\left( \psi_M^{(1)} + i\,\psi_M^{(2)} \right)
= \frac{1}{\sqrt 2}\begin{pmatrix} u^{(1)} + i\,u^{(2)} \\ v^{(1)} + i\,v^{(2)} \end{pmatrix}
= \begin{pmatrix} u_D \\ v_D \end{pmatrix}. \tag{11.335}
\]
The Dirac mass term
\[
-\,m\,\overline\psi_D\, \psi_D = -\,m \left( v_D^\dagger\, u_D + u_D^\dagger\, v_D \right) \tag{11.336}
\]
conserves charge, and since $\exp(z^*\cdot\sigma/2)^\dagger \exp(-z\cdot\sigma/2) = I$ it also is Lorentz invariant. For a Majorana field, it reduces to
\[
-\,\tfrac{1}{2}\, m\, \overline\psi_M\, \psi_M = -\,\tfrac{1}{2}\, m \left( v^\dagger u + u^\dagger v \right)
= -\,\tfrac{1}{2}\, m \left( v^\dagger \sigma_2\, v^* + v^{\mathsf T}\sigma_2\, v \right)
= -\,\tfrac{1}{2}\, m \left( u^\dagger \sigma_2\, u^* + u^{\mathsf T}\sigma_2\, u \right) \tag{11.337}
\]
a Majorana mass term (11.305 or 11.319).


11.42 Poincaré Group

The elements of the Poincaré group are products of Lorentz transformations and translations in space and time. The Lie algebra of the Poincaré group therefore includes the generators $J$ and $K$ of the Lorentz group as well as the hamiltonian $H$ and the momentum operator $P$, which respectively generate translations in time and space.

Suppose $T(y)$ is a translation that takes a 4-vector $x$ to $x + y$ and $T(z)$ is a translation that takes a 4-vector $x$ to $x + z$. Then $T(z)\,T(y)$ and $T(y)\,T(z)$ both take $x$ to $x + y + z$. So if a translation $T(y) = T(t, \mathbf y)$ is represented by a unitary operator $U(t, \mathbf y) = \exp(i H t - i P\cdot\mathbf y)$, then the hamiltonian $H$ and the momentum operator $P$ commute with each other
\[
[H, P_j] = 0 \quad\text{and}\quad [P_i, P_j] = 0. \tag{11.338}
\]
We can figure out the commutation relations of $H$ and $P$ with the angular-momentum $J$ and boost $K$ operators by realizing that $P^a = (H, P)$ is a 4-vector. Let
\[
U(\theta, \lambda) = e^{-i\theta\cdot J - i\lambda\cdot K} \tag{11.339}
\]
be the (infinite-dimensional) unitary operator that represents (in Hilbert space) the infinitesimal Lorentz transformation
\[
L = I + \theta\cdot R + \lambda\cdot B \tag{11.340}
\]
where $R$ and $B$ are the six 4 × 4 matrices (11.269 and 11.270). Then because $P$ is a 4-vector under Lorentz transformations, we have
\[
U^{-1}(\theta, \lambda)\, P\, U(\theta, \lambda) = e^{+i\theta\cdot J + i\lambda\cdot K}\, P\, e^{-i\theta\cdot J - i\lambda\cdot K} = (I + \theta\cdot R + \lambda\cdot B)\, P \tag{11.341}
\]
or using (11.312)
\[
\begin{aligned}
(I + i\theta\cdot J + i\lambda\cdot K)\, H\, (I - i\theta\cdot J - i\lambda\cdot K) &= H + \lambda\cdot P \\
(I + i\theta\cdot J + i\lambda\cdot K)\, P\, (I - i\theta\cdot J - i\lambda\cdot K) &= P + H\,\lambda + \theta\wedge P.
\end{aligned} \tag{11.342}
\]
Thus one finds (Exercise 11.45) that $H$ is invariant under rotations, while $P$ transforms as a 3-vector
\[
[J_i, H] = 0 \quad\text{and}\quad [J_i, P_j] = i\,\epsilon_{ijk}\, P_k \tag{11.343}
\]
and that
\[
[K_i, H] = -\,i\,P_i \quad\text{and}\quad [K_i, P_j] = -\,i\,\delta_{ij}\, H. \tag{11.344}
\]


By combining these equations with (11.326), one may write (Exercise 11.46) the Lie algebra of the Poincaré group as
\[
\begin{aligned}
i\,[J^{ab}, J^{cd}] &= \eta^{bc}\, J^{ad} - \eta^{ac}\, J^{bd} - \eta^{da}\, J^{cb} + \eta^{db}\, J^{ca} \\
i\,[P^a, J^{bc}] &= \eta^{ab}\, P^c - \eta^{ac}\, P^b \\
[P^a, P^b] &= 0.
\end{aligned} \tag{11.345}
\]

11.43 Homotopy Groups

Two paths $f(s)$ and $g(s)$ that map the unit interval $I = [0, 1]$ into a space $X$ are homotopic if there is a continuous function $F(s, t)$ into $X$ defined for $0 \le s, t \le 1$ such that $F(s, 0) = f(s)$ and $F(s, 1) = g(s)$ as well as $F(0, t) = x_0$ and $F(1, t) = x_1$. All paths homotopic to $f$ form an equivalence class called the homotopy class $[f]$ of $f$. The product $f \cdot g$ of two paths with $f(1) = g(0)$ is $f \cdot g\,(s) = f(2s)$ for $0 \le s \le 1/2$ and $f \cdot g\,(s) = g(2s - 1)$ for $1/2 \le s \le 1$. With this kind of multiplication, the set of all homotopy classes $[f]$ of paths $f$ that are loops, $f(0) = f(1) = x_0$, is the fundamental group $\pi_1(X, x_0)$ of the space $X$ with basepoint $x_0$. This construction extended to paths that map the $n$-cube $I^n$ into a space $X$ and map the boundary of $I^n$ to the same point $x_0$ defines the $n$th homotopy group $\pi_n(X, x_0)$ of $X$.

Example 11.40 (Some homotopy groups) The fundamental groups of the circle $S^1$ and the torus $T^2$ are $\pi_1(S^1) = \mathbb Z$ and $\pi_1(T^2) = \mathbb Z^2$. Some higher homotopy groups are $\pi_n(S^n) = \mathbb Z$ and $\pi_i(S^n) = 0$ for $i < n$.

Further Reading

*Group Theory in a Nutshell for Physicists* (Zee, 2016), *Lie Algebras in Particle Physics* (Georgi, 1999), *Group Theory and Its Application to the Quantum Mechanics of Atomic Spectra* (Wigner, 1964), *Unitary Symmetry and Elementary Particles* (Lichtenberg, 1978), *Chemical Applications of Group Theory* (Cotton, 1990), *Group Theory and Quantum Mechanics* (Tinkham, 2003), and pi.math.cornell.edu/~hatcher/AT/ATch1.pdf.

Exercises

11.1 Show that all $n\times n$ (real) orthogonal matrices $O$ leave invariant the quadratic form $x_1^2 + x_2^2 + \cdots + x_n^2$, that is, that if $x' = Ox$, then $x'^2 = x^2$.


11.2 Show that the set of all $n\times n$ orthogonal matrices forms a group.
11.3 Show that all $n\times n$ unitary matrices $U$ leave invariant the quadratic form $|x_1|^2 + |x_2|^2 + \cdots + |x_n|^2$, that is, that if $x' = Ux$, then $|x'|^2 = |x|^2$.
11.4 Show that the set of all $n\times n$ unitary matrices forms a group.
11.5 Show that the set of all $n\times n$ unitary matrices with unit determinant forms a group.
11.6 Show that the matrix $D^{(j)}_{m'm}(g) = \langle j, m'|U(g)|j, m\rangle$ is unitary because the rotation operator $U(g)$ is unitary, $\langle j, m'|U^\dagger(g)\,U(g)|j, m\rangle = \delta_{m'm}$.
11.7 Invent a group of order 3 and compute its multiplication table. For extra credit, prove that the group is unique.
11.8 Show that the relation (11.23) between two equivalent representations is an isomorphism.
11.9 Suppose that $D_1$ and $D_2$ are equivalent, finite-dimensional, irreducible representations of a group $G$ so that $D_2(g) = S\, D_1(g)\, S^{-1}$ for all $g \in G$. What can you say about a matrix $A$ that satisfies $D_2(g)\, A = A\, D_1(g)$ for all $g \in G$?
11.10 Find all components of the matrix $\exp(i\alpha A)$ in which
\[
A = \begin{pmatrix} 0 & 0 & -i \\ 0 & 0 & 0 \\ i & 0 & 0 \end{pmatrix}. \tag{11.346}
\]
11.11 If $[A, B] = B$, find $e^{i\alpha A}\, B\, e^{-i\alpha A}$. Hint: what are the $\alpha$-derivatives of this expression?
11.12 Show that the direct-product matrix (11.34) of two representations $D$ and $D'$ is a representation.
11.13 Find a 4 × 4 matrix $S$ that relates the direct-product representation $D^{(1/2,1/2)}$ to the direct sum $D^{(1)} \oplus D^{(0)}$.
11.14 Find the generators in the adjoint representation of the group with structure constants $f_{abc} = \epsilon_{abc}$ where $a, b, c$ run from 1 to 3. Hint: The answer is three 3 × 3 matrices $t_a$, often written as $L_a$.
11.15 Show that the generators (11.95) satisfy the commutation relations (11.98).
11.16 Show that the demonstrated equation (11.104) implies the commutation relation (11.105).
11.17 Use the Cayley–Hamilton theorem (1.294) to show that the 3 × 3 matrix (11.101) that represents a right-handed rotation of $\theta$ radians about the axis $\theta$ is given by (11.102).
11.18 Verify the mixed Jacobi identity (11.163).
11.19 For the group $SU(3)$, find the structure constants $f_{123}$ and $f_{231}$.


11.20 Show that every 2 × 2 unitary matrix of unit determinant is a quaternion of unit norm.
11.21 Show that the quaternions as defined by (11.214) are closed under addition and multiplication and that the product $xq$ is a quaternion if $x$ is real and $q$ is a quaternion.
11.22 Show that the square of the matrix (11.202) is $J^2 = -I$, where $I$ is the $2n\times 2n$ identity matrix. Then by setting $R = \exp(\epsilon t)$ with $0 < \epsilon \ll 1$, show that $R\, J\, R^{\mathsf T} = J$ if and only if $J\, t\, J = t^{\mathsf T}$.
11.23 Show that $J\, t\, J = t^{\mathsf T}$ implies that $t$ is given by (11.207).
11.24 Show that the one-sided derivative $f'(q)$ (11.224) of the quaternionic function $f(q) = q^2$ depends upon the direction along which $q' \to 0$.
11.25 Show that the generators (11.228) of $Sp(2n)$ obey commutation relations of the form (11.229) for some real structure constants $f_{abc}$ and a suitably extended set of matrices $A, A', \ldots$ and $S_k, S_k', \ldots$.
11.26 Show that for $0 < \epsilon \ll 1$, the real $2n\times 2n$ matrix $T = \exp(\epsilon\, J\, S)$ in which $S$ is symmetric satisfies $T^{\mathsf T}\, J\, T = J$ (at least up to terms of order $\epsilon^2$) and so is in $Sp(2n, \mathbb R)$.
11.27 Show that the matrix $T$ of (11.210) is in $Sp(2, \mathbb R)$.
11.28 Use the parametrization (11.254) of the group $SU(2)$ to show that the parameters $a(c, b)$ that describe the product $g(a(c, b)) = g(c)\, g(b)$ are those of (11.256).
11.29 Use formulas (11.256) and (11.249) to show that the left-invariant measure for $SU(2)$ is given by (11.257).
11.30 In tensor notation, which is explained in Chapter 13, the condition (11.266) that $I + \omega$ be an infinitesimal Lorentz transformation reads $(\omega^{\mathsf T})^a{}_b = \omega_b{}^a = -\,\eta_{bc}\,\omega^c{}_d\,\eta^{da}$, in which sums over $c$ and $d$ from 0 to 3 are understood. In this notation, the matrix $\eta_{ef}$ lowers indices and $\eta^{gh}$ raises them, so that $\omega_b{}^a = -\,\omega_{bd}\,\eta^{da}$. (Both $\eta_{ef}$ and $\eta^{gh}$ are numerically equal to the matrix $\eta$ displayed in equation (11.259).) Multiply both sides of the condition (11.266) by $\eta_{ae} = \eta_{ea}$ and use the relation $\eta^{da}\eta_{ae} = \eta^d{}_e \equiv \delta^d_e$ to show that the matrix $\omega_{ab}$ with both indices lowered (or raised) is antisymmetric, that is,
\[
\omega_{ba} = -\,\omega_{ab} \quad\text{and}\quad \omega^{ba} = -\,\omega^{ab}. \tag{11.347}
\]
11.31 Show that the six matrices (11.269) and (11.270) satisfy the $SO(3,1)$ condition (11.266).
11.32 Show that the six generators $J$ and $K$ obey the commutation relations (11.272–11.274).


11.33 Show that if $J$ and $K$ satisfy the commutation relations (11.272–11.274) of the Lie algebra of the Lorentz group, then so do $J$ and $-K$.
11.34 Show that if the six generators $J$ and $K$ obey the commutation relations (11.272–11.274), then the six generators $J^+$ and $J^-$ obey the commutation relations (11.281).
11.35 Relate the parameter $\alpha$ in the definition (11.292) of the standard boost $B(p)$ to the 4-vector $p$ and the mass $m$.
11.36 Derive the formulas for $D^{(1/2,0)}(0, \alpha\hat p)$ given in Equation (11.293).
11.37 Derive the formulas for $D^{(0,1/2)}(0, \alpha\hat p)$ given in Equation (11.311).
11.38 For infinitesimal complex $z$, derive the 4-vector properties (11.294 and 11.312) of $(-I, \sigma)$ under $D^{(1/2,0)}$ and of $(I, \sigma)$ under $D^{(0,1/2)}$.
11.39 Show that under the unitary Lorentz transformation (11.296), the action density (11.297) is Lorentz covariant (11.298).
11.40 Show that under the unitary Lorentz transformation (11.313), the action density (11.314) is Lorentz covariant (11.315).
11.41 Show that under the unitary Lorentz transformations (11.296 and 11.313), the Majorana mass terms (11.305 and 11.319) are Lorentz covariant.
11.42 Show that the definitions of the gamma matrices (11.322) and of the generators (11.324) imply that the gamma matrices transform as a 4-vector under Lorentz transformations (11.325).
11.43 Show that (11.324) and (11.325) imply that the generators $J^{ab}$ satisfy the commutation relations (11.326) of the Lorentz group.
11.44 Show that the spinor $v = \sigma_2\, u^*$ defined by (11.332) is right handed (11.313) if $u$ is left handed (11.296).
11.45 Use (11.342) to get (11.343 and 11.344).
11.46 Derive (11.345) from (11.326, 11.338, 11.343, and 11.344).

12 Special Relativity

12.1 Inertial Frames and Lorentz Transformations

An inertial reference frame is a system of coordinates in which free particles move in straight lines at constant speeds. Our spacetime has one time dimension $x^0 = ct$ and three space dimensions $\mathbf x$. Its physical points are labeled by four coordinates, $p = (x^0, x^1, x^2, x^3)$. The quadratic difference between two infinitesimally separated points $p$ and $p + dp$ whose coordinates differ by $dx^0, dx^1, dx^2, dx^3$ is
\[
ds^2 = -\,c^2 dt^2 + (dx^1)^2 + (dx^2)^2 + (dx^3)^2. \tag{12.1}
\]
In the absence of gravity, $ds^2$ is the physical quadratic distance between the points $p$ and $p + dp$. Since it is physical, it must not change when we change coordinates. Changes of coordinates $x \to x'$ that leave $ds^2$ invariant are called Lorentz transformations. If we adopt the summation convention in which an index is summed from 0 to 3 if it occurs both raised and lowered in the same monomial, then we can write a Lorentz transformation as
\[
x'^i = \sum_{k=0}^{3} L^i{}_k\, x^k = L^i{}_k\, x^k. \tag{12.2}
\]
Lorentz transformations change coordinate differences $dx^k$ to $dx'^i = L^i{}_k\, dx^k$. The Minkowski-space metric $\eta_{ik} = \eta^{ik}$ is the 4 × 4 matrix
\[
(\eta_{ik}) = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = (\eta^{ik}). \tag{12.3}
\]
It is its own inverse: $\eta^2 = I$ or $\eta_{ik}\,\eta^{k\ell} = \delta_i^\ell$.


In terms of the metric $\eta$, the formula (12.1) for the quadratic distance is $ds^2 = dx^i\,\eta_{ik}\,dx^k$. A Lorentz transformation (12.2) must preserve this quadratic distance, so
\[
ds^2 = dx'^i\,\eta_{ik}\,dx'^k = L^i{}_\ell\, dx^\ell\,\eta_{ik}\, L^k{}_j\, dx^j = dx^\ell\,\eta_{\ell j}\, dx^j. \tag{12.4}
\]
By differentiating both sides with respect to $dx^\ell$ and $dx^j$, one may show (Exercise 12.1) that the matrix $L$ must obey the equation
\[
L^i{}_\ell\,\eta_{ik}\, L^k{}_j = \eta_{\ell j} \tag{12.5}
\]
in order to preserve quadratic distances. In matrix notation, left indexes label rows; right indexes label columns; and transposition interchanges rows and columns. Thus in matrix notation the condition (12.5) that defines a Lorentz transformation is $L^{\mathsf T}\eta L = \eta$, which since $\eta^2 = I$ implies that $\eta L^{\mathsf T}\eta L = \eta^2 = I$. Thus the inverse (11.264) of a Lorentz transformation is $L^{-1} = \eta L^{\mathsf T}\eta$, or more explicitly
\[
L^{-1\,\ell}{}_j = \eta^{\ell k}\, L^i{}_k\,\eta_{ij} \qquad\text{and}\qquad L^{-1\,\ell}{}_j = \eta^{\ell k}\, (L^{\mathsf T})_k{}^i\,\eta_{ij}. \tag{12.6}
\]

Example 12.1 (Lorentz transformations) If we change coordinates to
\[
x'^0 = \cosh\theta\, x^0 + \sinh\theta\, x^1 \qquad x'^1 = \sinh\theta\, x^0 + \cosh\theta\, x^1 \qquad x'^2 = x^2 \quad\text{and}\quad x'^3 = x^3, \tag{12.7}
\]
then the matrix $L$ of the Lorentz transformation is (Exercise 12.2)
\[
L = \begin{pmatrix} \cosh\theta & \sinh\theta & 0 & 0 \\ \sinh\theta & \cosh\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{12.8}
\]
which is a boost in the $x$-direction to speed $v/c = \tanh\theta$ and rapidity $\theta$. One may check (Exercise 12.3) that $L^{\mathsf T}\eta L = \eta$. In the new coordinates, the point $p = (x^0, x^1, x^2, x^3)$ is
\[
p = \left( \cosh\theta\, x'^0 - \sinh\theta\, x'^1,\ \cosh\theta\, x'^1 - \sinh\theta\, x'^0,\ x'^2,\ x'^3 \right). \tag{12.9}
\]

Example 12.2 (Spacelike points) Points $p$ and $q$ with $(p - q)\cdot(p - q) = (\mathbf p - \mathbf q)^2 - (p^0 - q^0)^2 > 0$ are spacelike. Spacelike events occur at the same time in some Lorentz frames. Let the coordinates of $p$ and $q$ be $(0, \mathbf 0)$ and $(ct, L, 0, 0)$ with $|ct/L| < 1$ so that $(p - q)^2 > 0$. The Lorentz transformation (12.8) leaves the coordinates of $p$ unchanged but takes those of $q$ to $(ct\cosh\theta + L\sinh\theta,\ ct\sinh\theta + L\cosh\theta,\ 0,\ 0)$. So if $v/c = \tanh\theta = -\,ct/L$, then $p$ and $q$ occur at the same time, $t' = 0$, in a frame moving with $|v/c| = |\tanh\theta| < 1$.
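The boost (12.8) and the conditions (12.5 and 12.6) can be checked in a few lines of NumPy (our own illustration, not from the text; the rapidity and the displacement are arbitrary):

```python
import numpy as np

theta = 0.62                      # rapidity; v/c = tanh(theta)
ch, sh = np.cosh(theta), np.sinh(theta)

# boost in the x-direction, eq. (12.8)
L = np.array([[ch, sh, 0, 0],
              [sh, ch, 0, 0],
              [0,  0,  1, 0],
              [0,  0,  0, 1]])
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

assert np.allclose(L.T @ eta @ L, eta)                 # Lorentz condition (12.5)
assert np.allclose(np.linalg.inv(L), eta @ L.T @ eta)  # inverse formula (12.6)

# the quadratic distance ds^2 is invariant
dx = np.array([0.5, 0.1, -0.2, 0.3])
assert np.isclose(dx @ eta @ dx, (L @ dx) @ eta @ (L @ dx))
print("L^T eta L = eta and ds^2 is invariant")
```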


Example 12.3 (Timelike points) Points $p$ and $q$ with $(p - q)\cdot(p - q) < 0$ are timelike. Timelike events occur at the same place in some Lorentz frames. We can use the same coordinates as in the previous example (12.2) but with $|ct/L| > 1$ so that $(p - q)^2 < 0$. The Lorentz transformation (12.8) leaves the coordinates of $p$ unchanged but takes those of $q$ to $(ct\cosh\theta + L\sinh\theta,\ ct\sinh\theta + L\cosh\theta,\ 0,\ 0)$. So if $v/c = \tanh\theta = -\,L/(ct)$, then $p$ and $q$ occur at the same place, $x' = 0$, in a frame moving with $|v/c| = |\tanh\theta| < 1$.

In special relativity, the spacetime coordinates $x^0 = ct$ and $(x^1, x^2, x^3) = \mathbf x$ of a point have upper indexes and transform contravariantly, $x'^k = L^k{}_\ell\, x^\ell$, under Lorentz transformations. Spacetime derivatives $\partial_0 = \partial/\partial x^0$ and $\nabla = (\partial_1, \partial_2, \partial_3)$ have lower indexes and transform covariantly under Lorentz transformations. That is, since $x^\ell = L^{-1\,\ell}{}_j\, x'^j$, derivatives transform as
\[
\frac{\partial}{\partial x'^j} = \frac{\partial x^\ell}{\partial x'^j}\,\frac{\partial}{\partial x^\ell} = L^{-1\,\ell}{}_j\,\frac{\partial}{\partial x^\ell} = \eta^{\ell k}\, L^i{}_k\,\eta_{ij}\,\frac{\partial}{\partial x^\ell} \tag{12.10}
\]
or equivalently, because $\eta^{mj}\eta_{ij} = \delta^m_i$, as
\[
\eta^{mj}\,\frac{\partial}{\partial x'^j} = L^m{}_k\,\eta^{\ell k}\,\frac{\partial}{\partial x^\ell} \qquad\text{or}\qquad \partial'^m = L^m{}_k\,\partial^k. \tag{12.11}
\]
This last equation illustrates a general rule: the metric $\eta^{ik}$ raises indexes, turning covariant vectors into contravariant vectors, $\eta^{ik} A_k = A^i$, and the metric $\eta_{ik}$ lowers indexes, turning contravariant vectors into covariant vectors, $\eta_{ik} A^k = A_i$. Thus another way to write the inverse (12.6) of a Lorentz transformation is $L^{-1\,\ell}{}_j = \eta^{\ell k}\, L^i{}_k\,\eta_{ij} = L_j{}^\ell$.

12.2 Special Relativity

The spacetime of special relativity is flat, 4-dimensional Minkowski space. In the absence of gravity, the inner product
\[
(p - q)\cdot(p - q) = (\mathbf p - \mathbf q)^2 - (p^0 - q^0)^2 = (p - q)^i\,\eta_{ik}\,(p - q)^k \tag{12.12}
\]
is physical and the same in all Lorentz frames. If the points $p$ and $q$ are close neighbors with coordinates $x^i + dx^i$ for $p$ and $x^i$ for $q$, then that invariant inner product is $ds^2 = dx^i\,\eta_{ij}\,dx^j = d\mathbf x^2 - (dx^0)^2$. If the points $p$ and $q$ are on the trajectory of a massive particle moving at velocity $\mathbf v$, then this invariant quantity is the square of the invariant distance
\[
ds^2 = d\mathbf x^2 - c^2 dt^2 = \left( \mathbf v^2 - c^2 \right) dt^2 \tag{12.13}
\]
which is negative since $v < c$. The time in the rest frame of the particle is the proper time $\tau$, and
\[
c^2 d\tau^2 = -\,ds^2 = \left( c^2 - \mathbf v^2 \right) dt^2. \tag{12.14}
\]
So for a particle moving at speed $v$, the element of proper time $d\tau$ is smaller than the corresponding element of laboratory time $dt$ by the factor $\sqrt{1 - v^2/c^2}$. The proper time is the time in the rest frame of the particle where $v = 0$. So if $T(0)$ is the lifetime of a particle at rest, then the apparent lifetime $T(v)$ when the particle is moving at speed $v$ is
\[
T(v) = \frac{dt}{d\tau}\, T(0) = \frac{T(0)}{\sqrt{1 - v^2/c^2}} \tag{12.15}
\]
which is longer than $T(0)$ since $\sqrt{1 - v^2/c^2} \le 1$, an effect known as time dilation.
Example 12.4 (Time dilation in muon decay) A muon at rest has a mean life of T (0) = 2.2 × 10−6 seconds. Cosmic rays hitting nitrogen and oxygen nuclei make pions high in the Earth’s atmosphere. The pions rapidly decay into muons in 2.6 × 10−8 s. A muon moving at the speed of light from 10 km takes at least t = 10 km /300, 000 (km/sec) = 3.3 × 10−5 s to hit the ground. Were it not for time dilation, the probability P of such a muon reaching the ground as a muon would be

= exp(−33/2.2) = e−15 = 2.6 × 10−7. (12.16) The mass of a muon is 105.66 MeV. So a muon of energy E = 749 MeV has by P

= e−t

/ T ( 0)

(12.23) a time-dilation factor of

²

1

1−

v 2/ c2

749 = 7.089 = ² = mcE 2 = 105 .7

So a muon moving at a speed of v equation (12.15) as T (v) =

E T (0) = mc2

²

1

1 − (0. 99)2

.

(12.17)

= 0.99 c has an apparent mean life T (v) given by

2.2 × 10 s = ² = 1.6 × 10 −5 s. 1 − v 2 /c2 1 − (0. 99)2 T (0)

−6

(12.18)

The probability of survival with time dilation is P

= e −t T = exp(−33/16) = 0.12 /

(v)

(12.19)

so that 12% survive. Time dilation increases the chance of survival by a factor of 460,000 – no small effect.


12.3 Kinematics From the scalar d τ , and the contravariant vector d x i , we can make the 4-vector u

i

=

dxi dτ

dt dτ

=

³dx0

dx , dt dt

²

´



in which u 0 = c dt /d τ = c/ 1 − v2 /c 2 and u energy–momentum 4-vector p i pi

i

1

1 − v2 /c2

(c , v )

(12.20)

= u 0 v /c. The product mui is the

i

= m u i = m ddxτ = m ddtτ ddtx = ² m 2 2 ddtx 1 − v /c ´ ³ = ² m 2 2 (c, v ) = Ec , p . 1 − v /c

i

(12.21)

Its invariant inner product is a constant characteristic of the particle and proportional to the square of its mass c 2 p i pi

= mc u i mc u i = − E 2 + c2 p 2 = −m 2 c4.

(12.22)

Note that the time-dilation factor is the ratio of the energy of a particle to its rest energy

\frac{1}{\sqrt{1 - v^2/c^2}} = \frac{E}{mc^2}    (12.23)

and the velocity of the particle is its momentum divided by its equivalent mass E/c^2

\mathbf{v} = \frac{\mathbf{p}}{E/c^2}.    (12.24)

The analog of F = ma is

m\frac{d^2 x^i}{d\tau^2} = m\frac{du^i}{d\tau} = \frac{dp^i}{d\tau} = f^i    (12.25)

in which p^0 = E/c, and f^i is a 4-vector force.
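A short numeric check (again not from the text) ties (12.22), (12.23), and (12.24) together for the 749 MeV muon of Example 12.4, working in units with c = 1.

```python
import math

m = 105.66      # muon mass, MeV (c = 1 units)
E = 749.0       # total energy, MeV

# from the invariant (12.22): -E^2 + p^2 = -m^2, so p = sqrt(E^2 - m^2)
p = math.sqrt(E**2 - m**2)

v = p / E                           # velocity from (12.24), in units of c
gamma = 1.0 / math.sqrt(1.0 - v**2) # time-dilation factor

print(v)        # ≈ 0.99
print(gamma)    # ≈ 7.09, matching E/mc^2 in (12.23)
```

The dilation factor computed from v agrees with E/m to rounding, which is exactly the content of (12.23).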

Example 12.5 (Time dilation and proper time) In the frame of a laboratory, a particle of mass m with 4-momentum p^i_{\rm lab} = (E/c, p, 0, 0) travels a distance L in a time t for a 4-vector displacement of x^i_{\rm lab} = (ct, L, 0, 0). In its own rest frame, the particle's 4-momentum and 4-displacement are p^i_{\rm rest} = (mc, 0, 0, 0) and x^i_{\rm rest} = (c\tau, 0, 0, 0). Since the Minkowski inner product of two 4-vectors is Lorentz invariant, we have

\left(p^i x_i\right)_{\rm rest} = \left(p^i x_i\right)_{\rm lab} \quad\text{or}\quad pL - Et = -mc^2\tau = -mc^2 t\sqrt{1 - v^2/c^2}.    (12.26)

So a massive particle's phase \exp(i p^i x_i/\hbar) is \exp(-imc^2\tau/\hbar).


Example 12.6 (p + p \to 3p + \bar p) Conservation of the energy–momentum 4-vector gives p + p_0 = 3p' + \bar p'. We set c = 1 and use this equality in the invariant form (p + p_0)^2 = (3p' + \bar p')^2. We compute (p + p_0)^2 = p^2 + p_0^2 + 2\,p\cdot p_0 = -2m_p^2 + 2\,p\cdot p_0 in the laboratory frame, in which p_0 = (m_p, \mathbf{0}). Thus (p + p_0)^2 = -2m_p^2 - 2E_p m_p. We compute (3p' + \bar p')^2 in the frame in which each of the three protons and the antiproton has zero spatial momentum. There (3p' + \bar p')^2 = (4m_p, \mathbf{0})^2 = -16m_p^2. We get E_p = 7m_p, of which 6m_p = 5.63 GeV is the threshold kinetic energy of the proton. In 1955, when the group led by Owen Chamberlain and Emilio Segrè discovered the antiproton, the nominal maximum energy of the protons in the Bevatron was 6.2 GeV.
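The threshold energy of Example 12.6 follows from equating the two invariants; a minimal numeric check (not in the text) is:

```python
mp = 938.3e-3   # proton mass in GeV (c = 1)

# lab frame:       (p + p0)^2 = -2*mp^2 - 2*Ep*mp
# threshold frame: (3p' + pbar')^2 = -(4*mp)^2 = -16*mp^2
# equating and solving for Ep:
Ep = ((4 * mp)**2 - 2 * mp**2) / (2 * mp)

print(Ep)         # ≈ 6.57 GeV, i.e. 7*mp
print(Ep - mp)    # ≈ 5.63 GeV, the threshold kinetic energy
```

Algebraically Ep = 14 m_p^2 / 2 m_p = 7 m_p, so the kinetic energy is 6 m_p, as stated.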

12.4 Electrodynamics

In electrodynamics and in MKSA (SI) units, the 3-dimensional vector potential \mathbf{A} and the scalar potential \phi form a covariant 4-vector potential

A_i = \left(-\frac{\phi}{c}, \mathbf{A}\right).    (12.27)

The contravariant 4-vector potential is A^i = (\phi/c, \mathbf{A}). The magnetic induction is

\mathbf{B} = \nabla\times\mathbf{A} \quad\text{or}\quad B_i = \epsilon_{ijk}\,\partial_j A_k    (12.28)

in which \partial_j = \partial/\partial x^j, the sum over the repeated indices j and k runs from 1 to 3, and \epsilon_{ijk} is totally antisymmetric with \epsilon_{123} = 1. The electric field is for i = 1, 2, 3

E_i = c\left(\frac{\partial A_0}{\partial x^i} - \frac{\partial A_i}{\partial x^0}\right) = -\frac{\partial\phi}{\partial x^i} - \frac{\partial A_i}{\partial t}    (12.29)

where x^0 = ct. In 3-vector notation, \mathbf{E} is given by the gradient of \phi and the time derivative of \mathbf{A}

\mathbf{E} = -\nabla\phi - \dot{\mathbf{A}}.    (12.30)

The second-rank, antisymmetric Faraday field-strength tensor is for i, j = 0, 1, 2, 3

F_{ij} = \frac{\partial A_j}{\partial x^i} - \frac{\partial A_i}{\partial x^j} = -F_{ji}.    (12.31)

In terms of it, the electric field is E_i = cF_{i0}, and the magnetic field B_i is

B_i = \frac{1}{2}\epsilon_{ijk}F_{jk} = \frac{1}{2}\epsilon_{ijk}\left(\frac{\partial A_k}{\partial x^j} - \frac{\partial A_j}{\partial x^k}\right) = (\nabla\times\mathbf{A})_i    (12.32)

where the sum over repeated indexes runs from 1 to 3. The inverse equation F_{jk} = \epsilon_{jki}B_i for spatial j and k follows from the Levi-Civita identity (1.497)

\epsilon_{jki}B_i = \frac{1}{2}\epsilon_{jki}\epsilon_{inm}F_{nm} = \frac{1}{2}\epsilon_{ijk}\epsilon_{inm}F_{nm} = \frac{1}{2}\left(\delta_{jn}\delta_{km} - \delta_{jm}\delta_{kn}\right)F_{nm} = \frac{1}{2}\left(F_{jk} - F_{kj}\right) = F_{jk}.    (12.33)
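The Levi-Civita identity used in (12.33), and the fact that (12.32) inverts F_{jk} = \epsilon_{jki}B_i, can both be verified numerically; this sketch (not from the text) uses NumPy's einsum.

```python
import numpy as np

# totally antisymmetric epsilon with eps[0,1,2] = 1
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k] = 1.0
    eps[i, k, j] = -1.0

# the contraction identity eps_ijk eps_inm = delta_jn delta_km - delta_jm delta_kn
d = np.eye(3)
lhs = np.einsum('ijk,inm->jknm', eps, eps)
rhs = np.einsum('jn,km->jknm', d, d) - np.einsum('jm,kn->jknm', d, d)
assert np.allclose(lhs, rhs)

# build the spatial F_jk = eps_jki B_i from a random B, then recover B via (12.32)
rng = np.random.default_rng(0)
B = rng.standard_normal(3)
F = np.einsum('jki,i->jk', eps, B)
B_back = 0.5 * np.einsum('ijk,jk->i', eps, F)
assert np.allclose(B, B_back)
print("identities verified")
```

The round trip B -> F -> B is exactly the computation spelled out in (12.33).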

In 3-vector notation and MKSA (SI) units, Maxwell's equations are a ban on magnetic monopoles and Faraday's law, both homogeneous,

\nabla\cdot\mathbf{B} = 0 \quad\text{and}\quad \nabla\times\mathbf{E} + \dot{\mathbf{B}} = 0    (12.34)

and Gauss's law and the Maxwell–Ampère law, both inhomogeneous,

\nabla\cdot\mathbf{D} = \rho_f \quad\text{and}\quad \nabla\times\mathbf{H} = \mathbf{j}_f + \dot{\mathbf{D}}.    (12.35)

Here \rho_f is the density of free charge and \mathbf{j}_f is the free current density. By free, we understand charges and currents that do not arise from polarization and are not restrained by chemical bonds. The divergence of \nabla\times\mathbf{H} vanishes (like that of any curl), and so the Maxwell–Ampère law and Gauss's law imply that free charge is conserved

0 = \nabla\cdot(\nabla\times\mathbf{H}) = \nabla\cdot\mathbf{j}_f + \nabla\cdot\dot{\mathbf{D}} = \nabla\cdot\mathbf{j}_f + \dot\rho_f.    (12.36)

If we use this continuity equation to replace \nabla\cdot\mathbf{j}_f with -\dot\rho_f in its middle form 0 = \nabla\cdot\mathbf{j}_f + \nabla\cdot\dot{\mathbf{D}}, then we see that the Maxwell–Ampère law preserves the Gauss-law constraint in time

0 = \nabla\cdot\mathbf{j}_f + \nabla\cdot\dot{\mathbf{D}} = \frac{\partial}{\partial t}\left(-\rho_f + \nabla\cdot\mathbf{D}\right).    (12.37)

Similarly, Faraday's law preserves the constraint \nabla\cdot\mathbf{B} = 0

0 = -\nabla\cdot(\nabla\times\mathbf{E}) = \frac{\partial}{\partial t}\,\nabla\cdot\mathbf{B} = 0.    (12.38)

In a linear, isotropic medium, the electric displacement \mathbf{D} is related to the electric field \mathbf{E} by the permittivity \epsilon, \mathbf{D} = \epsilon\mathbf{E}, and the magnetic or magnetizing field \mathbf{H} differs from the magnetic induction \mathbf{B} by the permeability \mu, \mathbf{H} = \mathbf{B}/\mu.

On a sub-nanometer scale, the microscopic form of Maxwell's equations applies. On this scale, the homogeneous equations (12.34) are unchanged, but the inhomogeneous ones are

\nabla\cdot\mathbf{E} = \frac{\rho}{\epsilon_0} \quad\text{and}\quad \nabla\times\mathbf{B} = \mu_0\mathbf{j} + \epsilon_0\mu_0\dot{\mathbf{E}} = \mu_0\mathbf{j} + \frac{\dot{\mathbf{E}}}{c^2}    (12.39)

in which \rho and \mathbf{j} are the total charge and current densities, and \epsilon_0 = 8.854 \times 10^{-12} F/m and \mu_0 = 4\pi \times 10^{-7} N/A^2 are the electric and magnetic constants, whose product is the inverse of the square of the speed of light, \epsilon_0\mu_0 = 1/c^2. Gauss's law and the Maxwell–Ampère law (12.39) imply (Exercise 12.10) that


the microscopic (total) current-density 4-vector j = (c\rho, \mathbf{j}) obeys the continuity equation \dot\rho + \nabla\cdot\mathbf{j} = 0. Electric charge is conserved.

In vacuum, \rho = \mathbf{j} = 0, \mathbf{D} = \epsilon_0\mathbf{E}, and \mathbf{H} = \mathbf{B}/\mu_0, and Maxwell's equations become

\nabla\cdot\mathbf{B} = 0 \quad\text{and}\quad \nabla\times\mathbf{E} + \dot{\mathbf{B}} = 0
\nabla\cdot\mathbf{E} = 0 \quad\text{and}\quad \nabla\times\mathbf{B} = \frac{1}{c^2}\dot{\mathbf{E}}.    (12.40)

Two of these equations, \nabla\cdot\mathbf{B} = 0 and \nabla\cdot\mathbf{E} = 0, are constraints. Taking the curl of the other two equations, we find

\nabla\times(\nabla\times\mathbf{E}) = -\frac{1}{c^2}\ddot{\mathbf{E}} \quad\text{and}\quad \nabla\times(\nabla\times\mathbf{B}) = -\frac{1}{c^2}\ddot{\mathbf{B}}.    (12.41)

One may use the Levi-Civita identity (1.497) to show (Exercise 12.13) that

\nabla\times(\nabla\times\mathbf{E}) = \nabla(\nabla\cdot\mathbf{E}) - \triangle\mathbf{E} \quad\text{and}\quad \nabla\times(\nabla\times\mathbf{B}) = \nabla(\nabla\cdot\mathbf{B}) - \triangle\mathbf{B}    (12.42)

in which \triangle \equiv \nabla^2. Since in vacuum the divergence of \mathbf{E} vanishes, and since that of \mathbf{B} always vanishes, these identities and the curl–curl equations (12.41) tell us that waves of \mathbf{E} and \mathbf{B} move at the speed of light

\frac{1}{c^2}\ddot{\mathbf{E}} - \triangle\mathbf{E} = 0 \quad\text{and}\quad \frac{1}{c^2}\ddot{\mathbf{B}} - \triangle\mathbf{B} = 0.    (12.43)
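The wave speed hiding in (12.43) is fixed by the electric and magnetic constants quoted above; a one-line check (not from the text) recovers c from them.

```python
import math

eps0 = 8.854e-12            # electric constant, F/m (value quoted in the text)
mu0 = 4 * math.pi * 1e-7    # magnetic constant, N/A^2

# epsilon_0 * mu_0 = 1/c^2, so the wave speed in (12.43) is
c = 1.0 / math.sqrt(eps0 * mu0)
print(c)    # ≈ 2.998e8 m/s
```

With the rounded \epsilon_0 above, the result agrees with the speed of light to better than a part in a thousand.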

We may write the two homogeneous Maxwell equations (12.34) as

\partial_i F_{jk} + \partial_k F_{ij} + \partial_j F_{ki} = \partial_i\left(\partial_j A_k - \partial_k A_j\right) + \partial_k\left(\partial_i A_j - \partial_j A_i\right) + \partial_j\left(\partial_k A_i - \partial_i A_k\right) = 0    (12.44)

(Exercise 12.11). This relation, known as the Bianchi identity, actually is a generally covariant tensor equation

\epsilon^{\ell ijk}\,\partial_i F_{jk} = 0    (12.45)

in which \epsilon^{\ell ijk} is totally antisymmetric, as explained in Section 13.20. There are four versions of this identity (corresponding to the four ways of choosing three different indices i, j, k from among four and leaving out one, \ell). The \ell = 0 case gives the scalar equation \nabla\cdot\mathbf{B} = 0, and the three that have \ell \ne 0 give the vector equation \nabla\times\mathbf{E} + \dot{\mathbf{B}} = 0.

In tensor notation, the microscopic form of the two inhomogeneous equations (12.39) – the laws of Gauss and Ampère – are a single equation

\partial_i F^{ki} = \mu_0 j^k \quad\text{in which}\quad j^k = (c\rho, \mathbf{j})    (12.46)

is the current 4-vector.


The Lorentz force law for a particle of charge q is

m\frac{d^2 x^i}{d\tau^2} = m\frac{du^i}{d\tau} = \frac{dp^i}{d\tau} = f^i = qF^i{}_j\frac{dx^j}{d\tau} = qF^i{}_j\,u^j.    (12.47)

We may cancel a factor of dt/d\tau from both sides and find for i = 1, 2, 3

\frac{dp^i}{dt} = q\left(-cF^{i0} + \epsilon_{ijk}\,v^j B_k\right) \quad\text{or}\quad \frac{d\mathbf{p}}{dt} = q\left(\mathbf{E} + \mathbf{v}\times\mathbf{B}\right)    (12.48)

and for i = 0

\frac{dE}{dt} = q\,\mathbf{E}\cdot\mathbf{v}    (12.49)

which shows that only the electric field does work. The only special-relativistic correction needed in Maxwell's electrodynamics is a factor of 1/\sqrt{1 - v^2/c^2} in these equations. That is, we use \mathbf{p} = m\mathbf{u} = m\mathbf{v}/\sqrt{1 - v^2/c^2}, not \mathbf{p} = m\mathbf{v}, in (12.48), and we use the total energy E, not the kinetic energy, in (12.49). The reason why so little of classical electrodynamics was changed by special relativity is that electric and magnetic effects were accessible to measurement during the 1800s. Classical electrodynamics was almost perfect.

Keeping track of factors of the speed of light is a lot of trouble and a distraction; in what follows, we'll often use units with c = 1.

12.5 Principle of Stationary Action in Special Relativity

The action for a free particle of mass m in special relativity is

S = -m\int_{\tau_1}^{\tau_2} d\tau = -\int_{t_1}^{t_2} m\sqrt{1 - \dot{\mathbf{x}}^2}\,dt    (12.50)

where c = 1 and \dot{\mathbf{x}} = d\mathbf{x}/dt. The requirement of stationary action is

0 = \delta S = -\delta\int_{t_1}^{t_2} m\sqrt{1 - \dot{\mathbf{x}}^2}\,dt = m\int_{t_1}^{t_2} \frac{\dot{\mathbf{x}}\cdot\delta\dot{\mathbf{x}}}{\sqrt{1 - \dot{\mathbf{x}}^2}}\,dt.    (12.51)

But 1/\sqrt{1 - \dot{\mathbf{x}}^2} = dt/d\tau and so

0 = \delta S = m\int_{t_1}^{t_2} \frac{d\mathbf{x}}{dt}\cdot\frac{d\,\delta\mathbf{x}}{dt}\,\frac{dt}{d\tau}\,dt = m\int_{\tau_1}^{\tau_2} \frac{d\mathbf{x}}{dt}\cdot\frac{d\,\delta\mathbf{x}}{dt}\,\frac{dt}{d\tau}\,\frac{dt}{d\tau}\,d\tau = m\int_{\tau_1}^{\tau_2} \frac{d\mathbf{x}}{d\tau}\cdot\frac{d\,\delta\mathbf{x}}{d\tau}\,d\tau.    (12.52)

So integrating by parts, keeping in mind that \delta\mathbf{x}(\tau_2) = \delta\mathbf{x}(\tau_1) = 0, we have

0 = \delta S = m\int_{\tau_1}^{\tau_2}\left[\frac{d}{d\tau}\left(\frac{d\mathbf{x}}{d\tau}\cdot\delta\mathbf{x}\right) - \frac{d^2\mathbf{x}}{d\tau^2}\cdot\delta\mathbf{x}\right]d\tau = -m\int_{\tau_1}^{\tau_2} \frac{d^2\mathbf{x}}{d\tau^2}\cdot\delta\mathbf{x}\,d\tau.    (12.53)


To have this hold for arbitrary \delta\mathbf{x}, we need

\frac{d^2\mathbf{x}}{d\tau^2} = 0    (12.54)

which is the equation of motion for a free particle in special relativity.

What about a charged particle in an electromagnetic field A_i? Its action is

S = -m\int_{\tau_1}^{\tau_2} d\tau + q\int_{x_1}^{x_2} A_i(x)\,dx^i = \int_{\tau_1}^{\tau_2}\left[-m + qA_i(x)\frac{dx^i}{d\tau}\right]d\tau.    (12.55)

We now treat the first term in a 4-dimensional manner

\delta\,d\tau = \delta\sqrt{-\eta_{ik}\,dx^i dx^k} = \frac{-\eta_{ik}\,dx^i\,\delta\,dx^k}{\sqrt{-\eta_{ik}\,dx^i dx^k}} = -u_k\,\delta\,dx^k = -u_k\,d\,\delta x^k    (12.56)

in which u_k = dx_k/d\tau is the 4-velocity (12.20) and \eta is the Minkowski metric (12.3) of flat spacetime. The variation of the other term is

\delta\left(A_i\,dx^i\right) = (\delta A_i)\,dx^i + A_i\,\delta\,dx^i = A_{i,k}\,\delta x^k\,dx^i + A_i\,d\,\delta x^i.    (12.57)

Putting them together, we get for \delta S

\delta S = \int_{\tau_1}^{\tau_2}\left[m u_k\frac{d\,\delta x^k}{d\tau} + qA_{i,k}\,\delta x^k\,\frac{dx^i}{d\tau} + qA_i\frac{d\,\delta x^i}{d\tau}\right]d\tau.    (12.58)

After integrating by parts the last term, dropping the boundary terms, and changing a dummy index, we get

\delta S = \int_{\tau_1}^{\tau_2}\left[-m\frac{du_k}{d\tau}\,\delta x^k + qA_{i,k}\,\delta x^k\,\frac{dx^i}{d\tau} - q\frac{dA_k}{d\tau}\,\delta x^k\right]d\tau = \int_{\tau_1}^{\tau_2}\left[-m\frac{du_k}{d\tau} + q\left(A_{i,k} - A_{k,i}\right)\frac{dx^i}{d\tau}\right]\delta x^k\,d\tau.    (12.59)

If this first-order variation of the action is to vanish for arbitrary \delta x^k, then the particle must follow the path

0 = -m\frac{du_k}{d\tau} + q\left(A_{i,k} - A_{k,i}\right)\frac{dx^i}{d\tau} \quad\text{or}\quad \frac{dp_k}{d\tau} = qF_{ki}\,u^i    (12.60)

which is the Lorentz force law (12.47).

12.6 Differential Forms

By (13.9 and 13.4), a covariant vector field contracted with contravariant coordinate differentials is invariant under arbitrary coordinate transformations

A' = A'_i\,dx'^i = \frac{\partial x^j}{\partial x'^i}A_j\,\frac{\partial x'^i}{\partial x^k}\,dx^k = \delta^j_k A_j\,dx^k = A_k\,dx^k = A.    (12.61)


This invariant quantity A = A_k\,dx^k is called a 1-form in the language of differential forms introduced about a century ago by Élie Cartan, son of a blacksmith (1869–1951).

The wedge product dx \wedge dy of two coordinate differentials is the directed area spanned by the two differentials and is defined to be antisymmetric

dx \wedge dy = -\,dy \wedge dx \quad\text{and}\quad dx \wedge dx = dy \wedge dy = 0    (12.62)

so as to transform correctly under a change of coordinates. In terms of the coordinates u = u(x, y) and v = v(x, y), the new element of area is

du \wedge dv = \left(\frac{\partial u}{\partial x}\,dx + \frac{\partial u}{\partial y}\,dy\right)\wedge\left(\frac{\partial v}{\partial x}\,dx + \frac{\partial v}{\partial y}\,dy\right).    (12.63)

Labeling partial derivatives by subscripts (2.7) and using the antisymmetry (12.62) of the wedge product, we see that du \wedge dv is the old area dx \wedge dy multiplied by the Jacobian (Section 1.21) of the transformation x, y \to u, v

du \wedge dv = \left(u_x\,dx + u_y\,dy\right)\wedge\left(v_x\,dx + v_y\,dy\right)
= u_x v_x\,dx\wedge dx + u_x v_y\,dx\wedge dy + u_y v_x\,dy\wedge dx + u_y v_y\,dy\wedge dy
= \left(u_x v_y - u_y v_x\right)dx\wedge dy = \begin{vmatrix} u_x & u_y \\ v_x & v_y \end{vmatrix}\,dx\wedge dy = J(u, v;\, x, y)\,dx\wedge dy.    (12.64)
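The antisymmetric expansion (12.64) can be checked symbolically; this sketch (not from the text) uses SymPy and the sample map u = x^2 - y^2, v = 2xy.

```python
import sympy as sp

x, y = sp.symbols('x y')
u = x**2 - y**2
v = 2 * x * y

# coefficient of dx ∧ dy from the antisymmetric expansion (12.64)
wedge_coeff = sp.diff(u, x) * sp.diff(v, y) - sp.diff(u, y) * sp.diff(v, x)

# Jacobian determinant of the map (x, y) -> (u, v)
J = sp.Matrix([[sp.diff(u, x), sp.diff(u, y)],
               [sp.diff(v, x), sp.diff(v, y)]]).det()

assert sp.simplify(wedge_coeff - J) == 0
print(sp.expand(wedge_coeff))    # 4*x**2 + 4*y**2
```

Any other smooth pair u(x, y), v(x, y) would do; the wedge coefficient always reproduces the Jacobian determinant.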

A contraction H = \frac{1}{2}H_{ik}\,dx^i\wedge dx^k of a second-rank covariant tensor with a wedge product of two differentials is a 2-form. A p-form is a rank-p covariant tensor contracted with a wedge product of p differentials

K = \frac{1}{p!}\,K_{i_1 \ldots i_p}\,dx^{i_1}\wedge\cdots\wedge dx^{i_p}.    (12.65)

The exterior derivative d differentiates and adds a differential. It turns a p-form into a (p + 1)-form. It turns a function f, which is a 0-form, into a 1-form

df = \frac{\partial f}{\partial x^i}\,dx^i    (12.66)

and a 1-form A = A_j\,dx^j into a 2-form dA = d(A_j\,dx^j) = (\partial_i A_j)\,dx^i\wedge dx^j.

Example 12.7 (The curl) The exterior derivative of the 1-form

A = A_x\,dx + A_y\,dy + A_z\,dz

is a 2-form that contains the curl (2.42) of \mathbf{A}

dA = \partial_y A_x\,dy\wedge dx + \partial_z A_x\,dz\wedge dx + \partial_x A_y\,dx\wedge dy + \partial_z A_y\,dz\wedge dy + \partial_x A_z\,dx\wedge dz + \partial_y A_z\,dy\wedge dz    (12.67)


= \left(\partial_y A_z - \partial_z A_y\right)dy\wedge dz + \left(\partial_z A_x - \partial_x A_z\right)dz\wedge dx + \left(\partial_x A_y - \partial_y A_x\right)dx\wedge dy
= (\nabla\times\mathbf{A})_x\,dy\wedge dz + (\nabla\times\mathbf{A})_y\,dz\wedge dx + (\nabla\times\mathbf{A})_z\,dx\wedge dy.    (12.68)

=d

(A

j

dx j

) =∂ A i

j

dxi

∧ d x j = 21 Fi j d x i ∧ d x j = F

(12.69)

in which ∂i = ∂/∂ x i . The square dd of the exterior derivative vanishes in the sense that dd applied to any p-form Q is zero

¼ (

)½ = d ¼(∂ Q ) d x r ∧ d x i ∧ . . .½ r i = (∂s ∂r Q i ) d x s ∧ d x r ∧ d x i ∧ . . . = 0

d d Q i ... d x i ∧ . . .

...

(12.70)

...

because ∂s ∂r Q is symmetric in r and s while d x s ∧ d x r is antisymmetric. If Mik is a covariant second-rank tensor with no particular symmetry, then (Exercise 12.12) only its antisymmetric part contributes to the 2-form Mik d x i ∧ d x k and only its symmetric part contributes to Mik d x i d x k . Example 12.8 (The homogeneous Maxwell equations) The exterior derivative d applied to the Faraday 2-form F = d A gives the homogeneous Maxwell equations 0 = dd A

= d F = d Fik dx i ∧ dx k = ∂ Fik dx ∧ dx i ∧ dx k ±

±

(12.71)

an equation known as the Bianchi identity (12.45).

A p-form H is closed if d H = 0. By (12.71), the Faraday 2-form is closed, d F = 0. A p-form H is exact if it is the differential H = d K of a ( p − 1)form K . The identity (12.70) or dd = 0 implies that every exact form is closed . A lemma (Section 14.5) due to Poincaré shows that every closed form is locally exact. If the Ai in the 1-form A = A i d x i commute with each other, then the 2-form A ∧ A is identically zero. But if the Ai don’t commute because they are matrices, operators, or Grassmann variables, then A ∧ A = 21 [ A i , A j ] d x i ∧ d x j need not vanish.
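The mechanism behind dd = 0 in (12.70), a symmetric object fully contracted with an antisymmetric one, is easy to verify numerically; this sketch (not from the text) also illustrates Exercise 12.12.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))           # a tensor with no particular symmetry
S = 0.5 * (M + M.T)                       # symmetric, like the derivatives ∂_s∂_r Q
W = rng.standard_normal((4, 4))
A = 0.5 * (W - W.T)                       # antisymmetric, like dx^s ∧ dx^r

# full contraction of a symmetric with an antisymmetric tensor vanishes
assert abs(np.einsum('sr,sr->', S, A)) < 1e-12

# a general M contracted with A sees only its antisymmetric part (Exercise 12.12)
assert np.isclose(np.einsum('ik,ik->', M, A),
                  np.einsum('ik,ik->', 0.5 * (M - M.T), A))
print("contractions verified")
```

The first assertion is exactly why the second derivatives in (12.70) kill the wedge product.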


Example 12.9 (If \dot{\mathbf{B}} = 0, the electric field is closed and exact) If \dot{\mathbf{B}} = 0, then by Faraday's law (12.34) the curl of the electric field vanishes, \nabla\times\mathbf{E} = 0. In terms of the 1-form E = E_i\,dx^i for i = 1, 2, 3, the vanishing of its curl \nabla\times\mathbf{E} is

dE = \partial_j E_i\,dx^j\wedge dx^i = \frac{1}{2}\left(\partial_j E_i - \partial_i E_j\right)dx^j\wedge dx^i = 0.    (12.72)

So E is closed. It also is exact because we can define a quantity V(x) whose gradient is \mathbf{E} = -\nabla V. We first define V_P(x) as a line integral of the 1-form E along an arbitrary path P from some starting point x_0 to x

V_P(x) = -\int_{P,\,x_0}^{x} E_i\,dx^i = -\int_P E.    (12.73)

The potential V_P(x) might seem to depend on the path P. But the difference V_{P'}(x) - V_P(x) is a line integral of E from x_0 to x along the path P' and then back to x_0 along the path P. And by Stokes's theorem (2.50), the integral of E around such a closed loop is an integral of the curl \nabla\times\mathbf{E} of \mathbf{E} over any surface S whose boundary is that closed loop

V_{P'}(x) - V_P(x) = \oint_{P - P'} E_i\,dx^i = \int_S \left(\nabla\times\mathbf{E}\right)\cdot d\mathbf{a} = 0.    (12.74)

In the notation of forms, this is

V_{P'}(x) - V_P(x) = \int_{\partial S} E = \int_S dE = 0.    (12.75)

Thus the potential V_P(x) = V(x) is independent of the path, and \mathbf{E} = -\nabla V(x), and so the 1-form E = E_i\,dx^i = -\partial_i V\,dx^i = -dV is exact.

The general form of Stokes's theorem is that the integral of any p-form H over the boundary \partial R of any (p + 1)-dimensional, simply connected, orientable region R is equal to the integral of the (p + 1)-form dH over R

\int_{\partial R} H = \int_R dH.    (12.76)

Equation (12.75) is the p = 1 case (George Stokes, 1819–1903).

Example 12.10 (Stokes's theorem for 0-forms) When p = 0, the region R = [a, b] is 1-dimensional, H is a 0-form, and Stokes's theorem is the formula of elementary calculus

H(b) - H(a) = \int_{\partial R} H = \int_R dH = \int_a^b dH(x) = \int_a^b H'(x)\,dx.    (12.77)


Example 12.11 (Exterior derivatives anticommute with differentials) The exterior derivative acting on the wedge product of two 1-forms A = A_i\,dx^i and B = B_\ell\,dx^\ell is

d(A\wedge B) = d\left(A_i\,dx^i\wedge B_\ell\,dx^\ell\right) = \partial_k\left(A_i B_\ell\right)dx^k\wedge dx^i\wedge dx^\ell
= \left(\partial_k A_i\right)B_\ell\,dx^k\wedge dx^i\wedge dx^\ell + A_i\left(\partial_k B_\ell\right)dx^k\wedge dx^i\wedge dx^\ell
= \left(\partial_k A_i\right)B_\ell\,dx^k\wedge dx^i\wedge dx^\ell - A_i\left(\partial_k B_\ell\right)dx^i\wedge dx^k\wedge dx^\ell
= \left(\partial_k A_i\right)dx^k\wedge dx^i\wedge B_\ell\,dx^\ell - A_i\,dx^i\wedge\left(\partial_k B_\ell\right)dx^k\wedge dx^\ell
= dA\wedge B - A\wedge dB.    (12.78)

If A is a p-form, then d(A\wedge B) = dA\wedge B + (-1)^p A\wedge dB (Exercise 12.14).

Although I have avoided gravity in this chapter, special relativity is not in conflict with general relativity. In fact, Einstein’s equivalence principle says that special relativity applies in a suitably small neighborhood of any point in any inertial reference frame that has free-fall coordinates.

Exercises

12.1 Show that (12.4) implies (12.5).
12.2 Show that the matrix form of the Lorentz transformation (12.7) is the x boost (12.8).
12.3 Show that the Lorentz matrix (12.8) satisfies L^T\eta L = \eta.
12.4 The basis vectors at a point p are the derivatives of the point with respect to the coordinates. Find the basis vectors e_i = \partial p/\partial x^i at the point p = (x^0, x^1, x^2, x^3). What are the basis vectors e'_i in the coordinates x' (12.9)?
12.5 The basis vectors e^i that are dual to the basis vectors e_k are defined by e^i = \eta^{ik}e_k. (a) Show that they obey e^i\cdot e_k = \delta^i_k. (b) In the two coordinate systems described in Example 12.1, the vectors e_k x^k and e'_i x'^i represent the same point, so e_k x^k = e'_i x'^i. Find a formula for e'^i\cdot e_k. (c) Relate e'^i\cdot e_k to the Lorentz matrix (12.8).
12.6 Show that the equality of the inner products x^i\eta_{ik}x^k = x'^j\eta_{j\ell}x'^\ell means that the matrix L^i{}_k = e'^i\cdot e_k that relates the coordinates x'^i = L^i{}_k x^k to the coordinates x^k must obey the relation \eta = L^T\eta L in matrix notation. Hint: First doubly differentiate the equality with respect to x^k and to x^\ell for k \ne \ell. Then differentiate it twice with respect to x^k.
12.7 The relations x'^i = e'^i\cdot e_j\,x^j and x^\ell = e^\ell\cdot e'_k\,x'^k imply (for fixed basis vectors e and e') that

\frac{\partial x'^i}{\partial x^j} = e'^i\cdot e_j = e_j\cdot e'^i = \eta_{j\ell}\,\eta^{ik}\,e^\ell\cdot e'_k = \eta_{j\ell}\,\eta^{ik}\,\frac{\partial x^\ell}{\partial x'^k}.


Use this equation to show that if A^i transforms (13.6) as a contravariant vector

A'^i = \frac{\partial x'^i}{\partial x^j}\,A^j,    (12.79)

then A_\ell = \eta_{\ell j}A^j transforms covariantly (13.9)

A'_s = \frac{\partial x^\ell}{\partial x'^s}\,A_\ell.

The metric \eta also turns a covariant vector A_\ell into its contravariant form \eta^{k\ell}A_\ell = \eta^{k\ell}\eta_{\ell j}A^j = \delta^k_j A^j = A^k.
12.8 The LHC is designed to collide 7 TeV protons against 7 TeV protons for a total collision energy of 14 TeV. Suppose one used a linear accelerator to fire a beam of protons at a target of protons at rest at one end of the accelerator. What energy would you need to see the same physics as at the LHC?
12.9 What is the minimum energy that a beam of pions must have to produce a sigma hyperon and a kaon by striking a proton at rest? The relevant masses (in MeV) are m_{\Sigma^+} = 1189.4, m_{K^+} = 493.7, m_p = 938.3, and m_{\pi^+} = 139.6.
12.10 Use Gauss's law and the Maxwell–Ampère law (12.39) to show that the microscopic (total) current-density 4-vector j = (c\rho, \mathbf{j}) obeys the continuity equation \dot\rho + \nabla\cdot\mathbf{j} = 0.
12.11 Derive the Bianchi identity (12.44) from the definition (12.31) of the Faraday field-strength tensor, and show that it implies the two homogeneous Maxwell equations (12.34).
12.12 Show that if M_{ik} is a covariant second-rank tensor with no particular symmetry, then only its antisymmetric part contributes to the 2-form M_{ik}\,dx^i\wedge dx^k and only its symmetric part contributes to the quantity M_{ik}\,dx^i dx^k.
12.13 In rectangular coordinates, use the Levi-Civita identity (1.497) to derive the curl–curl equations (12.42).
12.14 Show that if A is a p-form, then d(A\wedge B) = dA\wedge B + (-1)^p A\wedge dB.
12.15 Show that if \omega = a_{ij}\,dx^i\wedge dx^j/2 with a_{ij} = -a_{ji}, then

d\omega = \frac{1}{3!}\left(\partial_k a_{ij} + \partial_i a_{jk} + \partial_j a_{ki}\right)dx^i\wedge dx^j\wedge dx^k.    (12.80)

13 General Relativity

13.1 Points and Their Coordinates

We use coordinates to label the physical points of a spacetime and the mathematical points of an abstract object. For example, we may label a point on a sphere by its latitude and longitude with respect to a polar axis and meridian. If we use a different axis and meridian, our coordinates for the point will change, but the point remains as it was. Physical and mathematical points exist independently of the coordinates we use to talk about them. When we change our system of coordinates, we change our labels for the points, but the points remain as they were.

At each point p, we can set up various coordinate systems that assign unique coordinates x^i(p) and x'^i(p) to p and to points near it. For instance, polar coordinates (\theta, \phi) are unique for all points on a sphere – except the north and south poles, which are labeled by \theta = 0 and \theta = \pi and all 0 \le \phi < 2\pi. By using a second coordinate system with \theta' = 0 and \theta' = \pi on the equator in the (\theta, \phi) system, we can assign unique coordinates to the north and south poles in that system.

Embedding simplifies labeling. In a 3-dimensional euclidian space and in the 4-dimensional Minkowski spacetime in which the sphere is a surface, each point of the sphere has unique coordinates, (x, y, z) and (t, x, y, z).

We will use coordinate systems that represent the points of a space or spacetime uniquely and smoothly at least in local patches, so that the maps

x^i = x^i(p) = x^i(p(x')) = x^i(x')
x'^i = x'^i(p) = x'^i(p(x)) = x'^i(x)    (13.1)

are well defined, differentiable, and one to one in the patches. We'll often group the n coordinates x^i together and write them collectively as x without superscripts. Since the coordinates x(p) label the point p, we sometimes will call them "the point x." But p and x are different. The point p is unique with infinitely many coordinates x, x', x'', \ldots in infinitely many coordinate systems.

We begin this chapter by noticing carefully how things change as we change our coordinates. Our goal is to write physical theories so their equations look the same in all systems of coordinates, as Einstein taught us.

13.2 Scalars

A scalar is a quantity B that is the same in all coordinate systems

B' = B.    (13.2)

If it also depends upon the coordinates of the spacetime point p(x) = p(x'), then it is a scalar field, and

B'(x') = B(x).    (13.3)

13.3 Contravariant Vectors By the chain rule, the change in d x ±i due to changes in the unprimed coordinates is d x ±i

=

± ∂ x ±i k

∂xk

dxk.

(13.4)

This transformation defines contravariant vectors: a quantity A i is a component of a contravariant vector if it transforms like d x i A ±i

=

± ∂ x ±i k

∂xk

Ak .

(13.5)

The coordinate differentials d x i form a contravariant vector. A contravariant vector A i (x ) that depends on the coordinates is a contravariant vector fieldand transforms as ± ∂ x ±i k A ( x ). (13.6) A ±i ( x ± ) = ∂xk k 13.4 Covariant Vectors The chain rule for partial derivatives ∂

∂ x ±i

=

± ∂ xk k



∂ x ±i ∂ x k

(13.7)

468

13 General Relativity

defines covariant vectors: a quantity C_i that transforms like a partial derivative

C'_i = \sum_k \frac{\partial x^k}{\partial x'^i}\,C_k    (13.8)

is a covariant vector. A covariant vector C_i(x) that depends on the coordinates and transforms as

C'_i(x') = \sum_k \frac{\partial x^k}{\partial x'^i}\,C_k(x)    (13.9)

is a covariant vector field.

Example 13.1 (Gradient of a scalar) The derivatives of a scalar field B'(x') = B(x) form a covariant vector field because

\frac{\partial B'(x')}{\partial x'^i} = \frac{\partial B(x)}{\partial x'^i} = \sum_k \frac{\partial x^k}{\partial x'^i}\,\frac{\partial B(x)}{\partial x^k},    (13.10)

which shows that the gradient \partial B(x)/\partial x^k fits the definition (13.9) of a covariant vector field.

13.5 Tensors

Tensors are structures that transform like products of vectors. A rank-zero tensor is a scalar. A rank-one tensor is a covariant or contravariant vector. Second-rank tensors are distinguished by how they transform under changes of coordinates:

covariant:  F'_{ij} = \frac{\partial x^k}{\partial x'^i}\frac{\partial x^l}{\partial x'^j}\,F_{kl}
contravariant:  M'^{ij} = \frac{\partial x'^i}{\partial x^k}\frac{\partial x'^j}{\partial x^l}\,M^{kl}    (13.11)
mixed:  N'^i{}_j = \frac{\partial x'^i}{\partial x^k}\frac{\partial x^l}{\partial x'^j}\,N^k{}_l.

We can define tensors of higher rank by extending these definitions to quantities with more indices. The rank of a tensor also is called its order and its degree.

If S(x) is a scalar field, then its derivatives with respect to the coordinates are covariant vectors (13.10) and tensors

V_i = \frac{\partial S}{\partial x^i}, \quad T_{ik} = \frac{\partial^2 S}{\partial x^i\,\partial x^k}, \quad\text{and}\quad U_{ik\ell} = \frac{\partial^3 S}{\partial x^i\,\partial x^k\,\partial x^\ell}.    (13.12)

Example 13.2 (Rank-2 tensors) If A_k and B_\ell are covariant vectors, and C^m and D^n are contravariant vectors, then the product A_k B_\ell is a second-rank covariant tensor; C^m D^n is a second-rank contravariant tensor; and A_k C^m, A_k D^n, B_k C^m, and B_k D^n are second-rank mixed tensors.

Since the transformation laws that define tensors are linear, any linear combination (with constant coefficients) of tensors of a given rank and kind is a tensor of that rank and kind. Thus if F_{ij} and G_{ij} are both second-rank covariant tensors, so is their sum H_{ij} = F_{ij} + G_{ij}.

13.6 Summation Convention and Contractions

An index that appears in the same monomial once as a covariant subscript and once as a contravariant superscript is a dummy index that is summed over

A_i B^i \equiv \sum_i A_i B^i    (13.13)

usually from 0 to 3. Such a sum in which an index is repeated once covariantly and once contravariantly is a contraction. The rank of a tensor is the number of its uncontracted indices. Although the product A_k C^\ell is a mixed second-rank tensor, the contraction A_k C^k is a scalar because

A'_k C'^k = \frac{\partial x^\ell}{\partial x'^k}\frac{\partial x'^k}{\partial x^m}\,A_\ell C^m = \frac{\partial x^\ell}{\partial x^m}\,A_\ell C^m = \delta^\ell_m\,A_\ell C^m = A_\ell C^\ell.    (13.14)

Similarly, the doubly contracted product F^{ik}F_{ik} is a scalar.

Example 13.3 (Kronecker delta) The summation convention and the chain rule imply that

\frac{\partial x'^i}{\partial x^k}\frac{\partial x^k}{\partial x'^\ell} = \frac{\partial x'^i}{\partial x'^\ell} = \delta^i_\ell = \begin{cases} 1 & \text{if } i = \ell \\ 0 & \text{if } i \ne \ell. \end{cases}    (13.15)

The repeated index k has disappeared in this contraction. The Kronecker delta \delta^i_\ell is a mixed second-rank tensor; it transforms as

\delta'^i_\ell = \frac{\partial x'^i}{\partial x^k}\frac{\partial x^j}{\partial x'^\ell}\,\delta^k_j = \frac{\partial x'^i}{\partial x^k}\frac{\partial x^k}{\partial x'^\ell} = \frac{\partial x'^i}{\partial x'^\ell} = \delta^i_\ell    (13.16)

and is invariant under changes of coordinates.


13.7 Symmetric and Antisymmetric Tensors

A covariant tensor is symmetric if it is independent of the order of its indices. That is, if S_{ik} = S_{ki}, then S is symmetric. Similarly a contravariant tensor S^{k\ell m} is symmetric if permutations of its indices k, \ell, m leave it unchanged. The metric of spacetime g_{ik}(x) = g_{ki}(x) is symmetric because its whole role is to express infinitesimal distances as ds^2 = g_{ik}(x)\,dx^i dx^k, which is symmetric in i and k.

A covariant or contravariant tensor is antisymmetric if it changes sign when any two of its indices are interchanged. The Maxwell field strength F_{k\ell}(x) = -F_{\ell k}(x) is an antisymmetric rank-2 covariant tensor. If T^{ik}\epsilon_{ik} = 0 where \epsilon_{12} = -\epsilon_{21} = 1 is antisymmetric, then T^{12} - T^{21} = 0. Thus T^{ik}\epsilon_{ik} = 0 means that the tensor T^{ik} is symmetric.

13.8 Quotient Theorem

Suppose that B has unknown transformation properties, but that its product BA with all tensors A of a given rank and kind is a tensor. Then B must be a tensor. The simplest example is when B_i A^i is a scalar for all contravariant vectors A^i

B'_i A'^i = B_j A^j.    (13.17)

Then since A^i is a contravariant vector

B'_i A'^i = B'_i\,\frac{\partial x'^i}{\partial x^j}\,A^j = B_j A^j    (13.18)

or

\left(B'_i\,\frac{\partial x'^i}{\partial x^j} - B_j\right)A^j = 0.    (13.19)

∂ x ±i

∂x j

− B j = 0.

(13.20)

Multiplying both sides by ∂ x j /∂ x ±k and summing over j, we get Bi±

∂ x ±i ∂ x j

∂ x j ∂ x ±k

j

= B j ∂∂ xx± k

(13.21)

which shows that the unknown quantity Bi transforms as a covariant vector Bk±

j

= ∂∂xx±k B j .

(13.22)

The quotient rule works for tensors A and B of arbitrary rank and kind. The proof in each case is similar to the one given here.

13.10 Comma Notation for Derivatives

471

13.9 Tensor Equations Maxwell’s homogeneous equations (12.45) relate the derivatives of the fieldstrength tensor to each other as 0 = ∂i F jk

+ ∂k Fi j + ∂ j Fki .

(13.23)

They are generally covariant tensor equations(Sections 13.19 and 13.20). They follow from the Bianchi identity (12.71) dF

= ddA = 0.

(13.24)

Maxwell’s inhomegneous equations (12.46) relate the derivatives of the fieldstrength tensor to the current density j i and to the square root of the modulus g of the determinant of the metric tensor gi j (Section 13.12)

√g F ik )

∂(

∂xk

= µ0 √g j i .

(13.25)

They are generally covariant tensor equations. We’ll write them as the divergence of a contravariant vector in section 13.29, derive them from an action principle in Section 13.31, and write them as invariant forms in Section 14.7. If we can write a physical law in one coordinate system as a tensor equation j± G ( x ) = 0, then in any other coordinate system the corresponding tensor equation G ±ik (x ± ) = 0 is valid because G ±ik (x ± ) =

∂ x ±i ∂ x ±k

∂ x j ∂ x±

G j ± ( x ) = 0.

(13.26)

Physical laws also remain the same if expressed in terms of invariant forms. A theory written in terms of tensors or forms has equations that are true in all coordinate systems if they are true in any coordinate system . Only such generally covariant theories have a chance at being right because we can’t be sure that our particular coordinate system is the correct one. One can make a theory the same in all coordinate systems by applying the principle of stationary action (Section 13.31) to an action that is invariant under all coordinate transformations. 13.10 Comma Notation for Derivatives Commas are used to denote derivatives. If f (θ , φ) is a function of θ and φ, we can write its derivatives with respect to these coordinates as f, θ

= ∂ f = ∂∂ θf θ

and

f,φ

= ∂ f = ∂∂ φf . φ

(13.27)

472

13 General Relativity

And we can write its double derivatives as f,θ θ

2

= ∂∂ θ 2f ,

2

= ∂∂θ ∂fφ ,

f ,θ φ

and

f,φφ

2

= ∂∂ φf2 .

(13.28)

If we use indices i, k , . . . to label the coordinates x i , x k , then we can write the derivatives of a scalar f as f ,i

∂f

= ∂i f = ∂ x i

and

f,ik

∂2

= ∂k ∂i f = ∂ x k ∂fx i

(13.29)

and those of tensors T ik and Fik as T,ikj ±

2

ik

= ∂∂x jT∂ x

±

Fik , j ± =

and

2

Fik ∂x j ∂ x± ∂

(13.30)

and so forth. Semicolons are used to denote covariant derivatives (Section 13.15). 13.11 Basis Vectors and Tangent Vectors A point p (x ) in a space or spacetime with coordinates x is a scalar (13.3) because it is the same point p ± ( x ±) = p( x ± ) = p ( x ) in any other system of coordinates x ± . Thus its derivatives with respect to the coordinates ∂ p (x ) ∂xi

= ei ( x )

(13.31)

form a covariant vectorei ( x ) ei± ( x ± ) =

∂ p± (x ± ) ∂ x ±i

k

k

= ∂∂px( x±i ) = ∂∂ xx±i ∂∂px( xk ) = ∂∂ xx±i ek (x ).

(13.32)

Small changes d x i in the coordinates (in any fixed system of coordinates) lead to small changes in the point p (x ) d p( x ) = e i ( x ) d x i .

(13.33)

The covariant vectors ei ( x ) therefore form a basis (1.49) for the space or spacetime at the point p (x ). These basis vectors ei ( x ) are tangent to the curved space or spacetime at the point x and so are called tangent vectors. Although complex and fermionic manifolds may be of interest, the manifolds, points, and vectors of this chapter are assumed to be real. 13.12 Metric Tensor A Riemann manifold of dimension d is a space that locally looks like ddimensional euclidian space Ed and that is smooth enough for the derivatives

13.12 Metric Tensor

473

(13.31) that define tangent vectors to exist. The surface of the Earth, for example, looks flat at horizontal distances of less than a kilometer. Just as the surface of a sphere can be embedded in flat 3-dimensional space, so too every Riemann manifold can be embedded without change of shape (isometrically) in a euclidian space En of suitably high dimension (Nash, 1956). In particular, every Riemann manifold of dimension d = 3 (or 4) can be isometrically embedded in a euclidian space of at most n = 14 (or 19) dimensions, E14 or E19 (Günther, 1989). The euclidian dot products (Example 1.15) of the tangent vectors (13.31) define the metric of the manifold g ik (x ) = ei ( x ) · ek ( x ) =

n ± α

=1

e αi (x ) e αk ( x ) = ek ( x ) · ei ( x ) = gki (x )

(13.34)

which is symmetric, g ik (x ) = gki ( x ) . Here 1 ≤ i, k ≤ d and 1 ≤ α ≤ n. The dot product of this equation is the dot product of the n-dimensional euclidian embedding space En . Because the tangent vectors e i (x ) are covariant vectors, the metric tensor transforms as a covariant tensor if we change coordinates from x to x ± g ±ik (x ± ) =

∂x j ∂x±

∂ x ±i ∂ x ± k

g j ± ( x ).

(13.35)

The squared distance ds 2 between two nearby points is the dot product of the small change d p (x ) (13.33) with itself ds 2

= d p(x ) · d p( x ) = ( ei ( x ) d xi ) · (ei ( x ) d x i ) = ei ( x ) · ei (x ) d x i d x k = gik ( x ) d x i d x k .

(13.36)

So by measuring the distances ds between nearby points, one can determine the metric gik ( x ) of a Riemann space. Example 13.4 (The sphere S 2 in E3 ) In polar coordinates, a point p on the 2-dimensional surface of a sphere of radius R has coordinates p = R (sin θ cos φ , sin θ sin φ , cos θ ) in an embedding space E3. The tangent space E 2 at p is spanned by the tangent vectors eθ

=

p, θ



=

p, φ

= ∂∂ θp = R (cos θ cos φ , cos θ sin φ , − sin θ ) = ∂∂ φp = R (− sin θ sin φ , sin θ cos φ , 0).

(13.37)


13 General Relativity

The dot products of these tangent vectors are easy to compute in the embedding space E^3. They form the metric tensor of the sphere

(g_{ik}) = \begin{pmatrix} g_{\theta\theta} & g_{\theta\phi} \\ g_{\phi\theta} & g_{\phi\phi} \end{pmatrix} = \begin{pmatrix} e_\theta \cdot e_\theta & e_\theta \cdot e_\phi \\ e_\phi \cdot e_\theta & e_\phi \cdot e_\phi \end{pmatrix} = \begin{pmatrix} R^2 & 0 \\ 0 & R^2 \sin^2\theta \end{pmatrix}. \qquad(13.38)

Its determinant is \det(g_{ik}) = R^4 \sin^2\theta. Since e_\theta \cdot e_\phi = 0, the squared infinitesimal distance (13.36) is

ds^2 = e_\theta \cdot e_\theta\, d\theta^2 + e_\phi \cdot e_\phi\, d\phi^2 = R^2\, d\theta^2 + R^2 \sin^2\theta\, d\phi^2. \qquad(13.39)

We change coordinates from the angle \theta to a radius r = R \sin\theta / a in which a is a dimensionless scale factor. Then R^2 d\theta^2 = a^2 dr^2 / \cos^2\theta, and \cos^2\theta = 1 - \sin^2\theta = 1 - a^2 r^2 / R^2 = 1 - k r^2 where k = (a/R)^2. In these coordinates, the squared distance (13.39) is

ds^2 = \frac{a^2}{1 - k r^2}\, dr^2 + a^2 r^2\, d\phi^2 \qquad(13.40)

and the r, \phi metric of the sphere and its inverse are

g_{ik} = a^2 \begin{pmatrix} (1 - k r^2)^{-1} & 0 \\ 0 & r^2 \end{pmatrix} \quad\text{and}\quad g^{ik} = a^{-2} \begin{pmatrix} 1 - k r^2 & 0 \\ 0 & r^{-2} \end{pmatrix}. \qquad(13.41)

The sphere is a maximally symmetric space (Section 13.24).

Example 13.5 (Graph paper) Imagine a piece of slightly crumpled graph paper with horizontal and vertical lines. The lines give us a 2-dimensional coordinate system (x^1, x^2) that labels each point p(x) on the paper. The vectors e_1(x) = \partial_1 p(x) and e_2(x) = \partial_2 p(x) define how a point moves, dp(x) = e_i(x)\, dx^i, when we change its coordinates by dx^1 and dx^2. The vectors e_1(x) and e_2(x) span a different tangent space at the intersection of every horizontal line with every vertical line. Each tangent space is like the tiny square of the graph paper at that intersection. We can think of the two vectors e_i(x) as three-component vectors in the 3-dimensional embedding space we live in. The squared distance between any two nearby points separated by dp(x) is ds^2 \equiv dp^2(x) = e_1^2(x)\,(dx^1)^2 + 2\, e_1(x) \cdot e_2(x)\, dx^1 dx^2 + e_2^2(x)\,(dx^2)^2 in which the inner products g_{ij} = e_i(x) \cdot e_j(x) are defined by the euclidian metric of the embedding euclidian space R^3.
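The metric (13.38) can also be checked by machine. The following is a minimal sketch in Python with sympy, offered in place of the book's own Mathematica scripts; the coordinates and the embedding mirror Example 13.4:

```python
import sympy as sp

theta, phi, R = sp.symbols('theta phi R', positive=True)

# Embedding p of the sphere S^2 in E^3, as in Example 13.4.
p = sp.Matrix([R*sp.sin(theta)*sp.cos(phi),
               R*sp.sin(theta)*sp.sin(phi),
               R*sp.cos(theta)])

coords = [theta, phi]
e = [p.diff(q) for q in coords]   # tangent vectors e_theta, e_phi (13.37)

# Metric g_ik = e_i . e_k (13.34); the dot product is that of E^3.
g = sp.Matrix(2, 2, lambda i, k: sp.simplify(e[i].dot(e[k])))
print(g)   # diag(R**2, R**2*sin(theta)**2), the metric (13.38)
```

The same few lines work for any embedded surface: only the matrix p changes.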

But our universe has time. A semi-euclidian spacetime E^{(p, d-p)} of dimension d is a flat spacetime with a dot product that has p minus signs and q = d - p plus signs. A semi-riemannian manifold of dimension d is a spacetime that locally looks like a semi-euclidian spacetime E^{(p, d-p)} and that is smooth enough for the derivatives (13.31) that define its tangent vectors to exist. Every semi-riemannian manifold can be embedded without change of shape (isometrically) in a semi-euclidian spacetime E^{(u, n-u)} for sufficiently large u and n (Greene, 1970; Clarke, 1970). Every physically reasonable (globally hyperbolic)


semi-riemannian manifold with 1 dimension of time and 3 dimensions of space can be embedded without change of shape (isometrically) in a flat semi-euclidian spacetime of 1 temporal and at most 19 spatial dimensions E^{(1,19)} (Müller and Sánchez, 2011; Aké et al., 2018). The semi-euclidian dot products of the tangent vectors of a semi-riemannian manifold of d dimensions define its metric as

g_{ik}(x) = e_i(x) \cdot e_k(x) = -\sum_{\alpha=1}^{u} e^\alpha_i(x)\, e^\alpha_k(x) + \sum_{\alpha=u+1}^{n} e^\alpha_i(x)\, e^\alpha_k(x) \qquad(13.42)

for 0 ≤ i, k ≤ d - 1. The metric (13.42) is symmetric, g_{ik}(x) = g_{ki}(x). In an extended summation convention, the dot product (13.42) is g_{ik}(x) = e_{i\alpha}(x)\, e^\alpha_k(x). The squared pseudo-distance or line element ds^2 between two nearby points is the inner product of the small change dp(x) (13.33) with itself

ds^2 = dp(x) \cdot dp(x) = \left( e_i(x)\, dx^i \right) \cdot \left( e_k(x)\, dx^k \right) = e_i(x) \cdot e_k(x)\, dx^i dx^k = g_{ik}(x)\, dx^i dx^k. \qquad(13.43)

Thus measurements of line elements ds^2 determine the metric g_{ik}(x) of the spacetime. Some Riemann spaces have natural embeddings in semi-euclidian spaces. One example is the hyperboloid H^2.

Example 13.6 (The hyperboloid H^2) If we embed a hyperboloid H^2 of radius R in a semi-euclidian spacetime E^{(1,2)}, then a point p = (x, y, z) on the 2-dimensional surface of H^2 obeys the equation R^2 = x^2 - y^2 - z^2 and has polar coordinates p = R(\cosh\theta,\ \sinh\theta\cos\phi,\ \sinh\theta\sin\phi). The tangent vectors are

e_\theta = p_{,\theta} = \frac{\partial p}{\partial \theta} = R(\sinh\theta,\ \cosh\theta\cos\phi,\ \cosh\theta\sin\phi)
e_\phi = p_{,\phi} = \frac{\partial p}{\partial \phi} = R(0,\ -\sinh\theta\sin\phi,\ \sinh\theta\cos\phi). \qquad(13.44)

The line element dp^2 = ds^2 between nearby points is

ds^2 = e_\theta \cdot e_\theta\, d\theta^2 + e_\phi \cdot e_\phi\, d\phi^2. \qquad(13.45)

The metric of E^{(1,2)} is diag(-1, 1, 1), so the metric and line element (13.45) of H^2 are

g_{ik} = R^2 \begin{pmatrix} 1 & 0 \\ 0 & \sinh^2\theta \end{pmatrix} \quad\text{and}\quad ds^2 = R^2\, d\theta^2 + R^2 \sinh^2\theta\, d\phi^2. \qquad(13.46)

We change coordinates from the angle \theta to a radius r = R \sinh\theta / a in which a is a dimensionless scale factor. Then in terms of the parameter k = (a/R)^2, the metric and line element (13.46) are (Exercise 13.7)

g_{ik} = a^2 \begin{pmatrix} (1 + k r^2)^{-1} & 0 \\ 0 & r^2 \end{pmatrix} \quad\text{and}\quad ds^2 = a^2 \left( \frac{dr^2}{1 + k r^2} + r^2\, d\phi^2 \right) \qquad(13.47)

which describe one of only three maximally symmetric (Section 13.24) two-dimensional spaces. The other two are the sphere S^2 (13.40) and the plane.

13.13 Inverse of Metric Tensor

The metric g_{ik} is a nonsingular matrix (Exercise 13.4), and so it has an inverse g^{ik} that satisfies

g^{ik}\, g_{k\ell} = \delta^i_\ell = g'^{ik}\, g'_{k\ell} \qquad(13.48)

in all coordinate systems. The inverse metric g^{ik} is a rank-2 contravariant tensor (13.11) because the metric g_{k\ell} is a rank-2 covariant tensor (13.35). To show this, we combine the transformation law (13.35) with the definition (13.48) of the inverse of the metric tensor

\delta^i_\ell = g'^{ik}\, g'_{k\ell} = g'^{ik}\, \frac{\partial x^r}{\partial x'^k}\, \frac{\partial x^s}{\partial x'^\ell}\, g_{rs} \qquad(13.49)

and multiply both sides by

g^{tu}\, \frac{\partial x'^\ell}{\partial x^t}\, \frac{\partial x'^v}{\partial x^u}. \qquad(13.50)

Use of the Kronecker-delta chain rule (13.15) now leads (Exercise 13.5) to

g'^{iv}(x') = \frac{\partial x'^i}{\partial x^t}\, \frac{\partial x'^v}{\partial x^u}\, g^{tu}(x) \qquad(13.51)

which shows that the inverse metric g^{ik} transforms as a rank-2 contravariant tensor.

The contravariant vector A^i associated with any covariant vector A_k is defined as A^i = g^{ik} A_k, which ensures that A^i transforms contravariantly (Exercise 13.6). This is called raising an index. It follows that the covariant vector corresponding to the contravariant vector A^i is A_k = g_{ki} A^i = g_{ki}\, g^{i\ell} A_\ell = \delta^\ell_k A_\ell = A_k, which is called lowering an index. These definitions apply to all tensors, so T^{ik\ell} = g^{ij}\, g^{km}\, g^{\ell n}\, T_{jmn}, and so forth.

Example 13.7 (Making scalars) Fully contracted products of vectors and tensors are scalars. Two contravariant vectors A^i and B^k contracted with the metric tensor form the scalar g_{ik} A^i B^k = A_k B^k. Similarly, g^{ik} A_i B_k = A^k B_k. Derivatives of scalar fields with respect to the coordinates are covariant vectors S_{,i} (Example 13.1) and covariant tensors S_{,ik} (Section 13.5). If S is a scalar, then S_{,i} is a covariant vector, g^{ik} S_{,k} is a contravariant vector, and the contraction g^{ik} S_{,i} S_{,k} is a scalar.
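Raising and lowering indices is ordinary matrix arithmetic once the metric is known. A small numeric illustration (the metric and components below are arbitrary choices for the sketch, not values from the text):

```python
import numpy as np

# A diagonal metric, e.g. the sphere metric (13.38) with R = 2 at theta = pi/2.
g = np.diag([4.0, 4.0])
g_inv = np.linalg.inv(g)              # the inverse metric g^{ik} (13.48)

A_low = np.array([1.0, 2.0])          # covariant components A_k
A_up = g_inv @ A_low                  # raising an index: A^i = g^{ik} A_k
assert np.allclose(g @ A_up, A_low)   # lowering brings the components back

# Fully contracted products are scalars (Example 13.7): A_k B^k = g_ik A^i B^k.
B_up = np.array([3.0, -1.0])
assert np.isclose(A_low @ B_up, A_up @ (g @ B_up))
print(A_up)   # A^i = [0.25, 0.5]
```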

In what follows, I will often use space to mean either space or spacetime.


13.14 Dual Vectors, Cotangent Vectors

Since the inverse metric g^{ik} is a rank-2 contravariant tensor, dual vectors

e^i = g^{ik}\, e_k \qquad(13.52)

are contravariant vectors. They are orthonormal to the tangent vectors e_\ell because

e^i \cdot e_\ell = g^{ik}\, e_k \cdot e_\ell = g^{ik}\, g_{k\ell} = \delta^i_\ell. \qquad(13.53)

Here the dot product is that (13.34) of the euclidian space or embedding space or that (13.42) of the semi-euclidian space or embedding space. The dual vectors e^i are called cotangent vectors or tangent covectors. The tangent vector e_k is the sum e_k = g_{ki}\, e^i because

g_{ki}\, e^i = g_{ki}\, g^{i\ell}\, e_\ell = \delta^\ell_k\, e_\ell = e_k. \qquad(13.54)

The definition (13.52) of the dual vectors and their orthonormality (13.53) to the tangent vectors imply that their inner products are the matrix elements of the inverse of the metric tensor

e^i \cdot e^\ell = g^{ik}\, e_k \cdot e^\ell = g^{ik}\, \delta^\ell_k = g^{i\ell}. \qquad(13.55)

The outer product of a tangent vector with its cotangent vector, P = e_k\, e^k (summed over the dimensions of the space), is both a projection matrix from the embedding space onto the tangent space and an identity matrix for the tangent space because P e_i = e_i. Its transpose P^T = e^k\, e_k is both a projection matrix from the embedding space onto the cotangent space and an identity matrix for the cotangent space because P^T e^i = e^i. So

P = e_k\, e^k = I_t \quad\text{and}\quad P^T = e^k\, e_k = I_{ct}. \qquad(13.56)
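The orthonormality (13.53) can be verified directly for the sphere of Example 13.4. A sketch in Python's sympy (an illustration supplied here, not one of the book's scripts):

```python
import sympy as sp

theta, phi, R = sp.symbols('theta phi R', positive=True)
p = sp.Matrix([R*sp.sin(theta)*sp.cos(phi),
               R*sp.sin(theta)*sp.sin(phi),
               R*sp.cos(theta)])
e = [p.diff(q) for q in (theta, phi)]                  # tangent vectors e_i
g = sp.Matrix(2, 2, lambda i, k: sp.simplify(e[i].dot(e[k])))
ginv = g.inv()                                         # inverse metric g^{ik}

# Dual vectors e^i = g^{ik} e_k (13.52)
dual = [ginv[i, 0]*e[0] + ginv[i, 1]*e[1] for i in range(2)]

# Orthonormality e^i . e_l = delta^i_l (13.53)
for i in range(2):
    for l in range(2):
        assert sp.simplify(dual[i].dot(e[l])) == (1 if i == l else 0)
print("e^i . e_l = delta^i_l checked")
```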

Details and examples are in the file tensors.pdf in Tensors_and_general_relativity at github.com/kevinecahill.

13.15 Covariant Derivatives of Contravariant Vectors

The covariant derivative D_\ell V^k of a contravariant vector V^k is a derivative of V^k that transforms like a mixed rank-2 tensor. An easy way to make such a derivative is to note that the invariant description V(x) = V^i(x)\, e_i(x) of a contravariant vector field V^i(x) in terms of tangent vectors e_i(x) is a scalar. Its derivative

\frac{\partial V}{\partial x^\ell} = \frac{\partial V^i}{\partial x^\ell}\, e_i + V^i\, \frac{\partial e_i}{\partial x^\ell} \qquad(13.57)


is therefore a covariant vector. And the inner product of that covariant vector V_{,\ell} with a contravariant tangent vector e^k is a mixed rank-2 tensor

D_\ell V^k = e^k \cdot V_{,\ell} = e^k \cdot \left( V^i_{,\ell}\, e_i + e_{i,\ell}\, V^i \right) = \delta^k_i\, V^i_{,\ell} + e^k \cdot e_{i,\ell}\, V^i = V^k_{,\ell} + e^k \cdot e_{i,\ell}\, V^i. \qquad(13.58)

The inner product e^k \cdot e_{i,\ell} is usually written as

e^k \cdot e_{i,\ell} = e^k \cdot \frac{\partial e_i}{\partial x^\ell} \equiv \Gamma^k_{i\ell} \qquad(13.59)

and is variously called an affine connection (it relates tangent spaces lacking a common origin), a Christoffel connection, and a Christoffel symbol of the second kind. The covariant derivative itself often is written with a semicolon, thus

D_\ell V^k = V^k_{;\ell} = V^k_{,\ell} + e^k \cdot e_{i,\ell}\, V^i = V^k_{,\ell} + \Gamma^k_{i\ell}\, V^i. \qquad(13.60)

Example 13.8 (Covariant derivatives of cotangent vectors) Using the identity

0 = \delta^k_{i,\ell} = (e^k \cdot e_i)_{,\ell} = e^k_{,\ell} \cdot e_i + e^k \cdot e_{i,\ell} \qquad(13.61)

and the projection matrix (13.56), we find that

D_\ell e^k = e^k_{,\ell} + e^k \cdot e_{i,\ell}\, e^i = e^k_{,\ell} - e^k_{,\ell} \cdot e_i\, e^i = e^k_{,\ell} - e^k_{,\ell} = 0; \qquad(13.62)

the covariant derivatives of cotangent vectors vanish.

Under general coordinate transformations, D_\ell V^k transforms as a rank-2 mixed tensor

\left( D_\ell V^k \right)'(x') = \left( V^k_{;\ell} \right)'(x') = \frac{\partial x'^k}{\partial x^p}\, \frac{\partial x^m}{\partial x'^\ell}\, V^p_{;m}(x). \qquad(13.63)

Tangent basis vectors e_i are derivatives (13.31) of the spacetime point p with respect to the coordinates x^i, and so e_{i,\ell} = e_{\ell,i} because partial derivatives commute

e_{i,\ell} = \frac{\partial e_i}{\partial x^\ell} = \frac{\partial^2 p}{\partial x^\ell\, \partial x^i} = \frac{\partial^2 p}{\partial x^i\, \partial x^\ell} = e_{\ell,i}. \qquad(13.64)

Thus the affine connection (13.59) is symmetric in its lower indices

\Gamma^k_{i\ell} = e^k \cdot e_{i,\ell} = e^k \cdot e_{\ell,i} = \Gamma^k_{\ell i}. \qquad(13.65)

Although the covariant derivative V^i_{;\ell} (13.60) is a rank-2 mixed tensor, the affine connection \Gamma^k_{i\ell} transforms inhomogeneously (Exercise 13.8)

\Gamma'^k_{i\ell} = e'^k \cdot \frac{\partial e'_i}{\partial x'^\ell} = \frac{\partial x'^k}{\partial x^p}\, \frac{\partial x^m}{\partial x'^\ell}\, \frac{\partial x^n}{\partial x'^i}\, \Gamma^p_{nm} + \frac{\partial x'^k}{\partial x^p}\, \frac{\partial^2 x^p}{\partial x'^\ell\, \partial x'^i} \qquad(13.66)

and so is not a tensor. Its variation \delta\Gamma^k_{i\ell} = \Gamma'^k_{i\ell} - \Gamma^k_{i\ell} is a tensor, however, because the inhomogeneous terms in the difference cancel. Since the affine connection \Gamma^k_{i\ell} is symmetric in i and \ell, in four-dimensional spacetime there are 10 \Gamma's for each k, or 40 in all. The 10 correspond to 3 rotations, 3 boosts, and 4 translations.

13.16 Covariant Derivatives of Covariant Vectors

The derivative of the scalar V = V_k\, e^k is the covariant vector

V_{,\ell} = (V_k\, e^k)_{,\ell} = V_{k,\ell}\, e^k + V_k\, e^k_{,\ell}. \qquad(13.67)

Its inner product with the covariant vector e_i transforms as a rank-2 covariant tensor. Thus, using again the identity (13.61), we see that the covariant derivative of a covariant vector is

D_\ell V_i = V_{i;\ell} = e_i \cdot V_{,\ell} = e_i \cdot \left( V_{k,\ell}\, e^k + V_k\, e^k_{,\ell} \right) = \delta^k_i\, V_{k,\ell} + e_i \cdot e^k_{,\ell}\, V_k = V_{i,\ell} - e_{i,\ell} \cdot e^k\, V_k = V_{i,\ell} - \Gamma^k_{i\ell}\, V_k. \qquad(13.68)

D_\ell V_i transforms as a rank-2 covariant tensor because it is the inner product of a covariant tangent vector e_i with the derivative V_{,\ell} of a scalar. Note that \Gamma^k_{i\ell} appears with a minus sign in V_{i;\ell} and a plus sign in V^k_{;\ell}.

Example 13.9 (Covariant derivatives of tangent vectors) Using again the projection matrix (13.56), we find that

D_\ell e_i = e_{i;\ell} = e_{i,\ell} - e_{i,\ell} \cdot e^k\, e_k = e_{i,\ell} - e_{i,\ell} = 0; \qquad(13.69)

covariant derivatives of tangent vectors vanish.

13.17 Covariant Derivatives of Tensors

Tensors transform like products of vectors. So we can make the derivative of a tensor transform covariantly by using Leibniz's rule (5.49) to differentiate products of vectors and by turning the derivatives of the vectors into their covariant derivatives (13.60) and (13.68).


Example 13.10 (Covariant derivative of a rank-2 contravariant tensor) An arbitrary rank-2 contravariant tensor T^{ik} transforms like the product of two contravariant vectors A^i B^k. So its derivative \partial_\ell T^{ik} transforms like the derivative of the product of the vectors A^i B^k

\partial_\ell (A^i B^k) = (\partial_\ell A^i)\, B^k + A^i\, \partial_\ell B^k. \qquad(13.70)

By using twice the formula (13.60) for the covariant derivative of a contravariant vector, we can convert these two ordinary derivatives \partial_\ell A^i and \partial_\ell B^k into tensors

D_\ell (A^i B^k) = (A^i B^k)_{;\ell} = (A^i_{,\ell} + \Gamma^i_{j\ell}\, A^j)\, B^k + A^i\, (B^k_{,\ell} + \Gamma^k_{j\ell}\, B^j) = (A^i B^k)_{,\ell} + \Gamma^i_{j\ell}\, A^j B^k + \Gamma^k_{j\ell}\, A^i B^j. \qquad(13.71)

Thus the covariant derivative of a rank-2 contravariant tensor is

D_\ell T^{ik} = T^{ik}_{;\ell} = T^{ik}_{,\ell} + \Gamma^i_{j\ell}\, T^{jk} + \Gamma^k_{j\ell}\, T^{ij}. \qquad(13.72)

It transforms as a rank-3 tensor with one covariant index.

Example 13.11 (Covariant derivative of a rank-2 mixed tensor) A rank-2 mixed tensor T^i_k transforms like the product A^i B_k of a contravariant vector A^i and a covariant vector B_k. Its derivative \partial_\ell T^i_k transforms like the derivative of the product of the vectors A^i B_k

\partial_\ell (A^i B_k) = (\partial_\ell A^i)\, B_k + A^i\, \partial_\ell B_k. \qquad(13.73)

We can make these derivatives transform like tensors by using the formulas (13.60) and (13.68)

D_\ell (A^i B_k) = (A^i B_k)_{;\ell} = (A^i_{,\ell} + \Gamma^i_{j\ell}\, A^j)\, B_k + A^i\, (B_{k,\ell} - \Gamma^j_{k\ell}\, B_j) = (A^i B_k)_{,\ell} + \Gamma^i_{j\ell}\, A^j B_k - \Gamma^j_{k\ell}\, A^i B_j. \qquad(13.74)

Thus the covariant derivative of a mixed rank-2 tensor is

D_\ell T^i_k = T^i_{k;\ell} = T^i_{k,\ell} + \Gamma^i_{j\ell}\, T^j_k - \Gamma^j_{k\ell}\, T^i_j. \qquad(13.75)

It transforms as a rank-3 tensor with two covariant indices.

Example 13.12 (Covariant derivative of a rank-2 covariant tensor) A rank-2 covariant tensor T_{ik} transforms like the product A_i B_k of two covariant vectors A_i and B_k. Its derivative \partial_\ell T_{ik} transforms like the derivative of the product of the vectors A_i B_k

\partial_\ell (A_i B_k) = (\partial_\ell A_i)\, B_k + A_i\, \partial_\ell B_k. \qquad(13.76)

We can make these derivatives transform like tensors by twice using the formula (13.68)

D_\ell (A_i B_k) = (A_i B_k)_{;\ell} = A_{i;\ell}\, B_k + A_i\, B_{k;\ell} = (A_{i,\ell} - \Gamma^j_{i\ell}\, A_j)\, B_k + A_i\, (B_{k,\ell} - \Gamma^j_{k\ell}\, B_j) = (A_i B_k)_{,\ell} - \Gamma^j_{i\ell}\, A_j B_k - \Gamma^j_{k\ell}\, A_i B_j. \qquad(13.77)


Thus the covariant derivative of a rank-2 covariant tensor T_{ik} is

D_\ell T_{ik} = T_{ik;\ell} = T_{ik,\ell} - \Gamma^j_{i\ell}\, T_{jk} - \Gamma^j_{k\ell}\, T_{ij}. \qquad(13.78)

It transforms as a rank-3 covariant tensor.

Another way to derive the same result is to note that the scalar form of a rank-2 covariant tensor T_{ik} is T = e^i \otimes e^k\, T_{ik}. So its derivative is a covariant vector

T_{,\ell} = e^i \otimes e^k\, T_{ik,\ell} + e^i_{,\ell} \otimes e^k\, T_{ik} + e^i \otimes e^k_{,\ell}\, T_{ik}. \qquad(13.79)

Using the projector P_t = e_j\, e^j (13.56), the duality e^i \cdot e_n = \delta^i_n of tangent and cotangent vectors (13.53), and the relation e_j \cdot e^k_{,\ell} = - e^k \cdot e_{j,\ell} = - \Gamma^k_{j\ell} (13.59 and 13.61), we can project this derivative onto the tangent space and find, after shuffling some indices,

(e^n e_n \otimes e^j e_j)\, T_{,\ell} = e^i \otimes e^k\, T_{ik,\ell} + e^n \otimes e^k\, (e_n \cdot e^i_{,\ell})\, T_{ik} + e^i \otimes e^j\, (e_j \cdot e^k_{,\ell})\, T_{ik}
= e^i \otimes e^k\, T_{ik,\ell} - e^n \otimes e^k\, \Gamma^i_{n\ell}\, T_{ik} - e^i \otimes e^j\, \Gamma^k_{j\ell}\, T_{ik}
= (e^i \otimes e^k) \left( T_{ik,\ell} - \Gamma^j_{i\ell}\, T_{jk} - \Gamma^j_{k\ell}\, T_{ij} \right)

which again gives us the formula (13.78).

As in these examples, covariant derivatives are derivations:

D_k (AB) = (AB)_{;k} = A_{;k}\, B + A\, B_{;k} = (D_k A)\, B + A\, D_k B. \qquad(13.80)

The rule for a general tensor is to treat every contravariant index as in (13.60) and every covariant index as in (13.68). The covariant derivative of a mixed rank-4 tensor, for instance, is

T^{ab}_{xy;k} = T^{ab}_{xy,k} + T^{jb}_{xy}\, \Gamma^a_{jk} + T^{am}_{xy}\, \Gamma^b_{mk} - T^{ab}_{jy}\, \Gamma^j_{xk} - T^{ab}_{xm}\, \Gamma^m_{yk}. \qquad(13.81)

13.18 The Covariant Derivative of the Metric Tensor Vanishes

The metric tensor is the inner product (13.42) of tangent basis vectors

g_{ik} = e^\alpha_i\, \eta_{\alpha\beta}\, e^\beta_k \qquad(13.82)

in which \alpha and \beta are summed over the dimensions of the embedding space. Thus by the product rule (13.77), the covariant derivative of the metric

D_\ell\, g_{ik} = g_{ik;\ell} = D_\ell \left( e^\alpha_i\, \eta_{\alpha\beta}\, e^\beta_k \right) = (D_\ell\, e^\alpha_i)\, \eta_{\alpha\beta}\, e^\beta_k + e^\alpha_i\, \eta_{\alpha\beta}\, D_\ell\, e^\beta_k = 0 \qquad(13.83)

vanishes because the covariant derivatives of tangent vectors vanish (13.69): D_\ell\, e^\alpha_i = e^\alpha_{i;\ell} = 0 and D_\ell\, e^\beta_k = e^\beta_{k;\ell} = 0.


13.19 Covariant Curls

Because the connection \Gamma^k_{i\ell} is symmetric (13.65) in its lower indices, the covariant curl of a covariant vector V_i is simply its ordinary curl

V_{\ell;i} - V_{i;\ell} = V_{\ell,i} - V_k\, \Gamma^k_{\ell i} - V_{i,\ell} + V_k\, \Gamma^k_{i\ell} = V_{\ell,i} - V_{i,\ell}. \qquad(13.84)

Thus the Faraday field-strength tensor F_{i\ell} = A_{\ell,i} - A_{i,\ell}, being the curl of the covariant vector field A_i, is a generally covariant second-rank tensor.

13.20 Covariant Derivatives and Antisymmetry

The covariant derivative (13.78) of A_{i\ell} is A_{i\ell;k} = A_{i\ell,k} - A_{m\ell}\, \Gamma^m_{ik} - A_{im}\, \Gamma^m_{\ell k}. If the tensor A is antisymmetric, A_{i\ell} = -A_{\ell i}, then by adding together the three cyclic permutations of the indices i\ell k, we find that the antisymmetry of the tensor and the symmetry (13.65) of the affine connection \Gamma^m_{ik} = \Gamma^m_{ki} conspire to cancel the terms with \Gamma's:

A_{i\ell;k} + A_{ki;\ell} + A_{\ell k;i} = A_{i\ell,k} - A_{m\ell}\, \Gamma^m_{ik} - A_{im}\, \Gamma^m_{\ell k}
+ A_{ki,\ell} - A_{mi}\, \Gamma^m_{k\ell} - A_{km}\, \Gamma^m_{i\ell}
+ A_{\ell k,i} - A_{mk}\, \Gamma^m_{\ell i} - A_{\ell m}\, \Gamma^m_{ki}
= A_{i\ell,k} + A_{ki,\ell} + A_{\ell k,i}, \qquad(13.85)

an identity named after Luigi Bianchi (1856–1928).

The Maxwell field-strength tensor F_{i\ell} is antisymmetric by construction (F_{i\ell} = A_{\ell,i} - A_{i,\ell}), and so Maxwell's homogeneous equations

\tfrac{1}{2}\, \epsilon^{ijk\ell}\, F_{jk,\ell} = 0, \quad\text{since}\quad F_{jk,\ell} + F_{k\ell,j} + F_{\ell j,k} = A_{k,j\ell} - A_{j,k\ell} + A_{\ell,kj} - A_{k,\ell j} + A_{j,\ell k} - A_{\ell,jk} = 0, \qquad(13.86)

are tensor equations valid in all coordinate systems.

13.21 What is the Affine Connection?

We insert the identity matrix (13.56) of the tangent space into the formula (13.59) for the affine connection \Gamma^k_{i\ell} = e^k \cdot e_{i,\ell}. In the resulting combination \Gamma^k_{i\ell} = e^k \cdot e^j\, e_j \cdot e_{i,\ell}, we recognize e^k \cdot e^j as the inverse (13.55) of the metric tensor, e^k \cdot e^j = g^{kj}. Repeated use of the relation e_{i,k} = e_{k,i} (13.64) then leads to a formula for the affine connection

\Gamma^k_{i\ell} = e^k \cdot e_{i,\ell} = e^k \cdot e^j\, e_j \cdot e_{i,\ell} = \tfrac{1}{2}\, g^{kj} \left( e_j \cdot e_{i,\ell} + e_j \cdot e_{\ell,i} \right)
= \tfrac{1}{2}\, g^{kj} \left( (e_j \cdot e_i)_{,\ell} - e_{j,\ell} \cdot e_i + (e_j \cdot e_\ell)_{,i} - e_{j,i} \cdot e_\ell \right)
= \tfrac{1}{2}\, g^{kj} \left( g_{ji,\ell} + g_{j\ell,i} - e_{\ell,j} \cdot e_i - e_{i,j} \cdot e_\ell \right)
= \tfrac{1}{2}\, g^{kj} \left( g_{ji,\ell} + g_{j\ell,i} - (e_i \cdot e_\ell)_{,j} \right) = \tfrac{1}{2}\, g^{kj} \left( g_{ji,\ell} + g_{j\ell,i} - g_{i\ell,j} \right) \qquad(13.87)

in terms of the inverse of the metric tensor and a combination of its derivatives. The metric g_{ik} determines the affine connection \Gamma^k_{i\ell}. The affine connection with all lower indices is

\Gamma_{ni\ell} = g_{nk}\, \Gamma^k_{i\ell} = \tfrac{1}{2} \left( g_{ni,\ell} + g_{n\ell,i} - g_{i\ell,n} \right). \qquad(13.88)

13.22 Parallel Transport

The movement of a vector along a curve on a manifold so that its length and direction in successive tangent spaces do not change is called parallel transport. In parallel transport, a vector V = V^k\, e_k = V_k\, e^k may change, dV = V_{,\ell}\, dx^\ell, but the projection of the change P\, dV = e_i\, e^i\, dV = e^i\, e_i\, dV into the tangent space must vanish, P\, dV = 0. In terms of its contravariant components V = V^k\, e_k, this condition for parallel transport is just the vanishing of its covariant derivative (13.60)

0 = e^i\, dV = e^i \cdot V_{,\ell}\, dx^\ell = e^i \cdot (V^k\, e_k)_{,\ell}\, dx^\ell = e^i \cdot \left( V^k_{,\ell}\, e_k + V^k\, e_{k,\ell} \right) dx^\ell = \left( \delta^i_k\, V^k_{,\ell} + e^i \cdot e_{k,\ell}\, V^k \right) dx^\ell = \left( V^i_{,\ell} + \Gamma^i_{k\ell}\, V^k \right) dx^\ell. \qquad(13.89)

In terms of its covariant components V = V_k\, e^k, the condition of parallel transport is also the vanishing of its covariant derivative (13.68)

0 = e_i\, dV = e_i \cdot V_{,\ell}\, dx^\ell = e_i \cdot (V_k\, e^k)_{,\ell}\, dx^\ell = e_i \cdot \left( V_{k,\ell}\, e^k + V_k\, e^k_{,\ell} \right) dx^\ell = \left( \delta^k_i\, V_{k,\ell} + e_i \cdot e^k_{,\ell}\, V_k \right) dx^\ell = \left( V_{i,\ell} - \Gamma^k_{i\ell}\, V_k \right) dx^\ell. \qquad(13.90)

If the curve is x^\ell(u), then these conditions (13.89 and 13.90) for parallel transport are

\frac{dV^i}{du} = V^i_{,\ell}\, \frac{dx^\ell}{du} = -\Gamma^i_{k\ell}\, V^k\, \frac{dx^\ell}{du} \quad\text{and}\quad \frac{dV_i}{du} = V_{i,\ell}\, \frac{dx^\ell}{du} = \Gamma^k_{i\ell}\, V_k\, \frac{dx^\ell}{du}. \qquad(13.91)

Example 13.13 (Parallel transport on a sphere) We parallel-transport the vector v = e_\phi = (0, 1, 0) up from the equator along the line of longitude \phi = 0. Along this path, the vector v = (0, 1, 0) is constant, so \partial_\theta v = 0, and so both e^\theta \cdot v_{,\theta} = 0 and e^\phi \cdot v_{,\theta} = 0. Thus D_\theta v^k = v^k_{;\theta} = 0 between the equator and the north pole. As \theta \to 0 along the meridian \phi = 0, the vector v = (0, 1, 0) approaches the vector e_\theta of the \phi = \pi/2 meridian. We then parallel-transport v = e_\theta down from the north pole along that meridian to the equator. Along this path, the vector v = e_\theta / r = (0, \cos\theta, -\sin\theta) obeys the parallel-transport condition (13.90) because its \theta-derivative

v_{,\theta} = r^{-1}\, e_{\theta,\theta} = (0, \cos\theta, -\sin\theta)_{,\theta} = -(0, \sin\theta, \cos\theta) = -\hat r\,\big|_{\phi=\pi/2} \qquad(13.92)

is perpendicular to the tangent vectors e_\theta and e_\phi along the curve \phi = \pi/2. Thus e^k \cdot v_{,\theta} = 0 for k = \theta and k = \phi, and so v^k_{;\theta} = 0 along the meridian \phi = \pi/2. When v reaches the equator, it is v = (0, 0, -1). Finally, we parallel-transport v along the equator back to the starting point \phi = 0. Along this path, the vector v = (0, 0, -1) is constant, so v_{,\phi} = 0 and v_{;\phi} = 0. The change from v = (0, 1, 0) to v = (0, 0, -1) is due to the curvature of the sphere.
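The transport conditions (13.91) can also be integrated numerically. The sketch below is an illustration, not the route of Example 13.13: the colatitude theta0 and the step count are arbitrary choices. It carries a vector once around the circle of latitude \theta = \theta_0 on the unit sphere, using the sphere's two nonzero connections \Gamma^\phi_{\theta\phi} = \cot\theta and \Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta; the components come back rotated through the holonomy angle 2\pi\cos\theta_0:

```python
import numpy as np

theta0 = np.pi/3                      # colatitude of the circle (arbitrary choice)
V = np.array([1.0, 0.0])              # components (V^theta, V^phi) at phi = 0

def dV(V, theta):
    # dV^i/dphi = -Gamma^i_{k phi} V^k along the curve theta = theta0 (13.91)
    return np.array([np.sin(theta)*np.cos(theta)*V[1],     # -Gamma^theta_{phi phi} V^phi
                     -np.cos(theta)/np.sin(theta)*V[0]])   # -Gamma^phi_{theta phi} V^theta

n = 20000                             # Runge-Kutta steps around the full circle
h = 2*np.pi/n
for _ in range(n):
    k1 = dV(V, theta0)
    k2 = dV(V + h/2*k1, theta0)
    k3 = dV(V + h/2*k2, theta0)
    k4 = dV(V + h*k3, theta0)
    V = V + h/6*(k1 + 2*k2 + 2*k3 + k4)

print(V[0], np.cos(2*np.pi*np.cos(theta0)))   # both are cos(pi) = -1 to 6 digits
```

The analytic solution is V^\theta(\phi) = \cos(\phi\cos\theta_0), so a full loop rotates the vector unless \theta_0 = \pi/2, the equator, which is a geodesic.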

13.23 Curvature

To find the curvature at a point p(x_0), we parallel-transport a vector V_i along a curve x^\ell that runs around a tiny square about the point p(x_0). We then measure the change in the vector

\Delta V_i = \oint \Gamma^k_{i\ell}\, V_k\, dx^\ell. \qquad(13.93)

On the curve x^\ell, we approximate \Gamma^k_{i\ell}(x) and V_k(x) as

\Gamma^k_{i\ell}(x) = \Gamma^k_{i\ell}(x_0) + \Gamma^k_{i\ell,n}(x_0)\, (x - x_0)^n
V_k(x) = V_k(x_0) + \Gamma^m_{kn}(x_0)\, V_m(x_0)\, (x - x_0)^n. \qquad(13.94)

So keeping only terms linear in (x - x_0)^n, we have

\Delta V_i = \oint \Gamma^k_{i\ell}\, V_k\, dx^\ell
= \left[ \Gamma^k_{i\ell,n}(x_0)\, V_k(x_0) + \Gamma^k_{i\ell}(x_0)\, \Gamma^m_{kn}(x_0)\, V_m(x_0) \right] \oint (x - x_0)^n\, dx^\ell
= \left[ \Gamma^k_{i\ell,n}(x_0)\, V_k(x_0) + \Gamma^m_{i\ell}(x_0)\, \Gamma^k_{mn}(x_0)\, V_k(x_0) \right] \oint (x - x_0)^n\, dx^\ell \qquad(13.95)

after interchanging the dummy indices k and m in the second term within the square brackets. The integral around the square is antisymmetric in n and \ell and equal in absolute value to the area a^2 of the tiny square

\oint (x - x_0)^n\, dx^\ell = \pm\, a^2\, \epsilon^{n\ell}. \qquad(13.96)

The overall sign depends upon whether the integral is clockwise or counterclockwise, what n and \ell are, and what we mean by positive area. The integral picks out the part of the term between the brackets in the formula (13.95) that is antisymmetric in n and \ell. We choose minus signs in (13.96) so that the change in the vector is

\Delta V_i = a^2 \left[ \Gamma^k_{ni,\ell} - \Gamma^k_{\ell i,n} + \Gamma^k_{\ell m}\, \Gamma^m_{ni} - \Gamma^k_{nm}\, \Gamma^m_{\ell i} \right] V_k. \qquad(13.97)

The quantity between the brackets is Riemann's curvature tensor

R^k_{\ i\ell n} = \Gamma^k_{ni,\ell} - \Gamma^k_{\ell i,n} + \Gamma^k_{\ell m}\, \Gamma^m_{ni} - \Gamma^k_{nm}\, \Gamma^m_{\ell i}. \qquad(13.98)

The sign convention is that of (Zee, 2013; Misner et al., 1973; Carroll, 2003; Schutz, 2009; Hartle, 2003; Cheng, 2010; Padmanabhan, 2010). Weinberg (Weinberg, 1972) uses the opposite sign. The covariant form R_{ijk\ell} of Riemann's tensor is related to R^k_{\ i\ell n} by

R_{ijk\ell} = g_{in}\, R^n_{\ jk\ell} \quad\text{and}\quad R^i_{\ jk\ell} = g^{in}\, R_{njk\ell}. \qquad(13.99)

The Riemann curvature tensor is the commutator of two covariant derivatives. To see why, we first use the formula (13.78) for the covariant derivative D_n D_\ell V_i of the second-rank covariant tensor D_\ell V_i

D_n D_\ell V_i = D_n \left( V_{i,\ell} - \Gamma^k_{i\ell}\, V_k \right) = V_{i,\ell n} - \Gamma^k_{i\ell,n}\, V_k - \Gamma^k_{i\ell}\, V_{k,n}
- \Gamma^j_{ni} \left( V_{j,\ell} - \Gamma^m_{j\ell}\, V_m \right) - \Gamma^m_{n\ell} \left( V_{i,m} - \Gamma^q_{im}\, V_q \right). \qquad(13.100)

Subtracting D_\ell D_n V_i, we find the commutator [D_n, D_\ell] V_i to be the contraction of the curvature tensor (13.98) with the covariant vector V_k

[D_n, D_\ell]\, V_i = \left( \Gamma^k_{ni,\ell} - \Gamma^k_{\ell i,n} + \Gamma^k_{\ell j}\, \Gamma^j_{ni} - \Gamma^k_{nj}\, \Gamma^j_{\ell i} \right) V_k = R^k_{\ i\ell n}\, V_k. \qquad(13.101)

Since [D_n, D_\ell] V_i is a rank-3 covariant tensor and V_k is an arbitrary covariant vector, the quotient theorem (Section 13.8) implies that the curvature tensor is a rank-4 tensor with one contravariant index.

If we define the matrix \Gamma_\ell with row index k and column index i as \Gamma^k_{\ell i}

\Gamma_\ell = \begin{pmatrix} \Gamma^0_{\ell 0} & \Gamma^0_{\ell 1} & \Gamma^0_{\ell 2} & \Gamma^0_{\ell 3} \\ \Gamma^1_{\ell 0} & \Gamma^1_{\ell 1} & \Gamma^1_{\ell 2} & \Gamma^1_{\ell 3} \\ \Gamma^2_{\ell 0} & \Gamma^2_{\ell 1} & \Gamma^2_{\ell 2} & \Gamma^2_{\ell 3} \\ \Gamma^3_{\ell 0} & \Gamma^3_{\ell 1} & \Gamma^3_{\ell 2} & \Gamma^3_{\ell 3} \end{pmatrix}, \qquad(13.102)

then we may write the covariant derivatives appearing in the curvature tensor R^k_{\ i\ell n} as D_\ell = \partial_\ell + \Gamma_\ell and D_n = \partial_n + \Gamma_n. In these terms, the curvature tensor is the i, k matrix element of their commutator

R^k_{\ i\ell n} = [\partial_\ell + \Gamma_\ell,\ \partial_n + \Gamma_n]^k_{\ i} = [D_\ell, D_n]^k_{\ i}. \qquad(13.103)

The curvature tensor is therefore antisymmetric in its last two indexes

R^k_{\ i\ell n} = -R^k_{\ in\ell}. \qquad(13.104)

The curvature tensor with all lower indices shares this symmetry

R_{ji\ell n} = g_{jk}\, R^k_{\ i\ell n} = -g_{jk}\, R^k_{\ in\ell} = -R_{jin\ell} \qquad(13.105)

and has three others. In Riemann normal coordinates, the derivatives of the metric vanish at any particular point x^*. In these coordinates, the \Gamma's all vanish, and the curvature tensor in terms of the \Gamma's with all lower indices (13.88) is, after a cancellation,

R_{ki\ell n} = \Gamma_{kni,\ell} - \Gamma_{k\ell i,n} = \tfrac{1}{2} \left( g_{kn,i\ell} - g_{ni,k\ell} - g_{k\ell,in} + g_{\ell i,kn} \right). \qquad(13.106)

In these coordinates and therefore in all coordinates, R_{ki\ell n} is antisymmetric in its first two indexes and symmetric under the interchange of its first and second pairs of indexes

R_{ijk\ell} = -R_{jik\ell} \quad\text{and}\quad R_{ijk\ell} = R_{k\ell ij}. \qquad(13.107)

Cartan's equations of structure (13.328 and 13.330) imply (13.342) that the curvature tensor is antisymmetric in its last three indexes

0 = R^j_{\ [ik\ell]} = \tfrac{1}{3!} \left( R^j_{\ ik\ell} + R^j_{\ \ell ik} + R^j_{\ k\ell i} - R^j_{\ ki\ell} - R^j_{\ i\ell k} - R^j_{\ \ell ki} \right) \qquad(13.108)

and obeys the cyclic identity

0 = R^j_{\ ik\ell} + R^j_{\ k\ell i} + R^j_{\ \ell ik}. \qquad(13.109)

The vanishing (13.108) of R^j_{\ [ik\ell]} implies that the completely antisymmetric part of the Riemann tensor also vanishes

0 = R_{[ijk\ell]} = \tfrac{1}{4!} \left( R_{ijk\ell} - R_{jik\ell} - R_{ikj\ell} - R_{ij\ell k} + R_{jki\ell} + \cdots \right). \qquad(13.110)

The Riemann tensor also satisfies a Bianchi identity

0 = R^i_{\ j[k\ell;m]}. \qquad(13.111)

These symmetries reduce 256 different functions R_{ijk\ell}(x) to 20. The Ricci tensor is the contraction

R_{in} = R^k_{\ ikn}. \qquad(13.112)

The curvature scalar is the further contraction

R = g^{ni}\, R_{in}. \qquad(13.113)


Example 13.14 (Curvature of the sphere S^2) While in 4-dimensional spacetime indices run from 0 to 3, on the everyday sphere S^2 (Example 13.4) they are just \theta and \phi. There are only eight possible affine connections, and because of the symmetry (13.65) in their lower indices \Gamma^i_{\theta\phi} = \Gamma^i_{\phi\theta}, only six are independent.

In the euclidian embedding space E^3, the point p on a sphere of radius L has cartesian coordinates p = L(\sin\theta\cos\phi,\ \sin\theta\sin\phi,\ \cos\theta), so the two tangent 3-vectors are (13.37)

e_\theta = p_{,\theta} = L(\cos\theta\cos\phi,\ \cos\theta\sin\phi,\ -\sin\theta) = L\, \hat\theta
e_\phi = p_{,\phi} = L \sin\theta\, (-\sin\phi,\ \cos\phi,\ 0) = L \sin\theta\, \hat\phi. \qquad(13.114)

Their dot products form the metric (13.38)

(g_{ik}) = \begin{pmatrix} g_{\theta\theta} & g_{\theta\phi} \\ g_{\phi\theta} & g_{\phi\phi} \end{pmatrix} = \begin{pmatrix} e_\theta \cdot e_\theta & e_\theta \cdot e_\phi \\ e_\phi \cdot e_\theta & e_\phi \cdot e_\phi \end{pmatrix} = \begin{pmatrix} L^2 & 0 \\ 0 & L^2 \sin^2\theta \end{pmatrix} \qquad(13.115)

which is diagonal with g_{\theta\theta} = L^2 and g_{\phi\phi} = L^2 \sin^2\theta.

Differentiating the vectors e_\theta and e_\phi, we find

e_{\theta,\theta} = -L(\sin\theta\cos\phi,\ \sin\theta\sin\phi,\ \cos\theta) = -L\, \hat r
e_{\theta,\phi} = L \cos\theta\, (-\sin\phi,\ \cos\phi,\ 0) = L \cos\theta\, \hat\phi
e_{\phi,\theta} = e_{\theta,\phi}
e_{\phi,\phi} = -L \sin\theta\, (\cos\phi,\ \sin\phi,\ 0). \qquad(13.116)

The metric with upper indices g^{ij} is the inverse of the metric g_{ij}

(g^{ij}) = \begin{pmatrix} L^{-2} & 0 \\ 0 & L^{-2} \sin^{-2}\theta \end{pmatrix}, \qquad(13.117)

so the dual vectors e^i = g^{ik}\, e_k are

e^\theta = L^{-1}(\cos\theta\cos\phi,\ \cos\theta\sin\phi,\ -\sin\theta) = L^{-1}\, \hat\theta
e^\phi = \frac{1}{L \sin\theta}\, (-\sin\phi,\ \cos\phi,\ 0) = \frac{\hat\phi}{L \sin\theta}. \qquad(13.118)

The affine connections are given by (13.59) as

\Gamma^i_{jk} = \Gamma^i_{kj} = e^i \cdot e_{j,k}. \qquad(13.119)

Since both e^\theta and e^\phi are perpendicular to \hat r, the affine connections \Gamma^\theta_{\theta\theta} and \Gamma^\phi_{\theta\theta} both vanish. Also, e_{\phi,\phi} is orthogonal to \hat\phi, so \Gamma^\phi_{\phi\phi} = 0 as well. Similarly, e_{\theta,\phi} is perpendicular to \hat\theta, so \Gamma^\theta_{\theta\phi} = \Gamma^\theta_{\phi\theta} also vanishes.

The two nonzero affine connections are

\Gamma^\phi_{\theta\phi} = e^\phi \cdot e_{\theta,\phi} = \frac{1}{L \sin\theta}\, \hat\phi \cdot L \cos\theta\, \hat\phi = \cot\theta \qquad(13.120)

and

\Gamma^\theta_{\phi\phi} = e^\theta \cdot e_{\phi,\phi} = -\sin\theta\, (\cos\theta\cos\phi,\ \cos\theta\sin\phi,\ -\sin\theta) \cdot (\cos\phi,\ \sin\phi,\ 0) = -\sin\theta \cos\theta. \qquad(13.121)

The nonzero connections are \Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta and \Gamma^\theta_{\phi\phi} = -\sin\theta \cos\theta. So the matrices \Gamma_\theta and \Gamma_\phi, the derivative \Gamma_{\phi,\theta}, and the commutator [\Gamma_\theta, \Gamma_\phi] are

\Gamma_\theta = \begin{pmatrix} 0 & 0 \\ 0 & \cot\theta \end{pmatrix} \quad\text{and}\quad \Gamma_\phi = \begin{pmatrix} 0 & -\sin\theta\cos\theta \\ \cot\theta & 0 \end{pmatrix} \qquad(13.122)

\Gamma_{\phi,\theta} = \begin{pmatrix} 0 & \sin^2\theta - \cos^2\theta \\ -\csc^2\theta & 0 \end{pmatrix} \quad\text{and}\quad [\Gamma_\theta, \Gamma_\phi] = \begin{pmatrix} 0 & \cos^2\theta \\ \cot^2\theta & 0 \end{pmatrix}.

Since neither matrix depends upon \phi, the derivatives \Gamma_{\theta,\phi} and \Gamma_{\phi,\phi} vanish. So the commutator formula (13.103) gives for Riemann's curvature tensor

R^\theta_{\ \theta\theta\phi} = [\partial_\theta + \Gamma_\theta,\ \partial_\phi + \Gamma_\phi]^\theta_{\ \theta} = 0
R^\phi_{\ \theta\phi\theta} = -\left( \Gamma_{\phi,\theta} + [\Gamma_\theta, \Gamma_\phi] \right)^\phi_{\ \theta} = \csc^2\theta - \cot^2\theta = 1
R^\theta_{\ \phi\theta\phi} = \left( \Gamma_{\phi,\theta} + [\Gamma_\theta, \Gamma_\phi] \right)^\theta_{\ \phi} = \sin^2\theta - \cos^2\theta + \cos^2\theta = \sin^2\theta
R^\phi_{\ \phi\phi\phi} = 0. \qquad(13.123)

The Ricci tensor (13.112) is the contraction R_{mk} = R^n_{\ mnk}, and so

R_{\theta\theta} = R^\theta_{\ \theta\theta\theta} + R^\phi_{\ \theta\phi\theta} = 1
R_{\phi\phi} = R^\theta_{\ \phi\theta\phi} + R^\phi_{\ \phi\phi\phi} = \sin^2\theta. \qquad(13.124)

The curvature scalar (13.113) is the contraction R = g^{km}\, R_{mk}, and so since g^{\theta\theta} = L^{-2} and g^{\phi\phi} = L^{-2} \sin^{-2}\theta, it is

R = g^{\theta\theta}\, R_{\theta\theta} + g^{\phi\phi}\, R_{\phi\phi} = L^{-2} + L^{-2} = \frac{2}{L^2} \qquad(13.125)

for a 2-sphere of radius L. The scalar curvature is a constant because the sphere is a maximally symmetric space (Section 13.24). Gauss invented a formula for the curvature K of a surface; for all two-dimensional surfaces, his K = R/2.

Example 13.15 (Curvature of a cylindrical hyperboloid) The points of a cylindrical hyperboloid in 3-space satisfy L^2 = -x^2 - y^2 + z^2 and may be parametrized as p = L(\sinh\theta\cos\phi,\ \sinh\theta\sin\phi,\ \cosh\theta). The (orthogonal) coordinate basis vectors are

e_\theta = p_{,\theta} = L(\cosh\theta\cos\phi,\ \cosh\theta\sin\phi,\ \sinh\theta)
e_\phi = p_{,\phi} = L(-\sinh\theta\sin\phi,\ \sinh\theta\cos\phi,\ 0). \qquad(13.126)

The squared distance ds^2 between nearby points is

ds^2 = e_\theta \cdot e_\theta\, d\theta^2 + e_\phi \cdot e_\phi\, d\phi^2. \qquad(13.127)

If the embedding metric is m = diag(1, 1, -1), then ds^2 is

ds^2 = L^2\, d\theta^2 + L^2 \sinh^2\theta\, d\phi^2 \qquad(13.128)

and

(g_{ij}) = L^2 \begin{pmatrix} 1 & 0 \\ 0 & \sinh^2\theta \end{pmatrix}. \qquad(13.129)

The Mathematica scripts GREAT.m and cylindrical_hyperboloid.nb compute the scalar curvature as \(R = -2/L^2\). The surface is maximally symmetric with constant negative curvature. This chapter's programs and scripts are in Tensors_and_general_relativity at github.com/kevinecahill.

Example 13.16 (Curvature of the sphere \(S^3\)) The 3-dimensional sphere \(S^3\) may be embedded isometrically in 4-dimensional flat euclidian space \(E^4\) as the set of points \(p = (x, y, z, w)\) that satisfy \(L^2 = x^2 + y^2 + z^2 + w^2\). If we label its points as

\[
p(\chi, \theta, \phi) = L(\sin\chi\sin\theta\cos\phi,\ \sin\chi\sin\theta\sin\phi,\ \sin\chi\cos\theta,\ \cos\chi), \qquad (13.130)
\]

then its coordinate basis vectors are

\[
\begin{aligned}
e_\chi &= p_{,\chi} = L(\cos\chi\sin\theta\cos\phi,\ \cos\chi\sin\theta\sin\phi,\ \cos\chi\cos\theta,\ -\sin\chi) \\
e_\theta &= p_{,\theta} = L(\sin\chi\cos\theta\cos\phi,\ \sin\chi\cos\theta\sin\phi,\ -\sin\chi\sin\theta,\ 0) \\
e_\phi &= p_{,\phi} = L(-\sin\chi\sin\theta\sin\phi,\ \sin\chi\sin\theta\cos\phi,\ 0,\ 0). \qquad (13.131)
\end{aligned}
\]

The inner product of \(E^4\) is the 4-dimensional dot product. The basis vectors are orthogonal. In terms of the radial variable \(r = L\sin\chi\), the squared distance \(ds^2\) between two nearby points is

\[
\begin{aligned}
ds^2 &= e_\chi\cdot e_\chi\, d\chi^2 + e_\theta\cdot e_\theta\, d\theta^2 + e_\phi\cdot e_\phi\, d\phi^2 \\
&= L^2\left( d\chi^2 + \sin^2\chi\, d\theta^2 + \sin^2\chi\sin^2\theta\, d\phi^2 \right) \\
&= \frac{dr^2}{1 - \sin^2\chi} + r^2 d\theta^2 + r^2\sin^2\theta\, d\phi^2
= \frac{dr^2}{1 - (r/L)^2} + r^2\, d\mu^2 \qquad (13.132)
\end{aligned}
\]

where \(d\mu^2 = d\theta^2 + \sin^2\theta\, d\phi^2\). In these coordinates \(r, \theta, \phi\), the metric is

\[
g_{ik} = \begin{pmatrix} 1/(1 - (r/L)^2) & 0 & 0 \\ 0 & r^2 & 0 \\ 0 & 0 & r^2\sin^2\theta \end{pmatrix}. \qquad (13.133)
\]

The Mathematica scripts GREAT.m and sphere_S3.nb compute the scalar curvature as

\[
R = \frac{6}{L^2} \qquad (13.134)
\]

which is a constant because \(S^3\) is maximally symmetric (Section 13.24).
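The same check can be scripted outside Mathematica. The following sympy sketch (an illustration written for this text, not the book's GREAT.m) builds the Christoffel symbols and Ricci tensor of the hyperspherical metric in (13.132) and recovers the scalar curvature (13.134):

```python
import sympy as sp

# Hyperspherical coordinates and a symbolic radius L for S^3.
chi, theta, phi, L = sp.symbols('chi theta phi L', positive=True)
x = [chi, theta, phi]
# Metric of (13.132): ds^2 = L^2 (dchi^2 + sin^2 chi dtheta^2 + sin^2 chi sin^2 theta dphi^2)
g = sp.diag(L**2, L**2*sp.sin(chi)**2, L**2*sp.sin(chi)**2*sp.sin(theta)**2)
ginv = g.inv()
n = 3

# Christoffel symbols Gamma^i_{kl} = (1/2) g^{im} (g_{mk,l} + g_{ml,k} - g_{kl,m})
Gam = [[[sum(ginv[i, m]*(sp.diff(g[m, k], x[l]) + sp.diff(g[m, l], x[k])
             - sp.diff(g[k, l], x[m])) for m in range(n))/2
         for l in range(n)] for k in range(n)] for i in range(n)]

# Ricci tensor R_{kl} = Gamma^i_{kl,i} - Gamma^i_{ki,l}
#                     + Gamma^i_{im} Gamma^m_{kl} - Gamma^i_{lm} Gamma^m_{ki}
def ricci(k, l):
    return sp.simplify(sum(
        sp.diff(Gam[i][k][l], x[i]) - sp.diff(Gam[i][k][i], x[l])
        + sum(Gam[i][i][m]*Gam[m][k][l] - Gam[i][l][m]*Gam[m][k][i]
              for m in range(n))
        for i in range(n)))

# Scalar curvature R = g^{kl} R_{kl}; it simplifies to 6/L**2 as in (13.134).
R = sp.simplify(sum(ginv[k, l]*ricci(k, l) for k in range(n) for l in range(n)))
print(R)
```

Replacing the metric with that of the cylindrical hyperboloid (13.129) gives \(-2/L^2\) by the same route.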


Example 13.17 (Curvature of the hyperboloid \(H^3\)) The hyperboloid \(H^3\) is a 3-dimensional surface that can be isometrically embedded in the semi-euclidian spacetime \(E^{(1,3)}\) in which distances are \(ds^2 = dx^2 + dy^2 + dz^2 - dw^2\), and w is a time coordinate. The points of \(H^3\) satisfy \(L^2 = -x^2 - y^2 - z^2 + w^2\). If we label them as

\[
p(\chi, \theta, \phi) = L(\sinh\chi\sin\theta\cos\phi,\ \sinh\chi\sin\theta\sin\phi,\ \sinh\chi\cos\theta,\ \cosh\chi) \qquad (13.135)
\]

then the coordinate basis vectors or tangent vectors of \(H^3\) are

\[
\begin{aligned}
e_\chi &= p_{,\chi} = L(\cosh\chi\sin\theta\cos\phi,\ \cosh\chi\sin\theta\sin\phi,\ \cosh\chi\cos\theta,\ \sinh\chi) \\
e_\theta &= p_{,\theta} = L(\sinh\chi\cos\theta\cos\phi,\ \sinh\chi\cos\theta\sin\phi,\ -\sinh\chi\sin\theta,\ 0) \\
e_\phi &= p_{,\phi} = L(-\sinh\chi\sin\theta\sin\phi,\ \sinh\chi\sin\theta\cos\phi,\ 0,\ 0). \qquad (13.136)
\end{aligned}
\]

The basis vectors are orthogonal. In terms of the radial variable \(r = L\sinh\chi\), the squared distance \(ds^2\) between two nearby points is

\[
\begin{aligned}
ds^2 &= e_\chi\cdot e_\chi\, d\chi^2 + e_\theta\cdot e_\theta\, d\theta^2 + e_\phi\cdot e_\phi\, d\phi^2
= L^2\left( d\chi^2 + \sinh^2\chi\, d\theta^2 + \sinh^2\chi\sin^2\theta\, d\phi^2 \right) \\
&= \frac{dr^2}{1 + \sinh^2\chi} + r^2 d\theta^2 + r^2\sin^2\theta\, d\phi^2
= \frac{dr^2}{1 + (r/L)^2} + r^2\, d\mu^2. \qquad (13.137)
\end{aligned}
\]

The Mathematica scripts GREAT.m and hyperboloid_H3.nb compute the scalar curvature of \(H^3\) as

\[
R = -\frac{6}{L^2}. \qquad (13.138)
\]

Its curvature is a constant because \(H^3\) is maximally symmetric (Section 13.24). The only maximally symmetric 3-dimensional manifolds are \(S^3\), \(H^3\), and euclidian space \(E^3\), whose line element is \(ds^2 = dr^2 + r^2 d\mu^2\). They are the spatial parts of Friedmann–Lemaître–Robertson–Walker cosmologies (Section 13.42).

13.24 Maximally Symmetric Spaces

The spheres \(S^2\) and \(S^3\) (Examples 13.4 and 13.16) and the hyperboloids \(H^2\) and \(H^3\) (Examples 13.6 and 13.17) are maximally symmetric spaces. A space described by a metric \(g_{ik}(x)\) is symmetric under a transformation \(x \to x'\) if the distances \(g_{ik}(x')\,dx'^i dx'^k\) and \(g_{ik}(x)\,dx^i dx^k\) are the same. To see what this symmetry condition means, we consider the infinitesimal transformation \(x'^\ell = x^\ell + \epsilon\, y^\ell(x)\) under which to lowest order \(g_{ik}(x') = g_{ik}(x) + g_{ik,\ell}\,\epsilon\, y^\ell\) and \(dx'^i = dx^i + \epsilon\, y^i_{,j}\, dx^j\). The symmetry condition requires

\[
g_{ik}(x)\, dx^i dx^k = \left( g_{ik}(x) + g_{ik,\ell}\,\epsilon\, y^\ell \right)\left( dx^i + \epsilon\, y^i_{,j}\, dx^j \right)\left( dx^k + \epsilon\, y^k_{,m}\, dx^m \right) \qquad (13.139)
\]

or

\[
0 = g_{ik,\ell}\, y^\ell + g_{im}\, y^m_{,k} + g_{jk}\, y^j_{,i}. \qquad (13.140)
\]

The vector field \(y^i(x)\) must satisfy this condition if \(x'^i = x^i + \epsilon\, y^i(x)\) is to be a symmetry of the metric \(g_{ik}(x)\). By using the vanishing (13.83) of the covariant derivative of the metric tensor, we may write the condition on the symmetry vector \(y^\ell(x)\) as (Exercise 13.9)

\[
0 = y_{i;k} + y_{k;i}. \qquad (13.141)
\]

The symmetry vector \(y^\ell\) is a Killing vector (Wilhelm Killing, 1847–1923). We may use symmetry conditions (13.140) and (13.141) either to find the symmetries of a space with a known metric or to find metrics with a particular symmetry.

Example 13.18 (Killing vectors of the sphere \(S^2\)) The first Killing vector is \((y_1^\theta, y_1^\phi) = (0, 1)\). Since the components of \(y_1\) are constants, the symmetry condition (13.140) says \(g_{ik,\phi} = 0\), which tells us that the metric is independent of \(\phi\). The other two Killing vectors are \((y_2^\theta, y_2^\phi) = (\sin\phi,\ \cot\theta\cos\phi)\) and \((y_3^\theta, y_3^\phi) = (\cos\phi,\ -\cot\theta\sin\phi)\). The symmetry condition (13.140) for \(i = k = \theta\) and Killing vectors \(y_2\) and \(y_3\) tells us that \(g_{\theta\phi} = 0\) and that \(g_{\theta\theta,\theta} = 0\). So \(g_{\theta\theta}\) is a constant, which we set equal to unity. Finally, the symmetry condition (13.140) for \(i = k = \phi\) and the Killing vectors \(y_2\) and \(y_3\) tells us that \(g_{\phi\phi,\theta} = 2\cot\theta\, g_{\phi\phi}\), which we integrate to \(g_{\phi\phi} = \sin^2\theta\). The 2-dimensional space with Killing vectors \(y_1, y_2, y_3\) therefore has the metric (13.115) of the sphere \(S^2\).

Example 13.19 (Killing vectors of the hyperboloid \(H^2\)) The metric (13.46) of the hyperboloid \(H^2\) is diagonal with \(g_{\theta\theta} = R^2\) and \(g_{\phi\phi} = R^2\sinh^2\theta\). The Killing vector \((y_1^\theta, y_1^\phi) = (0, 1)\) satisfies the symmetry condition (13.140). Since \(g_{\theta\theta}\) is independent of \(\theta\) and \(\phi\), the \(\theta\theta\) component of (13.140) implies that \(y^\theta_{,\theta} = 0\). Since \(g_{\phi\phi} = R^2\sinh^2\theta\), the \(\phi\phi\) component of (13.140) says that \(y^\phi_{,\phi} = -\coth\theta\, y^\theta\). The \(\theta\phi\) and \(\phi\theta\) components of (13.140) give \(y^\theta_{,\phi} = -\sinh^2\theta\, y^\phi_{,\theta}\). The vectors \(y_2 = (y_2^\theta, y_2^\phi) = (\sin\phi,\ \coth\theta\cos\phi)\) and \(y_3 = (y_3^\theta, y_3^\phi) = (\cos\phi,\ -\coth\theta\sin\phi)\) satisfy both of these equations.
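As a check on Example 13.18, a few lines of sympy (a sketch written for this text, not from the book) verify that all three vectors satisfy the symmetry condition (13.140) for the unit-sphere metric \(g_{\theta\theta} = 1\), \(g_{\phi\phi} = \sin^2\theta\):

```python
import sympy as sp

theta, phi = sp.symbols('theta phi')
x = [theta, phi]
g = sp.diag(1, sp.sin(theta)**2)   # metric of the unit 2-sphere

def killing(y):
    # Matrix of g_{ik,l} y^l + g_{im} y^m_{,k} + g_{jk} y^j_{,i}  (13.140)
    n = 2
    return sp.Matrix(n, n, lambda i, k: sp.simplify(
        sum(sp.diff(g[i, k], x[l])*y[l] for l in range(n))
        + sum(g[i, m]*sp.diff(y[m], x[k]) for m in range(n))
        + sum(g[j, k]*sp.diff(y[j], x[i]) for j in range(n))))

y1 = [sp.Integer(0), sp.Integer(1)]
y2 = [sp.sin(phi), sp.cot(theta)*sp.cos(phi)]
y3 = [sp.cos(phi), -sp.cot(theta)*sp.sin(phi)]
for y in (y1, y2, y3):
    assert killing(y) == sp.zeros(2, 2)   # (13.140) holds for each Killing vector
```

Swapping in \(g_{\phi\phi} = \sinh^2\theta\) and \(\coth\theta\) gives the analogous check for Example 13.19.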

The Lie derivative \(\mathcal{L}_y\) of a scalar field A is defined in terms of a vector field \(y^\ell(x)\) as \(\mathcal{L}_y A = y^\ell A_{,\ell}\). The Lie derivative \(\mathcal{L}_y\) of a contravariant vector \(F^i\) is

\[
\mathcal{L}_y F^i = y^\ell F^i_{,\ell} - F^\ell y^i_{,\ell} = y^\ell F^i_{;\ell} - F^\ell y^i_{;\ell} \qquad (13.142)
\]

in which the second equality follows from \(y^\ell\,\Gamma^i{}_{\ell k} F^k = F^\ell\,\Gamma^i{}_{\ell k}\, y^k\). The Lie derivative \(\mathcal{L}_y\) of a covariant vector \(V_i\) is

\[
\mathcal{L}_y V_i = y^\ell V_{i,\ell} + V_\ell\, y^\ell_{,i} = y^\ell V_{i;\ell} + V_\ell\, y^\ell_{;i}. \qquad (13.143)
\]

Similarly, the Lie derivative \(\mathcal{L}_y\) of a rank-2 covariant tensor \(T_{ik}\) is

\[
\mathcal{L}_y T_{ik} = y^\ell T_{ik,\ell} + T_{\ell k}\, y^\ell_{,i} + T_{i\ell}\, y^\ell_{,k}. \qquad (13.144)
\]

We see now that the condition (13.140) that a vector field \(y^\ell\) be a symmetry of a metric \(g_{jm}\) is that its Lie derivative

\[
\mathcal{L}_y g_{ik} = g_{ik,\ell}\, y^\ell + g_{im}\, y^m_{,k} + g_{jk}\, y^j_{,i} = 0 \qquad (13.145)
\]

must vanish.

A maximally symmetric space (or spacetime) in d dimensions has d translation symmetries and d(d − 1)/2 rotational symmetries, which gives a total of d(d + 1)/2 symmetries associated with d(d + 1)/2 Killing vectors. Thus for d = 2, there is one rotation and two translations. For d = 3, there are three rotations and three translations. For d = 4, there are six rotations and four translations. A maximally symmetric space has a curvature tensor (13.99) that is simply related to its metric tensor

\[
R_{ijk\ell} = c\left( g_{ik}\, g_{j\ell} - g_{i\ell}\, g_{jk} \right) \qquad (13.146)
\]

where c is a constant (Zee, 2013, IX.6). Since \(g^{ki} g_{ik} = g^k{}_k = d\) is the number of dimensions of the space(time), the Ricci tensor (13.112) and the curvature scalar (13.113) of a maximally symmetric space are

\[
R_{j\ell} = g^{ki} R_{ijk\ell} = c\,(d - 1)\, g_{j\ell} \qquad\text{and}\qquad R = g^{j\ell} R_{j\ell} = c\, d\,(d - 1). \qquad (13.147)
\]
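A short sympy sketch (again code written for this text, not the book's) checks (13.147) on the 2-sphere of radius L, whose scalar curvature 2/L² fixes c = 1/L²:

```python
import sympy as sp

theta, phi, L = sp.symbols('theta phi L', positive=True)
x = [theta, phi]
g = sp.diag(L**2, L**2*sp.sin(theta)**2)   # 2-sphere of radius L
ginv = g.inv()
n = 2
# Christoffel symbols Gamma^i_{kl}
Gam = [[[sum(ginv[i, m]*(sp.diff(g[m, k], x[l]) + sp.diff(g[m, l], x[k])
             - sp.diff(g[k, l], x[m])) for m in range(n))/2
         for l in range(n)] for k in range(n)] for i in range(n)]
# Ricci tensor from the standard contraction of the Riemann tensor
ric = lambda k, l: sp.simplify(sum(
    sp.diff(Gam[i][k][l], x[i]) - sp.diff(Gam[i][k][i], x[l])
    + sum(Gam[i][i][m]*Gam[m][k][l] - Gam[i][l][m]*Gam[m][k][i] for m in range(n))
    for i in range(n)))

c = 1/L**2
# R_{jl} = c (d-1) g_{jl}  and  R = c d (d-1) = 2/L**2   (13.147)
assert all(sp.simplify(ric(k, l) - c*(n - 1)*g[k, l]) == 0
           for k in range(n) for l in range(n))
R = sp.simplify(sum(ginv[k, l]*ric(k, l) for k in range(n) for l in range(n)))
assert sp.simplify(R - c*n*(n - 1)) == 0
```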

13.25 Principle of Equivalence

Since the metric tensor \(g_{ij}(x)\) is real and symmetric, it can be diagonalized at any point \(p(x)\) by a 4 × 4 orthogonal matrix \(O(x)\)

\[
\left(O^{\mathsf T}\right)_i{}^k\, g_{k\ell}\, O^\ell{}_j = \begin{pmatrix} e_0 & 0 & 0 & 0 \\ 0 & e_1 & 0 & 0 \\ 0 & 0 & e_2 & 0 \\ 0 & 0 & 0 & e_3 \end{pmatrix} \qquad (13.148)
\]

which arranges the four real eigenvalues \(e_i\) of the matrix \(g_{ij}(x)\) in the order \(e_0 \le e_1 \le e_2 \le e_3\). Thus the coordinate transformation

\[
\frac{\partial x^k}{\partial x'^i} = \frac{\left(O^{\mathsf T}\right)_i{}^k}{\sqrt{|e_i|}} \qquad (13.149)
\]

takes any spacetime metric \(g_{k\ell}(x)\) with one negative and three positive eigenvalues into the Minkowski metric \(\eta_{ij}\) of flat spacetime

\[
g_{k\ell}(x)\,\frac{\partial x^k}{\partial x'^i}\,\frac{\partial x^\ell}{\partial x'^j} = g'_{ij}(x') = \eta_{ij} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad (13.150)
\]

at the point \(p(x) = p(x')\). The principle of equivalence says that in these free-fall coordinates \(x'\), the physical laws of gravity-free special relativity apply in a suitably small region about the point \(p(x) = p(x')\). It follows from this principle that the metric \(g_{ij}\) of spacetime accounts for all the effects of gravity. In the \(x'\) coordinates, the invariant squared separation \(dp^2\) is

\[
\begin{aligned}
dp^2 &= g'_{ij}\, dx'^i dx'^j = e'_i(x')\cdot e'_j(x')\, dx'^i dx'^j = e'^a_i(x')\,\eta_{ab}\, e'^b_j(x')\, dx'^i dx'^j \\
&= \delta^a_i\,\eta_{ab}\,\delta^b_j\, dx'^i dx'^j = \eta_{ij}\, dx'^i dx'^j = (d\mathbf{x}')^2 - (dx'^0)^2 = ds^2. \qquad (13.151)
\end{aligned}
\]

If \(d\mathbf{x}' = 0\), then \(dt' = \sqrt{-ds^2}/c\) is the proper time elapsed between events \(p\) and \(p + dp\). If \(dt' = 0\), then \(ds\) is the proper distance between the events.
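Numerically, the congruence (13.148–13.150) is easy to exhibit. This numpy sketch (an illustration with a randomly generated metric, not anything from the book) diagonalizes a symmetric matrix of signature (−, +, +, +) and rescales by \(1/\sqrt{|e_i|}\) to reach \(\eta\):

```python
import numpy as np

# A metric-like symmetric matrix with one negative and three positive
# eigenvalues, built so its signature is (-,+,+,+) by Sylvester's law.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
gmat = A.T @ np.diag([-1.0, 1.0, 1.0, 1.0]) @ A
e, O = np.linalg.eigh(gmat)          # real eigenvalues in ascending order
S = O / np.sqrt(np.abs(e))           # columns scaled by 1/sqrt|e_i|, as in (13.149)
eta = S.T @ gmat @ S                 # congruence -> Minkowski metric (13.150)
assert np.allclose(np.sign(e), [-1, 1, 1, 1])
assert np.allclose(eta, np.diag([-1.0, 1.0, 1.0, 1.0]))
```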

The \(x'\) coordinates are not unique because every Lorentz transformation (Section 12.1) leaves the metric \(\eta\) invariant. Coordinate systems in which \(g_{ij}(x') = \eta_{ij}\) are called Lorentz, inertial, or free-fall coordinate systems. The congruency transformation (1.351 and 13.148–13.150) preserves the signs of the eigenvalues \(e_i\), which make up the signature (−1, 1, 1, 1) of the metric tensor.

13.26 Tetrads

We defined the metric tensor as the dot product (13.34) or (13.42) of tangent vectors, \(g_{k\ell}(x) = e_k(x)\cdot e_\ell(x)\). If instead we invert the equation (13.150) that relates the metric tensor to the flat metric

\[
g_{k\ell}(x) = \frac{\partial x'^a}{\partial x^k}\,\eta_{ab}\,\frac{\partial x'^b}{\partial x^\ell} \qquad (13.152)
\]

then we can express the metric in terms of four 4-vectors

\[
c^a_k(x) = \frac{\partial x'^a}{\partial x^k} \qquad\text{as}\qquad g_{k\ell}(x) = c^a_k(x)\,\eta_{ab}\,c^b_\ell(x) \qquad (13.153)
\]

in which \(\eta_{ij}\) is the 4 × 4 metric (13.150) of flat Minkowski space. Cartan's four 4-vectors \(c^a_i(x)\) are called a moving frame, a tetrad, and a vierbein.


Because \(L^a{}_c(x)\,\eta_{ab}\,L^b{}_d(x) = \eta_{cd}\), every spacetime-dependent Lorentz transformation \(L(x)\) maps one set of tetrads \(c^c_k(x)\) to another set of tetrads \(c'^a_k(x) = L^a{}_c(x)\,c^c_k(x)\) that represent the same metric

\[
c'^a_k(x)\,\eta_{ab}\,c'^b_\ell(x) = L^a{}_c(x)\,c^c_k(x)\,\eta_{ab}\,L^b{}_d(x)\,c^d_\ell(x) = c^c_k(x)\,\eta_{cd}\,c^d_\ell(x) = g_{k\ell}(x). \qquad (13.154)
\]

Cartan's tetrad is four 4-vectors \(c_i\) that give the metric tensor as \(g_{ik} = c_i\cdot c_k = \vec{c}_i\cdot\vec{c}_k - c^0_i\, c^0_k\). The dual tetrads \(c^i_a = g^{ik}\,\eta_{ab}\,c^b_k\) satisfy

\[
c^i_a\, c^a_k = \delta^i_k \qquad\text{and}\qquad c^i_a\, c^b_i = \delta^b_a. \qquad (13.155)
\]

The metric \(g_{k\ell}(x)\) is symmetric, \(g_{k\ell}(x) = g_{\ell k}(x)\), so it has 10 independent components at each spacetime point x. The four 4-vectors \(c^a_k\) have 16 components, but a Lorentz transformation \(L(x)\) has 6 components. So the tetrads have 16 − 6 = 10 independent components at each spacetime point.

The distinction between tetrads and tangent basis vectors is that each tetrad \(c^a_k\) has 4 components, a = 0, 1, 2, 3, while each basis vector \(e^\alpha_k(x)\) has as many components \(\alpha = 0, 1, 2, 3, \ldots\) as there are dimensions in the minimal semi-euclidian embedding space \(E^{1,n}\) where n ≤ 19 (Aké et al., 2018). Both represent the metric

\[
g_{k\ell}(x) = \sum_{a,b=0}^{3} c^a_k(x)\,\eta_{ab}\,c^b_\ell(x) = \sum_{\alpha,\beta=0}^{n} e^\alpha_k(x)\,\eta'_{\alpha\beta}\,e^\beta_\ell(x) \qquad (13.156)
\]

in which \(\eta'\) is like \(\eta\) but with n diagonal elements that are unity (Élie Cartan, 1869–1951).

13.27 Scalar Densities and \(g = |\det(g_{ik})|\)

Let g be the absolute value of the determinant of the metric tensor \(g_{ik}\)

\[
g = g(x) = |\det(g_{ik}(x))|. \qquad (13.157)
\]

Under a coordinate transformation, g becomes

\[
g' = g'(x') = |\det(g'_{ik}(x'))| = \left| \det\!\left( \frac{\partial x^j}{\partial x'^i}\,\frac{\partial x^\ell}{\partial x'^k}\, g_{j\ell}(x) \right) \right|. \qquad (13.158)
\]

The definition (1.204) of a determinant and the product rule (1.225) for determinants tell us that

\[
g'(x') = \left| \det\!\left( \frac{\partial x^j}{\partial x'^i} \right) \det\!\left( \frac{\partial x^\ell}{\partial x'^k} \right) \det\left( g_{j\ell} \right) \right| = [J(x/x')]^2\, g(x) \qquad (13.159)
\]

where \(J(x/x')\) is the jacobian (Section 1.21) of the coordinate transformation

\[
J(x/x') = \det\!\left( \frac{\partial x^j}{\partial x'^i} \right). \qquad (13.160)
\]

A quantity \(s(x)\) is a scalar density of weight w if it transforms as

\[
s'(x') = [J(x'/x)]^w\, s(x). \qquad (13.161)
\]

Thus the transformation rule (13.159) says that the determinant \(\det(g_{ik})\) is a scalar density of weight minus two

\[
\det(g'_{ik}(x')) = [J(x/x')]^2\,\det(g_{j\ell}(x)) = [J(x'/x)]^{-2}\,\det(g_{j\ell}(x)). \qquad (13.162)
\]
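The weight-(−2) rule (13.162) can be checked in two dimensions with sympy; this sketch (an illustration, assuming a polar-to-Cartesian change of coordinates) transforms the flat Cartesian metric and compares determinants:

```python
import sympy as sp

r, t = sp.symbols('r t', positive=True)
# Cartesian coordinates x as functions of the primed coordinates x' = (r, t)
X = sp.Matrix([r*sp.cos(t), r*sp.sin(t)])
J = X.jacobian(sp.Matrix([r, t]))    # matrix dx^j / dx'^i; J(x/x') is its det
g = sp.eye(2)                        # Cartesian metric, det(g) = 1
gp = sp.simplify(J.T*g*J)            # g'_{ik} = (dx^j/dx'^i) g_{jl} (dx^l/dx'^k)
# det(g') = [J(x/x')]^2 det(g): the determinant is a scalar density of weight -2
assert sp.simplify(gp.det() - J.det()**2*g.det()) == 0
```

Here \(\det J = r\), so \(\det g' = r^2\), the familiar polar-coordinates metric determinant.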

We saw in Section 1.21 that under a coordinate transformation \(x \to x'\) the d-dimensional element of volume in the new coordinates \(d^dx'\) is related to that in the old coordinates \(d^dx\) by a jacobian

\[
d^dx' = J(x'/x)\, d^dx = \det\!\left( \frac{\partial x'^i}{\partial x^j} \right) d^dx. \qquad (13.163)
\]

Thus the product \(\sqrt{g}\, d^dx\) changes at most by the sign of the jacobian \(J(x'/x)\) when \(x \to x'\)

\[
\sqrt{g'}\, d^dx' = |J(x/x')|\, J(x'/x)\,\sqrt{g(x)}\, d^dx = \pm\sqrt{g(x)}\, d^dx. \qquad (13.164)
\]

The quantity \(\sqrt{g}\, d^4x\) is the invariant scalar \(\sqrt{g}\, |d^4x|\), so that if \(L(x)\) is a scalar, then the integral over spacetime

\[
S = \int L(x)\,\sqrt{g}\; d^4x \qquad (13.165)
\]

is invariant under general coordinate transformations. The Levi-Civita tensor provides a fancier definition.

13.28 Levi-Civita's Symbol and Tensor

In 3 dimensions, Levi-Civita's symbol \(\epsilon_{ijk} \equiv \epsilon^{ijk}\) is totally antisymmetric with \(\epsilon_{123} = 1\) in all coordinate systems. In 4 space or spacetime dimensions, Levi-Civita's symbol \(\epsilon_{ijk\ell} \equiv \epsilon^{ijk\ell}\) is totally antisymmetric with \(\epsilon_{1234} = 1\) or equivalently with \(\epsilon_{0123} = 1\) in all coordinate systems. In n dimensions, Levi-Civita's symbol \(\epsilon_{i_1 i_2 \ldots i_n}\) is totally antisymmetric with \(\epsilon_{123\ldots n} = 1\) or \(\epsilon_{012\ldots n-1} = 1\).

We can turn his symbol into a pseudotensor by multiplying it by the square root of the absolute value of the determinant of a rank-2 covariant tensor. A natural choice is the metric tensor. In a right-handed coordinate system in which the tangent vector \(e_0\) points (orthochronously) toward the future, the Levi-Civita tensor \(\eta_{ijk\ell}\) is the totally antisymmetric rank-4 covariant tensor

\[
\eta_{ijk\ell}(x) = \sqrt{g(x)}\;\epsilon_{ijk\ell} \qquad (13.166)
\]

in which \(g(x) = |\det g_{mn}(x)|\) is (13.157) the absolute value of the determinant of the metric tensor \(g_{mn}\). In a different system of coordinates \(x'\), the Levi-Civita tensor \(\eta_{ijk\ell}(x')\) differs from (13.166) by the sign s of the jacobian \(J(x'/x)\) of any coordinate transformation to \(x'\) from a right-handed, orthochronous coordinate system x

\[
\eta_{ijk\ell}(x') = s(x')\,\sqrt{g(x')}\;\epsilon_{ijk\ell}. \qquad (13.167)
\]

The transformation rule (13.159) and the definition (1.204) and product rule (1.225) of determinants show that \(\eta_{ijk\ell}\) transforms as a rank-4 covariant tensor

\[
\begin{aligned}
\eta'_{ijk\ell}(x') &= s(x')\,\sqrt{g'(x')}\;\epsilon_{ijk\ell} = s(x')\,|J(x/x')|\,\sqrt{g(x)}\;\epsilon_{ijk\ell} = J(x/x')\,\sqrt{g(x)}\;\epsilon_{ijk\ell} \\
&= \det\!\left( \frac{\partial x}{\partial x'} \right)\sqrt{g}\;\epsilon_{ijk\ell}
= \frac{\partial x^t}{\partial x'^i}\frac{\partial x^u}{\partial x'^j}\frac{\partial x^v}{\partial x'^k}\frac{\partial x^w}{\partial x'^\ell}\,\sqrt{g}\;\epsilon_{tuvw}
= \frac{\partial x^t}{\partial x'^i}\frac{\partial x^u}{\partial x'^j}\frac{\partial x^v}{\partial x'^k}\frac{\partial x^w}{\partial x'^\ell}\,\eta_{tuvw}. \qquad (13.168)
\end{aligned}
\]

Raising the indices of \(\eta\) and using \(\sigma\) as the sign of \(\det(g_{ik})\), we have

\[
\eta^{ijk\ell} = g^{it} g^{ju} g^{kv} g^{\ell w}\,\eta_{tuvw} = g^{it} g^{ju} g^{kv} g^{\ell w}\,\sqrt{g}\;\epsilon_{tuvw} = \sqrt{g}\;\epsilon^{ijk\ell}\,\det(g^{mn}) = \frac{\sqrt{g}\;\epsilon^{ijk\ell}}{\det(g_{mn})} = \frac{\sigma\,\epsilon^{ijk\ell}}{\sqrt{g}} \equiv \frac{\sigma\,\epsilon_{ijk\ell}}{\sqrt{g}}. \qquad (13.169)
\]

In terms of the Hodge star (14.151), the invariant volume element is

\[
\sqrt{g}\;|d^4x| = {*}1 = \frac{1}{4!}\,\eta_{ijk\ell}\;dx^i\wedge dx^j\wedge dx^k\wedge dx^\ell. \qquad (13.170)
\]
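The index-raising identity (13.169) can be verified mechanically for the flat Minkowski metric with sympy's built-in Levi-Civita symbol (a sketch written for this text, where \(\sigma = -1\) and \(\sqrt{g} = 1\)):

```python
import sympy as sp
from sympy import LeviCivita

# Flat metric of signature (-,+,+,+); det(g) = -1, so sigma = -1 and sqrt(g) = 1
g = sp.diag(-1, 1, 1, 1)
ginv = g.inv()
sqrtg = sp.sqrt(abs(g.det()))
sigma = sp.sign(g.det())

# eta_{tuvw} = sqrt(g) epsilon_{tuvw}  (13.166); raise all four indices
def eta_up(i, j, k, l):
    return sum(ginv[i, t]*ginv[j, u]*ginv[k, v]*ginv[l, w]*sqrtg*LeviCivita(t, u, v, w)
               for t in range(4) for u in range(4) for v in range(4) for w in range(4))

# (13.169): eta^{ijkl} = sigma epsilon^{ijkl} / sqrt(g)
assert all(sp.simplify(eta_up(i, j, k, l) - sigma*LeviCivita(i, j, k, l)/sqrtg) == 0
           for i in range(4) for j in range(4) for k in range(4) for l in range(4))
```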

13.29 Divergence of a Contravariant Vector

The contracted covariant derivative of a contravariant vector is a scalar known as the divergence,

\[
\nabla\cdot V = V^i_{;i} = V^i_{,i} + V^k\,\Gamma^i{}_{ki}. \qquad (13.171)
\]

Because \(g_{ik} = g_{ki}\), in the sum over i of the connection (13.59)

\[
\Gamma^i{}_{ki} = \frac{1}{2}\, g^{i\ell}\left( g_{\ell i,k} + g_{\ell k,i} - g_{ki,\ell} \right) \qquad (13.172)
\]

the last two terms cancel because they differ only by the interchange of the dummy indices i and \(\ell\)

\[
g^{i\ell}\, g_{\ell k,i} = g^{\ell i}\, g_{ik,\ell} = g^{i\ell}\, g_{ki,\ell}. \qquad (13.173)
\]

So the contracted connection collapses to

\[
\Gamma^i{}_{ki} = \frac{1}{2}\, g^{i\ell}\, g_{i\ell,k}. \qquad (13.174)
\]

There is a nice formula for this last expression. To derive it, let \(g \equiv g_{i\ell}\) be the 4 × 4 matrix whose elements are those of the covariant metric tensor \(g_{i\ell}\). Its determinant, like that of any matrix, is the cofactor sum (1.213) along any row or column, that is, over \(\ell\) for fixed i or over i for fixed \(\ell\)

\[
\det(g) = \sum_{i\ \text{or}\ \ell} g_{i\ell}\, C_{i\ell} \qquad (13.175)
\]

in which the cofactor \(C_{i\ell}\) is \((-1)^{i+\ell}\) times the determinant of the reduced matrix consisting of the matrix g with row i and column \(\ell\) omitted. Thus the partial derivative of det g with respect to the \(i\ell\)th element \(g_{i\ell}\) is

\[
\frac{\partial\det(g)}{\partial g_{i\ell}} = C_{i\ell} \qquad (13.176)
\]

in which we allow \(g_{i\ell}\) and \(g_{\ell i}\) to be independent variables for the purposes of this differentiation. The inverse \(g^{i\ell}\) of the metric tensor g, like the inverse (1.215) of any matrix, is the transpose of the cofactor matrix divided by its determinant det(g)

\[
g^{i\ell} = \frac{C_{\ell i}}{\det(g)} = \frac{1}{\det(g)}\,\frac{\partial\det(g)}{\partial g_{\ell i}}. \qquad (13.177)
\]

Using this formula and the chain rule, we may write the derivative of the determinant det(g) as

\[
\det(g)_{,k} = \frac{\partial\det(g)}{\partial g_{i\ell}}\, g_{i\ell,k} = \det(g)\, g^{\ell i}\, g_{i\ell,k} \qquad (13.178)
\]

and so since \(g_{i\ell} = g_{\ell i}\), the contracted connection (13.174) is

\[
\Gamma^i{}_{ki} = \frac{1}{2}\, g^{i\ell}\, g_{i\ell,k} = \frac{\det(g)_{,k}}{2\det(g)} = \frac{|\det(g)|_{,k}}{2\,|\det(g)|} = \frac{g_{,k}}{2g} = \frac{(\sqrt{g})_{,k}}{\sqrt{g}} \qquad (13.179)
\]

in which \(g \equiv |\det(g)|\) is the absolute value of the determinant of the metric tensor.

Thus from (13.171 and 13.179), we arrive at our formula for the covariant divergence of a contravariant vector:

\[
\nabla\cdot V = V^i_{;i} = V^i_{,i} + \Gamma^i{}_{ki}\, V^k = V^k_{,k} + \frac{(\sqrt{g})_{,k}}{\sqrt{g}}\, V^k = \frac{(\sqrt{g}\, V^k)_{,k}}{\sqrt{g}}. \qquad (13.180)
\]
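For flat 3-space in spherical coordinates, where \(\sqrt{g} = r^2\sin\theta\), formula (13.180) reproduces the familiar radial divergence. A brief sympy sketch (an illustration for this text, with an arbitrary radial field f(r)):

```python
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
x = [r, th, ph]
# Spherical metric g = diag(1, r^2, r^2 sin^2 theta), so sqrt(g) = r^2 sin(theta)
sqrtg = r**2*sp.sin(th)
f = sp.Function('f')
V = [f(r), 0, 0]   # a purely radial contravariant vector field
div = sum(sp.diff(sqrtg*V[k], x[k]) for k in range(3))/sqrtg   # (13.180)
# (13.180) reproduces the textbook radial divergence (1/r^2) d(r^2 f)/dr
assert sp.simplify(div - sp.diff(r**2*f(r), r)/r**2) == 0
```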

Example 13.20 (Maxwell's inhomogeneous equations) An important application of this divergence formula (13.180) is the generally covariant form (14.157) of Maxwell's inhomogeneous equations

\[
\frac{1}{\sqrt{g}}\left( \sqrt{g}\, F^{k\ell} \right)_{,\ell} = \mu_0\, j^k. \qquad (13.181)
\]

Example 13.21 (Energy–momentum tensor) Another application is to the divergence of the symmetric energy–momentum tensor \(T^{ij} = T^{ji}\)

\[
T^{ij}{}_{;i} = T^{ij}{}_{,i} + \Gamma^i{}_{ki}\, T^{kj} + \Gamma^j{}_{mi}\, T^{im} = \frac{(\sqrt{g}\, T^{kj})_{,k}}{\sqrt{g}} + \Gamma^j{}_{mi}\, T^{im}. \qquad (13.182)
\]

13.30 Covariant Laplacian

In flat 3-space, we write the laplacian as \(\nabla\cdot\nabla = \nabla^2\) or as \(\triangle\). In euclidian coordinates, both mean \(\partial_x^2 + \partial_y^2 + \partial_z^2\). In flat minkowski space, one often turns the triangle into a square and writes the 4-laplacian as \(\Box = \triangle - \partial_0^2\).

The gradient \(f_{,k}\) of a scalar field f is a covariant vector, and \(f^{,i} = g^{ik} f_{,k}\) is its contravariant form. The invariant laplacian \(\Box f\) of a scalar field f is the covariant divergence \(f^{,i}{}_{;i}\). We may use our formula (13.180) for the divergence of a contravariant vector to write it in these equivalent ways

\[
\Box f = f^{,i}{}_{;i} = \left( g^{ik} f_{,k} \right)_{;i} = \frac{(\sqrt{g}\, f^{,i})_{,i}}{\sqrt{g}} = \frac{(\sqrt{g}\, g^{ik} f_{,k})_{,i}}{\sqrt{g}}. \qquad (13.183)
\]
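Applied to flat 3-space in spherical coordinates, (13.183) gives the standard radial laplacian; a minimal sympy check (an illustration written for this text):

```python
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
x = [r, th, ph]
f = sp.Function('f')
# Flat 3-space in spherical coordinates: g^{ik} = diag(1, 1/r^2, 1/(r^2 sin^2 theta))
ginv = sp.diag(1, 1/r**2, 1/(r**2*sp.sin(th)**2))
sqrtg = r**2*sp.sin(th)
# (13.183): box f = (sqrt(g) g^{ik} f_{,k})_{,i} / sqrt(g), for a radial f(r)
lap = sum(sp.diff(sqrtg*ginv[i, i]*sp.diff(f(r), x[i]), x[i]) for i in range(3))/sqrtg
# reduces to f'' + 2 f'/r, the radial part of the flat laplacian
assert sp.simplify(lap - sp.diff(f(r), r, 2) - 2*sp.diff(f(r), r)/r) == 0
```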

13.31 Principle of Stationary Action in General Relativity

The invariant proper time for a particle to move along a path \(x^i(t)\)

\[
T = \int_{\tau_1}^{\tau_2} d\tau = \frac{1}{c}\int \left( -g_{i\ell}\, dx^i dx^\ell \right)^{1/2} \qquad (13.184)
\]

is extremal and stationary on free-fall paths called geodesics. We can identify a geodesic by computing the variation \(\delta d\tau\)

\[
\begin{aligned}
c\,\delta d\tau &= \delta\sqrt{-g_{i\ell}\, dx^i dx^\ell} = \frac{-\delta(g_{i\ell})\, dx^i dx^\ell - 2 g_{i\ell}\, dx^i\,\delta dx^\ell}{2\sqrt{-g_{i\ell}\, dx^i dx^\ell}} \\
&= -\frac{g_{i\ell,k}}{2c}\,\delta x^k\, u^i u^\ell\, d\tau - \frac{g_{i\ell}}{c}\, u^i\,\delta dx^\ell
= -\frac{g_{i\ell,k}}{2c}\,\delta x^k\, u^i u^\ell\, d\tau - \frac{g_{i\ell}}{c}\, u^i\, d\delta x^\ell \qquad (13.185)
\end{aligned}
\]

in which \(u^\ell = dx^\ell/d\tau\) is the 4-velocity (12.20). The path is extremal if

\[
0 = c\,\delta T = c\int_{\tau_1}^{\tau_2} \delta d\tau = -\int_{\tau_1}^{\tau_2} \left( \frac{1}{2}\, g_{i\ell,k}\,\delta x^k\, u^i u^\ell + g_{i\ell}\, u^i\,\frac{d\delta x^\ell}{d\tau} \right) d\tau \qquad (13.186)
\]

which we integrate by parts keeping in mind that \(\delta x^\ell(\tau_2) = \delta x^\ell(\tau_1) = 0\)

\[
\begin{aligned}
0 &= -\int_{\tau_1}^{\tau_2} \left( \frac{1}{2}\, g_{i\ell,k}\,\delta x^k\, u^i u^\ell - \frac{d(g_{i\ell}\, u^i)}{d\tau}\,\delta x^\ell \right) d\tau \\
&= -\int_{\tau_1}^{\tau_2} \left( \frac{1}{2}\, g_{i\ell,k}\,\delta x^k\, u^i u^\ell - g_{i\ell,k}\, u^k u^i\,\delta x^\ell - g_{i\ell}\,\frac{du^i}{d\tau}\,\delta x^\ell \right) d\tau. \qquad (13.187)
\end{aligned}
\]

Now interchanging the dummy indices \(\ell\) and k on the second and third terms, we have

\[
0 = -\int_{\tau_1}^{\tau_2} \left( \frac{1}{2}\, g_{i\ell,k}\, u^i u^\ell - g_{ik,\ell}\, u^i u^\ell - g_{ik}\,\frac{du^i}{d\tau} \right)\delta x^k\, d\tau \qquad (13.188)
\]

or since \(\delta x^k\) is arbitrary

\[
0 = \frac{1}{2}\, g_{i\ell,k}\, u^i u^\ell - g_{ik,\ell}\, u^i u^\ell - g_{ik}\,\frac{du^i}{d\tau}. \qquad (13.189)
\]

If we multiply this equation of motion by \(g^{rk}\) and note that \(g_{ik,\ell}\, u^i u^\ell = g_{\ell k,i}\, u^i u^\ell\), then we find

\[
0 = \frac{du^r}{d\tau} + \frac{1}{2}\, g^{rk}\left( g_{ik,\ell} + g_{\ell k,i} - g_{i\ell,k} \right) u^i u^\ell. \qquad (13.190)
\]

So using the symmetry \(g_{i\ell} = g_{\ell i}\) and the formula (13.87) for \(\Gamma^r{}_{i\ell}\), we get

\[
0 = \frac{du^r}{d\tau} + \Gamma^r{}_{i\ell}\, u^i u^\ell \qquad\text{or}\qquad 0 = \frac{d^2x^r}{d\tau^2} + \Gamma^r{}_{i\ell}\,\frac{dx^i}{d\tau}\frac{dx^\ell}{d\tau} \qquad (13.191)
\]

which is the geodesic equation. In empty space, particles fall along geodesics independently of their masses.

One gets the same geodesic equation from the simpler action principle

\[
0 = \delta\int_{\lambda_1}^{\lambda_2} g_{i\ell}(x)\,\frac{dx^i}{d\lambda}\frac{dx^\ell}{d\lambda}\, d\lambda \qquad\Longrightarrow\qquad 0 = \frac{d^2x^r}{d\lambda^2} + \Gamma^r{}_{i\ell}\,\frac{dx^i}{d\lambda}\frac{dx^\ell}{d\lambda}. \qquad (13.192)
\]

The right-hand side of the geodesic equation (13.191) is a contravariant vector because (Weinberg, 1972) under general coordinate transformations, the inhomogeneous terms arising from \(\ddot{x}^r\) cancel those from \(\Gamma^r{}_{i\ell}\,\dot{x}^i\dot{x}^\ell\). Here and often in what follows we'll use dots to mean proper-time derivatives.

The action for a particle of mass m and charge q in a gravitational field \(\Gamma^r{}_{i\ell}\) and an electromagnetic field \(A_i\) is

\[
S = -mc\int \left( -g_{i\ell}\, dx^i dx^\ell \right)^{1/2} + \frac{q}{c}\int_{\tau_1}^{\tau_2} A_i(x)\, dx^i \qquad (13.193)
\]

in which the interaction \(q\int A_i\, dx^i\) is invariant under general coordinate transformations. By (12.59 and 13.188), the first-order change in S is

\[
\delta S = m\int_{\tau_1}^{\tau_2} \left[ \frac{1}{2}\, g_{i\ell,k}\, u^i u^\ell - g_{ik,\ell}\, u^i u^\ell - g_{ik}\,\frac{du^i}{d\tau} + \frac{q}{mc}\left( A_{i,k} - A_{k,i} \right) u^i \right]\delta x^k\, d\tau \qquad (13.194)
\]

and so by combining the Lorentz force law (12.60) and the geodesic equation (13.191) and by writing \(F^{ri}\,\dot{x}_i\) as \(F^r{}_i\,\dot{x}^i\), we have

\[
0 = \frac{d^2x^r}{d\tau^2} + \Gamma^r{}_{i\ell}\,\frac{dx^i}{d\tau}\frac{dx^\ell}{d\tau} - \frac{q}{m}\, F^r{}_i\,\frac{dx^i}{d\tau} \qquad (13.195)
\]

as the equation of motion of a particle of mass m and charge q. It is striking how nearly perfect the electromagnetism of Faraday and Maxwell is.

The action of the electromagnetic field interacting with an electric current \(j^k\) in a gravitational field is

\[
S = \int \left( -\tfrac{1}{4}\, F_{k\ell}\, F^{k\ell} + \mu_0\, A_k\, j^k \right)\sqrt{g}\; d^4x \qquad (13.196)
\]

in which \(\sqrt{g}\, d^4x\) is the invariant volume element. After an integration by parts, the first-order change in the action is

\[
\delta S = \int \left[ -\frac{\partial}{\partial x^\ell}\left( \sqrt{g}\, F^{k\ell} \right) + \mu_0\, j^k\,\sqrt{g} \right]\delta A_k\; d^4x, \qquad (13.197)
\]

and so the inhomogeneous Maxwell equations in a gravitational field are

\[
\frac{\partial}{\partial x^\ell}\left( \sqrt{g}\, F^{k\ell} \right) = \mu_0\,\sqrt{g}\; j^k. \qquad (13.198)
\]

The action of a scalar field \(\phi\) of mass m in a gravitational field is

\[
S = \frac{1}{2}\int \left( -\phi_{,i}\, g^{ik}\,\phi_{,k} - m^2\phi^2 \right)\sqrt{g}\; d^4x. \qquad (13.199)
\]

After an integration by parts, the first-order change in the action is

\[
\delta S = \int \delta\phi\left[ \left( \sqrt{g}\, g^{ik}\,\phi_{,k} \right)_{,i} - m^2\sqrt{g}\,\phi \right] d^4x \qquad (13.200)
\]

which yields the equation of motion

\[
\left( \sqrt{g}\, g^{ik}\,\phi_{,k} \right)_{,i} - m^2\sqrt{g}\,\phi = 0. \qquad (13.201)
\]

The action of the gravitational field itself is a spacetime integral of the Riemann scalar (13.113) divided by Newton's constant

\[
S = \frac{c^3}{16\pi G}\int R\,\sqrt{g}\; d^4x. \qquad (13.202)
\]

Its variation leads to Einstein's equations (Section 13.35).
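A minimal symbolic check of the geodesic equation (13.191): on the unit 2-sphere, whose nonzero Christoffel symbols are \(\Gamma^\theta{}_{\phi\phi} = -\sin\theta\cos\theta\) and \(\Gamma^\phi{}_{\theta\phi} = \cot\theta\), uniform motion along the equator is a geodesic. The sympy sketch below (an illustration written for this text, not from the book) verifies both components:

```python
import sympy as sp

tau, w = sp.symbols('tau omega')
theta_p = sp.pi/2        # the equator of the unit sphere
phi_p = w*tau            # uniform angular motion, a great circle
# Geodesic equation (13.191), component by component, with the sphere's
# nonzero Christoffel symbols Gamma^theta_{phi phi} and Gamma^phi_{theta phi}:
geo_theta = (sp.diff(theta_p, tau, 2)
             - sp.sin(theta_p)*sp.cos(theta_p)*sp.diff(phi_p, tau)**2)
geo_phi = (sp.diff(phi_p, tau, 2)
           + 2*sp.cot(theta_p)*sp.diff(theta_p, tau)*sp.diff(phi_p, tau))
assert sp.simplify(geo_theta) == 0 and sp.simplify(geo_phi) == 0
```

A tilted great circle passes the same test after a rotation of coordinates, while a nonequatorial circle of constant latitude fails it.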

13.32 Equivalence Principle and Geodesic Equation

The principle of equivalence (Section 13.25) says that in any gravitational field, one may choose free-fall coordinates in which all physical laws take the same form as in special relativity without acceleration or gravitation – at least over a suitably small volume of spacetime. Within this volume and in these coordinates, things behave as they would at rest deep in empty space far from any matter or energy. The volume must be small enough so that the gravitational field is constant throughout it. Such free-fall coordinate systems are called local Lorentz frames and local inertial frames.

Example 13.22 (Elevators) When a modern elevator starts going down from a high floor, it accelerates downward at something less than the local acceleration of gravity. One feels less pressure on one's feet; one feels lighter. After accelerating downward for a few seconds, the elevator assumes a constant downward speed, and then one feels the normal pressure of one's weight on one's feet. The elevator seems to be slowing down for a stop, but actually it has just stopped accelerating downward.

What if the cable snapped, and a frightened passenger dropped his laptop? He could catch it very easily as it would not seem to fall because the elevator, the passenger, and the laptop would all fall at the same rate. The physics in the falling elevator would be the same as if the elevator were at rest in empty space far from any gravitational field. The laptop's clock would tick as fast as it would at rest in the absence of gravity, but to an observer on the ground it would appear slower. What if a passenger held an electric charge? Observers in the falling elevator would see a static electric field around the charge, but observers on the ground could detect radiation from the accelerating charge.
Example 13.23 (Proper time) If the events are the ticks of a clock, then the proper time \(d\tau\) between ticks is the time between the ticks of the clock at rest, or at speed zero if the clock is accelerating. The proper lifetime \(d\tau\) of an unstable particle is the average time it takes to decay at speed zero. In arbitrary coordinates, this proper lifetime is

\[
c^2\, d\tau^2 = -ds^2 = -g_{ik}(x)\, dx^i dx^k. \qquad (13.203)
\]

Example 13.24 (Clock hypothesis) The apparent lifetime of an unstable particle is independent of the acceleration of the particle even when the particle is subjected to centripetal accelerations of \(10^{19}\,\mathrm{m/s^2}\) (Bailey et al., 1977) and to longitudinal accelerations of \(10^{16}\,\mathrm{m/s^2}\) (Roos et al., 1980).

The transformation from arbitrary coordinates \(x^k\) to free-fall coordinates \(y^i\) changes the metric \(g_{j\ell}\) to the diagonal metric \(\eta_{ik}\) of flat spacetime \(\eta = \mathrm{diag}(-1, 1, 1, 1)\), which has two indices and is not a Levi-Civita tensor. Algebraically, this transformation is a congruence (1.353)

\[
\eta_{ik} = \frac{\partial x^j}{\partial y^i}\; g_{j\ell}\;\frac{\partial x^\ell}{\partial y^k}. \qquad (13.204)
\]

The geodesic equation (13.191) follows from the principle of equivalence (Weinberg, 1972; Hobson et al., 2006). Suppose a particle is moving under the influence of gravitation alone. Then one may choose free-fall coordinates y(x) so that the particle obeys the force-free equation of motion

\[
\frac{d^2y^i}{d\tau^2} = 0 \qquad (13.205)
\]

with \(d\tau\) the proper time, \(d\tau^2 = -\eta_{ik}\, dy^i dy^k/c^2\). The chain rule applied to \(y^i(x)\) in (13.205) gives

\[
0 = \frac{d}{d\tau}\left( \frac{\partial y^i}{\partial x^k}\,\frac{dx^k}{d\tau} \right)
= \frac{\partial y^i}{\partial x^k}\,\frac{d^2x^k}{d\tau^2} + \frac{\partial^2 y^i}{\partial x^k\,\partial x^\ell}\,\frac{dx^k}{d\tau}\frac{dx^\ell}{d\tau}. \qquad (13.206)
\]

We multiply by \(\partial x^m/\partial y^i\) and use the identity

\[
\frac{\partial x^m}{\partial y^i}\,\frac{\partial y^i}{\partial x^k} = \delta^m_k \qquad (13.207)
\]

to write the equation of motion (13.205) in the x-coordinates

\[
\frac{d^2x^m}{d\tau^2} + \Gamma^m{}_{k\ell}\,\frac{dx^k}{d\tau}\frac{dx^\ell}{d\tau} = 0. \qquad (13.208)
\]

This is the geodesic equation (13.191) in which the affine connection is

\[
\Gamma^m{}_{k\ell} = \frac{\partial x^m}{\partial y^i}\,\frac{\partial^2 y^i}{\partial x^k\,\partial x^\ell}. \qquad (13.209)
\]
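Formula (13.209) can be tested in the flat plane, where the Cartesian coordinates play the role of the free-fall coordinates y and polar coordinates play the role of x. This sympy sketch (an illustration for this text) recovers the familiar polar-coordinate connections:

```python
import sympy as sp

r, p = sp.symbols('r phi', positive=True)
x = [r, p]
y = [r*sp.cos(p), r*sp.sin(p)]   # "free-fall" Cartesian coordinates of the flat plane
Jy = sp.Matrix(2, 2, lambda i, k: sp.diff(y[i], x[k]))   # dy^i/dx^k
Jx = sp.simplify(Jy.inv())                                # dx^m/dy^i
# (13.209): Gamma^m_{kl} = (dx^m/dy^i) d^2 y^i/(dx^k dx^l)
Gam = lambda m, k, l: sp.simplify(sum(Jx[m, i]*sp.diff(y[i], x[k], x[l])
                                      for i in range(2)))
assert sp.simplify(Gam(0, 1, 1) + r) == 0    # Gamma^r_{phi phi} = -r
assert sp.simplify(Gam(1, 0, 1) - 1/r) == 0  # Gamma^phi_{r phi} = 1/r
```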

13.33 Weak Static Gravitational Fields

Newton's equations describe slow motion in a weak static gravitational field. Because the motion is slow, we neglect \(u^i\) compared to \(u^0\) and simplify the geodesic equation (13.191) to

\[
0 = \frac{du^r}{d\tau} + \Gamma^r{}_{00}\,(u^0)^2. \qquad (13.210)
\]

Because the gravitational field is static, we neglect the time derivatives \(g_{k0,0}\) and \(g_{0k,0}\) in the connection formula (13.87) and find for \(\Gamma^r{}_{00}\)

\[
\Gamma^r{}_{00} = \frac{1}{2}\, g^{rk}\left( g_{0k,0} + g_{0k,0} - g_{00,k} \right) = -\frac{1}{2}\, g^{rk}\, g_{00,k} \qquad (13.211)
\]

with \(\Gamma^0{}_{00} = 0\). Because the field is weak, the metric can differ from \(\eta_{ij}\) by only a tiny tensor, \(g_{ij} = \eta_{ij} + h_{ij}\), so that to first order in \(|h_{ij}| \ll 1\) we have \(\Gamma^r{}_{00} = -\frac{1}{2}\, h_{00,r}\) for r = 1, 2, 3. With these simplifications, the geodesic equation (13.191) reduces to

\[
\frac{d^2x^r}{d\tau^2} = \frac{1}{2}\,(u^0)^2\, h_{00,r} \qquad\text{or}\qquad \frac{d^2x^r}{d\tau^2} = \frac{1}{2}\left( \frac{dx^0}{d\tau} \right)^2 h_{00,r}. \qquad (13.212)
\]

So for slow motion, the ordinary acceleration is described by Newton's law

\[
\frac{d^2\mathbf{x}}{dt^2} = \frac{c^2}{2}\,\nabla h_{00}. \qquad (13.213)
\]

If \(\phi\) is Newton's potential, then for slow motion in weak static fields

\[
g_{00} = -1 + h_{00} = -1 - 2\phi/c^2 \qquad\text{and so}\qquad h_{00} = -2\phi/c^2. \qquad (13.214)
\]

Thus, if the particle is at a distance r from a mass M, then \(\phi = -GM/r\) and \(h_{00} = -2\phi/c^2 = 2GM/rc^2\), and so

\[
\frac{d^2\mathbf{x}}{dt^2} = -\nabla\phi = \nabla\frac{GM}{r} = -GM\,\frac{\mathbf{r}}{r^3}. \qquad (13.215)
\]

How weak are the static gravitational fields we know about? The dimensionless ratio \(\phi/c^2\) is \(10^{-39}\) on the surface of a proton, \(10^{-9}\) on the Earth, \(10^{-6}\) on the surface of the sun, and \(10^{-4}\) on the surface of a white dwarf.

13.34 Gravitational Time Dilation

The proper time (Example 13.23) interval \(d\tau\) of a clock at rest in the weak, static gravitational field (13.210–13.215) satisfies equation (13.203)

\[
c^2\, d\tau^2 = -ds^2 = -g_{ik}(x)\, dx^i dx^k = -g_{00}\, c^2 dt^2 = \left( 1 + 2\phi/c^2 \right) c^2 dt^2. \qquad (13.216)
\]

So if two clocks are at rest at distances r and r + h from the center of the Earth, then the times \(dt_r\) and \(dt_{r+h}\) between ticks of the clock at r and the one at r + h are related by the proper time \(d\tau\) of the ticks of the clock

\[
\left( c^2 + 2\phi(r) \right) dt_r^2 = c^2\, d\tau^2 = \left( c^2 + 2\phi(r+h) \right) dt_{r+h}^2. \qquad (13.217)
\]

Since \(\phi(r) = -GM/r\), the potential at r + h is \(\phi(r+h) \approx \phi(r) + gh\), and so the ratio of the tick time \(dt_r\) of the lower clock at r to the tick time of the upper clock at r + h is

\[
\frac{dt_r}{dt_{r+h}} = \frac{\sqrt{c^2 + 2\phi(r+h)}}{\sqrt{c^2 + 2\phi(r)}} = \frac{\sqrt{c^2 + 2\phi(r) + 2gh}}{\sqrt{c^2 + 2\phi(r)}} \approx 1 + \frac{gh}{c^2}. \qquad (13.218)
\]

The lower clock is slower.


Example 13.25 (Pound and Rebka) Pound and Rebka in 1960 used the Mössbauer effect to measure the blue shift of light falling down a 22.6 m shaft. They found \((\nu_\ell - \nu_u)/\nu = gh/c^2 = 2.46\times10^{-15}\) (Robert Pound 1919–2010, Glen Rebka 1931–2015, media.physics.harvard.edu/video/?id=LOEB_POUND_092591.flv).

Example 13.26 (Redshift of the Sun) A photon emitted with frequency \(\nu_0\) at a distance r from a mass M would be observed at spatial infinity to have frequency \(\nu = \nu_0\sqrt{-g_{00}} = \nu_0\sqrt{1 - 2MG/c^2 r}\) for a redshift of \(\Delta\nu = \nu_0 - \nu\). Since the Sun's dimensionless potential \(\phi_\odot/c^2\) is \(-MG/c^2 r = -2.12\times10^{-6}\) at its surface, sunlight is shifted to the red by 2 parts per million.
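The numbers in Examples 13.25 and 13.26 are quick to reproduce; this sketch uses approximate standard constants (my assumed values, not taken from the book):

```python
# Fractional frequency shifts of (13.218) and Example 13.26,
# with rough standard values for g, c, and the Sun's mass and radius.
c = 2.998e8                       # speed of light, m/s
g, h = 9.81, 22.6                 # surface gravity, m/s^2; Pound-Rebka shaft, m
pound_rebka = g*h/c**2            # blue shift of light falling down the shaft
GM_sun, R_sun = 1.327e20, 6.957e8 # solar GM (m^3/s^2) and radius (m)
solar = GM_sun/(R_sun*c**2)       # magnitude of the Sun's surface potential / c^2
print(pound_rebka, solar)         # about 2.46e-15 and 2.12e-6
```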

13.35 Einstein’s Equations If we make an action that is a scalar, invariant under general coordinate transformations, and then apply to it the principle of stationary action, we will get tensor field equations that are invariant under general coordinate transformations. If the metric of spacetime is among the fields of the action, then the resulting theory will be a possible theory of gravity. If we make the action as simple as possible, it will be Einstein’s theory. To make the action of the gravitational field, we need a scalar. Apart from the √ √ scalar g d 4 x = g c dt d 3 x, where g = | det (g ik )|, the simplest scalar we can form from the metric tensor and its first and second derivatives is the scalar curvature R which gives us the Einstein–Hilbert action SE H

=

c3 16π G

¾ √ R g d4x =

c3 16π G

¾

gik R ik

√g d4 x

(13.219)

in which G = 6.7087 × 10−39 ±c (GeV/c 2 )−2 = 6 .6742 × 10−11 m3 kg−1 s −2 is Newton’s constant. If δ g ik (x ) is a tiny local change in the inverse metric, then the rule δ det A = det A Tr( A−1 δ A ) (1.228), valid for any nonsingular, nondefective matrix A, together with the identity 0 = δ( g ik gk ± ) = δ g ik gk ± + g ik δ gk ± and the notation g for the metric tensor g j ± considered as a matrix imply that

√ δ g =

det g

√ δ det g = 2g g

(det g )2 g ik δ gik

√ 2g g

= − 12 √g gik δ gik .

(13.220)

So the first-order change in the action density is δ

(

g ik R ik

√ g) = R √g δ gik + gik R δ √g + gik √g δ R ik ³ik 1 ´ √ ik = Rik − 2 R gik g δ gik + gik √g δ Rik .

(13.221)

13.35 Einstein’s Equations

505

The product g ik δ Rik is a scalar, so we can evaluate it in any coordinate system. In a local inertial frame, where ³abc = 0 and gde is constant, this invariant variation of the Ricci tensor (13.112) is g ik δ Rik

( ) ) ( = gik δ ³nin k − ³nik n = gik ∂k δ³nin − ∂n δ³nik ( ) = gik ∂k δ³ nin − gin ∂k δ³ kin = ∂k gik δ³nin − gin δ³kin . ,

,

(13.222)

The transformation law (13.66) for the affine connection shows that the variations δ³ nin and δ³ kin are tensors although the connections themselves aren’t. Thus we can evaluate this invariant variation of the Ricci tensor in any coordinate system by replacing the derivatives with covariant ones getting g ik δ R ik

=

( gik δ³n − gin δ³k ) in in ;k

(13.223)

which we recognize as the covariant divergence (13.180) of a contravariant vector. The last term in the first-order change (13.221) in the action density is therefore a surface term whose variation vanishes for tiny local changes δg ik of the metric

√ g gik δ R = ¹√g (gik δ³n − gin δ³k )º . ik in in ,k

Hence the variation of S_{EH} is simply

\delta S_{EH} = \frac{c^3}{16\pi G}\int \Big(R_{ik} - \tfrac{1}{2}\, g_{ik}\, R\Big)\sqrt{g}\;\delta g^{ik}\; d^4x.   (13.225)

The principle of least action, \delta S_{EH} = 0, now gives us Einstein's equations for empty space:

R_{ik} - \tfrac{1}{2}\, g_{ik}\, R = 0.   (13.226)

The tensor G_{ik} = R_{ik} - \tfrac{1}{2} g_{ik} R is Einstein's tensor. Taking the trace of Einstein's equations (13.226), we find that the scalar curvature R and the Ricci tensor R_{ik} are zero in empty space:

R = 0 \quad\text{and}\quad R_{ik} = 0.   (13.227)

The energy–momentum tensor T_{ik} is the source of the gravitational field. It is defined so that the change in the action of the matter fields due to a tiny local change \delta g^{ik}(x) in the metric is

\delta S_m = -\frac{1}{2c}\int T_{ik}\,\sqrt{g}\;\delta g^{ik}\; d^4x = \frac{1}{2c}\int T^{ik}\,\sqrt{g}\;\delta g_{ik}\; d^4x   (13.228)

in which the identity \delta g^{ik} = -g^{ij}\, g^{\ell k}\,\delta g_{j\ell} explains the sign change. Now the principle of least action, \delta S = \delta S_{EH} + \delta S_m = 0, yields Einstein's equations in the presence of matter and energy:

R_{ik} - \tfrac{1}{2}\, g_{ik}\, R = \frac{8\pi G}{c^4}\, T_{ik}.   (13.229)

13 General Relativity

Taking the trace of both sides (since g^{ik} g_{ik} = 4, the trace of the left-hand side is R - 2R = -R), we get

R = -\frac{8\pi G}{c^4}\, T \quad\text{and}\quad R_{ik} = \frac{8\pi G}{c^4}\Big(T_{ik} - \frac{T}{2}\, g_{ik}\Big).   (13.230)

13.36 Energy–Momentum Tensor

The action S_m of the matter fields \phi_i is a scalar that is invariant under general coordinate transformations. In particular, a tiny local general coordinate transformation x'^a = x^a + \epsilon^a(x) leaves S_m invariant:

0 = \delta S_m = \delta\int L(\phi_i(x))\,\sqrt{g(x)}\; d^4x.   (13.231)

The vanishing change \delta S_m = \delta S_{m\phi} + \delta S_{mg} has a part \delta S_{m\phi} due to the changes in the fields \delta\phi_i(x) and a part \delta S_{mg} due to the change in the metric \delta g_{ik}. The principle of stationary action tells us that the change \delta S_{m\phi} is zero as long as the fields obey the classical equations of motion. Combining the result \delta S_{m\phi} = 0 with the definition (13.228) of the energy–momentum tensor, we find

0 = \delta S_m = \delta S_{m\phi} + \delta S_{mg} = \delta S_{mg} = \frac{1}{2c}\int T^{ik}\,\sqrt{g}\;\delta g_{ik}\; d^4x.   (13.232)

Since x^a = x'^a - \epsilon^a(x), the transformation law for rank-2 covariant tensors (13.35) gives us

\delta g_{ik} = g'_{ik}(x) - g_{ik}(x) = g'_{ik}(x') - g_{ik}(x) - \big(g'_{ik}(x') - g'_{ik}(x)\big)
= (\delta^a_i - \epsilon^a_{,i})(\delta^b_k - \epsilon^b_{,k})\, g_{ab} - g_{ik} - \epsilon^c\, g_{ik,c}
= -g_{ib}\,\epsilon^b_{,k} - g_{ak}\,\epsilon^a_{,i} - \epsilon^c\, g_{ik,c}   (13.233)
= -g_{ib}\,(g^{bc}\epsilon_c)_{,k} - g_{ak}\,(g^{ac}\epsilon_c)_{,i} - \epsilon^c\, g_{ik,c}
= -\epsilon_{i,k} - \epsilon_{k,i} - \epsilon_c\, g_{ib}\, g^{bc}{}_{,k} - \epsilon_c\, g_{ak}\, g^{ac}{}_{,i} - \epsilon^c\, g_{ik,c}.

Now using the identity \partial_i(g^{bk} g_{k\ell}) = 0, the definition (13.68) of the covariant derivative of a covariant vector, and the formula (13.87) for the connection in terms of the metric, we find to lowest order in the change \epsilon^a(x) in x^a that the change in the metric is

\delta g_{ik} = -\epsilon_{i,k} - \epsilon_{k,i} + \epsilon_c\, g^{bc}\, g_{ib,k} + \epsilon_c\, g^{ac}\, g_{ak,i} - \epsilon^c\, g_{ik,c}
= -\epsilon_{i,k} - \epsilon_{k,i} + \epsilon_c\, g^{ac}\big(g_{ia,k} + g_{ak,i} - g_{ik,a}\big)
= -\epsilon_{i,k} - \epsilon_{k,i} + \epsilon_c\,\Gamma^c_{ik} + \epsilon_c\,\Gamma^c_{ki} = -\epsilon_{i;k} - \epsilon_{k;i}.   (13.234)

Combining this result (13.234) with the vanishing (13.232) of the change \delta S_{mg}, we have

0 = \int T^{ik}\,\sqrt{g}\,\big(\epsilon_{i;k} + \epsilon_{k;i}\big)\; d^4x.   (13.235)


Since the energy–momentum tensor is symmetric, we may combine the two terms, integrate by parts, divide by \sqrt{g}, and so find that the covariant divergence of the energy–momentum tensor is zero,

0 = T^{ik}{}_{;k} = T^{ik}{}_{,k} + \Gamma^k_{ak}\, T^{ia} + \Gamma^i_{ak}\, T^{ak} = \frac{1}{\sqrt{g}}\big(\sqrt{g}\, T^{ik}\big)_{,k} + \Gamma^i_{ak}\, T^{ak}   (13.236)

when the fields obey their equations of motion. In a given inertial frame, only the total energy, momentum, and angular momentum of both the matter and the gravitational field are conserved.

13.37 Perfect Fluids

In many cosmological models, the energy–momentum tensor is assumed to be that of a perfect fluid, which is isotropic in its rest frame, does not conduct heat, and has zero viscosity. The energy–momentum tensor T_{ij} of a perfect fluid moving with 4-velocity u^i (12.20) is

T_{ij} = p\, g_{ij} + \Big(\frac{p}{c^2} + \rho\Big)\, u_i u_j   (13.237)

in which p and \rho are the pressure and mass density of the fluid in its rest frame and g_{ij} is the spacetime metric. Einstein's equations (13.229) then are

R_{ij} - \tfrac{1}{2}\, g_{ij}\, R = \frac{8\pi G}{c^4}\, T_{ij} = \frac{8\pi G}{c^4}\Big[\, p\, g_{ij} + \Big(\frac{p}{c^2} + \rho\Big)\, u_i u_j \Big].   (13.238)

An important special case is the energy–momentum tensor due to a nonzero value of the energy density of the vacuum. In this case p = -c^2\rho, and the energy–momentum tensor is

T_{ij} = p\, g_{ij} = -c^2\rho\, g_{ij}   (13.239)

in which T_{00} = c^2\rho is the energy density of the ground state of the theory. This energy density c^2\rho is a plausible candidate for the dark-energy density. It is equivalent to a cosmological constant \Lambda = 8\pi G\rho.

On small scales, such as that of our solar system, one may neglect matter and dark energy. So in empty space and on small scales, the energy–momentum tensor vanishes, T_{ij} = 0, along with its trace and the scalar curvature, T = 0 = R, and Einstein's equations (13.230) are

R_{ij} = 0.   (13.240)


13.38 Gravitational Waves

The nonlinear properties of Einstein's equations (13.229–13.230) are important on large scales of distance (Sections 13.42 and 13.43) and near great masses (Sections 13.39 and 13.40). But throughout most of the mature universe, it is helpful to linearize them by writing the metric as the metric \eta_{ik} of empty, flat spacetime (12.3) plus a tiny deviation h_{ik}:

g_{ik} = \eta_{ik} + h_{ik}.   (13.241)

To first order in h_{ik}, the affine connection (13.87) is

\Gamma^k_{\ i\ell} = \tfrac{1}{2}\, g^{kj}\big(g_{ji,\ell} + g_{j\ell,i} - g_{i\ell,j}\big) = \tfrac{1}{2}\,\eta^{kj}\big(h_{ji,\ell} + h_{j\ell,i} - h_{i\ell,j}\big)   (13.242)

and the Ricci tensor (13.112) is the contraction

R_{i\ell} = R^k{}_{ik\ell} = [\partial_k + \Gamma_k,\ \partial_\ell + \Gamma_\ell]^k{}_i = \Gamma^k_{\ \ell i,k} - \Gamma^k_{\ k i,\ell}.   (13.243)

Since \Gamma^k_{\ \ell i} = \Gamma^k_{\ i\ell} and h_{ik} = h_{ki}, the linearized Ricci tensor is

R_{i\ell} = \tfrac{1}{2}\,\eta^{kj}\big(h_{ji,\ell} + h_{j\ell,i} - h_{i\ell,j}\big)_{,k} - \tfrac{1}{2}\,\eta^{kj}\big(h_{ji,k} + h_{jk,i} - h_{ik,j}\big)_{,\ell}
= \tfrac{1}{2}\big(h^k{}_{\ell,ik} + h_i{}^k{}_{,k\ell} - h_{i\ell,k}{}^{,k} - h^k{}_{k,i\ell}\big).   (13.244)

We can simplify Einstein's equations (13.230) in empty space, R_{i\ell} = 0, by using coordinates in which h_{ik} obeys (Exercise 13.17) de Donder's harmonic gauge condition h^i{}_{k,i} = \tfrac{1}{2}\big(\eta^{j\ell} h_{j\ell}\big)_{,k} \equiv \tfrac{1}{2}\, h_{,k}. In this gauge, the linearized Einstein equations in empty space are

R_{i\ell} = -\tfrac{1}{2}\, h_{i\ell,k}{}^{,k} = 0 \quad\text{or}\quad \big(c^2\nabla^2 - \partial_0^2\big)\, h_{i\ell} = 0.   (13.245)

On 14 September 2015, the LIGO collaboration detected the merger of two black holes of 29 and 36 solar masses, which liberated 3 M_\odot c^2 of energy. They have set an upper limit of c^2 m_g < 2\times10^{-25} eV on the mass of the graviton, have detected 10 black-hole mergers, and are expected to detect a new black-hole merger every week in late 2019.

13.39 Schwarzschild's Solution

In 1916, Schwarzschild solved Einstein's field equations (13.240) in empty space, R_{ij} = 0, outside a static mass M and found as the metric

ds^2 = -\Big(1 - \frac{2MG}{c^2 r}\Big)\, c^2\, dt^2 + \Big(1 - \frac{2MG}{c^2 r}\Big)^{-1} dr^2 + r^2\, d\Omega^2   (13.246)

in which d\Omega^2 = d\theta^2 + \sin^2\theta\, d\phi^2.
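A symbolic check that this metric satisfies R_{ik} = 0 can be sketched in Python with sympy. This is a rough analog of the Mathematica scripts mentioned below, not the book's own code, and it sets G = c = 1 so that the Schwarzschild radius is r_S = 2M:

```python
# Symbolic check that the Schwarzschild metric (13.246) solves R_ik = 0.
# Units with G = c = 1, so the Schwarzschild radius is r_S = 2M.
import sympy as sp

t, r, th, ph, M = sp.symbols('t r theta phi M', positive=True)
x = [t, r, th, ph]
f = 1 - 2*M/r
g = sp.diag(-f, 1/f, r**2, r**2*sp.sin(th)**2)
ginv = g.inv()

# Affine connection (13.87): Gamma^a_bc = (1/2) g^{ad} (g_{db,c} + g_{dc,b} - g_{bc,d})
Gamma = [[[sum(ginv[a, d]*(sp.diff(g[d, b], x[c]) + sp.diff(g[d, c], x[b])
                           - sp.diff(g[b, c], x[d])) for d in range(4))/2
           for c in range(4)] for b in range(4)] for a in range(4)]

# Ricci tensor (13.112): R_bd = R^a_{bad}
def ricci(b, d):
    return sp.simplify(sum(
        sp.diff(Gamma[a][b][d], x[a]) - sp.diff(Gamma[a][b][a], x[d])
        + sum(Gamma[a][a][e]*Gamma[e][b][d] - Gamma[a][d][e]*Gamma[e][b][a]
              for e in range(4))
        for a in range(4)))

# Every component vanishes: the metric solves Einstein's empty-space equations.
assert all(ricci(b, d) == 0 for b in range(4) for d in range(4))
print("Schwarzschild: R_ik = 0")
```

The same loop structure works for any diagonal metric; only the line defining g changes.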


The Mathematica scripts GREAT.m and Schwarzschild.nb give for the Ricci tensor and the scalar curvature R_{ik} = 0 and R = 0, which show that the metric obeys Einstein's equations and that the singularity in

g_{rr} = \Big(1 - \frac{2MG}{c^2 r}\Big)^{-1}   (13.247)

at the Schwarzschild radius r_S = 2MG/c^2 is an artifact of the coordinates. Schwarzschild.nb also computes the affine connections. The Schwarzschild radius of the Sun, r_S = 2M_\odot G/c^2 = 2.95 km, is far less than the Sun's radius, r_\odot = 6.955\times10^5 km, beyond which his metric applies (Karl Schwarzschild, 1873–1916).

13.40 Black Holes

Suppose an uncharged, spherically symmetric star of mass M has collapsed within a sphere of radius r_b less than its Schwarzschild radius or horizon r_h = r_S = 2MG/c^2. Then for r > r_b, the Schwarzschild metric (13.246) is correct. The time dt measured on a clock outside the gravitational field is related to the proper time d\tau on a clock fixed at r \geq 2MG/c^2 by (13.216):

dt = \frac{d\tau}{\sqrt{-g_{00}}} = \frac{d\tau}{\sqrt{1 - 2MG/(c^2 r)}}.   (13.248)

The time dt measured away from the star becomes infinite as r approaches the horizon r_h = r_S = 2MG/c^2. To outside observers, a clock at the horizon r_h seems frozen in time.

Due to the gravitational redshift (13.248), light of frequency \nu_p emitted at r \geq 2MG/c^2 will have frequency

\nu = \nu_p\,\sqrt{-g_{00}} = \nu_p\,\sqrt{1 - \frac{2MG}{c^2 r}}   (13.249)

when observed at great distances. Light coming from the surface at r_S = 2MG/c^2 is redshifted to zero frequency, \nu = 0. The star is black. It is a black hole with a horizon at its Schwarzschild radius r_h = r_S = 2MG/c^2, although there is no singularity there. If the radius of the Sun were less than its Schwarzschild radius of 2.95 km, then the Sun would be a black hole. The radius of the Sun is 6.955\times10^5 km.

Black holes are not black. They often are surrounded by bright hot accretion disks, and Stephen Hawking (1942–2018) showed (Hawking, 1975) that the intense gravitational field of a black hole of mass M radiates at a temperature

T = \frac{\hbar c^3}{8\pi k\, G M} = \frac{\hbar c}{4\pi k\, r_h} = \frac{\hbar g}{2\pi k c}   (13.250)


in which k = 8.617\times10^{-5} eV K^{-1} is Boltzmann's constant, \hbar = h/(2\pi) is Planck's constant h = 6.626\times10^{-34} J s divided by 2\pi, and g = GM/r_h^2 is the gravitational acceleration at r = r_h. More generally, a detector in vacuum subject to a uniform acceleration a (in its instantaneous rest frame) sees a temperature T = \hbar a/(2\pi k c) (Alsing and Milonni, 2004).

In a region of empty space where the pressure p and the chemical potentials \mu_j all vanish, the change (7.111) in the internal energy U = c^2 M of a black hole of mass M is c^2\, dM = T\, dS, where S is its entropy. So the change dS in the entropy of a black hole of mass M = c^2 r_h/(2G) and temperature T = \hbar c/(4\pi k r_h) (13.250) is

dS = \frac{c^2\, dM}{T} = \frac{4\pi k\, r_h}{\hbar c}\; c^2\, dM = \frac{4\pi k\, r_h\, c}{\hbar}\;\frac{c^2\, dr_h}{2G} = \frac{2\pi k c^3}{G\hbar}\; r_h\, dr_h.   (13.251)

Integrating, we get a formula for the entropy of a black hole in terms of its area (Bekenstein, 1973; Hawking, 1975):

S = \frac{\pi c^3 k}{G\hbar}\; r_h^2 = \frac{c^3 k}{G\hbar}\;\frac{A}{4}   (13.252)

where A = 4\pi r_h^2 is the area of the horizon of the black hole.

A black hole is entirely converted into radiation after a time

t = \frac{5120\,\pi\, G^2}{\hbar\, c^4}\; M^3   (13.253)

proportional to the cube of its mass M.
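These formulas are easy to evaluate numerically. The short Python sketch below uses hand-typed SI constants (so treat the last digits as approximate) to compute the Schwarzschild radius (13.246), Hawking temperature (13.250), and evaporation time (13.253) of a solar-mass black hole:

```python
import math

# Physical constants (SI); typed in by hand, so last digits are approximate.
G = 6.674e-11        # Newton's constant, m^3 kg^-1 s^-2
c = 2.998e8          # speed of light, m/s
hbar = 1.055e-34     # Planck's constant over 2*pi, J s
k = 1.381e-23        # Boltzmann's constant, J/K
M_sun = 1.989e30     # solar mass, kg

r_S = 2 * G * M_sun / c**2                                  # (13.246)
T_H = hbar * c**3 / (8 * math.pi * k * G * M_sun)           # (13.250)
t_evap = 5120 * math.pi * G**2 * M_sun**3 / (hbar * c**4)   # (13.253)

print(f"r_S    = {r_S/1000:.2f} km")        # ~2.95 km
print(f"T_H    = {T_H:.2e} K")              # ~6e-8 K
print(f"t_evap = {t_evap/3.156e7:.1e} yr")  # ~2e67 yr
```

A solar-mass black hole is far colder than the 2.7 K microwave background, so today it would absorb more radiation than it emits.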

13.41 Rotating Black Holes

A half-century after Einstein invented general relativity, Roy Kerr invented the metric for a mass M rotating with angular momentum J = GMa/c. Two years later, Newman and others generalized the Kerr metric to one of charge Q. In Boyer–Lindquist coordinates, its line element is

ds^2 = -\frac{\Delta}{\rho^2}\big(dt - a\sin^2\theta\, d\phi\big)^2 + \frac{\sin^2\theta}{\rho^2}\big[(r^2 + a^2)\, d\phi - a\, dt\big]^2 + \frac{\rho^2}{\Delta}\, dr^2 + \rho^2\, d\theta^2   (13.254)

where \rho^2 = r^2 + a^2\cos^2\theta and \Delta = r^2 + a^2 - 2GMr/c^2 + Q^2. The Mathematica script Kerr_black_hole.nb shows that the Kerr–Newman metric for the uncharged case, Q = 0, has R_{ik} = 0 and R = 0 and so is a solution of Einstein's equations in empty space (13.230) with zero scalar curvature.

A rotating mass drags nearby masses along with it. The daily rotation of the Earth moves satellites to the East by tens of meters per year. The frame dragging of extremal black holes, with J = GM^2/c, approaches the speed of light (Ghosh et al., 2018) (Roy Kerr, 1934–).

+ r 2 d µ2 (13.255) = 1 − dr 2 2 k r /L in which k = 1 for the sphere, k = 0 for euclidian space, and k = − 1 for the hyperds 2

boloid. The Friedmann–Lemaître–Robinson–Walker (FLRW) cosmologies add to these spatially symmetric line elements a dimensionless scale factora (t ) that describes the expansion (or contraction) of space ds

2

= − c dt + a (t ) 2

2

2

³

dr 2 1 − k r 2 /L 2

+r

2



2

+r

The FLRW metric is

⎛ − c2 ⎜ 0 gik (t , r, θ , φ) = ⎜ ⎝ 0 0

0 2 a /(1 − k r 2/ L 2 ) 0 0

0 0 2 2 a r 0

2

sin

2

θ



2

´

.

(13.256)



0 ⎟⎟ 0 ⎠. 0 a 2 r 2 sin 2 θ

(13.257)

The constant k determines whether the spatial universe is open k = −1, flat k = 0, or closed k = 1. The coordinates x 0, x 1, x 2 , x 3 ≡ t , r , θ , φ are comoving in that a detector at rest at r , θ , φ records the CMB as isotropic with no Doppler shift.


The metric (13.257) is diagonal; its inverse g^{ik} also is diagonal:

g^{ik} = \begin{pmatrix} -c^{-2} & 0 & 0 & 0 \\ 0 & (1 - k r^2/L^2)/a^2 & 0 & 0 \\ 0 & 0 & (a r)^{-2} & 0 \\ 0 & 0 & 0 & (a r \sin\theta)^{-2} \end{pmatrix}.   (13.258)

One may use the formula (13.87) to compute the affine connection in terms of the metric and its inverse as \Gamma^k_{\ i\ell} = \tfrac{1}{2} g^{kj}(g_{ji,\ell} + g_{j\ell,i} - g_{\ell i,j}). It usually is easier, however, to use the action principle (13.192) to derive the geodesic equation directly and then to read off its expressions for the \Gamma^i_{\ jk}'s. So we require that the integral

0 = \delta\int \Big( -c^2\, t'^2 + \frac{a^2\, r'^2}{1 - k r^2/L^2} + a^2 r^2\,\theta'^2 + a^2 r^2 \sin^2\theta\;\phi'^2 \Big)\, d\lambda,   (13.259)

in which a prime means derivative with respect to \lambda, be stationary with respect to the tiny variations \delta t(\lambda), \delta r(\lambda), \delta\theta(\lambda), and \delta\phi(\lambda).

By varying t(\lambda), we get the equation

0 = t'' + \frac{a\dot a}{c^2}\Big(\frac{r'^2}{1 - k r^2/L^2} + r^2\,\theta'^2 + r^2\sin^2\theta\;\phi'^2\Big) = t'' + \Gamma^t_{\ ik}\, x'^i x'^k   (13.260)

which tells us that the nonzero \Gamma^t_{\ jk}'s are

\Gamma^t_{\ rr} = \frac{a\dot a}{c^2\,(1 - k r^2/L^2)}, \qquad \Gamma^t_{\ \theta\theta} = \frac{a\dot a\, r^2}{c^2}, \qquad \Gamma^t_{\ \phi\phi} = \frac{a\dot a\, r^2 \sin^2\theta}{c^2}.   (13.261)

By varying r(\lambda) we get (with more effort)

0 = r'' + \frac{k r\, r'^2/L^2}{1 - k r^2/L^2} + 2\,\frac{\dot a}{a}\, t' r' - r\Big(1 - \frac{k r^2}{L^2}\Big)\big(\theta'^2 + \sin^2\theta\;\phi'^2\big).   (13.262)

So we find that

\Gamma^r_{\ tr} = \frac{\dot a}{a}, \qquad \Gamma^r_{\ rr} = \frac{k r}{L^2 - k r^2}, \qquad \Gamma^r_{\ \theta\theta} = -r + \frac{k r^3}{L^2}, \qquad \Gamma^r_{\ \phi\phi} = \sin^2\theta\;\Gamma^r_{\ \theta\theta}.   (13.263)

Varying \theta(\lambda) gives

0 = \theta'' + 2\,\frac{\dot a}{a}\, t'\theta' + \frac{2}{r}\,\theta' r' - \sin\theta\cos\theta\;\phi'^2 \quad\text{and}\quad \Gamma^\theta_{\ t\theta} = \frac{\dot a}{a}, \quad \Gamma^\theta_{\ r\theta} = \frac{1}{r}, \quad \Gamma^\theta_{\ \phi\phi} = -\sin\theta\cos\theta.   (13.264)

Finally, varying \phi(\lambda) gives

0 = \phi'' + 2\,\frac{\dot a}{a}\, t'\phi' + \frac{2}{r}\, r'\phi' + 2\cot\theta\;\theta'\phi' \quad\text{and}\quad \Gamma^\phi_{\ t\phi} = \frac{\dot a}{a}, \quad \Gamma^\phi_{\ r\phi} = \frac{1}{r}, \quad \Gamma^\phi_{\ \theta\phi} = \cot\theta.   (13.265)


Other \Gamma's are either zero or related by the symmetry \Gamma^k_{\ i\ell} = \Gamma^k_{\ \ell i}. Our formulas for the Ricci (13.112) and curvature (13.103) tensors give

R_{00} = R^k{}_{0k0} = [D_k, D_0]^k{}_0 = [\partial_k + \Gamma_k,\ \partial_0 + \Gamma_0]^k{}_0.   (13.266)

Because [D_0, D_0] = 0, we need only compute [D_1, D_0]^1{}_0, [D_2, D_0]^2{}_0, and [D_3, D_0]^3{}_0. Using the formulas (13.261–13.265) for the \Gamma's and keeping in mind (13.102) that the element of row r and column c of the \ell th gamma matrix is \Gamma^r_{\ \ell c}, we find

[D_1, D_0]^1{}_0 = \Gamma^1_{\ 00,1} - \Gamma^1_{\ 10,0} + \Gamma^1_{\ 1j}\Gamma^j_{\ 00} - \Gamma^1_{\ 0j}\Gamma^j_{\ 10} = -(\dot a/a)_{,0} - (\dot a/a)^2
[D_2, D_0]^2{}_0 = \Gamma^2_{\ 00,2} - \Gamma^2_{\ 20,0} + \Gamma^2_{\ 2j}\Gamma^j_{\ 00} - \Gamma^2_{\ 0j}\Gamma^j_{\ 20} = -(\dot a/a)_{,0} - (\dot a/a)^2
[D_3, D_0]^3{}_0 = \Gamma^3_{\ 00,3} - \Gamma^3_{\ 30,0} + \Gamma^3_{\ 3j}\Gamma^j_{\ 00} - \Gamma^3_{\ 0j}\Gamma^j_{\ 30} = -(\dot a/a)_{,0} - (\dot a/a)^2
R_{tt} = R_{00} = [D_k, D_0]^k{}_0 = -3(\dot a/a)_{,0} - 3(\dot a/a)^2 = -3\,\ddot a/a.   (13.267)

Thus for R_{rr} = R_{11} = R^k{}_{1k1} = [D_k, D_1]^k{}_1 = [\partial_k + \Gamma_k,\ \partial_1 + \Gamma_1]^k{}_1, we get

R_{rr} = [D_k, D_1]^k{}_1 = \frac{a\ddot a + 2\dot a^2 + 2 k c^2/L^2}{c^2\,(1 - k r^2/L^2)}   (13.268)

(Exercise 13.21), and for R_{22} = R_{\theta\theta} and R_{33} = R_{\phi\phi} we find

R_{\theta\theta} = \big(a\ddot a + 2\dot a^2 + 2 k c^2/L^2\big)\, r^2/c^2 \quad\text{and}\quad R_{\phi\phi} = \sin^2\theta\; R_{\theta\theta}   (13.269)

(Exercises 13.22 and 13.23). And so the scalar curvature R = g^{ab} R_{ba} is

R = g^{ab} R_{ba} = -\frac{R_{00}}{c^2} + \frac{(1 - k r^2/L^2)\, R_{11}}{a^2} + \frac{R_{\theta\theta}}{a^2 r^2} + \frac{R_{\phi\phi}}{a^2 r^2 \sin^2\theta} = 6\,\frac{a\ddot a + \dot a^2 + k c^2/L^2}{c^2 a^2}.   (13.270)
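The same results can be checked symbolically in Python with sympy, an analog of the Mathematica script cited below rather than the book's own code. The sketch builds the connection from (13.87), the Ricci tensor from (13.112), and compares with (13.267) and (13.270):

```python
# Symbolic check of the FLRW Ricci tensor and scalar curvature (13.267)-(13.270).
import sympy as sp

t, r, th, ph, c, L, k = sp.symbols('t r theta phi c L k', positive=True)
a = sp.Function('a', positive=True)(t)
x = [t, r, th, ph]
g = sp.diag(-c**2, a**2/(1 - k*r**2/L**2), a**2*r**2, a**2*r**2*sp.sin(th)**2)
ginv = g.inv()

# Connection (13.87) and Ricci tensor R_bd = R^i_{bid} (13.112)
Gamma = [[[sum(ginv[i, d]*(sp.diff(g[d, b], x[cc]) + sp.diff(g[d, cc], x[b])
                           - sp.diff(g[b, cc], x[d])) for d in range(4))/2
           for cc in range(4)] for b in range(4)] for i in range(4)]

def ricci(b, d):
    return sp.simplify(sum(
        sp.diff(Gamma[i][b][d], x[i]) - sp.diff(Gamma[i][b][i], x[d])
        + sum(Gamma[i][i][e]*Gamma[e][b][d] - Gamma[i][d][e]*Gamma[e][b][i]
              for e in range(4))
        for i in range(4)))

adot, addot = sp.diff(a, t), sp.diff(a, t, 2)

# R_00 = -3 addot/a  (13.267)
assert sp.simplify(ricci(0, 0) + 3*addot/a) == 0

# R = g^{ab} R_ba = 6 (a addot + adot^2 + k c^2/L^2)/(c^2 a^2)  (13.270)
R = sp.simplify(sum(ginv[i, j]*ricci(j, i) for i in range(4) for j in range(4)))
assert sp.simplify(R - 6*(a*addot + adot**2 + k*c**2/L**2)/(c**2*a**2)) == 0
print("FLRW: R_00 and R agree with (13.267) and (13.270)")
```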

It is, of course, quicker to use the Mathematica script FLRW.nb.

13.43 Friedmann–Lemaître–Robinson–Walker Cosmologies

The energy–momentum tensor (13.237) of a perfect fluid moving at 4-velocity u^i is T_{ik} = p\, g_{ik} + (p/c^2 + \rho)\, u_i u_k, where p and \rho are the pressure and mass density of the fluid in its rest frame. In the comoving coordinates of the FLRW metric (13.257), the 4-velocity (12.20) is u^i = (1, 0, 0, 0), and the energy–momentum tensor (13.237) is

T_{ij} = \begin{pmatrix} -c^2\rho\, g_{00} & 0 & 0 & 0 \\ 0 & p\, g_{11} & 0 & 0 \\ 0 & 0 & p\, g_{22} & 0 \\ 0 & 0 & 0 & p\, g_{33} \end{pmatrix}.   (13.271)


Its trace is

T = g^{ij}\, T_{ij} = -c^2\rho + 3p.   (13.272)

Thus using our formulas (13.257) for g_{00} = -c^2, (13.267) for R_{00} = -3\ddot a/a, (13.271) for T_{ij}, and (13.272) for T, we can write the 00 Einstein equation (13.230) as the second-order equation

\frac{\ddot a}{a} = -\frac{4\pi G}{3}\Big(\rho + \frac{3p}{c^2}\Big)   (13.273)

which is nonlinear because \rho and p depend upon a. The sum c^2\rho + 3p determines the acceleration \ddot a of the scale factor a(t); when it is negative, it accelerates the expansion. If we combine Einstein's formula for the scalar curvature, R = -8\pi G\, T/c^4 (13.230), with the FLRW formulas for R (13.270) and for the trace T (13.272) of the energy–momentum tensor, we get

\frac{\ddot a}{a} + \Big(\frac{\dot a}{a}\Big)^2 + \frac{c^2 k}{L^2 a^2} = \frac{4\pi G}{3}\Big(\rho - \frac{3p}{c^2}\Big).   (13.274)

Using the 00-equation (13.273) to eliminate the second derivative \ddot a, we find

\Big(\frac{\dot a}{a}\Big)^2 = \frac{8\pi G}{3}\,\rho - \frac{c^2 k}{L^2 a^2}   (13.275)

which is a first-order nonlinear equation. Both this and the second-order equation (13.273) are known as the Friedmann equations.

The left-hand side of the first-order Friedmann equation (13.275) is the square of the Hubble rate

H = \frac{\dot a}{a}   (13.276)

which is an inverse time or a frequency. Its present value H_0 is the Hubble constant. In terms of H, the first-order Friedmann equation (13.275) is

2

= 8π3G ρ − Lc2 ak2 .

(13.277)

= 0, and therefore its density must be 2

ρc

which is the critical mass density .

= 83H πG

(13.278)


13.44 Density and Pressure

The 0th energy–momentum conservation law (13.236) is

0 = T^{0a}{}_{;a} = \partial_a T^{0a} + \Gamma^a_{\ ca}\, T^{0c} + \Gamma^0_{\ ca}\, T^{ca}.   (13.279)

For a perfect fluid of 4-velocity u^a, the energy–momentum tensor (13.271) is T^{ik} = (\rho + p/c^2)\, u^i u^k + p\, g^{ik}, in which \rho and p are the mass density and pressure of the fluid in its rest frame. The comoving frame of the Friedmann–Lemaître–Robinson–Walker metric (13.257) is the rest frame of the fluid. In these coordinates, the 4-velocity u^a is (1, 0, 0, 0), and the energy–momentum tensor is diagonal with T^{00} = \rho and T^{jj} = p\, g^{jj} for j = 1, 2, 3. Our connection formulas (13.261) tell us that \Gamma^0_{\ 00} = 0, that \Gamma^0_{\ jj} = \dot a\, g_{jj}/(c^2 a), and that, summed over j, \Gamma^j_{\ 0j} = 3\dot a/a. Thus the conservation law (13.279) becomes

0 = \partial_0 T^{00} + \Gamma^j_{\ 0j}\, T^{00} + \Gamma^0_{\ jj}\, T^{jj} = \dot\rho + 3\,\frac{\dot a}{a}\,\rho + \sum_{j=1}^{3}\frac{\dot a\, g_{jj}}{c^2 a}\; p\, g^{jj} = \dot\rho + 3\,\frac{\dot a}{a}\Big(\rho + \frac{p}{c^2}\Big).   (13.280)

Thus

\dot\rho = -3\,\frac{\dot a}{a}\Big(\rho + \frac{p}{c^2}\Big), \quad\text{and so}\quad \frac{d\rho}{da} = -\frac{3}{a}\Big(\rho + \frac{p}{c^2}\Big).   (13.281)

The energy density \rho is composed of fractions \rho_i, each contributing its own partial pressure p_i according to its own equation of state

p_i = c^2 w_i\,\rho_i   (13.282)

in which w_i is a constant. The rate of change (13.282) of the density \rho_i is then

\frac{d\rho_i}{da} = -\frac{3}{a}\,(1 + w_i)\,\rho_i.   (13.283)

In terms of the present density \rho_{i0} and scale factor a_0, the solution is

\rho_i = \rho_{i0}\,\Big(\frac{a_0}{a}\Big)^{3(1 + w_i)}.   (13.284)

There are three important kinds of density. The dark-energy density \rho_\Lambda is assumed to be like a cosmological constant \Lambda or like the energy density of the vacuum, so it is independent of the scale factor a and has w_\Lambda = -1.

A universe composed only of dust or nonrelativistic collisionless matter has no pressure. Thus p = w\rho = 0 with \rho \neq 0, and so w = 0. So the matter density falls inversely with the volume:

\rho_m = \rho_{m0}\,\Big(\frac{a_0}{a}\Big)^3.   (13.285)

History of the universe

Figure 13.1 NASA/WMAP Science Team's timeline of the known universe. (Source: NASA/WMAP Science Team, https://map.gsfc.nasa.gov/media/060915/index.html)

The density of radiation \rho_r has w_r = 1/3 because wavelengths scale with the scale factor, and so there's an extra factor of a:

\rho_r = \rho_{r0}\,\Big(\frac{a_0}{a}\Big)^4.   (13.286)

The total density \rho varies with a as

\rho = \rho_\Lambda + \rho_{m0}\,\Big(\frac{a_0}{a}\Big)^3 + \rho_{r0}\,\Big(\frac{a_0}{a}\Big)^4.   (13.287)

This mass density \rho, the Friedmann equations (13.273 and 13.275), and the physics of the standard model have caused our universe to evolve as in Fig. 13.1 over the past 14 billion years.
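The three scalings in (13.285)–(13.287) are all instances of the single power law (13.284); a minimal Python check of the exponents (the w values are those given in the text):

```python
def rho(a, rho0, w, a0=1.0):
    # (13.284): rho_i = rho_i0 * (a0/a)**(3*(1 + w))
    return rho0 * (a0 / a) ** (3 * (1 + w))

# w = -1: dark energy, constant; w = 0: matter ~ a^-3; w = 1/3: radiation ~ a^-4
assert rho(0.5, 1.0, -1.0) == 1.0
assert abs(rho(0.5, 1.0, 0.0) - 8.0) < 1e-9
assert abs(rho(0.5, 1.0, 1/3) - 16.0) < 1e-9
```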



13.45 How the Scale Factor Evolves with Time

The first-order Friedmann equation (13.275) expresses the square of the instantaneous Hubble rate H = \dot a/a in terms of the density \rho and the scale factor a(t):

H^2 = \Big(\frac{\dot a}{a}\Big)^2 = \frac{8\pi G}{3}\,\rho - \frac{c^2 k}{L^2 a^2}   (13.288)

in which k = \pm 1 or 0. The critical density \rho_c = 3H^2/(8\pi G) (13.278) is the one that satisfies this equation for a flat (k = 0) universe. Its present value is

\rho_{c0} = 3 H_0^2/(8\pi G) = 8.599\times10^{-27}\ \mathrm{kg\ m^{-3}}.

Table 13.1 Cosmological parameters of the Planck collaboration

  H_0 (km/(s Mpc))    \Omega_\Lambda       \Omega_m             \Omega_k
  67.66 ± 0.42        0.6889 ± 0.0056      0.3111 ± 0.0056      0.0007 ± 0.0037

Dividing Friedmann's equation by

the square of the present Hubble rate H_0^2, we get

\frac{H^2}{H_0^2} = \frac{1}{H_0^2}\Big(\frac{\dot a}{a}\Big)^2 = \frac{1}{H_0^2}\Big(\frac{8\pi G}{3}\,\rho - \frac{c^2 k}{a^2 L^2}\Big) = \frac{\rho}{\rho_{c0}} - \frac{c^2 k}{a^2 H_0^2 L^2}   (13.289)

in which \rho is the total density (13.287):

\frac{H^2}{H_0^2} = \frac{\rho_\Lambda}{\rho_{c0}} + \frac{\rho_{m0}}{\rho_{c0}}\,\frac{a_0^3}{a^3} + \frac{\rho_{r0}}{\rho_{c0}}\,\frac{a_0^4}{a^4} - \frac{c^2 k}{a_0^2 H_0^2 L^2}\,\frac{a_0^2}{a^2}.   (13.290)

The Planck collaboration use a model in which the energy density of the universe is due to radiation, matter, and a cosmological constant \Lambda. Only about 18.79% of the matter in their model is composed of baryons, \Omega_b = 0.05845 \pm 0.0003. Most of the matter is transparent and is called dark matter. They assume the dark matter is composed of particles that have masses in excess of a keV so that they are heavy enough to have been nonrelativistic or "cold" when the universe was about a year old (Peebles, 1982). The energy density of the cosmological constant \Lambda is known as dark energy. The Planck collaboration use this \Lambda-cold-dark-matter (\LambdaCDM) model and their CMB data to estimate the Hubble constant as H_0 = 67.66 km/(s Mpc) = 2.1927\times10^{-18}\ \mathrm{s^{-1}} and the density ratios \Omega_\Lambda = \rho_\Lambda/\rho_{c0}, \Omega_m = \rho_{m0}/\rho_{c0}, and \Omega_k \equiv -c^2 k/(a_0 H_0 L)^2 as listed in Table 13.1 (Aghanim et al., 2018).

The Riess group use the Gaia observatory to calibrate Cepheid stars and type Ia supernovas as standard candles for measuring distances to remote galaxies. The distances and redshifts of these galaxies give the Hubble constant as H_0 = 73.48 \pm 1.66 km/(s Mpc) (Riess et al., 2018). As this book goes to press, the 9% discrepancy between the Planck and Riess H_0's is unexplained.

To estimate the ratio \Omega_r = \rho_{r0}/\rho_{c0} of densities, one may use the present temperature T_0 = 2.7255 \pm 0.0006 K (Fixsen, 2009) of the CMB radiation and the formula (5.110) for the energy density of photons:

\rho_\gamma = \frac{8\pi^5\,(k_B T_0)^4}{15\, h^3 c^5} = 4.6451\times10^{-31}\ \mathrm{kg\ m^{-3}}.   (13.291)

Adding in three kinds of neutrinos and antineutrinos at T_{0\nu} = (4/11)^{1/3}\, T_0, we get for the present density of massless and nearly massless particles (Weinberg, 2010, section 2.1)

\rho_{r0} = \Big[\,1 + 3\,\Big(\frac{7}{8}\Big)\Big(\frac{4}{11}\Big)^{4/3}\Big]\,\rho_\gamma = 7.8099\times10^{-31}\ \mathrm{kg\ m^{-3}}.   (13.292)

The fraction \Omega_r of the present critical energy density that is due to radiation is then

\Omega_r = \frac{\rho_{r0}}{\rho_{c0}} = 9.0824\times10^{-5}.   (13.293)

In terms of \Omega_r and of the \Omega's in Table 13.1, the formula (13.290) for H^2/H_0^2 is

\frac{H^2}{H_0^2} = \Omega_\Lambda + \Omega_k\,\frac{a_0^2}{a^2} + \Omega_m\,\frac{a_0^3}{a^3} + \Omega_r\,\frac{a_0^4}{a^4}.   (13.294)

Since H = \dot a/a, one has dt = da/(aH) = H_0^{-1}(da/a)(H_0/H), and so with x = a/a_0 the time interval dt is

dt = \frac{1}{H_0}\,\frac{dx}{x\,\sqrt{\Omega_\Lambda + \Omega_k\, x^{-2} + \Omega_m\, x^{-3} + \Omega_r\, x^{-4}}}.   (13.295)

Integrating and setting the origin of time t(0) = 0 and the scale factor at the present time equal to unity, a_0 = 1, we find that the time t(a) that a(t) took to grow from 0 to a(t) is

t(a) = \frac{1}{H_0}\int_0^a \frac{dx}{\sqrt{\Omega_\Lambda\, x^2 + \Omega_k + \Omega_m\, x^{-1} + \Omega_r\, x^{-2}}}.   (13.296)

This integral gives the age of the universe as t(1) = 13.789 Gyr; the Planck-collaboration value is 13.787 \pm 0.020 Gyr (Aghanim et al., 2018). Figure 13.2 plots the scale factor a(t) and the redshift z(t) = 1/a - 1 as functions of the time t (13.296) for the first 14 billion years after the time t = 0 of infinite redshift. A photon emitted with wavelength \lambda at time t(a) now has wavelength \lambda_0 = \lambda/a(t). The change in its wavelength is \Delta\lambda = \lambda\, z(t) = \lambda\,(1/a - 1) = \lambda_0 - \lambda.
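The numbers in (13.291)–(13.296) can be reproduced with a short Python sketch. The constants are typed in by hand (so the last digits differ slightly from the quoted values), and the integral (13.296) is evaluated by Simpson's rule after multiplying numerator and denominator by x, which makes the integrand finite at x = 0:

```python
import math

# Constants (SI), typed in by hand; last digits approximate.
h = 6.626e-34        # Planck's constant, J s
c = 2.998e8          # speed of light, m/s
kB = 1.381e-23       # Boltzmann's constant, J/K
G = 6.674e-11        # Newton's constant
T0 = 2.7255          # CMB temperature, K
H0 = 67.66 * 1000 / 3.0857e22   # Hubble constant, s^-1

# Photon density (13.291) and radiation density (13.292)
rho_gamma = 8 * math.pi**5 * (kB * T0)**4 / (15 * h**3 * c**5)
rho_r0 = (1 + 3 * (7/8) * (4/11)**(4/3)) * rho_gamma

# Critical density (13.278) and radiation fraction (13.293)
rho_c0 = 3 * H0**2 / (8 * math.pi * G)
Om_r = rho_r0 / rho_c0
Om_L, Om_m, Om_k = 0.6889, 0.3111, 0.0007   # Table 13.1

def f(x):
    # (13.296) rewritten as x / sqrt(Om_L x^4 + Om_k x^2 + Om_m x + Om_r)
    return x / math.sqrt(Om_L*x**4 + Om_k*x**2 + Om_m*x + Om_r)

def age(a, n=200_000):
    # Simpson's rule for t(a) = (1/H0) * integral from 0 to a
    hh = a / n
    s = f(0.0) + f(a)
    s += 4 * sum(f((2*i - 1) * hh) for i in range(1, n//2 + 1))
    s += 2 * sum(f(2*i * hh) for i in range(1, n//2))
    return (hh / 3) * s / H0

print(f"rho_gamma = {rho_gamma:.3e} kg/m^3")       # ~4.65e-31
print(f"Om_r      = {Om_r:.3e}")                   # ~9.1e-5
print(f"t(1)      = {age(1.0)/3.156e16:.2f} Gyr")  # ~13.8
```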

13.46 The First Hundred Thousand Years

Figure 13.3 plots the scale factor a(t) as given by the integral (13.296) and the densities of radiation \rho_r(t) and matter \rho_m(t) for the first 100,000 years after the time of infinite redshift. Because wavelengths grow with the scale factor, the radiation density (13.286) is proportional to the inverse fourth power of the scale factor, \rho_r(t) = \rho_{r0}/a^4(t). The density of radiation therefore was dominant at early


[Figure: Evolution of scale factor over 14 Gyr]

Figure 13.2 The scale factor a(t) (solid, left axis) and redshift z(t) (dotdash, right axis) are plotted against the time (13.296) in Gyr. This chapter's Fortran, Matlab, and Mathematica scripts are in Tensors_and_general_relativity at github.com/kevinecahill.

[Figure: Evolution of scale factor and densities over first 100 kyr]

Figure 13.3 The Planck-collaboration scale factor a (solid), radiation density \rho_r (dotdash), and matter density \rho_m (dashed) are plotted as functions of the time (13.296) in kyr. The era of radiation ends at t = 50,506 years when the two densities are equal to 1.0751\times10^{-16} kg/m^3, a = 2.919\times10^{-4}, and z = 3425.


times when the scale factor was small. Keeping only \Omega_r in the integral (13.296) (the \Omega_\Lambda = 0.6889 and \Omega_m terms matter only at later times), we get

t = \frac{a^2}{2 H_0\,\sqrt{\Omega_r}} \quad\text{and}\quad a(t) = \Omega_r^{1/4}\,\sqrt{2 H_0\, t}.   (13.297)
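The equality values quoted in the caption of Figure 13.3 follow from the density ratios alone. A quick Python sketch (the \Omega values are the Table 13.1 and (13.293) numbers, so rounding shifts the last digits of z):

```python
# Matter-radiation equality: rho_m0/a^3 = rho_r0/a^4 at a = Om_r/Om_m.
Om_m = 0.3111        # matter fraction (Table 13.1)
Om_r = 9.0824e-5     # radiation fraction (13.293)
T0 = 2.7255          # present CMB temperature in K

a_eq = Om_r / Om_m   # scale factor at equality
z_eq = 1/a_eq - 1    # redshift at equality
T_eq = T0 / a_eq     # temperature then, since T ~ 1/a during the era of radiation

print(f"a_eq = {a_eq:.3e}")    # ~2.92e-4
print(f"z_eq = {z_eq:.0f}")    # ~3424
print(f"T_eq = {T_eq:.0f} K")  # ~9300 K
```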

Since the radiation density \rho_r(t) = \rho_{r0}/a^4(t) also is proportional to the fourth power of the temperature, \rho_r(t) \sim T^4, the temperature varied as the inverse of the scale factor, T \sim 1/a(t) \sim t^{-1/2}, during the era of radiation. In cold-dark-matter models, when the temperature was in the range 10^{12} > T > 10^{10} K, or m_\mu c^2 > kT > m_e c^2, where m_\mu is the mass of the muon and m_e that of the electron, the radiation was mostly electrons, positrons, photons, and neutrinos, and the relation between the time t and the temperature T was t \sim 0.994\ \mathrm{s}\times(10^{10}\ \mathrm{K}/T)^2 (Weinberg, 2010, ch. 3). By 10^9 K, the positrons had annihilated with electrons, and the neutrinos had fallen out of equilibrium. Between 10^9 K and 10^6 K, when the energy density of nonrelativistic particles became relevant, the time–temperature relation was t \sim 1.78\ \mathrm{s}\times(10^{10}\ \mathrm{K}/T)^2 (Weinberg, 2010, ch. 3). During the first three minutes (Weinberg, 1988) of the era of radiation, quarks and gluons formed hadrons, which decayed into protons and neutrons. As the neutrons decayed (\tau = 877.7 s), they and the protons formed the light elements – principally hydrogen, deuterium, and helium – in a process called big-bang nucleosynthesis.

The density of nonrelativistic matter (13.285) falls as the third power of the scale factor, \rho_m(t) = \rho_{m0}/a^3(t). The more rapidly falling density of radiation \rho_r(t) crosses it 50,506 years after the Big Bang, as indicated by the vertical line in Figure 13.3. This time t = 50,506 yr and redshift z = 3425 mark the end of the era of radiation.

13.47 The Next Ten Billion Years

The era of matter began about 50,506 years after the time of infinite redshift when the matter density \rho_m first exceeded the radiation density \rho_r. Some 330,000 years later, at t \sim 380,000 yr, the universe had cooled to about T = 3000 K or kT = 0.26 eV – a temperature at which less than 1% of the hydrogen was ionized.
At this redshift of z = 1090, the plasma of ions and electrons became a transparent gas of neutral hydrogen and helium with trace amounts of deuterium, helium-3, and lithium-7. The photons emitted or scattered at that time as 0.26 eV or 3000 K photons have redshifted down to become the 2.7255 K photons of the cosmic microwave background (CMB) radiation. This time of last scattering and first transparency is called recombination, a term that makes sense if the universe is cyclic.


If we approximate time periods t - t_m during the era of matter by keeping only \Omega_m in the integral (13.296), then we get

t - t_m = \frac{2\, a^{3/2}}{3 H_0\,\sqrt{\Omega_m}} \quad\text{and}\quad a(t) = \Big(\frac{3 H_0\,\sqrt{\Omega_m}\;(t - t_m)}{2}\Big)^{2/3}   (13.298)

in which t_m is a time well inside the era of matter. Between 10 and 17 million years after the Big Bang, the temperature of the known universe fell from 373 to 273 K. If by then the supernovas of very early, very heavy stars had produced carbon, nitrogen, and oxygen, biochemistry may have started during this period of 7 million years. Stars did form at least as early as 180 million years after the Big Bang (Bowman et al., 2018).

The era of matter lasted until the energy density of matter \rho_m(t), falling as \rho_m(t) = \rho_{m0}/a^3(t), had dropped to the energy density of dark energy, \rho_\Lambda = 5.9238\times10^{-27} kg/m^3. This happened at t = 10.228 Gyr, as indicated by the first vertical line in Figure 13.4.

Evolution of scalefactor and densities over 50 Gyr

9

×10 –26 1

0.9

8

0.8

()

ρm t

7

0.7

6

Λ

0.6

) t(a

5

0.5

4

0.4

3

a(t )

0.3

2

0.2

1 0

)3– mgk( )t(ρ

ρ

0.1 0

5

10

15

20

25 30 t (Gyr)

35

40

45

0 50

Figure 13.4 The scale factor a (solid), the vacuum density ρ¶ (dotdash), and the matter density ρm (dashed) are plotted as functions of the time (13.296) in Gyr. The era of matter ends at t = 10.228 Gyr (first vertical line) when the two densities are equal to 5.9238 × 10−27kg m −3 and a = 0.7672. The present time t0 is 13.787 Gyr (second vertical line) at which a(t ) = 1.

522

13 General Relativity

13.48 Era of Dark Energy

The era of dark energy began 3.6 billion years ago at a redshift of z = 0 .3034 when the energy density of the universe ρm + ρ¶ was twice that of empty space, ρ = 2 ρ ¶ = 1. 185 × 10−26 kg/m3 . The energy density of matter now is only 31.11% of the energy density of the universe, and it is falling as the cube of the scale factor ρm (t ) = ρm0 /a 3( t ). In another 20 billion years, the energy density of the universe will have declined almost all the way to the dark-energy density ρ¶ = 5.9238 × 10−27 kg/m3 or (1.5864 meV) 4/(±3 c 5). At that time t¶ and in the indefinite future, the only significant term in the integral (13.296) will be the vacuum energy. Neglecting the others and replacing a0 = 1 with a¶ = a (t¶ ), we find t ( a /a¶ ) − t¶ in which t¶

= log(√a/a

² 35 Gyr.

¶)

H0

µ¶

or a (t ) = e H0

√µ (t −t ) a (t¶ ) ¶



(13.299)

13.49 Before the Big Bang

The ΛCDM model is remarkably successful (Aghanim et al., 2018). But it does not explain why the CMB is so isotropic, apart from a Doppler shift, and why the universe is so flat (Guth, 1981). A brief period of rapid exponential growth like that of the era of dark energy may explain the isotropy and the flatness. Inflation occurs when the potential energy density ρ dwarfs the energy of matter and radiation. The internal energy of the universe then is proportional to its volume, U = c²ρV, and its pressure p as given by the thermodynamic relation

\[ p = -\frac{\partial U}{\partial V} = -c^2 \rho \tag{13.300} \]

is negative. The second-order Friedmann equation (13.273)

\[ \frac{\ddot a}{a} = -\frac{4\pi G}{3}\left(\rho + \frac{3p}{c^2}\right) = \frac{8\pi G}{3}\,\rho \tag{13.301} \]

then implies exponential growth like that of the era of dark energy (13.299)

\[ a(t) = e^{\sqrt{8\pi G\rho/3}\;t}\, a(0). \tag{13.302} \]

The origin of the potential-energy density ρ is unknown. In chaotic inflation (Linde, 1983), a scalar field φ fluctuates to a mean value ⟨φ⟩ᵢ that makes its potential-energy density ρ huge. The field remains at or close to the value ⟨φ⟩ᵢ, and the scale factor inflates rapidly and exponentially (13.302) until time t at which the potential energy of the universe is

\[ E = c^2 \rho\; e^{\sqrt{24\pi G\rho}\;t}\; V(0) \tag{13.303} \]


where V(0) is the spatial volume in which the field φ held the value ⟨φ⟩ᵢ. After time t, the field returns to its mean value ⟨0|φ|0⟩ in the ground state |0⟩ of the theory, and the huge energy E is released as radiation in a Big Bang. The energy E_g of the gravitational field caused by inflation is negative, E_g = −E, and energy is conserved. Chaotic inflation may explain why there is a universe: a quantum fluctuation made it. Regions where the field remained longer at ⟨φ⟩ᵢ would inflate longer and make new local universes in new Big Bangs, creating a multiverse, which also might arise from multiple quantum fluctuations. If a quantum fluctuation gives a field φ a spatially constant mean value ⟨φ⟩ᵢ ≡ φ in an initial volume V(0), then the equations for the scale factor (13.301) and for the scalar field (13.201) simplify to

\[ H = \left(\frac{8\pi G\rho}{3}\right)^{1/2} \quad\text{and}\quad \ddot\phi = -3H\dot\phi - \frac{m^2 c^4}{\hbar^2}\,\phi \tag{13.304} \]

in which ρ is the mass density of the potential energy of the field φ. The term −3Hφ̇ is a kind of gravitational friction. It may explain why a field φ sticks at the value ⟨φ⟩ᵢ long enough to resolve the isotropy and flatness puzzles. Classical, de Sitter-like (Example 7.63), bouncing solutions (Steinhardt et al., 2002; Ijjas and Steinhardt, 2018) of Einstein's equations can explain the homogeneity, flatness, and isotropy of the universe as due to repeated collapses and rebirths. Experiments will tell us whether inflation or bouncing or something else actually occurred (Akrami et al., 2018).
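The "sticking" produced by the friction term −3Hφ̇ in (13.304) is easy to see numerically. The sketch below integrates φ̈ = −3Hφ̇ − m²φ in arbitrary units (the value m² = 1 and the two values of H are illustrative choices, not numbers from the text): with large H the field creeps down without ever oscillating, while with small H it oscillates through zero.

```python
# Numerical sketch (arbitrary units) of the scalar-field equation of (13.304),
# phi'' = -3*H*phi' - m2*phi, with constant H: large H overdamps the field,
# so it "sticks" near its initial value; small H lets it oscillate.
def evolve(H, m2=1.0, phi0=1.0, t_max=10.0, dt=1e-3):
    phi, phidot, t = phi0, 0.0, 0.0
    phi_min = phi0
    while t < t_max:
        phiddot = -3.0 * H * phidot - m2 * phi
        phidot += phiddot * dt        # semi-implicit Euler step
        phi += phidot * dt
        phi_min = min(phi_min, phi)
        t += dt
    return phi, phi_min

phi_stiff, stiff_min = evolve(H=10.0)   # strong gravitational friction
phi_free, free_min = evolve(H=0.01)     # nearly undamped oscillator
print(phi_stiff, phi_free)
```

The overdamped field decays only at the slow rate m²/(3H), so it remains close to its initial value over many oscillation periods of the frictionless field.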

13.50 Yang–Mills Theory

The gauge transformation of an abelian gauge theory like electrodynamics multiplies a single charged field by a spacetime-dependent phase factor, φ'(x) = exp(iqθ(x)) φ(x). Yang and Mills generalized this gauge transformation to one that multiplies a vector φ of matter fields by a spacetime-dependent unitary matrix U(x)

\[ \phi'_a(x) = \sum_{b=1}^{n} U_{ab}(x)\, \phi_b(x) \quad\text{or}\quad \phi'(x) = U(x)\,\phi(x) \tag{13.305} \]

and showed how to make the action of the theory invariant under such nonabelian gauge transformations. (The fields φ are scalars for simplicity.) Since the matrix U is unitary, inner products like φ†(x)φ(x) are automatically invariant

\[ \phi'^\dagger(x)\,\phi'(x) = \phi^\dagger(x)\, U^\dagger(x) U(x)\, \phi(x) = \phi^\dagger(x)\,\phi(x). \tag{13.306} \]


But inner products of derivatives ∂_iφ† ∂^iφ are not invariant because the derivative acts on the matrix U(x) as well as on the field φ(x). Yang and Mills made derivatives D_iφ that transform like the fields φ

\[ (D_i \phi)' = U D_i \phi. \tag{13.307} \]

To do so, they introduced gauge-field matrices A_i that play the role of the connections Γ_i in general relativity and set

\[ D_i = \partial_i + A_i \tag{13.308} \]

in which A_i, like ∂_i, is antihermitian. They required that under the gauge transformation (13.305), the gauge-field matrix A_i transform to A'_i in such a way as to make the derivatives transform as in (13.307)

\[ (D_i \phi)' = (\partial_i + A'_i)\,\phi' = (\partial_i + A'_i)\,U\phi = U D_i \phi = U(\partial_i + A_i)\,\phi. \tag{13.309} \]

So they set

\[ (\partial_i + A'_i)\, U \phi = U (\partial_i + A_i)\, \phi \quad\text{or}\quad (\partial_i U)\,\phi + A'_i\, U \phi = U A_i \phi \tag{13.310} \]

and made the gauge-field matrix A_i transform as

\[ A'_i = U A_i U^{-1} - (\partial_i U)\, U^{-1}. \tag{13.311} \]
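The transformation law (13.311) can be checked numerically in one dimension with finite differences. The sketch below uses an arbitrary sample gauge matrix U(x), gauge field A(x), and matter field φ(x) (all three are illustrative choices, not objects from the text) and verifies that with A' built from (13.311), (d/dx + A')(Uφ) equals U(d/dx + A)φ as in (13.309):

```python
# Finite-difference sketch of (13.309)-(13.311) in one dimension:
# with A'(x) = U A U^-1 - (dU/dx) U^-1, the derivative D = d/dx + A satisfies
# (d/dx + A')(U phi) = U (d/dx + A) phi.
import math

def mat(x):   # a sample gauge-field matrix A(x), chosen arbitrarily
    return [[0.0, math.sin(x)], [-math.sin(x), 0.1 * x]]

def rot(x):   # a sample gauge matrix U(x): rotation by the angle x*x
    a = x * x
    return [[math.cos(a), -math.sin(a)], [math.sin(a), math.cos(a)]]

def inv_rot(x):   # U^-1 is the transpose of the rotation
    a = x * x
    return [[math.cos(a), math.sin(a)], [-math.sin(a), math.cos(a)]]

def phi(x):   # a sample matter field
    return [math.exp(0.3 * x), math.cos(x)]

def mv(M, v): return [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]
def mm(M, N): return [[sum(M[i][k]*N[k][j] for k in (0, 1)) for j in (0, 1)] for i in (0, 1)]

x0, h = 0.5, 1e-6
dUdx = [[(rot(x0+h)[i][j] - rot(x0-h)[i][j]) / (2*h) for j in (0, 1)] for i in (0, 1)]
UAUinv = mm(mm(rot(x0), mat(x0)), inv_rot(x0))
dU_Uinv = mm(dUdx, inv_rot(x0))
Aprime = [[UAUinv[i][j] - dU_Uinv[i][j] for j in (0, 1)] for i in (0, 1)]   # (13.311)

dphi  = [(phi(x0+h)[i] - phi(x0-h)[i]) / (2*h) for i in (0, 1)]
dUphi = [(mv(rot(x0+h), phi(x0+h))[i] - mv(rot(x0-h), phi(x0-h))[i]) / (2*h) for i in (0, 1)]

lhs = [dUphi[i] + mv(Aprime, mv(rot(x0), phi(x0)))[i] for i in (0, 1)]   # (d/dx + A') U phi
rhs = mv(rot(x0), [dphi[i] + mv(mat(x0), phi(x0))[i] for i in (0, 1)])   # U (d/dx + A) phi
print(lhs, rhs)
```

The inhomogeneous term −(∂U)U⁻¹ is exactly what cancels the derivative of U, which is the point of (13.310).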

Thus under the gauge transformation (13.305), the derivative D_iφ transforms as in (13.307), like the vector φ in (13.305), and the inner product of covariant derivatives

\[ \left[(D_i \phi)'\right]^\dagger (D_i \phi)' = (D_i \phi)^\dagger\, U^\dagger U\, (D_i \phi) = (D_i \phi)^\dagger\, D_i \phi \tag{13.312} \]

remains invariant. To make an invariant action density for the gauge-field matrices A_i, they used the transformation law (13.309), which implies that D'_i Uφ = U D_iφ or D'_i = U D_i U⁻¹. So they defined their generalized Faraday tensor as

\[ F_{ik} = [D_i, D_k] = \partial_i A_k - \partial_k A_i + [A_i, A_k] \tag{13.313} \]

so that it transforms covariantly

\[ F'_{ik} = U F_{ik} U^{-1}. \tag{13.314} \]

They then generalized the action density F_{ik}F^{ik} of electrodynamics to the trace Tr(F_{ik}F^{ik}) of the square of the Faraday matrices, which is invariant under gauge transformations since

\[ \mathrm{Tr}\!\left(U F_{ik} U^{-1}\, U F^{ik} U^{-1}\right) = \mathrm{Tr}\!\left(U F_{ik} F^{ik} U^{-1}\right) = \mathrm{Tr}\!\left(F_{ik} F^{ik}\right). \tag{13.315} \]
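The invariance (13.315) rests only on U⁻¹U = 1 and the cyclic property of the trace. A quick numeric sketch (the matrices below are arbitrary illustrative choices) checks it for 2 × 2 matrices, using a real orthogonal U so that U⁻¹ is simply the transpose:

```python
# Check Tr(U F U^-1 U G U^-1) = Tr(F G) for 2x2 matrices, with an
# orthogonal U (a real unitary), whose inverse is its transpose.
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(X):
    return X[0][0] + X[1][1]

t = 0.7
U    = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
Uinv = [[math.cos(t),  math.sin(t)], [-math.sin(t), math.cos(t)]]   # U^T

F = [[1.0, 2.0], [3.0, 4.0]]
G = [[-1.0, 0.5], [2.0, 1.5]]

lhs = trace(matmul(matmul(matmul(U, F), Uinv), matmul(matmul(U, G), Uinv)))
rhs = trace(matmul(F, G))
print(lhs, rhs)
```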


As an action density for fermionic matter fields, they replaced the ordinary derivative in Dirac's formula ψ̄(γ^i∂_i + m)ψ by the covariant derivative (13.308) to get ψ̄(γ^i D_i + m)ψ (Chen-Ning Yang 1922–, Robert L. Mills 1927–1999). In an abelian gauge theory, the square of the 1-form A = A_i dx^i vanishes, A² = A_iA_k dx^i ∧ dx^k = 0, but in a nonabelian gauge theory the gauge fields are matrices, and A² ≠ 0. The sum dA + A² is the Faraday 2-form

\[ F = dA + A^2 = (\partial_i A_k + A_i A_k)\, dx^i \wedge dx^k = \tfrac{1}{2}\left(\partial_i A_k - \partial_k A_i + [A_i, A_k]\right) dx^i \wedge dx^k = \tfrac{1}{2}\, F_{ik}\, dx^i \wedge dx^k. \tag{13.316} \]

The scalar matter fields φ may have self-interactions described by a potential V(φ) such as V(φ) = λ(φ†φ − m²/λ)², which is positive unless φ†φ = m²/λ. The kinetic action of these fields is (D_iφ)† D^iφ. At low temperatures, these scalar fields assume mean values ⟨0|φ|0⟩ = φ₀ in the vacuum with φ₀†φ₀ = m²/λ so as to minimize their potential energy density V(φ), and their kinetic action (D_iφ)† D^iφ = (∂^iφ + A^iφ)†(∂_iφ + A_iφ) is approximately φ₀† A_i A^i φ₀. The matrix of the gauge fields A_{iab} = −i t^α_{ab} A_{iα} is a linear combination of the generators t^α of the gauge group. So the action of the scalar fields contains the term φ₀† A_i A^i φ₀ = −M²_{αβ} A^α_i A^{iβ}, in which the mass-squared matrix for the gauge fields is M²_{αβ} = φ₀^{*a} t^α_{ab} t^β_{bc} φ₀^c. This Higgs mechanism gives masses to those linear combinations b^β_i A^β of the gauge fields for which M²_{αβ} b^β_i = m²_i b^α_i ≠ 0.

The Higgs mechanism also gives masses to the fermions. The mass term m in the Yang–Mills–Dirac action is replaced by something like cφ in which c is a constant, different for each fermion. In the vacuum and at low temperatures, each fermion acquires as its mass cφ₀. On 4 July 2012, physicists at CERN's Large Hadron Collider announced the discovery of a Higgs-like particle with a mass near 125 GeV/c² (Peter Higgs 1929–).

13.51 Cartan's Spin Connection and Structure Equations

Cartan's tetrads (13.153) c^a_k(x) are the rows and columns of the orthogonal matrix that turns the flat-space metric η_{ab} into the curved-space metric g_{ik} = c^a_i η_{ab} c^b_k. Early-alphabet letters a, b, c, d, … = 0, 1, 2, 3 are Lorentz indexes, and middle-to-late letters i, j, k, ℓ, … = 0, 1, 2, 3 are spacetime indexes. Under a combined local Lorentz (13.154) and general coordinate transformation, the tetrads transform as

\[ c'^a_k(x') = L^a_{\;b}(x')\, \frac{\partial x^\ell}{\partial x'^k}\, c^b_\ell(x). \tag{13.317} \]


The covariant derivative of a tetrad D_ℓ c must transform as

\[ (D_\ell\, c^a_k)'(x') = L^a_{\;b}(x')\, \frac{\partial x^i}{\partial x'^\ell}\, \frac{\partial x^j}{\partial x'^k}\, D_i c^b_j(x). \tag{13.318} \]

We can use the affine connection Γ^j_{kℓ} and the formula (13.68) for the covariant derivative of a covariant vector to cope with the index j. And we can treat the Lorentz index b like an index of a nonabelian group as in Section 13.50 by introducing a gauge field ω^a_{bℓ}

\[ D_\ell\, c^a_k = c^a_{k,\ell} - \Gamma^j_{k\ell}\, c^a_j + \omega^a_{\;b\ell}\, c^b_k. \tag{13.319} \]

The affine connection is defined so as to make the covariant derivative of the tangent basis vectors vanish

\[ D_\ell\, e^\alpha_k = e^\alpha_{k,\ell} - \Gamma^j_{k\ell}\, e^\alpha_j = 0 \tag{13.320} \]

in which the Greek letter α labels the coordinates 0, 1, 2, …, n of the embedding space. We may verify this relation by taking the inner product in the embedding space with the dual tangent vector e^i_α

\[ e^i_\alpha\, \Gamma^j_{k\ell}\, e^\alpha_j = \delta^i_j\, \Gamma^j_{k\ell} = \Gamma^i_{k\ell} = e^i \cdot e_{k,\ell} \tag{13.321} \]

which is the definition (13.59) of the affine connection, Γ^i_{kℓ} = e^i · e_{k,ℓ}. So too the spin connection ω^a_{bℓ} is defined so as to make the covariant derivative of the tetrad vanish

\[ D_\ell\, c^a_k = c^a_{k;\ell} = c^a_{k,\ell} - \Gamma^j_{k\ell}\, c^a_j + \omega^a_{\;d\ell}\, c^d_k = 0. \tag{13.322} \]

The dual tetrads c^k_b are doubly orthonormal:

\[ c^k_b\, c^b_i = \delta^k_i \quad\text{and}\quad c^a_k\, c^k_b = \delta^a_b. \tag{13.323} \]

Thus using their orthonormality, we have ω^a_{dℓ} c^d_k c^k_b = ω^a_{dℓ} δ^d_b = ω^a_{bℓ}, and so the spin connection is

\[ \omega^a_{\;b\ell} = -\,c^k_b\left(c^a_{k,\ell} - \Gamma^j_{k\ell}\, c^a_j\right) = c^a_j\, c^k_b\, \Gamma^j_{k\ell} - c^a_{k,\ell}\, c^k_b. \tag{13.324} \]

In terms of the differential forms (Section 12.6)

\[ c^a = c^a_k\, dx^k \quad\text{and}\quad \omega^a_{\;b} = \omega^a_{\;b\ell}\, dx^\ell \tag{13.325} \]

we may use the exterior derivative to express the vanishing (13.322) of the covariant derivative c^a_{k;ℓ} as

\[ dc^a = c^a_{k,\ell}\, dx^\ell \wedge dx^k = \left(\Gamma^j_{k\ell}\, c^a_j - \omega^a_{\;b\ell}\, c^b_k\right) dx^\ell \wedge dx^k. \tag{13.326} \]


But the affine connection Γ^j_{kℓ} is symmetric in k and ℓ while the wedge product dx^ℓ ∧ dx^k is antisymmetric in k and ℓ. Thus we have

\[ dc^a = c^a_{k,\ell}\, dx^\ell \wedge dx^k = -\,\omega^a_{\;b\ell}\, dx^\ell \wedge c^b_k\, dx^k \tag{13.327} \]

or with c ≡ c^a_k dx^k and ω ≡ ω^a_{bℓ} dx^ℓ

\[ dc = -\,\omega \wedge c \tag{13.328} \]

which is Cartan's first equation of structure. Cartan's curvature 2-form is

\[ R^a_{\;b} = \tfrac{1}{2}\, c^a_j\, c^i_b\, R^j_{\;ik\ell}\, dx^k \wedge dx^\ell = \tfrac{1}{2}\, c^a_j\, c^i_b \left(\Gamma^j_{i\ell,k} - \Gamma^j_{ik,\ell} + \Gamma^j_{nk}\Gamma^n_{i\ell} - \Gamma^j_{n\ell}\Gamma^n_{ik}\right) dx^k \wedge dx^\ell. \tag{13.329} \]

= dω ab + ωac ∧ ω cb

or more simply as R

(13.330)

= d ω + ω ∧ ω.

(13.331)

A more compact notation, similar to that of Yang–Mills theory, uses Cartan’s covariant exterior derivative D

≡ d +ω∧

(13.332)

to express his two structure equations as Dc

=0

and

R = D ω.

(13.333)

To derive Cartan’s second structure equation (13.330), we let the exterior derivative act on the 1-form ωab d ωa b

= d (ωab

and add the 2-form ωac ∧ ωcb a

ω c

±

d x ±) =

a

ω b ±, k

∧ ω cb = ωac k ω cb

±

d x k ∧ d x±

(13.334)

dxk ∧ dx±

(13.335)

to get S ab

= (ωab ,k + ωac k ωcb ±

±

) dxk

∧ dx

(13.336)

±

which we want to show is Cartan’s curvature 2-form R ab (13.329). First we replace j ωa± b with its equivalent (13.324) c aj cbi ³ i ± − cai,± c ib S ab

=

Â

a

(c j

c ib ³

j



− cia,

× dxk ∧ dx

i ± cb ),k

±

.

+ ( caj cic ³ jik − cai,k cic )(cnc cbp ³n p − ccp, ±

p

Ã

± cb )

(13.337)

528

13 General Relativity j

The terms proportional to ³ i ±,k are equal to those in the definition (13.329) of Cartan’s curvature 2-form. Among the remaining terms in S ab , those independent of ³ are after explicit antisymmetrization

− cai, cic ccp,k cbp (13.338) which vanishes (Exercise 13.36) because cbi ,k = −cci c cp,k cbp . The terms in S ab that S0

= cia,k cib, − cai, ±

c ib,k + cia,k cci c cp,± c b

p

±

±

are linear in ³’s also vanish (Exercise 13.37). Finally, the terms in S ab that are quadratic in ³’s are c aj cci c cn cbp ³ j ik ³n p± d x k ∧ d x ±

= caj δin cbp ³ jik ³n p d xk ∧ d x = caj cbp ³ jnk ³n p d xk ∧ d x ±

±

±

±

(13.339)

and these match those of Cartan’s curvature 2-form R ab (13.329). Since S ab Cartan’s second equation of structure (13.330) follows.

= R ab ,

Example 13.27(Cyclic identity for the curvature tensor) We can use Cartan’s structure equations to derive the cyclic identity (13.109) of the curvature tensor. We apply the exterior derivative (whose square dd = 0) to Cartan’s first equation of structure (13.328) and then use it and his second equation of structure (13.330) to write the result as 0 = d (dc + ω ∧ c) = (dω) ∧ c − ω ∧ dc = (d ω + ω ∧ ω) ∧ c = R

∧ c.

(13.340)

The definition (13.329) of Cartan’s curvature 2-form R and of his 1-form (13.325) now give 1 j 0 = R ∧ c = caj cib R ik ± dx k ∧ dx ± ∧ cbm dx m 2 = 12 caj R j ik± dx k ∧ dx ± ∧ dx i

(13.341)

which implies that

¶ 1 µ j j j j j j j R ik ± + R ±ik + R k ± i − R ki ± − R i ± k − R ±ki . (13.342) 0 = R [ik ± ] = 3! But since Riemann’s tensor is antisymmetric in its last two indices (13.104), we can write this result more simply as the cyclic identity (13.109) for the curvature tensor 0 = R j ik ±

+ R j ik + R j k i . ±

±

(13.343)

The vanishing of the covariant derivative of the flat-space metric 0 = ηab;k

= ηab,k − ω cakηcb − ωcbk ηac = −ωbak − ωabk

(13.344)

13.53 Gauge Theory and Vectors

529

shows that the spin connection is antisymmetric in its Lorentz indexes ωabk

= − ωbak

and

ab

ω k

= − ωbak .

(13.345)

Under a general coordinate transformation and a local Lorentz transformation, the spin connection (13.324) transforms as

±a

ω b±

¹L a

i

= ∂∂xx±

±

d d ω ei

− (∂i L ae )

º L −1e . b

(13.346)

13.52 Spin-One-Half Fields in General Relativity

The action density (11.329) of a free Dirac field is L = −ψ¯ (γ a ∂a + m ) ψ in which a = 0, 1 , 2, 3; ψ is a 4-component Dirac field; ψ¯ = ψ †β = i ψ †γ 0 ; and m is a mass. Tetrads cak (x ) turn the flat-space indices a into curved-space indices i , so one first replaces γ a ∂a by γ a ca± ∂± . The next step is to use the spin connection (13.324) to correct for the effect of the derivative ∂± on the field ψ . The fully covariant derivative is D± = ∂± − 81 ωab± [γa , γb ] where ωab± = ωac ± ηbc , and the action density is L = − ψ¯ (γ a c ±a D± + m )ψ . 13.53 Gauge Theory and Vectors This section is optional on a first reading. We can formulate Yang–Mills theory in terms of vectors as we did relativity. To accomodate noncompact groups, we generalize the unitary matrices U ( x ) of the Yang–Mills gauge group to nonsingular matrices V (x ) that act on n matter fields ψ a ( x ) as ψ ±a (x ) = V ab ( x ) ψ b (x ). The field ·(x ) = ea ( x ) ψ a ( x ) will be gauge invariant · ± ( x ) = ·( x ) if the vectors e a ( x ) transform as e ±a ( x ) = eb ( x ) V −1b a ( x ). We are summing over repeated indices from 1 to n and often will suppress explicit mention of the spacetime coordinates. In this compressed notation, the field · is gauge invariant because

= e±a ψ ±a = eb V −1b a V ac ψ c = eb δ bc ψ c = eb ψ b = · which is e ± ψ ± = e V −1 V ψ = e ψ in matrix notation. ·

±

T

T

(13.347)

T

The inner product of two basis vectors is an internal “metric tensor” e∗ · e a

b

=

N ± α

=1

eaα∗ e αb

= gab

(13.348)

in which for simplicity I used the the N -dimensional identity matrix as the metric of the embedding space. As in relativity, we’ll assume the matrix gab to be nonsingular. We then can use its inverse to construct dual vectors e a = g ab e b that satisfy e a† · eb = δba .

530

13 General Relativity

The free Dirac action density of the invariant field

·

+ m )· = ψ a ea †(γ i ∂i + m ) eb ψ b = ψ a

a

i

·(γ ∂i

is the full action of the component fields ψ i

·(γ ∂i

¹

i

γ (δ b∂i

b

+ m )· = ψ a (γ i Diab + m δ ab)ψ b = ψ a

+ ea† · eb, i ) + m δ ab

º

ψ

b

(13.349)

¹γ i (δa ∂ + Aa ) + m δ a º ψ b b i ib b

(13.350) if we identify the gauge-field matrix as = e · eb,i in harmony with the definition (13.59) of the affine connection = ek · e±,i . ± − Under the gauge transformation ea = eb V 1b a , the metric matrix transforms as A ai b ³ik±

g ±ab = V −1c∗ a gcd V −1d

b

a†



or as

= V −1† g V −1

(13.351)

in matrix notation. Its inverse goes as g ±−1 = V g −1 V †. The gauge-field matrix Aai b = e a† · eb,i = g ac e†c · eb,i transforms as

= g±ace±a† · eb± ,i = V ac A cid V −b1d + V ac V −b1ci (13.352) or as A±i = V Ai V −1 + V ∂i V −1 = V Ai V −1 − (∂i V ) V −1 . By using the identity e a † · ec,i = − e a† ,i · e c , we may write (Exercise 13.39) the A ±iab

,

Faraday tensor as

Fiajb = [ Di , D j ]a b If n

= N , then

= ea,i† ·eb, j − ea,i† ·ec ec† ·eb, j −e,a†j ·eb,i +ea†, j ·ec ec† ·eb,i . n ± c=1

e αc e β c∗ =

δ

αβ

and

Fiajb

= 0.

(13.353)

(13.354)

The Faraday tensor vanishes when n = N because the dimension of the embedding space is too small to allow the tangent space to have different orientations at different points x of spacetime. The Faraday tensor, which represents internal curvature, therefore must vanish. One needs at least 3 dimensions in which to bend a sheet of paper. The embedding space must have N > 2 dimensions for SU (2), N > 3 for SU ( 3), and N > 5 for SU (5) . The covariant derivative of the internal metric matrix

= g,i − g Ai − A†i g (13.355) ( )± does not vanish and transforms as g;i = V −1†g ,i V −1 . A suitable action deng;i

sity for it is the trace Tr( g;i g −1 g ;i g −1). If the metric matrix assumes a (constant, hermitian) mean value g0 in the vacuum at low temperatures, then its action is m 2 Tr

Â

( g0 A i

+ Ai†g0) g0−1( g0 A i + Ai † g0 )g0−1

Ã

(13.356)

Exercises

531

which is a mass term for the matrix of gauge bosons Wi

= g01 2 A i g0−1 2 + g−0 1 2 A†i g01 2 . /

/

/

/

(13.357)

This mass mechanism also gives masses to the fermions. To see how, we write the Dirac action density (13.350) as

¹

i

a

ψ a γ (δ b ∂i

+ A ai b ) + m δ ab

º

ψ

b

= ψa

¹

i

γ (g ab ∂i

+ gac A ci b ) + m gab

º

b

ψ .

(13.358) Each fermion now gets a mass m ci proportional to an eigenvalue ci of the hermitian matrix g0 . This mass mechanism does not leave behind scalar bosons. Whether nature ever uses it is unclear. Further Reading Einstein Gravity in a Nutshell (Zee, 2013), Gravitation (Misner et al., 1973), Gravitation and Cosmology (Weinberg, 1972), Cosmology (Weinberg, 2010), General Theory of Relativity (Dirac, 1996), Spacetime and Geometry (Carroll, 2003), Exact Space-Times in Einstein’s General Relativity (Griffiths and Podolsky, 2009), Gravitation: Foundations and Frontiers (Padmanabhan, 2010), Modern Cosmology (Dodelson, 2003), The Primordial Density Perturbation: Cosmology, Inflation and the Origin of Structure (Lyth and Liddle, 2009), A First Course in General Relativity (Schutz, 2009), Gravity: An Introduction to Einstein’s General Relativity (Hartle, 2003), and Relativity, Gravitation and Cosmology: A Basic Introduction (Cheng, 2010). Exercises

13.1 Use the flat-space formula d p = xˆ d x + yˆ d y + ˆz dz to compute the change d p due to dρ , d φ, and dz, and so derive expressions for the orthonormal basis vectors ρˆ, φˆ , and zˆ in terms of xˆ , yˆ , and zˆ. 13.2 Similarly, compute the change d p due to dr , d θ , and d φ , and so derive expressions for the orthonormal basis vectors rˆ , θˆ, and φˆ in terms of xˆ , yˆ , and ˆz. 13.3 (a) Using the formulas you found in Exercise 13.2 for the basis vectors of spherical coordinates, compute the derivatives of the unit vectors rˆ , θˆ, and φˆ with respect to the variables r, θ , and φ and express them in terms of the basis vectors rˆ , θˆ, and φˆ . (b) Using the formulas of (a) and our expression (2.16) for the gradient in spherical coordinates, derive the formula (2.33) for the laplacian ∇ · ∇.

532

13 General Relativity

13.4 Show that for any set of basis vectors v1 , . . . , vn and an inner product that is either positive definite (1.78–1.81) or indefinite (1.78–1.79, 1.81, and 1.84), the inner products gik = (vi , vk ) define a matrix g ik that is nonsingular and that therefore has an inverse. Hint: Show that the matrix gik cannot have a zero eigenvalue without violating either the condition (1.80) that it be positive definite or the condition (1.84) that it be nondegenerate. 13.5 Show that the inverse metric (13.48) transforms as a rank-2 contravariant tensor. 13.6 Show that if A k is a covariant vector, then gik Ak is a contravariant vector. 13.7 Show that in terms of the parameter k = (a / R ) 2 , the metric and line element (13.46) are given by (13.47). 13.8 Show that the connection ³ik± transforms as (13.66) and so is not a tensor. 13.9 Use the vanishing (13.83) of the covariant derivative of the metric tensor, to write the condition (13.140) in terms of the covariant derivatives of the symmetry vector (13.141). 13.10 Embed the points p = R (cosh θ , sinh θ cos φ , sinh θ sin φ) with tangent vectors (13.44) and line element (13.45) in the euclidian space E3 . Show that the line element of this embedding is ds 2

ds² = R²(cosh²θ + sinh²θ) dθ² + R² sinh²θ dφ² = a²[(1 + 2kr²)/(1 + kr²) dr² + r² dφ²]

(13.359)

which describes a hyperboloid that is not maximally symmetric. 13.11 If you have Mathematica, imitate Example 13.15 and find the scalar curvature R (13.113) of the line element (13.359) of the cylindrical hyperboloid embedded in euclidian 3-space E3 . 13.12 Consider the torus with coordinates θ , φ labeling the arbitrary point p

= (cos φ (R + r sin θ ), sin φ ( R + r sin θ ), r cos θ )

(13.360)

in which R > r. Both θ and φ run from 0 to 2π . (a) Find the basis vectors eθ and eφ . (b) Find the metric tensor and its inverse. 13.13 For the same torus, (a) find the dual vectors e θ and eφ and (b) find the nonzero connections ³ ijk where i, j , k take the values θ and φ . 13.14 For the same torus, (a) find the two Christoffel matrices ³θ and ³φ , (b) find φ θ , and their commutator [³θ , ³φ ], and (c) find the elements R θθ θ θ , R θ φθ , Rφθ φ φ

R φφφ of the curvature tensor. 13.15 Find the curvature scalar R of the torus with points (13.360). Hint: In these four problems, you may imitate the corresponding calculation for the sphere in Section 13.23.

Exercises

533

13.16 Show that δ g ik = −g is g kt δg st or equivalently that dgik = − g is g kt dgst by differentiating the identity g ik g k± = δ±i . 13.17 Let gik = ηik + h ik and x ±n = x n + ² n . To lowest order in ² and h, (a) show that in the x ± coordinates h ±ik = h ik − ²i ,k − ²k,i and (b) find an equation for 1 ² that puts h ± in de Donder’s gauge h ±ik,i = 2 (η j ± h ±j ± ),k . 13.18 Just to get an idea of the sizes involved in black holes, imagine an isolated sphere of matter of uniform density ρ that as an initial condition is all at rest within a radius rb . Its radius will be less than its Schwarzschild radius if rb


0 is at t = 0 to derive the formula a (t ) = c cosh( gt )/( Lg ). 13.30 Use the Friedmann equations (13.273 and 13.275) with w = −1, ρ constant, k = −1, and the boundary conditions a (0) = 0 and a˙ ( 0) > 0 to derive the formula a (t ) = c sinh(gt )/( Lg) where again g 2 = 8π Gρ/3. 13.31 Use the Friedmann equations (13.273 and 13.275) with w = −1, ρ constant, and k = 0 to derive the formula a ( t ) = a ( 0) e ³gt . 13.32 Use the constancy of 8π Gρ a 4 /3 = f 2 for radiation (w = 1/3 ) and the Friedmann equations (13.273 and 13.275) to show that if k = 0, a (0) = 0, √ and a ( t ) > 0, then a ( t ) = 2 f t where f > 0. 13.33 Show that if the matrix U ( x ) is nonsingular, then

= − U ∂i U −1. (13.367) 13.34 The gauge-field matrix is a linear combination Ak = − ig t b A bk of the gen(∂i

U ) U −1

erators t b of a representation of the gauge group. The generators obey the commutation relations

[t a , t b] = i fabct c

(13.368)

in which the fabc are the structure constants of the gauge group. Show that under a gauge transformation (13.311) A ±i

= U Ai U −1 − (∂i U ) U −1 (13.369) = exp(−ig λat a ) in which λ a is infinitesimal, the

by the unitary matrix U gauge-field matrix Ai transforms as

Exercises

535

− ig A ±ia t a = −ig Aai t a − ig 2 f abcλa Abit c + ig∂i λa t a.

(13.370)

Show further that the gauge field transforms as A±ia

= Aai − ∂i λa − g fabc Abiλc .

(13.371)

13.35 Show that if the vectors ea ( x ) are orthonormal, then ea† · ec,i = −e,a† i · ec . p a a i i i 13.36 Use the equation 0 = δb ,k = (c i cb ) ,k to show that cb ,k = −c c c cp ,k c b . Then use this result to show that the ³ -free terms S0 (13.338) vanish. 13.37 Show that terms in S ab (13.337) linear in the ³’s vanish. 13.38 Derive the formula (13.346) for how the spin connection (13.324) changes under a Lorentz transformation and a general change of coordinates. 13.39 Use the identity of Exercise 13.35 to derive the formula (13.353) for the nonabelian Faraday tensor. 13.40 Show that the dual tetrads cai = g ik ηab cbk are dual (13.155). 13.41 Write Dirac’s action density in the explicitly hermitian form ¹ º† L D = − 21 ψ γ i ∂i ψ − 12 ψ γ i ∂i ψ in which the field ψ has the invariant

¹

º†

form ψ = ea ψa and ψ = i ψ † γ 0 . Use the identity ψ a γ i ψb = −ψ b γ i ψa to show that the gauge-field matrix A i defined as the coefficient of ψ a γ i ψb as in ψ a γ i (∂i + i Aiab )ψb is hermitian A ∗iab = Aiba .

14 Forms

14.1 Exterior Forms 1-Forms: A 1-form is a linear function ω that maps vectors into numbers. Thus, if A and B are vectors in Rn and z and w are numbers, then ω (z A

+ w B ) = z ω ( A) + w ω ( B ).

(14.1)

The n coordinates x 1 , . . . , x n are 1-forms; they map a vector A into its coordinates: x 1 ( A) = A 1, . . . , x n ( A ) = An . Every 1-form may be expanded in terms of these basic 1-forms as ω

= B1 x1 + · · · + Bn xn

(14.2)

so that ω ( A)

= B1 x1 ( A) + · · · + Bn xn ( A) = B1 A1 + · · · + Bn A n = ( B , A) = B · A.

(14.3)

Thus, every 1-form is associated with a (dual) vector, in this case B. 2-Forms: A 2-form is a function that maps pairs of vectors into numbers linearly and skew-symmetrically. Thus, if A, B, and C are vectors in Rn and z and w are numbers, then ω2 (z A

+ w B , C ) = z ω2 ( A , C ) + w ω 2 ( B , C ) ω 2( A , B ) = − ω2 ( B , A ).

(14.4)

One often drops the superscript and writes the addition of two 2-forms as (ω1

536

+ ω2 )( A , B ) = ω1( A , B ) + ω2( A , B ).

(14.5)

14.1 Exterior Forms

537

Example 14.1(Parallelogram) The oriented areaof the parallelogram defined by two 2-vectors A and B is the determinant

±± ω ( A , B ) = ±±

A1 B1

A2 B2

±± ±± .

(14.6)

This 2-form maps the ordered pair of vectors ( A , B ) into the oriented area ( ± the usual area) of the parallelogram they describe. To check that this 2-form gives the area to within a sign, rotate the coordinates so that the 2-vector A runs from the origin along the x-axis. Then A 2 = 0, and the 2-form gives A1 B2 which is the base A1 of the parallelogram times its height B 2. Example 14.2(Parallelepiped) The triple scalar productof three 3-vectors

±± ± 2 ω A ( B , C ) = A · B × C = ±± ±

A1 B1 C1

A2 B2 C2

A3 B3 C3

±± ±± 3 ±± = ω ( A, B, C )

(14.7)

is both a 2-form that depends upon the vector A and also a 3-form that maps the triplet of vectors A , B , C into the signed volume of their parallelepiped.

k-Forms: A k-form (or an exterior form of degreek ) is a linear function of k vectors that is antisymmetric. For vectors A 1, . . . , A k and numbers z and w

²

ω ( z A1

+ w A ²²1 , A2 , . . . , A k ) = z ω ( A ²1, A 2, . . . , Ak ) + w ω ( A ²²1 , A2 , . . . , Ak )

and the interchange of any two vectors makes a minus sign ω ( A2 , A1 , . . . ,

Ak ) = −ω ( A 1 , A 2 , . . . , Ak ).

(14.8) (14.9)

Exterior Product of Two 1-Forms : The 1-form ω1 maps the vectors A and B into the numbers ω1( A ) and ω1 ( B ), and the 1-form ω2 does the same thing with 1 → 2. The value of the exterior product ω1 ∧ ω2 on the two vectors A and B is the 2-form defined by the 2 × 2 determinant

±± ω ( A) 1 ω1 ∧ ω2 ( A , B ) = ±± ω1 ( B )

or more formally ω1

±± ± = ω1( A )ω2( B ) − ω2 ( A)ω 1( B ) ω2 ( B ) ± ω2 ( A)

∧ ω2 = ω1 ⊗ ω2 − ω2 ⊗ ω1 .

The most general 2-formon xi ∧ x j ω2

Rn

=

(14.10)

(14.11)

is a linear combination of the basic 2-forms

² 1≤i 1/2) ≈ √ − erf √ σ 2 σ 2 = 0. 999999999946. 1 erf 2

(15.103)

15.8 Error Analysis

The mean value f¯ = ² f ³ of a smooth function f ( x ) of a random variable x is f¯ =

≈ =

´

f ( x ) P (x ) d x

´Ã

Ä

1 f (µ) + ( x − µ) f ± (µ) + ( x − µ)2 f ±± (µ) P ( x ) d x

f (µ) +

1 2 ±± σ f (µ) 2

2

(15.104)

584

15 Probability and Statistics

as long as the higher central moments small. The mean value of f 2 then is

² f 2³ =

´

νn

and the higher derivatives f (n ) (µ) are

f 2( x ) P ( x ) d x

´Ã

1 f (µ) + ( x − µ) f ± (µ) + (x

Ä2

− µ) P (x ) d x 2 ´¸ ¹ ≈ f 2(µ) + ( x − µ)2 f ± 2 (µ) + ( x − µ)2 f (µ) f ±±(µ) P (x ) d x ≈

=

2

f ±± (µ)

f 2 (µ) + σ 2 f ± 2 (µ) + σ 2 f (µ) f ±± (µ).

(15.105)

Subtraction of f¯2 gives the variance of the variable f (x ) 2

σf

( ) = ² f − f¯ 2 ³ = ² f 2 ³ − f¯ 2 ≈ σ 2 f ± 2(µ).

(15.106)

A similar formula gives the variance of a smooth function f (x 1 , . . . , x n ) of several independent variables x 1 , . . . , x n as

¶ ∂ f (x) ·2ÇÇ n ± ( ) ÇÇ 2 2 2 2 2 σ f = ² f − f¯ ³ = ² f ³ − f¯ ≈ σi Ç ∂x i =1

i

x =¯x

(15.107)

in which x¯ is the vector (µ1, . . . , µn ) of mean values, and σi2 = ²( xi − µi ) 2³ is the variance of x i . This formula (15.107) implies that the variance of a sum f (x , y ) = c x + d y is 2

σcx+dy

= c2σ x2 + d 2 σy2.

(15.108)

Similarly, the variance formula (15.107) gives as the variance of a product f (x , y ) = x y σx2y

= σx2 µ2y + σ y2 µ2x = µ2x µ2y

and as the variance of a ratio f ( x , y ) = x / y 2 σx / y

σ x2

= µ2 + y

The variance of a power f ( x ) function of a single variable

2 2 µx σy µ 4y

= 2

σx a

µ2x

= µ2

º

y

º

σ x2

µ2x

σx2

µ2x

σ y2

+ µ2

»

(15.109)

y

σy2

+ µ2

» .

y

(15.110)

x a follows from the variance (15.106) of a

= σx2

(

a µax −1

)2

.

In general, the standard deviation σ is the square root of the variance σ 2.

(15.111)

15.9 Maxwell–Boltzmann Distribution

585

Example 15.17(Photon density) The 2009 COBE/FIRAS measurement of the temperature of the cosmic microwave background (CMB) radiation is T0 = 2.7255 ¶ 0.0006 K. The mass density (5.110) of these photons is 5

ργ

4

= σ 8π15h(k3BcT50 ) = 4.6451 × 10−31 kg m−3.

(15.112)

Our formula (15.111) for the variance of a power says that the standard deviation σρ of the photon density is its temperature derivative times the standard deviation σT of the temperature 4σ T σρ = ρ γ = 0.00088 ργ . (15.113) T0 So the probability that the photon mass density lies within the range

= (4.6451 ¶ 0. 0041) × 10 −31 kg m−3

ργ

(15.114)

is 0.68.

15.9 Maxwell–Boltzmann Distribution It is a small jump from Gauss’s distribution (15.84) to the Maxwell–Boltzmann distribution of velocities of molecules in a gas. We start in 1 dimension and focus on a single molecule that is being knocked forward and backward with equal probabilities by other molecules. If each tiny hit increases or decreases its speed by d v, then after n hits from behind and N − n hits from in front, the speed v x of a molecule initially at rest would be vx

= nd v − ( N − n)dv = (2n − N )d v.

(15.115)

The probability of this speed is given by Gauss’s approximation (15.81) to the binomial distribution Pb (n , 21 , N ) as 1 PbG ( n , , N ) = 2

¾

2 exp πN

¶ (2n − N )2 · ¾ 2 − 2N = πN

exp



v 2x

− 2 N d v2

·

.

(15.116)

In this formula, the product N d v 2 is the variance σv2x which is the mean value ²v 2x ³ because ²v x ³ = 0. Kinetic theory says that this variance σv2x = ²v 2x ³ is ²v 2x ³ = kT / m in which m is the mass of the molecule, k Boltzmann’s constant, and T the temperature. So the probability of the molecule’s having velocity vx is the Maxwell–Boltzmann distribution PG (v x ) =

1

σv





¶ v2 · ¾ m exp − x2 = 2σ 2π kT v

¶ mv2 · exp − x 2kT

(15.117)

586

15 Probability and Statistics

when normalized over the line −∞ < vx < ∞. In three space dimensions, the Maxwell–Boltzmann distribution PM B (v ) is the product

m À3/2 − 12 m v 2/(kT ) e 4π v 2d v. 2π kT (15.118) The mean value of the velocity of a Maxwell–Boltzmann gas vanishes PM B (v )d 3 v = PG (vx ) PG (v y ) PG (v z ) d 3 v =

²v ³ =

´

v

¿

PM B (v )d 3v

=0

but the mean value of the square of the velocity v 2 variances σ x2 = σ y2 = σz2 = kT / m

² v ³ = V [v ] = 2

2

´

2

v

(15.119)

= v · v is the sum of the three

PM B (v ) d 3v

= 3kT / m

(15.120)

which is the familiar statement 3 1 m ²v 2³ = kT 2 2

(15.121)

that each degree of freedom gets kT /2 of energy. 15.10 Fermi–Dirac and Bose–Einstein Distributions The commutation and anticommutation relations (11.133) ψs ( t,

x )ψs ± (t , x ± ) − (−1) 2 j ψs ± ( t , x ± )ψs (t , x ) = 0

(15.122)

of Bose fields (−1) 2 j = 1 and of Fermi fields (−1)2 j = −1 determine the statistics of bosons and fermions. One can put any number N n of noninteracting bosons, such as photons or gravitons, into any state |n ³ of energy E n . The energy of that state is N n E n , and the states |n, Nn ³ form an independent thermodynamical system for each state |n³ . The grand canonical ensemble (1.395) gives the probability of the state |n , Nn ³ as ρn, Nn

−β(E n−µ) Nn

= ²n , Nn |ρ |n, Nn ³ = ∑ e²n, N |ρ |n , N ³ . n n N

(15.123)

n

For each state |n ³, the partition function is a geometric sum Z (β, µ, n ) =

∞ ±

Nn =0

e−β( En −µ) Nn

=

∞ ( ± −

Nn =0

e

β( E n

−µ) )Nn

= 1 − e−1 E − β( n

µ)

.

(15.124)

15.11 Diffusion

So the probability of the state |n , N n ³ is

587

−β(E n −µ) Nn

= 1e− e− E − and the mean number of bosons in the state |n ³ is ρn, Nn

β( n

µ)

² N n ³ = β1 ∂∂µ ln Z (β, µ, n) = e

(15.125)

,

1

β( E n

−µ) − 1 .

(15.126)

−µ) ,

(15.127)

One can put at most one fermion into a given state |n ³. If like neutrinos the fermions don’t interact, then the states |n , 0³ and |n , 1 ³ form an independent thermodynamical system for each state |n ³. So for noninteracting fermions, the partition function is the sum of only two terms Z (β, µ, n ) =

1 ±

Nn =0

= 1 + e−

e −β(E n −µ) Nn

β( E n

and the probability of the state |n , Nn ³ is ρn, Nn

−β( En −µ) Nn . β( E n −µ)

= 1e+ e−

(15.128)

So the mean number of fermions in the state |n ³ is

² Nn ³ = = e

1

β( E n

(15.129)

−µ) + 1 .
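As a numeric check of the mean occupancies (15.126) and (15.129), one can sum the grand canonical series for a single state directly and compare with the closed forms. This is a minimal Python sketch, not from the text; the values of beta, E_n, and mu are arbitrary illustrative choices:

```python
import math

def mean_bosons(beta, e, mu, nmax=2000):
    """Mean occupancy <N_n> of a boson state of energy e, chemical potential mu,
    computed by summing the grand-canonical series (15.123) directly."""
    w = [math.exp(-beta * (e - mu) * n) for n in range(nmax)]
    z = sum(w)                                   # partition function (truncated)
    return sum(n * wn for n, wn in enumerate(w)) / z

def mean_fermions(beta, e, mu):
    """Mean occupancy of a fermion state: only N_n = 0, 1 contribute."""
    w = math.exp(-beta * (e - mu))
    return w / (1.0 + w)

beta, e, mu = 2.0, 1.5, 0.5                      # arbitrary illustrative values
bose_exact = 1.0 / (math.exp(beta * (e - mu)) - 1.0)    # (15.126)
fermi_exact = 1.0 / (math.exp(beta * (e - mu)) + 1.0)   # (15.129)
print(abs(mean_bosons(beta, e, mu) - bose_exact) < 1e-9)    # True
print(abs(mean_fermions(beta, e, mu) - fermi_exact) < 1e-12)  # True
```

The geometric series converges so fast here that truncating it at 2000 terms is far below double precision.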

15.11 Diffusion

We may apply the same reasoning as in the preceding section (15.9) to the diffusion of a gas of particles treated as a random walk with step size $dx$. In 1 dimension, after $n$ steps forward and $N - n$ steps backward, a particle starting at $x = 0$ is at $x = (2n - N)\,dx$. Thus as in (15.116), the probability of being at $x$ is given by Gauss's approximation (15.81) to the binomial distribution $P_b(n, \tfrac{1}{2}, N)$ as

\[ P_b^G\big(n, \tfrac{1}{2}, N\big) = \sqrt{\frac{2}{\pi N}}\, \exp\left( - \frac{(2n-N)^2}{2N} \right) = \sqrt{\frac{2}{\pi N}}\, \exp\left( - \frac{x^2}{2N\,dx^2} \right). \tag{15.130} \]

In terms of the diffusion constant

\[ D = \frac{N\,dx^2}{2t} \tag{15.131} \]

this distribution is

\[ P_G(x) = \left( \frac{1}{4\pi D t} \right)^{1/2} \exp\left( - \frac{x^2}{4Dt} \right) \tag{15.132} \]

when normalized to unity on $(-\infty, \infty)$. In 3 dimensions, this gaussian distribution is the product

\[ P(r, t) = P_G(x)\, P_G(y)\, P_G(z) = \left( \frac{1}{4\pi D t} \right)^{3/2} \exp\left( - \frac{r^2}{4Dt} \right). \tag{15.133} \]

The variance $\sigma^2 = 2Dt$ gives the average of the squared displacement of each of the three coordinates. Thus the mean of the squared displacement $\langle r^2 \rangle$ rises linearly with the time as

\[ \langle r^2 \rangle = V[r] = 3\sigma^2 = \int r^2\, P(r, t)\, d^3r = 6\,D\,t. \tag{15.134} \]

The distribution $P(r, t)$ satisfies the diffusion equation

\[ \dot P(r, t) = D\, \nabla^2 P(r, t) \tag{15.135} \]

in which the dot means time derivative.
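The random-walk picture behind (15.130)–(15.134) is easy to simulate. The sketch below is not from the text, and the step and walker counts are arbitrary; it checks that the mean squared displacement of an unbiased walk of $N$ steps of size $dx$ is $N\,dx^2 = 2Dt$:

```python
import random

random.seed(42)
n_steps, dx, n_walkers = 400, 1.0, 5000

# each walker takes n_steps unbiased steps of size +-dx from the origin
sq = 0.0
for _ in range(n_walkers):
    x = sum(random.choice((-dx, dx)) for _ in range(n_steps))
    sq += x * x
msd = sq / n_walkers   # Monte Carlo estimate of <x^2>

# (15.131) gives D = N dx^2 / (2t), so <x^2> = 2 D t = n_steps * dx^2
print(msd / (n_steps * dx**2))   # close to 1 (within a few percent)
```

With 5000 walkers the statistical error of the ratio is about 2 percent, so it should land well within 10 percent of 1.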

15.12 Langevin's Theory of Brownian Motion

Einstein made the first theory of brownian motion in 1905, but Langevin's approach (Langevin, 1908) is simpler. A tiny particle of colloidal size and mass $m$ in a fluid is buffeted by a force $F(t)$ due to the $10^{21}$ collisions per second it suffers with the molecules of the surrounding fluid. Its equation of motion is

\[ m\, \frac{d\boldsymbol{v}(t)}{dt} = \boldsymbol{F}(t). \tag{15.136} \]

Langevin suggested that the force $\boldsymbol{F}(t)$ is the sum of a viscous drag $-\boldsymbol{v}(t)/B$ and a rapidly fluctuating part $\boldsymbol{f}(t)$

\[ \boldsymbol{F}(t) = -\,\boldsymbol{v}(t)/B + \boldsymbol{f}(t) \tag{15.137} \]

so that

\[ m\, \frac{d\boldsymbol{v}(t)}{dt} = -\,\frac{\boldsymbol{v}(t)}{B} + \boldsymbol{f}(t). \tag{15.138} \]

The parameter $B$ is called the mobility. The ensemble average (the average over all the particles) of the fluctuating force $\boldsymbol{f}(t)$ is zero

\[ \langle \boldsymbol{f}(t) \rangle = 0. \tag{15.139} \]

Thus the ensemble average of the velocity satisfies

\[ m\, \frac{d\langle \boldsymbol{v} \rangle}{dt} = -\,\frac{\langle \boldsymbol{v} \rangle}{B} \tag{15.140} \]

whose solution with $\tau = m B$ is

\[ \langle \boldsymbol{v}(t) \rangle = \langle \boldsymbol{v}(0) \rangle\, e^{-t/\tau}. \tag{15.141} \]

The instantaneous equation (15.138) divided by the mass $m$ is

\[ \frac{d\boldsymbol{v}(t)}{dt} = -\,\frac{\boldsymbol{v}(t)}{\tau} + \boldsymbol{a}(t) \tag{15.142} \]

in which $\boldsymbol{a}(t) = \boldsymbol{f}(t)/m$ is the acceleration. The ensemble average of the scalar product of the position vector $\boldsymbol{r}$ with this equation is

\[ \left\langle \boldsymbol{r} \cdot \frac{d\boldsymbol{v}}{dt} \right\rangle = -\,\frac{\langle \boldsymbol{r} \cdot \boldsymbol{v} \rangle}{\tau} + \langle \boldsymbol{r} \cdot \boldsymbol{a} \rangle. \tag{15.143} \]

But since the ensemble average $\langle \boldsymbol{r} \cdot \boldsymbol{a} \rangle$ of the scalar product of the position vector $\boldsymbol{r}$ with the random, fluctuating part $\boldsymbol{a}$ of the acceleration vanishes, we have

\[ \left\langle \boldsymbol{r} \cdot \frac{d\boldsymbol{v}}{dt} \right\rangle = -\,\frac{\langle \boldsymbol{r} \cdot \boldsymbol{v} \rangle}{\tau}. \tag{15.144} \]

Now

\[ \frac{1}{2}\, \frac{d r^2}{dt} = \frac{1}{2}\, \frac{d}{dt}\, ( \boldsymbol{r} \cdot \boldsymbol{r} ) = \boldsymbol{r} \cdot \boldsymbol{v} \tag{15.145} \]

and so

\[ \frac{1}{2}\, \frac{d^2 r^2}{dt^2} = \boldsymbol{r} \cdot \frac{d\boldsymbol{v}}{dt} + v^2. \tag{15.146} \]

The ensemble average of this equation is

\[ \frac{d^2 \langle r^2 \rangle}{dt^2} = 2 \left\langle \boldsymbol{r} \cdot \frac{d\boldsymbol{v}}{dt} \right\rangle + 2\, \langle v^2 \rangle \tag{15.147} \]

or in view of (15.144)

\[ \frac{d^2 \langle r^2 \rangle}{dt^2} = -\,2\, \frac{\langle \boldsymbol{r} \cdot \boldsymbol{v} \rangle}{\tau} + 2\, \langle v^2 \rangle. \tag{15.148} \]

We now use (15.145) to replace $\langle \boldsymbol{r} \cdot \boldsymbol{v} \rangle$ with half the first time derivative of $\langle r^2 \rangle$ so that we have

\[ \frac{d^2 \langle r^2 \rangle}{dt^2} = -\,\frac{1}{\tau}\, \frac{d \langle r^2 \rangle}{dt} + 2\, \langle v^2 \rangle. \tag{15.149} \]

If the fluid is in equilibrium, then the ensemble average of $v^2$ is given by the Maxwell–Boltzmann value (15.121)

\[ \langle v^2 \rangle = \frac{3kT}{m} \tag{15.150} \]

and so the acceleration (15.149) of $\langle r^2 \rangle$ is

\[ \frac{d^2 \langle r^2 \rangle}{dt^2} + \frac{1}{\tau}\, \frac{d \langle r^2 \rangle}{dt} = \frac{6kT}{m} \tag{15.151} \]

which we can integrate. The general solution (7.12) to a second-order linear inhomogeneous differential equation is the sum of any particular solution to the inhomogeneous equation plus the general solution of the homogeneous equation. The function $\langle r^2(t) \rangle_{pi} = 6kT\,\tau\,t/m$ is a particular solution of the inhomogeneous equation. The general solution to the homogeneous equation is $\langle r^2(t) \rangle_{gh} = U + W \exp(-t/\tau)$ where $U$ and $W$ are constants. So $\langle r^2(t) \rangle$ is

\[ \langle r^2(t) \rangle = U + W\, e^{-t/\tau} + 6kT\,\tau\,t/m \tag{15.152} \]

where $U$ and $W$ make $\langle r^2(t) \rangle$ fit the boundary conditions. If the individual particles start out at the origin $\boldsymbol{r} = 0$, then one boundary condition is

\[ \langle r^2(0) \rangle = 0 \tag{15.153} \]

which implies that

\[ U + W = 0. \tag{15.154} \]

And since the particles start out at $\boldsymbol{r} = 0$ with an isotropic distribution of initial velocities, the formula (15.145) for $\dot r^2$ implies that at $t = 0$

\[ \left. \frac{d \langle r^2 \rangle}{dt} \right|_{t=0} = 2\, \langle \boldsymbol{r}(0) \cdot \boldsymbol{v}(0) \rangle = 0. \tag{15.155} \]

This boundary condition means that our solution (15.152) must satisfy

\[ \left. \frac{d \langle r^2(t) \rangle}{dt} \right|_{t=0} = -\,\frac{W}{\tau} + \frac{6kT\,\tau}{m} = 0. \tag{15.156} \]

Thus $W = -U = 6kT\tau^2/m$, and so our solution (15.152) is

\[ \langle r^2(t) \rangle = \frac{6kT\,\tau^2}{m} \left( \frac{t}{\tau} + e^{-t/\tau} - 1 \right). \tag{15.157} \]

At times short compared to $\tau$, the first two terms in the power series for the exponential $\exp(-t/\tau)$ cancel the terms $-1 + t/\tau$, leaving

\[ \langle r^2(t) \rangle = \frac{6kT\,\tau^2}{m} \left( \frac{t^2}{2\tau^2} \right) = \frac{3kT}{m}\, t^2 = \langle v^2 \rangle\, t^2. \tag{15.158} \]

But at times long compared to $\tau$, the exponential vanishes, leaving

\[ \langle r^2(t) \rangle = \frac{6kT\,\tau}{m}\, t = 6\,B\,kT\, t. \tag{15.159} \]

The diffusion constant $D$ is defined by

\[ \langle r^2(t) \rangle = 6\,D\,t \tag{15.160} \]

and so we arrive at Einstein's relation

\[ D = B\,kT \tag{15.161} \]

which often is written in terms of the viscous-friction coefficient $f_v$

\[ f_v \equiv \frac{1}{B} = \frac{m}{\tau} \tag{15.162} \]

as

\[ f_v\, D = kT. \tag{15.163} \]

This equation expresses Boltzmann's constant $k$ in terms of three quantities $f_v$, $D$, and $T$ that were accessible to measurement in the first decade of the twentieth century. It enabled scientists to measure Boltzmann's constant $k$ for the first time. And since Avogadro's number $N_A$ was the known gas constant $R$ divided by $k$, the number of molecules in a mole was revealed to be $N_A = 6.022 \times 10^{23}$. Chemists could then divide the mass of a mole of any pure substance by $6.022 \times 10^{23}$ and find the mass of the molecules that composed it. Suddenly the masses of the molecules of chemistry became known, and molecules were recognized as real particles and not tricks for balancing chemical equations.

15.13 Einstein–Nernst Relation

If a particle of mass $m$ carries an electric charge $q$ and is exposed to an electric field $\boldsymbol{E}$, then in addition to the viscous drag $-\boldsymbol{v}/B$ and the random buffeting $\boldsymbol{f}$, the constant force $q\boldsymbol{E}$ acts on it

\[ m\, \frac{d\boldsymbol{v}}{dt} = -\,\frac{\boldsymbol{v}}{B} + q\boldsymbol{E} + \boldsymbol{f}. \tag{15.164} \]

The mean value of its velocity satisfies the differential equation

\[ \left\langle \frac{d\boldsymbol{v}}{dt} \right\rangle = -\,\frac{\langle \boldsymbol{v} \rangle}{\tau} + \frac{q\boldsymbol{E}}{m} \tag{15.165} \]

where $\tau = m B$. A particular solution of this inhomogeneous equation is

\[ \langle \boldsymbol{v}(t) \rangle_{pi} = \frac{q\,\tau\,\boldsymbol{E}}{m} = q\,B\,\boldsymbol{E}. \tag{15.166} \]

The general solution of its homogeneous version is $\langle \boldsymbol{v}(t) \rangle_{gh} = \boldsymbol{A} \exp(-t/\tau)$ in which the constant $\boldsymbol{A}$ is chosen to give $\langle \boldsymbol{v}(0) \rangle$ at $t = 0$. So by (7.12), the general solution $\langle \boldsymbol{v}(t) \rangle$ to Equation (15.165) is (Exercise 15.17) the sum of $\langle \boldsymbol{v}(t) \rangle_{pi}$ and $\langle \boldsymbol{v}(t) \rangle_{gh}$

\[ \langle \boldsymbol{v}(t) \rangle = q\,B\,\boldsymbol{E} + \big( \langle \boldsymbol{v}(0) \rangle - q\,B\,\boldsymbol{E} \big)\, e^{-t/\tau}. \tag{15.167} \]

By applying the tricks of the previous section (15.12), one may show (Exercise 15.18) that the variance of the position $\boldsymbol{r}$ about its mean $\langle \boldsymbol{r}(t) \rangle$ is

\[ \left\langle \big( \boldsymbol{r} - \langle \boldsymbol{r}(t) \rangle \big)^2 \right\rangle = \frac{6kT\,\tau^2}{m} \left( \frac{t}{\tau} - 1 + e^{-t/\tau} \right) \tag{15.168} \]

where $\langle \boldsymbol{r}(t) \rangle = (q\,\tau^2\,\boldsymbol{E}/m) \left( t/\tau - 1 + e^{-t/\tau} \right)$ if $\langle \boldsymbol{r}(0) \rangle = \langle \boldsymbol{v}(0) \rangle = 0$. So for times $t \gg \tau$, this variance is

\[ \left\langle \big( \boldsymbol{r} - \langle \boldsymbol{r}(t) \rangle \big)^2 \right\rangle = \frac{6kT\,\tau}{m}\, t. \tag{15.169} \]

Since the diffusion constant $D$ is defined by (15.160) as

\[ \left\langle \big( \boldsymbol{r} - \langle \boldsymbol{r}(t) \rangle \big)^2 \right\rangle = 6\,D\,t \tag{15.170} \]

we arrive at the Einstein–Nernst relation

\[ D = \frac{kT\,\tau}{m} = kT\,B = \frac{\mu\, kT}{q} \tag{15.171} \]

in which the electric mobility is $\mu = q B$.
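A quick numerical check of the relaxation formula (15.167): integrating the averaged equation of motion (15.165) with small Euler steps reproduces the exact solution, and the mean velocity relaxes to the drift value $qBE$. This is a sketch, not from the text, in one dimension with arbitrary parameter values:

```python
import math

# illustrative parameters in arbitrary units
m, B, q, E, v0 = 2.0, 0.5, 1.0, 3.0, 0.0
tau = m * B

def v_exact(t):
    """Mean velocity (15.167): relaxes from v(0) to the drift value qBE."""
    return q * B * E + (v0 - q * B * E) * math.exp(-t / tau)

# integrate m dv/dt = -v/B + qE with small forward-Euler steps
dt, v, t = 1e-4, v0, 0.0
while t < 5.0:
    v += dt * (-v / B + q * E) / m
    t += dt

print(abs(v - v_exact(t)) < 1e-3)                  # True: matches (15.167)
print(abs(v_exact(50 * tau) - q * B * E) < 1e-9)   # True: drift velocity qBE
```

The global Euler error here is of order $dt$, a few parts in $10^4$, well inside the tolerance.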

15.14 Fluctuation and Dissipation

Let's look again at Langevin's equation (15.142)

\[ \frac{d\boldsymbol{v}(t)}{dt} + \frac{\boldsymbol{v}(t)}{\tau} = \boldsymbol{a}(t). \tag{15.172} \]

If we multiply both sides by the exponential $\exp(t/\tau)$

\[ \left( \frac{d\boldsymbol{v}}{dt} + \frac{\boldsymbol{v}}{\tau} \right) e^{t/\tau} = \frac{d}{dt} \left( \boldsymbol{v}\, e^{t/\tau} \right) = \boldsymbol{a}(t)\, e^{t/\tau} \tag{15.173} \]

and integrate from $0$ to $t$

\[ \int_0^t \frac{d}{dt'} \left( \boldsymbol{v}\, e^{t'/\tau} \right) dt' = \boldsymbol{v}(t)\, e^{t/\tau} - \boldsymbol{v}(0) = \int_0^t \boldsymbol{a}(t')\, e^{t'/\tau}\, dt' \tag{15.174} \]

then we get

\[ \boldsymbol{v}(t) = e^{-t/\tau}\, \boldsymbol{v}(0) + e^{-t/\tau} \int_0^t \boldsymbol{a}(t')\, e^{t'/\tau}\, dt'. \tag{15.175} \]

Thus the ensemble average of the square of the velocity is

\[ \langle v^2(t) \rangle = e^{-2t/\tau}\, \langle v^2(0) \rangle + 2\, e^{-2t/\tau} \int_0^t \langle \boldsymbol{v}(0) \cdot \boldsymbol{a}(t') \rangle\, e^{t'/\tau}\, dt' + e^{-2t/\tau} \int_0^t \!\! \int_0^t \langle \boldsymbol{a}(t_1) \cdot \boldsymbol{a}(t_2) \rangle\, e^{(t_1 + t_2)/\tau}\, dt_1\, dt_2. \tag{15.176} \]

The second term on the RHS is zero, so we have

\[ \langle v^2(t) \rangle = e^{-2t/\tau}\, \langle v^2(0) \rangle + e^{-2t/\tau} \int_0^t \!\! \int_0^t \langle \boldsymbol{a}(t_1) \cdot \boldsymbol{a}(t_2) \rangle\, e^{(t_1 + t_2)/\tau}\, dt_1\, dt_2. \tag{15.177} \]

The ensemble average

\[ C(t_1, t_2) = \langle \boldsymbol{a}(t_1) \cdot \boldsymbol{a}(t_2) \rangle \tag{15.178} \]

is an example of an autocorrelation function. All autocorrelation functions have some simple properties, which are easy to prove (Pathria, 1972, p. 458):

1. If the system is independent of time, then its autocorrelation function for any given variable $A(t)$ depends only upon the time delay $s$:

\[ C(t, t + s) = \langle A(t) \cdot A(t + s) \rangle \equiv C(s). \tag{15.179} \]

2. The autocorrelation function for $s = 0$ is necessarily nonnegative

\[ C(t, t) = \langle A(t) \cdot A(t) \rangle = \langle A(t)^2 \rangle \geq 0. \tag{15.180} \]

If the system is time independent, then $C(t, t) = C(0) \geq 0$.

3. The absolute value of $C(t_1, t_2)$ is never greater than the average of $C(t_1, t_1)$ and $C(t_2, t_2)$ because

\[ \langle |A(t_1) \pm A(t_2)|^2 \rangle = \langle A(t_1)^2 \rangle + \langle A(t_2)^2 \rangle \pm 2\, \langle A(t_1) \cdot A(t_2) \rangle \geq 0 \tag{15.181} \]

which implies that $\mp 2\, C(t_1, t_2) \leq C(t_1, t_1) + C(t_2, t_2)$, or

\[ 2\, |C(t_1, t_2)| \leq C(t_1, t_1) + C(t_2, t_2). \tag{15.182} \]

For a time-independent system, this inequality is $|C(s)| \leq C(0)$ for every time delay $s$.

4. If the variables $A(t_1)$ and $A(t_2)$ commute, then their autocorrelation function is symmetric

\[ C(t_1, t_2) = \langle A(t_1) \cdot A(t_2) \rangle = \langle A(t_2) \cdot A(t_1) \rangle = C(t_2, t_1). \tag{15.183} \]

For a time-independent system, this symmetry is $C(s) = C(-s)$.

5. If the variable $A(t)$ is randomly fluctuating with zero mean, then we expect both that its ensemble average vanishes

\[ \langle A(t) \rangle = 0 \tag{15.184} \]

and that there is some characteristic time scale $T$ beyond which the correlation function falls to zero:

\[ \langle A(t_1) \cdot A(t_2) \rangle \to \langle A(t_1) \rangle \cdot \langle A(t_2) \rangle = 0 \quad\text{when}\quad |t_1 - t_2| \gg T. \tag{15.185} \]

In terms of the autocorrelation function $C(t_1, t_2) = \langle \boldsymbol{a}(t_1) \cdot \boldsymbol{a}(t_2) \rangle$ of the acceleration, the variance of the velocity (15.177) is

\[ \langle v^2(t) \rangle = e^{-2t/\tau}\, \langle v^2(0) \rangle + e^{-2t/\tau} \int_0^t \!\! \int_0^t C(t_1, t_2)\, e^{(t_1 + t_2)/\tau}\, dt_1\, dt_2. \tag{15.186} \]

Since $C(t_1, t_2)$ is big only for tiny values of $|t_2 - t_1|$, it makes sense to change variables to

\[ s = t_2 - t_1 \quad\text{and}\quad w = \frac{1}{2}\, ( t_1 + t_2 ). \tag{15.187} \]

The element of area then is by (14.6–14.14)

\[ dt_1 \wedge dt_2 = dw \wedge ds \tag{15.188} \]

and the limits of integration are $-2w \leq s \leq 2w$ for $0 \leq w \leq t/2$ and $-2(t - w) \leq s \leq 2(t - w)$ for $t/2 \leq w \leq t$. So $\langle v^2(t) \rangle$ is

\[ \langle v^2(t) \rangle = e^{-2t/\tau}\, \langle v^2(0) \rangle + e^{-2t/\tau} \int_0^{t/2} e^{2w/\tau}\, dw \int_{-2w}^{2w} C(s)\, ds + e^{-2t/\tau} \int_{t/2}^{t} e^{2w/\tau}\, dw \int_{-2(t-w)}^{2(t-w)} C(s)\, ds. \tag{15.189} \]

Since by (15.185) the autocorrelation function $C(s)$ vanishes outside a narrow window of width $2T$, we may approximate each of the $s$-integrals by

\[ C = \int_{-\infty}^{\infty} C(s)\, ds. \tag{15.190} \]

It follows then that

\[ \langle v^2(t) \rangle = e^{-2t/\tau}\, \langle v^2(0) \rangle + C\, e^{-2t/\tau} \int_0^t e^{2w/\tau}\, dw = e^{-2t/\tau}\, \langle v^2(0) \rangle + C\, e^{-2t/\tau}\, \frac{\tau}{2} \left( e^{2t/\tau} - 1 \right) = e^{-2t/\tau}\, \langle v^2(0) \rangle + \frac{C\,\tau}{2} \left( 1 - e^{-2t/\tau} \right). \tag{15.191} \]

As $t \to \infty$, $\langle v^2(t) \rangle$ must approach its equilibrium value of $3kT/m$, and so

\[ \lim_{t \to \infty} \langle v^2(t) \rangle = \frac{C\,\tau}{2} = \frac{3kT}{m} \tag{15.192} \]

which implies that

\[ C = \frac{6kT}{m\,\tau} \quad\text{or}\quad \frac{1}{B} = \frac{m^2\, C}{6kT}. \tag{15.193} \]

Our final formula for $\langle v^2(t) \rangle$ then is

\[ \langle v^2(t) \rangle = e^{-2t/\tau}\, \langle v^2(0) \rangle + \frac{3kT}{m} \left( 1 - e^{-2t/\tau} \right). \tag{15.194} \]

Referring back to the definition (15.162) of the viscous-friction coefficient $f_v = 1/B$, we see that $f_v$ is related to the integral

\[ f_v = \frac{1}{B} = \frac{m^2\, C}{6kT} = \frac{m^2}{6kT} \int_{-\infty}^{\infty} \langle \boldsymbol{a}(0) \cdot \boldsymbol{a}(s) \rangle\, ds = \frac{1}{6kT} \int_{-\infty}^{\infty} \langle \boldsymbol{f}(0) \cdot \boldsymbol{f}(s) \rangle\, ds \tag{15.195} \]

of the autocorrelation function of the random acceleration $\boldsymbol{a}(t)$ or equivalently of the random force $\boldsymbol{f}(t)$. This equation relates the dissipation of viscous friction to the random fluctuations. It is an example of a fluctuation–dissipation theorem.

If we substitute our formula (15.194) for $\langle v^2(t) \rangle$ into the expression (15.149) for the acceleration of $\langle r^2 \rangle$, then we get

\[ \frac{d^2 \langle r^2(t) \rangle}{dt^2} = -\,\frac{1}{\tau}\, \frac{d \langle r^2(t) \rangle}{dt} + 2\, e^{-2t/\tau}\, \langle v^2(0) \rangle + \frac{6kT}{m} \left( 1 - e^{-2t/\tau} \right). \tag{15.196} \]

The solution with both $\langle r^2(0) \rangle = 0$ and $d\langle r^2(0) \rangle/dt = 0$ is (Exercise 15.19)

\[ \langle r^2(t) \rangle = \langle v^2(0) \rangle\, \tau^2 \left( 1 - e^{-t/\tau} \right)^2 - \frac{3kT\,\tau^2}{m} \left( 1 - e^{-t/\tau} \right) \left( 3 - e^{-t/\tau} \right) + \frac{6kT\,\tau}{m}\, t. \tag{15.197} \]
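The result (15.194) can be tested by simulating a single velocity component with the exact update rule of an Ornstein–Uhlenbeck process (a standard discretization, not from the text); for one degree of freedom the equilibrium value $3kT/m$ is replaced by $kT/m$:

```python
import math, random

random.seed(1)
tau, kT_over_m, v0, t_end, dt = 1.0, 2.0, 3.0, 1.0, 0.01
n_particles = 20000

# exact one-step update for an Ornstein-Uhlenbeck velocity:
# decay toward zero plus a gaussian kick with the stationary variance kT/m
decay = math.exp(-dt / tau)
kick = math.sqrt(kT_over_m * (1.0 - decay**2))

sq = 0.0
for _ in range(n_particles):
    v = v0
    for _ in range(int(t_end / dt)):
        v = v * decay + kick * random.gauss(0.0, 1.0)
    sq += v * v
v2 = sq / n_particles   # ensemble average <v^2(t_end)>

# one-component version of (15.194): 3kT/m -> kT/m per degree of freedom
exact = math.exp(-2 * t_end / tau) * v0**2 \
        + kT_over_m * (1 - math.exp(-2 * t_end / tau))
print(abs(v2 / exact - 1) < 0.05)   # True, statistical accuracy about 1 percent
```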

15.15 Fokker–Planck Equation

Let $P(\boldsymbol{v}, t)$ be the probability distribution of particles in velocity space at time $t$, and $\psi(\boldsymbol{v}; \boldsymbol{u})$ be a normalized transition probability that the velocity changes from $\boldsymbol{v}$ to $\boldsymbol{v} + \boldsymbol{u}$ in the time interval $[t, t + \Delta t]$. We take the interval $\Delta t$ to be much longer than the interval between successive particle collisions but much shorter than the time over which the velocity $\boldsymbol{v}$ changes appreciably. So $|\boldsymbol{u}| \ll |\boldsymbol{v}|$. We also assume that the successive changes in the velocities of the particles form a Markoff stochastic process, that is, that the changes are random and that what happens at time $t$ depends only upon the state of the system at time $t$ and not upon the history of the system. We then expect that the velocity distribution at time $t + \Delta t$ is related to that at time $t$ by

\[ P(\boldsymbol{v}, t + \Delta t) = \int P(\boldsymbol{v} - \boldsymbol{u}, t)\, \psi(\boldsymbol{v} - \boldsymbol{u}; \boldsymbol{u})\, d^3u. \tag{15.198} \]

Since $|\boldsymbol{u}| \ll |\boldsymbol{v}|$, we can expand $P(\boldsymbol{v}, t + \Delta t)$, $P(\boldsymbol{v} - \boldsymbol{u}, t)$, and $\psi(\boldsymbol{v} - \boldsymbol{u}; \boldsymbol{u})$ in Taylor series in $\boldsymbol{u}$ like

\[ \psi(\boldsymbol{v} - \boldsymbol{u}; \boldsymbol{u}) = \psi(\boldsymbol{v}; \boldsymbol{u}) - \boldsymbol{u} \cdot \nabla_v\, \psi(\boldsymbol{v}; \boldsymbol{u}) + \frac{1}{2} \sum_{i,j} u_i u_j\, \frac{\partial^2 \psi(\boldsymbol{v}; \boldsymbol{u})}{\partial v_i\, \partial v_j} \tag{15.199} \]

and get

\[ P(\boldsymbol{v}, t) + \Delta t\, \frac{\partial P(\boldsymbol{v}, t)}{\partial t} = \int \left[ P(\boldsymbol{v}, t) - \boldsymbol{u} \cdot \nabla_v P(\boldsymbol{v}, t) + \frac{1}{2} \sum_{i,j} u_i u_j\, \frac{\partial^2 P(\boldsymbol{v}, t)}{\partial v_i\, \partial v_j} \right] \times \left[ \psi(\boldsymbol{v}; \boldsymbol{u}) - \boldsymbol{u} \cdot \nabla_v \psi(\boldsymbol{v}; \boldsymbol{u}) + \frac{1}{2} \sum_{i,j} u_i u_j\, \frac{\partial^2 \psi(\boldsymbol{v}; \boldsymbol{u})}{\partial v_i\, \partial v_j} \right] d^3u. \tag{15.200} \]

The normalization of the transition probability $\psi$ and the average changes in velocity are

\[ 1 = \int \psi(\boldsymbol{v}; \boldsymbol{u})\, d^3u, \qquad \langle u_i \rangle = \int u_i\, \psi(\boldsymbol{v}; \boldsymbol{u})\, d^3u, \qquad \langle u_i u_j \rangle = \int u_i u_j\, \psi(\boldsymbol{v}; \boldsymbol{u})\, d^3u \tag{15.201} \]

in which the dependence of the mean values $\langle u_i \rangle$ and $\langle u_i u_j \rangle$ upon the velocity $\boldsymbol{v}$ is implicit. In these terms, the expansion (15.200) is

\[ \Delta t\, \frac{\partial P(\boldsymbol{v}, t)}{\partial t} = - \langle \boldsymbol{u} \rangle \cdot \nabla_v P(\boldsymbol{v}, t) + \frac{1}{2} \sum_{i,j} \langle u_i u_j \rangle\, \frac{\partial^2 P(\boldsymbol{v}, t)}{\partial v_i\, \partial v_j} - P(\boldsymbol{v}, t)\, \nabla_v \cdot \langle \boldsymbol{u} \rangle + \frac{1}{2}\, P(\boldsymbol{v}, t) \sum_{i,j} \frac{\partial^2 \langle u_i u_j \rangle}{\partial v_i\, \partial v_j} + \sum_{i,j} \frac{\partial P(\boldsymbol{v}, t)}{\partial v_i}\, \frac{\partial \langle u_i u_j \rangle}{\partial v_j}. \tag{15.202} \]

Combining terms, we get the Fokker–Planck equation in its most general form (Chandrasekhar, 1943)

\[ \Delta t\, \frac{\partial P(\boldsymbol{v}, t)}{\partial t} = - \nabla_v \cdot \big[ P(\boldsymbol{v}, t)\, \langle \boldsymbol{u} \rangle \big] + \frac{1}{2} \sum_{i,j} \frac{\partial^2}{\partial v_i\, \partial v_j} \big[ P(\boldsymbol{v}, t)\, \langle u_i u_j \rangle \big]. \tag{15.203} \]

Example 15.18 (Brownian motion). Langevin's equation (15.138) gives the change $\boldsymbol{u}$ in the velocity $\boldsymbol{v}$ as the viscous drag plus some $10^{21}$ random tiny accelerations per second

\[ \boldsymbol{u} = -\,\frac{\boldsymbol{v}\, \Delta t}{m B} + \frac{\boldsymbol{f}\, \Delta t}{m} \tag{15.204} \]

in which $B$ is the mobility of the colloidal particle. The random changes $\boldsymbol{f}\,\Delta t/m$ in velocity are gaussian, and the transition probability is

\[ \psi(\boldsymbol{v}; \boldsymbol{u}) = \left( \frac{\beta\, m^2 B}{4\pi\, \Delta t} \right)^{3/2} \exp\left[ -\,\frac{\beta\, m^2 B}{4\, \Delta t} \left| \boldsymbol{u} + \frac{\boldsymbol{v}\, \Delta t}{m B} \right|^2 \right]. \tag{15.205} \]

Here $\beta = 1/kT$, and Stokes's formula for the mobility $B$ of a spherical colloidal particle of radius $r$ in a fluid of viscosity $\eta$ is $1/B = 6\pi r \eta$. The moments (15.201) of the changes $\boldsymbol{u}$ in velocity are in the limit $\Delta t \to 0$

\[ \langle \boldsymbol{u} \rangle = -\,\frac{\boldsymbol{v}\, \Delta t}{m B} \qquad\text{and}\qquad \langle u_i u_j \rangle = 2\, \delta_{ij}\, \frac{kT}{m^2 B}\, \Delta t. \tag{15.206} \]

So for brownian motion, the Fokker–Planck equation is

\[ \frac{\partial P(\boldsymbol{v}, t)}{\partial t} = \frac{1}{m B}\, \nabla_v \cdot \big[ P(\boldsymbol{v}, t)\, \boldsymbol{v} \big] + \frac{kT}{m^2 B}\, \nabla_v^2\, P(\boldsymbol{v}, t). \tag{15.207} \]
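One can verify numerically that the Maxwell–Boltzmann distribution is a stationary solution of the Fokker–Planck equation (15.207). The sketch below, not from the text, evaluates the 1-dimensional version of the right-hand side with central finite differences and checks that it vanishes; the parameter values are arbitrary:

```python
import math

m, B, kT = 1.5, 0.7, 2.0

def P(v):
    # Maxwell-Boltzmann form; the normalization is irrelevant for this check
    return math.exp(-m * v * v / (2.0 * kT))

def rhs(v, h=1e-4):
    """1-D version of the Fokker-Planck right-hand side (15.207):
    (1/mB) d(vP)/dv + (kT/m^2B) d^2P/dv^2, by central differences."""
    drift = ((v + h) * P(v + h) - (v - h) * P(v - h)) / (2 * h) / (m * B)
    diff = (P(v + h) - 2 * P(v) + P(v - h)) / h**2 * kT / (m * m * B)
    return drift + diff

print(all(abs(rhs(v)) < 1e-6 for v in [-2.0, -0.5, 0.0, 1.0, 3.0]))  # True
```

Analytically the drift term gives $(1/mB)\,P\,(1 - mv^2/kT)$ and the diffusion term its negative, so the sum is exactly zero; the finite differences reproduce this to roughly $10^{-7}$.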

15.16 Characteristic and Moment-Generating Functions

The Fourier transform (4.9) of a probability distribution $P(x)$ is its characteristic function $\tilde P(k)$, sometimes written as $\chi(k)$

\[ \tilde P(k) \equiv \chi(k) \equiv E[e^{ikx}] = \int e^{ikx}\, P(x)\, dx. \tag{15.208} \]

The probability distribution $P(x)$ is the inverse Fourier transform (4.9)

\[ P(x) = \int e^{-ikx}\, \tilde P(k)\, \frac{dk}{2\pi}. \tag{15.209} \]

Example 15.19 (Gauss). The characteristic function of the gaussian

\[ P_G(x, \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}}\, \exp\left( - \frac{(x - \mu)^2}{2\sigma^2} \right) \tag{15.210} \]

is by (4.19)

\[ \tilde P_G(k, \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \int \exp\left( ikx - \frac{(x - \mu)^2}{2\sigma^2} \right) dx = \frac{e^{ik\mu}}{\sigma \sqrt{2\pi}} \int \exp\left( ikx - \frac{x^2}{2\sigma^2} \right) dx = \exp\left( i\mu k - \frac{1}{2}\, \sigma^2 k^2 \right). \tag{15.211} \]

For a discrete probability distribution $P_n$ the characteristic function is

\[ \chi(k) \equiv E[e^{ikn}] = \sum_n e^{ikn}\, P_n. \tag{15.212} \]

The normalization of both continuous and discrete probability distributions implies that their characteristic functions satisfy $\tilde P(0) = \chi(0) = 1$.

Example 15.20 (Binomial and Poisson). The characteristic function of the binomial distribution (15.51)

\[ P_b(n, p, N) = \binom{N}{n}\, p^n\, (1 - p)^{N - n} \tag{15.213} \]

is

\[ \chi_b(k) = \sum_{n=0}^{N} e^{ikn}\, \binom{N}{n}\, p^n\, (1 - p)^{N - n} = \sum_{n=0}^{N} \binom{N}{n}\, \big( p\, e^{ik} \big)^n\, (1 - p)^{N - n} = \left( p\, e^{ik} + 1 - p \right)^N = \left[ p \left( e^{ik} - 1 \right) + 1 \right]^N. \tag{15.214} \]

The Poisson distribution (15.66)

\[ P_P(n, \langle n \rangle) = \frac{\langle n \rangle^n\, e^{-\langle n \rangle}}{n!} \tag{15.215} \]

has the characteristic function

\[ \chi_P(k) = \sum_{n=0}^{\infty} e^{ikn}\, \frac{\langle n \rangle^n\, e^{-\langle n \rangle}}{n!} = e^{-\langle n \rangle} \sum_{n=0}^{\infty} \frac{\big( \langle n \rangle\, e^{ik} \big)^n}{n!} = \exp\left[ \langle n \rangle \left( e^{ik} - 1 \right) \right]. \tag{15.216} \]

The moment-generating function is the characteristic function evaluated at an imaginary argument

\[ M(k) \equiv E[e^{kx}] = \tilde P(-ik) = \chi(-ik). \tag{15.217} \]

For a continuous probability distribution $P(x)$, it is

\[ M(k) = E[e^{kx}] = \int e^{kx}\, P(x)\, dx \tag{15.218} \]

and for a discrete probability distribution $P_n$, it is

\[ M(k) = E[e^{kx}] = \sum_n e^{k x_n}\, P_n. \tag{15.219} \]

In both cases, the normalization of the probability distribution implies that $M(0) = 1$.

Derivatives of the moment-generating function and of the characteristic function give the moments $\mu_n$

\[ \mu_n = E[x^n] = \left. \frac{d^n M(k)}{dk^n} \right|_{k=0} = (-i)^n \left. \frac{d^n \tilde P(k)}{dk^n} \right|_{k=0}. \tag{15.220} \]

Example 15.21 (Three moment-generating functions). The characteristic functions of the binomial distribution (15.214) and those of the distributions of Poisson (15.216) and Gauss (15.210) give us the moment-generating functions

\[ M_b(k, p, N) = \left[ p \left( e^k - 1 \right) + 1 \right]^N, \qquad M_P(k, \langle n \rangle) = \exp\left[ \langle n \rangle \left( e^k - 1 \right) \right], \qquad M_G(k, \mu, \sigma) = \exp\left( \mu k + \tfrac{1}{2}\, \sigma^2 k^2 \right). \tag{15.221} \]

Thus by (15.220), the first three moments of these three distributions are

\[ \begin{aligned} \mu_{b0} &= 1, & \mu_{b1} &= N p, & \mu_{b2} &= N p\, (1 - p) + N^2 p^2 \\ \mu_{P0} &= 1, & \mu_{P1} &= \langle n \rangle, & \mu_{P2} &= \langle n \rangle + \langle n \rangle^2 \\ \mu_{G0} &= 1, & \mu_{G1} &= \mu, & \mu_{G2} &= \mu^2 + \sigma^2 \end{aligned} \tag{15.222} \]

(Exercise 15.20).

Since the characteristic and moment-generating functions have derivatives (15.220) proportional to the moments $\mu_n$, their Taylor series are

\[ \tilde P(k) = E[e^{ikx}] = \sum_{n=0}^{\infty} \frac{(ik)^n}{n!}\, E[x^n] = \sum_{n=0}^{\infty} \frac{(ik)^n}{n!}\, \mu_n \tag{15.223} \]

and

\[ M(k) = E[e^{kx}] = \sum_{n=0}^{\infty} \frac{k^n}{n!}\, E[x^n] = \sum_{n=0}^{\infty} \frac{k^n}{n!}\, \mu_n. \tag{15.224} \]

The cumulants $c_n$ of a probability distribution are the derivatives of the logarithm of its moment-generating function at $k = 0$

\[ c_n = \left. \frac{d^n \ln M(k)}{dk^n} \right|_{k=0} = (-i)^n \left. \frac{d^n \ln \tilde P(k)}{dk^n} \right|_{k=0}. \tag{15.225} \]

One may show (Exercise 15.22) that the first five cumulants of an arbitrary probability distribution are

\[ c_0 = 0, \quad c_1 = \mu, \quad c_2 = \sigma^2, \quad c_3 = \nu_3, \quad\text{and}\quad c_4 = \nu_4 - 3\sigma^4 \tag{15.226} \]

where the $\nu$'s are its central moments (15.31). The third and fourth normalized cumulants are the skewness $v = c_3/\sigma^3 = \nu_3/\sigma^3$ and the kurtosis $\kappa = c_4/\sigma^4 = \nu_4/\sigma^4 - 3$.

Example 15.22 (Gaussian cumulants). The logarithm of the moment-generating function (15.221) of Gauss's distribution is $\mu k + \sigma^2 k^2/2$. Thus by (15.225), $P_G(x, \mu, \sigma)$ has no skewness or kurtosis, its cumulants vanish, $c_{Gn} = 0$ for $n > 2$, and its fourth central moment is $\nu_4 = 3\sigma^4$.
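The moment formula (15.220) is easy to check numerically, here for the Poisson moment-generating function $M_P(k)$ of (15.221): finite-difference derivatives at $k = 0$ reproduce the moments $\mu_1 = \langle n \rangle$ and $\mu_2 = \langle n \rangle + \langle n \rangle^2$ of (15.222). A minimal sketch, not from the text:

```python
import math

nbar = 3.0
M = lambda k: math.exp(nbar * (math.exp(k) - 1.0))   # M_P(k) from (15.221)

h = 1e-3
mu1 = (M(h) - M(-h)) / (2 * h)              # first derivative at k = 0
mu2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2    # second derivative at k = 0

def mom(p, terms=100):
    """Direct Poisson sum E[n^p] = sum_n n^p <n>^n e^{-<n>} / n!,
    with the weights built up iteratively to avoid huge factorials."""
    s, term = 0.0, math.exp(-nbar)           # n = 0 weight
    for n in range(1, terms):
        term *= nbar / n
        s += n**p * term
    return s

print(abs(mu1 - mom(1)) < 1e-4)   # True: mu_1 ~ <n>
print(abs(mu2 - mom(2)) < 1e-2)   # True: mu_2 ~ <n> + <n>^2
```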

15.17 Fat Tails

The gaussian probability distribution $P_G(x, \mu, \sigma)$ falls off for $|x - \mu| \gg \sigma$ very fast, as $\exp\left( -(x - \mu)^2/2\sigma^2 \right)$. Many other probability distributions fall off more slowly; they have fat tails. Rare "black-swan" events – wild fluctuations, market bubbles, and crashes – lurk in their fat tails.

Gosset's distribution, which is known as Student's t-distribution with $\nu$ degrees of freedom

\[ P_S(x, \nu, a) = \frac{1}{\sqrt{\pi}}\, \frac{\Gamma((1 + \nu)/2)}{\Gamma(\nu/2)}\, \frac{a^\nu}{(a^2 + x^2)^{(1+\nu)/2}} \tag{15.227} \]

has power-law tails. Its even moments are

\[ \mu_{2n} = (2n - 1)!!\, \frac{\Gamma(\nu/2 - n)}{\Gamma(\nu/2)} \left( \frac{a^2}{2} \right)^n \tag{15.228} \]

for $2n < \nu$ and infinite otherwise. For $\nu = 1$, it coincides with the Breit–Wigner or Cauchy distribution

\[ P_S(x, 1, a) = \frac{1}{\pi}\, \frac{a}{a^2 + x^2} \tag{15.229} \]

in which $x = E - E_0$ and $a = \Gamma/2$ is the half-width at half-maximum.

Two representative cumulative probabilities are (Bouchaud and Potters, 2003, pp. 15–16)

\[ \Pr(x, \infty) = \int_x^\infty P_S(x', 3, 1)\, dx' = \frac{1}{2} - \frac{1}{\pi} \left[ \arctan x + \frac{x}{1 + x^2} \right] \tag{15.230} \]

\[ \Pr(x, \infty) = \int_x^\infty P_S(x', 4, \sqrt{2}\,)\, dx' = \frac{1}{2} - \frac{3}{4}\, u + \frac{1}{4}\, u^3 \tag{15.231} \]

where $u = x/\sqrt{2 + x^2}$ and $a$ is picked so $\sigma^2 = 1$. William Gosset (1876–1937), who worked for Guinness, wrote as Student because Guinness didn't let its employees publish.

The log-normal probability distribution on $(0, \infty)$

\[ P_{\ln}(x) = \frac{1}{\sigma x \sqrt{2\pi}}\, \exp\left( - \frac{\ln^2(x/x_0)}{2\sigma^2} \right) \tag{15.232} \]

describes distributions of rates of return (Bouchaud and Potters, 2003, p. 9). Its moments are (Exercise 15.25)

\[ \mu_n = x_0^n\, e^{n^2 \sigma^2/2}. \tag{15.233} \]

The exponential distribution on $[0, \infty)$

\[ P_e(x) = \alpha\, e^{-\alpha x} \tag{15.234} \]

has (Exercise 15.26) mean $\mu = 1/\alpha$ and variance $\sigma^2 = 1/\alpha^2$. The sum of $n$ independent exponentially and identically distributed random variables $x = x_1 + \cdots + x_n$ is distributed on $[0, \infty)$ as (Feller, 1966, p. 10)

\[ P_{n,e}(x) = \alpha\, \frac{(\alpha x)^{n-1}}{(n - 1)!}\, e^{-\alpha x}. \tag{15.235} \]

The sum of the squares $x^2 = x_1^2 + \cdots + x_n^2$ of $n$ independent normally and identically distributed random variables of zero mean and variance $\sigma^2$ gives rise to Pearson's chi-squared distribution on $(0, \infty)$

\[ P_{n,P}(x, \sigma)\, dx = \frac{\sqrt{2}}{\sigma\, \Gamma(n/2)} \left( \frac{x}{\sqrt{2}\, \sigma} \right)^{n-1} e^{-x^2/(2\sigma^2)}\, dx \tag{15.236} \]

which for $x = v$, $n = 3$, and $\sigma^2 = kT/m$ is (Exercise 15.27) the Maxwell–Boltzmann distribution (15.118). In terms of $\chi = x/\sigma$, it is

\[ P_{n,P}(\chi^2/2)\, d\chi^2 = \frac{1}{\Gamma(n/2)} \left( \frac{\chi^2}{2} \right)^{n/2 - 1} e^{-\chi^2/2}\, d\big( \chi^2/2 \big). \tag{15.237} \]

It has mean and variance

\[ \mu = n \qquad\text{and}\qquad \sigma^2 = 2n \tag{15.238} \]

and is used in the chi-squared test (Pearson, 1900). The Porter–Thomas distribution $P_{PT}(x) = e^{-x/2}/\sqrt{2\pi x}$ and the exponential distribution $P_e(x)$ (15.234) are special cases of the class (15.236) of chi-squared distributions.

Personal income, the amplitudes of catastrophes, the price changes of financial assets, and many other phenomena occur on both small and large scales. Lévy distributions describe such multiscale phenomena. The characteristic function for a symmetric Lévy distribution is for $\nu \leq 2$

\[ \tilde L_\nu(k, a_\nu) = \exp\left( - a_\nu\, |k|^\nu \right). \tag{15.239} \]

Its inverse Fourier transform (15.209) is for $\nu = 1$ (Exercise 15.28) the Cauchy or Lorentz distribution

\[ L_1(x, a_1) = \frac{a_1}{\pi\, (x^2 + a_1^2)} \tag{15.240} \]

and for $\nu = 2$ the gaussian

\[ L_2(x, a_2) = P_G\big( x, 0, \sqrt{2 a_2}\, \big) = \frac{1}{2 \sqrt{\pi a_2}}\, \exp\left( - \frac{x^2}{4 a_2} \right) \tag{15.241} \]
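The cumulative formula (15.230) can be checked by integrating the Student's t-density (15.227) numerically. A sketch with a simple trapezoid rule, not from the text; the grid and cutoff are arbitrary:

```python
import math

def ps(x, nu, a):
    """Student's t-density (15.227)."""
    norm = math.gamma((1 + nu) / 2) / (math.sqrt(math.pi) * math.gamma(nu / 2))
    return norm * a**nu / (a * a + x * x) ** ((1 + nu) / 2)

def tail(x0, nu, a, xmax=2000.0, n=400000):
    """Trapezoid estimate of Pr(x0, infinity); the x^-4 tail beyond
    xmax contributes a negligible ~1e-11 for nu = 3."""
    h = (xmax - x0) / n
    s = 0.5 * (ps(x0, nu, a) + ps(xmax, nu, a))
    s += sum(ps(x0 + i * h, nu, a) for i in range(1, n))
    return s * h

x = 1.0
exact = 0.5 - (math.atan(x) + x / (1 + x * x)) / math.pi   # (15.230), nu=3, a=1
print(abs(tail(x, 3, 1.0) - exact) < 1e-4)   # True
```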

but for other values of $\nu$ no simple expression for $L_\nu(x, a_\nu)$ is available. For $0 < \nu < 2$ and as $x \to \pm\infty$, it falls off as $|x|^{-(1+\nu)}$, and for $\nu > 2$ it assumes negative values, ceasing to be a probability distribution (Bouchaud and Potters, 2003, pp. 10–13).

15.18 Central Limit Theorem and Jarl Lindeberg

We have seen in Sections 15.9 and 15.11 that unbiased fluctuations tend to distribute the position and velocity of molecules according to Gauss's distribution (15.84). Gaussian distributions occur very frequently. The central limit theorem suggests why they occur so often.

Let $x_1, \ldots, x_N$ be $N$ independent random variables described by probability distributions $P_1(x_1), \ldots, P_N(x_N)$ with finite means $\mu_j$ and finite variances $\sigma_j^2$. The $P_j$'s may be all different. The central limit theorem says that as $N \to \infty$ the probability distribution $P^{(N)}(y)$ for the average of the $x_j$'s

\[ y = \frac{1}{N}\, ( x_1 + x_2 + \cdots + x_N ) \tag{15.242} \]

tends to a gaussian in $y$ quite independently of what the underlying probability distributions $P_j(x_j)$ happen to be.

Because expected values are linear (15.38), the mean value of the average $y$ is the average of the $N$ means

\[ \mu_y = E[y] = E[( x_1 + \cdots + x_N )/N] = \frac{1}{N}\, \big( E[x_1] + \cdots + E[x_N] \big) = \frac{1}{N}\, ( \mu_1 + \cdots + \mu_N ). \tag{15.243} \]

The independence of the random variables $x_1, x_2, \ldots, x_N$ implies (15.44) that their joint probability distribution factorizes

\[ P(x_1, \ldots, x_N) = P_1(x_1)\, P_2(x_2) \cdots P_N(x_N). \tag{15.244} \]

And our rule (15.49) for the variance of a linear combination of independent variables says that the variance of the average $y$ is the sum of the variances

\[ \sigma_y^2 = V[( x_1 + \cdots + x_N )/N] = \frac{1}{N^2}\, \big( \sigma_1^2 + \cdots + \sigma_N^2 \big). \tag{15.245} \]

The conditional probability (15.3) $P^{(N)}(y | x_1, \ldots, x_N)$ that the average of the $x$'s is $y$ is the delta function (4.36)

\[ P^{(N)}(y | x_1, \ldots, x_N) = \delta\big( y - ( x_1 + x_2 + \cdots + x_N )/N \big). \tag{15.246} \]

Thus by (15.8) the probability distribution $P^{(N)}(y)$ for the average $y = ( x_1 + x_2 + \cdots + x_N )/N$ of the $x_j$'s is

\[ P^{(N)}(y) = \int P^{(N)}(y | x_1, \ldots, x_N)\, P(x_1, \ldots, x_N)\, d^N x = \int \delta\big( y - ( x_1 + \cdots + x_N )/N \big)\, P(x_1, \ldots, x_N)\, d^N x \tag{15.247} \]

where $d^N x = dx_1 \cdots dx_N$. Its characteristic function is then

\[ \tilde P^{(N)}(k) = \int e^{iky}\, P^{(N)}(y)\, dy = \int e^{iky}\, \delta\big( y - ( x_1 + \cdots + x_N )/N \big)\, P(x_1, \ldots, x_N)\, d^N x\, dy = \int \exp\left[ \frac{ik}{N}\, ( x_1 + x_2 + \cdots + x_N ) \right] P_1(x_1)\, P_2(x_2) \cdots P_N(x_N)\, d^N x \tag{15.248} \]

which is the product

\[ \tilde P^{(N)}(k) = \tilde P_1(k/N)\, \tilde P_2(k/N) \cdots \tilde P_N(k/N) \tag{15.249} \]

of the characteristic functions

\[ \tilde P_j(k/N) = \int e^{ik x_j/N}\, P_j(x_j)\, dx_j \tag{15.250} \]

of the probability distributions $P_1(x_1), \ldots, P_N(x_N)$. The Taylor series (15.223) for each characteristic function is

\[ \tilde P_j(k/N) = \sum_{n=0}^{\infty} \frac{(ik)^n}{n!\, N^n}\, \mu_{nj} \tag{15.251} \]

and so for big $N$ we can use the approximation

\[ \tilde P_j(k/N) \approx 1 + \frac{ik}{N}\, \mu_j - \frac{k^2}{2N^2}\, \mu_{2j} \tag{15.252} \]

in which $\mu_{2j} = \sigma_j^2 + \mu_j^2$ by the formula (15.26) for the variance. So we have

\[ \tilde P_j(k/N) \approx 1 + \frac{ik}{N}\, \mu_j - \frac{k^2}{2N^2}\, \big( \sigma_j^2 + \mu_j^2 \big) \tag{15.253} \]

or for large $N$

\[ \tilde P_j(k/N) \approx \exp\left( \frac{ik}{N}\, \mu_j - \frac{k^2}{2N^2}\, \sigma_j^2 \right). \tag{15.254} \]

Thus as $N \to \infty$, the characteristic function (15.249) for the variable $y$ converges to

\[ \tilde P^{(N)}(k) = \prod_{j=1}^{N} \tilde P_j(k/N) = \prod_{j=1}^{N} \exp\left( \frac{ik}{N}\, \mu_j - \frac{k^2}{2N^2}\, \sigma_j^2 \right) = \exp\left[ \sum_{j=1}^{N} \left( \frac{ik}{N}\, \mu_j - \frac{k^2}{2N^2}\, \sigma_j^2 \right) \right] = \exp\left( i \mu_y k - \frac{1}{2}\, \sigma_y^2\, k^2 \right) \tag{15.255} \]

which is the characteristic function (15.211) of a gaussian (15.210) with mean and variance

\[ \mu_y = \frac{1}{N} \sum_{j=1}^{N} \mu_j \qquad\text{and}\qquad \sigma_y^2 = \frac{1}{N^2} \sum_{j=1}^{N} \sigma_j^2. \tag{15.256} \]
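The statements (15.243), (15.245), and (15.256) are easy to test by Monte Carlo, here for averages of $N$ independent exponential variables of unit mean and variance. This is a sketch, not from the text; the sample sizes are arbitrary:

```python
import random, statistics

random.seed(7)
N, trials = 50, 4000

# each trial draws the average y of N exponential variables (mu = sigma = 1)
ys = [sum(random.expovariate(1.0) for _ in range(N)) / N for _ in range(trials)]

mu_y = statistics.fmean(ys)
sigma_y = statistics.pstdev(ys)

# (15.256) with identical distributions: mu_y = 1 and sigma_y^2 = 1/N
print(abs(mu_y - 1.0) < 0.02)                  # True
print(abs(sigma_y - (1.0 / N) ** 0.5) < 0.01)  # True
```

A histogram of `ys` would already look convincingly gaussian at $N = 50$, even though each underlying exponential distribution is strongly skewed.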

The inverse Fourier transform (15.209) now gives the probability distribution $P^{(N)}(y)$ for the average $y = ( x_1 + x_2 + \cdots + x_N )/N$ as

\[ P^{(N)}(y) = \int_{-\infty}^{\infty} e^{-iky}\, \tilde P^{(N)}(k)\, \frac{dk}{2\pi} \tag{15.257} \]

which in view of (15.255) and (15.211) tends as $N \to \infty$ to Gauss's distribution $P_G(y, \mu_y, \sigma_y)$

\[ \lim_{N \to \infty} P^{(N)}(y) = \int_{-\infty}^{\infty} e^{-iky}\, \lim_{N \to \infty} \tilde P^{(N)}(k)\, \frac{dk}{2\pi} = \int_{-\infty}^{\infty} e^{-iky}\, \exp\left( i \mu_y k - \frac{1}{2}\, \sigma_y^2 k^2 \right) \frac{dk}{2\pi} = P_G(y, \mu_y, \sigma_y) = \frac{1}{\sigma_y \sqrt{2\pi}}\, \exp\left( - \frac{(y - \mu_y)^2}{2 \sigma_y^2} \right) \tag{15.258} \]

with mean $\mu_y$ and variance $\sigma_y^2$ as given by (15.256). The sense in which the exact distribution $P^{(N)}(y)$ converges to $P_G(y, \mu_y, \sigma_y)$ is that for all $a$ and $b$ the probability $\Pr_N(a < y < b)$ that $y$ lies between $a$ and $b$ as determined by the exact $P^{(N)}(y)$ converges as $N \to \infty$ to the probability that $y$ lies between $a$ and $b$ as determined by the gaussian $P_G(y, \mu_y, \sigma_y)$

\[ \lim_{N \to \infty} \Pr_N(a < y < b) = \int_a^b P_G(y, \mu_y, \sigma_y)\, dy. \]
0 on the relevant interval (a , b), then its integral F(x ) =

´

x

a

P (r ) dr

(15.267)

is a strictly increasing function on (a , b ), that is, a < x < y < b implies F (x ) < F ( y ). Moreover, the function F (x ) rises from F (a ) = 0 to F (b ) = 1 and takes on every value 0 < y < 1 for exactly one x in the interval (a , b ). Thus the inverse function F −1 ( y ) x

= F −1 ( y)

if and only if

y = F (x )

(15.268)

is well defined on the interval (0 , 1) . Our random-number generator gives us random numbers u that are uniform on (0, 1). We want a random variable r whose probability Pr(r < x ) of being less than any x is F ( x ). The trick (Knuth, 1981, p. 116) is to generate a uniformly distributed random number u and then replace it with r

= F −1 (u).

(15.269)

For then, since F ( x ) is one-to-one (15.268), the statements F −1 (u ) u < F ( x ) are equivalent, and therefore Pr(r


χ02) = ´ . These probabilities PrN − M (χ 2 > χ02) are plotted in


Figure 15.8 The probabilities $\Pr_{N-M}(\chi^2 > \chi_0^2)$ are plotted from left to right for $N - M = 2$, 4, 6, 8, and 10 degrees of freedom as functions of $\chi_0^2$.

Fig. 15.8 for $N - M = 2$, 4, 6, 8, and 10. In particular, the probability of a value of $\chi^2$ greater than $\chi_0^2 = 20$ is respectively 0.000045, 0.000499, 0.00277, 0.010336, and 0.029253 for $N - M = 2$, 4, 6, 8, and 10.
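The quoted probabilities can be reproduced from the chi-squared distribution (15.237): for an even number $n$ of degrees of freedom the tail integral has the closed form $e^{-\chi_0^2/2} \sum_{k=0}^{n/2-1} (\chi_0^2/2)^k/k!$. A sketch, not from the text:

```python
import math

def chi2_tail(dof, x0):
    """Pr(chi^2 > x0) for an even number dof of degrees of freedom:
    e^{-x0/2} * sum_{k < dof/2} (x0/2)^k / k!  (closed form for even dof)."""
    half = x0 / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k)
                                 for k in range(dof // 2))

# reproduces the quoted probabilities for chi_0^2 = 20
for dof, quoted in [(2, 0.000045), (4, 0.000499), (6, 0.00277),
                    (8, 0.010336), (10, 0.029253)]:
    assert abs(chi2_tail(dof, 20.0) - quoted) < 5e-6
print("ok")
```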

15.25 Kolmogorov's Test

Suppose we want to use a sequence of $N$ measurements $x_j$ to determine the probability distribution that they come from. Our empirical probability distribution is

\[ P_e^{(N)}(x) = \frac{1}{N} \sum_{j=1}^{N} \delta( x - x_j ). \tag{15.347} \]

Our cumulative probability for events less than $x$ then is

\[ \Pr_e^{(N)}(-\infty, x) = \int_{-\infty}^{x} P_e^{(N)}(x')\, dx' = \int_{-\infty}^{x} \frac{1}{N} \sum_{j=1}^{N} \delta( x' - x_j )\, dx'. \tag{15.348} \]

So if we label our events in increasing order $x_1 \leq x_2 \leq \cdots \leq x_N$, the cumulative probability of an event less than $x$ is a staircase

\[ \Pr_e^{(N)}(-\infty, x) = \frac{j}{N} \qquad\text{for}\qquad x_j < x < x_{j+1}. \]




The euclidian time-ordered product of two position operators puts the operator with the later euclidian time on the left

T[q_e(u₁) q_e(u₂)] = q_e(u_>) q_e(u_<).   (20.101)

The matrix element of the time-ordered product (20.100) of two position operators and two exponentials e^{−itH/ħ} between states |a⟩ and |b⟩ is

⟨b| e^{−itH/ħ} T[q(t₁) q(t₂)] e^{−itH/ħ} |a⟩ = ⟨b| e^{−itH/ħ} q(t_>) q(t_<) e^{−itH/ħ} |a⟩
  = ⟨b| e^{−i(t−t_>)H/ħ} q e^{−i(t_>−t_<)H/ħ} q e^{−i(t_<+t)H/ħ} |a⟩.   (20.102)

The same steps applied to the euclidian time-ordered product give

⟨b| e^{−uH/ħ} T[q_e(u₁) q_e(u₂)] e^{−uH/ħ} |a⟩ = ⟨b| e^{−(u−u_>)H/ħ} q e^{−(u_>−u_<)H/ħ} q e^{−(u+u_<)H/ħ} |a⟩.   (20.108)

As u → ∞, the exponential e^{−uH/ħ} projects (20.70) states onto the ground state |0⟩, which is an eigenstate of H with energy E₀. So we replace the arbitrary states in (20.108) with the ground state and use the path-integral formula (20.68) for the last three exponentials of (20.108)

e^{−2uE₀/ħ} ⟨0| T[q_e(u₁) q_e(u₂)] |0⟩ = ∫ ⟨0|q_b⟩ q(u₁) q(u₂) e^{−S_e[q]/ħ} ⟨q_a|0⟩ Dq.   (20.109)

The same equation without the time-ordered product is

e^{−2uE₀/ħ} ⟨0|0⟩ = e^{−2uE₀/ħ} = ∫ ⟨0|q_b⟩ e^{−S_e[q]/ħ} ⟨q_a|0⟩ Dq.   (20.110)

The ratio of the last two equations is

⟨0| T[q_e(u₁) q_e(u₂)] |0⟩ = ∫ ⟨0|q_b⟩ q(u₁) q(u₂) e^{−S_e[q]/ħ} ⟨q_a|0⟩ Dq / ∫ ⟨0|q_b⟩ e^{−S_e[q]/ħ} ⟨q_a|0⟩ Dq   (20.111)

in which the integration is over all paths from u = −∞ to u = ∞. The mean value in the ground state of the time-ordered product of k euclidian position operators is

⟨0| T[q_e(u₁) ⋯ q_e(u_k)] |0⟩ = ∫ ⟨0|q_b⟩ q(u₁) ⋯ q(u_k) e^{−S_e[q]/ħ} ⟨q_a|0⟩ Dq / ∫ ⟨0|q_b⟩ e^{−S_e[q]/ħ} ⟨q_a|0⟩ Dq.   (20.112)
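The left-hand side of the ratio (20.111) can be checked on the operator side for a harmonic oscillator (ħ = 1 and a truncated ladder-operator basis are assumptions of this sketch); the closed form e^{−ω(u₁−u₂)}/(2ω) is the standard oscillator result:

```python
# Euclidian two-point function <0|T[q_e(u1) q_e(u2)]|0> for a harmonic
# oscillator H = w(a^dag a + 1/2): the operator expression reduces to
# <0| q e^{-(u1-u2)(H-E0)} q |0> = e^{-w(u1-u2)} / (2w) for u1 > u2.
import numpy as np
from scipy.linalg import expm

n, w = 40, 1.3                                  # basis size, frequency
a = np.diag(np.sqrt(np.arange(1, n)), 1)        # lowering operator
q = (a + a.T) / np.sqrt(2 * w)                  # position operator
H = w * (a.T @ a + 0.5 * np.eye(n))             # hamiltonian
E0 = 0.5 * w                                    # ground-state energy

u1, u2 = 0.9, 0.2                               # euclidian times, u1 > u2
prop = expm(-(u1 - u2) * (H - E0 * np.eye(n)))
two_point = (q @ prop @ q)[0, 0]                # <0| q e^{-tau(H-E0)} q |0>
exact = np.exp(-w * (u1 - u2)) / (2 * w)
print(two_point, exact)                         # the two numbers agree
```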


20 Path Integrals

20.8 Quantum Field Theory on a Lattice

Quantum mechanics imposes upon n coordinates q_i and conjugate momenta p_k the equal-time commutation relations

[q_i, p_k] = iħ δ_{ik}   and   [q_i, q_k] = [p_i, p_k] = 0.   (20.113)

In a theory of a single spinless quantum field, a coordinate q_x ≡ φ(x) and a conjugate momentum p_x ≡ π(x) are associated with each point x of space. The operators φ(x) and π(x) obey the commutation relations

[φ(x), π(x′)] = iħ δ(x − x′)   and   [φ(x), φ(x′)] = [π(x), π(x′)] = 0   (20.114)

inherited from quantum mechanics. To make path integrals, we replace space by a 3-dimensional lattice of points x = a(i, j, k) = (ai, aj, ak) and eventually let the distance a between adjacent points go to zero. On this lattice and at equal times t = 0, the field operators obey discrete forms of the commutation relations (20.114)

[φ(a(i,j,k)), π(a(ℓ,m,n))] = (iħ/a³) δ_{iℓ} δ_{jm} δ_{kn}
[φ(a(i,j,k)), φ(a(ℓ,m,n))] = [π(a(i,j,k)), π(a(ℓ,m,n))] = 0.   (20.115)

The vanishing commutators imply that the field and the momenta have compatible eigenvalues for all lattice points a(i,j,k)

φ(a(i,j,k)) |φ′⟩ = φ′(a(i,j,k)) |φ′⟩   and   π(a(i,j,k)) |π′⟩ = π′(a(i,j,k)) |π′⟩.   (20.116)

Their inner products are

⟨φ′|π′⟩ = ∏_{i,j,k} √(a³/2πħ) e^{i a³ φ′(a(i,j,k)) π′(a(i,j,k))/ħ}.   (20.117)

These states are complete

∫ |φ′⟩⟨φ′| ∏_{i,j,k} dφ′(a(i,j,k)) = I = ∫ |π′⟩⟨π′| ∏_{i,j,k} dπ′(a(i,j,k))   (20.118)

and orthonormal

⟨φ′|φ″⟩ = ∏_{i,j,k} δ(φ′(a(i,j,k)) − φ″(a(i,j,k)))   (20.119)

with a similar equation for ⟨π′|π″⟩.

The hamiltonian for a free field of mass m is

H = ½ ∫ [π² + c²(∇φ)² + (m²c⁴/ħ²) φ²] d³x = (a³/2) Σ_v [π_v² + c²(∇φ_v)² + (m²c⁴/ħ²) φ_v²]   (20.120)

where v = a(i,j,k), π_v = π(a(i,j,k)), φ_v = φ(a(i,j,k)), and the square of the lattice gradient (∇φ_v)² is

(∇φ_v)² = [(φ(a(i+1,j,k)) − φ(a(i,j,k)))² + (φ(a(i,j+1,k)) − φ(a(i,j,k)))² + (φ(a(i,j,k+1)) − φ(a(i,j,k)))²] / a².   (20.121)
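A minimal NumPy sketch of the lattice energy (20.120) with the forward-difference gradient (20.121); periodic wraparound via np.roll is an assumption here, since the text leaves the boundary unspecified, and natural units ħ = c = 1 are used:

```python
# Lattice energy (20.120) for a free scalar field on a cubic lattice,
# using the forward-difference gradient (20.121) with periodic boundaries.
import numpy as np

def lattice_energy(phi, pi, a, m):
    """a^3/2 * sum_v [pi^2 + (grad phi)^2 + m^2 phi^2] on a 3d lattice."""
    grad2 = sum((np.roll(phi, -1, axis=d) - phi) ** 2 for d in range(3)) / a**2
    return 0.5 * a**3 * np.sum(pi**2 + grad2 + m**2 * phi**2)

rng = np.random.default_rng(0)
phi = rng.normal(size=(8, 8, 8))
pi = rng.normal(size=(8, 8, 8))
print(lattice_energy(phi, pi, a=0.5, m=1.0))
```

A constant field has zero gradient energy, which gives a quick sanity check of the implementation.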

Other interactions, such as c³φ⁴/ħ, can be added to this hamiltonian. To simplify the appearance of the equations in the rest of this chapter, I will often use natural units in which ħ = c = 1. To convert the value of a physical quantity from natural units to universal units, one multiplies or divides its natural-unit value by suitable factors of ħ and c until one gets the right dimensions. For example, if T = 1/m is a time in natural units, where m is a mass, then the time in universal units is T = ħ/(mc²). If L = 1/m is a length in natural units, then the length in universal units is L = ħ/(mc).

We set K = (a³/2) Σ_v π_v² and V = (a³/2) Σ_v [(∇φ_v)² + m²φ_v² + P(φ_v)] in which P(φ_v) represents the self-interactions of the field. With ε = (t_b − t_a)/n, Trotter's product formula (20.6) is the n → ∞ limit of

e^{−i(t_b−t_a)(K+V)} = (e^{−i(t_b−t_a)K/n} e^{−i(t_b−t_a)V/n})^n = (e^{−iεK} e^{−iεV})^n.   (20.122)
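Trotter's product formula is easy to see numerically with two random hermitian matrices standing in for K and V (a sketch, not the field-theory operators themselves; ħ = 1):

```python
# Numerical illustration of the Trotter product formula (20.122):
# (e^{-itK/n} e^{-itV/n})^n approaches e^{-it(K+V)} as n grows.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))
B = rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))
K, V = (A + A.conj().T) / 4, (B + B.conj().T) / 4   # hermitian stand-ins
t = 1.0

exact = expm(-1j * t * (K + V))
errors = {}
for n in (10, 100, 1000):
    step = expm(-1j * t * K / n) @ expm(-1j * t * V / n)
    errors[n] = np.linalg.norm(np.linalg.matrix_power(step, n) - exact)
    print(n, errors[n])   # the error falls roughly like 1/n
```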

We insert I in the form (20.118) between e^{−iεK} and e^{−iεV}

⟨φ₁| e^{−iεK} e^{−iεV} |φ_a⟩ = ⟨φ₁| e^{−iεK} ∫ |π′⟩⟨π′| ∏_v dπ′_v e^{−iεV} |φ_a⟩   (20.123)

and use the eigenstate formula (20.116)

⟨φ₁| e^{−iεK} e^{−iεV} |φ_a⟩ = e^{−iεV(φ_a)} ∫ e^{−iεK(π′)} ⟨φ₁|π′⟩⟨π′|φ_a⟩ ∏_v dπ′_v   (20.124)

and the inner-product formula (20.117)

⟨φ₁| e^{−iεK} e^{−iεV} |φ_a⟩ = e^{−iεV(φ_a)} ∏_v ∫ (a³ dπ′_v / 2π) e^{a³[−iεπ′_v²/2 + i(φ_{1v} − φ_{av})π′_v]}.   (20.125)

Using the gaussian integral (20.1), we set φ̇_a = (φ₁ − φ_a)/ε and get

⟨φ₁| e^{−iεK} e^{−iεV} |φ_a⟩ = ∏_v [(a³/2πiε)^{1/2} e^{iεa³[φ̇_{av}² − (∇φ_{av})² − m²φ_{av}² − P(φ_v)]/2}].   (20.126)


The product of n = (t_b − t_a)/ε such time intervals is

⟨φ_b| e^{−i(t_b−t_a)H} |φ_a⟩ = ∫ ∏_v (a³n / 2πi(t_b − t_a))^{n/2} e^{iS_v} Dφ_v   (20.127)

in which

S_v = ((t_b − t_a)/n) (a³/2) Σ_{j=0}^{n−1} [φ̇_{jv}² − (∇φ_{jv})² − m²φ_{jv}² − P(φ_{jv})],   (20.128)

φ̇_{jv} = n(φ_{j+1,v} − φ_{j,v})/(t_b − t_a), and Dφ_v = dφ_{n−1,v} ⋯ dφ_{1,v}. The amplitude ⟨φ_b|e^{−i(t_b−t_a)H}|φ_a⟩ is the integral over all fields that go from φ_a(x) at t_a to φ_b(x) at t_b, each weighted by an exponential

⟨φ_b| e^{−i(t_b−t_a)H} |φ_a⟩ = ∫ e^{iS[φ]} Dφ   (20.129)

of its action

S[φ] = ∫_{t_a}^{t_b} dt ∫ d³x ½ [φ̇² − (∇φ)² − m²φ² − P(φ)]   (20.130)

in which Dφ is the n → ∞ limit of the product over all spatial vertices v

Dφ = ∏_v (a³n / 2πi(t_b − t_a))^{n/2} dφ_{n−1,v} ⋯ dφ_{1,v}.   (20.131)

Equivalently, the time-evolution operator is

e^{−i(t_b−t_a)H} = ∫ |φ_b⟩ e^{iS[φ]} ⟨φ_a| Dφ Dφ_a Dφ_b   (20.132)

in which Dφ_a Dφ_b = ∏_v dφ_{a,v} dφ_{b,v} is an integral over the initial and final states.

As in quantum mechanics (Section 20.4), the path integral for an action that is quadratic in the fields is an exponential of the action of a classical process S[φ_c] times a function of the times t_a, t_b and of other parameters

⟨φ_b| e^{−i(t_b−t_a)H} |φ_a⟩ = ∫ e^{iS[φ]} Dφ = f(t_a, t_b, …) e^{iS[φ_c]}   (20.133)

in which S[φ_c] is the action of the process that goes from φ(x, t_a) = φ_a(x) to φ(x, t_b) = φ_b(x) and obeys the classical equations of motion, and the function f is a path integral over all fields that go from φ(x, t_a) = 0 to φ(x, t_b) = 0.


Example 20.8 (Classical processes) The field

φ(x, t) = ∫ e^{ik·x} [a(k) cos ωt + b(k) sin ωt] d³k   (20.134)

with ω = √(k² + m²) makes the action (20.130) for P = 0 stationary because it is a solution of the equation of motion ∇²φ − φ̈ − m²φ = 0. In terms of the Fourier transforms

φ̃(k, t_a) = ∫ e^{−ik·x} φ(x, t_a) d³x/(2π)³   and   φ̃(k, t_b) = ∫ e^{−ik·x} φ(x, t_b) d³x/(2π)³,   (20.135)

the solution that goes from φ(x, t_a) to φ(x, t_b) is

φ(x, t) = ∫ e^{ik·x} [sin ω(t_b − t) φ̃(k, t_a) + sin ω(t − t_a) φ̃(k, t_b)] / sin ω(t_b − t_a) d³k.   (20.136)

The solution that evolves from φ(x, t_a) and φ̇(x, t_a) is

φ(x, t) = ∫ e^{ik·x} [cos ω(t − t_a) φ̃(k, t_a) + (sin ω(t − t_a)/ω) φ̃̇(k, t_a)] d³k   (20.137)

in which the Fourier transform φ̃̇(k, t_a) is defined as in (20.135).
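One can verify with sympy that a single mode of (20.134) satisfies the field equation when ω² = k² + m² (a 1-dimensional mode is used for brevity):

```python
# Symbolic check that phi = e^{ikx}(A cos wt + B sin wt) solves
# the field equation  phi_xx - phi_tt - m^2 phi = 0  when w^2 = k^2 + m^2.
import sympy as sp

x, t = sp.symbols('x t', real=True)
k, m, A, B = sp.symbols('k m A B', real=True)
w = sp.sqrt(k**2 + m**2)

phi = sp.exp(sp.I * k * x) * (A * sp.cos(w * t) + B * sp.sin(w * t))
eom = sp.diff(phi, x, 2) - sp.diff(phi, t, 2) - m**2 * phi
print(sp.simplify(eom))   # 0
```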

Like a position operator (20.98), a field at time t is defined as

φ(x, t) = e^{itH/ħ} φ(x, 0) e^{−itH/ħ}   (20.138)

in which φ(x) = φ(x, 0) is the field at time zero, which obeys the commutation relations (20.114). The time-ordered product of several fields is their product with newer (later time) fields standing to the left of older (earlier time) fields as in the definition (20.100). The logic (20.102–20.106) of the derivation of the path-integral formulas for time-ordered products of position operators also applies to field operators. One finds (Exercise 20.14) for the mean value of the time-ordered product of two fields in an energy eigenstate |n⟩

⟨n| T[φ(x₁) φ(x₂)] |n⟩ = ∫ ⟨n|φ_b⟩ φ(x₁) φ(x₂) e^{iS[φ]/ħ} ⟨φ_a|n⟩ Dφ / ∫ ⟨n|φ_b⟩ e^{iS[φ]/ħ} ⟨φ_a|n⟩ Dφ   (20.139)

in which the integrations are over all paths that go from before t₁ and t₂ to after both times. The analogous result for several fields is (Exercise 20.15)

⟨n| T[φ(x₁) ⋯ φ(x_k)] |n⟩ = ∫ ⟨n|φ_b⟩ φ(x₁) ⋯ φ(x_k) e^{iS[φ]/ħ} ⟨φ_a|n⟩ Dφ / ∫ ⟨n|φ_b⟩ e^{iS[φ]/ħ} ⟨φ_a|n⟩ Dφ   (20.140)

in which the integrations are over all paths that go from before the times t₁, …, t_k to after them.

20.9 Finite-Temperature Field Theory

Since the Boltzmann operator e^{−βH} = e^{−H/(kT)} is the time-evolution operator e^{−itH/ħ} at the imaginary time t = −iħβ = −iħ/(kT), the formulas of finite-temperature field theory are those of quantum field theory with t replaced by −iu = −iħβ = −iħ/(kT). As in Section 20.8, we use as our hamiltonian H = K + V where K and V are sums over all lattice vertices v = a(i,j,k) = (ai, aj, ak) of the cubes of volume a³ times the squared momentum and the potential energy

H = K + V = (a³/2) Σ_v π_v² + (a³/2) Σ_v [(∇φ_v)² + m²φ_v² + P(φ_v)].   (20.141)

A matrix element of the first term of the Trotter product formula (20.7)

e^{−β(K+V)} = lim_{n→∞} (e^{−βK/n} e^{−βV/n})^n   (20.142)

is the imaginary-time version of (20.125) with ε = ħβ/n

⟨φ₁| e^{−εK} e^{−εV} |φ_a⟩ = e^{−εV(φ_a)} ∏_v ∫ (a³ dπ′_v / 2π) e^{a³[−επ′_v²/2 + i(φ_{1v} − φ_{av})π′_v]}.   (20.143)

Setting φ̇_{av} = (φ₁ − φ_a)/ε, we find, instead of (20.126),

⟨φ₁| e^{−εK} e^{−εV} |φ_a⟩ = ∏_v [(a³/2πε)^{1/2} e^{−εa³[φ̇_{av}² + (∇φ_{av})² + m²φ_{av}² + P(φ_v)]/2}].   (20.144)

The product of n = ħβ/ε such inverse-temperature intervals is

⟨φ_b| e^{−βH} |φ_a⟩ = ∫ ∏_v (a³n / 2πβ)^{n/2} e^{−S_{ev}} Dφ_v   (20.145)

in which the euclidian action is

S_{ev} = (β/n) (a³/2) Σ_{j=0}^{n−1} [φ̇_{jv}² + (∇φ_{jv})² + m²φ_{jv}² + P(φ_{jv})]   (20.146)

where φ̇_{jv} = n(φ_{j+1,v} − φ_{j,v})/β and Dφ_v = dφ_{n−1,v} ⋯ dφ_{1,v}.


The amplitude ⟨φ_b|e^{−(β_b−β_a)H}|φ_a⟩ is the integral over all fields that go from φ_a(x) at β_a to φ_b(x) at β_b, each weighted by an exponential

⟨φ_b| e^{−(β_b−β_a)H} |φ_a⟩ = ∫ e^{−S_e[φ]} Dφ   (20.147)

of its euclidian action

S_e[φ] = ∫_{β_a}^{β_b} du ∫ d³x ½ [φ̇² + (∇φ)² + m²φ² + P(φ)]   (20.148)

in which Dφ is the n → ∞ limit of the product over all spatial vertices v

Dφ = ∏_v (a³n / 2π(β_b − β_a))^{n/2} dφ_{n−1,v} ⋯ dφ_{1,v}.   (20.149)

Equivalently, the Boltzmann operator is

e^{−(β_b−β_a)H} = ∫ |φ_b⟩ e^{−S_e[φ]} ⟨φ_a| Dφ Dφ_a Dφ_b   (20.150)

in which Dφ_a Dφ_b = ∏_v dφ_{a,v} dφ_{b,v} is an integral over the initial and final states. The trace of the Boltzmann operator is the partition function

Z(β) = Tr(e^{−βH}) = ∫ e^{−S_e[φ]} ⟨φ_a|φ_b⟩ Dφ Dφ_a Dφ_b = ∫ e^{−S_e[φ]} Dφ Dφ_a   (20.151)

which is an integral over all fields that go back to themselves in euclidian time β.

Like a position operator (20.99), a field at an imaginary time t = −iu = −iħβ is defined as

φ_e(x, u) = φ_e(x, ħβ) = e^{uH/ħ} φ(x, 0) e^{−uH/ħ}   (20.152)

in which φ(x) = φ(x, 0) = φ_e(x, 0) is the field at time zero, which obeys the commutation relations (20.114). The euclidian-time-ordered product of several fields is their product with newer (higher u = ħβ) fields standing to the left of older (lower u = ħβ) fields as in the definition (20.101).

The euclidian path integrals for the mean values of euclidian-time-ordered products of fields are similar to those (20.139 and 20.140) for ordinary time-ordered products. The euclidian-time-ordered product of the fields φ(x_j) = φ(x_j, u_j) is the path integral

⟨n| T[φ_e(x₁) φ_e(x₂)] |n⟩ = ∫ ⟨n|φ_b⟩ φ(x₁) φ(x₂) e^{−S_e[φ]/ħ} ⟨φ_a|n⟩ Dφ / ∫ ⟨n|φ_b⟩ e^{−S_e[φ]/ħ} ⟨φ_a|n⟩ Dφ   (20.153)

in which the integrations are over all paths that go from before u₁ and u₂ to after both euclidian times. The analogous result for several fields is

⟨n| T[φ_e(x₁) ⋯ φ_e(x_k)] |n⟩ = ∫ ⟨n|φ_b⟩ φ(x₁) ⋯ φ(x_k) e^{−S_e[φ]/ħ} ⟨φ_a|n⟩ Dφ / ∫ ⟨n|φ_b⟩ e^{−S_e[φ]/ħ} ⟨φ_a|n⟩ Dφ   (20.154)

in which the integrations are over all paths that go from before the times u₁, …, u_k to after them.

In the low-temperature β = 1/(kT) → ∞ limit, the Boltzmann operator is proportional to the outer product |0⟩⟨0| of the ground-state kets, e^{−βH} → e^{−βE₀}|0⟩⟨0|. In this limit, the integrations are over all fields that run from u = −∞ to u = ∞, and the only energy eigenstate that contributes is the ground state of the theory

⟨0| T[φ_e(x₁) ⋯ φ_e(x_k)] |0⟩ = ∫ ⟨0|φ_b⟩ φ(x₁) ⋯ φ(x_k) e^{−S_e[φ]/ħ} ⟨φ_a|0⟩ Dφ / ∫ ⟨0|φ_b⟩ e^{−S_e[φ]/ħ} ⟨φ_a|0⟩ Dφ.   (20.155)
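The trace formula (20.151) can be illustrated for a single degree of freedom: for a harmonic oscillator (with m = ħ = 1, an assumption of this sketch), the discretized euclidian path integral over periodic paths is a gaussian integral whose value is a determinant, and it converges to the exact partition function 1/(2 sinh(βω/2)):

```python
# Discretized euclidian path integral for Z(beta) = Tr e^{-beta H} of a
# harmonic oscillator: with n time slices of width eps = beta/n, the
# gaussian integral over periodic paths is Z_n = det(eps * A)^{-1/2},
# where eps*A is the periodic tridiagonal matrix below.
import numpy as np

beta, w, n = 2.0, 1.0, 800
eps = beta / n

M = np.zeros((n, n))                       # M = eps * A
np.fill_diagonal(M, 2.0 + (eps * w) ** 2)
idx = np.arange(n)
M[idx, (idx + 1) % n] = -1.0               # periodic off-diagonals
M[idx, (idx - 1) % n] = -1.0

sign, logdet = np.linalg.slogdet(M)
Z_lattice = np.exp(-0.5 * logdet)
Z_exact = 1.0 / (2.0 * np.sinh(beta * w / 2.0))
print(Z_lattice, Z_exact)                  # agree to several digits
```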

Formulas like this one are used in lattice gauge theory.

20.10 Perturbation Theory

Field theories with hamiltonians that are quadratic in their fields like

H₀ = ½ ∫ [π²(x) + (∇φ(x))² + m²φ²(x)] d³x   (20.156)

are soluble. Their fields evolve in time as

φ(x, t) = e^{itH₀} φ(x, 0) e^{−itH₀}.   (20.157)

The mean value in the ground state of H₀ of a time-ordered product of these fields is a ratio (20.140) of path integrals

⟨0| T[φ(x₁) ⋯ φ(x_n)] |0⟩ = ∫ ⟨0|φ_b⟩ φ(x₁) ⋯ φ(x_n) e^{iS₀[φ]} ⟨φ_a|0⟩ Dφ / ∫ ⟨0|φ_b⟩ e^{iS₀[φ]} ⟨φ_a|0⟩ Dφ   (20.158)

in which the action S₀[φ] is quadratic in the field φ

S₀[φ] = ½ ∫ [−∂_a φ(x) ∂^a φ(x) − m²φ²(x)] d⁴x.   (20.159)


Here −∂_a φ ∂^a φ = φ̇² − (∇φ)², and the integrations are over all fields that run from φ_a at a time before the times t₁, …, t_k to φ_b at a time after t₁, …, t_k. The path integrals in the ratio (20.158) are gaussian and doable. The Fourier transforms

φ̃(p) = ∫ e^{−ipx} φ(x) d⁴x   and   φ(x) = ∫ e^{ipx} φ̃(p) d⁴p/(2π)⁴   (20.160)

turn the spacetime derivatives in the action into a quadratic form

S₀[φ] = −½ ∫ |φ̃(p)|² (p² + m²) d⁴p/(2π)⁴   (20.161)

in which p² = p² − (p⁰)² and φ̃(−p) = φ̃*(p) by (4.25) since the field φ is real. The initial ⟨φ_a|0⟩ and final ⟨0|φ_b⟩ wave functions produce the iε in the Feynman propagator (6.260). Although its exact form doesn't matter here, the wave function ⟨φ|0⟩ of the ground state of H₀ is the exponential (19.53)

⟨φ|0⟩ = c exp[−½ ∫ |φ̃(p)|² √(p² + m²) d³p/(2π)³]   (20.162)

in which φ̃(p) is the spatial Fourier transform of the eigenvalue φ(x)

φ̃(p) = ∫ e^{−ip·x} φ(x) d³x   (20.163)

and c is a normalization factor that will cancel in ratios of path integrals. Apart from −2i ln c, which we will not keep track of, the wave functions ⟨φ_a|0⟩ and ⟨0|φ_b⟩ add to the action S₀[φ] the term

ΔS₀[φ] = (i/2) ∫ √(p² + m²) [|φ̃(p, t)|² + |φ̃(p, −t)|²] d³p/(2π)³   (20.164)

in which we envision taking the limit t → ∞ with φ(x, t) = φ_b(x) and φ(x, −t) = φ_a(x). The identity (Weinberg, 1995, pp. 386–388)

f(+∞) + f(−∞) = lim_{ε→0⁺} ε ∫_{−∞}^{∞} f(t) e^{−ε|t|} dt   (20.165)

(Exercise 20.22) allows us to write ΔS₀[φ] as

ΔS₀[φ] = lim_{ε→0⁺} (iε/2) ∫ √(p² + m²) ∫_{−∞}^{∞} |φ̃(p, t)|² e^{−ε|t|} dt d³p/(2π)³.   (20.166)

So to first order in ε, the change in the action is (Exercise 20.23)

ΔS₀[φ] = lim_{ε→0⁺} (iε/2) ∫ √(p² + m²) ∫_{−∞}^{∞} |φ̃(p, t)|² dt d³p/(2π)³
  = lim_{ε→0⁺} (iε/2) ∫ √(p² + m²) |φ̃(p)|² d⁴p/(2π)⁴.   (20.167)

Thus the modified action is

S₀[φ, ε] = S₀[φ] + ΔS₀[φ] = −½ ∫ |φ̃(p)|² (p² + m² − iε√(p² + m²)) d⁴p/(2π)⁴
  = −½ ∫ |φ̃(p)|² (p² + m² − iε) d⁴p/(2π)⁴   (20.168)

since the square root is positive. In terms of the modified action, our formula (20.158) for the time-ordered product is the ratio

⟨0| T[φ(x₁) ⋯ φ(x_n)] |0⟩ = ∫ φ(x₁) ⋯ φ(x_n) e^{iS₀[φ,ε]} Dφ / ∫ e^{iS₀[φ,ε]} Dφ.   (20.169)

We can use this formula (20.169) to express the mean value in the vacuum |0⟩ of the time-ordered exponential of a spacetime integral of j(x)φ(x), in which j(x) is a classical (c-number, external) current, as the ratio

Z₀[j] ≡ ⟨0| T exp[i ∫ j(x) φ(x) d⁴x] |0⟩ = ∫ exp[i ∫ j(x) φ(x) d⁴x] e^{iS₀[φ,ε]} Dφ / ∫ e^{iS₀[φ,ε]} Dφ.   (20.170)

Since the state |0⟩ is normalized, the mean value Z₀[0] is unity, Z₀[0] = 1. If we absorb the current into the action

S₀[φ, ε, j] = S₀[φ, ε] + ∫ j(x) φ(x) d⁴x   (20.171)

then in terms of the current's Fourier transform

j̃(p) = ∫ e^{−ipx} j(x) d⁴x   (20.172)

the modified action S₀[φ, ε, j] is (Exercise 20.24)

S₀[φ, ε, j] = −½ ∫ [|φ̃(p)|² (p² + m² − iε) − j̃*(p) φ̃(p) − φ̃*(p) j̃(p)] d⁴p/(2π)⁴.   (20.173)


Changing variables to ψ̃(p) = φ̃(p) − j̃(p)/(p² + m² − iε), we can write the action S₀[φ, ε, j] as (Exercise 20.25)

S₀[φ, ε, j] = −½ ∫ [|ψ̃(p)|² (p² + m² − iε) − j̃*(p) j̃(p)/(p² + m² − iε)] d⁴p/(2π)⁴
  = S₀[ψ, ε] + ½ ∫ [j̃*(p) j̃(p)/(p² + m² − iε)] d⁴p/(2π)⁴.   (20.174)

And since Dφ = Dψ, our formula (20.170) gives simply (Exercise 20.26)

Z₀[j] = exp[(i/2) ∫ |j̃(p)|²/(p² + m² − iε) d⁴p/(2π)⁴].   (20.175)
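The completion of the square in (20.174) is easy to check in a finite-dimensional, euclidian analog: with a positive matrix A in place of p² + m² − iε, the ratio of gaussian integrals with and without the source is exp(½ jᵀA⁻¹j), the analog of (20.175). The 2 × 2 matrix and source below are arbitrary choices:

```python
# Finite-dimensional analog of completing the square:
# int exp(-x.A.x/2 + j.x) d^2x / int exp(-x.A.x/2) d^2x = exp(j.A^{-1}.j/2).
import numpy as np
from scipy.integrate import dblquad

A = np.array([[2.0, 0.5], [0.5, 1.5]])
j = np.array([0.3, -0.7])

def integrand(y, x, with_j):
    v = np.array([x, y])
    s = -0.5 * v @ A @ v + (j @ v if with_j else 0.0)
    return np.exp(s)

L = 12.0   # the gaussian integrand is negligible outside [-L, L]^2
Zj, _ = dblquad(lambda y, x: integrand(y, x, True), -L, L, -L, L)
Z0, _ = dblquad(lambda y, x: integrand(y, x, False), -L, L, -L, L)
exact = np.exp(0.5 * j @ np.linalg.solve(A, j))
print(Zj / Z0, exact)   # the ratio matches exp(j.A^{-1}.j / 2)
```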

Going back to position space, one finds (Exercise 20.27)

Z₀[j] = exp[(i/2) ∫ j(x) Δ(x − x′) j(x′) d⁴x d⁴x′]   (20.176)

in which Δ(x − x′) is Feynman's propagator (6.260)

Δ(x − x′) = Δ_F(x − x′) = ∫ e^{ip(x−x′)}/(p² + m² − iε) d⁴p/(2π)⁴.   (20.177)

The functional derivative (Chapter 19) of Z₀[j], defined by (20.170), is

(1/i) δZ₀[j]/δj(x) = ⟨0| T[φ(x) exp(i ∫ j(x′) φ(x′) d⁴x′)] |0⟩   (20.178)

while that of equation (20.176) is

(1/i) δZ₀[j]/δj(x) = Z₀[j] ∫ Δ(x − x′) j(x′) d⁴x′.   (20.179)

Thus the second functional derivative of Z₀[j] evaluated at j = 0 gives

⟨0| T[φ(x) φ(x′)] |0⟩ = (1/i²) δ²Z₀[j]/δj(x)δj(x′) |_{j=0} = −iΔ(x − x′).   (20.180)

Similarly, one may show (Exercise 20.28) that

⟨0| T[φ(x₁) φ(x₂) φ(x₃) φ(x₄)] |0⟩ = (1/i⁴) δ⁴Z₀[j]/δj(x₁)δj(x₂)δj(x₃)δj(x₄) |_{j=0}
  = −Δ(x₁−x₂)Δ(x₃−x₄) − Δ(x₁−x₃)Δ(x₂−x₄) − Δ(x₁−x₄)Δ(x₂−x₃).   (20.181)
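The pattern of pairings in (20.181) can be checked symbolically in a finite-dimensional analog in which the propagator Δ becomes a symmetric matrix D and functional derivatives become ordinary ones:

```python
# Wick pairings: the fourth mixed derivative of exp(j.D.j/2) at j = 0
# is D12*D34 + D13*D24 + D14*D23, the analog of (20.181).
import sympy as sp

j1, j2, j3, j4 = sp.symbols('j1 j2 j3 j4')
js = [j1, j2, j3, j4]
D = sp.Matrix(4, 4, lambda a, b: sp.Symbol(f'D{min(a, b) + 1}{max(a, b) + 1}'))
quad = sum(js[a] * D[a, b] * js[b] for a in range(4) for b in range(4)) / 2

Z = sp.exp(quad)
fourth = sp.diff(Z, j1, j2, j3, j4).subs({v: 0 for v in js})
expected = D[0, 1] * D[2, 3] + D[0, 2] * D[1, 3] + D[0, 3] * D[1, 2]
print(sp.simplify(fourth - expected))   # 0
```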


Suppose now that we add a potential V(φ) to the free hamiltonian (20.156). Scattering amplitudes are matrix elements of the time-ordered exponential T exp[−i ∫ V(φ) d⁴x] (Weinberg, 1995, p. 260). Our formula (20.169) for the mean value in the ground state |0⟩ of the free hamiltonian H₀ of any time-ordered product of fields leads us to

⟨0| T exp[−i ∫ V(φ) d⁴x] |0⟩ = ∫ exp[−i ∫ V(φ) d⁴x] e^{iS₀[φ,ε]} Dφ / ∫ e^{iS₀[φ,ε]} Dφ.   (20.182)

Using (20.180 and 20.181), we can cast this expression into the magical form

⟨0| T exp[−i ∫ V(φ) d⁴x] |0⟩ = exp[−i ∫ V(δ/iδj(x)) d⁴x] Z₀[j] |_{j=0}.   (20.183)

The generalization of the path-integral formula (20.169) to the ground state |Ω⟩ of an interacting theory with action S is

⟨Ω| T[φ(x₁) ⋯ φ(x_n)] |Ω⟩ = ∫ φ(x₁) ⋯ φ(x_n) e^{iS[φ,ε]} Dφ / ∫ e^{iS[φ,ε]} Dφ   (20.184)

in which a term like iεφ² is added to make the modified action S[φ, ε]. These are some of the techniques one uses to make states of incoming and outgoing particles and to compute scattering amplitudes (Weinberg, 1995, 1996; Srednicki, 2007; Zee, 2010).

20.11 Application to Quantum Electrodynamics

In the Coulomb gauge ∇·A = 0, the QED hamiltonian is

H = H_m + ∫ [½ π² + ½ (∇×A)² − A·j] d³x + V_C   (20.185)

in which H_m is the matter hamiltonian, and V_C is the Coulomb term

V_C = ½ ∫ j⁰(x, t) j⁰(y, t) / (4π|x − y|) d³x d³y.   (20.186)

The operators A and π are canonically conjugate, but they satisfy the Coulomb-gauge conditions ∇·A = 0 and ∇·π = 0.


One may show (Weinberg, 1995, pp. 413–418) that in this theory the analog of equation (20.184) is

⟨Ω| T[O₁ ⋯ O_n] |Ω⟩ = ∫ O₁ ⋯ O_n e^{iS_C} δ[∇·A] DA Dψ / ∫ e^{iS_C} δ[∇·A] DA Dψ   (20.187)

in which the Coulomb-gauge action is

S_C = ∫ [½ Ȧ² − ½ (∇×A)² + A·j + L_m] d⁴x − ∫ V_C dt   (20.188)

and the functional delta function

δ[∇·A] = ∏_x δ(∇·A(x))   (20.189)

enforces the Coulomb-gauge condition. The term L_m is the action density of the matter field ψ.

Tricks are available. We introduce a new field A⁰(x) and consider the factor

F = ∫ exp[i ∫ ½ (∇A⁰ + ∇△⁻¹j⁰)² d⁴x] DA⁰   (20.190)

which is just a number independent of the charge density j⁰ since we can cancel the j⁰ term by shifting A⁰. By △⁻¹, we mean −1/(4π|x − y|). By integrating by parts, we can write the number F as (Exercise 20.29)

F = ∫ exp[i ∫ (½ (∇A⁰)² − A⁰j⁰ − ½ j⁰△⁻¹j⁰) d⁴x] DA⁰
  = ∫ exp[i ∫ (½ (∇A⁰)² − A⁰j⁰) d⁴x + i ∫ V_C dt] DA⁰.   (20.191)

So when we multiply the numerator and denominator of the amplitude (20.187) by F, the awkward Coulomb term V_C cancels, and we get

⟨Ω| T[O₁ ⋯ O_n] |Ω⟩ = ∫ O₁ ⋯ O_n e^{iS′} δ[∇·A] DA Dψ / ∫ e^{iS′} δ[∇·A] DA Dψ   (20.192)

where now DA includes all four components A^μ and

S′ = ∫ [½ Ȧ² − ½ (∇×A)² + ½ (∇A⁰)² + A·j − A⁰j⁰ + L_m] d⁴x.   (20.193)


Since the delta functional δ[∇·A] enforces the Coulomb-gauge condition, we can add to the action S′ the term (∇·Ȧ)A⁰, which is −Ȧ·∇A⁰ after we integrate by parts and drop the surface term. This extra term makes the action gauge invariant

S = ∫ [½ (Ȧ − ∇A⁰)² − ½ (∇×A)² + A·j − A⁰j⁰ + L_m] d⁴x
  = ∫ [−¼ F_{ab}F^{ab} + A_b j^b + L_m] d⁴x.   (20.194)

Thus at this point we have

⟨Ω| T[O₁ ⋯ O_n] |Ω⟩ = ∫ O₁ ⋯ O_n e^{iS} δ[∇·A] DA Dψ / ∫ e^{iS} δ[∇·A] DA Dψ   (20.195)

in which S is the gauge-invariant action (20.194), and the integral is over all fields. The only relic of the Coulomb gauge is the gauge-fixing delta functional δ[∇·A].

We now make the gauge transformations A′_b(x) = A_b(x) + ∂_b Λ(x) and ψ′(x) = e^{iqΛ(x)} ψ(x) in the numerator, and also, using a different gauge transformation Λ′, in the denominator of the ratio (20.195) of path integrals. Since we are integrating over all gauge fields, these gauge transformations merely change the order of integration in the numerator and denominator of that ratio. They are like replacing ∫_{−∞}^{∞} f(x) dx by ∫_{−∞}^{∞} f(y) dy. They change nothing, and so ⟨Ω|T[O₁ ⋯ O_n]|Ω⟩ = ⟨Ω|T[O₁ ⋯ O_n]|Ω⟩′ in which the prime refers to the gauge transformations Λ and Λ′. We've seen that the action S is gauge invariant. So is the measure DA Dψ. We now restrict ourselves to operators O₁ ⋯ O_n that are gauge invariant. So in ⟨Ω|T[O₁ ⋯ O_n]|Ω⟩′, the replacement of the fields by their gauge transforms affects only the Coulomb-gauge term δ[∇·A]

⟨Ω| T[O₁ ⋯ O_n] |Ω⟩ = ∫ O₁ ⋯ O_n e^{iS} δ[∇·A + △Λ] DA Dψ / ∫ e^{iS} δ[∇·A + △Λ′] DA Dψ.   (20.196)
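That F_{ab} = ∂_a A_b − ∂_b A_a is unchanged by the gauge transformation A_b → A_b + ∂_b Λ can be verified symbolically:

```python
# Gauge invariance of the field strength: F_ab is built from
# antisymmetrized derivatives, so the shift A_b -> A_b + d_b Lambda
# cancels because partial derivatives commute.
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
coords = (t, x, y, z)
A = [sp.Function(f'A{b}')(*coords) for b in range(4)]
Lam = sp.Function('Lambda')(*coords)
Ap = [A[b] + sp.diff(Lam, coords[b]) for b in range(4)]  # gauge transform

def F(Afield, a, b):
    return sp.diff(Afield[b], coords[a]) - sp.diff(Afield[a], coords[b])

ok = all(sp.simplify(F(Ap, a, b) - F(A, a, b)) == 0
         for a in range(4) for b in range(4))
print(ok)   # True
```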

We now have two choices. If we integrate over all gauge functions Λ(x) and Λ′(x) in both the numerator and the denominator of this ratio (20.196), then apart from overall constants that cancel, the mean value in the vacuum of the time-ordered product is the ratio

⟨Ω| T[O₁ ⋯ O_n] |Ω⟩ = ∫ O₁ ⋯ O_n e^{iS} DA Dψ / ∫ e^{iS} DA Dψ   (20.197)


in which we integrate over all matter fields, gauge fields, and gauges. That is, we do not fix the gauge. The analogous formula for the euclidian time-ordered product is

⟨Ω| T[O_{e,1} ⋯ O_{e,n}] |Ω⟩ = ∫ O₁ ⋯ O_n e^{−S_e} DA Dψ / ∫ e^{−S_e} DA Dψ   (20.198)

in which the euclidian action S_e is the spacetime integral of the energy density. This formula is quite general; it holds in nonabelian gauge theories and is important in lattice gauge theory.

Our second choice is to multiply the numerator and the denominator of the ratio (20.196) by the exponential exp[−i(α/2) ∫ (△Λ)² d⁴x] and then integrate over Λ(x) in the numerator and over Λ′(x) in the denominator. This operation just multiplies the numerator and denominator by the same constant factor, which cancels. But if before integrating over all gauge transformations we shift Λ so that △Λ changes to △Λ − Ȧ⁰, then the exponential factor is exp[−i(α/2) ∫ (Ȧ⁰ − △Λ)² d⁴x]. Now when we integrate over Λ(x), the delta function δ(∇·A + △Λ) replaces △Λ by −∇·A in the inserted exponential, converting it to exp[−i(α/2) ∫ (Ȧ⁰ + ∇·A)² d⁴x]. This term changes the gauge-invariant action (20.194) to the gauge-fixed action

S_α = ∫ [−¼ F_{ab}F^{ab} − (α/2)(∂_b A^b)² + A_b j^b + L_m] d⁴x.   (20.199)

This Lorentz-invariant, gauge-fixed action is much easier to use than the Coulomb-gauge action (20.188) with the Coulomb potential (20.186). We can use it to compute scattering amplitudes perturbatively. The mean value of a time-ordered product of operators in the ground state |0⟩ of the free theory is

⟨0| T[O₁ ⋯ O_n] |0⟩ = ∫ O₁ ⋯ O_n e^{iS_α} DA Dψ / ∫ e^{iS_α} DA Dψ.   (20.200)

By following steps analogous to those that led to (20.177), one may show (Exercise 20.30) that in Feynman's gauge, α = 1, the photon propagator is

⟨0| T[A_μ(x) A_ν(y)] |0⟩ = −iΔ_{μν}(x − y) = −i ∫ η_{μν} e^{iq·(x−y)} / (q² − iε) d⁴q/(2π)⁴.   (20.201)


20.12 Fermionic Path Integrals

In our brief introduction (1.11–1.12 and 1.44–1.46) to Grassmann variables, we learned that because θ² = 0, the most general function f(θ) of a single Grassmann variable θ is f(θ) = a + bθ. So a complete integral table consists of the integral of this linear function

∫ f(θ) dθ = ∫ (a + bθ) dθ = a ∫ dθ + b ∫ θ dθ.   (20.202)

This equation has two unknowns, the integral ∫ dθ of unity and the integral ∫ θ dθ of θ. We choose them so that the integral of f(θ + ζ)

∫ f(θ + ζ) dθ = ∫ [a + b(θ + ζ)] dθ = (a + bζ) ∫ dθ + b ∫ θ dθ   (20.203)

is the same as the integral (20.202) of f(θ). Thus the integral ∫ dθ of unity must vanish, while the integral ∫ θ dθ of θ can be any constant, which we choose to be unity. Our complete table of integrals is then

∫ dθ = 0   and   ∫ θ dθ = 1.   (20.204)

The anticommutation relations for a fermionic degree of freedom ψ are

{ψ, ψ†} ≡ ψψ† + ψ†ψ = 1   and   {ψ, ψ} = {ψ†, ψ†} = 0.   (20.205)

Because ψ has ψ†, it is conventional to introduce a variable θ* = θ† that anticommutes with itself and with θ

{θ*, θ*} = {θ*, θ} = {θ, θ} = 0.   (20.206)

The logic that led to (20.204) now gives

∫ dθ* = 0   and   ∫ θ* dθ* = 1.   (20.207)

We define the reference state |0⟩ as |0⟩ ≡ ψ|s⟩ for a state |s⟩ that is not annihilated by ψ. Since ψ² = 0, the operator ψ annihilates the state |0⟩

ψ|0⟩ = ψ²|s⟩ = 0.   (20.208)

The effect of the operator ψ on the state

|θ⟩ = exp(ψ†θ − ½ θ*θ) |0⟩ = (1 + ψ†θ − ½ θ*θ) |0⟩   (20.209)

is

ψ|θ⟩ = ψ(1 + ψ†θ − ½ θ*θ)|0⟩ = ψψ†θ|0⟩ = (1 − ψ†ψ)θ|0⟩ = θ|0⟩   (20.210)

while that of θ on |θ⟩ is

θ|θ⟩ = θ(1 + ψ†θ − ½ θ*θ)|0⟩ = θ|0⟩.   (20.211)

The state |θ⟩ therefore is an eigenstate of ψ with eigenvalue θ

ψ|θ⟩ = θ|θ⟩.   (20.212)

The bra corresponding to the ket |ζ⟩ is

⟨ζ| = ⟨0|(1 + ζ*ψ − ½ ζ*ζ)   (20.213)

and the inner product ⟨ζ|θ⟩ is (Exercise 20.31)

⟨ζ|θ⟩ = ⟨0|(1 + ζ*ψ − ½ ζ*ζ)(1 + ψ†θ − ½ θ*θ)|0⟩
  = ⟨0|1 + ζ*ψψ†θ − ½ ζ*ζ − ½ θ*θ + ¼ ζ*ζθ*θ|0⟩
  = ⟨0|1 + ζ*θ − ½ ζ*ζ − ½ θ*θ + ¼ ζ*ζθ*θ|0⟩
  = exp[ζ*θ − ½(ζ*ζ + θ*θ)].   (20.214)

Example 20.9 (A gaussian integral) For any number c, we can compute the integral of exp(cθ*θ) by expanding the exponential

∫ e^{cθ*θ} dθ* dθ = ∫ (1 + cθ*θ) dθ* dθ = ∫ (1 − cθθ*) dθ* dθ = −c.   (20.215)

The identity operator for the space of states

c|0⟩ + d|1⟩ ≡ c|0⟩ + dψ†|0⟩   (20.216)

is (Exercise 20.32) the integral

I = ∫ |θ⟩⟨θ| dθ* dθ = |0⟩⟨0| + |1⟩⟨1|   (20.217)

in which the differentials anticommute with each other and with other fermionic variables: {dθ, dθ*} = 0, {dθ, θ} = 0, {dθ, ψ} = 0, and so forth.

The case of several Grassmann variables θ₁, θ₂, …, θ_n and several Fermi operators ψ₁, ψ₂, …, ψ_n is similar. The θ_k anticommute among themselves and with the Fermi operators

{θ_i, θ_j} = {θ_i, θ_j*} = {θ_i*, θ_j*} = 0   and   {θ_i, ψ_k} = {θ_i*, ψ_k} = 0   (20.218)

while the ψ_k satisfy

{ψ_k, ψ_ℓ†} = δ_{kℓ}   and   {ψ_k, ψ_ℓ} = {ψ_k†, ψ_ℓ†} = 0.   (20.219)

The reference state |0⟩ is

|0⟩ = (∏_{k=1}^{n} ψ_k) |s⟩   (20.220)

in which |s⟩ is any state not annihilated by any ψ_k (so the resulting |0⟩ isn't zero). The direct-product state

|θ⟩ ≡ exp[Σ_{k=1}^{n} (ψ_k†θ_k − ½ θ_k*θ_k)] |0⟩ = ∏_{k=1}^{n} (1 + ψ_k†θ_k − ½ θ_k*θ_k) |0⟩   (20.221)

is (Exercise 20.33) a simultaneous eigenstate ψ_k|θ⟩ = θ_k|θ⟩ of each ψ_k.

It follows that

ψ_ℓ ψ_k |θ⟩ = ψ_ℓ θ_k |θ⟩ = −θ_k ψ_ℓ |θ⟩ = −θ_k θ_ℓ |θ⟩ = θ_ℓ θ_k |θ⟩   (20.222)

and so too ψ_k ψ_ℓ |θ⟩ = θ_k θ_ℓ |θ⟩. Since the ψ's anticommute, their eigenvalues must also

θ_ℓ θ_k |θ⟩ = ψ_ℓ ψ_k |θ⟩ = −ψ_k ψ_ℓ |θ⟩ = −θ_k θ_ℓ |θ⟩.   (20.223)

The inner product ⟨ζ|θ⟩ is

⟨ζ|θ⟩ = ⟨0| ∏_{k=1}^{n} (1 + ζ_k*ψ_k − ½ ζ_k*ζ_k) ∏_{k=1}^{n} (1 + ψ_k†θ_k − ½ θ_k*θ_k) |0⟩
  = exp[Σ_{k=1}^{n} (ζ_k*θ_k − ½(ζ_k*ζ_k + θ_k*θ_k))] = e^{ζ†θ − (ζ†ζ + θ†θ)/2}.   (20.224)

The identity operator is

I = ∫ |θ⟩⟨θ| ∏_{k=1}^{n} dθ_k* dθ_k.   (20.225)

Example 20.10 (Gaussian Grassmann integral) For any 2 × 2 matrix A, we may compute the gaussian integral

g(A) = ∫ e^{−θ†Aθ} dθ₁* dθ₁ dθ₂* dθ₂   (20.226)

by expanding the exponential. The only terms that survive are the ones that have exactly one of each of the four variables θ₁, θ₂, θ₁*, and θ₂*. Thus the integral is the determinant of the matrix A

g(A) = ∫ ½ (θ_k* A_{kℓ} θ_ℓ)² dθ₁* dθ₁ dθ₂* dθ₂ = ∫ (θ₁*A₁₁θ₁ θ₂*A₂₂θ₂ + θ₁*A₁₂θ₂ θ₂*A₂₁θ₁) dθ₁* dθ₁ dθ₂* dθ₂
  = A₁₁A₂₂ − A₁₂A₂₁ = det A.   (20.227)

The natural generalization to n dimensions

∫ e^{−θ†Aθ} ∏_{k=1}^{n} dθ_k* dθ_k = det A   (20.228)

is true for any n × n matrix A. If A is invertible, then the invariance of Grassmann integrals under translations implies that

∫ e^{−θ†Aθ + θ†ζ + ζ†θ} ∏_{k=1}^{n} dθ_k* dθ_k = ∫ e^{−θ†A(θ + A⁻¹ζ) + θ†ζ + ζ†(θ + A⁻¹ζ)} ∏_{k=1}^{n} dθ_k* dθ_k
  = ∫ e^{−θ†Aθ + ζ†θ + ζ†A⁻¹ζ} ∏_{k=1}^{n} dθ_k* dθ_k
  = ∫ e^{−(θ† + ζ†A⁻¹)Aθ + ζ†θ + ζ†A⁻¹ζ} ∏_{k=1}^{n} dθ_k* dθ_k
  = ∫ e^{−θ†Aθ + ζ†A⁻¹ζ} ∏_{k=1}^{n} dθ_k* dθ_k = det A · e^{ζ†A⁻¹ζ}.   (20.229)

The values of θ and θ † that make the argument −θ † Aθ + θ †ζ + ζ † θ of the exponential stationary are θ = A −1ζ and θ † = ζ † A−1 . So a gaussian Grassmann integral is equal to its exponential evaluated at its stationary point, apart from a prefactor involving the determinant det A. Exercises (20.2 and 20.4) are about the bosonic versions (20.3 and 20.4) of this result.
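Both gaussian integrals (20.215) and (20.227) can be checked with a small, hand-rolled Grassmann algebra (a sketch; elements are dicts mapping sorted tuples of generator indices to coefficients, and the Berezin integral follows the table (20.204), with the innermost differential integrated first):

```python
# Minimal Grassmann algebra with Berezin integration, enough to verify
# that int e^{c th* th} dth* dth = -c and that the 2x2 gaussian
# Grassmann integral (20.227) equals det A.
import numpy as np

def mul(f, g):
    """Product of two Grassmann elements with anticommuting generators."""
    out = {}
    for m1, c1 in f.items():
        for m2, c2 in g.items():
            if set(m1) & set(m2):
                continue                      # repeated generator -> 0
            seq, sign = list(m1) + list(m2), 1
            for i in range(len(seq)):         # bubble sort tracks parity
                for k in range(len(seq) - 1 - i):
                    if seq[k] > seq[k + 1]:
                        seq[k], seq[k + 1] = seq[k + 1], seq[k]
                        sign = -sign
            key = tuple(seq)
            out[key] = out.get(key, 0) + sign * c1 * c2
    return out

def integrate(f, gen):
    """Berezin integral over one generator: move it rightmost, drop it."""
    out = {}
    for mono, c in f.items():
        if gen in mono:
            sign = (-1) ** (len(mono) - 1 - mono.index(gen))
            key = tuple(i for i in mono if i != gen)
            out[key] = out.get(key, 0) + sign * c
    return out

def gexp(x, nmax=4):
    """exp(x) for an even Grassmann element, by its (finite) series."""
    term, total = {(): 1.0}, {(): 1.0}
    for k in range(1, nmax + 1):
        term = {m: c / k for m, c in mul(term, x).items()}
        for m, c in term.items():
            total[m] = total.get(m, 0) + c
    return total

# (20.215): generators 0 = theta, 1 = theta*; c th* th = -c th th*.
c = 2.5
f = gexp({(0, 1): -c})
res = integrate(integrate(f, 1), 0)           # d theta* first, then d theta
print(res.get((), 0))                         # -2.5

# (20.227): generators 0,1 = theta_1,theta_2; 2,3 = theta*_1,theta*_2.
A = np.array([[1.0, 2.0], [3.0, 5.0]])
minus_quadratic = {(0, 2): A[0, 0], (1, 2): A[0, 1],
                   (0, 3): A[1, 0], (1, 3): A[1, 1]}   # -theta^dag A theta
g = gexp(minus_quadratic)
for gen in (2, 0, 3, 1):                      # d th1* d th1 d th2* d th2
    g = integrate(g, gen)
print(g.get((), 0), np.linalg.det(A))         # both equal det A = -1
```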

One may further extend these definitions to a Grassmann field χ_m(x) and an associated Dirac field ψ_m(x). The χ_m(x)'s anticommute among themselves and with all fermionic variables at all points of spacetime

{χ_m(x), χ_n(x′)} = {χ_m*(x), χ_n(x′)} = {χ_m*(x), χ_n*(x′)} = 0   (20.230)

and the Dirac field ψ_m(x) obeys the equal-time anticommutation relations

{ψ_m(x, t), ψ_n†(x′, t)} = δ_{mn} δ(x − x′)   (n, m = 1, …, 4)
{ψ_m(x, t), ψ_n(x′, t)} = {ψ_m†(x, t), ψ_n†(x′, t)} = 0.   (20.231)

As in (20.220), we use eigenstates of the field ψ at t = 0. If |0⟩ is defined in terms of a state |s⟩ that is not annihilated by any ψ_m(x, 0) as


|0⟩ = (∏_{m,x} ψ_m(x, 0)) |s⟩   (20.232)

then (Exercise 20.34) the state

|χ⟩ = exp[Σ_m ∫ (ψ_m†(x, 0) χ_m(x) − ½ χ_m*(x) χ_m(x)) d³x] |0⟩ = exp[∫ (ψ†χ − ½ χ†χ) d³x] |0⟩   (20.233)

is an eigenstate of the operator ψ_m(x, 0) with eigenvalue χ_m(x)

ψ_m(x, 0)|χ⟩ = χ_m(x)|χ⟩.   (20.234)

The inner product of two such states is (Exercise 20.35)

⟨χ′|χ⟩ = exp[∫ (χ′†χ − ½ χ′†χ′ − ½ χ†χ) d³x].   (20.235)

The identity operator is the integral

I = ∫ |χ⟩⟨χ| Dχ* Dχ   (20.236)

in which

Dχ* Dχ ≡ ∏_{m,x} dχ_m*(x) dχ_m(x).   (20.237)

ψ (γ

d χm∗ ( x )d χm ( x ). ψ

(20.236)

(20.237)

of mass m is the spatial integral

· ∇ + m) ψ d 3x

(20.238)

in which \(\overline{\psi} \equiv i\,\psi^\dagger\gamma^0\) and the gamma matrices (11.327) satisfy

\[
\{\gamma^a, \gamma^b\} = 2\,\eta^{ab}
\qquad (20.239)
\]

where η is the 4 × 4 diagonal matrix with entries (−1, 1, 1, 1). Since \(\psi|\chi\rangle = \chi|\chi\rangle\) and \(\langle\chi'|\psi^\dagger = \langle\chi'|\chi'^\dagger\), the quantity \(\langle\chi'|\exp(-i\epsilon H_0)|\chi\rangle\) is by (20.235)

\[
\begin{aligned}
\langle\chi'| e^{-i\epsilon H_0} |\chi\rangle
&= \langle\chi'|\chi\rangle\, \exp\Big( -i\epsilon \int \overline{\chi}{}'\,(\gamma\cdot\nabla + m)\,\chi \; d^3x \Big) \\
&= \exp\Big( \int \tfrac{1}{2}(\chi'^\dagger - \chi^\dagger)\chi - \tfrac{1}{2}\chi'^\dagger(\chi' - \chi)
- i\epsilon\, \overline{\chi}{}'\,(\gamma\cdot\nabla + m)\,\chi \; d^3x \Big) \\
&= \exp\Big( \epsilon \int \Big[ \tfrac{1}{2}\dot\chi^\dagger\chi - \tfrac{1}{2}\chi'^\dagger\dot\chi
- i\, \overline{\chi}{}'\,(\gamma\cdot\nabla + m)\,\chi \Big] d^3x \Big)
\end{aligned}
\qquad (20.240)
\]

20.12 Fermionic Path Integrals

in which \(\chi'^\dagger - \chi^\dagger = \epsilon\,\dot\chi^\dagger\) and \(\chi' - \chi = \epsilon\,\dot\chi\). Everything within the square brackets is multiplied by ε, so we may replace χ'† by χ† and χ' by χ so as to write to first order in ε

\[
\langle\chi'| e^{-i\epsilon H_0} |\chi\rangle
= \exp\Big( \epsilon \int \Big[ \tfrac{1}{2}\dot\chi^\dagger\chi - \tfrac{1}{2}\chi^\dagger\dot\chi
- i\,\overline{\chi}\,(\gamma\cdot\nabla + m)\,\chi \Big] d^3x \Big)
\qquad (20.241)
\]

in which the dependence upon χ' is through the time derivatives. Putting together n = 2t/ε such matrix elements, integrating over all intermediate-state dyadics |χ⟩⟨χ|, and using our formula (20.236), we find

\[
\langle\chi_t| e^{-2itH_0} |\chi_{-t}\rangle
= \int \exp\Big( \int \Big[ \tfrac{1}{2}\dot\chi^\dagger\chi - \tfrac{1}{2}\chi^\dagger\dot\chi
- i\,\overline{\chi}\,(\gamma\cdot\nabla + m)\,\chi \Big] d^4x \Big)\, D\chi^*\, D\chi .
\qquad (20.242)
\]

Integrating \(\dot\chi^\dagger\chi\) by parts and dropping the surface term, we get

\[
\langle\chi_t| e^{-2itH_0} |\chi_{-t}\rangle
= \int \exp\Big( \int \Big[ -\chi^\dagger\dot\chi
- i\,\overline{\chi}\,(\gamma\cdot\nabla + m)\,\chi \Big] d^4x \Big)\, D\chi^*\, D\chi .
\qquad (20.243)
\]

Since \(-\chi^\dagger\dot\chi = -i\,\overline{\chi}\gamma^0\dot\chi\), the argument of the exponential is

\[
i \int \Big( -\overline{\chi}\gamma^0\dot\chi - \overline{\chi}\,(\gamma\cdot\nabla + m)\,\chi \Big)\, d^4x
= i \int -\overline{\chi}\,(\gamma^\mu\partial_\mu + m)\,\chi \; d^4x .
\qquad (20.244)
\]

We then have

\[
\langle\chi_t| e^{-2itH_0} |\chi_{-t}\rangle
= \int \exp\Big( i \int L_0(\chi)\; d^4x \Big)\, D\chi^*\, D\chi
\qquad (20.245)
\]

in which \(L_0(\chi) = -\overline{\chi}\,(\gamma^\mu\partial_\mu + m)\,\chi\) is the action density (11.329) for a free Dirac field. Thus the amplitude is a path integral with phases given by the classical action S₀[χ]

\[
\langle\chi_t| e^{-2itH_0} |\chi_{-t}\rangle
= \int e^{\,i\int L_0(\chi)\, d^4x}\; D\chi^*\, D\chi
= \int e^{\,i S_0[\chi]}\; D\chi^*\, D\chi
\qquad (20.246)
\]

and the integral is over all fields that go from χ(x, −t) = χ₋ₜ(x) to χ(x, t) = χₜ(x). Any normalization factor will cancel in ratios of such integrals. Since Fermi fields anticommute, their time-ordered product has an extra minus sign

\[
\mathcal{T}\big[\psi(x_1)\psi(x_2)\big]
= \theta(x_1^0 - x_2^0)\,\psi(x_1)\psi(x_2)
- \theta(x_2^0 - x_1^0)\,\psi(x_2)\psi(x_1) .
\qquad (20.247)
\]

The logic behind our formulas (20.140) and (20.158) for the time-ordered product of bosonic fields now leads to an expression for the time-ordered product of 2n Dirac fields (with Dχ'' and Dχ' and so forth suppressed)

\[
\langle 0|\mathcal{T}\big[\psi(x_1)\cdots\psi(x_{2n})\big]|0\rangle
= \frac{\int \langle 0|\chi''\rangle\, \chi(x_1)\cdots\chi(x_{2n})\, e^{iS_0[\chi]}\, \langle\chi'|0\rangle\; D\chi^*\, D\chi}
       {\int \langle 0|\chi''\rangle\, e^{iS_0[\chi]}\, \langle\chi'|0\rangle\; D\chi^*\, D\chi} .
\qquad (20.248)
\]

As in (20.169), the effect of the inner products ⟨0|χ''⟩ and ⟨χ'|0⟩ is to insert ε-terms which modify the Dirac propagators

\[
\langle 0|\mathcal{T}\big[\psi(x_1)\cdots\psi(x_{2n})\big]|0\rangle
= \frac{\int \chi(x_1)\cdots\chi(x_{2n})\, e^{iS_0[\chi,\epsilon]}\; D\chi^*\, D\chi}
       {\int e^{iS_0[\chi,\epsilon]}\; D\chi^*\, D\chi} .
\qquad (20.249)
\]

Imitating (20.170), we introduce a Grassmann external current ζ(x) and define a fermionic analog of Z₀[j]

\[
Z_0[\zeta] \equiv \langle 0|\mathcal{T}\Big[ e^{\int \overline{\zeta}\psi + \overline{\psi}\zeta\, d^4x} \Big]|0\rangle
= \frac{\int e^{\int \overline{\zeta}\chi + \overline{\chi}\zeta\, d^4x}\, e^{iS_0[\chi,\epsilon]}\; D\chi^*\, D\chi}
       {\int e^{iS_0[\chi,\epsilon]}\; D\chi^*\, D\chi} .
\qquad (20.250)
\]

Example 20.11 (Feynman's fermion propagator) Since

\[
\begin{aligned}
i(\gamma^\mu\partial_\mu + m)\,\triangle(x-y)
&\equiv i(\gamma^\mu\partial_\mu + m)\,(-i)\int \frac{d^4p}{(2\pi)^4}\, e^{ip\cdot(x-y)}\,
\frac{-i\gamma^\nu p_\nu + m}{p^2 + m^2 - i\epsilon} \\
&= \int \frac{d^4p}{(2\pi)^4}\, e^{ip\cdot(x-y)}\,
\frac{(i\gamma^\mu p_\mu + m)(-i\gamma^\nu p_\nu + m)}{p^2 + m^2 - i\epsilon} \\
&= \int \frac{d^4p}{(2\pi)^4}\, e^{ip\cdot(x-y)}\,
\frac{p^2 + m^2}{p^2 + m^2 - i\epsilon}
= \delta^4(x-y),
\end{aligned}
\qquad (20.251)
\]

the function △(x − y) is the inverse of the differential operator i(γᵘ∂ᵤ + m). Thus the Grassmann identity (20.229) implies that Z₀[ζ] is

\[
Z_0[\zeta]
= \frac{\int e^{\int [\,\overline{\zeta}\chi + \overline{\chi}\zeta
- i\overline{\chi}(\gamma^\mu\partial_\mu + m)\chi\,]\, d^4x}\; D\chi^*\, D\chi}
       {\int e^{iS_0[\chi,\epsilon]}\; D\chi^*\, D\chi}
= \exp\Big( \int \overline{\zeta}(x)\,\triangle(x-y)\,\zeta(y)\; d^4x\, d^4y \Big) .
\qquad (20.252)
\]

Differentiating we get

\[
\langle 0|\mathcal{T}\big[\psi(x)\overline{\psi}(y)\big]|0\rangle
= \triangle(x-y)
= -i \int \frac{d^4p}{(2\pi)^4}\, e^{ip\cdot(x-y)}\,
\frac{-i\gamma^\nu p_\nu + m}{p^2 + m^2 - i\epsilon} .
\qquad (20.253)
\]
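The algebraic step at the heart of (20.251) is that (iγᵘpᵤ + m)(−iγᵛp_ν + m) = (p² + m²)·1. The text's representation (11.327) of the gamma matrices is not reproduced in this excerpt, so the sketch below builds an assumed representation (the standard Dirac matrices multiplied by i, which converts them to the metric η = diag(−1, 1, 1, 1) of (20.239)) and checks both relations numerically.

```python
import numpy as np

I2, Z2 = np.eye(2), np.zeros((2, 2))
sig = [np.array([[0, 1], [1, 0]], complex),
       np.array([[0, -1j], [1j, 0]], complex),
       np.array([[1, 0], [0, -1]], complex)]

# Standard Dirac matrices obey {g,g} = 2 diag(1,-1,-1,-1); multiplying
# by i converts them to eta = diag(-1,1,1,1) as in (20.239).
gamma = [1j * np.block([[I2, Z2], [Z2, -I2]])]
gamma += [1j * np.block([[Z2, s], [-s, Z2]]) for s in sig]
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

# check the anticommutation relation (20.239)
for a in range(4):
    for b in range(4):
        acomm = gamma[a] @ gamma[b] + gamma[b] @ gamma[a]
        assert np.allclose(acomm, 2.0 * eta[a, b] * np.eye(4))

# check the numerator algebra of (20.251):
# (i gamma.p + m)(-i gamma.p + m) = (p^2 + m^2) * identity
m = 0.7
p_low = np.array([0.3, 1.1, -0.4, 0.8])       # covariant components p_mu
slash = sum(gamma[mu] * p_low[mu] for mu in range(4))
psq = p_low @ eta @ p_low                      # p^2 = eta^{mu nu} p_mu p_nu
lhs = (1j * slash + m * np.eye(4)) @ (-1j * slash + m * np.eye(4))
assert np.allclose(lhs, (psq + m**2) * np.eye(4))
print("gamma algebra and (p^2 + m^2) numerator check pass")
```

The cross terms ±imγᵘpᵤ cancel in the product, which is why the propagator's numerator collapses to p² + m² regardless of which representation of the gamma matrices is used.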


20.13 Application to Nonabelian Gauge Theories

The action of a generic non-abelian gauge theory is

\[
S = \int \Big( -\tfrac{1}{4}\, F_{a\mu\nu} F_a^{\mu\nu}
- \overline{\psi}\,(\gamma^\mu D_\mu + m)\,\psi \Big)\, d^4x
\qquad (20.254)
\]

in which the Maxwell field is

\[
F_{a\mu\nu} \equiv \partial_\mu A_{a\nu} - \partial_\nu A_{a\mu}
+ g\, f_{abc}\, A_{b\mu} A_{c\nu}
\qquad (20.255)
\]

and the covariant derivative is

\[
D_\mu\psi \equiv \partial_\mu\psi - i g\, t_a A_{a\mu}\,\psi .
\qquad (20.256)
\]

Here g is a coupling constant, f_abc is a structure constant (11.68), and t_a is a generator (11.57) of the Lie algebra (Section 11.16) of the gauge group. One may show (Weinberg, 1996, pp. 14–18) that the analog of equation (20.195) for quantum electrodynamics is

\[
\langle\Omega|\mathcal{T}\big[\mathcal{O}_1\cdots\mathcal{O}_n\big]|\Omega\rangle
= \frac{\int \mathcal{O}_1\cdots\mathcal{O}_n\, e^{iS}\, \delta[A_{a3}]\; DA\, D\psi}
       {\int e^{iS}\, \delta[A_{a3}]\; DA\, D\psi}
\qquad (20.257)
\]

in which the functional delta function

\[
\delta[A_{a3}] \equiv \prod_x \delta(A_{a3}(x))
\qquad (20.258)
\]

enforces the axial-gauge condition, and Dψ stands for Dψ* Dψ. Initially, physicists had trouble computing nonabelian amplitudes beyond the lowest order of perturbation theory. Then DeWitt showed how to compute to second order (DeWitt, 1967), and Faddeev and Popov, using path integrals, showed how to compute to all orders (Faddeev and Popov, 1967).

20.14 Faddeev–Popov Trick

The path-integral tricks of Faddeev and Popov are described in (Weinberg, 1996, pp. 19–27). We will use gauge-fixing functions G_a(x) to impose a gauge condition on our non-abelian gauge fields A_{aμ}(x). For instance, we can use G_a(x) = A_{a3}(x) to impose an axial gauge or G_a(x) = i∂ᵤA_a^μ(x) to impose a Lorentz-invariant gauge. Under an infinitesimal gauge transformation (13.371)

\[
A^\lambda_{a\mu} = A_{a\mu} - \partial_\mu\lambda_a
- g\, f_{abc}\, A_{b\mu}\,\lambda_c
\qquad (20.259)
\]


the gauge fields change, and so the gauge-fixing functions G_b(x), which depend upon them, also change. The jacobian J of that change at λ = 0 is

\[
J = \det\Big( \frac{\delta G^\lambda_a(x)}{\delta\lambda_b(y)} \Big)\Big|_{\lambda=0}
\qquad (20.260)
\]

\[
\equiv \frac{DG^\lambda}{D\lambda}\Big|_{\lambda=0}
\qquad (20.261)
\]

and it typically involves the delta function δ⁴(x − y). Let B[G] be any functional of the gauge-fixing functions G_b(x) such as

\[
B[G] = \prod_{x,a} \delta(G_a(x)) = \prod_{x,a} \delta(A_{a3}(x))
\]

in an axial gauge or

\[
B[G] = \exp\Big( \frac{i}{2} \int (G_a(x))^2\; d^4x \Big)
= \exp\Big( -\frac{i}{2} \int \big(\partial_\mu A_a^\mu(x)\big)^2\; d^4x \Big)
\qquad (20.262)
\]

in a Lorentz-invariant gauge. We want to understand functional integrals like (20.257)

\[
\langle\Omega|\mathcal{T}\big[\mathcal{O}_1\cdots\mathcal{O}_n\big]|\Omega\rangle
= \frac{\int \mathcal{O}_1\cdots\mathcal{O}_n\, e^{iS}\, B[G]\, J\; DA\, D\psi}
       {\int e^{iS}\, B[G]\, J\; DA\, D\psi}
\qquad (20.263)
\]

in which the operators 𝒪ₖ, the action functional S[A], and the differentials DA Dψ (but not the gauge-fixing functional B[G] or the jacobian J) are gauge invariant. The axial-gauge formula (20.257) is a simple example in which B[G] = δ[A_{a3}] enforces the axial-gauge condition A_{a3}(x) = 0 and the determinant J = det(δ_{ab} ∂₃ δ(x − y)) is a constant that cancels. If we translate the gauge fields by gauge transformations Λ and Λ', then the ratio (20.263) does not change

\[
\langle\Omega|\mathcal{T}\big[\mathcal{O}_1\cdots\mathcal{O}_n\big]|\Omega\rangle
= \frac{\int \mathcal{O}_1\cdots\mathcal{O}_n\, e^{iS}\, B[G^\Lambda]\, J^\Lambda\; DA^\Lambda\, D\psi^\Lambda}
       {\int e^{iS}\, B[G^{\Lambda'}]\, J^{\Lambda'}\; DA^{\Lambda'}\, D\psi^{\Lambda'}}
\qquad (20.264)
\]

any more than ∫ f(y) dy is different from ∫ f(x) dx. Since the operators 𝒪ₖ, the action functional S[A], and the differentials DA Dψ are gauge invariant, most of the Λ-dependence goes away

\[
\langle\Omega|\mathcal{T}\big[\mathcal{O}_1\cdots\mathcal{O}_n\big]|\Omega\rangle
= \frac{\int \mathcal{O}_1\cdots\mathcal{O}_n\, e^{iS}\, B[G^\Lambda]\, J^\Lambda\; DA\, D\psi}
       {\int e^{iS}\, B[G^{\Lambda'}]\, J^{\Lambda'}\; DA\, D\psi} .
\qquad (20.265)
\]


Let Λλ be a gauge transformation Λ followed by an infinitesimal gauge transformation λ. The jacobian J^Λ is a determinant of a product of matrices which is a product of their determinants

\[
\begin{aligned}
J^\Lambda &= \det\Big( \frac{\delta G^{\Lambda\lambda}_a(x)}{\delta\lambda_b(y)} \Big)\Big|_{\lambda=0}
= \det\Big( \int \frac{\delta G^{\Lambda\lambda}_a(x)}{\delta(\Lambda\lambda)_c(z)}\,
\frac{\delta(\Lambda\lambda)_c(z)}{\delta\lambda_b(y)}\; d^4z \Big)\Big|_{\lambda=0} \\
&= \det\Big( \frac{\delta G^{\Lambda\lambda}_a(x)}{\delta(\Lambda\lambda)_c(z)} \Big)\Big|_{\lambda=0}\,
\det\Big( \frac{\delta(\Lambda\lambda)_c(z)}{\delta\lambda_b(y)} \Big)\Big|_{\lambda=0} \\
&= \det\Big( \frac{\delta G_a(x)}{\delta\Lambda_c(z)} \Big)\,
\det\Big( \frac{\delta(\Lambda\lambda)_c(z)}{\delta\lambda_b(y)} \Big)\Big|_{\lambda=0}
\equiv \frac{DG}{D\Lambda}\, \frac{D(\Lambda\lambda)}{D\lambda}\Big|_{\lambda=0} .
\end{aligned}
\qquad (20.266)
\]

Now we integrate over the gauge transformations Λ (and Λ') with weight function ρ(Λ) = (D(Λλ)/Dλ|_{λ=0})⁻¹ and find, since the ratio (20.265) is Λ-independent,

\[
\begin{aligned}
\langle\Omega|\mathcal{T}\big[\mathcal{O}_1\cdots\mathcal{O}_n\big]|\Omega\rangle
&= \frac{\int \mathcal{O}_1\cdots\mathcal{O}_n\, e^{iS}\, B[G^\Lambda]\,
\dfrac{DG^\Lambda}{D\Lambda}\; DA\, D\psi\, D\Lambda}
{\int e^{iS}\, B[G^\Lambda]\, \dfrac{DG^\Lambda}{D\Lambda}\; DA\, D\psi\, D\Lambda} \\
&= \frac{\int \mathcal{O}_1\cdots\mathcal{O}_n\, e^{iS}\, B[G]\; DG\, DA\, D\psi}
{\int e^{iS}\, B[G]\; DG\, DA\, D\psi}
= \frac{\int \mathcal{O}_1\cdots\mathcal{O}_n\, e^{iS}\; DA\, D\psi}
{\int e^{iS}\; DA\, D\psi} .
\end{aligned}
\qquad (20.267)
\]

Thus the mean value in the vacuum of a time-ordered product of gauge-invariant operators is a ratio of path integrals over all gauge fields without any gauge fixing. No matter what gauge condition G or gauge-fixing functional B[G] we use, the resulting gauge-fixed ratio (20.263) is equal to the ratio (20.267) of path integrals over all gauge fields without any gauge fixing. All gauge-fixed ratios (20.263) give the same time-ordered products, and so we can use whatever gauge condition G or gauge-fixing functional B[G] is most convenient. The analogous formula for the euclidian time-ordered product is

\[
\langle\Omega|\mathcal{T}_e\big[\mathcal{O}_1\cdots\mathcal{O}_n\big]|\Omega\rangle
= \frac{\int \mathcal{O}_1\cdots\mathcal{O}_n\, e^{-S_e}\; DA\, D\psi}
       {\int e^{-S_e}\; DA\, D\psi}
\qquad (20.268)
\]

where the euclidian action S_e is the spacetime integral of the energy density. This formula is the basis for lattice gauge theory.


The path-integral formulas (20.197 and 20.198) derived for quantum electrodynamics therefore also apply to nonabelian gauge theories.

20.15 Ghosts

Faddeev and Popov showed how to do perturbative calculations in which one does fix the gauge. To continue our description of their tricks, we return to the gauge-fixed expression (20.263) for the time-ordered product

\[
\langle\Omega|\mathcal{T}\big[\mathcal{O}_1\cdots\mathcal{O}_n\big]|\Omega\rangle
= \frac{\int \mathcal{O}_1\cdots\mathcal{O}_n\, e^{iS}\, B[G]\, J\; DA\, D\psi}
       {\int e^{iS}\, B[G]\, J\; DA\, D\psi}
\qquad (20.269)
\]

set G_b(x) = −i∂ᵤA_b^μ(x) and use (20.262) as the gauge-fixing functional B[G]

\[
B[G] = \exp\Big( \frac{i}{2} \int (G_a(x))^2\; d^4x \Big)
= \exp\Big( -\frac{i}{2} \int \big(\partial_\mu A_a^\mu(x)\big)^2\; d^4x \Big) .
\qquad (20.270)
\]

This functional adds to the action density the term −(∂ᵤA_a^μ)²/2 which leads to a gauge-field propagator like the photon's (20.201)

\[
\langle 0|\mathcal{T}\big[ A_a^\mu(x)\, A_b^\nu(y) \big]|0\rangle
= -i\,\delta_{ab}\,\triangle^{\mu\nu}(x-y)
= -i \int \frac{d^4q}{(2\pi)^4}\, \frac{\eta^{\mu\nu}\,\delta_{ab}}{q^2 - i\epsilon}\, e^{iq\cdot(x-y)} .
\qquad (20.271)
\]

What about the determinant J? Under an infinitesimal gauge transformation (20.259), the gauge field becomes

\[
A^\lambda_{a\mu} = A_{a\mu} - \partial_\mu\lambda_a - g\, f_{abc}\, A_{b\mu}\,\lambda_c
\qquad (20.272)
\]

and so G^λ_a(x) = i∂^μA^λ_{aμ}(x) is

\[
G^\lambda_a(x) = i\,\partial^\mu A_{a\mu}(x)
+ i\,\partial^\mu \int \big[ -\delta_{ac}\,\partial_\mu
- g\, f_{abc}\, A_{b\mu}(x) \big]\, \delta^4(x-y)\,\lambda_c(y)\; d^4y .
\qquad (20.273)
\]

The jacobian J then is the determinant (20.260) of the matrix

\[
\frac{\delta G^\lambda_a(x)}{\delta\lambda_c(y)}\Big|_{\lambda=0}
= -i\,\delta_{ac}\,\Box\,\delta^4(x-y)
- i g\, f_{abc}\, \partial^\mu\big[ A_{b\mu}(x)\,\delta^4(x-y) \big]
\qquad (20.274)
\]

that is

\[
J = \det\Big( -i\,\delta_{ac}\,\Box\,\delta^4(x-y)
- i g\, f_{abc}\, \partial^\mu\big[ A_{b\mu}(x)\,\delta^4(x-y) \big] \Big) .
\qquad (20.275)
\]


But we've seen (20.228) that a determinant can be written as a fermionic path integral

\[
\det A = \int e^{-\theta^\dagger A\theta} \prod_{k=1}^n d\theta_k^*\, d\theta_k .
\qquad (20.276)
\]

So we can write the jacobian J as

\[
J = \int \exp\Big( \int i\,\omega_a^*\,\Box\,\omega_a
+ i g\, f_{abc}\,\omega_a^*\, \partial^\mu\big( A_{b\mu}\,\omega_c \big)\; d^4x \Big)\; D\omega^*\, D\omega
\qquad (20.277)
\]

which contributes the terms −∂ᵤω_a* ∂^μω_a and

\[
-\partial_\mu\omega_a^*\; g\, f_{abc}\, A_b^\mu\,\omega_c
= \partial_\mu\omega_a^*\; g\, f_{abc}\, A_c^\mu\,\omega_b
\qquad (20.278)
\]

to the action density. Thus we can do perturbation theory by using the modified action density

\[
L' = -\tfrac{1}{4}\, F_{a\mu\nu} F_a^{\mu\nu}
- \tfrac{1}{2}\big( \partial_\mu A_a^\mu \big)^2
- \partial_\mu\omega_a^*\, \partial^\mu\omega_a
+ \partial_\mu\omega_a^*\; g\, f_{abc}\, A_c^\mu\,\omega_b
- \overline{\psi}\,(\gamma^\mu D_\mu + m)\,\psi
\qquad (20.279)
\]

in which \(\gamma^\mu D_\mu = \gamma^\mu(\partial_\mu - i g\, t_a A_{a\mu})\). The ghost field ω is a mathematical device, not a physical field describing real particles, which would be spinless fermions violating the spin-statistics theorem (Example 11.22).

20.16 Effective Field Theories

Suppose a field φ whose mass M is huge compared to accessible energies interacts with a field ψ of a low-energy theory such as the standard model

\[
L_\phi = -\tfrac{1}{2}\,\partial_a\phi(x)\,\partial^a\phi(x)
- \tfrac{1}{2}\, M^2\phi^2(x) + g\,\overline{\psi}(x)\psi(x)\,\phi(x) .
\qquad (20.280)
\]

Compared to the mass term M²φ², the derivative terms ∂ₐφ ∂ᵃφ contribute little to the low-energy path integral. So we represent the effect of the heavy field φ as L_{φ0} = −½M²φ² + g ψ̄ψ φ. Completing the square

\[
L_{\phi 0} = -\tfrac{1}{2}\, M^2 \Big( \phi - \frac{g}{M^2}\,\overline{\psi}\psi \Big)^2
+ \frac{g^2}{2M^2}\,\big(\overline{\psi}\psi\big)^2
\qquad (20.281)
\]

and shifting φ by g ψ̄ψ/M², we see that the gaussian path integral is

\[
\int \exp\Big( i \int -\tfrac{1}{2}\, M^2\phi^2
+ \frac{g^2}{2M^2}\,\big(\overline{\psi}\psi\big)^2\; d^4x \Big)\, D\phi
= \exp\Big( i \int \frac{g^2}{2M^2}\,\big(\overline{\psi}\psi\big)^2\; d^4x \Big)
\]

apart from a field-independent factor. The net effect of the heavy field φ is thus to add to the low-energy theory a new interaction

\[
L_\psi = \frac{g^2}{2M^2}\,\big(\overline{\psi}\psi\big)^2
\qquad (20.282)
\]

which is small because M² is large. If a gauge boson Y_a of huge mass M interacts as L_{Y0} = −½M²Y_aY^a + ig ψ̄γᵃψ Y_a with a spin-one-half field ψ, then L_ψ = −(g²/(2M²)) ψ̄γᵃψ ψ̄γₐψ is the new low-energy interaction.

20.17 Complex Path Integrals

In this chapter, it has been tacitly assumed that the action is quadratic in the time derivatives of the fields. This assumption makes the hamiltonian quadratic in the momenta and the path integral over them gaussian. In general, however, the partition function is a path integral over fields and momenta like

\[
Z(\beta) = \int \exp\Big( \int_0^\beta\!\!\int
\big[\, i\,\dot\phi(x)\,\pi(x) - H(\phi,\pi) \,\big]\; dt\, d^3x \Big)\, D\phi\, D\pi
\qquad (20.283)
\]

in which the exponential is not a probability distribution. To study such theories, one often can numerically integrate over the momentum, make a look-up table for P[φ], and then apply the usual Monte Carlo methods of Section 16.5 (Amdahl and Cahill, 2016). Programs that do this are in the repository Path_integrals at github.com/kevinecahill.

Further Reading

"Space-Time Approach to Non-relativistic Quantum Mechanics" (Feynman, 1948), Quantum Mechanics and Path Integrals (Feynman et al., 2010), Statistical Mechanics (Feynman, 1972), The Quantum Theory of Fields I, II, & III (Weinberg, 1995, 1996, 2005), Quantum Field Theory in a Nutshell (Zee, 2010), and Quantum Field Theory (Srednicki, 2007) all provide excellent treatments of path integrals. Some applications are described in Path Integrals in Quantum Mechanics, Statistics, Polymer Physics, and Financial Markets (Kleinert, 2009).

Exercises

20.1 From (20.1), derive the multiple gaussian integral for real a_j and b_j

\[
\int_{-\infty}^{\infty} \exp\Big( \sum_{j=1}^n i a_j x_j^2 + 2 i b_j x_j \Big) \prod_{j=1}^n dx_j
= \prod_{j=1}^n \sqrt{\frac{i\pi}{a_j}}\; e^{-i b_j^2/a_j} .
\qquad (20.284)
\]

20.2 Use (20.284) to derive the multiple imaginary gaussian integral (20.3). Hint: Any real symmetric matrix s can be diagonalized by an orthogonal transformation a = o s oᵀ. Let y = ox.
20.3 Use (20.2) to show that for positive a_j

\[
\int_{-\infty}^{\infty} \exp\Big( \sum_{j=1}^n -a_j x_j^2 + 2 i b_j x_j \Big) \prod_{j=1}^n dx_j
= \prod_{j=1}^n \sqrt{\frac{\pi}{a_j}}\; e^{-b_j^2/a_j} .
\qquad (20.285)
\]

20.4 Use (20.285) to derive the many-variable real gaussian integral (20.4). Same hint as for Exercise 20.2.
20.5 Do the q₂ integral (20.27).
20.6 Insert the identity operator in the form of an integral (20.10) of outer products |p⟩⟨p| of eigenstates of the momentum operator p between the exponential and the state |q_a⟩ in the matrix element (20.25) and so derive for that matrix element ⟨q_b|exp(−i(t_b − t_a)H/ℏ)|q_a⟩ the formula (20.28). Hint: use the inner product ⟨q|p⟩ = exp(iqp/ℏ)/√(2πℏ), and do the resulting Fourier transform.
20.7 Derive the path-integral formula (20.36) for the quadratic action (20.35).
20.8 Show that for the simple harmonic oscillator (20.44) the action S[q_c] of the classical path from q_a, t_a to q_b, t_b is (20.46).
20.9 Show that the determinants |C_n(y)| = det C_n(y) of the tridiagonal matrices (20.52) satisfy the recursion relation (20.53) and have the initial values |C₁(y)| = y and |C₂(y)| = y² − 1. Incidentally, the Chebyshev polynomials (9.65) of the second kind U_n(y/2) obey the same recursion relation and have the same initial values, so |C_n(y)| = U_n(y/2).
20.10 (a) Show that the functions S_n(y) = sin(n+1)θ / sin θ with 2 cos θ = y satisfy the Toeplitz recursion relation (20.53), which after a cancellation simplifies to sin(n+2)θ = 2 cos θ sin(n+1)θ − sin nθ. (b) Derive the initial conditions S₀(y) = 1, S₁(y) = y, and S₂(y) = y² − 1.
20.11 Do the q₂ integral (20.75).
20.12 Show that the euclidian action (20.88) is stationary if the path q_ec(u) obeys the euclidian equation of motion q̈_ec(u) = ω²q_ec(u).
20.13 By using (20.20) for each of the three exponentials in (20.102), derive (20.103) from (20.102). Hint: From (20.20), one has

\[
\langle q_b'| e^{-i(t_b - t_a)H/\hbar} |q_a'\rangle
= \int \langle q_b'|q_b\rangle\, e^{i S[q]/\hbar}\, \langle q_a|q_a'\rangle\; Dq\, dq_a\, dq_b
\qquad (20.286)
\]

in which q_a = q(t_a) and q_b = q(t_b).
20.14 Derive the path-integral formula (20.139) from (20.129–20.132).
20.15 Derive the path-integral formula (20.153) from (20.147–20.150).
20.16 Show that the vector Ȳ that makes the argument −iYᵀSY + iDᵀY of the multiple gaussian integral

\[
\int_{-\infty}^{\infty} \exp\big( -i Y^T S Y + i D^T Y \big) \prod_{i=1}^n dy_i
= \sqrt{\frac{\pi^n}{\det(iS)}}\; \exp\Big( \frac{i}{4}\, D^T S^{-1} D \Big)
\qquad (20.287)
\]

stationary is Ȳ = S⁻¹D/2, and that the multiple gaussian integral (20.287) is equal to its exponential exp(−iYᵀSY + iDᵀY) evaluated at its stationary point Y = Ȳ apart from a prefactor involving det iS.
20.17 Show that the vector Ȳ that makes the argument −YᵀSY + DᵀY of the multiple gaussian integral

\[
\int_{-\infty}^{\infty} \exp\big( -Y^T S Y + D^T Y \big) \prod_{i=1}^n dy_i
= \sqrt{\frac{\pi^n}{\det S}}\; \exp\Big( \frac{1}{4}\, D^T S^{-1} D \Big)
\qquad (20.288)
\]

stationary is Ȳ = S⁻¹D/2, and that the multiple gaussian integral (20.288) is equal to its exponential exp(−YᵀSY + DᵀY) evaluated at its stationary point Y = Ȳ apart from a prefactor involving det S.
20.18 By taking the nonrelativistic limit of the formula (12.55) for the action of a relativistic particle of mass m and charge q, derive the expression (20.41) for the action of a nonrelativistic particle in an electromagnetic field with no scalar potential.
20.19 Work out the path-integral formula for the amplitude for a mass m initially at rest to fall to the ground from height h in a gravitational field of local acceleration g to lowest order and then including loops up to an overall constant. Hint: use the technique of Section 20.4.
20.20 Show that the euclidian action of the stationary solution (20.87) is (20.88).
20.21 Derive formula (20.161) for the action S₀[φ] from (20.159 and 20.160).
20.22 Derive identity (20.165). Split the time integral at t = 0 into two halves, use εe^{∓εt} = ∓ d e^{∓εt}/dt, and then integrate each half by parts.
20.23 Derive the third term in Equation (20.167) from the second term.
20.24 Use (20.171) and the Fourier transform (20.172) of the external current j to derive the formula (20.173) for the modified action S₀[φ, ε, j].
20.25 Derive Equation (20.174) from Equation (20.173).
20.26 Derive the formula (20.175) for Z₀[j] from the formula for S₀[φ, ε, j].
20.27 Derive Equations (20.176 and 20.177) from formula (20.175).
20.28 Derive Equation (20.181) from the formula (20.176) for Z₀[j].
20.29 Show that the time integral of the Coulomb term (20.186) is the term that is quadratic in j⁰ in the number F defined by (20.190).
20.30 By following steps analogous to those that led to (20.177), derive the formula (20.201) for the photon propagator in Feynman's gauge.
20.31 Derive expression (20.214) for the inner product ⟨ζ|θ⟩.
20.32 Derive the representation (20.217) of the identity operator I for a single fermionic degree of freedom from the rules (20.204 and 20.207) for Grassmann integration and the anticommutation relations (20.206).
20.33 Derive the eigenvalue equation ψ_k|θ⟩ = θ_k|θ⟩ from the definitions (20.220 and 20.221) of the eigenstate |θ⟩.
20.34 Derive the eigenvalue relation (20.234) for the Fermi field ψₘ(x, t) from the anticommutation relations (20.230 and 20.231) and the definitions (20.232 and 20.233).
20.35 Derive the formula (20.235) for the inner product ⟨χ'|χ⟩ from the definition (20.233) of the ket |χ⟩.
20.36 Imitate the derivation of the path-integral formula (20.66) and derive its 3-dimensional version (20.73).
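The identity |C_n(y)| = U_n(y/2) claimed in Exercise 20.9 is easy to confirm numerically. The matrices (20.52) are not reproduced in this excerpt, so the code below assumes the standard tridiagonal shape consistent with the stated recursion — y on the diagonal and 1 on the off-diagonals, so that D_n = y D_{n−1} − D_{n−2} with D₁ = y and D₂ = y² − 1.

```python
import numpy as np

def C(n, y):
    """Tridiagonal matrix with y on the diagonal and 1 off it (an assumed
    form of (20.52)); its determinant obeys D_n = y D_{n-1} - D_{n-2}."""
    return y * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)

def U(n, x):
    """Chebyshev polynomial of the second kind, from its recursion
    U_0 = 1, U_1 = 2x, U_{n+1} = 2x U_n - U_{n-1}."""
    u0, u1 = 1.0, 2.0 * x
    if n == 0:
        return u0
    for _ in range(n - 1):
        u0, u1 = u1, 2.0 * x * u1 - u0
    return u1

y = 1.7
for n in range(1, 8):
    assert abs(np.linalg.det(C(n, y)) - U(n, y / 2.0)) < 1e-9
print("det C_n(y) = U_n(y/2) checked for n = 1..7")
```

With x = y/2 the Chebyshev recursion U_{n+1} = 2x U_n − U_{n−1} is exactly the determinant recursion, and the initial values U₁(y/2) = y and U₂(y/2) = y² − 1 match, which is the content of the exercise.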

21 Renormalization Group

21.1 Renormalization and Interpolation

Probably because they describe point particles, quantum field theories are divergent. Unknown physics at very short distance scales removes these infinities. Since these infinities really are absent, we can cancel them consistently in renormalizable theories by a procedure called renormalization. One starts with an action that contains infinite, unknown charges and masses, such as e₀ = e − δe and m₀ = m − δm, in which −e is the charge of the electron and m is its mass. One then uses δe and δm to cancel unwanted infinite terms as they appear in perturbation theory. Because the underlying theory is finite, the value of a divergent scattering amplitude may change by a finite amount when we compute it at two different sets of initial and final momenta. This happens, for example, in the theory of a scalar field φ with action density

\[
L = -\tfrac{1}{2}\,\partial_i\phi\,\partial^i\phi
- \tfrac{1}{2}\, m^2\phi^2 - \frac{g}{4!}\,\phi^4 .
\qquad (21.1)
\]

The amplitude for the elastic scattering of two bosons of initial four-momenta p₁ and p₂ into two of final momenta p₁' and p₂' is

\[
A = g - \frac{g^2}{16\pi^2} \int_0^\infty k^3\, dk \int_0^1 dx\,
\Big( \big[k^2 + m^2 - s\,x(1-x)\big]^{-2}
+ \big[k^2 + m^2 - t\,x(1-x)\big]^{-2}
+ \big[k^2 + m^2 - u\,x(1-x)\big]^{-2} \Big)
\qquad (21.2)
\]

to one-loop order (Weinberg, 1995, section 12.2). In this formula, s, t, and u are the Mandelstam variables s = −(p₁ + p₂)², t = −(p₁ − p₁')², and u = −(p₁ − p₂')², and k² = k₀² + k₁² + k₂² + k₃² after a Wick rotation.


The amplitude A(s, t, u) diverges logarithmically, but the difference between it and its value A₀ = A(s₀, t₀, u₀) at some point (s₀, t₀, u₀) is finite

\[
\begin{aligned}
A(s,t,u) - A_0 = \frac{g^2}{16\pi^2} \int_0^\infty k^3\, dk \int_0^1 dx\,
\bigg\{ & \frac{x(1-x)\,(s - s_0)\,\big[2k^2 + 2m^2 - (s + s_0)\,x(1-x)\big]}
{\big[k^2 + m^2 - s\,x(1-x)\big]^2 \big[k^2 + m^2 - s_0\,x(1-x)\big]^2} \\
& + (s, s_0 \to t, t_0) + (s, s_0 \to u, u_0) \bigg\} .
\end{aligned}
\qquad (21.3)
\]

The second and third terms within the big curly brackets are the same as the first term but with s and s₀ replaced by t and t₀ in the second term and by u and u₀ in the third. The k integral is finite, as is A(s, t, u) − A₀

\[
\begin{aligned}
A(s,t,u) - A_0 = -\frac{g^2}{32\pi^2} \int_0^1 dx\,
\bigg\{ & \ln\Big( \frac{m^2 - s\,x(1-x)}{m^2 - s_0\,x(1-x)} \Big)
+ \ln\Big( \frac{m^2 - t\,x(1-x)}{m^2 - t_0\,x(1-x)} \Big) \\
& + \ln\Big( \frac{m^2 - u\,x(1-x)}{m^2 - u_0\,x(1-x)} \Big) \bigg\} .
\end{aligned}
\qquad (21.4)
\]

If we choose as the renormalization point s₀ = t₀ = u₀ = −4μ²/3, then we get the usual result (Weinberg, 1995, 1996, sections 12.2, 18.1–2).

21.2 Renormalization Group in Quantum Field Theory

We can use the scattering amplitude (21.4) to define a running coupling constant g_μ at energy scale μ as the experimentally measured, finite, physical amplitude A at s₀ = t₀ = u₀ = −μ². Then with g_μ ≡ A(s₀, t₀, u₀), the scattering amplitude (21.4) is finite to order g²

\[
\begin{aligned}
A(s,t,u) = g_\mu - \frac{g^2}{32\pi^2} \int_0^1 dx\,
\bigg\{ & \ln\Big( \frac{1 + (\mu/m)^2\, x(1-x)}{1 - (s/m^2)\, x(1-x)} \Big)
+ \ln\Big( \frac{1 + (\mu/m)^2\, x(1-x)}{1 - (t/m^2)\, x(1-x)} \Big) \\
& + \ln\Big( \frac{1 + (\mu/m)^2\, x(1-x)}{1 - (u/m^2)\, x(1-x)} \Big) \bigg\} .
\end{aligned}
\qquad (21.5)
\]

Callan (Callan, 1970) and Symanzik (Symanzik, 1970) noticed that this scattering amplitude, like any physical quantity, is independent of the sliding scale μ. Thus its derivative with respect to μ vanishes

\[
0 = \frac{\partial A(s,t,u)}{\partial\mu}
= \frac{\partial g_\mu}{\partial\mu}
- \frac{3g^2}{32\pi^2}\, \frac{\partial}{\partial\mu}
\int_0^1 dx\, \ln\big[ 1 + (\mu/m)^2\, x(1-x) \big]
= \frac{\partial g_\mu}{\partial\mu}
- \frac{3g^2}{32\pi^2} \int_0^1
\frac{(2\mu/m^2)\, x(1-x)}{1 + (\mu/m)^2\, x(1-x)}\; dx .
\qquad (21.6)
\]


For μ ≫ m, the integral is 2/μ. So at high energies, the running coupling constant obeys the differential equation

\[
\mu\, \frac{\partial g_\mu}{\partial\mu} \equiv \beta(g_\mu)
= \frac{3g^2}{16\pi^2} = \frac{3g_\mu^2}{16\pi^2}
\qquad (21.7)
\]

in which the last equality holds to second order in g_μ. Integrating the beta function β(g_μ), we get

\[
\ln\frac{E}{M} = \int_M^E \frac{d\mu}{\mu}
= \int_{g_M}^{g_E} \frac{dg_\mu}{\beta(g_\mu)}
= \frac{16\pi^2}{3} \int_{g_M}^{g_E} \frac{dg_\mu}{g_\mu^2}
= \frac{16\pi^2}{3} \Big( \frac{1}{g_M} - \frac{1}{g_E} \Big) .
\qquad (21.8)
\]

So the running coupling constant g_μ at energy μ = E is

\[
g_E = \frac{g_M}{1 - 3\, g_M\, \ln(E/M)/16\pi^2} .
\qquad (21.9)
\]

As the energy E = √s rises above M, while staying below the singular value E = M exp(16π²/3g_M), the running coupling constant g_E slowly increases, as does the scattering amplitude, A ≈ g_E.

Example 21.1 (Quantum electrodynamics) Vacuum polarization makes the one-loop amplitude for the scattering of two electrons proportional to A(q²) = e²[1 + π(q²)] rather than to e² (Gell-Mann and Low, 1954; Weinberg, 1995, section 11.2). Here e is the renormalized charge, q = p₁' − p₁ is the four-momentum transferred to the first electron, and

As the energy E = s rises above M, while staying below the singular value E = M exp( 16π 2/3g M ) , the running coupling constant g E slowly increases, as does the scattering amplitude, A ≈ g E . Example 21.1(Quantum electrodynamics) Vacuum polarization makes the ¸ one-loop¹ amplitude for the scattering of two electrons proportional to A (q 2) = e2 1 + π(q 2) rather than to e2 (Gell-Mann and Low, 1954), (Weinberg, 1995, section 11.2). Here e is the renormalized charge, q = p1± − p1 is the four-momentum transferred to the first electron, and 2

π(q )

=

e2 2π 2

±1 0



x (1 − x ) ln 1 +

·

q 2 x (1 − x ) dx m2

(21.10)

represents the polarization of the vacuum. One defines the square of the running coupling constant e2µ to be the amplitude A (q 2 ) at q2 = µ 2 eµ2 For q2

= A(µ2) = e2

¼

1 + π(µ 2)

½

(21.11)

.

= µ2 ² m 2, the vacuum-polarization term π(µ2) is (exercise 21.1) ¶ µ 5· e2 2 ln − π(µ ) ≈ . (21.12) 6π 2 m 6

The amplitude then is A(q2 ) = e2µ

1 + π(q 2) , 1 + π(µ2 )

and since it must be independent of µ , we have 0=

d A (q 2 ) d µ 1 + π(q 2)

e2µ

= ddµ 1 + π(µ2 ) ≈ ddµ

²2¼

eµ 1 − π(µ2 )

(21.13)

½³

.

(21.14)


So by differentiating e_μ and the vacuum-polarization term (21.12), we find

\[
0 = 2 e_\mu\, \frac{de_\mu}{d\mu}\, \big[ 1 - \pi(\mu^2) \big]
- e_\mu^2\, \frac{d\pi(\mu^2)}{d\mu}
= 2 e_\mu\, \frac{de_\mu}{d\mu}\, \big[ 1 - \pi(\mu^2) \big]
- e_\mu^2\, \frac{e^2}{6\pi^2\mu} .
\qquad (21.15)
\]

But by (21.10) the vacuum-polarization term π(μ²) is of order e², which is the same as e_μ² to lowest order in e_μ. Thus we arrive at the Callan–Symanzik equation

\[
\mu\, \frac{de_\mu}{d\mu} \equiv \beta(e_\mu) = \frac{e_\mu^3}{12\pi^2}
\qquad (21.16)
\]

which we can integrate

\[
\ln\frac{E}{M} = \int_M^E \frac{d\mu}{\mu}
= \int_{e_M}^{e_E} \frac{de_\mu}{\beta(e_\mu)}
= 12\pi^2 \int_{e_M}^{e_E} \frac{de_\mu}{e_\mu^3}
= 6\pi^2 \Big( \frac{1}{e_M^2} - \frac{1}{e_E^2} \Big)
\qquad (21.17)
\]

to

\[
e_E^2 = \frac{e_M^2}{1 - e_M^2\, \ln(E/M)/6\pi^2} .
\]

The fine-structure constant e_μ²/4π slowly rises from α = 1/137.036 at m_e to

\[
\frac{e^2(45.5\,\mathrm{GeV})}{4\pi}
= \frac{\alpha}{1 - 2\alpha\, \ln(45.5/0.00051)/3\pi}
= \frac{1}{134.6}
\qquad (21.18)
\]

at √s = 91 GeV. When all light charged particles are included, one finds that the fine-structure constant rises to α = 1/128.87 at E = 91 GeV.

Example 21.2 (Quantum chromodynamics) The beta functions of scalar field theories and of quantum electrodynamics are positive, and so interactions in these theories become stronger at higher energy scales. But Yang–Mills theories have beta functions that can be negative because of the cubic interactions of the gauge fields and the ghost fields (20.279). If the gauge group is SU(3), then the beta function is

at s = 91 GeV. When all light charged particles are included, one finds that the fine-structure constant rises to α = 1/128.87 at E = 91 GeV. Example 21.2(Quantum chromodynamics) The beta functions of scalar field theories and of quantum electrodynamics are positive, and so interactions in these theories become stronger at higher energy scales. But Yang–Mills theories have beta functions that can be negative because of the cubic interactions of the gauge fields and the ghost fields (20.279). If the gauge group is SU (3), then the beta function is dgµ µ dµ

≡ β(g ) = − µ

11g3 16 π 2

to lowest order in gµ . Integrating, we find E ln M

=

±

E M

dµ µ

=

±g

E

gM

dgµ β( g µ)

2 = − 1611π

11g3µ

= − 16 π 2

±g

E

gM

dgµ g3µ

=

8π 2 11

(21.19)

¾

1 g 2M



¿

1 g2E

(21.20) and g2E

=

À g2M

1+

11g2M E ln 2 8π M

Á−1 (21.21)

722

21 Renormalization Group

which shows that as the energy E of a scattering process increases, the running coupling slowly decreases, going to zero at infinite energy, an effect called asymptotic freedom (Gross and Wilczek, 1973; Politzer, 1973). If the gauge group is SU ( N ), and the theory has n f flavors of quarks with masses below µ , then the beta function is β( g µ)

gµ3

= − 4π 2

º 11 N 12

− n6f

»

(21.22)

which is negative as long as n f < 11N /2. Using this beta function with N again integrating, we get instead of (21.21) g2E

=

À

1+

g 2M

(11

16 π 2

So with M2

− 2n f /3)g 2M

≡ ±2 exp

¾

E2 ln 2 M

Á−1

(21.23)

.

¿

16 π 2

(11

= 3 and

(21.24)

− 2n f /3)g 2M

[Figure 21.1 here: a plot titled "Asymptotic freedom" of the coupling strength α_s(E) against the energy E (GeV) from 20 to 140 GeV, with α_s falling from about 0.22 to 0.11.]

Figure 21.1 The strong-structure constant α_s(E) as given by the one-loop formula (21.25) (thin curve) and by a three-loop formula (thick curve) with Λ = 230 MeV and n_f = 5 is plotted for m_b ≲ E ≲ m_t. This chapter's Matlab scripts are in Renormalization_group at github.com/kevinecahill.


we find (Exercise 21.2)

\[
\alpha_s(E) \equiv \frac{g^2(E)}{4\pi}
= \frac{12\pi}{(33 - 2 n_f)\, \ln(E^2/\Lambda^2)}
\qquad (21.25)
\]

which expresses the dimensionless QCD coupling constant α_s(E) appropriate to energy E in terms of a parameter Λ that has the dimension of energy. Sidney Coleman called this dimensional transmutation. For Λ = 230 MeV and n_f = 5, Fig. 21.1 displays α_s(E) in the range 4.19 GeV = m_b ≲ E ≲ m_t = 172 GeV. The thin curve is the one-loop formula (21.25), and the thick curve is a three-loop formula (Weinberg, 1996, p. 156).
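The two running couplings of this section can be evaluated directly from the one-loop formulas, (21.18) for the fine-structure constant (electrons only, m_e = 0.00051 GeV) and (21.25) for α_s (with the text's Λ = 230 MeV and n_f = 5). This is a minimal numerical sketch, not a substitute for the three-loop curve in Fig. 21.1.

```python
import math

def alpha_qed(E, alpha0=1.0 / 137.036, me=0.00051):
    """One-loop running fine-structure constant, as in (21.18);
    only the electron loop is included."""
    return alpha0 / (1.0 - 2.0 * alpha0 * math.log(E / me) / (3.0 * math.pi))

def alpha_s(E, Lam=0.230, nf=5):
    """One-loop strong coupling alpha_s(E) of (21.25), E and Lam in GeV."""
    return 12.0 * math.pi / ((33.0 - 2.0 * nf) * math.log(E**2 / Lam**2))

print(1.0 / alpha_qed(45.5))   # about 134.6, as quoted in (21.18)
print(alpha_s(91.0))           # about 0.137 at the Z mass (one loop)
```

The QED coupling grows with energy while α_s shrinks, and the one-loop α_s(91 GeV) ≈ 0.137 overshoots the measured value near 0.118 — the gap the three-loop formula in Fig. 21.1 closes.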

21.3 Renormalization Group in Lattice Field Theory

Let us consider a quantum field theory on a lattice (Gattringer and Lang, 2010, chap. 3) in which the strength of the nonlinear interactions depends upon a single dimensionless coupling constant g. The spacing a of the lattice regulates the infinities, which return as a → 0. The value of an observable P computed on this lattice will depend upon the lattice spacing a and on the coupling constant g, and so will be a function P(a, g) of these two parameters. The right value of the coupling constant is the value that makes the result of the computation be as close as possible to the physical value P. So the correct coupling constant is not a constant at all, but rather a function g(a) that varies with the lattice spacing or cutoff a. Thus as we vary the lattice spacing and go to the continuum limit a → 0, we must adjust the coupling function g(a) so that what we compute, P(a, g(a)), is equal to the physical value P. That is, g(a) must vary with a so as to keep P(a, g(a)) constant at P(a, g(a)) = P

\[
\frac{dP(a, g(a))}{da} = 0 .
\qquad (21.26)
\]

Writing this condition as a dimensionless derivative

\[
a\, \frac{dP(a, g(a))}{da}
= \frac{dP(a, g(a))}{da/a}
= \frac{dP(a, g(a))}{d\ln a} = 0
\qquad (21.27)
\]

we arrive at the Callan–Symanzik equation

\[
0 = \frac{dP(a, g(a))}{d\ln a}
= \Big( \frac{\partial}{\partial\ln a}
+ \frac{dg}{d\ln a}\, \frac{\partial}{\partial g} \Big)\, P(a, g(a)) .
\qquad (21.28)
\]

The coefficient of the second partial derivative with a minus sign is the lattice beta function

\[
\beta_L(g) \equiv -\frac{dg}{d\ln a} .
\qquad (21.29)
\]


Since the lattice spacing a and the energy scale μ are inversely related, the lattice beta function differs from the continuum beta function by a minus sign. In SU(N) gauge theory, the first two terms of the lattice beta function for small g are β_L(g) = −β₀g³ − β₁g⁵ where for n_f flavors of light quarks

\[
\begin{aligned}
\beta_0 &= \frac{1}{(4\pi)^2} \Big( \frac{11}{3}\, N - \frac{2}{3}\, n_f \Big) \\
\beta_1 &= \frac{1}{(4\pi)^4} \Big( \frac{34}{3}\, N^2
- \frac{10}{3}\, N n_f - \frac{N^2 - 1}{N}\, n_f \Big)
\end{aligned}
\qquad (21.30)
\]

and N = 3 in quantum chromodynamics. Combining the definition (21.29) of the beta function with its expansion β_L(g) = −β₀g³ − β₁g⁵ for small g, one gets the differential equation

\[
\frac{dg}{d\ln a} = \beta_0\, g^3 + \beta_1\, g^5
\qquad (21.31)
\]

which one may integrate

\[
\int d\ln a = \ln a - \ln c
= \int \frac{dg}{\beta_0\, g^3 + \beta_1\, g^5}
= -\frac{1}{2\beta_0\, g^2}
+ \frac{\beta_1}{2\beta_0^2}\, \ln\Big( \frac{\beta_0 + \beta_1 g^2}{g^2} \Big)
\qquad (21.32)
\]

to find that the lattice spacing

\[
a(g) = c\; e^{-1/(2\beta_0 g^2)}\,
\Big( \frac{\beta_0 + \beta_1 g^2}{g^2} \Big)^{\beta_1/(2\beta_0^2)}
\]

has an essential singularity at g = 0. Here c is a constant of integration. The term β₁g² is of higher order in g, and if one drops it and absorbs a power of β₀ into a new constant of integration Λ, then one finds

\[
a(g) = \frac{1}{\Lambda}\, \big( \beta_0\, g^2 \big)^{-\beta_1/(2\beta_0^2)}\,
e^{-1/(2\beta_0 g^2)} .
\qquad (21.33)
\]

As g → 0, the lattice spacing a(g) goes to zero very fast (as long as β₀ > 0, that is, n_f < 11N/2). The inverse of this relation (21.33) is

\[
g(a) \approx \Big[ \beta_0\, \ln(a^{-2}\Lambda^{-2})
+ \frac{\beta_1}{\beta_0}\, \ln\!\big( \ln(a^{-2}\Lambda^{-2}) \big) \Big]^{-1/2}
\qquad (21.34)
\]
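With the β₁ correction dropped, the relation (21.33) between lattice spacing and coupling can be inverted in closed form, which makes a quick numerical check of (21.34) possible. The sketch below assumes pure SU(3) gauge theory (n_f = 0) and sets Λ = 1 in lattice units; both are illustrative choices, not values from the text.

```python
import math

N, nf = 3, 0
beta0 = (11.0 * N / 3.0 - 2.0 * nf / 3.0) / (4.0 * math.pi) ** 2

def spacing(g, Lam=1.0):
    """Lattice spacing a(g) of (21.33) with the beta1 correction dropped."""
    return math.exp(-1.0 / (2.0 * beta0 * g * g)) / Lam

def coupling(a, Lam=1.0):
    """Inverse relation (21.34), also with the beta1 term dropped."""
    return (beta0 * math.log(1.0 / (a * Lam) ** 2)) ** -0.5

for g in [1.2, 1.0, 0.8, 0.6]:
    assert abs(coupling(spacing(g)) - g) < 1e-9   # exact inverse if beta1 = 0
assert spacing(0.6) < spacing(1.2)                 # a(g) -> 0 fast as g -> 0
print("a(1.0) =", spacing(1.0))
```

The round trip g → a(g) → g is exact here because ln(a⁻²Λ⁻²) = 1/(β₀g²) when β₁ = 0; the β₁ term in (21.34) only corrects the subleading ln ln behavior.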


Under a spatial rescaling by a factor ℓ, the field becomes

\[
\phi_\ell(x) = a_\ell\, \phi(x/\ell)
= a_\ell \int e^{i k\cdot x/\ell}\, \phi(k)\; d^dk
\qquad (21.38)
\]

in which a_ℓ is a factor that keeps the kinetic part of the hamiltonian invariant (Fisher, 1974, 1998; Kosterlitz et al., 1976; Kadanoff, 2009; Wilson, 1971, 1975). To have H_k[φ_ℓ] = H_k[φ], we need derivative terms to remain invariant

\[
H_k[\phi_\ell]
= \int \Big( \frac{\partial\phi_\ell(x)}{\partial x_i} \Big)^2 d^dx
= a_\ell^2 \int \Big( \frac{\partial\phi(x/\ell)}{\ell\,\partial x_i/\ell} \Big)^2
\ell^d\; d^d(x/\ell)
= a_\ell^2\, \ell^{d-2} \int \Big( \frac{\partial\phi(x')}{\partial x_i'} \Big)^2
d^dx' = H_k[\phi] .
\qquad (21.39)
\]
That is, we need $a_\ell = \ell^{(2-d)/2}$. How do the various potential-energy terms change? A term $H_n[\phi]$ with n factors of the field φ changes to
$$H_n[\phi_\ell] = \int g_n\, \phi_\ell^n(x)\, d^dx = \int g_n\, a_\ell^n\, \phi^n(x/\ell)\, d^dx = g_n\, a_\ell^n\, \ell^d \int \phi^n(x')\, d^dx' = \ell^{\,d + n(2-d)/2}\, H_n[\phi]. \tag{21.40}$$

In effect, the coupling constant changes to $g_n(\ell) = \ell^{\,d + n(2-d)/2}\, g_n$. Similar reasoning shows that the coupling constant $g_{n,p}$ of a term with n factors of the field φ and p spatial derivatives of φ should vary as $g_{n,p}(\ell) = \ell^{\,d + n(2-d)/2 - p}\, g_{n,p}$. Coupling constants with positive exponents $d + n(2-d)/2 - p > 0$ become more important at greater spatial scales and are said to be relevant. Those with negative exponents become less important at greater spatial scales and are said
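The dimensional counting above is easy to mechanize. A small Python sketch (not from the book) computes the exponent $d + n(2-d)/2 - p$ and classifies the coupling:

```python
def scaling_exponent(d, n, p=0):
    # Exponent of ℓ in g_{n,p}(ℓ) = ℓ^{d + n(2-d)/2 - p} g_{n,p}:
    # d dimensions, n powers of φ, p spatial derivatives.
    return d + n*(2 - d)/2 - p

def classify(d, n, p=0):
    e = scaling_exponent(d, n, p)
    return "relevant" if e > 0 else "marginal" if e == 0 else "irrelevant"

# The mass term φ² scales as ℓ² in every dimension.
print(scaling_exponent(4, 2))            # 2.0
# The quartic coupling φ⁴ is marginal in d = 4 and relevant in d = 3.
print(classify(4, 4), classify(3, 4))    # marginal relevant
# The φ⁶ coupling is marginal in d = 3.
print(classify(3, 6))                    # marginal
```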


to be irrelevant. Those with vanishing exponents are marginal. The mass term $g_2(\ell)\,\phi^2 = \ell^2 g_2\, \phi^2$ is always relevant. The quartic term $g_4(\ell)\,\phi^4 = \ell^{4-d} g_4\, \phi^4$ is relevant in fewer than 4 dimensions, marginal in 4 dimensions, and irrelevant in more than 4 dimensions. The term $g_6(\ell)\,\phi^6 = \ell^{6-2d} g_6\, \phi^6$ is relevant in fewer than 3 dimensions, marginal in 3 dimensions, and irrelevant in more than 3.

Further Reading
Quantum Field Theory in a Nutshell (Zee, 2010, chaps. III & VI), An Introduction to Quantum Field Theory (Peskin and Schroeder, 1995, chap. 12), and The Quantum Theory of Fields (Weinberg, 1995, 1996, sections 12.2 & 18.1–2).

Exercises

21.1 Show that for $\mu^2 \gg m^2$, the vacuum-polarization term (21.10) reduces to (21.12). Hint: use $\ln ab = \ln a + \ln b$ when integrating.
21.2 Show that by choosing the energy scale Λ according to (21.24), one can derive (21.25) from (21.23).

22 Strings

22.1 The Nambu–Goto String Action
Quantum field theory is plagued with infinities, presumably because it represents particles as points. A more physical approach is to represent elementary particles as objects of finite size. Those that are 1-dimensional are called strings. We'll use $0 \le \sigma \le \sigma_1$ and $\tau_i \le \tau \le \tau_f$ to parametrize the spacetime coordinates $X^\mu(\sigma, \tau)$ of the string. Nambu and Goto suggested using as the action the area (Zwiebach, 2009, chap. 6)
$$S = -\frac{T_0}{c} \int_{\tau_i}^{\tau_f}\!\! \int_0^{\sigma_1} \sqrt{\left(\dot X\cdot X'\right)^2 - \dot X^2\,(X')^2}\; d\tau\, d\sigma \tag{22.1}$$
in which
$$\dot X^\mu = \frac{\partial X^\mu}{\partial\tau} \quad\text{and}\quad X'^\mu = \frac{\partial X^\mu}{\partial\sigma} \tag{22.2}$$
and a Lorentz metric $\eta_{\mu\nu} = \mathrm{diag}(-1, 1, 1, \ldots)$ is used to form the inner products like $\dot X\cdot X' = \dot X^\mu\, \eta_{\mu\nu}\, X'^\nu$. The action S is the area swept out by a string of length $\sigma_1$ in time $\tau_f - \tau_i$. If $\dot X\, d\tau = dt$ points in the time direction and $X'\, d\sigma = dr$ points in a spatial direction, then one sees that $\dot X\cdot X' = 0$, that $-\dot X^2\, d\tau^2 = dt^2$, and that $X'^2\, d\sigma^2 = dr^2$. So in this simple case, the action (22.1) is
$$S = -\frac{T_0}{c} \int_{t_i}^{t_f}\!\! \int_0^{r_1} dt\, dr = -\frac{T_0}{c}\,(t_f - t_i)\, r_1 \tag{22.3}$$
which is the area the string sweeps out. The other term $(\dot X\cdot X')^2$ within the square root ensures that the action is the area swept out for all $\dot X$ and $X'$, and that it is invariant under arbitrary reparametrizations $\sigma \to \sigma'$ and $\tau \to \tau'$.


The equation of motion for the relativistic string follows from the requirement that the action (22.1) be stationary, δS = 0. Since
$$\delta \dot X^\mu = \delta\, \frac{\partial X^\mu}{\partial\tau} = \frac{\partial\, \delta X^\mu}{\partial\tau} \quad\text{and}\quad \delta X'^\mu = \delta\, \frac{\partial X^\mu}{\partial\sigma} = \frac{\partial\, \delta X^\mu}{\partial\sigma},$$
we may express the change in the action in terms of derivatives of the Lagrange density
$$\mathcal{L} = -\frac{T_0}{c}\, \sqrt{\left(\dot X\cdot X'\right)^2 - \dot X^2\,(X')^2} \tag{22.4}$$
as
$$\delta S = \int_{\tau_i}^{\tau_f}\!\!\int_0^{\sigma_1} \left[ \frac{\partial\mathcal{L}}{\partial\dot X^\mu}\, \frac{\partial\,\delta X^\mu}{\partial\tau} + \frac{\partial\mathcal{L}}{\partial X'^\mu}\, \frac{\partial\,\delta X^\mu}{\partial\sigma} \right] d\tau\, d\sigma. \tag{22.5}$$
Its derivatives, which we'll call $\mathcal{P}^\tau_\mu$ and $\mathcal{P}^\sigma_\mu$, are
$$\mathcal{P}^\tau_\mu = \frac{\partial\mathcal{L}}{\partial\dot X^\mu} = -\frac{T_0}{c}\, \frac{(\dot X\cdot X')\, X'_\mu - (X')^2\, \dot X_\mu}{\sqrt{(\dot X\cdot X')^2 - \dot X^2\,(X')^2}} \tag{22.6}$$
and
$$\mathcal{P}^\sigma_\mu = \frac{\partial\mathcal{L}}{\partial X'^\mu} = -\frac{T_0}{c}\, \frac{(\dot X\cdot X')\, \dot X_\mu - \dot X^2\, X'_\mu}{\sqrt{(\dot X\cdot X')^2 - \dot X^2\,(X')^2}}. \tag{22.7}$$
In terms of them, the change in the action is
$$\delta S = \int_{\tau_i}^{\tau_f}\!\!\int_0^{\sigma_1} \left[ \frac{\partial}{\partial\tau}\!\left(\delta X^\mu\, \mathcal{P}^\tau_\mu\right) + \frac{\partial}{\partial\sigma}\!\left(\delta X^\mu\, \mathcal{P}^\sigma_\mu\right) - \delta X^\mu \left( \frac{\partial\mathcal{P}^\tau_\mu}{\partial\tau} + \frac{\partial\mathcal{P}^\sigma_\mu}{\partial\sigma} \right) \right] d\tau\, d\sigma. \tag{22.8}$$

The total τ-derivative integrates to a term involving the variation $\delta X^\mu$, which we make vanish at the initial and final values of τ. So we drop that term and find that the net change in the action is
$$\delta S = \int_{\tau_i}^{\tau_f} \Big[\,\delta X^\mu\, \mathcal{P}^\sigma_\mu\,\Big]_0^{\sigma_1}\, d\tau - \int_{\tau_i}^{\tau_f}\!\!\int_0^{\sigma_1} \delta X^\mu \left( \frac{\partial\mathcal{P}^\tau_\mu}{\partial\tau} + \frac{\partial\mathcal{P}^\sigma_\mu}{\partial\sigma} \right) d\tau\, d\sigma. \tag{22.9}$$
Thus the equations of motion for the string are
$$\frac{\partial\mathcal{P}^\tau_\mu}{\partial\tau} + \frac{\partial\mathcal{P}^\sigma_\mu}{\partial\sigma} = 0, \tag{22.10}$$
but the action is stationary only if
$$\int \Big[\, \delta X^\mu(\tau,\sigma_1)\, \mathcal{P}^\sigma_\mu(\tau,\sigma_1) - \delta X^\mu(\tau,0)\, \mathcal{P}^\sigma_\mu(\tau,0)\,\Big]\, d\tau = 0. \tag{22.11}$$
Closed strings automatically satisfy this condition. Open strings satisfy it if they obey, for each end $\sigma_*$ of the string, each spacetime dimension µ, and all times τ, either the free-endpoint boundary condition
$$\mathcal{P}^\sigma_\mu(\tau, \sigma_*) = 0 \tag{22.12}$$
or the Dirichlet boundary condition
$$\dot X^\mu(\tau, \sigma_*) = 0, \tag{22.13}$$
which cannot hold for µ = 0 because $X^0 = ct$ is related to the parameter τ. So the time component of an open string obeys the free-endpoint condition $\mathcal{P}^\sigma_0(\tau, \sigma_*) = 0$. A Dn-brane is a space of n spatial dimensions to which an end $X^j(\tau, \sigma_*)$ of an open string can attach and on which it can move. The equation of motion (22.10) shows that the 4-momentum
$$p_\mu = \int_0^{\sigma_1} \mathcal{P}^\tau_\mu\; d\sigma \tag{22.14}$$
of both a closed string and an open one moving with free endpoints (22.12) is conserved,
$$\frac{dp_\mu}{d\tau} = \int_0^{\sigma_1} \frac{\partial\mathcal{P}^\tau_\mu}{\partial\tau}\, d\sigma = -\int_0^{\sigma_1} \frac{\partial\mathcal{P}^\sigma_\mu}{\partial\sigma}\, d\sigma = -\Big[\, \mathcal{P}^\sigma_\mu(\tau,\sigma)\,\Big]_0^{\sigma_1} = 0, \tag{22.15}$$

but the momentum $p_\nu$ need not be conserved if $X^\nu$ is stuck on a brane. The momentum density of the string is $\mathcal{P}^\tau_\mu$.

22.2 Static Gauge and Regge Trajectories
The Nambu–Goto action (22.1) remains the same when one changes the parameters σ and τ arbitrarily to $\sigma'(\sigma,\tau)$ and $\tau'(\sigma,\tau)$. This reparametrization invariance allows us to choose these parameters so as to simplify any particular calculation without changing the physics. One such choice is the static gauge in which the parameter τ is the time $t = X^0/c$ in some chosen Lorentz frame, and so the σ and τ derivatives of $X = (X^0, \vec X)$ are $X' = (0, \vec X\,')$ and $\dot X = (c, \dot{\vec X})$. In this gauge, the time derivative of the 4-momentum $p_\mu$ of a freely moving string vanishes because its τ-derivative vanishes (22.15), and the free endpoints of an open string move transversely to the string at the speed of light. We also can choose the parameter σ so that the motion of every point on a string is transverse to the string, $\vec X\,'\cdot \dot{\vec X} = X'\cdot \dot X = 0$, so that the energy of the string varies as $dE = T_0\, d\sigma$, and so that $\dot X^2 + c^2 X'^2 = 0$. In this parametrization (Zwiebach, 2009, chaps. 6–8), strings obey the wave equation
$$\ddot X = c^2\, X'', \tag{22.16}$$
and the free-endpoint boundary condition (22.12) is simply $X'^\mu(t, \sigma_*) = 0$.


The angular momentum $M_{12}$ of a string rigidly rotating in the 1–2 plane is
$$M_{12}(\tau) = \int_0^{\sigma_1} \Big[\, X_1(\tau,\sigma)\,\mathcal{P}^\tau_2(\tau,\sigma) - X_2(\tau,\sigma)\,\mathcal{P}^\tau_1(\tau,\sigma)\,\Big]\, d\sigma. \tag{22.17}$$

In the above version of the static gauge, the nonzero spatial coordinates of a straight open string rotating in the 1–2 plane are
$$\vec X(t,\sigma) = \frac{\sigma_1}{\pi}\, \cos\frac{\pi\sigma}{\sigma_1}\, \left( \cos\frac{\pi c t}{\sigma_1},\; \sin\frac{\pi c t}{\sigma_1} \right) \tag{22.18}$$
with momentum densities
$$\vec{\mathcal{P}}^{\,\tau}(t,\sigma) = \frac{T_0}{c^2}\, \frac{\partial \vec X}{\partial t} = \frac{T_0}{c}\, \cos\frac{\pi\sigma}{\sigma_1}\, \left( -\sin\frac{\pi c t}{\sigma_1},\; \cos\frac{\pi c t}{\sigma_1} \right). \tag{22.19}$$
Its angular momentum (22.17) is
$$M_{12} = \frac{\sigma_1}{\pi}\, \frac{T_0}{c} \int_0^{\sigma_1} \cos^2\frac{\pi\sigma}{\sigma_1}\; d\sigma = \frac{\sigma_1^2\, T_0}{2\pi c}. \tag{22.20}$$
Since in this static gauge its energy is $E = \sigma_1 T_0$, the angular momentum $J = M_{12}$ of the relativistic open string (22.18) is proportional to the square of its total energy
$$\frac{J}{\hbar} = \frac{E^2}{2\pi\hbar c\, T_0}. \tag{22.21}$$
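The closed form (22.20) can be confirmed by integrating the angular-momentum density of the rotating string numerically — a Python sketch with illustrative values of σ₁, T₀, c, and t (any values work, since M₁₂ is conserved):

```python
from math import pi, cos, sin

sigma1, T0, c, t = 2.0, 1.0, 1.0, 0.4  # illustrative values

def density(s):
    # X1 P2 - X2 P1 for the rotating string (22.18)-(22.19).
    x1 = (sigma1/pi) * cos(pi*s/sigma1) * cos(pi*c*t/sigma1)
    x2 = (sigma1/pi) * cos(pi*s/sigma1) * sin(pi*c*t/sigma1)
    p1 = -(T0/c) * cos(pi*s/sigma1) * sin(pi*c*t/sigma1)
    p2 = (T0/c) * cos(pi*s/sigma1) * cos(pi*c*t/sigma1)
    return x1*p2 - x2*p1

# Midpoint rule for M12 = ∫₀^{σ1} (X1 P2 - X2 P1) dσ, equation (22.17).
N = 10_000
ds = sigma1 / N
M12 = sum(density((i + 0.5)*ds) for i in range(N)) * ds

exact = sigma1**2 * T0 / (2*pi*c)   # equation (22.20)
print(abs(M12 - exact) < 1e-6)      # True
```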

The nonzero spatial coordinates of the corresponding closed string are
$$\vec X = \frac{\sigma_1}{4\pi} \left( \sin\frac{2\pi u}{\sigma_1} + \sin\frac{2\pi v}{\sigma_1},\; -\cos\frac{2\pi u}{\sigma_1} - \cos\frac{2\pi v}{\sigma_1} \right) \tag{22.22}$$
in which $u = ct + \sigma$ and $v = ct - \sigma$. One may show (Exercise 22.3) that its angular momentum $M_{12}$ obeys a rule similar to that (22.21) of the open string but with half the slope, $J/\hbar = E^2/(4\pi\hbar c T_0)$.
Rules like (22.21) describe many hadrons made of light (u, d, s) quarks and massless gluons. The nucleon and five baryon resonances obey it with nearly the same value of the string tension, $T_0 \approx 0.937$ GeV/fm. Their common Regge trajectory is plotted in Fig. 22.1. The energy E and angular momentum J of a Kerr black hole obey a relation $J/\hbar = aGE^2/(\hbar c^5) = E^2/(2\pi\hbar c T)$ like the Regge trajectory (22.21) but with string tension $T = c^4/(2\pi a G)$, in which G is Newton's constant and $a < 1$. This string tension is higher by at least 37 orders of magnitude since $1/G = 1.4906 \times 10^{38}\ \mathrm{GeV}^2/(\hbar c)^2$.
A string theory of hadrons took off in 1968 when Gabriele Veneziano published his amplitude for π + π scattering as a sum of three Euler beta functions (Veneziano, 1968). But after eight years of intense work, this effort was


[Figure 22.1 plots the spin J/ℏ (from 0 to 6) of the resonances N(938), Δ(1232), N(1680), Δ(1950), N(2220), and Δ(2420) against their energies E = mc² (from 0.8 to 2.8 GeV); the six states lie on a common Regge trajectory.]

Figure 22.1 The angular momentum and energy of the nucleon and delta resonances approximately fit the curve $J/\hbar = E^2/(2\pi\hbar c\, T_0)$ with string tension $T_0 = 0.94$ GeV/fm. This chapter's Matlab scripts are in Strings at github.com/kevinecahill.
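The caption's fit can be reproduced approximately in a few lines — a Python sketch standing in for the book's Matlab scripts; the fitted tension depends on the fitting procedure, so only rough agreement with 0.94 GeV/fm should be expected:

```python
from math import pi

hbar_c = 0.19733  # GeV·fm

# (mass in GeV, spin J/ℏ) for the resonances on the trajectory.
states = [(0.938, 0.5), (1.232, 1.5), (1.680, 2.5),
          (1.950, 3.5), (2.220, 4.5), (2.420, 5.5)]

# Least-squares slope of J/ℏ versus E² through the origin:
# J/ℏ = E²/(2πℏcT₀), so slope = 1/(2πℏcT₀).
num = sum(m**2 * j for m, j in states)
den = sum(m**4 for m, j in states)
slope = num / den                 # GeV⁻²
T0 = 1 / (2*pi*hbar_c*slope)      # GeV/fm
print(f"T0 ~ {T0:.2f} GeV/fm")    # roughly 0.9 GeV/fm
```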

largely abandoned with the discovery of quarks at SLAC and the promise of QCD as a theory of the strong interactions. In 1974, Joël Scherk and John Schwarz proposed increasing the string tension by 38 orders of magnitude so as to use strings to make a quantum theory that included gravity (Scherk and Schwarz, 1974). They identified the graviton as an excitation of the closed string.

22.3 Light-Cone Coordinates

Dirac's light-cone coordinates are $x^+ = (x^0 + x^1)/\sqrt{2}$, in which $x^1$ is a spatial coordinate, and $x^- = (x^0 - x^1)/\sqrt{2}$. Conventionally $x^+$ is the light-cone variable corresponding to the time $x^0$. The light-cone components of the momentum are $p^+ = (p^0 + p^1)/\sqrt{2}$ and $p^- = (p^0 - p^1)/\sqrt{2}$. The squared distance $ds^2$ is
$$ds^2 = d\vec x\cdot d\vec x - (dx^0)^2 = (dx^2)^2 + (dx^3)^2 - dx^-\,dx^+ - dx^+\,dx^-, \tag{22.23}$$
and $p\cdot x = \vec p\cdot\vec x - p^0 x^0 = p^2 x^2 + p^3 x^3 - p^- x^+ - p^+ x^-$. Just as in quantum mechanics we identify $i\hbar\,\partial_t$ with the energy E so that
$$i\hbar\, \frac{\partial}{\partial x^0}\; e^{i(\vec p\cdot\vec x - Et)/\hbar} = \frac{E}{c}\; e^{i(\vec p\cdot\vec x - Et)/\hbar}, \tag{22.24}$$
so too, in light-cone coordinates, we have
$$i\hbar\, \frac{\partial}{\partial x^+}\; e^{i(p^2 x^2 + p^3 x^3 - p^- x^+ - p^+ x^-)/\hbar} = p^-\; e^{i(p^2 x^2 + p^3 x^3 - p^- x^+ - p^+ x^-)/\hbar}. \tag{22.25}$$
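A quick numeric check of (22.23) — a Python sketch with an arbitrary displacement, not from the book — confirms that the light-cone form of the interval equals the Cartesian one:

```python
from math import sqrt

# An arbitrary displacement (dx0, dx1, dx2, dx3).
dx0, dx1, dx2, dx3 = 0.7, -0.3, 0.5, 1.1

# Cartesian interval with metric diag(-1, 1, 1, 1).
ds2_cartesian = -dx0**2 + dx1**2 + dx2**2 + dx3**2

# Light-cone coordinates dx± = (dx0 ± dx1)/√2, equation (22.23).
dxp = (dx0 + dx1) / sqrt(2)
dxm = (dx0 - dx1) / sqrt(2)
ds2_lightcone = dx2**2 + dx3**2 - dxm*dxp - dxp*dxm

print(abs(ds2_cartesian - ds2_lightcone) < 1e-12)  # True
```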

So $p^-$ is the light-cone version of $E/c$.

22.4 Light-Cone Gauge
The equations of motion (22.10) and boundary conditions (22.12 and 22.13) simplify when the string parameters σ and τ obey the light-cone gauge conditions
$$X^+ = \beta\,\alpha'\, p^+\, \tau \quad\text{and}\quad p^+ = \frac{2\pi}{\beta}\,\mathcal{P}^{\tau +} \tag{22.26}$$
in natural units ($\hbar = c = 1$) with $\alpha' = 1/(2\pi\hbar c\, T_0)$ and $0 \le \sigma \le \pi$, where β = 2 for open strings and β = 1 for closed strings (Zwiebach, 2009, chaps. 9 & 12).

This gauge maintains the wave equation (22.16) and constraints of the static gauge of Section 22.2,
$$\ddot X = X'' \quad\text{and}\quad \dot X\cdot X' = 0 = \dot X^2 + X'^2, \tag{22.27}$$
as well as the $X'^\mu(\tau, \sigma_*) = 0$ form of the free-endpoint condition (22.12), while adding the rules
$$\mathcal{P}^{\sigma\mu} = -\,\frac{1}{2\pi\alpha'}\, X'^\mu \quad\text{and}\quad \mathcal{P}^{\tau\mu} = \frac{1}{2\pi\alpha'}\, \dot X^\mu \tag{22.28}$$

which make it easier to show (Exercise 22.4) that the angular momentum
$$M_{\mu\nu}(\tau) = \int_0^{\sigma_1} \Big[\, X_\mu(\tau,\sigma)\, \mathcal{P}^\tau_\nu(\tau,\sigma) - X_\nu(\tau,\sigma)\, \mathcal{P}^\tau_\mu(\tau,\sigma)\,\Big]\, d\sigma \tag{22.29}$$
of a freely moving string is conserved, $\dot M_{\mu\nu}(\tau) = 0$.

22.5 Quantized Open Strings
In light-cone gauge (22.26), the action is a sum over the transverse coordinates $I = 2, 3, \ldots, d$
$$S = \frac{1}{4\pi\alpha'} \int d\tau \int_0^\pi d\sigma\, \left( \dot X^I \dot X^I - X'^I X'^I \right). \tag{22.30}$$

An expansion for an open string with free endpoints is
$$X^\mu(\tau,\sigma) = x_0^\mu + \sqrt{2\alpha'}\; \alpha_0^\mu\, \tau + i\,\sqrt{2\alpha'} \sum_{n\neq 0} \frac{\alpha_n^\mu}{n}\; e^{-in\tau} \cos n\sigma. \tag{22.31}$$
It obeys the free-endpoint boundary conditions (22.12) because $(\cos n\sigma)' = -\,n\sin n\sigma$ vanishes at σ = 0 and at σ = π. Closed strings have left- and right-moving waves indicated by barred and unbarred amplitudes
$$X^\mu(\tau,\sigma) = x_0^\mu + \sqrt{2\alpha'}\; \alpha_0^\mu\, \tau + i\,\sqrt{\frac{\alpha'}{2}} \sum_{n\neq 0} \frac{e^{-in\tau}}{n} \left( \alpha_n^\mu\, e^{in\sigma} + \bar\alpha_n^\mu\, e^{-in\sigma} \right). \tag{22.32}$$
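Each oscillator term of the expansion (22.31) satisfies the wave equation $\ddot X = X''$ and the Neumann condition at the endpoints, which a short Python sketch (arbitrary mode number and evaluation point, not from the book) can verify by finite differences:

```python
import cmath
from math import pi, cos, sin

n = 3  # an arbitrary mode number

def mode(tau, sigma):
    # A single oscillator term of (22.31): e^{-inτ} cos(nσ).
    return cmath.exp(-1j*n*tau) * cos(n*sigma)

# Second derivatives by central differences.
h = 1e-4
tau, sigma = 0.37, 1.21
d2_tau = (mode(tau+h, sigma) - 2*mode(tau, sigma) + mode(tau-h, sigma)) / h**2
d2_sig = (mode(tau, sigma+h) - 2*mode(tau, sigma) + mode(tau, sigma-h)) / h**2
print(abs(d2_tau - d2_sig) < 1e-5)  # wave equation: both equal -n² × mode

# Free endpoints: ∂σ cos(nσ) = -n sin(nσ) vanishes at σ = 0 and σ = π.
print(abs(-n*sin(n*0.0)) < 1e-12, abs(-n*sin(n*pi)) < 1e-12)
```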

In the light-cone gauge, the transverse coordinates $X^I$ of open strings obey the equal-time commutation relations with the transverse momenta
$$[\,X^I(\tau,\sigma),\; \mathcal{P}^{\tau J}(\tau,\sigma')\,] = i\,\eta^{IJ}\,\delta(\sigma - \sigma') \quad\text{and}\quad [\,x_0^-(\tau),\; p^+(\tau)\,] = -\,i, \tag{22.33}$$
and commute with themselves at equal times: $[X^I(\tau,\sigma), X^J(\tau,\sigma')] = 0$ and $[\mathcal{P}^{\tau I}(\tau,\sigma), \mathcal{P}^{\tau J}(\tau,\sigma')] = 0$. The amplitude operators $\alpha_n^I$ obey the commutation relations
$$[\,\alpha_m^I,\; \alpha_n^J\,] = m\,\eta^{IJ}\,\delta_{m+n,\,0} \quad\text{and}\quad [\,x_0^I,\; p^J\,] = i\,\eta^{IJ}. \tag{22.34}$$

Similar rules apply to closed strings (Zwiebach, 2009, chap. 13). String theory is a quantum field theory in the 2 dimensions τ and σ. Its ground-state energy involves a sum over the positive integers, which string theorists interpret by using Ser's series expansion (6.111) of Riemann's zeta function to say that
$$\sum_{n=1}^\infty n = \zeta(-1) = -\,\frac{1}{2} \sum_{n=0}^\infty \frac{1}{n+1} \sum_{k=0}^n (-1)^k \binom{n}{k}\,(k+1)^2 = -\,\frac{1}{2} \sum_{n=0}^\infty \frac{1}{n+1} \left[ \frac{d}{dx}\, x\, \frac{d}{dx}\, x\,(1-x)^n \right]_{x=1} = -\,\frac{1}{12}. \tag{22.35}$$
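Because $(k+1)^2$ is a polynomial of degree 2 in k, the inner alternating sum in (22.35) vanishes for all n > 2, so the double sum terminates and can be evaluated exactly. A Python sketch with exact rational arithmetic:

```python
from fractions import Fraction
from math import comb

def ser_zeta_minus1(terms=10):
    # ζ(-1) from Ser's series: -(1/2) Σ_n 1/(n+1) Σ_k (-1)^k C(n,k)(k+1)².
    total = Fraction(0)
    for n in range(terms):
        inner = sum(Fraction((-1)**k * comb(n, k) * (k+1)**2)
                    for k in range(n+1))
        total += Fraction(1, n+1) * inner
    return -Fraction(1, 2) * total

print(ser_zeta_minus1())  # -1/12: only n = 0, 1, 2 contribute
```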

The relation $1 = -\tfrac{1}{2}(D-2)\sum_{n=1}^\infty n = (D-2)/24$ then says that open bosonic strings make sense in D = 26 dimensions. The theory also has a tachyon, that is, a particle that moves faster than light.


22.6 Superstrings
In light-cone gauge, one can add fermionic variables $\psi_1^I(\tau,\sigma)$ and $\psi_2^I(\tau,\sigma)$ to the action in a supersymmetric way
$$S = \frac{1}{4\pi\alpha'} \int d\tau \int_0^\pi d\sigma\, \left( \dot X^I \dot X^I - X'^I X'^I \right) + \frac{1}{4\pi\alpha'} \int d\tau \int_0^\pi d\sigma\, \left[ \psi_1^I\,(\partial_\tau + \partial_\sigma)\,\psi_1^I + \psi_2^I\,(\partial_\tau - \partial_\sigma)\,\psi_2^I \right]. \tag{22.36}$$

The tachyon then goes away, and the number of spacetime dimensions drops from 26 to 10. Unlike quantum field theory, string theory gives finite scattering amplitudes without renormalization. There are five distinct superstring theories – types I, IIA, and IIB; $E_8 \otimes E_8$ heterotic; and $SO(32)$ heterotic. All five may be related to a single theory in 11 dimensions called M-theory, which is not a string theory. M-theory contains membranes (2-branes) and 5-branes, which are not D-branes.

22.7 Covariant and Polyakov Actions
Other actions offer other advantages. The covariant action
$$S = \frac{1}{4\pi\alpha'} \int d\tau\, d\sigma\, \left( \partial_\tau X^\mu\, \partial_\tau X_\mu - \partial_\sigma X^\mu\, \partial_\sigma X_\mu \right) \tag{22.37}$$
offers manifest Lorentz invariance and simple momentum operators
$$\mathcal{P}^\mu = \frac{\partial\mathcal{L}}{\partial\dot X_\mu} = \frac{1}{2\pi\alpha'}\, \dot X^\mu. \tag{22.38}$$
The commutation relations are $[X^\mu(\tau,\sigma), \mathcal{P}^\nu(\tau,\sigma')] = i\,\eta^{\mu\nu}\,\delta(\sigma-\sigma')$, but the 00 commutation relation has a minus sign that requires constraints on the physical states. Polyakov's action is
$$S = -\,\frac{1}{4\pi\alpha'} \int d\tau\, d\sigma\; \sqrt{-h}\; h^{\alpha\beta}\; \partial_\alpha X^\mu\, \partial_\beta X^\nu\, \eta_{\mu\nu} \tag{22.39}$$
in which $h = \det(h_{\alpha\beta})$ and $h^{\alpha\beta}$ is the inverse of the 2 × 2 matrix $h_{\alpha\beta}$. The Minkowski metric $\eta_{\mu\nu}$ is fixed and diagonal, but the metric $h_{\alpha\beta}$ is dynamical and plays in the 2 dimensions τ, σ a role like that of $g_{\mu\nu}$ in general relativity.

22.8 D-branes or P-branes
One may satisfy Dirichlet boundary conditions (22.13) by requiring the ends of a string to touch but be free to move along a spatial manifold, called a D-brane after

Figure 22.2 Two strings stuck on a D2-brane.

Dirichlet. These branes should be called P-branes after Polchinski. If the manifold to which the string is stuck has p dimensions, then it's called a Dp-brane. Figure 22.2 shows two strings whose ends are free to move only within a D2-brane. Dp-branes offer a natural way to explain the extra six dimensions required in a universe of superstrings. One imagines that the ends of all the strings are free to move only in our 4-dimensional spacetime; the strings are stuck on a D3-brane, which is the 3-dimensional space of our physical universe. The tension of the superstring then keeps it from wandering far enough into the extra six spatial dimensions for us ever to have noticed.

22.9 String–String Scattering
Strings interact by joining and by breaking. The left side of Fig. 22.3 shows two open strings joining to form one open string and then breaking into two open strings; the right side shows two closed strings joining to form one closed string and then breaking into two closed strings. The interactions of strings do not occur at points. Because strings are extended objects, their scattering amplitudes are finite.

22.10 Riemann Surfaces and Moduli
A homeomorphism is a map that is one to one and continuous with a continuous inverse. A Riemann surface is a 2-dimensional real manifold whose open sets $U_\alpha$ are mapped onto open sets of the complex plane $\mathbb{C}$ by homeomorphisms $z_\alpha$ whose


transition functions $z_\alpha \circ z_\beta^{-1}$ are analytic on the images of the intersections $U_\alpha \cap U_\beta$. Two Riemann surfaces are equivalent if they are related by a continuous analytic map that is one to one and onto. A parameter that labels the geometry of a manifold or that identifies a vacuum in quantum field theory is called a modulus. Some Riemann surfaces have several moduli; others have one modulus; others none at all. Some moduli are continuous parameters; others are discrete.

Figure 22.3 Spacetime diagrams of the scattering of two open strings into two open strings (left) and of two closed strings into two closed strings (right).

Further Reading
A First Course in String Theory (Zwiebach, 2009).

Exercises

22.1 Derive formulas (22.6) and (22.7).
22.2 Derive Equation (22.8) from (22.5–22.7).
22.3 Use the formulas for the momentum $\vec{\mathcal{P}}^{\,\tau}(t,\sigma) = T_0\, \dot{\vec X}/c^2$ and angular momentum (22.17) to derive the Regge relation $J/\hbar = E^2/(4\pi\hbar c T_0)$ for the angular momentum of the closed string (22.22).
22.4 Use the equation of motion (22.10) and the light-cone gauge conditions (22.28) and $X'^\mu(\tau, \sigma_*) = 0$ to show that the angular momentum (22.29) of a freely moving string is conserved, $\dot M_{\mu\nu} = 0$.

References

Aghanim, N., et al. 2018. Planck 2018 results. VI. Cosmological parameters. arXiv, 1807.06209. Aitken, A. C. 1959. Determinants and Matrices. Oliver and Boyd. Aké, Luis, Flores, José L., and Sánchez, Miguel. 2018. Structure of globally hyperbolic spacetimes with timelike boundary. arXiv, 1808, 04412. Akrami, Y., et al. 2018. Planck 2018 results. X. Constraints on inflation. arXiv, 1807.06211. Alberts, Bruce, Johnson, Alexander, Lewis, Julian, Morgan, David, Raff, Martin, Roberts, Keith, and Walter, Peter. 2015. Molecular Biology of the Cell. 6th edn. Garland Science. Alligood, Kathleen T., Sauer, Tim D., and Yorke, James A. 1996. CHAOS: An Introduction to Dynamical Systems. Springer-Verlag. Alsing, Paul M. and Milonni, Peter W. 2004. Simplified derivation of the Hawking-Unruh temperature for an accelerated observer in vacuum. Am. J. Phys., 72, 1524–1529. Amdahl, David and Cahill, Kevin. 2016. Path integrals for awkward actions. arXiv, 1611.06685. Arnold, V. I. 1989. Mathematical Methods of Classical Mechanics. 2nd edn. Springer. 5th printing. Chap. 7. Autonne, L. 1915. Sur les Matrices Hypohermitiennes et sur les Matrices Unitaires. Ann. Univ. Lyon, Nouvelle Série I, Fasc. 38, 1–77. Bailey, J., Borer, K., Combley, F., Drumm, H., Krienen, F., Lange, F., Picasso, E., von Ruden, W., Farley, F. J. M., Field, J. H., Flegel, W., and Hattersley, P. M. 1977. Measurements of relativistic time dilatation for positive and negative muons in a circular orbit. Nature, 268, 301–305. Bekenstein, Jacob D. 1973. Black holes and entropy. Phys. Rev., D7, 2333–2346. Bender, Carl M, and Orszag, Steven A. 1978. Advanced Mathematical Methods for Scientists and Engineers. McGraw-Hill. Bigelow, Matthew S., Lepeshkin, Nick N., and Boyd, Robert W. 2003. Superluminal and Slow Light Propagation in a Room-Temperature Solid. Science, 301(5630), 200–202. Bouchaud, Jean-Philippe and Potters, Marc. 2003. Theory of Financial Risk and Derivative Pricing. 2nd edn. Cambridge University Press. 
Bowman, Judd D., Rogers, Alan E. E., Monsalve, Raul A., Mozdzen, Thomas J., and Mahesh, Nivedita. 2018. An absorption profile centred at 78 megahertz in the sky-averaged spectrum. Nature, 555, 67. Boyd, Robert W. 2000. Nonlinear Optics. 2nd edn. Academic Press. Brillouin, L. 1960. Wave Propagation and Group Velocity. Academic Press. 737


Brunner, N, Scarani, V, Wegmüller, M, Legré, M, and Gisin, N. 2004. Direct measurement of superluminal group velocity and signal velocity in an optical fiber. Phys. Rev. Lett., 93(20), 203902. Cahill, Sean. 2018. Coding Lina. Amazon. Callan, Jr., Curtis G. 1970. Broken scale invariance in scalar field theory. Phys. Rev., D2, 1541–1547. Cantelli, F. P. 1933. Sulla determinazione empirica di una legge di distribuzione. Giornale dell’Instituto Italiano degli Attuari, 4, 221–424. Carroll, Sean. 2003. Spacetime and Geometry: An Introduction to General Relativity. Benjamin Cummings. Chandrasekhar, S. 1943. Stochastic problems in physics and astronomy. Rev. Mod. Phys., 15(1), 1–89. Cheng, Ta-Pei. 2010. Relativity, Gravitation and Cosmology: A Basic Introduction. Oxford University Press. Clarke, C. J. S. 1970. On the global isometric embedding of pseudo-Riemannian manifolds. Proc. Roy. Soc. London Ser. A, 314, 417–428. Cohen-Tannoudji, Claude, Diu, Bernard, and Laloë, Frank. 1977. Quantum Mechanics. Hermann & John Wiley. Cotton, F. Albert. 1990. Chemical Applications of Group Theory. 3rd edn. WileyInterscience. Courant, Richard. 1937. Differential and Integral Calculus, Vol. I. Interscience. Courant, Richard and Hilbert, David. 1955. Methods of Mathematical Physics, Vol. I. Interscience. Creutz, Michael. 1983. Quarks, Gluons and Lattices. Cambridge University Press. Darden, Tom, York, Darrin, and Pedersen, Lee. 1993. Particle mesh Ewald: An Nlog(N) method for Ewald sums in large systems. J. Chem. Phys., 98(12). DeWitt, Bryce S. 1967. Quantum theory of gravity. II. The manifestly covariant Theory. Phys. Rev., 162(5), 1195–1239. Dirac, P. A. M. 1967. The Principles of Quantum Mechanics. 4th edn. Oxford University Press. Dirac, P. A. M. 1996. General Theory of Relativity. Princeton University Press. Dodelson, Scott. 2003. Modern Cosmology. 1st edn. Academic Press. Faddeev, L. D. and Popov, V. N. 1967. Feynman diagrams for the Yang-Mills field. Phys. Lett. B, 25(1), 29–30. 
Feller, William. 1966. An Introduction to Probability Theory and Its Applications. Vol. II. Wiley. Feller, William. 1968. An Introduction to Probability Theory and Its Applications. 3rd edn. Vol. I. Wiley. Feynman, Richard P. 1948. Space-time approach to non-relativistic quantum mechanics. Rev. Mod. Phys., 20, 367–387. Feynman, Richard P. 1972. Statistical Mechanics. Basic Books. Feynman, Richard P., Hibbs, Albert R., and Styer, Daniel F.(ed.). 2010. Quantum Mechanics and Path Integrals. Dover Publications. Fisher, M. E. 1974. The renormalization group in the theory of critical behavior. Rev. Mod. Phys., 46(4), 597–616. Fisher, M. E. 1998. Renormalization group theory: Its basis and formulation in statistical physics. Rev. Mod. Phys., 70, 653–681. Fixsen, D. J. 2009. The temperature of the cosmic microwave background. Astrophys. J., 707, 916–920.


Gattringer, Christof and Lang, Christian B. 2010. Quantum Chromodynamics on the Lattice: An Introductory Presentation. Springer’s (Lecture Notes in Physics). Gehring, G. M., Schweinsberg, A., Barsi, C., Kostinski, N., and Boyd, R. W. 2006. Observation of backwards pulse propagation through a medium with a negative group velocity. Science, 312, 895–897. Gelfand, Israel M. 1961. Lectures on Linear Algebra. Interscience. Gell-Mann, Murray. 1994. The Quark and the Jaguar. W. H. Freeman. Gell-Mann, Murray. 2008. Plectics. Lectures at the University of New Mexico. Gell-Mann, Murray and Low, F. E. 1954. Quantum electrodynamics at small distances. Phys. Rev., 95, 1300–1312. Georgi, H. 1999. Lie Algebras in Particle Physics. 2rd edn. Perseus Books. Ghosh, Ritesh, Dewangan, Gulab C., Mallick, Labani, and Raychaudhuri, Biplab. 2018. Broadband spectral study of the jet-disc emission in the radio-loud narrow-line Seyfert 1 galaxy 1H 0323+342. MNRAS, arxiv:1806.04089, 479, 2464. Glauber, Roy J. 1963a. Coherent and incoherent states of the radiation field. Phys. Rev., 131(6), 2766–2788. Glauber, Roy J. 1963b. The quantum theory of optical coherence. Phys. Rev., 130(6), 2529– 2539. Glivenko, V. 1933. Sulla determinazione empirica di una legge di distribuzione. Giornale dell’Instituto Italiano degli Attuari, 4, 92–99. Gnedenko, B. V. 1968. The Theory of Probability. Chelsea Publishing Co. Grassberger, Peter and Procaccia, Itamar. 1983. Measuring the strangeness of strange attractors. Physica D, 9, 189–208. Grebogi, Celso, Ott, Edward, and Yorke, James A. 1987. Chaos, strange attractors, and fractal basin boundaries in nonlinear dynamics. Science, 238, 632–638. Greene, Robert E. 1970. Isometric embeddings of Riemannian and pseudo-Riemannian manifolds. Memoirs of the American Mathematical Society, No. 97. American Mathematical Society. Griffiths, Jerry B. and Podolsky, Jiri. 2009. Exact Space-Times in Einstein’s General Relativity. Cambridge Monographs on Mathematical Physics. 
Cambridge University Press. Gross, David J., and Wilczek, Frank. 1973. Ultraviolet behavior of nonabelian gauge theories. Phys. Rev. Lett., 30, 1343–1346. [,271(1973)]. Günther, M. 1989. Zum Einbettungssatz von J. Nash. Math. Nachr., 144, 165–187. Guth, Alan H. 1981. Inflationary universe: A possible solution to the horizon and flatness problems. Phys. Rev., D23, 347–356. Gutzwiller, Martin C. 1990. Chaos in Classical and Quantum Mechanics. Springer. Halmos, Paul R. 1958. Finite-Dimensional Vector Spaces. 2nd edn. Van Nostrand. Hardy, Godfrey Harold and Rogosinski, W. W. 1944. Fourier Series. 1st edn. Cambridge University Press. Hartle, James B. 2003. Gravity: An Introduction to Einstein’s General Relativity. Pearson. Hastie, Trevor, Tibshirani, Roberti, and Friedman, Jerome. 2016. The Elements of Statistical Leraning: Data Mining, Inference, and Prediction. Springer. Hau, L. V., Harris, S. E., Dutton, Z, and Behroozi, C. H. 1999. Light speed reduction to 17 metres per second in an ultracold atomic gas. Nature, 397, 594. Hawking, S. W. 1975. Particle creation by black holes. Commun. Math. Phys., 43, 199–220. Helstrom, Carl W. 1976. Quantum Detection and Estimation Theory. Academic Press. Hobson, M. P., Efstathiou, G. P., and Lasenby, A. N. 2006. General Relativity: An Introduction for Physicists. Cambridge University Press.


Holland, John H. 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press. Huang, Fang, Schwartz, Samantha L., Byars, Jason M., and Lidke, Keith A. 2011. Simultaneous multiple-emitter fitting for single molecule super-resolution imaging. Biomed. Opt. Express, 2(5), 1377–1393. Ijjas, Anna and Steinhardt, Paul J. 2018. Bouncing Cosmology made simple. Class. Quant. Grav., 35(13), 135004. Ince, E. L. 1956. Integration of Ordinary Differential Equations. 7th edn. Oliver and Boyd, Ltd. Chap. 1. James, F. 1994. RANLUX: A Fortran implementation of the high-quality pseudorandom number generator of Lüscher. Comp. Phys. Comm. , 79, 110. Kadanoff, Leo P. 2009. More is the same: Phase transitions and mean field theories. J. Statist. Phys., 137, 777. Kahneman, Daniel. 2011. Thinking, Fast and Slow. Farrar, Straus, and Giroux. Kaplan, J. and Yorke, J. 1979. Chaotic Behavior of Multidimensional Difference Equations. Springer Lecture Notes in Mathematics, vol. 730. Springer. Pages 228–237. Kato, T. 1978. Topics in Functional Analysis. Academic Press. Pages 185–195. Kleinert, Hagen. 2009. Path Integrals in Quantum Mechanics, Statistics, Polymer Physics, and Financial Markets. World Scientific. Knuth, Donald E. 1981. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. 2nd edn. Addison-Wesley. Kolmogorov, Andrei Nikolaevich. 1933. Sulla determinazione empirica di una legge di distribuzione. Giornale dell’Instituto Italiano degli Attuari, 4, 83–91. Kosterlitz, J. M., Nelson, David R., and Fisher, Michael E. 1976. Bicritical and tetracritical points in anisotropic antiferromagnetic systems. Phys. Rev., B13, 412–432. Langevin, Paul. 1908. Sur la théorie du mouvement brownien. Comptes Rend. Acad. Sci. Paris, 146, 530–533. Lichtenberg, Donald B. 1978. Unitary Symmetry and Elementary Particles. 2nd edn. Academic Press. Lin, I-Hsiung. 2011. Classic Complex Analysis. World Scientific. Linde, Andrei D. 1983. Chaotic inflation. Phys. 
Lett., 129B, 177–181. Lüscher, M. 1994. A portable high-quality random number generator for lattice field theory simulations. Comp. Phys. Comm. , 79, 100. Lyth, David H. and Liddle, Andrew R. 2009. The Primordial Density Perturbation: Cosmology, Inflation and the Origin of Structure. Cambridge University Press. Martincorena, Iñigo, Raine, Keiran M., Gerstung, Moritz, Dawson, Kevin J., Haase, Kerstin, Loo, Peter Van, Davies, Helen, Stratton, Michael R., and Campbell, Peter J. 2017. Universal patterns of selection in cancer and somatic tissues. Cell, 171(5), 1029–1041. Matzner, Richard A. and Shepley, Lawrence C. 1991. Classical Mechanics. Prentice Hall. Minsky, Marvin. 1961. Steps toward Artificial Intelligence. Proceedings of the IEEE, 49, 8–30. Misner, Charles W., Thorne, Kip S., and Wheeler, John Archibald. 1973. Gravitation. W. H. Freeman. Morse, Philip M. and Feshbach, Herman. 1953. Methods of Theoretical Physics. Vol. I. McGraw-Hill. Müller, O., and Sánchez, M. 2011. Lorentzian manifolds isometrically embeddable in L N . Trans. Amer. Math. Soc., 363(10), 5367–5379. Nash, John. 1956. The imbedding problem for Riemannian manifolds. Ann. Math., 63(1), 20–63.


Nishi, C. C. 2005. Simple derivation of general Fierz-like identities. Am. J. Phys., 73, 1160–1163. Olver, Frank W. J., Lozier, Daniel W., Boisvert, Ronald F., and Clark, Charles W. 2010. NIST Handbook of Mathematical Functions. Cambridge University Press. Padmanabhan, Thanu. 2010. Gravitation: Foundations and Frontiers. Cambridge University Press. Parsegian, Adrian. 1969. Energy of an ion crossing a low dielectric membrane: Solutions to four relevant electrostatic problems. Nature, 221, 844–846. Pathria, R. K. 1972. Statistical Mechanics. Pergamon. Ch. 13. Pearson, Karl. 1900. On the criterion that a given system of deviations from the probable in the case of correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Phil. Mag., 50(5), 157–175. Peebles, P. J. E. 1982. Large scale background temperature and mass fluctuations due to scale invariant primeval perturbations. Astrophys. J., 263, L1–L5. [,85(1982)]. Peskin, Michael Edward, and Schroeder, Daniel V. 1995. An Introduction to Quantum Field Theory. Advanced book program. Westview Press. Reprinted in 1997. Phillips, Robert Brooks, Kondev, Jane, Theriot, Julie, and Orme, Nigel. 2012. Physical Biology of the Cell. 2nd edn. Garland Science. Politzer, H. David. 1973. Reliable perturbative results for strong interactions? Phys. Rev. Lett., 30, 1346–1349. [,274(1973)]. Riess, A. G., Casertano, S., Yuan, W., Macri, L., Anderson, J., MacKenty, J. W., Bowers, J. B., Clubb, K. I., Filippenko, A. V., Jones, D. O., and Tucker, B. E. 2018. New parallaxes of galactic cepheids from spatially scanning the Hubble Space Telescope: Implications for the Hubble constant. ApJ, 855(Mar.), 136. Riley, Ken, Hobson, Mike, and Bence, Stephen. 2006. Mathematical Methods for Physics and Engineering. 3d edn. Cambridge University Press. Ritt, Robert K. 1970. Fourier Series. 1st edn. McGraw-Hill. Roe, Byron P. 2001. Probability and Statistics in Experimental Physics. Springer. Roos, C. 
E., Marraffino, J., Reucroft, S., Waters, J., Webster, M. S., Williams, E. G. H., Manz, A., Settles, R., and Wolf, G. 1980. ±± lifetimes and longitudinal acceleration. Nature, 286, 244–245. Rubinstein, Reuven Y. and Kroese, Dirk P. 2007. Simulation and the Monte Carlo Method. 2nd edn. Wiley. Russell, D. A., Hanson, J. D., and Ott, E. 1980. Dimension of strange attractors. Phys. Rev. Lett., 45(14), 1175. Saito, Mutsuo and Matsumoto, Makoto. 2007. www.math.sci.hiroshima-u.ac.jp/ m-mat/MT /emt.html. Sakurai, J. J. 1982. Advanced Quantum Mechanics. 1st edn. Addison Wesley. 9th printing. Pages 62–63. Scherk, Joël, and Schwarz, John H. 1974. Dual models for non-hadrons. Nucl. Phys., B81, 118. Schmitt, Lothar M. 2001. Theory of genetic algorithms. Theor. Comput. Sci., 259, 1–61. Schutz, Bernard F. 1980. Geometrical Methods of Mathematical Physics. Cambridge University Press. Schutz, Bernard F. 2009. A First Course in General Relativity. 2nd edn. Cambridge University Press. Schwinger, Julian, Deraad, Lester, Milton, Kimball A., and Tsai, Wu-yang. 1998. Classical Electrodynamics. Westview Press. Shortliffe, E. H. and Buchanan, B.G. 1975. A model of inexact reasoning in medicine. Math. Biosci., 23, 351–379.



Smirnov, N. V. 1939. Estimation of the deviation between empirical distribution curves for two independent random samples. Bull. Moscow State Univ., 2(2), 3–14.
Smith, Carlas S., Joseph, Nikolai, Rieger, Bernd, and Lidke, Keith A. 2010. Fast, single-molecule localization that achieves theoretically minimum uncertainty. Nature Methods, 7(5), 373.
Sprott, Julien C. 2003. Chaos and Time-Series Analysis. 1st edn. Oxford University Press.
Srednicki, Mark. 2007. Quantum Field Theory. Cambridge University Press.
Stakgold, Ivar. 1967. Boundary Value Problems of Mathematical Physics, Vol. I. Macmillan.
Steinberg, A. M., Kwiat, P. G., and Chiao, R. Y. 1993. Measurement of the single-photon tunneling time. Phys. Rev. Lett., 71(5), 708–711.
Steinhardt, Paul J., and Turok, Neil. 2002. A cyclic model of the universe. Science, 296, 1436–1439.
Stenner, Michael D., Gauthier, Daniel J., and Neifeld, Mark A. 2003. The speed of information in a "fast-light" optical medium. Nature, 425, 695–698.
Strogatz, Steven H. 2014. Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering. 2nd edn. Westview Press.
Symanzik, K. 1970. Small distance behavior in field theory and power counting. Commun. Math. Phys., 18, 227–246.
Tinkham, Michael. 2003. Group Theory and Quantum Mechanics. Dover Publications.
Titulaer, U. M., and Glauber, R. J. 1965. Correlation functions for coherent fields. Phys. Rev., 140(3B), B676–682.
Trotter, H. F. 1959. On the product of semi-groups of operators. Proc. Am. Math. Soc., 10, 545–551.
Veneziano, Gabriele. 1968. Construction of a crossing-symmetric Regge-behaved amplitude for linearly rising Regge trajectories. Nuovo Cim., 57A, 190.
von Foerster, Heinz, Mora, Patricia M., and Amiot, Lawrence W. 1960. Doomsday: Friday, 13 November, A.D. 2026. Science, 132, 1291–1295.
Vose, Michael D. 1999. The Simple Genetic Algorithm: Foundations and Theory. MIT Press.
Wang, Yun-ping, and Zhang, Dian-lin. 1995. Reshaping, path uncertainty, and superluminal traveling. Phys. Rev. A, 52(4), 2597–2600.
Watson, George Neville. 1995. A Treatise on the Theory of Bessel Functions. Cambridge University Press.
Waxman, David, and Peck, Joel R. 1998. Pleiotropy and the preservation of perfection. Science, 279.
Weinberg, Steven. 1972. Gravitation and Cosmology. John Wiley & Sons.
Weinberg, Steven. 1988. The First Three Minutes. Basic Books.
Weinberg, Steven. 1995. The Quantum Theory of Fields. Vol. I: Foundations. Cambridge University Press.
Weinberg, Steven. 1996. The Quantum Theory of Fields. Vol. II: Modern Applications. Cambridge University Press.
Weinberg, Steven. 2005. The Quantum Theory of Fields. Vol. III: Supersymmetry. Cambridge University Press.
Weinberg, Steven. 2010. Cosmology. Oxford University Press.
Whittaker, E. T., and Watson, G. N. 1927. A Course of Modern Analysis. 4th edn. Cambridge University Press.
Wigner, Eugene P. 1964. Group Theory and Its Application to the Quantum Mechanics of Atomic Spectra. Revised edn. Academic Press.



Wilson, Kenneth G. 1971. Renormalization group and critical phenomena. 1. Renormalization group and the Kadanoff scaling picture. Phys. Rev., B4, 3174–3183.
Wilson, Kenneth G. 1975. The renormalization group: Critical phenomena and the Kondo problem. Rev. Mod. Phys., 47(Oct), 773–840.
Zee, Anthony. 2010. Quantum Field Theory in a Nutshell. 2nd edn. Princeton University Press.
Zee, Anthony. 2013. Einstein Gravity in a Nutshell. Princeton University Press.
Zee, Anthony. 2016. Group Theory in a Nutshell for Physicists. Princeton University Press.
Zwiebach, Barton. 2009. A First Course in String Theory. 2nd edn. Cambridge University Press.

Index

SU(3) lattice gauge theory, 638
Z2 lattice gauge theory, 637
Z, the group of integers, 392
absolute value of complex number, 2
addition theorem for spherical harmonics, 416–417
adjoint domain, 293
affine connection, 478
affine connection with all lower indices, 483
analytic continuation, 205–207
  dimensional regularization, 207
analytic function
  branch of, 221
  definition of, 185
  essential singularity, 203
  meromorphic, 203
  multivalued, 221
analytic functions, 185–247
  entire, 185, 203
  harmonic functions, 196–197
  holomorphic, 185
  isolated singularity, 203
  meromorphic, 203
  pole, 203
  simple pole, 203
angular momentum
  lowering operators, 418
  raising operators, 418
  spin, 414
annihilation and creation operators, 153, 255
anti-de Sitter space, 326
area of a parallelogram, 30
area of parallelepiped, 537
area of parallelogram, 537
area of unit sphere in d dimensions, 183
arrays, 2–4
associated Legendre functions, 356–362
  Rodrigues's formula for, 357
associated Legendre polynomials, 356–362
  Rodrigues's formula for, 357
asymptotic freedom, 259, 721–726


attractors, 650–656, 659
autonomous system, 647
autonomous systems of ordinary differential equations, 278–279, 649–653
average value, 138
basis, 7, 17
basis vectors, 472–477
beats, 95
Bessel functions, 365–389
  and charge near a membrane, 372–374
  and coaxial waveguides, 384
  and cylindrical waveguides, 374–376
  and scattering off a hard sphere, 385–386
  exercises, 386–389
  Hankel functions, 382–384
  modified, 371–372
  modified Bessel functions, 382–384
  Neumann functions, 382–384
  of the first kind, 165, 167, 365–382
  of the second kind, 382–386
  spherical, 167
    and partial waves, 380–382
    and quantum dots, 382
    Rayleigh's formula for, 377
  spherical Bessel functions of the second kind, 384–386
Bessel inequality, 314
Bessel's equation, 367
beta function, 168
Bianchi identity, 458, 462
Big Bang, 523
binomial coefficient, 165
binomial distribution, 572–575
  and coping with huge factorials, 574–575
black hole
  entropy of, 510
  lifetime of, 510
  temperature of, 510
black holes, 508–511
Bloch's theorem, 124

bodies falling in air, 275
Boltzmann distribution, 63–65
Boltzmann's constant, 173
Bose–Einstein statistics, 586–587
boundary conditions
  Dirichlet, 293
  natural, 291, 300
  Neumann, 291
Bravais lattice, 124
Bromwich integral, 150
Bures distance, 13
C function gamma, 168, 574
C function lgamma, 168, 574
calculus of variations, 498–500
  in nonrelativistic mechanics, 270–273
  in relativistic electrodynamics, 460
  in relativistic mechanics, 459–460
  particle in a gravitational field, 498–500
  strings, 727–729
Callan–Symanzik equation, 259, 721, 723–726
canonical commutation relations, 153, 256
canonical ensemble, 63–64
Cartan subalgebras, 427
Cartan's moving frame, 493–494
Cartan's structure equations, 527
Cauchy's formula for repeated integration, 334
Cauchy's principal value, 227
  Feynman's propagator, 230–233
  trick, 228
Cauchy–Riemann conditions, 196–197
Cayley–Hamilton theorem, 49
chaos, 653–660
  attractors, 650–656
    limit cycle, 650
    of fractal dimension, 659
    strange, 659
    strange attractor, 650
  Bernoulli shift, 655
  chaotic threshold, 653
  dripping faucet, 653
  Duffing's equation, 652
  dynamical system, 653
    autonomous, 653
  exercises, 660
  fractals, 656–659
    Cantor set, 657
    fractal dimension, 657
    Koch snowflake, 657
    self-similar dimension, 658
  Hénon's map, 655
  invertible map, 653
  Lyapunov exponent, 653
  map, 653
  period-two sequence, 653
  Poincaré map, 653
  Rayleigh–Bénard convection, 653
  strange attractor, 654


  strange attractors, 659
  van der Pol oscillator, 649
characteristic equation, 45
characteristic function, 140
  and moments, 140
characteristic functions, 597–600
characteristic polynomial, 45
chemical potential, 64, 265
Christoffel connection, 478
Christoffel symbol of the first kind, 478
class C^k of functions, 105
Clifford algebra, 444
clock hypothesis, 501
coherent states, 79, 83, 122, 126, 576
  squeezed states, 428
comma notation for derivatives, 471–472
commutators, 396
compact, 394
complex arithmetic, 2
complex derivatives, 177–178
complex path integrals, 714
complex-variable theory, 185–247
  analytic continuation, 205–207
  analyticity, 185–187
  and string theory, 241–242
  applications to string theory
    radial order, 241
  argument principle, 204–205
  calculus of residues, 207–209
  Cauchy's inequality, 199
  Cauchy's integral formula, 193–196
  Cauchy's integral theorem, 187–192
  Cauchy's principal value, 227–233
  Cauchy–Riemann conditions, 186–187, 196–197
  conformal mapping, 225–226
  contour integral with cut, 223
  cuts, 220–225
  dispersion relations, 233–236
  essential singularity, 203
    and Picard's theorem, 203
  exercises, 243–247
  fundamental theorem of algebra, 200
  ghost contours, 209–218
  harmonic functions, 196–197
  isolated singularity, 203
  Laurent series, 201–205
  Liouville's theorem, 199–200
  logarithms, 220–225
  method of steepest descent, 239–241
  phase and group velocities, 236–239
  pole, 203
  residue, 202
  roots, 221
  simple pole, 203
  singularities, 203–205
  Taylor series, 198–199
conformal mapping, 225–226



conjugate momenta, 324
connection from metric, 482–483
conservation of angular momentum
  in mechanics, 273
conservation of energy
  in field theory, 320–324
  in mechanics, 270–278
conservation of momentum
  in mechanics, 273
conserved quantities, 322–324
continuity equation, 325
contractions, 469
contravariant vector field, 467
contravariant vectors, 467
convergence of functions, 104
  uniform, 104
  uniform and term-by-term integration, 105
convergence in the mean, 161–162
convex function, 570
convolutions, 142–145, 155
  and Gauss's law, 142–144
  and Green's functions, 142–144
  and translational invariance, 144
coordinates, 466–467
correlation functions
  Glauber and Titulaer, 78–80
cosmic microwave background radiation, 360–362
cosmology, 325–328, 511–523
  Big Bang, 523
  comoving coordinates, 511
  density and pressure, 515–516
  era of matter, 516
  Friedmann–Lemaître–Robertson–Walker metric, 513–514
  Friedmann–Robertson–Walker metric
    energy–momentum tensor of, 514
  Friedmann equations, 514
  homogeneous and isotropic line element, 511
  Hubble rate, 514
  models, 515–523
    Big Bang, 523
    dark-energy dominated, 521–522
    inflation dominated, 522–523
cotangent vectors, 477
coupling constant, 726
covariance, 571
covariant derivative of a contravariant vector, 478
covariant derivative of metric tensor, 481
covariant derivatives in Yang–Mills theory, 409
covariant derivatives of contravariant vectors, 477–479
covariant derivatives of cotangent vectors, 478
covariant derivatives of covariant vectors, 479
covariant derivatives of tangent vectors, 479
covariant vector field, 467–468
covariant vectors, 467–468

critical density, 514, 516
curvature, 484–490
curvature of a cylindrical hyperboloid, 532
curvature of hyperboloid H3, 489–490
curvature of sphere S2, 486–488
curvature of sphere S3, 489
curvature scalar, 486
cyclic group Zn, 392
D-brane, 729
dark energy, 265
dark matter, 267
de Sitter space, 327
de Sitter spacetimes, 326–327
decuplet of baryon resonances, 426
degenerate eigenvalue, 47
degrees of freedom, 647
delta function, 88, 89, 117–120, 132–136, 141, 151, 152, 313, 318–319
  and Green's functions, 143
  Dirac comb, 118–120, 135–136
  eigenfunction expansion of, 313, 318–319
  for continuous square-integrable functions, 132–136
  for functions twice differentiable on [−1, 1], 348
  for real periodic functions, 120, 123
  of a function, 134
density operator, 53, 63–65
derivations, 481
determinants, 30–40
  and antisymmetry, 32, 33
  and Levi-Civita symbol, 32
  and linear dependence, 33
  and linear independence, 33
  and permutations, 34
  and the inverse of a matrix, 35
  as exponential of trace of logarithm, 47
  as products of eigenvalues, 46
  cofactors, 31
  derivative of logarithm of, 37
  invariances of, 33
  Laplace expansion, 32
  minors, 31
  product rule for, 36
  3 × 3, 31
  2 × 2, 30
dielectrics, 178
differential equations, 248–333
  exercises, 331–333
  terminal velocity of mice, men, falcons, and bullets, 275
diffusion, 154–155
  Fick's law, 154
dimensional regularization, 207
dimensional transmutation, 723
Dirac mass term, 445
Dirac notation, 20–30, 117–120
  and change of basis, 24

  and inner-product rules, 23
  and self-adjoint linear operators, 26
  and the adjoint of an operator, 25
  bra, 21
  bracket, 21
  examples, 24
  ket, 21
  outer products, 24
  tricks, 60
Dirac's delta function, 316
Dirac's gamma matrices, 444
direct product, 424
direct products
  adding angular momenta, 401
dispersion relations, 233–236
  and causality, 233
  Kramers and Kronig, 235–236
divergence, 85–92
division algebra, 430, 433
double-factorials, 166
driven, damped pendulum, 653
dual vectors, 477
Duffing's equation, 652
effective field theories, 713–714
eigenfunctions, 299–313
  complete, 306, 309–313
  orthonormal
    Gram–Schmidt procedure, 306
eigenvalue
  algebraic multiplicity, 47
  geometric multiplicity, 47
  simple, 47
eigenvalues, 299–313
  degenerate, 46
  nondegenerate, 46
  unbounded, 307–313
eigenvectors, 43–65
Einstein's equations, 506
Einstein's equations for empty space, 505
Einstein's summation convention, 469
Einstein–Hilbert action, 504
Einstein–Nernst relation, 592
electric constant, 178, 457
electric displacement, 457
electric susceptibility, 178
electrodynamics, 456–459
electrostatic energy, 178
electrostatic potential
  multipole expansion, 317
electrostatics
  dielectrics, 178–181
embedding spaces, 473–476
emission rate from a fluorophore, 276
energy density, 322–324
energy–momentum 4-vector, 455
energy–momentum tensor, 498, 504–507
  divergence of, 498

entangled, 74
entangled states, 74
entangled vectors, 74
entanglement, 72–75
enthalpy, 269–270
entropy, 53, 63–65, 77, 264–265
  classical, 77
  of a black hole, 510
  of entangled system, 77
  quantum, 77
equidimensional differential equations, 281
equivalence principle, 464, 492–493
equivalent representations of SU(2), 420–421
error analysis, 583–585
Euler differential equations, 281
Euler's equation, 325
Euler's theorem, 324
Ewald summation, 136
expected value, 138
exterior derivative, 461–464
extremal black holes, 511
factorials, 163–168
  double, 165, 168
  Mermin's approximation, 164
  Mermin's infinite-product formula, 164
  Ramanujan's approximation, 163–165
  Stirling's approximation, 163
Faraday's law, 178, 457
Fermi–Dirac statistics, 586–587
Feynman's propagator, 141–142, 230–233
  as a Green's function, 230
field of charge near a membrane, 179–181
field of charge near dielectric interface, 178–181
Fierz identities, 26, 426
Fisher information, 614–618
fluid mechanics, 325
Fokker–Planck equation, 595–597
forms, 460–464, 536–563
  closed, 553–554
  covariant laplacian and Hodge star, 560
  differential forms, 460–464, 538–563
    p-forms, 461
    1-forms, 461
    2-forms, 461
    and exterior derivative, 461
    and gauge theory, 525
    and Stokes's theorem, 463
    closed, 462
    complex, 555
    exact, 462
    Hodge star, 556–560
    invariance of, 461
    wedge product, 461
  divergence of a contravariant vector and Hodge star, 557–560
  exact, 553–554


  exercises, 562–563
  exterior derivatives, 543–548
  exterior forms, 536–538
  Frobenius's theorem, 560–562
  Hodge star, 556–560
    and divergence, 556
    and laplacian, 556
    and Maxwell's equations, 557–560
  integration, 548–553
    Stokes's theorem, 552
  laplacian and Hodge star, 560
Fortran's gamma function, 168, 574, 575
Fortran's log_gamma function, 168, 574
Fourier expansion of a free scalar field, 152–154
Fourier series, 93–127
  and scalar fields, 153
  better convergence of integrated series, 104
  complex, 93–98, 102–109, 116–127
    for real functions, 95
    of nonperiodic functions, 97–109
  convergence, 104–109
  convergence theorem, 105
  differentiation of, 103–104
  exercises, 124–127
  Gibbs overshoot, 100–102
  integration of, 103–104
  nonrelativistic strings, 122–123
  of a C1 function, 107
  of nonperiodic functions, 97–109
  Parseval's identity, 120
  periodic boundary conditions, 123–124
    Born von Karman, 124
  poorer convergence of differentiated series, 109
  quantum-mechanical examples, 110–116
  real functions, 98–102
    Gibbs overshoot, 100
  several variables, 103
  stretched intervals, 102–103
  the interval, 96
  where to put the 2π's, 97
Fourier transform of a gaussian, 209
Fourier transforms, 128–146, 150–155
  and Ampère's law, 144
  and characteristic functions, 140
  and convolutions, 142–145
  and differential equations, 150–155
  and diffusion equation, 154–155
  and Fourier series, 130, 141
  and Green's functions, 142–146
  and momentum space, 137–140
  and Parseval's relation, 133–134
  and scalar wave equation, 152
  and the delta function, 132–136
  and the Feynman propagator, 141–142
  and the uncertainty principle, 138–140
  derivatives of, 136–140

  exercises, 155–157
  in several dimensions, 141–142
  integrals of, 136–140
  inverse of, 129
  of a gaussian, 130–131
  of real functions, 131–132
Fourier–Legendre expansion, 348
Fourier–Mellin integral, 150
fractional derivatives, 177–178
fractional linear transformation, 226
free-fall coordinate system, 493
Friedmann equations, 514
Friedmann's equations, 326
  for a single constituent, 326
Friedmann–Lemaître–Robertson–Walker cosmologies, 325–328
function f(x ± 0), 105
  continuous, 105
  piecewise continuous, 105
functional derivatives, 661–668
  and delta functions, 662–663
  and variational methods, 662–665
  exercises, 668
  functional differential equation for ground state of a field theory, 665–668
  functional differential equations, 665–668
  notation used in physics, 662–663
  of higher order, 663–665
  Taylor series of, 665
functionals, 661
functions
  analytic, 185–187
  differentiable, 185
fundamental theorem of algebra, 200
gamma function, 163–168, 206–207
gauge theory, 409, 523–525, 529–531
Gauss's distribution, 577–586
  accuracy of, 579
Gauss's law, 178, 315, 457
gaussian integrals, 669–670
Gell-Mann's SU(3) matrices, 425
general relativity, 325–328, 498–523
  black holes, 509–511
  cosmological constant, 507
  cosmology, 511–523
  dark energy, 507
  density and pressure, 515–516
  Einstein's equations, 504–506
  Einstein–Hilbert action, 504–506
  gravitational waves, 507–508
  linearized gravity, 507–508
  model cosmologies, 515–523
    dark-energy dominated, 521–522
    inflation dominated, 522–523
  Schwarzschild metric, 508–511
  static and isotropic gravitational field, 508–511

  Schwarzschild metric, 508–511
geodesic equation, 498–503
geodesic equation in electromagnetic field, 500
geometric series, 162–163
Gibbs free energy, 270
gradient, 85–92
grand canonical ensemble, 64–65
grand unification, 424
Grassmann numbers, 2, 6–7
Grassmann polynomials, 2
Grassmann variables, 702–708
gravitational time dilation, 503–504
gravitational waves, 507–508
graviton polarization, 9
Green's function
  for Helmholtz's equation, 316
  for Helmholtz's modified equation, 316
  for laplacian, 144
  for Poisson's equation
    and Legendre functions, 317
Green's functions, 315–320
  and eigenfunctions, 318–320
  and Fourier transforms, 142–146
  and Fourier–Laplace transforms, 150–155
  Feynman's propagator, 317
  for Helmholtz's equation, 316
  for Poisson's equation, 315–317
  of a self-adjoint operator, 319
  Poisson's equation, 315
group index of refraction, 237
groups, 390–450
  O(n), 392
  SO(3)
    adjoint representation of, 409–413
  SO(n), 392
  SU(2), 414–424
    defining representation of, 413–421
    spin and statistics, 417–418
    tensor products of representations, 415
  SU(3), 425–426
    structure constants, 425
  Z2, 401
  Z3, 402
  Zn, 401
  abelian, 391
  and symmetries in quantum mechanics, 394
  and Yang–Mills gauge theory, 409
  automorphism, 398
    inner, 398
    outer, 398
  block-diagonal representations of, 394
  centers of, 396
  characters, 399–400
  compact, 392–394
  compact Lie groups
    real structure constants of, 408
    totally antisymmetric structure constants of, 408
  completely reducible representations of, 394

  conjugacy classes of, 396
  continuous, 390–393
  definition, 390
  direct product
    addition of angular momenta, 400
  direct products, 400–401
  direct sum of representations of, 394
  equivalent representations of, 394
  exercises, 447–450
  factor groups of by subgroups, 397
  finite, 392, 401–404
    multiplication table, 401
    regular representation of, 402
  further reading, 447
  Gell-Mann matrices, 425–426
  generators of adjoint representations of, 421–422
  homotopy groups, 447
  invariant integration, 433–435
  irreducible representations of, 394
  isomorphism, 397
  Lie algebras, 404–450
    SU(2), 414–421
  Lie groups, 390–393, 404–450
    SO(3), 409–411
    SU(2) tensor product, 418
    SU(3), 425–426
    SU(3) generators, 425
    SU(3) structure constants, 425
    adjoint representations of, 421–422
    antisymmetry of structure constants, 407
    Cartan subalgebra of SU(3), 425
    Cartan subalgebras of, 427
    Cartan's list of compact simple Lie groups, 433
    Casimir operators, 422–423
    Casimir operators of, 415
    compact, 404–435
    defining representation of SU(2), 413–421
    definition of, 405
    generators of, 405
    generators of adjoint representation, 421
    hermitian generators of, 407
    Jacobi identity, 421
    non-hermitian generators of, 407
    noncompact, 404–409
    of rotations, 409–421
    simple and semisimple, 424
    structure constants of, 421–422
    symplectic groups, 427–433
  Lorentz group, 391, 435–447
    2-dimensional representations of, 439–443
    Dirac representation of, 444–445
    Lie algebra of, 435–439
  matrix, 391
  morphism, 397
  nonabelian, 391
  noncompact, 393
  of matrices, 391–393

  of orthogonal matrices, 391–393
  of permutations, 404
  of rotations, 409–421
    adjoint representation of, 409–413
    explicit 3 × 3 representation of, 410–413
    generators of, 409–411
    spin and statistics, 417–418
    tensor operators of, 423–424
  of transformations, 390–393
    Lorentz, 391
    Poincaré, 391
    rotations, 391
    rotations and reflections, 391
    translations, 391
  of unitary matrices, 391–393
  order of, 392, 401
  Poincaré group, 391, 446–447
    Lie algebra of, 446–447
  reducible representations of, 394
  representations of, 393–395
    dimensions of, 393
    in Hilbert space, 394–395
  rotations
    representations of, 395
  Schur's lemma, 398–399
  semisimple, 396
  similarity transformations, 394
  simple, 396
  simple and semisimple Lie algebras, 424
  subgroups, 395–398
    cosets of, 397
    invariant, 396
    left cosets of, 397
    normal, 396
    quotient coset space, 397
    right cosets of, 397
    trivial, 395
  symmetry
    antilinear, antiunitary representations of, 395
    linear, unitary representations of, 395
  translation, 391
  unitary representations of, 394
Hamilton systems, 647–649
hamiltonian, 268–269
  in classical mechanics, 272
hamiltonian density
  of scalar fields, 324
harmonic function, 196
harmonic oscillator, 120–122, 302–303, 647
Heaviside step function, 126
helicity
  positive, 443
  right handed, 443
Helmholtz free energy, 270
Helmholtz's equation
  in cylindrical coordinates
    and Bessel functions, 369–376
  in spherical coordinates

    and spherical Bessel functions, 376–382
  in 2 dimensions, 252–253
  in 3 dimensions, 253–254
  rectangular coordinates, 252, 253
  spherical coordinates, 254
    and associated Legendre functions, 356
    and spherical Bessel functions, 356
  with azimuthal symmetry, 354–355
    and Legendre polynomials, 354–355
    and spherical Bessel functions, 354–355
Hermite functions, 295
Hermite's system, 295
hermitian differential operators, 293
Hilbert spaces, 14–15, 29
homogeneous function, 324
homogeneous functions, 264–274
  Euler's theorem, 265
  virial theorem, 265–267
homogeneous Maxwell equations, 462
homotopy groups, 447
Hubble constant, 514, 516–517
incompressible fluid, 325
index of refraction, 237
inequality of arithmetic and geometric means, 571
inertial coordinate system, 493
infinite products, 181–182
infinite series, 158–184
  absolute convergence, 158
  asymptotic, 175–177
    WKB and Dyson, 177
  Bernoulli numbers and polynomials, 174–175
  binomial series, 171
  binomial theorem, 170, 171
  Cauchy's criterion, 159
  Cauchy's root test, 160
  comparison test, 160
  conditional convergence, 158
  convergence, 158–163
  d'Alembert's ratio test, 160
  divergence of, 158
  exercises, 182–184
  integral test, 160
  Intel test, 161
  logarithmic series, 172
  of functions
    convergence, 161–162
    convergence in the mean, 161–162
    Dirichlet series, 172–174
    Fourier series, 169–170
    geometric series, 162–163
    power series, 162–163, 169–170
    Taylor series, 168–172
    uniform convergence, 162
  Riemann zeta function, 173
  uniform convergence, 161
    and term-by-term integration, 162
inner product, 3

inner products, 12–15
  and distance, 13
  and norm, 13
  degenerate, 13
  hermitian, 13
  indefinite, 13, 15
  Minkowski, 15
  nondegenerate, 13
  of functions, 15
  positive definite, 12
  Schwarz, 13
inner-product spaces, 14–15
integrability, 647
integrable systems, 647–649
integral equations, 334–342
  exercises, 341–342
  Fredholm, 335–339
    eigenfunctions, 335–339
    eigenvalues, 335–339
    first kind, 335
    homogeneous, 335
    inhomogeneous, 335
    second kind, 335
  implications of linearity, 336–341
  integral transformations, 339–341
    and Bessel functions, 340–341
    Fourier, Laplace, and Euler kernels, 340
  kernel, 335
  numerical solutions, 337–339
  Volterra, 335–339
    eigenfunctions, 335–339
    eigenvalues, 335–339
    first kind, 335
    homogeneous, 335
    inhomogeneous, 335
    second kind, 335
integral transformations, 339–341
  and Bessel functions, 340–341
  Fourier, Laplace, and Euler kernels, 340
integration program, 643–644
internal energy, 264–265, 269–270
invariant distance, 454
invariant element of volume, √g d^4x, 494–496
invariant subspace, 44
invariant subspaces, 44
inverse of the metric tensor, 476
involution, 647
irrelevant constant, 726
irrelevant coupling constant, 726
Jacobi identity, 421
jacobians, 38–40, 495
Jensen's inequalities, 570–571
Jensen's inequality, 628
kernel of a matrix, 398
Kerr metric, 510
Kramers–Kronig relations, 235–236

Kronecker delta, 469
Kronecker's delta, 5
Lagrange multipliers, 41–43, 63–65, 299–313
Lagrange's equation
  in field theory, 320–321
  in mechanics, 270–273
Lagrangian, 268–269
Lapack, 38, 80
Laplace transforms, 146–155
  and convolutions, 155
  and differential equations, 149–155
  derivatives of, 148–149
  examples, 146
  integrals of, 148–149
  inversion of, 150
Laplace's equation
  in 2 dimensions, 355–356
laplacian, 85–92
Lebesgue integration, 108–109
Legendre functions, 294–295, 317, 343–364
  exercises, 363–364
  second kind, 297
Legendre polynomials, 343–352
  addition theorem for, 360
  for large n, 351
  generating function, 345–347
  Helmholtz's equation with azimuthal symmetry, 354–355
  Legendre's differential equation, 347–349
  normalization, 343
  recurrence relations, 349–351
  Rodrigues's formula, 344–345
  Schlaefli's integral for, 351–352
  special values of, 351
Legendre transformation, 267–273
Legendre's system, 294–295
Leibniz's rule, 165
Lerch transcendent, 174
Levi-Civita affine connection, 478
Levi-Civita symbol, 410
Lie algebras
  ranks of, 427
  roots of, 427
  weight vector, 427
  weights of, 427
Lie groups
  SO(3), 393
  SU(2), 393
light
  slow, fast, and backwards, 237–239
light-cone coordinates, 731–734
limit cycle, 650
linear algebra, 1–84
  exercises, 80–84
linear dependence, 16–17
  and determinants, 33
linear independence, 16–17




  and completeness, 16
  and determinants, 33
linear least squares, 41
linear operators, 10–12
  and matrices, 10
  density operator, 76
  domain, 10
  hermitian, 26
  range, 10
  real, symmetric, 27
  self adjoint, 26–27
  unitary, 27–28
linearized gravity, 507–508
linearly dependent, 249
linearly independent, 249
Liouville integrability, 647
logarithm of gamma function, 574
Lorentz force, 458
Lorentz group
  generators of, 444
Lorentz transformations, 451–456
Lorenz butterfly, 650
lowering an index, 476
Lyapunov exponent, 651, 653
magnetic constant, 457
magnetic field, 457
magnetic induction, 456
majorana field, 257
Majorana mass term, 441, 443
Mandelstam variables, 718
Maple, 38, 80
maps, 653–656
marginal constant, 726
marginal coupling constant, 726
Markoff process, 595
mass of graviton, 508
Mathematica, 38, 80
Matlab, 38, 68, 80, 608
  integrating systems of ODEs, 279
Matlab's gamma function, 168, 574
Matlab's gammaln function, 168, 574
matrices, 4–7
  adjoint, 5
  and linear operators, 10
  change of basis, 11
  characteristic equation of, 45, 48–49
  CKM matrix, 69
  CKM matrix and CP violation, 69
  congruency transformation, 58
  defective, 47
  density operator, 76
  diagonal form of square nondefective matrix, 47
  functions of, 49–54
  gamma, 444
  hermitian, 5, 54–59
    and diagonalization by a unitary transformation, 57

    complete and orthonormal eigenvectors, 57
    degenerate eigenvalues, 55, 56
    eigenvalues of, 54
    eigenvectors and eigenvalues of, 57
    eigenvectors of, 55
  identity, 5
  imaginary and antisymmetric, 57
  inverse, 5
  inverses of, 35
  nonnegative, 6
  normal, 59–65
    compatible, 61–65
    diagonalization by a unitary transformation, 59
  orthogonal, 6, 28
  Pauli, 5, 74, 414
  positive, 6
  positive definite, 6
  rank of, 80
    example, 80
  real and symmetric, 57
  similarity transformation, 11
  similarity transformation of, 47
  singular-value decomposition, 65–69
    example, 69
    quark mass matrix, 69
  square, 44–49
    eigenvalues of, 44–49
    eigenvectors of, 44–49
  trace, 4
    cyclic, 4
  unitary, 6
  upper triangular, 37
maximally symmetric spaces, 490–492
Maxwell's equations, 457
  in vacuum, 458
Maxwell's generally covariant, inhomogeneous equations, 500
Maxwell–Ampère law, 457
mean value, 138
megaverse, 523
method of steepest descent, 239–241
metric of hyperboloid H3, 489–490
metric of sphere S3, 489
metric spaces, 14–15
metric tensor, 472–477
  signature of, 493
  square root of absolute value of determinant of, √g, 494
Minkowski space, 451–453
modulus of complex number, 2
moment-generating functions, 597–600
momentum
  in classical mechanics, 271
Monte Carlo methods, 632–642
  and evolution, 640–641
  and lattice gauge theory, 637–638, 641–642
  detailed balance, 637
  exercises, 641–642

  genetic algorithms, 641
  in statistical mechanics, 634–640
  Metropolis step, 636
  Metropolis's algorithm, 634–640
  more general applications, 640–641
  of data analysis, 633–634
    example, 633–634
  of numerical integration, 632–633
  partition function, 637
  simulated annealing, 640
  smart schemes, 639
  sweeps, 636
  thermalization, 636
Moore–Penrose pseudoinverse, 70–72
multiverse, 523
Möbius transformation, 226
n factorial
  Stirling's formula, 241
natural units, 142, 689
Navier–Stokes equation, 325
nearest-integer function, 128
neural networks, 644–646
Noether's theorem, 272–273, 321–324
nonlinear differential equations, 324–330
  general relativity, 325–328
  solitons, 328–330
normal modes, 278
notation for derivatives, 84–85
null space of a matrix, 398
numbers
  complex, 1, 8
    atan2, 2
    phase of, 2
  irrational, 1
  natural, 1
  rational, 1
  real, 1
Octave, 38, 80
octet of baryons, 426
octet of pseudoscalar mesons, 426
octonians, 433
open, 394
operators
  adjoint of, 25
  antilinear, 30
  antiunitary, 30
  compatible, 395
  complete, 395
  orthogonal, 28
  real, symmetric, 27
  unitary, 27–28
optical theorem, 236
ordinary differential equations, 248–250
  and variational problems, 290–291
  boundary conditions, 289–291
  differential operators of definite parity, 286

  even and odd differential operators, 285–286
  exact, 260–264, 279–281
    Boyle's law, 261
    condition of integrability, 261
    Einstein's law, 261
    first order, 260–264
    higher order, 279–281
    human population growth, 261
    integrating factors, 263–264
    integration, 262–263
    van der Waals's equation, 261
  first order, 257–277
    separable, 257–260
  first-order self adjoint, 298–299
  Frobenius's series solutions, 282–285
    Fuchs's theorem, 285
    indicial equation, 283
    recurrence relations, 283
  Green's formula, 292
  hermitian operators, 299
  higher order
    constant coefficients, 280–281
  homogeneous first-order, 273–274
  homogeneous functions, 264–274
  Lagrange's identity, 292
  linear, 248–250
    general solution, 249, 250
    homogeneous, 249
    inhomogeneous, 249
    order of, 248
  linear dependence of solutions, 249
  linear independence of solutions, 249
  linear, first-order, 274–277
    exact, 274
    integrating factor, 274
  meaning of exactness, 261–263
  nonlinear, 250
  second order
    self-adjoint form, 248
  second-order
    eigenfunctions, 304–307
    eigenvalues, 304–307
    essential singularity of, 282
    Green's functions, 319
    irregular singular point of, 282
    making operators self adjoint, 295–296
    non-essential singular point of, 281
    regular singular point of, 281
    second solution, 287–288
    self adjoint, 291–313
    singular points at infinity, 282
    singular points of, 281–282
    weight function, 304
    why not three solutions?, 288–289
    wronskians of self-adjoint operators, 296–297
  self adjoint, 291–313
  self-adjoint operators, 296



  separable, 257–260
    general integral, 258
    hidden separability, 260
    logistic equation, 258
    Zipf's law, 258
  separated, 257
  singular points of Legendre's equation, 282
  Sturm–Liouville problem, 296, 299–313
  systems of, 270–278, 325–330
    Friedmann's equations, 325–328
    Lagrange's equations, 270–278
  Wronski's determinant, 286–289
orthogonal coordinates, 85–92
  divergence in, 85–92
  gradient in, 85–92
  laplacian in, 85–92
orthogonal polynomials, 352–354
  Hermite's, 353
  Jacobi's, 352–353
  Laguerre's, 353–354
orthogonal transformation, 57
outer products, 19–24
  example, 19
  in Dirac's notation, 24
parallel transport, 483–484
partial differential equations, 250–257
  Dirac equation, 256–257
  general solution, 251
  homogeneous, 250
  inhomogeneous, 251
  Klein–Gordon equation, 255–256
  linear, 250–257
  separable, 251–257
    Helmholtz's equation, 252–254
  wave equations, 255–257
    photon, 256
    spin-one-half fields, 256–257
    spinless bosons, 255–256
path integrals, 669–717
  and gaussian integrals, 669–670
  and lattice gauge theories, 712
  and nonabelian gauge theories, 709–713
    ghosts, 712–713
    the method of Faddeev and Popov, 709–713
  and perturbative field theory, 694–714
  and quantum electrodynamics, 698–701
  and the Bohm–Aharonov effect, 676–677
  and the principle of stationary action, 672–673
  density operator for a free particle, 682
  density operator for harmonic oscillator, 684–685
  euclidian, 679–682
  exercises, 714–717
  fermionic, 702–708
  finite temperature, 679–682
  for harmonic oscillator in imaginary time, 684–685
  for partition function, 679–682
  for the harmonic oscillator, 677–678

  in finite-temperature field theory, 692–694
  in imaginary time, 679–682
  in quantum mechanics, 670–679
  in real time, 670–679
  Minkowski, 670–679
  of fields, 688–714
  of fields in euclidian space, 692–694
  of fields in imaginary time, 692–694
  of fields in real time, 688–692
  partition function for a free particle, 682
Pauli matrices, 414
  anticommutation relation, 414
perfect fluids, 507
permanent, 38
permittivity, 178, 457
permutations, 404
  and determinants, 34
  cycles, 404
phase and group velocities, 236–239
  slow, fast, and backwards light, 238
phase of complex number, 2
photon polarization, 9
Planck’s constant, 173
Planck’s distribution, 173–174
Poincaré invariant, 649
points, 466–467
Poisson bracket, 647
Poisson distribution
  and cancer, 576
  and coherent states, 576
Poisson summation, 135–136
Poisson’s distribution, 575–576
  accuracy of, 576
Porter–Thomas distribution, 602
power series, 162–163
pre-Hilbert spaces, 14–15
pressure, 265
principle of equivalence, 492–493
principle of stationary action, 290–291, 299–304, 498–500
  in field theory, 320–322
  in mechanics, 270–278
  in nonrelativistic mechanics, 270–273, 369–370
  in quantum mechanics, 299–304
  in relativistic electrodynamics, 460
  in relativistic mechanics, 459–460
  particle in a gravitational field, 498–500
probability and statistics, 564–631
  Bayes’s theorem, 564–567
  Bernoulli’s distribution, 572
  binomial distribution, 572–575
  brownian motion, 588–597
    Einstein–Nernst relation, 588–592
    Langevin’s theory of, 588–595
  Cauchy distributions, 602
  central limit theorem, 602–611
    illustrations of, 606–611
  central moments, 567–572

  centroid method, 579
  characteristic functions, 597–600
  chi-squared distribution, 601
  chi-squared statistic, 619–622
  convergence in probability, 605
  correlation coefficient, 571
  covariance, 571–572
    lower bound of Cramér and Rao, 615–618
  cumulants, 600
  diffusion, 587–597
  diffusion constant, 591
  direct stochastic optical reconstruction microscopy, 580
  Einstein’s relation, 591
  Einstein–Nernst relation, 588–592
  ensemble average, 588
  error function, 580–583
  estimators, 611–618
    Bessel’s correction, 614
    bias, 612
    consistent, 612
    standard deviation, 614
    standard error, 614
  exercises, 628–631
  expectation, 567–572
  expected value, 567–572
  exponential distribution, 601
  fat tails, 600–602
  Fisher’s information matrix, 614–618
  fluctuation and dissipation, 592–595
  Gauss’s distribution, 577–586
  Gosset’s distribution, 600
  Heisenberg’s uncertainty principle, 569–570
  information, 614–618
  information matrix, 614–618
  Jensen’s inequality, 628
  Kolmogorov’s function, 623
  Kolmogorov’s test, 622–628
  kurtosis, 600
  Lévy distributions, 602
  Lindeberg’s condition, 606
  log-normal distribution, 601
  Lorentz distributions, 602
  maximum likelihood, 618–619
  Maxwell–Boltzmann distribution, 585–586
  mean, 567–572
  moment-generating functions, 597–600
  moments, 567–572
  normal distribution, 579
  Pearson’s distribution, 601
  Poisson distribution
    coherent states, 576
  Poisson’s distribution, 575–577
  power-law tails, 600
  probability density, 567–586
  probability distribution, 567–586
  random-number generators, 606–609
  skewness, 600

  Student’s t-distribution, 600
  variance, 567–572
    lower bound of Cramér and Rao, 615–618
  viscous-friction coefficient, 591
proper time, 454, 501
  and time dilation, 455–456
pseudoinverse, 70–72, 619
pseudoinverses, 41
pseudorandom numbers, 607, 632–641
Python, 38
Python’s gamma function, 168, 574
Python’s loggamma function, 168, 574
quantum field of spin one and mass zero, 256
quantum field of spin zero, 153, 255
quantum field theory
  as an interpolating theory, 718–724
quantum mechanics, 299–313
quasirandom numbers, 608, 633
  Halton sequences, 633
  Sobol’ sequences, 633
quaternions, 429–433
  and the Pauli matrices, 430
R-C circuit, 276
raising an index, 476
random-number generators, 606–609
rank of a Lie algebra, 427
rank of a matrix, 80
Rayleigh–Bénard convection, 653
recombination, 520
Regge trajectories, 729–731
regular and self-adjoint differential system, 293
relative permittivity, 178
relevant coupling constant, 725
renormalization, 718–726
renormalization group, 259, 718–726
  exercises, 726
  in condensed-matter physics, 724–726
  in lattice field theory, 723–724
  in quantum field theory, 719–726
reparametrization invariance, 729
Ricci tensor, 486
Riemann curvature tensor, 485
Riemann manifold, 472–474
rotating black holes, 510
rotations, 409–421
Rössler system, 650–651
Sackur–Tetrode formula, 264
scalar densities, 494–496
scalar density, 495
scalar field, 152
scalar fields, 467
scalars, 467
Schmidt decomposition, 77–78
Schwarz inequality, 15–16, 314
  examples, 16


Schwarz inner products, 78–80
seesaw mechanism, 58
self-adjoint differential operators, 291–313
self-adjoint differential systems, 293–313
semi-euclidian spacetime, 474–476
semi-riemannian manifold, 474–476
sensitivity to initial conditions, 651
separable states, 73
separable vectors, 73, 74
sequence of functions
  convergence in the mean, 109
signature of a metric, 493
similarity transformation, 11, 47, 50, 53, 54, 394
simple and semisimple Lie algebras, 424
simple and semisimple Lie groups, 424
simple eigenvalue, 47
simply connected, 189
simulated annealing, 640
slow, fast, and backwards light, 238
  and Kramers–Kronig relations, 238
small oscillations, 277–278
solitons, 328–330
spacelike, 452
special relativity, 451–465
  4-vector force, 455
  and time dilation, 455–456
  electrodynamics, 456–459
  energy–momentum 4-vector, 455
  exercises, 464–465
  kinematics, 455–456
spherical harmonics, 358–362
spin and statistics, 417–418
spin connection, 525–529
spin-one-half fields in general relativity, 529
squeezed states, 428
standard deviation, 568, 584, 614
standard error, 614
standard model of particle physics, 424
states
  coherent, 79, 83, 122, 126, 576
Stefan’s constant, 173
Stirling’s formula, 574
Stirling’s formula for n factorial, 241
stochastic process, 595
Stokes’s theorem, 91, 463, 547–553
strange attractor, 650, 654
strange attractors, 659
strings, 727–736
  and infinities of quantum field theory, 727
  angular momentum conserved, 732
  covariant action, 734
  D-branes, 734–735
  exercises, 736
  free-endpoint boundary condition, 729
  gauge fixing, 732–734
  light-cone coordinates, 731–734
  light-cone gauge, 732–734
  momentum conserved, 729

  Nambu–Goto action, 727–731
  P-branes, 734–735
  Polyakov’s action, 734
  quantized, 732–734
  Regge trajectories, 729–731
  Riemann surfaces and moduli, 735–736
  scattering of, 735
  superstrings, 733–734
Sturm–Liouville equation, 296, 299–313
subspace, 394
  invariant, 394
  proper, 394
summation convention, 407, 469
Sylvester’s theorem, 58
symmetric differential operators, 292
symmetries, 322–324
symmetries and conserved quantities, 272–273
symmetry in quantum mechanics, 30
symmetry in mechanics, 273
symplectic groups, 427–433
systems of first-order ordinary differential equations, 278–279
systems of linear equations, 40–41
systems of ordinary differential equations, 278–279, 647–653
tachyon, 733
tangent covectors, 477
tangent vectors, 472–476
Taylor series, 168–172
temperature, 265
  of a black hole, 510
tensor product, 424
  adding spins, 74–75, 413–421
  and hydrogen atom, 74
tensor products
  adding angular momenta, 413–421
  entanglement, 72–74
tensors, 466–535
  affine connections, 477–483, 496
    and metric tensor, 482–483
  and general relativity, 498–523
  and spin-one-half fields, 529
  antisymmetric, 469, 470
  Bianchi identity, 482
  Christoffel symbols, 477–483, 496
  connections, 477–483, 496
  contractions, 469
  contravariant metric tensor, 476
  covariant curl, 481–482
  covariant derivatives, 477–483, 496
    and antisymmetry, 482
    metric tensor, 481
  covariant derivatives of tensors, 479–482
  curvature, 484–490
  curvature of hyperboloid H³, 489–490

  curvature of sphere S², 486–488
  curvature of sphere S³, 489
  divergence of a contravariant vector, 496–498
    and laplacian, 498
  Einstein’s equations, 504–506
  equivalence principle, 492–493
  exercises, 531–535
  gauge theory, 523–531
    role of vectors, 529–531
    standard model, 523–531
  hyperboloid H², 475
  laplacian, 498
  Levi-Civita’s symbol, 495–496
  Levi-Civita’s tensor, 495–496
  maximally symmetric spaces, 490–492
  metric of hyperboloid H³, 489–490
  metric of sphere S², 473
  metric of sphere S³, 489
  notation for derivatives, 471–472
  parallel transport, 483–484
  particle in a weak, static gravitational field, 502–504
    gravitational redshift, 504
    gravitational time dilation, 503
  perfect fluid, 507
  principle of equivalence, 492–493, 500–502
    geodesic equation, 502
  quotient theorem, 470
  second rank, 468–471
  symmetric, 469, 470
  symmetric and antisymmetric, 470
  tensor equations, 471
  tetrads, 493–494, 525–529
    vierbeins, 493–494
tetrads, 493–494
thermodynamics, 264–265, 269–270
  enthalpy, 269–270
  entropy, 264–265
  Gibbs free energy, 270
  Helmholtz free energy, 270
  internal energy, 264–265, 269–270
third-harmonic microscopy, 211
three-body problem, 649, 651
time dilation, 454
  in muon decay, 454
time-ordered product, 142
time-ordered products, 685–687, 691–694
  of euclidian field operators, 693–694
  of field operators, 691–692
  of position operators, 685–687
timelike, 453
total cross-section, 236
transparency, 520
twistor, 444
two-body problem, 648
uncertainty principle, 138–140, 266–267
unitary matrix, 11
universe of matter, 328


universe of radiation, 327
van der Pol oscillator, 649
variance, 568, 583–585
  of an operator, 139–140
variational methods, 270–278, 290–291, 299–304, 498–500
  in nonrelativistic mechanics, 369–370
  in quantum mechanics, 299–304
  in relativistic electrodynamics, 460
  in relativistic mechanics, 459–460
  particle in a gravitational field, 498–500
  strings, 727–729
vector calculus, 84–92
  exercises, 92
vector space, 17–18
  dimension of, 17–18
vectors, 7–9
  basis, 7, 17, 19
  complete, 17
  components, 7
  direct product, 72–75
  eigenvalues, 43–65
    example, 43
  eigenvectors, 43–65
    example, 43
  eigenvectors of square matrix, 47
  functions as, 9
  orthonormal, 18–19
    Gram–Schmidt method, 18–19
  partial derivatives as, 9
  pure states as, 9
  span, 17
  span a space, 17
  states as, 9
  tensor product, 72–75
    example, 74
vierbeins, 493–494
Virasoro’s algebra, 242, 246
virial theorem, 265–267
viscous-friction coefficient, 591, 595
volume of a parallelepiped, 32
wedge product, 461
Weyl spinor
  left handed, 440
  right handed, 443
Wigner–Eckart theorem
  special case of, 399
wronskian, 286–289
Yang–Mills theory, 409, 523–525, 529–531
  and general relativity, 529
  role of vectors, 529–531
  standard model, 523–531
Yukawa potential, 146, 316
zeta function, 172–174, 206, 733
  analytic continuation of, 206