Statistical Mechanics: Theory and Molecular Simulation (Oxford Graduate Texts) [2 ed.] 0198825560, 9780198825562

Scientists are increasingly finding themselves engaged in research problems that cross the traditional disciplinary line

English Pages 880 [881] Year 2023

Table of contents :
Cover
Titlepage
Copyright
Dedication
Preface
Contents
1 Classical mechanics
1.1 Introduction
1.2 Newton’s laws of motion
1.3 Phase space: visualizing classical motion
1.4 Lagrangian formulation of classical mechanics: A general framework for Newton’s laws
1.5 Legendre transforms
1.6 Generalized momenta and the Hamiltonian formulation of classical mechanics
1.7 A simple classical polymer model
1.8 The action integral
1.9 Lagrangian mechanics and systems with constraints
1.10 Gauss’s principle of least constraint
1.11 Rigid body motion: Euler angles and quaternions
1.12 Non-Hamiltonian systems
1.13 Problems
2 Theoretical foundations of classical statistical mechanics
2.1 Overview
2.2 The laws of thermodynamics
2.3 The ensemble concept
2.4 Phase-space volumes and Liouville’s theorem
2.5 The ensemble distribution function and the Liouville equation
2.6 Equilibrium solutions of the Liouville equation
2.7 Problems
3 The microcanonical ensemble and introduction to molecular dynamics
3.1 Brief overview
3.2 Basic thermodynamics, Boltzmann’s relation, and the partition function of the microcanonical ensemble
3.3 The classical virial theorem
3.4 Conditions for thermal equilibrium
3.5 The free particle and the ideal gas
3.6 The harmonic oscillator and harmonic baths
3.7 Introduction to molecular dynamics
3.8 Integrating the equations of motion: Finite difference methods
3.9 Systems subject to holonomic constraints
3.10 The classical time evolution operator and numerical integrators
3.11 Multiple time-scale integration
3.12 Symplectic integration for quaternions
3.13 Exactly conserved time-step dependent Hamiltonians
3.14 Illustrative examples of molecular dynamics calculations
3.15 Problems
4 The canonical ensemble
4.1 Introduction: A different set of experimental conditions
4.2 Thermodynamics of the canonical ensemble
4.3 The canonical phase-space distribution and partition function
4.4 Canonical ensemble via entropy maximization
4.5 Energy fluctuations in the canonical ensemble
4.6 Simple examples in the canonical ensemble
4.7 Structure and thermodynamics in real gases and liquids from spatial distribution functions
4.8 Perturbation theory and the van der Waals equation
4.9 Molecular dynamics in the canonical ensemble: Hamiltonian formulation in an extended phase space
4.10 Classical non-Hamiltonian statistical mechanics
4.11 Nosé-Hoover chains
4.12 Integrating the Nosé-Hoover chain equations
4.13 The isokinetic ensemble: A variant of the canonical ensemble
4.14 Isokinetic Nosé-Hoover chains: Achieving very large time steps
4.15 Applying canonical molecular dynamics: Liquid structure
4.16 Problems
5 The isobaric ensembles
5.1 Why constant pressure?
5.2 Thermodynamics of isobaric ensembles
5.3 Isobaric phase-space distributions and partition functions
5.4 Isothermal-isobaric ensemble via entropy maximization
5.5 Pressure and work virial theorems
5.6 An ideal gas in the isothermal-isobaric ensemble
5.7 Extending the isothermal-isobaric ensemble: Anisotropic cell fluctuations
5.8 Derivation of the pressure tensor estimator from the canonical partition function
5.9 Molecular dynamics in the isoenthalpic-isobaric ensemble
5.10 Molecular dynamics in the isothermal-isobaric ensemble I: Isotropic volume fluctuations
5.11 Molecular dynamics in the isothermal-isobaric ensemble II: Anisotropic cell fluctuations
5.12 Atomic and molecular virials
5.13 Integrating the Martyna-Tobias-Klein equations of motion
5.14 The isothermal-isobaric ensemble with constraints: The ROLL algorithm
5.15 Problems
6 The grand canonical ensemble
6.1 Introduction: The need for yet another ensemble
6.2 Euler’s theorem
6.3 Thermodynamics of the grand canonical ensemble
6.4 Grand canonical phase space and the partition function
6.5 Grand canonical ensemble via entropy maximization
6.6 Illustration of the grand canonical ensemble: The ideal gas
6.7 Particle number fluctuations in the grand canonical ensemble
6.8 Potential distribution theorem
6.9 Molecular dynamics in the grand canonical ensemble
6.10 Problems
7 Monte Carlo
7.1 Introduction to the Monte Carlo method
7.2 The Central Limit theorem
7.3 Sampling distributions
7.4 Hybrid Monte Carlo
7.5 Replica exchange Monte Carlo
7.6 Wang-Landau sampling
7.7 Transition path sampling and the transition path ensemble
7.8 Problems
8 Free-energy calculations
8.1 Free-energy perturbation theory
8.2 Adiabatic switching and thermodynamic integration
8.3 Adiabatic free-energy dynamics
8.4 Jarzynski’s equality and nonequilibrium methods
8.5 The problem of rare events
8.6 Collective variables
8.7 The blue moon ensemble approach
8.8 Umbrella sampling and weighted histogram methods
8.9 Wang-Landau sampling
8.10 Driven adiabatic free-energy dynamics
8.11 Metadynamics
8.12 The committor distribution and the histogram test
8.13 Problems
9 Quantum mechanics
9.1 Introduction: Waves and particles
9.2 Review of the fundamental postulates of quantum mechanics
9.3 Simple examples
9.4 Identical particles in quantum mechanics: Spin statistics
9.5 Problems
10 Quantum ensembles and the density matrix
10.1 The difficulty of many-body quantum mechanics
10.2 The ensemble density matrix
10.3 Time evolution of the density matrix
10.4 Quantum equilibrium ensembles
10.5 Problems
11 Quantum ideal gases: Fermi-Dirac and Bose-Einstein statistics
11.1 Complexity without interactions
11.2 General formulation of the quantum-mechanical ideal gas
11.3 An ideal gas of distinguishable quantum particles
11.4 General formulation for fermions and bosons
11.5 The ideal fermion gas
11.6 The ideal boson gas
11.7 Problems
12 The Feynman path integral
12.1 Quantum mechanics as a sum over paths
12.2 Derivation of path integrals for the canonical density matrix and the time evolution operator
12.3 Thermodynamics and expectation values from path integrals
12.4 The continuous limit: Functional integrals
12.5 How to think about imaginary time propagation
12.6 Many-body path integrals
12.7 Quantum free-energy profiles
12.8 Numerical evaluation of path integrals
12.9 Problems
13 Classical time-dependent statistical mechanics
13.1 Ensembles of driven systems
13.2 Driven systems and linear response theory
13.3 Applying linear response theory: Green-Kubo relations for transport coefficients
13.4 Calculating time correlation functions from molecular dynamics
13.5 The nonequilibrium molecular dynamics approach
13.6 Problems
14 Quantum time-dependent statistical mechanics
14.1 Time-dependent systems in quantum mechanics
14.2 Time-dependent perturbation theory in quantum mechanics
14.3 Time correlation functions and frequency spectra
14.4 Examples of frequency spectra
14.5 Quantum linear response theory
14.6 Approximations to quantum time correlation functions
14.7 Problems
15 The Langevin and generalized Langevin equations
15.1 The general model of a system plus a bath
15.2 Derivation of the generalized Langevin equation
15.3 Analytically solvable examples
15.4 Vibrational dephasing and energy relaxation in simple fluids
15.5 Molecular dynamics with the Langevin equation
15.6 Designing memory kernels for specific tasks
15.7 Sampling stochastic transition paths
15.8 Mori-Zwanzig theory
15.9 Problems
16 Discrete models and critical phenomena
16.1 Phase transitions and critical points
16.2 The critical exponents α, β, γ, and δ
16.3 Magnetic systems and the Ising model
16.4 Universality classes
16.5 Mean-field theory
16.6 Ising model in one dimension
16.7 Ising model in two dimensions
16.8 Spin correlations and their critical exponents
16.9 Introduction to the renormalization group
16.10 Fixed points of the renormalization group equations in greater than one dimension
16.11 General linearized renormalization group theory
16.12 Understanding universality from the linearized renormalization group theory
16.13 Other uses of discrete models
16.14 Problems
17 Introduction to machine learning in statistical mechanics
17.1 Machine learning in statistical mechanics: What and why?
17.2 Three key probability distributions
17.3 Simple linear regression as a case study
17.4 Kernel methods
17.5 Neural networks
17.6 Weighted neighbor methods
17.7 Demonstrating machine learning in free-energy simulations
17.8 Clustering algorithms
17.9 Intrinsic dimension of a data manifold
17.10 Problems
Appendix A Properties of the Dirac delta-function
Appendix B Calculus of functionals
Appendix C Evaluation of energies and forces
Appendix D Proof of the Trotter theorem
Appendix E Laplace transforms
References
Index

Statistical Mechanics: Theory and Molecular Simulation

SECOND EDITION

MARK E. TUCKERMAN
Department of Chemistry and Courant Institute of Mathematical Sciences
New York University

Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries.

© Mark E. Tuckerman 2023

The moral rights of the author have been asserted.

First Edition published in 2010
Second Edition published in 2023

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.

You must not circulate this work in any other form and you must impose this same condition on any acquirer.

Published in the United States of America by Oxford University Press, 198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data: Data available

Library of Congress Control Number: 2023930434
ISBN 978–0–19–882556–2
DOI: 10.1093/oso/9780198825562.001.0001

Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.

To Jocelyn and all my group members, past, present, and yet to join

Preface

The field of statistical mechanics is evolving with stunning rapidity. Practitioners are actively developing theoretical and computational tools for solving complex problems and those tools are being deployed in increasingly novel applications to real systems of physical, chemical, biological, and engineering import. The first edition of this book provided a solid foundation in the theoretical underpinnings and computational technologies that allow the classical and quantum statistical mechanics of particles to be formally understood and practically implemented. These core concepts will forever remain central to the field and indispensable learning for anyone wishing to enter it. However, the intervening ten-plus years have witnessed advances of such significance that I felt a new edition of the book was needed to incorporate these developments into the book’s framework. These incorporations and revisions caused the book’s length to swell by over 150 additional pages.

In the last decade, the application of models and methods from the subarea of artificial intelligence known as machine learning has transformed the landscape of computational statistical mechanics. Large-scale computer simulations in statistical mechanics often generate output data sets of such enormity and heterogeneity that they could be regarded as “big data” within a “chemical” scope (broadly speaking). These data sets often represent multiple systems containing heterogeneous environments evolving under a variety of external conditions. The techniques of machine learning allow these data sets to be mined for hidden patterns, patterns that can be further leveraged to construct simplified models of the data. Owing to their low computational overhead, these models can be employed in subsequent simulations that reach longer length and time scales, reveal meaningful reaction coordinates for specific processes, and even drive rare-event simulations leading to highly featured free-energy hypersurfaces. Because machine learning has become such an impactful tool in statistical mechanics, a substantial new chapter (Chapter 17) has been added that introduces basic machine learning concepts and model types, including kernel-ridge methods, support-vector machines, neural networks, weighted-neighbor models, and data-clustering algorithms, and demonstrates the application of these models, in both regression and classification modes, in statistical mechanical simulation problems.

Other important developments have shaped the second edition significantly. The discussions of collective variables and free-energy based rare-event sampling techniques in Chapter 8 have been overhauled to capture the impressive innovations in these methods and to highlight how novel techniques such as well-tempered metadynamics and driven adiabatic free-energy dynamics are connected and can be synergistically combined, both with each other and with simpler methods like umbrella sampling. The treatment of Feynman path integrals in Chapter 12 has been enhanced to clarify the meaning of imaginary time, to include methods for simulating systems of N identical
bosons and fermions via path integrals and to incorporate an elegant technique for reducing the computational overhead of path integral simulations. The discussion of approaches for approximating quantum time correlation functions in Chapter 14 has been augmented to include a formally exact open-chain formulation. Functionals also make a more prominent appearance in the second edition, being used to construct a classical entropy maximization principle in Chapters 4 through 6 to derive ensemble distributions, and an additional appendix on the calculus of functionals has also been provided. Proofs of the Henderson Theorem on radial distribution functions and the Potential Distribution Theorem on excess chemical potentials can be found in these chapters as well. Resonances in multiple time-scale integration are dealt with in much greater depth in Chapters 4 and 15 and algorithms for circumventing these numerical artifacts and allowing very large time steps in molecular dynamics simulations are presented. Other added material includes an expanded discussion of numerical integrators for deterministic and stochastic thermostats in molecular dynamics, more exactly solvable models and numerical applications, resonance-free multiple time-scale algorithms, and more end-of-chapter exercises.

The overall structure of the book has not been changed from the first edition: equilibrium classical statistical mechanics precedes equilibrium quantum statistical mechanics, and both precede discussions of classical and quantum time-dependent statistical mechanics. Stochastic dynamics, discrete models, and machine learning make up the final three chapters of the book. As in the original edition, computational methods are presented side-by-side with the theoretical developments, and, to the extent possible, mathematical complexity gradually increases within each chapter.

As I did in the first edition, I wish to close this preface with a list of acknowledgments. I am, as ever, truly grateful to all of the teachers, mentors, colleagues, and coworkers acknowledged in the first edition’s preface, which will immediately follow. In addition, content for the second edition would not have been possible without highly fruitful collaborations with Benedict Leimkuhler, Charlles Abreu, Jutta Rogal, Ondrej Marsalek, Serdal Kirmizialtin, and Joseph Cendagorta. I must also express my continued thanks to the National Science Foundation, the U.S. Department of Energy, and the Army Research Office for the ongoing support I have received from these agencies. Finally, I owe, once again, a tremendous debt of gratitude to my wife Jocelyn Leka who lent me her considerable talents as a substantive editor. As with the first edition, her skills were employed only for the textual parts of the book; any mathematical errors remain mine and mine alone.

M.E.T.
New York
December, 2022

Preface to the first edition

Statistical mechanics is a theoretical framework that aims to predict the observable static and dynamic properties of a many-body system starting from its microscopic constituents and their interactions. Its scope is as broad as the set of “many-body” systems is large: as long as there exists a rule governing the behavior of the fundamental objects that comprise the system, the machinery of statistical mechanics can be applied. Consequently, statistical mechanics has found applications outside of physics, chemistry, and engineering, including biology, social sciences, economics, and applied mathematics. Because it seeks to establish a bridge between the microscopic and macroscopic realms, statistical mechanics often provides a means of rationalizing observed properties of a system in terms of the detailed “modes of motion” of its basic constituents. An example from physical chemistry is the surprisingly high diffusion constant of an excess proton in bulk water, which is a single measurable number. However, this single number belies a strikingly complex dance of hydrogen bond rearrangements and chemical reactions that must occur at the level of individual or small clusters of water molecules in order for this property to emerge. In the physical sciences, the technology of molecular simulation, wherein a system’s microscopic interaction rules are implemented numerically on a computer, allows such “mechanisms” to be extracted and, through the machinery of statistical mechanics, predictions of macroscopic observables to be generated. In short, molecular simulation is the computational realization of statistical mechanics. The goal of this book, therefore, is to synthesize these two aspects of statistical mechanics: the underlying theory of the subject, in both its classical and quantum developments, and the practical numerical techniques by which the theory is applied to solve realistic problems.

This book is aimed primarily at graduate students in chemistry or computational biology and graduate or advanced undergraduate students in physics or engineering. These students are increasingly finding themselves engaged in research activities that cross traditional disciplinary lines. Successful outcomes for such projects often hinge on their ability to translate complex phenomena into simple models and develop approaches for solving these models. Because of its broad scope, statistical mechanics plays a fundamental role in this type of work and is an important part of a student’s toolbox. The theoretical part of the book is an extensive elaboration of lecture notes I developed for a graduate-level course in statistical mechanics I give at New York University. These courses are principally attended by graduate and advanced undergraduate students who are planning to engage in research in theoretical and experimental physical chemistry and computational biology.

The most difficult question faced by anyone wishing to design a lecture course or a book on statistical mechanics is what to include and what to omit. Because statistical mechanics is an active field of research, it comprises a tremendous body of knowledge, and it is simply impossible to treat the entirety of the subject in a single opus. For this reason, many books with the words “statistical mechanics” in their titles can differ considerably. Here, I have attempted to bring together topics that reflect what I see as the modern landscape of statistical mechanics. The reader will notice from a quick scan of the table of contents that
the topics selected are rarely found together in individual textbooks on the subject; these topics include isobaric ensembles, path integrals, classical and quantum time-dependent statistical mechanics, the generalized Langevin equation, the Ising model, and critical phenomena. (The closest such book I have found is also one of my favorites, David Chandler’s Introduction to Modern Statistical Mechanics.)

The computational part of the book joins synergistically with the theoretical part and is designed to give the reader a solid grounding in the methodology employed to solve problems in statistical mechanics. It is intended neither as a simulation recipe book nor a scientific programmer’s guide. Rather, it aims to show how the development of computational algorithms derives from the underlying theory with the hope of enabling readers to understand the methodology-oriented literature and develop new techniques of their own. The focus is on the molecular dynamics and Monte Carlo techniques and the many novel extensions of these methods that have enhanced their applicability to, for example, large biomolecular systems, complex materials, and quantum phenomena. Most of the techniques described are widely available in molecular simulation software packages and are routinely employed in computational investigations. As with the theoretical component, it was necessary to select among the numerous important methodological developments that have appeared since molecular simulation was first introduced. Unfortunately, several important topics had to be omitted due to space constraints, including configuration-bias Monte Carlo, the reference potential spatial warping algorithm, and semi-classical methods for quantum time correlation functions. This omission was not made because I view these methods as less important than those I included. Rather, I consider these to be very powerful but highly advanced methods that, individually, might have a narrower target audience. In fact, these topics were slated to appear in a chapter of their own. However, as the book evolved, I found that nearly 700 pages were needed to lay the foundation I sought.

In organizing the book, I have made several strategic decisions. First, the book is structured such that concepts are first introduced within the framework of classical mechanics followed by their quantum mechanical counterparts. This lies closer perhaps to a physicist’s perspective than, for example, that of a chemist, but I find it to be a particularly natural one. Moreover, given how widespread computational studies based on classical mechanics have become compared to analogous quantum investigations (which have considerably higher computational overhead), this progression seems to be both logical and practical. Second, the technical development within each chapter is graduated, with the level of mathematical detail generally increasing from chapter start to chapter end. Thus, the mathematically most complex topics are reserved for the final sections of each chapter. I assume that readers have an understanding of calculus (through calculus of several variables), linear algebra, and ordinary differential equations. This structure hopefully allows readers to maximize what they take away from each chapter while rendering it easier to find a stopping point within each chapter. In short, the book is structured such that even a partial reading of a chapter allows the reader to gain a basic understanding of the subject. It should be noted that I attempted to adhere to this graduated structure only as a general protocol. Where I felt that breaking this progression made logical sense, I have forewarned the reader
about the mathematical arguments to follow, and the final result is generally given at the outset. Readers wishing to skip the mathematical details can do so without loss of continuity.

The third decision I have made is to integrate theory and computational methods within each chapter. Thus, for example, the theory of the classical microcanonical ensemble is presented together with a detailed introduction to the molecular dynamics method and how the latter is used to generate a classical microcanonical distribution. The other classical ensembles are presented in a similar fashion as is the Feynman path integral formulation of quantum statistical mechanics. The integration of theory and methodology serves to emphasize the viewpoint that understanding one helps in understanding the other. Throughout the book, many of the computational methods presented are accompanied by simple numerical examples that demonstrate their performance. These examples range from low-dimensional “toy” problems that can be easily coded up by the reader (some of the exercises in each chapter ask precisely this) to atomic and molecular liquids, aqueous solutions, model polymers, biomolecules, and materials. Not every method presented is accompanied by a numerical example, and in general I have tried not to overwhelm the reader with a plethora of applications requiring detailed explanations of the underlying physics, as this is not the primary aim of the book. Once the basics of the methodology are understood, readers wishing to explore applications particular to their interests in more depth can subsequently refer to the literature.

A word or two should be said about the problem sets at the end of each chapter. Math and science are not spectator sports, and the only way to learn the material is to solve problems. Some of the problems in the book require the reader to think conceptually while others are more mathematical, challenging the reader to work through various derivations. There are also problems that ask the reader to analyze proposed computational algorithms by investigating their capabilities. For readers with some programming background, there are exercises that involve coding up a method for a simple example in order to explore the method’s performance on that example, and in some cases, reproduce a figure from the text. These coding exercises are included because one can only truly understand a method by programming it up and trying it out on a simple problem for which long runs can be performed and many different parameter choices can be studied. However, I must emphasize that even if a method works well on a simple problem, it is not guaranteed to work well for realistic systems. Readers should not, therefore, naïvely extrapolate the performance of any method they try on a toy system to high-dimensional complex problems. Finally, in each problem set, some problems are preceded by an asterisk (∗). These are problems of a more challenging nature that require deeper thinking or a more in-depth mathematical analysis. All of the problems are designed to strengthen understanding of the basic ideas.

Let me close this preface by acknowledging my teachers, mentors, colleagues, and coworkers without whom this book would not have been possible. I took my first statistical mechanics courses with Y. R. Shen at the University of California Berkeley and A. M. M. Pruisken at Columbia University. Later, I audited the course team-taught by James L. Skinner and Bruce J. Berne, also at Columbia. I was also privileged to have been mentored by Bruce Berne as a graduate student, by Michele Parrinello
during a postdoctoral appointment at the IBM Forschungslaboratorium in Rüschlikon, Switzerland, and by Michael L. Klein while I was a National Science Foundation postdoctoral fellow at the University of Pennsylvania. Under the mentorship of these extraordinary individuals, I learned and developed many of the computational methods that are discussed in the book. I must also express my thanks to the National Science Foundation for their continued support of my research over the past decade. Many of the developments presented here were made possible through the grants I received from them. I am deeply grateful to the Alexander von Humboldt Foundation for a Friedrich Wilhelm Bessel Research Award that funded an extended stay in Germany where I was able to work on ideas that influenced many parts of the book. I am equally grateful to my German host and friend Dominik Marx for his support during this stay, for many useful discussions, and for many fruitful collaborations that have helped shape the book’s content. I also wish to acknowledge my long-time collaborator and friend Glenn Martyna for his help in crafting the book in its initial stages and for his critical reading of the first few chapters. I have also received many helpful suggestions from Bruce Berne, Giovanni Ciccotti, Hae-Soo Oh, Michael Shirts, and Dubravko Sabo. I am indebted to the excellent students and postdocs with whom I have worked over the years for their invaluable contributions to several of the techniques presented herein and for all they have taught me. I would also like to acknowledge my former student Kiryn Haslinger Hoffman for her work on the illustrations used in the early chapters. Finally, I owe a tremendous debt of gratitude to my wife Jocelyn Leka, whose finely honed skills as an editor were brought to bear on crafting the wording used throughout the book. Editing me took up many hours of her time. Her skills were restricted to the textual parts of the book; she was not charged with the onerous task of editing the equations. Consequently, any errors in the latter are mine and mine alone.

M.E.T.
New York
July, 2010

Contents 1

Classical mechanics 1.1 Introduction 1.2 Newton’s laws of motion 1.3 Phase space: visualizing classical motion 1.4 Lagrangian formulation of classical mechanics: A general framework for Newton’s laws 1.5 Legendre transforms 1.6 Generalized momenta and the Hamiltonian formulation of classical mechanics 1.7 A simple classical polymer model 1.8 The action integral 1.9 Lagrangian mechanics and systems with constraints 1.10 Gauss’s principle of least constraint 1.11 Rigid body motion: Euler angles and quaternions 1.12 Non-Hamiltonian systems 1.13 Problems

1 1 1 5

18 25 29 32 35 37 47 50

2

Theoretical foundations of classical statistical mechanics 2.1 Overview 2.2 The laws of thermodynamics 2.3 The ensemble concept 2.4 Phase-space volumes and Liouville’s theorem 2.5 The ensemble distribution function and the Liouville equation 2.6 Equilibrium solutions of the Liouville equation 2.7 Problems

55 55 57 63 65 67 71 72

3

The microcanonical ensemble and introduction to molecular dynamics 3.1 Brief overview 3.2 Basic thermodynamics, Boltzmann’s relation, and the partition function of the microcanonical ensemble 3.3 The classical virial theorem 3.4 Conditions for thermal equilibrium 3.5 The free particle and the ideal gas 3.6 The harmonic oscillator and harmonic baths 3.7 Introduction to molecular dynamics 3.8 Integrating the equations of motion: Finite difference methods 3.9 Systems subject to holonomic constraints 3.10 The classical time evolution operator and numerical integrators 3.11 Multiple time-scale integration

3.12 Symplectic integration for quaternions 3.13 Exactly conserved time-step dependent Hamiltonians 3.14 Illustrative examples of molecular dynamics calculations 3.15 Problems

4

The canonical ensemble 4.1 Introduction: A different set of experimental conditions 4.2 Thermodynamics of the canonical ensemble 4.3 The canonical phase-space distribution and partition function 4.4 Canonical ensemble via entropy maximization 4.5 Energy fluctuations in the canonical ensemble 4.6 Simple examples in the canonical ensemble 4.7 Structure and thermodynamics in real gases and liquids from spatial distribution functions 4.8 Perturbation theory and the van der Waals equation 4.9 Molecular dynamics in the canonical ensemble: Hamiltonian formulation in an extended phase space 4.10 Classical non-Hamiltonian statistical mechanics 4.11 Nosé-Hoover chains 4.12 Integrating the Nosé-Hoover chain equations 4.13 The isokinetic ensemble: A variant of the canonical ensemble 4.14 Isokinetic Nosé-Hoover chains: Achieving very large time steps 4.15 Applying canonical molecular dynamics: Liquid structure 4.16 Problems

5

The isobaric ensembles 5.1 Why constant pressure? 5.2 Thermodynamics of isobaric ensembles 5.3 Isobaric phase-space distributions and partition functions 5.4 Isothermal-isobaric ensemble via entropy maximization 5.5 Pressure and work virial theorems 5.6 An ideal gas in the isothermal-isobaric ensemble 5.7 Extending the isothermal-isobaric ensemble: Anisotropic cell fluctuations 5.8 Derivation of the pressure tensor estimator from the canonical partition function 5.9 Molecular dynamics in the isoenthalpic-isobaric ensemble 5.10 Molecular dynamics in the isothermal-isobaric ensemble I: Isotropic volume fluctuations 5.11 Molecular dynamics in the isothermal-isobaric ensemble II: Anisotropic cell fluctuations 5.12 Atomic and molecular virials 5.13 Integrating the Martyna-Tobias-Klein equations of motion 5.14 The isothermal-isobaric ensemble with constraints: The ROLL algorithm 5.15 Problems

6

The grand canonical ensemble 6.1 Introduction: The need for yet another ensemble 6.2 Euler’s theorem 6.3 Thermodynamics of the grand canonical ensemble 6.4 Grand canonical phase space and the partition function 6.5 Grand canonical ensemble via entropy maximization 6.6 Illustration of the grand canonical ensemble: The ideal gas 6.7 Particle number fluctuations in the grand canonical ensemble 6.8 Potential distribution theorem 6.9 Molecular dynamics in the grand canonical ensemble 6.10 Problems

7

Monte Carlo 7.1 Introduction to the Monte Carlo method 7.2 The Central Limit theorem 7.3 Sampling distributions 7.4 Hybrid Monte Carlo 7.5 Replica exchange Monte Carlo 7.6 Wang-Landau sampling 7.7 Transition path sampling and the transition path ensemble 7.8 Problems


8

Free-energy calculations 8.1 Free-energy perturbation theory 8.2 Adiabatic switching and thermodynamic integration 8.3 Adiabatic free-energy dynamics 8.4 Jarzynski’s equality and nonequilibrium methods 8.5 The problem of rare events 8.6 Collective variables 8.7 The blue moon ensemble approach 8.8 Umbrella sampling and weighted histogram methods 8.9 Wang-Landau sampling 8.10 Driven adiabatic free-energy dynamics 8.11 Metadynamics 8.12 The committor distribution and the histogram test 8.13 Problems


9

Quantum mechanics 9.1 Introduction: Waves and particles 9.2 Review of the fundamental postulates of quantum mechanics 9.3 Simple examples 9.4 Identical particles in quantum mechanics: Spin statistics 9.5 Problems


10 Quantum ensembles and the density matrix 10.1 The difficulty of many-body quantum mechanics 10.2 The ensemble density matrix 10.3 Time evolution of the density matrix 10.4 Quantum equilibrium ensembles 10.5 Problems

11 Quantum ideal gases: Fermi-Dirac and Bose-Einstein statistics 11.1 Complexity without interactions 11.2 General formulation of the quantum-mechanical ideal gas 11.3 An ideal gas of distinguishable quantum particles 11.4 General formulation for fermions and bosons 11.5 The ideal fermion gas 11.6 The ideal boson gas 11.7 Problems


12 The Feynman path integral 12.1 Quantum mechanics as a sum over paths 12.2 Derivation of path integrals for the canonical density matrix and the time evolution operator 12.3 Thermodynamics and expectation values from path integrals 12.4 The continuous limit: Functional integrals 12.5 How to think about imaginary time propagation 12.6 Many-body path integrals 12.7 Quantum free-energy profiles 12.8 Numerical evaluation of path integrals 12.9 Problems


13 Classical time-dependent statistical mechanics 13.1 Ensembles of driven systems 13.2 Driven systems and linear response theory 13.3 Applying linear response theory: Green-Kubo relations for transport coefficients 13.4 Calculating time correlation functions from molecular dynamics 13.5 The nonequilibrium molecular dynamics approach 13.6 Problems


14 Quantum time-dependent statistical mechanics 14.1 Time-dependent systems in quantum mechanics 14.2 Time-dependent perturbation theory in quantum mechanics 14.3 Time correlation functions and frequency spectra 14.4 Examples of frequency spectra 14.5 Quantum linear response theory 14.6 Approximations to quantum time correlation functions 14.7 Problems


15 The Langevin and generalized Langevin equations 15.1 The general model of a system plus a bath 15.2 Derivation of the generalized Langevin equation 15.3 Analytically solvable examples 15.4 Vibrational dephasing and energy relaxation in simple fluids 15.5 Molecular dynamics with the Langevin equation 15.6 Designing memory kernels for specific tasks 15.7 Sampling stochastic transition paths 15.8 Mori-Zwanzig theory 15.9 Problems

16 Discrete models and critical phenomena 16.1 Phase transitions and critical points 16.2 The critical exponents α, β, γ, and δ 16.3 Magnetic systems and the Ising model 16.4 Universality classes 16.5 Mean-field theory 16.6 Ising model in one dimension 16.7 Ising model in two dimensions 16.8 Spin correlations and their critical exponents 16.9 Introduction to the renormalization group 16.10 Fixed points of the renormalization group equations in greater than one dimension 16.11 General linearized renormalization group theory 16.12 Understanding universality from the linearized renormalization group theory 16.13 Other uses of discrete models 16.14 Problems


17 Introduction to machine learning in statistical mechanics 17.1 Machine learning in statistical mechanics: What and why? 17.2 Three key probability distributions 17.3 Simple linear regression as a case study 17.4 Kernel methods 17.5 Neural networks 17.6 Weighted neighbor methods 17.7 Demonstrating machine learning in free-energy simulations 17.8 Clustering algorithms 17.9 Intrinsic dimension of a data manifold 17.10 Problems


Appendix A Properties of the Dirac delta-function

Appendix B Calculus of functionals

Appendix C Evaluation of energies and forces

Appendix D Proof of the Trotter theorem

Appendix E Laplace transforms

References

Index

1 Classical mechanics

1.1 Introduction

The first part of this book is devoted to the subject of classical statistical mechanics, which is founded upon the fundamental laws of classical mechanics as originally stated by Newton. Although the laws of classical mechanics were first postulated to study the motion of planets, stars and other large-scale objects, they turn out to be a surprisingly good approximation at the molecular level (where the true behavior is correctly described by the laws of quantum mechanics). Indeed, an entire computational methodology, known as molecular dynamics, is based on the applicability of the laws of classical mechanics to microscopic systems. Molecular dynamics has been remarkably successful in its ability to predict macroscopic thermodynamic and dynamic observables for a wide variety of systems using the rules of classical statistical mechanics to be discussed in the next chapter. Many of these applications address important problems in biology, such as protein and nucleic acid folding, in materials science, such as surface catalysis and functionalization, in the structure and dynamics of glasses and their melts, and in nanotechnology, such as the behavior of self-assembled monolayers and the formation of molecular devices. Throughout the book, we will be discussing both model and realistic examples of such applications. In this chapter, we will begin with a discussion of Newton’s laws of motion and build up to the more elegant Lagrangian and Hamiltonian formulations of classical mechanics, both of which play fundamental roles in statistical mechanics. The origin of these formulations from the action principle will be discussed. The chapter will conclude with a first look at systems that do not fit into the Hamiltonian/Lagrangian framework and the application of such systems in the description of certain physical situations.

1.2 Newton’s laws of motion

In 1687, the English physicist and mathematician Sir Isaac Newton published the Philosophiae Naturalis Principia Mathematica, wherein three simple and elegant laws governing the motion of interacting objects are given. These may be stated briefly as follows:

1. In the absence of external forces, a body will either be at rest or execute motion along a straight line with a constant velocity $\mathbf{v}$.

2. The action of an external force $\mathbf{F}$ on a body produces an acceleration $\mathbf{a}$ equal to the force divided by the mass $m$ of the body:

$$\mathbf{a} = \frac{\mathbf{F}}{m}, \qquad \mathbf{F} = m\mathbf{a}. \tag{1.2.1}$$

3. If body A exerts a force on body B, then body B exerts an equal and opposite force on body A. That is, if $\mathbf{F}_{AB}$ is the force body A exerts on body B, then the force $\mathbf{F}_{BA}$ exerted by body B on body A satisfies

$$\mathbf{F}_{BA} = -\mathbf{F}_{AB}. \tag{1.2.2}$$

In general, two objects can exert attractive or repulsive forces on each other, depending on their relative spatial location, and the precise dependence of the force on the relative location of the objects is specified by a particular force law.¹ Although Newton’s interests largely focused on the motion of celestial bodies interacting via gravitational forces, most atoms are massive enough that their motion can be treated reasonably accurately within a classical framework. Hence, the laws of classical mechanics can be approximately applied at the molecular level. Naturally, there are numerous instances in which the classical approximation breaks down, and a proper quantum mechanical treatment is needed. For the present, however, we will assume the approximate validity of classical mechanics at the molecular level and proceed to apply Newton’s laws as stated above.

The motion of an object can be described quantitatively by specifying the Cartesian position vector $\mathbf{r}(t)$ of the object in space at any time $t$. This is tantamount to specifying three functions of time, the components of $\mathbf{r}(t)$,

$$\mathbf{r}(t) = (x(t), y(t), z(t)). \tag{1.2.3}$$

Recognizing that the velocity $\mathbf{v}(t)$ of the object is the first time derivative of the position, $\mathbf{v}(t) = d\mathbf{r}/dt$, and that the acceleration $\mathbf{a}(t)$ is the first time derivative of the velocity, $\mathbf{a}(t) = d\mathbf{v}/dt$, the acceleration is easily seen to be the second derivative of position, $\mathbf{a}(t) = d^2\mathbf{r}/dt^2$. Therefore, Newton’s second law, $\mathbf{F} = m\mathbf{a}$, can be expressed as a second-order differential equation

$$m\frac{d^2\mathbf{r}}{dt^2} = \mathbf{F}. \tag{1.2.4}$$

(We shall henceforth employ the overdot notation for differentiation with respect to time. Thus, $\dot{\mathbf{r}} = d\mathbf{r}/dt$ and $\ddot{\mathbf{r}} = d^2\mathbf{r}/dt^2$.) Since eqn. (1.2.4) is a second-order equation, it is necessary to specify two initial conditions, these being the initial position $\mathbf{r}(0)$ and initial velocity $\mathbf{v}(0)$. The solution of eqn. (1.2.4) subject to these initial conditions uniquely specifies the motion of the object for all time.

The force $\mathbf{F}$ that acts on an object is capable of doing work on the object. In order to see how work is computed, consider Fig. 1.1, which shows a force $\mathbf{F}$ acting on a system along a particular path. The work $dW$ performed along a short segment $d\mathbf{l}$ of the path is defined to be

$$dW = \mathbf{F}\cdot d\mathbf{l} = F\cos\theta\, dl. \tag{1.2.5}$$

The total work done on the object by the force between points A and B along the path is obtained by integrating over the path from A to B:

$$W_{AB}(\mathrm{path}) = \int_A^B \mathbf{F}\cdot d\mathbf{l}. \tag{1.2.6}$$

Fig. 1.1 Example of mechanical work. Here $dW = \mathbf{F}\cdot d\mathbf{l} = F\cos\theta\, dl$.

¹ Throughout the book, vector quantities will be designated using boldface type. Thus, in three spatial dimensions, a vector $\mathbf{u}$ has three components $u_x$, $u_y$, and $u_z$, and we will represent the vector as the ordered triple $\mathbf{u} = (u_x, u_y, u_z)$. The vector magnitude $u = |\mathbf{u}| = \sqrt{u_x^2 + u_y^2 + u_z^2}$ will be denoted using normal type.

In general, the work done on an object by a force depends on the path taken between A and B. For certain types of forces, called conservative forces, the work is independent of the path and only depends on the endpoints of the path. We shall describe shortly how conservative forces are defined. Note that the definition of work depends on context. Equation (1.2.6) specifies the work done by a force F. If this force is an intrinsic part of the system, then we refer to this type of work as work done by the system. If we wish to calculate the work done against such a force by some external agent, then this work would be the negative of that obtained using eqn. (1.2.6), and we refer to this as work done on the system. An example is the force exerted by the Earth’s gravitational field on an object of mass m. If the mass falls under the Earth’s gravitational pull through a distance h, we can think of the object and the gravitational force as defining the mechanical system. In this case, the system does work, and eqn. (1.2.6) would yield a positive value. Conversely, if we applied eqn. (1.2.6) to the opposite problem of raising the object to a height h, it would yield a negative result. This is simply telling us some external agent must do work on the system against the force of gravity in order to raise it to a height h. Generally, it is obvious what sign to impart to work, yet the distinction between work done on and by a system will become important in our discussions of thermodynamics and classical statistical mechanics in Chapters 2 through 6. Given the form of Newton’s second law in eqn. (1.2.4), it can be easily shown that, in a flat or Euclidean space, Newton’s first law is redundant. According to Newton’s first law, an object initially at a position r(0) moving with constant velocity v will move along a straight line described by r(t) = r(0) + vt.

(1.2.7)

This is an example of a trajectory, that is, a specification of the object’s position as a function of time and initial conditions. If no force acts on the object, then, according to Newton’s second law, its position will be the solution of

$$\ddot{\mathbf{r}} = 0. \tag{1.2.8}$$

The straight line motion of eqn. (1.2.7) is, in fact, the unique solution of eqn. (1.2.8) for an object whose initial position is $\mathbf{r}(0)$ and whose initial (and constant) velocity is $\mathbf{v}$. Thus, Newton’s second law embodies Newton’s first law.

Statistical mechanics is concerned with the behavior of large numbers of objects that can be viewed as the fundamental constituents of a particular microscopic model of the system, whether they are individual atoms or molecules, or even groups of atoms in a macromolecule (for example, the amino acids in a protein). We shall, henceforth, refer to these constituents as “particles” (or, in some cases, “pseudoparticles”). The classical behavior of a system of $N$ particles in three dimensions is given by the generalization of Newton’s second law to the system. In order to develop the general form of Newton’s second law, note that particle $i$, $i \in [1, N]$, will experience a force $\mathbf{F}_i$ due to all of the other particles in the system and possibly the external environment or external agents as well. Denoting the position vectors of the $N$ particles as $\mathbf{r}_1, \ldots, \mathbf{r}_N$, the forces $\mathbf{F}_i$ are generally functions of these positions, and if frictional forces are present, $\mathbf{F}_i$ could also be a function of the particle’s velocity $\dot{\mathbf{r}}_i$. We denote this functional dependence as $\mathbf{F}_i = \mathbf{F}_i(\mathbf{r}_1, \ldots, \mathbf{r}_N, \dot{\mathbf{r}}_i)$. For example, if the force $\mathbf{F}_i$ depends only on individual contributions from every other particle in the system, we say that the forces are pairwise additive. In this case, the force $\mathbf{F}_i$ can be expressed as

$$\mathbf{F}_i(\mathbf{r}_1, \ldots, \mathbf{r}_N, \dot{\mathbf{r}}_i) = \sum_{j \neq i} \mathbf{f}_{ij}(\mathbf{r}_i - \mathbf{r}_j) + \mathbf{f}^{(\mathrm{ext})}(\mathbf{r}_i, \dot{\mathbf{r}}_i). \tag{1.2.9}$$

The first term in eqn. (1.2.9) describes forces that are intrinsic to the system and are part of the definition of the mechanical system, while the second term describes forces that are entirely external to the system. For a general $N$-particle system, Newton’s second law for particle $i$ takes the form

$$m_i \ddot{\mathbf{r}}_i = \mathbf{F}_i(\mathbf{r}_1, \ldots, \mathbf{r}_N, \dot{\mathbf{r}}_i). \tag{1.2.10}$$

These equations, referred to as the equations of motion of the system, must be solved subject to a set of initial positions, $\{\mathbf{r}_1(0), \ldots, \mathbf{r}_N(0)\}$, and velocities, $\{\dot{\mathbf{r}}_1(0), \ldots, \dot{\mathbf{r}}_N(0)\}$. In any realistic system, the interparticle forces are highly nonlinear functions of the $N$ particle positions so that eqns. (1.2.10) possess enormous dynamical complexity, and obtaining an analytical solution is hopeless. Moreover, even if an accurate numerical solution could be obtained, for macroscopic matter, where $N \sim 10^{23}$, the computational resources required to calculate and store the solutions for each and every particle at a large number of discrete time points would exceed by many orders of magnitude all those presently available, making such a task equally untenable. Given these considerations, how can we ever expect to calculate physically observable properties of realistic systems starting from a microscopic description if the fundamental equations governing the behavior of the system cannot be solved? The rules of statistical mechanics provide the necessary connection between the microscopic laws and macroscopic observables. These rules, however, cannot circumvent the complexity of the system. Therefore, several approaches can be considered for dealing with this complexity:

A highly simplified model for a system that lends itself to an analytical solution could be introduced. Although often of limited utility, important physical insights can sometimes be extracted from a clever model, and it is usually possible to study the behavior of the model as external conditions are varied, such as the number of particles, containing volume, applied pressure, and so forth.

Alternatively, one can consider a system, not of $10^{23}$ particles, but of a much smaller number, perhaps $10^2$–$10^9$ particles, depending on the nature of the system, and solve the equations of motion numerically subject to initial conditions and the boundary conditions of a containing volume. Fortunately, many macroscopic properties are well-converged with respect to system size for such small numbers of particles! The rules of statistical mechanics are then used to analyze the numerical trajectories thus generated. This is the essence of the technique known as molecular dynamics. Although the molecular dynamics approach is very powerful, a significant disadvantage exists: in order to study the dependence on external conditions, a separate calculation must be performed for every choice of these conditions, hence a very large number of calculations is needed, for example, in order to map out a phase diagram. In addition, the “exact” forces between particles cannot be determined and, hence, models for these forces must be introduced. Usually, the more accurate the model, the more computationally intensive the numerical calculation, and the more limited the scope of the calculation with respect to time and length scales and the properties that can be studied. Often, time and length scales can be bridged by combining models of different accuracy, including even continuum models commonly used in engineering, to describe different aspects of a large, complex system, and devising clever numerical solvers for the resulting equations of motion.
Numerical calculations (typically referred to as simulations) have become an integral part of modern theoretical research, and since many of these calculations rely on the laws of classical mechanics, it is important that this subject be covered in some detail before advancing to a discussion of the rules of statistical mechanics. The remainder of this chapter will, therefore, be devoted to introducing the concepts from classical mechanics that will be needed for our subsequent discussion of statistical mechanics.
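The numerical route described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method developed later in the book: it assumes a velocity Verlet update (one common finite-difference scheme, which this section does not specify) and a toy pairwise-additive force law made of harmonic springs between every pair of particles. All function names, parameters, and initial conditions are hypothetical.

```python
import math


def pair_forces(positions, k=1.0, r0=1.0):
    """Return (forces, potential) for harmonic springs between all pairs.

    The spring force law is an assumed toy model; any pairwise-additive
    f_ij with f_ij = -f_ji would fit the same structure as eqn. (1.2.9).
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    potential = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[i][a] - positions[j][a] for a in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            potential += 0.5 * k * (r - r0) ** 2
            scale = -k * (r - r0) / r  # force on i points along (r_i - r_j)
            for a in range(3):
                forces[i][a] += scale * d[a]
                forces[j][a] -= scale * d[a]  # equal and opposite force on j
    return forces, potential


def total_energy(positions, velocities, masses):
    """Kinetic plus potential energy of the toy system."""
    kinetic = sum(0.5 * m * sum(c * c for c in v)
                  for m, v in zip(masses, velocities))
    return kinetic + pair_forces(positions)[1]


def total_momentum(velocities, masses):
    """Sum of m_i * v_i over all particles, component by component."""
    return [sum(m * v[a] for m, v in zip(masses, velocities)) for a in range(3)]


def velocity_verlet(positions, velocities, masses, dt, n_steps):
    """Integrate m_i * d^2 r_i / dt^2 = F_i from the given initial conditions."""
    pos = [list(p) for p in positions]
    vel = [list(v) for v in velocities]
    forces = pair_forces(pos)[0]
    for step in range(n_steps):
        for i, m in enumerate(masses):
            for a in range(3):
                vel[i][a] += 0.5 * dt * forces[i][a] / m  # half-step velocity update
                pos[i][a] += dt * vel[i][a]               # full-step position update
        forces = pair_forces(pos)[0]
        for i, m in enumerate(masses):
            for a in range(3):
                vel[i][a] += 0.5 * dt * forces[i][a] / m  # second half-step
    return pos, vel


# Three particles with illustrative masses, positions, and velocities.
masses = [1.0, 2.0, 1.5]
r_init = [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [0.0, 0.9, 0.3]]
v_init = [[0.1, 0.0, 0.0], [0.0, -0.05, 0.0], [0.0, 0.0, 0.02]]
r_final, v_final = velocity_verlet(r_init, v_init, masses, dt=0.001, n_steps=2000)
print(total_energy(r_init, v_init, masses), total_energy(r_final, v_final, masses))
```

Because the pair forces are equal and opposite, the total momentum should stay constant up to round-off, and for a sufficiently small time step the total energy should drift only slightly. Both properties are useful sanity checks on any such integrator.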

1.3 Phase space: visualizing classical motion

Newton’s equations specify the complete set of particle positions $\{\mathbf{r}_1(t), \ldots, \mathbf{r}_N(t)\}$ and, by differentiation, the particle velocities $\{\mathbf{v}_1(t), \ldots, \mathbf{v}_N(t)\}$ at any time $t$, given that the positions and velocities are known at one particular instant in time. For reasons that will be clear shortly, it is often preferable to work with the particle momenta, $\{\mathbf{p}_1(t), \ldots, \mathbf{p}_N(t)\}$, which, in Cartesian coordinates, are related to the velocities by

$$\mathbf{p}_i = m_i \mathbf{v}_i = m_i \dot{\mathbf{r}}_i. \tag{1.3.1}$$

Note that, in terms of momenta, Newton’s second law can be written as

$$\mathbf{F}_i = m_i \mathbf{a}_i = m_i \frac{d\mathbf{v}_i}{dt} = \frac{d\mathbf{p}_i}{dt}. \tag{1.3.2}$$

Therefore, the classical dynamics of an $N$-particle system can be expressed by specifying the full set of $6N$ functions, $\{\mathbf{r}_1(t), \ldots, \mathbf{r}_N(t), \mathbf{p}_1(t), \ldots, \mathbf{p}_N(t)\}$. Equivalently, at any instant $t$ in time, all of the information about the system is specified by $6N$ numbers (or $2dN$ in $d$ dimensions). These $6N$ numbers constitute the microscopic state of the system at time $t$. That these $6N$ numbers are sufficient to characterize the system follows entirely from the fact that they are all that is needed to seed eqns. (1.2.10), from which the complete time evolution of the system can be determined.

Suppose, at some instant in time, the positions and momenta of the system are $\{\mathbf{r}_1, \ldots, \mathbf{r}_N, \mathbf{p}_1, \ldots, \mathbf{p}_N\}$. These $6N$ numbers can be regarded as an ordered $6N$-tuple or a single point in a $6N$-dimensional space called phase space. Although the geometry of this space can, under certain circumstances, be nontrivial, in its simplest form, a phase space is a Cartesian space that can be constructed from $6N$ mutually orthogonal axes. We shall denote a general point in the phase space as

$$\mathrm{x} = (\mathbf{r}_1, \ldots, \mathbf{r}_N, \mathbf{p}_1, \ldots, \mathbf{p}_N), \tag{1.3.3}$$

also known as the phase-space vector. (As we will see in Chapter 2, phase spaces play a central role in classical statistical mechanics.) Solving eqns. (1.2.10) generates a set of functions

$$\mathrm{x}(t) = (\mathbf{r}_1(t), \ldots, \mathbf{r}_N(t), \mathbf{p}_1(t), \ldots, \mathbf{p}_N(t)) \equiv \mathrm{x}_t, \tag{1.3.4}$$

which describe a parametric path or trajectory in the phase space. Therefore, classical motion can be described by the motion of a point along a trajectory in phase space.

Although phase-space trajectories can only be visualized for a one-particle system in one spatial dimension, it is, nevertheless, instructive to study several such examples. Consider, first, a free particle with coordinate $x$ and momentum $p$, described by the one-dimensional analog of eqn. (1.2.7), i.e., $x(t) = x(0) + (p/m)t$, where $p$ is the particle’s (constant) momentum. A plot of $p$ vs. $x$ is simply a straight horizontal line starting at $x(0)$ and extending in the direction of increasing $x$ if $p > 0$ or decreasing $x$ if $p < 0$. This is illustrated in Fig. 1.2. The line is horizontal because $p$ is constant for all $x$ values visited on the trajectory.

Fig. 1.2 Phase space of a one-dimensional free particle.


Another important example of a phase-space trajectory is that of a simple harmonic oscillator, for which the force law is given by Hooke’s law, $F(x) = -kx$, where $k$ is a constant known as the force constant. In this case, Newton’s second law takes the form

$$m\ddot{x} = -kx. \tag{1.3.5}$$

For a given initial condition, $x(0)$ and $p(0)$, the solution of eqn. (1.3.5) is

$$x(t) = x(0)\cos\omega t + \frac{p(0)}{m\omega}\sin\omega t, \tag{1.3.6}$$

where $\omega = \sqrt{k/m}$ is the natural frequency of the oscillator. Equation (1.3.6) can be verified by substitution into eqn. (1.3.5). Differentiating once with respect to time and multiplying by the mass gives an expression for the momentum

$$p(t) = p(0)\cos\omega t - m\omega x(0)\sin\omega t. \tag{1.3.7}$$

Note that $p(t)$ and $x(t)$ are related by

$$\frac{(p(t))^2}{2m} + \frac{1}{2}m\omega^2 (x(t))^2 = C, \tag{1.3.8}$$

where $C$ is a constant determined by the initial condition according to

$$C = \frac{(p(0))^2}{2m} + \frac{1}{2}m\omega^2 (x(0))^2. \tag{1.3.9}$$

(This relation is known as the conservation of energy, which we will discuss in greater detail in the next few sections.) From eqn. (1.3.8), it can be seen that the phase-space plot, $p$ vs. $x$, specified by $p^2/2m + m\omega^2 x^2/2 = C$ is an ellipse with semi-axes $(2mC)^{1/2}$ along $p$ and $(2C/m\omega^2)^{1/2}$ along $x$, as shown in Fig. 1.3. The analysis also indicates that different initial conditions give rise to different values of $C$, which changes the size of the ellipse. Changing the mass and frequency changes the shape of the ellipse.

Fig. 1.3 Phase space of the one-dimensional harmonic oscillator.

Phase-space plots determine the values of position and momentum the system will visit along a trajectory for a given set of initial conditions. These values constitute the accessible phase space. For a free particle, the accessible phase space is unbounded since $x$ lies in the interval $x \in [x(0), \infty)$ for $p > 0$ or $x \in (-\infty, x(0)]$ for $p < 0$. The harmonic oscillator, by contrast, provides an example of a phase space that is bounded.

Consider, finally, the example of a particle of mass $m$ rolling over a hill under the influence of gravity, as illustrated in Fig. 1.4(a). (This example is a one-dimensional idealization of a situation that should be familiar to anyone who has ever played miniature golf and also serves as a paradigm for chemical reactions.) We will assume that the top of the hill corresponds to a position $x = 0$. The force law for this problem is non-linear, so that a simple, closed-form, analytical solution to Newton’s second law is not readily available. However, an analytical solution is not needed in order to visualize the motion using a phase-space picture. Several kinds of motion are possible depending on the initial conditions. First, if the particle is not rolled quickly enough, it cannot roll completely over the hill. Rather, it will climb part way up the hill and then roll back down the same side. This type of motion is depicted in the phase-space plot of Fig. 1.4(b). Note that the plot only shows the motion in a region close to the hill. A full phase-space plot would extend to $x = \pm\infty$. On the other hand, if the initial speed is high enough, the particle can reach the top of the hill and roll down the other side as depicted in Fig. 1.4(d). The crossover between these two scenarios occurs for

one particular initial rolling speed in which the ball can just climb to the top of the hill and come to rest there, as is shown in Fig. 1.4(c). Such a trajectory clearly divides the phase space between the two types of motion shown in Figs. 1.4(b) and 1.4(d) and is known as a separatrix. If this example were extended to include a large number of hills with possibly different heights, then the phase space would contain a very large number of separatrices. Such an example is paradigmatic of the force laws that one encounters in complex problems such as protein folding, one of the most challenging computational problems in biophysics.

Fig. 1.4 Phase space of a one-dimensional particle subject to the “hill” potential: (a) Two particles approach the hill, one from the left, one from the right. (b) Phase-space plot if the particles have insufficient energy to roll over the hill. (c) Same if the energy is just sufficient for a particle to reach the top of the hill and come to rest there. (d) Same if the energy is greater than that needed to roll over the hill.

Visualizing the trajectory of a complex many-particle system in phase space is not possible due to the high dimensionality of the space. Moreover, the phase space may be bounded in some directions and unbounded in others. For formal purposes, it is often useful to think of an illustrative phase-space plot, in which some particular set of coordinates of special interest are shown collectively on one axis and their corresponding momenta are shown on the other with a schematic representation of a phase-space trajectory. This technique has been used to visualize the phase space of chemical reactions in an excellent treatise by De Leon et al. (1991). In other instances, it is instructive to consider a particular cut or surface in a large phase space that represents a set of variables of interest. Such a cut is known as a Poincaré section after the French mathematician Henri Poincaré (1854–1912), who, among other things, contributed substantially to our modern theory of dynamical systems. In this case, the values of the remaining variables will be fixed at the values they take at the location of this section. The concept of a Poincaré section is illustrated in Fig. 1.5.

Fig. 1.5 A Poincaré section. The dark line represents a trajectory, and the collection of points at which it crosses the plane is the Poincaré section.
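The harmonic-oscillator trajectory discussed in this section is easy to check numerically. The sketch below evaluates eqns. (1.3.6) and (1.3.7) over one period and confirms that every sampled point lies on the ellipse defined by eqn. (1.3.8). The mass, force constant, initial condition, and function names are illustrative assumptions, not values taken from the text.

```python
import math


def oscillator_state(t, x0, p0, m, k):
    """Position and momentum at time t from eqns. (1.3.6) and (1.3.7)."""
    omega = math.sqrt(k / m)
    c, s = math.cos(omega * t), math.sin(omega * t)
    x = x0 * c + p0 / (m * omega) * s
    p = p0 * c - m * omega * x0 * s
    return x, p


def energy(x, p, m, k):
    """Left-hand side of eqn. (1.3.8), using m * omega^2 = k."""
    return p * p / (2.0 * m) + 0.5 * k * x * x


m, k = 2.0, 8.0        # illustrative mass and force constant (omega = 2)
x0, p0 = 0.5, -1.0     # illustrative initial condition
C = energy(x0, p0, m, k)
period = 2.0 * math.pi / math.sqrt(k / m)

# Sample one full period of the phase-space trajectory.
points = [oscillator_state(period * n / 200, x0, p0, m, k) for n in range(201)]

x_max = math.sqrt(2.0 * C / k)   # semi-axis along x, (2C / (m omega^2))^(1/2)
p_max = math.sqrt(2.0 * m * C)   # semi-axis along p, (2mC)^(1/2)

# Largest deviation of the sampled points from the ellipse equation.
max_deviation = max(abs((x / x_max) ** 2 + (p / p_max) ** 2 - 1.0)
                    for x, p in points)
print(C, x_max, p_max, max_deviation)
```

Changing `x0` or `p0` changes `C` and therefore rescales both semi-axes together, while changing `m` or `k` alters their ratio, which is the distinction between the size and the shape of the ellipse made above.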


1.4 Lagrangian formulation of classical mechanics: A general framework for Newton’s laws

Statistical mechanics is concerned with characterizing the number of microscopic states available to a system and, therefore, requires a formulation of classical mechanics that is more closely connected to the phase-space description than the Newtonian formulation. Since phase space provides a geometric description of a system in terms of positions and momenta, or equivalently in terms of positions and velocities, it is natural to look for an algebraic description of a system in terms of these variables. In particular, we seek a “generator” of the classical equations of motion that takes the positions and velocities or positions and momenta as its inputs and produces, through some formal procedure, the classical equations of motion. The formal structure we seek is embodied in the Lagrangian and Hamiltonian formulations of classical mechanics (Goldstein, 1980), named for Joseph-Louis Lagrange (1736–1813) and William Rowan Hamilton (1805–1865), respectively.

The introduction of such a formal structure places some restrictions on the form of the force laws. Specifically, the forces are required to be conservative. Conservative forces are defined to be vector quantities that are derivable from a scalar function $U(\mathbf{r}_1, \ldots, \mathbf{r}_N)$, known as a potential energy function, via

$$\mathbf{F}_i(\mathbf{r}_1, \ldots, \mathbf{r}_N) = -\nabla_i U(\mathbf{r}_1, \ldots, \mathbf{r}_N), \tag{1.4.1}$$

where $\nabla_i = \partial/\partial\mathbf{r}_i$. Consider the work done by the force $\mathbf{F}_i$ in moving particle $i$ from point A to point B along a particular path. This work is

$$W_{AB} = \int_A^B \mathbf{F}_i \cdot d\mathbf{l}. \tag{1.4.2}$$

Since $\mathbf{F}_i = -\nabla_i U$ is conservative, the line integral simply becomes the difference in potential energy between path endpoints A and B, $W_{AB} = U_A - U_B$, independent of the path taken. Because the work is independent of the path, it follows that along a closed path

$$\oint \mathbf{F}_i \cdot d\mathbf{l} = 0. \tag{1.4.3}$$
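Path independence can be illustrated with a short numerical experiment. The sketch below approximates the line integral of eqn. (1.4.2) by a midpoint rule along two different paths between the same endpoints, once for a force derived from a potential and once for a force that is not. The two-dimensional potential, the forces, the paths, and all names are assumptions chosen for illustration.

```python
def work_along_path(force, path, n_segments=2000):
    """Midpoint-rule estimate of W = integral of F . dl along path(s), 0 <= s <= 1."""
    work = 0.0
    x0, y0 = path(0.0)
    for step in range(1, n_segments + 1):
        x1, y1 = path(step / n_segments)
        fx, fy = force(0.5 * (x0 + x1), 0.5 * (y0 + y1))
        work += fx * (x1 - x0) + fy * (y1 - y0)
        x0, y0 = x1, y1
    return work


def potential(x, y):
    """An assumed potential energy function U(x, y)."""
    return 0.75 * (x * x + y * y) + 0.3 * x * y


def conservative_force(x, y):
    """F = -grad U for the potential above."""
    return -1.5 * x - 0.3 * y, -1.5 * y - 0.3 * x


def nonconservative_force(x, y):
    """A force with nonzero curl, so it is not derivable from any potential."""
    return -y, x


def straight_path(s):
    """Straight line from A = (1, 0) to B = (0, 2)."""
    return 1.0 - s, 2.0 * s


def corner_path(s):
    """From A = (1, 0) up to (1, 2), then across to B = (0, 2)."""
    if s <= 0.5:
        return 1.0, 4.0 * s
    return 2.0 - 2.0 * s, 2.0


u_difference = potential(1.0, 0.0) - potential(0.0, 2.0)   # U_A - U_B
w_straight = work_along_path(conservative_force, straight_path)
w_corner = work_along_path(conservative_force, corner_path)
w_nc_straight = work_along_path(nonconservative_force, straight_path)
w_nc_corner = work_along_path(nonconservative_force, corner_path)
print(u_difference, w_straight, w_corner, w_nc_straight, w_nc_corner)
```

For the conservative force, both paths should reproduce $U_A - U_B$. For the second force the two paths give different values, so no potential energy function can be assigned to it.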

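Given the $N$ particle velocities, $\dot{\mathbf{r}}_1, \ldots, \dot{\mathbf{r}}_N$, the kinetic energy of the system is given by

$$K(\dot{\mathbf{r}}_1, \ldots, \dot{\mathbf{r}}_N) = \frac{1}{2}\sum_{i=1}^{N} m_i \dot{\mathbf{r}}_i^2. \tag{1.4.4}$$

The Lagrangian $L$ of a system is defined as the difference between the kinetic and potential energies expressed as a function of positions and velocities:

$$L(\mathbf{r}_1, \ldots, \mathbf{r}_N, \dot{\mathbf{r}}_1, \ldots, \dot{\mathbf{r}}_N) = K(\dot{\mathbf{r}}_1, \ldots, \dot{\mathbf{r}}_N) - U(\mathbf{r}_1, \ldots, \mathbf{r}_N). \tag{1.4.5}$$

The Lagrangian serves as the generator of the equations of motion via the Euler-Lagrange equation:

$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{\mathbf{r}}_i}\right) - \frac{\partial L}{\partial \mathbf{r}_i} = 0. \tag{1.4.6}$$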

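It can be easily verified that substitution of eqn. (1.4.5) into eqn. (1.4.6) gives eqn. (1.2.10):

$$\begin{aligned}
\frac{\partial L}{\partial \dot{\mathbf{r}}_i} &= m_i \dot{\mathbf{r}}_i \\
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{\mathbf{r}}_i}\right) &= m_i \ddot{\mathbf{r}}_i \\
\frac{\partial L}{\partial \mathbf{r}_i} &= -\frac{\partial U}{\partial \mathbf{r}_i} = \mathbf{F}_i \\
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{\mathbf{r}}_i}\right) - \frac{\partial L}{\partial \mathbf{r}_i} &= m_i \ddot{\mathbf{r}}_i - \mathbf{F}_i = 0,
\end{aligned} \tag{1.4.7}$$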

which is just Newton’s second law of motion. As an example of the application of the Euler-Lagrange equation, consider the one-dimensional harmonic oscillator discussed in the previous section. The Hooke’s law force $F(x) = -kx$ can be derived from a potential

$$U(x) = \frac{1}{2}kx^2, \tag{1.4.8}$$

so that the Lagrangian takes the form

$$L(x, \dot{x}) = \frac{1}{2}m\dot{x}^2 - \frac{1}{2}kx^2. \tag{1.4.9}$$

Thus, the equation of motion is derived as follows:

$$\begin{aligned}
\frac{\partial L}{\partial \dot{x}} &= m\dot{x} \\
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}}\right) &= m\ddot{x} \\
\frac{\partial L}{\partial x} &= -kx \\
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}}\right) - \frac{\partial L}{\partial x} &= m\ddot{x} + kx = 0,
\end{aligned} \tag{1.4.10}$$

which is the same as eqn. (1.3.5).

It is important to note that when the forces in a particular system are conservative, then the equations of motion satisfy an important conservation law, namely the conservation of energy. The total energy is given by the sum of kinetic and potential energies:

$$E = \sum_{i=1}^{N} \frac{1}{2} m_i \dot{\mathbf{r}}_i^2 + U(\mathbf{r}_1, \ldots, \mathbf{r}_N). \tag{1.4.11}$$
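As a numerical cross-check of the harmonic-oscillator derivation in eqn. (1.4.10), the sketch below evaluates the left-hand side of the Euler-Lagrange equation by finite differences along two candidate paths: the analytical oscillator solution and a straight-line (free-particle) path. The finite-difference step sizes, parameter values, and function names are assumptions made for illustration.

```python
import math

M, K = 1.0, 4.0  # illustrative mass and force constant


def lagrangian(x, v):
    """L(x, xdot) of eqn. (1.4.9)."""
    return 0.5 * M * v * v - 0.5 * K * x * x


def dL_dx(x, v, h=1e-4):
    """Central-difference estimate of the partial derivative of L with respect to x."""
    return (lagrangian(x + h, v) - lagrangian(x - h, v)) / (2.0 * h)


def dL_dv(x, v, h=1e-4):
    """Central-difference estimate of the partial derivative of L with respect to xdot."""
    return (lagrangian(x, v + h) - lagrangian(x, v - h)) / (2.0 * h)


def oscillator_path(t, x0=0.7, v0=-0.3):
    """Solution of m * xddot = -k * x; returns (x, xdot) at time t."""
    w = math.sqrt(K / M)
    x = x0 * math.cos(w * t) + (v0 / w) * math.sin(w * t)
    v = -x0 * w * math.sin(w * t) + v0 * math.cos(w * t)
    return x, v


def straight_path(t, x0=0.7, v0=-0.3):
    """Free-particle motion, which does not solve the oscillator equation."""
    return x0 + v0 * t, v0


def euler_lagrange_residual(path, t, dt=1e-4):
    """d/dt(dL/dxdot) - dL/dx along path, with d/dt taken by central difference."""
    before = dL_dv(*path(t - dt))
    after = dL_dv(*path(t + dt))
    return (after - before) / (2.0 * dt) - dL_dx(*path(t))


for t in (0.0, 0.5, 1.0, 2.0):
    print(t, euler_lagrange_residual(oscillator_path, t),
          euler_lagrange_residual(straight_path, t))
```

The residual should vanish (to within finite-difference error) along the oscillator solution, while along the straight-line path it reduces to $kx(t)$, which is nonzero in general. Only the true trajectory satisfies the Euler-Lagrange equation.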

12

Classical mechanics

In order to verify that E is a constant, we need only show that dE/dt = 0. Differentiating eqn. (1.4.11) with respect to time yields N N X X ∂U dE ˙ ¨ mi ri · ri + · r˙ i = dt ∂r i i=1 i=1

=

=

N X

  ∂U r˙ i · mi ¨ri + ∂ri i=1

N X i=1

r˙ i · [mi ¨ri − Fi ]

= 0,

(1.4.12)

where the last line follows from the fact that $\mathbf{F}_i = m_i \ddot{\mathbf{r}}_i$.

The power of the Lagrangian formulation of classical mechanics lies in the fact that the equations of motion in an arbitrary coordinate system, which might not be easy to write down directly from Newton's second law, can be derived straightforwardly via the Euler-Lagrange equation. Often, the standard Cartesian coordinates are not the most suitable coordinate choice for a given problem. Suppose, for a given system, there exists another set of $3N$ coordinates, $\{q_1, \ldots, q_{3N}\}$, that provides a more natural description of the particle locations. These coordinates are known as generalized coordinates, and they can be related to the original Cartesian coordinates, $\mathbf{r}_1, \ldots, \mathbf{r}_N$, via a coordinate transformation

$$q_\alpha = f_\alpha(\mathbf{r}_1, \ldots, \mathbf{r}_N), \qquad \alpha = 1, \ldots, 3N. \tag{1.4.13}$$

Thus, each coordinate $q_\alpha$ is generally a function of the $N$ Cartesian coordinates, $\mathbf{r}_1, \ldots, \mathbf{r}_N$. It is assumed that the coordinate transformation eqn. (1.4.13) has a unique inverse

$$\mathbf{r}_i = \mathbf{g}_i(q_1, \ldots, q_{3N}), \qquad i = 1, \ldots, N. \tag{1.4.14}$$

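In order to determine the Lagrangian in terms of generalized coordinates, eqn. (1.4.14) is used to compute the velocities via the chain rule:

$$\dot{\mathbf{r}}_i = \sum_{\alpha=1}^{3N} \frac{\partial \mathbf{g}_i}{\partial q_\alpha}\dot{q}_\alpha = \sum_{\alpha=1}^{3N} \frac{\partial \mathbf{r}_i}{\partial q_\alpha}\dot{q}_\alpha, \tag{1.4.15}$$

where $\partial \mathbf{r}_i/\partial q_\alpha$ is computed from $\partial \mathbf{g}_i/\partial q_\alpha$. Substituting eqn. (1.4.15) into eqn. (1.4.4) gives the kinetic energy in terms of the new velocities $\dot{q}_1, \ldots, \dot{q}_{3N}$:

$$\begin{aligned}
\tilde{K}(q, \dot{q}) &= \frac{1}{2}\sum_{\alpha=1}^{3N}\sum_{\beta=1}^{3N}\left[\sum_{i=1}^{N} m_i \frac{\partial \mathbf{r}_i}{\partial q_\alpha} \cdot \frac{\partial \mathbf{r}_i}{\partial q_\beta}\right]\dot{q}_\alpha \dot{q}_\beta \\
&\equiv \frac{1}{2}\sum_{\alpha=1}^{3N}\sum_{\beta=1}^{3N} G_{\alpha\beta}(q_1, \ldots, q_{3N})\,\dot{q}_\alpha \dot{q}_\beta,
\end{aligned} \tag{1.4.16}$$

where

$$G_{\alpha\beta}(q_1, \ldots, q_{3N}) = \sum_{i=1}^{N} m_i \frac{\partial \mathbf{r}_i}{\partial q_\alpha} \cdot \frac{\partial \mathbf{r}_i}{\partial q_\beta} \tag{1.4.17}$$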

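is called the mass metric matrix or mass metric tensor. The matrix $G_{\alpha\beta}$ is a function of some or all of the generalized coordinates, depending on the form of the transformations in eqn. (1.4.14). The Lagrangian in generalized coordinates then takes the form

$$L = \frac{1}{2}\sum_{\alpha=1}^{3N}\sum_{\beta=1}^{3N} G_{\alpha\beta}(q_1, \ldots, q_{3N})\,\dot{q}_\alpha \dot{q}_\beta - U(\mathbf{g}_1(q_1, \ldots, q_{3N}), \ldots, \mathbf{g}_N(q_1, \ldots, q_{3N})), \tag{1.4.18}$$

where the potential $U$ is expressed as a function of the generalized coordinates through the transformation in eqn. (1.4.14). Substitution of eqn. (1.4.18) into eqn. (1.4.6), the Euler-Lagrange equation, gives the equations of motion for each generalized coordinate, $q_\gamma$, $\gamma = 1, \ldots, 3N$:

$$\sum_{\beta=1}^{3N} G_{\gamma\beta}(q_1, \ldots, q_{3N})\,\ddot{q}_\beta + \sum_{\alpha=1}^{3N}\sum_{\beta=1}^{3N}\left[\frac{\partial G_{\gamma\beta}}{\partial q_\alpha} - \frac{1}{2}\frac{\partial G_{\alpha\beta}}{\partial q_\gamma}\right]\dot{q}_\alpha \dot{q}_\beta = -\frac{\partial U}{\partial q_\gamma}. \tag{1.4.19}$$

Here, $\partial U/\partial q_\gamma$ can be obtained from the chain rule

$$\frac{\partial U}{\partial q_\gamma} = \sum_{i=1}^{N} \frac{\partial U}{\partial \mathbf{r}_i} \cdot \frac{\partial \mathbf{r}_i}{\partial q_\gamma}. \tag{1.4.20}$$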

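Equation (1.4.19) can be recast in a form that manifestly reveals the geometric structure of the generalized coordinate space. Noting that the dyad $\dot{q}_\alpha \dot{q}_\beta$ is symmetric with respect to an exchange of $\alpha$ and $\beta$, we can rewrite the coefficient of $\dot{q}_\alpha \dot{q}_\beta$ in the second term on the left in eqn. (1.4.19), with the double sum over $\alpha$ and $\beta$ understood, as

$$\frac{\partial G_{\gamma\beta}}{\partial q_\alpha} - \frac{1}{2}\frac{\partial G_{\alpha\beta}}{\partial q_\gamma} = \frac{1}{2}\left[\frac{\partial G_{\gamma\beta}}{\partial q_\alpha} + \frac{\partial G_{\gamma\alpha}}{\partial q_\beta} - \frac{\partial G_{\alpha\beta}}{\partial q_\gamma}\right]. \tag{1.4.21}$$

If we substitute eqn. (1.4.21) into eqn. (1.4.19), multiply through by the inverse of the mass-metric tensor $G^{-1}_{\lambda\gamma}$, and sum over $\gamma$, we obtain an equation of motion of the form

$$\ddot{q}_\lambda + \sum_{\alpha=1}^{3N}\sum_{\beta=1}^{3N} \Gamma^{\lambda}_{\alpha\beta}\,\dot{q}_\alpha \dot{q}_\beta = -\sum_{\gamma=1}^{3N} G^{-1}_{\lambda\gamma}\frac{\partial U}{\partial q_\gamma}, \tag{1.4.22}$$

where

$$\Gamma^{\lambda}_{\alpha\beta} = \sum_{\gamma=1}^{3N} \frac{1}{2} G^{-1}_{\lambda\gamma}\left[\frac{\partial G_{\gamma\beta}}{\partial q_\alpha} + \frac{\partial G_{\gamma\alpha}}{\partial q_\beta} - \frac{\partial G_{\alpha\beta}}{\partial q_\gamma}\right] \tag{1.4.23}$$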

is known as the affine connection in the generalized coordinate space. On a general manifold, an affine connection is a geometric structure that provides a way to connect nearby tangent spaces. In the absence of forces, eqn. (1.4.22) gives the equations of motion of geodesics (free-particle motion) in terms of the generalized coordinates. The remainder of this section will present several examples of the use of the Lagrangian formalism.

1.4.1 Example: Motion in a central potential

Consider a single particle in three dimensions subject to a potential $U(\mathbf{r})$ that depends only on the particle's distance from the origin. This means $U(\mathbf{r}) = U(|\mathbf{r}|) = U(r)$, where $r = \sqrt{x^2 + y^2 + z^2}$, and such a potential is known as a central potential. The most natural coordinates are not the Cartesian coordinates $(x, y, z)$ but rather spherical polar coordinates $(r, \theta, \phi)$ given by

$$r = \sqrt{x^2 + y^2 + z^2}, \qquad \theta = \tan^{-1}\frac{\sqrt{x^2 + y^2}}{z}, \qquad \phi = \tan^{-1}\frac{y}{x}, \tag{1.4.24}$$

which can be inverted to give

$$x = r\sin\theta\cos\phi, \qquad y = r\sin\theta\sin\phi, \qquad z = r\cos\theta. \tag{1.4.25}$$

The mass metric tensor is a $3 \times 3$ diagonal matrix given by

$$\begin{aligned}
G_{11}(r, \theta, \phi) &= m \\
G_{22}(r, \theta, \phi) &= mr^2 \\
G_{33}(r, \theta, \phi) &= mr^2\sin^2\theta \\
G_{\alpha\beta}(r, \theta, \phi) &= 0, \qquad \alpha \neq \beta.
\end{aligned} \tag{1.4.26}$$
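As a check on eqn. (1.4.26), the following sketch estimates the mass metric tensor of eqn. (1.4.17) numerically from the inverse transformation, eqn. (1.4.25), using central finite differences. The function names, the step size `h`, the mass, and the test point are illustrative assumptions rather than anything prescribed by the text.

```python
import math

def cartesian(q):
    """Inverse transformation, eqn. (1.4.25): (r, theta, phi) -> (x, y, z)."""
    r, theta, phi = q
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def mass_metric(q, m, h=1e-6):
    """G_ab = m (dr/dq_a) . (dr/dq_b) for a single particle, eqn. (1.4.17).

    The derivatives dr/dq_a are estimated by central differences with an
    assumed step h."""
    derivatives = []
    for a in range(3):
        q_plus = list(q)
        q_minus = list(q)
        q_plus[a] += h
        q_minus[a] -= h
        r_plus, r_minus = cartesian(q_plus), cartesian(q_minus)
        derivatives.append([(p - n) / (2.0 * h) for p, n in zip(r_plus, r_minus)])
    return [[m * sum(da * db for da, db in zip(derivatives[a], derivatives[b]))
             for b in range(3)] for a in range(3)]

m = 2.0
q = (1.5, 0.7, 0.3)   # arbitrary test point (r, theta, phi)
G = mass_metric(q, m)
expected_diagonal = [m, m * q[0] ** 2, m * (q[0] * math.sin(q[1])) ** 2]

for row in G:
    print(["%.6f" % entry for entry in row])
```

Within the finite-difference error, the diagonal entries should agree with $m$, $mr^2$, and $mr^2\sin^2\theta$, and the off-diagonal entries should vanish.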

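Returning to our example of a single particle moving in a central potential, $U(r)$, we find that the Lagrangian obtained by substituting eqn. (1.4.26) into eqn. (1.4.18) is

$$L = \frac{1}{2}m\left(\dot{r}^2 + r^2\dot{\theta}^2 + r^2\sin^2\theta\,\dot{\phi}^2\right) - U(r). \tag{1.4.27}$$

In order to obtain the equations of motion from the Euler-Lagrange equations, eqn. (1.4.6), derivatives of $L$ with respect to each of the variables and their time derivatives are required. These are given by:

$$\begin{aligned}
\frac{\partial L}{\partial \dot{r}} &= m\dot{r}, & \frac{\partial L}{\partial r} &= mr\dot{\theta}^2 + mr\sin^2\theta\,\dot{\phi}^2 - \frac{dU}{dr} \\
\frac{\partial L}{\partial \dot{\theta}} &= mr^2\dot{\theta}, & \frac{\partial L}{\partial \theta} &= mr^2\sin\theta\cos\theta\,\dot{\phi}^2 \\
\frac{\partial L}{\partial \dot{\phi}} &= mr^2\sin^2\theta\,\dot{\phi}, & \frac{\partial L}{\partial \phi} &= 0.
\end{aligned} \tag{1.4.28}$$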

Note that in eqn. (1.4.28), the derivative $\partial L/\partial \phi = 0$. The coordinate $\phi$ is an example of a cyclic coordinate. In general, if a coordinate $q$ satisfies $\partial L/\partial q = 0$, it is called cyclic. It is also possible to make $\theta$ a cyclic coordinate by recognizing that the quantity $\mathbf{l} = \mathbf{r} \times \mathbf{p}$, called the orbital angular momentum, is a constant ($\mathbf{l}(0) = \mathbf{l}(t)$) when the potential only depends on $r$. (Angular momentum will be discussed in more detail in Section 1.11.) Thus, the quantity $\mathbf{l}$ is conserved by the motion and, therefore, satisfies $d\mathbf{l}/dt = 0$. Because $\mathbf{l}$ is constant, it is always possible to simplify the problem by choosing a coordinate frame in which the $z$ axis lies along the direction of $\mathbf{l}$. In such a frame, the motion occurs solely in the $xy$ plane so that $\theta = \pi/2$ and $\dot{\theta} = 0$. With this simplification, the equations of motion become

$$\begin{aligned}
m\ddot{r} - mr\dot{\phi}^2 &= -\frac{dU}{dr} \\
mr^2\ddot{\phi} + 2mr\dot{r}\dot{\phi} &= 0.
\end{aligned} \tag{1.4.29}$$
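The left side of the second line of eqn. (1.4.29) is the time derivative of $mr^2\dot{\phi}$, so this quantity should not change along a trajectory. A minimal numerical sketch of that statement follows. It integrates planar motion in Cartesian coordinates, where $mr^2\dot{\phi} = m(x\dot{y} - y\dot{x})$. The test potential $U(r) = -k/r$, the half-kick/drift/half-kick update, the time step, and the initial conditions are all assumptions chosen for illustration.

```python
import math

def central_force(x, y, k=1.0):
    """Force components for the assumed test potential U(r) = -k/r."""
    r = math.hypot(x, y)
    return -k * x / r ** 3, -k * y / r ** 3

def angular_momentum(x, y, vx, vy, m):
    """m r^2 phi_dot written in Cartesian form, m (x vy - y vx)."""
    return m * (x * vy - y * vx)

m, dt = 1.0, 1e-3
x, y, vx, vy = 1.0, 0.0, 0.0, 0.8      # assumed initial conditions (bound orbit)
lam0 = angular_momentum(x, y, vx, vy, m)
fx, fy = central_force(x, y)
max_deviation = 0.0

for _ in range(5000):
    vx += 0.5 * dt * fx / m            # half step in the velocities
    vy += 0.5 * dt * fy / m
    x += dt * vx                       # full step in the positions
    y += dt * vy
    fx, fy = central_force(x, y)
    vx += 0.5 * dt * fx / m            # second half step in the velocities
    vy += 0.5 * dt * fy / m
    lam = angular_momentum(x, y, vx, vy, m)
    max_deviation = max(max_deviation, abs(lam - lam0))

print("initial value:", lam0, "largest deviation:", max_deviation)
```

Because the force is always parallel to the position vector, each velocity update leaves $x\dot{y} - y\dot{x}$ unchanged, and so does each position update. The deviation printed here therefore reflects only floating-point round-off.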

The second equation can be expressed in the form

$$\frac{d}{dt}\left(\frac{1}{2}r^2\dot{\phi}\right) = 0, \tag{1.4.30}$$

which expresses another conservation law known as the conservation of areal velocity, defined as the area swept out by the radius vector per unit time. Setting $mr^2\dot{\phi} = \lambda$, where $\lambda$ is constant, the first equation of motion can be written as

$$m\ddot{r} - \frac{\lambda^2}{mr^3} = -\frac{dU}{dr}. \tag{1.4.31}$$

Since the total energy

$$E = \frac{1}{2}m\left(\dot{r}^2 + r^2\dot{\phi}^2\right) + U(r) = \frac{1}{2}m\dot{r}^2 + \frac{\lambda^2}{2mr^2} + U(r) \tag{1.4.32}$$

is conserved, eqn. (1.4.32) can be inverted to give an integral expression

$$\begin{aligned}
dt &= \frac{dr}{\sqrt{\dfrac{2}{m}\left(E - U(r) - \dfrac{\lambda^2}{2mr^2}\right)}} \\
t &= \int_{r(0)}^{r} \frac{dr'}{\sqrt{\dfrac{2}{m}\left(E - U(r') - \dfrac{\lambda^2}{2mr'^2}\right)}},
\end{aligned} \tag{1.4.33}$$

which, for certain choices of the potential, can be integrated analytically and inverted to yield the trajectory $r(t)$.

1.4.2 Example: Two-particle system

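Consider a two-particle system with masses $m_1$ and $m_2$, positions $\mathbf{r}_1$ and $\mathbf{r}_2$, and velocities $\dot{\mathbf{r}}_1$ and $\dot{\mathbf{r}}_2$ subject to a potential $U$ that is a function of only the distance $|\mathbf{r}_1 - \mathbf{r}_2|$ between them. Such would be the case, for example, in a diatomic molecule. The Lagrangian for the system can be written as

$$L = \frac{1}{2}m_1\dot{\mathbf{r}}_1^2 + \frac{1}{2}m_2\dot{\mathbf{r}}_2^2 - U(|\mathbf{r}_1 - \mathbf{r}_2|). \tag{1.4.34}$$

Although such a system can easily be treated directly in terms of the Cartesian positions $\mathbf{r}_1$ and $\mathbf{r}_2$, for which the equations of motion are

$$\begin{aligned}
m_1\ddot{\mathbf{r}}_1 &= -U'(|\mathbf{r}_1 - \mathbf{r}_2|)\,\frac{\mathbf{r}_1 - \mathbf{r}_2}{|\mathbf{r}_1 - \mathbf{r}_2|} \\
m_2\ddot{\mathbf{r}}_2 &= U'(|\mathbf{r}_1 - \mathbf{r}_2|)\,\frac{\mathbf{r}_1 - \mathbf{r}_2}{|\mathbf{r}_1 - \mathbf{r}_2|},
\end{aligned} \tag{1.4.35}$$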

a more natural set of coordinates can be chosen. To this end, we introduce the center-of-mass and relative coordinates defined by

$$\mathbf{R} = \frac{m_1\mathbf{r}_1 + m_2\mathbf{r}_2}{M}, \qquad \mathbf{r} = \mathbf{r}_1 - \mathbf{r}_2, \tag{1.4.36}$$

respectively, where $M = m_1 + m_2$. The inverse of this transformation is

$$\mathbf{r}_1 = \mathbf{R} + \frac{m_2}{M}\mathbf{r}, \qquad \mathbf{r}_2 = \mathbf{R} - \frac{m_1}{M}\mathbf{r}. \tag{1.4.37}$$

When eqn. (1.4.37) is substituted into eqn. (1.4.34), the Lagrangian becomes

$$L = \frac{1}{2}M\dot{\mathbf{R}}^2 + \frac{1}{2}\mu\dot{\mathbf{r}}^2 - U(|\mathbf{r}|), \tag{1.4.38}$$

where $\mu = m_1m_2/M$ is known as the reduced mass. Since $\partial L/\partial \mathbf{R} = 0$, we see that the center-of-mass coordinate is cyclic, and only the relative coordinate needs to be considered. After elimination of the center of mass, the reduced Lagrangian is $L = \mu\dot{\mathbf{r}}^2/2 - U(|\mathbf{r}|)$, which gives a simple equation of motion

$$\mu\ddot{\mathbf{r}} = -U'(|\mathbf{r}|)\,\frac{\mathbf{r}}{|\mathbf{r}|}. \tag{1.4.39}$$
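The step from eqn. (1.4.34) to eqn. (1.4.38) rests on the kinetic energy separating into a center-of-mass part and a relative part with no cross term. A short numerical check of that identity is sketched below. The masses and velocities are arbitrary test values, and the helper names are illustrative.

```python
def to_center_of_mass(v1, v2, m1, m2):
    """Apply eqn. (1.4.36) to a pair of vectors.

    The transformation is linear, so it acts on velocities exactly as it
    acts on positions."""
    total_mass = m1 + m2
    center = [(m1 * a + m2 * b) / total_mass for a, b in zip(v1, v2)]
    relative = [a - b for a, b in zip(v1, v2)]
    return center, relative

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

m1, m2 = 1.0, 3.0                    # arbitrary test masses
v1 = [0.3, -1.2, 0.5]                # arbitrary test velocities
v2 = [-0.7, 0.4, 2.0]

V_cm, v_rel = to_center_of_mass(v1, v2, m1, m2)
M = m1 + m2
mu = m1 * m2 / M                     # reduced mass

kinetic_cartesian = 0.5 * m1 * dot(v1, v1) + 0.5 * m2 * dot(v2, v2)
kinetic_transformed = 0.5 * M * dot(V_cm, V_cm) + 0.5 * mu * dot(v_rel, v_rel)
print(kinetic_cartesian, kinetic_transformed)
```

The two printed values should agree to round-off for any choice of masses and velocities.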

Alternatively, one could transform r into spherical-polar coordinates as described in the previous subsection, and solve the resulting one-dimensional equation for a single particle of mass µ moving in a central potential U (r). We hope that the reader is now convinced of the elegance and simplicity of the Lagrangian formulation of classical mechanics. Primarily, it offers a framework in which the equations of motion can be obtained in any set of coordinates. Beyond this, in Section 1.8, we will see how it connects with a more general concept, the action extremization principle, which allows the Euler-Lagrange equations to be obtained by extremization of a particular mathematical form known as the classical action, an object of fundamental importance in quantum statistical mechanics to be explored in Chapter 12.

1.5 Legendre transforms

We shall next derive the Hamiltonian formulation of classical mechanics. Before we can do so, we need to introduce the concept of a Legendre transform. Consider a simple function $f(x)$ of a single variable $x$. Suppose we wish to express $f(x)$ in terms of a new variable $s$, where $s$ and $x$ are related by

$$s = f'(x) \equiv g(x) \tag{1.5.1}$$

with $f'(x) = df/dx$. Can we determine $f(x)$ at a point $x_0$ given only $s_0 = f'(x_0) = g(x_0)$? The answer to this question, of course, is no. The reason, as Fig. 1.6 makes clear, is that $s_0$, being the slope of the line tangent to $f(x)$ at $x_0$, is also the slope of $f(x) + c$ at $x = x_0$ for any constant $c$. Thus, $f(x_0)$ cannot be uniquely determined from $s_0$. However, if we specify both the slope, $s_0 = f'(x_0)$, and the $y$-intercept, $b(x_0)$, of the line tangent to the function at $x_0$, then $f(x_0)$ can be uniquely determined. In fact, $f(x_0)$ will be given by the equation of the line tangent to the function at $x_0$:

$$f(x_0) = f'(x_0)x_0 + b(x_0). \tag{1.5.2}$$

Fig. 1.6 Depiction of the Legendre transform.

Equation (1.5.2) shows how we may transform from a description of $f(x)$ in terms of $x$ to a new description in terms of $s$. First, since eqn. (1.5.2) is valid for all $x_0$, it can be written generally in terms of $x$ as

$$f(x) = f'(x)x + b(x). \tag{1.5.3}$$

Then, recognizing that $f'(x) = g(x) = s$ and $x = g^{-1}(s)$, and assuming that $s = g(x)$ exists and is a one-to-one mapping, it is clear that the function $b(g^{-1}(s))$, given by

$$b(g^{-1}(s)) = f(g^{-1}(s)) - s\,g^{-1}(s), \tag{1.5.4}$$

contains the same information as the original $f(x)$ but expressed as a function of $s$ instead of $x$. We call the function $\tilde{f}(s) = b(g^{-1}(s))$ the Legendre transform of $f(x)$. The function $\tilde{f}(s)$ can be written compactly as

$$\tilde{f}(s) = f(x(s)) - s\,x(s), \tag{1.5.5}$$

where $x(s)$ serves to remind us that $x$ is a function of $s$ through the variable transformation $x = g^{-1}(s)$.

The generalization of the Legendre transform to a function $f$ of $n$ variables $x_1, \ldots, x_n$ is straightforward. In this case, there will be a variable transformation of the form

$$\begin{aligned}
s_1 &= \frac{\partial f}{\partial x_1} = g_1(x_1, \ldots, x_n) \\
&\;\;\vdots \\
s_n &= \frac{\partial f}{\partial x_n} = g_n(x_1, \ldots, x_n).
\end{aligned} \tag{1.5.6}$$

Again, it is assumed that this transformation is invertible so that it is possible to express each $x_i$ as a function $x_i(s_1, \ldots, s_n)$ of the new variables. The Legendre transform of $f$ will then be

$$\tilde{f}(s_1, \ldots, s_n) = f(x_1(s_1, \ldots, s_n), \ldots, x_n(s_1, \ldots, s_n)) - \sum_{i=1}^{n} s_i\,x_i(s_1, \ldots, s_n). \tag{1.5.7}$$
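The one-variable construction of eqn. (1.5.5) can be carried out numerically when $g^{-1}$ is not available in closed form. The sketch below is one possible approach, not a prescribed algorithm: it inverts $s = f'(x)$ by bisection, which assumes that $f'$ is increasing on the chosen bracket, and then evaluates $f(x(s)) - s\,x(s)$. The quadratic test function $f(x) = ax^2/2$, for which $\tilde{f}(s) = -s^2/2a$, is chosen only because the answer is known.

```python
def legendre_transform(f, fprime, s, lo=-100.0, hi=100.0, tol=1e-12):
    """Return f~(s) = f(x(s)) - s x(s), eqn. (1.5.5).

    x(s) is found by bisection on f'(x) = s. This assumes f' is increasing
    (hence one-to-one) on the bracket [lo, hi] and that the root lies inside it."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fprime(mid) < s:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return f(x) - s * x

a = 2.0   # assumed test parameter

def f(x):
    return 0.5 * a * x * x

def fprime(x):
    return a * x

results = []
for s in (-1.0, 0.5, 3.0):
    numeric = legendre_transform(f, fprime, s)
    analytic = -s * s / (2.0 * a)
    results.append((s, numeric, analytic))
    print(s, numeric, analytic)
```

For a function of several variables, eqn. (1.5.7), the bisection step would have to be replaced by a multidimensional root finder. That extension is not attempted here.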

Note that it is also possible to perform the Legendre transform of a function with respect to any subset of the variables on which the function depends.

1.6 Generalized momenta and the Hamiltonian formulation of classical mechanics

For a first application of the Legendre transform technique, we will derive a new formulation of classical mechanics in terms of positions and momenta rather than positions and velocities. The Legendre transform will appear again numerous times in subsequent chapters. Recall that the Cartesian momentum $\mathbf{p}_i$ of particle $i$ is just $\mathbf{p}_i = m_i\dot{\mathbf{r}}_i$. Interestingly, the momentum can also be obtained as a derivative of the Lagrangian with respect to $\dot{\mathbf{r}}_i$:

$$\mathbf{p}_i = \frac{\partial L}{\partial \dot{\mathbf{r}}_i} = \frac{\partial}{\partial \dot{\mathbf{r}}_i}\left[\sum_{j=1}^{N} \frac{1}{2}m_j\dot{\mathbf{r}}_j^2 - U(\mathbf{r}_1, \ldots, \mathbf{r}_N)\right] = m_i\dot{\mathbf{r}}_i. \tag{1.6.1}$$

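For this reason, it is clear how the Legendre transform method can be applied. We seek to derive a new function of positions and momenta as a Legendre transform of the Lagrangian with respect to the velocities. Note that, by way of eqn. (1.6.1), the velocities can be easily expressed as functions of momenta, $\dot{\mathbf{r}}_i = \dot{\mathbf{r}}_i(\mathbf{p}_i) = \mathbf{p}_i/m_i$. Therefore, substituting the transformation into eqn. (1.5.7), the new Lagrangian, denoted $\tilde{L}(\mathbf{r}_1, \ldots, \mathbf{r}_N, \mathbf{p}_1, \ldots, \mathbf{p}_N)$, is given by

$$\begin{aligned}
\tilde{L}(\mathbf{r}_1, \ldots, \mathbf{r}_N, \mathbf{p}_1, \ldots, \mathbf{p}_N) &= L(\mathbf{r}_1, \ldots, \mathbf{r}_N, \dot{\mathbf{r}}_1(\mathbf{p}_1), \ldots, \dot{\mathbf{r}}_N(\mathbf{p}_N)) - \sum_{i=1}^{N} \mathbf{p}_i \cdot \dot{\mathbf{r}}_i(\mathbf{p}_i) \\
&= \frac{1}{2}\sum_{i=1}^{N} m_i\left(\frac{\mathbf{p}_i}{m_i}\right)^2 - U(\mathbf{r}_1, \ldots, \mathbf{r}_N) - \sum_{i=1}^{N} \mathbf{p}_i \cdot \frac{\mathbf{p}_i}{m_i} \\
&= -\sum_{i=1}^{N} \frac{\mathbf{p}_i^2}{2m_i} - U(\mathbf{r}_1, \ldots, \mathbf{r}_N).
\end{aligned} \tag{1.6.2}$$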

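The function $-\tilde{L}(\mathbf{r}_1, \ldots, \mathbf{r}_N, \mathbf{p}_1, \ldots, \mathbf{p}_N)$ is known as the Hamiltonian $H$:

$$H(\mathbf{r}_1, \ldots, \mathbf{r}_N, \mathbf{p}_1, \ldots, \mathbf{p}_N) = \sum_{i=1}^{N} \frac{\mathbf{p}_i^2}{2m_i} + U(\mathbf{r}_1, \ldots, \mathbf{r}_N). \tag{1.6.3}$$

The Hamiltonian is simply the total energy of the system expressed as a function of positions and momenta and is related to the Lagrangian by

$$H(\mathbf{r}_1, \ldots, \mathbf{r}_N, \mathbf{p}_1, \ldots, \mathbf{p}_N) = \sum_{i=1}^{N} \mathbf{p}_i \cdot \dot{\mathbf{r}}_i(\mathbf{p}_i) - L(\mathbf{r}_1, \ldots, \mathbf{r}_N, \dot{\mathbf{r}}_1(\mathbf{p}_1), \ldots, \dot{\mathbf{r}}_N(\mathbf{p}_N)). \tag{1.6.4}$$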

The momenta given in eqn. (1.6.1) are referred to as conjugate to the positions $\mathbf{r}_1, \ldots, \mathbf{r}_N$. The relations derived above also hold for a set of generalized coordinates. The momenta $p_1, \ldots, p_{3N}$ conjugate to a set of generalized coordinates $q_1, \ldots, q_{3N}$ are given by

$$p_\alpha = \frac{\partial L}{\partial \dot{q}_\alpha}, \tag{1.6.5}$$

and the Hamiltonian becomes

$$H(q_1, \ldots, q_{3N}, p_1, \ldots, p_{3N}) = \sum_{\alpha=1}^{3N} p_\alpha\,\dot{q}_\alpha(p_1, \ldots, p_{3N}) - L(q_1, \ldots, q_{3N}, \dot{q}_1(p_1, \ldots, p_{3N}), \ldots, \dot{q}_{3N}(p_1, \ldots, p_{3N})). \tag{1.6.6}$$

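Now, according to eqn. (1.4.18), since $G_{\alpha\beta}$ is a symmetric matrix, the generalized conjugate momenta are

$$p_\alpha = \sum_{\beta=1}^{3N} G_{\alpha\beta}(q_1, \ldots, q_{3N})\,\dot{q}_\beta. \tag{1.6.7}$$

Inverting this, we obtain the generalized velocities as

$$\dot{q}_\alpha = \sum_{\beta=1}^{3N} G^{-1}_{\alpha\beta}(q_1, \ldots, q_{3N})\,p_\beta, \tag{1.6.8}$$

where the inverse of the mass-metric tensor is

$$G^{-1}_{\alpha\beta}(q_1, \ldots, q_{3N}) = \sum_{i=1}^{N} \frac{1}{m_i}\left(\frac{\partial q_\alpha}{\partial \mathbf{r}_i}\right) \cdot \left(\frac{\partial q_\beta}{\partial \mathbf{r}_i}\right). \tag{1.6.9}$$

It follows that the Hamiltonian in terms of a set of generalized coordinates is

$$H(q_1, \ldots, q_{3N}, p_1, \ldots, p_{3N}) = \frac{1}{2}\sum_{\alpha=1}^{3N}\sum_{\beta=1}^{3N} p_\alpha\,G^{-1}_{\alpha\beta}(q_1, \ldots, q_{3N})\,p_\beta + U(\mathbf{r}_1(q_1, \ldots, q_{3N}), \ldots, \mathbf{r}_N(q_1, \ldots, q_{3N})). \tag{1.6.10}$$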

Given the Hamiltonian (as a Legendre transform of the Lagrangian), one can obtain the equations of motion for the system using

$$\dot{q}_\alpha = \frac{\partial H}{\partial p_\alpha}, \qquad \dot{p}_\alpha = -\frac{\partial H}{\partial q_\alpha}. \tag{1.6.11}$$

Equations (1.6.11) are known as Hamilton's equations of motion. Whereas the Euler-Lagrange equations constitute a set of $3N$ second-order differential equations, Hamilton's equations constitute an equivalent set of $6N$ first-order differential equations. When subject to the same initial conditions, the Euler-Lagrange and Hamiltonian equations of motion must yield the same trajectory. Hamilton's equations must be solved subject to a set of initial conditions on the coordinates and momenta, $\{q_1(0), \ldots, q_{3N}(0), p_1(0), \ldots, p_{3N}(0)\}$. Equations (1.6.11) are completely equivalent to Newton's second law of motion. In order to see this explicitly, let us apply Hamilton's equations to the simple Cartesian Hamiltonian of eqn. (1.6.3):

$$\begin{aligned}
\dot{\mathbf{r}}_i &= \frac{\partial H}{\partial \mathbf{p}_i} = \frac{\mathbf{p}_i}{m_i} \\
\dot{\mathbf{p}}_i &= -\frac{\partial H}{\partial \mathbf{r}_i} = -\frac{\partial U}{\partial \mathbf{r}_i} = \mathbf{F}_i(\mathbf{r}).
\end{aligned} \tag{1.6.12}$$

Taking the time derivative of both sides of the first equation and substituting the result into the second yields

$$\begin{aligned}
\ddot{\mathbf{r}}_i &= \frac{\dot{\mathbf{p}}_i}{m_i} \\
\dot{\mathbf{p}}_i &= m_i\ddot{\mathbf{r}}_i = \mathbf{F}_i(\mathbf{r}_1, \ldots, \mathbf{r}_N),
\end{aligned} \tag{1.6.13}$$

which shows that Hamilton's equations reproduce Newton's second law of motion. The reader should check that the application of Hamilton's equations to a simple harmonic oscillator, for which $H = p^2/2m + kx^2/2$, yields the equation of motion $m\ddot{x} + kx = 0$.

Hamilton's equations conserve the total Hamiltonian:

$$\frac{dH}{dt} = 0. \tag{1.6.14}$$
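Conservation of $H$ can be observed numerically for the harmonic oscillator, $H = p^2/2m + kx^2/2$. The sketch below integrates Hamilton's equations, $\dot{x} = p/m$ and $\dot{p} = -kx$, and records $H$ along the way. The half-kick/drift/half-kick update used here is an assumed choice of integrator, and the parameters and time step are arbitrary. A discrete trajectory conserves $H$ only approximately, with a deviation that shrinks as the time step is reduced.

```python
import math

def hamiltonian(x, p, m, k):
    """H = p^2/2m + k x^2/2 for the one-dimensional harmonic oscillator."""
    return p * p / (2.0 * m) + 0.5 * k * x * x

def integrate(x, p, m, k, dt, n_steps):
    """Integrate x' = dH/dp = p/m and p' = -dH/dx = -k x.

    The splitting into half steps in p around a full step in x is an
    assumption of this sketch."""
    energies = [hamiltonian(x, p, m, k)]
    for _ in range(n_steps):
        p -= 0.5 * dt * k * x      # half step in p using -dH/dx
        x += dt * p / m            # full step in x using dH/dp
        p -= 0.5 * dt * k * x      # second half step in p
        energies.append(hamiltonian(x, p, m, k))
    return x, p, energies

m, k = 1.0, 1.0
x_final, p_final, energies = integrate(1.0, 0.0, m, k, dt=0.01, n_steps=1000)
max_deviation = max(abs(e - energies[0]) for e in energies)
print("H(0) =", energies[0], "largest deviation =", max_deviation)
print("x(10) =", x_final, "exact:", math.cos(10.0))
```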

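Since $H$ is the total energy, eqn. (1.6.14) is just the law of energy conservation. In order to see that $H$ is conserved, we simply compute the time derivative $dH/dt$ via the chain rule in generalized coordinates:

$$\begin{aligned}
\frac{dH}{dt} &= \sum_{\alpha=1}^{3N}\left[\frac{\partial H}{\partial q_\alpha}\dot{q}_\alpha + \frac{\partial H}{\partial p_\alpha}\dot{p}_\alpha\right] \\
&= \sum_{\alpha=1}^{3N}\left[\frac{\partial H}{\partial q_\alpha}\frac{\partial H}{\partial p_\alpha} - \frac{\partial H}{\partial p_\alpha}\frac{\partial H}{\partial q_\alpha}\right] \\
&= 0,
\end{aligned} \tag{1.6.15}$$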

where the second line follows from Hamilton's equations, eqns. (1.6.11). We will see shortly that conservation laws, in general, are connected with physical symmetries of a system and, therefore, play an important role in the analysis of the dynamics.

Hamilton's equations of motion describe the unique evolution of the coordinates and momenta subject to a set of initial conditions. In the language of phase space, they specify a trajectory $\mathrm{x}_t = (q_1(t), \ldots, q_{3N}(t), p_1(t), \ldots, p_{3N}(t))$ in the phase space starting from an initial point $\mathrm{x}_0$. The energy conservation condition $H(q_1(t), \ldots, q_{3N}(t), p_1(t), \ldots, p_{3N}(t)) = \text{const}$ is expressed as a condition on a phase-space trajectory. It can also be expressed as a condition directly on the coordinates and momenta, $H(q_1, \ldots, q_{3N}, p_1, \ldots, p_{3N}) = \text{const}$, which defines a $(6N - 1)$-dimensional surface in the phase space on which a trajectory must remain. This surface is known as the constant-energy hypersurface or simply the constant-energy surface.

An important theorem, known as the work-energy theorem, follows from the law of conservation of energy. Consider the evolution of the system from a point $\mathrm{x}_A$ in phase space to a point $\mathrm{x}_B$. Since energy is conserved, $H_A = H_B$. But since $H = K + U$, it follows that

$$K_A + U_A = K_B + U_B, \tag{1.6.16}$$

or

$$K_A - K_B = U_B - U_A. \tag{1.6.17}$$

The right side expresses the difference in potential energy between points $A$ and $B$ and is, therefore, equal to the work, $W_{AB}$, done on the system in moving between these two points. The left side is the difference between the initial and final kinetic energy. Thus, we have a relation between the work done on the system and the kinetic energy difference

$$W_{AB} = K_A - K_B. \tag{1.6.18}$$

Note that if $W_{AB} > 0$, net work is done on the system, which means that its potential energy increases, and its kinetic energy must decrease between points $A$ and $B$. If $W_{AB} < 0$, work is done by the system, its potential energy decreases, and its kinetic energy must, therefore, increase between points $A$ and $B$.

In order to understand the formal structure of a general conservation law, consider the time evolution of any arbitrary phase-space function, $a(\mathrm{x})$. Viewing $\mathrm{x}$ as a function of time $\mathrm{x}_t$, the time evolution can be analyzed by differentiating $a(\mathrm{x}_t)$ with respect to time:

$$\begin{aligned}
\frac{da}{dt} &= \frac{\partial a}{\partial \mathrm{x}_t} \cdot \dot{\mathrm{x}}_t \\
&= \sum_{\alpha=1}^{3N}\left[\frac{\partial a}{\partial q_\alpha}\dot{q}_\alpha + \frac{\partial a}{\partial p_\alpha}\dot{p}_\alpha\right] \\
&= \sum_{\alpha=1}^{3N}\left[\frac{\partial a}{\partial q_\alpha}\frac{\partial H}{\partial p_\alpha} - \frac{\partial a}{\partial p_\alpha}\frac{\partial H}{\partial q_\alpha}\right] \\
&\equiv \{a, H\}.
\end{aligned} \tag{1.6.19}$$

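The last line is known as the Poisson bracket between $a(\mathrm{x})$ and $H(\mathrm{x})$. The general definition of a Poisson bracket between two functions $a(\mathrm{x})$ and $b(\mathrm{x})$ is

$$\{a, b\} = \sum_{\alpha=1}^{3N}\left[\frac{\partial a}{\partial q_\alpha}\frac{\partial b}{\partial p_\alpha} - \frac{\partial a}{\partial p_\alpha}\frac{\partial b}{\partial q_\alpha}\right]. \tag{1.6.20}$$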
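Because eqn. (1.6.20) involves only partial derivatives at a single phase-space point, it can be estimated by finite differences without solving any equations of motion. The sketch below does this for a hypothetical test system, two particles on a line coupled only to each other by a spring, and evaluates two brackets: the bracket of the summed momenta with $H$, which should vanish because no external force acts, and the bracket of the first position with $H$, which by eqn. (1.6.19) should equal that particle's velocity $p_1/m$. All function names, parameters, and the step size are assumptions made for illustration.

```python
def partial_derivative(func, q, p, which, index, h):
    """Central-difference derivative of func(q, p) with respect to q[index] or p[index]."""
    q_plus, p_plus = list(q), list(p)
    q_minus, p_minus = list(q), list(p)
    if which == "q":
        q_plus[index] += h
        q_minus[index] -= h
    else:
        p_plus[index] += h
        p_minus[index] -= h
    return (func(q_plus, p_plus) - func(q_minus, p_minus)) / (2.0 * h)

def poisson_bracket(a, b, q, p, h=1e-5):
    """Finite-difference estimate of {a, b}, eqn. (1.6.20), at the point (q, p)."""
    total = 0.0
    for i in range(len(q)):
        total += (partial_derivative(a, q, p, "q", i, h)
                  * partial_derivative(b, q, p, "p", i, h)
                  - partial_derivative(a, q, p, "p", i, h)
                  * partial_derivative(b, q, p, "q", i, h))
    return total

MASS, SPRING = 1.0, 4.0   # assumed test parameters

def hamiltonian(q, p):
    """Two particles on a line interacting only with each other."""
    kinetic = (p[0] ** 2 + p[1] ** 2) / (2.0 * MASS)
    return kinetic + 0.5 * SPRING * (q[0] - q[1]) ** 2

def summed_momentum(q, p):
    return p[0] + p[1]

def first_position(q, p):
    return q[0]

q0, p0 = [0.3, -0.8], [1.1, 0.4]   # arbitrary phase-space point
bracket_P_H = poisson_bracket(summed_momentum, hamiltonian, q0, p0)
bracket_x_H = poisson_bracket(first_position, hamiltonian, q0, p0)
print(bracket_P_H, bracket_x_H)
```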

Note that the Poisson bracket is a statement about the dependence of functions on the phase-space vector and no longer refers to time. This is an important distinction, as it will often be necessary for us to distinguish between quantities evaluated along trajectories generated from the solution of Hamilton's equations and quantities that are evaluated at arbitrary (static) points in the phase space. From eqn. (1.6.19), it is clear that if $a(\mathrm{x})$ is a conserved quantity, then $da(\mathrm{x}_t)/dt = 0$ along a trajectory, and, therefore, $\{a(\mathrm{x}), H(\mathrm{x})\} = 0$ in the phase space. Conversely, if the Poisson bracket between any quantity $a(\mathrm{x})$ and the Hamiltonian of a system vanishes, then the function $a(\mathrm{x}_t)$ is conserved along a trajectory generated by Hamilton's equations.

As an example of the Poisson bracket formalism, suppose a system has no external forces acting on it. In this case, the total force $\sum_{i=1}^{N} \mathbf{F}_i = 0$, since all internal forces are balanced by Newton's third law. The condition $\sum_{i=1}^{N} \mathbf{F}_i = 0$ implies that

$$\sum_{i=1}^{N} \mathbf{F}_i = -\sum_{i=1}^{N} \frac{\partial H}{\partial \mathbf{r}_i} = 0. \tag{1.6.21}$$

Now, consider the total momentum $\mathbf{P} = \sum_{i=1}^{N} \mathbf{p}_i$. Its Poisson bracket with the Hamiltonian is

$$\{\mathbf{P}, H\} = \sum_{i=1}^{N} \{\mathbf{p}_i, H\} = -\sum_{i=1}^{N} \frac{\partial H}{\partial \mathbf{r}_i} = \sum_{i=1}^{N} \mathbf{F}_i = 0. \tag{1.6.22}$$

Hence, the total momentum $\mathbf{P}$ is conserved. When a system has no external forces acting on it, its dynamics will be the same no matter where in space the system lies. That is, if all of the coordinates were translated by a constant vector $\mathbf{a}$ according to $\mathbf{r}_i' = \mathbf{r}_i + \mathbf{a}$, then the Hamiltonian would remain invariant. This transformation defines the so-called translation group. In general, if the Hamiltonian is invariant with respect to the transformations of a particular continuous group $\mathcal{G}$, there will be an associated conservation law. This fact, known as Noether's theorem, is one of the cornerstones of classical mechanics and also has important implications in quantum mechanics.

Another fundamental property of Hamilton's equations is known as the condition of phase-space incompressibility. To understand this condition, consider writing Hamilton's equations directly in terms of the phase-space vector as

$$\dot{\mathrm{x}} = \eta(\mathrm{x}), \tag{1.6.23}$$

where $\eta(\mathrm{x})$ is a vector function of the phase-space vector $\mathrm{x}$. Since $\mathrm{x} = (q_1, \ldots, q_{3N}, p_1, \ldots, p_{3N})$, it follows that

$$\eta(\mathrm{x}) = \left(\frac{\partial H}{\partial p_1}, \ldots, \frac{\partial H}{\partial p_{3N}}, -\frac{\partial H}{\partial q_1}, \ldots, -\frac{\partial H}{\partial q_{3N}}\right). \tag{1.6.24}$$

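Equation (1.6.23) illustrates the fact that the general phase-space "velocity" $\dot{\mathrm{x}}$ is a function of $\mathrm{x}$, suggesting that motion in phase space described by eqn. (1.6.23) can be regarded as a kind of "flow field" as in hydrodynamics, where the flow pattern of a fluid is described by a velocity field $\mathbf{v}(\mathbf{r})$. Thus, at each point in phase space, there will be a velocity vector $\dot{\mathrm{x}}(\mathrm{x})$ equal to $\eta(\mathrm{x})$. In hydrodynamics, the condition for incompressible flow is that there be no sources or sinks in the flow, which means that the flow field is divergence free: $\nabla \cdot \mathbf{v}(\mathbf{r}) = 0$. In phase-space flow, the analogous condition is $\nabla_{\mathrm{x}} \cdot \dot{\mathrm{x}}(\mathrm{x}) = 0$, where $\nabla_{\mathrm{x}} = \partial/\partial \mathrm{x}$ is the phase-space gradient operator. Hamilton's equations of motion guarantee that the incompressibility condition in phase space is satisfied. To see this, consider the compressibility in generalized coordinates

$$\begin{aligned}
\nabla_{\mathrm{x}} \cdot \dot{\mathrm{x}} &= \sum_{\alpha=1}^{3N}\left[\frac{\partial \dot{p}_\alpha}{\partial p_\alpha} + \frac{\partial \dot{q}_\alpha}{\partial q_\alpha}\right] \\
&= \sum_{\alpha=1}^{3N}\left[-\frac{\partial}{\partial p_\alpha}\frac{\partial H}{\partial q_\alpha} + \frac{\partial}{\partial q_\alpha}\frac{\partial H}{\partial p_\alpha}\right] \\
&= \sum_{\alpha=1}^{3N}\left[-\frac{\partial^2 H}{\partial p_\alpha \partial q_\alpha} + \frac{\partial^2 H}{\partial q_\alpha \partial p_\alpha}\right] \\
&= 0,
\end{aligned} \tag{1.6.25}$$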

where the second line follows from Hamilton's equations of motion.

One final important property of Hamilton's equations that merits comment is the so-called symplectic structure of the equations of motion. Given the form of the vector function, $\eta(\mathrm{x})$, introduced above, it follows that Hamilton's equations can be recast as

$$\dot{\mathrm{x}} = \mathrm{M}\frac{\partial H}{\partial \mathrm{x}}, \tag{1.6.26}$$

where $\mathrm{M}$ is a matrix expressible in block form as

$$\mathrm{M} = \begin{pmatrix} 0 & \mathrm{I} \\ -\mathrm{I} & 0 \end{pmatrix}, \tag{1.6.27}$$

where $0$ and $\mathrm{I}$ are the $3N \times 3N$ zero and identity matrices, respectively. Dynamical systems expressible in the form of eqn. (1.6.26) are said to possess a symplectic structure. Consider a solution $\mathrm{x}_t$ to eqn. (1.6.26) starting from an initial condition $\mathrm{x}_0$. Because the solution of Hamilton's equations is unique for each initial condition, $\mathrm{x}_t$ will be a unique function of $\mathrm{x}_0$, that is, $\mathrm{x}_t = \mathrm{x}_t(\mathrm{x}_0)$. This dependence can be viewed as defining a variable transformation on the phase space from an initial set of phase-space coordinates $\mathrm{x}_0$ to a new set $\mathrm{x}_t$. The Jacobian matrix $\mathrm{J}$ of this transformation, whose elements are given by

$$J_{kl} = \frac{\partial x_t^k}{\partial x_0^l}, \tag{1.6.28}$$

satisfies the following condition:

$$\mathrm{M} = \mathrm{J}^{\mathrm{T}}\mathrm{M}\mathrm{J}, \tag{1.6.29}$$

where $\mathrm{J}^{\mathrm{T}}$ is the transpose of $\mathrm{J}$. Equation (1.6.29) is known as the symplectic property. We will have more to say about the symplectic property in Chapter 3. At this stage, however, let us illustrate the symplectic property in a simple example. Consider, once again, the harmonic oscillator $H = p^2/2m + kx^2/2$ with equations of motion

$$\dot{x} = \frac{\partial H}{\partial p} = \frac{p}{m}, \qquad \dot{p} = -\frac{\partial H}{\partial x} = -kx. \tag{1.6.30}$$

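The general solution to these for an initial condition $(x(0), p(0))$ is

$$\begin{aligned}
x(t) &= x(0)\cos\omega t + \frac{p(0)}{m\omega}\sin\omega t \\
p(t) &= p(0)\cos\omega t - m\omega\,x(0)\sin\omega t,
\end{aligned} \tag{1.6.31}$$

where $\omega = \sqrt{k/m}$ is the frequency of the oscillator. The Jacobian matrix is, therefore,

$$\mathrm{J} = \begin{pmatrix} \dfrac{\partial x(t)}{\partial x(0)} & \dfrac{\partial x(t)}{\partial p(0)} \\[2ex] \dfrac{\partial p(t)}{\partial x(0)} & \dfrac{\partial p(t)}{\partial p(0)} \end{pmatrix} = \begin{pmatrix} \cos\omega t & \dfrac{1}{m\omega}\sin\omega t \\[1ex] -m\omega\sin\omega t & \cos\omega t \end{pmatrix}. \tag{1.6.32}$$

For this two-dimensional phase space, the matrix $\mathrm{M}$ is given simply by

$$\mathrm{M} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. \tag{1.6.33}$$

Thus, performing the matrix multiplication $\mathrm{J}^{\mathrm{T}}\mathrm{M}\mathrm{J}$, we find

$$\begin{aligned}
\mathrm{J}^{\mathrm{T}}\mathrm{M}\mathrm{J} &= \begin{pmatrix} \cos\omega t & -m\omega\sin\omega t \\[1ex] \dfrac{1}{m\omega}\sin\omega t & \cos\omega t \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} \cos\omega t & \dfrac{1}{m\omega}\sin\omega t \\[1ex] -m\omega\sin\omega t & \cos\omega t \end{pmatrix} \\
&= \begin{pmatrix} \cos\omega t & -m\omega\sin\omega t \\[1ex] \dfrac{1}{m\omega}\sin\omega t & \cos\omega t \end{pmatrix} \begin{pmatrix} -m\omega\sin\omega t & \cos\omega t \\[1ex] -\cos\omega t & -\dfrac{1}{m\omega}\sin\omega t \end{pmatrix} \\
&= \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \mathrm{M},
\end{aligned} \tag{1.6.34}$$

showing that the symplectic condition is satisfied.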
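The matrix product in eqn. (1.6.34) can also be checked numerically. The sketch below builds the Jacobian of eqn. (1.6.32) for arbitrarily chosen values of $t$, $m$, and $\omega$ (these values and the helper names are assumptions) and forms $\mathrm{J}^{\mathrm{T}}\mathrm{M}\mathrm{J}$ with plain nested lists.

```python
import math

def jacobian(t, m, omega):
    """Jacobian of the harmonic-oscillator flow, eqn. (1.6.32)."""
    c, s = math.cos(omega * t), math.sin(omega * t)
    return [[c, s / (m * omega)],
            [-m * omega * s, c]]

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

M = [[0.0, 1.0],
     [-1.0, 0.0]]                        # eqn. (1.6.33)
J = jacobian(t=0.73, m=2.0, omega=1.9)   # arbitrary test values
JT_M_J = matmul(transpose(J), matmul(M, J))
print(JT_M_J)
```

The result should reproduce $\mathrm{M}$ to round-off for any $t$, since the off-diagonal entries reduce to $\pm(\cos^2\omega t + \sin^2\omega t)$.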

1.7 A simple classical polymer model

Before moving on to more formal developments, we present a simple classical model for a free polymer chain that can be solved analytically. This example, known as the Rouse model (Rouse, 1953), not only serves as a basis for more complex models of biological systems presented later but will also reappear in our discussion of quantum statistical mechanics. The model is illustrated in Fig. 1.7. It consists of a set of $N$ point particles connected by nearest neighbor harmonic springs for which the Hamiltonian is

$$H = \sum_{i=1}^{N} \frac{\mathbf{p}_i^2}{2m} + \frac{1}{2}\sum_{i=1}^{N-1} m\omega^2\left(|\mathbf{r}_i - \mathbf{r}_{i+1}| - b_i\right)^2, \tag{1.7.1}$$

where $b_i$ is the equilibrium bond length. For simplicity, all of the particles are assigned the same mass, $m$.

Fig. 1.7 The harmonic polymer model.

Consider a one-dimensional analog of eqn. (1.7.1) described by

$$H = \sum_{i=1}^{N} \frac{p_i^2}{2m} + \frac{1}{2}\sum_{i=1}^{N-1} m\omega^2\left(x_i - x_{i+1} - b_i\right)^2. \tag{1.7.2}$$

In order to simplify the problem, we begin by making a change of variables of the form

$$\eta_i = x_i - x_{i0}, \tag{1.7.3}$$

where $x_{i0} - x_{(i+1)0} = b_i$. The Hamiltonian in terms of the new variables and their conjugate momenta is given by

$$H = \sum_{i=1}^{N} \frac{p_{\eta_i}^2}{2m} + \frac{1}{2}\sum_{i=1}^{N-1} m\omega^2\left(\eta_i - \eta_{i+1}\right)^2. \tag{1.7.4}$$

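The equations of motion obeyed by this simple system can be obtained directly from Hamilton's equations and take the form

$$\begin{aligned}
\dot{\eta}_i &= \frac{p_{\eta_i}}{m} \\
\dot{p}_{\eta_1} &= -m\omega^2(\eta_1 - \eta_2) \\
\dot{p}_{\eta_i} &= -m\omega^2(2\eta_i - \eta_{i+1} - \eta_{i-1}), \qquad i = 2, \ldots, N-1 \\
\dot{p}_{\eta_N} &= -m\omega^2(\eta_N - \eta_{N-1}),
\end{aligned} \tag{1.7.5}$$

which can be expressed as second-order equations

$$\begin{aligned}
\ddot{\eta}_1 &= -\omega^2(\eta_1 - \eta_2) \\
\ddot{\eta}_i &= -\omega^2(2\eta_i - \eta_{i+1} - \eta_{i-1}), \qquad i = 2, \ldots, N-1 \\
\ddot{\eta}_N &= -\omega^2(\eta_N - \eta_{N-1}).
\end{aligned} \tag{1.7.6}$$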

In eqns. (1.7.5) and (1.7.6), it is understood that $\eta_0 = \eta_{N+1} = 0$, since these have no meaning in our system. Equations (1.7.6) must be solved subject to a set of initial conditions $\{\eta_1(0), \ldots, \eta_N(0), \dot{\eta}_1(0), \ldots, \dot{\eta}_N(0)\}$. The general solution to eqns. (1.7.6) can be written in the form of a Fourier series

$$\eta_i(t) = \sum_{k=1}^{N} C_k\,a_{ik}\,e^{i\omega_k t}, \tag{1.7.7}$$

where $\omega_k$ is a set of frequencies, $a_{ik}$ is a set of expansion coefficients, and $C_k$ is a complex scale factor. Substitution of this ansatz into eqns. (1.7.6) gives

$$\begin{aligned}
\sum_{k=1}^{N} C_k\,\omega_k^2\,a_{1k}\,e^{i\omega_k t} &= \omega^2\sum_{k=1}^{N} C_k\,e^{i\omega_k t}\left(a_{1k} - a_{2k}\right) \\
\sum_{k=1}^{N} C_k\,\omega_k^2\,a_{ik}\,e^{i\omega_k t} &= \omega^2\sum_{k=1}^{N} C_k\,e^{i\omega_k t}\left(2a_{ik} - a_{i+1,k} - a_{i-1,k}\right) \\
\sum_{k=1}^{N} C_k\,\omega_k^2\,a_{Nk}\,e^{i\omega_k t} &= \omega^2\sum_{k=1}^{N} C_k\,e^{i\omega_k t}\left(a_{Nk} - a_{N-1,k}\right).
\end{aligned} \tag{1.7.8}$$

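Since eqns. (1.7.8) must be satisfied independently for each function $\exp(i\omega_k t)$, we arrive at an eigenvalue equation of the form:

$$\omega_k^2\,\mathbf{a}_k = \mathrm{A}\,\mathbf{a}_k. \tag{1.7.9}$$

Here, $\mathrm{A}$ is a matrix given by

$$\mathrm{A} = \omega^2\begin{pmatrix}
1 & -1 & 0 & 0 & 0 & \cdots & 0 & 0 \\
-1 & 2 & -1 & 0 & 0 & \cdots & 0 & 0 \\
0 & -1 & 2 & -1 & 0 & \cdots & 0 & 0 \\
 & & & & \cdots & & & \\
0 & 0 & 0 & 0 & 0 & \cdots & -1 & 1
\end{pmatrix}, \tag{1.7.10}$$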

and the $\omega_k^2$ and $\mathbf{a}_k$ are the eigenvalues and eigenvectors, respectively. The square roots of the eigenvalues are frequencies that correspond to a set of special modes of the chain known as the normal modes. By diagonalizing the matrix $\mathrm{A}$, the frequencies can be shown to be

$$\omega_k^2 = 2\omega^2\left[1 - \cos\left(\frac{(k-1)\pi}{N}\right)\right]. \tag{1.7.11}$$

Moreover, the orthogonal matrix $\mathrm{U}$ whose columns are the eigenvectors $\mathbf{a}_k$ of $\mathrm{A}$ defines a transformation from the original displacement variables $\eta_i$ to a new set of variables $\zeta_i$ via

$$\zeta_i = \sum_{k} \eta_k\,U_{ki}, \tag{1.7.12}$$

known as normal mode variables. By applying this transformation to the Hamiltonian in eqn. (1.7.4), it can be shown that the transformed Hamiltonian is given by

$$H = \sum_{k=1}^{N} \frac{p_{\zeta_k}^2}{2m} + \frac{1}{2}\sum_{k=1}^{N} m\omega_k^2\,\zeta_k^2. \tag{1.7.13}$$
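Equation (1.7.11) can be tested against the matrix of eqn. (1.7.10) without a general-purpose eigenvalue routine. The sketch below assumes a closed form for the (unnormalized) eigenvectors, $a_{jk} = \cos[(2j-1)(k-1)\pi/2N]$, which is not given in the text, and verifies numerically that $\mathrm{A}\mathbf{a}_k = \omega_k^2\mathbf{a}_k$ holds for every $k$. The chain length and the value of $\omega$ are arbitrary test choices.

```python
import math

def chain_matrix(n, omega):
    """The matrix A of eqn. (1.7.10) for an open chain of n >= 2 particles."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0
        if i > 0:
            A[i][i - 1] = -1.0
        if i < n - 1:
            A[i][i + 1] = -1.0
    A[0][0] = 1.0            # the end particles have only one neighbor
    A[n - 1][n - 1] = 1.0
    return [[omega ** 2 * entry for entry in row] for row in A]

def mode_frequency_squared(k, n, omega):
    """omega_k^2 from eqn. (1.7.11), with k = 1, ..., n."""
    return 2.0 * omega ** 2 * (1.0 - math.cos((k - 1) * math.pi / n))

def trial_eigenvector(k, n):
    """Assumed closed form a_jk = cos((2j - 1)(k - 1) pi / 2n), j = 1, ..., n."""
    return [math.cos((2 * j - 1) * (k - 1) * math.pi / (2 * n))
            for j in range(1, n + 1)]

n, omega = 6, 1.3
A = chain_matrix(n, omega)
worst_residual = 0.0
for k in range(1, n + 1):
    a_k = trial_eigenvector(k, n)
    A_a = [sum(A[i][j] * a_k[j] for j in range(n)) for i in range(n)]
    w2 = mode_frequency_squared(k, n, omega)
    residual = max(abs(A_a[i] - w2 * a_k[i]) for i in range(n))
    worst_residual = max(worst_residual, residual)

print("largest entry of A a_k - omega_k^2 a_k:", worst_residual)
```

The $k = 1$ trial vector has all entries equal, which corresponds to a uniform displacement of every particle and an eigenvalue of zero.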

(The easiest way to derive this result is to start with the Lagrangian in terms of $\eta_1, \ldots, \eta_N, \dot{\eta}_1, \ldots, \dot{\eta}_N$, apply eqn. (1.7.12) to it, and then perform the Legendre transform to obtain the Hamiltonian. Alternatively, one can directly compute the inverse of the mass-metric tensor and substitute it directly into eqn. (1.6.10).) In eqn. (1.7.13), the normal modes are decoupled from each other and represent a set of independent modes with frequencies $\omega_k$. Note that independent of $N$, there is always one normal mode, the $k = 1$ mode, whose frequency is $\omega_1 = 0$. This zero-frequency mode corresponds to overall translations of the entire chain in space. In the absence of an external potential, this translational motion is free, with no associated frequency. Considering this fact, the equations of motion for each normal mode,

$$\ddot{\zeta}_k + \omega_k^2\,\zeta_k = 0, \tag{1.7.14}$$

can now be solved analytically:

$$\begin{aligned}
\zeta_1(t) &= \zeta_1(0) + \frac{p_{\zeta_1}(0)}{m}\,t \\
\zeta_k(t) &= \zeta_k(0)\cos\omega_k t + \frac{p_{\zeta_k}(0)}{m\omega_k}\sin\omega_k t, \qquad k = 2, \ldots, N,
\end{aligned} \tag{1.7.15}$$

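where $\zeta_1(0), \ldots, \zeta_N(0), p_{\zeta_1}(0), \ldots, p_{\zeta_N}(0)$ are the initial conditions on the normal mode variables, obtainable by transformation of the initial conditions of the original coordinates. Note that $p_{\zeta_1}(t) = p_{\zeta_1}(0)$ is the constant momentum of the free zero-frequency mode.

In order to better understand the physical meaning of the normal modes, consider the simple case of $N = 3$. In this case, there are three normal mode frequencies given by

$$\omega_1 = 0, \qquad \omega_2 = \omega, \qquad \omega_3 = \sqrt{3}\,\omega. \tag{1.7.16}$$

Moreover, the orthogonal transformation matrix is given by

$$\mathrm{U} = \begin{pmatrix}
\dfrac{1}{\sqrt{3}} & \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{6}} \\[2ex]
\dfrac{1}{\sqrt{3}} & 0 & -\dfrac{2}{\sqrt{6}} \\[2ex]
\dfrac{1}{\sqrt{3}} & -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{6}}
\end{pmatrix}. \tag{1.7.17}$$

Therefore, the three normal mode variables corresponding to each of these frequencies are given by

$$\begin{aligned}
\omega_1 = 0: &\qquad \zeta_1 = \frac{1}{\sqrt{3}}\left(\eta_1 + \eta_2 + \eta_3\right) \\
\omega_2 = \omega: &\qquad \zeta_2 = \frac{1}{\sqrt{2}}\left(\eta_1 - \eta_3\right) \\
\omega_3 = \sqrt{3}\,\omega: &\qquad \zeta_3 = \frac{1}{\sqrt{6}}\left(\eta_1 - 2\eta_2 + \eta_3\right).
\end{aligned} \tag{1.7.18}$$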
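For $N = 3$, the statements above can be verified directly: the matrix of eqn. (1.7.17) should be orthogonal, and it should bring the matrix of eqn. (1.7.10) to diagonal form with entries $0$, $\omega^2$, and $3\omega^2$. The sketch below works in units where $\omega = 1$ (an assumption made for convenience) and uses plain nested lists. The helper names are illustrative.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

# Eqn. (1.7.10) for N = 3, written in units of omega^2.
A = [[1.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 1.0]]

s2, s3, s6 = math.sqrt(2.0), math.sqrt(3.0), math.sqrt(6.0)
# Eqn. (1.7.17): the columns are the normalized eigenvectors of A.
U = [[1.0 / s3, 1.0 / s2, 1.0 / s6],
     [1.0 / s3, 0.0, -2.0 / s6],
     [1.0 / s3, -1.0 / s2, 1.0 / s6]]

UT_U = matmul(transpose(U), U)                 # identity if U is orthogonal
UT_A_U = matmul(transpose(U), matmul(A, U))    # diagonal if the columns are eigenvectors
squared_frequencies = [UT_A_U[k][k] for k in range(3)]
print(squared_frequencies)
```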

These three modes are illustrated in Fig. 1.8. Again, the zero-frequency mode corresponds to overall translations of the chain. The $\omega_2$ mode corresponds to the motion of the two outer particles in opposite directions, with the central particle remaining fixed. This is known as the symmetric stretch mode. The highest frequency $\omega_3$ mode corresponds to motion of the two outer particles in the same direction, with the central particle oscillating out of phase with them. This is known as the asymmetric stretch mode.

On a final note, a more realistic model for real molecules should involve additional terms beyond just the harmonic bond interactions of eqn. (1.7.1). Specifically, there should be potential energy terms associated with bend angle and dihedral angle motion. For now, we hope that this simple harmonic polymer model illustrates the types of techniques used to solve classical problems. Indeed, the use of normal modes as a method for efficiently simulating the dynamics of biomolecules has been proposed (Sweet et al., 2008). In general, for anharmonic potentials, one can only obtain local normal modes by diagonalizing the Hessian of the potential $H_{\alpha\beta} = \partial^2 U/\partial q_\alpha \partial q_\beta$ in the vicinity of a point in configuration space.

Fig. 1.8 Normal modes of the harmonic polymer model for N = 3 particles.

1.8 The action integral

Having introduced the Lagrangian formulation of classical mechanics and derived the Hamiltonian formalism from it using the Legendre transform, it is natural to ask if there is a more fundamental principle that leads to the Euler-Lagrange equations. In fact, we will show that the latter can be obtained from a variational principle applied to a certain integral quantity, known as the action integral. At this stage, however, we shall introduce the action integral concept without motivation because in Chapter 12, we will show that the action integral emerges naturally and elegantly from quantum mechanics. The variational principle to be laid out here has more than formal significance. It has been adapted for actual trajectory calculations for large biological macromolecules by Olender and Elber (1996) and by Passerone and Parrinello (2001).

In order to define the action integral, we consider a classical system with generalized coordinates $q_1, \ldots, q_{3N}$ and velocities $\dot{q}_1, \ldots, \dot{q}_{3N}$. For notational simplicity, let us denote by $Q$ the full set of coordinates $Q \equiv \{q_1, \ldots, q_{3N}\}$ and $\dot{Q}$ the full set of velocities $\dot{Q} \equiv \{\dot{q}_1, \ldots, \dot{q}_{3N}\}$. Suppose we follow the evolution of the system from time $t_1$ to $t_2$ with initial and final conditions $(Q_1, \dot{Q}_1)$ and $(Q_2, \dot{Q}_2)$, respectively, and we ask what path the system will take between these two points (see Fig. 1.9). We will show that the path followed renders stationary the following integral:

$$A = \int_{t_1}^{t_2} L(Q(t), \dot{Q}(t))\,dt. \tag{1.8.1}$$
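To make eqn. (1.8.1) concrete, the action can be approximated on a time grid for any trial path with fixed endpoints. The sketch below does this for the one-dimensional harmonic oscillator, $L = m\dot{x}^2/2 - kx^2/2$, and compares the action of the known classical path $x(t) = \sin t$ (for $m = k = 1$) with that of nearby paths $x(t) + \epsilon\sin(\pi t/T)$ sharing the same endpoints. The discretization (finite-difference velocities, midpoint positions), the grid size, and the shape of the perturbation are all assumptions of this illustration. If the classical path is stationary, the change in the action should scale as $\epsilon^2$ rather than $\epsilon$.

```python
import math

def discrete_action(path, dt, m, k):
    """Approximate the action of eqn. (1.8.1) for L = m v^2/2 - k x^2/2.

    Velocities are finite differences over each segment and positions are
    segment midpoints (an assumed discretization)."""
    action = 0.0
    for x0, x1 in zip(path[:-1], path[1:]):
        v = (x1 - x0) / dt
        x_mid = 0.5 * (x0 + x1)
        action += dt * (0.5 * m * v * v - 0.5 * k * x_mid * x_mid)
    return action

m, k, T, n = 1.0, 1.0, 1.0, 2000
dt = T / n
times = [i * dt for i in range(n + 1)]
classical = [math.sin(t) for t in times]             # solves m x'' = -k x
bump = [math.sin(math.pi * t / T) for t in times]    # vanishes at both endpoints

def action_change(eps):
    """Action of the perturbed path minus action of the classical path."""
    varied = [x + eps * b for x, b in zip(classical, bump)]
    return discrete_action(varied, dt, m, k) - discrete_action(classical, dt, m, k)

delta_small = action_change(1e-3)
delta_large = action_change(1e-2)
print(delta_small, delta_large, delta_large / delta_small)
```

Increasing $\epsilon$ by a factor of 10 should increase the change in the action by a factor close to 100, which is the numerical signature of a vanishing first-order variation.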

Fig. 1.9 Two proposed paths joining the fixed endpoints. The actual path followed is a stationary path of the action integral in eqn. (1.8.1).

The integral in eqn. (1.8.1) is known as the action integral. We see immediately that the action integral depends on the entire trajectory of the system. Moreover, as specified, the action integral does not refer to one particular trajectory but to any trajectory that takes the system from $(Q_1, \dot{Q}_1)$ to $(Q_2, \dot{Q}_2)$ in a time $t_2 - t_1$. Each trajectory satisfying these conditions yields a different value of the action. Thus, the action can be viewed as a “function” of trajectories that satisfy the initial and final conditions. However, this is not a function in the usual sense since the action is really a “function of a function.” In mathematical terminology, we say that the action is a functional of the trajectory, which is specified by writing $A$ as $A[Q]$. A functional is a quantity that depends on all values of a function between two points of its domain. (For readers unfamiliar with functionals, an introduction is provided in Appendix B.) Here, the action is a functional of trajectories $Q(t)$ leading from $Q_1$ to $Q_2$ in time $t_2 - t_1$. Also, since at each $t$, $L(Q(t), \dot{Q}(t))$ depends only on the path at time $t$ (and not at other times), $A[Q]$ is known as a local functional in time. Functionals will appear from time to time throughout the book, so it is important to become familiar with these objects.

Stationarity of the action means that the action does not change to first order if a small variation of a path is made keeping the endpoints fixed. In order to see that the true classical path of the system is a stationary point of $A$, we need to consider a path $Q(t)$ between points 1 and 2 and a second path, $Q(t) + \delta Q(t)$, between points 1 and 2 that is only slightly different from $Q(t)$. If a path $Q(t)$ renders $A[Q]$ stationary, then to first order in $\delta Q(t)$, the variation $\delta A$ of the action must vanish. This can be shown by first noting that the path $Q(t)$ satisfies the initial and final conditions:

\[
Q(t_1) = Q_1, \qquad \dot{Q}(t_1) = \dot{Q}_1, \qquad Q(t_2) = Q_2, \qquad \dot{Q}(t_2) = \dot{Q}_2. \tag{1.8.2}
\]

Since all paths begin at $Q_1$ and end at $Q_2$, the path $Q(t) + \delta Q(t)$ must also satisfy these conditions, and since $Q(t)$ already satisfies eqn. (1.8.2), the function $\delta Q(t)$ must satisfy

\[
\delta Q(t_1) = \delta Q(t_2) = 0, \qquad \delta \dot{Q}(t_1) = \delta \dot{Q}(t_2) = 0. \tag{1.8.3}
\]

The variation in the action is defined to be the difference

\[
\delta A = \int_{t_1}^{t_2} L(Q(t) + \delta Q(t), \dot{Q}(t) + \delta \dot{Q}(t))\, dt - \int_{t_1}^{t_2} L(Q(t), \dot{Q}(t))\, dt. \tag{1.8.4}
\]

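This variation must vanish to first order in the path difference $\delta Q(t)$. Expanding to first order, we find:

\[
\begin{aligned}
\delta A &= \int_{t_1}^{t_2} L(Q(t), \dot{Q}(t))\, dt + \int_{t_1}^{t_2} \sum_{\alpha=1}^{3N} \left[ \frac{\partial L}{\partial q_\alpha} \delta q_\alpha(t) + \frac{\partial L}{\partial \dot{q}_\alpha} \delta \dot{q}_\alpha(t) \right] dt - \int_{t_1}^{t_2} L(Q(t), \dot{Q}(t))\, dt \\
&= \int_{t_1}^{t_2} \sum_{\alpha=1}^{3N} \left[ \frac{\partial L}{\partial q_\alpha} \delta q_\alpha(t) + \frac{\partial L}{\partial \dot{q}_\alpha} \delta \dot{q}_\alpha(t) \right] dt.
\end{aligned}
\tag{1.8.5}
\]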

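We would like the term in brackets to involve only $\delta q_\alpha(t)$ rather than both $\delta q_\alpha(t)$ and $\delta \dot{q}_\alpha(t)$ as it currently does. We thus integrate the second term in brackets by parts to yield

\[
\delta A = \sum_{\alpha=1}^{3N} \frac{\partial L}{\partial \dot{q}_\alpha} \delta q_\alpha(t) \Bigg|_{t_1}^{t_2} + \int_{t_1}^{t_2} \sum_{\alpha=1}^{3N} \left[ \frac{\partial L}{\partial q_\alpha} - \frac{d}{dt} \left( \frac{\partial L}{\partial \dot{q}_\alpha} \right) \right] \delta q_\alpha(t)\, dt. \tag{1.8.6}
\]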

The boundary term vanishes by virtue of eqn. (1.8.3). Then, since $\delta A = 0$ to first order in $\delta q_\alpha(t)$ at a stationary point, and each of the generalized coordinates $q_\alpha$ and their variations $\delta q_\alpha$ are independent, the term in brackets must vanish independently for each $\alpha$. This leads to the condition

\[
\frac{d}{dt} \left( \frac{\partial L}{\partial \dot{q}_\alpha} \right) - \frac{\partial L}{\partial q_\alpha} = 0, \tag{1.8.7}
\]

which is just the Euler-Lagrange equation. The implication is that the path for which the action is stationary is that which satisfies the Euler-Lagrange equation. Since the latter specifies the classical motion, the path is a classical path.

There is a subtle difference, however, between classical paths that satisfy the endpoint conditions specified in the formulation of the action and those generated from a set of initial conditions as discussed in Section 1.2. In particular, if an initial-value problem has a solution, then it is unique, assuming smooth, well-behaved forces. By contrast, if a solution exists to the endpoint problem, it is not guaranteed to be a unique solution. However, it is trivial to see that if a trajectory with initial conditions $Q(t_1)$ and $\dot{Q}(t_1)$ passes through the point $Q_2$ at $t = t_2$, then it must also be a solution of the endpoint problem. Fortunately, in statistical mechanics, this distinction is not very important, as we are never interested in the unique trajectory arising from one particular initial condition, and in fact, initial conditions for Hamilton’s equations are generally chosen at random (e.g., random velocities). Typically, we are interested in the behavior of large numbers of trajectories all seeded differently. Similarly, we are rarely interested in paths leading from one specific point in phase space to another as much as paths that evolve from one region of phase space to another. Therefore, the initial-value and endpoint formulations of classical trajectories can often be two routes to the solution of a particular problem.

The action principle suggests the intriguing possibility that classical trajectories could be computed from an optimization procedure performed on the action given knowledge of the endpoints of the trajectory. This idea has been exploited by various researchers to study complex processes such as protein folding. As formulated, however, stationarity of the action does not imply that the action is minimum along a classical trajectory, and, indeed, the action is bounded neither from above nor below. In order to overcome this difficulty, alternative formulations of the action principle have been proposed which employ an action or a variational principle that leads to a minimization problem. The most well known of these is Hamilton’s principle of least action. The least action principle involves a somewhat different type of variational principle in which the variations are not required to vanish at the endpoints. A detailed discussion of this type of variation, which is beyond the scope of this book, can be found in Goldstein’s Classical Mechanics (1980).
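To make the endpoint formulation concrete, the stationarity condition can be imposed on a discretized action. The sketch below is only an illustration under stated assumptions: a one-dimensional harmonic oscillator with $L = \frac{1}{2}m\dot{q}^2 - \frac{1}{2}kq^2$, a uniform time grid, and a simple finite-difference form of the action. The function names `stationary_path` and `action` are hypothetical, and only the positions are fixed at the two endpoints. Setting the derivative of the discrete action with respect to each interior point to zero gives a linear system for the path.

```python
import numpy as np

def stationary_path(q_start, q_end, T, n, m=1.0, k=1.0):
    """Path on n+1 grid points that makes the discretized oscillator action stationary.

    Setting dA/dq_j = 0 for the interior points j = 1..n-1 gives
        q_{j-1} - (2 - k dt^2 / m) q_j + q_{j+1} = 0,
    a tridiagonal linear system with the fixed endpoints moved to the right-hand side.
    """
    dt = T / n
    main = -(2.0 - k * dt**2 / m) * np.ones(n - 1)
    off = np.ones(n - 2)
    M = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    rhs = np.zeros(n - 1)
    rhs[0] -= q_start
    rhs[-1] -= q_end
    q = np.empty(n + 1)
    q[0], q[-1] = q_start, q_end
    q[1:-1] = np.linalg.solve(M, rhs)
    return q

def action(q, T, m=1.0, k=1.0):
    """Discretized action; the potential at the fixed endpoints is a constant and is omitted."""
    dt = T / (len(q) - 1)
    kinetic = 0.5 * m * np.sum(np.diff(q) ** 2) / dt
    potential = 0.5 * k * np.sum(q[1:-1] ** 2) * dt
    return kinetic - potential

T, n = 1.0, 200
t = np.linspace(0.0, T, n + 1)
q = stationary_path(0.0, np.sin(T), T, n)
print("max deviation from sin(t):", np.max(np.abs(q - np.sin(t))))
```

With these endpoints the analytic classical path is $q(t) = \sin t$, so the printed deviation measures the discretization error. Adding a small perturbation that vanishes at both endpoints changes `action` only at second order in the perturbation, which is the discrete counterpart of $\delta A = 0$.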

1.9 Lagrangian mechanics and systems with constraints

In mechanics, it is often necessary to treat a system that is subject to a set of externally imposed constraints. These constraints can be imposed as a matter of convenience, e.g. constraining high-frequency chemical bonds in a molecule at fixed bond lengths, or as true constraints that might be due, for example, to the physical boundaries of a system or the presence of thermal or barostatic control mechanisms. Constraints are expressible as mathematical relations among the phase-space variables. Thus, a system with $N_c$ constraints will have $3N - N_c$ degrees of freedom and a set of $N_c$ functions of the coordinates and velocities that must be satisfied by the motion of the system. Constraints are divided into two types. If the relationships that must be satisfied along a trajectory are functions of only the particle coordinates $q_1, ..., q_{3N}$ and possibly time, then the constraints are called holonomic and can be expressed as $N_c$ conditions of the form

\[
\sigma_k(q_1, ..., q_{3N}, t) = 0, \qquad k = 1, ..., N_c. \tag{1.9.1}
\]

If they cannot be expressed in this manner, the constraints are said to be nonholonomic. A class of nonholonomic constraints consists of conditions involving both the particle positions and velocities:

\[
\zeta(q_1, ..., q_{3N}, \dot{q}_1, ..., \dot{q}_{3N}) = 0. \tag{1.9.2}
\]


An example of a nonholonomic constraint is a system whose total kinetic energy is kept fixed (thermodynamically, this would be a way of fixing the temperature of the system). The nonholonomic constraint in Cartesian coordinates would then be expressed as

\[
\frac{1}{2} \sum_{i=1}^{N} m_i \dot{\mathbf{r}}_i^2 - C = 0, \tag{1.9.3}
\]

where $C$ is a constant. Since constraints reduce the number of degrees of freedom in a system, it is often possible to choose a new system of $3N - N_c$ generalized coordinates, known as a minimal set of coordinates, that eliminates the constraints. For example, consider the motion of a particle on the surface of a sphere. If the motion is described in terms of Cartesian coordinates $(x, y, z)$, then a constraint condition of the form $x^2 + y^2 + z^2 - R^2 = 0$, where $R$ is the radius of the sphere, must be imposed at all times. This constraint could be eliminated by choosing the spherical polar angles $\theta$ and $\phi$ as generalized coordinates. However, it is not always convenient to work in such minimal coordinate frames, particularly when there is a large number of coupled constraints. An example of this is a long hydrocarbon chain in which all carbon-carbon bonds are held rigid (an approximation, as noted earlier, that is often made to eliminate the high-frequency vibrational motion). Thus, it is important to consider how the framework of classical mechanics is affected by the imposition of constraints. We will now show that the Lagrangian formulation of mechanics allows the influence of constraints to be incorporated into its framework in a transparent way.

In general, it would seem that the imposition of constraints no longer allows the equations of motion to be obtained from the stationarity of the action, since the coordinates (and/or velocities) are no longer independent. More specifically, the path displacements $\delta q_\alpha$ (cf. eqn. (1.8.6)) are no longer independent. In fact, the constraints can be built into the action formalism using the method of Lagrange undetermined multipliers. However, in order to apply this method, the constraint conditions must be expressible in a differential form as:

\[
\sum_{\alpha=1}^{3N} a_{k\alpha}\, dq_\alpha + a_{kt}\, dt = 0, \qquad k = 1, ..., N_c, \tag{1.9.4}
\]

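where $a_{k\alpha}$ is a set of coefficients for the displacements $dq_\alpha$. For a holonomic constraint as in eqn. (1.9.1), it is clear that the coefficients can be obtained by differentiating the constraint condition

\[
\sum_{\alpha=1}^{3N} \frac{\partial \sigma_k}{\partial q_\alpha}\, dq_\alpha + \frac{\partial \sigma_k}{\partial t}\, dt = 0 \tag{1.9.5}
\]

so that

\[
a_{k\alpha} = \frac{\partial \sigma_k}{\partial q_\alpha}, \qquad a_{kt} = \frac{\partial \sigma_k}{\partial t}. \tag{1.9.6}
\]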
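As a small illustration of eqns. (1.9.5) and (1.9.6), consider a single rigid bond between two particles. The sketch below assumes the constraint is written as $\sigma = |\mathbf{r}_1 - \mathbf{r}_2|^2 - d^2$ (one common choice among several equivalent forms); the function names are hypothetical. The coefficients $a_\alpha$ are the partial derivatives of $\sigma$, and $a_t = 0$ because $\sigma$ has no explicit time dependence.

```python
def bond_constraint(q, d):
    """sigma(q) = |r1 - r2|^2 - d^2 for q = (x1, y1, z1, x2, y2, z2)."""
    return sum((q[i] - q[i + 3]) ** 2 for i in range(3)) - d ** 2

def bond_coefficients(q):
    """a_alpha = d(sigma)/d(q_alpha), eqn. (1.9.6); a_t = 0 for this constraint."""
    diff = [q[i] - q[i + 3] for i in range(3)]
    return [2.0 * c for c in diff] + [-2.0 * c for c in diff]

# Compare the first-order change sum_alpha a_alpha dq_alpha with the actual change in sigma.
q = [0.1, 0.2, 0.3, 1.0, -0.5, 0.7]
dq = [1e-6, -2e-6, 3e-6, -1e-6, 2e-6, 1e-6]
a = bond_coefficients(q)
linear = sum(ai * hi for ai, hi in zip(a, dq))
exact = bond_constraint([x + h for x, h in zip(q, dq)], 1.0) - bond_constraint(q, 1.0)
print("first-order change:", linear)
print("actual change:     ", exact)
```

The two printed numbers agree up to terms of second order in the displacement, which is the content of the differential form in eqn. (1.9.5).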

Nonholonomic constraints cannot always be expressed in the form of eqn. (1.9.4). A notable exception is the kinetic energy constraint of eqn. (1.9.3):

\[
\begin{aligned}
\sum_{i=1}^{N} \frac{1}{2} m_i \dot{\mathbf{r}}_i^2 - C &= 0 \\
\sum_{i=1}^{N} \frac{1}{2} m_i \dot{\mathbf{r}}_i \cdot \left( \frac{d\mathbf{r}_i}{dt} \right) - C &= 0 \\
\sum_{i=1}^{N} \frac{1}{2} m_i \dot{\mathbf{r}}_i \cdot d\mathbf{r}_i - C\, dt &= 0
\end{aligned}
\tag{1.9.7}
\]

so that

\[
\mathbf{a}_{1i} = \frac{1}{2} m_i \dot{\mathbf{r}}_i, \qquad a_{1t} = -C \tag{1.9.8}
\]

($k = 1$ since there is only a single constraint). Assuming that the constraints can be expressed in the differential form of eqn. (1.9.4), we must also be able to express them in terms of path displacements $\delta q_\alpha$ in order to incorporate them into the action principle. Unfortunately, doing so requires a further restriction, since it is not possible to guarantee that a perturbed path $Q(t) + \delta Q(t)$ satisfies the constraints. The latter will hold if the constraints are integrable, in which case they are expressible in terms of path displacements as

\[
\sum_{\alpha=1}^{3N} a_{k\alpha}\, \delta q_\alpha = 0. \tag{1.9.9}
\]

The coefficient $a_{kt}$ does not appear in eqn. (1.9.9) because there is no time displacement. The equations of motion can then be obtained by adding eqn. (1.9.9) to eqn. (1.8.6) with a set of Lagrange undetermined multipliers, $\lambda_k$, where there is one multiplier for each constraint, according to

\[
\delta A = \int_{t_1}^{t_2} \sum_{\alpha=1}^{3N} \left[ \frac{\partial L}{\partial q_\alpha} - \frac{d}{dt} \left( \frac{\partial L}{\partial \dot{q}_\alpha} \right) + \sum_{k=1}^{N_c} \lambda_k a_{k\alpha} \right] \delta q_\alpha(t)\, dt. \tag{1.9.10}
\]

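The equations of motion obtained by requiring that $\delta A = 0$ are then

\[
\frac{d}{dt} \left( \frac{\partial L}{\partial \dot{q}_\alpha} \right) - \frac{\partial L}{\partial q_\alpha} = \sum_{k=1}^{N_c} \lambda_k a_{k\alpha}. \tag{1.9.11}
\]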

It may seem that we are still relying on the independence of the displacements $\delta q_\alpha$, but this is actually not the case. Suppose we choose the first $3N - N_c$ coordinates to be independent. Then, these coordinates can be evolved using eqns. (1.9.11). However, we can choose $\lambda_k$ such that eqns. (1.9.11) apply to the remaining $N_c$ coordinates as well. In this case, eqns. (1.9.11) hold for all $3N$ coordinates provided they are solved subject to the constraint conditions. The latter can be expressed as a set of $N_c$ differential equations of the form

\[
\sum_{\alpha=1}^{3N} a_{k\alpha} \dot{q}_\alpha + a_{kt} = 0. \tag{1.9.12}
\]


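Equations (1.9.11) together with eqn. (1.9.12) constitute a set of $3N + N_c$ equations for the $3N + N_c$ unknowns, $q_1, ..., q_{3N}, \lambda_1, ..., \lambda_{N_c}$. This is the most common approach used in numerical solutions of classical-mechanical problems. Note that, even if a system is subject to a set of time-independent holonomic constraints ($a_{kt} = 0$), the Hamiltonian is still conserved. In order to see this, note that eqns. (1.9.11) and (1.9.12) can be cast in Hamiltonian form as

\[
\begin{aligned}
\dot{q}_\alpha &= \frac{\partial H}{\partial p_\alpha} \\
\dot{p}_\alpha &= -\frac{\partial H}{\partial q_\alpha} + \sum_{k=1}^{N_c} \lambda_k a_{k\alpha} \\
\sum_{\alpha=1}^{3N} a_{k\alpha} \frac{\partial H}{\partial p_\alpha} &= 0.
\end{aligned}
\tag{1.9.13}
\]

Computing the time derivative of the Hamiltonian, we obtain

\[
\begin{aligned}
\frac{dH}{dt} &= \sum_{\alpha=1}^{3N} \left[ \frac{\partial H}{\partial q_\alpha} \dot{q}_\alpha + \frac{\partial H}{\partial p_\alpha} \dot{p}_\alpha \right] \\
&= \sum_{\alpha=1}^{3N} \left[ \frac{\partial H}{\partial q_\alpha} \frac{\partial H}{\partial p_\alpha} - \frac{\partial H}{\partial p_\alpha} \left( \frac{\partial H}{\partial q_\alpha} - \sum_{k=1}^{N_c} \lambda_k a_{k\alpha} \right) \right] \\
&= \sum_{k=1}^{N_c} \lambda_k \sum_{\alpha=1}^{3N} a_{k\alpha} \frac{\partial H}{\partial p_\alpha} \\
&= 0,
\end{aligned}
\tag{1.9.14}
\]

where the last line follows from the third of eqns. (1.9.13).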

From this, it is clear that no work is done on a system by the imposition of holonomic constraints.

1.10 Gauss’s principle of least constraint

The constrained equations of motion (1.9.11) and (1.9.12) constitute a complete set of equations for the motion subject to the $N_c$ constraint conditions. Let us study these equations in more detail. To keep the notation simple, let us consider just a single particle in three dimensions described by a Cartesian position vector $\mathbf{r}(t)$ subject to a single constraint $\sigma(\mathbf{r}) = 0$. According to eqns. (1.9.11) and (1.9.12), the constrained equations of motion take the form

\[
\begin{aligned}
m\ddot{\mathbf{r}} &= \mathbf{F}(\mathbf{r}) + \lambda \nabla\sigma \\
\nabla\sigma \cdot \dot{\mathbf{r}} &= 0.
\end{aligned}
\tag{1.10.1}
\]

These equations will generate classical trajectories of the system for different initial conditions $\{\mathbf{r}(0), \dot{\mathbf{r}}(0)\}$ provided the condition $\sigma(\mathbf{r}(0)) = 0$ is satisfied. If this condition is true, then the trajectory will obey $\sigma(\mathbf{r}(t)) = 0$. Conversely, for each $\mathbf{r}$ visited along the trajectory, the condition $\sigma(\mathbf{r}) = 0$ will be satisfied. The latter condition defines a surface on which the motion described by eqns. (1.10.1) must remain. This surface is called the surface of constraint. The quantity $\nabla\sigma(\mathbf{r})$ is a vector that is orthogonal to the surface at each point $\mathbf{r}$. Thus, the second of eqns. (1.10.1) expresses the fact that the velocity must also lie in the surface of constraint, hence it must be perpendicular to $\nabla\sigma(\mathbf{r})$. Of the two force terms appearing in eqns. (1.10.1), the first is an “unconstrained” force which, alone, would allow the particle to drift off the surface of constraint. The second term must, then, correct for this tendency. If the particle starts from rest, this second term exactly removes the component of the force perpendicular to the surface of constraint as illustrated in Fig. 1.10. This minimal projection of the force, first conceived by Karl Friedrich Gauss (1777-1855), is known as Gauss’s principle of least constraint (Gauss, 1829).

Fig. 1.10 Representation of the minimal force projection embodied in Gauss’s principle of least constraint.

The component of the force perpendicular to the surface is

\[
\mathbf{F}_\perp(\mathbf{r}) = [\mathbf{n}(\mathbf{r}) \cdot \mathbf{F}(\mathbf{r})]\, \mathbf{n}(\mathbf{r}), \tag{1.10.2}
\]

where $\mathbf{n}(\mathbf{r})$ is a unit vector perpendicular to the surface at $\mathbf{r}$; $\mathbf{n}$ is given by

\[
\mathbf{n}(\mathbf{r}) = \frac{\nabla\sigma(\mathbf{r})}{|\nabla\sigma(\mathbf{r})|}. \tag{1.10.3}
\]

Thus, the component of the force parallel to the surface is

\[
\mathbf{F}_\parallel(\mathbf{r}) = \mathbf{F}(\mathbf{r}) - \mathbf{F}_\perp(\mathbf{r}) = \mathbf{F}(\mathbf{r}) - [\mathbf{n}(\mathbf{r}) \cdot \mathbf{F}(\mathbf{r})]\, \mathbf{n}(\mathbf{r}). \tag{1.10.4}
\]
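The projection in eqns. (1.10.2) to (1.10.4) is short enough to write out directly. The following sketch uses plain Python lists for vectors; the helper names `dot` and `parallel_force` and the example of a particle at the top of a unit sphere under a gravity-like force are illustrative assumptions.

```python
import math

def dot(a, b):
    """Dot product of two vectors given as sequences of equal length."""
    return sum(x * y for x, y in zip(a, b))

def parallel_force(F, grad_sigma):
    """F_par = F - (n . F) n with n = grad_sigma / |grad_sigma|, eqns. (1.10.2)-(1.10.4)."""
    norm = math.sqrt(dot(grad_sigma, grad_sigma))
    n = [g / norm for g in grad_sigma]
    n_dot_F = dot(n, F)
    return [f - n_dot_F * c for f, c in zip(F, n)]

# Sphere constraint sigma = x^2 + y^2 + z^2 - R^2 has grad(sigma) = 2 r.
r = [0.0, 0.0, 1.0]
grad_sigma = [2.0 * c for c in r]
F = [1.0, 2.0, -9.8]
print(parallel_force(F, grad_sigma))
```

At the top of the sphere the surface normal is along $z$, so the projection simply removes the $z$-component of the force and leaves the tangential components untouched.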

If the particle is not at rest, the projection of the force cannot lie entirely in the surface of constraint. Rather, there must be an additional component of the projection which can project out any free motion of the particle directed off the surface of constraint. This additional term must sense the curvature of the surface in order to effect the required projection; it must also be a minimal projection perpendicular to the surface.

In order to show that Gauss’s principle is consistent with the Lagrangian formulation of the constraint problem and find the additional projection when the particle’s velocity is not zero, we make use of the second of eqns. (1.10.1) and differentiate it once with respect to time. This yields:

\[
\nabla\sigma \cdot \ddot{\mathbf{r}} + \nabla\nabla\sigma \cdot\cdot\, \dot{\mathbf{r}}\dot{\mathbf{r}} = 0, \tag{1.10.5}
\]

where the double dot-product notation in the expression $\nabla\nabla\sigma \cdot\cdot\, \dot{\mathbf{r}}\dot{\mathbf{r}}$ indicates a full contraction of the two tensors $\nabla\nabla\sigma$ and $\dot{\mathbf{r}}\dot{\mathbf{r}}$. The first of eqns. (1.10.1) is then used to substitute in for the second time derivative appearing in eqn. (1.10.5) to yield:

\[
\nabla\sigma \cdot \left[ \frac{\mathbf{F}}{m} + \frac{\lambda \nabla\sigma}{m} \right] + \nabla\nabla\sigma \cdot\cdot\, \dot{\mathbf{r}}\dot{\mathbf{r}} = 0. \tag{1.10.6}
\]

We can now solve for the Lagrange multiplier $\lambda$ to yield the analytical expression

\[
\lambda = -\frac{\nabla\nabla\sigma \cdot\cdot\, \dot{\mathbf{r}}\dot{\mathbf{r}} + \nabla\sigma \cdot \mathbf{F}/m}{|\nabla\sigma|^2/m}. \tag{1.10.7}
\]

Finally, substituting eqn. (1.10.7) back into the first of eqns. (1.10.1) yields the equation of motion:

\[
m\ddot{\mathbf{r}} = \mathbf{F} - \frac{\nabla\nabla\sigma \cdot\cdot\, \dot{\mathbf{r}}\dot{\mathbf{r}} + \nabla\sigma \cdot \mathbf{F}/m}{|\nabla\sigma|^2/m}\, \nabla\sigma, \tag{1.10.8}
\]

which is known as Gauss’s equation of motion. Note that when $\dot{\mathbf{r}} = 0$, the total force appearing on the right is

\[
\mathbf{F} - \frac{(\nabla\sigma \cdot \mathbf{F})\nabla\sigma}{|\nabla\sigma|^2} = \mathbf{F} - \left( \frac{\nabla\sigma}{|\nabla\sigma|} \cdot \mathbf{F} \right) \frac{\nabla\sigma}{|\nabla\sigma|}, \tag{1.10.9}
\]

which is just the projected force in eqn. (1.10.4). For $\dot{\mathbf{r}} \neq 0$, the additional force term involves a curvature term $\nabla\nabla\sigma$ contracted with the velocity-vector dyad $\dot{\mathbf{r}}\dot{\mathbf{r}}$. Since this term would be present even if $\mathbf{F} = 0$, this term clearly corrects for free motion off the surface of constraint. Having eliminated $\lambda$ from the equation of motion, eqn. (1.10.8) becomes an equation involving a velocity-dependent force. This equation, alone, generates motion on the correct constraint surface, has a conserved energy, $E = m\dot{\mathbf{r}}^2/2 + U(\mathbf{r})$, and, by construction, conserves $\sigma(\mathbf{r})$ in the sense that $d\sigma/dt = 0$ along a trajectory. However, this equation cannot be derived from a Lagrangian or a Hamiltonian and, therefore, constitutes an example of a non-Hamiltonian dynamical system (see also Section 1.12). Gauss’s procedure for obtaining constrained equations of motion can be generalized to an arbitrary number of particles or constraints satisfying the proper differential constraint relations.
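The conservation properties claimed for eqn. (1.10.8) can be checked numerically. The sketch below is an illustration under stated assumptions rather than a production integrator: a single particle on a sphere, $\sigma = |\mathbf{r}|^2 - R^2$, in a uniform gravity-like field, integrated with a generic fourth-order Runge-Kutta step. All function names, the choice of integrator, and the parameter values are hypothetical.

```python
import numpy as np

def gauss_acceleration(r, v, m, force, grad_sigma, hess_sigma):
    """Acceleration from Gauss's equation of motion, eqn. (1.10.8), divided by m."""
    F = force(r)
    g = grad_sigma(r)
    H = hess_sigma(r)
    lam = -(v @ H @ v + g @ F / m) / (g @ g / m)   # Lagrange multiplier, eqn. (1.10.7)
    return (F + lam * g) / m

def rk4_step(r, v, dt, acc):
    """One fourth-order Runge-Kutta step for dr/dt = v, dv/dt = acc(r, v)."""
    k1r, k1v = v, acc(r, v)
    k2r, k2v = v + 0.5 * dt * k1v, acc(r + 0.5 * dt * k1r, v + 0.5 * dt * k1v)
    k3r, k3v = v + 0.5 * dt * k2v, acc(r + 0.5 * dt * k2r, v + 0.5 * dt * k2v)
    k4r, k4v = v + dt * k3v, acc(r + dt * k3r, v + dt * k3v)
    r_new = r + dt * (k1r + 2 * k2r + 2 * k3r + k4r) / 6.0
    v_new = v + dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
    return r_new, v_new

def run_spherical_pendulum(n_steps=2000, dt=1e-3, m=1.0, R=1.0, g=9.8):
    """Return (final sigma, energy drift) for a particle constrained to a sphere."""
    force = lambda r: np.array([0.0, 0.0, -m * g])
    grad = lambda r: 2.0 * r                    # gradient of sigma = |r|^2 - R^2
    hess = lambda r: 2.0 * np.eye(3)            # second-derivative tensor of sigma
    acc = lambda r, v: gauss_acceleration(r, v, m, force, grad, hess)
    energy = lambda r, v: 0.5 * m * (v @ v) + m * g * r[2]
    r = np.array([R, 0.0, 0.0])                 # starts on the surface: sigma = 0
    v = np.array([0.0, 1.0, 0.0])               # tangent to the surface: grad(sigma) . v = 0
    e0 = energy(r, v)
    for _ in range(n_steps):
        r, v = rk4_step(r, v, dt, acc)
    return r @ r - R**2, energy(r, v) - e0

sigma_final, energy_drift = run_spherical_pendulum()
print("sigma at end:", sigma_final, " energy drift:", energy_drift)
```

Both printed quantities should remain small, limited only by the integration error, provided the initial position lies on the surface and the initial velocity is tangent to it. If either initial condition is violated, eqn. (1.10.8) does not pull the trajectory back onto the surface; it only preserves whatever value of $d\sigma/dt$ the trajectory started with.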

1.11 Rigid body motion: Euler angles and quaternions

The discussion of constraints leads naturally to the topic of rigid body motion. Rigid body techniques can be particularly useful in treating small molecules such as water or ammonia or large, approximately rigid subdomains of large molecules, in that these techniques circumvent the need to treat large numbers of explicit degrees of freedom. Imagine a collection of $n$ particles with all interparticle distances constrained. Such a system, known as a rigid body, has numerous applications in mechanics and statistical mechanics. An example in chemistry is the approximate treatment of small molecules with very high frequency internal vibrations. A water molecule (H$_2$O) could be treated as a rigid isosceles triangle by constraining the two OH bond lengths and the distance between the two hydrogens for a total of three holonomic constraint conditions. An ammonia molecule (NH$_3$) could also be treated as a rigid pyramid by fixing the three NH bond lengths and the three HH distances for a total of six holonomic constraint conditions. In a more complex molecule, such as the alanine dipeptide shown in Fig. 1.11, specific groups can be treated as rigid. Groups of this type are shaded in Fig. 1.11.

Fig. 1.11 Rigid subgroups in a large molecule, the alanine dipeptide.

Of course, it is always possible to treat these constraint conditions explicitly using the Lagrange multiplier formalism. However, since all the particles in a rigid body move as a whole, a simple and universal formalism can be used to treat all rigid bodies that circumvents the need to impose explicitly the set of holonomic constraints that keep the particles at fixed relative positions.

Before discussing rigid body motion, let us consider the problem of rotating a rigid body about an arbitrary axis in a fixed frame. Since a rotation performed on a rigid body moves all of the atoms uniformly, it is sufficient for us to consider how to rotate a single vector $\mathbf{r}$ about an arbitrary axis. The problem is illustrated in Fig. 1.12. Let $\mathbf{n}$ designate a unit vector along the axis of rotation, and let $\mathbf{r}'$ be the result of rotating $\mathbf{r}$ by an angle $\theta$ clockwise about the axis. In the notation of Fig. 1.12, straightforward geometry shows that $\mathbf{r}'$ is the result of a simple vector addition:

\[
\mathbf{r}' = \overrightarrow{OC} + \overrightarrow{CS} + \overrightarrow{SQ}. \tag{1.11.1}
\]

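Since the angle CSQ is a right angle, the three vectors in eqn. (1.11.1) can be expressed in terms of the original vector $\mathbf{r}$, the angle $\theta$, and the unit vector $\mathbf{n}$ according to

\[
\mathbf{r}' = \mathbf{n}(\mathbf{n} \cdot \mathbf{r}) + [\mathbf{r} - \mathbf{n}(\mathbf{n} \cdot \mathbf{r})] \cos\theta + (\mathbf{r} \times \mathbf{n}) \sin\theta, \tag{1.11.2}
\]

which can be rearranged to read

\[
\mathbf{r}' = \mathbf{r} \cos\theta + \mathbf{n}(\mathbf{n} \cdot \mathbf{r})(1 - \cos\theta) + (\mathbf{r} \times \mathbf{n}) \sin\theta. \tag{1.11.3}
\]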
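Eqn. (1.11.3) translates almost line for line into code. The following is a minimal sketch using NumPy; the function name `rotate` is an illustrative choice, and the axis is normalized inside the function as a convenience that the text does not require.

```python
import numpy as np

def rotate(r, n, theta):
    """Rotate r by the angle theta about the axis n using eqn. (1.11.3)."""
    r = np.asarray(r, dtype=float)
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)            # make sure the axis is a unit vector
    return (r * np.cos(theta)
            + n * (n @ r) * (1.0 - np.cos(theta))
            + np.cross(r, n) * np.sin(theta))

# Quarter turn of the x unit vector about the z-axis.
print(rotate([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], np.pi / 2))
```

With the sign convention of eqn. (1.11.3), in which the last term contains $\mathbf{r} \times \mathbf{n}$, a quarter turn about the $z$-axis carries the $x$ unit vector into the negative $y$ direction, up to round-off. Rotating every atomic position of a rigid body with the same `n` and `theta` rotates the body as a whole.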

Equation (1.11.3) is known as the rotation formula, which can be used straightforwardly when an arbitrary rotation needs to be performed.

Fig. 1.12 Rotation of the vector r to r′ about an axis n.

In order to illustrate the concept of rigid body motion, consider the simple problem of a rigid homonuclear diatomic molecule in two dimensions, in which each atom has a mass $m$. We shall assume that the motion occurs in the $xy$ plane. Let the Cartesian positions of the two atoms be $\mathbf{r}_1$ and $\mathbf{r}_2$ and let the molecule be subject to a potential of the form $U(\mathbf{r}_1 - \mathbf{r}_2)$. The Lagrangian for the molecule can be written as

\[
L = \frac{1}{2} m \dot{\mathbf{r}}_1^2 + \frac{1}{2} m \dot{\mathbf{r}}_2^2 - U(\mathbf{r}_1 - \mathbf{r}_2). \tag{1.11.4}
\]

For such a problem, it is useful to transform to center-of-mass $\mathbf{R} = (\mathbf{r}_1 + \mathbf{r}_2)/2$ and relative $\mathbf{r} = \mathbf{r}_1 - \mathbf{r}_2$ coordinates, in terms of which the Lagrangian becomes

\[
L = \frac{1}{2} M \dot{\mathbf{R}}^2 + \frac{1}{2} \mu \dot{\mathbf{r}}^2 - U(\mathbf{r}), \tag{1.11.5}
\]

where $M = 2m$ and $\mu = m/2$ are the total and reduced masses, respectively. Note that in these coordinates, the center of mass has an equation of motion of the form $M\ddot{\mathbf{R}} = 0$, which is the equation of motion of a free particle. As we have already seen, this means that the center-of-mass velocity $\dot{\mathbf{R}}$ is a constant. According to the principle of Galilean relativity, the physics of the system must be the same in a fixed coordinate frame as in a coordinate frame moving with constant velocity. Thus, we may transform to a coordinate system that moves with the molecule. Such a coordinate frame is called a body-fixed frame. The origin of the body-fixed frame lies at the center of mass of the molecule, and in this frame the coordinates of the two atoms are $\mathbf{r}_1 = \mathbf{r}/2$ and $\mathbf{r}_2 = -\mathbf{r}/2$. It is, therefore, clear that only the motion of the relative coordinate $\mathbf{r}$ needs to be considered. Note that we may use the body-fixed frame even if the center-of-mass velocity is not constant in order to separate the rotational and translational kinetic energies of the rigid body. In the body-fixed frame, the Lagrangian of eqn. (1.11.5) becomes

\[
L = \frac{1}{2} \mu \dot{\mathbf{r}}^2 - U(\mathbf{r}). \tag{1.11.6}
\]

In a two-dimensional space, the relative coordinate $\mathbf{r}$ is the two-component vector $\mathbf{r} = (x, y)$. However, if the distance between the two atoms is fixed at a value $d$, then

there is a constraint in the form of $x^2 + y^2 = d^2$. Rather than treating the constraint via a Lagrange multiplier, we could transform to polar coordinates according to

\[
x = d \cos\theta, \qquad y = d \sin\theta. \tag{1.11.7}
\]

The velocities are given by

\[
\dot{x} = -d (\sin\theta) \dot{\theta}, \qquad \dot{y} = d (\cos\theta) \dot{\theta} \tag{1.11.8}
\]

so that the Lagrangian becomes

\[
L = \frac{1}{2} \mu \left( \dot{x}^2 + \dot{y}^2 \right) - U(x, y) = \frac{1}{2} \mu d^2 \dot{\theta}^2 - \tilde{U}(\theta), \tag{1.11.9}
\]

where the notation $U(\mathbf{r}) = U(x, y) = U(d \cos\theta, d \sin\theta) \equiv \tilde{U}(\theta)$ indicates that the potential varies only according to the variation in $\theta$. Equation (1.11.9) demonstrates that the rigid body has only one degree of freedom, namely, the single angle $\theta$. According to the Euler-Lagrange equation (1.4.6), the equation of motion for the angle is

\[
\mu d^2 \ddot{\theta} = -\frac{\partial \tilde{U}}{\partial \theta}. \tag{1.11.10}
\]
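Because eqn. (1.11.10) involves a single angle, it is easy to integrate numerically. The sketch below assumes a model potential $\tilde{U}(\theta) = -K\cos\theta$, which is a hypothetical choice made only for illustration, and uses a velocity-Verlet style update as one reasonable integrator among many; the function name and parameter values are likewise illustrative.

```python
import math

def rotor_trajectory(theta0, omega0, inertia, torque, dt, n_steps):
    """Integrate inertia * d^2(theta)/dt^2 = torque(theta) with velocity-Verlet steps."""
    theta, omega = theta0, omega0
    tau = torque(theta)
    for _ in range(n_steps):
        omega += 0.5 * dt * tau / inertia    # half step in angular velocity
        theta += dt * omega                  # full step in angle
        tau = torque(theta)                  # torque at the new angle
        omega += 0.5 * dt * tau / inertia    # second half step in angular velocity
    return theta, omega

mu, d, K = 0.5, 1.0, 1.0                     # assumed reduced mass, bond length, potential depth
inertia = mu * d**2                          # coefficient of the angular acceleration in (1.11.10)
torque = lambda th: -K * math.sin(th)        # -dU/dtheta for U(theta) = -K cos(theta)
energy = lambda th, om: 0.5 * inertia * om**2 - K * math.cos(th)

theta, omega = rotor_trajectory(1.0, 0.0, inertia, torque, dt=1e-3, n_steps=5000)
print("energy drift:", energy(theta, omega) - energy(1.0, 0.0))
```

The energy $\frac{1}{2}\mu d^2\dot{\theta}^2 + \tilde{U}(\theta)$ should stay constant up to the integration error, which gives a quick check that the sign of the torque term has been coded consistently with eqn. (1.11.10).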

In order to understand the physical content of eqn. (1.11.10), we first note that the origin of the body-fixed frame lies at the center-of-mass position $\mathbf{R}$. We further note the motion occurs in the $xy$ plane and therefore consists of rotation about an axis perpendicular to this plane, in this case, about the $z$-axis. The quantity $\omega = \dot{\theta}$ is called the angular velocity about the $z$-axis. The quantity $I = \mu d^2$ is a constant known as the moment of inertia about the $z$-axis. Since the motion is purely angular, we can define an angular momentum, analogous to the Cartesian momentum, by $l = \mu d^2 \dot{\theta} = I\omega$. In general, angular momentum, like the Cartesian momentum, is a vector quantity given by

\[
\mathbf{l} = \mathbf{r} \times \mathbf{p}. \tag{1.11.11}
\]

For the present problem, in which all of the motion occurs in the $xy$-plane (no motion along the $z$-axis), $\mathbf{l}$ has only one nonzero component, namely $l_z$, given by

\[
\begin{aligned}
l_z &= x p_y - y p_x \\
&= x(\mu \dot{y}) - y(\mu \dot{x}) \\
&= \mu (x \dot{y} - y \dot{x}) \\
&= \mu d^2 (\dot{\theta} \cos^2\theta + \dot{\theta} \sin^2\theta) \\
&= \mu d^2 \dot{\theta}.
\end{aligned}
\tag{1.11.12}
\]

Equation (1.11.12) demonstrates that although the motion occurs in the $xy$-plane, as a rotation about the $z$-axis, the angular momentum vector points along the $z$-axis. Since the angular momentum $l_z$ is given as the product of the moment of inertia $I$ and the angular velocity $\omega$ ($l_z = I\omega$), the angular velocity must also be a vector whose direction is along the $z$-axis. Thus, we write the angular velocity vector for this problem as $\boldsymbol{\omega} = (0, 0, \dot{\theta})$ and $\mathbf{l} = I\boldsymbol{\omega}$. Physically, we see that the moment of inertia plays the role of “mass” in angular motion; however, its units are mass×length$^2$. The form of the moment of inertia indicates that the farther away from the axis of rotation an object is, the greater will be its angular momentum, although its angular velocity is the same at all distances from the axis of rotation.

It is interesting to calculate the velocity in the body-fixed frame. The components of the velocity $\mathbf{v} = (v_x, v_y) = (\dot{x}, \dot{y})$ are given by eqn. (1.11.8). Note, however, that these can also be expressed in terms of the angular velocity vector, $\boldsymbol{\omega}$. In particular, the velocity vector is expressible as a cross product

\[
\mathbf{v} = \dot{\mathbf{r}} = \boldsymbol{\omega} \times \mathbf{r}. \tag{1.11.13}
\]

Since $\boldsymbol{\omega} = (0, 0, \dot{\theta})$, the cross product has two nonvanishing components

\[
\begin{aligned}
v_x &= -\omega_z y = -d (\sin\theta) \dot{\theta} \\
v_y &= \omega_z x = d (\cos\theta) \dot{\theta},
\end{aligned}
\tag{1.11.14}
\]

which are precisely the velocity components given by eqn. (1.11.8). Equation (1.11.14) determines the velocity of the relative position vector $\mathbf{r}$. In the body-fixed frame, the velocities of atoms 1 and 2 would be $\dot{\mathbf{r}}/2$ and $-\dot{\mathbf{r}}/2$, respectively. If we wish to determine the velocity of, for example, atom 1 at position $\mathbf{r}_1$ in a space-fixed frame rather than in the body-fixed frame, we need to add back the velocity of the body-fixed frame. To do this, write the position $\mathbf{r}_1$ as

\[
\mathbf{r}_1 = \mathbf{R} + \frac{1}{2} \mathbf{r}. \tag{1.11.15}
\]

Thus, the total velocity $\mathbf{v}_1 = \dot{\mathbf{r}}_1$ is

\[
\dot{\mathbf{r}}_1 = \dot{\mathbf{R}} + \frac{1}{2} \dot{\mathbf{r}}. \tag{1.11.16}
\]

The first term is clearly the velocity of the body-fixed frame, while the second term is the velocity of $\mathbf{r}_1$ relative to the body-fixed frame. Note, however, that if the motion of $\mathbf{r}_1$ relative to the body-fixed frame is removed, the remaining component of the velocity is just that due to the motion of the body-fixed frame, and we may write

\[
\left( \frac{d\mathbf{r}_1}{dt} \right)_{\rm body} = \frac{d\mathbf{R}}{dt}. \tag{1.11.17}
\]

Since $\dot{\mathbf{r}} = \boldsymbol{\omega} \times \mathbf{r}$, the total time derivative of the vector $\mathbf{r}_1$ becomes

\[
\begin{aligned}
\left( \frac{d\mathbf{r}_1}{dt} \right)_{\rm space} &= \left( \frac{d\mathbf{r}_1}{dt} \right)_{\rm body} + \boldsymbol{\omega} \times \frac{1}{2} \mathbf{r} \\
\left( \frac{d\mathbf{r}_1}{dt} \right)_{\rm space} &= \left( \frac{d\mathbf{r}_1}{dt} \right)_{\rm body} + \boldsymbol{\omega} \times \mathbf{r}_1,
\end{aligned}
\tag{1.11.18}
\]

where the first term in the second line is interpreted as the velocity due solely to the motion of the body-fixed frame and the second term is the rate of change of $\mathbf{r}_1$ in the body-fixed frame. A similar relation can be derived for the time derivative of the position $\mathbf{r}_2$ of atom 2. Indeed, eqn. (1.11.18) applies to the time derivative of any arbitrary vector $\mathbf{G}$:

\[
\left( \frac{d\mathbf{G}}{dt} \right)_{\rm space} = \left( \frac{d\mathbf{G}}{dt} \right)_{\rm body} + \boldsymbol{\omega} \times \mathbf{G}. \tag{1.11.19}
\]

Although it is possible to obtain eqn. (1.11.19) from a general consideration of rotational motion, we content ourselves here with this physically motivated approach.

Consider, next, the force term $-\partial \tilde{U}/\partial\theta$. This is also a component of a vector quantity known as the torque about the $z$-axis. In general, $\boldsymbol{\tau}$ is defined by

\[
\boldsymbol{\tau} = \mathbf{r} \times \mathbf{F}. \tag{1.11.20}
\]

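Again, because the motion is entirely in the $xy$-plane, there is no $z$-component of the force, and the only nonvanishing component of the torque is the $z$-component given by

\[
\begin{aligned}
\tau_z &= x F_y - y F_x \\
&= -d \cos\theta\, \frac{\partial U}{\partial y} + d \sin\theta\, \frac{\partial U}{\partial x} \\
&= -d \cos\theta\, \frac{\partial U}{\partial \theta} \frac{\partial \theta}{\partial y} + d \sin\theta\, \frac{\partial U}{\partial \theta} \frac{\partial \theta}{\partial x},
\end{aligned}
\tag{1.11.21}
\]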

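where the chain rule has been used in the last line. Since $\theta = \tan^{-1}(y/x)$, the two derivatives of $\theta$ can be worked out as

\[
\begin{aligned}
\frac{\partial \theta}{\partial y} &= \frac{1}{1 + (y/x)^2} \frac{1}{x} = \frac{x}{x^2 + y^2} = \frac{\cos\theta}{d} \\
\frac{\partial \theta}{\partial x} &= \frac{1}{1 + (y/x)^2} \left( -\frac{y}{x^2} \right) = -\frac{y}{x^2 + y^2} = -\frac{\sin\theta}{d}.
\end{aligned}
\tag{1.11.22}
\]

Substitution of eqn. (1.11.22) into eqn. (1.11.21) gives

\[
\begin{aligned}
\tau_z &= -\frac{\partial U}{\partial \theta} \left( d \cos\theta\, \frac{\cos\theta}{d} + d \sin\theta\, \frac{\sin\theta}{d} \right) \\
&= -\frac{\partial U}{\partial \theta} \left( \cos^2\theta + \sin^2\theta \right) \\
&= -\frac{\partial U}{\partial \theta}.
\end{aligned}
\tag{1.11.23}
\]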

Therefore, we see that the torque is simply the force on an angular coordinate. The equation of motion can thus be written in vector form as

\[
\frac{d\mathbf{l}}{dt} = \boldsymbol{\tau}, \tag{1.11.24}
\]

which is analogous to Newton’s second law in Cartesian form

\[
\frac{d\mathbf{p}}{dt} = \mathbf{F}. \tag{1.11.25}
\]
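The vector relations used for the planar diatomic can be verified with a few cross products. The sketch below evaluates $\mathbf{v} = \boldsymbol{\omega} \times \mathbf{r}$, $\mathbf{l} = \mathbf{r} \times \mu\mathbf{v}$, and $\boldsymbol{\tau} = \mathbf{r} \times \mathbf{F}$ at one configuration and compares the torque with a finite-difference estimate of $-\partial \tilde{U}/\partial\theta$. The potential $U(x, y) = -E_0 x$, the function name `diatomic_kinematics`, and all numerical values are assumptions made for illustration only.

```python
import numpy as np

def diatomic_kinematics(mu, d, theta, theta_dot, E0, h=1e-6):
    """Velocity, angular momentum, and torque of a planar rigid diatomic.

    Uses the model potential U(x, y) = -E0 * x, so F = (E0, 0, 0) and
    U_tilde(theta) = -E0 * d * cos(theta).
    """
    r = np.array([d * np.cos(theta), d * np.sin(theta), 0.0])
    omega = np.array([0.0, 0.0, theta_dot])
    v = np.cross(omega, r)                      # eqn. (1.11.13)
    l = np.cross(r, mu * v)                     # eqn. (1.11.11) with p = mu * v
    F = np.array([E0, 0.0, 0.0])
    tau = np.cross(r, F)                        # eqn. (1.11.20)
    U_tilde = lambda th: -E0 * d * np.cos(th)
    minus_dU = -(U_tilde(theta + h) - U_tilde(theta - h)) / (2.0 * h)
    return {"v": v, "l": l, "tau": tau, "minus_dU_dtheta": minus_dU}

out = diatomic_kinematics(mu=0.5, d=1.2, theta=0.4, theta_dot=1.3, E0=0.7)
print("v       :", out["v"])
print("l_z     :", out["l"][2], " vs mu d^2 theta_dot =", 0.5 * 1.2**2 * 1.3)
print("tau_z   :", out["tau"][2], " vs -dU/dtheta =", out["minus_dU_dtheta"])
```

The $z$-components printed on the last two lines should agree with $\mu d^2 \dot{\theta}$ and with $-\partial \tilde{U}/\partial\theta$, respectively, mirroring eqns. (1.11.12) and (1.11.23).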

A rigid diatomic, being a linear object, can be described by a single angle coordinate in two dimensions or by two angles in three dimensions. For a general rigid body consisting of $n$ atoms in three dimensions, the number of constraints needed to make it rigid is $3n - 6$ so that the number of remaining degrees of freedom is $3n - (3n - 6) = 6$. After removing the three degrees of freedom associated with the motion of the body-fixed frame, we are left with three degrees of freedom, implying that three angles are needed to describe the motion of a general rigid body. These three angles are known as the Euler angles. They describe the motion of the rigid body about three independent axes. Although several conventions exist for defining these axes, any choice is acceptable. A particularly convenient choice of the axes can be obtained as follows: Consider the total angular momentum of the rigid body, obtained as a sum of the individual angular momentum vectors of the constituent particles:

\[
\mathbf{l} = \sum_{i=1}^{n} \mathbf{r}_i \times \mathbf{p}_i = \sum_{i=1}^{n} m_i \mathbf{r}_i \times \mathbf{v}_i. \tag{1.11.26}
\]

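Now, $\mathbf{v}_i = d\mathbf{r}_i/dt$ is measured in the body-fixed frame. From the analysis above, it follows that the velocity is just $\boldsymbol{\omega} \times \mathbf{r}_i$ so that

\[
\mathbf{l} = \sum_{i=1}^{n} m_i \mathbf{r}_i \times (\boldsymbol{\omega} \times \mathbf{r}_i). \tag{1.11.27}
\]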

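Expanding the double cross product, we find that

\[
\mathbf{l} = \sum_{i=1}^{n} m_i \left[ \boldsymbol{\omega} r_i^2 - \mathbf{r}_i (\mathbf{r}_i \cdot \boldsymbol{\omega}) \right], \tag{1.11.28}
\]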

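which, in component form, becomes

\[
\begin{aligned}
l_x &= \omega_x \sum_{i=1}^{n} m_i (r_i^2 - x_i^2) - \omega_y \sum_{i=1}^{n} m_i x_i y_i - \omega_z \sum_{i=1}^{n} m_i x_i z_i \\
l_y &= -\omega_x \sum_{i=1}^{n} m_i y_i x_i + \omega_y \sum_{i=1}^{n} m_i (r_i^2 - y_i^2) - \omega_z \sum_{i=1}^{n} m_i y_i z_i \\
l_z &= -\omega_x \sum_{i=1}^{n} m_i z_i x_i - \omega_y \sum_{i=1}^{n} m_i z_i y_i + \omega_z \sum_{i=1}^{n} m_i (r_i^2 - z_i^2).
\end{aligned}
\tag{1.11.29}
\]

Equation (1.11.29) can be written in matrix form as

\[
\begin{pmatrix} l_x \\ l_y \\ l_z \end{pmatrix} =
\begin{pmatrix} I_{xx} & I_{xy} & I_{xz} \\ I_{yx} & I_{yy} & I_{yz} \\ I_{zx} & I_{zy} & I_{zz} \end{pmatrix}
\begin{pmatrix} \omega_x \\ \omega_y \\ \omega_z \end{pmatrix}. \tag{1.11.30}
\]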

The matrix elements are given by

\[
I_{\alpha\beta} = \sum_{i=1}^{n} m_i \left( r_i^2 \delta_{\alpha\beta} - r_{i,\alpha} r_{i,\beta} \right), \tag{1.11.31}
\]

where α, β = (x, y, z) and $r_{i,\alpha}$ is the αth component of the ith position vector in the body-fixed frame. The matrix $I_{\alpha\beta}$ is known as the inertia tensor and is the generalization of the moment of inertia defined previously. The inertia tensor is symmetric ($I_{\alpha\beta} = I_{\beta\alpha}$), and can therefore be diagonalized via an orthogonal transformation. Consequently, it has real eigenvalues denoted $I_1$, $I_2$ and $I_3$. The eigenvectors of the inertia tensor define a new set of mutually orthogonal axes about which we may describe the motion. When these axes are used, the inertia tensor is diagonal. Since the angular momentum is obtained by acting with the inertia tensor on the angular velocity vector, it follows that, in general, l is not parallel to ω as it was in the two-dimensional problem considered above. Thus, ω × l ≠ 0 so that the time derivative of l in a space-fixed frame obeys eqn. (1.11.19):

$$
\left(\frac{d\mathbf{l}}{dt}\right)_{\rm space} = \left(\frac{d\mathbf{l}}{dt}\right)_{\rm body} + \boldsymbol{\omega} \times \mathbf{l}.
\tag{1.11.32}
$$

Accordingly, the rate of change of l in the space-fixed frame will be determined simply by the torque according to

$$
\left(\frac{d\mathbf{l}}{dt}\right)_{\rm space} = \boldsymbol{\tau}.
\tag{1.11.33}
$$

Expressing this in terms of the body-fixed frame (and dropping the “body” subscript), eqn. (1.11.32) yields

$$
\frac{d\mathbf{l}}{dt} + \boldsymbol{\omega} \times \mathbf{l} = \boldsymbol{\tau}.
\tag{1.11.34}
$$

Finally, using the fact that l = Iω and working with a set of axes in terms of which I is diagonal, the equations of motion for the three components $\omega_1$, $\omega_2$, and $\omega_3$ of the angular velocity vector become

$$
\begin{aligned}
I_1 \dot{\omega}_1 - \omega_2 \omega_3 (I_2 - I_3) &= \tau_1 \\
I_2 \dot{\omega}_2 - \omega_3 \omega_1 (I_3 - I_1) &= \tau_2 \\
I_3 \dot{\omega}_3 - \omega_1 \omega_2 (I_1 - I_2) &= \tau_3.
\end{aligned}
\tag{1.11.35}
$$

These are known as the rigid body equations of motion. Given the solutions of these equations of motion for $\omega_i(t)$, the three Euler angles, denoted (φ, ψ, θ), are then given as solutions of the differential equations

$$
\begin{aligned}
\omega_1 &= \dot{\phi} \sin\theta \sin\psi + \dot{\theta} \cos\psi \\
\omega_2 &= \dot{\phi} \sin\theta \cos\psi - \dot{\theta} \sin\psi \\
\omega_3 &= \dot{\phi} \cos\theta + \dot{\psi}.
\end{aligned}
\tag{1.11.36}
$$
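As a minimal numerical sketch of eqns. (1.11.31) and (1.11.35), the following Python code builds the inertia tensor of a small set of point masses, diagonalizes it to obtain the principal moments, and integrates the torque-free rigid body equations. The masses, positions, initial angular velocity, time step, and the choice of a fourth-order Runge-Kutta integrator are all illustrative assumptions and are not taken from the text.

```python
import numpy as np

def inertia_tensor(masses, positions):
    """Inertia tensor I_ab = sum_i m_i (r_i^2 delta_ab - r_ia r_ib), eqn. (1.11.31)."""
    tensor = np.zeros((3, 3))
    for m, r in zip(masses, positions):
        tensor += m * (np.dot(r, r) * np.eye(3) - np.outer(r, r))
    return tensor

def euler_rhs(omega, moments, torque):
    """Angular accelerations in the principal-axis frame, from eqn. (1.11.35)."""
    I1, I2, I3 = moments
    w1, w2, w3 = omega
    return np.array([
        (torque[0] + w2 * w3 * (I2 - I3)) / I1,
        (torque[1] + w3 * w1 * (I3 - I1)) / I2,
        (torque[2] + w1 * w2 * (I1 - I2)) / I3,
    ])

def rk4_step(omega, moments, torque, dt):
    """One fourth-order Runge-Kutta step (an illustrative integrator choice)."""
    k1 = euler_rhs(omega, moments, torque)
    k2 = euler_rhs(omega + 0.5 * dt * k1, moments, torque)
    k3 = euler_rhs(omega + 0.5 * dt * k2, moments, torque)
    k4 = euler_rhs(omega + dt * k3, moments, torque)
    return omega + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

# Hypothetical rigid body: four point masses in arbitrary units.
masses = np.array([1.0, 2.0, 1.5, 0.5])
positions = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.5],
                      [-1.0, 0.5, 0.0],
                      [0.0, -1.0, 1.0]])

tensor = inertia_tensor(masses, positions)
# eigh is appropriate because the inertia tensor is real and symmetric.
moments, axes = np.linalg.eigh(tensor)

def kinetic_energy(omega):
    return 0.5 * np.sum(moments * omega**2)

def angular_momentum_norm(omega):
    return np.linalg.norm(moments * omega)

omega0 = np.array([0.3, 1.0, 0.2])   # initial angular velocity in the principal frame
omega = omega0.copy()
torque = np.zeros(3)                  # torque-free motion
for _ in range(2000):
    omega = rk4_step(omega, moments, torque, dt=0.005)

print("principal moments:", moments)
print("energy drift:", kinetic_energy(omega) - kinetic_energy(omega0))
print("|l| drift:", angular_momentum_norm(omega) - angular_momentum_norm(omega0))
```

With zero torque, both the rotational kinetic energy and the magnitude of l in the body frame should stay constant up to integration error, which gives a quick consistency check on the signs in eqn. (1.11.35).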

The complexity of the rigid body equations of motion and the relationship between the angular velocity and the Euler angles renders the solution of the equations of motion a nontrivial problem. (In a numerical scheme, for example, there are singularities when the trigonometric functions approach 0.) For this reason, it is preferable to work in terms of a new set of variables known as quaternions. As the name suggests, a quaternion is a set of four variables that replaces the three Euler angles. Since there are only three rotational degrees of freedom, the four quaternions cannot be independent.

In order to illustrate the idea of the quaternion, let us consider the analogous problem in a smaller number of dimensions (where we might call the variables “binarions” or “ternarions” depending on the number of angles being replaced). Consider again a rigid diatomic moving in the xy plane. The Lagrangian for the system is given by eqn. (1.11.9). Introduce a unit vector

$$
\mathbf{q} = (q_1, q_2) \equiv (\cos\theta, \sin\theta).
\tag{1.11.37}
$$

Clearly, $\mathbf{q} \cdot \mathbf{q} = q_1^2 + q_2^2 = \cos^2\theta + \sin^2\theta = 1$. Note also that

$$
\dot{\mathbf{q}} = (\dot{q}_1, \dot{q}_2) = \left(-(\sin\theta)\dot{\theta},\ (\cos\theta)\dot{\theta}\right)
\tag{1.11.38}
$$

so that

$$
L = \mu d^2 \dot{\mathbf{q}}^2 - U(\mathbf{q}),
\tag{1.11.39}
$$

where U(q) indicates that the potential depends on q since r = dq. The present formulation is completely equivalent to the original formulation in terms of the angle θ. However, suppose we now treat $q_1$ and $q_2$ directly as the dynamical variables. If we wish to do this, we need to ensure that the condition $q_1^2 + q_2^2 = 1$ is obeyed, which could be achieved by treating this condition as a constraint (in Section 3.12, we shall see how to formulate the problem so as to avoid the need for an explicit constraint on the components of q). In this case, q would be an example of a “binarion.” The “binarion” structure is rather trivial and seems to bring us right back to the original problem we sought to avoid by formulating the motion of a rigid diatomic in terms of the angle θ at the outset! We must wait until Section 3.12 to see how to sidestep the constraint. For a diatomic in three dimensions, the rigid-body equations of motion would be reformulated using three variables $(q_1, q_2, q_3)$ satisfying $q_1^2 + q_2^2 + q_3^2 = 1$ so that they are equivalent to (sin θ cos φ, sin θ sin φ, cos θ).


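For a rigid body in three dimensions, we require four variables, $(q_1, q_2, q_3, q_4)$, the quaternions, that must satisfy $\sum_{i=1}^{4} q_i^2 = 1$ and are, by convention, formally related to the three Euler angles by

$$
\begin{aligned}
q_1 &= \cos\left(\frac{\theta}{2}\right) \cos\left(\frac{\phi + \psi}{2}\right) \\
q_2 &= \sin\left(\frac{\theta}{2}\right) \cos\left(\frac{\phi - \psi}{2}\right) \\
q_3 &= \sin\left(\frac{\theta}{2}\right) \sin\left(\frac{\phi - \psi}{2}\right) \\
q_4 &= \cos\left(\frac{\theta}{2}\right) \sin\left(\frac{\phi + \psi}{2}\right).
\end{aligned}
\tag{1.11.40}
$$

From eqn. (1.11.40), it is straightforward to verify that $\sum_i q_i^2 = 1$. The advantage of the quaternion structure is that it leads to a simplification of the rigid-body motion problem. First, note that at any time, a Cartesian coordinate vector in the space-fixed frame can be transformed into the body-fixed frame via a rotation matrix involving the quaternions. The relations are

$$
\mathbf{r}^{(\rm body)} = A(\theta, \phi, \psi)\, \mathbf{r}^{(\rm space)}, \qquad
\mathbf{r}^{(\rm space)} = A^{\rm T}(\theta, \phi, \psi)\, \mathbf{r}^{(\rm body)}.
\tag{1.11.41}
$$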

The rotation matrix is the product of individual rotations about the three axes, which yields

$$
A(\theta, \phi, \psi) =
\begin{pmatrix}
\cos\psi \cos\phi - \cos\theta \sin\phi \sin\psi & \cos\psi \sin\phi + \cos\theta \cos\phi \sin\psi & \sin\theta \sin\psi \\
-\sin\psi \cos\phi - \cos\theta \sin\phi \cos\psi & -\sin\psi \sin\phi + \cos\theta \cos\phi \cos\psi & \sin\theta \cos\psi \\
\sin\theta \sin\phi & -\sin\theta \cos\phi & \cos\theta
\end{pmatrix}.
\tag{1.11.42}
$$

In terms of quaternions, the matrix can be expressed in a simpler-looking form as

$$
A(\mathbf{q}) =
\begin{pmatrix}
q_1^2 + q_2^2 - q_3^2 - q_4^2 & 2(q_2 q_3 + q_1 q_4) & 2(q_2 q_4 - q_1 q_3) \\
2(q_2 q_3 - q_1 q_4) & q_1^2 - q_2^2 + q_3^2 - q_4^2 & 2(q_3 q_4 + q_1 q_2) \\
2(q_2 q_4 + q_1 q_3) & 2(q_3 q_4 - q_1 q_2) & q_1^2 - q_2^2 - q_3^2 + q_4^2
\end{pmatrix}.
\tag{1.11.43}
$$
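The correspondence between the Euler-angle and quaternion forms of the rotation matrix can be checked numerically. The sketch below assumes the common z-x-z convention, in which A is built as a rotation by φ about z, then by θ about x, then by ψ about z; the function names and the test angles are illustrative choices rather than anything prescribed by the text.

```python
import numpy as np

def quaternion_from_euler(theta, phi, psi):
    """Quaternion components (q1, q2, q3, q4) as defined in eqn. (1.11.40)."""
    return np.array([
        np.cos(theta / 2) * np.cos((phi + psi) / 2),
        np.sin(theta / 2) * np.cos((phi - psi) / 2),
        np.sin(theta / 2) * np.sin((phi - psi) / 2),
        np.cos(theta / 2) * np.sin((phi + psi) / 2),
    ])

def rotation_from_quaternion(q):
    """Rotation matrix A(q) of eqn. (1.11.43)."""
    q1, q2, q3, q4 = q
    return np.array([
        [q1*q1 + q2*q2 - q3*q3 - q4*q4, 2*(q2*q3 + q1*q4), 2*(q2*q4 - q1*q3)],
        [2*(q2*q3 - q1*q4), q1*q1 - q2*q2 + q3*q3 - q4*q4, 2*(q3*q4 + q1*q2)],
        [2*(q2*q4 + q1*q3), 2*(q3*q4 - q1*q2), q1*q1 - q2*q2 - q3*q3 + q4*q4],
    ])

def rot_z(angle):
    """Elementary rotation of the coordinate axes about z."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(angle):
    """Elementary rotation of the coordinate axes about x."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])

def rotation_from_euler(theta, phi, psi):
    """Assumed z-x-z product of elementary rotations."""
    return rot_z(psi) @ rot_x(theta) @ rot_z(phi)

theta, phi, psi = 0.7, 1.1, -0.4          # arbitrary test angles (radians)
q = quaternion_from_euler(theta, phi, psi)
A_q = rotation_from_quaternion(q)
A_euler = rotation_from_euler(theta, phi, psi)

print("sum of q_i^2:", np.sum(q**2))
print("max |A(q) - A(theta, phi, psi)|:", np.max(np.abs(A_q - A_euler)))
print("max |A A^T - 1|:", np.max(np.abs(A_q @ A_q.T - np.eye(3))))
```

Both matrices should agree to machine precision, and orthogonality of A(q) follows from the unit-norm condition on the quaternion.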

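It should be noted that in the body-fixed coordinate frame, the moment of inertia tensor is diagonal. The rigid-body equations of motion, eqns. (1.11.35), can now be transformed into a set of equations of motion involving the quaternions. Direct transformation of these equations leads to a new set of equations of motion given by

$$
\begin{aligned}
\dot{\mathbf{q}} &= \frac{1}{2} S(\mathbf{q})\, \boldsymbol{\omega} \\
\dot{\omega}_x &= \frac{\tau_x}{I_{xx}} + \frac{(I_{yy} - I_{zz})}{I_{xx}}\, \omega_y \omega_z \\
\dot{\omega}_y &= \frac{\tau_y}{I_{yy}} + \frac{(I_{zz} - I_{xx})}{I_{yy}}\, \omega_z \omega_x \\
\dot{\omega}_z &= \frac{\tau_z}{I_{zz}} + \frac{(I_{xx} - I_{yy})}{I_{zz}}\, \omega_x \omega_y.
\end{aligned}
\tag{1.11.44}
$$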

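Here, $\boldsymbol{\omega} = (0, \omega_x, \omega_y, \omega_z)$ and

$$
S(\mathbf{q}) =
\begin{pmatrix}
q_1 & -q_2 & -q_3 & -q_4 \\
q_2 & q_1 & -q_4 & q_3 \\
q_3 & q_4 & q_1 & -q_2 \\
q_4 & -q_3 & q_2 & q_1
\end{pmatrix}.
\tag{1.11.45}
$$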

These equations of motion must be supplemented by the constraint condition $\sum_i q_i^2 = 1$. The equations of motion have the conserved energy

$$
E = \frac{1}{2}\left[I_{xx} \omega_x^2 + I_{yy} \omega_y^2 + I_{zz} \omega_z^2\right] + U(\mathbf{q}).
\tag{1.11.46}
$$

Conservation of the energy in eqn. (1.11.46) can be shown by recognizing that the torques can be written as

$$
\boldsymbol{\tau} = -\frac{1}{2} S(\mathbf{q})^{\rm T} \frac{\partial U}{\partial \mathbf{q}}.
\tag{1.11.47}
$$
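One property of the quaternion equation of motion in eqn. (1.11.44) that is easy to check numerically is that it is compatible with the unit-norm condition: because the first component of the four-component ω is zero and the columns of S(q) are mutually orthogonal, the product q · q̇ vanishes. The following sketch verifies this for an arbitrary unit quaternion; the numerical values and function names are illustrative assumptions.

```python
import numpy as np

def S(q):
    """The 4x4 matrix S(q) of eqn. (1.11.45)."""
    q1, q2, q3, q4 = q
    return np.array([
        [q1, -q2, -q3, -q4],
        [q2,  q1, -q4,  q3],
        [q3,  q4,  q1, -q2],
        [q4, -q3,  q2,  q1],
    ])

def q_dot(q, omega_body):
    """Quaternion velocity q_dot = (1/2) S(q) (0, wx, wy, wz), eqn. (1.11.44)."""
    omega4 = np.concatenate(([0.0], omega_body))
    return 0.5 * S(q) @ omega4

# Arbitrary quaternion, normalized to satisfy sum_i q_i^2 = 1.
q = np.array([0.4, -0.3, 0.8, 0.1])
q = q / np.linalg.norm(q)
omega_body = np.array([0.7, -1.2, 0.5])   # arbitrary body-frame angular velocity

orthogonality_error = np.max(np.abs(S(q).T @ S(q) - np.eye(4)))
norm_rate = 2.0 * np.dot(q, q_dot(q, omega_body))   # d/dt of sum_i q_i^2

print("max |S^T S - 1|:", orthogonality_error)
print("d(q.q)/dt:", norm_rate)
```

In exact arithmetic the norm is therefore preserved by the equations of motion; in a finite-time-step integration it drifts slightly, which is one reason the constraint condition is imposed explicitly.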

1.12 Non-Hamiltonian systems

There is a certain elegance in the symmetry between coordinates and momenta of Hamilton’s equations of motion. Up to now, we have mostly discussed systems obeying Hamilton’s principle, yet it is important for us to take a short detour away from this path and discuss more general types of dynamical equations of motion that cannot be derived from a Lagrangian or Hamiltonian function. These are referred to as non-Hamiltonian systems. Why might we be interested in non-Hamiltonian systems in the first place? To begin with, we note that Hamilton’s equations of motion can only describe a conservative system isolated from its surroundings and/or acted upon by an applied external field. However, Newton’s second law is more general than this and could involve forces that are non-conservative and, hence, cannot be derived from a potential function. There are numerous physical systems that are characterized by non-conservative forces, including systems subject to frictional forces and damping effects or the celebrated Lorenz model that catalyzed major interest in the study of chaotic dynamics. We noted previously that Gauss’s equations of motion (1.10.8) constitute another example of a non-Hamiltonian system.

In order to understand how non-Hamiltonian systems may be useful in statistical mechanics, consider a physical system in contact with a much larger system, referred to as a bath, which regulates some macroscopic property of the physical system such as its pressure or temperature. Were we to consider the microscopic details of the system plus the bath together, we could, in principle, write down a Hamiltonian for the entire system and determine the evolution of the physical subsystem. However, we are rarely interested in all of the microscopic details of the bath. We might, therefore, consider treating the effect of the bath in a more coarse-grained manner by replacing its microscopic coordinates and momenta with a few simpler variables that couple to the physical subsystem in a specified manner. In this case, a set of equations of motion describing the physical system plus the few additional variables used to represent the action of the bath could be proposed which generally would not be Hamiltonian in form because the true microscopic nature of the bath had been eliminated. For this reason, non-Hamiltonian dynamical systems can be highly useful and it is instructive to examine some of their characteristics.

We will restrict ourselves to dynamical systems of the generic form

$$
\dot{\mathrm{x}} = \xi(\mathrm{x}),
\tag{1.12.1}
$$

where x is a phase-space vector of n components and ξ(x) is a continuous, differentiable n-dimensional vector function. A key signature of a non-Hamiltonian system is that it can have a nonvanishing phase-space compressibility:

$$
\kappa(\mathrm{x}) = \sum_{i=1}^{n} \frac{\partial \dot{x}_i}{\partial x_i} = \sum_{i=1}^{n} \frac{\partial \xi_i}{\partial x_i} \neq 0.
\tag{1.12.2}
$$

When eqn. (1.12.2) holds, many of the theorems about Hamiltonian systems no longer apply. However, as will be shown in Chapter 2, some properties of Hamiltonian systems can be generalized to non-Hamiltonian systems provided certain conditions are met. It is important to note that when a Hamiltonian system is formulated in non-canonical variables, the resulting system can also have a nonvanishing compressibility. Strictly speaking, such systems are not truly non-Hamiltonian since a simple transformation back to a canonical set of variables can eliminate the nonzero compressibility. However, throughout this book, we will group such cases in with our general discussion of non-Hamiltonian systems and loosely refer to them as non-Hamiltonian because the techniques we will develop for analyzing dynamical systems with nonzero compressibility factors can be applied equally well to both types of systems.

A simple and familiar example of a non-Hamiltonian system is the case of the damped harmonic oscillator described by an equation of motion of the form

$$
m\ddot{x} = -m\omega^2 x - \zeta \dot{x}.
\tag{1.12.3}
$$

This equation describes a harmonic oscillator subject to the action of a friction force $-\zeta \dot{x}$, which could arise, for example, by the motion of the oscillator on a rough surface. Obviously, such an equation cannot be derived from a Hamiltonian. Moreover, the microscopic details of the rough surface are not treated explicitly but rather are modeled grossly by the simple dissipative term in the equation of motion for the physical subsystem described by the coordinate x. Writing the equation of motion as two first-order equations involving a phase-space vector (x, p), we have

$$
\begin{aligned}
\dot{x} &= \frac{p}{m} \\
\dot{p} &= -m\omega^2 x - \zeta \frac{p}{m}.
\end{aligned}
\tag{1.12.4}
$$

It can be seen that this dynamical system has a non-vanishing compressibility

$$
\kappa(x, p) = \frac{\partial \dot{x}}{\partial x} + \frac{\partial \dot{p}}{\partial p} = -\frac{\zeta}{m}.
\tag{1.12.5}
$$
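The meaning of the constant compressibility in eqn. (1.12.5) can be illustrated with a short numerical experiment. Because eqns. (1.12.4) are linear, a small triangle of initial conditions in the (x, p) plane is mapped onto another triangle, and for a constant κ its area is expected to scale as exp(κt) = exp(−ζt/m). The sketch below checks this; the parameter values, the initial triangle, and the Runge-Kutta integrator are illustrative assumptions.

```python
import numpy as np

m, omega, zeta = 1.0, 2.0, 0.5   # illustrative parameter values

def rhs(state):
    """Right-hand side of eqns. (1.12.4) for a phase-space point (x, p)."""
    x, p = state
    return np.array([p / m, -m * omega**2 * x - zeta * p / m])

def rk4_step(state, dt):
    """One fourth-order Runge-Kutta step (an illustrative integrator choice)."""
    k1 = rhs(state)
    k2 = rhs(state + 0.5 * dt * k1)
    k3 = rhs(state + 0.5 * dt * k2)
    k4 = rhs(state + dt * k3)
    return state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def triangle_area(points):
    """Area of the triangle spanned by three phase-space points."""
    (x1, p1), (x2, p2), (x3, p3) = points
    return 0.5 * abs((x2 - x1) * (p3 - p1) - (x3 - x1) * (p2 - p1))

# Three nearby initial conditions forming a small phase-space triangle.
points = np.array([[1.0, 0.0], [1.2, 0.0], [1.0, 0.3]])
area0 = triangle_area(points)

dt, n_steps = 0.001, 3000
for _ in range(n_steps):
    points = np.array([rk4_step(s, dt) for s in points])

t = dt * n_steps
ratio = triangle_area(points) / area0
expected = np.exp(-zeta * t / m)
print("area ratio:", ratio, " exp(kappa t):", expected)
```

The same experiment applied to a Hamiltonian system (ζ = 0) would give a ratio of 1, consistent with a vanishing compressibility.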

The fact that the compressibility is negative indicates that the effective “phase-space volume” occupied by the system will, as time increases, shrink and eventually collapse onto a single point in the phase space (x = 0, p = 0) as t → ∞ under the action of the damping force. All trajectories regardless of their initial condition will eventually approach this point as t → ∞. Consider an arbitrary volume in phase space and let all of the points in this volume represent different initial conditions for eqns. (1.12.4). As these initial conditions evolve in time, the volume they occupy will grow ever smaller until, as t → ∞, the volume tends toward 0. In complex systems, the evolution of such a volume of trajectories will typically be less trivial, growing and shrinking in time as the trajectories evolve. If, in addition, the damped oscillator is driven by a periodic forcing function, so that the equation of motion reads

$$
m\ddot{x} = -m\omega^2 x - \zeta \dot{x} + F_0 \cos\Omega t,
\tag{1.12.6}
$$

then the oscillator will never be able to achieve the equilibrium situation described above but rather will achieve what is known as a steady state. The existence of a steady state can be seen by considering the general solution

$$
x(t) = e^{-\gamma t}\left[A \cos\lambda t + B \sin\lambda t\right] + \frac{F_0/m}{\sqrt{(\omega^2 - \Omega^2)^2 + 4\gamma^2 \Omega^2}} \sin(\Omega t + \beta)
\tag{1.12.7}
$$

of eqn. (1.12.6), where

$$
\gamma = \frac{\zeta}{2m}, \qquad \lambda = \sqrt{\omega^2 - \gamma^2}, \qquad \beta = \tan^{-1}\frac{\omega^2 - \Omega^2}{2\gamma\Omega},
\tag{1.12.8}
$$

and A and B are arbitrary constants set by the choice of initial conditions x(0) and $\dot{x}(0)$. In the long-time limit, the first term decays to zero due to the exp(−γt) prefactor, and only the second term remains. This term constitutes the steady-state solution. Moreover, the amplitude of the steady-state solution can become large when the denominator is a minimum. Considering the function $f(\Omega) = (\omega^2 - \Omega^2)^2 + 4\gamma^2 \Omega^2$, this function reaches a minimum when the frequency of the forcing function is chosen to be $\Omega = \omega\sqrt{1 - 2\gamma^2/\omega^2}$. Such a frequency is called a resonant frequency. Resonances play an important role in classical dynamics when harmonic forces are present, a phenomenon that will be explored in greater detail in Chapter 3. Driven systems and steady states will be discussed in greater detail in Chapter 13.
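The steady-state term and the location of the resonance can both be checked numerically. Setting the derivative of f with respect to Ω² to zero gives Ω² = ω² − 2γ², and substituting the steady-state term into eqn. (1.12.6) should leave no residual when the amplitude is (F0/m) divided by the square root of f(Ω). The sketch below tests both statements on a grid; the parameter values and variable names are illustrative assumptions.

```python
import numpy as np

m, omega, zeta, F0 = 1.0, 2.0, 0.4, 1.0   # illustrative parameter values
gamma = zeta / (2.0 * m)                   # eqn. (1.12.8)

def f(Omega):
    """Function under the square root in the steady-state amplitude."""
    return (omega**2 - Omega**2)**2 + 4.0 * gamma**2 * Omega**2

def steady_state_amplitude(Omega):
    return (F0 / m) / np.sqrt(f(Omega))

# Locate the minimum of f on a fine grid and compare with the analytic result.
Omegas = np.linspace(0.01, 2.0 * omega, 200001)
Omega_numeric = Omegas[np.argmin(f(Omegas))]
Omega_analytic = np.sqrt(omega**2 - 2.0 * gamma**2)

# Residual of eqn. (1.12.6) for the steady-state term at one driving frequency.
Omega = 1.5
times = np.linspace(0.0, 10.0, 501)
amplitude = steady_state_amplitude(Omega)
beta = np.arctan((omega**2 - Omega**2) / (2.0 * gamma * Omega))
x = amplitude * np.sin(Omega * times + beta)
x_dot = amplitude * Omega * np.cos(Omega * times + beta)
x_ddot = -Omega**2 * x
residual = m * x_ddot + m * omega**2 * x + zeta * x_dot - F0 * np.cos(Omega * times)

print("resonant frequency (grid, analytic):", Omega_numeric, Omega_analytic)
print("max |residual|:", np.max(np.abs(residual)))
```

For weak damping (γ much smaller than ω) the resonant frequency lies just below the natural frequency ω, and the peak amplitude grows as the damping is reduced.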

1.13 Problems

1.1. Solve the equations of motion given arbitrary initial conditions for a one-dimensional particle moving in a linear potential U(x) = Cx, where C is a constant, and sketch a representative phase-space plot.

*1.2. A particle of unit mass moves in a potential of the form

$$
U(x) = -\frac{\omega^2}{8a^2}\left(x^2 - a^2\right)^2.
$$

a. Show that the function $x(t) = a \tanh[(t - t_0)\omega/2]$ is a solution to Hamilton’s equations for this system, where $t_0$ is an arbitrary constant.
b. Let the origin of time be t = −∞ rather than t = 0. To what initial conditions does this solution correspond?
c. Determine the behavior of this solution as t → ∞.
d. Sketch the phase-space plot for this particular solution.

1.3. Determine the trajectory r(t) for a particle of mass m moving in three dimensions subject to a central potential of the form $U(r) = kr^2/2$. Verify your solution for different values of l and given values of m and k by numerically integrating eqn. (1.4.33). Discuss the behavior of the solution for different values of l.

1.4. Repeat Problem 1.3 for a potential of the form U(r) = κ/r.

*1.5. Consider Newton’s equation of motion for a one-dimensional particle subject to an arbitrary force, $m\ddot{x} = F(x)$. A numerical integration algorithm for the equations of motion, known as the velocity Verlet algorithm (see Chapter 3), for a discrete time step value ∆t is

$$
\begin{aligned}
x(\Delta t) &= x(0) + \Delta t \frac{p(0)}{m} + \frac{\Delta t^2}{2m} F(x(0)) \\
p(\Delta t) &= p(0) + \frac{\Delta t}{2}\left[F(x(0)) + F(x(\Delta t))\right].
\end{aligned}
$$

By considering the Jacobian matrix

$$
J = \begin{pmatrix}
\dfrac{\partial x(\Delta t)}{\partial x(0)} & \dfrac{\partial x(\Delta t)}{\partial p(0)} \\
\dfrac{\partial p(\Delta t)}{\partial x(0)} & \dfrac{\partial p(\Delta t)}{\partial p(0)}
\end{pmatrix},
$$

show that the algorithm is symplectic, and show that det[J] = 1.

1.6. A water molecule H2O is subject to an external potential. Let the positions of the three atoms be denoted $\mathbf{r}_{\rm O}$, $\mathbf{r}_{\rm H_1}$, $\mathbf{r}_{\rm H_2}$, so that the forces on the three atoms can be denoted $\mathbf{F}_{\rm O}$, $\mathbf{F}_{\rm H_1}$, and $\mathbf{F}_{\rm H_2}$. Consider treating the molecule as completely rigid, with internal bond lengths $d_{\rm OH}$ and $d_{\rm HH}$, so that the constraints are

$$
\begin{aligned}
|\mathbf{r}_{\rm O} - \mathbf{r}_{\rm H_1}|^2 - d_{\rm OH}^2 &= 0 \\
|\mathbf{r}_{\rm O} - \mathbf{r}_{\rm H_2}|^2 - d_{\rm OH}^2 &= 0 \\
|\mathbf{r}_{\rm H_1} - \mathbf{r}_{\rm H_2}|^2 - d_{\rm HH}^2 &= 0.
\end{aligned}
$$

a. Derive the constrained equations of motion for the three atoms in the molecule in terms of undetermined Lagrange multipliers.
b. Show that the forces of constraint do not contribute to the work done on the molecule in moving it from one spatial location to another.
c. Determine Euler’s equations of motion about an axis perpendicular to the plane of the molecule in a body-fixed frame whose origin is located on the oxygen atom.
d. Determine the equations of motion for the quaternions that describe this system.

1.7. Calculate the classical action for a one-dimensional free particle of mass m. Repeat for a harmonic oscillator of spring constant k.

1.8. A simple mechanical model of a diatomic molecule bound to a flat surface is illustrated in Fig. 1.13. Suppose the atoms with masses $m_1$ and $m_2$ carry electrical charges $q_1$ and $q_2$, respectively, and suppose that the molecule is subject to a constant external electric field E in the vertical direction, directed upwards. In this case, the potential energy of each atom will be $q_i E h_i$, i = 1, 2, where $h_i$ is the height of atom i above the surface.

a. Using $\theta_1$ and $\theta_2$ as generalized coordinates, write down the Lagrangian of the system.
b. Derive the equations of motion for these coordinates.
c. Introduce the small-angle approximation, which assumes that the angles only execute small amplitude motion. What form do the equations of motion take in this approximation?

Fig. 1.13 Schematic of a diatomic molecule bound to a flat surface.

1.9. Use Gauss’s principle of least constraint to determine a set of non-Hamiltonian equations of motion for the two atoms in a rigid diatomic molecule of bond length d subject to an external potential. Take the constraint to be $\sigma(\mathbf{r}_1, \mathbf{r}_2) = |\mathbf{r}_1 - \mathbf{r}_2|^2 - d^2$. Determine the compressibility of your equations of motion.

1.10. Consider the harmonic polymer model of Section 1.7 in which the harmonic neighbor couplings all have the same frequency ω but the masses have alternating values m and M, respectively. For the case of N = 5 particles, determine the normal modes and their associated frequencies.

1.11. The equilibrium configuration of a molecule is represented by three atoms of equal mass at the vertices of a right isosceles triangle. The atoms can be viewed as connected by harmonic springs of equal force constant. Find the normal mode frequencies of this molecule, and, in particular, show that the zero-frequency mode is triply degenerate.

1.12. A particle of mass m moves in a double-well potential of the form

$$
U(x) = \frac{U_0}{a^4}\left(x^2 - a^2\right)^2.
$$

Sketch the contours of the constant-energy surface H(x, p) = E in phase space for the following cases:

a. $E < U_0$.
b. $E = U_0 + \epsilon$, where $\epsilon \ll U_0$.
c. $E > U_0$.

*1.13. The Hamiltonian for a system of N charged particles with charges $q_i$, i = 1, ..., N and masses $m_i$, i = 1, ..., N, positions $\mathbf{r}_1, ..., \mathbf{r}_N$, and momenta $\mathbf{p}_1, ..., \mathbf{p}_N$ interacting with a static electromagnetic field is given by

$$
H = \sum_{i=1}^{N} \frac{\left(\mathbf{p}_i - q_i \mathbf{A}(\mathbf{r}_i)/c\right)^2}{2m_i} + \sum_{i=1}^{N} q_i \phi(\mathbf{r}_i) + \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \frac{q_i q_j}{|\mathbf{r}_i - \mathbf{r}_j|},
$$

where A(r) and φ(r) are the vector and scalar potentials of the field, respectively, and c is the speed of light. In terms of these quantities, the electric and magnetic components of the electromagnetic field, E(r) and B(r), are given by E(r) = −∇φ(r), B(r) = ∇ × A(r).

a. Derive Hamilton’s equations for this system, and determine the force on each particle in terms of the electric and magnetic fields E(r) and B(r), respectively. This contribution to the force due to the field is known as the Lorentz force. Express the equations of motion in Newtonian form.
b. Suppose N = 1, that the electric field is zero everywhere (E = 0), and that the magnetic field is a constant in the z direction, B = (0, 0, B). For this case, solve the equations of motion for an arbitrary initial condition and describe the motion that results.

1.14. Prove that the energy in eqn. (1.11.46) is conserved making use of eqn. (1.11.47) for the torques.

1.15. Let $q_1, ..., q_{3N} \equiv q$ denote the 3N Cartesian coordinate components in a system, such that $(q_1, q_2, q_3) = \mathbf{r}_1$, $(q_4, q_5, q_6) = \mathbf{r}_2$, etc. Similarly, let $p_1, ..., p_{3N} \equiv p$ denote the 3N Cartesian momentum components. Consider a Hamiltonian of the form

$$
H(q, p) = \sum_{\alpha=1}^{3N} \gamma \ln\cosh\left(\frac{p_\alpha}{\sqrt{\gamma m_\alpha}}\right) + U(q_1, ..., q_{3N}),
$$

where γ is an arbitrary parameter and $\{m_\alpha\}$ are the masses associated with each degree of freedom.

a. Derive Hamilton’s equations of motion from this Hamiltonian.
b. Show that the equations of motion imply that the speed of any degree of freedom $|\dot{q}_\alpha|$ is restricted such that $|\dot{q}_\alpha| \le \sqrt{\gamma/m_\alpha}$, even though $|p_\alpha| \in [0, \infty)$.

*1.16 (For amusement only): Consider a system with coordinate q, momentum p, and Hamiltonian

$$
H = \frac{p^n}{n} + \frac{q^n}{n},
$$

where n is an integer larger than 2. Show that if the energy E of the system is chosen such that $nE = m^n$, where m is a positive integer, then no phase-space trajectory can ever pass through a point for which p and q are both positive integers.

2 Theoretical foundations of classical statistical mechanics

2.1 Overview

The field of thermodynamics began in precursory form with the work of Otto von Guericke (1602–1686) who designed the first vacuum pump in 1650 and with Robert Boyle (1627–1691) who, working with von Guericke’s design, discovered an inverse proportionality between the pressure and volume of a gas at a fixed temperature for a fixed amount of gas. This inverse proportionality became known as Boyle’s Law. Thermodynamics matured in the nineteenth century through the seminal work of R. J. Mayer (1814–1878) and J. P. Joule (1818–1889), who established that heat is a form of energy, of R. Clausius (1822–1888) and N. L. S. Carnot (1796–1832), who originated the concept of entropy, and of numerous others. This work is neatly encapsulated in what we now refer to as the laws of thermodynamics (see Section 2.2). As these laws are based on experimental observations, thermodynamics is a phenomenological theory of macroscopic matter, which has, nevertheless, withstood the test of time. The framework of thermodynamics is an elegantly self-consistent one that makes no reference to the microscopic constituents of matter. If, however, we believe in a microscopic theory of matter, then it must be possible to rationalize thermodynamics based on microscopic mechanical laws. In Chapter 1, we presented the laws of classical mechanics and applied them to several simple examples. The laws of classical mechanics imply that if the positions and velocities of all the particles in a system are known at a single instant in time, then the past evolution of the system leading to that point in time and the future evolution of the system from that point forward are known. The example systems considered in Chapter 1 consisted of one or a small number of degrees of freedom with simple forces, and we saw that the past and future of each system could be worked out from Newton’s second law of motion (see, for example, eqn. (1.2.10)). 
Thus, classical mechanics encodes all the information needed to predict the properties of a classical system at any instant in time. In order to provide a rational basis for thermodynamics, we should apply the microscopic laws of motion to macroscopic systems. However, this idea immediately meets with two serious issues. First, macroscopic systems possess an enormous number of degrees of freedom (1 mole consists of 6.022×10²³ particles); second, real-world systems are characterized by highly nontrivial interactions. Hence, even though we should be able, in principle, to predict the microscopic detailed dynamics of any classical system knowing only the initial conditions, we quickly realize the hopelessness of this effort.


The highly nonlinear character of the forces in realistic systems means that an analytical solution of the equations of motion is not available. If we propose, alternatively, to solve the equations of motion numerically on a computer, the memory requirement to store just one phase-space point for a system of 10²³ particles exceeds what is available both today and in the foreseeable future. Thus, while classical mechanics encodes all the information needed to predict the properties of a system, the problem of extracting that information is seemingly intractable. In addition to the problem of the sheer size of macroscopic systems, another, more subtle, issue exists. The second law of thermodynamics prescribes a direction of time, namely, the direction in which the entropy increases. This “arrow” of time is seemingly at odds with the microscopic mechanical laws, which are inherently reversible in time.¹ This paradoxical situation, known as Loschmidt’s paradox, seems to pit thermodynamics against microscopic mechanical laws. The reconciliation of macroscopic thermodynamics with the microscopic laws of motion required the development of a new field, statistical mechanics, the main topic of this book. Statistical mechanics began with ideas from Clausius and James C. Maxwell (1831–1879) but grew principally out of the work of Ludwig Boltzmann (1844–1906) and Josiah W. Gibbs (1839–1903). (Other significant contributors include Henri Poincaré, Albert Einstein, and later, Lars Onsager, Richard Feynman, Ilya Prigogine, Kenneth Wilson, and Benjamin Widom, to name just a few.) Early innovations in statistical mechanics derived from the realization that the macroscopic observable properties of a system do not depend strongly on the detailed dynamical motion of every particle in a macroscopic system but rather on gross averages that largely “wash out” these microscopic details.
Thus, by applying the microscopic mechanical laws in a statistical fashion, a link can be provided between the microscopic and macroscopic theories of matter. Not only does this concept provide a rational basis for thermodynamics, it also leads to procedures for computing many other macroscopic observables. The principal conceptual breakthrough on which statistical mechanics is based is that of an ensemble, which refers to a collection of systems that share common macroscopic properties. Averages performed over an ensemble yield the thermodynamic quantities of a system as well as other equilibrium and dynamic properties. In this chapter, we will lay out the fundamental theoretical foundations of ensemble theory and show how the theory establishes the link between the microscopic and macroscopic realms. We begin with a discussion of the laws of thermodynamics and a number of important thermodynamic functions. Following this, we introduce the notion of an ensemble and the properties that an ensemble must obey. Finally, we will describe, in general terms, how to use an ensemble to calculate macroscopic properties. Specific types of ensembles and their use will be detailed in subsequent chapters.

¹ It can be easily shown, for example, that Newton’s second law retains its form under a time-reversal transformation t → −t. Under this transformation, d/dt → −d/dt, but d²/dt² → d²/dt². Time-reversal symmetry implies that if a mechanical system evolves from an initial condition $x_0$ at time t = 0 to $x_t$ at a time t > 0, and all the velocities are subsequently reversed ($v_i \to -v_i$), the system will return to its initial microscopic state $x_0$. The same is true of the microscopic laws of quantum mechanics. Consequently, it should not be possible to tell if a “movie” made of a mechanical system is running in the “forward” or “reverse” direction.
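The time-reversal property described in this footnote can be illustrated with a short numerical experiment. The sketch below propagates a one-dimensional particle with the velocity Verlet algorithm quoted in Problem 1.5, reverses the momentum, and propagates for the same number of steps. The anharmonic force, mass, time step, and initial condition are illustrative assumptions.

```python
def force(x):
    """Illustrative anharmonic force (an assumption, not from the text)."""
    return -x - x**3

def velocity_verlet(x, p, m, dt, n_steps):
    """Propagate (x, p) for n_steps using the velocity Verlet update."""
    f = force(x)
    for _ in range(n_steps):
        x = x + dt * p / m + 0.5 * dt**2 * f / m
        f_new = force(x)
        p = p + 0.5 * dt * (f + f_new)
        f = f_new
    return x, p

m, dt, n_steps = 1.0, 0.01, 5000
x0, p0 = 1.0, 0.5

# Forward evolution from the initial condition.
xt, pt = velocity_verlet(x0, p0, m, dt, n_steps)

# Reverse the momentum and evolve for the same length of time.
x_back, p_back = velocity_verlet(xt, -pt, m, dt, n_steps)

print("returned position:", x_back, " initial position:", x0)
print("returned momentum:", p_back, " minus initial momentum:", -p0)
```

The system should retrace its path and arrive at the initial position with the opposite momentum, up to floating-point roundoff, because this particular update rule is itself symmetric under time reversal.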

2.2 The laws of thermodynamics

Our discussion of the laws of thermodynamics will make no reference to the microscopic constituents of a particular system. Concepts and definitions we will need for the discussion are described below:

i. A thermodynamic system is a macroscopic system. Thermodynamics always divides the universe into the system and its surroundings. A thermodynamic system is said to be isolated if no heat or material is exchanged between the system and its surroundings and if the surroundings produce no other change in the thermodynamic state of the system.

ii. A system is in thermodynamic equilibrium if its thermodynamic state does not change in time.

iii. The fundamental thermodynamic parameters that define a thermodynamic state, such as the pressure P, volume V, the temperature T, and the total mass M or number of moles n, are measurable quantities assumed to be provided experimentally. A thermodynamic state is specified by providing values of all thermodynamic parameters necessary for a complete description of a system.

iv. The equation of state of a system is a relationship among the thermodynamic parameters prescribing how these parameters vary from one equilibrium state to another. Thus, if P, V, T, and n are the fundamental thermodynamic parameters of a system, the equation of state takes the general form

$$
g(n, P, V, T) = 0.
\tag{2.2.1}
$$

As a consequence of eqn. (2.2.1), there are in fact only three independent thermodynamic parameters in an equilibrium state. When the number of moles remains fixed, the number of independent parameters is reduced to two. An example of an equation of state is that of an ideal gas, which is defined (thermodynamically) as a system whose equation of state is

$$
PV - nRT = 0,
\tag{2.2.2}
$$

where R = 8.315 J·mol⁻¹·K⁻¹ is the gas constant. The ideal gas represents the limiting behavior of all real gases at sufficiently low density ρ ≡ n/V.

v. A thermodynamic transformation is a change in the thermodynamic state of a system. In equilibrium, a thermodynamic transformation is effected by a change in the external conditions of the system. Thermodynamic transformations can be carried out either reversibly or irreversibly. In a reversible transformation, the change is carried out slowly enough that the system has time to adjust to each new external condition imposed along a prescribed thermodynamic path, so that the system can retrace its history along the same path between the endpoints of the transformation. If this is not possible, then the transformation is irreversible.

vi. A state function is any function f(n, P, V, T) whose change under any thermodynamic transformation depends only on the initial and final states of the transformation and not on the particular thermodynamic path taken between these states (see Fig. 2.1).


Fig. 2.1 The thermodynamic state space defined by the variables P , V , and T with two paths (solid and dashed lines) between the state points 1 and 2. The change in a state function f (n, P, V, T ) is independent of the path taken between any two such state points.

vii. In order to change the volume or the number of moles, work must be performed on a system. If a transformation is performed reversibly such that the volume changes by an amount dV and the number of moles changes by an amount dn, then the work performed on the system is

$$
dW_{\rm rev} = -P\,dV + \mu\,dn.
\tag{2.2.3}
$$

The quantity µ is called the chemical potential, defined to be the amount of work needed to add 1.0 mole of a substance to a system already containing that substance.

viii. In order to change the temperature of a system, heat must be added or removed. The amount of heat dQ needed to change the temperature by an amount dT in a reversible process is given by

$$
dQ_{\rm rev} = C\,dT.
\tag{2.2.4}
$$

The quantity C is called the heat capacity, defined to be the amount of heat needed to change the temperature of 1.0 mole of a substance by 1.0 degree on a chosen scale. If heat is added at fixed pressure, then the heat capacity is denoted $C_P$. If heat is added at fixed volume, it is denoted $C_V$.

2.2.1 The first law of thermodynamics

The first law of thermodynamics is a statement of conservation of energy. We saw in Section 1.6 that performing work on a system changes its potential (or internal) energy (see Section 1.4). Thermodynamics recognizes that heat is also a form of energy. The first law states that in any thermodynamic transformation, if a system absorbs an amount of heat ∆Q and has an amount of work ∆W performed on it, then its internal energy will change by an amount ∆E given by

$$
\Delta E = \Delta Q + \Delta W.
\tag{2.2.5}
$$

(Older books define the first law in terms of the heat absorbed and work done by the system. With this convention, the first law is written ∆E = ∆Q − ∆W.) Although neither the heat absorbed ∆Q nor the work ∆W done on the system is a state function, the internal energy E is a state function. Thus, the transformation can be carried out along either a reversible or irreversible path, and the same value of ∆E will result. If $E_1$ and $E_2$ represent the energies before and after the transformation, respectively, then $\Delta E = E_2 - E_1$, and it follows that an exact differential dE exists for the energy such that

$$
\Delta E = E_2 - E_1 = \int_{E_1}^{E_2} dE.
\tag{2.2.6}
$$

However, since ∆E is independent of the path of the transformation, ∆E can be expressed in terms of changes along either a reversible or irreversible path:

$$
\Delta E = \Delta Q_{\rm rev} + \Delta W_{\rm rev} = \Delta Q_{\rm irrev} + \Delta W_{\rm irrev}.
\tag{2.2.7}
$$

Suppose that reversible and irreversible transformations are carried out on a system with a fixed number of moles, and let the irreversible process be one in which the external pressure drops to a value $P_{\rm ext}$ by a sudden volume change ∆V, thus allowing the system to expand rapidly. It follows that the work done on the system is

$$
\Delta W_{\rm irrev} = -P_{\rm ext} \Delta V.
\tag{2.2.8}
$$

In such a process, the internal pressure $P > P_{\rm ext}$. If the same expansion is carried out reversibly (slowly), then the internal pressure has time to adjust as the system expands. Since

$$
\Delta W_{\rm rev} = -\int_{V_1}^{V_2} P\,dV,
\tag{2.2.9}
$$

where the dependence of the internal pressure P on the volume is specified by the equation of state, and since $P_{\rm ext}$ in the irreversible process is less than P at all states visited in the reversible process, it follows that $-\Delta W_{\rm irrev} < -\Delta W_{\rm rev}$, or $\Delta W_{\rm irrev} > \Delta W_{\rm rev}$. However, because of eqn. (2.2.7), the first law implies that the amounts of heat absorbed in the two processes satisfy

$$
\Delta Q_{\rm irrev} < \Delta Q_{\rm rev}.
\tag{2.2.10}
$$

Equation (2.2.10) will be needed in our discussion of the second law of thermodynamics. Of course, since the thermodynamic universe is, by definition, an isolated system (it has no surroundings), its energy is conserved. Therefore, any change ∆Esys in a system must be accompanied by an equal and opposite change ∆Esurr in the surroundings so that the net energy change of the universe ∆Euniv = ∆Esys + ∆Esurr = 0.
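The two inequalities for the work and the heat can be checked with a short numerical sketch. It assumes one mole of ideal gas kept at a fixed temperature by a heat bath, so that the internal energy change vanishes along both paths; the function names and the values of n, T, V1, and V2 are illustrative choices and do not come from the text.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol K)

def work_reversible_isothermal(n, T, V1, V2):
    """Work done ON an ideal gas in a reversible isothermal volume change, eqn. (2.2.9)."""
    return -n * R * T * math.log(V2 / V1)

def work_sudden(P_ext, V1, V2):
    """Work done ON the gas when it changes volume against a constant external pressure, eqn. (2.2.8)."""
    return -P_ext * (V2 - V1)

# Illustrative (assumed) state points: 1 mol at 300 K doubling its volume
n, T = 1.0, 300.0
V1, V2 = 0.010, 0.020          # m^3
P_ext = n * R * T / V2         # external pressure equal to the final equilibrium pressure

W_rev = work_reversible_isothermal(n, T, V1, V2)
W_irrev = work_sudden(P_ext, V1, V2)

# Both paths connect the same isothermal end states of an ideal gas, so dE = 0 and Q = -W
Q_rev, Q_irrev = -W_rev, -W_irrev

print(f"W_rev = {W_rev:.1f} J, W_irrev = {W_irrev:.1f} J")
print(f"Q_rev = {Q_rev:.1f} J, Q_irrev = {Q_irrev:.1f} J")
```

With these numbers the gas does less work in the sudden expansion than in the reversible one, and correspondingly absorbs less heat, as eqn. (2.2.10) requires.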


2.2.2 The second law of thermodynamics

Before discussing the second law of thermodynamics, it is useful to review the Carnot cycle. The Carnot cycle is the thermodynamic cycle associated with an ideal device or “engine” that takes in heat at a high temperature $T_h$, expels heat at a low temperature $T_l$, and, over the cycle, delivers a net amount of useful work. The ideal engine provides an upper bound on the efficiency that can be achieved by a real engine. The thermodynamic cycle of a Carnot engine is shown in Fig. 2.2, which is a plot of the process in the P-V plane. In the cycle, each of the four transformations (curves AB, BC, CD and DA in Fig. 2.2) is assumed to be performed reversibly on an ideal gas.

Fig. 2.2 The Carnot cycle.

The four stages of the cycle are defined as follows:

• Path AB: An amount of heat $Q_h$ is absorbed at a high temperature $T_h$, and the system undergoes an isothermal expansion at this temperature.
• Path BC: The system further expands adiabatically so that no further heat is gained or lost.
• Path CD: The system is compressed at a low temperature $T_l$, and an amount of heat $Q_l$ is released by the system.
• Path DA: The system undergoes a further adiabatic compression in which no further heat is gained or lost.

Since the cycle is closed, the change in the internal energy in this process is $\Delta E = 0$. Thus, according to the first law of thermodynamics, the net work output by the Carnot engine is given by

$$-W_{\rm net} = \Delta Q = Q_h + Q_l. \tag{2.2.11}$$

The efficiency $\epsilon$ of any engine is defined as the ratio of the net work output to the heat input,

$$\epsilon = -\frac{W_{\rm net}}{Q_h}, \tag{2.2.12}$$

from which it follows that the efficiency of the Carnot engine is

$$\epsilon = 1 + \frac{Q_l}{Q_h}. \tag{2.2.13}$$

On the other hand, the work done on (or by) the system during the adiabatic expansion and compression phases cancels, so that the net work comes from the isothermal expansion and compression segments. From the ideal gas law, eqn. (2.2.2), the work done on the system during the initial isothermal expansion phase is simply

$$W_h = -\int_{V_A}^{V_B} P\,dV = -\int_{V_A}^{V_B} \frac{nRT_h}{V}\,dV = -nRT_h \ln\left(\frac{V_B}{V_A}\right), \tag{2.2.14}$$

while the work done on the system during the isothermal compression phase is

$$W_l = -\int_{V_C}^{V_D} \frac{nRT_l}{V}\,dV = -nRT_l \ln\left(\frac{V_D}{V_C}\right). \tag{2.2.15}$$

However, because the temperature ratio for both adiabatic phases is the same, namely, $T_h/T_l$, it follows that the volume ratios $V_C/V_B$ and $V_D/V_A$ are also the same. Since $V_C/V_B = V_D/V_A$, it follows that $V_B/V_A = V_C/V_D$, and the net work output is

$$-W_{\rm net} = -(W_h + W_l) = nR(T_h - T_l)\ln\left(\frac{V_B}{V_A}\right). \tag{2.2.16}$$

The internal energy of an ideal gas is $E = 3nRT/2$, and therefore the energy change during an isothermal process is $\Delta E = 0$. Hence, for the initial isothermal expansion phase, $Q_h = -W_h = nRT_h \ln(V_B/V_A)$. The efficiency can also be expressed in terms of the temperatures as

$$\epsilon = -\frac{W_{\rm net}}{Q_h} = \frac{nR(T_h - T_l)\ln(V_B/V_A)}{nRT_h \ln(V_B/V_A)} = 1 - \frac{T_l}{T_h}. \tag{2.2.17}$$

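Equating the two efficiency expressions, we have

$$\begin{aligned}
1 + \frac{Q_l}{Q_h} &= 1 - \frac{T_l}{T_h} \\
\frac{Q_l}{Q_h} &= -\frac{T_l}{T_h} \\
\frac{Q_h}{T_h} + \frac{Q_l}{T_l} &= 0.
\end{aligned} \tag{2.2.18}$$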
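The chain of results from eqn. (2.2.14) to eqn. (2.2.18) can be verified numerically. The sketch below is a minimal check under stated assumptions: the end points of the adiabats are fixed with the relation T V^(γ−1) = const for a monatomic ideal gas with γ = 5/3 (a standard result that is not derived in this section), and the function name, temperatures, and volumes are illustrative.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol K)

def carnot_cycle(n, T_h, T_l, V_A, V_B, gamma=5.0 / 3.0):
    """Heats, net work, and efficiency of an ideal-gas Carnot cycle.

    Sign convention of the text: Q is heat absorbed BY the gas,
    W is work done ON the gas.
    """
    # Assumed adiabatic relation T * V**(gamma - 1) = const fixes V_C and V_D
    ratio = (T_h / T_l) ** (1.0 / (gamma - 1.0))
    V_C, V_D = V_B * ratio, V_A * ratio

    W_h = -n * R * T_h * math.log(V_B / V_A)   # eqn. (2.2.14)
    W_l = -n * R * T_l * math.log(V_D / V_C)   # eqn. (2.2.15)
    Q_h, Q_l = -W_h, -W_l                      # isotherms of an ideal gas: dE = 0
    W_net = W_h + W_l                          # adiabatic contributions cancel
    return {"Q_h": Q_h, "Q_l": Q_l, "W_net": W_net, "efficiency": -W_net / Q_h}

T_h, T_l = 500.0, 300.0
cycle = carnot_cycle(n=1.0, T_h=T_h, T_l=T_l, V_A=1.0e-3, V_B=2.0e-3)

# Left side of the last line of eqn. (2.2.18); should vanish up to round-off
clausius_sum = cycle["Q_h"] / T_h + cycle["Q_l"] / T_l

print(f"efficiency = {cycle['efficiency']:.6f}, 1 - T_l/T_h = {1.0 - T_l / T_h:.6f}")
print(f"Q_h/T_h + Q_l/T_l = {clausius_sum:.2e}")
```

The efficiency computed from the heats agrees with 1 − T_l/T_h, and the sum of Q/T over the two isotherms vanishes to round-off, independently of the chosen volumes.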


Equation (2.2.18) indicates that there is a quantity $\Delta Q_{\rm rev}/T$ whose change over the closed cycle is 0. The “rev” subscript serves as a reminder that the Carnot cycle is carried out using reversible transformations. Thus, the quantity $\Delta Q_{\rm rev}/T$ is a state function, and although we derived this fact using an idealized Carnot cycle, it turns out that this quantity is always a state function. This means that there is an exact differential $dS = dQ_{\rm rev}/T$ such that $S$ is a state function. The quantity $\Delta S$, defined by

$$\Delta S = \int_1^2 \frac{dQ_{\rm rev}}{T}, \tag{2.2.19}$$

is therefore independent of the path over which the transformation from state “1” to state “2” is carried out. The quantity $S$ is the entropy of the system. The second law of thermodynamics is a statement about the behavior of the entropy in any thermodynamic transformation. From eqn. (2.2.10), which implies that $dQ_{\rm irrev} < dQ_{\rm rev}$, we obtain

$$dS = \frac{dQ_{\rm rev}}{T} > \frac{dQ_{\rm irrev}}{T}, \tag{2.2.20}$$

which is known as the Clausius inequality. If this inequality is now applied to the thermodynamic universe, an isolated system that absorbs and releases no heat ($dQ = 0$), then the total entropy $dS_{\rm tot} = dS_{\rm sys} + dS_{\rm surr}$ satisfies

$$dS_{\rm tot} \geq 0. \tag{2.2.21}$$

That is, in any thermodynamic transformation, the total entropy of the universe must either increase or remain the same. dStot > 0 pertains to an irreversible process while dStot = 0 pertains to a reversible process. Equation (2.2.21) is the second law of thermodynamics. Our analysis of the Carnot cycle allows us to understand two equivalent statements of the second law. The first, attributed to William Thomson (1824–1907), known later as the First Baron Kelvin or Lord Kelvin, reads: There exists no thermodynamic transformation whose sole effect is to extract a quantity of heat from a high-temperature source and convert it entirely into work. In fact, some of the heat absorbed at Th is always lost in the form of waste heat, which in the Carnot cycle is the heat Ql released at Tl . The loss of waste heat means that −Wnet < −Wh or that the net work done by the system must be less than the work done during the first isothermal expansion phase. Now suppose we run the Carnot cycle in reverse so that an amount of heat Ql is absorbed at Tl and released as Qh at Th . In the process, an amount of work Wnet is consumed by the system. Thus, the Carnot cycle operated in reverse performs as a refrigerator, moving heat from a cold source to a hot source. This brings us to the second statement of the second law, attributed to Clausius: There exists no thermodynamic transformation whose sole effect is to extract a quantity of heat from a cold source and deliver it to a hot source. That is, heat does not flow spontaneously from cold to hot; moving heat in this direction requires that work be done.
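A small entropy bookkeeping exercise illustrates eqn. (2.2.21) for the sudden isothermal expansion discussed with the first law. The sketch assumes that the surroundings act as a heat bath large enough to stay at temperature T, so that the bath's entropy change is minus the heat it supplies divided by T; this assumption, the function name, and the numerical values are illustrative and not part of the text.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol K)

def entropy_changes_isothermal_expansion(n, T, V1, V2, P_ext):
    """Entropy changes of system, surroundings, and universe for an ideal gas that
    expands from V1 to V2 at temperature T against a constant external pressure."""
    # System: S is a state function, so eqn. (2.2.19) is evaluated on the reversible isotherm
    dS_sys = n * R * math.log(V2 / V1)
    # Heat actually absorbed by the gas (dE = 0 for an ideal gas at fixed T)
    Q_irrev = P_ext * (V2 - V1)
    # Surroundings: assumed to be a bath that stays at T while giving up Q_irrev
    dS_surr = -Q_irrev / T
    return dS_sys, dS_surr, dS_sys + dS_surr

n, T, V1, V2 = 1.0, 300.0, 0.010, 0.020
P_final = n * R * T / V2
dS_sys, dS_surr, dS_tot = entropy_changes_isothermal_expansion(n, T, V1, V2, P_ext=P_final)

print(f"dS_sys = {dS_sys:.3f} J/K, dS_surr = {dS_surr:.3f} J/K, dS_tot = {dS_tot:.3f} J/K")
```

The system gains more entropy than the bath loses, so the total is positive, as expected for an irreversible process.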

2.2.3 The third law of thermodynamics

As with any state function, it is only possible to measure changes in the entropy, which does not inherently require an absolute entropy scale. The third law of thermodynamics defines an absolute scale of entropy: The entropy of a system at the absolute zero of temperature is a universal constant, which can be taken to be zero. Absolute zero of temperature is defined as T = 0 on the Kelvin scale; it is a temperature that can never be physically reached. The unattainability of absolute zero is sometimes taken as an alternative statement of the third law. A consequence of the unattainability of absolute zero temperature is that the ideal (Carnot) engine can never be 100% efficient, since this would require sending Tl → 0 in eqn. (2.2.17), which is not possible. As we will see in Chapter 10, the third law of thermodynamics is actually a macroscopic manifestation of quantum mechanical effects.

2.3 The ensemble concept

We introduced the laws of thermodynamics without reference to the microscopic origin of macroscopic thermodynamic observables. Without a microscopic basis, thermodynamics must be regarded as a phenomenological theory. We now wish to provide this microscopic basis and establish a connection between the macroscopic and microscopic realms. As we remarked at the beginning of the chapter, we cannot solve the classical equations of motion for a system of $10^{23}$ particles with the complex, nonlinear interactions that govern the behavior of real systems. Nevertheless, it is instructive to pose the following question: If we could solve the equations of motion for such a large number of particles, would the vast amount of detailed microscopic information generated be necessary to describe macroscopic observables? Intuitively, we would answer this question with “no.” Although the enormous quantity of microscopic information is certainly sufficient to predict any macroscopic observable, there are many microscopic configurations of a system that lead to the same macroscopic properties. For example, if we connect the temperature of a system to an average of the kinetic energy of the individual particles composing the system, then there are many ways to assign the velocities of the particles consistent with a given total energy such that the same total kinetic energy and, hence, the same measure of temperature is obtained. Nevertheless, each assignment corresponds to a different point in phase space and, therefore, a different and unique microscopic state. Similarly, if we connect the pressure to the average force per unit area exerted by the particles on the walls of the container, there are many ways of arranging the particles such that the forces between them and the walls yield the same pressure measure, even though each assignment corresponds to a unique point in phase space and, hence, a unique microscopic state.
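The statement that many velocity assignments share one total kinetic energy is easy to demonstrate. The following sketch draws several random velocity sets for identical particles and rescales each onto the same kinetic-energy value; the rescaling recipe, the particle number, the reduced units, and the function names are illustrative assumptions rather than a procedure taken from the text.

```python
import math
import random

def velocities_with_fixed_kinetic_energy(n_particles, kinetic_energy, mass, rng):
    """Return one velocity assignment (list of 3-vectors) with the prescribed total kinetic energy."""
    v = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_particles)]
    current = 0.5 * mass * sum(c * c for vi in v for c in vi)
    scale = math.sqrt(kinetic_energy / current)   # rescale onto the constant-energy shell
    return [[scale * c for c in vi] for vi in v]

def total_kinetic_energy(v, mass):
    return 0.5 * mass * sum(c * c for vi in v for c in vi)

rng = random.Random(0)
mass, target = 1.0, 150.0      # arbitrary reduced units (assumed)
samples = [velocities_with_fixed_kinetic_energy(100, target, mass, rng) for _ in range(5)]
energies = [total_kinetic_energy(v, mass) for v in samples]

for v, e in zip(samples, energies):
    print(f"first particle velocity = {v[0]}, total kinetic energy = {e:.6f}")
```

Each sample is a distinct point in the velocity part of phase space, yet all of them report the same kinetic energy and would therefore be assigned the same temperature.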
Suppose we aimed, instead, to predict macroscopic time-dependent properties. By the same logic, if we started with a large set of initial conditions drawn from a state of thermodynamic equilibrium, and if we launched a trajectory from each initial condition in the set, then the resulting trajectories would all be unique in phase space. Despite their uniqueness, these trajectories should all lead, in the long time limit, to the same macroscopic dynamical observables such as vibrational spectra, diffusion constants, and so forth.


The idea that the macroscopic observables of a system are not sensitive to precise microscopic details is the basis of the ensemble concept originally introduced by Gibbs. More formally, an ensemble is a collection of systems described by the same set of microscopic interactions and sharing a common set of macroscopic properties (e.g. the same total energy, volume, and number of moles). Each system evolves under the microscopic laws of motion from a different initial condition so that at any point in time, every system has a unique microscopic state. Once an ensemble is defined, macroscopic observables are calculated by performing averages over the systems in the ensemble. Ensembles can be defined for a wide variety of thermodynamic situations. The simplest example is a system isolated from its surroundings. However, ensembles also describe systems in contact with heat baths, systems in contact with particle reservoirs, systems coupled to pressure control mechanisms such as mechanical pistons, and various combinations of these influences. Such ensembles are useful for determining static properties such as temperature, pressure, free energy, average structure, etc. Thus, the fact that the systems in the ensemble evolve in time does not affect properties of this type, and we may freeze the ensemble at any instant and perform the average over the ensemble at that instant. These ensembles are known as equilibrium ensembles, and we will focus on them up to and including Chapter 12. Finally, ensembles can also be defined for systems driven by external forces or fields for the calculation of transport coefficients and other dynamical properties. These are examples of non-equilibrium ensembles, which will be discussed in Chapters 13 and 14. In classical ensemble theory, every macroscopic observable of a system is directly connected to a microscopic function of the coordinates and momenta of the system. 
A familiar example of this comes from the kinetic theory, where the temperature of a system is connected to the average kinetic energy. In general, we will let $A$ denote a macroscopic equilibrium observable and $a(x)$ denote a microscopic phase-space function that can be used to calculate $A$. According to the ensemble concept, if the ensemble has $Z$ members, then the “connection” between $A$ and $a(x)$ is provided via an averaging procedure, which we write heuristically as

$$A = \frac{1}{Z}\sum_{\lambda=1}^{Z} a(x_\lambda) \equiv \langle a \rangle. \tag{2.3.1}$$
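To make the averaging procedure of eqn. (2.3.1) concrete, the sketch below builds a small ensemble of one-dimensional harmonic oscillators that all share the same energy but differ in phase, freezes the ensemble at several instants, and averages the kinetic energy over its members. The choice of system, the evenly spaced phases, and all names and parameter values are assumptions made for illustration.

```python
import math

def ensemble_average(a, members):
    """Heuristic ensemble average of eqn. (2.3.1): (1/Z) * sum of a(x) over the Z members."""
    return sum(a(x) for x in members) / len(members)

def oscillator_ensemble(energy, mass, omega, n_members, t):
    """Phase-space points (q, p) at time t for harmonic oscillators that share one
    energy but start from evenly spaced phases (an assumed, illustrative ensemble)."""
    amp_q = math.sqrt(2.0 * energy / (mass * omega**2))
    amp_p = math.sqrt(2.0 * mass * energy)
    phases = [2.0 * math.pi * k / n_members for k in range(n_members)]
    return [(amp_q * math.cos(omega * t + ph), -amp_p * math.sin(omega * t + ph))
            for ph in phases]

mass, omega, energy = 1.0, 2.0, 3.0

def kinetic(x):
    """Microscopic phase-space function a(x) = p^2 / 2m."""
    return x[1] ** 2 / (2.0 * mass)

# Freeze the ensemble at three different instants and average over its members
averages = [ensemble_average(kinetic, oscillator_ensemble(energy, mass, omega, 1000, t))
            for t in (0.0, 0.37, 5.0)]
print(averages)
```

Every member has a different kinetic energy at a given instant, but the ensemble average equals half the total energy no matter when the ensemble is frozen.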

This definition is not to be taken literally, since the sum may well be a continuous “sum” or integral. However, eqn. (2.3.1) conveys the notion that the phase-space function $a(x)$ must be evaluated for each member of the ensemble at that point in time when the ensemble is frozen. Finally, $A$ is obtained by performing an average over the ensemble. (The notation $\langle a \rangle$ in eqn. (2.3.1) will be used throughout the book to denote an ensemble average.)

Let us recall the question we posed earlier: If we could solve the equations of motion for a very large number of particles, would the vast amount of detailed microscopic information generated be necessary to describe macroscopic observables? Previously, we answered this in the negative. However, the other side can also be argued if we take a purist’s view. That is, all of the information needed to describe a physical system is encoded in the microscopic equations of motion. Indeed, there are many physical and chemical processes for which the underlying atomic and molecular mechanics are of significant interest and importance. In order to elucidate these, it is necessary to know how individual atoms and molecules move as the process occurs. Experimental techniques such as ultrafast laser spectroscopy can resolve processes at increasingly short time scales and thus obtain important insights into such motions. (The importance of such techniques was recognized by the award of the 1999 Nobel Prize in chemistry to the physical chemist Ahmed Zewail (1946–2016) for his pioneering work in their development.)

While we cannot expect to solve the equations of motion for $10^{23}$ particles, we actually can solve them numerically for systems whose particle numbers range from $10^{2}$ to $10^{9}$, depending on the complexity of the interactions in a particular physical model. The technique of solving the equations of motion numerically for small representative systems is known as molecular dynamics, a method that has become one of the most important theoretical tools for solving statistical mechanical problems. Although the system sizes currently accessible to molecular dynamics calculations are not truly macroscopic, they are large enough to capture the macroscopic limit for certain properties. Thus, a molecular dynamics calculation, which can be viewed as a kind of detailed “thought experiment” performed in silico, can yield important microscopic insights into complex phenomena including the catalytic mechanisms of enzymes, details of protein folding and misfolding processes, formation of supramolecular structures, and many other fascinating phenomena. We will have more to say about molecular dynamics and other methods for solving statistical mechanical problems throughout the book. For the remainder of this chapter, we will focus on the fundamental underpinnings of ensemble theory.
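As a first impression of what solving the equations of motion numerically involves, the sketch below propagates a single particle in one dimension with the velocity Verlet scheme and monitors the total energy. The integrator is one common finite-difference choice (such methods are the subject of Section 3.8), and the model potential, time step, and function names are assumptions made for this illustration only.

```python
def force(q):
    """Force for the assumed model potential U(q) = q**2 / 2 + q**4 / 4."""
    return -q - q**3

def energy(q, p, mass=1.0):
    """Total energy H = p^2 / 2m + U(q) for the model potential."""
    return p**2 / (2.0 * mass) + 0.5 * q**2 + 0.25 * q**4

def velocity_verlet(q, p, dt, n_steps, mass=1.0):
    """Propagate one trajectory and return the list of (q, p) points visited."""
    trajectory = [(q, p)]
    f = force(q)
    for _ in range(n_steps):
        p += 0.5 * dt * f          # first half-step momentum update
        q += dt * p / mass         # full-step position update
        f = force(q)               # force at the new position
        p += 0.5 * dt * f          # second half-step momentum update
        trajectory.append((q, p))
    return trajectory

trajectory = velocity_verlet(q=1.0, p=0.0, dt=0.01, n_steps=10_000)
energies = [energy(q, p) for q, p in trajectory]
drift = max(energies) - min(energies)
print(f"initial energy = {energies[0]:.6f}, spread over the run = {drift:.2e}")
```

The trajectory is a sequence of phase-space points, and the small spread in the energy is a basic sanity check that the discrete dynamics stays close to the constant-energy motion generated by Hamilton's equations.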

2.4 Phase-space volumes and Liouville’s theorem

As noted previously, an ensemble is a collection of systems with a set of common macroscopic properties such that each system is in a unique microscopic state at any point in time as determined by its evolution under some dynamical rule, e.g., Hamilton’s equations of motion. Given this definition, and assuming that the evolution of the collection of systems is prescribed by Hamilton’s equations, it is important first to understand how a collection of microscopic states (which we refer to hereafter simply as “microstates”) moves through phase space. Consider a collection of microstates in a phase-space volume element $dx_0$ centered on the point $x_0$. The “0” subscript indicates that each microstate in the volume element serves as an initial condition for Hamilton’s equations, which we had written in eqn. (1.6.23) as $\dot{x} = \eta(x)$. The equations of motion can be generalized to the case of a set of driven Hamiltonian systems by writing them as $\dot{x} = \eta(x, t)$. We now ask how the entire volume element $dx_0$ moves under the action of Hamiltonian evolution. Recall that $x_0$ is a complete set of generalized coordinates and conjugate momenta:

$$x_0 = (q_1(0), \ldots, q_{3N}(0), p_1(0), \ldots, p_{3N}(0)). \tag{2.4.1}$$

(We will refer to the complete set of generalized coordinates and their conjugate momenta collectively as the phase-space coordinates.) If we follow the evolution of this volume element from $t = 0$ to time $t$, $dx_0$ will be transformed into a new volume element $dx_t$ centered on a point $x_t$ in phase space. The point $x_t$ is the phase-space point that results from the evolution of $x_0$. As we noted in Section 1.6, $x_t$ is a unique function of $x_0$ that can be expressed as $x_t(x_0)$. Since the mapping of the point $x_0$ to $x_t$ is one-to-one, this mapping is equivalent to a coordinate transformation on the phase space from initial phase-space coordinates $x_0$ to phase-space coordinates $x_t$. Under this transformation, the volume element $dx_0$ transforms according to

$$dx_t = J(x_t; x_0)\,dx_0, \tag{2.4.2}$$

where $J(x_t; x_0)$ is the Jacobian of the transformation, the determinant of the matrix $\mathrm{J}$ defined in eqn. (1.6.28), from $x_0$ to $x_t$. According to eqn. (1.6.28), the elements of the matrix are

$$J_{kl} = \frac{\partial x_t^k}{\partial x_0^l}. \tag{2.4.3}$$

We propose to determine the Jacobian in eqn. (2.4.2) by deriving an equation of motion it obeys and then solving this equation of motion. To accomplish this, we start with the definition,

$$J(x_t; x_0) = \det(\mathrm{J}), \tag{2.4.4}$$

analyze the derivative

$$\frac{d}{dt} J(x_t; x_0) = \frac{d}{dt}\det(\mathrm{J}), \tag{2.4.5}$$

and derive a first-order differential equation obeyed by $J(x_t; x_0)$. The time derivative of the determinant is most easily computed by applying an identity satisfied by determinants

$$\det(\mathrm{J}) = e^{\mathrm{Tr}[\ln(\mathrm{J})]}, \tag{2.4.6}$$

where Tr is the trace operation: $\mathrm{Tr}(\mathrm{J}) = \sum_k J_{kk}$. Equation (2.4.6) is most easily proved by first transforming $\mathrm{J}$ into a representation in which it is diagonal. If $\mathrm{J}$ has eigenvalues $\lambda_k$, then $\ln(\mathrm{J})$ is a diagonal matrix with eigenvalues $\ln(\lambda_k)$, and the trace operation yields $\mathrm{Tr}[\ln(\mathrm{J})] = \sum_k \ln \lambda_k$. Exponentiating the trace yields $\prod_k \lambda_k$, which is just the determinant of $\mathrm{J}$. Substituting eqn. (2.4.6) into eqn. (2.4.5) gives

$$\begin{aligned}
\frac{d}{dt} J(x_t; x_0) &= \frac{d}{dt}\, e^{\mathrm{Tr}[\ln(\mathrm{J})]} \\
&= e^{\mathrm{Tr}[\ln(\mathrm{J})]}\, \mathrm{Tr}\left[\frac{d\mathrm{J}}{dt}\,\mathrm{J}^{-1}\right] \\
&= J(x_t; x_0) \sum_{k,l}\left[\frac{dJ_{kl}}{dt}\, J^{-1}_{lk}\right].
\end{aligned} \tag{2.4.7}$$
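The determinant identity of eqn. (2.4.6) can be checked numerically without diagonalizing anything. The sketch below evaluates Tr[ln(J)] for a 2×2 matrix from the power series ln(I + A) = A − A²/2 + A³/3 − …, with A = J − I, and compares the exponential of the result with the determinant computed directly. The series route is an assumption of this sketch (it converges only when A is sufficiently small), and the test matrix and helper names are arbitrary.

```python
import math

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def trace_log(J, n_terms=60):
    """Tr[ln(J)] from the series ln(I + A) = A - A^2/2 + A^3/3 - ..., with A = J - I."""
    A = [[J[i][j] - (1.0 if i == j else 0.0) for j in range(2)] for i in range(2)]
    power = [[1.0, 0.0], [0.0, 1.0]]
    total = 0.0
    for k in range(1, n_terms + 1):
        power = matmul2(power, A)                                   # power now holds A^k
        total += (-1) ** (k + 1) * (power[0][0] + power[1][1]) / k  # add (-1)^(k+1) Tr(A^k) / k
    return total

J = [[1.2, 0.1], [-0.3, 0.9]]          # arbitrary test matrix close to the identity
det_direct = J[0][0] * J[1][1] - J[0][1] * J[1][0]
det_from_trace = math.exp(trace_log(J))

print(f"det(J) = {det_direct:.12f}, exp(Tr ln J) = {det_from_trace:.12f}")
```

The matrix in this example is not symmetric, which shows that the identity does not rely on the eigenvalues being real.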

The elements of the matrices $\mathrm{J}^{-1}$ and $d\mathrm{J}/dt$ are easily seen to be

$$J^{-1}_{lk} = \frac{\partial x_0^l}{\partial x_t^k}, \qquad \frac{dJ_{kl}}{dt} = \frac{\partial \dot{x}_t^k}{\partial x_0^l}. \tag{2.4.8}$$

Substituting eqn. (2.4.8) into eqn. (2.4.7) gives

$$\frac{d}{dt} J(x_t; x_0) = J(x_t; x_0) \sum_{k,l}\left[\frac{\partial \dot{x}_t^k}{\partial x_0^l}\,\frac{\partial x_0^l}{\partial x_t^k}\right]. \tag{2.4.9}$$

The summation over $l$ of the term in square brackets is just the chain-rule expression for $\partial \dot{x}_t^k/\partial x_t^k$. Thus, performing this sum yields the equation of motion for the Jacobian:

$$\frac{d}{dt} J(x_t; x_0) = J(x_t; x_0) \sum_k \frac{\partial \dot{x}_t^k}{\partial x_t^k}. \tag{2.4.10}$$

The sum on the right side of eqn. (2.4.10) is easily recognized as the phase-space compressibility $\nabla_{x_t} \cdot \dot{x}_t$ defined in eqn. (1.6.25), where $\nabla_{x_t} = \partial/\partial x_t$. Equation (1.6.25) also revealed that the phase-space compressibility is 0 for a system evolving under Hamilton’s equations. Thus, the sum on the right side of eqn. (2.4.10) vanishes, and the equation of motion for the Jacobian reduces to

$$\frac{d}{dt} J(x_t; x_0) = 0. \tag{2.4.11}$$

This equation of motion implies that the Jacobian is a constant for all time. The initial condition $J(x_0; x_0)$ on the Jacobian is simply 1, since the transformation from $x_0$ to $x_0$ is an identity transformation. Thus, since the Jacobian is initially 1 and remains constant in time, it follows that

$$J(x_t; x_0) = 1. \tag{2.4.12}$$

Substituting eqn. (2.4.12) into eqn. (2.4.2) yields the volume element transformation condition

$$dx_t = dx_0. \tag{2.4.13}$$

Equation (2.4.13) is an important result known as Liouville’s theorem (named for the nineteenth-century French mathematician Joseph Liouville (1809–1882)). Liouville’s theorem is essential to the claim made earlier that ensemble averages can be performed at any point in time. If the motion of the system is driven by highly nonlinear forces, then an initial hypercubic volume element $dx_0$, for example, will distort due to the chaotic nature of the dynamics. Because of Liouville’s theorem, the volume element can spread out in some of the phase-space dimensions but must contract in other dimensions by an equal amount so that, overall, the volume is conserved. That is, there can be no net attractors or repellors in the phase space. This is illustrated in Fig. 2.3 for a two-dimensional phase space.
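Liouville's theorem can be probed numerically by differentiating the flow map with respect to its initial condition. The sketch below is a minimal check under stated assumptions: a pendulum with H = p²/2 − cos q serves as the Hamiltonian system, a fourth-order Runge-Kutta integrator with a small step approximates the exact flow, and the matrix of eqn. (2.4.3) is built by central finite differences. A damped pendulum is included for contrast; its compressibility is the constant −γ, so eqn. (2.4.10) predicts a Jacobian of exp(−γt). All names and parameter values are illustrative.

```python
import math

def rk4_flow(rhs, x, t_final, dt=1.0e-3):
    """Approximate the phase-space point x_t reached from x_0 = x under dx/dt = rhs(x)."""
    n_steps = round(t_final / dt)
    for _ in range(n_steps):
        k1 = rhs(x)
        k2 = rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
        k3 = rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
        k4 = rhs([xi + dt * ki for xi, ki in zip(x, k3)])
        x = [xi + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x

def jacobian_determinant(rhs, x0, t_final, h=1.0e-5):
    """det of J_kl = d x_t^k / d x_0^l for a two-dimensional phase space (central differences)."""
    columns = []                      # columns[l][k] approximates d x_t^k / d x_0^l
    for l in range(2):
        plus, minus = list(x0), list(x0)
        plus[l] += h
        minus[l] -= h
        xp, xm = rk4_flow(rhs, plus, t_final), rk4_flow(rhs, minus, t_final)
        columns.append([(a - b) / (2.0 * h) for a, b in zip(xp, xm)])
    return columns[0][0] * columns[1][1] - columns[1][0] * columns[0][1]

gamma = 0.5   # friction coefficient of the assumed non-Hamiltonian comparison system

def pendulum(x):
    """Hamilton's equations for H = p^2/2 - cos(q)."""
    q, p = x
    return [p, -math.sin(q)]

def damped_pendulum(x):
    """Same pendulum with a friction term; compressibility = -gamma."""
    q, p = x
    return [p, -math.sin(q) - gamma * p]

t = 2.0
J_hamiltonian = jacobian_determinant(pendulum, [1.0, 0.5], t)
J_damped = jacobian_determinant(damped_pendulum, [1.0, 0.5], t)
print(f"Hamiltonian flow: J = {J_hamiltonian:.8f}")
print(f"Damped flow:      J = {J_damped:.8f}, exp(-gamma t) = {math.exp(-gamma * t):.8f}")
```

For the Hamiltonian pendulum the determinant stays at 1 even though the individual matrix elements change with time, while the damped system contracts phase-space volume at the rate set by its compressibility.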

2.5 The ensemble distribution function and the Liouville equation

Fig. 2.3 Illustration of phase-space volume conservation prescribed by Liouville’s theorem.

Phase space consists of all possible microstates available to a system of $N$ particles. However, an ensemble contains only those microstates that are consistent with a given set of macroscopic observables. Consequently, the microstates of an ensemble are either a strict subset of all possible phase-space points or are clustered more densely in certain regions of phase space and less densely in others. We, therefore, need to describe quantitatively how the systems in an ensemble are distributed in the phase space at any point in time. To do this, we introduce the ensemble distribution function or phase-space distribution function $f(x, t)$. The phase-space distribution function of an ensemble has the property that $f(x, t)\,dx$ is the fraction of the total ensemble members contained in the phase-space volume element $dx$ at time $t$. From this definition, it is clear that $f(x, t)$ satisfies the following properties:

$$f(x, t) \geq 0, \qquad \int dx\, f(x, t) = 1. \tag{2.5.1}$$

Therefore, $f(x, t)$ is a probability density. When the phase-space distribution is expressed as $f(x, t)$, we imagine ourselves sitting at a fixed location $x$ in the phase space and observing the ensemble distribution evolve around us as a function of time. This perspective is known as the Eulerian view of the ensemble evolution and is analogous to an Eulerian frame in fluid mechanics, in which an observer remains at a fixed location and observes the motion of the fluid at that point. In order to determine the number of ensemble members in a small element $dx$ at our location, we could simply “count” the number of microstates belonging to the ensemble in the volume element $dx$ at any time $t$, determine the fraction $f(x, t)\,dx$, and build up a picture of the distribution by moving from one observation point $x$ to another until we have exhausted every point in the phase space.

On the other hand, the ensemble consists of a collection of systems evolving in time according to Hamilton’s equations of motion. Thus, we can also let the ensemble distribution function describe how a bundle of trajectories in a volume element $dx_t$ centered on a trajectory $x_t$ is distributed at time $t$. This will be given by $f(x_t, t)\,dx_t$. The latter, “co-moving” perspective, known as the Lagrangian view of the ensemble evolution (analogous to a Lagrangian frame in fluid mechanics), more closely fits the originally stated definition of an ensemble and will, therefore, be employed in the subsequent derivation to determine an equation satisfied by $f$ in the phase space.

The fact that $f$ has a constant normalization means that there can be neither sources of new ensemble members nor sinks that reduce the number of ensemble members; the number of members remains constant. This also means that any volume $\Omega$ in the phase space with a surface $S$ (see Fig. 2.4) contains no sources or sinks. Thus, the rate of decrease (or increase) of ensemble members in $\Omega$ must equal the rate at which ensemble members leave (or enter) $\Omega$ through the surface $S$. The fraction of ensemble members in $\Omega$ at time $t$ can be written as

$$\text{Fraction of ensemble members in } \Omega = \int_\Omega dx_t\, f(x_t, t). \tag{2.5.2}$$

Fig. 2.4 An arbitrary volume in phase space. $dS$ is a hypersurface element and $\hat{n}$ is the unit vector normal to the surface at the location of $dS$.

Thus, the rate of decrease of ensemble members in $\Omega$ is related to the rate of decrease of this fraction by

$$-\frac{d}{dt}\int_\Omega dx_t\, f(x_t, t) = -\int_\Omega dx_t\, \frac{\partial}{\partial t} f(x_t, t). \tag{2.5.3}$$

On the other hand, the rate at which ensemble members leave $\Omega$ through the surface can be calculated from the flux, which is the number of ensemble members per unit area per unit time passing through the surface. Let $\hat{n}$ be the unit vector normal to the surface at the point $x_t$ (see Fig. 2.4). Then, as a fraction of ensemble members, this flux is given by $\dot{x}_t \cdot \hat{n}\, f(x_t, t)$. The dot product with $\hat{n}$ ensures that we count only those ensemble members actually leaving $\Omega$ through the surface, that is, members whose trajectories have a component of their phase-space velocity $\dot{x}_t$ normal to the surface. Thus, the rate at which ensemble members leave $\Omega$ through the surface is obtained by integrating over $S$:

$$\int_S dS\, \dot{x}_t \cdot \hat{n}\, f(x_t, t) = \int_\Omega dx_t\, \nabla_{x_t} \cdot \left(\dot{x}_t f(x_t, t)\right). \tag{2.5.4}$$



The right side of eqn. (2.5.4) follows from the divergence theorem applied to the hypersurface integral. Equating the right sides of eqns. (2.5.4) and (2.5.3) gives

$$\int_\Omega dx_t\, \nabla_{x_t} \cdot \left(\dot{x}_t f(x_t, t)\right) = -\int_\Omega dx_t\, \frac{\partial}{\partial t} f(x_t, t), \tag{2.5.5}$$

or

$$\int_\Omega dx_t \left[\frac{\partial}{\partial t} f(x_t, t) + \nabla_{x_t} \cdot \left(\dot{x}_t f(x_t, t)\right)\right] = 0. \tag{2.5.6}$$

Since the choice of $\Omega$ is arbitrary, eqn. (2.5.6) must hold locally, so that the term in brackets vanishes identically, giving

$$\frac{\partial}{\partial t} f(x_t, t) + \nabla_{x_t} \cdot \left(\dot{x}_t f(x_t, t)\right) = 0. \tag{2.5.7}$$

Finally, since $\nabla_{x_t} \cdot (\dot{x}_t f(x_t, t)) = \dot{x}_t \cdot \nabla_{x_t} f(x_t, t) + f(x_t, t)\,\nabla_{x_t} \cdot \dot{x}_t$, and the phase-space divergence $\nabla_{x_t} \cdot \dot{x}_t = 0$, eqn. (2.5.7) reduces to

$$\frac{\partial}{\partial t} f(x_t, t) + \dot{x}_t \cdot \nabla_{x_t} f(x_t, t) = 0. \tag{2.5.8}$$

The quantity on the left side of eqn. (2.5.8) is just the total time derivative of $f(x_t, t)$, which includes both the time dependence of the phase-space vector $x_t$ and the explicit time dependence of $f(x_t, t)$. Thus, we obtain finally

$$\frac{df}{dt} = \frac{\partial}{\partial t} f(x_t, t) + \dot{x}_t \cdot \nabla_{x_t} f(x_t, t) = 0, \tag{2.5.9}$$

which states that $f(x_t, t)$ is conserved along a trajectory. This result is known as the Liouville equation. The conservation of $f(x_t, t)$ implies that

$$f(x_t, t) = f(x_0, 0), \tag{2.5.10}$$

and since $dx_t = dx_0$, we have

$$f(x_t, t)\,dx_t = f(x_0, 0)\,dx_0. \tag{2.5.11}$$

Equation (2.5.11) states that the fraction of ensemble members in the initial volume element $dx_0$ is equal to the fraction of ensemble members in the volume element $dx_t$. Equation (2.5.11) ensures that we can perform averages over the ensemble at any point in time because the fraction of ensemble members is conserved. Since $\dot{x}_t = \eta(x_t, t)$, eqn. (2.5.9) can also be written as

$$\frac{df}{dt} = \frac{\partial}{\partial t} f(x_t, t) + \eta(x_t, t) \cdot \nabla_{x_t} f(x_t, t) = 0. \tag{2.5.12}$$

Writing the Liouville equation this way allows us to recover the “passive” view of the ensemble distribution function in which we remain at a fixed location in phase space. In this case, we remove the $t$ label attached to the phase-space points and obtain the following partial differential equation for $f(x, t)$:

$$\frac{\partial}{\partial t} f(x, t) + \eta(x, t) \cdot \nabla_x f(x, t) = 0, \tag{2.5.13}$$

which is another form of the Liouville equation. Since eqn. (2.5.13) is a partial differential equation, it can only specify a class of functions as solutions. Specific solutions for $f(x, t)$ require input of further information; we will return to this point again as specific ensembles are considered in subsequent chapters. Finally, note that if we use the definition of $\eta(x, t)$ in eqn. (1.6.24) and apply the analysis leading up to eqn. (1.6.19), it is clear that $\eta(x, t) \cdot \nabla_x f(x, t) = \{f(x, t), H(x, t)\}$, where $\{\ldots, \ldots\}$ is the Poisson bracket. Thus, the Liouville equation can also be written as

$$\frac{\partial}{\partial t} f(x, t) + \{f(x, t), H(x, t)\} = 0, \tag{2.5.14}$$

a form we will employ in the next section for deriving general equilibrium solutions.
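A simple worked check may help fix ideas. The example is an illustration under stated assumptions and is not part of the derivation above: take a free particle in one dimension, $H = p^2/2m$, so that $\eta = (p/m, 0)$ and trajectories are $q_t = q_0 + p\,t/m$, $p_t = p_0$. Carrying an arbitrary differentiable initial distribution $f_0(q, p)$ along these trajectories gives a candidate solution, and direct differentiation confirms that it satisfies the Liouville equation (here $\partial_1 f_0$ denotes the derivative of $f_0$ with respect to its first argument):

$$\begin{aligned}
f(q, p, t) &= f_0\!\left(q - \frac{p}{m}\,t,\; p\right), \\
\frac{\partial f}{\partial t} &= -\frac{p}{m}\,\partial_1 f_0, \qquad
\{f, H\} = \frac{\partial f}{\partial q}\frac{\partial H}{\partial p} - \frac{\partial f}{\partial p}\frac{\partial H}{\partial q} = \frac{p}{m}\,\partial_1 f_0, \\
\frac{\partial f}{\partial t} + \{f, H\} &= 0.
\end{aligned}$$

Evaluated along a trajectory, $f(q_t, p_t, t) = f_0(q_0, p_0)$, which is the conservation of $f$ along trajectories in this special case.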

2.6 Equilibrium solutions of the Liouville equation

In Section 2.3, we argued that thermodynamic variables can be computed from averages over an ensemble. Such averages must, therefore, be expressed in terms of the ensemble distribution function. If $a(x)$ is a microscopic phase-space function corresponding to a macroscopic observable $A$, then a proper generalization of eqn. (2.3.1) is

$$A = \langle a(x) \rangle = \int dx\, f(x, t)\, a(x). \tag{2.6.1}$$

If $f(x, t)$ has an explicit time dependence, then so will the observable $A$, in general. However, we also remarked earlier that a system in thermodynamic equilibrium has a fixed thermodynamic state. This means that the thermodynamic variables characterizing the equilibrium state do not change in time. Thus, if $A$ is an equilibrium observable, the ensemble average in eqn. (2.6.1) must yield a time-independent result, which is only possible if the ensemble distribution of a system in thermodynamic equilibrium has no explicit time dependence, i.e., $\partial f/\partial t = 0$. This will be the case, for example, when no external driving forces act on the system, in which case $H(x, t) \to H(x)$ and $\eta(x, t) \to \eta(x)$. When $\partial f/\partial t = 0$, the Liouville equation, eqn. (2.5.14), reduces to

$$\{f(x), H(x)\} = 0. \tag{2.6.2}$$

The general solution to eqn. (2.6.2) is any function of the Hamiltonian $H(x)$:

$$f(x) \propto F(H(x)). \tag{2.6.3}$$
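That any differentiable function of the Hamiltonian satisfies eqn. (2.6.2) follows from the chain rule. The short verification below assumes the standard form of the Poisson bracket in terms of generalized coordinates $q_\alpha$ and conjugate momenta $p_\alpha$, with $F'$ denoting the derivative of $F$ with respect to its argument:

$$\begin{aligned}
\{F(H), H\} &= \sum_{\alpha}\left[\frac{\partial F(H)}{\partial q_\alpha}\frac{\partial H}{\partial p_\alpha} - \frac{\partial F(H)}{\partial p_\alpha}\frac{\partial H}{\partial q_\alpha}\right] \\
&= F'(H)\sum_{\alpha}\left[\frac{\partial H}{\partial q_\alpha}\frac{\partial H}{\partial p_\alpha} - \frac{\partial H}{\partial p_\alpha}\frac{\partial H}{\partial q_\alpha}\right] = 0.
\end{aligned}$$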

This is as much as we can say from eqn. (2.6.2) without further information about the ensemble. In order to ensure that $f(x)$ is properly normalized according to eqn. (2.5.1), we write the solution as

$$f(x) = \frac{1}{Z}\, F(H(x)), \tag{2.6.4}$$

where $Z$ is defined to be

$$Z = \int dx\, F(H(x)). \tag{2.6.5}$$

The quantity $Z$, referred to as the partition function, is one of the central quantities in equilibrium statistical mechanics. The partition function is a measure of the number of microscopic states in the phase space accessible within a given ensemble. Each ensemble has a particular partition function that depends on the macroscopic observables used to define the ensemble. We will show in Chapters 3 to 6 that the thermodynamic properties of a system are calculated from the various partial derivatives of the partition function. Other equilibrium observables are computed according to

$$A = \langle a(x) \rangle = \frac{1}{Z}\int dx\, a(x)\, F(H(x)). \tag{2.6.6}$$

Note that the condition $f(x_0)\,dx_0 = f(x_t)\,dx_t$ implied by eqn. (2.5.11) guarantees that the equilibrium average over the systems in the ensemble can be performed at any point in time. Equations (2.6.5) and (2.6.6) constitute the essence of equilibrium statistical mechanics. As Richard Feynman remarks in his book Statistical Mechanics: A Set of Lectures, eqns. (2.6.5) and (2.6.6) embody “the summit of statistical mechanics, and the entire subject is either the slide-down from this summit, as the [principles are] applied to various cases, or the climb-up where the fundamental [laws are] derived and the concepts of thermal equilibrium . . . [are] clarified” (Feynman, 1998).² We shall, of course, embark on both, and we will explore the methods by which equilibrium ensemble distributions are generated and observables are computed for realistic applications.
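Equations (2.6.5) and (2.6.6) can be evaluated by brute force for a system with a two-dimensional phase space. The sketch below does this on a midpoint grid for a one-dimensional harmonic oscillator. The weight F(H) = exp(−βH) is an assumed choice made only for illustration (this section leaves F unspecified), as are the parameter values, the integration box, and the function names. For this weight the integrals are Gaussian, so the exact results Z = 2π/(βω) and ⟨H⟩ = 1/β are available for comparison.

```python
import math

def phase_space_average(a, hamiltonian, F, q_max, p_max, n=400):
    """Evaluate Z = integral of F(H(x)) and <a> = (1/Z) * integral of a(x) F(H(x))
    on a midpoint grid over the box [-q_max, q_max] x [-p_max, p_max]."""
    dq, dp = 2.0 * q_max / n, 2.0 * p_max / n
    Z = 0.0
    weighted = 0.0
    for i in range(n):
        q = -q_max + (i + 0.5) * dq
        for j in range(n):
            p = -p_max + (j + 0.5) * dp
            w = F(hamiltonian(q, p)) * dq * dp    # weight of this phase-space cell
            Z += w
            weighted += a(q, p) * w
    return Z, weighted / Z

mass, omega, beta = 1.0, 1.5, 2.0     # illustrative parameters (assumed)

def H(q, p):
    """Harmonic oscillator Hamiltonian."""
    return p**2 / (2.0 * mass) + 0.5 * mass * omega**2 * q**2

def F(energy):
    """Assumed Boltzmann-type weight, used here only as an example of F(H)."""
    return math.exp(-beta * energy)

Z, mean_energy = phase_space_average(H, H, F, q_max=8.0, p_max=8.0)
print(f"Z = {Z:.8f}, exact 2*pi/(beta*omega) = {2.0 * math.pi / (beta * omega):.8f}")
print(f"<H> = {mean_energy:.8f}, exact 1/beta = {1.0 / beta:.8f}")
```

The same routine accepts any other phase-space function a(q, p) and any other normalizable F, which mirrors the generality of eqn. (2.6.6).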

2.7

Problems 2.1. Consider n moles of an ideal gas in a volume V at pressure P and temperature T . The equation of state is P V = nRT as given in eqn. (2.2.2). If the gas contains N molecules, so that n = N/N0 , where N0 is Avogadro’s number, then the total number of microscopic states available to the gas can be shown (see Section 3.5) to be Ω ∝ V N (kT )3N/2 , where k = R/N0 is known as Boltzmann’s constant. The entropy of the gas is defined via Boltzmann’s relation (see Chapter 3) as S = k ln Ω. Note the total energy of an ideal gas is E = 3nRT /2. a. Suppose the gas expands or contracts from a volume V1 to a volume V2 at constant temperature. Calculate the work done on the system.

2 This quote is actually made in the context of quantum statistical mechanics (see Chapter 10); however, the sentiment applies equally well to classical statistical mechanics.

Problems

73

b. For the process in part a, calculate the change of entropy using Boltzmann’s relation and using eqn. (2.2.19). Show that these two approaches yield the same entropy change. c. Next, suppose the temperature of the gas is changed from T1 to T2 under conditions of constant volume. Calculate the entropy change using the two approaches in part a and show that they yield the same entropy change. d. Finally, suppose that the volume changes from V1 to V2 in an adiabatic process (∆Q = 0). The pressure also changes from P1 to P2 in the process. Show that P1 V1γ = P2 V2γ , and find the numerical value of the exponent γ. 2.2. A substance has the following properties: i. When its volume is increased from V1 to V2 at constant temperature T , the work done in the expansion is   V2 . W = RT ln V1 ii. When the volume changes from V1 to V2 and the temperature changes from T1 to T2 , its entropy changes according to    α T2 V1 , ∆S = k V2 T1

where α is a constant. Find the equation of state and Helmholtz free energy of this substance.

2.3. Reformulate the Carnot cycle for an ideal gas as a thermodynamic cycle in the T-S plane rather than the P-V plane, and show that the area enclosed by the cycle is equal to the net work done by the gas during the cycle.

2.4. The equation of state for one mole of a substance takes the general form g(V, P, T) = 0, which can be expressed equivalently as V = F(P, T). Suppose we are able to perform experiments that yield the following information about the P and T dependence of the volume:

$$\frac{1}{V}\left(\frac{\partial V}{\partial T}\right)_P = \frac{1}{F}\left(\frac{\partial F}{\partial T}\right)_P = A\,(1 - aP)$$

$$\frac{1}{V}\left(\frac{\partial V}{\partial P}\right)_T = \frac{1}{F}\left(\frac{\partial F}{\partial P}\right)_T = -K\,[1 + b(T - T_0)],$$

where a, b, K, and $T_0$ are constants. (These two relations define two thermodynamic variables known as the volume expansion coefficient and the isothermal compressibility, respectively.)


Theoretical foundations

a. Let h(P, T) = ln V = ln F(P, T). Show that

$$\left(\frac{\partial h}{\partial P}\right)_T = \frac{1}{V}\left(\frac{\partial V}{\partial P}\right)_T = \frac{1}{F}\left(\frac{\partial F}{\partial P}\right)_T$$

$$\left(\frac{\partial h}{\partial T}\right)_P = \frac{1}{V}\left(\frac{\partial V}{\partial T}\right)_P = \frac{1}{F}\left(\frac{\partial F}{\partial T}\right)_P,$$

and, therefore, that

$$dh = -K\,[1 + b(T - T_0)]\,dP + A\,(1 - aP)\,dT.$$

b. Using the identity for h(P, T)

$$\frac{\partial^2 h}{\partial T\,\partial P} = \frac{\partial^2 h}{\partial P\,\partial T},$$

show that Aa = Kb.

c. By integrating $(\partial h/\partial P)_T$ with respect to P, show that

$$h(P, T) = -K\,[1 + b(T - T_0)]\,P + f(T),$$

where f(T) is an arbitrary function of T.

d. Use the derivative $(\partial h/\partial T)_P$ to show that f(T) = AT + C, where C is an arbitrary constant.

e. Suppose we know that at $T = T_0$ and at a pressure $P = P_0$, the volume has a value $V = V_0$. Given this data, determine the constant C, and find the functional form of F(P, T), thereby obtaining the equation of state for this system.

2.5. Consider the thermodynamic cycle shown in Fig. 2.5. Compare the efficiency of this engine to that of a Carnot engine operating between the highest and lowest temperatures of the cycle in the figure. Which one is greater?

2.6. Consider an ensemble of one-particle systems, each evolving in one spatial dimension according to an equation of motion of the form $\dot{x} = -\alpha x$, where x(t) is the position of the particle at time t and α is a constant. Since the compressibility of this system is nonzero, the ensemble distribution function f(x, t) satisfies a Liouville equation of the form

$$\frac{\partial f}{\partial t} - \alpha x\,\frac{\partial f}{\partial x} = \alpha f$$

(see eqn. (2.5.7)). Suppose that at t = 0, the ensemble distribution has a Gaussian form

$$f(x, 0) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-x^2/2\sigma^2}.$$


Fig. 2.5 Thermodynamic cycle.

a. Find a solution of the Liouville equation that also satisfies this initial distribution. Hint: Show that the substitution $f(x, t) = e^{\alpha t}\tilde{f}(x, t)$ yields an equation for a conserved distribution $\tilde{f}(x, t)$. Next, try multiplying the $x^2$ in the initial distribution by an arbitrary function g(t) that must satisfy g(0) = 1. Use the Liouville equation to derive an equation that g(t) must satisfy and then solve this equation.

b. Describe the evolution of the ensemble distribution qualitatively and explain why it should evolve this way.

c. Show that your solution is properly normalized in the sense that

$$\int_{-\infty}^{\infty} dx\, f(x, t) = 1.$$

d. Repeat the problem for the case that the initial distribution is a Lorentzian

$$f(x, 0) = \frac{\sigma}{\pi}\,\frac{1}{\sigma^2 + x^2}.$$

2.7. An alternative definition of entropy was proposed by Gibbs, who expressed the entropy in terms of the phase-space distribution function f(x, t) as

$$S(t) = -k\int d\mathrm{x}\; f(\mathrm{x}, t)\,\ln f(\mathrm{x}, t).$$


Here, f(x, t) satisfies the Liouville equation, eqn. (2.5.13). The notation S(t) expresses the fact that an entropy defined this way is an explicit function of time.

a. Show that for an arbitrary distribution function, the entropy is actually constant, i.e., that dS/dt = 0, S(t) = S(0), so that S(t) cannot increase in time for any ensemble. Is this in violation of the second law of thermodynamics? Hint: Be careful how the derivative d/dt is applied to the integral!

b. The distribution f(x, t) is known as a “fine-grained” distribution function. Because f(x, t) is fully defined at every phase-space point, it contains all of the detailed microstructure of the phase space, which cannot be resolved in reality. Consider, therefore, introducing a “coarse-grained” phase space distribution $\bar{f}(\mathrm{x}, t)$ defined via the following operation: Divide phase space into the smallest cells over which $\bar{f}(\mathrm{x}, t)$ can be defined. Each cell C is then subdivided into small subcells such that each subcell of volume ∆x centered on the point x has an associated probability f(x, t)∆x at time t (Waldram, 1985). Assume that at t = 0, $f(\mathrm{x}, 0) = \bar{f}(\mathrm{x}, 0)$. In order to define $\bar{f}(\mathrm{x}, t)$ for t > 0, at each point in time, we transfer probability from subcells of C where $f > \bar{f}$ to cells where $f < \bar{f}$. Then, we use $\bar{f}(\mathrm{x}, t)$ to define a coarse-grained entropy

$$\bar{S}(t) = -k\int d\mathrm{x}\; \bar{f}(\mathrm{x}, t)\,\ln \bar{f}(\mathrm{x}, t),$$

where the integral should be interpreted as a sum over all cells C into which the phase space has been divided. For this particular coarse-graining operation, show that $\bar{S}(t) \geq \bar{S}(0)$, where equality is only true in equilibrium. Hint: Show that the change in $\bar{S}$ on transferring probability from one small subcell to another is either positive or zero. This is sufficient to show that the total coarse-grained entropy can either increase in time or remain constant.

2.8. Consider a single particle moving in three spatial dimensions with phase-space vector $(p_x, p_y, p_z, x, y, z)$. Derive the complete canonical transformation to spherical polar coordinates (r, θ, φ) and their conjugate momenta $(p_r, p_\theta, p_\phi)$ and show that the phase-space volume element dp dr satisfies

$$dp_x\,dp_y\,dp_z\,dx\,dy\,dz = dp_r\,dp_\theta\,dp_\phi\,dr\,d\theta\,d\phi.$$

3 The microcanonical ensemble and introduction to molecular dynamics

3.1 Brief overview

In the previous chapter, it was shown that statistical mechanics provides the link between the classical microscopic world described by Newton’s laws of motion and the macroscopic observables that are actually measured in experiments, including thermodynamic, structural, and dynamical properties. One of the great successes of statistical mechanics is its provision of a rational microscopic basis for thermodynamics, which otherwise is only a phenomenological theory. We showed that the microscopic connection is provided via the notion of an ensemble—an imaginary collection of systems described by the same Hamiltonian with each system in a unique microscopic state at any given instant in time. In this chapter, we will lay out the basic classical statistical mechanics of the simplest and most fundamental of the equilibrium ensembles, that of an isolated system of N particles in a container of volume V and a total energy E corresponding to a Hamiltonian H(x). This ensemble is known as the microcanonical ensemble. The microcanonical ensemble provides a starting point from which all other equilibrium ensembles are derived. Our discussion will begin with the classical partition function, its connection to the entropy via Boltzmann’s relation, and the thermodynamic and equilibrium properties that it generates. Several simple applications will serve to illustrate these concepts. However, it will rapidly become apparent that in order to treat any realistic system, numerical solutions are needed, which will lead naturally to a discussion of the numerical simulation technique known as molecular dynamics (MD). MD is a widely used, immensely successful computational approach in which the classical equations of motion are solved numerically and the trajectories thus generated are used to extract macroscopic observables. MD also permits direct “visualization” of the detailed motions of individual atoms in a system, thereby providing a “window” into the microscopic world. 
Although such animations of MD trajectories should never be taken too seriously, they can be useful as a guide toward understanding the mechanisms underlying a given chemical process. At the end of the chapter, we will consider a number of examples that illustrate the power and general applicability of molecular dynamics to realistic systems.


3.2 Basic thermodynamics, Boltzmann’s relation, and the partition function of the microcanonical ensemble

We begin by considering a system of N identical particles in a container of volume V with a fixed internal energy E. The variables N, V, and E are all macroscopic thermodynamic quantities referred to as control variables. Control variables are simply quantities that characterize the ensemble and that determine other thermodynamic properties of the system. Different choices of these variables lead to different system properties. In order to describe the thermodynamics of an ensemble of systems with given values of N, V, and E, we seek a unique state function of these variables. We will now show that such a state function can be obtained from the first law of thermodynamics, which relates the energy E of a system to a quantity Q of heat absorbed and an amount of work W done on the system:

$$E = Q + W. \tag{3.2.1}$$

The derivation of the desired state function begins by examining how the energy changes if a small amount of heat dQ is added to the system and a small amount of work dW is done on the system. Since E is a state function, this thermodynamic transformation may be carried out along any path, and it is particularly useful to consider a reversible path for which

$$dE = dQ_{\rm rev} + dW_{\rm rev}. \tag{3.2.2}$$

Note that since Q and W are not state functions, it is necessary to characterize their changes by the “rev” subscript. The amount of heat absorbed by the system can be related to the change in the entropy ∆S of the system by

$$\Delta S = \int \frac{dQ_{\rm rev}}{T}, \qquad dS = \frac{dQ_{\rm rev}}{T}, \tag{3.2.3}$$

where T is the temperature of the system. Therefore, $dQ_{\rm rev} = T\,dS$. Work done on the system is measured in terms of the two control variables V and N. Let P(V) be the pressure of the system at the volume V. Mechanical work can be done on the system by compressing it from a volume $V_1$ to a new volume $V_2 < V_1$:

$$W_{12}^{(\rm mech)} = -\int_{V_1}^{V_2} P(V)\,dV, \tag{3.2.4}$$

where the minus sign indicates that work is positive in a compression. A small volume change dV corresponds to an amount of work $dW_{\rm rev}^{(\rm mech)} = -P(V)\,dV$. Although we will typically suppress the explicit volume dependence of P on V and write simply $dW_{\rm rev}^{(\rm mech)} = -P\,dV$, it must be remembered that P depends not only on V but also on N and E. In addition to the mechanical work done by compressing a system, chemical work can also be done on the system by increasing the number of particles. Let µ(N) be the chemical potential of the system at particle number N (µ also depends on V


and E). If the number of particles is increased from $N_1$ to $N_2 > N_1$, then chemical work

$$W_{12}^{(\rm chem)} = \sum_{N=N_1}^{N_2} \mu(N) \tag{3.2.5}$$

is done on the system. Clearly, the number of particles in a system can only change by integral amounts, ∆N. However, the changes we wish to consider are so small compared to the total particle number ($N \sim 10^{23}$) that they can be regarded approximately as changes dN in a continuous variable. Therefore, the chemical work corresponding to such a small change dN can be expressed as $dW_{\rm rev}^{(\rm chem)} = \mu(N)\,dN$. Again, we suppress the explicit dependence of µ on N (as well as on V and E) and write simply $dW_{\rm rev}^{(\rm chem)} = \mu\,dN$. Therefore, the total reversible work done on the system is given by

$$dW_{\rm rev} = dW_{\rm rev}^{(\rm mech)} + dW_{\rm rev}^{(\rm chem)} = -P\,dV + \mu\,dN, \tag{3.2.6}$$

so that the total change in energy is

$$dE = T\,dS - P\,dV + \mu\,dN. \tag{3.2.7}$$
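As a numerical sanity check on eqn. (3.2.7) at fixed N, the following minimal sketch uses the ideal-gas relations quoted in Problem 2.1 ($\Omega \propto V^N (kT)^{3N/2}$, S = k ln Ω, E = 3NkT/2, PV = NkT) in reduced units with k = 1. The function name `ideal_gas_entropy`, the state point, and the increments are assumptions chosen for illustration, and the additive constant in S is dropped because only entropy differences enter.

```python
import math

K_B = 1.0  # Boltzmann's constant in reduced units (assumption for illustration)

def ideal_gas_entropy(n_particles, volume, energy):
    """S = k ln(Omega) with Omega ∝ V^N (kT)^(3N/2) and kT = 2E/(3N).

    The additive constant is dropped; only differences of S are used below.
    """
    kT = 2.0 * energy / (3.0 * n_particles)
    return K_B * (n_particles * math.log(volume)
                  + 1.5 * n_particles * math.log(kT))

# Arbitrary state point (hypothetical values)
N, V, E = 1000, 10.0, 1500.0
T = 2.0 * E / (3.0 * N * K_B)   # from E = (3/2) N k T
P = N * K_B * T / V             # from P V = N k T

# Small changes in E and V at fixed N
dE, dV = 1.0e-3, 1.0e-4
dS = ideal_gas_entropy(N, V + dV, E + dE) - ideal_gas_entropy(N, V, E)

# T dS - P dV should reproduce dE, up to second-order terms in dE and dV
lhs = T * dS - P * dV
print(f"T dS - P dV = {lhs:.6e}   dE = {dE:.6e}")
```

The two printed numbers agree only to first order in the increments; shrinking dE and dV reduces the residual, as expected for a differential relation.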

By writing eqn. (3.2.7) in the form

$$dS = \frac{1}{T}\,dE + \frac{P}{T}\,dV - \frac{\mu}{T}\,dN, \tag{3.2.8}$$

it is clear that the state function we are seeking is just the entropy of the system, S = S(N, V, E), since the change in S is related directly to the change in the three control variables of the ensemble. However, since S is a function of N, V, and E, the change in S resulting from small changes in N, V, and E can also be written using the chain rule as

$$dS = \left(\frac{\partial S}{\partial E}\right)_{N,V} dE + \left(\frac{\partial S}{\partial V}\right)_{N,E} dV + \left(\frac{\partial S}{\partial N}\right)_{V,E} dN. \tag{3.2.9}$$

Comparing eqn. (3.2.9) with eqn. (3.2.8) shows that the thermodynamic quantities T, P, and µ can be obtained by taking partial derivatives of the entropy with respect to each of the three control variables:

$$\frac{1}{T} = \left(\frac{\partial S}{\partial E}\right)_{N,V}, \qquad \frac{P}{T} = \left(\frac{\partial S}{\partial V}\right)_{N,E}, \qquad \frac{\mu}{T} = -\left(\frac{\partial S}{\partial N}\right)_{V,E}. \tag{3.2.10}$$

We now recall that the entropy is a quantity that can be related to the number of microscopic states of the system. This relation was first proposed by Ludwig Boltzmann in 1877, although it was Max Planck who actually formalized the connection. Let Ω be the number of microscopic states available to a system. The relation connecting S and Ω states that

$$S(N, V, E) = k\ln\Omega(N, V, E). \tag{3.2.11}$$

Since S is a function of N, V, and E, Ω must be as well. The constant, k, appearing in eqn. (3.2.11) is known as Boltzmann’s constant; its value is $1.3806505(24)\times 10^{-23}$ J·K$^{-1}$. The logarithmic dependence of the entropy on Ω(N, V, E) will be explained shortly. Assuming we can determine Ω(N, V, E) from a microscopic description of the system, eqn. (3.2.11) then provides a connection between this microscopic description and macroscopic thermodynamic observables.

In the last chapter, we saw that the most general solution to the equilibrium Liouville equation, {f(x), H(x)} = 0, for the ensemble distribution function f(x) is any function of the Hamiltonian: f(x) = F(H(x)), where x is the phase-space vector. The specific choice of F(H(x)) is determined by the conditions of the ensemble. The microcanonical ensemble pertains to a collection of systems in isolation obeying Hamilton’s equations of motion. Recall, however, from Section 1.6, that a system obeying Hamilton’s equations conserves the total Hamiltonian

$$H(\mathrm{x}) = E \tag{3.2.12}$$

with E being the total energy of the system. Conservation of H(x) was demonstrated explicitly in eqn. (1.6.15). Moreover, the ensemble distribution function f(x) is static in the sense that ∂f/∂t = 0. Therefore, each member of an equilibrium ensemble is in a single unique microscopic state. For the microcanonical ensemble, each unique state is described by a unique phase-space vector x that satisfies eqn. (3.2.12). It follows that the choice of F(H(x)) must be consistent with eqn. (3.2.12). That is, F(H(x)) must restrict x to those microscopic states for which H(x) = E. A function that achieves this is the Dirac δ-function

$$F(H(\mathrm{x})) = \mathcal{N}\,\delta(H(\mathrm{x}) - E) \tag{3.2.13}$$

expressing the conservation of energy condition. Here, $\mathcal{N}$ is an overall normalization constant. For readers not familiar with the properties of the Dirac δ-function, a detailed discussion is provided in Appendix A. Since eqn. (3.2.12) defines the constant-energy hypersurface in phase space, eqn. (3.2.13) expresses the fact that, in the microcanonical ensemble, all phase-space points must lie on this hypersurface and that all such points are equally probable; all points not on this surface have zero probability. The notion that Ω(N, V, E) can be computed from an ensemble in which all accessible microscopic states are equally probable is an assumption that is consistent with classical mechanics, as the preceding discussion makes clear. More generally, we assume that for an isolated system in equilibrium, all accessible microscopic states are equally probable, which is known as the assumption of equal a priori probability. The quantity 1/Ω(N, V, E) is a measure of the probability of randomly selecting a microstate in any small neighborhood of phase space anywhere on the constant-energy hypersurface. The number Ω(N, V, E) is a measure of the amount of phase space available to the system. It must, therefore, be proportional to the fraction of phase space consistent with eqn. (3.2.12), which is proportional to the (6N − 1)-dimensional “area” of the constant-energy hypersurface. This number can be obtained by integrating eqn. (3.2.13) over the phase space, as indicated by eqn. (2.6.5).¹

¹ If we imagined discretizing the constant-energy hypersurface such that each discrete patch contained a single microstate, then the integral would revert to a sum that would represent a literal counting of the number of microstates contained on the surface.

An integration over the


entire phase space is an integration over the momentum $\mathbf{p}_i$ and position $\mathbf{r}_i$ of each particle in the system and is, therefore, a 6N-dimensional integration. Moreover, while the range of integration of each momentum variable is infinite, integration over each position variable is restricted to that region of space defined by the containing volume. We denote this region as D(V), i.e., the spatial domain defined by the containing volume. For example, if the container is a cube of side length L, lying in the positive octant of Cartesian space with a corner at the origin, then D(V) would be defined by x ∈ [0, L], y ∈ [0, L], z ∈ [0, L] for each Cartesian vector r = (x, y, z). Therefore, Ω(N, V, E) is given by the integral

$$\Omega(N, V, E) = M\int d\mathbf{p}_1 \cdots \int d\mathbf{p}_N \int_{D(V)} d\mathbf{r}_1 \cdots \int_{D(V)} d\mathbf{r}_N\; \delta(H(\mathbf{r}, \mathbf{p}) - E), \tag{3.2.14}$$

where M is an overall constant whose value we will discuss shortly. Equation (3.2.14) defines the partition function of the microcanonical ensemble. For notational simplicity, we often write eqn. (3.2.14) in a briefer notation as

$$\Omega(N, V, E) = M\int d^N\mathbf{p} \int_{D(V)} d^N\mathbf{r}\; \delta(H(\mathbf{r}, \mathbf{p}) - E), \tag{3.2.15}$$

or more simply as

$$\Omega(N, V, E) = M\int d\mathrm{x}\; \delta(H(\mathrm{x}) - E), \tag{3.2.16}$$

using the phase-space vector. However, it should be remembered that these shorter versions refer to the explicit form of eqn. (3.2.14).

In order to understand eqn. (3.2.14) somewhat better and define the normalization constant M, let us consider determining Ω(N, V, E) in a somewhat different way. We perform a thought experiment in which we “count” the number of microstates via a “device” capable of determining a position component, say x, to a precision ∆x and a momentum component p to a precision ∆p. Since quantum mechanics places an actual limit on the product ∆x∆p, namely Planck’s constant h (this is Heisenberg’s uncertainty relation to be discussed in Chapter 9), h is a natural choice for our thought experiment. Thus, we can imagine dividing phase space up into small hypercubes of volume $\Delta\mathrm{x} = (\Delta x)^{3N}(\Delta p)^{3N} = h^{3N}$, such that each hypercube contains a single measurable microstate. Let us denote this phase-space volume simply as $\Delta\mathrm{x}$. We will also assume that we can only determine the energy of each microstate to be within E and $E + E_0$, where $E_0$ defines a very thin energy shell above the constant-energy hypersurface. For each phase-space hypercube, if the energy of the corresponding microstate lies within this shell, we increment our counting by 1, which we represent identically as $\Delta\mathrm{x}/h^{3N}$. We can therefore write Ω as

$$\Omega(N, V, E) = \sum_{\substack{\text{hypercubes} \\ E < H(\mathrm{x}) < E + E_0}} \frac{\Delta\mathrm{x}}{h^{3N}}, \tag{3.2.17}$$
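The counting in eqn. (3.2.17) can be tried out on a single one-dimensional harmonic oscillator, $H = p^2/2m + m\omega^2x^2/2$, for which the shell $E < H < E + E_0$ is an elliptical annulus of phase-space area $2\pi E_0/\omega$, so that the expected count is $2\pi E_0/(\omega h)$. The following is a minimal sketch in arbitrary units: the cell size playing the role of h, the grid extent, and the function name `count_microstates` are assumptions made for illustration, not physical values.

```python
import math

def count_microstates(energy, shell_width, cell_side,
                      mass=1.0, omega=1.0, extent=2.0):
    """Count square phase-space cells (area cell_side**2, playing the role of h)
    whose center has an energy inside the shell (energy, energy + shell_width)."""
    n_cells = int(round(2.0 * extent / cell_side))
    count = 0
    for i in range(n_cells):
        x = -extent + (i + 0.5) * cell_side        # cell-center position
        for j in range(n_cells):
            p = -extent + (j + 0.5) * cell_side    # cell-center momentum
            h_val = p * p / (2.0 * mass) + 0.5 * mass * omega**2 * x * x
            if energy < h_val < energy + shell_width:
                count += 1                          # one microstate per cell
    return count

E, E0 = 1.0, 0.1          # shell location and thickness (hypothetical values)
cell_side = 0.01          # Delta x = Delta p, so "h" = 1e-4 in these units
h = cell_side**2
omega = 1.0

counted = count_microstates(E, E0, cell_side, omega=omega)
expected = 2.0 * math.pi * E0 / (omega * h)   # shell area divided by h
print(f"counted = {counted}, shell area / h = {expected:.1f}")
```

The literal count and the area estimate differ only through cells that straddle the shell boundaries, so the agreement improves as the cell size is reduced relative to the shell thickness.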

E 0 evolves under the action of the integrator. What happens if p(0) < 0? ∗

g. Derive the general distribution generated by eqns. (4.9.14) when no external forces are present and the conservation law in eqn. (4.10.27) is obeyed.

Hint: Since the conservation law involves the center-of-mass momentum P, it is useful to introduce a canonical transformation to center-of-mass momentum and position (R, P) and the d(N − 1) corresponding relative coordinates $\mathbf{r}'_1, \mathbf{r}'_2, \ldots$ and momenta $\mathbf{p}'_1, \mathbf{p}'_2, \ldots$.

∗h. Show that when $\sum_{i=1}^N \mathbf{F}_i = 0$, the Nosé-Hoover chain equations generate the correct canonical distribution in all variables except the center-of-mass momentum.

i. Finally, show the conservation $\sum_{i=1}^N \mathbf{F}_i = 0$ is not obeyed by eqns. (4.11.7) and, therefore, that they also generate a correct canonical distribution in all variables.


Canonical ensemble

4.4. Consider a modified version of the Nosé-Hoover equations for a harmonic oscillator with unit mass, unit frequency, and kT = 1:

$$\dot{x} = p - p_\eta x, \qquad \dot{p} = -x - p_\eta p$$

$$\dot{\eta} = p_\eta, \qquad \dot{p}_\eta = p^2 + x^2 - 2.$$

a. Show that these equations have the two conservation laws:

$$C = \frac{1}{2}\left(p^2 + x^2 + p_\eta^2\right) + 2\eta$$

$$K = \frac{1}{2}\left(p^2 + x^2\right)e^{2\eta}.$$

b. Determine the distribution f(H) of the physical Hamiltonian $H(x, p) = (p^2 + x^2)/2$. Is the distribution the expected canonical distribution f(H) ∝ exp(−H)? Hint: Try using the two conservation laws to eliminate the variables η and $p_\eta$.

∗c. Show that a plot of the physical phase space p vs. x necessarily must have a hole centered at (x, p) = (0, 0), and find a condition that determines the size of the hole.
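Before attempting the analytical proof in part a, the two conservation laws of Problem 4.4 can be checked numerically. The sketch below integrates the four equations of motion with a classical fourth-order Runge-Kutta step (chosen here for simplicity; it is not the integrator developed in the text) and monitors the drift in C and K. The initial condition, time step, and function names are assumptions for illustration only.

```python
import math

def derivs(state):
    """Right-hand side of the modified Nose-Hoover equations of Problem 4.4."""
    x, p, eta, p_eta = state
    return (p - p_eta * x,
            -x - p_eta * p,
            p_eta,
            p * p + x * x - 2.0)

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(s, k, f):
        return tuple(si + f * ki for si, ki in zip(s, k))
    k1 = derivs(state)
    k2 = derivs(shifted(state, k1, 0.5 * dt))
    k3 = derivs(shifted(state, k2, 0.5 * dt))
    k4 = derivs(shifted(state, k3, dt))
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def invariants(state):
    """The two conserved quantities C and K stated in part a."""
    x, p, eta, p_eta = state
    C = 0.5 * (p * p + x * x + p_eta * p_eta) + 2.0 * eta
    K = 0.5 * (p * p + x * x) * math.exp(2.0 * eta)
    return C, K

state = (1.0, 0.0, 0.0, 0.0)   # (x, p, eta, p_eta), hypothetical initial condition
dt, n_steps = 1.0e-3, 10000
C0, K0 = invariants(state)
max_drift_C = max_drift_K = 0.0
for _ in range(n_steps):
    state = rk4_step(state, dt)
    C, K = invariants(state)
    max_drift_C = max(max_drift_C, abs(C - C0))
    max_drift_K = max(max_drift_K, abs(K - K0))

print(f"max |C - C0| = {max_drift_C:.2e}, max |K - K0| = {max_drift_K:.2e}")
```

A drift at the level of the integrator's truncation error is consistent with both quantities being constants of the motion; it does not replace the analytical demonstration the problem asks for.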

4.5. In Section 4.5, it was shown that the energy fluctuations in the canonical ensemble satisfy

$$(\Delta E)^2 = k_B T^2 C_V,$$

where $C_V$ is the constant-volume heat capacity. Define the third-order fluctuation $(\Delta E)^3$ as

$$(\Delta E)^3 = \left\langle \left(H(\mathrm{x}) - \langle H(\mathrm{x})\rangle\right)^3 \right\rangle.$$

Derive an expression for $(\Delta E)^3$ in terms of T, $C_V$, and any derivatives of $C_V$ you think are necessary.

4.6. Suppose the interactions in an N-particle system are described by a pair potential of the form

$$U(\mathbf{r}_1, \ldots, \mathbf{r}_N) = \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} u(|\mathbf{r}_i - \mathbf{r}_j|).$$

In the low density limit, we can assume that each particle interacts with at most one other particle.


a. Show that the canonical partition function in this limit can be expressed as

$$Q(N, V, T) = \frac{(N - 1)!!\,V^{N/2}}{N!\,\lambda^{3N}}\left[4\pi\int_0^\infty dr\; r^2 e^{-\beta u(r)}\right]^{N/2}.$$

b. Show that the radial distribution function is proportional to exp[−βu(r)] in this limit.

c. Show that the second virial coefficient in the low density limit becomes

$$B_2(T) = -2\pi\int_0^\infty dr\; r^2 f(r),$$

where $f(r) = e^{-\beta u(r)} - 1$.

d. Suppose the interaction potential between a pair of particles in a gas is given by

$$u(r) = \begin{cases} \infty & r \leq \sigma \\[4pt] \dfrac{\epsilon}{\sigma}\,\dfrac{r - \lambda\sigma}{\lambda - 1} & \sigma \leq r \leq \lambda\sigma \\[4pt] 0 & r > \lambda\sigma, \end{cases}$$

where ǫ, λ, and σ are constants. Sketch a plot of this potential as a function of r and calculate the second virial coefficient $B_2(T)$ for this potential. Plot $B_2(T)$ as a function of T.
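The integral in part c of Problem 4.6 is straightforward to evaluate numerically. As a minimal sketch, the code below applies a midpoint rule to a hard-sphere potential of diameter σ (an assumed test case, not the potential of part d), for which the integral reduces to the temperature-independent result $B_2 = 2\pi\sigma^3/3$. The function names and the cutoff `r_max` are illustrative assumptions.

```python
import math

def mayer_f_hard_sphere(r, sigma):
    """f(r) = exp(-beta u(r)) - 1 for hard spheres: -1 inside the core, 0 outside."""
    return -1.0 if r < sigma else 0.0

def second_virial(mayer_f, r_max, n_points=100000):
    """B2 = -2 pi * integral from 0 to r_max of r^2 f(r) dr, by the midpoint rule.

    r_max must lie beyond the range of the potential, where f(r) vanishes.
    """
    dr = r_max / n_points
    total = 0.0
    for i in range(n_points):
        r = (i + 0.5) * dr          # midpoint of the i-th interval
        total += r * r * mayer_f(r) * dr
    return -2.0 * math.pi * total

sigma = 1.0                          # hard-sphere diameter (arbitrary units)
b2_numeric = second_virial(lambda r: mayer_f_hard_sphere(r, sigma),
                           r_max=2.0 * sigma)
b2_exact = 2.0 * math.pi * sigma**3 / 3.0
print(f"numerical B2 = {b2_numeric:.8f}, analytical B2 = {b2_exact:.8f}")
```

For a potential with an attractive part, `mayer_f` would depend on β, and the same routine could be called over a range of temperatures to tabulate $B_2(T)$.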

4.7. An ideal gas of N particles of mass m at temperature T is in a cylindrical container with radius a and length L. The container rotates about its cylindrical axis (taken to be the z axis) with angular velocity ω. In addition, the gas is subject to a uniform gravitational field of strength g. Therefore, the Hamiltonian for the gas is

$$H = \sum_{i=1}^{N} h(\mathbf{r}_i, \mathbf{p}_i),$$

where h(r, p) is the Hamiltonian for a single particle

$$h(\mathbf{r}, \mathbf{p}) = \frac{\mathbf{p}^2}{2m} - \omega(\mathbf{r}\times\mathbf{p})_z + mgz.$$

Here, $(\mathbf{r}\times\mathbf{p})_z$ is the z-component of the cross product between r and p.

a. Show, in general, that when the Hamiltonian is separable in this manner, the canonical partition function Q(N, V, T) is expressible as

$$Q(N, V, T) = \frac{1}{N!}\left[q(V, T)\right]^N,$$


where

$$q(V, T) = \frac{1}{h^3}\int d\mathbf{p}\int_{D(V)} d\mathbf{r}\; e^{-\beta h(\mathbf{r}, \mathbf{p})}.$$

b. Show, in general, that the chemical potential µ(N, V, T) is given by

$$\mu(N, V, T) = kT\ln\left[\frac{Q(N - 1, V, T)}{Q(N, V, T)}\right],$$

where Q(N − 1, V, T) is the partition function for an (N − 1)-particle system.

c. Calculate the partition function for this ideal gas.

d. Calculate the Helmholtz free energy of the gas.

e. Calculate the total internal energy of the gas.

f. Calculate the heat capacity of the gas.

∗g. What is the equation of state of the gas?

4.8. A classical system of N noninteracting diatomic molecules enclosed in a cubic box of length L and volume $V = L^3$ is held at a fixed temperature T. The Hamiltonian for a single molecule is

$$h(\mathbf{r}_1, \mathbf{r}_2, \mathbf{p}_1, \mathbf{p}_2) = \frac{\mathbf{p}_1^2}{2m_1} + \frac{\mathbf{p}_2^2}{2m_2} + \epsilon\,|r_{12} - r_0|,$$

where $r_{12} = |\mathbf{r}_1 - \mathbf{r}_2|$ is the distance between the atoms in the diatomic.

a. Calculate the canonical partition function.

b. Calculate the Helmholtz free energy.

c. Calculate the total internal energy.

d. Calculate the heat capacity.

e. Calculate the mean-square molecular bond length $\langle |\mathbf{r}_1 - \mathbf{r}_2|^2 \rangle$.

4.9. Write a program to integrate the Nosé-Hoover chain equations for a harmonic oscillator with mass m = 1, frequency ω = 1, and temperature kT = 1 using the integrator of Section 4.12. Verify that the correct momentum and position distributions are obtained by comparison with the analytical results

$$f(p) = \frac{1}{\sqrt{2\pi mkT}}\,e^{-p^2/2mkT}, \qquad f(x) = \sqrt{\frac{m\omega^2}{2\pi kT}}\,e^{-m\omega^2x^2/2kT}.$$

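∗4.10. Consider a system of N particles subject to a single holonomic constraint $\sigma(\mathbf{r}_1, \ldots, \mathbf{r}_N) \equiv \sigma(\mathbf{r}) = 0$. Recall that the equations of motion derived using Gauss’s principle of least constraint are

$$\dot{\mathbf{r}}_i = \frac{\mathbf{p}_i}{m_i}$$

$$\dot{\mathbf{p}}_i = \mathbf{F}_i - \left[\frac{\sum_j \mathbf{F}_j\cdot\nabla_j\sigma/m_j + \sum_{j,k}\nabla_j\nabla_k\sigma\cdot\cdot\,\mathbf{p}_j\mathbf{p}_k/(m_jm_k)}{\sum_j(\nabla_j\sigma)^2/m_j}\right]\nabla_i\sigma.$$

Show using the techniques of Section 4.10 that these equations of motion generate the partition function

$$\Omega = \int d^N\mathbf{p}\,d^N\mathbf{r}\; Z(\mathbf{r})\,\delta(H(\mathbf{r}, \mathbf{p}) - E)\,\delta(\sigma(\mathbf{r}))\,\delta(\dot{\sigma}(\mathbf{r}, \mathbf{p})),$$

where

$$Z(\mathbf{r}) = \sum_{i=1}^{N}\frac{1}{m_i}\left(\frac{\partial\sigma}{\partial\mathbf{r}_i}\right)^2.$$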

This result was first derived by Ryckaert and Ciccotti (1983).

4.11. The canonical ensemble version of the classical virial theorem is credited to Richard C. Tolman (1918). Prove that the canonical average

$$\left\langle x_i\frac{\partial H}{\partial x_j}\right\rangle = \frac{1}{N!\,h^{3N}\,Q(N, V, T)}\int d\mathrm{x}\; x_i\frac{\partial H}{\partial x_j}\,e^{-\beta H(\mathrm{x})} = kT\,\delta_{ij}$$

holds. What assumptions must be made in the derivation of this result?

4.12. Prove that the structure factor S(q) of a one-component isotropic liquid or gas is related to the radial distribution function g(r) via eqn. (4.7.31).

4.13. Consider a system of N identical noninteracting molecules, each molecule being comprised of n atoms with some chemical bonding pattern within the molecule. The atoms in each molecule are held together by a potential $u(\mathbf{r}_1^{(i)}, \ldots, \mathbf{r}_n^{(i)})$, i = 1, ..., N, which rapidly increases as the distance between any two pairs of atoms increases, and becomes infinite as the distance between any two atoms in the molecule becomes infinite. Assume the atoms in each molecule have masses $m_k$, where k = 1, ..., n.

a. Write down the Hamiltonian and the canonical partition function for this system and show that the partition function can be reduced to a product of single-molecule partition functions.


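b. Make the following change of coordinates in your single-molecule partition function:

$$\mathbf{s}_1 = \frac{1}{M}\sum_{k=1}^{n} m_k\mathbf{r}_k$$

$$\mathbf{s}_k = \mathbf{r}_k - \frac{1}{m'_k}\sum_{l=1}^{k-1} m_l\mathbf{r}_l, \qquad k = 2, \ldots, n,$$

where

$$m'_k \equiv \sum_{l=1}^{k-1} m_l,$$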

M is the total mass of a molecule, and the i superscript has been dropped for simplicity. What is the meaning of the coordinate $\mathbf{s}_1$? Show that if $u(\mathbf{r}_1, \ldots, \mathbf{r}_n)$ only depends on the relative coordinates between pairs of atoms in the molecule, then single molecule partition function is of the general form

$$Q(N, V, T) = \frac{(Vf(n, T))^N}{N!},$$

where f(n, T) is a pure function of n and T.

c. Show, therefore, that the equation of state is always that of an ideal gas, independent of the type of molecule in the system.

d. Denote the single-molecule partition function as q(n, V, T) = V f(n, T). Now suppose that the system is composed of different types of molecules (see comment following eqn. (4.3.14)). Specifically, suppose the system contains $N_A$ molecules of type A, $N_B$ molecules of type B, $N_C$ molecules of type C and $N_D$ molecules of type D. Suppose further that the molecules can undergo the following chemical reaction:

$$a\mathrm{A} + b\mathrm{B} \rightleftharpoons c\mathrm{C} + d\mathrm{D},$$

which is a chemical equilibrium. The Helmholtz free energy A must now be a function of V, T, $N_A$, $N_B$, $N_C$, and $N_D$. When chemical equilibrium is reached, the free energy is a minimum, so that dA = 0. Assume that the volume and temperature of the system are kept constant. Let λ be a variable such that $dN_A = a\,d\lambda$, $dN_B = b\,d\lambda$, $dN_C = -c\,d\lambda$, and $dN_D = -d\,d\lambda$. λ is called the reaction extent. Show that, at equilibrium,

$$a\mu_A + b\mu_B - c\mu_C - d\mu_D = 0, \tag{4.16.3}$$

where $\mu_A$ is the chemical potential of species A:

$$\mu_A = -kT\,\frac{\partial\ln Q(V, T, N_A, N_B, N_C, N_D)}{\partial N_A}$$

with similar definitions for $\mu_B$, $\mu_C$, and $\mu_D$.


e. Finally, show that eqn. (4.16.3) implies

$$\frac{\rho_C^c\,\rho_D^d}{\rho_A^a\,\rho_B^b} = \frac{(q_C/V)^c\,(q_D/V)^d}{(q_A/V)^a\,(q_B/V)^b}$$

and that both sides are pure functions of temperature. Here, $q_A$ is the one-molecule partition function for a molecule of type A, $q_B$, the one-molecule partition function for a molecule of type B, etc., and $\rho_A$ is the number density of type A molecules, etc. How is the quantity on the right related to the usual equilibrium constant

$$K = \frac{P_C^c\,P_D^d}{P_A^a\,P_B^b}$$

for the reaction? Here, $P_A$, $P_B$, ... are the partial pressures of species A, species B, ..., respectively.

4.14. Consider a system of N identical particles interacting via a pair potential

$$u(\mathbf{r}_1, \ldots, \mathbf{r}_N) = \frac{1}{2}\sum_{i,j,\,i\neq j} u(|\mathbf{r}_i - \mathbf{r}_j|),$$

where u(r) is a general repulsive potential of the form

$$u(r) = \frac{A}{r^n},$$

where n is an integer and A > 0. In the low density limit, compute the pressure of such a system as a function of n. Explain why a system described by such a potential cannot exist stably for certain values of n.

Hint: You may express the answer in terms of the Γ-function

$$\Gamma(x) = \int_0^\infty dt\; t^{x-1}e^{-t}.$$

Also, the following properties of the Γ-function may be useful:

$$\Gamma(x) > 0 \quad x > 0, \qquad \Gamma(0) = \infty, \qquad \Gamma(n) = \infty \ \text{for integer } n < 0, \qquad \Gamma(-1/2) = -2\sqrt{\pi}.$$

4.15. Often a pair potential is insufficient to describe accurately the behavior of many real liquids and gases. One then often includes three-body terms in the potential, which appear as follows:

$$U(\mathbf{r}_1, \ldots, \mathbf{r}_N) = \sum_{i>j} u(|\mathbf{r}_i - \mathbf{r}_j|) + \sum_{i>j>k} v(|\mathbf{r}_i - \mathbf{r}_j|, |\mathbf{r}_j - \mathbf{r}_k|, |\mathbf{r}_i - \mathbf{r}_k|),$$

where the first term is the usual pair interaction term and the second contains the three-body contributions.


a. Derive an expression for the average energy in terms of $g^{(2)}(\mathbf{r}_1, \mathbf{r}_2)$ and $g^{(3)}(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3)$. What is the expression for $g^{(3)}$?

b. Explain why $g^{(3)}(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3)$ should only depend on $\mathbf{r}_1 - \mathbf{r}_2$, $\mathbf{r}_3 - \mathbf{r}_2$ and $\mathbf{r}_3 - \mathbf{r}_1$. By making the following coordinate transformation:

$$\mathbf{R} = \frac{1}{3}(\mathbf{r}_1 + \mathbf{r}_2 + \mathbf{r}_3)$$

$$\mathbf{r} = \mathbf{r}_1 - \mathbf{r}_2$$

$$\mathbf{s} = \mathbf{r}_3 - \frac{1}{2}(\mathbf{r}_1 + \mathbf{r}_2),$$

show that $g^{(3)}$ really only depends on r and s. By integrating over the variable R, obtain a new distribution function $\tilde{g}^{(3)}(\mathbf{r}, \mathbf{s})$.

c. Express the average energy in terms of the radial distribution function g(r) and the new function $\tilde{g}^{(3)}(\mathbf{r}, \mathbf{s})$.

d. For isotropic systems, explain why $g^{(3)}$ should only depend on the three distances $|\mathbf{r}_1 - \mathbf{r}_2|$, $|\mathbf{r}_3 - \mathbf{r}_2|$ and $|\mathbf{r}_3 - \mathbf{r}_1|$. Show, therefore, that in terms of the variables r and s, the simplest distribution function can be expressed as $\bar{g}^{(3)}(r, \mu, \nu)$, i.e., as a function of the three variables r, µ, and ν, where r, µ and ν are given by

$$r = |\mathbf{r}|$$

$$\mu = \left|\mathbf{s} + \tfrac{1}{2}\mathbf{r}\right| + \left|\mathbf{s} - \tfrac{1}{2}\mathbf{r}\right|$$

$$\nu = \left|\mathbf{s} + \tfrac{1}{2}\mathbf{r}\right| - \left|\mathbf{s} - \tfrac{1}{2}\mathbf{r}\right|.$$

(You do not need to obtain an explicit expression for $\bar{g}^{(3)}$.)

4.16. Derive eqn. (4.14.5).

4.17. A classical ideal gas of N massless particles with coordinates $\mathbf{r}_1, \ldots, \mathbf{r}_N \equiv \mathbf{r}$ and momenta $\mathbf{p}_1, \ldots, \mathbf{p}_N \equiv \mathbf{p}$ is held at a constant volume V and external temperature T. The Hamiltonian of the gas is

$$H(\mathbf{r}, \mathbf{p}) = \sum_{i=1}^{N} c\,|\mathbf{p}_i|,$$

where c is a constant. The number of particles N is also kept fixed. Determine the equation of state, the total energy, the heat capacity, and the entropy of the gas under these conditions.

5 The isobaric ensembles

5.1 Why constant pressure?

Standard handbooks of thermodynamic data report numerical values of physical properties, including standard enthalpies, entropies and free energies of formation, redox potentials, equilibrium constants (such as acid ionization constants, solubility products, inhibition constants) and other such data, under conditions of constant temperature and pressure. This makes the isothermal-isobaric ensemble one of the most important ensembles since it most closely reflects the conditions under which many condensed-phase experiments are performed. In order to maintain a fixed internal pressure, the volume of a system must be allowed to fluctuate. We may therefore view an isobaric system as coupled to an isotropic “piston” that compresses or expands the system uniformly in response to instantaneous internal pressure fluctuations such that the average internal pressure is equal to an external applied pressure. Remember that an instantaneous pressure estimator is the total force exerted by the particles on the walls of their container, and the average of this quantity gives the observable internal pressure. Coupling a system to the piston leads to an ensemble known as the isoenthalpic-isobaric ensemble, since the enthalpy remains fixed as well as the pressure. Recall that the enthalpy is H = E +P V . If the system also exchanges heat with a thermal reservoir, which maintains a fixed temperature T , then the system is described by the isothermal-isobaric ensemble. In this chapter, the basic thermodynamics of isobaric ensembles will be derived by performing a Legendre transformation on the volume starting with the microcanonical and canonical ensembles, respectively. The condition of a fluctuating volume will be seen to affect the ensemble distribution function, which must be viewed as a function of both the phase-space vector x and the volume V . 
Indeed, when considering how the volume fluctuates in an isobaric ensemble, it is important to note that both isotropic and anisotropic fluctuations are possible. Bulk liquids and gases in equilibrium only support isotropic fluctuations. However, in any system that is not isotropic by nature, anisotropic volume fluctuations are possible even if the applied external pressure is isotropic. For solids, if one is interested in structural phase transitions under an external applied pressure or in mapping out the space of crystal structures of complex molecular systems that exhibit polymorphism, it is often critical to include anisotropic shape changes of the containing volume or supercell. Other examples that support anisotropic volume changes include biological membranes, amorphous materials, and interfaces, to name a few. After developing the basic statistical mechanics of the isobaric ensembles, we will see how the extended phase-space techniques of the previous chapter can be adapted


for molecular dynamics calculations in these ensembles. We will show how the volume and density distributions can be generated by treating the volume as an additional dynamical variable with a corresponding momentum, the latter serving as a barostatic control of the fluctuations in the internal pressure. This idea will be extended to anisotropic volume shape-changes by treating the cell vectors as dynamical variables.

5.2 Thermodynamics of isobaric ensembles

We begin by considering the isoenthalpic-isobaric ensemble, which derives from a Legendre transformation performed on the microcanonical ensemble. In the microcanonical ensemble, the energy $E$ is constant and is expressed as a function of the number of particles $N$, the volume $V$, and the entropy $S$: $E = E(N, V, S)$. Since we seek to use an external applied pressure $P$ as the control variable in place of the volume $V$, it is necessary to perform a Legendre transform of $E$ with respect to the volume $V$. Denoting the new energy as $\tilde{E}$, we find

$$\tilde{E}(N, P, S) = E(N, V(P), S) - \frac{\partial E}{\partial V}\, V(P). \tag{5.2.1}$$

However, since $P = -\partial E/\partial V$, the new energy is just $\tilde{E} = E + PV$, which we recognize as the enthalpy $H$:

$$H(N, P, S) = E(N, V(P), S) + P\,V(P). \tag{5.2.2}$$

The enthalpy is naturally a function of $N$, $P$, and $S$. Thus, for a process in which these variables change by small amounts, $dN$, $dP$, and $dS$, respectively, the change in the enthalpy is

$$dH = \left(\frac{\partial H}{\partial N}\right)_{P,S} dN + \left(\frac{\partial H}{\partial P}\right)_{N,S} dP + \left(\frac{\partial H}{\partial S}\right)_{N,P} dS. \tag{5.2.3}$$

Since $H = E + PV$, it also follows that

$$\begin{aligned}
dH &= dE + P\,dV + V\,dP \\
   &= T\,dS - P\,dV + \mu\,dN + P\,dV + V\,dP \\
   &= T\,dS + V\,dP + \mu\,dN,
\end{aligned} \tag{5.2.4}$$

where the second line follows from the first law of thermodynamics. Comparing eqns. (5.2.3) and (5.2.4) leads to the thermodynamic relations

$$\mu = \left(\frac{\partial H}{\partial N}\right)_{P,S}, \qquad \langle V \rangle = \left(\frac{\partial H}{\partial P}\right)_{N,S}, \qquad T = \left(\frac{\partial H}{\partial S}\right)_{N,P}. \tag{5.2.5}$$

The notation $\langle V \rangle$ for the volume appearing in eqn. (5.2.5) serves to remind us that the observable volume results from a sampling of instantaneous volume fluctuations. Equations (5.2.5) constitute the basic thermodynamic relations in the isoenthalpic-isobaric


ensemble. The reason that enthalpy is designated as a control variable rather than the entropy is the same as for the microcanonical ensemble: It is not possible to “dial up” a desired entropy, whereas, in principle, the enthalpy can be set by the external conditions, even if it is never done in practice (except in computer simulations).

The isothermal-isobaric ensemble results from performing the same Legendre transform on the canonical ensemble. The volume in the Helmholtz free energy $A(N, V, T)$ is transformed into the external pressure $P$, yielding a new free energy denoted $G(N, P, T)$:

$$G(N, P, T) = A(N, V(P), T) - V(P)\,\frac{\partial A}{\partial V}. \tag{5.2.6}$$

Using the fact that $P = -\partial A/\partial V$, we obtain

$$G(N, P, T) = A(N, V(P), T) + P\,V(P). \tag{5.2.7}$$

The function $G(N, P, T)$ is known as the Gibbs free energy. Since $G$ is a function of $N$, $P$, and $T$, a small change in each of these control variables yields a change in $G$ given by

$$dG = \left(\frac{\partial G}{\partial N}\right)_{P,T} dN + \left(\frac{\partial G}{\partial P}\right)_{N,T} dP + \left(\frac{\partial G}{\partial T}\right)_{N,P} dT. \tag{5.2.8}$$

However, since $G = A + PV$, the differential change $dG$ can also be expressed as

$$\begin{aligned}
dG &= dA + P\,dV + V\,dP \\
   &= -P\,dV + \mu\,dN - S\,dT + P\,dV + V\,dP \\
   &= \mu\,dN + V\,dP - S\,dT,
\end{aligned} \tag{5.2.9}$$

where the second line follows from eqn. (4.2.5). Thus, equating eqn. (5.2.9) with eqn. (5.2.8), the thermodynamic relations of the isothermal-isobaric ensemble follow:

$$\mu = \left(\frac{\partial G}{\partial N}\right)_{P,T}, \qquad \langle V \rangle = \left(\frac{\partial G}{\partial P}\right)_{N,T}, \qquad S = -\left(\frac{\partial G}{\partial T}\right)_{N,P}. \tag{5.2.10}$$

As before, the volume in eqn. (5.2.10) must be regarded as an average over instantaneous volume fluctuations.
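The Legendre transform of eqns. (5.2.6) and (5.2.7) can be carried out numerically for a simple test case. The sketch below is only an illustration: it assumes a Helmholtz free energy whose volume dependence is $A = -NkT \ln V$ (the ideal-gas form, with all volume-independent terms dropped) and reduced units with $k = 1$. The function names are hypothetical. It inverts $P = -\partial A/\partial V$ for $V(P)$, builds $G = A + PV$, and then recovers the volume from $(\partial G/\partial P)_{N,T}$ as in eqn. (5.2.10).

```python
import math

K_B = 1.0  # Boltzmann constant in reduced units (an assumption of this sketch)


def helmholtz(n, volume, temperature):
    """Volume-dependent part of an ideal-gas-like A(N, V, T) = -N k T ln V."""
    return -n * K_B * temperature * math.log(volume)


def pressure_from_a(n, volume, temperature, dv=1e-6):
    """P = -(dA/dV), estimated with a central finite difference."""
    a_plus = helmholtz(n, volume + dv, temperature)
    a_minus = helmholtz(n, volume - dv, temperature)
    return -(a_plus - a_minus) / (2.0 * dv)


def volume_at_pressure(n, pressure, temperature, lo=1e-3, hi=1e6):
    """Invert P(V) = pressure by bisection; P(V) decreases monotonically with V."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if pressure_from_a(n, mid, temperature) > pressure:
            lo = mid  # internal pressure too high, so the volume must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gibbs(n, pressure, temperature):
    """G(N, P, T) = A(N, V(P), T) + P V(P), as in eqn. (5.2.7)."""
    v = volume_at_pressure(n, pressure, temperature)
    return helmholtz(n, v, temperature) + pressure * v


def volume_from_g(n, pressure, temperature, dp=1e-4):
    """Volume from (dG/dP) at fixed N and T, the middle relation of eqn. (5.2.10)."""
    g_plus = gibbs(n, pressure + dp, temperature)
    g_minus = gibbs(n, pressure - dp, temperature)
    return (g_plus - g_minus) / (2.0 * dp)
```

For this assumed free energy, both routes should return $V = NkT/P$, so `volume_at_pressure(10, 4.0, 2.0)` and `volume_from_g(10, 4.0, 2.0)` should agree to within the finite-difference error.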

5.3 Isobaric phase-space distributions and partition functions

The relationship between the isoenthalpic-isobaric and isothermal-isobaric ensembles is similar to that between the microcanonical and canonical ensembles. In the isoenthalpic-isobaric ensemble, the instantaneous enthalpy is given by $\mathcal{H}(\mathrm{x}) + PV$, where $V$ is the instantaneous volume and $\mathcal{H}(\mathrm{x})$ is the Hamiltonian. Note that $\mathcal{H}(\mathrm{x}) + PV$ is strictly conserved under isoenthalpic conditions. Thus, the ensemble is defined by a collection of systems evolving according to Hamilton’s equations in a containing volume; in turn, the volume of the container adjusts to keep the internal pressure equal to the external


applied pressure such that $\mathcal{H}(\mathrm{x}) + PV$ is constant. The term $PV$ in the instantaneous enthalpy represents the work done by the system against the external pressure. The fact that $\mathcal{H}(\mathrm{x}) + PV$ is conserved implies that the ensemble is the collection of all microstates on the constant-enthalpy hypersurface defined by the condition

$$\mathcal{H}(\mathrm{x}) + PV = H, \tag{5.3.1}$$

analogous to the constant-energy hypersurface in the microcanonical ensemble. Since the ensemble distribution function must satisfy the equilibrium Liouville equation and therefore be a function $F(\mathcal{H}(\mathrm{x}))$ of the Hamiltonian, the appropriate solution for the isoenthalpic-isobaric ensemble is simply a $\delta$-function expressing the conservation of the instantaneous enthalpy,

$$f(\mathrm{x}) = F(\mathcal{H}(\mathrm{x})) = M\,\delta(\mathcal{H}(\mathrm{x}) + PV - H), \tag{5.3.2}$$

where $M$ is an overall normalization constant. As in the microcanonical ensemble, the partition function (the number of accessible microstates) is obtained by integrating over the constant-enthalpy hypersurface. However, as the volume is not fixed in this ensemble, each volume accessible to the system has an associated manifold of accessible phase-space points because the size of the configuration space is determined by the volume. The partition function must, therefore, contain an integration over both the phase space and the volume. Denoting the partition function as $\Gamma(N, P, H)$, we have

$$\Gamma(N, P, H) = M \int_0^\infty dV \int d\mathbf{p}_1 \cdots \int d\mathbf{p}_N \int_{D(V)} d\mathbf{r}_1 \cdots \int_{D(V)} d\mathbf{r}_N\; \delta(\mathcal{H}(\mathbf{r}, \mathbf{p}) + PV - H), \tag{5.3.3}$$

where the volume can, in principle, be any positive number. It is important to note that the volume and position integrations cannot be interchanged, since the position integration is restricted to the domain defined by each volume. For this reason, the volume integration cannot be used to integrate over the $\delta$-function. The definition of the normalization constant $M$ is similar to the microcanonical ensemble except that an additional reference volume $V_0$ is needed to make the partition function dimensionless:

$$M \equiv M_N = \frac{H_0}{V_0\, N!\, h^{3N}}. \tag{5.3.4}$$

Although we can write eqn. (5.3.3) more compactly as

$$\Gamma(N, P, H) = M \int_0^\infty dV \int d\mathrm{x}\; \delta(\mathcal{H}(\mathrm{x}) + PV - H), \tag{5.3.5}$$

where the volume dependence of the phase-space integration is implicit, this volume dependence must be determined before the integration over V can be performed.


Noting that the thermodynamic relations in eqn. (5.2.5) can also be written in terms of the entropy $S = S(N, P, H)$ (since $dH = T\,dS + V\,dP + \mu\,dN$ implies $dS = dH/T - (V/T)\,dP - (\mu/T)\,dN$) as

$$\frac{1}{T} = \left(\frac{\partial S}{\partial H}\right)_{N,P}, \qquad \frac{\langle V \rangle}{T} = -\left(\frac{\partial S}{\partial P}\right)_{N,H}, \qquad \frac{\mu}{T} = -\left(\frac{\partial S}{\partial N}\right)_{P,H}, \tag{5.3.6}$$

the thermodynamics can be related to the number of microscopic states by the analogous Boltzmann relation

$$S(N, P, H) = k \ln \Gamma(N, P, H), \tag{5.3.7}$$

so that eqns. (5.3.6) can be expressed in terms of the partition function as

$$\frac{1}{kT} = \left(\frac{\partial \ln \Gamma}{\partial H}\right)_{N,P}, \qquad \frac{\langle V \rangle}{kT} = -\left(\frac{\partial \ln \Gamma}{\partial P}\right)_{N,H}, \qquad \frac{\mu}{kT} = -\left(\frac{\partial \ln \Gamma}{\partial N}\right)_{P,H}. \tag{5.3.8}$$

The partition function for the isothermal-isobaric ensemble can be derived in much the same way as the canonical ensemble is derived from the microcanonical ensemble. The proof is similar to that in Section 4.3 and is left as an exercise (see Problem 5.1). As an alternative, we present a derivation of the partition function that parallels the development of the thermodynamics: We will make explicit use of the canonical ensemble.

Consider two systems coupled to a common thermal reservoir so that each system is described by a canonical distribution at temperature $T$. Systems 1 and 2 have $N_1$ and $N_2$ particles, respectively, with $N_2 \gg N_1$, and volumes $V_1$ and $V_2$ with $V_2 \gg V_1$. System 2 is coupled to system 1 as a “barostat,” allowing the volume to fluctuate such that the internal pressure $P$ of system 2 functions as an external applied pressure to system 1 while keeping its internal pressure equal to $P$ (see Fig. 5.1). The total particle number and volume are $N = N_1 + N_2$ and $V = V_1 + V_2$, respectively. Let $\mathcal{H}_1(\mathrm{x}_1)$ be the Hamiltonian of system 1 and $\mathcal{H}_2(\mathrm{x}_2)$ be the Hamiltonian of system 2. The total Hamiltonian is $\mathcal{H}(\mathrm{x}) = \mathcal{H}_1(\mathrm{x}_1) + \mathcal{H}_2(\mathrm{x}_2)$. If the volume of each system were fixed, the total canonical partition function $Q(N, V, T)$ would be

$$\begin{aligned}
Q(N, V, T) &= C_N \int d\mathrm{x}_1\, d\mathrm{x}_2\; e^{-\beta[\mathcal{H}_1(\mathrm{x}_1) + \mathcal{H}_2(\mathrm{x}_2)]} \\
&= g(N, N_1, N_2)\, C_{N_1} \int d\mathrm{x}_1\, e^{-\beta \mathcal{H}_1(\mathrm{x}_1)}\; C_{N_2} \int d\mathrm{x}_2\, e^{-\beta \mathcal{H}_2(\mathrm{x}_2)} \\
&\propto Q_1(N_1, V_1, T)\, Q_2(N_2, V_2, T),
\end{aligned} \tag{5.3.9}$$

where g(N, N1 , N2 ) is an overall normalization constant. Equation (5.3.9) does not produce a proper counting of all possible microstates, as it involves only one specific choice of V1 and V2 , and these volumes need to be varied over all possible values. A proper counting, therefore, requires that we integrate over all V1 and V2 , subject to the condition that V1 + V2 = V . Since V2 = V − V1 , we only need to integrate explicitly


Fig. 5.1 Two systems in contact with a common thermal reservoir at temperature T . System 1 has N1 particles in a volume V1 ; system 2 has N2 particles in a volume V2 . Both V1 and V2 can vary.

over one of the volumes, say $V_1$. Thus, we write the correct canonical partition function for the total system as

$$Q(N, V, T) = g(N, N_1, N_2) \int_0^V dV_1\; Q_1(N_1, V_1, T)\, Q_2(N_2, V - V_1, T). \tag{5.3.10}$$

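The canonical phase-space distribution function $f(\mathrm{x})$ of the combined system 1 and 2 is

$$f(\mathrm{x}) = \frac{C_N\, e^{-\beta \mathcal{H}(\mathrm{x})}}{Q(N, V, T)}. \tag{5.3.11}$$

In order to determine the distribution function $f_1(\mathrm{x}_1, V_1)$ of system 1, we need to integrate over the phase space of system 2:

$$\begin{aligned}
f_1(\mathrm{x}_1, V_1) &= \frac{g(N, N_1, N_2)}{Q(N, V, T)}\, C_{N_1} e^{-\beta \mathcal{H}_1(\mathrm{x}_1)}\, C_{N_2} \int d\mathrm{x}_2\, e^{-\beta \mathcal{H}_2(\mathrm{x}_2)} \\
&= \frac{Q_2(N_2, V - V_1, T)}{Q(N, V, T)}\, g(N, N_1, N_2)\, C_{N_1} e^{-\beta \mathcal{H}_1(\mathrm{x}_1)}.
\end{aligned} \tag{5.3.12}$$

The distribution in eqn. (5.3.12) satisfies the normalization condition:

$$\int_0^V dV_1 \int d\mathrm{x}_1\; f_1(\mathrm{x}_1, V_1) = 1. \tag{5.3.13}$$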


The ratio of partition functions can be expressed in terms of Helmholtz free energies according to

$$\begin{aligned}
Q_2(N_2, V - V_1, T) &= e^{-\beta A(N_2, V - V_1, T)}, \\
Q(N, V, T) &= e^{-\beta A(N, V, T)}, \\
\frac{Q_2(N_2, V - V_1, T)}{Q(N, V, T)} &= e^{-\beta[A(N - N_1, V - V_1, T) - A(N, V, T)]}.
\end{aligned} \tag{5.3.14}$$

Recalling that $N \gg N_1$ and $V \gg V_1$, the free energy $A(N - N_1, V - V_1, T)$ can be expanded to first order about $N_1 = 0$ and $V_1 = 0$, which yields

$$A(N - N_1, V - V_1, T) \approx A(N, V, T) - N_1 \left.\frac{\partial A}{\partial N}\right|_{N_1 = 0,\, V_1 = 0} - V_1 \left.\frac{\partial A}{\partial V}\right|_{N_1 = 0,\, V_1 = 0}. \tag{5.3.15}$$

The minus signs arise from differentiating with respect to $N$ and $V$ instead of $N_1$ and $V_1$. Using the relations $\mu = \partial A/\partial N$ and $P = -\partial A/\partial V$, eqn. (5.3.15) becomes

$$A(N - N_1, V - V_1, T) \approx A(N, V, T) - \mu N_1 + P V_1. \tag{5.3.16}$$

Substituting eqn. (5.3.16) into eqn. (5.3.12) yields the distribution

$$f_1(\mathrm{x}_1, V_1) = g(N, N_1, N_2)\, e^{\beta \mu N_1}\, e^{-\beta P V_1}\, e^{-\beta \mathcal{H}_1(\mathrm{x}_1)}. \tag{5.3.17}$$

System 2 has now been eliminated, and we can drop the extraneous “1” subscript. Rearranging eqn. (5.3.17), integrating both sides, and taking the thermodynamic limit, we obtain

$$e^{-\beta \mu N} \int_0^\infty dV \int d\mathrm{x}\; f(\mathrm{x}, V) = I_N \int_0^\infty dV \int d\mathrm{x}\; e^{-\beta(\mathcal{H}(\mathrm{x}) + PV)}. \tag{5.3.18}$$

Equation (5.3.18) defines the partition function of the isothermal-isobaric ensemble as

$$\Delta(N, P, T) = I_N \int_0^\infty dV \int d\mathrm{x}\; e^{-\beta(\mathcal{H}(\mathrm{x}) + PV)}, \tag{5.3.19}$$

where the definition of the prefactor $I_N$ is analogous to the microcanonical and canonical ensembles but with an additional reference volume to make the overall expression dimensionless:

$$I_N = \frac{1}{V_0\, N!\, h^{3N}}. \tag{5.3.20}$$

As noted in Sections 3.2 and 4.3, the factors $I_N$ and $M_N$ (see eqn. (5.3.4)) should be generalized to $I_{\{N\}}$ and $M_{\{N\}}$ for multicomponent systems.


Equation (5.3.18) illustrates an important point. Since eqn. (5.3.13) is true in the limit $V \to \infty$, and $\Delta(N, P, T) = \exp(-\beta G(N, P, T))$ (we will prove this shortly), it follows that

$$e^{-\beta \mu N} = e^{-\beta G(N, P, T)}, \tag{5.3.21}$$

or $G(N, P, T) = \mu N$. This relation is a special case of a more general result known as Euler’s theorem (see Section 6.2). Euler’s theorem implies that if a thermodynamic function depends on extensive variables such as $N$ and $V$, it can be reexpressed as a sum of these variables multiplied by their thermodynamic conjugates. Since $G(N, P, T)$ depends on only one extensive variable, $N$, and $\mu$ is conjugate to $N$, $G(N, P, T)$ is the simple product $\mu N$.

The partition function of the isothermal-isobaric ensemble is essentially a canonical partition function in which the Hamiltonian $\mathcal{H}(\mathrm{x})$ is replaced by the “instantaneous enthalpy” $\mathcal{H}(\mathrm{x}) + PV$ and an additional volume integration is included. Since $I_N = C_N/V_0$, it is readily seen that eqn. (5.3.19) is

$$\Delta(N, P, T) = \frac{1}{V_0} \int_0^\infty dV\; e^{-\beta P V}\, Q(N, V, T). \tag{5.3.22}$$

According to eqn. (5.3.22), the isothermal-isobaric partition function is the Laplace transform (see Appendix E) of the canonical partition function with respect to volume, just as the canonical partition function is the Laplace transform of the microcanonical partition function with respect to energy. In both cases, the variable used to form the Laplace transform between partition functions is the same variable used to form the Legendre transform between thermodynamic functions.

We now show that the Gibbs free energy is given by the relation

$$G(N, P, T) = -\frac{1}{\beta} \ln \Delta(N, P, T). \tag{5.3.23}$$

Recall that $G = A + P\langle V \rangle = E + P\langle V \rangle - TS$, which can be expressed as

$$G = \langle \mathcal{H}(\mathrm{x}) + PV \rangle + T \frac{\partial G}{\partial T} \tag{5.3.24}$$

with the help of eqn. (5.2.10). Note that the average of the instantaneous enthalpy is

$$\begin{aligned}
H = \langle \mathcal{H}(\mathrm{x}) + PV \rangle &= \frac{I_N \int_0^\infty dV \int d\mathrm{x}\; (\mathcal{H}(\mathrm{x}) + PV)\, e^{-\beta(\mathcal{H}(\mathrm{x}) + PV)}}{I_N \int_0^\infty dV \int d\mathrm{x}\; e^{-\beta(\mathcal{H}(\mathrm{x}) + PV)}} \\
&= -\frac{1}{\Delta(N, P, T)} \frac{\partial}{\partial \beta} \Delta(N, P, T) \\
&= -\frac{\partial}{\partial \beta} \ln \Delta(N, P, T).
\end{aligned} \tag{5.3.25}$$

Therefore, using $T\,\partial/\partial T = -\beta\,\partial/\partial \beta$, eqn. (5.3.24) becomes

$$G + \frac{\partial}{\partial \beta} \ln \Delta(N, P, T) + \beta \frac{\partial G}{\partial \beta} = 0, \tag{5.3.26}$$

which is analogous to eqn. (4.3.20). Thus, following the procedure in Section 4.3 used to prove that $A = -(1/\beta) \ln Q$, we can easily show that $G = -(1/\beta) \ln \Delta$ is a solution


to eqn. (5.3.26). Other thermodynamic quantities follow in a manner similar to the canonical ensemble. The average volume is

$$\langle V \rangle = -kT \left(\frac{\partial \ln \Delta(N, P, T)}{\partial P}\right)_{N,T}, \tag{5.3.27}$$

the chemical potential is given by

$$\mu = -kT \left(\frac{\partial \ln \Delta(N, P, T)}{\partial N}\right)_{P,T}, \tag{5.3.28}$$

the heat capacity at constant pressure $C_P$ is

$$C_P = \left(\frac{\partial H}{\partial T}\right)_{N,P} = k \beta^2 \frac{\partial^2}{\partial \beta^2} \ln \Delta(N, P, T), \tag{5.3.29}$$

and the entropy is obtained from

$$S(N, P, T) = k \ln \Delta(N, P, T) + \frac{H(N, P, T)}{T}. \tag{5.3.30}$$

The equivalence of the isobaric ensembles to the canonical and microcanonical ensembles is explored in Problem 5.2, which examines the behavior of volume fluctuations as a function of system size.
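The Laplace-transform relation of eqn. (5.3.22) can be checked numerically whenever $Q(N, V, T)$ is known in closed form. The sketch below is an illustration under stated assumptions, not a general-purpose routine: it takes $Q = V^N/N!$ (an ideal-gas-like form with the thermal wavelength set to 1), sets $V_0 = 1$ by default, truncates the volume integral at a hypothetical cutoff `v_max`, and uses Simpson's rule. For this $Q$, the integral is $1/[V_0 (\beta P)^{N+1}]$.

```python
import math


def canonical_q(n, volume):
    """Assumed test form Q(N, V, T) = V^N / N! (thermal wavelength set to 1)."""
    return volume ** n / math.factorial(n)


def isobaric_delta(n, beta_p, v0=1.0, v_max=80.0, steps=8000):
    """Simpson-rule estimate of (1/V0) * int_0^v_max dV exp(-beta P V) Q(N, V, T).

    `steps` must be even; `v_max` must be large enough that the integrand
    has decayed to a negligible value at the cutoff.
    """
    h = v_max / steps
    total = 0.0
    for i in range(steps + 1):
        v = i * h
        if i == 0 or i == steps:
            weight = 1.0
        elif i % 2 == 1:
            weight = 4.0
        else:
            weight = 2.0
        total += weight * math.exp(-beta_p * v) * canonical_q(n, v)
    return total * h / (3.0 * v0)


def isobaric_delta_exact(n, beta_p, v0=1.0):
    """Closed form of the same Laplace transform for the assumed Q."""
    return 1.0 / (v0 * beta_p ** (n + 1))
```

Comparing `isobaric_delta(5, 1.5)` with `isobaric_delta_exact(5, 1.5)` gives a quick consistency check; for larger $N$ or smaller $\beta P$ the cutoff and step count would need to be increased.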

5.4 Isothermal-isobaric ensemble via entropy maximization

In Section 4.4, we demonstrated that the canonical ensemble could be derived via entropy maximization by defining an entropy functional of the phase-space distribution function based on the Shannon entropy definition. The maximization was performed subject to two constraints: (i) the distribution must be properly normalized and (ii) $\langle \mathcal{H}(\mathrm{x}) \rangle = E$, i.e., the canonical average of the Hamiltonian must yield a well-defined energy. The constrained maximization revealed that the only distribution function possible is $f(\mathrm{x}) = C_N \exp[-\beta \mathcal{H}(\mathrm{x})]/Q(N, V, T)$. We will now show that the distribution function $f(\mathrm{x}, V)$ of the isothermal-isobaric ensemble can be derived in a similar manner. When the volume integration is included, the analog to eqn. (4.4.1) for an isobaric ensemble is

$$S = -k \langle \ln f(\mathrm{x}, V) \rangle = -k \int_0^\infty dV \int d\mathrm{x}\; f(\mathrm{x}, V) \ln f(\mathrm{x}, V). \tag{5.4.1}$$

We need to maximize this functional subject to two constraints. As in Section 4.4, the first constraint is the normalization of the distribution function

$$\int_0^\infty dV \int d\mathrm{x}\; f(\mathrm{x}, V) = 1, \tag{5.4.2}$$


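while the second is dictated by eqn. (5.3.25), which requires that $\langle \mathcal{H}(\mathrm{x}) + PV \rangle$ yield a well-defined enthalpy $H$. These two constraints are added with Lagrange multipliers $\mu$ and $\lambda$ to yield the modified entropy functional

$$\begin{aligned}
\tilde{S}[f] = &-k \int_0^\infty dV \int d\mathrm{x}\; f(\mathrm{x}, V) \ln f(\mathrm{x}, V) - \mu \left[\int_0^\infty dV \int d\mathrm{x}\; f(\mathrm{x}, V) - 1\right] \\
&- \lambda \left[\int_0^\infty dV \int d\mathrm{x}\; (\mathcal{H}(\mathrm{x}) + PV)\, f(\mathrm{x}, V) - H\right].
\end{aligned} \tag{5.4.3}$$

As we did in Section 4.4, we take the functional derivative of $\tilde{S}[f]$ with respect to $f(\mathrm{x}, V)$ and set it equal to 0, which yields

$$\frac{\delta \tilde{S}}{\delta f(\mathrm{x}, V)} = -k\left[\ln f(\mathrm{x}, V) + 1\right] - \mu - \lambda\left(\mathcal{H}(\mathrm{x}) + PV\right) = 0. \tag{5.4.4}$$

Solving for $f(\mathrm{x}, V)$, we obtain

$$f(\mathrm{x}, V) = e^{-\mu/k}\, e^{-1}\, e^{-\lambda(\mathcal{H}(\mathrm{x}) + PV)/k}. \tag{5.4.5}$$

Referring back to eqn. (5.2.5), we can obtain an expression for the Lagrange multiplier $\lambda$ by requiring that

$$\frac{\partial \tilde{S}}{\partial H} = \frac{1}{T}. \tag{5.4.6}$$

From eqn. (5.4.3), we see that $\partial \tilde{S}/\partial H = \lambda$, from which it follows that $\lambda = 1/T$. Substituting this into eqn. (5.4.5), we find that

$$f(\mathrm{x}, V) = e^{-(1 + \mu/k)}\, e^{-(\mathcal{H}(\mathrm{x}) + PV)/kT}, \tag{5.4.7}$$

and once the distribution is properly normalized, we obtain

$$f(\mathrm{x}, V) = \frac{I_N\, e^{-(\mathcal{H}(\mathrm{x}) + PV)/kT}}{\Delta(N, P, T)}, \tag{5.4.8}$$

which is the expected result for the isothermal-isobaric ensemble.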
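The content of this maximization can be illustrated on a small discrete analog. The sketch below is a toy model, not part of the derivation: it replaces the continuous variables $(\mathrm{x}, V)$ by four states with assumed "instantaneous enthalpy" values, sets $k = 1$, and compares the entropy of the exponential distribution with that of nearby distributions that satisfy the same two constraints (normalization and fixed average enthalpy). The perturbation direction is chosen by hand so that both constraints are preserved.

```python
import math


def boltzmann(levels, beta):
    """Normalized exponential distribution p_i proportional to exp(-beta * h_i)."""
    weights = [math.exp(-beta * h) for h in levels]
    z = sum(weights)
    return [w / z for w in weights]


def entropy(probs):
    """Shannon entropy -sum p ln p with k = 1."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def average(levels, probs):
    """Average of the instantaneous enthalpy over the distribution."""
    return sum(h * p for h, p in zip(levels, probs))


# Hypothetical instantaneous-enthalpy values of four states (assumed for illustration).
LEVELS = [0.0, 1.0, 2.0, 3.0]
# This direction sums to zero and has zero enthalpy-weighted sum,
# so moving along it changes neither the normalization nor the average.
DIRECTION = [1.0, -2.0, 1.0, 0.0]


def perturbed(probs, eps):
    """Shift the distribution along DIRECTION by a small amount eps."""
    return [p + eps * d for p, d in zip(probs, DIRECTION)]


p_star = boltzmann(LEVELS, beta=0.7)
s_star = entropy(p_star)
s_plus = entropy(perturbed(p_star, 0.02))
s_minus = entropy(perturbed(p_star, -0.02))
```

Because the entropy is strictly concave, `s_star` should exceed both `s_plus` and `s_minus`, while `average(LEVELS, ...)` is unchanged by the perturbation.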

5.5 Pressure and work virial theorems

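In this chapter, we must distinguish between the external applied pressure, here denoted $P$, and the internal pressure of the system, denoted $P^{(\mathrm{int})}$, obtained by averaging the estimator $\mathcal{P}(\mathbf{r}, \mathbf{p})$ in eqns. (4.7.57) and (4.7.58) over a canonical ensemble. In the isobaric ensembles, the volume adjusts so that the volume-averaged internal pressure $\langle P^{(\mathrm{int})} \rangle$ is equal to the external applied pressure $P$. Recall that the internal pressure $P^{(\mathrm{int})}$ at a particular volume $V$ is given in terms of the canonical partition function by

$$P^{(\mathrm{int})} = \langle \mathcal{P}(\mathbf{r}, \mathbf{p}) \rangle = kT \frac{\partial \ln Q}{\partial V} = \frac{kT}{Q} \frac{\partial Q}{\partial V}. \tag{5.5.1}$$

In order to determine the volume-averaged internal pressure, we need to average eqn. (5.5.1) over an isothermal-isobaric distribution according to

$$\begin{aligned}
\langle P^{(\mathrm{int})} \rangle &= \frac{1}{\Delta(N, P, T)} \int_0^\infty dV\; e^{-\beta P V}\, Q(N, V, T)\, \frac{kT}{Q(N, V, T)} \frac{\partial}{\partial V} Q(N, V, T) \\
&= \frac{1}{\Delta(N, P, T)} \int_0^\infty dV\; e^{-\beta P V}\, kT\, \frac{\partial}{\partial V} Q(N, V, T).
\end{aligned} \tag{5.5.2}$$

Here and in the remainder of this section, the constant prefactor $1/V_0$ of eqn. (5.3.22) is not written explicitly. Integrating by parts in eqn. (5.5.2), we obtain

$$\begin{aligned}
\langle P^{(\mathrm{int})} \rangle &= \frac{1}{\Delta} \left[e^{-\beta P V}\, kT\, Q(N, V, T)\right]_0^\infty - \frac{1}{\Delta} \int_0^\infty dV\; kT \left(\frac{\partial}{\partial V} e^{-\beta P V}\right) Q(N, V, T) \\
&= P\, \frac{1}{\Delta} \int_0^\infty dV\; e^{-\beta P V}\, Q(N, V, T) = P.
\end{aligned} \tag{5.5.3}$$

The boundary term in the first line of eqn. (5.5.3) vanishes at both endpoints: At $V = 0$, the configurational integrals in $Q(N, V, T)$ over a box of zero volume must vanish, and at $V = \infty$, the exponential $\exp(-\beta P V)$ decays faster than $Q(N, V, T)$ increases with $V$.$^{1}$ Recognizing that the integral in the last line of eqn. (5.5.3) is just the partition function $\Delta(N, P, T)$, it follows that

$$\langle P^{(\mathrm{int})} \rangle = P. \tag{5.5.4}$$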

Equation (5.5.4) expresses the expected result that the volume-averaged internal pressure is equal to the external pressure. This result is known as the pressure virial theorem. Any computational approach that seeks to generate the isothermal-isobaric ensemble must obey this theorem.

We next consider the average of the pressure-volume product $\langle P^{(\mathrm{int})} V \rangle$. At a fixed volume $V$, the product $P^{(\mathrm{int})} V$ is given in terms of the canonical partition function by

$$P^{(\mathrm{int})} V = kT\, V \frac{\partial \ln Q}{\partial V} = \frac{kT\, V}{Q} \frac{\partial Q}{\partial V}. \tag{5.5.5}$$

Averaging eqn. (5.5.5) over an isothermal-isobaric ensemble yields

$$\langle P^{(\mathrm{int})} V \rangle = \frac{1}{\Delta} \int_0^\infty dV\; e^{-\beta P V}\, kT\, V\, \frac{\partial}{\partial V} Q(N, V, T). \tag{5.5.6}$$

As was done for eqn. (5.5.3), we integrate eqn. (5.5.6) by parts, which gives

$$\begin{aligned}
\langle P^{(\mathrm{int})} V \rangle &= \frac{1}{\Delta} \left[e^{-\beta P V}\, kT\, V\, Q(N, V, T)\right]_0^\infty - \frac{1}{\Delta} \int_0^\infty dV\; kT \left(\frac{\partial}{\partial V}\, V e^{-\beta P V}\right) Q(N, V, T) \\
&= \frac{1}{\Delta} \left[-kT \int_0^\infty dV\; e^{-\beta P V}\, Q(V) + P \int_0^\infty dV\; e^{-\beta P V}\, V\, Q(V)\right] \\
&= -kT + P \langle V \rangle,
\end{aligned} \tag{5.5.7}$$

or

$$\langle P^{(\mathrm{int})} V \rangle + kT = P \langle V \rangle. \tag{5.5.8}$$

Equation (5.5.8) is known as the work virial theorem. Note the presence of the extra $kT$ term on the left side. Since $P \langle V \rangle$ and $\langle P^{(\mathrm{int})} V \rangle$ are both extensive quantities and hence proportional to $N$, the extra $kT$ term can be neglected in the thermodynamic limit, and eqn. (5.5.8) becomes $\langle P^{(\mathrm{int})} V \rangle \approx P \langle V \rangle$. Nevertheless, eqn. (5.5.8) is rigorously correct, and it is interesting to consider the origin of the extra $kT$ term, since it will arise again in Section 5.10, where we discuss molecular dynamics algorithms for the isothermal-isobaric ensemble. The quantity $\langle P^{(\mathrm{int})} V \rangle$ can be defined for any ensemble. However, because the volume can fluctuate in an isobaric ensemble, we can think of the volume as an additional degree of freedom that is not present in the microcanonical and canonical ensembles. If energy is equipartitioned, there should be an additional $kT$ of energy in the volume motion, giving rise to a difference of $kT$ between $\langle P^{(\mathrm{int})} V \rangle$ and $P \langle V \rangle$. Since the motion of the volume is driven by an imaginary “piston” that acts to adjust the internal pressure to the external pressure, this piston also adds an amount of energy $kT$ to the system so that eqn. (5.5.8) is satisfied.

$^{1}$ Recall that as $V \to \infty$, $Q(N, V, T)$ approaches the ideal-gas limit and grows as $V^N$.
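Both virial theorems can be checked by direct sampling in a case where the volume distribution is known. The sketch below assumes an ideal gas, for which $Q \propto V^N$, so that the isothermal-isobaric volume distribution is proportional to $V^N e^{-\beta P V}$ (a gamma distribution with shape $N + 1$ and scale $kT/P$) and the internal pressure at fixed volume is $NkT/V$. The function name and the sample size are illustrative choices.

```python
import random


def sample_virials(n, kt, pressure, samples=200_000, seed=0):
    """Estimate <P_int>, <P_int V>, and <V> for an ideal gas at fixed N, P, T.

    Volumes are drawn from the density proportional to V^n exp(-P V / kT),
    i.e. a gamma distribution with shape n + 1 and scale kT / P.
    """
    rng = random.Random(seed)
    scale = kt / pressure
    sum_p = 0.0
    sum_pv = 0.0
    sum_v = 0.0
    for _ in range(samples):
        volume = rng.gammavariate(n + 1, scale)
        p_int = n * kt / volume  # ideal-gas internal pressure at this volume
        sum_p += p_int
        sum_pv += p_int * volume
        sum_v += volume
    return sum_p / samples, sum_pv / samples, sum_v / samples


p_avg, pv_avg, v_avg = sample_virials(n=20, kt=1.0, pressure=2.0)
```

Within statistical error, `p_avg` should reproduce the applied pressure, as in eqn. (5.5.4), and `pv_avg + kt` should match `pressure * v_avg`, as in eqn. (5.5.8). For an ideal gas the product $P^{(\mathrm{int})} V$ equals $NkT$ at every volume, so `pv_avg` carries no sampling noise.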

5.6 An ideal gas in the isothermal-isobaric ensemble

As an example application of the isothermal-isobaric ensemble, we compute the partition function and thermodynamic properties of an ideal gas. Recall from Section 4.6 that the canonical partition function for the ideal gas is

$$Q(N, V, T) = \frac{V^N}{N!\, \lambda^{3N}}, \tag{5.6.1}$$

where $\lambda = \sqrt{\beta h^2/2\pi m}$. Substituting eqn. (5.6.1) into eqn. (5.3.22) gives the isothermal-isobaric partition function

$$\Delta(N, P, T) = \frac{1}{V_0} \int_0^\infty dV\; e^{-\beta P V} \frac{V^N}{N!\, \lambda^{3N}} = \frac{1}{V_0\, N!\, \lambda^{3N}} \int_0^\infty dV\; e^{-\beta P V}\, V^N. \tag{5.6.2}$$

The volume integral can be rendered dimensionless by letting $x = \beta P V$, leading to

$$\Delta(N, P, T) = \frac{1}{V_0\, N!\, \lambda^{3N}} \frac{1}{(\beta P)^{N+1}} \int_0^\infty dx\; x^N e^{-x}. \tag{5.6.3}$$

The value of the integral is just $N!$. Hence, the isothermal-isobaric partition function for an ideal gas is

$$\Delta(N, P, T) = \frac{1}{V_0\, \lambda^{3N} (\beta P)^{N+1}}. \tag{5.6.4}$$

The thermodynamics of the ideal gas follow from the relations derived in Section 5.3. For the equation of state, we obtain the average volume from

$$\langle V \rangle = -kT \left(\frac{\partial \ln \Delta}{\partial P}\right) = \frac{(N + 1) kT}{P}, \tag{5.6.5}$$

or

$$P \langle V \rangle = (N + 1) kT \approx N kT, \tag{5.6.6}$$

where the last expression follows from the thermodynamic limit. Using eqn. (5.5.8), we can express eqn. (5.6.6) in terms of the average $P^{(\mathrm{int})} V$ product:

$$\langle P^{(\mathrm{int})} V \rangle = N kT. \tag{5.6.7}$$

Equation (5.6.7) is generally true even away from the thermodynamic limit. The average enthalpy of the ideal gas is given by

$$H = -\frac{\partial}{\partial \beta} \ln \Delta = (N + 1) kT + \frac{3}{2} N kT \approx \frac{5}{2} N kT, \tag{5.6.8}$$

from which the constant-pressure heat capacity is given by

$$C_P = \left(\frac{\partial H}{\partial T}\right) = \frac{5}{2} N k. \tag{5.6.9}$$

Equations (5.6.8) and (5.6.9) are usually first encountered in elementary physics and chemistry textbooks with no microscopic justification. This derivation shows the microscopic origin of eqn. (5.6.9). Note that the difference between the constant-volume and constant-pressure heat capacities is

$$C_P = C_V + N k = C_V + nR, \tag{5.6.10}$$

where the product $Nk$ has been replaced by $nR$, with $n$ the number of moles of gas and $R$ the gas constant. (This relation is obtained by multiplying and dividing by $N_0$, Avogadro’s number: $Nk = (N/N_0) N_0 k = nR$.) Dividing eqn. (5.6.10) by the number of moles leads to the familiar relation for the molar heat capacities:

$$c_P = c_V + R. \tag{5.6.11}$$
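The ideal-gas results above can be reproduced by differentiating $\ln \Delta$ numerically, which is a useful template when $\ln \Delta$ is available only as a numerical function. The sketch below assumes reduced units ($h = m = k = V_0 = 1$), takes $\ln \Delta$ from eqn. (5.6.4), and applies central finite differences to eqns. (5.3.27), (5.3.25), and (5.3.29). The step sizes are illustrative choices, and the function names are hypothetical.

```python
import math

# Reduced units: Planck constant, particle mass, Boltzmann constant, and
# reference volume are all set to 1 (an assumption of this sketch).
H_PLANCK = 1.0
MASS = 1.0
K_B = 1.0
V0 = 1.0


def ln_delta(n, pressure, beta):
    """ln of the ideal-gas isothermal-isobaric partition function, eqn. (5.6.4)."""
    lam = math.sqrt(beta * H_PLANCK ** 2 / (2.0 * math.pi * MASS))
    return -math.log(V0) - 3 * n * math.log(lam) - (n + 1) * math.log(beta * pressure)


def average_volume(n, pressure, beta, dp=1e-5):
    """<V> = -kT d(ln Delta)/dP, with kT = 1/beta when k = 1."""
    slope = (ln_delta(n, pressure + dp, beta) - ln_delta(n, pressure - dp, beta)) / (2.0 * dp)
    return -slope / beta


def average_enthalpy(n, pressure, beta, db=1e-5):
    """H = -d(ln Delta)/d(beta)."""
    slope = (ln_delta(n, pressure, beta + db) - ln_delta(n, pressure, beta - db)) / (2.0 * db)
    return -slope


def heat_capacity(n, pressure, beta, db=1e-4):
    """C_P = k beta^2 d^2(ln Delta)/d(beta)^2 from a second central difference."""
    second = (ln_delta(n, pressure, beta + db)
              - 2.0 * ln_delta(n, pressure, beta)
              + ln_delta(n, pressure, beta - db)) / db ** 2
    return K_B * beta ** 2 * second
```

For finite $N$ these derivatives give $\langle V \rangle = (N+1)kT/P$, $H = (5N/2 + 1)kT$, and $C_P = (5N/2 + 1)k$; the familiar $5Nk/2$ is recovered once the terms of order 1 are neglected.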

5.7 Extending the isothermal-isobaric ensemble: Anisotropic cell fluctuations

In this section, we will show how to account for anisotropic volume fluctuations within the isothermal-isobaric ensemble. Anisotropic volume fluctuations can occur under a wide variety of external conditions; however, we will limit ourselves to those that develop under an applied isotropic external pressure. Other external conditions, such as an applied pressure in two dimensions, would generate a constant surface tension ensemble. The formalism developed in this chapter will provide the reader with the tools to understand and develop computational approaches for different external conditions.

When the volume of a system can undergo anisotropic fluctuations, it is necessary to allow the containing volume to change its basic shape. Consider a system contained within a general parallelepiped. The parallelepiped represents the most general “box” shape and is appropriate for describing, for example, solids whose unit cells are generally triclinic. As shown in Fig. 5.2, any parallelepiped can be specified by the three vectors $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ that lie along three edges originating from a vertex. Simple geometry tells us that the volume $V$ of the parallelepiped is given by

$$V = \mathbf{a} \cdot \mathbf{b} \times \mathbf{c}. \tag{5.7.1}$$


Fig. 5.2 A general parallelepiped showing the convention for the cell vectors and angles.

Since each edge vector contains three components, nine numbers can be used to characterize the parallelepiped; these are often collected in the columns of a $3 \times 3$ matrix $\mathbf{h}$ called the box matrix or cell matrix:

$$\mathbf{h} = \begin{pmatrix} a_x & b_x & c_x \\ a_y & b_y & c_y \\ a_z & b_z & c_z \end{pmatrix}. \tag{5.7.2}$$

In terms of the cell matrix, the volume $V$ is easily seen to be

$$V = \det(\mathbf{h}). \tag{5.7.3}$$
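The equivalence of eqns. (5.7.1) and (5.7.3) is easy to confirm numerically. The following sketch uses plain Python lists; the helper names and the sample edge vectors are assumptions made for illustration. It builds the cell matrix with the edge vectors as columns, as in eqn. (5.7.2), and compares the triple product with the determinant.

```python
def dot(u, v):
    """Scalar product of two 3-vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def cross(u, v):
    """Vector product of two 3-vectors."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def cell_matrix(a, b, c):
    """Cell matrix h with the edge vectors a, b, c as its columns, eqn. (5.7.2)."""
    return [[a[0], b[0], c[0]],
            [a[1], b[1], c[1]],
            [a[2], b[2], c[2]]]


def det3(h):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (h[0][0] * (h[1][1] * h[2][2] - h[1][2] * h[2][1])
            - h[0][1] * (h[1][0] * h[2][2] - h[1][2] * h[2][0])
            + h[0][2] * (h[1][0] * h[2][1] - h[1][1] * h[2][0]))


# Hypothetical triclinic cell used only as a test case.
a = [2.0, 0.0, 0.0]
b = [0.5, 3.0, 0.0]
c = [0.3, 0.4, 4.0]

volume_triple = dot(a, cross(b, c))     # eqn. (5.7.1)
volume_det = det3(cell_matrix(a, b, c))  # eqn. (5.7.3)
```

Both expressions are positive only for a right-handed set of edge vectors; swapping two columns of the cell matrix changes the sign of the determinant.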

On the other hand, a little reflection shows that, in fact, only six numbers are needed to specify the cell: the lengths of the edges $a = |\mathbf{a}|$, $b = |\mathbf{b}|$, and $c = |\mathbf{c}|$ and the angles $\alpha$, $\beta$, and $\gamma$ between them. By convention, these three angles are defined such that $\alpha$ is the angle between vectors $\mathbf{b}$ and $\mathbf{c}$, $\beta$ is the angle between vectors $\mathbf{a}$ and $\mathbf{c}$, and $\gamma$ is the angle between vectors $\mathbf{a}$ and $\mathbf{b}$. It is clear, therefore, that the full cell matrix contains redundant information: in addition to providing information about the cell lengths and angles, it also describes overall rotations of the cell in space, as specified by the three Euler angles (see Section 1.11), which accounts for the three extra degrees of freedom.

In order to separate isotropic from anisotropic cell fluctuations, we introduce a unit box matrix $\mathbf{h}_0$ related to $\mathbf{h}$ by $\mathbf{h} = V^{1/3} \mathbf{h}_0$ such that $\det(\mathbf{h}_0) = 1$. Focusing on the isothermal-isobaric ensemble, the changing cell shape under the influence of an isotropic applied pressure $P$ can be incorporated into the partition function by writing $\Delta(N, P, T)$ as

$$\Delta(N, P, T) = \frac{1}{V_0} \int_0^\infty dV \int d\mathbf{h}_0\; e^{-\beta P V}\, Q(N, V, \mathbf{h}_0, T)\, \delta\left(\det(\mathbf{h}_0) - 1\right), \tag{5.7.4}$$

where $\int d\mathbf{h}_0$ is an integral over all nine components of $\mathbf{h}_0$ and the $\delta$-function restricts the integration to unit box matrices satisfying $\det(\mathbf{h}_0) = 1$. In eqn. (5.7.4), the explicit dependence of the canonical partition function $Q$ on both the volume $V$ and the shape of the cell described by $\mathbf{h}_0$ is shown. Rather than integrate over $V$ and $\mathbf{h}_0$ with the constraint $\det(\mathbf{h}_0) = 1$, it is preferable to perform an unconstrained integration over $\mathbf{h}$. This can be accomplished by a change of variables from $\mathbf{h}_0$ to $\mathbf{h}$. Since each element of $\mathbf{h}_0$ is multiplied by $V^{1/3}$ to obtain $\mathbf{h}$, the integration measure, which is a nine-dimensional integration, transforms as $d\mathbf{h}_0 = V^{-3} d\mathbf{h}$. In addition, $\det(\mathbf{h}_0) = \det(\mathbf{h})/V$. Thus, substituting the cell-matrix transformation into eqn. (5.7.4) yields

$$\begin{aligned}
\Delta(N, P, T) &= \frac{1}{V_0} \int_0^\infty dV \int d\mathbf{h}\; V^{-3} e^{-\beta P V}\, Q(N, \mathbf{h}, T)\, \delta\left(\frac{1}{V} \det(\mathbf{h}) - 1\right) \\
&= \frac{1}{V_0} \int_0^\infty dV \int d\mathbf{h}\; V^{-3} e^{-\beta P V}\, Q(N, \mathbf{h}, T)\, V\, \delta\left(\det(\mathbf{h}) - V\right) \\
&= \frac{1}{V_0} \int_0^\infty dV \int d\mathbf{h}\; V^{-2} e^{-\beta P V}\, Q(N, \mathbf{h}, T)\, \delta\left(\det(\mathbf{h}) - V\right),
\end{aligned} \tag{5.7.5}$$

where the dependence of $Q$ on $V$ and $\mathbf{h}_0$ has been expressed as an equivalent dependence only on $\mathbf{h}$. Performing the integration over the volume using the $\delta$-function, we obtain for the partition function

$$\Delta(N, P, T) = \frac{1}{V_0} \int d\mathbf{h}\; [\det(\mathbf{h})]^{-2}\, e^{-\beta P \det(\mathbf{h})}\, Q(N, \mathbf{h}, T). \tag{5.7.6}$$

In an arbitrary number $d$ of spatial dimensions, the transformation is $\mathbf{h} = V^{1/d} \mathbf{h}_0$, and the partition function becomes

$$\Delta(N, P, T) = \frac{1}{V_0} \int d\mathbf{h}\; [\det(\mathbf{h})]^{1-d}\, e^{-\beta P \det(\mathbf{h})}\, Q(N, \mathbf{h}, T). \tag{5.7.7}$$

Before describing the generalization of the virial theorems of Section 5.5, we note that the internal pressure of a canonical ensemble with a fixed cell matrix $\mathbf{h}$ describing an anisotropic system cannot be described by a single scalar quantity as is possible for an isotropic system. Rather, a tensor is needed; this tensor is known as the pressure tensor, $\mathbf{P}^{(\mathrm{int})}$. Since the Helmholtz free energy $A = A(N, \mathbf{h}, T)$ depends on the full cell matrix, the pressure tensor, which is a $3 \times 3$ (or rank 2) tensor, has components given by

$$P^{(\mathrm{int})}_{\alpha\beta} = -\frac{1}{\det(\mathbf{h})} \sum_{\gamma=1}^{3} h_{\beta\gamma} \left(\frac{\partial A}{\partial h_{\alpha\gamma}}\right)_{N,T}, \tag{5.7.8}$$

which can be expressed in terms of the canonical partition function as

$$P^{(\mathrm{int})}_{\alpha\beta} = \frac{kT}{\det(\mathbf{h})} \sum_{\gamma=1}^{3} h_{\beta\gamma} \left(\frac{\partial \ln Q}{\partial h_{\alpha\gamma}}\right)_{N,T}. \tag{5.7.9}$$

In Section 5.8, an appropriate microscopic estimator for the pressure tensor will be derived.


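If we now consider the average of the pressure tensor in the isothermal-isobaric ensemble, a tensorial version of the virial theorem can be proved for an applied isotropic external pressure $P$. The average of the internal pressure tensor is

$$\begin{aligned}
\langle P^{(\mathrm{int})}_{\alpha\beta} \rangle &= \frac{1}{\Delta(N, P, T)} \int d\mathbf{h}\; [\det(\mathbf{h})]^{-2} e^{-\beta P \det(\mathbf{h})}\, Q(N, \mathbf{h}, T)\, \frac{kT}{\det(\mathbf{h})} \sum_{\gamma=1}^{3} h_{\beta\gamma} \left(\frac{\partial \ln Q}{\partial h_{\alpha\gamma}}\right)_{N,T} \\
&= \frac{1}{\Delta(N, P, T)} \int d\mathbf{h}\; [\det(\mathbf{h})]^{-2} e^{-\beta P \det(\mathbf{h})}\, \frac{kT}{\det(\mathbf{h})} \sum_{\gamma=1}^{3} h_{\beta\gamma} \left(\frac{\partial Q}{\partial h_{\alpha\gamma}}\right)_{N,T}.
\end{aligned} \tag{5.7.10}$$

An integration by parts can be performed as was done in Section 5.5, and, recognizing that the boundary term vanishes, we obtain

$$\begin{aligned}
\langle P^{(\mathrm{int})}_{\alpha\beta} \rangle &= -\frac{kT}{\Delta(N, P, T)} \int d\mathbf{h} \sum_{\gamma=1}^{3} \frac{\partial}{\partial h_{\alpha\gamma}} \left\{ \frac{1}{\det(\mathbf{h})} [\det(\mathbf{h})]^{-2} e^{-\beta P \det(\mathbf{h})}\, h_{\beta\gamma} \right\} Q(N, \mathbf{h}, T) \\
&= -\frac{kT}{\Delta(N, P, T)} \int d\mathbf{h} \sum_{\gamma=1}^{3} \left\{ -3 [\det(\mathbf{h})]^{-4} \frac{\partial \det(\mathbf{h})}{\partial h_{\alpha\gamma}} h_{\beta\gamma} - \beta P [\det(\mathbf{h})]^{-3} \frac{\partial \det(\mathbf{h})}{\partial h_{\alpha\gamma}} h_{\beta\gamma} \right. \\
&\qquad\qquad \left. + [\det(\mathbf{h})]^{-3} \frac{\partial h_{\beta\gamma}}{\partial h_{\alpha\gamma}} \right\} e^{-\beta P \det(\mathbf{h})}\, Q(N, \mathbf{h}, T).
\end{aligned} \tag{5.7.11}$$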

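In order to proceed, we need to know how to calculate the derivative of the determinant of a matrix with respect to one of its elements. The determinant of a matrix $\mathbf{M}$ can be written as $\det(\mathbf{M}) = \exp[\mathrm{Tr} \ln(\mathbf{M})]$. Taking the derivative of this expression with respect to an element $M_{ij}$, we obtain

$$\begin{aligned}
\frac{\partial [\det(\mathbf{M})]}{\partial M_{ij}} &= e^{\mathrm{Tr} \ln(\mathbf{M})}\, \mathrm{Tr}\left[\mathbf{M}^{-1} \frac{\partial \mathbf{M}}{\partial M_{ij}}\right] \\
&= \det(\mathbf{M}) \sum_{k,l} M^{-1}_{kl} \frac{\partial M_{lk}}{\partial M_{ij}},
\end{aligned} \tag{5.7.12}$$

where the trace has been written out explicitly. The derivative $\partial M_{lk}/\partial M_{ij} = \delta_{il} \delta_{kj}$. Thus, performing the sums over $k$ and $l$ leaves

$$\frac{\partial [\det(\mathbf{M})]}{\partial M_{ij}} = \det(\mathbf{M})\, M^{-1}_{ji}. \tag{5.7.13}$$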
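Equation (5.7.13) is simple to verify numerically for a $3 \times 3$ matrix. The sketch below is a self-contained check with hypothetical helper names: it computes the inverse from the adjugate, perturbs one matrix element at a time, and compares the finite-difference derivative of the determinant with $\det(\mathbf{M}) M^{-1}_{ji}$. Because the determinant is linear in each individual element, the central difference is exact up to roundoff.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def cofactor(m, i, j):
    """Signed minor of element (i, j)."""
    rows = [r for r in range(3) if r != i]
    cols = [c for c in range(3) if c != j]
    minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
             - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
    return (-1) ** (i + j) * minor


def inverse3(m):
    """Inverse via the adjugate: (M^-1)_ij = cofactor_ji / det(M)."""
    d = det3(m)
    return [[cofactor(m, j, i) / d for j in range(3)] for i in range(3)]


def det_derivative_numeric(m, i, j, eps=1e-6):
    """Central finite difference of det(M) with respect to element M_ij."""
    plus = [row[:] for row in m]
    minus = [row[:] for row in m]
    plus[i][j] += eps
    minus[i][j] -= eps
    return (det3(plus) - det3(minus)) / (2.0 * eps)


def det_derivative_formula(m, i, j):
    """Right side of eqn. (5.7.13): det(M) * (M^-1)_ji."""
    return det3(m) * inverse3(m)[j][i]
```

Note the transposed index order on the inverse: the derivative with respect to $M_{ij}$ picks out element $(j, i)$ of $\mathbf{M}^{-1}$, which matters for the nonsymmetric cell matrices used in this section.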

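Applying eqn. (5.7.13) to eqn. (5.7.11) gives $\sum_\gamma h_{\beta\gamma}\, \partial \det(\mathbf{h})/\partial h_{\alpha\gamma} = \det(\mathbf{h}) \sum_\gamma h_{\beta\gamma} h^{-1}_{\gamma\alpha} = \det(\mathbf{h})\, \delta_{\alpha\beta}$. Using, in addition, the fact that $\sum_\gamma \partial h_{\beta\gamma}/\partial h_{\alpha\gamma} = \sum_\gamma \delta_{\beta\alpha} \delta_{\gamma\gamma} = 3\delta_{\beta\alpha}$, it can be seen that the first and last terms in the curly brackets of eqn. (5.7.11) cancel, leaving

$$\langle P^{(\mathrm{int})}_{\alpha\beta} \rangle = \frac{kT}{\Delta(N, P, T)} \int d\mathbf{h}\; \beta P\, \delta_{\alpha\beta}\, [\det(\mathbf{h})]^{-2}\, e^{-\beta P \det(\mathbf{h})}\, Q(N, \mathbf{h}, T) = P\, \delta_{\alpha\beta}, \tag{5.7.14}$$

which states that, on the average, the pressure tensor should be diagonal with each diagonal element equal to the external applied pressure $P$. This is the generalization of the pressure virial theorem of eqn. (5.5.4). In a similar manner, the generalization of the work virial in eqn. (5.5.8) can be shown to be

$$\langle P^{(\mathrm{int})}_{\alpha\beta} \det(\mathbf{h}) \rangle + kT\, \delta_{\alpha\beta} = P \langle \det(\mathbf{h}) \rangle\, \delta_{\alpha\beta}, \tag{5.7.15}$$

according to which the average $\langle P^{(\mathrm{int})}_{\alpha\beta} \det(\mathbf{h}) \rangle$ is diagonal.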

5.8 Derivation of the pressure tensor estimator from the canonical partition function

Molecular dynamics calculations in isobaric ensembles require explicit microscopic estimators for the pressure. In Section 4.7.3, we derived an estimator for the isotropic internal pressure (see eqn. (4.7.57)). In this section, we generalize the derivation and obtain an estimator for the pressure tensor. For readers wishing to skip over the mathematical details of this derivation, we present the final result:

$$P^{(\mathrm{int})}_{\alpha\beta}(\mathbf{r}, \mathbf{p}) = \frac{1}{\det(\mathbf{h})} \sum_{i=1}^{N} \left[ \frac{(\mathbf{p}_i \cdot \hat{\mathbf{e}}_\alpha)(\mathbf{p}_i \cdot \hat{\mathbf{e}}_\beta)}{m_i} + (\mathbf{F}_i \cdot \hat{\mathbf{e}}_\alpha)(\mathbf{r}_i \cdot \hat{\mathbf{e}}_\beta) \right], \tag{5.8.1}$$

where $\hat{\mathbf{e}}_\alpha$ and $\hat{\mathbf{e}}_\beta$ are unit vectors along the $\alpha$ and $\beta$ spatial directions, respectively. Thus, $(\mathbf{p}_i \cdot \hat{\mathbf{e}}_\alpha)$ is just the $\alpha$th component of the momentum vector $\mathbf{p}_i$, with $\alpha = x, y, z$. The internal pressure tensor $P^{(\mathrm{int})}_{\alpha\beta}$ at fixed $\mathbf{h}$ is simply a canonical ensemble average of the estimator in eqn. (5.8.1).

The derivation of the pressure tensor requires a transformation from the primitive Cartesian variables $\mathbf{r}_1, ..., \mathbf{r}_N, \mathbf{p}_1, ..., \mathbf{p}_N$ to scaled variables, as was done in Section 4.7.3 for the isotropic pressure estimator. In order to make the dependence of the Hamiltonian and the partition function on the box matrix $\mathbf{h}$ explicit, we introduce scaled variables $\mathbf{s}_1, ..., \mathbf{s}_N$ related to the primitive Cartesian positions by

$$\mathbf{r}_i = \mathbf{h} \mathbf{s}_i. \tag{5.8.2}$$

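The right side of eqn. (5.8.2) is a matrix-vector product, which, in component form, appears as

$$
\mathbf{r}_i \cdot \hat{\mathbf{e}}_\alpha = \sum_\beta h_{\alpha\beta} \left( \mathbf{s}_i \cdot \hat{\mathbf{e}}_\beta \right), \tag{5.8.3}
$$

or in more compact notation,

$$
r_{i,\alpha} = \sum_\beta h_{\alpha\beta}\, s_{i,\beta}, \tag{5.8.4}
$$

where $r_{i,\alpha} = \mathbf{r}_i \cdot \hat{\mathbf{e}}_\alpha$ and $s_{i,\beta} = \mathbf{s}_i \cdot \hat{\mathbf{e}}_\beta$.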
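As a concrete illustration of eqns. (5.8.1) and (5.8.4), the following minimal sketch builds Cartesian positions from scaled coordinates and evaluates the instantaneous pressure tensor. The cell matrix, particle data, and function names (`mat_vec`, `det3`, `pressure_tensor`) are hypothetical choices made only for this example, and the forces are simply supplied as numbers rather than derived from a potential.

```python
def mat_vec(h, s):
    """Return r = h s for a 3x3 matrix h and a 3-vector s, eqn. (5.8.4)."""
    return [sum(h[a][b] * s[b] for b in range(3)) for a in range(3)]

def det3(h):
    """Determinant of a 3x3 matrix, i.e. the cell volume det(h)."""
    return (h[0][0] * (h[1][1] * h[2][2] - h[1][2] * h[2][1])
            - h[0][1] * (h[1][0] * h[2][2] - h[1][2] * h[2][0])
            + h[0][2] * (h[1][0] * h[2][1] - h[1][1] * h[2][0]))

def pressure_tensor(h, positions, momenta, forces, masses):
    """Instantaneous estimator of eqn. (5.8.1): kinetic plus virial part."""
    vol = det3(h)
    P = [[0.0] * 3 for _ in range(3)]
    for r, p, F, m in zip(positions, momenta, forces, masses):
        for a in range(3):
            for b in range(3):
                P[a][b] += p[a] * p[b] / m + F[a] * r[b]
    return [[P[a][b] / vol for b in range(3)] for a in range(3)]

# Illustrative (made-up) data: a sheared cell and two particles.
h = [[2.0, 0.5, 0.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 3.0]]
scaled = [[0.25, 0.50, 0.50], [0.75, 0.25, 0.10]]
positions = [mat_vec(h, s) for s in scaled]
momenta = [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0]]
forces = [[0.5, 0.0, 0.0], [-0.5, 0.0, 0.0]]
masses = [1.0, 2.0]

P_int = pressure_tensor(h, positions, momenta, forces, masses)
P_iso = sum(P_int[a][a] for a in range(3)) / 3.0  # one third of the trace
```

A single configuration generally gives a tensor that is neither diagonal nor symmetric; only the ensemble average is expected to satisfy eqn. (5.7.14).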


Not unexpectedly, the corresponding transformation for the momenta requires multiplication by the inverse box matrix $\mathbf{h}^{-1}$. However, since $\mathbf{h}$ and $\mathbf{h}^{-1}$ are not symmetric, should the matrix be multiplied on the right or on the left? The Lagrangian formulation of classical mechanics of Section 1.4 provides us with a direct route for answering this question. Recall that the Lagrangian is given by

$$
\mathcal{L}(\mathbf{r}, \dot{\mathbf{r}}) = \frac{1}{2} \sum_i m_i \dot{\mathbf{r}}_i^2 - U(\mathbf{r}_1, ..., \mathbf{r}_N). \tag{5.8.5}
$$

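The Lagrangian can be transformed into the scaled coordinates by substituting eqn. (5.8.4) and the corresponding velocity transformation

$$
\dot{r}_{i,\alpha} = \sum_\beta h_{\alpha\beta}\, \dot{s}_{i,\beta} \tag{5.8.6}
$$

into eqn. (5.8.5) to yield

$$
\begin{aligned}
\mathcal{L}(\mathbf{s}, \dot{\mathbf{s}}) &= \frac{1}{2} \sum_i m_i \sum_{\alpha,\beta,\gamma} h_{\alpha\beta}\, \dot{s}_{i,\beta}\, h_{\alpha\gamma}\, \dot{s}_{i,\gamma} - U(\mathbf{h}\mathbf{s}_1, ..., \mathbf{h}\mathbf{s}_N) \\
&= \frac{1}{2} \sum_i m_i \sum_{\alpha,\beta,\gamma} h_{\alpha\beta} h_{\alpha\gamma}\, \dot{s}_{i,\beta}\, \dot{s}_{i,\gamma} - U(\mathbf{h}\mathbf{s}_1, ..., \mathbf{h}\mathbf{s}_N).
\end{aligned} \tag{5.8.7}
$$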

A component of the momentum $\boldsymbol{\pi}_j$ conjugate to $\mathbf{s}_j$ is computed according to

$$
\pi_{j,\lambda} = \frac{\partial \mathcal{L}}{\partial \dot{s}_{j,\lambda}}. \tag{5.8.8}
$$

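The trickiest part of this derivative is keeping track of the indices. Since all of the indices in eqn. (5.8.7) are summed over or contracted, eqn. (5.8.7) contains many terms. The only terms that contribute to the momentum in eqn. (5.8.8) are those for which $i = j$ and $\beta = \lambda$ or $\gamma = \lambda$. The easiest way to handle the bookkeeping is to replace factors of $\dot{s}_{i,\beta}$ or $\dot{s}_{i,\gamma}$ with $\delta_{ij}\delta_{\beta\lambda}$ and $\delta_{ij}\delta_{\gamma\lambda}$, respectively, when computing the derivative, and then perform the sums with the aid of the Kronecker deltas:

$$
\begin{aligned}
\pi_{j,\lambda} &= \frac{1}{2} \sum_i m_i \sum_{\alpha,\beta,\gamma} h_{\alpha\beta} h_{\alpha\gamma} \left[ \delta_{ij}\delta_{\beta\lambda}\, \dot{s}_{i,\gamma} + \dot{s}_{i,\beta}\, \delta_{ij}\delta_{\gamma\lambda} \right] \\
&= \frac{1}{2} m_j \left[ \sum_{\alpha,\gamma} h_{\alpha\lambda} h_{\alpha\gamma}\, \dot{s}_{j,\gamma} + \sum_{\alpha,\beta} h_{\alpha\beta} h_{\alpha\lambda}\, \dot{s}_{j,\beta} \right].
\end{aligned} \tag{5.8.9}
$$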

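Since the two sums appearing in the last line of eqn. (5.8.9) are the same, the factor of 1/2 can be cancelled, yielding

$$
\pi_{j,\lambda} = m_j \sum_{\alpha,\gamma} h_{\alpha\lambda} h_{\alpha\gamma}\, \dot{s}_{j,\gamma} = m_j \sum_\alpha \dot{r}_{j,\alpha}\, h_{\alpha\lambda}. \tag{5.8.10}
$$

Writing this in vector notation, we find

$$
\boldsymbol{\pi}_j = m_j \dot{\mathbf{r}}_j \mathbf{h} = \mathbf{p}_j \mathbf{h} \tag{5.8.11}
$$

or

$$
\mathbf{p}_j = \boldsymbol{\pi}_j \mathbf{h}^{-1}. \tag{5.8.12}
$$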

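Thus, we see that $\boldsymbol{\pi}_j$ must be multiplied on the right by $\mathbf{h}^{-1}$. Having obtained the Lagrangian in scaled coordinates and the momentum transformation from the Lagrangian, we must now derive the Hamiltonian in order to determine the canonical partition function. The Hamiltonian is given by the Legendre transform rule:

$$
\mathcal{H} = \sum_i \boldsymbol{\pi}_i \cdot \dot{\mathbf{s}}_i - \mathcal{L} = \sum_i \sum_\alpha \pi_{i,\alpha}\, \dot{s}_{i,\alpha} - \mathcal{L}. \tag{5.8.13}
$$

Using the fact that $\dot{\mathbf{s}}_i = \mathbf{h}^{-1}\dot{\mathbf{r}}_i = \mathbf{h}^{-1}\mathbf{p}_i/m_i$ together with eqn. (5.8.12) to substitute $\mathbf{p}_i$ in terms of $\boldsymbol{\pi}_i$, the Hamiltonian becomes

$$
\mathcal{H} = \sum_i \frac{1}{m_i} \sum_{\alpha,\beta} \pi_{i,\alpha}\, h^{-1}_{\alpha\beta}\, p_{i,\beta} - \mathcal{L} = \sum_i \frac{1}{m_i} \sum_{\alpha,\beta,\gamma} \pi_{i,\alpha}\, h^{-1}_{\alpha\beta}\, \pi_{i,\gamma}\, h^{-1}_{\gamma\beta} - \mathcal{L}. \tag{5.8.14}
$$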

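Since the kinetic energy term in $\mathcal{L}$ is just 1/2 of the first term in eqn. (5.8.14), the Hamiltonian becomes

$$
\mathcal{H} = \sum_i \sum_{\alpha,\beta,\gamma} \frac{\pi_{i,\alpha}\, \pi_{i,\gamma}\, h^{-1}_{\alpha\beta}\, h^{-1}_{\gamma\beta}}{2m_i} + U(\mathbf{h}\mathbf{s}_1, ..., \mathbf{h}\mathbf{s}_N). \tag{5.8.15}
$$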
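The momentum transformation and the kinetic term of eqn. (5.8.15) can be checked numerically. The sketch below treats $\boldsymbol{\pi}$ and $\mathbf{p}$ as row vectors, applies eqns. (5.8.11) and (5.8.12), and compares the scaled-variable kinetic energy with $\mathbf{p}^2/2m$. The helper names and the numerical values are assumptions introduced only for this illustration.

```python
def cofactor(h, i, j):
    """Signed cofactor of a 3x3 matrix, using cyclic index arithmetic."""
    i1, i2, j1, j2 = (i + 1) % 3, (i + 2) % 3, (j + 1) % 3, (j + 2) % 3
    return h[i1][j1] * h[i2][j2] - h[i1][j2] * h[i2][j1]

def inverse3(h):
    """Inverse of a 3x3 matrix from the transposed cofactor matrix."""
    det = sum(h[0][j] * cofactor(h, 0, j) for j in range(3))
    return [[cofactor(h, j, i) / det for j in range(3)] for i in range(3)]

def row_times_matrix(v, M):
    """Row vector times matrix: (v M)_l = sum_a v_a M_al."""
    return [sum(v[a] * M[a][l] for a in range(3)) for l in range(3)]

def kinetic_scaled(pi, h_inv, m):
    """Kinetic term of eqn. (5.8.15) for a single particle."""
    total = 0.0
    for a in range(3):
        for b in range(3):
            for g in range(3):
                total += pi[a] * pi[g] * h_inv[a][b] * h_inv[g][b]
    return total / (2.0 * m)

# Illustrative (made-up) cell matrix, momentum, and mass.
h = [[2.0, 0.5, 0.0], [0.0, 2.0, 0.3], [0.0, 0.0, 3.0]]
p, m = [1.0, -2.0, 0.5], 1.5

pi = row_times_matrix(p, h)            # eqn. (5.8.11): pi = p h
h_inv = inverse3(h)
p_back = row_times_matrix(pi, h_inv)   # eqn. (5.8.12): p = pi h^{-1}

ke_cartesian = sum(x * x for x in p) / (2.0 * m)
ke_scaled = kinetic_scaled(pi, h_inv, m)
```

Within rounding error, `p_back` reproduces `p` and the two kinetic energies agree, as the derivation requires.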

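The pressure tensor in the canonical ensemble is given by

$$
\begin{aligned}
P^{(\mathrm{int})}_{\alpha\beta} &= \frac{kT}{\det(\mathbf{h})} \sum_\gamma h_{\beta\gamma}\, \frac{1}{Q(N,\mathbf{h},T)} \frac{\partial Q(N,\mathbf{h},T)}{\partial h_{\alpha\gamma}} \\
&= \frac{kT}{\det(\mathbf{h})} \frac{1}{Q(N,\mathbf{h},T)} \int d^N\boldsymbol{\pi}\, d^N\mathbf{s} \sum_\gamma h_{\beta\gamma} \left( -\beta \frac{\partial \mathcal{H}}{\partial h_{\alpha\gamma}} \right) e^{-\beta \mathcal{H}} \\
&= -\left\langle \frac{1}{\det(\mathbf{h})} \sum_\gamma h_{\beta\gamma} \frac{\partial \mathcal{H}}{\partial h_{\alpha\gamma}} \right\rangle.
\end{aligned} \tag{5.8.16}
$$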

Equation (5.8.16) requires the derivative of the Hamiltonian with respect to an arbitrary element of $\mathbf{h}$. This derivative must be obtained from eqn. (5.8.15), which requires more index bookkeeping. Let us first rewrite the Hamiltonian using a different set of summation indices:

$$
\mathcal{H} = \sum_i \sum_{\mu,\nu,\lambda} \frac{\pi_{i,\mu}\, \pi_{i,\nu}\, h^{-1}_{\mu\lambda}\, h^{-1}_{\nu\lambda}}{2m_i} + U(\mathbf{h}\mathbf{s}_1, ..., \mathbf{h}\mathbf{s}_N). \tag{5.8.17}
$$

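Computing the derivative with respect to $h_{\alpha\gamma}$, we obtain

$$
\frac{\partial \mathcal{H}}{\partial h_{\alpha\gamma}} = \sum_i \sum_{\mu,\nu,\lambda} \frac{\pi_{i,\mu}\, \pi_{i,\nu}}{2m_i} \left( \frac{\partial h^{-1}_{\mu\lambda}}{\partial h_{\alpha\gamma}}\, h^{-1}_{\nu\lambda} + h^{-1}_{\mu\lambda}\, \frac{\partial h^{-1}_{\nu\lambda}}{\partial h_{\alpha\gamma}} \right) + \frac{\partial}{\partial h_{\alpha\gamma}} U(\mathbf{h}\mathbf{s}_1, ..., \mathbf{h}\mathbf{s}_N). \tag{5.8.18}
$$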

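In order to proceed, we will derive an identity for the derivative of the inverse of a matrix $\mathbf{M}(\lambda)$ with respect to an arbitrary parameter $\lambda$. We start by differentiating the matrix identity

$$
\mathbf{M}(\lambda)\mathbf{M}^{-1}(\lambda) = \mathbf{I} \tag{5.8.19}
$$

with respect to $\lambda$ to obtain

$$
\frac{d\mathbf{M}}{d\lambda}\mathbf{M}^{-1} + \mathbf{M}\frac{d\mathbf{M}^{-1}}{d\lambda} = 0. \tag{5.8.20}
$$

Solving eqn. (5.8.20) for $d\mathbf{M}^{-1}/d\lambda$ yields

$$
\frac{d\mathbf{M}^{-1}}{d\lambda} = -\mathbf{M}^{-1}\frac{d\mathbf{M}}{d\lambda}\mathbf{M}^{-1}. \tag{5.8.21}
$$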
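The identity in eqn. (5.8.21) is easy to confirm with a finite-difference check. The sketch below uses an arbitrary, made-up 2×2 matrix function `M_of` (not taken from the text) and compares the analytical expression with a central-difference derivative of the inverse.

```python
import math

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def M_of(lam):
    """Arbitrary smooth, invertible test matrix M(lambda)."""
    return [[2.0 + lam, lam ** 2], [math.sin(lam), 3.0]]

def dM_of(lam):
    """Element-wise derivative dM/dlambda of the test matrix."""
    return [[1.0, 2.0 * lam], [math.cos(lam), 0.0]]

lam, eps = 0.7, 1e-6
M_inv = inv2(M_of(lam))

# Right side of eqn. (5.8.21): -M^{-1} (dM/dlambda) M^{-1}
product = matmul2(matmul2(M_inv, dM_of(lam)), M_inv)
analytic = [[-x for x in row] for row in product]

# Central finite difference of M^{-1}(lambda)
plus, minus = inv2(M_of(lam + eps)), inv2(M_of(lam - eps))
numeric = [[(plus[i][j] - minus[i][j]) / (2.0 * eps) for j in range(2)]
           for i in range(2)]

max_err = max(abs(analytic[i][j] - numeric[i][j])
              for i in range(2) for j in range(2))
```

The residual `max_err` is limited only by the finite-difference step and rounding error.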

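Applying eqn. (5.8.21) to eqn. (5.8.18), we obtain

$$
\begin{aligned}
\frac{\partial \mathcal{H}}{\partial h_{\alpha\gamma}} = &-\sum_i \sum_{\mu,\nu,\lambda} \frac{\pi_{i,\mu}\, \pi_{i,\nu}}{2m_i} \sum_{\rho,\sigma} \left( h^{-1}_{\mu\rho} \frac{\partial h_{\rho\sigma}}{\partial h_{\alpha\gamma}} h^{-1}_{\sigma\lambda} h^{-1}_{\nu\lambda} + h^{-1}_{\mu\lambda} h^{-1}_{\nu\rho} \frac{\partial h_{\rho\sigma}}{\partial h_{\alpha\gamma}} h^{-1}_{\sigma\lambda} \right) \\
&+ \frac{\partial}{\partial h_{\alpha\gamma}} U(\mathbf{h}\mathbf{s}_1, ..., \mathbf{h}\mathbf{s}_N).
\end{aligned} \tag{5.8.22}
$$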

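Using $\partial h_{\rho\sigma}/\partial h_{\alpha\gamma} = \delta_{\alpha\rho}\delta_{\sigma\gamma}$ and performing the sums over $\rho$ and $\sigma$, we find

$$
\begin{aligned}
\frac{\partial \mathcal{H}}{\partial h_{\alpha\gamma}} = &-\sum_i \sum_{\mu,\nu,\lambda} \frac{\pi_{i,\mu}\, \pi_{i,\nu}}{2m_i} \left( h^{-1}_{\mu\alpha} h^{-1}_{\gamma\lambda} h^{-1}_{\nu\lambda} + h^{-1}_{\mu\lambda} h^{-1}_{\nu\alpha} h^{-1}_{\gamma\lambda} \right) \\
&+ \frac{\partial}{\partial h_{\alpha\gamma}} U(\mathbf{h}\mathbf{s}_1, ..., \mathbf{h}\mathbf{s}_N).
\end{aligned} \tag{5.8.23}
$$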

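Since

$$
\begin{aligned}
\frac{\partial}{\partial h_{\alpha\gamma}} U(\mathbf{h}\mathbf{s}_1, ..., \mathbf{h}\mathbf{s}_N) &= \sum_i \sum_{\mu,\nu} \frac{\partial U}{\partial (\mathbf{h}\mathbf{s}_i)_\mu} \frac{\partial h_{\mu\nu}}{\partial h_{\alpha\gamma}}\, s_{i,\nu} \\
&= \sum_i \sum_{\mu,\nu} \frac{\partial U}{\partial (\mathbf{h}\mathbf{s}_i)_\mu}\, \delta_{\alpha\mu}\delta_{\gamma\nu}\, s_{i,\nu} \\
&= \sum_i \frac{\partial U}{\partial (\mathbf{h}\mathbf{s}_i)_\alpha}\, s_{i,\gamma},
\end{aligned} \tag{5.8.24}
$$

we arrive at the result

$$
\frac{\partial \mathcal{H}}{\partial h_{\alpha\gamma}} = -\sum_i \sum_{\mu,\nu,\lambda} \frac{\pi_{i,\mu}\, \pi_{i,\nu}}{2m_i} \left( h^{-1}_{\mu\alpha} h^{-1}_{\gamma\lambda} h^{-1}_{\nu\lambda} + h^{-1}_{\mu\lambda} h^{-1}_{\nu\alpha} h^{-1}_{\gamma\lambda} \right) + \sum_i \frac{\partial U}{\partial (\mathbf{h}\mathbf{s}_i)_\alpha}\, s_{i,\gamma}. \tag{5.8.25}
$$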

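To obtain the pressure tensor estimator, we must multiply by $h_{\beta\gamma}$ and sum over $\gamma$. When this is done and the sum over $\gamma$ is performed according to $\sum_\gamma h_{\beta\gamma} h^{-1}_{\gamma\lambda} = \delta_{\beta\lambda}$, then the sum over $\lambda$ can be performed as well, yielding

$$
\sum_\gamma h_{\beta\gamma} \frac{\partial \mathcal{H}}{\partial h_{\alpha\gamma}} = -\sum_i \sum_{\mu,\nu} \frac{\pi_{i,\mu}\, \pi_{i,\nu}}{2m_i} \left( h^{-1}_{\mu\alpha} h^{-1}_{\nu\beta} + h^{-1}_{\mu\beta} h^{-1}_{\nu\alpha} \right) + \sum_i \sum_\gamma \frac{\partial U}{\partial (\mathbf{h}\mathbf{s}_i)_\alpha}\, h_{\beta\gamma}\, s_{i,\gamma}. \tag{5.8.26}
$$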

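We now recognize that $\sum_\mu \pi_{i,\mu} h^{-1}_{\mu\alpha} = p_{i,\alpha}$, $\sum_\nu \pi_{i,\nu} h^{-1}_{\nu\beta} = p_{i,\beta}$, $\partial U/\partial(\mathbf{h}\mathbf{s}_i) = \partial U/\partial \mathbf{r}_i = -\mathbf{F}_i$, and $\sum_\gamma h_{\beta\gamma} s_{i,\gamma} = r_{i,\beta}$. Substituting these results into eqn. (5.8.26) and multiplying by $-1/\det(\mathbf{h})$ gives

$$
P^{(\mathrm{int})}_{\alpha\beta}(\mathbf{r}_1, ..., \mathbf{r}_N, \mathbf{p}_1, ..., \mathbf{p}_N) = \frac{1}{\det(\mathbf{h})} \sum_{i=1}^{N} \left[ \frac{p_{i,\alpha}\, p_{i,\beta}}{m_i} + F_{i,\alpha}\, r_{i,\beta} \right], \tag{5.8.27}
$$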

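which is equivalent to eqn. (5.8.1), thus completing the derivation. The isotropic pressure estimator for $P^{(\mathrm{int})}$ in eqn. (4.7.57) can be obtained directly from the pressure tensor estimator by tracing:

$$
\mathcal{P}^{(\mathrm{int})}(\mathbf{r},\mathbf{p}) = \frac{1}{3} \sum_\alpha P^{(\mathrm{int})}_{\alpha\alpha}(\mathbf{r},\mathbf{p}) = \frac{1}{3} \mathrm{Tr}\left[ \mathbf{P}^{(\mathrm{int})}(\mathbf{r},\mathbf{p}) \right], \tag{5.8.28}
$$

where $\mathbf{P}^{(\mathrm{int})}(\mathbf{r},\mathbf{p})$ is the tensorial representation of eqn. (5.8.27). Finally, note that if the potential has an explicit dependence on the cell matrix $\mathbf{h}$, then the estimator is modified to read

$$
P^{(\mathrm{int})}_{\alpha\beta}(\mathbf{r},\mathbf{p}) = \frac{1}{\det(\mathbf{h})} \sum_{i=1}^{N} \left[ \frac{(\mathbf{p}_i \cdot \hat{\mathbf{e}}_\alpha)(\mathbf{p}_i \cdot \hat{\mathbf{e}}_\beta)}{m_i} + (\mathbf{F}_i \cdot \hat{\mathbf{e}}_\alpha)(\mathbf{r}_i \cdot \hat{\mathbf{e}}_\beta) \right] - \frac{1}{\det(\mathbf{h})} \sum_{\gamma=1}^{3} \frac{\partial U}{\partial h_{\alpha\gamma}}\, h_{\gamma\beta}. \tag{5.8.29}
$$

5.9 Molecular dynamics in the isoenthalpic-isobaric ensemble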

The derivation of the isobaric ensembles requires that the volume be allowed to vary in order to keep the internal pressure equal, on average, to the applied external pressure. This suggests that if we wish to develop a molecular dynamics technique for generating isobaric ensembles, we could introduce the volume as an independent dynamical variable in the phase space. Indeed, the work-virial theorem of eqn. (5.5.8) strongly supports such a notion, since it effectively assigns an energy of kT to a “volume mode.” The idea of incorporating the volume into the phase space as an additional dynamical degree of freedom, together with its conjugate momentum, as a means of generating an isobaric ensemble was first introduced by Andersen (1980) and later generalized for anisotropic volume fluctuations by Parrinello and Rahman (1980). This idea inspired numerous other powerful techniques based on extended phase spaces, including


the canonical molecular dynamics methods from Chapter 4, the Car-Parrinello approach (Car and Parrinello, 1985; Tuckerman, 2002; Marx and Hutter, 2009) for performing molecular dynamics with forces obtained from “on the fly” electronic structure calculations, and schemes for including nuclear quantum effects in molecular dynamics (see Chapter 12). In this section, we present Andersen’s original method for the isoenthalpic-isobaric ensemble and then use this idea as the basis for a non-Hamiltonian isothermal-isobaric molecular dynamics approach in Section 5.10.

Andersen’s method is based on the remarkably simple yet very elegant idea that the scaling transformation used to derive the pressure,

$$
\mathbf{s}_i = V^{-1/3}\mathbf{r}_i, \qquad \boldsymbol{\pi}_i = V^{1/3}\mathbf{p}_i, \tag{5.9.1}
$$

is all we need to derive an isobaric molecular dynamics method. This transformation is used not only to make the volume dependence of the coordinates and momenta explicit but also to promote the volume to a dynamical variable. Moreover, it leads to a force that is used to propagate the volume. In order to make the volume dynamical, we need to define a momentum $p_V$ conjugate to the volume and introduce a kinetic energy term $p_V^2/2W$ into the Hamiltonian. Here, $W$ is a mass-like parameter that determines the time scale of volume motion. Since we already know that the instantaneous pressure estimator is $-\partial \mathcal{H}/\partial V$, we seek a Hamiltonian and associated equations of motion that drive the volume according to the difference between the instantaneous pressure and the external applied pressure $P$. The Hamiltonian postulated by Andersen is obtained from the standard Hamiltonian for an $N$-particle system by substituting eqn. (5.9.1) for the coordinates and momenta into the Hamiltonian, adding the volume kinetic energy and an additional term $PV$ for the action of the imaginary “piston” driving the volume fluctuations. Andersen’s Hamiltonian is

$$
\mathcal{H}_A = \sum_{i=1}^{N} \frac{V^{-2/3}\boldsymbol{\pi}_i^2}{2m_i} + U(V^{1/3}\mathbf{s}_1, ..., V^{1/3}\mathbf{s}_N) + \frac{p_V^2}{2W} + PV. \tag{5.9.2}
$$

The parameter $W$ is determined by a relation similar to eqn. (4.11.2),

$$
W = (3N + 1)\,kT\,\tau_b^2, \tag{5.9.3}
$$

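where $\tau_b$ is a time scale for the volume motion. The factor of $3N + 1$ arises because the barostat scales all $N$ particles and the volume. Equation (5.9.2) is now used to derive equations of motion for generating the isoenthalpic-isobaric ensemble. Applying Hamilton’s equations, we obtain

$$
\begin{aligned}
\dot{\mathbf{s}}_i &= \frac{\partial \mathcal{H}_A}{\partial \boldsymbol{\pi}_i} = \frac{V^{-2/3}\boldsymbol{\pi}_i}{m_i} \\
\dot{\boldsymbol{\pi}}_i &= -\frac{\partial \mathcal{H}_A}{\partial \mathbf{s}_i} = -\frac{\partial U}{\partial (V^{1/3}\mathbf{s}_i)}\, V^{1/3} \\
\dot{V} &= \frac{\partial \mathcal{H}_A}{\partial p_V} = \frac{p_V}{W} \\
\dot{p}_V &= -\frac{\partial \mathcal{H}_A}{\partial V} = \frac{1}{3} V^{-5/3} \sum_i \frac{\boldsymbol{\pi}_i^2}{m_i} - \frac{1}{3} V^{-2/3} \sum_i \frac{\partial U}{\partial (V^{1/3}\mathbf{s}_i)} \cdot \mathbf{s}_i - P.
\end{aligned} \tag{5.9.4}
$$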

These equations of motion could be integrated numerically using the techniques introduced in Section 3.10 to yield a trajectory in the scaled coordinates. However, it is not always convenient to work in these coordinates, as they do not correspond to the physical coordinates. Fortunately, eqns. (5.9.4) can be easily transformed back into the original Cartesian coordinates by inverting the transformation as follows:

$$
\begin{aligned}
\mathbf{s}_i &= V^{-1/3}\mathbf{r}_i \\
\dot{\mathbf{s}}_i &= V^{-1/3}\dot{\mathbf{r}}_i - \frac{1}{3} V^{-4/3}\dot{V}\mathbf{r}_i \\
\boldsymbol{\pi}_i &= V^{1/3}\mathbf{p}_i \\
\dot{\boldsymbol{\pi}}_i &= V^{1/3}\dot{\mathbf{p}}_i + \frac{1}{3} V^{-2/3}\dot{V}\mathbf{p}_i.
\end{aligned} \tag{5.9.5}
$$

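Substituting eqns. (5.9.5) into eqns. (5.9.4) yields

$$
\begin{aligned}
\dot{\mathbf{r}}_i &= \frac{\mathbf{p}_i}{m_i} + \frac{1}{3}\frac{\dot{V}}{V}\mathbf{r}_i \\
\dot{\mathbf{p}}_i &= -\frac{\partial U}{\partial \mathbf{r}_i} - \frac{1}{3}\frac{\dot{V}}{V}\mathbf{p}_i \\
\dot{V} &= \frac{p_V}{W} \\
\dot{p}_V &= \frac{1}{3V} \sum_i \left[ \frac{\mathbf{p}_i^2}{m_i} - \frac{\partial U}{\partial \mathbf{r}_i} \cdot \mathbf{r}_i \right] - P.
\end{aligned} \tag{5.9.6}
$$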

Note that the right side of the equation of motion for $p_V$ is simply the difference between the instantaneous pressure estimator of eqn. (4.7.57) or (4.7.58) and the external pressure $P$. Although eqns. (5.9.6) cannot be derived from a Hamiltonian, they nevertheless possess the important conservation law

$$
\begin{aligned}
\mathcal{H}' &= \sum_{i=1}^{N} \frac{\mathbf{p}_i^2}{2m_i} + U(\mathbf{r}_1, ..., \mathbf{r}_N) + \frac{p_V^2}{2W} + PV \\
&= \mathcal{H}(\mathbf{r},\mathbf{p}) + \frac{p_V^2}{2W} + PV,
\end{aligned} \tag{5.9.7}
$$

and they are incompressible. Here, $\mathcal{H}$ is the physical Hamiltonian of the system. Equations (5.9.6) therefore generate a partition function of the form

$$
\Omega_P = \int dp_V \int_0^\infty dV \int d^N\mathbf{p} \int_{D(V)} d^N\mathbf{r}\; \delta\left( \mathcal{H}(\mathbf{r},\mathbf{p}) + \frac{p_V^2}{2W} + PV - H \right) \tag{5.9.8}
$$

at a pressure $P$ (see footnote 2). Equation (5.9.8) is not precisely equivalent to the true isoenthalpic-isobaric partition function given in eqn. (5.3.3) because the conserved energy in eqn. (5.9.7) differs from the true enthalpy by $p_V^2/2W$. However, when the system is equipartitioned, then according to the classical virial theorem, $\langle p_V^2/W \rangle = kT$, and for $N$ very large, this constitutes only a small deviation from the true enthalpy. In fact, this $kT$ is related to the extra $kT$ appearing in the work-virial theorem of eqn. (5.5.8). If the fluctuations in $p_V^2/2W$ are small, then the instantaneous enthalpy $\mathcal{H}(\mathbf{r},\mathbf{p}) + PV$ is confined to a thin shell between $H$ and $H + \Delta$.

The isoenthalpic-isobaric ensemble is seldom employed in molecular dynamics calculations, since the most common experimental conditions are constant pressure and temperature. Nevertheless, eqns. (5.9.6) provide the foundation for molecular dynamics algorithms capable of generating an isothermal-isobaric ensemble, which we discuss next.
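The conservation law of eqn. (5.9.7) can be verified directly from the right-hand sides of eqns. (5.9.6) by applying the chain rule at an arbitrary phase-space point. The following sketch does this for an illustrative harmonic potential $U = (k/2)\sum_i \mathbf{r}_i^2$; the potential, the function names, and all numerical values are assumptions chosen only for the example.

```python
def andersen_rhs(r, p, V, pV, m, W, P_ext, k):
    """Right-hand sides of eqns. (5.9.6) for N particles in three dimensions.

    Assumes the illustrative potential U = (k/2) sum_i |r_i|^2,
    so that the force on particle i is F_i = -k r_i.
    """
    Vdot = pV / W
    scale = Vdot / (3.0 * V)
    force = [[-k * x for x in ri] for ri in r]
    rdot = [[pi[a] / mi + scale * ri[a] for a in range(3)]
            for ri, pi, mi in zip(r, p, m)]
    pdot = [[Fi[a] - scale * pi[a] for a in range(3)]
            for Fi, pi in zip(force, p)]
    # sum_i [p_i^2/m_i + F_i . r_i] divided by 3V is the instantaneous pressure
    virial = sum(sum(pi[a] ** 2 / mi + Fi[a] * ri[a] for a in range(3))
                 for ri, pi, Fi, mi in zip(r, p, force, m))
    pVdot = virial / (3.0 * V) - P_ext
    return rdot, pdot, Vdot, pVdot

def dHprime_dt(r, p, V, pV, m, W, P_ext, k):
    """Chain-rule time derivative of the conserved energy in eqn. (5.9.7)."""
    rdot, pdot, Vdot, pVdot = andersen_rhs(r, p, V, pV, m, W, P_ext, k)
    total = 0.0
    for ri, pi, rd, pd, mi in zip(r, p, rdot, pdot, m):
        for a in range(3):
            total += pi[a] * pd[a] / mi      # d(kinetic energy)/dt
            total += k * ri[a] * rd[a]       # dU/dt
    total += pV * pVdot / W + P_ext * Vdot   # barostat kinetic term and PV
    return total

# Arbitrary phase-space point (made-up numbers).
r = [[1.0, 0.2, -0.3], [-0.5, 0.8, 0.1]]
p = [[0.3, -1.0, 0.4], [0.7, 0.2, -0.6]]
rate = dHprime_dt(r, p, V=10.0, pV=2.0, m=[1.0, 2.0], W=50.0, P_ext=0.5, k=1.3)
```

The value of `rate` vanishes to within rounding error for any choice of the phase-space point, which is the statement that $\mathcal{H}'$ is conserved.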

Footnote 2: If $\sum_i \mathbf{F}_i = -\sum_i \partial U/\partial \mathbf{r}_i = 0$, then an additional conservation law of the form $\mathbf{K} = \mathbf{P}\exp[(1/3)\ln V]$ exists, where $\mathbf{P} = \sum_i \mathbf{p}_i$, and the equations will not generate eqn. (5.9.8). Note that the equations of motion in scaled variables, eqns. (5.9.4), do not suffer from this pathology.

5.10 Molecular dynamics in the isothermal-isobaric ensemble I: Isotropic volume fluctuations

Since most condensed-phase experiments are carried out under the conditions of constant temperature and pressure (e.g. thermochemistry), the majority of isobaric molecular dynamics calculations are performed in the isothermal-isobaric ensemble. Because $N$, $P$, and $T$ are the control variables, we often refer to the $NPT$ ensemble for short. Calculations in the $NPT$ ensemble require one of the canonical methods of Chapter 4 to be grafted onto an isoenthalpic method in order to induce fluctuations in the enthalpy. In this section, we will develop molecular dynamics techniques for isotropic volume fluctuations under isothermal conditions. Following this, we will proceed to generalize the method for anisotropic cell fluctuations.

Although several algorithms have been proposed in the literature for generating an $NPT$ ensemble, they do not all give the correct ensemble distribution function (Martyna et al., 1994; Tuckerman et al., 2001). Therefore, we will restrict ourselves to the approach of Martyna, Tobias, and Klein (1994) (MTK), which has been proved to yield the correct volume distribution. The failure of other schemes is the subject of Problem 5.9.

The starting point for developing the MTK algorithm is eqns. (5.9.6). In order to avoid having to write $\dot{V}/3V$ repeatedly, we introduce, as a convenience, the variable $\epsilon = (1/3)\ln(V/V_0)$, where $V_0$ is the reference volume appearing in the isothermal-isobaric partition function of eqn. (5.3.22). A momentum $p_\epsilon$ corresponding to $\epsilon$ can be defined according to $\dot{\epsilon} = p_\epsilon/W = \dot{V}/3V$. Note that in $d$ dimensions, $\epsilon = (1/d)\ln(V/V_0)$ and $p_\epsilon/W = \dot{V}/dV$. In terms of these variables, eqns. (5.9.6) become, in $d$ dimensions,

$$
\begin{aligned}
\dot{\mathbf{r}}_i &= \frac{\mathbf{p}_i}{m_i} + \frac{p_\epsilon}{W}\mathbf{r}_i \\
\dot{\mathbf{p}}_i &= -\frac{\partial U}{\partial \mathbf{r}_i} - \frac{p_\epsilon}{W}\mathbf{p}_i \\
\dot{V} &= \frac{dV p_\epsilon}{W} \\
\dot{p}_\epsilon &= dV\left( \mathcal{P}^{(\mathrm{int})} - P \right),
\end{aligned} \tag{5.10.1}
$$

where $\mathcal{P}^{(\mathrm{int})}$ is the internal pressure estimator of eqn. (4.7.57) or eqn. (4.7.58). Although eqns. (5.10.1) are isobaric, they still lack a proper isothermal coupling and therefore, they do not generate an $NPT$ ensemble. However, we know from Section 4.11 that temperature control can be achieved by coupling eqns. (5.10.1) to a thermostat. Before we discuss the thermostat coupling, however, we need to analyze eqns. (5.10.1) in greater detail, for in introducing the “convenient” variables $\epsilon$ and $p_\epsilon$, we have transformed the incompressible equations (5.9.6) into compressible ones; the compressibility of eqns. (5.10.1) now leads to an incorrect volume dependence in the phase-space measure. Applying the rules of Section 4.10 for analyzing non-Hamiltonian systems, we find that the compressibility of eqns. (5.10.1) is

$$
\begin{aligned}
\kappa &= \sum_{i=1}^{N} \left[ \nabla_{\mathbf{r}_i} \cdot \dot{\mathbf{r}}_i + \nabla_{\mathbf{p}_i} \cdot \dot{\mathbf{p}}_i \right] + \frac{\partial \dot{V}}{\partial V} \\
&= dN\frac{p_\epsilon}{W} - dN\frac{p_\epsilon}{W} + d\frac{p_\epsilon}{W} \\
&= d\frac{p_\epsilon}{W} = \frac{\dot{V}}{V} = \frac{d}{dt}\ln\left( \frac{V}{V_0} \right).
\end{aligned} \tag{5.10.2}
$$

Thus, $w(\mathrm{x}) = \ln(V/V_0)$, and the phase-space metric becomes $\sqrt{g} = \exp(-w) = V_0/V$. The inverse volume dependence in the phase-space measure leads to an incorrect volume distribution. The origin of this problem is the volume dependence of the transformation leading to eqns. (5.10.1). We can make the compressibility vanish, however, by a minor modification of eqns. (5.10.1). All we need is to add a term that yields an extra $-d\,p_\epsilon/W$ in the compressibility. One way to proceed is to modify the momentum equation and add a term to the $p_\epsilon$ equation to ensure conservation of energy. If the momentum equation is modified to read

$$
\dot{\mathbf{p}}_i = \tilde{\mathbf{F}}_i - \left( 1 + \frac{d}{N_f} \right) \frac{p_\epsilon}{W}\mathbf{p}_i, \tag{5.10.3}
$$

where $N_f$ is the number of degrees of freedom ($dN - N_c$) with $N_c$ the number of constraints, then the compressibility $\kappa$ will be zero, as required for a proper isobaric ensemble. Here, $\tilde{\mathbf{F}}_i$ is the total force on atom $i$ including any forces of constraint. If $N_c = 0$, then $\tilde{\mathbf{F}}_i = \mathbf{F}_i = -\partial U/\partial \mathbf{r}_i$. In addition, if the $p_\epsilon$ equation is modified to read

$$
\dot{p}_\epsilon = dV\left( \mathcal{P}^{(\mathrm{int})} - P \right) + \frac{d}{N_f} \sum_{i=1}^{N} \frac{\mathbf{p}_i^2}{m_i}, \tag{5.10.4}
$$

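then eqns. (5.10.1), together with these two modifications, will conserve eqn. (5.9.7). Since eqns. (5.10.1), together with eqns. (5.10.3) and (5.10.4), possess the correct phase-space metric and conserved energy, they can now be coupled to a thermostat in order to generate a true isothermal-isobaric ensemble. Choosing the Nosé-Hoover chain approach of Section 4.11, we obtain the equations of motion

$$
\begin{aligned}
\dot{\mathbf{r}}_i &= \frac{\mathbf{p}_i}{m_i} + \frac{p_\epsilon}{W}\mathbf{r}_i \\
\dot{\mathbf{p}}_i &= \tilde{\mathbf{F}}_i - \left( 1 + \frac{d}{N_f} \right) \frac{p_\epsilon}{W}\mathbf{p}_i - \frac{p_{\eta_1}}{Q_1}\mathbf{p}_i \\
\dot{V} &= \frac{dV p_\epsilon}{W} \\
\dot{p}_\epsilon &= dV\left( \mathcal{P}^{(\mathrm{int})} - P \right) + \frac{d}{N_f} \sum_{i=1}^{N} \frac{\mathbf{p}_i^2}{m_i} - \frac{p_{\xi_1}}{Q'_1}\, p_\epsilon \\
\dot{\eta}_j &= \frac{p_{\eta_j}}{Q_j}, \qquad \dot{\xi}_j = \frac{p_{\xi_j}}{Q'_j} \\
\dot{p}_{\eta_j} &= G_j - \frac{p_{\eta_{j+1}}}{Q_{j+1}}\, p_{\eta_j} \\
\dot{p}_{\eta_M} &= G_M \\
\dot{p}_{\xi_j} &= G'_j - \frac{p_{\xi_{j+1}}}{Q'_{j+1}}\, p_{\xi_j} \\
\dot{p}_{\xi_M} &= G'_M,
\end{aligned} \tag{5.10.5}
$$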

where the $G_j$ are defined in eqn. (4.12.6). Note that eqns. (5.10.5) possess two Nosé-Hoover chains. One is coupled to the particles and the other to the volume. The reason for this seemingly baroque scheme is that the particle positions and momenta move on a considerably faster time scale than the volume. Thus, for practical applications, they need to be thermalized independently. The volume thermostat forces $G'_j$ are defined in a manner analogous to the particle thermostat forces:

$$
\begin{aligned}
G'_1 &= \frac{p_\epsilon^2}{W} - kT \\
G'_j &= \frac{p_{\xi_{j-1}}^2}{Q'_{j-1}} - kT.
\end{aligned} \tag{5.10.6}
$$

Equations (5.10.5) are the MTK equations, which have the conserved energy

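$$
\begin{aligned}
\mathcal{H}' = &\sum_{i=1}^{N} \frac{\mathbf{p}_i^2}{2m_i} + U(\mathbf{r}_1, ..., \mathbf{r}_N) + \frac{p_\epsilon^2}{2W} + PV \\
&+ \sum_{j=1}^{M} \left[ \frac{p_{\eta_j}^2}{2Q_j} + \frac{p_{\xi_j}^2}{2Q'_j} + kT\xi_j \right] + N_f kT\eta_1 + kT \sum_{j=2}^{M} \eta_j.
\end{aligned} \tag{5.10.7}
$$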

The metric factor associated with these equations is $\sqrt{g} = \exp(dN\eta_1 + \eta_2 + \cdots + \eta_M + \xi_1 + \cdots + \xi_M)$. With this metric and eqn. (5.10.7), it is straightforward to prove, using the techniques of Section 4.10, that these equations do, indeed, generate the correct isothermal-isobaric phase-space distribution (see Problem 5.7). Moreover, they can be modified to include a thermostat on each particle or on each degree of freedom (“massive” thermostatting), as discussed in Section 4.11.

To illustrate the use of eqns. (5.10.5), consider the simple example of a particle of mass $m$ moving in a one-dimensional box of length $L$ subject to a periodic potential. Let $p$ and $q$ be the momentum and coordinate of the particle, respectively. The potential is given by

$$
U(q, L) = \frac{m\omega^2 L^2}{4\pi^2} \left[ 1 - \cos\left( \frac{2\pi q}{L} \right) \right], \tag{5.10.8}
$$

where $\omega$ is a parameter having units of inverse time. Such a potential could be used, for example, as a simple model for the motion of particles through a nanowire. We will use eqns. (5.10.5) to determine the position and box-length distributions for a given pressure $P$ and temperature $T$. These distributions are given by

$$
\begin{aligned}
P(q) &\propto \int_q^\infty dL\, \exp\left[ -\beta P L \right] \exp\left\{ -\beta \frac{m\omega^2 L^2}{4\pi^2} \left[ 1 - \cos\left( \frac{2\pi q}{L} \right) \right] \right\} \\
P(L) &\propto \exp\left[ -\beta P L \right] \int_0^L dq\, \exp\left\{ -\beta \frac{m\omega^2 L^2}{4\pi^2} \left[ 1 - \cos\left( \frac{2\pi q}{L} \right) \right] \right\} \\
&\propto L \exp\left[ -\beta P L \right] \int_0^1 ds\, \exp\left\{ -\beta \frac{m\omega^2 L^2}{4\pi^2} \left[ 1 - \cos\left( 2\pi s \right) \right] \right\},
\end{aligned} \tag{5.10.9}
$$

where the last line is obtained by introducing the scaled coordinate $s = q/L$. The one-dimensional integrals can be performed using a standard numerical quadrature scheme, yielding “analytical” distributions that can be compared to the simulated ones. The simulations are carried out using a numerical integration scheme that we will present in Section 5.13. Figure 5.3 shows the comparison for the specific case that $\omega = 1$, $m = 1$, $kT = 1$, and $P = 1$. The parameters of the simulation are: $W = 18$, $M = 4$, $Q_j = 1$, $Q'_j = 1$, and $\Delta t = 0.05$. It can be seen that the simulated and analytical distributions match extremely well, indicating that eqns. (5.10.5) generate the correct phase-space distribution.
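The quadrature mentioned above can be sketched in a few lines. The code below evaluates the last line of eqn. (5.10.9) for $\omega = m = kT = P = 1$ using composite Simpson and trapezoidal rules; the grid sizes, the cutoff `L_max`, and the function names are arbitrary illustrative choices, not those used to produce Fig. 5.3.

```python
import math

def inner_integral(L, beta=1.0, m=1.0, omega=1.0, n=200):
    """Composite Simpson estimate of the s-integral in eqn. (5.10.9)."""
    a = beta * m * omega ** 2 * L ** 2 / (4.0 * math.pi ** 2)

    def f(s):
        return math.exp(-a * (1.0 - math.cos(2.0 * math.pi * s)))

    h = 1.0 / n                      # n must be even for Simpson's rule
    total = f(0.0) + f(1.0)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0

def box_length_weight(L, beta=1.0, P=1.0):
    """Unnormalized P(L) from the last line of eqn. (5.10.9)."""
    return L * math.exp(-beta * P * L) * inner_integral(L, beta)

# Tabulate on a uniform grid and normalize with the trapezoidal rule.
L_max, n_grid = 30.0, 3000           # cutoff chosen so the tail is negligible
dL = L_max / n_grid
grid = [i * dL for i in range(n_grid + 1)]
weights = [box_length_weight(L) for L in grid]
norm = dL * (sum(weights) - 0.5 * (weights[0] + weights[-1]))
P_L = [w / norm for w in weights]

# Average box length implied by the tabulated distribution.
mean_L = dL * sum(L * pL for L, pL in zip(grid, P_L))
```

A histogram of box lengths from a simulation can be compared directly against the tabulated `P_L`; the position distribution $P(q)$ can be obtained in the same way from the first line of eqn. (5.10.9).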


Fig. 5.3 Position and box-length distributions for a particle moving in the one-dimensional potential of eqn. (5.10.8).

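5.11 Molecular dynamics in the isothermal-isobaric ensemble II: Anisotropic cell fluctuations

Suppose we wish to map out the space of stable crystal structures for a given substance. We can only do this within a molecular dynamics framework if we can sample different cell shapes. For this reason, the development of molecular dynamics approaches with a fully flexible cell or box is an extremely important problem. We have already laid the groundwork in the isotropic scheme developed above and in our derivation of the pressure tensor estimator in eqn. (5.8.1). The key modification we need here is that the nine components of the box matrix $\mathbf{h}$ must be treated as dynamical variables with nine corresponding momenta. Moreover, we must devise a set of equations of motion whose compressibility leads to the metric factor

$$
\sqrt{g} = \left[ \det(\mathbf{h}) \right]^{1-d} \exp\left( dN\eta_1 + \eta_c + d^2\xi_1 + \xi_c \right), \tag{5.11.1}
$$

where $\eta_c = \sum_{j=2}^{M} \eta_j$ and $\xi_c = \sum_{j=2}^{M} \xi_j$, as required by the partition function in eqn. (5.7.6). We begin by defining the 3×3 matrix of box momenta, denoted $\mathbf{p}_g$. The matrix $\mathbf{p}_g$ is analogous to $p_\epsilon$ in that we let $\mathbf{p}_g/W_g = \dot{\mathbf{h}}\mathbf{h}^{-1}$, where $W_g$ is the time-scale parameter analogous to $W$ in the isotropic case. Rather than repeat the full development presented for the isotropic case, here we will simply propose a set of equations of motion that represent a generalization of eqns. (5.10.5) for fully flexible cells and then prove that they generate the correct distribution. A proposed set of equations of motion is (Martyna et al., 1994)

$$
\begin{aligned}
\dot{\mathbf{r}}_i &= \frac{\mathbf{p}_i}{m_i} + \frac{\mathbf{p}_g}{W_g}\mathbf{r}_i \\
\dot{\mathbf{p}}_i &= \tilde{\mathbf{F}}_i - \frac{\mathbf{p}_g}{W_g}\mathbf{p}_i - \frac{1}{N_f}\frac{\mathrm{Tr}\left[ \mathbf{p}_g \right]}{W_g}\mathbf{p}_i - \frac{p_{\eta_1}}{Q_1}\mathbf{p}_i \\
\dot{\mathbf{h}} &= \frac{\mathbf{p}_g\mathbf{h}}{W_g} \\
\dot{\mathbf{p}}_g &= \det[\mathbf{h}]\left( \mathbf{P}^{(\mathrm{int})} - \mathbf{I}P \right) + \frac{1}{N_f} \sum_{i=1}^{N} \frac{\mathbf{p}_i^2}{m_i}\mathbf{I} - \frac{p_{\xi_1}}{Q'_1}\mathbf{p}_g \\
\dot{\eta}_j &= \frac{p_{\eta_j}}{Q_j}, \qquad \dot{\xi}_j = \frac{p_{\xi_j}}{Q'_j} \\
\dot{p}_{\eta_j} &= G_j - \frac{p_{\eta_{j+1}}}{Q_{j+1}}\, p_{\eta_j} \\
\dot{p}_{\eta_M} &= G_M \\
\dot{p}_{\xi_j} &= G'_j - \frac{p_{\xi_{j+1}}}{Q'_{j+1}}\, p_{\xi_j} \\
\dot{p}_{\xi_M} &= G'_M,
\end{aligned} \tag{5.11.2}
$$

where $\mathbf{P}^{(\mathrm{int})}$ is the internal pressure tensor, whose components are given by eqn. (5.8.1) or (5.8.29), $\mathbf{I}$ is the 3×3 identity matrix, the thermostat forces $G_j$ are given by eqns. (4.12.6), and

$$
\begin{aligned}
G'_1 &= \frac{\mathrm{Tr}\left[ \mathbf{p}_g^{\mathrm{T}}\mathbf{p}_g \right]}{W_g} - d^2 kT \\
G'_j &= \frac{p_{\xi_{j-1}}^2}{Q'_{j-1}} - kT.
\end{aligned} \tag{5.11.3}
$$

The matrix $\mathbf{p}_g^{\mathrm{T}}$ is the transpose of $\mathbf{p}_g$. Equations (5.11.2) have the conserved energy

$$
\begin{aligned}
\mathcal{H}' = &\sum_{i=1}^{N} \frac{\mathbf{p}_i^2}{2m_i} + U(\mathbf{r}_1, ..., \mathbf{r}_N) + \frac{\mathrm{Tr}\left[ \mathbf{p}_g^{\mathrm{T}}\mathbf{p}_g \right]}{2W_g} + P\det[\mathbf{h}] \\
&+ \sum_{j=1}^{M} \left[ \frac{p_{\eta_j}^2}{2Q_j} + \frac{p_{\xi_j}^2}{2Q'_j} \right] + N_f kT\eta_1 + d^2 kT\xi_1 + kT\left( \eta_c + \xi_c \right).
\end{aligned} \tag{5.11.4}
$$

Furthermore, if $\sum_i \tilde{\mathbf{F}}_i = 0$, i.e., there are no external forces on the system, then when a global thermostat is used on the particles, there is an additional vector conservation law of the form

$$
\mathbf{K} = \mathbf{h}\mathbf{P} \left\{ \det[\mathbf{h}] \right\}^{1/N_f} e^{\eta_1}, \tag{5.11.5}
$$

where $\mathbf{P} = \sum_i \mathbf{p}_i$ is the center-of-mass momentum.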


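We will now proceed to show that eqns. (5.11.2) generate the ensemble described by eqn. (5.7.7). For the purpose of this analysis, we will assume that there are no constraints on the system, so that $N_f = dN$, and that $\sum_i \mathbf{F}_i \neq 0$. The slightly more complex case that arises when $\sum_i \mathbf{F}_i = 0$ will be left for the reader to ponder in Problem 5.8. We start by calculating the compressibility of eqns. (5.11.2). Since the matrix multiplications give rise to a mixing among the components of the position and momentum vectors, it is useful to write the equations of motion for $\mathbf{r}_i$, $\mathbf{p}_i$, $\mathbf{h}$, and $\mathbf{p}_g$ explicitly in terms of their Cartesian components:

$$
\begin{aligned}
\dot{r}_{i,\alpha} &= \frac{p_{i,\alpha}}{m_i} + \sum_\beta \frac{p_{g,\alpha\beta}}{W_g}\, r_{i,\beta} \\
\dot{p}_{i,\alpha} &= F_{i,\alpha} - \sum_\beta \frac{p_{g,\alpha\beta}}{W_g}\, p_{i,\beta} - \frac{1}{dN}\frac{\mathrm{Tr}\left[ \mathbf{p}_g \right]}{W_g}\, p_{i,\alpha} - \frac{p_{\eta_1}}{Q_1}\, p_{i,\alpha} \\
\dot{h}_{\alpha\beta} &= \sum_\gamma \frac{p_{g,\alpha\gamma}\, h_{\gamma\beta}}{W_g} \\
\dot{p}_{g,\alpha\beta} &= \det(\mathbf{h})\left[ P^{(\mathrm{int})}_{\alpha\beta} - P\delta_{\alpha\beta} \right] + \frac{1}{dN} \sum_i \frac{\mathbf{p}_i^2}{m_i}\, \delta_{\alpha\beta} - \frac{p_{\xi_1}}{Q'_1}\, p_{g,\alpha\beta}.
\end{aligned} \tag{5.11.6}
$$

Now, the compressibility is given by

$$
\begin{aligned}
\kappa = &\sum_{i,\alpha} \left[ \frac{\partial \dot{r}_{i,\alpha}}{\partial r_{i,\alpha}} + \frac{\partial \dot{p}_{i,\alpha}}{\partial p_{i,\alpha}} \right] + \sum_{\alpha,\beta} \left[ \frac{\partial \dot{h}_{\alpha\beta}}{\partial h_{\alpha\beta}} + \frac{\partial \dot{p}_{g,\alpha\beta}}{\partial p_{g,\alpha\beta}} \right] \\
&+ \sum_{j=1}^{M} \left[ \frac{\partial \dot{\eta}_j}{\partial \eta_j} + \frac{\partial \dot{p}_{\eta_j}}{\partial p_{\eta_j}} + \frac{\partial \dot{\xi}_j}{\partial \xi_j} + \frac{\partial \dot{p}_{\xi_j}}{\partial p_{\xi_j}} \right].
\end{aligned} \tag{5.11.7}
$$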

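Carrying out the differentiation using eqns. (5.11.7) and (5.11.2), we find that

$$
\begin{aligned}
\kappa = \; &N \sum_{\alpha,\beta} \frac{p_{g,\alpha\beta}}{W_g}\, \delta_{\alpha\beta} - N \sum_{\alpha,\beta} \frac{p_{g,\alpha\beta}}{W_g}\, \delta_{\alpha\beta} - \frac{1}{dN}\frac{\mathrm{Tr}\left[ \mathbf{p}_g \right]}{W_g}\, dN \\
&- dN\frac{p_{\eta_1}}{Q_1} - \sum_{j=2}^{M} \frac{p_{\eta_j}}{Q_j} + d \sum_{\alpha,\beta} \frac{p_{g,\alpha\beta}}{W_g}\, \delta_{\alpha\beta} - d^2\frac{p_{\xi_1}}{Q'_1} - \sum_{j=2}^{M} \frac{p_{\xi_j}}{Q'_j} \\
= \; &-(1 - d)\frac{\mathrm{Tr}\left[ \mathbf{p}_g \right]}{W_g} - dN\frac{p_{\eta_1}}{Q_1} - d^2\frac{p_{\xi_1}}{Q'_1} - \sum_{j=2}^{M} \left[ \frac{p_{\eta_j}}{Q_j} + \frac{p_{\xi_j}}{Q'_j} \right].
\end{aligned} \tag{5.11.8}
$$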

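Since $\mathbf{p}_g/W_g = \dot{\mathbf{h}}\mathbf{h}^{-1}$,

$$
\frac{\mathrm{Tr}\left[ \mathbf{p}_g \right]}{W_g} = \mathrm{Tr}\left[ \dot{\mathbf{h}}\mathbf{h}^{-1} \right]. \tag{5.11.9}
$$

Using the identity $\det[\mathbf{h}] = \exp\left[ \mathrm{Tr}(\ln \mathbf{h}) \right]$, we have

$$
\begin{aligned}
\frac{d}{dt}\det[\mathbf{h}] &= e^{\mathrm{Tr}[\ln \mathbf{h}]}\, \mathrm{Tr}\left[ \dot{\mathbf{h}}\mathbf{h}^{-1} \right] = \det[\mathbf{h}]\, \mathrm{Tr}\left[ \dot{\mathbf{h}}\mathbf{h}^{-1} \right] \\
\mathrm{Tr}\left[ \dot{\mathbf{h}}\mathbf{h}^{-1} \right] &= \frac{1}{\det[\mathbf{h}]}\frac{d}{dt}\det[\mathbf{h}] = \frac{d}{dt}\ln\left[ \det(\mathbf{h}) \right].
\end{aligned} \tag{5.11.10}
$$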
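The identity in eqn. (5.11.10) can be confirmed numerically for a cell matrix that changes linearly in time. In the sketch below, the matrices `h0` and `hdot` are arbitrary made-up values, and the helper functions are written out explicitly so that no external library is assumed.

```python
import math

def cofactor(h, i, j):
    """Signed cofactor of a 3x3 matrix, using cyclic index arithmetic."""
    i1, i2, j1, j2 = (i + 1) % 3, (i + 2) % 3, (j + 1) % 3, (j + 2) % 3
    return h[i1][j1] * h[i2][j2] - h[i1][j2] * h[i2][j1]

def det3(h):
    """Determinant by expansion along the first row."""
    return sum(h[0][j] * cofactor(h, 0, j) for j in range(3))

def inverse3(h):
    """Inverse from the transposed cofactor matrix."""
    d = det3(h)
    return [[cofactor(h, j, i) / d for j in range(3)] for i in range(3)]

# Made-up cell matrix and its (constant) time derivative.
h0 = [[2.0, 0.3, 0.0], [0.1, 2.5, 0.2], [0.0, 0.4, 3.0]]
hdot = [[0.05, 0.01, 0.0], [0.0, -0.02, 0.03], [0.02, 0.0, 0.04]]

def h_at(t):
    """Cell matrix h(t) = h0 + t * hdot."""
    return [[h0[a][b] + t * hdot[a][b] for b in range(3)] for a in range(3)]

# Left side: Tr[hdot h^{-1}] evaluated at t = 0
h_inv = inverse3(h0)
trace = sum(hdot[a][b] * h_inv[b][a] for a in range(3) for b in range(3))

# Right side: d/dt ln det(h) by central finite difference at t = 0
eps = 1e-6
numeric = (math.log(det3(h_at(eps))) - math.log(det3(h_at(-eps)))) / (2.0 * eps)
```

The two quantities `trace` and `numeric` agree to within the finite-difference error.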

Thus, the compressibility becomes

$$
\kappa = -(1 - d)\frac{d}{dt}\ln\left[ \det(\mathbf{h}) \right] - dN\dot{\eta}_1 - d^2\dot{\xi}_1 - \left[ \dot{\eta}_c + \dot{\xi}_c \right], \tag{5.11.11}
$$

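which leads to the metric in eqn. (5.11.1). Assuming that eqn. (5.11.4) is the only conservation law, combining the metric in eqn. (5.11.1) with eqn. (5.11.4) and inserting these into eqn. (4.10.21) yields

$$
\begin{aligned}
Z = &\int d^N\mathbf{p}\, d^N\mathbf{r}\, d\mathbf{h}\, d\mathbf{p}_g\, d\eta_1\, d\eta_c\, d\xi_1\, d\xi_c\, d^M p_\eta\, d^M p_\xi\, \left[ \det(\mathbf{h}) \right]^{1-d} e^{dN\eta_1 + \eta_c}\, e^{d^2\xi_1 + \xi_c} \\
&\times \delta\left( \mathcal{H}(\mathbf{r},\mathbf{p}) + \sum_{j=1}^{M} \left[ \frac{p_{\eta_j}^2}{2Q_j} + \frac{p_{\xi_j}^2}{2Q'_j} \right] + N_f kT\eta_1 + d^2 kT\xi_1 + kT\left[ \eta_c + \xi_c \right] + \frac{\mathrm{Tr}\left[ \mathbf{p}_g^{\mathrm{T}}\mathbf{p}_g \right]}{2W_g} + P\det[\mathbf{h}] - H \right).
\end{aligned} \tag{5.11.12}
$$

If we now integrate over $\eta_1$ using the $\delta$-function, we find

$$
Z \propto \int d\mathbf{h}\, \left[ \det(\mathbf{h}) \right]^{1-d} e^{-\beta P \det(\mathbf{h})} \int d^N\mathbf{p}\, d^N\mathbf{r}\, e^{-\beta \mathcal{H}(\mathbf{r},\mathbf{p})}, \tag{5.11.13}
$$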

where the constant of proportionality includes uncoupled integrations over the remaining thermostat/barostat variables. Thus, the correct isothermal-isobaric partition function for fully flexible cells is recovered.

5.12 Atomic and molecular virials

The isotropic pressure estimator in eqn. (5.8.28) and pressure tensor estimator in eqn. (5.8.1) were derived assuming a scaling or matrix multiplication of all atomic positions. The resulting virial term in the estimator

$$
\sum_{i=1}^{N} \mathbf{r}_i \cdot \mathbf{F}_i
$$

is, therefore, known as an atomic virial. Although mathematically correct and physically sensible for purely atomic systems, the atomic virial might seem like overkill for molecular systems. In a collection of molecules, assuming no constraints, the force $\mathbf{F}_i$ appearing in the atomic virial contains both intramolecular and intermolecular components. If the size of the molecule is small compared to its container, it is more intuitive to think of the coordinate scaling (or multiplication by the cell matrix) as acting only on the centers of mass of the molecules rather than on each atom individually. That is, the scaling should only affect the relative positions of the molecules rather than the bond lengths and angles within each molecule. In fact, an alternative pressure estimator can be derived by scaling only the positions of the molecular centers of mass rather than individual atomic positions.

Consider a system of $N$ molecules with centers of mass at positions $\mathbf{R}_1, ..., \mathbf{R}_N$. For isotropic volume fluctuations, we would define the scaled coordinates $\mathbf{S}_1, ..., \mathbf{S}_N$ of the centers of mass by

$$
\mathbf{S}_i = V^{-1/d}\mathbf{R}_i. \tag{5.12.1}
$$

If each molecule has $n$ atoms with masses $m_{i,1}, ..., m_{i,n}$ and atomic positions $\mathbf{r}_{i,1}, ..., \mathbf{r}_{i,n}$, then the center-of-mass position is

$$
\mathbf{R}_i = \frac{\sum_{\alpha=1}^{n} m_{i,\alpha}\mathbf{r}_{i,\alpha}}{\sum_{\alpha=1}^{n} m_{i,\alpha}}. \tag{5.12.2}
$$

We saw in Section 1.11 that the center-of-mass motion of each molecule can be separated from internal motion relative to a body-fixed frame. Thus, if the derivation leading up to eqn. (4.7.57) is repeated using the transformation in eqn. (5.12.1), the following pressure estimator is obtained:

$$
\mathcal{P}_{\mathrm{mol}}(\mathbf{P}, \mathbf{R}) = \frac{1}{dV} \sum_{i=1}^{N} \left[ \frac{\mathbf{P}_i^2}{M_i} + \mathbf{R}_i \cdot \mathbf{F}_i \right], \tag{5.12.3}
$$

where $M_i$ is the mass of the $i$th molecule, $\mathbf{P}_i$ is the momentum of its center of mass,

$$
\mathbf{P}_i = \sum_{\alpha=1}^{n} \mathbf{p}_{i,\alpha}, \tag{5.12.4}
$$

and $\mathbf{F}_i$ is the force on the center of mass,

$$
\mathbf{F}_i = \sum_{\alpha=1}^{n} \mathbf{F}_{i,\alpha}. \tag{5.12.5}
$$

The virial term appearing in eqn. (5.12.3),

$$
\sum_{i=1}^{N} \mathbf{R}_i \cdot \mathbf{F}_i,
$$

is known as the molecular virial. Given the molecular virial, it is straightforward to derive a molecular dynamics algorithm for the isoenthalpic-isobaric ensemble that uses a molecular virial. The key feature of this algorithm is that the barostat coupling acts only on the center-of-mass positions and momenta. Assuming three spatial dimensions and no constraints between the molecules, the equations of motion take the form

$$
\begin{aligned}
\dot{\mathbf{r}}_{i,\alpha} &= \frac{\mathbf{p}_{i,\alpha}}{m_{i,\alpha}} + \frac{p_\epsilon}{W}\mathbf{R}_i \\
\dot{\mathbf{p}}_{i,\alpha} &= \mathbf{F}_{i,\alpha} - \left( 1 + \frac{1}{N} \right) \frac{p_\epsilon}{W}\frac{m_{i,\alpha}}{M_i}\mathbf{P}_i \\
\dot{V} &= \frac{dV p_\epsilon}{W} \\
\dot{p}_\epsilon &= dV\left( \mathcal{P}_{\mathrm{mol}} - P \right) + \frac{1}{N} \sum_{i=1}^{N} \frac{\mathbf{P}_i^2}{M_i}.
\end{aligned} \tag{5.12.6}
$$

These equations have the conserved energy

$$
\mathcal{H}' = \sum_{i,\alpha} \frac{\mathbf{p}_{i,\alpha}^2}{2m_{i,\alpha}} + U(\mathbf{r}) + \frac{p_\epsilon^2}{2W} + PV, \tag{5.12.7}
$$

where the $\mathbf{r}$ in $U(\mathbf{r})$ denotes the full set of atomic positions. The proof that these equations generate the correct isobaric-isoenthalpic ensemble is left as an exercise in Problem 5.10. These equations can easily be generalized for the isothermal-isobaric ensemble with a molecular virial by coupling Nosé-Hoover chain thermostats as in eqns. (5.10.5). Moreover, starting from the transformation for anisotropic cell fluctuations

$$
\mathbf{S}_i = \mathbf{h}^{-1}\mathbf{R}_i, \tag{5.12.8}
$$

the algorithm in eqn. (5.12.6) can be turned into an algorithm capable of handling anisotropic volume fluctuations with a molecular virial.
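The difference between the atomic and molecular estimators can be made concrete with a small sketch. The code below evaluates eqn. (5.12.3) from the center-of-mass quantities of eqns. (5.12.2), (5.12.4), and (5.12.5), alongside an atom-based estimator of the same form. The data layout (a list of molecules, each a list of `(mass, position, momentum, force)` tuples), the function names, and the single stretched diatomic used as input are all hypothetical.

```python
def molecular_pressure(molecules, V, d=3):
    """Estimator of eqn. (5.12.3), built from molecular centers of mass."""
    total = 0.0
    for atoms in molecules:
        M = sum(m for m, _, _, _ in atoms)
        # Center-of-mass position, momentum, and force: eqns. (5.12.2), (5.12.4), (5.12.5)
        R = [sum(m * r[a] for m, r, _, _ in atoms) / M for a in range(d)]
        P_cm = [sum(p[a] for _, _, p, _ in atoms) for a in range(d)]
        F_cm = [sum(F[a] for _, _, _, F in atoms) for a in range(d)]
        total += sum(P_cm[a] ** 2 / M + R[a] * F_cm[a] for a in range(d))
    return total / (d * V)

def atomic_pressure(molecules, V, d=3):
    """Atom-based estimator: the same form with every atom scaled individually."""
    total = 0.0
    for atoms in molecules:
        for m, r, p, F in atoms:
            total += sum(p[a] ** 2 / m + r[a] * F[a] for a in range(d))
    return total / (d * V)

# One diatomic molecule with equal and opposite bond forces (made-up numbers).
f_bond = 2.0
molecule = [
    (1.0, [0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [f_bond, 0.0, 0.0]),
    (1.0, [1.0, 0.0, 0.0], [-0.5, 0.0, 0.0], [-f_bond, 0.0, 0.0]),
]
P_mol = molecular_pressure([molecule], V=8.0)
P_atom = atomic_pressure([molecule], V=8.0)
```

For this configuration the intramolecular forces and the relative momenta cancel in the center-of-mass sums, so the instantaneous molecular estimator vanishes while the atomic one does not.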

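5.13 Integrating the Martyna-Tobias-Klein equations of motion

Integrating the MTK equations is only slightly more difficult than integrating the NHC equations and builds on the methodology we have already developed. We begin with the isotropic case, and for the present, we consider a system in which no constraints are imposed so that $N_f = dN$ and $\tilde{\mathbf{F}}_i = \mathbf{F}_i = -\partial U/\partial \mathbf{r}_i$. In Section 5.14, we will see how to account for forces of constraint. We first write the total Liouville operator as

$$
iL = iL_1 + iL_2 + iL_{\epsilon,1} + iL_{\epsilon,2} + iL_{\mathrm{NHC-baro}} + iL_{\mathrm{NHC-part}}, \tag{5.13.1}
$$

where

$$
\begin{aligned}
iL_1 &= \sum_{i=1}^{N} \left[ \frac{\mathbf{p}_i}{m_i} + \frac{p_\epsilon}{W}\mathbf{r}_i \right] \cdot \frac{\partial}{\partial \mathbf{r}_i} \\
iL_2 &= \sum_{i=1}^{N} \left[ \mathbf{F}_i - \alpha\frac{p_\epsilon}{W}\mathbf{p}_i \right] \cdot \frac{\partial}{\partial \mathbf{p}_i} \\
iL_{\epsilon,1} &= \frac{p_\epsilon}{W}\frac{\partial}{\partial \epsilon} \\
iL_{\epsilon,2} &= G_\epsilon\frac{\partial}{\partial p_\epsilon},
\end{aligned} \tag{5.13.2}
$$

and the operators $iL_{\mathrm{NHC-part}}$ and $iL_{\mathrm{NHC-baro}}$ are the particle and barostat Nosé-Hoover chain Liouville operators, respectively, which are defined in the last two lines of eqn. (4.12.5). In eqn. (5.13.2), $\alpha = 1 + d/N_f = 1 + 1/N$, and

$$
G_\epsilon = \alpha \sum_{i=1}^{N} \frac{\mathbf{p}_i^2}{m_i} + \sum_{i=1}^{N} \mathbf{r}_i \cdot \mathbf{F}_i - dV\frac{\partial U}{\partial V} - dPV. \tag{5.13.3}
$$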

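The propagator is factorized following the scheme of eqn. (4.12.8) as

$$
\begin{aligned}
\exp(iL\Delta t) = &\exp\left( iL_{\mathrm{NHC-baro}}\frac{\Delta t}{2} \right) \exp\left( iL_{\mathrm{NHC-part}}\frac{\Delta t}{2} \right) \\
&\times \exp\left( iL_{\epsilon,2}\frac{\Delta t}{2} \right) \exp\left( iL_2\frac{\Delta t}{2} \right) \\
&\times \exp\left( iL_{\epsilon,1}\Delta t \right) \exp\left( iL_1\Delta t \right) \\
&\times \exp\left( iL_2\frac{\Delta t}{2} \right) \exp\left( iL_{\epsilon,2}\frac{\Delta t}{2} \right) \\
&\times \exp\left( iL_{\mathrm{NHC-part}}\frac{\Delta t}{2} \right) \exp\left( iL_{\mathrm{NHC-baro}}\frac{\Delta t}{2} \right) + \mathcal{O}(\Delta t^3)
\end{aligned} \tag{5.13.4}
$$

(Tuckerman et al., 2006). In evaluating the action of this propagator, the Suzuki-Yoshida decomposition developed in eqns. (4.12.17) and (4.12.19) is applied to the operators $\exp(iL_{\mathrm{NHC-baro}}\Delta t/2)$ and $\exp(iL_{\mathrm{NHC-part}}\Delta t/2)$. The operators $\exp(iL_{\epsilon,1}\Delta t)$ and $\exp(iL_{\epsilon,2}\Delta t/2)$ are simple translation operators. The operators $\exp(iL_1\Delta t)$ and $\exp(iL_2\Delta t/2)$ are somewhat more complicated than their microcanonical or canonical ensemble counterparts due to the barostat coupling and need further explication. The action of the operator $\exp(iL_1\Delta t)$ can be determined by solving the first-order differential equation

$$
\dot{\mathbf{r}}_i = \mathbf{v}_i + v_\epsilon\mathbf{r}_i, \tag{5.13.5}
$$

keeping $\mathbf{v}_i = \mathbf{p}_i/m_i$ and $v_\epsilon = p_\epsilon/W$ constant with an arbitrary initial condition $\mathbf{r}_i(0)$ and then evaluating the solution at $t = \Delta t$. Note that $\mathbf{v}_i = \mathbf{p}_i/m_i$ must not be confused with the atomic velocity $\dot{\mathbf{r}}_i = \mathbf{v}_i + v_\epsilon\mathbf{r}_i$; it is introduced here for notational convenience to avoid having to write $\mathbf{p}_i/m_i$ explicitly everywhere. Solving eqn. (5.13.5) yields the finite-difference expression

$$
\mathbf{r}_i(\Delta t) = \mathbf{r}_i(0)\, e^{v_\epsilon\Delta t} + \Delta t\, \mathbf{v}_i\, e^{v_\epsilon\Delta t/2}\, \frac{\sinh(v_\epsilon\Delta t/2)}{v_\epsilon\Delta t/2}. \tag{5.13.6}
$$

Similarly, the action of $\exp(iL_2\Delta t/2)$ can be determined by solving the differential equation

$$
\dot{\mathbf{v}}_i = \frac{\mathbf{F}_i}{m_i} - \alpha v_\epsilon\mathbf{v}_i, \tag{5.13.7}
$$

keeping $\mathbf{F}_i$ and $v_\epsilon$ constant with an arbitrary initial condition $\mathbf{v}_i(0)$ and then evaluating the solution at $t = \Delta t/2$. This yields the evolution

$$
\mathbf{v}_i(\Delta t/2) = \mathbf{v}_i(0)\, e^{-\alpha v_\epsilon\Delta t/2} + \frac{\Delta t}{2m_i}\mathbf{F}_i\, e^{-\alpha v_\epsilon\Delta t/4}\, \frac{\sinh(\alpha v_\epsilon\Delta t/4)}{\alpha v_\epsilon\Delta t/4}. \tag{5.13.8}
$$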
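The barostat part of the factorization in eqn. (5.13.4) translates almost line by line into code. The following is a minimal sketch, assuming no thermostats (the two Nosé-Hoover chain operators are omitted), no constraints, and a potential without explicit volume dependence; the harmonic force, the parameter values, the switching threshold in `sinhx_over_x`, and all function names are illustrative assumptions. The series coefficients are those of eqn. (5.13.9).

```python
import math

# Coefficients a_0 ... a_10 of the power series for sinh(x)/x, eqn. (5.13.9).
SERIES = [1.0, 1.0 / 6, 1.0 / 120, 1.0 / 5040, 1.0 / 362880, 1.0 / 39916800]

def sinhx_over_x(x):
    """sinh(x)/x, using the power series for small |x|."""
    if abs(x) < 0.5:   # switching threshold is an arbitrary illustrative choice
        x2 = x * x
        return sum(c * x2 ** n for n, c in enumerate(SERIES))
    return math.sinh(x) / x

def g_eps(r, v, F, m, V, P_ext, alpha, d):
    """G_epsilon of eqn. (5.13.3), with no explicit V dependence in U."""
    kin = sum(mi * sum(x * x for x in vi) for vi, mi in zip(v, m))  # sum p^2/m
    vir = sum(sum(ri[a] * Fi[a] for a in range(d)) for ri, Fi in zip(r, F))
    return alpha * kin + vir - d * P_ext * V

def half_kick(v, F, m, v_eps, alpha, dt):
    """Action of exp(iL_2 dt/2) on v_i = p_i/m_i, eqn. (5.13.8)."""
    x = alpha * v_eps * dt / 4.0
    decay, poly = math.exp(-2.0 * x), math.exp(-x) * sinhx_over_x(x)
    return [[vi[a] * decay + 0.5 * dt * Fi[a] / mi * poly for a in range(len(vi))]
            for vi, Fi, mi in zip(v, F, m)]

def drift(r, v, v_eps, dt):
    """Action of exp(iL_1 dt) on the positions, eqn. (5.13.6)."""
    x = v_eps * dt / 2.0
    grow, poly = math.exp(2.0 * x), math.exp(x) * sinhx_over_x(x)
    return [[ri[a] * grow + dt * vi[a] * poly for a in range(len(ri))]
            for ri, vi in zip(r, v)]

def mtk_step(r, v, V, p_eps, m, W, P_ext, force, dt, d=3):
    """One step of the barostat part of eqn. (5.13.4), thermostats omitted."""
    alpha = 1.0 + 1.0 / len(r)            # alpha = 1 + d/N_f with N_f = dN
    F = force(r)
    p_eps += 0.5 * dt * g_eps(r, v, F, m, V, P_ext, alpha, d)
    v = half_kick(v, F, m, p_eps / W, alpha, dt)
    V *= math.exp(d * p_eps / W * dt)     # epsilon advances by v_eps * dt
    r = drift(r, v, p_eps / W, dt)
    F = force(r)
    v = half_kick(v, F, m, p_eps / W, alpha, dt)
    p_eps += 0.5 * dt * g_eps(r, v, F, m, V, P_ext, alpha, d)
    return r, v, V, p_eps

# Illustrative usage: one particle in a harmonic well (hypothetical parameters).
K_SPRING = 1.0

def harmonic(r):
    return [[-K_SPRING * x for x in ri] for ri in r]

def conserved(r, v, V, p_eps, m, W, P_ext):
    """Conserved energy: kinetic + potential + p_eps^2/2W + PV."""
    kin = 0.5 * sum(mi * sum(x * x for x in vi) for vi, mi in zip(v, m))
    pot = 0.5 * K_SPRING * sum(x * x for ri in r for x in ri)
    return kin + pot + p_eps ** 2 / (2.0 * W) + P_ext * V

state = ([[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]], 10.0, 0.0)
m, W, P_ext, dt = [1.0], 100.0, 0.1, 0.001
e0 = conserved(*state, m, W, P_ext)
for _ in range(1000):
    state = mtk_step(*state, m, W, P_ext, harmonic, dt)
energy_error = abs(conserved(*state, m, W, P_ext) - e0)
```

Monitoring `energy_error` over a trajectory is a convenient way to validate an implementation, since a time-reversible second-order scheme of this kind should show small, bounded fluctuations in the conserved energy for a sufficiently small time step.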

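In practice, the factor $\sinh(x)/x$ should be evaluated by a power series for small $x$ to avoid numerical instabilities (see footnote 3). Equations (5.13.4), (5.13.6) and (5.13.8), together with the Suzuki-Yoshida factorization of the thermostat operators, completely define an integrator for eqns. (5.10.5). The integrator can be easily coded using the direct translation technique.

Integrating eqns. (5.11.2) for the fully flexible case employs the same basic factorization scheme as in eqn. (5.13.4). First, we decompose the total Liouville operator as

$$
iL = iL_1 + iL_2 + iL_{g,1} + iL_{g,2} + iL_{\mathrm{NHC-baro}} + iL_{\mathrm{NHC-part}}, \tag{5.13.10}
$$

where

$$
\begin{aligned}
iL_1 &= \sum_{i=1}^{N} \left[ \frac{\mathbf{p}_i}{m_i} + \frac{\mathbf{p}_g}{W_g}\mathbf{r}_i \right] \cdot \frac{\partial}{\partial \mathbf{r}_i} \\
iL_2 &= \sum_{i=1}^{N} \left[ \mathbf{F}_i - \left( \frac{\mathbf{p}_g}{W_g} + \frac{1}{N_f}\frac{\mathrm{Tr}\left[ \mathbf{p}_g \right]}{W_g}\mathbf{I} \right) \mathbf{p}_i \right] \cdot \frac{\partial}{\partial \mathbf{p}_i} \\
iL_{g,1} &= \frac{\mathbf{p}_g\mathbf{h}}{W_g} \cdot \frac{\partial}{\partial \mathbf{h}} \\
iL_{g,2} &= \mathbf{G}_g \cdot \frac{\partial}{\partial \mathbf{p}_g},
\end{aligned} \tag{5.13.11}
$$

with

$$
\mathbf{G}_g = \det[\mathbf{h}]\left( \mathbf{P}^{(\mathrm{int})} - \mathbf{I}P \right) + \frac{1}{N_f} \sum_{i=1}^{N} \frac{\mathbf{p}_i^2}{m_i}\mathbf{I}. \tag{5.13.12}
$$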

The propagator is factorized exactly as in eqn. (5.13.4) with the contributions to iLǫ replaced by the contributions to iLg . In the flexible case, the application of the operators exp(iL1 ∆t) and exp(iL2 ∆t/2) requires solution of the following matrixvector equations: r˙ i = vi + vg ri (5.13.13)

v˙ i =

Fi − vg vi − bTr [vg ] vi , mi

(5.13.14)

where vg = pg /Wg , and b = 1/Nf . In order to solve eqn. (5.13.13), we introduce a transformation xi = Ori , (5.13.15) 3 The

power series expansion of sinh(x)/x up to tenth order is 5 X sinh(x) ≈ a2n x2n , x n=0

where a0 = 1, a2 = 1/6, a4 = 1/120, a6 = 1/5040, a8 = 1/362880, a10 = 1/39916800.

(5.13.9)


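where $\mathbf{O}$ is a constant orthogonal matrix. We also let $\mathbf{u}_i = \mathbf{O}\mathbf{v}_i$. Since $\mathbf{O}$ is orthogonal, it satisfies $\mathbf{O}^{\mathrm{T}}\mathbf{O} = \mathbf{I}$. Introducing this transformation into eqn. (5.13.13) yields

\[
\begin{aligned}
\mathbf{O}\dot{\mathbf{r}}_i &= \mathbf{O}\mathbf{v}_i + \mathbf{O}\mathbf{v}_g\mathbf{r}_i \\
\dot{\mathbf{x}}_i &= \mathbf{u}_i + \mathbf{O}\mathbf{v}_g\mathbf{O}^{\mathrm{T}}\mathbf{O}\mathbf{r}_i = \mathbf{u}_i + \mathbf{O}\mathbf{v}_g\mathbf{O}^{\mathrm{T}}\mathbf{x}_i,
\end{aligned}
\tag{5.13.16}
\]

where the second line follows from the orthogonality of $\mathbf{O}$. Now, since the pressure tensor is symmetric, $\mathbf{v}_g$ is also symmetric. Therefore, it is possible to choose $\mathbf{O}$ to be the orthogonal matrix that diagonalizes $\mathbf{v}_g$ according to

\[
\mathbf{v}_g^{(d)} = \mathbf{O}\mathbf{v}_g\mathbf{O}^{\mathrm{T}}, \tag{5.13.17}
\]

where $\mathbf{v}_g^{(d)}$ is a diagonal matrix with the eigenvalues of $\mathbf{v}_g$ on the diagonal. With this convention, the rows of $\mathbf{O}$ (the columns of $\mathbf{O}^{\mathrm{T}}$) are just the eigenvectors of $\mathbf{v}_g$. Let $\lambda_\alpha$, $\alpha = 1, 2, 3$, be the eigenvalues of $\mathbf{v}_g$. Since $\mathbf{v}_g$ is symmetric, its eigenvalues are real. In this representation, the three components of $\mathbf{x}_i$ are uncoupled in eqn. (5.13.16) and can be solved independently using eqn. (5.13.6). The solution at $t = \Delta t$ for each component of $\mathbf{x}_i$ is

\[
x_{i,\alpha}(\Delta t) = x_{i,\alpha}(0)\,e^{\lambda_\alpha\Delta t} + \Delta t\,u_{i,\alpha}\,e^{\lambda_\alpha\Delta t/2}\,\frac{\sinh(\lambda_\alpha\Delta t/2)}{\lambda_\alpha\Delta t/2}. \tag{5.13.18}
\]

Transforming back to $\mathbf{r}_i$, we find that

\[
\mathbf{r}_i(\Delta t) = \mathbf{O}^{\mathrm{T}}\mathbf{D}\mathbf{O}\,\mathbf{r}_i(0) + \Delta t\,\mathbf{O}^{\mathrm{T}}\tilde{\mathbf{D}}\mathbf{O}\,\mathbf{v}_i, \tag{5.13.19}
\]

where the matrices $\mathbf{D}$ and $\tilde{\mathbf{D}}$ have the elements

\[
\begin{aligned}
D_{\alpha\beta} &= e^{\lambda_\alpha\Delta t}\,\delta_{\alpha\beta} \\
\tilde{D}_{\alpha\beta} &= e^{\lambda_\alpha\Delta t/2}\,\frac{\sinh(\lambda_\alpha\Delta t/2)}{\lambda_\alpha\Delta t/2}\,\delta_{\alpha\beta}.
\end{aligned}
\tag{5.13.20}
\]

In a similar manner, eqn. (5.13.14) can be solved for $\mathbf{v}_i(t)$ and the solution evaluated at $t = \Delta t/2$ with the result

\[
\mathbf{v}_i(\Delta t/2) = \mathbf{O}^{\mathrm{T}}\boldsymbol{\Delta}\mathbf{O}\,\mathbf{v}_i(0) + \frac{\Delta t}{2m_i}\,\mathbf{O}^{\mathrm{T}}\tilde{\boldsymbol{\Delta}}\mathbf{O}\,\mathbf{F}_i, \tag{5.13.21}
\]

where the matrices $\boldsymbol{\Delta}$ and $\tilde{\boldsymbol{\Delta}}$ are given by their elements

\[
\begin{aligned}
\Delta_{\alpha\beta} &= e^{-(\lambda_\alpha + b\mathrm{Tr}[\mathbf{v}_g])\Delta t/2}\,\delta_{\alpha\beta} \\
\tilde{\Delta}_{\alpha\beta} &= e^{-(\lambda_\alpha + b\mathrm{Tr}[\mathbf{v}_g])\Delta t/4}\,\frac{\sinh[(\lambda_\alpha + b\mathrm{Tr}[\mathbf{v}_g])\Delta t/4]}{(\lambda_\alpha + b\mathrm{Tr}[\mathbf{v}_g])\Delta t/4}\,\delta_{\alpha\beta}.
\end{aligned}
\tag{5.13.22}
\]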
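As a rough illustration of the diagonalization route in eqns. (5.13.17) to (5.13.20), the position update of eqn. (5.13.19) might be coded with NumPy as follows. The function names, the `(N, 3)` array layout, and the small-argument cutoff are assumptions of this sketch. Note that `numpy.linalg.eigh` returns the eigenvectors as columns, so the matrix $\mathbf{O}$ of eqn. (5.13.17) is taken as the transpose of that output.

```python
import numpy as np

def sinhc(x):
    """Elementwise sinh(x)/x with a two-term series below an assumed cutoff of 1e-4."""
    x = np.asarray(x, dtype=float)
    small = np.abs(x) < 1e-4
    safe = np.where(small, 1.0, x)          # avoid 0/0 in the unused branch
    return np.where(small, 1.0 + x * x / 6.0, np.sinh(safe) / safe)

def update_positions_flexible(r0, v, v_g, dt):
    """Eqn (5.13.19) for positions r0 and v = p/m, both (N, 3); v_g symmetric (3, 3)."""
    lam, evecs = np.linalg.eigh(v_g)        # v_g = evecs @ diag(lam) @ evecs.T
    O = evecs.T                             # O @ v_g @ O.T is diagonal
    D = np.diag(np.exp(lam * dt))
    D_tilde = np.diag(np.exp(0.5 * lam * dt) * sinhc(0.5 * lam * dt))
    A = O.T @ D @ O
    B = O.T @ D_tilde @ O
    # Particle vectors are stored as rows, so the matrices act from the right.
    return r0 @ A.T + dt * (v @ B.T)
```

The velocity update of eqn. (5.13.21) follows the same pattern, with the eigenvalues shifted by $b\,\mathrm{Tr}[\mathbf{v}_g]$, the sign of the exponents reversed, and the time interval halved.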

A technical comment is in order at this point. As noted in Section 5.7, if all nine elements of the box matrix h are allowed to vary independently, then the simulation


box could execute overall rotational motion, which makes analysis of molecular dynamics trajectories difficult. Overall cell rotations can be eliminated straightforwardly, however (Tobias et al., 1993). One scheme for accomplishing this is to restrict the box matrix to be upper (or lower) triangular only. Consider, for example, what an upper triangular box matrix represents. According to eqn. (5.7.2), if $\mathbf{h}$ is upper triangular, then the vector $\mathbf{a}$ has only one nonzero component, which is its x-component. Hence, $\mathbf{a}$ lies entirely along the x direction. Similarly, $\mathbf{b}$ lies entirely in the x-y plane. Only $\mathbf{c}$ has complete freedom. With the base of the box firmly rooted in the x-y plane with its $\mathbf{a}$ vector pinned to the x-axis, overall rotations of the cell are eliminated. The other option, which is preferable when the system is subject to holonomic constraints, is explicit symmetrization of the pressure tensor $P_{\alpha\beta}^{(\mathrm{int})}$. That is, we can simply replace occurrences of $P_{\alpha\beta}^{(\mathrm{int})}$ in eqns. (5.11.2) with $\tilde{P}_{\alpha\beta}^{(\mathrm{int})} = (P_{\alpha\beta}^{(\mathrm{int})} + P_{\beta\alpha}^{(\mathrm{int})})/2$. This has the effect of ensuring that $\mathbf{p}_g$ and $\mathbf{v}_g$ are symmetric matrices. If the initial conditions are chosen such that the angular momentum of the cell is initially zero, then the cell should not rotate. Both techniques can actually be derived using simple holonomic constraints and Lagrange undetermined multipliers (see Problem 5.15). When the number of degrees of freedom in the cell matrix is restricted, factors of $d^2$ in eqns. (5.11.3) and (5.11.4) must be replaced by the correct number of degrees of freedom. If overall cell rotations are eliminated, then this number is $d^2 - d$.

The new $NPT$ integrator can also be applied within the multiple time-step RESPA framework of Section 3.11. For two time steps, $\delta t$ and $\Delta t = n\delta t$, the following contributions to the total Liouville operator are defined as

\[
\begin{aligned}
iL_1 &= \sum_{i=1}^{N}\left[\frac{\mathbf{p}_i}{m_i} + \frac{p_\epsilon}{W}\mathbf{r}_i\right]\cdot\frac{\partial}{\partial\mathbf{r}_i} \\
iL_2^{(\mathrm{fast})} &= \sum_{i=1}^{N}\left[\mathbf{F}_i^{(\mathrm{fast})} - \alpha\frac{p_\epsilon}{W}\mathbf{p}_i\right]\cdot\frac{\partial}{\partial\mathbf{p}_i} \\
iL_2^{(\mathrm{slow})} &= \sum_{i=1}^{N}\mathbf{F}_i^{(\mathrm{slow})}\cdot\frac{\partial}{\partial\mathbf{p}_i} \\
iL_{\epsilon,1} &= \frac{p_\epsilon}{W}\frac{\partial}{\partial\epsilon} \\
iL_{\epsilon,2}^{(\mathrm{fast})} &= G_\epsilon^{(\mathrm{fast})}\frac{\partial}{\partial p_\epsilon} \\
iL_{\epsilon,2}^{(\mathrm{slow})} &= G_\epsilon^{(\mathrm{slow})}\frac{\partial}{\partial p_\epsilon},
\end{aligned}
\tag{5.13.23}
\]

where fast and slow components are designated with superscripts with

\[
G_\epsilon^{(\mathrm{fast})} = \alpha\sum_{i}\frac{\mathbf{p}_i^2}{m_i} + \sum_{i=1}^{N}\mathbf{r}_i\cdot\mathbf{F}_i^{(\mathrm{fast})} - dV\frac{\partial U^{(\mathrm{fast})}}{\partial V} - dP^{(\mathrm{fast})}V \tag{5.13.24}
\]

\[
G_\epsilon^{(\mathrm{slow})} = \sum_{i=1}^{N}\mathbf{r}_i\cdot\mathbf{F}_i^{(\mathrm{slow})} - dV\frac{\partial U^{(\mathrm{slow})}}{\partial V} - dP^{(\mathrm{slow})}V. \tag{5.13.25}
\]

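The variables $P^{(\mathrm{fast})}$ and $P^{(\mathrm{slow})}$ are external pressure components corresponding to the fast and slow virial contributions and must be chosen such that $P = P^{(\mathrm{fast})} + P^{(\mathrm{slow})}$. Although the subdivision of the pressure is arbitrary, a physically meaningful choice can be made. One possibility is to perform a short calculation with a single time step and compute the contributions to the pressure from

\[
\begin{aligned}
P^{(\mathrm{fast})} &= \left\langle \frac{1}{dV}\left\{\sum_{i=1}^{N}\left[\frac{\mathbf{p}_i^2}{m_i} + \mathbf{r}_i\cdot\mathbf{F}_i^{(\mathrm{fast})}\right] - dV\frac{\partial U^{(\mathrm{fast})}}{\partial V}\right\}\right\rangle \\
P^{(\mathrm{slow})} &= \left\langle \frac{1}{dV}\left\{\sum_{i=1}^{N}\mathbf{r}_i\cdot\mathbf{F}_i^{(\mathrm{slow})} - dV\frac{\partial U^{(\mathrm{slow})}}{\partial V}\right\}\right\rangle,
\end{aligned}
\tag{5.13.26}
\]

that is, using the definitions of the reference system and correction contributions to the internal pressure. Another simple choice is

\[
P^{(\mathrm{fast})} = \frac{n}{n+1}P, \qquad P^{(\mathrm{slow})} = \frac{1}{n+1}P. \tag{5.13.27}
\]

The factorized propagator then takes the form

\[
\begin{aligned}
\exp(iL\Delta t) ={}& \exp\left(iL_{\text{NHC-baro}}\frac{\Delta t}{2}\right)\exp\left(iL_{\text{NHC-part}}\frac{\Delta t}{2}\right) \\
&\times \exp\left(iL_{\epsilon,2}^{(\mathrm{slow})}\frac{\Delta t}{2}\right)\exp\left(iL_{2}^{(\mathrm{slow})}\frac{\Delta t}{2}\right) \\
&\times \left[\exp\left(iL_{\epsilon,2}^{(\mathrm{fast})}\frac{\delta t}{2}\right)\exp\left(iL_{2}^{(\mathrm{fast})}\frac{\delta t}{2}\right)\right. \\
&\qquad\times \exp\left(iL_{\epsilon,1}\delta t\right)\exp\left(iL_{1}\delta t\right) \\
&\qquad\left.\times \exp\left(iL_{2}^{(\mathrm{fast})}\frac{\delta t}{2}\right)\exp\left(iL_{\epsilon,2}^{(\mathrm{fast})}\frac{\delta t}{2}\right)\right]^{n} \\
&\times \exp\left(iL_{2}^{(\mathrm{slow})}\frac{\Delta t}{2}\right)\exp\left(iL_{\epsilon,2}^{(\mathrm{slow})}\frac{\Delta t}{2}\right) \\
&\times \exp\left(iL_{\text{NHC-part}}\frac{\Delta t}{2}\right)\exp\left(iL_{\text{NHC-baro}}\frac{\Delta t}{2}\right) + \mathcal{O}(\Delta t^3).
\end{aligned}
\tag{5.13.28}
\]

Note that because $G_\epsilon$ depends on the forces $\mathbf{F}_i$, it is necessary to update both the particles and the barostat in the reference system.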
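The nesting in eqn. (5.13.28) can be made concrete by listing the factors, left to right as written, together with the time interval each one carries. The sketch below does only that bookkeeping; the operator labels and function names are hypothetical, and no dynamics is performed. It also includes the simple pressure split of eqn. (5.13.27).

```python
def respa_npt_schedule(n, dt_outer):
    """List (label, interval) pairs for the factors of eqn (5.13.28), left to right."""
    dt_inner = dt_outer / n
    head = [("NHC-baro", dt_outer / 2), ("NHC-part", dt_outer / 2),
            ("eps2-slow", dt_outer / 2), ("L2-slow", dt_outer / 2)]
    inner = [("eps2-fast", dt_inner / 2), ("L2-fast", dt_inner / 2),
             ("eps1", dt_inner), ("L1", dt_inner),
             ("L2-fast", dt_inner / 2), ("eps2-fast", dt_inner / 2)]
    # The inner reference-system block is repeated n times; the outer
    # slow-force and thermostat factors close the step in reverse order.
    return head + inner * n + head[::-1]

def split_pressure(P, n):
    """Eqn (5.13.27): return (P_fast, P_slow) with P_fast + P_slow = P."""
    return n * P / (n + 1), P / (n + 1)
```

For `n = 4`, the list holds 8 outer entries and 24 inner ones, and the intervals attached to `L1` add up to the full outer step.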


The integrators presented in this section can be generalized to handle systems with constraints under constant pressure. It is not entirely straightforward, however, because self-consistency conditions arise from the nonlinearity of some of the operators. A detailed discussion of the implementation of constraints under conditions of constant pressure can be found in Section 5.14. Note that these integrators can also be formulated using the “middle” scheme discussed in Section 4.12.

5.13.1 Example: Liquid argon at constant pressure

Fig. 5.4 (a) Box-length fluctuations at pressures of P = 0.5 kbar (top curve), P = 1.0 kbar (middle curve), and P = 1.5 kbar (bottom curve), respectively. (b) Radial distribution functions at each of the three pressures.

As an illustrative example of molecular dynamics in the isothermal-isobaric ensemble, we consider first the argon system of Section 3.14.2. Three simulations at applied external pressures of 0.5 kbar, 1.0 kbar, and 1.5 kbar and a temperature of 300 K are carried out, and the radial distribution functions computed at each pressure. The parameters of the Lennard-Jones potential are described in Section 3.14.2, together with the integration time step used in eqn. (5.13.4). Temperature control is achieved using the “massive” Nosé-Hoover chain scheme of Section 4.11. The values of $\tau$ for the particle and barostat Nosé-Hoover chains are 100.0 fs and 1000.0 fs, respectively, while $\tau_b = 500.0$ fs. Nosé-Hoover chains of length $M = 4$ are employed using $n_{\mathrm{sy}} = 7$ and $n = 4$ in eqn. (4.12.17). Each simulation is 75 ps in length and carried out in a cubic box with periodic boundary conditions subject only to isotropic volume fluctuations. In Fig. 5.4(a), we show the fluctuations in the box length at each pressure, while in Fig. 5.4(b), we show the radial distribution functions obtained at each pressure. Both panels exemplify the expected behavior of the system. As the pressure increases, the box length decreases. Similarly, as the pressure increases, the liquid becomes more structured, and the first and second peaks in the radial distribution function become sharper. Figure 5.5 shows the density distribution (in reduced units) obtained from


Fig. 5.5 Density distribution for the argon system at P = 0.5 kbar for two different values of τb (Tuckerman et al., 2006). The solid curve with filled circles represents the fit to the Gaussian form in eqn. (5.13.29).

the simulation at $P = 0.5$ kbar ($P^* = P\sigma^3/\epsilon = 1.279$). The solid and dashed curves correspond to $\tau_b$ values of 500.0 fs and 5000.0 fs, respectively. It can be seen that the distribution is fairly sharply peaked in both cases around a density value $\rho^* \approx 0.704$, and that the distribution is only sensitive to the value of $\tau_b$ near the peak. Interestingly, the distribution can be fit very accurately to a Gaussian form,

\[
P_G(\rho^*) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-(\rho^* - \rho_0)^2/2\sigma^2} \tag{5.13.29}
\]

with a width $\sigma = 0.01596$ and average $\rho_0 = 0.7038$. Such a fit is shown in circles on the solid curve in Fig. 5.5.

5.13.2 Example: Structural transition of paracetamol crystal as a function of temperature

Structural diversity abounds in nature, and an area where this structural diversity has a significant impact is the study of organic molecular crystals. An important issue in the study of these crystals is polymorphism, which refers to the ability of a compound to crystallize into multiple forms. In some systems under certain conditions, the landscape of polymorphs may contain multiple thermodynamically close-lying structures. When the most useful form is not the lowest free-energy structure but a metastable


Fig. 5.6 Contours of the Gibbs free energy of paracetamol form III as a function of temperature T and cell angle α (see Fig. 5.2); the molecular diagram of paracetamol is shown in the box in the lower portion of the figure. The white horizontal line indicates the temperature at which the transition from monoclinic to orthorhombic occurs (reprinted with permission from Hong et al., Cryst. Growth & Design, 2, 886 (2021), copyright American Chemical Society).

one, then an unforeseen transition to a more stable crystal structure can cause pharmaceuticals to fail, affect the performance of organic semiconductors, or change the lethality of a contact insecticide (Yang et al., 2017; Yang et al., 2020; Zhu et al., 2021). When pharmaceutical formulations are created from compressed powders of molecular crystals (the process of “tabletting”), the cost of manufacturing these formulations correlates with the ease of crystallization and compressibility of the resulting crystal. Paracetamol (see Fig. 5.6), also known as acetaminophen, is an active pharmaceutical compound used as an analgesic. In total, nine different polymorphs of paracetamol have been identified, as discussed by Shtukenberg et al. (2019), with forms I, II, and III existing stably under ambient conditions. Of these, paracetamol I is used to create pharmaceutical formulations. Paracetamol I possesses a herringbone-like structure, which has a low compressibility and, therefore, is not ideal for tabletting. Forms II and III, on the other hand, are composed of stacked layers, which are more compressible and, therefore, much better for tabletting. Unfortunately, forms II and III are difficult to crystallize. Form III is particularly interesting in that the stacked layers are arranged in pairs. Intermolecular interactions within each pair are stronger than those between the layer pairs, which means that these layer pairs can slip past each other, thus creating a difference between low- and high-temperature structures of form III. At low temperatures (up to around 175 K), form III appears as a monoclinic structure with the α angle (see Fig. 5.2) taking on values around 82–84°, while at temperatures above these, it is thought that the structure can be characterized as orthorhombic with α = β = γ = 90°. 
By mapping out the Gibbs free energy as a function of temperature at P = 1 atm using molecular dynamics simulations in the isothermal-isobaric ensemble, the location of the transition between the monoclinic


and orthorhombic structures can be identified (Hong et al., 2021). If, in addition, one integrates over all cell parameters except α and plots G as a function of both T and α, these different structures can be characterized in greater detail. Note that G is not a natural function of α. However, we can think of α as a structural variable for distinguishing the different structures (an idea we will expand upon in Chapter 8). The free energy as a function of T and α is shown in Fig. 5.6. Interestingly, the free energy indicates that at high temperature (above the white line), the structure is best characterized as an ensemble of monoclinic structures with different values of α, the distribution of which peaks at 90° and ranges between 82° and 98°. Below the white line, the free energy clearly shows two minima at two equivalent monoclinic structures. The transition temperature occurs around 180 K, which agrees with an experimental prediction of the transition temperature, placing it between 170 K and 220 K (Reiss et al., 2018).

5.14 The isothermal-isobaric ensemble with constraints: The ROLL algorithm

Incorporating holonomic constraints into molecular dynamics calculations in the isobaric ensembles introduces new technical difficulties. The forces in the virial contributions to the pressure and pressure tensor estimators must also include the forces of constraint. According to eqn. (3.9.5), the force on atom $i$ in an $N$-particle system is $\tilde{\mathbf{F}}_i = \mathbf{F}_i + \sum_k \lambda_k\mathbf{F}_{c,i}^{(k)}$, where $\mathbf{F}_i = -\partial U/\partial\mathbf{r}_i$ and $\mathbf{F}_{c,i}^{(k)} = \nabla_i\sigma_k(\mathbf{r}_1, ..., \mathbf{r}_N)$, and the virial part of the pressure is

\[
P^{(\mathrm{vir})} = \frac{1}{dV}\sum_{i=1}^{N}\left[\mathbf{r}_i\cdot\mathbf{F}_i + \mathbf{r}_i\cdot\sum_k\lambda_k\mathbf{F}_{c,i}^{(k)}\right]. \tag{5.14.1}
\]

The integration algorithm for eqns. (5.10.5) encoded in the factorization of eqn. (5.13.4) generates a nonlinear dependence of the coordinates and velocities on the barostat variables $v_\epsilon$ or $\mathbf{v}_g$, while these variables, in turn, depend linearly on the pressure or pressure tensor. The consequence is that the coordinates and velocities acquire a complicated dependence on the Lagrange multipliers, and solving for the multipliers is much less straightforward than in the constant-volume ensembles (see Section 3.9). In order to tackle this problem, we need to modify the SHAKE and RATTLE algorithms of Section 3.9. We refer to the modified algorithm as the “ROLL” algorithm.$^4$ It is worth noting that a version of the ROLL algorithm was developed by Martyna et al. (1996); however, the version that will be described here, based on eqn. (5.13.4), is somewhat simpler. Here, we will only consider the problem of isotropic cell fluctuations; the extension to fully flexible cells is straightforward, though tedious (Yu et al., 2010). Because of the highly nonlinear dependence of eqn. (5.13.6) on the Lagrange multipliers, the operators $\exp(iL_{\epsilon,1}\Delta t)\exp(iL_1\Delta t)\exp(iL_2\Delta t/2)\exp(iL_{\epsilon,2}\Delta t/2)$ must be applied in an iterative fashion until a self-consistent solution that satisfies the constraints is obtained.

$^4$ Yes, the “ROLL” moniker does fit well with “SHAKE” and “RATTLE”; however, there is an actual “rolling” procedure in the ROLL algorithm when used in fully flexible cell calculations.

The full evolution of the coordinates $\mathbf{r}_i$ is obtained by combining eqns. (5.13.6) and (5.13.8) to give

\[
\begin{aligned}
\mathbf{r}_i(\Delta t) ={}& \mathbf{r}_i(0)\,e^{v_\epsilon\Delta t} + \Delta t\,\mathbf{v}_i(\Delta t/2)\,e^{v_\epsilon\Delta t/2}\,\frac{\sinh(v_\epsilon\Delta t/2)}{v_\epsilon\Delta t/2} \\
={}& \mathbf{r}_i(0)\,e^{v_\epsilon\Delta t} + \Delta t\,e^{v_\epsilon\Delta t/2}\,\frac{\sinh(v_\epsilon\Delta t/2)}{v_\epsilon\Delta t/2} \\
&\times\left[\mathbf{v}_i^{(\mathrm{NHC})}e^{-\alpha v_\epsilon\Delta t/2} + \frac{\Delta t}{2m_i}\left(\mathbf{F}_i(0) + \sum_k\lambda_k\mathbf{F}_{c,i}^{(k)}(0)\right)e^{-\alpha v_\epsilon\Delta t/4}\,\frac{\sinh(\alpha v_\epsilon\Delta t/4)}{\alpha v_\epsilon\Delta t/4}\right],
\end{aligned}
\tag{5.14.2}
\]

or

\[
\begin{aligned}
\mathbf{r}_i(\Delta t) ={}& \mathbf{r}_i(0)\,e^{v_\epsilon\Delta t} + \Delta t\,\mathbf{v}_i^{(\mathrm{NHC})}e^{-v_\epsilon(\alpha-1)\Delta t/2}\,\frac{\sinh(v_\epsilon\Delta t/2)}{v_\epsilon\Delta t/2} \\
&+ \frac{\Delta t^2}{2m_i}\left[\mathbf{F}_i(0) + \sum_k\lambda_k\mathbf{F}_{c,i}^{(k)}(0)\right]e^{-v_\epsilon(\alpha-2)\Delta t/4}\,\frac{\sinh(v_\epsilon\Delta t/2)}{v_\epsilon\Delta t/2}\,\frac{\sinh(\alpha v_\epsilon\Delta t/4)}{\alpha v_\epsilon\Delta t/4},
\end{aligned}
\tag{5.14.3}
\]

where $\alpha = 1 + d/N_f$. Here, $\mathbf{v}_i^{(\mathrm{NHC})}$ is the “velocity” generated by the thermostat operator, $\exp(iL_{\text{NHC-part}}\Delta t/2)$. Because the evolution of $v_\epsilon$ is determined by the pressure, many of the factors in eqn. (5.14.3) depend on the Lagrange multipliers. Thus, let us write eqn. (5.14.3) in the suggestive shorthand form

\[
\mathbf{r}_i(\Delta t) = R_{xx}(\lambda, 0)\,\mathbf{r}_i(0) + R_{vx}(\lambda, 0)\,\Delta t\,\mathbf{v}_i^{(\mathrm{NHC})} + \frac{\Delta t^2}{2m_i}R_{Fx}(\lambda, 0)\left[\mathbf{F}_i(0) + \sum_k\lambda_k\mathbf{F}_{c,i}^{(k)}(0)\right], \tag{5.14.4}
\]

where $\lambda$ denotes the full set of Lagrange multipliers. The factors $R_{xx}(\lambda, 0)$, $R_{vx}(\lambda, 0)$ and $R_{Fx}(\lambda, 0)$ denote the $v_\epsilon$-dependent factors in eqn. (5.14.3); we refer to them as the “ROLL scalars.” (In the fully flexible cell case, these scalars are replaced by 3×3 matrices.) Note that the three operators $\exp(iL_1\Delta t)\exp(iL_2\Delta t/2)\exp(iL_{\epsilon,2}\Delta t/2)$ also generate the following half-step velocities:

\[
\begin{aligned}
\mathbf{v}_i(\Delta t/2) &= \mathbf{v}_i^{(\mathrm{NHC})}e^{-\alpha v_\epsilon\Delta t/2} + \frac{\Delta t}{2m_i}\left[\mathbf{F}_i(0) + \sum_k\lambda_k\mathbf{F}_{c,i}^{(k)}(0)\right]e^{-\alpha v_\epsilon\Delta t/4}\,\frac{\sinh(\alpha v_\epsilon\Delta t/4)}{\alpha v_\epsilon\Delta t/4} \\
&\equiv R_{vv}(\lambda, 0)\,\mathbf{v}_i^{(\mathrm{NHC})} + \frac{\Delta t}{2m_i}R_{Fv}(\lambda, 0)\left[\mathbf{F}_i(0) + \sum_k\lambda_k\mathbf{F}_{c,i}^{(k)}(0)\right],
\end{aligned}
\tag{5.14.5}
\]

where we have introduced the ROLL scalars $R_{vv}(\lambda, 0)$ and $R_{Fv}(\lambda, 0)$.

The first half of the ROLL algorithm is derived by requiring that the coordinates in eqn. (5.14.4) satisfy the constraint conditions $\sigma_k(\mathbf{r}_1(\Delta t), ..., \mathbf{r}_N(\Delta t)) = 0$. That is, eqns. (5.14.4) are inserted into the conditions $\sigma_k(\mathbf{r}_1(\Delta t), ..., \mathbf{r}_N(\Delta t)) = 0$, which are then solved for the Lagrange multipliers $\lambda$. Once the multipliers are determined, they are substituted into eqns. (5.14.4), (5.14.5), and (5.14.1) to generate final coordinates, half-step velocities, and the virial contribution to the pressure. Unfortunately, unlike the $NVE$ and $NVT$ cases, where the coordinates and velocities depend linearly on the Lagrange multipliers, the highly nonlinear dependence of eqn. (5.14.4) on $\lambda$ complicates the task of solving for the multipliers. To see how we can solve this problem, we begin by letting $\tilde{\lambda}_k = (\Delta t^2/2)\lambda_k$. We now seed the ROLL algorithm with a guess $\{\tilde{\lambda}_k^{(1)}\}$ for the multipliers and write the exact multipliers as $\tilde{\lambda}_k = \tilde{\lambda}_k^{(1)} + \delta\tilde{\lambda}_k^{(1)}$. We also assume, at first, that the ROLL scalars are independent of the multipliers. Thus, when this ansatz for the multipliers is substituted into eqn. (5.14.4), the coordinates can be expressed as

\[
\mathbf{r}_i(\Delta t) = \mathbf{r}_i^{(1)} + \frac{1}{m_i}R_{Fx}(\lambda, 0)\sum_k\delta\tilde{\lambda}_k^{(1)}\mathbf{F}_{c,i}^{(k)}(0), \tag{5.14.6}
\]

where $\mathbf{r}_i^{(1)}$ contains everything except the $\delta\tilde{\lambda}_k^{(1)}$-dependent term. Since we are ignoring the dependence of the ROLL scalars on the multipliers, $\mathbf{r}_i^{(1)}$ has no dependence on $\delta\tilde{\lambda}_k^{(1)}$. The constraint conditions now become

\[
\sigma_l\left(\mathbf{r}_1^{(1)} + \frac{1}{m_1}R_{Fx}(\lambda, 0)\sum_k\delta\tilde{\lambda}_k^{(1)}\mathbf{F}_{c,1}^{(k)}(0),\; ...,\; \mathbf{r}_N^{(1)} + \frac{1}{m_N}R_{Fx}(\lambda, 0)\sum_k\delta\tilde{\lambda}_k^{(1)}\mathbf{F}_{c,N}^{(k)}(0)\right) = 0. \tag{5.14.7}
\]

As we did in eqn. (3.9.13), we linearize these conditions using a first-order Taylor expansion:

\[
\sigma_l(\mathbf{r}_1^{(1)}, ..., \mathbf{r}_N^{(1)}) + \sum_{i=1}^{N}\sum_{k=1}^{N_c}\mathbf{F}_{c,i}^{(l)}(1)\cdot\frac{1}{m_i}R_{Fx}(\lambda, 0)\,\delta\tilde{\lambda}_k^{(1)}\mathbf{F}_{c,i}^{(k)}(0) \approx 0, \tag{5.14.8}
\]

where $\mathbf{F}_{c,i}^{(k)}(1) = \nabla_i\sigma_k(\mathbf{r}_1^{(1)}, ..., \mathbf{r}_N^{(1)})$ are the constraint forces evaluated at the positions $\mathbf{r}_i^{(1)}$. As noted in Section 3.9, we can either solve the full matrix equation in eqn. (5.14.8) if the dimensionality is not too large, or, as a time-saving measure, we can neglect the dependence of eqn. (5.14.8) on the $l \neq k$ terms, write the condition as

\[
\sigma_l(\mathbf{r}_1^{(1)}, ..., \mathbf{r}_N^{(1)}) + \sum_{i=1}^{N}\mathbf{F}_{c,i}^{(l)}(1)\cdot\frac{1}{m_i}R_{Fx}(\lambda, 0)\,\delta\tilde{\lambda}_l^{(1)}\mathbf{F}_{c,i}^{(l)}(0) \approx 0, \tag{5.14.9}
\]

and iterate the corrections $\delta\tilde{\lambda}_l^{(1)}$ to convergence as in Section 3.9. Equation (5.14.9) can be solved easily for the multiplier corrections $\delta\tilde{\lambda}_l^{(1)}$ to yield

\[
\delta\tilde{\lambda}_l^{(1)} = -\frac{\sigma_l(\mathbf{r}_1^{(1)}, ..., \mathbf{r}_N^{(1)})}{\sum_{i=1}^{N}(1/m_i)R_{Fx}(\lambda, 0)\,\mathbf{F}_{c,i}^{(l)}(1)\cdot\mathbf{F}_{c,i}^{(l)}(0)}. \tag{5.14.10}
\]
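To make the inner iteration concrete, the following sketch evaluates the ROLL scalars of eqns. (5.14.4) and (5.14.5) for a given $v_\epsilon$ and then iterates eqn. (5.14.10) for a single bond-length constraint $\sigma = |\mathbf{r}_1 - \mathbf{r}_2|^2 - d^2$, holding $R_{Fx}$ fixed. The choice of constraint, all function names, and the tolerances are assumptions made for illustration; the outer cycle that refreshes the pressure virial and the ROLL scalars is not included.

```python
import math

def sinhc(x):
    """sinh(x)/x, with a two-term series below an assumed cutoff of 1e-4."""
    return 1.0 + x * x / 6.0 if abs(x) < 1e-4 else math.sinh(x) / x

def roll_scalars(v_eps, alpha, dt):
    """Return (R_xx, R_vx, R_Fx, R_vv, R_Fv) read off from eqns (5.14.3) and (5.14.5)."""
    a = sinhc(0.5 * v_eps * dt)
    b = sinhc(0.25 * alpha * v_eps * dt)
    R_xx = math.exp(v_eps * dt)
    R_vx = math.exp(-0.5 * v_eps * (alpha - 1.0) * dt) * a
    R_Fx = math.exp(-0.25 * v_eps * (alpha - 2.0) * dt) * a * b
    R_vv = math.exp(-0.5 * alpha * v_eps * dt)
    R_Fv = math.exp(-0.25 * alpha * v_eps * dt) * b
    return R_xx, R_vx, R_Fx, R_vv, R_Fv

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bond_sigma(r, d):
    """Constraint function sigma = |r_1 - r_2|^2 - d^2 for two particles."""
    diff = [p - q for p, q in zip(r[0], r[1])]
    return dot(diff, diff) - d * d

def bond_grad(r):
    """Constraint 'forces' F_c,i = grad_i sigma for i = 1, 2."""
    diff = [p - q for p, q in zip(r[0], r[1])]
    return [[2.0 * c for c in diff], [-2.0 * c for c in diff]]

def solve_bond_multiplier(r1, r_old, masses, d, R_Fx, tol=1e-12, max_iter=50):
    """Iterate eqn (5.14.10): r1 are the positions r^(1), r_old the positions at t = 0."""
    Fc0 = bond_grad(r_old)                  # F_c,i(0), fixed during the iteration
    r = [list(ri) for ri in r1]
    lam = 0.0                               # accumulated correction to lambda-tilde
    for _ in range(max_iter):
        sigma = bond_sigma(r, d)
        if abs(sigma) < tol:
            break
        Fc1 = bond_grad(r)                  # F_c,i(1), at the current positions
        denom = sum(R_Fx / m * dot(f1, f0) for m, f1, f0 in zip(masses, Fc1, Fc0))
        dlam = -sigma / denom
        for ri, m, f0 in zip(r, masses, Fc0):
            for k in range(3):
                ri[k] += R_Fx * dlam * f0[k] / m   # shift of eqn (5.14.6)
        lam += dlam
    return r, lam
```

With `v_eps = 0` every ROLL scalar equals one and the loop reduces to an ordinary SHAKE-style iteration for one constraint.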

Whichever procedure is used to obtain the corrections $\delta\tilde{\lambda}_l^{(1)}$, once we have them, we substitute them into eqn. (5.14.1) to obtain a new update to the pressure virial. Using this new pressure virial, we now cycle again through the operators $\exp(iL_{\epsilon,1}\Delta t)\exp(iL_1\Delta t)\exp(iL_2\Delta t/2)\exp(iL_{\epsilon,2}\Delta t/2)$, which are applied on the original coordinates $\mathbf{r}_i(0)$ and velocities $\mathbf{v}_i^{(\mathrm{NHC})}$. This will generate a new set of ROLL scalars, which we use to generate a new set of corrections $\delta\tilde{\lambda}_l^{(1)}$ using the above procedure. This cycle is now iterated, each pass producing successively smaller corrections $\delta\tilde{\lambda}_l^{(n)}$ to the multipliers, until the ROLL scalars stop changing. Once this happens, the constraints will be satisfied and the pressure virial will be fully converged. Using the final multipliers, the half-step velocities are obtained from eqn. (5.14.5). It is important to note that, unlike the algorithm proposed by Martyna et al. (1996), this version of the first half of the ROLL algorithm requires no iteration through the thermostat operators.

The second half of the ROLL algorithm requires an iteration through the operators $\exp(iL_{\epsilon,2}\Delta t/2)\exp(iL_2\Delta t/2)$. However, it is also necessary to apply the operators $\exp(iL_{\text{NHC-part}}\Delta t/2)$ and $\exp(iL_{\text{NHC-baro}}\Delta t/2)$ in order to obtain the overall scaling factors on the velocities $\mathbf{v}_i(\Delta t)$ and $v_\epsilon(\Delta t)$, which we will denote $S_i(\Delta t)$ and $S_\epsilon(\Delta t)$. Thus, the entire operator whose application must be iterated is

\[
\hat{O} = \exp(iL_{\text{NHC-baro}}\Delta t/2)\exp(iL_{\text{NHC-part}}\Delta t/2)\exp(iL_{\epsilon,2}\Delta t/2)\exp(iL_2\Delta t/2). \tag{5.14.11}
\]

The evolution of $\mathbf{v}_i$ can now be expressed as

\[
\mathbf{v}_i(\Delta t) = \left[\mathbf{v}_i(\Delta t/2)\,e^{-\alpha v_\epsilon\Delta t/2} + \frac{\Delta t}{2m_i}\left(\mathbf{F}_i(\Delta t) + \sum_k\mu_k\mathbf{F}_{c,i}^{(k)}(\Delta t)\right)e^{-\alpha v_\epsilon\Delta t/4}\,\frac{\sinh(\alpha v_\epsilon\Delta t/4)}{\alpha v_\epsilon\Delta t/4}\right]S_i(\Delta t), \tag{5.14.12}
\]

which we can also write as

\[
\mathbf{v}_i(\Delta t) = \left[R_{vv}(\mu, \Delta t)\,\mathbf{v}_i(\Delta t/2) + \frac{\Delta t}{2m_i}R_{Fv}(\mu, \Delta t)\left(\mathbf{F}_i(\Delta t) + \sum_k\mu_k\mathbf{F}_{c,i}^{(k)}(\Delta t)\right)\right]S_i(\Delta t), \tag{5.14.13}
\]

and for $v_\epsilon$, we obtain

\[
v_\epsilon(\Delta t) = \left[v_\epsilon(\Delta t/2) + \frac{\Delta t}{2W}G_\epsilon(\mu, \Delta t)\right]S_\epsilon(\Delta t). \tag{5.14.14}
\]

In eqns. (5.14.13) and (5.14.14), the use of $\mu_k$ and $\mu$ for the Lagrange multipliers indicates that these multipliers are used to enforce the first time derivative of the


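constraint conditions as described in Section 3.9. Let $\tilde{\mu}_k = (\Delta t/2)\mu_k$, and suppose we have a good initial guess $\tilde{\mu}_k^{(1)}$ to the multipliers. Then, $\tilde{\mu}_k = \tilde{\mu}_k^{(1)} + \delta\tilde{\mu}_k^{(1)}$, and we can write eqns. (5.14.13) and (5.14.14) in shorthand as

\[
\begin{aligned}
\mathbf{v}_i(\Delta t) &= \mathbf{v}_i^{(1)} + \frac{1}{m_i}R_{Fv}(\lambda, \Delta t)\sum_k\delta\tilde{\mu}_k^{(1)}\mathbf{F}_{c,i}^{(k)}(\Delta t)\,S_i(\Delta t) \\
v_\epsilon(\Delta t) &= v_\epsilon^{(1)} + \frac{1}{W}\tilde{S}_\epsilon(\Delta t)\sum_i\sum_k\delta\tilde{\mu}_k^{(1)}\,\mathbf{r}_i(\Delta t)\cdot\mathbf{F}_{c,i}^{(k)}(\Delta t),
\end{aligned}
\tag{5.14.15}
\]

where $\tilde{S}_\epsilon(\Delta t) = (\Delta t/2)S_\epsilon(\Delta t)$. As in the first half of the ROLL algorithm, we assume that the ROLL scalars and scaling factors are independent of the multipliers and use eqns. (5.14.15) to determine the corrections $\delta\tilde{\mu}_k^{(1)}$ such that the first time derivative of each constraint condition vanishes. This requires

\[
\dot{\sigma}_k = \sum_{i=1}^{N}\mathbf{F}_{c,i}^{(k)}\cdot\dot{\mathbf{r}}_i = 0. \tag{5.14.16}
\]

However, a slight subtlety arises because, according to eqns. (5.10.5), $\dot{\mathbf{r}}_i \neq \mathbf{v}_i$ but rather $\dot{\mathbf{r}}_i = \mathbf{v}_i + v_\epsilon\mathbf{r}_i$. Thus, eqn. (5.14.16) becomes a condition involving both $\mathbf{v}_i(\Delta t)$ and $v_\epsilon(\Delta t)$ at $t = \Delta t$:

\[
\sum_i\mathbf{F}_{c,i}^{(k)}(\Delta t)\cdot\left[\mathbf{v}_i(\Delta t) + v_\epsilon(\Delta t)\,\mathbf{r}_i(\Delta t)\right] = 0. \tag{5.14.17}
\]

Substituting eqns. (5.14.15) into eqn. (5.14.17) yields

\[
\begin{aligned}
\sum_i\mathbf{F}_{c,i}^{(k)}(\Delta t)\cdot\Bigg\{&\mathbf{v}_i^{(1)} + \frac{1}{m_i}R_{Fv}(\mu, \Delta t)S_i(\Delta t)\sum_l\delta\tilde{\mu}_l^{(1)}\mathbf{F}_{c,i}^{(l)}(\Delta t) \\
&+ \mathbf{r}_i(\Delta t)\left[v_\epsilon^{(1)} + \frac{1}{W}\tilde{S}_\epsilon(\Delta t)\sum_j\sum_l\delta\tilde{\mu}_l^{(1)}\,\mathbf{r}_j(\Delta t)\cdot\mathbf{F}_{c,j}^{(l)}(\Delta t)\right]\Bigg\} = 0.
\end{aligned}
\tag{5.14.18}
\]

As we did with eqn. (5.14.8), we can solve eqn. (5.14.18) as a full matrix equation, or we can make the approximation of independent constraints and iterate to convergence as in Section 3.9. When the latter procedure is used, eqn. (5.14.18) becomes

\[
\begin{aligned}
\sum_i\mathbf{F}_{c,i}^{(l)}(\Delta t)\cdot\Bigg\{&\mathbf{v}_i^{(1)} + \frac{1}{m_i}R_{Fv}(\mu, \Delta t)S_i(\Delta t)\,\delta\tilde{\mu}_l^{(1)}\mathbf{F}_{c,i}^{(l)}(\Delta t) \\
&+ \mathbf{r}_i(\Delta t)\left[v_\epsilon^{(1)} + \frac{1}{W}\tilde{S}_\epsilon(\Delta t)\sum_j\delta\tilde{\mu}_l^{(1)}\,\mathbf{r}_j(\Delta t)\cdot\mathbf{F}_{c,j}^{(l)}(\Delta t)\right]\Bigg\} = 0.
\end{aligned}
\tag{5.14.19}
\]

Denoting $\sum_i\mathbf{F}_{c,i}^{(l)}\cdot[\mathbf{v}_i^{(1)} + v_\epsilon^{(1)}\mathbf{r}_i(\Delta t)]$ as $\dot{\sigma}_l(\Delta t)$, eqn. (5.14.19) can be solved for the multiplier corrections $\delta\tilde{\mu}_l^{(1)}$ to yield $\delta\tilde{\mu}_l^{(1)} = -\dot{\sigma}_l(\Delta t)/D$, where $D$ is given by

\[
D = \sum_i\frac{1}{m_i}R_{Fv}(\lambda, \Delta t)S_i(\Delta t)\,\mathbf{F}_{c,i}^{(l)}(\Delta t)\cdot\mathbf{F}_{c,i}^{(l)}(\Delta t) + \frac{1}{W}\tilde{S}_\epsilon(\Delta t)\left[\sum_i\mathbf{r}_i(\Delta t)\cdot\mathbf{F}_{c,i}^{(l)}(\Delta t)\right]^2. \tag{5.14.20}
\]

As in the first part of the ROLL algorithm, once a fully converged set of correction multipliers $\delta\tilde{\mu}_l^{(1)}$ is obtained, we update the pressure virial according to

\[
P^{(\mathrm{vir})} = \frac{1}{dV}\sum_{i=1}^{N}\left[\mathbf{r}_i\cdot\mathbf{F}_i + \mathbf{r}_i\cdot\sum_k\left(\tilde{\mu}_k^{(1)} + \delta\tilde{\mu}_k^{(1)}\right)\mathbf{F}_{c,i}^{(k)}\right]. \tag{5.14.21}
\]

We then apply the operators in eqn. (5.14.11) again on the velocities and the $v_\epsilon$ that emerged from the first part of the ROLL procedure in order to obtain a new set of ROLL scalars and scaling factors. We cycle through this procedure, obtaining successively smaller corrections $\delta\tilde{\mu}_l^{(n)}$, until the ROLL scalars stop changing.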
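For a single constraint, the velocity-level correction of eqns. (5.14.19) and (5.14.20) is a one-line solve, because the condition is linear in $\delta\tilde{\mu}_l^{(1)}$ once the ROLL scalars and scaling factors are frozen. The sketch below illustrates this under simplifying assumptions that are not in the text: one constraint only, a single scaling factor `S_i` shared by all particles, and the barostat factor taken as $\tilde{S}_\epsilon$ in the denominator. All names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def velocity_multiplier_correction(r, v1, v_eps1, Fc, masses,
                                   R_Fv, S_i, S_eps_tilde, W):
    """Solve eqn (5.14.19) for one constraint; return (dmu, v, v_eps).

    r, v1, Fc are lists of 3-vectors: positions r_i(dt), velocities v_i^(1),
    and constraint gradients F_c,i(dt). v_eps1 is v_eps^(1).
    """
    # sigma_dot evaluated with the uncorrected velocities
    sigma_dot = sum(dot(f, [vi + v_eps1 * xi for vi, xi in zip(v, x)])
                    for f, v, x in zip(Fc, v1, r))
    rF = sum(dot(x, f) for x, f in zip(r, Fc))          # sum_i r_i . F_c,i
    D = (sum(R_Fv * S_i / m * dot(f, f) for m, f in zip(masses, Fc))
         + S_eps_tilde * rF * rF / W)                   # eqn (5.14.20)
    dmu = -sigma_dot / D
    # Corrected particle and barostat velocities, eqn (5.14.15)
    v = [[vi + R_Fv * S_i * dmu * fi / m for vi, fi in zip(v_, f)]
         for v_, f, m in zip(v1, Fc, masses)]
    v_eps = v_eps1 + S_eps_tilde * dmu * rF / W
    return dmu, v, v_eps
```

After the correction, the quantity $\sum_i\mathbf{F}_{c,i}\cdot[\mathbf{v}_i + v_\epsilon\mathbf{r}_i]$ of eqn. (5.14.17) vanishes to rounding error for this one-constraint case; with several coupled constraints the same update would be swept over the constraints until all are satisfied.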

5.15 Problems

5.1. Show that the distribution function for the isothermal-isobaric ensemble can be derived starting from a microcanonical description of a system coupled to both a thermal reservoir and a mechanical piston.

5.2. Calculate the volume fluctuations $\Delta V$ given by

\[
\Delta V = \sqrt{\langle V^2\rangle - \langle V\rangle^2}
\]

in the isothermal-isobaric ensemble. Express the answer in terms of the isothermal compressibility $\kappa$ defined to be

\[
\kappa = -\frac{1}{\langle V\rangle}\left(\frac{\partial\langle V\rangle}{\partial P}\right)_{N,T}.
\]

Show that $\Delta V/\langle V\rangle \sim 1/\sqrt{N}$ and hence vanishes in the thermodynamic limit.

5.3. Prove the tensorial version of the work virial theorem in eqn. (5.7.15).

5.4. The thermodynamic relation for the pressure tensor of a system in $d$ dimensions undergoing full cell fluctuations at temperature $T$ is

\[
P_{\alpha\beta}^{(\mathrm{int})} = \frac{k_BT}{\det(\mathbf{h})}\sum_{\gamma=1}^{d}h_{\beta\gamma}\left(\frac{\partial\ln Q}{\partial h_{\alpha\gamma}}\right)_{N,T},
\]

where $\mathbf{h}$ is the cell matrix and $Q$ is the canonical partition function, while the thermodynamic relation for the isotropic pressure is

\[
P^{(\mathrm{int})} = k_BT\left(\frac{\partial\ln Q}{\partial V}\right)_{N,T},
\]

where $V = \det(\mathbf{h})$ is the volume of the system. Using only these thermodynamic relations, prove that $P^{(\mathrm{int})}$ is related to the trace of the pressure tensor by

\[
P^{(\mathrm{int})} = \frac{1}{3}\sum_{\alpha=1}^{d}P_{\alpha\alpha}^{(\mathrm{int})}.
\]

N.B.: You cannot use expressions for estimators of the pressure or pressure tensor, as these expressions assume the validity of the thermodynamic relations. Thus, you need to use the aforementioned thermodynamic relations and show that they are consistent.

5.5. Repeat the entropy maximization calculation of Section 5.4 using three separate constraints of normalization and

\[
\int_0^\infty dV\int d\mathrm{x}\,H(\mathrm{x})f(\mathrm{x}, V) = E, \qquad \int_0^\infty dV\int d\mathrm{x}\,V f(\mathrm{x}, V) = \langle V\rangle,
\]

using three Lagrange multipliers. Show that the isothermal-isobaric distribution results as the only solution to the maximization problem.

∗5.6. a. For the ideal gas in Problem 4.7 of Chapter 4, calculate the isothermal-isobaric partition function assuming that only the length of the cylinder can vary. Hint: You might find the binomial theorem helpful in this problem.
b. Derive an expression for the average length of the cylinder.

5.7. Prove that the isotropic $NPT$ equations of motion in eqns. (5.10.5) generate the correct ensemble distribution function using the techniques of Section 4.10 for the following cases:
a. $\sum_{i=1}^{N}\mathbf{F}_i \neq 0$,
b. $\sum_{i=1}^{N}\mathbf{F}_i = 0$, for which there is an additional conservation law

\[
\mathbf{K} = \mathbf{P}\exp\left[\left(1 + \frac{d}{N_f}\right)\epsilon + \eta_1\right],
\]

where $\mathbf{P} = \sum_{i=1}^{N}\mathbf{p}_i$ is the center-of-mass momentum;
c. “Massive” thermostatting is used on the particles.


5.8. Prove that eqns. (5.11.2) for generating anisotropic volume fluctuations generate the correct ensemble distribution when $\sum_{i=1}^{N}\mathbf{F}_i = 0$.

5.9. One of the first algorithms proposed for generating the isotropic $NPT$ ensemble via molecular dynamics is given by the equations of motion

\[
\begin{aligned}
\dot{\mathbf{r}}_i &= \frac{\mathbf{p}_i}{m_i} + \frac{p_\epsilon}{W}\mathbf{r}_i \\
\dot{\mathbf{p}}_i &= -\frac{\partial U}{\partial\mathbf{r}_i} - \frac{p_\epsilon}{W}\mathbf{p}_i - \frac{p_\eta}{Q}\mathbf{p}_i \\
\dot{V} &= \frac{dVp_\epsilon}{W} \\
\dot{p}_\epsilon &= dV(P^{(\mathrm{int})} - P) - \frac{p_\eta}{Q}p_\epsilon \\
\dot{\eta} &= \frac{p_\eta}{Q} \\
\dot{p}_\eta &= \sum_{i=1}^{N}\frac{\mathbf{p}_i^2}{m_i} + \frac{p_\epsilon^2}{W} - (N_f + 1)kT,
\end{aligned}
\]

(Hoover, 1985), where $P^{(\mathrm{int})}$ is the pressure estimator of eqn. (5.8.28). These equations have the conserved energy

\[
H' = H(\mathbf{r}, \mathbf{p}) + \frac{p_\epsilon^2}{2W} + \frac{p_\eta^2}{2Q} + (N_f + 1)kT\eta + PV.
\]

Determine the ensemble distribution function $f(\mathbf{r}, \mathbf{p}, V)$ generated by these equations for
a. $\sum_{i=1}^{N}\mathbf{F}_i \neq 0$. Would the distribution be expected to approach the correct isothermal-isobaric ensemble distribution in the thermodynamic limit?
∗b. $\sum_{i=1}^{N}\mathbf{F}_i = 0$, in which case, there is an additional conservation law

\[
\mathbf{K} = \mathbf{P}e^{\epsilon + \eta},
\]

where $\mathbf{P}$ is the center-of-mass momentum. Be sure to integrate over all nonphysical variables.

5.10. Prove that eqns. (5.12.6) generate the correct isobaric-isoenthalpic ensemble distribution when the pressure is determined using a molecular virial.

5.11. A simple model for the motion of particles through a nanowire consists of a one-dimensional ideal gas of $N$ particles moving in a periodic potential. Let the Hamiltonian for one particle with coordinate $q$ and momentum $p$ be

\[
h(q, p) = \frac{p^2}{2m} + \frac{kL^2}{4\pi^2}\left[1 - \cos\left(\frac{2\pi q}{L}\right)\right],
\]

where $m$ is the mass of the particle, $k$ is a constant, and $L$ is the length of the one-dimensional “box” or unit cell.
a. Calculate the change in the Helmholtz free energy per particle required to change the length of the “box” from $L_1$ to $L_2$. Express your answer in terms of the zeroth-order modified Bessel function

\[
I_0(x) = \frac{1}{\pi}\int_0^\pi d\theta\,e^{\pm x\cos\theta}.
\]

b. Calculate the equation of state by determining the one-dimensional “pressure” $P$. Do you obtain an ideal-gas equation of state? Why or why not? You might find the following properties of modified Bessel functions useful:

\[
\frac{dI_\nu(x)}{dx} = \frac{1}{2}\left[I_{\nu+1}(x) + I_{\nu-1}(x)\right], \qquad I_\nu(x) = I_{-\nu}(x).
\]

c. Write down integral expressions for the position and length distribution functions in the isothermal-isobaric ensemble.

5.12. Write a program to integrate the isotropic $NPT$ equations of motion (5.10.5) for the one-dimensional periodic potential in eqn. (5.10.8) using the integrator in eqn. (5.13.4). The program should be able to generate the distributions in Fig. 5.3.

5.13. How should the algorithm in Section 4.15 for calculating the radial distribution function be modified for the isotropic $NPT$ ensemble?

∗5.14. Generalize the ROLL algorithm of Section 5.14 to the case of anisotropic cell fluctuations based on eqns. (5.11.2) and the integrator defined by eqns. (5.13.10) and (5.13.11).

5.15. a. Using the constraint condition on the box matrix $h_{\alpha\beta} = 0$ for $\alpha > \beta$, show, using Lagrange undetermined multipliers, that overall cell rotations in eqns. (5.11.2) can be eliminated simply by working with an upper triangular box matrix.
b. Using the constraint condition that $\mathbf{p}_g - \mathbf{p}_g^{\mathrm{T}} = 0$, show, using Lagrange undetermined multipliers, that overall cell rotations in eqns. (5.11.2) can be eliminated by explicitly symmetrizing the pressure tensor $P_{\alpha\beta}^{(\mathrm{int})}$. Is this scheme easier to implement within the ROLL algorithm of Section 5.14?

6 The grand canonical ensemble

6.1 Introduction: The need for yet another ensemble

The ensembles discussed thus far all have the common feature that the particle number N is kept fixed as one of the control variables. The fourth ensemble to be discussed, the grand canonical ensemble, differs in that it permits fluctuations in the particle number at constant chemical potential µ. Why is such an ensemble necessary? As useful as the isothermal-isobaric and canonical ensembles are, numerous physical situations correspond to a system in which the particle number varies. These include liquid–vapor equilibria, capillary condensation, and, notably, molecular electronics and batteries, in which a device is assumed to be coupled to an electron source. In computational molecular design, one seeks to sample a complete “chemical space” of compounds in order to optimize a particular property (e.g. binding energy to a target), which requires varying both the number and chemical identity of the constituent atoms. Finally, in certain cases, it simply proves easier to work in the grand canonical ensemble, and given that all ensembles become equivalent in the thermodynamic limit, we are free to choose the ensemble that proves most convenient for the problem at hand. In this chapter, we introduce the basic thermodynamics and classical statistical mechanics of the grand canonical ensemble. We will begin with a discussion of Euler’s theorem and a derivation of the free energy. Following this, we will consider the partition function of a physical system coupled to both thermal and particle reservoirs. Finally, we will discuss the procedure for obtaining an equation of state within the framework of the grand canonical ensemble. Because of the inherently discrete nature of particle fluctuations, the grand canonical ensemble does not easily fit into the continuous molecular dynamics framework we have discussed so far for kinetic-energy and volume fluctuations. 
Therefore, a discussion of computational approaches to the grand canonical ensemble will be deferred until Chapters 7 and 8. These chapters will develop the machinery needed to design computational approaches suitable for the grand canonical ensemble.

6.2 Euler’s theorem

Euler’s theorem is a general statement about a certain class of functions known as homogeneous functions of degree $n$. Consider a function $f(x_1, \ldots, x_N)$ of $N$ variables that satisfies
\[ f(\lambda x_1, \ldots, \lambda x_k, x_{k+1}, \ldots, x_N) = \lambda^n f(x_1, \ldots, x_k, x_{k+1}, \ldots, x_N) \tag{6.2.1} \]
for an arbitrary parameter, $\lambda$. We call such a function a homogeneous function of degree $n$ in the variables $x_1, \ldots, x_k$. The function $f(x) = x^2$, for example, is a homogeneous function of degree 2. The function $f(x, y, z) = xy^2 + z^3$ is a homogeneous function of degree 3 in all three variables $x$, $y$, and $z$. The function $f(x, y, z) = x^2(y^2 + z)$ is a homogeneous function of degree 2 in $x$ but not in $y$ and $z$. The function $f(x, y) = e^{xy} - xy$ is not a homogeneous function in either $x$ or $y$. Euler’s theorem states the following: Let $f(x_1, \ldots, x_N)$ be a homogeneous function of degree $n$ in $x_1, \ldots, x_k$. Then,
\[ n f(x_1, \ldots, x_N) = \sum_{i=1}^{k} x_i \frac{\partial f}{\partial x_i}. \tag{6.2.2} \]

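The proof of Euler’s theorem is straightforward. Beginning with eqn. (6.2.1), we differentiate both sides with respect to $\lambda$ to yield:
\[ \begin{aligned}
\frac{d}{d\lambda} f(\lambda x_1, \ldots, \lambda x_k, x_{k+1}, \ldots, x_N) &= \frac{d}{d\lambda} \lambda^n f(x_1, \ldots, x_k, x_{k+1}, \ldots, x_N) \\
\sum_{i=1}^{k} x_i \frac{\partial f}{\partial (\lambda x_i)} &= n \lambda^{n-1} f(x_1, \ldots, x_k, x_{k+1}, \ldots, x_N).
\end{aligned} \tag{6.2.3} \]
Since $\lambda$ is arbitrary, we may freely choose $\lambda = 1$, which yields
\[ \sum_{i=1}^{k} x_i \frac{\partial f}{\partial x_i} = n f(x_1, \ldots, x_k, x_{k+1}, \ldots, x_N) \tag{6.2.4} \]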

and proves the theorem. What does Euler’s theorem have to do with thermodynamics? Consider, for example, the Helmholtz free energy $A(N, V, T)$, which depends on two extensive variables, $N$ and $V$. Since $A$ is, itself, extensive, $A \sim N$, and since $V \sim N$, $A$ must be a homogeneous function of degree 1 in $N$ and $V$, i.e., $A(\lambda N, \lambda V, T) = \lambda A(N, V, T)$. Applying Euler’s theorem, it follows that
\[ A(N, V, T) = V \frac{\partial A}{\partial V} + N \frac{\partial A}{\partial N}. \tag{6.2.5} \]
From the thermodynamic relations of the canonical ensemble for pressure and chemical potential, we have $P = -(\partial A/\partial V)$ and $\mu = (\partial A/\partial N)$. Thus,
\[ A = -PV + \mu N. \tag{6.2.6} \]
We can verify this result by recalling that
\[ A(N, V, T) = E - TS. \tag{6.2.7} \]
From the first law of thermodynamics,
\[ E - TS = -PV + \mu N, \tag{6.2.8} \]
so that
\[ A(N, V, T) = -PV + \mu N, \tag{6.2.9} \]
which agrees with Euler’s theorem. Similarly, the Gibbs free energy $G(N, P, T)$ is a homogeneous function of degree 1 in $N$ only, i.e., $G(\lambda N, P, T) = \lambda G(N, P, T)$. Thus, from Euler’s theorem,
\[ G(N, P, T) = N \frac{\partial G}{\partial N} = \mu N, \tag{6.2.10} \]
which agrees with the definition $G = E - TS + PV = \mu N$. From these two examples, we see that Euler’s theorem allows us to derive alternative expressions for extensive thermodynamic functions such as the Gibbs and Helmholtz free energies. As will be shown in the next section, Euler’s theorem simplifies the derivation of the thermodynamic relations of the grand canonical ensemble.
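Euler’s theorem, eqn. (6.2.2), is easy to check numerically for the example functions quoted earlier in this section. The following is a minimal sketch, assuming central finite differences for the partial derivatives; the helper name `euler_check`, the evaluation point, and the step size are illustrative choices, not part of the text.

```python
def euler_check(f, x, homogeneous_idx, h=1e-6):
    """Return sum_i x_i * df/dx_i over the homogeneous variables only.

    Partial derivatives are approximated by central differences with step h.
    """
    total = 0.0
    for i in homogeneous_idx:
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        total += x[i] * (f(*xp) - f(*xm)) / (2.0 * h)
    return total


def f3(x, y, z):
    """Homogeneous of degree 3 in x, y, and z."""
    return x * y**2 + z**3


def f2(x, y, z):
    """Homogeneous of degree 2 in x only."""
    return x**2 * (y**2 + z)


point = (1.3, 0.7, 2.1)            # arbitrary evaluation point (assumption)
lhs3 = euler_check(f3, point, [0, 1, 2])
lhs2 = euler_check(f2, point, [0])

# Each pair should agree to within the finite-difference error.
print(lhs3, 3 * f3(*point))
print(lhs2, 2 * f2(*point))
```

Restricting the sum to the indices in `homogeneous_idx` mirrors the upper limit $k$ in eqn. (6.2.2): only the variables in which the function is homogeneous contribute.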

6.3 Thermodynamics of the grand canonical ensemble

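In the grand canonical ensemble, the control variables are the chemical potential $\mu$, the volume $V$, and the temperature $T$. The free energy of the ensemble can be obtained by performing a Legendre transformation of the Helmholtz free energy $A(N, V, T)$. Let $\tilde{A}(\mu, V, T)$ be the transformed free energy, which we obtain as
\[ \begin{aligned}
\tilde{A}(\mu, V, T) &= A(N(\mu), V, T) - N \left(\frac{\partial A}{\partial N}\right)_{V,T} \\
\tilde{A}(\mu, V, T) &= A(N(\mu), V, T) - N(\mu)\mu.
\end{aligned} \tag{6.3.1} \]
Since $\tilde{A}$ is a function of $\mu$, $V$, and $T$, a small change in each of these variables leads to a change in $\tilde{A}$ given by
\[ d\tilde{A} = \left(\frac{\partial \tilde{A}}{\partial \mu}\right)_{V,T} d\mu + \left(\frac{\partial \tilde{A}}{\partial V}\right)_{\mu,T} dV + \left(\frac{\partial \tilde{A}}{\partial T}\right)_{\mu,V} dT. \tag{6.3.2} \]
However, from the first law of thermodynamics,
\[ \begin{aligned}
d\tilde{A} &= dA - N\,d\mu - \mu\,dN \\
&= -P\,dV - S\,dT + \mu\,dN - N\,d\mu - \mu\,dN \\
&= -P\,dV - S\,dT - N\,d\mu,
\end{aligned} \tag{6.3.3} \]
and we obtain the thermodynamic relations
\[ \langle N \rangle = -\left(\frac{\partial \tilde{A}}{\partial \mu}\right)_{V,T}, \qquad P = -\left(\frac{\partial \tilde{A}}{\partial V}\right)_{\mu,T}, \qquad S = -\left(\frac{\partial \tilde{A}}{\partial T}\right)_{V,\mu}. \tag{6.3.4} \]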

In the above relations, $\langle N \rangle$ denotes the average particle number. Euler’s theorem can be used to determine a relation for $\tilde{A}$ in terms of other thermodynamic variables. Since $\tilde{A}$ depends on a single extensive variable, $V$, it is a homogeneous function of degree 1 in $V$, i.e., $\tilde{A}(\mu, \lambda V, T) = \lambda \tilde{A}(\mu, V, T)$. From Euler’s theorem,
\[ \tilde{A} = V \frac{\partial \tilde{A}}{\partial V}, \tag{6.3.5} \]
which, according to eqn. (6.3.4), becomes
\[ \tilde{A} = -PV. \tag{6.3.6} \]
Thus, $-PV$ is the natural free energy of the grand canonical ensemble. Unlike other ensembles, $\tilde{A} = -PV$ is not given a unique symbol. Rather, because it leads directly to the equation of state, the free energy is simply denoted $-PV$.
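As a quick consistency check of eqn. (6.3.6), the Legendre transformation of eqn. (6.3.1) can be carried out explicitly for the classical ideal gas. The sketch below assumes Stirling’s approximation, $\ln N! \approx N \ln N - N$, and uses the standard ideal-gas Helmholtz free energy with thermal wavelength $\lambda$; these closed forms are quoted as assumptions rather than derived in this section.

```latex
\begin{align*}
A(N,V,T) &= -NkT\left[\ln\frac{V}{N\lambda^{3}} + 1\right], \\
\mu &= \left(\frac{\partial A}{\partial N}\right)_{V,T} = -kT\ln\frac{V}{N\lambda^{3}}, \\
\tilde{A} &= A - \mu N = -NkT = -PV .
\end{align*}
```

The last equality uses the ideal-gas law $PV = NkT$, so the transformed free energy reduces to $-PV$, as Euler’s theorem requires.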

6.4 Grand canonical phase space and the partition function

Since the grand canonical ensemble uses µ, V , and T as its control variables, it is convenient to think of this ensemble as a canonical ensemble coupled to a particle reservoir, which drives the fluctuations in the particle number. As the name implies, a particle reservoir is a system that can gain or lose particles without appreciably changing its own particle number. Thus, we imagine two systems coupled to a common thermal reservoir at temperature T , such that system 1 has N1 particles and volume V1 and system 2 has N2 particles and a volume V2 . The two systems can exchange particles, with system 2 acting as a particle reservoir (see Fig. 6.1). Hence, N2 ≫ N1 . The total particle number and volume are

\[ N = N_1 + N_2, \qquad V = V_1 + V_2. \tag{6.4.1} \]

Fig. 6.1 Two systems in contact with a common thermal reservoir at temperature $T$. System 1 has $N_1$ particles in a volume $V_1$; system 2 has $N_2$ particles in a volume $V_2$. The dashed lines indicate that systems 1 and 2 can exchange particles.

In order to carry out the derivation of the ensemble distribution function, we will need to consider explicitly the dependence of the Hamiltonian on particle number, usually appearing as the upper limit of sums in the kinetic and potential energies. Therefore, let $H_1(x_1, N_1)$ be the Hamiltonian of system 1 and $H_2(x_2, N_2)$ be the Hamiltonian of system 2. As usual, we will take the total Hamiltonian to be
\[ H(x, N) = H_1(x_1, N_1) + H_2(x_2, N_2). \tag{6.4.2} \]
Consider first the simpler case in which systems 1 and 2 do not exchange particles. The overall canonical partition function in this limit is
\[ \begin{aligned}
Q(N, V, T) &= \frac{1}{N! h^{3N}} \int dx_1 \int dx_2 \, e^{-\beta[H_1(x_1, N_1) + H_2(x_2, N_2)]} \\
&= \frac{N_1! N_2!}{N!} \, \frac{1}{N_1! h^{3N_1}} \int dx_1 \, e^{-\beta H_1(x_1, N_1)} \, \frac{1}{N_2! h^{3N_2}} \int dx_2 \, e^{-\beta H_2(x_2, N_2)} \\
&= \frac{N_1! N_2!}{N!} Q_1(N_1, V_1, T) Q_2(N_2, V_2, T),
\end{aligned} \tag{6.4.3} \]

where $Q_1(N_1, V_1, T)$ and $Q_2(N_2, V_2, T)$ are the canonical partition functions of systems 1 and 2, respectively, at the common temperature $T$. When the systems are allowed to exchange particles, the right side of eqn. (6.4.3) represents one specific choice of $N_1$ particles for system 1 and $N_2 = N - N_1$ particles for system 2. In order to account for particle number variations in systems 1 and 2, the true partition function must contain a sum over all possible values of $N_1$ and $N_2$ on the right side of eqn. (6.4.3) subject to the restriction that $N_1 + N_2 = N$. This restriction is accounted for by summing only $N_1$ or $N_2$ over the range $[0, N]$. For concreteness, we will carry out the sum over $N_1$ and set $N_2 = N - N_1$. Additionally, we need to weight each term in the sum by a degeneracy factor $g(N_1, N_2) = g(N_1, N - N_1)$ that accounts for the number of distinct configurations that exist for particular values of $N_1$ and $N_2$. Thus, the partition function for varying particle numbers is
\[ Q(N, V, T) = \sum_{N_1=0}^{N} g(N_1, N - N_1) \frac{N_1!(N - N_1)!}{N!} \, Q_1(N_1, V_1, T) \, Q_2(N - N_1, V - V_1, T), \tag{6.4.4} \]

where we have used the fact that $V_1 + V_2 = V$. We now determine the degeneracy factor $g(N_1, N - N_1)$. For the $N_1 = 0$ term, $g(0, N)$ represents the number of ways in which system 1 can have 0 particles and system 2 can have all $N$ particles. There is only one way to create such a configuration, hence $g(0, N) = 1$. For $N_1 = 1$, $g(1, N - 1)$ represents the number of ways in which system 1 can have one particle and system 2 can have $(N - 1)$ particles. Since there are $N$ ways to choose that one particle to place in system 1, it follows that $g(1, N - 1) = N$. When $N_1 = 2$, we need to place two particles in system 1. The first particle can be chosen in $N$ ways, while the second can be chosen in $(N - 1)$ ways, which seems to lead to a product $N(N - 1)$ ways that this configuration can be created. However, choosing particle 1, for example, as the first particle to put into system 1 and particle 2 as the second particle leads to the same physical configuration as choosing particle 2 as the first particle and particle 1 as the second. Thus, the degeneracy factor $g(2, N - 2)$ is actually $N(N - 1)/2$. In general, $g(N_1, N - N_1)$ is nothing more than the number of ways of placing $N$ “labeled” objects into 2 containers, with $N_1$ objects in the first and $N - N_1$ in the second, which is just the well-known binomial coefficient
\[ g(N_1, N - N_1) = \frac{N!}{N_1!(N - N_1)!}. \tag{6.4.5} \]
We can check eqn. (6.4.5) against the specific examples we analyzed:
\[ \begin{aligned}
g(0, N) &= \frac{N!}{0!\,N!} = 1 \\
g(1, N - 1) &= \frac{N!}{1!\,(N - 1)!} = N \\
g(2, N - 2) &= \frac{N!}{2!\,(N - 2)!} = \frac{N(N - 1)}{2}.
\end{aligned} \tag{6.4.6} \]

Interestingly, the degeneracy factor exactly cancels the $N_1!(N - N_1)!/N!$ appearing in eqn. (6.4.4). This cancellation is not unexpected since, as we recall, the latter factor was included as a “fudge factor” to correct for the fact that classical particles are always distinguishable, and we need our results to be consistent with the indistinguishable nature of the particles (recall Section 3.5.1). Thus, all $N$ configurations in which one particle is in system 1 are physically the same, and so forth. Inserting eqn. (6.4.5) into eqn. (6.4.4) gives
\[ Q(N, V, T) = \sum_{N_1=0}^{N} Q_1(N_1, V_1, T) \, Q_2(N - N_1, V - V_1, T). \tag{6.4.7} \]
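The cancellation of the degeneracy factor and the resulting sum in eqn. (6.4.7) can be verified exactly for a simple case. The sketch below assumes two ideal-gas subsystems with the thermal wavelength set to 1, so that each canonical partition function reduces to $V^N/N!$; the function names and the particular values of $N$, $V_1$, and $V$ are illustrative assumptions. Exact rational arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction
from math import comb, factorial


def g(n1, n):
    """Degeneracy factor of eqn. (6.4.5): ways to choose n1 of n labeled particles."""
    return comb(n, n1)


def q_ideal(n, v):
    """Ideal-gas partition function v**n / n! with the thermal wavelength set to 1."""
    return Fraction(v) ** n / factorial(n)


def q_total(n, v1, v):
    """Right-hand side of eqn. (6.4.7) for two ideal-gas subsystems."""
    return sum(q_ideal(n1, v1) * q_ideal(n - n1, v - v1) for n1 in range(n + 1))


N, V1, V = 6, Fraction(3, 2), Fraction(5)   # illustrative values (assumption)

# The degeneracy factor cancels the N1!(N-N1)!/N! prefactor term by term.
cancels = all(
    g(n1, N) * Fraction(factorial(n1) * factorial(N - n1), factorial(N)) == 1
    for n1 in range(N + 1)
)

# For ideal gases the sum over N1 reproduces Q(N, V, T) by the binomial theorem.
print(cancels, q_total(N, V1, V) == q_ideal(N, V))
```

For interacting systems the sum in eqn. (6.4.7) is not available in closed form; the ideal gas is chosen here only because the binomial theorem makes the identity exact.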

Now the total phase-space distribution function
\[ f(x, N) = \frac{e^{-\beta H(x, N)}}{N! h^{3N} Q(N, V, T)} \tag{6.4.8} \]
satisfies the normalization condition
\[ \int dx \, f(x, N) = 1, \tag{6.4.9} \]

since it is just a canonical distribution. However, the phase-space distribution of system 1, obtained by integrating over $x_2$ according to
\[ \begin{aligned}
f_1(x_1, N_1) &= \left[\frac{e^{-\beta H_1(x_1, N_1)}}{Q(N, V, T) \, N_1! h^{3N_1}}\right] \left[\frac{1}{(N - N_1)! \, h^{3(N - N_1)}} \int dx_2 \, e^{-\beta H_2(x_2, N - N_1)}\right] \\
&= \frac{Q_2(N - N_1, V - V_1, T)}{Q(N, V, T)} \, \frac{1}{N_1! h^{3N_1}} \, e^{-\beta H_1(x_1, N_1)},
\end{aligned} \tag{6.4.10} \]
satisfies the normalization condition
\[ \sum_{N_1=0}^{N} \int dx_1 \, f_1(x_1, N_1) = 1. \tag{6.4.11} \]

Since the total partition function is canonical, $Q(N, V, T) = \exp[-\beta A(N, V, T)]$ where $A(N, V, T)$ is the Helmholtz free energy, and it follows that
\[ \frac{Q_2(N - N_1, V - V_1, T)}{Q(N, V, T)} = e^{-\beta[A(N - N_1, V - V_1, T) - A(N, V, T)]}, \tag{6.4.12} \]

where we have assumed that system 1 and system 2 are described by the same set of physical interactions, so that the functional form of the free energy is the same for both systems and for the total system. Since $N \gg N_1$ and $V \gg V_1$, we may expand $A(N - N_1, V - V_1, T)$ about $N_1 = 0$ and $V_1 = 0$. To first order, the expansion yields
\[ \begin{aligned}
A(N - N_1, V - V_1, T) &\approx A(N, V, T) - \frac{\partial A}{\partial N} N_1 - \frac{\partial A}{\partial V} V_1 \\
&= A(N, V, T) - \mu N_1 + P V_1.
\end{aligned} \tag{6.4.13} \]

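Thus, the phase-space distribution of system 1 becomes
\[ \begin{aligned}
f_1(x_1, N_1) &= \frac{1}{N_1! h^{3N_1}} \, e^{\beta \mu N_1} e^{-\beta P V_1} e^{-\beta H_1(x_1, N_1)} \\
&= \frac{1}{N_1! h^{3N_1}} \, e^{\beta \mu N_1} \frac{1}{e^{\beta P V_1}} \, e^{-\beta H_1(x_1, N_1)}.
\end{aligned} \tag{6.4.14} \]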

Since system 2 quantities no longer appear in eqn. (6.4.14), we may drop the “1” subscript and write the phase-space distribution for the grand canonical ensemble as
\[ f(x, N) = \frac{1}{N! h^{3N}} \, e^{\beta \mu N} \frac{1}{e^{\beta P V}} \, e^{-\beta H(x, N)}. \tag{6.4.15} \]

Moreover, taking the thermodynamic limit, the summation over $N$ is now unrestricted ($N \in [0, \infty)$), so the normalization condition becomes
\[ \sum_{N=0}^{\infty} \int dx \, f(x, N) = 1, \tag{6.4.16} \]
which implies that
\[ \frac{1}{e^{\beta P V}} \sum_{N=0}^{\infty} e^{\beta \mu N} \frac{1}{N! h^{3N}} \int dx \, e^{-\beta H(x, N)} = 1. \tag{6.4.17} \]

Taking the $\exp(\beta P V)$ factor to the right side, we obtain
\[ \sum_{N=0}^{\infty} e^{\beta \mu N} \frac{1}{N! h^{3N}} \int dx \, e^{-\beta H(x, N)} = e^{\beta P V}. \tag{6.4.18} \]

However, recall that $-PV = \tilde{A}(\mu, V, T)$ is the free energy of the grand canonical ensemble. Thus, $\exp(\beta P V) = \exp[-\beta(-PV)]$ is equal to the partition function. In the grand canonical ensemble, we denote the partition function as $Z(\mu, V, T)$, and it is given by
\[ \begin{aligned}
Z(\mu, V, T) &= \sum_{N=0}^{\infty} e^{\beta \mu N} \frac{1}{N! h^{3N}} \int dx \, e^{-\beta H(x, N)} \\
&= \sum_{N=0}^{\infty} e^{\beta \mu N} Q(N, V, T).
\end{aligned} \tag{6.4.19} \]

The product $PV$ is thus related to $Z(\mu, V, T)$ by
\[ \frac{PV}{kT} = \ln Z(\mu, V, T). \tag{6.4.20} \]

According to eqn. (6.4.20), the equation of state can be obtained directly from the partition function in the grand canonical ensemble. Recall, however, that the equation of state is of the general form (cf. eqn. (2.2.1))
\[ g(\langle N \rangle, P, V, T) = 0, \tag{6.4.21} \]
which is a function of $\langle N \rangle$ rather than $\mu$. This suggests that a second equation for the average particle number $\langle N \rangle$ is needed. By definition,
\[ \langle N \rangle = \frac{1}{Z(\mu, V, T)} \sum_{N=0}^{\infty} N e^{\beta \mu N} Q(N, V, T), \tag{6.4.22} \]

which can be expressed as a derivative of $Z$ with respect to $\mu$ as
\[ \langle N \rangle = kT \left(\frac{\partial \ln Z(\mu, V, T)}{\partial \mu}\right)_{V,T}. \tag{6.4.23} \]
Equations (6.4.23) and (6.4.20) give a prescription for finding the equation of state in the grand canonical ensemble. Equation (6.4.23) must be solved for $\mu$ in terms of $\langle N \rangle$ and then substituted back into eqn. (6.4.20) in order to obtain an equation in the proper form.
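The equivalence of eqns. (6.4.22) and (6.4.23) can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the sum over $N$ is truncated at `n_max`, the derivative is taken by central differences, and the test case is an ideal gas with $V/\lambda^3 = 50$ in units where $kT = 1$. All function names and parameter values are hypothetical. Working with $\ln Q$ avoids overflow in the factorials.

```python
import math


def log_Z(mu, log_Q, beta, n_max):
    """ln Z = ln sum_N exp(beta*mu*N + ln Q(N)), eqn. (6.4.19), truncated at n_max."""
    terms = [beta * mu * n + log_Q(n) for n in range(n_max + 1)]
    t_max = max(terms)
    return t_max + math.log(sum(math.exp(t - t_max) for t in terms))


def mean_N_direct(mu, log_Q, beta, n_max):
    """<N> from the weighted sum of eqn. (6.4.22)."""
    lz = log_Z(mu, log_Q, beta, n_max)
    return sum(n * math.exp(beta * mu * n + log_Q(n) - lz) for n in range(n_max + 1))


def mean_N_derivative(mu, log_Q, beta, n_max, h=1e-5):
    """<N> = kT d(ln Z)/d(mu), eqn. (6.4.23), by central differences."""
    diff = log_Z(mu + h, log_Q, beta, n_max) - log_Z(mu - h, log_Q, beta, n_max)
    return diff / (2.0 * h * beta)


def log_Q_ideal(n):
    """ln Q for an ideal gas with V/lambda^3 = 50 (assumed test case)."""
    return n * math.log(50.0) - math.lgamma(n + 1)


beta, mu = 1.0, -0.5
n_direct = mean_N_direct(mu, log_Q_ideal, beta, 400)
n_deriv = mean_N_derivative(mu, log_Q_ideal, beta, 400)
print(n_direct, n_deriv)
```

Any other model can be substituted by passing a different `log_Q`; only the truncation `n_max` needs to be large enough that the neglected terms are negligible.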


For other thermodynamic quantities, it is convenient to introduce a new variable
\[ \zeta = e^{\beta \mu} \tag{6.4.24} \]
known as the fugacity. Since $\zeta$ and $\mu$ are directly related, the fugacity can be viewed as an alternative external control variable for the grand canonical ensemble, and the partition function can be expressed in terms of $\zeta$ as
\[ Z(\zeta, V, T) = \sum_{N=0}^{\infty} \zeta^N Q(N, V, T), \tag{6.4.25} \]
so that
\[ \frac{PV}{kT} = \ln Z(\zeta, V, T). \tag{6.4.26} \]
Since
\[ \frac{\partial}{\partial \mu} = \frac{\partial \zeta}{\partial \mu} \frac{\partial}{\partial \zeta} = \beta \zeta \frac{\partial}{\partial \zeta}, \tag{6.4.27} \]
the average particle number can be computed from $Z(\zeta, V, T)$ by
\[ \langle N \rangle = \zeta \frac{\partial}{\partial \zeta} \ln Z(\zeta, V, T). \tag{6.4.28} \]

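Thus, the equation of state results when eqn. (6.4.28) is solved for $\zeta$ in terms of $\langle N \rangle$ and substituted back into eqn. (6.4.26). Other thermodynamic quantities can be obtained as well. The average energy, $E = \langle H(x, N) \rangle$, is given by
\[ \begin{aligned}
E = \langle H(x, N) \rangle &= \frac{1}{Z} \sum_{N=0}^{\infty} \zeta^N \frac{1}{N! h^{3N}} \int dx \, H(x, N) e^{-\beta H(x, N)} \\
&= -\left(\frac{\partial}{\partial \beta} \ln Z(\zeta, V, T)\right)_{\zeta,V}.
\end{aligned} \tag{6.4.29} \]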

In eqn. (6.4.29), it must be emphasized that the average energy is computed as the derivative with respect to $\beta$ of $\ln Z$ at fixed $V$ and $\zeta$ rather than at fixed $V$ and $\mu$. The entropy is given in terms of the derivative of the free energy with respect to $T$:
\[ \begin{aligned}
S(\mu, V, T) &= -\left(\frac{\partial(-PV)}{\partial T}\right)_{\mu,V} \\
&= k \ln Z(\mu, V, T) - k\beta \left(\frac{\partial \ln Z(\mu, V, T)}{\partial \beta}\right)_{\mu,V}.
\end{aligned} \tag{6.4.30} \]

For the entropy, the temperature derivative must be taken at fixed µ rather than at fixed ζ.

6.5 Grand canonical ensemble via entropy maximization

As we did in Chapters 4 and 5, we will use entropy maximization to derive the phase-space distribution function of the grand canonical ensemble. Following the procedure of Section 4.4 and Section 5.4, we define the Shannon entropy functional in terms of the distribution function $f(x, N)$:
\[ S[f] = -k \sum_{N=0}^{\infty} \int dx \, f(x, N) \ln f(x, N). \tag{6.5.1} \]

The correct phase-space distribution of the grand canonical ensemble is the one that maximizes the entropy subject to a constraint of normalization (cf. eqn. (6.4.16)) and the following constraints on the ensemble average of the Hamiltonian $H(x)$ and particle number:
\[ \begin{aligned}
\sum_{N=0}^{\infty} \int dx \, H(x) f(x, N) &= E \\
\sum_{N=0}^{\infty} \int dx \, N f(x, N) &= \langle N \rangle.
\end{aligned} \tag{6.5.2} \]

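Here, $E$ and $\langle N \rangle$ are the correct grand canonical ensemble averages. The three constraints will be incorporated with Lagrange multipliers $\mu$ (not to be confused with the chemical potential), $\lambda$, and $\gamma$ via the modified entropy functional
\[ \begin{aligned}
\tilde{S}[f] = {} & -k \sum_{N=0}^{\infty} \int dx \, f(x, N) \ln f(x, N) - \mu \left(\sum_{N=0}^{\infty} \int dx \, f(x, N) - 1\right) \\
& - \lambda \left(\sum_{N=0}^{\infty} \int dx \, H(x) f(x, N) - E\right) - \gamma \left(\sum_{N=0}^{\infty} \int dx \, N f(x, N) - \langle N \rangle\right).
\end{aligned} \tag{6.5.3} \]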

Taking the functional derivative and setting it equal to 0, we obtain
\[ \frac{\delta \tilde{S}}{\delta f(x, N)} = -k\left[\ln f(x, N) + 1\right] - \mu - \lambda H(x) - \gamma N = 0. \tag{6.5.4} \]

Solving for $f(x, N)$, we find
\[ f(x, N) = e^{-(1 + \mu/k)} e^{-\lambda H(x)/k} e^{-\gamma N/k}. \tag{6.5.5} \]

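The Lagrange multipliers are determined by requiring that $\tilde{S}$ satisfy the conditions
\[ \frac{\partial \tilde{S}}{\partial E} = \frac{1}{T}, \qquad \frac{\partial \tilde{S}}{\partial \langle N \rangle} = -\frac{\mu}{T}. \tag{6.5.6} \]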

However, from eqn. (6.5.3), we see that $\partial \tilde{S}/\partial E = \lambda$ and $\partial \tilde{S}/\partial \langle N \rangle = \gamma$, from which it follows that $\lambda = 1/T$ and $\gamma = -\mu/T$. Substituting these into eqn. (6.5.5), we obtain
\[ f(x, N) = e^{-(1 + \mu/k)} e^{-H(x)/kT} e^{\mu N/kT}, \tag{6.5.7} \]

and after normalization, we find
\[ f(x, N) = \frac{C_N \, e^{-(H(x) - \mu N)/kT}}{Z(\mu, V, T)}, \tag{6.5.8} \]

which is the expected result for the grand canonical phase-space distribution.
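The maximum-entropy property can be illustrated on a small discrete analogue of the grand canonical phase space. The sketch below is a toy model, not the continuous derivation of this section: the set of states, the values of $\beta$ and $\mu$, and the helper names are all assumptions made for illustration. It builds the grand canonical weights, perturbs them in a direction that preserves normalization, the average energy, and the average particle number, and compares the Shannon entropies (in units of $k$).

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def entropy(p):
    """Shannon entropy -sum_i p_i ln p_i, in units of k."""
    return -sum(pi * math.log(pi) for pi in p)


def remove_components(v, constraints):
    """Project v onto the orthogonal complement of the constraint vectors (Gram-Schmidt)."""
    ortho = []
    for c in constraints:
        c = list(c)
        for q in ortho:
            overlap = dot(c, q)
            c = [ci - overlap * qi for ci, qi in zip(c, q)]
        norm = math.sqrt(dot(c, c))
        ortho.append([ci / norm for ci in c])
    for q in ortho:
        overlap = dot(v, q)
        v = [vi - overlap * qi for vi, qi in zip(v, q)]
    return v


# Hypothetical discrete states labeled by particle number n and energy e.
states = [(n, 0.5 * n + 0.3 * j) for n in range(4) for j in range(3)]
N_vals = [float(n) for n, _ in states]
E_vals = [e for _, e in states]
beta, mu = 1.2, -0.4

# Grand canonical weights exp[-beta (E - mu N)], normalized by their sum.
weights = [math.exp(-beta * (e - mu * n)) for n, e in states]
Z = sum(weights)
p_gc = [w / Z for w in weights]

# Random perturbation that leaves normalization, <E>, and <N> unchanged.
rng = random.Random(1)
delta = remove_components(
    [rng.gauss(0.0, 1.0) for _ in states],
    [[1.0] * len(states), E_vals, N_vals],
)
eps = 0.5 * min(p_gc) / max(abs(d) for d in delta)   # keeps all probabilities positive
p_other = [pi + eps * di for pi, di in zip(p_gc, delta)]

# The constrained competitor should have the lower entropy.
print(entropy(p_gc), entropy(p_other))
```

Because the entropy is strictly concave and the three constraints are linear, any admissible perturbation of the grand canonical weights lowers the entropy; the random direction used here is just one example.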

6.6 Illustration of the grand canonical ensemble: The ideal gas

In Chapter 11, the grand canonical ensemble will be used to derive the properties of the quantum ideal gases. It will be seen that the use of the grand canonical ensemble greatly simplifies the treatment over the canonical ensemble. Thus, in order to prepare for this analysis, it is instructive to illustrate the grand canonical procedure for deriving the equation of state with a simple example, namely, the classical ideal gas. Since the partition function of the grand canonical ensemble is given by eqn. (6.4.25), we can start by recalling the expression of the canonical partition function of the classical ideal gas
\[ Q(N, V, T) = \frac{1}{N!} \left[V \left(\frac{2\pi m}{\beta h^2}\right)^{3/2}\right]^N = \frac{1}{N!} \left(\frac{V}{\lambda^3}\right)^N. \tag{6.6.1} \]
Substituting this expression into eqn. (6.4.25) gives
\[ Z(\zeta, V, T) = \sum_{N=0}^{\infty} \zeta^N \frac{1}{N!} \left(\frac{V}{\lambda^3}\right)^N = \sum_{N=0}^{\infty} \frac{1}{N!} \left(\frac{V\zeta}{\lambda^3}\right)^N. \tag{6.6.2} \]

Equation (6.6.2) is in the form of a Taylor series expansion for the exponential:
\[ e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}. \tag{6.6.3} \]
Equation (6.6.2) can, therefore, be summed over $N$ to yield
\[ Z(\zeta, V, T) = e^{V\zeta/\lambda^3}. \tag{6.6.4} \]

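The procedure embodied in eqns. (6.4.28) and (6.4.26) requires first the calculation of $\zeta$ as a function of $\langle N \rangle$. From eqn. (6.4.28),
\[ \langle N \rangle = \zeta \frac{\partial}{\partial \zeta} \ln Z(\zeta, V, T) = \frac{V\zeta}{\lambda^3}. \tag{6.6.5} \]
Thus,
\[ \zeta(\langle N \rangle) = \frac{\langle N \rangle \lambda^3}{V}. \tag{6.6.6} \]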

From eqn. (6.6.4), we have
\[ \frac{PV}{kT} = \ln Z(\zeta, V, T) = \frac{V\zeta}{\lambda^3}. \tag{6.6.7} \]
By substituting $\zeta(\langle N \rangle)$ into eqn. (6.6.7), the expected equation of state results:
\[ \frac{PV}{kT} = \langle N \rangle, \tag{6.6.8} \]

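which contains the average particle number $\langle N \rangle$ instead of $N$ as would appear in the canonical ensemble. Similarly, the average energy is given by
\[ E = -\frac{\partial}{\partial \beta} \ln Z(\zeta, V, T) = -\frac{\partial}{\partial \beta} \frac{V\zeta}{\lambda^3} = \frac{3V\zeta}{\lambda^4} \frac{\partial \lambda}{\partial \beta} = \frac{3}{2} \langle N \rangle kT. \tag{6.6.9} \]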

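Finally, in order to compute the entropy, $Z$ must be expressed in terms of $\mu$ rather than $\zeta$, i.e.,
\[ \ln Z(\mu, V, T) = \frac{V e^{\beta \mu}}{\lambda^3}. \tag{6.6.10} \]
Then,
\[ \begin{aligned}
S(\mu, V, T) &= k \ln Z(\mu, V, T) - k\beta \left(\frac{\partial \ln Z(\mu, V, T)}{\partial \beta}\right)_{\mu,V} \\
&= k \frac{V e^{\beta \mu}}{\lambda^3} - k\beta \left[\frac{V \mu e^{\beta \mu}}{\lambda^3} - \frac{3 V e^{\beta \mu}}{\lambda^4} \frac{\partial \lambda}{\partial \beta}\right].
\end{aligned} \tag{6.6.11} \]
Using the facts that
\[ \frac{V e^{\beta \mu}}{\lambda^3} = \frac{V\zeta}{\lambda^3} = \langle N \rangle, \qquad \frac{\partial \lambda}{\partial \beta} = \frac{\lambda}{2\beta}, \tag{6.6.12} \]
we obtain
\[ \begin{aligned}
S &= k \langle N \rangle - k\beta \langle N \rangle kT \ln \zeta + k\beta \, \frac{3}{2} \frac{\langle N \rangle}{\beta} \\
&= \frac{5}{2} \langle N \rangle k - \langle N \rangle k \ln\left(\frac{\langle N \rangle \lambda^3}{V}\right) \\
&= \frac{5}{2} \langle N \rangle k + \langle N \rangle k \ln\left(\frac{V}{\langle N \rangle \lambda^3}\right),
\end{aligned} \tag{6.6.13} \]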

which is the Sackur-Tetrode equation derived in Section 3.5.1. Note that because the 1/N ! is included a posteriori in the expression for Q(N, V, T ), the correct quantum mechanical entropy expression results.
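The entropy expression of eqn. (6.6.13) can be cross-checked against the definition $S = -(\partial(-PV)/\partial T)_{\mu,V}$ by numerical differentiation. The sketch below assumes reduced units in which $k = 1$ and $h^2/(2\pi m) = 1$, so that $\lambda = T^{-1/2}$; the function names and the values of $\mu$, $V$, and $T$ are illustrative assumptions.

```python
import math


def thermal_wavelength(T):
    """lambda = sqrt(beta h^2 / (2 pi m)) in reduced units with h^2/(2 pi m) = 1, k = 1."""
    return math.sqrt(1.0 / T)


def ln_Z(mu, V, T):
    """ln Z(mu, V, T) = V exp(beta mu) / lambda^3, eqn. (6.6.10)."""
    return V * math.exp(mu / T) / thermal_wavelength(T) ** 3


def entropy_numeric(mu, V, T, h=1e-5):
    """S = d(PV)/dT at fixed mu and V, with PV = kT ln Z, by central differences."""
    def pv(t):
        return t * ln_Z(mu, V, t)
    return (pv(T + h) - pv(T - h)) / (2.0 * h)


def entropy_sackur_tetrode(mu, V, T):
    """S = <N> k [5/2 + ln(V / (<N> lambda^3))], eqn. (6.6.13), with <N> = ln Z."""
    n_avg = ln_Z(mu, V, T)
    return n_avg * (2.5 + math.log(V / (n_avg * thermal_wavelength(T) ** 3)))


mu, V, T = -2.0, 100.0, 1.5        # illustrative state point (assumption)
s_numeric = entropy_numeric(mu, V, T)
s_formula = entropy_sackur_tetrode(mu, V, T)
print(s_numeric, s_formula)
```

Note that the temperature derivative is taken at fixed $\mu$, not fixed $\zeta$, in line with the remark following eqn. (6.4.30).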

6.7 Particle number fluctuations in the grand canonical ensemble

In the grand canonical ensemble, the total particle number fluctuates at constant chemical potential. It is, therefore, instructive to analyze these fluctuations, as was done for the energy fluctuations in the canonical ensemble (Section 4.5) and volume fluctuations in the isothermal-isobaric ensemble (see Problem 5.2 in Chapter 5). Particle number fluctuations in the grand canonical ensemble can be studied by considering the standard deviation (the square root of the variance)
\[ \Delta N = \sqrt{\langle N^2 \rangle - \langle N \rangle^2}. \tag{6.7.1} \]

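In order to compute this quantity, we start by examining the operation
\[ \zeta \frac{\partial}{\partial \zeta} \, \zeta \frac{\partial}{\partial \zeta} \ln Z(\zeta, V, T). \tag{6.7.2} \]
Using eqn. (6.4.25), this becomes
\[ \begin{aligned}
\zeta \frac{\partial}{\partial \zeta} \, \zeta \frac{\partial}{\partial \zeta} \ln Z(\zeta, V, T) &= \zeta \frac{\partial}{\partial \zeta} \frac{1}{Z} \sum_{N=0}^{\infty} N \zeta^N Q(N, V, T) \\
&= \frac{1}{Z} \sum_{N=0}^{\infty} N^2 \zeta^N Q(N, V, T) - \frac{1}{Z^2} \left[\sum_{N=0}^{\infty} N \zeta^N Q(N, V, T)\right]^2 \\
&= \langle N^2 \rangle - \langle N \rangle^2.
\end{aligned} \tag{6.7.3} \]
Thus, we have
\[ (\Delta N)^2 = \zeta \frac{\partial}{\partial \zeta} \, \zeta \frac{\partial}{\partial \zeta} \ln Z(\zeta, V, T). \tag{6.7.4} \]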

Expressing eqn. (6.7.4) as derivatives of $Z(\mu, V, T)$ with respect to $\mu$, we obtain
\[ (\Delta N)^2 = (kT)^2 \frac{\partial^2}{\partial \mu^2} \ln Z(\mu, V, T) = (kT)^2 \frac{\partial^2}{\partial \mu^2} \frac{PV}{kT}. \tag{6.7.5} \]
Since $\mu$, $V$, and $T$ are the independent variables in the ensemble, only the pressure in the above expression depends on $\mu$, and we can write
\[ (\Delta N)^2 = kTV \frac{\partial^2 P}{\partial \mu^2}. \tag{6.7.6} \]

Therefore, computing the particle number fluctuations amounts to computing the second derivative of the pressure with respect to chemical potential. This is a rather nontrivial bit of thermodynamics, which can be carried out in a variety of ways. One approach is the following: Let $A(N, V, T)$ be the canonical Helmholtz free energy at a particular value of $N$. Recall that the pressure can be obtained from $A(N, V, T)$ via
\[ P = -\left(\frac{\partial A}{\partial V}\right). \tag{6.7.7} \]
Since $A(N, V, T)$ is an extensive quantity, and we want to make the $N$ dependence in the analysis as explicit as possible, we define an intensive Helmholtz free energy $a(v, T)$ by
\[ a(v, T) = \frac{1}{N} A\left(N, \frac{V}{N}, T\right), \tag{6.7.8} \]
where $v = V/N$ is the volume per particle and $a(v, T)$ is clearly the Helmholtz free energy per particle. Then,
\[ P = -N \frac{\partial a}{\partial v} \frac{\partial v}{\partial V} = -N \frac{\partial a}{\partial v} \frac{1}{N} = -\frac{\partial a}{\partial v}. \tag{6.7.9} \]

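From eqn. (6.7.9), it follows that
\[ \frac{\partial P}{\partial \mu} = \frac{\partial P}{\partial v} \frac{\partial v}{\partial \mu} = -\frac{\partial^2 a}{\partial v^2} \frac{\partial v}{\partial \mu}. \tag{6.7.10} \]
We can obtain an expression for $\partial \mu/\partial v$ by
\[ \begin{aligned}
\mu = \frac{\partial A}{\partial N} &= a(v, T) + N \frac{\partial a}{\partial v} \frac{\partial v}{\partial N} \\
&= a(v, T) - v \frac{\partial a}{\partial v},
\end{aligned} \tag{6.7.11} \]
so that
\[ \begin{aligned}
\frac{\partial \mu}{\partial v} &= \frac{\partial a}{\partial v} - \frac{\partial a}{\partial v} - v \frac{\partial^2 a}{\partial v^2} \\
&= -v \frac{\partial^2 a}{\partial v^2}.
\end{aligned} \tag{6.7.12} \]
Substituting this result into eqn. (6.7.10) gives
\[ \frac{\partial P}{\partial \mu} = -\frac{\partial^2 a}{\partial v^2} \left[\frac{\partial \mu}{\partial v}\right]^{-1} = \frac{\partial^2 a}{\partial v^2} \left[v \frac{\partial^2 a}{\partial v^2}\right]^{-1} = \frac{1}{v}. \tag{6.7.13} \]
Differentiating eqn. (6.7.13) once again with respect to $\mu$ gives
\[ \frac{\partial^2 P}{\partial \mu^2} = -\frac{1}{v^2} \frac{\partial v}{\partial \mu} = \frac{1}{v^2} \left[v \frac{\partial^2 a}{\partial v^2}\right]^{-1} = -\frac{1}{v^3 \, \partial P/\partial v}. \tag{6.7.14} \]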

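Now, recall that the isothermal compressibility is given by
\[ \kappa_T = -\frac{1}{V} \frac{\partial V}{\partial P} = -\frac{1}{v} \frac{\partial v}{\partial P} = -\frac{1}{v \, \partial P/\partial v} \tag{6.7.15} \]
and is an intensive quantity. It is clear from eqn. (6.7.14) that $\partial^2 P/\partial \mu^2$ can be expressed in terms of $\kappa_T$ as
\[ \frac{\partial^2 P}{\partial \mu^2} = \frac{1}{v^2} \kappa_T, \tag{6.7.16} \]
so that
\[ (\Delta N)^2 = kT \langle N \rangle v \, \frac{1}{v^2} \kappa_T = \frac{\langle N \rangle kT \kappa_T}{v}, \tag{6.7.17} \]
where the specific value of $N$ has been replaced by its average value $\langle N \rangle$ in the grand canonical ensemble. The relative fluctuations in particle number can now be computed from
\[ \frac{\Delta N}{\langle N \rangle} = \frac{1}{\langle N \rangle} \sqrt{\frac{\langle N \rangle kT \kappa_T}{v}} = \sqrt{\frac{kT \kappa_T}{\langle N \rangle v}} \sim \frac{1}{\sqrt{\langle N \rangle}}. \tag{6.7.18} \]
Thus, as $\langle N \rangle \longrightarrow \infty$ in the thermodynamic limit, the relative particle-number fluctuations vanish and the grand canonical ensemble is seen to be equivalent to the other ensembles in this limit.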

6.8 Potential distribution theorem

Up to this point, our treatment of the grand canonical ensemble has been restricted to one-component systems of $N$ particles. Extending the ensemble formalism to mixed systems, such as solutions or co-crystals, that contain multiple components, is straightforward in all ensembles except the grand canonical ensemble, since for mixtures we need to do the bookkeeping of the extra particle-number sums in constructing the partition function and ensemble averages. As a vehicle for illustrating the treatment of multiple species within the grand canonical ensemble, we will present a derivation of an important result in the theory of solutions, namely the potential distribution theorem, which was originally derived by Widom (1963, 1982). This theorem provides a statistical mechanical framework for computing the excess chemical potential associated with the addition of an extra solute molecule to a solution, a topic we will revisit in Chapter 8. Let us consider a solution containing $N_l$ solvent particles and $N_s$ solute particles. Even though $N_l + N_s = N$, the total particle number, $N$, varies from 0 to $\infty$, which means we can vary $N_l$ and $N_s$ separately. Let $\mathbf{r}_1^{(l)}, \ldots, \mathbf{r}_{N_l}^{(l)} \equiv \mathbf{r}^{(l)}$ denote the solvent particle coordinates, and let $\mathbf{r}_1^{(s)}, \ldots, \mathbf{r}_{N_s}^{(s)} \equiv \mathbf{r}^{(s)}$ denote the solute particle coordinates. The grand canonical partition function of the solution is
\[ Z(\zeta_l, \zeta_s, V, T) = e^{\beta P V} = \sum_{N_l=0}^{\infty} \sum_{N_s=0}^{\infty} \zeta_l^{N_l} \zeta_s^{N_s} Q(N_l, N_s, V, T). \tag{6.8.1} \]

Equation (6.8.1) reflects the fact that each species in the mixture has its own chemical potential and corresponding fugacity: $\beta \mu_l = \ln \zeta_l$, $\beta \mu_s = \ln \zeta_s$. The potential distribution theorem starts by considering the average number of solute particles in the solution, which can be expressed as a grand canonical ensemble average of the form
\[ \langle N_s \rangle = e^{-\beta P V} \sum_{N_l=0}^{\infty} \sum_{N_s=0}^{\infty} N_s \zeta_l^{N_l} \zeta_s^{N_s} Q(N_l, N_s, V, T). \tag{6.8.2} \]
The presence of $N_s$ in the summand means that the $N_s$ sum begins at 1, since the first term is 0. Thus,
\[ \langle N_s \rangle = e^{-\beta P V} \sum_{N_l=0}^{\infty} \sum_{N_s=1}^{\infty} N_s \zeta_l^{N_l} \zeta_s^{N_s} Q(N_l, N_s, V, T). \tag{6.8.3} \]
If we require that the $N_s$ sum start at 0, we need to replace $N_s$ by $N_s + 1$, in which case, eqn. (6.8.3) becomes
\[ \langle N_s \rangle = e^{-\beta P V} \sum_{N_l=0}^{\infty} \sum_{N_s=0}^{\infty} (N_s + 1) \zeta_l^{N_l} \zeta_s^{N_s + 1} Q(N_l, N_s + 1, V, T). \tag{6.8.4} \]

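We now factor $\zeta_s$ out of eqn. (6.8.4) and multiply and divide by a partition function factor $Q(0, 1, V, T)$, which is simply the partition function of one solute particle. Note that, once one power of $\zeta_s$ has been factored out, the summand contains $\zeta_s^{N_s}$ rather than $\zeta_s^{N_s+1}$. This gives
\[ \langle N_s \rangle = e^{-\beta P V} \zeta_s Q(0, 1, V, T) \sum_{N_l=0}^{\infty} \sum_{N_s=0}^{\infty} (N_s + 1) \zeta_l^{N_l} \zeta_s^{N_s} \frac{Q(N_l, N_s + 1, V, T)}{Q(0, 1, V, T)}. \tag{6.8.5} \]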

The factor $Q(0, 1, V, T)$ is the partition function of a single solute particle in the gas phase and is, therefore, given by the canonical ideal gas relation with $N_s = 1$:
\[ Q(0, 1, V, T) = \frac{q_s^{\rm (int)} V}{\Lambda_s^3}. \tag{6.8.6} \]
If a solute “particle” is a molecule of $n$ atoms, then $q_s^{\rm (int)}$ results from the internal potential of the molecule and $V$ is the volume factor associated with the center of mass (see Problem 4.13). The factor $\Lambda_s$ is simply a product of the individual thermal wavelengths of each of the $n$ atoms, $\Lambda_s = \lambda_{s,1} \cdots \lambda_{s,n}$. Obviously, if the solute particle is a single atom, then $q_s = V/\lambda_s^3$. Introducing the solute density $\rho_s = \langle N_s \rangle/V$, eqn. (6.8.5) can be written as
\[ \frac{\rho_s \Lambda_s^3}{\zeta_s q_s^{\rm (int)}} = e^{-\beta P V} \sum_{N_l=0}^{\infty} \sum_{N_s=0}^{\infty} (N_s + 1) \zeta_l^{N_l} \zeta_s^{N_s} \frac{Q(N_l, N_s + 1, V, T)}{Q(0, 1, V, T)}. \tag{6.8.7} \]

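The partition function $Q(N_l, N_s + 1, V, T)$ requires specifying the potential for $N_l$ solvent particles and $N_s + 1$ solute particles, which we write as
\[ U(\mathbf{r}^{(l)}, \mathbf{r}^{(s)}, \mathbf{r}^{(s)}_{N_s+1}) = U(\mathbf{r}^{(l)}, \mathbf{r}^{(s)}) + \Delta U(\mathbf{r}^{(l)}, \mathbf{r}^{(s)}, \mathbf{r}^{(s)}_{N_s+1}), \tag{6.8.8} \]
where $U(\mathbf{r}^{(l)}, \mathbf{r}^{(s)})$ is the potential of the original solution and $\Delta U(\mathbf{r}^{(l)}, \mathbf{r}^{(s)}, \mathbf{r}^{(s)}_{N_s+1})$ is the interaction between the $(N_s + 1)$st particle and the solution. With the definitions of the terms in eqn. (6.8.8), we can write
\[ \begin{aligned}
(N_s + 1) Q(N_l, N_s + 1, V, T) = {} & \frac{(N_s + 1)}{(N_s + 1)! \, N_l! \, \Lambda_l^{3N_l} \Lambda_s^{3(N_s + 1)}} \\
& \times \int d\mathbf{r}^{(l)} \, d\mathbf{r}^{(s)} \, d\mathbf{r}^{(s)}_{N_s+1} \, e^{-\beta\left(U(\mathbf{r}^{(l)}, \mathbf{r}^{(s)}) + \Delta U(\mathbf{r}^{(l)}, \mathbf{r}^{(s)}, \mathbf{r}^{(s)}_{N_s+1})\right)}.
\end{aligned} \tag{6.8.9} \]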

Since $(N_s + 1)/(N_s + 1)! = 1/N_s!$, if we now multiply and divide by $Q(N_l, N_s, V, T)$, which is the canonical partition function of the original solution, we obtain
\[ \begin{aligned}
\frac{(N_s + 1) Q(N_l, N_s + 1, V, T)}{Q(0, 1, V, T)} = {} & Q(N_l, N_s, V, T) \Bigg[ \frac{1}{Q(N_l, N_s, V, T) \, Q(0, 1, V, T)} \\
& \times \frac{1}{N_l! \, N_s! \, \Lambda_l^{3N_l} \Lambda_s^{3(N_s + 1)}} \int d\mathbf{r}^{(l)} \, d\mathbf{r}^{(s)} \, d\mathbf{r}^{(s)}_{N_s+1} \, e^{-\beta\left(U(\mathbf{r}^{(l)}, \mathbf{r}^{(s)}) + \Delta U(\mathbf{r}^{(l)}, \mathbf{r}^{(s)}, \mathbf{r}^{(s)}_{N_s+1})\right)} \Bigg].
\end{aligned} \tag{6.8.10} \]
The term in square brackets is an average of $\exp(-\beta \Delta U(\mathbf{r}^{(l)}, \mathbf{r}^{(s)}, \mathbf{r}^{(s)}_{N_s+1}))$ over a canonical ensemble of $N_l$ solvent particles and $N_s + 1$ solute particles but with no interaction between the $(N_s + 1)$st particle and the original solution, an average we denote as $\langle \exp(-\beta \Delta U) \rangle_0 (N_l, N_s, V, T)$. With this notation, eqn. (6.8.10) becomes
\[ \frac{(N_s + 1) Q(N_l, N_s + 1, V, T)}{Q(0, 1, V, T)} = Q(N_l, N_s, V, T) \left\langle e^{-\beta \Delta U} \right\rangle_0 (N_l, N_s, V, T). \tag{6.8.11} \]

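When eqn. (6.8.11) is substituted into eqn. (6.8.7), we obtain
\[ \begin{aligned}
\frac{\rho_s \Lambda_s^3}{\zeta_s q_s^{\rm (int)}} &= e^{-\beta P V} \sum_{N_l=0}^{\infty} \sum_{N_s=0}^{\infty} \zeta_l^{N_l} \zeta_s^{N_s} \left\langle e^{-\beta \Delta U} \right\rangle_0 (N_l, N_s, V, T) \, Q(N_l, N_s, V, T) \\
&= \left\langle e^{-\beta \Delta U} \right\rangle = e^{-\beta \mu_{\rm ex}}.
\end{aligned} \tag{6.8.12} \]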

The first line of eqn. (6.8.12) is a grand canonical average of $\langle e^{-\beta \Delta U} \rangle_0 (N_l, N_s, V, T)$, which leads to our final result in the second line of eqn. (6.8.12). The last line expresses the work needed to add the $(N_s + 1)$st solute particle to the solution, which is referred to as the excess chemical potential, $\mu_{\rm ex}$. The fact that $\mu_{\rm ex}$ can be obtained from this grand canonical average $\langle e^{-\beta \Delta U} \rangle$ is the potential distribution theorem, which is an important starting point for theories of solutions and solvation (Pratt and Laviolette, 1998; Beck, 2011).
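The inner average $\langle e^{-\beta \Delta U} \rangle_0$ at fixed particle numbers can be estimated by inserting a non-interacting test particle at random positions, which is the idea behind test-particle insertion methods. The sketch below is a deliberately simple toy model and not the general algorithm: it assumes a one-dimensional periodic box, an ideal-gas solvent, no other solute ($N_s = 0$), and a hard-core interaction in which $\Delta U = \infty$ if any solvent point lies within a distance `d` of the test particle and $\Delta U = 0$ otherwise. For this model the exact average is $(1 - 2d/L)^{N_l}$, which gives a check on the estimate. All names and parameter values are hypothetical.

```python
import math
import random


def widom_hard_core_1d(n_solvent, box, d, n_trials, seed=0):
    """Estimate <exp(-beta*dU)>_0 by random test-particle insertion.

    Toy model (assumption): solvent points are independent and uniform on a
    periodic line of length `box`; the test particle overlaps (dU = infinity)
    if any solvent point is closer than d, otherwise dU = 0.
    """
    rng = random.Random(seed)
    accepted = 0
    for _trial in range(n_trials):
        x_test = rng.uniform(0.0, box)
        overlap = False
        for _particle in range(n_solvent):
            dx = abs(rng.uniform(0.0, box) - x_test)
            dx = min(dx, box - dx)          # minimum-image distance
            if dx < d:
                overlap = True
                break
        if not overlap:
            accepted += 1                   # exp(-beta*dU) = 1 when there is no overlap
    return accepted / n_trials


n_solvent, box, d = 10, 10.0, 0.25
estimate = widom_hard_core_1d(n_solvent, box, d, n_trials=20000)
exact = (1.0 - 2.0 * d / box) ** n_solvent

# beta * mu_ex = -ln <exp(-beta*dU)>_0, cf. the last line of eqn. (6.8.12)
print(-math.log(estimate), -math.log(exact))
```

In an interacting solvent the configurations would have to be generated by a simulation rather than drawn independently, and the full theorem further averages this quantity over particle numbers with grand canonical weights.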

6.9 Molecular dynamics in the grand canonical ensemble

Our previous discussions of molecular dynamics simulation in the canonical and isobaric ensembles suggest that equations of motion are rather natural for treating fluctuations in ensemble control variables, such as temperature and pressure, that can be expressed as averages of continuous functions of the phase-space vector x. The key element proves to be extending the phase space with variables designed to control the fluctuations in the phase-space estimators of these control variables. The same is not


true for particle-number fluctuations, which are discrete by nature; for this reason, molecular dynamics is an unnatural choice for generating a grand canonical ensemble distribution. In principle, one could use a very large canonical simulation box and select a small subcell within it to compute grand canonical averages. Such a calculation would closely match the setup in Fig. 6.1, and the number of particles within this subcell will exhibit fluctuations consistent with a grand canonical distribution. There are, however, a number of disadvantages to performing simulations in the manner described above. First, it can only be applied in spatially homogeneous systems where we expect the same particle-number fluctuations throughout the system; many applications are not of this type. Second, since particles can only enter a subcell through its boundary, convergence of particle-number fluctuations could be quite slow, which is problematic given the high computational overhead of running a large canonical simulation box in the first place. This problem becomes especially severe if a subcell within the canonical box must, itself, also be large, since the surface-area-to-volume ratio decreases as the size of the subcell increases. Ideally, we would like to be able to perform grand canonical simulations using a single simulation box of any desired size, and allow particles to appear and disappear anywhere within the box; this would lead to maximum efficiency in the convergence of particle-number fluctuations. However, as noted above, simulating discrete changes is not easily achieved within the molecular dynamics framework. We will see in Chapter 7 that particle-number fluctuations can be handled via Monte Carlo simulation methods, which are more flexible in the types of changes they allow. Nevertheless, there is an approach to grand canonical molecular dynamics that avoids the overhead of large simulation boxes through multiple scales of spatial resolution. 
We close out this chapter with a discussion of this approach. The method is known as Hamiltonian Adaptive Resolution Simulation (H-AdResS) (Potestio et al., 2013; Heidari et al., 2020) and is close in spirit to the scheme in Fig. 6.1. However, H-AdResS exploits the fact that we can design a particle reservoir whose spatial resolution is coarser than that of the system and within which the interparticle interactions are highly simplified compared to those in the system. With these simplifications, the computational demand is significantly lower than using a small subcell within a large canonical box. A full understanding of the details of the H-AdResS approach requires techniques to be discussed in Chapters 7 and 8; however, we can proceed to outline the basic ideas here. Figure 6.2 illustrates the setup for an H-AdResS simulation of a molecular fluid. The simulation cell is divided into three regions: (i) an atomistic region, whose grand canonical properties we seek; (ii) a coarse-grained region containing a set of reservoir particles, described at a low level of spatial resolution, that can flow in and out of the atomistic region; and (iii) a hybrid region, in which coarse-grained reservoir particles are transformed into molecules at full atomistic resolution. If the coarse-grained particles are the same molecules as in the atomistic region but with simplified interactions, e.g., occurring only between their centers of mass, then the transformation between the coarse and atomistic regions is simple, and the Hamiltonian can be written as


Fig. 6.2 Illustration of the H-AdResS simulation box setup (reproduced with permission from Potestio et al., Phys. Rev. Lett. 110, 108301 (2013), copyright American Physical Society).

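\[
\mathcal{H} = \sum_{\alpha=1}^{N} \sum_{i=1}^{n} \frac{\mathbf{p}_{\alpha,i}^2}{2m_i} + \sum_{\alpha=1}^{N} \left[ \lambda_\alpha U_\alpha^{(\mathrm{AA})} + (1 - \lambda_\alpha)\, U_\alpha^{(\mathrm{CG})} \right] + U_{\mathrm{intra}}, \tag{6.9.1}
\]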

where the system is assumed to have $N$ total molecules, each containing $n$ atoms with mass $m_i$, $i = 1, \ldots, n$. The potential $U_\alpha^{(\mathrm{AA})}$ is felt by the $\alpha$th molecule in the atomistic region, $U_\alpha^{(\mathrm{CG})}$ is the potential of the $\alpha$th molecular mass center in the coarse-grained region, $\lambda_\alpha$ is a function of the molecular mass centers that switches smoothly from 0 to 1 through the hybrid region, and $U_{\mathrm{intra}}$ is the intramolecular potential. Note that the molecules retain their structure in the coarse-grained region, but since the interaction is only between centers of mass, individual atoms are carried along as “excess baggage”.

A technical issue with the Hamiltonian in eqn. (6.9.1) concerns differences between the atomistic and coarse-grained interaction potentials. Because of these differences, the density and chemical potentials in the atomistic and coarse-grained regions are not the same, which can cause unphysical gradients between the two regions. It is, therefore, necessary to add a compensating term into eqn. (6.9.1) derived from a free-energy calculation, which we will discuss in Chapter 8. Finally, in order to maintain a constant chemical potential in the coarse-grained region, which, in turn, fixes the chemical potential of the atomistic region, a Monte Carlo particle insertion/deletion algorithm is employed (see Section 7.3.3). In practice, it helps if the potential in the coarse-grained region is one for which the chemical potential can be derived analytically; an obvious choice is to take the coarse-grained region to be a reservoir of ideal gas particles (Heidari et al., 2020). A feature of the H-AdResS scheme is that it can be employed at and away from equilibrium (Delle Site and Praprotnik, 2017; Ciccotti and Delle Site, 2019; Cortes-Huerto et al., 2021).


6.10 Problems

6.1. Using a Legendre transform, determine if it is possible to define an ensemble in which $\mu$, $P$, and $T$ are the control variables. Can you rationalize your result based on Euler’s theorem?

6.2. a. Derive the thermodynamic relations for an ensemble in which $\mu$, $V$, and $S$ are the control variables.
b. Determine the partition function for this ensemble.

6.3. For the ideal gas in Problem 4.7 of Chapter 4, imagine dividing the cylinder into rings of radius $r$, thickness $\Delta r$, and height $\Delta z$. Within each ring, assume that $r$ and $z$ are constant.
a. Within each ring, explain why it is possible to work within the grand canonical ensemble.
b. Show that the grand canonical partition function within each ring satisfies
\[
\mathcal{Z}(\mu, V_{\mathrm{ring}}, r, z, T) = \mathcal{Z}^{(0)}(\mu_{\mathrm{eff}}(r, z), V_{\mathrm{ring}}, T),
\]
where $\mathcal{Z}^{(0)}$ is the grand canonical partition function for $\omega = 0$ and $g = 0$, $V_{\mathrm{ring}}$ is the volume of each ring, and $\mu_{\mathrm{eff}}(r, z)$ is an effective local chemical potential that varies from ring to ring. Derive an expression for $\mu_{\mathrm{eff}}(r, z)$.
c. Is this result true even if there are interactions among the particles? Why or why not?

6.4. Consider an equilibrium chemical reaction involving $K$ molecular species denoted $X_1, \ldots, X_K$, where some of the species are reactants and some are products. Denote the chemical equation governing the reaction as
\[
\sum_{i=1}^{K} \nu_i X_i = 0,
\]
where $\nu_i$ are the stoichiometric coefficients in the reaction. Using this notation, the coefficients of the products are, by definition, negative. As the reaction proceeds, there will be a change $\delta N_i$ in the number $N_i$ of each species such that the law of mass balance is
\[
\frac{\delta N_1}{\nu_1} = \frac{\delta N_2}{\nu_2} = \cdots = \frac{\delta N_K}{\nu_K}.
\]
In order to find a condition describing the chemical equilibrium, we can make use of the Helmholtz free energy $A(N_1, N_2, \ldots, N_K, V, T)$. At equilibrium, the changes $\delta N_i$ should not change the free energy to first order. That is, $\delta A = 0$.


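a. Show that this assumption leads to the equilibrium condition
\[
\sum_{i=1}^{K} \mu_i \nu_i = 0.
\]
b. Now consider the reaction
\[
2\mathrm{H_2}(g) + \mathrm{O_2}(g) \rightleftharpoons 2\mathrm{H_2O}(g).
\]
Let $\rho_0$ be the initial density of H$_2$ molecules and $\rho_0/2$ be the initial density of O$_2$ molecules, and let the initial amount of H$_2$O be zero. Calculate the equilibrium densities of the three components as a function of temperature and $\rho_0$.

∗6.5. Prove the following fluctuation theorems for the grand canonical ensemble:
a.
\[
\langle N \mathcal{H}(x) \rangle - \langle N \rangle \langle \mathcal{H}(x) \rangle = \left( \frac{\partial E}{\partial N} \right)_{V,T} (\Delta N)^2.
\]
b.
\[
\Delta \mathcal{F}^2 = kT^2 C_V + \left[ \left( \frac{\partial E}{\partial N} \right)_{V,T} - \mu \right]^2 (\Delta N)^2,
\]
where $C_V$ is the constant-volume heat capacity, $\mathcal{F} = E - N\mu = TS - PV$, and $\Delta \mathcal{F} = \sqrt{\langle \mathcal{F}^2 \rangle - \langle \mathcal{F} \rangle^2}$.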

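6.6. In a multicomponent system with $K$ components, show that the fluctuations in the particle numbers of each component are related by
\[
\Delta N_i \Delta N_j = kT \left( \frac{\partial \langle N_i \rangle}{\partial \mu_j} \right)_{V,T,\mu_i} = kT \left( \frac{\partial \langle N_j \rangle}{\partial \mu_i} \right)_{V,T,\mu_j},
\]
where $\Delta N_i = \sqrt{\langle N_i^2 \rangle - \langle N_i \rangle^2}$, with a similar definition for $\Delta N_j$.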

6.7. Derive eqns. (6.8.6) and (6.8.11).

6.8. The Potential Distribution Theorem has a useful inverse form:
\[
\left\langle e^{\beta \Delta U} \right\rangle = e^{\beta \mu_{\mathrm{ex}}}.
\]
Using the approach in Section 6.8, prove this result and give a physical interpretation of it.


6.9. For the gas in Problem 4.17 of Chapter 4, suppose, instead of a fixed volume V , the external pressure P is held fixed. Calculate the partition function, the equation of state, the enthalpy, the heat capacity, and the entropy under these conditions.

7 Monte Carlo

7.1 Introduction to the Monte Carlo method

In our treatment of the equilibrium ensembles, we have, thus far, exclusively developed and employed dynamical techniques for sampling the phase-space distributions. This choice was motivated by the natural connection between the statistical ensembles and classical (Hamiltonian or non-Hamiltonian) mechanics. The dynamical aspect of these approaches is, however, irrelevant for equilibrium statistical mechanics, as we are interested only in sampling the accessible microscopic states of the ensemble. In this chapter, we will introduce another class of sampling techniques known as Monte Carlo methods. As the name implies, Monte Carlo techniques are based on games of chance (driven by sequences of random numbers) which, when played many times, yield outcomes that are the solutions to particular problems. The first use of random methods to solve a physical problem dates back to 1930 when Enrico Fermi (1901-1954) employed such an approach to study the properties of neutrons. Monte Carlo simulations also played a central role in the Manhattan Project. It was not until computers could be leveraged that the power of Monte Carlo methods would be realized. In the 1950s, for example, Monte Carlo methods were carried out on the MANIAC at Los Alamos National Laboratory in New Mexico for research on the hydrogen bomb. Eventually, it was determined that Monte Carlo techniques constitute a powerful suite of tools for solving statistical mechanical problems involving integrals of very high dimension.

As a simple illustrative example, consider the evaluation of the definite integral
\[
I = \int_0^1 dx \int_0^{\sqrt{1 - x^2}} dy = \frac{\pi}{4}. \tag{7.1.1}
\]

The result $\pi/4$ can be obtained straightforwardly, since this is an elementary integral. Note that the answer $\pi/4$ is also the ratio of the area of a circle of arbitrary radius to the area of its circumscribed square. This fact suggests that the following game could be used to solve the integral: Draw a square and an inscribed circle on a piece of paper, tape the paper to a dart board, and throw darts randomly at the board. The ratio of the number of darts that land in the circle to the number of darts that land anywhere in the square will, in the limit of a very large number of dart throws, yield a good estimate of the area ratio and hence of the integral in eqn. (7.1.1).$^1$ In practice, it would take about $10^6$ such dart throws to achieve a reasonable estimate of $\pi/4$, which would try the patience of even the most avid dart player. For this reason, it is more efficient to have the computer throw the darts. Nevertheless, this example shows that a simple random process can be used to produce a numerical estimate of a two-dimensional integral; no fancy sets of dynamical differential equations are needed. In this chapter, we will discuss an important underpinning of the Monte Carlo technique, namely the central limit theorem, and then proceed to describe a number of commonly used Monte Carlo algorithms for evaluating high-dimensional integrals of the type that are ubiquitous in classical equilibrium statistical mechanics.

$^1$ Kalos and Whitlock (1986) suggested putting a round cake pan in a square one, placing the combination in a rain storm, and measuring the ratio of raindrops that fall in the round cake pan to those that fall in the square one.
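The dart game is easy to hand to a computer. The following is a minimal sketch, not a prescribed implementation: it throws uniformly distributed points into the unit square and counts the fraction that fall under the quarter circle of eqn. (7.1.1). The function name `estimate_quarter_pi` and the seed are illustrative assumptions.

```python
import random

def estimate_quarter_pi(n_throws, seed=0):
    """Estimate pi/4 as the fraction of uniform random points in the unit
    square that satisfy x^2 + y^2 <= 1 (the region of eqn. (7.1.1))."""
    rng = random.Random(seed)          # seeded generator for reproducibility
    hits = 0
    for _ in range(n_throws):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:       # the dart landed inside the quarter circle
            hits += 1
    return hits / n_throws

if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        print(n, estimate_quarter_pi(n))
```

The statistical error of such an estimate shrinks roughly as the inverse square root of the number of throws, which is why on the order of $10^6$ throws are needed for about three significant figures.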

7.2 The central limit theorem

The integrals that must be evaluated in equilibrium statistical mechanics are generally of the form
\[
I = \int dx\, \phi(x) f(x), \tag{7.2.1}
\]
where $x$ is an $n$-dimensional vector, $\phi(x)$ is an arbitrary function, and $f(x)$ is a function satisfying the properties of a probability distribution function, namely $f(x) \geq 0$ and
\[
\int dx\, f(x) = 1. \tag{7.2.2}
\]
The integral in eqn. (7.2.1) represents the ensemble average of a physical observable in equilibrium statistical mechanics. Let $x_1, \ldots, x_M$ be a set of $M$ $n$-dimensional vectors that are sampled from $f(x)$. That is, the vectors $x_1, \ldots, x_M$ are distributed according to $f(x)$, so that the probability that the vector $x_i$ is in a small region $dx$ of the $n$-dimensional space on which the vectors $x_1, \ldots, x_M$ are defined is $f(x_i)\,dx$. Recall that in Section 3.8.3, we described an algorithm for sampling the Maxwell-Boltzmann distribution, which is a particularly simple case. In general, the problem of sampling a distribution $f(x)$ is a nontrivial one that we will address in this chapter. For now, however, let us assume that an algorithm exists for carrying out the sampling of $f(x)$ and generating the vectors $x_1, \ldots, x_M$. We will establish that the simple arithmetic average
\[
\tilde{I}_M = \frac{1}{M} \sum_{i=1}^{M} \phi(x_i) \tag{7.2.3}
\]
is an estimator for the integral $I$, meaning that
\[
\lim_{M \to \infty} \tilde{I}_M = I. \tag{7.2.4}
\]

This result is guaranteed by a theorem known as the central limit theorem, which we will now prove. Readers wishing to proceed immediately to the specifics of Monte Carlo methodology can take the results in eqns. (7.2.3) and (7.2.4) as given and skip to the next section.
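As a small numerical illustration of eqns. (7.2.3) and (7.2.4), the sketch below evaluates the arithmetic average for a case where the answer is known. The choices here are assumptions made only for the example: $f(x)$ is taken to be a one-dimensional standard normal distribution, sampled with Python's built-in Gaussian generator, and $\phi(x) = x^2$, so that $I = 1$.

```python
import random

def estimator(phi, sampler, M):
    """Arithmetic average (1/M) * sum_i phi(x_i), eqn. (7.2.3),
    over M samples x_i returned by sampler()."""
    return sum(phi(sampler()) for _ in range(M)) / M

rng = random.Random(42)                    # seeded for reproducibility

def sample_f():
    """Draw one sample from f(x); here a standard normal (assumed example)."""
    return rng.gauss(0.0, 1.0)

def phi(x):
    """Observable phi(x) = x^2, for which the exact average is I = 1."""
    return x * x

for M in (10**2, 10**3, 10**4, 10**5):
    print(M, estimator(phi, sample_f, M))
```

The printed values should scatter about 1 with a spread that narrows as $M$ grows, which is the behavior the central limit theorem quantifies.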

For simplicity, we introduce the notation
\[
\int dx\, \phi(x) f(x) = \langle \phi \rangle_f, \tag{7.2.5}
\]
where $\langle \cdots \rangle_f$ indicates an average of $\phi(x)$ with respect to the distribution $f(x)$. We wish to compute the probability $\mathcal{P}(y)$ that the estimator $\tilde{I}_M$ will have a value $y$. This probability is given formally by
\[
\mathcal{P}(y) = \int dx_1 \cdots dx_M \left[ \prod_{i=1}^{M} f(x_i) \right] \delta\!\left( \frac{1}{M} \sum_{i=1}^{M} \phi(x_i) - y \right), \tag{7.2.6}
\]
where the Dirac $\delta$-function restricts the integral to those sets of vectors $x_1, \ldots, x_M$ for which the estimator is equal to $y$. Equation (7.2.6) can be simplified by introducing the integral representation of the $\delta$-function (see Appendix A)
\[
\delta(z) = \frac{1}{2\pi} \int_{-\infty}^{\infty} d\sigma\, e^{iz\sigma}. \tag{7.2.7}
\]
Substituting eqn. (7.2.7) into eqn. (7.2.6) and using the general property of $\delta$-functions that $\delta(ax) = (1/|a|)\delta(x)$ yields
\[
\begin{aligned}
\mathcal{P}(y) &= M \int dx_1 \cdots dx_M \left[ \prod_{i=1}^{M} f(x_i) \right] \delta\!\left( \sum_{i=1}^{M} \phi(x_i) - My \right) \\
&= \frac{M}{2\pi} \int dx_1 \cdots dx_M \left[ \prod_{i=1}^{M} f(x_i) \right] \int_{-\infty}^{\infty} d\sigma\, e^{i\sigma \left( \sum_{i=1}^{M} \phi(x_i) - My \right)}.
\end{aligned} \tag{7.2.8}
\]

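Interchanging the order of integrations gives
\[
\begin{aligned}
\mathcal{P}(y) &= \frac{M}{2\pi} \int_{-\infty}^{\infty} d\sigma\, e^{-iM\sigma y} \int dx_1 \cdots dx_M \left[ \prod_{i=1}^{M} f(x_i) \right] e^{i\sigma \sum_{i=1}^{M} \phi(x_i)} \\
&= \frac{M}{2\pi} \int_{-\infty}^{\infty} d\sigma\, e^{-iM\sigma y} \left[ \int dx\, f(x) e^{i\sigma\phi(x)} \right]^M \\
&= \frac{M}{2\pi} \int_{-\infty}^{\infty} d\sigma\, e^{-iM\sigma y}\, e^{M \ln \int dx\, f(x) e^{i\sigma\phi(x)}} \\
&= \frac{M}{2\pi} \int_{-\infty}^{\infty} d\sigma\, e^{M F(\sigma, y)},
\end{aligned} \tag{7.2.9}
\]
where in the second line, we have used the fact that the integrals over $x_1, \ldots, x_M$ in the product are all identical. In the last line of eqn. (7.2.9), the function $F(\sigma, y)$ is defined to be
\[
F(\sigma, y) = -i\sigma y + g(\sigma) \tag{7.2.10}
\]
with
\[
g(\sigma) = \ln \int dx\, f(x) e^{i\sigma\phi(x)}. \tag{7.2.11}
\]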

Although we cannot evaluate the integral over $\sigma$ in eqn. (7.2.9) exactly, we can approximate it by a technique known as the stationary phase method. This technique applies to integrals of functions $F(\sigma, y)$ that are sharply peaked about a global maximum at $\sigma = \tilde{\sigma}(y)$, where the integral is expected to have its dominant contribution. For $\sigma = \tilde{\sigma}(y)$ to be a maximum, the following conditions must hold:
\[
\left. \frac{\partial F}{\partial \sigma} \right|_{\sigma = \tilde{\sigma}(y)} = 0, \qquad \left. \frac{\partial^2 F}{\partial \sigma^2} \right|_{\sigma = \tilde{\sigma}(y)} < 0
\]

[...] 1. Note that the scaling is applied only to the energy bin in which $E$ happens to fall. All other bins remain unchanged. In addition, we accumulate a histogram $h(E)$ of each energy visited as a result of such moves. After many iterations of this procedure, $h(E)$ starts to “flatten out,” meaning that it has roughly the same value in each energy bin. Once we decide that $h(E)$ is “flat enough” for the given value of $f$, we start refining the procedure. We choose a new value of $f$, for example, $f_{\mathrm{new}} = \sqrt{f_{\mathrm{old}}}$ (an arbitrary formula suggested by Wang and Landau) and begin a new cycle. As before, we wait until $h(E)$ is flat enough and then switch to a new value of $f$ (using the square-root formula again, for example). After many refinement cycles, we will find that $f \to 1$ and $h(E)$ becomes smoothly flat. When this happens, $\Omega(E)$ is a converged density of states. The Wang-Landau approach can be used for both discrete lattice-based models and continuous systems such as simple fluids (Yan et al., 2002) and proteins (Rathore et al., 2003).

An interesting point concerning the Wang-Landau algorithm is that it does not satisfy detailed balance due to the application of the scaling factor to the density of states $\Omega(E)$, causing the latter to change continually throughout the calculation. As $f$ approaches unity, the algorithm just starts to satisfy detailed balance. Thus, the Wang-Landau approach represents a Monte Carlo method that can work without strict adherence to detailed balance throughout the sampling procedure.
Note that it is also possible to use molecular dynamics to generate trial moves before application of eqn. (7.6.3). If a large time step is used, for example, then after $m$ steps, energy will not be conserved (which also occurs in the hybrid Monte Carlo scheme), meaning that there will be an energy change from $E_1$ to $E_2$. Thus, the potential energy will change from $U_1$ to $U_2$, and the kinetic energy will also change from $K_1$ to $K_2$, with $E_1 = K_1 + U_1$ and $E_2 = K_2 + U_2$. In this case, the acceptance criterion should be modified, as suggested by Rathore et al. (2003), according to
\[
A(E_2|E_1) = \min\left[ 1, e^{-\beta \Delta K} \frac{\Omega(U_1)}{\Omega(U_2)} \right], \tag{7.6.4}
\]
where $\Delta K = K_2 - K_1$.
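The refinement cycle of the Wang-Landau procedure (scale $\Omega$ in the visited bin by $f$, accumulate $h(E)$, and replace $f$ by $\sqrt{f}$ once $h(E)$ is flat) can be sketched for a toy model. Everything specific below is an assumption chosen for illustration: the system is a set of independent two-state spins whose "energy" is the number of up spins, so that the exact density of states is a binomial coefficient; moves are accepted with probability $\min[1, \Omega(E_1)/\Omega(E_2)]$, the rule commonly used with this method; and the flatness threshold of 80% and the stopping value of $\ln f$ are arbitrary. Logarithms of $\Omega$ and $f$ are stored to avoid overflow.

```python
import math
import random

def wang_landau(n_spins=8, ln_f_final=1e-4, flatness=0.8, seed=0):
    """Estimate ln Omega(E) for n_spins independent two-state spins, where E
    is the number of 'up' spins. Returns ln Omega(E) - ln Omega(0)."""
    rng = random.Random(seed)
    spins = [0] * n_spins
    E = 0
    ln_omega = [0.0] * (n_spins + 1)   # running estimate of ln Omega(E)
    ln_f = 1.0                         # ln of the scaling factor (f = e to start)
    while ln_f > ln_f_final:
        hist = [0] * (n_spins + 1)     # histogram h(E) for this cycle
        while True:
            for _ in range(1000):
                i = rng.randrange(n_spins)
                E_new = E + (1 - 2 * spins[i])        # flipping spin i changes E by +/-1
                d = ln_omega[E] - ln_omega[E_new]     # ln[Omega(E)/Omega(E_new)]
                if d >= 0.0 or rng.random() < math.exp(d):
                    spins[i] = 1 - spins[i]
                    E = E_new
                ln_omega[E] += ln_f    # scale Omega only in the current bin
                hist[E] += 1
            # "flat enough": every bin holds at least 80% of the mean count
            if min(hist) > flatness * sum(hist) / len(hist):
                break
        ln_f *= 0.5                    # f_new = sqrt(f_old)
    return [v - ln_omega[0] for v in ln_omega]

estimate = wang_landau()
exact = [math.log(math.comb(8, k)) for k in range(9)]
for k in range(9):
    print(k, round(estimate[k], 3), round(exact[k], 3))
```

Only ratios of $\Omega(E)$ are determined by the procedure, which is why the result is reported relative to the $E = 0$ bin.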

7.7 Transition path sampling and the transition path ensemble

The last technique we will describe in this chapter is something of a departure from the methods we have discussed thus far. Up to now, we have discussed approaches for sampling configurations from a specified probability distribution, and because our aim has only been to sample, there is no dynamical information in the sequence of configurations we have produced. Consequently, these methods can only be used to calculate equilibrium and thermodynamic properties. But suppose we could devise a Monte Carlo scheme capable of producing actual dynamical trajectories from one part of phase space to another. The framework of Monte Carlo is flexible enough that such a thing ought to be possible. In fact, this is something that can be achieved, but doing so requires a small shift in the way we think of statistical ensembles and the sampling problem.


The technique we will discuss, known as transition path sampling, was pioneered by Chandler and coworkers (Dellago et al., 1998; Bolhuis et al., 2002; Dellago et al., 2002; Frenkel and Smit, 2002). This approach is a particularly powerful one for generating dynamical trajectories between two regions of phase space when the passage of the system from one to the other is exceedingly rare. When such rare events (which we will discuss in more detail in Section 8.5) cannot be accessed using ordinary molecular dynamics, transition path sampling provides a means of producing the desired trajectories. A classic example of a rare event is the dissociation of a water molecule according to $2\mathrm{H_2O}(l) \longrightarrow \mathrm{H_3O^+}(aq) + \mathrm{OH^-}(aq)$ (Geissler et al., 2001). The reaction ostensibly only requires transferring a proton from one water molecule to another. However, if we attach ourselves to a particular water molecule and wait for the chemical reaction to occur, the average time we would have to wait is ten hours for a single event, a time scale that is well beyond that which can be accessed in an ordinary molecular dynamics calculation.

Generally, a process is termed a rare-event process when the system must cross one or more high energy barriers (see Fig. 7.2). The actual passage time over the barrier can be quite short, while most of the time is spent waiting for a sufficient amount of energy to amass in a small number of modes in the system to allow the barrier to be surmounted. When a system is in equilibrium, where equipartitioning holds, such a fluctuation is, indeed, a rare event. This is illustrated in Fig. 7.5, which shows a thermostatted trajectory of a particle in a one-dimensional double-well potential with minima at $x = \pm 1$ and a barrier height of $8kT$. The figure shows that actual crossing events are rapid but times between such events are quite long.
Fig. 7.5 Rare-event trajectory in a double-well potential with barrier height $8kT$.

In the transition path sampling method, we approach the problem in a way that is qualitatively different from what we have done until now. Let us assume that we know a priori the regions of phase space in which the trajectory initiates and in which it finally ends up. We denote these regions generically as $A$ and $B$, respectively (see Fig. 7.6).

Fig. 7.6 A representative transition path from $A$ to $B$. Also shown are discrete phase-space points along the path: Initial ($x_0$), final ($x_{n\Delta t} = x_{\mathcal{T}}$), and several intermediate ($x_{(k-1)\Delta t}$, $x_{k\Delta t}$, $x_{(k+1)\Delta t}$) points.

Let us also assume that a time $\mathcal{T}$ is needed for the system to pass from $A$ to $B$. If we could generate a molecular dynamics trajectory from $A$ to $B$, then this trajectory would consist of a sequence of discrete phase-space points $x_0, x_{\Delta t}, x_{2\Delta t}, \ldots, x_{n\Delta t}$, where $n\Delta t = \mathcal{T}$, such that $x_0 \in A$ and $x_{n\Delta t} \in B$ (see Fig. 7.6). Let us denote this set of phase-space points as $X(\mathcal{T})$. That is, $X(\mathcal{T})$ is a time-ordered sequence of microscopic states visited by the system as it passes from $A$ to $B$. If we view $X(\mathcal{T})$ as belonging to an ensemble of trajectories from $A$ to $B$, then we can derive a probability $\mathcal{P}_{AB}[X(\mathcal{T})]$ associated with a given trajectory. Note that we are using functional notation because $\mathcal{P}_{AB}$ depends on the entire trajectory. If we regard the sequence of phase-space points $x_0, x_{\Delta t}, x_{2\Delta t}, \ldots, x_{n\Delta t}$ as a Markov chain, then there exists a rule $T(x_{(k+1)\Delta t}|x_{k\Delta t})$ for generating $x_{(k+1)\Delta t}$ given $x_{k\Delta t}$. For example, suppose we posit that the trajectory is to be generated deterministically via a symplectic integrator such as the velocity Verlet method of Section 3.10. Then, $x_{(k+1)\Delta t}$ would be generated from $x_{k\Delta t}$ using a Trotter factorization of the propagator
\[
x_{(k+1)\Delta t} = e^{iL_2 \Delta t/2} e^{iL_1 \Delta t} e^{iL_2 \Delta t/2} x_{k\Delta t} \equiv \phi_{\Delta t}(x_{k\Delta t}). \tag{7.7.1}
\]

Here, $\phi_{\Delta t}(x_{k\Delta t})$ is a shorthand notation for the Trotter factorized single-step propagator acting on $x_{k\Delta t}$. The rule $T(x_{(k+1)\Delta t}|x_{k\Delta t})$ must specify that there is only one possible choice for $x_{(k+1)\Delta t}$ given $x_{k\Delta t}$, which means we must take the rule as
\[
T(x_{(k+1)\Delta t}|x_{k\Delta t}) = \delta\!\left( x_{(k+1)\Delta t} - \phi_{\Delta t}(x_{k\Delta t}) \right). \tag{7.7.2}
\]
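To make the deterministic rule concrete, the sketch below implements the single-step map $\phi_{\Delta t}$ as a velocity Verlet step (half-step momentum update, full-step position update, half-step momentum update) for one particle. The model potential $U(x) = 8(x^2 - 1)^2$ in units of $kT$ is an assumption chosen to resemble the double well described earlier (minima at $x = \pm 1$, barrier height $8kT$); the initial condition, time step, and function names are illustrative. The script also checks numerically that stepping forward $n$ times and then stepping with $-\Delta t$ the same number of times returns the starting point.

```python
BARRIER = 8.0   # barrier height in units of kT (assumed model parameter)
MASS = 1.0      # particle mass (assumed)

def force(x):
    """Force -dU/dx for the assumed potential U(x) = BARRIER * (x^2 - 1)^2."""
    return -4.0 * BARRIER * x * (x * x - 1.0)

def phi(x, p, dt):
    """One velocity Verlet step: the Trotter-factorized map phi_dt."""
    p = p + 0.5 * dt * force(x)     # half-step momentum update
    x = x + dt * p / MASS           # full-step position update
    p = p + 0.5 * dt * force(x)     # half-step momentum update
    return x, p

def propagate(x, p, dt, n):
    """Apply phi_dt n times; the output is uniquely fixed by (x, p)."""
    for _ in range(n):
        x, p = phi(x, p, dt)
    return x, p

x0, p0 = -1.0, 0.5
xn, pn = propagate(x0, p0, 0.005, 200)      # forward in time
xb, pb = propagate(xn, pn, -0.005, 200)     # same map with -dt undoes it
print("forward endpoint:", xn, pn)
print("return error:", abs(xb - x0), abs(pb - p0))
```

The return error should be at the level of floating-point round-off, reflecting the time reversibility of the symmetric factorization.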

In an ensemble of trajectories $X(\mathcal{T})$, the general statistical weight $\mathcal{P}[X(\mathcal{T})]$ that we would assign to any single trajectory is
\[
\mathcal{P}[X(\mathcal{T})] = f(x_0) \prod_{k=0}^{n-1} T(x_{(k+1)\Delta t}|x_{k\Delta t}), \tag{7.7.3}
\]
where $f(x_0)$ is the equilibrium distribution of initial conditions $x_0$, for example, a canonical distribution $f(x_0) = \exp(-\beta \mathcal{H}(x_0))/Q(N, V, T)$. However, since our interest is in trajectories that start in the phase-space region $A$ and end in the region $B$, we need to restrict the trajectory distribution in eqn. (7.7.3) to this subset of trajectories. We do this by multiplying eqn. (7.7.3) by functions $h_A(x_0)$ and $h_B(x_{n\Delta t})$, where $h_A(x) = 1$ if $x \in A$ and $h_A(x) = 0$ otherwise, with a similar definition for $h_B(x)$. This gives the transition path probability $\mathcal{P}_{AB}[X(\mathcal{T})]$ as
\[
\mathcal{P}_{AB}[X(\mathcal{T})] = \frac{1}{\mathcal{F}_{AB}(\mathcal{T})} h_A(x_0)\, \mathcal{P}[X(\mathcal{T})]\, h_B(x_{n\Delta t}), \tag{7.7.4}
\]
where $\mathcal{F}_{AB}$ is a normalization constant given by
\[
\mathcal{F}_{AB}(\mathcal{T}) = \int dx_0 \cdots dx_{n\Delta t}\, h_A(x_0)\, \mathcal{P}[X(\mathcal{T})]\, h_B(x_{n\Delta t}). \tag{7.7.5}
\]

Equations (7.7.4) and (7.7.5) can be regarded as the probability distribution and partition function for an ensemble of trajectories that begin in $A$ and end in $B$ and thus, they can be regarded as defining an ensemble called the transition path ensemble (Dellago et al., 1998; Bolhuis et al., 2002; Dellago et al., 2002). Although eqns. (7.7.4) and (7.7.5) are valid for any trajectory rule $T(x_{(k+1)\Delta t}|x_{k\Delta t})$, if we take the specific example of deterministic molecular dynamics in eqn. (7.7.2), then eqn. (7.7.5) becomes
\[
\begin{aligned}
\mathcal{F}_{AB}(\mathcal{T}) &= \int dx_0 \cdots dx_{n\Delta t}\, h_A(x_0) f(x_0) \prod_{k=0}^{n-1} \delta\!\left( x_{(k+1)\Delta t} - \phi_{\Delta t}(x_{k\Delta t}) \right) h_B(x_{n\Delta t}) \\
&= \int dx_0\, f(x_0)\, h_A(x_0)\, h_B(x_{n\Delta t}(x_0)),
\end{aligned} \tag{7.7.6}
\]

where we have used the Dirac $\delta$-functions to integrate over all points $x_{k\Delta t}$ except $k = 0$. When this is done, the $h_B$ factor looks like $h_B(\phi_{\Delta t}(\phi_{\Delta t}(\cdots \phi_{\Delta t}(x_0))))$, where the integrator $\phi_{\Delta t}$ acts $n$ times on the initial condition $x_0$ to give the unique numerical solution $x_{n\Delta t}(x_0)$ appearing in eqn. (7.7.6). Thus, for deterministic molecular dynamics, eqn. (7.7.6) simply counts those microstates belonging to the equilibrium ensemble $f(x_0)$ that are contained in $A$ and, when integrated for $n$ steps, end in $B$. In Chapter 15, we will show how to define and generate the transition path ensemble for trajectories obeying stochastic rather than deterministic dynamics.

Having now defined the ensemble of transition paths, we need a Monte Carlo algorithm for sampling this ensemble. The method we will describe here is an adaptation of the M(RT)$^2$ algorithm for an ensemble of paths rather than one of configurations. Accordingly, we seek to generate a Markov chain of $M$ trajectories $X_1(\mathcal{T}), \ldots, X_M(\mathcal{T})$, and to accomplish this, we begin, as we did in Section 7.3.3, with a generalization of the detailed balance condition. Let $R_{AB}[X(\mathcal{T})|Y(\mathcal{T})]$ be the conditional probability to generate a trajectory $X(\mathcal{T})$ starting from $Y(\mathcal{T})$. Both $X(\mathcal{T})$ and $Y(\mathcal{T})$ must be transition paths from $A$ to $B$. The detailed balance condition appropriate for the transition path ensemble is
\[
R_{AB}[X(\mathcal{T})|Y(\mathcal{T})]\, \mathcal{P}_{AB}[Y(\mathcal{T})] = R_{AB}[Y(\mathcal{T})|X(\mathcal{T})]\, \mathcal{P}_{AB}[X(\mathcal{T})]. \tag{7.7.7}
\]

As we did in eqn. (7.3.25), we decompose $R_{AB}[X(\mathcal{T})|Y(\mathcal{T})]$ into a product
\[
R_{AB}[X(\mathcal{T})|Y(\mathcal{T})] = \Lambda_{AB}[X(\mathcal{T})|Y(\mathcal{T})]\, T_{AB}[X(\mathcal{T})|Y(\mathcal{T})] \tag{7.7.8}
\]
of a trial probability $T_{AB}$ and an acceptance probability $\Lambda_{AB}$. The same logic used to obtain eqn. (7.3.28) leads to the acceptance rule for transition path sampling
\[
\Lambda[X(\mathcal{T})|Y(\mathcal{T})] = \min\left[ 1, \frac{\mathcal{P}_{AB}[X(\mathcal{T})]\, T_{AB}[Y(\mathcal{T})|X(\mathcal{T})]}{\mathcal{P}_{AB}[Y(\mathcal{T})]\, T_{AB}[X(\mathcal{T})|Y(\mathcal{T})]} \right]. \tag{7.7.9}
\]
Since the trajectory $Y(\mathcal{T})$ is assumed to be a proper transition path from $A$ to $B$, $h_A(y_0) = 1$ and $h_B(y_{n\Delta t}) = 1$. Thus, we can write eqn. (7.7.9) as
\[
\Lambda[X(\mathcal{T})|Y(\mathcal{T})] = h_A(x_0)\, h_B(x_{n\Delta t}) \min\left[ 1, \frac{\mathcal{P}[X(\mathcal{T})]\, T_{AB}[Y(\mathcal{T})|X(\mathcal{T})]}{\mathcal{P}[Y(\mathcal{T})]\, T_{AB}[X(\mathcal{T})|Y(\mathcal{T})]} \right], \tag{7.7.10}
\]
which is zero unless the new trajectory $X(\mathcal{T})$ is also a proper transition path.

As with any Monte Carlo algorithm, the key to efficient sampling of transition paths is the design of the rule $T_{AB}[X(\mathcal{T})|Y(\mathcal{T})]$ for generating trial moves from one path to another. Here, we will discuss a particular type of trial move known as a “shooting move.” Shooting moves are conceptually simple. We randomly select a point $y_{j\Delta t}$ (by randomly choosing the integer $j$) from the starting trajectory $Y(\mathcal{T})$ and modify it in some way to give a point $x_{j\Delta t}$ on the trial trajectory $X(\mathcal{T})$, referred to as a “shooting point.” Starting from this point, trajectories are launched forward and backward in time. If the new trajectory $X(\mathcal{T})$ thus generated is a transition path from $A$ to $B$, it is accepted with some probability; otherwise, it is rejected. The idea of shooting moves is illustrated in Fig. 7.7.

Fig. 7.7 The shooting algorithm. The original path $Y(\mathcal{T})$ is shown as the solid line. A point $y_{j\Delta t}$ randomly chosen from this path is used to determine the shooting point $x_{j\Delta t}$. Two example shooting paths are shown as dashed lines. The long dashed line is a successful path that is accepted, while the short dashed line shows an unsuccessful path that is rejected.

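Let $\tau(x_{j\Delta t}|y_{j\Delta t})$ denote the rule for generating a trial shooting point $x_{j\Delta t}$ from $y_{j\Delta t}$. Then, we can express $T_{AB}[X(\mathcal{T})|Y(\mathcal{T})]$ as
\[
T_{AB}[X(\mathcal{T})|Y(\mathcal{T})] = \tau(x_{j\Delta t}|y_{j\Delta t}) \left[ \prod_{k=j}^{n-1} T(x_{(k+1)\Delta t}|x_{k\Delta t}) \right] \left[ \prod_{k=1}^{j} T(x_{(k-1)\Delta t}|x_{k\Delta t}) \right]. \tag{7.7.11}
\]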


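The first product in eqn. (7.7.11) is the probability for the forward trajectory, and the second product, which requires the rule to generate $x_{(k-1)\Delta t}$ from $x_{k\Delta t}$ via time-reversed dynamics, is the weight for the backward trajectory. For molecular dynamics, this part of the trajectory is just obtained via eqn. (7.7.1) using $-\Delta t$ instead of $\Delta t$, i.e., $x_{(k-1)\Delta t} = \phi_{-\Delta t}(x_{k\Delta t})$, which can be generated by integrating forward in time with time step $\Delta t$ but with velocities reversed at the shooting point. Combining eqn. (7.7.11) with (7.7.10), we obtain for the acceptance probability
\[
\begin{aligned}
\Lambda[X(\mathcal{T})|Y(\mathcal{T})] &= h_A(x_0)\, h_B(x_{n\Delta t}) \min\Bigg[ 1, \frac{f(x_0)}{f(y_0)} \prod_{k=0}^{n-1} \frac{T(x_{(k+1)\Delta t}|x_{k\Delta t})}{T(y_{(k+1)\Delta t}|y_{k\Delta t})} \\
&\qquad \times \frac{\tau(y_{j\Delta t}|x_{j\Delta t})}{\tau(x_{j\Delta t}|y_{j\Delta t})} \prod_{k=j}^{n-1} \frac{T(y_{(k+1)\Delta t}|y_{k\Delta t})}{T(x_{(k+1)\Delta t}|x_{k\Delta t})} \prod_{k=0}^{j-1} \frac{T(y_{k\Delta t}|y_{(k+1)\Delta t})}{T(x_{k\Delta t}|x_{(k+1)\Delta t})} \Bigg] \\
&= h_A(x_0)\, h_B(x_{n\Delta t}) \\
&\qquad \times \min\Bigg[ 1, \frac{f(x_0)}{f(y_0)} \frac{\tau(y_{j\Delta t}|x_{j\Delta t})}{\tau(x_{j\Delta t}|y_{j\Delta t})} \prod_{k=0}^{j-1} \frac{T(y_{k\Delta t}|y_{(k+1)\Delta t})\, T(x_{(k+1)\Delta t}|x_{k\Delta t})}{T(x_{k\Delta t}|x_{(k+1)\Delta t})\, T(y_{(k+1)\Delta t}|y_{k\Delta t})} \Bigg].
\end{aligned} \tag{7.7.12}
\]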

Although eqn. (7.7.12) might seem rather involved, consider what happens when the trajectories are generated by molecular dynamics, with the trial probability given by eqn. (7.7.2). Since a symmetric Trotter factorization of the classical propagator is time reversible, as discussed in Section 3.10, the ratio $T(y_{k\Delta t}|y_{(k+1)\Delta t})/T(y_{(k+1)\Delta t}|y_{k\Delta t})$ is unity, as is the ratio $T(x_{(k+1)\Delta t}|x_{k\Delta t})/T(x_{k\Delta t}|x_{(k+1)\Delta t})$, and the acceptance criterion simplifies to
\[
\Lambda[X(\mathcal{T})|Y(\mathcal{T})] = h_A(x_0)\, h_B(x_{n\Delta t}) \min\left[ 1, \frac{f(x_0)}{f(y_0)} \frac{\tau(y_{j\Delta t}|x_{j\Delta t})}{\tau(x_{j\Delta t}|y_{j\Delta t})} \right]. \tag{7.7.13}
\]
Finally, suppose the new shooting point $x_{j\Delta t}$ is generated from the old point $y_{j\Delta t}$ using the following rule:
\[
x_{j\Delta t} = y_{j\Delta t} + \Delta, \tag{7.7.14}
\]
where the phase-space displacement $\Delta$ is chosen randomly from a symmetric distribution $\pi(\Delta)$ satisfying $\pi(-\Delta) = \pi(\Delta)$. In this case, the ratio $\tau(y_{j\Delta t}|x_{j\Delta t})/\tau(x_{j\Delta t}|y_{j\Delta t})$ in eqn. (7.7.13) is unity, and the acceptance rule becomes simply
\[
\Lambda[X(\mathcal{T})|Y(\mathcal{T})] = h_A(x_0)\, h_B(x_{n\Delta t}) \min\left[ 1, \frac{f(x_0)}{f(y_0)} \right], \tag{7.7.15}
\]
which is determined just by the initial conditions and whether the new trajectory is a proper transition path from $A$ to $B$.

At this point, several comments on the shooting algorithm are in order. First, a very common and simple choice for the phase-space displacement is $\Delta = (0, \delta p)$, meaning that only the momenta are altered, while the configuration is left unchanged (Dellago et al., 2002). If $\delta p$ is chosen from a Maxwell-Boltzmann distribution, then the symmetry condition is satisfied and will continue to be satisfied if the new momenta are projected onto a surface of constraint or modified to give zero total linear or angular momentum in the system. As a rule of thumb, the displacement $\Delta$ should be chosen to give roughly 40% acceptance probability (Dellago et al., 2002).

The basic steps of the shooting algorithm can be summarized as follows:

1. Choose an index $j$ randomly on the old trajectory $Y(\mathcal{T})$.
2. Generate a random phase-space displacement $\Delta$ in order to generate the new shooting point $x_{j\Delta t}$ from the old point $y_{j\Delta t}$.
3. Integrate the equations of motion backwards in time from the shooting point to the initial condition $x_0$.
4. If the initial condition $x_0$ is not in the phase-space region $A$, reject the trial move.
5. If $x_0 \in A$, provisionally accept the move with probability $\min[1, f(x_0)/f(y_0)]$; otherwise, reject it. Note that if the distribution of initial conditions is microcanonical rather than canonical (or isothermal-isobaric), this step can be skipped.
6. Integrate the equations of motion forward in time to generate the final point $x_{n\Delta t}$.
7. If $x_{n\Delta t} \in B$, accept the trial move, and reject it otherwise.
8. If the path is rejected at steps 4, 5, or 7, then the old trajectory $Y(\mathcal{T})$ is counted again in the calculation of averages over the transition path ensemble. Otherwise, invert the momenta along the backward part of the path to yield a forward moving transition path $X(\mathcal{T})$ and replace the old trajectory $Y(\mathcal{T})$ by the new trajectory $X(\mathcal{T})$.

A useful variant of the shooting algorithm is known as “aimless shooting” (Peters and Trout, 2006). In this approach, rather than displacing a point $y_{j\Delta t}$ to obtain a new shooting point $x_{j\Delta t}$, a shooting point $y_{j\Delta t}$ from the old trajectory is selected, and the momenta are resampled from a Boltzmann distribution at that point.
The resulting new phase-space point is denoted $x_{j\Delta t}$. As in the shooting algorithm above, the equations of motion for the new trajectory originating at the new shooting point $x_{j\Delta t}$ are integrated backward to $t = 0$ to determine whether the trajectory will return to the phase-space region $A$. If it does, then the equations of motion are integrated forward to $t = n\Delta t = \mathcal{T}$ to determine whether the trajectory will terminate in the phase-space region $B$. If it does, then the trajectory is accepted; otherwise, the trajectory is rejected. In aimless shooting, the shooting point does not correspond to an arbitrary choice of $j$, but rather, $j$ is chosen to be $(n/2) - 1$, $n/2$, or $(n/2) + 1$ with equal probability, so that the corresponding point on the new trajectory is $x_{((n/2)-1)\Delta t}$, $x_{(n/2)\Delta t}$, or $x_{((n/2)+1)\Delta t}$, respectively. We note, in passing, that aimless shooting simplifies the condition for accepting a trajectory. Importantly, aimless shooting is designed to generate most of its shooting points near a region in the phase space for which there is roughly equal probability that trajectories initiated at these points will end in $A$ or $B$, which we identify as a transition state. Further discussion of this point will be deferred to Section 8.12, where the concept of a committor and its distribution is introduced.

Another important point to note about the transition path sampling approach is that an initial transition path $X_0(\mathcal{T})$ is needed in order to seed the algorithm. Generating such a path can be difficult, particularly for extremely rare event processes. However, a few tricks can be employed, for example, running a system at high temperature to accelerate a process or possibly starting a path in $B$ and letting it evolve to $A$. The latter could be employed, for example, in protein folding, where it is generally easier to induce a protein to unfold than to fold. Although initial paths generated via such tricks are not likely to have a large weight in the transition path ensemble, they should quickly relax to more probable paths under the shooting algorithm. However, as with any Monte Carlo scheme, this fast relaxation cannot be guaranteed if the initial path choice is a particularly poor one. Just as configurational Monte Carlo methods can become trapped in local regions of configuration space, so transition path sampling can become trapped in local regions of path space where substantial barriers keep the system from accessing regions of the space containing paths of higher probability. The design of path-generation algorithms capable of enhancing sampling of the transition path ensemble is still an open and interesting question with room for novel improvements.
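The eight steps of the shooting algorithm can be sketched for a single particle in a double well. This is a minimal illustration under stated assumptions, not a production implementation: the potential $U(x) = 8(x^2 - 1)^2$ with $kT = 1$, the region definitions $A = \{x < -0.5\}$ and $B = \{x > 0.5\}$, the path length, the Gaussian width of the momentum displacement, and all function names are hypothetical choices. The initial path is seeded by launching a trajectory from the barrier top in both time directions. The backward segment is generated by stepping with $-\Delta t$, so the stored momenta already point forward in time and no momentum inversion is required in step 8.

```python
import math
import random

BETA, MASS, BARRIER = 1.0, 1.0, 8.0    # units with kT = 1 (assumed model)
DT, N_STEPS = 0.005, 240               # time step and number of steps n

def potential(x):
    return BARRIER * (x * x - 1.0) ** 2

def force(x):
    return -4.0 * BARRIER * x * (x * x - 1.0)

def energy(point):
    x, p = point
    return p * p / (2.0 * MASS) + potential(x)

def in_A(x):
    return x < -0.5                    # assumed definition of region A

def in_B(x):
    return x > 0.5                     # assumed definition of region B

def integrate(x, p, dt, n):
    """Return the n + 1 phase-space points generated by velocity Verlet."""
    traj = [(x, p)]
    for _ in range(n):
        p += 0.5 * dt * force(x)
        x += dt * p / MASS
        p += 0.5 * dt * force(x)
        traj.append((x, p))
    return traj

def shooting_move(old, rng, dp_width=0.5):
    """Attempt one shooting move; returns (path, accepted)."""
    j = rng.randrange(N_STEPS + 1)                  # step 1: pick index j
    xj, pj = old[j]
    pj += rng.gauss(0.0, dp_width)                  # step 2: Delta = (0, dp)
    back = integrate(xj, pj, -DT, j)                # step 3: backward to x_0
    if not in_A(back[-1][0]):                       # step 4: x_0 must lie in A
        return old, False
    dH = energy(back[-1]) - energy(old[0])          # step 5: min[1, f(x0)/f(y0)]
    if dH > 0.0 and rng.random() >= math.exp(-BETA * dH):
        return old, False
    fwd = integrate(xj, pj, DT, N_STEPS - j)        # step 6: forward to x_n
    if not in_B(fwd[-1][0]):                        # step 7: x_n must lie in B
        return old, False
    return back[::-1] + fwd[1:], True               # step 8: time-ordered path

rng = random.Random(1)
mid = N_STEPS // 2
# Seed path: start on the barrier top with a small positive momentum.
path = integrate(0.0, 1.0, -DT, mid)[::-1] + integrate(0.0, 1.0, DT, N_STEPS - mid)[1:]
chain, n_accepted = [], 0
for _ in range(200):
    path, accepted = shooting_move(path, rng)       # rejected moves recount the old path
    n_accepted += accepted
    chain.append(path)
print("accepted", n_accepted, "of", len(chain), "shooting moves")
```

Because the dynamics here are deterministic and one-dimensional, the path ensemble is very restricted, so the acceptance rate is best viewed as a diagnostic for tuning `dp_width` rather than as a physically meaningful number.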

7.8 Problems

7.1. Write a Monte Carlo program to calculate the integral
\[
I = \int_0^1 e^{-x^2}\, dx
\]
using
a. uniform sampling of $x$ on the interval $[0, 1]$, and
b. an importance function $h(x)$, where
\[
h(x) = \frac{3}{2}\left( 1 - x^2 \right)
\]
constitutes the first two terms in the Taylor series expansion of $\exp(-x^2)$.
In both cases, compare the converged result you obtain to the value of $I$ generated using a simple numerical integration algorithm such as Simpson’s rule.

7.2. Devise an importance function for performing the integral
\[
I = \int_0^1 \cos\left( \frac{\pi x}{2} \right) dx,
\]
and show using a Monte Carlo program that your importance function leads to a smaller variance than uniform sampling for the same number of Monte Carlo moves. How many steps are required with your importance function to converge the Monte Carlo estimator to within $10^{-6}$ of the analytical value of $I$?

336

Monte Carlo

7.3. The following example (Kalos and Whitlock, 1986) illustrates the recursion associated with the M(RT)2 algorithm. Consider the M(RT)2 algorithm for sampling the one-dimensional probability distribution f (x) = 2x for x ∈ (0, 1). Let the probability for trial moves from y to x be   1 x ∈ (0, 1) T (x|y) = . 0 otherwise In this case, r(x|y) = x/y for in y ∈ (0, 1). a. Show that the sequence of distributions πn (x) satisfies the recursion Z x Z x Z 1 y x 1− dy. πn (y)dy + πn (y)dy + πn (x) πn+1 (x) = x 0 0 x y b. Show, therefore, that πn (x) = cx is a fixed point of the recursion, where c is an arbitrary constant. By normalization, c must be equal to 2. c. Now suppose that we start the recursion with π0 (x) = 3x2 and that at the nth step of the iteration πn (x) = an x + cn xn+2 , where an and cn are constants. Show that as n → ∞, cn goes asymptotically to 0, leaving a distribution that is purely linear. ∗

7.4. In this problem, we will compare some of the Monte Carlo schemes introduced in this chapter to thermostatted molecular dynamics for a one-dimensional harmonic oscillator for which U (x) =

1 mω 2 x2 2

in the canonical ensemble. For this problem, you can take the mass m and frequency ω both equal to 1. a. Write a Monte Carlo program that uses uniform sampling of x within the M(RT)2 algorithm to calculate the canonical ensemble average hx4 i. Try to optimize the step size ∆ and average acceptance probability to obtain the lowest possible variance. b. Write a hybrid Monte Carlo program that uses the velocity Verlet integrator to generate trial moves of x. Use your program to calculate the same average hx4 i. Try to optimize the time step and average acceptance probability to obtain the lowest possible variance. c. Write a thermostatted molecular dynamics program using the Nos´e-Hoover chain equations together with the integrator described by eqns. (4.12.8)(4.12.19). Try using nsy = 7 with the weights in eqn. (4.12.13) and n = 4. Adjust the time step so that the energy in eqn. (4.11.3) is conserved to 10−4 as measured by eqn. (3.14.1).

Problems

337

d. Compare the number of steps of each algorithm needed to converge the average hx4 i to within the same error as measured by the variance. What are your conclusions about the efficiency of Monte Carlo versus molecular dynamics for this problem? ∗

7.5. Consider Hamilton’s equations in the form x˙ = η(x). Suppose x = (q, p) is a two-dimensional phase-space vector, and let the equations of motion be integrated using the following numerical solver:   ∆t η(x(0)) . x(∆t) = x(0) + ∆tη x(0) + 2 Can this algorithm be used in conjunction with hybrid Monte Carlo? Why or why not? 7.6. Write a replica exchange Monte Carlo algorithm to sample the Boltzmann distribution corresponding to a double-well potential U (x) of the form U (x) = D0 x2 − a2

2

for a = 1 and D0 values of 5 and 10. For each case, optimize the temperature ladder T1 , ..., TM , the number of replicas M , and the frequency of exchange attempts. Use separate Nos´e-Hoover chains to control the temperatures on each of the M replicas. For each simulation, plot the following measure of convergence: Nbins 1 X |Pk (xi ) − Pexact (xi )|, ζk = Nbins i=1 where Pexact (x) is the exact probability distribution, Pk (x) is the probability at the kth step of the simulation, and Nbins is the number of bins used in the calculation of the histogram of the system with temperature T1 . ∗

7.7 Suppose a non-Hamiltonian molecular dynamics algorithm, such as the Nos´eHoover chain method of Section 4.11, is used to generate paths in the transition path sampling algorithm. Assuming symmetry of the rule for generating shooting points, i.e., τ (yj∆t |xj∆t )/τ (xj∆t |yj∆t ), show that the acceptance rule in eqn. (7.7.15) must be modified to read   f (x0 ) J(yj∆t ; y0 ) , Λ[X(T)|Y(T)] = hA (x0 )hB (xn∆t )min 1, f (y0 ) J(xj∆t ; x0 ) where J(xt ; x0 ) is the Jacobian of the transformation from x0 to xt (x0 ) determined by eqn. (4.10.2).

338

Monte Carlo

7.8. The probability distribution function f (x) =

C , x2 + a2

where x ∈ (−∞, ∞), is known as a Lorentzian distribution. Here, a is a constant. a. Find an expression for the constant C so that f (x) is properly normalized. b. Design an algorithm for generating random samples from the Lorentzian distribution function. c. Repeat parts a and b for the distribution f (x) = where x ∈ [0, a].

Cx , + a2

x2

8 Free-energy calculations

Our treatment of the classical equilibrium ensembles makes clear that the free energy is a quantity of particular importance in statistical mechanics. Being related to the logarithm of the partition function, the free energy is the generator through which other thermodynamic quantities are obtained via differentiation. Often, however, we are less interested in the absolute free energy than we are in the free-energy difference between two thermodynamic states. Free-energy differences tell us, for example, whether a chemical reaction occurs spontaneously or requires input of work. They tell us whether a given solute is hydrophobic or hydrophilic, and they are directly related to equilibrium constants for chemical processes. Thus, from free-energy differences, we can compute acid or base ionization constants. We can also quantify the therapeutic viability of a candidate drug compound by calculating its inhibition constant or IC50 value from the binding free energy. Another type of free energy often sought is the free energy as a function of one or more generalized or collective coordinates in a system, e.g., a set of backbone dihedral angles in an oligopeptide. This surface provides a map of the stable conformations of the molecule, the relative stability of these conformations, and the barrier heights that must be crossed for a change in conformation. It should, therefore, be clear that free-energy surfaces encode the conformational preferences of complex molecules such as proteins and nucleic acids. In this chapter, we describe a variety of widely used techniques that have been developed for calculating free energies and discuss the relative merits and disadvantages of the methods. The fact that the free energy is a state function, ensuring that the system can be transformed from one state to another along physical or unphysical paths without affecting the free-energy difference, allows for considerable flexibility in the design of novel techniques and will be frequently exploited in the developments we will present. The techniques described in this chapter are constructed within the framework of the canonical ensemble with the aim of obtaining Helmholtz free-energy differences $\Delta A$. Generalization to the isothermal-isobaric ensemble and the Gibbs free-energy difference $\Delta G$ is, in all cases, straightforward. (For a useful compendium of free-energy calculation methods, readers are referred to Free-Energy Calculations, C. Chipot and A. Pohorille, eds. (2007) or Rugged Free-Energy Landscapes, W. Janke, ed. (2010).)

8.1 Free-energy perturbation theory

We begin our treatment of free-energy differences by considering the problem of transforming a system from one thermodynamic state to another. Let these states be denoted generically as A and B. At the microscopic level, these two states are characterized by potential energy functions $U_A(\mathbf{r}_1, \ldots, \mathbf{r}_N)$ and $U_B(\mathbf{r}_1, \ldots, \mathbf{r}_N)$. For example, in a drug-binding study, the state A might correspond to the unbound ligand and enzyme while B would correspond to the bound complex. In this case, the potential $U_A$ would exclude all interactions between the ligand and the enzyme, but the potential $U_B$ would include them. The Helmholtz free-energy difference between the states A and B is simply $\Delta A_{AB} = A_B - A_A$. The two free energies $A_A$ and $A_B$ are given in terms of their respective canonical partition functions $Q_A$ and $Q_B$ by $A_A = -kT \ln Q_A$ and $A_B = -kT \ln Q_B$, where

$$
\begin{aligned}
Q_A(N, V, T) &= C_N \int d^N\mathbf{p}\, d^N\mathbf{r}\, \exp\left\{ -\beta \left[ \sum_{i=1}^N \frac{\mathbf{p}_i^2}{2m_i} + U_A(\mathbf{r}_1, \ldots, \mathbf{r}_N) \right] \right\} = \frac{Z_A(N, V, T)}{N!\, \lambda^{3N}} \\
Q_B(N, V, T) &= C_N \int d^N\mathbf{p}\, d^N\mathbf{r}\, \exp\left\{ -\beta \left[ \sum_{i=1}^N \frac{\mathbf{p}_i^2}{2m_i} + U_B(\mathbf{r}_1, \ldots, \mathbf{r}_N) \right] \right\} = \frac{Z_B(N, V, T)}{N!\, \lambda^{3N}}.
\end{aligned}
\tag{8.1.1}
$$

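The $1/N!$ combinatorial factor should be adjusted if multiple particle species are present in the system (see the footnote below eqn. (3.2.22)). The free-energy difference is, therefore,

$$ \Delta A_{AB} = A_B - A_A = -kT \ln\left( \frac{Q_B}{Q_A} \right) = -kT \ln\left( \frac{Z_B}{Z_A} \right), \tag{8.1.2} $$

where $Z_A$ and $Z_B$ are the configurational partition functions for states A and B, respectively:

$$
\begin{aligned}
Z_A &= \int d^N\mathbf{r}\, e^{-\beta U_A(\mathbf{r}_1, \ldots, \mathbf{r}_N)} \\
Z_B &= \int d^N\mathbf{r}\, e^{-\beta U_B(\mathbf{r}_1, \ldots, \mathbf{r}_N)}.
\end{aligned}
\tag{8.1.3}
$$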

The ratio of full partition functions $Q_B/Q_A$ reduces to the ratio of configurational partition functions $Z_B/Z_A$ because the momentum integrations in the former cancel out of the ratio. Equation (8.1.2) is difficult to implement in practice because in any numerical calculation based either on molecular dynamics or Monte Carlo, we can compute averages of phase-space functions, but we do not have direct access to the partition function.¹ Thus, eqn. (8.1.2) can be computed directly if it can be expressed in terms of a phase-space average.

¹ The Wang-Landau method of Section 7.6 is an exception.

To this end, consider inserting unity into the expression for $Z_B$ as follows:

$$
\begin{aligned}
Z_B &= \int d^N\mathbf{r}\, e^{-\beta U_B(\mathbf{r}_1, \ldots, \mathbf{r}_N)} \\
&= \int d^N\mathbf{r}\, e^{-\beta U_B(\mathbf{r}_1, \ldots, \mathbf{r}_N)}\, e^{-\beta U_A(\mathbf{r}_1, \ldots, \mathbf{r}_N)}\, e^{\beta U_A(\mathbf{r}_1, \ldots, \mathbf{r}_N)} \\
&= \int d^N\mathbf{r}\, e^{-\beta U_A(\mathbf{r}_1, \ldots, \mathbf{r}_N)}\, e^{-\beta \left( U_B(\mathbf{r}_1, \ldots, \mathbf{r}_N) - U_A(\mathbf{r}_1, \ldots, \mathbf{r}_N) \right)}.
\end{aligned}
\tag{8.1.4}
$$

If we now take the ratio $Z_B/Z_A$, we find

$$
\begin{aligned}
\frac{Z_B}{Z_A} &= \frac{1}{Z_A} \int d^N\mathbf{r}\, e^{-\beta U_A(\mathbf{r}_1, \ldots, \mathbf{r}_N)}\, e^{-\beta \left( U_B(\mathbf{r}_1, \ldots, \mathbf{r}_N) - U_A(\mathbf{r}_1, \ldots, \mathbf{r}_N) \right)} \\
&= \left\langle e^{-\beta \left( U_B(\mathbf{r}_1, \ldots, \mathbf{r}_N) - U_A(\mathbf{r}_1, \ldots, \mathbf{r}_N) \right)} \right\rangle_A,
\end{aligned}
\tag{8.1.5}
$$

where the notation $\langle \cdots \rangle_A$ indicates an average taken with respect to the canonical configurational distribution of the state A. Substituting eqn. (8.1.5) into eqn. (8.1.2) gives

$$ \Delta A_{AB} = -kT \ln \left\langle e^{-\beta (U_B - U_A)} \right\rangle_A. \tag{8.1.6} $$
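As a concrete check of eqn. (8.1.6), the following minimal sketch estimates the free-energy difference between two one-dimensional harmonic states, for which the exact answer $(kT/2)\ln(k_B/k_A)$ is known. The choice of harmonic potentials, the force constants `k_A` and `k_B`, the sample size, and the function name `fep_delta_A` are illustrative assumptions, and the configurations of state A are drawn directly from a Gaussian rather than from a molecular dynamics or Monte Carlo run.

```python
import numpy as np

def fep_delta_A(u_A, u_B, samples_A, kT=1.0):
    """Estimate A_B - A_A from configurations sampled in state A, eqn. (8.1.6)."""
    w = -(u_B(samples_A) - u_A(samples_A)) / kT
    # Log of the mean exponential, with the maximum subtracted for stability.
    w_max = np.max(w)
    return -kT * (w_max + np.log(np.mean(np.exp(w - w_max))))

kT, k_A, k_B = 1.0, 1.0, 2.0   # illustrative parameters
rng = np.random.default_rng(0)
# exp(-U_A/kT) with U_A = k_A x^2 / 2 is a Gaussian of variance kT / k_A.
x = rng.normal(0.0, np.sqrt(kT / k_A), size=200_000)

estimate = fep_delta_A(lambda q: 0.5 * k_A * q**2,
                       lambda q: 0.5 * k_B * q**2, x, kT)
exact = 0.5 * kT * np.log(k_B / k_A)
print(f"FEP estimate: {estimate:.4f}   exact: {exact:.4f}")
```

Here state B is narrower than state A, so the exponential weights are bounded by 1 and the estimator is well behaved; reversing the roles of the two states would make the average dominated by rare configurations.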

Equation (8.1.6) is known as the free-energy perturbation formula (Zwanzig, 1954); it should be reminiscent of the thermodynamic perturbation formula in eqn. (4.8.8). Recalling the potential distribution theorem (Widom, 1963; Beck et al., 2006) for calculating the excess chemical potential of a solute molecule in solution (Section 6.8), we see that it is essentially a variant of eqn. (8.1.6) applied to the Gibbs free energy G = µN . Equation (8.1.6) can be interpreted as follows: We sample a set of configurations {r1 , ..., rN } from the canonical distribution of state A and simply use these configurations, unchanged, to sample the canonical distribution of state B with potential UB . However, because these configurations are not sampled from exp(−βUB )/ZB directly, we need to “unbias” our sampling by removing the factor exp(−βUA ) and reweighting with exp(−βUB ), which leads to eqn. (8.1.6). The difficulty with this approach is that the configuration spaces of states A and B might not have significant overlap. By this, we mean that configurations sampled from the canonical distribution of state A may not be states of high probability in the canonical distribution of state B. When this is the case, the potential energy difference UB − UA becomes large, and hence the exponential factor exp[−β(UB − UA )] becomes negligibly small. Thus, most of the configurations have very low weight in the ensemble average, and the free-energy difference converges slowly. For this reason, it is clear that the free-energy perturbation formula is only useful when the two states A and B do not differ significantly. In other words, the state B must be a small perturbation to the state A. Even if B is not a small perturbation to A, the free-energy perturbation idea can still be salvaged by introducing a set of M − 2 intermediate states with potentials


$U_\alpha(\mathbf{r}_1, \ldots, \mathbf{r}_N)$, where $\alpha = 1, \ldots, M$, $\alpha = 1$ corresponds to the state A, and $\alpha = M$ corresponds to the state B. Let $\Delta U_{\alpha,\alpha+1} = U_{\alpha+1} - U_\alpha$. We now transform the system from state A to state B along a path through each of the intermediate states and compute the average of $e^{-\beta \Delta U_{\alpha,\alpha+1}}$ in each state $\alpha$. The free-energy difference $\Delta A_{AB}$ is the sum of contributions obtained using the free-energy perturbation formula from each intermediate state along the path:

$$ \Delta A_{AB} = -kT \sum_{\alpha=1}^{M-1} \ln \left\langle e^{-\beta \Delta U_{\alpha,\alpha+1}} \right\rangle_\alpha, \tag{8.1.7} $$

where $\langle \cdots \rangle_\alpha$ represents an average taken over the distribution $\exp(-\beta U_\alpha)$. The key to applying eqn. (8.1.7) is to choose the thermodynamic path between A and B so as to achieve sufficient overlap between successive intermediate states without requiring a large number of them.
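The staged formula in eqn. (8.1.7) can be illustrated on a case where a single perturbation step would span a hundredfold change in a harmonic force constant. The sketch below is a toy example under stated assumptions: the intermediate states are harmonic potentials with geometrically spaced force constants (one possible path, not a recommended one), each state is sampled exactly from a Gaussian, and the name `staged_fep` is hypothetical.

```python
import numpy as np

def staged_fep(k_values, n_samples=100_000, kT=1.0, seed=0):
    """Sum free-energy perturbation contributions over successive states, eqn. (8.1.7).

    State alpha has U_alpha(x) = k_alpha x^2 / 2 (an illustrative assumption).
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for k_a, k_b in zip(k_values[:-1], k_values[1:]):
        # Exact sampling of exp(-U_alpha/kT): Gaussian of variance kT / k_a.
        x = rng.normal(0.0, np.sqrt(kT / k_a), size=n_samples)
        dU = 0.5 * (k_b - k_a) * x**2          # U_{alpha+1} - U_alpha
        total += -kT * np.log(np.mean(np.exp(-dU / kT)))
    return total

k_path = np.geomspace(1.0, 100.0, 6)   # M = 6 states: A, four intermediates, B
delta_A = staged_fep(k_path)
exact = 0.5 * np.log(100.0)            # (kT/2) ln(k_B / k_A) with kT = 1
print(f"staged FEP: {delta_A:.4f}   exact: {exact:.4f}")
```

Each stage changes the force constant by the same factor, so the overlap between successive states is the same along the whole path.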

8.2 Adiabatic switching and thermodynamic integration

The free-energy perturbation approach evokes a physical picture in which configurations sampled from the canonical distribution of state A are immediately "switched" to the state B by simply changing the potential from $U_A$ to $U_B$. When there is insufficient overlap between the states A and B, a set of intermediate states can be employed to define an optimal transformation path. The use of intermediate states conjures a picture in which the system is slowly switched from A to B. In this section, we will discuss an alternative approach in which the system is continuously, adiabatically switched from A to B. An adiabatic path is one along which the system is fully relaxed at each point of the path. In order to effect the switching from one state to the other, we employ a common trick of introducing an "external" switching variable $\lambda$ in order to parameterize the adiabatic path. This parameter is used to define a new potential energy function, sometimes called a "metapotential," defined as

$$ U(\mathbf{r}_1, \ldots, \mathbf{r}_N, \lambda) \equiv f(\lambda)\, U_A(\mathbf{r}_1, \ldots, \mathbf{r}_N) + g(\lambda)\, U_B(\mathbf{r}_1, \ldots, \mathbf{r}_N). \tag{8.2.1} $$

The functions $f(\lambda)$ and $g(\lambda)$ are switching functions that must satisfy the conditions $f(0) = 1$, $f(1) = 0$, $g(0) = 0$, $g(1) = 1$. Thus $U(\mathbf{r}_1, \ldots, \mathbf{r}_N, 0) = U_A(\mathbf{r}_1, \ldots, \mathbf{r}_N)$ and $U(\mathbf{r}_1, \ldots, \mathbf{r}_N, 1) = U_B(\mathbf{r}_1, \ldots, \mathbf{r}_N)$. Apart from these conditions, $f(\lambda)$ and $g(\lambda)$ are completely arbitrary. The mechanism of eqn. (8.2.1) is one in which an imaginary external controlling influence ("hand of God" in the form of the $\lambda$ parameter) starts the system off in state A ($\lambda = 0$) and slowly switches off the potential $U_A$ while simultaneously switching on the potential $U_B$. The process is complete when $\lambda = 1$. A simple choice for the functions $f(\lambda)$ and $g(\lambda)$ is $f(\lambda) = 1 - \lambda$ and $g(\lambda) = \lambda$. In order to see how eqn. (8.2.1) is used to compute the free-energy difference $\Delta A_{AB}$, consider the canonical partition function of a system described by the potential of eqn. (8.2.1) for a particular choice of $\lambda$:

$$ Q(N, V, T, \lambda) = C_N \int d^N\mathbf{p}\, d^N\mathbf{r}\, \exp\left\{ -\beta \left[ \sum_{i=1}^N \frac{\mathbf{p}_i^2}{2m_i} + U(\mathbf{r}_1, \ldots, \mathbf{r}_N, \lambda) \right] \right\}. \tag{8.2.2} $$


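This partition function leads to a free energy $A(N, V, T, \lambda)$ via

$$ A(N, V, T, \lambda) = -kT \ln Q(N, V, T, \lambda). \tag{8.2.3} $$

In Section 4.2, we showed that derivatives of $A$ with respect to $N$, $V$, or $T$ give the chemical potential, pressure, or entropy, respectively. For $A(N, V, T, \lambda)$, what does the derivative with respect to $\lambda$ represent? According to eqn. (8.2.3),

$$ \frac{\partial A}{\partial \lambda} = -\frac{kT}{Q} \frac{\partial Q}{\partial \lambda} = -\frac{kT}{Z} \frac{\partial Z}{\partial \lambda}, \tag{8.2.4} $$

where $Z$ is the configurational partition function corresponding to the potential $U(\mathbf{r}_1, \ldots, \mathbf{r}_N, \lambda)$. Computing the derivative of $Z$ with respect to $\lambda$, we find

$$
\begin{aligned}
\frac{kT}{Z} \frac{\partial Z}{\partial \lambda} &= \frac{kT}{Z} \frac{\partial}{\partial \lambda} \int d^N\mathbf{r}\, e^{-\beta U(\mathbf{r}_1, \ldots, \mathbf{r}_N, \lambda)} \\
&= \frac{kT}{Z} \int d^N\mathbf{r} \left( -\beta \frac{\partial U}{\partial \lambda} \right) e^{-\beta U(\mathbf{r}_1, \ldots, \mathbf{r}_N, \lambda)} \\
&= -\left\langle \frac{\partial U}{\partial \lambda} \right\rangle.
\end{aligned}
\tag{8.2.5}
$$

Note that the free-energy difference $\Delta A_{AB}$ can be obtained trivially from the relation

$$ \Delta A_{AB} = \int_0^1 \frac{\partial A}{\partial \lambda}\, d\lambda. \tag{8.2.6} $$

Substituting eqns. (8.2.4) and (8.2.5) into eqn. (8.2.6) yields the free-energy difference as

$$ \Delta A_{AB} = \int_0^1 \left\langle \frac{\partial U}{\partial \lambda} \right\rangle_\lambda d\lambda, \tag{8.2.7} $$

where $\langle \cdots \rangle_\lambda$ denotes an average over the canonical ensemble described by the distribution $\exp[-\beta U(\mathbf{r}_1, \ldots, \mathbf{r}_N, \lambda)]$ with $\lambda$ fixed at a particular value. The special choice of $f(\lambda) = 1 - \lambda$ and $g(\lambda) = \lambda$ has a simple interpretation: With these functions, eqn. (8.2.7) becomes

$$ \Delta A_{AB} = \int_0^1 \left\langle U_B - U_A \right\rangle_\lambda d\lambda. \tag{8.2.8} $$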
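Equation (8.2.8) lends itself to a compact numerical illustration: fix $\lambda$ at a handful of values, estimate $\langle U_B - U_A \rangle_\lambda$ at each, and integrate by quadrature. The sketch below does this for two one-dimensional harmonic potentials with the linear switches $f(\lambda) = 1 - \lambda$, $g(\lambda) = \lambda$. The harmonic model, the force constants, the use of Gauss-Legendre nodes, and the name `ti_delta_A` are illustrative assumptions, and each fixed-$\lambda$ ensemble is sampled exactly from a Gaussian rather than by molecular dynamics or Monte Carlo.

```python
import numpy as np

def ti_delta_A(k_A, k_B, n_nodes=8, n_samples=50_000, kT=1.0, seed=0):
    """Thermodynamic integration, eqn. (8.2.8), for U_A = k_A x^2/2 and U_B = k_B x^2/2."""
    rng = np.random.default_rng(seed)
    # Gauss-Legendre nodes and weights, mapped from [-1, 1] to [0, 1].
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    lam = 0.5 * (nodes + 1.0)
    w = 0.5 * weights
    means = np.empty(n_nodes)
    for i, l in enumerate(lam):
        k_l = (1.0 - l) * k_A + l * k_B        # U(x, lam) = k_l x^2 / 2
        x = rng.normal(0.0, np.sqrt(kT / k_l), size=n_samples)
        means[i] = np.mean(0.5 * (k_B - k_A) * x**2)   # <U_B - U_A>_lam
    return float(np.dot(w, means))

ti_estimate = ti_delta_A(1.0, 4.0)
ti_exact = 0.5 * np.log(4.0)       # (kT/2) ln(k_B / k_A) with kT = 1
print(f"TI estimate: {ti_estimate:.4f}   exact: {ti_exact:.4f}")
```

For this model the integrand is $(k_B - k_A)kT/[2k(\lambda)]$ with $k(\lambda) = (1-\lambda)k_A + \lambda k_B$, which is smooth on $[0, 1]$, so a few quadrature nodes suffice and the residual error is dominated by the statistical noise in the averages.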

Equation (8.2.8) recalls the relationship between work and free energy from the second law of thermodynamics. If, in transforming the system from state A to state B, an amount of work $W$ is performed on the system, then

$$ W \geq \Delta A_{AB}, \tag{8.2.9} $$

where equality holds only if the transformation is carried out along a reversible path. We will refer to this inequality as the "work-free-energy inequality." Since reversible work is related to a change in potential energy (see Section 1.6), eqn. (8.2.8) is actually a statistical version of eqn. (8.2.9) for the special case of equality. Equation (8.2.8) tells


us that the free-energy difference is the ensemble average of the microscopic reversible work needed to change the potential energy of each configuration from $U_A$ to $U_B$ along the chosen $\lambda$-path. Note, however, that eqn. (8.2.7), which is known as the thermodynamic integration formula (Kirkwood, 1935), is independent of the choice of $f(\lambda)$ and $g(\lambda)$, which means that eqn. (8.2.7) always yields the reversible work via the free-energy difference. The flexibility in the choice of the $\lambda$-path, however, can be exploited to design adiabatic switching algorithms of greater efficiency than can be achieved with the simple choice $f(\lambda) = 1 - \lambda$, $g(\lambda) = \lambda$. In practice, the thermodynamic integration formula is implemented as follows: A set of $M$ values of $\lambda$ is chosen from the interval $[0, 1]$, and at each chosen value $\lambda_k$ a full molecular dynamics or Monte Carlo calculation is carried out in order to generate the average $\langle \partial U/\partial \lambda \rangle_{\lambda_k}$. The resulting values of $\langle \partial U/\partial \lambda \rangle_{\lambda_k}$, $k = 1, \ldots, M$, are then substituted into eqn. (8.2.7), and the result is integrated numerically to produce the free-energy difference $\Delta A_{AB}$. The selected values $\{\lambda_k\}$ can be evenly spaced, for example, or they could be a set of Gaussian quadrature nodes, depending on the anticipated variation of $A(N, V, T, \lambda)$ with $\lambda$ for particular $f(\lambda)$ and $g(\lambda)$.

Let us now consider a type of free-energy calculation of particular relevance, namely, the binding of a lead drug candidate to the active site of an enzyme E. The purpose of the drug candidate is to inhibit the catalytic mechanism of an enzyme used, for instance, by a virus to attack a host cell, hijack its cellular machinery, or replicate itself. We will refer to the candidate drug compound as "I" (inhibitor).
The efficacy of the compound as an inhibitor of the enzyme is measured by an equilibrium constant known as the inhibition constant Ki = [E][I]/[EI], where [E], [I], and [EI] refer to the concentrations in aqueous solution of the uncomplexed enzyme, uncomplexed inhibitor, and enzyme-inhibitor complex EI, respectively. Since Ki is an equilibrium constant, it is also related to the Gibbs free energy of binding ∆Gb , which is the free energy of the reaction E(aq) + I(aq) ⇋ EI(aq). That is, Ki = exp(∆Gb /kT ). For the purposes of this discussion, we will assume that the binding Helmholtz free energy ∆Ab is approximately equal to the Gibbs free energy, so that the former can be reasonably used to estimate the inhibition constant. If we wish to determine Ki for a given drug candidate by calculating the binding free energy, a technical complication immediately arises. In principle, we can let the potential UA contain all interactions except that between the enzyme and the inhibitor and then let this excluded interaction be included in UB . Now consider placing an enzyme and inhibitor in a bath of water molecules in order to perform the calculation. First, in order to sample the unbound state, the enzyme and inhibitor need to be separated by a distance large enough that both are fully solvated. If we then attempt to let them bind by turning on the enzyme-inhibitor interaction in stages, the probability that they will “find” each other and bind properly under this interaction is small, and the calculation will be inefficient. For this reason, a more efficient thermodynamic path for this problem is a three-stage one in which the enzyme and the inhibitor are first desolvated by transferring them from solution to vacuum. Following this, the enzyme and inhibitor are allowed to bind in vacuum, and finally, the complex EI is solvated by transferring it back to solution. Figure 8.1 illustrates the direct and indirect paths. 
Since free energy is a state function, the final result is independent of the path taken.


Fig. 8.1 Two thermodynamic pathways for the calculation of the binding free energy of an enzyme E and inhibitor I. According to the figure, $\Delta A_b = \Delta A_1 + \Delta A_2 + \Delta A_3$.

The advantage of the indirect path, however, is that once desolvated, the enzyme and inhibitor no longer need to be at such a large separation. Hence, we can start them much closer to each other in order to obtain the vacuum binding free energy. Moreover, this part of the calculation will have a low computational overhead because the expensive water–water interactions have been removed. Although the desolvation and solvation parts of the cycle are still expensive to carry out, they are considerably more straightforward than direct binding in solution. The method shown in Fig. 8.1 is known as the “double decoupling method” (Gilson et al., 1997; Deng and Roux, 2009). As with free-energy perturbation theory, the thermodynamic integration approach can be implemented easily. An immediate disadvantage of the method, however, is the same as applies to eqn. (8.1.7): in order to perform the numerical integration, it is necessary to perform many simulations of a system at physically uninteresting intermediate values of λ where the potential U (r1 , ..., rN , λ) is, itself, unphysical. Only λ = 0, 1 correspond to actual physical states, and ultimately we can only attach physical meaning to the free-energy difference ∆AAB = A(N, V, T, 1) − A(N, V, T, 0). Nevertheless, the intermediate averages must be accurately calculated in order for the integration to yield a correct result. The approach in the next section attempts to reduce the time spent in such unphysical intermediate states, thereby focusing the


sampling in the important regions $\lambda = 0, 1$. In carrying out calculations that involve coupling and decoupling from a solvent, it is often useful to couple/decouple electrostatic and van der Waals interactions separately, which requires two $\lambda$ pathways. For example, if the potential $U_B(\mathbf{r}_1, \ldots, \mathbf{r}_N)$ contains a van der Waals coupling potential $u_{\rm vdw}(r_{ij})$ between solute atom $i$ and solvent atom $j$, where $r_{ij} = |\mathbf{r}_i - \mathbf{r}_j|$, then the corresponding term in the metapotential would be $u_{\rm vdw}^{\rm (meta)}(r_{ij}, \lambda) = g(\lambda)\, u_{\rm vdw}(r_{ij})$. When $u_{\rm vdw}(r_{ij})$ is a Lennard-Jones potential $u_{\rm vdw}(r_{ij}) = 4\epsilon[(\sigma/r_{ij})^{12} - (\sigma/r_{ij})^6]$, as $\lambda \to 0$, particles can easily penetrate the repulsive core, which can become problematic when $\lambda$ increases and the Lennard-Jones interaction once again becomes active. Greater numerical stability can be achieved by modifying the form of the potential in such a way as to "soften" the hard core of the potential, with the full Lennard-Jones potential being recovered when $\lambda = 1$. One such form (Beutler et al., 1994) is

$$ u_{\rm vdw}^{\rm (meta,soft)}(r_{ij}, \lambda) = 4\epsilon\, g(\lambda) \left[ \left( \frac{\sigma^6}{r_{ij}^6 + \frac{(1-\lambda)\sigma^6}{2}} \right)^2 - \left( \frac{\sigma^6}{r_{ij}^6 + \frac{(1-\lambda)\sigma^6}{2}} \right) \right]. \tag{8.2.10} $$

Clearly, when $\lambda = 1$, $u_{\rm vdw}^{\rm (meta,soft)}(r_{ij}, 1) = u_{\rm vdw}(r_{ij})$. Once the van der Waals potential is switched on ($\lambda = 1$), a second pathway for the Coulomb potential $u_{\rm Coul}(r_{ij})$ can be defined in terms of a second switching parameter $\lambda'$ with a contribution to the metapotential $u_{\rm Coul}^{\rm (meta)}(r_{ij}, \lambda') = g(\lambda')\, q_i q_j / r_{ij}$, where $q_i$ and $q_j$ are the charges on atoms $i$ and $j$, respectively. Once $\lambda'$ reaches 1, full coupling is achieved. It is worth noting that solvation free-energy calculations can be carried out with very large time steps using the isokinetic methods introduced in Section 4.14, which will be discussed further in Chapter 15 (see, also, Fig. 15.4) (Abreu and Tuckerman, 2020).
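The effect of the soft core can be seen numerically with a short sketch. It assumes the soft-core form $4\epsilon g(\lambda)[s^2 - s]$ with $s = \sigma^6/[r_{ij}^6 + (1-\lambda)\sigma^6/2]$, takes $g(\lambda) = \lambda$ as an illustrative switch, and uses reduced units $\epsilon = \sigma = 1$; the function names are hypothetical.

```python
import numpy as np

def lj(r, eps=1.0, sigma=1.0):
    """Standard Lennard-Jones potential."""
    s6 = (sigma / r) ** 6
    return 4.0 * eps * (s6**2 - s6)

def lj_softcore(r, lam, eps=1.0, sigma=1.0, g=lambda lam: lam):
    """Soft-core Lennard-Jones term; g(lam) = lam is an assumed switching function."""
    s = sigma**6 / (r**6 + 0.5 * (1.0 - lam) * sigma**6)
    return 4.0 * eps * g(lam) * (s**2 - s)

r = np.array([0.5, 0.9, 1.0, 1.5, 2.5])
# At lam = 1 the soft-core form reduces to the full Lennard-Jones potential.
print(np.allclose(lj_softcore(r, 1.0), lj(r)))
# For lam < 1 the potential remains finite even at complete overlap, r = 0.
print(lj_softcore(0.0, 0.5))
```

Because the $(1-\lambda)$ term keeps the denominator away from zero, overlapping particles feel a bounded repulsion at intermediate $\lambda$, which is the source of the improved numerical stability.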

8.3 Adiabatic free-energy dynamics

Although we cannot entirely eliminate the need to visit the unphysical states between $\lambda = 0$ and $\lambda = 1$ in adiabatic switching methods, we can substantially alleviate it. If, instead of preselecting a set of $\lambda$ values, we are willing to allow $\lambda$ to vary continuously in a molecular dynamics calculation as an additional dynamical degree of freedom, albeit a fictitious one, then we can exploit the flexibility in our choice of the switching functions $f(\lambda)$ and $g(\lambda)$ to make the region between $\lambda = 0$ and $\lambda = 1$ energetically unfavorable. When such a choice is used within a molecular dynamics calculation, $\lambda$ will spend most of its time in the physically relevant regions close to $\lambda = 0$ and $\lambda = 1$. To understand how to devise such a scheme, let us consider a Hamiltonian that includes a kinetic energy term $p_\lambda^2/2m_\lambda$, where $p_\lambda$ is a "momentum" conjugate to $\lambda$ and $m_\lambda$ is a mass-like parameter needed to define the kinetic energy. The parameter $m_\lambda$ also determines the time scale on which $\lambda$ evolves dynamically. The total Hamiltonian is then

$$ H_\lambda(\mathbf{r}, \lambda, \mathbf{p}, p_\lambda) = \frac{p_\lambda^2}{2m_\lambda} + \sum_{i=1}^N \frac{\mathbf{p}_i^2}{2m_i} + U(\mathbf{r}_1, \ldots, \mathbf{r}_N, \lambda). \tag{8.3.1} $$


In eqn. (8.3.1), $\lambda$ and its conjugate momentum $p_\lambda$ are now part of an extended phase space (see, for example, the discussion in Section 4.9). We can now define a canonical partition function for the Hamiltonian in eqn. (8.3.1),

$$ Q(N, V, T) = \int dp_\lambda \int d^N\mathbf{p} \int_0^1 d\lambda \int_{D(V)} d^N\mathbf{r}\, e^{-\beta H_\lambda(p_\lambda, \lambda, \mathbf{p}, \mathbf{r})}, \tag{8.3.2} $$

and therefore compute any ensemble average with respect to the corresponding canonical distribution. In particular, the probability distribution $P(\lambda') = \langle \delta(\lambda - \lambda') \rangle_\lambda$ leads directly to a $\lambda$-dependent free-energy function $A(\lambda)$ through the relation

$$ A(\lambda') = -kT \ln P(\lambda'). \tag{8.3.3} $$
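Equation (8.3.3) translates directly into a histogram calculation. The following minimal sketch, with the hypothetical function name `free_energy_profile` and an assumed bin count, converts a set of sampled $\lambda$ values into a free-energy profile; it is checked on synthetic samples drawn from $P(\lambda) = 2\lambda$, for which $A(\lambda) = -kT\ln(2\lambda)$, rather than on an actual trajectory.

```python
import numpy as np

def free_energy_profile(lam_samples, kT=1.0, n_bins=20):
    """Return bin centers and A(lambda) = -kT ln P(lambda), eqn. (8.3.3)."""
    hist, edges = np.histogram(lam_samples, bins=n_bins,
                               range=(0.0, 1.0), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        A = -kT * np.log(hist)      # bins that were never visited give A = +inf
    return centers, A

# Synthetic test data: if u is uniform on (0, 1), sqrt(u) has density 2*lambda.
rng = np.random.default_rng(0)
lam = np.sqrt(rng.random(500_000))
centers, A = free_energy_profile(lam)
print(np.max(np.abs(A + np.log(2.0 * centers))))   # deviation from -kT ln(2*lambda)
```

The statistical error is largest in the least-visited bins, which is one way of seeing why regions of high free energy are the hardest to resolve from a histogram.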

Equation (8.3.3) defines an important quantity known as a free-energy profile. We will have more to say about free-energy profiles starting in Section 8.6. Note that the free-energy difference

$$ A(1) - A(0) = -kT \ln\left[ \frac{P(1)}{P(0)} \right] = -kT \ln\left( \frac{Q_B}{Q_A} \right) = \Delta A_{AB}, \tag{8.3.4} $$

since $P(0)$ and $P(1)$ are proportional to the partition functions $Q_A$ and $Q_B$, respectively, with the same constant of proportionality. The distribution function $\langle \delta(\lambda - \lambda') \rangle_\lambda$ can be generated straightforwardly in a molecular dynamics calculation by accumulating a histogram of $\lambda$ values visited over the course of the trajectory. We still need to answer the question of how to maximize the time $\lambda$ spends near the endpoints $\lambda = 0$ and $\lambda = 1$. Equation (8.3.3) tells us that free energy is a direct measure of probability. Thus, consider choosing the functions $f(\lambda)$ and $g(\lambda)$ such that $A(\lambda)$ has a significant barrier separating the regions near $\lambda = 0$ and $\lambda = 1$. According to eqn. (8.3.3), where the free energy is high, the associated phase-space probability is low. Thus, if the region between $\lambda = 0$ and $\lambda = 1$ is transformed into a low-probability region, $\lambda$ will spend a very small fraction of its time there in a molecular dynamics run, and sampling near the endpoints $\lambda = 0$ and $\lambda = 1$ will be enhanced. To illustrate how we can achieve a barrier in the free-energy profile $A(\lambda)$, let us examine a simple example. Consider taking two uncoupled harmonic oscillators and using $\lambda$-switching to "grow in" a bilinear coupling between them. If $x$ and $y$ represent the coordinates of the two oscillators with masses $m_x$ and $m_y$ and frequencies $\omega_x$ and $\omega_y$, respectively, then the two potential energy functions $U_A$ and $U_B$ take the form

$$
\begin{aligned}
U_A(x, y) &= \frac{1}{2} m_x \omega_x^2 x^2 + \frac{1}{2} m_y \omega_y^2 y^2 \\
U_B(x, y) &= \frac{1}{2} m_x \omega_x^2 x^2 + \frac{1}{2} m_y \omega_y^2 y^2 + \kappa x y,
\end{aligned}
\tag{8.3.5}
$$

where $\kappa$ determines the strength of the coupling between the oscillators. For this problem, the integration over $x$ and $y$ can be performed analytically (see Problem 8.1), leading to the exact probability distribution function in $\lambda$


$$ P(\lambda) = \frac{C}{\sqrt{m_x \omega_x^2 m_y \omega_y^2 \left( f(\lambda) + g(\lambda) \right)^2 - \kappa^2 g^2(\lambda)}}, \tag{8.3.6} $$

where $C$ is a constant, from which the free-energy profile $A(\lambda) = -kT \ln P(\lambda)$ becomes

$$ A(\lambda) = \frac{kT}{2} \ln\left[ m_x \omega_x^2 m_y \omega_y^2 \left( f(\lambda) + g(\lambda) \right)^2 - \kappa^2 g^2(\lambda) \right]. \tag{8.3.7} $$

The reader can easily verify that the free-energy difference $\Delta A = A(1) - A(0)$ does not depend on the choice of $f(\lambda)$ and $g(\lambda)$. For concreteness, let us set the parameters $m_x = m_y = 1$, $\omega_x = 1$, $\omega_y = 2$, $kT = 1$, and $\kappa = 1$. First, consider switches of the form $f(\lambda) = (\lambda^2 - 1)^2$ and $g(\lambda) = ((\lambda - 1)^2 - 1)^2$. The solid line in Fig. 8.2(a) shows the free-energy profile obtained from eqn. (8.3.7). The free-energy profile clearly contains a barrier between $\lambda = 0$ and $\lambda = 1$. If, on the other hand, we choose $f(\lambda) = (\lambda^2 - 1)^4$ and $g(\lambda) = ((\lambda - 1)^2 - 1)^4$, the free-energy profile appears as the dashed line in Fig. 8.2(a), and we see that the profile exhibits a deep well. A well indicates a region of high probability and suggests that in a molecular dynamics calculation, $\lambda$ will spend considerably more time in this irrelevant region than it will near the endpoints. In this case, therefore, the quartic switches are preferable. For comparison, these two choices for $f(\lambda)$ ($g(\lambda)$ is just the mirror image of $f(\lambda)$) are shown in Fig. 8.2(b). It can be seen that small differences in the shape of the switches lead to considerable differences in the free-energy profiles.

Fig. 8.2 (a) Free-energy profiles from eqn. (8.3.7). The solid line indicates switches $f(\lambda) = (\lambda^2 - 1)^2$ and $g(\lambda) = ((\lambda - 1)^2 - 1)^2$, and the dashed line indicates $f(\lambda) = (\lambda^2 - 1)^4$ and $g(\lambda) = ((\lambda - 1)^2 - 1)^4$. (b) Corresponding switch $f(\lambda)$.

Suppose we now try to use a molecular dynamics calculation based on the Hamiltonian in eqn. (8.3.1) to generate a free-energy profile $A(\lambda)$ with a substantial barrier between $\lambda = 0$ and $\lambda = 1$. We immediately encounter a problem: The probability for $\lambda$ to cross this barrier becomes exponentially small! So have we actually accomplished anything by introducing the barrier? After all, what good is enhancing the sampling in the endpoint regions if the barrier between them cannot be easily crossed? It seems


that we have simply traded the problem of inefficient sampling at the endpoints of the λ-path for the problem of crossing a high barrier. That is, we have created what is commonly referred to as a rare-event problem (see also Section 8.5). In order to overcome the rare-event problem, we introduce an approach in this section known as adiabatic free-energy dynamics (AFED) (Rosso et al., 2002; Rosso et al., 2005; Abrams et al., 2006). A related method, called canonical adiabatic freeenergy sampling, was proposed by VandeVondele and R¨ othlisberger (2002). Let U ‡ be the value of the potential energy at the top of the barrier. In the canonical ensemble, the probability that the system will visit a configuration whose potential energy is U ‡ at temperature T is proportional to exp[−U ‡ /kT ], which is exceedingly small when kT ≪ U ‡ . The exponential form of the probability suggests that we could promote barrier crossing by simply raising the temperature. If we do this na¨ıvely, however, we broaden the ensemble distribution and change the thermodynamics. On the other hand, suppose we raise the “temperature” of just the λ degree of freedom. We can achieve this by coupling λ to a thermostat designed to keep the average hp2λ /2mλ i = kTλ , where Tλ > T . In general, the thermodynamics would still be affected. However, under certain conditions, we can still recover the correct free energy. In particular, we must also increase the “mass” parameter mλ to a value high enough that λ is adiabatically decoupled from all other degrees of freedom. When this is done, it can be shown that the adiabatically decoupled dynamics generates the correct free-energy profile even though the phase-space distribution is not the true canonical one. Here, we will give a heuristic argument showing how the modified ensemble distribution and free energy can be predicted and corrected. 
(Later, in Section 8.10, we will analyze the adiabatic dynamics more thoroughly and derive the phase-space distribution rigorously.) Under the assumption of adiabatic decoupling between λ and the physical degrees of freedom, λ evolves very slowly, thereby allowing the physical degrees of freedom to sample large portions of their available phase space while λ samples only a very localized region of its part of phase space. Since the physical degrees of freedom are coupled to a thermostat at the physical temperature $T$, we expect the adiabatic dynamics to generate a distribution $Z(\lambda,\beta)$ as a function of λ in which the physical coordinates sample essentially all of their configuration space at temperature $T$ at each λ, leading to

$$Z(\lambda,\beta) = \int d^N r\; e^{-\beta U(\mathbf{r}_1,...,\mathbf{r}_N,\lambda)}. \tag{8.3.8}$$

In the extreme limit, where λ is fixed at each value, this is certainly the correct distribution function if the physical degrees of freedom are properly thermostatted. The adiabatic decoupling approximates the more extreme situation of fixed λ. Equation (8.3.8) leads to an important quantity known as the potential of mean force in λ, obtained from $-(1/\beta)\ln Z(\lambda,\beta)$. In the limit of adiabatic decoupling, the potential of mean force becomes an effective potential on which λ can be assumed to move quasi-independently from the physical degrees of freedom. Note that $-(1/\beta)\ln Z(\lambda,\beta)$ is also equal to the free-energy profile $A(\lambda)$ we originally sought to determine. Using the potential of mean force, we can construct an effective Hamiltonian for λ:

$$H_{\rm eff}(\lambda,p_\lambda) = \frac{p_\lambda^2}{2m_\lambda} - \frac{1}{\beta}\ln Z(\lambda,\beta). \tag{8.3.9}$$


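Now, if λ is thermostatted to a temperature $T_\lambda$, then a canonical distribution of the effective Hamiltonian in eqn. (8.3.9) at temperature $T_\lambda$ will be generated. This distribution takes the form

$$P_{\rm adb}(\lambda,p_\lambda,\beta,\beta_\lambda) \propto e^{-\beta_\lambda H_{\rm eff}(\lambda,p_\lambda)}, \tag{8.3.10}$$

where $\beta_\lambda = 1/kT_\lambda$, and the “adb” subscript indicates that the distribution is valid in the limit of adiabatic decoupling of λ. Integrating over $p_\lambda$ yields a distribution $\tilde{P}_{\rm adb}(\lambda,\beta,\beta_\lambda) \propto [Z(\lambda,\beta)]^{\beta_\lambda/\beta}$, from which the free-energy profile can be computed as

$$A(\lambda) = -kT_\lambda \ln \tilde{P}_{\rm adb}(\lambda,\beta,\beta_\lambda) = -kT\ln Z(\lambda,\beta) + {\rm const}. \tag{8.3.11}$$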

Thus, apart from a trivial additive constant, the free-energy profile can be computed from the distribution $\tilde{P}_{\rm adb}(\lambda)$ generated by the adiabatic dynamics. Note that eqn. (8.3.11) closely resembles eqn. (8.3.3), the only difference being the prefactor of $kT_\lambda$ rather than $kT$. Despite the fact that $\ln \tilde{P}_{\rm adb}(\lambda,\beta,\beta_\lambda)$ is multiplied by $kT_\lambda$, the free-energy profile is obtained at the correct ensemble temperature $T$. As an example, Abrams and Tuckerman (2006) employed the AFED approach to calculate the hydration free energies of alanine and serine side-chain analogs using the CHARMM22 force field (MacKerell et al., 1998) in a bath of 256 water molecules. These simulations used $T_\lambda = 40T = 12{,}000$ K and a mass $m_\lambda$ one thousand times that of an oxygen atom, and obtained the desired free energies, 1.98 kcal/mol (for the alanine analog) and −4.31 kcal/mol (for the serine analog), to within an error of 0.25 kcal/mol in 1-2 ns using a time step of 1.0 fs. In order to keep the λ degree of freedom in the range [0, 1], reflecting boundaries were placed at λ = −ε and 1 + ε, where ε = 0.01.
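As a small numerical illustration of eqn. (8.3.11), the sketch below builds the adiabatic distribution $\tilde{P}_{\rm adb}(\lambda) \propto \exp[-A(\lambda)/kT_\lambda]$ on a grid from an assumed free-energy profile and then recovers that profile from $-kT_\lambda \ln \tilde{P}_{\rm adb}$. The double-well form of `a_true`, the grid, and the function name `afed_free_energy` are hypothetical choices made only for illustration; in an actual AFED run, `p_adb` would be a histogram of λ accumulated along the trajectory.

```python
import math

def afed_free_energy(p_adb, kT_lambda):
    """Return A(lambda) = -kT_lambda * ln P_adb(lambda), shifted so that min(A) = 0."""
    a = [-kT_lambda * math.log(p) for p in p_adb]
    a_min = min(a)
    return [x - a_min for x in a]

# Assumed profile with an 8 kT barrier at lambda = 0.5 (illustration only).
kT = 1.0
kT_lambda = 40.0 * kT                      # "hot" lambda degree of freedom
grid = [i / 50.0 for i in range(51)]
a_true = [8.0 * kT * math.sin(math.pi * lam) ** 2 for lam in grid]

# Adiabatic distribution: P_adb ∝ Z^(beta_lambda/beta) = exp(-A / kT_lambda).
weights = [math.exp(-a / kT_lambda) for a in a_true]
norm = sum(weights)
p_adb = [w / norm for w in weights]

# Recover the profile at the physical temperature from the hot distribution.
a_rec = afed_free_energy(p_adb, kT_lambda)
max_err = max(abs(x - y) for x, y in zip(a_rec, a_true))

# Relative probability of the barrier top versus the endpoints.
ratio_hot = min(p_adb) / max(p_adb)                          # exp(-8/40)
ratio_cold = math.exp(-(max(a_true) - min(a_true)) / kT)     # exp(-8)
```

The two ratios make the point of the method concrete: at the physical temperature the barrier top is visited with relative weight $e^{-8}$, whereas the hot λ variable sees only $e^{-0.2}$, yet the recovered profile is the one at temperature $T$.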

8.4 Jarzynski’s equality and nonequilibrium methods

In this section, we investigate the connection between free energy and nonequilibrium work. We have already introduced the work-free-energy inequality in eqn. (8.2.9), which states that if an amount of work $W_{AB}$ takes a system from state A to state B, then $W_{AB} \geq \Delta A_{AB}$, where equality holds only if the work is performed reversibly. $W_{AB}$ is a thermodynamic quantity, which means that it can be expressed as an ensemble average of a phase-space function. Specifically, $W_{AB}$ must be an average of the mechanical work $W_{AB}(x)$ performed on a single member of the ensemble to drive it from a microstate of A to a microstate of B. However, we need to be careful about how we define this ensemble average because, as we saw in Chapter 1 (eqn. (1.2.6)), the work $W_{AB}(x)$ is defined along a particular path or trajectory, while equilibrium averages are performed over the microstates that describe a particular thermodynamic state. This distinction is emphasized by the fact that the work could be carried out irreversibly, such that the system is driven out of equilibrium. To illustrate the use of the microscopic function $W_{AB}(x)$, suppose we prepare an initial distribution of microstates $x_0$ belonging to A and then initiate a trajectory from each of these initial states. The ensemble average that determines the thermodynamic work $W_{AB}$ is an average of $W_{AB}(x_0)$ over this initial ensemble, which we will take to be a canonical ensemble. The trajectory $x_t$ along which the work is computed is a unique function of the initial condition $x_0$, i.e., $x_t = x_t(x_0)$. Thus, the work $W_{AB}(x_0)$ is


actually a functional of the path, $W_{AB}[x_t]$. However, since the trajectory $x_t$ is uniquely determined by the initial condition $x_0$, $W_{AB}$ is also determined by $x_0$, and we have

$$W_{AB} = \langle W_{AB}(x_0)\rangle_A = \frac{C_N}{Q_A(N,V,T)}\int dx_0\; e^{-\beta H_A(x_0)}\, W_{AB}(x_0). \tag{8.4.1}$$

Thus, the work-free-energy inequality can be stated as $\langle W_{AB}(x_0)\rangle_A \geq \Delta A_{AB}$. From this inequality, it would seem that performing work on a system as a method to compute free energy leads, at best, to an upper bound on the free energy. It turns out, however, that irreversible work can be used to calculate free-energy differences by virtue of a connection between the two quantities first established by C. Jarzynski (1997) that is now referred to as the Jarzynski equality. The equality states that if instead of averaging $W_{AB}(x_0)$ over the initial canonical distribution (that of state A), an average of $\exp[-\beta W_{AB}(x_0)]$ is performed over the same distribution, the result is $\exp[-\beta\Delta A_{AB}]$, that is,

$$e^{-\beta\Delta A_{AB}} = \left\langle e^{-\beta W_{AB}(x_0)}\right\rangle_A = \frac{C_N}{Q_A(N,V,T)}\int dx_0\; e^{-\beta H_A(x_0)}\, e^{-\beta W_{AB}(x_0)}. \tag{8.4.2}$$

Hence, the free-energy difference is $\Delta A_{AB} = -kT\ln\langle \exp(-\beta W_{AB}(x_0))\rangle_A$. This remarkable result not only provides a foundation for the development of nonequilibrium free-energy methods but has important implications for thermodynamics in general.² Since its introduction, the Jarzynski equality has been the subject of both theoretical and experimental investigation (Park et al., 2003; Liphardt et al., 2002). The Jarzynski equality can be derived using different strategies, as we will now show. Consider first a time-dependent Hamiltonian of the form

$$H(\mathbf{r},\mathbf{p},t) = \sum_{i=1}^{N}\frac{\mathbf{p}_i^2}{2m_i} + U(\mathbf{r}_1,...,\mathbf{r}_N,t). \tag{8.4.3}$$

For time-dependent Hamiltonians, the usual conservation law $dH/dt = 0$ does not hold, which can be seen by computing

$$\frac{dH}{dt} = \nabla_{x_t}H\cdot\dot{x}_t + \frac{\partial H}{\partial t}, \tag{8.4.4}$$

where the phase-space vector $x = (\mathbf{r}_1,...,\mathbf{r}_N,\mathbf{p}_1,...,\mathbf{p}_N) \equiv (\mathbf{r},\mathbf{p})$ has been introduced. Integrating both sides over time from $t = 0$ to an endpoint $t = \tau$, we find

$$\int_0^\tau dt\,\frac{dH}{dt} = \int_0^\tau dt\;\nabla_{x_t}H\cdot\dot{x}_t + \int_0^\tau dt\,\frac{\partial H}{\partial t}. \tag{8.4.5}$$

² Jarzynski’s equality is actually implied by a more general theorem known as the Crooks fluctuation theorem (Crooks, 1998, 1999).

Equation (8.4.5) can be regarded as a microscopic version of the first law of thermodynamics, where the first and second terms on the right represent the heat absorbed by the


system and the work done on the system over the trajectory, respectively.³ That the work depends on the initial phase-space vector $x_0$ can be seen by defining the work associated with the trajectory $x_t(x_0)$ obtained up to time $t = t'$ as

$$W_{t'}(x_0) = \int_0^{t'} dt\;\frac{\partial}{\partial t}H(x_t(x_0),t). \tag{8.4.6}$$

Note that $W_{AB}(x_0) = W_\tau(x_0)$. The derivation of the Jarzynski equality requires the calculation of the ensemble average of $\exp[-\beta W_{AB}(x_0)] = \exp[-\beta W_\tau(x_0)]$ over a canonical distribution in the initial conditions $x_0$. Before we examine how this average might be performed along a molecular dynamics trajectory for a finite system, let us consider the simpler case where each initial condition $x_0$ evolves in isolation according to Hamilton’s equations as derived from eqn. (8.4.3). If Hamilton’s equations are obeyed, then the first (heat) term on the right in eqn. (8.4.4) vanishes, $\nabla_{x_t}H\cdot\dot{x}_t = 0$, and we can express the work as

$$W_{t'} = \int_0^{t'} dt\;\frac{d}{dt}H(x_t(x_0),t) = H(x_{t'}(x_0),t') - H(x_0,0). \tag{8.4.7}$$

Taking $t' = \tau$, and recognizing that $H(x_0,0) = H_A(x_0)$, we can write the ensemble average of $\exp[-\beta W_{AB}(x_0)]$ as

$$\begin{aligned}
\left\langle e^{-\beta W_{AB}}\right\rangle_A &= \frac{C_N}{Q_A(N,V,T)}\int dx_0\; e^{-\beta H_A(x_0)}\, e^{-\beta[H(x_\tau(x_0),\tau) - H_A(x_0)]} \\
&= \frac{C_N}{Q_A(N,V,T)}\int dx_0\; e^{-\beta H(x_\tau(x_0),\tau)}.
\end{aligned} \tag{8.4.8}$$

Now let us change variables from $x_0$ to $x_\tau(x_0)$ in the integral in eqn. (8.4.8). Since the trajectory $x_t(x_0)$ is generated from Hamilton’s equations of motion, the mapping of $x_0$ to $x_\tau(x_0)$ is unique. Moreover, by Liouville’s theorem, the phase-space measure satisfies $dx_\tau = dx_0$. Therefore, we find that

$$\begin{aligned}
\left\langle e^{-\beta W_{AB}}\right\rangle_A &= \frac{C_N}{Q_A(N,V,T)}\int dx_\tau\; e^{-\beta H_B(x_\tau)} \\
&= \frac{Q_B(N,V,T)}{Q_A(N,V,T)} \\
&= e^{-\beta\Delta A_{AB}},
\end{aligned} \tag{8.4.9}$$

which is Jarzynski’s equality. Note that by Jensen’s inequality,

$$\left\langle e^{-\beta W_{AB}}\right\rangle_A \geq e^{-\beta\langle W_{AB}\rangle_A}, \tag{8.4.10}$$

which implies that

$$e^{-\beta\Delta A_{AB}} \geq e^{-\beta\langle W_{AB}\rangle_A}. \tag{8.4.11}$$

³ To see this, consider, for example, a mechanical piston slowly compressing a gas. Such a device can be viewed as an explicitly time-dependent external agent acting on the system, which can be incorporated into the potential. For this reason, the second term on the right in eqn. (8.4.5) represents the work performed on the system. However, even in the absence of an external agent, if the system interacts with a thermal reservoir (see Section 4.3) then the Hamiltonian is not conserved, and the first term on the right in eqn. (8.4.5) will be nonzero. Hence, this term represents the heat absorbed.

Taking the natural log of both sides of eqn. (8.4.11) leads to the work-free-energy inequality. In fact, from eqn. (8.4.11), it becomes clear how to prove the Gibbs-Bogoliubov inequality in eqn. (4.7.78). If we set $t' = \tau$ and $x_B = x_A = x$, then $\langle W_{AB}\rangle = \langle H_B(x) - H_A(x)\rangle_A$, so that we obtain the expected result

$$A_B \leq A_A + \langle H_B(x) - H_A(x)\rangle_A. \tag{8.4.12}$$
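The instantaneous-switching limit just described can be checked numerically on a model for which everything is known analytically. The sketch below is only an illustration under stated assumptions: a one-dimensional harmonic oscillator whose force constant is switched from `k_a` to `k_b` (hypothetical values), so that $W = H_B(x) - H_A(x) = (k_B - k_A)x^2/2$ and $\Delta A_{AB} = (kT/2)\ln(k_B/k_A)$. Canonical averages over state A are evaluated by simple trapezoidal quadrature rather than by sampling.

```python
import math

def canonical_average(func, k_a, beta, half_width=12.0, n=20001):
    """Average func(x) over P_A(x) ∝ exp(-beta*k_a*x^2/2) using the trapezoidal rule."""
    sigma = 1.0 / math.sqrt(beta * k_a)
    lo, hi = -half_width * sigma, half_width * sigma
    h = (hi - lo) / (n - 1)
    num = den = 0.0
    for i in range(n):
        x = lo + i * h
        end_weight = 0.5 if i in (0, n - 1) else 1.0
        w = end_weight * math.exp(-0.5 * beta * k_a * x * x)
        num += w * func(x)
        den += w
    return num / den

beta, k_a, k_b = 1.0, 1.0, 4.0   # assumed model parameters

def work(x):
    # Instantaneous switch: W = H_B(x) - H_A(x); the kinetic energy cancels.
    return 0.5 * (k_b - k_a) * x * x

# Jarzynski estimate, eqn. (8.4.2), versus the exact result from Q_B / Q_A.
exp_avg = canonical_average(lambda x: math.exp(-beta * work(x)), k_a, beta)
delta_a_jarzynski = -math.log(exp_avg) / beta
delta_a_exact = 0.5 / beta * math.log(k_b / k_a)

# Gibbs-Bogoliubov bound, eqn. (8.4.12): Delta A <= <H_B - H_A>_A.
mean_work = canonical_average(work, k_a, beta)
```

For these parameters the exponential average reproduces $\Delta A_{AB} = \ln 2 \approx 0.693$ (in units of $kT$), while the plain average of the work gives 1.5, an upper bound as eqn. (8.4.12) requires.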

We now present a proof of Jarzynski’s equality that is relevant for finite systems coupled to thermostats typically employed in molecular dynamics. The original version of this derivation is due to Cuendet (2006) and was subsequently generalized by Schöll-Paschinger and Dellago (2006). The proof does not depend critically on the particular thermostatting mechanism as long as the scheme rigorously generates a canonical distribution. For notational simplicity, we will employ the Nosé-Hoover scheme in eqns. (4.9.14). Although we already know that the Nosé-Hoover equations have many weaknesses which can be fixed using, for example, Nosé-Hoover chains, the former is sufficient for our purpose here and keeps the notation simpler. We start, therefore, with the equations of motion

$$\begin{aligned}
\dot{\mathbf{r}}_i &= \frac{\mathbf{p}_i}{m_i} \\
\dot{\mathbf{p}}_i &= -\frac{\partial}{\partial\mathbf{r}_i}U(\mathbf{r},t) - \frac{p_\eta}{Q}\mathbf{p}_i \\
\dot{\eta} &= \frac{p_\eta}{Q} \\
\dot{p}_\eta &= \sum_i \frac{\mathbf{p}_i^2}{m_i} - 3NkT,
\end{aligned} \tag{8.4.13}$$

which are identical to eqns. (4.9.14) except for the time-dependent potential $U(\mathbf{r},t)$ and the choice of $d = 3$ for the dimensionality. As was discussed in Section 4.10, these equations have an associated phase-space metric $\sqrt{g} = e^{3N\eta}$. Note that the heat term in eqn. (8.4.4) does not vanish for the Nosé-Hoover equations because energy is exchanged between the physical system and the thermostat. Nevertheless, the work in eqn. (8.4.6) allows us to construct a conserved energy according to

$$\begin{aligned}
\tilde{H}(x_t,t) &= \sum_{i=1}^{N}\frac{\mathbf{p}_i^2}{2m_i} + U(\mathbf{r},t) + \frac{p_\eta^2}{2Q} + 3NkT\eta - W_t(x_0) \\
&\equiv H'(x_t,t) - W_t(x_0).
\end{aligned} \tag{8.4.14}$$
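The conservation of $H' - W_t$ can be verified numerically. The following sketch makes several simplifying assumptions that are not part of the derivation above: a single particle in one dimension (so that $3N$ is replaced by 1), a moving harmonic trap $U(x,t) = \kappa(x - vt)^2/2$ as the time-dependent potential, arbitrary parameter values, and a generic fourth-order Runge-Kutta step in place of the measure-preserving integrators of Chapter 4. The work is accumulated as an extra dynamical variable by integrating $\partial U/\partial t$ along the trajectory, as in eqn. (8.4.6).

```python
# Hypothetical 1D parameters (one particle, one dimension: 3N -> 1).
m, Q, kT = 1.0, 1.0, 1.0
kappa, v = 1.0, 0.5              # moving trap U(x, t) = kappa*(x - v*t)**2 / 2

def potential(x, t):
    return 0.5 * kappa * (x - v * t) ** 2

def derivs(t, y):
    """Nose-Hoover right-hand side for y = (x, p, eta, p_eta, W)."""
    x, p, eta, p_eta, w = y
    force = -kappa * (x - v * t)
    dU_dt = -kappa * v * (x - v * t)     # integrand of the work, eqn. (8.4.6)
    return (p / m,
            force - (p_eta / Q) * p,
            p_eta / Q,
            p * p / m - kT,
            dU_dt)

def rk4_step(t, y, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = derivs(t, y)
    k2 = derivs(t + 0.5 * dt, tuple(a + 0.5 * dt * b for a, b in zip(y, k1)))
    k3 = derivs(t + 0.5 * dt, tuple(a + 0.5 * dt * b for a, b in zip(y, k2)))
    k4 = derivs(t + dt, tuple(a + dt * b for a, b in zip(y, k3)))
    return tuple(a + dt / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4))

def h_prime(t, y):
    """H'(x_t, t): physical energy plus thermostat terms, without the work."""
    x, p, eta, p_eta, _ = y
    return p * p / (2 * m) + potential(x, t) + p_eta ** 2 / (2 * Q) + kT * eta

def run(n_steps=5000, dt=0.001):
    t, y = 0.0, (0.3, 1.0, 0.0, 0.0, 0.0)
    e0 = h_prime(t, y) - y[4]            # conserved energy at t = 0
    max_drift = 0.0
    for _ in range(n_steps):
        y = rk4_step(t, y, dt)
        t += dt
        max_drift = max(max_drift, abs(h_prime(t, y) - y[4] - e0))
    return y, max_drift

final_state, max_drift = run()
```

Here `max_drift` measures how far $H'(x_t,t) - W_t(x_0)$ strays from its initial value; it should remain at the level of the integration error, even though $H'$ alone changes as the trap does work on the particle.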

Here $x$ is the extended phase-space vector $x = (\mathbf{r}_1,...,\mathbf{r}_N,\eta,\mathbf{p}_1,...,\mathbf{p}_N,p_\eta)$. According to the procedure outlined in Section 4.10, the average of $\exp[-\beta W_\tau(x_0)]$ is computed as

$$\left\langle e^{-\beta W_{AB}}\right\rangle_A = \frac{1}{Z_T(0)}\int dx_0\; e^{3N\eta_0}\, e^{-\beta W_\tau(x_0)}\,\delta\left(H'(x_0,0) - C\right), \tag{8.4.15}$$

where $dx_0 = d^Np_0\, d^Nr_0\, dp_{\eta,0}\, d\eta_0$, $C$ is a constant, and $Z_T(0)$ is the “microcanonical” partition function generated by the Nosé-Hoover equations for the $t = 0$ Hamiltonian

$$Z_T(0) = \int dx_0\; e^{3N\eta_0}\,\delta\left(H'(x_0,0) - C\right). \tag{8.4.16}$$

In eqns. (8.4.15) and (8.4.16), the ensemble distribution must be the distribution of the initial state, which is the state A. In Section 4.10, we showed that when eqn. (8.4.16) is integrated over η, the canonical partition function is obtained in the form

$$Z_T(0) = \frac{e^{\beta C}}{3NkT}\int dp_{\eta,0}\; e^{-\beta p_{\eta,0}^2/2Q}\int d^Np_0\, d^Nr_0\; e^{-\beta H(\mathbf{r}_0,\mathbf{p}_0)} \propto Q_A(N,V,T). \tag{8.4.17}$$

In order to complete the proof, we need to carry out the integration over η in the numerator of eqn. (8.4.15). As was done above, we change variables from $x_0$ to $x_\tau(x_0)$. Recalling from Section 4.10 that the measure $\exp(3N\eta)dx$ is conserved, it follows that $\exp(3N\eta_0)dx_0 = \exp(3N\eta_\tau)dx_\tau$. Therefore, the variable transformation leads to

$$\left\langle e^{-\beta W_{AB}}\right\rangle_A = \frac{1}{Z_T(0)}\int dx_\tau\; e^{3N\eta_\tau}\, e^{-\beta W_\tau(x_0(x_\tau))}\,\delta\left(H'(x_0(x_\tau),0) - C\right). \tag{8.4.18}$$

From eqn. (8.4.14), it follows that $H'(x_0(x_\tau),0) = H'(x_\tau,\tau) - W_\tau(x_0(x_\tau))$. Inserting this into eqn. (8.4.18), we obtain

$$\left\langle e^{-\beta W_{AB}}\right\rangle_A = \frac{1}{Z_T(0)}\int dx_\tau\; e^{3N\eta_\tau}\, e^{-\beta W_\tau(x_0(x_\tau))}\,\delta\left(H'(x_\tau,\tau) - W_\tau(x_0(x_\tau)) - C\right). \tag{8.4.19}$$

The integration over $\eta_\tau$ can now be performed by noting that the δ-function requires

$$\eta_\tau = \frac{1}{3NkT}\left[C - H(\mathbf{r}_\tau,\mathbf{p}_\tau) - \frac{p_{\eta,\tau}^2}{2Q} + W_\tau(x_0(x_\tau))\right]. \tag{8.4.20}$$

Using eqn. (8.4.20), together with eqn. (4.9.6), to perform the integration over $\eta_\tau$ causes the exponential factors of $W_\tau(x_0(x_\tau))$ to cancel, yielding

$$\begin{aligned}
\left\langle e^{-\beta W_{AB}}\right\rangle_A &= \frac{1}{Z_T(0)}\frac{e^{\beta C}}{3NkT}\int dp_{\eta,\tau}\; e^{-\beta p_{\eta,\tau}^2/2Q}\int d^Np_\tau\, d^Nr_\tau\; e^{-\beta H(\mathbf{r}_\tau,\mathbf{p}_\tau)} \\
&\propto \frac{1}{Z_T(0)}\, Q_B(N,V,T),
\end{aligned} \tag{8.4.21}$$

where, since the integration is performed using the phase-space vector and Hamiltonian at $t = \tau$, the result is proportional to the canonical partition function in state B. In fact, when eqn. (8.4.17) for $Z_T(0)$ is substituted into eqn. (8.4.21), the prefactors cancel, yielding simply

$$\left\langle e^{-\beta W_{AB}}\right\rangle_A = \frac{Q_B}{Q_A} = e^{-\beta\Delta A_{AB}}, \tag{8.4.22}$$

which, again, is Jarzynski’s equality. Jarzynski’s equality has an intriguing connection with mechanical pulling experiments involving laser trapping (Liphardt et al., 2002) as suggested by Hummer and Szabo (2001), or atomic force microscopy (Binnig et al., 1986) applied, for example, to biomolecules (Fernandez and Li, 2004). Experiments such as these can be mimicked in molecular dynamics calculations (Park et al., 2003; Park and Schulten, 2004). For instance, suppose we wish to unfold a protein or polypeptide by “pulling” on the ends as illustrated in Fig. 8.3.

Fig. 8.3 Extension of deca-alanine by pulling via eqn. (8.4.23) (reprinted with permission from Park et al., J. Chem. Phys. 119, 3559 (2003), copyright American Institute of Physics).

Within the Jarzynski framework, we could perform nonequilibrium calculations and obtain the free-energy change ∆A associated with the unfolding process. This could be accomplished by adding a time-dependent term to the potential that “drives” the distance between the two ends of the molecule from its (small) value in the folded state to a final (large) value in the unfolded state. For concreteness, let us designate the atomic coordinates at the ends of the molecule by $\mathbf{r}_1$ and $\mathbf{r}_N$. (In practice, $\mathbf{r}_1$ and $\mathbf{r}_N$ could be nitrogen atoms at the N- and C-termini of a protein or polypeptide.) The time-dependent potential would then take the form

$$U(\mathbf{r}_1,...,\mathbf{r}_N,t) = U_0(\mathbf{r}_1,...,\mathbf{r}_N) + \frac{1}{2}\kappa\left(|\mathbf{r}_1 - \mathbf{r}_N| - r_{\rm eq} - vt\right)^2, \tag{8.4.23}$$

where $U_0$ is the internal potential described, for example, by a force field. The second term in eqn. (8.4.23) is a harmonic potential with force constant κ that drives the end-to-end distance $|\mathbf{r}_1 - \mathbf{r}_N|$ away from its equilibrium value at the folded state $r_{\rm eq}$ by means of the time-dependent term $vt$, where $v$ is the pulling rate. In practice, applying Jarzynski’s formula to such a problem requires generating an ensemble of initial conditions $x_0$ and then performing the pulling “experiment” in order to obtain a work value $W_\tau(x_0)$ for each chosen $x_0$. The final average of $\exp(-\beta W_\tau(x_0))$ then leads to the free-energy difference. Several challenges arise in the use of Jarzynski’s formula. First, while it is an elegant approach to free-energy calculations, a potential bottleneck needs to be considered. The work values that are generated from an ensemble of nonequilibrium processes have a distribution $P(W_\tau)$. Thus, we could imagine calculating the average of $\exp(-\beta W_\tau)$ using this distribution according to

$$\left\langle e^{-\beta W_\tau}\right\rangle = \int dW_\tau\; P(W_\tau)\, e^{-\beta W_\tau}. \tag{8.4.24}$$

However, as illustrated in Fig. 8.4, $P(W_\tau)$ and $P(W_\tau)\exp(-\beta W_\tau)$ can peak at very different locations, depending on the value of β. In Fig. 8.4, the average is dominated by small values of $W_\tau$, which lie predominantly in the left tail of the work probability distribution $P(W_\tau)$ and occur only rarely. Consequently, many “fast-switching” trajectories with high pulling rates $v$ are needed in order to sample the tail of the work distribution adequately. Alternatively, trajectories with very slow pulling rates (“slow-switching” trajectories) could be used, in which case a smaller ensemble can be employed; however, the method will then have an efficiency comparable to equilibrium methods (Oberhofer et al., 2005). It is worth noting that if the work distribution $P(W_\tau)$ is Gaussian or nearly Gaussian, then the exponential average can be computed reliably using a truncated cumulant expansion (see eqns. (4.8.22) and (4.8.24)):

$$\ln\left\langle e^{-\beta W_\tau}\right\rangle \approx -\beta\langle W_\tau\rangle + \frac{\beta^2}{2}\left(\langle W_\tau^2\rangle - \langle W_\tau\rangle^2\right), \tag{8.4.25}$$

which eliminates the problem of poor overlap between $P(W_\tau)$ and $P(W_\tau)\exp(-\beta W_\tau)$ in the exponential average.

Fig. 8.4 Shift in a Gaussian work distribution as a result of the multiplication by exp(−βWτ) for various values of β.

A second challenge raises a more fundamental question concerning the proof of the equality. In the proof we have presented, it is tacitly assumed that after the transformation $x_0$ to $x_\tau$ is made, the integral that results constitutes an actual equilibrium partition function (see eqn. (8.4.9)). That is, we assume an equilibrium distribution of phase-space points $x_\tau$, but if the driving force is too strong, this might not be valid in a finite system. Certainly, if the driving force is sufficiently mild to maintain the system close to equilibrium along the time-dependent path between states A and B, then the assumption is valid, and indeed, in this limit, the Jarzynski equality seems to be most effective in actual applications (Oberhofer et al., 2005). Away from this limit, it might be necessary to allow the final states $x_\tau(x_0)$ to relax to an equilibrium distribution by performing a short run with a thermostat coupling, even if such a relaxation process has been shown to be formally unnecessary (Jarzynski, 2004).
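The two ways of evaluating the exponential work average discussed in this section, the direct average of eqn. (8.4.24) and the second-order cumulant expansion of eqn. (8.4.25), can be compared on synthetic data. The sketch below is a hedged illustration: the work values are drawn from an assumed Gaussian distribution (they are not simulation results), for which $\Delta A = \langle W\rangle - \beta\sigma^2/2$ exactly, and the function names are our own. The direct estimator subtracts the smallest work value before exponentiating so that the sum cannot overflow.

```python
import math
import random

def jarzynski_direct(work, beta):
    """Delta A = -(1/beta) ln <exp(-beta W)>, evaluated with a log-sum-exp shift."""
    shift = min(work)
    total = sum(math.exp(-beta * (w - shift)) for w in work)
    return shift - math.log(total / len(work)) / beta

def jarzynski_cumulant(work, beta):
    """Second-order cumulant estimate from eqn. (8.4.25): <W> - (beta/2) var(W)."""
    n = len(work)
    mean = sum(work) / n
    var = sum((w - mean) ** 2 for w in work) / n
    return mean - 0.5 * beta * var

# Synthetic Gaussian work values with assumed mean and width (in units of kT).
rng = random.Random(0)
beta, mu, sigma = 1.0, 2.0, 1.0
work = [rng.gauss(mu, sigma) for _ in range(20000)]

delta_a_exact = mu - 0.5 * beta * sigma ** 2     # exact for a Gaussian P(W)
delta_a_direct = jarzynski_direct(work, beta)
delta_a_cumulant = jarzynski_cumulant(work, beta)
mean_work = sum(work) / len(work)
```

With a width of only $1\,kT$ both estimators agree well with the exact value; as σ grows, the direct average becomes dominated by the rare small-work samples in the left tail, which is the overlap problem illustrated in Fig. 8.4, while the cumulant estimate remains well behaved as long as $P(W_\tau)$ stays close to Gaussian.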

8.5 The problem of rare events

In Section 8.3, we alluded to the rare-event problem associated with large barriers separating important minima on a potential energy surface. Such energy surfaces are known as rough energy landscapes and characterize, for example, proteins, glasses, and polymers. As we noted in Section 8.3, when high barriers separate the minima, the probability that a fluctuation capable of driving the system from one minimum to another over such a barrier will occur becomes exponentially small with the ratio of the barrier height to kT . Consequently, such an event is described as a rare event (see Fig. 7.5). In the remainder of this chapter, we will discuss this problem at length together with methods for enhancing conformational sampling on rough potential energy surfaces. In order to illustrate the concept of “roughness,” consider the alanine dipeptide, shown in Fig. 3.8. This small dipeptide can exist in a number of stable conformations, which can be characterized by two backbone dihedral angles φ and ψ known as the


Ramachandran angles (see Fig. 3.8). Figure 8.5 shows the two-dimensional free-energy surface in these two angles obtained by “integrating out” the remaining degrees of freedom (we will discuss methods for generating such surfaces later in this chapter). The figure shows that there are four pronounced minima on the surface for the particular model employed, indicating four stable conformations with different relative free energies. These relative free energies can be used to rank the minima in terms of their thermodynamic importance. The full free-energy surface also contains the locations and heights of barriers, from which information about transition states and rates of conformational changes can be obtained.

Fig. 8.5 Free-energy surface (top) and contours (bottom) as a function of the Ramachandran angles for an alanine dipeptide in solution.

In a study by Chen et al. (2012) of the free-energy surface of the alanine tripeptide using the four backbone Ramachandran angles, 14 minima were found, which is close to $4^2 = 16$. If we now consider that a free-energy surface of just two dihedral angles contains four minima, and one containing four dihedral angles contains approximately 16 minima, then in a protein of modest size containing 50 such pairs, the number of possible conformations would be roughly $4^{50} = 2^{100}$. From this simple exercise, we have a crude measure of the roughness of an energy landscape.
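To get a feeling for the size of this number, the count can be evaluated exactly with integer arithmetic. The variable names below are ours, and the estimate of four minima per dihedral pair is the crude one used in the text.

```python
minima_per_pair = 4      # minima found for one (phi, psi) pair
n_pairs = 50             # dihedral pairs in the hypothetical protein

n_conformations = minima_per_pair ** n_pairs
same_as_power_of_two = (n_conformations == 2 ** 100)

print(f"{n_conformations:.3e}")   # about 1.268e+30 conformations
```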


The large number of free-energy minima exhibited by our modest protein is far more than can be sampled in a typical computer simulation, and in fact, many of these minima tend to be high enough in energy as to contribute little to an ensemble average. How, then, can we limit sampling to the most important regions of an energy landscape? There is no definitive answer at present, and the question remains an active area of research. In many cases, however, it is possible to identify a subset of generalized coordinates that is particularly well suited for characterizing a given process. In the next few sections, we will discuss the selection of collective variables and describe how to make use of such variables in enhanced sampling schemes for mapping out their corresponding free-energy surfaces.

8.6 Collective variables

It is frequently the case that the progress of some chemical, mechanical, or thermodynamic process can be characterized using a small set of generalized coordinates in a system. When generalized coordinates are used in this manner, they are typically referred to as collective variables. They may also be referred to as reaction coordinates or order parameters depending on the context, type of system, and their information content. The term “reaction coordinate” or “order parameter” generally refers to a single coordinate capable of describing the full pathway of a particular process such as a chemical reaction, conformational change, change in phase, etc. In this case, we think of two basins separated by a saddle point or transition state, all of which is accurately described by the chosen coordinate. By contrast, a collective variable refers to one of a possibly large set of functions of the particle coordinates in a system that characterizes the conformational landscape, which might include many minima and saddle points separating them. Collective variables are typically chosen for their ability to distinguish the basins of attraction in the configuration space with less regard for how well they characterize transition states. These functions can certainly be considered as generalized coordinates; however, it is rarely necessary to obtain a full generalized coordinate transformation that contains the chosen collective variables as a subset of the set of generalized coordinates. Usually, in choosing a set of collective variables, one seeks to identify these minima and obtain free-energy differences between them. If, by chance, a collective variable is able to characterize a transition state between two basins accurately, it could be identified as a candidate reaction coordinate for a process that takes the system from one basin to the other through the transition state. 
Because of the distinction between the terms “collective variables” and “reaction coordinates” or “order parameters”, we will mostly use the former to refer to the variables used in the enhanced sampling methods to be discussed in the next few sections and employ the latter two terms only when referring to coordinates describing a specific transition between two basins and passing through a proper transition state (see Section 8.12). As an example of a collective variable, consider a simple gas-phase diatomic dissociation process AB ⟶ A + B. If $\mathbf{r}_A$ and $\mathbf{r}_B$ denote the Cartesian coordinates of atoms A and B, respectively, then a useful collective generalized coordinate for describing a dissociation reaction is simply the distance $r = |\mathbf{r}_B - \mathbf{r}_A|$. As we saw in Section 1.4.2, a set of generalized coordinates that contains $r$ as one of the coordinates is the center


of mass $\mathbf{R} = (m_A\mathbf{r}_A + m_B\mathbf{r}_B)/(m_A + m_B)$, the magnitude of the relative coordinate $r = |\mathbf{r}_B - \mathbf{r}_A|$, and the two angles $\phi = \tan^{-1}(y/x)$ and $\theta = \tan^{-1}(\sqrt{x^2 + y^2}/z)$, where $x$, $y$, and $z$ are the components of $\mathbf{r} = \mathbf{r}_B - \mathbf{r}_A$. Of course, in the gas phase, $r$ is the most relevant coordinate when the potential depends only on $r$, and $r$ is likely to be a proper reaction coordinate for the gas-phase dissociation process as well. If the dissociation reaction takes place in solution, however, then some thought is needed as to whether a simple coordinate like $r$ is sufficient to serve as a reaction coordinate. If the solvent drives or hinders the reaction by some mechanism, then a more complex coordinate that involves solvent degrees of freedom is likely needed. If the role of the solvent is a more “passive” one, then the free energy $A(r)$, obtained by “integrating out” all other degrees of freedom, will yield important information about the thermodynamics and kinetics of the entire dissociation process. Another example is the gas-phase proton transfer reaction A–H···B ⟶ A···H–B. Here, although the two distances $|\mathbf{r}_H - \mathbf{r}_A|$ and $|\mathbf{r}_H - \mathbf{r}_B|$ can be used to monitor the progress of the proton away from A and toward B, respectively, neither distance alone is sufficient to follow the progress of the reaction. However, a collective variable equal to the difference $\delta = |\mathbf{r}_H - \mathbf{r}_B| - |\mathbf{r}_H - \mathbf{r}_A|$, which is positive (negative) when the proton is near A (B) and zero when the proton is equidistant between A and B, is a reasonable choice for describing the process. Therefore, δ is likely also a proper reaction coordinate in the gas phase. As it happens, a complete set of generalized coordinates involving δ can be constructed as follows. 
If $\mathbf{r}_A$, $\mathbf{r}_B$, and $\mathbf{r}_H$ denote the Cartesian coordinates of the three atoms, then first introduce the center of mass $\mathbf{R} = (m_A\mathbf{r}_A + m_B\mathbf{r}_B + m_H\mathbf{r}_H)/(m_A + m_B + m_H)$, the relative coordinate between A and B, $\mathbf{r} = \mathbf{r}_A - \mathbf{r}_B$, and a third relative coordinate $\mathbf{s}$ between H and the center of mass of A and B, $\mathbf{s} = \mathbf{r}_H - (m_A\mathbf{r}_A + m_B\mathbf{r}_B)/(m_A + m_B)$. Finally, $\mathbf{r}$ is transformed into spherical polar coordinates $(r, \theta, \phi)$, and from $\mathbf{r}$ and $\mathbf{s}$, three more coordinates are formed as

$$\begin{aligned}
\sigma &= \left|\mathbf{s} + \frac{m_A}{m_A + m_B}\mathbf{r}\right| + \left|\mathbf{s} - \frac{m_B}{m_A + m_B}\mathbf{r}\right|, \\
\delta &= \left|\mathbf{s} + \frac{m_A}{m_A + m_B}\mathbf{r}\right| - \left|\mathbf{s} - \frac{m_B}{m_A + m_B}\mathbf{r}\right|,
\end{aligned} \tag{8.6.1}$$

and the angle α, which measures the “tilt” of the plane containing the three atoms from the vertical. These coordinates could also be useful if the reaction takes place in solution, but as noted above, some careful thought about relevant solvent coordinates is needed. As a third example, conformational changes in small peptides, such as di- and tripeptides, can be described in terms of the Ramachandran backbone dihedral angles φ and ψ (see Fig. 3.8, for example). For longer oligopeptides that can fold into protein secondary structure elements, such as helices and β-sheets, other coordinates could be used in addition to the Ramachandran angles such as the radius of gyration of $N_b$ heavy backbone atoms, given by

$$R_G = \sqrt{\frac{1}{N_b}\sum_{i=1}^{N_b}\left(\mathbf{r}_i - \frac{1}{N_b}\sum_{j=1}^{N_b}\mathbf{r}_j\right)^2}, \tag{8.6.2}$$

or the number of hydrogen bonds of length approximately $d_0$ between $n_O$ oxygens and $n_H$ hydrogens, which can be expressed as

$$N_H = \sum_{i=1}^{n_O}\sum_{j=1}^{n_H}\frac{1 - \left(|\mathbf{r}_i - \mathbf{r}_j|/d_0\right)^6}{1 - \left(|\mathbf{r}_i - \mathbf{r}_j|/d_0\right)^{12}}. \tag{8.6.3}$$
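The collective variables introduced so far are simple functions of Cartesian coordinates and are straightforward to code. The sketch below gives minimal, unoptimized versions of the proton-transfer coordinate δ, the radius of gyration of eqn. (8.6.2), and the hydrogen-bond count of eqn. (8.6.3). The function names and the representation of positions as 3-tuples are our own choices; periodic boundary conditions and units are ignored. For the hydrogen-bond count we use the algebraic identity $(1 - x^6)/(1 - x^{12}) = 1/(1 + x^6)$, which avoids the 0/0 that the literal formula produces when a distance equals $d_0$.

```python
import math

def distance(a, b):
    """Euclidean distance between two points given as 3-tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def proton_transfer_delta(r_a, r_b, r_h):
    """delta = |r_H - r_B| - |r_H - r_A|: positive near A, negative near B."""
    return distance(r_h, r_b) - distance(r_h, r_a)

def radius_of_gyration(coords):
    """Eqn. (8.6.2): root-mean-square distance of the atoms from their centroid."""
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(sum(distance(c, centroid) ** 2 for c in coords) / n)

def hydrogen_bond_count(oxygens, hydrogens, d0):
    """Eqn. (8.6.3), with each term written as 1/(1 + x^6), x = |r_i - r_j|/d0."""
    total = 0.0
    for r_o in oxygens:
        for r_h in hydrogens:
            x6 = (distance(r_o, r_h) / d0) ** 6
            total += 1.0 / (1.0 + x6)
    return total

# Small usage example with made-up coordinates.
delta = proton_transfer_delta((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (1.0, 0.0, 0.0))
r_g = radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)])
n_h = hydrogen_bond_count([(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], d0=1.0)
```

In the example the proton sits closer to A, so δ is positive; each oxygen-hydrogen pair contributes a smooth value between 1 (short distance) and 0 (large distance) to the count, with exactly 1/2 at the reference distance $d_0$.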

These coordinates have been shown to be useful in characterizing both the folded and unfolded states of polypeptides (Bussi et al., 2006). Given some knowledge of secondary structural elements, collective variables capable of distinguishing α-helix patterns from β-strands are also useful in this context (Bonomi et al., 2009). In all of these examples, the collective variables are functions of the primitive Cartesian coordinates of some or all of the atoms in the system. If we are interested in the packing of domains of secondary structural elements in proteins, then Cartesian centroids of these domains can be employed (Abrams and Vanden-Eijnden, 2010). As a final example, collective variables for exploring the polymorph landscape of a molecular crystal might include the cell matrix $\mathbf{h}$ of eqn. (5.7.2), as different crystal structures in different space groups will often be characterized by different values of the cell parameters (Yu and Tuckerman, 2011). In addition to the cell matrix, collective variables measuring the internal ordering of molecules are also important in this context. For example, the Shannon entropy can be formulated as a collective variable given the full three-dimensional spatial distribution $\tilde{g}(\mathbf{r})$ of eqn. (4.7.17) as

$$S(\mathbf{r}_1,...,\mathbf{r}_N) = -N\rho k\int d\mathbf{r}\,\left[\tilde{g}(\mathbf{r},\mathbf{r}_1,...,\mathbf{r}_N)\ln\tilde{g}(\mathbf{r},\mathbf{r}_1,...,\mathbf{r}_N) - \left(\tilde{g}(\mathbf{r},\mathbf{r}_1,...,\mathbf{r}_N) - 1\right)\right], \tag{8.6.4}$$

(Piaggi and Parrinello, 2018), where ρ is the number density. If $\tilde{g}(\mathbf{r},\mathbf{r}_1,...,\mathbf{r}_N)$ is modeled as a sum of Gaussians in distances $r_{ij}$ and relative orientations $\cos\theta_{ij}$ of pairs $(ij)$ of molecules according to

$$\tilde{g}(\mathbf{r},\mathbf{r}_1,...,\mathbf{r}_N) = \frac{1}{4\pi r^2 N\rho\sigma_r^2\sigma_\theta^2}\sum_{i,j} e^{-(r - r_{ij})^2/2\sigma_r^2}\, e^{-(\cos\theta - \cos\theta_{ij})^2/2\sigma_\theta^2}, \tag{8.6.5}$$

where $\sigma_r$ and $\sigma_\theta$ are the Gaussian widths, then $S(\mathbf{r}_1,...,\mathbf{r}_N)$ along with $\mathbf{h}$ can be particularly effective in searching for polymorphs of crystals of small rigid molecules such as benzene, resorcinol, urea, 5-fluorouracil, or co-crystals containing more than one of these components (Song et al., 2020). While collective variables are potentially powerful constructs and intuitively appealing, they must be used with care. Enhanced sampling approaches applied to poorly chosen collective variables can bias the system in misleading ways and generate erroneous predictions of free-energy barriers, transition states, and mechanisms. A dramatic illustration of this can be seen with the autodissociation of liquid water according to the classic reaction $2\,{\rm H_2O(l)} \longrightarrow {\rm H_3O^+(aq)} + {\rm OH^-(aq)}$, as discussed in Section 7.7. The reaction ostensibly only requires transferring a proton from one water molecule to another. If we pursue this simple picture, seemingly sensible collective variables might be the distance between the oxygen and the transferring proton or


the number of hydrogens covalently bonded to one of the oxygens. As it turns out, these coordinates do not capture the mechanism of the reaction and consequently fail to yield an accurate free energy or autoionization constant $K_w$ (Trout and Parrinello, 1998). A clearer picture of this reaction was provided by Geissler et al. (2001) using the transition path sampling approach (Dellago et al., 1998; Bolhuis et al., 2002; Dellago et al., 2002) discussed in Section 7.7. From their calculations, these authors posited that the dissociation reaction is complete only when the ${\rm H_3O^+}$ and ${\rm OH^-}$ ions separate such that no contiguous path of hydrogen bonds exists between them that would allow them to recombine through a series of proton transfer steps. In order to describe such a process correctly, a collective variable that involves many solvent degrees of freedom would be needed and could ultimately be employed as a reaction coordinate. Later, in Section 8.12, we will describe a technique for judging the fitness of a collective variable to serve as a reaction coordinate. The examples presented in this section suggest that there is neither an automated procedure for selecting collective variables a priori nor a criterion for determining the number of collective variables needed to characterize a particular configurational landscape of a complex system. In Chapter 17, where machine learning techniques are introduced, we will discuss how machine learning can be leveraged to assist in the selection of collective variables and reaction coordinates. Keeping in mind all of our caveats about choosing collective variables, we now describe a number of popular methods designed to enhance sampling along preselected collective variables. These methods are designed to generate, either directly or indirectly, the probability distribution function of a subset of $n$ collective variables of interest in a system. 
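When these collective variables are expressed as functions of the N primitive particle coordinates $\mathbf{r}_1, \ldots, \mathbf{r}_N$ as $q_\alpha = f_\alpha(\mathbf{r}_1, \ldots, \mathbf{r}_N)$, $\alpha = 1, \ldots, n$, then the probability density that these n coordinates will have values $q_\alpha = s_\alpha$ in the canonical ensemble is

\[
P(s_1, \ldots, s_n) = \frac{C_N}{Q(N,V,T)} \int d^N\mathbf{p}\, d^N\mathbf{r}\; e^{-\beta H(\mathbf{r},\mathbf{p})} \prod_{\alpha=1}^{n} \delta\big(f_\alpha(\mathbf{r}_1, \ldots, \mathbf{r}_N) - s_\alpha\big),
\tag{8.6.6}
\]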

where the δ-functions are introduced to fix the collective variables $q_1, \ldots, q_n$ at $s_1, \ldots, s_n$. The function $P(s_1, \ldots, s_n)$ is known as the marginal probability distribution function. Once the marginal distribution $P(s_1, \ldots, s_n)$ is known, the free-energy surface in these coordinates is defined in terms of the marginal distribution as

\[
A(s_1, \ldots, s_n) = -kT \ln P(s_1, \ldots, s_n).
\tag{8.6.7}
\]

Generating a free-energy surface is far from simple, and the remainder of this chapter will be devoted to advanced techniques for obtaining and understanding these often complicated functions. We will revisit the problem of generating and representing free-energy surfaces in Section 17.7.
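As a concrete illustration of eqn. (8.6.7) for a single collective variable, the following minimal Python sketch estimates P(s) by histogramming samples of the variable and converts the estimate into a free-energy profile. The function name `free_energy_profile`, the binning parameters, and the synthetic Gaussian data are assumptions made for illustration only; the sketch presumes that the samples already visit the whole range of interest, which is exactly what fails in a rare-event problem.

```python
import math
import random


def free_energy_profile(samples, s_min, s_max, n_bins, kT):
    """Histogram samples of a collective variable and return A(s) = -kT ln P(s).

    Returns (centers, profile) for the visited bins only, with the profile
    shifted so that its minimum is zero.
    """
    width = (s_max - s_min) / n_bins
    counts = [0] * n_bins
    for s in samples:
        b = math.floor((s - s_min) / width)
        if 0 <= b < n_bins:
            counts[b] += 1
    total = sum(counts)
    centers, profile = [], []
    for b, c in enumerate(counts):
        if c == 0:
            continue  # P(s) = 0 here, so A(s) is undefined (formally infinite)
        density = c / (total * width)  # normalized estimate of P(s)
        centers.append(s_min + (b + 0.5) * width)
        profile.append(-kT * math.log(density))
    a_min = min(profile)
    return centers, [a - a_min for a in profile]


# Synthetic data (an assumption): a unit-variance Gaussian variable, for which
# A(s) = kT s^2 / 2 up to an additive constant.
random.seed(0)
kT = 1.0
samples = [random.gauss(0.0, 1.0) for _ in range(200000)]
centers, profile = free_energy_profile(samples, -3.0, 3.0, 30, kT)
```

Bins that are never visited are simply dropped, which makes the limitation explicit: a direct histogram gives no information about regions separated from the sampled basin by a high barrier.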

8.7 The blue moon ensemble approach

The term “blue moon,” as the name implies, colloquially describes a rare event.⁴ The blue moon ensemble approach was introduced by Carter et al. (1989) and Sprik and Ciccotti (1998) for computing the free-energy profile of a collective variable or reaction coordinate when one or more high barriers along this coordinate direction lead to a rare-event problem in an ordinary thermostatted molecular dynamics calculation. Suppose a process of interest can be monitored by a single collective coordinate $q_1 = f_1(\mathbf{r}_1, \ldots, \mathbf{r}_N)$. Then, according to eqns. (8.6.6) and (8.6.7), the probability that $f_1(\mathbf{r}_1, \ldots, \mathbf{r}_N)$ has the value s, and the associated free-energy profile, are given by

\[
\begin{aligned}
P(s) &= \frac{C_N}{Q(N,V,T)} \int d^N\mathbf{p}\, d^N\mathbf{r}\; e^{-\beta H(\mathbf{r},\mathbf{p})}\, \delta\big(f_1(\mathbf{r}_1, \ldots, \mathbf{r}_N) - s\big) \\
&= \frac{1}{N!\,\lambda^{3N} Q(N,V,T)} \int d^N\mathbf{r}\; e^{-\beta U(\mathbf{r})}\, \delta\big(f_1(\mathbf{r}_1, \ldots, \mathbf{r}_N) - s\big) \\
A(s) &= -kT \ln P(s).
\end{aligned}
\tag{8.7.1}
\]

In the second line, the integration over momenta has been performed, giving the thermal prefactor $\lambda^{3N}$. In the blue moon ensemble approach, a holonomic constraint (see Section 1.9 and Section 3.9) $\sigma(\mathbf{r}_1, \ldots, \mathbf{r}_N) = f_1(\mathbf{r}_1, \ldots, \mathbf{r}_N) - s$ is introduced in a molecular dynamics calculation as a means of “driving” the coordinate $q_1$ from an initial value $s^{(i)}$ of the parameter s to a final value $s^{(f)}$ via a set of intermediate points $s^{(1)}, \ldots, s^{(n)}$ between $s^{(i)}$ and $s^{(f)}$. As we saw in Section 3.9, the introduction of a holonomic constraint does not yield the single δ-function condition $\delta(\sigma(\mathbf{r})) = \delta(f_1(\mathbf{r}) - s)$, where $\mathbf{r} \equiv \mathbf{r}_1, \ldots, \mathbf{r}_N$, as required by eqn. (8.7.1), but rather the product $\delta(\sigma(\mathbf{r}))\,\delta(\dot{\sigma}(\mathbf{r}, \mathbf{p}))$, since both the constraint and its first time derivative are imposed in a constrained dynamics calculation. We will return to this point shortly. In addition, the blue moon ensemble approach does not yield A(s) directly but rather the derivative

\[
\frac{dA}{ds} = -\frac{kT}{P(s)} \frac{dP}{ds},
\tag{8.7.2}
\]

from which the free-energy profile A(s) along the chosen collective variable or reaction coordinate and the free-energy difference $\Delta A = A(s^{(f)}) - A(s^{(i)})$ are given by the integrals

\[
A(s) = A(s^{(i)}) + \int_{s^{(i)}}^{s} \frac{dA}{ds'}\, ds', \qquad
\Delta A = \int_{s^{(i)}}^{s^{(f)}} \frac{dA}{ds}\, ds.
\tag{8.7.3}
\]

In the free-energy profile expression, $A(s^{(i)})$ is just an additive constant that can be left off or adjusted so the minimum value of the profile at $s_{\min}$ corresponds to $A(s_{\min}) = 0$. In practice, these integrals are evaluated numerically using the integration points $s^{(1)}, \ldots, s^{(n)}$. These points can be chosen equally spaced between $s^{(i)}$ and $s^{(f)}$, so that the integrals can be evaluated using a standard numerical quadrature, or they can be chosen according to a more sophisticated quadrature scheme. If the full profile A(s) is desired, however, the number of quadrature points should be sufficient to capture the detailed shape of the profile.

⁴ A “blue moon” usually refers to the occurrence of a second full moon in a calendar month. The extra full moon occurs roughly every 2.7 years and is, therefore, a rare event.

We next show how to evaluate the derivative in eqn. (8.7.2). Noting that $P(s) = \langle \delta(f_1(\mathbf{r}) - s) \rangle$, the derivative can be written as

\[
\frac{1}{P(s)} \frac{dP}{ds} = \frac{C_N}{Q(N,V,T)}\,
\frac{\int d^N\mathbf{p}\, d^N\mathbf{r}\; e^{-\beta H(\mathbf{r},\mathbf{p})}\, \frac{\partial}{\partial s} \delta(f_1(\mathbf{r}) - s)}{\langle \delta(f_1(\mathbf{r}) - s) \rangle}.
\tag{8.7.4}
\]

In order to avoid the derivative of the δ-function, an integration by parts is performed. We first introduce a complete set of 3N generalized coordinates $q_\alpha = f_\alpha(\mathbf{r}_1, \ldots, \mathbf{r}_N)$ and their conjugate momenta $p_\alpha$. This transformation, being canonical, has a unit Jacobian so that $d^N\mathbf{p}\, d^N\mathbf{r} = d^{3N}p\, d^{3N}q$. Denoting the transformed Hamiltonian as $\tilde{H}(q, p)$, eqn. (8.7.4) becomes

\[
\frac{1}{P(s)} \frac{dP}{ds} = \frac{C_N}{Q(N,V,T)}\,
\frac{\int d^{3N}p\, d^{3N}q\; e^{-\beta \tilde{H}(q,p)}\, \frac{\partial}{\partial s} \delta(q_1 - s)}{\langle \delta(q_1 - s) \rangle}.
\tag{8.7.5}
\]

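Next, we change the derivative in front of the δ-function from $\partial/\partial s$ to $\partial/\partial q_1$, using the fact that

\[
\frac{\partial}{\partial s} \delta(q_1 - s) = -\frac{\partial}{\partial q_1} \delta(q_1 - s).
\tag{8.7.6}
\]

Finally, we integrate by parts to obtain

\[
\begin{aligned}
\frac{1}{P(s)} \frac{dP}{ds}
&= \frac{C_N}{Q(N,V,T)}\,
\frac{\int d^{3N}p\, d^{3N}q \left[ \frac{\partial}{\partial q_1} e^{-\beta \tilde{H}(q,p)} \right] \delta(q_1 - s)}{\langle \delta(q_1 - s) \rangle} \\
&= -\frac{\beta C_N}{Q(N,V,T)}\,
\frac{\int d^{3N}p\, d^{3N}q\; \frac{\partial \tilde{H}}{\partial q_1}\, e^{-\beta \tilde{H}(q,p)}\, \delta(q_1 - s)}{\langle \delta(q_1 - s) \rangle} \\
&= -\frac{\beta}{\langle \delta(q_1 - s) \rangle}
\left\langle \left( \frac{\partial \tilde{H}}{\partial q_1} \right) \delta(q_1 - s) \right\rangle.
\end{aligned}
\tag{8.7.7}
\]

The last line defines a new ensemble average. Specifically, the average must be performed subject to the condition $q_1 = s$. Note, however, that this is not equivalent to a mechanical constraint since the additional condition $\dot{q}_1 = 0$ is not imposed. This new ensemble average will be denoted $\langle \cdots \rangle_s^{\rm cond}$. Hence, the derivative dP/ds can be expressed as

\[
\frac{1}{P(s)} \frac{dP}{ds} = -\beta \left\langle \frac{\partial \tilde{H}}{\partial q_1} \right\rangle_s^{\rm cond}.
\tag{8.7.8}
\]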

Substituting eqn. (8.7.8) into eqns. (8.7.2) and (8.7.3) yields the free-energy profile

\[
A(s) = A(s^{(i)}) + \int_{s^{(i)}}^{s} ds' \left\langle \frac{\partial \tilde{H}}{\partial q_1} \right\rangle_{s'}^{\rm cond}.
\tag{8.7.9}
\]

Noting that $-\langle \partial \tilde{H}/\partial q_1 \rangle_{s'}^{\rm cond}$ is the expression for the average of the generalized force on $q_1$ when $q_1 = s'$, the integral represents the work done on the system in moving from $s^{(i)}$ to an arbitrary final point s. Since the conditional average implies a full simulation at each fixed value of $q_1$, the thermodynamic transformation is carried out reversibly, and eqn. (8.7.9) is consistent with the work-free-energy inequality.

Equation (8.7.9) provides insight into the underlying statistical mechanical expression for the free energy. Technically, however, the need for a full canonical transformation to generalized coordinates and conjugate momenta is inconvenient (see eqn. (1.4.16)). A more useful expression results when we perform the momentum integrations before introducing the transformation to generalized coordinates. Starting again with eqn. (8.7.4), we integrate out the momenta to yield

\[
\frac{1}{P(s)} \frac{dP}{ds} = \frac{1}{N!\,\lambda^{3N} Q(N,V,T)}\,
\frac{\int d^N\mathbf{r}\; e^{-\beta U(\mathbf{r})}\, \frac{\partial}{\partial s} \delta(f_1(\mathbf{r}) - s)}{\langle \delta(f_1(\mathbf{r}) - s) \rangle}.
\tag{8.7.10}
\]

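Next, we transform just the coordinates to generalized coordinates $q_\alpha = f_\alpha(\mathbf{r}_1, \ldots, \mathbf{r}_N)$. However, because there is no corresponding momentum transformation, the Jacobian of the transformation is not unity. Let $J(q) = |\partial(\mathbf{r}_1, \ldots, \mathbf{r}_N)/\partial(q_1, \ldots, q_{3N})|$ denote this Jacobian. Equation (8.7.10) then becomes

\[
\begin{aligned}
\frac{1}{P(s)} \frac{dP}{ds}
&= \frac{1}{N!\,\lambda^{3N} Q(N,V,T)}\,
\frac{\int d^{3N}q\; J(q)\, e^{-\beta \tilde{U}(q)}\, \frac{\partial}{\partial s} \delta(q_1 - s)}{\langle \delta(q_1 - s) \rangle} \\
&= \frac{1}{N!\,\lambda^{3N} Q(N,V,T)}\,
\frac{\int d^{3N}q\; e^{-\beta (\tilde{U}(q) - kT \ln J(q))}\, \frac{\partial}{\partial s} \delta(q_1 - s)}{\langle \delta(q_1 - s) \rangle},
\end{aligned}
\tag{8.7.11}
\]

where, in the last line, the Jacobian has been exponentiated. Changing the derivative $\partial/\partial s$ to $\partial/\partial q_1$ and performing the integration by parts as was done in eqn. (8.7.7), we obtain

\[
\begin{aligned}
\frac{1}{P(s)} \frac{dP}{ds}
&= \frac{1}{N!\,\lambda^{3N} Q(N,V,T)}\,
\frac{\int d^{3N}q \left[ \frac{\partial}{\partial q_1} e^{-\beta (\tilde{U}(q) - kT \ln J(q))} \right] \delta(q_1 - s)}{\langle \delta(q_1 - s) \rangle} \\
&= -\frac{\beta}{N!\,\lambda^{3N} Q(N,V,T)}\,
\frac{\int d^{3N}q \left[ \frac{\partial \tilde{U}}{\partial q_1} - kT \frac{\partial}{\partial q_1} \ln J(q) \right] e^{-\beta (\tilde{U}(q) - kT \ln J(q))}\, \delta(q_1 - s)}{\langle \delta(q_1 - s) \rangle} \\
&= -\beta \left\langle \left[ \frac{\partial \tilde{U}}{\partial q_1} - kT \frac{\partial}{\partial q_1} \ln J(q) \right] \right\rangle_s^{\rm cond}.
\end{aligned}
\tag{8.7.12}
\]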

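Therefore, the free-energy profile becomes

\[
A(s) = A(s^{(i)}) + \int_{s^{(i)}}^{s} ds' \left\langle \left[ \frac{\partial \tilde{U}}{\partial q_1} - kT \frac{\partial}{\partial q_1} \ln J(q) \right] \right\rangle_{s'}^{\rm cond}.
\tag{8.7.13}
\]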


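The derivative of $\tilde{U}$, the transformed potential, can be computed from the original potential U using the chain rule

\[
\frac{\partial \tilde{U}}{\partial q_1} = \sum_{i=1}^{N} \frac{\partial U}{\partial \mathbf{r}_i} \cdot \frac{\partial \mathbf{r}_i}{\partial q_1}.
\tag{8.7.14}
\]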
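To make the role of the Jacobian term concrete, consider the following worked case, which is an illustration added here under stated assumptions and not an example taken from the text. Suppose the collective variable is the distance $r = |\mathbf{r}_1 - \mathbf{r}_2|$ between two particles, and the generalized coordinates for this pair are chosen as the center of mass together with the spherical polar coordinates $(r, \theta, \phi)$ of the relative vector, all other particle coordinates being left unchanged. Up to constant factors, the Jacobian is then $J = r^2 \sin\theta$, and the integrand of eqn. (8.7.13) reduces to:

```latex
\[
\frac{\partial}{\partial r} \ln J = \frac{\partial}{\partial r} \ln\left( r^2 \sin\theta \right) = \frac{2}{r}
\quad\Longrightarrow\quad
\frac{dA}{dr} = \left\langle \frac{\partial \tilde{U}}{\partial r} \right\rangle_{r}^{\rm cond} - \frac{2kT}{r}.
\]
```

The second term is purely entropic: it reflects the growth of the available volume, proportional to $r^2$, as the two particles separate, and it is present even when the potential does not depend on r.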

Equation (8.7.13) can be applied straightforwardly to simple collective variables or reaction coordinates for which the full transformation to generalized coordinates is known.

Let us return to the problem of computing conditional ensemble averages from constrained molecular dynamics. We will use this discussion as a vehicle for introducing yet another expression for A(s) that does not require a coordinate transformation at all. Recall from Section 1.9 that the equations of motion for a system subject to a single holonomic constraint $\sigma(\mathbf{r}_1, \ldots, \mathbf{r}_N) = 0$ are

\[
\dot{\mathbf{r}}_i = \frac{\mathbf{p}_i}{m_i}, \qquad
\dot{\mathbf{p}}_i = \mathbf{F}_i + \lambda \frac{\partial \sigma}{\partial \mathbf{r}_i},
\tag{8.7.15}
\]

where λ is the Lagrange multiplier needed to impose the constraint. In order to carry out the statistical mechanical analysis of the constrained dynamics, we shall make use of Gauss’s principle of least constraint introduced in Section 1.10. There we showed that the Lagrange multiplier is given by

\[
\lambda = -\frac{\sum_j \mathbf{F}_j \cdot \nabla_j \sigma / m_j + \sum_{j,k} \nabla_j \nabla_k \sigma \cdot\cdot\, \mathbf{p}_j \mathbf{p}_k / (m_j m_k)}{\sum_j (\nabla_j \sigma)^2 / m_j}.
\tag{8.7.16}
\]

When eqn. (8.7.16) is substituted into eqn. (8.7.15), we obtain a set of non-Hamiltonian equations of motion that explicitly satisfy the two conservation laws $\sigma(\mathbf{r}) = 0$ and $\dot{\sigma}(\mathbf{r}, \mathbf{p}) = 0$. In addition, the equations of motion conserve the Hamiltonian $H(\mathbf{r}, \mathbf{p})$, since the forces of constraint do no work on the system. The methods of classical non-Hamiltonian statistical mechanics introduced in Section 4.10 allow us to derive the phase-space distribution sampled by eqns. (8.7.15). According to Section 4.10, we need to determine the conservation laws and the phase-space metric in order to construct the “microcanonical” partition function in eqn. (4.10.21). The partition function corresponding to eqns. (8.7.15) is given by

\[
Z = \int d^N\mathbf{r}\, d^N\mathbf{p}\; \sqrt{g(\mathbf{r}, \mathbf{p})}\; \delta\big(H(\mathbf{r}, \mathbf{p}) - E\big)\, \delta\big(\sigma(\mathbf{r})\big)\, \delta\big(\dot{\sigma}(\mathbf{r}, \mathbf{p})\big).
\tag{8.7.17}
\]

From eqn. (4.10.11), the metric factor is $\sqrt{g} = \exp(-w)$, where $dw/dt = \kappa$, and κ is the compressibility of the system,

\[
\kappa = \sum_{i=1}^{N} \left[ \nabla_{\mathbf{r}_i} \cdot \dot{\mathbf{r}}_i + \nabla_{\mathbf{p}_i} \cdot \dot{\mathbf{p}}_i \right].
\tag{8.7.18}
\]


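Note that $\nabla_{\mathbf{r}_i} \cdot \dot{\mathbf{r}}_i = 0$ so that

\[
\begin{aligned}
\kappa = \sum_i \nabla_{\mathbf{p}_i} \cdot \dot{\mathbf{p}}_i
&= -\frac{2 \sum_i (\nabla_i \sigma / m_i) \cdot \nabla_i \sum_j \nabla_j \sigma \cdot \mathbf{p}_j / m_j}{\sum_i (\nabla_i \sigma)^2 / m_i} \\
&= -\frac{2 \sum_i \nabla_i \sigma \cdot \nabla_i \dot{\sigma} / m_i}{\sum_i (\nabla_i \sigma)^2 / m_i} \\
&= -\frac{d}{dt} \ln \left[ \sum_i (\nabla_i \sigma)^2 / m_i \right]
= \frac{dw}{dt}.
\end{aligned}
\tag{8.7.19}
\]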

The metric, therefore, becomes

\[
\sqrt{g} = e^{-w} = \sum_i \frac{1}{m_i} \left( \frac{\partial \sigma}{\partial \mathbf{r}_i} \right)^2 \equiv z(\mathbf{r}).
\tag{8.7.20}
\]

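This metric factor arises frequently when holonomic constraints are imposed on a system and is, therefore, given the special symbol $z(\mathbf{r})$. The partition function generated by eqns. (8.7.15) now becomes

\[
Z = \int d^N\mathbf{r}\, d^N\mathbf{p}\; z(\mathbf{r})\, \delta\big(H(\mathbf{r}, \mathbf{p}) - E\big)\, \delta\big(\sigma(\mathbf{r})\big)\, \delta\big(\dot{\sigma}(\mathbf{r}, \mathbf{p})\big).
\tag{8.7.21}
\]

The energy-conserving δ-function, $\delta(H(\mathbf{r}, \mathbf{p}) - E)$, can be replaced by a canonical distribution $\exp[-\beta H(\mathbf{r}, \mathbf{p})]$ simply by coupling eqns. (8.7.15) to a thermostat such as the Nosé-Hoover chain thermostat (see Section 4.11), and eqn. (8.7.21) is replaced by

\[
Z = \int d^N\mathbf{r}\, d^N\mathbf{p}\; z(\mathbf{r})\, e^{-\beta H(\mathbf{r}, \mathbf{p})}\, \delta\big(\sigma(\mathbf{r})\big)\, \delta\big(\dot{\sigma}(\mathbf{r}, \mathbf{p})\big).
\tag{8.7.22}
\]

In order to compare eqn. (8.7.22) to eqn. (8.7.1), we need to perform the integration over the momenta in order to clear the second δ-function. Noting that

\[
\dot{\sigma}(\mathbf{r}, \mathbf{p}) = \sum_i \frac{\partial \sigma}{\partial \mathbf{r}_i} \cdot \dot{\mathbf{r}}_i = \sum_i \frac{\partial \sigma}{\partial \mathbf{r}_i} \cdot \frac{\mathbf{p}_i}{m_i},
\tag{8.7.23}
\]

the partition function becomes

\[
Z = \int d^N\mathbf{r}\, d^N\mathbf{p}\; z(\mathbf{r})\, e^{-\beta H(\mathbf{r}, \mathbf{p})}\, \delta\big(\sigma(\mathbf{r})\big)\, \delta\left( \sum_i \frac{\partial \sigma}{\partial \mathbf{r}_i} \cdot \frac{\mathbf{p}_i}{m_i} \right).
\tag{8.7.24}
\]

Fortunately, the second δ-function is linear in the momenta and can be integrated over relatively easily (Problem 8.4), yielding

\[
Z \propto \int d^N\mathbf{r}\; z^{1/2}(\mathbf{r})\, e^{-\beta U(\mathbf{r})}\, \delta\big(\sigma(\mathbf{r})\big).
\tag{8.7.25}
\]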


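Apart from prefactors irrelevant to the free energy, eqns. (8.7.25) and (8.7.1) differ only by the factor of $z^{1/2}(\mathbf{r})$. We have already seen that the conditional average of any function $O(\mathbf{r})$ of the positions is

\[
\langle O(\mathbf{r}) \rangle_s^{\rm cond} = \frac{\int d^N\mathbf{r}\; e^{-\beta U(\mathbf{r})}\, O(\mathbf{r})\, \delta(f_1(\mathbf{r}) - s)}{\langle \delta(f_1(\mathbf{r}) - s) \rangle}.
\tag{8.7.26}
\]

The above analysis suggests that the average of $O(\mathbf{r})$ in the ensemble generated by the constrained dynamics is

\[
\langle O(\mathbf{r}) \rangle_s^{\rm constr} = \frac{\int d^N\mathbf{r}\; e^{-\beta U(\mathbf{r})}\, z^{1/2}(\mathbf{r})\, O(\mathbf{r})\, \delta(f_1(\mathbf{r}) - s)}{\langle z^{1/2}(\mathbf{r})\, \delta(f_1(\mathbf{r}) - s) \rangle},
\tag{8.7.27}
\]

since $\sigma(\mathbf{r}) = f_1(\mathbf{r}) - s$. Thus, the conditional average of $O(\mathbf{r})$ can be generated using the constrained ensemble if, instead of computing the average of $O(\mathbf{r})$ in this ensemble, we compute the average of $z^{-1/2}(\mathbf{r}) O(\mathbf{r})$ and normalize by the average of $z^{-1/2}(\mathbf{r})$. That is,

\[
\langle O(\mathbf{r}) \rangle_s^{\rm cond} = \frac{\langle z^{-1/2}(\mathbf{r})\, O(\mathbf{r}) \rangle_s^{\rm constr}}{\langle z^{-1/2}(\mathbf{r}) \rangle_s^{\rm constr}}.
\tag{8.7.28}
\]

Given the connection between the conditional and constrained averages, eqn. (8.7.13) for the free-energy profile can be written as

\[
A(s) = A(s^{(i)}) + \int_{s^{(i)}}^{s} ds'\,
\frac{\left\langle z^{-1/2}(\mathbf{r}) \left[ \frac{\partial \tilde{U}}{\partial q_1} - kT \frac{\partial}{\partial q_1} \ln J(q) \right] \right\rangle_{s'}^{\rm constr}}{\langle z^{-1/2}(\mathbf{r}) \rangle_{s'}^{\rm constr}}.
\tag{8.7.29}
\]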

Having now demonstrated how to compute a conditional average from constrained dynamics, we quote an important result of Sprik and Ciccotti (1998), who showed that a transformation to a complete set of generalized coordinates is not required. Rather, all we need is the form of the function $f_1(\mathbf{r}_1, \ldots, \mathbf{r}_N)$ associated with the collective variable or reaction coordinate $q_1$. Then, when eqns. (8.7.15) are used to constrain $f_1(\mathbf{r}_1, \ldots, \mathbf{r}_N)$, the free-energy profile can be expressed as

\[
A(s) = A(s^{(i)}) + \int_{s^{(i)}}^{s} ds'\,
\frac{\left\langle z^{-1/2}(\mathbf{r}) \left[ \lambda + kT G \right] \right\rangle_{s'}^{\rm constr}}{\langle z^{-1/2}(\mathbf{r}) \rangle_{s'}^{\rm constr}},
\tag{8.7.30}
\]

where λ is the Lagrange multiplier for the constraint and

\[
G = \frac{1}{z^2(\mathbf{r})} \sum_{i,j} \frac{1}{m_i m_j}\,
\frac{\partial f_1}{\partial \mathbf{r}_i} \cdot \frac{\partial^2 f_1}{\partial \mathbf{r}_i \partial \mathbf{r}_j} \cdot \frac{\partial f_1}{\partial \mathbf{r}_j}.
\tag{8.7.31}
\]

In a constrained molecular dynamics calculation, the Lagrange multiplier λ is calculated “on the fly” at every step and can be used, together with the calculation of G from eqn. (8.7.31), to construct the average in eqn. (8.7.30) at each value of the constraint. An interesting twist on the blue moon method was introduced by Darve and Pohorille (2001, 2007, 2008), who suggested that the free-energy derivative could also be computed using unconstrained dynamics by connecting the former to the instantaneous force acting on the collective variable or reaction coordinate.

Fig. 8.6 Free-energy profiles for the addition of 1,3-butadiene and 2-F-1,3-butadiene to a Si(100)-2×1 surface. The inset at the upper left shows two possible reaction paths, concerted and non-concerted; the figure in the lower part of the plot is the molecule 2-F-butadiene.

As an illustration of the use of the blue moon ensemble method, we show Helmholtz free-energy profiles for the addition of two different organic molecules, 1,3-butadiene and 2-F-1,3-butadiene, to a silicon (100)-2×1 reconstructed surface (see Fig. 8.6). The surface contains rows of silicon dimers with a strong double-bond character that can form [4+2] Diels-Alder type adducts with these two molecules (see inset to Fig. 8.6). An important challenge is the design of molecules that can chemisorb to the surface but can also be selectively removed for surface patterning, thus requiring a relatively low free energy for the retro-Diels-Alder reaction. By computing the free-energy profile using a reaction coordinate $\xi = (1/2)|(\mathbf{r}_{\rm Si_1} + \mathbf{r}_{\rm Si_2}) - (\mathbf{r}_{\rm C_1} + \mathbf{r}_{\rm C_4})|$, where Si1 and Si2 are the two silicon atoms in a surface dimer, and C1 and C4 are the outer carbons in the organic molecule, it was possible to show that a simple modification of the molecule substantially lowers the free energy of the retro-Diels-Alder reaction. Specifically, if the hydrogen at the 2 position is replaced by a fluorine, the free energy is lowered by 8-10 kcal/mol (Minary and Tuckerman, 2004; Iftimie et al., 2005). These calculations are performed using a molecular dynamics protocol in which forces are obtained “on the fly” from an electronic structure calculation (this is called ab initio molecular dynamics (Car and Parrinello, 1985; Tuckerman, 2002; Marx and Hutter, 2009)). The calculations are performed at 300 K using 13 different values for ξ separated by a distance of 0.15 Å.
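The post-processing implied by eqns. (8.7.30) and (8.7.3) can be sketched in a few lines of Python. The sketch below assumes that a constrained simulation at each constraint value has already produced time series of the Lagrange multiplier λ, the factor z(r), and G; the function names and the synthetic data are hypothetical and serve only to show the unbiasing by $z^{-1/2}$ and the subsequent trapezoidal quadrature. The sign convention for λ is taken to be that of eqn. (8.7.15), which may differ from the convention used by a particular simulation code.

```python
import math


def blue_moon_mean_force(lam, z, big_g, kT):
    """Unbiased dA/ds at one constraint value from constrained-MD samples.

    lam, z, big_g are equal-length sequences holding, for each sampled
    configuration, the Lagrange multiplier, z(r) of eqn. (8.7.20), and
    G of eqn. (8.7.31).
    """
    weights = [1.0 / math.sqrt(z_t) for z_t in z]
    numerator = sum(w * (l + kT * g) for w, l, g in zip(weights, lam, big_g))
    return numerator / sum(weights)


def integrate_profile(s_points, mean_forces):
    """Trapezoidal estimate of A(s) - A(s_points[0]) at each grid point."""
    profile = [0.0]
    for k in range(1, len(s_points)):
        ds = s_points[k] - s_points[k - 1]
        profile.append(profile[-1] + 0.5 * ds * (mean_forces[k] + mean_forces[k - 1]))
    return profile


# Hypothetical data: 13 constraint values spaced by 0.15, constant z, G = 0,
# and a Lagrange multiplier fluctuating symmetrically about 2 s, so that the
# exact answer is A(s) = s^2 + const.
kT = 0.6
s_points = [1.0 + 0.15 * k for k in range(13)]
mean_forces = []
for s in s_points:
    lam = [2.0 * s + 0.1, 2.0 * s - 0.1] * 50
    z = [2.0] * 100
    big_g = [0.0] * 100
    mean_forces.append(blue_moon_mean_force(lam, z, big_g, kT))
profile = integrate_profile(s_points, mean_forces)
```

When z(r) is constant, as it is for a simple distance constraint between two atoms, the reweighting cancels and the estimator reduces to a plain average of λ + kTG; the weights matter only when z(r) varies along the constrained trajectory.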

8.8 Umbrella sampling and weighted histogram methods

In this section, we will discuss another free-energy technique known as umbrella sampling (Torrie and Valleau, 1974; Torrie and Valleau, 1977). This method bears some similarity to the blue moon ensemble approach; however, rather than constraining the collective variable or reaction coordinate, the latter is restrained with a biasing potential. The bias is usually taken to be a harmonic potential of the form

\[
W(f_1(\mathbf{r}_1, \ldots, \mathbf{r}_N), s) = \frac{1}{2} \kappa \left( f_1(\mathbf{r}_1, \ldots, \mathbf{r}_N) - s \right)^2,
\tag{8.8.1}
\]

which is also known as an umbrella potential. Equation (8.8.1) is added to $U(\mathbf{r})$ so that the total potential is $U(\mathbf{r}) + W(f_1(\mathbf{r}), s)$. This potential is then used in a molecular dynamics or Monte Carlo calculation. As is done in the blue moon ensemble approach, the equilibrium value of the harmonic restraining potential is chosen to be a set of intermediate points $s^{(1)}, \ldots, s^{(n)}$ between $s^{(i)}$ and $s^{(f)}$, so that the chosen coordinate $q_1 = f_1(\mathbf{r})$ is “driven” between the two endpoint values. Each molecular dynamics or Monte Carlo simulation will then yield a biased probability distribution $\tilde{P}(s, s^{(k)})$, $k = 1, \ldots, n$, about each point $s^{(k)}$, with $s^{(1)} = s^{(i)}$ and $s^{(n)} = s^{(f)}$. From this set of distribution functions, the true free-energy profile across the entire range of the collective variable or reaction coordinate must be reconstructed.

When a biasing potential is used, reconstructing the full distribution requires an unbiasing procedure. This parallels the use of a constraint in the blue moon ensemble approach, which also requires an unbiasing factor in ensemble averages. The technique we will present in this section is known as the weighted histogram analysis method, or WHAM (Kumar et al., 1992). WHAM begins by defining the biased probability distribution generated from each molecular dynamics or Monte Carlo simulation

\[
\tilde{P}(s, s^{(k)}) = e^{\beta A_k} \int d^N\mathbf{r}\; e^{-\beta U(\mathbf{r})}\, e^{-\beta W(f_1(\mathbf{r}), s^{(k)})}\, \delta(f_1(\mathbf{r}) - s),
\tag{8.8.2}
\]

where $A_k$ is (apart from additive constants) the free energy associated with the biased potential

\[
e^{-\beta A_k} = \int d^N\mathbf{r}\; e^{-\beta U(\mathbf{r})}\, e^{-\beta W(f_1(\mathbf{r}), s^{(k)})}
= e^{-\beta A_0} \left\langle e^{-\beta W(f_1(\mathbf{r}), s^{(k)})} \right\rangle.
\tag{8.8.3}
\]

In eqn. (8.8.3), the average is taken with respect to the unbiased potential $U(\mathbf{r})$, and

\[
e^{-\beta A_0} = \int d^N\mathbf{r}\; e^{-\beta U(\mathbf{r})}
\tag{8.8.4}
\]

is just the unbiased configurational partition function. The factor $\exp(\beta A_k)$ is, therefore, the correct normalization constant for eqn. (8.8.2). Next, we define the unbiased probability distribution function $P_k(s)$ corresponding to the distribution $\tilde{P}(s, s^{(k)})$ as

\[
P_k(s) = e^{-\beta (A_k - A_0)}\, e^{\beta W(s, s^{(k)})}\, \tilde{P}(s, s^{(k)}).
\tag{8.8.5}
\]


We now “glue” these distributions together to give the full probability distribution function P(s) by expressing P(s) as a linear combination of the distributions $P_k(s)$:

\[
\begin{aligned}
P(s) &= \sum_{k=1}^{n} C_k(s)\, P_k(s) \\
&= \sum_{k=1}^{n} C_k(s) \left[ e^{-\beta (A_k - A_0)}\, e^{\beta W(s, s^{(k)})}\, \tilde{P}(s, s^{(k)}) \right],
\end{aligned}
\tag{8.8.6}
\]

where $\{C_k(s)\}$ is a set of coefficients that must be optimized to give the best representation of the true distribution P(s). The coefficients must satisfy the constraint

\[
\sum_{k=1}^{n} C_k(s) = 1.
\tag{8.8.7}
\]

In order to determine the coefficients, we seek to minimize the statistical error in the distribution generated by the WHAM procedure. Let $\tilde{H}_k(s)$ be the (biased) histogram obtained from each molecular dynamics or Monte Carlo simulation. Then, the biased distribution is estimated by

\[
\tilde{P}(s, s^{(k)}) \approx \frac{1}{n_k \Delta s} \tilde{H}_k(s),
\tag{8.8.8}
\]

where $n_k$ is the number of configurations sampled in the kth simulation and $\Delta s$ is the bin width used to compute the histogram. The statistical error in the biased distribution for the kth umbrella window is then $\tilde{\sigma}_k^2 = \epsilon_k(s) \tilde{H}_k(s)/(n_k \Delta s)$, where $\epsilon_k(s)$ measures the deviation as a function of s between the numerically sampled distribution and the true distribution P(s) in the kth umbrella window. The error in $P_k(s)$ is then obtained by multiplying by the square of the unbiasing factor:

\[
\sigma_k^2 = e^{-2\beta (A_k - A_0)}\, e^{2\beta W(s, s^{(k)})}\, \tilde{\sigma}_k^2.
\tag{8.8.9}
\]

We aim to minimize the total error

\[
\sigma^2 = \sum_{k=1}^{n} C_k^2(s)\, \sigma_k^2
\tag{8.8.10}
\]

subject to eqn. (8.8.7). The constraint can be imposed by means of a Lagrange multiplier λ; the error function to be minimized is

\[
\Sigma^2 = \sum_{k=1}^{n} C_k^2(s)\, e^{-2\beta (A_k - A_0)}\, e^{2\beta W(s, s^{(k)})}\, \frac{\epsilon_k(s) \tilde{H}_k(s)}{n_k \Delta s}
- \lambda \left( \sum_{k=1}^{n} C_k(s) - 1 \right).
\tag{8.8.11}
\]

Thus, setting the derivative $\partial \Sigma^2 / \partial C_k(s) = 0$ and solving for $C_k(s)$ in terms of the Lagrange multiplier, we find

\[
C_k(s) = \frac{\lambda\, n_k \Delta s}{2 \epsilon_k(s) \tilde{H}_k(s)\, e^{-2\beta (A_k - A_0)}\, e^{2\beta W(s, s^{(k)})}}.
\tag{8.8.12}
\]


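The Lagrange multiplier is now determined by substituting eqn. (8.8.12) into eqn. (8.8.7). This yields

\[
\lambda \sum_{k=1}^{n} \frac{n_k \Delta s}{2 \epsilon_k(s) \tilde{H}_k(s)\, e^{-2\beta (A_k - A_0)}\, e^{2\beta W(s, s^{(k)})}} = 1
\tag{8.8.13}
\]

so that

\[
\lambda = \frac{1}{\sum_{k=1}^{n} n_k \Delta s / \left[ 2 \epsilon_k(s) \tilde{H}_k(s)\, e^{-2\beta (A_k - A_0)}\, e^{2\beta W(s, s^{(k)})} \right]}.
\tag{8.8.14}
\]

Substituting the Lagrange multiplier back into eqn. (8.8.12) gives the coefficients as

\[
C_k(s) = \frac{n_k / \left[ \epsilon_k(s) \tilde{H}_k(s)\, e^{-2\beta (A_k - A_0)}\, e^{2\beta W(s, s^{(k)})} \right]}{\sum_{j=1}^{n} n_j / \left[ \epsilon_j(s) \tilde{H}_j(s)\, e^{-2\beta (A_j - A_0)}\, e^{2\beta W(s, s^{(j)})} \right]}.
\tag{8.8.15}
\]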

At this point, we make two vital assumptions. First, we assume that the error function $\epsilon_k(s)$ is the same in all n umbrella windows, which is tantamount to assuming that the sampling is of equal quality for each simulation. Note that this does not necessarily mean that all simulations should be of equal length, as the relaxation of the system along directions in configuration space orthogonal to $q_1$ might be longer or shorter depending on the value of $q_1$. The second assumption is that the biased histogram in each umbrella window $\tilde{H}_k(s)$ is well estimated by simply applying the biasing factor directly to the target distribution P(s), i.e.,

\[
\tilde{H}_k(s) \propto e^{\beta (A_k - A_0)}\, e^{-\beta W(s, s^{(k)})}\, P(s),
\tag{8.8.16}
\]

which, again, will be approximately true if there is adequate sampling. Once these assumptions are introduced into eqn. (8.8.15), the coefficients are finally given by

\[
C_k(s) = \frac{n_k\, e^{\beta A_k}\, e^{-\beta W(s, s^{(k)})}}{\sum_{j=1}^{n} n_j\, e^{\beta A_j}\, e^{-\beta W(s, s^{(j)})}}.
\tag{8.8.17}
\]

Therefore, substituting eqn. (8.8.17) into eqn. (8.8.6), in which the unbiasing factor of each $P_k(s)$ cancels against the corresponding factor in $C_k(s)$, the distribution becomes

\[
P(s) = \frac{\sum_{k=1}^{n} n_k\, \tilde{P}(s, s^{(k)})}{\sum_{k=1}^{n} n_k\, e^{\beta (A_k - A_0)}\, e^{-\beta W(s, s^{(k)})}}.
\tag{8.8.18}
\]

Although the WHAM procedure might seem straightforward, eqn. (8.8.18) only defines P(s) implicitly because the free-energy factors in eqn. (8.8.18) are directly related to P(s) by

\[
e^{-\beta (A_k - A_0)} = \int ds\; P(s)\, e^{-\beta W(s, s^{(k)})}.
\tag{8.8.19}
\]

Equations (8.8.18) and (8.8.19), therefore, constitute a set of self-consistent equations, and the solution for the coefficients and the free-energy factors must be iterated to self-consistency. The iteration is usually started with an initial guess for the free-energy factors $A_k$. Note that the WHAM procedure only yields $A_k$ up to an overall additive constant $A_0$. When applying the WHAM procedure, care must be taken that the assumption of equal quality sampling in each umbrella window is approximately satisfied. If this is not the case, the WHAM iteration can yield unphysical results which might, for example, appear as holes in the final distribution. Once P(s) is known, the free-energy profile is given by eqn. (8.6.7).

We note, finally, that Kästner and Thiel (2005) showed how to combine the umbrella sampling and thermodynamic integration techniques. Their approach makes use of the bias in eqn. (8.8.1) as a means of obtaining the free-energy derivative dA/dq in each umbrella window. The method assumes that if the bias is sufficiently strong to keep the chosen coordinate $q_1 = f_1(\mathbf{r})$ very close to $s^{(k)}$ in the kth window, then the probability distribution P(s) can be well represented by a Gaussian distribution:

\[
P(s) \approx \frac{1}{\sqrt{2\pi \sigma_k^2}}\, e^{-(s - \bar{s}_k)^2 / 2\sigma_k^2}.
\tag{8.8.20}
\]

Here, $\bar{s}_k$ is the average value of the target coordinate in the kth window and $\sigma_k^2$ is the variance, both of which are computed via a molecular dynamics or Monte Carlo simulation in the window. Because we have assumed a specific form for the distribution, we can calculate the unbiased approximation to the derivative $dA_k/dq$ in the kth window

\[
\frac{dA_k}{dq} = \frac{1}{\beta \sigma_k^2} (s - \bar{s}_k) - \kappa (s - s^{(k)}).
\tag{8.8.21}
\]

Let $n_w$ and n represent the number of umbrella windows and thermodynamic integration points, respectively. We now obtain the full derivative profile dA/dq at each integration point $q^{(i)}$ by “gluing” the windows together as in the original WHAM procedure:

\[
\left. \frac{dA}{dq} \right|_{q = q^{(i)}} = \sum_{k=1}^{n_w} C_k(s^{(i)}) \left. \frac{dA_k}{dq} \right|_{q^{(i)}}.
\tag{8.8.22}
\]

The coefficients $C_k(s)$ satisfy eqn. (8.8.7). Equations (8.8.20), (8.8.21), and (8.8.22) are the starting points for the development of a weighted histogram method that is considerably simpler than the one developed previously as it eliminates the global constant $A_0$. Once the values of dA/dq at the points $q^{(i)}$ are obtained (see Problem 8.11), a numerical integration is used to obtain the full free-energy profile A(s).
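The self-consistent WHAM iteration can be sketched as follows. The code alternates between the combined estimate of P(s), written with the biased histograms $\tilde{H}_k(s) = n_k \Delta s\, \tilde{P}(s, s^{(k)})$ in the numerator, and the update of the free-energy factors from eqn. (8.8.19), starting from the guess $A_k - A_0 = 0$. The function name `wham`, its arguments, and the test system are assumptions made for illustration: instead of real simulation data, the "histograms" are built from the exact biased distributions of a one-dimensional double-well model, so that the converged profile can be compared with a known answer.

```python
import math


def wham(hist, n_samples, bias, ds, beta, tol=1e-12, max_iter=5000):
    """Iterate the WHAM equations to self-consistency.

    hist[k][b]   : biased histogram of window k in bin b
    n_samples[k] : number of configurations sampled in window k
    bias[k][b]   : umbrella potential W(s_b, s^(k)) at the bin centers
    Returns (P, F), where P[b] is the normalized unbiased distribution and
    F[k] = A_k - A_0.
    """
    K, B = len(hist), len(hist[0])
    boltz = [[math.exp(-beta * bias[k][b]) for b in range(B)] for k in range(K)]
    total = [sum(hist[k][b] for k in range(K)) for b in range(B)]
    F = [0.0] * K
    P = [0.0] * B
    for _ in range(max_iter):
        w = [n_samples[k] * math.exp(beta * F[k]) for k in range(K)]
        # Combined distribution from the current free-energy factors.
        P = [total[b] / (ds * sum(w[k] * boltz[k][b] for k in range(K)))
             for b in range(B)]
        norm = ds * sum(P)
        P = [p / norm for p in P]  # fixes the arbitrary constant A_0
        # Updated free-energy factors from the current distribution.
        F_new = [-math.log(ds * sum(P[b] * boltz[k][b] for b in range(B))) / beta
                 for k in range(K)]
        change = max(abs(a - b) for a, b in zip(F_new, F))
        F = F_new
        if change < tol:
            break
    return P, F


# Model problem (an assumption): U(s) = 3 (s^2 - 1)^2 in units of kT, with 13
# harmonic windows and "histograms" taken from the exact biased distributions.
beta, ds, kappa = 1.0, 0.06, 10.0
centers = [-1.77 + ds * b for b in range(60)]
U = [3.0 * (s * s - 1.0) ** 2 for s in centers]
windows = [-1.5 + 0.25 * k for k in range(13)]
bias = [[0.5 * kappa * (s - s_k) ** 2 for s in centers] for s_k in windows]
n_samples = [10000] * len(windows)
hist = []
for k in range(len(windows)):
    weights = [math.exp(-beta * (U[b] + bias[k][b])) for b in range(60)]
    z_k = sum(weights)
    hist.append([n_samples[k] * wgt / z_k for wgt in weights])

P, F = wham(hist, n_samples, bias, ds, beta)
A = [-math.log(p) / beta for p in P]
A = [a - min(A) for a in A]  # free-energy profile with its minimum at zero
```

With real data the histograms are noisy integer counts, bins with no counts in any window must be excluded before taking the logarithm, and the number of iterations needed grows as the overlap between neighboring windows decreases.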

8.9 Wang-Landau sampling

In Section 7.6 of Chapter 7, we introduced the Wang-Landau approach for obtaining a flat density of energy states g(E). There we showed that in addition to a move in configuration space from $\mathbf{r}_0$ to $\mathbf{r}$ leading to an energy change from $E_0$ to E with a Metropolis acceptance probability

\[
{\rm acc}(\mathbf{r}_0 \rightarrow \mathbf{r}) = \min\left[ 1, \frac{g(E_0)}{g(E)} \right],
\tag{8.9.1}
\]

the density of states g(E) is scaled at each energy E visited by a factor f: $g(E) \rightarrow f g(E)$. Here $\mathbf{r}_0$ is a complete set of initial Cartesian coordinates, and $\mathbf{r}$ is the complete set of trial Cartesian coordinates. Initially, we take g(E) to be 1 for all possible energies.


The Wang-Landau sampling scheme has been extended to collective variables by Calvo (2002). The idea is to use the Wang-Landau scaling f (see Section 7.6) to generate a function that approaches the probability P(s) in eqn. (8.7.1) over many Monte Carlo passes. Let g(s) be a function that we initially take to be 1 over the entire range of s, i.e., over the entire range of the collective variable $q_1 = f_1(\mathbf{r}_1, \ldots, \mathbf{r}_N) \equiv f_1(\mathbf{r})$. Let $h(s) = \ln g(s)$ so that h(s) is initially zero everywhere. A Monte Carlo simulation is performed with the Metropolis acceptance rule:

\[
\begin{aligned}
{\rm acc}(\mathbf{r}_0 \rightarrow \mathbf{r})
&= \min\left[ 1, \frac{\exp(-\beta U(\mathbf{r}))}{\exp(-\beta U(\mathbf{r}_0))}\, \frac{g(s_0)}{g(s)} \right] \\
&= \min\left[ 1, \frac{\exp(-\beta U(\mathbf{r}))\, \exp(-h(s))}{\exp(-\beta U(\mathbf{r}_0))\, \exp(-h(s_0))} \right].
\end{aligned}
\tag{8.9.2}
\]

Here $s_0 = f_1(\mathbf{r}_0)$ and $s = q_1 = f_1(\mathbf{r})$. In addition to this acceptance rule, for each value s of the chosen coordinate $q_1 = f_1(\mathbf{r})$ visited, the function h(s) is updated according to $h(s) \rightarrow h(s) + \alpha$, where $\alpha = \ln f$. This is equivalent to scaling $g(s) \rightarrow f g(s)$. As the simulation proceeds, g(s) approaches the true probability P(s) up to a normalization constant, and the histogram H(s) will become flat. Typically, f is initially chosen large, e.g. $f = e^1$, and is gradually reduced to 1, which means that α is initially chosen to be near 1 and is reduced to 0. Note that the Metropolis acceptance rule in eqn. (8.9.2) is equivalent to the usual acceptance rule using a modified potential $\tilde{U}(\mathbf{r}_1, \ldots, \mathbf{r}_N) = U(\mathbf{r}_1, \ldots, \mathbf{r}_N) + kT h(f_1(\mathbf{r}_1, \ldots, \mathbf{r}_N))$, since $\exp(-\beta U) \exp(-h) = \exp[-\beta (U + kT h)]$.
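The scheme just described can be sketched for a toy problem in Python. The example below is a minimal illustration under stated assumptions, not the implementation of Calvo (2002): the system is a single coordinate x in a double-well potential, the collective variable is x itself, h(s) is stored on a uniform grid, trial moves that leave the grid are rejected, and α is halved after a fixed number of steps (equivalent to replacing f by its square root). The function name and all parameter values are hypothetical.

```python
import math
import random


def wang_landau_cv(potential, cv, s_min, s_max, n_bins, beta,
                   n_stages=12, steps_per_stage=50000, max_step=0.3, seed=1):
    """Wang-Landau sampling along a collective variable; returns h(s) = ln g(s)."""
    rng = random.Random(seed)
    width = (s_max - s_min) / n_bins

    def bin_index(s):
        return min(n_bins - 1, int((s - s_min) / width))

    h = [0.0] * n_bins
    x = 0.5 * (s_min + s_max)
    b = bin_index(cv(x))
    u = potential(x)
    alpha = 1.0  # alpha = ln f, so f = e initially
    for _ in range(n_stages):
        for _ in range(steps_per_stage):
            x_new = x + rng.uniform(-max_step, max_step)
            s_new = cv(x_new)
            if s_min <= s_new < s_max:
                b_new = bin_index(s_new)
                u_new = potential(x_new)
                # Logarithm of the acceptance ratio in eqn. (8.9.2).
                log_acc = -beta * (u_new - u) - (h[b_new] - h[b])
                if log_acc >= 0.0 or rng.random() < math.exp(log_acc):
                    x, u, b = x_new, u_new, b_new
            h[b] += alpha  # update h at the current value of the variable
        alpha *= 0.5
    return h


def double_well(x):
    """Model potential in units of kT, with a barrier of height 3 at x = 0."""
    return 3.0 * (x * x - 1.0) ** 2


kT = 1.0
h = wang_landau_cv(double_well, lambda x: x, -1.8, 1.8, 36, beta=1.0 / kT)
h_max = max(h)
free_energy = [-kT * (value - h_max) for value in h]  # A(s) = -kT h(s) + const
```

Because h(s) converges to ln P(s) up to a constant, the free-energy profile follows directly from the converged h(s); in this toy case the estimated barrier between the wells and the maximum should be close to 3kT, with a residual statistical error set by the final value of α.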

8.10 Driven adiabatic free-energy dynamics

In this section, we show how the adiabatic free-energy dynamics approach introduced in Section 8.3 can be extended to treat a set of collective variables. We will provide a detailed analysis of the approach, demonstrating how it leads to the free-energy profile directly from the adiabatic probability distribution function. Let us begin by considering what type of calculation eqn. (8.6.6), as written, would require. The presence of the Dirac δ-functions in the expression dictates that we impose a restriction on any simulation we use to compute the integral such that each function $f_\alpha(\mathbf{r}_1, \ldots, \mathbf{r}_N)$ has a constant value $s_\alpha$. This amounts to n restrictions, and a simulation that includes these n restrictions will then generate just one point of the marginal probability distribution function $P(s_1, \ldots, s_n)$. Now imagine trying to generate the entire marginal distribution numerically this way by laying down a grid of M points along each of the n directions in the space of s variables. In this case, the total number of points needed to characterize the marginal is $M^n$. When n = 1, this is essentially what is done in the blue moon and umbrella sampling methods, where it is computationally tractable. Recall that in the former, a constraint is used to enforce the restriction, while in the latter, a simple harmonic restraint is applied as a bias potential. Even if n = 2, it might still be possible to compute $P(s_1, s_2)$ on a grid of $M^2$ points in a tour de force calculation, but for n > 2, the number of points needed grows exponentially with n, and the problem rapidly becomes intractable. In this section, we will apply the idea of adiabatic free-energy dynamics to devise an approach to sample $P(s_1, \ldots, s_n)$, thus generating the statistically relevant regions of the marginal and the corresponding free-energy hypersurface $A(s_1, \ldots, s_n)$.


The notion of sampling the marginal distribution $P(s_1, \ldots, s_n)$ is no different in principle from sampling the high-dimensional canonical Boltzmann configuration distribution $\exp(-\beta U(\mathbf{r}_1, \ldots, \mathbf{r}_N))/Z(N, V, T)$. In each case, we are primarily interested in sampling the statistically important regions. Because we know its explicit functional form a priori, we can sample the Boltzmann distribution directly from molecular dynamics or Monte Carlo simulations. However, we do not have such an explicit functional form for the marginal distribution $P(s_1, \ldots, s_n)$. Rather, it emerges numerically from a simulation by applying eqn. (8.6.6). The obvious question thus arises of how we can sample a distribution function whose form we do not know a priori. In fact, this can be done “on the fly” in a simulation if we are able to generate sufficient information about $P(s_1, \ldots, s_n)$ in a local neighborhood of each n-dimensional point $(s_1, \ldots, s_n) \equiv s$ as the simulation progresses. As a matter of nomenclature, we refer to the variables $s_1, \ldots, s_n$ as coarse-grained variables.

The first step in realizing this kind of “on the fly” sampling is to replace each Dirac δ-function in eqn. (8.6.6) with a Gaussian using the identity in eqn. (A.5) in Appendix A:

\[
\delta(x - a) = \lim_{\sigma \rightarrow 0} \frac{1}{\sqrt{2\pi \sigma^2}}\, e^{-(x - a)^2 / 2\sigma^2}.
\tag{8.10.1}
\]

Applying this identity to the δ-functions in eqn. (8.6.6), we obtain

\[
\begin{aligned}
\prod_{\alpha=1}^{n} \delta\big(f_\alpha(\mathbf{r}) - s_\alpha\big)
&= \lim_{\sigma \rightarrow 0} \left( \frac{1}{2\pi \sigma^2} \right)^{n/2} \prod_{\alpha=1}^{n} e^{-(f_\alpha(\mathbf{r}) - s_\alpha)^2 / 2\sigma^2} \\
&= \lim_{\kappa \rightarrow \infty} \left( \frac{\beta \kappa}{2\pi} \right)^{n/2} \prod_{\alpha=1}^{n} e^{-\beta \kappa (f_\alpha(\mathbf{r}) - s_\alpha)^2 / 2},
\end{aligned}
\tag{8.10.2}
\]

where $\mathbf{r} \equiv (\mathbf{r}_1, \ldots, \mathbf{r}_N)$ denotes the full set of N coordinate vectors. In the last line of eqn. (8.10.2), we have replaced the width parameter σ with an inverse width measure κ with the definition $\sigma^2 = 1/(\beta \kappa)$. Here, β is an arbitrary constant; however, we will identify it with the physical temperature in the usual way as $\beta = 1/kT$ going forward. With this identity, eqn. (8.6.6) becomes

\[
P(s_1, \ldots, s_n) = \lim_{\kappa \rightarrow \infty} \frac{C_N (\beta \kappa / 2\pi)^{n/2}}{Q(N,V,T)} \int d^N\mathbf{p}\, d^N\mathbf{r}\; e^{-\beta H(\mathbf{r},\mathbf{p})} \prod_{\alpha=1}^{n} e^{-\beta \kappa (f_\alpha(\mathbf{r}) - s_\alpha)^2 / 2}.
\tag{8.10.3}
\]

Equation (8.10.3) is exact and, therefore, no different from eqn. (8.6.6). As written, it is not practical because of the limit sending $\kappa \rightarrow \infty$. Obviously, in a computer simulation, we must choose a finite value of κ sufficiently large that the Gaussian becomes a reasonable approximation of the δ-function. In fact, discretization of δ-functions is exactly what is done when generating a histogram, and therefore, we have a criterion for choosing κ at a given temperature, such that the width parameter $\sigma^2 = kT/\kappa$ of the Gaussian matches that of the bin width one might use to generate the histogram. Following this line of reasoning, if we allow κ to be finite, then there is no reason to assume one value of κ for all n components of s. Rather, for greater generality, we can choose a set of n values $\kappa_1, \ldots, \kappa_n \equiv \kappa$ selected to match the histogram that would be needed to resolve each of the directions of s, which is helpful if each of these components is a different type of collective variable with a different associated scale. Within the approximation of finite κ, the marginal distribution becomes

\[
P_\kappa(s_1, \ldots, s_n) = \frac{\mathcal{N} C_N}{Q(N,V,T)} \int d^N\mathbf{p}\, d^N\mathbf{r}\; e^{-\beta \left( H(\mathbf{r},\mathbf{p}) + \frac{1}{2} \sum_{\alpha=1}^{n} \kappa_\alpha (f_\alpha(\mathbf{r}) - s_\alpha)^2 \right)},
\tag{8.10.4}
\]

where $\mathcal{N} = \prod_{\alpha=1}^{n} (\beta \kappa_\alpha / 2\pi)^{1/2}$. Clearly, in the limit that all components of κ go to infinity, $P_\kappa(s_1, \ldots, s_n) \rightarrow P(s_1, \ldots, s_n)$. However, within the approximation of eqn. (8.10.4), we see that when the δ-functions are converted to Gaussians of the form given in eqn. (8.10.2), the term $\sum_{\alpha=1}^{n} \kappa_\alpha (f_\alpha(\mathbf{r}) - s_\alpha)^2 / 2$ adds to the physical Hamiltonian $H(\mathbf{r}, \mathbf{p})$ as a kind of harmonic coupling between the collective variables $f_\alpha(\mathbf{r})$ and the points $s_\alpha$ on which the marginal distribution depends. This harmonic coupling can be considered as an additional term in the potential.

We now see that the unknown approximate marginal distribution $P_\kappa(s_1, \ldots, s_n) \equiv P_\kappa(s)$ can be sampled using molecular dynamics if we consider s as a set of dynamical variables. Of course, in order to do this, we must extend the phase space to include a set of momenta conjugate to s, which we will denote as $\pi \equiv (\pi_1, \ldots, \pi_n)$. This construct leads to an extended phase-space Hamiltonian of the form

\[
\begin{aligned}
H_\kappa(\mathbf{r}, s, \mathbf{p}, \pi)
&= H(\mathbf{r}, \mathbf{p}) + \sum_{\alpha=1}^{n} \frac{\pi_\alpha^2}{2\mu_\alpha} + \frac{1}{2} \sum_{\alpha=1}^{n} \kappa_\alpha (f_\alpha(\mathbf{r}) - s_\alpha)^2 \\
&= \sum_{i=1}^{N} \frac{\mathbf{p}_i^2}{2m_i} + \sum_{\alpha=1}^{n} \frac{\pi_\alpha^2}{2\mu_\alpha} + U(\mathbf{r}) + \frac{1}{2} \sum_{\alpha=1}^{n} \kappa_\alpha (f_\alpha(\mathbf{r}) - s_\alpha)^2,
\end{aligned}
\tag{8.10.5}
\]

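where µ1, ..., µn is a set of fictitious “mass-like” parameters that define the time scale on which the extended phase-space variables (s, π) evolve. The equations of motion obtained from this Hamiltonian are

$$\begin{aligned}
\dot{r}_i &= \frac{\partial H_\kappa}{\partial p_i} = \frac{p_i}{m_i} \\
\dot{p}_i &= -\frac{\partial H_\kappa}{\partial r_i} = F_i - \sum_{\alpha=1}^{n}\kappa_\alpha\left(f_\alpha(r) - s_\alpha\right)\frac{\partial f_\alpha}{\partial r_i} \\
\dot{s}_\alpha &= \frac{\partial H_\kappa}{\partial \pi_\alpha} = \frac{\pi_\alpha}{\mu_\alpha} \\
\dot{\pi}_\alpha &= -\frac{\partial H_\kappa}{\partial s_\alpha} = \kappa_\alpha\left(f_\alpha(r) - s_\alpha\right).
\end{aligned} \tag{8.10.6}$$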
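As a concrete check of eqns. (8.10.5) and (8.10.6), the following minimal sketch evaluates the extended Hamiltonian and the right-hand sides of the equations of motion for one particle and one collective variable. The double-well potential, the choice f(r) = x, the parameter values, and all function names are illustrative assumptions, not taken from the text.

```python
def cv(x):
    """Hypothetical collective variable f(r); here simply the coordinate x."""
    return x


def dcv_dx(x):
    """Derivative of the hypothetical collective variable with respect to x."""
    return 1.0


def potential(x):
    """Hypothetical physical potential U(r): a quartic double well."""
    return (x * x - 1.0) ** 2


def force_phys(x):
    """Physical force F = -dU/dx for the double well above."""
    return -4.0 * x * (x * x - 1.0)


def h_kappa(x, p, s, pi, m=1.0, mu=100.0, kappa=50.0):
    """Extended Hamiltonian of eqn. (8.10.5) for one particle and one CV."""
    return (p * p / (2.0 * m) + pi * pi / (2.0 * mu) + potential(x)
            + 0.5 * kappa * (cv(x) - s) ** 2)


def rhs(x, p, s, pi, m=1.0, mu=100.0, kappa=50.0):
    """Right-hand sides of eqns. (8.10.6): (x_dot, p_dot, s_dot, pi_dot)."""
    coupling = kappa * (cv(x) - s)
    return (p / m, force_phys(x) - coupling * dcv_dx(x), pi / mu, coupling)
```

Comparing `rhs` against central finite differences of `h_kappa` is a quick way to catch sign errors in the coupling terms before running any dynamics.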

Unfortunately, as we saw in Section 8.3, if we simply integrate eqns. (8.10.6) numerically as written, we will not properly sample the marginal in eqn. (8.10.4). On-the-fly sampling of the marginal distribution requires that we effect an adiabatic separation between


the physical phase space (r, p) and the extended phase space (s, π) such that the latter moves on a time scale much longer than the former. This adiabatic separation is realized through the choice of the fictitious parameters µ1, ..., µn: these must be chosen sufficiently large to ensure the time-scale separation of the motion of the two sets of variables. At the end of this section, we provide a detailed proof showing how adiabatic separation allows the marginal distribution to be correctly generated. However, as was discussed in Section 8.3, we can use a simple argument to illustrate this point. Let us first write the marginal distribution as an average of the product of Gaussians over the physical phase space (r, p):

$$P_\kappa(s_1,\ldots,s_n) = \mathcal{N}\left\langle e^{-\frac{\beta}{2}\sum_{\alpha=1}^{n}\kappa_\alpha\left(f_\alpha(r) - s_\alpha\right)^2}\right\rangle. \tag{8.10.7}$$

In the limit of perfect adiabatic separation, eqns. (8.10.6), when coupled to a canonical sampling approach such as a Nosé-Hoover chain, isokinetic Nosé-Hoover chain, or generalized Gaussian moment (see Problem 4.1) thermostat, can be seen to generate the average in eqn. (8.10.7). If we define a κ-dependent free-energy surface

$$A_\kappa(s_1,\ldots,s_n) = -kT\ln P_\kappa(s_1,\ldots,s_n), \tag{8.10.8}$$

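then we can compute free-energy derivatives and express them as averages according to

$$\begin{aligned}
\frac{\partial A_\kappa}{\partial s_\gamma} &= -\frac{kT}{P_\kappa(s_1,\ldots,s_n)}\frac{\partial P_\kappa}{\partial s_\gamma} \\
&= -\frac{\left\langle \kappa_\gamma\left(f_\gamma(r) - s_\gamma\right) e^{-\frac{\beta}{2}\sum_{\alpha=1}^{n}\kappa_\alpha\left(f_\alpha(r) - s_\alpha\right)^2}\right\rangle}{\left\langle e^{-\frac{\beta}{2}\sum_{\alpha=1}^{n}\kappa_\alpha\left(f_\alpha(r) - s_\alpha\right)^2}\right\rangle}.
\end{aligned} \tag{8.10.9}$$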

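From eqn. (8.10.9), we see that under perfect adiabatic separation, the limiting form of the equations of motion for (s, π) from eqns. (8.10.6) becomes

$$\begin{aligned}
\dot{s}_\alpha &= \frac{\pi_\alpha}{\mu_\alpha} \\
\dot{\pi}_\alpha &= -\frac{\partial A_\kappa}{\partial s_\alpha} + \mathrm{Therm}(T),
\end{aligned} \tag{8.10.10}$$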

which means that these variables will explore the free-energy landscape via a dynamics driven by the free-energy derivative. Here, the notation Therm(T ) is a shorthand indicating a coupling to one of the aforementioned thermostats, conveniently allowing us to express this without having to write out explicit equations of motion for all of the thermostat variables! This result will be proved more rigorously at the end of this section. While adiabatic separation is a necessary condition to ensure proper sampling of the marginal distribution, it will lead to inefficient sampling, particularly if the potential energy surface is a rough surface, containing numerous barriers that are high compared to kT . Therefore, in order to ensure that these barriers can be surmounted


and that sampling is efficient, we can choose to sample (s, π) variables at a temperature higher than T. Since these variables constitute the extended phase space, then so long as adiabatic separation is maintained, the only effect this higher temperature will have is to change the definition of the marginal distribution in eqn. (8.10.8). If we define Ts ≫ T as the (higher) temperature assigned to the extended phase-space variables, then the effect of introducing this temperature boost is to change the marginal distribution expressed as a Boltzmann factor of the free-energy surface $A_\kappa(s_1,\ldots,s_n)$ according to

$$\begin{aligned}
P_{\rm adb}(s_1,\ldots,s_n,T_s) &\propto e^{-\beta_s A_\kappa(s_1,\ldots,s_n)} \\
&\propto \left[e^{-\beta A_\kappa(s_1,\ldots,s_n)}\right]^{\beta_s/\beta} \\
&\propto \left[P_\kappa(s_1,\ldots,s_n)\right]^{T/T_s},
\end{aligned} \tag{8.10.11}$$

where βs = 1/kTs and $P_{\rm adb}(s_1,\ldots,s_n,T_s)$ denotes the marginal distribution generated under adiabatic separation and at temperature Ts. Equation (8.10.11) was first derived by Marx et al. (1999b) and was later adapted for direct collective-variable-based adiabatic free-energy dynamics (Rosso et al., 2002). Given eqn. (8.10.11), we see that the free-energy surface at the physical temperature T can be obtained from the simple relation

$$A_\kappa(s_1,\ldots,s_n) = -kT_s\ln P_{\rm adb}(s_1,\ldots,s_n,T_s). \tag{8.10.12}$$
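The relation in eqn. (8.10.12) can be sketched in a few lines of code. The function below is a minimal illustration that converts raw histogram counts of the extended variable into a free-energy profile. The function name, the handling of empty bins, and the choice to shift the minimum to zero are assumptions made for the example. Note that the prefactor is kTs, the temperature of the extended variables, rather than kT.

```python
import math


def free_energy_from_histogram(counts, kTs):
    """Apply A = -kTs ln P_adb to histogram counts of s.

    Empty bins carry no information and are returned as None. The profile
    is shifted so that its minimum is zero, since only free-energy
    differences are meaningful here.
    """
    total = float(sum(counts))
    values = [(-kTs * math.log(c / total) if c > 0 else None) for c in counts]
    a_min = min(v for v in values if v is not None)
    return [None if v is None else v - a_min for v in values]
```

A constant bin width only adds a constant to every bin, so it drops out of the shifted profile.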

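Importantly, as κ → ∞, the exact free energy A(s1, ..., sn) is recovered. The introduction of a high temperature Ts for the (s, π) phase space is tantamount to replacing eqns. (8.10.10) with

$$\begin{aligned}
\dot{s}_\alpha &= \frac{\pi_\alpha}{\mu_\alpha} \\
\dot{\pi}_\alpha &= -\frac{\partial A_\kappa}{\partial s_\alpha} + \mathrm{Therm}(T_s).
\end{aligned} \tag{8.10.13}$$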

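This, then, brings us to the practical scheme employed to generate the adiabatic marginal distribution, which amounts to rewriting eqns. (8.10.6) as

$$\begin{aligned}
\dot{r}_i &= \frac{p_i}{m_i} \\
\dot{p}_i &= F_i - \sum_{\alpha=1}^{n}\kappa_\alpha\left(f_\alpha(r) - s_\alpha\right)\frac{\partial f_\alpha}{\partial r_i} + \mathrm{Therm}(T) \\
\dot{s}_\alpha &= \frac{\pi_\alpha}{\mu_\alpha} \\
\dot{\pi}_\alpha &= \kappa_\alpha\left(f_\alpha(r) - s_\alpha\right) + \mathrm{Therm}(T_s).
\end{aligned} \tag{8.10.14}$$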
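To make the structure of eqns. (8.10.14) concrete, here is a minimal sketch for a two-dimensional model with one collective variable, f(r) = x. Several choices are assumptions made only for illustration: the model potential and its parameters, the use of simple Langevin thermostats in place of the Nosé-Hoover chain or generalized Gaussian moment thermostats named in the text, the integrator splitting, and all numerical values. No claim is made that these parameters achieve good adiabatic separation.

```python
import math
import random

# Hypothetical reduced-unit parameters for a double well in x coupled to y.
D0, A, K, LAM = 5.0, 1.0, 1.0, 2.0


def forces(x, y, s, kappa):
    """Forces on x, y, and s, with the collective variable taken as f(r) = x."""
    coupling = kappa * (x - s)
    fx = -4.0 * D0 * x * (x * x - A * A) - LAM * y - coupling
    fy = -K * y - LAM * x
    return fx, fy, coupling


def thermostat(p, mass, kT, gamma, dt, rng):
    """Exact Ornstein-Uhlenbeck update of one momentum (Langevin thermostat)."""
    c = math.exp(-gamma * dt)
    return c * p + math.sqrt((1.0 - c * c) * mass * kT) * rng.gauss(0.0, 1.0)


def run_dafed(n_steps, dt=0.005, kT=1.0, kTs=10.0, kappa=300.0, mu=100.0,
              gamma=1.0, seed=1):
    """Integrate the thermostatted equations; returns a list of (x, s) pairs.

    Physical masses are set to 1. The physical variables are thermostatted
    at kT and the extended variable at kTs.
    """
    rng = random.Random(seed)
    x, y, s = -A, 0.0, -A
    px = py = ps = 0.0
    fx, fy, fs = forces(x, y, s, kappa)
    traj = []
    for _ in range(n_steps):
        # Half kick with the current forces.
        px += 0.5 * dt * fx
        py += 0.5 * dt * fy
        ps += 0.5 * dt * fs
        # Half drift.
        x += 0.5 * dt * px
        y += 0.5 * dt * py
        s += 0.5 * dt * ps / mu
        # Thermostats at the two different temperatures.
        px = thermostat(px, 1.0, kT, gamma, dt, rng)
        py = thermostat(py, 1.0, kT, gamma, dt, rng)
        ps = thermostat(ps, mu, kTs, gamma, dt, rng)
        # Half drift, force update, half kick.
        x += 0.5 * dt * px
        y += 0.5 * dt * py
        s += 0.5 * dt * ps / mu
        fx, fy, fs = forces(x, y, s, kappa)
        px += 0.5 * dt * fx
        py += 0.5 * dt * fy
        ps += 0.5 * dt * fs
        traj.append((x, s))
    return traj
```

A histogram of the s values returned by `run_dafed`, converted with the factor kTs as in eqn. (8.10.12), would give an estimate of the free-energy profile, provided the run is long enough and µ is large enough.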

The procedure for employing eqns. (8.10.14) to compute an n-dimensional free-energy landscape is to accumulate an n-dimensional histogram in the variables s1 , ..., sn over


the course of a simulation based on these equations of motion, and after normalization, the histogram is fed into eqn. (8.10.12) to obtain the free-energy surface. In practice, the force constants κ need to be large enough to reproduce the κα → ∞ limit within error bars. However, as these large force constants lead to high-frequency motion, use of multiple time-stepping methods in the integration of eqns. (8.10.14) allows larger time steps to be used than the resulting fast forces would otherwise permit. Resonance phenomena, discussed in Section 3.11.1, can be handled using isokinetic thermostatting methods; resonance-avoiding methods allow calculations involving eqns. (8.10.14) to be performed with very large time steps (Chen and Tuckerman, 2018). Equations (8.10.14) comprise the method known as driven adiabatic free-energy dynamics (d-AFED) (Abrams and Tuckerman, 2008) or temperature-accelerated molecular dynamics (TAMD) (Maragliano and Vanden-Eijnden, 2006), and they are an extended phase-space version of a scheme originally proposed by Rosso et al. (2002). An important feature of these equations is that they can be used to explore and generate free-energy surfaces of high dimension, by which we usually mean a value of n ranging anywhere from 4 to several tens of collective variables (Abrams and Vanden-Eijnden, 2010). Not surprisingly, as the dimension n grows, the associated free-energy surfaces become increasingly difficult to visualize, represent mathematically, and analyze for key features such as minima and saddle points, which are also called landmark points (Chen et al., 2015). It is important to point out that d-AFED/TAMD generates full sweeps over the free-energy surface, generating a scattering of points in each sweep. Over long simulation times, the points fill in to provide a global surface.
However, as will be discussed in Chapter 17, techniques of machine learning can serve as regression models to fill in details of a free-energy surface even if a d-AFED/TAMD run is short and the scattering of points is sparse (Maragliano and Vanden-Eijnden, 2008; Chen et al., 2012; Schneider et al., 2017). As we have already noted, there is no general protocol for selecting optimal collective variables a priori, which also means that the optimal number, n, of collective variables cannot be known at the outset. Nevertheless, an important advantage of d-AFED/TAMD over other sampling methods is that it can generate a free-energy surface as a function of a large number of possibly redundant collective variables. This surface can then be analyzed a posteriori to determine a lower dimensional set of variables that captures slow motions of a system and, consequently, the true rare events. This lower dimensional set of collective variables should correspond to a putative set of optimal collective variables. This type of analysis will be described in Chapter 17. Additional approaches discussed in Chapter 17 will provide a route to constructing the free-energy surface using the sampled gradients in eqn. (8.10.9), an approach that often converges to a final free-energy surface more rapidly than histograms in eqn. (8.10.12) (Chen et al., 2012). Finally, Chapter 17 will explore machine learning methods for constructing optimal collective variables that can be targeted in enhanced sampling methods like d-AFED/TAMD or to construct meaningful reaction coordinates for specific processes. We conclude this section with a physically motivated proof that d-AFED/TAMD generates the correct free-energy surface under the assumption of ergodic sampling of the extended phase space. (This analysis is rather technical and not critical for


implementing d-AFED/TAMD; readers may proceed to the discussion and examples at the end of this section without loss of continuity.) The analysis is based on the Liouville operator formalism we have been using to derive numerical integrators for molecular dynamics simulations. In order to simplify the notation, let r ≡ r1, ..., rN denote the full set of physical coordinates, p ≡ p1, ..., pN the full set of physical momenta, and F ≡ F1, ..., FN the full set of physical forces. We will continue to use (s, π) to denote the 2n-dimensional extended phase space, and we will use φ ≡ φ1, ..., φn to denote the forces on s, i.e., $\phi_\alpha = \kappa_\alpha(f_\alpha(r) - s_\alpha)$. Finally, let $\mathbf{M}$ be a 3N × 3N diagonal matrix of physical particle masses and µ be an n × n diagonal matrix of extended phase-space mass-like parameters. The full Liouville operator for d-AFED/TAMD can be written as

$$iL = \mathbf{M}^{-1}p\cdot\frac{\partial}{\partial r} + \mu^{-1}\pi\cdot\frac{\partial}{\partial s} + F\cdot\frac{\partial}{\partial p} + \phi\cdot\frac{\partial}{\partial \pi} + iL^{(r)}_{\rm therm}(T). \tag{8.10.15}$$

Here, $iL^{(r)}_{\rm therm}$ is a thermostat operator for the physical phase space. For ease of notation, we will not couple a thermostat to the extended phase space (s, π) in the present analysis. The thermostat on (r, p) could be one of the deterministic methods we discussed in Chapter 4 or one of the stochastic methods we will introduce in Chapter 15. The particular form of this thermostat is not important so long as the corresponding propagator generates canonical sampling in the physical subspace. For this analysis, we will introduce a factorization of the classical propagator exp(iL∆t), which will lead to a formal numerical scheme equivalent to eqns. (8.10.10) or (8.10.13). The factorization we seek starts with the following decomposition of the Liouville operator in eqn. (8.10.15):

$$iL = iL^{(r)} + iL^{(s)}_{\rm ref}, \tag{8.10.16}$$

where

$$iL^{(s)}_{\rm ref} = \mu^{-1}\pi\cdot\frac{\partial}{\partial s} \tag{8.10.17}$$

and

$$iL^{(r)} = iL^{(r)}_{\rm ref} + F\cdot\frac{\partial}{\partial p} + \phi\cdot\frac{\partial}{\partial \pi} \tag{8.10.18}$$

with

$$iL^{(r)}_{\rm ref} = \mathbf{M}^{-1}p\cdot\frac{\partial}{\partial r} + iL^{(r)}_{\rm therm}(T). \tag{8.10.19}$$

Equation (8.10.16) is now used to generate a factorization for exp(iL∆t) in which the time step ∆t is chosen to integrate the slow motion accurately. This factorization is

$$e^{iL\Delta t} = e^{iL^{(r)}\Delta t/2}\, e^{iL^{(s)}_{\rm ref}\Delta t}\, e^{iL^{(r)}\Delta t/2}. \tag{8.10.20}$$

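Note that the propagator $\exp(iL^{(r)}\Delta t/2)$ is left intact and, therefore, generates formally exact motion for the (fast) physical variables. Without loss of generality, we can also represent the operator exactly using the Trotter theorem:

$$\begin{aligned}
e^{iL^{(r)}\Delta t/2} = \lim_{M\to\infty}\Bigg[&\exp\left(\frac{\Delta t}{4M}\,\phi\cdot\frac{\partial}{\partial \pi}\right)\exp\left(\frac{\Delta t}{4M}\,F\cdot\frac{\partial}{\partial p}\right)\exp\left(iL^{(r)}_{\rm ref}\frac{\Delta t}{2M}\right) \\
&\times \exp\left(\frac{\Delta t}{4M}\,F\cdot\frac{\partial}{\partial p}\right)\exp\left(\frac{\Delta t}{4M}\,\phi\cdot\frac{\partial}{\partial \pi}\right)\Bigg]^{M}.
\end{aligned} \tag{8.10.21}$$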

When eqn. (8.10.21) is substituted into eqn. (8.10.20), the result is a formally exact analog of a multiple time-scale factorization (see Section 3.11), which, nevertheless, can be applied analytically. When the operator $\exp(iL^{(r)}\Delta t/2)$ is applied in the exactly factorized form given in eqn. (8.10.21), the s variables do not change through the first application of this operator; however, the momenta π do. In each of the M applications of the terms in the factorization, both r and p change, producing their exact evolution up to ∆t/2, and since φ depends on both s and r, the φ forces change as r changes. Let this evolution of r be denoted $r^{(1)}_{\rm adb}(t; r(0), p(0), \gamma(0)) \equiv r^{(1)}_{\rm adb}(t)$, where γ denotes a set of thermostat variables acting on the physical space (r, p) and the “adb” subscript indicates evolution under the conditions of adiabatic separation between the physical (r, p) and extended (s, π) phase spaces. The evolution of π under $\exp(iL^{(r)}\Delta t/2)$ in its factorized form can be written as

$$\begin{aligned}
\pi(\Delta t/2) = \pi(0) + \lim_{M\to\infty}\frac{\Delta t}{2M}\Bigg[&\frac{1}{2}\phi\left(r^{(1)}_{\rm adb}(0), s(0)\right) + \frac{1}{2}\phi\left(r^{(1)}_{\rm adb}(\Delta t/2), s(0)\right) \\
&+ \sum_{k=1}^{M-1}\phi\left(r^{(1)}_{\rm adb}(k\Delta t/2M), s(0)\right)\Bigg].
\end{aligned} \tag{8.10.22}$$

The term in brackets in the limit that M → ∞ is the trapezoidal rule representation of a time integral, which means that the evolution in eqn. (8.10.22) can be written exactly as

$$\pi(\Delta t/2) = \pi(0) + \int_0^{\Delta t/2} dt\; \phi\left(r^{(1)}_{\rm adb}(t), s(0)\right). \tag{8.10.23}$$

Application of the remaining operators completes the evolution of π as

$$\pi(\Delta t) = \pi(\Delta t/2) + \int_{\Delta t/2}^{\Delta t} dt\; \phi\left(r^{(2)}_{\rm adb}(t), s(\Delta t)\right), \tag{8.10.24}$$

where $r^{(2)}_{\rm adb}(t) = r_{\rm adb}(t; r(\Delta t/2), p(\Delta t/2), \gamma(\Delta t/2))$, indicating the second half of the evolution of the physical coordinates. Of course, the factorized propagator prescribes the evolution of all other coordinates and momenta in the system, but we do not need to concern ourselves with these for the purposes of this analysis. The next step of the proof is to invoke the conditions of adiabaticity and the assumption of ergodic evolution of the physical variables. Under these conditions, we assume that the physical variables (r, p) sample enough of the local phase space at s(0) (or at s(∆t)) that we can replace the time integrals in eqns. (8.10.23) and (8.10.24) with integrals over the physical coordinates, i.e.,

$$\begin{aligned}
\frac{2}{\Delta t}\int_{\tau}^{\tau+\Delta t/2} dt\; \phi\left(r_{\rm adb}(t; r(\tau), p(\tau), \gamma(\tau)), s\right) &= \frac{\left\langle \phi(r,s)\, e^{-\frac{\beta}{2}\sum_{\alpha=1}^{n}\kappa_\alpha\left(f_\alpha(r) - s_\alpha\right)^2}\right\rangle_r}{\left\langle e^{-\frac{\beta}{2}\sum_{\alpha=1}^{n}\kappa_\alpha\left(f_\alpha(r) - s_\alpha\right)^2}\right\rangle_r} \\
&= -\frac{\partial A_\kappa}{\partial s},
\end{aligned} \tag{8.10.25}$$

where $\langle\cdots\rangle_r$ indicates that the integrals are performed over r, keeping s fixed. Thus, if we denote the exact free-energy gradient as $\bar{\phi}(s)$, then eqn. (8.10.25) shows that the numerical evolution of π will be a velocity Verlet evolution step on the exact free-energy surface:

$$\pi(\Delta t) = \pi(0) + \frac{\Delta t}{2}\left[\bar{\phi}(s(0)) + \bar{\phi}(s(\Delta t))\right]. \tag{8.10.26}$$

Note that the use of a high temperature Ts ≫ T on the (s, π) extended space is not necessary for the analysis we have performed; only the adiabatic decoupling is required to generate motion on the free-energy surface. The effect of adding a second thermostat on the extended phase space would be that the free-energy surface is sampled under canonical evolution at high temperature, but this would not change the underlying surface on which (s, π) evolves. Thus, the canonical distribution that would be sampled at temperature Ts would be $\exp(-\beta_s A_\kappa(s)) = [\exp(-\beta A_\kappa(s))]^{\beta_s/\beta} = [\exp(-\beta A_\kappa(s))]^{T/T_s}$, from which the free energy can be easily recovered via eqn. (8.10.12). This completes our proof.

As an illustration of adiabatic dynamics, consider a simple problem with two degrees of freedom x and y. The potential is chosen to be a double well in x coupled linearly to a harmonic oscillator in y:

$$U(x,y) = D_0\left(x^2 - a^2\right)^2 + \frac{1}{2}ky^2 + \lambda xy. \tag{8.10.27}$$

The free-energy profile A(s) for this system can be derived analytically with the result

$$A(s) = D_0\left(s^2 - a^2\right)^2 - \frac{\lambda^2}{2k}s^2. \tag{8.10.28}$$
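Equation (8.10.28) follows from integrating exp(−βU(s, y)) over y, which amounts to completing the square in y. The sketch below checks this numerically by comparing the analytic profile with a trapezoidal-rule integration over y, and it also evaluates the location of the minima and the barrier height implied by the analytic form. The parameter values used in any call, the integration range, and the function names are illustrative assumptions.

```python
import math


def a_analytic(s, D0, a, k, lam):
    """Analytic free-energy profile of eqn. (8.10.28)."""
    return D0 * (s * s - a * a) ** 2 - lam * lam * s * s / (2.0 * k)


def a_numeric(s, D0, a, k, lam, kT, y_max=12.0, n=4000):
    """-kT ln of the integral over y of exp(-U(s, y)/kT), trapezoidal rule.

    Differs from the analytic profile only by an s-independent constant.
    """
    h = 2.0 * y_max / n
    total = 0.0
    for i in range(n + 1):
        y = -y_max + i * h
        weight = 0.5 if i in (0, n) else 1.0
        u = D0 * (s * s - a * a) ** 2 + 0.5 * k * y * y + lam * s * y
        total += weight * math.exp(-u / kT)
    return -kT * math.log(total * h)


def minimum_and_barrier(D0, a, k, lam):
    """Position of the free-energy minima (|s|) and the barrier A(0) - A(s_min)."""
    s_min = math.sqrt(a * a + lam * lam / (4.0 * D0 * k))
    barrier = a_analytic(0.0, D0, a, k, lam) - a_analytic(s_min, D0, a, k, lam)
    return s_min, barrier
```

Because the integral over y is Gaussian, the two profiles should agree to high precision once the common constant is removed, for example by subtracting the value at s = 0 from each.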

The bare double well has minima at x = ±a and a barrier height of $D_0 a^4$, while the free energy in eqn. (8.10.28) has minima at $s = \pm\sqrt{a^2 + \lambda^2/(4D_0 k)}$ and a barrier height of $D_0 a^4 + \lambda^2 a^2/(2k) + \lambda^4/(16 D_0 k^2)$. Thus, in order to ensure sufficient barrier crossing in an adiabatic dynamics simulation, the temperature of the extended variable, s, should satisfy $kT_s > D_0 a^4$. In Fig. 8.7, we examine how the free-energy profile generated by eqns. (8.10.14) depends on Ts, µ, and κ for the potential in eqn. (8.10.27) and the corresponding free-energy profile in eqn. (8.10.28) for a simulation of $10^9$ steps of length ∆t = 0.0025 using a generalized Gaussian moment thermostat (Liu and Tuckerman, 2000). The figure shows that if Ts is too low, the barrier is inadequately sampled due to insufficient crossing. If the mass µ is too low, the free-energy profile is distorted due to insufficient adiabatic decoupling. Finally, if the spring constant κ is too low, the profile is distorted due to poor approximation of the δ-function.

Fig. 8.7 (Left) Free-energy profiles for the potential in eqn. (8.10.27) generated by d-AFED/TAMD for different values of Ts with κ = 3000 and µ = 1000. (Middle) Free-energy profiles for different values of µ with Ts = 7.5 (except where Ts = 5 as indicated) and κ = 3000. (Right) Free-energy profiles for different values of κ with Ts = 7.5 and µ = 1000.

As a second example, this time of a realistic system, d-AFED/TAMD is applied to explore the landscape of crystal structures of solid xenon at 2700 K and 25 GPa pressure. In this application, an isothermal-isobaric version of the d-AFED/TAMD algorithm (Yu et al., 2014) is applied to a system of 4000 xenon atoms in an orthorhombic box. Atoms interact via a Buckingham potential from Ross and McMahan (1980). The simulation employs five collective variables, specifically, the three diagonal elements of the cell matrix h and two crystalline order parameters known as continuous Steinhardt parameters (Steinhardt et al., 1983; Quigley and Rodger, 2008; Quigley and Rodger, 2009), defined as

$$Q_l(r) = \left[\frac{4\pi}{2l+1}\sum_{m=-l}^{l}\left|Q_{lm}(r)\right|^2\right]^{1/2},$$

where

$$Q_{lm}(r) = \frac{1}{N N_{\rm coord}}\sum_{b=1}^{N_b} f_c(r_b)\, Y_{lm}(\hat{r}_b) \tag{8.10.29}$$

and where $N_b$ is the total number of atom pairs separated by a distance of no more than $r_{\max}$, $N_{\rm coord}$ is the first-shell coordination number of each atom, $Y_{lm}(\hat{r}_b)$ is a spherical harmonic, $\hat{r}_b$ is the unit vector along the direction $r_b$, $r_b = |r_b|$, and $f_c(r)$ is a smooth switching function that switches the order parameter off at $r = r_{\max}$ (here, set to 4.5 Å). The two parameters employed correspond to l = 4 and l = 6. In these simulations, Ts is set to $10^5$ K and κ is set to $10^{10}$ K. A simulation of length 200 ns with a time step of 5 fs is performed from which $5\times 10^5$ snapshots are extracted for free-energy surface reconstruction (Schneider et al., 2017). A projection of the five-dimensional free-energy surface onto the Q4-Q6 plane is shown in Fig. 8.8. The figure clearly indicates three polymorphs corresponding to the face-centered cubic, body-centered cubic, and hexagonal close-packed (FCC, BCC, and HCP) Bravais lattices. Moreover, we see that the FCC and BCC structures are the most stable with the HCP structure appearing as a metastable state between the two stable structures, in accordance with experiments that have reported pressure-induced FCC-to-HCP transitions (Cynn et al., 2001). It is worth noting that use of Q4 and Q6 alone as collective variables is insufficient to converge the free energy and reveal the three polymorphs (Yu et al., 2014). Thus, it is often essential to work in a high-dimensional collective-variable space in order to explore fully the structural diversity of a system.

Fig. 8.8 Projection onto the Q4-Q6 plane of the five-dimensional free-energy surface of crystalline xenon at 2700 K and 25 GPa from a d-AFED/TAMD simulation.
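For readers who want to experiment with Steinhardt-type order parameters, the sketch below computes Q_l for a single set of bond vectors. It relies on the spherical-harmonic addition theorem, which turns the sum over m into Legendre polynomials of the angles between pairs of bonds, so no spherical-harmonic library is needed. Two assumptions differ from eqn. (8.10.29): the normalization is the sum of the bond weights for the supplied bonds rather than N·N_coord, and the optional `weights` argument merely stands in for the switching-function values f_c(r_b). The function names are hypothetical.

```python
import math


def legendre(l, x):
    """Legendre polynomial P_l(x) from the standard three-term recurrence."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, x
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def steinhardt_q(l, bonds, weights=None):
    """Q_l for a list of bond vectors (bx, by, bz).

    Uses (4 pi / (2l+1)) sum_m Y_lm(b) Y*_lm(b') = P_l(cos angle(b, b')),
    so that Q_l^2 = sum_{b,b'} w_b w_b' P_l(cos) / (sum_b w_b)^2.
    """
    if weights is None:
        weights = [1.0] * len(bonds)
    units = []
    for bx, by, bz in bonds:
        r = math.sqrt(bx * bx + by * by + bz * bz)
        units.append((bx / r, by / r, bz / r))
    total = 0.0
    for wi, ui in zip(weights, units):
        for wj, uj in zip(weights, units):
            c = ui[0] * uj[0] + ui[1] * uj[1] + ui[2] * uj[2]
            c = max(-1.0, min(1.0, c))  # guard against round-off
            total += wi * wj * legendre(l, c)
    return math.sqrt(max(total, 0.0)) / sum(weights)
```

With this per-bond normalization, the twelve nearest-neighbor bonds of a perfect FCC lattice give the familiar reference values Q4 ≈ 0.191 and Q6 ≈ 0.575.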

8.11 Metadynamics

In this section, we will describe a popular method for computing a free-energy hypersurface in terms of a set of collective variables that is akin to a dynamical version of the Wang-Landau approach in Section 7.6. The metadynamics method (Laio and Parrinello, 2002) is a dynamical scheme wherein energy basins are “filled in” using a time-dependent potential that depends on the history of the system’s trajectory. Once a basin is filled in, the system is driven into the next basin, which is subsequently filled in, and so forth until the entire landscape is “flat.” When this state is achieved, the accumulated time-dependent potential is used to construct the free-energy profile. Note that, in contrast to d-AFED/TAMD, in which basins are visited repeatedly as the system sweeps over the free-energy landscape, in metadynamics, once a basin has been filled in, the system is effectively repelled from visiting it again for any significant length of time. In order to see how such a dynamics can be constructed, consider once again the probability distribution function in eqn. (8.6.6). Since P(s1, ..., sn) is an ensemble average

$$P(s_1,\ldots,s_n) = \left\langle \prod_{\alpha=1}^{n}\delta\left(f_\alpha(r_1,\ldots,r_N) - s_\alpha\right)\right\rangle, \tag{8.11.1}$$

we can replace the phase-space average with a time average over a trajectory of length $\mathcal{T}$ as

$$P(s_1,\ldots,s_n) = \lim_{\mathcal{T}\to\infty}\frac{1}{\mathcal{T}}\int_0^{\mathcal{T}} dt \prod_{\alpha=1}^{n}\delta\left(f_\alpha(r_1(t),\ldots,r_N(t)) - s_\alpha\right), \tag{8.11.2}$$

under the assumption of ergodic dynamics. In the metadynamics approach, we express the δ-function as the limit of a Gaussian function as the width goes to 0 and the height goes to infinity:

$$\delta(x - a) = \lim_{\sigma\to 0}\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-a)^2/2\sigma^2}. \tag{8.11.3}$$

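Using eqn. (8.11.3), eqn. (8.11.2) can be rewritten as

$$P(s_1,\ldots,s_n) = \lim_{\mathcal{T}\to\infty}\lim_{\Delta s\to 0}\frac{1}{\sqrt{2\pi\Delta s^2}\,\mathcal{T}}\int_0^{\mathcal{T}} dt \prod_{\alpha=1}^{n}\exp\left(-\frac{\left(s_\alpha - f_\alpha(r_1(t),\ldots,r_N(t))\right)^2}{2\Delta s^2}\right). \tag{8.11.4}$$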

Thus, for finite trajectory length $\mathcal{T}$ and finite ∆s, eqn. (8.11.4) represents an approximation to P(s1, ..., sn), which becomes increasingly accurate as $\mathcal{T}$ increases and the Gaussian width ∆s decreases. To see how eqn. (8.11.4) builds up the probability distribution at the point s1, ..., sn, consider a dynamical trajectory initiated such that $f_\alpha(r_1(0),\ldots,r_N(0)) = s_\alpha$ for all α. If s1, ..., sn is a point of low free energy, this is also a point of high probability, and the trajectory will sample more points for which $f_\alpha(r_1(t),\ldots,r_N(t))$ is in the neighborhood of $s_\alpha$, where the Gaussian has its peak, than it would if s1, ..., sn were a point of high free energy or low probability. In the latter case, the trajectory will tend to drift away from such points of low probability, $f_\alpha(r_1(t),\ldots,r_N(t))$ will no longer be close to $s_\alpha$, and the Gaussian will contribute negligibly to the overall probability. For numerical evaluation, the integral in eqn. (8.11.4) is written as a discrete sum so that the approximation becomes

$$P(s_1,\ldots,s_n) \approx \frac{1}{\sqrt{2\pi\Delta s^2}\,\mathcal{T}}\sum_{k=0}^{N-1}\exp\left(-\sum_{\alpha=1}^{n}\frac{\left(s_\alpha - f_\alpha(r_1(k\Delta t),\ldots,r_N(k\Delta t))\right)^2}{2\Delta s^2}\right). \tag{8.11.5}$$
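A one-dimensional analogue of the discrete sum in eqn. (8.11.5) is easy to write down. The sketch below averages normalized Gaussians of width ∆s centered on sampled values of a single collective variable. Dividing by the number of samples, which corresponds to equally spaced samples along the trajectory, is an assumption made for this illustration, as are the function and argument names.

```python
import math


def gaussian_estimate(cv_values, s, ds):
    """Estimate P(s) as an average of normalized Gaussians of width ds.

    cv_values holds the sampled values of one collective variable along a
    trajectory; each sample contributes one Gaussian centered on it.
    """
    norm = 1.0 / math.sqrt(2.0 * math.pi * ds * ds)
    total = sum(math.exp(-(s - f) ** 2 / (2.0 * ds * ds)) for f in cv_values)
    return norm * total / len(cv_values)
```

With this normalization the estimate integrates to one over s for any set of samples, so narrowing ∆s sharpens the estimate without changing its overall weight.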

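Again, assuming that $f_\alpha(r_1(0),\ldots,r_N(0)) = s_\alpha$, we can write eqn. (8.11.5) as

$$P(s_1,\ldots,s_n) \approx \frac{1}{\sqrt{2\pi\Delta s^2}\,\mathcal{T}}\left[1 + \sum_{k=1}^{N-1}\exp\left(-\sum_{\alpha=1}^{n}\frac{\left(s_\alpha - f_\alpha(r_1(k\Delta t),\ldots,r_N(k\Delta t))\right)^2}{2\Delta s^2}\right)\right]. \tag{8.11.6}$$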

Now, since A(s1, ..., sn) = −kT ln P(s1, ..., sn), if we assume the Gaussians are generally small in amplitude, then using ln(1 + x) ≈ x, we can approximate the free energy (apart from an arbitrary offset) as

$$A(s_1,\ldots,s_n) \approx -kT\sum_{k=1}^{N-1}\exp\left(-\sum_{\alpha=1}^{n}\frac{\left(s_\alpha - f_\alpha(r_1(k\Delta t),\ldots,r_N(k\Delta t))\right)^2}{2\Delta s^2}\right) + \mathrm{const}. \tag{8.11.7}$$


Equation (8.11.7) suggests an intriguing bias potential that can be added to the original potential U(r1, ..., rN) to help the system sample the free-energy hypersurface while allowing for a straightforward reconstruction of this surface directly from the dynamics. Consider a bias potential of the form (Laio and Parrinello, 2002; Laio et al., 2005)

$$U_G(r,t) = W\sum_{t'=\tau_G,2\tau_G,\ldots}^{t}\exp\left[-\sum_{\alpha=1}^{n}\frac{\left(f_\alpha(r) - f_\alpha(r_G(t'))\right)^2}{2\Delta s^2}\right], \tag{8.11.8}$$

where r ≡ r1, ..., rN, as usual, $r_G(t)$ is the time evolution of the complete set of Cartesian coordinates up to time t under the action of the potential $U(r) + U_G(r,t)$, and $\tau_G$ is a time interval. The purpose of this bias potential is to add Gaussians of height W and width ∆s at intervals $\tau_G$ to the potential energy so that as time increases, these Gaussians accumulate. If the system starts in a deep basin on the potential energy surface, then this basin will be “filled in” by the Gaussians, thereby lifting the system up toward the barrier until it is able to cross into the next basin, which is subsequently filled by Gaussians until the system can escape into the next basin, and so forth. The process is illustrated in Fig. 8.9. Our analysis of the d-AFED/TAMD approach shows that if the reaction coordinates move relatively slowly, then they move instantaneously not on the bare potential energy surface but on the potential of mean force surface. Thus, if small Gaussians (adjusted through the parameter W) are added slowly enough, then as time increases, $U_G$ takes on the shape of −A(s1, ..., sn), since it has maxima where A has minima, and vice versa. Thus, given a long trajectory $r_G(t)$ generated using the bias potential, the free-energy hypersurface is constructed using

$$A(s_1,\ldots,s_n) \approx -W\sum_{t'=\tau_G,2\tau_G,\ldots}\exp\left[-\sum_{\alpha=1}^{n}\frac{\left(s_\alpha - f_\alpha(r_G(t'))\right)^2}{2\Delta s^2}\right]. \tag{8.11.9}$$

Fig. 8.9 Illustration of the metadynamics procedure for a symmetric double well.
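The deposition and reconstruction steps of eqns. (8.11.8) and (8.11.9) can be sketched for a single collective variable as follows. The one-dimensional double-well potential, the overdamped Langevin dynamics used to move the system, and all parameter values and function names are assumptions chosen to keep the example short; they are not the setup used in the text.

```python
import math
import random


def bias(s, centers, W, ds):
    """Accumulated bias: a sum of Gaussians of height W and width ds."""
    return W * sum(math.exp(-(s - c) ** 2 / (2.0 * ds * ds)) for c in centers)


def bias_force(s, centers, W, ds):
    """Force from the bias, -dU_G/ds, which pushes away from visited points."""
    return W * sum((s - c) / (ds * ds) * math.exp(-(s - c) ** 2 / (2.0 * ds * ds))
                   for c in centers)


def run_metadynamics(n_steps, stride, W=0.1, ds=0.1, kT=0.2, dt=1e-3, seed=7):
    """Overdamped Langevin dynamics on U(x) = (x^2 - 1)^2 with CV f(x) = x.

    A Gaussian is deposited at the current position every `stride` steps.
    Returns the list of Gaussian centers.
    """
    rng = random.Random(seed)
    x, centers = -1.0, []
    for step in range(1, n_steps + 1):
        f = -4.0 * x * (x * x - 1.0) + bias_force(x, centers, W, ds)
        x += f * dt + math.sqrt(2.0 * kT * dt) * rng.gauss(0.0, 1.0)
        if step % stride == 0:
            centers.append(x)
    return centers


def free_energy_estimate(s, centers, W, ds):
    """Free-energy estimate as the negative of the accumulated bias."""
    return -bias(s, centers, W, ds)
```

The cost of each force evaluation grows with the number of deposited Gaussians, which is one reason production codes typically store the bias on a grid.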


Given the points rG (t) generated during a metadynamics run, the times at which Gaussians were deposited, and the parameters W and ∆s, eqn. (8.11.9) can be used to generate the free-energy surface by evaluating it at s1 , ..., sn ≡ s. Of course, the free-energy surface can only be reconstructed in this way if s lies in regions sampled during the metadynamics run, for otherwise the Gaussians will vanish if s lies in a region that was never visited during the run. Moreover, if free-energy surfaces of high dimensionality are sought, e.g., n > 3, the high dimensionality of the Gaussians makes it difficult for eqn. (8.11.9) to achieve sufficient coverage that a smooth surface is generated. For this reason, metadynamics typically fails in generating free-energy surfaces of dimensionality beyond n = 3, in contrast to d-AFED/TAMD, which can be employed with larger values of n (Abrams and Vanden-Eijnden, 2010). A difficulty with metadynamics concerns the question of when to terminate a run. Once all basins are filled, the metadynamics procedure prescribes the continued addition of Gaussians, which means that the free energy will never truly converge but, rather, will simply fluctuate about the correct result, leading to an error in the free energy that is proportional to the square root of the deposition rate of the Gaussians (Barducci et al., 2008). However, we can use the history of the bias potential itself to create an adaptive bias that converges to the correct free-energy surface. A scheme that implements this idea is well-tempered metadynamics (Barducci et al., 2008). In a well-tempered metadynamics run, a bias potential similar to that in eqn. (8.11.8) is constructed as follows:

$$U_{\rm WT}(r,t) = W\sum_{t'=\tau_G,2\tau_G,\ldots}^{t} e^{-U_{\rm WT}(r,t')/\Delta T}\, e^{-\sum_{\alpha=1}^{n}\left(f_\alpha(r) - f_\alpha(r_G(t'))\right)^2/2\Delta s^2}, \tag{8.11.10}$$

where ∆T is a parameter that defines an elevated “pseudotemperature” T + ∆T > T, which helps accelerate barrier crossing. The key feature of well-tempered metadynamics is that the rate at which the bias potential is added decreases by an exponential factor given by $\exp(-U_{\rm WT}(r_1,\ldots,r_N)/\Delta T)$ such that in the limit that t → ∞, the rate drops to zero and a probability distribution $P_{\rm WT}(s_1,\ldots,s_n)$ of the form (Barducci et al., 2008; Dama et al., 2014)

$$\begin{aligned}
P_{\rm WT}(s_1,\ldots,s_n) &\propto e^{-A(s_1,\ldots,s_n)/k(T+\Delta T)} \\
&\propto \left[e^{-A(s_1,\ldots,s_n)/kT}\right]^{T/(T+\Delta T)}
\end{aligned} \tag{8.11.11}$$

is generated. Note the similarity between eqn. (8.11.11) and eqn. (8.10.11), indicating the advantage of using temperature boosting in both d-AFED/TAMD and in metadynamics. It is worth noting other forms of metadynamics, including the so-called parallel-replica/bias-exchange metadynamics (Piana and Laio, 2007), in which metadynamics is performed on a set of n replicas, each of which accumulates a bias on just one of the n collective variables, thereby permitting larger n, and variational metadynamics (Valsson and Parrinello, 2014), which uses a variational procedure to find an optimal metadynamics bias potential.
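The height scaling in eqn. (8.11.10) can be illustrated with a short sketch for one collective variable. Following the usual formulation of well-tempered metadynamics, the existing bias is evaluated at the point where the new Gaussian is deposited, and ∆T is expressed in energy units (Boltzmann's constant absorbed). Both of these conventions, along with the function and argument names, are assumptions made for the example.

```python
import math


def deposit_well_tempered(centers, heights, s_new, W, ds, delta_T):
    """Append one Gaussian at s_new with a well-tempered height.

    The height is W * exp(-U_WT(s_new) / delta_T), where U_WT(s_new) is the
    bias already accumulated at the deposition point. Modifies the lists
    in place and returns the new height.
    """
    current = sum(h * math.exp(-(s_new - c) ** 2 / (2.0 * ds * ds))
                  for c, h in zip(centers, heights))
    height = W * math.exp(-current / delta_T)
    centers.append(s_new)
    heights.append(height)
    return height
```

Repeated deposition at the same point gives heights that shrink monotonically, which is the mechanism by which the bias stops growing without bound.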


8.11.1 Unifying metadynamics and d-AFED/TAMD

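Both metadynamics and well-tempered metadynamics can be formulated in the extended phase-space framework used in d-AFED/TAMD (Cuendet and Tuckerman, 2014). This feature makes combining metadynamics with d-AFED/TAMD straightforward and leads to a method that leverages the advantages of both methodologies. Consider incorporating a bias potential of the form

$$U_{\rm bias}(s,t) = W\sum_{t'=\tau_G,2\tau_G,\ldots}^{t}\exp\left[-\sum_{\alpha=1}^{n}\frac{\left(s_\alpha - s_\alpha(t')\right)^2}{2\Delta s^2}\right] \tag{8.11.12}$$

into eqns. (8.10.14), such that the modified equations of motion read

$$\begin{aligned}
\dot{r}_i &= \frac{p_i}{m_i} \\
\dot{p}_i &= F_i - \sum_{\alpha=1}^{n}\kappa_\alpha\left(f_\alpha(r) - s_\alpha\right)\frac{\partial f_\alpha}{\partial r_i} + \mathrm{Therm}(T) \\
\dot{s}_\alpha &= \frac{\pi_\alpha}{\mu_\alpha} \\
\dot{\pi}_\alpha &= \kappa_\alpha\left(f_\alpha(r) - s_\alpha\right) - \frac{\partial U_{\rm bias}(s,t)}{\partial s_\alpha} + \mathrm{Therm}(T_s).
\end{aligned} \tag{8.11.13}$$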

Equations (8.11.13) constitute the unified free-energy dynamics (UFED) method (Chen et al., 2012). The mechanism of eqns. (8.11.13) is similar to d-AFED/TAMD except that as the system sweeps over the free-energy landscape, Gaussians are deposited at a specified rate, leading to a more uniform filling of basins, which, in turn, accelerates convergence of d-AFED/TAMD calculations. In a UFED calculation, the bias potential, Ubias(s, t), serves only to enhance the temperature-accelerated adiabatic sampling but is not used for free-energy reconstruction. Although it is beyond the scope of this book, a detailed analysis by Chen et al. (2012) shows that in the limit of adiabatic decoupling, the mean force average becomes independent of Ts and Ubias(s, t), which means that the average in eqn. (8.10.9) can be used directly in the same way it is used in a d-AFED/TAMD simulation. In order to illustrate how adding Ubias(s, t) can accelerate a d-AFED/TAMD simulation, we show, in Fig. 8.10, the convergence of the four-dimensional free-energy surface (FES) of the gas-phase alanine tripeptide using the two pairs of Ramachandran angles as collective variables. The free-energy surface is generated using both UFED and d-AFED/TAMD for the CHARMM22 force field (MacKerell et al., 1998). The d-AFED/TAMD parameters are Ts = 1500 K, µ = 168.0 amu·Å²/rad², and κ = 1008 kcal/(mol·rad²). For the bias potential, the Gaussian height W is 2 kcal/mol, the width ∆s is 0.63 radians, and the deposition interval τG is 100 fs. The total simulation time is 200 ns for d-AFED/TAMD and 300 ns for UFED. The convergence of the simulation is measured by computing an L1 norm between the four-dimensional surface generated from either d-AFED/TAMD or UFED and a reference surface generated from a 1 µs


Fig. 8.10 (Left) Snapshot of the alanine tripeptide showing the two sets of Ramachandran angles (φ1, ψ1, φ2, ψ2). Color code: red = O, blue = N, grey = C. (Right) Comparison of the L1 norm of the corresponding four-dimensional free-energy surface difference between UFED or d-AFED/TAMD and a reference free-energy surface for the alanine tripeptide in the gas phase (Chen et al., 2012).

run using parallel tempering (see Section 7.5). The figure shows that the use of the bias potential Ubias(s, t) causes the L1 error measure to drop below 0.1 kcal/mol by 50 ns, which is roughly a factor of 4 improvement over use of d-AFED/TAMD alone. As we noted above for metadynamics, it is important to remember that the efficacy of the bias potential Ubias(s, t) will degrade as the dimensionality increases, which means that d-AFED/TAMD would need to be employed on its own in order to treat large numbers of collective variables. The next section describes a scheme for applying different sampling methods to different subsets of collective variables.

8.11.2 Unifying metadynamics, d-AFED/TAMD, and umbrella sampling

According to eqns. (8.11.13), the UFED method adds the bias potential Ubias(s, t) to all of the collective variables equally. This is reasonable if the number of collective variables is not too large, i.e., n ≲ 4–6, and if all collective variables are of the same type. However, it might happen that different types of collective variables, e.g., distances, dihedral angles, and energies, are targeted simultaneously in one simulation. One might then expect that optimal treatment of different types of collective variables could be achieved by tailoring the enhanced sampling approach to each type of variable. For example, local collective variables, such as dihedral angles, having finite domains and ranges characterized by large barriers, might best be treated using d-AFED/TAMD, while collective variables such as distances, radii of gyration, or energies, having wider domains and more global character, could be treated with metadynamics, UFED, the blue moon ensemble, or umbrella sampling. The temperature-accelerated sliced sampling (TASS) method, introduced by Awasthi and Nair (2017), combines d-AFED/TAMD, metadynamics or UFED, and umbrella sampling. As an illustration of TASS, suppose, of the n collective variables employed in an enhanced-sampling simulation, we wish to treat s1 with umbrella sampling, s2 with UFED, and


the rest with d-AFED/TAMD. In such a simulation, all $n$ components of $s$ are kept at the high temperature $T_s$ and are adiabatically decoupled from the physical variables. For this example, the Hamiltonian in eqn. (8.10.6) would be replaced with
\[
H_\kappa^{(\mathrm{TASS})}(r, s, p, \pi) = H(r, p) + \sum_{\alpha=1}^{n} \frac{\pi_\alpha^2}{2\mu_\alpha} + \sum_{\alpha=1}^{n} \frac{1}{2}\kappa_\alpha \left(f_\alpha(r) - s_\alpha\right)^2 + U_{\mathrm{ub}}(s_1) + U_{\mathrm{bias}}(s_2, t). \tag{8.11.14}
\]

Here, $U_{\mathrm{bias}}(s_2, t)$ is the metadynamics bias in eqn. (8.11.12) evaluated only as a function of $s_2$, and $U_{\mathrm{ub}}(s_1)$ is an umbrella potential (cf. eqn. (8.8.1)) $U_{\mathrm{ub}}(s_1) = \kappa_{\mathrm{ub}}(s_1 - \sigma)^2/2$. The parameter $\sigma$ is the fixed value around which $s_1$ is restrained with force constant $\kappa_{\mathrm{ub}}$. In an application of TASS from Paul et al. (2019), a set of ten collective variables, encompassing both distances and dihedral angles, was chosen to study a product-release reaction in a sugar nucleotidyltransferase enzyme. The simulations revealed low free-energy pathways for the reaction that could not have been uncovered using umbrella sampling alone.
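To make the bookkeeping in eqn. (8.11.14) concrete, the following minimal sketch evaluates the terms that TASS adds to the physical Hamiltonian $H(r, p)$. The function name `tass_extended_energy`, its argument names, and the user-supplied callable `bias` (standing in for the time-dependent metadynamics bias at a given instant) are assumptions introduced here for illustration; they are not taken from any particular simulation package.

```python
import numpy as np

def tass_extended_energy(s, pi, f_r, kappa, mu, sigma, kappa_ub, bias):
    """Extended-variable terms of eqn. (8.11.14), excluding H(r, p).

    s, pi    : extended coordinates s_alpha and their momenta pi_alpha
    f_r      : collective-variable values f_alpha(r) at the current r
    kappa, mu: harmonic coupling constants and extended masses
    sigma    : umbrella center for s_1; kappa_ub is its force constant
    bias     : hypothetical callable returning U_bias(s_2) at the current time
    """
    s, pi, f_r = (np.asarray(v, dtype=float) for v in (s, pi, f_r))
    kappa, mu = np.asarray(kappa, dtype=float), np.asarray(mu, dtype=float)
    kinetic = np.sum(pi**2 / (2.0 * mu))              # sum_alpha pi^2 / (2 mu)
    coupling = 0.5 * np.sum(kappa * (f_r - s)**2)     # harmonic coupling to f(r)
    umbrella = 0.5 * kappa_ub * (s[0] - sigma)**2     # U_ub acts on s_1 only
    return kinetic + coupling + umbrella + bias(s[1])  # U_bias acts on s_2 only
```

Only the first two components of `s` are treated specially; all remaining components feel just the harmonic coupling, which is how the d-AFED/TAMD subset enters.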

8.12

The committor distribution and the histogram test

We conclude this chapter with a discussion of the following question: How do we know if a given reaction coordinate is a good choice for representing a particular process of interest? After all, reaction coordinates are often chosen based on some intuitive mental picture we might have of the process, and intuition can be misleading. Therefore, it is important to have a test capable of revealing the quality of a chosen reaction coordinate. To this end, we introduce the concept of a committor and its associated probability distribution function (Geissler et al., 1999). Let us consider a process that takes a system from state A to state B. We define the committor as the probability $p_B(r_1, \ldots, r_N) \equiv p_B(r)$ that a trajectory initiated from a configuration $r_1, \ldots, r_N \equiv r$ with velocities sampled from a Maxwell-Boltzmann distribution will arrive in state B before state A. If the configuration $r$ corresponds to a true transition state, then $p_B(r) = 1/2$. Inherent in the definition of the committor is the assumption that the trajectory is stopped as soon as it ends up in either state A or B. Therefore, $p_B(r) = 1$ if $r$ belongs to state B and $p_B(r) = 0$ if $r$ belongs to A. In principle, $p_B(r)$ is an exact and universal reaction coordinate for any system. The idea of the committor is illustrated in Fig. 8.11. Unfortunately, we do not have an analytical expression for the committor, and mapping out $p_B(r)$ numerically is intractable for large systems. Nevertheless, the committor forms the basis of a useful test that is able to determine the quality of a chosen reaction coordinate. This test, referred to as the histogram test (Geissler et al., 1999; Bolhuis et al., 2002; Dellago et al., 2002; Peters, 2006), applies the committor concept to a reaction coordinate $q(r)$. If $q(r)$ is a good reaction coordinate, then the isosurfaces $q(r) = \mathrm{const}$ should approximate the isosurfaces $p_B(r) = \mathrm{const}$ of the committor.
Thus, we can test the quality of $q(r)$ by calculating an approximation to the committor distribution on an isosurface of $q(r)$. The committor distribution is defined to be the probability that $p_B(r)$ has the value $p$ when $q(r) = q^\ddagger$, the value of $q(r)$ at a maximum between A and B, which is a presumptive transition state or separatrix between the two basins. This probability distribution is given by
\[
P(p) = \frac{C_N}{Q(N, V, T)} \int d^N p \int_{q(r) = q^\ddagger} d^N r \, e^{-\beta H(r, p)} \, \delta\left(p_B(r_1, \ldots, r_N) - p\right). \tag{8.12.1}
\]

Fig. 8.11 Schematic of the committor concept. In the figure, trajectories are initiated from the isocommittor surface $p_B(r) = 1/2$, which is also the transition state surface, so that an equal number of trajectories “commit” to basins A and B.

In discussing the histogram test, we will assume that $q(r)$ is the generalized coordinate $q_1(r)$. The histogram test is then performed as follows:

1) Fix the value of $q_1(r)$ at $q^\ddagger$.

2) Sample an ensemble of $M$ configurations $q_2(r), \ldots, q_{3N}(r)$ corresponding to the remaining degrees of freedom. This will lead to many values of each remaining coordinate. Denote this set of orthogonal coordinates as $q_2^{(k)}(r), \ldots, q_{3N}^{(k)}(r)$, where $k = 1, \ldots, M$.

3) For each of these sampled configurations, sample a set of initial velocities from a Maxwell-Boltzmann distribution.

4) For the configuration $q^\ddagger, q_2^{(k)}, \ldots, q_{3N}^{(k)}$, use each set of sampled initial velocities to initiate a trajectory and run the trajectory until the system ends up in A or B, at which point the trajectory is stopped. Assign the trajectory a value of 1 if it ends up in state B and a value of 0 if it ends up in state A. When the complete set of sampled initial velocities is exhausted for this particular orthogonal configuration, average the 1s and 0s, and record the average value as $p^{(k)}$.

5) Repeat for all of the configurations sampled in step 2 until the full set of averaged probabilities $p^{(1)}, \ldots, p^{(M)}$ is generated.

6) Plot a histogram of the probabilities $p^{(1)}, \ldots, p^{(M)}$.

If the histogram from step 6 peaks sharply at 1/2, then $q(r)$ is a good reaction coordinate. However, if the histogram is broad over the entire range (0, 1), then $q(r)$ is a poor reaction coordinate. Illustrations of good and poor reaction coordinates obtained from the histogram test are shown in Fig. 8.12. Although the histogram test can be expensive to carry out, it is, nevertheless, an important evaluation of the quality of a reaction coordinate and its associated free-energy profile.
Fig. 8.12 Example histogram tests for evaluating the quality of a reaction coordinate. (a) An example of a poor reaction coordinate; (b) An example of a good reaction coordinate.

Once the investment in the histogram test is made, the payoff can be considerable, regardless of whether the reaction coordinate passes the test. If it does pass the test, then the same coordinate can be used in subsequent studies of similar systems. If it does not pass the test, then it is clear that the coordinate $q(r)$ should be avoided for the present and similar systems. In Chapter 17, we demonstrate how machine learning can be leveraged to estimate a committor from an enhanced sampling simulation in which a large and possibly redundant set of collective variables is targeted, thereby showing how to reduce such collective-variable sets to a single, optimal reaction coordinate.
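The six steps of the histogram test can be sketched for a toy two-dimensional system. The example below is only an illustration under stated assumptions: it uses the coupled double-well potential of Problem 8.6 with arbitrarily chosen parameters, takes $q(r) = x$ with $q^\ddagger = 0$, integrates Newton's equations with velocity Verlet for unit masses, and defines the basins A and B by the hypothetical thresholds $x < -0.8a$ and $x > 0.8a$. The names `force`, `commit`, and `histogram_test` are introduced here and are not from the text.

```python
import numpy as np

# Illustrative parameters (assumed, not prescribed by the text)
U0, A_WIDTH, K, LAM, KT = 5.0, 1.0, 1.0, 1.0, 1.0

def force(x, y):
    """Force from U(x, y) = U0/a^4 (x^2 - a^2)^2 + k y^2/2 + lam x y."""
    fx = -4.0 * U0 / A_WIDTH**4 * x * (x**2 - A_WIDTH**2) - LAM * y
    fy = -K * y - LAM * x
    return fx, fy

def commit(x, y, vx, vy, dt=0.005, max_steps=20000):
    """Velocity Verlet (unit masses) until the trajectory enters A or B.

    Returns 1.0 for B (x > 0.8a), 0.0 for A (x < -0.8a), NaN if undecided."""
    fx, fy = force(x, y)
    for _ in range(max_steps):
        vx += 0.5 * dt * fx
        vy += 0.5 * dt * fy
        x += dt * vx
        y += dt * vy
        fx, fy = force(x, y)
        vx += 0.5 * dt * fx
        vy += 0.5 * dt * fy
        if x > 0.8 * A_WIDTH:
            return 1.0
        if x < -0.8 * A_WIDTH:
            return 0.0
    return float("nan")

def histogram_test(M=50, n_vel=50, q_dagger=0.0, seed=0):
    """Return the committor estimates p^(1), ..., p^(M) on the surface x = q_dagger."""
    rng = np.random.default_rng(seed)
    p = np.empty(M)
    for k in range(M):
        # Steps 1-2: x is fixed; at fixed x the Boltzmann weight in y is Gaussian
        # with mean -lam*x/k and variance kT/k.
        y = rng.normal(-LAM * q_dagger / K, np.sqrt(KT / K))
        outcomes = []
        for _ in range(n_vel):
            # Step 3: Maxwell-Boltzmann velocities for unit masses
            vx, vy = rng.normal(0.0, np.sqrt(KT), size=2)
            # Step 4: run until commitment and record 1 (B) or 0 (A)
            outcomes.append(commit(q_dagger, y, vx, vy))
        p[k] = np.nanmean(outcomes)   # average of the 1s and 0s
    return p                           # Step 5: all M estimates

if __name__ == "__main__":
    # Step 6: histogram of the committor estimates
    counts, edges = np.histogram(histogram_test(), bins=10, range=(0.0, 1.0))
    print(counts)
```

With a nonzero coupling `LAM`, the force along $x$ at $x = 0$ depends on $y$, so one would expect the estimates $p^{(k)}$ to spread away from 1/2; setting `LAM = 0` decouples the two coordinates and provides a useful comparison case.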

8.13

Problems

8.1. Derive eqn. (8.3.6).

8.2. Write a program to compute the free-energy profile in eqn. (8.3.6) using thermodynamic integration. How many $\lambda$ points do you need to compute the integral accurately enough to obtain the correct free-energy difference $A(1) - A(0)$?

8.3. Write a program to compute the free-energy difference $A(1) - A(0)$ from eqn. (8.3.6) using the free-energy perturbation approach. Can you obtain an accurate answer using a one-step perturbation, or do you need intermediate states?

8.4. Derive eqn. (8.7.25).

8.5. Derive eqn. (8.10.28).

8.6. Consider a classical system with two degrees of freedom $x$ and $y$ described by a potential energy
\[
U(x, y) = \frac{U_0}{a^4}\left(x^2 - a^2\right)^2 + \frac{1}{2}ky^2 + \lambda xy,
\]
and consider a process in which $x$ is moved from the position $x = -a$ to the position $x = 0$.

a. Calculate the Helmholtz free-energy difference $\Delta A$ for this process in a canonical ensemble.

b. Consider now an irreversible process in which the ensemble is frozen in time and, in each member of the ensemble, $x$ is moved instantaneously from $x = -a$ to $x = 0$, i.e., the value of $y$ remains fixed in each ensemble member during this process. The work performed on each system in the ensemble is related to the change in potential energy in this process by $W = U(0, y) - U(-a, y)$ (see eqn. (1.4.2)). By performing the average of $W$ over the initial ensemble, that is, an ensemble in which $x = -a$ for each member of the ensemble, show that $\langle W \rangle > \Delta A$.

c. Now perform the average of $\exp(-\beta W)$ for the work in part b using the same initial ensemble and show that the Jarzynski equality $\langle \exp(-\beta W) \rangle = \exp(-\beta \Delta A)$ holds.

8.7. Calculate the unbiasing ($Z(r)$) and curvature ($G(r)$) factors (see eqns. (8.7.20) and (8.7.31)) in the blue moon ensemble method for the following constraints:

a. a distance between two positions $r_1$ and $r_2$;

b. the difference of distances between $r_1$ and $r_2$ and $r_1$ and $r_3$, i.e., $\sigma = |r_1 - r_2| - |r_1 - r_3|$;

c. the bend angle between the three positions $r_1$, $r_2$, and $r_3$. Treat $r_1$ as the central position;

∗d. the dihedral angle involving the four positions $r_1$, $r_2$, $r_3$, and $r_4$.

8.8. For the enzyme-inhibitor binding free-energy calculation illustrated in Fig. 8.1, describe, in detail, the algorithm that would be needed to perform the calculation along the indirect path. What are the potential energy functions that would be needed to describe each endpoint?

∗8.9. a. Write a program to perform an adiabatic free-energy dynamics calculation of the free-energy profile $A(x)$ corresponding to the potential in Problem 8.6. Use the following values in your program: $a = 1$, $U_0 = 5$, $kT_y = 1$, $kT_x = 5$, $m_y = 1$, $m_x = 1000$, $\lambda = 2.878$. Use separate Nosé-Hoover chains to control the $x$ and $y$ temperatures.

b. Use your program to perform the histogram test of Section 8.12. Does your histogram peak at $p = 1/2$?


8.10. Write adiabatic dynamics and thermodynamic integration codes to generate the $\lambda$ free-energy profile of Fig. 8.2 using the switches $f(\lambda) = (\lambda^2 - 1)^4$ and $g(\lambda) = ((\lambda - 1)^2 - 1)^4$. In your adiabatic dynamics code, use $kT_\lambda = 0.3$, $kT = 1$, $m_\lambda = 250$, $m = 1$. For the remaining parameters, take $\omega_x = 1$, $\omega_y = 2$, and $\kappa = 1$.

∗8.11. Develop a weighted histogram procedure to obtain the free-energy derivative $dA/dq_i$ at a set of integration points $q_i$ starting with eqn. (8.8.22). Describe the difference between your algorithm and that corresponding to the original WHAM procedure for obtaining $A_k$.



8.12. In this problem, we will illustrate how a simple change of integration variables in the partition function can be used to create an enhanced sampling method. The approach was originally introduced by Zhu et al. (2002) and later enhanced by Minary et al. (2007). Consider the double-well potential
\[
U(x) = \frac{U_0}{a^4}\left(x^2 - a^2\right)^2.
\]
The configurational partition function is
\[
Z(\beta) = \int dx \, e^{-\beta U(x)}.
\]

a. Consider the change of variables $q = f(x)$. Assume that the inverse $x = f^{-1}(q) \equiv g(q)$ exists. Show that the partition function can be expressed as an integral of the form
\[
Z(\beta) = \int dq \, e^{-\beta \phi(q)},
\]
and give an explicit form for the potential $\phi(q)$.

b. Now consider the transformation
\[
q = f(x) = \int_{-a}^{x} dy \, e^{-\beta \tilde{U}(y)}
\]
for $-a \le x \le a$ and $q = x$ for $|x| > a$, with $\tilde{U}(x)$ a continuous potential energy function. This transformation is known as a spatial-warping transformation (Zhu et al., 2002; Minary et al., 2007). Show that $f(x)$ is a monotonically increasing function of $x$ and, therefore, that $f^{-1}(q)$ exists. Write down the partition function that results from this transformation.

c. If the function $\tilde{U}(x)$ is chosen to be $\tilde{U}(x) = U(x)$ for $-a \le x \le a$ and $\tilde{U}(x) = 0$ for $|x| > a$, then the function $\phi(x)$ is a single-well potential energy function. Sketch a plot of $q$ vs. $x$, and compare the shape of $\phi(x)$ as a function of $x$ to $\phi(q)$ as a function of $q$.

d. Argue, therefore, that a Monte Carlo calculation carried out based on $\phi(q)$, or a molecular dynamics calculation performed using the Hamiltonian $H(q, p) = p^2/2m + \phi(q)$, leads to an enhanced sampling algorithm for high barriers, and show that the same equilibrium and thermodynamic properties will result.
Hint: From the plot of $q$ vs. $x$, argue that a small change in $q$ leads to a change in $x$ large enough to move it from one well of $U(x)$ to the other.

e. Develop a Monte Carlo approach for sampling the distribution function
\[
P(q) = \frac{1}{Z} e^{-\beta \phi(q)}
\]
from part d.

f. Derive molecular dynamics equations of motion, including full expressions for the force on $q$ using the chain rule on the derivatives $(dU/dx)(dx/dq)$ and $(d\tilde{U}/dx)(dx/dq)$, and develop a numerical procedure for obtaining these forces.
Hint: Consider expanding $\exp[-\beta \tilde{U}(x)]$ in a set of orthogonal polynomials such as Legendre polynomials $P_l(\alpha(x))$ with $\alpha(x) \in [-1, 1]$. What should the function $\alpha(x)$ be?$^5$

8.13. It has been suggested (Peters et al., 2007) that the committor probability $p_B(r)$ for a single reaction coordinate $q(r)$ can be approximated by a function $\pi_B(q(r))$ that depends on $r$ only through $q(r)$.

a. What are the advantages and disadvantages of such an approximation?

b. Suppose that $\pi_B(q(r))$ can be accurately fit to the following functional form
\[
\pi_B(q(r)) = \frac{1 + \tanh(q(r))}{2}.
\]
Is $q(r)$ a good reaction coordinate? Why or why not?

8.14. Starting from eqn. (8.10.7), derive an expression for the Hessian of the free energy
\[
H_{\alpha\beta} = \frac{\partial^2 A}{\partial s_\alpha \partial s_\beta}
\]
and, in particular, show that it can be expressed in terms of a covariance matrix involving $\kappa_\alpha(f_\alpha(r) - s_\alpha)$, $\kappa_\beta(f_\beta(r) - s_\beta)$, and averages of these quantities and their products.

5 It is worth noting that machine learning techniques (cf. Chapter 17) exist for generating transformations $x \to q$ in multiple dimensions such that $P(q)$ is smoother and more easily sampled than $P(x)$. These are known as Boltzmann generators (Noe et al., 2019; Tuckerman, 2019), generated from a type of model known as a normalizing flow. They rely on ideas similar to those addressed in this problem.


8.15. Suppose we have generated a multidimensional free-energy surface $A(s_1, \ldots, s_n)$ at temperature $T$ as a function of $n$ variables but we wish to project it down to a function $\tilde{A}(s_1, s_2)$ for presentation purposes. Derive a formula for obtaining the projected function $\tilde{A}(s_1, s_2)$ from the full free-energy function $A(s_1, \ldots, s_n)$.

9 Quantum mechanics

9.1

Introduction: Waves and particles

The first half of the twentieth century witnessed a revolution in physics. Classical mechanics, with its deterministic world view, was shown not to provide a correct description of nature. New experiments were looking deeper into the microscopic world than had been hitherto possible, and the results could not be rationalized using classical concepts. Consequently, a paradigm shift occurred: classical determinism needed to be overthrown, and a new perspective on the physical world emerged. One of the earliest breakthroughs concerned the blackbody radiation problem. The classical theory of electromagnetism predicts that the intensity of electromagnetic radiation from a blackbody at wavelength $\lambda$ is proportional to $1/\lambda^2$, which diverges as $\lambda \to 0$. Experimentally, the intensity is observed to vanish as $\lambda \to 0$. In 1901, the German physicist Max Planck postulated that the radiated energy cannot take on any value but is quantized according to the formula $E = nh\nu$, where $\nu$ is the frequency of the radiation, $n$ is an integer, and $h$ is a constant. With this simple hypothesis, Planck correctly predicted the shape of the intensity versus wavelength curves and determined the value of $h$. The constant $h$ is now known as Planck’s constant and has the value $h = 6.626 \times 10^{-34}$ J·s. A second breakthrough concerned the so-called photoelectric effect. It is observed that when light of sufficiently high frequency impinges on a metallic surface, electrons are ejected from the surface with a residual kinetic energy that depends on the light’s frequency. According to classical mechanics, the energy carried by an electromagnetic wave is proportional to the square of its amplitude, independent of its frequency, which contradicts the observation. Planck’s hypothesis, however, implies that the impinging light carries energy proportional to its frequency.
Applying Planck’s idea, Albert Einstein was able to provide a correct explanation of the photoelectric effect in 1905 and was awarded the Nobel prize for this work in 1921. The photoelectric effect also suggests that, in the context of the experiment, the impinging light behaves less like a wave and more like a massless “particle” that is able to transfer energy to the electrons. Finally, a fascinating experiment carried out by Davisson and Germer in 1927 investigated the interference patterns registered by a photosensitive detector when electrons impinge on a diffraction grating. This experiment revealed an interference pattern very similar to that produced when coherent light impinges on a diffraction grating, suggesting that the electrons in such a diffraction experiment behave less like particles and more like waves. Moreover, where an individual electron strikes the detector cannot be predicted. All that can be predicted is the probability that the electron will strike


the detector in some small region. This observation suggests an object that exhibits “wave-like” behavior rather than one that follows a precise particle-like trajectory predictable from a deterministic equation of motion. The notions of energy quantization, unpredictability of individual experimental outcomes, and particle-wave duality are aspects of the modern theory of the microscopic world known as quantum mechanics. Yet even this particle/wave description is incomplete. For what exactly does it mean for a particle to behave like a wave and a wave to behave like a particle? To answer this, we need to specify more precisely what we mean by “wave” and “particle.” In general, a wave is a type of field describing something that can vary over an extended region of space as a function of time. Examples are the displacement of a plucked string over its length or the air pressure inside of an organ pipe. Mathematically, a wave is described by an amplitude, $A(x, t)$ (in one dimension), that depends on both space and time. In classical wave mechanics, the form of $A(x, t)$ is determined by solving the (classical) wave equation. Quantum theory posits that the probability of an experimental outcome is determined from a particular “wave” that assigns to each possible outcome a (generally complex) probability amplitude $\Psi$. If, for example, we are interested in the probability that a particle will strike a detector at a location $x$ at time $t$, then there is an amplitude $\Psi(x, t)$ for this outcome, which is also referred to as the wave function. From the amplitude, the probability that the particle will strike the detector in a small region $dx$ about the point $x$ at time $t$ is given by $P(x, t)\,dx = |\Psi(x, t)|^2\,dx$. Here,
\[
P(x, t) = |\Psi(x, t)|^2 \tag{9.1.1}
\]

is known as the probability density or probability distribution. Such probability amplitudes are fundamental in quantum mechanics because they directly relate to the possible outcomes of experiments and lead to predictions of average quantities obtained over many trials of an experiment. These averages are known as expectation values. The spatial probability amplitude, $\Psi(x, t)$, is determined by a particular type of wave equation known as the Schrödinger equation. As we will see shortly, the framework of quantum mechanics describes how to compute the probabilities and associated expectation values of any type of physical observable beyond the spatial probability distribution. We now seek to understand what is meant by “particle” in quantum mechanics. A particularly elegant description was provided by Richard Feynman in the context of his path integral formalism (to be discussed in detail in Chapter 12). As we noted above, the classical notion that particles follow precise, deterministic trajectories breaks down in the microscopic realm. Indeed, if an experiment can have many possible outcomes with different associated probabilities, then it should follow that a particle can follow many different possible paths between the initiation and detection points of an experimental setup. Moreover, it must trace all of these paths simultaneously! In order to build up a probability distribution $P(x, t)$, the different paths that a particle can follow will have different associated weights or amplitudes. Since the particle evolves unobserved between initiation and detection, it is impossible to conclude that a particle follows a particular path from one point to the other, and according to Feynman’s concept, physical predictions can only be made by summing over all possible paths that lead from the initiation point to the detection point. This sum over paths is referred to as the Feynman path integral. As we will see in Chapter 12, the classical path, i.e., the path predicted by extremizing the classical action, is the most probable path, thereby indicating that classical mechanics naturally emerges as an approximation to quantum mechanics.

Proceeding as we did for classical statistical mechanics, this chapter will review the basic principles of quantum mechanics. In the next chapter, we will lay out the statistical mechanical rules for connecting the quantum description of the microscopic world to macroscopic observables. These chapters are by no means meant to be an overview of the entire field of quantum mechanics, which could (and does) fill entire books. Here, we seek only to develop the quantum-mechanical concepts that we will use in our treatment of quantum statistical mechanics.
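As a small numerical illustration of eqn. (9.1.1), the sketch below builds a complex amplitude on a grid, forms $P(x) = |\Psi(x)|^2$ at a fixed time, and uses it to compute a detection probability and an expectation value. The Gaussian wave packet and its parameters (`x0`, `sigma`, `k0`), as well as the detector window, are assumptions chosen purely for illustration.

```python
import numpy as np

# Spatial grid (assumed wide enough that the packet vanishes at the edges)
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]

# Hypothetical Gaussian wave packet: center x0, width sigma, wave number k0
x0, sigma, k0 = 1.0, 0.7, 3.0
psi = (2.0 * np.pi * sigma**2) ** (-0.25) * np.exp(
    -(x - x0) ** 2 / (4.0 * sigma**2) + 1j * k0 * x
)

# Probability density of eqn. (9.1.1); the complex phase drops out
P = np.abs(psi) ** 2

# Total probability (should be close to 1 for a normalized amplitude)
norm = np.sum(P) * dx

# Probability of detection in the window 0.5 <= x <= 1.5
window = (x >= 0.5) & (x <= 1.5)
prob_window = np.sum(P[window]) * dx

# Expectation value of the position, the average over many detections
mean_x = np.sum(x * P) * dx
```

The phase factor $e^{ik_0 x}$ changes $\Psi$ but not $P$, which is one way to see that the amplitude carries more information than the spatial probability density alone.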

9.2

Review of the fundamental postulates of quantum mechanics

The fundamental postulates and definitions of quantum mechanics address the following questions:

1. How is the physical state of a system described?
2. How are physical observables represented?
3. What are the possible outcomes of a given experiment?
4. What is the expected result when an average over a very large number of observations is performed?
5. How does the physical state of a system evolve in time?
6. What types of measurements are compatible with each other?

Let us begin by detailing how we describe the physical state of a system.

9.2.1

The state vector

In quantum theory, it is not possible to determine the precise outcome of a given single experimental measurement. Thus, unlike in classical mechanics, where the microscopic state of a system is specified by providing a definite set of coordinates and velocities of the particles at any time $t$, the microscopic state of a system in quantum mechanics is specified in terms of the probability amplitudes for the possible outcomes of different measurements made on the system. Since we must be able to describe any type of measurement, the specification of the amplitudes remains abstract until a particular measurement is explicitly considered. The procedure for converting a set of abstract amplitudes to probabilities associated with the outcomes of particular measurements will be given shortly. For now, let us choose a mathematically useful construct for listing these amplitudes. Such a list is conveniently represented as a vector of complex numbers, which we can specify as a column vector:
\[
|\Psi\rangle = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \end{pmatrix}. \tag{9.2.1}
\]


We have introduced a special type of notation for this column vector, “$|\Psi\rangle$” with half of an angle bracket, which is called a Dirac ket vector, after its inventor, the English physicist P. A. M. Dirac (1902–1984). This notation is now standard in quantum mechanics. The components of $|\Psi\rangle$ are complex probability amplitudes $\alpha_k$ that are related to the corresponding probabilities by
\[
P_k = |\alpha_k|^2. \tag{9.2.2}
\]

The vector $|\Psi\rangle$ is called the state vector (conceptually, it bears some similarity to the phase-space vector used to hold the physical state in classical mechanics). The dimension of $|\Psi\rangle$ must be equal to the number of possible states in which the system might be observed. For example, if the physical system were a coin, then we might observe the coin in a “heads-up” or a “tails-up” state, and a coin-toss experiment is needed to realize one of these states. In this example, the dimension of $|\Psi\rangle$ is 2, and $|\Psi\rangle$ could be represented as follows:
\[
|\Psi\rangle = \begin{pmatrix} \alpha_H \\ \alpha_T \end{pmatrix}, \tag{9.2.3}
\]

where $\alpha_H$ and $\alpha_T$ are the (complex) amplitudes for heads-up and tails-up states, respectively. Since the sum of all the probabilities must be unity,
\[
\sum_k P_k = 1, \tag{9.2.4}
\]
it follows that
\[
\sum_k |\alpha_k|^2 = 1. \tag{9.2.5}
\]
In the coin-toss example, an unbiased coin would have amplitudes $\alpha_H = \alpha_T = 1/\sqrt{2}$. Dirac ket vectors live in a vector space known as the Hilbert space, which we will denote as $\mathcal{H}$. A complementary or dual space to $\mathcal{H}$ can also be defined in terms of vectors of the form
\[
\langle\Psi| = \begin{pmatrix} \alpha_1^* & \alpha_2^* & \alpha_3^* & \cdots \end{pmatrix}, \tag{9.2.6}
\]
which is known as a Dirac bra vector. Hilbert spaces have numerous interesting properties; however, the most important one for our present purposes is the inner or scalar product between $\langle\Psi|$ and $|\Psi\rangle$. This product is defined to be
\[
\langle\Psi|\Psi\rangle = \sum_k \alpha_k^* \alpha_k = \sum_k |\alpha_k|^2. \tag{9.2.7}
\]

Note that the inner product requires both a bra vector and a ket vector. The terms “bra” and “ket” are meant to denote two halves of a “bracket” ($\langle \cdots | \cdots \rangle$), which is formed when an inner product is constructed. Combining eqn. (9.2.7) with (9.2.5), we see that $|\Psi\rangle$ is a unit vector since $\langle\Psi|\Psi\rangle = 1$.
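The coin-toss state and the inner product of eqn. (9.2.7) translate directly into a few lines of NumPy. In this minimal sketch, the relative phase between the two amplitudes and the second vector `phi` are arbitrary choices made for illustration; `np.vdot` is used because it complex-conjugates its first argument, which is what forming the bra requires.

```python
import numpy as np

# Ket for an unbiased coin: both amplitudes have modulus 1/sqrt(2).
# The relative phase (here a factor of i) is an arbitrary illustrative choice.
psi = np.array([1.0, 1.0j]) / np.sqrt(2.0)

# Probabilities from amplitudes, P_k = |alpha_k|^2
probs = np.abs(psi) ** 2

# Inner product <Psi|Psi>; vdot conjugates the first argument (the bra)
norm = np.vdot(psi, psi)

# A second, hypothetical vector to illustrate a general bracket
phi = np.array([1.0, 0.0], dtype=complex)
bracket_psi_phi = np.vdot(psi, phi)   # <psi|phi>
bracket_phi_psi = np.vdot(phi, psi)   # <phi|psi>
```

Exchanging the two vectors in a bracket complex-conjugates the result, which can be checked by comparing `bracket_phi_psi` with `np.conj(bracket_psi_phi)`.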


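A more general inner product between two Hilbert-space vectors
\[
|\phi\rangle = \begin{pmatrix} \phi_1 \\ \phi_2 \\ \phi_3 \\ \vdots \end{pmatrix}, \qquad
|\psi\rangle = \begin{pmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \\ \vdots \end{pmatrix} \tag{9.2.8}
\]
is defined to be
\[
\langle\psi|\phi\rangle = \sum_k \psi_k^* \phi_k. \tag{9.2.9}
\]
Note that $\langle\phi|\psi\rangle = \langle\psi|\phi\rangle^*$.

9.2.2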

Representation of physical observables

In quantum mechanics, physical observables are represented by linear Hermitian operators, which act on the vectors of the Hilbert space (we will see shortly why the operators must be Hermitian). When the vectors of $\mathcal{H}$ and its dual space are represented as ket and bra vectors, respectively, such operators are represented by matrices. Thus, if $\hat{A}$ is an operator corresponding to a physical observable, we can represent it as
\[
\hat{A} = \begin{pmatrix}
A_{11} & A_{12} & A_{13} & \cdots \\
A_{21} & A_{22} & A_{23} & \cdots \\
A_{31} & A_{32} & A_{33} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix}. \tag{9.2.10}
\]
(The overhat notation is commonly used in quantum mechanics to denote Hilbert-space operators.) That $\hat{A}$ must be a Hermitian operator means that its matrix elements satisfy
\[
A_{ji}^* = A_{ij}. \tag{9.2.11}
\]
The Hermitian conjugate of $\hat{A}$ is defined as
\[
\hat{A}^\dagger = \begin{pmatrix}
A_{11}^* & A_{21}^* & A_{31}^* & \cdots \\
A_{12}^* & A_{22}^* & A_{32}^* & \cdots \\
A_{13}^* & A_{23}^* & A_{33}^* & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix}, \tag{9.2.12}
\]
and the requirement that $\hat{A}$ be Hermitian means $\hat{A}^\dagger = \hat{A}$. Since the vectors of $\mathcal{H}$ are column vectors, it is clear that an operator $\hat{A}$ can act on a vector $|\phi\rangle$ to yield a new vector $|\phi'\rangle$ via $\hat{A}|\phi\rangle = |\phi'\rangle$, which is a simple matrix-vector product.

9.2.3

Possible outcomes of a physical measurement

Quantum mechanics postulates that if a measurement is performed on a physical observable represented by an operator $\hat{A}$, the result must be one of the eigenvalues of $\hat{A}$. From this postulate, we now see why observables must be represented by Hermitian operators: A physical measurement must yield a real number, and Hermitian operators have strictly real eigenvalues. In order to prove this, consider the eigenvalue problem for $\hat{A}$ cast in Dirac notation:
\[
\hat{A}|a_k\rangle = a_k|a_k\rangle, \tag{9.2.13}
\]
where $|a_k\rangle$ denotes an eigenvector of $\hat{A}$ with eigenvalue $a_k$. For a general $\hat{A}$, the corresponding equation cast in Dirac bra form would be
\[
\langle a_k|\hat{A}^\dagger = \langle a_k|a_k^*. \tag{9.2.14}
\]
However, since $\hat{A}^\dagger = \hat{A}$, this reduces to
\[
\langle a_k|\hat{A} = \langle a_k|a_k^*. \tag{9.2.15}
\]
Thus, if we multiply eqn. (9.2.13) by the bra vector $\langle a_k|$ and eqn. (9.2.15) by the ket vector $|a_k\rangle$, we obtain the following two equations:
\[
\begin{aligned}
\langle a_k|\hat{A}|a_k\rangle &= a_k \langle a_k|a_k\rangle \\
\langle a_k|\hat{A}|a_k\rangle &= a_k^* \langle a_k|a_k\rangle.
\end{aligned} \tag{9.2.16}
\]
Consistency between these two relations requires that $a_k = a_k^*$, which proves that the eigenvalues are real. Note that the operator $\hat{A}$ can be expressed in terms of its eigenvalues and eigenvectors as
\[
\hat{A} = \sum_k a_k |a_k\rangle\langle a_k|. \tag{9.2.17}
\]

The product $|a_k\rangle\langle a_k|$ is known as the outer or tensor product between the ket and bra vectors and could be used to project an arbitrary vector $|\phi\rangle$ along the direction of $|a_k\rangle$. Another important property of Hermitian operators is that their eigenvectors form a complete orthonormal set of vectors that span the Hilbert space. In order to prove orthonormality of the eigenvectors, we multiply eqn. (9.2.13) by the bra vector $\langle a_j|$, which gives
\[
\langle a_j|\hat{A}|a_k\rangle = a_k \langle a_j|a_k\rangle. \tag{9.2.18}
\]
On the other hand, if we start with the bra equation (remembering that $\hat{A} = \hat{A}^\dagger$ and $a_j = a_j^*$)
\[
\langle a_j|\hat{A} = a_j \langle a_j| \tag{9.2.19}
\]
and multiply by the ket vector $|a_k\rangle$, we obtain
\[
\langle a_j|\hat{A}|a_k\rangle = a_j \langle a_j|a_k\rangle. \tag{9.2.20}
\]
Subtracting eqn. (9.2.18) from (9.2.20) gives
\[
0 = (a_k - a_j)\langle a_j|a_k\rangle. \tag{9.2.21}
\]
If the eigenvalues of $\hat{A}$ are not degenerate, then for $k \neq j$, $a_k \neq a_j$, and it is clear that $\langle a_j|a_k\rangle = 0$. If $k = j$, then $(a_j - a_j) = 0$, and $\langle a_j|a_j\rangle$ can take on any value. This arbitrariness reflects the arbitrariness of the overall normalization of the eigenvectors of $\hat{A}$. The natural choice for this normalization is $\langle a_j|a_j\rangle = 1$, so that the eigenvectors of $\hat{A}$ are unit vectors. Therefore, the eigenvectors are orthogonal and have unit length; hence, they are orthonormal. If some of the eigenvalues of $\hat{A}$ are degenerate, we can choose the eigenvectors to be orthogonal by taking appropriate linear combinations of the degenerate eigenvectors using a procedure such as Gram-Schmidt orthogonalization to produce an orthogonal set (see Problem 9.1).

The last property we need to prove is completeness of the eigenvectors of $\hat{A}$. Since a rigorous proof is considerably more involved, we will simply sketch out the main points of the proof. Let $\mathcal{G}$ be the orthogonal complement space to $\mathcal{H}$. By this, we mean that any vector that lies entirely in $\mathcal{G}$ has no components along the axes of $\mathcal{H}$. Let $|b_j\rangle$ be a vector in $\mathcal{G}$. Since $\hat{A}$ is defined entirely in $\mathcal{H}$, matrix elements of the form $\langle b_j|\hat{A}|a_k\rangle$ and $\langle a_k|\hat{A}|b_j\rangle$ vanish. Thus, $\hat{A}|b_j\rangle$ has no components along any of the directions $|a_k\rangle$. As a consequence, the operator $\hat{A}$ maps vectors of $\mathcal{G}$ back into $\mathcal{G}$. This implies that $\hat{A}$ must have at least one eigenvector in $\mathcal{G}$. However, this conclusion contradicts our original assumption that $\mathcal{G}$ is the orthogonal complement to $\mathcal{H}$. Consequently, $\mathcal{G}$ must be a null space, which means that the eigenvectors of $\hat{A}$ span $\mathcal{H}$.$^1$

The most important consequence of the completeness relation is that an arbitrary vector $|\phi\rangle$ on the Hilbert space can be expanded in terms of the eigenvectors of any Hermitian operator. For the operator $\hat{A}$, we have
\[
|\phi\rangle = \hat{I}|\phi\rangle = \sum_k |a_k\rangle\langle a_k|\phi\rangle = \sum_k C_k |a_k\rangle, \tag{9.2.22}
\]
where the expansion coefficient $C_k$ is given by
\[
C_k = \langle a_k|\phi\rangle, \tag{9.2.23}
\]
and $|a_k\rangle\langle a_k|\phi\rangle$ is the projection of $|\phi\rangle$ along $|a_k\rangle$. Thus, to obtain the expansion coefficient $C_k$, we simply compute the inner product of the vector to be expanded with the eigenvector $|a_k\rangle$. Finally, we note that any function $g(\hat{A})$ will have the same eigenvectors as $\hat{A}$ with eigenvalues $g(a_k)$ satisfying
\[
g(\hat{A})|a_k\rangle = g(a_k)|a_k\rangle. \tag{9.2.24}
\]
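The properties derived above (real eigenvalues, orthonormal eigenvectors, the spectral form of eqn. (9.2.17), and the expansion of eqns. (9.2.22) and (9.2.23)) can be checked numerically for a finite-dimensional example. The sketch below is an illustration only: the 4×4 Hermitian matrix is generated at random, and the variable names are chosen here for convenience.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random Hermitian matrix A = (B + B^dagger)/2 (illustrative choice)
B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = 0.5 * (B + B.conj().T)

# A general eigenvalue solver makes no Hermiticity assumption, so it can be
# used to check that the eigenvalues come out real to round-off
max_imag = np.max(np.abs(np.linalg.eigvals(A).imag))

# eigh exploits Hermiticity; the columns of evecs are the eigenvectors |a_k>
evals, evecs = np.linalg.eigh(A)

# Orthonormality: the matrix of overlaps <a_j|a_k> should be the identity
overlap = evecs.conj().T @ evecs

# Spectral decomposition, eqn. (9.2.17): A = sum_k a_k |a_k><a_k|
A_rebuilt = sum(a * np.outer(v, v.conj()) for a, v in zip(evals, evecs.T))

# Expansion of an arbitrary vector, eqns. (9.2.22)-(9.2.23)
phi = rng.normal(size=4) + 1j * rng.normal(size=4)
C = evecs.conj().T @ phi      # C_k = <a_k|phi>
phi_rebuilt = evecs @ C       # sum_k C_k |a_k>

# A function of the operator, eqn. (9.2.24), here g(A) = A^2
g_on_first = (A @ A) @ evecs[:, 0]
```

Comparing `g_on_first` with `evals[0]**2 * evecs[:, 0]` illustrates eqn. (9.2.24) for the particular choice $g(\hat{A}) = \hat{A}^2$.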

1 Note that the argument pertains to finite-dimensional discrete vector spaces. In Section 9.2.5, continuous vector spaces will be introduced, for which such proofs are considerably more subtle.

Now that we have derived the properties of Hermitian operators and their eigenvector/eigenvalue spectra, we next consider several other aspects of the measurement process in quantum mechanics. We stated that the result of a measurement of an observable associated with a Hermitian operator $\hat{A}$ must yield one of its eigenvalues. If the state vector of a system is $|\Psi\rangle$, then the probability amplitude that a specific eigenvalue $a_k$ will be obtained in a measurement of $\hat{A}$ is determined by taking the inner product of the corresponding eigenvector $|a_k\rangle$ with the state vector,
\[
\alpha_k = \langle a_k|\Psi\rangle, \tag{9.2.25}
\]
and the corresponding probability is $P_k = |\alpha_k|^2$. Interestingly, $\{\alpha_k\}$ are just the coefficients of an expansion of $|\Psi\rangle$ in the eigenvectors of $\hat{A}$:
\[
|\Psi\rangle = \sum_k \alpha_k |a_k\rangle. \tag{9.2.26}
\]
Thus, the more aligned the state vector is with a given eigenvector of $\hat{A}$, the greater is the probability of obtaining the corresponding eigenvalue in a given measurement. Clearly, if $|\Psi\rangle$ is one of the eigenvectors of $\hat{A}$, then the corresponding eigenvalue must be obtained with 100% probability, since no other result is possible in this state.

Although we have not yet discussed the time evolution of the state vector, one aspect of this evolution can be established immediately. According to our discussion, when a measurement is made and yields a particular eigenvalue of $\hat{A}$, then immediately following the measurement, the state vector must somehow “collapse” onto the corresponding eigenvector, for at that moment, we know with 100% certainty that a particular eigenvalue was obtained as the result. Therefore, the act of measurement changes the state of the system and its subsequent time development. Moreover, this change is abrupt and discontinuous.$^2$

Finally, suppose a measurement of $\hat{A}$ is performed many times, with each repetition carried out on the same state $|\Psi\rangle$. If we average over the outcomes of these measurements, what is the result? We know that each measurement yields a result $a_k$ with probability $|\alpha_k|^2$. The average over these trials yields the expectation value of $\hat{A}$ defined by
\[
\langle\hat{A}\rangle = \langle\Psi|\hat{A}|\Psi\rangle. \tag{9.2.27}
\]
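The measurement rules of eqns. (9.2.25) through (9.2.27) can be illustrated with a two-state example. In the sketch below, the Hermitian matrix `A`, the state `psi`, and the number of simulated measurements are arbitrary assumptions; the repeated measurements are mimicked by drawing eigenvalues at random with probabilities $|\alpha_k|^2$, each draw representing a fresh copy of the same state.

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical 2x2 Hermitian observable and a normalized state vector
A = np.array([[1.0, 1.0 - 1.0j],
              [1.0 + 1.0j, -1.0]])
psi = np.array([0.6, 0.8j])

# Eigenvalues a_k and eigenvectors |a_k> (columns of vecs)
a, vecs = np.linalg.eigh(A)

# Amplitudes alpha_k = <a_k|Psi>, eqn. (9.2.25), and probabilities |alpha_k|^2
alpha = vecs.conj().T @ psi
P = np.abs(alpha) ** 2

# Expectation value two ways: <Psi|A|Psi> and sum_k a_k |alpha_k|^2
expect_direct = np.vdot(psi, A @ psi).real
expect_sum = np.sum(a * P)

# Simulated repeated measurements on identically prepared states
outcomes = rng.choice(a, size=100_000, p=P / P.sum())
expect_sampled = outcomes.mean()
```

Each entry of `outcomes` is one of the eigenvalues of `A`, never an intermediate value; only the average over many trials approaches the expectation value.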

In order to verify this definition, consider, again, the expansion in eqn. (9.2.26). Substituting eqn. (9.2.26) into eqn. (9.2.27) gives X ˆ = ˆ ki hAi α∗j αk haj |A|a j,k

=

X j,k

=

X

α∗j αk ak haj |ak i α∗j αk ak δjk

j,k

=

X j

aj |αj |2 .

(9.2.28)
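The equality of the two forms of the expectation value in eqn. (9.2.28) can be checked numerically. The following is a minimal sketch, assuming a hypothetical two-level observable (the matrix `A`, the angle `theta`, and all function names are illustrative choices, not taken from the text). It computes $\langle\Psi|\hat{A}|\Psi\rangle$ directly and as $\sum_j a_j|\alpha_j|^2$.

```python
import math

# Hypothetical two-level observable: A = [[0, 1], [1, 0]] has eigenvalues
# +1 and -1 with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
A = [[0.0, 1.0], [1.0, 0.0]]
eigenvalues = [1.0, -1.0]
s = 1.0 / math.sqrt(2.0)
eigenvectors = [[s, s], [s, -s]]

# An arbitrary normalized state |Psi> = (cos(theta), sin(theta)).
theta = 0.3
psi = [math.cos(theta), math.sin(theta)]

def inner(u, v):
    """Inner product <u|v>, conjugating the components of u."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def apply(M, v):
    """Matrix-vector product M|v>."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# Amplitudes alpha_k = <a_k|Psi> and probabilities P_k = |alpha_k|^2.
alphas = [inner(e, psi) for e in eigenvectors]
probs = [abs(a) ** 2 for a in alphas]

# Expectation value computed two ways.
direct = inner(psi, apply(A, psi)).real                     # <Psi|A|Psi>
weighted = sum(a * p for a, p in zip(eigenvalues, probs))   # sum_j a_j |alpha_j|^2
```

For this particular choice both `direct` and `weighted` reduce analytically to $\sin(2\theta)$, and the probabilities sum to one.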

The last line of eqn. (9.2.28) shows that the expectation value is determined by summing the possible outcomes of a measurement of $\hat{A}$ (the eigenvalues $a_j$) times the probability $|\alpha_j|^2$ that each of these results is obtained. This is precisely what we would expect the average over many trials to yield as the number of trials goes to infinity, so that every possible outcome is ultimately obtained, including those with very low probabilities. The quantity $|\alpha_j|^2$ gives the fraction of all the measurements that yield $a_j$ as a result.

² It is important to note that the notion of a "collapsing" wave function belongs to one of several interpretations of quantum mechanics and the measurement process, known as the Copenhagen interpretation. Another interpretation, the so-called "many-worlds" interpretation, states that our universe is part of an essentially infinite "multiverse"; when $\hat{A}$ is measured, a different outcome is obtained in each member of the multiverse. It has been suggested that a more fundamental theory of the universe's origin (e.g. string theory or loop quantum gravity) will encode a more fundamental interpretation. Many interesting articles and books exist on this subject for curious readers who wish to explore it further.

We noted above that the act of measuring an operator $\hat{A}$ causes a "collapse" of the state vector onto one of the eigenvectors of $\hat{A}$. Given this, it follows that no experiment can be designed that can measure two observables simultaneously unless the two observables have a common set of eigenvectors. This is simply a consequence of the fact that the state vector cannot simultaneously collapse onto two different eigenvectors. Suppose two observables represented by Hermitian operators $\hat{A}$ and $\hat{B}$ have a common set of eigenvectors $\{|a_k\rangle\}$ so that the two eigenvalue equations

$$\hat{A}|a_k\rangle = a_k|a_k\rangle, \qquad \hat{B}|a_k\rangle = b_k|a_k\rangle \tag{9.2.29}$$

are satisfied. It is then clear that

$$\begin{aligned}
\hat{A}\hat{B}|a_k\rangle &= a_k b_k |a_k\rangle \\
\hat{B}\hat{A}|a_k\rangle &= b_k a_k |a_k\rangle \\
\hat{A}\hat{B}|a_k\rangle &= \hat{B}\hat{A}|a_k\rangle \\
(\hat{A}\hat{B} - \hat{B}\hat{A})|a_k\rangle &= 0.
\end{aligned} \tag{9.2.30}$$
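The statement in eqn. (9.2.30) can be illustrated with small matrices. The sketch below assumes three hypothetical 2×2 Hermitian matrices chosen for illustration: `A` and `B` share the eigenvectors $(1, \pm 1)/\sqrt{2}$, while `C` does not. It forms the difference of the two operator orderings for each pair.

```python
def matmul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(X, Y):
    """Return XY - YX."""
    XY, YX = matmul(X, Y), matmul(Y, X)
    n = len(X)
    return [[XY[i][j] - YX[i][j] for j in range(n)] for i in range(n)]

# A and B are both diagonal in the basis (1, 1)/sqrt(2), (1, -1)/sqrt(2):
# A has eigenvalues 3 and 1, B has eigenvalues 3 and -3.
A = [[2, 1], [1, 2]]
B = [[0, 3], [3, 0]]
# C is diagonal in a different basis, (1, 0) and (0, 1).
C = [[1, 0], [0, -1]]

comm_AB = commutator(A, B)   # vanishes: common eigenvectors
comm_AC = commutator(A, C)   # does not vanish: no common eigenbasis
```

Here `comm_AB` is the zero matrix, whereas `comm_AC` is `[[0, -2], [2, 0]]`, so `A` and `C` cannot be diagonalized in the same basis.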

Since eqn. (9.2.30) holds for every eigenvector $|a_k\rangle$, none of which is a null vector, and since these eigenvectors form a complete basis, $\hat{A}\hat{B} - \hat{B}\hat{A}$ must vanish as an operator. The operator

$$\hat{A}\hat{B} - \hat{B}\hat{A} \equiv [\hat{A}, \hat{B}] \tag{9.2.31}$$

is known as the commutator between $\hat{A}$ and $\hat{B}$. If the commutator between two operators vanishes, then the two operators have a common set of eigenvectors and hence can be simultaneously measured. Conversely, two operators $\hat{A}$ and $\hat{B}$ that do not commute ($[\hat{A}, \hat{B}] \neq 0$) are said to be incompatible observables and cannot be simultaneously measured.

9.2.4 Time evolution of the state vector

So far, we have referred to the state vector $|\Psi\rangle$ as a static object. In actuality, the state vector is dynamic, and one of the postulates of quantum mechanics specifies how the time evolution is determined. Suppose the system is characterized by a Hamiltonian operator $\hat{H}$. The eigenvalues of $\hat{H}$ give the allowed energy levels of the system. (How the Hamiltonian is obtained for a quantum mechanical system when the classical Hamiltonian is known will be described in the next subsection.) As in classical mechanics, the quantum Hamiltonian plays the special role of determining the time evolution of the physical state. Quantum mechanics postulates that in the absence of a measurement, the time dependence of the state vector $|\Psi(t)\rangle$ is determined by

$$i\hbar \frac{\partial}{\partial t}|\Psi(t)\rangle = \hat{H}|\Psi(t)\rangle, \tag{9.2.32}$$

which is known as the time-dependent Schrödinger equation after the Austrian physicist Erwin Schrödinger (1887–1961) (for which he was awarded the Nobel Prize in 1933). Here $\hbar$ is related to Planck's constant by $\hbar = h/2\pi$ and is also referred to as Planck's constant. Since eqn. (9.2.32) is a first-order differential equation, it must be solved subject to an initial condition $|\Psi(0)\rangle$. Interestingly, eqn. (9.2.32) bears a marked mathematical similarity to the classical equation that determines the evolution of the phase-space vector, $\dot{\mathrm{x}} = iL\mathrm{x}$. The Schrödinger equation can be formally solved to yield the evolution

$$|\Psi(t)\rangle = e^{-i\hat{H}t/\hbar}|\Psi(0)\rangle. \tag{9.2.33}$$

Again, note the formal similarity to the classical relation $\mathrm{x}(t) = \exp(iLt)\mathrm{x}(0)$. The unitary operator

$$\hat{U}(t) = e^{-i\hat{H}t/\hbar} \tag{9.2.34}$$

is known as the time evolution operator or the quantum propagator. The term unitary means that $\hat{U}^\dagger(t)\hat{U}(t) = \hat{I}$. Consequently, the action of $\hat{U}(t)$ on the state vector cannot change the magnitude of the vector, only its direction. This is crucial, as $|\Psi(t)\rangle$ must always be normalized to 1 in order that it generate proper probabilities.

Suppose the eigenvectors $|E_k\rangle$ and energy levels or eigenvalues $E_k$ of the Hamiltonian are known from the eigenvalue equation

$$\hat{H}|E_k\rangle = E_k|E_k\rangle. \tag{9.2.35}$$

It is then straightforward to show that

$$\begin{aligned}
|\Psi(0)\rangle &= \sum_k |E_k\rangle\langle E_k|\Psi(0)\rangle \\
|\Psi(t)\rangle &= \sum_k e^{-iE_k t/\hbar}|E_k\rangle\langle E_k|\Psi(0)\rangle.
\end{aligned} \tag{9.2.36}$$
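Equation (9.2.36) says that, in the energy eigenbasis, time evolution simply multiplies each initial amplitude by a phase factor. A minimal sketch of this, assuming units with $\hbar = 1$ and a hypothetical two-level spectrum (the values in `energies` and `c0` are illustrative only):

```python
import cmath

HBAR = 1.0                 # assumption: units in which hbar = 1

energies = [0.5, 1.5]      # hypothetical eigenvalues E_k
c0 = [0.6, 0.8]            # initial amplitudes <E_k|Psi(0)>; 0.36 + 0.64 = 1

def amplitudes_at(t):
    """Amplitudes <E_k|Psi(t)> = exp(-i E_k t / hbar) <E_k|Psi(0)>."""
    return [cmath.exp(-1j * E * t / HBAR) * c for E, c in zip(energies, c0)]

def norm_squared(coeffs):
    """<Psi|Psi> expressed in an orthonormal basis."""
    return sum(abs(c) ** 2 for c in coeffs)

ct = amplitudes_at(2.0)
```

Because each factor $e^{-iE_kt/\hbar}$ has unit modulus, `norm_squared(ct)` remains equal to 1 and the individual energy probabilities `abs(ct[k])**2` do not change in time, which is the unitarity property noted above.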

If we know the initial amplitudes for obtaining the various eigenvalues of $\hat{H}$ in an experiment designed to measure the energy, the time evolution of the state vector can be determined. In general, the calculation of the eigenvectors and eigenvalues of $\hat{H}$ is an extremely difficult problem that can only be solved for systems with a very small number of degrees of freedom, and alternative methods for calculating observables are typically needed.

9.2.5 Position and momentum operators

Up to now, we have formulated the theory of measurement in quantum mechanics for observables with discrete eigenvalue spectra. While there certainly are observables that satisfy this condition, we must also consider operators whose spectra are possibly continuous. The most notable examples are the position and momentum operators corresponding to the classical position and momentum variables.³ In infinite space, the classical position and momentum variables are continuous, so that in a quantum description, we require operators with continuous eigenvalue spectra. If $\hat{x}$ and $\hat{p}$ denote the quantum mechanical position and momentum operators, respectively, then these will satisfy eigenvalue equations of the form

$$\hat{x}|x\rangle = x|x\rangle, \qquad \hat{p}|p\rangle = p|p\rangle, \tag{9.2.37}$$

where $x$ and $p$ are the continuous eigenvalues. In place of the discrete orthonormality and completeness relations, we have continuous analogs, which take the form

$$\begin{aligned}
\langle x|x'\rangle &= \delta(x - x'), & \langle p|p'\rangle &= \delta(p - p') \\
\int dx\, |x\rangle\langle x| &= \hat{I}, & \int dp\, |p\rangle\langle p| &= \hat{I} \\
|\phi\rangle &= \int dx\, |x\rangle\langle x|\phi\rangle, & |\phi\rangle &= \int dp\, |p\rangle\langle p|\phi\rangle.
\end{aligned} \tag{9.2.38}$$

³ Note, however, that there are important cases in which the momentum eigenvalues are discrete. An example is a free particle confined to a finite spatial domain, where the discrete momentum eigenvalues are related to the properties of standing waves. This case will be discussed in Section 9.3.

The last line shows how to expand an arbitrary vector $|\phi\rangle$ in terms of the position or momentum eigenvectors.

Quantum mechanics postulates that the position and momentum of a particle are not compatible observables. That is, no experiment can measure both properties simultaneously. This postulate is known as the Heisenberg uncertainty principle after the German physicist Werner Heisenberg (1901–1976) and is expressed as a relation between the statistical uncertainties $\Delta x \equiv \sqrt{\langle\hat{x}^2\rangle - \langle\hat{x}\rangle^2}$ and $\Delta p \equiv \sqrt{\langle\hat{p}^2\rangle - \langle\hat{p}\rangle^2}$, namely

$$\Delta x\, \Delta p \geq \frac{\hbar}{2}. \tag{9.2.39}$$

Since the product $\Delta x\,\Delta p$ is bounded from below, the more certainty we have about a particle's position, the less certain we are about its momentum, and vice versa. Thus, any experiment designed to measure a particle's position with a small uncertainty must cause a large uncertainty in the particle's momentum. The uncertainty principle also tells us that the concepts of classical microstates and phase spaces are fictions, as these require a specification of a particle's position and momentum simultaneously. Thus, a point in phase space cannot correspond to anything physical. The uncertainty principle, therefore, supports the idea of a "coarse-graining" of phase space, which was considered in Problem 2.7 and in Section 3.2. A two-dimensional phase space should be represented as a tiling with squares of minimum area $\hbar/2$. These squares would represent the smallest area into which the particle's position and momentum can be localized. Similarly, the phase space of an $N$-particle system should be coarse-grained into hypervolumes of size $(\hbar/2)^{3N}$. In the classical limit, which involves letting $\hbar \to 0$, we recover the notion of a continuous phase space as an approximation.

The action of the operators $\hat{x}$ and $\hat{p}$ on an arbitrary Hilbert-space vector $|\phi\rangle$ can be expressed in terms of a projection of the resulting vector onto the basis of either position or momentum eigenvectors.
Consider the vector $\hat{x}|\phi\rangle$ and multiply on the left by $\langle x|$, which yields $\langle x|\hat{x}|\phi\rangle$. Since $\langle x|\hat{x} = \langle x|x$, this becomes $x\langle x|\phi\rangle$. Remembering that the eigenvalue $x$ is continuous, the vectors $|x\rangle$ form a continuous set of vectors, and hence, the inner product $\langle x|\phi\rangle$ is a continuous function of $x$, which we can denote as $\phi(x)$. Similarly, the inner product $\langle p|\phi\rangle$ is a continuous function of $p$, which we can denote as $\phi(p)$.

The uncertainty principle tells us that $\hat{x}$ and $\hat{p}$ do not commute. Can we, nevertheless, determine what $[\hat{x}, \hat{p}]$ is? If we take the particle-wave duality as our starting point, then we can, indeed, derive this commutator. Consider a free particle, for which the classical Hamiltonian is $H = p^2/2m$. The corresponding quantum operator is obtained by promoting the classical momentum $p$ to the quantum operator $\hat{p}$ to give the quantum Hamiltonian $\hat{H} = \hat{p}^2/2m$. Since this Hamiltonian is a function of $\hat{p}$ alone, it follows that $[\hat{H}, \hat{p}] = 0$, so that $\hat{H}$ and $\hat{p}$ have simultaneous eigenvectors. Consider, therefore, the eigenvalue equation for $\hat{p}$

$$\hat{p}|p\rangle = p|p\rangle. \tag{9.2.40}$$

When this equation is projected into the coordinate basis, we obtain

$$\langle x|\hat{p}|p\rangle = p\langle x|p\rangle. \tag{9.2.41}$$

The quantity $\langle x|p\rangle$ is a continuous function of the eigenvalues $x$ and $p$. We can write eqn. (9.2.41) as

$$\hat{p}\langle x|p\rangle = p\langle x|p\rangle \tag{9.2.42}$$

if we specify how $\hat{p}$ acts on the continuous function $\langle x|p\rangle$. Equation (9.2.42) is actually an equation for a continuous eigenfunction of $\hat{p}$ with eigenvalue $p$. This eigenfunction must be a continuous function of $x$. According to the particle-wave duality, a free particle should behave as if it were a wave with amplitude $\psi(x) = \exp(\pm ikx)$, where $k$ is the wave vector $k = 2\pi/\lambda$. Indeed, the de Broglie hypothesis assigns a wavelength $\lambda = h/p$ to a particle of momentum $p$, so that $k = p/\hbar$. We now posit that the function $\exp(\pm ipx/\hbar)$ is an eigenfunction of $\hat{p}$ and, therefore, a solution to eqn. (9.2.42) with eigenvalue $p$. This means that, with proper normalization,

$$\langle x|p\rangle = \frac{1}{\sqrt{2\pi\hbar}}\, e^{ipx/\hbar}. \tag{9.2.43}$$

However, eqn. (9.2.42) will only be true if $\hat{p}$ acts on $\langle x|p\rangle$ as the derivative

$$\hat{p} \to \frac{\hbar}{i}\frac{\partial}{\partial x}. \tag{9.2.44}$$
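The claim that the plane wave of eqn. (9.2.43) is an eigenfunction of the derivative operator in eqn. (9.2.44) can be checked numerically. The sketch below assumes $\hbar = 1$ and approximates the derivative with a central finite difference; the step size `h` and the sample values of `p` and `x` are arbitrary illustrative choices.

```python
import cmath
import math

HBAR = 1.0   # assumption: units in which hbar = 1

def plane_wave(x, p):
    """<x|p> = exp(i p x / hbar) / sqrt(2 pi hbar), eqn. (9.2.43)."""
    return cmath.exp(1j * p * x / HBAR) / math.sqrt(2.0 * math.pi * HBAR)

def momentum_applied(f, x, h=1e-5):
    """Approximate (hbar / i) df/dx at x by a central difference of step h."""
    return (HBAR / 1j) * (f(x + h) - f(x - h)) / (2.0 * h)

p = 1.7
x = 0.4
lhs = momentum_applied(lambda y: plane_wave(y, p), x)   # p-hat acting on <x|p>
rhs = p * plane_wave(x, p)                              # eigenvalue times <x|p>
```

Up to the finite-difference error, `lhs` and `rhs` agree, which is the content of eqn. (9.2.42).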

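Now, consider the commutator $\hat{x}\hat{p} - \hat{p}\hat{x}$. If we sandwich this between the vectors $\langle x|$ and $|p\rangle$, we obtain

$$\begin{aligned}
\langle x|\hat{x}\hat{p} - \hat{p}\hat{x}|p\rangle &= \langle x|\hat{x}\hat{p}|p\rangle - \langle x|\hat{p}\hat{x}|p\rangle \\
&= xp\langle x|p\rangle - \hat{p}\langle x|\hat{x}|p\rangle \\
&= xp\langle x|p\rangle - \frac{\hbar}{i}\frac{\partial}{\partial x}\left(x\langle x|p\rangle\right) \\
&= xp\langle x|p\rangle + i\hbar\langle x|p\rangle - x\hat{p}\langle x|p\rangle \\
&= i\hbar\langle x|p\rangle,
\end{aligned} \tag{9.2.45}$$

where the penultimate line follows from eqns. (9.2.40) and (9.2.44). Since $|x\rangle$ and $|p\rangle$ are not null vectors, eqn. (9.2.45) implies that the operator

$$\hat{x}\hat{p} - \hat{p}\hat{x} = [\hat{x}, \hat{p}] = i\hbar\hat{I}. \tag{9.2.46}$$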
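As a cross-check on eqn. (9.2.46), the same result follows if the commutator is applied to an arbitrary differentiable function $f(x)$ (introduced here only for illustration), using the coordinate-space form of $\hat{p}$ in eqn. (9.2.44) and the product rule:

```latex
\begin{aligned}
[\hat{x}, \hat{p}]\, f(x)
  &= x\,\frac{\hbar}{i}\frac{df}{dx} - \frac{\hbar}{i}\frac{d}{dx}\bigl(x f(x)\bigr) \\
  &= x\,\frac{\hbar}{i}\frac{df}{dx} - \frac{\hbar}{i} f(x) - x\,\frac{\hbar}{i}\frac{df}{dx} \\
  &= -\frac{\hbar}{i}\, f(x) = i\hbar\, f(x).
\end{aligned}
```

Since $f(x)$ is arbitrary, the commutator acts as multiplication by $i\hbar$, in agreement with the result obtained from the plane-wave matrix elements.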

Next, consider a classical particle of mass $m$ moving in one dimension with a Hamiltonian

$$H(x, p) = \frac{p^2}{2m} + U(x). \tag{9.2.47}$$

The quantum Hamiltonian operator $\hat{H}$ is obtained by promoting both $p$ and $x$ to operators, which yields

$$\hat{H}(\hat{x}, \hat{p}) = \frac{\hat{p}^2}{2m} + U(\hat{x}). \tag{9.2.48}$$

The promotion of a classical phase-space function to a quantum operator via the substitution $x \to \hat{x}$ and $p \to \hat{p}$ is known as the quantum-classical correspondence principle. Using eqn. (9.2.44), we can now project the Schrödinger equation onto the basis of position eigenvectors:

$$\begin{aligned}
\langle x|\hat{H}(\hat{x}, \hat{p})|\Psi(t)\rangle &= i\hbar\frac{\partial}{\partial t}\langle x|\Psi(t)\rangle \\
-\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2}\Psi(x, t) + U(x)\Psi(x, t) &= i\hbar\frac{\partial}{\partial t}\Psi(x, t),
\end{aligned} \tag{9.2.49}$$

where $\Psi(x, t) \equiv \langle x|\Psi(t)\rangle$. Equation (9.2.49) is a partial differential equation that is often referred to as the Schrödinger wave equation, and the function $\Psi(x, t)$ is referred to as the wave function. Despite the nomenclature, eqn. (9.2.49) differs from a classical wave equation in that it is complex and only first-order in time, and it includes a multiplicative potential energy term $U(x)\Psi(x, t)$. A solution $\Psi(x, t)$ is then used to compute expectation values at time $t$ of any operator. In general, the promotion of classical phase-space functions $a(x)$ or $b(p)$, which depend only on position or momentum, to quantum operators follows by simply replacing $x$ by the operator $\hat{x}$ and $p$ by the operator $\hat{p}$. In this case, the expectation values of $\hat{A}(\hat{x})$ or $\hat{B}(\hat{p})$ are defined by

$$\begin{aligned}
\langle\hat{A}\rangle_t &= \langle\Psi(t)|\hat{A}(\hat{x})|\Psi(t)\rangle = \int dx\, \Psi^*(x, t)\Psi(x, t)\, a(x) \\
\langle\hat{B}\rangle_t &= \langle\Psi(t)|\hat{B}(\hat{p})|\Psi(t)\rangle = \int dx\, \Psi^*(x, t)\, b\!\left(\frac{\hbar}{i}\frac{\partial}{\partial x}\right)\Psi(x, t).
\end{aligned} \tag{9.2.50}$$

For phase-space functions $a(x, p)$ that depend on both position and momentum, promotion to a quantum operator is less straightforward for the reason that in a classical function, how the variables $x$ and $p$ are arranged is irrelevant, but the order matters considerably in quantum mechanics! Therefore, a rule is needed as to how the operators $\hat{x}$ and $\hat{p}$ are ordered when the operator $\hat{A}(\hat{x}, \hat{p})$ is constructed. Since we will not encounter such operators in this book, we will not belabor the point except to refer to one rule for such an ordering due to H. Weyl (1927) (see also Hillery et al. (1984)). If a classical phase-space function has the form $a(x, p) = x^n p^m$, its Weyl ordering is

$$x^n p^m \longrightarrow \frac{1}{2^n}\sum_{r=0}^{n}\binom{n}{r}\hat{x}^{n-r}\hat{p}^m\hat{x}^r \tag{9.2.51}$$

for $n < m$.

Using the analysis leading up to eqn. (9.2.50), the eigenvalue equation for the Hamiltonian can also be expressed as a differential equation:

$$\left[-\frac{\hbar^2}{2m}\frac{d^2}{dx^2} + U(x)\right]\psi_k(x) = E_k\psi_k(x), \tag{9.2.52}$$

where $\psi_k(x) \equiv \langle x|E_k\rangle$. The functions $\psi_k(x)$ are the eigenfunctions of the Hamiltonian. Because eqns. (9.2.49) and (9.2.52) differ only in their right-hand sides, the former and latter are often referred to as the "time-dependent" and "time-independent" Schrödinger equations, respectively. Equation (9.2.52) yields the well-known quantum-mechanical phenomenon of energy quantization. Even in one dimension, the number of potential functions $U(x)$ for which eqns. (9.2.49) or (9.2.52) can be solved analytically is remarkably small.⁴ In solving eqn. (9.2.52), if for any given eigenvalue $E_k$ there exist $M$ independent eigenfunctions, then that energy level is said to be $M$-fold degenerate.

Finally, let us extend this framework to three spatial dimensions. The position and momentum operators are now vectors $\hat{\mathbf{r}} = (\hat{x}, \hat{y}, \hat{z})$ and $\hat{\mathbf{p}} = (\hat{p}_x, \hat{p}_y, \hat{p}_z)$. The components of these vectors satisfy the commutation relations

$$\begin{aligned}
[\hat{x}, \hat{y}] &= [\hat{x}, \hat{z}] = [\hat{y}, \hat{z}] = 0 \\
[\hat{p}_x, \hat{p}_y] &= [\hat{p}_x, \hat{p}_z] = [\hat{p}_y, \hat{p}_z] = 0 \\
[\hat{x}, \hat{p}_x] &= [\hat{y}, \hat{p}_y] = [\hat{z}, \hat{p}_z] = i\hbar\hat{I}.
\end{aligned} \tag{9.2.53}$$

All other commutators between position and momentum components are 0. Therefore, given a Hamiltonian of the form

$$\hat{H} = \frac{\hat{\mathbf{p}}^2}{2m} + U(\hat{\mathbf{r}}), \tag{9.2.54}$$

the eigenvalue problem can be expressed as a partial differential equation using the momentum operator substitutions $\hat{p}_x \to -i\hbar(\partial/\partial x)$, $\hat{p}_y \to -i\hbar(\partial/\partial y)$, $\hat{p}_z \to -i\hbar(\partial/\partial z)$. This leads to an equation of the form

$$\left[-\frac{\hbar^2}{2m}\nabla^2 + U(\mathbf{r})\right]\psi_{\mathbf{k}}(\mathbf{r}) = E_{\mathbf{k}}\psi_{\mathbf{k}}(\mathbf{r}), \tag{9.2.55}$$

where the label $\mathbf{k} = (k_x, k_y, k_z)$ indicates that three quantum numbers are needed to characterize the states.

⁴ An excellent treatise on such problems can be found in the book by S. Flügge, Practical Quantum Mechanics (1994).

9.2.6 The Heisenberg picture

An important fact about quantum mechanics is that it supports multiple equivalent formulations, which allows us to choose the formulation that is most convenient for the problem at hand. The picture of quantum mechanics we have been describing postulates that the state vector $|\Psi(t)\rangle$ evolves in time according to the Schrödinger equation and the operators corresponding to physical observables are static. This formulation is known as the Schrödinger picture of quantum mechanics. In fact, there exists a perfectly equivalent alternative formulation in which the state vector is taken to be static and the operators evolve in time. This formulation is known as the Heisenberg picture. In the Heisenberg picture, an operator $\hat{A}$ corresponding to an observable evolves in time according to the Heisenberg equation of motion:

$$\frac{d\hat{A}}{dt} = \frac{1}{i\hbar}[\hat{A}, \hat{H}]. \tag{9.2.56}$$

Note the mathematical similarity to the evolution of a classical phase-space function:

$$\frac{dA}{dt} = \{A, H\}. \tag{9.2.57}$$

This similarity suggests that the commutator $[\hat{A}, \hat{H}]/i\hbar$ becomes the Poisson bracket $\{A, H\}$ in the classical limit. Like the Schrödinger equation, the Heisenberg equation can be solved formally to yield

$$\hat{A}(t) = e^{i\hat{H}t/\hbar}\hat{A}(0)e^{-i\hat{H}t/\hbar} = \hat{U}^\dagger(t)\hat{A}(0)\hat{U}(t). \tag{9.2.58}$$

The initial value $\hat{A}(0)$ that appears in eqn. (9.2.58) is the operator $\hat{A}$ in the Schrödinger picture. Thus, given a state vector $|\Psi\rangle$, the expectation value of the operator $\hat{A}(t)$ in the Heisenberg picture is simply

$$\langle\hat{A}(t)\rangle = \langle\Psi|\hat{A}(t)|\Psi\rangle. \tag{9.2.59}$$
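The equivalence of the two pictures can be illustrated on a small example. The sketch below assumes $\hbar = 1$ and a hypothetical two-level system whose Hamiltonian is diagonal with energies `E`, an observable `A` that does not commute with it, and an initial state `psi0`; all of these values are illustrative. It evaluates the expectation value once by evolving the state (Schrödinger picture) and once by evolving the operator via eqn. (9.2.58) (Heisenberg picture).

```python
import cmath
import math

HBAR = 1.0                      # assumption: units in which hbar = 1
E = [0.0, 2.0]                  # hypothetical H = diag(E[0], E[1])
A = [[0.0, 1.0], [1.0, 0.0]]    # observable that does not commute with H
s = 1.0 / math.sqrt(2.0)
psi0 = [s, s]                   # normalized initial state

def propagator_diagonal(t):
    """Diagonal entries of U(t) = exp(-i H t / hbar) for the diagonal H."""
    return [cmath.exp(-1j * e * t / HBAR) for e in E]

def expect(op, vec):
    """<vec|op|vec> for a 2x2 matrix op and a 2-component vector vec."""
    n = len(vec)
    return sum(complex(vec[i]).conjugate() * op[i][j] * vec[j]
               for i in range(n) for j in range(n))

def schrodinger_expectation(t):
    """Evolve the state, keep the operator fixed."""
    u = propagator_diagonal(t)
    psi_t = [u[i] * psi0[i] for i in range(2)]
    return expect(A, psi_t).real

def heisenberg_expectation(t):
    """Evolve the operator, A(t) = U^dagger A U, keep the state fixed."""
    u = propagator_diagonal(t)
    A_t = [[u[i].conjugate() * A[i][j] * u[j] for j in range(2)]
           for i in range(2)]
    return expect(A_t, psi0).real

t = 0.7
value_s = schrodinger_expectation(t)
value_h = heisenberg_expectation(t)
```

For this choice both routes give $\cos[(E_1 - E_0)t/\hbar]$, so the expectation value oscillates in time because `A` is not a constant of the motion.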

The Heisenberg picture makes clear that any operator $\hat{A}$ that commutes with the Hamiltonian satisfies $d\hat{A}/dt = 0$ and, hence, does not evolve in time. Such an operator is referred to as a constant of the motion. In the Schrödinger picture, if an operator is a constant of the motion, the probabilities associated with the eigenvalues of the operator do not evolve in time. To see this, consider the evolution of the state vector in the Schrödinger picture:

$$|\Psi(t)\rangle = e^{-i\hat{H}t/\hbar}|\Psi(0)\rangle. \tag{9.2.60}$$

The probability of obtaining an eigenvalue $a_k$ of $\hat{A}$ at time $t$ is given by $|\langle a_k|\Psi(t)\rangle|^2$. Thus, taking the inner product on both sides with $\langle a_k|$, we find

$$\langle a_k|\Psi(t)\rangle = \langle a_k|e^{-i\hat{H}t/\hbar}|\Psi(0)\rangle. \tag{9.2.61}$$

If $[\hat{A}, \hat{H}] = 0$, then $|a_k\rangle$ is an eigenvector of $\hat{H}$ with an eigenvalue, say $E_k$. Hence, the amplitude for obtaining $a_k$ at time $t$ is

$$\langle a_k|\Psi(t)\rangle = e^{-iE_k t/\hbar}\langle a_k|\Psi(0)\rangle. \tag{9.2.62}$$

Taking the absolute squares of both sides, the complex exponential disappears, and we obtain

$$|\langle a_k|\Psi(t)\rangle|^2 = |\langle a_k|\Psi(0)\rangle|^2, \tag{9.2.63}$$

which implies that the probability at time $t$ is the same as at $t = 0$. Any operator that is a constant of the motion can be simultaneously diagonalized with the Hamiltonian, and the eigenvalues of the operator can be used to characterize the physical states along with those of the Hamiltonian. As these eigenvalues are often expressed in terms of integers, these integers are referred to as the quantum numbers of the state.

9.3 Simple examples

In this section, we will consider two simple examples, the free particle and the harmonic oscillator, which illustrate how energy quantization arises and how the eigenstates of the Hamiltonian can be determined and manipulated.

9.3.1 The free particle

The first example is a single free particle in one dimension. In a sense, we solved this problem in Section 9.2.5 using an argument based on the particle-wave duality. Here, we will work backwards, assuming eqn. (9.2.44) is true, and solve the eigenvalue problem explicitly. The Hamiltonian is

$$\hat{H} = \frac{\hat{p}^2}{2m}. \tag{9.3.1}$$

The eigenvalue problem for $\hat{H}$ can be expressed as

$$\frac{\hat{p}^2}{2m}|E_k\rangle = E_k|E_k\rangle, \tag{9.3.2}$$

which, from eqn. (9.2.52), is equivalent to the differential equation

$$-\frac{\hbar^2}{2m}\frac{d^2}{dx^2}\psi_k(x) = E_k\psi_k(x). \tag{9.3.3}$$

Solution of eqn. (9.3.3) requires determining the functions $\psi_k(x)$, the eigenvalues $E_k$, and the appropriate quantum number $k$. The problem can be simplified considerably by noting that $\hat{H}$ commutes with $\hat{p}$. Therefore, the $\psi_k(x)$ are also eigenfunctions of $\hat{p}$, which means we can determine these by solving the simpler equation $\hat{p}|p\rangle = p|p\rangle$. In the coordinate basis, this is a simple differential equation

$$\frac{\hbar}{i}\frac{d}{dx}\phi_p(x) = p\phi_p(x), \tag{9.3.4}$$

which has the solution $\phi_p(x) = C\exp(ipx/\hbar)$. Here, $C$ is a normalization constant to be determined by the requirement of orthonormality. First, let us note that these eigenfunctions are characterized by the eigenvalue $p$ of momentum, hence the $p$ subscript. We can verify that the functions are also eigenfunctions of $\hat{H}$ by substituting them into eqn. (9.3.3). When this is done, we find that the energy eigenvalues are also characterized by $p$ and are given by $E_p = p^2/2m$. We can, therefore, write the energy eigenfunctions as

$$\psi_p(x) = \phi_p(x) = Ce^{ipx/\hbar}, \tag{9.3.5}$$

and it is clear that different eigenvalues and eigenfunctions are distinguished by their value of $p$. The requirement that the momentum eigenfunctions be orthonormal is expressed via eqn. (9.2.38), i.e., $\langle p|p'\rangle = \delta(p - p')$. By inserting the identity operator in the form of $\hat{I} = \int dx\, |x\rangle\langle x|$ between the bra and ket vectors, we can express this condition as

$$\langle p|p'\rangle = \int_{-\infty}^{\infty} dx\, \langle p|x\rangle\langle x|p'\rangle = \delta(p - p'). \tag{9.3.6}$$

Since, by definition, $\langle x|p\rangle = \phi_p(x) = \psi_p(x)$, so that $\langle p|x\rangle = \psi_p^*(x)$, we have

$$|C|^2\int_{-\infty}^{\infty} dx\, e^{-ipx/\hbar}e^{ip'x/\hbar} = |C|^2\, 2\pi\hbar\,\delta(p - p'), \tag{9.3.7}$$

and it follows that $C = 1/\sqrt{2\pi\hbar}$. Hence, the normalized energy and momentum eigenfunctions are $\psi_p(x) = \exp(ipx/\hbar)/\sqrt{2\pi\hbar}$. These eigenfunctions are known as plane waves. Note that they are oscillating functions of $x$ defined over the entire spatial range $x \in (-\infty, \infty)$. Moreover, the corresponding probability distribution function $P_p(x) = |\psi_p(x)|^2$ is spatially uniform. If we consider the time dependence of the eigenfunctions

$$\psi_p(x, t) \sim \exp\left[\frac{ipx}{\hbar} - \frac{iE_p t}{\hbar}\right] \tag{9.3.8}$$

(which can be easily shown to satisfy the time-dependent Schrödinger equation), then this represents a free wave moving to the right for $p > 0$ and to the left for $p < 0$ with frequency $\omega = E_p/\hbar$. As noted in Section 9.2.5, the momentum and energy eigenvalues are continuous because $p$ is a continuous parameter that can range from $-\infty$ to $\infty$. This results from the fact that the system is unbounded.

Let us now consider placing our free particle in a one-dimensional box of length $L$, which is more in keeping with the paradigm of statistical mechanics. If $x$ is restricted to the interval $[0, L]$, then we need to impose boundary conditions at $x = 0$ and $x = L$. We first analyze the case of periodic boundary conditions, for which we require that $\psi_p(0) = \psi_p(L)$. Imposing this on the eigenfunctions and dividing out the constant $C$ leads to

$$e^{ip\cdot 0/\hbar} = 1 = e^{ipL/\hbar}. \tag{9.3.9}$$

Since $e^{i\theta} = \cos\theta + i\sin\theta$, the only way to satisfy this condition is to require that $pL/\hbar$ is an integer multiple of $2\pi$. Denoting this integer as $n$, we have the requirement

$$\frac{pL}{\hbar} = 2\pi n \quad\Longrightarrow\quad p = \frac{2\pi\hbar}{L}n \equiv p_n, \tag{9.3.10}$$

and we see immediately that the momentum eigenvalues are no longer continuous but are quantized. Similarly, the energy eigenvalues are now also quantized as

$$E_n = \frac{p_n^2}{2m} = \frac{2\pi^2\hbar^2}{mL^2}n^2. \tag{9.3.11}$$

In eqns. (9.3.10) and (9.3.11), $n$ can be any integer. This example illustrates the important concept that the quantized energy eigenvalues are determined by the boundary conditions. In this case, the fact that the energies are discrete leads to a discrete set of eigenfunctions distinguished by the value of $n$ and given by

$$\psi_n(x) = Ce^{ip_n x/\hbar} = Ce^{2\pi inx/L}. \tag{9.3.12}$$

These functions are orthogonal but not normalized. The normalization condition determines the constant $C$:

$$\begin{aligned}
\int_0^L |\psi_n(x)|^2\, dx &= 1 \\
|C|^2\int_0^L e^{-2\pi inx/L}e^{2\pi inx/L}\, dx &= 1 \\
|C|^2\int_0^L dx &= 1 \\
|C|^2 L &= 1 \\
C &= \frac{1}{\sqrt{L}}.
\end{aligned} \tag{9.3.13}$$

Hence, the normalized functions for a particle in a periodic box are

$$\psi_n(x) = \frac{1}{\sqrt{L}}\exp(2\pi inx/L). \tag{9.3.14}$$
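The quantization conditions and the orthonormality of the periodic-box eigenfunctions can be checked numerically. The sketch below assumes $\hbar = m = 1$ and a box length `L = 2.0` (all illustrative choices) and estimates the overlap integral with a midpoint rule; the function names are not from the text.

```python
import cmath
import math

HBAR = 1.0   # assumption: units in which hbar = 1
MASS = 1.0   # assumption: unit mass
L = 2.0      # hypothetical box length

def momentum(n):
    """p_n = 2 pi hbar n / L, eqn. (9.3.10)."""
    return 2.0 * math.pi * HBAR * n / L

def energy(n):
    """E_n = p_n^2 / (2 m), eqn. (9.3.11)."""
    return momentum(n) ** 2 / (2.0 * MASS)

def psi(n, x):
    """Normalized periodic eigenfunction, eqn. (9.3.14)."""
    return cmath.exp(2j * math.pi * n * x / L) / math.sqrt(L)

def overlap(n, m, steps=2000):
    """Midpoint-rule estimate of the integral of conj(psi_n) psi_m over [0, L]."""
    dx = L / steps
    total = sum(psi(n, (k + 0.5) * dx).conjugate() * psi(m, (k + 0.5) * dx)
                for k in range(steps))
    return total * dx
```

With these definitions, `overlap(n, n)` is 1 and `overlap(n, m)` vanishes for `n != m` to within roundoff, while `energy(n)` agrees with the closed form $2\pi^2\hbar^2n^2/mL^2$. Note that `energy(n)` equals `energy(-n)`, so every level with $n \neq 0$ is twofold degenerate.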

Another interesting boundary condition is $\psi_p(0) = \psi_p(L) = 0$, which corresponds to hard walls at $x = 0$ and $x = L$. We can no longer satisfy the boundary condition with a right- or left-propagating plane wave. Rather, we need to take a linear combination of right- and left-propagating waves to form a sine wave, which is also a free standing wave in the box. This is possible because the Schrödinger equation is linear, hence any linear combination of eigenfunctions with the same eigenvalue is also an eigenfunction. In this case, we need to take

$$\psi_p(x) = C\sin\left(\frac{px}{\hbar}\right) = \frac{C}{2i}\left[e^{ipx/\hbar} - e^{-ipx/\hbar}\right], \tag{9.3.15}$$

which manifestly satisfies the boundary condition at $x = 0$. This function satisfies the boundary condition at $x = L$ only if $pL/\hbar = n\pi$, where $n$ is a positive integer. This leads to the momentum quantization condition $p = n\pi\hbar/L \equiv p_n$ and the energy eigenvalues

$$E_n = \frac{p_n^2}{2m} = \frac{\hbar^2\pi^2}{2mL^2}n^2. \tag{9.3.16}$$

The eigenfunctions become

$$\psi_n(x) = C\sin\left(\frac{n\pi x}{L}\right). \tag{9.3.17}$$

Normalizing yields $C = \sqrt{2/L}$ for the constant. From eqn. (9.3.17), it is clear why $n$ must be strictly positive. If $n = 0$, then $\psi_n(x) = 0$ everywhere, which would imply that the particle exists nowhere. Finally, since the eigenfunctions are already constructed from combinations of right- and left-propagating waves to form standing waves in the box, allowing $n < 0$ only changes the sign of the eigenfunctions (which is a trivial phase factor) but not the physical content of the eigenfunctions (probabilities and expectation values are not affected by an overall sign). Note that the probability distribution $P_n(x) = (2/L)\sin^2(n\pi x/L)$ is no longer uniform.

9.3.2 The harmonic oscillator

The second example we will consider is a single one-dimensional particle moving in a harmonic potential $U(x) = m\omega^2x^2/2$, so that the Hamiltonian becomes

$$\hat{H} = \frac{\hat{p}^2}{2m} + \frac{1}{2}m\omega^2\hat{x}^2. \tag{9.3.18}$$

The eigenvalue equation for $\hat{H}$ becomes, according to eqn. (9.2.52),

$$\left[-\frac{\hbar^2}{2m}\frac{d^2}{dx^2} + \frac{1}{2}m\omega^2x^2\right]\psi_n(x) = E_n\psi_n(x). \tag{9.3.19}$$

Here, we have anticipated that because the particle is asymptotically bound ($U(x) \to \infty$ as $x \to \pm\infty$), the energy eigenvalues will be discrete and characterized by an integer $n$. Since the potential becomes infinitely large as $x \to \pm\infty$, we have the boundary conditions $\psi_n(\infty) = \psi_n(-\infty) = 0$. The solution of this second-order differential equation is not trivial and, therefore, we will not carry out its solution in detail. However, the interested reader is referred to the excellent treatment in Principles of Quantum Mechanics by R. Shankar (1994).

The solution does, indeed, lead to a discrete set of energy eigenvalues given by the familiar formula

$$E_n = \left(n + \frac{1}{2}\right)\hbar\omega, \qquad n = 0, 1, 2, \ldots \tag{9.3.20}$$

and a set of normalized eigenfunctions

$$\psi_n(x) = \left(\frac{m\omega}{2^{2n}(n!)^2\pi\hbar}\right)^{1/4} e^{-m\omega x^2/2\hbar}\, H_n\!\left(\sqrt{\frac{m\omega}{\hbar}}\, x\right), \tag{9.3.21}$$

where $\{H_n(y)\}$ are the Hermite polynomials

$$H_n(y) = (-1)^n e^{y^2}\frac{d^n}{dy^n}e^{-y^2}. \tag{9.3.22}$$
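The normalization in eqn. (9.3.21) can be verified numerically. The sketch below is an illustration under stated assumptions: it sets $m\omega/\hbar = 1$ (the constant `ALPHA`), generates the Hermite polynomials from the standard three-term recurrence $H_{k+1}(y) = 2yH_k(y) - 2kH_{k-1}(y)$ (a known identity equivalent to eqn. (9.3.22), not one stated in the text), and integrates with a simple midpoint rule on a finite interval.

```python
import math

ALPHA = 1.0   # assumption: m * omega / hbar = 1

def hermite(n, y):
    """H_n(y) from H_0 = 1, H_1 = 2y, H_{k+1} = 2 y H_k - 2 k H_{k-1}."""
    h_prev, h = 1.0, 2.0 * y
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * y * h - 2.0 * k * h_prev
    return h

def psi(n, x):
    """Normalized oscillator eigenfunction of eqn. (9.3.21)."""
    norm = (ALPHA / (math.pi * 4.0 ** n * math.factorial(n) ** 2)) ** 0.25
    return norm * math.exp(-ALPHA * x * x / 2.0) * hermite(n, math.sqrt(ALPHA) * x)

def overlap(n, m, xmax=10.0, steps=4000):
    """Midpoint-rule estimate of the integral of psi_n psi_m over [-xmax, xmax]."""
    dx = 2.0 * xmax / steps
    total = 0.0
    for k in range(steps):
        x = -xmax + (k + 0.5) * dx
        total += psi(n, x) * psi(m, x)
    return total * dx
```

As a spot check, `hermite(3, 0.5)` evaluates $8y^3 - 12y$ at $y = 0.5$, which is $-5$. The overlaps `overlap(n, n)` come out as 1 and `overlap(n, m)` as 0 for `n != m`, to within the quadrature error.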

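The first few of the eigenfunctions in eqn. (9.3.21) are

$$\begin{aligned}
\psi_0(x) &= \left(\frac{\alpha}{\pi}\right)^{1/4} e^{-\alpha x^2/2} \\
\psi_1(x) &= \left(\frac{4\alpha^3}{\pi}\right)^{1/4} x\, e^{-\alpha x^2/2} \\
\psi_2(x) &= \left(\frac{\alpha}{4\pi}\right)^{1/4}\left(2\alpha x^2 - 1\right) e^{-\alpha x^2/2} \\
\psi_3(x) &= \left(\frac{\alpha^3}{9\pi}\right)^{1/4}\left(2\alpha x^3 - 3x\right) e^{-\alpha x^2/2},
\end{aligned} \tag{9.3.23}$$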

where $\alpha = m\omega/\hbar$. These are plotted in Fig. 9.1. Note that the number of nodes in each eigenfunction is equal to $n$.

Fig. 9.1 The first four eigenfunctions of a harmonic oscillator.

Doing actual calculations with these eigenfunctions is mathematically cumbersome. It turns out, however, that there is a simple and convenient framework for the harmonic oscillator in terms of the abstract set of ket vectors $|n\rangle$ that define the eigenfunctions through $\langle x|n\rangle = \psi_n(x)$. If we exploit the symmetry between $\hat{p}$ and $\hat{x}$ in the harmonic-oscillator Hamiltonian, we can factorize the sum of squares to give

$$\begin{aligned}
\hat{H} &= \left[\frac{\hat{p}^2}{2m\hbar\omega} + \frac{m\omega}{2\hbar}\hat{x}^2\right]\hbar\omega \\
&= \left[\left(\sqrt{\frac{m\omega}{2\hbar}}\,\hat{x} - \frac{i}{\sqrt{2m\hbar\omega}}\,\hat{p}\right)\left(\sqrt{\frac{m\omega}{2\hbar}}\,\hat{x} + \frac{i}{\sqrt{2m\hbar\omega}}\,\hat{p}\right) + \frac{1}{2}\right]\hbar\omega.
\end{aligned} \tag{9.3.24}$$

The extra 1/2 appearing in eqn. (9.3.24) arises from the nonzero commutator between $\hat{x}$ and $\hat{p}$, $[\hat{x}, \hat{p}] = i\hbar\hat{I}$. Let us now define two operators

$$\begin{aligned}
\hat{a} &= \sqrt{\frac{m\omega}{2\hbar}}\,\hat{x} + \frac{i}{\sqrt{2m\hbar\omega}}\,\hat{p} \\
\hat{a}^\dagger &= \sqrt{\frac{m\omega}{2\hbar}}\,\hat{x} - \frac{i}{\sqrt{2m\hbar\omega}}\,\hat{p},
\end{aligned} \tag{9.3.25}$$

which can be shown to satisfy the commutation relation

$$[\hat{a}, \hat{a}^\dagger] = 1. \tag{9.3.26}$$

In terms of these operators, the Hamiltonian can be easily derived with the result

$$\hat{H} = \left(\hat{a}^\dagger\hat{a} + \frac{1}{2}\right)\hbar\omega. \tag{9.3.27}$$

The action of $\hat{a}$ and $\hat{a}^\dagger$ on the eigenfunctions of $\hat{H}$ can be worked out using the fact that

$$\hat{a}\psi_n(x) = \left[\sqrt{\frac{\alpha}{2}}\, x + \frac{1}{\sqrt{2\alpha}}\frac{d}{dx}\right]\psi_n(x) \tag{9.3.28}$$

together with the recursion relation for $H_n(y)$: $H_n'(y) = 2nH_{n-1}(y)$. Here, we have used the fact that $\hat{p} = (\hbar/i)(d/dx)$. After some algebra, we find that

$$\hat{a}\psi_n(x) = \sqrt{n}\,\psi_{n-1}(x). \tag{9.3.29}$$

Similarly, it can be shown that

$$\hat{a}^\dagger\psi_n(x) = \sqrt{n + 1}\,\psi_{n+1}(x). \tag{9.3.30}$$

These relations make it possible to bypass the eigenfunctions and work in an abstract ket representation of the energy eigenvectors, which we denote simply as $|n\rangle$. The above relations can be expressed compactly as

$$\hat{a}|n\rangle = \sqrt{n}\,|n - 1\rangle, \qquad \hat{a}^\dagger|n\rangle = \sqrt{n + 1}\,|n + 1\rangle. \tag{9.3.31}$$
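Equation (9.3.31) fixes the matrix elements of $\hat{a}$ and $\hat{a}^\dagger$ in the basis $\{|n\rangle\}$, so eqns. (9.3.26) and (9.3.27) can be checked with finite matrices. The sketch below assumes a truncation of the (infinite) basis to `N = 6` states, an illustrative choice that necessarily spoils the commutation relation in the last diagonal entry.

```python
import math

N = 6   # assumption: keep only the basis states |0>, ..., |N-1>

def lowering(size):
    """Matrix of a-hat with elements <n-1|a|n> = sqrt(n), from eqn. (9.3.31)."""
    a = [[0.0] * size for _ in range(size)]
    for n in range(1, size):
        a[n - 1][n] = math.sqrt(n)
    return a

def transpose(M):
    """Transpose of a real matrix (equal to its adjoint here)."""
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    """Product of two square matrices stored as nested lists."""
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

a = lowering(N)
adag = transpose(a)

number = matmul(adag, a)        # a-dagger a: diagonal with entries 0, 1, ..., N-1
a_adag = matmul(a, adag)
comm = [[a_adag[i][j] - number[i][j] for j in range(N)] for i in range(N)]

# Energies in units of hbar * omega, from eqn. (9.3.27).
energies = [number[n][n] + 0.5 for n in range(N)]
```

The diagonal of `comm` equals 1 for every state except the last, where the truncation gives $-(N-1)$ instead; this is an artifact of cutting off the basis, not a property of the operators.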

Because the operator $\hat{a}^\dagger$ changes an eigenvector of $\hat{H}$ into the eigenvector corresponding to the next highest energy, it is called a raising operator or creation operator. Similarly, the operator $\hat{a}$ changes an eigenvector of $\hat{H}$ into the eigenvector corresponding to the next lowest energy, and hence it is called a lowering operator or annihilation operator. Note that $\hat{a}|0\rangle = 0$ by definition.

The raising and lowering operators simplify calculations for the harmonic oscillator considerably. Suppose, for example, we wish to compute the expectation value of the operator $\hat{x}^2$ for a system prepared in one of the eigenstates $\psi_n(x)$ of $\hat{H}$. In principle, one could work out the scary-looking integral

$$\langle n|\hat{x}^2|n\rangle = \left(\frac{\alpha}{\pi 2^{2n}(n!)^2}\right)^{1/2}\int_{-\infty}^{\infty} x^2 e^{-\alpha x^2} H_n^2(\sqrt{\alpha}\,x)\, dx. \tag{9.3.32}$$

However, since $\hat{x}$ has a simple expression in terms of $\hat{a}$ and $\hat{a}^\dagger$,

$$\hat{x} = \sqrt{\frac{\hbar}{2m\omega}}\left(\hat{a} + \hat{a}^\dagger\right), \tag{9.3.33}$$

the expectation value can be evaluated in a few lines. Note that $\langle n|n'\rangle = \delta_{nn'}$ by orthonormality. Thus,

$$\begin{aligned}
\langle n|\hat{x}^2|n\rangle &= \frac{\hbar}{2m\omega}\langle n|\left(\hat{a}^2 + \hat{a}\hat{a}^\dagger + \hat{a}^\dagger\hat{a} + (\hat{a}^\dagger)^2\right)|n\rangle \\
&= \frac{\hbar}{2m\omega}\left[\sqrt{n(n - 1)}\,\langle n|n - 2\rangle + (n + 1)\langle n|n\rangle + n\langle n|n\rangle + \sqrt{(n + 1)(n + 2)}\,\langle n|n + 2\rangle\right] \\
&= \frac{\hbar}{2m\omega}(2n + 1).
\end{aligned} \tag{9.3.34}$$

Thus, by expressing $\hat{x}$ and $\hat{p}$ in terms of $\hat{a}$ and $\hat{a}^\dagger$, we can easily calculate expectation values and arbitrary matrix elements, such as $\langle n|\hat{x}^2|n'\rangle$.
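The result in eqn. (9.3.34) can be reproduced with the matrix representation of the ladder operators. The sketch below assumes $\hbar = m = \omega = 1$ and a truncated basis of `N = 8` states (illustrative choices). It also uses $\hat{p} = i\sqrt{m\hbar\omega/2}\,(\hat{a}^\dagger - \hat{a})$, which follows from subtracting the two lines of eqn. (9.3.25), to form $\hat{p}^2$ and the uncertainty product in an energy eigenstate.

```python
import math

HBAR = MASS = OMEGA = 1.0   # assumption: natural units
N = 8                       # assumption: truncated basis |0>, ..., |N-1>

def matmul(X, Y):
    """Product of two square matrices stored as nested lists."""
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

# Ladder operators from eqn. (9.3.31): <n-1|a|n> = sqrt(n).
a = [[0.0] * N for _ in range(N)]
for n in range(1, N):
    a[n - 1][n] = math.sqrt(n)
adag = [list(row) for row in zip(*a)]

# x = sqrt(hbar / (2 m omega)) (a + a-dagger), eqn. (9.3.33).
cx = math.sqrt(HBAR / (2.0 * MASS * OMEGA))
x = [[cx * (a[i][j] + adag[i][j]) for j in range(N)] for i in range(N)]
x2 = matmul(x, x)

# p = i sqrt(m hbar omega / 2) (a-dagger - a), so p^2 = -(m hbar omega / 2) d^2
# with the real matrix d = a-dagger - a.
d = [[adag[i][j] - a[i][j] for j in range(N)] for i in range(N)]
d2 = matmul(d, d)
p2 = [[-(MASS * HBAR * OMEGA / 2.0) * d2[i][j] for j in range(N)] for i in range(N)]

def uncertainty_product(n):
    """Delta x * Delta p in |n>; <x> and <p> vanish in an energy eigenstate."""
    return math.sqrt(x2[n][n] * p2[n][n])
```

For `n` below `N - 1` the diagonal entries `x2[n][n]` equal $(2n + 1)\hbar/2m\omega$, and `uncertainty_product(n)` equals $(n + \tfrac{1}{2})\hbar$, which attains the minimum $\hbar/2$ of eqn. (9.2.39) only in the ground state. The last basis state is affected by the truncation and should be ignored.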

9.4 Identical particles in quantum mechanics: Spin statistics

In 1922, an experiment carried out by Otto Stern and Walter Gerlach showed that quantum particles possess an intrinsic property that, unlike charge and mass, has no classical analog. When a beam of silver atoms was sent through an inhomogeneous magnetic field with a field increasing from the south to north poles of the magnet, the beam split into two distinct beams. The experiment was repeated in 1927 by T. E. Phipps and J. B. Taylor with hydrogen atoms in their ground state in order to ensure that the effect truly revealed an electronic property. The result of the experiment suggests that the particles comprising the beam possess an intrinsic property that couples to the magnetic field and takes on discrete values. This property is known as the magnetic moment $\hat{\boldsymbol{\mu}}_M$ of the particle, which is defined in terms of a more fundamental property called spin, denoted by the vector operator $\hat{\mathbf{S}}$. These two quantities are related by $\hat{\boldsymbol{\mu}}_M = \gamma\hat{\mathbf{S}}$, where the constant of proportionality $\gamma$ is the spin gyromagnetic ratio, $\gamma = -e/m_e c$. The energy of a particle of spin $\hat{\mathbf{S}}$ fixed in space but interacting with a magnetic field $\mathbf{B}$ is $E = -\hat{\boldsymbol{\mu}}_M\cdot\mathbf{B} = -\gamma\hat{\mathbf{S}}\cdot\mathbf{B}$. Unlike charge and mass, which are simple scalar quantities, spin is expressed as a vector operator and can take on multiple values for a given particle. When the beam in a Stern-Gerlach experiment splits in the magnetic field, for example, this indicates that there are two possible spin states. Since a particle with a magnetic moment resembles a tiny bar magnet, the spin state that has the south pole of the bar magnet pointing toward the north pole of the external magnetic field will be attracted to the stronger field region, and the opposite spin state will be attracted toward the weaker field region.

The three components of the spin operator vector $\hat{\mathbf{S}} = (\hat{S}_x, \hat{S}_y, \hat{S}_z)$ satisfy the commutation relations
\[
[\hat{S}_x, \hat{S}_y] = i\hbar\hat{S}_z, \qquad [\hat{S}_y, \hat{S}_z] = i\hbar\hat{S}_x, \qquad [\hat{S}_z, \hat{S}_x] = i\hbar\hat{S}_y.
\tag{9.4.1}
\]
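As a quick numerical check of eqn. (9.4.1), the following minimal sketch verifies the three commutators for the standard 2×2 spin-1/2 matrices (the representation given in eqn. (9.4.2)). It assumes units in which $\hbar = 1$, and the helper names (`matmul`, `scale`, `commutator`, `close`, `satisfies_spin_algebra`) are illustrative choices, not part of the text.

```python
# Spin-1/2 matrices of eqn. (9.4.2), in units where hbar = 1 (an assumption).
HBAR = 1.0


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def scale(c, a):
    """Multiply every element of matrix a by the scalar c."""
    return [[c * x for x in row] for row in a]


def commutator(a, b):
    """Return [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]


def close(a, b, tol=1e-12):
    """True if two matrices agree element by element within tol."""
    return all(abs(x - y) < tol for r1, r2 in zip(a, b) for x, y in zip(r1, r2))


SX = scale(HBAR / 2, [[0, 1], [1, 0]])
SY = scale(HBAR / 2, [[0, -1j], [1j, 0]])
SZ = scale(HBAR / 2, [[1, 0], [0, -1]])


def satisfies_spin_algebra():
    """Check the three cyclic commutators of eqn. (9.4.1)."""
    return (close(commutator(SX, SY), scale(1j * HBAR, SZ))
            and close(commutator(SY, SZ), scale(1j * HBAR, SX))
            and close(commutator(SZ, SX), scale(1j * HBAR, SY)))


print(satisfies_spin_algebra())
```

Plain nested lists are used so that the sketch needs nothing beyond the standard library; a numerical linear algebra package would be the more usual choice for larger spins.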

These commutation relations are similar to those satisfied by the three components of the angular momentum operator $\hat{\mathbf{L}} = \hat{\mathbf{r}}\times\hat{\mathbf{p}}$. A convenient way to remember the commutation relations is to note that they can be expressed compactly as $\hat{\mathbf{S}}\times\hat{\mathbf{S}} = i\hbar\hat{\mathbf{S}}$. Since spin is an intrinsic property, a particle is said to be a spin-$s$ particle, where $s$ can be either an integer or a half-integer. A spin-$s$ particle can exist in $2s+1$ possible spin states, which, by convention, are taken to be the eigenvectors of the operator $\hat{S}_z$. The eigenvalues of $\hat{S}_z$ then take the values $-s\hbar, (-s+1)\hbar, \ldots, (s-1)\hbar, s\hbar$. For example, the spin operators for a spin-1/2 particle can be represented by 2×2 matrices of the form
\[
\hat{S}_x = \frac{\hbar}{2}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
\hat{S}_y = \frac{\hbar}{2}\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad
\hat{S}_z = \frac{\hbar}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},
\tag{9.4.2}
\]
and the spin-1/2 Hilbert space is a two-dimensional space. The two spin states have associated spin eigenvalues $s_z = m\hbar$, where $m = -1/2$ and $m = 1/2$, and the corresponding eigenvectors are given by
\[
|m = 1/2\rangle \equiv |\chi_{1/2}\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad
|m = -1/2\rangle \equiv |\chi_{-1/2}\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix},
\tag{9.4.3}
\]


which are (arbitrarily) referred to as “spin-up” and “spin-down,” respectively. The spin-up and spin-down states are also sometimes denoted $|\alpha\rangle$ and $|\beta\rangle$, though we will not make use of this nomenclature. Note that the operator $\hat{S}^2 = \hat{\mathbf{S}}\cdot\hat{\mathbf{S}} = \hat{S}_x^2 + \hat{S}_y^2 + \hat{S}_z^2$ is diagonal and, therefore, shares common eigenvectors with $\hat{S}_z$. These eigenvectors are degenerate, however, having the eigenvalue $s(s+1)\hbar^2$. Finally, if the Hamiltonian $\hat{H}$ is independent of the spin operator, then eigenvectors of $\hat{H}$ are also eigenvectors of $\hat{S}^2$ and $\hat{S}_z$, since all three can be simultaneously diagonalized.

How the physical states of identical particles are constructed depends on the spin of the particles. Consider the example of two identical spin-$s$ particles. Suppose a measurement is performed that can determine that one of the particles has an $\hat{S}_z$ eigenvalue of $m_a\hbar$ and the other $m_b\hbar$ such that $m_a \neq m_b$. Is the state vector of the total system just after this measurement $|m_a; m_b\rangle \equiv |m_a\rangle\otimes|m_b\rangle$ or $|m_b; m_a\rangle \equiv |m_b\rangle\otimes|m_a\rangle$? Note that, in the first state, particle 1 has the $\hat{S}_z$ eigenvalue $m_a\hbar$, and particle 2 has the $\hat{S}_z$ eigenvalue $m_b\hbar$. In the second state, the labeling is reversed. The answer is that neither state is correct. Since the particles are identical, the measurement is not able to assign a particular spin state to each particle. In fact, the two states $|m_a; m_b\rangle$ and $|m_b; m_a\rangle$ are not physically equivalent states. Two states $|\Psi\rangle$ and $|\Psi'\rangle$ can only be physically equivalent if there is a complex scalar $\alpha$ such that
\[
|\Psi\rangle = \alpha|\Psi'\rangle,
\tag{9.4.4}
\]

and there is no such number relating $|m_a; m_b\rangle$ to $|m_b; m_a\rangle$. Therefore, we need to construct a new state vector $|\Psi(m_a, m_b)\rangle$ such that $|\Psi(m_b, m_a)\rangle$ is physically equivalent to $|\Psi(m_a, m_b)\rangle$. Such a state is the only possibility for correctly representing the physical state of the system immediately after the measurement. Let us take as an ansatz
\[
|\Psi(m_a, m_b)\rangle = C|m_a; m_b\rangle + C'|m_b; m_a\rangle.
\tag{9.4.5}
\]
If we require that
\[
|\Psi(m_a, m_b)\rangle = \alpha|\Psi(m_b, m_a)\rangle,
\tag{9.4.6}
\]
then
\[
C|m_a; m_b\rangle + C'|m_b; m_a\rangle = \alpha\left(C|m_b; m_a\rangle + C'|m_a; m_b\rangle\right),
\tag{9.4.7}
\]
from which it can be seen that
\[
C = \alpha C', \qquad C' = \alpha C
\tag{9.4.8}
\]
or
\[
C' = \alpha^2 C'.
\tag{9.4.9}
\]
The only nontrivial solution to these equations is $\alpha = \pm 1$ and $C = \pm C'$. This gives us two possible physical states of the system: a state that is symmetric (S) under an exchange of $\hat{S}_z$ eigenvalues and one that is antisymmetric (A) under such an exchange. These states are given by


\[
\begin{aligned}
|\Psi_S(m_a, m_b)\rangle &\propto |m_a; m_b\rangle + |m_b; m_a\rangle \\
|\Psi_A(m_a, m_b)\rangle &\propto |m_a; m_b\rangle - |m_b; m_a\rangle.
\end{aligned}
\tag{9.4.10}
\]

Similarly, suppose we have two identical particles in one dimension, and we perform an experiment capable of determining the position of each particle. If the measurement determines that one particle is at position $x = a$ and the other is at $x = b$, then the state of the system after the measurement would be one of the two following possibilities:
\[
\begin{aligned}
|\Psi_S(a, b)\rangle &\propto |a\ b\rangle + |b\ a\rangle \\
|\Psi_A(a, b)\rangle &\propto |a\ b\rangle - |b\ a\rangle.
\end{aligned}
\tag{9.4.11}
\]
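The construction in eqns. (9.4.10) and (9.4.11) can be illustrated with a small sketch that builds the two-particle states as tensor products and checks how they behave under exchange of the particle labels. The function names, the flat-list storage of amplitudes, and the labeling of basis states by integer indices (for spin-1/2, index 0 for $m = 1/2$ and index 1 for $m = -1/2$) are assumptions made only for this illustration.

```python
import math


def basis(index, dim):
    """Unit vector |index> in a dim-dimensional single-particle space."""
    return [1.0 if i == index else 0.0 for i in range(dim)]


def kron(u, v):
    """Tensor product |u> (x) |v> stored as a flat list of amplitudes."""
    return [a * b for a in u for b in v]


def exchange(state, dim):
    """Swap the particle labels: the amplitude at (i, j) becomes the one at (j, i)."""
    return [state[j * dim + i] for i in range(dim) for j in range(dim)]


def two_particle_state(a, b, dim, sign):
    """Return |a;b> + sign*|b;a>, normalized whenever the result is nonzero."""
    ab = kron(basis(a, dim), basis(b, dim))
    ba = kron(basis(b, dim), basis(a, dim))
    psi = [x + sign * y for x, y in zip(ab, ba)]
    norm = math.sqrt(sum(x * x for x in psi))
    return [x / norm for x in psi] if norm > 0 else psi


psi_s = two_particle_state(0, 1, 2, +1)   # symmetric combination
psi_a = two_particle_state(0, 1, 2, -1)   # antisymmetric combination
print(exchange(psi_s, 2) == psi_s)
print(exchange(psi_a, 2) == [-x for x in psi_a])
```

Calling `two_particle_state(1, 1, 2, -1)` returns the zero vector, which is the statement that the antisymmetric combination does not exist when both labels coincide.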

How do we know whether a given pair of identical particles will opt for the symmetric or antisymmetric state? In order to resolve this ambiguity, the standard postulates of quantum mechanics need to be supplemented by an additional postulate that specifies which of the two possible physical states the particle pair will assume. The new postulate states the following: In nature, particles are of two possible types: those that are always found in symmetric (S) multiparticle states and those that are always found in antisymmetric (A) multiparticle states. The former are known as bosons (named for the Indian physicist Satyendra Nath Bose (1894–1974)) and the latter as fermions (named for the Italian physicist Enrico Fermi (1901–1954)). Fermions are half-integer-spin particles ($s = 1/2, 3/2, 5/2, \ldots$), while bosons are integer-spin particles ($s = 0, 1, 2, \ldots$). Examples of fermions are electrons, protons, neutrons, and $^3$He nuclei, all of which are spin-1/2 particles. Examples of bosons are $^4$He nuclei, which are spin-0, and photons, which are spin-1.

Note that the antisymmetric state has the important property that if $m_a = m_b$, $|\Psi_A(m_a, m_a)\rangle = |\Psi_A(m_b, m_b)\rangle = 0$. Since identical fermions are found in antisymmetric states, it follows that no two identical fermions can be found in nature in exactly the same quantum state. Put another way, no two identical fermions can have the same set of quantum numbers. This statement is known as the Pauli exclusion principle after the Austrian physicist Wolfgang Pauli (1900–1958).

Suppose a system is composed of $N$ identical fermions or bosons with coordinate labels $\mathbf{r}_1, \ldots, \mathbf{r}_N$ and spin labels $s_1, \ldots, s_N$. The spin labels designate the eigenvalue of $\hat{S}_z$ for each particle. Let us define, for each particle, a combined label $x_i \equiv \mathbf{r}_i, s_i$. Then, for a given permutation $P(1), \ldots, P(N)$ of the particle indices $1, \ldots, N$, the wave function will be totally symmetric if the particles are bosons:
\[
\Psi_B(x_1, \ldots, x_N) = \Psi_B(x_{P(1)}, \ldots, x_{P(N)}).
\tag{9.4.12}
\]

For fermions, as a result of the Pauli exclusion principle, the wave function is antisymmetric with respect to an exchange of any two particles in the system. Therefore, in creating the given permutation, the wave function will pick up a factor of $-1$ for each exchange of two particles that is performed:
\[
\Psi_F(x_1, \ldots, x_N) = (-1)^{N_{\rm ex}}\,\Psi_F(x_{P(1)}, \ldots, x_{P(N)}),
\tag{9.4.13}
\]

where $N_{\rm ex}$ is the total number of exchanges of two particles required in order to achieve the permutation $P(1), \ldots, P(N)$.

An $N$-particle bosonic or fermionic state can be created from a state $\Phi(x_1, \ldots, x_N)$ which is not properly symmetrized but which, nevertheless, is an eigenfunction of the Hamiltonian
\[
\hat{H}\Phi = E\Phi.
\tag{9.4.14}
\]

Since there are $N!$ possible permutations of the $N$ particle labels in an $N$-particle state, the bosonic state $\Psi_B(x_1, \ldots, x_N)$ is created from $\Phi(x_1, \ldots, x_N)$ according to
\[
\Psi_B(x_1, \ldots, x_N) = \frac{1}{\sqrt{N!}}\sum_{\alpha=1}^{N!} \hat{P}_\alpha \Phi(x_1, \ldots, x_N),
\tag{9.4.15}
\]
where $\hat{P}_\alpha$ creates one of the $N!$ possible permutations of the indices. The fermionic state is created from
\[
\Psi_F(x_1, \ldots, x_N) = \frac{1}{\sqrt{N!}}\sum_{\alpha=1}^{N!} (-1)^{N_{\rm ex}(\alpha)}\,\hat{P}_\alpha \Phi(x_1, \ldots, x_N),
\tag{9.4.16}
\]
where $N_{\rm ex}(\alpha)$ is the number of exchanges needed to create permutation $\alpha$. The $N!$ that appears in the physical states is exactly the $N!$ introduced ad hoc in the expressions for the classical partition functions to account for the identical nature of the particles not explicitly treated in classical mechanics.
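The sums over permutations in eqns. (9.4.15) and (9.4.16) can be sketched directly in code. The example below is an illustration under stated assumptions: the unsymmetrized state is taken to be a product of three hypothetical one-dimensional single-particle functions (`orbital`, chosen only for convenience and not normalized), spin labels are omitted, and the sign of each permutation is obtained from its inversion count, which has the same parity as $N_{\rm ex}(\alpha)$.

```python
import itertools
import math


def inversions(perm):
    """Count inversions; this has the same parity as the exchange count N_ex."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])


def symmetrized(phi, n, fermions):
    """Build Psi from an unsymmetrized phi(x_1, ..., x_n) by summing over all n! permutations."""
    perms = list(itertools.permutations(range(n)))
    prefactor = 1.0 / math.sqrt(math.factorial(n))

    def psi(*x):
        total = 0.0
        for p in perms:
            sign = (-1) ** inversions(p) if fermions else 1
            total += sign * phi(*(x[i] for i in p))
        return prefactor * total

    return psi


def orbital(k, x):
    """Hypothetical single-particle function, used only for illustration."""
    return x ** k * math.exp(-0.5 * x * x)


def phi(x1, x2, x3):
    """Unsymmetrized product state with quantum numbers k = 0, 1, 2."""
    return orbital(0, x1) * orbital(1, x2) * orbital(2, x3)


psi_f = symmetrized(phi, 3, fermions=True)    # eqn. (9.4.16)
psi_b = symmetrized(phi, 3, fermions=False)   # eqn. (9.4.15)
print(psi_f(0.3, -0.7, 1.1), psi_f(-0.7, 0.3, 1.1))
```

Swapping two arguments flips the sign of `psi_f` and leaves `psi_b` unchanged, and `psi_f` vanishes whenever two arguments coincide. The cost of the explicit sum grows as $N!$, so this brute-force form is practical only for very small $N$.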

9.5 Problems

9.1. Generalize the proof in Section 9.2.3 of orthogonality of the eigenvectors of a Hermitian operator to the case that some of the eigenvalues of the operator are degenerate. Start by considering two degenerate eigenvectors: If $|a_j\rangle$ and $|a_k\rangle$ are two eigenvectors of $\hat{A}$ with eigenvalue $a_j$, show that two new eigenvectors $|a'_j\rangle$ and $|a'_k\rangle$ can be constructed such that $|a'_j\rangle = |a_j\rangle$ and $|a'_k\rangle = |a_k\rangle + c|a_j\rangle$, where $c$ is a constant, and determine $c$ such that $\langle a'_j|a'_k\rangle = 0$. Generalize the procedure to an arbitrary degeneracy.

9.2. A spin-1/2 particle that is fixed in space interacts with a uniform magnetic field $\mathbf{B}$. The magnetic field lies entirely along the $z$-axis, so that $\mathbf{B} = (0, 0, B)$. The Hamiltonian for this system is therefore
\[
\hat{H} = -\gamma B\hat{S}_z.
\]
The dimensionality of the Hilbert space for this problem is 2.

a. Determine the eigenvalues and eigenvectors of $\hat{H}$.

b. Suppose the system is prepared with an initial state vector
\[
|\Psi(0)\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.
\]
Determine the state vector $|\Psi(t)\rangle$ at time $t$.

c. Determine the expectation values of the operators $\hat{S}_x$, $\hat{S}_y$, and $\hat{S}_z$ in the time-dependent state computed in part b.

d. Suppose, instead, the system is prepared with an initial state vector
\[
|\Psi(0)\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}.
\]
Determine the state vector $|\Psi(t)\rangle$ at time $t$.

e. Using the time-dependent state computed in part d, determine the following expectation values: $\langle\Psi(t)|\hat{S}_x|\Psi(t)\rangle$, $\langle\Psi(t)|\hat{S}_y|\Psi(t)\rangle$, $\langle\Psi(t)|\hat{S}_z|\Psi(t)\rangle$.

f. For the time-dependent state in part d, determine the uncertainties $\Delta S_x$, $\Delta S_y$, and $\Delta S_z$, where
\[
\Delta S_\alpha = \sqrt{\langle\Psi(t)|\hat{S}_\alpha^2|\Psi(t)\rangle - \langle\Psi(t)|\hat{S}_\alpha|\Psi(t)\rangle^2},
\]
for $\alpha = x, y, z$.

9.3. Consider a free particle in a one-dimensional box that extends from $x = -L/2$ to $x = L/2$. Assuming periodic boundary conditions, determine the eigenvalues and eigenfunctions of the Hamiltonian for this problem. Repeat for infinite walls at $x = -L/2$ and $x = L/2$.

9.4. A rigid homonuclear diatomic molecule rotates in the $xy$ plane about an axis through its center of mass. Let $m$ be the mass of each atom in the molecule, and let $R$ be its bond length. Show that the molecule has a discrete set of energy levels (energy eigenvalues) and determine the corresponding eigenfunctions.

9.5. Given only the commutator relation between $\hat{x}$ and $\hat{p}$, $[\hat{x}, \hat{p}] = i\hbar\hat{I}$, and the fact that $\hat{p} \rightarrow -i\hbar(d/dx)$ when projected into the coordinate basis, show that the inner product relation
\[
\langle x|p\rangle = \frac{1}{\sqrt{2\pi\hbar}}\,e^{ipx/\hbar}
\]
follows.

9.6. Using raising and lowering operators, calculate the expectation value $\langle n|\hat{x}^4|n\rangle$ and general matrix element $\langle n'|\hat{x}^4|n\rangle$ for a one-dimensional harmonic oscillator.


9.7. Consider an unbound free particle in one dimension such that $x \in (-\infty, \infty)$. An initial wave function $\Psi(x, 0)$, where
\[
\Psi(x, 0) = \left(\frac{1}{2\pi\sigma^2}\right)^{1/4} e^{-x^2/4\sigma^2},
\]
is prepared.

a. Determine the time evolution of the initial wave function and the corresponding time-dependent probability density.

b. Calculate the uncertainties in $\hat{x}$ and $\hat{p}$ at time $t$. What is the product $\Delta x\,\Delta p$?

∗9.8. A charged particle with charge $q$ and mass $m$ moves in an external magnetic field $\mathbf{B} = (0, 0, B)$. Let $\hat{\mathbf{r}}$ and $\hat{\mathbf{p}}$ be the position and momentum operators for the particle, respectively. The Hamiltonian for the system is
\[
\hat{H} = \frac{1}{2m}\left(\hat{\mathbf{p}} - \frac{q}{c}\mathbf{A}(\hat{\mathbf{r}})\right)^2,
\]
where $c$ is the speed of light and $\mathbf{A}(\mathbf{r})$ is called the vector potential. $\mathbf{A}$ is related to the magnetic field $\mathbf{B}$ by $\mathbf{B} = \nabla\times\mathbf{A}(\mathbf{r})$. One possible choice for $\mathbf{A}$ is $\mathbf{A}(\mathbf{r}) = (-By, 0, 0)$. The particle occupies a cubic box of side $L$ that extends from 0 to $L$ in each spatial direction subject to periodic boundary conditions. Find the energy eigenvalues and eigenfunctions for this problem. Are any of the energy levels degenerate?
Hint: Try a solution of the form
\[
\psi(x, y, z) = C e^{i(p_x x + p_z z)/\hbar}\phi(y)
\]
and show that $\phi(y)$ satisfies a harmonic oscillator equation with frequency $\omega = qB/mc$ and equilibrium position $y_0 = -(cp_x/qB)$. You may assume $L$ is much larger than the range of $y - y_0$.



9.9. Consider a system of $N$ identical particles moving in one spatial dimension. Suppose the Hamiltonian for the system is separable, meaning that it can be expressed as a sum
\[
\hat{H} = \sum_{i=1}^{N} \hat{h}(\hat{x}_i, \hat{p}_i),
\]
where $\hat{x}_i$ and $\hat{p}_i$ are the coordinate and momentum operators for particle $i$. These operators satisfy the commutation relations
\[
[\hat{x}_i, \hat{x}_j] = 0, \qquad [\hat{p}_i, \hat{p}_j] = 0, \qquad [\hat{x}_i, \hat{p}_j] = i\hbar\delta_{ij}.
\]

a. If the Hamiltonian $\hat{h}(\hat{x}, \hat{p})$ is of the form
\[
\hat{h}(\hat{x}, \hat{p}) = \frac{\hat{p}^2}{2m} + U(\hat{x}),
\]
show that the eigenvalue problem for $\hat{H}$ can be expressed as $N$ single-particle eigenvalue problems of the form
\[
\left[-\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2} + U(x)\right]\psi_{k_i}(x) = \varepsilon_{k_i}\psi_{k_i}(x),
\]
such that the $N$-particle eigenvalues $E_{k_1, \ldots, k_N}$, which are characterized by $N$ quantum numbers, are given by
\[
E_{k_1, \ldots, k_N} = \sum_{i=1}^{N} \varepsilon_{k_i}.
\]

b. Show that if the particles could be treated as distinguishable, then the eigenfunctions of $\hat{H}$ could be expressed as a product
\[
\Phi_{k_1, \ldots, k_N}(x_1, \ldots, x_N) = \prod_{i=1}^{N} \psi_{k_i}(x_i).
\]

c. Show that if the particles are identical fermions, then the application of eqn. (9.4.16) leads to a set of eigenfunctions $\Psi_{k_1, \ldots, k_N}(x_1, \ldots, x_N)$ that is expressible as the determinant of a matrix whose rows are of the form
\[
\psi_{k_1}(x_{P(1)})\quad \psi_{k_2}(x_{P(2)})\quad \cdots\quad \psi_{k_N}(x_{P(N)}).
\]
Recall that $P(1), \ldots, P(N)$ is one of the $N!$ permutations of the indices $1, \ldots, N$. Give the general form of this determinant. (This determinant is called a Slater determinant after its inventor John C. Slater (1900–1976).)
Hint: Try it first for $N = 2$ and $N = 3$.

d. Show that if the particles are bosons rather than fermions, then the eigenfunctions are exactly the same as those of part c except for a replacement of the determinant by a permanent.
Hint: The permanent of a matrix can be generated from the determinant by replacing all of the minus signs with plus signs. Thus, for a 2×2 matrix
\[
M = \begin{pmatrix} a & b \\ c & d \end{pmatrix},
\]
${\rm perm}(M) = ad + bc$.


9.10. A single particle in one dimension is subject to a potential $U(x)$. Another particle in one dimension is subject to a potential $V(x)$. Suppose $U(x) \neq V(x) + C$, where $C$ is a constant. Prove that the ground-state wave functions $\psi_0(x)$ and $\phi_0(x)$ for each problem must be different.
Hint: Try using proof by contradiction. That is, assume $\psi_0(x) = \phi_0(x)$. What relation between $U(x)$ and $V(x)$ is obtained?

9.11. In a certain nuclear magnetic resonance experiment, a spin-1/2 nucleus (e.g., a proton) is subject to a magnetic field $\mathbf{B} = (0, 0, B)$ in an environment in which its shielding is $\sigma$. The spin-1/2 nucleus is well separated from other nuclei, allowing spin-spin interactions to be neglected. The Hamiltonian in this case is then
\[
\hat{H}_0 = -\gamma B(1 - \sigma)\hat{S}_z,
\]
where $\gamma$ is the spin gyromagnetic ratio and $\hat{S}_z$ is the spin operator in the $z$-direction. Assume this Hamiltonian describes the system for time $t < 0$. At $t = 0$, a time-dependent magnetic field of strength $B_1$ aligned along the $x$-direction is applied, so as to induce transitions among the energy levels of $\hat{H}_0$. The total Hamiltonian for $t \geq 0$ is
\[
\hat{H}(t) = \hat{H}_0 + \hat{P}(t) + \hat{P}^\dagger(t),
\]
where the operator $\hat{P}(t)$ is
\[
\hat{P}(t) = \gamma B_1 e^{i\omega t}\hat{S}_+,
\]
in which $\omega$ is the frequency of the time-dependent field, and
\[
\hat{S}_+ = \hbar\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.
\]
This Hamiltonian will cause the nuclear spin to acquire a time-dependent component off the $z$-axis, which is determined by solving the time-dependent Schrödinger equation for $t \geq 0$:
\[
\hat{H}(t)|\Psi(t)\rangle = i\hbar\frac{d}{dt}|\Psi(t)\rangle.
\]
This problem will walk you through a procedure for solving the time-dependent Schrödinger equation for this system.

a. Assume that for $t < 0$, the nucleus is in its lowest energy state. Based on this, determine the state vector $|\Psi(0)\rangle$ at $t = 0$ immediately before the time-dependent field is switched on. This state vector will be the initial condition on the time-dependent Schrödinger equation.

b. Derive a 2×2 matrix representation of the time-dependent Hamiltonian $\hat{H}(t)$ and show that $\hat{H}(t)$ is Hermitian.

c. In order to solve the equation, we will introduce a transformation of the state vector $|\Psi(t)\rangle = R^\dagger(t)|\Phi(t)\rangle$ and derive an equation for $|\Phi(t)\rangle$. Here,
\[
R(t) = \begin{pmatrix} e^{-i\omega t/2} & 0 \\ 0 & e^{i\omega t/2} \end{pmatrix}.
\]
For this transformation, show that $|\Psi(0)\rangle = |\Phi(0)\rangle$.

d. Substitute the transformation $|\Psi(t)\rangle = R^\dagger(t)|\Phi(t)\rangle$ into the time-dependent Schrödinger equation and show that the resulting equation for $|\Phi(t)\rangle$ is
\[
\hat{H}(t)R^\dagger(t)|\Phi(t)\rangle = i\hbar R^\dagger(t)\frac{d}{dt}|\Phi(t)\rangle + i\hbar\left(\frac{d}{dt}R^\dagger(t)\right)|\Phi(t)\rangle.
\]

e. Multiply the equation in part d through on the left by $R(t)$ and, after simplifying, show that the resulting equation for $|\Phi(t)\rangle$ is
\[
R(t)\hat{H}(t)R^\dagger(t)|\Phi(t)\rangle + \omega\hat{S}_z|\Phi(t)\rangle = i\hbar\frac{d}{dt}|\Phi(t)\rangle.
\]

f. Show that the operator acting on $|\Phi(t)\rangle$ on the left-hand side of the equation in part e can be written as a 2×2 matrix
\[
\hat{H}_T \equiv R(t)\hat{H}(t)R^\dagger(t) + \omega\hat{S}_z = \begin{pmatrix} -H_{11} & H_{12} \\ H_{12} & H_{11} \end{pmatrix},
\]
where
\[
H_{11} = \frac{\gamma B(1 - \sigma)\hbar - \hbar\omega}{2}, \qquad H_{12} = \gamma B_1\hbar.
\]

g. Given that $\hat{H}_T$ is time-independent, we can express the solution for $|\Phi(t)\rangle$ in the familiar form
\[
|\Phi(t)\rangle = |E_1\rangle\langle E_1|\Phi(0)\rangle e^{-iE_1 t/\hbar} + |E_2\rangle\langle E_2|\Phi(0)\rangle e^{-iE_2 t/\hbar},
\]
where $E_1$ and $E_2$ are the eigenvalues of $\hat{H}_T$, and $|E_1\rangle$ and $|E_2\rangle$ are the corresponding eigenvectors. Determine the eigenvalues of $\hat{H}_T$ in terms of $H_{11}$ and $H_{12}$, and show that the eigenvectors are
\[
C\begin{pmatrix} 1 \\ -H_{12}/D \end{pmatrix}, \qquad C\begin{pmatrix} H_{12}/D \\ 1 \end{pmatrix},
\]
where
\[
D = H_{11} - \sqrt{H_{11}^2 + H_{12}^2}, \qquad C = \frac{1}{\sqrt{1 + (H_{12}^2/D^2)}}.
\]
Finally, show that these eigenvectors form an orthonormal set.


h. Use the eigenvalues and eigenvectors from part g and the initial condition you obtained in part a to determine the time dependence of $|\Phi(t)\rangle$ and then transform $|\Phi(t)\rangle$ back to $|\Psi(t)\rangle$ using the matrix $R^\dagger(t)$ to obtain the solution to the time-dependent Schrödinger equation. You may express your answer in terms of $H_{11}$, $H_{12}$, and $D$.

i. In the state $|\Psi(t)\rangle$, find the probability as a function of time $t$ that a measurement of $\hat{S}_z$ yields the value $\hbar/2$. You may express your answer in terms of $H_{11}$, $H_{12}$, and $D$.

j. Finally, show that if, at some point in time $\tau > 0$, the time-dependent field is switched off ($B_1 = 0$, $\omega = 0$), $|\Psi(t)\rangle$ reverts back to its initial form from part a.

9.12. In quantum computing, a qubit is a linear combination of two basic states $|0\rangle$ and $|1\rangle$ whose definitions are the same as those of a spin-1/2 particle, i.e.,
\[
|1\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |0\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.
\]
One of the most useful single-qubit logic gates in quantum computing is one that takes each input $|0\rangle$ and $|1\rangle$ and creates the following superposition states:
\[
|0\rangle \longrightarrow \frac{|0\rangle + |1\rangle}{\sqrt{2}}, \qquad |1\rangle \longrightarrow \frac{|0\rangle - |1\rangle}{\sqrt{2}}.
\]

a. Find a combination of the Pauli matrices
\[
\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
\sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad
\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
\]
that transforms each input this way, and show that your combination leads to a unitary matrix capable of performing the action of this logic gate.

b. Now consider the truth table in Fig. 9.2 for a quantum XOR gate. The two-input gate shown in the figure is a classical XOR gate (‘XOR’ = exclusive OR). It produces the output $(A \oplus B) \equiv (A + B) \times ({\rm NOT}(A) + {\rm NOT}(B))$. The quantum analog of the XOR gate would require two entangled qubits that produce a single-qubit output according to $|0\,0\rangle \longrightarrow |0\rangle$, $|0\,1\rangle \longrightarrow |1\rangle$, $|1\,0\rangle \longrightarrow |1\rangle$, $|1\,1\rangle \longrightarrow |0\rangle$. Can such a quantum gate be designed using only a unitary transformation? If so, specify what that transformation needs to be. If not, explain what other operations would be needed, and write down the full set of operations you propose.

Fig. 9.2 Truth table of the quantum XOR gate.

c. Show that the two-qubit state
\[
\frac{1}{2\sqrt{2}}\left(\sqrt{3}\,|0\,0\rangle - \sqrt{3}\,|0\,1\rangle + |1\,0\rangle - |1\,1\rangle\right)
\]
is not an entangled state, meaning that it can be written as a product of two states $|\psi_0\rangle|\psi_1\rangle$. Find the two states $|\psi_0\rangle$ and $|\psi_1\rangle$ whose product yields this state.

d. Show that the two-qubit state
\[
\frac{1}{\sqrt{2}}\left(|0\,0\rangle + |1\,1\rangle\right)
\]
is an entangled state, meaning that no two states $|\psi_0\rangle$ and $|\psi_1\rangle$ can be found whose product yields this state.

10 Quantum ensembles and the density matrix

10.1 The difficulty of many-body quantum mechanics

We begin our discussion of the quantum equilibrium ensembles by considering a system of $N$ identical particles in a container of volume $V$. This is the same setup we studied in Section 3.1 in developing the classical ensembles. In principle, the physical properties of such a large quantum system can be obtained by solving the full time-dependent Schrödinger equation. Suppose the Hamiltonian of the system is
\[
\hat{H} = \sum_{i=1}^{N} \frac{\hat{\mathbf{p}}_i^2}{2m} + U(\hat{\mathbf{r}}_1, \ldots, \hat{\mathbf{r}}_N).
\tag{10.1.1}
\]
In $d$ dimensions, there will be $dN$ position operators and $dN$ momentum operators. All the position operators commute with each other, as do all of the momentum operators. The commutation rule between position and momentum operators is
\[
[\hat{r}_{i\alpha}, \hat{p}_{j\beta}] = i\hbar\delta_{ij}\delta_{\alpha\beta},
\tag{10.1.2}
\]
where $\alpha$ and $\beta$ index the $d$ spatial directions and $i$ and $j$ index the particle number. Given the commutation rules, the many-particle coordinate and momentum eigenvectors are direct products (also called tensor products) of the eigenvectors of the individual operators. For example, a many-particle coordinate eigenvector in three dimensions is
\[
|x_1\, y_1\, z_1 \cdots x_N\, y_N\, z_N\rangle = |x_1\rangle\otimes|y_1\rangle\otimes|z_1\rangle \otimes\cdots\otimes |x_N\rangle\otimes|y_N\rangle\otimes|z_N\rangle.
\tag{10.1.3}
\]

Thus, projecting the Schrödinger equation onto the coordinate basis, the $N$-particle Schrödinger equation in three dimensions becomes
\[
\left[-\frac{\hbar^2}{2m}\sum_{i=1}^{N}\nabla_i^2 + U(\mathbf{r}_1, \ldots, \mathbf{r}_N)\right]\Psi(x_1, \ldots, x_N, t) = i\hbar\frac{\partial}{\partial t}\Psi(x_1, \ldots, x_N, t),
\tag{10.1.4}
\]
where $x_i = \mathbf{r}_i, s_i$, and the expectation value of a Hermitian operator $\hat{A}$ corresponding to an observable is $\langle\hat{A}\rangle_t = \langle\Psi(t)|\hat{A}|\Psi(t)\rangle$. The problem inherent in solving eqn. (10.1.4) and evaluating the expectation value (which is a $dN$-dimensional integral) is that, unless an analytical solution is available, the computational overhead for a numerical solution grows exponentially with the number of degrees of freedom. If eqn. (10.1.4) were to be solved on a spatial grid with $M$ points along each spatial direction, then the total number of points needed would be $M^{3N}$. Thus, even on a very coarse grid with just $M = 10$ points, for $N \sim 10^{23}$ particles, the total number of grid points would be on the order of $10^{10^{23}}$ points! But even for a small molecule of just $N = 10$ atoms in the gas phase, after we subtract out translations and rotations, $\Psi$ is still a function of 24 coordinates and time. The size of the grid needed to solve eqn. (10.1.4) is large enough that the calculation is beyond the capability of current computing resources. The same is true for the $N$-particle eigenvalue equation
\[
\left[-\frac{\hbar^2}{2m}\sum_{i=1}^{N}\nabla_i^2 + U(\mathbf{r}_1, \ldots, \mathbf{r}_N)\right]\psi_{\{\mathbf{k}, m_s\}}(x_1, \ldots, x_N) = E_{\{\mathbf{k}, m_s\}}\,\psi_{\{\mathbf{k}, m_s\}}(x_1, \ldots, x_N)
\tag{10.1.5}
\]
(see Problem 9.9). Here $\{\mathbf{k}, m_s\} \equiv \mathbf{k}_1, \ldots, \mathbf{k}_N, m_{s_1}, \ldots, m_{s_N}$ are the $4N$ quantum numbers, including $\hat{S}_z$ values, needed to characterize the eigenfunctions and eigenvalues. In fact, explicit solution of the eigenvalue equation for just 4 or 5 particles is considered a tour de force calculation. While such calculations yield a wealth of highly accurate dynamical information about small systems, if one wishes to move beyond the limits of the Schrödinger equation and the explicit calculation of the eigenvalues and eigenfunctions of $\hat{H}$, statistical methods are needed. Now that we have a handle on the magnitude of the many-body quantum mechanical problem, we proceed to introduce the basic principles of quantum equilibrium ensemble theory.
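The scaling argument above can be made concrete with a few lines of arithmetic. The sketch below works with base-10 logarithms so that the numbers never overflow. The storage estimate assumes one double-precision complex amplitude (16 bytes) per grid point, which is an assumption of this illustration and not a statement from the text, and the function names are likewise illustrative.

```python
import math


def log10_grid_points(points_per_axis, n_coordinates):
    """log10 of M**n, the number of grid points for n coordinates with M points each."""
    return n_coordinates * math.log10(points_per_axis)


def log10_storage_bytes(points_per_axis, n_coordinates, bytes_per_value=16):
    """log10 of the bytes needed to hold one complex amplitude per grid point.

    The default of 16 bytes assumes double-precision complex numbers.
    """
    return log10_grid_points(points_per_axis, n_coordinates) + math.log10(bytes_per_value)


# The 10-atom molecule of the text: 24 internal coordinates, M = 10 points per coordinate.
print(log10_grid_points(10, 24))
print(log10_storage_bytes(10, 24))
```

For the 10-atom example the grid has $10^{24}$ points, so a single copy of the wave function would occupy roughly $1.6\times 10^{25}$ bytes under the stated storage assumption.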

10.2 The ensemble density matrix

Quantum ensembles are conceptually very much like their classical counterparts. Our treatment here will follow somewhat the development presented by Feynman (1998). We begin by considering a collection of $Z$ quantum systems, each with a unique state vector $|\Psi^{(\lambda)}\rangle$, $\lambda = 1, \ldots, Z$, corresponding to a unique microscopic state. At this stage, we imagine that our quantum ensemble is frozen in time, so that the state vectors are fixed. (In Section 10.3 below, we will see how the ensemble develops in time.) As in the classical case, it is assumed that the microscopic states of the ensemble are consistent with a set of macroscopic thermodynamic observables, such as temperature, pressure, chemical potential, etc. The principal goal is to predict observables in the form of expectation values. Therefore, we define the expectation value of an operator $\hat{A}$ as the ensemble average of expectation values with respect to each microscopic state in the ensemble. That is,
\[
\langle\hat{A}\rangle = \frac{1}{Z}\sum_{\lambda=1}^{Z}\langle\Psi^{(\lambda)}|\hat{A}|\Psi^{(\lambda)}\rangle.
\tag{10.2.1}
\]
Since each state vector is an abstract object, it proves useful to work in a particular basis. Thus, we introduce a complete set of orthonormal vectors $|\phi_k\rangle$ on the Hilbert space and expand each state of the ensemble in this basis according to
\[
|\Psi^{(\lambda)}\rangle = \sum_k C_k^{(\lambda)}|\phi_k\rangle,
\tag{10.2.2}
\]
where $C_k^{(\lambda)} = \langle\phi_k|\Psi^{(\lambda)}\rangle$. Substituting eqn. (10.2.2) into eqn. (10.2.1) yields
\[
\begin{aligned}
\langle\hat{A}\rangle &= \frac{1}{Z}\sum_{\lambda=1}^{Z}\sum_{k,l} C_k^{(\lambda)*}C_l^{(\lambda)}\langle\phi_k|\hat{A}|\phi_l\rangle \\
&= \sum_{k,l}\left(\frac{1}{Z}\sum_{\lambda=1}^{Z} C_l^{(\lambda)}C_k^{(\lambda)*}\right)\langle\phi_k|\hat{A}|\phi_l\rangle.
\end{aligned}
\tag{10.2.3}
\]

Equation (10.2.3) is in the form of the trace of a matrix product. Hence, let us introduce a matrix
\[
\rho_{lk} = \sum_{\lambda=1}^{Z} C_l^{(\lambda)}C_k^{(\lambda)*}
\tag{10.2.4}
\]
and a normalized matrix $\tilde{\rho}_{lk} = \rho_{lk}/Z$. The matrix $\rho_{lk}$ (or, equivalently, $\tilde{\rho}_{lk}$) is known as the ensemble density matrix. Introducing $\rho_{lk}$ into eqn. (10.2.3), we obtain
\[
\langle\hat{A}\rangle = \frac{1}{Z}\sum_{k,l}\rho_{lk}A_{kl} = \frac{1}{Z}\sum_{l}(\hat{\rho}\hat{A})_{ll} = \frac{1}{Z}{\rm Tr}(\hat{\rho}\hat{A}) = {\rm Tr}(\tilde{\rho}\hat{A}).
\tag{10.2.5}
\]
Here, $A_{kl} = \langle\phi_k|\hat{A}|\phi_l\rangle$ and $\hat{\rho}$ is the operator whose matrix elements in the basis are $\rho_{lk}$. Thus, we see that the expectation value of $\hat{A}$ is expressible as a trace of the product of $\hat{A}$ with the ensemble density matrix. According to eqn. (10.2.2), the operator $\hat{\rho}$ can be written formally using the microscopic state vectors:
\[
\hat{\rho} = \sum_{\lambda=1}^{Z}|\Psi^{(\lambda)}\rangle\langle\Psi^{(\lambda)}|.
\tag{10.2.6}
\]

It is straightforward to show that this operator has the matrix elements given in eqn. (10.2.4). According to eqn. (10.2.6), $\hat{\rho}$ is a Hermitian operator, so that $\hat{\rho}^\dagger = \hat{\rho}$ and $\tilde{\rho}^\dagger = \tilde{\rho}$. Therefore, its eigenvectors, which satisfy the eigenvalue equation
\[
\tilde{\rho}|w_k\rangle = w_k|w_k\rangle,
\tag{10.2.7}
\]
form a complete orthonormal basis on the Hilbert space. Here, we have defined $w_k$ as being an eigenvalue of $\tilde{\rho}$. In order to see what the eigenvalues of $\tilde{\rho}$ mean physically, let us consider eqn. (10.2.5) for the choice $\hat{A} = \hat{I}$. Since $\langle\hat{I}\rangle = 1$, it follows that
\[
1 = \frac{1}{Z}{\rm Tr}(\hat{\rho}) = {\rm Tr}(\tilde{\rho}) = \sum_k w_k.
\tag{10.2.8}
\]
Thus, the eigenvalues of $\tilde{\rho}$ must sum to 1. Next, let $\hat{A}$ be a projector onto an eigenstate of $\tilde{\rho}$, $\hat{A} = |w_k\rangle\langle w_k| \equiv \hat{P}_k$. Then
\[
\begin{aligned}
\langle\hat{P}_k\rangle = {\rm Tr}(\tilde{\rho}|w_k\rangle\langle w_k|) &= \sum_l \langle w_l|\tilde{\rho}|w_k\rangle\langle w_k|w_l\rangle \\
&= \sum_l w_k\delta_{kl} \\
&= w_k,
\end{aligned}
\tag{10.2.9}
\]

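where we have used eqn. (10.2.7) and the orthogonality of the eigenvectors of $\tilde{\rho}$. Note, however, that
\[
\langle\hat{P}_k\rangle = \frac{1}{Z}\sum_{\lambda=1}^{Z}\langle\Psi^{(\lambda)}|w_k\rangle\langle w_k|\Psi^{(\lambda)}\rangle = \frac{1}{Z}\sum_{\lambda=1}^{Z}|\langle\Psi^{(\lambda)}|w_k\rangle|^2 \geq 0.
\tag{10.2.10}
\]
Equations (10.2.9) and (10.2.10) imply that $w_k \geq 0$. Combining the facts that $w_k \geq 0$ and $\sum_k w_k = 1$, we see that $0 \leq w_k \leq 1$. Thus, the $w_k$ satisfy the properties of probabilities.

With this key property of $w_k$ in mind, we can now assign a physical meaning to the density matrix. Let us now consider the expectation value of a projector $|a_k\rangle\langle a_k| \equiv \hat{P}_{a_k}$ onto one of the eigenstates of the operator $\hat{A}$. The expectation value of this operator is given by
\[
\begin{aligned}
\langle\hat{P}_{a_k}\rangle &= \frac{1}{Z}\sum_{\lambda=1}^{Z}\langle\Psi^{(\lambda)}|\hat{P}_{a_k}|\Psi^{(\lambda)}\rangle = \frac{1}{Z}\sum_{\lambda=1}^{Z}\langle\Psi^{(\lambda)}|a_k\rangle\langle a_k|\Psi^{(\lambda)}\rangle \\
&= \frac{1}{Z}\sum_{\lambda=1}^{Z}|\langle a_k|\Psi^{(\lambda)}\rangle|^2.
\end{aligned}
\tag{10.2.11}
\]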

However, $|\langle a_k|\Psi^{(\lambda)}\rangle|^2 \equiv P_{a_k}^{(\lambda)}$ is just the probability that a measurement of the operator $\hat{A}$ in the $\lambda$th member of the ensemble will yield the eigenvalue $a_k$. Thus,
\[
\langle\hat{P}_{a_k}\rangle = \frac{1}{Z}\sum_{\lambda=1}^{Z}P_{a_k}^{(\lambda)}
\tag{10.2.12}
\]
is just the ensemble average of the probability of obtaining the value $a_k$ in each member of the ensemble. However, the expectation value of $\hat{P}_{a_k}$ can also be written as
\[
\begin{aligned}
\langle\hat{P}_{a_k}\rangle = {\rm Tr}(\tilde{\rho}\hat{P}_{a_k}) &= \sum_l\langle w_l|\tilde{\rho}\hat{P}_{a_k}|w_l\rangle \\
&= \sum_l w_l\langle w_l|a_k\rangle\langle a_k|w_l\rangle \\
&= \sum_l w_l|\langle a_k|w_l\rangle|^2.
\end{aligned}
\tag{10.2.13}
\]
Equating the results of eqns. (10.2.12) and (10.2.13) gives
\[
\frac{1}{Z}\sum_{\lambda=1}^{Z}P_{a_k}^{(\lambda)} = \sum_l w_l|\langle a_k|w_l\rangle|^2.
\tag{10.2.14}
\]

We now interpret $\{|w_l\rangle\}$ as a complete set of microscopic states appropriate for the ensemble, with $w_l$ the probability that a randomly selected member of the ensemble is in the state $|w_l\rangle$. Hence, the quantity on the right is the sum of probabilities that a measurement of $\hat{A}$ in a state $|w_l\rangle$ yields the result $a_k$ weighted by the probability that an ensemble member is in the state $|w_l\rangle$. This is equal to the ensemble averaged probability on the left. Thus, the density operator $\tilde{\rho}$ gives the probability $w_l$ for an ensemble member to be in a particular microscopic state $|w_l\rangle$ consistent with a set of macroscopic observables, and therefore, it plays the same role in quantum statistical mechanics as the phase-space distribution function $f(x)$ plays in classical statistical mechanics.
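The properties derived in this section can be checked numerically on a toy ensemble. The sketch below assumes a two-dimensional Hilbert space and a small, arbitrarily chosen set of state vectors (`STATES`), builds the normalized density matrix of eqn. (10.2.4), and compares the ensemble average of eqn. (10.2.1) with the trace formula of eqn. (10.2.5). The operator used is $\hat{S}_x$ in units where $\hbar = 1$; the closed-form eigenvalue routine applies only to 2×2 Hermitian matrices. All names are illustrative.

```python
import math


def normalize(v):
    """Scale a state vector to unit norm."""
    n = math.sqrt(sum(abs(c) ** 2 for c in v))
    return [c / n for c in v]


def density_matrix(states):
    """rho~_{lk} = (1/Z) sum_lambda C_l^(lambda) conj(C_k^(lambda))."""
    z, dim = len(states), len(states[0])
    return [[sum(s[l] * s[k].conjugate() for s in states) / z for k in range(dim)]
            for l in range(dim)]


def expectation(a, v):
    """<v|A|v> for a single normalized state vector v."""
    dim = len(v)
    return sum(v[k].conjugate() * a[k][l] * v[l] for k in range(dim) for l in range(dim))


def trace_product(rho, a):
    """Tr(rho A) = sum_{k,l} rho_{lk} A_{kl}."""
    dim = len(rho)
    return sum(rho[l][k] * a[k][l] for k in range(dim) for l in range(dim))


def eigenvalues_2x2_hermitian(m):
    """Closed-form eigenvalues of a 2x2 Hermitian matrix, in ascending order."""
    tr = (m[0][0] + m[1][1]).real
    det = (m[0][0] * m[1][1] - m[0][1] * m[1][0]).real
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return ((tr - disc) / 2, (tr + disc) / 2)


# Hypothetical ensemble of Z = 5 two-level state vectors.
STATES = [normalize(v) for v in ([1, 0], [1, 1], [1, 1j], [2, -1], [0.5, 1 - 1j])]
SX = [[0, 0.5], [0.5, 0]]   # S_x with hbar = 1
RHO = density_matrix(STATES)

ensemble_average = sum(expectation(SX, s) for s in STATES) / len(STATES)
print(abs(trace_product(RHO, SX) - ensemble_average) < 1e-12)
print(eigenvalues_2x2_hermitian(RHO))
```

The two eigenvalues returned are the weights $w_k$ of this toy ensemble; they are nonnegative and sum to 1, as eqns. (10.2.8) to (10.2.10) require.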

10.3 Time evolution of the density matrix

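The evolution in time of the density matrix is determined by the time evolution of each of the state vectors $|\Psi^{(\lambda)}\rangle$. The latter are determined by the time-dependent Schrödinger equation. Starting from eqn. (10.2.6), we write the time-dependent density operator as
\[
\hat{\rho}(t) = \sum_{\lambda=1}^{Z}|\Psi^{(\lambda)}(t)\rangle\langle\Psi^{(\lambda)}(t)|.
\tag{10.3.1}
\]
An equation of motion for $\hat{\rho}(t)$ can be determined by taking the time derivative of both sides of eqn. (10.3.1):
\[
\frac{\partial\hat{\rho}}{\partial t} = \sum_{\lambda=1}^{Z}\left[\left(\frac{\partial}{\partial t}|\Psi^{(\lambda)}(t)\rangle\right)\langle\Psi^{(\lambda)}(t)| + |\Psi^{(\lambda)}(t)\rangle\left(\frac{\partial}{\partial t}\langle\Psi^{(\lambda)}(t)|\right)\right].
\tag{10.3.2}
\]
However, since $\partial|\Psi^{(\lambda)}(t)\rangle/\partial t = (1/i\hbar)\hat{H}|\Psi^{(\lambda)}(t)\rangle$ from the Schrödinger equation, and correspondingly $\partial\langle\Psi^{(\lambda)}(t)|/\partial t = -(1/i\hbar)\langle\Psi^{(\lambda)}(t)|\hat{H}$ for its adjoint, eqn. (10.3.2) becomes
\[
\begin{aligned}
\frac{\partial\hat{\rho}}{\partial t} &= \frac{1}{i\hbar}\sum_{\lambda=1}^{Z}\left[\left(\hat{H}|\Psi^{(\lambda)}(t)\rangle\right)\langle\Psi^{(\lambda)}(t)| - |\Psi^{(\lambda)}(t)\rangle\left(\langle\Psi^{(\lambda)}(t)|\hat{H}\right)\right] \\
&= \frac{1}{i\hbar}\left(\hat{H}\hat{\rho} - \hat{\rho}\hat{H}\right)
\end{aligned}
\tag{10.3.3}
\]
or
\[
\frac{\partial\hat{\rho}}{\partial t} = \frac{1}{i\hbar}[\hat{H}, \hat{\rho}].
\tag{10.3.4}
\]
Equation (10.3.4) is known as the quantum Liouville equation, and it forms the basis of quantum statistical mechanics just as the classical Liouville equation derived in Section 2.5 forms the basis of classical statistical mechanics.

Recall that the time evolution of a Hermitian operator representing a physical observable in the Heisenberg picture is given by eqn. (9.2.56). Although $\hat{\rho}$ is a Hermitian operator, its evolution equation differs from eqn. (9.2.56), as eqn. (10.3.4) makes clear. This difference underscores the fact that $\hat{\rho}$ does not actually represent a physical observable.

The quantum Liouville equation can be solved formally as
\[
\hat{\rho}(t) = e^{-i\hat{H}t/\hbar}\hat{\rho}(0)e^{i\hat{H}t/\hbar} = \hat{U}(t)\hat{\rho}(0)\hat{U}^\dagger(t).
\tag{10.3.5}
\]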
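The formal solution (10.3.5) can be checked against the quantum Liouville equation (10.3.4) for a simple case. The sketch below assumes a two-level system whose Hamiltonian is already diagonal with hypothetical energies `ENERGIES`, so that $\hat{U}(t)$ is a diagonal matrix of phase factors, and it uses units where $\hbar = 1$. The initial density matrix `RHO0` is an arbitrary Hermitian, unit-trace choice made for illustration.

```python
import cmath

HBAR = 1.0
ENERGIES = (-0.5, 0.5)          # hypothetical two-level spectrum
RHO0 = [[0.7, 0.2 - 0.1j],      # hypothetical Hermitian, unit-trace rho(0)
        [0.2 + 0.1j, 0.3]]


def rho_at(t):
    """rho(t) = U(t) rho(0) U^dagger(t) with U = exp(-iHt/hbar) and H diagonal."""
    return [[RHO0[j][k] * cmath.exp(-1j * (ENERGIES[j] - ENERGIES[k]) * t / HBAR)
             for k in range(2)] for j in range(2)]


def liouville_rhs(rho):
    """(1/(i hbar)) [H, rho]; for diagonal H the elements are (E_j - E_k) rho_jk / (i hbar)."""
    return [[(ENERGIES[j] - ENERGIES[k]) * rho[j][k] / (1j * HBAR)
             for k in range(2)] for j in range(2)]


def time_derivative(t, dt=1e-5):
    """Central finite-difference estimate of d rho / dt."""
    plus, minus = rho_at(t + dt), rho_at(t - dt)
    return [[(plus[j][k] - minus[j][k]) / (2 * dt) for k in range(2)] for j in range(2)]


t = 0.8
lhs, rhs = time_derivative(t), liouville_rhs(rho_at(t))
print(max(abs(lhs[j][k] - rhs[j][k]) for j in range(2) for k in range(2)))
```

The printed residual is limited only by the finite-difference step. The diagonal elements (populations) are constant in time in this basis, while the off-diagonal elements rotate at the frequency $(E_j - E_k)/\hbar$.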

Equation (10.3.4) is often cast into a form that closely resembles the classical Liouville equation by defining a quantum Liouville operator
\[
iL = \frac{1}{i\hbar}[\ldots, \hat{H}].
\tag{10.3.6}
\]
In terms of this operator, the quantum Liouville equation becomes
\[
\frac{\partial\hat{\rho}}{\partial t} = -iL\hat{\rho},
\tag{10.3.7}
\]
which has the formal solution
\[
\hat{\rho}(t) = e^{-iLt}\hat{\rho}(0).
\tag{10.3.8}
\]
There is a subtlety associated with the quantum Liouville operator $iL$. As eqn. (10.3.6) implies, $iL$ is not an operator in the sense described in Section 9.2. The operators we have encountered so far act on the vectors of the Hilbert space to yield new vectors. By contrast, $iL$ acts on an operator and returns a new operator. For this reason, it is often called a “superoperator” or “tetradic” operator.¹
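One way to make the notion of a superoperator concrete is to write $iL$ as an ordinary matrix acting on a "vectorized" density matrix. The sketch below does this for a two-level system, so that $iL$ becomes a 4×4 matrix acting on the four elements of $\hat{\rho}$. The row-major vectorization, the hypothetical Hamiltonian `H`, the density matrix `RHO`, and all function names are assumptions of this illustration; units are such that $\hbar = 1$.

```python
HBAR = 1.0
H = [[0.5, 0.3], [0.3, -0.5]]                     # hypothetical two-level Hamiltonian
RHO = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]      # hypothetical density matrix


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def liouville_matrix(h):
    """Matrix of iL = (1/(i hbar)) [..., H] acting on the row-major vec(rho)."""
    n = len(h)
    il = [[0j] * (n * n) for _ in range(n * n)]
    for j in range(n):
        for k in range(n):
            for a in range(n):
                for b in range(n):
                    rho_h = h[b][k] if j == a else 0.0   # coefficient of rho_ab in (rho H)_jk
                    h_rho = h[j][a] if k == b else 0.0   # coefficient of rho_ab in (H rho)_jk
                    il[j * n + k][a * n + b] = (rho_h - h_rho) / (1j * HBAR)
    return il


def vec(m):
    """Flatten a matrix row by row."""
    return [x for row in m for x in row]


def apply(matrix, vector):
    """Ordinary matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def direct_action(rho, h):
    """(1/(i hbar)) (rho H - H rho) computed with ordinary matrix products."""
    rh, hr = matmul(rho, h), matmul(h, rho)
    n = len(h)
    return [[(rh[j][k] - hr[j][k]) / (1j * HBAR) for k in range(n)] for j in range(n)]


via_superoperator = apply(liouville_matrix(H), vec(RHO))
via_commutator = vec(direct_action(RHO, H))
print(max(abs(x - y) for x, y in zip(via_superoperator, via_commutator)))
```

In this representation the formal solution (10.3.8) is an ordinary matrix exponential of a 4×4 matrix applied to a 4-component vector, which is how the superoperator picture is often used in practice.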

10.4 Quantum equilibrium ensembles

As in the classical case, quantum equilibrium ensembles are defined by a density matrix with no explicit time dependence, i.e., $\partial\hat{\rho}/\partial t = 0$. Thus, the equilibrium Liouville equation becomes $[\hat{H}, \hat{\rho}] = 0$. This is precisely the condition required for a quantity to be a constant of the motion. The general solution to the equilibrium Liouville equation is any function $F(\hat{H})$ of the Hamiltonian. Consequently, $\hat{H}$ and $\hat{\rho}$ have simultaneous eigenvectors. If $|E_k\rangle$ are the eigenvectors of $\hat{H}$ with eigenvalues $E_k$, then
\[
\hat{\rho}|E_k\rangle = F(\hat{H})|E_k\rangle = F(E_k)|E_k\rangle.
\tag{10.4.1}
\]
Starting from eqn. (10.4.1), we could derive the quantum equilibrium ensembles in much the same manner as we did for the classical equilibrium ensembles. That is, we could begin by defining the microcanonical ensemble based on the conservation of $\hat{H}$ and then derive the canonical, isothermal-isobaric, and grand canonical ensembles by coupling the system to a heat bath, mechanical piston, particle reservoir, etc. However, since we have already carried out this program for the classical ensembles, we can exploit the quantum-classical correspondence principle and simply promote the classical equilibrium phase-space distribution functions, all of which are functions of the classical Hamiltonian, to quantum operators. Thus, for the canonical ensemble at temperature $T$, the normalized density operator becomes
\[
\tilde{\rho}(\hat{H}) = \frac{e^{-\beta\hat{H}}}{Q(N, V, T)}.
\tag{10.4.2}
\]
Since $\tilde{\rho}$ must have unit trace, the partition function is given by
\[
Q(N, V, T) = {\rm Tr}\left[e^{-\beta\hat{H}}\right].
\tag{10.4.3}
\]

¹ As an example from the literature of the use of the superoperator formalism, S. Mukamel, in his book Principles of Nonlinear Optical Spectroscopy (1995), uses the quantum Liouville operator approach to develop an elegant framework for analyzing various types of nonlinear spectroscopies.

Here, $Q(N,V,T)$ is identified with the number $Z$, the total number of microscopic states in the ensemble. Thus, the unnormalized density matrix $\hat{\rho}$ is $\exp(-\beta\hat{H})$. Casting eqns. (10.4.2) and (10.4.3) into the basis of the eigenvectors of $\hat{H}$, we obtain

$$\langle E_k|\tilde{\rho}|E_k\rangle = \frac{e^{-\beta E_k}}{Q(N,V,T)}, \qquad Q(N,V,T) = \sum_k e^{-\beta E_k}. \tag{10.4.4}$$

Equation (10.4.4) indicates that the microscopic states corresponding to the canonical ensemble are eigenstates of $\hat{H}$, i.e., $w_k = F(E_k) = \exp(-\beta E_k)/Q(N,V,T)$, and that the probability of any member of the ensemble being in a state $|E_k\rangle$ is $\exp(-\beta E_k)/Q(N,V,T)$. Once $Q(N,V,T)$ is known from eqn. (10.4.4), the thermodynamics of the canonical ensemble are determined as usual from eqn. (4.3.23). Finally, the expectation value of any operator $\hat{A}$ in the canonical ensemble is given by

$$\langle\hat{A}\rangle = \mathrm{Tr}\left(\tilde{\rho}\hat{A}\right) = \frac{1}{Q(N,V,T)}\sum_k e^{-\beta E_k}\langle E_k|\hat{A}|E_k\rangle. \tag{10.4.5}$$

(Feynman regarded eqns. (10.4.4) and (10.4.5) as the core of statistical mechanics, and they appear on the first page of his book Statistical Mechanics: A Set of Lectures.$^2$) If there are degeneracies among the eigenvalues, then a factor $g(E_k)$, which is the degeneracy of the energy level $E_k$, i.e., the number of independent eigenstates with this energy, must be introduced into the above sums over eigenstates. Thus, for example, the partition function becomes

$$Q(N,V,T) = \sum_k g(E_k)\,e^{-\beta E_k}, \tag{10.4.6}$$

and the expectation value of the operator $\hat{A}$ is given by

$$\langle\hat{A}\rangle = \frac{1}{Q(N,V,T)}\sum_k g(E_k)\,e^{-\beta E_k}\langle E_k|\hat{A}|E_k\rangle. \tag{10.4.7}$$
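As a rough numerical illustration of eqns. (10.4.6) and (10.4.7), the following sketch evaluates the canonical weights and an average for a small set of energy levels. The three-level spectrum, the degeneracies, the value of $\beta$, and the function names are all hypothetical choices made for this example; energies are in arbitrary units with $\beta$ in the inverse units.

```python
import math

def canonical_weights(levels, beta):
    """Return Q and the level probabilities g(E_k) exp(-beta E_k) / Q.

    `levels` is a list of (E_k, g_k) pairs. The lowest energy is factored
    out before exponentiating so that large beta does not underflow.
    """
    e0 = min(e for e, _ in levels)
    boltz = [g * math.exp(-beta * (e - e0)) for e, g in levels]
    q_shifted = sum(boltz)
    probs = [b / q_shifted for b in boltz]
    return q_shifted * math.exp(-beta * e0), probs

def canonical_average(levels, diag_values, beta):
    """Average of an operator that is diagonal in the energy basis,
    with one diagonal value <E_k|A|E_k> per level (eqn. 10.4.7)."""
    _, probs = canonical_weights(levels, beta)
    return sum(p * a for p, a in zip(probs, diag_values))

# Hypothetical spectrum: (energy, degeneracy) pairs.
levels = [(0.0, 1), (1.0, 3), (2.5, 2)]
beta = 1.2

q, probs = canonical_weights(levels, beta)
mean_energy = canonical_average(levels, [e for e, _ in levels], beta)
```

Choosing $\hat{A} = \hat{H}$ gives the average energy, which can be compared against $-\partial \ln Q/\partial\beta$ evaluated by finite differences as a consistency check.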

2 In reference to eqn.(10.4.5), Feynman casually remarks, “This law is the summit of statistical mechanics, and the entire subject is either the slide-down from this summit, as the principle is applied to various cases, or the climb-up to where the fundamental law is derived and the concepts of thermal equilibrium and temperature T clarified” (Feynman, 1998). Our program in the next two chapters will be the former, as we apply the principle and develop analytical and computational tools for carrying out quantum statistical mechanical calculations for complex systems.


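In an isothermal-isobaric ensemble at temperature $T$ and pressure $P$, the density operator, partition function, and expectation value are given, respectively, by

$$\tilde{\rho}(\hat{H},V) = \frac{e^{-\beta(\hat{H}+PV)}}{\Delta(N,P,T)}, \qquad \langle E_k|\tilde{\rho}(\hat{H},V)|E_k\rangle = \frac{e^{-\beta(E_k+PV)}}{\Delta(N,P,T)}, \tag{10.4.8}$$

$$\Delta(N,P,T) = \int_0^\infty dV\,\mathrm{Tr}\left[e^{-\beta(\hat{H}+PV)}\right] = \int_0^\infty dV \sum_k e^{-\beta(E_k+PV)}, \tag{10.4.9}$$

$$\langle\hat{A}\rangle = \frac{1}{\Delta(N,P,T)}\int_0^\infty dV\,\mathrm{Tr}\left[\hat{A}\,e^{-\beta(\hat{H}+PV)}\right] = \frac{1}{\Delta(N,P,T)}\int_0^\infty dV \sum_k e^{-\beta(E_k+PV)}\langle E_k|\hat{A}|E_k\rangle. \tag{10.4.10}$$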

Again, if there are degeneracies, then a factor of $g(E_k)$ must be introduced into the sums:

$$\Delta(N,P,T) = \int_0^\infty dV \sum_k g(E_k)\,e^{-\beta(E_k+PV)},$$

$$\langle\hat{A}\rangle = \frac{1}{\Delta(N,P,T)}\int_0^\infty dV \sum_k g(E_k)\,e^{-\beta(E_k+PV)}\langle E_k|\hat{A}|E_k\rangle. \tag{10.4.11}$$

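Finally, for the grand canonical ensemble at temperature $T$ and chemical potential $\mu$, the density operator, partition function, and expectation value are given by

$$\tilde{\rho}(\hat{H},N) = \frac{e^{-\beta(\hat{H}-\mu N)}}{\mathcal{Z}(\mu,V,T)}, \qquad \langle E_k|\tilde{\rho}(\hat{H},N)|E_k\rangle = \frac{e^{-\beta(E_k-\mu N)}}{\mathcal{Z}(\mu,V,T)}, \tag{10.4.12}$$

$$\mathcal{Z}(\mu,V,T) = \sum_{N=0}^\infty \mathrm{Tr}\left[e^{-\beta(\hat{H}-\mu N)}\right] = \sum_{N=0}^\infty\sum_k e^{-\beta(E_k-\mu N)}, \tag{10.4.13}$$

$$\langle\hat{A}\rangle = \frac{1}{\mathcal{Z}(\mu,V,T)}\sum_{N=0}^\infty \mathrm{Tr}\left[\hat{A}\,e^{-\beta(\hat{H}-\mu N)}\right] = \frac{1}{\mathcal{Z}(\mu,V,T)}\sum_{N=0}^\infty\sum_k e^{-\beta(E_k-\mu N)}\langle E_k|\hat{A}|E_k\rangle. \tag{10.4.14}$$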

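As before, if there are degeneracies, then a factor of $g(E_k)$ must be introduced into the above sums:

$$\mathcal{Z}(\mu,V,T) = \sum_{N=0}^\infty\sum_k g(E_k)\,e^{-\beta(E_k-\mu N)},$$

$$\langle\hat{A}\rangle = \frac{1}{\mathcal{Z}(\mu,V,T)}\sum_{N=0}^\infty\sum_k g(E_k)\,e^{-\beta(E_k-\mu N)}\langle E_k|\hat{A}|E_k\rangle. \tag{10.4.15}$$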

The quantum grand canonical ensemble will prove particularly useful in our treatment of the quantum ideal gases to be discussed in Chapter 11.

In the above list of definitions, a definition of the quantum microcanonical ensemble is conspicuously missing for the reason that it is very rarely used for condensed-phase systems. Moreover, in order to define this ensemble, the quantum-classical correspondence must be applied carefully because the eigenvalues of $\hat{H}$ are assumed to be discrete. Hence, the $\delta$-function used in the classical microcanonical ensemble does not make sense for quantum systems because a given eigenvalue may or may not be equal to the energy $E$ used to define the ensemble. However, if we define an energy shell between $E$ and $E + \Delta E$, then we can certainly find a subset of energy eigenvalues in this shell. The partition function is then related to the number of energy levels $E_k$ satisfying $E < E_k < E + \Delta E$. Typically, when we take the thermodynamic limit of a system, the energy levels become very closely spaced, and we can shrink the thickness $\Delta E$ of the shell to zero.

10.4.1 Canonical ensemble via entropy maximization

In this subsection, we show how to derive the quantum canonical ensemble using a quantum mechanical version of the entropy maximization derivation presented in Section 4.4. Derivations of the quantum isothermal-isobaric and grand-canonical ensembles via entropy maximization are left as exercises for the reader (see Problem 10.2). The idea is to express the entropy $S$ using the density matrix eigenvalues $w_k$ and then show that the form that maximizes $S$ is $w_k = \exp(-\beta E_k)/Q(N,V,T)$. Treating the eigenvalues as discrete, the quantum mechanical analog of the Shannon entropy expression in eqn. (4.4.1) is

$$S = -k\sum_k w_k \ln w_k \tag{10.4.16}$$

(be careful not to confuse the leading Boltzmann’s constant $k$ with the summation index $k$). We seek to find the form of $w_k$ that maximizes $S$. However, as in Section 4.4, the maximization must be performed subject to certain constraints. First, since the density matrix $\tilde{\rho}$ has unit trace, we need $\sum_k w_k = 1$. In addition, we need the ensemble average of the Hamiltonian $\hat{H}$ to yield a well-defined energy $E$: $\langle\hat{H}\rangle = E$. This gives the second constraint $\sum_k w_k E_k = E$. These two constraints are introduced with Lagrange multipliers $\mu$ and $\lambda$, respectively, giving us the modified entropy expression

$$\tilde{S} = -k\sum_k w_k \ln w_k - \mu\left(\sum_k w_k - 1\right) - \lambda\left(\sum_k w_k E_k - E\right). \tag{10.4.17}$$

In contrast to Section 4.4, which requires a functional derivative, in the quantum context, the discreteness of the eigenvalues allows us to express the entropy maximization condition as a simple partial derivative

$$\frac{\partial\tilde{S}}{\partial w_i} = 0, \tag{10.4.18}$$

which leads to the condition

$$-k\left[\ln w_i + 1\right] - \mu - \lambda E_i = 0. \tag{10.4.19}$$

Solving for $w_i$ yields

$$w_i = e^{-(1+\mu/k)}\,e^{-\lambda E_i/k}. \tag{10.4.20}$$

The multiplier $\lambda$ is determined by the condition

$$\frac{\partial\tilde{S}}{\partial E} = \frac{1}{T}, \tag{10.4.21}$$

which gives $\lambda = 1/T$. The normalization condition $\sum_i w_i = 1$, together with the definition $\sum_i \exp(-\beta E_i) = Q(N,V,T)$, then gives the multiplier $\mu$ as

$$e^{-(1+\mu/k)} = \frac{1}{Q(N,V,T)}, \tag{10.4.22}$$

so that $w_i = \exp(-\beta E_i)/Q(N,V,T)$, as expected.

10.4.2 Example: The canonical harmonic oscillator

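In order to illustrate the application of a quantum equilibrium ensemble, we consider the case of a simple one-dimensional harmonic oscillator of frequency $\omega$. We will derive the properties of this system using the canonical ensemble. Recall from Section 9.3 that the energy eigenvalues are given by

$$E_n = \left(n + \frac{1}{2}\right)\hbar\omega, \qquad n = 0, 1, 2, \ldots \tag{10.4.23}$$

The canonical partition function is, therefore,

$$Q(\beta) = \sum_{n=0}^\infty e^{-\beta E_n} = \sum_{n=0}^\infty e^{-\beta(n+1/2)\hbar\omega}. \tag{10.4.24}$$

Recalling that the sum of a geometric series is given by

$$\sum_{n=0}^\infty r^n = \frac{1}{1-r}, \tag{10.4.25}$$

where $0 < r < 1$, the partition function becomes

$$Q(\beta) = e^{-\beta\hbar\omega/2}\sum_{n=0}^\infty e^{-n\beta\hbar\omega} = e^{-\beta\hbar\omega/2}\sum_{n=0}^\infty \left(e^{-\beta\hbar\omega}\right)^n = \frac{e^{-\beta\hbar\omega/2}}{1 - e^{-\beta\hbar\omega}}. \tag{10.4.26}$$

From the partition function, various thermodynamic quantities can be determined. First, the free energy is given by

$$A = -\frac{1}{\beta}\ln Q(\beta) = \frac{\hbar\omega}{2} + \frac{1}{\beta}\ln\left(1 - e^{-\beta\hbar\omega}\right), \tag{10.4.27}$$

while the total energy is

$$E = -\frac{\partial}{\partial\beta}\ln Q(\beta) = \frac{\hbar\omega}{2} + \frac{\hbar\omega\,e^{-\beta\hbar\omega}}{1 - e^{-\beta\hbar\omega}} = \left(\frac{1}{2} + \langle n\rangle\right)\hbar\omega, \tag{10.4.28}$$

where $\langle n\rangle = e^{-\beta\hbar\omega}/(1 - e^{-\beta\hbar\omega})$ is the average quantum number. Thus, even if $\langle n\rangle = 0$, there is still a finite amount of energy, $\hbar\omega/2$, in the system. This residual energy is known as the zero-point energy. Next, from the average energy, the heat capacity can be determined:

$$\frac{C}{k} = \frac{(\beta\hbar\omega)^2\,e^{-\beta\hbar\omega}}{\left(1 - e^{-\beta\hbar\omega}\right)^2}. \tag{10.4.29}$$

Note that as $\hbar \to 0$, $C/k \to 1$ in agreement with the classical result. Finally, the entropy is given by

$$S = k\ln Q(\beta) + \frac{E}{T} = -k\ln\left(1 - e^{-\beta\hbar\omega}\right) + \frac{\hbar\omega}{T}\,\frac{e^{-\beta\hbar\omega}}{1 - e^{-\beta\hbar\omega}}, \tag{10.4.30}$$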

which is consistent with the third law of thermodynamics, as $S \to 0$ as $T \to 0$. The expressions we have derived for the thermodynamic observables are often used to estimate thermodynamic quantities of molecular systems under the assumption that the system can be approximately decomposed into a set of uncoupled harmonic oscillators corresponding to the normal modes of Section 1.7. By summing the expressions in eqns. (10.4.27), (10.4.29), or (10.4.30) over a set of frequencies generated in a normal-mode calculation, estimates of the quantum thermodynamic properties such as free energy, heat capacity, and entropy can be easily obtained.

As a concluding remark, we note that the formulation of the quantum equilibrium ensembles in terms of the eigenvalues and eigenvectors of $\hat{H}$ suggests that the computational problems inherent in many-body quantum mechanics have not been alleviated. After all, one still needs to solve the eigenvalue problem for the Hamiltonian, which involves solving eqn. (10.1.5). In Section 10.1, we described the difficulty inherent in this approach. The eigenvalue equation can be solved explicitly only for systems with a very small number of degrees of freedom. Looking ahead, in Chapter 12 we will develop a framework, known as the Feynman path integral formulation of statistical mechanics, that allows the calculation of $N$-particle eigenvalues to be circumvented, thereby allowing quantum equilibrium properties of large condensed-phase systems to be evaluated using molecular dynamics and Monte Carlo methods. Before exploring this formalism, however, we will use the traditional eigenvalue approach to study the quantum ideal gases, the subject of the next chapter.
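The harmonic-oscillator results of eqns. (10.4.26) through (10.4.30) are simple enough to check numerically. The sketch below is only an illustration: it works in the reduced variable $x = \beta\hbar\omega$ (so energies are in units of $\hbar\omega$ and $C$, $S$ in units of $k$), and the function names and the list of mode values `x_modes` are hypothetical, standing in for frequencies that a normal-mode calculation might produce.

```python
import math

def q_closed(x):
    """Closed-form partition function, eqn. (10.4.26), with x = beta*hbar*omega."""
    return math.exp(-x / 2) / (-math.expm1(-x))

def q_direct(x, nmax=2000):
    """Truncated direct sum over levels, eqn. (10.4.24)."""
    return sum(math.exp(-(n + 0.5) * x) for n in range(nmax))

def energy(x):
    """E / (hbar*omega), eqn. (10.4.28): zero-point term plus <n>."""
    return 0.5 + math.exp(-x) / (-math.expm1(-x))

def heat_capacity(x):
    """C / k, eqn. (10.4.29)."""
    return x * x * math.exp(-x) / math.expm1(-x) ** 2

def entropy(x):
    """S / k, eqn. (10.4.30)."""
    return -math.log1p(-math.exp(-x)) + x * math.exp(-x) / (-math.expm1(-x))

# Hypothetical normal-mode values of beta*hbar*omega for a small molecule.
x_modes = [0.5, 2.0, 7.5]
total_heat_capacity = sum(heat_capacity(x) for x in x_modes)  # in units of k
total_entropy = sum(entropy(x) for x in x_modes)              # in units of k
```

In the limit $x \to 0$ the heat capacity per mode approaches the classical value $C/k = 1$, while for large $x$ both $C$ and $S$ vanish, so stiff modes contribute very little to the sums.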

10.5 Problems

10.1. a. Prove that the trace of a matrix A is independent of the basis in which the trace is performed.

b. Prove the cyclic property of the trace Tr(ABC) = Tr(CAB) = Tr(BCA).

10.2. Use the quantum mechanical formulation of the Shannon entropy to derive density matrix eigenvalues for the quantum isothermal-isobaric and grand-canonical ensembles by maximizing the entropy subject to appropriate constraints. Use the derivations in Sections 5.4 and 6.5 as guides for how to set up the appropriate modified entropy expressions.

10.3. Recall from Problem 9.2 that the energy of a quantum particle with magnetic moment $\boldsymbol{\mu}$ in a magnetic field $\mathbf{B}$ is $E = -\boldsymbol{\mu}\cdot\mathbf{B}$. Consider a spin-1/2 particle such as an electron fixed in space in a uniform magnetic field in the $z$ direction, so that $\mathbf{B} = (0, 0, B)$. The Hamiltonian for the particle is given by

$$\hat{H} = -\gamma B\hat{S}_z.$$

The spin operators are given in eqn. (9.4.2).

a. Suppose an ensemble of such systems is prepared such that the density matrix initially is

$$\tilde{\rho}(0) = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}.$$

Calculate $\tilde{\rho}(t)$.

b. What are the expectation values of the operators $\hat{S}_x$, $\hat{S}_y$, and $\hat{S}_z$ at any time $t$?

c. Suppose now that the initial density matrix is

$$\tilde{\rho}(0) = \begin{pmatrix} 1/2 & -i/2 \\ i/2 & 1/2 \end{pmatrix}.$$

For this case, calculate $\tilde{\rho}(t)$.

d. What are the expectation values of the operators $\hat{S}_x$, $\hat{S}_y$, and $\hat{S}_z$ at time $t$ for this case?

e. What is the fluctuation or uncertainty in $\hat{S}_x$ at time $t$? Recall that

$$\Delta S_x = \sqrt{\langle\hat{S}_x^2\rangle - \langle\hat{S}_x\rangle^2}.$$

f. Suppose finally that the density matrix is given initially by a canonical density matrix:

$$\tilde{\rho}(0) = \frac{e^{-\beta\hat{H}}}{\mathrm{Tr}\left(e^{-\beta\hat{H}}\right)}.$$

What is $\tilde{\rho}(t)$?

g. What are the expectation values of $\hat{S}_x$, $\hat{S}_y$, and $\hat{S}_z$ at time $t$?

10.4. Consider a one-dimensional quantum harmonic oscillator of frequency $\omega$, for which the energy eigenvalues are

$$E_n = \left(n + \frac{1}{2}\right)\hbar\omega, \qquad n = 0, 1, 2, \ldots$$

Using the canonical ensemble at temperature $T$, calculate $\langle\hat{x}^2\rangle$, $\langle\hat{p}^2\rangle$, and the uncertainties $\Delta x$ and $\Delta p$.

Hint: Might the raising and lowering operators of Section 9.3 be useful?

∗10.5. A weakly anharmonic oscillator of frequency $\omega$ has energy eigenvalues given by

$$E_n = \left(n + \frac{1}{2}\right)\hbar\omega - \kappa\left(n + \frac{1}{2}\right)^2\hbar\omega, \qquad n = 0, 1, 2, \ldots$$

Show that, to first order in $\kappa$ and fourth order in $r = \beta\hbar\omega$, the heat capacity in the canonical ensemble is given by

$$\frac{C}{k} = \left[1 - \frac{r^2}{12} + \frac{r^4}{240}\right] + 4\kappa\left[\frac{1}{r} + \frac{r^3}{80}\right]$$

(Pathria, 1972).

10.6. Suppose a quantum system has degenerate eigenvalues.

a. If $g(E_n)$ is the degeneracy of the energy level $E_n$, show that the expression for the canonical partition function must be modified to read

$$Q(N,V,T) = \sum_n g(E_n)\,e^{-\beta E_n}.$$

b. A harmonic oscillator of frequency $\omega$ in $d$ dimensions has energy eigenvalues given by

$$E_n = \left(n + \frac{d}{2}\right)\hbar\omega,$$

but the energy levels become degenerate. The degeneracy of each level is

$$g(E_n) = \frac{(n + d - 1)!}{n!\,(d - 1)!}.$$

Calculate the canonical partition function, free energy, total energy, and heat capacity in this case.

10.7. The Hamiltonian for a free particle in one dimension is

$$\hat{H} = \frac{\hat{p}^2}{2m}.$$

a. Using the free particle eigenfunctions, show that the canonical density matrix is given by

$$\langle x|e^{-\beta\hat{H}}|x'\rangle = \left(\frac{m}{2\pi\beta\hbar^2}\right)^{1/2}\exp\left[-\frac{m}{2\beta\hbar^2}(x - x')^2\right].$$

b. Recall that an operator $\hat{A}$ in the Heisenberg picture evolves in time according to

$$\hat{A}(t) = e^{i\hat{H}t/\hbar}\,\hat{A}\,e^{-i\hat{H}t/\hbar}.$$

Now consider a transformation from real time $t$ to an imaginary time variable $\tau$ via $t = -i\tau\hbar$. In imaginary time, the evolution of an operator becomes

$$\hat{A}(\tau) = e^{\tau\hat{H}}\,\hat{A}\,e^{-\tau\hat{H}}.$$

Using this evolution, derive an expression for the imaginary-time mean-square displacement of a free particle defined to be

$$R^2(\tau) = \langle[\hat{x}(0) - \hat{x}(\tau)]^2\rangle.$$

Assume the particle is in a one-dimensional box of length $L$. This function can be used to quantify the quantum delocalization of a particle at temperature $T$.

10.8. The following theorem is due to Peierls (1938): Let $\{|\phi_n\rangle\}$ be an arbitrary set of orthonormal functions on the Hilbert space of a quantum system whose Hamiltonian is $\hat{H}$. The functions $\{|\phi_n\rangle\}$ are assumed to satisfy the same boundary and symmetry conditions of the physical system. It follows that the canonical partition function $Q(N,V,T)$ satisfies the inequality

$$Q(N,V,T) \geq \sum_n e^{-\beta\langle\phi_n|\hat{H}|\phi_n\rangle},$$

where equality holds only if $\{|\phi_n\rangle\}$ are the eigenfunctions of $\hat{H}$. Prove this theorem.

Hint: You might find the Ritz variational principle of quantum mechanics helpful. The Ritz principle states that for an arbitrary wave function $|\Psi\rangle$, the ground-state energy $E_0$ obeys the inequality

$$E_0 \leq \langle\Psi|\hat{H}|\Psi\rangle,$$

where equality only holds if $|\Psi\rangle$ is the ground state wave function of $\hat{H}$.

10.9. Prove the following inequality: If $A_1$ and $A_2$ are the Helmholtz free energies for systems with Hamiltonians $\hat{H}_1$ and $\hat{H}_2$, respectively, then

$$A_1 \leq A_2 + \langle\hat{H}_1 - \hat{H}_2\rangle_2,$$

where $\langle\cdots\rangle_2$ indicates an ensemble average calculated with respect to the density matrix of system 2. This inequality is known as the Gibbs-Bogoliubov inequality (Feynman, 1998).

10.10. The density matrix of an equilibrium ensemble can be thought of as arising from an imaginary time reformulation of the time-dependent Schrödinger equation for a system with a time-independent Hamiltonian $\hat{H}$.

a. Starting with the time-dependent Schrödinger equation for a state vector $|\Psi(t)\rangle$, change the time variable $t$ to $\tau = it$, and show that the resulting imaginary-time Schrödinger equation is

$$\hat{H}|\Psi(\tau)\rangle = -\hbar\frac{\partial}{\partial\tau}|\Psi(\tau)\rangle.$$

b. Now suppose an ensemble of $Z$ members is constructed of imaginary-time dependent states $|\Psi_\lambda(\tau)\rangle$ with a density operator

$$\tilde{\rho}(\tau) = \frac{1}{Z}\sum_{\lambda=1}^{Z}|\Psi_\lambda(\tau)\rangle\langle\Psi_\lambda(\tau)|.$$

Since this density operator describes an equilibrium ensemble, we will assume that $[\hat{H}, \tilde{\rho}(\tau)] = 0$, that is, $\tilde{\rho}$ commutes with the Hamiltonian. Under this condition, use the imaginary-time Schrödinger equation to show that

$$\hat{H}\tilde{\rho}(\tau) = -\frac{\hbar}{2}\frac{\partial}{\partial\tau}\tilde{\rho}(\tau).$$

c. Show that if $\tau = \beta\hbar/2$, where $\beta = 1/k_{\rm B}T$, then the equation for the density operator becomes the so-called Bloch equation

$$\hat{H}\tilde{\rho}(\beta) = -\frac{\partial}{\partial\beta}\tilde{\rho}(\beta).$$

d. Taking $\tilde{\rho}(\beta = 0) = \hat{I}$ to be the identity operator, show that the Bloch equation is formally solved by the canonical density matrix.

∗10.11. A simple model of a one-dimensional classical polymer consists of assigning discrete energy states to different configurations of the polymer. Suppose the polymer consists of flat, elliptical disc-shaped molecules that can align either along their long axis (length $2a$) or short axis (length $a$). The energy of a monomer aligned along its short axis is higher by an amount $\varepsilon$ so that the total energy of the molecule is $E = n\varepsilon$, where $n$ is the number of monomers aligned along the short axis.

a. Calculate the canonical partition function $Q(N, T)$ for such a polymer consisting of $N$ monomers.

b. What is the average length of the polymer?

11 Quantum ideal gases: Fermi-Dirac and Bose-Einstein statistics

11.1 Complexity without interactions

In Chapters 3 through 6, the classical ideal gas was used to illustrate how the tools of classical statistical mechanics are applied to a simple problem. The classical ideal gas was seen to be a relatively trivial system with an uninteresting phase diagram. The situation with the quantum ideal gas is dramatically different. The symmetry conditions imposed on the wave function for a system of N noninteracting bosons or fermions lead to surprisingly rich and complex behavior that is purely quantum mechanical in origin. For bosonic systems, the ideal gas admits a fascinating effect known as Bose-Einstein condensation. From the fermionic ideal gas, we arrive at the notion of a Fermi surface. Moreover, many of the results derived for an ideal gas of fermions have been used to develop approximations to the electronic structure theory known as density functional theory (Hohenberg and Kohn, 1964; Kohn and Sham, 1965). Thus, a detailed treatment of the quantum ideal gases is instructive. In this chapter, we will study the general problem of a quantum-mechanical ideal gas using the rules of quantum statistical mechanics developed in the previous chapter. Following this, we will specialize our treatment for the fermionic and bosonic cases, examine a number of important limits, and finally, derive the general concepts that emerge from these limits.

11.2 General formulation of the quantum-mechanical ideal gas

The Hamiltonian operator for an ideal gas of $N$ identical particles is

$$\hat{H} = \sum_{i=1}^{N}\frac{\hat{\mathbf{p}}_i^2}{2m}. \tag{11.2.1}$$

In order to compute the partition function, we must solve for the eigenvalues of this Hamiltonian. In so doing, we will also determine the $N$-particle eigenfunctions. The eigenvalue problem for the Hamiltonian in the coordinate basis reads

$$-\frac{\hbar^2}{2m}\sum_{i=1}^{N}\nabla_i^2\,\Phi(x_1, \ldots, x_N) = E\,\Phi(x_1, \ldots, x_N), \tag{11.2.2}$$

where $x_i$ is the combined coordinate and spin label $x_i = (\mathbf{r}_i, s_i)$. The $N$-particle function $\Phi(x_1, \ldots, x_N)$ is the solution to eqn. (11.2.2) before any symmetry conditions are imposed. Since eqn. (11.2.2) is completely separable in the $N$-particle coordinate/spin labels $x_1, \ldots, x_N$, the Hamiltonian can be written as a sum of single-particle Hamiltonians:

$$\hat{H} = \sum_{i=1}^{N}\hat{h}_i, \qquad \hat{h}_i = \frac{\hat{\mathbf{p}}_i^2}{2m}. \tag{11.2.3}$$

Moreover, since $\hat{H}$ is independent of spin, the eigenfunctions must also be eigenfunctions of $\hat{S}^2$ and $\hat{S}_z$. Therefore, the unsymmetrized solution to eqn. (11.2.2) can be written as a product:

$$\Phi_{\alpha_1 m_1, \ldots, \alpha_N m_N}(x_1, \ldots, x_N) = \prod_{i=1}^{N}\phi_{\alpha_i m_i}(x_i), \tag{11.2.4}$$

where $\phi_{\alpha_i m_i}(x_i)$ is a single-particle wave function characterized by a set of spatial quantum numbers $\alpha_i$ and $S_z$ eigenvalues $m_i$. The spatial quantum numbers $\alpha_i$ are chosen to characterize the spatial part of the eigenfunctions according to a set of observables that commute with the Hamiltonian. Each single-particle function $\phi_{\alpha_i m_i}(x_i)$ can be further decomposed into a product of a spatial function $\psi_{\alpha_i}(\mathbf{r}_i)$ and a spin eigenfunction $\chi_{m_i}(s_i)$. The spin eigenfunctions are defined via components of the eigenvectors of $\hat{S}_z$ given in eqn. (9.4.3):

$$\chi_m(s) = \langle s|\chi_m\rangle = \delta_{ms}. \tag{11.2.5}$$

Thus, $\chi_{\hbar/2}(\hbar/2) = 1$, $\chi_{\hbar/2}(-\hbar/2) = 0$, and so forth. Substituting this ansatz into the wave equation yields a single-particle wave equation:

$$-\frac{\hbar^2}{2m}\nabla_i^2\,\psi_{\alpha_i}(\mathbf{r}_i) = \varepsilon_{\alpha_i}\,\psi_{\alpha_i}(\mathbf{r}_i). \tag{11.2.6}$$

Here, $\varepsilon_{\alpha_i}$ is a single-particle energy eigenvalue, and the $N$-particle eigenvalues are just sums of these:

$$E_{\alpha_1, \ldots, \alpha_N} = \sum_{i=1}^{N}\varepsilon_{\alpha_i}. \tag{11.2.7}$$

Note that the single-particle wave equation is completely separable in $x$, $y$, and $z$. If we impose periodic boundary conditions in all three directions, then the solution of the wave equation is simply a product of one-dimensional wave functions of the form given in eqn. (9.3.14). The one-dimensional wave functions are characterized by integers $n_{x,i}$, $n_{y,i}$, and $n_{z,i}$ that arise from the quantization of momentum due to the periodicity of the box. These can be collected into a vector $\mathbf{n}_i = (n_{x,i}, n_{y,i}, n_{z,i})$ of integers, which leads to the following solution to eqn. (11.2.6):

$$\psi_{\mathbf{n}_i}(\mathbf{r}_i) = \left(\frac{1}{L}\right)^{3/2}\exp(2\pi i n_{x,i}x_i/L)\exp(2\pi i n_{y,i}y_i/L)\exp(2\pi i n_{z,i}z_i/L) = \frac{1}{\sqrt{V}}\exp(2\pi i\,\mathbf{n}_i\cdot\mathbf{r}_i/L). \tag{11.2.8}$$

Similarly, each component of momentum is quantized, so that the momentum eigenvalues can be expressed as

$$\mathbf{p}_{\mathbf{n}_i} = \frac{2\pi\hbar}{L}\mathbf{n}_i, \tag{11.2.9}$$

and the energy eigenvalues in eqn. (11.2.6) are just sums of the energies in eqn. (9.3.11) over $x$, $y$, and $z$:

$$\varepsilon_{\mathbf{n}_i} = \frac{\mathbf{p}_{\mathbf{n}_i}^2}{2m} = \frac{2\pi^2\hbar^2}{mL^2}|\mathbf{n}_i|^2. \tag{11.2.10}$$

Multiplying the functions in eqn. (11.2.8) by spin eigenfunctions, the complete single-particle eigenfunctions become

$$\langle x_i|\mathbf{n}_i m_i\rangle = \phi_{\mathbf{n}_i m_i}(x_i) = \frac{1}{\sqrt{V}}\,e^{2\pi i\,\mathbf{n}_i\cdot\mathbf{r}_i/L}\,\chi_{m_i}(s_i), \tag{11.2.11}$$

and the total energy eigenvalues are given by a sum over single-particle eigenvalues

$$E_{\mathbf{n}_1, \ldots, \mathbf{n}_N} = \sum_{i=1}^{N}\frac{2\pi^2\hbar^2}{mL^2}|\mathbf{n}_i|^2. \tag{11.2.12}$$

Finally, since the eigenvalue problem is separable, complete fermionic and bosonic wave functions can be constructed as follows. Begin by constructing a matrix

$$\mathrm{M} = \begin{pmatrix} \phi_{\mathbf{n}_1,m_1}(x_1) & \phi_{\mathbf{n}_2,m_2}(x_1) & \cdots & \phi_{\mathbf{n}_N,m_N}(x_1) \\ \phi_{\mathbf{n}_1,m_1}(x_2) & \phi_{\mathbf{n}_2,m_2}(x_2) & \cdots & \phi_{\mathbf{n}_N,m_N}(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{\mathbf{n}_1,m_1}(x_N) & \phi_{\mathbf{n}_2,m_2}(x_N) & \cdots & \phi_{\mathbf{n}_N,m_N}(x_N) \end{pmatrix}. \tag{11.2.13}$$

The properly symmetrized fermionic and bosonic wave functions are ultimately given by

$$\Psi^{(\mathrm{F})}_{\mathbf{n}_1,m_1,\ldots,\mathbf{n}_N,m_N}(x_1, \ldots, x_N) = \det(\mathrm{M}), \qquad \Psi^{(\mathrm{B})}_{\mathbf{n}_1,m_1,\ldots,\mathbf{n}_N,m_N}(x_1, \ldots, x_N) = \mathrm{perm}(\mathrm{M}), \tag{11.2.14}$$

where det and perm refer to the determinant and permanent of M, respectively. (The permanent of a matrix is just a determinant in which all the minus signs are changed to plus signs.$^1$) In the fermion case, the determinant leads to a wave function that is completely antisymmetric with respect to an exchange of any two particle spin labels. Such an exchange is equivalent to interchanging two rows of the matrix M, which has the effect of changing the sign of the determinant. These determinants are known as Slater determinants after the physicist John C. Slater (1900–1976) who introduced the procedure.

1 The permanent of a 2×2 matrix $\mathrm{A} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is $\mathrm{perm}(\mathrm{A}) = ad + bc$.

In the preceding discussion, each individual particle was treated separately, with total energy eigenvalues expressed as sums of single-particle eigenvalues, and overall wave functions given as determinants/permanents constructed from single-particle wave functions. We will now introduce an alternative framework for solving the quantum ideal-gas problem that proves more convenient for the quantum statistical mechanical treatment to follow. Let us consider again the single-particle eigenfunction and eigenvalue for a given vector of integers $\mathbf{n}$ and spin eigenvalue $m$:

$$\phi_{\mathbf{n},m}(x) = \frac{1}{\sqrt{V}}\,e^{2\pi i\,\mathbf{n}\cdot\mathbf{r}/L}\,\chi_m(s), \qquad \varepsilon_{\mathbf{n}} = \frac{2\pi^2\hbar^2}{mL^2}|\mathbf{n}|^2. \tag{11.2.15}$$

We now ask: How many particles in the $N$-particle system are described by this wave function and energy? Let this number be $f_{\mathbf{n}m}$, which is called an occupation number. The occupation number $f_{\mathbf{n}m}$ tells us how many particles have an energy $\varepsilon_{\mathbf{n}}$ and probability amplitude $\phi_{\mathbf{n},m}(x)$. Since there is an infinite number of accessible states $\phi_{\mathbf{n},m}(x)$ and associated energies $\varepsilon_{\mathbf{n}}$, there are infinitely many occupation numbers, but only a finite subset of these can be nonzero. Indeed, the occupation numbers are subject to the restriction that the sum over them yield the number of particles in the system:

$$\sum_m\sum_{\mathbf{n}} f_{\mathbf{n}m} = N, \tag{11.2.16}$$

where

$$\sum_{\mathbf{n}} \equiv \sum_{n_x=-\infty}^{\infty}\;\sum_{n_y=-\infty}^{\infty}\;\sum_{n_z=-\infty}^{\infty}, \tag{11.2.17}$$

and

$$\sum_m \equiv \sum_{m=-s}^{s} \tag{11.2.18}$$

runs over the $(2s+1)$ possible values of $m$ for a spin-$s$ particle. The occupation numbers can be used to characterize the total energy eigenvalues of the system. The total energy eigenvalue can be expressed as

$$E_{\{f_{\mathbf{n}m}\}} = \sum_m\sum_{\mathbf{n}}\varepsilon_{\mathbf{n}}\,f_{\mathbf{n}m}, \tag{11.2.19}$$

which is just a sum over all possible energies multiplied by the number of particles having each energy. The formulation of the eigenvalue problem in terms of accessible states $\phi_{\mathbf{n},m}(x)$, energy levels $\varepsilon_{\mathbf{n}}$, and occupation numbers for these states and energies is known as second quantization. The framework of second quantization leads to a simple and elegant procedure for constructing the partition function.
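To make the bookkeeping of eqns. (11.2.16) and (11.2.19) concrete, the short sketch below converts a particle-by-particle list of quantum numbers into occupation numbers and checks that both descriptions give the same particle number and total energy. The four-particle configuration is a made-up example (spin labels are given as integers, as for a spin-1 boson), and energies are measured in units of $2\pi^2\hbar^2/mL^2$ so that $\varepsilon_{\mathbf{n}} = |\mathbf{n}|^2$.

```python
from collections import Counter

def single_particle_energy(n):
    """eps_n of eqn. (11.2.15) in units of 2*pi^2*hbar^2 / (m*L^2)."""
    return float(sum(c * c for c in n))

# Hypothetical configuration: one (n, m) pair per particle, where
# n = (nx, ny, nz) and m is the spin label in units of hbar.
particles = [
    ((0, 0, 0), 0),
    ((0, 0, 0), 0),
    ((1, 0, 0), 1),
    ((0, -1, 1), -1),
]

# Occupation numbers f_{nm}: how many particles sit in each state (n, m).
occupation = Counter(particles)

# Eqn. (11.2.16): the occupation numbers must sum to N.
n_total = sum(occupation.values())

# Eqn. (11.2.12): energy as a sum over particles.
energy_from_particles = sum(single_particle_energy(n) for n, _ in particles)

# Eqn. (11.2.19): energy as a sum over states weighted by occupation.
energy_from_occupations = sum(
    single_particle_energy(n) * f for (n, _), f in occupation.items()
)
```

Only the states that are actually occupied appear in `occupation`; every other $f_{\mathbf{n}m}$ is implicitly zero, which mirrors the statement that only a finite subset of the occupation numbers can be nonzero.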

11.3 An ideal gas of distinguishable quantum particles

To illustrate the use of occupation numbers in the evaluation of the quantum partition function, let us suppose we can ignore the symmetry of the wave function under particle exchange. Neglect of spin statistics leads to an approximation known as Boltzmann statistics. Boltzmann statistics are equivalent to an assumption that the particles are distinguishable because the $N$-particle wave function for Boltzmann particles is just a simple product of the functions $\phi_{\mathbf{n}_i m_i}(x_i)$. In this case, spin can also be neglected. The canonical partition function $Q(N,V,T)$ can be expressed as a sum over the quantum numbers $\mathbf{n}_1, \ldots, \mathbf{n}_N$ for each particle:

$$\begin{aligned} Q(N,V,T) &= \sum_{\mathbf{n}_1}\sum_{\mathbf{n}_2}\cdots\sum_{\mathbf{n}_N} e^{-\beta E_{\mathbf{n}_1,\ldots,\mathbf{n}_N}} \\ &= \sum_{\mathbf{n}_1}\sum_{\mathbf{n}_2}\cdots\sum_{\mathbf{n}_N} e^{-\beta\varepsilon_{\mathbf{n}_1}}e^{-\beta\varepsilon_{\mathbf{n}_2}}\cdots e^{-\beta\varepsilon_{\mathbf{n}_N}} \\ &= \left(\sum_{\mathbf{n}_1}e^{-\beta\varepsilon_{\mathbf{n}_1}}\right)\left(\sum_{\mathbf{n}_2}e^{-\beta\varepsilon_{\mathbf{n}_2}}\right)\cdots\left(\sum_{\mathbf{n}_N}e^{-\beta\varepsilon_{\mathbf{n}_N}}\right) \\ &= \left(\sum_{\mathbf{n}}e^{-\beta\varepsilon_{\mathbf{n}}}\right)^N. \end{aligned} \tag{11.3.1}$$

In terms of occupation numbers, the partition function is

$$Q(N,V,T) = \sum_{\{f\}} g(\{f_{\mathbf{n}}\})\,e^{-\beta\sum_{\mathbf{n}}\varepsilon_{\mathbf{n}}f_{\mathbf{n}}}, \tag{11.3.2}$$

where $g(\{f_{\mathbf{n}}\})$ is a factor that tells how many distinct physical states can be represented by a given set of occupation numbers $\{f_{\mathbf{n}}\}$. For Boltzmann particles, exchanging the momentum labels $\mathbf{n}_i$ of two particles leads to a new physical state but leaves the occupation numbers unchanged. Thus, the counting problem becomes one of determining how many different ways $N$ particles can be placed in the physical states. This means that $g(\{f_{\mathbf{n}}\})$ is given simply by the combinatorial factor

$$g(\{f_{\mathbf{n}}\}) = \frac{N!}{\prod_{\mathbf{n}} f_{\mathbf{n}}!}. \tag{11.3.3}$$

For example, if there were only two states, then the occupation numbers are $f_1$ and $f_2$ where $f_1 + f_2 = N$. The above formula gives

$$g(f_1, f_2) = \frac{N!}{f_1!\,f_2!} = \frac{N!}{f_1!\,(N - f_1)!}, \tag{11.3.4}$$

which is the expected binomial coefficient. Substituting eqn. (11.3.3) into eqn. (11.3.2) gives

$$Q(N,V,T) = \sum_{\{f_{\mathbf{n}}\}}\frac{N!}{\prod_{\mathbf{n}} f_{\mathbf{n}}!}\prod_{\mathbf{n}} e^{-\beta f_{\mathbf{n}}\varepsilon_{\mathbf{n}}}, \tag{11.3.5}$$

which is just a multinomial expansion for

$$Q(N,V,T) = \left(\sum_{\mathbf{n}} e^{-\beta\varepsilon_{\mathbf{n}}}\right)^N. \tag{11.3.6}$$

Again, if there were two states, then the partition function would be

$$\left(e^{-\beta\varepsilon_1} + e^{-\beta\varepsilon_2}\right)^N = \sum_{f_1, f_2,\; f_1 + f_2 = N}\frac{N!}{f_1!\,f_2!}\,e^{-f_1\beta\varepsilon_1}e^{-f_2\beta\varepsilon_2} \tag{11.3.7}$$

from the binomial theorem. Therefore, in order to evaluate the partition function, we just need to perform the sum

$$\sum_{\mathbf{n}} e^{-\beta\varepsilon_{\mathbf{n}}} = \sum_{\mathbf{n}} e^{-2\pi^2\beta\hbar^2|\mathbf{n}|^2/mL^2}. \tag{11.3.8}$$

Ultimately, we are interested in the thermodynamic limit, where $L \to \infty$. In this limit, the spacing between the single-particle energy levels becomes quite small, and the discrete sum over $\mathbf{n}$ can, to a very good approximation, be replaced by an integral over a continuous variable (which we also denote as $\mathbf{n}$):

$$\sum_{\mathbf{n}} e^{-2\pi^2\beta\hbar^2|\mathbf{n}|^2/mL^2} \longrightarrow \int d\mathbf{n}\; e^{-2\pi^2\beta\hbar^2|\mathbf{n}|^2/mL^2}. \tag{11.3.9}$$

Since the single-particle eigenvalues only depend on the magnitude of $\mathbf{n}$, we can transform the integral over $n_x$, $n_y$, and $n_z$ into spherical polar coordinates $(n, \theta, \phi)$, where $n = |\mathbf{n}|$, and $\theta$ and $\phi$ retain their usual meaning. Thus, the integral becomes

$$4\pi\int_0^\infty dn\; n^2\,e^{-2\pi^2\beta\hbar^2|\mathbf{n}|^2/mL^2} = V\left(\frac{m}{2\pi\beta\hbar^2}\right)^{3/2} = \frac{V}{\lambda^3}, \tag{11.3.10}$$

where $\lambda$ is the thermal wavelength. The partition function now becomes

$$Q(N,V,T) = \left(\frac{V}{\lambda^3}\right)^N, \tag{11.3.11}$$

which is just the classical canonical partition function for an ideal gas. Therefore, we see that an ideal gas of distinguishable particles, even when treated quantum mechanically, has precisely the same properties as a classical ideal gas. Thus, we conclude that all the quantum effects are contained in the particle spin statistics, which we will now consider.
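Two steps of the argument in this section lend themselves to a quick numerical check: the multinomial identity connecting eqns. (11.3.5) and (11.3.6), and the replacement of the lattice sum by an integral in eqns. (11.3.9) and (11.3.10). The sketch below is only illustrative. It assumes a hypothetical three-level system for the first check, and for the second it uses the fact, read off from eqn. (11.3.10), that $\lambda = (2\pi\beta\hbar^2/m)^{1/2}$, so the exponent $2\pi^2\beta\hbar^2|\mathbf{n}|^2/mL^2$ equals $\pi(\lambda/L)^2|\mathbf{n}|^2$ and the three-dimensional sum factorizes into one-dimensional sums.

```python
import itertools
import math

def q_from_occupations(energies, n_particles, beta):
    """Eqn. (11.3.5): sum over occupation sets {f_n} with sum f_n = N,
    each weighted by the multinomial factor N! / prod(f_n!)."""
    total = 0.0
    for f in itertools.product(range(n_particles + 1), repeat=len(energies)):
        if sum(f) != n_particles:
            continue
        g = math.factorial(n_particles)
        for fn in f:
            g //= math.factorial(fn)  # exact: partial quotients stay integers
        total += g * math.exp(-beta * sum(fn * e for fn, e in zip(f, energies)))
    return total

def q_from_product(energies, n_particles, beta):
    """Eqn. (11.3.6): single-particle sum raised to the N-th power."""
    return sum(math.exp(-beta * e) for e in energies) ** n_particles

def lattice_sum_1d(l_over_lambda):
    """One-dimensional factor of the sum in eqn. (11.3.8)."""
    a = math.pi / l_over_lambda ** 2
    nmax = int(10 * l_over_lambda) + 10  # terms beyond this are negligible
    return sum(math.exp(-a * n * n) for n in range(-nmax, nmax + 1))

def sum_over_integral(l_over_lambda):
    """Ratio of the exact lattice sum to V / lambda^3."""
    return lattice_sum_1d(l_over_lambda) ** 3 / l_over_lambda ** 3

# Hypothetical three-level system with four Boltzmann particles.
energies, n_particles, beta = [0.0, 0.4, 1.1], 4, 0.9
q_occ = q_from_occupations(energies, n_particles, beta)
q_prod = q_from_product(energies, n_particles, beta)

ratio_large_box = sum_over_integral(10.0)  # L = 10 lambda
ratio_small_box = sum_over_integral(0.5)   # L = lambda / 2
```

When the box is large compared with $\lambda$, the ratio is indistinguishable from 1, whereas for a box smaller than $\lambda$ the discreteness of the levels matters and the integral is a poor approximation to the sum.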

11.4 General formulation for fermions and bosons

For systems of identical fermions or identical bosons, an exchange of particles does not change the physical state. Therefore, the factor $g(\{f_{\mathbf{n}m}\})$ is simply 1 for either particle type. For fermions, the Pauli exclusion principle forbids two identical particles from having the same set of quantum numbers. Note that the Slater determinant vanishes if, for any two particles $i$ and $j$, $\mathbf{n}_i = \mathbf{n}_j$ and $m_i = m_j$. In the second quantization formalism, this means that no two particles may occupy the same state $\phi_{\mathbf{n},m}(x)$. Consequently, the occupation numbers are restricted to be only 0 or 1:

$$f_{\mathbf{n}m} = 0, 1 \qquad \text{(Fermions)}. \tag{11.4.1}$$

By contrast, since a permanent does not vanish if $\mathbf{n}_i = \mathbf{n}_j$ and $m_i = m_j$, the occupation numbers $f_{\mathbf{n}m}$ for a system of identical bosons have no such restriction and can, therefore, take on any value between 0 and $N$:

$$f_{\mathbf{n}m} = 0, 1, 2, \ldots, N \qquad \text{(Bosons)}. \tag{11.4.2}$$

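For either set of occupation numbers, the canonical partition function can be written generally as

$$Q(N,V,T) = \sum_{\{f_{\mathbf{n}m}\}} e^{-\beta\sum_m\sum_{\mathbf{n}} f_{\mathbf{n}m}\varepsilon_{\mathbf{n}}} = \sum_{\{f_{\mathbf{n}m}\}}\prod_m\prod_{\mathbf{n}} e^{-\beta f_{\mathbf{n}m}\varepsilon_{\mathbf{n}}}. \tag{11.4.3}$$

Note that the sum over occupation numbers in eqn. (11.4.3) must be performed subject to the restriction

$$\sum_m\sum_{\mathbf{n}} f_{\mathbf{n}m} = N. \tag{11.4.4}$$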

This restriction makes performing the sum in eqn. (11.4.3) nontrivial when $g(\{f_{\mathbf{n}m}\}) = 1$. Evidently, the canonical ensemble is not the most convenient choice for deriving the thermodynamics of bosonic or fermionic ideal gases. Fortunately, since all ensembles are equivalent in the thermodynamic limit, we may choose from any of the other remaining ensembles. Of these, we will see shortly that working in the grand canonical ensemble makes our task considerably easier. Recall that in the grand canonical ensemble, $\mu$, $V$, and $T$ are the control variables, and the partition function is given by

$$\mathcal{Z}(\mu,V,T) = \sum_{N=0}^\infty \zeta^N Q(N,V,T) = \sum_{N=0}^\infty e^{\beta\mu N}\sum_{\{f_{\mathbf{n}m}\}}\prod_m\prod_{\mathbf{n}} e^{-\beta f_{\mathbf{n}m}\varepsilon_{\mathbf{n}}}. \tag{11.4.5}$$

Note that the inner sum in eqn. (11.4.5) over occupation numbers is still subject to the restriction $\sum_m\sum_{\mathbf{n}} f_{\mathbf{n}m} = N$. However, in the grand canonical ensemble, there is a final sum over all possible values of $N$, and this sum allows us to lift the restriction on the inner sum. The final sum over $N$ combined with the restricted sum over occupation numbers is mathematically equivalent to an unrestricted sum over occupation numbers. For if we simply perform an unrestricted sum over occupation numbers, then all possible values of $N$ will be generated automatically. Thus, we can see why the grand canonical ensemble is preferable for fermions and bosons. The grand canonical partition function can be written compactly as

$$\mathcal{Z}(\mu,V,T) = \sum_{\{f_{\mathbf{n}m}\}}\prod_m\prod_{\mathbf{n}} e^{\beta(\mu-\varepsilon_{\mathbf{n}})f_{\mathbf{n}m}}. \tag{11.4.6}$$

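A second simplification results from rewriting the sum of products as a product of sums:

$$\begin{aligned} &\sum_{f_1}\sum_{f_2}\sum_{f_3}\cdots\; e^{\beta(\mu-\varepsilon_1)f_1}\,e^{\beta(\mu-\varepsilon_2)f_2}\,e^{\beta(\mu-\varepsilon_3)f_3}\cdots \\ &\qquad = \left(\sum_{f_1}e^{\beta(\mu-\varepsilon_1)f_1}\right)\left(\sum_{f_2}e^{\beta(\mu-\varepsilon_2)f_2}\right)\left(\sum_{f_3}e^{\beta(\mu-\varepsilon_3)f_3}\right)\cdots \\ &\qquad = \prod_m\prod_{\mathbf{n}}\sum_{f_{\mathbf{n}m}} e^{\beta(\mu-\varepsilon_{\mathbf{n}})f_{\mathbf{n}m}}. \end{aligned} \tag{11.4.7}$$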

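For fermions, each occupation-number sum contains only two terms corresponding to $f_{\mathbf{n}m} = 0$ and $f_{\mathbf{n}m} = 1$, which yields

$$\mathcal{Z}(\mu,V,T) = \prod_m\prod_{\mathbf{n}}\left(1 + e^{\beta(\mu-\varepsilon_{\mathbf{n}})}\right) \qquad \text{(Fermions)}. \tag{11.4.8}$$

For bosons, each occupation-number sum ranges from 0 to $\infty$ and can be computed using the sum formula for a geometric series $\sum_{n=0}^\infty r^n = 1/(1-r)$ for $0 < r < 1$. Thus, eqn. (11.4.7) becomes

$$\mathcal{Z}(\mu,V,T) = \prod_m\prod_{\mathbf{n}}\frac{1}{1 - e^{\beta(\mu-\varepsilon_{\mathbf{n}})}} \qquad \text{(Bosons)}. \tag{11.4.9}$$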

Note that, in each case, the summands are independent of the quantum number $m$ so that we may perform the product over $m$ values trivially with the result
$$\mathcal{Z}(\mu, V, T) = \left[\prod_{\mathbf{n}} \left(1 + e^{\beta(\mu-\varepsilon_{\mathbf{n}})}\right)\right]^g \tag{11.4.10}$$
for fermions, and
$$\mathcal{Z}(\mu, V, T) = \left[\prod_{\mathbf{n}} \frac{1}{1 - e^{\beta(\mu-\varepsilon_{\mathbf{n}})}}\right]^g \tag{11.4.11}$$
for bosons, where $g = (2s + 1)$ is the number of eigenstates of $\hat{S}_z$, which is also known as the spin degeneracy. For spin-1/2 particles such as electrons, $g = 2$.
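The factorization leading to eqns. (11.4.10) and (11.4.11) can be checked numerically for a very small system. The following is a minimal sketch, not part of the derivation itself: the level energies, $\beta$, $\mu$, and the truncation `max_occ` of the boson sums are assumed illustrative values, and the function names are hypothetical. It enumerates every unrestricted set of occupation numbers and compares the sum with the closed-form products.

```python
import itertools
import math

def grand_partition_brute_force(energies, g, beta, mu, max_occ):
    """Sum exp(beta*(mu - eps)*f) over all unrestricted occupation-number sets.

    max_occ = 1 gives fermions; a finite max_occ truncates the boson sums.
    """
    # One occupation number per (level, spin) pair: g copies of each level.
    levels = [eps for eps in energies for _ in range(g)]
    total = 0.0
    for occ in itertools.product(range(max_occ + 1), repeat=len(levels)):
        total += math.exp(sum(beta * (mu - eps) * f for eps, f in zip(levels, occ)))
    return total

def grand_partition_product(energies, g, beta, mu, fermions):
    """Closed forms of eqns. (11.4.10) (fermions) and (11.4.11) (bosons)."""
    z = 1.0
    for eps in energies:
        x = math.exp(beta * (mu - eps))
        z *= (1.0 + x) if fermions else 1.0 / (1.0 - x)
    return z ** g

energies = [0.0, 0.7, 1.3]   # hypothetical level energies
beta, mu, g = 1.0, -1.5, 2   # mu lies below the lowest level so the boson series converges

z_fermi_sum = grand_partition_brute_force(energies, g, beta, mu, max_occ=1)
z_fermi_prod = grand_partition_product(energies, g, beta, mu, fermions=True)
z_bose_sum = grand_partition_brute_force(energies, g, beta, mu, max_occ=6)
z_bose_prod = grand_partition_product(energies, g, beta, mu, fermions=False)
```

Because the boson sums are truncated at `max_occ`, the brute-force value approaches the product formula only as `max_occ` grows, whereas the fermion comparison agrees up to round-off.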


Quantum ideal gases

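At this point, let us recall the procedure for calculating the equation of state in the grand canonical ensemble. The free energy in this ensemble is $-PV$, and
$$\frac{PV}{kT} = \ln \mathcal{Z}(\zeta, V, T). \tag{11.4.12}$$
Moreover, the average particle number is the thermodynamic derivative with respect to the fugacity $\zeta$:
$$\langle N \rangle = \zeta \frac{\partial}{\partial \zeta} \ln \mathcal{Z}(\zeta, V, T). \tag{11.4.13}$$
Next, the fugacity $\zeta$ must be eliminated in favor of $\langle N \rangle$ by solving for $\zeta$ in terms of $\langle N \rangle$ and substituting into eqn. (11.4.12). Thus, in order to obtain the equation of state in the grand canonical ensemble, we must carry out the products in eqns. (11.4.10) and (11.4.11) and then apply the above procedure. Although we saw in Section 6.6 that this is straightforward for the classical ideal gas, the procedure cannot be performed exactly analytically for the quantum ideal gases. For an ideal gas of identical fermions, the equations we must solve are
$$\begin{aligned}
\frac{PV}{kT} &= \ln \mathcal{Z}(\zeta, V, T) = \ln \left[\prod_{\mathbf{n}} \left(1 + \zeta e^{-\beta\varepsilon_{\mathbf{n}}}\right)\right]^g = g \sum_{\mathbf{n}} \ln\left(1 + \zeta e^{-\beta\varepsilon_{\mathbf{n}}}\right) \\
\langle N \rangle &= \zeta \frac{\partial}{\partial \zeta} \ln \mathcal{Z} = g \sum_{\mathbf{n}} \frac{\zeta e^{-\beta\varepsilon_{\mathbf{n}}}}{1 + \zeta e^{-\beta\varepsilon_{\mathbf{n}}}},
\end{aligned} \tag{11.4.14}$$
and for bosons, they become
$$\begin{aligned}
\frac{PV}{kT} &= \ln \mathcal{Z}(\zeta, V, T) = \ln \left[\prod_{\mathbf{n}} \frac{1}{1 - \zeta e^{-\beta\varepsilon_{\mathbf{n}}}}\right]^g = -g \sum_{\mathbf{n}} \ln\left(1 - \zeta e^{-\beta\varepsilon_{\mathbf{n}}}\right) \\
\langle N \rangle &= \zeta \frac{\partial}{\partial \zeta} \ln \mathcal{Z} = g \sum_{\mathbf{n}} \frac{\zeta e^{-\beta\varepsilon_{\mathbf{n}}}}{1 - \zeta e^{-\beta\varepsilon_{\mathbf{n}}}}.
\end{aligned} \tag{11.4.15}$$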

It is not difficult to see that the problem of solving for ζ in terms of hN i is nontrivial for both particle types. In the next two sections, we will analyze the ideal fermion and boson gases individually and investigate the limits and approximations that can be applied to compute their thermodynamic properties.
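Although no closed form exists, the elimination of $\zeta$ in favor of $\langle N \rangle$ is easy to carry out numerically, because $\langle N \rangle$ in eqn. (11.4.14) increases monotonically with $\zeta$. The sketch below illustrates this for fermions using bisection on $\ln \zeta$. The energy levels, the target particle number, and the bracketing interval are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def mean_particle_number(zeta, energies, g, beta):
    """Fermion <N> from eqn. (11.4.14)."""
    total = 0.0
    for eps in energies:
        x = zeta * math.exp(-beta * eps)
        total += x / (1.0 + x)
    return g * total

def solve_fugacity(n_target, energies, g, beta, tol=1e-12):
    """Bisection on ln(zeta); <N> increases monotonically with zeta."""
    lo, hi = -50.0, 50.0   # bracket for ln(zeta), assumed wide enough
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_particle_number(math.exp(mid), energies, g, beta) < n_target:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

# Hypothetical levels: eps_n proportional to |n|^2 on a small cubic grid of n vectors.
energies = [0.05 * (nx * nx + ny * ny + nz * nz)
            for nx in range(-3, 4) for ny in range(-3, 4) for nz in range(-3, 4)]
g, beta, n_target = 2, 1.0, 10.0

zeta = solve_fugacity(n_target, energies, g, beta)
# With zeta known, PV/kT follows from the first line of eqn. (11.4.14).
pv_over_kt = g * sum(math.log1p(zeta * math.exp(-beta * e)) for e in energies)
```

The same two-step pattern (solve for $\zeta$, then substitute into the pressure) applies to bosons with the signs of eqn. (11.4.15), provided the search is restricted to values of $\zeta$ for which every term remains finite.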

11.5

The ideal fermion gas

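As we did for the ideal Boltzmann gas in Section 11.3, we will consider the thermodynamic limit $L \to \infty$ of the ideal fermion gas, so that the spacing between energy levels becomes small. Then the sums in eqns. (11.4.14) can be replaced by integrals over a continuous variable denoted $\mathbf{n}$. For the pressure, this replacement leads to
$$\begin{aligned}
\frac{PV}{kT} &= g \int d\mathbf{n}\, \ln\left(1 + \zeta e^{-\beta\varepsilon_{\mathbf{n}}}\right) \\
&= g \int d\mathbf{n}\, \ln\left(1 + \zeta e^{-2\pi^2\beta\hbar^2|\mathbf{n}|^2/mL^2}\right) \\
&= 4\pi g \int_0^{\infty} dn\, n^2 \ln\left(1 + \zeta e^{-2\pi^2\beta\hbar^2 n^2/mL^2}\right),
\end{aligned} \tag{11.5.1}$$
where, in the last line, we have transformed to spherical polar coordinates. Next, we introduce a change of variables
$$x = \sqrt{\frac{2\pi^2\beta\hbar^2}{mL^2}}\; n, \tag{11.5.2}$$
which gives
$$\begin{aligned}
\frac{PV}{kT} &= 4\pi g V \left(\frac{m}{2\pi^2\beta\hbar^2}\right)^{3/2} \int_0^{\infty} dx\, x^2 \ln\left(1 + \zeta e^{-x^2}\right) \\
&= \frac{4Vg}{\sqrt{\pi}\,\lambda^3} \int_0^{\infty} dx\, x^2 \ln\left(1 + \zeta e^{-x^2}\right).
\end{aligned} \tag{11.5.3}$$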

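The remaining integral can be evaluated by expanding the log in a power series and integrating the series term by term. Using the fact that
$$\ln(1 + y) = \sum_{l=1}^{\infty} (-1)^{l+1} \frac{y^l}{l}, \tag{11.5.4}$$
we obtain
$$\begin{aligned}
\ln\left(1 + \zeta e^{-x^2}\right) &= \sum_{l=1}^{\infty} \frac{(-1)^{l+1}\zeta^l}{l}\, e^{-lx^2} \\
\frac{PV}{kT} &= \frac{4Vg}{\sqrt{\pi}\,\lambda^3} \sum_{l=1}^{\infty} \frac{(-1)^{l+1}\zeta^l}{l} \int_0^{\infty} dx\, x^2 e^{-lx^2} \\
&= \frac{Vg}{\lambda^3} \sum_{l=1}^{\infty} \frac{(-1)^{l+1}\zeta^l}{l^{5/2}}.
\end{aligned} \tag{11.5.5}$$
In the same way, it can be shown that the average particle number $\langle N \rangle$ is given by the expression
$$\langle N \rangle = \frac{Vg}{\lambda^3} \sum_{l=1}^{\infty} \frac{(-1)^{l+1}\zeta^l}{l^{3/2}}. \tag{11.5.6}$$
Multiplying eqns. (11.5.5) and (11.5.6) by $1/V$, we obtain
$$\begin{aligned}
\frac{P\lambda^3}{gkT} &= \sum_{l=1}^{\infty} \frac{(-1)^{l+1}\zeta^l}{l^{5/2}} \\
\frac{\rho\lambda^3}{g} &= \sum_{l=1}^{\infty} \frac{(-1)^{l+1}\zeta^l}{l^{3/2}},
\end{aligned} \tag{11.5.7}$$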

where $\rho = \langle N \rangle/V$ is the number density. Although we cannot solve these equations to obtain a closed form for the equation of state, two interesting limits can be worked out to a very good approximation, which we examine next.

11.5.1

The high-temperature, low-density limit

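Solving for $\zeta$ as a function of $\langle N \rangle$ is equivalent to solving for $\zeta$ as a function of $\rho$. Hence, in the low-density limit, we can take an ansatz for $\zeta = \zeta(\rho)$ in the form of a power series:
$$\zeta(\rho) = a_1\rho + a_2\rho^2 + a_3\rho^3 + \cdots. \tag{11.5.8}$$
How rapidly this series converges depends on how low the density actually is. Writing out the first few terms in the pressure and density equations, we have
$$\begin{aligned}
\frac{P\lambda^3}{gkT} &= \zeta - \frac{\zeta^2}{2^{5/2}} + \frac{\zeta^3}{3^{5/2}} - \frac{\zeta^4}{4^{5/2}} + \cdots \\
\frac{\rho\lambda^3}{g} &= \zeta - \frac{\zeta^2}{2^{3/2}} + \frac{\zeta^3}{3^{3/2}} - \frac{\zeta^4}{4^{3/2}} + \cdots.
\end{aligned} \tag{11.5.9}$$
Substituting eqn. (11.5.8) into eqns. (11.5.9) gives
$$\begin{aligned}
\frac{\rho\lambda^3}{g} = {} & (a_1\rho + a_2\rho^2 + a_3\rho^3 + \cdots) - \frac{1}{2^{3/2}}(a_1\rho + a_2\rho^2 + a_3\rho^3 + \cdots)^2 \\
& + \frac{1}{3^{3/2}}(a_1\rho + a_2\rho^2 + a_3\rho^3 + \cdots)^3 + \cdots.
\end{aligned} \tag{11.5.10}$$
Equation (11.5.10) can be solved perturbatively, equating like powers of $\rho$ on both sides. For example, if we work only to first order in $\rho$, then we have
$$\frac{\rho\lambda^3}{g} = a_1\rho \;\Rightarrow\; a_1 = \frac{\lambda^3}{g} \;\Rightarrow\; \zeta \approx \frac{\lambda^3\rho}{g}. \tag{11.5.11}$$
When eqn. (11.5.11) is substituted into eqn. (11.5.9) for the pressure and only terms first order in the density are kept, we obtain
$$\frac{P\lambda^3}{gkT} = \frac{\rho\lambda^3}{g} \;\Rightarrow\; \frac{P}{kT} = \rho = \frac{\langle N \rangle}{V}, \tag{11.5.12}$$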

which is just the classical ideal gas equation. If we now go out to second order in $\rho$, eqn. (11.5.10) gives
$$\frac{\rho\lambda^3}{g} = \frac{\lambda^3\rho}{g} + a_2\rho^2 - \frac{1}{2^{3/2}}\frac{\lambda^6\rho^2}{g^2}, \tag{11.5.13}$$
or
$$a_2 = \frac{\lambda^6}{2^{3/2}g^2}, \tag{11.5.14}$$
from which
$$\zeta \approx \frac{\lambda^3\rho}{g} + \frac{\lambda^6}{2^{3/2}g^2}\rho^2, \tag{11.5.15}$$
and the equation of state becomes
$$\frac{P}{kT} = \rho + \frac{\lambda^3}{2^{5/2}g}\rho^2. \tag{11.5.16}$$
From the equation of state, we can read off the second virial coefficient
$$B_2(T) = \frac{\lambda^3}{2^{5/2}g} \approx 0.1768\,\frac{\lambda^3}{g} > 0. \tag{11.5.17}$$

Even at second order, we observe a nontrivial quantum effect, in particular, a second virial coefficient with a nonzero value despite the absence of interactions among the particles. The implication of eqn. (11.5.17) is that there is an effective “interaction” among the particles as a result of the fermionic spin statistics. This “interaction” tends to increase the pressure above the classical ideal gas result ($B_2(T) > 0$) and hence is repulsive in nature. This result is a consequence of the Pauli exclusion principle: If we imagine filling the energy levels, then since no two particles can occupy the same quantum state, once the ground state $\mathbf{n} = (0, 0, 0)$ is fully occupied by particles with different $\hat{S}_z$ eigenvalues, the next particle must go into a higher energy state. The result is an effective “repulsion” among the particles that pushes them into increasingly higher energy states so as not to violate the Pauli principle. If the third-order contribution is worked out, one finds (see Problem 11.1) that
$$\begin{aligned}
a_3 &= \left(\frac{1}{4} - \frac{1}{3^{3/2}}\right)\frac{\lambda^9}{g^3} \\
\zeta &= \frac{\lambda^3\rho}{g} + \frac{\lambda^6}{2^{3/2}g^2}\rho^2 + \left(\frac{1}{4} - \frac{1}{3^{3/2}}\right)\frac{\lambda^9}{g^3}\rho^3 \\
\frac{P}{kT} &= \rho + \frac{\lambda^3}{2^{5/2}g}\rho^2 + \frac{\lambda^6}{g^2}\left(\frac{1}{8} - \frac{2}{3^{5/2}}\right)\rho^3,
\end{aligned} \tag{11.5.18}$$
so that $B_3(T) < 0$. Since the third-order term is a second-order correction to the ideal-gas equation of state, the fact that $B_3(T) < 0$ is consistent with time-independent perturbation theory, wherein the second-order correction lowers all of the energy levels.
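The perturbative inversion above can be compared against a direct numerical solution of eqns. (11.5.7). The sketch below is an illustration under stated assumptions: the dimensionless density $y = \rho\lambda^3/g = 0.05$ is an arbitrary low-density value, the series are truncated at 200 terms, and the function names are hypothetical. It solves the density equation for $\zeta$ by bisection and then compares $P/\rho kT$ with the virial form of eqn. (11.5.18).

```python
import math

def fermi_series(zeta, power, terms=200):
    """sum_{l>=1} (-1)^(l+1) zeta^l / l^power, used here for 0 <= zeta <= 1."""
    return sum((-1) ** (l + 1) * zeta ** l / l ** power for l in range(1, terms + 1))

def fugacity_from_density(y, tol=1e-14):
    """Solve rho*lambda^3/g = y for zeta by bisection on the density line of eqn. (11.5.7)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fermi_series(mid, 1.5) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y = 0.05  # y = rho*lambda^3/g, a hypothetical low-density value

# Fugacity: numerical root versus the three-term series of eqn. (11.5.18).
zeta_exact = fugacity_from_density(y)
zeta_series = y + y**2 / 2**1.5 + (0.25 - 3**-1.5) * y**3

# Compressibility factor P/(rho kT): exact series versus the virial form.
z_exact = fermi_series(zeta_exact, 2.5) / y
z_virial = 1.0 + y / 2**2.5 + (0.125 - 2.0 / 3**2.5) * y**2
```

At this density the two routes agree closely, and `z_virial` exceeds 1, which is the repulsive statistical effect described in the text. Increasing `y` toward 1 shows how quickly the truncated expansion loses accuracy.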


11.5.2

The high-density, low-temperature limit

The high-density, low-temperature limit exhibits the largest departure from classical behavior. Operating with $\zeta(\partial/\partial\zeta)$ on eqn. (11.5.3), as indicated in eqn. (11.4.14), we obtain the following integral expression for the density:
$$\rho\lambda^3 = \frac{4g}{\sqrt{\pi}} \int_0^{\infty} \frac{x^2\, dx}{\zeta^{-1}e^{x^2} + 1}. \tag{11.5.19}$$
Starting with this expression, we can derive an expansion in the inverse powers of $\ln \zeta \equiv \mu/kT$, as these inverse powers become increasingly small as $T \to 0$, allowing the leading-order behavior to be deduced. We begin by introducing the variable
$$\nu = \ln \zeta = \frac{\mu}{kT} \tag{11.5.20}$$
and developing an expansion in its inverse powers. We will sketch out briefly how this is accomplished. We first introduce a change of variable $y = x^2$, from which $x = \sqrt{y}$ and $dx = dy/(2\sqrt{y})$. When this change is made in eqn. (11.5.19), we obtain
$$\rho\lambda^3 = \frac{2g}{\sqrt{\pi}} \int_0^{\infty} \frac{\sqrt{y}\, dy}{e^{y-\nu} + 1}. \tag{11.5.21}$$
The integral can be carried out by parts using
$$\begin{aligned}
u &= \frac{1}{e^{y-\nu} + 1}, & du &= -\frac{e^{y-\nu}}{(e^{y-\nu} + 1)^2}\, dy, \\
dv &= y^{1/2}\, dy, & v &= \frac{2}{3}\, y^{3/2},
\end{aligned} \tag{11.5.22}$$
which gives
$$\rho\lambda^3 = \frac{4g}{3\sqrt{\pi}} \int_0^{\infty} \frac{y^{3/2} e^{y-\nu}\, dy}{(e^{y-\nu} + 1)^2}. \tag{11.5.23}$$
Next, we expand $y^{3/2}$ about $y = \nu$:
$$y^{3/2} = \nu^{3/2} + \frac{3}{2}\nu^{1/2}(y - \nu) + \frac{3}{8}\nu^{-1/2}(y - \nu)^2 + \cdots. \tag{11.5.24}$$
This expansion is now substituted into eqn. (11.5.23) and the resulting integrals over $y$ are performed, which yields
$$\rho\lambda^3 = \frac{4g}{3\sqrt{\pi}} \left[(\ln \zeta)^{3/2} + \frac{\pi^2}{8}(\ln \zeta)^{-1/2} + \cdots\right] + O(1/\zeta), \tag{11.5.25}$$
where the fact that $\mu/kT \gg 1$ has been used for the low-temperature limit. The high-density limit implies a high chemical potential, which makes $\zeta(\rho) = e^{\beta\mu(\rho)}$ large as well. A large $\zeta$ also helps ensure the convergence of the series in eqn. (11.5.25), since the error falls off with powers of $1/\zeta$.
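The quality of the expansion in eqn. (11.5.25) can be examined by evaluating the integral in eqn. (11.5.21) numerically. The following sketch does this with Simpson's rule; the value $\nu = 20$, the integration cutoff, and the number of quadrature steps are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def density_integral(nu, n_steps=20000, cutoff=40.0):
    """(2/sqrt(pi)) * int_0^inf sqrt(y)/(exp(y-nu)+1) dy, i.e. rho*lambda^3/g of eqn. (11.5.21).

    The substitution y = t^2 removes the square-root endpoint singularity,
    and Simpson's rule is applied on 0 <= t <= sqrt(nu + cutoff).
    """
    t_max = math.sqrt(nu + cutoff)   # integrand is of order exp(-cutoff) beyond this point
    h = t_max / n_steps

    def f(t):
        return 2.0 * t * t / (math.exp(t * t - nu) + 1.0)

    total = f(0.0) + f(t_max)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return (2.0 / math.sqrt(math.pi)) * total * h / 3.0

def low_temperature_expansion(nu, order):
    """Right side of eqn. (11.5.25) divided by g, keeping `order` terms (1 or 2)."""
    value = nu ** 1.5
    if order >= 2:
        value += (math.pi ** 2 / 8.0) * nu ** -0.5
    return 4.0 / (3.0 * math.sqrt(math.pi)) * value

nu = 20.0  # nu = mu/kT, a hypothetical strongly degenerate value
exact = density_integral(nu)
err1 = abs(low_temperature_expansion(nu, 1) - exact) / exact
err2 = abs(low_temperature_expansion(nu, 2) - exact) / exact
```

For this $\nu$, the leading term alone is accurate to a few parts in a thousand, and adding the $(\ln\zeta)^{-1/2}$ correction reduces the relative error by roughly two further orders of magnitude, consistent with an expansion in inverse powers of $\nu$.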


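As $T \to 0$, $\zeta \to \infty$ and only the first term in the above expansion survives:
$$\rho\lambda^3 = \rho\left(\frac{2\pi\hbar^2}{mkT}\right)^{3/2} \approx \frac{4g}{3\sqrt{\pi}}(\ln \zeta)^{3/2} = \frac{4g}{3\sqrt{\pi}}\left(\frac{\mu}{kT}\right)^{3/2}. \tag{11.5.26}$$
According to the procedure of the grand canonical ensemble, we need to solve for $\zeta$ as a function of $\rho$ or equivalently for $\mu$ as a function of $\rho$. From eqn. (11.5.26), we find
$$\mu = \frac{\hbar^2}{2m}\left(\frac{6\pi^2\rho}{g}\right)^{2/3} \equiv \mu_0 = \varepsilon_F, \tag{11.5.27}$$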

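which is independent of $T$. The special value of the chemical potential $\mu_0 = \mu(T = 0)$ is known as the Fermi energy, $\varepsilon_F$. The Fermi energy plays an important role in characterizing free or quasi-free many-fermion systems such as metals and semiconductors. In order to shed more light on the physical significance of the Fermi energy, consider the expression for the average number of particles:
$$\langle N \rangle = \sum_m \sum_{\mathbf{n}} \frac{\zeta e^{-\beta\varepsilon_{\mathbf{n}}}}{1 + \zeta e^{-\beta\varepsilon_{\mathbf{n}}}}. \tag{11.5.28}$$
However, recall that the occupation numbers must sum to the total number of particles in the system:
$$\sum_m \sum_{\mathbf{n}} f_{\mathbf{n}m} = N. \tag{11.5.29}$$
Thus, taking an average of both sides over the grand canonical ensemble, we obtain
$$\langle N \rangle = \sum_m \sum_{\mathbf{n}} \langle f_{\mathbf{n}m} \rangle. \tag{11.5.30}$$
Comparing eqns. (11.5.28) and (11.5.30), we can deduce that the average occupation number of a given state with quantum numbers $\mathbf{n}$ and $m$ is
$$\langle f_{\mathbf{n}m} \rangle = \frac{e^{-\beta(\varepsilon_{\mathbf{n}} - \mu)}}{1 + e^{-\beta(\varepsilon_{\mathbf{n}} - \mu)}} = \frac{1}{1 + e^{\beta(\varepsilon_{\mathbf{n}} - \mu)}}. \tag{11.5.31}$$
Equation (11.5.31) gives the average occupancy of each quantum state in the ideal fermion gas and is known as the Fermi-Dirac distribution function. As $T \to 0$, $\beta \to \infty$, and $e^{\beta(\varepsilon_{\mathbf{n}} - \mu_0)} \to \infty$ if $\varepsilon_{\mathbf{n}} > \mu_0$, and $e^{\beta(\varepsilon_{\mathbf{n}} - \mu_0)} \to 0$ if $\varepsilon_{\mathbf{n}} < \mu_0$. Recognizing that $\mu_0 = \varepsilon_F$, we have the $T = 0$ result
$$\langle f_{\mathbf{n}m} \rangle = \begin{cases} 0 & \text{for } \varepsilon_{\mathbf{n}} > \varepsilon_F \\ 1 & \text{for } \varepsilon_{\mathbf{n}} < \varepsilon_F \end{cases}. \tag{11.5.32}$$
That is, at zero temperature, the Fermi-Dirac distribution becomes a simple step function:
$$\langle f_{\mathbf{n}m} \rangle = \theta(\varepsilon_F - \varepsilon_{\mathbf{n}}). \tag{11.5.33}$$
A plot of the average occupation number versus $\varepsilon_{\mathbf{n}}$ at $T = 0$ is shown in Fig. 11.1.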
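To attach numbers to eqns. (11.5.27) and (11.5.31), the sketch below evaluates the Fermi energy for a free-electron model of a metal and the Fermi-Dirac occupancy near $\varepsilon_F$. Several inputs are assumptions not taken from the text: the conduction-electron density used for copper is a commonly quoted value, the physical constants are standard SI values, and at 300 K the chemical potential is simply approximated by $\varepsilon_F$. The function names are hypothetical.

```python
import math

HBAR = 1.054571817e-34      # J s
M_E = 9.1093837015e-31      # kg
K_B = 1.380649e-23          # J/K
EV = 1.602176634e-19        # J per electron volt

def fermi_energy(rho, mass, g=2):
    """Eqn. (11.5.27): eps_F = (hbar^2 / 2m) (6 pi^2 rho / g)^(2/3)."""
    return HBAR**2 / (2.0 * mass) * (6.0 * math.pi**2 * rho / g) ** (2.0 / 3.0)

def fermi_dirac(eps, mu, temperature):
    """Eqn. (11.5.31); at T = 0 it reduces to the step function of eqn. (11.5.33)."""
    if temperature == 0.0:
        return 1.0 if eps < mu else 0.0
    x = (eps - mu) / (K_B * temperature)
    if x > 700.0:            # avoid overflow in exp for essentially empty states
        return 0.0
    return 1.0 / (1.0 + math.exp(x))

rho_cu = 8.47e28            # assumed conduction-electron density of copper, m^-3
eps_f = fermi_energy(rho_cu, M_E)
eps_f_ev = eps_f / EV       # Fermi energy in eV
t_fermi = eps_f / K_B       # temperature scale below which the gas is degenerate

# Occupancies just below, at, and just above eps_F at 300 K (mu approximated by eps_F).
occupations_300k = [fermi_dirac(x * eps_f, eps_f, 300.0) for x in (0.9, 1.0, 1.1)]
```

With these inputs the Fermi energy comes out at roughly 7 eV, so room temperature is far below `t_fermi` and the occupancies are already very close to the $T = 0$ step function.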


Fig. 11.1 The Fermi-Dirac distribution for T = 0 in eqn. (11.5.32) (solid line) and finite temperature using eqn. (11.5.31) (dashed line).

The implication of eqn. (11.5.33) is that at $T = 0$, the particles fill all of the available energy levels up to an energy value $\varepsilon_F$, above which all energy levels are unoccupied. Thus, $\varepsilon_F$ represents a natural cutoff between occupied and unoccupied subspaces of energy levels. The highest occupied energy level must satisfy the condition $\varepsilon_{\mathbf{n}} = \varepsilon_F$, which implies
$$\frac{2\pi^2\hbar^2}{mL^2}|\mathbf{n}|^2 = \frac{2\pi^2\hbar^2}{mL^2}\left(n_x^2 + n_y^2 + n_z^2\right) = \varepsilon_F. \tag{11.5.34}$$
Equation (11.5.34) defines a spherical surface in $\mathbf{n}$ space, which is known as the Fermi surface. Although the Fermi surface is a simple sphere for the ideal gas, for interacting systems the geometry of the Fermi surface will be considerably more complicated. In fact, characterizing the shape of a Fermi surface is an important component in the understanding of a wide variety of properties (thermal, electrical, optical, magnetic) of solid-state systems. As $T$ is increased, the probability of an excitation above the Fermi energy becomes nonzero, and on average, some of the energy levels above the Fermi energy will be occupied, leaving some of the energy levels below the Fermi energy vacant. This situation is represented with the dashed line in Fig. 11.1, which shows eqn. (11.5.31) for $T > 0$. The combination of a particle excitation to an energy level above $\varepsilon_F$ and a depletion of an energy level below $\varepsilon_F$ constitutes a “particle-hole” pair. In real materials such as metals, a particle-hole pair can also be created by bombarding the material with photons. The familiar concept of a work function (the energy needed to just remove an electron from one of the occupied energy levels) is closely related to the Fermi energy.

11.5.3

Zero-temperature thermodynamics

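The fact that states of finite energy are occupied even at zero temperature in the fermion gas means that the thermodynamic properties at $T = 0$ are nontrivial. Consider, for example, the average particle number. In order to obtain an expression for this quantity, recall that
$$\langle N \rangle = \sum_m \sum_{\mathbf{n}} \langle f_{\mathbf{n}m} \rangle = \sum_m \sum_{\mathbf{n}} \theta(\varepsilon_F - \varepsilon_{\mathbf{n}}) = g \sum_{\mathbf{n}} \theta(\varepsilon_F - \varepsilon_{\mathbf{n}}). \tag{11.5.35}$$
In the thermodynamic limit, the sum may be replaced by an integration in spherical polar coordinates
$$\langle N \rangle = g \int d\mathbf{n}\, \theta(\varepsilon_F - \varepsilon_{\mathbf{n}}) = 4\pi g \int_0^{\infty} dn\, n^2\, \theta(\varepsilon_F - \varepsilon_{\mathbf{n}}). \tag{11.5.36}$$
However, since the energy eigenvalues are given by
$$\varepsilon_{\mathbf{n}} = \frac{2\pi^2\hbar^2}{mL^2}\, n^2, \tag{11.5.37}$$
it proves useful to change variables of integration from $n$ to $\varepsilon$ using eqn. (11.5.37):
$$\begin{aligned}
n &= \left(\frac{mL^2}{2\pi^2\hbar^2}\right)^{1/2} \varepsilon^{1/2} \\
dn &= \frac{1}{2}\left(\frac{mL^2}{2\pi^2\hbar^2}\right)^{1/2} \varepsilon^{-1/2}\, d\varepsilon.
\end{aligned} \tag{11.5.38}$$
Inserting eqn. (11.5.38) into eqn. (11.5.36), we obtain
$$\begin{aligned}
\langle N \rangle &= 4\pi g \int_0^{\infty} dn\, n^2\, \theta(\varepsilon_F - \varepsilon) \\
&= 2\pi g \left(\frac{mL^2}{2\pi^2\hbar^2}\right)^{3/2} \int_0^{\infty} d\varepsilon\, \varepsilon^{1/2}\, \theta(\varepsilon_F - \varepsilon) \\
&= 2\pi g \left(\frac{mL^2}{2\pi^2\hbar^2}\right)^{3/2} \int_0^{\varepsilon_F} d\varepsilon\, \varepsilon^{1/2} \\
\langle N \rangle &= \frac{4\pi g}{3}\left(\frac{m}{2\pi^2\hbar^2}\right)^{3/2} V \varepsilon_F^{3/2}.
\end{aligned} \tag{11.5.39}$$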
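The replacement of the sum in eqn. (11.5.35) by the integral in eqn. (11.5.36) can be tested by direct counting. In terms of the radius $R = (mL^2\varepsilon_F/2\pi^2\hbar^2)^{1/2}$ of the Fermi sphere in $\mathbf{n}$ space, eqn. (11.5.39) reads $\langle N \rangle = (4\pi g/3)R^3$. The sketch below counts integer vectors inside the sphere and compares the result with this continuum value; the radius 30.5 is an arbitrary assumed value and the function name is hypothetical.

```python
import math

def count_states(radius, g=2):
    """g times the number of integer vectors n with |n| < radius (the T = 0 sum of eqn. (11.5.35))."""
    n_max = int(radius)
    r2 = radius * radius
    count = sum(1
                for nx in range(-n_max, n_max + 1)
                for ny in range(-n_max, n_max + 1)
                for nz in range(-n_max, n_max + 1)
                if nx * nx + ny * ny + nz * nz < r2)
    return g * count

radius = 30.5                                           # assumed Fermi-sphere radius in n space
counted = count_states(radius)                          # discrete sum over levels
continuum = 2 * (4.0 * math.pi / 3.0) * radius ** 3     # eqn. (11.5.39) with g = 2
relative_error = abs(counted - continuum) / continuum
```

The discrepancy comes from lattice points near the surface of the sphere, so `relative_error` shrinks as `radius` grows, which is the sense in which the integral becomes exact in the thermodynamic limit.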

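By a similar procedure, we can obtain an expression for the average energy. Recall that the total energy for a given set of occupation numbers is given by
$$E_{\{f_{\mathbf{n}m}\}} = \sum_m \sum_{\mathbf{n}} f_{\mathbf{n}m}\, \varepsilon_{\mathbf{n}}. \tag{11.5.40}$$
Taking the ensemble average of both sides yields
$$\langle \hat{H} \rangle = E = \sum_m \sum_{\mathbf{n}} \langle f_{\mathbf{n}m} \rangle\, \varepsilon_{\mathbf{n}}. \tag{11.5.41}$$
At $T = 0$, this becomes
$$\begin{aligned}
E &= g \sum_{\mathbf{n}} \theta(\varepsilon_F - \varepsilon_{\mathbf{n}})\, \varepsilon_{\mathbf{n}} \\
&\to g \int d\mathbf{n}\, \theta(\varepsilon_F - \varepsilon_{\mathbf{n}})\, \varepsilon_{\mathbf{n}} \\
&= 4\pi g \int_0^{\infty} dn\, n^2\, \theta(\varepsilon_F - \varepsilon_{\mathbf{n}})\, \varepsilon_{\mathbf{n}},
\end{aligned} \tag{11.5.42}$$
where, as usual, we have replaced the sum by an integral and transformed to spherical polar coordinates. If the change of variables in eqn. (11.5.38) is made, we find
$$\begin{aligned}
E &= 4\pi g\, \frac{1}{2}\left(\frac{mL^2}{2\pi^2\hbar^2}\right)^{3/2} \int_0^{\infty} d\varepsilon\, \varepsilon^{3/2}\, \theta(\varepsilon_F - \varepsilon) \\
&= 2\pi g \left(\frac{m}{2\pi^2\hbar^2}\right)^{3/2} V \int_0^{\varepsilon_F} d\varepsilon\, \varepsilon^{3/2} \\
&= \frac{4\pi g}{5}\left(\frac{m}{2\pi^2\hbar^2}\right)^{3/2} V \varepsilon_F^{5/2}.
\end{aligned} \tag{11.5.43}$$
Combining eqns. (11.5.43) and (11.5.39), the following relation between $E$ and $\langle N \rangle$ can be established:
$$E = \frac{3}{5}\langle N \rangle \varepsilon_F. \tag{11.5.44}$$
Moreover, since $\varepsilon_F \sim \rho^{2/3}$, we see that the total energy is related to the density $\rho$ by
$$\frac{E}{V} = C_K \rho^{5/3}, \tag{11.5.45}$$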

where $C_K$ is an overall constant, $C_K = (3\hbar^2/10m)(6\pi^2/g)^{2/3}$. Note that if we perform a spatial integration on both sides of eqn. (11.5.45) over the containing volume, we obtain the total energy as
$$E = \int_{D(V)} d\mathbf{r}\, \frac{E}{V} = C_K \int_{D(V)} d\mathbf{r}\, \rho^{5/3} = V C_K \rho^{5/3}. \tag{11.5.46}$$
In one of the early theories of the electronic structure of multielectron atoms, the Thomas-Fermi theory, eqns. (11.5.45) and (11.5.46) were used to derive an expression for the electron kinetic energy. In a fermion ideal gas, the density $\rho$ is constant, whereas in an interacting many-electron system, the density $\rho$ varies in space and is, therefore, a function $\rho(\mathbf{r})$. A key assumption in the Thomas-Fermi theory is that in a multielectron atom, the spatial variation in $\rho(\mathbf{r})$ is mild enough that the kinetic energy can be approximated by replacing the constant $\rho$ in eqn. (11.5.45) with $\rho(\mathbf{r})$ and then performing a spatial integration over both sides. The result is an approximate kinetic-energy functional given by
$$T[\rho] = C_K \int d\mathbf{r}\, \rho^{5/3}(\mathbf{r}). \tag{11.5.47}$$
Since the functional in eqn. (11.5.47) depends on the function $\rho(\mathbf{r})$, it is known as a density functional. In 1964, Hohenberg and Kohn proved that the total energy of a quantum multielectron system can be expressed as a unique functional $E[\rho]$ of the density $\rho(\mathbf{r})$ and that the minimum of this functional over the set of all densities $\rho(\mathbf{r})$ derivable from the set of all ground-state wave functions leads to the ground-state density of the particular system under consideration. The implication is that knowledge of the ground-state density $\rho_0(\mathbf{r})$ uniquely defines the quantum Hamiltonian of the system. This theorem has led to the development of the modern theory of electronic structure known as density functional theory, which has become one of the most widely used electronic structure methods. The Hohenberg-Kohn theorem amounts to an existence proof, since the exact form of the functional $E[\rho]$ is unknown. The kinetic-energy functional in eqn. (11.5.47) is only an approximation to the exact kinetic-energy functional, known as a local density approximation because the integrand of the functional depends only on one spatial point $\mathbf{r}$. Equation (11.5.47) is no longer used for actual applications because it, together with the rest of Thomas-Fermi theory, is unable to describe chemical bonding.

In fact, Thomas-Fermi theory and its variants have been largely supplanted by the orbital-dependent version of density functional theory introduced by Kohn and Sham (1965); however, important recent work is showing that a nonlocal version of orbital-free density functional theory that builds on the Thomas-Fermi model can also yield accurate results in some cases (Huang and Carter, 2010). In Section 11.5.4 below, we will use our solution to the fermion ideal gas to derive another approximation commonly used in density functional theory, which is still used within the Kohn-Sham theory for certain classes of systems.

The pressure at $T = 0$ can now be obtained straightforwardly. We first recognize that the pressure is given by the sum in eqn. (11.5.5):
$$\frac{PV}{kT} = \ln \mathcal{Z}(\zeta, V, T) = \frac{Vg}{\lambda^3} \sum_{l=1}^{\infty} \frac{(-1)^{l+1}\zeta^l}{l^{5/2}}. \tag{11.5.48}$$

Recall, however, that the total energy can be obtained as a thermodynamic derivative of the partition function via
$$E = -\left(\frac{\partial}{\partial \beta} \ln \mathcal{Z}(\zeta, V, T)\right)_{\zeta, V}, \tag{11.5.49}$$
from which it follows that
$$E = \frac{3}{2\beta}\, \frac{Vg}{\lambda^3} \sum_{l=1}^{\infty} \frac{(-1)^{l+1}\zeta^l}{l^{5/2}}. \tag{11.5.50}$$
Comparing eqns. (11.5.48) and (11.5.50), we see that $E$ and $P$ are related by
$$E = \frac{3}{2}PV \;\Rightarrow\; P = \frac{2E}{3V}. \tag{11.5.51}$$
Note that the energy and the pressure at $T = 0$ are not zero. The zero-temperature values of these quantities are
$$E = \frac{3}{5}\langle N \rangle \varepsilon_F, \qquad P = \frac{2}{5}\frac{\langle N \rangle}{V}\varepsilon_F. \tag{11.5.52}$$

These are referred to as the zero-point energy and pressure and are purely quantum mechanical in nature. The fact that the pressure does not vanish at $T = 0$ is again a consequence of the Pauli exclusion principle and the effective repulsive interaction that also appeared in the low-density, high-temperature limit.

11.5.4

Derivation of the local density approximation

In Section 11.5.3, we referred to the local density approximation to density functional theory. In this section, we will derive the local density approximation to the exact exchange energy in density functional theory. The functional we will obtain is still used in many density functional calculations and serves as the basis for more sophisticated approximation schemes. The exact exchange energy is a component of the electronic structure method known as Hartree-Fock theory. It takes the form
$$E_x = -\frac{e^2}{4} \int d\mathbf{r}\, d\mathbf{r}'\, \frac{|\rho_1(\mathbf{r}, \mathbf{r}')|^2}{|\mathbf{r} - \mathbf{r}'|}, \tag{11.5.53}$$
where $\rho_1(\mathbf{r}, \mathbf{r}')$ is known as the one-particle density matrix:
$$\rho_1(\mathbf{r}, \mathbf{r}') = \sum_{s, s'} \sum_m \sum_{\mathbf{n}} \langle f_{\mathbf{n}m} \rangle\, \phi_{\mathbf{n}m}(\mathrm{x})\, \phi^*_{\mathbf{n}m}(\mathrm{x}'). \tag{11.5.54}$$
Thus, for this calculation, we need both the energy levels and the corresponding eigenfunctions of the quantum ideal gas. We will show that for an ideal gas of electrons, the exchange energy is given exactly by
$$E_x = V C_x \rho^{4/3}, \tag{11.5.55}$$
where
$$C_x = -\frac{3}{4}\left(\frac{3}{\pi}\right)^{1/3} e^2. \tag{11.5.56}$$
As we did for the kinetic energy, the volume factor in eqn. (11.5.55) can be written as an integral:
$$E_x = \int d\mathbf{r}\, C_x \rho^{4/3}. \tag{11.5.57}$$
The local density approximation consists in replacing the constant density in eqn. (11.5.57) with the spatially varying density $\rho(\mathbf{r})$ of a system of interacting electrons. When this is done, we obtain the local density approximation to the exchange energy:
$$E_x = \int d\mathbf{r}\, C_x \rho^{4/3}(\mathbf{r}). \tag{11.5.58}$$
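As an illustration of how eqn. (11.5.58) is used in practice, the sketch below evaluates the functional for a spherically symmetric model density. Everything specific here is an assumption made for the example: atomic units with $e^2 = 1$, the hydrogen-like density $\rho(r) = e^{-2r}/\pi$, the radial cutoff, and the quadrature grid. The function names are hypothetical. For this density the radial integral can also be done analytically, which provides a check on the quadrature.

```python
import math

# Eqn. (11.5.56) in atomic units (e^2 = 1 is an assumption of this example).
C_X = -0.75 * (3.0 / math.pi) ** (1.0 / 3.0)

def lda_exchange_radial(density, r_max=40.0, n_steps=40000):
    """E_x = C_x * int d^3r rho(r)^(4/3) for a spherically symmetric density, eqn. (11.5.58).

    Simpson's rule is applied to the radial integral 4*pi*int_0^r_max r^2 rho(r)^(4/3) dr.
    """
    h = r_max / n_steps

    def f(r):
        return 4.0 * math.pi * r * r * density(r) ** (4.0 / 3.0)

    total = f(0.0) + f(r_max)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return C_X * total * h / 3.0

def hydrogen_1s_density(r):
    """Illustrative model density rho(r) = exp(-2r)/pi in atomic units."""
    return math.exp(-2.0 * r) / math.pi

e_x = lda_exchange_radial(hydrogen_1s_density)
# Analytic value: 4*pi * pi^(-4/3) * int_0^inf r^2 exp(-8r/3) dr, with the integral equal to 2/(8/3)^3.
e_x_analytic = C_X * 4.0 * math.pi ** (-1.0 / 3.0) * 2.0 / (8.0 / 3.0) ** 3
```

In these units $C_x \approx -0.7386$, and the functional gives an exchange energy of about $-0.2127$ hartree for the model density. The calculation needs only the density on a grid, which is what makes local functionals of this type inexpensive.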

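The remainder of this section will be devoted to the derivation of eqn. (11.5.55). Since we are interested in the $T = 0$ limit, we will make use of the zero-temperature occupation numbers in eqn. (11.5.33), and we will assume that the fermions are electrons (spin-1/2) so that the spin degeneracy factor is $g = 2$. The first step in the derivation is to determine the one-particle density matrix using the eigenvalues and eigenfunctions in eqn. (11.2.15). Substituting these into eqn. (11.5.54) gives
$$\begin{aligned}
\rho_1(\mathbf{r}, \mathbf{r}') &= \frac{1}{V} \sum_{s, s'} \sum_m \sum_{\mathbf{n}} \chi_m(s)\chi_m(s')\, e^{2\pi i \mathbf{n}\cdot(\mathbf{r} - \mathbf{r}')/L}\, \theta(\varepsilon_F - \varepsilon_{\mathbf{n}}) \\
&= \frac{1}{V} \sum_{s, s'} \sum_m \sum_{\mathbf{n}} \delta_{ms}\delta_{ms'}\, e^{2\pi i \mathbf{n}\cdot(\mathbf{r} - \mathbf{r}')/L}\, \theta(\varepsilon_F - \varepsilon_{\mathbf{n}}) \\
&= \frac{2}{V} \sum_{\mathbf{n}} e^{2\pi i \mathbf{n}\cdot(\mathbf{r} - \mathbf{r}')/L}\, \theta(\varepsilon_F - \varepsilon_{\mathbf{n}}) \\
&= \frac{2}{V} \int d\mathbf{n}\, e^{2\pi i \mathbf{n}\cdot(\mathbf{r} - \mathbf{r}')/L}\, \theta(\varepsilon_F - \varepsilon_{\mathbf{n}}),
\end{aligned} \tag{11.5.59}$$
where in the last line, the summation has been replaced by integration, and the factor of 2 comes from the summation over spin states. At this point, notice that $\rho_1(\mathbf{r}, \mathbf{r}')$ does not depend on $\mathbf{r}$ and $\mathbf{r}'$ separately but only on the relative vector $\mathbf{s} = \mathbf{r} - \mathbf{r}'$. Thus, we can write the last line of eqn. (11.5.59) as
$$\rho_1(\mathbf{s}) = \frac{2}{V} \int d\mathbf{n}\, e^{2\pi i \mathbf{n}\cdot\mathbf{s}/L}\, \theta(\varepsilon_F - \varepsilon_{\mathbf{n}}). \tag{11.5.60}$$
The integral over $\mathbf{n}$ can be performed by orienting the $\mathbf{n}$ coordinate system such that the vector $\mathbf{s}$ lies along the $n_z$ axis. Then, transforming to spherical polar coordinates in $\mathbf{n}$, we find that the integrand depends only on the magnitude $s = |\mathbf{s}|$ of $\mathbf{s}$ and the angle $\theta$ between $\mathbf{s}$ and $\mathbf{n}$:
$$\rho_1(s) = \frac{2}{V} \int_0^{\infty} dn\, n^2\, \theta(\varepsilon_F - \varepsilon_{\mathbf{n}}) \int_0^{2\pi} d\phi \int_0^{\pi} d\theta\, \sin\theta\, e^{2\pi i n s \cos\theta/L}.$$
Performing the angular integrals, we obtain
$$\rho_1(s) = \frac{4\pi}{V} \int_0^{\infty} dn\, n^2\, \theta(\varepsilon_F - \varepsilon_{\mathbf{n}})\, \frac{L}{2\pi i n s}\left[e^{2\pi i n s/L} - e^{-2\pi i n s/L}\right] \tag{11.5.61}$$
$$\rho_1(s) = \frac{4}{L^2 s} \int_0^{\infty} dn\, n\, \theta(\varepsilon_F - \varepsilon_{\mathbf{n}})\, \sin\left(\frac{2\pi n s}{L}\right). \tag{11.5.62}$$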

For the remaining integral over $n$, transforming from $n$ to $\varepsilon_{\mathbf{n}}$ is not convenient because of the sin function in the integrand. However, since $n > 0$, we recognize that the step function simply restricts the upper limit of the integral by the condition $\frac{2\pi^2\hbar^2}{mL^2}\, n^2 < \varepsilon_F$ n
$R(3/2)$. For $\rho\lambda^3/g < R(3/2)$, $\zeta$ will be different from 1 by more than $1/V$, and the divergent term proportional to $\zeta/(1 - \zeta)$ can, therefore, be safely neglected. Thus, for $\rho\lambda^3/g < R(3/2)$, we only need to solve $\rho\lambda^3/g = g_{3/2}(\zeta)$ for $\zeta$. Combining these results, the general solution for $\zeta$ valid at high density and low temperature can be expressed as
$$\zeta = \begin{cases} 1 - \dfrac{\lambda^3/V}{(\rho\lambda^3/g) - R(3/2)} & \dfrac{\rho\lambda^3}{g} > R(3/2) \\[2ex] \text{root of } g_{3/2}(\zeta) = \dfrac{\rho\lambda^3}{g} & \dfrac{\rho\lambda^3}{g} < R(3/2) \end{cases}, \tag{11.6.27}$$
which in the thermodynamic limit becomes


Fig. 11.3 Plot of eqn. (11.6.28).

ζ=

  

ρλ3 g

1 root of g3/2 (ζ) =

ρλ3 g

ρλ3 g

> R(3/2) .

(11.6.28)

< R(3/2)

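A plot of $\zeta$ vs. $vg/\lambda^3 = Vg/\langle N \rangle\lambda^3$ is shown in Fig. 11.3. According to the figure, the point $R(3/2)$ is special, as $\zeta$ undergoes a transition there to the (approximately) constant value of 1. In order to see what effect this transition has on the average occupation numbers, recall that the latter can be determined using
$$\langle N \rangle = \sum_{\mathbf{n}, m} \frac{\zeta e^{-\beta\varepsilon_{\mathbf{n}}}}{1 - \zeta e^{-\beta\varepsilon_{\mathbf{n}}}} = \sum_{\mathbf{n}, m} \langle f_{\mathbf{n}m} \rangle, \tag{11.6.29}$$
from which it can be seen that the average occupation of each energy level is given by
$$\langle f_{\mathbf{n}m} \rangle = \frac{\zeta e^{-\beta\varepsilon_{\mathbf{n}}}}{1 - \zeta e^{-\beta\varepsilon_{\mathbf{n}}}} = \frac{1}{e^{\beta(\varepsilon_{\mathbf{n}} - \mu)} - 1}. \tag{11.6.30}$$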

The ideal boson gas


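Equation (11.6.30) is known as the Bose-Einstein distribution function. For the ground state ($\mathbf{n} = (0, 0, 0)$), the occupation number expression is
$$\langle f_{0m} \rangle = \frac{\zeta}{1 - \zeta}. \tag{11.6.31}$$
Substituting the ansatz in eqn. (11.6.23) for $\zeta$ into eqn. (11.6.31) gives
$$\langle f_{0m} \rangle \approx \frac{V}{a} = \frac{V}{\lambda^3}\left(\frac{\rho\lambda^3}{g} - R(3/2)\right) \tag{11.6.32}$$
for $\rho\lambda^3/g > R(3/2)$. At $\rho\lambda^3/g = R(3/2)$, the right-hand side of eqn. (11.6.32) vanishes, and the occupation of the ground state becomes 0. The temperature at which the $\mathbf{n} = (0, 0, 0)$ level starts to become occupied can be computed by solving
$$\begin{aligned}
\frac{\rho\lambda^3}{g} &= R(3/2) \\
\frac{\rho}{g}\left(\frac{2\pi\hbar^2}{mkT_0}\right)^{3/2} &= R(3/2) \\
kT_0 &= \left(\frac{\rho}{gR(3/2)}\right)^{2/3} \frac{2\pi\hbar^2}{m}.
\end{aligned} \tag{11.6.33}$$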

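For temperatures less than $T_0$, the occupation of the ground state becomes
$$\begin{aligned}
\langle f_{0m} \rangle &= \frac{\rho V}{g}\left[1 - \frac{g}{\rho\lambda^3}R(3/2)\right] \\
&= \frac{\langle N \rangle}{g}\left[1 - \frac{g}{\rho\lambda^3}R(3/2)\right] \\
&= \frac{\langle N \rangle}{g}\left[1 - \frac{gR(3/2)}{\rho}\left(\frac{mkT}{2\pi\hbar^2}\right)^{3/2}\left(\frac{kT_0}{kT_0}\right)^{3/2}\right] \\
&= \frac{\langle N \rangle}{g}\left[1 - \left(\frac{T}{T_0}\right)^{3/2}\right] \\
\frac{\langle f_{0m} \rangle}{\langle N \rangle} &= \frac{1}{g}\left[1 - \left(\frac{T}{T_0}\right)^{3/2}\right].
\end{aligned} \tag{11.6.34}$$
At $T = 0$,
$$\langle f_{0m} \rangle = \frac{\langle N \rangle}{g}. \tag{11.6.35}$$
Summing both sides of eqn. (11.6.35) over $m$ cancels the degeneracy factor $g$ on the right, yielding
$$\begin{aligned}
\sum_m \langle f_{0m} \rangle &= \sum_m \frac{\langle N \rangle}{g} \\
\langle \bar{f}_0 \rangle &= \langle N \rangle,
\end{aligned} \tag{11.6.36}$$
where $\langle \bar{f}_0 \rangle$ indicates that the spin degeneracy has been summed over. For $T > T_0$, $\rho\lambda^3/g < R(3/2)$ and $\zeta$ is not within $1/V$ of 1, implying that $\zeta/(1 - \zeta)$ is finite and
$$\frac{\langle \bar{f}_0 \rangle}{\langle N \rangle} = \frac{1}{\langle N \rangle}\frac{\zeta}{1 - \zeta} \longrightarrow 0 \tag{11.6.37}$$
as $\langle N \rangle \to \infty$. Thus, for the occupation of the ground state, we obtain
$$\frac{\langle \bar{f}_0 \rangle}{\langle N \rangle} = \begin{cases} 1 - (T/T_0)^{3/2} & T < T_0 \\ 0 & T > T_0 \end{cases}. \tag{11.6.38}$$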

A plot of eqn. (11.6.38) is given in Fig. 11.4. The occupation of the ground state undergoes a transition from a finite value at T = 0 to zero at T = T0 , and for all higher temperatures, it remains zero. Now, hf¯0 i/hN i represents the probability that a particle will be found in the ground state and is, therefore, the fraction of the total number of particles occupying the ground state on average. For T ρ0 ¯ hf0 i  . = hN i  0 ρ < ρ0

(11.6.41)


Fig. 11.4 Plot of eqn. (11.6.38).

The divergent term in eqn. (11.6.18), $-(\lambda^3/V)\ln(1 - \zeta)$, becomes, for $\zeta$ very close to 1,
$$\frac{\lambda^3}{V}\ln(V/a) \sim \frac{\ln V}{V}, \tag{11.6.42}$$
which clearly vanishes in the thermodynamic limit, since $V \sim \langle N \rangle$. Thus, the pressure simplifies even for $\zeta$ very close to 1, and the equation of state can be written as
$$\frac{P}{gkT} = \begin{cases} g_{5/2}(1)/\lambda^3 & \rho > \rho_0 \\ g_{5/2}(\zeta)/\lambda^3 & \rho < \rho_0 \end{cases}, \tag{11.6.43}$$
where $\zeta$ is obtained by solving $\rho\lambda^3/g = g_{3/2}(\zeta)$. It is interesting to note that the pressure is approximately independent of the density for $\rho > \rho_0$. Isotherms of the ideal boson gas are shown in Fig. 11.5. Here, $v_0$ is the volume corresponding to the critical density $\rho_0$. The figure shows that $P \sim T^{5/2}$, which is quite different from the classical ideal gas. This is likewise in contrast to the fermion ideal gas, where as $T \to 0$, the pressure remains finite. For the boson gas, as $T \to 0$, the pressure vanishes, in keeping with the notion of an effective “attraction” between the particles that causes them to condense into the ground state, which is a state of zero energy.


Fig. 11.5 Plot of the isotherms of the equation of state in eqn. (11.6.43). Here $T_1 > T_2 > T_3$. The dotted line connects the transition points from constant to decreasing pressure and is of the form $P \sim V^{-5/3}$.

Other thermodynamic quantities follow from the equation of state. The energy can be obtained from $E = 3PV/2$, yielding
$$E = \begin{cases} \dfrac{3}{2}\dfrac{kTV}{\lambda^3}\, g_{5/2}(1) & \rho > \rho_0,\ T < T_0 \\[2ex] \dfrac{3}{2}\dfrac{kTV}{\lambda^3}\, g_{5/2}(\zeta) & \rho < \rho_0,\ T > T_0 \end{cases}, \tag{11.6.44}$$
and the heat capacity at constant volume is obtained from $C_V = (\partial E/\partial T)_V$, which gives
$$\frac{C_V}{\langle N \rangle k} = \begin{cases} \dfrac{15}{4}\dfrac{g_{5/2}(1)}{\rho\lambda^3} & T < T_0 \\[2ex] \dfrac{15}{4}\dfrac{g_{5/2}(\zeta)}{\rho\lambda^3} - \dfrac{9}{4}\dfrac{g_{3/2}(\zeta)}{g_{1/2}(\zeta)} & T > T_0 \end{cases}. \tag{11.6.45}$$
The plot of the heat capacity in Fig. 11.6 exhibits a cusp at $T = T_0$. Experiments carried out on liquid $^4$He, which has been observed to undergo Bose-Einstein condensation at around $T = 2.18$ K, have measured an actual discontinuity in the heat capacity at the transition temperature, suggesting that Bose-Einstein condensation is a phase transition known as a $\lambda$ transition. By contrast, the heat capacity of the ideal boson gas exhibits a discontinuous change in slope at the transition temperature, signifying a first-order phase transition (see also Section 16.1). However, using the mass and density of liquid $^4$He in the expression for $T_0$ in eqn. (11.6.33), we obtain $T_0 \approx 3.14$ K from the ideal gas, which is not far off the experimental transition temperature of 2.18 K for real liquid helium.

Fig. 11.6 $C_V$ as a function of $T$ from eqn. (11.6.45). For $T < T_0$, the curve increases as $T^{3/2}$.
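The estimate $T_0 \approx 3.14$ K can be reproduced directly from eqn. (11.6.33). The sketch below is illustrative only: the mass density of liquid helium (145 kg/m$^3$), the atomic mass of $^4$He, and the SI constants are assumed inputs not stated in the text, $g = 1$ is used for the spin-0 atom, and the function names are hypothetical. $R(3/2)$ is computed from its defining series with a simple integral estimate of the tail. The condensate fraction of eqn. (11.6.38) is included for convenience.

```python
import math

HBAR = 1.054571817e-34   # J s
K_B = 1.380649e-23       # J/K
AMU = 1.66053906660e-27  # kg

def riemann_zeta(s, terms=100000):
    """R(s) = sum_{l>=1} l^(-s) for s > 1, with an integral estimate of the truncated tail."""
    partial = sum(l ** -s for l in range(1, terms + 1))
    tail = (terms + 0.5) ** (1.0 - s) / (s - 1.0)   # midpoint-rule estimate of the remainder
    return partial + tail

def condensation_temperature(rho, mass, g=1):
    """Eqn. (11.6.33): k T0 = (rho / (g R(3/2)))^(2/3) * 2 pi hbar^2 / m, returned in kelvin."""
    r32 = riemann_zeta(1.5)
    return (rho / (g * r32)) ** (2.0 / 3.0) * 2.0 * math.pi * HBAR**2 / (mass * K_B)

def condensate_fraction(temperature, t0):
    """Eqn. (11.6.38): 1 - (T/T0)^(3/2) below T0 and zero above."""
    return 1.0 - (temperature / t0) ** 1.5 if temperature < t0 else 0.0

mass_he4 = 4.002602 * AMU
rho_he4 = 145.0 / mass_he4   # number density from an assumed mass density of 145 kg/m^3
t0 = condensation_temperature(rho_he4, mass_he4)
fraction_at_2k = condensate_fraction(2.0, t0)
```

With these inputs `t0` comes out close to the 3.14 K quoted in the text. The ideal-gas model would predict that roughly half of the atoms occupy the ground state at 2 K, a statement about the model rather than about real, strongly interacting liquid helium.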

11.7

Problems

11.1. Derive eqn. (11.5.18). What is the analogous term for bosons?

11.2. a. Can Bose-Einstein condensation occur for an ideal gas of bosons in one dimension? If so, determine the temperature $T_0$. If not, prove that it is not possible.
b. Can Bose-Einstein condensation occur for an ideal gas of bosons in two dimensions? If so, determine the temperature $T_0$. If not, prove that it is not possible.

11.3. Determine how the average energy of an ideal gas of identical fermions in one dimension at zero temperature depends on density. Repeat for a gas in two dimensions.

11.4. Consider an ideal gas of massless spin-1/2 fermions in a cubic periodic box of side $L$. The Hamiltonian for the system is
$$\hat{H} = \sum_{i=1}^{N} c\,|\hat{\mathbf{p}}_i|,$$
where $c$ is the speed of light.
a. Calculate the equation of state in the high-temperature, low-density limit up to second order in the density. What is the second virial coefficient? What is the classical limit of the equation of state?
b. Calculate the Fermi energy, $\varepsilon_F$, of the gas.
c. Determine how the total energy depends on the density.

∗

11.5. Problem 9.7 of Chapter 9 considers the case of a charged fermion in a uniform magnetic field. In that problem, the eigenfunctions and eigenvalues of the Hamiltonian were determined. This problem uses your solution for these eigenvalues and eigenfunctions.
a. For $N$ non-interacting charged fermions in a uniform magnetic field, calculate the grand canonical partition function $\mathcal{Z}(\zeta, V, T)$ in the high-temperature ($\hbar\omega/kT$

0. These two conditions might appear incompatible, for how can the point $\mathrm{x}_t$ be both an independent sampling from $f_0(\mathcal{H}(\mathrm{x}))$ and a source for any other point $\mathrm{x}_{t+\tau}$ of the trajectory? Once again, the existence of a finite correlation time resolves the paradox. If the time correlation function eventually decays to zero on a time scale specified by the correlation time and the total time $T$ of a finite-time trajectory is much larger than the correlation time, then we can imagine breaking a trajectory into segments of length similar to the correlation time. Each segment can then be regarded as an independent “sampling” of the correlation function because over each segment, the correlation function decays to zero.

Suppose a molecular dynamics trajectory is long compared to the correlation time and consists of $M$ points, $\mathrm{x}_{m\Delta t}$, $m = 1, \ldots, M$. We can break the trajectory up into segments each having $K$ points, where $K \ll M$ and $K\Delta t$ is comparable to the correlation time, and apply the procedure of eqn. (13.4.1) to each segment. For example, we could choose $\mathrm{x}_{\Delta t}, \ldots, \mathrm{x}_{K\Delta t}$ to be the first segment, $\mathrm{x}_{(K+1)\Delta t}, \ldots, \mathrm{x}_{2K\Delta t}$ to be the next, etc., and apply eqn. (13.4.1) on these segments. Note, however, that we could just as well start with $\mathrm{x}_{2\Delta t}$ and assume $\mathrm{x}_{2\Delta t}, \ldots, \mathrm{x}_{(K+1)\Delta t}$ is the first segment, $\mathrm{x}_{(K+2)\Delta t}, \ldots, \mathrm{x}_{(2K+1)\Delta t}$ is the next, etc. By segmenting the trajectory in as many different ways as possible, each point of the trajectory serves both as an independent sampling of $f_0(\mathcal{H}(\mathrm{x}))$ and as a time point in the correlation function. In fact, when eqn. (13.4.2) is written in discrete form for a finite-time trajectory,
$$C_{AB}(n\Delta t) = \frac{1}{M - n} \sum_{m=1}^{M-n} a(\mathrm{x}_{m\Delta t})\, b(\mathrm{x}_{(m+n)\Delta t}), \qquad n = 0, \ldots, K, \tag{13.4.3}$$

it is clear that eqn. (13.4.3) automatically exploits the idea of dividing the trajectory in all possible ways. Each point of the trajectory, $\mathrm{x}_{m\Delta t}$, serves as an “initial condition” from which the point $\mathrm{x}_{(m+n)\Delta t}$ a time $n\Delta t$ later is determined, i.e., as an independent sampling of $f_0(\mathcal{H}(\mathrm{x}))$, and as a point generated by an “initial condition” at some earlier time in the trajectory. This approach is illustrated in Fig. 13.6(b). As $n$ increases, however, the number of time intervals that fit into the trajectory decreases, and hence, the statistics degrade. This is why it is imperative to ensure that the trajectory is long compared to the correlation time when using eqn. (13.4.3).
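Equation (13.4.3) translates almost line for line into code. The following is a minimal sketch in which the trajectory data are replaced by a hypothetical test signal (a sampled cosine, whose time-averaged autocorrelation is $\cos(\omega n\Delta t)/2$), and the function name is likewise hypothetical. The arrays are indexed from zero rather than from one, which does not change the average over time origins.

```python
import math

def correlation_direct(a, b, n_max):
    """C_AB(n dt) for n = 0..n_max from eqn. (13.4.3), averaging over all available time origins."""
    m_total = len(a)
    corr = []
    for n in range(n_max + 1):
        n_origins = m_total - n          # number of pairs separated by n steps
        total = sum(a[m] * b[m + n] for m in range(n_origins))
        corr.append(total / n_origins)
    return corr

# Hypothetical test data standing in for a(x_t) along a trajectory.
dt, omega, n_points = 0.01, 2.0 * math.pi, 20000
signal = [math.cos(omega * m * dt) for m in range(n_points)]

c_aa = correlation_direct(signal, signal, n_max=100)
```

The cost is proportional to the number of trajectory points times the number of lags, and the shrinking value of `n_origins` at large `n` is the loss of statistics noted above.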


Classical time-dependent statistical mechanics

13.4.3

The fast Fourier transform method

For correlation functions with very long decay times, eqn. (13.4.3) requires a very long trajectory and may need to be evaluated at a large number of points. Consequently, the computational overhead of eqn. (13.4.3) can be quite high, increasing roughly as $M^2$ for a trajectory of $M$ time steps. The third method we will discuss is a highly efficient Fourier transform-based method that can take advantage of fast Fourier transform algorithms (Futrelle and McGinty, 1971). In order to derive the method, we start by noting that Hamilton’s equations are invariant with respect to a change in the origin of time, so that we can shift the time origin in eqn. (13.4.2) from $t = 0$ to $t = -T/2$. Such a shift gives an equivalent definition of the time correlation function:

$$C_{AB}(\tau) = \lim_{T\to\infty} \frac{1}{T}\int_{-T/2}^{T/2} dt\; a(x_t)\, b(x_{t+\tau}). \tag{13.4.4}$$

Since $T$ formally is taken to be infinite, we may write $a(x_t)$ and $b(x_t)$ in terms of their Fourier transforms:

$$\tilde{a}(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} dt\; e^{-i\omega t} a(x_t), \qquad \tilde{b}(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} dt\; e^{-i\omega t} b(x_t). \tag{13.4.5}$$

Now, consider the product $\tilde{a}^*(\omega)\tilde{b}(\omega)$:

$$\tilde{a}^*(\omega)\tilde{b}(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} dt \int_{-\infty}^{\infty} dt'\; e^{i\omega t'} e^{-i\omega t}\, a(x_{t'})\, b(x_t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} dt \int_{-\infty}^{\infty} dt'\; e^{-i\omega(t-t')}\, a(x_{t'})\, b(x_t). \tag{13.4.6}$$

If a change of variables $s = t - t'$ is made for the $t$ integral, we find

$$\tilde{a}^*(\omega)\tilde{b}(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} ds \int_{-\infty}^{\infty} dt'\; e^{-i\omega s}\, a(x_{t'})\, b(x_{t'+s}) = \frac{1}{2\pi}\int_{-\infty}^{\infty} ds\; e^{-i\omega s} \int_{-\infty}^{\infty} dt'\; a(x_{t'})\, b(x_{t'+s}). \tag{13.4.7}$$



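Multiplying both sides by $e^{i\omega s'}$ and integrating over $\omega$ using the fact that

$$\int_{-\infty}^{\infty} d\omega\; e^{i\omega s'} e^{-i\omega s} = 2\pi\delta(s - s'), \tag{13.4.8}$$

we obtain, after relabeling $s'$ as $s$ and $t'$ as $t$,

$$\int_{-\infty}^{\infty} dt\; a(x_t)\, b(x_{t+s}) = \int_{-\infty}^{\infty} d\omega\; e^{i\omega s}\, \tilde{a}^*(\omega)\tilde{b}(\omega). \tag{13.4.9}$$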

Nonequilibrium molecular dynamics

Thus, for very large $T$, we have, to a good approximation,

$$C_{AB}(t) = \frac{1}{T}\int_{-\infty}^{\infty} d\omega\; e^{i\omega t}\, \tilde{a}^*(\omega)\tilde{b}(\omega). \tag{13.4.10}$$

In practice, since the time interval is actually finite and time is discrete, eqn. (13.4.10) can be evaluated using discrete fast Fourier transforms (FFTs). FFTs are nothing more than canned routines capable of evaluating the transforms in eqn. (13.4.5) as discrete sums of the form

$$\tilde{a}_k = \sum_{j=0}^{M-1} a(x_{j\Delta t})\, e^{-2\pi i jk/M}, \qquad \tilde{b}_k = \sum_{j=0}^{M-1} b(x_{j\Delta t})\, e^{-2\pi i jk/M}, \tag{13.4.11}$$

corresponding to discrete frequencies and times $\omega_k = 2\pi k/(M\Delta t)$ and $t_j = j\Delta t$, respectively, in $O(M \ln M)$ operations. The efficiency of the Fourier transform method lies in the fact that the correlation function can be computed by performing just three FFTs: Two FFTs are needed to transform $a$ and $b$ into the frequency domain, and a third one is needed to transform the product $\tilde{a}_k^*\tilde{b}_k$ back to the time domain via

$$C_{AB}(t_j) = \frac{1}{M}\sum_{k=0}^{M-1} \tilde{a}_k^*\,\tilde{b}_k\, e^{2\pi i jk/M}. \tag{13.4.12}$$

Moreover, since the calculation of vibrational spectra requires the Fourier transform of $C_{AB}(t)$, the final FFT in eqn. (13.4.12) can be eliminated in that case. Finally, if an autocorrelation function $C_{AA}(t)$ is sought, then only one FFT is needed to transform $a$ into the frequency domain, and one additional inverse FFT is required to obtain the real-time autocorrelation function. Again, if only $\tilde{C}_{AA}(\omega)$ is needed, then only one FFT is required. Since FFTs can be evaluated in $O(M \ln M)$ operations, the scaling with the number of trajectory points is considerably better than that of eqn. (13.4.3).
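The three-FFT procedure can be sketched in a few lines of Python. What follows is a minimal illustration under stated assumptions rather than a production routine: the function names `fft`, `correlate_fft`, and `correlate_direct` are hypothetical, the small radix-2 transform merely stands in for a canned FFT library, and the two series are zero-padded (a step not discussed above) so that the periodic wrap-around implicit in a discrete transform does not contaminate the result. The normalization used here, one factor of the padded length for the inverse transform and one factor of $M - n$ for the time average, is a choice made for this sketch so that the output can be compared with a direct time average over all available time origins.

```python
import cmath


def fft(x, inverse=False):
    """Unnormalized recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1.0 if inverse else -1.0
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def correlate_fft(a, b):
    """Estimate C_AB(n*dt) for n = 0..M-1 using two forward FFTs and one inverse FFT."""
    m = len(a)
    size = 1
    while size < 2 * m:          # pad to a power of two >= 2M (assumption of this sketch)
        size *= 2
    fa = fft([complex(v) for v in a] + [0j] * (size - m))
    fb = fft([complex(v) for v in b] + [0j] * (size - m))
    # Product conj(a_k) * b_k, the discrete analog of the integrand in eqn. (13.4.10)
    prod = [u.conjugate() * v for u, v in zip(fa, fb)]
    raw = fft(prod, inverse=True)
    # raw[n] / size equals sum_j a_j b_{j+n}; divide by the number of time origins
    return [raw[n].real / size / (m - n) for n in range(m)]


def correlate_direct(a, b):
    """Reference O(M^2) estimate: average of a_j * b_{j+n} over all time origins j."""
    m = len(a)
    return [sum(a[j] * b[j + n] for j in range(m - n)) / (m - n) for n in range(m)]
```

For an autocorrelation function one would call `correlate_fft(a, a)`, in which case the second forward transform is redundant and can be skipped. As the text emphasizes, values at large lag $n$ are averaged over few time origins and should be treated with caution.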

13.5 The nonequilibrium molecular dynamics approach

Although equilibrium time correlation functions are useful in the calculation of transport properties, their connection to experiments that employ external driving forces is obscured by the fact that the external perturbation is absent in eqn. (13.2.29). This means we can determine transport coefficients without actually observing the behavior of the system under the action of a driving force. A more intuitive approach would attempt to model the experimental conditions via direct solution of eqns. (13.2.1). Early molecular dynamics calculations based on this idea were carried out by Gosling et al. (1973), who employed a spatially periodic shearing force. Ciccotti et al. (1976, 1979) showed that a variety of transport properties could be calculated employing this approach. Lees and Edwards (1972) introduced an approach by which simulations at constant shear rate could be performed using periodic boundary conditions. Finally, Ashurst and Hoover (1975) and later Edberg et al. (1987) introduced a general approach whereby the technique of Lees and Edwards could be coupled to thermostatting mechanisms. This methodology is known as nonequilibrium molecular dynamics and is described in detail in a series of reviews (Hoover, 1983; Ciccotti et al., 1992; Evans and Morriss, 1980; Mundy et al., 2000).

Not unexpectedly, molecular dynamics simulations based directly on the perturbed equations (13.2.1) involve certain subtleties. To see what some of these might be, consider again the example in Section 13.3.1 of flow under the action of a shearing force. In order to perform a molecular dynamics simulation based on eqns. (13.3.6) and (13.3.7), we could imagine placing $N$ particles between movable plates, pulling the plates in opposite directions, and using the trajectory to compute $\langle P_{xy}\rangle_t$; eqn. (13.3.4) would then be used to obtain the shear viscosity. However, let us take a closer look at the influence of the plates. Because the plates are physical boundaries, they create a strong inhomogeneity in the system. Moreover, because the interactions between the plates and the fluid particles are generally repulsive, fluid particles are “pushed” away from the plates, thereby creating a void layer at each plate surface. In addition, these repulsive interactions set up a layer of high particle density adjacent to each void layer. To a lesser extent, these high-density layers repel or “push” particles out of their vicinity and into secondary layers of high but slightly smaller density. This effect propagates across several layers until it dissipates into the bulk. In realistic applications, the effect can persist for tens of angstroms before dissipating, so that very large system sizes are needed in order to render this boundary effect negligible, leaving enough bulk fluid to obtain reliable bulk properties.
As we discussed in Section 3.14.2, the effects of physical boundaries in equilibrium molecular dynamics calculations can be eliminated by employing periodic boundary conditions. Thus, it is interesting to ask if the idea of periodic boundary conditions can be adapted for systems undergoing shear flow as a means of eliminating the physical plates. It is not immediately obvious how this can be accomplished, however, as the plates are the very source of the driving force. One option is to replace the usual periodic array employed in equilibrium molecular dynamics calculations (see Fig. 13.7, left) with an array in which layers stacked along the y-axis move in the x-direction with a speed equal to $y_l\gamma$, where $y_l$ is the position of the $l$th layer along the y-axis. This scheme is depicted in Fig. 13.7, right. Such time-dependent boundary conditions are known as Lees-Edwards boundary conditions (Lees and Edwards, 1972). As the right panel in Fig. 13.7 suggests, it is the application of the Lees-Edwards boundary conditions that drives the flow and establishes the linear velocity profile required for planar Couette flow. The combination of eqns. (13.3.6) and (13.3.7) with the Lees-Edwards boundary conditions is known as the SLLOD algorithm.$^2$ A serious disadvantage of Lees-Edwards boundary conditions is the difficulty of implementing them for systems of charged particles employing Ewald summation techniques (see Appendix C), which are based in reciprocal space. In such systems, no clear definition of the reciprocal-space vectors exists under Lees-Edwards boundary conditions (Mundy et al., 2000).

Fig. 13.7 (Left) Standard periodic boundary conditions. (Right) Lees-Edwards boundary conditions.

Fig. 13.8 Periodic boundary conditions under the box evolution of eqn. (13.5.1).

An alternative to the Lees-Edwards approach drives the flow by evolving the box matrix $\mathbf{h}$ in time so as to “shear” the fluid in a particular direction (see Fig. 13.8). For example, if we begin with a cubic box for which $\mathbf{h} = \mathrm{diag}(L, L, L)$, then a shear flow profile can be established by allowing $\mathbf{h}$ to evolve in time according to

$$\mathbf{h}(t) = \begin{pmatrix} L & \gamma t L & 0 \\ 0 & L & 0 \\ 0 & 0 & L \end{pmatrix}. \tag{13.5.1}$$

2 Note that “SLLOD” is “DOLLS” spelled backwards. For an explanation of the amusing origin of the “SLLOD” moniker, see Evans and Morriss (1984). Suffice it to say, here, that SLLOD is not actually an acronym.


This form of the box matrix allows the simulation cell to shear along the x-direction without changing the volume (since $\det\mathbf{h}(t) = \det\mathbf{h}(0)$). As a practical matter, the box must not be allowed to “roll over” indefinitely but needs to be reset periodically using a property of the equations of motion known as modular invariance, which we will discuss in the next subsection. Note that when using either Lees-Edwards boundary conditions or eqn. (13.5.1), the quantity in eqn. (13.3.8) is no longer conserved because the inter-particle distances become time dependent, which in turn causes the potential to become time dependent.

Another subtlety associated with the nonequilibrium molecular dynamics approach concerns the time scales that can be accessed in a typical simulation. The driving forces used in actual experiments are orders of magnitude smaller than those that can be routinely employed in calculations. For planar Couette flow, this means that a typical experimental shear rate might be $\gamma = 10^2\ \mathrm{s}^{-1}$ while the simulation might require $\gamma = 10^9\ \mathrm{s}^{-1}$. Under such enormously high shear rates, it is conceivable that the behavior of the system could differ significantly from the experiment. Consequently, a careful extrapolation to the experimental limit from simulations performed at several different shear rates is generally needed in nonequilibrium molecular dynamics calculations. Clearly, this problem does not arise in the Green-Kubo approach because of the absence of explicit driving forces.

Finally, when using nonequilibrium molecular dynamics, we must ensure that the conditions of linear response theory are valid in the molecular dynamics calculation. This means that the coefficient of shear viscosity $\eta$ must be independent of the choice of shear rate $\gamma$. Since the extrapolation procedure requires a series of simulations at different shear rates, these simulations should be used to check that the average $\langle P_{xy}\rangle_t$ varies linearly with $\gamma$. If it does not, the shear rate is too high and must be decreased.

13.5.1 Reversible integration of the SLLOD equations of motion

Developing reversible numerical integration methods for the general driven system in eqns. (13.2.1) depends on the particular form of the functions $C_i(q, p)$ and $D_i(q, p)$ appearing in them; each problem needs to be treated as a separate case. While eqns. (13.3.19) for the calculation of the diffusion constant are quite straightforward to integrate, for example, the SLLOD equations for planar Couette flow are somewhat more complicated. Hence, we will consider this latter case as an example of how to construct an integration algorithm for a nontrivial nonequilibrium molecular dynamics problem.

There are two issues that need to be considered in the SLLOD scheme. First, the equations of motion are non-Hamiltonian and involve nontrivial driving terms. Second, we must either impose Lees-Edwards boundary conditions or employ the time-dependent box matrix of eqn. (13.5.1). When performing nonequilibrium molecular dynamics simulations, the use of thermostats is often imperative, as the coupling to the external field causes the system to heat continuously, and a steady state can never be reached without some kind of temperature control mechanism. Thus, if the system is coupled to a Nosé-Hoover chain thermostat, the simulation would be based on the following equations of motion:

$$\begin{aligned}
\dot{\mathbf{r}}_i &= \frac{\mathbf{p}_i}{m_i} + \gamma y_i \hat{\mathbf{e}}_x \\
\dot{\mathbf{p}}_i &= \mathbf{F}_i - \gamma p_{y_i} \hat{\mathbf{e}}_x - \frac{p_{\eta_1}}{Q_1}\mathbf{p}_i \\
\dot{\eta}_j &= \frac{p_{\eta_j}}{Q_j}, \qquad j = 1, ..., M \\
\dot{p}_{\eta_1} &= \left[\sum_{i=1}^{N} \frac{\mathbf{p}_i^2}{m_i} - dNkT\right] - \frac{p_{\eta_2}}{Q_2} p_{\eta_1} \\
\dot{p}_{\eta_j} &= \left[\frac{p_{\eta_{j-1}}^2}{Q_{j-1}} - kT\right] - \frac{p_{\eta_{j+1}}}{Q_{j+1}} p_{\eta_j}, \qquad j = 2, ..., M-1 \\
\dot{p}_{\eta_M} &= \left[\frac{p_{\eta_{M-1}}^2}{Q_{M-1}} - kT\right].
\end{aligned} \tag{13.5.2}$$

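In order to devise a numerical integration scheme for eqns. (13.5.2), we begin by writing the Liouville operator for the system in a form similar to eqn. (4.12.4):

$$iL = iL_{\rm mNHC} + iL_1 + iL_2, \tag{13.5.3}$$

where the three terms are defined to be

$$\begin{aligned}
iL_1 &= \sum_{i=1}^{N}\left[\frac{\mathbf{p}_i}{m_i}\cdot\frac{\partial}{\partial \mathbf{r}_i} + \gamma y_i \frac{\partial}{\partial x_i}\right] \\
iL_2 &= \sum_{i=1}^{N} \mathbf{F}_i\cdot\frac{\partial}{\partial \mathbf{p}_i} \\
iL_{\rm mNHC} &= iL_{\rm NHC} - \gamma\sum_{i=1}^{N} p_{y_i}\frac{\partial}{\partial p_{x_i}}.
\end{aligned} \tag{13.5.4}$$

Here, $iL_{\rm NHC}$ is the Nosé-Hoover chain Liouville operator of eqn. (4.12.5). The classical propagator $\exp(iL\Delta t)$ for a single time step is now factorized in a manner analogous to eqn. (4.12.8):

$$e^{iL\Delta t} = e^{iL_{\rm mNHC}\Delta t/2}\left[e^{iL_2\Delta t/2}\, e^{iL_1\Delta t}\, e^{iL_2\Delta t/2}\right] e^{iL_{\rm mNHC}\Delta t/2} + O\left(\Delta t^3\right). \tag{13.5.5}$$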

Consider, first, the three operators in the brackets. If $\gamma = 0$, these operators just produce a step of velocity Verlet integration in eqns. (3.8.7) and (3.8.9). With $\gamma \neq 0$, the action of the operators is only slightly more complicated. For each particle, the action of the operators can be deduced by solving the three coupled differential equations

$$\dot{x}_i = \frac{p_{x_i}}{m_i} + \gamma y_i, \qquad \dot{y}_i = \frac{p_{y_i}}{m_i}, \qquad \dot{z}_i = \frac{p_{z_i}}{m_i} \tag{13.5.6}$$

subject to initial conditions $x_i(0)$, $y_i(0)$, $z_i(0)$ with constant $p_{x_i}$, $p_{y_i}$, and $p_{z_i}$. The solution for $y_i$ and $z_i$ is simply

$$y_i(t) = y_i(0) + \frac{p_{y_i}}{m_i}t, \qquad z_i(t) = z_i(0) + \frac{p_{z_i}}{m_i}t. \tag{13.5.7}$$

Substituting eqn. (13.5.7) for $y_i(t)$ into eqn. (13.5.6) for $x_i$ gives

$$\begin{aligned}
\dot{x}_i &= \frac{p_{x_i}}{m_i} + \gamma\left[y_i(0) + \frac{p_{y_i}}{m_i}t\right] \\
x_i(t) &= x_i(0) + \left[\frac{p_{x_i}}{m_i} + \gamma y_i(0)\right]t + \frac{t^2}{2}\gamma\frac{p_{y_i}}{m_i}.
\end{aligned} \tag{13.5.8}$$
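The closed-form solution in eqns. (13.5.7) and (13.5.8) translates directly into a position update. The short Python function below is a sketch for a single particle; the name `sllod_drift` and the tuple-based interface are illustrative choices, not notation from the text. Because the update is an exact solution of eqn. (13.5.6) at fixed momenta, two successive drifts of durations $t_1$ and $t_2$ must reproduce a single drift of duration $t_1 + t_2$, which provides a simple consistency check.

```python
def sllod_drift(r, p, m, gamma, t):
    """Exact solution of eqn. (13.5.6) at constant momentum.

    r, p  : (x, y, z) position and (px, py, pz) momentum of one particle
    m     : particle mass
    gamma : shear rate
    t     : elapsed time
    Returns the new position (x, y, z).
    """
    x, y, z = r
    px, py, pz = p
    # eqn. (13.5.8): linear term uses y(0); quadratic term comes from y advancing in time
    x_new = x + (px / m + gamma * y) * t + 0.5 * t * t * gamma * py / m
    # eqn. (13.5.7): free flight along y and z
    y_new = y + py * t / m
    z_new = z + pz * t / m
    return (x_new, y_new, z_new)
```

Setting `gamma = 0.0` recovers the ordinary free-particle drift of velocity Verlet.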

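The evolution for $x_i$ can be understood by noting that the displacement of $x_i$ in time is elongated by an amount that depends on the position $y_i(0)$, the shear rate, and the momentum $p_{y_i}$. It is this $y_i$-dependent elongation that gives rise to the linear flow profile. Setting $t = \Delta t$ and using the fact that $\exp(iL_2\Delta t/2)$ is a simple translation of the momenta, we can express the action of the operator in brackets in eqn. (13.5.5) in pseudocode as

$$\begin{aligned}
&\mathbf{p}_i \longleftarrow \mathbf{p}_i + \frac{\Delta t}{2}\mathbf{F}_i \\
&x_i \longleftarrow x_i + \Delta t\left[\frac{p_{x_i}}{m_i} + \gamma y_i\right] + \frac{\Delta t^2}{2m_i}\gamma p_{y_i} \\
&y_i \longleftarrow y_i + \Delta t\,\frac{p_{y_i}}{m_i} \\
&z_i \longleftarrow z_i + \Delta t\,\frac{p_{z_i}}{m_i} \\
&\text{Update Forces} \\
&\mathbf{p}_i \longleftarrow \mathbf{p}_i + \frac{\Delta t}{2}\mathbf{F}_i.
\end{aligned} \tag{13.5.9}$$

The action of the operator $\exp(iL_{\rm mNHC}\Delta t/2)$ is similar to that of the standard Nosé-Hoover chain operator $\exp(iL_{\rm NHC}\Delta t/2)$ discussed in Section 4.12 (cf. eqn. (4.12.19)). The presence of the $\gamma$-dependent operator requires only a slight modification. The operator

$$\exp\left[-\frac{\delta_\alpha}{2}\frac{p_{\eta_1}}{Q_1}\mathbf{p}_i\cdot\frac{\partial}{\partial\mathbf{p}_i}\right]$$

in eqn. (4.12.19) is replaced by

$$\exp\left\{-\frac{\delta_\alpha}{2}\left[\frac{p_{\eta_1}}{Q_1}\mathbf{p}_i\cdot\frac{\partial}{\partial\mathbf{p}_i} + \gamma p_{y_i}\frac{\partial}{\partial p_{x_i}}\right]\right\}$$

due to the $\gamma$-dependent term in $iL_{\rm mNHC}$. The action of this operator can be derived by solving the three coupled differential equations

$$\dot{p}_{x_i} = -\frac{p_{\eta_1}}{Q_1}p_{x_i} - \gamma p_{y_i}, \qquad \dot{p}_{y_i} = -\frac{p_{\eta_1}}{Q_1}p_{y_i}, \qquad \dot{p}_{z_i} = -\frac{p_{\eta_1}}{Q_1}p_{z_i}. \tag{13.5.10}$$

The solutions for $p_{y_i}$ and $p_{z_i}$ are

$$p_{y_i}(t) = p_{y_i}(0)\,e^{-p_{\eta_1}t/Q_1}, \qquad p_{z_i}(t) = p_{z_i}(0)\,e^{-p_{\eta_1}t/Q_1}. \tag{13.5.11}$$

Substituting the solution for $p_{y_i}(t)$ into the equation for $p_{x_i}$ gives

$$\dot{p}_{x_i} = -\frac{p_{\eta_1}}{Q_1}p_{x_i} - \gamma p_{y_i}(0)\,e^{-p_{\eta_1}t/Q_1}. \tag{13.5.12}$$

This equation can be solved using $\exp[p_{\eta_1}t/Q_1]$ as an integrating factor, which yields the solution

$$p_{x_i}(t) = \left[p_{x_i}(0) - \gamma t\, p_{y_i}(0)\right]e^{-p_{\eta_1}t/Q_1}. \tag{13.5.13}$$

Evaluating the solutions for $p_{x_i}(t)$, $p_{y_i}(t)$, and $p_{z_i}(t)$ at $t = \delta_\alpha/2$ then gives the modification to the Suzuki-Yoshida Nosé-Hoover chain evolution necessary for treating planar Couette flow.

As shown in Appendix C, the application of periodic boundary conditions in a cubic box of side $L$ proceeds as follows: Given a pair of particles with indices $i$ and $j$, we first calculate the vector difference $\mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j$ and then apply the minimum-image convention

$$\begin{aligned}
x_{ij} &\longleftarrow x_{ij} - L\times\mathrm{NINT}(x_{ij}/L) \\
y_{ij} &\longleftarrow y_{ij} - L\times\mathrm{NINT}(y_{ij}/L) \\
z_{ij} &\longleftarrow z_{ij} - L\times\mathrm{NINT}(z_{ij}/L).
\end{aligned} \tag{13.5.14}$$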

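Here, NINT($x$) is the nearest integer function. As Fig. 13.7 (right) suggests, the application of periodic boundary conditions in the Lees-Edwards scheme requires a modification in the minimum-image procedure along the x-direction. For a shear rate $\gamma$, a row of boxes will be displaced with respect to a neighboring row by an amount $\gamma t L$ in time $t$. The image of particle $j$ in the row displaced along $y$ by $y'_{ij}$ is therefore shifted along $x$ by $\gamma t y'_{ij}$ before the usual nearest-image reduction is applied. Hence, we must replace eqn. (13.5.14) by

$$\begin{aligned}
y'_{ij} &= L\times\mathrm{NINT}(y_{ij}/L) \\
x_{ij} &\longleftarrow x_{ij} - \gamma t y'_{ij} - L\times\mathrm{NINT}\left[\left(x_{ij} - \gamma t y'_{ij}\right)/L\right] \\
y_{ij} &\longleftarrow y_{ij} - y'_{ij} \\
z_{ij} &\longleftarrow z_{ij} - L\times\mathrm{NINT}(z_{ij}/L).
\end{aligned} \tag{13.5.15}$$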
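A Lees-Edwards minimum-image routine for a cubic box can be sketched as follows. The sketch assumes that the separation along $x$ is first shifted by $\gamma t\, y'_{ij}$, the displacement of the row containing the nearest image along $y$, and is then reduced to the nearest image along $x$, so that the returned vector differs from the input by a translation of the sheared periodic array. The names `nint` and `lees_edwards_min_image` are hypothetical, and the accumulated strain $\gamma t$ is passed as a single argument.

```python
import math


def nint(x):
    """Nearest-integer function (half-integers round up)."""
    return math.floor(x + 0.5)


def lees_edwards_min_image(rij, L, strain):
    """Minimum-image separation under Lees-Edwards boundary conditions.

    rij    : (x, y, z) components of r_i - r_j
    L      : side length of the cubic box
    strain : accumulated strain gamma * t
    """
    x, y, z = rij
    y_shift = L * nint(y / L)      # y'_ij: displacement to the nearest row of images
    x = x - strain * y_shift       # that row has slid along x by gamma * t * y'_ij
    x = x - L * nint(x / L)        # ordinary nearest image along x
    y = y - y_shift
    z = z - L * nint(z / L)
    return (x, y, z)
```

With `strain = 0.0` the routine reduces to the standard convention of eqn. (13.5.14), and with `strain = 1.0` it returns the same separations as at zero strain, since each row has then slid by a whole number of box lengths.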

Note that when $\gamma t = 1$, neighboring rows come back into register with each other, and the periodic array will produce the same set of minimum images as at $t = 0$. Consequently, we could reset $\gamma t = 0$ at this point and continue the simulation.

As noted previously, Lees-Edwards boundary conditions are not useful for systems with long-range interactions using Ewald summation techniques, as there is no well-defined box matrix, hence no simple definition of the reciprocal-space vectors. In such systems, the use of a time-dependent box matrix is preferable. In Appendix C, it is shown that for a given box matrix $\mathbf{h}$, the application of periodic boundary conditions involves the four steps:

$$\begin{aligned}
\mathbf{s}_i &= \mathbf{h}^{-1}\mathbf{r}_i \\
\mathbf{s}_{ij} &= \mathbf{s}_i - \mathbf{s}_j \\
\mathbf{s}_{ij} &\longleftarrow \mathbf{s}_{ij} - \mathrm{NINT}(\mathbf{s}_{ij}) \qquad \text{(Minimum-image convention)} \\
\mathbf{r}_{ij} &= \mathbf{h}\mathbf{s}_{ij},
\end{aligned} \tag{13.5.16}$$

where the third line is a reassignment of $\mathbf{s}_{ij}$ based on the minimum-image convention. In the time-dependent box-matrix method, the above prescription is performed using $\mathbf{h}(n\Delta t)$ at each time step. However, as eqn. (13.5.1) suggests, eventually the box will become long and skinny in the x-direction, causing the system to distort severely. In order to avoid this scenario, we may use a property of the minimum-image convention known as modular invariance (Mundy et al., 2000). Note that the inverse of the matrix in eqn. (13.5.1) is

$$\mathbf{h}^{-1}(t) = \begin{pmatrix} 1/L & -\gamma t/L & 0 \\ 0 & 1/L & 0 \\ 0 & 0 & 1/L \end{pmatrix}. \tag{13.5.17}$$

Applying the prescription in eqn. (13.5.16) using eqns. (13.5.1) and (13.5.17), for which the scaled components are $(x_{ij} - \gamma t y_{ij})/L$, $y_{ij}/L$, and $z_{ij}/L$, gives the three components of $\mathbf{r}_{ij}$ for a cubic box as

$$\begin{aligned}
y'_{ij} &= L\times\mathrm{NINT}(y_{ij}/L) \\
x_{ij} &\longleftarrow x_{ij} - L\times\mathrm{NINT}\left[\left(x_{ij} - \gamma t y_{ij}\right)/L\right] - \gamma t y'_{ij} \\
y_{ij} &\longleftarrow y_{ij} - y'_{ij} \\
z_{ij} &\longleftarrow z_{ij} - L\times\mathrm{NINT}(z_{ij}/L).
\end{aligned} \tag{13.5.18}$$
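The four steps of eqn. (13.5.16) can be carried out explicitly for the sheared box of eqn. (13.5.1), which is a convenient way to check any closed-form expression for the minimum-image components. The Python sketch below does this for a single pair separation; the function names are illustrative, and the strain $\gamma t$ is again passed as one argument. Because $\mathbf{h}$ is applied to the reduced scaled coordinates, the result always differs from the input by a lattice vector of the sheared cell.

```python
import math


def nint(x):
    """Nearest-integer function (half-integers round up)."""
    return math.floor(x + 0.5)


def sheared_box_min_image(rij, L, strain):
    """Apply eqn. (13.5.16) with the box matrix of eqn. (13.5.1).

    rij    : (x, y, z) components of r_i - r_j
    L      : side length of the initially cubic box
    strain : accumulated strain gamma * t
    """
    x, y, z = rij
    # s_ij = h^{-1} r_ij, with h^{-1} taken from eqn. (13.5.17)
    sx = (x - strain * y) / L
    sy = y / L
    sz = z / L
    # minimum-image convention in scaled coordinates
    sx -= nint(sx)
    sy -= nint(sy)
    sz -= nint(sz)
    # r_ij = h s_ij, with h taken from eqn. (13.5.1)
    return (L * sx + strain * L * sy, L * sy, L * sz)
```

At `strain = 1.0` the separation returned differs from the input only by integer multiples of $L$ along each axis, that is, by a lattice vector of the original cubic box. This is the modular invariance that permits the box to be reset.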

In much the same way as occurs for Lees-Edwards boundary conditions, the same set of images is obtained from eqns. (13.5.16) when $\gamma t = 1$ as when $\gamma t = 0$. Thus, as in the Lees-Edwards scheme, when $\gamma t = 1$ we can reset the time-dependent element $h_{12} = \gamma t L$ back to 0, thereby preventing the simulation cell from becoming overly distorted. Note that a similar modular invariance condition exists if $h_{12}$ is allowed to vary between $-L/2$ and $L/2$ rather than between 0 and $L$. Thus, by resetting $h_{12}$ to $-L/2$ when $\gamma t = 1/2$, the box can be kept closer to cubic, thereby allowing use of a larger cutoff radius on the short-range interactions without requiring an increase in system size. When the diagonal elements of $\mathbf{h}$ are not all identical, it can be shown that the equivalent reset conditions for the two modular invariant forms occur when $\gamma t h_{11}/h_{22} = 1$ and when $\gamma t h_{11}/h_{22} = 1/2$.

13.5.2 Other types of flows

Although $\mathbf{u}(\mathbf{r})$ could describe any type of velocity flow field and eqns. (13.3.6) and (13.3.7) can describe both the linear and nonlinear regimes, the formal development of nonlinear response theory is beyond the scope of this book. Therefore, we will restrict ourselves to examples for which $\mathbf{u}(\mathbf{r})$ depends linearly on $\mathbf{r}$. This means that the so-called strain-rate tensor, denoted $\nabla\mathbf{u}$, is a constant dyad. We will also assume hydrodynamic incompressibility, for which $\nabla\cdot\mathbf{u}(\mathbf{r}) = 0$, i.e., the strain-rate tensor is traceless. For example, the velocity profile $\mathbf{u}(\mathbf{r}) = (\gamma y, 0, 0)$ that describes planar Couette flow has the associated strain-rate tensor

$$\nabla\mathbf{u} = \begin{pmatrix} 0 & 0 & 0 \\ \gamma & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{13.5.19}$$

Note that the equation of motion for the matrix $\mathbf{h}$ is generally given by

$$\dot{\mathbf{h}} = (\nabla\mathbf{u})^{\rm T}\mathbf{h}, \tag{13.5.20}$$

which reproduces the evolution in eqn. (13.5.1). An example of a different flow satisfying the above requirements is planar elongational flow described by the strain-rate tensor

$$\nabla\mathbf{u} = \begin{pmatrix} \xi & 0 & 0 \\ 0 & -\xi(b+1)/2 & 0 \\ 0 & 0 & \xi(b-1)/2 \end{pmatrix} \tag{13.5.21}$$

(Ciccotti et al., 1992), where $\xi$ is the elongation rate and $b$ is a parameter such that $0 \leq b \leq 1$ that describes the type of planar elongational flow.

For a general incompressible flow with constant strain-rate tensor, we seek a set of equations of motion that have a well-defined conserved energy in the absence of time-dependent boundary conditions (Tuckerman et al., 1997). These equations of motion take the form

$$\begin{aligned}
\dot{\mathbf{r}}_i &= \frac{\mathbf{p}_i}{m_i} + \mathbf{r}_i\cdot\nabla\mathbf{u} \\
\dot{\mathbf{p}}_i &= \mathbf{F}_i - \mathbf{p}_i\cdot\nabla\mathbf{u} - m_i\mathbf{r}_i\cdot\nabla\mathbf{u}\cdot\nabla\mathbf{u} - \frac{p_{\eta_1}}{Q_1}\mathbf{p}_i \\
\dot{\zeta} &= \frac{p_{\eta_1}}{Q_1}\sum_{i=1}^{N}\mathbf{r}_i\cdot\nabla\mathbf{u}\cdot\mathbf{p}_i \\
\dot{\eta}_j &= \frac{p_{\eta_j}}{Q_j}, \qquad j = 1, ..., M \\
\dot{p}_{\eta_1} &= \left[\sum_{i=1}^{N}\frac{\mathbf{p}_i^2}{m_i} - 3NkT\right] - \frac{p_{\eta_2}}{Q_2}p_{\eta_1} \\
\dot{p}_{\eta_j} &= \left[\frac{p_{\eta_{j-1}}^2}{Q_{j-1}} - kT\right] - \frac{p_{\eta_{j+1}}}{Q_{j+1}}p_{\eta_j}, \qquad j = 2, ..., M-1 \\
\dot{p}_{\eta_M} &= \left[\frac{p_{\eta_{M-1}}^2}{Q_{M-1}} - kT\right],
\end{aligned} \tag{13.5.22}$$

where a Nosé-Hoover chain has been coupled to the system. In the absence of time-dependent boundary conditions, eqns. (13.5.22) have the conserved energy

$$H' = \sum_{i=1}^{N}\frac{(\mathbf{p}_i + m_i\mathbf{r}_i\cdot\nabla\mathbf{u})^2}{2m_i} + U(\mathbf{r}) + \sum_{j=1}^{M}\frac{p_{\eta_j}^2}{2Q_j} + 3NkT\eta_1 + kT\sum_{j=2}^{M}\eta_j + \zeta. \tag{13.5.23}$$
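As a small numerical aside on the box evolution of eqn. (13.5.20), a constant strain-rate tensor gives $\mathbf{h}(t) = \exp[(\nabla\mathbf{u})^{\rm T}t]\,\mathbf{h}(0)$, and a traceless $\nabla\mathbf{u}$ leaves $\det\mathbf{h}$, and hence the volume, unchanged. The Python sketch below evaluates the matrix exponential by a truncated power series; the helper names and the 40-term truncation are assumptions made for illustration only, and the diagonal tensor used in the accompanying check is an arbitrary traceless example rather than a flow taken from the text.

```python
def mat_mul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def trace(a):
    return a[0][0] + a[1][1] + a[2][2]


def det3(a):
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def mat_exp(a, terms=40):
    """Matrix exponential from a truncated power series (adequate for small |a|)."""
    result = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    term = [row[:] for row in result]
    for n in range(1, terms):
        term = [[v / n for v in row] for row in mat_mul(term, a)]   # a^n / n!
        result = [[result[i][j] + term[i][j] for j in range(3)] for i in range(3)]
    return result


def box_at_time(grad_u, h0, t):
    """Solve dh/dt = (grad u)^T h for constant grad u, starting from h(0) = h0."""
    generator = [[t * v for v in row] for row in transpose(grad_u)]
    return mat_mul(mat_exp(generator), h0)
```

For the planar Couette tensor of eqn. (13.5.19) the series terminates after the linear term, and `box_at_time` returns the sheared box of eqn. (13.5.1) with off-diagonal element $\gamma t L$.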

An interesting problem to which nonequilibrium molecular dynamics with fixed boundaries can be applied is the determination of hydrodynamic boundary conditions for shear flow over a stationary corrugated surface (Tuckerman et al., 1997; Mundy et al., 2000). In general, the flow field $\mathbf{v}(\mathbf{r}, t)$ in a given geometry can be computed using the Navier-Stokes equation

$$\frac{\partial\mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v} = \nabla P + \eta\nabla^2\mathbf{v}, \tag{13.5.24}$$

where $P$ is the pressure tensor and $\eta$ is the coefficient of shear viscosity. The most general boundary condition on the velocity profile at the surface (assuming the geometry of Fig. 13.9) is

$$\left.\frac{\partial v_x(\mathbf{r}, t)}{\partial y}\right|_{y=y_{\rm surf}} = \frac{1}{\delta_{\rm surf}}\left.v_x(\mathbf{r}, t)\right|_{y=y_{\rm surf}} \tag{13.5.25}$$

(Tuckerman et al., 1997), where $\delta_{\rm surf}$ and $y_{\rm surf}$ are known as the slip length and hydrodynamic thickness, respectively. When $\delta_{\rm surf} = \infty$, the right side of eqn. (13.5.25) is zero, which corresponds to “slip” boundary conditions, and when $\delta_{\rm surf} = 0$, the right side is infinite, corresponding to the “no-slip” conditions. In general, $\delta_{\rm surf}$ is related to the coefficient of shear viscosity via the relation $\delta_{\rm surf} = \eta/\lambda_{\rm surf}$, where $\lambda_{\rm surf}$ is the friction coefficient of the surface. Thus, in order to use eqn. (13.5.25), one needs to determine both $\lambda_{\rm surf}$ and $y_{\rm surf}$, assuming that $\eta$ is already known (or has been determined by a nonequilibrium simulation with Lees-Edwards boundary conditions or a time-dependent box matrix approach). These quantities can be obtained by relating them to a nonequilibrium average of the force $F_x$ on the fluid due to the surface:

$$\langle F_x\rangle_{\rm nonequil} = -S\lambda_{\rm surf}v_x(y_{\rm surf}) = -S\lambda_{\rm surf}\gamma(y_{\rm surf} - y_0), \tag{13.5.26}$$

where $S$ is the area of surface in contact with the fluid, $\gamma$ is the shear rate, and $y_0$ is the location along the y-axis where the drift velocity due to the external field is 0. By performing two simulations with two different values of $y_0$, one obtains two values of $\langle F_x\rangle_{\rm nonequil}$, which gives two equations in the two unknowns $\lambda_{\rm surf}$ and $y_{\rm surf}$. From these equations, the boundary condition can be determined via eqn. (13.5.25).

Fig. 13.9 Two-dimensional representation of a fluid confined between corrugated plates.

Fig. 13.10 Velocity profile at regularly spaced slabs in the y direction of a soft-sphere fluid confined between corrugated plates (Fig. 13.9) at T = 480 K.

In Fig. 13.10, we illustrate one such simulation by showing that a linear velocity profile can be achieved for a fluid confined between corrugated plates in the absence of moving boundaries. In this nonequilibrium molecular dynamics simulation, the fluid is described by a pair potential of the form $u(r) = \epsilon(\sigma/r)^{12}$, known as a soft-sphere potential. In the present simulations, $\epsilon = 480$ K, $\sigma = 3.405$ Å, and the temperature and density are $T = 480$ K, $\rho = 0.0162$ Å$^{-3}$. The particles interact with the corrugated walls via a potential of the form

$$U_{\rm wf}(x, y) = \epsilon\left\{\left[\frac{\sigma}{y - f^{(l)}(x)}\right]^{12} + \left[\frac{\sigma}{L_y - y - f^{(u)}(x)}\right]^{12}\right\}, \tag{13.5.27}$$

where $L_y$ is the length of the box in the y direction, and $f^{(l)}(x) = f^{(u)}(x) = a\cos(kx)$ characterizes the corrugation of the lower and upper walls, respectively. In Fig. 13.10, a corrugation amplitude of $a = 0.02\sigma$ is used with a corrugation period of 1.0 Å, and the zero of the shear field occurs at $y_0 = -26.6$ Å. The shear rate is $\gamma = 0.05$ ps$^{-1}$. The massive thermostatting scheme of Section 4.11 is used in order to stabilize the linear profile. Because moving boundaries are absent, the quality of the simulation can be monitored by using the conservation law in eqn. (13.5.23).
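The two-simulation procedure based on eqn. (13.5.26) amounts to solving two linear equations for the two unknowns. Writing $F_1$ and $F_2$ for the measured averages at shear-field zeros $y_0^{(1)}$ and $y_0^{(2)}$, subtraction gives $\lambda_{\rm surf} = (F_1 - F_2)/[S\gamma(y_0^{(1)} - y_0^{(2)})]$, after which $y_{\rm surf} = y_0^{(1)} - F_1/(S\lambda_{\rm surf}\gamma)$. A minimal Python sketch follows; the function name and argument order are illustrative, and the numbers used in any check are synthetic rather than simulation results.

```python
def surface_parameters(f1, y0_1, f2, y0_2, area, gamma):
    """Solve eqn. (13.5.26) for the friction coefficient and hydrodynamic thickness.

    f1, f2     : nonequilibrium averages <F_x> from two simulations
    y0_1, y0_2 : zeros of the shear field used in those simulations (must differ)
    area       : surface area S in contact with the fluid
    gamma      : shear rate (same in both simulations)
    Returns (lambda_surf, y_surf).
    """
    if y0_1 == y0_2:
        raise ValueError("the two simulations must use different values of y0")
    # Subtracting the two equations eliminates y_surf
    lam = (f1 - f2) / (area * gamma * (y0_1 - y0_2))
    # Back-substitute into the first equation
    y_surf = y0_1 - f1 / (area * lam * gamma)
    return lam, y_surf
```

In practice the two force averages carry statistical errors, so the values of $y_0$ should be chosen far enough apart that $F_1 - F_2$ is well resolved.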

13.6 Problems

13.1. Consider two models of a velocity autocorrelation function:

$$C_1(t) = \langle v^2\rangle e^{-\gamma t}, \qquad C_2(t) = \langle v^2\rangle e^{-\gamma t}\cos(\alpha t).$$

a. Calculate the diffusion constants $D_1$ and $D_2$ for each model. Which one is larger?

b. Comparing the two diffusion constants, what two physical situations do these two models describe?

13.2. The classical isotropic isothermal-isobaric (NPT) ensemble is particularly useful for determining the bulk viscosity of a substance via Green-Kubo theory.

a. Show that the linear response formula does not change if the initial distribution is chosen to be $f_0(H(\mathbf{r}, \mathbf{p}), V)$, i.e., the isothermal-isobaric distribution function $\exp[-\beta(H(\mathbf{r}, \mathbf{p}) + PV)]/\Delta(N, P, T)$.

b. Next, consider coupling a system to an external compression field described by the equations of motion

$$\dot{r}_{i,\alpha} = \frac{p_{i,\alpha}}{m_i} + \sum_\beta r_{i,\beta}M_{\beta\alpha}, \qquad \dot{p}_{i,\alpha} = F_{i,\alpha} - \sum_\beta p_{i,\beta}M_{\beta\alpha},$$

where $\alpha$ and $\beta$ index the three spatial directions $x$, $y$, and $z$. Show that the equations of motion satisfy the incompressibility condition.

c. Consider the specific choice

$$M_{\alpha\beta} = \frac{1}{3}\gamma\delta_{\alpha\beta},$$

where $\gamma$ is the compression rate. The coefficient of bulk viscosity $\eta_V$ is given by a generalization of Newton’s law of viscosity:

$$\langle V\rangle\eta_V = -\lim_{t\to\infty}\frac{\langle P(t)V(t)\rangle}{\gamma},$$

where $\langle\cdots\rangle$ on the right represents an average over the equilibrium isotropic NPT distribution function and $\langle\cdots\rangle\eta_V$ on the left is the full nonequilibrium average. Using the linear response formula to evaluate $\langle P(t)V(t)\rangle$, derive the appropriate Green-Kubo expression for $\eta_V$.


13.3. Consider a particular realization of eqns. (13.2.1) with the following choice for $\mathbf{C}_i(\mathbf{r}, \mathbf{p})$ and $\mathbf{D}_i(\mathbf{r}, \mathbf{p})$:

$$\mathbf{C}_i(\mathbf{r}, \mathbf{p}) = 0, \qquad \mathbf{D}_i(\mathbf{r}, \mathbf{p}) = \varepsilon_i\hat{\mathbf{e}}_z + \sum_{j\neq i}\mathbf{f}_{ij}\left(\mathbf{r}_{ij}\cdot\hat{\mathbf{e}}_z\right),$$

where $\varepsilon_i$ is the energy of atom $i$, $\hat{\mathbf{e}}_z$ is the unit vector along the z-direction, $\mathbf{f}_{ij} = \mathbf{F}_i - \mathbf{F}_j$ is the force between atoms $i$ and $j$, and $\mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j$ is the relative vector between atoms $i$ and $j$. Write down the driven equations of motion, and derive the dissipative flux $j(x)$ for this choice of $\mathbf{C}_i$ and $\mathbf{D}_i$. Assuming that $F_e(t) = 1$ and that the dissipative flux also corresponds to the observable $a(x)$ in the linear response formula, derive a Green-Kubo relation for the thermal conductivity $\lambda$ given by the nonequilibrium average

$$\lambda = \lim_{t\to\infty}\frac{V}{kT^2}\langle j\rangle_t.$$

13.4. Show that the Einstein relation in eqn. (13.3.36) can be derived from the Green-Kubo relation in eqn. (13.3.35).

13.5. Verify that eqn. (13.5.23) is conserved by eqn. (13.5.22) in the absence of time-dependent boundary conditions.

13.6. In Problem 5.11, we considered a particle moving through a periodic potential. Now suppose we add a linear potential $-fq$ to this system, so that the particle is driven by a constant force $f$. The total potential that results is known as Galton’s staircase $U(q) = V_0\cos(kq) - fq$.

a. Write a program to integrate the driven equations of motion under conditions of constant energy, and use the program to show that the system never reaches a steady state by computing the average $\langle\dot{q}\rangle_t$ in the long time limit.

b. Next try coupling the system to a Nosé-Hoover chain thermostat (see Section 4.11) at temperature $kT = 1$. Does the system now reach a steady state? If so, calculate the average $\langle\dot{q}\rangle_t$ for different values of $f$ and find a regime for which the dependence of the average on $f$ is linear, and estimate the value of the diffusion constant $D$. Take $m = 1$, $k = 1$, and $V_0 = 1$.

∗∗c. Finally, couple the system to the thermostat of Problem 4.1, using $M = 2$. Algorithms for integrating these equations are given by Liu and Tuckerman (2000) and by Ezra (2007). Repeat the analysis of part b, and estimate the diffusion constant for different values of $V_0$.


13.7. Verify that the cell matrix evolution in eqn. (13.5.20) reproduces eqn. (13.5.1) for planar Couette flow starting with $\mathbf{h}(0) = \mathrm{diag}(L, L, L)$.

13.8. Consider a fluid confined between corrugated plates as in Fig. 13.9 but with the fluid particles obeying the equations of motion

$$\dot{\mathbf{r}}_i = \frac{\mathbf{p}_i}{m}, \qquad \dot{\mathbf{p}}_i = \mathbf{F}_i + F_e\hat{\mathbf{e}}_x,$$

where $F_e$ is a constant and $\mathbf{F}_i$ is the force on particle $i$ due to the other fluid particles and the corrugated wall. Describe the velocity profile you would expect to develop in a steady state, and give the mathematical form of this profile.

13.9. Prove that $\langle P_{xy}\rangle = 0$ in the canonical ensemble.

13.10. An $N$-particle system with coordinates $\mathbf{r} \equiv \mathbf{r}_1, ..., \mathbf{r}_N$ and momenta $\mathbf{p} \equiv \mathbf{p}_1, ..., \mathbf{p}_N$ has a Hamiltonian of the form

$$H(\mathbf{r}, \mathbf{p}) = H_0(\mathbf{r}, \mathbf{p}) + \sum_{i=1}^{N}\mathbf{r}_i^{\rm T}(\nabla\mathbf{u})\,\mathbf{p}_i,$$

where $\nabla\mathbf{u}$ is a 3×3 constant matrix obtained from the derivative of the linear flow profile $\mathbf{u}(\mathbf{r})$, and

$$H_0(\mathbf{r}, \mathbf{p}) = \sum_{i=1}^{N}\frac{\mathbf{p}_i^2}{2m_i} + U(\mathbf{r}_1, ..., \mathbf{r}_N).$$

a. Derive the equations of motion from this Hamiltonian.

b. Show that if $\nabla\mathbf{u}$ is the flow-profile gradient matrix for shear flow given by

$$\nabla\mathbf{u} = \begin{pmatrix} 0 & 0 & 0 \\ \gamma & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

where $\gamma$ is the shear rate, then the equation of motion is

$$\dot{\mathbf{p}}_i = \mathbf{F}_i - \gamma(\mathbf{p}_i\cdot\hat{\mathbf{e}}_x)\hat{\mathbf{e}}_y,$$

which is not the posited form in eqn. (13.3.7).

c. How would the equations of motion obtained in part a need to be modified so that eqn. (13.3.7) is obtained when $\nabla\mathbf{u}$ is given by the form in part b?


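13.11. A system of $N$ identical particles with coordinates $\mathbf{r}_1, ..., \mathbf{r}_N \equiv \mathbf{r}$ and momenta $\mathbf{p}_1, ..., \mathbf{p}_N \equiv \mathbf{p}$ interacts via a pair potential

$$U(\mathbf{r}_1, ..., \mathbf{r}_N) = \sum_{i=1}^{N-1}\sum_{j=i+1}^{N}u(|\mathbf{r}_i - \mathbf{r}_j|).$$

Hamilton’s equations of motion are

$$\dot{\mathbf{r}}_i = \frac{\mathbf{p}_i}{m}, \qquad \dot{\mathbf{p}}_i = \mathbf{F}_i,$$

where the force $\mathbf{F}_i = -\partial U/\partial\mathbf{r}_i$, and $m$ is the mass of each particle.

a. The center of mass velocity of the system is

$$\mathbf{V} = \frac{1}{N}\sum_{i=1}^{N}\dot{\mathbf{r}}_i.$$

Show that $\mathbf{V}$ is conserved over a trajectory $(\mathbf{r}(t), \mathbf{p}(t))$ satisfying these equations of motion. In your derivation, you might find it useful to remember Newton’s third law of motion.

b. Suppose $\mathbf{V}$ is a constant vector $\mathbf{a} \neq 0$ over the trajectory. Let $\dot{\mathbf{r}}'_i(t)$ be the velocity of particle $i$ in an equivalent system for which $\mathbf{a} = 0$. Then, the velocity when $\mathbf{a} \neq 0$ is $\dot{\mathbf{r}}_i(t) = \dot{\mathbf{r}}'_i(t) + \mathbf{a}$. Show that the velocity autocorrelation function

$$C_{vv}(t) = \frac{1}{3N}\sum_{i=1}^{N}\langle\dot{\mathbf{r}}_i(0)\cdot\dot{\mathbf{r}}_i(t)\rangle$$

can never decay to 0 even if $\langle\dot{\mathbf{r}}'_i(0)\cdot\dot{\mathbf{r}}'_i(t)\rangle$ decays to 0. Show, in fact, that if $\langle\dot{\mathbf{r}}'_i(0)\cdot\dot{\mathbf{r}}'_i(t)\rangle$ decays to 0, $C_{vv}(t)$ decays to a constant.

c. Show that even if $\langle|\mathbf{r}'_i(0) - \mathbf{r}'_i(t)|^2\rangle$ becomes linear in time as $t \to \infty$, where, again, $\dot{\mathbf{r}}_i(t) = \dot{\mathbf{r}}'_i(t) + \mathbf{a}$, the mean-square displacement curve

$$\Delta R^2(t) = \frac{1}{N}\sum_{i=1}^{N}\langle|\mathbf{r}_i(0) - \mathbf{r}_i(t)|^2\rangle$$

of the system with $\mathbf{a} \neq 0$ becomes a quadratic function of $t$ as $t \to \infty$.

13.12. Recall that the general driven equations of motion for a system of $3N$ generalized coordinates $q_1, ..., q_{3N} \equiv q$, momenta $p_1, ..., p_{3N} \equiv p$, and Hamiltonian $H(q, p) \equiv H(x)$ are

$$\dot{q}_\alpha = \frac{\partial H}{\partial p_\alpha} + C_\alpha(q, p)F_e(t), \qquad \dot{p}_\alpha = -\frac{\partial H}{\partial q_\alpha} + D_\alpha(q, p)F_e(t), \qquad \alpha = 1, ..., 3N.$$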


a. Evaluate $dH/dt$ over a trajectory $x_t$, expressing your answer in terms of the dissipative flux $j(x_t)$ and the driving function $F_e(t)$.

b. Can the system ever reach equilibrium? If no, explain why not. If yes, give the conditions under which it might reach equilibrium.

c. Prove that eqn. (13.3.8) is conserved by eqns. (13.3.6) and (13.3.7).

d. Suppose we were to treat $H'$ in eqn. (13.3.8) as a proper Hamiltonian. Determine the equations of motion produced by $H'$ and explain any differences with the equations of motion in part c.

e. If the equations you derived in part d are used in the linear response theory formula to determine the coefficient of shear viscosity from the nonequilibrium average of the pressure tensor estimator $\langle P_{xy}\rangle_t$, do you obtain the correct Green-Kubo relation? If not, then under what conditions would you obtain the correct Green-Kubo relation?

13.13. Consider a one-dimensional harmonic oscillator with coordinate $x$ and momentum $p$ for which the Hamiltonian is

$$H(x, p) = \frac{p^2}{2m} + \frac{1}{2}m\omega^2x^2.$$

Assuming a canonical distribution of initial conditions, calculate the following time correlation functions:

a. $\langle x(0)x(t)\rangle$
b. $\langle p(0)p(t)\rangle$
c. $\langle x(0)p(t)\rangle$
d. $\langle p(0)x(t)\rangle$
e. $\langle x^2(0)x^2(t)\rangle$
f. $\langle x^2(0)p^2(t)\rangle$

14 Quantum time-dependent statistical mechanics

14.1 Time-dependent systems in quantum mechanics

In this chapter, we will explore how the physical properties of a quantum system ˆ 0 can be probed by applying a small external timedescribed by a Hamiltonian H dependent perturbation. As we showed in Section 10.4, if the energy levels and energy ˆ 0 are known, then using the rules of quantum statistical mechaneigenfunctions of H ics, all of the thermodynamic and equilibrium properties can be computed from eqns. (10.4.4) and (10.4.5). This fact emphasizes the importance of techniques that can ˆ 0 . The essence of the experiprovide information about the eigenvalue structure of H mental technique known as spectroscopy is to employ an external electromagnetic field ˆ 0 ; from the frequencies of to induce transitions between the different eigenstates of H photons needed to induce these transitions, information about different parts of the eigenvalue spectrum can be gleaned for different electromagnetic frequency ranges (infrared, visible, ultraviolet, etc.) As we will see in the discussion to follow, the rules of quantum statistical mechanics help us both to interpret such experiments and to construct approximate computational procedures for calculating quantum spectral and transport properties of a condensed-phase system. An external electromagnetic field is described by its electric and magnetic field components E(r, t) and B(r, t), respectively. The term “electromagnetic” arises from the notion that the electric and magnetic fields can be unified into a single classical field theory, an idea that was demonstrated by James Clerk Maxwell (1831–1879) between the years 1864 and 1873. Maxwell’s theory of electromagnetism is embodied in a set of field equations known as Maxwell’s equations, which describe how the electric and magnetic fields are coupled in such a unified theory. 
In the absence of external sources, i.e., for free fields, Maxwell’s equations specify how the divergence and curl of the electric and magnetic fields are related to the time rates of change of the fields.1 Once the divergence and curl of a vector field are specified, then these, together with a knowledge of the time derivatives of the fields, are sufficient to determine the spatial and time dependence of the vector fields.

1 The “divergence” of a vector field is a measure of the extent to which the field expands or contracts at a point, while the “curl” tells us the degree to which the field circulates around a point; these modes of behavior are about all a vector field can do. The importance of the divergence and curl of a vector field can be understood using a theorem known as Helmholtz’s theorem, which states that if $C(\mathbf{r})$ and $\mathbf{D}(\mathbf{r})$ are, respectively, smooth scalar and vector functions that decay faster than $1/|\mathbf{r}|^2$ as $|\mathbf{r}| \to \infty$, then there exists a unique vector field $\mathbf{F}$ such that $\nabla\cdot\mathbf{F} = C(\mathbf{r})$ and $\nabla\times\mathbf{F} = \mathbf{D}(\mathbf{r})$. Consequently, the divergence and curl of $\mathbf{F}$ are sufficient to determine $\mathbf{F}(\mathbf{r})$, and conversely, $\mathbf{F}$ can be decomposed in terms of its divergence and curl.

The elegance and beauty of Maxwell’s theory lies in the simple connection it establishes between the electric and magnetic fields. For free fields, Maxwell’s equations take the form (in cgs units)

\[
\nabla\cdot\mathbf{E} = 0, \qquad
\nabla\times\mathbf{E} = -\frac{1}{c}\frac{\partial\mathbf{B}}{\partial t}, \qquad
\nabla\cdot\mathbf{B} = 0, \qquad
\nabla\times\mathbf{B} = \frac{1}{c}\frac{\partial\mathbf{E}}{\partial t},
\tag{14.1.1}
\]

where $c$ is the speed of light in vacuum. While the magnetic field is always divergence-free (there are no magnetic monopoles), the electric field has zero divergence only in the absence of sources. The rates of change of the magnetic and electric fields determine, respectively, how the electric and magnetic fields “curl” to form closed loops. Notice that the free-field Maxwell equations constitute a set of eight homogeneous partial differential equations for the six field components, $E_x(\mathbf{r},t)$, $E_y(\mathbf{r},t)$, $E_z(\mathbf{r},t)$, $B_x(\mathbf{r},t)$, $B_y(\mathbf{r},t)$, $B_z(\mathbf{r},t)$. Hence, the set is overdetermined, and there is some redundancy among the field equations. This redundancy arises because of a property exhibited by this and other types of field theories known as gauge invariance or gauge freedom. In order to manifest the gauge freedom and remove the redundancy, it is convenient to work with related fields $\mathbf{A}(\mathbf{r},t)$ and $\phi(\mathbf{r},t)$ known as the vector and scalar potentials, respectively. These are related to the electric and magnetic fields by the transformation

\[
\mathbf{B} = \nabla\times\mathbf{A}, \qquad
\mathbf{E} = -\nabla\phi - \frac{1}{c}\frac{\partial\mathbf{A}}{\partial t}.
\tag{14.1.2}
\]

The relation $\mathbf{B} = \nabla\times\mathbf{A}$ follows from the fact that any divergence-free field can always be expressed as the curl of another field since $\nabla\cdot(\nabla\times\mathbf{A}) = 0$. The relation for $\mathbf{E}$ in eqn. (14.1.2) arises from the fact that if $\mathbf{B} = 0$, then $\nabla\times\mathbf{E} = 0$, so that $\mathbf{E}$ can be expressed as a gradient $\mathbf{E} = -\nabla\phi$. The second term in the equation for $\mathbf{E}$ is included when a nonzero magnetic field is present. Although these relations uniquely define the electric and magnetic fields, they possess some ambiguity in how the vector and scalar potentials $\mathbf{A}(\mathbf{r},t)$ and $\phi(\mathbf{r},t)$ are defined. Specifically, if new potentials $\mathbf{A}'(\mathbf{r},t)$ and $\phi'(\mathbf{r},t)$ are constructed from the original potentials via

\[
\mathbf{A}'(\mathbf{r},t) = \mathbf{A}(\mathbf{r},t) + \nabla\chi(\mathbf{r},t), \qquad
\phi'(\mathbf{r},t) = \phi(\mathbf{r},t) - \frac{1}{c}\frac{\partial}{\partial t}\chi(\mathbf{r},t),
\tag{14.1.3}
\]

where $\chi(\mathbf{r},t)$ is an arbitrary scalar field, then the same electric and magnetic fields will result when $\mathbf{A}'$ and $\phi'$ are substituted into eqns. (14.1.2). The transformations in eqn. (14.1.3) are known as gauge transformations, and the freedom to choose $\chi(\mathbf{r},t)$ arbitrarily is the manifestation of the gauge invariance of the electric and magnetic fields mentioned above. Moreover, if the definitions in eqns. (14.1.2) are substituted into the free-field Maxwell equations, then the vector potential is seen to satisfy the classical wave equation

\[
\nabla^2\mathbf{A}(\mathbf{r},t) - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\mathbf{A}(\mathbf{r},t) = 0,
\tag{14.1.4}
\]

provided the gauge function $\chi(\mathbf{r},t)$ is also chosen to be a solution of Laplace’s equation $\nabla^2\chi(\mathbf{r},t) = 0$. This choice, known as the Coulomb gauge, is tantamount to requiring that $\mathbf{A}(\mathbf{r},t)$ satisfy

\[
\nabla\cdot\mathbf{A}(\mathbf{r},t) = 0.
\tag{14.1.5}
\]

In the Coulomb gauge, the scalar potential satisfies Laplace’s equation $\nabla^2\phi = 0$, which, in free space with no charges present, means that $\phi$ is, at most, a linear function of $\mathbf{r}$ and independent of time. If we require that $\phi \to 0$ as $|\mathbf{r}| \to \infty$, then the only possible choice is $\phi = 0$, which we can take with no loss of generality.2 The appropriate solutions to the wave equation for $\mathbf{A}(\mathbf{r},t)$ for an electromagnetic field in a vacuum describe freely propagating waves of frequency $\omega$, wavelength $\lambda$, and wave vector $\mathbf{k}$ ($|\mathbf{k}| = 2\pi/\lambda$) prescribing the direction of propagation, and they take the form

\[
\mathbf{A}(\mathbf{r},t) = \mathbf{A}_0\cos\left(\mathbf{k}\cdot\mathbf{r} - \omega t + \varphi_0\right).
\tag{14.1.6}
\]

Here, $\omega = c|\mathbf{k}|$, $\varphi_0$ is an arbitrary phase, and $\mathbf{A}_0$ is the amplitude of the wave. Since $\nabla\cdot\mathbf{A} = 0$, it follows that $\mathbf{k}\cdot\mathbf{A} = \mathbf{k}\cdot\mathbf{A}_0 = 0$, and $\mathbf{A}$ is perpendicular to the direction of propagation. From eqns. (14.1.2) with $\phi = 0$, the electric and magnetic fields that result are

\[
\begin{aligned}
\mathbf{E}(\mathbf{r},t) &= -\frac{\omega}{c}\mathbf{A}_0\sin\left(\mathbf{k}\cdot\mathbf{r} - \omega t + \varphi_0\right) \equiv \mathbf{E}_0\sin\left(\mathbf{k}\cdot\mathbf{r} - \omega t + \varphi_0\right) \\
\mathbf{B}(\mathbf{r},t) &= -\mathbf{k}\times\mathbf{A}_0\sin\left(\mathbf{k}\cdot\mathbf{r} - \omega t + \varphi_0\right) \equiv \mathbf{B}_0\sin\left(\mathbf{k}\cdot\mathbf{r} - \omega t + \varphi_0\right).
\end{aligned}
\tag{14.1.7}
\]

Note that, since $\mathbf{k}\times\mathbf{A}_0$ is perpendicular to both $\mathbf{k}$ and $\mathbf{A}_0$, it follows that $\mathbf{E}_0 \perp \mathbf{B}_0 \perp \mathbf{k}$. A snapshot in time of the free electromagnetic wave described by eqns. (14.1.7) is shown in Fig. 14.1.

Fig. 14.1 Sketch of an electromagnetic wave described by eqn. (14.1.7).

2 These choices are particular to the free-field theory. For time-dependent fields in the presence of sources, a more appropriate gauge choice is the Lorentz gauge:

\[
\nabla\cdot\mathbf{A}(\mathbf{r},t) + \frac{1}{c}\frac{\partial}{\partial t}\phi(\mathbf{r},t) = 0.
\]

In general, when an electromagnetic field interacts with matter, the field must be included in the Hamiltonian of the physical system, which complicates the mathematical treatment considerably. Therefore, in the present discussion, we will work in an approximation in which the field can be treated as an external perturbation whose degrees of freedom do not need to be explicitly included in the quantum description of the system, i.e., they do not need to be included in the state vector of the system. In this approximation, the Hamiltonian for a system of $N$ particles with charges $q_1, ..., q_N$ and masses $m_1, ..., m_N$, including the external electromagnetic field, is specified in terms of the vector and scalar potentials and is given by

\[
\hat{H}(t) = \sum_{i=1}^{N}\frac{1}{2m_i}\left[\hat{\mathbf{p}}_i - \frac{q_i}{c}\mathbf{A}(\hat{\mathbf{r}}_i,t)\right]^2 + \sum_{i=1}^{N} q_i\phi(\hat{\mathbf{r}}_i,t) + U(\hat{\mathbf{r}}_1, ..., \hat{\mathbf{r}}_N).
\tag{14.1.8}
\]

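When the square in the kinetic energy is expanded out, the Hamiltonian takes the general form

\[
\hat{H}(t) = \hat{H}_0 + \hat{H}_1(t),
\tag{14.1.9}
\]

where $\hat{H}_0$ is the pure system Hamiltonian in the absence of the field

\[
\hat{H}_0 = \sum_{i=1}^{N}\frac{\hat{\mathbf{p}}_i^2}{2m_i} + U(\hat{\mathbf{r}}_1, ..., \hat{\mathbf{r}}_N),
\tag{14.1.10}
\]

and $\hat{H}_1(t)$ involves the coupling to the field

\[
\hat{H}_1(t) = -\sum_{i=1}^{N}\frac{q_i}{2m_i c}\left[\hat{\mathbf{p}}_i\cdot\mathbf{A}(\hat{\mathbf{r}}_i,t) + \mathbf{A}(\hat{\mathbf{r}}_i,t)\cdot\hat{\mathbf{p}}_i\right]
+ \sum_{i=1}^{N}\frac{q_i^2}{2m_i c^2}\mathbf{A}^2(\hat{\mathbf{r}}_i,t) + \sum_{i=1}^{N} q_i\phi(\hat{\mathbf{r}}_i,t).
\tag{14.1.11}
\]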

For a Hamiltonian of the form given in eqn. (14.1.9), the eigenstates of $\hat{H}_0$, which satisfy

\[
\hat{H}_0|E_k\rangle = E_k|E_k\rangle,
\tag{14.1.12}
\]

are no longer eigenstates of $\hat{H}(t)$, which means that they are not stationary states. Thus, the effect of $\hat{H}_1(t)$ is to induce transitions among the eigenstates of $\hat{H}_0$ as functions of time. If the system absorbs energy from the field, it can be excited from an initial state $|E_i\rangle$ with energy $E_i$ to a final state $|E_f\rangle$ with energy $E_f$ as depicted in Fig. 14.2. When the system returns to its initial state, the energy emitted is detected, providing information about the eigenvalue spectrum of $\hat{H}_0$. This is one facet of the experimental technique known as spectroscopy.


Fig. 14.2 A transition from an initial eigenstate of $\hat{H}_0$ with energy $E_i$ to a final state with energy $E_f$ due to the external field coupling.

The remainder of this chapter will be devoted to analyzing the process in Fig. 14.2 both as an isolated event and in an ensemble of quantum systems. We will examine the behavior of quantum systems subject to time-dependent perturbations of both a general nature and specific to an external electromagnetic field coupling, thereby providing an introduction to quantum time correlation functions and the field of linear spectroscopy. Finally, we will discuss numerical approaches for approximating quantum correlation functions. We begin with a discussion of the time-dependent Schrödinger equation when $\hat{H}_1(t)$ is a weak perturbation.

14.2 Time-dependent perturbation theory in quantum mechanics

14.2.1 The interaction picture

For a system described by a Hamiltonian $\hat{H}(t) = \hat{H}_0 + \hat{H}_1(t)$, the unperturbed Hamiltonian $\hat{H}_0$ is taken to describe a physical system of interest, such as a gas, liquid, solid, or solution. The term $\hat{H}_1(t)$ represents an arbitrary time-dependent perturbation that induces transitions between the eigenstates of $\hat{H}_0$.

The state vector of the system $|\Psi(t)\rangle$ evolves in time from an initial state vector $|\Psi(t_0)\rangle$ according to the time-dependent Schrödinger equation

\[
\hat{H}(t)|\Psi(t)\rangle = \left(\hat{H}_0 + \hat{H}_1(t)\right)|\Psi(t)\rangle = i\hbar\frac{\partial}{\partial t}|\Psi(t)\rangle.
\tag{14.2.1}
\]

Although it might seem that obtaining an appropriate propagator from eqn. (14.2.1) is straightforward, the presence of operators on the left side of the equation, together with the fact that $[\hat{H}_0, \hat{H}_1(t)] \neq 0$, in general, renders this task nontrivial. However, if we view $\hat{H}_1(t)$ as a weak perturbation, then we can develop a perturbative approach to the solution of eqn. (14.2.1). We begin by noting that eqn. (14.2.1) can be cast in a form more amenable to a perturbative treatment by transforming the state vector from $|\Psi(t)\rangle$ to $|\Phi(t)\rangle$ via

\[
|\Psi(t)\rangle = e^{-i\hat{H}_0(t-t_0)/\hbar}|\Phi(t)\rangle.
\tag{14.2.2}
\]

In Section 9.2.6, we introduced the concept of a picture in quantum mechanics and discussed the difference between the Schrödinger and Heisenberg pictures. The state vector $|\Phi(t)\rangle$ in eqn. (14.2.2) represents yet another quantum mechanical picture called the interaction picture. The interaction picture can be considered as “intermediate” between the Schrödinger and Heisenberg pictures. Recall that in the Schrödinger picture, operators are static and the state evolves in time, while the opposite is true in the Heisenberg picture. In the interaction picture, both the state vector and the operators evolve in time. The transformation of an operator $\hat{A}$ from the Schrödinger picture to the interaction picture is given by

\[
\hat{A}_I(t) = e^{i\hat{H}_0(t-t_0)/\hbar}\hat{A}e^{-i\hat{H}_0(t-t_0)/\hbar},
\tag{14.2.3}
\]

which is equivalent to an equation of motion of the form

\[
\frac{d\hat{A}_I(t)}{dt} = \frac{1}{i\hbar}[\hat{A}_I(t), \hat{H}_0].
\tag{14.2.4}
\]
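The equivalence of eqns. (14.2.3) and (14.2.4) can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes a two-level $\hat{H}_0$ that is already diagonal, units with $\hbar = 1$, and an arbitrary illustrative operator `A_S`; all names and parameter values are hypothetical. It compares a finite-difference time derivative of $\hat{A}_I(t)$ with the commutator on the right side of eqn. (14.2.4).

```python
import cmath

HBAR = 1.0                 # assumed units with hbar = 1
ENERGIES = [0.0, 1.5]      # illustrative eigenvalues of H0
# Illustrative Schrodinger-picture operator, written in the eigenbasis of H0.
A_S = [[0.3, 1.0], [1.0, -0.2]]


def interaction_picture(a_s, tau):
    """Eqn. (14.2.3) for diagonal H0: (A_I)_mk = exp(i (E_m - E_k) tau / hbar) A_mk."""
    n = len(a_s)
    return [[cmath.exp(1j * (ENERGIES[m] - ENERGIES[k]) * tau / HBAR) * a_s[m][k]
             for k in range(n)] for m in range(n)]


def commutator_rhs(a_i):
    """Right side of eqn. (14.2.4): [A_I, H0] / (i hbar), with H0 = diag(ENERGIES)."""
    n = len(a_i)
    # For diagonal H0: (A H0)_mk = A_mk E_k and (H0 A)_mk = E_m A_mk.
    return [[a_i[m][k] * (ENERGIES[k] - ENERGIES[m]) / (1j * HBAR)
             for k in range(n)] for m in range(n)]


def max_abs_diff(x, y):
    """Largest element-wise deviation between two square matrices."""
    n = len(x)
    return max(abs(x[m][k] - y[m][k]) for m in range(n) for k in range(n))


tau, dt = 0.7, 1.0e-5      # tau plays the role of t - t0
plus = interaction_picture(A_S, tau + dt)
minus = interaction_picture(A_S, tau - dt)
# Central finite difference for dA_I/dt.
derivative = [[(plus[m][k] - minus[m][k]) / (2.0 * dt) for k in range(2)]
              for m in range(2)]
mismatch = max_abs_diff(derivative, commutator_rhs(interaction_picture(A_S, tau)))
print(f"max |dA_I/dt - [A_I, H0]/(i hbar)| = {mismatch:.2e}")
```

The diagonal elements of `A_S` are unaffected by the transformation, while each off-diagonal element acquires a phase that rotates at the corresponding transition frequency.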

Note the similarity between the transformation in eqn. (14.2.3) and that of eqn. (9.2.58) between the Schrödinger and Heisenberg pictures. Equation (14.2.4) indicates that the time evolution of operators in the interaction picture is determined solely by the unperturbed Hamiltonian $\hat{H}_0$.

Equation (14.2.2) represents a transformation of the state vector between the Schrödinger and interaction pictures. The time-evolution equation for the state $|\Phi(t)\rangle$ can be derived by substituting eqn. (14.2.2) into eqn. (14.2.1), which yields

\[
\begin{aligned}
\left(\hat{H}_0 + \hat{H}_1(t)\right)e^{-i\hat{H}_0(t-t_0)/\hbar}|\Phi(t)\rangle &= i\hbar\frac{\partial}{\partial t}\left[e^{-i\hat{H}_0(t-t_0)/\hbar}|\Phi(t)\rangle\right] \\
\left(\hat{H}_0 + \hat{H}_1(t)\right)e^{-i\hat{H}_0(t-t_0)/\hbar}|\Phi(t)\rangle &= \hat{H}_0 e^{-i\hat{H}_0(t-t_0)/\hbar}|\Phi(t)\rangle + e^{-i\hat{H}_0(t-t_0)/\hbar}\,i\hbar\frac{\partial}{\partial t}|\Phi(t)\rangle \\
\hat{H}_1(t)e^{-i\hat{H}_0(t-t_0)/\hbar}|\Phi(t)\rangle &= e^{-i\hat{H}_0(t-t_0)/\hbar}\,i\hbar\frac{\partial}{\partial t}|\Phi(t)\rangle.
\end{aligned}
\tag{14.2.5}
\]

Multiplying on the left by $e^{i\hat{H}_0(t-t_0)/\hbar}$ yields

\[
e^{i\hat{H}_0(t-t_0)/\hbar}\hat{H}_1(t)e^{-i\hat{H}_0(t-t_0)/\hbar}|\Phi(t)\rangle = i\hbar\frac{\partial}{\partial t}|\Phi(t)\rangle.
\tag{14.2.6}
\]

According to eqn. (14.2.3), the operator $\exp[i\hat{H}_0(t-t_0)/\hbar]\hat{H}_1(t)\exp[-i\hat{H}_0(t-t_0)/\hbar]$ is the interaction-picture representation of the perturbation Hamiltonian, which we will denote as $\hat{H}_I(t)$. The time evolution of the state vector in the interaction picture is, therefore, given by a Schrödinger equation of the form

\[
\hat{H}_I(t)|\Phi(t)\rangle = i\hbar\frac{\partial}{\partial t}|\Phi(t)\rangle.
\tag{14.2.7}
\]

Equation (14.2.7) shows that this time evolution is determined entirely by the interaction-picture representation of the perturbation, $\hat{H}_I(t)$. According to eqn. (14.2.2), the initial condition to eqn. (14.2.7) is $|\Phi(t_0)\rangle = |\Psi(t_0)\rangle$. In the next subsection, we will develop an iterative solution to eqn. (14.2.7), which will reveal the detailed structure of the propagator for time-dependent systems.

14.2.2 Iterative solution for the interaction-picture state vector

The solution to eqn. (14.2.7) can be expressed in terms of a unitary operator $\hat{U}_I(t;t_0)$, which is the interaction-picture propagator. The initial state $|\Phi(t_0)\rangle$ evolves in time according to

\[
|\Phi(t)\rangle = \hat{U}_I(t;t_0)|\Phi(t_0)\rangle = \hat{U}_I(t;t_0)|\Psi(t_0)\rangle.
\tag{14.2.8}
\]

Substitution of eqn. (14.2.8) into eqn. (14.2.7) yields an evolution equation for the propagator $\hat{U}_I(t;t_0)$:

\[
\hat{H}_I(t)\hat{U}_I(t;t_0) = i\hbar\frac{\partial}{\partial t}\hat{U}_I(t;t_0).
\tag{14.2.9}
\]

Equation (14.2.9) has the initial condition $\hat{U}_I(t_0;t_0) = \hat{I}$. In developing a solution to eqn. (14.2.9), we assume that $\hat{H}_I(t)$ is a small perturbation so that the solution can be constructed in terms of a power series in $\hat{H}_I(t)$. Such a solution is generated by rewriting eqn. (14.2.9) as an integral equation:

\[
\begin{aligned}
\hat{U}_I(t;t_0) &= \hat{U}_I(t_0;t_0) - \frac{i}{\hbar}\int_{t_0}^{t}dt'\,\hat{H}_I(t')\hat{U}_I(t';t_0) \\
&= \hat{I} - \frac{i}{\hbar}\int_{t_0}^{t}dt'\,\hat{H}_I(t')\hat{U}_I(t';t_0).
\end{aligned}
\tag{14.2.10}
\]

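We can easily verify that eqn. (14.2.10) solves eqn. (14.2.9) for $\hat{U}_I(t;t_0)$. Taking the time derivative of both sides of eqn. (14.2.10) and multiplying by $i\hbar$ gives

\[
\begin{aligned}
i\hbar\frac{\partial}{\partial t}\hat{U}_I(t;t_0) &= -i\hbar\,\frac{i}{\hbar}\frac{\partial}{\partial t}\int_{t_0}^{t}dt'\,\hat{H}_I(t')\hat{U}_I(t';t_0) \\
&= \hat{H}_I(t)\hat{U}_I(t;t_0).
\end{aligned}
\tag{14.2.11}
\]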

Equation (14.2.10) allows a perturbation series solution to be developed systematically. We start with a zeroth-order solution by setting $\hat{H}_I(t) = 0$ in eqn. (14.2.10), which gives the trivial result

\[
\hat{U}_I^{(0)}(t;t_0) = \hat{I}.
\tag{14.2.12}
\]

This solution is now fed back into the right side of eqn. (14.2.10) to develop a first-order solution:

\[
\begin{aligned}
\hat{U}_I^{(1)}(t;t_0) &= \hat{I} - \frac{i}{\hbar}\int_{t_0}^{t}dt'\,\hat{H}_I(t')\hat{U}_I^{(0)}(t';t_0) \\
&= \hat{I} - \frac{i}{\hbar}\int_{t_0}^{t}dt'\,\hat{H}_I(t').
\end{aligned}
\tag{14.2.13}
\]

This first-order solution is now substituted back into the right side of eqn. (14.2.10) to generate a second-order solution

\[
\begin{aligned}
\hat{U}_I^{(2)}(t;t_0) &= \hat{I} - \frac{i}{\hbar}\int_{t_0}^{t}dt'\,\hat{H}_I(t')\hat{U}_I^{(1)}(t';t_0) \\
&= \hat{I} - \frac{i}{\hbar}\int_{t_0}^{t}dt'\,\hat{H}_I(t') + \left(\frac{i}{\hbar}\right)^2\int_{t_0}^{t}dt'\int_{t_0}^{t'}dt''\,\hat{H}_I(t')\hat{H}_I(t''),
\end{aligned}
\tag{14.2.14}
\]

and so forth. Thus we see that the $k$th-order solution is generated from the $(k-1)$th-order solution according to the recursion formula:

\[
\hat{U}_I^{(k)}(t;t_0) = \hat{I} - \frac{i}{\hbar}\int_{t_0}^{t}dt'\,\hat{H}_I(t')\hat{U}_I^{(k-1)}(t';t_0).
\tag{14.2.15}
\]

Setting $k = 3$ in eqn. (14.2.15), for example, yields the third-order solution

\[
\begin{aligned}
\hat{U}_I^{(3)}(t;t_0) = \hat{I} &- \frac{i}{\hbar}\int_{t_0}^{t}dt'\,\hat{H}_I(t') + \left(\frac{i}{\hbar}\right)^2\int_{t_0}^{t}dt'\int_{t_0}^{t'}dt''\,\hat{H}_I(t')\hat{H}_I(t'') \\
&- \left(\frac{i}{\hbar}\right)^3\int_{t_0}^{t}dt'\int_{t_0}^{t'}dt''\int_{t_0}^{t''}dt'''\,\hat{H}_I(t')\hat{H}_I(t'')\hat{H}_I(t''').
\end{aligned}
\tag{14.2.16}
\]

By taking the limit $k \to \infty$ in eqn. (14.2.15) and summing over all orders, we obtain the exact solution for $\hat{U}_I(t;t_0)$ as a series:

\[
\hat{U}_I(t;t_0) = \sum_{k=0}^{\infty}\left(-\frac{i}{\hbar}\right)^k\int_{t_0}^{t}dt^{(1)}\cdots\int_{t_0}^{t^{(k-1)}}dt^{(k)}\,\hat{H}_I(t^{(1)})\cdots\hat{H}_I(t^{(k)}).
\tag{14.2.17}
\]
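The recursion in eqn. (14.2.15) translates directly into a numerical scheme. The sketch below is illustrative only and rests on assumptions that are not in the text: a two-level $\hat{H}_0 = \mathrm{diag}(0, \Delta)$, a perturbation $\hat{H}_1(t) = \lambda\cos(\omega t)\,\sigma_x$, units with $\hbar = 1$, $t_0 = 0$, and a trapezoid rule on a uniform time grid (all names and parameter values are hypothetical). It iterates the recursion to orders 1, 2, and 4 and compares each result with a reference propagator built as an ordered product of short-time exponentials, with later times placed to the left.

```python
import cmath
import math

HBAR = 1.0            # assumed units with hbar = 1
GAP = 1.0             # E_1 - E_0 of the two-level H0 (illustrative)
LAM = 0.5             # coupling strength (illustrative)
OMEGA = 1.0           # driving frequency (illustrative)
T_FINAL = 2.0         # propagate from t0 = 0 to T_FINAL
N_STEPS = 1000
IDENTITY = [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]


def h_int(t):
    """H_I(t) for H0 = diag(0, GAP) and H1(t) = LAM cos(OMEGA t) sigma_x, t0 = 0."""
    a = LAM * math.cos(OMEGA * t) * cmath.exp(-1j * GAP * t / HBAR)
    return [[0j, a], [a.conjugate(), 0j]]


def matmul(x, y):
    return [[x[r][0] * y[0][c] + x[r][1] * y[1][c] for c in range(2)]
            for r in range(2)]


def dyson(order):
    """Apply the recursion (14.2.15) `order` times on a grid (trapezoid rule)."""
    h = T_FINAL / N_STEPS
    times = [j * h for j in range(N_STEPS + 1)]
    u_prev = [IDENTITY] * len(times)              # zeroth order, eqn. (14.2.12)
    for _ in range(order):
        f = [matmul(h_int(t), u) for t, u in zip(times, u_prev)]
        acc = [[0j, 0j], [0j, 0j]]                # running integral of H_I U^(k-1)
        u_next = [IDENTITY]
        for j in range(1, len(times)):
            acc = [[acc[r][c] + 0.5 * h * (f[j - 1][r][c] + f[j][r][c])
                    for c in range(2)] for r in range(2)]
            u_next.append([[IDENTITY[r][c] - 1j * acc[r][c] / HBAR
                            for c in range(2)] for r in range(2)])
        u_prev = u_next
    return u_prev[-1]


def reference():
    """Ordered product of short-time exponentials, later times to the left."""
    h = T_FINAL / N_STEPS
    u = IDENTITY
    for j in range(N_STEPS):
        hm = h_int((j + 0.5) * h)
        r = abs(hm[0][1])                         # H_I^2 = r^2 I for this model
        c = math.cos(r * h / HBAR)
        s = math.sin(r * h / HBAR) / r if r > 0.0 else 0.0
        # exp(-i h H_I / hbar) = cos(r h / hbar) I - i sin(r h / hbar) H_I / r
        step = [[c + 0j, -1j * s * hm[0][1]], [-1j * s * hm[1][0], c + 0j]]
        u = matmul(step, u)
    return u


u_ref = reference()
errors = {}
for k in (1, 2, 4):
    u_k = dyson(k)
    errors[k] = max(abs(u_k[r][c] - u_ref[r][c]) for r in range(2) for c in range(2))
    print(f"order {k}: max element error = {errors[k]:.2e}")
```

For a weak perturbation the error should shrink as the order increases, which is the sense in which the truncated series approximates the propagator. Note that a truncated series is not exactly unitary, whereas the reference product is.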

The propagator $\hat{U}_I(t;t_0)$, approximated at any order in perturbation theory, can be used to approximate the time evolution of the state vector $|\Phi(t)\rangle$ in the interaction picture. In general, this evolution is

\[
|\Phi(t)\rangle = \hat{U}_I(t;t_0)|\Phi(t_0)\rangle,
\tag{14.2.18}
\]

and from this expression, the time evolution of the original state vector $|\Psi(t)\rangle$ in the Schrödinger picture can be determined as

\[
\begin{aligned}
|\Psi(t)\rangle &= e^{-i\hat{H}_0(t-t_0)/\hbar}|\Phi(t)\rangle = e^{-i\hat{H}_0(t-t_0)/\hbar}\hat{U}_I(t;t_0)|\Phi(t_0)\rangle \\
&= e^{-i\hat{H}_0(t-t_0)/\hbar}\hat{U}_I(t;t_0)|\Psi(t_0)\rangle \\
&\equiv \hat{U}(t;t_0)|\Psi(t_0)\rangle.
\end{aligned}
\tag{14.2.19}
\]

Here, we have used the fact that $|\Phi(t_0)\rangle = |\Psi(t_0)\rangle$. In the last line, the full propagator in the Schrödinger picture is identified as

\[
\hat{U}(t;t_0) = e^{-i\hat{H}_0(t-t_0)/\hbar}\hat{U}_I(t;t_0) = \hat{U}_0(t;t_0)\hat{U}_I(t;t_0).
\tag{14.2.20}
\]


From eqn. (14.2.20), the structure of the full propagator is revealed. Let us use eqn. (14.2.20) to generate the first few orders in the propagator. Substituting eqn. (14.2.12) into eqn. (14.2.20) yields the lowest order contribution to $\hat{U}(t;t_0)$:

\[
\hat{U}^{(0)}(t;t_0) = e^{-i\hat{H}_0(t-t_0)/\hbar} = \hat{U}_0(t;t_0).
\tag{14.2.21}
\]

Thus, at zeroth order, eqn. (14.2.21) implies that the system is propagated using the unperturbed propagator $\hat{U}_0(t;t_0)$ as though the perturbation did not exist. At first order, we obtain

\[
\begin{aligned}
\hat{U}^{(1)}(t;t_0) &= e^{-i\hat{H}_0(t-t_0)/\hbar} - \frac{i}{\hbar}e^{-i\hat{H}_0(t-t_0)/\hbar}\int_{t_0}^{t}dt'\,\hat{H}_I(t') \\
&= e^{-i\hat{H}_0(t-t_0)/\hbar} - \frac{i}{\hbar}e^{-i\hat{H}_0(t-t_0)/\hbar}\int_{t_0}^{t}dt'\,e^{i\hat{H}_0(t'-t_0)/\hbar}\hat{H}_1(t')e^{-i\hat{H}_0(t'-t_0)/\hbar} \\
&= e^{-i\hat{H}_0(t-t_0)/\hbar} - \frac{i}{\hbar}\int_{t_0}^{t}dt'\,e^{-i\hat{H}_0(t-t')/\hbar}\hat{H}_1(t')e^{-i\hat{H}_0(t'-t_0)/\hbar} \\
&= \hat{U}_0(t;t_0) - \frac{i}{\hbar}\int_{t_0}^{t}dt'\,\hat{U}_0(t;t')\hat{H}_1(t')\hat{U}_0(t';t_0),
\end{aligned}
\tag{14.2.22}
\]

where in the second line, the definition

\[
\hat{H}_I(t) = e^{i\hat{H}_0(t-t_0)/\hbar}\hat{H}_1(t)e^{-i\hat{H}_0(t-t_0)/\hbar}
\tag{14.2.23}
\]

has been used. Equation (14.2.22) indicates that at first order, the propagator is composed of two terms. The first term is simply the unperturbed propagation from $t_0$ to $t$. In the second term, the system undergoes unperturbed propagation from $t_0$ to $t'$. At $t'$, the perturbation $\hat{H}_1(t')$ acts, and finally, from $t'$ to $t$, the propagation is unperturbed. In addition, we must integrate over all possible intermediate times $t'$ at which the perturbation is applied. In a similar manner, it can be shown that up to second order, the full propagator is given by

\[
\begin{aligned}
\hat{U}^{(2)}(t;t_0) = \hat{U}_0(t;t_0) &- \frac{i}{\hbar}\int_{t_0}^{t}dt'\,\hat{U}_0(t;t')\hat{H}_1(t')\hat{U}_0(t';t_0) \\
&+ \left(\frac{i}{\hbar}\right)^2\int_{t_0}^{t}dt'\int_{t_0}^{t'}dt''\,\hat{U}_0(t;t')\hat{H}_1(t')\hat{U}_0(t';t'')\hat{H}_1(t'')\hat{U}_0(t'';t_0).
\end{aligned}
\tag{14.2.24}
\]

At second order, the new (last) term involves unperturbed propagation from $t_0$ to $t''$, action of $\hat{H}_1(t'')$ at $t''$, unperturbed propagation from $t''$ to $t'$, action of $\hat{H}_1(t')$ at $t'$, and finally unperturbed propagation from $t'$ to $t$. Again, we must integrate over the intermediate times $t'$ and $t''$ at which the perturbation is applied. A pictorial representation of the full propagator is given in Fig. 14.3. The picture on the left side of the equal sign in Fig. 14.3 indicates that the perturbation causes the system to undergo some undetermined dynamical process between $t_0$ and $t$. On the right side of the equal sign, the process is broken down in terms of the action of the perturbation $\hat{H}_1$ at specific intermediate times (which must be integrated over), indicated in the figure by the dots. At the $k$th order, the perturbation Hamiltonian $\hat{H}_1$ acts on the system at $k$ instances in time. The limits of integration indicate that these time instances are ordered chronologically.

Fig. 14.3 Pictorial representation of the perturbation expansion of the time-dependent propagator.

The specific ordering of the times at which $\hat{H}_1$ acts on the unperturbed system raises an important point. In each term in the expansion for $\hat{U}_I(t;t_0)$, the order in which the operators $\hat{H}_I(t')$, $\hat{H}_I(t'')$,... are multiplied is critical. The reason for this is that the commutator $[\hat{H}_I(t), \hat{H}_I(t')]$ does not vanish if $t \neq t'$. Thus, to remove any ambiguity when specifying the order of the operators $\hat{H}_I(t')$, $\hat{H}_I(t'')$,... in a time series, we introduce the time-ordering operator, $\mathcal{T}$. The action of $\mathcal{T}$ on a product of time-dependent operators $\hat{A}(t_1)\hat{B}(t_2)\hat{C}(t_3)\cdots\hat{D}(t_n)$ reorders the operators in the product chronologically in time from earliest to latest proceeding from right to left in the product. This means that operators earliest in time act before operators at later times. For example, the action of $\mathcal{T}$ on two operators $\hat{A}(t_1)$ and $\hat{B}(t_2)$ is

\[
\mathcal{T}\left[\hat{A}(t_1)\hat{B}(t_2)\right] =
\begin{cases}
\hat{A}(t_1)\hat{B}(t_2) & t_2 < t_1 \\
\hat{B}(t_2)\hat{A}(t_1) & t_1 < t_2.
\end{cases}
\tag{14.2.25}
\]

Let us now apply the time-ordering operator $\mathcal{T}$ to the second-order term. We first write the double integral

\[
I_2(t_0,t) = \int_{t_0}^{t}dt'\int_{t_0}^{t'}dt''\,\hat{H}_I(t')\hat{H}_I(t'')
\tag{14.2.26}
\]

as a sum of two terms generated by interchanging the dummy variables $t'$ and $t''$:

\[
I_2(t_0,t) = \frac{1}{2}\left[\int_{t_0}^{t}dt'\int_{t_0}^{t'}dt''\,\hat{H}_I(t')\hat{H}_I(t'') + \int_{t_0}^{t}dt''\int_{t_0}^{t''}dt'\,\hat{H}_I(t'')\hat{H}_I(t')\right].
\tag{14.2.27}
\]


Fig. 14.4 The two integration regions in eqn. (14.2.27): (a) The region t′′ ∈ [t0 , t], t′ ∈ [t0 , t′′ ]. (b) The region t′ ∈ [t0 , t], t′′ ∈ [t′ , t].

Figure 14.4(a) illustrates the integration region $t'' \in [t_0, t]$, $t' \in [t_0, t'']$ in the $t'$-$t''$ plane, which is covered in the second term on the right side of eqn. (14.2.27). The same region can be covered by choosing $t' \in [t_0, t]$ and $t'' \in [t', t]$, as illustrated in Fig. 14.4(b). With this choice, eqn. (14.2.27) becomes

\[
I_2(t_0,t) = \frac{1}{2}\left[\int_{t_0}^{t}dt'\int_{t_0}^{t'}dt''\,\hat{H}_I(t')\hat{H}_I(t'') + \int_{t_0}^{t}dt'\int_{t'}^{t}dt''\,\hat{H}_I(t'')\hat{H}_I(t')\right].
\tag{14.2.28}
\]

In the first term on the right side of eqn. (14.2.28), $t'' < t'$ and $\hat{H}_I(t'')$ acts first, followed by $\hat{H}_I(t')$. In the second term, $t' < t''$ and $\hat{H}_I(t')$ acts first, followed by $\hat{H}_I(t'')$. The two terms can thus be combined with both $t'$ and $t''$ lying in the interval $[t_0, t]$ by introducing the time-ordering operator:

\[
\int_{t_0}^{t}dt'\int_{t_0}^{t'}dt''\,\hat{H}_I(t')\hat{H}_I(t'') = \frac{1}{2}\int_{t_0}^{t}dt'\int_{t_0}^{t}dt''\,\mathcal{T}\left[\hat{H}_I(t')\hat{H}_I(t'')\right].
\tag{14.2.29}
\]
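The identity in eqn. (14.2.29) can be illustrated with a small numerical experiment. The sketch below is a hedged example under assumptions that are not in the text: a two-level $\hat{H}_0 = \mathrm{diag}(0, \Delta)$, a constant perturbation $\hat{H}_1 = \lambda\sigma_x$, $\hbar = 1$, $t_0 = 0$, and midpoint cells on a uniform grid, with the cells on the diagonal $t'' = t'$ counted with half weight in the nested integral. All names and parameter values are hypothetical. It also evaluates the square-region integral without time ordering, to show that the ordering matters when the operators at different times do not commute.

```python
import cmath

HBAR = 1.0      # assumed units with hbar = 1
GAP = 1.0       # level spacing of the two-level H0 (illustrative)
LAM = 0.5       # strength of a constant H1 = LAM * sigma_x (illustrative)
T_FINAL = 2.0   # integrate over [t0, t] = [0, T_FINAL]
N = 200         # number of midpoint cells per time axis


def h_int(t):
    """H_I(t) = exp(i H0 t / hbar) H1 exp(-i H0 t / hbar) for H0 = diag(0, GAP)."""
    a = LAM * cmath.exp(-1j * GAP * t / HBAR)
    return [[0j, a], [a.conjugate(), 0j]]


def matmul(x, y):
    return [[x[r][0] * y[0][c] + x[r][1] * y[1][c] for c in range(2)]
            for r in range(2)]


def add_scaled(acc, x, w):
    """Return acc + w * x for 2x2 matrices."""
    return [[acc[r][c] + w * x[r][c] for c in range(2)] for r in range(2)]


def max_abs_diff(x, y):
    return max(abs(x[r][c] - y[r][c]) for r in range(2) for c in range(2))


h = T_FINAL / N
ops = [h_int((j + 0.5) * h) for j in range(N)]
zero = [[0j, 0j], [0j, 0j]]
nested, ordered, unordered = zero, zero, zero
for j in range(N):              # index of t'
    for l in range(N):          # index of t''
        prod = matmul(ops[j], ops[l])              # H_I(t') H_I(t'')
        # Left side of (14.2.29): only the region t'' < t' contributes.
        if l < j:
            nested = add_scaled(nested, prod, h * h)
        elif l == j:
            nested = add_scaled(nested, prod, 0.5 * h * h)
        # Right side: full square, later time placed to the left.
        t_prod = prod if l <= j else matmul(ops[l], ops[j])
        ordered = add_scaled(ordered, t_prod, 0.5 * h * h)
        # Same square integral, but without time ordering.
        unordered = add_scaled(unordered, prod, 0.5 * h * h)

ordered_gap = max_abs_diff(nested, ordered)
unordered_gap = max_abs_diff(nested, unordered)
print(f"with time ordering:    {ordered_gap:.2e}")
print(f"without time ordering: {unordered_gap:.2e}")
```

With the time-ordering operator the two sides agree to rounding error, while dropping it leaves a discrepancy proportional to the integrated commutator of $\hat{H}_I$ at different times.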

The same analysis can be applied to each order in eqn. (14.2.17), recognizing that a product of $k$ operators can be ordered in $k!$ ways by the time-ordering operator. Equation (14.2.17) can then be rewritten in terms of a time-ordered product

\[
\hat{U}_I(t;t_0) = \sum_{k=0}^{\infty}\left(-\frac{i}{\hbar}\right)^k\frac{1}{k!}\int_{t_0}^{t}dt_1\int_{t_0}^{t}dt_2\cdots\int_{t_0}^{t}dt_k\,\mathcal{T}\left[\hat{H}_I(t_1)\hat{H}_I(t_2)\cdots\hat{H}_I(t_k)\right].
\tag{14.2.30}
\]

The sum in eqn. (14.2.30) resembles the power-series expansion of an exponential, and consequently we can write the sum symbolically as

\[
\hat{U}_I(t;t_0) = \mathcal{T}\left[\exp\left(-\frac{i}{\hbar}\int_{t_0}^{t}dt'\,\hat{H}_I(t')\right)\right],
\tag{14.2.31}
\]

which is known as a time-ordered exponential. Since eqn. (14.2.31) is just a shorthand for eqn. (14.2.30), it is understood that the time-ordering operator orders the operators in each term of an expansion of the exponential.

Given the formalism of time-dependent perturbation theory, we now seek to answer the following question: If the system is initially in an eigenstate of $\hat{H}_0$ with energy $E_i$, what is the probability as a function of time $t$ that the system will undergo a transition to a new eigenstate of $\hat{H}_0$ with energy $E_f$? To answer this question, we first set the initial state vector $|\Psi(t_0)\rangle$ equal to the eigenvector $|E_i\rangle$ of $\hat{H}_0$. Then, the amplitude as a function of time that the system will undergo a transition to the eigenstate $|E_f\rangle$ is obtained by propagating this initial state to time $t$ with the propagator $\hat{U}(t;t_0)$ and then taking the overlap of the resulting state with the eigenstate $|E_f\rangle$:

\[
A_{fi}(t) = \langle E_f|\hat{U}(t;t_0)|E_i\rangle.
\tag{14.2.32}
\]

The probability is just the square modulus of this complex amplitude:

\[
P_{fi}(t) = \left|\langle E_f|\hat{U}(t;t_0)|E_i\rangle\right|^2.
\tag{14.2.33}
\]

Consider first the amplitude at zeroth order in perturbation theory. At this order, $\hat{U}(t;t_0) = \hat{U}_0(t;t_0)$, and the amplitude is simply

\[
A_{fi}^{(0)}(t) = \langle E_f|e^{-i\hat{H}_0(t-t_0)/\hbar}|E_i\rangle = e^{-iE_i(t-t_0)/\hbar}\langle E_f|E_i\rangle,
\tag{14.2.34}
\]

which clearly vanishes by orthogonality if $E_i \neq E_f$. Thus, at zeroth order, the only possibility is a trivial one in which no transition occurs. The lowest nontrivial ($E_i \neq E_f$) result occurs at first order, where the transition amplitude is given by

\[
\begin{aligned}
A_{fi}^{(1)}(t) &= \langle E_f|\hat{U}^{(1)}(t;t_0)|E_i\rangle \\
&= -\frac{i}{\hbar}\int_{t_0}^{t}dt'\,\langle E_f|\hat{U}_0(t;t')\hat{H}_1(t')\hat{U}_0(t';t_0)|E_i\rangle \\
&= -\frac{i}{\hbar}\int_{t_0}^{t}dt'\,\langle E_f|e^{-i\hat{H}_0(t-t')/\hbar}\hat{H}_1(t')e^{-i\hat{H}_0(t'-t_0)/\hbar}|E_i\rangle \\
&= -\frac{i}{\hbar}\int_{t_0}^{t}dt'\,e^{-iE_f(t-t')/\hbar}e^{-iE_i(t'-t_0)/\hbar}\langle E_f|\hat{H}_1(t')|E_i\rangle \\
&= -\frac{i}{\hbar}e^{-iE_f t/\hbar}e^{iE_i t_0/\hbar}\int_{t_0}^{t}dt'\,e^{i(E_f-E_i)t'/\hbar}\langle E_f|\hat{H}_1(t')|E_i\rangle.
\end{aligned}
\tag{14.2.35}
\]

We now define a transition frequency $\omega_{fi}$ as $\omega_{fi} = (E_f - E_i)/\hbar$. Taking the absolute square of the last line of eqn. (14.2.35), we obtain the probability at first order as

\[
P_{fi}^{(1)}(t) = \frac{1}{\hbar^2}\left|\int_{t_0}^{t}dt'\,e^{i\omega_{fi}t'}\langle E_f|\hat{H}_1(t')|E_i\rangle\right|^2.
\tag{14.2.36}
\]
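As a concrete illustration of eqn. (14.2.36), consider a two-level system with a constant coupling switched on at $t_0 = 0$. This example is not taken from the text: the model Hamiltonian $\hat{H} = \begin{pmatrix} 0 & \lambda \\ \lambda & \Delta \end{pmatrix}$, the units $\hbar = 1$, and all names and parameter values below are assumptions. For this model the time integral can be done in closed form, and the exact transition probability is given by the standard two-level (Rabi) formula, so the quality of the first-order approximation can be checked directly for small $\lambda$.

```python
import cmath
import math

HBAR = 1.0      # assumed units with hbar = 1
GAP = 1.0       # E_f - E_i (illustrative)
LAM = 0.05      # constant matrix element <E_f|H1|E_i>, switched on at t0 = 0
T_END = 3.0
N_STEPS = 2000


def first_order_probability(t, n=N_STEPS):
    """Eqn. (14.2.36) with the time integral evaluated by the trapezoid rule."""
    omega_fi = GAP / HBAR
    h = t / n
    total = 0j
    for j in range(n + 1):
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * cmath.exp(1j * omega_fi * j * h) * LAM
    total *= h
    return abs(total) ** 2 / HBAR ** 2


def first_order_closed_form(t):
    """Same quantity analytically: 4 LAM^2 sin^2(omega_fi t / 2) / (hbar omega_fi)^2."""
    omega_fi = GAP / HBAR
    return 4.0 * LAM ** 2 * math.sin(0.5 * omega_fi * t) ** 2 / (HBAR * omega_fi) ** 2


def exact_probability(t):
    """Exact two-level (Rabi) result for H = [[0, LAM], [LAM, GAP]]."""
    rabi = math.sqrt(LAM ** 2 + 0.25 * GAP ** 2)
    return (LAM / rabi) ** 2 * math.sin(rabi * t / HBAR) ** 2


p_numeric = first_order_probability(T_END)
p_closed = first_order_closed_form(T_END)
p_exact = exact_probability(T_END)
print(f"first order (quadrature):  {p_numeric:.6e}")
print(f"first order (closed form): {p_closed:.6e}")
print(f"exact two-level result:    {p_exact:.6e}")
```

For the weak coupling chosen here, the first-order probability differs from the exact one by roughly one percent; increasing `LAM` makes the discrepancy grow, as expected for a perturbative result.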

At first order, the probability depends on the matrix element of the perturbation between the initial and final eigenstates.

Thus far, the formalism we have derived is valid for any perturbation $\hat{H}_1(t)$. The specific choice of this perturbation determines the manifold of eigenstates of $\hat{H}_0$ it probes, as we will demonstrate in the next subsection.

14.2.3 Fermi’s Golden Rule

In Section 14.1, we formulated the Hamiltonian of a material system coupled to an external electromagnetic field, and we derived expressions for the electromagnetic field in the absence of sources or physical boundaries as solutions of the free-field wave equation. For the remainder of this chapter, we will focus on weak fields, so that the term in eqn. (14.1.11) proportional to $\mathbf{A}^2$ can be neglected. We will also focus on a class of experiments for which the wavelength of the electromagnetic radiation is long compared to the size of the sample under investigation. Under this condition, the spatial dependence of the electromagnetic field can also be neglected, since $\cos(\mathbf{k}\cdot\mathbf{r} - \omega t + \varphi_0) = \mathrm{Re}[\exp(i\mathbf{k}\cdot\mathbf{r} - i\omega t + i\varphi_0)]$, and $\exp(i\mathbf{k}\cdot\mathbf{r}) \approx 1$ in the long-wavelength limit where $|\mathbf{k}| = 2\pi/\lambda \approx 0$. Thus, $\hat{H}_1(t)$ reduces to the specific form

\[
\hat{H}_1(t) = -2\hat{\mathcal{V}}F(\omega)\cos(\omega t) = -\hat{\mathcal{V}}F(\omega)\left[e^{i\omega t} + e^{-i\omega t}\right],
\tag{14.2.37}
\]

where $\hat{\mathcal{V}}$ is a Hermitian operator.

As noted above, we are interested in the probability that the system will undergo a transition from an eigenstate $|E_i\rangle$ to $|E_f\rangle$ of $\hat{H}_0$ under the perturbation of eqn. (14.2.37). However, since the perturbation is periodic in time, the problem must be stated a little differently. If we apply the perturbation over a long time, what is the mean probability per unit time of a transition, i.e., the mean transition rate? In order to simplify the calculation of this rate, let us consider a time interval $T$ and choose $t_0 = -T/2$ and $t = T/2$. The mean transition rate can be expressed as

\[
R_{fi}^{(1)}(T) = \frac{P_{fi}^{(1)}(T)}{T} = \frac{|F(\omega)|^2}{T\hbar^2}\left|\int_{-T/2}^{T/2}dt\,\left[e^{i(\omega_{fi}+\omega)t} + e^{i(\omega_{fi}-\omega)t}\right]\right|^2\left|\langle E_f|\hat{\mathcal{V}}|E_i\rangle\right|^2.
\tag{14.2.38}
\]

For finite $T$, the time integral can be carried out explicitly, yielding

\[
\int_{-T/2}^{T/2}dt\,\left[e^{i(\omega_{fi}+\omega)t} + e^{i(\omega_{fi}-\omega)t}\right] = T\left[\frac{\sin[(\omega_{fi}+\omega)T/2]}{(\omega_{fi}+\omega)T/2} + \frac{\sin[(\omega_{fi}-\omega)T/2]}{(\omega_{fi}-\omega)T/2}\right].
\tag{14.2.39}
\]

Assuming $\omega > 0$, in the limit of large $T$, the second term on the right in eqn. (14.2.39) dominates over the first and peaks sharply at $\omega = \omega_{fi}$. Thus, we can retain only this term and write the mean transition rate as

\[
R_{fi}^{(1)}(T) = \frac{1}{\hbar^2}\,T\,|F(\omega)|^2\left|\langle E_f|\hat{\mathcal{V}}|E_i\rangle\right|^2\frac{\sin^2[(\omega_{fi}-\omega)T/2]}{[(\omega_{fi}-\omega)T/2]^2}.
\tag{14.2.40}
\]

Regarding eqn. (14.2.40) as a function of $\omega$, at large $T$, this function becomes highly peaked when $\omega = \omega_{fi}$ but drops to zero rapidly away from $\omega = \omega_{fi}$. The condition $\omega_{fi} = \omega$ is equivalent to the condition $E_f = E_i + \hbar\omega$, which is a statement of energy conservation. Since $\hbar\omega$ is the energy quantum of the electromagnetic field, also known as a photon, the transition can only occur if the field frequency $\omega$ is exactly “tuned” for the transition from $E_i$ to $E_f$. Hence, a monochromatic field of frequency $\omega$ can be used as a probe of the allowed transitions and hence the eigenvalue structure of $\hat{H}_0$.

We now consider the $T \to \infty$ limit more carefully. Denoting the rate in this limit simply as $R_{fi}$, the integral in eqn. (14.2.39) in this limit becomes

\[
\begin{aligned}
\lim_{T\to\infty}\int_{-T/2}^{T/2}dt\,\left[e^{i(\omega_{fi}+\omega)t} + e^{i(\omega_{fi}-\omega)t}\right] &= \int_{-\infty}^{\infty}dt\,\left[e^{i(\omega_{fi}+\omega)t} + e^{i(\omega_{fi}-\omega)t}\right] \\
&= 2\pi\left[\delta(\omega_{fi}+\omega) + \delta(\omega_{fi}-\omega)\right].
\end{aligned}
\tag{14.2.41}
\]

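Again, for $\omega > 0$ and $\omega_{fi} > 0$, only the second δ-function is ever nonzero, so we can drop the first δ-function. Note that the second δ-function term in eqn. (14.2.41), $2\pi\delta(\omega_{fi}-\omega)$, can also be written as $2\pi\hbar\,\delta(E_f - E_i - \hbar\omega)$. Therefore, the expression for the mean rate in this limit can be written as

\[
\begin{aligned}
R_{fi}(\omega) &= \lim_{T\to\infty}\frac{P_{fi}^{(1)}(T)}{T} \\
&= \lim_{T\to\infty}\frac{1}{T\hbar^2}\left|\int_{-T/2}^{T/2}dt\,e^{i(\omega_{fi}-\omega)t}\right|^2|F(\omega)|^2\left|\langle E_f|\hat{\mathcal{V}}|E_i\rangle\right|^2 \\
&= \lim_{T\to\infty}\frac{1}{T\hbar^2}\left[\int_{-T/2}^{T/2}dt\,e^{i(\omega_{fi}-\omega)t}\right]\times\left[\int_{-T/2}^{T/2}dt\,e^{i(\omega_{fi}-\omega)t}\right]|F(\omega)|^2\left|\langle E_f|\hat{\mathcal{V}}|E_i\rangle\right|^2,
\end{aligned}
\tag{14.2.42}
\]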

where we have dropped the “(1)” superscript (it is understood that the result is derived from first-order perturbation theory) and indicated explicitly the dependence of $R_{fi}$ on the frequency $\omega$. When one of the two integrals in the third line of eqn. (14.2.42) is replaced by the δ-function, the remaining integral becomes simply $T$ since $\omega_{fi} = \omega$, and this $T$ cancels the $T$ in the denominator. For this reason, the division by $T$ in eqn. (14.2.38) is equivalent to expressing the rate as a proper derivative, $\lim_{T\to\infty}P_{fi}^{(1)}(T)/T = dP_{fi}^{(1)}/dt$. The expression for the rate now becomes

\[
R_{fi}(\omega) = \frac{2\pi}{\hbar}|F(\omega)|^2\left|\langle E_f|\hat{\mathcal{V}}|E_i\rangle\right|^2\delta(E_f - E_i - \hbar\omega),
\tag{14.2.43}
\]

which is known as Fermi’s Golden Rule. The rule states that, in first-order perturbation theory, the transition rate depends only on the square of the matrix element of the operator $\hat{\mathcal{V}}$ between initial and final states and explicitly requires energy conservation via the δ-function. Fermi’s Golden Rule predicts the rate of transitions from a specific initial state $|E_i\rangle$ to a final state $|E_f\rangle$, both of which are eigenstates of $\hat{H}_0$ and which are connected via the energy conservation condition $E_f = E_i + \hbar\omega$.

14.3 Time correlation functions and frequency spectra

In this section, the Fermi Golden Rule expression will be used to analyze the output of an experiment in which a monochromatic field is applied to an ensemble of systems. If we wish to calculate the transition rate for the ensemble, we must remember that the systems in the ensemble are not in a single initial state $|E_i\rangle$. Rather, there is a distribution of initial states prescribed by the equilibrium density matrix $\rho(\hat{H}_0)$, which satisfies the equilibrium Liouville equation $[\hat{H}_0, \rho(\hat{H}_0)] = 0$. Thus, in the canonical ensemble, the probability that a given ensemble member is in an eigenstate of $\hat{H}_0$ with energy $E_i$ is the density matrix eigenvalue

\[
w_i = \frac{e^{-\beta E_i}}{Q(N,V,T)} = \frac{e^{-\beta E_i}}{\mathrm{Tr}\left[e^{-\beta\hat{H}_0}\right]}.
\tag{14.3.1}
\]

The rate we seek is the ensemble average of $R_{fi}(\omega)$ over initial states, denoted $R(\omega)$, which is given by

\[
R(\omega) = \langle R_{fi}(\omega)\rangle = \sum_{i,f}R_{fi}(\omega)w_i.
\tag{14.3.2}
\]

Although both initial and final states are summed in eqn. (14.3.2), we know that the sum over final states is not independent, since the only permissible final states are those connected to initial states by energy conservation. Equation (14.3.2) indicates that the contribution from each possible initial state $|E_i\rangle$ to the average rate is weighted by the probability $w_i$ that a given member of the ensemble is initially in that state. Ultimately, we sum over those final states that can be reached from the initial state without violating energy conservation to obtain the average transition rate. When we substitute eqn. (14.2.43) for $R_{fi}(\omega)$ into eqn. (14.3.2), the average rate becomes

\[
R(\omega) = \frac{2\pi}{\hbar}|F(\omega)|^2\sum_{i,f}w_i\left|\langle E_f|\hat{\mathcal{V}}|E_i\rangle\right|^2\delta(E_f - E_i - \hbar\omega).
\tag{14.3.3}
\]


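Writing the δ-function as an integral, eqn. (14.3.3) becomes

\[
\begin{aligned}
R(\omega) &= \frac{1}{\hbar^2}|F(\omega)|^2\int_{-\infty}^{\infty}dt\sum_{i,f}w_i\,e^{i(E_f-E_i-\hbar\omega)t/\hbar}\left|\langle E_f|\hat{\mathcal{V}}|E_i\rangle\right|^2 \\
&= \frac{1}{\hbar^2}|F(\omega)|^2\int_{-\infty}^{\infty}dt\,e^{-i\omega t}\sum_{i,f}w_i\langle E_i|\hat{\mathcal{V}}|E_f\rangle\langle E_f|\hat{\mathcal{V}}|E_i\rangle e^{iE_f t/\hbar}e^{-iE_i t/\hbar} \\
&= \frac{1}{\hbar^2}|F(\omega)|^2\int_{-\infty}^{\infty}dt\,e^{-i\omega t}\sum_{i,f}w_i\langle E_i|\hat{\mathcal{V}}|E_f\rangle\langle E_f|e^{i\hat{H}_0 t/\hbar}\hat{\mathcal{V}}e^{-i\hat{H}_0 t/\hbar}|E_i\rangle.
\end{aligned}
\tag{14.3.4}
\]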

In the last line, we have used the fact that $|E_i\rangle$ and $|E_f\rangle$ are eigenstates of $\hat{H}_0$ to bring the two exponential factors into the angle brackets as the unperturbed propagator $\exp(-i\hat{H}_0 t/\hbar)$ and its conjugate $\exp(i\hat{H}_0 t/\hbar)$. Note, however, that the operator $\exp(i\hat{H}_0 t/\hbar)\hat{\mathcal{V}}\exp(-i\hat{H}_0 t/\hbar) = \hat{\mathcal{V}}(t)$ is just the representation of the operator $\hat{\mathcal{V}}$ in the interaction picture (see eqn. (14.2.3)). Thus, the average transition rate can be expressed as

\[
R(\omega) = \frac{1}{\hbar^2}|F(\omega)|^2\int_{-\infty}^{\infty}dt\,e^{-i\omega t}\sum_{i,f}w_i\langle E_i|\hat{\mathcal{V}}(0)|E_f\rangle\langle E_f|\hat{\mathcal{V}}(t)|E_i\rangle,
\tag{14.3.5}
\]

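where $\mathcal{V}(0)$ is the operator in the interaction picture at $t = 0$. Thus, both operators in eqn. (14.3.5) are represented within the same quantum-mechanical picture. Note that the sum over final states can now be performed using the completeness relation

\[
\sum_f |E_f\rangle\langle E_f| = \hat{I} \tag{14.3.6}
\]

of the eigenstates of $\hat{H}_0$. Equation (14.3.5) thus becomes

\[
\begin{aligned}
R(\omega) &= \frac{1}{\hbar^2}|F(\omega)|^2 \int_{-\infty}^{\infty} dt\, e^{-i\omega t} \sum_i w_i\, \langle E_i|\mathcal{V}(0)\mathcal{V}(t)|E_i\rangle \\
&= \frac{1}{\hbar^2}|F(\omega)|^2 \int_{-\infty}^{\infty} dt\, e^{-i\omega t}\, \frac{1}{Q(N,V,T)} \mathrm{Tr}\left[e^{-\beta\hat{H}_0}\, \mathcal{V}(0)\mathcal{V}(t)\right] \\
&= \frac{1}{\hbar^2}|F(\omega)|^2 \int_{-\infty}^{\infty} dt\, e^{-i\omega t}\, \langle \mathcal{V}(0)\mathcal{V}(t)\rangle.
\end{aligned}
\tag{14.3.7}
\]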

The last line shows that the ensemble-averaged transition rate at frequency $\omega$ is just the Fourier transform of the quantum time correlation function $\langle \mathcal{V}(0)\mathcal{V}(t)\rangle$ (Berne, 1971). In general, a quantum time correlation function of two operators $\hat{A}$ and $\hat{B}$ with respect to an unperturbed Hamiltonian $\hat{H}_0$ is given by

\[
C_{AB}(t) = \frac{\mathrm{Tr}\left[\hat{A}(0)\hat{B}(t)\, e^{-\beta\hat{H}_0}\right]}{\mathrm{Tr}\left[e^{-\beta\hat{H}_0}\right]}. \tag{14.3.8}
\]

Although quantum time correlation functions possess many of the same properties as their classical counterparts, we point out one crucial difference at this juncture. The operators $\mathcal{V}(0)$ and $\mathcal{V}(t)$ are individually Hermitian, but since $[\mathcal{V}(0), \mathcal{V}(t)] \neq 0$, the autocorrelation function in eqn. (14.3.7) is an expectation value of a non-Hermitian operator product $\mathcal{V}(0)\mathcal{V}(t)$. Such a non-Hermitian expectation value suggests that something fundamental is missing from the above analysis. A little reflection reveals that the problem lies with our choice of $\omega > 0$ in eqn. (14.2.39). A complete analysis requires that we examine $\omega < 0$ as well, in which case only the first term on the right side of eqn. (14.2.40) is retained. This is tantamount to substituting $-\omega$ for $\omega$ in eqn. (14.3.3), which yields

\[
R(-\omega) = \frac{2\pi}{\hbar}|F(\omega)|^2 \sum_{i,f} w_i\, |\langle E_f|\mathcal{V}|E_i\rangle|^2\, \delta(E_f - E_i + \hbar\omega). \tag{14.3.9}
\]

Unlike eqn. (14.3.3), which refers to an absorption process with $E_f = E_i + \hbar\omega$, eqn. (14.3.9) describes a process for which $E_f = E_i - \hbar\omega$, or $E_f < E_i$, which is an emission process. The system starts in a state with energy $E_i$ and releases an amount of energy $\hbar\omega$ as it decays to a state with lower energy $E_f$. We will now show that eqn. (14.3.9) can be expressed in terms of the correlation function $\langle\mathcal{V}(t)\mathcal{V}(0)\rangle$, which when added to eqn. (14.3.7) leads to the Hermitian combination $\mathcal{V}(t)\mathcal{V}(0) + \mathcal{V}(0)\mathcal{V}(t)$. We begin by interchanging the summation indices $i$ and $f$ in eqn. (14.3.9), which gives

\[
R(-\omega) = \frac{2\pi}{\hbar}|F(\omega)|^2 \sum_{i,f} w_f\, |\langle E_i|\mathcal{V}|E_f\rangle|^2\, \delta(E_i - E_f + \hbar\omega), \tag{14.3.10}
\]

where

\[
w_f = \frac{e^{-\beta E_f}}{Q(N,V,T)} = \frac{e^{-\beta E_f}}{\mathrm{Tr}\left[e^{-\beta\hat{H}_0}\right]}. \tag{14.3.11}
\]

The interchange of summation indices causes the $\delta$-function condition in eqn. (14.3.10) to revert to that contained in eqn. (14.3.3), namely, $E_f = E_i + \hbar\omega$. Substituting this condition into the expression for $w_f$ gives

\[
w_f = \frac{e^{-\beta(E_i + \hbar\omega)}}{Q(N,V,T)} = w_i\, e^{-\beta\hbar\omega}. \tag{14.3.12}
\]

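Since $\delta(x) = \delta(-x)$, eqn. (14.3.10) can be expressed as

\[
R(-\omega) = \frac{2\pi}{\hbar}|F(\omega)|^2\, e^{-\beta\hbar\omega} \sum_{i,f} w_i\, |\langle E_i|\mathcal{V}|E_f\rangle|^2\, \delta(E_f - E_i - \hbar\omega). \tag{14.3.13}
\]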

Comparing eqn. (14.3.13) with eqn. (14.3.3), and noting that $|\langle E_i|\mathcal{V}|E_f\rangle| = |\langle E_f|\mathcal{V}|E_i\rangle|$ for a Hermitian operator $\mathcal{V}$, reveals that

\[
R(-\omega) = e^{-\beta\hbar\omega} R(\omega), \tag{14.3.14}
\]

which is known as the condition of detailed balance. According to this condition, the probability per unit time of an emission event is smaller than that of an absorption event by a factor of $\exp(-\beta\hbar\omega)$ in a canonical distribution, for which the probability of finding the system with a high initial energy $E_i$ is smaller than that for finding the system with a smaller initial energy. Equation (14.3.14) is a consequence of the statistical distribution of initial states; in fact, the individual transition rates $R_{fi}(\omega)$ satisfy the microscopic reversibility condition $R_{fi}(\omega) = R_{if}(\omega)$. If we followed all of the individual transitions of an ensemble of systems, they would all obey microscopic reversibility. However, because we introduce a statistical distribution, we no longer retain such a detailed microscopic picture, and the ensemble-averaged absorption and emission rates, $R(\omega)$ and $R(-\omega)$, do not obey the microscopic reversibility condition.

If the analysis leading from eqn. (14.3.3) to eqn. (14.3.7) is carried out on eqn. (14.3.9), the result is

\[
R(-\omega) = \frac{1}{\hbar^2}|F(\omega)|^2 \int_{-\infty}^{\infty} dt\, e^{-i\omega t}\, \langle \mathcal{V}(t)\mathcal{V}(0)\rangle, \tag{14.3.15}
\]

and since $R(-\omega) \neq R(\omega)$, it follows that the correlation functions $\langle \mathcal{V}(0)\mathcal{V}(t)\rangle$ and $\langle \mathcal{V}(t)\mathcal{V}(0)\rangle$ are not equal. This could also have been gleaned from the fact that the commutator $[\mathcal{V}(0), \mathcal{V}(t)]$ does not vanish.

We now define the net energy absorption spectrum $Q(\omega)$ as the net energy absorbed per unit time at frequency $\omega$. Since the energy absorbed per transition is just $\hbar\omega$, and the net rate is the difference between the absorption and emission rates $R(\omega) - R(-\omega)$, the energy spectrum $Q(\omega)$ is given by

\[
Q(\omega) = \left[R(\omega) - R(-\omega)\right]\hbar\omega = \hbar\omega R(\omega)\left(1 - e^{-\beta\hbar\omega}\right). \tag{14.3.16}
\]

Note, however, that since $R(-\omega) = \exp(-\beta\hbar\omega)R(\omega)$, it follows that

\[
R(\omega) + R(-\omega) = \left(1 + e^{-\beta\hbar\omega}\right) R(\omega) \tag{14.3.17}
\]

or

\[
R(\omega) = \frac{R(\omega) + R(-\omega)}{1 + e^{-\beta\hbar\omega}}. \tag{14.3.18}
\]

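Using eqn. (14.3.7) and eqn. (14.3.15), we express the sum $R(\omega) + R(-\omega)$ as

\[
R(\omega) + R(-\omega) = \frac{1}{\hbar^2}|F(\omega)|^2 \int_{-\infty}^{\infty} dt\, e^{-i\omega t}\, \langle \mathcal{V}(0)\mathcal{V}(t) + \mathcal{V}(t)\mathcal{V}(0)\rangle. \tag{14.3.19}
\]

Let us now define a new operator bracket

\[
[\hat{A}, \hat{B}]_+ = \hat{A}\hat{B} + \hat{B}\hat{A}, \tag{14.3.20}
\]

known as the anticommutator between $\hat{A}$ and $\hat{B}$. It is straightforward to see that the anticommutator is manifestly Hermitian. Inserting the anticommutator definition into eqn. (14.3.19), we obtain

\[
R(\omega) + R(-\omega) = \frac{1}{\hbar^2}|F(\omega)|^2 \int_{-\infty}^{\infty} dt\, e^{-i\omega t}\, \langle [\mathcal{V}(0), \mathcal{V}(t)]_+ \rangle. \tag{14.3.21}
\]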


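Finally, substituting eqn. (14.3.21) into eqn. (14.3.18) and the result into eqn. (14.3.16), the energy spectrum becomes

\[
Q(\omega) = \frac{2\omega}{\hbar}|F(\omega)|^2 \tanh(\beta\hbar\omega/2) \int_{-\infty}^{\infty} dt\, e^{-i\omega t} \left\langle \frac{1}{2}[\mathcal{V}(0), \mathcal{V}(t)]_+ \right\rangle. \tag{14.3.22}
\]

Equation (14.3.22) demonstrates that the energy spectrum $Q(\omega)$ can be expressed in terms of the ensemble average of a Hermitian operator combination $[\mathcal{V}(0), \mathcal{V}(t)]_+$. In particular, $Q(\omega)$ is directly related to the Fourier transform of a symmetric quantum time correlation function $\langle[\mathcal{V}(0), \mathcal{V}(t)]_+\rangle$.

It is instructive to examine the classical limit of the quantum spectrum in eqn. (14.3.22). In this limit, the operators $\mathcal{V}(0)$ and $\mathcal{V}(t)$ revert to classical phase-space functions so that $\mathcal{V}(0)\mathcal{V}(t) = \mathcal{V}(t)\mathcal{V}(0)$ and $[\mathcal{V}(0), \mathcal{V}(t)]_+ \longrightarrow 2\mathcal{V}(0)\mathcal{V}(t)$. Also, as $\hbar \longrightarrow 0$, $\tanh(\beta\hbar\omega/2) \longrightarrow \beta\hbar\omega/2$. Combining these results, we find that the classical limit of the quantum spectrum is just

\[
Q_{\rm cl}(\omega) = \frac{\omega^2}{kT}|F(\omega)|^2 \int_{-\infty}^{\infty} dt\, e^{-i\omega t}\, \langle \mathcal{V}(0)\mathcal{V}(t)\rangle_{\rm cl}, \tag{14.3.23}
\]

where the notation $\langle \mathcal{V}(0)\mathcal{V}(t)\rangle_{\rm cl}$ serves to remind us that the time correlation function is a classical one.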
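The algebra linking eqns. (14.3.16), (14.3.18), and (14.3.22) can be checked numerically. The following is a minimal sketch, assuming an arbitrary model line shape for $R(\omega)$ (a Lorentzian chosen only for illustration) and units in which $\hbar = 1$; the function names are hypothetical and are not taken from the text. It confirms that $\hbar\omega[R(\omega) - R(-\omega)]$ and $\hbar\omega\tanh(\beta\hbar\omega/2)[R(\omega) + R(-\omega)]$ coincide once detailed balance is imposed.

```python
import numpy as np

def net_absorption(R_abs, omega, beta, hbar=1.0):
    """Q(omega) = hbar*omega*[R(omega) - R(-omega)], eqn. (14.3.16)."""
    R_em = np.exp(-beta * hbar * omega) * R_abs  # detailed balance, eqn. (14.3.14)
    return hbar * omega * (R_abs - R_em)

def net_absorption_symmetrized(R_abs, omega, beta, hbar=1.0):
    """Same quantity written with the symmetric sum R(omega) + R(-omega) and tanh."""
    x = beta * hbar * omega
    R_sum = (1.0 + np.exp(-x)) * R_abs           # eqn. (14.3.17)
    return hbar * omega * np.tanh(0.5 * x) * R_sum

# Illustrative inputs (assumptions, not data from the text)
omega = np.linspace(0.1, 5.0, 50)
beta = 0.8
R_abs = 1.0 / (1.0 + (omega - 2.0) ** 2)         # arbitrary model absorption rate

q_direct = net_absorption(R_abs, omega, beta)
q_symmetrized = net_absorption_symmetrized(R_abs, omega, beta)
max_diff = np.max(np.abs(q_direct - q_symmetrized))

# Classical limit: tanh(x/2) / (x/2) approaches 1 as x = beta*hbar*omega -> 0
small_x = 1.0e-4
classical_ratio = np.tanh(0.5 * small_x) / (0.5 * small_x)
```

The two routes agree to rounding error because $(1 - e^{-x})/(1 + e^{-x}) = \tanh(x/2)$, which is the identity behind the $\tanh$ prefactor in eqn. (14.3.22).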

14.4 Examples of frequency spectra

From eqn. (14.3.22), it is clear that in order to calculate a spectrum, we must be able to calculate a quantum time correlation function. Unfortunately, numerical evaluation of these correlation functions is an extremely difficult computational problem, an issue we will explore in more detail in Section 14.6, where we will also describe approaches for approximating quantum time correlation functions from path integral molecular dynamics. In this section, we will use a simple, analytically solvable example, the harmonic oscillator, to illustrate the general idea of a quantum time correlation function. As discussed in Section 10.4.2, expressions for equilibrium averages and thermodynamic quantities for a harmonic oscillator form the basis of simple approximations for general anharmonic systems. We will use the result we derive here for the position autocorrelation function of a harmonic oscillator to devise a straightforward approach to approximate absorption spectra from classical molecular dynamics trajectories.

14.4.1 Position autocorrelation function of a harmonic oscillator

We begin by considering the position autocorrelation function $\langle \hat{x}(t)\hat{x}(0)\rangle$ and symmetrized autocorrelation function $\langle[\hat{x}(t), \hat{x}(0)]_+\rangle$ of a simple harmonic oscillator of frequency $\omega_0$. In order to calculate the time evolution of the position operator, we use the fact that the Schrödinger operator $\hat{x}$ can be expressed in terms of the creation and annihilation operators (or raising and lowering operators) $\hat{a}^\dagger$ and $\hat{a}$, respectively, as

\[
\hat{x} = \left(\frac{\hbar}{2m\omega_0}\right)^{1/2}\left(\hat{a} + \hat{a}^\dagger\right) \tag{14.4.1}
\]

(Shankar, 1994), where the action of $\hat{a}$ and $\hat{a}^\dagger$ on an energy eigenstate of the oscillator is

\[
\hat{a}|n\rangle = \sqrt{n}\,|n-1\rangle, \qquad \hat{a}^\dagger|n\rangle = \sqrt{n+1}\,|n+1\rangle. \tag{14.4.2}
\]

These operators satisfy the commutation relation $[\hat{a}, \hat{a}^\dagger] = 1$. In terms of the creation and annihilation operators, the Hamiltonian for a harmonic oscillator of frequency $\omega_0$ can be written as

\[
\hat{H}_0 = \left(\hat{a}^\dagger\hat{a} + \frac{1}{2}\right)\hbar\omega_0. \tag{14.4.3}
\]

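In the interaction picture, the operators $\hat{a}$ and $\hat{a}^\dagger$ evolve according to the equations of motion

\[
\begin{aligned}
\frac{d\hat{a}}{dt} &= \frac{1}{i\hbar}[\hat{a}, \hat{H}_0] = -i\omega_0\hat{a} \\
\frac{d\hat{a}^\dagger}{dt} &= \frac{1}{i\hbar}[\hat{a}^\dagger, \hat{H}_0] = i\omega_0\hat{a}^\dagger.
\end{aligned}
\tag{14.4.4}
\]

Equations (14.4.4) are readily solved to yield

\[
\hat{a}(t) = \hat{a}\,e^{-i\omega_0 t}, \qquad \hat{a}^\dagger(t) = \hat{a}^\dagger e^{i\omega_0 t}. \tag{14.4.5}
\]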

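Using eqn. (10.4.26), the correlation function $\langle\hat{x}(0)\hat{x}(t)\rangle$ can be written as

\[
\langle\hat{x}(0)\hat{x}(t)\rangle = \frac{\hbar}{2m\omega_0}\, \frac{1 - e^{-\beta\hbar\omega_0}}{e^{-\beta\hbar\omega_0/2}} \sum_{n=0}^{\infty} e^{-(n+1/2)\beta\hbar\omega_0}\, \langle n|(\hat{a} + \hat{a}^\dagger)(\hat{a}e^{-i\omega_0 t} + \hat{a}^\dagger e^{i\omega_0 t})|n\rangle. \tag{14.4.6}
\]

After some algebra, we find that

\[
\langle\hat{x}(0)\hat{x}(t)\rangle = \frac{\hbar}{4m\omega_0\sinh(\beta\hbar\omega_0/2)}\left[e^{i\omega_0 t}e^{\beta\hbar\omega_0/2} + e^{-i\omega_0 t}e^{-\beta\hbar\omega_0/2}\right]. \tag{14.4.7}
\]

Similarly, the correlation function $\langle\hat{x}(t)\hat{x}(0)\rangle$ can be shown to be

\[
\langle\hat{x}(t)\hat{x}(0)\rangle = \frac{\hbar}{4m\omega_0\sinh(\beta\hbar\omega_0/2)}\left[e^{-i\omega_0 t}e^{\beta\hbar\omega_0/2} + e^{i\omega_0 t}e^{-\beta\hbar\omega_0/2}\right]. \tag{14.4.8}
\]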

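When we combine eqns. (14.4.7) and (14.4.8), the symmetric correlation function is found to be

\[
\frac{1}{2}\left\langle[\hat{x}(0), \hat{x}(t)]_+\right\rangle = \frac{\hbar}{2m\omega_0}\coth(\beta\hbar\omega_0/2)\cos(\omega_0 t). \tag{14.4.9}
\]

A comparison of eqn. (14.4.9) and eqn. (13.3.41) reveals that the quantum and classical correlation functions are related by

\[
\frac{1}{2}\left\langle[\hat{x}(0), \hat{x}(t)]_+\right\rangle = \frac{\beta\hbar\omega_0}{2}\coth(\beta\hbar\omega_0/2)\, \langle x(0)x(t)\rangle_{\rm cl}. \tag{14.4.10}
\]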
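The closed forms in eqns. (14.4.7), (14.4.9), and (14.4.10) can be checked by evaluating the thermal sum of eqn. (14.4.6) in a truncated number-state basis. The following is a minimal sketch, assuming units with $\hbar = m = \omega_0 = 1$, an illustrative $\beta = 2$, and a cutoff of 60 levels; these values and the function names are assumptions made here. The classical reference $\langle x(0)x(t)\rangle_{\rm cl} = (kT/m\omega_0^2)\cos(\omega_0 t)$ is the standard harmonic-oscillator result.

```python
import numpy as np

hbar = m = omega0 = 1.0      # illustrative units (assumption)
beta = 2.0                   # illustrative inverse temperature (assumption)
N = 60                       # number of oscillator levels kept (truncation assumption)

n = np.arange(N)
a = np.diag(np.sqrt(n[1:]), k=1)                 # a|n> = sqrt(n)|n-1>, eqn. (14.4.2)
x = np.sqrt(hbar / (2 * m * omega0)) * (a + a.T) # eqn. (14.4.1)
E = hbar * omega0 * (n + 0.5)                    # eigenvalues of eqn. (14.4.3)
w = np.exp(-beta * E)
w /= w.sum()                                     # canonical weights w_n

def corr_x0_xt(t):
    """<x(0) x(t)> = sum_{n,k} w_n |x_nk|^2 exp(i (E_k - E_n) t / hbar)."""
    phase = np.exp(1j * (E[None, :] - E[:, None]) * t / hbar)
    return np.sum(w[:, None] * np.abs(x) ** 2 * phase)

def corr_exact(t):
    """Closed form of eqn. (14.4.7)."""
    half = 0.5 * beta * hbar * omega0
    pref = hbar / (4 * m * omega0 * np.sinh(half))
    return pref * (np.exp(1j * omega0 * t + half) + np.exp(-1j * omega0 * t - half))

t0 = 0.7
numeric = corr_x0_xt(t0)
symmetrized = numeric.real                       # (1/2)<[x(0), x(t)]_+> = Re <x(0) x(t)>
classical = np.cos(omega0 * t0) / (beta * m * omega0 ** 2)
half = 0.5 * beta * hbar * omega0
quantum_prefactor = half / np.tanh(half)         # (beta*hbar*omega0/2) coth(beta*hbar*omega0/2)
```

The real part of $\langle\hat{x}(0)\hat{x}(t)\rangle$ equals the symmetrized function because $\langle\hat{x}(t)\hat{x}(0)\rangle$ is its complex conjugate, as can be read off from eqns. (14.4.7) and (14.4.8).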

The connection established in eqn. (14.4.10) between the quantum and classical position autocorrelation functions of a harmonic oscillator can be exploited as a method for approximating the quantum position autocorrelation function of an anharmonic system using the corresponding classical autocorrelation function. The latter can be obtained directly from a molecular dynamics calculation. This approximation is known as the harmonic approximation, within which the quantum-mechanical prefactor $(\beta\hbar\omega_0/2)\coth(\beta\hbar\omega_0/2)$ serves to capture at least some of the true quantum character of the system (Bader and Berne, 1994; Skinner and Park, 2001). The utility of this approximation depends on how well a system can be represented as a collection of harmonic oscillators.

14.4.2 The infrared spectrum

One of the most commonly used approaches to probe the vibrational energy levels of a system is infrared spectroscopy, in which electromagnetic radiation of frequency in the near infrared part of the spectrum ($10^{12}$ to $10^{14}$ Hz) is used to induce transitions between the vibrational levels. By sweeping through this frequency range, the technique records the frequencies at which the transitions occur and the intensities associated with each transition. Infrared spectroscopy makes use of the fact that the total electric dipole moment operator of a system $\hat{\boldsymbol{\mu}}$ couples to the electric field component of an electromagnetic wave via

\[
\hat{H}_1(t) = -\hat{\boldsymbol{\mu}} \cdot \mathbf{E}(t). \tag{14.4.11}
\]

If we orient the coordinate system such that $\mathbf{E}(t) = (0, 0, E(t))$ and recall that the wavelength of infrared radiation is long compared to a typical sample size, then $E(t) = E(\omega)e^{-i\omega t}$, and the perturbation Hamiltonian $\hat{H}_1(t)$ is of the form given in eqn. (14.2.37). For this perturbation, the energy spectrum is given by

\[
Q(\omega) = \frac{\omega}{\hbar}|E(\omega)|^2 \tanh(\beta\hbar\omega/2) \int_{-\infty}^{\infty} dt\, e^{-i\omega t}\, \langle[\hat{\mu}_z(0), \hat{\mu}_z(t)]_+\rangle. \tag{14.4.12}
\]

Since we could have chosen any direction for the electric field $\mathbf{E}(t)$, we may compute the spectrum by averaging over the three spatial directions and obtain

\[
Q(\omega) = \frac{\omega}{3\hbar}|E(\omega)|^2 \tanh(\beta\hbar\omega/2) \int_{-\infty}^{\infty} dt\, e^{-i\omega t}\, \langle\hat{\boldsymbol{\mu}}(t)\cdot\hat{\boldsymbol{\mu}}(0) + \hat{\boldsymbol{\mu}}(0)\cdot\hat{\boldsymbol{\mu}}(t)\rangle. \tag{14.4.13}
\]

What is actually measured in an infrared experiment is the absorptivity $\alpha(\omega)$ from the Beer-Lambert law. The product of $\alpha(\omega)$ with the frequency-dependent index of refraction $n(\omega)$ is directly proportional to $Q(\omega)$ in eqn. (14.4.13), $Q(\omega) \propto \alpha(\omega)n(\omega)$. If the quantum dipole-moment autocorrelation function is replaced by a classical autocorrelation function, with $\boldsymbol{\mu}(t) = \sum_i q_i \mathbf{r}_i(t)$ the classical dipole moment for a system of $N$ charges $q_1, ..., q_N$, then the approximation in eqn. (14.4.10) can be employed. Through the use of the Kramers-Kronig relations (see eqn. (14.5.20) in Section 14.5), a straightforward computational procedure can be employed to compute $n(\omega)$ (Iftimie and Tuckerman, 2005). The examples of water and ice considered by Iftimie and Tuckerman show that $n(\omega)$ has only a weak dependence on frequency so that $\alpha(\omega)n(\omega)$ is a reasonable representation of the experimental observable.
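As an illustration of how eqns. (14.4.10) and (14.4.13) might be combined in practice, note that if the harmonic prefactor is applied at each frequency $\omega$ (an assumption of this sketch), the factor $\tanh(\beta\hbar\omega/2)$ cancels against $\coth(\beta\hbar\omega/2)$, leaving $Q(\omega) \propto \beta\omega^2 \int dt\, e^{-i\omega t}\langle\boldsymbol{\mu}(0)\cdot\boldsymbol{\mu}(t)\rangle_{\rm cl}$. The Python sketch below evaluates this line shape for a synthetic dipole signal standing in for a molecular dynamics trajectory. The function names, the Hann window, the omitted constant prefactors, and all numerical parameters are choices made here for illustration, not the procedure used to produce the published spectra.

```python
import numpy as np

def dipole_acf(mu, max_lag):
    """Classical <mu(0).mu(t)>, averaged over time origins; mu has shape (n_steps, 3)."""
    n_steps = mu.shape[0]
    return np.array([np.mean(np.sum(mu[:n_steps - lag] * mu[lag:], axis=1))
                     for lag in range(max_lag)])

def classical_ir_lineshape(mu, dt, beta, omega, max_lag):
    """Return beta * omega^2 * FT[<mu(0).mu(t)>_cl], up to constant prefactors."""
    acf = dipole_acf(mu, max_lag)
    t = dt * np.arange(max_lag)
    window = np.hanning(2 * max_lag)[max_lag:]   # decaying half-window against truncation ripple
    weights = np.full(max_lag, dt)
    weights[0] = 0.5 * dt                        # trapezoid end weight at t = 0
    # A classical autocorrelation function is even in t, so the Fourier
    # transform reduces to 2 * int_0^inf cos(omega t) C(t) dt.
    ft = 2.0 * np.cos(np.outer(omega, t)) @ (acf * window * weights)
    return beta * omega ** 2 * ft

# Synthetic "trajectory": a dipole oscillating along z at frequency omega1 (assumption)
dt, n_steps, omega1 = 0.05, 4000, 2.0
time = dt * np.arange(n_steps)
mu = np.zeros((n_steps, 3))
mu[:, 2] = np.cos(omega1 * time + 0.3)

omega = np.linspace(0.0, 5.0, 251)
spectrum = classical_ir_lineshape(mu, dt, beta=1.0, omega=omega, max_lag=500)
peak_position = omega[np.argmax(spectrum)]
```

For this single-frequency signal the line shape peaks near $\omega_1$, with a width set by the correlation window of length `max_lag * dt`.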


As a specific example of an infrared spectrum, we show, in Fig. 14.5(a), computed IR spectra from ab initio molecular dynamics calculations of pure D$_2$O (Lee and Tuckerman, 2007), with a comparison to experiment (Bertie et al., 1989; Zelsmann, 1995). In Fig. 14.5(b), we show computed IR spectra for 1 M and 13 M aqueous KOD solutions (Zhu and Tuckerman, 2002). In an ab initio molecular dynamics calculation, a molecular dynamics trajectory is generated with forces computed from electronic structure calculations performed “on the fly” as the simulation proceeds. The ab initio molecular dynamics technique allows chemical bond-breaking and -forming events (which occur frequently in KOD solutions as protons are transferred from water to OD$^-$ ions) to be treated implicitly in an unbiased manner. In each of these spectra, the quantum dipole correlation function is replaced by its classical counterpart using the harmonic approximation of eqn. (14.4.10). The simulation protocol employed in Fig. 14.5(a) leads to a small red shift in the OD vibrational band compared to experiment; however, the agreement is generally reasonable. The spectra in Fig. 14.5(b) show how a strongly red-shifted OD vibrational band at 1950 cm$^{-1}$ diminishes with concentration and disappears in the pure D$_2$O case. This band can be assigned to water molecules in the first solvation shell of an OD$^-$ ion that donate a hydrogen bond to the OD$^-$ oxygen, forming a relatively strong hydrogen bond. The stretch mode of these OD groups pointing directly to the hydroxyl oxygen is strongly red-shifted. As the concentration is decreased, it is expected that this band, in particular, will exhibit diminished intensity in the infrared spectrum. At high concentration, the shoulder at $\approx 2400$ cm$^{-1}$ is due to the OD$^-$ stretch, which is only free for a fraction of the trajectory and hence is a weak signal in the IR spectrum.

Fig. 14.5 (a) Computed (solid line) and experimental (dashed line) IR spectra for pure D$_2$O. (b) Computed IR spectra for KOD solutions of 1 M (solid line) and 13 M (dashed line) concentrations.

14.5 Quantum linear response theory

In this section, we will show that the energy spectrum can be derived directly from the ensemble density matrix and the quantum Liouville equation without explicit reference to the eigenstates of $\hat{H}_0$. This approach, known as quantum linear response theory, is the quantum version of the classical linear response theory described in Section 13.2, and it is also the basis for the calculation of quantum transport properties. Since the eigenstate approach to linear spectroscopy derived in Section 14.3 employed first-order perturbation theory (Fermi’s Golden Rule), we expect to use a linearization of the quantum Liouville equation, as was done in Section 13.2, in order to establish the connection between the eigenstate and density-matrix approaches. Recall that the quantum Liouville equation is

\[
\frac{\partial\hat{\rho}(t)}{\partial t} = \frac{1}{i\hbar}\left[\hat{H}(t), \hat{\rho}(t)\right]. \tag{14.5.1}
\]

In order to keep the discussion as general as possible, we will consider the solution of eqn. (14.5.1) for a general class of Hamiltonians of the form

\[
\hat{H}(t) = \hat{H}_0 - \mathcal{V}F_e(t), \tag{14.5.2}
\]

where $\mathcal{V}$ is a Hermitian operator and $F_e(t)$ is an arbitrary function of time. As in the classical case, we take an ansatz for $\hat{\rho}(t)$ of the form

\[
\hat{\rho}(t) = \hat{\rho}_0(\hat{H}_0) + \Delta\hat{\rho}(t), \tag{14.5.3}
\]

where $\hat{\rho}_0(\hat{H}_0)$ is the equilibrium density matrix for a system described by the unperturbed Hamiltonian $\hat{H}_0$ and which therefore satisfies an equilibrium Liouville equation

\[
\left[\hat{H}_0, \hat{\rho}_0\right] = 0, \qquad \frac{\partial\hat{\rho}_0}{\partial t} = 0. \tag{14.5.4}
\]

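Assuming that the system is in equilibrium before the perturbation is applied, the initial condition on the Liouville equation is $\hat{\rho}(t_0) = \hat{\rho}_0(\hat{H}_0)$, $\Delta\hat{\rho}(t_0) = 0$. When eqn. (14.5.3) is substituted into eqn. (14.5.1) and terms involving both $\mathcal{V}$ and $\Delta\hat{\rho}$ are dropped, we obtain the following equation of motion for $\Delta\hat{\rho}(t)$:

\[
\frac{\partial\Delta\hat{\rho}(t)}{\partial t} = \frac{1}{i\hbar}\left[\hat{H}_0, \Delta\hat{\rho}(t)\right] - \frac{1}{i\hbar}\left[\mathcal{V}, \hat{\rho}_0\right]F_e(t). \tag{14.5.5}
\]

Since this equation is a first-order inhomogeneous linear differential equation for an operator and involves a commutator, left and right integrating factors in the form of $\hat{U}_0(t) = \exp(-i\hat{H}_0 t/\hbar)$ and $\hat{U}_0^\dagger(t) = \exp(i\hat{H}_0 t/\hbar)$, respectively, are needed. With these integrating factors, the solution for $\Delta\hat{\rho}(t)$ becomes

\[
\Delta\hat{\rho}(t) = -\frac{1}{i\hbar}\int_{t_0}^{t} ds\, e^{-i\hat{H}_0(t-s)/\hbar}\left[\mathcal{V}, \hat{\rho}_0\right]e^{i\hat{H}_0(t-s)/\hbar}F_e(s). \tag{14.5.6}
\]

The ensemble average of an operator $\hat{A}$ in the time-dependent quantum ensemble is given by

\[
\langle\hat{A}\rangle_t = \mathrm{Tr}\left[\hat{\rho}(t)\hat{A}\right] = \mathrm{Tr}\left[\hat{\rho}_0\hat{A}\right] + \mathrm{Tr}\left[\Delta\hat{\rho}(t)\hat{A}\right] = \langle\hat{A}\rangle + \mathrm{Tr}\left[\Delta\hat{\rho}(t)\hat{A}\right], \tag{14.5.7}
\]

where $\langle\hat{A}\rangle$ is the equilibrium ensemble average of $\hat{A}$. When eqn. (14.5.6) is substituted into eqn. (14.5.7), we obtain

\[
\begin{aligned}
\langle\hat{A}\rangle_t &= \langle\hat{A}\rangle - \frac{1}{i\hbar}\int_{t_0}^{t} ds\, \mathrm{Tr}\left\{\hat{A}\, e^{-i\hat{H}_0(t-s)/\hbar}\left[\mathcal{V}, \hat{\rho}_0\right]e^{i\hat{H}_0(t-s)/\hbar}\right\}F_e(s) \\
&= \langle\hat{A}\rangle - \frac{1}{i\hbar}\int_{t_0}^{t} ds\, \mathrm{Tr}\left\{e^{i\hat{H}_0(t-s)/\hbar}\hat{A}\, e^{-i\hat{H}_0(t-s)/\hbar}\left[\mathcal{V}, \hat{\rho}_0\right]\right\}F_e(s) \\
&= \langle\hat{A}\rangle - \frac{1}{i\hbar}\int_{t_0}^{t} ds\, \mathrm{Tr}\left\{\hat{A}(t-s)\left[\mathcal{V}(0), \hat{\rho}_0\right]\right\}F_e(s).
\end{aligned}
\tag{14.5.8}
\]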

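In the second line, we have used the fact that the trace is invariant to cyclic permutations of the operators, $\mathrm{Tr}(\hat{A}\hat{B}\hat{C}) = \mathrm{Tr}(\hat{C}\hat{A}\hat{B}) = \mathrm{Tr}(\hat{B}\hat{C}\hat{A})$. In the last line, $\hat{A}(t-s)$ denotes the operator $\hat{A}$ in the interaction picture at time $t-s$, and $\mathcal{V}(0)$ denotes an operator in this picture at $t = 0$. Using the cyclic property of the trace again, the expression in eqn. (14.5.8) can be further simplified. Expanding the commutator in the last line of eqn. (14.5.8) yields

\[
\begin{aligned}
\langle\hat{A}\rangle_t &= \langle\hat{A}\rangle - \frac{1}{i\hbar}\int_{t_0}^{t} ds\, \mathrm{Tr}\left[\hat{A}(t-s)\mathcal{V}(0)\hat{\rho}_0 - \hat{A}(t-s)\hat{\rho}_0\mathcal{V}(0)\right]F_e(s) \\
&= \langle\hat{A}\rangle - \frac{1}{i\hbar}\int_{t_0}^{t} ds\, \mathrm{Tr}\left[\hat{\rho}_0\hat{A}(t-s)\mathcal{V}(0) - \hat{\rho}_0\mathcal{V}(0)\hat{A}(t-s)\right]F_e(s) \\
&= \langle\hat{A}\rangle - \frac{1}{i\hbar}\int_{t_0}^{t} ds\, \mathrm{Tr}\left[\hat{\rho}_0\left(\hat{A}(t-s)\mathcal{V}(0) - \mathcal{V}(0)\hat{A}(t-s)\right)\right]F_e(s) \\
&= \langle\hat{A}\rangle - \frac{1}{i\hbar}\int_{t_0}^{t} ds\, \mathrm{Tr}\left\{\hat{\rho}_0\left[\hat{A}(t-s), \mathcal{V}(0)\right]\right\}F_e(s).
\end{aligned}
\tag{14.5.9}
\]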

Equation (14.5.9) is the quantum analog of the classical linear response formula given in eqn. (13.2.27) and is hence the starting point for the development of quantum Green-Kubo expressions for transport properties. Since these expressions are very similar to their classical counterparts, we will not repeat the derivations of the Green-Kubo formulae here. The time correlation function appearing in eqn. (14.5.9) is referred to as the after-effect function $\Phi_{A\mathcal{V}}(t)$, which is defined as

\[
\Phi_{A\mathcal{V}}(t) = \frac{i}{\hbar}\left\langle\left[\hat{A}(t), \mathcal{V}(0)\right]\right\rangle. \tag{14.5.10}
\]

Note that although the operator combination $[\hat{A}(t), \mathcal{V}(0)]$ is anti-Hermitian,$^3$ the $i$ prefactor in eqn. (14.5.10) fixes this: the operator $i[\hat{A}(t), \mathcal{V}(0)]$ is Hermitian. In order to make contact with the treatment of Section 14.3, we set $t_0 = -\infty$ in eqn. (14.5.9) to obtain

\[
\langle\hat{A}\rangle_t = \langle\hat{A}\rangle + \int_{-\infty}^{t} ds\, \Phi_{A\mathcal{V}}(t-s)F_e(s). \tag{14.5.11}
\]

$^3$ An anti-Hermitian operator $\hat{B}$ satisfies $\hat{B}^\dagger = -\hat{B}$.
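The linear response formula can be tested against a direct solution of the quantum Liouville equation for a small model. The following is a minimal sketch, assuming a two-level system with $\hat{H}_0 = (\Delta/2)\sigma_z$ and $\mathcal{V} = \sigma_x$, a weak real field $F_e(t) = F_0\sin(\omega t)$ switched on at $t_0 = 0$, and units with $\hbar = 1$; the model, the midpoint propagation scheme, and all parameter values are assumptions chosen here for illustration. The direct result for $\langle\mathcal{V}\rangle_t$ is compared with the convolution of the after-effect function of eqn. (14.5.10) with the field.

```python
import numpy as np

hbar, delta, beta = 1.0, 1.0, 2.0            # illustrative parameters (assumptions)
sz = np.diag([1.0, -1.0]).astype(complex)
sx = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=complex)
H0 = 0.5 * delta * sz
V = sx                                       # operator coupled to the field and measured
rho0 = np.diag(np.exp(-beta * np.diag(H0).real)).astype(complex)
rho0 /= np.trace(rho0)                       # canonical density matrix of H0

def field(t, f0=1.0e-4, omega=0.7):
    """Weak driving field F_e(t), applied from t0 = 0."""
    return f0 * np.sin(omega * t)

def propagator(H, dt):
    """exp(-i H dt / hbar) for Hermitian H via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    return (evecs * np.exp(-1j * evals * dt / hbar)) @ evecs.conj().T

def after_effect(t):
    """Phi_VV(t) = (i/hbar) <[V(t), V(0)]> with V(t) in the interaction picture."""
    U = propagator(H0, t)
    Vt = U.conj().T @ V @ U
    return ((1j / hbar) * np.trace(rho0 @ (Vt @ V - V @ Vt))).real

dt, n_steps = 0.005, 2000
times = dt * np.arange(n_steps + 1)

# (1) Direct solution of d(rho)/dt = [H(t), rho]/(i hbar) with H(t) = H0 - V F_e(t)
rho = rho0.copy()
exact = np.zeros(n_steps + 1)
for k in range(n_steps):
    U = propagator(H0 - V * field(times[k] + 0.5 * dt), dt)   # midpoint rule
    rho = U @ rho @ U.conj().T
    exact[k + 1] = np.trace(rho @ V).real

# (2) Linear response: <V>_t = <V> + int_0^t ds Phi_VV(t - s) F_e(s), with <V> = 0 here
phi = np.array([after_effect(t) for t in times])
f = field(times)
linear = np.zeros(n_steps + 1)
for k in range(1, n_steps + 1):
    g = phi[k::-1] * f[:k + 1]               # Phi(t_k - s_j) F_e(s_j), j = 0..k
    linear[k] = dt * (g.sum() - 0.5 * (g[0] + g[-1]))   # trapezoid rule

max_err = np.max(np.abs(exact - linear))
response_amplitude = np.max(np.abs(exact))
```

For this model the after-effect function has the closed form $\Phi_{\mathcal{V}\mathcal{V}}(t) = (2/\hbar)\sin(\Delta t/\hbar)\tanh(\beta\Delta/2)$, which provides an independent check of `after_effect`. The residual between the two curves should shrink as the field amplitude and the time step are reduced.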

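According to eqn. (14.5.11), when $\mathcal{V}$ is the operator we choose to measure in the non-equilibrium ensemble, we find

\[
\langle\mathcal{V}\rangle_t = \langle\mathcal{V}\rangle + \int_{-\infty}^{t} ds\, \Phi_{\mathcal{V}\mathcal{V}}(t-s)F_e(s), \tag{14.5.12}
\]

which involves a quantum autocorrelation function of $\mathcal{V}$. We now consider the special case of a monochromatic field of frequency $\omega$, for which $F_e(t) = F(\omega)\exp(-i\omega t)$. Substituting this field form into eqn. (14.5.12) yields

\[
\langle\mathcal{V}\rangle_t = \langle\mathcal{V}\rangle + F(\omega)\int_{-\infty}^{t} ds\, \Phi_{\mathcal{V}\mathcal{V}}(t-s)e^{-i\omega s}. \tag{14.5.13}
\]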

Because the lower limit of the time integral in eqn. (14.5.13) is $-\infty$, it is necessary to ensure that the potentially oscillatory integrand yields a convergent result. Thus, we multiply the integrand by a convergence factor $\exp(\epsilon s)$, which decays to 0 as $s \to -\infty$. After the integral is performed, the limit $\epsilon \to 0^+$ (the limit that $\epsilon$ approaches 0 from the positive side) is taken. The use of convergence factors is a formal device, the necessity of which depends on the behavior of the autocorrelation function. For nearly perfect solids and glassy systems, one would expect the decay of the correlation function to be very slow, requiring the use of the convergence factor. For ordinary liquids, the correlation function should decay rapidly to zero, obviating the need for this factor. For generality, we retain it in the present discussion. Introducing a convergence factor into eqn. (14.5.13) gives

\[
\langle\mathcal{V}\rangle_t = \langle\mathcal{V}\rangle + F(\omega)\lim_{\epsilon\to 0^+}\int_{-\infty}^{t} ds\, \Phi_{\mathcal{V}\mathcal{V}}(t-s)e^{-i\omega s}e^{\epsilon s}. \tag{14.5.14}
\]

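We now change the integration variable in eqn. (14.5.14) from $s$ to $\tau = t - s$, which yields

\[
\langle\mathcal{V}\rangle_t = \langle\mathcal{V}\rangle + F(\omega)\lim_{\epsilon\to 0^+} e^{(i\omega+\epsilon)t}\int_{0}^{\infty} d\tau\, \Phi_{\mathcal{V}\mathcal{V}}(\tau)e^{-(i\omega+\epsilon)\tau}. \tag{14.5.15}
\]

Equation (14.5.15) involves a Fourier-Laplace transform of the after-effect function at a complex variable $z = \omega - i\epsilon$. Let the function $\chi_{\mathcal{V}\mathcal{V}}(z)$ denote this Laplace transform (see Appendix E)

\[
\chi_{\mathcal{V}\mathcal{V}}(z) = \int_{0}^{\infty} d\tau\, \Phi_{\mathcal{V}\mathcal{V}}(\tau)e^{-iz\tau}, \tag{14.5.16}
\]

which is referred to as the susceptibility. Equation (14.5.15) can now be expressed as

\[
\langle\mathcal{V}\rangle_t = \langle\mathcal{V}\rangle + F(\omega)\lim_{\epsilon\to 0^+} e^{i(\omega-i\epsilon)t}\chi_{\mathcal{V}\mathcal{V}}(\omega - i\epsilon). \tag{14.5.17}
\]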


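By decomposing the susceptibility into its real and imaginary parts, we can relate it directly to the energy spectrum $Q(\omega)$. In the limit $\epsilon \to 0^+$, we obtain

\[
\begin{aligned}
\chi_{\mathcal{V}\mathcal{V}}(\omega) &= \lim_{\epsilon\to 0^+}\int_{0}^{\infty} d\tau\, e^{-\epsilon\tau}\Phi_{\mathcal{V}\mathcal{V}}(\tau)e^{-i\omega\tau} \\
&= \lim_{\epsilon\to 0^+}\int_{0}^{\infty} d\tau\, e^{-\epsilon\tau}\Phi_{\mathcal{V}\mathcal{V}}(\tau)\left[\cos\omega\tau - i\sin\omega\tau\right] \\
&\equiv \mathrm{Re}\left[\chi_{\mathcal{V}\mathcal{V}}(\omega)\right] - i\,\mathrm{Im}\left[\chi_{\mathcal{V}\mathcal{V}}(\omega)\right],
\end{aligned}
\tag{14.5.18}
\]

where

\[
\begin{aligned}
\mathrm{Re}\left[\chi_{\mathcal{V}\mathcal{V}}(\omega)\right] &= \lim_{\epsilon\to 0^+}\int_{0}^{\infty} d\tau\, e^{-\epsilon\tau}\Phi_{\mathcal{V}\mathcal{V}}(\tau)\cos\omega\tau \\
\mathrm{Im}\left[\chi_{\mathcal{V}\mathcal{V}}(\omega)\right] &= \lim_{\epsilon\to 0^+}\int_{0}^{\infty} d\tau\, e^{-\epsilon\tau}\Phi_{\mathcal{V}\mathcal{V}}(\tau)\sin\omega\tau.
\end{aligned}
\tag{14.5.19}
\]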

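An important property of the susceptibility $\chi(z)$ is its analyticity in the complex $z$-plane. For any analytic function, the real and imaginary parts are not independent but satisfy a set of relations known as the Kramers-Kronig relations. Let $\chi'_{\mathcal{V}\mathcal{V}}(\omega)$ and $\chi''_{\mathcal{V}\mathcal{V}}(\omega)$ denote the real and imaginary parts of $\chi_{\mathcal{V}\mathcal{V}}(\omega)$, respectively, so that $\chi_{\mathcal{V}\mathcal{V}}(\omega) = \chi'_{\mathcal{V}\mathcal{V}}(\omega) + i\chi''_{\mathcal{V}\mathcal{V}}(\omega)$. The real and imaginary parts are related by

\[
\begin{aligned}
\chi'_{\mathcal{V}\mathcal{V}}(\omega) &= \frac{1}{\pi}P\int_{-\infty}^{\infty} d\tilde{\omega}\, \frac{\tilde{\omega}\,\chi''_{\mathcal{V}\mathcal{V}}(\tilde{\omega})}{\tilde{\omega}^2 - \omega^2} \\
\chi''_{\mathcal{V}\mathcal{V}}(\omega) &= -\frac{\omega}{\pi}P\int_{-\infty}^{\infty} d\tilde{\omega}\, \frac{\chi'_{\mathcal{V}\mathcal{V}}(\tilde{\omega})}{\tilde{\omega}^2 - \omega^2}.
\end{aligned}
\tag{14.5.20}
\]

Here, $P$ indicates that the principal value of the integral is to be taken. The Kramers-Kronig relations can be expressed equivalently as

\[
\begin{aligned}
\chi'_{\mathcal{V}\mathcal{V}}(\omega) &= \frac{1}{\pi}P\int_{-\infty}^{\infty} d\tilde{\omega}\, \frac{\chi''_{\mathcal{V}\mathcal{V}}(\tilde{\omega})}{\tilde{\omega} - \omega} \\
\chi''_{\mathcal{V}\mathcal{V}}(\omega) &= -\frac{1}{\pi}P\int_{-\infty}^{\infty} d\tilde{\omega}\, \frac{\chi'_{\mathcal{V}\mathcal{V}}(\tilde{\omega})}{\tilde{\omega} - \omega},
\end{aligned}
\tag{14.5.21}
\]

which are known as Hilbert transforms. We alluded to the use of these relations in Section 14.4 where we presented infrared spectra for water and aqueous solutions.

We will now show that the frequency spectrum of eqn. (14.3.22) can be related to the imaginary part $\chi''_{\mathcal{V}\mathcal{V}}(\omega)$ of the susceptibility. The spectrum of eqn. (14.3.22) is given in terms of the anticommutator of $\mathcal{V}(0)$ and $\mathcal{V}(t)$, while the susceptibility is derived from the after-effect function, which involves a commutator between $\mathcal{V}(0)$ and $\mathcal{V}(t)$. Recall, however, that the frequency spectrum is defined as

\[
Q(\omega) = \hbar\omega\left[R(\omega) - R(-\omega)\right]. \tag{14.5.22}
\]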


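Substituting the definitions of $R(\omega)$ and $R(-\omega)$ from eqns. (14.3.7) and (14.3.15) into this expression for $Q(\omega)$ yields

\[
\begin{aligned}
Q(\omega) &= \hbar\omega|F(\omega)|^2\frac{1}{\hbar^2}\int_{-\infty}^{\infty} dt\, e^{-i\omega t}\, \langle\mathcal{V}(0)\mathcal{V}(t) - \mathcal{V}(t)\mathcal{V}(0)\rangle \\
&= -\omega|F(\omega)|^2\frac{1}{\hbar}\int_{-\infty}^{\infty} dt\, e^{-i\omega t}\, \langle[\mathcal{V}(t), \mathcal{V}(0)]\rangle \\
&= i\omega|F(\omega)|^2\int_{-\infty}^{\infty} dt\, e^{-i\omega t}\, \Phi_{\mathcal{V}\mathcal{V}}(t).
\end{aligned}
\tag{14.5.23}
\]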

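Next, we divide the time integration into an integration from $-\infty$ to 0 and from 0 to $\infty$ so that

\[
\begin{aligned}
Q(\omega) &= i\omega|F(\omega)|^2\left[\int_{-\infty}^{0} dt\, e^{-i\omega t}\Phi_{\mathcal{V}\mathcal{V}}(t) + \int_{0}^{\infty} dt\, e^{-i\omega t}\Phi_{\mathcal{V}\mathcal{V}}(t)\right] \\
&= i\omega|F(\omega)|^2\left[\int_{0}^{\infty} dt\, e^{i\omega t}\Phi_{\mathcal{V}\mathcal{V}}(-t) + \int_{0}^{\infty} dt\, e^{-i\omega t}\Phi_{\mathcal{V}\mathcal{V}}(t)\right],
\end{aligned}
\tag{14.5.24}
\]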

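where in the first term the transformation $t \to -t$ has been made. In order to proceed, we need to analyze the time-reversal properties of the after-effect function. Consider a general after-effect function $\Phi_{AB}(t)$:

\[
\Phi_{AB}(t) = \frac{i}{\hbar}\left\langle\left[e^{i\hat{H}_0 t/\hbar}\hat{A}e^{-i\hat{H}_0 t/\hbar}, \hat{B}\right]\right\rangle. \tag{14.5.25}
\]

Substituting $-t$ into eqn. (14.5.25) yields

\[
\begin{aligned}
\Phi_{AB}(-t) &= \frac{i}{\hbar}\left\langle e^{-i\hat{H}_0 t/\hbar}\hat{A}e^{i\hat{H}_0 t/\hbar}\hat{B} - \hat{B}e^{-i\hat{H}_0 t/\hbar}\hat{A}e^{i\hat{H}_0 t/\hbar}\right\rangle \\
&= \frac{i}{\hbar}\left[\mathrm{Tr}\left(\hat{\rho}_0 e^{-i\hat{H}_0 t/\hbar}\hat{A}e^{i\hat{H}_0 t/\hbar}\hat{B}\right) - \mathrm{Tr}\left(\hat{\rho}_0\hat{B}e^{-i\hat{H}_0 t/\hbar}\hat{A}e^{i\hat{H}_0 t/\hbar}\right)\right].
\end{aligned}
\tag{14.5.26}
\]

Because the trace is invariant under cyclic permutations of the operators and $\hat{\rho}_0$ commutes with the propagators $\exp(\pm i\hat{H}_0 t/\hbar)$, we can express eqn. (14.5.26) as

\[
\begin{aligned}
\Phi_{AB}(-t) &= \frac{i}{\hbar}\left[\mathrm{Tr}\left(\hat{\rho}_0 e^{-i\hat{H}_0 t/\hbar}\hat{A}e^{i\hat{H}_0 t/\hbar}\hat{B}\right) - \mathrm{Tr}\left(\hat{\rho}_0\hat{B}e^{-i\hat{H}_0 t/\hbar}\hat{A}e^{i\hat{H}_0 t/\hbar}\right)\right] \\
&= \frac{i}{\hbar}\left[\mathrm{Tr}\left(e^{-i\hat{H}_0 t/\hbar}\hat{\rho}_0\hat{A}e^{i\hat{H}_0 t/\hbar}\hat{B}\right) - \mathrm{Tr}\left(\hat{B}e^{-i\hat{H}_0 t/\hbar}\hat{A}e^{i\hat{H}_0 t/\hbar}\hat{\rho}_0\right)\right] \\
&= \frac{i}{\hbar}\left[\mathrm{Tr}\left(\hat{\rho}_0\hat{A}e^{i\hat{H}_0 t/\hbar}\hat{B}e^{-i\hat{H}_0 t/\hbar}\right) - \mathrm{Tr}\left(\hat{B}e^{-i\hat{H}_0 t/\hbar}\hat{A}\hat{\rho}_0 e^{i\hat{H}_0 t/\hbar}\right)\right] \\
&= \frac{i}{\hbar}\left[\mathrm{Tr}\left(\hat{\rho}_0\hat{A}e^{i\hat{H}_0 t/\hbar}\hat{B}e^{-i\hat{H}_0 t/\hbar}\right) - \mathrm{Tr}\left(\hat{\rho}_0 e^{i\hat{H}_0 t/\hbar}\hat{B}e^{-i\hat{H}_0 t/\hbar}\hat{A}\right)\right] \\
&= \frac{i}{\hbar}\left[\mathrm{Tr}\left(\hat{\rho}_0\hat{A}\hat{B}(t)\right) - \mathrm{Tr}\left(\hat{\rho}_0\hat{B}(t)\hat{A}\right)\right] \\
&= \frac{i}{\hbar}\left\langle\hat{A}\hat{B}(t) - \hat{B}(t)\hat{A}\right\rangle \\
&= -\frac{i}{\hbar}\left\langle[\hat{B}(t), \hat{A}]\right\rangle \\
&= -\Phi_{BA}(t).
\end{aligned}
\tag{14.5.27}
\]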

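Thus, the effect of time reversal on a general after-effect function is to reverse the order of the operators and change the overall sign. If $\hat{A} = \hat{B}$, then the after-effect function only picks up an overall change of sign upon time reversal. When eqn. (14.5.27) is introduced into eqn. (14.5.24), the energy spectrum becomes

\[
\begin{aligned}
Q(\omega) &= i\omega|F(\omega)|^2\left[-\int_{0}^{\infty} dt\, e^{i\omega t}\Phi_{\mathcal{V}\mathcal{V}}(t) + \int_{0}^{\infty} dt\, e^{-i\omega t}\Phi_{\mathcal{V}\mathcal{V}}(t)\right] \\
&= 2\omega|F(\omega)|^2\int_{0}^{\infty} dt\, \sin(\omega t)\,\Phi_{\mathcal{V}\mathcal{V}}(t) \\
&= 2\omega|F(\omega)|^2\, \mathrm{Im}\left[\chi_{\mathcal{V}\mathcal{V}}(\omega)\right].
\end{aligned}
\tag{14.5.28}
\]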

Thus, the net absorption spectrum is related to the imaginary part of the frequency-dependent susceptibility. Note that eqns. (14.5.28) and (14.3.22) are equivalent, demonstrating that the spectrum is expressible in terms either of Hermitian or anti-Hermitian quantum time correlation functions. This derivation establishes the equivalence between the wave-function approach, leading to the Fermi Golden Rule treatment of spectra, and the statistical-mechanical approach, which starts with the ensemble and its density matrix $\hat{\rho}(t)$ and makes no explicit reference to the eigenstates of $\hat{H}_0$. This is significant, as the former approach is manifestly eigenstate resolved, meaning that it explicitly considers the transitions between eigenstates of $\hat{H}_0$, which is closer to the experimental view. The latter, which builds directly from the density matrix, is closer in spirit to the path integral perspective.
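As a short worked illustration (an example added here, using only eqns. (14.4.1), (14.4.5), and (14.5.10)), the after-effect function can be evaluated in closed form for the harmonic oscillator of Section 14.4.1 with $\mathcal{V} = \hat{x}$. The commutator $[\hat{a}, \hat{a}^\dagger] = 1$ gives

\[
\begin{aligned}
[\hat{x}(t), \hat{x}(0)] &= \frac{\hbar}{2m\omega_0}\left(e^{-i\omega_0 t}[\hat{a}, \hat{a}^\dagger] + e^{i\omega_0 t}[\hat{a}^\dagger, \hat{a}]\right) = -\frac{i\hbar}{m\omega_0}\sin(\omega_0 t), \\
\Phi_{xx}(t) &= \frac{i}{\hbar}\left\langle[\hat{x}(t), \hat{x}(0)]\right\rangle = \frac{\sin(\omega_0 t)}{m\omega_0}.
\end{aligned}
\]

The same result follows by subtracting eqn. (14.4.7) from eqn. (14.4.8). For this system the after-effect function is odd in $t$, consistent with eqn. (14.5.27) for $\hat{A} = \hat{B}$, and it is independent of both $\hbar$ and temperature, so all of the temperature dependence of the symmetrized function in eqn. (14.4.9) resides in the factor $\coth(\beta\hbar\omega_0/2)$.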

14.6 Approximations to quantum time correlation functions

In this section, we will discuss the general problem of calculating quantum time correlation functions for condensed-phase systems. We will first show how to formulate the correlation function in terms of the eigenstates of $\hat{H}_0$. While the eigenstate formulation is useful for analyzing the properties of time correlation functions, we have already alluded, in Section 10.1, to the computational intractability of solving the eigenvalue problem of $\hat{H}_0$ for systems containing more than just a few degrees of freedom. Thus, we will also express the quantum time correlation function using the path integral formulation of quantum mechanics from Chapter 12. Although, as we will show, even the path integral representation suffers from severe numerical difficulties, it serves as a useful starting point for the development of computationally manageable approximation schemes.

Let us begin with a standard nonsymmetrized time correlation function defined by

\[
C^\beta_{AB}(t) = \left\langle\hat{A}(0)\hat{B}(t)\right\rangle = \frac{1}{Q(N,V,T)}\mathrm{Tr}\left[e^{-\beta\hat{H}}\hat{A}\,e^{i\hat{H}t/\hbar}\hat{B}\,e^{-i\hat{H}t/\hbar}\right], \tag{14.6.1}
\]

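where $\hat{A}$ and $\hat{B}$ are quantum mechanical operators in the interaction picture with unperturbed Hamiltonian $\hat{H}$.$^4$ If we evaluate the trace in the basis of the eigenvectors of $\hat{H}$, then a simple formula for the quantum time correlation function results:

\[
\begin{aligned}
C^\beta_{AB}(t) &= \frac{1}{Q(N,V,T)}\sum_n \langle E_n|e^{-\beta\hat{H}}\hat{A}\,e^{i\hat{H}t/\hbar}\hat{B}\,e^{-i\hat{H}t/\hbar}|E_n\rangle \\
&= \frac{1}{Q(N,V,T)}\sum_{n,m} \langle E_n|e^{-\beta\hat{H}}\hat{A}|E_m\rangle\langle E_m|e^{i\hat{H}t/\hbar}\hat{B}\,e^{-i\hat{H}t/\hbar}|E_n\rangle \\
&= \frac{1}{Q(N,V,T)}\sum_{n,m} e^{-\beta E_n}e^{i(E_m - E_n)t/\hbar}\langle E_n|\hat{A}|E_m\rangle\langle E_m|\hat{B}|E_n\rangle.
\end{aligned}
\tag{14.6.2}
\]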
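For a system small enough to diagonalize, the double sum in eqn. (14.6.2) can be carried out directly. The following is a minimal sketch, assuming a one-dimensional quartic oscillator $\hat{H} = \hat{p}^2/2 + \hat{x}^2/2 + 0.1\hat{x}^4$ represented in a truncated harmonic-oscillator basis of 40 states with $\hbar = m = 1$; the model, the basis size, and the function name `correlation_eigenbasis` are illustrative assumptions made here.

```python
import numpy as np

def correlation_eigenbasis(H, A, B, beta, t, hbar=1.0):
    """C_AB(t) = (1/Q) sum_{n,m} e^{-beta E_n} e^{i(E_m-E_n)t/hbar} <n|A|m><m|B|n>."""
    E, U = np.linalg.eigh(H)
    A_e = U.conj().T @ A @ U                   # matrix elements <E_n|A|E_m>
    B_e = U.conj().T @ B @ U
    w = np.exp(-beta * (E - E.min()))          # shift by E_min for numerical stability
    w /= w.sum()                               # e^{-beta E_n} / Q
    phase = np.exp(1j * (E[None, :] - E[:, None]) * t / hbar)  # entry [n, m]
    return np.sum(w[:, None] * phase * A_e * B_e.T)

# Model system (assumption): quartic oscillator in a truncated number-state basis
N = 40
n = np.arange(N)
a = np.diag(np.sqrt(n[1:]), k=1)               # annihilation operator
x = (a + a.T) / np.sqrt(2.0)
p = 1j * (a.T - a) / np.sqrt(2.0)
H = 0.5 * (p @ p).real + 0.5 * x @ x + 0.1 * np.linalg.matrix_power(x, 4)

beta = 1.0
c0 = correlation_eigenbasis(H, x, x, beta, 0.0)     # equals <x^2>
c_t = correlation_eigenbasis(H, x, x, beta, 1.3)
c_mt = correlation_eigenbasis(H, x, x, beta, -1.3)  # equals the complex conjugate of c_t
```

The cost is dominated by the diagonalization, which scales as the cube of the basis size; this is the practical obstacle for systems with many degrees of freedom.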

Thus, if we are able to calculate all of the eigenvalues and eigenvectors of $\hat{H}$, as well as the full set of matrix elements of $\hat{A}$ and $\hat{B}$, then the calculation of the time correlation function just requires carrying out the two sums in eqn. (14.6.2). Generally, however, this can only be done for systems having just a few degrees of freedom. In the condensed phase, for example, it is simply not possible to solve the eigenvalue problem for $\hat{H}$ directly.

In the Feynman path integral formalism of Chapter 12, the eigenvalue problem is circumvented by computing thermal traces in the basis of coordinate eigenstates. We will now apply this approach to the quantum time correlation function. For simplicity, we will consider a single particle in one dimension, and we will let $\hat{A}$ and $\hat{B}$ be functions of the position operator $\hat{x}$, $\hat{A} = \hat{A}(\hat{x})$, $\hat{B} = \hat{B}(\hat{x})$. Taking the coordinate-space trace, we obtain

\[
\begin{aligned}
C^\beta_{AB}(t) &= \frac{1}{Q(\beta)}\int dx\, \langle x|e^{-\beta\hat{H}}\hat{A}(\hat{x})\,e^{i\hat{H}t/\hbar}\hat{B}(\hat{x})\,e^{-i\hat{H}t/\hbar}|x\rangle \\
&= \frac{1}{Q(\beta)}\int dx\, dx'\, dx''\, \langle x|e^{-\beta\hat{H}}|x'\rangle a(x')\langle x'|e^{i\hat{H}t/\hbar}|x''\rangle b(x'')\langle x''|e^{-i\hat{H}t/\hbar}|x\rangle.
\end{aligned}
\tag{14.6.3}
\]

If each of the matrix elements $\langle x|e^{-\beta\hat{H}}|x'\rangle$, $\langle x'|e^{i\hat{H}t/\hbar}|x''\rangle$, and $\langle x''|e^{-i\hat{H}t/\hbar}|x\rangle$ were expressed as path integrals, we would interpret eqn. (14.6.3) as follows: Starting at $x$, propagate along a real-time path to the point $x''$ using the propagator $\exp(-i\hat{H}t/\hbar)$ and evaluate the eigenvalue $b(x'')$ of $\hat{B}$ at that point; from $x''$, propagate backward in real time using the propagator $\exp(i\hat{H}t/\hbar)$ to the point $x'$ and evaluate the eigenvalue $a(x')$ of $\hat{A}$; finally, propagate in imaginary time using the propagator $\exp(-\beta\hat{H})$ from $x'$ to the original starting point $x$. This is represented schematically in Fig. 14.6(a). Unfortunately, standard Monte Carlo or molecular dynamics schemes cannot be used to compute the two real-time paths because the matrix elements of $\exp(\pm i\hat{H}t/\hbar)$ are not positive-definite, and the sampling schemes of Section 7.3 break down. A possible alternative could be to devise a molecular dynamics approach with complex variables (Gausterer and Klauder, 1986; Lee, 1994; Berges et al., 2007), although no known approach is stable enough to guarantee convergence of a path integral with a purely imaginary discretized action functional.

$^4$ For the remainder of this chapter, we will drop the “0” subscript on the Hamiltonian, since it is assumed that $\hat{H}$ represents the unperturbed Hamiltonian.

Fig. 14.6 (a) Diagram of the real- and imaginary-time paths for the correlation function in eqn. (14.6.3). (b) Same for the time correlation function in eqn. (14.6.4).

Before proceeding, we note that there are two alternative quantum time correlation functions that have important advantages over $C^\beta_{AB}(t)$. The first is a symmetrized correlation function $G^\beta_{AB}(t)$ defined by

\[
G^\beta_{AB}(t) = \frac{1}{Q(N,V,T)}\mathrm{Tr}\left[\hat{A}\,e^{i\hat{H}\tau_c^*/\hbar}\hat{B}\,e^{-i\hat{H}\tau_c/\hbar}\right]. \tag{14.6.4}
\]

Here τc is a complex time variable given by τc = t − iβ~/2. Although not equal to β CAB (t), the Fourier transform of GβAB (t) Z ∞ 1 β ˜ GAB (ω) = √ dte−iωt GβAB (t) (14.6.5) 2π −∞ β is related to the Fourier transform of CAB (t) by β ˜ β (ω), C˜AB (ω) = eβ~ω/2 G AB

(14.6.6)

which provides a straightforward route to the determination of a spectrum, assuming $G^{\beta}_{AB}(t)$ can be calculated. Equation (14.6.6) can be easily proved by performing the traces in the basis of energy eigenstates (see Problem 14.2). The advantage of $G^{\beta}_{AB}(t)$ over $C^{\beta}_{AB}(t)$ can be illustrated for a single particle in one dimension. We assume, again, that $\hat{A}$ and $\hat{B}$ are functions only of $\hat{x}$ and compute the trace in the coordinate basis, which gives

$$\begin{aligned} G^{\beta}_{AB}(t) &= \frac{1}{Q(\beta)}\int dx\, \langle x|\hat{A}(\hat{x})\,e^{i\hat{H}\tau_c^{*}/\hbar}\,\hat{B}(\hat{x})\,e^{-i\hat{H}\tau_c/\hbar}|x\rangle \\ &= \frac{1}{Q(\beta)}\int dx\, dx'\, \langle x|e^{i\hat{H}\tau_c^{*}/\hbar}|x'\rangle\, b(x')\, \langle x'|e^{-i\hat{H}\tau_c/\hbar}|x\rangle\, a(x). \end{aligned} \tag{14.6.7}$$

If the two matrix elements $\langle x|e^{i\hat{H}\tau_c^{*}/\hbar}|x'\rangle$ and $\langle x'|e^{-i\hat{H}\tau_c/\hbar}|x\rangle$ are represented as path integrals, the interpretation of eqn. (14.6.8) is clear. We start at $x$, calculate the eigenvalue $a(x)$ of $\hat{A}$, propagate along a complex time path to $x'$, calculate the eigenvalue $b(x')$ of $\hat{B}$, and then propagate back to $x$ along a complex conjugate time path. This process is represented schematically in Fig. 14.6(b). Since the two matrix elements in eqn. (14.6.8) are complex conjugates, and since $a(x)$ and $b(x')$ are both real, $G^{\beta}_{AB}(t)$ is, itself, a real object. More importantly, in contrast to $C^{\beta}_{AB}(t)$, the complex time paths needed to represent the two matrix elements in eqn. (14.6.8) have oscillatory phases, but they also have positive-definite weights, which tends to make them somewhat better behaved numerically. If each matrix element in eqn. (14.6.8) is discretized into paths of $P$ points, then $G^{\beta}_{AB}(t)$ can be written as the limit of a discretized path integral of the form

$$G^{\beta}_{AB,P}(t) = \frac{1}{Q(\beta)}\int dx_1 \cdots dx_{2P}\; a(x_1)\, b(x_{P+1})\, \rho(x_1,\ldots,x_{2P})\, e^{i\Phi(x_1,\ldots,x_{2P})} \tag{14.6.8}$$

(Krilov et al., 2001), where $\rho(x_1,\ldots,x_{2P})$ is a positive-definite distribution given by

$$\rho(x_1,\ldots,x_{2P}) = \left(\frac{mP}{2\pi|\tau_c|\hbar}\right)^{P} \exp\left[-\frac{mP\beta}{4|\tau_c|^2}\sum_{k=1}^{2P}(x_{k+1}-x_k)^2 - \frac{\beta}{2P}\sum_{k=1}^{2P}U(x_k)\right]. \tag{14.6.9}$$

Here, $x_{2P+1} = x_1$ due to the trace condition, and $\Phi(x_1,\ldots,x_{2P})$ is a phase factor defined by

$$\Phi(x_1,\ldots,x_{2P}) = \frac{mPt}{2\hbar|\tau_c|^2}\left[\sum_{k=1}^{P}(x_{k+1}-x_k)^2 - \sum_{k=P+1}^{2P}(x_{k+1}-x_k)^2\right] - \frac{t}{\hbar P}\left[\sum_{k=2}^{P}U(x_k) - \sum_{k=P+2}^{2P}U(x_k)\right]. \tag{14.6.10}$$
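The weight and phase of eqns. (14.6.9) and (14.6.10) can be evaluated for a given closed path with a few lines of code. The following is a minimal sketch, not a production implementation: the function name `log_rho_and_phase`, the 0-based list layout (`x[j]` holds $x_{j+1}$), and the default units $m = \hbar = 1$ are illustrative assumptions, and the kinetic prefactor $mP\beta/(4|\tau_c|^2)$ is written so that it agrees with the definition $\alpha = mP\beta/(8|\tau_c|^2)$ used later in this section.

```python
import math

def log_rho_and_phase(x, t, beta, U, m=1.0, hbar=1.0):
    """Return (ln rho, Phi) for a closed path x_1..x_2P stored as x[0..2P-1]."""
    n = len(x)                                    # n = 2P beads
    P = n // 2
    tau2 = t * t + (beta * hbar / 2.0) ** 2       # |tau_c|^2 for tau_c = t - i*beta*hbar/2
    # squared link lengths with the cyclic condition x_{2P+1} = x_1
    link2 = [(x[(j + 1) % n] - x[j]) ** 2 for j in range(n)]
    log_rho = (P * math.log(m * P / (2.0 * math.pi * math.sqrt(tau2) * hbar))
               - m * P * beta / (4.0 * tau2) * sum(link2)
               - beta / (2.0 * P) * sum(U(xk) for xk in x))
    # first P links minus last P links (kinetic part of the phase)
    kin = sum(link2[:P]) - sum(link2[P:])
    # interior beads x_2..x_P minus x_{P+2}..x_{2P}; the endpoints x_1, x_{P+1} drop out
    pot = (sum(U(x[j]) for j in range(1, P))
           - sum(U(x[j]) for j in range(P + 1, n)))
    phase = m * P * t / (2.0 * hbar * tau2) * kin - t / (hbar * P) * pot
    return log_rho, phase
```

Two quick checks are useful: the phase vanishes identically at $t = 0$, and it also vanishes for any path whose second half retraces the first half in reverse, since the two bracketed differences in eqn. (14.6.10) then cancel term by term.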

In the limit P → ∞, GβAB,P (t) = GβAB (t). Note that the same path variables define ρ and Φ, demonstrating explicitly that the paths have a positive-definite weight as well as a phase factor. Moreover, because GβAB,P (t) is real, the imaginary part of exp(iΦ) must vanish. The fact that the paths in GβAB (t) have positive-definite weights allows novel Monte-Carlo schemes to be devised for computing this function (Jadhao and Makri, 2008).


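The second alternative time correlation function is the Kubo-transformed correlation function (Kubo et al., 1985), defined by

$$K^{\beta}_{AB}(t) = \frac{1}{\beta Q(N,V,T)}\int_{0}^{\beta} d\lambda\; \mathrm{Tr}\left[e^{-(\beta-\lambda)\hat{H}}\,\hat{A}\,e^{-\lambda\hat{H}}\,e^{i\hat{H}t/\hbar}\,\hat{B}\,e^{-i\hat{H}t/\hbar}\right]. \tag{14.6.11}$$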

Like $G^{\beta}_{AB}(t)$, $K^{\beta}_{AB}(t)$ is also purely real. In addition, $K^{\beta}_{AB}(t)$ reduces to its classical counterpart both in the classical ($\beta \to 0$) and harmonic limits. Consequently, $K^{\beta}_{AB}(t)$ can be more readily compared to corresponding classical and harmonic time correlation functions, which can be computed straightforwardly. As with $G^{\beta}_{AB}(t)$, there is a simple relationship between the Fourier transforms of $K^{\beta}_{AB}(t)$ and $C^{\beta}_{AB}(t)$:

$$\tilde{C}^{\beta}_{AB}(\omega) = \left[\frac{\beta\hbar\omega}{1 - e^{-\beta\hbar\omega}}\right]\tilde{K}^{\beta}_{AB}(\omega). \tag{14.6.12}$$
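The relations in eqns. (14.6.6) and (14.6.12) can be checked numerically in the basis of energy eigenstates, where each pair of levels $(n, m)$ contributes a term proportional to $A_{nm}B_{mn}e^{i\omega t}$ with $\omega = (E_m - E_n)/\hbar$ to all three correlation functions, differing only in its thermal weight. The sketch below computes these weights; the function name `thermal_weights`, the omission of the common factor $1/Q$, and the use of Simpson quadrature for the $\lambda$ integral are illustrative choices rather than anything prescribed by the text.

```python
import math

def thermal_weights(E_n, E_m, beta, hbar=1.0, n_quad=2000):
    """Weights multiplying A_nm B_mn exp(i*w*t) in C, G and K (factor 1/Q omitted)."""
    w = (E_m - E_n) / hbar
    c = math.exp(-beta * E_n)                    # standard correlation function
    g = math.exp(-0.5 * beta * (E_n + E_m))      # symmetrized, tau_c = t - i*beta*hbar/2
    # Kubo weight: (1/beta) * int_0^beta dlambda exp(-(beta - lambda)*E_n - lambda*E_m),
    # evaluated with the composite Simpson rule (n_quad must be even)
    h = beta / n_quad
    f = lambda lam: math.exp(-(beta - lam) * E_n - lam * E_m)
    s = f(0.0) + f(beta)
    for j in range(1, n_quad):
        s += (4.0 if j % 2 else 2.0) * f(j * h)
    k = s * h / (3.0 * beta)
    return w, c, g, k
```

For any pair of levels, the ratio `c / g` should equal $e^{\beta\hbar\omega/2}$ and the ratio `c / k` should equal $\beta\hbar\omega/(1 - e^{-\beta\hbar\omega})$, which is the content of the two Fourier-space relations.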

Finally, we note that purely imaginary-time correlation functions of the form

$$G^{\beta}_{AB}(\tau) = \frac{1}{Q(N,V,T)}\,\mathrm{Tr}\left[e^{-\beta\hat{H}}\,\hat{A}\,e^{-\tau\hat{H}}\,\hat{B}\,e^{\tau\hat{H}}\right] \tag{14.6.13}$$

can be computed straightforwardly using the numerical techniques for imaginary-time path integrals. Equation (14.6.13) results from eqn. (14.6.1) when the Wick rotation from real to imaginary time is applied (see Section 12.2 and Fig. 12.5). An example of such a correlation function is the imaginary-time mean square displacement given by

$$R^2(\tau) = \left\langle |\hat{x}(\tau) - \hat{x}(0)|^2 \right\rangle, \tag{14.6.14}$$

which for $N$ particles in three dimensions becomes

$$R^2(\tau) = \frac{1}{N}\sum_{i=1}^{N}\left\langle \left[\hat{\mathbf{r}}_i(\tau) - \hat{\mathbf{r}}_i(0)\right]^2 \right\rangle. \tag{14.6.15}$$

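Here, $\tau \in [0, \beta\hbar/2]$ ($R^2(\tau)$ is symmetric about $\tau = \beta\hbar/2$), and for a free particle, its shape is an inverted parabola, $R^2(\tau) \propto \tau(\beta\hbar - \tau)$, where the constant of proportionality depends on the number of dimensions (see Problem 14.12). This important quantity is related to the real-time velocity autocorrelation function $C^{\beta}_{vv}(t)$ (more precisely, its Fourier transform $\tilde{C}^{\beta}_{vv}(\omega)$) via a two-sided Laplace transform

$$R^2(\tau) = \frac{1}{\pi}\int_{-\infty}^{\infty} d\omega\; \frac{e^{-\beta\hbar\omega/2}}{\omega^2}\,\tilde{C}^{\beta}_{vv}(\omega) \left\{\cosh\left(\frac{\beta\hbar\omega}{2}\right) - \cosh\left[\omega\left(\frac{\hbar\beta}{2} - \tau\right)\right]\right\}. \tag{14.6.16}$$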
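As a concrete illustration of eqn. (14.6.16), the sketch below reconstructs $R^2(\tau)$ from a spectrum tabulated on a frequency grid using the trapezoidal rule. The names `msd_kernel` and `reconstruct_msd`, the grid-based input format, and the default $\hbar = 1$ are assumptions made for this example. The difference of hyperbolic cosines is rewritten with the identity $\cosh a - \cosh b = 2\sinh[(a+b)/2]\sinh[(a-b)/2]$ to avoid cancellation at small $\omega$, where the kernel tends to $\tau(\beta\hbar - \tau)/2$.

```python
import math

def msd_kernel(w, tau, beta, hbar=1.0):
    """Kernel exp(-bh*w/2) * [cosh(bh*w/2) - cosh(w*(bh/2 - tau))] / w^2, with bh = beta*hbar."""
    bh = beta * hbar
    if abs(w) < 1e-8:
        return 0.5 * tau * (bh - tau)            # limit of the kernel as w -> 0
    return (2.0 * math.exp(-0.5 * bh * w)
            * math.sinh(0.5 * w * (bh - tau))
            * math.sinh(0.5 * w * tau) / (w * w))

def reconstruct_msd(omegas, spectrum, tau, beta, hbar=1.0):
    """Trapezoidal estimate of R^2(tau) from samples of the velocity spectrum."""
    vals = [c * msd_kernel(w, tau, beta, hbar) for w, c in zip(omegas, spectrum)]
    total = 0.0
    for j in range(len(omegas) - 1):
        total += 0.5 * (vals[j] + vals[j + 1]) * (omegas[j + 1] - omegas[j])
    return total / math.pi
```

For a spectrum concentrated near $\omega = 0$, as for a free particle, the reconstruction reduces to a multiple of $\tau(\beta\hbar - \tau)$, which gives a simple sanity check of an implementation.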

Equation (14.6.16) suggests that performing the Wick rotation from real to imaginary time is a well-posed problem that requires a Fourier transform followed by a Laplace transform (see Appendix E). Unfortunately, the reverse process, transforming from imaginary time back to real time, requires an inverse Laplace transform, which is an extremely ill-posed problem numerically (see, for example, the discussion by Epstein and Schotland (2008)). This is the primary reason that the analytic continuation of imaginary-time data to real-time data is such an immense challenge (Krilov et al., 2001).

Before we discuss approximation schemes for quantum time correlation functions, we need to point out that quantum effects in condensed-phase systems are sometimes squelched due to pronounced decoherence effects. In this case, off-diagonal elements of the density matrix $\exp(-\beta\hat{H})$ tend to be small for large $|x - x'|$, and consequently, the sums over forward and backward real-time paths are not appreciably different. This means that there is considerable cancellation between these two sums, a fact that forms the basis of a class of approximation schemes known as semiclassical methods. These include the Herman-Kluk propagator (1984), the linearized semiclassical initial value representation (Miller, 2005), the linearized Feynman-Kleinert path integral method (Poulsen et al., 2005; Hone et al., 2008), and the forward–backward approach (Nakayama and Makri, 2005), to name just a few. Although fascinating and potentially very powerful, semiclassical approaches also carry a relatively high computational overhead, and we will not discuss them further here. Rather, we will focus on two increasingly popular approximation schemes for quantum time correlation functions that are based on the use of the imaginary-time path integral. Although these schemes are somewhat ad hoc, they have the advantage of being computationally inexpensive and straightforward to implement. They must, however, be used with care because there is no rigorous basis for this class of methods; because of this, we will also introduce a procedure for checking the accuracy of their results.

14.6.1 Centroid molecular dynamics

In 1993, J. Cao and G. A. Voth introduced the centroid molecular dynamics (CMD) method as an approximate technique for computing real-time quantum correlation functions. The primary object in this approach is the path centroid defined in eqn. (12.8.22). Toward the end of Section 12.8.1, we briefly discussed the centroid potential of mean force (Feynman and Kleinert, 1986); the CMD approach is rooted in this concept and in an idea put forth by Gillan (1987) for obtaining approximate quantum rate constants from the centroid density along a reaction coordinate. CMD is based on the notion that the time evolution of the centroid on this potential of mean force surface can be used to garner approximate quantum dynamical properties of a system. In CMD, the centroid for a single particle in one dimension, denoted here as $x_c$, is postulated to evolve in time according to the following equations of motion:

$$\dot{x}_c = \frac{p_c}{m}, \qquad \dot{p}_c = -\frac{dU_0(x_c)}{dx_c} \equiv F_0(x_c) \tag{14.6.17}$$

(Cao and Voth, 1994a; Cao and Voth, 1996), where $m$ is the physical mass, $p_c$ is a momentum conjugate to $x_c$, and $U_0(x_c)$ is the centroid potential of mean force given by

$$U_0(x_c) = -\frac{1}{\beta}\ln\left\{\left(\frac{2\pi\beta\hbar^2}{m}\right)^{1/2}\oint \mathcal{D}x(\tau)\; \delta(x_0[x(\tau)] - x_c)\, e^{-S[x(\tau)]/\hbar}\right\}, \tag{14.6.18}$$

where $x_0[x(\tau)] = (1/\beta\hbar)\int_0^{\beta\hbar} d\tau\, x(\tau)$. In eqn. (14.6.18), $S[x(\tau)]$ is the Euclidean time action, and the $\delta$-function restricts the functional integration to cyclic paths whose centroid position is $x_c$. Note that eqn. (12.8.24), in the limit $P \to \infty$, is equivalent to eqn. (14.6.18). Of course, in actual calculations, we use the discretized, finite-$P$ version in eqn. (12.8.24). The centroid force at $x_c$, $F_0(x_c)$, is derived from eqn. (14.6.18) simply by spatial differentiation:

$$F_0(x_c) = -\frac{\oint \mathcal{D}x(\tau)\; \delta(x_0[x(\tau)] - x_c)\left[\frac{1}{\beta\hbar}\int_0^{\beta\hbar} d\tau'\, U'(x(\tau'))\right] e^{-S[x(\tau)]/\hbar}}{\oint \mathcal{D}x(\tau)\; \delta(x_0[x(\tau)] - x_c)\, e^{-S[x(\tau)]/\hbar}}. \tag{14.6.19}$$

In a path integral molecular dynamics or Monte Carlo calculation, the centroid force would be computed simply from

$$F_0(x_c) = -\left\langle \frac{1}{P}\sum_{k=1}^{P}\frac{\partial U}{\partial x_k}\; \delta\left(\frac{1}{P}\sum_{k=1}^{P}x_k - x_c\right)\right\rangle_f. \tag{14.6.20}$$
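Once a routine that returns the centroid force $F_0(x_c)$ is available, eqns. (14.6.17) are ordinary Hamiltonian equations and can be integrated with any standard scheme. The following sketch uses velocity Verlet; the function name `propagate_centroid` and the `force` callback are hypothetical, and in a real CMD calculation `force` would wrap a path integral estimate of $F_0$ rather than an analytic expression.

```python
def propagate_centroid(xc, pc, force, m, dt, n_steps):
    """Velocity Verlet integration of eqns. (14.6.17); returns the centroid positions."""
    traj = [xc]
    f = force(xc)                 # F_0 at the initial centroid position
    for _ in range(n_steps):
        pc += 0.5 * dt * f        # half-step momentum update
        xc += dt * pc / m         # full-step position update
        f = force(xc)             # centroid force at the new position
        pc += 0.5 * dt * f        # second half-step momentum update
        traj.append(xc)
    return traj
```

For a harmonic potential, the centroid potential of mean force is itself harmonic with the same frequency, so calling the routine with `force = lambda x: -x` and unit mass should reproduce $x_c(t) = x_c(0)\cos t$ for zero initial momentum.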

Although formally exact within the CMD framework, eqns. (14.6.18), (14.6.19), and (14.6.20) are of limited practical use: Their evaluation entails a full path integral calculation at each centroid configuration, which is computationally very demanding for complex systems. In order to alleviate this computational burden, an adiabatic approximation, similar to that of Section 8.10, can be employed (Cao and Voth, 1994b; Cao and Martyna, 1996). In this approach, an ordinary imaginary-time path integral molecular dynamics calculation in the normal-mode representation of eqn. (12.8.18) is performed with two small modifications. First, the noncentroid modes are assigned masses that are significantly lighter than the centroid mass(es) so as to effect an adiabatic decoupling between the two sets of modes. According to the analysis of Section 8.10, this allows the centroid potential of mean force to be generated “on the fly” as the CMD simulation is carried out. The decoupling is achieved by introducing an adiabaticity parameter $\gamma^2$ ($0 < \gamma^2 < 1$), which is used to scale the fictitious kinetic masses of the internal modes according to $m'_k = \gamma^2 m \lambda_k$ and thereby accelerate their dynamics. Second, while the path integral molecular dynamics schemes of Section 12.8 employ thermostats on every degree of freedom, in the adiabatic CMD approach, because we require the actual dynamics of the centroids, only the noncentroid modes are coupled to thermostats.

The key assumption of CMD is that the Kubo-transformed quantum time correlation function $K^{\beta}_{AB}(t)$ of eqn. (14.6.11), for operators $\hat{A}$ and $\hat{B}$ that are functions of $\hat{x}$, can be approximated by

$$K^{\beta}_{AB}(t) \approx \frac{1}{Q(\beta)}\int dx_c\, dp_c\; a(x_c)\, b(x_c(t; x_c, p_c)) \exp\left\{-\beta\left[\frac{p_c^2}{2m} + U_0(x_c)\right]\right\}. \tag{14.6.21}$$

Here, the function $b(x_c(t; x_c, p_c))$ is evaluated using the time-evolved centroid variables generated by eqns. (14.6.17), starting from $\{x_c, p_c\}$ as initial conditions. An analogous definition holds for operators $\hat{A}$ and $\hat{B}$ that are functions of momentum only. As discussed by Hernandez et al. (1995), eqn. (14.6.21) can be generalized for operators that are functions of both position and momentum using a procedure known as “Weyl operator ordering” (Weyl, 1927; Hillery et al., 1984), which we alluded to in Section 9.2.5 (see eqn. (9.2.51)). CMD is exact in the classical limit and in the limit of a purely harmonic potential.

14.6.2 Ring polymer molecular dynamics

The method known as ring polymer molecular dynamics (RPMD), originally introduced by Craig and Manolopoulos (2004), is motivated by the primitive path integral algorithm of eqn. (12.8.5). Craig and Manolopoulos posited that the primitive equations of motion could be used to extract approximate real-time information. Indeed, like CMD, the dynamics generated by eqns. (12.8.5) possess the correct harmonic and classical limits. The principal features that distinguish RPMD from CMD are threefold. First, the RPMD fictitious masses are chosen such that each imaginary-time slice or bead has the physical mass $m$. Second, RPMD uses the full chain to approximate time correlation functions. Thus, a quantum observable $\hat{A}(\hat{x})$ is assumed to evolve in time according to

$$A_P(t) = \frac{1}{P}\sum_{k=1}^{P} a(x_k(t)). \tag{14.6.22}$$

RPMD approximates the Kubo-transformed time correlation function $K^{\beta}_{AB}(t)$ as

$$K^{\beta}_{AB}(t) \approx \frac{1}{(2\pi\hbar)^P Q_P(\beta)}\int d^Px\, d^Pp\; A_P(0)\, B_P(t)\, e^{-\beta_P H_{\mathrm{cl},P}(x,p)}, \tag{14.6.23}$$

where $\beta_P = \beta/P$ (RPMD simulations are typically carried out at $P$ times the actual temperature) and

$$H_{\mathrm{cl},P}(x,p) = \sum_{k=1}^{P}\frac{p_k^2}{2m} + \frac{m}{2\beta_P^2\hbar^2}\sum_{k=1}^{P}(x_k - x_{k+1})^2 + \sum_{k=1}^{P}U(x_k) \tag{14.6.24}$$

with $x_{P+1} = x_1$. Note that the harmonic bead-coupling and potential energy terms are taken to be $P$ times larger than their counterparts in eqn. (12.8.4). We adopt this convention for consistency with Craig and Manolopoulos (2004); it amounts to nothing more than a rescaling of the temperature from $T$ to $PT$. For operators linear in position or momentum, the CMD and RPMD representations of observables are the same; however, they generally differ for functions that are nonlinear in these variables. The third difference is that RPMD is purely Newtonian. The equations of motion are easily derived from eqn. (14.6.24):

$$\dot{x}_k = \frac{p_k}{m}, \qquad \dot{p}_k = -\frac{m}{\beta_P^2\hbar^2}\left[2x_k - x_{k-1} - x_{k+1}\right] - \frac{\partial U}{\partial x_k}. \tag{14.6.25}$$
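Equations (14.6.25) translate directly into code. The sketch below implements the ring-polymer forces, the Hamiltonian of eqn. (14.6.24), and one velocity Verlet step in primitive variables for a single particle in one dimension. The function names, the list-based bead storage, and the choice of velocity Verlet are assumptions made for illustration; practical RPMD codes often propagate the free ring polymer analytically in normal modes instead.

```python
def rpmd_forces(x, dU, m, beta_P, hbar=1.0):
    """Right-hand side of the momentum equation in eqns. (14.6.25) for all beads."""
    P = len(x)
    k = m / (beta_P * hbar) ** 2          # bead-coupling force constant
    # x[j - 1] wraps to the last bead for j = 0; (j + 1) % P wraps to the first
    return [-k * (2.0 * x[j] - x[j - 1] - x[(j + 1) % P]) - dU(x[j])
            for j in range(P)]

def rpmd_energy(x, p, U, m, beta_P, hbar=1.0):
    """Ring-polymer Hamiltonian of eqn. (14.6.24)."""
    P = len(x)
    k = m / (beta_P * hbar) ** 2
    return (sum(pj * pj for pj in p) / (2.0 * m)
            + 0.5 * k * sum((x[j] - x[(j + 1) % P]) ** 2 for j in range(P))
            + sum(U(xj) for xj in x))

def rpmd_step(x, p, dU, m, beta_P, dt, hbar=1.0):
    """One velocity Verlet step; returns the updated (x, p) as new lists."""
    f = rpmd_forces(x, dU, m, beta_P, hbar)
    p = [pj + 0.5 * dt * fj for pj, fj in zip(p, f)]
    x = [xj + dt * pj / m for xj, pj in zip(x, p)]
    f = rpmd_forces(x, dU, m, beta_P, hbar)
    p = [pj + 0.5 * dt * fj for pj, fj in zip(p, f)]
    return x, p
```

Because the bead-coupling forces sum to zero, the bead average for a harmonic potential follows the classical harmonic trajectory, which, together with conservation of $H_{\mathrm{cl},P}$, provides a convenient test of the integrator.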

In this dynamics, no thermostats are used on any of the beads since all beads are treated as dynamical variables. As discussed in Section 13.2.1, however, the use of eqn. (14.6.23) assumes that the distribution $\exp[-\beta_P H_{\mathrm{cl},P}(x,p)]$ can be adequately sampled, and as Section 12.8 makes clear, this requires some care. Thus, RPMD is optimally implemented by performing a fully thermostatted path integral molecular dynamics calculation in staging or normal modes. From this trajectory, path configurations are periodically transformed back to primitive variables and saved. From these saved path configurations, independent RPMD trajectories are initiated, and these trajectories are then used to compute the approximate Kubo-transformed correlation function. Detailed discussions on the emergence of RPMD from a formally exact path integral dynamics can be found in the work of Hele et al. (2015a, 2015b).

14.6.3 Self-consistent quality control of time correlation functions

Since the introduction of the CMD and RPMD methods, several technical issues have been noted concerning their use in computing vibrational spectra (Witt et al., 2009): Delocalization of ring polymers can cause the centroid position to be unphysical (see Section 12.7). Potentials such as harmonic bonds, which are functions of the length $r$ of the bond, are spherical in nature, and as illustrated in Fig. 12.11, when ring polymers become delocalized over the surface, the centroid may lie inside the sphere. The effect on a vibrational spectrum, such as an infrared spectrum, computed using CMD can be a significant red-shift of the associated stretch frequency, especially at low temperatures (Witt et al., 2009; Paesani and Voth, 2010). For RPMD calculations, unphysical mode frequencies of the ring polymer can interfere with the physical vibrational frequencies and cause unphysical peaks to appear in the spectrum (Witt et al., 2009). As a solution to these problems, Rossi et al. (2014) suggested that performing an RPMD simulation with a thermostat on the internal ring-polymer modes can, with proper choice of the thermostat type and thermostat parameters, correct the spurious effects inherent in both CMD and RPMD that degrade the quality of vibrational spectra.

Technical issues such as those discussed in the preceding paragraph suggest that the quality of CMD and RPMD correlation functions is often difficult to assess. Therefore, it is important to have an internal consistency check for these predicted time correlation functions. The measure we will propose allows the inherent accuracy of the CMD or RPMD approximation to be evaluated for a given model without having to rely on experimental data as the final arbiter.

Recall that a CMD or RPMD simulation yields an approximation to the Kubo-transformed time correlation function. Consider, for example, the velocity autocorrelation function $C^{\beta}_{vv}(t)$ and its Fourier transform $\tilde{C}^{\beta}_{vv}(\omega)$. Let $\tilde{C}^{\beta,(\mathrm{est})}_{vv}(\omega)$ denote a CMD or RPMD approximation to $\tilde{C}^{\beta}_{vv}(\omega)$. Equation (14.6.16) allows us to reconstruct the associated imaginary-time correlation function $\bar{R}^2(\tau)$ from $\tilde{C}^{\beta,(\mathrm{est})}_{vv}(\omega)$. The approximation $\bar{R}^2(\tau)$ can then be compared directly to the numerically exact mean square displacement function $R^2(\tau)$ computed from the same simulation (see eqn. (14.6.15)). Pérez et al. (2009) suggested that a dimensionless quantitative descriptor for the quality of an approach (CMD/RPMD) can be defined by

$$\chi^2 = \frac{1}{\beta\hbar}\int_{0}^{\beta\hbar} d\tau \left[\frac{\bar{R}^2(\tau) - R^2(\tau)}{R^2(\tau)}\right]^2. \tag{14.6.26}$$
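The descriptor in eqn. (14.6.26) is a one-dimensional integral and is simple to evaluate once both mean square displacements are available. The sketch below uses a midpoint rule so that the integrand is never evaluated at $\tau = 0$ or $\tau = \beta\hbar$, where $R^2(\tau)$ vanishes; the function name `chi2_descriptor`, the callable inputs, and the midpoint quadrature are illustrative assumptions, and in practice both functions would be tabulated on the imaginary-time grid of the simulation.

```python
def chi2_descriptor(r2_exact, r2_est, beta_hbar, n=200):
    """Midpoint-rule estimate of chi^2; r2_exact and r2_est are callables of tau."""
    h = beta_hbar / n
    total = 0.0
    for j in range(n):
        tau = (j + 0.5) * h                       # midpoints avoid tau = 0 and beta*hbar
        ref = r2_exact(tau)
        total += ((r2_est(tau) - ref) / ref) ** 2
    return total * h / beta_hbar
```

As a usage example, if the reconstructed function overestimates the exact one by a uniform 10%, the routine returns $\chi^2 = 0.01$, independent of the shape of $R^2(\tau)$.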


Fig. 14.7 Kubo-transformed position autocorrelation function for a mildly anharmonic potential U (x) = x2 /2+0.1x3 +0.01x4 at inverse temperatures β = 1 (top) and β = 8 (bottom).

If CMD or RPMD were able to generate exact quantum time correlation functions, then $\chi^2$ would be exactly zero. Thus, the larger $\chi^2$, the poorer is the CMD or RPMD approximation to the true correlation function.

As illustrative examples of the CMD and RPMD schemes and the error measure in eqn. (14.6.26), we consider two one-dimensional systems with potentials given by $U(x) = x^2/2 + 0.1x^3 + 0.01x^4$ and $U(x) = x^4/4$. These potentials are simulated at inverse temperatures of $\beta = 1$ and $\beta = 8$ with $P = 8$ and $P = 32$ beads, respectively. For the CMD simulations, the adiabaticity parameter is $\gamma^2 = 0.005$. Figs. 14.7 and 14.8 show the Kubo-transformed time correlation functions $K^{\beta}_{xx}(t)$ for these two problems, respectively, comparing CMD and RPMD to the exact correlation functions, which are available for these one-dimensional examples via numerical matrix multiplication (Thirumalai et al., 1983). Since the first potential is very close to harmonic, we expect CMD and RPMD to perform well compared to the exact correlation functions, which, as Fig. 14.7 shows, they do. For the strongly anharmonic potential $U(x) = x^4/4$, both methods are poor approximations to the exact correlation function. We notice, however, that the results improve at the higher temperature (lower $\beta$) for the mildly anharmonic potential, which is expected, as the higher temperature is closer to the classical limit. This trend is consistent with a study by Witt et al. (2009), who found significant deviations of vibrational spectra from the correct results at low temperatures. In particular, in a subsequent study by Ivanov et al. (2010), CMD was shown to produce severe artificial red shifts in high-frequency regions of vibrational spectra (however, see Paesani and Voth (2010)). For the quartic potential, the results are actually worse at high temperature, indicating that at low temperature (high $\beta$), the quartic potential is closer to the harmonic limit for which CMD and RPMD are exact.

Fig. 14.8 Kubo-transformed position autocorrelation function for a quartic potential $U(x) = x^4/4$ at inverse temperatures $\beta = 1$ (top) and $\beta = 8$ (bottom).

The second example is a more realistic one of fluid para-hydrogen, described by a potential model of Silvera and Goldman (1978). Following Miller and Manolopoulos (2005) and Hone et al. (2006), the system is simulated at a temperature of $T = 14$ K, a density of $\rho = 0.0234$ Å$^{-3}$, and $N = 256$ molecules subject to periodic boundary conditions. In addition, we take $P = 32$ beads to discretize the path integral, and for CMD, the adiabaticity parameter is taken to be $\gamma^2 = 0.0444$. Figure 14.9(a) shows the Kubo-transformed velocity autocorrelation functions for this system from CMD and RPMD. The two methods appear to be in excellent agreement with each other. In fact, if these velocity autocorrelation functions are used to compute the diffusion constant using the Green-Kubo theory in eqn. (13.3.35), we obtain $D = 0.306$ Å$^2$/ps for CMD and $D = 0.263$ Å$^2$/ps for RPMD, both of which are in reasonable agreement with the experimental value of 0.4 Å$^2$/ps (Miller and Manolopoulos, 2005). Interestingly, we see that the correlation function of this condensed-phase system decays to zero in a short time, something which is not uncommon in the condensed phase at finite temperature. In Fig. 14.9(b), we show the imaginary-time mean-square displacements $R^2(\tau)$ computed directly from an imaginary-time path integral calculation and estimated from the CMD and RPMD approximate real-time correlation functions. Both approximations miss the true imaginary-time data, particularly in the peak region around $\tau = \beta\hbar/2$. The $\chi^2$ error measures are 0.0089 for RPMD and 0.0056 for CMD. Interestingly, although Braams and Manolopoulos (2006) showed that RPMD is a more accurate approach at very short times, CMD seems to give a slightly better approximation to the true correlation function overall.

Fig. 14.9 (a) Velocity autocorrelation functions for para-hydrogen at $T = 14$ K for CMD and RPMD simulations. (b) Exact imaginary-time mean-square displacements and imaginary-time mean-square displacements reconstructed from the approximate CMD and RPMD real-time correlation functions in part (a).

14.6.4 Approaching exact quantum time correlation functions

The CMD and RPMD methods were originally proposed as ansätze for approximating the Kubo-transformed time correlation function, and as such, they do not start from the exact expression for $K^{\beta}_{AB}(t)$. In this section, we will discuss a framework, termed the open-chain symmetrized correlation function (OPSCF) method, for recasting the symmetrized time correlation function $G^{\beta}_{AB}(t)$ in a way that renders it as a formally exact sampling problem from a positive-definite distribution. We first prove the existence of this positive-definite distribution and then illustrate an approach for approximating the distribution. It is worth noting that since existence of the distribution can be proved, other routes to approximating it likely exist, and we leave this as an open challenge in this area.


In the primitive coordinates, $x_1, \ldots, x_{2P}$, of a single particle in one dimension, the symmetrized quantum time correlation function is given by the expressions in eqns. (14.6.8) through (14.6.10). As can be discerned from the phase definition in eqn. (14.6.10), the points $x_1, \ldots, x_P$ represent the discretization of a path that is propagated forward in complex time via the propagator $\exp(-i\hat{H}\tau_c/\hbar)$, while $x_{P+1}, \ldots, x_{2P}$ represent the discretization of a path that is propagated backward in complex time via the propagator $\exp(i\hat{H}\tau_c^{*}/\hbar)$. In condensed phases, where significant quantum decoherence is often anticipated, such decoherence will cause the difference between the forward and backward paths to be small. Thus, we can draw inspiration from semiclassical methods and introduce a change of variables from $x_1, \ldots, x_{2P}$ to a new set, denoted $r_1, \ldots, r_{P+1}, s_2, \ldots, s_P$ (Bonella et al., 2010), via

$$\begin{aligned} r_1 &= x_1, \qquad r_{P+1} = x_{P+1} \\ r_i &= \frac{1}{2}\left(x_i + x_{2(P+1)-i}\right), \qquad i = 2, \ldots, P \\ s_i &= x_i - x_{2(P+1)-i}, \qquad i = 2, \ldots, P. \end{aligned} \tag{14.6.27}$$
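The change of variables in eqn. (14.6.27) and its inverse are straightforward index manipulations. A minimal sketch follows; the function names `to_open_chain` and `from_open_chain` and the 0-based storage (`x[j]` holds $x_{j+1}$, `r[j]` holds $r_{j+1}$, and `s[j]` holds $s_{j+2}$) are assumptions of this example.

```python
def to_open_chain(x):
    """Map primitive beads x_1..x_2P to (r_1..r_{P+1}, s_2..s_P)."""
    n = len(x)                    # n = 2P
    P = n // 2
    # bead x_i (0-based index i - 1) is paired with x_{2(P+1)-i} (0-based index n - (i - 1))
    r = [x[0]] + [0.5 * (x[i] + x[n - i]) for i in range(1, P)] + [x[P]]
    s = [x[i] - x[n - i] for i in range(1, P)]
    return r, s

def from_open_chain(r, s):
    """Inverse map: rebuild x_1..x_2P from the mean path r and difference path s."""
    P = len(r) - 1
    x = [0.0] * (2 * P)
    x[0], x[P] = r[0], r[P]       # endpoints are shared by both paths
    for i in range(1, P):
        x[i] = r[i] + 0.5 * s[i - 1]
        x[2 * P - i] = r[i] - 0.5 * s[i - 1]
    return x
```

For $P = 4$, for instance, `to_open_chain` turns the eight primitive beads into five mean-path beads and three difference variables, and applying `from_open_chain` to the result recovers the original path.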

The variables $r_1$ and $r_{P+1}$ are the path endpoints, $s_2, \ldots, s_P$ represent the difference between the forward and backward paths, and $r_2, \ldots, r_P$ represent the average of the forward and backward paths. Once we develop the formalism in this new set of variables, we will exploit the fact that, in many applications, $s_2, \ldots, s_P$ are small in order to derive one possible approximation to the symmetrized correlation function. Before introducing any approximations, however, let us recast $G^{\beta}_{AB,P}(t)$ in terms of the variables $r_1, \ldots, r_{P+1} \equiv \mathbf{r}$ and $s_2, \ldots, s_P \equiv \mathbf{s}$ and determine what can be proved about the resulting expressions. In terms of $(\mathbf{r}, \mathbf{s})$, the expression for $G^{\beta}_{AB,P}(t)$ becomes

$$G^{\beta}_{AB,P}(t) = \frac{1}{Q(\beta)}\int dr_1 \cdots dr_{P+1}\; a(r_1)\, b(r_{P+1})\, I^{\beta}(\mathbf{r}; t) \exp\left[-4\alpha\sum_{i=1}^{P}(r_{i+1} - r_i)^2 - \frac{\beta}{2P}\left(U(r_1) + U(r_{P+1})\right)\right], \tag{14.6.28}$$

where

$$I^{\beta}(\mathbf{r}; t) = \int ds_2 \cdots ds_P\; f^{\beta}(\mathbf{r}, \mathbf{s}; t)\, e^{i\chi^{\beta}(\mathbf{r}, \mathbf{s}; t)}. \tag{14.6.29}$$

In this expression,

$$f^{\beta}(\mathbf{r}, \mathbf{s}; t) = \left(\frac{mP}{2\pi|\tau_c|\hbar}\right)^{P} \exp\left\{-\alpha\sum_{i=2}^{P-1}(s_{i+1} - s_i)^2 - \alpha\left(s_2^2 + s_P^2\right) - \frac{\beta}{2P}\sum_{i=2}^{P}\left[U\left(r_i + \frac{1}{2}s_i\right) + U\left(r_i - \frac{1}{2}s_i\right)\right]\right\}, \tag{14.6.30}$$

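and

$$\chi^{\beta}(\mathbf{r}, \mathbf{s}; t) = \gamma\sum_{i=2}^{P-1}(r_{i+1} - r_i)(s_{i+1} - s_i) + \gamma\left((r_2 - r_1)s_2 - (r_{P+1} - r_P)s_P\right) - \frac{t}{P\hbar}\sum_{i=2}^{P}\left[U\left(r_i + \frac{1}{2}s_i\right) - U\left(r_i - \frac{1}{2}s_i\right)\right], \tag{14.6.31}$$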

where $\alpha = mP\beta/(8|\tau_c|^2)$ and $\gamma = mPt/(\hbar|\tau_c|^2)$. Since the integral in eqn. (14.6.28) is performed over the variables $r_1, \ldots, r_{P+1}$, it is critical to note that $r_1 \neq r_{P+1}$, and hence, we can think of $r_1, \ldots, r_{P+1}$ as a discretization of an open chain derived from the average of the forward and backward complex-time paths. Amazingly, despite the presence of the complex exponential $\exp(i\chi^{\beta}(\mathbf{r}, \mathbf{s}; t))$ in eqn. (14.6.29), the function $I^{\beta}(\mathbf{r}; t)$ is actually a positive-definite function of $\mathbf{r}$. In order to prove the assertion that $I^{\beta}(\mathbf{r}; t)$ is positive-definite, note that in eqn. (14.6.28), $a(r_1) = a(x_1)$ and $b(r_{P+1}) = b(x_{P+1})$, and, therefore, a direct comparison between eqns. (14.6.28) and (14.6.8) can be made. In particular, eqn. (14.6.8) can be rewritten in the form

$$G^{\beta}_{AB,P}(t) = \frac{1}{Q(\beta)}\int dx_1\, dx_{P+1}\; a(x_1)\, b(x_{P+1}) \left|\langle x_{P+1}|e^{-i\hat{H}\tau_c/\hbar}|x_1\rangle\right|^2. \tag{14.6.32}$$

Note that, in recasting eqn. (14.6.8), we have renamed $x \to x_1$ and $x' \to x_{P+1}$ as dummy variables of integration to arrive at eqn. (14.6.32), and we have written $\langle x|e^{i\hat{H}\tau_c^{*}/\hbar}|x'\rangle\langle x'|e^{-i\hat{H}\tau_c/\hbar}|x\rangle$ as $|\langle x_{P+1}|e^{-i\hat{H}\tau_c/\hbar}|x_1\rangle|^2$, which is a positive-definite quantity. Therefore, this factor in the integrand of eqn. (14.6.32) is positive-definite. Now, in eqn. (14.6.28), the exponential factor is positive-definite. Since eqns. (14.6.28) and (14.6.32) are exactly the same, in order for the product

$$I^{\beta}(\mathbf{r}; t)\exp\left[-4\alpha\sum_{i=1}^{P}(r_{i+1} - r_i)^2 - \frac{\beta}{2P}\left(U(r_1) + U(r_{P+1})\right)\right]$$

to be positive-definite, it must follow that $I^{\beta}(\mathbf{r}; t)$ is a positive-definite quantity as well, which completes the proof.

This result implies that there exists an effective real-valued potential $\tilde{U}^{\beta}(\mathbf{r}; t)$ such that

$$I^{\beta}(\mathbf{r}; t)\exp\left[-4\alpha\sum_{i=1}^{P}(r_{i+1} - r_i)^2 - \frac{\beta}{2P}\left(U(r_1) + U(r_{P+1})\right)\right] = e^{-\beta\tilde{U}^{\beta}(\mathbf{r}; t)}. \tag{14.6.33}$$

The potential $\tilde{U}^{\beta}(\mathbf{r}; t)$ acts on an open chain consisting of $P + 1$ beads whose primitive variables are $r_1, \ldots, r_{P+1}$. Then, since $\exp(-\beta\tilde{U}^{\beta}(\mathbf{r}; t))$ is positive-definite, the calculation of $G^{\beta}_{AB,P}(t)$ is reduced to one of sampling open-chain configurations from the Boltzmann factor $\exp(-\beta\tilde{U}^{\beta}(\mathbf{r}; t))$ at different points $t$ in time and evaluating the product $a(r_1)b(r_{P+1})$ using the endpoint positions of the chain for each value of $t$.


It is important to note that since $|\tau_c| = [t^2 + (\beta\hbar)^2/4]^{1/2}$, as $t$ increases, in order to maintain the accuracy of the calculation, the number of beads in the open chain must increase as well so as to keep $|\tau_c|/P$ constant.

At this point, we have established that the effective potential $\tilde{U}^{\beta}(\mathbf{r}; t)$ exists, but we have not specified how it can be constructed either exactly or approximately. In the work of Cendagorta et al. (2018), the authors expand the terms $U(r_i \pm s_i/2)$ in eqns. (14.6.30) and (14.6.31) in a power series out to second order in $s_i$ under the assumption alluded to above that the $s_i$ are small. The expansion takes the form

$$U\left(r_i \pm \frac{s_i}{2}\right) \approx U(r_i) \pm \frac{1}{2}U'(r_i)s_i + \frac{1}{8}U''(r_i)s_i^2. \tag{14.6.34}$$

When eqn. (14.6.34) is substituted into eqns. (14.6.30) and (14.6.31), the integrals over $s_2, \ldots, s_P$ become Gaussian integrals that can be performed analytically to yield a closed-form approximation for $\tilde{U}^{\beta}(\mathbf{r}; t)$, which we know must be real despite the presence of the phase factor in eqn. (14.6.31). By working within this approximation, some of the phase information encoded in the correlation function is captured. The expression for $\tilde{U}^{\beta}(\mathbf{r}; t)$ that results from this approximation (see Problem 14.14) is

$$\tilde{U}^{\beta}(\mathbf{r}; t) = \frac{1}{2\beta}\left[\mathbf{K}^{\mathrm{T}}(\mathbf{r})\mathbf{M}^{-1}(\mathbf{r})\mathbf{K}(\mathbf{r}) + \ln\det[\mathbf{M}(\mathbf{r})]\right], \tag{14.6.35}$$

where $\mathbf{M}(\mathbf{r})$ is a $(P-1) \times (P-1)$ tridiagonal matrix given by

$$M_{ij}(\mathbf{r}) = \left[4\alpha + \frac{\beta}{4P}U''(r_i)\right]\delta_{ij} - 2\alpha\left(\delta_{i,j+1} + \delta_{i+1,j}\right), \tag{14.6.36}$$

and $\mathbf{K}(\mathbf{r})$ is a $(P-1)$-dimensional vector with components

$$K_i(\mathbf{r}) = \gamma\left(2r_i - r_{i-1} - r_{i+1}\right) - \frac{t}{P\hbar}U'(r_i). \tag{14.6.37}$$
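Because $\mathbf{M}(\mathbf{r})$ is symmetric and tridiagonal, both the quadratic form and the determinant in eqn. (14.6.35) can be obtained in $O(P)$ operations by Gaussian elimination. The sketch below is one way this might be done for a single particle in one dimension. The function name `effective_potential`, the callbacks `d1U` and `d2U` for $U'$ and $U''$, and the default units $m = \hbar = 1$ are assumptions of this example; the force term in $K_i$ is taken with the prefactor $t/(P\hbar)$ that follows from substituting eqn. (14.6.34) into eqn. (14.6.31); and the routine assumes $P \geq 2$ and a positive-definite $\mathbf{M}$.

```python
import math

def effective_potential(r, t, beta, d1U, d2U, m=1.0, hbar=1.0):
    """Second-order estimate of the open-chain potential for beads r_1..r_{P+1}."""
    P = len(r) - 1
    tau2 = t * t + (beta * hbar) ** 2 / 4.0          # |tau_c|^2
    alpha = m * P * beta / (8.0 * tau2)
    gamma = m * P * t / (hbar * tau2)
    inner = range(1, P)                               # 0-based positions of r_2..r_P
    diag = [4.0 * alpha + beta / (4.0 * P) * d2U(r[i]) for i in inner]
    K = [gamma * (2.0 * r[i] - r[i - 1] - r[i + 1]) - t / (P * hbar) * d1U(r[i])
         for i in inner]
    off = -2.0 * alpha                                # constant off-diagonal element of M
    n = len(diag)
    piv = [diag[0]] + [0.0] * (n - 1)                 # pivots of the elimination
    y = [K[0]] + [0.0] * (n - 1)
    for i in range(1, n):                             # forward elimination
        w = off / piv[i - 1]
        piv[i] = diag[i] - w * off
        y[i] = K[i] - w * y[i - 1]
    sol = [0.0] * n                                   # back substitution: M sol = K
    sol[-1] = y[-1] / piv[-1]
    for i in range(n - 2, -1, -1):
        sol[i] = (y[i] - off * sol[i + 1]) / piv[i]
    quad = sum(k * s for k, s in zip(K, sol))         # K^T M^{-1} K
    logdet = sum(math.log(p) for p in piv)            # ln det M = sum of ln(pivots)
    return (quad + logdet) / (2.0 * beta)
```

At $t = 0$ the vector $\mathbf{K}$ vanishes, so the routine returns $\ln\det[\mathbf{M}]/(2\beta)$, which is a convenient first check of an implementation.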

An advantage of starting from the exact expression for $G^{\beta}_{AB}(t)$ is that correlation functions of any position-dependent operators, including those that are nonlinear in the positions, can be computed. When this is done in CMD or RPMD, there is usually a loss in accuracy. We illustrate this loss in Fig. 14.10 by computing $G^{\beta}_{x^2x^2}(t)$ for a harmonic oscillator $U(x) = x^2/2$ at inverse temperature $\beta = 1$. For this example, the second-order expansion in eqn. (14.6.34) is exact, which means that the exact result must be obtained, and Fig. 14.10 shows this to be the case within the sampling error bars. It is worth noting that as $t$ and (necessarily) $P$ increase, sampling the configuration space of the open chain becomes more difficult, and convergence is slower. A clear disadvantage of using eqn. (14.6.35) is that the second derivative of the potential must be computed, which, for complex systems, is preferably avoided. If ample computer time is available, the need for a second-order expansion could be circumvented by rewriting the integrals over $s_2, \ldots, s_P$ in eqn. (14.6.29) as

$$I^{\beta}(\mathbf{r}; t) = \int_{-\infty}^{\infty} dy\; e^{iy}\int ds_2 \cdots ds_P\; f^{\beta}(\mathbf{r}, \mathbf{s}; t)\, \delta\left(\chi^{\beta}(\mathbf{r}, \mathbf{s}; t) - y\right). \tag{14.6.38}$$

The most straightforward way to implement eqn. (14.6.38) is to sample a configuration $\bar{\mathbf{r}} \equiv \bar{r}_1, \ldots, \bar{r}_{P+1}$ of an open chain and at this configuration perform a path integral simulation in the space of the variables $s_2, \ldots, s_P$, which are then sampled from the distribution $f^{\beta}(\bar{\mathbf{r}}, \mathbf{s}; t)$ at the fixed configuration $\bar{\mathbf{r}}$. During this simulation, a histogram of the function $\chi^{\beta}(\bar{\mathbf{r}}, \mathbf{s}; t)$ is collected. Once this histogram is converged, it is fed into the one-dimensional integral over $y$ in order to obtain a value for $I^{\beta}(\bar{\mathbf{r}}; t)$, which is then used to sample the next configuration of the open chain via, for example, a Monte Carlo algorithm. Once again, because the open chain must increase in length as $t$ increases, we would expect that converging the distribution to an accuracy sufficient to integrate stably over $y$ will become more difficult as longer times $t$ are needed. However, when decoherence is strong, the correlation function will decay quickly, and the method based on eqn. (14.6.38) might remain stable long enough to see this rapid decay. Ultimately, the fact that the very existence of $\tilde{U}^{\beta}(\mathbf{r}; t)$ can be proved opens the door to new approximate approaches.

Fig. 14.10 Symmetric correlation function $G^{\beta}_{x^2x^2}(t)$ for a harmonic oscillator $U(x) = x^2/2$ as generated using ring polymer molecular dynamics (RPMD) and the open-chain approach (symbols with error bars) compared to the exact result.

14.7 Problems

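14.1. a. Derive eqns. (14.4.7) and (14.4.8).

b. Show that the Fourier transforms of correlation functions $\langle\hat{x}(0)\hat{x}(t)\rangle$ and $\langle\hat{x}(t)\hat{x}(0)\rangle$ are related by

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} dt\, e^{-i\omega t}\langle\hat{x}(0)\hat{x}(t)\rangle = e^{\beta\hbar\omega}\,\frac{1}{2\pi}\int_{-\infty}^{\infty} dt\, e^{-i\omega t}\langle\hat{x}(t)\hat{x}(0)\rangle.$$

c. Show that the Fourier transform of $\langle\hat{x}(t)\hat{x}(0)\rangle$ is related to its classical counterpart by

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} dt\, e^{-i\omega t}\left\langle\frac{1}{2}[\hat{x}(t), \hat{x}(0)]_{+}\right\rangle = \frac{\beta\hbar\omega}{2}\coth(\beta\hbar\omega/2) \times \frac{1}{2\pi}\int_{-\infty}^{\infty} dt\, e^{-i\omega t}\langle x(t)x(0)\rangle_{\mathrm{cl}}.$$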


14.2. a. Derive eqns. (14.6.6) and (14.6.12).

*b. Derive eqns. (14.6.8) through (14.6.10).

14.3. Derive eqn. (14.6.16).

14.4. A quantum harmonic oscillator of mass $m$ and frequency $\omega$ is subject to a time-dependent perturbation $\hat{H}_1(t) = -\alpha\hat{x}\exp(-t^2/\tau^2)$, $t \in (-\infty, \infty)$. At $t_0 = -\infty$, the oscillator is in its ground state.

a. To the lowest nonvanishing order in perturbation theory, calculate the probability of a transition from the ground to the first excited state as $t \to \infty$.

b. To the lowest nonvanishing order in perturbation theory, calculate the probability of a transition from the ground to the second excited state as $t \to \infty$.

14.5. Evaluate the correlation functions $G_{xx}(t)$ and $K_{xx}(t)$ for a harmonic oscillator and compare both to the classical correlation function $\langle x(0)x(t)\rangle$ for a harmonic oscillator.

14.6. The time-dependent Schrödinger equation for a single particle of mass $m$ and charge $-e$ moving in a potential $U(\mathbf{r})$ subject to an electromagnetic field is

$$\left[\frac{1}{2m}\left(-i\hbar\nabla - \frac{e}{c}\mathbf{A}(\mathbf{r}, t)\right)^2 - e\phi(\mathbf{r}, t) + V(\mathbf{r})\right]\psi(\mathbf{r}, t) = i\hbar\frac{\partial}{\partial t}\psi(\mathbf{r}, t).$$

Show that the Schrödinger equation is invariant under a gauge transformation

$$\begin{aligned} \mathbf{A}'(\mathbf{r}, t) &= \mathbf{A}(\mathbf{r}, t) - \nabla\chi(\mathbf{r}, t) \\ \phi'(\mathbf{r}, t) &= \phi(\mathbf{r}, t) + \frac{1}{c}\frac{\partial}{\partial t}\chi(\mathbf{r}, t) \\ \psi'(\mathbf{r}, t) &= e^{-ie\chi(\mathbf{r}, t)/\hbar c}\,\psi(\mathbf{r}, t). \end{aligned}$$

14.7. Consider the free rotational motion of a rigid heteronuclear diatomic molecule of (fixed) bond length $R$ and moment of inertia $I = \mu R^2$, where $\mu$ is the reduced mass, about an axis through its center of mass perpendicular to the internuclear bond axis. The molecule is constrained to rotate in the $xy$ plane only. One of the atoms carries a charge $q$ and the other a charge $-q$.

a. Ignoring center-of-mass motion, write down the Hamiltonian $\hat{H}_0$ for the molecule.

b. Find the eigenvalues and eigenvectors of $\hat{H}_0$.


c. The molecule is exposed to spatially homogeneous, monochromatic radiation with an electric field $\mathbf{E}(t)$ given by
$$\mathbf{E}(t) = E(\omega)e^{i\omega t}\hat{\mathbf{x}},$$
where $\hat{\mathbf{x}}$ is the unit vector in the $x$-direction. Write down the perturbation Hamiltonian $\hat{H}_1$.

d. Calculate the energy spectrum $Q(\omega)$ for $\omega > 0$. Interpret your results, and in particular, explain how the allowed absorptions and emissions are manifest in your final expression. Plot the absorption part of your spectrum. Where do you expect the peak intensity to occur? Hint: Consider using a convergence factor, $\exp(-\epsilon|t|)$, and let $\epsilon$ go to 0 at the end of the calculation.

e. Based on your results from parts a-d, plot the spectrum of a three-dimensional rigid rotor, for which the energy eigenvalues are $E_{lm} = \hbar^2 l(l + 1)/2I$ and $m = -l, ..., l$ is the quantum number for the $z$-component of angular momentum. Where do you expect the peak intensity to occur in the three-dimensional case?

14.8. Derive a discrete path integral representation for the Kubo-transformed quantum time correlation function $K^\beta_{AB}(t)$ defined in eqn. (14.6.11).

14.9. Consider two spin-1/2 particles at fixed points in space a distance $R$ apart and interacting with a magnetic field $\mathbf{B} = (0, 0, B)$. The particles carry charge $q$ and $-q$, respectively. The Hamiltonian of the system is
$$\hat{H} = -\gamma\mathbf{B}\cdot\hat{\mathbf{S}} - \frac{q^2}{R},$$
where $\hat{\mathbf{S}} = \hat{\mathbf{S}}_1 + \hat{\mathbf{S}}_2$ is the total spin, and $\gamma$ is the spin gyromagnetic ratio.

a. What are the allowed energy levels of this system?

b. Suppose that a time-dependent perturbation of the form
$$\hat{H}_1(t) = -\gamma\mathbf{b}\cdot\hat{\mathbf{S}}\,e^{-t^2/\tau^2},$$
where $\mathbf{b} = (b, 0, 0)$, is applied at $t = -\infty$. At $t = -\infty$, the system is in its unperturbed ground state. To first order in perturbation theory, what is the probability, as $t \to \infty$, that the system will make a transition from its ground state to a state with energy $-q^2/R$?

14.10. The density of vibrational states, also known as the power spectrum or spectral density, is the Fourier transform of the velocity autocorrelation function:
$$I(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} dt\, e^{-i\omega t}C_{vv}(t).$$


$I(\omega)$ encodes information about the vibrational modes of a system; however, it does not provide any information about net absorption intensities. For the two model velocity autocorrelation functions in Problem 13.1, calculate the density of vibrational states and interpret them in terms of the physical situations described by these two model correlation functions.

∗14.11. For the discrete correlation function $G^\beta_{AB,P}(t)$ defined in eqn. (14.6.8), we could analyze the importance of the phase factor $\Phi(x_1, ..., x_{2P})$ by calculating its fluctuation
$$(\delta\Phi)^2 \equiv \langle\Phi^2\rangle - \langle\Phi\rangle^2 = \langle(\Phi - \langle\Phi\rangle)^2\rangle$$
with respect to an equilibrium discrete path integral consisting of $2P$ imaginary-time points. Using the path integral virial theorem in eqn. (12.8.34), derive a virial estimator for the above average.

14.12. Derive analytical expressions for the imaginary-time mean-square displacement of a free particle in 1, 2, and 3 dimensions. In particular, show that in $d$ dimensions, $R^2(\tau)$ is an inverted parabola, symmetric about the point $\tau = \beta\hbar/2$. For each number of dimensions, sketch the graph of $R^2(\tau)$ as a function of $\tau$. Finally, determine the numerical value of $R^2(\beta\hbar/2)$ in angstroms at $T = 300$ K for an electron and for a proton.

14.13. Show that RPMD can be rewritten in terms of normal modes in such a way that the dynamical trajectories are exactly the same as those obtained using primitive variables.

14.14. Derive eqn. (14.6.35). In this derivation, you may make use of the following identity for Gaussian integrals: If $\mathbf{x}$ is an $n$-dimensional vector, $\mathrm{M}$ is an $n \times n$ matrix, and $\mathbf{v}$ is an $n$-dimensional vector, then
$$\int d\mathbf{x}\, e^{-\mathbf{x}^{\rm T}\mathrm{M}\mathbf{x} + i\mathbf{v}\cdot\mathbf{x}} = \left[\frac{(2\pi)^n}{\det[\mathrm{M}]}\right]^{1/2} e^{-\mathbf{v}^{\rm T}\mathrm{M}^{-1}\mathbf{v}}.$$

14.15. Electronic excitation spectroscopy investigates the dynamics of a quantum system on two electronic surfaces denoted $U_g(x)$ and $U_e(x)$ for the ground and excited-state surfaces, respectively, with corresponding Hamiltonians $\hat{H}_g$ and $\hat{H}_e$. An analog of the symmetrized time correlation function $G_{eg}(t)$ for this type of process can be written as
$$G_{eg}(t) = \frac{1}{Q}\,{\rm Re}\left\{{\rm Tr}\left[\hat{A}\,e^{i\hat{H}_e\tau_c^*/\hbar}\,\hat{B}\,e^{-i\hat{H}_g\tau_c/\hbar}\right]\right\}.$$

Show that an open-chain approach based on a second-order expansion of the potentials about si = 0 yields a method in which the potential energy functions appearing in eqns. (14.6.36) and (14.6.37) are replaced by the average


potential (Ug +Ue )/2, indicating motion on an average potential surface (reminiscent of an approach to quantum dynamics known as Ehrenfest dynamics).

15 The Langevin and generalized Langevin equations

15.1 The general model of a system plus a bath

Many problems in chemistry, biology, and physics do not involve homogeneous systems but are concerned, rather, with a specific process that occurs in some sort of medium. Most biophysical and biochemical processes happen in an aqueous environment, and one might be interested in a specific conformational change in a protein or the bond-breaking event in a hydrolysis reaction. In this case, the water solvent and other degrees of freedom not directly involved in the reaction serve as the “medium,” which is often referred to generically as a bath. Organic reactions are carried out in a variety of different solvents, including water, methanol, dimethyl sulfoxide, and carbon tetrachloride. For example, a common reaction such as a Diels-Alder reaction can occur in water or in a room-temperature ionic liquid. In surface physics, we might be interested in the addition of an adsorbate to a particular site on the surface. If a reaction coordinate (see Section 8.6) for the adsorption process can be identified, the remaining degrees of freedom, including the bulk below the surface, can be treated as the environment or bath. Many other examples fall into this general paradigm, and it is, therefore, useful to develop a framework for treating such problems. In this chapter, we will develop an approach that allows the bath degrees of freedom to be eliminated from a problem, leaving only coordinates of interest to be treated explicitly. The resulting equation of motion in the reduced subspace, known as the generalized Langevin equation (Langevin, 1905; Langevin, 1908) after the French physicist Paul Langevin (1872–1946), can only be taken as rigorous in certain idealized limits. However, as a phenomenological theory, the generalized Langevin equation is a powerful tool for understanding a wide variety of physical processes.
These include theories of chemical reaction rates (Kramers, 1940; Grote and Hynes, 1980; Pollak et al., 1989; Pollak, 1990; Pollak et al., 1990) and of vibrational dephasing and energy relaxation, to be discussed in Section 15.4. In order to introduce the basic paradigm of a subsystem interacting with a bath, consider a classical system with generalized coordinates q1 , ..., q3N . Suppose we are interested in a simple process that can be described by a single coordinate, which we arbitrarily take to be q1 . We will call q1 and the remaining coordinates q2 , ..., q3N the system and bath coordinates, respectively. Moreover, in order to make the notation clearer, we will rename q1 as q and the remaining bath coordinates as y1 , ..., yn , where n = 3N −1. In order to avoid unnecessary complexity at this point, we will assume that


the system coordinate $q$ is a simple coordinate, such as a distance between two atoms or a Cartesian spatial direction (in Section 15.8, we will introduce a general framework for treating the problem that allows this restriction to be lifted). The Hamiltonian for $q$ and its conjugate momentum $p$ in the absence of the bath can then be written simply as
$$H(q, p) = \frac{p^2}{2\mu} + V(q), \tag{15.1.1}$$
where $\mu$ is the mass associated with $q$ and $V(q)$ is a potential energy contribution that depends only on $q$ and, therefore, is present even without the bath. The system is coupled to the bath via a potential $U_{\rm bath}(q, y_1, ..., y_n)$ that involves both the coupling terms between the system and the bath and terms describing the interactions among the bath degrees of freedom. The total potential is
$$U(q, y_1, ..., y_n) = V(q) + U_{\rm bath}(q, y_1, ..., y_n). \tag{15.1.2}$$

As an example, consider a system originally formulated in Cartesian coordinates $\mathbf{r}_1, ..., \mathbf{r}_N$ described by a pair potential
$$U(\mathbf{r}_1, ..., \mathbf{r}_N) = \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} u(|\mathbf{r}_i - \mathbf{r}_j|). \tag{15.1.3}$$
Suppose the distance $r = |\mathbf{r}_1 - \mathbf{r}_2|$ between atoms 1 and 2 is a coordinate of interest, which we take as the system coordinate. All other degrees of freedom are assigned as bath coordinates. Suppose, further, that atoms 1 and 2 have the same mass. We first transform to the center-of-mass and relative coordinates between atoms 1 and 2 according to
$$\mathbf{R} = \frac{1}{2}(\mathbf{r}_1 + \mathbf{r}_2), \qquad \mathbf{r} = \mathbf{r}_1 - \mathbf{r}_2, \tag{15.1.4}$$
the inverse of which is
$$\mathbf{r}_1 = \mathbf{R} + \frac{1}{2}\mathbf{r}, \qquad \mathbf{r}_2 = \mathbf{R} - \frac{1}{2}\mathbf{r}. \tag{15.1.5}$$

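The potential can then be expressed as
$$\begin{aligned}
U(\mathbf{r}_1, ..., \mathbf{r}_N) &= u(|\mathbf{r}_1 - \mathbf{r}_2|) + \sum_{i=3}^{N}\left[u(|\mathbf{r}_1 - \mathbf{r}_i|) + u(|\mathbf{r}_2 - \mathbf{r}_i|)\right] + \sum_{i=3}^{N-1}\sum_{j=i+1}^{N} u(|\mathbf{r}_i - \mathbf{r}_j|)\\
&= u(r) + \sum_{i=3}^{N}\left[u\left(\left|\mathbf{R} + \frac{1}{2}r\mathbf{n} - \mathbf{r}_i\right|\right) + u\left(\left|\mathbf{R} - \frac{1}{2}r\mathbf{n} - \mathbf{r}_i\right|\right)\right] + \sum_{i=3}^{N-1}\sum_{j=i+1}^{N} u(|\mathbf{r}_i - \mathbf{r}_j|),
\end{aligned} \tag{15.1.6}$$
where $\mathbf{n} = (\mathbf{r}_1 - \mathbf{r}_2)/|\mathbf{r}_1 - \mathbf{r}_2| = \mathbf{r}/r$ is the unit vector along the relative coordinate direction. Equation (15.1.6) is of the same form as eqn. (15.1.2), in which the first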


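term is equivalent to $V(q)$, the term in brackets represents the interaction between the system and the bath, and the final term is a pure bath–bath interaction. Suppose the bath potential $U_{\rm bath}$ can be reasonably approximated by an expansion up to second order about a minimum characterized by values $\bar{q}, \bar{y}_1, ..., \bar{y}_n$ of the generalized coordinates. The condition for $U_{\rm bath}$ to have a minimum at these values is
$$\left.\frac{\partial U_{\rm bath}}{\partial q_\alpha}\right|_{\{q=\bar{q},\,y=\bar{y}\}} = 0, \tag{15.1.7}$$
where all coordinates are set equal to their values at the minimum. Performing the expansion up to second order gives
$$\begin{aligned}
U_{\rm bath}(q, y_1, ..., y_n) \approx{}& U_{\rm bath}(\bar{q}, \bar{y}_1, ..., \bar{y}_n) + \sum_\alpha \left.\frac{\partial U_{\rm bath}}{\partial q_\alpha}\right|_{\{q=\bar{q},\,y=\bar{y}\}}(q_\alpha - \bar{q}_\alpha)\\
&+ \frac{1}{2}\sum_{\alpha,\beta}(q_\alpha - \bar{q}_\alpha)\left[\left.\frac{\partial^2 U_{\rm bath}}{\partial q_\alpha\partial q_\beta}\right|_{\{q=\bar{q},\,y=\bar{y}\}}\right](q_\beta - \bar{q}_\beta).
\end{aligned} \tag{15.1.8}$$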

The second term in eqn. (15.1.8) vanishes by virtue of the condition in eqn. (15.1.7). The first term is a constant that can be made to vanish by shifting the absolute zero of the potential (which is, anyway, arbitrary). Thus, the bath potential reduces, in this approximation, to
$$U_{\rm bath}(q, y_1, ..., y_n) = \frac{1}{2}\sum_{\alpha=1}^{n+1}\sum_{\beta=1}^{n+1}\tilde{q}_\alpha H_{\alpha\beta}\tilde{q}_\beta, \tag{15.1.9}$$
where $H_{\alpha\beta} = \partial^2 U_{\rm bath}/\partial q_\alpha\partial q_\beta|_{\{q=\bar{q},\,y=\bar{y}\}}$ and $\tilde{q}_\alpha = q_\alpha - \bar{q}_\alpha$ are the displacements of the generalized coordinates from their values at the minimum of the potential. Note that since we have already identified the purely $q$-dependent term in eqn. (15.1.6), the $H_{11}$ arising from the expansion of the bath potential can be taken to be zero or absorbed into the $q$-dependent function $V(q)$. Since our treatment from this point on will refer to the displacement coordinates, we will drop the tildes and let $q_\alpha$ refer to the displacement of a coordinate from its value at the minimum. Separating the particular coordinate $q$ from the other coordinates gives a potential of the form
$$U_{\rm bath}(q, y_1, ..., y_n) = \sum_\alpha C_\alpha q y_\alpha + \frac{1}{2}\sum_{\alpha=1}^{n}\sum_{\beta=1}^{n} y_\alpha\tilde{H}_{\alpha\beta}y_\beta, \tag{15.1.10}$$
where $C_\alpha = H_{1\alpha} = H_{\alpha 1}$ and $\tilde{H}_{\alpha\beta}$ is the $n \times n$ block of $H_{\alpha\beta}$ coupling only the coordinates $y_1, ..., y_n$. The potential, though quadratic, is still somewhat complicated because all of the coordinates are coupled through the matrix $H_{\alpha\beta}$. Thus, in order to simplify the potential, we introduce a linear transformation of the coordinates $y_1, ..., y_n$ to $x_1, ..., x_n$ via
$$y_\alpha = \sum_{\beta=1}^{n} R_{\alpha\beta}x_\beta, \tag{15.1.11}$$
where $R_{\alpha\beta}$ is an orthogonal matrix that diagonalizes the symmetric matrix $\tilde{H}_{\alpha\beta}$ via $\tilde{H}_{\rm diag} = R^{\rm T}\tilde{H}R$, where $R^{\rm T}$ is the transpose of $R$ and $\tilde{H}_{\rm diag}$ contains the eigenvalues of $\tilde{H}$


on its diagonal. Letting $k_\alpha$ denote these eigenvalues and introducing the transformation into eqn. (15.1.10), we obtain
$$U_{\rm bath}(q, x_1, ..., x_n) = \sum_\alpha g_\alpha q x_\alpha + \frac{1}{2}\sum_\alpha k_\alpha x_\alpha^2, \tag{15.1.12}$$
where $g_\alpha = \sum_\beta C_\beta R_{\beta\alpha}$. The potential energy in eqn. (15.1.12) is known as a harmonic bath potential; it also contains a bilinear coupling to the coordinate $q$. We will henceforth refer to the coordinate $q$ as the “system coordinate.” In order to construct the full Hamiltonian in the harmonic bath approximation, we introduce a set of momenta $p_1, ..., p_n$, assumed to be conjugate to the coordinates $x_1, ..., x_n$, and a set of bath masses $m_1, ..., m_n$. The full Hamiltonian for the system coordinate coupled to a harmonic bath can be written as
$$H = \frac{p^2}{2\mu} + V(q) + \sum_{\alpha=1}^{n}\left[\frac{p_\alpha^2}{2m_\alpha} + \frac{1}{2}m_\alpha\omega_\alpha^2 x_\alpha^2\right] + q\sum_{\alpha=1}^{n} g_\alpha x_\alpha, \tag{15.1.13}$$
where the spring constants $k_\alpha$ have been replaced by the bath frequencies $\omega_1, ..., \omega_n$ using $k_\alpha = m_\alpha\omega_\alpha^2$ so that the motion of the free bath oscillators is independent of the choice of masses $m_\alpha$ and depends only on the frequencies. We must not forget that eqn. (15.1.13) represents a highly idealized situation in which the possible curvilinear nature of the generalized coordinates is neglected in favor of a very simple model of the bath (Deutsch and Silbey, 1971; Caldeira and Leggett, 1983). A real bath is often characterized by a continuous distribution of frequencies $I(\omega)$ called the spectral density or density of states (see Problem 14.10). $I(\omega)$ is obtained by taking the Fourier transform of the velocity autocorrelation function.$^1$ The physical picture embodied in the harmonic-bath Hamiltonian is one in which a real bath is replaced by an ideal bath under the assumption that the motion of the real bath is dominated by small displacements from an equilibrium point described by discrete frequencies $\omega_1, ..., \omega_n$. This replacement is tantamount to expressing $I(\omega)$ as a sum of harmonic-oscillator spectral density functions.
It is important to note that the harmonic bath does not allow for diffusion of bath particles. In general, a set of frequencies $\omega_1, ..., \omega_n$, effective masses $m_1, ..., m_n$, and coupling constants to the system $g_1, ..., g_n$ need to be determined in order to reproduce at least some of the properties of the real bath. The extent to which this can be done, however, depends on the particular nature of the original bath. For the purposes of the subsequent discussion, we will assume that a reasonable choice can be made for these parameters and proceed to work out the classical dynamics of the harmonic-bath Hamiltonian.
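The passage from the coupled quadratic form of eqn. (15.1.10) to the diagonal harmonic-bath form of eqn. (15.1.12) can be illustrated with a minimal sketch for a bath of only two coordinates, where the orthogonal matrix is a plane rotation. The numerical values of the couplings and Hessian entries, and all function names, are assumptions chosen purely for illustration.

```python
import math

def diagonalize_2x2(a, b, d):
    """Diagonalize the symmetric matrix [[a, b], [b, d]] with a plane rotation.

    Returns (k1, k2, R), where the columns of R are the eigenvectors and
    R^T H R = diag(k1, k2).
    """
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    R = [[c, -s], [s, c]]
    k1 = a * c * c + 2.0 * b * c * s + d * s * s
    k2 = a * s * s - 2.0 * b * c * s + d * c * c
    return k1, k2, R

# Illustrative (made-up) couplings C_alpha and bath Hessian entries.
C = [0.3, -0.2]
a, b, d = 2.0, 0.5, 1.0

k1, k2, R = diagonalize_2x2(a, b, d)
# g_alpha = sum_beta C_beta R_{beta alpha}
g = [sum(C[beta] * R[beta][alpha] for beta in range(2)) for alpha in range(2)]

def u_bath_y(q, y):
    """Coupled form, eqn. (15.1.10)."""
    coupling = sum(C[i] * q * y[i] for i in range(2))
    quadratic = 0.5 * (a * y[0] ** 2 + 2.0 * b * y[0] * y[1] + d * y[1] ** 2)
    return coupling + quadratic

def u_bath_x(q, x):
    """Diagonal form, eqn. (15.1.12)."""
    k = (k1, k2)
    return sum(g[i] * q * x[i] + 0.5 * k[i] * x[i] ** 2 for i in range(2))

# Pick arbitrary normal-mode coordinates x and map them to y = R x.
q, x = 0.7, [0.4, -1.1]
y = [sum(R[i][j] * x[j] for j in range(2)) for i in range(2)]
```

Evaluating `u_bath_y(q, y)` and `u_bath_x(q, x)` gives the same number, which is the content of the transformation; for a bath with many coordinates one would use a general symmetric eigensolver instead of the explicit rotation.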

$^1$ The density of states encodes the information about the vibrational modes of the bath; however, it does not provide any information about absorption intensities.

15.2 Derivation of the generalized Langevin equation

We begin by deriving the classical equations of motion generated by eqn. (15.1.13). Applying Hamilton’s equations, these become
$$\begin{aligned}
\dot{q} &= \frac{\partial H}{\partial p} = \frac{p}{\mu}\\
\dot{p} &= -\frac{\partial H}{\partial q} = -\frac{dV}{dq} - \sum_\alpha g_\alpha x_\alpha\\
\dot{x}_\alpha &= \frac{\partial H}{\partial p_\alpha} = \frac{p_\alpha}{m_\alpha}\\
\dot{p}_\alpha &= -\frac{\partial H}{\partial x_\alpha} = -m_\alpha\omega_\alpha^2 x_\alpha - g_\alpha q,
\end{aligned} \tag{15.2.1}$$

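which can be written as the following set of coupled second-order differential equations:
$$\begin{aligned}
\mu\ddot{q} &= -\frac{dV}{dq} - \sum_\alpha g_\alpha x_\alpha\\
m_\alpha\ddot{x}_\alpha &= -m_\alpha\omega_\alpha^2 x_\alpha - g_\alpha q.
\end{aligned} \tag{15.2.2}$$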
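Before solving eqns. (15.2.2) analytically, it can be helpful to see them integrated numerically. The following is a minimal sketch, assuming a harmonic system potential $V(q) = \kappa q^2/2$, three bath oscillators, and a velocity Verlet integrator; the parameter values and function names are illustrative assumptions and are not taken from the text.

```python
MU, KAPPA = 1.0, 1.0      # system mass and V(q) = KAPPA * q**2 / 2 (assumed)
M = [1.0, 1.0, 1.0]       # bath masses (assumed)
OMEGA = [0.5, 1.0, 1.7]   # bath frequencies (assumed)
G = [0.1, 0.2, 0.3]       # system-bath couplings (assumed)

def forces(q, x):
    """Right-hand sides of eqns. (15.2.2)."""
    f_q = -KAPPA * q - sum(g * xa for g, xa in zip(G, x))
    f_x = [-m * w * w * xa - g * q for m, w, g, xa in zip(M, OMEGA, G, x)]
    return f_q, f_x

def energy(q, p, x, px):
    """Harmonic-bath Hamiltonian, eqn. (15.1.13)."""
    h = p * p / (2.0 * MU) + 0.5 * KAPPA * q * q
    for m, w, g, xa, pa in zip(M, OMEGA, G, x, px):
        h += pa * pa / (2.0 * m) + 0.5 * m * w * w * xa * xa + g * q * xa
    return h

def velocity_verlet(q, p, x, px, dt, nsteps):
    """Propagate system and bath together with the velocity Verlet scheme."""
    f_q, f_x = forces(q, x)
    for _ in range(nsteps):
        p += 0.5 * dt * f_q
        px = [pa + 0.5 * dt * fa for pa, fa in zip(px, f_x)]
        q += dt * p / MU
        x = [xa + dt * pa / m for xa, pa, m in zip(x, px, M)]
        f_q, f_x = forces(q, x)
        p += 0.5 * dt * f_q
        px = [pa + 0.5 * dt * fa for pa, fa in zip(px, f_x)]
    return q, p, x, px

start = (1.0, 0.0, [0.3, -0.1, 0.2], [0.0, 0.4, -0.3])
e_start = energy(*start)
end = velocity_verlet(*start, dt=0.01, nsteps=5000)
e_end = energy(*end)
```

Because the full system-plus-bath dynamics is Hamiltonian, `e_end` should stay close to `e_start`; a large drift would signal too large a time step or a coupling so strong that the total potential is no longer bounded from below.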

Equations (15.2.2) must be solved subject to a set of initial conditions $\{q(0), \dot{q}(0), x_1(0), ..., x_n(0), \dot{x}_1(0), ..., \dot{x}_n(0)\}$. The second equation for the bath coordinates can be solved in terms of the system coordinate $q$ by Laplace transformation, assuming that the system coordinate $q$ acts as a kind of driving term. The Laplace transform of a function $f(t)$, alluded to briefly in Section 14.6, is one of several types of integral transforms and is defined to be
$$\tilde{f}(s) = \int_0^\infty dt\, e^{-st}f(t). \tag{15.2.3}$$
As we will now show, Laplace transforms are particularly useful for solving linear differential equations. A more detailed discussion of Laplace transforms is given in Appendix E. From eqn. (15.2.3), it can be shown straightforwardly that the Laplace transforms of $df/dt$ and $d^2f/dt^2$ are given, respectively, by
$$\begin{aligned}
\int_0^\infty dt\, e^{-st}\frac{df}{dt} &= s\tilde{f}(s) - f(0)\\
\int_0^\infty dt\, e^{-st}\frac{d^2f}{dt^2} &= s^2\tilde{f}(s) - f'(0) - sf(0).
\end{aligned} \tag{15.2.4}$$
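As a brief worked step, the first relation in eqn. (15.2.4) follows from a single integration by parts, assuming that $e^{-st}f(t)$ vanishes as $t \to \infty$ (a condition on the growth of $f$ that is assumed here for illustration), and the second follows by applying the first relation to $df/dt$:
$$\begin{aligned}
\int_0^\infty dt\, e^{-st}\frac{df}{dt} &= \Big[e^{-st}f(t)\Big]_0^\infty + s\int_0^\infty dt\, e^{-st}f(t) = s\tilde{f}(s) - f(0),\\
\int_0^\infty dt\, e^{-st}\frac{d^2f}{dt^2} &= s\left[s\tilde{f}(s) - f(0)\right] - f'(0) = s^2\tilde{f}(s) - f'(0) - sf(0).
\end{aligned}$$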

Finally, the Laplace transform of a convolution of two functions $f(t)$ and $g(t)$ can be shown to be
$$\int_0^\infty dt\, e^{-st}\int_0^t d\tau\, f(\tau)g(t - \tau) = \tilde{f}(s)\tilde{g}(s). \tag{15.2.5}$$

Equations (15.2.4) and (15.2.5), together with eqn. (E.2), are sufficient for us to solve eqns. (15.2.2).


Taking the Laplace transform of both sides of the second line in eqn. (15.2.2) yields
$$s^2\tilde{x}_\alpha(s) - \dot{x}_\alpha(0) - sx_\alpha(0) + \omega_\alpha^2\tilde{x}_\alpha(s) = -\frac{g_\alpha}{m_\alpha}\tilde{q}(s). \tag{15.2.6}$$
The use of the Laplace transform has the effect of turning a differential equation into an algebraic equation for $\tilde{x}_\alpha(s)$. Solving this equation for $\tilde{x}_\alpha(s)$ gives
$$\tilde{x}_\alpha(s) = \frac{s}{s^2 + \omega_\alpha^2}x_\alpha(0) + \frac{1}{s^2 + \omega_\alpha^2}\dot{x}_\alpha(0) - \frac{g_\alpha}{m_\alpha}\frac{\tilde{q}(s)}{s^2 + \omega_\alpha^2}. \tag{15.2.7}$$
We now obtain the solution to the differential equation by computing the inverse transform of $\tilde{x}_\alpha(s)$ in eqn. (15.2.7). Applying the inverse Laplace transform relations in Appendix E (eqn. (E.2)), and recognizing that the last term in eqn. (15.2.7) is the product of two Laplace transforms and can be inverted using the convolution theorem, we find that the solution for $x_\alpha(t)$ is
$$x_\alpha(t) = x_\alpha(0)\cos\omega_\alpha t + \frac{1}{\omega_\alpha}\dot{x}_\alpha(0)\sin\omega_\alpha t - \frac{g_\alpha}{m_\alpha\omega_\alpha}\int_0^t d\tau\, \sin\omega_\alpha(t - \tau)\,q(\tau). \tag{15.2.8}$$
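The solution in eqn. (15.2.8) can be checked numerically for a single bath oscillator driven by a prescribed system trajectory. The sketch below is only an illustration under assumed parameter values: the driving function `q_drive`, the trapezoidal quadrature for the convolution, and the fourth-order Runge-Kutta reference integration are choices made here and are not part of the text.

```python
import math

M_B, OMEGA, G = 1.0, 1.3, 0.4   # assumed bath mass, frequency, and coupling

def q_drive(t):
    """Prescribed system trajectory q(t); an arbitrary smooth test function."""
    return math.cos(0.7 * t)

def x_formula(t, x0, v0, n=20000):
    """Evaluate eqn. (15.2.8) using trapezoidal quadrature for the convolution."""
    h = t / n
    total = 0.0
    for i in range(n + 1):
        tau = i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.sin(OMEGA * (t - tau)) * q_drive(tau)
    conv = total * h
    return (x0 * math.cos(OMEGA * t) + v0 / OMEGA * math.sin(OMEGA * t)
            - G / (M_B * OMEGA) * conv)

def x_rk4(t, x0, v0, n=5000):
    """Integrate m x'' = -m w^2 x - g q(t) directly with classical RK4."""
    h = t / n

    def acc(s, x):
        return -OMEGA ** 2 * x - G / M_B * q_drive(s)

    x, v, s = x0, v0, 0.0
    for _ in range(n):
        k1x, k1v = v, acc(s, x)
        k2x, k2v = v + 0.5 * h * k1v, acc(s + 0.5 * h, x + 0.5 * h * k1x)
        k3x, k3v = v + 0.5 * h * k2v, acc(s + 0.5 * h, x + 0.5 * h * k2x)
        k4x, k4v = v + h * k3v, acc(s + h, x + h * k3x)
        x += h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        s += h
    return x
```

For the same initial conditions, `x_formula(5.0, 0.5, -0.2)` and `x_rk4(5.0, 0.5, -0.2)` should agree to within the quadrature error, which illustrates that the system coordinate enters the bath solution only through the convolution term.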

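For reasons that will be clear shortly, we integrate the convolution term by parts to express it in the form
$$\int_0^t d\tau\, \sin\omega_\alpha(t - \tau)\,q(\tau) = \frac{1}{\omega_\alpha}\left[q(t) - q(0)\cos\omega_\alpha t\right] - \frac{1}{\omega_\alpha}\int_0^t d\tau\, \cos\omega_\alpha(t - \tau)\,\dot{q}(\tau). \tag{15.2.9}$$
Substituting eqn. (15.2.9) and eqn. (15.2.8) into the first line of eqn. (15.2.2) yields the equation of motion for $q$:
$$\begin{aligned}
\mu\ddot{q} &= -\frac{dV}{dq} - \sum_\alpha g_\alpha x_\alpha(t)\\
&= -\frac{dV}{dq} - \sum_\alpha g_\alpha\left[x_\alpha(0)\cos\omega_\alpha t + \frac{p_\alpha(0)}{m_\alpha\omega_\alpha}\sin\omega_\alpha t + \frac{g_\alpha}{m_\alpha\omega_\alpha^2}q(0)\cos\omega_\alpha t\right]\\
&\quad - \sum_\alpha\frac{g_\alpha^2}{m_\alpha\omega_\alpha^2}\int_0^t d\tau\, \dot{q}(\tau)\cos\omega_\alpha(t - \tau) + \sum_\alpha\frac{g_\alpha^2}{m_\alpha\omega_\alpha^2}q(t).
\end{aligned} \tag{15.2.10}$$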

Equation (15.2.10) is in the form of an integro-differential equation for the system coordinate that depends explicitly on the bath dynamics. Although the dynamics of each bath coordinate are relatively simple, the collective effect of the bath on the system coordinate can be nontrivial, particularly if the initial conditions of the bath are randomly chosen, the distribution of frequencies is broad, and the frequencies are not all commensurate. Indeed, the bath might appear to affect the system coordinate in a random and unpredictable manner, especially if the number of bath degrees of


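freedom is large. This is just what we might expect for a real bath. Thus, in order to motivate this physical picture, the following quantities are introduced:
$$R(t) = -\sum_\alpha g_\alpha\left[\left(x_\alpha(0) + \frac{g_\alpha}{m_\alpha\omega_\alpha^2}q(0)\right)\cos\omega_\alpha t + \frac{p_\alpha(0)}{m_\alpha\omega_\alpha}\sin\omega_\alpha t\right], \tag{15.2.11}$$
$$\zeta(t) = \sum_\alpha\frac{g_\alpha^2}{m_\alpha\omega_\alpha^2}\cos\omega_\alpha t, \tag{15.2.12}$$
$$W(q) = V(q) - \sum_\alpha\frac{g_\alpha^2}{2m_\alpha\omega_\alpha^2}q^2. \tag{15.2.13}$$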
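The three quantities just defined are simply a regrouping of the bath force in eqn. (15.2.10), and this can be confirmed numerically. The sketch below assumes a prescribed system trajectory and made-up bath parameters; it compares the direct bath force $-\sum_\alpha g_\alpha x_\alpha(t)$, with $x_\alpha(t)$ taken from eqn. (15.2.8), against the combination of $R(t)$, the integral over $\dot{q}(\tau)\zeta(t-\tau)$, and the $q$-proportional term that distinguishes $W(q)$ from $V(q)$. All names and values are illustrative assumptions.

```python
import math

M = [1.0, 1.0, 1.0]        # bath masses (assumed)
OMEGA = [0.5, 1.0, 1.7]    # bath frequencies (assumed)
G = [0.1, 0.2, 0.3]        # couplings (assumed)
X0 = [0.3, -0.1, 0.2]      # initial bath positions (assumed)
P0 = [0.0, 0.4, -0.3]      # initial bath momenta (assumed)
BATH = list(zip(M, OMEGA, G, X0, P0))

def q_of(t):
    """Prescribed system trajectory, used only to test the identity."""
    return math.cos(0.7 * t)

def qdot_of(t):
    return -0.7 * math.sin(0.7 * t)

def trapezoid(func, lo, hi, n=4000):
    h = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, n):
        total += func(lo + i * h)
    return total * h

def zeta(t):
    """Dynamic friction kernel, eqn. (15.2.12)."""
    return sum(g * g / (m * w * w) * math.cos(w * t) for m, w, g, _, _ in BATH)

def random_force(t):
    """R(t) of eqn. (15.2.11)."""
    q0 = q_of(0.0)
    return -sum(
        g * ((x0 + g * q0 / (m * w * w)) * math.cos(w * t)
             + p0 / (m * w) * math.sin(w * t))
        for m, w, g, x0, p0 in BATH
    )

def bath_force_direct(t):
    """-sum_alpha g_alpha x_alpha(t), with x_alpha(t) from eqn. (15.2.8)."""
    total = 0.0
    for m, w, g, x0, p0 in BATH:
        conv = trapezoid(lambda tau: math.sin(w * (t - tau)) * q_of(tau), 0.0, t)
        x_t = (x0 * math.cos(w * t) + p0 / (m * w) * math.sin(w * t)
               - g / (m * w) * conv)
        total -= g * x_t
    return total

def bath_force_regrouped(t):
    """Same force written with R(t), zeta(t), and the q-proportional shift."""
    memory = trapezoid(lambda tau: qdot_of(tau) * zeta(t - tau), 0.0, t)
    shift = sum(g * g / (m * w * w) for m, w, g, _, _ in BATH) * q_of(t)
    return shift - memory + random_force(t)
```

The two functions agree to within quadrature error for any smooth trajectory, because the regrouping relies only on the integration by parts in eqn. (15.2.9) and not on the particular form of $q(t)$.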

In terms of these quantities, the equation of motion for the system coordinate reads
$$\mu\ddot{q} = -\frac{dW}{dq} - \int_0^t d\tau\, \dot{q}(\tau)\zeta(t - \tau) + R(t). \tag{15.2.14}$$
Equation (15.2.14) is known as the generalized Langevin equation (GLE). The quantity $\zeta(t)$ in the GLE is called the dynamic friction kernel, $R(t)$ is called the random force, and $W(q)$ is identified as the potential of mean force acting on the system coordinate. Despite the simplifications of the bath inherent in eqn. (15.2.14), the GLE can yield considerable physical insight without requiring large-scale simulations. Before discussing predictions of the GLE, we will examine each of the terms in eqn. (15.2.14) and provide a physical interpretation of them.

15.2.1 The potential of mean force

Potentials of mean force were first discussed in Chapter 8 (see eqns. (8.6.6) and (8.6.7)). For a true harmonic bath, the potential of mean force is given by the simple expression in eqn. (15.2.13); however, as a phenomenological theory, the GLE assumes that the potential of mean force has been generated by some other means (using techniques from Chapter 8, for example the blue moon ensemble of Section 8.7 or the umbrella sampling approach of Section 8.8) and attempts to model the dynamics of the system coordinate on this surface using the friction kernel and random force to represent the influence of the bath. The use of the potential of mean force in the GLE assumes a quasi-adiabatic separation between the system and bath motions. However, considering the GLE’s phenomenological viewpoint, it is also possible to use the bare potential $V(q)$ and use the GLE to model the dynamics on this surface instead. Such a model can be derived from a slightly modified version of the harmonic-bath Hamiltonian:
$$H = \frac{p^2}{2\mu} + V(q) + \sum_{\alpha=1}^{n}\frac{p_\alpha^2}{2m_\alpha} + \frac{1}{2}\sum_{\alpha=1}^{n}m_\alpha\omega_\alpha^2\left(x_\alpha + \frac{g_\alpha}{m_\alpha\omega_\alpha^2}q\right)^2. \tag{15.2.15}$$

15.2.2 The random force

The question that immediately arises concerning the random force in eqn. (15.2.14) is why it is called “random” in the first place. After all, eqn. (15.2.11) defines a perfectly


deterministic quantity. To understand why $R(t)$ can be treated as a random process, we note that a real bath, which contains a macroscopically large number of degrees of freedom, will affect the system in what appears to be a random manner, despite the fact that its time evolution is completely determined by the classical equations of motion. Recall, however, that the basic idea of ensemble theory is to disregard the detailed motion of every degree of freedom in a macroscopically large system and to replace this level of detail by an ensemble average. It is in this spirit that we replace $R(t)$, defined microscopically in eqn. (15.2.11), with a truly random process defined by a particular time sequence of random numbers and a set of related time correlation functions that must be satisfied by this sequence. We first note that the time correlation functions $\langle q(0)R(t)\rangle$ and $\langle\dot{q}(0)R(t)\rangle$ are identically zero for all time. To see this, consider first the correlation function
$$\begin{aligned}
\langle\dot{q}(0)R(t)\rangle &= \left\langle\frac{p(0)}{\mu}R(t)\right\rangle\\
&= -\frac{1}{Q}\int dp\, dq\, \exp\left\{-\beta\left[\frac{p^2}{2\mu} + V(q)\right]\right\}\\
&\quad\times\int\prod_{\alpha=1}^{n}dx_\alpha\, dp_\alpha\, \exp\left\{-\beta\left[\sum_{\alpha=1}^{n}\frac{p_\alpha^2}{2m_\alpha} + \frac{1}{2}\sum_{\alpha=1}^{n}m_\alpha\omega_\alpha^2 x_\alpha^2 + q\sum_{\alpha=1}^{n}g_\alpha x_\alpha\right]\right\}\\
&\quad\times\frac{p}{\mu}\sum_\alpha g_\alpha\left[\left(x_\alpha + \frac{g_\alpha}{m_\alpha\omega_\alpha^2}q\right)\cos\omega_\alpha t + \frac{p_\alpha}{m_\alpha\omega_\alpha}\sin\omega_\alpha t\right],
\end{aligned} \tag{15.2.16}$$
where the average is taken over a canonical ensemble and $Q$ is the partition function for the harmonic-bath Hamiltonian. Since $R(t)$ does not depend on the system momentum $p$, the integral over $p$ is of the form $\int_{-\infty}^{\infty}dp\, p\exp(-\beta p^2/2\mu) = 0$, and the entire integral vanishes. It is left as an exercise to show that the correlation function $\langle q(0)R(t)\rangle = 0$ (see Problem 15.1). The vanishing of the correlation functions $\langle q(0)R(t)\rangle$ and $\langle\dot{q}(0)R(t)\rangle$ is precisely what we would expect from a random bath force, and hence we require that these correlation functions vanish for any model random process. Finally, the same manipulations employed above can be used to derive the autocorrelation function $\langle R(0)R(t)\rangle$ with the result
$$\langle R(0)R(t)\rangle = \frac{1}{\beta}\sum_\alpha\frac{g_\alpha^2}{m_\alpha\omega_\alpha^2}\cos\omega_\alpha t = kT\zeta(t), \tag{15.2.17}$$

which shows that the random force and the dynamic friction kernel are related (see Problem 15.1). Equation (15.2.17) is known as the second fluctuation dissipation theorem (Kubo et al., 1985). Once again, we require that any model random process we choose satisfy this theorem. If the deterministic definition of R(t) in eqn. (15.2.11) is to be replaced by a model random process, how should such a process be described mathematically? There are various ways to construct random time sequences that give the correct time correlation functions, depending on the physics of the problem. For instance, the influence of a


relatively high-density bath, which affects the system via only soft collisions due to low amplitude thermal fluctuations, is different from a low-density, high-temperature bath that influences the system through mostly strong, impulsive collisions. Here, we construct a commonly used model, known as a Gaussian random process, for the former type of bath. Since for most potentials, the GLE must be integrated numerically, we seek a discrete description of $R(t)$ that acts at $M$ discrete time points $0, \Delta t, 2\Delta t, ..., (M-1)\Delta t$. At the $k$th point of a Gaussian random process, $R_k \equiv R(k\Delta t)$ can be expressed as the sum of Fourier sine and cosine series as
$$R_k = \sum_{j=0}^{M-1}\left[a_j\sin\left(\frac{2\pi jk}{M}\right) + b_j\cos\left(\frac{2\pi jk}{M}\right)\right], \tag{15.2.18}$$
where the coefficients $a_j$ and $b_j$ are random numbers sampled from a Gaussian distribution of the form
$$P(a_0, ..., a_{M-1}, b_0, ..., b_{M-1}) = \prod_{k=0}^{M-1}\frac{1}{2\pi\sigma_k^2}e^{-(a_k^2 + b_k^2)/2\sigma_k^2}. \tag{15.2.19}$$

For the random force to satisfy eqn. (15.2.17) at each time point, the width, $\sigma_k$, of the distribution must be chosen according to
$$\sigma_k^2 = \frac{1}{\beta M}\sum_{j=0}^{M-1}\zeta(j\Delta t)\cos\left(\frac{2\pi jk}{M}\right), \tag{15.2.20}$$
which can be easily evaluated using fast Fourier transform techniques. Since the random process in eqn. (15.2.18) is periodic with period $M$, it clearly cannot be used for more than a single period. This means that the number of points $M$ in the trajectory must be large enough to capture the dynamical behavior sought.
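The construction in eqns. (15.2.18) through (15.2.20) can be sketched in a few lines. The version below is a minimal illustration under stated assumptions: the friction kernel is a made-up exponential, it is tabulated as an even sequence on the periodic grid (so that the cosine sum is the full discrete Fourier transform and the variances reproduce $kT\zeta$ exactly), and the cosine sum is evaluated directly rather than by a fast Fourier transform, which would be preferable for large $M$. The function names are hypothetical.

```python
import math
import random

def periodic_kernel(zeta, M, dt):
    """Tabulate zeta as an even sequence on the periodic grid of M points."""
    return [zeta(min(j, M - j) * dt) for j in range(M)]

def noise_variances(zeta_grid, kT):
    """sigma_k^2 of eqn. (15.2.20), by direct cosine sum."""
    M = len(zeta_grid)
    return [
        kT / M * sum(zeta_grid[j] * math.cos(2.0 * math.pi * j * k / M)
                     for j in range(M))
        for k in range(M)
    ]

def sample_random_force(sigma2, rng):
    """Draw one realization R_0, ..., R_{M-1} according to eqn. (15.2.18)."""
    M = len(sigma2)
    widths = [math.sqrt(max(s, 0.0)) for s in sigma2]  # guard against round-off
    a = [rng.gauss(0.0, w) for w in widths]
    b = [rng.gauss(0.0, w) for w in widths]
    return [
        sum(a[j] * math.sin(2.0 * math.pi * j * k / M)
            + b[j] * math.cos(2.0 * math.pi * j * k / M) for j in range(M))
        for k in range(M)
    ]

def model_correlation(sigma2, n):
    """<R_0 R_n> implied by the variances: sum_j sigma_j^2 cos(2 pi j n / M)."""
    M = len(sigma2)
    return sum(sigma2[j] * math.cos(2.0 * math.pi * j * n / M) for j in range(M))

kT, dt, M_POINTS = 1.0, 0.05, 64
zeta_grid = periodic_kernel(lambda t: 2.0 * math.exp(-t / 0.3), M_POINTS, dt)
sigma2 = noise_variances(zeta_grid, kT)
R = sample_random_force(sigma2, random.Random(0))
```

The helper `model_correlation` gives the ensemble-averaged correlation implied by the chosen widths, so comparing it with `kT * zeta_grid[n]` checks eqn. (15.2.17) without any statistical sampling error.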

15.2.3 The dynamic friction kernel

The convolution integral term in eqn. (15.2.14),
$$\int_0^t d\tau\, \dot{q}(\tau)\zeta(t - \tau),$$

is called the memory integral because it depends, in principle, on the entire history of the evolution of $q$. Physically, this term expresses the fact that the bath requires a finite time to respond to any fluctuation in the motion of the system and that this lag affects how the bath subsequently influences the motion of the system. Thus, the force that the bath exerts on the system at any point in time depends on the prior motion of the system coordinate $q$. The memory of the system coordinate dynamics retained by the bath is encoded in the memory kernel or dynamic friction kernel $\zeta(t)$. Note that $\zeta(t)$ has units of mass·(time)$^{-2}$. Since the dynamic friction kernel is actually an autocorrelation function of the random force, it follows that the correlation time of the random force determines the decay time of the memory kernel. The finite correlation


time of the memory kernel indicates that the bath, in reality, retains memory of the system motion for a finite time $t_{\rm mem}$. One might expect, therefore, that the memory integral could be replaced, to a very good approximation, by an integral over a finite interval $[t - t_{\rm mem}, t]$:
$$\int_0^t d\tau\, \dot{q}(\tau)\zeta(t - \tau) \approx \int_{t - t_{\rm mem}}^{t} d\tau\, \dot{q}(\tau)\zeta(t - \tau). \tag{15.2.21}$$
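The effect of the truncation in eqn. (15.2.21) can be illustrated with a short sketch. It assumes an exponentially decaying kernel with a made-up amplitude and decay time and a prescribed velocity $\dot{q}(\tau) = \cos\tau$, neither of which comes from the text; the function name `memory_integral` is hypothetical.

```python
import math

def memory_integral(qdot, zeta, t, t_mem=None, h=1e-3):
    """Trapezoidal estimate of the memory integral at time t.

    If t_mem is given, only the window [t - t_mem, t] is integrated,
    as in eqn. (15.2.21); otherwise the full history [0, t] is used.
    """
    lo = 0.0 if t_mem is None else max(0.0, t - t_mem)
    n = max(1, round((t - lo) / h))
    step = (t - lo) / n
    total = 0.5 * (qdot(lo) * zeta(t - lo) + qdot(t) * zeta(0.0))
    for i in range(1, n):
        tau = lo + i * step
        total += qdot(tau) * zeta(t - tau)
    return total * step

def zeta_exp(t):
    """Assumed kernel: amplitude 2.0 and decay time 0.2."""
    return 2.0 * math.exp(-t / 0.2)

full = memory_integral(math.cos, zeta_exp, 10.0)
truncated = memory_integral(math.cos, zeta_exp, 10.0, t_mem=2.0)
```

With a window of ten decay times, the neglected part of the integral is bounded by roughly the kernel amplitude times the decay time times $e^{-10}$, so `full` and `truncated` differ only at the level of the quadrature error, while the truncated sum touches one fifth of the history.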

Such an approximation proves very convenient in numerical simulations based on the generalized Langevin equation, as it permits the memory integral to be truncated, thereby reducing the computational overhead needed to evaluate it. We now consider a few interesting limiting cases of the friction kernel. Suppose, for example, that the bath is able to respond infinitely quickly to the motion of the system. This would occur when the system mass $\mu$ is very large compared to the bath masses, $\mu \gg m_\alpha$. In such a case, the bath retains essentially no memory of the system motion, and the memory kernel reduces to a simple $\delta$-function in time:
$$\zeta(t) = \lim_{\epsilon\to 0^+}\zeta_0\delta(t - \epsilon). \tag{15.2.22}$$

The introduction of the parameter $\epsilon$ ensures that the entire $\delta$-function is integrated over. Alternatively, we can recognize that for $\epsilon = 0$, only “half” of the $\delta$-function is included in the interval $t \in [0, \infty)$, since $\delta(t)$ is an even function of time, and therefore, we could also define $\zeta(t)$ as $2\zeta_0\delta(t)$. Substituting eqn. (15.2.22) into eqn. (15.2.14) and taking the limit gives an equation of motion for $q$ of the form
$$\begin{aligned}
\mu\ddot{q} &= -\frac{dW}{dq} - \lim_{\epsilon\to 0^+}\zeta_0\int_0^t d\tau\, \dot{q}(\tau)\delta(t - \epsilon - \tau) + R(t)\\
&= -\frac{dW}{dq} - \lim_{\epsilon\to 0^+}\zeta_0\dot{q}(t - \epsilon) + R(t)\\
&= -\frac{dW}{dq} - \zeta_0\dot{q}(t) + R(t),
\end{aligned} \tag{15.2.23}$$
where all quantities on the right are evaluated at time $t$. Equation (15.2.23) is known as the Langevin equation (LE), and it should be clear that the LE is ultimately a special case of the GLE. The LE describes the motion of a system in a potential $W(q)$ subject to an ordinary dissipative friction force as well as a random force $R(t)$. Langevin originally employed eqn. (15.2.23) as a model for Brownian motion, where the mass disparity clearly holds (Langevin, 1908). In the limit of very large friction $\zeta_0$, the inertial term $\mu\ddot{q}$ can be neglected, resulting in an equation of the form
$$\zeta_0\dot{q} = -\frac{dW}{dq} + R(t), \tag{15.2.24}$$

which is known as the overdamped Langevin equation. The overdamped form of the equation is used, for example, in simple theories of polymers such as the Rouse model (Rouse, 1953) (see Section 1.7 for the Hamiltonian of the Rouse model). The


most common use of the LE or the overdamped LE is as a thermostatting method for generating a canonical distribution (see Section 15.5). The quantity $\zeta_0$ is known as the static friction coefficient, defined generally as
$$\zeta_0 = \int_0^\infty dt\, \zeta(t). \tag{15.2.25}$$
Note that the random force $R(t)$ is now completely uncorrelated, as it is required to satisfy
$$\langle R(0)R(t)\rangle = 2kT\zeta_0\delta(t). \tag{15.2.26}$$
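Eqns. (15.2.25) and (15.2.26) translate into a simple numerical recipe, sketched here under explicit assumptions. The kernel is a made-up exponential, the infinite upper limit is cut off at a finite `t_max`, and the $\delta$-function is discretized on a time grid of spacing $\Delta t$ by the common convention $\delta(t) \to \delta_{nm}/\Delta t$, which is a choice made for this illustration and not something stated in the text.

```python
import math
import random

def static_friction(zeta, t_max, n=20000):
    """Trapezoidal estimate of zeta_0, eqn. (15.2.25), cut off at t_max."""
    h = t_max / n
    total = 0.5 * (zeta(0.0) + zeta(t_max))
    for i in range(1, n):
        total += zeta(i * h)
    return total * h

def white_noise_force(kT, zeta0, dt, nsteps, rng):
    """Draw R_n with <R_n R_m> = 2 kT zeta0 delta_nm / dt (assumed discretization)."""
    sigma = math.sqrt(2.0 * kT * zeta0 / dt)
    return [rng.gauss(0.0, sigma) for _ in range(nsteps)]

# Assumed kernel with amplitude 2.0 and decay time 0.2, so zeta_0 is about 0.4.
zeta0 = static_friction(lambda t: 2.0 * math.exp(-t / 0.2), t_max=10.0)
R = white_noise_force(kT=1.0, zeta0=zeta0, dt=0.01, nsteps=1000, rng=random.Random(0))
```

The width of each random number grows as $\Delta t^{-1/2}$, which reflects the fact that a $\delta$-correlated force has no finite value at a single instant and only its integral over a time step is well defined.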

In addition, note that $\zeta_0$ has units of mass·(time)$^{-1}$. The second limiting case we will consider is a sluggish bath that responds very slowly to changes in the system coordinate. For such a bath, we can take $\zeta(t)$ approximately constant over a long time interval, i.e., $\zeta(t) \approx \zeta(0) \equiv \zeta$, for times that are short compared to the actual response time of the bath. In this case, the memory integral can be approximated as
$$\int_0^t d\tau\, \dot{q}(\tau)\zeta(t - \tau) \approx \zeta\int_0^t d\tau\, \dot{q}(\tau) = \zeta(q(t) - q(0)), \tag{15.2.27}$$
and eqn. (15.2.14) becomes
$$\mu\ddot{q} = -\frac{d}{dq}\left[W(q) + \frac{1}{2}\zeta(q - q(0))^2\right] + R(t). \tag{15.2.28}$$

Here, the effect of friction is now manifest as an extra harmonic term in the potential W (q), and all terms on the right are, again, evaluated at time t. This harmonic term in W (q) has the effect of trapping the system in certain regions of configuration space, an effect known as dynamic caging. Figure 15.1 illustrates how the caging potential ζ[q − q(0)]2 /2 can potentially trap the particle at what would otherwise be a point of unstable equilibrium. An example of this is a dilute mixture of small, light particles in a bath of large, heavy particles. In spatial regions where a heavy particle cluster forms a slowly moving spatial “cage,” the light particles can become trapped. Only rare fluctuations in the bath open up this rigid structure, allowing the light particles to escape. After such an escape, however, the light particles can become trapped again in another cage newly formed elsewhere for a comparable time interval. Not unexpectedly, dynamic caging can cause a significant decrease in the rate of light-particle diffusion.

15.3 Analytically solvable examples

In the next few subsections, a number of simple yet illustrative examples of both Langevin and generalized Langevin dynamics will be examined in detail. In particular, we will study the free Brownian particle and compute its diffusion constant and then consider the free particle in a more general bath with memory. Finally, we will consider the harmonic oscillator and derive well-known relations for the vibrational and energy relaxation times.


Fig. 15.1 Example of the dynamic caging phenomenon. W (q) is taken to be the double-well potential. The potential ζ(q − q0 )2 /2 is the single-minimum solid line, and the dashed line shows the potential shifted to the top of the barrier region.

15.3.1

The free Brownian particle

A particle diffusing in a dissipative bath with no external forces is known as a free Brownian particle. The dynamics is described by eqn. (15.2.23) with $W(q) = 0$:

\[ \mu\ddot{q} = -\zeta_0\dot{q} + R(t). \tag{15.3.1} \]

Since only $\ddot{q}$ and $\dot{q}$ appear in the equation of motion, we can rewrite eqn. (15.3.1) in terms of the velocity $v = \dot{q}$:

\[ \mu\dot{v} = -\zeta_0 v + R(t). \tag{15.3.2} \]

Equation (15.3.2) can be treated as an inhomogeneous first-order equation that can be solved in terms of $R(t)$. In order to derive the solution for a given initial value $v(0)$, we take the Laplace transform of both sides, which yields

\[ \mu\left(s\tilde{v}(s) - v(0)\right) = -\zeta_0\tilde{v}(s) + \tilde{R}(s). \tag{15.3.3} \]

Defining $\gamma_0 = \zeta_0/\mu$ and $f(t) = R(t)/\mu$ and solving for $\tilde{v}(s)$ gives

\[ \tilde{v}(s) = \frac{v(0)}{s + \gamma_0} + \frac{\tilde{f}(s)}{s + \gamma_0}. \tag{15.3.4} \]

The function $1/(s + \gamma_0)$ has a single pole at $s = -\gamma_0$. Hence, the inverse Laplace transform (see Appendix E) yields the solution for $v(t)$ as

\[ v(t) = v(0)e^{-\gamma_0 t} + \int_0^t d\tau\, f(\tau)e^{-\gamma_0(t-\tau)}. \tag{15.3.5} \]
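The structure of eqn. (15.3.5) can be checked numerically. The sketch below is a sanity check rather than a simulation of Brownian motion: it replaces the random force by a smooth, deterministic stand-in, $f(t) = \cos t$ (an assumption made purely so that ordinary quadrature applies). It then compares the convolution formula with a direct Runge-Kutta integration of $\dot{v} = -\gamma_0 v + f(t)$. The function names and parameter values are illustrative.

```python
import math


def convolution_solution(t, v0, gamma0, f, n=2000):
    """Evaluate eqn. (15.3.5), using a trapezoidal rule for the convolution integral."""
    h = t / n
    total = 0.0
    for k in range(n + 1):
        tau = k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(tau) * math.exp(-gamma0 * (t - tau))
    return v0 * math.exp(-gamma0 * t) + h * total


def rk4_solution(t, v0, gamma0, f, n=2000):
    """Integrate dv/dt = -gamma0 * v + f(t) directly with the classical RK4 scheme."""
    h = t / n
    v, s = v0, 0.0

    def rhs(time, vel):
        return -gamma0 * vel + f(time)

    for _ in range(n):
        k1 = rhs(s, v)
        k2 = rhs(s + 0.5 * h, v + 0.5 * h * k1)
        k3 = rhs(s + 0.5 * h, v + 0.5 * h * k2)
        k4 = rhs(s + h, v + h * k3)
        v += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        s += h
    return v


# Deterministic test forcing (NOT a random force): f(t) = cos(t)
v_formula = convolution_solution(3.0, 1.5, 0.8, math.cos)
v_direct = rk4_solution(3.0, 1.5, 0.8, math.cos)
```

The two numbers should agree to within the quadrature error. This confirms that the transient term and the convolution term together solve the first-order equation for any sufficiently smooth forcing.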

From eqn. (15.3.5), it is clear that the solution for a free Brownian particle has two components: a transient component dependent on $v(0)$ that decays at large $t$, and a steady-state term involving a convolution of the random force with $\exp(-\gamma_0 t)$. Thus, the system quickly loses memory of its initial condition, and the dynamics for long times is determined by the bath, as we would expect for a random walk process such as Brownian motion. There are some subtleties associated with integrals over the (stochastic) random force $f(\tau)$, and these will be discussed in Section 15.5. Fortunately, in the examples considered here, we are only interested in correlation functions for which the random force integrals drop out, as we will see, and the problem of evaluating such integrals becomes moot. If we wish to compute the diffusion constant of the Brownian particle, we can use eqn. (13.3.34) and calculate the velocity autocorrelation function $\langle v(0)v(t)\rangle$. From eqn. (15.3.5), the velocity correlation is obtained by multiplying both sides by $v(0)$ and averaging over a canonical distribution of the initial conditions at temperature $T$, $\exp(-\mu v(0)^2/2kT)/Q$:

\[ \langle v(0)v(t)\rangle = \left\langle (v(0))^2\right\rangle e^{-\gamma_0 t} + \int_0^t d\tau\, \langle v(0)f(\tau)\rangle e^{-\gamma_0(t-\tau)}. \tag{15.3.6} \]

The second term in eqn. (15.3.6) vanishes because $\langle v(0)f(\tau)\rangle = \langle v(0)R(\tau)/\mu\rangle = 0$. Thus, interestingly, the velocity autocorrelation function is determined by the transient term, hence, the short-time dynamics. Performing the average $\langle (v(0))^2\rangle$

\[ \left\langle (v(0))^2\right\rangle = \frac{\int_{-\infty}^{\infty} dv\, v^2 e^{-\mu v^2/2kT}}{\int_{-\infty}^{\infty} dv\, e^{-\mu v^2/2kT}} = \frac{kT}{\mu} \tag{15.3.7} \]

yields the velocity autocorrelation function as

\[ \langle v(0)v(t)\rangle = \frac{kT}{\mu}e^{-\gamma_0 t}. \tag{15.3.8} \]

Finally, from eqn. (13.3.34), we find

\[ D = \int_0^{\infty} dt\, \langle v(0)v(t)\rangle = \frac{kT}{\mu}\int_0^{\infty} dt\, e^{-\gamma_0 t} = \frac{kT}{\mu\gamma_0} = \frac{kT}{\zeta_0}, \tag{15.3.9} \]

which has the expected units of length$^2\cdot$(time)$^{-1}$. Note that as $\zeta_0 \to \infty$, the bath becomes infinitely dissipative and the diffusion constant goes to zero. Note that this simple picture of diffusion cannot capture the long-time algebraic decay of the velocity autocorrelation function mentioned in Section 13.3.

15.3.2

Free particle in a bath with memory

If the bath has memory, then the dynamics of the particle is given by the GLE, which, for a free particle, reads

\[ \mu\ddot{q} = -\int_0^t d\tau\, \dot{q}(\tau)\zeta(t-\tau) + R(t). \tag{15.3.10} \]

As a concrete example, suppose the dynamic friction kernel is given by an exponential function

\[ \zeta(t) = \lambda A e^{-\lambda|t|}, \tag{15.3.11} \]

which could describe the long-time decay of a realistic friction kernel. Although the cusp at $t = 0$ is problematic for the short-time behavior of a typical friction kernel, the exponential friction kernel is, nevertheless, a convenient and simple model that can be solved analytically and has been studied in some detail in the literature (Berne et al., 1966). Once again, let us introduce the velocity $v = \dot{q}$. For the exponential friction kernel of eqn. (15.3.11), the GLE then reads

\[ \mu\dot{v} = -\lambda A\int_0^t d\tau\, v(\tau)e^{-\lambda(t-\tau)} + R(t), \tag{15.3.12} \]

where we are restricting the time domain to $t > 0$. Let us introduce the quantities $a = A/\mu$ and $f(t) = R(t)/\mu$. The Laplace transform can turn this integro-differential equation into a simple algebraic equation. Taking the Laplace transform of both sides of eqn. (15.3.12), and solving for $\tilde{v}(s)$, we obtain

\[ \tilde{v}(s) = \frac{v(0)(s+\lambda)}{s^2 + s\lambda + \lambda a} + \frac{\tilde{f}(s)(s+\lambda)}{s^2 + s\lambda + \lambda a}, \tag{15.3.13} \]

where the fact that the memory integral is a convolution has been used to give its Laplace transform as a product of Laplace transforms of $v(t)$ and $\zeta(t)$. For Laplace inversion, the poles of the function $(s+\lambda)/(s^2 + s\lambda + \lambda a)$ are needed. These occur where $s^2 + s\lambda + \lambda a = 0$, which yields two poles $s_\pm$ given by

\[ s_\pm = -\frac{\lambda}{2} \pm \frac{\sqrt{\lambda^2 - 4\lambda a}}{2}. \tag{15.3.14} \]

The poles will be purely real if $\lambda \geq 4a$ and complex if $\lambda < 4a$. Performing the Laplace inversion gives the solution in the form

\[ v(t) = v(0)\left[\frac{(s_+ + \lambda)e^{s_+ t}}{(s_+ - s_-)} + \frac{(s_- + \lambda)e^{s_- t}}{(s_- - s_+)}\right] + \int_0^t d\tau\, f(t-\tau)\left[\frac{(s_+ + \lambda)e^{s_+\tau}}{(s_+ - s_-)} + \frac{(s_- + \lambda)e^{s_-\tau}}{(s_- - s_+)}\right]. \tag{15.3.15} \]

Since $\langle v(0)f(t)\rangle = 0$, the velocity autocorrelation function becomes

\[ \langle v(t)v(0)\rangle = \left\langle (v(0))^2\right\rangle e^{-\lambda t/2}\left[\cos\Omega t + \frac{\lambda}{2\Omega}\sin\Omega t\right], \tag{15.3.16} \]

where $\Omega = \sqrt{\lambda a - \lambda^2/4}$, for complex roots and

\[ \langle v(t)v(0)\rangle = \left\langle (v(0))^2\right\rangle e^{-\lambda t/2}\left[\cosh\alpha t + \frac{\lambda}{2\alpha}\sinh\alpha t\right], \tag{15.3.17} \]

where $\alpha = \sqrt{\lambda^2/4 - \lambda a}$, for real roots. For both cases, the diffusion constant obtained from eqn. (13.3.34) is

\[ D = \frac{kT}{A}. \tag{15.3.18} \]

Since $\int_0^{\infty}\zeta(t)\,dt = A$, eqn. (15.3.18) is consistent with eqn. (15.3.9) for the free Brownian particle. As $A \to \infty$, the bath becomes highly dissipative and $D \to 0$. Again, the overall decay is exponential, which means that the long-time algebraic decay characteristic of such an autocorrelation function is not properly described.

15.3.3

The harmonic oscillator in a bath with memory

As a final example of a GLE model, consider a harmonic reaction coordinate described by a bare potential $V(q) = \mu\omega^2q^2/2$. According to eqn. (15.2.13), the potential of mean force $W(q)$ is also a harmonic potential but with a different frequency given by

\[ \tilde{\omega}^2 = \omega^2 - 2\sum_\alpha \frac{g_\alpha^2}{\mu m_\alpha\omega_\alpha^2}, \tag{15.3.19} \]

so that

\[ W(q) = \frac{1}{2}\mu\tilde{\omega}^2q^2. \tag{15.3.20} \]

The quantity $\tilde{\omega}$ is known as the renormalized frequency. We will examine the case in which the frequency of the oscillator is high compared to the bath frequencies, a condition that exists when the coupling between the system and the bath is weak. In the limit of high $\tilde{\omega}$, the term $-2\sum_\alpha g_\alpha^2/\mu m_\alpha\omega_\alpha^2$ will be a small perturbation to $\omega^2$. For a general friction kernel $\zeta(t)$, the GLE reads

\[ \ddot{q} = -\tilde{\omega}^2q - \int_0^t d\tau\, \dot{q}(\tau)\gamma(t-\tau) + f(t), \tag{15.3.21} \]

where $\gamma(t) = \zeta(t)/\mu$ and $f(t) = R(t)/\mu$. Equation (15.3.21) must be solved subject to initial conditions $q(0)$ and $\dot{q}(0)$. Taking the Laplace transform of both sides and solving for $\tilde{q}(s)$ yields

\[ \tilde{q}(s) = \frac{\left(s + \tilde{\gamma}(s)\right)}{\Delta(s)}q(0) + \frac{\dot{q}(0)}{\Delta(s)} + \frac{\tilde{f}(s)}{\Delta(s)}, \tag{15.3.22} \]

where

\[ \Delta(s) = s^2 + \tilde{\omega}^2 + s\tilde{\gamma}(s). \tag{15.3.23} \]

In order to perform the Laplace inversion, the poles of each of the terms on the right side of eqn. (15.3.22) are needed. These are given by the zeroes of $\Delta(s)$. That is, we seek solutions of

\[ s^2 + \tilde{\omega}^2 + s\tilde{\gamma}(s) = 0. \tag{15.3.24} \]

Even if we do not know the explicit form of $\tilde{\gamma}(s)$, when $\tilde{\omega}$ is large compared to the bath frequencies, it is possible to solve eqn. (15.3.24) perturbatively. We do this by positing a solution to eqn. (15.3.24) for $s$ of the form

\[ s = s_0 + s_1 + s_2 + \cdots \tag{15.3.25} \]

as an ansatz (Tuckerman and Berne, 1993). Substituting eqn. (15.3.25) into eqn. (15.3.24) gives

\[ (s_0 + s_1 + s_2 + \cdots)^2 + \tilde{\omega}^2 + (s_0 + s_1 + s_2 + \cdots)\,\tilde{\gamma}(s_0 + s_1 + s_2 + \cdots) = 0. \tag{15.3.26} \]

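Assuming $\tilde{\omega}^2 \gg s\tilde{\gamma}(s)$ at the root, we can solve this equation to lowest order by neglecting the $s\tilde{\gamma}(s)$ term, which gives

\[ s_0^2 + \tilde{\omega}^2 = 0, \qquad s_0 = \pm i\tilde{\omega}. \tag{15.3.27} \]

Next, working to first order in the perturbation, we have

\[ s_0^2 + 2s_0s_1 + \tilde{\omega}^2 + s_0\tilde{\gamma}(s_0) = 0, \tag{15.3.28} \]

where it has been assumed that $s_0\tilde{\gamma}(s_0)$ is of the same order as $s_1$. Using the fact that $s_0^2 = -\tilde{\omega}^2$ and solving for $s_1$, we obtain

\[ s_1 = -\frac{1}{2}\tilde{\gamma}(\pm i\tilde{\omega}), \tag{15.3.29} \]

which requires the evaluation of $\tilde{\gamma}(s)$ at $s = \pm i\tilde{\omega}$. Note that

\[ \tilde{\gamma}(\pm i\tilde{\omega}) = \int_0^{\infty} dt\, \gamma(t)e^{\mp i\tilde{\omega}t}, \tag{15.3.30} \]

which contains both real and imaginary parts. Defining $\tilde{\gamma}(\pm i\tilde{\omega}) = \gamma'(\tilde{\omega}) \mp i\gamma''(\tilde{\omega})$, there are, to first order, two roots of $\Delta(s)$, which are given by

\[ s_+ = i\left[\tilde{\omega} + \frac{1}{2}\gamma''(\tilde{\omega})\right] - \frac{1}{2}\gamma'(\tilde{\omega}) \equiv i\Omega - \frac{1}{2}\gamma'(\tilde{\omega}) \tag{15.3.31} \]

\[ s_- = -i\left[\tilde{\omega} + \frac{1}{2}\gamma''(\tilde{\omega})\right] - \frac{1}{2}\gamma'(\tilde{\omega}) \equiv -i\Omega - \frac{1}{2}\gamma'(\tilde{\omega}). \tag{15.3.32} \]

Substituting the roots in eqns. (15.3.31) and (15.3.32) into eqn. (15.3.22) yields the solution

\[
\begin{aligned}
q(t) &= q(0)e^{-\gamma'(\tilde{\omega})t/2}\left[\cos\Omega t + \frac{\gamma'(\tilde{\omega})}{2\Omega}\sin\Omega t\right] + \frac{\dot{q}(0)}{\Omega}e^{-\gamma'(\tilde{\omega})t/2}\sin\Omega t \\
&\quad + \frac{1}{\Omega}\int_0^t d\tau\, f(t-\tau)e^{-\gamma'(\tilde{\omega})\tau/2}\sin\Omega\tau \\
\dot{q}(t) &= \dot{q}(0)\left[\cos\Omega t - \frac{\gamma'(\tilde{\omega})}{2\Omega}\sin\Omega t\right]e^{-\gamma'(\tilde{\omega})t/2} - \Omega q(0)e^{-\gamma'(\tilde{\omega})t/2}\sin\Omega t \\
&\quad + \frac{1}{\Omega}\left[f(0)e^{-\gamma'(\tilde{\omega})t/2}\sin\Omega t + \int_0^t d\tau\, f'(t-\tau)e^{-\gamma'(\tilde{\omega})\tau/2}\sin\Omega\tau\right],
\end{aligned}
\tag{15.3.33}
\]

where we have used the fact that $\gamma''(\tilde{\omega}) \ll \Omega$, and we have neglected any terms nonlinear in $\gamma'(\tilde{\omega})$. Since $\langle q(0)f(t)\rangle = 0$ and $\langle \dot{q}(0)f(t)\rangle = 0$, the velocity and position autocorrelation functions become, respectively

\[
\begin{aligned}
C_{vv}(t) &= \left\langle \dot{q}^2(0)\right\rangle e^{-\gamma'(\tilde{\omega})t/2}\left[\cos\Omega t - \frac{\gamma'(\tilde{\omega})}{2\Omega}\sin\Omega t\right] \\
C_{qq}(t) &= \left\langle q^2(0)\right\rangle e^{-\gamma'(\tilde{\omega})t/2}\left[\cos\Omega t + \frac{\gamma'(\tilde{\omega})}{2\Omega}\sin\Omega t\right].
\end{aligned}
\tag{15.3.34}
\]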

As eqn. (15.3.34) shows, the decay time of both correlation functions is $[\gamma'(\tilde{\omega})/2]^{-1}$, which is denoted $T_2$ and is called the vibrational dephasing time. (We will explore vibrational and energy relaxation phenomena as an application of the GLE in greater detail in Section 15.4.) According to eqn. (13.3.41), the velocity autocorrelation function of a harmonic oscillator in isolation is proportional to $\cos\omega t$, where $\omega$ is the bare frequency of the oscillator. In this case, the correlation function does not decay because the system retains infinite memory of its initial condition. However, when coupled to a bath, the oscillator exchanges energy with the bath particles via collision events and, as a result, loses memory of its initial state on a time scale $T_2$. If there is a very large disparity of frequencies between the oscillator and the bath, the coupling between them will be weak and $T_2$ will be long, whereas if the oscillator frequency lies near or within the spectral density of the bath, vibrational energy exchange will occur readily and $T_2$ will be short. Thus, $T_2$ is an indicator of the strength of the coupling between the oscillator and the bath. As the frequency of the oscillator is increased, the coupling between the oscillator and the bath becomes weaker, and $\gamma'(\tilde{\omega})$ decreases. According to eqn. (15.3.34), this means that the correlation functions $C_{qq}(t)$ and $C_{vv}(t)$ decay more slowly, and the number of oscillations that can cycle through on the time scale $T_2$ grows. Two examples of the velocity autocorrelation function $C_{vv}(t)$ are shown in Fig. 15.2. In this example, the values of $\tilde{\omega}$, $\gamma'(\tilde{\omega})$, and $\gamma''(\tilde{\omega})$ correspond to a harmonic diatomic molecule of atomic type A coupled to a bath of A atoms interacting with each other and with the molecule via a Lennard-Jones potential at reduced temperature $\hat{T} = T/\epsilon = 2.5$ and reduced density $\hat{\rho} = \rho\sigma^3 = 1.05$. The frequencies $\omega = 60$ and $\omega = 90$ are expressed in Lennard-Jones reduced frequency units $\sqrt{\epsilon/(m\sigma^2)}$. A method for calculating the friction kernel for a high-frequency oscillator weakly coupled to a bath will be discussed in Section 15.8.
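The trend described above, that $\gamma'(\tilde{\omega})$ falls and $T_2$ grows as the oscillator frequency is raised, can be illustrated with a short numerical sketch. It evaluates the half-range Fourier integral of eqn. (15.3.30) by simple quadrature for a kernel supplied by the user. The exponential kernel $\gamma(t) = \gamma_0e^{-\alpha t}$ and the values $\gamma_0 = 10$, $\alpha = 1$ are assumptions chosen only because the closed form $\tilde{\gamma}(i\tilde{\omega}) = \gamma_0/(\alpha + i\tilde{\omega})$ is available for comparison. They are not the Lennard-Jones data of Fig. 15.2, and the function names are illustrative.

```python
import cmath
import math


def kernel_transform(kernel, omega, t_max=60.0, n=60000):
    """Trapezoidal estimate of int_0^inf dt kernel(t) * exp(-i*omega*t), i.e.
    eqn. (15.3.30) with the upper sign, truncated at t_max.
    Returns (gamma', gamma'') with gamma~(i*omega) = gamma' - i*gamma''."""
    h = t_max / n
    total = 0j
    for k in range(n + 1):
        t = k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * kernel(t) * cmath.exp(-1j * omega * t)
    total *= h
    return total.real, -total.imag


def dephasing_estimate(kernel, omega_tilde):
    """First-order shifted frequency Omega (eqn. 15.3.31) and T2 = 2 / gamma'."""
    g_re, g_im = kernel_transform(kernel, omega_tilde)
    shifted_frequency = omega_tilde + 0.5 * g_im
    t2 = 2.0 / g_re
    return shifted_frequency, t2


# Assumed model kernel (illustrative parameters, not taken from the text)
GAMMA0, ALPHA = 10.0, 1.0


def exp_kernel(t):
    return GAMMA0 * math.exp(-ALPHA * t)


low = dephasing_estimate(exp_kernel, 10.0)   # (Omega, T2) at omega~ = 10
high = dephasing_estimate(exp_kernel, 20.0)  # (Omega, T2) at omega~ = 20
```

For this kernel, $\gamma'(\tilde{\omega}) = \gamma_0\alpha/(\alpha^2 + \tilde{\omega}^2)$, so doubling $\tilde{\omega}$ in the high-frequency regime lengthens $T_2$ by roughly a factor of four.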

Fig. 15.2 Velocity autocorrelation function of the bond length of a harmonic diatomic coupled to a Lennard-Jones bath as described in the text. The bond frequencies $\omega = 60$ and $\omega = 90$ are expressed in units of $\sqrt{\epsilon/(m\sigma^2)}$.


15.3.4


Exact solution for an exponential memory kernel

In this section, we will consider the exact, analytical solution to eqn. (15.3.21) with an exponential memory kernel of the form

\[ \gamma(t) = \gamma_0e^{-\alpha|t|}. \tag{15.3.35} \]

Our goal in carrying out this analysis is not to obtain the detailed mathematical solution, which requires a long derivation that does not contain much physical content. Rather, we will provide enough of the derivation that the relevant physics can be readily extracted. It is important to note that the kernel in eqn. (15.3.35) is unphysical at $t = 0$ because of the cusp that exists there. Our reasons for considering such a kernel are threefold: (1) such a kernel models interesting physical behavior for $t > 0$; (2) this physical behavior can be exploited to design useful memory kernels for molecular simulation, which we will discuss later in this chapter; (3) eqn. (15.3.21) can be solved analytically. As we have done in the previous examples, we will transform the GLE into Laplace space and examine the solutions in terms of the Laplace variable $s$. For the exponential memory kernel in eqn. (15.3.35), if $t \geq 0$, the absolute value of $t$ becomes irrelevant, and $\tilde{\gamma}(s) = \gamma_0/(s + \alpha)$. Multiplying the numerator and denominator of each term in eqn. (15.3.22) by $s + \alpha$ then gives

\[ \tilde{q}(s) = \frac{s^2 + \alpha s + \gamma_0}{D(s)}q(0) + \frac{s + \alpha}{D(s)}\dot{q}(0) + \frac{(s + \alpha)\tilde{f}(s)}{D(s)}, \tag{15.3.36} \]

where

\[ D(s) = s^3 + \alpha s^2 + \left(\tilde{\omega}^2 + \gamma_0\right)s + \alpha\tilde{\omega}^2. \tag{15.3.37} \]

We noted in eqn. (15.3.24) that solving the GLE requires finding the poles of each term on the right of eqn. (15.3.36), which are given by the zeroes of $D(s)$. Since $D(s)$ is a cubic polynomial, there will be three roots, which we can find using the cubic formula. Of the three roots, at least one must be real, giving rise to a pure exponential decay. Of the remaining two, these can either be complex conjugates or purely real, and it is interesting to examine which conditions lead to complex roots and which lead to real roots. The general form of these three roots is

\[
\begin{aligned}
s_1 &= A + B - \frac{\alpha}{3} \\
s_2 &= -\frac{1}{2}(A + B) - \frac{\alpha}{3} + i\frac{\sqrt{3}}{2}(A - B) \\
s_3 &= -\frac{1}{2}(A + B) - \frac{\alpha}{3} - i\frac{\sqrt{3}}{2}(A - B),
\end{aligned}
\tag{15.3.38}
\]

where

\[ A = \left(-\frac{g}{2} + \sqrt{\delta}\right)^{1/3}, \qquad B = \left(-\frac{g}{2} - \sqrt{\delta}\right)^{1/3} \tag{15.3.39} \]

and

\[ g = \frac{2\alpha}{3}\left(\frac{\alpha^2}{9} + \tilde{\omega}^2 - \frac{\gamma_0}{2}\right), \qquad \delta = \frac{g^2}{4} + \frac{1}{27}\left(\tilde{\omega}^2 + \gamma_0 - \frac{\alpha^2}{3}\right)^3. \tag{15.3.40} \]

Here, $\delta$ is called the discriminant of the cubic. If $\delta > 0$, then two of the roots are complex (conjugates), and if $\delta < 0$, all of the roots are real. Since $g^2/4$ is a non-negative quantity, $\delta$ can be negative only if $\tilde{\omega}^2 + \gamma_0 - \alpha^2/3$ is negative. If this quantity is positive, then $\delta > 0$ and two of the roots are complex. If it is negative, then $\delta$ is negative provided $|\tilde{\omega}^2 + \gamma_0 - \alpha^2/3|^3/27 > g^2/4$, which, since $\gamma_0 > 0$, can only be true if the decay rate $\alpha$ of the memory kernel is large compared to the frequency $\tilde{\omega}$. However, because of the $g^2/4$ term, the prefactor $\gamma_0$ needs to be moderately large as well. This limit is clearly overdamped, as the frequency of the oscillator $\tilde{\omega}$ is low in a high-friction environment with little memory. If $\tilde{\omega}$ is high compared to $\gamma_0$ and the decay rate, then $\delta > 0$. However, $\delta > 0$ also holds whenever $\tilde{\omega}^2 + \gamma_0 > \alpha^2/3$, which occurs when $\alpha$ is small compared to $\sqrt{\tilde{\omega}^2 + \gamma_0}$, that is, when the memory kernel decays slowly. This represents the dynamic caging limit discussed previously (cf. eqn. (15.2.28)). Thus, in carrying out the analysis in this section, we see that physical insight can be gained simply by examining the behavior of the poles of eqn. (15.3.36). Of course, if a complete time-dependent solution is needed, it can be obtained by inverting the Laplace transform using the three roots in eqns. (15.3.38) (see Problem 15.11). In Section 15.6, we will introduce simple algorithms for integrating the GLE with exponential memory.
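The pole analysis of this section can be tried out numerically. The following sketch evaluates the cubic formula of eqns. (15.3.38)-(15.3.40) in complex arithmetic and reports the discriminant. One implementation detail is an assumption of the sketch rather than something stated in the text: only one cube root is taken explicitly, and its partner is fixed through the standard Cardano condition $AB = -(\tilde{\omega}^2 + \gamma_0 - \alpha^2/3)/3$ so that the two branches are consistent. As a result, the labels $s_1$, $s_2$, $s_3$ may be permuted relative to eqn. (15.3.38). The function name and parameter values are illustrative.

```python
import cmath
import math


def cubic_poles(omega_tilde, gamma0, alpha):
    """Zeroes of D(s) = s^3 + alpha*s^2 + (omega~^2 + gamma0)*s + alpha*omega~^2,
    computed from the cubic formula. Returns ([s1, s2, s3], delta)."""
    w2 = omega_tilde ** 2
    p = w2 + gamma0 - alpha ** 2 / 3.0
    g = (2.0 * alpha / 3.0) * (alpha ** 2 / 9.0 + w2 - 0.5 * gamma0)
    delta = 0.25 * g ** 2 + p ** 3 / 27.0
    root_delta = cmath.sqrt(delta)

    # Take the cube root of the larger-magnitude radicand, then obtain the
    # partner from A * B = -p / 3 (branch-consistency condition).
    rad_plus = -0.5 * g + root_delta
    rad_minus = -0.5 * g - root_delta
    radicand = rad_plus if abs(rad_plus) >= abs(rad_minus) else rad_minus
    if abs(radicand) == 0.0:
        A = B = 0j  # triple root at -alpha/3
    else:
        A = complex(radicand) ** (1.0 / 3.0)
        B = -p / (3.0 * A)

    shift = alpha / 3.0
    rot = 1j * math.sqrt(3.0) / 2.0
    roots = [
        A + B - shift,
        -0.5 * (A + B) - shift + rot * (A - B),
        -0.5 * (A + B) - shift - rot * (A - B),
    ]
    return roots, delta


def D(s, omega_tilde, gamma0, alpha):
    """The cubic of eqn. (15.3.37), used here to check the roots."""
    w2 = omega_tilde ** 2
    return s ** 3 + alpha * s ** 2 + (w2 + gamma0) * s + alpha * w2


# Slowly decaying kernel: expect delta > 0 and one complex-conjugate pair
osc_roots, osc_delta = cubic_poles(2.0, 1.0, 1.0)
# Fast kernel decay with sizeable gamma0: expect delta < 0 and three real roots
od_roots, od_delta = cubic_poles(1.0, 20.0, 10.0)
```

The second parameter set gives $D(s) = s^3 + 10s^2 + 21s + 10 = (s + 2)(s^2 + 8s + 5)$, whose zeroes are all real and negative. This corresponds to the overdamped limit described above.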

15.4

Vibrational dephasing and energy relaxation in simple fluids

An application of the GLE that is of particular interest in chemical physics is the study of vibrational and energy relaxation phenomena. As we noted in Section 15.3.3, quantifying energy exchange between the system and the bath provides direct information about the strength of the system–bath coupling. For a harmonic oscillator coupled to a bath with memory, it was shown that, when the frequency of the oscillator is high compared to the spectral density of the bath, the vibrational relaxation time $T_2$ satisfies

\[ \frac{1}{T_2} = \frac{\zeta'(\tilde{\omega})}{2\mu}. \tag{15.4.1} \]

$T_2$ is a measure of the decay time of the velocity and position autocorrelation functions. In addition to $T_2$, there is another relevant time scale, denoted $T_1$, which measures the rate of energy relaxation of the system. In this section, we will show how the GLE can be used to develop classical relations between $T_1$ and $T_2$ for both harmonic and anharmonic oscillators coupled to a bath. The times $T_1$ and $T_2$ are generally measured experimentally using nuclear-magnetic resonance techniques and, therefore, relate to quantum processes. However, we will see that the GLE can nevertheless provide useful insights into the physical nature of these two time scales. Using the solutions of the GLE, it is also possible to show that the cross-correlation functions

\[ C_{vq}(t) = \langle v(0)q(t)\rangle, \qquad C_{qv}(t) = \langle q(0)v(t)\rangle \tag{15.4.2} \]

have the same decay time. For this discussion, we will find it convenient to introduce a change of nomenclature and work with normalized correlation functions:

\[ C_{ab}(t) = \frac{\langle a(0)b(t)\rangle}{\langle a^2\rangle}. \tag{15.4.3} \]

In terms of the four normalized correlation functions, $C_{qq}(t)$, $C_{vv}(t)$, $C_{qv}(t)$, and $C_{vq}(t)$, the solutions of eqn. (15.3.21) can be expressed as

\[
\begin{aligned}
q(t) &= q(0)C_{qq}(t) + \dot{q}(0)C_{vq}(t) + \int_0^t d\tau\, f(t-\tau)C_{vq}(\tau) \\
\dot{q}(t) &= \dot{q}(0)C_{vv}(t) + q(0)C_{qv}(t) + \int_0^t d\tau\, f(t-\tau)C_{vv}(\tau)
\end{aligned}
\tag{15.4.4}
\]

(see Problem 15.4). Moreover, if the internal energy of the oscillator

\[ \varepsilon(t) = \frac{1}{2}\mu\dot{q}^2(t) + \frac{1}{2}\mu\tilde{\omega}^2q^2(t) \tag{15.4.5} \]

is calculated using the solutions in eqn. (15.4.4), then, as was shown by Tuckerman and Berne (1993), the autocorrelation function of $\varepsilon(t)$ is

\[ C_{\varepsilon\varepsilon}(t) = \frac{1}{2}C_{vv}^2(t) + \frac{1}{2}C_{qq}^2(t) + \frac{1}{\tilde{\omega}^2}C_{qv}^2(t) \tag{15.4.6} \]

(see Problem 15.4). Since each of the correlation functions appearing in eqn. (15.4.6) has an exponential decay envelope of the form $\exp(-\zeta'(\tilde{\omega})t/2\mu)$, it follows that $C_{\varepsilon\varepsilon}(t)$ will decay as $\exp(-\zeta'(\tilde{\omega})t/\mu)$. This time scale corresponds to $T_1$ and is given simply by

\[ \frac{1}{T_1} = \frac{\zeta'(\tilde{\omega})}{\mu}. \tag{15.4.7} \]

Equations (15.4.1) and (15.4.7) are especially useful in analyzing the dynamics of solvation. They have also found use more broadly, for example, in analyzing certain classes of polarizable force field models (Rupakheti et al., 2020). A comparison of eqns. (15.4.7) and (15.4.1) reveals the prediction of the classical GLE approach, namely, that the vibrational dephasing and energy relaxation times for a harmonic oscillator coupled to a bath are related by

\[ \frac{1}{T_2} = \frac{1}{2T_1}. \tag{15.4.8} \]

Equation (15.4.8) is true only for purely harmonic systems. However, real bonds always involve some degree of anharmonicity, which changes the relation between $T_1$ and $T_2$. The more general expression of this relation is

\[ \frac{1}{T_2} = \frac{1}{2T_1} + \frac{1}{T_2^*}, \tag{15.4.9} \]

where $T_2^*$ is a pure dephasing time. Now suppose we add to the harmonic potential $\mu\tilde{\omega}^2q^2/2$ a small cubic term of the form $gq^3/6$ so that the potential of mean force $W(q)$ becomes

\[ W(q) = \frac{1}{2}\mu\tilde{\omega}^2q^2 + \frac{1}{6}gq^3. \tag{15.4.10} \]

Theoretical treatments of such a cubic anharmonicity have been presented by Oxtoby (1979), Levine et al. (1988), Tuckerman and Berne (1993), and Bader and Berne (1994), all of which lead to explicit expressions for the pure dephasing time. We note, however, that only direct solution of the full GLE, albeit an approximate one, yields $1/T_2$ in the form of eqn. (15.4.9), as we will now show. The GLE corresponding to the potential in eqn. (15.4.10) reads

\[ \ddot{q} = -\tilde{\omega}^2q - \frac{g}{2\mu}q^2 - \int_0^t d\tau\, \dot{q}(\tau)\gamma(t-\tau) + f(t). \tag{15.4.11} \]

As long as the excursions of $q$ in the cubic potential do not stray too far from the neighborhood of $q = 0$, the motion of $q$ remains bound between definite turning points. However, as the energy of the oscillator fluctuates, the time required to move between the turning points varies. In other words, the period of the motion, and hence the frequency, varies as a function of the energy. Therefore, we seek a perturbative solution of eqn. (15.4.11), in which the anharmonicity is treated as an effect that causes the vibrational frequency to fluctuate in time. Equation (15.4.11) is then replaced, to lowest order in perturbation theory, by an equation of the form

\[ \ddot{q} = -\omega^2(t)q - \int_0^t d\tau\, \dot{q}(\tau)\gamma(t-\tau) + f(t), \tag{15.4.12} \]

where $\omega(t) = \tilde{\omega} + \delta\omega(t)$ and $\delta\omega(t) = gf(t)/2\mu\tilde{\omega}^3$. By studying the autocorrelation function $C_{qq}(t)$ within perturbation theory, it was shown (Tuckerman and Berne, 1993) that $C_{qq}(t)$ is an oscillatory function with an exponential decay envelope. Thus, the general approximate form of $C_{qq}(t)$ is

\[ C_{qq}^{(\pm)}(t) = C_{qq}^{(\pm,0)}(t)\left\langle e^{i\int_0^t d\tau\, \delta\omega(\tau)}\right\rangle \approx C_{qq}^{(\pm,0)}(t)\, e^{-\int_0^t d\tau\, (t-\tau)\langle\delta\omega(0)\delta\omega(\tau)\rangle}, \tag{15.4.13} \]

where $C_{qq}^{(\pm,0)}(t)$ are purely harmonic autocorrelation functions similar to those in eqn. (15.3.34). The exponential decay term in eqn. (15.4.13) is the result of a cumulant expansion applied to $\langle\exp(i\int_0^t d\tau\, \delta\omega(\tau))\rangle$ (see eqn. (4.8.22)). Combining the decay of $C_{qq}^{(\pm,0)}(t)$ with the long-time behavior of the integral in eqn. (15.4.13) leads to the following expression for the vibrational dephasing time:

\[ \frac{1}{T_2} = \frac{\zeta'(\tilde{\omega})}{2\mu} + \frac{g^2kT}{4\mu^3\tilde{\omega}^6}\tilde{\gamma}(0), \tag{15.4.14} \]

where $\tilde{\gamma}(0)$ is the Laplace transform of $\gamma(t)$ at $s = 0$, which is also the static friction coefficient. The second term in eqn. (15.4.14) is a consequence of the anharmonicity. Since $T_2^*$ is a pure dephasing time, $1/T_1$ is still $\zeta'(\tilde{\omega})/\mu$ to the same order in perturbation theory. Hence, eqn. (15.4.14) implies that $1/T_2 \geq 1/2T_1$, where equality holds for $g = 0$. This inequality between $T_1$ and $T_2$ is usually true for anharmonic systems. An analysis by Skinner and coworkers using a higher order in perturbation theory suggested possible violations of this inequality under special circumstances (Budimir and Skinner, 1987; Laird and Skinner, 1991). Further analysis of such violations and potential difficulties with their detection were subsequently discussed by Reichman and Silbey (1996).
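The relations among $T_1$, $T_2$, and $T_2^*$ reduce to a few lines of arithmetic once $\zeta'(\tilde{\omega})$ and $\tilde{\gamma}(0)$ are known. The sketch below simply transcribes eqns. (15.4.7), (15.4.9), and (15.4.14). The function name, argument names, and numerical values are illustrative assumptions, and the two friction quantities are treated as given inputs (for instance, from a separately computed friction kernel).

```python
def relaxation_rates(zeta_prime, gamma_tilde_zero, mu, omega_tilde, g, kT):
    """Return (1/T1, 1/T2, 1/T2*) for a weakly anharmonic oscillator.

    zeta_prime       : zeta'(omega~), real part of the friction transform at omega~
    gamma_tilde_zero : gamma~(0), Laplace transform of gamma(t) at s = 0
    g                : cubic anharmonicity constant of eqn. (15.4.10)
    """
    inv_t1 = zeta_prime / mu                                   # eqn. (15.4.7)
    inv_t2_star = (g ** 2 * kT * gamma_tilde_zero
                   / (4.0 * mu ** 3 * omega_tilde ** 6))       # second term of (15.4.14)
    inv_t2 = 0.5 * inv_t1 + inv_t2_star                        # eqn. (15.4.9)
    return inv_t1, inv_t2, inv_t2_star


# Hypothetical inputs in arbitrary reduced units
harmonic = relaxation_rates(0.2, 5.0, 1.0, 3.0, 0.0, 1.0)
anharmonic = relaxation_rates(0.2, 5.0, 1.0, 3.0, 4.0, 1.0)
```

Setting `g = 0` recovers $1/T_2 = 1/2T_1$, while any nonzero `g` adds a positive pure-dephasing contribution. Because of the $\tilde{\omega}^{-6}$ factor, this contribution falls off rapidly as the oscillator frequency increases.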

15.5

Molecular dynamics with the Langevin equation

Because the Langevin and generalized Langevin equations replace a large number of bath degrees of freedom with the much simpler memory integral and random force terms, simulations based on these equations are convenient and often very useful. They have a much lower computational overhead than a full bath calculation and can, therefore, access much longer time scales. The Langevin equation can also be employed as a simple and efficient thermostatting method for generating the canonical distribution, which is one of the most common applications. The generalized Langevin equation can also be used as a thermostatting method; however, the need to input a dynamic friction kernel $\zeta(t)$ renders the GLE less convenient for this purpose. Under certain conditions, a friction kernel can be generated from a molecular dynamics simulation (Straub et al., 1988; Berne et al., 1990), but some subtleties arise, which we will discuss in Section 15.8. When such a friction kernel is available, the GLE can, to a good approximation, yield the same dynamical properties as the full molecular dynamics calculation. Because of this important property, the GLE has been employed in the development of low-dimensional or “coarse-grained” models derived from fully atomistic potential functions. Employing the GLE helps to ensure that the coarse-grained model can more faithfully reproduce the dynamics of the more detailed model from which it is obtained (Izvekov and Voth, 2006). It has also been employed in the development of molecular dynamics methods (Ceriotti et al., 2010a) that mimic quantum effects (Ceriotti et al., 2009; Ganeshan et al., 2013), accelerate path-integral molecular dynamics calculations (Ceriotti et al., 2010b; Ceriotti et al., 2011), and circumvent resonances in multiple time-step methods (Morrone et al., 2011).

15.5.1

Numerical integration of the Langevin equation

The Langevin equation can be used in place of the methods discussed in Chapter 4 as a means of generating a canonical distribution. In fact, it is generally preferred by mathematicians, as its stochastic nature means ergodicity of the Langevin equation can be more easily proved for certain cases, while ergodicity of deterministic methods such as Nosé-Hoover chains has never been established. In this section, we will derive a numerical integrator for the Langevin equation, following an approach introduced by Leimkuhler and Matthews (2012), using the Liouville operator formalism employed throughout this book. We begin by writing the Langevin equation for a single coordinate $q$ and conjugate momentum $p$ in the form

\[
\begin{aligned}
\dot{q} &= \frac{p}{\mu} \\
\dot{p} &= F(q) - \gamma p + \sigma\eta(t),
\end{aligned}
\tag{15.5.1}
\]

where $\gamma = \zeta_0/\mu$, $\eta(t) = R(t)/\sqrt{2kT\gamma\mu}$, and $\sigma = \sqrt{2kT\gamma\mu}$. Since $\langle R(0)R(t)\rangle = 2kT\zeta_0\delta(t)$, it follows that $\langle\eta(0)\eta(t)\rangle = \delta(t)$.² The Liouville operator corresponding to eqns. (15.5.1) is

\[ iL = \frac{p}{\mu}\frac{\partial}{\partial q} + F(q)\frac{\partial}{\partial p} + iL_{\rm OU} \equiv iL_1 + iL_2 + iL_{\rm OU}, \tag{15.5.2} \]

where $iL_{\rm OU}$, known as the Ornstein-Uhlenbeck operator, is determined by the second and third terms on the right side of the $\dot{p}$ equation in eqns. (15.5.1). The operator $iL_{\rm OU}$ produces free Brownian motion (see Section 15.3.1). In order to derive an integrator for the Langevin equation, we apply the Trotter theorem with the following operator factorization scheme for a small time step $\Delta t$ (see Section 4.12):

\[ e^{iL\Delta t} = e^{iL_2\Delta t/2}e^{iL_1\Delta t/2}e^{iL_{\rm OU}\Delta t}e^{iL_1\Delta t/2}e^{iL_2\Delta t/2}. \tag{15.5.3} \]

The placement of the OU operator between half-step position updates was shown by Leimkuhler and Matthews (2012) and later by Zhang et al. (2017, 2019) to have the lowest error of all possible second-order factorizations.³ Without the middle factor $\exp(iL_{\rm OU}\Delta t)$, the factorization in eqn. (15.5.3) would be equivalent to the velocity Verlet method and would produce the algorithm in eqn. (3.10.36). (The action of the operators $\exp(iL_1\Delta t/2)$ and $\exp(iL_2\Delta t/2)$ was discussed in Section 3.10.)

² Because $R(t)$ and $\eta(t)$ are stochastic processes that are neither continuous nor anywhere differentiable, it is strictly correct to write eqns. (15.5.1) as equations for finite differentials, as $dq = (p/\mu)\,dt$ and $dp = F(q)\,dt - \gamma p\,dt + \sigma\,dw(t)$, where $\eta(t) = dw/dt$, with $w(t)$ known as the Wiener process.

³ The integrator in eqn. (15.5.3) is often referred to colloquially as the “BAOAB” integrator, an acronym coined by Leimkuhler and Matthews, who wrote eqn. (15.5.2) as $iL = iL_A + iL_B + iL_O$. The Trotter factorization pattern in eqn. (15.5.3), therefore, follows the letter pattern B · A · O · A · B.

The new operator in this scheme is $\exp(iL_{\rm OU}\Delta t)$. At the end of this section, we will show that the action of this operator can be worked out analytically and that it produces the following evolution:

\[
\begin{aligned}
e^{iL_{\rm OU}\Delta t}q &= q \\
e^{iL_{\rm OU}\Delta t}p &= pe^{-\gamma\Delta t} + \sigma\sqrt{\frac{1 - e^{-2\gamma\Delta t}}{2\gamma}}\; r,
\end{aligned}
\tag{15.5.4}
\]

where $r$ is a Gaussian random number of zero mean and unit width (see Section 3.8.3). If we take eqn. (15.5.4) as given and apply the direct translation technique, then the pseudocode in eqn. (3.10.36) would be modified to read as follows:

    p = p + 0.5 ∗ ∆t ∗ F
    q = q + 0.5 ∗ ∆t ∗ p/µ
    p = p ∗ exp(−γ ∗ ∆t) + r ∗ σ ∗ sqrt((1 − exp(−2 ∗ γ ∗ ∆t))/(2 ∗ γ))
    q = q + 0.5 ∗ ∆t ∗ p/µ
    Recalculate the force
    p = p + 0.5 ∗ ∆t ∗ F.                                        (15.5.5)
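The pseudocode above translates almost line by line into a runnable program. The sketch below is a minimal illustration for a single harmonic oscillator, not a production integrator. The function name, the fixed random seed, the initial condition $q = p = 0$, and the accumulation of $\langle q^2\rangle$ and $\langle p^2\rangle$ as a check on canonical sampling are all choices made for this example. The time step and run length are likewise illustrative and are much shorter than the run reported in the text.

```python
import math
import random


def baoab_harmonic(n_steps, dt, gamma, mu=1.0, omega=1.0, kT=1.0, seed=12345):
    """Integrate the Langevin equation for W(q) = mu*omega^2*q^2/2 with the
    B-A-O-A-B splitting of eqn. (15.5.3). Returns (<q^2>, <p^2>) over the run."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * kT * gamma * mu)
    decay = math.exp(-gamma * dt)                       # exp(-gamma*dt)
    width = sigma * math.sqrt((1.0 - math.exp(-2.0 * gamma * dt)) / (2.0 * gamma))

    def force(x):
        return -mu * omega ** 2 * x

    q, p = 0.0, 0.0
    f = force(q)
    sum_q2 = 0.0
    sum_p2 = 0.0
    for _ in range(n_steps):
        p += 0.5 * dt * f                               # B: half-step momentum kick
        q += 0.5 * dt * p / mu                          # A: half-step position drift
        p = decay * p + width * rng.gauss(0.0, 1.0)     # O: exact OU update, eqn. (15.5.4)
        q += 0.5 * dt * p / mu                          # A: half-step position drift
        f = force(q)                                    # recalculate the force
        p += 0.5 * dt * f                               # B: half-step momentum kick
        sum_q2 += q * q
        sum_p2 += p * p
    return sum_q2 / n_steps, sum_p2 / n_steps


# Illustrative run: mu = omega = kT = 1, gamma = 0.5
mean_q2, mean_p2 = baoab_harmonic(400000, 0.05, 0.5)
```

For canonical sampling of this oscillator, the averages should approach $\langle q^2\rangle = kT/\mu\omega^2$ and $\langle p^2\rangle = \mu kT$, up to statistical noise and time-step error. Comparing the returned values against these targets is a quick test that the friction and noise terms are balanced correctly.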

As an application of the algorithm in eqn. (15.5.5), we calculate the trajectory, phase space, and position distribution functions of a harmonic oscillator $W(q) = \mu\omega^2q^2/2$ with $\mu = 1$, $\omega = 1$, and $kT = 1$ for $\gamma = 0.5$ and $\gamma = 8$; the results are shown in Fig. 15.3. The Langevin equation is integrated for $10^8$ steps with a time step of $\Delta t = 0.01$. Figure 15.3 shows how the trajectory changes between $\gamma = 0.5$ and $\gamma = 8$. Despite different values of the damping constant, the computed distribution functions agree with the analytical distributions. Note that values too large or too small for $\gamma$ lead to distortions in the probability distribution. In the present example, values of $\gamma$ less than $10^{-3}$ or greater than 100 lead to such distortions.

Fig. 15.3 (Top) Trajectories of a harmonic oscillator with $\mu = 1$, $\omega = 1$, and $kT = 1$ coupled to a bath via the Langevin equation for $\gamma = 0.5$ (left) and $\gamma = 8$ (right). Here, $T = 2\pi/\omega$ is the period of the oscillator. (Middle) Phase space Poincaré sections. (Bottom) Position probability distribution functions.

We now present an approach for deriving eqn. (15.5.4). Recall that the operator $\exp(iL_{\rm OU}t)$ produces free Brownian motion, which was discussed in Section 15.3.1. In principle, the operator should produce the evolution given in eqn. (15.3.5). That is, if we set $t = \Delta t$ and express eqn. (15.3.5) in terms of momentum, we would write

\[ e^{iL_{\rm OU}\Delta t}p = pe^{-\gamma\Delta t} + \sigma\int_0^{\Delta t} d\tau\, \eta(\tau)e^{-\gamma(\Delta t-\tau)}. \tag{15.5.6} \]

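Although the integral in eqn. (15.5.6) over the stochastic process $\eta(t)$ is defined, evaluating it is not straightforward and requires a mathematical framework known as the stochastic Itô calculus (named for Japanese mathematician Kiyosi Itô, 1915–2008). Since a presentation of the Itô calculus lies beyond the scope of this book, we will present a more heuristic derivation that nevertheless draws on its ideas. Since the Langevin equation assumes a Gaussian random process, we take

\[ p(\Delta t) = pe^{-\gamma\Delta t} + r\sigma(\Delta t), \tag{15.5.7} \]

where $r$ is a Gaussian random number of zero mean and unit width and $\sigma(\Delta t)$ is a time-step dependent width to be determined such that eqn. (15.5.7) is consistent with eqn. (15.5.6) in a statistical sense. By construction, $\sigma(\Delta t)$ is determined from $\langle p^2(\Delta t)\rangle - \langle p(\Delta t)\rangle^2 = \sigma^2(\Delta t)$. Since $\eta(t)$ satisfies $\langle\eta(t)\rangle = 0$ and $\langle\eta(t)\eta(t')\rangle = \delta(t - t')$, let us use eqn. (15.5.6) to compute the averages $\langle p^2(\Delta t)\rangle$ and $\langle p(\Delta t)\rangle$, assuming a large ensemble of trajectories all starting from the same initial momentum $p$. We then obtain

\[ \langle p(\Delta t)\rangle = pe^{-\gamma\Delta t} + \sigma\int_0^{\Delta t} d\tau\, e^{-\gamma(\Delta t-\tau)}\langle\eta(\tau)\rangle = pe^{-\gamma\Delta t} \tag{15.5.8} \]

and

\[
\begin{aligned}
\langle p^2(\Delta t)\rangle &= p^2e^{-2\gamma\Delta t} + 2\sigma e^{-\gamma\Delta t}\int_0^{\Delta t} d\tau\, e^{-\gamma(\Delta t-\tau)}p\langle\eta(\tau)\rangle \\
&\quad + \sigma^2\int_0^{\Delta t} d\tau\int_0^{\Delta t} d\tau'\, \langle\eta(\tau)\eta(\tau')\rangle e^{-2\gamma\Delta t}e^{\gamma(\tau+\tau')} \\
&= p^2e^{-2\gamma\Delta t} + \sigma^2e^{-2\gamma\Delta t}\int_0^{\Delta t} d\tau\int_0^{\Delta t} d\tau'\, \delta(\tau - \tau')e^{\gamma(\tau+\tau')} \\
&= p^2e^{-2\gamma\Delta t} + \sigma^2e^{-2\gamma\Delta t}\int_0^{\Delta t} d\tau\, e^{2\gamma\tau} \\
&= p^2e^{-2\gamma\Delta t} + \frac{\sigma^2}{2\gamma}\left(1 - e^{-2\gamma\Delta t}\right).
\end{aligned}
\tag{15.5.9}
\]

Subtracting the square of eqn. (15.5.8) from the last line of eqn. (15.5.9), we obtain

\[
\begin{aligned}
\sigma^2(\Delta t) &= \frac{\sigma^2}{2\gamma}\left(1 - e^{-2\gamma\Delta t}\right) \\
\sigma(\Delta t) &= \sigma\sqrt{\frac{1 - e^{-2\gamma\Delta t}}{2\gamma}},
\end{aligned}
\tag{15.5.10}
\]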

which, when substituted into eqn. (15.5.7), yields the evolution scheme in eqn. (15.5.4).

To employ Langevin dynamics as a thermostat in a molecular dynamics calculation of $N$ particles in three dimensions, each degree of freedom is subjected to a friction and a random force. If the same random force is applied to each degree of freedom in a given time step, this would be equivalent to a gentle global thermostat, similar to global Nosé-Hoover chains discussed in Section 4.11. If a more aggressive thermostat, analogous to a “massive” Nosé-Hoover chain, is desired, then each degree of freedom would be subject to its own random force in each step of the simulation, requiring sampling $3N$ Gaussian random numbers in each step.

Finally, consider the overdamped Langevin equation in the form

\[ \dot{q} = \frac{1}{\zeta_0}F(q) + \tilde{\sigma}\eta(t), \tag{15.5.11} \]

where $\tilde{\sigma} = \sqrt{2kT/\zeta_0}$. The analogous integrator for the overdamped Langevin equation advances the coordinate $q$ according to (Leimkuhler and Matthews, 2012)

\[ q(t + \Delta t) = q(t) + \frac{\Delta t}{\zeta_0}F(q(t)) + \frac{1}{2}\tilde{\sigma}\sqrt{\Delta t}\left[r_t + r_{t+\Delta t}\right], \tag{15.5.12} \]

where $r_t$ and $r_{t+\Delta t}$ are the Gaussian random numbers from the previous step and the current step, respectively.

15.5.2

The stochastic Nosé-Hoover thermostat

The Langevin equation (15.5.1) has friction and noise applied directly to the physical degree of freedom, leading to a scheme for sampling a canonical distribution. However, it is also possible to combine the stochastic nature of the Langevin equation with the conventional Nosé-Hoover approach of Section 4.9.2 to produce a mild, stochastic canonical sampling scheme that is also capable of approximately preserving dynamical quantities normally generated within the microcanonical ensemble. In order to generate robust canonical sampling, an approach such as this should function as Nosé-Hoover chains do (see Section 4.11) by curing sampling failures due to conservation laws that restrict the accessible phase space while providing an option for less aggressive sampling. An approach that accomplishes this is the Nosé-Hoover Langevin scheme of Leimkuhler et al. (2009). In their scheme, the Nosé-Hoover chains of Section 4.11 are replaced by a friction and noise term added to a simple Nosé-Hoover thermostat, giving a set of equations of motion for a single coordinate $q$ and momentum $p$ that take the form

\[
\begin{aligned}
\dot{q} &= \frac{p}{m} \\
\dot{p} &= F(q) - \frac{\pi}{Q}p \\
\dot{\pi} &= G(p) - \gamma\pi + \sigma\eta(t),
\end{aligned}
\tag{15.5.13}
\]

where $\sigma = \sqrt{2\gamma kTQ}$ and $G(p) = p^2/m - kT$. Note, here, that $\pi$ is a thermostat momentum variable to which the friction and the stochastic noise term are applied. Since there is no conserved energy associated with these equations of motion, there is no need for a thermostat “coordinate”, as was used in the Nosé-Hoover (4.9.2) and Nosé-Hoover chain (4.11) equations. We will not delve into the mechanics of eqns. (15.5.13) except to say that, as with the Nosé-Hoover chain equations, the Nosé-Hoover Langevin approach is able to generate a correct canonical distribution for a simple harmonic oscillator (see Figs. 4.10 and 4.12). Interested readers are referred to Leimkuhler et al. (2009) for discussions about sampling distributions and generating dynamical observables using eqns. (15.5.13). Our purpose in introducing these equations here is to make use of a variant of them in the next section, where we discuss the use of a stochastic approach to create a resonance-free multiple time-step molecular dynamics algorithm analogous to that introduced in Section 4.14.

15.5.3

The stochastic isokinetic thermostat

In the previous subsection, we suggested that Nosé-Hoover chains could be replaced by a single stochastic Nosé-Hoover thermostat known as the Nosé-Hoover Langevin scheme. At the same time, in Section 4.14, it was shown that Nosé-Hoover chains can be combined with a set of isokinetic constraints as a means of controlling resonances in multiple time-step molecular dynamics calculations. In this section, we build on these ideas to derive a stochastic isokinetic thermostat also capable of controlling these resonance artifacts. In Section 4.11, we argued that Nosé-Hoover chains could fix the sampling problems associated with the simple Nosé-Hoover thermostat, yet ergodicity for Nosé-Hoover chains cannot be proved. Stochastic approaches, by contrast, render such proofs possible, which is why they are preferred from a mathematical standpoint. Consider eqns. (4.14.8), which couple a physical degree of freedom $q$ and momentum $p$ to a Nosé-Hoover chain via a set of isokinetic constraints. In the stochastic version of this algorithm, we replace the full Nosé-Hoover chain with a Nosé-Hoover Langevin thermostat, leading to equations of motion of the form

$$\begin{aligned}
\dot{q} &= \frac{p}{m} \\
\dot{p} &= F - \alpha p \\
\dot{p}_{1,j} &= -\frac{p_{2,j}}{Q_2}\,p_{1,j} - \alpha p_{1,j}, \qquad j = 1,\ldots,L \\
\dot{p}_{2,j} &= G(p_{1,j}) - \gamma p_{2,j} + \sigma\eta_j(t),
\end{aligned} \tag{15.5.14}$$

where $G(p_{1,j}) = p_{1,j}^2/Q_1 - kT$ and $\sigma = \sqrt{2kT\gamma/Q_2}$. As in Section 4.14, the isokinetic constraint condition that defines the coupling takes the form

$$\frac{p^2}{m} + \frac{L}{L+1}\sum_{j=1}^{L}\frac{p_{1,j}^2}{Q_1} = LkT, \tag{15.5.15}$$

which means that the Lagrange multiplier, obtained by requiring that the time derivative of eqn. (15.5.15) vanish under eqns. (15.5.14), is

$$\alpha = \frac{Fp/m - \dfrac{L}{L+1}\displaystyle\sum_{j=1}^{L} p_{1,j}^2\,p_{2,j}/(Q_1Q_2)}{p^2/m + \dfrac{L}{L+1}\displaystyle\sum_{j=1}^{L} p_{1,j}^2/Q_1}. \tag{15.5.16}$$
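The role of the multiplier can be checked numerically. The following is a minimal Python sketch, not taken from any reference implementation: it evaluates $\alpha$ for arbitrary test values (all numbers below are made up for illustration) and confirms that the time derivative of the left-hand side of eqn. (15.5.15), computed from the deterministic part of eqns. (15.5.14), vanishes. The function names `alpha` and `constraint_rate` are our own, and the $Q_1Q_2$ denominator is assumed as written above.

```python
def alpha(F, p, m, p1, p2, Q1, Q2):
    """Lagrange multiplier of eqn. (15.5.16) for a single degree of freedom."""
    L = len(p1)
    c = L / (L + 1)
    num = F * p / m - c * sum(a * a * b for a, b in zip(p1, p2)) / (Q1 * Q2)
    den = p * p / m + c * sum(a * a for a in p1) / Q1
    return num / den


def constraint_rate(F, p, m, p1, p2, Q1, Q2):
    """Time derivative of p^2/m + L/(L+1) sum_j p1j^2/Q1 under eqns. (15.5.14)."""
    L = len(p1)
    a = alpha(F, p, m, p1, p2, Q1, Q2)
    pdot = F - a * p
    p1dot = [-(b / Q2) * x - a * x for x, b in zip(p1, p2)]
    return 2 * p * pdot / m + (L / (L + 1)) * sum(
        2 * x * xd for x, xd in zip(p1, p1dot)) / Q1


# Arbitrary illustrative values (L = 3 thermostat pairs)
rate = constraint_rate(1.3, 0.7, 2.0, [0.4, -0.9, 1.1], [0.2, -0.5, 0.8], 1.5, 0.6)
print(abs(rate) < 1e-12)
```

Note that the noise and friction act only on the $p_{2,j}$, which do not appear in the constraint, so the check does not need the stochastic terms.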

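The distribution generated by eqns. (15.5.14) is canonical in the coordinates, as was shown in Section 4.14. Let us now assume that the force $F$ in eqns. (15.5.14) has fast ($f$) and slow ($s$) contributions such that $F = F_f + F_s$. Introducing velocities $v = p/m$ and $v_{k,j} = p_{k,j}/Q_k$, we construct the Liouville operator with terms grouped as follows:

$$iL = iL_q + iL_v^{(f)} + iL_v^{(s)} + iL_N + iL_{\rm OU}, \tag{15.5.17}$$

where

$$\begin{aligned}
iL_q &= v\frac{\partial}{\partial q} \\
iL_v^{(s)} &= \frac{F_s}{m}\left(1 - \frac{mv^2}{LkT}\right)\frac{\partial}{\partial v} - \sum_{j=1}^{L}\frac{F_s v}{LkT}\,v_{1,j}\frac{\partial}{\partial v_{1,j}} \\
iL_v^{(f)} &= \frac{F_f}{m}\left(1 - \frac{mv^2}{LkT}\right)\frac{\partial}{\partial v} - \sum_{j=1}^{L}\frac{F_f v}{LkT}\,v_{1,j}\frac{\partial}{\partial v_{1,j}} \\
iL_N &= \frac{Q_1}{(L+1)kT}\left[\sum_{j=1}^{L} v_{2,j}v_{1,j}^2\right]v\frac{\partial}{\partial v}
+ \sum_{j=1}^{L}\left[\frac{Q_1}{(L+1)kT}\left(\sum_{k=1}^{L} v_{2,k}v_{1,k}^2\right)v_{1,j} - v_{2,j}v_{1,j}\right]\frac{\partial}{\partial v_{1,j}}
+ \sum_{j=1}^{L}\frac{G(v_{1,j})}{Q_2}\frac{\partial}{\partial v_{2,j}}.
\end{aligned} \tag{15.5.18}$$

The operator $iL_{\rm OU}$ accounts for the remaining friction and noise terms acting on the $v_{2,j}$ in eqns. (15.5.14).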

These operators are not only useful for deriving an integrator; they can be used to construct a proof of ergodicity of the equations of motion, as was shown by Leimkuhler et al. (2013) using a set of conditions developed by L. Hörmander (known as Hörmander conditions) (Hörmander, 1967). We will omit the details here; suffice it to say that the conditions require calculation of many nested commutators of the operators (Leimkuhler et al., 2013). For the purposes of deriving numerical solution algorithms, the decomposition in eqn. (15.5.18) allows us to formulate “end” and “middle” (see Section 4.12) multiple time-step propagators of the form

$$\begin{aligned}
\exp(iL\Delta t) = {}& \exp\left(iL_N\Delta t/2\right)\exp\left(iL_v^{(s)}\Delta t/2\right) \\
&\times\left[\exp\left(iL_v^{(f)}\delta t/2\right)\exp\left(iL_q\delta t/2\right)\exp\left(iL_{\rm OU}\delta t\right)\exp\left(iL_q\delta t/2\right)\exp\left(iL_v^{(f)}\delta t/2\right)\right]^n \\
&\times\exp\left(iL_v^{(s)}\Delta t/2\right)\exp\left(iL_N\Delta t/2\right)
\end{aligned} \tag{15.5.19}$$

and

$$\begin{aligned}
\exp(iL\Delta t) = {}& \exp\left(iL_v^{(s)}\Delta t/2\right)\Big[\exp\left(iL_v^{(f)}\delta t/2\right)\exp\left(iL_q\delta t/2\right) \\
&\times\exp\left(iL_N\delta t/2\right)\exp\left(iL_{\rm OU}\delta t\right)\exp\left(iL_N\delta t/2\right) \\
&\times\exp\left(iL_q\delta t/2\right)\exp\left(iL_v^{(f)}\delta t/2\right)\Big]^n\exp\left(iL_v^{(s)}\Delta t/2\right),
\end{aligned} \tag{15.5.20}$$

respectively, where, as usual, $\delta t = \Delta t/n$. Note that the integrator in eqn. (15.5.19) is an XO-RESPA algorithm while eqn. (15.5.20) is formulated as an XI-RESPA method (see Section 4.12). Equations (15.5.19) and (15.5.20) involve operators, some of which we have already encountered in our discussions of numerical solvers for equations of motion. The operator $\exp(iL_v^{(\alpha)}h/2)$, where $\alpha = s, f$, with $h = \Delta t$ or $\delta t$, changes both the velocity $v$ and the thermostat elements $v_{1,j}$. The solution procedure for this operator is given in eqns. (4.13.16) through (4.13.20), which, in this instance, yield solutions

$$v(h/2) = \frac{v(0) + (F/m)\,s(h/2)}{\dot{s}(h/2)}, \qquad v_{1,j}(h/2) = \frac{v_{1,j}(0)}{\dot{s}(h/2)}, \tag{15.5.21}$$

where $v(0)$ and $v_{1,j}(0)$ denote the values these velocities have before the operator is applied. Here,

$$s(t) = \frac{1}{\sqrt{b}}\sinh\left(\sqrt{b}\,t\right) + \frac{a}{b}\left[\cosh\left(\sqrt{b}\,t\right) - 1\right], \tag{15.5.22}$$

with $a = Fv(0)/(L\beta^{-1})$ and $b = F^2/(mL\beta^{-1})$, so that $\sqrt{b}\,t$ is dimensionless. The other new operator is $\exp(iL_Nh/2)$, which also changes $v$ and $v_{1,j}$. For this operator, we make use of the Suzuki-Yoshida factorization approach of eqn. (4.12.16),

$$e^{iL_Nh/2} = \left[\prod_{\alpha=1}^{n_{\rm sy}} e^{iL_Nw_\alpha h/2m}\right]^m, \tag{15.5.23}$$

where $w_\alpha$ are the Suzuki-Yoshida weights of Section 4.12 and $m$ here denotes the number of repetitions in the factorization. We further factorize $\exp(iL_Nw_\alpha h/2m)$ by decomposing $iL_N$ as $iL_N = iL_{N,1} + iL_{N,2}$, where

$$\begin{aligned}
iL_{N,1} &= \frac{Q_1}{(L+1)kT}\left[\sum_{j=1}^{L} v_{2,j}v_{1,j}^2\right]v\frac{\partial}{\partial v}
+ \sum_{j=1}^{L}\left[\frac{Q_1}{(L+1)kT}\left(\sum_{k=1}^{L} v_{2,k}v_{1,k}^2\right)v_{1,j} - v_{2,j}v_{1,j}\right]\frac{\partial}{\partial v_{1,j}} \\
iL_{N,2} &= \sum_{j=1}^{L}\frac{G(v_{1,j})}{Q_2}\frac{\partial}{\partial v_{2,j}}.
\end{aligned} \tag{15.5.24}$$

We then factorize $\exp(iL_Nw_\alpha h/2m)$ as

$$e^{iL_Nw_\alpha h/2m} = e^{iL_{N,2}w_\alpha h/4m}\,e^{iL_{N,1}w_\alpha h/2m}\,e^{iL_{N,2}w_\alpha h/4m}. \tag{15.5.25}$$

The operator $\exp(iL_{N,2}w_\alpha h/4m)$ simply produces a translation of $v_{2,j}$. The operator $\exp(iL_{N,1}w_\alpha h/2m)$, however, is more complex and one we have not yet met. Its action is obtained by solving the pair of coupled first-order differential equations

$$\begin{aligned}
\dot{v} &= \frac{Q_1\beta}{L+1}\left[\sum_{j=1}^{L} v_{2,j}v_{1,j}^2\right]v \\
\dot{v}_{1,j} &= \frac{Q_1\beta}{L+1}\left[\sum_{k=1}^{L} v_{2,k}v_{1,k}^2\right]v_{1,j} - v_{2,j}v_{1,j}
\end{aligned} \tag{15.5.26}$$

subject to initial conditions $v(0), v_{1,1}(0), \ldots, v_{1,L}(0)$, which denote the values of these variables just before the operator is applied. Because these equations are homogeneous, they can be solved by a simple separation. In particular, if we assume a solution for $v_{1,j}(t)$ of the form $u_{1,j}(t) = v_{1,j}(t)\exp(v_{2,j}t)$, then for any pair $u_{1,j}$ and $u_{1,k}$, it can be shown that $\dot{u}_{1,j}/\dot{u}_{1,k} = u_{1,j}/u_{1,k}$, and from this, some simple algebra reveals that

$$v_{1,j}(t) = v_{1,j}(0)\,H(t)\,e^{-v_{2,j}t}, \tag{15.5.27}$$

where

$$H(t) = \sqrt{\frac{L\beta^{-1}}{mv^2(0) + \dfrac{L}{L+1}\displaystyle\sum_{j=1}^{L} Q_1v_{1,j}^2(0)\,e^{-2v_{2,j}t}}}. \tag{15.5.28}$$

When these are substituted into the equation for $v(t)$, we find that

$$v(t) = v(0)H(t). \tag{15.5.29}$$
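A short Python sketch may help make the action of this operator concrete. It applies eqns. (15.5.27)-(15.5.29) for a time $t$ and checks that the isokinetic constraint, written in velocities as $mv^2 + [L/(L+1)]\sum_j Q_1v_{1,j}^2 = LkT$, holds afterward. The function names and all numerical values are illustrative assumptions, not part of any published code.

```python
import math


def apply_LN1(v, v1, v2, t, m, Q1, kT):
    """Evolve (v, v1) for a time t using eqns. (15.5.27)-(15.5.29); v2 is held fixed."""
    L = len(v1)
    denom = m * v * v + (L / (L + 1)) * sum(
        Q1 * a * a * math.exp(-2.0 * b * t) for a, b in zip(v1, v2))
    H = math.sqrt(L * kT / denom)                      # eqn. (15.5.28)
    v_new = v * H                                      # eqn. (15.5.29)
    v1_new = [a * H * math.exp(-b * t) for a, b in zip(v1, v2)]  # eqn. (15.5.27)
    return v_new, v1_new


def isokinetic_lhs(v, v1, m, Q1):
    """Left-hand side of the isokinetic constraint in velocity variables."""
    L = len(v1)
    return m * v * v + (L / (L + 1)) * Q1 * sum(a * a for a in v1)


# Illustrative values only
m, Q1, kT = 2.0, 1.5, 1.0
v, v1, v2 = 0.3, [0.5, -0.2, 0.8], [0.1, -0.4, 0.7]
v_new, v1_new = apply_LN1(v, v1, v2, 0.25, m, Q1, kT)
print(isokinetic_lhs(v_new, v1_new, m, Q1), len(v1) * kT)
```

In this sketch the update places the velocities on the constraint surface regardless of the starting values; when the initial state already satisfies the constraint, $H(0) = 1$ and the operator reduces to the identity at $t = 0$.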

The action of the operator is then obtained by evaluating eqns. (15.5.27) and (15.5.29) at $t = w_\alpha h/2m$ for each $w_\alpha$ in the Suzuki-Yoshida propagation. The performance of the stochastic isokinetic method with multiple time-stepping is similar to that presented in Sections 4.14 and 4.15, and its ability to allow very large time steps has also been demonstrated for different potential energy models (Margul and Tuckerman, 2016) and for enhanced sampling techniques (see Sections 8.10 and 8.11.1) (Chen and Tuckerman, 2018). Here, we demonstrate that eqns. (15.5.14) can generate correct solvation free energies with very large time steps using the thermodynamic integration techniques discussed in Chapter 8. The solvation free energy of a molecule is a sensitive quantity that constitutes a challenging test of large time-step algorithms. Here, we consider the four polar molecules methanol, phenol, 1,4-dioxane, and dibenzo-1,4-dioxin solvated in a bath of 499 water molecules (Abreu and Tuckerman, 2020). Parameters for these molecules are specified by the generalized Amber force field (Wang et al., 2004), while the water molecules are treated using the SPC-Fw model (Wu et al., 2006).

Fig. 15.4 Solvation free energies for methanol (circles), phenol (squares), 1,4-dioxane (triangles up), and dibenzo-1,4-dioxin (triangles down) as a function of the outer time step using eqns. (15.5.14) integrated using a three time-step multiple time-step algorithm (reproduced with permission from Abreu and Tuckerman, J. Chem. Theor. Comput. 16, 7314 (2020), copyright American Chemical Society).

In these simulations, van der Waals and Coulomb interactions are switched separately, with the Lennard-Jones van der Waals potential being switched on using the so-called softcore method (see Section 8.2), wherein the pair potential is given by

$$u_{\rm soft}(r,\lambda) = 4\lambda\epsilon\left[\left(\frac{\sigma^6}{r^6 + \frac{1-\lambda}{2}\sigma^6}\right)^2 - \frac{\sigma^6}{r^6 + \frac{1-\lambda}{2}\sigma^6}\right] \tag{15.5.30}$$

with $\lambda \in [0, 1]$. The electrostatic potential uses a simple linear scaling $u_{ij}(r,\lambda) = \lambda q_iq_j/r_{ij}$. The simulations employ 21 $\lambda$ points and 11 $\lambda$ points for the van der Waals and electrostatic potentials, respectively. A three time-step RESPA scheme is used with 0.5 fs for bonded interactions and 3.0 fs for short-range non-bonded interactions within a cutoff of 8 Å. Figure 15.4 shows the hydration free energies for the four molecules as a function of the outer time step for values from 6.0 fs to 90 fs. The figure demonstrates the power of circumventing resonances, as we see only negligible variations in the results at all time steps (Abreu and Tuckerman, 2020).
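To illustrate why the softcore form of eqn. (15.5.30) is convenient for switching, the following minimal Python sketch evaluates it alongside the ordinary Lennard-Jones potential. The function names and the reduced-unit parameter values are our own assumptions for illustration. The sketch shows that $\lambda = 1$ recovers the Lennard-Jones potential, $\lambda = 0$ switches the interaction off, and intermediate $\lambda$ values remain finite at $r = 0$.

```python
def u_soft(r, lam, eps, sigma):
    """Softcore Lennard-Jones pair potential in the form of eqn. (15.5.30)."""
    s6 = sigma ** 6
    x = s6 / (r ** 6 + 0.5 * (1.0 - lam) * s6)
    return 4.0 * lam * eps * (x * x - x)


def u_lj(r, eps, sigma):
    """Standard 12-6 Lennard-Jones potential, for comparison."""
    x = (sigma / r) ** 6
    return 4.0 * eps * (x * x - x)


# Reduced units (eps = sigma = 1), chosen only for illustration
print(u_soft(1.2, 1.0, 1.0, 1.0), u_lj(1.2, 1.0, 1.0))   # identical at lambda = 1
print(u_soft(0.0, 0.5, 1.0, 1.0))                        # finite at r = 0
```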

15.6

Designing memory kernels for specific tasks

Up to this point in the chapter, we have seen that the Langevin and generalized Langevin equations have many uses in both analytical models of dissipative baths and in the development of efficient molecular dynamics algorithms. Over the decades, these equations have been employed to construct rate theories (Kramers, 1940; Grote and Hynes, 1980; Tucker et al., 1991; Berezhkovskii et al., 1992; Pollak and Talkner, 1993) and theories of vibrational and energy relaxation (Tuckerman and Berne, 1993) (see Section 15.4) and dynamical friction (Berne et al., 1990). Generalized Langevin equations have also been employed to create bath models for molecular dynamics simulations (Ceriotti et al., 2010a) that mimic quantum effects (Ceriotti et al., 2009; Ganeshan et al., 2013), reduce the bead number in path-integral calculations (Ceriotti et al., 2010b; Ceriotti et al., 2011), and stave off the (now familiar) resonances that plague multiple time-step methods (Morrone et al., 2011). In this section, we will develop a general framework for “designer” memory kernels. We will then consider the example of a kernel capable of mollifying resonances in multiple time-step molecular dynamics. Consider the generalized Langevin equation for a coordinate $q$ and momentum $p$ with associated mass $m$, which we cast in the form

$$\begin{aligned}
\dot{q} &= \frac{p}{m} \\
\dot{p} &= F(q) - \int_0^t d\tau\,p(\tau)\,\zeta(t-\tau) + R(t),
\end{aligned} \tag{15.6.31}$$

for $t \geq 0$ with the dynamic friction kernel given by a generalization of the exponential and $\delta$-function forms in eqns. (15.3.11) and (15.2.22)

$$\zeta(t) = \zeta_0\delta(t) - \tilde{\boldsymbol{\gamma}}^{\rm T}e^{-|t|\boldsymbol{\Gamma}}\boldsymbol{\gamma}, \tag{15.6.32}$$

which is defined for $t \in (-\infty, \infty)$. Here, $\boldsymbol{\gamma}$ is a column vector having $n$ components, $\tilde{\boldsymbol{\gamma}}^{\rm T}$ is an $n$-component row vector, and $\boldsymbol{\Gamma}$ is an $n \times n$ matrix. For this particular form of friction kernel (see Section 15.3.4 for an analytical treatment of a generalized Langevin equation with exponential memory), the generalized Langevin equation can be written in the form of a standard Langevin equation in an extended phase space with $n$ additional variables $s_1, \ldots, s_n \equiv \mathbf{s}$. This extended phase-space formulation of the Langevin equation takes the form

$$\begin{aligned}
\dot{q} &= \frac{p}{m} \\
\begin{pmatrix}\dot{p} \\ \dot{\mathbf{s}}\end{pmatrix} &= \begin{pmatrix}F(q) \\ \mathbf{0}\end{pmatrix} - \mathbf{Z}\begin{pmatrix}p \\ \mathbf{s}\end{pmatrix} + \mathbf{B}\boldsymbol{\xi}(t),
\end{aligned} \tag{15.6.33}$$

where $\boldsymbol{\xi}(t)$ is a vector of Gaussian random noise processes, each having zero mean and unit width, $\mathbf{Z}$ is the $(n+1) \times (n+1)$ matrix

$$\mathbf{Z} = \begin{pmatrix}\zeta_0 & \tilde{\boldsymbol{\gamma}}^{\rm T} \\ \boldsymbol{\gamma} & \boldsymbol{\Gamma}\end{pmatrix}, \tag{15.6.34}$$

known as the damping matrix or drift matrix, and $\mathbf{B}$ is an $(n+1) \times (n+1)$ matrix known as the diffusion matrix. The equivalence between eqns. (15.6.33) and (15.6.31)

with memory kernel given by eqn. (15.6.32) can be seen by solving eqn. (15.6.33) for $\mathbf{s}(t)$ in terms of $p(t)$ and then substituting the result into eqn. (15.6.31) (see Problem 15.12). The second fluctuation dissipation theorem (cf. eqn. (15.2.17)) allows us to derive a relation between the matrices $\mathbf{B}$ and $\mathbf{Z}$ from the autocorrelation function of the random force. This connection takes the form of a matrix equation

$$mkT\left(\mathbf{Z} + \mathbf{Z}^{\rm T}\right) = \mathbf{B}^{\rm T}\mathbf{B}. \tag{15.6.35}$$
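As a hedged numerical illustration of eqn. (15.6.35), the sketch below constructs one admissible diffusion matrix from a given drift matrix by Cholesky factorization. The drift matrix used here is a made-up example whose symmetric part is positive-definite; it is not a kernel taken from the literature, and the function name `diffusion_matrix` is our own. The solution for $\mathbf{B}$ is not unique, since any $\mathbf{B}' = \mathbf{U}\mathbf{B}$ with orthogonal $\mathbf{U}$ satisfies the same relation.

```python
import numpy as np


def diffusion_matrix(Z, m, kT):
    """Return an upper-triangular B with B^T B = m kT (Z + Z^T).

    Uses a Cholesky factorization, so it assumes Z + Z^T is positive-definite;
    numpy raises LinAlgError otherwise.
    """
    M = m * kT * (Z + Z.T)
    Lc = np.linalg.cholesky(M)   # M = Lc @ Lc.T with Lc lower triangular
    return Lc.T                  # B = Lc.T gives B.T @ B = M


# Hypothetical 3x3 drift matrix (n = 2 extended variables)
Z = np.array([[1.0, 0.3, -0.2],
              [-0.3, 2.0, 1.0],
              [0.2, -1.0, 0.5]])
m, kT = 1.0, 0.6
B = diffusion_matrix(Z, m, kT)
print(np.allclose(B.T @ B, m * kT * (Z + Z.T)))
```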

Equation (15.6.35) implies that $\mathbf{Z} + \mathbf{Z}^{\rm T}$, which is twice the symmetric part of $\mathbf{Z}$, must be positive semidefinite (and positive-definite when $\mathbf{B}$ is nonsingular). Since eqn. (15.6.33) is an ordinary Langevin equation in the extended phase space, it can be integrated using the methodology of Section 15.5.1, yet it encodes the memory of eqn. (15.6.32), which can be chosen a priori to model a colored-noise bath with features that influence a simulation in a desired manner. In other words, eqn. (15.6.33) allows us to design memory kernels for performing specific types of simulations while retaining the simplicity of the Langevin equation. Because the elements of $\mathbf{Z}$ are contained in the general exponential memory kernel of eqn. (15.6.32), designing a kernel for a specific task is tantamount to customizing the drift matrix $\mathbf{Z}$. From the framework presented thus far, we can glean some important restrictions on possible choices of $\mathbf{Z}$. From eqn. (15.6.32), it is clear that the $Z_{11}$ element is simply the white-noise term and that the time dependence, hence the memory component, is encoded in the matrix $\boldsymbol{\Gamma}$. Given this, it is important to note that eqn. (15.6.33) will only be able to recover an equilibrium ensemble as $t \to \infty$ if the memory kernel, which we denote $K(t)$ in what follows, has a finite decay time, and this, therefore, requires that the real parts of the eigenvalues of $\boldsymbol{\Gamma}$ be positive. Choosing drift matrices that satisfy requirements such as these can be nontrivial, and each choice should be carefully tested for compliance with eqn. (15.6.35) and its ability to produce stable dynamics.

With these caveats in mind, let us consider how we might design a kernel to mollify resonances in multiple time-scale molecular dynamics simulations. Recall that resonances arise from the influence of high-frequency motion (integrated using the smallest time step) on intermediate and long time scales (integrated using larger time steps). Thus, the key feature that a bath model should have is strong coupling to the highest frequencies, leading to strong damping of these frequencies, and weaker coupling and small perturbation of lower frequencies. These considerations suggest that our starting point is identifying the frequency dependence of the memory kernel. That is, we should start with the Fourier transform of $K(t)$:

$$\hat{K}(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}K(t)\,e^{-i\omega t}\,dt. \tag{15.6.36}$$

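If we now require that the friction couple strongly to high frequencies and weakly to low frequencies, we can intuit what the general shape of $\hat{K}(\omega)$ should be, and we show this general behavior in Fig. 15.5. Since $\hat{K}(\omega)$ determines the strength of the coupling at a given value of $\omega$, it can be seen that at the extremes, where $|\omega|$ is large, $\hat{K}(\omega)$ is large as well, while near $\omega \approx 0$, $\hat{K}(\omega) \approx 0$. There are many functional forms that could, in principle, produce the behavior in Fig. 15.5; however, we need a functional form that is consistent with the Fourier transform of eqn. (15.6.32), and since eqn. (15.6.32) is a sum of simple exponentials, $\hat{K}(\omega)$ should be composed of Lorentzian functions. For example, consider $\hat{K}(\omega)$ in the form

$$\hat{K}(\omega) = \sqrt{2\pi}\,\zeta_0 - \frac{\zeta_0}{2}\left[\frac{\tilde{\omega}^2}{\frac{3}{4}\tilde{\omega}^2 + \left(\omega + \frac{\tilde{\omega}}{2}\right)^2} + \frac{\tilde{\omega}^2}{\frac{3}{4}\tilde{\omega}^2 + \left(\omega - \frac{\tilde{\omega}}{2}\right)^2}\right]. \tag{15.6.37}$$

Fig. 15.5 Desired behavior of the Fourier transform of the memory kernel for mollifying resonances.

This kernel is a sum of two Lorentzian functions centered at $\omega = \pm\tilde{\omega}/2$ subtracted from a constant $\sqrt{2\pi}\,\zeta_0$. The memory kernel $K(t)$ to which this corresponds is

$$K(t) = \zeta_0\delta(t) - \sqrt{\frac{2\pi}{3}}\,\tilde{\omega}\,\zeta_0\,e^{-\sqrt{3}\tilde{\omega}|t|/2}\cos\left(\tilde{\omega}t/2\right). \tag{15.6.38}$$

Given that $K(t)$ has two exponentials (which becomes clear when $\cos(\tilde{\omega}t/2)$ is written as $(\exp(i\tilde{\omega}t/2) + \exp(-i\tilde{\omega}t/2))/2$), we then need $n = 2$ extended variables $s_1$ and $s_2$, which means that the drift matrix $\mathbf{Z}$ is a $3 \times 3$ matrix. The $2 \times 2$ submatrix $\boldsymbol{\Gamma}$ must have eigenvalues $(\sqrt{3}\tilde{\omega}/2) \pm (i\tilde{\omega}/2)$, and a $2 \times 2$ matrix that gives us these eigenvalues is

$$\boldsymbol{\Gamma} = \begin{pmatrix}\sqrt{3}\,\tilde{\omega} & \tilde{\omega} \\ -\tilde{\omega} & 0\end{pmatrix}. \tag{15.6.39}$$

Finally, the vectors $\tilde{\boldsymbol{\gamma}}^{\rm T}$ and $\boldsymbol{\gamma}$ have components proportional to $(\zeta_0\tilde{\omega})^{1/2}$, i.e.,

$$\tilde{\boldsymbol{\gamma}} = (\zeta_0\tilde{\omega})^{1/2}\begin{pmatrix}\alpha \\ -\beta\end{pmatrix}, \qquad \boldsymbol{\gamma} = (\zeta_0\tilde{\omega})^{1/2}\begin{pmatrix}\alpha \\ \beta\end{pmatrix}, \tag{15.6.40}$$

where $\alpha$ and $\beta$ are simple numerical constants.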
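The eigenvalue requirement on the $2 \times 2$ matrix of eqn. (15.6.39) is easy to confirm numerically. The short sketch below, in which the value of $\tilde{\omega}$ is arbitrary and the function name is our own, diagonalizes the matrix and compares the result with $(\sqrt{3}\tilde{\omega}/2) \pm (i\tilde{\omega}/2)$. The positive real part is what gives the memory a finite decay time.

```python
import numpy as np


def gamma_matrix(w):
    """The 2x2 matrix of eqn. (15.6.39) for a given tilde-omega value w."""
    return np.array([[np.sqrt(3.0) * w, w],
                     [-w, 0.0]])


w = 2.0                                   # arbitrary illustrative value
eig = np.linalg.eigvals(gamma_matrix(w))
print(eig)                                # complex-conjugate pair
print(np.allclose(eig.real, np.sqrt(3.0) * w / 2.0),
      np.allclose(np.sort(eig.imag), [-w / 2.0, w / 2.0]))
```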


As an example, we consider a flexible model of water (Wu et al., 2006), for which the colored-noise Langevin scheme is used to compute the diffusion constant employing a three time-step multiple time-stepping scheme. Since diffusion is dominated by low-frequency motion, the strong coupling of high-frequency modes to the bath should allow a large outer time step without affecting the diffusion constant. In this scheme, a time step of 0.5 fs is used for bonded interactions, non-bonded interactions within 9 Å are integrated with a 2 fs time step, and the outer time step is varied. The correct diffusion constant for the model is 0.25 Å²/ps. When the colored-noise scheme is used with outer time steps of 12 fs, 16 fs, and 20 fs, the resulting diffusion constants are 0.24 Å²/ps, 0.20 Å²/ps, and 0.16 Å²/ps, respectively (Morrone et al., 2011). By contrast, when a simple white-noise Langevin scheme is used with a friction value of 1 ps⁻¹, the resulting diffusion constants for outer time steps of 12 fs, 16 fs, and 20 fs are 0.095 Å²/ps, 0.044 Å²/ps, and 0.019 Å²/ps, respectively. This example illustrates the efficacy of the colored-noise method at mollifying resonances, allowing an outer time step of 12 fs to be achieved with minimal perturbation to the rate of diffusion.

15.7

Sampling stochastic transition paths

The numerical integration algorithm of Section 15.5.1 can be used in conjunction with the transition path sampling approach of Section 7.7 to sample a transition path ensemble of stochastic paths from a region A of phase space to another region B. As noted in Section 7.7, the shooting algorithm is an effective method for generating trial moves from a path Y(t) to a new path X(t). Here, we describe a simple variant of the shooting algorithm for paths satisfying the Langevin equation. In fact, the shooting algorithm can be applied almost unchanged from that described in Section 7.7. However, a few differences need to be pointed out. First, the random force in the Langevin equation is not deterministic, which means that a rule such as that given in eqn. (7.7.2) for a numerical solver such as velocity Verlet cannot be used for Langevin dynamics. Rather, we need to account for the fact that a distribution of $q(t+\Delta t)$ values can be generated from $q(t)$ due to the random force. The solver in eqn. (15.5.5) can be expressed compactly as

$$x_{(k+1)\Delta t} = x_{k\Delta t} + \delta x_d + \delta x_r, \tag{15.7.41}$$

where the displacement $\delta x_d$ is purely deterministic, and $\delta x_r$ is due to the random force. If we take the random force to be a Gaussian random variable, then we can state the rule for generating trial moves in phase space from $x_{k\Delta t}$ to $x_{(k+1)\Delta t}$ as

$$T(x_{(k+1)\Delta t}|x_{k\Delta t}) = w(\delta x_r), \tag{15.7.42}$$

where $w(x)$ is a Gaussian distribution of width determined by the friction (Chandrasekhar, 1943). In the high friction limit, eqn. (15.7.42) can be shown to be

$$T(q((k+1)\Delta t)|q(k\Delta t)) = \sqrt{\frac{\mu\gamma}{4\pi kT\Delta t}}\exp\left[-\frac{\mu\gamma}{4kT\Delta t}\left(q((k+1)\Delta t) - q(k\Delta t)\right)^2\right] \tag{15.7.43}$$

for a single degree of freedom $q$ (Dellago et al., 2002). Note that as $\Delta t \to 0$, the Gaussian distribution tends to a Dirac $\delta$-function, as expected (see eqn. (A.5) in Appendix A). The second difference in the shooting algorithm is in the choice of the shooting point. Because the trajectories are stochastic, we are free to choose a shooting point to lie on the old trajectory Y(t) without modification because a trajectory launched from this point will be different from Y(t). Thus, given a stochastic transition path Y(t) and a randomly chosen point $y_{j\Delta t}$ on this path, we can take the rule for generating the new shooting point $x_{j\Delta t}$ to be

$$\tau(x_{j\Delta t}|y_{j\Delta t}) = \delta(x_{j\Delta t} - y_{j\Delta t}). \tag{15.7.44}$$

Third, because the Langevin equation acts as a thermostatting mechanism, the distributions of initial conditions $f(x_0)$ and $f(y_0)$ will be canonical by construction. Thus, if canonical sampling is sought, then there is no need to apply the acceptance rule $\min[1, f(x_0)/f(y_0)]$ for each trial path move. Putting this fact together with eqn. (15.7.44) and the Gaussian form of eqn. (15.7.42), which is symmetric, gives a particularly simple acceptance criterion from eqn. (7.7.12)

$$\Lambda[{\rm X}(t)|{\rm Y}(t)] = h_A(x_0)\,h_B(x_{n\Delta t}). \tag{15.7.45}$$

Thus, as long as the new path is a proper transition path from A to B, it is accepted with probability 1. We can now summarize the steps of the shooting algorithm for stochastic paths as follows:

1. Choose an index $j$ randomly on the old trajectory Y(t) and take the shooting point $y_{j\Delta t}$ to be the shooting point $x_{j\Delta t}$.
2. Integrate the equations of motion backwards in time from the shooting point to the initial condition $x_0$ using a stochastic propagation scheme such as that of eqn. (15.5.5).
3. If the initial condition $x_0$ is not in the phase-space region A, reject the trial move; otherwise, continue.
4. Integrate the equations of motion forward in time to generate the final point $x_{n\Delta t}$ using a stochastic propagation scheme such as that of eqn. (15.5.5).
5. If $x_{n\Delta t} \in$ B, accept the trial move, and reject it otherwise.
6. If the path is rejected at steps 3 or 5, then the old trajectory Y(t) is counted again in the calculation of averages over the transition path ensemble. Otherwise, invert the momenta along the backward portion of the path to yield the complete transition path X(t) and replace the old trajectory Y(t) by the new trajectory X(t).
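The steps above can be sketched in Python as follows. This is a minimal illustration of the bookkeeping only, under several assumptions of ours: a path is stored as a list of $(q, p)$ pairs, the one-step propagator `step(q, p, rng)` is a user-supplied placeholder standing in for a scheme such as eqn. (15.5.5), and backward integration is realized by propagating forward with inverted momentum. The noise-free `free_step` propagator is included solely so that the example is reproducible; it is not a Langevin integrator.

```python
import random


def shooting_move(old_path, step, in_A, in_B, rng):
    """Attempt one two-sided shooting move; returns (path, accepted).

    old_path : list of (q, p) points with indices 0..n
    step     : placeholder propagator, step(q, p, rng) -> (q, p)
    in_A/in_B: indicator functions h_A and h_B acting on a (q, p) point
    """
    n = len(old_path) - 1
    j = rng.randrange(n + 1)                  # step 1: random shooting index
    q, p = old_path[j]                        # shooting point x_j = y_j

    back = [(q, -p)]                          # step 2: backward segment
    for _ in range(j):
        back.append(step(back[-1][0], back[-1][1], rng))
    x0 = (back[-1][0], -back[-1][1])
    if not in_A(x0):                          # step 3
        return old_path, False

    fwd = [(q, p)]                            # step 4: forward segment
    for _ in range(n - j):
        fwd.append(step(fwd[-1][0], fwd[-1][1], rng))
    if not in_B(fwd[-1]):                     # step 5
        return old_path, False

    # step 6: reverse the backward segment and invert its momenta
    head = [(qq, -pp) for qq, pp in reversed(back)]
    return head + fwd[1:], True


def free_step(q, p, rng, dt=0.5):
    """Noise-free toy propagator used only to exercise the bookkeeping."""
    return q + p * dt, p


old = [(-1.0 + 0.5 * k, 1.0) for k in range(5)]
new, ok = shooting_move(old, free_step,
                        lambda x: x[0] < -0.5, lambda x: x[0] > 0.5,
                        random.Random(0))
print(ok, len(new) == len(old))
```

With a genuinely stochastic `step`, the new path differs from the old one even though the shooting point is unchanged, which is the property exploited in eqn. (15.7.44).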


Fig. 15.6 The shooting algorithm for stochastic paths (see also Fig. 7.7)

The shooting algorithm for stochastic paths is illustrated in Fig. 15.6. One final difference between the present shooting algorithm and that for deterministic molecular dynamics is that for stochastic trajectories, it is not necessary to generate both forward and backward segments from every shooting point. A stochastic path of higher statistical weight in the transition path ensemble can be obtained by integrating only backward in time and retaining the forward part of the old trajectory or vice versa. Of course, when this is done, the difference between old and new paths is smaller and sampling becomes less efficient. However, some fraction of the shooting moves can be of this type in order to give a higher average acceptance rate.

15.8

Mori-Zwanzig theory

Our original derivation in Section 15.2 of the generalized Langevin equation was based on the introduction of a harmonic bath as a model for a true bath. While conceptually simple, such a derivation naturally raises the question of whether a GLE can be derived in a more general way for an arbitrary bath. The Mori-Zwanzig theory (Mori, 1965; Zwanzig, 1973) achieves this and gives us deeper physical insight into the quantities that appear in the GLE (Deutsch and Silbey, 1971; Berne, 1971; Berne and Pecora, 1976). The Mori-Zwanzig theory begins with the full classical Hamiltonian and effectively “integrates out” the bath degrees of freedom by using a formalism known as the projection operator method (Kubo et al., 1985). In this approach, we divide the full set of degrees of freedom into the system and the bath, as was done for the harmonic bath Hamiltonian. In the phase space, we consider the two axes corresponding to the system coordinate $q$ and its conjugate momentum $p$, and the remaining $6N - 2$ axes orthogonal to the system. In order to make this phase-space picture concrete, let us introduce a two-component system vector

$$\mathbf{A} = \begin{pmatrix}q \\ p\end{pmatrix}. \tag{15.8.1}$$

Geometrically, recall that the projection of a vector $\mathbf{b}$ along the direction of another vector $\mathbf{a}$ is given by the formula

$$P\mathbf{b} = \left(\mathbf{b}\cdot\frac{\mathbf{a}}{|\mathbf{a}|}\right)\frac{\mathbf{a}}{|\mathbf{a}|}. \tag{15.8.2}$$

Here, $P$ is an operator that gives the component of $\mathbf{b}$ along the direction of $\mathbf{a}$, with $\mathbf{a}/|\mathbf{a}|$ the unit vector along the direction of $\mathbf{a}$. An analog of this formula is used to construct projection operators in phase space parallel and perpendicular to the vector $\mathbf{A}$. The projection operator must also eliminate or integrate out the bath degrees of freedom. Thus, we define the operator $P$ that both projects along the direction of $\mathbf{A}$ and integrates out the bath according to

$$P = \langle\ldots\mathbf{A}^\dagger\rangle\langle\mathbf{A}\mathbf{A}^\dagger\rangle^{-1}\mathbf{A}, \tag{15.8.3}$$

where the quantity on which $P$ acts replaces the dots, and $\langle\ldots\rangle$ denotes an average over a canonical ensemble. $\mathbf{A}^\dagger$ is the Hermitian conjugate of $\mathbf{A}$. The use of Hermitian conjugates is introduced because of the close analogy between the Hilbert space formalism of quantum mechanics and the classical phase-space propagator formalism we will employ in the present derivation (see Section 3.10). The operator that projects along the direction orthogonal to $\mathbf{A}$ is denoted $Q$ and is simply $I - P$, where $I$ is the phase-space identity operator. The operators $P$ and $Q$ can be shown to be Hermitian operators. Note that this definition of the projection operator is somewhat more general than the simple geometric projector of eqn. (15.8.2) in that the quantities $\langle\ldots\mathbf{A}^\dagger\rangle$ and $\langle\mathbf{A}\mathbf{A}^\dagger\rangle^{-1}$ are matrices. These are multiplied together and then allowed to act on $\mathbf{A}$, ultimately producing another two-component vector. As expected for projection operators, the actions of $P$ and $Q$ on the vector $\mathbf{A}$ are

$$P\mathbf{A} = \mathbf{A}, \qquad Q\mathbf{A} = 0. \tag{15.8.4}$$

Since $P\mathbf{A} = \mathbf{A}$, it follows that $P^2\mathbf{A} = P\mathbf{A} = \mathbf{A}$, and $Q^2\mathbf{A} = Q(Q\mathbf{A}) = 0 = Q\mathbf{A}$, which also means that $P$ and $Q$ satisfy

$$P^2 = P, \qquad Q^2 = Q. \tag{15.8.5}$$

This condition is known as idempotency, and the operators $P$ and $Q$ are referred to as idempotent operators. The projection operators $P$ and $Q$ can be used to analyze the dynamics of the system variables $\mathbf{A}$. Recall that the time evolution of any quantity in the phase space is determined by the action of the classical propagator $\exp(iLt)$. The vector $\mathbf{A}$, therefore, evolves according to

$$\mathbf{A}(t) = e^{iLt}\mathbf{A}(0). \tag{15.8.6}$$

Differentiating both sides of this relation with respect to time yields

$$\frac{d\mathbf{A}}{dt} = e^{iLt}\,iL\mathbf{A}(0), \tag{15.8.7}$$

where $iL$ is the classical Liouville operator of Section 3.10. We now use the projection operators to separate this evolution equation for $\mathbf{A}(t)$ into components along $\mathbf{A}(0)$ and orthogonal to $\mathbf{A}(0)$. This is done by inserting the identity operator $I$ into eqn. (15.8.7) and using the fact that $P + Q = I$, which yields

$$\frac{d\mathbf{A}}{dt} = e^{iLt}(P + Q)\,iL\mathbf{A}(0) = e^{iLt}P\,iL\mathbf{A}(0) + e^{iLt}Q\,iL\mathbf{A}(0). \tag{15.8.8}$$

The first term can be evaluated by introducing eqn. (15.8.3) into eqn. (15.8.8):

$$e^{iLt}P\,iL\mathbf{A}(0) = e^{iLt}\langle iL\mathbf{A}\mathbf{A}^\dagger\rangle\langle\mathbf{A}\mathbf{A}^\dagger\rangle^{-1}\mathbf{A}(0). \tag{15.8.9}$$

The integrations implied by the angular brackets in eqn. (15.8.9) are performed over an ensemble distribution of initial conditions $\mathbf{A}(0)$. The propagator $\exp(iLt)$ can be pulled across the ensemble averages, since the quantity $\langle iL\mathbf{A}\mathbf{A}^\dagger\rangle\langle\mathbf{A}\mathbf{A}^\dagger\rangle^{-1}$ is a matrix independent of the phase-space variables. Thus,

$$e^{iLt}P\,iL\mathbf{A}(0) = \langle iL\mathbf{A}\mathbf{A}^\dagger\rangle\langle\mathbf{A}\mathbf{A}^\dagger\rangle^{-1}e^{iLt}\mathbf{A}(0) = \langle iL\mathbf{A}\mathbf{A}^\dagger\rangle\langle\mathbf{A}\mathbf{A}^\dagger\rangle^{-1}\mathbf{A}(t) \equiv i\boldsymbol{\Omega}\mathbf{A}(t),$$

where $\boldsymbol{\Omega}$ is a force-constant matrix given by

$$\boldsymbol{\Omega} = \langle L\mathbf{A}\mathbf{A}^\dagger\rangle\langle\mathbf{A}\mathbf{A}^\dagger\rangle^{-1}. \tag{15.8.10}$$

Note that because of the application of the operators $\exp(iLt)$ and $P$, the first term in eqn. (15.8.8) is effectively linear in $\mathbf{A}(t)$. In order to evaluate the second term, we start with the trivial identity

$$e^{iLt} = e^{QiLt} + e^{iLt} - e^{QiLt}. \tag{15.8.11}$$

The operator difference $\exp(iLt) - \exp(QiLt)$ appearing in this identity can be evaluated as follows. We first take the Laplace transform of $\exp(iLt) - \exp(QiLt)$:

$$\int_0^\infty dt\,e^{-st}\left(e^{iLt} - e^{QiLt}\right) = (s - iL)^{-1} - (s - QiL)^{-1}. \tag{15.8.12}$$

Equation (15.8.12) involves the difference of operator inverses. In general, given a generic operator difference of the form $O_1^{-1} - O_2^{-1}$, we multiply the first term by the identity operator expressed as $I = O_2O_2^{-1}$ and the second term by the identity operator expressed as $I = O_1^{-1}O_1$ to yield

$$O_1^{-1} - O_2^{-1} = O_1^{-1}\left(O_2 - O_1\right)O_2^{-1}. \tag{15.8.13}$$
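Because eqn. (15.8.13) holds for any pair of invertible operators, including noncommuting ones, it can be spot-checked with finite matrices. The sketch below uses random $4 \times 4$ matrices as stand-ins for the operators; the matrix size, the seed, and the diagonal shift (added only to keep the matrices well conditioned) are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Random noncommuting matrices, shifted to keep them safely invertible
O1 = rng.normal(size=(n, n)) + n * np.eye(n)
O2 = rng.normal(size=(n, n)) + n * np.eye(n)

lhs = np.linalg.inv(O1) - np.linalg.inv(O2)
rhs = np.linalg.inv(O1) @ (O2 - O1) @ np.linalg.inv(O2)
print(np.allclose(lhs, rhs))
```

Note that the ordering of the factors on the right-hand side matters: $O_1^{-1}$ stands to the left and $O_2^{-1}$ to the right of the difference $O_2 - O_1$.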

Applying eqn. (15.8.13) to the difference in eqn. (15.8.12) gives

$$\begin{aligned}
(s - iL)^{-1} - (s - QiL)^{-1} &= (s - iL)^{-1}\left(s - QiL - s + iL\right)(s - QiL)^{-1} \\
&= (s - iL)^{-1}\left(I - Q\right)iL\,(s - QiL)^{-1} \\
&= (s - iL)^{-1}P\,iL\,(s - QiL)^{-1},
\end{aligned} \tag{15.8.14}$$

so that

$$(s - iL)^{-1} = (s - QiL)^{-1} + (s - iL)^{-1}P\,iL\,(s - QiL)^{-1}. \tag{15.8.15}$$

Inverting the Laplace transform of both sides, we obtain

$$e^{iLt} = e^{QiLt} + \int_0^t d\tau\,e^{iL(t-\tau)}P\,iL\,e^{QiL\tau}. \tag{15.8.16}$$

The second term in eqn. (15.8.8) can be evaluated by multiplying eqn. (15.8.16) on the right by $QiL\mathbf{A}(0)$ to give

$$e^{iLt}QiL\mathbf{A}(0) = e^{QiLt}QiL\mathbf{A}(0) + \int_0^t d\tau\,e^{iL(t-\tau)}P\,iL\,e^{QiL\tau}QiL\mathbf{A}(0). \tag{15.8.17}$$

In the equation of motion $d\mathbf{A}/dt = iL\mathbf{A}$ for $\mathbf{A}(t)$, the vector $iL\mathbf{A}$ drives the evolution. We can therefore think of $iL\mathbf{A}$ as a kind of general force driving the evolution of $\mathbf{A}$, with $iL\mathbf{A}(0)$ being the initial value of this force. Indeed, the second component of this vector is the initial physical force since $\dot{p} = F$ from Newton’s second law. The action of $Q$ on $iL\mathbf{A}(0)$ projects the initial force onto a direction orthogonal to $\mathbf{A}$. The evolution operator $\exp(QiLt)$ acts as a classical propagator of a dynamics in which the forces are orthogonal to $\mathbf{A}$. Therefore, $\mathbf{F}(t) \equiv \exp(QiLt)QiL\mathbf{A}(0)$ is the time evolution of the projected force in this orthogonal subspace. Of course, the propagator $\exp(QiLt)$ and the true evolution operator $\exp(iLt)$ do not produce the same time evolution. The dynamics generated by $\exp(QiLt)$ is generally not conservative and, therefore, not straightforward to evaluate. However, we will see shortly that physically interesting approximations are available for this dynamics in the case of a high frequency oscillator. In order to complete the derivation of the GLE, we introduce the projected force $\mathbf{F}(t)$ into eqn. (15.8.17) to obtain

$$\begin{aligned}
e^{iLt}QiL\mathbf{A}(0) &= \mathbf{F}(t) + \int_0^t d\tau\,e^{iL(t-\tau)}P\,iL\,e^{QiL\tau}QiL\mathbf{A}(0) \\
&= \mathbf{F}(t) + \int_0^t d\tau\,e^{iL(t-\tau)}P\,iL\mathbf{F}(\tau) \\
&= \mathbf{F}(t) + \int_0^t d\tau\,e^{iL(t-\tau)}\langle iL\mathbf{F}(\tau)\mathbf{A}^\dagger\rangle\langle\mathbf{A}\mathbf{A}^\dagger\rangle^{-1}\mathbf{A}(0) \\
&= \mathbf{F}(t) + \int_0^t d\tau\,\langle iL\mathbf{F}(\tau)\mathbf{A}^\dagger\rangle\langle\mathbf{A}\mathbf{A}^\dagger\rangle^{-1}e^{iL(t-\tau)}\mathbf{A}(0) \\
&= \mathbf{F}(t) + \int_0^t d\tau\,\langle iL\mathbf{F}(\tau)\mathbf{A}^\dagger\rangle\langle\mathbf{A}\mathbf{A}^\dagger\rangle^{-1}\mathbf{A}(t-\tau).
\end{aligned} \tag{15.8.18}$$

Since F(t) is orthogonal to A(t), it follows that QF(t) = F(t).

(15.8.19)

Equation (15.8.19) can be used to simplify the ensemble average appearing in eqn. (15.8.18). We first express the ensemble average of iLF(τ )A† as hiLF(τ )A† i = hiLQF(τ )A† i.

(15.8.20)

We next transfer the operator iL to A† by taking the Hermitian conjugate of iL. Recalling that L, itself, is Hermitian, we only need to change i to −i so that

686

Langevin and generalized Langevin equations

hiLF(τ )A† i = −hQF(τ )(iLA)† i.

(15.8.21)

Using eqn. (15.8.19) and the fact that Q is Hermitian, we can write eqn. (15.8.21) as hiLF(τ )A† i = −hQ2 F(τ )(iLA)† i = −hQF(τ )(QiLA)† i = −hF(τ )F† (0)i.

(15.8.22)

Therefore, eqn. (15.8.18) becomes eiLt QiLA = F(t) −

Z

0

t

dτ hF(τ )F† (0)ihAA† i−1 A(t − τ ).

(15.8.23)

Finally, combining eqn. (15.8.23) with eqns. (15.8.10) and (15.8.8) gives an equation of motion for A in which bath degrees of freedom have been effectively eliminated: dA = iΩA − dt

Z

0

t

dτ hF(τ )F† (0)ihAA† i−1 A(t − τ ) + F(t).

(15.8.24)

Equation (15.8.24) takes the form of a GLE for a harmonic potential of mean force if the autocorrelation function appearing in the integral is identified with the dynamic friction kernel

$$K(t) = \langle F(t)F^\dagger(0)\rangle \langle AA^\dagger\rangle^{-1}. \tag{15.8.25}$$

The quantity K(t) is called the memory function or memory kernel. Note that K(t) is a matrix. Substituting eqn. (15.8.25) into eqn. (15.8.24) gives the generalized Langevin equation for a general bath

$$\frac{dA}{dt} = i\Omega A(t) - \int_0^t d\tau\, K(\tau) A(t-\tau) + F(t). \tag{15.8.26}$$
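To make the structure of eqn. (15.8.26) concrete, the following Python sketch integrates a scalar, noise-free version of the equation, dA/dt = −∫₀ᵗ dτ K(τ)A(t − τ), in which iΩ = 0 and F(t) = 0 are assumed purely for illustration. The function name `integrate_memory_equation`, the forward Euler step, and the trapezoidal quadrature of the memory integral are choices made here and are not taken from the text. For the assumed kernel K(t) = e^{−2t}, the equation is equivalent to A″ + 2A′ + A = 0 with A(0) = 1 and A′(0) = 0, whose solution is A(t) = (1 + t)e^{−t}, so the numerical result can be compared with a closed form.

```python
import math


def integrate_memory_equation(kernel, a0, dt, n_steps):
    """Integrate dA/dt = -int_0^t K(tau) A(t - tau) dtau for a scalar A.

    Illustrative assumptions: no frequency term (i*Omega = 0), no random
    force (F = 0), forward Euler in time, trapezoid rule for the memory
    integral over the stored history of A.
    """
    a = [a0]
    for n in range(n_steps):
        memory = 0.0
        if n > 0:
            # Trapezoid rule on tau = 0, dt, ..., n*dt with A(t_n - tau) = a[n - j].
            memory = 0.5 * (kernel(0.0) * a[n] + kernel(n * dt) * a[0])
            for j in range(1, n):
                memory += kernel(j * dt) * a[n - j]
            memory *= dt
        a.append(a[n] - dt * memory)
    return a


if __name__ == "__main__":
    dt, n_steps = 0.005, 400
    traj = integrate_memory_equation(lambda t: math.exp(-2.0 * t), 1.0, dt, n_steps)
    t_final = dt * n_steps
    # Numerical value next to the closed-form solution (1 + t) exp(-t).
    print(traj[-1], (1.0 + t_final) * math.exp(-t_final))
```

The cost of the history sum grows quadratically with the number of steps, which is one practical reason rapidly decaying kernels are convenient in applications of the GLE.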

Although eqn. (15.8.26) is formally exact, the problem of determining F(t) and K(t) is generally more difficult than simply simulating the full system because of the need to generate the orthogonal dynamics of exp(QiLt) (Darve et al., 2009). Taken as a phenomenological theory, however, eqn. (15.8.26) implies that if the potential of mean force is harmonic, and a memory function can be obtained that faithfully represents the dynamics of the full bath, then the GLE will yield accurate dynamical properties of a system.

If we wish to use eqn. (15.8.26) to generate dynamical properties in a low-dimensional subspace of the original system, then several subtleties arise. Consider a one-dimensional harmonic oscillator with coordinate x, momentum p, reduced mass µ, frequency ω, and potential minimum at x0. For this problem, the force-constant matrix can be shown to be

$$i\Omega = \begin{pmatrix} 0 & 1/\mu \\ -\mu\tilde{\omega}^2 & 0 \end{pmatrix} \tag{15.8.27}$$

(see Problem 15.9), where ω̃ is the renormalized frequency, which can be computed using

$$\tilde{\omega}^2 = \frac{kT}{\mu\langle (x - \langle x\rangle)^2\rangle}. \tag{15.8.28}$$
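As a small illustration of eqn. (15.8.28), the sketch below estimates ω̃² from a list of sampled oscillator coordinates, replacing the ensemble average ⟨(x − ⟨x⟩)²⟩ with a plain sample average. The function name `renormalized_frequency_sq` and the two-point sample in the usage lines are hypothetical and serve only to show the arithmetic.

```python
def renormalized_frequency_sq(x_samples, mu, kT):
    """Estimate omega_tilde^2 = kT / (mu * <(x - <x>)^2>), eqn. (15.8.28).

    Assumption: the ensemble averages are approximated by averages over
    the supplied samples (e.g., coordinates saved along a trajectory).
    """
    n = len(x_samples)
    mean = sum(x_samples) / n
    variance = sum((x - mean) ** 2 for x in x_samples) / n
    return kT / (mu * variance)


if __name__ == "__main__":
    # Hypothetical samples with mean 1.0 and variance 0.01.
    print(renormalized_frequency_sq([0.9, 1.1], mu=1.0, kT=1.0))
```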


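More importantly, the memory kernel K(t) is not a simple autocorrelation function. A closer look at eqn. (15.8.25) makes clear that the required autocorrelation function is

$$K(t) = \left\langle \left(e^{QiLt}F\right) F^\dagger \right\rangle \langle AA^\dagger\rangle^{-1}, \tag{15.8.29}$$

which requires the orthogonal dynamics generated by exp(QiLt). For this example, it can be shown that

$$K(t) = \begin{pmatrix} 0 & 0 \\ 0 & \zeta(t)/\mu \end{pmatrix} \tag{15.8.30}$$

and that

$$F(t) = \begin{pmatrix} 0 \\ \dot{p} + \mu\tilde{\omega}^2 q \end{pmatrix}, \tag{15.8.31}$$

where q = x − ⟨x⟩. If we denote the nonzero component of F(t) as δf, we can express the exact friction kernel as

$$\frac{\zeta(t)}{\mu} = \frac{\langle \delta f\, e^{QiLt}\, \delta f\rangle}{\langle p^2\rangle}, \tag{15.8.32}$$

which is nontrivial to evaluate. The standard autocorrelation function

$$\frac{\phi(t)}{\mu} = \frac{\langle \delta f\, e^{iLt}\, \delta f\rangle}{\langle p^2\rangle} \tag{15.8.33}$$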

is not equal to the friction kernel. It was shown by Berne et al. (1990) that the Laplace transforms of ζ(t) and φ(t) are related by

$$\frac{\tilde{\zeta}(s)}{\mu} = \frac{\tilde{\phi}(s)/\mu}{1 - \left[s/(s^2 + \tilde{\omega}^2)\right]\left[\tilde{\phi}(s)/\mu\right]}, \tag{15.8.34}$$

from which we can see that ζ̃(s) and φ̃(s) are equal only in the limit that s → 0. They are also equal when ω̃ → ∞, and hence, the standard correlation function φ(t) is a reasonable approximation to ζ(t) only in the high-frequency limit. In fact, Berne et al. showed formally that ζ(t) can be calculated in this limit from the correlation function

$$\frac{\zeta(t)}{\mu} = \frac{\langle \delta f\, e^{i\bar{L}t}\, \delta f\rangle}{\langle p^2\rangle}, \tag{15.8.35}$$

where iL̄ is the Liouville operator for a system in which the oscillator coordinate x is fixed at x = x0. Equation (15.8.35) can be used, for example, to approximate the friction on a harmonic diatomic molecule by replacing it with a rigid diatomic in which the distance between the two atoms is constrained to the value of the equilibrium bond length. Importantly, as the frequency of the oscillator decreases, this rigid-bond approximation breaks down, and care is needed to determine when φ(t) is a good approximation to ζ(t).
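The approach of ζ̃(s) to φ̃(s) at high frequency can be checked numerically from eqn. (15.8.34). The sketch below simply evaluates that relation; the function name `zeta_tilde_over_mu` and the model transform φ̃(s)/µ = c/(s + λ), which corresponds to an exponentially decaying φ(t)/µ, are assumptions introduced only for this illustration.

```python
def zeta_tilde_over_mu(s, phi_tilde_over_mu, omega_tilde):
    """Evaluate eqn. (15.8.34): zeta~(s)/mu from phi~(s)/mu at Laplace variable s."""
    coupling = s / (s ** 2 + omega_tilde ** 2)
    return phi_tilde_over_mu / (1.0 - coupling * phi_tilde_over_mu)


if __name__ == "__main__":
    # Hypothetical model: phi(t)/mu = c * exp(-lam * t), so phi~(s)/mu = c / (s + lam).
    c, lam, s = 1.0, 1.0, 1.0
    phi = c / (s + lam)
    for omega_tilde in (1.0, 10.0, 100.0):
        zeta = zeta_tilde_over_mu(s, phi, omega_tilde)
        # The ratio approaches 1 as omega_tilde grows.
        print(omega_tilde, zeta / phi)
```

For a fixed s, the factor s/(s² + ω̃²) controls the correction, so the ratio printed above approaches unity as ω̃ increases, in line with the rigid-bond approximation described in the text.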

15.9 Problems

15.1. a. Use the definition of R(t) for the harmonic-bath model in eqn. (15.2.11) to show that ⟨q(0)R(t)⟩ = 0 and to derive eqn. (15.2.17).

b. When R(t) is taken to be a true random process, it can be expressed as a Fourier series evaluated at M time points 0, ∆t, 2∆t, ..., (M − 1)∆t as

$$R(k\Delta t) = \sum_{j=0}^{M-1}\left[a_j \cos\left(\frac{2\pi jk}{M}\right) + b_j \sin\left(\frac{2\pi jk}{M}\right)\right],$$

where the coefficients are random numbers sampled from a Gaussian distribution

$$P(a_0, \ldots, a_{M-1}, b_0, \ldots, b_{M-1}) = \prod_{k=0}^{M-1} \frac{1}{2\pi\sigma_k^2}\, e^{-(a_k^2 + b_k^2)/2\sigma_k^2}$$

and

$$\sigma_k^2 = \frac{kT}{M}\sum_{j=0}^{M-1} \zeta(j\Delta t)\cos\left(\frac{2\pi jk}{M}\right).$$

Using these definitions, prove that ⟨R(0)R(k∆t)⟩ = kT ζ(k∆t), where the average is performed over all possible realizations of R(k∆t). Hint: Think carefully about what distinguishes one realization of the random force from another and define the average accordingly.

15.2. Consider a single harmonic degree of freedom q, having a mass m, that obeys the generalized Langevin equation

$$\ddot{q} = -\omega^2 q - \int_0^t d\tau\, \dot{q}(\tau)\gamma(t-\tau) + f(t).$$

Here ω is the frequency associated with the motion of q, γ(t) = ζ(t)/m, and f(t) = R(t)/m, where R(t) and ζ(t) are the random force and friction kernel, respectively. As a crude model of a rapidly decaying friction kernel, consider a γ(t) that is constant over a very short time t0 and then drops suddenly to 0: γ(t) = γ0 θ(t0 − t). Here γ0 is a constant, θ(x) is the Heaviside step function, t ≥ 0, and t0 ≥ 0.

a. Show that if t0 is small enough that exp(−st0) ≈ 1 − st0 for all relevant values of s, then the presence of memory in this system causes the normalized velocity autocorrelation function Cvv(t) to oscillate with a frequency less than ω and to decay slowly in time. Determine the decay constant and oscillation frequency of Cvv(t).

b. Suppose that we now let t0 become very large. Show that the velocity autocorrelation no longer decays but oscillates with a frequency different from ω. What is the oscillation frequency of Cvv(t)?

c. Explain the physical origin of the behavior of the velocity correlation function in the two limits considered in parts a and b in terms of the response of the bath to the system.

15.3. A simple model for electron transfer is defined by a quantum Hamiltonian of the form

$$\hat{H} = \frac{\varepsilon}{2}\hat{\sigma}_z + \frac{\Delta}{2}\hat{\sigma}_x,$$

where ε and ∆ are constants and σ̂x, σ̂y, and σ̂z are the Pauli matrices given by

$$\hat{\sigma}_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \hat{\sigma}_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \hat{\sigma}_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

The model assumes that there are only two states for the electron and is thus a simplification of the true electron-transfer problem.

a. In the Heisenberg picture, the operators that describe the observables of a system evolve in time according to the equation of motion

$$\frac{d\hat{A}}{dt} = \frac{1}{i\hbar}[\hat{A}, \hat{H}],$$

where Â is an arbitrary operator. Write down Heisenberg’s equations for σ̂x, σ̂y, and σ̂z.

b. Compute the autocorrelation function ⟨σ̂y(0)σ̂y(t)⟩ assuming an initial canonical distribution.

c. In order to mimic the effect of an environment, the above two-level system is often coupled to a bath of quantum-mechanical harmonic oscillators for which the Hamiltonian is given by

$$\hat{H} = \frac{\varepsilon}{2}\hat{\sigma}_z + \frac{\Delta}{2}\hat{\sigma}_x + \sum_\alpha \hbar\omega_\alpha\left(\hat{a}_\alpha^\dagger \hat{a}_\alpha + \frac{1}{2}\right) + \frac{\hbar}{2}\hat{\sigma}_z \sum_\alpha g_\alpha\left(\hat{a}_\alpha^\dagger + \hat{a}_\alpha\right),$$

where α is an index that runs over all of the bath modes, ωα are the bath frequencies, âα and â†α are the bath annihilation and creation (lowering and raising) operators, respectively, and gα are coupling constants. For this Hamiltonian, write down the Heisenberg equations of motion for all operators, including the Pauli matrices of the system and the creation and annihilation operators of the bath.


d. By solving the Heisenberg equations for the bath operators, develop generalized Langevin type equations for the spin operators σ̂x and σ̂y.

15.4. a. Derive eqn. (15.4.4). ∗ ∗

b. Derive eqn. (15.4.6).

15.5. Consider a single harmonic degree of freedom q, having a mass m, that obeys the generalized Langevin equation

$$\ddot{q} = -\omega^2 q - \int_0^t d\tau\, \dot{q}(\tau)\gamma(t-\tau) + f(t).$$

Here ω is the frequency associated with the motion of q, γ(t) = ζ(t)/m, and f(t) = R(t)/m, where R(t) and ζ(t) are the random force and friction kernel, respectively. Suppose q(0) = 0.

a. Show that the velocity autocorrelation function Cvv(t) obeys the following integro-differential equation:

$$\frac{d}{dt}C_{vv}(t) = -\int_0^t d\tau\, K(t-\tau)\, C_{vv}(\tau),$$

where K(t) = ω² + γ(t). This equation is one in a class of integro-differential equations known as Volterra equations.

b. Devise a numerical algorithm for extracting the memory function and friction kernel given a velocity autocorrelation function obtained from a molecular dynamics calculation. Note that this requires inversion of the Volterra equation. Discuss any numerical difficulties you expect to arise in the implementation of your algorithm.

c. Now consider a simple continuous model for the friction kernel γ(t) = λA e^{−λt}. In what time range would you expect this model to break down physically and why?

d. Solve the Volterra equation for the velocity autocorrelation function using the simple exponential friction kernel model in part c. Discuss the influence of the parameters A and λ on your solution. In addition, examine the free particle case by taking the limit ω → 0.

e. Finally, discretize your velocity autocorrelation function into time steps of size ∆t for a given choice of A and λ, and use this discretized Cvv(t) to test the algorithm you developed in part b. How well can you recover the exponential friction kernel? How robust is your algorithm to the addition of a little random noise to your discretized Cvv(t)?

15.6. For a particle obeying eqn. (15.3.21), show that the density of vibrational states is related to the friction kernel by

$$I(\omega) = \frac{\omega^2\gamma'(\omega)}{\left[\omega^2 - \tilde{\omega}^2 - \omega\gamma''(\omega)\right]^2 + \left[\omega\gamma'(\omega)\right]^2},$$

where γ′(ω) and γ″(ω) are the real and imaginary parts of γ̃(iω) and I(ω) is the Fourier transform of the velocity autocorrelation function

$$I(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} dt\, \langle \dot{q}(0)\dot{q}(t)\rangle\, e^{-i\omega t}.$$

In general, the connection between the autocorrelation function of a random process and a spectral density is known as the Wiener-Khintchine theorem (Wiener, 1930; Khintchine, 1934; Kubo et al., 1985). Using eqn. (15.2.12), derive the spectral density for a harmonic bath corresponding to ⟨R(0)R(t)⟩.

15.7. Write a program to integrate the Langevin equation with a Gaussian random force for a harmonic oscillator with mass m = 1, frequency ω = 1, temperature kT = 1, and friction γ = 1, using the integrator of Section 15.5. Verify that the correct momentum and position distribution functions of the canonical ensemble are obtained (see Problem 4.9).

15.8. Consider the adiabatic free-energy dynamics approach of Section 8.10.

a. Reformulate this technique using a set of coupled Langevin equations with two different temperatures Tq and T for the first n coordinates and remaining 3N − n coordinates, respectively.

b. Write a program to integrate your equations for the example in eqn. (8.10.27) using the parameters given following eqn. (8.10.28) and verify that you are able to generate the analytical free-energy profile in eqn. (8.10.28).

15.9. a. Derive eqns. (15.8.27), (15.8.30), and (15.8.31). ∗ ∗

b. Derive eqn. (15.8.34).

15.10. A solution contains a very low concentration of solute molecules (denoted A) and solvent molecules (denoted B). Let the number of solute molecules be NA with positions r1(t), ..., rNA(t). We can introduce a phase-space function for the solute concentration c(r, t) at any point r in space as

$$c(\mathbf{r}, t) = \sum_{i=1}^{N_A} \delta(\mathbf{r} - \mathbf{r}_i(t)).$$

a. Assuming the solution is in a cubic periodic box of length L, show that the spatial Fourier transform c̃k(t) of c(r, t) is

$$\tilde{c}_{\mathbf{k}}(t) = \sum_{i=1}^{N_A} e^{-i\mathbf{k}\cdot\mathbf{r}_i(t)},$$

with k = 2πn/L, where n is a vector of integers.

b. Use the Mori-Zwanzig theory to derive a generalized Langevin equation for c̃k(t) and give the explicit expressions for all terms in the equation.

c. Now consider the correlation function

$$C(\mathbf{k}, t) = \frac{\langle \tilde{c}_{-\mathbf{k}}(0)\tilde{c}_{\mathbf{k}}(t)\rangle}{\langle \tilde{c}_{-\mathbf{k}}\tilde{c}_{\mathbf{k}}\rangle}.$$

Starting with your generalized Langevin equation of part b, derive an integro-differential equation satisfied by C(k, t).

d. Show that the memory kernel in your equation is at least second-order in the wave vector k.

e. Show that exp(QiLt) → exp(iLt) as |k| → ∞.

f. Suppose the memory kernel decays rapidly in time. In this limit, show that the correlation function satisfies an equation of the form

$$\frac{\partial}{\partial t}C(\mathbf{k}, t) = -\mathbf{k}\cdot\mathbf{D}\cdot\mathbf{k}\, C(\mathbf{k}, t),$$

where D is the diffusion tensor. Give an expression for D in terms of a velocity autocorrelation function.

15.11. Use the roots in eqn. (15.3.38) to invert the Laplace transform in eqn. (15.3.36) and obtain solutions for q(t) and the position correlation function Cqq(t) = ⟨q(0)q(t)⟩ for a harmonic oscillator with an exponential memory kernel. Examine separately the case in which the roots are all real and the case in which two of them are complex conjugates of each other. What happens to your solutions when the discriminant δ = 0?

15.12. Consider eqn. (15.6.33) without the random force term

$$\dot{q} = \frac{p}{m}, \qquad \begin{pmatrix} \dot{p} \\ \dot{s} \end{pmatrix} = \begin{pmatrix} F(q) \\ 0 \end{pmatrix} - Z\begin{pmatrix} p \\ s \end{pmatrix}.$$

By solving the equation for s(t) in terms of p(t), show that eqn. (15.6.31) without R(t) results if K(t) is given by eqn. (15.6.32).

15.13. Derive eqn. (15.6.35).

15.14. Write a program to integrate the overdamped Langevin equation with a Gaussian random force for a harmonic oscillator with mass m = 1,

frequency ω = 1, temperature kT = 1, and friction γ = 1, using the integrator in eqn. (15.5.12). Verify that the correct momentum and position distribution functions of the canonical ensemble are obtained (see Problem 4.9).

15.15. The technical difference between the d-AFED (Abrams and Tuckerman, 2008) and TAMD (Maragliano and Vanden-Eijnden, 2006) methods is that the former is based on molecular dynamics while the latter is built on the overdamped Langevin equation. That is, the equations of motion of TAMD for a system of N particles and n collective variables fα(r) are

$$
\begin{aligned}
\zeta_0\dot{\mathbf{r}}_i &= \mathbf{F}_i - \sum_{\alpha=1}^{n}\kappa_\alpha\left(f_\alpha(\mathbf{r}) - s_\alpha\right)\frac{\partial f_\alpha}{\partial \mathbf{r}_i} + \sigma \mathbf{W}_i(t) \\
\zeta_{0,s}\dot{s}_\alpha &= \kappa_\alpha\left(f_\alpha(\mathbf{r}) - s_\alpha\right) + \sigma_s w_\alpha(t)
\end{aligned}
\tag{15.9.36}
$$

(see eqns. (8.10.6)), where ζ0 and ζ0,s are the friction coefficients for the physical and extended phase-space variables, respectively, Wi(t) and wα(t) are vectors of Gaussian random processes of lengths 3N and n, respectively, all having zero mean and unit width, σ = √(2kTζ0), and σs = √(2kTs ζ0,s). Here, T and Ts are the temperatures of the physical and extended phase-space variables (see Section 8.10 for details). Modify your program from exercise 15.14 for the example in eqn. (8.10.27) using the parameters given following eqn. (8.10.28) and the integrator in eqn. (15.5.12). Verify that you are able to generate the analytical free-energy profile in eqn. (8.10.28).

15.16. Active matter is composed of entities that consume energy, allowing them to move, exert forces, and exhibit complex motion. Simulations of active matter particles employ a device in which the energy consumption is modeled as an additional random process in an overdamped Langevin equation model.

a. For a single particle in one dimension subject to a potential U(x), the active-matter equations take the form

$$
\begin{aligned}
\gamma\dot{x} &= -U'(x) + \alpha f + \sqrt{D_x}\,\eta \\
\dot{f} &= -\beta f + \sqrt{D_f}\,\xi.
\end{aligned}
$$

Here, α, β, Dx, and Df are constants, and η and ξ are Gaussian random processes of zero mean and correlations ⟨η(0)η(t)⟩ = δ(t) and ⟨ξ(0)ξ(t)⟩ = δ(t). From these equations, derive expressions for the correlation functions ⟨x(0)x(t)⟩ and ⟨f(0)f(t)⟩ if U(x) = kx²/2.

b. In a system of N particles in two dimensions, which describes a system of colloidal disks, the equations of motion take the form

$$
\begin{aligned}
\dot{\mathbf{r}}_i &= v\hat{\mathbf{n}}_i + \mu\sum_{j\neq i}\mathbf{F}_{ij} + \sqrt{2D_T}\,\boldsymbol{\eta}_i \\
\dot{\theta}_i &= \sqrt{2D_R}\,\xi_i,
\end{aligned}
$$

where DT and DR are translational and rotational diffusion constants, µ = DT/kT, Fij are pairwise forces, and n̂i = (cos θi, sin θi). If Fij comes from a pair potential of the form U(rij) = k(2a − rij)²/2 for rij < 2a and U(rij) = 0 for rij ≥ 2a, describe qualitatively the motion generated by these equations.

16 Discrete models and critical phenomena

16.1 Phase transitions and critical points

In Section 4.8, we studied the liquid–gas phase transition associated with the van der Waals equation of state. We demonstrated the existence of a critical isotherm and derived the thermodynamic state variables of the liquid–gas critical point. We also showed that the behavior of certain thermodynamic properties is determined by a set of power laws characterized by exponents known as critical exponents, and we alluded to the phenomenon of universality, which provides a rationalization for the observation that large classes of physically very different systems possess the same critical exponents. In this chapter, we will explore the behavior of systems near their critical points in greater detail. To begin our discussion, consider a uniform gas of identical particles in a container of volume V . The interactions between the particles are weak, and their motion is primarily driven by the kinetic-energy (free particle) contribution to the Hamiltonian. Collisions, which are overwhelmingly dominated by two-body interactions, are infrequent. If we assume that the collisions are approximately elastic, then each colliding particle merely changes its direction of motion. The fact that an interaction event determines when and where a particle’s next collision will occur is not particularly important because the frequency of collision events is small. In this case, we speak of a lack of correlation between collision events. If the gas is now compressed at a given temperature so that both the pressure and the density are increased, the interactions between the particles become more important, at least locally. Formation of small, short-lived clusters might occur, due to cooperative interactions that have a greater influence on the system than merely changing the direction of a particle’s motion. Collision events now exhibit short-range correlations with each other, which leads to the formation of such local structures. 
If the system is compressed even further, a change of phase or phase transition occurs in which the gas becomes a liquid. Although phase transitions are an everyday phenomenon, their underlying microscopic details are fascinating and merit further comment. First, the macroscopic manifestation of a gas-to-liquid phase transition is a discontinuous change in the volume. At the microscopic level, the interparticle interactions give rise to long-range correlations—cooperative effects that cause the gas particles to condense, forming well-defined solvation structures quantifiable through spatial correlation functions such as those in Figs. 4.2 and 4.3.


Fig. 16.1 Representative phase diagram of a simple, one-component system. L1 , L2 , and L3 are the sublimation, melting, and boiling curves, respectively.

Further compression leads to the formation of locally ordered structures that resemble the solid. As the compression continues, long-range order sets in, and a liquid-to-solid phase transition occurs. Such a transition is accompanied by a discontinuous change in the density, although the change is not as dramatic as in the gas-to-liquid case. Other macroscopic observables known as order parameters (see Section 16.4) change as well. In order to map out the specific values of pressure and temperature at which the different phases exist, a phase diagram is used. A typical phase diagram for a simple, one-component system is shown in Fig. 16.1. In the phase diagram, the lines that separate different phases are called coexistence curves. Among these, there is the melting curve (L2) between the liquid and solid phases, the sublimation curve (L1) between the solid and gas phases, and the boiling curve (L3) between the liquid and gas phases. The point at which all three curves meet is called the triple point. In the above discussion, where a constant temperature is assumed, the specific value of the temperature determines whether or not a gas-to-liquid phase transition can occur. If the temperature is too high, then the system cannot exist as a liquid at any pressure. The temperature at which a gas–liquid phase transition just starts is called the critical temperature, denoted Tc. The existence of a critical temperature is the reason that the boiling curve in Fig. 16.1 terminates at a definite point, whereas the melting curve, in principle, does not. The point at which the boiling curve terminates is called the critical point. Consider next isotherms of the equation of state for a simple fluid, which are illustrated in Fig. 16.2. For temperatures above Tc, no phase transition occurs, and the isotherms are continuous.
Fig. 16.2 Equation of state of a simple, one-component fluid. C denotes an inflection point.

In the phase diagram, the region to the right of the critical point is known as the supercritical fluid region where the system exhibits both gas-like and liquid-like properties. For temperatures below Tc, one sees a discontinuous change in the volume, signifying the transition from gas to liquid. When a phase transition is characterized by a discontinuous change in an associated thermodynamic observable, the transition is referred to as a first-order phase transition. In Fig. 16.2, there is one point labeled C at which the phase transition is characterized by a continuous volume change. This point, which is an inflection point along the isotherm, corresponds to the critical point on the phase diagram. The isotherm that contains the point C is called the critical isotherm. The phase transition that occurs at this point is an example of a second-order phase transition. For a one-component system, the critical point is the only point at which a second-order phase transition is possible. A two-component system, for example, could have lines of second-order phase transitions, called critical lines. As discussed in Section 4.8, C is a point of zero curvature, meaning that ∂P/∂ρ and ∂²P/∂ρ² both vanish at C.
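The vanishing of ∂P/∂ρ and ∂²P/∂ρ² at C can be verified directly for the van der Waals equation of state. The following sketch assumes the form P = ρkT/(1 − bρ) − aρ², for which the critical values are ρc = 1/(3b), kTc = 8a/(27b), and Pc = a/(27b²); the function names and the choice a = b = 1 are illustrative only.

```python
def vdw_pressure(rho, kT, a=1.0, b=1.0):
    """Van der Waals pressure, assuming P = rho*kT/(1 - b*rho) - a*rho^2."""
    return rho * kT / (1.0 - b * rho) - a * rho ** 2


def vdw_dP_drho(rho, kT, a=1.0, b=1.0):
    """First density derivative of the pressure along an isotherm."""
    return kT / (1.0 - b * rho) ** 2 - 2.0 * a * rho


def vdw_d2P_drho2(rho, kT, a=1.0, b=1.0):
    """Second density derivative of the pressure along an isotherm."""
    return 2.0 * b * kT / (1.0 - b * rho) ** 3 - 2.0 * a


def vdw_critical_point(a=1.0, b=1.0):
    """Critical density, temperature (as kT), and pressure of the model."""
    return 1.0 / (3.0 * b), 8.0 * a / (27.0 * b), a / (27.0 * b ** 2)


if __name__ == "__main__":
    rho_c, kT_c, P_c = vdw_critical_point()
    # Both derivatives vanish (to rounding error) at the critical point.
    print(vdw_dP_drho(rho_c, kT_c), vdw_d2P_drho2(rho_c, kT_c))
    print(vdw_pressure(rho_c, kT_c), P_c)
```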

16.2 The critical exponents α, β, γ, and δ

The liquid–gas critical point is characterized by a number of important properties. First, certain thermodynamic variables are observed to diverge as the temperature T approaches the critical temperature Tc; the divergence obeys a power-law form in |T − Tc|⁻¹. Other thermodynamic variables are found to exhibit a nondivergent power-law dependence as the critical point is approached in either |T − Tc| or |ρ − ρc|, where ρc is the critical density, that is, the density at which the inflection point C occurs in Fig. 16.2. The exponents that govern the aforementioned power laws are called critical exponents. A second important property is the fact that large classes of systems, known as universality classes, possess the same critical exponents. This phenomenon of universality indicates that near a critical point, the details of the local interactions between specific pairs or clusters of particles become less important than long-range cooperative effects, which are largely insensitive to the particulars of an interaction potential. Thus, it is possible to obtain information about all of the systems in a universality class by studying its physically simplest members.

We start by introducing a set of critical exponents, known as the primary exponents, and the properties they characterize. We will illustrate these exponents first using the gas–liquid critical point we have been discussing; however, we will see very shortly how these definitions carry over to a different type of physical system in the same universality class. The first exponent pertains to the behavior of the constant-volume (or constant-pressure) heat capacity at the critical pressure and density as the critical temperature Tc is approached from above. Recall that the heat capacity at constant volume is C_V = (∂E/∂T)_V = kβ²(∂² ln Q(N, V, T)/∂β²)_V. As T → Tc from above, C_V is observed to diverge with the power-law form

$$C_V \sim |T - T_c|^{-\alpha}. \tag{16.2.1}$$

The critical exponent α, therefore, characterizes the divergence in C_V as T → Tc. The second exponent γ pertains to the divergence of the isothermal compressibility κT, defined as κT = −(1/V)(∂V/∂P)_T (see eqn. (4.8.41)). At the critical pressure and volume, as T → Tc from above, the isothermal compressibility is observed to diverge following the power law

$$\kappa_T \sim |T - T_c|^{-\gamma}. \tag{16.2.2}$$

The third exponent characterizes the shape of the critical isotherm near the inflection point where the density and pressure approach their critical values ρc and Pc. In particular, it is seen that for values of P and ρ near their critical values at T = Tc,

$$P - P_c \sim |\rho - \rho_c|^{\delta}\,\mathrm{sign}(\rho - \rho_c), \tag{16.2.3}$$

where sign(ρ − ρc) = (ρ − ρc)/|ρ − ρc| is just the sign of (ρ − ρc). Finally, the fourth exponent refers to the dependence of the difference ρL − ρG on temperature as Tc is approached from below. Here ρL and ρG refer to the liquid and gas density values when the discontinuous change occurs. It is observed that this difference obeys the power law

$$\rho_L - \rho_G \sim |T_c - T|^{\beta}. \tag{16.2.4}$$

α, β, γ, δ are the primary critical exponents.
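In practice, an exponent such as δ in eqn. (16.2.3) is read off as the slope of a log–log plot. The sketch below does this for the critical isotherm of the van der Waals equation of state, assumed here in the form P = ρkT/(1 − bρ) − aρ² with ρc = 1/(3b), kTc = 8a/(27b), and Pc = a/(27b²). The function names, the least-squares fit, and the particular density offsets are illustrative choices, not a prescription from the text. For this mean-field model the fitted slope should be close to δ = 3.

```python
import math


def vdw_pressure(rho, kT, a=1.0, b=1.0):
    """Van der Waals pressure, assuming P = rho*kT/(1 - b*rho) - a*rho^2."""
    return rho * kT / (1.0 - b * rho) - a * rho ** 2


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) versus log(x) for positive data."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    num = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    den = sum((u - mean_x) ** 2 for u in lx)
    return num / den


def estimate_delta(a=1.0, b=1.0, offsets=(1e-3, 2e-3, 5e-3, 1e-2)):
    """Fit P - Pc ~ (rho - rho_c)^delta along the critical isotherm (rho > rho_c)."""
    rho_c = 1.0 / (3.0 * b)
    kT_c = 8.0 * a / (27.0 * b)
    P_c = a / (27.0 * b ** 2)
    dP = [vdw_pressure(rho_c + eps, kT_c, a, b) - P_c for eps in offsets]
    return loglog_slope(offsets, dP)


if __name__ == "__main__":
    print(estimate_delta())
```

The small residual deviation of the fitted slope from 3 comes from higher-order terms in ρ − ρc, which is why the offsets are kept small.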

16.3 Magnetic systems and the Ising model

The gas–liquid critical point we have been discussing is not the simplest system in its universality class due to the complexity of the interactions and the complicated ensemble distribution they generate. The problem would be simplified considerably if we could restrict the particles to specific points in space, specifically the points of a regular lattice, and use variables that take on discrete values rather than the continuous Cartesian positions r1, ..., rN that characterize liquids and gases. The only requirements that we place on the simplified system are that it possess a critical point and that it belong to the same universality class.


Fortunately, it is possible to fulfill these requirements by considering the phenomenon of magnetization (the formation of magnetic ordering in a ferromagnetic system). We are interested in the transition from a disordered to an ordered state that occurs either by the application of an external field or spontaneously as the temperature is reduced. The models that are often invoked to describe this process are of a type known as lattice models. Since we know that magnetization involves the collective alignment of the quantum spins of particles along a particular spatial axis, the basic lattice model consists of N interacting particles having spins Ŝ1, ..., ŜN placed at the points of a regular lattice, as illustrated in Fig. 16.3.

Fig. 16.3 Small part of a cubic spin lattice.

For simplicity, we consider the particles to be spin-1/2, so that Ŝi = (ħ/2)σ̂i, where σ̂ is the vector of Pauli matrices σ̂ = (σ̂x, σ̂y, σ̂z) (see Problem 16.3). When diagonalized, each of the Pauli matrices has eigenvalues ±1. We consider only pairwise interactions among the spins described by a general spin-spin coupling tensor Jij so that the Hamiltonian for the system in the absence of an external magnetic field is

$$\hat{H}_0 = -\frac{1}{2}\sum_{i,j}\hat{\boldsymbol{\sigma}}_i\cdot\mathbf{J}_{ij}\cdot\hat{\boldsymbol{\sigma}}_j. \tag{16.3.1}$$

The model in eqn. (16.3.1) is known as the Heisenberg model. The ‘0’ subscript reminds us that eqn. (16.3.1) prescribes a field-free Hamiltonian. When a magnetic field B is applied to the system, the total Hamiltonian takes the form Ĥ = Ĥ0 + Ĥ1, where Ĥ1 is a perturbation due to the field and is given by

$$\hat{H}_1 = -\sum_i \gamma\mathbf{B}\cdot\hat{\mathbf{S}}_i = -\sum_i \mathbf{h}\cdot\hat{\boldsymbol{\sigma}}_i, \tag{16.3.2}$$

where γ is the spin gyromagnetic ratio and h = γħB/2 (see Problems 9.2 and 10.3). As we will discuss in the next section, the Heisenberg and liquid–gas models do not belong to the same universality class. Therefore, we need to consider a simplified version of the model in eqns. (16.3.1) and (16.3.2), which ultimately renders it easier to solve. The simplification we will make is to restrict the spins to lie along only


one spatial axis, specifically, the z-axis. With this choice, the spin matrices become σ̂i = (0, 0, σ̂zi). In addition, since σ̂z is diagonal, we can reduce each spin matrix σ̂zi to a scalar variable σzi that can take values of ±1 only. At this point, since there is only one relevant spatial axis, we can henceforth drop the “z” label on the spin variables. Finally, we will make the assumption that the spin-spin couplings, which are now simple numbers Jij, couple only nearest neighboring spins. Thus, the Hamiltonian for this idealized situation becomes

$$H = -\frac{1}{2}\sum_{\langle i,j\rangle} J_{ij}\sigma_i\sigma_j - h\sum_i \sigma_i, \tag{16.3.3}$$

where the notation ⟨i, j⟩ on the sum signifies that only nearest-neighbor interactions are included in the first term. The h = 0 limit of eqn. (16.3.3) gives the unperturbed Hamiltonian

$$H_0 = -\frac{1}{2}\sum_{\langle i,j\rangle} J_{ij}\sigma_i\sigma_j. \tag{16.3.4}$$

The model described by eqn. (16.3.3) is known as the Ising model, named after the German physicist Ernst Ising (1900–1998). We have dropped the hat from H in eqns. (16.3.3) and (16.3.4), signifying that the Ising model is a type of generic discrete model obtainable as an idealization of actual classical or quantum Hamiltonians (see Problems 10.11 and 16.5 for examples of the former). It is worth noting that the Ising model is just one type of discrete spin model. Another discrete spin model, known as the Potts model, is specified by a Hamiltonian of the form

$$H = -\sum_{\langle i,j\rangle} J_{ij}\delta_{\sigma_i\sigma_j} - h\sum_i \sigma_i, \tag{16.3.5}$$

where δkl is the usual Kronecker delta (see Problem 16.2). A more general type of Potts model replaces the J_{ij}δ_{σiσj} term in the Hamiltonian with a term of the form J^{ij}_{σiσj}, allowing for a broader range of couplings between spins at sites i and j on the lattice. In fact, such generalizations of both Potts and Ising type Hamiltonians allow these models to be applied to a wide range of discrete systems beyond simple spin lattices, examples of which will be described briefly at the end of this chapter. Although we will focus throughout this chapter on techniques for solving Ising models, the mathematical similarity between the Ising and Potts Hamiltonians means that the methods we will describe can be easily applied to the solution of Potts models.

For the gas–liquid phase transition, where the density ρ was used to distinguish one phase from another, we seek a variable that can distinguish a disordered phase from an ordered, magnetized phase. To this end, we introduce the average total magnetization

$$M = \left\langle \sum_{i=1}^{N}\sigma_i \right\rangle, \tag{16.3.6}$$

which is an extensive thermodynamic observable. The equivalent intensive observable is the average magnetization per spin m = M/N. In a perfectly ordered state, m = ±1 and M = ±N, depending on whether the spins are aligned along the positive or negative z-axis.

[Figure: phase diagram of the magnetic system in the T–h plane, with regions M > 0 (spin up) and M < 0, the line M = 0, and Tc marked on the T axis.]

For T > Tc, the system is in a disordered state for h = 0. It could still order in the presence of a finite applied field and, therefore, is paramagnetic. For T < Tc, as h → 0, a finite magnetization persists down to h = 0. If h → 0⁺, then the magnetization will be positive, and if h → 0⁻, it will be negative. As this analysis implies, at h = 0, the magnetization can be either positive or negative, and indeed, the Ising model exhibits a two-phase coexistence at h = 0. Since a plot of m vs. h for T < Tc is an isotherm of the equation of state, such an isotherm shows a discontinuous change in m (see Fig. 16.5). For T > Tc, the magnetization vanishes as h → 0. The critical isotherm shown in Fig. 16.5 separates these two modes of behavior. At h = 0, the isotherm has zero curvature, meaning that ∂h/∂m = 0 and ∂²h/∂m² = 0. Table 16.1 draws an analogy between the gas–liquid and magnetic cases. It lists the basic thermodynamic variables and thermodynamic relations among these variables, which will be needed throughout our discussion. Accordingly, the primary critical exponents are defined as follows. At h = 0 and m = 0 (the values of the magnetic field and magnetization at the critical point), as T → Tc from above, the heat capacity at constant magnetization C_M diverges as

$$C_M \sim |T - T_c|^{-\alpha}. \tag{16.3.7}$$

Similarly, at h = 0 and m = 0, as T → Tc from above, the susceptibility diverges as

$$\chi \sim |T - T_c|^{-\gamma}. \tag{16.3.8}$$


Fig. 16.5 Equation of state of the Ising model.

Table 16.1 Comparison of thermodynamic variables and relations between gas–liquid and magnetic systems.

Quantity (gas–liquid)          Gas–Liquid                 Magnetic                  Quantity (magnetic)
Pressure                       P                          h                         Magnetic field
Volume                         V                          −M = −Nm                  Magnetization
Isothermal compressibility     κT = −(1/V)(∂V/∂P)         χ = ∂m/∂h                 Magnetic susceptibility
Helmholtz free energy          A(N, V, T)                 A(N, M, T)                Helmholtz free energy
Gibbs free energy              G(N, P, T)                 G(N, h, T)                Gibbs free energy
Pressure relation              P = −∂A/∂V                 h = ∂A/∂M                 Magnetic field relation
Volume relation                V = ∂G/∂P                  M = −∂G/∂h                Magnetization relation
Const. volume heat capacity    C_V = −T(∂²A/∂T²)_V        C_M = −T(∂²A/∂T²)_M       Const. magnetization heat capacity
Const. pressure heat capacity  C_P = −T(∂²G/∂T²)_P        C_h = −T(∂²G/∂T²)_h       Const. field heat capacity


Along the critical isotherm, the behavior of the equation of state near the inflection point is

$$h \sim |m|^{\delta}\,\mathrm{sign}(m). \tag{16.3.9}$$

Finally, as T → Tc from below, the discontinuity in the magnetization depends on temperature according to the power law

$$m \sim |T_c - T|^{\beta}. \tag{16.3.10}$$

In this way, a perfect analogy is established between the magnetic and gas–liquid systems. Before proceeding to analyze the magnetic system model, however, we first clarify the concept of universality and provide a definition of universality classes.

16.4 Universality classes

In a perfectly ordered magnetic state, the magnetization per spin m can take one of two values, m = 1 or m = −1, depending on the direction in which the spins point. In the former case, σ1 = 1, ..., σN = 1, while in the latter σ1 = −1, ..., σN = −1. If we perform a variable transformation

$$\sigma_i' = -\sigma_i, \tag{16.4.1}$$

which is simply a spin-flip transformation, the magnetization in a perfectly ordered state changes sign. Note that the spin-flip transformation has the same effect as performing a parity transformation, in which we let the spatial coordinate z → −z.¹ Consider next the effect of the transformation in eqn. (16.4.1) on the unperturbed Hamiltonian H0 of our idealized magnetic model. The unperturbed (h = 0) Hamiltonian is

$$H_0 = -\frac{1}{2}\sum_{\langle i,j\rangle} J_{ij}\,\sigma_i\sigma_j. \tag{16.4.2}$$

If the spins in eqn. (16.4.2) are transformed according to eqn. (16.4.1), the Hamiltonian becomes

$$H_0' = -\frac{1}{2}\sum_{\langle i,j\rangle} J_{ij}\,\sigma_i'\sigma_j', \tag{16.4.3}$$

which has exactly the same form as eqn. (16.4.2). Thus, the transformation in eqn. (16.4.1) preserves the form of the Hamiltonian. The Hamiltonian H0 is said to be invariant under a spin-flip transformation. This is not unexpected since the spin-flip transformation is equivalent to a parity transformation, which is merely a different choice of coordinates, and physical results should not depend on this choice. Thus, the Hamiltonian H0 exhibits parity invariance. Readers having some familiarity with the concepts of group theory will recognize that the spin-flip transformation, together with the trivial identity transformation σi′ = σi , form a complete group of transformations with elements {1, −1}, a group known as $Z_2$. The Hamiltonian H0 is invariant under both of the operations of this group. The magnetization of an ordered state, on the other hand, is not. Based on these notions, we introduce the concept of an order parameter, which is needed to define universality classes.

¹ The general parity transformation is a complete reflection of all three spatial axes, r → −r.

Suppose the unperturbed Hamiltonian H0 of a system is invariant with respect to all of the transformations of a group G. If two phases can be distinguished by a specific thermodynamic average ⟨φ⟩ (either a classical phase-space average or a quantum trace) that is not invariant under one or more of the transformations of G, then ⟨φ⟩ is called an order parameter for the system. Because the magnetization m is not invariant under one of the transformations in $Z_2$, it can serve as an order parameter for the Ising model with H0 given by eqn. (16.3.4).

The systems in a universality class are characterized by two parameters: (1) the number d of spatial dimensions in which the system exists, and (2) the dimension n of the order parameter. All systems possessing the same values of d and n belong to the same universality class. Thus, comparing the gas–liquid system and the h = 0 Ising model defined by eqn. (16.3.4), it should be clear why these two systems belong to the same universality class. In both cases, the number of spatial dimensions is d = 3. (In fact, these models can be defined in any number of dimensions.) In addition, the dimension of the order parameter for each system is n = 1, since for both systems, the order parameter (ρ or m) is a simple scalar quantity. Consequently, the idealized magnetic model can be used to determine the critical exponents of the d = 3, n = 1 universality class. By contrast, in the Heisenberg model of eqn. (16.4.2), the spins can point in any spatial direction, and the order parameter is the magnetization vector

$$\mathbf{M} = \left\langle \sum_{i=1}^{N} \boldsymbol{\sigma}_i \right\rangle, \tag{16.4.4}$$

which has dimension n = 3, corresponding to the three components of M. Such a system could be used to determine the exponents of the d = 3, n = 3 universality class. Having established the concept of a universality class, we will now proceed to analyze the Ising model in order to gain an understanding of systems in the d = 3, n = 1 universality class near their critical points.
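The $Z_2$ argument above can be checked by direct enumeration. The following is a minimal sketch, assuming a small periodic one-dimensional ring with a uniform coupling J as an illustrative stand-in for H0; the function names `h0`, `magnetization`, and `flip` are hypothetical and not taken from the text. It confirms that the zero-field energy is unchanged by the spin flip of eqn. (16.4.1), while the magnetization per spin changes sign.

```python
import itertools

def h0(spins, J=1.0):
    """Zero-field nearest-neighbor energy on a periodic 1D ring (illustrative choice)."""
    n = len(spins)
    return -J * sum(spins[i] * spins[(i + 1) % n] for i in range(n))

def magnetization(spins):
    """Magnetization per spin of a single configuration."""
    return sum(spins) / len(spins)

def flip(spins):
    """Spin-flip transformation: sigma_i -> -sigma_i for every site."""
    return tuple(-s for s in spins)

N = 6  # small enough to enumerate all 2**N configurations
for spins in itertools.product((1, -1), repeat=N):
    assert h0(flip(spins)) == h0(spins)                        # H0 is invariant
    assert magnetization(flip(spins)) == -magnetization(spins)  # m changes sign
```

Because every configuration passes both checks, the energy cannot distinguish the two ordered states, whereas the magnetization can, which is what qualifies it as an order parameter.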

16.5 Mean-field theory

We begin our treatment of the Ising model by invoking an approximation scheme known as mean-field theory. In this approach, spatial correlations are neglected, and each particle is assumed to experience an “average” or “mean” field due to the other particles in the system. Before examining the magnetic system, let us first note that we previously encountered this approximation in our discussion of the van der Waals equation in Section 4.8. We derived the van der Waals equation of state from a perturbation expansion of the potential energy of a system, and we obtained the approximate Helmholtz free energy of eqn. (4.8.25):

$$A = -\frac{1}{\beta}\ln\left[\frac{Z_N^{(0)}}{N!\,\lambda^{3N}}\right] + \langle U_1\rangle_0 - \frac{\beta}{2}\left(\langle U_1^2\rangle_0 - \langle U_1\rangle_0^2\right) + \cdots,$$

where $Z_N^{(0)}$ is the configurational partition function due to the unperturbed potential U0. Note that the second term in the free energy is just the average of the perturbation U1, while the third term is the fluctuation in this potential, ⟨(U1 − ⟨U1⟩)²⟩. In the derivation of the van der Waals equation, the fluctuation term was completely neglected. Furthermore, the unperturbed configurational partition function was taken to be that of an ideal gas in a reduced volume. Thus, all of the interactions between particles were assumed to arise from U1, and the approximation of retaining only the first two terms in the free-energy expression amounted to replacing U1 with its mean value ⟨U1⟩ in the configurational partition function, i.e.,

$$Z_N = Z_N^{(0)}\left\langle e^{-\beta U_1}\right\rangle \approx Z_N^{(0)} e^{-\beta\langle U_1\rangle} \tag{16.5.1}$$

(cf. eqn. (4.8.4)). Note that the replacement in eqn. (16.5.1) is tantamount to approximating ⟨exp(−βU1)⟩ by its first-order cumulant. The mean-field theory approximation can recover the first two terms in the free energy. Recall that the van der Waals equation, despite its crudeness, predicts a gas-to-liquid phase transition as well as a critical point. The four primary exponents were found in Section 4.8 to be α = 0, β = 1/2, γ = 1, and δ = 3 within the mean-field approximation. In our discussion of the van der Waals equation, we referred to the fact that the isotherms are unrealistic for T < Tc owing to regions where both P and V increase simultaneously. In Fig. 4.8, the correction to a T < Tc isotherm, which appears as the thin solid straight line, is necessary for the calculation of the exponent β. The location of the thin solid line is determined using a procedure known as the Maxwell construction, which states that the areas enclosed above and below the thin line and the isotherm must be equal. Once the isotherms for T < Tc are corrected in this manner, then the exponent β can be calculated (see Problem 16.3).

In order to apply the mean-field approximation to the Ising model, we assume that the system is spatially isotropic. That is, for the spin-spin coupling $J_{ij}$, we assume $\sum_j J_{ij}$ is independent of the lattice location i. Since the sum in eqn. (16.3.4) is performed over nearest neighbors of i, under the assumption of isotropy, $\sum_j J_{ij} = z\tilde{J}$, where $\tilde{J}$ is a constant and z is the number of nearest neighbors of each spin (z = 2 in one dimension, z = 4 on a two-dimensional square lattice, z = 6 on a three-dimensional simple cubic lattice, z = 8 on a three-dimensional body-centered cubic lattice, etc.). Absorbing the factor z into the constant $\tilde{J}$, we define $J = z\tilde{J}$. Next, we consider the Hamiltonian in the presence of an applied magnetic field h:

$$H = -\frac{1}{2}\sum_{\langle i,j\rangle} J_{ij}\,\sigma_i\sigma_j - h\sum_i \sigma_i. \tag{16.5.2}$$

The partition function is given by

$$\Delta(N,h,T) = \sum_{\sigma_1=\pm 1}\sum_{\sigma_2=\pm 1}\cdots\sum_{\sigma_N=\pm 1}\exp\left\{\beta\left[\frac{1}{2}\sum_{\langle i,j\rangle} J_{ij}\,\sigma_i\sigma_j + h\sum_i \sigma_i\right]\right\}. \tag{16.5.3}$$

To date, it has not been possible to obtain a closed-form expression for this sum in three dimensions. Thus, to simplify the problem, we write the spin-spin product σiσj in terms of the difference of each spin from the magnetization per spin $m = (1/N)\langle\sum_i \sigma_i\rangle$:


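$$\begin{aligned}
\sigma_i\sigma_j &= (\sigma_i - m + m)(\sigma_j - m + m) \\
&= m^2 + m(\sigma_i - m) + m(\sigma_j - m) + (\sigma_i - m)(\sigma_j - m).
\end{aligned} \tag{16.5.4}$$

Since m ∼ ⟨σ⟩, the last term in eqn. (16.5.4) is a fluctuation term, which is neglected in the mean-field approximation. If this term is dropped, then

$$\begin{aligned}
\frac{1}{2}\sum_{\langle i,j\rangle} J_{ij}\,\sigma_i\sigma_j &\approx \frac{1}{2}\sum_{\langle i,j\rangle} J_{ij}\left[-m^2 + m(\sigma_i + \sigma_j)\right] \\
&= -\frac{1}{2}m^2 NJ + Jm\sum_i \sigma_i,
\end{aligned} \tag{16.5.5}$$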

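where the assumption of spatial isotropy has been used. Thus, the Hamiltonian reduces to

$$H = -\frac{1}{2}\sum_{\langle i,j\rangle} J_{ij}\,\sigma_i\sigma_j - h\sum_i \sigma_i \approx \frac{1}{2}NJm^2 - (Jm + h)\sum_i \sigma_i, \tag{16.5.6}$$

and the partition function becomes

$$\begin{aligned}
\Delta(N,h,T) &\approx \sum_{\sigma_1=\pm 1}\sum_{\sigma_2=\pm 1}\cdots\sum_{\sigma_N=\pm 1}\exp\left\{-\beta\left[\frac{1}{2}NJm^2 - (Jm+h)\sum_i \sigma_i\right]\right\} \\
&= e^{-\beta NJm^2/2}\sum_{\sigma_1=\pm 1}\sum_{\sigma_2=\pm 1}\cdots\sum_{\sigma_N=\pm 1}\exp\left[\beta(Jm+h)\sum_i \sigma_i\right] \\
&= e^{-\beta NJm^2/2}\sum_{\sigma_1=\pm 1}\exp\left[\beta(Jm+h)\sigma_1\right]\times\cdots\times\sum_{\sigma_N=\pm 1}\exp\left[\beta(Jm+h)\sigma_N\right] \\
&= e^{-\beta NJm^2/2}\left(\sum_{\sigma=\pm 1} e^{\beta(Jm+h)\sigma}\right)^{N} \\
&= e^{-\beta NJm^2/2}\left(e^{\beta(Jm+h)} + e^{-\beta(Jm+h)}\right)^{N} \\
&= e^{-\beta NJm^2/2}\left(2\cosh\beta(Jm+h)\right)^{N}.
\end{aligned} \tag{16.5.7}$$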

From eqn. (16.5.7), the Gibbs free energy G(N, h, T) can be calculated according to

$$G(N,h,T) = -\frac{1}{\beta}\ln\Delta(N,h,T) = \frac{1}{2}NJm^2 - \frac{N}{\beta}\ln\left[2\cosh\beta(Jm+h)\right]. \tag{16.5.8}$$

The average magnetization is M = −(∂G/∂h), which means that the average magnetization per spin m = M/N can be expressed as m = −(∂g/∂h), where g(h, T) = G(N, h, T)/N is the Gibbs free energy per spin:

$$g(h,T) = \frac{1}{2}Jm^2 - \frac{1}{\beta}\ln\left[2\cosh\beta(Jm+h)\right]. \tag{16.5.9}$$


Thus, the average magnetization per spin is given by

$$m = -\frac{\partial g}{\partial h} = \tanh\beta(Jm + h). \tag{16.5.10}$$

Notice, however, that since m was introduced into the Hamiltonian, the result of this derivative is an implicit relation for m that takes the form of a transcendental equation. We now ask if an ordered phase exists at zero field. Setting h = 0 in eqn. (16.5.10), the transcendental equation becomes m = tanh(βJm). Of course, m = 0 is a trivial solution to this equation; however, we seek solutions for finite m, which we can obtain by solving the equation graphically. That is, we plot the two functions f1(m) = m and f2(m) = tanh(βJm) on the same graph and then look for points at which the two curves intersect for different values of kT = 1/β. The plot is shown in Fig. 16.6. We see that depending on the value of T, the curves intersect at either three points or one point. Excluding the trivial case m = 0, we see that for small enough T (βJ > 1), there are two other solutions, which we label m0 and −m0. These solutions correspond to magnetizations at zero field aligned along the positive and negative z-axis, respectively. When the temperature is too high, that is, when βJ < 1, only the m = 0 solution exists. Since there is no magnetization at zero field, there is no spontaneous ordering. The case βJ = 1 just separates these two regimes and corresponds, therefore, to a critical isotherm. The condition βJ = 1 can be used to determine the critical temperature:

$$\beta J = \frac{J}{kT} = 1 \;\Rightarrow\; kT_c = J. \tag{16.5.11}$$

Fig. 16.6 Graphical solution of the transcendental equation tanh(βJm) = m.

In order to clarify further the behavior of the system near the critical point, consider expanding the free energy about m = 0 at zero field for temperatures near the critical temperature. If the expansion is carried out up to quartic order in m, we obtain

$$g(0,T) \approx c_1 + J(1 - \beta J)m^2 + c_2 m^4 = a(T), \tag{16.5.12}$$

where c1 and c2 are constants with c2 > 0. Note that at zero field, the Gibbs free energy per spin becomes the Helmholtz free energy per spin a(T). If βJ > 1, the sign of the quadratic term is negative, and a plot of the free energy as a function of m is shown in Fig. 16.7(a). We can see from the figure that the free energy has two minima at m = ±m0 and a maximum at m = 0, indicating that the ordered states, predicted by solving for the magnetization m, are thermodynamically stable while the disordered state with m = 0 is thermodynamically unstable. For βJ < 1, the sign of the quadratic term is positive, and the free-energy plot, shown in Fig. 16.7(b), possesses a single minimum at m = 0, indicating that there are no solutions corresponding to ordered states.

Fig. 16.7 Free energy of eqn. (16.5.12), g(0, T). (a) T < Tc. (b) T > Tc.

We now turn to the calculation of the critical exponents for the Ising model within the mean-field theory. From the free-energy plot in Fig. 16.7(a), we can obtain the exponent β directly. Recall that β describes how the discontinuity associated with the first-order phase transition for T < Tc depends on temperature as T → Tc, and for this, we need to know how m0 depends on T for T < Tc. The dependence of m0 on T is determined by the condition that g(0, T) be a minimum at m = m0:

$$\left.\frac{\partial g(0,T)}{\partial m}\right|_{m=m_0} = 0, \tag{16.5.13}$$

or

$$\begin{aligned}
2J(1 - \beta J)m_0 + 4c_2 m_0^3 &= 0 \\
\frac{2J}{T}\left(T - \frac{J}{k}\right) + 4c_2 m_0^2 &= 0 \\
\frac{2J}{T}\left(T - T_c\right) + 4c_2 m_0^2 &= 0 \\
m_0 &\sim (T_c - T)^{1/2},
\end{aligned} \tag{16.5.14}$$

from which it is clear that β = 1/2. In order to determine δ, the mean-field equation of state is needed, which is provided by eqn. (16.5.10). Solving eqn. (16.5.10) for h yields

$$h = kT\tanh^{-1}(m) - mJ. \tag{16.5.15}$$
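The graphical solution of m = tanh(βJm) and the resulting exponent β = 1/2 can also be checked numerically. The sketch below is illustrative only: it assumes units in which J = k = 1 (so that Tc = 1), uses plain bisection in place of the graphical construction, and the names `solve_m0` and `beta_exponent` are hypothetical. The exponent is estimated from the log-log slope of m0 against Tc − T at two temperatures just below Tc.

```python
import math

def solve_m0(beta_J, tol=1e-14):
    """Nonnegative solution of m = tanh(beta_J * m), found by bisection."""
    if beta_J <= 1.0:
        return 0.0  # only the trivial solution m = 0 exists
    f = lambda m: math.tanh(beta_J * m) - m
    lo, hi = 1e-12, 1.0  # f(lo) > 0 and f(hi) < 0 whenever beta_J > 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def beta_exponent(t1=1e-4, t2=1e-3):
    """Log-log slope of m0 versus (Tc - T), with J = k = 1 so that Tc = 1."""
    m1 = solve_m0(1.0 / (1.0 - t1))
    m2 = solve_m0(1.0 / (1.0 - t2))
    return math.log(m2 / m1) / math.log(t2 / t1)

print(beta_exponent())  # a value close to 0.5
```

Choosing the two temperatures closer to Tc brings the slope closer to 1/2, since the correction terms beyond the quartic expansion shrink with Tc − T.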

We are interested in the behavior of eqn. (16.5.15) along the critical isotherm, where m0 → 0 near the inflection point. Using the expansion tanh⁻¹(x) ≈ x + x³/3 + · · · gives

$$\begin{aligned}
h &\approx kT\left(m + \frac{m^3}{3}\right) - mJ \\
&= mk\left(T - \frac{J}{k}\right) + \frac{kT}{3}m^3 \\
&= mk\left(T - T_c\right) + \frac{kT}{3}m^3.
\end{aligned} \tag{16.5.16}$$

Thus, along the critical isotherm, T = Tc, we find that h ∼ m³, which implies that δ = 3. In order to calculate γ, we examine the susceptibility χ as T → Tc from above. By definition,

$$\chi = \frac{\partial m}{\partial h} = \frac{1}{\partial h/\partial m}. \tag{16.5.17}$$

From eqn. (16.5.16),

$$\frac{\partial h}{\partial m} = k(T - T_c) + kTm^2. \tag{16.5.18}$$

For T > Tc, m = 0, hence ∂h/∂m ∼ (T − Tc). Therefore, χ ∼ (T − Tc)⁻¹, from which it is clear that γ = 1. Finally, the exponent α is determined by the behavior of the heat capacity Ch as T → Tc from above. Since Ch is derived from the Gibbs free energy, consider the limit of eqn. (16.5.8) for T > Tc, as m → 0 at h = 0:

$$G(N,h,T) = -NkT\ln 2. \tag{16.5.19}$$

Because this expression is linear in T, it follows that Ch = −T ∂²G/∂T² = 0; since there is no divergence in Ch, we conclude that α = 0.
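The exponents δ and γ can be recovered numerically from the mean-field equation of state, eqn. (16.5.15), without the series expansion. The following is a rough sketch under the same illustrative assumption of units with J = k = 1 (Tc = 1); the function names are hypothetical. It estimates δ from the log-log slope of h against m on the critical isotherm, and γ from the log-log slope of χ = 1/(∂h/∂m) at m = 0 against T − Tc, using a central finite difference for the derivative.

```python
import math

def field(m, T, J=1.0, k=1.0):
    """Mean-field equation of state, eqn. (16.5.15): h = kT atanh(m) - J m."""
    return k * T * math.atanh(m) - J * m

def delta_exponent(m1=1e-3, m2=1e-2, Tc=1.0):
    """Log-log slope of h versus m along the critical isotherm T = Tc."""
    return math.log(field(m2, Tc) / field(m1, Tc)) / math.log(m2 / m1)

def susceptibility(T, dm=1e-4):
    """chi = 1 / (dh/dm) at m = 0, with dh/dm from a central difference."""
    dh_dm = (field(dm, T) - field(-dm, T)) / (2.0 * dm)
    return 1.0 / dh_dm

def gamma_exponent(t1=1e-3, t2=1e-2, Tc=1.0):
    """Negative log-log slope of chi versus (T - Tc) for T > Tc."""
    chi1, chi2 = susceptibility(Tc + t1), susceptibility(Tc + t2)
    return -math.log(chi2 / chi1) / math.log(t2 / t1)

print(delta_exponent(), gamma_exponent())  # values close to 3 and 1
```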


In summary, we find that the mean-field exponents for the magnetic model are α = 0, β = 1/2, γ = 1, and δ = 3, which are exactly the exponents we obtained for the liquid–gas critical point using the van der Waals equation. Within the mean-field theory approximation, two very different physical models therefore yield the same critical exponents, providing a concrete illustration of the universality concept. As noted in Section 4.8, the experimental values of these exponents are α = 0.1, β = 0.34, γ = 1.35, and δ = 4.2, which shows that mean-field theory is not quantitatively accurate. Qualitatively, however, mean-field theory reveals many important features of critical-point behavior (even if it misses the divergence in the heat capacity) and is, therefore, a useful first approach. In order to move beyond mean-field theory, we require an approach capable of accounting for the neglected spatial correlations. We will first examine the Ising model in one and two dimensions, where the model can be solved exactly. Following this, we will present an introduction to scaling theory and the renormalization group methodology.

16.6 Ising model in one dimension

Solving the Ising model in one dimension is a relatively straightforward exercise. As we will show, however, the one-dimensional Ising model shows no ordered phases. Why study it then? First, there are classes of problems that can be mapped onto one-dimensional Ising-like models, such as the conformational equilibria of a linear polymer (see Problems 10.9 and 16.5). Second, the mathematical techniques employed to solve the problem are applicable to other types of problems. Third, even the one-dimensional spin system must become ordered at T = 0, and therefore, understanding the behavior of the system as T → 0 will be important in our treatment of spin systems via renormalization group methods. From eqn. (16.5.2), the Hamiltonian for the one-dimensional Ising model is

$$H = -J\sum_{i=1}^{N}\sigma_i\sigma_{i+1} - h\sum_{i=1}^{N}\sigma_i. \tag{16.6.1}$$

In order to complete the specification of the model, a boundary condition is also needed. Since the variable $\sigma_{N+1}$ appears in eqn. (16.6.1), it is convenient to impose periodic boundary conditions, which leads to the condition $\sigma_{N+1} = \sigma_1$. The one-dimensional periodic chain is illustrated in Fig. 16.8. Because of the periodicity, the Hamiltonian can be written in a more symmetric manner as

$$H = -J\sum_{i=1}^{N}\sigma_i\sigma_{i+1} - \frac{h}{2}\sum_{i=1}^{N}\left(\sigma_i + \sigma_{i+1}\right). \tag{16.6.2}$$

Fig. 16.8 One-dimensional Ising system subject to periodic boundary conditions.

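The partition function corresponding to the Hamiltonian in eqn. (16.6.2) is

$$\Delta(N,h,T) = \sum_{\sigma_1=\pm 1}\cdots\sum_{\sigma_N=\pm 1}\exp\left[\beta J\sum_{i=1}^{N}\sigma_i\sigma_{i+1} + \frac{\beta h}{2}\sum_{i=1}^{N}\left(\sigma_i + \sigma_{i+1}\right)\right]. \tag{16.6.3}$$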

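Since each spin sum has two terms, the total number of terms represented by the spin sums is $2^N$. A powerful method for evaluating the partition function is referred to as the transfer matrix method, first introduced by Kramers and Wannier (1941a, 1941b). This method recognizes that the partition function can be expressed as a large product of matrices. Consider the matrix P, whose elements are given by

$$\langle\sigma|\mathrm{P}|\sigma'\rangle = e^{\beta J\sigma\sigma' + \beta h(\sigma + \sigma')/2}. \tag{16.6.4}$$

Since σ and σ′ can be only 1 or −1, P is a 2×2 matrix with elements given by

$$\begin{aligned}
\langle 1|\mathrm{P}|1\rangle &= e^{\beta(J+h)} \\
\langle -1|\mathrm{P}|-1\rangle &= e^{\beta(J-h)} \\
\langle 1|\mathrm{P}|-1\rangle = \langle -1|\mathrm{P}|1\rangle &= e^{-\beta J}.
\end{aligned} \tag{16.6.5}$$

Written as a matrix, P appears as

$$\mathrm{P} = \begin{pmatrix} e^{\beta(J+h)} & e^{-\beta J} \\ e^{-\beta J} & e^{\beta(J-h)} \end{pmatrix}. \tag{16.6.6}$$

In terms of P, the partition function can be expressed as

$$\Delta(N,h,T) = \sum_{\sigma_1}\cdots\sum_{\sigma_N}\langle\sigma_1|\mathrm{P}|\sigma_2\rangle\langle\sigma_2|\mathrm{P}|\sigma_3\rangle\cdots\langle\sigma_{N-1}|\mathrm{P}|\sigma_N\rangle\langle\sigma_N|\mathrm{P}|\sigma_1\rangle. \tag{16.6.7}$$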

Because the partition function is now a matrix product of N factors of P, each sandwiched between spin states with a spin in common, P is known as the transfer matrix. Using the completeness of the spin eigenvectors, each factor of the form $\sum_{\sigma_k}|\sigma_k\rangle\langle\sigma_k|$ appearing in eqn. (16.6.7) is an identity operator I, and the sum over N spins can be collapsed to a sum over just one spin σ1:

$$\Delta(N,h,T) = \sum_{\sigma_1}\langle\sigma_1|\mathrm{P}^N|\sigma_1\rangle = \mathrm{Tr}\left(\mathrm{P}^N\right). \tag{16.6.8}$$

Interestingly, in deriving eqn. (16.6.8), we performed the opposite set of operations used in eqns. (12.2.9) and (12.2.11) to derive the Feynman path integral. In the latter, an operator product was expanded by the introduction of the identity between factors of the operator.
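The collapse of the $2^N$-term spin sum into a trace can be verified directly for a short chain. The sketch below is a minimal check, not part of the original derivation: it builds the 2×2 matrix of eqn. (16.6.6) with NumPy, evaluates Tr(P^N), and compares the result with a brute-force enumeration of eqn. (16.6.3) under periodic boundary conditions. The function names and parameter values are illustrative assumptions.

```python
import itertools
import numpy as np

def transfer_matrix(beta, J, h):
    """2x2 transfer matrix with elements exp[beta*J*s*s' + beta*h*(s + s')/2]."""
    spins = (1, -1)
    return np.array([[np.exp(beta * J * s * sp + 0.5 * beta * h * (s + sp))
                      for sp in spins] for s in spins])

def partition_trace(N, beta, J, h):
    """Partition function as Tr(P^N), eqn. (16.6.8)."""
    P = transfer_matrix(beta, J, h)
    return np.trace(np.linalg.matrix_power(P, N))

def partition_brute_force(N, beta, J, h):
    """Direct sum over all 2^N configurations of the periodic chain."""
    total = 0.0
    for s in itertools.product((1, -1), repeat=N):
        pair = sum(s[i] * s[(i + 1) % N] for i in range(N))
        total += np.exp(beta * (J * pair + h * sum(s)))
    return total

N, beta, J, h = 8, 0.7, 1.0, 0.3
print(partition_trace(N, beta, J, h), partition_brute_force(N, beta, J, h))
```

The brute-force sum grows as $2^N$ and quickly becomes impractical, whereas the trace costs only a handful of 2×2 matrix products for any N.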


The simplest way to calculate the trace is to diagonalize P, which yields two eigenvalues λ1 and λ2, in terms of which the trace is simply $\lambda_1^N + \lambda_2^N$. The eigenvalues of P are solutions of det(P − λI) = 0, which gives the eigenvalues

$$\lambda = e^{\beta J}\left[\cosh(\beta h) \pm \sqrt{\sinh^2(\beta h) + e^{-4\beta J}}\right]. \tag{16.6.9}$$

We denote these values as λ± (instead of λ1,2), where λ± corresponds to the choice of +/− in eqn. (16.6.9). Thus, the partition function becomes

$$\Delta(N,h,T) = \mathrm{Tr}\left(\mathrm{P}^N\right) = \lambda_+^N + \lambda_-^N. \tag{16.6.10}$$

Although eqn. (16.6.10) is exact, since λ+ > λ−, it follows that for N → ∞, $\lambda_+^N \gg \lambda_-^N$, so that the partition function is accurately approximated using the single eigenvalue λ+. Thus, $\Delta(N,h,T) \approx \lambda_+^N$, and the free energy per spin is simply

$$g(h,T) = -kT\ln\lambda_+ = -J - kT\ln\left[\cosh(\beta h) + \sqrt{\sinh^2(\beta h) + e^{-4\beta J}}\right]. \tag{16.6.11}$$

From eqn. (16.6.11), the magnetization per spin can be computed as

$$m = -\frac{\partial g}{\partial h} = \frac{\sinh(\beta h) + \sinh(\beta h)\cosh(\beta h)\big/\sqrt{\sinh^2(\beta h) + e^{-4\beta J}}}{\cosh(\beta h) + \sqrt{\sinh^2(\beta h) + e^{-4\beta J}}}. \tag{16.6.12}$$

As h → 0, the magnetization vanishes, since cosh(βh) → 1 and sinh(βh) → 0. Thus, there is no magnetization at any finite temperature in one dimension, and hence, no nontrivial critical point. Note, however, that as T → 0 (β → ∞), the factors exp(−4βJ) vanish, and m → ±1 as h → 0± . This indicates that an ordered state does exist at absolute zero of temperature. The fact that m tends toward different limits depending on the approach of h → 0 from the positive or negative side indicates that T = 0 can be thought of as a critical point, albeit an unphysical one. Indeed, such a result is expected since the entropy vanishes at absolute zero, and consequently an ordered state must exist at T = 0. Though unphysical, we will find this critical point useful for illustrative purposes later in our discussion of the renormalization group.
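The closed-form results of this section lend themselves to a quick numerical check. The sketch below (function names and parameter values are illustrative assumptions, with k = 1) evaluates the eigenvalues of eqn. (16.6.9), the free energy per spin of eqn. (16.6.11), and the magnetization of eqn. (16.6.12). It can be used to confirm that the magnetization equals −∂g/∂h, that it vanishes at h = 0 for any finite temperature, and that it approaches 1 as T → 0 with a small positive field.

```python
import numpy as np

def lambda_pm(beta, J, h):
    """Eigenvalues (lambda_+, lambda_-) of the transfer matrix, eqn. (16.6.9)."""
    root = np.sqrt(np.sinh(beta * h) ** 2 + np.exp(-4.0 * beta * J))
    pref = np.exp(beta * J)
    return pref * (np.cosh(beta * h) + root), pref * (np.cosh(beta * h) - root)

def free_energy_per_spin(beta, J, h):
    """g(h, T) = -kT ln(lambda_+) in the thermodynamic limit, with k = 1."""
    return -np.log(lambda_pm(beta, J, h)[0]) / beta

def magnetization(beta, J, h):
    """Magnetization per spin, eqn. (16.6.12)."""
    s, c = np.sinh(beta * h), np.cosh(beta * h)
    root = np.sqrt(s ** 2 + np.exp(-4.0 * beta * J))
    return (s + s * c / root) / (c + root)

beta, J, h, dh = 0.8, 1.0, 0.2, 1e-6
# Central finite difference of g, to compare against eqn. (16.6.12)
m_fd = -(free_energy_per_spin(beta, J, h + dh)
         - free_energy_per_spin(beta, J, h - dh)) / (2.0 * dh)
print(magnetization(beta, J, h), m_fd)
print(magnetization(beta, J, 0.0))   # zero field at finite temperature
print(magnetization(20.0, J, 1e-3))  # low temperature, small positive field
```

Algebraically, eqn. (16.6.12) simplifies to m = sinh(βh)/[sinh²(βh) + exp(−4βJ)]^{1/2}, which makes both limits discussed in the text easy to read off.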

16.7 Ising model in two dimensions

In contrast to the one-dimensional Ising model, which can be solved with a few lines of algebra, the two-dimensional Ising model is a highly nontrivial problem that was first worked out exactly by Lars Onsager (1903–1976) in 1944 (Onsager, 1944). Extensive discussions of the solution of the two-dimensional Ising model can be found in the books by K. Huang (1963) and by R. K. Pathria (1972). Here, we shall give the basic idea behind two approaches to the problem and then present the solution in its final form.


Fig. 16.9 Two-dimensional Ising system subject to periodic boundary conditions.

Transfer matrix approach: The first method follows the transfer matrix approach employed in the previous section for the one-dimensional Ising model. Consider the simple square lattice of spins depicted in Fig. 16.9, in which each row and each column contains n spins, so that N = n². If i indexes the rows and j indexes the columns, then the Hamiltonian, taking into account the restriction to nearest-neighbor interactions only, can be written as

$$H = -J\sum_{i=1}^{n}\sum_{j=1}^{n}\left[\sigma_{i,j}\sigma_{i+1,j} + \sigma_{i,j}\sigma_{i,j+1}\right] - h\sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_{i,j}. \tag{16.7.1}$$

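As in the one-dimensional case, we impose periodic boundary conditions on the square lattice so that the spins satisfy $\sigma_{n+1,j} = \sigma_{1,j}$ and $\sigma_{i,n+1} = \sigma_{i,1}$. The partition function can now be expressed as

$$\Delta(N,h,T) = \sum_{\sigma_{1,1}=\pm 1}\cdots\sum_{\sigma_{n,1}=\pm 1}\;\sum_{\sigma_{1,2}=\pm 1}\cdots\sum_{\sigma_{n,2}=\pm 1}\cdots\sum_{\sigma_{1,n}=\pm 1}\cdots\sum_{\sigma_{n,n}=\pm 1}\exp\left\{\beta J\sum_{i,j=1}^{n}\left[\sigma_{i,j}\sigma_{i+1,j} + \sigma_{i,j}\sigma_{i,j+1}\right] + \beta h\sum_{i,j=1}^{n}\sigma_{i,j}\right\}. \tag{16.7.2}$$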

As there are N = n² spin sums, each having two terms, the total number of terms represented by the spin sums is $2^N = 2^{n^2}$. The form of the Hamiltonian and partition function in eqns. (16.7.1) and (16.7.2) suggests that a matrix multiplication analogous to eqn. (16.6.7) involves entire columns of spins and that the elements of the transfer matrix should be determined by the columns rather than by the single spins of the one-dimensional case. (Note that we could also have used rows of spins and written eqn. (16.7.2) “row-wise” rather than “column-wise.”) Let us define a full column of spins by a variable μj:

$$\mu_j = \left\{\sigma_{1,j}, \sigma_{2,j}, \ldots, \sigma_{n,j}\right\}. \tag{16.7.3}$$

The Hamiltonian can then be conveniently represented in terms of interactions between full columns of spins. We first introduce the two functions

$$\begin{aligned}
E(\mu_j, \mu_k) &= J\sum_{i=1}^{n}\sigma_{i,j}\sigma_{i,k} \\
E(\mu_j) &= J\sum_{i=1}^{n}\sigma_{i,j}\sigma_{i+1,j} + h\sum_{i=1}^{n}\sigma_{i,j}
\end{aligned} \tag{16.7.4}$$

in terms of which the Hamiltonian becomes

$$H = -\sum_{j=1}^{n}\left[E(\mu_j, \mu_{j+1}) + E(\mu_j)\right], \tag{16.7.5}$$

and the partition function can be expressed as

$$\Delta(N,h,T) = \sum_{\mu_1}\sum_{\mu_2}\cdots\sum_{\mu_n}\exp\left\{\beta\sum_{j=1}^{n}\left[E(\mu_j, \mu_{j+1}) + E(\mu_j)\right]\right\}. \tag{16.7.6}$$

Although eqn. (16.7.6) now resembles the partition function of a one-dimensional Ising model, each sum over μj now represents $2^n$ terms. We can, nevertheless, define a $2^n \times 2^n$ transfer matrix P with elements

$$\langle\mu|\mathrm{P}|\mu'\rangle = \exp\left\{\beta\left[E(\mu, \mu') + \frac{1}{2}\left(E(\mu) + E(\mu')\right)\right]\right\}, \tag{16.7.7}$$

so that the partition function becomes

$$\Delta(N,h,T) = \sum_{\mu_1}\sum_{\mu_2}\cdots\sum_{\mu_n}\langle\mu_1|\mathrm{P}|\mu_2\rangle\langle\mu_2|\mathrm{P}|\mu_3\rangle\cdots\langle\mu_n|\mathrm{P}|\mu_1\rangle = \mathrm{Tr}\left[\mathrm{P}^n\right]. \tag{16.7.8}$$

The partition function can now be computed from the $2^n$ eigenvalues of P as

$$\Delta(N,h,T) = \lambda_1^n + \lambda_2^n + \cdots + \lambda_{2^n}^n. \tag{16.7.9}$$

As in the one-dimensional case, however, as N → ∞, n → ∞, and the contribution from the largest eigenvalue will dominate. Thus, to a very good approximation, $\Delta(N,h,T) \approx \lambda_{\max}^n$, and the problem of computing the partition function becomes one of simply finding the largest eigenvalue of P.
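For a very small lattice, the column-wise transfer matrix can be constructed explicitly and compared with a direct enumeration. The following sketch is illustrative only: it takes n = 3 (so P is 8×8 and the full lattice has 2⁹ configurations), builds P from the two functions E(μ, μ′) and E(μ) described above, and checks that Tr(Pⁿ) reproduces the brute-force sum of eqn. (16.7.2) with periodic boundaries. All function names and parameter values are assumptions made for this example.

```python
import itertools
import numpy as np

def column_energy(mu, J, h):
    """E(mu): bonds within one periodic column plus the field term."""
    n = len(mu)
    return J * sum(mu[i] * mu[(i + 1) % n] for i in range(n)) + h * sum(mu)

def coupling_energy(mu, nu, J):
    """E(mu, nu): bonds between two neighboring columns."""
    return J * sum(a * b for a, b in zip(mu, nu))

def transfer_matrix_2d(n, beta, J, h):
    """2^n x 2^n transfer matrix with elements given by eqn. (16.7.7)."""
    cols = list(itertools.product((1, -1), repeat=n))
    P = np.empty((len(cols), len(cols)))
    for a, mu in enumerate(cols):
        for b, nu in enumerate(cols):
            P[a, b] = np.exp(beta * (coupling_energy(mu, nu, J)
                                     + 0.5 * (column_energy(mu, J, h)
                                              + column_energy(nu, J, h))))
    return P

def partition_transfer(n, beta, J, h):
    """Partition function as Tr(P^n), eqn. (16.7.8)."""
    return np.trace(np.linalg.matrix_power(transfer_matrix_2d(n, beta, J, h), n))

def partition_brute_force(n, beta, J, h):
    """Direct sum over all 2^(n*n) configurations with periodic boundaries."""
    total = 0.0
    for flat in itertools.product((1, -1), repeat=n * n):
        s = np.array(flat).reshape(n, n)
        bonds = np.sum(s * np.roll(s, -1, axis=0)) + np.sum(s * np.roll(s, -1, axis=1))
        total += np.exp(beta * (J * bonds + h * s.sum()))
    return total

print(partition_transfer(3, 0.4, 1.0, 0.1), partition_brute_force(3, 0.4, 1.0, 0.1))
```

The matrix dimension doubles with every added row, which is why the exact solution relies on finding the largest eigenvalue analytically rather than numerically.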


A detailed mathematical discussion of how the largest eigenvalue of P can be found is given by Huang (1963), which we will not replicate here. We simply quote the final result for the Gibbs free energy per spin at zero field in the thermodynamic limit:

$$g(0,T) = -kT\ln\left[2\cosh(2\beta J)\right] - \frac{kT}{2\pi}\int_0^{\pi} d\phi\,\ln\left\{\frac{1}{2}\left[1 + \sqrt{1 - K^2\sin^2\phi}\right]\right\} \tag{16.7.10}$$

(Onsager, 1944; Kaufmann, 1949), where K = 2/[cosh(2βJ)coth(2βJ)]. The integral is the result of taking the thermodynamic limit. In his 1952 paper, C. N. Yang obtained an exact expression for the magnetization at zero field and showed that when T < Tc, where Tc is given by

$$2\tanh^2\left(\frac{2J}{kT_c}\right) = 1, \qquad kT_c \approx 2.269185\,J, \tag{16.7.11}$$

the magnetization is nonzero, indicating that spontaneous magnetization occurs in two dimensions. The magnetization is

$$m = \begin{cases} 0 & T > T_c \\ \left\{1 - \left[\sinh(2\beta J)\right]^{-4}\right\}^{1/8} & T < T_c. \end{cases} \tag{16.7.12}$$
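The critical temperature and spontaneous magnetization quoted above are simple to evaluate. The sketch below is an illustration under stated assumptions: it uses the fact that 2 tanh²(x) = 1 is equivalent to sinh(x) = 1, so that kTc = 2J / asinh(1), and it evaluates eqn. (16.7.12) directly. The function names and the default units J = k = 1 are choices made for this example.

```python
import math

def critical_temperature(J=1.0, k=1.0):
    """Tc from 2 tanh^2(2J/kTc) = 1, equivalently sinh(2J/kTc) = 1."""
    return 2.0 * J / (k * math.asinh(1.0))

def spontaneous_magnetization(T, J=1.0, k=1.0):
    """Zero-field magnetization per spin, eqn. (16.7.12); requires T > 0."""
    if T >= critical_temperature(J, k):
        return 0.0
    x = math.sinh(2.0 * J / (k * T))
    return (1.0 - x ** -4) ** 0.125

print(critical_temperature())  # approximately 2.269185 in units of J/k
for T in (1.0, 2.0, 2.2, 2.26, 2.5):
    print(T, spontaneous_magnetization(T))
```

The magnetization stays close to 1 over most of the ordered phase and drops steeply only very near Tc, a consequence of the small exponent 1/8.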

In addition to the existence of a spontaneously ordered phase, the heat capacity Ch diverges as T → Tc. The expression for the heat capacity at h = 0 near T = Tc is

$$\frac{C_h(T)}{k} = \frac{2}{\pi}\left(\frac{2J}{kT_c}\right)^2\left[-\ln\left|1 - \frac{T}{T_c}\right| + \ln\left(\frac{kT_c}{2J}\right) - \left(1 + \frac{\pi}{4}\right)\right], \tag{16.7.13}$$

which diverges logarithmically. A graph of Ch vs. T is shown in Fig. 16.10. The logarithmic divergence emerges because the model is solved in two rather than three dimensions; in the latter, we would expect a power-law divergence. The other critical exponents can be derived for the two-dimensional Ising model and are α = 0 (logarithmic divergence), β = 1/8, γ = 7/4, and δ = 15. These are the exact exponents for the d = 2, n = 1 universality class. To date, the three-dimensional Ising model remains an unsolved problem. Should an exact solution emerge, the exponents for the d = 3, n = 1 universality class would be known.

Graph-theoretic approach: The second approach is a combinatorial one that leads directly to the partition function of the two-dimensional Ising model in the zero-field limit. We begin by introducing the shorthand notation K = βJ, and again we assume periodic boundary conditions. The partition function in the zero-field limit can be written as

$$Q(N,T) = \sum_{\sigma_{1,1}=\pm 1}\sum_{\sigma_{2,1}=\pm 1}\cdots\sum_{\sigma_{n,1}=\pm 1}\;\sum_{\sigma_{1,2}=\pm 1}\sum_{\sigma_{2,2}=\pm 1}\cdots\sum_{\sigma_{n,2}=\pm 1}\cdots\sum_{\sigma_{1,n}=\pm 1}\sum_{\sigma_{2,n}=\pm 1}\cdots\sum_{\sigma_{n,n}=\pm 1}\exp\left\{K\sum_{i,j=1}^{n}\left[\sigma_{i,j}\sigma_{i+1,j} + \sigma_{i,j}\sigma_{i,j+1}\right]\right\}. \tag{16.7.14}$$

Fig. 16.10 Heat capacity of the two-dimensional Ising model (see eqn. (16.7.13)).

The combinatorial approach starts with an identity derived from the fact that the product of spins $\sigma_{i,j}\sigma_{i',j'} = \pm 1$, so that $\exp[K\sigma_{i,j}\sigma_{i',j'}] = \exp[\pm K] = \cosh(K) \pm \sinh(K)$. From these relations, it follows that

$$\begin{aligned}
\exp\left[K\sigma_{i,j}\sigma_{i',j'}\right] &= \cosh(K) + \sigma_{i,j}\sigma_{i',j'}\sinh(K) \\
&= \cosh(K)\left[1 + \sigma_{i,j}\sigma_{i',j'}\tanh(K)\right] \\
&= \cosh(K)\left[1 + v\,\sigma_{i,j}\sigma_{i',j'}\right],
\end{aligned} \tag{16.7.15}$$

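where v = tanh(K). Converting exponentiated sums into products, the partition function can be written as

$$\begin{aligned}
Q(N,T) &= \sum_{\{\sigma\}=\pm 1}\exp\left\{K\sum_{i,j}\left[\sigma_{i,j}\sigma_{i+1,j} + \sigma_{i,j}\sigma_{i,j+1}\right]\right\} \\
&= \sum_{\{\sigma\}=\pm 1}\exp\left\{K\sum_{i,j}\sigma_{i,j}\sigma_{i+1,j}\right\}\exp\left\{K\sum_{i,j}\sigma_{i,j}\sigma_{i,j+1}\right\} \\
&= \sum_{\{\sigma\}=\pm 1}\left[\prod_{i,j}e^{K\sigma_{i,j}\sigma_{i+1,j}}\right]\left[\prod_{i,j}e^{K\sigma_{i,j}\sigma_{i,j+1}}\right] \\
&= \sum_{\{\sigma\}=\pm 1}\left[\prod_{i,j}\cosh(K)\left(1 + v\,\sigma_{i,j}\sigma_{i+1,j}\right)\right]\left[\prod_{i,j}\cosh(K)\left(1 + v\,\sigma_{i,j}\sigma_{i,j+1}\right)\right] \\
&= \left[\cosh(K)\right]^{\nu}\sum_{\{\sigma\}=\pm 1}\prod_{i,j}\left[1 + v\,\sigma_{i,j}\sigma_{i+1,j}\right]\left[1 + v\,\sigma_{i,j}\sigma_{i,j+1}\right],
\end{aligned} \tag{16.7.16}$$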

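where ν is the total number of nearest-neighbor pairs on the two-dimensional lattice and the notation $\sum_{\{\sigma\}=\pm 1}$ indicates that all spins are summed over. Now consider just the first few terms of the product (i, j = 1, 2), which contribute the following factors to the partition function:

$$(1 + v\sigma_{1,1}\sigma_{2,1})(1 + v\sigma_{1,2}\sigma_{2,2})(1 + v\sigma_{1,1}\sigma_{1,2})(1 + v\sigma_{2,1}\sigma_{2,2}).$$

Multiplying this expression out gives

$$\begin{aligned}
1 &+ v\left[\sigma_{1,1}\sigma_{2,1} + \sigma_{1,2}\sigma_{2,2} + \sigma_{1,1}\sigma_{1,2} + \sigma_{2,1}\sigma_{2,2}\right] \\
&+ v^2\left[\sigma_{1,1}\sigma_{2,1}\sigma_{1,2}\sigma_{2,2} + \sigma_{1,1}\sigma_{2,1}\sigma_{1,1}\sigma_{1,2} + \sigma_{1,1}\sigma_{2,1}\sigma_{2,1}\sigma_{2,2}\right. \\
&\qquad\left. +\,\sigma_{1,2}\sigma_{2,2}\sigma_{1,1}\sigma_{1,2} + \sigma_{1,2}\sigma_{2,2}\sigma_{2,1}\sigma_{2,2} + \sigma_{1,1}\sigma_{1,2}\sigma_{2,1}\sigma_{2,2}\right] \\
&+ v^3\left[\sigma_{1,1}\sigma_{2,1}\sigma_{1,2}\sigma_{2,2}\sigma_{1,1}\sigma_{1,2} + \sigma_{1,1}\sigma_{2,1}\sigma_{2,1}\sigma_{2,2}\sigma_{2,2}\sigma_{1,2}\right. \\
&\qquad\left. +\,\sigma_{1,1}\sigma_{2,1}\sigma_{2,1}\sigma_{2,2}\sigma_{1,1}\sigma_{1,2} + \sigma_{1,1}\sigma_{1,2}\sigma_{1,2}\sigma_{2,2}\sigma_{2,2}\sigma_{2,1}\right] \\
&+ v^4\left[\sigma_{1,1}\sigma_{2,1}\sigma_{2,1}\sigma_{2,2}\sigma_{2,2}\sigma_{1,2}\sigma_{1,2}\sigma_{1,1}\right].
\end{aligned} \tag{16.7.17}$$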

As the power of v increases, the number of spin factors increases. Thus, in order to keep track of the “bookkeeping,” we introduce a graphical notation. Each index pair (i, j) on the spin variables $\sigma_{i,j}$ corresponds to a point on the lattice. We identify these points as the vertices of one or more graphs that can be drawn on the lattice. We also identify a spin product $\sigma_{i,j}\sigma_{i',j'}$ with an edge joining the vertices (i, j) and (i′, j′). Thus, in eqn. (16.7.17), there are four vertices, (1, 1), (1, 2), (2, 1), (2, 2), corresponding to four points on the lattice, and at the nth power of v, n = 0, ..., 4, there are n edges joining the vertices. Figure 16.11 shows the complete set of graphs corresponding to the terms in eqn. (16.7.17).

Fig. 16.11 Graphical representation of eqn. (16.7.17).

We next ask what each graph contributes to the overall partition function. The first graph, which contains no edges, obviously contributes exactly 1, and when this is summed over N spins, each of which can take on two values, we obtain a contribution of $2^N$. The graphs that arise from the $v^1$ term contain a single edge. Consider, for example, the product σ1,1σ2,1, which must be summed over σ1,1 and σ2,1. The spin sum produces four terms corresponding to (σ1,1, σ2,1) = (1, 1), (−1, 1), (1, −1), (−1, −1). When the spin products are taken, two of these terms will be 1 and the other two −1, and the sum yields 0 overall. The same is true for each of the remaining spin products in the $v^1$ term. After some reflection, we see that all of the terms proportional to $v^2$ and $v^3$ also sum to 0. However, the $v^4$ term, represented by a closed graph in which each vertex is included in two edges, does not vanish, since each spin variable appears twice. The contribution from this single graph (see the $v^4$ term in Fig. 16.11), when the sum over all N spins is carried out, is $v^4 2^N$. Now, on a lattice of N spins, it is possible to draw N − 1 such graphs containing four vertices. Thus, the total contribution from graphs containing four vertices is $(N - 1)v^4 2^N$.

The analysis in the preceding paragraph suggests that the problem of evaluating the partition function becomes one of counting the number n(r) of closed graphs that can be drawn on the lattice containing r edges and then summing the result over r, where r = 0, 4, 6, 8, .... Figure 16.12 shows some examples of graphs that occur when r = 6 and when r = 8. Once n(r) is known, the partition function can be shown to take the following form:

$$Q(N,T) = 2^N\left[\cosh(K)\right]^{\nu}\sum_{r} n(r)\,v^r. \tag{16.7.18}$$
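The claim that only closed graphs survive the spin sum can be confirmed by enumeration for the four-site example of eqn. (16.7.17). The sketch below is a minimal illustration (the names `EDGES`, `SITES`, and `graph_term_sums` are hypothetical): for every subset of the four edges, it sums the corresponding product of spin factors over all 2⁴ spin configurations. Only the empty graph and the closed four-edge graph should give a nonzero result, each contributing 2⁴.

```python
import itertools

# The four edges of eqn. (16.7.17), each joining two lattice vertices (i, j)
EDGES = [((1, 1), (2, 1)), ((1, 2), (2, 2)), ((1, 1), (1, 2)), ((2, 1), (2, 2))]
SITES = [(1, 1), (1, 2), (2, 1), (2, 2)]

def graph_term_sums():
    """For each subset of edges, sum its spin product over all 2^4 spin states."""
    sums = {}
    for r in range(len(EDGES) + 1):
        for subset in itertools.combinations(EDGES, r):
            total = 0
            for values in itertools.product((1, -1), repeat=len(SITES)):
                sigma = dict(zip(SITES, values))
                term = 1
                for a, b in subset:
                    term *= sigma[a] * sigma[b]
                total += term
            sums[subset] = total
    return sums

sums = graph_term_sums()
nonzero = {subset: total for subset, total in sums.items() if total != 0}
for subset, total in nonzero.items():
    print(len(subset), "edges ->", total)
```

Any graph with a vertex of odd degree leaves an unpaired spin factor, whose sum over ±1 vanishes; this is the reason only closed graphs enter eqn. (16.7.18).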

We conclude this brief discussion by illustrating how eqn. (16.7.18) can be applied to the one-dimensional Ising model on a periodic lattice. The partition function was given by eqn. (16.6.10), which for h = 0 becomes h i Q(N, T ) = 2N coshN (K) + sinhN (K) .

(16.7.19)

Spin correlations

719

Fig. 16.12 Examples of graphs that contribute to the partition function of the two-dimensional Ising model for r = 6 and r = 8.

We now show that the graph-theoretic approach can be used to derive eqn. (16.7.19). First, note that on a one-dimensional periodic lattice, only two graphs contribute: the graph with $r = 0$ and the graph with $r = N$, which is the only closed graph that can be drawn for $r > 0$, i.e., one that involves each vertex in two edges. Thus, $n(N) = 1$, and the number of nearest neighbors on the periodic lattice is $\nu = N$. Putting these facts together gives the partition function as

$$Q(N, T) = 2^N [\cosh(K)]^N \left(1 + v^N\right), \tag{16.7.20}$$

which simplifies to

$$Q(N, T) = 2^N \left[\cosh^N(K) + \sinh^N(K)\right] \tag{16.7.21}$$

in agreement with eqn. (16.7.19).
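The agreement between eqns. (16.7.20) and (16.7.21) can also be checked numerically. The following is a minimal sketch, assuming $v = \tanh K$ as in the graph expansion; the function names are illustrative and not part of the text. It compares a brute-force sum over all $2^N$ configurations of a small periodic chain with both closed forms.

```python
import itertools
import math


def brute_force_partition_function(n_spins, K):
    """Sum exp(K * sum_i s_i s_{i+1}) over all 2^N states of a periodic chain."""
    total = 0.0
    for spins in itertools.product((-1, 1), repeat=n_spins):
        bond_sum = sum(spins[i] * spins[(i + 1) % n_spins] for i in range(n_spins))
        total += math.exp(K * bond_sum)
    return total


def graph_expansion(n_spins, K):
    """Eqn. (16.7.20): only the r = 0 and r = N graphs contribute."""
    v = math.tanh(K)
    return 2**n_spins * math.cosh(K) ** n_spins * (1.0 + v**n_spins)


def closed_form(n_spins, K):
    """Eqn. (16.7.21)."""
    return 2**n_spins * (math.cosh(K) ** n_spins + math.sinh(K) ** n_spins)


if __name__ == "__main__":
    for n in (3, 6, 9):
        print(n, brute_force_partition_function(n, 0.7), graph_expansion(n, 0.7), closed_form(n, 0.7))
```

The brute-force sum scales as $2^N$, so it is useful only as a check on small chains.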

16.8

Spin correlations and their critical exponents

In Section 4.7.1, we considered spatial correlation functions in a liquid. Interestingly, it is possible to define an analogous quantity for the Ising model. Consider the following spin-spin correlation function at zero field:

$$\langle \sigma_i \sigma_j \rangle = \frac{1}{Q(N, T)} \sum_{\sigma_1} \cdots \sum_{\sigma_N} \sigma_i \sigma_j e^{-\beta H}, \tag{16.8.1}$$

where $Q(N, T)$ is the canonical partition function. If $\sigma_i$ and $\sigma_j$ occupy lattice sites at positions $\mathbf{r}_i$ and $\mathbf{r}_j$, respectively, then at large spatial separation $r = |\mathbf{r}_i - \mathbf{r}_j|$, the correlation function should depend only on $r$. Heuristically, $G(r)$ is assumed to decay exponentially according to

$$G(r) \equiv \langle \sigma_i \sigma_j \rangle - \langle \sigma_i \rangle \langle \sigma_j \rangle \sim \frac{e^{-r/\xi}}{r^{d-2+\eta}} \tag{16.8.2}$$

for $T > T_c$ (Ma, 1976). The quantity $\xi$ is called the correlation length. As a critical point is approached from above, long-range order sets in, and we expect $\xi$ to diverge as $T \to T_c^+$. This divergence is characterized by an exponent $\nu$ such that

$$\xi \sim |T - T_c|^{-\nu}. \tag{16.8.3}$$

As $T \to T_c^+$, $\xi \to \infty$, and the exponential numerator in $G(r)$ becomes 1. In this case, $G(r)$ decays in a manner characteristic of a system with long-range order, i.e., as a small inverse power of $r$. The exponent $\eta$ appearing in the expression for $G(r)$ characterizes this decay at $T = T_c$.

The exponents $\nu$ and $\eta$ cannot be determined from mean-field theory, as the mean-field approximation neglects all spatial correlations. In order to calculate these exponents, fluctuations must be restored at some level. One method that treats correlations explicitly is a field-theoretic approach known as the Landau-Ginzburg theory (Huang, 1963; Ma, 1976). This theory uses a continuous spin field to define a free-energy functional and provides a prescription for deriving the spatial correlation functions from the external field dependence of the partition function via functional differentiation. Owing to its mathematical complexity, a detailed discussion of this theory is beyond the scope of this book; instead, in the next section, we will focus on an elegant approach that is motivated by a few simple physical considerations derived from the long-range behavior of spin-spin correlations.
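To make the definitions of $G(r)$ and $\xi$ concrete, the sketch below evaluates eqn. (16.8.1) by brute force for a small zero-field periodic chain. It relies on a standard transfer-matrix result that is not derived in this section and is therefore an assumption here: on a ring of $N$ spins, $\langle \sigma_i \sigma_{i+r} \rangle = (v^r + v^{N-r})/(1 + v^N)$ with $v = \tanh K$, so that for $N \to \infty$ the correlation function decays as $e^{-r/\xi}$ with $\xi = -1/\ln(\tanh K)$. The function names are illustrative.

```python
import itertools
import math


def ring_correlation_brute_force(n_spins, K, r):
    """<sigma_0 sigma_r> from eqn. (16.8.1) for a zero-field periodic chain."""
    numerator = 0.0
    partition_function = 0.0
    for spins in itertools.product((-1, 1), repeat=n_spins):
        bond_sum = sum(spins[i] * spins[(i + 1) % n_spins] for i in range(n_spins))
        weight = math.exp(K * bond_sum)
        numerator += spins[0] * spins[r] * weight
        partition_function += weight
    return numerator / partition_function


def ring_correlation_closed_form(n_spins, K, r):
    """Assumed transfer-matrix result for the periodic chain."""
    v = math.tanh(K)
    return (v**r + v ** (n_spins - r)) / (1.0 + v**n_spins)


def correlation_length(K):
    """xi in units of the lattice spacing for the infinite chain (assumed form)."""
    return -1.0 / math.log(math.tanh(K))


if __name__ == "__main__":
    for r in range(1, 5):
        print(r, ring_correlation_brute_force(8, 0.5, r), ring_correlation_closed_form(8, 0.5, r))
    print("xi at K = 0.5, 1.0, 2.0:", [correlation_length(K) for K in (0.5, 1.0, 2.0)])
```

Since $\langle \sigma_i \rangle = 0$ at zero field on a finite chain, the quantity computed here coincides with $G(r)$; the correlation length grows without bound as $K \to \infty$ ($T \to 0$).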

16.9

Introduction to the renormalization group

The renormalization group (RG) theory is based on ideas first introduced by L. P. Kadanoff (1966) and K. G. Wilson (1971) and posits that near a critical point, where long-range correlations dominate, the system possesses self-similarity at any scale. It then proposes a series of coarse-graining operations that leave the system invariant, from which the ordered phases can be correctly identified.² The RG framework also offers an explanation of universality, provides a framework for calculating the critical exponents (Wilson and Fisher, 1972; Bonanno and Zappalà, 2001), and through a hypothesis known as the scaling hypothesis, generates sets of relations called scaling relations (Widom, 1965; Cardy, 1996) among the critical exponents.

² The term “renormalization group” has little to do with group theory in the usual mathematical sense. Although the RG does employ a series of transformations based on the physics of a system near its critical point, the RG transformations are not unique and do not form a mathematical group. Hence, references to “the” renormalization group are also misleading, but as this usage has become the common parlance, we will continue it here.

Although we will only explore here how the RG approach applies to the study of magnetic systems,

Renormalization Group

Fig. 16.13 Example of the block spin transformation on a 6×6 square lattice. The lattice on the right shows the four spins that result from applying the transformation to each 3×3 block.

the technique is very general and has been employed in problems ranging from fluid dynamics to quantum chemistry (see, for example, Baer and Head-Gordon (1998)). The following discussion of the RG is based loosely on the treatment given by Cardy (1996).

In order to illustrate the RG procedure, let us consider the example of a square spin lattice shown in Fig. 16.13. In the left half of the figure, the lattice is separated into 3×3 blocks. We now consider defining a new spin lattice from the old by applying a coarse-graining procedure that replaces each 3×3 spin block with a single spin. Of course, we need a rule for constructing this new spin lattice, so let us consider the following simple algorithm: (1) count the number of up and down spins in each block; (2) if the majority of the spins in the 3×3 block are up, replace the block by a single up spin, otherwise replace it by a single down spin. For the example on the left in Fig. 16.13, the new lattice obtained by applying this procedure is shown on the right in the figure. Such a transformation is called a block spin transformation (Kadanoff, 1966). Near a critical point, the system will exhibit long-range ordering, hence the coarse-graining procedure should yield a new spin lattice that is statistically equivalent to the old one; the spin lattice is then said to possess scale invariance.

Given the new spin lattice generated by the block spin transformation, we now wish to determine the Hamiltonian of this lattice. Since the new lattice must be statistically equivalent to the original one, the natural route to the transformed Hamiltonian is through the partition function. Thus, we consider the zero-field ($h = 0$) partition function of the original spin lattice using the Hamiltonian $H_0$ in eqn. (16.3.4) for the Ising model as the starting point:

$$Q(N, T) = \sum_{\sigma_1} \cdots \sum_{\sigma_N} e^{-\beta H_0(\sigma_1, ..., \sigma_N)} \equiv \mathrm{Tr}_{\sigma}\, e^{-\beta H_0(\sigma_1, ..., \sigma_N)}. \tag{16.9.1}$$
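The majority-rule algorithm described above can be sketched in a few lines. This is a minimal illustration, assuming spins stored as a nested list of ±1 values; the 6×6 configuration is hypothetical and is not the one drawn in Fig. 16.13, and the function names are not from the text. The second function encodes the projection rule that the text formalizes as the transformation function of eqn. (16.9.2).

```python
def block_spin_majority(lattice, b=3):
    """Coarse-grain a square lattice of +1/-1 spins using b x b majority-rule blocks."""
    n = len(lattice)
    if n % b != 0 or b % 2 == 0:
        raise ValueError("lattice size must be a multiple of an odd block size b")
    coarse = []
    for block_row in range(0, n, b):
        new_row = []
        for block_col in range(0, n, b):
            block_sum = sum(
                lattice[i][j]
                for i in range(block_row, block_row + b)
                for j in range(block_col, block_col + b)
            )
            # An odd number of spins per block rules out ties.
            new_row.append(1 if block_sum > 0 else -1)
        coarse.append(new_row)
    return coarse


def transformation_function(sigma_prime, block_spins):
    """Return 1 if sigma' has the same sign as the block sum, and 0 otherwise."""
    return 1 if sigma_prime * sum(block_spins) > 0 else 0


# Hypothetical 6x6 configuration used only for illustration.
example_lattice = [
    [1, 1, -1, -1, -1, 1],
    [1, -1, 1, -1, -1, -1],
    [1, 1, 1, 1, -1, -1],
    [-1, -1, 1, 1, 1, 1],
    [-1, 1, -1, 1, -1, 1],
    [-1, -1, -1, 1, 1, -1],
]

if __name__ == "__main__":
    for row in block_spin_majority(example_lattice):
        print(row)
```

For any block, exactly one of the two values $\sigma' = \pm 1$ gives a nonzero transformation function, which is why the blocked lattice is uniquely defined.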

The transformation function $T(\sigma'; \sigma_1, ..., \sigma_9)$ that yields the single spin $\sigma'$ for each 3×3 block of 9 spin variables can be expressed mathematically as follows:

$$T(\sigma'; \sigma_1, ..., \sigma_9) = \begin{cases} 1 & \sigma' \sum_{i=1}^{9} \sigma_i > 0 \\ 0 & \text{otherwise} \end{cases}. \tag{16.9.2}$$

This function ensures that when the spin sum over the original lattice is performed, only those terms that conform to the rule of the block spin transformation are nonzero. That is, the only nonzero terms are those for which $\sigma'$ and $\sum_{i=1}^{9} \sigma_i$ have the same sign. When eqn. (16.9.2) is inserted into eqn. (16.9.1), the function $T(\sigma'; \sigma_1, ..., \sigma_9)$ projects out those configurations that are consistent with the block spin transformation rule, while the sum over the old spin variables $\sigma_1, ..., \sigma_N$ leaves a function of only the new spin variables $\{\sigma'_1, ..., \sigma'_{N'}\}$. Note that $T(\sigma'; \sigma_1, ..., \sigma_9)$ satisfies the property

$$\sum_{\sigma' = \pm 1} T(\sigma'; \sigma_1, ..., \sigma_9) = 1, \tag{16.9.3}$$

which means simply that only one of the two values of $\sigma'$ can satisfy the block spin transformation rule.

The new spin variables $\{\sigma'\}$ can now be used to define a new partition function. To see how this is done, let the Hamiltonian of the new lattice be defined according to

$$e^{-\beta H_0'(\{\sigma'\})} = \mathrm{Tr}_{\sigma} \left[ \prod_{\text{blocks}} T(\sigma'; \sigma_1, ..., \sigma_9) \right] e^{-\beta H_0(\{\sigma\})}. \tag{16.9.4}$$

Summing both sides of eqn. (16.9.4) over the new spin variables and using eqn. (16.9.3) yields

$$\mathrm{Tr}_{\sigma'}\, e^{-\beta H_0'(\{\sigma'\})} = \mathrm{Tr}_{\sigma}\, e^{-\beta H_0(\{\sigma\})}. \tag{16.9.5}$$

Equation (16.9.5) states that the partition function is preserved by the block spin transformation and, consequently, so are the equilibrium properties. If the block spin transformation is devised in such a way that the functional form of the Hamiltonian is preserved, then the transformation can be iterated repeatedly on each new lattice generated by the transformation: each iteration will generate a system that is statistically equivalent to the original. Importantly, in a truly ordered state, each iteration will produce precisely the same lattice in the thermodynamic limit, thus signifying the existence of a critical point.

If the functional form of the Hamiltonian is maintained, then only its parameters (e.g., the strength of the spin-spin coupling) are affected by the transformation, and thus, we can regard the transformation as one that acts on these parameters. If the original Hamiltonian contains parameters $K_1, K_2, \ldots \equiv \mathbf{K}$ (for example, the coupling $J$ in the Ising model), then the transformation yields a Hamiltonian with a new set of parameters $\mathbf{K}' = (K_1', K_2', \ldots)$ that are functions of the old parameters

$$\mathbf{K}' = \mathbf{R}(\mathbf{K}). \tag{16.9.6}$$

The vector function $\mathbf{R}$ defines the transformation. These equations are called the renormalization group equations or renormalization group transformations. By iterating the RG equations, it is possible to determine if a system has an ordered phase and

for what parameter values the ordered phase occurs. In an ordered phase, each iteration of the RG equations yields the same lattice with exactly the same Hamiltonian. Requiring that the Hamiltonian itself remain unchanged under an RG transformation is stronger than simply requiring that the functional form of the Hamiltonian be preserved. When the Hamiltonian is unchanged by the RG transformation, then the parameters $\mathbf{K}'$ obtained via eqn. (16.9.6) are unaltered, implying that

$$\mathbf{K} = \mathbf{R}(\mathbf{K}). \tag{16.9.7}$$

A point $\mathbf{K}$ in parameter space that satisfies eqn. (16.9.7) is called a fixed point of the RG transformation. Equation (16.9.7) indicates that the Hamiltonian of an ordered phase emerges from a fixed point of the RG equations.

16.9.1

RG example: The one-dimensional Ising model

In the zero-field limit, the Hamiltonian for the one-dimensional Ising model is

$$H_0(\{\sigma\}) = -J \sum_{i=1}^{N} \sigma_i \sigma_{i+1}. \tag{16.9.8}$$

Let us define a dimensionless Hamiltonian $\Theta_0 = \beta H_0$ and a dimensionless coupling constant $K = \beta J$ so that $\Theta_0(\{\sigma\}) = -K \sum_{i=1}^{N} \sigma_i \sigma_{i+1}$. With these definitions, the partition function becomes

$$Q(N, T) = \mathrm{Tr}_{\sigma}\, e^{-\Theta_0(\{\sigma\})}. \tag{16.9.9}$$

Consider the simple block spin transformation illustrated in Fig. 16.14. The figure shows the one-dimensional spin lattice with two different indexing schemes: The upper scheme is a straight numbering of the nine spins in the figure, while the lower scheme numbers the spins in each block. As Fig. 16.14 indicates, the block spin transformation employed in this example replaces each block of three spins with a single spin determined solely by the spin at the center of the block. Thus, for the left block, the

Fig. 16.14 Example of the block spin transformation applied to the one-dimensional Ising model. The three spins that result are shown below.

new spin is $\sigma'_1 = \sigma_2$; for the middle block, $\sigma'_2 = \sigma_5$; for the right block, $\sigma'_3 = \sigma_8$; and so forth. Though not particularly democratic, this block spin transformation should be reasonable at low temperature where local ordering is expected and the middle spin is likely to cause neighboring spins to align with it. The transformation function $T(\sigma'; \sigma_1, \sigma_2, \sigma_3)$ for this example can be expressed mathematically simply as

$$T(\sigma'; \sigma_1, \sigma_2, \sigma_3) = \delta_{\sigma' \sigma_2}. \tag{16.9.10}$$

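The new spin lattice is shown below the original lattice in Fig. 16.14. The transformation function in eqn. (16.9.10) is now used to compute the new Hamiltonian $\Theta_0'$ according to

$$\begin{aligned} e^{-\Theta_0'(\{\sigma'\})} &= \sum_{\sigma_1} \sum_{\sigma_2} \sum_{\sigma_3} \cdots \sum_{\sigma_N} \delta_{\sigma_1' \sigma_2} \delta_{\sigma_2' \sigma_5} \cdots \left[ e^{K\sigma_1\sigma_2} e^{K\sigma_2\sigma_3} e^{K\sigma_3\sigma_4} e^{K\sigma_4\sigma_5} \cdots \right] \\ &= \sum_{\sigma_1} \sum_{\sigma_3} \sum_{\sigma_4} \sum_{\sigma_6} \cdots \left[ e^{K\sigma_1\sigma_1'} e^{K\sigma_1'\sigma_3} e^{K\sigma_3\sigma_4} e^{K\sigma_4\sigma_2'} \cdots \right]. \end{aligned} \tag{16.9.11}$$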

Equation (16.9.11) encodes the information we need to determine the new coupling parameter $K'$. We will use the rule that when the sums over $\sigma_3$ and $\sigma_4$ are performed, the new interaction between $\sigma_1'$ and $\sigma_2'$ gives the contribution $\exp(K'\sigma_1'\sigma_2')$ to the partition function. If this rule is satisfied, then the functional form of the Hamiltonian will be preserved. The sum over $\sigma_3$ and $\sigma_4$ that must then be performed in eqn. (16.9.11) is

$$\sum_{\sigma_3} \sum_{\sigma_4} \exp[K\sigma_1'\sigma_3] \exp[K\sigma_3\sigma_4] \exp[K\sigma_4\sigma_2'].$$

Note that the spin product $\sigma_3\sigma_4$ has two possible values, $\sigma_3\sigma_4 = \pm 1$, which allows us to employ a convenient identity:

$$e^{\pm\theta} = \cosh\theta \pm \sinh\theta = \cosh\theta\,[1 \pm \tanh\theta]. \tag{16.9.12}$$

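Equation (16.9.12) allows us to express $\exp(K\sigma_3\sigma_4)$ as

$$e^{K\sigma_3\sigma_4} = \cosh K\,[1 + \sigma_3\sigma_4 \tanh K].$$

If we define $x = \tanh K$, the product of the three exponentials becomes

$$\begin{aligned} e^{K\sigma_1'\sigma_3} e^{K\sigma_3\sigma_4} e^{K\sigma_4\sigma_2'} &= \cosh^3 K\,(1 + \sigma_1'\sigma_3 x)(1 + \sigma_3\sigma_4 x)(1 + \sigma_4\sigma_2' x) \\ &= \cosh^3 K\,(1 + \sigma_1'\sigma_3 x + \sigma_3\sigma_4 x + \sigma_4\sigma_2' x + \sigma_1'\sigma_3^2\sigma_4 x^2 + \sigma_1'\sigma_3\sigma_4\sigma_2' x^2 + \sigma_3\sigma_4^2\sigma_2' x^2 + \sigma_1'\sigma_3^2\sigma_4^2\sigma_2' x^3). \end{aligned} \tag{16.9.13}$$

When summed over $\sigma_3$ and $\sigma_4$, every term that is odd in $\sigma_3$ or $\sigma_4$ cancels, while each of the two surviving terms is counted once for each of the four configurations of $(\sigma_3, \sigma_4)$, yielding

$$\sum_{\sigma_3} \sum_{\sigma_4} e^{K\sigma_1'\sigma_3} e^{K\sigma_3\sigma_4} e^{K\sigma_4\sigma_2'} = 4\cosh^3 K \left[1 + \sigma_1'\sigma_2' x^3\right] \equiv \frac{4\cosh^3 K}{\cosh K'} \cosh K' \left[1 + \sigma_1'\sigma_2' x'\right]. \tag{16.9.14}$$

In eqn. (16.9.14), we have expressed the interaction in its original form but with a new coupling constant $K'$; the spin-independent prefactor $4\cosh^3 K/\cosh K'$ contributes only an additive constant to the new Hamiltonian. In order for the interaction term $[1 + \sigma_1'\sigma_2' x']$ to match the original interaction $[1 + \sigma_1'\sigma_2' x^3]$, we require $x' = x^3$ or

$$\tanh K' = \tanh^3 K, \qquad K' = \tanh^{-1}\left[\tanh^3 K\right]. \tag{16.9.15}$$

Equation (16.9.15) defines the RG transformation as

$$R(K) = \tanh^{-1}\left[\tanh^3(K)\right]. \tag{16.9.16}$$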
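The partial sum over $\sigma_3$ and $\sigma_4$ and the resulting RG map can be checked directly. In the sketch below (function names are illustrative, not from the text), the two inner spins are summed explicitly for fixed block spins $\sigma_1'$ and $\sigma_2'$. The result is compared with $4\cosh^3 K\,(1 + \sigma_1'\sigma_2' \tanh^3 K)$, where the factor of 4 counts the four configurations of $(\sigma_3, \sigma_4)$, and the effective coupling is extracted from the ratio of the aligned to the anti-aligned weight, which must equal $e^{2K'}$ if the new interaction has the form $e^{K'\sigma_1'\sigma_2'}$.

```python
import itertools
import math


def summed_block_weight(K, s1_prime, s2_prime):
    """Sum exp(K s1' s3) exp(K s3 s4) exp(K s4 s2') over s3, s4 = +1, -1."""
    return sum(
        math.exp(K * (s1_prime * s3 + s3 * s4 + s4 * s2_prime))
        for s3, s4 in itertools.product((-1, 1), repeat=2)
    )


def rg_map(K):
    """Eqn. (16.9.16): R(K) = arctanh(tanh^3 K)."""
    return math.atanh(math.tanh(K) ** 3)


def effective_coupling(K):
    """K' extracted from the ratio of aligned to anti-aligned block-spin weights."""
    ratio = summed_block_weight(K, 1, 1) / summed_block_weight(K, 1, -1)
    return 0.5 * math.log(ratio)


if __name__ == "__main__":
    for K in (0.2, 0.8, 1.5):
        print(K, rg_map(K), effective_coupling(K))
```

Both routes to $K'$ agree, and $K' < K$ for every finite $K > 0$, which anticipates the flow toward $K = 0$ discussed in the text.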

We must remember, however, that eqn. (16.9.16) is particular to the block spin transformation in eqn. (16.9.10). From eqns. (16.9.14) and (16.9.15), we obtain the new Hamiltonian as

$$\Theta_0'(\{\sigma'\}) = N g(K) - K' \sum_{i=1}^{N'} \sigma_i' \sigma_{i+1}', \tag{16.9.17}$$

where the spin-independent function $g(K)$ is given by

$$g(K) = -\frac{1}{3} \ln\left[\frac{\cosh^3 K}{\cosh K'}\right] - \frac{2}{3} \ln 2, \tag{16.9.18}$$

$N$ is the number of spins on the original lattice, and $N' = N/3$ is the number of block spins. Thus, apart from the constant $N g(K)$ term in eqn. (16.9.17), the new Hamiltonian has the same functional form as the original Hamiltonian, but it is a function of the new spin variables and coupling constant $K'$.

The block spin transformation of eqn. (16.9.10) could be applied again to the Hamiltonian in eqn. (16.9.17), leading to a new Hamiltonian $\Theta_0''$ in terms of new spin variables $\sigma_1'', ..., \sigma_{N''}''$ and a new coupling constant $K''$. It is a straightforward exercise to show that the coupling constant $K''$ would be related to $K'$ by eqn. (16.9.15). Repeated application of the block spin transformation is, therefore, equivalent to iteration of the RG equation. Since the coupling constant $K$ depends on temperature through $K = J/kT$, this iterative procedure can determine if, for some temperature, an ordered phase exists.

Recall that an ordered phase corresponds to the fixed point condition in eqn. (16.9.7). From eqn. (16.9.16), this condition for the present example becomes

$$K = \tanh^{-1}\left[\tanh^3 K\right]. \tag{16.9.19}$$

In terms of $x = \tanh K$, the fixed point condition is simply $x = x^3$. Since $K \geq 0$, the only possible solutions to this fixed point equation are $x = 0$ and $x = 1$. To understand the physical content of these solutions, consider the RG equation away from the fixed point: $x' = x^3$. Since $K = J/kT$, at high $T$, $K \to 0$ and $x = \tanh K \to 0^+$. At low temperature, $K \to \infty$ and $x \to 1^-$. If we view the RG equation as an iteration or recursion of the form

$$x_{n+1} = x_n^3, \tag{16.9.20}$$

and we start the recursion at $x_0 = 1$, then each successive iteration will yield 1. However, for any value $x = 1 - \epsilon$ less than 1 (here, $\epsilon > 0$), eqn. (16.9.20) eventually iterates to 0. These two scenarios are illustrated in Fig. 16.15. The iteration of eqn. (16.9.20) generates a renormalization group flow through the one-dimensional coupling-constant space. The fixed point at $x = 1$ is called an unstable fixed point because any value of $x_0$ other than 1, when iterated through the RG equation, flows away from this point

Fig. 16.15 RG flow for the one-dimensional Ising model. [Figure labels: unstable fixed point at x = 1, K = ∞; stable fixed point at x = 0, K = 0.]

toward the stable fixed point at $x = 0$. As the stable fixed point is approached, the coupling constant decreases until it reaches $K = 0$, corresponding to infinite temperature! At the unstable fixed point ($x = 1$), $K = \infty$ and $T = 0$. The absence of a fixed point for any finite, nonzero value of temperature tells us that there can be no ordered phase (and hence no critical point) in one dimension. Note, however, that in one dimension, perfect ordering exists at $T = 0$. Although this is not a physically meaningful ordered phase ($T = 0$ can never be achieved), this result suggests that ordered phases and critical points are associated with the unstable fixed points of the RG equations. Recall that we also obtained ordering at $T = 0$ from the exact analytical solution of the one-dimensional Ising model in Section 16.6.

Let us make one additional observation about the $T = 0$ unstable fixed point by analyzing the behavior of the correlation length $\xi$ at $T = 0$. The correlation length has units of length, but if we choose to measure it in units of the lattice spacing, then it can only depend on the coupling constant $K$ or $x = \tanh K$, i.e., $\xi = \xi(x)$. Under the block spin transformation of eqn. (16.9.10), as Fig. 16.14 indicates, the lattice spacing increases by a factor of 3 as a result of coarse graining. Thus, in units of the lattice spacing, $\xi$ must decrease by a factor of 3 in order to maintain the same physical distance. Thus,

$$\xi(x') = \frac{1}{3}\xi(x). \tag{16.9.21}$$

More generally, if we had taken our blocks to have $b$ spins, eqn. (16.9.21) suggests that $\xi$ should transform as

$$\xi(x') = \frac{1}{b}\xi(x). \tag{16.9.22}$$

In addition, the RG equation would become $x' = x^b$. We now seek a functional form for $\xi(x)$ that satisfies eqn. (16.9.22). In fact, only one functional form is possible, namely

$$\xi(x) \sim \frac{1}{\ln x}. \tag{16.9.23}$$

This can be shown straightforwardly as follows:

$$\xi(x') = \xi(x^b) \sim \frac{1}{\ln x^b} = \frac{1}{b \ln x} = \frac{1}{b}\xi(x). \tag{16.9.24}$$

Therefore, the correlation length $\xi(K) \sim 1/\ln(\tanh K) \to \infty$ as $T \to 0$, so that at $T = 0$ the correlation length is infinite, which is another indication that an ordered phase exists.
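The flow generated by eqn. (16.9.20) and the scaling property in eqn. (16.9.22) are easy to explore numerically. The sketch below is illustrative only: the function names are not from the text, and the proportionality constant in eqn. (16.9.23) is taken to be $-1$ (an assumption) so that $\xi$ is positive for $0 < x < 1$.

```python
import math


def rg_step(x, b=3):
    """One application of the RG equation x' = x^b."""
    return x**b


def rg_flow(x0, n_steps, b=3):
    """Return the sequence x_0, x_1, ..., x_n generated by x_{n+1} = x_n^b."""
    trajectory = [x0]
    for _ in range(n_steps):
        trajectory.append(rg_step(trajectory[-1], b))
    return trajectory


def correlation_length(x):
    """xi(x) = -1 / ln(x), in units of the lattice spacing (sign chosen so xi > 0)."""
    return -1.0 / math.log(x)


if __name__ == "__main__":
    print("start at x0 = 1:    ", rg_flow(1.0, 5))
    print("start at x0 = 0.999:", rg_flow(0.999, 12))
    x = 0.9
    print("xi(x), xi(x^3):", correlation_length(x), correlation_length(rg_step(x)))
```

Starting exactly at $x_0 = 1$ the recursion never moves, while any starting value below 1 is driven to 0; each step also reduces $\xi$ by exactly the factor $b$.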

Finally, we examine the behavior of the RG equation at very low $T$ where $K$ is large. Note that eqn. (16.9.15) can be written as

$$\tanh K' = \tanh^3 K = \tanh K \tanh^2 K = \tanh K \left[\frac{\cosh(2K) - 1}{\cosh(2K) + 1}\right]. \tag{16.9.25}$$

The term in brackets is very close to 1 when K is large. Thus, when K is large, eqn. (16.9.25) can be expressed as K ′ ∼ K, which is a linearized version of the RG equation. On an arbitrary spin lattice, interactions between blocks are predominantly mediated by interactions between spins along the boundaries of the blocks (see Fig. 16.16 for an illustration of this in two dimensions). In one dimension, this interaction involves a single spin pair, and thus we expect a block spin transformation in one dimension to yield a coupling constant of the same order as the original coupling constant at low T where there is significant alignment between the blocks.
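The statement $K' \sim K$ at large $K$ can be made slightly more quantitative. Using $\tanh K \approx 1 - 2e^{-2K}$ for large $K$ (a short expansion that is not carried out in the text), eqn. (16.9.15) gives $K' \approx K - \tfrac{1}{2}\ln 3$: the new coupling differs from the old one only by a constant shift, so the two are of the same order. The sketch below, with illustrative function names, checks this numerically for moderately large $K$, where double-precision arithmetic still resolves $1 - \tanh K$.

```python
import math


def rg_map(K):
    """Eqn. (16.9.16): R(K) = arctanh(tanh^3 K)."""
    return math.atanh(math.tanh(K) ** 3)


def coupling_shift(K):
    """K - K' for the three-spin block transformation."""
    return K - rg_map(K)


if __name__ == "__main__":
    print("0.5 * ln 3 =", 0.5 * math.log(3.0))
    for K in (1.0, 2.0, 4.0, 6.0, 8.0):
        print(K, rg_map(K), coupling_shift(K))
```

The shift approaches $\tfrac{1}{2}\ln 3$ rapidly, so the ratio $K'/K$ tends to 1 as $K$ grows.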

16.10

Fixed points of the renormalization group equations in greater than one dimension

Figure 16.16 shows a two-dimensional spin lattice and the interactions between two blocks, which are mediated by the boundary spins. In more than one dimension, these interactions are mediated by more than a single spin pair. For the case of the 3×3 blocks shown in the figure, there are three boundary spin pairs mediating the interaction between blocks. Consequently, the result of a block spin transformation should yield,

Fig. 16.16 Interactions between blocks of a square spin lattice.

at low $T$, a coupling constant $K'$ roughly three times as large as the original coupling constant $K$, i.e., $K' \sim 3K$. In a three-dimensional lattice, using 3 × 3 × 3 blocks, interactions between blocks would be mediated by $3^2 = 9$ spin pairs. Generally, in $d$ dimensions using blocks of $b^d$ spins, the RG equation at low $T$ should behave as

$$K' \sim b^{d-1} K. \tag{16.10.1}$$

The number $b$ is called the length scaling factor. Equation (16.10.1) implies that for $d > 1$, $K' > K$ at low $T$. Thus, iteration of the RG equation at low temperature yields an RG flow towards $K = \infty$, and the fixed point at $T = 0$ becomes stable (in one dimension, this fixed point was unstable). However, we know that at high temperature, the system must be in a disordered state, and hence the fixed point at $T = \infty$ must remain a stable fixed point, as it was in one dimension. These two observations suggest that for $d > 1$, there must be a third fixed point with coupling constant $\tilde{x}$ between $T = 0$ and $T = \infty$. Moreover, an iteration initiated with $x_0 = \tilde{x} + \epsilon$ ($\epsilon > 0$) must iterate to $x = 1$ where $T = 0$, and an iteration initiated from $x_0 = \tilde{x} - \epsilon$ must iterate to $x = 0$ where $T = \infty$. Hence, this fixed point is unstable and is, therefore, a critical point with $\tilde{K} = K_c$. To the extent that an RG flow in more than one dimension can be represented as a one-dimensional process, the flow diagram would appear as in Fig. 16.17. Since this unstable fixed point corresponds to a finite, nonzero temperature

Fig. 16.17 Renormalization group flow in more than one dimension. The figure shows the iteration to each stable fixed point starting from the unstable fixed point and an arbitrary point K = K0 .

$T_c$, it is a physical critical point. This claim is further supported by the evolution of the correlation length under the RG flow. Recall that for a length scaling factor $b$, the correlation length transforms as $\xi(K') = \xi(K)/b$ or $\xi(K) = b\xi(K')$. Suppose that we start at a point $K$ near $K_c$ and that $n(K)$ iterations of the RG equation are required to reach a value $K_0$ between $K = 0$ and $K = K_c$. If $\xi_0$ is the correlation length at $K = K_0$, which should be a finite number of order 1, then by eqn. (16.9.22) we find that

$$\xi(K) = \xi_0 b^{n(K)}. \tag{16.10.2}$$

As the starting point $K$ is chosen closer and closer to $K_c$, the number of iterations needed to reach $K_0$ increases. In the limit that the initial point $K \to K_c$, the number of iterations needed to reach $K_0$ approaches infinity. According to eqn. (16.10.2), as $K$ approaches $K_c$, the correlation length becomes infinite as expected in an ordered phase. Thus, the new unstable fixed point must correspond to a critical point.
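The growth of $n(K)$ as $K \to K_c$ can be illustrated with any RG map that has an unstable fixed point. The text does not supply an explicit map for $d > 1$, so the sketch below assumes one: the Migdal-Kadanoff-type recursion $K' = \tfrac{1}{2}\ln\cosh(4K)$ for $d = 2$ and $b = 2$, which is an approximate map introduced here purely for illustration. It behaves as $K' \sim 2K$ at large $K$, consistent with eqn. (16.10.1). The reference point $K_0 = 0.1$, the value $\xi_0 = 1$, and all function names are likewise assumptions.

```python
import math


def mk_map(K):
    """Assumed illustrative RG map (Migdal-Kadanoff type, d = 2, b = 2)."""
    return 0.5 * math.log(math.cosh(4.0 * K))


def unstable_fixed_point(lo=0.1, hi=1.0, tol=1e-13):
    """Locate K_c by bisection on R(K) - K, which is negative below K_c."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mk_map(mid) < mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def iterations_to_reach(K_start, K0, max_steps=10_000):
    """Count the RG iterations n(K) needed to flow from K_start down to K0."""
    K, n = K_start, 0
    while K > K0:
        if n >= max_steps:
            raise RuntimeError("flow did not reach K0; is K_start below K_c?")
        K = mk_map(K)
        n += 1
    return n


def correlation_length(K_start, K0, xi0=1.0, b=2):
    """Eqn. (16.10.2): xi(K) = xi_0 * b^n(K)."""
    return xi0 * b ** iterations_to_reach(K_start, K0)


if __name__ == "__main__":
    K_c = unstable_fixed_point()
    print("K_c =", K_c)
    for delta in (1e-1, 1e-2, 1e-4, 1e-6):
        n = iterations_to_reach(K_c - delta, K0=0.1)
        print(delta, n, correlation_length(K_c - delta, K0=0.1))
```

Each factor of 100 in the distance from $K_c$ adds a roughly constant number of iterations, so $\xi$ grows as a power of $|K - K_c|$, which is the behavior analyzed next in the text.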

Linearized RG

From this understanding of the correlation length behavior, we can analyze the exponent $\nu$ using the RG equation near the unstable fixed point. When $K = K_c$, the fixed point condition requires that $K_c = R(K_c)$. Near the fixed point, we can expand the RG equation to give

$$K' \approx R(K_c) + (K - K_c)R'(K_c) + \cdots. \tag{16.10.3}$$

Let us write $R'(K_c)$ as $b^{\ln R'(K_c)/\ln b}$ and define an exponent $y = \ln R'(K_c)/\ln b$. Using this exponent, eqn. (16.10.3) becomes

$$K' \approx K_c + b^y (K - K_c). \tag{16.10.4}$$

Near the critical point, $\xi$ diverges according to

$$\xi \sim |T - T_c|^{-\nu} \sim \left|\frac{1}{K} - \frac{1}{K_c}\right|^{-\nu} \sim \left|\frac{K - K_c}{K K_c}\right|^{-\nu}. \tag{16.10.5}$$

Thus, $\xi \sim |K - K_c|^{-\nu}$. However, since $\xi(K) = b\xi(K')$, it follows that

$$|K - K_c|^{-\nu} \sim b|K' - K_c|^{-\nu} = b|b^y (K - K_c)|^{-\nu}. \tag{16.10.6}$$

The right-hand side equals $b^{1 - y\nu}|K - K_c|^{-\nu}$, so eqn. (16.10.6) is only possible if $b^{1 - y\nu} = 1$, that is, if

$$\nu = \frac{1}{y}. \tag{16.10.7}$$

Equation (16.10.7) illustrates the general result that critical exponents are related to derivatives of the RG transformation.
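As a worked instance of eqns. (16.10.3)-(16.10.7), the sketch below computes $y$ and $\nu$ for an explicit map. The text does not provide a map for $d > 1$, so the Migdal-Kadanoff-type recursion $K' = \tfrac{1}{2}\ln\cosh(4K)$ ($d = 2$, $b = 2$) is assumed here for illustration only; its derivative is $R'(K) = 2\tanh(4K)$. Because this map is approximate, the exponent it produces should not be expected to reproduce the exact two-dimensional Ising value $\nu = 1$. Function names are illustrative.

```python
import math


def mk_map(K):
    """Assumed illustrative RG map (Migdal-Kadanoff type, d = 2, b = 2)."""
    return 0.5 * math.log(math.cosh(4.0 * K))


def mk_map_derivative(K):
    """Analytic derivative R'(K) = 2 tanh(4K) of the assumed map."""
    return 2.0 * math.tanh(4.0 * K)


def unstable_fixed_point(lo=0.1, hi=1.0, tol=1e-13):
    """Locate K_c = R(K_c) by bisection on R(K) - K."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mk_map(mid) < mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def correlation_exponent(b=2):
    """Return (K_c, y, nu) with y = ln R'(K_c) / ln b and nu = 1 / y."""
    K_c = unstable_fixed_point()
    y = math.log(mk_map_derivative(K_c)) / math.log(b)
    return K_c, y, 1.0 / y


if __name__ == "__main__":
    K_c, y, nu = correlation_exponent()
    print("K_c =", K_c, " y =", y, " nu =", nu)
```

The only inputs are the location of the fixed point and the slope of the map there, which is the content of the statement that critical exponents are determined by derivatives of the RG transformation.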

16.11

General linearized renormalization group theory

Our discussion in the previous section illustrates the power of the linearized RG equations. We now generalize this approach to a Hamiltonian $\Theta_0$ with parameters $K_1, K_2, K_3, \ldots \equiv \mathbf{K}$. Equation (16.9.6) for the RG transformation can be linearized about an unstable fixed point at $\mathbf{K}^*$ according to

$$K_a' \approx K_a^* + \sum_b T_{ab}(K_b - K_b^*), \tag{16.11.1}$$

where

$$T_{ab} = \left.\frac{\partial R_a}{\partial K_b}\right|_{\mathbf{K} = \mathbf{K}^*}. \tag{16.11.2}$$

Note that the matrix $\mathrm{T}$ is not required to be symmetric. Consequently, we define a left eigenvalue equation for $\mathrm{T}$ according to

$$\sum_a \phi_a^i T_{ab} = \lambda_i \phi_b^i \tag{16.11.3}$$

and a scaling variable $u_i$ as

$$u_i = \sum_a \phi_a^i (K_a - K_a^*). \tag{16.11.4}$$

The term “scaling variable” arises from the fact that $u_i$ transforms multiplicatively near a fixed point under the linearized RG flow:

$$\begin{aligned} u_i' &= \sum_a \phi_a^i (K_a' - K_a^*) \\ &= \sum_a \sum_b \phi_a^i T_{ab} (K_b - K_b^*) \\ &= \sum_b \lambda_i \phi_b^i (K_b - K_b^*) \\ &= \lambda_i u_i. \end{aligned} \tag{16.11.5}$$
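The multiplicative transformation in eqn. (16.11.5) can be verified for a small non-symmetric matrix. The sketch below uses a hypothetical 2×2 linearized RG matrix whose entries are chosen only for illustration, and it computes the left eigenvectors in closed form (valid when the lower-left entry is nonzero and the eigenvalues are real). Function names are not from the text.

```python
import math


def left_eigensystem_2x2(T):
    """Return [(lambda_i, phi^i)] with phi^i T = lambda_i phi^i for a real 2x2 matrix."""
    (t11, t12), (t21, t22) = T
    trace = t11 + t22
    det = t11 * t22 - t12 * t21
    disc = trace * trace - 4.0 * det
    if disc < 0 or t21 == 0:
        raise ValueError("this sketch assumes real eigenvalues and T[1][0] != 0")
    root = math.sqrt(disc)
    pairs = []
    for lam in ((trace + root) / 2.0, (trace - root) / 2.0):
        # Row vector (t21, lam - t11) is annihilated by (T - lam * I) from the left.
        pairs.append((lam, (t21, lam - t11)))
    return pairs


def linearized_rg_step(T, dK):
    """Eqn. (16.11.1): K'_a - K*_a = sum_b T_ab (K_b - K*_b)."""
    return tuple(sum(T[a][b] * dK[b] for b in range(2)) for a in range(2))


def scaling_variable(phi, dK):
    """Eqn. (16.11.4): u_i = sum_a phi^i_a (K_a - K*_a)."""
    return sum(phi[a] * dK[a] for a in range(2))


# Hypothetical non-symmetric linearized RG matrix and displacement from the fixed point.
T_example = ((2.0, 0.5), (0.3, 0.6))
dK_example = (0.01, -0.02)

if __name__ == "__main__":
    dK_new = linearized_rg_step(T_example, dK_example)
    for lam, phi in left_eigensystem_2x2(T_example):
        u_old = scaling_variable(phi, dK_example)
        u_new = scaling_variable(phi, dK_new)
        print("lambda =", lam, " u' =", u_new, " lambda * u =", lam * u_old)
```

For this matrix one eigenvalue exceeds 1 and the other is below 1, so one scaling variable grows under iteration while the other decays.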

Suppose the eigenvalues $\lambda_i$ are real. Since $u_i' = \lambda_i u_i$, $u_i$ will increase if $\lambda_i > 1$ and decrease if $\lambda_i < 1$. Redefining the eigenvalues $\lambda_i$ as

$$\lambda_i = b^{y_i}, \tag{16.11.6}$$

we see that

$$u_i' = b^{y_i} u_i. \tag{16.11.7}$$

By convention, the quantities $\{y_i\}$ are referred to as the RG eigenvalues. From the discussion in the preceding paragraph, three cases can be identified for the RG eigenvalues:

1. If $y_i > 0$, the scaling variable $u_i$ is called relevant because repeated iteration of the RG transformation drives it away from its fixed point value at $u_i = 0$.
2. If $y_i < 0$, the scaling variable $u_i$ is called irrelevant because repeated iteration of the RG transformation drives it toward 0.
3. If $y_i = 0$, the scaling variable $u_i$ is referred to as marginal because we cannot determine from the linearized RG equations whether $u_i$ will iterate towards or away from the fixed point.

Typically, scaling variables are either relevant or irrelevant; marginality is rare. The number of relevant scaling variables corresponds to the number of experimentally tunable parameters such as $P$ and $T$ in a fluid system or $T$ and $h$ in a magnetic system. For the latter, the relevant variables are called the thermal and magnetic scaling variables, respectively. The thermal and magnetic scaling variables have corresponding RG eigenvalues $y_t$ and $y_h$. An analysis of the scaling properties of the singular part $\tilde{g}(h, T)$ of the Gibbs free energy $g(h, T)$, which obeys $\tilde{g}(h, T) = b^{-d}\tilde{g}(b^{y_h}h, b^{y_t}T)$, leads to the following relations for the primary critical exponents:

$$\alpha = 2 - \frac{d}{y_t}, \qquad \beta = \frac{d - y_h}{y_t}, \qquad \gamma = \frac{2y_h - d}{y_t}, \qquad \delta = \frac{y_h}{d - y_h}. \tag{16.11.8}$$

These relations are obtained by differentiating $\tilde{g}(h, T)$ to obtain the heat capacity, magnetization, and magnetic susceptibility. From eqns. (16.11.8), the relations $\alpha + 2\beta + \gamma = 2$ and $\alpha + \beta(1 + \delta) = 2$, which are examples of scaling relations, can be easily derived. Two other scaling relations can be derived from the scaling behavior of the spin-spin correlation function $G(r) = b^{-2(d - y_h)}\tilde{G}(r/b, b^{y_t}t)$, where $t = (T - T_c)/T$. These are $\alpha = 2 - d\nu$ and $\gamma = \nu(2 - \eta)$. Because of such scaling relations, we do not need to determine all of the critical exponents individually. For the Ising model, we see that there are four such scaling relations, indicating that only two of the exponents, $\nu$ and $\eta$, of the six total are independent. Because a subset of the critical exponents still needs to be determined by some method, numerical simulations play an important role in the implementation of the RG, and techniques such as the Wang-Landau and M(RT)² schemes carried out on a lattice are useful approaches that can be employed (see Problems 16.8 and 16.9).
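Equations (16.11.8) and the scaling relations lend themselves to a quick exact-arithmetic check. The sketch below takes as input the RG eigenvalues $y_t = 1$ and $y_h = 15/8$ for $d = 2$; these are the values usually quoted for the two-dimensional Ising model and are an assumption here, since they are not stated in this section. It uses $\nu = 1/y_t$ from eqn. (16.10.7) and obtains $\eta$ by solving $\gamma = \nu(2 - \eta)$. Function names are illustrative.

```python
from fractions import Fraction


def critical_exponents(d, y_t, y_h):
    """Evaluate eqns. (16.11.8), nu = 1 / y_t, and eta from gamma = nu (2 - eta)."""
    d = Fraction(d)
    alpha = 2 - d / y_t
    beta = (d - y_h) / y_t
    gamma = (2 * y_h - d) / y_t
    delta = y_h / (d - y_h)
    nu = 1 / y_t
    eta = 2 - gamma / nu
    return {"alpha": alpha, "beta": beta, "gamma": gamma, "delta": delta, "nu": nu, "eta": eta}


def scaling_relations_hold(d, exps):
    """Check the four scaling relations quoted in the text."""
    return (
        exps["alpha"] + 2 * exps["beta"] + exps["gamma"] == 2
        and exps["alpha"] + exps["beta"] * (1 + exps["delta"]) == 2
        and exps["alpha"] == 2 - d * exps["nu"]
        and exps["gamma"] == exps["nu"] * (2 - exps["eta"])
    )


if __name__ == "__main__":
    exps = critical_exponents(2, Fraction(1), Fraction(15, 8))
    for name, value in exps.items():
        print(name, "=", value)
    print("scaling relations hold:", scaling_relations_hold(2, exps))
```

The first three relations hold identically for any $y_t$ and $y_h$ once eqns. (16.11.8) and $\nu = 1/y_t$ are used, while the fourth holds here by construction because it is the relation used to define $\eta$.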

16.12

Understanding universality from the linearized renormalization group theory

In the linearized RG theory, at a fixed point, all scaling variables are zero, regardless of whether they are relevant, irrelevant, or marginal. Let us assume for the present discussion that there are no marginal scaling variables. From the definitions of relevant and irrelevant scaling variables, we can propose a formal procedure for locating fixed points. Begin with the space spanned by the full set of eigenvectors of T, and project out the relevant subspace by setting all the relevant scaling variables to zero by hand. The remaining subspace is spanned by the irrelevant eigenvectors of T, which defines a hypersurface in the full coupling constant space. This surface is called the critical

Fig. 16.18 A renormalization group trajectory.

hypersurface. Any point on the critical hypersurface belongs to the irrelevant subspace and iterates to zero under successive RG transformations. This procedure defines a trajectory on the hypersurface that leads to a fixed point, as illustrated in Fig. 16.18. This fixed point, called the critical fixed point, is stable with respect to irrelevant scaling variables and unstable with respect to relevant scaling variables.

In order to understand the importance of the critical fixed point, consider a simple model in which there is one relevant and one irrelevant scaling variable. Let these be denoted as $u_1$ and $u_2$, respectively, and let these variables have corresponding couplings $K_1$ and $K_2$. In an Ising model, $K_1$ might represent the reduced nearest-neighbor coupling, and $K_2$ might represent a next-nearest-neighbor coupling. Relevant variables also include experimentally tunable parameters such as temperature and magnetic field. The reason $u_1$ is relevant and $u_2$ is irrelevant is that there must be a nearest-neighbor coupling for the existence of a critical point and ordered phase at $h = 0$, but magnetization can occur even if there is no next-nearest-neighbor coupling. According to the procedure of the preceding paragraph, the condition $u_1(K_1, K_2) = 0$ defines the critical surface, which in this case, is a one-dimensional curve in the $K_1$-$K_2$ plane as illustrated in Fig. 16.19. Here, the black curve represents the critical “surface” (curve), and the point at which the arrows meet is the critical fixed point. The full coupling constant space represents the space of all physical systems containing nearest-neighbor and next-nearest-neighbor couplings. If we wish to consider the subset of systems with no next-nearest-neighbor coupling ($K_2 = 0$), the point at which the line $K_2 = 0$ intersects the critical surface defines the critical value $K_{1c}$ and the corresponding critical temperature, and it is an unstable fixed point of an RG transformation with $K_2 = 0$. Similarly, if we consider a model for which $K_2 \neq 0$, then the point at which this line intersects the critical surface determines the critical value of $K_1$ for such a model. In fact, for any of these models, $K_{1c}$ lies on the critical surface and iterates toward the critical fixed point under the full RG transformation. Thus, we have an effective definition of a universality class: All models characterized by the same critical fixed point belong to the same universality class and share the same critical properties.

Fig. 16.19 Example curves defined by $u_1(K_1, K_2)$. The critical curve defined by $u_1(K_1, K_2) = 0$ is shown in black and iterates to the critical fixed point $P$.

16.13

Other uses of discrete models

Discrete models of the type discussed in this chapter can be applied in areas other than magnetic systems, rendering them powerful tools for extracting complex patterns of behavior within a relatively simple framework. Problem 16.5 presents one approach, known as a rotational isomeric states model, for studying the conformational states of a simple polymer. In this model, the trans and two gauche states of each dihedral angle in, for example, a simple alkane, are treated as discrete. These discrete states are then used to construct an energy model similar to a generalized Potts model (cf. eqn. (16.3.5)). From such a model, the probability associated with any set of dihedral angles can be determined and properties such as the average end-to-end distance can be estimated and compared to more advanced sampling methods (Zhu et al., 2002). Ising models have been used to describe short-range order and long-range disorder in molecular crystals (Welberry, 1979), in which the “spins” represent different discrete orientations of a molecule in the crystal lattice. In the biomolecular field, Potts and Ising models have found an important application in connecting chemical composition to structure, function, and fitness. Specifically, a protein or nucleic acid can be considered as a sequence S consisting of a string {σ} of letter codes corresponding to amino acids or nucleotides at each of M positions in the sequence S (Levy et al., 2017; Haldane and Levy, 2019; Wilburn and Eddy, 2020). The field and coupling parameters can be fit to databases of sequences, after which the model can predict conformational and fitness landscapes of mutations.
A variant of the Potts model, known as the Cellular Potts model (Graner and Glazier, 1992; Savill and Hogeweg, 1997; Szabó and Merks, 2013; Guisoni et al., 2018), considers the spins as different types of cells with different pairwise adhesion energies, and such a model allows for the study of tissue morphogenesis and errant phenomena such as tumor growth under active cell motion. These are just a few areas in which discrete models have been adapted for application to complex problems well beyond spin lattices. In essence, any phenomenon that can be mapped onto a set of discrete states with local interactions between them is likely to fit into the paradigm of this class of models. The primary challenge then becomes the availability (or lack) of observed data to determine parameters of sufficient quality that the chosen model becomes subsequently predictive.


16.14

Problems

16.1. Consider a block spin transformation of the one-dimensional Ising model with $h \neq 0$, in which every other spin is summed over. Such a procedure is also called a decimation procedure.
a. Write down the transformation operator T for this transformation, and show that the transformation leads to a value of b = 2.
b. Derive the RG equation for this transformation, and find the fixed points.
c. Sketch the RG flow in the (x, y) plane. What is the nature of the fixed points, and what do they imply about the existence of a critical point?

16.2. The class of models known as Potts models allow, in general, N discrete states for each spin variable on the lattice. The one-dimensional three-state Potts model is defined as follows: each site i is a ‘spin’ which may take on three values, 1, 2, or 3. The Hamiltonian is given by

\[ H = -J \sum_i \delta_{s_i, s_{i+1}}. \]

Use the same decimation procedure as in Problem 16.1, derive the RG equation, and find the fixed points. Does this model have a critical point?

∗16.3. The Maxwell construction in the van der Waals theory attempts to fix the unphysical behavior of the theory resulting from $\partial P/\partial V > 0$ for $T < T_c$. The procedure is shown schematically in Fig. 16.20. The construction introduces a tie line for the discontinuous change in volume resulting from the first-order gas–liquid phase transition. The location of the tie line must be chosen so that the two areas enclosed by the tie line and the PV curve above and below the tie line are equal ($a_1 = a_2$). Show that the Maxwell construction together with the van der Waals equation of state in eqn. (4.8.36) leads to a mean-field value of the critical exponent β equal to 1/2.

16.4. For the spin-1/2 Ising model in one dimension with h = 0, recall that the partition function could be expressed in the form

\[ \Delta = \mathrm{Tr}\left(\mathrm{P}^N\right). \]

Consider an RG transformation, called the pair cell transformation, in which ∆ is reexpressed as

\[ \Delta = \mathrm{Tr}\left[\left(\mathrm{P}^2\right)^{N/2}\right]. \]

The transfer matrix is redefined by $\mathrm{P}' = \mathrm{P}^2$. Find the RG equation corresponding to this transformation and show that it leads to the expected stable fixed point.


Fig. 16.20 Maxwell construction.

Hint: Try redefining the coupling constant by $u = e^K$ and show that P′ can be put in the same form as P, i.e., $\mathrm{P}'(u) = c(u)\mathrm{P}(u')$, and that c can be defined implicitly in terms of u′.

∗16.5. A simple model of a long polymer chain consists of the following assumptions:
i. The conformational energy E of the chain is determined solely from its backbone dihedral angles.
ii. Each dihedral angle can assume three possible values denoted t for “trans” and $g^+$ and $g^-$ for the two “gauche” conformations. However, the present model is discrete in the sense that t, $g^+$, and $g^-$ are the only values the dihedral angles may assume.
iii. Each conformation has an intrinsic energy and is also influenced by the conformations of nearest neighbor dihedral angles only. If the polymer has N atomic sites, then there are N − 3 dihedral angles numbered $\phi_1, ..., \phi_{N-3}$ by convention. The total energy $E(\phi_1, ..., \phi_{N-3})$ can be written as

\[ E(\phi_1, ..., \phi_{N-3}) = \sum_{i=1}^{N-3} \varepsilon_1(\phi_i) + \sum_{i=2}^{N-3} \varepsilon_2(\phi_{i-1}, \phi_i), \]

where each $\phi_i$ has values t, $g^+$, or $g^-$.
iv. The two energy functions $\varepsilon_1$ and $\varepsilon_2$ are assumed to have the following values:

\[ \begin{aligned} \varepsilon_1(t) &= 0 \\ \varepsilon_1(g^+) &= \varepsilon_1(g^-) = \varepsilon \\ \varepsilon_2(g^+, g^-) &= \varepsilon_2(g^-, g^+) = \infty \\ \varepsilon_2(\phi_{i-1}, \phi_i) &= 0 \quad \text{for all other combinations.} \end{aligned} \]

a. Calculate the canonical partition function for this system. You may express your answer in terms of $\sigma \equiv \exp(-\beta\varepsilon)$.
b. Show that, in the limit N → ∞, the partition function behaves as

\[ \lim_{N\to\infty} \frac{1}{N} \ln Q = \ln \chi, \]

where

\[ \chi = \frac{1}{2}\left[(1 + \sigma) + \sqrt{1 + 6\sigma + \sigma^2}\,\right]. \]

c. What is the probability, for large N, that all angles will be in the trans conformation?
d. What is the probability, for large N, that the angles will alternate trans, gauche, trans, gauche, ...?

16.6. Consider the Ising Hamiltonian in eqn. (16.3.3) for which each spin has z neighbors on the lattice. Each spin variable $\sigma_i$ can take on three values, −1, 0, 1. Using mean-field theory, find the transcendental equation for the magnetization, and determine the critical temperature of this model. What are the critical exponents?

16.7. In 1978, H. J. Maris and L. J. Kadanoff introduced an RG procedure for the two-dimensional zero-field Ising model (Maris and Kadanoff, 1978; Chandler, 1987). This problem explores this procedure step by step, leading to the RG equation.
a. Consider the following labeling of spins on the two-dimensional periodic lattice shown in Fig. 16.21(a). The first step is to sum over half of the spins on the lattice by partitioning the summand of the partition function in such a way that each spin to be summed over appears in only one Boltzmann factor. Show that the resulting partition function, for one choice of the spins summed over, corresponds to the spin lattice shown below and takes the form

\[ Q = \sum_{\text{remaining spins}} \cdots \left[ e^{K(\sigma_1+\sigma_2+\sigma_3+\sigma_4)} + e^{-K(\sigma_1+\sigma_2+\sigma_3+\sigma_4)} \right] \times \left[ e^{K(\sigma_2+\sigma_3+\sigma_7+\sigma_8)} + e^{-K(\sigma_2+\sigma_3+\sigma_7+\sigma_8)} \right] \cdots, \]

where K = βJ (see Fig. 16.21(b)).


Fig. 16.21 (a) Labeling of spins on a two-dimensional lattice. (b) Spin lattice after decimation.

b. Consider trying to write one of the terms in brackets in the above expression as

\[ e^{K(\sigma_1+\sigma_2+\sigma_3+\sigma_4)} + e^{-K(\sigma_1+\sigma_2+\sigma_3+\sigma_4)} = g(K)\, e^{K'(\sigma_1\sigma_2+\sigma_2\sigma_3+\sigma_3\sigma_4+\sigma_4\sigma_1)}, \]

where the new coupling constant K′ and the function g(K) are to be determined by requiring that this equation be satisfied by the four nonequivalent choices of the spins:

\[ \begin{aligned} \sigma_1 = \sigma_2 = \sigma_3 = \sigma_4 &= \pm 1 \\ \sigma_1 = \sigma_2 = \sigma_3 = -\sigma_4 &= \pm 1 \\ \sigma_1 = \sigma_2 = -\sigma_3 = -\sigma_4 &= \pm 1 \\ \sigma_1 = -\sigma_2 = \sigma_3 = -\sigma_4 &= \pm 1. \end{aligned} \]

Show that g(K) and K′ cannot be determined in this way.
c. Consider instead introducing several new coupling constants, $K_1$, $K_2$, and $K_3$, and writing

\[ e^{K(\sigma_1+\sigma_2+\sigma_3+\sigma_4)} + e^{-K(\sigma_1+\sigma_2+\sigma_3+\sigma_4)} = g(K) \exp\left[ \tfrac{1}{2} K_1(\sigma_1\sigma_2+\sigma_2\sigma_3+\sigma_3\sigma_4+\sigma_4\sigma_1) + K_2(\sigma_1\sigma_3+\sigma_2\sigma_4) + K_3\sigma_1\sigma_2\sigma_3\sigma_4 \right]. \]

By inserting the four nonequivalent choices of the spin variables from part b, find expressions for $K_1$, $K_2$, $K_3$, and g(K) in terms of K. Interpret the resulting partition function.


d. Note that the result of part c does not lead to an exact RG procedure. Show that if $K_2$ and $K_3$ are neglected, an RG equation of the form

\[ K_1 = \frac{1}{4} \ln \cosh(4K) \]

results. Does this lead to a critical point?
e. In order to improve the results, it is proposed to neglect only $K_3$ and treat the $K_2$ term approximately. Let $\Sigma_1$ and $\Sigma_2$ be the spin sums multiplying $K_1$ and $K_2$, respectively, in the partition function expression. Consider the approximation

\[ K_1\Sigma_1 + K_2\Sigma_2 \approx K'(K_1, K_2) {\sum_{i,j}}' \sigma_i\sigma_j, \]

where the sum is taken only over nearest neighbors. What is the partition function that results from this approximation?
f. Define the free energy per spin as

\[ a(K) = \frac{1}{N} \ln Q(K). \]

Show that the free energy satisfies an RG equation of the form

\[ a(K) = \frac{1}{2} \ln g(K) + \frac{1}{2} a(K'). \]

g. Show that K′ can be estimated by $K' \approx K_1 + K_2$.
h. Derive the RG equation that results, and show that it predicts the existence of a critical point. What is the value of $K_c$?
i. By expanding the free energy in a Taylor series about $K = K_c$, calculate the critical exponent α and compare your value to the exact solution of Onsager.
j. Finally, compare the critical temperature you obtain to the Onsager result:

\[ \frac{J}{kT_c} = 0.44069. \]

k. Devise an analog of the Maris-Kadanoff scheme for the one-dimensional zero-field Ising model by summing over every other spin on the one-dimensional spin lattice. What is the RG equation that results? Show that your one-dimensional equation yields the expected fixed point.


16.8. The Wang-Landau method of Section 7.6 can be easily adapted for spin lattices. First, since the spin variables take on discrete values, the total energy E of the spin lattice also takes on discrete values. Therefore, we can write the canonical partition function as

\[ Q(\beta) = \sum_E \Omega(E) e^{-\beta E}, \]

where Ω(E) is the density of states. A trial move consists of flipping a randomly chosen spin and then applying eqn. (7.6.3) to decide whether the move is accepted. Finally, the density of states at the final energy E is modified by the scaling factor using ln Ω(E) → ln Ω(E) + ln f. Write a Wang-Landau Monte Carlo code to calculate the partition function and free energy per spin at different temperatures of a 50×50 spin lattice in the absence of an external field. Take the initial state of the lattice to be a randomly chosen set of spin values, and apply periodic boundary conditions to the lattice.

16.9. Write a simple M(RT)² Monte Carlo program to sample the distribution function of a two-dimensional Ising model in the presence of an external field h. Observe the behavior of your algorithm for temperatures $T > T_c$ and $T < T_c$ at different field strengths. For each case, calculate and plot the spin-spin correlation function $\langle\sigma_i\sigma_j\rangle - \langle\sigma_i\rangle\langle\sigma_j\rangle$.

17 Introduction to machine learning in statistical mechanics

17.1

Machine learning in statistical mechanics: What and why?

The term artificial intelligence (AI) brings to mind the creation of computer systems capable of mimicking the decision-making and problem-solving tasks of a human mind by emulating its thought patterns. In a broad sense, machine learning is a pathway to AI that uses statistical models and “training” algorithms to take in data, learn insights and patterns in the data, and apply that learning to make new predictions without additional input or programming. Viewed this way, it follows that machine learning is best poised to provide reliable predictions when data are plentiful. For example, online vendors track shoppers’ purchases and employ machine learning models to learn their preferences so that they can recommend to shoppers specific items they might want to purchase in the future; however, the very notion of “preference” implies a pattern in a shopper’s purchases, which can only be discerned if a large number of purchases can be analyzed for such patterns. Recommendations based only on one or a few purchases may or may not meet a shopper’s needs or wants.

In many applications, machine learning is employed to perform one of several important tasks; these include regression, classification, data clustering, feature extraction and engineering, and dimensionality reduction, tasks that are particularly useful in statistical mechanics. When molecular simulation approaches are applied to investigate a complex system, large amounts of data, such as time series, conformational samples, energies, free energies, distributions, and so forth, are generated and need to be analyzed. Most of the time, these data are of very high dimension and may contain complex patterns that are difficult to discern using simple statistical analysis tools. When this is the case, machine learning becomes a powerful tool for learning these patterns and predicting trends in a system that might not have been explicitly generated in a simulation.
It is also common for a simulation to yield incomplete data about a system or a particular thermodynamic or dynamical process. In such instances, machine learning can often fill in some of the missing information. In an enhanced conformational sampling simulation, for example, employing a method such as d-AFED/TAMD (Section 8.10) or replica-exchange MD (Section 7.5), a sparse sampling of points is produced as the simulation sweeps over a free-energy hypersurface. The density of points increases with the number of sweeps, and at some stage, there will be enough points to fit or “train” a regression model that can subsequently predict free energies of points not visited during the simulation while also providing a smooth closed-form representation of the surface (Schneider et al., 2017; Zhang et al., 2018; Wang et al., 2021). The number of sampled points needed to generate an accurate model may be far fewer than that needed to reach full convergence in a direct simulation.

In studying complex processes for which a suitable reaction coordinate might not be known a priori, a classification model can be used to look for patterns in local structural information and use this information to create an order parameter capable of parameterizing a pathway from one conformational basin to another. This has been done, for example, to study a solid-solid phase transition in a metal (Rogal et al., 2019), in which environmental descriptors are used as a means of classifying individual atoms as belonging to one phase or another; the classification model ultimately allows free-energy barriers and mechanisms of the phase transition to be determined. Such an approach could also be applied to the prediction of pathways of protein or nucleic acid folding (Senior et al., 2019; AlQuraishi, 2019). Finally, in the field of active matter, where systems are driven to desired patterns and modes of behavior by supplying them with energy and applying external stimuli, given an appropriate amount of data and a rule-discovery algorithm, machine learning can be leveraged to identify the underlying interactions and governing equations that lead to the emergence of such complex behavior (Cichos et al., 2020).

The purpose of this chapter is to introduce basic concepts in machine learning, including a selection of machine learning models particularly useful in statistical mechanics, and to show how these models can be applied to solve a range of specific statistical mechanical problems. It is important to note that machine learning methods, in and of themselves, are not likely to reveal any new physical insights.
However, because they are able to predict the outcomes of computationally intensive tasks with significantly greater efficiency than is possible without machine learning, their application can lead to such new insights by allowing time and length scales to be significantly increased. Readers interested in a more comprehensive examination of the machine learning field, which has evolved into an enormous discipline, or a broader survey of the landscape of machine learning models, are referred to, for example, C. M. Bishop, Pattern Recognition and Machine Learning or Hastie et al., The Elements of Statistical Learning. Additional references include Neural Networks and Computing by T. W. S. Chow and S.-Y. Cho, and Understanding Machine Learning by S. Shalev-Shwartz and S. Ben-David. Mathematical underpinnings of machine learning are described in D. Simovici’s Mathematical Analysis for Machine Learning and Data Mining.

17.2

Three key probability distributions

In our discussion of algorithms for training machine learning models, we will make use of several standard probability distribution functions. These are the Gaussian, Bernoulli, and categorical distributions, each of which we discuss in this section. The Gaussian, or normal, distribution is a familiar and widely used function that models the distribution of continuous variables. Thinking back to Section 7.2, the central limit theorem guarantees the convergence of a Monte Carlo calculation by establishing that estimators of averages approach a Gaussian distribution about the true average. Given a continuous random variable x, the Gaussian distribution takes the form

\[ P_G(x; \mu, \sigma) = \frac{1}{(2\pi\sigma^2)^{1/2}}\, e^{-(x-\mu)^2/2\sigma^2}, \tag{17.2.1} \]

where µ and σ² are the mean and variance of the distribution. Here, µ determines the location of the peak of the distribution and σ determines its width, meaning that

\[ \langle (x-\mu)^2 \rangle = \int_{-\infty}^{\infty} (x-\mu)^2 P_G(x; \mu, \sigma)\, dx = \sigma^2. \tag{17.2.2} \]

The number of parameters needed to specify a Gaussian distribution of one variable is clearly two. If, instead of a single random variable, we have an n-dimensional vector x of random variables and a vector of mean values µ, the general form of the Gaussian distribution is

\[ P_G(\mathbf{x}; \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{n/2} [\det(\Sigma)]^{1/2}}\, e^{-(\mathbf{x}-\boldsymbol{\mu})^{\mathrm{T}} \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})/2}, \tag{17.2.3} \]

where Σ is an n × n matrix known as the covariance matrix, related to averages of the distribution by

\[ \langle (\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^{\mathrm{T}} \rangle = \Sigma. \tag{17.2.4} \]

Equation (17.2.4) makes clear that Σ is a symmetric matrix, which, therefore, has real eigenvalues. For eqn. (17.2.3) to be well defined, Σ must also be positive-definite, that is, all of its eigenvalues must be positive. In any case, the number of parameters needed to specify a Gaussian distribution of x is n(n + 1)/2 + n = n(n + 3)/2. However, from eqn. (17.2.4), we see that if the components of x are independent random variables, then Σ becomes a diagonal matrix, $\Sigma = \mathrm{diag}(\sigma_1^2\ \sigma_2^2\ \cdots\ \sigma_n^2)$, such that

\[ \langle (\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^{\mathrm{T}} \rangle_{ij} = \sigma_i^2 \delta_{ij}, \tag{17.2.5} \]

which leaves us with 2n parameters to specify the distribution. Finally, if the components of x are isotropic, then all of the $\sigma_i$ are the same, so that Σ reduces to Σ = σ²I, where I is the n × n identity matrix, which leaves just n + 1 parameters needed to specify the distribution.

The second distribution we will employ in our discussion of machine learning concerns random variables that can take on discrete values only. For starters, suppose x is a binary random variable that can take on the values 0 or 1. An example is a coin toss where we assign tails = 0 and heads = 1. Let ν be the probability that x = 1 so that 1 − ν is the probability that x = 0. Then, the probability distribution of x is a special case of a binomial distribution known as a Bernoulli distribution, which is given by

\[ P_B(x; \nu) = \nu^x (1-\nu)^{1-x}. \tag{17.2.6} \]

With the convention used here, it can be easily shown that $\langle x \rangle = \nu$ and $\langle x^2 \rangle - \langle x \rangle^2 = \nu(1-\nu)$. The Bernoulli distribution requires that just one parameter, ν, be specified.

Suppose, next, that x can take on n values, which, for simplicity, we take to be the integers 1, 2, 3, ..., n. An example is the roll of a six-sided die. The third distribution, called the categorical distribution, generalizes the Bernoulli distribution to treat such cases where n > 2. If the probabilities that x takes on each of n values are $\nu_1, ..., \nu_n \equiv \boldsymbol{\nu}$, with $\nu_1 + \nu_2 + \cdots + \nu_n = 1$, then the categorical distribution takes the form

\[ P_C(x; \boldsymbol{\nu}) = \prod_{i=1}^{n} \nu_i^{[x=i]}. \tag{17.2.7} \]

Here, the quantity [x = i] evaluates to 1 if x = i and to 0 if x ≠ i. It is straightforward to show that $\langle x \rangle = \sum_{i=1}^{n} i\nu_i$. An alternative formulation of the categorical distribution introduces an n-dimensional vector x having just one component with the value 1 and all other components with the value 0. There will be n such vectors, and if $\nu_i$ is the probability associated with the vector $\mathbf{x}_i$ whose single nonzero component is $x_i$, then the categorical distribution can be written as

\[ P_C(\mathbf{x}; \boldsymbol{\nu}) = \prod_{i=1}^{n} \nu_i^{x_i}. \tag{17.2.8} \]

With this definition, it can be shown that $\langle \mathbf{x} \rangle = \boldsymbol{\nu}$. In either formulation, the categorical distribution reduces to the Bernoulli distribution for binary random variables.
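The moments quoted in this section are easy to check by sampling. The following is a minimal NumPy sketch and is not part of the original text: the parameter values, the sample size, and all variable names are arbitrary choices made only for illustration. Note that NumPy labels categories starting from 0, so 1 is added to recover the integer values 1, ..., n used in eqn. (17.2.7).

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200_000                      # illustrative sample size

# Gaussian distribution, eqn. (17.2.1): mean mu and variance sigma^2
mu, sigma = 1.5, 0.7
x_gauss = rng.normal(mu, sigma, n_samples)
gauss_var = np.mean((x_gauss - mu) ** 2)         # estimates eqn. (17.2.2)

# Bernoulli distribution, eqn. (17.2.6): x = 1 with probability nu
nu = 0.3
x_bern = (rng.random(n_samples) < nu).astype(float)
bern_mean = x_bern.mean()                        # should approach nu
bern_var = x_bern.var()                          # should approach nu * (1 - nu)

# Categorical distribution with n = 4 outcomes
nu_vec = np.array([0.1, 0.2, 0.3, 0.4])
labels = rng.choice(len(nu_vec), size=n_samples, p=nu_vec)   # values 0..n-1
cat_mean_int = (labels + 1).mean()               # integer form: sum_i i * nu_i
x_onehot = np.eye(len(nu_vec))[labels]           # one-hot form, eqn. (17.2.8)
cat_mean_vec = x_onehot.mean(axis=0)             # should approach nu_vec

print(gauss_var, bern_mean, bern_var, cat_mean_int, cat_mean_vec)
```

With these assumed parameters, the sampled averages should approach σ² = 0.49, ν = 0.3, ν(1 − ν) = 0.21, Σ iν_i = 3.0, and the vector ν itself, up to statistical error.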

17.3

Simple linear regression as a case study

In order to introduce the basic concepts of machine learning for a regression problem, we will use the “toy” example of simple linear regression, but we will discuss it within the framework and nomenclature of machine learning. Suppose we have a set of data points $(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)$ that we assume lie on a line $y_i = w_0 + w_1 x_i$, where $w_0$ and $w_1$ are the intercept and slope of the line, respectively. If the points come from a measurement of some kind and have associated measurement error, they might not satisfy a perfect linear relation. Therefore, our task is to find an optimal linear model that best fits the given data. To this end, we take as a data model the linear form

\[ y(x, \mathbf{w}) = w_0 + w_1 x, \tag{17.3.1} \]

where $\mathbf{w} = (w_0, w_1)$ is a two-dimensional vector of parameters whose specific values determine the optimal model. Figure 17.1 illustrates how the data might be distributed around the assumed model in eqn. (17.3.1). To determine w, we seek an objective function, expressing the deviation of the input data from the target model in eqn. (17.3.1); this objective function, when minimized with respect to w, yields an optimal realization of the model capable of predicting any new value of y given an input x. Optimization of the linear model is achieved by a least-squares minimization procedure, in which the average squared distance between $y_i$ and the prediction of $y_i$ by the model in eqn. (17.3.1), i.e., $y(x_i, \mathbf{w})$, is minimized with respect to w. This procedure defines an objective function, which takes the form

\[ E(\mathbf{w}) = \frac{1}{N} \sum_{i=1}^{N} |y_i - y(x_i, \mathbf{w})|^2. \tag{17.3.2} \]

Fig. 17.1 Illustration of the process of linear regression: Seven points are distributed about the line $y = w_0 + w_1 x$, shown as a solid line. Dashed lines represent distances from each point $(x_i, y_i)$ to the line $y = w_0 + w_1 x$ at $x = x_i$.

The function E(w) in eqn. (17.3.2) has various names throughout the machine learning literature; depending on the source, it is referred to as the error function, cost function, or loss function. Here, we will refer to it as the loss function, which is the most commonly used terminology. The optimal value of w is that which minimizes E(w), i.e., it is the solution of the minimization problem

\[ \frac{\partial E}{\partial \mathbf{w}} = 0. \tag{17.3.3} \]

In fact, the solution of the minimization problem is ultimately independent of the 1/N prefactor in eqn. (17.3.2), so the choice of this prefactor, which makes E(w) an average squared distance, is arbitrary. If eqn. (17.3.1) is substituted into eqn. (17.3.2) to give

\[ E(w_0, w_1) = \frac{1}{N} \sum_{i=1}^{N} |y_i - w_0 - w_1 x_i|^2, \tag{17.3.4} \]

it becomes clear that eqn. (17.3.3), in this simple case, yields two equations $\partial E/\partial w_0 = 0$ and $\partial E/\partial w_1 = 0$ in the two unknowns $w_0$ and $w_1$. From eqn. (17.3.4), these two conditions yield the coupled equations

\[ \begin{aligned} \langle y \rangle - w_0 - w_1 \langle x \rangle &= 0 \\ \langle xy \rangle - w_0 \langle x \rangle - w_1 \langle x^2 \rangle &= 0, \end{aligned} \tag{17.3.5} \]

where $\langle \cdots \rangle$ indicates an average over the N data points. The solution to these equations is the familiar result

\[ \begin{aligned} w_1 &= \frac{\langle xy \rangle - \langle x \rangle \langle y \rangle}{\langle x^2 \rangle - \langle x \rangle^2} \\ w_0 &= \langle y \rangle - w_1 \langle x \rangle. \end{aligned} \tag{17.3.6} \]
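As an illustration of eqn. (17.3.6), the closed-form slope and intercept can be evaluated directly from averages over the data. The sketch below is ours rather than the text's: the synthetic data, the noise level, and the function name `fit_line` are assumptions made only for the example, and NumPy's `polyfit` is used purely as an independent cross-check.

```python
import numpy as np

def fit_line(x, y):
    """Return (w0, w1) from the closed-form solution in eqn. (17.3.6)."""
    w1 = (np.mean(x * y) - np.mean(x) * np.mean(y)) / (np.mean(x**2) - np.mean(x) ** 2)
    w0 = np.mean(y) - w1 * np.mean(x)
    return w0, w1

# Synthetic data (assumed): a line with intercept 2.0 and slope 0.5 plus noise
rng = np.random.default_rng(1)
N = 50
x = rng.uniform(0.0, 10.0, N)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3, N)

w0, w1 = fit_line(x, y)

# Independent check: np.polyfit returns coefficients from highest degree down
w1_ref, w0_ref = np.polyfit(x, y, 1)

# The coupled conditions of eqn. (17.3.5) should vanish at the solution
cond_a = np.mean(y) - w0 - w1 * np.mean(x)
cond_b = np.mean(x * y) - w0 * np.mean(x) - w1 * np.mean(x**2)
print(w0, w1, cond_a, cond_b)
```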

At this point, we note that the minimization of the loss function in eqn. (17.3.2) is tantamount to maximizing a Gaussian probability distribution of the form

\[ P_G(\mathbf{y}; \mathbf{w}, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{N} |y_i - y(x_i, \mathbf{w})|^2 \right], \tag{17.3.7} \]

since

\[ E(\mathbf{w}) = -\frac{2\sigma^2}{N} \ln P_G(\mathbf{y}; \mathbf{w}, \sigma^2) - \sigma^2 \ln\left(2\pi\sigma^2\right), \tag{17.3.8} \]

where $\mathbf{y} = (y_1, y_2, ..., y_N)$ is a vector of the N y-values. Therefore, the optimal parameters of our data model are those that maximize a Gaussian probability distribution between the input data and the chosen model of that data. Once the parameter vector $\mathbf{w}_{\min}$ that minimizes the loss function is determined, eqn. (17.3.2) can be used to compute a root-mean-square error (RMSE) $\epsilon_{\mathrm{RMSE}}$ via $\epsilon_{\mathrm{RMSE}} = \sqrt{E(\mathbf{w}_{\min})}$. However, a more commonly used error is the mean absolute error (MAE), based on an L1 norm and given by

\[ \epsilon_{\mathrm{MAE}} = \frac{1}{N} \sum_{i=1}^{N} |y_i - y(x_i, \mathbf{w}_{\min})|. \tag{17.3.9} \]
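The relation between the loss function and the Gaussian log-probability, together with the two error measures, can be verified numerically. The following sketch assumes synthetic data and an arbitrary value of σ² (the identity in eqn. (17.3.8) holds for any σ² > 0); the variable names are ours.

```python
import numpy as np

# Synthetic data (assumed) scattered about a line
rng = np.random.default_rng(2)
N = 40
x = rng.uniform(-1.0, 1.0, N)
y = 1.0 - 3.0 * x + rng.normal(0.0, 0.2, N)

# Least-squares minimizer w_min; polyfit returns (slope, intercept)
w1, w0 = np.polyfit(x, y, 1)
residuals = y - (w0 + w1 * x)

loss = np.mean(residuals**2)           # E(w_min), eqn. (17.3.2)
rmse = np.sqrt(loss)                   # root-mean-square error
mae = np.mean(np.abs(residuals))       # mean absolute error, eqn. (17.3.9)

# Logarithm of the Gaussian in eqn. (17.3.7) for an assumed variance
sigma2 = 0.04
log_p = -0.5 * N * np.log(2.0 * np.pi * sigma2) - np.sum(residuals**2) / (2.0 * sigma2)

# Right-hand side of eqn. (17.3.8); it should reproduce the loss
loss_from_log_p = -(2.0 * sigma2 / N) * log_p - sigma2 * np.log(2.0 * np.pi * sigma2)
print(loss, loss_from_log_p, rmse, mae)
```

Because the logarithm is monotonic and the remaining terms do not depend on w, the w that minimizes `loss` is the same w that maximizes `log_p`. The MAE can never exceed the RMSE computed over the same points.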

In order to establish a robust test of the quality of a data model, we can divide the available data into two subsets: a training set of size $N_{\mathrm{train}}$ and a test set of size $N_{\mathrm{test}}$. The size of the test set is assumed to remain fixed while the number of points in the training set is allowed to vary. Thus, when $N_{\mathrm{train}} + N_{\mathrm{test}} < N$, the rest of the data are held in reserve to augment the size of the training set. Initially, the training set size can be a small fraction of the total available data and is used to minimize the loss function and determine optimal parameters. Then, the test set is used to evaluate the accuracy of the model via an RMSE or MAE, evaluated over the points in the test set. If the error over the test set is too large, more points can be added to the training set from the reserve set and the process repeated until the magnitude of the error is deemed acceptable. What is meant by an “acceptable” error is typically one that is lower than the intrinsic error in the original data. For example, if a data set is within chemical accuracy of 1 kcal/mol, then the prediction error in the data model should be lower than 1 kcal/mol. A plot of the RMSE or MAE versus training set size, known as a learning curve, will reveal how well the model learns from the training data. Once an acceptable error is reached, the model is considered to be trained, and it can subsequently be used to predict new y values from input x values that are new to the model.

In our simple linear regression example, we assumed that x is a scalar variable. Suppose, instead, that we have N points of the form $(y_1, \mathbf{x}_1), (y_2, \mathbf{x}_2), ..., (y_N, \mathbf{x}_N)$, where $\mathbf{x}_i$ is an n-dimensional vector and the N points, therefore, exist in an (n + 1)-dimensional space. If we again assume a linear relation between y and x, then the generalization of the linear data model in eqn. (17.3.1) becomes

\[ y(\mathbf{x}, \mathbf{w}) = w_0 + \mathbf{w} \cdot \mathbf{x}, \tag{17.3.10} \]

where w is an n-dimensional vector of parameters $\mathbf{w} = (w_1, ..., w_n)$. As with the scalar linear regression model, an analytical solution for the n + 1 parameters $(w_0, \mathbf{w})$ can be obtained by direct minimization of the loss function in eqn. (17.3.2). The analytical solution, in particular, is expressible in terms of the inverse of the covariance matrix $\mathrm{C} \equiv \langle \mathbf{x}\mathbf{x}^{\mathrm{T}} \rangle - \langle \mathbf{x} \rangle \langle \mathbf{x} \rangle^{\mathrm{T}}$ of the data vector x (see Problem 17.3).

As a final point, we note that, depending on the nature of the input data, certain points may overly influence the regression. In particular, if some points are outliers with respect to the posited model, a small subset of parameters might be favored as the learning procedure attempts to fit these points. This is generally undesirable and can lead to overfitting of outlier data. A control measure in the learning procedure can be introduced that prevents undue bias of certain data points. Consider modifying the loss function so that it reads

\[ E(w_0, \mathbf{w}, \lambda) = \frac{1}{N} \sum_{i=1}^{N} (y_i - w_0 - \mathbf{w} \cdot \mathbf{x}_i)^2 + \frac{\lambda}{2} \mathbf{w} \cdot \mathbf{w}. \tag{17.3.11} \]

The last term (λ/2)w · w in eqn. (17.3.11) is known as a regularizer or ridge term and involves a new parameter λ, which needs to be included in the optimization of the model. However, if we were to include λ into the set (w0 , w) of optimizable parameters, then the optimization problem would become a nonlinear one, which is more difficult to solve analytically. Therefore, regularization parameters, such as λ, are typically chosen a priori before the optimization is performed. Such a parameter is known as a hyperparameter. The choice of λ is governed by its ability to lower the overall error across the test set compared to what the error would be if λ = 0. Regularization terms need not be restricted to quadratic forms. Other choices include a linear regularizer of the form λ′ |w|, known as a lasso term, or a combination of linear and quadratic regularizers, λ|w|2 /2 + λ′ |w|, known as an elastic net term. The elastic net regularizer requires determination of two hyperparameters, λ and λ′ . Depending on the nature of the data, a lasso or elastic net regularizer might be a better discriminator of the most relevant parameters (w0 , w) weighting the input values x. Choosing hyperparameters in a machine learning model can be accomplished using a robust scheme known as k-fold cross validation. In this approach, the training data are divided into k subsets of equal size; of these subsets, (k − 1) are used for hyperparameter searching, and the remaining subset is used to validate the choice. This process is repeated such that each of the k subsets acts as the validation set and the validation error is retained for each of the k searches. Ultimately, the hyperparameters that give the lowest error can be selected, or the hyperparameters can be averaged over the k searches (in which case, the associated error will be an average error over the k individual errors). As a final assessment, the quality of the choice of hyperparameters should be evaluated against the test dataset. 
The use of k-fold cross validation ensures that the hyperparameter search is not biased toward a single validation set.
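The ridge loss in eqn. (17.3.11) and the k-fold selection of λ can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not a prescribed implementation: setting the gradient of eqn. (17.3.11) to zero gives (C + (λ/2)I)w = ⟨xy⟩ − ⟨x⟩⟨y⟩ and w0 = ⟨y⟩ − w·⟨x⟩, which is what the hypothetical helper `fit_ridge` solves. The synthetic data, the candidate λ values, the choice k = 5, and the use of the RMSE as the validation error are all our choices.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Minimize eqn. (17.3.11); returns (w0, w)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    C = Xc.T @ Xc / len(y)               # covariance matrix <x x^T> - <x><x>^T
    c = Xc.T @ yc / len(y)               # <x y> - <x><y>
    w = np.linalg.solve(C + 0.5 * lam * np.eye(X.shape[1]), c)
    return y_mean - w @ x_mean, w

def rmse(X, y, w0, w):
    return np.sqrt(np.mean((y - w0 - X @ w) ** 2))

def kfold_select(X, y, lams, k=5, seed=0):
    """Return the lambda with the lowest validation RMSE averaged over k folds."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    mean_err = []
    for lam in lams:
        errs = []
        for i in range(k):
            val = folds[i]                                   # validation subset
            trn = np.concatenate([folds[j] for j in range(k) if j != i])
            w0, w = fit_ridge(X[trn], y[trn], lam)
            errs.append(rmse(X[val], y[val], w0, w))
        mean_err.append(np.mean(errs))
    return lams[int(np.argmin(mean_err))], np.array(mean_err)

# Synthetic data (assumed): five inputs, two of which are irrelevant
rng = np.random.default_rng(3)
X = rng.normal(size=(120, 5))
y = 0.3 + X @ np.array([1.0, -2.0, 0.0, 0.0, 0.5]) + rng.normal(0.0, 0.5, 120)

X_train, y_train = X[:90], y[:90]        # training data used for the k-fold search
X_test, y_test = X[90:], y[90:]          # test set, used only for the final check

lams = np.array([0.0, 1e-3, 1e-2, 1e-1, 1.0])
best_lam, cv_errors = kfold_select(X_train, y_train, lams)
w0, w = fit_ridge(X_train, y_train, best_lam)
test_rmse = rmse(X_test, y_test, w0, w)
print(best_lam, cv_errors, test_rmse)
```

Here the hyperparameter with the lowest average validation error is selected, which is one of the two options described above; the test set plays no role in the search and is used only for the final assessment.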

17.4

Kernel methods

Linear regression methods such as the n-dimensional generalization in eqn. (17.3.10) are special cases of a more general approach to regression problems known as kernel regression. In kernel regression methods, the vector x in eqn. (17.3.10) is replaced by a vector function $\Phi : \mathbb{R}^n \to \mathbb{R}^{n_f}$, where $\mathbb{R}^{n_f}$ is a feature space of dimension $n_f \geq n$. More specifically, $n_f$ is the number of parameters needed to define the data model. For linear regression, it is easy to see that $n_f = n$ and Φ(x) = x. The more general expression for the data model is

\[ y(\mathbf{x}, \mathbf{w}) = w_0 + \mathbf{w} \cdot \Phi(\mathbf{x}), \tag{17.4.1} \]

where now dim(w) = $n_f$ and Φ(x) is a nonlinear function of x. There is some vagueness to the idea of a feature space of dimension $n_f$, but the good news is that neither $n_f$ nor Φ(x) needs to be known explicitly, as we will show shortly. The power of eqn. (17.4.1) is that it serves as a framework for obtaining different types of kernel methods. In this context, a kernel or kernel matrix is an N × N matrix defined by

\[ K_{ij} \equiv \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j). \tag{17.4.2} \]

In addition, the parameters w are never explicitly defined but are treated as functions of the training data, which take the general form $\mathbf{w} = \sum_{i=1}^{N} g(\mathbf{x}_i)$. This is called a support-vector expansion. A framework that allows us to exploit these ideas is constrained optimization of the data model. That is, we define a set of N quantities $\varepsilon_i$, which we will require to equal $y_i - y(\mathbf{x}_i, \mathbf{w})$ via the constrained optimization procedure. To this end, we start with the quadratic loss function

\[ E(\mathbf{w}) = \frac{1}{2\lambda} \sum_{i=1}^{N} |\varepsilon_i|^2 + \frac{1}{2} \mathbf{w} \cdot \mathbf{w}, \tag{17.4.3} \]

which we minimize subject to the constraint that $\varepsilon_i = y_i - y(\mathbf{x}_i, \mathbf{w})$. Here, λ is a hyperparameter similar to a regularization parameter. Since i = 1, ..., N, we have N constraints, which we can enforce by introducing N Lagrange multipliers $\alpha_i$ and an extended loss function

\[ \tilde{E}(\mathbf{w}) = \frac{1}{2\lambda} \sum_{i=1}^{N} |\varepsilon_i|^2 + \frac{1}{2} \mathbf{w} \cdot \mathbf{w} - \sum_{i=1}^{N} \alpha_i \left( \varepsilon_i - y_i + w_0 + \mathbf{w} \cdot \Phi(\mathbf{x}_i) \right). \tag{17.4.4} \]

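In eqn. (17.4.4), $\varepsilon_i$, $w_0$, and w are all treated as optimization parameters. The minimization conditions then become

\[ \begin{aligned} \frac{\partial \tilde{E}}{\partial w_0} &= -\sum_{i=1}^{N} \alpha_i = 0 \\ \frac{\partial \tilde{E}}{\partial \mathbf{w}} &= \mathbf{w} - \sum_{i=1}^{N} \alpha_i \Phi(\mathbf{x}_i) = 0 \\ \frac{\partial \tilde{E}}{\partial \varepsilon_i} &= \frac{1}{\lambda} \varepsilon_i - \alpha_i = 0. \end{aligned} \tag{17.4.5} \]

From these equations, we see that $\sum_{i=1}^{N} \alpha_i = 0$, $\mathbf{w} = \sum_{i=1}^{N} \alpha_i \Phi(\mathbf{x}_i)$, and $\varepsilon_i = \lambda\alpha_i$. Substituting these into the constraint condition $\varepsilon_j - y_j + w_0 + \mathbf{w} \cdot \Phi(\mathbf{x}_j) = 0$, we obtain an expression for each $y_j$ value

\[ y_j = \sum_{i=1}^{N} \alpha_i \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j) + w_0 + \lambda\alpha_j. \tag{17.4.6} \]

Equations (17.4.6), together with the condition $\sum_{i=1}^{N} \alpha_i = 0$, can be written in matrix form as

\[ \begin{pmatrix} 0 & \mathbf{1}^{\mathrm{T}} \\ \mathbf{1} & \mathrm{K} + \lambda\mathrm{I} \end{pmatrix} \begin{pmatrix} w_0 \\ \boldsymbol{\alpha} \end{pmatrix} = \begin{pmatrix} 0 \\ \mathbf{y} \end{pmatrix}, \tag{17.4.7} \]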

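where K is the N × N kernel matrix, y is an N-dimensional vector $(y_1\ y_2\ \cdots\ y_N)^{\mathrm{T}}$, $\mathbf{1}^{\mathrm{T}}$ is an N-dimensional row vector of 1s, α is the vector $(\alpha_1\ \alpha_2\ \cdots\ \alpha_N)^{\mathrm{T}}$, and I is the N × N identity matrix. A number of different regression methods emerge from eqn. (17.4.6). For example, if we set $w_0 = 0$, we obtain the kernel ridge-regression model for a multidimensional function y(x). In this model, the Lagrange multipliers $\alpha_1, ..., \alpha_N$ become the optimization parameters, and the solution of eqn. (17.4.7) for the vector α is

\[ \boldsymbol{\alpha} = (\mathrm{K} + \lambda\mathrm{I})^{-1} \mathbf{y}. \tag{17.4.8} \]

The data model corresponding to this solution takes the form

\[ y(\mathbf{x}, \boldsymbol{\alpha}) = \sum_{i=1}^{N} \alpha_i K(\mathbf{x}_i, \mathbf{x}), \tag{17.4.9} \]

where the kernel function $K(\mathbf{x}_i, \mathbf{x}) \equiv \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x})$ gives the kernel matrix $K_{ij}$ via $K_{ij} = K(\mathbf{x}_i, \mathbf{x}_j)$. In fact, eqn. (17.4.8) can be derived by substituting the kernel ridge-regression data model into the least-squares loss function

\[ E(\boldsymbol{\alpha}) = \sum_{i=1}^{N} (y_i - y(\mathbf{x}_i, \boldsymbol{\alpha}))^2 + \lambda \boldsymbol{\alpha}^{\mathrm{T}} \mathrm{K} \boldsymbol{\alpha}, \tag{17.4.10} \]

where we see that λ becomes the ridge parameter. On the other hand, if we retain the parameter $w_0$ and rename it $\alpha_0$ for notational uniformity, then the corresponding data model, known as a least-squares support-vector machine model, takes the form

\[ y(\mathbf{x}, \boldsymbol{\alpha}) = \sum_{i=1}^{N} \alpha_i K(\mathbf{x}_i, \mathbf{x}) + \alpha_0. \tag{17.4.11} \]

Optimizing the corresponding loss function leads to the solution

\[ \begin{pmatrix} \alpha_0 \\ \boldsymbol{\alpha} \end{pmatrix} = \begin{pmatrix} 0 & \mathbf{1}^{\mathrm{T}} \\ \mathbf{1} & \mathrm{K} + \lambda\mathrm{I} \end{pmatrix}^{-1} \begin{pmatrix} 0 \\ \mathbf{y} \end{pmatrix}. \tag{17.4.12} \]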
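To make the role of the feature map concrete, the following sketch builds the kernel matrix from an explicit, deliberately simple feature map and checks two statements made above: the kernel ridge-regression weights in eqn. (17.4.8) reproduce the same w as ridge regression carried out directly in feature space, and the solution of the block system in eqn. (17.4.12) satisfies Σα_i = 0. The feature map Φ(x) = (x, x²), the synthetic data, and the value of λ are hypothetical choices for illustration only.

```python
import numpy as np

def feature_map(x):
    """Hypothetical explicit feature map Phi(x) = (x, x^2) for scalar inputs."""
    return np.column_stack([x, x**2])

# Synthetic data (assumed)
rng = np.random.default_rng(4)
N, lam = 30, 0.1
x = rng.uniform(-2.0, 2.0, N)
y = 0.5 * x**2 - x + rng.normal(0.0, 0.1, N)

Phi = feature_map(x)                     # N x n_f matrix with rows Phi(x_i)
K = Phi @ Phi.T                          # kernel matrix, eqn. (17.4.2)

# Kernel ridge regression (w0 = 0), eqn. (17.4.8)
alpha = np.linalg.solve(K + lam * np.eye(N), y)
w_dual = Phi.T @ alpha                   # w = sum_i alpha_i Phi(x_i)

# The same model from ridge regression performed directly in feature space
w_primal = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

# Least-squares support-vector machine: block system of eqns. (17.4.7)/(17.4.12)
A = np.zeros((N + 1, N + 1))
A[0, 1:] = 1.0
A[1:, 0] = 1.0
A[1:, 1:] = K + lam * np.eye(N)
solution = np.linalg.solve(A, np.concatenate([[0.0], y]))
alpha0, alpha_svm = solution[0], solution[1:]

def predict(x_new):
    """Evaluate the data model of eqn. (17.4.11) at new inputs."""
    k_new = feature_map(x_new) @ Phi.T   # K(x_new, x_i) for every training point
    return k_new @ alpha_svm + alpha0

print(w_dual, w_primal, alpha_svm.sum(), predict(np.array([0.0, 1.0])))
```

Only the matrix K and the products K(x_new, x_i) enter the predictions, which is why the explicit form of Φ can be dispensed with.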


Note that in the kernel ridge and support-vector machine models presented here, the function $\boldsymbol{\Phi}(\mathbf{x})$ has disappeared. As we noted immediately below eqn. (17.4.1), we do not need to know the form of $\boldsymbol{\Phi}(\mathbf{x})$. Rather, we can bypass specification of $\boldsymbol{\Phi}(\mathbf{x})$ and introduce a kernel function $K(\mathbf{x}_i, \mathbf{x})$ directly. This replacement is referred to as the “kernel trick”, which allows flexibility in how we specify the kernel function. When leveraging the kernel trick, a common choice is a Gaussian kernel

\[
K(\mathbf{x}_i, \mathbf{x}) = e^{-|\mathbf{x}_i - \mathbf{x}|^2/2\sigma^2}. \tag{17.4.13}
\]
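As a minimal sketch of how eqns. (17.4.8), (17.4.9), and (17.4.13) fit together, the following example trains a Gaussian kernel ridge regression model on a one-dimensional toy function. The target function $\sin x$, the training grid, the values of $\sigma$ and $\lambda$, and the function names are assumptions made purely for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """K(a, b) = exp(-|a - b|^2 / (2 sigma^2)) for all rows a of A and b of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def krr_fit(X, y, sigma, lam):
    """alpha = (K + lambda I)^{-1} y, eqn. (17.4.8)."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, sigma):
    """y(x) = sum_i alpha_i K(x_i, x), eqn. (17.4.9)."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Hypothetical training set: 25 samples of sin(x) on [0, 2 pi]
X_train = np.linspace(0.0, 2.0 * np.pi, 25)[:, None]
y_train = np.sin(X_train[:, 0])
sigma, lam = 1.0, 1.0e-6        # hyperparameters chosen by hand here

alpha = krr_fit(X_train, y_train, sigma, lam)

X_test = np.array([[1.0], [2.5], [4.0]])
y_pred = krr_predict(X_train, alpha, X_test, sigma)
print("predicted:", y_pred)
print("exact:    ", np.sin(X_test[:, 0]))
```

In practice $\sigma$ and $\lambda$ would be selected by a validation procedure rather than fixed by hand, and the $N \times N$ solve is the step whose cost and memory grow with the size of the training set.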

The Gaussian kernel option illustrates the advantage of the kernel trick, as eqn. (17.4.13) cannot be derived from a dot product $\boldsymbol{\Phi}(\mathbf{x})\cdot\boldsymbol{\Phi}(\mathbf{x}_i)$ of finite-dimensional feature vectors. When a Gaussian kernel is employed within the kernel ridge-regression model, for example, the method is known as Gaussian kernel ridge regression. In this model, $\sigma$ becomes a hyperparameter to be chosen along with the ridge parameter $\lambda$. Other kernel functions could be envisioned; however, the Gaussian kernel ridge regression model is both simple and widely applicable.

A downside of kernel methods is that very large data sets (large values of $N$) require kernel matrices with $N^2$ entries, which can lead to significant memory issues when the matrix needs to be stored and inverted. This issue raises the question of whether more compact and flexible data models might be possible, a topic that will be addressed in the next section, where neural network models are discussed.

Before leaving this section, we illustrate how kernel methods might be used in a statistical mechanical application. Suppose we have performed an enhanced sampling calculation to generate an $n$-dimensional free-energy surface $A(s_1, ..., s_n) \equiv A(\mathbf{s})$ using one of the techniques for generating high-dimensional free-energy surfaces discussed in Sections 8.10 and 8.11. The d-AFED/TAMD method, for example, generates global sweeps across the free-energy landscape, producing a scattering of $N$ points $\mathbf{s}_i$ and corresponding free-energy values $A_i \equiv A(\mathbf{s}_i)$. In the early phase of a run, the $N$ points are sparsely distributed over the surface, as illustrated in the left panel of Fig. 17.2. The point distribution might not be dense enough to reveal features of the surface to the naked eye. However, the points $(\mathbf{s}_i, A_i)$ can be used to train one of the regression models described in this section in order to fill in details of the surface not easily identifiable by inspection, allowing its features to be discerned with greater clarity. This is illustrated in the right panel of Fig. 17.2.

As the calculation is carried out further, more regions are sampled, the density of points $(\mathbf{s}_i, A_i)$ increases, and the model can be updated with additional training data. This is known as active learning. Active learning allows the features of the surface predicted by the kernel model to become sharper as the amount of training data increases. If a kernel ridge regression model is used, then at any stage in the simulation, the explicit representation of the free-energy surface $A(\mathbf{s})$ is

\[
A(\mathbf{s}) = \sum_{i=1}^{N} \alpha_i K(\mathbf{s}_i, \mathbf{s}) + \alpha_0, \tag{17.4.14}
\]

where $\mathbf{s}_i$, $i = 1, ..., N$ are the $N$ training points generated in the simulation and $K(\mathbf{s}_i, \mathbf{s})$ is the kernel function expressed in terms of the extended variables $\mathbf{s}$ that parameterize the marginal probability distribution and the free-energy surface. Equation (17.4.14) allows the free energy at any point $\mathbf{s}$ to be evaluated. If the kernel employed is a Gaussian kernel, then eqn. (17.4.14) becomes

\[
A(\mathbf{s}) = \sum_{i=1}^{N} \alpha_i e^{-|\mathbf{s}_i - \mathbf{s}|^2/2\sigma^2}. \tag{17.4.15}
\]

Fig. 17.2 Generation of a free-energy surface via machine learning. (Left panel) Points sampled in a short d-AFED/TAMD simulation of the alanine dipeptide. (Right panel) Free-energy surface generated using a kernel ridge regression model. Dark blue are regions of low free energy, red indicates regions of high free energy (reproduced with permission from Cendagorta et al., J. Phys. Chem. B 124, 3647 (2020), copyright American Chemical Society).

Training is performed by optimizing the loss function with a ridge term, as in eqn. (17.4.10). We defer discussion of learning curves and training protocols for this type of application until Section 17.7, where we will compare kernel methods to other types of machine learning models.

17.5 Neural networks

Neural networks are among the most powerful machine learning models. With their enormous flexibility in allowable architectures and alluring mathematical structure, neural networks can be constructed for a broad range of learning tasks, including speech and text recognition, image recognition and classification, customer preference analysis, risk management, materials and chemical compound design and property characterization, and computer simulation processing and enhancement. Here, we will not endeavor to describe all possible network architectures and types, as the list is simply too long; rather, we aim to introduce general concepts of neural networks and provide examples of simple architectures that are useful in statistical mechanical applications.

Neural network models are mathematical constructs, derived from neuronal patterns observed in the human brain, that take in input data and perform a series of transformations (nonlinear or linear) on the data in order to produce a desired output. One of the earliest examples of a neuron-based computational model is described in the work of Warren S. McCulloch and Walter Pitts (1943), who proposed a “logical calculus” of neuronal activity, relating the behavior of certain networks to specific psychological conditions based on assumptions about the inputs.¹ Fourteen years after the work of McCulloch and Pitts, an important theorem would be established that provided a mathematical foundation for many widely used modern neural network architectures. This theorem is the Kolmogorov superposition theorem (Kolmogorov, 1957) after the mathematician Andrey N. Kolmogorov (1903-1987). Following a number of subsequent refinements (Sprecher, 1965; Fridman, 1967; Lorentz, 2005), Kolmogorov’s theorem asserts that a continuous scalar function $y(x_1, ..., x_n)$ of $n$ variables $x_i \in [0, 1]$ can be expressed in terms of $2n+1$ monotonically increasing functions $\phi_q(x)$ of a single variable ($\phi_q : [0, 1] \to [0, 1]$) and a single continuous function $g(x)$ ($g : \mathbb{R} \to \mathbb{R}$) in the following way:

\[
y(x_1, ..., x_n) = \sum_{q=1}^{2n+1} g\left(\sum_{p=1}^{n} \lambda_p \phi_q(x_p)\right), \tag{17.5.1}
\]

where the coefficients $\lambda_1, ..., \lambda_n > 0$. While a detailed proof of Kolmogorov’s theorem is beyond the scope of this book (a sketch of the proof is outlined in the steps of Problem 17.17), an existence proof involves establishing that the function $g(x)$ exists; this can be done by constructing a series of approximants to $y(x_1, ..., x_n)$ based on the superposition principle and then showing that this series converges exactly to $y(x_1, ..., x_n)$ (Kahane, 1975). In 1987, R. Hecht-Nielsen connected Kolmogorov’s theorem to a type of neural network known as the feed-forward network (Hecht-Nielsen, 1987) to be discussed in this section. However, the feed-forward network construction we will describe is based on a corollary to the Kolmogorov theorem introduced by Vera Kurkova (1991). Kurkova’s corollary is a more flexible formulation of Kolmogorov’s theorem that allows the number of inner functions to be greater than $2n + 1$ while still guaranteeing that $y(x_1, ..., x_n)$ can be approximated to arbitrary accuracy. Kurkova’s restatement of Kolmogorov’s theorem is

\[
y(x_1, ..., x_n) = \sum_{q=1}^{m} g_q\left(\sum_{p=1}^{n} \psi_{qp}(x_p)\right), \tag{17.5.2}
\]

where $\psi_{qp}(x)$ are continuous monotonically increasing functions on $[0, 1]$, $g_q(x)$ is a continuous function, and $m > 0$ is an integer. The structure of a feed-forward neural network emerges by repeated application of eqn. (17.5.2), as we will now demonstrate.

In order to see how the mathematical structure of a neural network emerges from eqn. (17.5.2), note that the argument of $g_q$ is, itself, a function of $x_1, ..., x_n$ for each value of $q$. We denote this function as $h_q(x_1, ..., x_n)$. Applying Kurkova’s representation to $h_q$, we obtain

\[
h_q(x_1, ..., x_n) \equiv \sum_{p=1}^{n} \psi_{qp}(x_p) = \sum_{s=1}^{m'} \gamma_{qs}\left(\sum_{r=1}^{n} \chi_{sr}(x_r)\right), \tag{17.5.3}
\]

1 In fact, in 1943, there was already considerable activity in the biophysics community to establish a mathematical framework for neuronal networks. The novelty of the work of McCulloch and Pitts, in addition to involving a collaboration between a neurophysiologist and logician, is its use of logic and computation as a way to understand neural activity. For a deeper look at the work of McCulloch and Pitts, see the historical and contextual analysis of G. Piccinini (2004).


where $\gamma_{qs}(x)$ is a continuous function analogous to $g_q(x)$. Let us now choose $\chi_{sr}(x_r)$ to be

\[
\chi_{sr}(x_r) = w_{sr}^{(0)} x_r + a_{sr}, \tag{17.5.4}
\]

where $w_{sr}^{(0)}$ and $a_{sr}$ are constants. This function increases monotonically, as required, provided $w_{sr}^{(0)} > 0$. With this choice, we see that

\[
\sum_{r=1}^{n} \chi_{sr}(x_r) = \sum_{r=1}^{n} w_{sr}^{(0)} x_r + \sum_{r=1}^{n} a_{sr} \equiv \sum_{r=1}^{n} w_{sr}^{(0)} x_r + w_{s0}^{(0)}. \tag{17.5.5}
\]

We then set

\[
\gamma_{qs}(x) = w_{qs}^{(1)} h(x) + b_{qs}, \tag{17.5.6}
\]

where $h(x)$ is a continuous function, about which we will have more to say later in this section. Substituting eqn. (17.5.6) into eqn. (17.5.3), we obtain

\[
\sum_{s=1}^{m'} \gamma_{qs}(x) = \sum_{s=1}^{m'} w_{qs}^{(1)} h(x) + \sum_{s=1}^{m'} b_{qs} \equiv \sum_{s=1}^{m'} w_{qs}^{(1)} h(x) + w_{q0}^{(1)}. \tag{17.5.7}
\]

Finally, we set $g_q(x) = w_q^{(2)} h(x) + c_q$. If we now substitute eqns. (17.5.3) through (17.5.7) into eqn. (17.5.2), we obtain

\[
y(x_1, ..., x_n) = \sum_{q=1}^{m} w_q^{(2)} h\left(\sum_{s=1}^{m'} w_{qs}^{(1)} h\left(\sum_{r=1}^{n} w_{sr}^{(0)} x_r + w_{s0}^{(0)}\right) + w_{q0}^{(1)}\right) + w^{(2)}, \tag{17.5.8}
\]

where $w^{(2)} = \sum_{q=1}^{m} c_q$. If we iterate the Kurkova theorem once again, we obtain

\[
y(x_1, ..., x_n) = \sum_{q=1}^{m} w_q^{(3)} h\left(\sum_{s=1}^{m'} w_{qs}^{(2)} h\left(\sum_{t=1}^{m''} w_{st}^{(1)} h\left(\sum_{r=1}^{n} w_{tr}^{(0)} x_r + w_{t0}^{(0)}\right) + w_{s0}^{(1)}\right) + w_{q0}^{(2)}\right) + w^{(3)}. \tag{17.5.9}
\]

Equations (17.5.8) and (17.5.9) are the mathematical representations of two-hidden-layer and three-hidden-layer feed-forward neural networks, respectively. The term “hidden layer” will be explained shortly. The parameters $w_{p0}^{(l)}$ are known as biases. The operational interpretation of the feed-forward neural network in eqn. (17.5.9) is as follows: We input a set of values for the $n$ variables $x_1, ..., x_n$. These are then transformed according to the linear function $w_{tr}^{(0)} x_r + w_{t0}^{(0)}$. The result is fed into the nonlinear function $w_{st}^{(1)} h(x) + w_{s0}^{(1)}$, and the result of this transformation is fed into the function $w_{qs}^{(2)} h(x) + w_{q0}^{(2)}$. This result is fed into the function $w_q^{(3)} h(x) + w^{(3)}$ to produce the output $y(x_1, ..., x_n)$. The procedure is illustrated in Fig. 17.3. In this figure, the network is shown as a mathematical graph in which the set of nodes in the layer at the left represents the inputs $x_1, ..., x_n$. These nodes are then fully connected to a second set of nodes that represents the first transformation using the function $h(x)$. These nodes are fully connected to the next set, and so on, until the final output node represents the output $y(x_1, ..., x_n)$. The quantities $z_\sigma^{(j)}$ in Fig. 17.3 are defined as follows:

\[
\begin{aligned}
z_t^{(1)} &= h\left(\sum_{r=1}^{n} w_{tr}^{(0)} x_r + w_{t0}^{(0)}\right) \\
z_s^{(2)} &= h\left(\sum_{t=1}^{m''} w_{st}^{(1)} z_t^{(1)} + w_{s0}^{(1)}\right) \\
z_q^{(3)} &= h\left(\sum_{s=1}^{m'} w_{qs}^{(2)} z_s^{(2)} + w_{q0}^{(2)}\right).
\end{aligned} \tag{17.5.10}
\]

Fig. 17.3 Feed-forward neural network with three hidden layers represented as a mathematical graph. Quantities $z_\sigma^{(j)}$ are defined in eqn. (17.5.10). Each set of parameters $\{w^{(K)}\}$ determines weights of connections between hidden layers (shown with grey shading).
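The layer-by-layer evaluation described above can be sketched in a few lines. The example below assumes the nested structure of eqn. (17.5.9) with a bias at every hidden layer; the layer sizes, the random parameter values, the choice $h = \tanh$, and the function name `forward` are illustrative assumptions only.

```python
import numpy as np

def forward(x, hidden, w_out, b_out, h=np.tanh):
    """Evaluate a feed-forward network with the nested structure of eqn. (17.5.9).

    hidden : list of (W, b) pairs, one per hidden layer, ordered from input to output
    w_out  : weights of the final linear sum
    b_out  : bias added to the output
    """
    z = np.asarray(x, dtype=float)
    for W, b in hidden:
        z = h(W @ z + b)            # one hidden layer: linear map, then activation
    return float(w_out @ z + b_out)

rng = np.random.default_rng(0)
n, m2, m1, m = 4, 5, 3, 2           # n inputs; m'', m', m hidden nodes (arbitrary sizes)
hidden = [
    (rng.normal(size=(m2, n)), rng.normal(size=m2)),    # w^(0) and its biases
    (rng.normal(size=(m1, m2)), rng.normal(size=m1)),   # w^(1) and its biases
    (rng.normal(size=(m, m1)), rng.normal(size=m)),     # w^(2) and its biases
]
w_out, b_out = rng.normal(size=m), 0.1                  # w^(3)_q and w^(3)

x = rng.uniform(size=n)             # inputs in [0, 1]
y = forward(x, hidden, w_out, b_out)
print("network output:", y)
```

Writing each layer as a matrix-vector product is what makes networks with many nodes inexpensive to evaluate; the intermediate vectors `z` are the hidden-layer quantities of the graph representation.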

The resemblance of the graph in Fig. 17.3 to the connections between neurons in the brain has led to the use of the neural network moniker for the models in eqns. (17.5.8) and (17.5.9). Since neurons are activated when presented with input stimuli, the functions $h(x)$ are known as activation functions. The edges, or connections, in the graph denote the various parameters $w_{ij}^{(K)}$. These parameters are determined in the training phase by optimizing a loss function. What happens to the data in the layers that contain the activation functions $h(x)$ is, in some sense, hidden from the user of the feed-forward network, as these transformations are performed automatically by the network. For this reason, these layers are called hidden layers. The numbers $m''$, $m'$, and $m$ determine the numbers of nodes in the first, second, and third hidden layers, respectively. The input layer must contain $n$ nodes, one for each input value, and, in the example illustrated in Fig. 17.3, the output layer contains just one node that contains the function $y(x_1, ..., x_n)$. Feed-forward neural networks with a single hidden layer are generally referred to as single-layer perceptron models. Network structures with more than one hidden layer are referred to as deep neural networks. Although deep networks usually outperform single-layer perceptron models, how deep they need to be depends on the nature of the machine learning problem, and these networks may range in depth from two to hundreds of hidden layers of different architectures.

Importantly, since the activation functions $h(x)$ are chosen a priori, representations of $y(x_1, ..., x_n)$ like those in eqns. (17.5.8) and (17.5.9) are no longer exact, as would be guaranteed by the Kolmogorov or Kurkova theorems, because an $h(x)$ chosen this way does not necessarily correspond to the correct choice of the functions $g_q$, $\gamma_{qs}$,... for a given $y(x_1, ..., x_n) \equiv y(\mathbf{x})$. Therefore, it is necessary to optimize the parameters $w_{ij}^{(K)}$ of the network via the training process. For this reason, the representation of $y(\mathbf{x})$ as a neural network should actually be expressed as $y(\mathbf{x}, \mathbf{w})$ to indicate that the representation depends on the full set of parameters $\mathbf{w}$. Given $N$ training points $(\mathbf{x}_1, y_1), ..., (\mathbf{x}_N, y_N)$, we must optimize a loss function of the form

\[
E(\mathbf{w}) = \frac{1}{N}\sum_{i=1}^{N} |y_i - y(\mathbf{x}_i, \mathbf{w})|^2 \tag{17.5.11}
\]

by solving $\partial E/\partial \mathbf{w} = 0$ in order to determine the optimal parameter set $\mathbf{w}$. The mathematical complexity of neural networks causes the calculation of derivatives of the network with respect to $\mathbf{w}$ to be nontrivial, a point to which we will return later in this section.

First, we discuss the choice of the activation function $h(x)$. Activation functions serve several purposes in neural networks. First, they add nonlinearity into the data transformations that are performed; the corresponding weights determine how much a particular neuron contributes to the process of learning patterns in the data. Perhaps more importantly, activation functions need to effect a reshaping of the data between input and output layers such that redundancies in the data are eliminated, allowing true patterns to emerge. If we consider the input data as having an intrinsic dimension, which could be lower than the dimension of the space in which it is specified or embedded, then it has been suggested (Ansuini et al., 2019) that as a data set propagates through the layers of a neural network, transformations performed by these layers reduce this intrinsic dimension so that the data manifold is “compressed” into a space that contains its essential, or non-redundant, features. Although there is no systematic procedure for selecting an activation function for a particular problem, sigmoid-shaped functions such as

\[
h(x) = \frac{1}{1 + e^{-x}}, \qquad h(x) = \tanh(x) \tag{17.5.12}
\]

are widely used. Sigmoid functions such as those in eqn. (17.5.12) effect a significant compression of the data input to them, leading to a small change in the output for a large change in the input. The result is a very small gradient at a large number of nodes, a problem known as the vanishing gradient problem. This problem can be ameliorated by the choice of a rectified linear unit or ReLU function as the activation function. The ReLU function is defined as $h(x) = \max(0, x)$; a slight variant that does not return 0 is the leaky ReLU, defined as $h(x) = bx$ for $x < 0$ and $x$ for $x > 0$. Here, $0 < b < 1$ is a small positive number such as 0.01. However, the ReLU and leaky ReLU functions are not differentiable at $x = 0$, which could be a problem during training when the gradient of the network is needed. Alternatives to the ReLU function that are differentiable everywhere include the softplus function

\[
h(x) = \ln\left(1 + e^{x}\right) \tag{17.5.13}
\]

and the exponential linear unit, or ELU, function

\[
h(x) = \begin{cases} x & x > 0 \\ \alpha(e^{x} - 1) & x \leq 0 \end{cases} \tag{17.5.14}
\]

where $\alpha$ is a constant. Another differentiable activation function that does not suffer from vanishing gradients is the so-called “swish” activation function, defined by

\[
h(x) = \frac{x}{1 + e^{-x}}. \tag{17.5.15}
\]
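For reference, the activation functions introduced so far can be written compactly as follows. This is a sketch only: the default values $b = 0.01$ and $\alpha = 1$, the sample points, and the use of numerically stable NumPy primitives are choices made here rather than prescriptions from the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))            # logistic sigmoid, eqn. (17.5.12)

def relu(x):
    return np.maximum(0.0, x)                  # max(0, x)

def leaky_relu(x, b=0.01):
    return np.where(x > 0, x, b * x)           # slope b for x < 0

def softplus(x):
    return np.logaddexp(0.0, x)                # ln(1 + e^x), eqn. (17.5.13), evaluated stably

def elu(x, alpha=1.0):
    # eqn. (17.5.14); the minimum avoids overflow in the branch that is discarded
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def swish(x):
    return x * sigmoid(x)                      # eqn. (17.5.15)

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
for f in (sigmoid, np.tanh, relu, leaky_relu, softplus, elu, swish):
    print(f.__name__, f(x))

# Derivative of the logistic sigmoid, h'(x) = h(x)(1 - h(x)), far from the origin
s = sigmoid(10.0)
print("sigmoid'(10) =", s * (1.0 - s))
```

The last line illustrates the vanishing gradient problem: far from the origin the slope of the sigmoid is orders of magnitude smaller than its maximum value of 1/4 at $x = 0$. The printed swish values also show that the function is not monotonic for negative arguments.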

Unlike the other activation functions presented here, the swish function is not monotonically increasing. Although this violates the condition of monotonicity in the Kolmogorov and Kurkova theorems, empirical evidence suggests that this condition is likely sufficient but not necessary. Consequently, it is also possible to take $h(x)$ to be a Gaussian or a Lorentzian function. There is also no requirement that the same activation function be used in every layer of a neural network. Since different layers can serve different purposes as concerns learning patterns from the data, there could be advantages in using different activation functions in different layers, an issue we address in Section 17.5.1 when we discuss classification problems. Note that it is possible to tune the shape of a chosen activation function by replacing any of the $h(x)$ functions defined above with $h(ax)$, where the constant $a$ becomes another hyperparameter that would need to be chosen at the outset, before training. The activation functions we have introduced here are shown in Fig. 17.4.

With so many possible choices for activation functions and little guidance on how to choose an optimal function for a given layer in a neural network, one might ask whether the data itself could dictate the selection for a given learning problem. This is, indeed, possible, via an approach known as self learning of activation functions. Self learning of activation functions can be achieved by expanding $h(x)$ in terms of a set of $M$ basis functions as

\[
h(x; \boldsymbol{\beta}) = \sum_{k=1}^{M} \beta_k \phi_k(x) \tag{17.5.16}
\]

and including the set of $M$ coefficients $\beta_1, ..., \beta_M \equiv \boldsymbol{\beta}$, together with the parameters $\mathbf{w}$, as parameters to be learned in the training phase. The neural network is then represented as $y(\mathbf{x}, \mathbf{w}, \boldsymbol{\beta})$, and optimization of the loss function requires the two conditions $\partial E(\mathbf{w}, \boldsymbol{\beta})/\partial \mathbf{w} = 0$ and $\partial E(\mathbf{w}, \boldsymbol{\beta})/\partial \boldsymbol{\beta} = 0$. Examples of possible choices of $\phi_k(x)$ are simple polynomials $\phi_k(x) = x^{k-1}$ (Goyal et al., 2019) or sinc functions, $\phi_k(x) = \sin(x - kd)/[\pi(x - kd)]$, where $d$ defines a grid spacing for $x$.

From a technical standpoint, the most complex operation when employing neural networks is the calculation of the derivatives needed to perform the parameter optimization. Complexity arises from the deep nesting of layers between input and output.


Fig. 17.4 Activation functions in eqns. (17.5.12) (tanh(x)) (upper left panel), (17.5.13) (upper right panel), (17.5.14) (lower left panel), and (17.5.15) (lower right panel).

Because of this nesting, long products arising from the application of the chain rule result when the derivatives are performed. To illustrate the structure of these products, consider a simple nested function $g(w) = h(h(h(wx_0)))$, where $x_0$ is a constant. We can think of this function as representing a toy network having three hidden layers with one node in each layer. From the chain rule, the derivative of $g$ with respect to $w$ is $g'(w) = [h'(h(h(x_0 w)))][h'(h(x_0 w))][h'(x_0 w)]x_0$. From the pictorial representation in Fig. 17.3, if this product is read from left to right, the first term in square brackets is the derivative of the outermost layer, which produces the output result $g(w)$, the second term is the derivative of the layer just to the left of the previous layer, the third term is the next layer to the left, and finally, the last term “$x_0$” is the derivative of the input layer. Thus, we see that the product is a propagation backward through the layers of the network from the rightmost (output) layer back to the leftmost (input) layer. Hence, the approach for computing derivatives of the nested functions that comprise a feed-forward network via the chain rule is called back propagation.

Of course, computing the derivative of the loss function in eqn. (17.5.11) with a complete feed-forward network, although straightforward in principle, requires considerable bookkeeping to account for all of the terms that arise when the chain rule is applied. Suppose the network $y(\mathbf{x}, \mathbf{w})$ has $K$ hidden layers with $m^{(i)}$ nodes in the $i$th layer. In order to derive the rules of back propagation, let us define a recursive variable

\[
z_r^{(k)}(\mathbf{x}) =
\begin{cases}
\displaystyle\sum_{a=1}^{m^{(k-1)}} h\left(z_a^{(k-1)}\right) w_{ar}^{(k-1)} + w_{0r}^{(k-1)}, & k = 2, ..., K \\[2ex]
\displaystyle\sum_{a=1}^{n} x_a w_{ar}^{(0)} + w_{0r}^{(0)}, & k = 1.
\end{cases} \tag{17.5.17}
\]

Here, $k$ indexes the hidden layer and $w_{ar}^{(k-1)}$ is the weight parameter that connects the $a$th node in layer $k - 1$ to the $r$th node in layer $k$. The output layer of the network can be written compactly as

\[
y(\mathbf{x}, \mathbf{w}) = \sum_{a=1}^{m^{(K)}} h\left(z_a^{(K)}(\mathbf{x})\right) w_a^{(K)} + w_0^{(K)}. \tag{17.5.18}
\]
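The backward product for the toy network $g(w) = h(h(h(wx_0)))$ discussed above can be checked numerically. The sketch below assumes $h = \tanh$ and arbitrary values of $w$ and $x_0$; it stores the input to each layer on the forward pass, multiplies the layer derivatives from output back to input, and compares the result with a central finite difference.

```python
import numpy as np

def h(x):
    return np.tanh(x)

def dh(x):
    return 1.0 - np.tanh(x) ** 2        # derivative of tanh

def g_and_grad(w, x0):
    # Forward pass: keep the argument of h at every layer
    a1 = w * x0
    a2 = h(a1)
    a3 = h(a2)
    g = h(a3)
    # Backward pass: product of layer derivatives, read from output to input,
    # g'(w) = h'(h(h(x0 w))) h'(h(x0 w)) h'(x0 w) x0
    grad = dh(a3) * dh(a2) * dh(a1) * x0
    return g, grad

w, x0 = 0.7, 1.3                        # arbitrary illustrative values
g, grad = g_and_grad(w, x0)

eps = 1.0e-6
fd = (g_and_grad(w + eps, x0)[0] - g_and_grad(w - eps, x0)[0]) / (2.0 * eps)
print("back propagation: ", grad)
print("finite difference:", fd)
```

Reusing the stored forward-pass quantities is what makes back propagation efficient: each layer derivative is evaluated once, at an argument that is already available.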

With the recursion in eqn. (17.5.17), the derivatives can also be defined recursively as a backward propagation through the layers of the network, from output back to input. Thus, the derivative of eqn. (17.5.11) with respect to $w_{rs}^{(k)}$ can be written recursively as

\[
\frac{\partial E(\mathbf{w})}{\partial w_{rs}^{(k)}}
= \sum_{i=1}^{N} \sum_{a=1}^{m^{(k+1)}} \frac{\partial E(\mathbf{w})}{\partial z_a^{(k+1)}(\mathbf{x}_i)} \frac{\partial z_a^{(k+1)}(\mathbf{x}_i)}{\partial w_{rs}^{(k)}}
= \begin{cases}
\displaystyle\sum_{i=1}^{N} \frac{\partial E(\mathbf{w})}{\partial z_s^{(k+1)}(\mathbf{x}_i)}\, h\left(z_r^{(k)}(\mathbf{x}_i)\right), \\[2ex]
\displaystyle\sum_{i=1}^{N} \frac{\partial E(\mathbf{w})}{\partial z_s^{(k+1)}(\mathbf{x}_i)} \ \cdots
\end{cases}
\]

[…] $P(c_j|\mathbf{x})$ for all $j \neq k$. In addition to the conditional probability $P(c_k|\mathbf{x})$, we define $p(\mathbf{x})$ as the probability distribution of input vectors $\mathbf{x}$, $P(c_k)$ as the probability of the occurrence of class $c_k$ in the machine learning model, and $p(\mathbf{x}|c_k)$ as the conditional probability distribution of inputs $\mathbf{x}$ given the class $c_k$. The detailed balance condition in eqn. (7.3.23) allows us to relate these probabilities via

\[
P(c_k|\mathbf{x})\,p(\mathbf{x}) = p(\mathbf{x}|c_k)\,P(c_k). \tag{17.5.27}
\]

Rearranging eqn. (17.5.27), we can write $P(c_k|\mathbf{x})$ as

\[
P(c_k|\mathbf{x}) = \frac{p(\mathbf{x}|c_k)\,P(c_k)}{p(\mathbf{x})}, \tag{17.5.28}
\]

a result known as Bayes’ theorem after the English statistician, philosopher, and Presbyterian minister Thomas Bayes (1701–1761). The importance of Bayes’ theorem is that it determines a key component of neural network architectures for classification problems. The term $P(c_k|\mathbf{x})$ in Bayes’ theorem is known as the posterior distribution; the conditional probability $p(\mathbf{x}|c_k)$ is referred to as the likelihood function, expressing a likelihood for obtaining the data $\mathbf{x}$ given parameters, i.e., class designations $c_k$; finally, $P(c_k)$ is known as the prior distribution, which allows us to input information about the class parameters $c_k$ if such information is available. Noting that $p(\mathbf{x})$ is a marginal distribution obtained from the sum rule

\[
p(\mathbf{x}) = \sum_{k=1}^{C} p(\mathbf{x}|c_k)\,P(c_k), \tag{17.5.29}
\]

we see that

\[
P(c_k|\mathbf{x}) = \frac{p(\mathbf{x}|c_k)\,P(c_k)}{\sum_{l=1}^{C} p(\mathbf{x}|c_l)\,P(c_l)}, \tag{17.5.30}
\]

so that $p(\mathbf{x})$ is a normalization for the product $p(\mathbf{x}|c_k)P(c_k)$ in the numerator of the theorem.

In the neural networks we derived for regression problems, we were able to express the output layer as a linear sum of the activation functions for the penultimate hidden layer as a consequence of Kurkova’s theorem. For classification problems, however, we cannot assume this is possible, and consequently we must express the output layer using a final activation function. Thus, eqn. (17.5.8) for a neural network with two hidden layers would become

\[
y_k(\mathbf{x}) = H\left(\sum_{q=1}^{m} w_q^{(2)} h\left(\sum_{s=1}^{m'} w_{qs}^{(1)} h\left(\sum_{r=1}^{n} w_{sr}^{(0)} x_r + w_{s0}^{(0)}\right) + w_{q0}^{(1)}\right) + w^{(2)}\right), \tag{17.5.31}
\]

with a similar modification for the three-hidden-layer network in eqn. (17.5.9). Here, $H(x)$ is the outer activation function whose form we need to determine. Note that $y_k(\mathbf{x})$, $k = 1, ..., C$, which replaces the continuous function $y(\mathbf{x})$ in eqns. (17.5.8) and (17.5.9), is now interpreted as a numerical label for membership in the $k$th class. If we interpret $y_k(\mathbf{x})$ as the conditional probability $P(c_k|\mathbf{x})$, then it is clear that $y_k(\mathbf{x}) \in [0, 1]$ with $\sum_{k=1}^{C} y_k(\mathbf{x}) = 1$.

Bayes’ theorem can now be used to determine the form of $H(x)$. The facts that $H(x)$ determines the output $y_k$ and that $y_k \in [0, 1]$ already restrict the type of activation function $H(x)$ can be. What Bayes’ theorem accomplishes is a precise specification of the particular functional form of $H(x)$. Let us begin by considering a binary classification with two classes $c_1$ and $c_2$, and let $\mathbf{z} = \mathbf{z}(\mathbf{x})$ denote the vector that results from transforming $\mathbf{x}$ through all of the hidden layers of a classification neural network. The output activation function $H(x)$ determines the conditional probability $P(c_k|\mathbf{z})$ that a particular class $c_k$ is assigned to $\mathbf{z}$. We start by specifying a form for the distribution $p(\mathbf{z}|c_k)$, which we might take to be an exponential construct

\[
p(\mathbf{z}|c_k) = \exp\left[F(\boldsymbol{\theta}_k) + B(\mathbf{z}, \boldsymbol{\phi}) + \boldsymbol{\theta}_k\cdot\mathbf{z}\right], \tag{17.5.32}
\]

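where $F(\boldsymbol{\theta}_k)$ is a function of a set of parameters $\boldsymbol{\theta}_k$ that vary with the class $k$, $\boldsymbol{\phi}$ is a set of universal parameters, and $B(\mathbf{z}, \boldsymbol{\phi})$ is a function of $\mathbf{z}$. This form is sufficiently general to encompass the most commonly employed distribution functions such as the Gaussian and Bernoulli (see Section 17.2), binomial, Poisson, and various other distributions. For binary classification, using Bayes’ theorem, we can write the probability for one of the two classes, say $c_1$, as

\[
\begin{aligned}
P(c_1|\mathbf{z}) &= \frac{p(\mathbf{z}|c_1)P(c_1)}{p(\mathbf{z}|c_1)P(c_1) + p(\mathbf{z}|c_2)P(c_2)} \\
&= \frac{1}{1 + \dfrac{p(\mathbf{z}|c_2)P(c_2)}{p(\mathbf{z}|c_1)P(c_1)}} \\
&= \frac{1}{1 + e^{-a}},
\end{aligned} \tag{17.5.33}
\]

where

\[
a = \ln\left[\frac{p(\mathbf{z}|c_1)P(c_1)}{p(\mathbf{z}|c_2)P(c_2)}\right], \tag{17.5.34}
\]

which is a linear function of $\mathbf{z}$ of the form

\[
\begin{aligned}
a &= \mathbf{w}\cdot\mathbf{z} + w_0 \\
w_0 &= F(\boldsymbol{\theta}_1) - F(\boldsymbol{\theta}_2) + \ln\left[\frac{P(c_1)}{P(c_2)}\right] \\
\mathbf{w} &= \boldsymbol{\theta}_1 - \boldsymbol{\theta}_2.
\end{aligned} \tag{17.5.35}
\]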

We only need to determine $P(c_1|\mathbf{z})$, as we can determine $P(c_2|\mathbf{z})$ from $P(c_2|\mathbf{z}) = 1 - P(c_1|\mathbf{z})$. When $a$ is written this way, we see that the argument of the activation function takes the expected form of a weighted linear combination of components of $\mathbf{z}$ with the bias $w_0$. This analysis tells us that $H(x)$ should be chosen as the logistic sigmoid function $H(x) = 1/(1 + \exp(-x))$ (eqn. (17.5.12)). By the same analysis, if there are $C > 2$ classes, then Bayes’ theorem along with eqn. (17.5.32) leads to

\[
P(c_k|\mathbf{z}) = \frac{p(\mathbf{z}|c_k)P(c_k)}{\sum_{l=1}^{C} p(\mathbf{z}|c_l)P(c_l)} = \frac{e^{a_k}}{\sum_{l=1}^{C} e^{a_l}}, \tag{17.5.36}
\]

where

\[
a_k = \mathbf{w}_k\cdot\mathbf{z} + w_{k0} \tag{17.5.37}
\]

with

\[
\mathbf{w}_k = \boldsymbol{\theta}_k, \qquad w_{k0} = F(\boldsymbol{\theta}_k) + \ln P(c_k) \tag{17.5.38}
\]

(see Problem 17.12). These conditions require that $H(x)$ be chosen as the softmax or normalized exponential function

\[
H(x; \boldsymbol{\beta}) = \frac{e^{-\beta_k x}}{\sum_{l=1}^{C} e^{-\beta_l x}}, \tag{17.5.39}
\]

which depends on a vector $\boldsymbol{\beta}$ of parameters.
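The connection between Bayes’ theorem and the sigmoid and normalized exponential forms can be verified numerically for a specific member of the exponential family in eqn. (17.5.32). The sketch below assumes unit-variance isotropic Gaussian class distributions, for which $\boldsymbol{\theta}_k$ is the class mean, $F(\boldsymbol{\theta}_k) = -|\boldsymbol{\theta}_k|^2/2 - (d/2)\ln 2\pi$, and $B = -|\mathbf{z}|^2/2$; the means, priors, and test point are arbitrary illustrative values.

```python
import numpy as np

def gaussian_pdf(z, mu):
    """Isotropic unit-variance Gaussian p(z|c_k) with mean mu."""
    d = len(mu)
    return np.exp(-0.5 * np.sum((z - mu) ** 2)) / (2.0 * np.pi) ** (d / 2)

def F(theta):
    """F(theta_k) for the Gaussian written in the form of eqn. (17.5.32)."""
    d = len(theta)
    return -0.5 * theta @ theta - 0.5 * d * np.log(2.0 * np.pi)

mus = np.array([[1.0, 0.0], [-0.5, 1.0], [0.0, -1.5]])   # theta_k = class means (assumed)
priors = np.array([0.5, 0.3, 0.2])                       # P(c_k) (assumed)
z = np.array([0.3, -0.2])                                # output of the hidden layers

# Posterior directly from Bayes' theorem, eqn. (17.5.30) with x replaced by z
joint = np.array([gaussian_pdf(z, mu) * P for mu, P in zip(mus, priors)])
posterior_bayes = joint / joint.sum()

# Same posterior as a normalized exponential of linear functions of z:
# ln[p(z|c_k) P(c_k)] = theta_k . z + F(theta_k) + ln P(c_k) + B(z), and B(z) cancels
logits = mus @ z + np.array([F(mu) for mu in mus]) + np.log(priors)
expl = np.exp(logits - logits.max())                     # shift for numerical stability
posterior_softmax = expl / expl.sum()

# Binary case with classes c_1 and c_2 only: logistic sigmoid of a = w . z + w_0
w = mus[0] - mus[1]
w0 = F(mus[0]) - F(mus[1]) + np.log(priors[0] / priors[1])
p1_sigmoid = 1.0 / (1.0 + np.exp(-(w @ z + w0)))
p1_bayes = joint[0] / (joint[0] + joint[1])

print(posterior_bayes, posterior_softmax)
print(p1_bayes, p1_sigmoid)
```

Because the class-independent factor $e^{B(\mathbf{z}, \boldsymbol{\phi})}$ cancels between numerator and denominator, only terms linear in $\mathbf{z}$ survive in the exponent, which is why a linear output layer followed by a normalized exponential suffices.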


We now turn to the determination of the correct loss function for classification. Just as the Gaussian distribution determined the loss function for regression, the Bernoulli and categorical distributions in Section 17.2 determine the form of the loss function for classification. Once again, we first consider the binary classification problem with two classes $c_1$ and $c_2$. If $y_1(\mathbf{x}, \mathbf{w})$ is the neural network used to determine membership in class $c_1$, with $y_2(\mathbf{x}, \mathbf{w}) = 1 - y_1(\mathbf{x}, \mathbf{w})$, then we use eqn. (17.3.8), relating the loss function to the negative logarithm of a probability distribution, to derive $E(\mathbf{w})$ from the negative logarithm of the Bernoulli distribution. Referring to the distribution in eqn. (17.2.6), we interpret $x$ as $N$ known input classifications $y^{(i)}$ corresponding to $N$ input vectors $\mathbf{x}_i$. Then, since $-\ln P_B(x) = -[x \ln\nu + (1 - x)\ln(1 - \nu)]$, the correct loss function $E(\mathbf{w})$ for binary classification becomes

\[
\begin{aligned}
E(\mathbf{w}) &= -\frac{1}{N}\sum_{i=1}^{N}\left[y_1^{(i)} \ln y_1(\mathbf{x}_i, \mathbf{w}) + y_2^{(i)} \ln y_2(\mathbf{x}_i, \mathbf{w})\right] \\
&= -\frac{1}{N}\sum_{i=1}^{N}\left[y_1^{(i)} \ln y_1(\mathbf{x}_i, \mathbf{w}) + \left(1 - y_1^{(i)}\right)\ln\left(1 - y_1(\mathbf{x}_i, \mathbf{w})\right)\right],
\end{aligned} \tag{17.5.40}
\]

which is known as a cross-entropy loss function. Training then proceeds by minimizing $E(\mathbf{w})$ with respect to the network parameters $\mathbf{w}$, just as in the regression problem. When there are more than just two classes ($C > 2$), we use the categorical distribution in eqn. (17.2.7) to derive an appropriate loss function $E(\mathbf{w})$. Taking the negative log of this distribution and substituting in the input training data $(\mathbf{x}_i, y_k^{(i)})$ and the neural networks $y_k(\mathbf{x}, \mathbf{w})$, we obtain the multi-state cross-entropy loss function

\[
E(\mathbf{w}) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{C} y_k^{(i)} \ln y_k(\mathbf{x}_i, \mathbf{w}) \tag{17.5.41}
\]

with $y_k^{(i)} \in [0, 1]$, and $\sum_{k=1}^{C} y_k^{(i)} = 1$.

17.5.2 Convolution layers in neural networks

We have discussed only one type of data transformation in neural networks thus far, specifically, activation of simple linear combinations in feed-forward networks. However, neural networks can incorporate other types of transformations in their effort to enhance or blur certain input features. For example, image classification via neural networks can be achieved by drawing out features capable of distinguishing one image from another. This kind of feature enhancement (or its opposite, obfuscation) can be achieved by means of filters applied to an input data stream. Filters are generally applied by convolving them with the data using a discrete version of a convolution operation, such as we encountered in Chapter 15 in our discussion of the generalized Langevin equation.

Suppose the input to a neural network is not a vector but a matrix $\mathbf{x}$ of dimension $n_r \times n_c$. Such a matrix could hold information about the pixels in an image, for example. Let $\mathbf{F}$ be a matrix of dimension $N_r \times N_c$, which we refer to as a filter. Then, we define the two-dimensional (2D) convolution of $\mathbf{x}$ with $\mathbf{F}$ as a matrix $\mathbf{X}$ of size $(n_r - N_r + 1) \times (n_c - N_c + 1)$ given by

\[
X_{IJ} \equiv (\mathbf{x} \circ \mathbf{F})_{IJ} = \sum_{i=0}^{N_r - 1}\sum_{j=0}^{N_c - 1} x_{I+i,J+j} F_{ij}. \tag{17.5.42}
\]

Convolutions can be similarly defined for 1D arrays, 3D arrays, and, generally, tensors of any dimension, depending on the number of indices needed to describe the input data. As an example of a convolution, consider an input matrix $\mathbf{x}$ and filter $\mathbf{F}$ specified as

\[
\mathbf{x} = \begin{pmatrix} 1 & 2 & 3 & 1 \\ 4 & 5 & 6 & 1 \\ 7 & 8 & 9 & 1 \end{pmatrix}, \qquad
\mathbf{F} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},
\]

then

\[
\mathbf{x} \circ \mathbf{F} = \begin{pmatrix} 12 & 16 & 11 \\ 24 & 28 & 17 \end{pmatrix}.
\]
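The worked example above can be reproduced with a direct implementation of eqn. (17.5.42). The function name `conv2d` and the explicit double loop are choices made here for clarity, not an optimized implementation.

```python
import numpy as np

def conv2d(x, F):
    """2D convolution of eqn. (17.5.42): X_IJ = sum_ij x_{I+i,J+j} F_ij."""
    nr, nc = x.shape
    Nr, Nc = F.shape
    X = np.zeros((nr - Nr + 1, nc - Nc + 1))
    for I in range(X.shape[0]):
        for J in range(X.shape[1]):
            # Element-wise product of the filter with the patch anchored at (I, J)
            X[I, J] = np.sum(x[I:I + Nr, J:J + Nc] * F)
    return X

x = np.array([[1, 2, 3, 1],
              [4, 5, 6, 1],
              [7, 8, 9, 1]], dtype=float)
F = np.ones((2, 2))

X = conv2d(x, F)
print(X)
```

With the all-ones filter, each output element is simply the sum of a $2 \times 2$ patch of the input, which gives the $2 \times 3$ matrix quoted in the example.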

In neural networks, convolution layers perform operations such as that in eqn. (17.5.42), and it is the filter matrix that must be learned via training. In addition, since convolutions are linear transformations, it is common to finalize the transformation of the layer by running $X_{IJ}$ through an activation function to give a new matrix $Z_{IJ} = h(X_{IJ} + b)$, where $b$ is a bias. Finally, it is also possible to train multiple filters in a convolution layer by adding an additional index to the filter. Multiple filters are used to extract multiple features from the input data. For the 2D convolution in eqn. (17.5.42), multiple filters would be included by modifying the definition to read

\[
X_{IJk} = \sum_{i=0}^{N_r - 1}\sum_{j=0}^{N_c - 1} x_{I+i,J+j} F_{ijk}, \tag{17.5.43}
\]

where $k = 1, ..., N_f$ indexes the number of desired filters. When convolution layers are used in networks that are also partially feed-forward networks, it is necessary to feed the output of a convolution layer into an activation layer whose input is a one-dimensional vector. This is done via an intermediate flattening layer in which a multidimensional layer $Z_{IJ}$ is converted into a one-dimensional array shaped for input into an activation layer.
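A minimal sketch of a multi-filter convolution layer followed by a flattening step is given below. The input size, the number and size of the filters, the random values, the choice $h = \tanh$, and the use of one bias per filter are all assumptions made for illustration.

```python
import numpy as np

def conv2d_multi(x, F, b, h=np.tanh):
    """Multi-filter convolution of eqn. (17.5.43) followed by an activation.

    x : input matrix of shape (nr, nc)
    F : filters of shape (Nr, Nc, Nf)
    b : biases of shape (Nf,), one per filter (an assumption of this sketch)
    """
    Nr, Nc, Nf = F.shape
    out_r, out_c = x.shape[0] - Nr + 1, x.shape[1] - Nc + 1
    X = np.zeros((out_r, out_c, Nf))
    for I in range(out_r):
        for J in range(out_c):
            patch = x[I:I + Nr, J:J + Nc]
            # Contract the patch with every filter at once: sum over i and j
            X[I, J, :] = np.tensordot(patch, F, axes=([0, 1], [0, 1]))
    return h(X + b)

rng = np.random.default_rng(1)
x = rng.normal(size=(6, 5))         # hypothetical 6 x 5 input
F = rng.normal(size=(3, 3, 4))      # four 3 x 3 filters
b = np.zeros(4)

Z = conv2d_multi(x, F, b)           # shape (4, 3, 4)
flat = Z.reshape(-1)                # flattening layer: one-dimensional array
print(Z.shape, flat.shape)
```

The flattened array can then be passed to fully connected layers of the kind discussed earlier in this section; in training, the entries of `F` and `b` would be optimized together with the other network parameters.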

17.6 Weighted neighbor methods

Fig. 17.5 Illustration of a splitting scheme for two-dimensional data (a) with an associated decision tree graph (b).

In this section, we will briefly describe two additional machine learning models that fall into a class of techniques known as weighted neighbor methods, specifically K-nearest neighbors and random forests. The idea behind weighted neighbor methods is to predict unknown values of a function $y$ at a point $\mathbf{x}$ using only the nearest neighbors of $\mathbf{x}$ within the training set. This class of methods derives from the notion of regression or decision trees, depending on the desired task, in which we begin by partitioning the n-dimensional space $\mathbb{R}^n$ into $M$ regions $R_i$, chosen such that the function $y(x_1, ..., x_n) \equiv y(\mathbf{x})$ has a constant or nearly constant value $w_i$ within each region. This subdivision allows us to create a model for $y(\mathbf{x})$ given by

\[
y(\mathbf{x}, \mathbf{w}) = \sum_{i=1}^{M} w_i h(\mathbf{x} \in R_i),
\tag{17.6.1}
\]

where $h(\mathbf{x} \in R)$ is an indicator function that is 1 if $\mathbf{x} \in R$ and 0 otherwise, and $w_1, ..., w_M \equiv \mathbf{w}$ are parameters representing the value of $y$ in each region. If we minimize a least-squares loss function in eqn. (17.3.2) over the training data $(\mathbf{x}_1, y_1), ..., (\mathbf{x}_N, y_N)$, the result is

\[
w_k = \frac{\frac{1}{N} \sum_{i=1}^{N} y_i h(\mathbf{x}_i \in R_k)}{\frac{1}{N} \sum_{i=1}^{N} h(\mathbf{x}_i \in R_k)},
\tag{17.6.2}
\]

which is just the average value of the target function in $R_k$ over the training set. Unfortunately, determining the regions $R_k$ is nontrivial, as obtaining an optimized splitting of R increases in complexity with the size of the training set and the dimension of $\mathbf{x}$. Figure 17.5(a) illustrates the region-splitting procedure for two-dimensional data. The definitions of the regions $R_i$ can be gleaned from the figure; for example, $R_1$ is the region for which $x_1 < t_1$ and $x_2 < t_2$, $R_2$ is the region for which $x_1 < t_1$ and $x_2 > t_2$, and so forth. Note that the splitting procedure can be represented in a graph structure known as a decision tree. We will return to this decision tree graph shortly when we discuss ensemble methods. First, we introduce an approximate, yet tractable, protocol for approaching this splitting problem. Equation (17.6.2) allows us to construct a model for $y(\mathbf{x})$ in a local neighborhood of the point $\mathbf{x}$. The approximation takes the form

\[
y(\mathbf{x}) \approx \sum_{i=1}^{K} W(\mathbf{x}, \mathbf{x}_i) y_i,
\tag{17.6.3}
\]


where $W(\mathbf{x}, \mathbf{x}_i)$ is a non-negative weight for the ith training point within a cluster of $K$ neighbors of the point $\mathbf{x}$.² Each machine learning model of this form will have a different set of associated weights. The following choice for $W(\mathbf{x}, \mathbf{x}_i)$ defines the K-nearest neighbors model:

\[
W(\mathbf{x}, \mathbf{x}_i) =
\begin{cases}
\dfrac{1}{K d_i} & i = 1, ..., K \\[2mm]
0 & \text{otherwise.}
\end{cases}
\tag{17.6.4}
\]

Here, $d_i$ is an additional parameter whose value depends on the specific K-nearest neighbor algorithm employed. For example, we would set $d_i = 1$ if we could assume that all neighboring points carried equal weight in determining $y(\mathbf{x})$ in the neighborhood of $\mathbf{x}$. This parameter could also be based on a distance metric that weights closer neighbors more heavily than more distant neighbors. In practice, $K$ is a hyperparameter chosen to result in the lowest error in a cross-validation procedure. The K-nearest neighbors model results in an approximation $y_{K\text{-NN}}(\mathbf{x})$ for $y(\mathbf{x})$ that takes the form

\[
y_{K\text{-NN}}(\mathbf{x}) = \sum_{i=1}^{K} \frac{y_i}{K d_i}.
\tag{17.6.5}
\]
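As an illustration of eqn. (17.6.5), the sketch below evaluates the K-nearest neighbors estimate with the equal-weight choice d_i = 1. The Euclidean metric, the function name `knn_predict`, and the toy data are assumptions made for this example only.

```python
import numpy as np

def knn_predict(x, X_train, y_train, K):
    """Average the targets of the K training points closest to x (d_i = 1)."""
    dists = np.linalg.norm(X_train - x, axis=1)  # assumed Euclidean distance to each training point
    nearest = np.argsort(dists)[:K]              # indices of the K closest training points
    return np.sum(y_train[nearest]) / K          # eqn. (17.6.5) with d_i = 1

# Toy one-dimensional training set (illustrative values only)
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
y_train = np.array([0.0, 1.0, 4.0, 9.0, 100.0])
y_hat = knn_predict(np.array([1.2]), X_train, y_train, K=3)
print(y_hat)
```

The distant point at x = 10 never enters the estimate near x = 1.2, which is the sense in which the model is local. A distance-dependent choice of d_i would replace the plain average with a weighted one.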

Clearly, values of $y(\mathbf{x})$ can only be accurately predicted in regions for which the model has been trained. Figure 17.5(b) depicts the decision tree corresponding to the splitting of $\mathbb{R}^2$ in Fig. 17.5(a). In general, splitting is performed according to a set of rules defined by logical functions that partition data as evenly as possible into the different regions $R_i$. As Fig. 17.5(b) illustrates, a decision tree consists of a root node and internal nodes set by the splitting rules, which ultimately dictate the path from the root node to a set of terminal or decision nodes, also referred to as leaves. In regression problems, splitting rules are determined by minimization of the relative errors (or variances) at each split until the tree grows to a pre-specified cutoff or until the data can no longer be split. The weight associated with each point in the training set is given by

\[
\tilde{W}(\mathbf{x}, \mathbf{x}_i) =
\begin{cases}
\dfrac{1}{K} & i = 1, ..., K \\[2mm]
0 & \text{otherwise.}
\end{cases}
\tag{17.6.6}
\]

The distinction between $W(\mathbf{x}, \mathbf{x}_i)$ and $\tilde{W}(\mathbf{x}, \mathbf{x}_i)$ will become clear by the end of this paragraph. Note that $\tilde{W}(\mathbf{x}, \mathbf{x}_i)$ also represents the weight of each point for a single decision tree. Here, $K$ is the number of points within the same leaf as the target point $\mathbf{x}$. Unlike K-nearest neighbors, the number of neighbors within each leaf can vary among leaves in a tree. The difficulty with the use of a single decision tree in applying eqn. (17.6.1) for regression is that one tree has a tendency to overfit the training data. An approach by which this overfitting problem can be avoided when using decision trees is to divide the training data into random subsets, an approach known as bootstrap aggregation or bagging, and to create a decision tree for each subset. Each decision tree is created with a different set of splitting rules, and the collection of all decision trees forms an ensemble known as a random forest. In order to determine the weights for a random forest, we average the weights in eqn. (17.6.6) over all trees in the forest (ensemble). Thus, if there are $m$ trees in the forest, then $W(\mathbf{x}, \mathbf{x}_i) = (1/m) \sum_{j=1}^{m} \tilde{W}_j(\mathbf{x}, \mathbf{x}_i)$, where $\tilde{W}_j$ is the weight given in eqn. (17.6.6) for the jth decision tree. These average weights are then used to produce a model $y_{\mathrm{RF}}(\mathbf{x})$ using eqn. (17.6.3). A random forest generates an ensemble of decision trees, which reduces the overall variance in predictions of $y(\mathbf{x})$ without overfitting the training data.

² Weighted neighbor methods define a directed, weighted graph structure on a data set, in which nodes are represented by the data points $\{(\mathbf{x}_i, y_i)\}$ and edges are directed from point i to point j, assuming that j is among the $K$ neighbors of i. The weight of each edge is given by the weight function $W$ connecting points i and j.
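The averaging of single-tree weights over a forest can be sketched as follows. This example does not build any trees; it assumes that the leaf index of every training point and of the target point is already known for each tree, and that the leaf containing the target holds at least one training point. The function name `forest_weights` and the leaf assignments are hypothetical.

```python
import numpy as np

def forest_weights(train_leaves, x_leaves):
    """
    train_leaves: (m, N) array, leaf index of each training point in each of m trees.
    x_leaves:     (m,) array, leaf index of the target point x in each tree.
    Returns the (N,) array of forest-averaged weights W(x, x_i).
    """
    same_leaf = (train_leaves == x_leaves[:, None]).astype(float)  # 1 if x_i shares the leaf of x
    K = same_leaf.sum(axis=1, keepdims=True)                       # leaf population K in each tree
    per_tree = same_leaf / K                                       # single-tree weights, eqn. (17.6.6)
    return per_tree.mean(axis=0)                                   # average over the m trees

# Two hypothetical trees acting on four training points
train_leaves = np.array([[0, 0, 1, 1],
                         [0, 1, 1, 1]])
x_leaves = np.array([0, 1])
y_train = np.array([1.0, 2.0, 3.0, 4.0])

W = forest_weights(train_leaves, x_leaves)
y_rf = W @ y_train   # weighted sum over training targets, as in eqn. (17.6.3)
print(W, y_rf)
```

Because each tree's weights sum to one, the averaged weights do as well, and the forest prediction equals the mean of the individual tree predictions.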

17.7

Demonstrating machine learning in free-energy simulations

In this section, we examine the performance of the different machine learning models in this chapter for their ability to represent free-energy surfaces accurately, generate observables from these surfaces, and drive rare-event simulations. Our first demonstration involves the regression of high-dimensional free-energy surfaces produced by enhanced sampling calculations. The second demonstration illustrates the use of classification neural networks to design collective variables for use in enhanced-sampling simulations, applied here to a solid-solid phase transition in a bulk metallic system.

17.7.1

Regression of free-energy surfaces

As noted at the end of Section 17.4, regression of high-dimensional free-energy surfaces is relatively straightforward within methods such as d-AFED/TAMD or replica-exchange Monte Carlo (or replica-exchange molecular dynamics). These methods scatter points over the free-energy surface that can be used to train a machine learning model. The longer the simulation runs, the more points will be generated and the better the training will be. Since the goal of leveraging a machine learning model is to represent the function $A(s_1, ..., s_n) \equiv A(\mathbf{s})$, the input needed to train the model is the $M$ values of $\mathbf{s}$, $\mathbf{s}_i$, $i = 1, ..., M$, and the corresponding free-energy values $A(\mathbf{s}_i)$. Once trained, the model provides a compact, smooth, closed-form representation $A(\mathbf{s})$ of the free-energy surface that can be evaluated at any desired point $\mathbf{s}$. This representation can then be analyzed for its landmark points (minima and saddle points) (Chen et al., 2015), fed back into a simulation as a bias to accelerate it further (Zhang et al., 2018; Wang et al., 2021), and employed to generate observable properties of interest via evaluation of integrals over Boltzmann factors $\exp(-\beta A(\mathbf{s}))$ (Cendagorta et al., 2020). In particular, if $a(\mathbf{r})$ is a coordinate-dependent function, then we can obtain a canonical average $\langle a \rangle$ of $a(\mathbf{r})$ from an enhanced-sampling simulation as follows:

\[
\langle a \rangle = \frac{\int d\mathbf{s}\, \langle a \rangle_{\mathbf{r}}(\mathbf{s})\, e^{-\beta A(\mathbf{s})}}{\int d\mathbf{s}\, e^{-\beta A(\mathbf{s})}},
\tag{17.7.1}
\]

where $\langle a \rangle_{\mathbf{r}}$ is given by

\[
\langle a \rangle_{\mathbf{r}} = \frac{\int d\mathbf{r}\, a(\mathbf{r})\, e^{-\beta U(\mathbf{r})} \prod_{\alpha=1}^{n} \delta\left(f_\alpha(\mathbf{r}) - s_\alpha\right)}{\int d\mathbf{r}\, e^{-\beta U(\mathbf{r})} \prod_{\alpha=1}^{n} \delta\left(f_\alpha(\mathbf{r}) - s_\alpha\right)}
\tag{17.7.2}
\]

(cf. eqn. (8.6.6)). Here, fα (r) is a set of collective variables and U (r) is the potential energy of the system. Although enhanced-sampling simulations deliver sampled values of A(s), in order to apply eqns. (17.7.1) and (17.7.2), we need an analytical representation of the free-energy surface, which is what the machine learning model provides. Thus, in applying eqn. (17.7.1), the function A(s) is replaced by the machine-learned model, which we denote as AML (s), and averages are computed from the integral using either molecular dynamics or a Monte Carlo algorithm. If an observable of interest is a function only of collective variables, then eqn. (17.7.2) is not needed. Although we will only consider classical free-energy surfaces in this section, machine learning models can be applied equally well to quantum free-energy surfaces generated from path-integral simulations (see Section 12.7).
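For an observable that depends only on the collective variables, eqn. (17.7.1) reduces to a ratio of two integrals over s. The sketch below evaluates that ratio by simple quadrature on a uniform one-dimensional grid, which is a stand-in for the molecular dynamics or Monte Carlo sampling described above and is practical only in low dimension. The harmonic surface used for `A_ml`, the function name `cv_average`, and all parameter values are assumptions chosen so that the answer is known analytically.

```python
import numpy as np

def cv_average(a, A_ml, s_grid, beta):
    """Estimate <a> = int ds a(s) exp(-beta A(s)) / int ds exp(-beta A(s)) on a uniform grid."""
    A = A_ml(s_grid)
    w = np.exp(-beta * (A - A.min()))         # shifting A by its minimum avoids overflow
    return np.sum(a(s_grid) * w) / np.sum(w)  # the grid spacing cancels in the ratio

# Hypothetical stand-in for a trained model: a harmonic surface A(s) = k s^2 / 2
k, beta = 2.0, 1.5
s = np.linspace(-10.0, 10.0, 20001)
avg = cv_average(lambda s: s**2, lambda s: 0.5 * k * s**2, s, beta)
print(avg, 1.0 / (beta * k))   # numerical estimate and the analytical value 1/(beta k)
```

Shifting the free energy by a constant leaves the average unchanged because the constant cancels between numerator and denominator, which is why only free-energy differences matter here.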

Fig. 17.6 Four molecules employed to test machine learning regression of free-energy landscapes: (a) alanine dipeptide, (b) alanine tripeptide, (c) met-enkephalin oligopeptide (amino acid sequence Tyr-Gly-Gly-Phe-Met), (d) zwitterionic alanine pentapeptide (reproduced with permission from Cendagorta et al., J. Phys. Chem. B 124, 3647 (2020), copyright American Chemical Society).

For this comparative study, we will focus on a set of small peptides, commonly used as benchmark cases, and the corresponding conformational free-energy landscapes as a function of their backbone Ramachandran dihedral (φ, ψ) angles, which are used as collective variables. The four systems are: the alanine dipeptide, the alanine tripeptide, and the oligopeptide met-enkephalin (amino acid sequence Tyr-Gly-Gly-Phe-Met), which are studied in vacuum; and the alanine pentapeptide, which is studied in zwitterionic form in aqueous solution. These molecules are pictured in Fig. 17.6. For the alanine dipeptide, there are just two Ramachandran angles; for the alanine tripeptide, the number of angles used is four. For met-enkephalin, ten angles are needed, and for the solvated alanine pentapeptide, the inner three residues and corresponding six Ramachandran angles are selected, as these are the same as have been used in experimental studies (Feng et al., 2016). The gas-phase simulations are performed using the CHARMM22 force field (MacKerell et al., 1998) while the solvated alanine pentapeptide is simulated using the OPLS-AA force field (Jorgensen et al., 1996). All of the training data for the machine learning models are generated from d-AFED/TAMD simulations (see Section 8.10). The simulation parameters are set as follows: for the di- and tripeptides, $T_s = 1500$ K and $\mu_\alpha = 168.0$ amu·Å²/rad²; for met-enkephalin, $T_s = 400$ K and $\mu_\alpha = 2.8$ amu·Å²/rad²; for the alanine pentapeptide, $T_s = 1000$ K and $\mu_\alpha = 168.0$ amu·Å²/rad². In all simulations, the harmonic coupling between the collective variables and the coarse-grained variables is 2.78×10³ kcal/(mol·rad²).

Training and test data. For the alanine dipeptide, 9×10⁴ values of $(s_1, s_2)$ and $A(s_1, s_2)$ are generated as a training set. These values cover the entire free-energy surface, which is partitioned into a 300 by 300 grid of evenly spaced bins. For the alanine tripeptide, a training set of 2×10⁵ points on the four-dimensional free-energy surface is randomly selected from the d-AFED/TAMD trajectory, and free-energy values $A(s_1, s_2, s_3, s_4)$ are obtained from a Gaussian fit to the histogram corresponding to these points (Chen et al., 2012). For met-enkephalin, the 1081 minima and 1431 index-1 saddle points and their corresponding free-energy values identified by Chen et al. (2015) on the ten-dimensional surface are employed as the training set. Finally, for the aqueous alanine pentapeptide, a 1 ms simulation is performed and 10⁶ free-energy points are randomly selected from a Gaussian fit to the histogram as the training set. For all systems, an additional 50,000 points are randomly generated in separate d-AFED/TAMD runs, and these points are used as a test set. The training and test sets are carefully checked to ensure there is no overlap between these two sets.
For complete consistency, all machine learning models used in the comparison are trained on the same training sets for each system and tested using the same test set.

Machine learning model details. Each of the machine learning models employed in this comparative study involves hyperparameters that must be chosen. For Gaussian-based kernel methods, there are two hyperparameters, specifically, the Gaussian width σ and the regularization or ridge parameter λ. The random forest and K-nearest neighbor models have a number of hyperparameters. For K-nearest neighbors, if we choose $d_i = 1$ (as we do here), then the only hyperparameter that needs to be determined is the value of $K$. For the random-forest model, the key parameters are the number of trees in the ensemble or forest and the number of input variables to place into each random subset. For feed-forward neural networks, the hyperparameters are the number of hidden layers and the number of nodes in each layer. In this comparison, a ten-fold cross validation is used to perform the hyperparameter search. The resulting neural network for the alanine di- and tripeptides consists of two hidden layers with 20 nodes in each layer for the dipeptide and 40 nodes in each layer for the tripeptide. For met-enkephalin, three hidden layers are employed with 100, 50, and 50 nodes in each of the three layers, respectively. For the aqueous alanine pentapeptide, three hidden layers are employed with 60, 30, and 30 nodes in each layer, respectively. For the kernel, K-nearest neighbors, and random-forest models, the resulting hyperparameters depend on the size of the training set, and as the learning curves are generated as a function of the training set size, the number of parameters determined is quite large and can be found in tables in the supporting information document accompanying the work of Cendagorta et al. (2020); the interested reader is encouraged to study these tables. It is worth noting that for larger training set sizes used with random forests, the number of trees is more than 200 for the di- and tripeptides and approximately 50 for met-enkephalin and the alanine pentapeptide. The learning curves are computed with respect to the test set using the $L_2$ error formula

\[
L_2 = \sqrt{\frac{1}{N_{\mathrm{test}}} \sum_{j=1}^{N_{\mathrm{test}}} \left(A_{\mathrm{ML}}(\mathbf{s}_j) - A_{\mathrm{test}}(\mathbf{s}_j)\right)^2},
\tag{17.7.3}
\]

where $N_{\mathrm{test}}$ is the number of points in the test set (here, 50,000) and $A_{\mathrm{test}}(\mathbf{s}_j)$ is the (known) free energy at the jth test point.

Generating observables. In order to test the ability of the trained machine learning models to generate observables from eqn. (17.7.1), we select different types of observables for each system. For the alanine tripeptide, we study the following “observable”:

\[
O(\{\phi, \psi\}) = \sqrt{\frac{1}{2n} \sum_{i=1}^{n} \left[\left(\phi_i - \phi_i^{(\mathrm{min})}\right)^2 + \left(\psi_i - \psi_i^{(\mathrm{min})}\right)^2\right]},
\tag{17.7.4}
\]

where $n$ is the number of Ramachandran angle pairs used to generate the free-energy surface ($n = 2$ for the tripeptide). The angles $\phi_i^{(\mathrm{min})}$ and $\psi_i^{(\mathrm{min})}$ are the angles at the global minimum of the free-energy surface. Although this is not a physical observable, it is a sensitive test of the ability of the machine learning model to generate an observable that depends on the full set of collective variables. For met-enkephalin, we compute the average of the $H_N H_\alpha$ nuclear magnetic resonance (NMR) J-couplings, which characterize the indirect interaction between the nuclear spins of the $C_\alpha$ hydrogen and the amide hydrogen. These J-couplings can be computed using the Karplus equation (Karplus, 1959)

\[
J(\phi) = A \cos^2(\phi - \phi_0) + B \cos(\phi - \phi_1) + C,
\tag{17.7.5}
\]

where φ is the Ramachandran angle, A = 7.09 Hz, B = 1.42 Hz, C = 1.55 Hz, and the constant angles $\phi_0$ and $\phi_1$ are both 60°. J(φ) is computed for each amino acid residue in the oligopeptide. Finally, for the alanine pentapeptide, we focus on the propensities for different secondary structural motifs, specifically, α helix, β sheet, and the left-handed polyproline II helix (ppII). These are defined by simple indicators that are functions of φ and ψ and define specific regions in the φ-ψ plane for each alanine residue. The definitions are as follows: α : −160° < φ < −20° and −120° < ψ < β : −180° < φ < −90° and 50° < ψ
\[
d_i = \min_{\{j \,:\, \rho_j > \rho_i\}} r_{ij}.
\tag{17.8.3}
\]

That is, we compute the smallest distance between the point xi and any other point of higher density. If xi is already the point of highest density, then we can compute di = maxj rij . Cluster centroids are now recognized as points xi for which di is anomalously large. The algorithm is illustrated in Fig. 17.15. Once the cluster centroids are determined, cluster membership of each remaining point is determined by assigning it to the same cluster as its nearest neighbor of higher density. Thus, the assignment begins with the cluster centroids, themselves, as these are points of maximum local density.
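The quantity d described above can be computed directly from the pairwise distances. In the sketch below, the density ρ is assumed to be the number of neighbors within a cutoff radius `r_c`; this definition, the function name `density_peak_distances`, and the toy data are assumptions made for illustration and may differ from the density used in the text.

```python
import numpy as np

def density_peak_distances(X, r_c):
    """Return (rho, d): an assumed cutoff-count density and the distance to the nearest denser point."""
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances r_ij
    rho = (r < r_c).sum(axis=1) - 1        # neighbors within r_c, excluding the point itself
    d = np.empty(len(X))
    for i in range(len(X)):
        higher = rho > rho[i]              # points of strictly higher density
        # Nearest denser point, or the largest distance if x_i is the densest point
        d[i] = r[i, higher].min() if higher.any() else r[i].max()
    return rho, d

# Two small clusters: a center with four satellites and a center with three satellites
X = np.array([[0.0, 0.0], [0.5, 0.0], [-0.5, 0.0], [0.0, 0.5], [0.0, -0.5],
              [5.0, 0.0], [5.5, 0.0], [4.5, 0.0], [5.0, 0.5]])
rho, d = density_peak_distances(X, r_c=0.6)
centroids = np.argsort(d)[-2:]   # the two points with the largest d values
print(rho, d, centroids)
```

In this toy set, the two cluster centers stand out with d values an order of magnitude larger than those of the satellites, mirroring the anomalously large d used to recognize centroids. Points with tied densities would need a tie-breaking rule, which is not addressed here.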

17.9

Intrinsic dimension of a data manifold

We conclude this chapter by showing how the neighbors of a data point can be used to estimate the dimensionality of a data manifold. Data sets used in learning protocols often lie in a low-dimensional space embedded in a higher dimensional one, and the lower dimension of the data set might not be obvious under such embedding. An example of such an embedding might be a Swiss roll embedded in a rectangular box (see Fig. 17.16). This lower dimensionality is known as the intrinsic dimension of the data manifold or space that contains the data points. Although the embedding dimension of the box in Fig. 17.16 is three, the intrinsic dimension of the Swiss-roll data set is two.


Fig. 17.15 Illustration of density-based clustering. Two clusters are shown in the left panel as circles and squares. Values of d and ρ are plotted in the right panel. The two centroids emerge as points of anomalously high d values (see right panel).

Fig. 17.16 A Swiss roll data manifold embedded in a rectangular box.

Fig. 17.17 Intrinsic dimension as a function of the size of the data set for the alanine dipeptide. The data set is taken from a d-AFED/TAMD run that uses the two Ramachandran dihedral angles as collective variables.

It was suggested in Section 17.5 that the change in intrinsic dimension through the layers of a neural network can be an important metric for evaluating the performance of the network. Facco et al. (2017) introduced an approach for estimating this intrinsic dimension using only the nearest and second nearest neighbors of each point on the data manifold. Given a set of $n$ data points $\mathbf{x}_1, ..., \mathbf{x}_n$, let $r_1, ..., r_k$ be the distances from a point $\mathbf{x}_i$ to its $k$ nearest neighbors in the data set. If these distances are arranged in ascending order such that $r_1 < r_2 < r_3 \cdots < r_k$, then $r_1$ and $r_2$ correspond to the nearest and second-nearest neighbors of $\mathbf{x}_i$, respectively. Introduce the ratio $\mu = r_2/r_1$. Since $r_2 > r_1$, it follows that $\mu \in [1, \infty)$. If the ratio µ is computed for every point in the data set, then $n$ values of µ, $\mu_1, ..., \mu_n$, will be obtained, and a histogram of µ values can be generated. This histogram represents a probability distribution $f(\mu)$ whose analytical form can be shown to be

\[
f(\mu) = d \mu^{-d-1} \theta(\mu - 1),
\tag{17.9.1}
\]

where $d$ is the intrinsic dimension of the data set. If we now integrate $f(\mu)$, we obtain the cumulative probability $P(\mu)$ of µ values

\[
P(\mu) = \int_{-\infty}^{\mu} f(y)\, dy = d \int_{1}^{\mu} y^{-d-1}\, dy = \left(1 - \mu^{-d}\right) \theta(\mu - 1).
\tag{17.9.2}
\]

Equation (17.9.2) is now solved for the intrinsic dimension $d$ to give

\[
d = -\frac{\ln\left(1 - P(\mu)\right)}{\ln \mu}.
\tag{17.9.3}
\]

Equation (17.9.3) prescribes a straightforward approach for calculating the intrinsic dimension of a data manifold: We only need to compute the probability P(µ) of different values of µ and feed these µ and P(µ) into eqn. (17.9.3); the result will be the intrinsic dimension d of the data set. The result of applying eqn. (17.9.3) to the sampled data set for the alanine dipeptide, shown in Fig. 17.17, shows that the intrinsic dimension of data used to obtain the free-energy surface from a d-AFED/TAMD simulation using the Ramachandran angles as collective variables (see Fig. 8.5) is two, as expected. Since this is a data set from an enhanced sampling simulation, the plot is somewhat noisier than what we would expect to observe for the synthetic data in Facco et al. (2017); however, a trend toward the value of two is clear. If the dimensionality of the data were not known a priori, this type of analysis would be capable of revealing it.
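The procedure based on eqn. (17.9.3) can be sketched in a few lines. The example below uses the empirical cumulative distribution of the sorted µ values for P(µ) and pools all points into a single estimate of d by fitting a least-squares line through the origin to −ln(1 − P) versus ln µ. The fraction of the largest µ values that is discarded, the function name `two_nn_dimension`, and the synthetic data (a plane linearly embedded in three dimensions) are assumptions made for this illustration.

```python
import numpy as np

def two_nn_dimension(X, discard_fraction=0.1):
    """Estimate the intrinsic dimension from nearest and second-nearest neighbor distances."""
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(r, np.inf)                    # exclude each point's distance to itself
    r_sorted = np.sort(r, axis=1)
    mu = np.sort(r_sorted[:, 1] / r_sorted[:, 0])  # ratios r2/r1, in ascending order
    n = len(mu)
    P = np.arange(1, n + 1) / n                    # empirical cumulative probability P(mu)
    keep = int(n * (1.0 - discard_fraction))       # drop the largest mu, where 1 - P approaches 0
    x = np.log(mu[:keep])
    y = -np.log(1.0 - P[:keep])
    return np.sum(x * y) / np.sum(x * x)           # slope of the line through the origin

rng = np.random.default_rng(0)
plane = rng.random((1000, 2))                                  # uniform points on a 2D square
embed = plane @ np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])   # linear embedding into 3D
d_est = two_nn_dimension(embed)
print(d_est)
```

Although the points live in a three-dimensional embedding space, the estimate should land close to two, the dimension of the underlying plane. The dense pairwise distance matrix limits this sketch to a few thousand points; a neighbor search structure would be needed for larger sets.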

17.10

Problems

17.1. Verify the following properties for a Bernoulli distribution:

\[
\sum_{x=0}^{1} P_B(x; \nu) = 1, \qquad \langle x \rangle = \nu, \qquad \langle x^2 \rangle - \langle x \rangle^2 = \nu(1 - \nu).
\]

17.2. Recall that the Shannon entropy $S[f]$ for a probability distribution $f(\mathbf{x})$ is

\[
S[f] = -\int d\mathbf{x}\, f(\mathbf{x}) \ln f(\mathbf{x}).
\]

a. By performing a maximization of $S[f]$ over all distributions $f(\mathbf{x})$ subject to the three constraints

\[
\int d\mathbf{x}\, f(\mathbf{x}) = 1, \qquad
\int d\mathbf{x}\, \mathbf{x} f(\mathbf{x}) = \boldsymbol{\mu}, \qquad
\int d\mathbf{x}\, (\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}} f(\mathbf{x}) = \Sigma,
\]

show that the resulting distribution is the Gaussian distribution of eqn. (17.2.3).

b. Show that the entropy of the Gaussian distribution in eqn. (17.2.3) is given by

\[
S[P_G] = \frac{1}{2} \ln \det(\Sigma) + \frac{n}{2} \left(1 + \ln(2\pi)\right).
\]

17.3. a. Derive eqns. (17.3.5).

b. For the linear regression model in eqn. (17.3.10), find the analytical solution for $w_0$ and $\mathbf{w}$ by optimizing the loss function in eqn. (17.3.11). Examine your solution in the limit that λ = 0.


17.4. Starting from the data model in eqn. (17.4.9), optimize the loss function in eqn. (17.4.10) and show that the analytical solution is given by eqn. (17.4.8).

17.5. For each of the diagrams shown in Fig. 17.18, write explicit expressions for the corresponding feed-forward neural networks, based either on eqn. (17.5.9) or on eqn. (17.5.31), depending on which general form applies.

Fig. 17.18 Diagrams for problem 17.5.

17.6. Generalize eqn. (17.5.9) to a feed-forward neural network having $M$ hidden layers with $m_1, ..., m_M$ nodes in each hidden layer.

∗17.7. Derive eqns. (17.5.19) and (17.5.20).

17.8. Gradient training of a neural network requires optimization of the loss function given in eqn. (17.5.25). Derive the back propagation scheme for this loss function.

17.9. Consider the following application of Bayes’ theorem to a regression neural network. Let eqn. (17.3.7) be used to construct a likelihood function for the network in terms of its loss function. We now define a prior distribution

\[
p(\mathbf{w}) = \left(\frac{\lambda}{2\pi}\right)^{n/2} e^{-\lambda \mathbf{w} \cdot \mathbf{w}},
\]

where $\dim(\mathbf{w}) = n$. If we additionally define a new loss function as the logarithm of the posterior probability distribution, show that the new loss function is given by eqn. (17.5.11) but with an additional ridge regularization term.

17.10. The following ten data points are assumed to lie approximately along the curve y(x) = sin(2πx): (0, 0.30), (1/9, 0.86), (2/9, 1.00), (1/3, 0.98), (4/9, 0.10), (5/9, 0.06), (2/3, −0.90), (7/9, −0.40), (8/9, −0.50), (1, 0.29). The points are plotted on the accompanying figure along with the function y(x) = sin(2πx). Write a program to train a regression neural network with a single hidden layer on these ten points. In particular, train three such networks, the first with one node in the hidden layer, the second with three nodes in the hidden layer, and the third with ten nodes in the hidden layer. For this problem, the use of the gradient descent algorithm in eqn. (17.5.21) is relatively straightforward; however, more advanced readers might wish to try the stochastic gradient descent approach. Plot the three functions that result from each trained network along with the ten training points. Is there an optimal number of nodes in the hidden layer? What happens if the number of nodes in the hidden layer is too large?

Fig. 17.19 Plot for problem 17.10.

17.11. Consider the matrix x and filter F given below

\[
\mathrm{x} = \begin{pmatrix} 1 & 3 & 5 & 2 \\ 5 & 7 & 9 & 2 \\ 2 & 4 & 5 & 2 \end{pmatrix}, \qquad
\mathrm{F} = \begin{pmatrix} 1 & 2 \\ -1 & 1 \end{pmatrix}.
\]

Determine the convolution matrix X = x ◦ F.


17.12. Consider a classification neural network with C > 2 classes. Derive eqns. (17.5.36) through (17.5.38) and show that the final activation function H(x) should be the softmax function given in eqn. (17.5.39). If the neural network has no hidden layers, to what explicit form does the learning model simplify?

17.13. Determine how the back propagation approach needs to be modified for a classification neural network with C > 2 classes. Hint: Do not forget to take into account the constraint that $\sum_{k=1}^{C} y_k^{(i)} = 1$.

17.14. The phase classification example in Section 17.7.2 employs molecular dynamics based enhanced sampling in order to generate the free-energy profiles in Fig. 17.13. The forces needed by these methods require derivatives of the form $\partial f(Q(\mathbf{r}))/\partial \mathbf{r}_i$ on atom i. These derivatives are computed using the chain rule, which means that products of the form

\[
\frac{\partial f}{\partial Q} \frac{\partial Q}{\partial G_k} \cdot \frac{\partial G_k}{\partial \mathbf{r}_i}
\]

need to be computed. Here, $G_k$ is one of the symmetry functions discussed in Section 17.7.2. Discuss qualitatively how each term is most efficiently evaluated. Some of the approaches in Appendix C might be useful for your discussion. Derive an explicit expression for the force on atom i for a neural network that contains one hidden layer with $M$ nodes, whose output classifies $p$ phases, and whose input consists only of $G_2$ descriptors with a single value of η and $n$ different values of $R_s$.

17.15. Let P(x) and Q(x) be two normalized probability distributions. The relative Shannon entropy between P and Q with respect to P is known as the Kullback-Leibler (KL) divergence and is defined by

\[
\mathrm{KL}(P \| Q) = -\int dx\, P(x) \ln\left[\frac{Q(x)}{P(x)}\right].
\]

The KL divergence is a measure of a statistical “distance” between P and Q.

a. If P(x) is a Gaussian distribution of mean µ and width σ, and Q(x) is a one-dimensional Gaussian distribution of mean ν and width λ, calculate the KL divergence between P and Q.

∗b. Repeat for multivariate Gaussian distributions of a vector x of dimension n, assuming that P(x) has a mean vector µ and covariance matrix Σ and that Q(x) has a mean vector ν and covariance matrix Λ.

17.16. Derive a probability density corresponding to the K-nearest neighbors learning model, and show that this distribution cannot be normalized. That is, show that the integral of this distribution over all space is divergent.

∗∗17.17. Define the $L_2$ norm of a function $f(x_1, ..., x_n)$ of $n$ variables over an n-dimensional volume Ω as

\[
\| f(x_1, ..., x_n) \| = \left[ \int_\Omega dx_1 \cdots dx_n \left(f(x_1, ..., x_n)\right)^2 \right]^{1/2}.
\]

Let $x_1, ..., x_n$ be variables such that $x_i \in [0, 1]$, $i = 1, ..., n$, and let $\phi_i(x)$ be monotonically increasing functions $\phi_i : [0, 1] \to [0, 1]$. Let ε and δ, with 0 < ε < 1 and 0 < δ < 1, be ordinary numbers. Finally, let $\gamma_1(x)$ be a function such that $\gamma_1 : \mathbb{R} \to \mathbb{R}$, and suppose we can choose $\gamma_1$ such that $\|\gamma_1\| \leq \|f\|$,

\[
\left\| f(x_1, ..., x_n) - \sum_{q=1}^{2n+1} \gamma_1\left( \sum_{i=1}^{n} \lambda_i \phi_q(x_i) \right) \right\| \leq (1 - \epsilon) \|f\|,
\]

and $\|\gamma_1\| = \delta \|f\|$. Let us now define a series of functions $\gamma_j : \mathbb{R} \to \mathbb{R}$ and $h_j : [0, 1]^n \to \mathbb{R}$ such that

\[
h_j(x_1, ..., x_n) = \sum_{q=1}^{2n+1} \gamma_j\left( \sum_{i=1}^{n} \lambda_i \phi_q(x_i) \right).
\]

With these definitions, note that $\|f - h_1\| = (1 - \epsilon)\|f\|$ and $\|\gamma_1\| = \delta \|f\|$.

a. Show that this series of functions leads to an approximation to $f(x_1, ..., x_n)$ such that

\[
\left\| f - \sum_{j=1}^{r} h_j \right\| \leq (1 - \epsilon)^r \|f\|
\]

and

\[
\|\gamma_r\| = \delta (1 - \epsilon)^{r-1} \|f\|.
\]

b. Now let r → ∞. Show that

\[
\lim_{r \to \infty} \left\| f - \sum_{j=1}^{r} h_j \right\| \leq (1 - \epsilon)^r \|f\| = 0
\]

and

\[
\lim_{r \to \infty} \|\gamma_r\| = 0.
\]

c. Finally, show that

\[
f(x_1, ..., x_n) = \sum_{q=1}^{2n+1} \sum_{j=1}^{\infty} \gamma_j\left( \sum_{i=1}^{n} \lambda_i \phi_q(x_i) \right) \equiv \sum_{q=1}^{2n+1} g\left( \sum_{i=1}^{n} \lambda_i \phi_q(x_i) \right)
\]

as in eqn. (17.5.1).

d. Can this procedure provide guidance on how to construct a feed-forward neural network for regression of a function? Explain.


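17.18. In Problem 8.12, we considered the effect of an invertible transformation on the partition function of a complex system. Machine learning models can be trained to learn such transformations. Let $\mathbf{x} \in \mathbb{R}^n$ be a vector, and let $f : \mathbb{R}^n \to \mathbb{R}^n$ be an invertible, smooth mapping with inverse $f^{-1}$. Let $\mathbf{z} = f(\mathbf{x})$ and $\mathbf{x} = f^{-1}(\mathbf{z})$, and let $P(\mathbf{x})$ be a probability distribution function of $\mathbf{x}$.

a. Show that under the transformation $\mathbf{z} = f(\mathbf{x})$, the probability distribution $P(\mathbf{x})$ transforms as

\[
\tilde{P}(\mathbf{z}) = P(\mathbf{x}) \left| \det\left( \frac{\partial f^{-1}}{\partial \mathbf{z}} \right) \right| = P(\mathbf{x}) \left| \det\left( \frac{\partial f}{\partial \mathbf{x}} \right) \right|^{-1},
\]

where $|\det(\partial f/\partial \mathbf{x})|$ is the determinant of the transformation.

b. Suppose we now have a sequence of invertible transformations from $\mathbf{z}_0 \equiv \mathbf{x}$ to $\mathbf{z}_K$: $\mathbf{z}_1 = f_1(\mathbf{z}_0)$, $\mathbf{z}_2 = f_2(\mathbf{z}_1)$, ..., $\mathbf{z}_K = f_K(\mathbf{z}_{K-1})$. If $P_0(\mathbf{z}_0)$ is a probability distribution of $\mathbf{z}_0$ and $P_K(\mathbf{z}_K)$ is a distribution of $\mathbf{z}_K$, show that

\[
\ln P_K(\mathbf{z}_K) = \ln P_0(\mathbf{z}_0) - \sum_{k=1}^{K} \ln \left| \det\left( \frac{\partial f_k}{\partial \mathbf{z}_{k-1}} \right) \right|.
\]

c. The sequence of random variables $\{\mathbf{z}_k = f_k(\mathbf{z}_{k-1})\}$ is known as a flow, and the sequence of distributions $\{P_k(\mathbf{z}_k)\}$ is known as a normalizing flow. Based on the discussion of neural networks given in Section 17.5, discuss how a normalizing flow can be formulated and learned as a feed-forward neural network model.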

Appendix A

Properties of the Dirac delta-function

The Dirac delta(δ)-function is informally defined as having infinite height, zero width, and unit area. When centered around x = 0, it can be heuristically represented as

\[
\delta(x) =
\begin{cases}
\infty & x = 0 \\
0 & x \neq 0
\end{cases}.
\tag{A.1}
\]

The δ-function has two important properties. First, its integral is unity:

\[
\int_{-\infty}^{\infty} dx\, \delta(x) = 1.
\tag{A.2}
\]

Second, the integral of a δ-function times any arbitrary function f(x) is

\[
\int_{-\infty}^{\infty} dx\, \delta(x) f(x) = f(0).
\tag{A.3}
\]

Equation (A.1) is not rigorous and, therefore, is not especially useful when we wish to derive the properties of the δ-function. For this reason, we replace eqn. (A.1) with a definition involving a limit of a sequence of functions $\delta_\sigma(x)$ known as δ-sequences:

\[
\delta(x) = \lim_{\sigma \to 0} \delta_\sigma(x).
\tag{A.4}
\]

The choice of the δ-sequence is not unique. Several possible choices are:

1. Normalized Gaussian function

\[
\delta_\sigma(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-x^2/2\sigma^2},
\tag{A.5}
\]

2. Fourier integral

\[
\delta_\sigma(x) = \frac{1}{\pi} \frac{\sigma}{\sigma^2 + x^2} = \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, e^{ikx - \sigma|k|},
\tag{A.6}
\]

3. Scaled sinc function

\[
\delta_\sigma(x) = \frac{1}{\pi\sigma} \operatorname{sinc}\left(\frac{x}{\sigma}\right) = \frac{1}{\pi x} \sin\left(\frac{x}{\sigma}\right).
\tag{A.7}
\]


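Let us employ the normalized Gaussian sequence in eqn. (A.5) to prove eqns. (A.2) and (A.3). Equation (A.2) follows immediately from the fact that eqn. (A.5) is properly normalized so that

\[
\begin{aligned}
\int_{-\infty}^{\infty} dx\, \delta(x) &= \lim_{\sigma \to 0} \int_{-\infty}^{\infty} dx\, \delta_\sigma(x) \\
&= \lim_{\sigma \to 0} \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} dx\, e^{-x^2/2\sigma^2} \\
&= 1.
\end{aligned}
\tag{A.8}
\]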

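To prove eqn. (A.3), we need to perform the integral

\[
\begin{aligned}
\int_{-\infty}^{\infty} dx\, \delta(x) f(x) &= \lim_{\sigma \to 0} \int_{-\infty}^{\infty} dx\, \delta_\sigma(x) f(x) \\
&= \lim_{\sigma \to 0} \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} dx\, e^{-x^2/2\sigma^2} f(x).
\end{aligned}
\tag{A.9}
\]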

This integral can be carried out by introducing a Taylor series expansion for f(x) about x = 0 into eqn. (A.9):

\[
\begin{aligned}
\frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} dx\, e^{-x^2/2\sigma^2} f(x)
&= \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} dx\, e^{-x^2/2\sigma^2} \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n \\
&= \frac{1}{\sqrt{2\pi\sigma^2}} \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} \int_{-\infty}^{\infty} dx\, x^n e^{-x^2/2\sigma^2},
\end{aligned}
\tag{A.10}
\]

where $f^{(n)}(0)$ is the nth derivative of f(x) evaluated at x = 0. If n is odd, the integral vanishes because the product of an odd and an even function yields equal areas of opposite sign for x > 0 and x < 0. For n even, the integrals are just the moments of the Gaussian distribution. These moments have the general form

\[
\frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} dx\, x^n e^{-x^2/2\sigma^2} = (n - 1)!!\, \sigma^n,
\tag{A.11}
\]

where $(n - 1)!! = 1 \cdot 3 \cdot 5 \cdots (n - 1)$ and $(-1)!! \equiv 1$. Since we need the limit σ → 0, the only term that does not vanish in this limit is the n = 0 term. Thus, we find

\[
\lim_{\sigma \to 0} \int_{-\infty}^{\infty} dx\, \delta_\sigma(x) f(x) = f(0).
\tag{A.12}
\]

In a similar manner, it is also possible to prove the more general property

\[
\int_{-\infty}^{\infty} dx\, \delta(x - a) f(x) = f(a),
\tag{A.13}
\]

which is left as an exercise for the reader.
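The limiting behavior in eqns. (A.12) and (A.13) can also be observed numerically. The sketch below smears a test function with the Gaussian δ-sequence of eqn. (A.5) centered at a point a and evaluates the integral by a Riemann sum for decreasing σ. The test function cos x, the point a = 0.7, the integration window, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def delta_gaussian(x, sigma):
    """Normalized Gaussian delta-sequence, eqn. (A.5)."""
    return np.exp(-x**2 / (2.0 * sigma**2)) / np.sqrt(2.0 * np.pi * sigma**2)

def smeared_value(f, a, sigma, half_width=10.0, n=200001):
    """Riemann-sum estimate of int dx delta_sigma(x - a) f(x) over a window of +/- half_width * sigma."""
    x = np.linspace(a - half_width * sigma, a + half_width * sigma, n)
    dx = x[1] - x[0]
    return np.sum(delta_gaussian(x - a, sigma) * f(x)) * dx

a = 0.7
values = [smeared_value(np.cos, a, s) for s in (1.0, 0.1, 0.01)]
print(values, np.cos(a))   # the estimates approach cos(a) as sigma decreases
```

For this particular test function the integral can be done in closed form and equals cos(a) exp(−σ²/2), so the deviation from f(a) shrinks quadratically in σ, in line with the argument that only the n = 0 term of the Taylor series survives the limit.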


Consider, next, a δ-function of the form δ(ax). This can be simplified according to

\[
\delta(ax) = \frac{1}{|a|} \delta(x),
\tag{A.14}
\]

which also implies that δ(−x) = δ(x). Similarly, a δ-function of the form $\delta(x^2 - a^2)$ can be rewritten equivalently as

\[
\delta\left(x^2 - a^2\right) = \frac{1}{2|a|} \left[\delta(x - a) + \delta(x + a)\right].
\tag{A.15}
\]

More generally, given a δ-function δ(g(x)) where g(x) is a function with n zeroes at points $\bar{x}_1, ..., \bar{x}_n$, such that $g'(\bar{x}_i) \neq 0$, the δ-function can be simplified according to

\[
\delta(g(x)) = \sum_{i=1}^{n} \frac{\delta(x - \bar{x}_i)}{|g'(\bar{x}_i)|}.
\tag{A.16}
\]

It is easy to see that eqns. (A.14) and (A.15) are special cases of eqn. (A.16). Equation (A.16) can also be proved using δ-sequences by performing a change of variables y = g(x) and employing similar arguments to those used in proving eqn. (A.3).
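The reduction of eqn. (A.16) to the two special cases can be written out explicitly. The short derivation below assumes a ≠ 0 and uses only the zeroes and first derivatives of the two functions g(x).

```latex
% Case 1: g(x) = ax has a single zero at \bar{x}_1 = 0 with g'(0) = a, so
\delta(ax) = \frac{\delta(x - 0)}{|g'(0)|} = \frac{1}{|a|}\,\delta(x).

% Case 2: g(x) = x^2 - a^2 has two zeroes, \bar{x}_1 = a and \bar{x}_2 = -a,
% with g'(x) = 2x, so |g'(\pm a)| = 2|a| and
\delta\left(x^2 - a^2\right)
  = \frac{\delta(x - a)}{|2a|} + \frac{\delta(x + a)}{|-2a|}
  = \frac{1}{2|a|}\left[\delta(x - a) + \delta(x + a)\right].
```

In both cases the condition that g′ is nonzero at each zero is what keeps the prefactors finite; for a = 0 the second case fails because the zero of x² is degenerate.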

Appendix B Calculus of functionals

Recall that an ordinary scalar function $f(x_1, ..., x_n) \equiv f(x)$ of a vector $x \in \Omega$, where $\Omega$ is an $n$-dimensional space, specifies a rule for mapping a point $x$ in $\Omega$ to a single number. By contrast, a functional $F[f]$ specifies a rule for mapping a function $f(x)$ onto a single number. Note the difference in notation: A functional is specified with square brackets “[ ]” to specify the argument whereas an ordinary function $f(x)$ uses parentheses “( )”. A functional $F[f]$ can depend on the entire range of $f$ or just a subset of this full range. Some examples of functionals are listed below.

1. A simple functional can take the form of a definite integral over $\Omega$ of $f(x)$:
\[
F[f] = \int_\Omega dx\, f(x). \tag{B.1}
\]
An integral is a natural mathematical construct for expressing a functional, as evaluation of an integral requires all values of $f(x)$. That is, in order to compute an integral, the full range of $f$ over the entire space $\Omega$ is needed. A functional need not be a simple integral of $f(x)$. If $g(y)$ is integrable over $\Omega$, then a more general form of a functional is
\[
F[f] = \int_\Omega dx\, g(f(x)). \tag{B.2}
\]
In addition, the integral could contain some other fixed function $w(x)$, i.e.,
\[
F[f] = \int_\Omega dx\, w(x)\, g(f(x)). \tag{B.3}
\]
Note, finally, that if $\omega \subset \Omega$, then a functional that involves only part of the range of $f$ can also be defined:
\[
F[f] = \int_\omega dx\, w(x)\, g(f(x)). \tag{B.4}
\]

Examples in the book of functionals expressed as single integrals are the modified Shannon entropy functional in eqn. (4.4.4), in which x is the phase-space vector, or the Thomas-Fermi kinetic energy functional in eqn. (11.5.46), in which x is the Cartesian spatial vector r. Because functionals of this type are single integrals, they are known as local functionals.


2. Functionals can also take the form of multiple integrals involving $f(x)$ at different points. An example is a double integral
\[
F[f] = \int_\Omega dx \int_\Omega dx'\, g(f(x))\, h(f(x'))\, w(x, x'), \tag{B.5}
\]
where $w(x, x')$ is a fixed function. Functionals of this type can involve any number of integrals. The exchange energy functional in eqn. (11.5.53), for example, contains two integrals. Because functionals of this type involve multiple integrals of $f(x)$ at different points, they are known as nonlocal functionals.

3. Functionals can involve any number of derivatives of $f(x)$, such as the gradient $\nabla f(x)$, the Laplacian $\nabla^2 f(x)$, and/or higher-order derivatives. Local functionals involving derivatives of $f$ take the general form
\[
F[f] = \int_\Omega dx\, w(x)\, g(f(x), \nabla f(x), \nabla^2 f(x), ...). \tag{B.6}
\]

Examples include the action functional in eqns. (1.8.1) and (12.4.5), the Euclidean action integral in eqn. (12.4.11), and generalized gradient functional approximations to exchange and correlation in density functional theory. Note that such functionals can also be nonlocal, as in:
\[
F[f] = \int_\Omega dx \int_\Omega dx'\, g(f(x), \nabla f(x), \nabla^2 f(x), ..., f(x'), \nabla f(x'), \nabla^2 f(x'), ...). \tag{B.7}
\]

4. In some instances, a functional may involve values of $f(x)$ at a set of $M$ discrete points $x_1, ..., x_M$, i.e.,
\[
F[f] = g(f(x_1), f(x_2), ..., f(x_M)). \tag{B.8}
\]

In order to create a calculus of functionals, we first need to define the concept of a derivative of a functional, also known as a functional derivative. Functional derivatives require the concept of a small change in a functional, for which we use the notation of a variation $\delta F$ defined as
\[
\delta F = F[f + \delta f] - F[f]. \tag{B.9}
\]

The variation δF , therefore, requires changing f (x) by a small amount δf (x). However, unlike with simple functions, where a small change x+δx in a point can be understood as a displacement of the point x by δx, the displacement δf (x) of a function is, itself, another function! This brings some arbitrariness to the process of defining the variation δF , and makes the process of taking a limit δf (x) → 0, which is ultimately needed to define a derivative, complex. The easiest way to proceed in computing a derivative of a functional with respect to its function argument is to introduce an arbitrary


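function $\eta(x)$ and a multiplicative parameter $\varepsilon$ such that $\delta f(x) = \varepsilon \eta(x)$. In this way, the derivative can be defined through an integral relation
\[
\int_\Omega dx\, \frac{\delta F}{\delta f(x)}\, \eta(x) \equiv \lim_{\varepsilon\to 0} \frac{F[f + \varepsilon\eta] - F[f]}{\varepsilon} = \left[\frac{d}{d\varepsilon} F[f + \varepsilon\eta]\right]_{\varepsilon=0}. \tag{B.10}
\]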

The quantity $\delta F/\delta f(x)$ is known as the functional derivative of $F[f]$. Since $\eta(x)$ is arbitrary, we can take $\eta(x) = \delta^{(n)}(x - x_0)$, in which case, evaluation of the integral on the left side of eqn. (B.10) yields the functional derivative of $F[f]$, $\delta F/\delta f(x_0)$, at the point $x_0$. However, doing so is not really necessary, as the $\varepsilon$ derivative on the right side of eqn. (B.10) will yield an integral expression involving $\eta(x)$ explicitly, so that what remains in the integrand can be deduced as $\delta F/\delta f(x)$. As an example, consider the functional
\[
F[f] = \int_\Omega dx\, (f(x))^p, \tag{B.11}
\]

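where $p$ is an arbitrary power. Applying eqn. (B.10), we simply need to evaluate
\[
\begin{aligned}
\frac{d}{d\varepsilon} F[f + \varepsilon\eta] &= \frac{d}{d\varepsilon} \int_\Omega dx\, (f(x) + \varepsilon\eta(x))^p \\
&= \int_\Omega dx\, \frac{\partial}{\partial\varepsilon} (f(x) + \varepsilon\eta(x))^p \\
&= \int_\Omega dx\, p\, (f(x) + \varepsilon\eta(x))^{p-1}\, \eta(x).
\end{aligned} \tag{B.12}
\]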

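Now, setting $\varepsilon = 0$, we obtain
\[
\left[\frac{d}{d\varepsilon} F[f + \varepsilon\eta]\right]_{\varepsilon=0} = \int_\Omega dx\, p\, (f(x))^{p-1}\, \eta(x), \tag{B.13}
\]
from which we see that
\[
\frac{\delta F}{\delta f(x)} = p\, (f(x))^{p-1}. \tag{B.14}
\]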
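The definition in eqn. (B.10) lends itself to a direct numerical check of eqn. (B.14). The sketch below is illustrative only: it discretizes $F[f] = \int dx\, (f(x))^p$ on a uniform one-dimensional grid, estimates the $\varepsilon$ derivative by a central difference, and compares it with the same quadrature applied to $p\,(f(x))^{p-1}\eta(x)$. The function names and the choice of quadrature are assumptions made for this example.

```python
def functional(f_vals, h, p):
    """Riemann-sum estimate of F[f] = integral of f(x)**p on a uniform grid of spacing h."""
    return h * sum(v ** p for v in f_vals)

def directional_derivative(f_vals, eta_vals, h, p, eps=1e-5):
    """Central-difference estimate of d/d(eps) F[f + eps*eta] at eps = 0, as in eqn. (B.10)."""
    plus = [f + eps * e for f, e in zip(f_vals, eta_vals)]
    minus = [f - eps * e for f, e in zip(f_vals, eta_vals)]
    return (functional(plus, h, p) - functional(minus, h, p)) / (2.0 * eps)

def predicted_derivative(f_vals, eta_vals, h, p):
    """Same quadrature applied to p * f**(p-1) * eta, the integrand implied by eqn. (B.14)."""
    return h * sum(p * f ** (p - 1) * e for f, e in zip(f_vals, eta_vals))
```

Because both routines use the same grid, their results should agree to within the finite-difference error in $\varepsilon$ for any choice of test function `eta_vals`, which is the discrete counterpart of the statement that $\eta(x)$ is arbitrary.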

Note that, in this case, because functional derivatives are taken at specific point values $f(x)$ of the function, the derivative of a local functional removes the integral, leaving an expression that is evaluated at a specific point $x$. As a second example, consider the functional in eqn. (B.5), with $g(f) = h(f) = f$ and with $w(x, x')$ being a symmetric function, meaning $w(x, x') = w(x', x)$. It is left as an exercise for the reader to show that application of eqn. (B.10) leads to the result
\[
\frac{\delta F}{\delta f(x)} = 2 \int_\Omega dx'\, f(x')\, w(x, x'). \tag{B.15}
\]
As a third example, consider the functional
\[
F[f] = \int_\Omega dx\, g(f(x), \nabla f(x)). \tag{B.16}
\]


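Applying eqn. (B.10) by taking an $\varepsilon$ derivative, we find that
\[
\frac{d}{d\varepsilon} F[f + \varepsilon\eta] = \frac{d}{d\varepsilon} \int_\Omega dx\, g(f(x) + \varepsilon\eta(x), \nabla f(x) + \varepsilon\nabla\eta(x)), \tag{B.17}
\]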

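which involves $\eta(x)$ and its gradient $\nabla\eta(x)$. Taking the derivative with respect to $\varepsilon$ and setting $\varepsilon = 0$, we obtain
\[
\left[\frac{d}{d\varepsilon} F[f + \varepsilon\eta]\right]_{\varepsilon=0} = \int_\Omega dx \left[\frac{\partial g}{\partial f(x)}\, \eta(x) + \frac{\partial g}{\partial \nabla f(x)} \cdot \nabla\eta(x)\right]. \tag{B.18}
\]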

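Although the first term is in the form required by eqn. (B.10), the second term, which involves $\nabla\eta(x)$, is not. However, if we assume that $f(x)$, and hence the variation $\eta(x)$, vanishes at the boundary $\partial\Omega$ of $\Omega$, then we can integrate the second term by parts, dropping the boundary term, and we obtain
\[
\left[\frac{d}{d\varepsilon} F[f + \varepsilon\eta]\right]_{\varepsilon=0} = \int_\Omega dx \left[\frac{\partial g}{\partial f(x)} - \nabla \cdot \frac{\partial g}{\partial \nabla f(x)}\right] \eta(x), \tag{B.19}
\]
from which we see that
\[
\frac{\delta F}{\delta f(x)} = \frac{\partial g}{\partial f(x)} - \nabla \cdot \left(\frac{\partial g}{\partial \nabla f(x)}\right). \tag{B.20}
\]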
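The role of the integration by parts in eqn. (B.20) can be seen in a one-dimensional sketch. As an assumed example (not taken from the text), take $g(f, f') = \tfrac{1}{2}(f')^2 + \tfrac{1}{2}f^2$ on $[0, 1]$, for which eqn. (B.20) gives $\delta F/\delta f(x) = f(x) - f''(x)$. The code discretizes $F[f]$ with finite differences and compares the $\varepsilon$ derivative with the quadrature of $(f - f'')\eta$ for a test function $\eta$ that vanishes at both endpoints. All names are hypothetical.

```python
def functional_value(f_vals, h):
    """F[f] = integral of (f'(x)**2 + f(x)**2) / 2 on a uniform grid of spacing h.

    f' is approximated by forward differences on each interval; the f**2 term
    uses the trapezoidal rule.
    """
    gradient = sum(((f_vals[k + 1] - f_vals[k]) / h) ** 2 for k in range(len(f_vals) - 1))
    local = sum(v * v for v in f_vals) - 0.5 * (f_vals[0] ** 2 + f_vals[-1] ** 2)
    return 0.5 * h * (gradient + local)

def directional_derivative(f_vals, eta_vals, h, eps=1e-4):
    """Central-difference estimate of d/d(eps) F[f + eps*eta] at eps = 0."""
    plus = [f + eps * e for f, e in zip(f_vals, eta_vals)]
    minus = [f - eps * e for f, e in zip(f_vals, eta_vals)]
    return (functional_value(plus, h) - functional_value(minus, h)) / (2.0 * eps)

def predicted_derivative(f_vals, f_second_vals, eta_vals, h):
    """Quadrature of (f - f'') * eta, the integrand implied by eqn. (B.20) for this g.

    A plain sum equals the trapezoidal rule here because eta vanishes at both ends.
    """
    return h * sum((f - f2) * e for f, f2, e in zip(f_vals, f_second_vals, eta_vals))
```

With $f(x) = \cos x$, which does not itself vanish at the endpoints, and $\eta(x) = x(1 - x)$, the two estimates agree up to discretization error of order $h^2$; what the boundary term actually requires is that the variation $\eta$ vanish on $\partial\Omega$.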

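It is left as an exercise to the reader to show that if $F[f]$ is
\[
F[f] = \int_\Omega dx\, g(f(x), \nabla f(x), \nabla^2 f(x)), \tag{B.21}
\]
then
\[
\frac{\delta F}{\delta f(x)} = \frac{\partial g}{\partial f(x)} - \nabla \cdot \left(\frac{\partial g}{\partial \nabla f(x)}\right) + \nabla^2 \frac{\partial g}{\partial \nabla^2 f(x)}. \tag{B.22}
\]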

As a further exercise, the reader should verify the following result: Let $f(x)$ be a function of a single variable $x$, and consider a local functional of the form
\[
F[f] = \int dx\, g(f(x), f'(x), f''(x), f'''(x), ..., f^{(n)}(x)). \tag{B.23}
\]
The functional derivative of $F[f]$ is then
\[
\frac{\delta F}{\delta f(x)} = \frac{\partial g}{\partial f(x)} - \frac{d}{dx} \frac{\partial g}{\partial f'(x)} + \frac{d^2}{dx^2} \frac{\partial g}{\partial f''(x)} + \cdots + (-1)^n \frac{d^n}{dx^n} \frac{\partial g}{\partial f^{(n)}(x)}, \tag{B.24}
\]

where $f^{(n)}(x)$ is the $n$th derivative of $f(x)$. According to eqn. (B.10), all that is required to define a functional derivative is a simple parameter derivative with respect to $\varepsilon$; therefore, it is straightforward to


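define higher order functional derivatives. For example, a second functional derivative $\delta^2 F/\delta f(x)\delta f(x')$ is defined by
\[
\int_\Omega dx\, dx'\, \frac{\delta^2 F}{\delta f(x)\, \delta f(x')}\, \eta(x)\, \eta(x') = \left[\frac{d^2}{d\varepsilon^2} F[f + \varepsilon\eta]\right]_{\varepsilon=0}, \tag{B.25}
\]
with analogous definitions of third, fourth, ..., $n$th functional derivatives of $F[f]$, i.e.,
\[
\int_\Omega dx_1 \cdots dx_n\, \frac{\delta^n F}{\delta f(x_1) \cdots \delta f(x_n)}\, \eta(x_1) \cdots \eta(x_n) = \left[\frac{d^n}{d\varepsilon^n} F[f + \varepsilon\eta]\right]_{\varepsilon=0}. \tag{B.26}
\]

With this general definition, we can also define a functional Taylor series for a finite displacement $\Delta f(x)$ of $f(x)$:
\[
F[f + \Delta f] = F[f] + \sum_{n=1}^{\infty} \frac{1}{n!} \int_\Omega dx_1 \cdots dx_n\, \frac{\delta^n F}{\delta f(x_1) \cdots \delta f(x_n)}\, \Delta f(x_1) \cdots \Delta f(x_n). \tag{B.27}
\]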

To conclude this appendix, we briefly discuss the concept of functional integration. Section 12.4 in Chapter 12 provides a fairly detailed discussion of functional integration in the context of the Feynman path integral. Functional integration of a functional $F[f]$ is tantamount to the limit of a “sum” over all possible functions $f(x)$ via a process in which $f$ is systematically varied. As was done for the functional derivative, we start by defining functional integration in discretized form. In order to keep the discussion simple and avoid the complexities of multiple integration, we will consider functionals $F[f]$ of functions $f(x)$ of a single variable $x \in [a, b]$. Moreover, let us consider a simple local functional of the form
\[
F[f] = \int_a^b g(f(x))\, dx. \tag{B.28}
\]

Let us further assume that the functional integral of $F[f]$ must include all functions satisfying fixed endpoint conditions $f(a) = f_a$ and $f(b) = f_b$. To construct such a functional integral, we first discretize the integral in eqn. (B.28) using, for example, a trapezoidal rule integration:
\[
\int_a^b g(f(x))\, dx = \lim_{M\to\infty} \frac{b - a}{M} \left[\frac{1}{2}\left(g(f_a) + g(f_b)\right) + \sum_{k=1}^{M-1} g(f(x_k))\right], \tag{B.29}
\]
where the interval $[a, b]$ has been discretized into $M + 1$ points $x_k$ such that $x_0 = a$ and $x_M = b$. Ultimately, the exact result is recovered in the limit that $M \to \infty$. Equation (B.29) now replaces the integral in eqn. (B.28) as
\[
F[f] = \lim_{M\to\infty} \frac{b - a}{M} \left[\frac{1}{2}\left(g(f_a) + g(f_b)\right) + \sum_{k=1}^{M-1} g(f(x_k))\right]. \tag{B.30}
\]

Let us define f (xk ) ≡ fk . Integration over all functions f (x) satisfying the endpoint condition f (x0 ) = f (a) = fa and f (xM ) = f (b) = fb now requires that we perform


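an integration over all $f_k$, $k = 0, ..., M$, subject to the condition that $f_0$ and $f_M$ are fixed. We indicate this (following the notation of Chapter 12) as
\[
\begin{aligned}
\int_{f(a)=f_a}^{f(b)=f_b} \mathcal{D}f\, F[f] &= \int_{f(a)=f_a}^{f(b)=f_b} \mathcal{D}f \int_a^b g(f(x))\, dx \\
&\equiv \lim_{M\to\infty} \frac{b - a}{M} \int \prod_{k=0}^{M} df_k \left.\left[\frac{g(f_a) + g(f_b)}{2} + \sum_{k=1}^{M-1} g(f_k)\right]\right|_{f_0 = f_a,\, f_M = f_b}.
\end{aligned} \tag{B.31}
\]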

Equation (B.31) provides a definition of functional integration for local functionals. As discussed in Chapter 12, the integral in eqn. (B.31) is performed by carrying out the integration on the right side as a function of the number of points M and then taking the limit M → ∞ of the result. If F [f ] includes derivatives f ′ (x), f ′′ (x),..., these are evaluated numerically, e.g., f ′ (x) → (fk − fk−1 )/∆, where ∆ = (b − a)/M , and so forth. As illustrated in Chapter 12, there are approaches for performing the integration directly in terms of the continuous functions themselves; however, there are very few examples for which this can be done, these typically being restricted to quadratic functionals. In general, the integration needs to be carried out in discrete form for finite M numerically using the right side of eqn. (B.31). As the numerical calculation is repeated for increasing M , the result will converge asymptotically to the infinite-M limit, which is then taken as the result. The processes of discrete and continuous functional integration based on eqn. (B.31) are illustrated in Figs. 12.6 and 12.8.
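The discretization underlying eqns. (B.29) through (B.31) is easy to examine for a single, fixed function $f(x)$. The sketch below, with hypothetical function names, evaluates the bracketed trapezoidal expression of eqn. (B.30) at finite $M$; it does not perform the integration over the $f_k$ themselves, which is the subject of Chapter 12.

```python
import math

def discretized_functional(g, f, a, b, M):
    """Finite-M trapezoidal value of F[f] = integral_a^b g(f(x)) dx, as in eqn. (B.30)."""
    h = (b - a) / M
    total = 0.5 * (g(f(a)) + g(f(b)))   # endpoint terms with f_0 = f(a), f_M = f(b)
    for k in range(1, M):
        total += g(f(a + k * h))        # interior points f_k = f(x_k)
    return h * total
```

For example, with $g(y) = e^y$ and $f(x) = x$ on $[0, 1]$, the exact value is $e - 1$, and the finite-$M$ result approaches it with an error that decreases as $1/M^2$, which illustrates the asymptotic convergence in $M$ described above.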

Appendix C Evaluation of energies and forces

In any Monte Carlo or molecular dynamics calculation of a many-body system, the most time-consuming step is the evaluation of energies and, when needed, forces. In this appendix, we will discuss the efficient evaluation of energies and forces for simple nonpolarizable force fields under periodic boundary conditions. Let us begin by considering a system described by a typical nonpolarizable force field as in eqn. (3.11.1). The bonding and bending terms are straightforward and computationally inexpensive to evaluate; dihedral angles can be calculated from the angle between the planes defined by atoms 1, 2, and 3 and atoms 2, 3, and 4. By far, the most expensive part of the calculation is the evaluation of the nonbonded (nb) forces and energies given by the Lennard-Jones and Coulomb terms:
\[
U_{\rm nb}(\mathbf{r}_1, ..., \mathbf{r}_N) = \sum_{i>j\in{\rm nb}} \left\{ 4\epsilon_{ij} \left[ \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6} \right] + \frac{q_i q_j}{r_{ij}} \right\}. \tag{C.1}
\]

As written, the evaluation of these energies and their associated forces requires $O(N^2)$ operations, a calculation that quickly becomes intractable as $N$ becomes very large. Simulations of even small proteins in solution typically require at least $10^4$ to $10^5$ atoms, while simulations in materials science, such as crack formation and propagation, can involve up to $10^9$ atoms. Many interesting physical and chemical processes such as these require long time scales to be accessed, and in order to reach these time scales, a very large number of energy and force evaluations is needed. Such calculations would clearly not be possible if the quadratic scaling could not be ameliorated. The goal, therefore, is to evaluate the terms in eqn. (C.1) with $O(N)$, or at worst, $O(N \ln N)$, scaling.

The first thing we see about eqn. (C.1) is that the Lennard-Jones and Coulomb terms have substantially different length scales. The former is relatively short range and could possibly be truncated as a means of improving the scaling, but this is not true of the Coulomb interaction, which is very long range and would, therefore, suffer from severe truncation artifacts (see, for example, Patra et al. (2003)). Let us begin, therefore, by examining the Coulomb term more closely. In order to reduce the computational overhead of the long-range Coulomb interaction, we only need to recognize that any function we might characterize as long ranged in real space becomes short ranged in reciprocal or Fourier space. Moreover, since we are dealing with a periodic system, a Fourier representation is entirely appropriate. Therefore, we can tame the Coulomb interaction by dividing it into a contribution that is short ranged in real space and one that is short ranged in reciprocal space. Given two rapidly


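convergent terms, the Coulomb interaction should be easier to evaluate and exhibit better scaling. We begin by introducing a simple identity that pertains to a function known as the error function ${\rm erf}(x)$, defined by
\[
{\rm erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x dt\, e^{-t^2}. \tag{C.2}
\]
In the limit $x \to \infty$, ${\rm erf}(x) \to 1$. Note also that ${\rm erf}(0) = 0$. The error function has a complement ${\rm erfc}(x)$ defined by
\[
{\rm erfc}(x) = 1 - {\rm erf}(x) = \frac{2}{\sqrt{\pi}} \int_x^\infty dt\, e^{-t^2}. \tag{C.3}
\]
Both functions are defined for $x \geq 0$. Note that as $x \to \infty$, ${\rm erfc}(x) \to 0$. From these definitions, it is clear that ${\rm erf}(x) + {\rm erfc}(x) = 1$. This identity can now be used to divide up the Coulomb interaction into short-range and long-range components. Consider writing $1/r$ as
\[
\frac{1}{r} = \frac{{\rm erfc}(\alpha r)}{r} + \frac{{\rm erf}(\alpha r)}{r}, \tag{C.4}
\]
where the parameter $\alpha$ has units of inverse length. Since ${\rm erfc}(\alpha r)$ decays rapidly as $r$ increases, the first term in eqn. (C.4) is short ranged. The parameter $\alpha$ can be used to tune the range over which ${\rm erfc}(\alpha r)/r$ is nonnegligible. The second term in eqn. (C.4) is long ranged and behaves asymptotically as $1/r$. Introducing eqn. (C.4) into eqn. (C.1), we can write the nonbonded potential energy in terms of short- and long-ranged components as
\[
\begin{aligned}
U_{\rm nb}(\mathbf{r}_1, ..., \mathbf{r}_N) &= U_{\rm short}(\mathbf{r}_1, ..., \mathbf{r}_N) + U_{\rm long}(\mathbf{r}_1, ..., \mathbf{r}_N) \\
U_{\rm short}(\mathbf{r}_1, ..., \mathbf{r}_N) &= \sum_{\mathbf{S}} \sum_{i>j\in{\rm nb}} \left\{ 4\epsilon_{ij} \left[ \left(\frac{\sigma_{ij}}{r_{ij,\mathbf{S}}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij,\mathbf{S}}}\right)^{6} \right] + \frac{q_i q_j\, {\rm erfc}(\alpha r_{ij,\mathbf{S}})}{r_{ij,\mathbf{S}}} \right\} \\
U_{\rm long}(\mathbf{r}_1, ..., \mathbf{r}_N) &= \sum_{\mathbf{S}} \sum_{i>j\in{\rm nb}} \frac{q_i q_j\, {\rm erf}(\alpha r_{ij,\mathbf{S}})}{r_{ij,\mathbf{S}}},
\end{aligned} \tag{C.5}
\]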

where $r_{ij,\mathbf{S}} = |\mathbf{r}_i - \mathbf{r}_j + \mathbf{S}|$, with $\mathbf{S} = \mathbf{m}L$ for a periodic cubic box of side $L$, and $\mathbf{S} = \mathbf{h}\mathbf{m}$ for a general box matrix $\mathbf{h}$. Here $\mathbf{m}$ is a vector of integers. Note that if a system is sufficiently large, then the sums over the lattice vectors $\mathbf{S}$ reduce to the single term $\mathbf{m} = (0, 0, 0)$. As we will soon see, the use of the potential-energy contributions in eqn. (C.5) has a distinct advantage over eqn. (C.1).

Let us consider first evaluating the short-range forces in eqn. (C.5). For notational convenience, we denote this term compactly as $U_{\rm short} = \sum_{\mathbf{S}} \sum_{i>j} u_{\rm short}(r_{ij,\mathbf{S}})$, where $u_{\rm short}(r)$ consists of the Lennard-Jones and complementary error function terms. Because the range of these interactions is finite, we can reduce the computational overhead needed to evaluate them by introducing a cutoff radius $r_c$, beyond which we assume that the forces are negligible. The cutoff $r_c$ is important for determining $\alpha$.


Experience has shown that a good balance between the short- and long-range components is achieved if $\alpha = 3.5/r_c$. If $r_c$ is chosen to be half the length $L$ of a cubic simulation box, $r_c = L/2$, then the value $\alpha = 7/L$ should be used. For large systems, however, a typical value of $r_c$ is in the range of 10 to 12 Å, which is roughly a factor of 3 larger than a typical value of $\sigma_{ij}$. In most simulations, it is the case that at $r = r_c$, $u_{\rm short}(r)$ is not exactly zero, in which case the potential energy exhibits a small jump discontinuity as the distance between two particles passes through $r = r_c$. This discontinuity leads to a degradation in energy conservation. One approach for circumventing this problem is simply to shift the potential $u_{\rm short}(r)$ by an amount $u_{\rm short}(r_c)$, so that the short-range interaction between two particles $i$ and $j$ becomes
\[
\tilde{u}_{\rm short}(r_{ij}) = \begin{cases} u_{\rm short}(r_{ij}) - u_{\rm short}(r_c) & \text{if } r_{ij} < r_c \\ 0 & \text{otherwise.} \end{cases} \tag{C.6}
\]
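The short-range pair term of eqn. (C.5) and the shifted form in eqn. (C.6) can be sketched as follows. This is an illustrative fragment rather than production code: the function names are hypothetical, the Coulomb prefactor is taken to be 1 as in eqn. (C.1), and $\alpha = 3.5/r_c$ is used as the default, following the rule of thumb quoted above.

```python
import math

def u_short(r, eps, sigma, qi, qj, alpha):
    """Lennard-Jones plus erfc-screened Coulomb pair energy (short-range term of eqn. C.5)."""
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * eps * (sr6 * sr6 - sr6)
    screened_coulomb = qi * qj * math.erfc(alpha * r) / r
    return lennard_jones + screened_coulomb

def u_short_shifted(r, rc, eps, sigma, qi, qj, alpha=None):
    """Shifted short-range pair energy of eqn. (C.6); zero at and beyond the cutoff rc."""
    if alpha is None:
        alpha = 3.5 / rc   # assumed default, taken from the rule of thumb in the text
    if r >= rc:
        return 0.0
    return u_short(r, eps, sigma, qi, qj, alpha) - u_short(rc, eps, sigma, qi, qj, alpha)
```

With $\alpha r_c = 3.5$, the screening factor at the cutoff is ${\rm erfc}(3.5)$, which is below $10^{-6}$, so the shift applied to the Coulomb part is very small. The shift makes the energy continuous at $r = r_c$ but leaves the force discontinuous there.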

However, there is a second problem with the use of cutoffs, which is that the forces also exhibit a discontinuity at $r = r_c$; a simple shift does not remove this discontinuity. A more robust truncation protocol that ensures continuous energies and forces is to smoothly switch off $u_{\rm short}(r)$ to zero via a switching function $S(r)$. When a switch is used, the potential becomes $\tilde{u}_{\rm short}(r) = u_{\rm short}(r) S(r)$. An example of such a function $S(r)$ was given in eqn. (3.14.4). Switching functions become particularly important for simulations in the isothermal-isobaric ensemble, as the pressure estimator in eqn. (4.7.57) is especially sensitive to discontinuities in the force (Martyna et al., 1999).

An important point concerning truncation of the potential is its effect on energies and pressures. In fact, when the potential is assumed to be zero beyond $r_c$, many weak interactions among particle pairs will not be included. However, the contribution of these neglected interactions can be estimated using eqns. (4.7.46) and (4.7.69). Instead of integrating these expressions from 0 to $\infty$, if the lower limit is taken to be $r_c$, then the integrals can be performed in the approximation that for $r > r_c$, $g(r) \approx 1$. For example, the corrections to the energy per particle and pressure for a Lennard-Jones potential in this approximation would be
\[
\begin{aligned}
\Delta u &= 2\pi\rho \int_{r_c}^{\infty} dr\, r^2 u(r) \\
&= 8\epsilon\pi\rho \int_{r_c}^{\infty} dr\, r^2 \left[ \left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6} \right] \\
&= \frac{8\pi\epsilon\rho\sigma^3}{3} \left[ \frac{1}{3}\left(\frac{\sigma}{r_c}\right)^{9} - \left(\frac{\sigma}{r_c}\right)^{3} \right],
\end{aligned} \tag{C.7}
\]
\[
\begin{aligned}
\Delta P &= -\frac{2\pi\rho^2}{3} \int_{r_c}^{\infty} dr\, r^3 u'(r) \\
&= 16\pi\rho^2\epsilon \int_{r_c}^{\infty} dr\, r^2 \left[ 2\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6} \right] \\
&= \frac{16\pi\rho^2\epsilon\sigma^3}{3} \left[ \frac{2}{3}\left(\frac{\sigma}{r_c}\right)^{9} - \left(\frac{\sigma}{r_c}\right)^{3} \right].
\end{aligned} \tag{C.8}
\]
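The closed forms in eqns. (C.7) and (C.8) can be verified against direct quadrature of the first line of each equation. The sketch below is illustrative; the function names are hypothetical, and the substitution $t = 1/r$, which maps the semi-infinite range onto $[0, 1/r_c]$ and turns both integrands into polynomials in $t$, is a convenience chosen for this example.

```python
import math

def lj_energy_tail(rho, eps, sigma, rc):
    """Energy-per-particle correction, last line of eqn. (C.7)."""
    x3 = (sigma / rc) ** 3
    return 8.0 * math.pi * eps * rho * sigma ** 3 / 3.0 * (x3 ** 3 / 3.0 - x3)

def lj_pressure_tail(rho, eps, sigma, rc):
    """Pressure correction, last line of eqn. (C.8)."""
    x3 = (sigma / rc) ** 3
    return 16.0 * math.pi * rho ** 2 * eps * sigma ** 3 / 3.0 * (2.0 * x3 ** 3 / 3.0 - x3)

def simpson(func, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * func(lo + k * h)
    return total * h / 3.0

def lj_energy_tail_numeric(rho, eps, sigma, rc):
    """2*pi*rho * integral_{rc}^{inf} r^2 u(r) dr, evaluated in the variable t = 1/r."""
    integrand = lambda t: 4.0 * eps * (sigma ** 12 * t ** 8 - sigma ** 6 * t ** 2)
    return 2.0 * math.pi * rho * simpson(integrand, 0.0, 1.0 / rc)

def lj_pressure_tail_numeric(rho, eps, sigma, rc):
    """-(2*pi*rho^2/3) * integral_{rc}^{inf} r^3 u'(r) dr, evaluated in t = 1/r."""
    integrand = lambda t: 4.0 * eps * (-12.0 * sigma ** 12 * t ** 8 + 6.0 * sigma ** 6 * t ** 2)
    return -2.0 * math.pi * rho ** 2 / 3.0 * simpson(integrand, 0.0, 1.0 / rc)
```

In reduced Lennard-Jones units ($\epsilon = \sigma = 1$) with a cutoff of a few $\sigma$, the attractive $r^{-6}$ tail dominates, so both corrections come out negative.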

Equations (C.7) and (C.8) are called the standard long-range corrections (Allen and Tildesley, 1989; Frenkel and Smit, 2002). They should be included in order to obtain accurate thermodynamic averages from simulations using truncated potentials.

When periodic boundary conditions are employed (see left panel of Fig. 13.7), interparticle distances are computed using the minimum image convention. The rule of this convention is that in the infinite periodic array, particle $i$ interacts with the periodic image of particle $j$ to which it is closest. For a system with an arbitrary box matrix $\mathbf{h}$, the three components of the vector difference $\mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j$ are computed as follows (minimum image convention):
\[
\begin{aligned}
\mathbf{s}_i &= \mathbf{h}^{-1}\mathbf{r}_i \\
\mathbf{s}_{ij} &= \mathbf{s}_i - \mathbf{s}_j \\
\mathbf{s}_{ij} &\longleftarrow \mathbf{s}_{ij} - {\rm NINT}(\mathbf{s}_{ij}) \\
\mathbf{r}_{ij} &= \mathbf{h}\mathbf{s}_{ij},
\end{aligned} \tag{C.9}
\]

where NINT($x$) is the nearest-integer function. The first line transforms the coordinates into the scaled coordinates of eqn. (5.8.2); the second line gives the raw interparticle vector differences; the third line applies the minimum image convention to these differences; the last line transforms the differences back to the original Cartesian variables.

A cutoff radius alone is insufficient to reduce the computational overhead associated with the short-range forces since it is still necessary to look over all $N^2$ pairs to determine if the condition $r_{ij} < r_c$ is satisfied for a given particle pair. However, since the cutoff radius ensures that a given particle will only interact with a finite number of other particles, we can build a list of all particles $j$ with which a particle $i$ interacts. Such a list is known as a Verlet neighbor list (Verlet, 1967). Since particles typically diffuse, the neighbor list must be updated from time to time during the simulation. The average frequency of updates is controlled by a parameter $\delta$ known as the skin length. Figure C.1 illustrates the construction of a neighbor list. For each particle $i$, we identify all particles that lie within a distance $r_c + \delta$ of $i$ and store the indices of these neighboring particles in an array. Once the list is built, we simply loop over its elements and calculate all of the forces and energies of the pairs. Because some of the pairs in the list are in the distance range $r_c < r_{ij} < r_c + \delta$, some of the energies and forces will be zero. The smaller $\delta$ is chosen, the fewer such interactions there will be, which saves computational time. However, the larger $\delta$ is chosen, the more infrequently the list needs to be updated. Thus, the value of $\delta$ should be optimized at the start of any simulation. At the start of a simulation, the initial positions $\mathbf{r}_1(0), ..., \mathbf{r}_N(0)$ are saved and an initial Verlet list is generated.
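The minimum image convention of eqn. (C.9) and the construction of the neighbor list can be sketched as follows. The fragment makes simplifying assumptions that are not part of the general formulation: the box is cubic with side `L` (so that $\mathbf{h} = L\,\mathbf{I}$), the list is built with a direct double loop over pairs, and the function names are hypothetical.

```python
import math

def minimum_image(ri, rj, L):
    """Minimum-image difference r_i - r_j for a cubic box of side L (eqn. C.9 with h = L*I)."""
    diff = []
    for a, b in zip(ri, rj):
        s = (a - b) / L              # scaled coordinate difference
        s -= math.floor(s + 0.5)     # subtract the nearest integer
        diff.append(s * L)           # back to Cartesian coordinates
    return diff

def build_verlet_list(positions, L, rc, skin):
    """For each particle i, list all j > i closer than rc + skin under minimum image."""
    n = len(positions)
    r_list_sq = (rc + skin) ** 2
    neighbors = [[] for _ in range(n)]
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = minimum_image(positions[i], positions[j], L)
            if d[0] * d[0] + d[1] * d[1] + d[2] * d[2] < r_list_sq:
                neighbors[i].append(j)
    return neighbors
```

Storing only pairs with $j > i$ counts each interaction once. Note that this direct construction itself costs $O(N^2)$ operations per rebuild.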
At each time step $k\Delta t$, $k = 1, 2, 3, ...$, the displacement $\Delta_i$ of each particle from its initial position, $\Delta_i = |\mathbf{r}_i(k\Delta t) - \mathbf{r}_i(0)|$, is computed and the maximum over all of these displacements, $\Delta_{\max}$, is subsequently determined. If at the $n$th step $\Delta_{\max} > \delta/2$, then the interactions contained in the list will no longer be correct, which means that the neighbor list needs to be recomputed. The saved positions $\mathbf{r}_1(0), ..., \mathbf{r}_N(0)$ are overwritten by $\mathbf{r}_1(n\Delta t), ..., \mathbf{r}_N(n\Delta t)$, and the Verlet list is regenerated using these positions. At each time step subsequent to the list update, the displacement test is performed using $\Delta_i = |\mathbf{r}_i((k + n)\Delta t) - \mathbf{r}_i(n\Delta t)|$. If $\Delta_{\max} > \delta/2$, then the neighbor list must be generated again. In this way, we can determine automatically when the neighbor list must be recalculated.

Fig. C.1 Building a Verlet list of the neighbors of particle i.

It is worth mentioning in passing that an alternative to the Verlet list is the link list or cell list (Hockney and Eastwood, 1981; Allen and Tildesley, 1989; Frenkel and Smit, 2002). This method consists of dividing the system into cells of size equal to or slightly larger than the cutoff radius $r_c$. Thus, each particle interacts only with particles in the same cell or in nearest-neighbor cells. A link list is generally more efficient than a Verlet list in large systems, as the calculation of the latter scales as $O(N^2)$ while the former scales as $O(N)$. However, the logic needed to write such a list is more involved. A very useful intermediate approach uses a Verlet list for computing the forces and energies and a link list to create the Verlet list. This improves considerably the efficiency of generating the Verlet list. Finally, fast algorithms for computing short-range forces on massively parallel architectures have been developed (Plimpton, 1995) and have proved highly successful, although we will not discuss such algorithms here.

We now turn to the evaluation of the long-range energy and forces in eqn. (C.5). Because ${\rm erf}(\alpha r)/r$ behaves as $1/r$ for large $r$, the long-range energy and forces cannot


be evaluated efficiently in real space. However, because a long-ranged function in real space is short ranged in Fourier or reciprocal space, if the latter is used, the scaling can be improved, thereby reducing the computational overhead. Moreover, a Fourier expansion of ${\rm erf}(\alpha r)/r$ is consistent with the periodic boundary conditions typically used in Monte Carlo and molecular dynamics calculations. For simplicity, we will consider the case of a cubic simulation cell of length $L$ and volume $V = L^3$. As we saw in Section 11.2, the reciprocal space of such a cell is composed of all reciprocal-space vectors $\mathbf{g} = 2\pi\mathbf{n}/L$, where $\mathbf{n}$ is a vector of integers. Using the Poisson summation rule, we can expand the error function term in a Fourier series as
\[
\sum_{\mathbf{S}} \frac{{\rm erf}(\alpha|\mathbf{r} + \mathbf{S}|)}{|\mathbf{r} + \mathbf{S}|} = \frac{1}{V} \sum_{\mathbf{g}} C_{\mathbf{g}}\, e^{i\mathbf{g}\cdot\mathbf{r}}. \tag{C.10}
\]
In eqn. (C.10), the expansion coefficients are
\[
\begin{aligned}
C_{\mathbf{g}} &= \sum_{\mathbf{S}} \int_{D(V)} d\mathbf{r}\, \frac{{\rm erf}(\alpha|\mathbf{r} + \mathbf{S}|)}{|\mathbf{r} + \mathbf{S}|}\, e^{-i\mathbf{g}\cdot\mathbf{r}} \\
&= \int_{\rm all\ space} d\mathbf{r}\, \frac{{\rm erf}(\alpha r)}{r}\, e^{-i\mathbf{g}\cdot\mathbf{r}} \\
&= \frac{4\pi}{|\mathbf{g}|^2}\, e^{-|\mathbf{g}|^2/4\alpha^2}.
\end{aligned} \tag{C.11}
\]

The second line in eqn. (C.11) again follows from the Poisson summation formula, which allows the sum over $\mathbf{S}$ to be eliminated in favor of an integral over all space. Note that the coefficient corresponding to $\mathbf{n} = (0, 0, 0)$ is not defined, hence this term must be excluded from the sum in eqn. (C.10). Moreover, it is not possible, in practice, to perform a sum over an infinite number of reciprocal-space vectors as required by eqn. (C.10). However, we note that the factor $\exp(-|\mathbf{g}|^2/4\alpha^2)$ decays to zero very quickly as $|\mathbf{g}| \to \infty$, which means that the sum over reciprocal-space vectors can be truncated and restricted only to a very small part of reciprocal space. We can readily see, therefore, the advantage of working in Fourier space when faced with the calculation of long-range forces! Because $C_{\mathbf{g}}$ only depends on the magnitude $|\mathbf{g}|$, the most natural way to truncate the Fourier sum is to restrict the sum to all reciprocal-space vectors $\mathbf{g}$ with magnitudes $|\mathbf{g}| \leq g_{\max}$, where $g_{\max}$ is chosen such that $\exp(-g_{\max}^2/4\alpha^2)$ is negligible. The Fourier sum is consequently restricted to $\mathbf{g}$-vectors that lie within a sphere in reciprocal space of radius $g_{\max}$. In addition, because the coefficients $C_{\mathbf{g}}$ are real and depend only on $|\mathbf{g}|$, they satisfy $C_{-\mathbf{g}} = C_{\mathbf{g}}$ and therefore we only need to keep half the $\mathbf{g}$-vectors in the sphere. For example, we could choose the hemisphere $\mathcal{S}$ for which $g_x$ is positive or 0. When the sum is performed over half of reciprocal space, we simply need to multiply the result by 2. Taking into account the truncation of the Fourier sum and the restriction of the $\mathbf{g}$-vectors to a single hemisphere, the long-range potential energy can be expressed as
\[
U_{\rm long}(\mathbf{r}_1, ..., \mathbf{r}_N) = \frac{2}{V} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} q_i q_j \sum_{\mathbf{g}\in\mathcal{S}} \frac{4\pi}{|\mathbf{g}|^2}\, e^{-|\mathbf{g}|^2/4\alpha^2}\, e^{i\mathbf{g}\cdot(\mathbf{r}_i - \mathbf{r}_j)}. \tag{C.12}
\]


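As written, the sum over particles in eqn. (C.12) scales as $O(N^2)$. However, the computational overhead of eqn. (C.12) can be reduced substantially with a few clever tricks. First, consider writing eqn. (C.12) as
\[
U_{\rm long}(\mathbf{r}_1, ..., \mathbf{r}_N) = \frac{1}{V} \sum_{i=1}^{N} \sum_{j=1, j\neq i}^{N} q_i q_j \sum_{\mathbf{g}\in\mathcal{S}} \frac{4\pi}{|\mathbf{g}|^2}\, e^{-|\mathbf{g}|^2/4\alpha^2}\, e^{i\mathbf{g}\cdot(\mathbf{r}_i - \mathbf{r}_j)}. \tag{C.13}
\]
Next, we add and subtract the term with $i = j$ to yield
\[
U_{\rm long}(\mathbf{r}_1, ..., \mathbf{r}_N) = \frac{1}{V} \sum_{i,j=1}^{N} q_i q_j \sum_{\mathbf{g}\in\mathcal{S}} \frac{4\pi}{|\mathbf{g}|^2}\, e^{-|\mathbf{g}|^2/4\alpha^2}\, e^{i\mathbf{g}\cdot(\mathbf{r}_i - \mathbf{r}_j)} - \frac{1}{V} \sum_{i=1}^{N} q_i^2 \sum_{\mathbf{g}\in\mathcal{S}} \frac{4\pi}{|\mathbf{g}|^2}\, e^{-|\mathbf{g}|^2/4\alpha^2}. \tag{C.14}
\]
When this is done, the first term can be written as the square magnitude of a single sum:
\[
U_{\rm long}(\mathbf{r}_1, ..., \mathbf{r}_N) = \frac{1}{V} \sum_{\mathbf{g}\in\mathcal{S}} \frac{4\pi}{|\mathbf{g}|^2}\, e^{-|\mathbf{g}|^2/4\alpha^2} \left| \sum_{i=1}^{N} q_i\, e^{i\mathbf{g}\cdot\mathbf{r}_i} \right|^2 - \frac{1}{V} \sum_{i=1}^{N} q_i^2 \sum_{\mathbf{g}\in\mathcal{S}} \frac{4\pi}{|\mathbf{g}|^2}\, e^{-|\mathbf{g}|^2/4\alpha^2}. \tag{C.15}
\]
Note the presence of the structure factor $S(\mathbf{g}) = \sum_i q_i \exp(i\mathbf{g}\cdot\mathbf{r}_i)$ in the first term of eqn. (C.15). The reciprocal-space sum in the second term of eqn. (C.15) can either be evaluated once in the beginning and stored, or it can also be performed analytically if we extend the sum to all of reciprocal space. For the latter, we first write
\[
U_{\rm long}(\mathbf{r}_1, ..., \mathbf{r}_N) = \frac{1}{V} \sum_{\mathbf{g}\in\mathcal{S}} \frac{4\pi}{|\mathbf{g}|^2}\, e^{-|\mathbf{g}|^2/4\alpha^2}\, |S(\mathbf{g})|^2 - \frac{1}{2V} \sum_{i=1}^{N} q_i^2 \sum_{\mathbf{g}\neq(0,0,0)} \frac{4\pi}{|\mathbf{g}|^2}\, e^{-|\mathbf{g}|^2/4\alpha^2}. \tag{C.16}
\]
We now note that
\[
\frac{1}{V} \sum_{\mathbf{g}} \frac{4\pi}{|\mathbf{g}|^2}\, e^{-|\mathbf{g}|^2/4\alpha^2} = \lim_{r\to 0} \frac{{\rm erf}(\alpha r)}{r}. \tag{C.17}
\]
Since $r$ and ${\rm erf}(\alpha r)$ are both zero at $r = 0$, the limit in eqn. (C.17) can be performed using L'Hôpital's rule, so that
\[
\lim_{r\to 0} \frac{{\rm erf}(\alpha r)}{r} = \lim_{r\to 0} \frac{2}{\sqrt{\pi}\, r} \int_0^{\alpha r} e^{-t^2}\, dt = \lim_{r\to 0} \frac{2\alpha}{\sqrt{\pi}}\, e^{-\alpha^2 r^2} = \frac{2\alpha}{\sqrt{\pi}}. \tag{C.18}
\]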


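From this limit, eqn. (C.16) becomes
\[
U_{\rm long}(\mathbf{r}_1, ..., \mathbf{r}_N) = \frac{1}{V} \sum_{\mathbf{g}\in\mathcal{S}} \frac{4\pi}{|\mathbf{g}|^2}\, e^{-|\mathbf{g}|^2/4\alpha^2}\, |S(\mathbf{g})|^2 - \frac{\alpha}{\sqrt{\pi}} \sum_{i=1}^{N} q_i^2. \tag{C.19}
\]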
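The rearrangement from the pair sum to the structure-factor form can be checked on a small system. The sketch below is a direct, unoptimized illustration under stated assumptions: the box is cubic, the half of reciprocal space is selected by requiring the first nonzero component of $\mathbf{n}$ to be positive (one way of keeping exactly one member of each $\pm\mathbf{g}$ pair), the self term is evaluated with the same truncated sum as in eqn. (C.15) rather than with the analytic limit, and the real part of the complex exponential is taken in the pair sum. All function names are hypothetical.

```python
import cmath
import math

def half_space_vectors(L, n_max, g_max):
    """Reciprocal vectors g = 2*pi*n/L with |g| <= g_max, one from each +/- g pair."""
    vectors = []
    for nx in range(0, n_max + 1):
        for ny in range(-n_max, n_max + 1):
            for nz in range(-n_max, n_max + 1):
                first_positive = nx > 0 or (nx == 0 and ny > 0) or (nx == 0 and ny == 0 and nz > 0)
                if not first_positive:
                    continue
                g = (2.0 * math.pi * nx / L, 2.0 * math.pi * ny / L, 2.0 * math.pi * nz / L)
                if math.sqrt(g[0] ** 2 + g[1] ** 2 + g[2] ** 2) <= g_max:
                    vectors.append(g)
    return vectors

def coefficient(g, alpha):
    """C_g = (4*pi/|g|^2) * exp(-|g|^2 / (4*alpha^2)), as in eqn. (C.11)."""
    g2 = g[0] ** 2 + g[1] ** 2 + g[2] ** 2
    return 4.0 * math.pi / g2 * math.exp(-g2 / (4.0 * alpha * alpha))

def u_long_pair_sum(positions, charges, L, alpha, gvecs):
    """Real part of the O(N^2) double sum over pairs, eqn. (C.12)."""
    total = 0.0
    n = len(charges)
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            for g in gvecs:
                phase = g[0] * d[0] + g[1] * d[1] + g[2] * d[2]
                total += charges[i] * charges[j] * coefficient(g, alpha) * math.cos(phase)
    return 2.0 * total / L ** 3

def u_long_structure_factor(positions, charges, L, alpha, gvecs):
    """Structure-factor form with the truncated self term, eqn. (C.15)."""
    q_squared = sum(q * q for q in charges)
    total = 0.0
    for g in gvecs:
        s_g = sum(q * cmath.exp(1j * (g[0] * r[0] + g[1] * r[1] + g[2] * r[2]))
                  for q, r in zip(charges, positions))
        total += coefficient(g, alpha) * (abs(s_g) ** 2 - q_squared)
    return total / L ** 3
```

The two routines should return the same number to within roundoff, but the second requires only one pass over the particles for each $\mathbf{g}$-vector. Replacing the truncated self term by $-(\alpha/\sqrt{\pi})\sum_i q_i^2$ gives the form in eqn. (C.19).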

Equation (C.19) is known as the Ewald sum for the long-range part of the Coulomb interaction (Ewald, 1921). The second term in eqn. (C.19) is known as the self-interaction correction since it cancels out the $i = j$ term, which describes a long-range interaction of a particle with itself. The error that results when the second reciprocal-space sum in eqn. (C.15) is extended over all of Fourier space can be compensated for by multiplying the last term by ${\rm erfc}(g_{\max}/2\alpha)$.

Although the sum in eqn. (C.19) is more efficient to evaluate than the double sum in eqn. (C.12), two important technical problems remain. First, the $|S(\mathbf{g})|^2$ factor in eqn. (C.19) leads to an interaction between all charged particles. Unfortunately, if a system contains molecules, then charged particles involved in common bond, bend, or torsional interactions must not also have a charge–charge interaction, as these are built into the intramolecular potential energy function. Therefore, these unwanted Coulomb interactions need to be excluded from the Ewald sum. One way to solve this problem is to add these unwanted terms with the opposite sign in real space so that they become new contributions to the intramolecular potential and approximately cancel their reciprocal-space counterparts. If this is done, eqn. (C.19) becomes
\[
U_{\rm long}(\mathbf{r}_1, ..., \mathbf{r}_N) = \frac{1}{V} \sum_{\mathbf{g}\in\mathcal{S}} \frac{4\pi}{|\mathbf{g}|^2}\, e^{-|\mathbf{g}|^2/4\alpha^2}\, |S(\mathbf{g})|^2 - \sum_{i,j\in{\rm intra}} \frac{q_i q_j\, {\rm erf}(\alpha r_{ij})}{r_{ij}} - \frac{\alpha}{\sqrt{\pi}} \sum_{i=1}^{N} q_i^2, \tag{C.20}
\]

where “intra” denotes the full set of bonded interactions. The second term in eqn. (C.20) can, therefore, be incorporated into the bonded terms in eqn. (3.11.1). Equation (C.20) is straightforward to implement, but because we are attempting to achieve the cancellation in real space rather than in Fourier space, the cancellation is imperfect. Indeed, Procacci et al. (1998) showed that better cancellation is achieved if the excluded interactions are corrected using a reciprocal-space expression that accounts for reciprocal-space truncation. In particular, these authors showed that the excluded interactions can be computed more precisely if an additional correction is added, which is given by
\[
U_{\rm corr}(\mathbf{r}_1, ..., \mathbf{r}_N) = \sum_{i,j\in{\rm intra}} q_i q_j\, \chi(r_{ij}, g_{\max}) + \frac{\alpha}{\sqrt{\pi}}\, {\rm erfc}(g_{\max}/2\alpha) \sum_{i=1}^{N} q_i^2, \tag{C.21}
\]
where
\[
\chi(r, g_{\max}) = \frac{2}{\pi} \int_{g_{\max}}^{\infty} dg\, e^{-g^2/4\alpha^2}\, \frac{\sin(gr)}{gr}. \tag{C.22}
\]

An important technical point about the form of the Ewald sum is that it causes the potential to acquire an explicit volume dependence. Thus, when calculating the


pressure or performing simulations in the NPT ensemble, it becomes necessary to use the pressure estimators in eqns. (4.7.58) and (5.8.29). The second technical problem associated with the Ewald sum is the high computational overhead associated with the calculation of the structure factor $S(\mathbf{g}) = \sum_i q_i \exp(i\mathbf{g}\cdot\mathbf{r}_i)$ for large systems. As L increases, the number of g-vectors satisfying the criterion $|\mathbf{g}| \le g_{\max}$ becomes quite large, and the particle sum in S(g), which also increases, must be carried out at each g-vector. The smooth particle-mesh Ewald (SPME) method, an important advance introduced by Essmann et al. (1995), substantially reduces the cost of the Ewald sum and improves the scaling with system size. The SPME approach can be viewed as a kind of “smearing” of the charges, which are located at arbitrary spatial points $\mathbf{r}_i$, over a finite set of points on a regular rectangular lattice. In practice, this “smearing” is realized mathematically by employing an interpolation formula to express the exponential $\exp(i\mathbf{g}\cdot\mathbf{r}_i)$ in terms of exponential factors that depend on the coordinates of the grid points rather than on the arbitrary particle coordinate $\mathbf{r}_i$. Essmann et al. achieved this interpolation using a set of spline functions known as the cardinal B-spline functions. The functions, denoted $M_n(x)$, are defined as follows: $M_2(x) = 1 - |x - 1|$ for $0 \le x \le 2$ and $M_2(x) = 0$ for x < 0 and x > 2. For n > 2, $M_n(x)$ are defined via the recursion relations

$$M_n(x) = \frac{x}{n-1} M_{n-1}(x) + \frac{n-x}{n-1} M_{n-1}(x-1). \tag{C.23}$$

The cardinal B-spline functions have several important properties. First, they have compact support, meaning that $M_n(x)$ is zero outside the interval $0 \le x \le n$. Second, $M_n(x)$ is n − 2 times continuously differentiable. Third, the derivative $dM_n/dx$ can be obtained from $M_{n-1}(x)$ using another recurrence relation

$$\frac{d}{dx} M_n(x) = M_{n-1}(x) - M_{n-1}(x-1). \tag{C.24}$$

Two other useful properties are:

$$M_n(x) = M_n(n - x), \qquad \sum_{j=-\infty}^{\infty} M_n(x - j) = 1. \tag{C.25}$$
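The recursion in eqn. (C.23) and the properties in eqns. (C.24) and (C.25) are easy to check numerically. The following is a minimal sketch; the function names `bspline` and `bspline_derivative` are illustrative, and the plain recursion is chosen for clarity rather than speed (its cost grows as 2^n, which is acceptable for the small orders used in practice).

```python
def bspline(n, x):
    """Cardinal B-spline M_n(x), n >= 2, built from the recursion of eqn. (C.23)."""
    if n < 2:
        raise ValueError("order n must be at least 2")
    if n == 2:
        # M_2(x) = 1 - |x - 1| on [0, 2] and zero elsewhere
        return max(0.0, 1.0 - abs(x - 1.0))
    return (x * bspline(n - 1, x) + (n - x) * bspline(n - 1, x - 1.0)) / (n - 1)

def bspline_derivative(n, x):
    """dM_n/dx from eqn. (C.24); requires n >= 3."""
    return bspline(n - 1, x) - bspline(n - 1, x - 1.0)
```

For instance, `bspline(4, 2.0)` returns the peak value 2/3 of the cubic B-spline, and summing `bspline(4, x - j)` over integer j gives 1 for any x.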

A plot of the first few cardinal B-spline functions is shown in Fig. C.2.

Fig. C.2 Cardinal B-spline functions for n = 2, 4, 6, 8.

We now introduce the scaled particle coordinates $\mathbf{s}_i = V^{-1/3}\mathbf{r}_i$ (which reduce to $\mathbf{s}_i = \mathbf{r}_i/L$ for a cubic box), so that $\exp(i\mathbf{g}\cdot\mathbf{r}_i) = \exp(2\pi i\mathbf{n}\cdot\mathbf{s}_i)$. Let the lattice be cubic with $N_l$ points along each direction and define new coordinates $\mathbf{u}_i = N_l \mathbf{s}_i$, such that $u_{i,\alpha} \in [0, N_l]$, α = x, y, z. Then, $\exp(2\pi i\mathbf{n}\cdot\mathbf{s}_i) = \exp(2\pi i\mathbf{n}\cdot\mathbf{u}_i/N_l)$. We can then approximate the exponential $\exp(2\pi i n_\alpha u_{i,\alpha}/N_l)$ using a cardinal B-spline interpolation formula as

$$e^{2\pi i n_\alpha u_{i,\alpha}/N_l} \approx b_n(n_\alpha) \sum_{k=-\infty}^{\infty} M_n(u_{i,\alpha} - k)\, e^{2\pi i n_\alpha k/N_l}, \tag{C.26}$$

where

$$b_n(\nu) = e^{2\pi i (n-1)\nu/N_l} \left[\sum_{k=0}^{n-2} M_n(k+1)\, e^{2\pi i \nu k/N_l}\right]^{-1}. \tag{C.27}$$

(Remember that n and $n_\alpha$ are different indices!) The sum in eqn. (C.26) is not actually infinite because $M_n(x)$ has compact support. The number of nonzero terms, and hence the accuracy of the interpolation, increases with n. Using eqn. (C.26), the structure factor can then be evaluated using a (discrete) fast Fourier transform of the form

$$S(\mathbf{n}) = b_n(n_x)\, b_n(n_y)\, b_n(n_z)\, \tilde{Q}(\mathbf{n}), \tag{C.28}$$

where

$$\tilde{Q}(\mathbf{n}) = \sum_{k_x=0}^{N_l-1} \sum_{k_y=0}^{N_l-1} \sum_{k_z=0}^{N_l-1} e^{2\pi i n_x k_x/N_l}\, e^{2\pi i n_y k_y/N_l}\, e^{2\pi i n_z k_z/N_l}\, Q(\mathbf{k}), \tag{C.29}$$

and

$$Q(\mathbf{k}) = \sum_{i=1}^{N} q_i \sum_{j_1, j_2, j_3} M_n(u_{x,i} - k_x - j_1 N_l)\, M_n(u_{y,i} - k_y - j_2 N_l)\, M_n(u_{z,i} - k_z - j_3 N_l). \tag{C.30}$$
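The interpolation scheme of eqns. (C.26) through (C.30) can be illustrated in one dimension. The following is a minimal sketch under several assumptions: it treats a single Cartesian direction, it uses a direct O(Nl^2) discrete Fourier sum in place of a fast Fourier transform, it takes the bracketed sum in the prefactor bn(ν) as an inverse (as in the formula of Essmann et al.), and all function names are illustrative. It compares the grid-based structure factor with the direct particle sum.

```python
import cmath
import math

def bspline(n, x):
    """Cardinal B-spline M_n(x) from the recursion of eqn. (C.23)."""
    if n == 2:
        return max(0.0, 1.0 - abs(x - 1.0))
    return (x * bspline(n - 1, x) + (n - x) * bspline(n - 1, x - 1.0)) / (n - 1)

def b_factor(n, nu, n_grid):
    """Prefactor b_n(nu) of eqn. (C.27), with the k-sum entering as an inverse."""
    denom = sum(bspline(n, k + 1) * cmath.exp(2j * math.pi * nu * k / n_grid)
                for k in range(n - 1))
    return cmath.exp(2j * math.pi * (n - 1) * nu / n_grid) / denom

def spread_charges(charges, u, n, n_grid):
    """One-dimensional analog of Q(k) in eqn. (C.30), with periodic wrapping."""
    grid = [0.0] * n_grid
    for q, u_i in zip(charges, u):
        k0 = math.floor(u_i)
        for m in range(n):  # only n grid points lie inside the support of M_n
            k = k0 - m
            grid[k % n_grid] += q * bspline(n, u_i - k)
    return grid

def structure_factor_spme(charges, s, nu, n, n_grid):
    """Grid-based estimate of S(nu) = sum_i q_i exp(2 pi i nu s_i)."""
    u = [n_grid * s_i for s_i in s]
    grid = spread_charges(charges, u, n, n_grid)
    q_tilde = sum(grid[k] * cmath.exp(2j * math.pi * nu * k / n_grid)
                  for k in range(n_grid))
    return b_factor(n, nu, n_grid) * q_tilde

def structure_factor_direct(charges, s, nu):
    """Direct particle sum for comparison."""
    return sum(q * cmath.exp(2j * math.pi * nu * s_i) for q, s_i in zip(charges, s))
```

With spline order n = 6 and 32 grid points, the two estimates agree closely for small |ν|; the agreement degrades as |ν| approaches Nl/2, which is why the grid size and spline order must be chosen together with gmax.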


Because the computational overhead of a fast Fourier transform of length Nl scales as O(Nl log Nl ), and Nl ∝ N , it follows that using the smooth particle-mesh Ewald technique to evaluate the structure factor reduces the cost of an Ewald sum to O(N log N ), which represents a significant computational saving in large systems.

Finally, we note that Ewald summation need not be restricted to periodic systems. In fact, Martyna, Tuckerman, and coworkers (1999, 2004a, 2002) showed that the Ewald summation formalism could be easily modified in a unified manner for use in systems with zero, one, or two periodic dimensions, corresponding to clusters, wires, and surfaces, respectively.

Force fields employing simple functions, such as harmonic bonds, harmonic bends, cosine-series torsions, and nonbonded interactions like those in eqn. (C.1), are crude approximations of the true physics of interactions, which include charge polarization, charge transfer, and chemical events involving bond-breaking and forming. More sophisticated force fields, not discussed in this appendix, attempt to include chemical events by including polarization through fluctuating charges and dipole moments in the system or via models that attach a light, negatively charged particle to each atom by a spring (the so-called Drude model). Specialized reactive force fields have also been developed to describe specific chemical events. A considerable literature exists on force fields, and an entire book could be written about this subject alone. It is not our intention to review the literature on force fields in a book on statistical mechanics. Nevertheless, we conclude this appendix with a brief discussion of a new class of force fields based on the machine learning models of Chapter 17.
The advantage of a machine-learning based interaction model is that much of the physics, such as polarization, manybody effects, and chemical reactivity, missing from the fixed-charge models discussed thus far can be captured within a single universal form, such as a neural network or Gaussian kernel model provided that these missing physical effects are present in the data on which the model is trained. As discussed in Chapter 17, a machine learning model needs a set of descriptors for the input information. For the development of a force field, such a descriptor could provide information about the local environment of each atom in the system, which involves other atoms with which a given atom interacts most strongly. As noted in Section 17.7.2, these descriptors must be invariant to translations and rotations of the system. Because of this requirement, the symmetry functions given in eqns. (17.7.7) through (17.7.13) are particularly discriminating of these local atomic environments and have been employed as descriptors to create force fields in terms of multi-layer feed-forward neural networks (Behler and Parrinello, 2007; Behler, 2011; Behler, 2015; Behler, 2021) (see Section 17.5). If these atomic descriptors are employed, then the construction of a machine learning force field must also be based on contributions from individual atoms. Moreover, since these descriptors are “nearsighted” via inclusion of the cutoff function fc (rij ) in the symmetry functions, the machine learning model will learn short-range interactions, including both bonded and short-range nonbonded contributions. Let us denote the corresponding sum of bonded and short-ranged potentials as Us (r1 , ..., rN ) = Ubonded(r1 , ..., rN )+Ushort (r1 , ..., rN ). The use of atom-centered descriptors suggests that there should be a corresponding decomposition of Us in terms of contributions from each atom, that is,

$$U_s = \sum_{i=1}^{N} u_{s,i}. \tag{C.31}$$

If the long-range term $U_{\rm long}$ is now added to $U_s$, the full potential becomes $U = U_s + U_{\rm long}$. The utility of eqn. (C.31) is that each term $u_{s,i}$ is represented as a machine learning model. In the work of Behler et al., a feed-forward network is employed, a schematic of which is illustrated in Fig. C.3. The fact that $u_{s,i}$ is short ranged is important for employing machine learning in that it restricts the dimensionality of the function that must be learned, since the number of neighbors of atom i accounted for in the symmetry functions is relatively small (typically fewer than 100) compared to those included in the long-range contributions, and this renders tractable the use of a model such as a neural network. Nevertheless, the number of neighbors is sufficient to include manybody effects. The analytical function corresponding to the network in Fig. C.3 is given in eqn. (17.5.9); it is needed in order to compute the derivatives that determine the forces. The training of a machine learning potential requires input of coordinates and corresponding energies. If desired, gradient training using energies and forces, together with coordinates, is also possible and can reduce the amount of training data needed while improving the accuracy of the model if it is to be used in a molecular dynamics calculation.

Fig. C.3 A feed-forward neural network with three hidden layers that is used to generate an atomic contribution $u_{s,i}$ to the short-range potential energy in eqn. (C.31). The quantities $z_\sigma^{(j)}$ are defined in eqn. (17.5.10). Each set of parameters $\{w^{(K)}\}$ determines the weights of the connections between hidden layers. The input layer is a set of symmetry functions of different types for each atom i (see eqns. (17.7.9) through (17.7.13)), which are generated from the set of coordinates $\{\mathbf{R}\}_i$ of atom i and all of its neighbors within the cutoff prescribed by the symmetry functions.

Ultimately, the physics contained within the machine learning force field will match that contained in the training data. For example, if the training data is
produced by ab initio quantum chemical calculations at the level of density functional theory, then the accuracy of the machine learning model will reflect the level of the density functional theory calculations employed. If, in addition, the training data includes bond-breaking and forming events, then the machine learning model will be able to account for these as well. An important question concerns the number of configurations or, equivalently, atomic environments needed to produce a potential free of “holes”, which occur when the system visits configurations whose energies and forces cannot be predicted by the model because of insufficient training data at these configurations. In order to have a feeling for this, consider the neural network potential employed by Cendagorta et al. (2021) in their study of structure-II hydrogen clathrate (see Fig. 12.17). Although this potential was only needed to describe one solid phase, in order that enough environments were included, the training data consisted of six types of structures. “Clathrate structures” included a structure-II clathrate unit cell with a range of H2 occupancies of the small and large cages; “H2 ” structures consisted of simulation boxes of pure H2 molecules; “H2 O + H2 ” structures consisted of simulation boxes with varying concentrations of hydrogen and water molecules; “Large cage” structures consisted of an isolated large cage from the structure-II clathrate; “Small cage” structures consisted of an isolated small cage from the structure-II clathrate; “H2 O” structures consisted of simulation boxes of pure water. Configurations within each of these six structure classes were generated either randomly or by performing high-temperature (> 300 K) molecular dynamics calculations so as to generate a sufficient number of different environments to obtain a potential free of holes. 
The numbers of structures in each of the six classes were, respectively, 2,432, 533, 716, 539, 319, and 1,440 for a total of 5,979 different environments. The root-mean-square error (RMSE) on the test set was 0.93 meV/atom, and at the conditions of the simulations performed by Cendagorta et al. (2021), no holes were observed. The process of generating this potential exemplifies the large variety of environments needed to produce a reliable model free of holes.
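The atomic decomposition in eqn. (C.31) can be sketched in a few lines. The example below is only an illustration under stated assumptions: the descriptor is a simple Gaussian radial function with a cosine cutoff, used as a stand-in for the symmetry functions of Chapter 17 (whose exact forms are not reproduced here); the network has a single hidden layer rather than the three shown in Fig. C.3; and the weights produced by the hypothetical helper `make_params` are random placeholders rather than trained values. The point is that an energy built from distance-based, atom-centered descriptors is automatically invariant to translations, rotations, and relabeling of identical atoms.

```python
import math
import random

def cutoff(r, r_c):
    """Smooth cosine cutoff: 1 at r = 0, 0 for r >= r_c (assumed form)."""
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0) if r < r_c else 0.0

def radial_descriptors(i, coords, r_c, etas):
    """Gaussian radial descriptors of atom i; they depend only on distances r_ij."""
    g = [0.0] * len(etas)
    for j, r_j in enumerate(coords):
        if j == i:
            continue
        r_ij = math.dist(coords[i], r_j)
        f_c = cutoff(r_ij, r_c)
        for a, eta in enumerate(etas):
            g[a] += math.exp(-eta * r_ij * r_ij) * f_c
    return g

def make_params(n_in, n_hidden, seed=0):
    """Random placeholder weights; a real potential would obtain these by training."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = rng.uniform(-1, 1)
    return w1, b1, w2, b2

def atomic_energy(g, params):
    """Single-hidden-layer network mapping descriptors g to an atomic energy u_{s,i}."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * x for w, x in zip(row, g)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

def short_range_energy(coords, params, r_c=6.0, etas=(0.1, 0.5, 1.0)):
    """U_s as a sum of atomic contributions, as in eqn. (C.31)."""
    return sum(atomic_energy(radial_descriptors(i, coords, r_c, etas), params)
               for i in range(len(coords)))
```

Because the same network is applied to every atom, the cost of evaluating U_s grows linearly with the number of atoms once the neighbor lists within the cutoff are known.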

Appendix D Proof of the Trotter theorem

The Trotter theorem figures prominently throughout the book, both in the development of numerical solvers for ordinary differential equations and in the derivation of the Feynman path integral. However, in eqn. (3.10.18), we presented the theorem without proof. Therefore, in this appendix, we outline the proof of the theorem following a technique presented by Schulman (1981).

Let $\hat{P}$ and $\hat{Q}$ be linear operators on a complete normed vector space $V$, also known as a Banach space, and let $\psi \in V$. The Trotter theorem is equivalent to the statement that there exists a linear operator $\hat{R}$ on $V$ such that the difference

$$\left[\hat{R}^t - \lim_{n\to\infty} \left(\hat{P}^{t/n} \hat{Q}^{t/n}\right)^n\right] \psi = 0, \tag{D.1}$$

where $0 \le t < \infty$. Before proceeding, it is useful to introduce the following definition: A contraction semigroup on $V$ is a family of bounded linear operators $\hat{P}^t$, $0 \le t < \infty$, which are defined everywhere on $V$ and constitute a mapping $V \to V$ such that the following statements are true:

$$\hat{P}^0 = 1, \qquad \hat{P}^t \hat{P}^s = \hat{P}^{t+s}, \quad 0 \le t, s < \infty, \tag{D.2}$$

$$\lim_{t\to 0} \hat{P}^t \psi = \psi, \qquad \|\hat{P}^t\| \le 1. \tag{D.3}$$

Here, the norm $\|\hat{P}^t\|$ is defined to be

$$\|\hat{P}^t\| = \inf_{\beta \in B} \left\{ \beta \;\middle|\; \|\hat{P}^t \phi\| \le \beta \|\phi\| \ \ \forall \phi \in V,\ \|\phi\| \le 1 \right\}. \tag{D.4}$$

Let $\hat{A}$, $\hat{B}$, and $\hat{A} + \hat{B}$ be infinitesimal generators of the contraction semigroups $\hat{P}^t$, $\hat{Q}^t$, and $\hat{R}^t$, respectively. This means, for example, that the action of $\hat{A}$ on a vector $\psi$ is

$$\hat{A}\psi = \lim_{t\to 0} \frac{1}{t}\left(\hat{P}^t \psi - \psi\right). \tag{D.5}$$

Next, let h be a positive real number. It is straightforward to verify the following identity for the contraction semigroups:

$$\left(\hat{P}^h \hat{Q}^h - 1\right)\psi = \left(\hat{P}^h - 1\right)\psi + \hat{P}^h\left(\hat{Q}^h - 1\right)\psi. \tag{D.6}$$

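Using the infinitesimal generators allows us to write

$$\left(\hat{P}^h \hat{Q}^h - 1\right)\psi = h\left(\hat{A} + \hat{B}\right)\psi + O(h), \tag{D.7}$$

where $O(h)$ denotes any vector $\phi$ such that

$$\lim_{h\to 0} \frac{\|\phi\|}{h} = 0. \tag{D.8}$$

In other words, $\|\phi\|$ goes to 0 faster than h does. Equation (D.7) follows because, by eqn. (D.5), the two terms on the right side of eqn. (D.6) are $h\hat{A}\psi + O(h)$ and $h\hat{B}\psi + O(h)$, respectively. Note that we can also write

$$\left(\hat{R}^h - 1\right)\psi = h\left(\hat{A} + \hat{B}\right)\psi + O(h). \tag{D.9}$$

Consequently,

$$\left(\hat{P}^h \hat{Q}^h - \hat{R}^h\right)\psi = O(h). \tag{D.10}$$

Now let h = t/n. We need to show that

$$\left[\left(\hat{P}^h \hat{Q}^h\right)^n - \hat{R}^{hn}\right]\psi \to 0 \tag{D.11}$$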

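as n → ∞. To see how this limit can be demonstrated, consider first the case n = 2. It is straightforward to show that

$$\left(\hat{P}^h \hat{Q}^h\right)^2 - \hat{R}^{2h} = \left(\hat{P}^h \hat{Q}^h - \hat{R}^h\right)\hat{R}^h + \hat{P}^h \hat{Q}^h \left(\hat{P}^h \hat{Q}^h - \hat{R}^h\right). \tag{D.12}$$

Likewise, for n = 3, a little algebra reveals that

$$\left(\hat{P}^h \hat{Q}^h\right)^3 - \hat{R}^{3h} = \left(\hat{P}^h \hat{Q}^h - \hat{R}^h\right)\hat{R}^{2h} + \hat{P}^h \hat{Q}^h \left(\hat{P}^h \hat{Q}^h - \hat{R}^h\right)\hat{R}^h + \left(\hat{P}^h \hat{Q}^h\right)^2 \left(\hat{P}^h \hat{Q}^h - \hat{R}^h\right). \tag{D.13}$$

Generally, therefore,

$$\left(\hat{P}^h \hat{Q}^h\right)^n - \hat{R}^{nh} = \left(\hat{P}^h \hat{Q}^h - \hat{R}^h\right)\hat{R}^{(n-1)h} + \hat{P}^h \hat{Q}^h \left(\hat{P}^h \hat{Q}^h - \hat{R}^h\right)\hat{R}^{(n-2)h} + \cdots + \left(\hat{P}^h \hat{Q}^h\right)^{n-1} \left(\hat{P}^h \hat{Q}^h - \hat{R}^h\right). \tag{D.14}$$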

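We now let the operators on the left and right sides of eqn. (D.14) act on ψ and take the norm. This yields

$$\left\|\left[\left(\hat{P}^h \hat{Q}^h\right)^n - \hat{R}^{nh}\right]\psi\right\| = \left\|\left(\hat{P}^h \hat{Q}^h - \hat{R}^h\right)\hat{R}^{(n-1)h}\psi + \hat{P}^h \hat{Q}^h \left(\hat{P}^h \hat{Q}^h - \hat{R}^h\right)\hat{R}^{(n-2)h}\psi + \cdots + \left(\hat{P}^h \hat{Q}^h\right)^{n-1} \left(\hat{P}^h \hat{Q}^h - \hat{R}^h\right)\psi\right\|. \tag{D.15}$$

Recall that for ordinary vectors a, b, and c, such that a = b + c, the triangle inequality |a| ≤ |b| + |c| holds. Similarly, we have

$$\left\|\left[\left(\hat{P}^h \hat{Q}^h\right)^n - \hat{R}^{nh}\right]\psi\right\| \le \left\|\left(\hat{P}^h \hat{Q}^h - \hat{R}^h\right)\hat{R}^{(n-1)h}\psi\right\| + \left\|\hat{P}^h \hat{Q}^h \left(\hat{P}^h \hat{Q}^h - \hat{R}^h\right)\hat{R}^{(n-2)h}\psi\right\| + \cdots. \tag{D.16}$$

On the right, there are n terms, all of order O(h): each contains the difference in eqn. (D.10), and the operators $\left(\hat{P}^h \hat{Q}^h\right)^k$ multiplying it from the left have norm no greater than 1 by eqn. (D.3). Thus, the right side varies as nO(h) = nO(t/n). As n → ∞, nO(t/n) → 0. Hence,

$$\lim_{n\to\infty} \left[\left(\hat{P}^h \hat{Q}^h\right)^n - \hat{R}^{nh}\right]\psi = 0, \tag{D.17}$$

which implies eqn. (D.1).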
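The content of the theorem can be illustrated numerically in a finite-dimensional setting. The following is a minimal sketch, assuming that the operators are 2×2 real matrices with P^t = exp(At), Q^t = exp(Bt), and R^t = exp((A+B)t); the matrices A and B, the Taylor-series matrix exponential, and all function names are illustrative choices. Both A and B are symmetric and negative definite, so that the exponentials are contractions, and they do not commute.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_scale(a, c):
    return [[c * x for x in row] for row in a]

def mat_exp(a, terms=40):
    """Taylor-series matrix exponential; adequate for the small norms used here."""
    result, term = identity(len(a)), identity(len(a))
    for k in range(1, terms):
        term = mat_scale(mat_mul(term, a), 1.0 / k)
        result = mat_add(result, term)
    return result

def mat_pow(a, n):
    """a**n by repeated squaring."""
    result, base = identity(len(a)), a
    while n > 0:
        if n % 2:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n //= 2
    return result

def trotter_error(A, B, t, n):
    """Largest entry of (exp(A t/n) exp(B t/n))^n - exp((A+B) t) in absolute value."""
    h = t / n
    step = mat_mul(mat_exp(mat_scale(A, h)), mat_exp(mat_scale(B, h)))
    approx = mat_pow(step, n)
    exact = mat_exp(mat_scale(mat_add(A, B), t))
    return max(abs(approx[i][j] - exact[i][j])
               for i in range(len(A)) for j in range(len(A)))

A = [[-1.0, 0.5], [0.5, -1.0]]
B = [[-2.0, 0.0], [0.0, -0.5]]
errors = {n: trotter_error(A, B, 1.0, n) for n in (10, 100, 1000)}
```

For these matrices the error falls roughly in proportion to 1/n, which is consistent with the product of n terms of order smaller than h = t/n vanishing as n → ∞.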

Appendix E Laplace transforms

The Laplace transform of a function f(t) is just one of a general class of integral transforms of the form

$$\tilde{f}(s) = \int_0^\infty dt\, K(st) f(t), \tag{E.1}$$

where K(x) is a kernel that is typically a smooth and rapidly decaying function of x. The Laplace transform corresponds to the choice $K(x) = e^{-x}$. Interestingly, even if the integral of f(t) over the interval t ∈ [0, ∞) does not exist, the Laplace transform $\tilde{f}(s)$ nevertheless can. If there exists a positive constant $s_0$ such that $|e^{-s_0 t} f(t)| \le M$ for M finite, then $\tilde{f}(s)$ exists for $s > s_0$. By contrast, a function such as $f(t) = \exp(t^2)$ does not have a Laplace transform. Laplace transforms of elementary functions are generally straightforward to evaluate. Some examples are given below:

$$\begin{aligned}
f(t) &= t^n, & \tilde{f}(s) &= \frac{n!}{s^{n+1}}, \\
f(t) &= e^{-at}, & \tilde{f}(s) &= \frac{1}{s+a}, \\
f(t) &= \cos(\omega t), & \tilde{f}(s) &= \frac{s}{s^2 + \omega^2}, \\
f(t) &= \sin(\omega t), & \tilde{f}(s) &= \frac{\omega}{s^2 + \omega^2}, \\
f(t) &= \cosh(\alpha t), & \tilde{f}(s) &= \frac{s}{s^2 - \alpha^2}, \\
f(t) &= \sinh(\alpha t), & \tilde{f}(s) &= \frac{\alpha}{s^2 - \alpha^2}.
\end{aligned} \tag{E.2}$$

It is left as an exercise to the reader to verify each of these relations. In addition to these elementary transforms, another useful result is the Laplace transform of the convolution f(t) between two functions g(t) and h(t). Recall that this convolution is defined as

$$f(t) = \int_0^t d\tau\, g(\tau) h(t - \tau). \tag{E.3}$$

The Laplace transform of f(t) is $\tilde{f}(s) = \tilde{g}(s)\tilde{h}(s)$, which is known as the convolution theorem. The proof of the convolution theorem proceeds by writing

$$\tilde{g}(s)\tilde{h}(s) = \lim_{a\to\infty} \int_0^a dx\, e^{-sx} g(x) \int_0^{a-x} dy\, e^{-sy} h(y). \tag{E.4}$$

The use of the upper limit a − x rather than simply a in the y integral is permissible because of the limit a → ∞ and the fact that the integrals are assumed to decay rapidly. The use of a triangular integration region rather than a square one simplifies the algebra. We then introduce the change of variables x = t − z, y = z into the integral, which leads to

$$\tilde{g}(s)\tilde{h}(s) = \lim_{a\to\infty} \int_0^a dt\, e^{-st} \int_0^t dz\, g(t - z) h(z). \tag{E.5}$$

When a → ∞, eqn. (E.5) is the Laplace transform of a convolution.

Laplace transforms are particularly useful for solving ordinary linear differential and linear integro-differential equations. In particular, an ordinary linear differential equation for a function f(t) can be converted into a simple algebraic equation for $\tilde{f}(s)$. This conversion is accomplished by expressing derivatives as simple algebraic expressions via Laplace transformation. Consider first the Laplace transform of the function g(t) = f′(t) = df/dt, which is given by

$$\tilde{g}(s) = \int_0^\infty dt\, e^{-st} \frac{df}{dt}. \tag{E.6}$$

An integration by parts shows that

$$\tilde{g}(s) = e^{-st} f(t)\Big|_0^\infty + s \int_0^\infty dt\, e^{-st} f(t) = s\tilde{f}(s) - f(0), \tag{E.7}$$

where it is assumed that $\lim_{a\to\infty} e^{-sa} f(a) = 0$. Similarly, if $h(t) = f''(t) = d^2 f/dt^2$, then by the same analysis

$$\tilde{h}(s) = s^2 \tilde{f}(s) - s f(0) - f'(0). \tag{E.8}$$

In general, if $F(t) = d^n f/dt^n$, then

$$\tilde{F}(s) = s^n \tilde{f}(s) - s^{n-1} f(0) - s^{n-2} f'(0) - \cdots - f^{(n-1)}(0). \tag{E.9}$$
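The forward transforms in eqn. (E.2), the convolution theorem, and the derivative rule in eqn. (E.7) can all be checked by direct numerical quadrature. The following is a minimal sketch, assuming that the integrand decays quickly enough for the integral to be truncated at a finite upper limit; the function name `laplace_transform`, the truncation time, and the Simpson rule are illustrative choices.

```python
import math

def laplace_transform(f, s, t_max=60.0, n_steps=60000):
    """Composite Simpson estimate of int_0^t_max dt exp(-s t) f(t); n_steps must be even.

    Truncating at t_max assumes exp(-s t) f(t) is negligible beyond that point.
    """
    h = t_max / n_steps
    total = f(0.0) + math.exp(-s * t_max) * f(t_max)
    for k in range(1, n_steps):
        t = k * h
        total += (4.0 if k % 2 else 2.0) * math.exp(-s * t) * f(t)
    return total * h / 3.0

s, omega, alpha = 2.0, 3.0, 0.5

# Entries of eqn. (E.2)
sin_tilde = laplace_transform(lambda t: math.sin(omega * t), s)
cos_tilde = laplace_transform(lambda t: math.cos(omega * t), s)
cosh_tilde = laplace_transform(lambda t: math.cosh(alpha * t), s)
cubic_tilde = laplace_transform(lambda t: t ** 3, s)

# Derivative rule, eqn. (E.7), for f(t) = cos(omega t), f'(t) = -omega sin(omega t)
deriv_tilde = laplace_transform(lambda t: -omega * math.sin(omega * t), s)

# Convolution theorem for g(t) = exp(-t), h(t) = t, whose convolution is t - 1 + exp(-t)
conv_tilde = laplace_transform(lambda t: t - 1.0 + math.exp(-t), s)
g_tilde = laplace_transform(lambda t: math.exp(-t), s)
h_tilde = laplace_transform(lambda t: t, s)
```

Each numerical value can then be compared with the corresponding closed form, for example `sin_tilde` with ω/(s² + ω²) and `conv_tilde` with the product `g_tilde * h_tilde`.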

In Section 15.2, we showed how these relations can be used to convert a linear second-order differential equation for f (t) into an algebraic equation for f˜(s), which can then be easily solved. However, once an expression for f˜(s) is found, obtaining f (t) requires performing an inverse Laplace transform, and the inversion of Laplace transforms is considerably less straightforward than the forward Laplace transform. Of course, the simplest way to invert a Laplace transform is to look up the given form for f˜(s) in a table of Laplace transforms and find the corresponding form for f (t). However, if f˜(s) cannot be found in the table, then an explicit inversion must be performed. Unfortunately, as we noted earlier, f (t) can diverge exponentially and its Laplace transform will still exist, a fact that makes Laplace inversion rather tricky.


From a numerical standpoint, a forward Laplace transform can be performed straightforwardly, while a numerical Laplace inversion is a highly ill-posed problem mathematically (Epstein and Schotland, 2008).

To see how the form of the inverse Laplace transform arises, let us assume that f(t) exhibits an exponential divergence $e^{\gamma t}$, and let us define a function g(t) by $f(t) = e^{\gamma t} g(t)$. Since the Laplace transform restricts t to the interval [0, ∞), we will assume g(t) = 0 for t < 0. g(t) is a well-behaved function, and therefore we can define it via its Fourier transform

$$g(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} d\omega\, e^{i\omega t}\, \hat{g}(\omega). \tag{E.10}$$

The Fourier transform $\hat{g}(\omega)$ can, itself, be expressed in terms of g(t) as

$$\hat{g}(\omega) = \frac{1}{\sqrt{2\pi}} \int_0^\infty dt\, g(t)\, e^{-i\omega t}. \tag{E.11}$$

Substituting eqn. (E.11) into eqn. (E.10) yields

$$g(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} d\omega\, e^{i\omega t} \int_0^\infty du\, g(u)\, e^{-i\omega u}. \tag{E.12}$$

Since $g(t) = e^{-\gamma t} f(t)$, eqn. (E.12) implies that

$$f(t) = \frac{e^{\gamma t}}{2\pi} \int_{-\infty}^{\infty} d\omega\, e^{i\omega t} \int_0^\infty du\, f(u)\, e^{-\gamma u} e^{-i\omega u}. \tag{E.13}$$

If we let s = γ + iω, then eqn. (E.13) becomes

$$f(t) = \frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} ds\, e^{st} \int_0^\infty du\, f(u)\, e^{-us} = \frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} ds\, e^{st} \tilde{f}(s). \tag{E.14}$$

Equation (E.14) defines the Laplace inversion problem as an integral over a complex variable s, which can be performed using techniques of complex integration and the calculus of residues. For t > 0, eqn. (E.14) can be rewritten as a contour integral using a contour of the type shown in Fig. E.1 whose leading edge is parallel to the imaginary s axis and is chosen far enough to the right to enclose all of the poles of $\tilde{f}(s)$. The contour is then closed in the left half of the complex plane. Denoting this contour as B for the Bromwich contour, we can rewrite eqn. (E.14) as

$$f(t) = \frac{1}{2\pi i} \oint_B ds\, e^{st} \tilde{f}(s). \tag{E.15}$$

Fig. E.1 Bromwich contour, which contains all of the poles of $\tilde{f}(s)$. Laplace inversion employs such a contour extended to infinity in all directions.

As an example, consider inverting the Laplace transform for $\tilde{f}(s) = \alpha/(s^2 - \alpha^2)$, which we know from eqn. (E.2) to be the Laplace transform of f(t) = sinh(αt). We set up the contour integration as

$$f(t) = \frac{1}{2\pi i} \oint_B ds\, \frac{\alpha e^{st}}{(s - \alpha)(s + \alpha)}. \tag{E.16}$$

The integrand in eqn. (E.16) has two first-order poles at s = α and s = −α. Thus, we need to choose the leading edge of the contour to lie to the right of the point s = α on the real s-axis. Once we do this and apply the residue theorem to each of the poles, we obtain

$$f(t) = \frac{\alpha}{2\pi i}\, 2\pi i \left[\frac{e^{\alpha t}}{2\alpha} - \frac{e^{-\alpha t}}{2\alpha}\right] = \frac{1}{2}\left[e^{\alpha t} - e^{-\alpha t}\right] = \sinh(\alpha t). \tag{E.17}$$

In Section 15.2, we showed how to solve ordinary linear differential equations using Laplace transforms. Now let us use the result obtained in eqn. (E.17) in another example that involves the application of the Laplace transform to the solution of an integral equation. Consider the integral equation

$$x(t) = t + a^2 \int_0^t d\tau\, (t - \tau)\, x(\tau), \tag{E.18}$$

which we must solve for x(t) in terms of the constant a. We can make use of the fact that the integral appearing in eqn. (E.18) is a convolution, and its Laplace transform will simply be $\tilde{x}(s)/s^2$, since the Laplace transform of f(t) = t is $\tilde{f}(s) = 1/s^2$. Thus, taking the Laplace transform of both sides of eqn. (E.18), we obtain the algebraic equation

$$\tilde{x}(s) = \frac{1}{s^2} + a^2\, \frac{\tilde{x}(s)}{s^2}. \tag{E.19}$$

Solving this for $\tilde{x}(s)$, we obtain

$$\tilde{x}(s) = \frac{1}{s^2 - a^2}, \tag{E.20}$$

and, as we have already shown, the inverse Laplace transform of $\tilde{x}(s)$, which gives us the solution x(t), is

$$x(t) = \frac{1}{a} \sinh(at). \tag{E.21}$$
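The solution in eqn. (E.21) can be verified by substituting it back into the integral equation (E.18) and evaluating the convolution integral numerically. The following is a minimal sketch; the function name `integral_equation_residual`, the Simpson rule, and the test values of t and a are illustrative choices.

```python
import math

def integral_equation_residual(t, a, n_steps=2000):
    """x(t) - t - a^2 * int_0^t dtau (t - tau) x(tau) for x(t) = sinh(a t)/a.

    The integral is estimated with the composite Simpson rule (n_steps even);
    the residual should vanish up to quadrature error if eqn. (E.21) solves eqn. (E.18).
    """
    def x(tau):
        return math.sinh(a * tau) / a

    h = t / n_steps
    total = (t - 0.0) * x(0.0) + (t - t) * x(t)
    for k in range(1, n_steps):
        tau = k * h
        total += (4.0 if k % 2 else 2.0) * (t - tau) * x(tau)
    integral = total * h / 3.0
    return x(t) - t - a * a * integral

residuals = [integral_equation_residual(t, 1.3) for t in (0.5, 1.0, 2.0)]
```

The same check can be made in the Laplace domain: for any s > a, the value 1/(s² − a²) satisfies the algebraic relation in eqn. (E.19).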

References Abrams, C. F. and Vanden-Eijnden, E. (2010). Large-scale conformational sampling of proteins using temperature-accelerated molecular dynamics. Proc. Natl. Acad. Sci. U.S.A., 107, 4961. Abrams, J. B., Rosso, L., and Tuckerman, M. E. (2006). Efficient and precise solvation free energies via alchemical adiabatic molecular dynamics. J. Chem. Phys., 125, 074115. Abrams, J. B. and Tuckerman, M. E. (2008). Efficient and direct generation of multidimensional free energy surfaces via adiabatic dynamics without coordinate transformations. J. Phys. Chem. B , 112, 15742. Abreu, C. R. A. and Tuckerman, M. E. (2020). Molecular dynamics with very large time steps for the calculation of solvation free energies. J. Chem. Theory Comput., 16, 7314. Agarwal, A. and Delle Site, L. (2016). Grand-canonical adaptive resolution centroid molecular dynamics: Implementation and application. Comp. Phys. Comm., 206, 26–34. Alavi, S., Ripmeester, J. A., and Klug, D. D. (2005). Molecular dynamics study of structure II hydrogen clathrates. J. Chem. Phys., 123, 024507. Alder, B. J. and Wainwright, T. E. (1957). Phase transition for a hard sphere system. J. Chem. Phys., 27, 1208. Alder, B. J. and Wainwright, T. E. (1959). Studies in molecular dynamics. 1. General method. J. Chem. Phys., 31, 459. Alder, B. J. and Wainwright, T. E. (1967). Velocity autocorrelations for hard spheres. Phys. Rev. Lett., 18, 988. Allen, M. P. and Tildesley, D. J. (1989). Computer Simulations of Liquids. Claredon Press, Oxford. AlQuraishi, M. (2019). AlphaFold at CASP13. Bioinform., 35, 4862. Andersen, H. C. (1980). Molecular dynamics at constant pressure and/or temperature. J. Chem. Phys., 72, 2384. Andersen, H. C. (1983). RATTLE—A velocity version of the SHAKE algorithm for molecular dynamics calculations. J. Comput. Phys., 52, 24. Anderson, M. H., Ensher, J. R., Matthews, M. R., Wieman, C. E., and Cornell, E. A. (1995). Observation of Bose-Einstein condensation in a dilute vapor. Science, 269, 198. 
Ansuini, A., Laio, A., Macke, J. H., and Zoccolan, D. (2019). Intrinsic dimension of data representations in deep neural networks. In 32nd Conference on Neural Information Processing Systems (NeurIPS 2019) (ed. H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett), Volume 32. Curran Associates, Inc.


Ashurst, W. T. and Hoover, W. G. (1975). Dense-fluid shear viscosity via nonequilibrium molecular dynamics. Phys. Rev. A, 11, 658. Awasthi, S. and Nair, N. N. (2017). Exploring high dimensional free energy landscapes: Temperature accelerated sliced sampling. J. Chem. Phys., 146, 094108. Bader, J. S. and Berne, B. J. (1994). Quantum and classical relaxation rates from classical simulations. J. Chem. Phys., 100, 8359. Baer, R. and Head-Gordon, M. (1998). Electronic structure of large systems: Coping with small gaps using the energy renormalization group method. J. Chem. Phys., 109, 10159. Barducci, A., Bussi, G., and Parrinello, M. (2008). Well-tempered metadynamics: A smoothly converging and tunable free-energy method. Phys. Rev. Lett., 100, 020603. Beck, T. L. (2011). Hydration free energies by energetic partitioning of the potential distribution theorem. J. Stat. Phys., 145, 335. Beck, T. L., Paulaitis, M. E., and Pratt, L. R. (2006). The Potential Distribution Theorem and Models of Molecular Solutions. Cambridge University Press, Cambridge. Behler, J. (2011). Atom-centered symmetry functions for constructing highdimensional neural network potentials. J. Chem. Phys., 134, 074106. Behler, J. (2015). Constructing high-dimensional neural network potentials: A tutorial review. Intl. J. Quant. Chem., 115, 1032. Behler, J. (2021). Four generations of high-dimensional neural network potentials. Chem. Rev., 121, 10037. Behler, J. and Parrinello, M. (2007). Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett., 98, 146401. Berendsen, H. J. C., Postma, J. P. M., van Gunsteren, W. F., DiNola, A., and Haak, J. R. (1984). Molecular dynamics with coupling to an external bath. J. Chem. Phys., 81, 3684. Berezhkovskii, A. M., Pollak, E., and Zitserman, V. Y. (1992). Activated rateprocesses - generalization of the Kramers-Grote-Hynes and Langer theories. J. Chem. Phys., 97, 2422. 
Berges, J., Borsanyi, S., Sexty, D., and Stamatescu, I. O. (2007). Lattice simulations of real-time quantum fields. Phys. Rev. D , 75, 045007. Berne, B. J. (1971). Time correlation functions in condensed media. In Physical Chemistry, Vol. VIIIB (ed. H. Eyring, D. Henderson, and W. Jost), p. 539. Academic Press, New York. Berne, B. J., Boon, J. P., and Rice, S. A. (1966). On calculation of autocorrelation functions of dynamical variables. J. Chem. Phys., 45, 1086. Berne, B. J. and Harp, G. D. (1970a). On the calculation of time correlation functions. Adv. Chem. Phys., 17, 63. Berne, B. J. and Harp, G. D. (1970b). Time correlation functions, memory functions and molecular dynamics. Phys. Rev. A, 2, 975. Berne, B. J., Pechukas, P., and Harp, G. D. (1968). Molecular reorientation in liquids and gases. J. Chem. Phys., 49, 3125. Berne, B. J. and Pecora, R. (1976). Dynamic Light Scattering. Wiley, New York.


Berne, B. J., Tuckerman, M. E., Straub, J. E., and Bug, A. L. R. (1990). Dynamic friction on rigid and flexible bonds. J. Chem. Phys., 93, 5084. Bertie, J. E., Ahmed, M. K., and Eysel, H. H. (1989). Infrared intensities of liquids. 5. Optical and dielectric-constants, integrated-intensities, and dipole-moment derivatives of H2 O and D2 O at 22◦ C. J. Phys. Chem., 93, 2210. Beutler, T. C., Mark, A. E., van Schaik, R. C., Gerber, P. R., and van Gunsteren, W. F. (1994). Avoiding singularities and numerical instabilities in free energy calculations based on molecular simulations. Chem. Phys. Lett., 222, 529. Bezdek, J. C., Ehrlich, R., and Full, W. (1984). FCM: The fuzzy c-means clustering algorithm. Computers and Geosci., 10, 191. Bianucci, M., Mannella, M., and Grigolini, P. (1996). Linear response of Hamiltonian chaotic systems as a function of the number of degrees of freedom. Phys. Rev. Lett., 77, 1258. Binnig, G., Quate, C. F., and Gerber, C. (1986). Atomic force microscope. Phys. Rev. Lett., 56, 930. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer, Singapore. Bolhuis, P. G., Chandler, D., Dellago, C., and Geissler, P. L. (2002). Transition path sampling: Throwing ropes over rough mountain passes, in the dark. Ann. Rev. Phys. Chem., 53, 291. Bonanno, A. and Zappal` a, D. (2001). Towards an accurate determination of critical exponents with the renormalization group flow equations. Phys. Lett. B , 504, 181. Bond, S. D., Leimkuhler, B. J., and Laird, B. B. (1999). The Nos´e-Poincar´e method for constant temperature molecular dynamics. J. Comp. Phys., 151, 114. Bonella, S., Monteferrante, M., Pierleoni, C., and Ciccotti, G. (2010). Path integral based calculations of symmetrized time correlation functions. I. J. Chem. Phys., 133, 164104. Bonomi, M., Branduardi, D., Bussi, G., Camilloni, C., Provasi, D., Raiteri, P., Donadio, D., Marinelli, F., Pietrucci, F., Broglia, R. A., and Parrinello, M. (2009). 
PLUMED: A portable plugin for free-energy calculations with molecular dynamics. Comp. Phys. Comm., 180, 1961. Booth, G. H., Thom, A. J. W., and Alavi, A. (2009). Fermion Monte Carlo without fixed nodes: A game of life, death, and annihilation in Slater determinant space. J. Chem. Phys., 131, 054106. Braams, B. J. and Manolopoulos, D. E. (2006). On the short-time limit of ring polymer molecular dynamics. J. Chem. Phys., 125, 124105. Branduardi, D., Gervasio, F. L., and Parrinello, M. (2007). From a to b in free energy space. J. Chem. Phys., 126, 054103. Brown, F. R., Butler, F. P., Chen, H., Christ, N. H., Dong, Z. H., Schaffer, W., Unger, L. I., and Vaccarino, A. (1991). Hadron masses in QCD with 2 flavors of dynamic fermions at beta = 5.7. Phys. Rev. Lett., 67, 1062. Brown, F. R. and Christ, N. H. (1988). Parallel supercomputers for lattice gauge theory. Science, 239, 1393. Budimir, J. and Skinner, J. L. (1987). On the relationship between T1 and T2 for stochastic relaxation models. J. Stat. Phys., 49, 1029.


Bulgac, A. and Kusnezov, D. (1990). Canonical ensemble averages from pseudomicrocanonical dynamics. Phys. Rev. A, 42, 5045. Bussi, G., Gervasio, F., Laio, A., and Parrinello, M. (2006). Free-energy landscape for beta hairpin folding from combined parallel tempering and metadynamics. J. Am. Chem. Soc., 128, 13435. Caldeira, A. O. and Leggett, A. J. (1983). Quantum tunnelling in a dissipative system. Ann. Phys., 149, 374. Calvo, F. (2002). Sampling along reaction coordinates with the Wang-Landau method. Mol. Phys., 100, 3421. Cao, J. and Berne, B. J. (1989). On energy estimators in path-integral Monte Carlo simulations—dependence of accuracy on algorithm. J. Chem. Phys., 91, 6359. Cao, J. and Berne, B. J. (1993). A Born-Oppenheimer approximation for pathintegrals with an application to electron solvation in polarizable fluids. J. Chem. Phys., 99, 2902. Cao, J. and Martyna, G. J. (1996). Adiabatic path integral molecular dynamics methods. II. Algorithms. J. Chem. Phys., 104, 2028. Cao, J. and Voth, G. A. (1993). A new perspective on quantum time correlation functions. J. Chem. Phys., 99, 10070. Cao, J. and Voth, G. A. (1994a). The formulation of quantum statistical mechanics based on the Feynman path centroid density. II. Dynamical properties. J. Chem. Phys., 100, 5106–5117. Cao, J. and Voth, G. A. (1994b). The formulation of quantum statistical mechanics based on the Feynman path centroid density. IV. Algorithms for centroid molecular dynamics. J. Chem. Phys., 101, 6168. Cao, J. and Voth, G. A. (1996). Semiclassical approximations to quantum dynamical time correlation functions. J. Chem. Phys., 104, 273. Car, R. and Parrinello, M. (1985). Unified approach for molecular dynamics and density-functional theory. Phys. Rev. Lett., 55, 2471. Cardy, J. (1996). Scaling and Renormalization in Statistical Physics. Cambridge University Press, Cambridge. Carter, E. A., Ciccotti, G., Hynes, J. T., and Kapral, R. (1989). 
Constrained reaction coordinate dynamics for the simulation of rare events. Chem. Phys. Lett., 156, 472. Causo, M. S., Ciccotti, G., Bonella, S., and Builleumier, R. (2006). An adiabatic linearized path-integral approach for quantum time-correlation functions II: A cumulant expansion method for improving convergence. J. Chem. Phys., 110, 16026. Cendagorta, J. R., Baˇci´c, Z., and Tuckerman, M. E. (2018). An open-chain imaginarytime path-integral sampling approach to the calculation of approximate symmetrized quantum time correlation functions. J. Chem. Phys., 148, 102340. Cendagorta, J. R., Powers, A., Hele, T. J. H., Marsalek, O., Baˇci´c, Z., and Tuckerman, M. E. (2016). Competing quantum effects in the free energy profiles and diffusion rates of hydrogen and deuterium molecules through clathrate hydrates. Phys. Chem. Chem. Phys., 18, 32169. Cendagorta, J. R., Shen, H., Baˇci´c, Z., and Tuckerman, M. E. (2021). Enhanced sampling path integral methods using neural network potential energy surfaces with application to diffusion in hydrogen hydrates. Adv. Theor. Simulat., 4, 2000258.

Cendagorta, J. R., Tolpin, J., Schneider, E., Topper, R. Q., and Tuckerman, M. E. (2020). Comparison of the performance of machine learning models in representing high-dimensional free energy surfaces and generating observables. J. Phys. Chem. B, 124, 3647. Ceperley, D. M. (1995). Path integrals in the theory of condensed helium. Rev. Mod. Phys., 67, 279. Ceperley, D. M. and Pollock, E. L. (1984). Simulation of quantum many-body systems by path-integral methods. Physical Review B, 30, 2555. Ceriotti, M., Bussi, G., and Parrinello, M. (2009). Nuclear quantum effects in solids using a colored-noise thermostat. Phys. Rev. Lett., 103, 030603. Ceriotti, M., Bussi, G., and Parrinello, M. (2010a). Colored-noise thermostats à la carte. J. Chem. Theory Comput., 6, 1170. Ceriotti, M., Manolopoulos, D. E., and Parrinello, M. (2011). Accelerating the convergence of path integral dynamics with a generalized Langevin equation. J. Chem. Phys., 134, 084104. Ceriotti, M., Parrinello, M., Markland, T. E., and Manolopoulos, D. E. (2010b). Efficient stochastic thermostatting of path integral molecular dynamics. J. Chem. Phys., 133, 124104. Chandler, D. (1987). Introduction to Modern Statistical Mechanics. Oxford University Press, New York. Chandler, D. and Wolynes, P. G. (1981). Exploiting the isomorphism between quantum-theory and classical statistical-mechanics of polyatomic fluids. J. Chem. Phys., 74, 4078. Chandrasekhar, S. (1943). Stochastic problems in physics and astronomy. Rev. Mod. Phys., 15, 1. Chen, B., Ivanov, I., Klein, M. L., and Parrinello, M. (2003). Hydrogen bonding in water. Phys. Rev. Lett., 91, 215503. Chen, M., Cuendet, M. A., and Tuckerman, M. E. (2012). Heating and flooding: A unified approach for rapid generation of free energy surfaces. J. Chem. Phys., 137, 024102. Chen, M., Yu, T.-Q., and Tuckerman, M. E. (2015). Locating landmarks on high-dimensional free energy surfaces. Proc. Natl. Acad. Sci. U.S.A., 112, 3235. Chen, P.-Y. and Tuckerman, M. E. (2018). 
Molecular dynamics based enhanced sampling of collective variables with very large time steps. J. Chem. Phys., 148, 024106. Chin, S. A. (1997). Symplectic integrators from composite operator factorizations. Phys. Lett. A, 226, 344. Chipot, C. and Pohorille, A., eds. (2007). Free Energy Calculations. Springer–Verlag, Berlin. Chow, T. W. S. and Cho, S. Y. (2007). Neural Networks and Computing. Imperial College Press, London. Ciccotti, G. and Delle Site, L. (2019). The physics of open systems for the simulation of complex molecular environments in soft matter. Soft Matt., 15, 2114. Ciccotti, G., Jacucci, G., and McDonald, I. R. (1976). Transport properties of molten alkali-halides. Phys. Rev. A, 13, 426.

Ciccotti, G., Jacucci, G., and McDonald, I. R. (1979). Thought experiments by molecular dynamics. J. Stat. Phys., 21, 1. Ciccotti, G., Pierleoni, C., and Ryckaert, J. P. (1992). Theoretical foundation and rheological application of nonequilibrium molecular dynamics. In Microscopic Simulations of Complex Hydrodynamic Phenomena (ed. M. Marechal and B. L. Holian), p. 25. Plenum Press, New York. Cichos, F., Gustavsson, K., Mehlig, B., and Volpe, G. (2020). Machine learning for active matter. Nature Mach. Intell., 2, 94–103. Coker, D. F., Berne, B. J., and Thirumalai, D. (1987). Path integral Monte-Carlo studies of the behavior of excess electrons in simple fluids. J. Chem. Phys., 86, 5689. Corsini, R., Lazzerini, B., and Marcelloni, F. (2005). Relational clustering algorithm based on the fuzzy C-means algorithm. Intl. J. Soft Comput., 2005, 439. Cortes-Huerto, R., Praprotnik, M., Kremer, K., and Delle Site, L. (2021). From adaptive resolution to molecular dynamics of open systems. Euro. Phys. J. B , 94, 189. Craig, I. R. and Manolopoulos, D. E. (2004). Quantum statistics and classical mechanics: Real time correlation functions from ring polymer molecular dynamics. J. Chem. Phys., 121, 3368. Craig, I. R. and Manolopoulos, D. E. (2005). A refined ring polymer molecular dynamics theory of chemical reaction rates. J. Chem. Phys., 123, 034102. Crooks, G. E. (1998). Nonequilibrium measurements of free energy differences for microscopically reversible Markovian systems. J. Stat. Phys., 90, 1481. Crooks, G. E. (1999). Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences. Phys. Rev. E , 60, 2721. Cuendet, M. A. (2006). Statistical mechanical derivation of Jarzynski’s identity for thermostatted non-Hamiltonian dynamics. Phys. Rev. Lett., 96, 120602. Cuendet, M. A., Margul, D. T., Schneider, E., Vogt-Maranto, L., and Tuckerman, M. E. (2018). 
Endpoint-restricted adiabatic free energy dynamics approach for the exploration of biomolecular conformational equilibria. J. Chem. Phys., 149, 072316. Cuendet, M. A. and Tuckerman, M. E. (2014). Free energy reconstruction from metadynamics or adiabatic free energy dynamics simulations. J. Chem. Theory Comput., 10, 2975. Cynn, H., Yoo, C. S., Baer, B., Iota-Herbei, V., McMahan, A. K., Nicol, M., and Carlson, S. (2001). Martensitic FCC-to-HCP transformation observed in xenon at high pressure. Phys. Rev. Lett., 86, 4552. Dama, J. F., Parrinello, M., and Voth, G. A. (2014). Well-tempered metadynamics converges asymptotically. Phys. Rev. Lett., 112. Darve, E. (2007). Thermodynamic integration using constrained and unconstrained dynamics. In Free Energy Calculations (ed. C. Chipot and A. Pohorille), pp. 119–170. Springer-Verlag, Berlin. Darve, E. and Pohorille, A. (2001). Calculating free energies using average force. J. Chem. Phys., 115, 9169. Darve, E., Rodriguez-Gomez, D., and Pohorille, A. (2008). Adaptive biasing force method for scalar and vector free energy calculations. J. Chem. Phys., 128, 144120.

Darve, E., Solomon, J., and Kia, A. (2009). Computing generalized Langevin equations and generalized Fokker–Planck equations. Proc. Natl. Acad. Sci. U.S.A., 106, 10884. Davisson, C. and Germer, L. H. (1927). Reflection of electrons by a crystal of nickel. Nature, 119, 558. De Leon, N., Mehta, M. A., and Topper, R. Q. (1991). Cylindrical manifolds in phase space as mediators of chemical reaction dynamics and kinetics I. Theory. J. Chem. Phys., 94, 8310. Delbuono, G. S., Rossky, P. J., and Schnitker, J. (1991). Model dependence of quantum isotope effects in liquid water. J. Chem. Phys., 95, 3728. Dellago, C., Bolhuis, P. G., Csajka, F. S., and Chandler, D. (1998). Transition path sampling and the calculation of rate constants. J. Chem. Phys., 108, 1964. Dellago, C., Bolhuis, P. G., and Geissler, P. L. (2002). Transition path sampling. Adv. Chem. Phys., 123, 1. Delle Site, L. and Praprotnik, M. (2017). Molecular systems with open boundaries: Theory and simulation. Phys. Rep., 693, 1. Deng, Y. Q. and Roux, B. (2009). Computations of standard binding free energies with molecular dynamics simulations. J. Phys. Chem. B , 113, 2234. Deutsch, J. M. and Silbey, R. (1971). Exact generalized Langevin equation for a particle in a harmonic lattice. Phys. Rev., 3, 2049. Duane, S., Kennedy, A. D., Pendleton, B. J., and Roweth, D. (1987). Hybrid Monte Carlo. Phys. Lett. B , 195, 216. Duncan, J., Harjunmaa, A., Terrell, R., Drautz, R., Henkelman, G., and Rogal, J. (2016). Collective atomic displacements during complex phase boundary migration in solid-solid phase transformations. Phys. Rev. Lett., 116. Edberg, R., Morriss, G. P., and Evans, D. J. (1987). Rheology of n-alkanes by nonequilibrium molecular dynamics. J. Chem. Phys., 86, 4555. Engle, R. D., Skeel, R. D., and Dress, M. (2005). Monitoring energy drift with shadow Hamiltonians. J. Comp. Phys., 206, 432. Epstein, C. L. and Schotland, J. (2008). The bad truth about Laplace’s transform. SIAM Rev., 50, 504. Ernst, M. 
H., Hauge, E. H., and van Leeuwen, J. M. (1971). Asymptotic time behavior of correlation functions 1. Kinetic terms. Phys. Rev. A, 4, 2055. Essmann, U., Perera, L., Berkowitz, M. L., Darden, T., Lee, H., and Pedersen, L. G. (1995). A smooth particle-mesh Ewald method. J. Chem. Phys., 103, 8577. Evans, D. J. and Morriss, G. P. (1980). Statistical Mechanics of Nonequilibrium Liquids. Harcourt Brace Jovanovich, London. Evans, D. J. and Morriss, G. P. (1984). Non-Newtonian molecular dynamics. Comput. Phys. Rep., 1, 299. Ewald, P. P. (1921). Die Berechnung optischer und elektrostatischer Gitterpotentiale. Ann. Phys., 64, 253. Ezra, G. S. (2004). On the statistical mechanics of non-Hamiltonian systems: The generalized Liouville equation, entropy, and time-dependent metrics. J. Math. Chem., 35, 29. Ezra, G. S. (2007). Reversible measure-preserving integrators for non-Hamiltonian

systems. J. Chem. Phys., 125, 034104. Facco, E., d’Errico, M., Rodriguez, A., and Laio, A. (2017). Estimating the intrinsic dimension of datasets by a minimal neighborhood information. Sci. Rep., 7. Fanourgakis, G. S., Schenter, G. K., and Xantheas, S. S. (2006). A quantitative account of quantum effects in liquid water. J. Chem. Phys., 125, 141102. Fanourgakis, G. S. and Xantheas, S. S. (2006). The flexible, polarizable Thole-type interaction potential for water (TTM2-F) revisited. J. Phys. Chem. A, 110, 4100. Feng, Y., Huang, J., Kim, S., Shim, J. H., MacKerell, A. D., and Ge, N. H. (2016). Structure of penta-alanine investigated by two-dimensional infrared spectroscopy and molecular dynamics simulations. J. Phys. Chem. B, 120, 5325. Fermi, E., Pasta, J., and Ulam, S. (1955). Studies in nonlinear problems. I. Los Alamos Report, LA 1940. Fernandez, J. M. and Li, H. (2004). Force-clamp spectroscopy monitors the folding trajectory of a single protein. Science, 303, 1674. Feynman, R. P. (1948). Space-time approach to non-relativistic quantum mechanics. Rev. Mod. Phys., 20, 367. Feynman, R. P. (1998). Statistical Mechanics: A Set of Lectures. Addison-Wesley, Reading. Feynman, R. P. and Hibbs, A. R. (1965). Quantum Mechanics and Path Integrals. McGraw-Hill, New York. Feynman, R. P. and Kleinert, H. (1986). Effective classical partition functions. Phys. Rev. A, 34, 5080. Flügge, S. (1994). Practical Quantum Mechanics 2nd edn. Springer, Berlin. Fredrickson, G. H. (2006). The Equilibrium Theory of Inhomogeneous Polymers. Oxford University Press, Oxford. Frenkel, D. and Smit, B. (2002). Understanding Molecular Simulation. Academic Press, San Diego. Fridman, B. L. (1967). An improvement in the smoothness of the functions in Kolmogorov’s theorem on superpositions. Dokl. Akad. Nauk USSR, 117, 1019. Futrelle, R. P. and McGinty, D. J. (1971). Calculation of spectra and correlation-functions from molecular dynamics data using fast Fourier-transform. Chem. Phys. 
Lett., 12, 285. Ganeshan, S., Ramírez, R., and Fernández-Serra, M. V. (2013). Simulation of quantum zero-point effects in water using a frequency-dependent thermostat. Phys. Rev. B, 87, 134207. Gans, J. and Shalloway, D. (2000). Shadow mass and the relationship between velocity and momentum in symplectic numerical integration. Phys. Rev. E, 61, 4587. Gauss, K. F. (1829). Über ein allgemeines Grundgesetz der Mechanik. Reine Angew. Math., IV, 232. Gausterer, H. and Klauder, J. R. (1986). Complex Langevin equations and their applications to quantum statistical and lattice field models. Phys. Rev. D, 33, 3678. Geiger, P. and Dellago, C. (2013). Neural networks for local structure detection in polymorphic systems. J. Chem. Phys., 139, 164105. Geissler, P. L., Dellago, C., and Chandler, D. (1999). Kinetic pathways of ion pair

dissociation in water. J. Phys. Chem. B , 103, 3706. Geissler, P. L., Dellago, C., Chandler, D., Hutter, J., and Parrinello, M. (2001). Autoionization in liquid water. Science, 291, 291. Geva, E., Shi, Q., and Voth, G. A. (2001). Quantum-mechanical reaction rate constants from centroid molecular dynamics simulations. J. Chem. Phys., 115, 9209. Gillan, M. J. (1987). Quantum-classical crossover of the transition rate in the damped double well. J. Phys. C., 20, 3621. Gilson, M. K., Given, J. A., Bush, B. L., and McCammon, J. A. (1997). The statistical thermodynamic basis for computation of binding affinities. Biophys. J., 72, 1047. Glaesemann, K. R. and Fried, L. E. (2002). Improved heat capacity estimator for path integral simulations. J. Chem. Phys., 117, 3020. Goldstein, H. (1980). Classical Mechanics. Addison-Wesley, Reading. Gosling, E. M., McDonald, I. R., and Singer, K. (1973). Calculation by molecular dynamics of shear viscosity of a simple fluid. Mol. Phys., 26, 1475. Goyal, M., Goyal, R., and Lall, B. (2019). Learning activation functions: A new paradigm for understanding neural networks. Proc. Machine Learning Res., 101, 1. Graner, F. and Glazier, J. A. (1992). Simulation of biological cell sorting using a two-dimensional extended Potts model. Phys. Rev. Lett., 69, 2013. Grote, R. F. and Hynes, J. T. (1980). The stable states picture of chemical-reactions 2. Rate constants for condensed and gas-phase reaction models . J. Chem. Phys., 73, 2715. Guisoni, N., Mazzitello, K. I., and Diambra, L. (2018). Modeling active cell movement with the Potts model. Front. Phys., 6. Gustafson, D. E. and Kessel, W. C. (1978). Fuzzy clustering with a fuzzy covariance matrix. In Proceedings of the 1978 IEEE Conference on Decision and Control including the 17th Symposium on Adaptive Processes. IEEE Control Systems Society. Habershon, S., Markland, T. E., and Manolopoulos, D. E. (2009). Competing quantum effects in the dynamics of a flexible water model. J. Chem. 
Phys., 131, 024501. Haldane, A. and Levy, R. M. (2019). Influence of multiple-sequence-alignment depth on Potts statistical models of protein covariation. Phys. Rev. E , 99. Hall, R. W. and Berne, B. J. (1984). Nonergodicity in path integral molecular dynamics. J. Chem. Phys., 81, 3641. Harp, G. D. and Berne, B. J. (1968). Linear- and angular-momentum autocorrelation functions in diatomic liquids. J. Chem. Phys., 49, 1249. Harp, G. D. and Berne, B. J. (1970). Time-correlation functions, memory functions and molecular dynamics. Phys. Rev. A, 2, 975. Hastie, T., Tibshirani, R., and Friedman, J. (2017). The Elements of Statistical Learning. Springer, New York. Hayes, R. L., Paddison, S. J., and Tuckerman, M. E. (2009). Proton transport in triflic acid hydrates studied via path integral Car-Parrinello molecular dynamics. J. Phys. Chem. B , 113, 16574. Hayes, R. L., Paddison, S. J., and Tuckerman, M. E. (2011). Proton transport in triflic acid pentahydrate studied via ab initio path integral molecular dynamics. J. Phys. Chem. A, 115, 6112–6124. Hecht-Nielsen, R. (1987). Kolmogorov’s mapping neural network existence theorem.

In Proceedings of the International Conference on Neural Networks, volume 3 (ed. Institute of Electrical and Electronics Engineers), p. 11. IEEE Press, New York. Heidari, M., Kremer, K., Golestanian, R., Potestio, R., and Cortes-Huerto, R. (2020). Open-boundary Hamiltonian adaptive resolution. From grand canonical to nonequilibrium molecular dynamics simulations. J. Chem. Phys. (19), 194104. Hele, T. J. H., Willatt, M. J., Muolo, A., and Althorpe, S. C. (2015a). Boltzmann-conserving classical dynamics in quantum time-correlation functions: “Matsubara dynamics”. J. Chem. Phys., 142(13), 134103. Hele, T. J. H., Willatt, M. J., Muolo, A., and Althorpe, S. C. (2015b). Communication: Relation of centroid molecular dynamics and ring-polymer molecular dynamics to exact quantum dynamics. J. Chem. Phys., 142(19), 191101. Henderson, R. L. (1974). A uniqueness theorem for fluid pair correlation functions. Phys. Lett. A, 49, 197. Herman, M. F., Bruskin, E. J., and Berne, B. J. (1982). On path-integral Monte Carlo simulations. J. Chem. Phys., 76, 5150. Herman, M. F. and Kluk, E. (1984). A semiclassical justification for the use of non-spreading wavepackets in dynamics calculations. Chem. Phys., 91, 27. Hernandez, R., Cao, J. S., and Voth, G. A. (1995). On the Feynman path centroid density as a phase-space distribution in quantum-statistical mechanics. J. Chem. Phys., 103, 5018. Hillery, M., O’Connell, R. F., Scully, M. O., and Wigner, E. P. (1984). Distribution functions in physics: Fundamentals. Phys. Rep., 106, 121. Hirshberg, B., Rizzi, V., and Parrinello, M. (2019). Path integral molecular dynamics for bosons. Proc. Natl. Acad. Sci. U.S.A., 116, 21445. Hirshberg, B., Rizzi, V., and Parrinello, M. (2020). Path integral molecular dynamics for fermions: Alleviating the sign problem with the Bogoliubov inequality. J. Chem. Phys., 152, 171102. Hockney, R. W. and Eastwood, J. W. (1981). Computer Simulation Using Particles. McGraw-Hill, New York. Hohenberg, P. and Kohn, W. (1964). 
Inhomogeneous electron gas. Phys. Rev., 136, B864. Hone, T. D., Poulsen, J. A., Rossky, P. J., and Manolopoulos, D. E. (2008). Comparison of approximate quantum simulation methods applied to normal liquid helium at 4 K. J. Phys. Chem. B , 112, 294. Hone, T. D., Rossky, P. J., and Voth, G. A. (2006). A comparative study of imaginary time path integral based methods for quantum dynamics. J. Chem. Phys., 124, 154103. Hong, R. S., Chan, E. J., Vogt-Maranto, L., Mattei, A., Sheikh, A. Y., and Tuckerman, M. E. (2021). Insights into the polymorphic structures and enantiotropic layer-slip transition in Paracetamol form III from enhanced molecular dynamics. Cryst. Growth & Design, 21, 886. Hoover, W. G. (1983). Nonequilibrium molecular dynamics. Ann. Rev. Phys. Chem., 34, 103. Hoover, W. G. (1985). Canonical dynamics: Equilibrium phase-space distributions. Phys. Rev. A, 31, 1695.

Hörmander, L. (1967). Hypoelliptic second-order differential equations. Acta Math., 119, 147. Huang, C. and Carter, E. A. (2010). Nonlocal orbital-free kinetic energy density functional for semiconductors. Phys. Rev. B, 81, 045206. Huang, K. (1963). Statistical Mechanics (2nd edn). John Wiley & Sons, New York. Hummer, G. and Szabo, A. (2001). Free energy reconstruction from nonequilibrium single-molecule pulling experiments. Proc. Natl. Acad. Sci. U.S.A., 98, 3658. Hwang, J. K., Chu, Z. T., Yadav, A., and Warshel, A. (1991). Simulations of quantum-mechanical corrections for rate constants of hydride-transfer reactions in enzymes and solutions. J. Phys. Chem., 95, 8445. Hwang, J. K. and Warshel, A. (1996). How important are quantum mechanical nuclear motions in enzyme catalysis? J. Am. Chem. Soc., 118, 11745. Iftimie, R., Minary, P., and Tuckerman, M. E. (2005). Ab initio molecular dynamics: Concepts, recent developments and future trends. Proc. Natl. Acad. Sci. U.S.A., 102, 6654. Iftimie, R. and Tuckerman, M. E. (2005). Decomposing total IR spectra of aqueous systems into solute and solvent contributions: A computational approach using maximally localized Wannier orbitals. J. Chem. Phys., 122, 214508. Ivanov, S. D., Witt, A., Shiga, M., and Marx, D. (2010). On artificial frequency shifts in infrared spectra obtained from centroid molecular dynamics: Quantum liquid water. J. Chem. Phys., 132, 031101. Izaguirre, J. A. and Hampton, S. S. (2004). Shadow hybrid Monte Carlo: An efficient propagator in phase space of macromolecules. J. Comp. Phys., 200, 581. Izvekov, S. and Voth, G. A. (2006). Modeling real dynamics in the coarse-grained representation of condensed phase systems. J. Chem. Phys., 125, 151101. Jadhao, V. and Makri, N. (2008). Iterative Monte Carlo for quantum dynamics. J. Chem. Phys., 129, 161102. Janežič, D., Praprotnik, M., and Merzel, F. (2005). Molecular dynamics integration and molecular vibrational theory. I. New symplectic integrators. J. Chem. 
Phys., 122, 174101. Jang, S., Jang, S., and Voth, G. A. (2001). Applications of higher order composite factorization schemes in imaginary time path integral simulations. J. Chem. Phys., 115, 7832. Janke, W. (ed.) (2010). Rugged Free Energy Landscapes. Springer–Verlag, Berlin. Jarzynski, C. (1997). Nonequilibrium equality for free energy differences. Phys. Rev. Lett., 78, 2690. Jarzynski, C. (2004). Nonequilibrium work theorem for a system strongly coupled to a thermal environment. J. Stat. Mech.: Theor. and Expt., P09005. Jorgensen, W. L., Maxwell, D. S., and Tirado-Rives, J. (1996). Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. J. Am. Chem. Soc., 118, 11225. Kadanoff, L. P. (1966). Scaling laws for Ising models near Tc. Physics, 2, 263. Kahane, J.-P. (1975). Sur le théorème de superposition de Kolmogorov. J. Approx. Theory, 13, 229. Kalos, M. H. and Whitlock, P. A. (1986). Monte Carlo Methods. Volume 1: Basics.

John Wiley & Sons, New York. Kanungo, T., Mount, D. M., Netanyahu, N. S., Piatko, C. D., Silverman, R., and Wu, A. Y. (2002). An efficient k-means clustering algorithm: Analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell., 24, 881. Kapil, V., Behler, J., and Ceriotti, M. (2016). High order path integrals made easy. J. Chem. Phys., 145, 234103. Karplus, M. (1959). Contact electron-spin coupling of nuclear magnetic moments. J. Chem. Phys., 30, 11. Kästner, J. and Thiel, W. (2005). Bridging the gap between thermodynamic integration and umbrella sampling provides a novel analysis method: Umbrella integration. J. Chem. Phys., 123, 144104. Kaufmann, B. (1949). Crystal Statistics. II. Partition function evaluated by spinor analysis. Phys. Rev., 76, 1232. Khintchine, A. I. (1934). Correlation theories of stationary stochastic processes. Math. Ann., 109, 604. Kirkwood, J. G. (1935). Statistical mechanics of fluid mixtures. J. Chem. Phys., 3, 300. Kladko, K. and Fulde, P. (1998). On the properties of cumulant expansions. Intl. J. Quant. Chem., 66, 377. Kohn, W. and Sham, L. J. (1965). Self-consistent equations including exchange and correlation effects. Phys. Rev., 140, A1138. Kolmogorov, A. N. (1957). On the representation of continuous functions of many variables by superpositions of continuous functions of one variable and addition. Dokl. Akad. Nauk USSR, 14, 953. Kone, A. and Kofke, D. A. (2005). Selection of temperature intervals for parallel tempering simulations. J. Chem. Phys., 122, 206101. Korol, R., Bou-Rabee, N., and Miller, T. F., III (2019). Cayley modification for strongly stable path-integral and ring-polymer molecular dynamics. J. Chem. Phys., 151, 124103. Korol, R., Rosa-Raíces, J. L., Bou-Rabee, N., and Miller, T. F., III (2020). Dimension-free path-integral molecular dynamics without preconditioning. J. Chem. Phys., 152, 104102. Kraeutler, V., van Gunsteren, W. F., and Huenenberger, P. H. (2001). 
A fast SHAKE algorithm to solve distance constraint equations for small molecules in molecular dynamics simulations. J. Comput. Chem., 22, 501. Kramers, H. A. (1940). Brownian motion in a field of force and the diffusion model of chemical reactions. Physica, 7, 284. Kramers, H. A. and Wannier, G. H. (1941a). Statistics of the two-dimensional Ferromagnet. Part I. Phys. Rev., 60, 252. Kramers, H. A. and Wannier, G. H. (1941b). Statistics of the two-dimensional Ferromagnet. Part II. Phys. Rev., 60, 263. Kreis, K., Kremer, K., Potestio, R., and Tuckerman, M. E. (2017). From classical to quantum and back: Hamiltonian adaptive resolution path integral, ring polymer, and centroid molecular dynamics. J. Chem. Phys., 147, 244104. Kreis, K., Tuckerman, M. E., Donadio, D., Kremer, K., and Potestio, R. (2016). From

classical to quantum and back: A Hamiltonian scheme for adaptive multiresolution classical/path-integral simulations. J. Chem. Theory Comput., 12, 3030. Krilov, G., Sim, E., and Berne, B. J. (2001). Quantum time correlation functions from complex time Monte Carlo simulations: A maximum entropy approach. J. Chem. Phys., 114, 1075. Kubo, R., Toda, M., and Hashitsume, N. (1985). Statistical Physics II. Springer-Verlag, Berlin. Kumar, S., Bouzida, D., Swendsen, R. H., Kollman, P. A., and Rosenberg, J. M. (1992). The weighted histogram analysis method for free-energy calculations on biomolecules. 1. The method. J. Comp. Chem., 13, 1011. Kurkova, V. (1991). Kolmogorov’s theorem and multilayer neural networks. Neural Comput., 3, 617–622. Laio, A. and Parrinello, M. (2002). Escaping the free-energy minima. Proc. Natl. Acad. Sci. U.S.A., 99, 12562. Laio, A., Rodriguez-Fortea, A., Gervasio, F. L., Ceccarelli, M., and Parrinello, M. (2005). Assessing the accuracy of metadynamics. J. Phys. Chem. B, 109, 6714. Laird, B. B. and Skinner, J. L. (1991). T2 can be greater than 2T1 even at finite temperature. J. Chem. Phys., 94, 4405. Langevin, P. (1905). Une formule fondamentale de théorie cinétique. Ann. Chim. Phys., 5, 245. Langevin, P. (1908). Sur la théorie du mouvement Brownien. CR Hebd. séances Acad. Sci., 146, 530. LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Comp., 1, 541. Lee, H.-S. and Tuckerman, M. E. (2006). Structure of liquid water at ambient temperature from ab initio molecular dynamics performed in the complete basis set limit. J. Chem. Phys., 125, 154507. Lee, H.-S. and Tuckerman, M. E. (2007). Dynamical properties of liquid water from ab initio molecular dynamics performed in the complete basis set limit. J. Chem. Phys., 126, 164501. Lee, S. (1994). The convergence of complex Langevin simulations. Nucl. Phys. B, 413, 827. Lees, A. 
W. and Edwards, F. (1972). The computer study of transport processes under extreme conditions. J. Phys. C, 5, 1921. Leimkuhler, B., Margul, D. T., and Tuckerman, M. E. (2013). Stochastic, resonance-free multiple time-step algorithms for molecular dynamics with very large time steps. Mol. Phys., 111, 3579. Leimkuhler, B. and Matthews, C. (2012). Rational construction of stochastic numerical methods for molecular sampling. Appl. Math. Res. eXpress, 2013, 34. Leimkuhler, B., Matthews, C., and Vlaar, T. (2019). Partitioned integrators for thermodynamic parameterization of neural networks. Found. Data Sci., 1, 457. Leimkuhler, B., Noorizadeh, E., and Theil, F. (2009). A gentle stochastic thermostat for molecular dynamics. J. Stat. Phys., 135, 261. Lennard-Jones, J. E. (1924). On the determination of molecular force fields. Proc.

Royal Soc. London A, 106, 463. Levine, A. M., Shapiro, M., and Pollak, E. (1988). Hamiltonian theory for vibrational dephasing rates of small molecules in liquids. J. Chem. Phys., 88, 1959. Levy, R. M., Haldane, A., and Flynn, W. F. (2017). Potts Hamiltonian models of protein co-variation, free energy landscapes, and evolutionary fitness. Curr. Opin. Struct. Biol., 43, 55. Lin, L., Morrone, J. A., Car, R., and Parrinello, M. (2011). Displaced path integral formulation for the momentum distribution of quantum particles. Phys. Rev. Lett., 105, 110602. Liphardt, J., Dumont, S., Smith, S. B., Tinoco, I., and Bustamante, C. (2002). Equilibrium information from nonequilibrium measurements in an experimental test of Jarzynski’s equality. Science, 296, 1832. Liu, P., Kim, B., Friesner, R. A., and Berne, B. J. (2005). Replica exchange with solute tempering: A method for sampling biological systems in explicit water. Proc. Natl. Acad. Sci. U.S.A., 102, 13749–13754. Liu, Y. and Tuckerman, M. E. (2000). Generalized Gaussian moment thermostatting: A new continuous dynamical approach to the canonical ensemble. J. Chem. Phys., 112, 1685. Liu, Z. H. and Berne, B. J. (1993). Electron solvation in methane and ethane. J. Chem. Phys., 99, 9054. Lorentz, G. G. (2005). Approximation of Functions. AMS Chelsea Publishing, Providence. Ma, Q., Izaguirre, J. A., and Skeel, R. D. (2003). Verlet-I/r-RESPA/Impulse is limited by nonlinear instabilities. SIAM J. Sci. Comput., 24, 1951. Ma, S. K. (1976). Modern Theory of Critical Phenomena. Benjamin/Cummings, Reading. MacKerell, A., Jr., Bashford, D., Bellott, M., Dunbrack, R. L., Evanseck, J. D., Field, M. J., Fischer, S., Guo, H., Ha, S., Joseph-McCarthy, D., Kuchnir, L., Kuczera, K., Lau, F. T. K., Mattos, C., Michnick, S., Ngo, T., Nguyen, D. T., Prodhom, B., Reiher, W. E., III, Roux, B., Schlenkrich, M., Smith, J. C., Stote, R., Straub, J., Watanabe, M., Wiorkiewicz-Kuczera, J., Yin, D., and Karplus, M. (1998). 
All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B, 102, 3586. MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (ed. L. M. LeCam and J. Neyman), Volume 1, p. 281. University of California Press, Berkeley. Manousiouthakis, V. I. and Deem, M. W. (1999). Strict detailed balance is unnecessary in Monte Carlo simulation. J. Chem. Phys., 110, 2753. Maragliano, L. and Vanden-Eijnden, E. (2006). A temperature accelerated method for sampling free energy and determining reaction paths in rare event simulations. Chem. Phys. Lett., 426, 168. Maragliano, L. and Vanden-Eijnden, E. (2008). Single-sweep methods for free energy calculations. J. Chem. Phys., 128, 184110. Marchi, M., Sprik, M., and Klein, M. L. (1990). Calculation of the molar volume of

electron solvation in liquid ammonia. J. Phys. Chem., 94, 431. Margul, D. and Tuckerman, M. E. (2016). A stochastic, resonance-free multiple timestep algorithm for polarizable models that permits very large time steps. J. Chem. Theory Comput., 12, 2170. Marinari, E. and Parisi, G. (1992). Simulated tempering: A new Monte Carlo scheme. Europhys. Lett., 19, 451. Maris, H. J. and Kadanoff, L. P. (1978). Teaching the renormalization group. Am. J. Phys., 46, 652. Markland, T. E. and Manolopoulos, D. E. (2008a). A refined ring polymer contraction scheme for systems with electrostatic interactions. Chem. Phys. Lett., 464, 256. Markland, T. E. and Manolopoulos, D. E. (2008b). An efficient ring polymer contraction scheme for imaginary time path integral simulations. J. Chem. Phys., 129, 024105. Marsalek, O. and Markland, T. E. (2016). Ab initio molecular dynamics with nuclear quantum effects at classical cost: Ring polymer contraction for density functional theory. J. Chem. Phys., 144, 054112. Martyna, G. J., Deng, Z. H., and Klein, M. L. (1993). Quantum simulation studies of singlet and triplet bipolarons in liquid ammonia. J. Chem. Phys., 98, 555. Martyna, G. J., Hughes, A., and Tuckerman, M. E. (1999). Molecular dynamics algorithms for path integrals at constant pressure. J. Chem. Phys., 110, 3275. Martyna, G. J., Klein, M. L., and Tuckerman, M. E. (1992). Nosé-Hoover chains: The canonical ensemble via continuous dynamics. J. Chem. Phys., 97, 2635. Martyna, G. J., Tobias, D. J., and Klein, M. L. (1994). Constant pressure molecular dynamics algorithms. J. Chem. Phys., 101, 4177. Martyna, G. J. and Tuckerman, M. E. (1999). A reciprocal space based method for treating long range interactions in ab initio and force-field-based calculations in clusters. J. Chem. Phys., 110, 2810. Martyna, G. J., Tuckerman, M. E., Tobias, D. J., and Klein, M. L. (1996). Explicit reversible integrators for extended systems dynamics. Mol. Phys., 87, 1117. 
Marx, D., Chandra, A., and Tuckerman, M. E. (2010). Aqueous basic solutions: Hydroxide solvation, structural diffusion and comparison to the hydrated proton. Chem. Rev., 110, 2174. Marx, D. and Hutter, J. (2009). Ab Initio Molecular Dynamics. Cambridge University Press, Cambridge. Marx, D. and Parrinello, M. (1994). Ab initio path-integral molecular dynamics. Z. Phys. B , 95, 143. Marx, D. and Parrinello, M. (1996). Ab initio path integral molecular dynamics: Basic ideas. J. Chem. Phys., 104, 4077. Marx, D., Tuckerman, M. E., Hutter, J., and Parrinello, M. (1999a). The nature of the hydrated excess proton in water. Nature, 367, 601. Marx, D., Tuckerman, M. E., and Martyna, G. J. (1999b). Quantum dynamics via adiabatic ab initio centroid molecular dynamics. Comp. Phys. Comm., 118, 166. McCammon, J. A., Gelin, B. R., and Karplus, M. (1977). Dynamics of folded proteins. Nature, 267, 585.


McCammon, J. A., Gelin, B. R., Karplus, M., and Wolynes, P. G. (1976). Hinge-bending mode in lysozyme. Nature, 262, 325. McCulloch, W. S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys., 5, 115. McGreevy, R. L. and Pusztai, L. (1988). Reverse Monte Carlo simulation: A new technique for the determination of disordered structures. Mol. Simulat., 1, 359. McQuarrie, D. A. (2000). Statistical Mechanics. University Science Books, Sausalito. Medders, G. R., Babin, V., and Paesani, F. (2014). Development of a “first-principles” water potential with flexible monomers. III. Liquid phase properties. J. Chem. Theory Comput., 10, 2906. Merli, P. G., Missiroli, G. F., and Pozzi, G. (1976). Statistical aspect of electron interference phenomena. Am. J. Phys., 44, 306. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys., 21, 1087. Miller, T. F., III, Eleftheriou, M., Pattnaik, P., Ndirango, A., Newns, D., and Martyna, G. J. (2002). Symplectic quaternion scheme for biophysical molecular dynamics. J. Chem. Phys., 116, 8649. Miller, T. F., III and Manolopoulos, D. E. (2005). Quantum diffusion in liquid parahydrogen from ring-polymer molecular dynamics. J. Chem. Phys., 122, 184503. Miller, W. H. (2005). Quantum dynamics of complex molecular systems. Proc. Natl. Acad. Sci. U.S.A., 102, 6660. Minary, P., Martyna, G. J., and Tuckerman, M. E. (2003). Algorithms and novel applications based on the isokinetic ensemble. I. Biophysical and path integral molecular dynamics. J. Chem. Phys., 118, 2510. Minary, P., Morrone, J. A., Yarne, D. A., Tuckerman, M. E., and Martyna, G. J. (2004a). Long range interactions on wires: A reciprocal space based formalism. J. Chem. Phys., 121, 11949. Minary, P. and Tuckerman, M. E. (2004). Reaction pathway of the [4+2] Diels-Alder adduct formation on Si(100)-2×1. J. Am. Chem. Soc., 126, 13920. 
Minary, P., Tuckerman, M. E., and Martyna, G. J. (2004b). Long-time molecular dynamics for enhanced conformational sampling in biomolecular systems. Phys. Rev. Lett., 93, 150201. Minary, P., Tuckerman, M. E., and Martyna, G. J. (2007). Dynamical spatial warping: A novel method for the conformational sampling of biophysical structure. SIAM J. Sci. Comput., 30, 2055. Minary, P., Tuckerman, M. E., Pihakari, K. A., and Martyna, G. J. (2002). A new reciprocal space based treatment of long range interactions on surfaces. J. Chem. Phys., 116, 5351. Miura, S. and Okazaki, S. (2000). Path integral molecular dynamics for Bose-Einstein and Fermi-Dirac statistics. J. Chem. Phys., 112, 10116. Mjalli, F. S. and Naser, J. (2015). Viscosity model for choline chloride-based deep eutectic solvents. Asia-Pacific J. Chem. Eng., 10(2), 273–281. Mori, H. (1965). Transport collective motion and Brownian motion. Prog. Theor. Phys., 33, 423.


Mori, Y., Okazaki, K., Mori, T., Kim, K., and Matubayasi, N. (2020). Learning reaction coordinates via cross-entropy minimization: Application to alanine dipeptide. J. Chem. Phys., 153, 054115. Morrone, J. A. and Car, R. (2008). Nuclear quantum effects in water. Phys. Rev. Lett., 101, 017801. Morrone, J. A., Markland, T. E., Ceriotti, M., and Berne, B. J. (2011). Efficient multiple time scale molecular dynamics: Using colored noise thermostats to stabilize resonances. J. Chem. Phys., 134, 014103. Morrone, J. A., Srinivasan, V., Sebastiani, D., and Car, R. (2007). Proton momentum distribution in water: An open path integral molecular dynamics study. J. Chem. Phys., 126, 234504. Mukamel, S. (1995). Principles of Nonlinear Optical Spectroscopy. Oxford University Press, New York. Mundy, C. J., Balasubramanian, S., Bagchi, K., Martyna, G. J., and Klein, M. L. (2000). Nonequilibrium molecular dynamics. Rev. Comp. Chem., 14, 291. Nakayama, A. and Makri, N. (2005). Simulation of dynamical properties of normal and superfluid helium. Proc. Natl. Acad. Sci. U.S.A., 102, 4230. Noe, F., Olsson, S., Kohler, J., and Wu, H. (2019). Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning. Science, 365, 6457. Nosé, S. (1984). A unified formulation of the constant-temperature molecular dynamics methods. J. Chem. Phys., 81, 511. Nosé, S. and Klein, M. L. (1983). Constant-pressure molecular dynamics for molecular systems. Mol. Phys., 50, 1055. Oberhofer, H., Dellago, C., and Geissler, P. L. (2005). Biased sampling of nonequilibrium trajectories: Can fast switching simulations outperform conventional free energy calculation methods? J. Phys. Chem. B, 109, 6902. Olender, R. and Elber, R. (1996). Classical trajectories with a very large time step: Formalism and numerical examples. J. Chem. Phys., 105, 9299. Onsager, L. (1944). Crystal statistics. I. A two-dimensional model with an order–disorder transition. Phys. Rev., 65, 117. Oxtoby, D. W. (1979). 
Hydrodynamic theory for vibrational dephasing in liquids. J. Chem. Phys., 70, 2605. Paesani, F., Iuchi, S., and Voth, G. A. (2007). Quantum effects in liquid water from an ab initio-based polarizable force field. J. Chem. Phys., 127, 074506. Paesani, F. and Voth, G. A. (2009). The properties of water: Insights from quantum simulations. J. Phys. Chem. B, 113, 5702. Paesani, F. and Voth, G. A. (2010). A quantitative assessment of the accuracy of centroid molecular dynamics for the calculation of the infrared spectrum of liquid water. J. Chem. Phys., 132, 014105. Paesani, F., Zhang, W., Case, D. A., Cheatham, T. E., and Voth, G. A. (2006). An accurate and simple quantum model for liquid water. J. Chem. Phys., 125, 184507. Park, S., Khalili-Araghi, F., and Schulten, K. (2003). Free energy calculation from steered molecular dynamics simulations using Jarzynski’s equality. J. Chem. Phys., 119, 3559. Park, S. and Schulten, K. (2004). Calculating potentials of mean force from steered


molecular dynamics simulations. J. Chem. Phys., 120, 5946. Parr, R. G. and Yang, W. T. (1989). Density Functional Theory of Atoms and Molecules. Clarendon Press, Oxford. Parrinello, M. and Rahman, A. (1980). Crystal-structure and pair potentials: A molecular dynamics study. Phys. Rev. Lett., 45, 1196. Parrinello, M. and Rahman, A. (1984). Study of an F center in molten KCl. J. Chem. Phys., 80, 860. Passerone, D. and Parrinello, M. (2001). Action-derived molecular dynamics in the study of rare events. Phys. Rev. Lett., 87, 108302. Pathria, R. K. (1972). Statistical Mechanics (2nd edn). Elsevier Butterworth Heinemann, Amsterdam. Patra, M., Karttunen, M., Hyvonen, M. T., Falck, E., Lindqvist, P., and Vattulainen, I. (2003). Molecular dynamics simulations of lipid bilayers: Major artifacts due to truncating electrostatic interactions. Biophys. J., 84, 3636. Paul, S., Nair, N. N., and Vashisth, H. (2019). Phase space and collective variable based simulation methods for studies of rare events. Mol. Simulat., 45, 1273. Peierls, R. (1938). On a minimum property of the free energy. Phys. Rev., 54, 918. Pérez, A. and Tuckerman, M. E. (2011). Improving the convergence of closed and open path integral molecular dynamics via higher order Trotter factorization schemes. J. Chem. Phys., 135, 064104. Pérez, A., Tuckerman, M. E., and Müser, M. H. (2009). A comparative study of the centroid and ring-polymer molecular dynamics methods for approximating quantum time correlation functions from path integrals. J. Chem. Phys., 130, 184105. Peters, B. (2006). Using the histogram test to quantify reaction coordinate error. J. Chem. Phys., 125, 241101. Peters, B., Beckham, G. T., and Trout, B. L. (2007). Extensions to the likelihood maximization approach for finding reaction coordinates. J. Chem. Phys., 127, 034109. Peters, B. and Trout, B. L. (2006). Obtaining reaction coordinates by likelihood maximization. J. Chem. Phys., 125, 054108. Piaggi, P. M. and Parrinello, M. (2018). 
Predicting polymorphism in molecular crystals using orientational entropy. Proc. Natl. Acad. Sci. U.S.A., 115, 10251. Piana, S. and Laio, A. (2007). A bias-exchange approach to protein folding. J. Phys. Chem. B , 111, 4553. Piccinini, G. (2004). The first computational theory of mind and brain: A close look at McCulloch and Pitts’s “Logical Calculus of Ideas Immanent in Nervous Activity”. Synthese, 141, 175. Plimpton, S. (1995). Fast parallel algorithms for short-range molecular dynamics. J. Comp. Phys., 117, 1. Pollak, E. (1990). Variational transition-state theory for activated rate processes. J. Chem. Phys., 93, 1116. Pollak, E., Grabert, H., and Hanggi, P. (1989). Theory of activated rate-processes for arbitrary frequency-dependent friction: Solution of the turnover problem. J. Chem. Phys., 91, 4073. Pollak, E. and Talkner, P. (1993). Activated rate-processes: Finite-barrier expansion for the rate in the spatial-diffusion limit . Phys. Rev. E , 47, 922.


Pollak, E., Tucker, S. C., and Berne, B. J. (1990). Variational transition-state theory for reaction-rates in dissipative systems. Phys. Rev. Lett., 65, 1399. Potestio, R., Fritsch, S., Español, P., Delgado-Buscalioni, R., Kremer, K., Everaers, R., and Donadio, D. (2013). Hamiltonian adaptive resolution simulation for molecular liquids. Phys. Rev. Lett., 110, 108301. Poulsen, J. A., Nyman, G., and Rossky, P. J. (2005). Static and dynamic quantum effects in molecular liquids: A linearized path integral description of water. Proc. Natl. Acad. Sci. U.S.A., 102, 6709. Pratt, L. R. and Laviolette, R. A. (1998). Quasi-chemical theories of associated liquids. Mol. Phys., 94, 909. Procacci, P., Marchi, M., and Martyna, G. J. (1998). Electrostatic calculations and multiple time scales in molecular dynamics simulation of flexible molecular systems. J. Chem. Phys., 108, 8799. Quigley, D. and Rodger, P. M. (2008). Metadynamics simulations of ice nucleation and growth. J. Chem. Phys., 128, 154518. Quigley, D. and Rodger, P. M. (2009). A metadynamics-based approach to sampling crystallisation events. Mol. Simulat., 35, 613. Rahman, A. (1964). Correlations in motion of atoms in liquid argon. Phys. Rev., 136, A405. Rahman, A. and Stillinger, F. H. (1971). Molecular dynamics study of liquid water. J. Chem. Phys., 55, 3336. Ramshaw, J. D. (2002). Remarks on non-Hamiltonian statistical mechanics. Europhys. Lett., 59, 319. Rathore, N., Chopra, M., and de Pablo, J. J. (2005). Optimal allocation of replicas in parallel tempering simulations. J. Chem. Phys., 122, 024111. Rathore, N., Knotts IV, T. A., and de Pablo, J. J. (2003). Density-of-states simulations of proteins. J. Chem. Phys., 118, 4285. Reichman, D. R. and Silbey, R. J. (1996). On the relaxation of a two-level system: Beyond the weak-coupling approximation. J. Chem. Phys., 104, 1506. Reiss, C. A., van Mechelen, J. B., Goubitz, K., and Peschar, R. (2018). 
Reassessment of Paracetamol orthorhombic form III and determination of a novel low-temperature monoclinic form III-m from powder diffraction data. Acta Cryst. C, 74, 392. Ricci, M. A., Nardone, M., Ricci, F. P., Andreani, C., and Soper, A. K. (1995). Microscopic structure of low temperature liquid ammonia: A neutron diffraction study. J. Chem. Phys., 102, 7650. Rodriguez, A. and Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344, 1492. Rogal, J., Schneider, E., and Tuckerman, M. E. (2019). Neural-network-based path collective variables for enhanced sampling of phase transformations. Phys. Rev. Lett., 123, 245701. Rosa-Raíces, J. L., Sun, J., Bou-Rabee, N., and Miller, T. F. (2021). A generalized class of strongly stable and dimension-free t-RPMD integrators. J. Chem. Phys., 154, 024106. Ross, M. and McMahan, A. K. (1980). Condensed xenon at high pressure. Phys. Rev. B, 21, 1658.


Rossi, M., Ceriotti, M., and Manolopoulos, D. E. (2014). How to remove the spurious resonances from ring polymer molecular dynamics. J. Chem. Phys., 140, 234116. Rosso, L., Abrams, J. B., and Tuckerman, M. E. (2005). Mapping the backbone dihedral free-energy surfaces in small peptides in solution using adiabatic free-energy dynamics. J. Phys. Chem. B, 109, 4162. Rosso, L., Minary, P., Zhu, Z. W., and Tuckerman, M. E. (2002). On the use of the adiabatic molecular dynamics technique in the calculation of free energy profiles. J. Chem. Phys., 116, 4389. Rouse, P. E. (1953). A theory of the linear viscoelastic properties of dilute solutions of coiling polymers. J. Chem. Phys., 21, 1272. Rupakheti, C., Lamoureux, G., MacKerell, A. D., and Roux, B. (2020). Statistical mechanics of polarizable force fields based on classical Drude oscillators with dynamical propagation by the dual-thermostat extended Lagrangian. J. Chem. Phys., 153, 114108. Ryckaert, J. P. and Ciccotti, G. (1983). Introduction of Andersen demon in the molecular dynamics of systems with constraints. J. Chem. Phys., 78, 7368. Ryckaert, J. P., Ciccotti, G., and Berendsen, H. J. C. (1977). Numerical-integration of Cartesian equations of motion of a system with constraints: Molecular-dynamics of n-alkanes. J. Comput. Phys., 23, 327. Sandu, A. and Schlick, T. (1999). Masking resonance artifacts in force-splitting methods for biomolecular simulations by extrapolative Langevin dynamics. J. Comput. Phys., 151, 74. Sardanashvily, G. (2002a). The Lyapunov stability of first-order dynamic equations with respect to time-dependent Riemannian metrics. arXiv.nlin.CD/0201060v1. Sardanashvily, G. (2002b). The Lyapunov stability of first-order dynamic equations with respect to time-dependent Riemannian metrics: An example. arXiv.nlin.CD/0203031v1. Savill, N. J. and Hogeweg, P. (1997). Modelling morphogenesis: From single cells to crawling slugs. J. Theor. Biol., 184, 229. Scharf, D., Martyna, G. J., and Klein, M. L. 
(1993). Path-integral Monte Carlo study of a lithium impurity in para-hydrogen: Clusters and the bulk liquid. J. Chem. Phys., 99, 8997. Schlick, T., Mandziuk, M., Skeel, R. D., and Srinivas, K. (1998). Nonlinear resonance artifacts in molecular dynamics simulations. J. Comput. Phys., 140, 1. Schneider, E., Dai, L., Topper, R. Q., Drechsel-Grau, C., and Tuckerman, M. E. (2017). Stochastic neural network approach for learning high-dimensional free energy surfaces. Phys. Rev. Lett., 119, 150601. Schoell-Paschinger, E. and Dellago, C. (2006). A proof of Jarzynski’s nonequilibrium work theorem for dynamical systems that conserve the canonical distribution. J. Chem. Phys., 125, 054105. Schulman, L. S. (1981). Techniques and Applications of Path Integration. Wiley-Interscience, New York. Senior, A. W., Evans, R., Jumper, J., Kirkpatrick, J., Sifre, L., Green, T., Qin, C., Žídek, A., Nelson, A. W. R., Bridgland, A., Penedones, H., Petersen, S., Simonyan, K., Crossan, S., Kohli, P., Jones, D. T., Silver, D., Kavukcuoglu, K., and Hassabis,


D. (2019). Protein structure prediction using multiple deep neural networks in the 13th critical assessment of protein structure prediction (CASP13). Proteins: Struct., Funct., Bioinform., 87, 1141. Sergi, A. (2003). Non-Hamiltonian equilibrium statistical mechanics. Phys. Rev. E , 67, 021101. Shalev-Shwartz, S. and Ben-David, S. (2014). Understanding Machine Learning. Cambridge University Press, New York. Shankar, R. (1994). Principles of Quantum Mechanics (2nd edn). Plenum Press, New York. Shtukenberg, A. G., Tan, M., Vogt-Maranto, L., Chan, E. J., Xu, W., Yang, J., Tuckerman, M. E., Hu, C. T., and Kahr, B. (2019). Melt crystallization for Paracetamol polymorphism. Cryst. Growth & Design, 19, 4070. Silvera, I. F. and Goldman, V. V. (1978). The isotropic intermolecular potential for H2 and D2 in the solid and gas phases. J. Chem. Phys., 69, 4209. Simovici, D. (2018). Mathematical Analysis for Machine Learning and Data Mining. World Scientific, Singapore. Skeel, R. D. and Hardy, D. J. (2001). Practical construction of modified Hamiltonians. SIAM J. Sci. Comput., 23, 1172. Skinner, J. L. and Park, K. (2001). Calculating vibrational energy relaxation rates from classical molecular dynamics simulations: Quantum correction factors for processes involving vibration-vibration energy transfer. J. Phys. Chem. B , 105, 6716. Smith, E. L., Abbott, A. P., and Ryder, K. S. (2014). Deep eutectic solvents (DESs) and their applications. Chem. Rev., 114(21), 11060–11082. Song, H., Vogt-Maranto, L., Wiscons, R., Matzger, A. J., and Tuckerman, M. E. (2020). Generating cocrystal polymorphs with information entropy driven by molecular dynamics-based enhanced sampling. J. Phys. Chem. Lett., 11, 9751. Soper, A. K. (1996). Empirical potential Monte Carlo simulation of fluid structure. Chem. Phys., 202, 295. Sprecher, D. (1965). On the structure of continuous functions of several variables. Trans. Amer. Math. Soc., 115, 340. Sprik, M. and Ciccotti, G. (1998). 
Free energy from constrained molecular dynamics. J. Chem. Phys., 109, 7737. Sprik, M., Impey, R. W., and Klein, M. L. (1985). Study of electron solvation in liquid ammonia using quantum path integral Monte-Carlo calculations. J. Chem. Phys., 83, 5802. Sprik, M., Impey, R. W., and Klein, M. L. (1986). Study of electron solvation in polar solvents using path integral Monte Carlo calculations. J. Stat. Phys., 43, 967. Steinczinger, Z. and Pusztai, L. (2012). An independent, general method for checking consistency between diffraction data and partial radial distribution functions derived from them: The example of liquid water. Condens. Matt. Phys., 15, 23606. Steinhardt, P. J., Nelson, D. R., and Ronchetti, M. (1983). Bond-orientational order in liquids and glasses. Phys. Rev. B , 28, 784. Stillinger, F. H. and Rahman, A. (1972). Molecular dynamics study of temperature effects on water structure and kinetics. J. Chem. Phys., 57, 1281. Stillinger, F. H. and Rahman, A. (1974). Improved simulation of liquid water by


molecular dynamics. J. Chem. Phys., 60, 1545. Strang, G. (1968). On the construction of and comparison of difference schemes. SIAM J. Numer. Anal., 5, 506. Straub, J. E., Borkovec, M., and Berne, B. J. (1988). Molecular dynamics study of an isomerizing diatomic in a Lennard-Jones fluid. J. Chem. Phys., 89, 4833. Suzuki, M. (1991a). Decomposition formulas of exponential operators and Lie exponentials with some applications to quantum mechanics and statistical physics. J. Math. Phys., 32, 400. Suzuki, M. (1991b). General theory of fractal path-integrals with applications to many-body theories and statistical physics. J. Math. Phys., 32, 400. Suzuki, M. (1992). General nonsymmetric higher-order decomposition of exponential operators and symplectic integrators. J. Phys. Soc. Japan, 61, 3015. Suzuki, M. (1995). Hybrid exponential product formulas for unbounded operators with possible applications to Monte Carlo simulations. Phys. Lett. A, 201, 425. Sweet, C. R., Petrone, P., Pande, V. S., and Izaguirre, J. A. (2008). Normal mode partitioning of Langevin dynamics for biomolecules. J. Chem. Phys., 128, 145101. Swope, W. C., Andersen, H. C., Berens, P. H., and Wilson, K. R. (1982). A computer-simulation method for the calculation of equilibrium-constants for the formation of physical clusters of molecules: Application to small water clusters. J. Chem. Phys., 76, 637. Szabó, A. and Merks, R. M. H. (2013). Cellular Potts modeling of tumor growth, tumor invasion, and tumor evolution. Front. Oncol., 3. Takahashi, M. and Imada, M. (1984). Monte Carlo calculations of quantum systems. II. Higher order correction. J. Phys. Soc. Japan, 53, 3765. Tarasov, V. E. (2004). Phase-space metric for non-Hamiltonian systems. J. Phys. A, 38, 2145. Tesi, M. C., van Rensburg, E. J. J., Orlandini, E., and Whittington, S. G. (1996). Monte Carlo study of the interacting self-avoiding walk model in three dimensions. J. Stat. Phys., 82, 155. Thirumalai, D., Bruskin, E. J., and Berne, B. J. (1983). 
An iterative scheme for the evaluation of discretized path integrals. J. Chem. Phys., 79, 5063. Tobias, D. J., Martyna, G. J., and Klein, M. L. (1993). Molecular dynamics simulations of a protein in the canonical ensemble. J. Phys. Chem., 97, 12959. Tolman, R. C. (1918). A general theory of energy partition with applications to quantum theory. Phys. Rev., 11, 261. Torrie, G. M. and Valleau, J. P. (1974). Monte-Carlo free-energy estimates using non-Boltzmann sampling: Application to subcritical Lennard-Jones fluid. Chem. Phys. Lett., 28, 578. Torrie, G. M. and Valleau, J. P. (1977). Non-physical sampling distributions in Monte-Carlo free-energy estimation: Umbrella sampling. J. Comput. Phys., 23, 187. Toxvaerd, S. (1994). Hamiltonians for discrete dynamics. Phys. Rev. E, 50, 2274. Trotter, H. F. (1959). On the product of semi-groups of operators. Proc. Amer. Math. Soc., 10, 545. Trout, B. L. and Parrinello, M. (1998). The dissociation mechanism of H2O in water studied by first-principles molecular dynamics. Chem. Phys. Lett., 288, 343.


Tucker, S. C., Tuckerman, M. E., Berne, B. J., and Pollak, E. (1991). Comparison of rate theories for generalized Langevin dynamics. J. Chem. Phys., 95, 5809. Tuckerman, M. E. (2002). Ab initio molecular dynamics: Basic concepts, current trends and novel applications. J. Phys. Condens. Matt., 14, R1297. Tuckerman, M. E. (2019). Machine learning transforms how microstates are sampled. Science, 365, 982. Tuckerman, M. E., Alejandre, J., Lopez-Rendon, R., Jochim, A. L., and Martyna, G. J. (2006). A Liouville-operator derived measure-preserving integrator for molecular dynamics simulations in the isothermal-isobaric ensemble. J. Phys. A, 39, 5629. Tuckerman, M. E. and Berne, B. J. (1993). Vibrational relaxation in simple fluids: Comparison of theory and simulation. J. Chem. Phys., 98, 7301. Tuckerman, M. E., Liu, Y., Ciccotti, G., and Martyna, G. J. (2001). Non-Hamiltonian molecular dynamics: Generalizing Hamiltonian phase space principles to non-Hamiltonian systems. J. Chem. Phys., 115, 1678. Tuckerman, M. E., Martyna, G. J., and Berne, B. J. (1992). Reversible multiple time scale molecular dynamics. J. Chem. Phys., 97, 1990. Tuckerman, M. E., Martyna, G. J., Klein, M. L., and Berne, B. J. (1993). Efficient molecular dynamics and hybrid Monte Carlo algorithms for path integrals. J. Chem. Phys., 99, 2796. Tuckerman, M. E. and Marx, D. (2001). Heavy-atom skeleton quantization and proton tunneling in “intermediate-barrier” hydrogen bonds. Phys. Rev. Lett., 86, 4946. Tuckerman, M. E., Marx, D., Klein, M. L., and Parrinello, M. (1996). Efficient and general algorithms for path integral Car-Parrinello molecular dynamics. J. Chem. Phys., 104, 5579. Tuckerman, M. E., Marx, D., and Parrinello, M. (2002). The nature and transport mechanism of hydrated hydroxide ions in aqueous solution. Nature, 417, 925. Tuckerman, M. E., Mundy, C. J., Balasubramanian, S., and Klein, M. L. (1997). Modified nonequilibrium molecular dynamics for fluid flows with energy conservation. J. Chem. 
Phys., 106, 5615. Tuckerman, M. E., Mundy, C. J., and Martyna, G. J. (1999). On the classical statistical mechanics of non-Hamiltonian systems. Europhys. Lett., 45, 149. Tzanov, A. T., Cuendet, M. A., and Tuckerman, M. E. (2014). How accurately do current force fields predict experimental peptide conformations? An adiabatic free energy dynamics study. J. Phys. Chem. B , 118, 6539. Valsson, O. and Parrinello, M. (2014). Variational approach to enhanced sampling and free energy calculations. Phys. Rev. Lett., 113, 090601. VandeVondele, J. and Rothlisberger, U. (2002). Canonical adiabatic free energy sampling (CAFES): A novel method for the exploration of free energy surfaces. J. Phys. Chem. B , 106, 203. Verlet, L. (1967). Computer experiments on classical fluids. I. Thermodynamical properties of Lennard-Jones molecules. Phys. Rev., 159, 98. Voth, G. A. (1993). Feynman path integral formulation of quantum mechanical transition state theory. J. Phys. Chem., 97, 8365.


Voth, G. A., Chandler, D., and Miller, W. H. (1989). Rigorous formulation of quantum transition state theory and its dynamical corrections. J. Chem. Phys., 91, 7749. Waldram, J. R. (1985). Theory of Thermodynamics. Cambridge University Press, New York. Wallqvist, A., Thirumalai, D., and Berne, B. J. (1987). Path integral Monte-Carlo study of the hydrated electron. J. Chem. Phys., 86, 6404. Wang, D., Wang, Y., Chang, J., Zhang, L., Wang, H., and E, W. (2021). Efficient sampling of high-dimensional free energy landscapes using adaptive reinforced dynamics. Nature Comp. Sci., 2, 20. Wang, F. and Landau, D. P. (2001). Determining the density of states for classical statistical models: A random walk algorithm to produce a flat histogram. Phys. Rev. E, 64, 056101. Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A., and Case, D. A. (2004). Development and testing of a general amber force field. J. Comp. Chem., 25, 1157. Wang, Q., Johnson, J. K., and Broughton, J. Q. (1997). Path integral grand canonical Monte Carlo. J. Chem. Phys., 107, 5108. Watanabe, M. and Reinhardt, W. P. (1990). Direct dynamic calculation of entropy and free-energy by adiabatic switching. Phys. Rev. Lett., 65, 3301. Weber, H. J. and Arfken, G. B. (2005). Mathematical Methods for Physicists. Academic Press, Burlington. Weingarten, D. (1982). Monte Carlo evaluation of hadron masses in lattice gauge theories with fermions. Phys. Lett. B, 109, 57. Weingarten, D. and Petcher, D. N. (1981). Monte Carlo integration for lattice gauge theories with fermions. Phys. Lett. B, 99, 333. Welberry, T. R. (1979). Crystal growth-disorder models and Ising models. Mol. Cryst. Liq. Cryst., 50, 21. Weyl, H. (1927). Quantenmechanik und Gruppentheorie. Z. Phys., 46, 1. Widom, B. (1963). Some topics in the theory of fluids. J. Chem. Phys., 39, 2808. Widom, B. (1965). Equation of state in the neighborhood of the critical point. J. Chem. Phys., 43, 3898. Widom, B. (1982). 
Potential-distribution theory and the statistical mechanics of fluids. J. Phys. Chem., 86, 869. Wiener, N. (1930). Generalized harmonic analysis. Acta Math., 55, 117. Wilburn, G. W. and Eddy, S. R. (2020). Remote homology search with hidden Potts models. PLOS Comp. Biol., 16, e1008085. Wilson, K. G. (1971). Renormalization group and critical phenomena. 1. Renormalization group and Kadanoff scaling picture. Phys. Rev. B , 4, 3174. Wilson, K. G. and Fisher, M. E. (1972). Critical exponents in 3.99 dimensions. Phys. Rev. Lett., 28, 240. Witt, A., Ivanov, S. D., Shiga, M., Forbert, H., and Marx, D. (2009). On the applicability of centroid and ring-polymer path integral molecular dynamics for vibrational spectroscopy. J. Chem. Phys., 130, 194510. Wu, Y., Tepper, H. L., and Voth, G. A. (2006). Flexible simple point-charge water model with improved liquid-state properties. J. Chem. Phys., 124, 024503. Yan, Q. L., Faller, R., and de Pablo, J. J. (2002). Density-of-states Monte Carlo


method for simulation of fluids. J. Chem. Phys., 116, 8745. Yang, C. N. (1952). The spontaneous magnetization of a two-dimensional Ising model. Phys. Rev., 85, 808. Yang, J., Erriah, B., Hu, C. T., Reiter, E., Zhu, X., López-Mejías, V., Carmona-Sepúlveda, I. P., Ward, M. D., and Kahr, B. (2020). A deltamethrin crystal polymorph for more effective malaria control. Proc. Natl. Acad. Sci. U.S.A., 117, 26633. Yang, J., Hu, C. T., Zhu, X., Zhu, Q., Ward, M. D., and Kahr, B. (2017). DDT polymorphism and the lethality of crystal forms. Angew. Chem. Intl. Ed., 56, 10165. Yoshida, H. (1990). Construction of higher-order symplectic integrators. Phys. Lett. A, 150, 262. Yu, T.-Q., Chen, P.-Y., Chen, M., Samanta, A., Vanden-Eijnden, E., and Tuckerman, M. E. (2014). Order-parameter-aided temperature-accelerated sampling for the exploration of crystal polymorphism and solid-liquid phase transitions. J. Chem. Phys., 140, 214109. Yu, T.-Q., Martyna, G. J., and Tuckerman, M. E. (2010). Measure-preserving integrators for molecular dynamics in the isothermal-isobaric ensemble derived from the Liouville operator. Chem. Phys., 370, 294. Yu, T.-Q. and Tuckerman, M. E. (2011). Temperature-accelerated method for exploring polymorphism in molecular crystals based on free energy. Phys. Rev. Lett., 107, 015701. Zelsmann, H. R. (1995). Temperature dependence of the optical constants for liquid H2O and D2O in the far IR region. J. Mol. Struct., 350, 95. Zhang, F. (1997). Operator-splitting integrators for constant-temperature molecular dynamics. J. Chem. Phys., 106, 6102. Zhang, L., Wang, H., and E, W. (2018). Reinforced dynamics for enhanced sampling in large atomic and molecular systems. J. Chem. Phys., 148, 124113. Zhang, Y., Poe, D., Heroux, L., Squire, H., Doherty, B. W., Long, Z., Dadmun, M., Gurkan, B., Tuckerman, M. E., and Maginn, E. J. (2020). Liquid structure and transport properties of the deep eutectic solvent ethaline. J. Phys. Chem. B, 124(25), 5251–5264. Zhang, Z. 
J., Liu, X. Z. J., Chen, Z., Zheng, H., Yan, K., and Liu, J. (2017). Unified thermostat scheme for efficient configurational sampling for classical/quantum canonical ensembles via molecular dynamics. J. Chem. Phys., 147, 034109. Zhang, Z. J., Liu, X. Z. J., Yan, K. Y., Tuckerman, M. E., and Liu, J. (2019). Unified efficient thermostat scheme for the canonical ensemble with holonomic or isokinetic constraints via molecular dynamics. J. Phys. Chem. A, 123, 6056. Zhu, X., Hu, C. T., Erriah, B., Vogt-Maranto, L., Yang, J., Yang, Y., Qiu, M., Fellah, N., Tuckerman, M. E., Ward, M. D., and Kahr, B. (2021). Imidacloprid crystal polymorphs for disease vector control and pollinator protection. J. Am. Chem. Soc., 143, 17144. Zhu, Z. W. and Tuckerman, M. E. (2002). Ab initio molecular dynamics investigation of the concentration dependence of charged defect transport in basic solutions via calculation of the infrared spectrum. J. Phys. Chem. B , 106, 8009. Zhu, Z. W., Tuckerman, M. E., Samuelson, S. O., and Martyna, G. J. (2002). Using


novel variable transformations to enhance conformational sampling in molecular dynamics. Phys. Rev. Lett., 88, 100201. Zwanzig, R. W. (1954). High-temperature equation of state by a perturbation method. 1. Nonpolar gases. J. Chem. Phys., 22, 1420. Zwanzig, R. W. (1973). Nonlinear generalized Langevin equations. J. Stat. Phys., 9, 215.

Index k-fold cross validation, 746 k-nearest neighbors, 766 Action integral, 29–32, 503 imaginary time, 506 harmonic oscillator, 509 Action-angle variables, 96 Activation function, 753–755 exponential linear unit, ELU, 755 leaky ReLU, 754 learnable, 755 rectified linear unit, ReLU, 754 sigmoid, 754 softmax, 762 softplus, 755 swish, 755 Adaptive resolution simulation approach, 298–299 Adiabatic free-energy dynamics, 374–384 Affine connection, 13 After-effect function, 622 Amplitude classical wave, 398, 601 probability, 398 Angular momentum, 15 Angular velocity, 40 Anticommutator, 616 Areal velocity, 15 Artificial intelligence, 740 Autocorrelation function classical, 568 harmonic oscillator, 578–579 Mori-Zwanzig theory, 686 pressure tensor, 573 rigid-bond approximation, 687 velocity, 577, 658, 662 vibrational dephasing, 662–667 dipole and infrared spectrum, 619–620 quantum, 615 harmonic oscillator, 617–619 velocity, 630 Back propagation, 756–757 Bagging, 766 Baker-Campbell-Hausdorff formula, 126, 544 Banach space, 815 Bayes’ theorem, 760–761 Bernoulli distribution, 742 Blue moon ensemble, 363–369 Body-fixed frame, 39

Boiling curve, 696 Boltzmann constant, 72, 79 generators, 395 relation for entropy, 79, 235 statistics, 450, 516 Bootstrap aggregation, 766 Bose-Einstein condensation, 446, 471, 478 distribution function, 477 Boson ideal gas, 471–481 Boundary conditions determination of energy levels, 414 Lees-Edwards, 585 periodic energy eigenfunctions, 414, 424, 447 Ewald summation, 807 minimum image convention, 805 simulation of Lennard-Jones liquid, 130 simulation of liquid water, 219 spin lattice, 710, 713, 715, 739 Box matrix, 244 Box-Muller sampling, 104–106 Bra vector, Dirac, 400 Bromwich contour, 820 Brownian particle, 657–658 Bulk viscosity, 594 Cardinal B-spline function, 810 Carnot cycle, 60–61 Categorical distribution, 743 Cell matrix, 244 Central limit theorem, 304–308 Central potential, 14 Centroid, 523 Chemical potential, 283 canonical ensemble definition, 141 chemical work, 78 defined, 58 excess, 297 grand canonical ensemble, 281, 319 ideal boson gas, 472 ideal fermion gas, 458, 459, 470 isobaric ensemble definition, 232, 233, 239 microcanonical definition, 79 microcanonical ensemble definition, 140 Clausius inequality, 62 Clustering algorithms, 782–784 K-means clustering, 782 Density-based clustering, 784

Fuzzy clustering, 784 Coexistence curves, 696 Collective variables, 359–362 Committor distribution, 390–392 histogram test, 391 machine learning of, 780 Commutator, 111, 405 Conservation law action, 96 Andersen’s equations, 253 areal velocity, 15 blue moon ensemble, 366 conservation of energy, 11 energy, 7 Martyna-Tobias-Klein equations, 259 Nosé-Hoover chain equations, 199 Nosé-Hoover equations, 196 GSLLOD dynamics, 591 Hamiltonian, 21, 80 isokinetic equations, 211 Jarzynski’s equality, 353 linear momentum, 22 Martyna-Tobias-Klein equations, 259 Nosé-Hoover equations, 197 Martyna-Tobias-Klein equations, 257 Noether’s theorem, 23 non-Hamiltonian equilibrium phase-space distribution, 195 non-Hamiltonian phase-space volume, 193 orbital angular momentum, 15 phase-space distribution, 70 phase-space volume, 67 planar Couette flow, 571 Poisson bracket, 22 shadow Hamiltonian, 125 Constant-energy hypersurface, 21, 80 Constant-energy surface, 21 Constraint Gauss’s principle, 36 holonomic, 32, 106–109, 209, 272 nonholonomic, 32 Contraction semigroup, 815 Control variables canonical ensemble, 139 grand canonical ensemble, 283, 284 isoenthalpic-isobaric ensemble, 233 isothermal-isobaric ensemble, 233, 254 microcanonical ensemble, 78, 79, 98, 140 Convolution integral, 818 Convolution neural network, 763–764 filter, 763 flattening layer, 764 Convolution operator, two-dimensional, 764 Convolution theorem, 818 Coordinates action-angle, 96 Cartesian, 5 center-of-mass, relative, 16, 39, 162 collective variables, 359 cyclic, 14

generalized, 12, 19, 29, 31, 65, 364, 365, 562, 646 minimal set, 33 normal mode, 27, 529 phase-space, 6, 24, 65 polar, 40, 96, 105 primitive, 533 scaled, 170, 247, 252, 262, 263, 318 spherical polar, 14 staging, 527, 533 Copenhagen interpretation, 404 Correlation function classical time, 567–569, 653, 662 general spatial, 161 pair, 161 quantum time, 614, 617, 622, 627 spin-spin, 719 Correlation length, 720 Correlation time, 568 Cost function (see also loss function), 744 Coulomb gauge, 601 Critical exponents, 185, 697–698 α, 185, 709 β, 185 δ, 185, 709 η, 719–720 γ, 185, 709 mean-field theory, 186, 705 ν, 719–720 Critical hypersurface, 732 Critical isotherm, 697 Critical point, 183, 696 Critical temperature, 183, 696 mean-field, 707 Cumulant expansion, 178–179 Cumulant generating function, 179 Data manifold intrinsic dimension of, 754, 784–787 Swiss roll, 785 Decision tree, 764–765 Degeneracy, 482, 483 energy levels, 410, 424, 436–438, 442, 443 spin, 420, 453, 465, 478 Density functional theory, 446, 463, 464 Density of states, 327 vibrational, 643, 649 Detailed balance, 312, 615 Diffusion constant, 576–577 Dirac δ-function, 80, 84, 90, 91, 143, 144, 161, 189, 196, 234, 244, 245, 261, 305, 354, 362–364, 367, 384, 613–615, 655, 793–795 Dirac bra vector, 400 Dirac ket vector, 400 Direct translation method, 116, 119, 208, 265, 669 Dissipative flux, 565, 572, 576 Driven adiabatic free-energy dynamics, 374–384

Dynamic caging, 656 Eigenfunction, 408, 410, 446 Energy relaxation, 664–667 Ensemble blue moon, 363–369 canonical, 139–141 classical canonical, 141–146 classical grand canonical, 284–289 classical isoenthalpic-isobaric, 233–235 classical isokinetic, 210–214 classical isothermal-isobaric, 235–239 classical microcanonical, 80–83 classical nonequilibrium, 560–568 classical uniform, 85 definition, 63–65 grand canonical, 283–284 isobaric, 231–233 magnetic system, 701 microcanonical, 77–80 quantum canonical, 436 quantum density matrix, 432 quantum grand-canonical, 437–438 quantum isothermal-isobaric, 436–437 quantum nonequilibrium, 620–623 Ensemble distribution canonical, 144, 343, 565 definition, 68 entropy, 76 equilibrium solution, 71, 195 generalized Liouville equation, 194 grand canonical, 287 isoenthalpic-isobaric, 234 isothermal-isobaric, 237, 594 Liouville equation, 70 microcanonical, 80 nonequilibrium ansatz, 563 Entropy additivity, 88 Boltzmann’s relation, 79, 235 canonical ensemble, 146 canonical ensemble via maximization of, 146, 438 Clausius, 62, 78 Gibbs, 75 Gibbs paradox, 94–95 grand canonical ensemble, 289 grand canonical ensemble via maximization of, 290 ideal gas, 94, 292 isothermal-isobaric ensemble, 239 isothermal-isobaric ensemble via maximization of, 239 microcanonical ensemble, 79 quantum harmonic oscillator, 440 second law of thermodynamics, 62 Shannon form, 147, 239, 290 thermodynamic derivatives, 79, 140, 235 third law of thermodynamics, 63 uniform ensemble, 85

Equal a priori probability, 80 Equation of state critical exponent δ, 186, 703 definition, 57 grand canonical ensemble, 284, 288–289 high-temperature ideal boson gas, 473 high-temperature ideal fermion gas, 457 ideal gas, 57, 93, 151, 242, 292 Ising model, 701 isochores, 93 isotherms, 93 low-temperature ideal boson gas, 479 low-temperature ideal fermion gas, 470 mean-field Ising model, 709 simple fluid, 696 van der Waals, 183–185, 734 virial, 174 Ergodic hypothesis, 100 Ergodic system, 100 Error function, 803 Estimator classical energy, 168 classical molecular pressure, 262 classical pressure, 171 classical pressure tensor, 247–251 energy, primitive path integral, 501, 517 energy, virial path integral, 535, 538 heat capacity, virial path integral, 555 Monte Carlo, 304 path integral, 499 pressure tensor, path integral, 555 pressure, primitive path integral, 502, 517 pressure, virial path integral, 539 Euler angles, 43 Euler’s theorem, 238, 281–283, 536 Euler-Lagrange equation, 10 action integral, 31 central potential, 14 Euclidean, 506 generalized coordinates, 12 harmonic oscillator, 11 path integral, 505 Ewald summation, 803, 806–812 smooth particle-mesh, 810–812 Excluded volume, 182 Expectation value, 398, 404 Feed-forward network, 751 Fermi energy, 459 Fermi sign problem, 516 Fermi surface, 446, 460 Fermi’s Golden Rule, 613 Fermi-Dirac distribution, 459, 467 Fermion ideal gas, 454–470 Filter matrix, 763 First law of thermodynamics, 58–59, 78, 351 Fluctuations canonical energy, 148–149 isobaric volume, 277 Force

central potential, 14 conservative, 3 constraint, 37, 107, 272, 366 determination of hydrodynamic boundary conditions, 592 external driving, 561–568 general, 562 external in nonequilibrium molecular dynamics, 583–586 generalized, 364 Hooke’s law, 7, 129 linear driving, 575 long-range, 171, 806 multiple time-scale motion, 117 nonbonded, 802 pairwise, 172 projection onto surface of constraint, 36 random, 652–654 shear, 560, 569–571 short-range link list, 806 switching function, 804 truncation, 804 Verlet list, 805 simple force field, 117 torque, 42 van der Waals, 130 Force constant, Hooke’s law, 7 Free energy Gibbs, 233 grand canonical, 283, 284 Helmholtz, 140 Free-energy calculations blue moon ensemble, 363–369 driven adiabatic free-energy dynamics, 374–384 free-energy perturbation, 339–342 Jarzynski’s equality, 350–357 λ free-energy dynamics, 346–350 metadynamics, 384–387 quantum, 522–524 temperature-accelerated sliced sampling, 389–390 thermodynamic integration, 342–346 soft core, 346 umbrella sampling, 370–373 unified free-energy dynamics, 388–389 Wang-Landau algorithm, 373–374 Free-energy perturbation, 339–342 Free-energy profile, 347, 348, 363–368 quantum in hydrogen clathrate, 539–541 quantum in malonaldehyde, 539 Free-energy surface, 377 definition of, 362 driven adiabatic free energy dynamics/temperature-accelerated molecular dynamics (d-AFED/TAMD), 378 Gaussian kernel ridge regression model of, 750

high-dimensional, 379 machine learning of, 750, 759, 767–774 metadynamics, 385 neural network learning of, 759 regression of, 767–774 Friction kernel, 652, 654–656 exponential, 658–660 static friction coefficient, 656 Fugacity, 289, 454, 471, 472 Functional action, 30, 503 definition, 796 derivative, 797 Euclidean action, 506 influence, 556 integral of, 502 Shannon entropy, 147, 240, 290 Functional integral, 502–511 Functional integration definition, 800 expansion about classical path, 508–511 Galilean relativity, 39 Gas constant, 57 Gauge invariance, 600 Gauss’s equation of motion, 37 Gauss’s principle, 36 Gaussian distribution, 741 multivariate, 742 Gaussian kernel, 749 Gaussian random process, 654 Generalized coordinates, 12 action integral, 29 blue moon ensemble, 364–368 collective variables, 359 conjugate momenta, 19 driven adiabatic dynamics, 384 Hamiltonian, 20 harmonic bath model, 646–649 Lagrangian, 13 Liouville’s theorem, 65 phase-space compressibility, 23 reaction coordinates, 359 Generalized Langevin equation dynamic caging, 656 exponential memory, 663–664 friction kernel, 654–656 harmonic bath, derivation from, 649–652 Mori-Zwanzig theory, derivation from, 682–687 potential of mean force, 652 random force, 652–654 Gibbs free energy, 233, 238, 283 magnetic mean-field, 706–708 one-dimensional Ising model, 712 two-dimensional Ising model, 715 Gibbs paradox, 94–95 Gibbs-Bogoliubov inequality, 175 Gradient descent, 758 stochastic, 758

Grand canonical molecular dynamics, 297–299 Graph theory for two-dimensional Ising model, 715–719 Green-Kubo relation bulk viscosity, 594 diffusion constant, 577 shear viscosity, 573 Hörmander conditions, 673 Hamilton’s equations, 20 driven oscillator, 561 energy conservation, 21 formal solution, 111 incompressibility, 23 Jarzynski’s equality, 352 linear response theory, 562 Liouville’s theorem, 65 microcanonical ensemble, 100 Newton’s second law, 20 symplectic structure, 23 time reversibility, 103 Hamiltonian action-angle variables, 96 Andersen, 252 canonical distribution, 143 conservation of, 21 constant-energy surface, 21 constant-pressure molecular dynamics, 249 definition, 19 eigenvalue equation, 406 electromagnetic field, 601 free particle, 89, 150, 412 generalized coordinates, 19 Hamilton’s equations, 20 harmonic bath, 649, 652 harmonic bead-spring model, 153 harmonic oscillator, 95, 152, 415, 578 ideal gas, 90, 150, 446 interaction picture, 604 Ising model block spin transformation, 722 general, 700 mean-field, 706 one-dimensional, 710 two-dimensional, 713 zero-field, 703 isoenthalpic-isobaric distribution, 234 Lagrangian, 19 λ-dynamics, 346 Liouville equation, 71 Liouville operator, 110 microcanonical distribution, 80 normal mode, 27 Nosé, 188 operator, 405 operators, raising and lowering, 417 path integral molecular dynamics classical isomorphic, 526, 532 staging, 528

Poisson bracket, 22 quaternion, 123 shadow, 124–127 spin lattice, 699 thermal contact, 86 time-dependent perturbation, 602, 621 virial theorem, 83 Hard-sphere potential, 180 Harmonic oscillator bath of, 649 bead-spring model, 153–158 canonical ensemble, 152–153 canonical, quantum, 439–441 classical, 6–8 classical time correlation function, 578–579 coupled to double well, 382 damped-driven, 562 drive, 561 functional integral, 508–511 generalized Langevin equation, 660–664 lambda-integration, 347 Langevin equation, numerical solution, 669 microcanonical ensemble, 95–98 modified Nosé-Hoover equations, 224 Mori-Zwanzig theory, 686–687 Nosé-Hoover chain equations, 201, 205 Nosé-Hoover equations, 197 open-chain path integral, 640 path integral normal-mode transformation, 530 path integral simulation of, 535 path integral staging transformation, 528 quantum, 415–418 quantum time correlation function, 617–619 ring polymer contraction, 551 vibrational dephasing and energy relaxation, 664–667 weakly perturbed and isokinetic Nosé-Hoover chain algorithm, 218 Heat capacity classical harmonic oscillator, 152 constant pressure, 239 constant volume, 146, 301 critical point behavior, 185, 698, 701 definition, 58 discontinuity, 480, 484 energy fluctuations in canonical ensemble, 149 ideal gas, 152, 243 low-temperature ideal boson gas, 480 low-temperature ideal fermion gas, 470 mean-field theory, 709 molar, 146 quantum harmonic oscillator, 440 two-dimensional Ising model, 715 virial estimator, 555 Heaviside step function, 84, 134, 688 Heisenberg Hamiltonian, 699 Heisenberg uncertainty principle, 407

Helmholtz free energy, 140, 145, 245, 282 adiabatic free-energy dynamics, 350 difference blue moon ensemble, 363–365, 368 partition function ratio, 340 weighted histogram methods (WHAM), 372 driven adiabatic free energy dynamics, 378 metadynamics, 386 potential of mean force, 349 profile, 347 van der Waals theory, 182 Helmholtz free-energy difference Jarzynski’s equality, 351 thermodynamic integration, 343 surface, alanine dipeptide, 358 Henderson’s theorem, 175 Hidden layers, 753 Hilbert space, 400 Hilbert transform, 624 Holonomic constraint, 32 Homogeneous function, 281 Hybrid Monte Carlo, 320–323 Hyperparameter, 746 Idempotency, 683 Importance sampling, 310–312 Inertia tensor, 44 Initial conditions choosing, 103–106 sensitive dependence on, 130–131 Integrator classical propagator, 110–116 hybrid Monte Carlo, 321–323 isokinetic algorithm, 212–214 Langevin equation, 667–670 Martyna-Tobias-Klein algorithm, 263–269 multiple time step, 121 resonances, 121–122 multiple time-step, 116 nonequilibrium molecular dynamics, 586–590 Nosé-Hoover chain algorithm, 203–210 position Verlet, 136 RATTLE, 109 shadow Hamiltonian, 124–127 SHAKE, 106–109 Velocity Verlet algorithm, 102–103 Verlet algorithm, 101–102 Interaction picture, 603–604 Intrinsic dimension of data manifold, 754, 784–787 Ising model equation of state, 701 Hamiltonian, 700 mean-field treatment, 705–710 one-dimensional, 710–712 phase diagram, 701 renormalization group, 721–727

two-dimensional graph-theoretic approach, 715–719 transfer matrix approach, 712–715 Isokinetic equations, 211 Itô calculus, 670 Jacobi identity, 126 Jarzynski’s equality, 350–357 Kernel matrix, 747 Kernel methods, 746–750 Kernel ridge-regression model, 748 Kernel trick, 749 Ket vector, Dirac, 400 Kolmogorov superposition theorem, 751 Kurkova restatement of, 751 Kramers-Krönig relations, 624 Kullback-Leibler divergence, 790 Lagrange undetermined multipliers, 274, 275, 366, 371, 747 constraints in classical mechanics, 33 entropy maximization, 147, 240, 290, 439 holonomic constraints, 106 importance sampling, 310 isokinetic dynamics, 211 kernel learning methods, 747 ROLL algorithm, 273 SHAKE/RATTLE algorithm, 108 Lagrangian action integral, 29–32 central potential, 14 constant-pressure molecular dynamics, 248 constrained systems, 32–35, 106 definition, 10 Euclidean, 506 Euler-Lagrange equations, 10 generalized coordinates, 13 Hamiltonian, 19 harmonic oscillator, 11 Legendre transform, 19 momentum, 18 normal mode, 27 path integral, 503 rigid body motion, 39–47, 122 two-body system, 16 λ free-energy dynamics, 346–350 Langevin equation, 655 generalized, 652 numerical integration, 667–670 overdamped, 655 Laplace transform, 623, 630, 650, 657, 659, 660, 667, 684, 687, 818–821 canonical partition function, 238 convolution theorem, 818 microcanonical partition function, 144 Lattice models, 699 Lees-Edwards boundary conditions, 585 Legendre transform, 17–18 canonical to grand canonical, 283

canonical to isothermal-isobaric, 233 classical Lagrangian, 19 microcanonical to canonical, 140 microcanonical to isoenthalpic-isobaric, 232 Lennard-Jones potential, 130 Lennard-Jones reduced units, 131 Linear regression, 743–746 multidimensional, 746 Linear response theory classical, 562–568 Liouville equation classical, 70–71, 563 perturbative solution, 563–567 equilibrium, 71, 80, 195, 234, 435, 613 generalized, 194 quantum, 434 perturbative solution, 621–623 Liouville operator classical, 110, 113, 115, 126, 563, 564, 683 constant-pressure molecular dynamics, 263, 265 hybrid Monte Carlo, 321 isokinetic ensemble, 212 multiple time-scale integration, 118, 208, 267 Nosé-Hoover chain, 204, 263, 587 quantum, 435 SLLOD algorithm, 587 symplectic quaternion, 124 Trotter theorem, 112 Liouville’s theorem, 65–71, 193, 352, 577 Local density approximation, 463–467 Loschmidt’s paradox, 56 Loss function, 744 cross entropy, 763 gradient training, 759 least-squares, 743, 754 M(RT)² algorithm, 312–320 Machine learning, 740 cluster algorithms, 782–784 kernel methods, 746–750 neural networks, 750–764 weighted neighbor, 764–767 Marginal probability distribution, 362 Markov chain, 312 Martyna-Tobias-Klein equations flexible cell, 259 isotropic, 256 Massive thermostatting, 202, 529 Maxwell construction, 705 tie line, 734 Maxwell’s equations in vacuum, 600 Maxwell-Boltzmann distribution, 103 Mean absolute error, 745 Mean-field theory Ising model, 705–710 van der Waals equation, 704–705 Mean-square displacement, 577

Melting curve, 696 Memory function, 686 Memory integral, 654 Memory kernel, 654, 686 for multiple time-step molecular dynamics, 679 Metadynamics, 384–387 well-tempered, 387 Metric mass metric tensor, 13 phase-space metric, 192–194 Minimum image convention, 805 Modular invariance, 586, 590 Molecular crystals, 270–272 Molecular dynamics, 77, 98–101 adiabatic free-energy dynamics, 346–350 centroid, 631–633 classical propagator, 110–116 constraints, 106–109 driven adiabatic free-energy dynamics, 374–384 grand canonical, 297–299 hybrid Monte Carlo, 320–323 initial conditions, 103–106 isoenthalpic-isobaric ensemble, 251–254 isokinetic algorithm, 210–214 isothermal-isobaric ensemble flexible cell, 258–261 integrators, 263–269 isotropic, 254–257 ROLL algorithm, 272–277 Langevin, 667–670 liquid structure, 218–219 metadynamics, 384–389 microcanonical alanine dipeptide, 132–133 harmonic oscillator, 128–129 Lennard-Jones fluid, 130–131 multiple time-scale integration, 116–121 resonance-free, 214–218 resonances, 121–122 nonequilibrium, 583–594 Nosé algorithm, 188–190 Nosé-Hoover algorithm, 190–191 Nosé-Hoover chain algorithm, 198–210 path integral, 525–533 replica exchange Monte Carlo, 324 ring polymer, 633–634 shadow Hamiltonian, 124–127 symplectic quaternions, 122–124 temperature-accelerated sliced sampling, 389–390 time correlation functions, 579–583 velocity Verlet algorithm, 102–103 Verlet algorithm, 101–102 Wang-Landau algorithm, 328 Moment of inertia, 40, 44 Monte Carlo acceptance ratio, 313 canonical ensemble, 315–318

central limit theorem, 304–308 detailed balance, 312 grand canonical ensemble, 319–320 hybrid, 320–323 importance sampling, 310–312 isothermal-isobaric ensemble, 318–319 M(RT)² algorithm, 312–320 particle insertion/deletion, 319 pass, 316, 534 path integral, 533–535 rejection methods, 313 replica exchange, 323–327 spin lattice, 738 Wang-Landau algorithm, 327–328, 373–374 Mori-Zwanzig theory, 682–687 Neural network, 750–764 activation function, 753–755 exponential linear unit, ELU, 755 leaky ReLU, 754 learnable, 755 rectified linear unit, ReLU, 754 sigmoid, 754 softmax, 762 softplus, 755 swish, 755 back propagation, 756–757 classification, 760–763 learning collective variables, 774–782 convolution, 763–764 filter, 763 flattening layer, 764 feed forward, 752 feed-forward network, 751 hidden layers, 753 loss function, 754 cross entropy, 763 gradient training, 759 least-squares, 754 nodes, 753 regression, 750–760 Newton, Isaac, 1 laws of motion, 1–4 second law differential equation, 2, 4 free particle, 3 harmonic oscillator, 7 Noether’s theorem, 23 Non-Hamiltonian classical statistical mechanics, 191–197 dynamical system, 37, 192 Nonholonomic constraint, 32 Normal distribution, 741 Normalizing flow, 395, 792 Nosé Hamiltonian, 188 Nosé-Hoover chain equations, 199 Nosé-Hoover equations, 191 Occupation number, 449 One-particle density matrix, 464

Onsager regression hypothesis, 569 Operator anticommutator, 616 classical Liouville, 110–116, 563, 564, 683 classical propagator, 111 commutator, 111, 405 constant of the motion, 411 density matrix, 432–434 Heisenberg picture, 411 Hermitian, 401 idempotent, 683 incompatible observables, 405 interaction picture, 604 Ornstein-Uhlenbeck, 668 projection, 682–683 quantum Liouville, 435 raising and lowering, 418, 689 spin operators, 419–420 time evolution, 406, 609 time-ordering, 608 transfer matrix, 711, 713–715 unitary, 406 Operator factorization Baker-Campbell-Hausdorff, 126, 544 multiple time scale, 119, 209 Suzuki-Yoshida, 207 end versus middle, 207–210 Trotter, 113 Orbital angular momentum, 15 Order parameter, 696, 704 Outer product, 402 Pair correlation function, 161 Pairwise additive forces, 4, 172 potential, 169, 172, 180, 224, 316, 593 spin-spin interaction, 699 Paracetamol, 270–272 Parallel tempering, 324–325 Parity invariance of Ising model, 703 Particle reservoir, 284 Partition function blue moon ensemble, 367 canonical ensemble Wang-Landau algorithm, 327 classical canonical, 144 bead-spring model, 155 configurational, 160 free particle, 150 harmonic bath, 153 harmonic oscillator, 152 ideal gas, 151 λ-dynamics, 342 relation to microcanonical partition function, 144 separable Hamiltonian, 225 van der Waals theory, 182 classical grand canonical, 288, 289 ideal gas, 291 classical isoenthalpic-isobaric, 234

classical isokinetic, 210–212 classical isothermal-isobaric, 237, 245 ideal gas, 242 relation to canonical partition function, 238 classical microcanonical, 81 Andersen Hamiltonian, 253 harmonic bath, 97 harmonic oscillator, 97 ideal gas, 92 Nosé Hamiltonian, 188–189 thermal contact, 87 thermal reservoir, 142 classical non-Hamiltonian, 195 Nosé-Hoover algorithm, 196–197 constrained system, 227 definition, 72 Ising model, 705 mean-field, 706 one-dimensional, 711 two-dimensional, 713 quantum canonical, 436 discrete path integral representation, 525 harmonic oscillator, 440 identical particles, 513–522 path integral representation, 495, 499, 507 staging path integral representation, 528 quantum grand canonical ideal boson gas, 453 ideal fermion gas, 453 second quantization, 453 uniform ensemble, 85 Path collective variable, 778–780 Path integral Born-Oppenheimer approximation, 558 bosonic and fermionic exchange, 517–521 classical isomorphism, 526 discrete, 494 functional integral, 502–511 influence functional, 556 path centroid, 523, 531 quantum time correlation function, 629 recursion formula for identical bosons and fermions, 520 Path integral molecular dynamics fourth-order factorization, 543–549 isothermal-isobaric and grand canonical ensembles, 552–554 open chains, 542–543 ring polymer contraction, 549–552 Suzuki-Chin operator factorization, 545 Takahashi-Imada trace relation, 544 Pauli exclusion principle, 421, 451, 457, 464, 473 Phase diagram, 696 Phase space, 6 accessible, 8 canonical distribution, 143 coarse-grained, 76, 81

distribution Eulerian view, 68 Lagrangian view, 68 distribution function, 68 driven harmonic oscillator, 561 ensemble average, 71 extended, 186–191 Nosé Hamiltonian, 188–190 Nosé-Hoover chain equations, 198–202 Nosé-Hoover equations, 190–191 free particle, 6 grand canonical distribution, 287 harmonic oscillator, 7, 561 hill potential, 8 incompressibility, 23 isoenthalpic-isobaric distribution, 234 isothermal-isobaric distribution, 237 λ-dynamics, 347 nonequilibrium distribution, 565 Nosé-Hoover harmonic oscillator, 197 path integral molecular dynamics, 529 projection operators, 683 shadow Hamiltonian, 125 steady state distribution, 562 Phase transition, 176, 695 first order, 696 second-order, 697 Phase-space trajectory, 6, 21 vector, 6, 48, 80, 204, 351, 566 volume, 49, 67, 69, 192, 203 Phase-space compressibility, 67, 192 isokinetic equations, 212 Nosé-Hoover chain equations, 199 Nosé-Hoover equations, 196 Phase-space function, 22, 71, 82, 110, 160, 566–567, 569 work, 350 Phase-space metric, 192–194 blue moon ensemble, 367 isokinetic equations, 212 Martyna-Tobias-Klein equations flexible cell, 258 isotropic, 257 Nosé-Hoover chain equations, 200 Nosé-Hoover equations, 196 Planar Couette flow, 570, 586, 589, 590, 596 Planck’s constant, 397 Poincaré section, 9 Poisson bracket classical limit of commutator, 411 conservation law, 22 definition, 22 Liouville operator, 110 Polymorphism in molecular crystals, 270–272 Position Verlet, 136 Potential distribution theorem, 295–297 Potential energy function conservative forces, 10 hard-sphere, 180

Lennard-Jones, 130 simple force field, 117 soft-sphere, 593 Potential of mean force, 349 Potts model, 700, 733, 734 Power spectrum, 643 Probability distribution Bernoulli, 742 Boltzmann, 139 categorical, 743 exponential, 309 from wave function, 398 Gaussian, 104, 375, 741 Lorentzian, 338 marginal, 362 Maxwell-Boltzmann, 103 multivariate Gaussian, 742 phase space canonical, 143 grand canonical, 287 isoenthalpic-isobaric, 234 isothermal-isobaric, 237 phase-space, 68 Projection operator, 683 Propagator classical, 111 constant-pressure molecular dynamics, 264 Mori-Zwanzig theory, 683–687 multiple time-scale integration, 118–268 Nosé-Hoover chain equations, 204 Suzuki-Yoshida factorization, 207 symplectic quaternion, 124 Trotter theorem, 112–116, 669 unperturbed, 565 quantum, 406, 490 free particle, 497 imaginary time, 490 interaction picture, 605 path integral representation, 494, 504 relation to canonical density matrix, 490 time-dependent perturbation theory, 605–610 Wick rotation, 491 Quantum amplitude, 398 expectation value, 398, 404 free energy, 522–524 bead estimator, 522 centroid estimator, 523 interference, 486 mechanics, 398 operator, 401 path integral, 398, 487 Pauli exclusion principle, 421, 451, 457, 464, 473 probability, 398 state vector, 400 wave equation, 409

wave function, 409 Quantum free energy enhanced sampling of, 524 Quantum-classical correspondence principle, 409 Quaternions, 45–47, 122–124 Radial distribution function calculation of, 219 coordination number, 164 definition, 163 hard-sphere model, 181 isothermal-isobaric, 280 Lennard-Jones liquid, 163, 269 low-density limit, 225 perturbation theory, 180 relation to energy, 170 relation to pressure, 173 structure factor, 167 uniqueness of, 174 virial coefficients, 174 water, 164 Random force, 652–654 autocorrelation, 687 autocorrelation function, 653 Gaussian, 654 Random forest, 766 Random variable Gaussian, 104–105 uniform, 308 Rare event, 328–329, 357–359, 363 RATTLE algorithm, 109, 272–277 Regularizer, 746 Renormalization transformation equations, 722 Renormalization group, 720 block spin transformation, 721–722 decimation transformation, 734 fixed point, 723, 725–729 stable, 726, 728, 733 unstable, 725, 728, 729, 732 flow, 725 irrelevant variable, 730, 731 linearized, 729–730 marginal variable, 730, 731 pair cell transformation, 734 relevant variable, 730, 731 universality, 731–733 Renormalized frequency, 660, 686 Replica exchange Monte Carlo, 323–327 Resonance instabilities in multiple time-scale integration, 137 Ridge term, 746, 750 Rigid body, 37 equations of motion, 45 Ring polymer contraction, 549–552 ROLL algorithm, 272–277 Rotation formula, 38 Rough energy landscape, 323, 357 Sackur-Tetrode equation, 95, 292

Scalar potential, 600 Scale invariance, 721 Scaling length factor, 728 relations, 720 variable, 729 Scaling hypothesis, 720 Schrödinger equation, 405, 409, 430, 434, 603 single-particle, 447 Second fluctuation dissipation theorem, 653 Second law of thermodynamics, 60–62 Second quantization, 449 Separatrix, 9 SHAKE algorithm, 106–109, 272–277 Shannon entropy, 147, 239, 290 Shear rate, 570, 586, 588, 589, 592, 593 Shooting move, 332–335, 680–682 aimless shooting, 334 Slater determinant, 425, 449, 452 SLLOD algorithm, 584 Smooth particle-mesh Ewald summation, 810–812 Spectral density, 643, 649 State function, 57, 339, 344 State vector, 400 Stationary phase method, 306 Steady state, 562 Steepest descent, 758 Steinhardt order parameters, 383, 776 Step function, Heaviside, 84, 134, 688 Stirling’s approximation, 92 Stochastic gradient descent, 758 Strain-rate tensor, 590 Strang splitting formula, 112 Structure factor, 167, 808, 810, 811 partial, 167 Sublimation curve, 696 Supercritical fluid region, 696 Support-vector expansion, 747 Susceptibility, complex definition, 623 relation to spectrum, 623–626 Suzuki-Chin factorization, 545 Swiss roll data manifold, 785 Switching function, 130, 342 Symmetry functions, 774 Symplectic condition definition, 24 harmonic oscillator, 24 integrator, 103 quaternions, 122–124 shadow Hamiltonian, 124–125 symplectic map, 124 velocity Verlet, 115 Takahashi-Imada fourth-order trace relation, 544 Temperature-accelerated sliced sampling, 389–390 Tensor product, 402

Thermal equilibrium, 86–89 Thermal reservoir, 141 Thermodynamic equilibrium, 57 Thermodynamic integration, 342–346 soft core, 346 Thermodynamic limit, 83, 85, 92, 97, 149, 287, 295, 454, 461, 475, 715 energy-level spacing, 438, 451 equivalence of ensembles, 98, 281, 452 Thermodynamic state, 57 Thermodynamic system, 57 Thermostat isokinetic, 210–214 isokinetic Nosé-Hoover chains, 214–218 Nosé, 188–190 Nosé-Hoover, 190–191, 195–197 Nosé-Hoover chain, 198–202 Nosé-Hoover Langevin, 671–672 stochastic isokinetic Nosé-Hoover, 672–676 Third law of thermodynamics, 63 Time correlation function classical, 567 calculation of, 579–583 energy, 665 Green-Kubo theory, 573, 577 harmonic oscillator, 578–579 Mori-Zwanzig theory, 686 velocity, 658 vibrational dephasing, 662–667 correlation time, 568 harmonic approximation, 619 infrared spectrum, 619–620 Onsager regression hypothesis, 569 properties of, 568–569 quantum, 614, 617 centroid molecular dynamics approximation, 631–633 coordinate-space representation, 627 eigenstate representation, 627 harmonic oscillator, 617–619 Kubo-transformed, 630–631 linear response theory, 622–625 ring polymer molecular dynamics approximation, 633–634 self-consistent test, 634 symmetrized, 628–629 random force, 653 symmetrized open chain, 637–641 Time-ordered exponential, 610 Time-ordering operator, 608 Torque, 42 Trajectory, 3 action integral, 29 central potential, 15 chaotic, 131 classical correlation function calculation, 579–583 conservation of phase-space distribution, 70

constrained system, 36 correlation time, 568–569 damped-driven oscillator, 562 driven oscillator, 561 ergodic hypothesis, 100 harmonic oscillator, 8, 561 Langevin, 669 hill potential, 9 histogram test, 390–391 instanton, 556 Jarzynski’s equality, 350–352 kink, 556 metadynamics, 384–386 numerical global error, 113 harmonic oscillator, 128 shadow Hamiltonian, 125 Verlet algorithm, 102 phase-space, 6, 21 Transfer matrix, 711, 713–715 Transformation bead-spring model, 156 block spin, 721–722, 724, 734 Carnot cycle, 60 center-of-mass, relative coordinates, 16 coordinate, 12 decimation, 734 gauge, 600 generalized coordinates, 12, 364 imaginary time, 443, 491 Legendre, 17–18 linearized renormalization group, 729 normal mode, 27, 529–530 pair cell, 734 parity, 703 path sum and difference, 638 renormalization group, 722, 724 rotation matrix, 46 scaled coordinates, 170, 247, 252 second law of thermodynamics, 62 spatial warping, 394 spherical polar coordinates, 14 staging, 527 thermodynamic, 57, 78, 343, 365 time translation on phase space, 24, 66, 192, 356 translation group, 23 Transition frequency, 610 Transition path ensemble, 331 Transition path sampling, 328–335, 680–682 Transition rate, 611 Translation group, 23 Triple point, 696 Trotter factorization, 115, 118, 126, 135, 204, 213, 264, 268, 556, 587, 669 end versus middle schemes, 204 for d-AFED/TAMD, 380 Trotter theorem, 112, 491, 815–817

Umbrella potential, 370 Umbrella sampling, 370–373 Unified free-energy dynamics, 388–389 Universality class, 185, 697 van der Waals theory critical exponents, 186 equation of state, 183–185 Helmholtz free energy, 182 partition function, 182 Vector potential, 424, 600 Velocity Verlet algorithm, 102–103, 114–116, 204, 321, 587 Verlet algorithm, 101–102 Verlet list, 805 skin length, 805 update condition, 806 Vibrational dephasing, 664–667 Virial, 171, 172 atomic, 261 coefficients, 174 energy estimator, 535, 537 equation of state, 174 heat-capacity estimator, 555 molecular, 262 pressure estimator, 539 second virial coefficient, 174, 457, 473, 482 Virial theorem classical, 83–86, 98, 101, 152, 202, 212, 227, 572 path integral, 535 pressure, 240–241 tensorial, 247 work, 241–242 Volterra equation, 690 Wang-Landau algorithm, 327–328, 373–374, 738 Wave equation, 601 Wave function, 409 Weighted histogram method, 370–373 Weighted neighbor methods, 764–767 k-nearest neighbors, 766 decision tree, 764–765 random forest, 766 Wick rotation, 491, 630 Wiener-Khintchine theorem, 691 Work distribution, 356 free-energy inequality, 343, 365 Jarzynski’s equality, 350–357, 393 mechanical, 2, 10, 21 thermodynamic, 58–62, 72, 73, 78–79, 234, 241, 247 virial theorem, 241, 247, 251, 254, 277 Zero-point energy, 440, 464