The Ashgate Companion to Contemporary Philosophy of Physics 0754655180, 9780754655183



The Ashgate Companion to Contemporary Philosophy of Physics

Edited by

Dean Rickles
Unit for History and Philosophy of Science
University of Sydney

First published 2008 by Ashgate Publishing
Published 2016 by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
711 Third Avenue, New York, NY 10017, USA

Routledge is an imprint of the Taylor & Francis Group, an informa business

Copyright © 2008 Dean Rickles

Dean Rickles has asserted his moral right under the Copyright, Designs and Patents Act, 1988, to be identified as the editor of this work.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

British Library Cataloguing in Publication Data
The Ashgate companion to contemporary philosophy of physics
1. Physics – Philosophy
I. Rickles, Dean
II. Companion to contemporary philosophy of physics
530'.01

Library of Congress Cataloging-in-Publication Data
The Ashgate companion to contemporary philosophy of physics / edited by Dean Rickles.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-7546-5518-3 (hardcover : alk. paper)
1. Physics—Philosophy. 2. Philosophy, Modern. I. Rickles, Dean.
QC6.A84 2008
530.01—dc22
2007051083

ISBN 9780754655183 (hbk)

CONTENTS

Preface
Acknowledgements

1 Advancing the Philosophy of Physics • Dean Rickles
  1.1 What is Philosophy of Physics?
  1.2 The Job of the Philosopher of Physics
    1.2.1 The Interpretation Game
    1.2.2 Experimental Philosophy
  1.3 What Parts of Physics are Amenable to Philosophical Investigation?
  1.4 Preview of the Chapters
    1.4.1 Quantum Mechanics
    1.4.2 Statistical Mechanics
    1.4.3 Quantum Information Theory
    1.4.4 Quantum Gravity
  References

2 Philosophy of Quantum Mechanics • David Wallace
  2.1 The Measurement Problem: A Modern Approach
    2.1.1 QM: Formalism and Interpretation
    2.1.2 The Measurement Problem
    2.1.3 Against the Traditional Account of QM
  2.2 Decoherence Theory
    2.2.1 The Concept of Decoherence
    2.2.2 Domains and Rates of Decoherence
    2.2.3 Sharpening Decoherence: Consistent Histories
    2.2.4 Further Reading
  2.3 Three Candidates for Orthodoxy
    2.3.1 Decoherent Histories: The Solution That Isn't
    2.3.2 The New Pragmatism
    2.3.3 The Consistent-Histories Interpretation
    2.3.4 Operationalism
    2.3.5 Further Reading
  2.4 The Everett Interpretation
    2.4.1 Multiplicity from Indefiniteness?
    2.4.2 Preferred-Basis Problem: Solve by Modification
    2.4.3 The Bare Theory: How Not to Think About the Wave Function
    2.4.4 Decoherence and the Preferred Basis
    2.4.5 Probability: The Incoherence Problem
    2.4.6 Probability: The Quantitative Problem
    2.4.7 Further Reading
  2.5 Dynamical-Collapse Theories
    2.5.1 The GRW Theory as a Paradigm of Dynamical-Collapse Theories
    2.5.2 The Problem of Tails and the Fuzzy Link
    2.5.3 The Counting Anomaly
    2.5.4 The Status of the Link Principles
    2.5.5 Further Reading
  2.6 Hidden Variable Theories
    2.6.1 Hidden Variables for Classical Physics
    2.6.2 Constraints on Hidden-Variable Theories
    2.6.3 Specific Theories I: Modal Interpretations
    2.6.4 Specific Theories II: de Broglie-Bohm Theory
    2.6.5 Underdetermination of the Theory
    2.6.6 The Origin of the Probability Rule
    2.6.7 The Ontology of the State Vector
    2.6.8 Further Reading
  2.7 Relativistic Quantum Physics
    2.7.1 What is Quantum Field Theory?
    2.7.2 Particles and Quasiparticles
    2.7.3 QFT and the Measurement Problem
    2.7.4 Further Reading
  2.8 Conclusion
  References

3 A Field Guide to Recent Work on the Foundations of Statistical Mechanics • Roman Frigg
  3.1 Introduction
    3.1.1 Statistical Mechanics—A Trailer
    3.1.2 Aspirations and Limitations
  3.2 The Boltzmann Approach
    3.2.1 The Framework
    3.2.2 The Combinatorial Argument
    3.2.3 Problems and Tasks
    3.2.4 The Ergodicity Programme
    3.2.5 The Past Hypothesis
    3.2.6 Micro-Probabilities Revisited
    3.2.7 Limitations
    3.2.8 Reductionism
  3.3 The Gibbs Approach
    3.3.1 The Gibbs Formalism
    3.3.2 Problems and Tasks
    3.3.3 Why Does Gibbs Phase Averaging Work?
    3.3.4 Ontic Probabilities in Gibbs' Theory
    3.3.5 The Approach to Equilibrium
    3.3.6 The Epistemic Approach
    3.3.7 Reductionism
  3.4 Conclusion
    3.4.1 Sins of Omission
    3.4.2 Summing Up
  Appendix
    A. Classical Mechanics
    B. Thermodynamics
  References

4 Philosophical Aspects of Quantum Information Theory • Chris Timpson
  4.1 First Steps with Quantum Information
    4.1.1 Bits and Qubits
    4.1.2 The No-Cloning Theorem
    4.1.3 Quantum Cryptography
    4.1.4 Entanglement-Assisted Communication
    4.1.5 Quantum Computers
  4.2 The Concept(s) of Information
    4.2.1 Coding Theorems: Both What and How Much
    4.2.2 Bits and Pieces
    4.2.3 The Worldliness of Quantum Information
    4.2.4 Application: Understanding Teleportation
    4.2.5 Summing Up
  4.3 The Physical Side of the Theory of Computation
    4.3.1 Speed-Up
    4.3.2 Whither the Church-Turing Hypothesis?
  4.4 Foundations of QM
    4.4.1 Instrumentalism Once More?
    4.4.2 Axiomatics
  4.5 Outlook
  4.6 Further Reading
  References

5 Quantum Gravity: A Primer for Philosophers • Dean Rickles
  5.1 The Strange Case of Quantum Gravity
  5.2 What is the Problem of Quantum Gravity?
    5.2.1 Quantum Gravity as a Theory of Everything
    5.2.2 Quantum Gravity as Quantum Cosmology
    5.2.3 Quantum Gravity as Synthesis of 'Quantum' and 'Gravitation'
  5.3 Why Bother? Arguments for Quantum Gravity
    5.3.1 Dimensional Analysis
    5.3.2 Black Holes and Spacetime Singularities
    5.3.3 Cosmological Considerations
    5.3.4 Non-Renormalizability of General Relativity
    5.3.5 Dreams of Unification
    5.3.6 Incompatibilities
    5.3.7 Gravity and the Measurement Problem
    5.3.8 For Knowledge's Sake
  5.4 A Potted History of Quantum Gravity
    5.4.1 The Early History of Quantum Gravity
    5.4.2 Developing the Research Avenues
  5.5 The Ingredients of Quantum Gravity
    5.5.1 General Relativity
    5.5.2 Quantum Field Theory
    5.5.3 Conclusions
  5.6 The Manifold Methods of Quantum Gravity
    5.6.1 Semiclassical Quantum Gravity
    5.6.2 Covariant Quantization
    5.6.3 String Theory
    5.6.4 Canonical Quantization
    5.6.5 Feynman Quantization
    5.6.6 External Methods
  5.7 Special Topics
    5.7.1 Interactions and Cross-Fertilization
    5.7.2 Background Structure
    5.7.3 The Experimental Situation
    5.7.4 Interpretation of Quantum Theory
  5.8 Final Remarks: What Will the Future Bring?
  5.9 Resources and Further Reading
  References

Index

PREFACE

A cursory flick through the very few philosophy of physics textbooks that are currently available might give one the impression that philosophers are trailing rather far behind contemporary physicists in terms of the subject matter being investigated. On the other hand, browsing through the relevant journals (Studies in the History and Philosophy of Modern Physics, The British Journal for the Philosophy of Science, and Philosophy of Science) will give one quite a different impression, with philosophers hot on the tails of some of the very latest developments in physics. In effect, the other textbooks misrepresent what modern philosophy of physics is like, and the range of problems the modern philosopher of physics is interested in.

The student of philosophy of physics is liable to be in for quite a shock in trying to make the transition from the available introductory textbooks to this very much more advanced material. There are, it seems, no textbooks on the market that cater for the middle ground between absolute beginner and consummate professional. This book is intended to plug that gap, providing a way in to the contemporary debates as featured in the aforementioned journals and research monographs. In addition to this, in dealing with more contemporary portions of physics—such as quantum computation and quantum gravity—it attempts to mend the links between philosophy and physics so that physics and philosophy stand closer together, if not quite in perfect lockstep.

Physics is a fast-moving and very exciting field of play: today there are various quasi-revolutions and other more profound revolutions in progress—e.g. complexity theory, quantum information theory, and quantum gravity. Students of philosophy of physics should be exposed to these early on in their training, and given the skills needed to make sense of them. This book's intended audience is beginning graduate (or advanced undergraduate) students of philosophy and physics.
We hope to 'turn on' the latter to philosophy of physics and the former to physics. The aim is to prime the budding philosopher of physics to as advanced a level as possible without lapsing into research-writing mode. A fair amount of mathematical sophistication is demanded in all of the chapters; this is only to be expected. However, every effort has been made to render the chapters self-contained, and useful references to enable the reader to pick up the necessary background are mentioned along the way. The reader is expected to have seen some philosophy of physics before, and some physics too. Ideally, they will also have done the usual early undergraduate roster of calculus, linear algebra, and classical mechanics.

The book could be cherry-picked for use as a one-semester course text, or stretched out over two semesters covering all of the material—in either case the book should be supplemented with additional readings. Alternatively, Chapters 2 and 3, on the Philosophy of Quantum Mechanics (by David Wallace) and the
Foundations of Statistical Mechanics (by Roman Frigg) respectively, could be used for semester-long courses devoted solely to these topics, again supplemented by additional readings.

I should point out that, at the time of writing, work has just been completed on another book in the philosophy of physics which contains detailed guides to the themes covered here (and more): the Handbook on the Philosophy of Physics (Butterfield and Earman, 2007). That book is pitched at the level of advanced researchers: (late-stage) doctoral students and beyond. I suspect, in fact, that even many professional physicists would struggle with much of the material in that book—though it does come highly recommended. The book you are holding is intended to fit in between books like Sklar's Philosophy of Physics and the aforementioned handbook, thus serving as an introduction to exactly that sort of advanced philosophy of physics. It can, then, be viewed as a stepping stone to the content that appears in the handbook, and I think it would be very useful in this respect. Given these aims, this book would also be very useful for researchers in the philosophy of science and the philosophy of physics who either have limited expertise in philosophy of physics or specialize in just one particular branch of physics.

ACKNOWLEDGEMENTS

I'd like to thank Chris, David, and Roman for agreeing to contribute to this book. I know it has taken a large chunk of time out of their lives at a stage when they had lots more to be doing—Roman deserves special thanks for his help with various editorial matters. I wish to thank Paul Coulham, formerly at Ashgate, for his kind encouragement, enthusiasm, and assistance during the initial stages of this work, and Anne Keirby (also at Ashgate), whose patience made the final stages of editing a much less stressful experience than they really ought to have been.

The actual writing of this book—and all of the other work I have done for some time now—would not have been possible without the excellent program TeXShop (written by Richard Koch and Dirk Olmes) coupled with the teTeX distribution (put together by Gerben Wierda). Also, on practical matters, I would like to thank Apple (Sophie Petkopoulos in particular), who quickly provided me with a much-upgraded replacement PowerBook when my old one developed faults.

The majority of this book was completed while I was a postdoctoral fellow at the University of Calgary; I'd like to thank both the philosophy department and the department of community health sciences for their warmth and hospitality during my stay. I should especially like to thank Elaine Landry for many vital 'tea and Hob-Nobs' sessions, and Frank Zenker and Yosh Kobasigawa for equally vital 'beer and pool' sessions.

As always, my wife, Kirsty, and children, Sophie and Gaia, kept me happy and sane during an often absurd work schedule which involved much patience on their part—special thanks to Sophie for helping with the editing. This book is for them, though they'll probably never read it!


1 ADVANCING THE PHILOSOPHY OF PHYSICS

DEAN RICKLES

There are several (general) philosophy of physics textbooks on the market.[1] An obvious question to ask at the outset, then, is: why another one? In his 'Some Philosophical Aspects of Particle Physics', Michael Redhead bemoans the common practice among philosophers of science of dealing with examples "which are no longer of current research interest in science" (1980, p. 279). This practice is still common even among many philosophers of physics, and, for the most part, the available textbooks continue to engage in it. Generally, one finds a little spacetime, a little statistical mechanics, and a little quantum theory. Now, these are of course the 'pillars' of modern physics, so they are what we should expect to be covered in any textbook on the philosophy of physics worth its salt (and, indeed, they are well represented in this book). However, the issues that are dealt with are usually very old-fashioned and very limited in scope: 'spacetime' means 'the twins paradox' (and possibly conventionalism); 'statistical mechanics' means 'time asymmetry'; and 'quantum theory' means 'the measurement problem'. Redhead is surely right that this leads physicists to "regard philosophy of science as somewhat irrelevant". Rightly so, too, if this were in fact representative of much of what actually goes on in philosophy of physics. However, the state of play as represented in philosophy of physics and (some) philosophy of science journals shows a very different level of engagement, with philosophers of physics investigating the frontiers of scientific research. This disparity (or the impression of such) is the raison d'être for the book you are now reading. The result is an introductory textbook covering those portions of philosophy of physics research that other textbooks fail to cover, or cover only very briefly or very simplistically. The three pillars of modern physics—relativity, quantum theory, and statistical mechanics—are still represented, then, but from a more advanced and contemporary perspective.

[1] For example, (d'Espagnat, 2000; Kosso, 1997; Lange, 2002; Sklar, 1992; Torretti, 1999; Cushing, 1998). These are very different books, all excellent, several of which the reader ought to be acquainted with before tackling this book. Torretti's book is especially interesting since it covers the development of philosophy of physics as a discipline in its own right, with its origins in the natural philosophy of the seventeenth century.

1.1 What is Philosophy of Physics?

A pretty good characterization of a philosopher of physics is 'one who is shunned by philosophers and physicists in near equal measure'! Philosophers often view philosophers of physics as focusing on too restricted a field and as mere lapdogs to physicists. Physicists often view philosophers of physics either as simply incapable of understanding their work, or else as tackling issues that are incapable of being resolved or that are not really in need of being resolved (i.e. irrelevant), and hence that are no aid to progress. This is, I hope you will agree, unfair to both physicists and philosophers of physics, for a variety of reasons:

• Philosophers of physics don't just accept what physicists tell them; a large part of their job is to interpret the constructions of physicists (and thus go beyond the ability of such constructions to yield accurate predictions). Another large part of the philosopher of physics' job is to assess the epistemic status of the claims made by physicists (in the context of their theories or otherwise). This part is often highly critical, involving the denial of physicists' claims. For example, it may be taken for granted by physicists that their theory comes with a particular unique ontology, a single way that the world would be if the theory were true. This is almost always false.[2] If there are multiple pictures, then it is assumed that evidence or internal, rational principles can settle the matter in favour of one or the other. Failing this, physicists might simply say that they are not in the business of offering ontologies, only of making the right predictions. This might be true for many, but it conflicts with the intuitions of most physicists about what they see themselves as doing, namely uncovering the (objective) secrets of Nature—Steven Weinberg, for example, staunchly defends such a view (see Weinberg, 1987). Philosophers of physics probe the assumptions that underlie such intuitions, and often expose erroneous ones.

[2] The problem amounts to the 'underdetermination' of the ontology by the theoretical framework. David Wallace discusses this problem in the context of the quantum measurement problem: see §2.6.5. Another example is the underdetermination that arises in the interpretation of generally relativistic spacetimes: both the view that spacetime points are substances and the view that spacetime points are material constructs are compatible with the generally covariant formalism—Rickles touches upon this issue in §5.7.2.

• Philosophers of physics have also been known to occupy more constructive rôles, contributing work on, e.g. quantum information theory, that is virtually indistinguishable from 'normal' physics—of course, many philosophers of physics began life as professional physicists, and in some cases led very distinguished careers in physics. One can often find philosophers contributing to physicists' journals, books, conferences, and workshops—indeed, one frequently finds that the philosophers' contributions are the most technical (possibly as a result of some need to prove themselves amongst the group of physicists).

• Physicists aren't just concerned with describing the actual world: they often work on circumscribing what is physically possible, and often on what is possible according to various notions of physical necessity (i.e. involving different laws of physics). However, more importantly, physicists often employ (albeit often implicitly) a great many philosophical principles—a belief
in the unity of Nature being one example; a belief in the simplicity of Nature being another. Moreover, as an examination of the history of physics quickly shows, philosophical scruples are never far away in revolutionary research: Newton, Bohr, Schrödinger, and Heisenberg were all unashamedly 'philosophical physicists'. Also, though they quite vocally said otherwise, the views of Dirac and Feynman were laced with philosophical presumptions. And let's not forget Einstein; for him, philosophical thinking freed the mind, leading to better physics:

    I fully agree with you about the significance and educational value of methodology as well as history and philosophy of science. So many people today—and even professional scientists—seem to me like someone who has seen thousands of trees but has never seen a forest. A knowledge of the historic and philosophical background gives that kind of independence from prejudices of his generation from which most scientists are suffering. This independence created by philosophical insight is—in my opinion—the mark of distinction between a mere artisan or specialist and a real seeker after truth. (Einstein, from a letter to R. A. Thornton; quoted in Howard, 2005, p. 34)

The links between philosophy and physics are, then, tight and old. There have been clear instances in the past where aspects of one have influenced aspects of the other. Results in physics have directly contributed to the demise (or at least severe weakening) of positions in philosophy, especially in metaphysics. Kant's views on the necessity of certain concepts in experience—Euclidean space and time, for example—foundered in the light of developments in physics. The reasons for the tight connections are obvious: both involve a desire to understand the fundamental nature of things. The categories investigated intersect: matter, space, time, causality, etc. If a philosophical theory concerning these categories does not accord with the world as described by physics, then it perishes. In this way physics acts as a judge presiding over cases of metaphysics. In other words, physics and philosophy of physics are not really so different (at least if we are talking about theoretical physics). This is a curious and, I think, almost unique case in which the 'philosophy of X' is very close to X itself.[3] A browse through the pages of the International Journal of Theoretical Physics, Foundations of Physics, and Studies in the History and Philosophy of Modern Physics makes it hard to tell what is physics and what is philosophy of physics. Indeed, one can often find collaborations between philosophers and physicists on both 'philosophical products' and 'physics products'. The chapters in this book respect this blurry distinction: the contemporary landscape of physics, treated here, involves considerable philosophical reflection from both camps and, likewise, contemporary philosophy of physics involves considerably more technical sophistication than previously required. The available textbooks on philosophy of physics simply fail to reflect the present situation—note that I do not say that they should; they have different aims from those of this book.[4]

[3] Think of the cases where X is 'music', 'art', 'religion', or even 'mathematics'; here the meta-study of these subjects is a long way from the subjects themselves. (Perhaps the relationship between (certain areas of) philosophy of cognitive science and its 'scientific' study comes close to the intimate relationship between philosophy of physics and physics.)

[4] Here I am excluding the magnificent recent Handbook on the Philosophy of Physics (Butterfield and Earman, 2007), which pushes the line taken here, only even more so. However, that book is not (to my mind, at least) feasibly usable as a textbook, even at the post-graduate level. Rather, it is, as I'm sure the editors would be the first to admit, a book intended to aid researchers already at the top of their game in terms of physics and philosophy of physics. That book would make excellent, and hopefully not nearly so forbidding, reading once the contents of the present book have been assimilated.

1.2 The Job of the Philosopher of Physics

There are many 'philosophy of ...' subjects, as mentioned above. Indeed, most work in philosophy is of such a sort: the application of the concepts, tools, and methods of philosophy to some specific field of enquiry or subject matter. Then there are central ways—the 'pillars of philosophy'—of investigating such things, i.e. ontological (metaphysical), logical, axiological, linguistic, epistemological, and methodological. The philosophy of physics is no different from, say, the philosophy of mind in this respect. One can apply all of the standard resources of philosophy to the subject matter of physics just as one can apply them to the mind.

1.2.1 The Interpretation Game

Arthur Fine once pondered what it is that philosophers of science do that scientists don't do (1988, p. 4): why are there philosophers of science at all? His answer, one that I agree with, boiled down to interpretation. This does not mean that scientists are not perfectly capable of such activities; Fine also rightly points out that they do engage in such debate. But they also sometimes engage in discussions of the ethics and politics of science, and venture into many areas that are deemed to lie outside the proper practice of science. The point is, there are also people whose primary areas of expertise are precisely those subjects. Philosophers of physics, to a very large extent, devote themselves to the interpretation of physical theory. Likewise, that does not mean they are not perfectly capable of doing computational work in physics; many of them do that too. Fine goes further: he would like to see philosophers of science doing empirical science. However, I don't agree with Fine that "[w]ith very few exceptions ... this divides philosophers from scientists pretty well" (ibid., p. 6). A vast number of physicists don't 'get their hands dirty' at all with empirical science. Fine's main point, however, is that there isn't really much of a difference between 'the scientific understanding of science' and 'the philosophical understanding of science'. That matches what was argued above.

What does it mean to interpret a theory? Bas van Fraassen puts it best; the interpreter will ask: "Under what conditions is this theory true?" and "What
does it say the world is like?" (1991, p. 242). The interpreter will then answer by specifying the class of worlds that make the theory true.[5] What we end up with, then, is a set of worlds that make the theory true; or, a set of possible worlds according to the theory. Thus, interpretation, according to Gordon Belot, "consists of a set of stipulations which pick out a putative ontology for the possible worlds correctly described by the theory" (1998, p. 533).

[5] Van Fraassen argues that there are two parts to this specification: (1) a syntactic part, in which the purely formal structures are specified; and (2) a semantic part, in which the 'worlds' are specified (where the worlds are models of the formal structure outlined in the first part).

Now, there are differing degrees of complexity with which interpretation has to deal. In some cases, interpreting a theory is a simple matter: the formalism seemingly maps one-to-one onto physically possible worlds. Or, there might be multiple types of possible worlds that constitute models of the theory, but that are compatible. The kinds of interpretive problems philosophers of physics are interested in arise when there are multiple incompatible possible worlds. We see this, for example, in the interpretation of quantum mechanics, where we have possible worlds (each satisfying the formal demands of quantum mechanics) with collapse and without collapse, with world-splitting and without world-splitting, and so on. Or quantum field theory with particles and without particles. Electromagnetism as a theory of fields versus potentials versus loops; as a local theory versus an action-at-a-distance theory. General relativity as a theory with fundamental spacetime points and without fundamental spacetime points. These are empirically equivalent and they satisfy the basic postulates of the theories (van Fraassen's syntactic part), but they are not equivalent simpliciter: they differ at the level of interpretation. This is, of course, the problem of underdetermination again: the representation relation between formalism (the syntactic structure of a theory) and worlds is one-to-many. We find similar underdetermination problems in all of the main pillars of physics.

The basic problem is that there is an apparent ontological difference with no empirically discernible difference: even assuming one of the worlds really did describe the actual world, there is no way we could come to know, empirically, which world that was.[6]

[6] The stage is set, at this point, for a rehearsal of the various positions in philosophy of science: instrumentalism, constructive empiricism, structural realism, etc. However, these issues will not, on the whole, play any part in this book, though the material covered will no doubt have consequences for these more traditional philosophy of science debates.

The negative energy solutions of Dirac's equation make a fine case in point here. The solutions appeared as part of the formal structure of Dirac's theory, but they weren't included in the semantics: no interpretation was given to them, and they were deemed 'surplus structure', to borrow a phrase of Michael Redhead's. However, this is a rather large amount of surplus to be carrying around, and there is no apparent reason for its existence (qua surplus) that would explain it—that is, there is no symmetry responsible for generating it, or any other noticeable mechanism. Dirac reinterpreted the negative energy solutions (or just interpreted them, in our sense) so that they left 'the realm of surplus' and entered 'the realm of the physical': they were taken to have a counterpart in the worlds that are models of the theory (i.e. that satisfy the equation). This new interpretation involved viewing the negative energy states as 'holes' that are positive (anti-electrons). These holes are 'filled' by electrons.

Of course, interpretation demands there be something to interpret. However, one of the issues faced in the chapters by Frigg (on statistical mechanics) and by Rickles (on quantum gravity) is that there is no unique, agreed-upon formal framework. We don't quite know what the theory is! Quantum mechanics too has numerous formulations, but they are equivalent.[7] Not so in the cases of statistical mechanics and quantum gravity, though the reasons are different: in the case of quantum gravity there is simply a problem concerning experimental evidence. The problem for statistical mechanics is more normative: we aren't entirely sure what it ought to be a theory of. For these and other reasons, those chapters follow a slightly different route from the others: the class of interpretations is multiplied by the number of formal frameworks. However, the issues involved are the same once we target a specific formalism.

1.2.2 Experimental Philosophy

The interpretation game seeks to work out what a theory is telling us about the world. Related to the interpretation game is the second major job of the philosopher of physics: seeing what impact this has on our conceptual scheme. The results of interpretation are often incompatible with widely held beliefs about the world, especially where space, time, and matter are concerned.[8] When the theory has a firm experimental basis, this can lead to revisions. Even without experiment, it suggests contingency in that conceptual scheme, the possibility that it might be wrong.
Abner Shimony coined the term ‘experimental metaphysics’ to describe cases where metaphysical assumptions are rendered untenable (or less likely), or else supported, by ‘real world’ experiments (see Cohen et al., 1997 and Shimony, 1993). The case he had in mind was the metaphysical and experimental status of Bell’s theorem: Bell’s inequality is violated in experiments, which demonstrate quantum correlations—or, at least, provide virtually incontrovertible evidence for them. However, Bell’s theorem is loaded with weighty metaphysical assumptions having to do with determinism, realism, and locality, and the experiments therefore back-react on a host of philosophical positions. This general methodology can readily be generalized to other cases. For example, recent work in the philosophy of spacetime physics aims to show how metaphysical positions having to do with the reality of time and change are impacted upon by particular approaches to quantum gravity (see Belot and Earman, 2001; Rickles, 2006). Classical general relativity too—with its exotic solutions involving wormholes and rotating, cylindrical universes permitting time travel—has greatly modified philosophical debate over time, change, and causation. Even the higher reaches of modal metaphysics are not immune from the impact of physics: something of a cottage industry has sprung up charting the connections between theories of possibility and interpretations of symmetries (see Rickles (2007) for a review). Along different lines, Butterfield (2004) shows how analytical mechanics is heavily laden with modal involvement.

Given its basis in physics, one could argue that this experimental brand of metaphysics offers a justification (or a concrete implementation) of the ‘revisionary metaphysics’ that Strawson (1990) argued against in favour of so-called ‘descriptive metaphysics’. For Strawson, our conceptual scheme—as elucidated by metaphysics—is not the kind of thing that changes over time; at the very least, there is a constant core based on the subject–predicate distinction. However, even this distinction is placed under pressure by discoveries in physics. Whether Strawson would have seen matters in this way is quite another thing, of course. Also, whether, and to what extent, one can really do experimental metaphysics is not something we need discuss here—certainly the cases are not always as straightforward as they first appear (the quantum gravity example is a case in point, and the case of Bell’s theorem too is hardly settled beyond all doubt). Part of the problem here is the underdetermination of interpretive pictures by the formalism and the experiments. However, it cannot be denied that much of contemporary philosophy of physics is engaged in experimental metaphysics in some way, and the contributions to this book are no exception.

7 Wallace, in this book, argues that though quantum mechanics provides a reliable algorithm for generating accurate predictions of macroscopic phenomena, it does not constitute a satisfactory physical theory: it is unable to explain why it is so successful.

8 For example, even very basic courses on the philosophy of space and time now have to take into account special relativistic considerations. The Newtonian idea of a unique ‘now’ partitioning the events of the universe into past and future is simply no longer tenable, and this is so for experimental reasons—cf. Saunders (2002).

1.3 What Parts of Physics are Amenable to Philosophical Investigation?

My preferred answer would be: any and all! Answers that say otherwise confuse what philosophers of physics do with what philosophers of science (in general) do. Philosophy of physics is concerned with the products of physics: the theories and models. Given the relatively uniform structure of these theories and models (they use roughly the same mathematical structures), it is reasonable to assume that they will all submit to a probing of their implications, for once we have such a structure to hand we can look at its possible interpretations. This is the case even when there is no experimental evidence in favour of a theory—which is precisely what we find in quantum gravity research: theories with zero experimental support that can nonetheless be interpreted in just the same way as if there were such support. In this sense, doing philosophy of physics (qua interpretation of physics) and doing theoretical physics are very similar indeed: both construct and examine possible worlds. It is up to Nature and the experimentalists (and the philosophers of science) to tell us whether we’re on to something! Taking this liberal view on board, this book is about applying philosophy to very recent advances in physics, or at least recent enough not to be included in most other philosophy of physics textbooks. Inasmuch as traditional issues are covered—and they are—they are dealt with in the context of the contemporary debate.

1.4 Preview of the Chapters

As mentioned in the introduction, the three pillars of modern physics are represented in this book: quantum theory is covered by David Wallace and Chris Timpson, statistical mechanics by Roman Frigg, and relativity by Dean Rickles. There are points of overlap between the chapters, as should be expected given that these theories are intended to describe one and the same world. A nutshell preview of the chapters follows.

1.4.1 Quantum Mechanics

Quantum mechanics generates by far the most work within philosophy of physics. It is responsible for the philosophical problem of physics: the measurement problem. Given the seriousness of this problem, and its habit of infecting numerous prima facie distinct issues, Wallace focuses his chapter squarely on it. His (very careful and finely nuanced) review concentrates on very recent work, involving decoherence, consistent histories, dynamical-collapse interpretations, hidden-variables (modal) interpretations, and neo-Everettian views. Wallace finishes by considering the state of the interpretative issues in the context of quantum field theory, where the merger with special relativity means that Lorentz invariance has to be satisfied. Overall, Wallace paints a very bleak picture of the situation in quantum mechanics, one that warrants immediate and serious attention from both physicists and philosophers. This should be seen as a challenge rather than a criticism.

1.4.2 Statistical Mechanics

Statistical mechanics, and thermal physics more generally, began with foundational issues, and these remain today. This is not surprising given the central rôle probability plays. As Roman Frigg points out in his chapter, there is still no unique formal framework of statistical mechanics to speak of; instead, there are many competing approaches. Indeed, Jos Uffink describes the foundations of statistical physics as “a battlefield between a dozen or so different schools, each firmly dug into their own trenches” (2004, p. 6). However, the overarching aim is the same: to provide an account of how macrophysical phenomena (the properties and behaviour of ‘unit complex systems’) are related to microphysical phenomena (the properties and behaviour of the individual subunits contained in the unit). The statistical element arises since there are enormous numbers of subunits per unit. A central problem is trying to make sense of the connection given the apparently very different laws obeyed by units and subunits respectively. Frigg critically examines the details of the various approaches, highlighting those assumptions and implications of philosophical significance, especially those linked to the notions of equilibrium and non-equilibrium.

1.4.3 Quantum Information Theory

Quantum information theory does not break radically new ground, conceptually speaking. What it does do, however, is provide a new window onto the conceptual problems of quantum theory. Moreover, it forces us to better understand the nature of classical theory and the ways in which classical and quantum differ. This question is not pursued purely for philosophical interest; rather, it is pushed in a bid to exploit the quantum world. An underlying hope, however, is that in gaining greater physical insight into the quantum mysteries, one might thereby be able to resolve the philosophical problems facing quantum mechanics. Timpson discusses these issues, giving a sober assessment of many of the grandiose claims that have been made so far, preferring to draw inferences from quantum information theory about the structure of quantum theory rather than about metaphysical interpretive issues.

1.4.4 Quantum Gravity

Quantum gravity does not yet denote a specific theory or approach; instead there are many competing research programmes battling it out to earn that title. An utter lack of experimental evidence is at the root of this proliferation, itself due to the minuscule distances at which quantum gravity is expected to make itself manifest. Given that quantum gravity ought minimally to include general relativity and quantum theory (in the appropriate limits), we face the problems of these ‘ingredient’ theories, and then some. There are many novel conceptual problems brought about by the merger of quantum theory and gravity, including, not surprisingly, problems having to do with the nature of space, time, and change. Rickles introduces the basic ideas of quantum gravity and the various research programmes, and exposes a number of these problems.

These chapters make it clear that there is plenty of work to be done in philosophy of physics. Recent developments in physics have not resolved the conceptual problems that plagued the theories in their earliest days. We do have new ways to make sense of the problems, and the problems themselves are more narrowly circumscribed; however, in many ways we now face the same set of problems that faced earlier generations. It is hoped that this book will spark a new generation of philosophers of physics who might one day resolve the problems raised in these pages once and for all.

***

As Roberto Torretti’s book (1999) makes wonderfully clear, the philosophy of physics is an evolving discipline. It was once, in Newton’s day, so tightly woven into the fabric of physics as to be virtually one and the same thing. At the beginning of the twentieth century the two drew close again, with little to distinguish philosophical discussion of physics from physics itself. These periods of proximity were sparked off by revolutionary episodes in physics, episodes that made the foundations feel not quite so sturdy.9 Though many physicists try to ignore it, the foundations of physics never quite recovered from these blows. As Carlo Rovelli puts it, the revolution was “incomplete” (2000). Philosophers are left with the mess of trying to piece together the various fragments of the theoretical edifice. But we are left with many puzzles and, given the conceptual nature of many of these, it will take the efforts of both physicists and philosophers of physics to solve them.

9 For example, Kuhn wrote that “It is no accident that the emergence of Newtonian physics in the seventeenth century and of relativity and quantum mechanics in the twentieth should have been both preceded and accompanied by fundamental philosophical analyses of the contemporary research tradition.” (Kuhn, 1970, p. 88). A little further on, he writes: “Confronted with anomaly or with crisis, scientists take a different attitude toward existing paradigms, and the nature of their research changes accordingly. The proliferation of competing articulations, the willingness to try anything, the expression of explicit discontent, the recourse to philosophy and the debate over fundamentals, all these are symptoms of a transition from normal to extraordinary research.” (Kuhn, 1970, pp. 90–1)

REFERENCES

Belot, G. (1998). Understanding electromagnetism. The British Journal for the Philosophy of Science, 49: 531–55.
Belot, G. and J. Earman (2001). Pre-Socratic quantum gravity. In C. Callender and N. Huggett, eds, Physics Meets Philosophy at the Planck Scale, pp. 213–55. Cambridge: Cambridge University Press.
Butterfield, J. and J. Earman, eds (2007). Philosophy of Physics, Vol. 2 of Handbook of the Philosophy of Science. Elsevier Science Publishing Co.
Butterfield, J. (2004). Some aspects of modality in analytical mechanics. In P. Weingartner and M. Stoeltzner, eds, Formale Teleologie und Kausalität in der Physik, pp. 160–98. Mentis.
Cohen, R. S., M. Horne and J. S. Stachel, eds (1997). Experimental Metaphysics: Quantum Mechanical Studies for Abner Shimony, Vol. 1. Springer.
Cushing, J. T. (1998). Philosophical Concepts in Physics: The Historical Relation between Philosophy and Scientific Theories. Cambridge: Cambridge University Press.
Cushing, J. T. (1988). Foundational problems in and methodological lessons from quantum field theory. In H. Brown and R. Harré, eds, Philosophical Foundations of Quantum Field Theory, pp. 25–39. Oxford: Clarendon Press.
d’Espagnat, B. (2000). On Physics and Philosophy. Princeton University Press.
Fine, A. (1988). Interpreting science. Philosophy of Science (Proceedings), 2, Symposia and Invited Papers: 3–11.
Howard, D. (2005). Albert Einstein as a philosopher of science. Physics Today, December: 34–40.
Kosso, P. (1997). Appearance and Reality: An Introduction to the Philosophy of Physics. Oxford: Oxford University Press.
Lange, M. (2002). An Introduction to the Philosophy of Physics: Locality, Fields, Energy and Mass. Oxford: Basil Blackwell.
Redhead, M. L. G. (1980). Some philosophical aspects of particle physics. Studies in History and Philosophy of Science, 11(4): 279–304.
Rickles, D. (2006). Time and structure in canonical gravity. In D. Rickles, S. French, and J. Saatsi, eds, The Structural Foundations of Quantum Gravity, pp. 152–95.
Oxford: Oxford University Press.
Rickles, D. (2007). Symmetry, Structure and Spacetime. Philosophy and Foundations of Physics. Elsevier.
Rovelli, C. (2000). The century of the incomplete revolution: Searching for general relativistic quantum field theory. Journal of Mathematical Physics, 41(6): 3776–800.
Saunders, S. (2002). How relativity contradicts presentism. In C. Callender, ed, Time, Reality & Experience, pp. 277–92. Cambridge: Cambridge University Press.
Shimony, A. (1993). Experimental test of local hidden-variables theories. In A. Shimony, ed, Search for a Naturalistic World View, Vol. II, pp. 77–89. Cambridge: Cambridge University Press.
Sklar, L. (1992). Philosophy of Physics. Westview Press.
Strawson, P. F. (1990). Individuals: An Essay in Descriptive Metaphysics. London: Routledge.
Torretti, R. (1999). The Philosophy of Physics. Cambridge: Cambridge University Press.
Uffink, J. (2004). Boltzmann’s work in statistical physics. In E. N. Zalta, ed, Stanford Encyclopedia of Philosophy, http://plato.stanford.edu/entries/statphys-Boltzmann/.
van Fraassen, B. (1991). Quantum Mechanics: An Empiricist View. Oxford: Oxford University Press.
Weinberg, S. (1987). Towards the final laws of physics. In Elementary Particles and the Laws of Physics: The 1986 Dirac Memorial Lectures. Cambridge: Cambridge University Press.

2 PHILOSOPHY OF QUANTUM MECHANICS

DAVID WALLACE

Introduction

By some measures, quantum mechanics (QM) is the great success story of modern physics: no other physical theory has come close to the range and accuracy of its predictions and explanations. By other measures, it is instead the great scandal of physics: despite these amazing successes, we have no satisfactory physical theory at all—only an ill-defined heuristic which makes unacceptable reference to primitives such as ‘measurement’, ‘observer’ and even ‘consciousness’. This is the measurement problem, and it dominates philosophy of quantum mechanics.

The great bulk of philosophical work on quantum theory over the last half-century has been concerned either with the strengths and weaknesses of particular interpretations of QM—that is, of particular proposed solutions to the measurement problem—or with general constraints on interpretations. Even questions which are notionally not connected to the measurement problem are hard to disentangle from it: one cannot long discuss the ontology of the wavefunction,1 or the nature of locality in relativistic quantum physics, without having to make commitments which rule out one interpretation or another. So I make no apologies that this review of “the philosophy of quantum mechanics” is focused sharply on the measurement problem. §2.1 sets up the problem from a modern perspective; §2.2 is a self-contained discussion of the phenomenon of decoherence, which has played a major part in physicists’ recent writings on the measurement problem. In §2.3–2.6 I discuss the main approaches to solving the measurement problem currently in vogue: modern versions of the Copenhagen interpretation; the Everett interpretation; dynamical collapse; and hidden variables. Finally, in §2.7 I generalise the discussion beyond non-relativistic physics. I give a self-contained, non-mathematical introduction to quantum field theory, discuss some of its conceptual problems, and draw some conclusions for the measurement problem.

2.1 The Measurement Problem: A Modern Approach

The goal of this section is a clean statement of what the measurement problem actually is. Roughly speaking, my statement will be that QM provides a very effective algorithm to predict macroscopic phenomena (including the results of measurements which purportedly record microscopic phenomena) but that it does not provide a satisfactorily formulated physical theory which explains the success of this algorithm. We begin by formulating this algorithm.

1 Here and afterwards I follow the physicists’ standard usage by using “wavefunction” to refer, as appropriate, either to the putatively physical entity which evolves according to the Schrödinger equation or to the complex-valued function which represents it mathematically. I adopt a similar convention for “state vector”.

2.1.1 QM: Formalism and Interpretation

To specify a quantum system, we have to give three things:
1. A Hilbert space $\mathcal{H}$, whose normalised vectors represent the possible states of that system.2
2. Some additional structure on $\mathcal{H}$ (all Hilbert spaces of the same dimension are isomorphic, so we need additional structure in order to describe specific systems). The additional structure is given by one or both of
   • certain preferred operators on Hilbert space (or, certain preferred sets of basis vectors);
   • a preferred decomposition of the system into subsystems.
3. A dynamics on $\mathcal{H}$: a set of unitary transformations which take a state at one time to the state it evolves into at other times. (Normally the dynamics is specified by the Hamiltonian, the self-adjoint operator which generates the unitary transformations.)
For instance:
1. Non-relativistic one-particle QM is usually specified by picking a particular triple of operators and designating them as the position operators, or equivalently by picking a particular representation of states as functions on $\mathbb{R}^3$ and designating it as the configuration-space representation. More abstractly, it is sometimes specified by designating pairs of operators $(\hat{Q}^i, \hat{P}_i)$ (required to satisfy the usual commutation relations) as being the position and momentum observables.
2. In quantum computation a certain decomposition of the global Hilbert space into 2-dimensional component spaces is designated as giving the Hilbert spaces of individual qubits (normally taken to have spatially definite locations); sometimes a particular basis for each qubit is also designated as the basis in which measurements are made.
3. In quantum field theory (described algebraically) a map is specified from spatial regions to subalgebras of the operator algebras of the space, so that the operators associated with region $R$ are designated as representing the observables localised in $R$. At least formally, this can be regarded as defining a component space comprising the degrees of freedom at $R$.
(Note that, although the specification of a quantum system is a bit rough-and-ready, the quantum systems themselves have perfectly precise mathematical formalisms.)

2 More accurately: whose rays—that is, equivalence classes of normalised vectors under phase transformations—represent the possible states of the system.
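The point of footnote 2—that physical states are rays rather than vectors, since a global phase change makes no empirical difference—can be checked in a few lines. The two-level state and projector below are invented purely for illustration.

```python
import numpy as np

# A made-up normalised state of a 2-level system, and the projector onto |0>.
psi = np.array([1.0, 1.0j]) / np.sqrt(2.0)
P0 = np.diag([1.0, 0.0]).astype(complex)

def born_prob(state, projector):
    """Born-rule probability <psi|P|psi> (the imaginary part is zero)."""
    return float(np.real(np.conj(state) @ projector @ state))

# Multiplying by a global phase gives a different vector on the same ray...
phi = np.exp(0.7j) * psi

# ...and exactly the same outcome probabilities, whatever the projector.
assert np.isclose(born_prob(psi, P0), born_prob(phi, P0))
print(born_prob(psi, P0))  # 0.5 for this particular state
```

Nothing here is specific to two dimensions: the phase factor cancels in every expression of the form ⟨ψ|Â|ψ⟩, which is why only the ray is physically significant.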

I shall use the term bare quantum formalism to refer to a quantum theory picked out in these terms, prior to any notion of probability, measurement, etc. Most actual calculations done with quantum theory—in particle physics, condensed matter physics, quantum chemistry, etc.—can be characterised as calculations of certain mathematical properties of the bare quantum formalism (the expectation values of certain functions of the dynamical variables, in the majority of cases).

Traditionally, we extract empirical content from the bare formalism via some notion of measurement. The standard “textbook” way to do this is to associate measurements with self-adjoint operators: if $|\psi\rangle$ is the state of the system, $\hat{M}$ is the operator associated with some measurement, and

$$\hat{M} = \sum_i m_i \hat{P}_i \qquad (2.1)$$

is $\hat{M}$’s spectral decomposition into projectors, then the probability of getting result $m_i$ from the measurement is $\langle\psi|\hat{P}_i|\psi\rangle$.

Now the really important thing here is the set of projectors $\hat{P}_i$, and not $\hat{M}$ itself: if we associate the measurement with $f(\hat{M})$, which has spectral decomposition

$$f(\hat{M}) = \sum_i f(m_i) \hat{P}_i \qquad (2.2)$$
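Equations (2.1)–(2.2) can be illustrated numerically; the observable, state, and function f below are invented for the purpose. The last part of the block also previews the POVM generalisation discussed next, using a standard three-outcome “trine” POVM on a qubit.

```python
import numpy as np

# A made-up 2x2 self-adjoint "observable" and a normalised state.
M = np.array([[1.0, 1.0], [1.0, -1.0]])
psi = np.array([1.0, 0.0])

# Spectral decomposition M = sum_i m_i P_i, as in eq. (2.1).
evals, evecs = np.linalg.eigh(M)
projectors = [np.outer(evecs[:, i], evecs[:, i]) for i in range(2)]
probs = [float(psi @ P @ psi) for P in projectors]   # <psi|P_i|psi>
assert np.isclose(sum(probs), 1.0)

# f(M) = sum_i f(m_i) P_i, as in eq. (2.2): for injective f the projectors
# are unchanged, so the statistics are too -- only the labels f(m_i) differ.
fM = sum(m**3 * P for m, P in zip(evals, projectors))
_, f_evecs = np.linalg.eigh(fM)
f_projectors = [np.outer(f_evecs[:, i], f_evecs[:, i]) for i in range(2)]
for P, Q in zip(projectors, f_projectors):
    assert np.allclose(P, Q)

# A POVM that is not a PVM: three "trine" effects E_k = (2/3)|v_k><v_k|.
# They are positive and sum to the identity, but none is a projector.
effects = []
for t in [0.0, 2 * np.pi / 3, 4 * np.pi / 3]:
    v = np.array([np.cos(t / 2), np.sin(t / 2)])
    effects.append((2.0 / 3.0) * np.outer(v, v))
assert np.allclose(sum(effects), np.eye(2))
assert np.isclose(sum(float(psi @ E @ psi) for E in effects), 1.0)
```

The trine example makes the text’s point concrete: the three effects give well-defined probabilities summing to one, yet there is no associated set of orthogonal projectors, so no single self-adjoint “observable” of the textbook kind underlies the measurement.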

then the only difference is that the different possible measurement outcomes are labelled differently. Hence it has become normal to call this sort of measurement a projection-valued measurement, or PVM.

It has become widely accepted that PVMs are not adequate to represent all realistic sorts of measurement. In the more general positive-operator-valued measurement (POVM) formalism (see, e.g., Peres, 1995, pp. 282–9 and Nielsen and Chuang, 2000, pp. 90–2) a measurement process is associated with a family of positive operators $\{\hat{E}_1, \ldots, \hat{E}_n\}$. Each possible outcome of the measurement is associated with one of the $\hat{E}_i$, and the probability of getting result $i$ when a system in state $|\psi\rangle$ is measured is $\langle\psi|\hat{E}_i|\psi\rangle$. PVMs are a special case, obtained only when each of the $\hat{E}_i$ is a projector. This framework has proved extremely powerful in analyzing actual measurements (see, for instance, the POVM account of the Stern–Gerlach experiment given by Busch et al., 1996).

How do we establish which PVM or POVM should be associated with a particular measurement? There are a variety of more-or-less ad hoc methods used in practice, e.g.:
1. In non-relativistic particle mechanics we assume that the probability of finding the system in a given spatial region $R$ is given by the usual formula
$$\Pr(x \in R) = \int_R |\psi|^2. \qquad (2.3)$$
2. In high-energy particle physics, if the system is in a state of definite particle number and has momentum-space expansion
$$|\psi\rangle = \int \mathrm{d}^3k\, \alpha(k)\, |k\rangle \qquad (2.4)$$
then we assume that the probability of finding its momentum in the vicinity of some $k$ is proportional to $|\alpha(k)|^2$.
3. Again in non-relativistic particle mechanics, if we are making a joint measurement of position and momentum then we take the probability of finding the system in the vicinity of some phase-space point $(q, p)$ to be given by one of the various “phase-space POVMs” (Busch et al., 1996).

But “measurement” is a physical process, not an unanalyzable primitive, and physicists routinely apply the formalism of quantum physics to the analysis of measurements themselves. Here we encounter a regress, though: if we have to construct a quantum-mechanical description of measurement, how do we extract empirical content from that description? In actual physics, the answer is: the regress ends when the measurement process has been magnified up to have macroscopically large consequences. That is: if we have some microscopic system in some superposed state, then the empirical content of that state is in principle determined by careful analysis of the measurement process applied to it. If the superposition is between macroscopically different states, however, we may directly read empirical content from it: a system in state

$$\alpha\,|\text{Macroscopic state 1}\rangle + \beta\,|\text{Macroscopic state 2}\rangle \qquad (2.5)$$

is interpreted, directly, as having probability $|\alpha|^2$ of being found in macroscopic state 1.

Let us get a little more precise about this.
1. We identify some of the system’s dynamical variables (that is, some of its self-adjoint operators) somehow as being the positions $\hat{Q}^i$ and momenta $\hat{P}_i$ of some macroscopic degrees of freedom of the system. For instance, for a simple system such as a macroscopic pointer, the centre-of-mass position and conjugate momentum of the system will suffice. For something more complicated (such as a fluid) we normally take the macroscopic degrees of freedom to be the density of the fluid averaged over spatial regions large compared to atomic scales but small compared to macroscopic ones.
2. We decompose the Hilbert space of the system into a component space $\mathcal{H}_{\mathrm{macro}}$ described by these macroscopic variables, and a space $\mathcal{H}_{\mathrm{micro}}$ for the remaining degrees of freedom:
$$\mathcal{H} = \mathcal{H}_{\mathrm{macro}} \otimes \mathcal{H}_{\mathrm{micro}}. \qquad (2.6)$$
3. We construct wave-packet states $|q^i, p_i\rangle$ in $\mathcal{H}_{\mathrm{macro}}$—Gaussian states, fairly localised around particular values $(q^i, p_i)$ of $\hat{Q}^i$ and $\hat{P}_i$. These are the states which physicists in practice regard as “macroscopically definite”: that is, located at the phase-space point $(q^i, p_i)$. (We leave aside the conceptual problems with regarding them thus: for now, we are interested in explicating only the pragmatic method used to extract empirical content from QM.)
4. Next, we expand the state in terms of them:
$$|\psi\rangle = \int \mathrm{d}p_i\, \mathrm{d}q^i\, \alpha(q^i, p_i)\, |q^i, p_i\rangle \otimes |\psi(q^i, p_i)\rangle. \qquad (2.7)$$
5. We regard $|\psi\rangle$, expanded thus, as a probabilistic mixture. That is, we take the probability density of finding the system’s macroscopic variables to be in the vicinity of $(q^i, p_i)$ to be $|\alpha(q^i, p_i)|^2$. Or to be (slightly) more exact, we take the probability of finding the system’s macroscopic variables to be in some reasonably large set $V$ to be
$$\int_V \mathrm{d}p_i\, \mathrm{d}q^i\, |\alpha(q^i, p_i)|^2. \qquad (2.8)$$
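The probabilistic reading in the final step—treating $|\alpha|^2$ as a density and integrating it over a set $V$, as in eq. (2.8)—can be sketched in one dimension. The Gaussian wavefunction and the region below are invented for illustration.

```python
import numpy as np

# A made-up Gaussian wavefunction on a 1-D grid, normalised numerically.
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
x0, sigma = 2.0, 1.0
psi = np.exp(-(x - x0) ** 2 / (4 * sigma ** 2))
psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * dx)

# Probability of finding the variable in V = [0, 4]: the discretised
# analogue of integrating the squared amplitude over V, as in eq. (2.8).
in_V = (x >= 0.0) & (x <= 4.0)
prob_V = float(np.sum(np.abs(psi[in_V]) ** 2) * dx)

assert np.isclose(np.sum(np.abs(psi) ** 2) * dx, 1.0)  # total probability 1
assert 0.95 < prob_V < 0.96   # V is a two-sigma interval: prob ~ 0.954
```

The same packet illustrates the point made later about “macroscopic definiteness”: the amplitude vanishes nowhere on the grid, yet the packet is overwhelmingly concentrated in a small region, which is all the pragmatic algorithm requires.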

We might call this the Quantum Algorithm. Empirical results are extracted from the Bare Quantum Formalism by applying the Quantum Algorithm to it.

2.1.2 The Measurement Problem

The Bare Quantum Formalism (for any given theory) is an elegant piece of mathematics; the Quantum Algorithm is an ill-defined and unattractive mess. And this is the Measurement Problem.

The Measurement Problem: Applying the Quantum Algorithm to the Bare Quantum Formalism produces extremely accurate predictions about macroscopic phenomena: from the results of measurement processes to the boiling points of liquids. But we have no satisfactorily formulated scientific theory which reproduces those predictions.

A solution of the measurement problem, then, is a satisfactorily formulated scientific theory (“satisfactorily formulated”, that is, relative to your preferred philosophy of science) from which we can explain why the Quantum Algorithm appears to be correct. Most such solutions do so by providing theories from which we can prove that the Algorithm is correct, at least in the vast majority of experimental situations. There is no requirement here that different solutions are empirically indistinguishable; two solutions may differ from one another, and from the predictions of the Algorithm, in some exotic and so-far-unexplored experimental regime. (Why call it the measurement problem? Because traditionally it has been the measurement process which has been taken as the source of macroscopic superpositions, and because only when we have such superpositions do we have any need to apply the Quantum Algorithm. But processes other than formal measurements—the amplification of classical chaos into quantum-mechanical indeterminateness, in particular—can also give rise to macroscopic superpositions.)


Solutions of the measurement problem are often called “interpretations of QM”, the idea being that all such “interpretations” agree on the formalism and thus on the experimental predictions. But in fact, different proposed solutions of the measurement problem are often different physical theories with different formalisms. Where possible, then, I avoid using “interpretation” in this way (though often tradition makes it unavoidable). There is, however, a genuinely interesting distinction between those proposed solutions which do, and those which do not, modify the formalism. It will be helpful to make the following definition: a pure interpretation is a (proposed) solution of the measurement problem which has no mathematical formalism other than the Bare Quantum Formalism. Proposed solutions which are not pure interpretations I call modificatory: a modificatory solution either adds to the bare formalism, or modifies it (by changing the dynamics, for instance), or in principle eliminates it altogether.

2.1.3 Against the Traditional Account of QM

There is a more traditional way to formulate QM, which goes something like this:
1. A quantum system is represented by a vector $|\psi\rangle$ in a Hilbert space $\mathcal{H}$.
2. Properties of the system are represented by projectors on $\mathcal{H}$.
3. If and only if $|\psi\rangle$ is an eigenstate of some projector, the system possesses the property associated with that projector; otherwise the value is ‘indefinite’ or ‘indeterminate’ or somesuch. (The ‘eigenvalue–eigenvector link’.)
4. A measurement of some property associated with projector $\hat{P}$ will find that property to be possessed by the system with probability $\langle\psi|\hat{P}|\psi\rangle$.

From this perspective, the “measurement problem” is the problem of understanding what ‘indefinite’ or ‘indeterminate’ property possession means (or modifying the theory so as to remove it) and of reconciling the indefiniteness with the definite, probabilistically determined results of quantum measurements.

However, this “traditional account” is not an “interpretation-neutral” way of stating the basic assumptions of QM; it is a false friend. Primarily, this is because it fails to give a good account of how physicists in practice apply QM: it assumes that measurements can be treated as PVMs, whereas, as we have seen, it is now generally accepted that many practical measurement processes are best understood via the more general POVM formalism.

This is particularly clear where continuous variables are concerned—that is, where almost all the quantities measured in practice are concerned. Here, physicists will normally regard a system as “localised” at some particular value of some continuous variable—position, usually—if its wavefunction is strongly peaked around that value. The fact that the wavefunction strictly speaking vanishes nowhere does not seem to bother them. In particular, measurements frequently measure continuous variables, and frequently output the result using further continuous variables (such as a pointer position). The practical criterion for such measurements is that if the system being measured is localised in the vicinity of $x$, the pointer displaying the result of the measurement should end up localised near whatever pointer position is supposed to display “$x$”. This is straightforwardly represented via a POVM, but there is no natural way to understand it in terms of projections and the properties which they are supposed to represent.

Independent of recent progress in physics, there are reasons internal to philosophy of QM to be skeptical about the traditional account. As we shall see, very few mainstream interpretations of QM fit this framework: mostly they either treat the wavefunction as a physical thing (whose “properties” are then any properties at all of that thing, not just the property of being an eigenstate of some particular operator); or they associate physical properties to some additional “hidden variables”; or they deny that the system has observer-independent properties at all. One of the recurring themes of this chapter will be that the traditional account, having been decisively rejected in the practice of physicists, should likewise be discarded by philosophers: it distorts the philosophy of QM, forcing interpretations into Procrustean beds and encouraging wild metaphysics.

2.2 Decoherence Theory

Quite apart from its conceptual weaknesses, it is prima facie surprising that the Quantum Algorithm is well-defined enough to give any determinate predictions at all. For the division between ‘macroscopic’ and ‘microscopic’ degrees of freedom, essential to its statement, was defined with enormous vagueness. Over how large a spatial region must we average to get macroscopic density? $10^{-5}$ m? $10^{-4}$ m? Fortunately, it is now fairly well understood how to think about this question, thanks to one of the most important quantum-foundational developments of recent years: decoherence theory.
2.2.1 The Concept of Decoherence

Suppose we have some unitarily-evolving quantum system, with Hilbert space H, and consider some decomposition of the system into component subsystems:

    H = H_sys ⊗ H_env,    (2.9)

which we will refer to as the system and the environment. Now suppose that {|α⟩} is some (not-necessarily-orthogonal) basis of H_sys and that the dynamics of the joint system is such that, if we prepare it in a product state

    |α⟩ ⊗ |ψ⟩    (2.10)

then it evolves rapidly into another pure state

    |α⟩ ⊗ |ψ; α⟩    (2.11)

with ⟨ψ; α|ψ; β⟩ ≃ δ(α − β). (Here, “rapidly” means rapidly relative to other relevant dynamical timescales.) In other words, we suppose that the environment measures the system in the {|α⟩} basis and records the result.
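To make the definition concrete, here is a minimal numerical sketch (my own illustration, not the chapter's) of the simplest possible recording interaction: a system qubit whose pointer basis {|0⟩, |1⟩} is copied onto a one-qubit “environment” by a CNOT gate. The choice of CNOT as the recording dynamics, and all variable names, are illustrative assumptions.

```python
import numpy as np

# One "system" qubit coupled to one "environment" qubit by a CNOT-style
# interaction: |alpha> (x) |psi>  ->  |alpha> (x) |psi; alpha>, as in (2.10)-(2.11).

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
psi_env = ket0                                  # initial environment state |psi>

# The two record states |psi; alpha> produced by the interaction:
rec0 = (CNOT @ np.kron(ket0, psi_env)).reshape(2, 2)[0]   # record for alpha = 0
rec1 = (CNOT @ np.kron(ket1, psi_env)).reshape(2, 2)[1]   # record for alpha = 1
assert np.isclose(np.vdot(rec0, rec1), 0)       # <psi;0|psi;1> = 0: a perfect record

# A system superposition becomes entangled with the environment, and the
# system's reduced density operator is diagonal in the pointer basis:
sys = (ket0 + ket1) / np.sqrt(2)
joint = (CNOT @ np.kron(sys, psi_env)).reshape(2, 2)
rho_sys = joint @ joint.conj().T                # partial trace over the environment
assert np.allclose(rho_sys, np.diag([0.5, 0.5]))
```

Here the records are exactly orthogonal; in realistic models they are only approximately so, on the decoherence timescale discussed below.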
Suppose further that this “recording” is reasonably robust, so that subsequent system-environment interactions do not tend to erase it: that is, we don’t get evolutions like

    λ₁ |α₁⟩ ⊗ |ψ; α₁⟩ + λ₂ |α₂⟩ ⊗ |ψ; α₂⟩ −→ |φ⟩ ⊗ |χ⟩.    (2.12)

In this (loosely-defined) situation, we say that the environment decoheres the system, and that the basis {|α⟩} is a preferred basis or pointer basis. The timescale on which the recording of the system’s state occurs is called the decoherence timescale.

Much follows from decoherence. The most obvious effects are synchronic (or at least, have a consequence which may be expressed synchronically): the system cannot stably be prepared in superpositions of pointer-basis states. Such superpositions very rapidly become entangled with the environment. Conversely, if the system is prepared in a pointer-basis state, it will remain stably in that pointer-basis state (at least for times long compared to the decoherence timescale). Equivalently, the density operator of the system, when expressed in the pointer basis, will be diagonal or nearly so.

However, the more important consequence is diachronic. If the environment is keeping the density operator almost diagonal, then interference terms between elements of the pointer basis must be being very rapidly suppressed, and the evolution is effectively quasi-classical. To see this more clearly, suppose that the dynamics of the system is such that after time t, we have the evolution

    |α₁⟩ ⊗ |ψ₁⟩ −→ |Λ₁⟩ = λ₁₁ |α₁⟩ ⊗ |ψ₁₁⟩ + λ₁₂ |α₂⟩ ⊗ |ψ₁₂⟩;    (2.13)

    |α₂⟩ ⊗ |ψ₂⟩ −→ |Λ₂⟩ = λ₂₁ |α₁⟩ ⊗ |ψ₂₁⟩ + λ₂₂ |α₂⟩ ⊗ |ψ₂₂⟩.    (2.14)

By linearity, the superposition

    |Ψ⟩ = μ₁ |α₁⟩ ⊗ |ψ₁⟩ + μ₂ |α₂⟩ ⊗ |ψ₂⟩    (2.15)

evolves in the same time to

    μ₁(λ₁₁ |α₁⟩ ⊗ |ψ₁₁⟩ + λ₁₂ |α₂⟩ ⊗ |ψ₁₂⟩) + μ₂(λ₂₁ |α₁⟩ ⊗ |ψ₂₁⟩ + λ₂₂ |α₂⟩ ⊗ |ψ₂₂⟩)
    = |α₁⟩ ⊗ (μ₁λ₁₁ |ψ₁₁⟩ + μ₂λ₂₁ |ψ₂₁⟩) + |α₂⟩ ⊗ (μ₁λ₁₂ |ψ₁₂⟩ + μ₂λ₂₂ |ψ₂₂⟩).    (2.16)

Now, suppose that we want to interpret the states |Ψ⟩, |Λ₁⟩ and |Λ₂⟩ probabilistically with respect to the {|α⟩}—for example, in |Ψ⟩ we want to interpret |μ₁|² as the probability of finding the system in state |α₁⟩. Generally speaking, interference makes this impossible: (2.13) and (2.14) would entail that if the joint system is initially in state |αᵢ⟩ ⊗ |ψᵢ⟩, after time t there is probability |λᵢ₁|² of finding the system in state |α₁⟩. Applying the probabilistic interpretation to
|Ψ⟩ tells us that the joint system initially has probability |μᵢ|² of indeed being initially in state |αᵢ⟩ ⊗ |ψᵢ⟩, and hence that the system has probability

    P = |μ₁|² |λ₁₁|² + |μ₂|² |λ₂₁|²    (2.17)

of being found in |α₁⟩ after time t. But if we apply the probabilistic interpretation directly to (2.16), we get a contradictory result:

    P′ = |μ₁|² |λ₁₁|² + |μ₂|² |λ₂₁|² + 2 Re(μ₁* λ₁₁* μ₂ λ₂₁ ⟨ψ₁₁|ψ₂₁⟩).    (2.18)
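The relationship between (2.17) and (2.18) is easy to check numerically. The following sketch (my own illustration; the particular coefficient values are arbitrary) computes P′ as the squared norm of the |α₁⟩ component of (2.16), and confirms that it differs from P by exactly the interference term, which vanishes when the environment records are orthogonal:

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_state(d):
    """Random normalised vector in C^d, standing in for an environment record."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

d = 8                                # dimension of the environment Hilbert space
mu1, mu2 = 0.6, 0.8j                 # |mu1|^2 + |mu2|^2 = 1
lam11, lam21 = 0.3 + 0.4j, 0.5       # amplitudes onto |alpha_1> in (2.13)/(2.14)

# Probability (2.17): treat |Psi> as a mere mixture, ignoring interference.
P = abs(mu1)**2 * abs(lam11)**2 + abs(mu2)**2 * abs(lam21)**2

def P_prime(psi11, psi21):
    """Probability (2.18): norm^2 of the |alpha_1> component of (2.16)."""
    return np.linalg.norm(mu1 * lam11 * psi11 + mu2 * lam21 * psi21)**2

# Generic environment records: the interference term need not vanish.
psi11, psi21 = rand_state(d), rand_state(d)
interference = 2 * np.real(np.conj(mu1 * lam11) * mu2 * lam21 * np.vdot(psi11, psi21))
assert np.isclose(P_prime(psi11, psi21), P + interference)

# Orthogonal records (decoherence): P and P' agree exactly.
psi11 = np.zeros(d, complex); psi11[0] = 1.0
psi21 = np.zeros(d, complex); psi21[1] = 1.0
assert np.isclose(P_prime(psi11, psi21), P)
```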

Crucially, though, the contradiction is eliminated and we get the same result in both cases (irrespective of the precise values of the coefficients) provided that ⟨ψ₁₁|ψ₂₁⟩ = 0. And this is exactly what decoherence, approximately speaking, guarantees: the states |ψ₁₁⟩ and |ψ₂₁⟩ are approximately-orthogonal records of the distinct states of the system in the original superposition. So: we conclude that in the presence of decoherence, and provided that we are interested only in the state of the system and not of the environment, it is impossible to distinguish between a superposition of states like |α⟩ ⊗ |ψ_α⟩ and a mere probabilistic mixture of such states.

2.2.2 Domains and Rates of Decoherence

When does decoherence actually occur? Some clear results have been established:

1. The macroscopic degrees of freedom of a system are decohered by the microscopic degrees of freedom.
2. The pointer basis picked out in H_macro is a basis of quasi-classical, Gaussian states.

This should not be surprising. Decoherence occurs because the state of the system is recorded by the environment; and, because the dynamics of our universe are spatially local, the environment of a macroscopically large system in a given spatial position will inevitably record that position. (A single photon bouncing off the system will do it, for instance.) So superpositions of systems in macroscopically distinct positions will rapidly become entangled with the environment. And superpositions of states with macroscopically distinct momenta will very rapidly evolve into states of macroscopically distinct positions. The only states which will be reasonably stable against the decoherence process will be wavepackets whose macroscopic degrees of freedom have reasonably definite position and momentum.

Modelling of this decoherence process (both computationally and mathematically) shows that (the following figures are derived from data presented in Joos et al. (2003)):

1. The process is extremely rapid. For example:
   (a) A dust particle of size ∼ 10⁻³ cm in a superposition of states ∼ 10⁻⁸ m apart will become decohered by sunlight after ∼ 10⁻⁵ seconds, and by
the Earth’s atmosphere after ∼ 10⁻¹⁸ s; the same particle in a superposition of states ∼ 10⁻⁵ m apart will become decohered by sunlight in ∼ 10⁻¹³ s (and by the atmosphere in 10⁻¹⁸ s again: once the separation is large compared to the wavelength of particles in the environment, the separation distance becomes irrelevant).
   (b) A kitten in a superposition of states 10⁻¹⁰ m apart is decohered by the atmosphere in ∼ 10⁻²⁵ s and by sunlight in ∼ 10⁻⁸ s; the same kitten in a superposition of states 10⁻⁵ m apart is decohered by the atmosphere in ∼ 10⁻²⁶ s, by sunlight in ∼ 10⁻²¹ s, and by the microwave background radiation in ∼ 10⁻¹⁵ s.

2. In general there is no need for the “environment” to be in some sense external to the system. The macroscopic degrees of freedom of a system can be decohered by the residual degrees of freedom of that same system: in fluids, for instance, the ‘hydrodynamic’ variables determined by averaging particle density and velocity over regions large compared to particle size are decohered by the remaining degrees of freedom of the fluid.

3. The dynamics of the macroscopic degrees of freedom seem, in general, to be ‘quasi-classical’ not just in the abstract sense that they permit a probabilistic interpretation, but in the more concrete sense that they approximate classical equations of motion. To be more precise:
   (a) If the classical limit of the system’s dynamics is classically regular (i.e., non-chaotic), as would be the case for a heavy particle moving inertially, then the pointer-basis states evolve, to a good approximation, like the classical states they are supposed to represent. That is, if the classical-limit dynamics would take the phase-space point (q, p) to (q(t), p(t)), then the quantum dynamics are approximately

    |q, p⟩ ⊗ |ψ⟩ −→ |q(t), p(t)⟩ ⊗ |ψ(t)⟩.    (2.19)

   (b) If the classical limit of the system’s dynamics is chaotic, then classically speaking a localised region in phase space will become highly fibrillated, spreading out over the energetically available part of phase space (while retaining its volume). The quantum system is unable to follow this fibrillation: on timescales comparable to those on which the system becomes classically unpredictable, it spreads itself across the entire available phase-space region:

    |q, p⟩ ⊗ |ψ⟩ −→ ∫_Ω dq′ dp′ |q′, p′⟩ ⊗ |ψ_{q′,p′}(t)⟩    (2.20)

(where Ω is the available region of phase space). In doing so, it still tracks the coarse-grained behaviour of the classical system, but fails to track the fine details: thus, classical unpredictability is transformed into quantum indeterminacy.

For our purposes, though, the most important point is this: decoherence gives a criterion for the applicability of the Quantum Algorithm. For the ‘quasi-classical’ dynamics that it entails for macroscopic degrees of freedom is a guarantee of the consistency of that algorithm: provided ‘macroscopic’ is interpreted as ‘decohered by the residual degrees of freedom beyond our ability to detect coherence’, the algorithm will give the same results regardless of exactly when, and at what scales, it is deployed to make a probabilistic interpretation of the quantum state.

2.2.3 Sharpening Decoherence: Consistent Histories

The presentation of decoherence given in the previous section was somewhat loosely defined, and it will be useful to consider the most well-developed attempt at a cleaner definition: the consistent histories formalism. To motivate this formalism, consider a decoherent system with pointer basis {|α⟩}, as above, and suppose (as is not in fact normally the case) that the pointer basis is discrete and orthonormal: ⟨α|β⟩ = δ_{α,β}. Suppose also that we consider the system only at discrete times t₀, t₁, …, t_n. Now, decoherence as we defined it above is driven by the establishment of records of the state of the system (in the pointer basis) made by the environment. Since we are discretising time it will suffice to consider this record as made only at the discrete times (so the separation (t_{n+1} − t_n) must be large compared with the decoherence timescale). Then if the system’s state at time t₀ is

    |Ψ₀⟩ = Σ_{i₀} μ_{i₀} |α_{i₀}⟩ ⊗ |ψ(i₀)⟩    (2.21)

it should evolve by time t₁ into some state like

    |Ψ₁⟩ = Σ_{i₀,i₁} μ_{i₀} Λ₁(i₀, i₁) |α_{i₁}⟩ ⊗ |ψ(i₀, i₁)⟩    (2.22)

(for some transition coefficients Λ₁(i₀, i₁)), with the states |ψ(i₀, i₁)⟩ recording the fact that the system (relative to that state) was in state |α_{i₀}⟩ and is now in state |α_{i₁}⟩ (and thus being orthogonal to one another). Similarly, by time t₂ the system will be in state

    |Ψ₂⟩ = Σ_{i₀,i₁,i₂} μ_{i₀} Λ₁(i₀, i₁) Λ₂(i₁, i₂) |α_{i₂}⟩ ⊗ |ψ(i₀, i₁, i₂)⟩    (2.23)

and (iterating) by time t_n will finish up in state

    |Ψ_n⟩ = Σ_{i₀,i₁,…,i_n} μ_{i₀} Λ₁(i₀, i₁) ··· Λ_n(i_{n−1}, i_n) |α_{i_n}⟩ ⊗ |ψ(i₀, i₁, …, i_n)⟩.    (2.24)

Since we require (by definition) that record states are orthogonal or nearly so, we have

    ⟨ψ(i₀, …, i_n)|ψ(j₀, …, j_n)⟩ ≃ δ_{i₀,j₀} ··· δ_{i_n,j_n}.    (2.25)


There is an elegant way to express this, originally due to Griffiths (1984) and developed by Gell-Mann and Hartle (1990) and others. For each |α_i⟩, and each of our discrete times t₀, …, t_n, let P̂_i(t_j) be the projector

    P̂_i(t_j) = Û†(t_j, t₀) (|α_i⟩⟨α_i| ⊗ 1̂) Û(t_j, t₀),    (2.26)

where Û(t_j, t₀) is the unitary evolution operator taking states at time t₀ to states at time t_j (unless the Hamiltonian is time-dependent, Û(t_j, t₀) = exp(−i(t_j − t₀)Ĥ/ℏ)). Then for any sequence i = (i₀, i₁, …, i_n) of indices we may define the history operator Ĉ_i by

    Ĉ_i = P̂_{i_n}(t_n) ··· P̂_{i_0}(t₀).    (2.27)

Now,

    P̂_{i_0}(t₀) |Ψ₀⟩ = μ_{i₀} |α_{i₀}⟩ ⊗ |ψ(i₀)⟩;
    P̂_{i_1}(t₁) P̂_{i_0}(t₀) |Ψ₀⟩ = μ_{i₀} P̂_{i_1}(t₁) |α_{i₀}⟩ ⊗ |ψ(i₀)⟩ = μ_{i₀} Λ₁(i₀, i₁) |α_{i₁}⟩ ⊗ |ψ(i₀, i₁)⟩
    ···
    Ĉ_i |Ψ₀⟩ = μ_{i₀} Λ₁(i₀, i₁) ··· Λ_n(i_{n−1}, i_n) |α_{i_n}⟩ ⊗ |ψ(i₀, i₁, …, i_n)⟩.    (2.28)

This has an immediate corollary:

    ⟨Ψ₀| Ĉ†_j Ĉ_i |Ψ₀⟩ ∝ ⟨α_{j_n}|α_{i_n}⟩ ⟨ψ(j₀, j₁, …, j_n)|ψ(i₀, i₁, …, i_n)⟩    (2.29)

and hence

    ⟨Ψ₀| Ĉ†_j Ĉ_i |Ψ₀⟩ ≃ 0 unless i = j.    (2.30)
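The construction (2.26)-(2.30) can be made concrete in a toy model (my own sketch, not from the text): a system qubit whose pointer states are copied onto fresh environment qubits by CNOT interactions, with an arbitrary rotation of the system between the two times. The history operators are built exactly as in (2.27), and the decoherence functional comes out exactly diagonal:

```python
import numpy as np
from itertools import product

# Hilbert space: system qubit (x) env1 qubit (x) env2 qubit, dimension 8.
I2 = np.eye(2)
Pk = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]       # pointer projectors |0><0|, |1><1|

def op(*factors):
    """Tensor product of single-qubit operators (sys, env1, env2)."""
    out = np.array([[1.0]])
    for f in factors:
        out = np.kron(out, f)
    return out

X = np.array([[0.0, 1.0], [1.0, 0.0]])
CNOT_sys_env2 = op(Pk[0], I2, I2) + op(Pk[1], I2, X)  # copy sys onto env2

theta = 0.7                                           # arbitrary system rotation
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
U = CNOT_sys_env2 @ op(R, I2, I2)                     # evolution from t0 to t1

# |Psi_0> = a|0>|0>|0> + b|1>|1>|0>: env1 already records the pointer state at t0.
a, b = 0.6, 0.8
Psi0 = a * np.kron([1, 0], np.kron([1, 0], [1, 0])) \
     + b * np.kron([0, 1], np.kron([0, 1], [1, 0]))

def proj(i, t):
    """Heisenberg-picture pointer projector, as in (2.26)."""
    Pi = op(Pk[i], I2, I2)
    return Pi if t == 0 else U.conj().T @ Pi @ U

histories = list(product([0, 1], repeat=2))           # histories (i0, i1)
C = {h: proj(h[1], 1) @ proj(h[0], 0) for h in histories}   # (2.27)

# Decoherence functional D[j, i] = <Psi_0| C_j^dag C_i |Psi_0>, cf. (2.30).
D = np.array([[Psi0 @ C[j].conj().T @ C[i] @ Psi0 for i in histories]
              for j in histories])

assert np.allclose(D, np.diag(np.diag(D)))            # off-diagonal terms vanish
assert np.isclose(np.trace(D).real, 1.0)              # history probabilities sum to 1
```

Because each time step leaves an orthogonal record in a fresh environment qubit, the model is exactly (not just approximately) decoherent.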

Furthermore, if we apply the Quantum Algorithm, it tells us that the probability of the system being found successively in states (corresponding to) |α_{i₀}⟩, …, |α_{i_n}⟩ is given by ⟨Ψ₀| Ĉ†_i Ĉ_i |Ψ₀⟩. The condition (2.30) then has a natural interpretation: it tells us that there is no interference between distinct histories, so that the Quantum Algorithm can be applied at successive times without fear of contradiction.

Now let us generalise. Given an arbitrary complete set of projectors P̂_i(t_j) for each time t_j in our finite set, we can define histories Ĉ_i via (2.27). We say that these histories satisfy the medium decoherence condition (Gell-Mann and Hartle (1993)) with respect to some state |Ψ⟩ if ⟨Ψ| Ĉ†_j Ĉ_i |Ψ⟩ ≃ 0 whenever i ≠ j. A set of histories satisfying medium decoherence has the following attractive properties:

1. If (as above) the quantities ⟨Ψ| Ĉ†_i Ĉ_i |Ψ⟩ are interpreted as probabilities of a given history being realised, then medium decoherence guarantees that this can be done consistently, at least within the limits of what we can experimentally determine. In particular, it guarantees that if we define coarse-grained histories (by, e.g., leaving out some intermediate time t_i or


amalgamating some projectors into a single joint projector), the coarse-graining obeys the probability calculus:

    Pr(Σ_{i∈I} Ĉ_i) ≃ Σ_{i∈I} Pr(Ĉ_i).    (2.31)

For

    Pr(Σ_{i∈I} Ĉ_i) ≃ ⟨Ψ| (Σ_{j∈I} Ĉ_j)† (Σ_{i∈I} Ĉ_i) |Ψ⟩ = Σ_{j∈I} Σ_{i∈I} ⟨Ψ| Ĉ†_j Ĉ_i |Ψ⟩,    (2.32)

which in the presence of medium decoherence is just equal to

    Σ_{i∈I} ⟨Ψ| Ĉ†_i Ĉ_i |Ψ⟩.    (2.33)

(Actually a weaker condition—that the real part of ⟨Ψ| Ĉ†_j Ĉ_i |Ψ⟩ vanishes for i ≠ j—is sufficient to deliver (2.31). This condition is called consistency; it does not seem to occur in natural situations other than those which also deliver medium decoherence.)

2. Medium decoherence guarantees the existence of records (in an abstract sense). The probabilistic interpretation tells us that at time t_n the system should be thought of as having one of the states

    |Ψ(i)⟩ = N Ĉ_i |Ψ⟩    (2.34)

(where N is a normalising factor). These states are mutually orthogonal: as such, a single measurement (in the traditional sense) suffices, in principle, to determine the entire history and not just the current state. In light of its elegance, it is tempting to adopt the criterion of medium decoherence of a set of histories as the definition of decoherence, with the decoherence of the previous section only a special case (and an ill-defined one at that). And in fact the resultant formalism (call it the ‘decoherent histories’ formalism) has more than just elegance to recommend it. For one thing, it makes explicit the state-dependence of decoherence. This was in any case implicit in the previous section’s analysis: for the ‘environment’ to decohere the system, it must be in an appropriate state. (If the state of the ‘environment’ is a quintillion-degree plasma, for instance, the system will certainly not undergo quasi-classical evolution!) For another, it allows for a system/environment division which is not imposed once and for all, but can vary from history to history. It would be a mistake, however, to regard the decoherent histories formalism as conceptually generalising the environment-induced decoherence discussed in §2.2.1. In both cases, the mechanism of decoherence is the same: some subset of

THREE CANDIDATES FOR ORTHODOXY

29

the degrees of freedom are recorded by the other degrees of freedom, with the local nature of interactions picking out a phase-space-local basis as the one which is measured; this recording process breaks the coherence of the macroscopic degrees of freedom, suppressing interference and leading to dynamics which are quasi-classical and admit of a probabilistic interpretation (at least approximately). And although the decoherent-histories formalism in theory has the power to incorporate history-dependent system/environment divisions, in practice even simple models where this actually occurs have proven elusive, and actual applications of the decoherent-histories formalism have in the main been restricted to the same sort of system/environment split considered in §2.2.1 (although the ‘environment’ is often taken to be microscopic degrees of freedom of the same system).

Furthermore, there are some infelicities of the decoherent-histories formalism as applied to realistic cases of decoherence. In particular, the natural pointer basis for realistic systems seems to be non-orthonormal wave-packet states, and the rate of decoherence of superpositions of such states depends smoothly on the spatial distance between them. This does not sit altogether easily with the decoherent-histories formalism’s use of discrete times and an orthonormal pointer basis. Perhaps most importantly, though, the consistency condition alone is insufficient to restore quasi-classical dynamics in the ‘concrete’ sense of §2.2.2—that is, it is insufficient to provide approximately classical equations of motion. I return to this point in §2.3.1.

In any case, for our purposes what is important is that (no matter how ‘decoherence’ is actually defined) the basis of quasi-classical states of a macroscopic system is very rapidly decohered by its environment. This guarantees the consistency, for all practical purposes, of the Quantum Algorithm; whether it goes further and actually solves the measurement problem is a matter to which I will return in §§2.3 and 2.4.

2.2.4 Further Reading

Joos et al. (2003) and Zurek (2003a) provide detailed reviews of decoherence theory; Zurek (1991) is an accessible short introduction. Bacciagaluppi (2005) reviews the philosophical implications of decoherence.

2.3 Three Candidates for Orthodoxy

In philosophy of QM, terminology is not the least source of confusion. Authors frequently discuss the “orthodox” interpretation of QM as if everyone knew what they meant, even though different authors ascribe different and indeed contradictory properties to their respective versions of orthodoxy. It does not help that physicists use the term “Copenhagen interpretation” almost interchangeably with “orthodox interpretation” or “textbook interpretation”, while philosophers tend to reserve the term for Bohr’s actual, historical position, and use a term like “Dirac-von Neumann interpretation” for the textbook version.


In this section—which aims to present the “orthodox interpretation”—I follow the sage advice of Humpty Dumpty, who reminded Alice that words mean what we want them to mean. There are at least three mainstream positions on the measurement problem which are often described as “orthodoxy”. Two of them—operationalism and the consistent-histories formalism—are highly controversial pure interpretations of QM, which their proponents nonetheless often describe as the “orthodox” or indeed the only possible interpretation. (In their different ways, both are also claimed to capture the true spirit of Copenhagen.) The third (which I call the “new pragmatism”) is not actually regarded by anyone as a successful solution to the measurement problem but, arguably, best captures the pragmatic quantum theory actually used by working physicists. It is best understood by considering, first, an attractive but failed solution.

2.3.1 Decoherent Histories: The Solution That Isn’t

Suppose that there was exactly one finest-grained set of decoherent histories—defined, say, by projectors P̂_i(t_j) which satisfied the medium decoherence condition exactly; suppose also that this set of histories picked out a preferred basis reasonably close to the “quasi-classical” states used in the Quantum Algorithm, so that each P̂_i(t_j) projected onto those states interpreted by the Quantum Algorithm as saying: the macroscopic degrees of freedom of the system will certainly be found to have some particular values (q_i, p_i). In this case, a solution of sorts to the measurement problem would be at hand. It would simply be a stochastic theory of the macroscopic degrees of freedom, specified as follows:

Given that:
1. the universal state is |Ψ⟩;
2. the unique finest-grained decoherent-history space consistent with |Ψ⟩ is generated by projectors P̂_i(t_j), associated with values (q_i, p_i) for the macroscopic degrees of freedom at time t_j;
3. the macroscopic degrees of freedom at time t_j have values (q_i, p_i), corresponding to projector P̂_i(t_j);
then the probability of the macroscopic degrees of freedom at time t_{j′} having values (q_{i′}, p_{i′}) is given by

    Pr(q_{i′}, p_{i′}; t_{j′} | q_i, p_i; t_j) = ⟨Ψ| P̂_i(t_j) P̂_{i′}(t_{j′}) P̂_{i′}(t_{j′}) P̂_i(t_j) |Ψ⟩ / ⟨Ψ| P̂_i(t_j) P̂_i(t_j) |Ψ⟩.    (2.35)

(It follows from this and the decoherence condition, of course, that the probability of a given history Ĉ_i is just ⟨Ψ| Ĉ†_i Ĉ_i |Ψ⟩.)

How satisfactory is this as an interpretation of QM? It is not a pure interpretation; on the other hand, since it is (ex hypothesi) a successful interpretation, it is unclear that this matters. It is not obviously compatible with relativity, since it makes explicit use of a preferred time; perhaps this could be avoided via a spacetime-local version of the preferred projectors, but it seems unlikely that genuinely pointlike degrees of freedom would decohere. The rôle of the ‘universal
state’ is pretty unclear—in fact, the ontology as a whole is pretty unclear, and the state-dependent nature of the preferred set of histories is at least odd.

These questions are moot, though. For the basic assumption which grounds the interpretation—that there exists a unique (finest-grained) exactly-decoherent history space—is badly mistaken, as has been shown by Dowker and Kent (1996) and Kent (1996a). The problem does not appear to be existence: as §2.2 showed, there are good reasons to expect the histories defined by macroscopic degrees of freedom of large systems to approximately decohere, and Dowker and Kent have provided plausibility arguments to show that in the close vicinity of any almost-decoherent family of histories we can find an exactly-decoherent one. It is uniqueness, rather, that causes the difficulties: there are excellent reasons to believe that the set of exactly decoherent history spaces is huge, and contains (continuously) many history spaces which are not remotely classical. Indeed, given a family of decoherent histories defined up to some time t, there are continuously many distinct ways to continue that family. As such, the simple decoherence-based interpretation above becomes untenable.

The temptation, for those seeking to solve the measurement problem via decoherence, is to introduce some additional criterion stronger than medium decoherence—some X such that there is a unique finest-grained history space satisfying medium-decoherence-plus-X. And in fact there is a popular candidate in the literature: quasi-classicality (Gell-Mann and Hartle 1993). That is: the preferred history space not only decoheres; the decohering degrees of freedom also obey approximately classical equations of motion. It is plausible (though to my knowledge unproven) that this condition is essentially unique; it is highly plausible that there are not continuously many essentially different ways to vary a quasi-classical decoherent history space.

But as a candidate for X, quasi-classicality is pretty unsatisfactory. For one thing, it is essentially vague: while we have good theoretical reasons to expect exactly-decoherent histories in the vicinity of approximately-decoherent ones, we have no reason at all to expect exactly classical histories in the vicinity of quasi-classical ones. For another, it is a high-level notion all-but-impossible to define in microphysical terms. It is as if we were to write a theory of atomic decay which included “density of multicellular organisms” as a term in its equations. As such, it seems that no satisfactory decoherent-history-based interpretation can be developed along the lines suggested here.

2.3.2 The New Pragmatism

However, an interpretation need not be satisfactory to be coherent (so to speak). No-one who took the measurement problem seriously regarded the Dirac-von Neumann formulation of QM, with its objective collapse of the wavefunction at the moment of measurement, as a satisfactory physical theory; yet it was widely discussed and used in physics when one wanted a reasonably clear statement of the theory being applied (and never mind its foundational problems). The quasi-classical condition discussed in the previous section lets us improve on the Dirac-
von Neumann interpretation by making (somewhat) more precise and objective its essential appeal to ‘measurement’ and ‘observation’. The resultant theory has been called the ‘unknown set’ interpretation by Kent (1996b); I prefer to call it the New Pragmatism, to emphasise that no-one really regards it as acceptable. It is, nonetheless, one of our three “candidates for orthodoxy”; though it has not been explicitly stated in quite the form which I shall use, it seems to conform quite closely to the theory that is in practice appealed to by working physicists.

The New Pragmatism (decoherent-histories version): The state of the Universe at time t is given by specifying some state vector |Ψ(t)⟩, which evolves unitarily, and some particular quasi-classical, approximately decoherent consistent-history space, generated by the projectors P̂_i(t_j). The state |Ψ(t)⟩ is to be interpreted as a probabilistic mixture of eigenstates of the quasi-classical projectors: that is, expanding it as

    |Ψ(t)⟩ = Σ_i P̂_i(t) |Ψ(t)⟩,    (2.36)

the probability that the state of the Universe is (up to normalisation) P̂_i(t) |Ψ(t)⟩ is ‖P̂_i(t) |Ψ(t)⟩‖² = ⟨Ψ(t)| P̂_i(t) |Ψ(t)⟩. Because the history space is approximately decoherent, any interference-generated inconsistencies caused by this probabilistic reading of the state will be undetectable; if that is felt to be unsatisfactory, just require that the history space is exactly decoherent (some such will be in the vicinity of any given approximately-decoherent history space).

According to the New Pragmatism, then, the quantum state vector is physical (is, indeed, the complete physical description of the system). It evolves in some mixture of unitary steps and stochastic jumps, and at any given time it assigns approximately-definite values of position and momentum to the macroscopic degrees of freedom of the system. We do not know the actual decoherent-history space used (hence ‘unknown set’), but we know it well enough to predict all probabilities to any reasonably-experimentally-accessible accuracy.

The New Pragmatism, it will be apparent, is a pretty minimal step beyond the Quantum Algorithm itself: if we were to ask for the most simple-minded way to embed the Algorithm into a theory, without any concern for precision or elegance, we would get something rather like the New Pragmatism. This is even more obvious if we reformulate it from the language of decoherent histories to the environment-induced decoherence of §2.2:

The New Pragmatism (wave-packet version): Fix some particular division of Hilbert space into macroscopic and microscopic degrees of freedom: H = H_macro ⊗ H_micro; and fix some particular basis |q, p⟩ of wave-packet states for H_macro. Then the state vector |Ψ(t)⟩ of the Universe always evolves unitarily, but is to be understood as a probabilistic mixture of approximately-macroscopically-definite states: if the universal state is the superposition

    |Ψ(t)⟩ = ∫ dp dq α(q, p) |q, p⟩ ⊗ |ψ(q, p; t)⟩    (2.37)

then the actual state is one of the components of this superposition, and has probability |α(q, p)|² of being |q, p⟩ ⊗ |ψ(q, p; t)⟩. (And of course this state in turn is somehow to
be understood as having macroscopic phase-space location (q, p).)

It is an interesting philosophy-of-science question to pin down exactly what is unacceptable about the New Pragmatism. And it is not obvious at all that it is unacceptable from some anti-realist standpoints (from the standpoint of Van Fraassen (1980), for instance). Nonetheless, it is accepted as unsatisfactory. Unlike our other two candidates for orthodoxy, and despite the frequency with which it is in fact used, no-one really takes it seriously.

2.3.3 The Consistent-Histories Interpretation

A more ‘serious’ interpretation of QM, still based on the decoherent-histories formalism, has been advanced by Griffiths (1984, 2002) and Omnes (1994, 1998): it might be called the ‘consistent histories’ interpretation,⁴ and its adherents claim that it incorporates the essential insights of Bohr’s complementarity, and should be viewed as the natural modern successor to the Copenhagen interpretation. The positions of Griffiths and Omnes are subtle, and differ in the details. However, I think that it is possible to give a general framework which fits reasonably well to both of them. We begin, as with the impossible single-decoherent-history-space theory, with some universal state |Ψ⟩. Now, however, we consider all of the maximally-fine-grained consistent history spaces associated with |Ψ⟩. (Recall that a history space is consistent iff the real part of ⟨Ψ| Ĉ†_i Ĉ_j |Ψ⟩ vanishes for i ≠ j; it is a mildly weaker condition than decoherence, necessary if the probabilities of histories are to obey the probability calculus.)

Now in fact, these “maximally fine-grained” history spaces are actually constructed from one-dimensional projectors. For any exactly-consistent history space which does not so consist can always be fine-grained, as follows: let it be constructed as usual from projectors P̂_{i_j}(t_j), and define the state |i_k, …, i₀⟩ by

    |i_k, …, i₀⟩ = N P̂_{i_k}(t_k) ··· P̂_{i_0}(t₀) |Ψ⟩    (2.38)

(where N is just a normalising factor). Then define a fine-graining P̂^m_{i_k}(t_k) as follows:

    P̂⁰_{i_k}(t_k) = |i_k, …, i₀⟩⟨i_k, …, i₀|;    (2.39)

the other P̂^m_{i_k}(t_k) are arbitrary one-dimensional projectors chosen to satisfy

    Σ_m P̂^m_{i_k}(t_k) = P̂_{i_k}(t_k).    (2.40)

It is easy to see that

    P̂^{m_n}_{i_n}(t_n) ··· P̂^{m_0}_{i_0}(t₀) |Ψ⟩ = 0 whenever any m_k ≠ 0

⁴ Terminology is very confused here. Some of those who advocate ‘consistent-histories’ interpretations—notably Gell-Mann and Hartle—appear to mean something very different from Griffiths and Omnes, and much closer in spirit to the Everett interpretation.


    P̂⁰_{i_n}(t_n) ··· P̂⁰_{i_0}(t₀) |Ψ⟩ = P̂_{i_n}(t_n) ··· P̂_{i_0}(t₀) |Ψ⟩,    (2.41)

and hence the fine-graining also satisfies the consistency condition. Notice (this is why I give the proof explicitly, in fact) how sensitive this fine-graining process is to the universal state |Ψ⟩ (by contrast, when we are dealing with the coarse-grained approximately-decoherent histories given by dynamical decoherence, the history space is fairly insensitive to all but broad details of |Ψ⟩).

Griffiths and Omnes now regard each consistent history space as providing some valid description of the quantum system under study. And under a given description {Ĉ_i}, they take the probability of the system’s actual history being Ĉ_i to be given by the usual formula ⟨Ψ| Ĉ†_i Ĉ_i |Ψ⟩.

Were there in fact only one consistent history space, this would reduce to the ‘impossible’ interpretation which I discussed in §2.3.1. But of course this is not the case, so that a great deal of conceptual work must be done by the phrase ‘under a given description’. It is very unclear how this work is in fact to be done. There are of course multiple descriptions of even classical systems, but these descriptions can in all cases be understood as coarse-grainings of a single exhaustive description (Griffiths (2002) dubs this the principle of unicity). By contrast, in the consistent-histories form of QM this is not the case:

    The principle of unicity does not hold: there is not a unique exhaustive description of a physical system or a physical process. Instead, reality is such that it can be described in various alternative, incompatible ways, using descriptions which cannot be combined or compared. (Griffiths, 2002, p. 365)
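The fine-graining construction above can also be checked numerically. The following sketch (my own, under simplifying assumptions: a single time, a rank-2 projector in C⁴, arbitrary random state) singles out the one-dimensional projector onto N P̂|Ψ⟩ as in (2.38)-(2.39), and verifies that the remainder of the fine-graining is itself a projector which annihilates |Ψ⟩, which is the mechanism behind (2.41):

```python
import numpy as np

rng = np.random.default_rng(1)

d = 4
# Random orthonormal vectors spanning the range of a rank-2 projector P.
Q, _ = np.linalg.qr(rng.normal(size=(d, 2)) + 1j * rng.normal(size=(d, 2)))
P = Q @ Q.conj().T

Psi = rng.normal(size=d) + 1j * rng.normal(size=d)
Psi /= np.linalg.norm(Psi)

# Griffiths fine-graining, single-time analogue of (2.38)-(2.40):
v = P @ Psi
v /= np.linalg.norm(v)            # |v> = N P|Psi>, cf. (2.38)
P0 = np.outer(v, v.conj())        # the distinguished 1-d projector, cf. (2.39)
P1 = P - P0                       # the rest of the fine-graining

assert np.allclose(P0 + P1, P)    # (2.40): the fine-graining sums to P
assert np.allclose(P1 @ P1, P1)   # P1 is itself a projector
assert np.allclose(P1 @ Psi, 0)   # the m != 0 piece annihilates |Psi>
```

The last assertion is what makes the fine-grained history space so sensitively state-dependent: the distinguished one-dimensional projectors are built directly out of |Ψ⟩ itself.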

There is a close analogy between this 'failure of unicity' and Bohrian complementarity, as proponents of the consistent-histories interpretation recognise. The analogy becomes sharper in the concrete context of measurement: which histories are 'consistent' in a given measurement process depends sensitively on what the measurement device is constructed to measure. If, for instance, we choose to measure a spin-half particle's spin in the x direction, then schematically the process looks something like

(α|+_x⟩ + β|−_x⟩) ⊗ |untriggered device⟩ → α|+_x⟩ ⊗ |device reads 'up'⟩ + β|−_x⟩ ⊗ |device reads 'down'⟩.   (2.42)

A consistent-history space for this process might include histories containing the projectors

|±_x⟩⟨±_x| ⊗ |untriggered device⟩⟨untriggered device|,
|±_x⟩⟨±_x| ⊗ |device reads 'up'⟩⟨device reads 'up'|, and
|±_x⟩⟨±_x| ⊗ |device reads 'down'⟩⟨device reads 'down'|.

THREE CANDIDATES FOR ORTHODOXY

35

But if the experimenter instead chooses to measure the z component of spin, then this set will no longer be consistent and we will instead need to use a set containing projectors like |±_z⟩⟨±_z| ⊗ |device reads 'down'⟩⟨device reads 'down'|. So while for Bohr the classical context of measurement was crucial, for the consistent-histories interpretation this just emerges as a special case of the consistency requirement, applied to the measurement process. (Note that it is, in particular, perfectly possible to construct situations where the consistent histories at time t are fixed by the experimenter's choices at times far later than t—cf. Griffiths (2002, p. 255), Dickson (1998, pp. 54–5)—in keeping with Bohr's response to the EPR paradox.)

But serious conceptual problems remain for the consistent-histories interpretation:

1. What is the ontological status of the universal state vector |Ψ⟩? It plays an absolutely crucial rôle in the theory in determining which histories are consistent: as we have seen, when we try to fine-grain histories down below the coarse-grained level set by dynamical decoherence, the details of which histories are consistent become extremely sensitively dependent on |Ψ⟩. Perhaps it can be interpreted as somehow 'lawlike'; perhaps not. It is certainly difficult to see how it can be treated as physical without letting the consistent-histories interpretation collapse into something more like the Everett interpretation.

2. Does the theory actually have predictive power? The criticisms of Kent and Dowker continue to apply, and indeed can be placed into a sharp form here: they prove that a given consistent history can be embedded into two different history spaces, identical up to a given time and divergent afterwards, such that the probabilities assigned to the history vary sharply from one space to the other. In practice, accounts of the consistent-histories interpretation seem to get around this objection by forswearing cosmology and falling back on some externally-imposed context to fix the correct history; shades of Bohr, again.

3. Most severely, is Griffiths's 'failure of unicity' really coherent? It is hard to make sense of it; no wonder that many commentators on the consistent-histories formalism (e.g., Penrose (2004, p. 788)) find that they can make sense of it only by regarding every history in every history space as actualised: an ontology that puts Everett to shame.

2.3.4 Operationalism

The consistent-histories interpretation can make a reasonable case for being the natural home for Bohr’s complementarity. But there is another reading of the Copenhagen interpretation which arguably has had more influence on physicists’ attitude to the measurement problem: the operationalist doctrine that physics is
concerned not with an objective 'reality' but only with the result of experiments. This position has been sharpened in recent years into a relatively well-defined interpretation (stated in particularly clear form by Peres (1995); see also Fuchs and Peres (2000a)): the operationalist interpretation that is our third candidate for orthodoxy. Following Peres, we can state operationalism as follows:

The operationalist interpretation: Possible measurements performable on a quantum system are represented by the POVMs of that system's Hilbert space. All physics tells us is the probability, for each measurement, of a given outcome: specifically, it tells us that the probability of the outcome corresponding to positive operator Â obtaining is Tr(ρ̂Â) (or ⟨ψ|Â|ψ⟩ in the special case where a pure state may be used). As such, the state of the system is not a physical thing at all, but simply a shorthand way of recording the probabilities of various outcomes of measurements; and the evolution rule

|ψ(t)⟩ = exp(−itĤ/ℏ) |ψ⟩   (2.43)

is just a shorthand way of recording how the various probabilities change over time for an isolated system.
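The statement above is easy to make concrete. The following sketch (the numbers are mine, chosen for illustration—this is not an example from Peres) builds a three-outcome POVM for a qubit and reads off the outcome probabilities via Tr(ρ̂Â):

```python
import numpy as np

# A qubit density operator (Hermitian, positive, unit trace)
rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)

# A three-outcome POVM: positive operators summing to the identity
I2 = np.eye(2)
E1 = 0.5 * np.array([[1, 0], [0, 0]], dtype=complex)
E2 = 0.5 * np.array([[0, 0], [0, 1]], dtype=complex)
E3 = I2 - E1 - E2
povm = [E1, E2, E3]
assert np.allclose(sum(povm), I2)
assert all(np.linalg.eigvalsh(E).min() > -1e-12 for E in povm)

# Outcome probabilities Pr(E) = Tr(rho E): real, non-negative, summing to one
probs = [np.trace(rho @ E).real for E in povm]
assert abs(sum(probs) - 1) < 1e-12

# For a pure state the rule reduces to <psi|E|psi>
psi = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho_pure = np.outer(psi, psi.conj())
assert all(np.isclose(np.trace(rho_pure @ E), np.conj(psi) @ E @ psi) for E in povm)
```

On the operationalist reading, the arrays `rho` and `psi` here are bookkeeping devices for the list `probs`, not descriptions of a physical system.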

In fact, we do not even need to postulate the rule Pr(Â) = Tr(ρ̂Â). It is enough to require non-contextuality: that is, to require that the probability of obtaining the result associated with Â is independent of which POVM Â is embedded into. Suppose Pr is any non-contextual probability measure on the positive operators: that is, suppose it is a function from the positive operators to [0, 1] satisfying

∑_i Â_i = 1̂  →  ∑_i Pr(Â_i) = 1.   (2.44)
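Non-contextuality is easy to exhibit numerically: since Tr(ρ̂Â) depends only on ρ̂ and Â, an element embedded in two quite different POVMs receives the same probability. (A toy sketch of mine—this is not the Caves et al. proof itself:)

```python
import numpy as np

rho = np.array([[0.6, 0.1], [0.1, 0.4]], dtype=complex)  # a qubit density operator

A = 0.5 * np.array([[1, 0], [0, 0]], dtype=complex)      # a fixed positive operator
I2 = np.eye(2)

# Two different POVMs which both contain A as an element
povm1 = [A, I2 - A]
povm2 = [A, 0.25 * I2, I2 - A - 0.25 * I2]

for povm in (povm1, povm2):
    assert np.allclose(sum(povm), I2)                     # resolution of the identity
    assert all(np.linalg.eigvalsh(E).min() > -1e-12 for E in povm)

# The probability assigned to A is fixed by rho and A alone—the same number
# whichever POVM it is embedded in. That is non-contextuality.
p = np.trace(rho @ A).real
assert abs(p - 0.3) < 1e-12
```

The non-trivial converse—that every non-contextual probability measure arises from some density operator in this way—is what the Caves et al. result establishes.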

Then it is fairly simple (Caves et al., 2004) to prove that Pr must be represented by a density operator: Pr(Â) = Tr(ρ̂Â) for some ρ̂.

Modifications of the operationalist interpretation are available. The probabilities may be taken to be subjective (Caves et al., 2002), as referring to an ensemble of systems (Ballentine, 1990; Taylor, 1986), or as irreducible single-case chances (Fuchs and Peres, 2000a). The 'possible measurements' may be taken to be given by the PVMs alone rather than the POVMs (in which case Gleason's theorem must be invoked in place of the simple proof above to justify the use of density operators). But the essential content of the interpretation remains: the 'quantum state' is just a way of expressing the probabilities of various measurement outcomes, and—more generally—quantum theory itself is not in the business of supplying us with an objective picture of the world. Fuchs and Peres put this with admirable clarity:

We have learned something new when we can distill from the accumulated data a compact description of all that was seen and an indication of which further experiments will corroborate that description. This is what science is about. If, from such a description, we can further distill a model of a free-standing "reality" independent of our interventions, then so much the better. Classical physics is the ultimate example of such a model. However, there is no logical necessity for a realistic worldview to always be obtainable. If the world is such that we can never identify a reality independent of our
experimental activity, then we must be prepared for that, too. . . . [Q]uantum theory does not describe physical reality. What it does is provide an algorithm for computing probabilities for the macroscopic events ("detector clicks") that are the consequences of our experimental interventions. This strict definition of the scope of quantum theory is the only interpretation ever needed, whether by experimenters or theorists. (Fuchs and Peres, 2000a)

. . .

Todd Brun and Robert Griffiths point out [in Styer et al. (2000)] that "physical theories have always had as much to do with providing a coherent picture of reality as they have with predicting the results of experiment." Indeed, have always had. This statement was true in the past, but it is untenable in the present (and likely to be untenable in the future). Some people may deplore this situation, but we were not led to reject a freestanding reality in the quantum world out of a predilection for positivism. We were led there because this is the overwhelming message quantum theory is trying to tell us. (Fuchs and Peres, 2000b)

Whether or not Fuchs and Peres were led to their position 'out of a predilection for positivism', the operationalist interpretation is nonetheless positivist in spirit, and is subject to many of the same criticisms. However, in one place it differs sharply. Where the positivists were committed to a once-and-for-all division between observable and unobservable, a quantum operationalist sees no difficulty in principle with applying QM to the measurement process itself. In a measurement of spin, for instance, the state

α|+_x⟩ + β|−_x⟩   (2.45)

may just be a way of expressing that (among other regularities) the probability of getting result 'up' on measuring spin in the x direction is |α|². But the measurement process may itself be modelled in QM in the usual way—

(α|+_x⟩ + β|−_x⟩) ⊗ |untriggered device⟩ → α|+_x⟩ ⊗ |device reads 'up'⟩ + β|−_x⟩ ⊗ |device reads 'down'⟩   (2.46)

—provided that it is understood that this state is itself just a shorthand way of saying (among other regularities) that the probability of finding the measurement device to be in state "reads 'up'" is |α|². It is not intended to describe a physical superposition any more than α|+_x⟩ + β|−_x⟩ is. In principle, this can be carried right up to the observer:

α|Observer sees 'up' result⟩ + β|Observer sees 'down' result⟩   (2.47)

is just a shorthand expression of the claim that if the 'observer' is themselves observed, they will be found to have seen 'up' with probability |α|². Of course, if analysis of any given measurement process only gives dispositions for certain results in subsequent measurement processes, then there is a threat of infinite regress. The operationalist interpretation responds to this problem by adopting an aspect of the Copenhagen interpretation essentially lost in the
consistent-histories interpretation: the need for a separate classical language to describe measurement devices, and the resultant ambiguity (Peres (1995, p. 373) calls it ambivalence) as to which language is appropriate when. To spell this out (here I follow Peres (1995, pp. 376–7)): a measurement device, or any other macroscopic system, may be described either via a density operator ρ̂ on Hilbert space (a quantum state, which gives only probabilities of certain results on measurement) or via a probability distribution W(q, p) over phase-space points (each of which gives an actual classical state of the system). These two descriptions then give different formulae for the probability of finding the system to have given position and momentum:

• The quantum description just is a shorthand for the probabilities of getting various different results on measurement. In particular, there will exist some POVM Â_{q,p} such that the probability density of getting results (q, p) on a joint measurement of position and momentum is Tr(ρ̂Â_{q,p}).

• According to the classical description, the system actually has some particular values of q and p, and the probability density for any given values is just W(q, p).

If the two descriptions are not to give contradictory predictions for the result of experiment, then we require that W(q, p) ≈ Tr(ρ̂Â_{q,p}); or, to be more precise, we require that the integrals of W(q, p) and Tr(ρ̂Â_{q,p}) over sufficiently large regions of phase space are equal to within the limits of experimental error. This gives us a recipe to construct the classical description from the quantum: just set W(q, p) equal to Tr(ρ̂Â_{q,p}). If this is done at a given time t₀, then at subsequent times t > t₀ the classical dynamics applied to W and the quantum dynamics applied to ρ̂ will break the equality:

Tr(ρ̂(t)Â_{q,p}) ≠ W(q, p; t)   (2.48)

(where W(q, p; t) is the distribution obtained by time-evolving W(q, p) using Hamilton's equations). But if the system is sufficiently large, decoherence guarantees that the equality continues to hold approximately when W(q, p; t) and Tr(ρ̂(t)Â_{q,p}) are averaged over sufficiently large phase-space volumes.

The 'operationalism' of this interpretation is apparent here. There is no exact translation between classical and quantum descriptions, only one whose imprecisions are too small to be detected empirically.⁵ But if QM—if science generally—is merely a tool to predict results of experiments, it is unclear at best that we should be concerned about ambiguities which are empirically undetectable in practice.

⁵ A further ambiguity in the translation formula is the POVM Â_{q,p}: in fact, no unique POVM for phase-space measurement exists. Rather, there are many equivalently good candidates which essentially agree with one another provided that their predictions are averaged over phase-space volumes large compared to ℏⁿ, where n is the number of degrees of freedom of the system.

Whether this is indeed a valid conception of science—and
whether the operationalist interpretation really succeeds in overcoming the old objections to logical positivism—I leave to the reader's judgment.

2.3.5 Further Reading

Standard references for consistent histories are Griffiths (2002) and Omnès (1994); for critical discussion, see Dickson (1998, pp. 52–7), Bub (1997, pp. 212–36) and Bassi and Ghirardi (2000). The best detailed presentation of operationalism is Peres (1995); for a briefer account see Fuchs and Peres (2000a). For two rather different reappraisals of the original Copenhagen interpretation, see Cushing (1994) and Saunders (2005). Recently, operationalist approaches have taken on an "information-theoretic" flavour, inspired by quantum information. See Chris Timpson's contribution to this volume for more details.

Though they cannot really be called "orthodox", the family of interpretations that go under the name of "quantum logic" are also pure interpretations which attempt to solve the measurement problem by revising part of our pre-quantum philosophical picture of the world. In this case, though, the part to be revised is classical logic. Quantum logic is not especially popular at present, and so for reasons of space I have omitted it, but for a recent review see Dickson (2001).

2.4 The Everett Interpretation

Of our three 'candidates for orthodoxy', only two are pure interpretations in the sense of §2.1.2, and neither of these is 'realist' in the conventional sense of the word. The consistent-histories interpretation purports to describe an objective reality, but that reality is unavoidably perspectival, making sense only when described from one of indefinitely many contradictory perspectives; whether or not this is coherent, it is not how scientific realism is conventionally understood! The operationalist interpretation, more straightforwardly, simply denies explicitly that it describes an independent reality. And although the new pragmatism does describe such a reality, it does so in a way universally agreed to be ad hoc and unacceptable.

There is, however, one pure interpretation which purports to be realist in a completely conventional sense: the Everett interpretation. Unlike the three interpretations we have considered so far, its adherents make no claim that it is any sort of orthodoxy; yet among physicists if not philosophers it seems to tie with operationalism and consistent histories for popularity. Its correct formulation, and its weaknesses, are the subject of this section.

2.4.1 Multiplicity from Indefiniteness?

At first sight, applying straightforward realism to QM without modifying the formalism seems absurd. Undeniably, unitary QM produces superpositions of macroscopically distinct quasi-classical states; whether or not such macroscopic superpositions even make sense, their existence seems in flat contradiction with the fact that we actually seem to observe macroscopic objects only in definite states.


The central insight in the Everett interpretation is this: superpositions of macroscopically distinct states are somehow to be understood in terms of multiplicity. For instance (to take the time-worn example)

α|Live cat⟩ + β|Dead cat⟩   (2.49)

is to be understood (somehow) as representing not a single cat in an indefinite state, but rather a multiplicity of cats, one (or more) of which is alive, one (or more) of which is dead. Given the propensity of macroscopic superpositions to become entangled with their environment, this 'many cats' interpretation becomes in practice a 'many worlds' interpretation: quantum measurement continually causes the macroscopic world to branch into countless copies. The problems in cashing out this insight are traditionally broken in two:

1. The 'preferred basis problem': how can the superposition justifiably be understood as some kind of multiplicity?

2. The 'probability problem': how is probability to be incorporated into a theory which treats wavefunction collapse as some kind of branching process?

2.4.2 Preferred-Basis Problem: Solve by Modification

If the preferred basis problem is a question ("how can quantum superpositions be understood as multiplicities?") then there is a traditional answer, more or less explicit in much criticism of the Everett interpretation (Barrett, 1999; Kent, 2005; Butterfield, 1996): they cannot. That is: it is no good just stating that a state like (2.49) describes multiple worlds: the formalism must be explicitly modified to incorporate them. This position dominated discussion of the Everett interpretation in the 1980s and early 1990s: even advocates like Deutsch (1985) accepted the criticism and rose to the challenge of providing such a modification.

Modificatory strategies can be divided into two categories. Many-exact-worlds theories augment the quantum formalism by adding an ensemble of 'worlds' to the state vector. The 'worlds' are each represented by an element of some particular choice of 'world basis' {|ψ_i(t)⟩} at each time t: the proportion of worlds in state |ψ_i(t)⟩ at time t is |⟨Ψ(t)|ψ_i(t)⟩|², where |Ψ(t)⟩ is the (unitarily-evolving) universal state. Our own world is just one element of this ensemble. Examples of many-exact-worlds theories are the early Deutsch (1985, 1986), who tried to use the tensor-product structure of Hilbert space to define the world basis,⁶ and Barbour (1994, 1999), who chooses the position basis.

⁶ A move criticised on technical grounds by Foster and Brown (1988).

In many-minds theories, by contrast, the multiplicity is to be understood as illusory. A state like (2.49) really is indefinite, and when an observer looks at the cat and thus enters an entangled state like

α|Live cat⟩ ⊗ |Observer sees live cat⟩ + β|Dead cat⟩ ⊗ |Observer sees dead cat⟩   (2.50)


then the observer too has an indefinite state. However: to each physical observer is associated not one mental state, but an ensemble of them: each mental state has a definite experience, and the proportion of mental states where the observer sees the cat alive is |α|². Effectively, this means that in place of a global 'world-defining basis' (as in the many-exact-worlds theories) we have a 'consciousness basis' for each observer.⁷ When an observer's state is an element of the consciousness basis, all the minds associated with that observer have the same experience and so we might as well say that the observer is having that experience. But in all realistic situations the observer will be in some superposition of consciousness-basis states, and the ensemble of minds associated with that observer will be having a wide variety of distinct experiences. Examples of many-minds theories are Albert and Loewer (1988), Lockwood (1989, 1996), Page (1996) and Donald (1990, 1992, 2002).

It has increasingly become recognised, by supporters and detractors alike, that there are severe problems with either of these approaches to developing the Everett interpretation. Firstly, and most philosophically, both the many-exact-worlds and the many-minds theories are committed to a very strong (and arguably very anti-scientific) position in philosophy of mind: the rejection of functionalism, the view that mental properties should be ascribed to a system in accordance with the functional rôle of that system (see, e.g., Armstrong (1968), Lewis (1974), Hofstadter and Dennett (1981), and Levin (2004) for various explications of functionalism). This is particularly obvious in the case of the many-minds theories, where some rule associating conscious states to physical systems is simply postulated in the same way that the other laws of physics are postulated.
If it is just a fundamental law that consciousness is associated with some given basis, clearly there is no hope of a functional explanation of how consciousness emerges from basic physics (and hence much, perhaps all, of modern AI, cognitive science and neuroscience is a waste of time). And in fact many adherents of many-minds theories (e.g., Lockwood and Donald) embrace this conclusion, having been led to reject functionalism on independent grounds.

It is perhaps less obvious that the many-exact-worlds theories are equally committed to the rejection of functionalism. But if the 'many worlds' of these theories are supposed to include our world, it follows that conscious observers are found within each world. This is compatible with functionalism only if the worlds are capable of containing independent complex structures which can instantiate the 'functions' that subserve consciousness. This in turn requires that the world basis is decoherent (else the structure would be washed away by interference effects) and—as we have seen—the decoherence basis is not effectively specifiable in any precise microphysical way.

⁷ Given that an 'observer' is represented in the quantum theory by some Hilbert space many of whose states are not conscious at all, and that conversely almost any sufficiently large agglomeration of matter can be formed into a human being, it would be more accurate to say that we have a consciousness basis for all systems, but one with many elements which correspond to no conscious experience at all.

(See Wallace (2002) for
further discussion of the difficulty of localising conscious agents within 'worlds' defined in this sense.)

There is a more straightforwardly physical problem with these approaches to the Everett interpretation. Suppose that a wholly satisfactory many-exact-worlds or many-minds theory were to be developed, specifying an exact 'preferred basis' and an exact transition rule defining identity for worlds or minds. Nothing would then stop us from taking that theory, discarding all but one of the worlds/minds⁸ and obtaining an equally empirically effective theory without any of the ontological excess which makes Everett-type interpretations so unappealing. Put another way: an Everett-type theory developed along the lines that I have sketched would really just be a hidden-variables theory with the additional assumption that continuum many non-interacting sets of hidden variables exist, each defining a different classical world. (This point is made with some clarity by Bell (1981b) in his classic attack on the Everett interpretation.)

In the light of these sorts of criticisms, these modify-the-formalism approaches to the Everett interpretation have largely fallen from favour. Almost no advocate of "the Many-Worlds Interpretation" actually advocates anything like the many-exact-worlds approach⁹ (Deutsch, for instance, clearly abandoned it some years ago), and many-minds strategies which elevate consciousness to a preferred rôle continue to find favour mostly in the small group of philosophers of physics strongly committed for independent reasons to a non-functionalist philosophy of mind. Advocates of the Everett interpretation among physicists (almost exclusively) and philosophers (for the most part) have returned to Everett's original conception of the Everett interpretation as a pure interpretation: something which emerges simply from a realist attitude to the unitarily-evolving quantum state.

2.4.3 The Bare Theory: How Not to Think About the Wave Function

One way of understanding the Everett interpretation as pure interpretation—the so-called 'Bare Theory'—was suggested by Albert (1992). It has been surprisingly influential among philosophers of physics—not as a plausible interpretation of QM, but as the correct reading of the Everett interpretation. Barrett (1999, p. 94) describes the Bare Theory as follows:

The bare theory is simply the standard von Neumann-Dirac formulation of QM with the standard interpretation of states (the eigenvalue-eigenstate link) but stripped of the collapse postulate—hence, bare.

From this perspective, a state like (2.49) is not an eigenstate of the 'cat-is-alive' operator (that is, the projector which projects onto all states where the cat is alive); hence, given the eigenstate-eigenvalue link, the cat is in an indefinite state of aliveness. Nor is it an eigenstate of the 'agent-sees-cat-as-alive' operator, so the agent's mental state is indefinite between seeing the cat alive and seeing it dead.

⁸ It would actually be a case of discarding all but one set of minds—one for each observer.

⁹ Barbour (1999) may be an exception.


But it is an eigenstate of the 'agent-sees-cat-as-alive-or-agent-sees-cat-as-dead' operator: the states

|Live cat⟩ ⊗ |Observer sees live cat⟩   (2.51)

and

|Dead cat⟩ ⊗ |Observer sees dead cat⟩   (2.52)

are both eigenstates of that operator with eigenvalue one, so their superposition is also an eigenstate of that operator. Hence if we ask the agent, 'did you see the cat as either alive or dead?', they will answer 'yes'.

That is: the bare theory—without any flaky claims of 'multiplicity' or 'branching'—undermines the claim that macroscopic superpositions contradict our experience. It predicts that we will think, and claim, that we do not observe superpositions at all, even when our own states are highly indefinite, and that we are simply mistaken in the belief that we see a particular outcome or other. That is, it preserves unitary QM—at the expense of a scepticism that "makes Descartes's demon and other brain-in-the-vat stories look like wildly optimistic appraisals of our epistemic situation" (Barrett, 1999, p. 94). As Albert puts it:

[M]aybe . . . the linear dynamical laws are nonetheless the complete laws of the evolution of the entire world, and maybe all the appearances to the contrary (like the appearance that experiments have outcomes, and the appearance that the world doesn't evolve deterministically) turn out to be just the sorts of delusions which those laws themselves can be shown to bring on! (1992, p. 123)
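The eigenvalue claim here is elementary to verify directly. A minimal numerical sketch (my own; the amplitudes α = 0.6, β = 0.8 are arbitrary) of states (2.51) and (2.52) and the disjunctive projector:

```python
import numpy as np

# 4-dimensional space: (cat) tensor (observer), with two orthogonal outcomes
live = np.kron([1.0, 0.0], [1.0, 0.0])   # |Live cat> ⊗ |Observer sees live cat>
dead = np.kron([0.0, 1.0], [0.0, 1.0])   # |Dead cat> ⊗ |Observer sees dead cat>

P_live = np.outer(live, live)            # 'agent-sees-cat-as-alive' projector
P_dead = np.outer(dead, dead)            # 'agent-sees-cat-as-dead' projector
P_or = P_live + P_dead                   # the disjunctive 'alive-or-dead' projector

alpha, beta = 0.6, 0.8                   # arbitrary amplitudes, |a|^2 + |b|^2 = 1
psi = alpha * live + beta * dead

# psi is not an eigenstate of either individual 'outcome' projector ...
assert not np.allclose(P_live @ psi, psi)
assert not np.allclose(P_dead @ psi, psi)
# ... but it is an eigenstate of the disjunction, with eigenvalue one.
assert np.allclose(P_or @ psi, psi)
```

This is exactly the non-classical property logic discussed below: the state definitely has the disjunctive property without definitely having either disjunct.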

A quite extensive literature has developed trying to point out exactly what is wrong with the Bare Theory (see, e.g., Albert (1992, pp. 117–25), Barrett (1998), Barrett (1999, pp. 92–120), Bub, Clifton, and Monton (1998), and Dickson (1998, pp. 45–7)). The consensus seems to be that:

1. If we take a 'minimalist', pure-interpretation reading of Everett, we are led to the Bare Theory; and

2. The Bare Theory has some extremely suggestive features; but

3. It is not ultimately satisfactory as an interpretation of QM because it fails to account for probability/is empirically self-undermining/smuggles in a preferred basis (delete as applicable); and so

4. Any attempt to solve the measurement problem along Everettian lines cannot be 'bare' but must add additional assumptions.

From the perspective of this review, however, this line of argument is badly mistaken. It relies essentially on the assumption that the eigenstate-eigenvalue link is part of the basic formalism of QM, whereas—as I argued in §2.1.3—it plays no part in modern practice and is flatly in contradiction with most interpretative strategies. It will be instructive, however, to revisit this point in the context of the state-vector realism that is essential to the Everett interpretation. If the state vector is to be taken as physically real, the eigenstate-eigenvalue link becomes a claim about the properties of that state vector. Specifically:


All properties of the state vector are represented by projectors, and the state vector |ψ⟩ has the property represented by P̂ if it is inside the subspace onto which P̂ projects: that is, if P̂|ψ⟩ = |ψ⟩. If P̂|ψ⟩ = 0 then |ψ⟩ certainly lacks the property represented by P̂; if neither is true then either |ψ⟩ definitely lacks the property or it is indefinite whether |ψ⟩ has the property.

As is well known, it follows that the logic of state-vector properties is highly non-classical: it is perfectly possible, indeed typical, for a system to have a definite value of the property (p ∨ q) without definitely having either p or q. The quantum logic programme developed this into a mathematically highly elegant formalism; see Bub (1997) for a clear presentation.

What is less clear is why we should take this account of properties at all seriously. We have seen that it fails to do justice to modern analyses of quantum measurement; furthermore, from the perspective of state-vector realism it seems to leave out all manner of perfectly ordinary properties of the state. Why not ask:

• Is the system's state an eigenstate of energy?
• Is its expectation value with respect to the Hamiltonian greater than 100 joules?
• Is its wavefunction on momentum space negative in any open set?
• Does its wavefunction on configuration space tend towards any Gaussian?

If the state vector is physical, these all seem perfectly reasonable questions to ask about it. Certainly, each has a determinate true/false answer for any given state vector. Yet none correspond to any projector (it is, for instance, obvious that there is no projector that projects onto all and only eigenstates of some operator!). Put more systematically: if the state vector is physical, then the set S of normalised vectors in Hilbert space is (or at least represents) the set of possible states in which an (isolated, non-entangled) system can be found. If we assume standard logic, then a property is defined (at least for the purposes of physics) once we have specified the states in S for which the property holds. That is: properties in quantum physics correspond to subsets of the state space, just as properties in classical physics correspond to subsets of the phase space.

If we assume non-standard logic, of course, we doubtless get some different account of properties; if we assume a particular non-standard logic, very probably we get the eigenstate-eigenvalue account of properties. The fact remains that if we wish to assume state-vector realism and standard logic (as did Everett) we do not get the Bare Theory.

(There is, to be fair, an important question about what we do get. That is, how can we think about the state vector (construed as real) other than through the eigenstate-eigenvalue link? This question has seen a certain amount of attention in recent years. The most common answer seems to be 'wavefunction realism': if the state vector is a physical thing at all, it should be thought of as a field on 3N-dimensional space. Bell proposed this (in the context of the de Broglie-Bohm theory):


No one can understand this theory until he is willing to think of ψ as a real physical field rather than a ‘probability amplitude’. Even though it propagates not in 3-space but in 3N-space. (Bell 1981b; emphasis his)

Albert (1996) proposes it as the correct reading of the state vector in any statevector-realist theory;10 Lewis (2004a) and Monton (2004b) concur. (Monton argues that wavefunction realism is unacceptable, but he does so in order to argue against state-vector realism altogether rather than to advocate an alternative). There are alternatives, however. Chris Timpson and myself (2007) suggest a more spatio-temporal ontology, in which each spacetime region has a (possibly impure) quantum state but in which, due to entanglement the state of region A ∪ B is not determined simply by the states of regions A and B separately (a form of nonlocality which we claim is closely analogous to what is found in the Aharonov-Bohm effect). Deutsch and Hayden (2000) argue for an ontology based on the Heisenberg interpretation which appears straightforwardly local (but see Wallace and Timpson (2007) for an argument that this locality is more apparent than real). Saunders (1997) argues for a thoroughly relational ontology reminiscent of Leibniz’s monadology. To what extent these represent real metaphysical alternatives rather than just different ways of describing the quantum state’s structure is a question for wider philosophy of science and metaphysics.) 2.4.4

2.4.4 Decoherence and the Preferred Basis

In any case, once we have understood the ontological fallacy on which the Bare Theory rests, it remains to consider whether multiplicity does indeed emerge from a realist reading of the quantum state, and if so how. The 1990s saw an emerging consensus on this issue, developed by Zurek, Gell-Mann and Hartle, Zeh, and many others[11] and explored from a philosophical perspective by Saunders (1993, 1995, 1997): the multiplicity is a consequence of decoherence. That is, the structure of “branching worlds” suggested by the Everett interpretation is to be identified with the branching structure induced by the decoherence process. And since the decoherence-defined branching structure is composed of quasi-classical histories, it would follow that Everett branches too are quasi-classical. It is important to be clear on the nature of this “identification”. It cannot be taken as an additional axiom (else we would be back to the Many-Exact-Worlds theory); rather, it must somehow be forced on us by a realist interpretation of the quantum state. Gell-Mann and Hartle (1993) made the first sustained attempt to explain why this is so, with their concept of an IGUS: an “information-gathering-and-utilising system” (similar proposals were made by Saunders (1993) and Zurek (1998)). An IGUS, argue Gell-Mann and Hartle, can only function if the information it gathers and utilises is information about particular decoherent histories. If it attempts to store information about superpositions of such histories, then that information will be washed out almost instantly by the decoherence process. As such, for an IGUS to function it must perceive the world in terms of decoherent histories: proto-IGUSes which do not will fail to function. Natural selection then ensures that if the world contains IGUSes at all—and in particular if it contains intelligent life—those IGUSes will perceive the decoherence-selected branches as separate realities.

The IGUS approach is committed, implicitly, to functionalism: it assumes that intelligent, conscious beings just are information-processing systems, and it furthermore assumes that these systems are instantiated in certain structures within the quantum state. (Recall that in the ontology I have defended, the quantum state is a highly structured object, with its structure being describable in terms of the expectation values of whatever the structurally preferred observables are in whichever bare quantum formalism we are considering.) In Wallace (2003a) I argued that this should be made explicit, and extended to a general functionalism about higher-level ontology: quite generally (and independent of the Everett interpretation) we should regard macroscopic objects like tables, chairs, tigers, planets and the like as structures instantiated in a lower-level theory. A tiger, for instance, is a pattern instantiated in certain collections of molecules; an economy is a pattern instantiated in certain collections of agents. Dennett (1991) proposed a particular formulation of this functionalist ontology: those formally-describable structures which deserve the name ‘real’ are those which are predictively and explanatorily necessary to our account of a system (in endorsing this view in Wallace (2003a), I dubbed it ‘Dennett’s criterion’).

[10] This would seem to imply that Albert would concur with my criticism of the Bare Theory, but I am not aware of any actual comment of his to this effect.
[11] See, e.g., Zurek (1991, 1998), Gell-Mann and Hartle (1990, 1993), and Zeh (1993).
So for instance (and to borrow an example from Wallace (2003a)) what makes a tiger-structure “real” is the phenomenal gain in our understanding of systems involving tigers, and the phenomenal predictive improvements that result, if we choose to describe the system using tiger-language rather than restricting ourselves to talking about the molecular constituents of the tigers. A variant of Dennett’s approach has been developed by Ross (2000) and Ross and Ladyman (2007); ignoring the fine details of how it is to be cashed out, let us call this general approach to higher-level ontology simply functionalism, eliding the distinction between the general position and the restricted version which considers only the philosophy of mind. Functionalism in this sense is not an uncontroversial position. Kim (1998), in particular, criticises it and develops a rival framework based on mereology (this framework is in turn criticised in Ross and Spurrett (2004); see also Wallace (2004) and other commentaries following Ross and Spurrett (2004), and also the general comments on this sort of approach to metaphysics in chapter 1 of Ross and Ladyman (2007)). Personally I find it difficult to see how any account of higher-level ontology that is not functionalist in nature can possibly do justice to science as we find it; as Dennett (2005, p. 17) puts it, “functionalism in this broadest sense is so ubiquitous in science that it is tantamount to a reigning presumption of all science”; but in any event, the validity or otherwise of functionalism is a very general debate, not to be settled in the narrow context of the measurement problem.


The reason for discussing functionalism here is that (as I argued in Wallace (2003a)) it entails that the decohering branches really should be treated as—really are—approximately independent quasi-classical worlds. Consider: if a system happens to be in a quasi-classical state |q(t), p(t)⟩⊗|ψ(t)⟩ (as defined in §2.1.1 and made more precise in §2.2) then[12] its evolution will very accurately track the phase-space point (q(t), p(t)) in its classical evolution, and so instantiates the same structures. As such, insofar as that phase-space point actually represents a macroscopic system, and insofar as what it is to represent a macroscopic system is to instantiate certain structures, it follows that |q(t), p(t)⟩⊗|ψ(t)⟩ represents that same macroscopic system. The fact that these ‘certain structures’ are instantiated in the expectation values[13] of some phase-space POVM rather than in the location of a collection of classical particles is, from a functionalist perspective, quite beside the point. Now if we consider instead a superposition

α |q1(t), p1(t)⟩⊗|ψ1(t)⟩ + β |q2(t), p2(t)⟩⊗|ψ2(t)⟩   (2.53)

then not one but two structures are instantiated in the expectation values of that same phase-space POVM: one corresponding to the classical history (q1(t), p1(t)), one to (q2(t), p2(t)), with decoherence ensuring that the structures do not interfere and cancel each other out but continue to evolve independently, each in its own region of phase space. Generalising to arbitrary such superpositions, we deduce that functionalism applied to the unitarily-evolving, realistically-interpreted quantum state yields the result that decoherence-defined branches are classical worlds. Not worlds in the sense of universes, precisely defined and dynamically completely isolated, but worlds in the sense of planets—very accurately defined but with a little inexactness, and not quite dynamically isolated, but with a self-interaction amongst constituents of a world which completely dwarfs interactions between worlds. This functionalist account of multiplicity is not in conflict with the IGUS strategy, but rather contains it. For not only could IGUSes not process information not restricted to a single branch, they could not even exist across branches. The structures in which they are instantiated will be robust against decoherence only if they lie within a single branch.

2.4.5 Probability: The Incoherence Problem

[12] I ignore the possibility of chaos; if this is included, then the quantum system would be better described as instantiating an ensemble of classical worlds.
[13] Note that here the ‘expectation value’ of an operator P̂ simply denotes ⟨ψ|P̂|ψ⟩; no probabilistic interpretation is intended.

The decoherence solution to the preferred-basis problem tells us that the quantum state is really a constantly-branching structure of quasi-classical worlds. It is much less clear how notions of probability fit into this account: if an agent knows for certain that he is about to branch into many copies of himself—some of which


see a live cat, some a dead cat—then how can this be reconciled with the Quantum Algorithm’s requirement that he should expect with a certain probability to see a live cat? It is useful to split this problem in two:

The Incoherence Problem: In a deterministic theory where we can have perfect knowledge of the details of the branching process, how can it even make sense to assign probabilities to outcomes?

The Quantitative Problem: Even if it does make sense to assign probabilities to outcomes, why should they be the probabilities given by the Born rule?

The incoherence problem rests on problems with personal identity. In branching, one person is replaced by a multitude of (initially near-identical) copies of that person, and it might be thought that this one-to-many relation of past to future selves renders any talk of personal identity simply incoherent in the face of branching (see, e.g., Albert and Loewer (1988) for a defence of this point). However (as pointed out by Saunders (1998b)), this charge of incoherence fails to take account of what grounds ordinary personal identity: namely (unless we believe in Cartesian egos), it is grounded by the causal and structural relations between past and future selves. These relations exist no less strongly between past and future selves when there exist additional such future selves; as such, if it is rational to care about one’s unique future self (as we must assume if personal identity in non-branching universes is to be made sense of) then it seems no less rational to care about one’s multiple future selves in the case of branching. This point was first made—entirely independently of QM—by Parfit; see his (1984). This still leaves the question of how probability fits in, and at this point there are two strategies available: the Fission Program and the Subjective Uncertainty Program (Wallace 2006a). The Fission Program works by considering situations where the interests of future selves are in conflict.
For instance, suppose the agent, about to observe Schrödinger’s Cat and thus to undergo branching, is offered an each-way bet on the cat being left alive. If he takes the bet, those future versions of himself who exist in live-cat branches will benefit and those who live in dead-cat branches will lose out. In deciding whether to take the bet, then, the agent will have to weigh the interests of some of his successors against those of others. Assigning a (formal) probability to each set of successors and choosing that action which benefits the highest-probability subset of successors is at least one way of carrying out this weighing of interests. This strategy is implicit in Deutsch (1999) and has been explicitly defended by Greaves (2004). It has the advantage of conforming unproblematically to our intuition that “I can feel uncertain over P only if I think that there is a fact of the matter regarding P of which I am ignorant” (Greaves 2004); it has the disadvantage of doing violence to our intuitions that uncertainty about the future is generally justified; it is open to question what epistemic weight these intuitions should bear.[14]

[14] See Wallace (2005), especially §6, for more discussion of this point.

There is, however, a more serious problem with


the Fission Program: it is at best uncertain whether it solves the measurement problem. For recall: in the framework of this review, ‘to solve the measurement problem’ is to construct a theory which entails the truth (exact or approximate) of the Quantum Algorithm, and that Algorithm dictates that we should regard macroscopic superpositions as probabilistic, and hence that an agent expecting branching should be in a state of uncertainty. The challenge for fission-program advocates is to find an alternative account of our epistemic situation according to which the Everett interpretation is nonetheless explanatory of our evidence. See Greaves (2007) for Greaves’ proposed account, which draws heavily on Bayesian epistemology. The Subjective Uncertainty Program aims to establish that probability really, literally, makes sense in the Everett universe: that is, that an agent who knows for certain that he is about to undergo branching is nonetheless justified in being uncertain about what to expect. (This form of uncertainty cannot depend on ignorance of some facts describable from a God’s-eye perspective, since the relevant features of the universal state are ex hypothesi perfectly knowable by the agent—hence, subjective uncertainty). Subjective uncertainty was first defended by Saunders (1998b), who asks: suppose that you are about to be split into multiple copies, then what should you expect to happen? He argues that, given that each of your multiple successors has the same structural/causal connections to you as would have been the case in the absence of splitting, the only coherent possibility is uncertainty: I should expect to be one of my future selves but I cannot know which. I presented an alternative strategy for justifying subjective uncertainty in Wallace (2005) (and more briefly in Wallace (2006a)). 
My proposal is that we are led to subjective uncertainty by considerations in the philosophy of language: namely, if we ask how we would analyse the semantics of a community of language-users in a constantly branching universe, we conclude that claims like “X might happen” come out true if X happens in some but not all branches. If the Subjective Uncertainty Program can be made to work, it avoids the epistemological problem of the Fission Program, for it aims to recover the quantum algorithm itself (and not just to account for its empirical success). It remains controversial, however, whether subjective uncertainty really makes sense. For further discussion of subjective uncertainty and identity across branching, see Greaves (2004), Saunders and Wallace (2007), Wallace (2006a) and Lewis (2007).

2.4.6 Probability: The Quantitative Problem

The Quantitative Problem of probability in the Everett interpretation is often posed as a paradox: the number of branches has nothing to do with the weight (i.e., modulus-squared of the amplitude) of each branch, and the only reasonable choice of probability is that each branch is equiprobable, so the probabilities in the Everett interpretation can have nothing to do with the Born rule.


This sort of criticism has sometimes driven advocates of the Everett interpretation back to the strategy of modifying the formalism, adding a continuous infinity of worlds (Deutsch 1985) or minds (Albert and Loewer 1988; Lockwood 1989) in proportion to the weight of the corresponding branch. But this is unnecessary, for the criticism was mistaken in the first place: it relies on the idea that there is some sort of remotely well-defined branch number, whereas there is no such thing. This can most easily be seen using the decoherent-histories formalism. Recall that the ‘branches’ are decoherent histories in which quasi-classical dynamics apply, but recall too that the criteria of decoherence and quasi-classicality are approximate rather than exact. We can always fine-grain a given history space at the cost of slightly less complete decoherence, or coarse-grain it to ensure more complete decoherence; we can always replace the projectors in a history space by ever-so-slightly-different projectors and obtain an equally decoherent, equally quasi-classical space. These transformations do not affect the structures which can be identified in the decoherent histories (for those structures are themselves only approximately defined) but they wildly affect the number of branches with a given macroscopic property. The point is also apparent using the formalism of quasi-classical states discussed in §2.1.1. Recall that in that framework, a macroscopic superposition is written

∫ dq dp α(q, p) |q, p⟩⊗|ψq,p⟩ .   (2.54)

If the states |q, p⟩⊗|ψq,p⟩ are to be taken as each defining a branch, there are continuum many of them, but if they are too close to one another then they will not be effectively decohered. So we will have to define branches via some coarse-graining of phase space into cells Qn, in terms of which we can define states

|n⟩ = ∫_{Qn} dq dp α(q, p) |q, p⟩⊗|ψq,p⟩ .   (2.55)

The coarse-graining must be chosen such that the states |n⟩ are effectively decohered, but there will be no precisely-determined ‘best choice’ (and in any case no precisely-determined division of Hilbert space into macroscopic and microscopic degrees of freedom in the first place). As such, the ‘count-the-branches’ method for assigning probabilities is ill-defined.[15]

[15] Wallace (forthcoming) presents an argument that the count-the-branches rule is incoherent even if the branch number were to be exactly definable.

But if this dispels the paradox of objective probability, still a puzzle remains: why use the Born rule rather than any other probability rule? Broadly speaking, three strategies have been proposed to address this problem without modifying the formalism. The oldest strategy is to appeal to relative frequencies of experiments. It has long been known (Everett, 1957) that if many copies of a system are prepared and measured in some fixed basis, the total
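The graining-dependence of branch number can be made vivid with a toy calculation (an illustration of my own, not drawn from the text): coarse-grain a schematic 'branch weight' distribution at different resolutions, and the branch count changes by orders of magnitude while the weight assigned to a fixed macro-region does not.

```python
import math

# Toy model: a smooth branch-weight density over a single quasi-classical
# coordinate, standing in for the decoherent-history space.  (Entirely
# schematic: real branch weights live on a space of coarse-grained histories,
# not on a line.)
N_FINE = 1 << 12                      # fine-grained resolution
xs = [(i + 0.5) / N_FINE for i in range(N_FINE)]
density = [math.exp(-((x - 0.3) ** 2) / 0.01)
           + 0.5 * math.exp(-((x - 0.7) ** 2) / 0.02) for x in xs]
norm = sum(density)
weights = [d / norm for d in density]  # fine-grained "branch weights", summing to 1

def branch_count(cell: int) -> int:
    """Number of coarse cells carrying non-negligible weight."""
    cells = [sum(weights[i:i + cell]) for i in range(0, N_FINE, cell)]
    return sum(1 for w in cells if w > 1e-12)

def macro_weight(cell: int) -> float:
    """Total weight of the macro-region x < 0.5, computed at this graining."""
    cells = [sum(weights[i:i + cell]) for i in range(0, N_FINE, cell)]
    return sum(cells[:(N_FINE // cell) // 2])

for cell in (1, 4, 16, 64):
    print(cell, branch_count(cell), round(macro_weight(cell), 6))
# The branch count varies with the graining by orders of magnitude;
# the weight of the macro-region "x < 0.5" does not.
```

Nothing here fixes a preferred cell size, which is exactly why counting branches yields no well-defined probability while summing weights over a macro-property does.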


weight of those branches where the relative frequency of any result differs appreciably from the weight of that result tends to zero as the number of copies tends to infinity. But it has been recognised for almost as long that this account of probability courts circularity: the claim that a branch has very small weight cannot be equated with the claim that it is improbable, unless we assume that which we are trying to prove, namely that weight = probability. It is perhaps worth noting, though, that precisely equivalent objections can be made against the frequentist definition of probability. Frequentists equate probability with long-run relative frequency, but again they run into a potential circularity. For we cannot prove that relative frequencies converge on probabilities, only that they probably do: that is, that the probability of the relative frequencies differing appreciably from the probabilities tends to zero as the number of repetitions of an experiment tends to infinity (the maths is formally almost identical in the classical and Everettian cases). As such, it is at least arguable that anyone who is happy with frequentism in general as an account of probability should have no additional worries in the case of the Everett interpretation.[16]

The second strategy might be called primitivism: simply postulate that weight = probability. This strategy is explicitly defended by Saunders (1998b); it is implicit in Vaidman’s “Behaviour Principle” (Vaidman 2002). It is open to the criticism of being unmotivated and even incoherent: effectively, to make the postulate is simply to stipulate that it is rationally compelling to care about one’s successors in proportion to their weight (or to expect to be a given successor in proportion to his weight, in subjective-uncertainty terms), and it is unclear that we have any right to postulate any such rationality principle, as if it were a law of nature.
But again, it can be argued that classical probability theory is no better off here—what is a “propensity”, really, other than a primitively postulated rationality principle? (This is David Lewis’s “big bad bug” objection to Humean supervenience; see Lewis (1986, pp. xiv–xvii) and Lewis (1997) for further discussion of it). Papineau (1996) extends this objection to a general claim about probability in Everett: namely, although we do not understand it at all, we do not understand classical probability any better!—so it is unfair to reject the Everett interpretation simply on the grounds that it has an inadequate account of probability.

The third, and most recent, strategy has no real classical analogue (though it has some connections with the ‘classical’ program in philosophy of probability, which aims to derive probability from symmetry). This third strategy aims to derive the principle that weight = probability from considering the constraints upon rational action of agents living in an Everettian universe.[17] It was initially proposed by Deutsch (1999), who presented what he claimed to be a proof of the Born rule from decision-theoretic assumptions; this proof was criticised by Barnum et al. (2000), and defended by Wallace (2003b). Subsequently, I have presented various expansions and developments on the proof (Wallace 2007, forthcoming), and Zurek (2003b, 2005) has presented another variant of it. It remains a subject of controversy whether or not these ‘proofs’ indeed prove what they set out to prove.

[16] Farhi, Goldstone, and Gutmann (1989) try to evade the circularity by direct consideration of infinitely many measurements, rather than just by taking limits; their work has recently been criticised by Caves and Schack (2005).
[17] Given this, it is tempting to consider the Deutsch program as a form of subjectivism about probability, but—as I argue more extensively in Wallace (2006a)—this is not the case. There was always a conceptual connection between objective probability and the actions of rational agents (as recognised in Lewis’s Principal Principle (Lewis 1980))—what makes a probability ‘objective’ is that all rational agents are constrained by it in the same way, and this is what Deutsch’s proofs (purport to) establish for the quantum weight. In other words, there are objective probabilities—and they have turned out to be the quantum weights.

2.4.7 Further Reading

Barrett (1999) is an extended discussion of Everett-type interpretations (from a perspective markedly different from mine); Vaidman (2002) is a short (and fairly opinionated) survey. Kent (2005) is a classic criticism of “old-style” many-worlds theories; Baker (2007), Lewis (2007) and Hemmo and Pitowsky (2007) criticise various aspects of the Everett interpretation as presented in this chapter.

2.5 Dynamical-Collapse Theories

In this section and the next, we move away from pure interpretations of the bare quantum formalism, and begin to consider substantive modifications to it. There are essentially two ways to do this:

Either the wavefunction, as given by the Schrödinger equation, is not everything, or it is not right. (Bell 1987, p. 201)

That is, if unitary QM predicts that the quantum state is in a macroscopic superposition, then either

1. the macroscopic world does not supervene on the quantum state alone but also (or instead) on so-called “hidden variables”, which pick out one term in the superposition as corresponding to the macroscopic world; or
2. the predictions of unitary QM are false: unitary evolution is an approximation, valid at the microscopic level but violated at the macroscopic, so that macroscopic superpositions do not in fact come into existence.

The first possibility leads us towards hidden-variable theories, the topic of §2.6. This section is concerned with “dynamical collapse” theories, which modify the dynamics to avoid macroscopic superpositions.

2.5.1 The GRW Theory as a Paradigm of Dynamical-Collapse Theories

How, exactly, should we modify the dynamics? Qualitatively it is fairly straightforward to see what is required. Firstly, given the enormous empirical success of QM at the microscopic level we would be well advised to leave the Schrödinger


equation alone at that level. At the other extreme, the Quantum Algorithm dictates that states like

α |dead cat⟩ + β |live cat⟩   (2.56)

must be interpretable probabilistically, which means that our modification must “collapse” the wavefunction rapidly into either |dead cat⟩ or |live cat⟩—and furthermore, it must do so stochastically, so that the wavefunction collapses into |dead cat⟩ with probability |α|² and into |live cat⟩ with probability |β|².

Decoherence theory offers a way to make these qualitative remarks somewhat more precise. We know that even in unitary QM, probabilistic mixtures of pointer-basis states are effectively indistinguishable from coherent superpositions of those states. So we can be confident that our dynamical-collapse theory will not be in contradiction with the observed successes of quantum theory provided that coherent superpositions are decohered by the environment before they undergo dynamical collapse—or, equivalently, provided that superpositions which are robust against decoherence generally do not undergo dynamical collapse. Furthermore, dynamical collapse should leave the system in (or close to) a pointer-basis state—this is in any case desirable, since the pointer-basis states are quasi-classical states, approximately localized in phase space.

The other constraint—that macroscopic superpositions should collapse quickly—is harder to quantify. How quickly should they collapse? Proponents of dynamical-collapse theories—such as Bassi and Ghirardi (2003)—generally require that the speed of collapse should be chosen so as to prevent “the embarrassing occurrence of linear superpositions of appreciably different locations of a macroscopic object”. But it is unclear exactly when a given superposition counts as “embarrassing”. One natural criterion is that the superpositions should collapse before humans have a chance to observe them. But the motivation for this is open to question. For suppose that a human observer looks at the state (2.56). If collapse is quick, the state rapidly collapses into

|dead cat⟩  or  |live cat⟩   (2.57)

and observation puts the cat-observer system into the state

|dead cat⟩⊗|observer sees dead cat⟩  or  |live cat⟩⊗|observer sees live cat⟩   (2.58)

Given the stochastic nature of the collapse, the probability of the observer being in a state where he remembers seeing a dead cat is |α|². Now suppose that the collapse is much slower, taking several seconds to occur. Then the cat-observer system enters the superposition

α |dead cat⟩⊗|observer sees dead cat⟩ + β |live cat⟩⊗|observer sees live cat⟩   (2.59)

Who knows what it is like to be in such a state?[18]

[18] According to the functionalist analysis of §2.4.4, “it is like” there being two people, one alive and one dead; but we shall not assume this here.

But no matter: in a few seconds the state collapses to


|dead cat⟩⊗|observer sees dead cat⟩  or  |live cat⟩⊗|observer sees live cat⟩ .   (2.60)

Once again, the agent is in a state where he remembers seeing either a live or a dead cat, and the probability is |α|² that he remembers seeing a dead cat—since his memories are encoded in his physical state, he will have no memory of the superposition. So the fast and slow collapses appear indistinguishable empirically. However, let us leave this point to one side. The basic constraints on a collapse theory remain: it must cause superpositions of pointer-basis states to collapse to pointer-basis states, and it must do so quickly enough to suppress “embarrassing superpositions”; however, it must not have any appreciable effect on states which do not undergo decoherence.

Here we see again the difficulties caused by the approximate and ill-defined nature of decoherence. If decoherence were an exactly and uniquely defined process, we could just stipulate that collapse automatically occurs when states enter superpositions of pointer-basis states. Such a theory, in fact, would be exactly our ‘solution that isn’t’ from §2.3.1. But since decoherence is not at all like this, we cannot use it directly to define a dynamical-collapse theory. The requirement on a dynamical-collapse theory is then: find a modification to the Schrödinger equation that is cleanly defined in microphysical terms, and yet which closely approximates collapse to the decoherence-preferred basis. And such theories can in fact be found. The classic example is the “GRW theory” of Ghirardi, Rimini, and Weber (1986). The GRW theory postulates that every particle in the Universe has some small spontaneous chance per unit time of collapsing into a localised Gaussian wave-packet:
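The fast-versus-slow point can be checked in a toy model (a sketch of my own, not from the text): represent branches as labelled amplitudes, model observation as a branch-by-branch relabelling, and collapse as the Born-weight distribution over branches; the two orderings give identical memory statistics.

```python
# Schematic check (mine, not from the text) that fast and slow collapse give
# the same memory statistics.  Branches are labelled by strings; amplitudes
# are numbers.  "observe" entangles the observer with the cat branch by
# branch; "collapse" replaces the superposition by a probability distribution
# over branches, with Born weights |amplitude|^2.

from math import isclose

alpha, beta = 0.6, 0.8          # |alpha|^2 = 0.36, |beta|^2 = 0.64

def observe(state):
    """Unitary observation: each cat branch acquires a matching memory record."""
    return {(cat, f"sees {cat}"): amp for cat, amp in state.items()}

def collapse(state):
    """Dynamical collapse: return Born-rule probabilities over branches."""
    return {branch: abs(amp) ** 2 for branch, amp in state.items()}

start = {"dead cat": alpha, "live cat": beta}

# Fast collapse: collapse first, then the (now definite) cat is observed.
fast = {(cat, f"sees {cat}"): p for cat, p in collapse(start).items()}

# Slow collapse: observation happens in superposition; collapse comes after.
slow = collapse(observe(start))

print(fast)
print(slow)
assert all(isclose(fast[k], slow[k]) for k in fast)
```

Because the observation map acts branch by branch, it commutes with the Born-weight assignment, which is all the equivalence requires in this toy setting.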

ψ(x) −→ N exp(−(x − x0)²/2L²) ψ(x)   (2.61)

where L is a new fundamental constant (and N is just a normalisation factor). The probability of collapse defines another new constant: τ, the mean time between collapses. Crucially, the ‘collapse centre’ x0 is determined stochastically: the probability that ψ collapses to a Gaussian with collapse centre in the vicinity of x0 is proportional to |ψ(x0)|². If the particle is highly localised (that is, localised within a region small compared with L) then the collapse will have negligible effect on it; if it is in a superposition of such states, it will be left in just one of them, with the probability of collapse to a given state being equal to its mod-squared amplitude. Now, τ is chosen to be extremely large, so that the chance of an isolated particle collapsing in a reasonable period of time is quite negligible. But things are otherwise if the particle is part of a macroscopic object. (The generalisation of (2.61) to N-particle systems is just

ψ(x1, . . . , xm, . . . , xN) −→ N exp(−(xm − x0)²/2L²) ψ(x1, . . . , xm, . . . , xN)   (2.62)

where the collapse occurs on the mth particle.) For suppose that that macroscopic object is in a superposition: something like (schematically)

α |at X⟩ ⊗ · · · ⊗ |at X⟩ + β |at Y⟩ ⊗ · · · ⊗ |at Y⟩ .   (2.63)

If N ≫ 1/τ, then within a small fraction of a second one of these particles will undergo collapse. Then the collapse will kick that particle (roughly speaking) into either |at X⟩ (with probability |α|²) or |at Y⟩ (with probability |β|²). For convenience, suppose it in fact collapses to X. Then because of the entanglement, so do all of the other particles—the system as a whole collapses to a state very close to

|at X⟩ ⊗ · · · ⊗ |at X⟩ .   (2.64)

(Taking more mathematical care: if ψ(x1, . . . , xN) is the wavefunction of a macroscopic N-particle body approximately localised at x = 0, then the superposition (2.63) takes the form

α ψ(x1 − X, . . . , xN − X) + β ψ(x1 − Y, . . . , xN − Y).   (2.65)

If the first particle undergoes collapse, then its collapse centre has probability ≈ |α|² of being in the vicinity of X. Assuming this is so, the post-collapse wavefunction is approximately proportional to

α ψ(x1 − X, . . . , xN − X) + β exp(−|X − Y|²/L²) ψ(x1 − Y, . . . , xN − Y).   (2.66)

On the assumption that |X − Y| ≫ L, the second term in the superposition is hugely suppressed compared with the first.)

So: the GRW theory causes superpositions of N particles to collapse into localised states in a time ∼ τ/N, which will be very short if τ is chosen appropriately; but it has almost no detectable effect on small numbers of particles. From the perspective in which I have presented dynamical collapse, GRW incorporates two key observations:

1. Although the decoherence process is approximately defined and highly emergent, the actual pointer-basis states are fairly simple: they are Gaussians, approximately localised at a particular point in phase space. As such, it is sufficient to define collapse as suppressing superpositions of position states.

2. Similarly, although the definition of ‘macroscopic system’ given by decoherence is highly emergent, in practice such systems can be picked out simply by the fact that they are compounds of a great many particles. So a collapse mechanism defined for single particles is sufficient to cause rapid collapse of macroscopic systems.

The actual choice of GRW parameters is determined by the sorts of considerations discussed above. Typical choices are L = 10⁻⁵ cm and τ = 10¹⁶ s, ensuring that an individual particle undergoes collapse only after ∼10⁸ years, but a grain of dust ∼10⁻² cm across will undergo collapse within a hundredth of a second, and Schrödinger’s cat will undergo it after ∼10⁻¹¹ seconds. (In fact, if the GRW theory holds then the cat never has the chance to get into the alive–dead superposition in the first place: dynamical collapse will occur in the cat-killing apparatus long before it begins its dreaded work.)
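The mechanism of (2.61)–(2.66) can be sketched numerically. The following is a toy discretisation of my own: the grid, the packet width, and the units are illustrative choices, not the GRW values (which would need a vastly larger grid).

```python
import math

# A discretised sketch of a single GRW-style hit (cf. eq. 2.61) acting on the
# superposition alpha|at X> + beta|at Y>.  Illustrative parameters only.
L = 1.0                    # collapse width (illustrative units)
X, Y = 0.0, 10.0           # branch locations, |X - Y| >> L
alpha, beta = 0.6, 0.8     # alpha^2 + beta^2 = 1

xs = [-10 + 0.01 * i for i in range(4001)]      # grid covering both branches
def packet(centre, x, width=0.2):
    """Narrow (unnormalised) Gaussian wave-packet."""
    return math.exp(-((x - centre) ** 2) / (4 * width ** 2))

psi = [alpha * packet(X, x) + beta * packet(Y, x) for x in xs]
norm = math.sqrt(sum(p * p for p in psi))
psi = [p / norm for p in psi]

# Born-type probability that the collapse centre lands in the X branch:
p_near_X = sum(p * p for p, x in zip(psi, xs) if x < (X + Y) / 2)
print(round(p_near_X, 3))        # ~ alpha^2 = 0.36

# Suppose (as in the text) the hit centre is x0 = X: multiply by the Gaussian
# of eq. (2.61) and renormalise.
post = [p * math.exp(-((x - X) ** 2) / (2 * L ** 2)) for p, x in zip(psi, xs)]
norm2 = math.sqrt(sum(p * p for p in post))
post = [p / norm2 for p in post]

w_Y = sum(p * p for p, x in zip(post, xs) if x > (X + Y) / 2)
print(w_Y)    # residual 'at Y' weight: enormously suppressed, but non-zero
```

The residual weight `w_Y` is the discrete analogue of the suppression factor in (2.66); its strict non-vanishing is the seed of the "problem of tails" discussed in §2.5.2.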

56

PHILOSOPHY OF QUANTUM MECHANICS

The GRW theory is not the "last word" on dynamical-collapse theories. Even in the non-relativistic domain it is not fully satisfactory: manifestly, the collapse mechanism does not preserve the symmetries of the wavefunction, and so it is not compatible with the existence of identical particles. These and other considerations led Pearle (1989) to develop "continuous spontaneous localisation" (or CSL), a variant on GRW where the collapse mechanism preserves the symmetry of the wavefunction, and most advocates of dynamical collapse now support CSL rather than GRW. (See Bassi and Ghirardi (2003, §8) for a review of CSL.) However, there seems to be a consensus that foundational issues with CSL can be equally well understood in the mathematically simpler context of GRW. As such, conceptual and philosophical work on dynamical collapse is predominantly concerned with GRW, in the reasonable expectation that lessons learned there will generalise to CSL and perhaps beyond.

2.5.2 The Problem of Tails and the Fuzzy Link

The main locus of purely philosophical work on the GRW theory in the past decade has been the so-called "problem of tails". As I shall argue (following Cordero (1999) to some extent) there are actually two "problems of tails", only one of which is a particular problem of dynamical-collapse theories, but both are concerned with the stubborn refusal of the wavefunction to remain decently confined in a finite-volume region of space.

The original "problem of tails" introduced by Albert and Loewer (1996) works as follows. Suppose we have a particle in a superposition of two fairly localised states |here⟩ and |there⟩:

|ψ⟩ = α |here⟩ + β |there⟩.    (2.67)

Dynamical collapse will rapidly occur, propelling the system into something like

|ψ′⟩ = √(1 − ε²) |here⟩ + ε |there⟩.    (2.68)

But (no matter how small ε may be) this is not the same state as

|ψ′′⟩ = |here⟩.    (2.69)

Why should the continued presence of the 'there' term in the superposition—the continued indefiniteness of the system between 'here' and 'there'—be ameliorated in any way at all just because the 'there' term has low amplitude? Call this the problem of structured tails (the reason for the name will become apparent). It is specific to dynamical-collapse theories: it is a consequence of the GRW collapse mechanism, which represents collapse by multiplication by a Gaussian and so fails to annihilate terms in a superposition no matter how far they are from the collapse centre.

It is interesting, though, that most of the recent 'tails' literature has dealt with a rather different problem, which we might call the problem of bare tails.


Namely: even if we ignore the 'there' term, the wavefunction |here⟩ is itself spatially highly delocalised. Its centre-of-mass wavefunction is no doubt a Gaussian, and Gaussians are completely delocalised in space, for all that they may be concentrated in one region or another. So how can a delocalised wave-packet possibly count as a localised particle?

This problem has little or nothing to do with the GRW theory. Rather, it is an unavoidable consequence of using wave-packets to stand in for localised particles. For no wave-packet evolving unitarily will remain in any finite spatial region for more than an instant (consider that infinite potentials would be required to prevent it tunneling to freedom). Apparent force is added to this objection by applying the eigenvector-eigenvalue link. The latter gives a perfectly clear criterion for when a particle is localised in any spatial region R: it must be an eigenstate of the operator

P̂_R = ∫_R dx |x⟩⟨x|.    (2.70)

That is, it must have support within R; hence, no physically realisable state is ever localised in a finite region.

One might be inclined to respond: so much the worse for the eigenvector-eigenvalue link, at least in the context of continuous observables. As we have seen in §2.1.3, its motivation in modern QM is tenuous at best. But that simply transfers the problem: if the eigenvector-eigenvalue link is not to be the arbiter of which physical states count as localised, what is?

Albert and Loewer propose a solution: a natural extension of the eigenvector-eigenvalue link which they call the fuzzy link. Recall that the eigenvector-eigenvalue link associates (at least a subset of) properties 1:1 with projectors, and regards a state |ψ⟩ as possessing (the property associated with) projector P̂ iff P̂ |ψ⟩ = |ψ⟩; that is, iff

| P̂ |ψ⟩ − |ψ⟩ | = 0.    (2.71)

The fuzzy link is a relaxation of this condition: the properties remain in one-to-one correspondence with the projectors, but now |ψ⟩ has the property (associated with) P̂ if, for some fixed small p,

| P̂ |ψ⟩ − |ψ⟩ | < p.    (2.72)

We shall return to the constant p and the question of what determines it; for now, note only that it must be chosen sufficiently large that wave-packets really count as localised, and sufficiently small that intuitively 'delocalised' states do not erroneously count as localised.
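To get a feel for the magnitudes involved, the deviation in (2.72) can be written in closed form for a Gaussian packet centred in a box: its square is just the probability weight outside the box. A sketch with illustrative numbers of my own (the text's 1-metre box with a 10⁻¹⁰ m packet would underflow double precision):

```python
# Closed-form size of the fuzzy-link deviation |P_R|psi> - |psi>| for a
# Gaussian wavepacket centred in a box of half-width a.  The squared
# deviation is the probability weight outside the box, which for a
# |psi(x)|^2 that is a normal density of standard deviation sigma is
# erfc(a / (sqrt(2) * sigma)) (both tails together).
import math

def deviation_outside(a, sigma):
    """Norm of (P_R - 1)|psi> for a unit-normalised Gaussian packet."""
    return math.sqrt(math.erfc(a / (math.sqrt(2) * sigma)))

print(deviation_outside(1.0, 0.1))   # ~4e-12: localised by any sane threshold p
print(deviation_outside(1.0, 2.0))   # ~0.79: clearly fails to count as localised
```

The point is that p only has to sit somewhere in the enormous gap between these two regimes.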


2.5.3 The Counting Anomaly

But these constraints lead to a problem: the counting anomaly, introduced by Lewis (1997). Suppose, with Lewis, that a (ridiculously¹⁹) large number N of distinguishable non-interacting particles are confined within some box. The wavefunction of each will be strongly peaked inside the box, so that if P̂_i is the 'particle i is in the box' operator (that is, if it projects onto states of the ith particle with support in the box) then |P̂_i |ψ⟩ − |ψ⟩| ∼ ε for extremely small ε. (For instance, for a 1-metre box and an atom whose wavepacket has characteristic width ∼10⁻¹⁰ m, ε is of the order of 10^(−10^20).)

¹⁹ Somewhere in the vicinity of 10^(10^20) are required; recall that the number of particles in the visible universe is about 10^80.

But now consider the proposition 'all N particles are in the box'. By definition, this is represented by the operator P̂ = Π_{i=1}^N P̂_i. Suppose that each particle has identical state |ψ⟩; suppose that each |ψ⟩ is highly localised in the box, as above. Then the overall state of the N particles is |Ψ⟩ = ⊗_{i=1}^N |ψ⟩, and |P̂ |Ψ⟩| = Π_{i=1}^N |P̂_i |ψ⟩| = (1 − ε)^N.

And this is unfortunate for the Fuzzy Link. For no matter how small ε may be, there will be some value of N for which (1 − ε)^N < p. And for that value of N, the Fuzzy Link tells us that it is false that all N particles are in the box, even as it tells us that, for each of the N particles, it is true that that particle is in the box.

So how did that happen? We can see what is going on in the following way. The proposition 'all N particles are in the box' is by nature compositional: it is in some sense definitionally equivalent to 'particle 1 is in the box and particle 2 is in the box and . . . and particle N is in the box'. But there are two ways to understand this compositionality:

1. The actual, true-or-false, proposition 'all N particles are in the box' is equivalent to the conjunction of the N propositions 'particle 1 is in the box', 'particle 2 is in the box', etc. So it is true iff each of those propositions is true. In turn, via Fuzzy Link semantics each one of those propositions is true iff |P̂_i |ψ⟩ − |ψ⟩| < p.

2. The proposition 'all N particles are in the box' is associated, via the one-to-one correspondence between propositions and projectors, with some projector; since that one-to-one correspondence respects the compositional structure of propositions, the proposition is associated with that projector P̂ which is the logical product of the N projectors corresponding to 'particle 1 is in the box', 'particle 2 is in the box', etc. Once we have P̂, we can use the Fuzzy Link to determine that the proposition is true iff |P̂ |Ψ⟩ − |Ψ⟩| < p—that is, iff Π_i |P̂_i |ψ⟩ − |ψ⟩| < p.

In both cases, we construct the proposition 'all the marbles are in the box' via conjunction of the N component propositions. But in case (1) we apply this conjunction at the level of the actual propositions after having applied the Fuzzy


Link to extract a propositional truth condition from a projector; in case (2) we do it the other way around. And the two procedures do not commute.

That might suggest an obvious remedy to the Counting Anomaly: fix one way round—fairly obviously (1), given that it seems forced on us by the semantics of ordinary language—and declare it correct. Which amounts to the following: restrict the Fuzzy Link to our basic propositions (those describing the properties of individual particles), and then allow ordinary truth-functional semantics to dictate truth conditions for compound propositions. On this strategy (call it the single-particle fuzzy link), (2) becomes a derived and approximate truth—something which in fact holds in all conceivable circumstances in the actual world, but is not logically true.

Clifton and Monton (1999), in their discussion of the Counting Anomaly, consider and reject this view, for instructive reasons:

[T]his strategy would require that the wavefunction collapse theorist not simply weaken the eigenstate-eigenvalue link between properties and probability 1, but sever this link entirely. And if one is willing to entertain the thought that events in a quantum world can happen without being mandated or made overwhelmingly likely by the wavefunction, then it is no longer clear why one should need to solve the measurement problem by collapsing wavefunctions! Another reason not to [accept the single-particle fuzzy link strategy] is that it seems arbitrary to apply a semantic rule for quantum states to a single-particle system, but not to a multi-particle system. Indeed, to the extent that one supposes there to be a plausible intuitive connection between an event's having high probability according to a theory, and the event actually occurring, one is hard pressed to resist the intuition in the multi-particle case.
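As an aside, the arithmetic driving the anomaly is elementary and worth checking directly. A sketch with illustrative values of ε, p and N (my own, and far milder than the text's):

```python
# The counting-anomaly arithmetic: each conjunct passes the fuzzy-link
# test individually, but the norm of P|Psi> is (1 - eps)^N, which drops
# below any threshold p once N is large enough.  eps, p and the values
# of N below are illustrative choices.
import math

eps = 1e-10    # single-particle deviation |P_i|psi> - |psi>|
p = 0.5        # fuzzy-link threshold

def joint_norm(N):
    """(1 - eps)^N, computed stably via exp(N * log1p(-eps))."""
    return math.exp(N * math.log1p(-eps))

for N in (10**6, 10**10, 10**11):
    print(N, joint_norm(N))
# 1000000       ~0.9999   ('all in the box' still counts as true)
# 10000000000   ~0.37
# 100000000000  ~4.5e-05  (conjunction now counts as false, although
#                          each single-particle claim still counts true)
```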

However, this objection fails to recognise that for collapse theorists the wavefunction is a physical entity. It is not some sort of probability distribution which makes events 'overwhelmingly likely'; it is the microscopic stuff out of which macroscopic objects—including the constituents of events—are made. Furthermore, Clifton and Monton are too quick to accept (in their discussion of a 'semantic rule') that there must be a link between the macroscopic properties of a system and the projectors onto that system's Hilbert space. As we have seen, this is a consequence of the eigenstate-eigenvalue link that we would be wise to reject. If we are serious about taking a realist attitude to the wavefunction, then macroscopic properties may turn out to supervene on any microscopic properties we like—including directly on single-particle fuzzy-link-defined positions—and need not have any particularly useful direct relation to large-dimensional projectors.

In fact, the single-particle fuzzy link has found little favour, partly for technical reasons (to be fair, this too was anticipated by Clifton and Monton). However, a conceptually rather closely related response to the Counting Anomaly has been widely accepted: the Mass Density Link (Ghirardi et al., 1995). The Mass Density Link makes a clean break with eigenvector-eigenvalue link semantics: commendably from the viewpoint of this chapter, it grants no particular semantic status at all to the 'observables' in general. Instead, it defines the following mass density observable for an n-particle system:


M̂(x) = Σ_i m_i N̂_i(x)    (2.73)

where the sum is over all particles, m_i is the mass of the ith particle, and N̂_i(x) is its number density operator. For instance, if there is just one particle of mass m under consideration then the mass density operator for that particle is

M̂(x) = m |x⟩⟨x|.    (2.74)

(It is apparent from this that the mass density is a distributional operator, rigorously defined only when smeared over some finite volume.) The mass density of state |ψ⟩ is then defined just as

ρ_ψ(x) = ⟨ψ| M̂(x) |ψ⟩.    (2.75)

In more intuitively understandable terms, the mass density is the sum of the mass-weighted 'probability' densities for finding each particle at x; for a one-particle wavefunction ψ(x), ρ(x) is just m|ψ(x)|².

According to the Mass Density Link, a particle is in the box if some sufficiently high fraction (1 − p) of its mass is in the box (again we postpone questions as to what fixes the value of p). The meaning of 'all N particles are in the box' is, uncomplicatedly, 'particle 1 is in the box and particle 2 is in the box and . . . ', and the truth conditions of that proposition are just that it is true iff all of the component propositions are true. The interpretation provides no alternative, possibly-incompatible way to access its truth value, and the Counting Anomaly is avoided.²⁰

Lewis (2003, 2004b, 2005) is unconvinced. He argues (to some extent following Clifton and Monton 2000) that the Mass Density Link avoids the Counting Anomaly at the cost of a comparably unintuitive result, which he calls the location anomaly. This anomaly arises when we consider the process of looking at the box and physically counting the number of particles in it. The ordinary quantum measurement theory—which the GRW theory is supposed to reproduce—then predicts that the expected number of particles found in the box will be somewhat less than N. Lewis claims that this clash between the predictions of how many particles are found in the box and how many are actually in the box "violates the entailments of everyday language" (Lewis, 2005, p. 174).
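The Mass Density Link's bookkeeping can be illustrated numerically. A toy one-particle check (the width, box size, mass and threshold are all illustrative choices of mine):

```python
# Toy check of the Mass Density Link for one particle: rho(x) =
# m*|psi(x)|^2 integrates to m, and the particle counts as 'in the box'
# when at least a fraction (1 - p) of the mass lies inside the box.
import numpy as np

m, p = 1.0, 1e-6
x = np.linspace(-20.0, 20.0, 40001)
dx = x[1] - x[0]

sigma = 0.1
psi = (2 * np.pi * sigma**2) ** -0.25 * np.exp(-x**2 / (4 * sigma**2))
rho = m * np.abs(psi) ** 2                         # mass density; integrates to m

mass_inside = np.sum(rho[np.abs(x) < 1.0]) * dx    # box of half-width 1

print(mass_inside / m)            # ~1.0: almost all the mass is inside
print(mass_inside / m > 1 - p)    # True: 'in the box' by the Mass Density Link
```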
²⁰ Ghirardi et al. have a further requirement on the mass density: that it be accessible; see the references above, and especially Bassi and Ghirardi (2003, pp. 86–92), for the meaning of this claim, and Monton (2004a) for a criticism.

Ghirardi and Bassi (1999, 2003) are bemused by this criticism, for reasons that I share: we have a theory which (a) gives a perfectly well-defined description of how many particles are in the box; (b) allows a precise description, in terms acceptable to the realist, of the measurement process by which we determine how many particles are in the box; and (c) predicts that if the number of particles is sufficiently (i.e., ridiculously) large there will be tiny deviations between the


actual number of particles and the recorded number of particles. They, and I, fail to see what the problem is here; I leave readers to reach their own conclusions.

2.5.4 The Status of the Link Principles

Perhaps, however, we are beginning to lose sight of the wood for the trees. The GRW theory is normally presented simply as a mathematical modification of the dynamics of the wavefunction, in which case the theory's mathematical consistency is not in question. So how did we even begin to worry that the theory was internally contradictory?

Answering this question requires us to consider: what actually is the status of these link principles (whether the 'fuzzy link' or the 'mass density link')? Often, discussions of them are most naturally read as regarding the principles as a piece of physical law: that is, to specify a dynamical-collapse theory we must not only give the modifications to the Schrödinger equation but also state the link principle. On this reading 'fuzzy-link' GRW and 'mass-density' GRW are different physical theories; so, too, are two versions of fuzzy-link GRW which disagree about the value of p in (2.72). Monton (2004a) argues explicitly for this reading of the link principles (mostly on the grounds that he wishes to reject wavefunction realism altogether), and Allori et al. (2007) explore its consequences in extenso, but other authors also seem to write as though they adopt this reading. Bassi and Ghirardi (2003), for instance, describe the mass density as the 'beable' of the GRW theory (pp. 94–5); Lewis (2004b) proposes empirical tests to measure the value of p in the Fuzzy Link.

How should we understand the ontology of a theory which treats the link principle as physical law? As far as I can see, such theories must be understood dualistically: in addition to the nonlinearly evolving wavefunction (to be understood, perhaps, as a field on 3N-dimensional space; cf. §2.4.3) there is a 3-dimensional world of mass densities, or possibly of fuzzy-link-defined classical properties.
The 3-dimensional world has no dynamical effect on the wavefunction, and conversely it is entirely determined by the wavefunction.²¹ In philosophy-of-mind terms this is a property dualism: the wavefunction has certain properties which are picked out by non-dynamical principles (the link principles in this case) as in some sense special (in philosophy of mind, for subserving conscious experience; in dynamical-collapse theories, for subserving macroscopic objects). On this ontology, the counting anomaly (though not, I think, the location anomaly) must be taken seriously, for it entails that our property-ascription rule ascribes contradictory properties to a given system.

However, this property-dualist ontology is unattractive. For one thing, it is explicitly anti-functionalist (cf. §2.4.4), since it requires that higher-level ontology supervene on only a primitively-selected subset of structural and dynamical properties of the world; for another, it effectively introduces new physical constants, such as p in the case of the Fuzzy Link. Hence, in explicit discussions of

²¹ By contrast, in hidden variable theories the hidden variables are fixed at best probabilistically by the wavefunction.


the status of the link principles an alternative view is more common: that the ontology of the theory consists of the wavefunction alone, and the link principles are just perspicuous ways of picking out certain relevant properties of that wavefunction. Albert and Loewer (1996) cash this out with reference to language use:

Our everyday language will supervene only vaguely (as it always has) on the microlanguage of particle positions, and . . . that language will itself supervene only vaguely . . . on the fundamental language of physics. And note (and this is important) that swallowing this additional vagueness will leave physics not one whit less of an empirical science than it has ever been. The fundamental language, the language of wavefunctions, the language of the stuff of which (on these theories) the world actually consists, is absolutely precise. (Albert and Loewer 1996, p. 90)

Clifton and Monton (1999, p. 716) and Lewis (2003, p. 168) give similar accounts. Such accounts can be regarded as functionalist in spirit: the fundamental ontology is given by the wavefunction alone, and our higher-level talk supervenes on properties of that wavefunction picked out not a priori (as would be the case if the link principles were fundamental) but by considerations of how our language describes these properties—which means, ultimately, by considerations of the structural and dynamical function played by these properties. On this reading, the counting anomaly is of no real import: it represents at most a failure of our linguistic conventions to operate as we might wish in a truly bizarre, and certainly never-reached-in-practice, physical situation.

However, if the link principles are to be understood in this way we will have come full circle, back to the original problem of tails, which I called the problem of 'structured tails' above. If regarded as fundamental principles, both the Fuzzy Link and the Mass Density Link deal perfectly satisfactorily with states like

|ψ⟩ = √(1 − ε²) |live cat⟩ + ε |dead cat⟩:    (2.76)

the Fuzzy Link says directly that the cat is alive because |ψ⟩ is very close to being an eigenstate of the 'cat is alive' projector with eigenvalue 1; the Mass Density Link entails that the cat's cells are localised in the spatial regions corresponding to a living cat. However, for all that its amplitude is tiny, the dead-cat term in the superposition is just as 'real' as the live-cat term. (Recall: we are treating the wavefunction as physical: the amplitude of a term in a superposition has nothing to do with the probability of that term, except indirectly via its rôle in the stochastic dynamics.)

As such, if the link principles are just a matter of descriptive convenience, then what prevents us regarding observers as being just as present in the dead-cat term as in the live-cat term? After all, if we do accept functionalism then the dead-cat term is as rich in complex structure as the live-cat term. Taken to its logical conclusion, this seems to suggest that the GRW theory with a functionalist reading of the link principles is just as much a 'many-worlds' theory as is the Everett interpretation (a point made by Cordero (1999)). But


the matter has received rather little critical discussion, and it may well be that the problem is solvable either via identifying a conceptual error in the argument, or by modification of the GRW dynamics so as to suppress the structure in the low-weight branches.²²

2.5.5 Further Reading

Bassi and Ghirardi (2003) provides a detailed review of the GRW and CSL theories; Ghirardi (2002) is a briefer introduction. Lewis (2005) is a good starting point for the Counting Anomaly.

There is a very different approach to the ontology of the GRW theory which has received some attention from physicists but gone almost unnoticed amongst philosophers: take the ontology to be just the collapse centres themselves, and treat the wavefunction as secondary. See Dowker and Herbauts (2005) for a technical development, and Allori et al. (2007) for philosophical discussion.

Still another, and very different, approach is the 'transactional interpretation' developed primarily by J. G. Cramer (see, e.g., Cramer 1986, 1988), in which (roughly speaking) the collapse propagates backwards in time along the past light-cone of the particle. Price (1996) discusses and defends the conceptual consequences of such a theory.

One of the most exciting features of dynamical collapse is that it is in principle testable. Leggett (2002) and Schlosshauer (2006) consider (from very different perspectives) the prospect of empirical tests for the failure of unitarity.

2.6 Hidden Variable Theories

Hidden variable theories take seriously the second half of Bell's dilemma: if QM is right, maybe it is not everything. The most famous such theory—the de Broglie-Bohm theory—is now over fifty years old; other hidden variable theories are of more recent vintage (and in particular, the so-called 'modal interpretation' is now generally recognised as a form of hidden-variable theory).
Here I shall first explore some of the general requirements on hidden-variable theories, then discuss the de Broglie-Bohm theory and the modal interpretation, and then consider some of the open conceptual questions facing hidden-variable theories.

2.6.1 Hidden Variables for Classical Physics: A Parable

Suppose, fancifully, that for some reason we only had classical mechanics in its statistical-mechanical form: as a theory concerning the evolution of some measure ρ(q, p) over phase space, in accordance with the usual dynamical equation

ρ̇ = {ρ, H} = (∂ρ/∂q^i)(∂H/∂p_i) − (∂ρ/∂p_i)(∂H/∂q^i);    (2.77)

²² The most straightforward way to make such a modification would be to replace the Gaussian used in the collapse process with a wave-packet of compact support (this does nothing to address the problem of bare tails but does annihilate the structured tails).

suppose also that we have a Classical Algorithm for extracting empirical predictions from the theory, which tells us that the probability of getting a result


in the vicinity of (q, p) on making a phase-space measurement of the system is proportional to ρ(q, p).

If we were asked to provide an interpretation of this theory, and we were having an off day, we might well begin by taking ρ as a physical entity, evolving in a 2N-dimensional space; we might further worry about how such a highly delocalised entity can correspond to our experiences of systems having definite positions and momenta; we might even toy with modifying (2.77) in some nonlinear and stochastic way so as to concentrate ρ periodically on much smaller regions of phase space.

Of course, we would be missing the point. There is a hidden variable theory for classical statistical mechanics. These 'hidden variables' are the positions and momenta of a swarm of pointlike particles, obeying the dynamical equation

d(q, p)/dt = (∂H/∂p, −∂H/∂q).    (2.78)

ρ is not a physical entity at all: it is a probability distribution, summarising our ignorance of the actual values of the hidden variables, and its 'dynamical equation' (2.77) is just the result of pushing that distribution forwards through time via the real dynamical equation (2.78). If we actually know the values of the hidden variables we can dispense with ρ altogether; it is only because in practice we do not know them (hence 'hidden') that in practice we often fall back on using ρ.

The original hope for hidden-variable theories was that they would work in just this way. The quantum state, after all, does serve to generate probability distributions (or equivalently, expectation values) over possible results of measurements, evolving in time via

d⟨X̂⟩/dt = −i ⟨[X̂, Ĥ]⟩,    (2.79)

and (notwithstanding the criticisms of §2.1.3) such measurements are traditionally associated with possessed quantities. So the hope was that actual quantum systems have 'hidden' determinate values of each quantity, that the quantum state is just a shorthand way of expressing a probability distribution over the various values of each of those quantities, and that some underlying (stochastic or deterministic) law for the hidden variables generates (2.79) just as (2.78) generates (2.77). Let us call this an eliminativist hidden-variable strategy: 'eliminativist' because it seeks to eliminate the wavefunction entirely from the formalism, and recover it only as a probabilistic average over hidden variables.

Half a century of work has made it clear that any eliminativist hidden-variable theory must possess some highly undesirable features. Nonlocality is the least of these: Bell's work (1981a) shows that hidden variable theories must be nonlocal, but it is fairly generally accepted (Redhead 1987, pp. 98–106; Maudlin 2002) that these conclusions apply equally to dynamical-collapse theories—so if we


want to be realists and to avoid the Everett interpretation, nonlocality is probably unavoidable.

More seriously, the Bell-Kochen-Specker (BKS) theorem (Bell 1966; Kochen and Specker 1967; see Redhead (1987, pp. 118–38) or Peres (1995, pp. 187–212) for a discussion) tells us that any hidden-variable theory which assigns values to all properties represented by projectors must be contextual. That is: whether or not a system is found, on measurement, to possess a given property must depend on what other properties are measured simultaneously. Contextuality seems well-nigh inconsistent with the idea that systems determinately do or do not possess given properties and that measurements simply determine whether or not they do.

In the light of the BKS theorem, there seem to be four strategies for hidden-variable theorists to follow.

1. Construct contextual hidden-variable theories, and try to come to terms with the contextuality. This strategy does not seem to have been widely explored, presumably because the failure of non-contextuality is so pathological (Spekkens (2007) is an interesting exception).

2. Maintain the idea that the quantum state is just shorthand for a set of probability distributions, but abandon the idea that any underlying microdynamics can be found. This allows us to say, for instance, that 40% of particles had position x and that 35% had momentum y, but forbids us to say anything about the correlations between the two or to refer to the position or momentum of any particular particle. Such interpretations (which are pure interpretations in the sense of §2.1.2) are often called ensemble interpretations, and have been defended by, e.g., Ballentine (1970, 1990) and Taylor (1986). It seems fairly clear that these interpretations are essentially variants of the 'operationalist interpretation' of §2.3.4, using the ensemble just to make conceptual sense of the probabilities. I am less clear, however, whether the proponents of the ensemble interpretation would accept this.

3. Abandon classical logic, on which the BKS theorem relies. Certain versions of quantum-logic interpretations can perhaps best be understood as eliminativist hidden-variable theories built on a non-classical logic: in particular, this description seems a good fit to the quantum-logic interpretation discussed in Dickson (2001). The requirement for a dynamics then translates into the requirement for a logic of multi-time propositions (such as 'the particle is currently at location x ∧ in t seconds it will be at location y'). In general, quantum-logic approaches lie beyond the scope of this review, so I will say nothing further here about them.

4. Abandon the idea that the hidden variables must include determinate values (or values at all) for all properties. If they only have values of, say, position, then the BKS theorem does not apply, since it relies on the existence of sets of properties not all of whose associated projectors commute.
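The escape route in the fourth strategy can be seen in miniature: the BKS theorem needs sets of projectors that do not all commute, while the eigenprojectors of one fixed observable always commute with each other. A 2×2 toy check of my own:

```python
# Why strategy 4 escapes BKS: eigenprojectors of a single observable
# (here sigma_z) commute with one another, so values can be assigned to
# all of them jointly; projectors drawn from an incompatible observable
# (here sigma_x) do not commute with them, which is what BKS exploits.
import numpy as np

P_up = np.array([[1, 0], [0, 0]], dtype=complex)          # sigma_z eigenprojector
P_down = np.array([[0, 0], [0, 1]], dtype=complex)        # sigma_z eigenprojector
P_plus = 0.5 * np.array([[1, 1], [1, 1]], dtype=complex)  # sigma_x eigenprojector

def commute(A, B):
    return bool(np.allclose(A @ B, B @ A))

print(commute(P_up, P_down))   # True: safe to assign both values jointly
print(commute(P_up, P_plus))   # False: the BKS-relevant situation
```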


It is the fourth strategy which is adopted by the vast majority of hidden-variable theories currently discussed in philosophy of QM. In fact, the restriction in practice has to be severe: if one fixes a single non-degenerate observable X̂ and requires that the hidden variables have definite values of X̂, the only other values which they can possess are values of functions of X̂.²³

What is perhaps less obvious is that the fourth strategy sharply constrains eliminativist hidden-variable theories of all but the most trivial kind, for it is nearly impossible to establish empirically adequate dynamics for such theories. This follows from the fact that the probability distribution over the hidden variables now badly underdetermines the quantum state. For instance, suppose that the hidden variables are associated with some operator X̂ (with eigenstates {|x⟩} satisfying X̂ |x⟩ = x |x⟩). If the probability of the hidden variables having value x is R(x), then we know the state vector must have the form

|ψ⟩ = Σ_x √R(x) exp(iθ(x)) |x⟩,    (2.80)

but we have no information about the phase factors θ(x). Since these phases affect the dynamics of |ψ⟩, they affect the future probability distribution of the hidden variables—hence, their current distribution underdetermines their future distribution.

In the light of this, it is unsurprising to observe that essentially all hidden-variable theories currently studied are what might be called dualist hidden-variable theories: their formalism contains not only the hidden variables but also the quantum state, with the latter playing a dynamical rôle in determining the evolution of the former.²⁴ The remainder of this section will be concerned with such theories.

2.6.2 General Constraints on Hidden-Variable Theories

To be precise, formally such theories are specified by giving, for whatever system is under study:

1. The quantum state |ψ(t)⟩ (evolving unitarily via the Schrödinger equation).

2. Some preferred set of observables X̂_i (chosen sufficiently small that no Bell-Kochen-Specker paradox arises; in practice this means that the X̂_i must all commute with one another).²⁵

²³ This statement is only true for generic choices of state vector (those which overlap with all eigenspaces of X̂). A very elegant theorem of Bub and Clifton (Bub and Clifton 1996; Bub, Clifton, and Goldstein 2000; see also Bub 1997, chapter 4) places more precise restrictions on exactly which properties can be included in a hidden-variable theory before non-contextuality fails.

²⁴ Nelson's theory (1966, 1985) is a partial counterexample: it includes hidden variables with definite positions and an additional field which encodes the phase information about the wavefunction, but does not include the wavefunction itself. Still, it remains very far from eliminativist.

²⁵ Busch (1998) considers the possibility of using POVMs rather than 'old-fashioned' observables.
HIDDEN VARIABLE THEORIES


3. The actual possessed values x_i of these observables.

4. A dynamical equation for the x_i, which we might write schematically as

\[ \frac{d}{dt}\, x_i(t) = F_i(x_1(t), \ldots, x_N(t); |\psi(t)\rangle) \qquad (2.81) \]

and which may be deterministic or stochastic.

It is then possible to speak of the 'determinate properties' for such a theory: namely, all the properties whose defining projectors are eigenprojectors of the X̂_i (and whose value is therefore fixed by the values of the x_i).

The idea of such theories is that the observable world is in some sense represented by the hidden variables, rather than (or at least: as well as) the state vector. (As such, in dualistic hidden-variable theories it is actually rather odd to call the variables "hidden": if anything it is the state vector that is hidden. Bell (1981b) suggests that we might do better to refer to them as the exposed variables!) As we shall see, there are a variety of ways to cash this out. However, it is possible to place some empirical constraints on these theories, for to resolve the measurement problem they must allow us to reproduce the Quantum Algorithm. Here it will be useful again to adopt the decoherent-histories formalism: quasi-classical histories can be identified with certain sequences of projectors P̂_{i_k}(t_k); instantaneous quasi-classical states can be identified with single such projectors. From this we derive our

First constraint on a hidden-variable theory: There must exist a quasi-classical decoherent history space, fine-grained enough that any two empirically distinguishable histories of the world correspond to distinct histories in the space, and such that any projector P̂_{i_k}(t_k) in that space is determinate at time t_k.

This first constraint guarantees that if we know the values of the hidden variables, we know which macroscopic state is represented by the theory. We need to go further, however: the Quantum Algorithm is probabilistic in nature, and those probabilities must be represented somewhere in the hidden-variable theory. This leads to the

Second constraint on a hidden-variable theory (weak version): If P̂ is a time-t projector from the decoherent history space of the first constraint, then the probability of the hidden variables being such that P̂ is determinately possessed by the system at time t, conditional on the universal state at time t being |ψ⟩, must be ⟨ψ|P̂|ψ⟩.

This guarantees the empirical accuracy of the Born rule for macroscopic states at a given time. In view of the usual interpretation of the 'hidden variables' as giving the actual values of the preferred quantities, it is normal to require a stronger condition:

Second constraint on a hidden-variable theory (strong version): If P̂ is any projector whose value is determinate at time t, then the probability of the hidden variables being such that P̂ is determinately possessed by the system at time t, conditional on the universal state at time t being |ψ⟩, must be ⟨ψ|P̂|ψ⟩.

Now, the second constraint (in either form) requires us to place a certain probability distribution over hidden variables at time t. The dynamical equation


PHILOSOPHY OF QUANTUM MECHANICS

for the variables will then determine a probability distribution over them at all other times; if that probability distribution is not the one required by the second constraint, we will have a contradiction. This gives our

Third constraint on a hidden-variable theory: If the second constraint is satisfied at one time, the dynamics are such that it is satisfied at all other times. (This constraint on the dynamics is sometimes called equivariance.)
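The point made earlier, that the current hidden-variable distribution fixes the moduli R(x) in equation (2.80) but not the phases θ(x), and that the phases nevertheless fix the future distribution, can be checked numerically. A minimal sketch (the two-level system and the choice of Hamiltonian here are my own illustrative assumptions, not from the text):

```python
import numpy as np

# Two state vectors with the same moduli R(x) but different phases theta(x):
# both assign probability 1/2 to each hidden-variable value (cf. eq. 2.80).
psi_a = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2)   # theta = (0, 0)
psi_b = np.array([1.0, -1.0], dtype=complex) / np.sqrt(2)  # theta = (0, pi)
assert np.allclose(np.abs(psi_a) ** 2, np.abs(psi_b) ** 2)

# Evolve both under the same Hamiltonian H = sigma_y for time t = pi/4 (hbar = 1).
H = np.array([[0, -1j], [1j, 0]])
t = np.pi / 4
vals, vecs = np.linalg.eigh(H)
U = vecs @ np.diag(np.exp(-1j * vals * t)) @ vecs.conj().T  # U = exp(-iHt)

# The future hidden-variable distributions now differ completely:
p_a = np.abs(U @ psi_a) ** 2   # -> (0, 1)
p_b = np.abs(U @ psi_b) ** 2   # -> (1, 0)
```

Two states that agree on the current hidden-variable probabilities can thus disagree maximally about the probabilities a moment later, which is why no empirically adequate eliminativist dynamics can be written in terms of the distribution alone.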

It might seem that this list of constraints is sufficient to recover the Quantum Algorithm; not so. For that Algorithm involves "collapse of the wavefunction": it permits us to interpret macroscopic superpositions as probabilistic in nature, and so to discard all terms in the superposition except the one corresponding to the actually-observed world. In hidden-variable theories, though, the state continues to evolve unitarily at all times and no such collapse occurs: in principle, all terms in the superposition, not just the one corresponding to the quasi-classical world picked out by the hidden variables, can affect the evolution of the hidden variables via the dynamical equation. To prevent a clash with the Quantum Algorithm, we need to rule this out explicitly:

Fourth constraint on a hidden-variable theory: If P̂ is a time-t projector in the decoherent history space of the first constraint, and if the hidden variables are such that the property corresponding to P̂ is possessed by the system at time t, then the dynamics of the hidden variables after t are the same whether we take the universal state to be |ψ⟩ or P̂|ψ⟩/‖P̂|ψ⟩‖. In terms of the notation of equation (2.81), we are requiring that

\[ F(x_1, \ldots, x_N; |\psi\rangle) = F(x_1, \ldots, x_N; \hat{P}|\psi\rangle / \|\hat{P}|\psi\rangle\|). \qquad (2.82) \]

This assumption might be called locality: the hidden variables are affected only by "their" branch of the state vector.

If a hidden-variable theory satisfies these four constraints, then it will solve the measurement problem, provided that we can actually understand the mathematical formalism of the theory in such a way as to justify the claim that it represents the single approximately-classical world picked out by the hidden variables. As such, the problem is two-fold: to construct such theories in the first place, and then to interpret them.

2.6.3 Specific Theories I: Modal Interpretations

In trying to construct actual hidden-variable theories which conform to these four constraints, we once again run up against the inherent approximateness of decoherence. If there existed a clean, precise statement of what the decoherent histories actually were, we could just stipulate that the preferred quantities were precisely the projectors in the decoherent histories. (Effectively, of course, this would be to return us to something like the "solution that isn't" of §2.3.1.) But since decoherence is imprecisely defined and approximately satisfied, this strategy is not available.

One way around this problem is to define some cleanly-stateable, state-dependent rule which approximately picks out the decoherence-preferred quantities. A concrete way to do this was developed by Kochen, Healey, Dieks and others, following a proposal of van Fraassen (1991); theories of this form are


normally called modal interpretations, although the term is used in a variety of conflicting ways by different authors.

Modal interpretations assign definite properties to subsystems of a given isolated system. They do so via the so-called "Schmidt decomposition" of the quantum state: given a Hilbert space H = H_A ⊗ H_B and a state |ψ⟩ in that space, it is always possible to find orthonormal bases {|A_i⟩}, {|B_j⟩} for H_A and H_B such that

\[ |\psi\rangle = \sum_i \lambda_i\, |A_i\rangle \otimes |B_i\rangle ; \qquad (2.83) \]

furthermore, generically (i.e., except when two of the λ_i are equal) this decomposition is unique. Modal interpretations take the projectors |A_i⟩⟨A_i| to define the preferred properties of the subsystem represented by H_A. Notice: the reduced state ρ̂_A for H_A is

\[ \hat{\rho}_A = \sum_i |\lambda_i|^2\, |A_i\rangle\langle A_i| ; \qquad (2.84) \]
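For finite-dimensional systems, the Schmidt decomposition is just the singular value decomposition of the state's coefficient matrix, so equations (2.83) and (2.84) can be verified concretely. A small numpy sketch (the particular state is an arbitrary illustrative choice, not from the text):

```python
import numpy as np

# A (normalised) state on H_A ⊗ H_B with dim(H_A) = dim(H_B) = 2, written as a
# coefficient matrix: |psi> = sum_{jk} C[j,k] |j>_A ⊗ |k>_B. Illustrative choice.
C = np.array([[0.8, 0.1], [0.2, 0.5]], dtype=complex)
C /= np.linalg.norm(C)

# The Schmidt decomposition (2.83) is the singular value decomposition of C:
# C = U diag(lambda_i) V†, i.e. |psi> = sum_i lambda_i |A_i> ⊗ |B_i>,
# with the |A_i> the columns of U and the |B_i> the rows of V†.
U, lam, Vh = np.linalg.svd(C)
assert np.allclose(C, U @ np.diag(lam) @ Vh)

# The reduced state rho_A = C C† is diagonalised by the Schmidt basis {|A_i>},
# with eigenvalues |lambda_i|^2, exactly as in equation (2.84).
rho_A = C @ C.conj().T
for i in range(2):
    assert np.allclose(rho_A @ U[:, i], lam[i] ** 2 * U[:, i])
```

The assertions confirm both the decomposition itself and the claim that the Schmidt basis diagonalises the reduced state.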

hence, the basis {|A_i⟩} diagonalises the reduced state. If the system described by H_A is macroscopic, and decohered by its environment, then the decoherence basis will approximately diagonalise ρ̂_A; hence, it is at least plausible that the modal interpretation's preferred quantity is close to being decoherence-preferred. If we add an equivariant dynamics, it appears that we have a hidden-variable interpretation which satisfies the four constraints, and thus is a candidate for solving the measurement problem. And in fact it has proved possible to construct such dynamics: see Bacciagaluppi (1998) for a discussion.

I should note that my presentation here differs significantly from the actual historical development of modal interpretations. Originally, the Schmidt decomposition rule was proposed in order that quantum measurements should actually succeed in measuring the right quantity: if X̂ is some observable for H_A (with eigenstates |x_i⟩) and H_B is the Hilbert space of a measurement device intended to measure X̂, then we might model the measurement process by

\[ |x_i\rangle \otimes |\text{ready}\rangle \longrightarrow |x_i\rangle \otimes |\text{measure } x_i\rangle , \qquad (2.85) \]

in which case measurement on a superposition gives

\[ \sum_i \lambda_i\, |x_i\rangle \otimes |\text{ready}\rangle \longrightarrow \sum_i \lambda_i\, |x_i\rangle \otimes |\text{measure } x_i\rangle . \qquad (2.86) \]

Plainly, the Schmidt decomposition applied to the post-measurement state gives X̂ as a determinate observable for H_A and

\[ \sum_i x_i\, |\text{measure } x_i\rangle\langle\text{measure } x_i| \qquad (2.87) \]

(the "X̂-measurement observable") as a determinate observable for H_B; if the distribution of hidden-variable values satisfies the Second Constraint, then the


system being measured will have value x_i for X̂ if and only if the measurement device determinately possesses the property of having measured x_i.

However, most measurements are not 'ideal' in this sense; many, for instance, totally destroy the system being measured, which we might represent as

\[ |x_i\rangle \otimes |\text{ready}\rangle \longrightarrow |x_0\rangle \otimes |\text{measure } x_i\rangle \qquad (2.88) \]

(in optics, for instance, |x₀⟩ might be a no-photon state). It follows that the definite property for H_B in this case is the property of being in the superposition

\[ \sum_i \lambda_i\, |\text{measure } x_i\rangle , \qquad (2.89) \]

which is decidedly non-classical. It was recognised by Bacciagaluppi and Hemmo (1996) that this problem is resolved when decoherence is taken into account, since the state (2.89), once the environment is considered, rapidly decoheres into an entangled state like

\[ \sum_i \lambda_i\, |\text{measure } x_i\rangle \otimes |\epsilon_i\rangle , \qquad (2.90) \]

with ⟨ε_i|ε_j⟩ ≈ δ_ij, and so it is plausible that the definite-measurement-outcome observable is again determinate.

Modal interpretations, however, have run into some serious difficulties, both conceptual and technical. For one thing, it is very unclear what the ontology of the theory is intended to be. Even in the presence of decoherence, the properties picked out by the modal-interpretation rule will not exactly be properties like position and momentum; it will determinately be true not that a system is localised in region R, but only that it is approximately localised in region R. And it has been argued (initially by Albert and Loewer (1990); see also Ruetsche (1998)) that this is insufficient to solve the measurement problem: superpositions with very uneven amplitudes are still superpositions. My own view is that this objection should not be taken very seriously, however. It stems once again from the unmotivated assumption that classical properties have to be associated with projectors, but there is an alternative strategy available: if |A_i⟩⟨A_i| corresponds to the maximally fine-grained property that a system is supposed to have, just take the ontology of the theory to be the state vector (the wavefunction, if you like) |A_i⟩, understood as a physical entity. If that physical entity is not an eigenstate of any particularly easily-described operator, so be it. (The parallel to my objections to the Fuzzy Link in the GRW theory and to the Bare Theory should be plain; see also Arntzenius (1998) for a somewhat related response to this objection.)

More serious conceptual threats to modal interpretations arise when we consider composite systems. For one thing, the modal interpretation appears to require a preferred decomposition of the universal Hilbert space into subsystems: as has been shown by Bacciagaluppi (1995), if we apply the modal rule to all decompositions, it leads to contradiction. But even if we suppose that we


are given some such preferred decomposition, trouble looms. Suppose we have a system with three subsystems, with Hilbert space H = H_A ⊗ H_B ⊗ H_C and state |ψ⟩. We can determine the determinate properties of H_A in two ways: by applying the Schmidt decomposition rule to H_A directly as a subsystem of H, or by applying it first to H_A ⊗ H_B as a subsystem of H and then to H_A as a subsystem of H_A ⊗ H_B. However, the Schmidt decomposition theorem does not generalise: there is no guarantee that |ψ⟩ can be decomposed like

\[ |\psi\rangle = \sum_i \lambda_i\, |A_i\rangle \otimes |B_i\rangle \otimes |C_i\rangle \qquad (2.91) \]

with ⟨A_i|A_j⟩ = ⟨B_i|B_j⟩ = ⟨C_i|C_j⟩ = δ_ij. This means, in turn, that there is no guarantee that the two methods for finding the determinate properties of H_A will give the same, or even approximately the same, answer. This perspectivalism of properties was noted by Arntzenius (1990) (see also Arntzenius (1998) and Clifton (1996)).

Two technical results alleviate the problem. Firstly, it can easily be proved that if a property determinately is possessed by a system from one perspective, it will not be found determinately not to be possessed from any perspective. Secondly, in the presence of decoherence, decompositions like (2.91) turn out to be approximately true. But it is far from clear that this is sufficient: non-perspectivalism has the feel of a conceptual necessity, not just an approximately-true empirical fact. (What does it even mean to say that some system has the property that its subsystem has property p, if not that the subsystem just has that property?)

There are basically two strategies to deal with perspectivalism: either accept it as a counter-intuitive but not formally contradictory property of QM (which will lead to some sort of non-classical logic), or adopt an 'atomic' theory according to which the modal rule applies only to some preferred decomposition of the system into atomic subsystems, and where the properties of compound systems are by definition given by the properties of their atomic components (as suggested by Bacciagaluppi and Dickson (1999)). See Vermaas (1998) for a comparison of the two strategies; note also the close similarity between these issues and the counting anomaly discussed in §2.5.3.

So much for the conceptual objections. A potentially fatal technical objection exists: in realistic models, it seems that decoherence does not after all guarantee that the determinate properties pick out quasi-classical histories.
For just because a given basis approximately diagonalises the density operator of a system, there is no guarantee that the basis which exactly diagonalises it is even approximately the same basis. In fact, where the density operator is approximately degenerate, the exactly-diagonalising basis (i.e., the one picked out by the modal rule) can be wildly different from the approximately-diagonalising basis (i.e., the quasi-classical one picked out by decoherence). See Bacciagaluppi, Donald, and Vermaas (1995),


Donald (1998), and Bacciagaluppi (2000) for further details. It is unclear whether the modal interpretation remains viable at all in the face of this problem.

2.6.4 Specific Theories II: de Broglie-Bohm Theory

In view of the difficulties faced by the modal interpretation, it has fallen somewhat from favour. An older strategy for constructing hidden-variable theories remains popular, though (at least amongst philosophers of physics): rather than trying to approximate the decoherence rule itself, just pick an observable which in fact is approximately decoherence-preferred. The obvious choice is position: we take as hidden variables, for an N-particle system, the positions q_1, ..., q_N of each of the N particles, and represent our ignorance of their precise values via the probability measure

\[ \Pr\big((\mathbf{q}_1, \ldots, \mathbf{q}_N) \in S\big) = \int_S d\mathbf{q}_1 \cdots d\mathbf{q}_N\, |\psi(\mathbf{q}_1, \ldots, \mathbf{q}_N)|^2 . \qquad (2.92) \]

We then specify an equivariant, configuration-space-local differential equation for the evolution of the q_1, ..., q_N. There are a number of available choices, but the standard choice is the guidance equation:

\[ \frac{d\mathbf{q}_i}{dt} = \frac{1}{m}\, \mathrm{Im}\, \frac{\nabla_i \psi(\mathbf{q}_1, \ldots, \mathbf{q}_N)}{\psi(\mathbf{q}_1, \ldots, \mathbf{q}_N)} . \qquad (2.93) \]
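To make the guidance equation concrete, here is a minimal numerical sketch of equivariance (the freely spreading Gaussian wavepacket, the units ħ = m = 1, and all parameter choices are my own illustrative assumptions, not from the text). Sampling initial corpuscle positions from |ψ|² as in (2.92) and integrating (2.93) keeps the ensemble |ψ|²-distributed at later times:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_log_psi(x, t):
    # For the free Gaussian psi(x,t) ∝ (1+it)^(-1/2) exp(-x^2/(2(1+it)))
    # (with hbar = m = 1), we have d(log psi)/dx = -x/(1+it).
    return -x / (1.0 + 1j * t)

# Sample initial positions from |psi(x,0)|^2, a Gaussian of variance 1/2 (eq. 2.92).
x = rng.normal(0.0, np.sqrt(0.5), size=100_000)

# Integrate the guidance equation dx/dt = Im[psi'/psi] (eq. 2.93, m = 1) by Euler steps.
dt, T = 0.001, 1.0
for step in range(int(T / dt)):
    x += dt * np.imag(grad_log_psi(x, step * dt))

# Equivariance: at time T the ensemble is still |psi(x,T)|^2-distributed,
# i.e. Gaussian with variance (1 + T^2)/2 = 1.0 at T = 1.
print(np.var(x))  # close to 1.0
```

(For this wavepacket the exact Bohmian trajectories are x(t) = x(0)·√(1+t²): each corpuscle simply rides the spreading packet.)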

The resulting theory is known as the de Broglie-Bohm theory (DBB), or sometimes as Bohmian mechanics or the "pilot-wave theory"; its first clear and widely accessible statement was Bohm (1952).

DBB has a number of conceptual advantages over modal interpretations. The ontology of its hidden variables seems fairly clear: they are simply point particles, interacting with one another via a spatially nonlocal but configuration-space-local dynamical equation, and following well-defined trajectories in space (following Brown (1996), and to distinguish the hidden variables from other senses of 'particle', I will refer to these point particles as corpuscles). Manifestly, there is no perspectivalism in DBB: the hidden variables assigned to any set of M 1-particle systems are just the M corpuscles associated with those systems.²⁶

On the technical side, for DBB to solve the measurement problem we require that the corpuscles track the decohered macroscopic degrees of freedom of large systems. It is at least plausible that they do: coarse-grainings of the position components of the macroscopic degrees of freedom generate an approximately decoherent space of histories, and since the guidance equation is configuration-space-local, it seems that Constraints 1–4 are all satisfied. However, the matter is somewhat more complicated: as I noted in §2.2.3, the decoherent-history

²⁶ Note here that the theory has no problem with identical particles: the corpuscles of a system of identical particles can be identified with a point in the reduced configuration space, so that the corpuscles too are identical. In fact, this framework gives some insight into the origin of fermionic and bosonic statistics (Brown et al., 1999).


framework is not a perfect fit to the actual phenomenology of decoherence in macroscopic systems, where the "pointer basis" is overcomplete and where decoherence between distinct states in the pointer basis is not an all-or-nothing matter. Further plausibility arguments have been constructed (e.g. Bell (1981b, §4), Holland (1993, pp. 336–50), Dürr, Goldstein, and Zanghì (1996, pp. 39–41)), and some simple models have been studied; at present, it seems likely that the corpuscles do track the quasi-classical trajectories sufficiently well for DBB to solve the measurement problem, but there exists no full proof of this.

(As with the modal interpretation, my route to deriving and justifying DBB (choose position as the hidden variable because it is approximately decohered, and write down local dynamical equations for it) is not exactly that seen in the literature. The more common justification of why it suffices to take position as preferred is that "the only observations we must consider are position observations" (Bell, 1982), and hence if we wish to reproduce the predictions of QM for all measurements it is sufficient to reproduce them for position measurements. The equivalent slogan for my approach would be "the only macroscopic superpositions we must consider are superpositions of states with different positions".)

DBB is substantially the most studied, and the most popular, extant hidden-variable theory. While the remainder of this section will consider general conceptual problems with hidden-variable theories, DBB will normally be used when an example is required, and in fact much of the literature on these issues has dealt specifically with DBB.

2.6.5 Underdetermination of the Theory

At least at the level of the mathematics, hidden-variable theories reproduce the predictions of QM by picking out a particular decoherent history as actual. It follows that the details of such theories are underdetermined by the empirical data, in two ways:

1. It is impossible to determine the values of the hidden variables more precisely than by determining which decoherent history they pick out. (For instance, it is widely recognised that the location of the de Broglie-Bohm corpuscles cannot be fixed more precisely than by determining which wavepacket they lie in.) This means that the details of what the hidden-variable observables are cannot be determined empirically, even in principle; many different choices would lead to exactly the same macroscopic phenomenology. (In this sense, if no other, the hidden variables are indeed 'hidden'!)

2. Similarly, any two hidden-variable dynamical equations that reproduce the macroscopic dynamics (that is, that are equivariant in the sense of Constraint 3 and local in the sense of Constraint 4) will be empirically indistinguishable even in principle.

We have seen that both forms of underdetermination loom large for the modal interpretations, in the controversy over perspectivalism and in the wide variety of different dynamical schemes which have been suggested. In DBB there is a


stronger consensus that the guidance equation gives the true dynamics,²⁷ but many variants are possible, both deterministic (Deotto and Ghirardi 1998) and stochastic (Bacciagaluppi 1999). There is also some disagreement over the hidden variables once we try to generalise the theory from spinless non-relativistic particles. In the case of particles with spin, some wish to include spin as a further hidden variable; others prefer a more minimalist version of the theory where corpuscles have only position. In the relativistic case, for various reasons it has been proposed that only fermions (Bell, 1984) or only bosons (Struyve and Westman, 2006) have associated corpuscles.

Bassi and Ghirardi (2003, fn. 16) argue that underdetermination is more serious for hidden-variable theories than for dynamical-collapse theories. There are, to be sure, a great many dynamical-collapse theories compatible with the observed success of QM (consider, for instance, all the variants on the GRW theory produced by tweaking the collapse rate and the width of the post-collapse wavefunction). But these theories are empirically distinguishable, at least in principle. By contrast, no empirical test at all can distinguish different hidden-variable theories: eventually, the results of any such test would have to be recorded macroscopically, and equivariance guarantees that the probability of any given macroscopic record being obtained from a given experiment is determinable from the wavefunction alone, independently of the details of the hidden-variable theory.²⁸

As such, supporters of hidden-variable theories (and in particular of DBB) have generally tried to advance non-empirical reasons to prefer one formulation or another: for instance, Dürr et al. (1992, pp. 852–4) argue that the guidance equation is the simplest equivariant, configuration-space-local and Galilean-covariant dynamical equation. In discussing which particles actually have corpuscles, Goldstein et al. (2005) appeal to general considerations of scepticism to argue that we should associate corpuscles with all species of particle.

It is interesting at any rate to notice that if DBB really is a viable solution to the measurement problem, then it appears to offer abundant examples of underdetermination of theory by data (Newton-Smith 2000, pp. 40–3; Psillos 1999, pp. 162–82).

2.6.6 The Origin of the Probability Rule

Probability enters hidden-variable theories in much the same way as it enters classical statistical mechanics: the actual hidden variables have determinate but unknown values, and the probability distribution represents our ignorance of those values. As in statistical mechanics, so in hidden-variable theories, we can ask:

²⁷ There is a certain amount of disagreement as to the correct form of the guidance equation. Bohm originally proposed a second-order form of the equation, with the wavefunction's action on the corpuscles represented by the so-called 'quantum potential'; Bohm and Hiley (1993) continue to prefer this formalism. It is currently more common (mostly following Bell's expositions of the theory) to use the first-order form. However, both versions of the equation ultimately generate the same dynamics for the corpuscles.

²⁸ At least, this is so insofar as the hidden-variable theory reproduces quantum probabilities; cf. §2.6.6.


what justifies the particular probability distribution which we assume in order to make empirical predictions? In both cases, we postulate a probability measure over possible values of the dynamical variables; in both cases, there are both conceptual and technical questions about that measure. And in fact, some of the same strategies are seen in both cases. (Most, but not all, of the literature discussion has been in the context of DBB.)

In particular, Dürr, Goldstein, and Zanghì (1996, pp. 36–41) propose an essentially cosmological approach to the problem: they suggest that we adopt as a postulate that the probability of the de Broglie-Bohm corpuscles having initial locations in the vicinity of (q_1, ..., q_N) is just (proportional to) |Ψ(q_1, ..., q_N)|², where Ψ is the initial wavefunction of the Universe. They argue that this is a sort of "typicality" assumption, tantamount to assuming that the actual corpuscle distribution is typical (in the |Ψ|² measure) amongst the set of all possible corpuscle distributions. (There is an obvious generalisation to other hidden-variable theories.) They acknowledge that there is an apparent non sequitur here, since distributions typical with respect to one measure may be highly atypical with respect to another, but they argue that the equivariance of the |Ψ|² measure makes it a natural choice (just as, in classical statistical mechanics, it is common to argue that the equivariance of the Liouville measure makes it natural). For a defence of this 'typicality' framework in the general statistical-mechanical context, see Goldstein (2002); for a critique, see Dickson (1998, pp. 122–3).

A non-cosmological alternative exists: rather than postulate that the initial-state probability distribution was the |Ψ|² distribution, we could postulate that it was some other distribution, and try to show that it evolves into the |Ψ|² distribution reasonably quickly.
This proposal has been developed primarily by Antony Valentini (Valentini 1996, 2001; Valentini and Westman 2005); again, he draws analogies with statistical mechanics, calling the |Ψ|² distribution "quantum equilibrium" and aiming to prove that arbitrary probability distributions "converge to equilibrium". Whether this convergence occurs in practice obviously depends on the dynamics of the hidden-variable theory in question: it is not prima facie obvious that an arbitrary hidden-variable dynamics (or even one which satisfies the Third and Fourth Constraints) would have this property. However, there is fairly good evidence that the convergence does in fact occur for at least a wide class of hidden-variable theories. Bacciagaluppi (1998, p. 208) claims that convergence occurs for certain natural dynamics definable within the modal interpretation, while Valentini (ibid.) proves what he calls a "quantum H-theorem" which establishes convergence in DBB. (As with Boltzmann's own H-theorem, though, Valentini's result must be taken as at most indicative of convergence to equilibrium.)

One fascinating corollary of these dynamical strategies is that the Universe, or at least some subsystems of it, might not be in "quantum equilibrium" at all. This would create observable violations of the predictions of QM, and might


provide a context in which hidden-variable theories could actually be tested (Valentini 2001, 2004).

2.6.7 The Ontology of the State Vector

Perhaps the most serious conceptual problem with hidden-variable theories is their dualistic nature. Recall that in eliminativist hidden-variable theories it is unproblematically true that the hidden variables alone represent physical reality: the state vector is a mere book-keeping device, a shorthand way of expressing our ignorance about the values of the hidden variables. But as we saw in §2.6.1, in actual hidden-variable theories the state vector plays an essential part in the formalism. Until we can say what it is, then, we do not have a truly satisfactory understanding of hidden-variable theories, nor a solution of the measurement problem. To adapt the terminology of §2.1.2, a technically satisfactory hidden-variable theory gives us a 'bare hidden-variable formalism' and a 'Hidden-Variable Algorithm' whose predictions agree with those of the Quantum Algorithm, but what we need is a 'pure interpretation' of the bare hidden-variable formalism from which we can derive the Hidden-Variable Algorithm. And we cannot achieve this unless we are in a position to comment on the ontological status of the state vector.

This question has been addressed in the literature in two ways, which we might call "bottom-up" and "top-down". Bottom-up approaches look at particular properties of quantum systems. For instance, in DBB we might want to ask whether spin, or charge, or mass, should be considered as a property of the corpuscle, of the wave-packet, or of both. Much of the bottom-up literature has drawn conclusions from particular (thought or actual) experiments: in particular, the so-called "fooled detector" experiments (Englert, Scully, Süssmann, and Walther 1992; Dewdney, Hardy, and Squires 1993; Brown, Dewdney, and Horton 1995; Aharonov and Vaidman 1996; Hiley, Callaghan, and Maroney 2000) suggest that not all "position" measurements actually measure corpuscle position.
However, I shall focus here on the "top-down" approaches, which consider and criticise general theses about the ontological status of the wavefunction. The most obvious such thesis is wavefunction realism: the state vector is a physical entity, which interacts with the hidden variables. As was noted in §2.4.3, this was Bell's position; other advocates include Albert (1996) and Valentini. One problem with wavefunction realism is that it violates the so-called "action-reaction principle": the wavefunction acts on the hidden variables without being acted on in turn. Opinions and intuitions differ on the significance of this: Anandan and Brown (1995) regard it as an extremely serious flaw in the theory; at the other extreme, Dürr, Goldstein, and Zanghì (1997, §11) suggest that the principle is just a generalisation of Newton's third law, and so is inapplicable to a theory whose dynamical equations are first order.


A potentially more serious flaw arises from the so-called "Everett-in-denial"²⁹ objection to realism (Deutsch 1996; Zeh 1999; Brown and Wallace 2005). This objection begins with the observation that if the wavefunction is physical, and evolves unitarily, then the only difference between the Everett interpretation and the hidden-variable interpretation under consideration is the additional presence of the hidden variables. (Note that we are concerned with the 'pure-interpretation' form of the Everett interpretation considered in §§2.4.3–2.4.6, not with the Many-Exact-Worlds or Many-Minds theories discussed in §2.4.2.) Advocates of the Everett interpretation claim that (given functionalism) the decoherence-defined quasi-classical histories in the unitarily evolving, physically real wavefunction describe, indeed are, a multiplicity of almost-identical quasi-classical worlds; if that same unitarily-evolving, physically real wavefunction is present in DBB (or any other hidden-variable theory) then so is that multiplicity of physically real worlds, and all the hidden variables do is point superfluously at one of them.

So far as I can see, hidden-variable theorists have two possible responses to the Everett-in-denial objection (other than denying wavefunction realism). Firstly, they can reject functionalism (Brown (1996) implicitly makes this recommendation to Bohmians, when he argues that the ultimate role of the de Broglie-Bohm corpuscles is to act as a supervenience base for consciousness). Secondly, they can accept functionalism as a general metaphysical principle but argue (contra the arguments presented in §2.4.4) that it does not entail that a unitarily-evolving wavefunction subserves a multiplicity of quasi-classical worlds. Either response, interestingly, is metaphysically a priori: functionalism, if true at all, is an a priori truth, and it follows that it is an a priori question whether a unitarily evolving wavefunction does or does not subserve multiple worlds.
At the least, the Everett-in-denial objection seems to support the claim that it is not a contingent fact which of the Everett and hidden-variables strategies is correct.

Wavefunction realism is not the only—perhaps not even the most popular—interpretation of the state vector within hidden-variable theories. Space does not permit a detailed discussion, but I list the main proposals briefly.

The state vector is an expression of a law. This can be understood best by analogy with classical physics: the gravitational potential, just like the quantum wavefunction, is a function on configuration space which determines the dynamics of the particles, but we do not reify the gravitational potential, so why reify the wavefunction? Some advocates of this proposal (such as Dürr, Goldstein, and Zanghi (1997)) believe that it entails further technical work (for instance, to remove the apparent contingency); others, such as Monton (2004b), claim no such work is needed. (See Brown and Wallace (2005) for my own—rather negative—view on this strategy.)

29 “[P]ilot-wave theories are parallel-universe theories in a state of chronic denial” (Deutsch 1996, p. 225).


PHILOSOPHY OF QUANTUM MECHANICS

The state vector is an expression of possibilities (so that the state vector determines what is (naturally, or physically) possible, whereas the hidden variables determine what is actual). This is a common interpretation of the state vector within modal interpretations (hence “modal”, in fact); the obvious worry is that the merely possible is not normally allowed to have dynamical influences on the actual, nor to be physically contingent itself (as the wavefunction is normally taken to be).

The state vector is a property of the hidden variables (Monton, 2004b). Monton bases this proposal on the eigenvalue-eigenvector link: if the properties of a quantum system are given by the projectors on its Hilbert space, then $|\psi\rangle$ will always instantiate the property represented by $|\psi\rangle\langle\psi|$. However, it is unclear to me whether treating the state vector as a “real property” is essentially different from treating it as a physical thing: by analogy, insofar as it is coherent at all to regard the electromagnetic field as a “property” of the charged particles, this does not seem particularly relevant to its reality.

2.6.8 Further Reading

Holland (1993) and Cushing, Fine, and Goldstein (1996) both provide detailed accounts of the de Broglie-Bohm theory (the former is a single-authored monograph, the latter a collection of articles). Dieks and Vermaas (1998) is an excellent collection of articles on modal interpretations.

2.7 Relativistic Quantum Physics

A great deal is now known about the constraints which observed phenomena put on any attempt to construct a relativistic quantum theory. In outline:

• Bell’s Theorem shows us that any realist theory which is in a certain sense also “local” has a certain maximum level of correlation which it permits between measurement results obtained at distinct locations.

• QM predicts that this maximum level will be exceeded; therefore any realist theory which reproduces the results of QM is in a certain sense “non-local”.

• In any case, actual experiments have produced correlations between spatially separated detectors which violate this maximum level; therefore—irrespective of quantum theory—the true theory of the world is in a certain sense “non-local”.

For some recent reviews of the issue, see Butterfield (1992), Maudlin (2002), Peres (1995, pp. 148–86), Dickson (1998, part 2) and Bub (1997, chapter 2).

In this section I shall pursue a rather different line. In practice we actually have a well-developed relativistic quantum theory: quantum field theory (QFT). So as well as asking what constraints are placed on interpretations of QM by relativity in general, we should ask whether a given interpretative strategy can recover the actual empirical predictions of the concrete relativistic quantum theory we have. Perhaps such a strategy will produce a theory which is not Lorentz-covariant at the fundamental level, but given the enormous empirical successes of QFT, our strategy had better reproduce those successes at the observable level.

Firstly, though, I shall discuss the conceptual status of QFT itself. As we shall see, it is not without its own foundational problems.
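Before turning to QFT, the Bell-correlation claims above can be made concrete in a few lines. The sketch below is an illustration of mine, not part of the text: it uses the standard CHSH-optimal measurement angles to show that the singlet-state correlation sum reaches 2√2, exceeding the bound of 2 which Bell's Theorem imposes on local theories.

```python
import numpy as np

# Pauli matrices; singlet state |psi-> = (|01> - |10>)/sqrt(2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)

def spin(theta):
    """Spin observable along an axis at angle theta in the x-z plane."""
    return np.cos(theta) * sz + np.sin(theta) * sx

def corr(ta, tb):
    """Correlation <A (x) B> of the two measurement outcomes in the singlet state."""
    op = np.kron(spin(ta), spin(tb))
    return float(np.real(singlet.conj() @ op @ singlet))

# Standard CHSH settings; any "local" theory satisfies |S| <= 2
a, a2, b, b2 = 0.0, np.pi / 2, np.pi / 4, 3 * np.pi / 4
S = corr(a, b) - corr(a, b2) + corr(a2, b) + corr(a2, b2)
print(abs(S))   # 2*sqrt(2) ≈ 2.828: quantum mechanics exceeds the local bound of 2
```

The singlet correlation is −cos(θₐ−θᵦ), and the four-term sum at these angles saturates Tsirelson's quantum maximum.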

2.7.1 What is Quantum Field Theory?

The traditional way to describe a quantum field theory is through so-called operator-valued fields: formally speaking, these are functions from points in spacetime to operators on a Hilbert space, normally written as (for instance) $\hat{\psi}(x^\mu)$. Just as quantum particle mechanics is specified by giving certain preferred operators ($\hat{Q}$ and $\hat{P}$, say) on a Hilbert space, and a Hamiltonian defined in terms of them, so a quantum field theory is specified by giving certain operator-valued fields and a Hamiltonian defined in terms of them. The simplest field theory, (real) Klein-Gordon field theory, for instance, is given by two operator-valued fields $\hat{\phi}(x^\mu)$ and $\hat{\pi}(x^\mu)$; their mathematical structure is given (at least formally) by the equal-time commutation relations

$$\left[\hat{\phi}(\mathbf{x},t),\,\hat{\pi}(\mathbf{y},t)\right] = i\hbar\,\delta(\mathbf{x}-\mathbf{y}) \qquad (2.94)$$

by analogy to the particle-mechanics commutation relations $[\hat{Q},\hat{P}] = i\hbar$, and the dynamics is generated by the Hamiltonian

$$\hat{H} = \frac{1}{2}\int \mathrm{d}^3x \left[\hat{\pi}^2(\mathbf{x}) + \left(\nabla\hat{\phi}\right)^2(\mathbf{x}) + m^2\hat{\phi}^2(\mathbf{x})\right]. \qquad (2.95)$$

(Field theories defined via equal-time commutation relations are called bosonic; there is, however, an equally important class of field theories, the fermionic theories, which are specified by equal-time anti-commutation relations.) Note that, as is customary in discussing field theory, we adopt the Heisenberg picture, in which it is observables rather than states which evolve under the Schrödinger equation:

$$\hat{Q}(t) = \exp(+i\hat{H}t/\hbar)\,\hat{Q}(0)\,\exp(-i\hat{H}t/\hbar) \qquad (2.96)$$

rather than

$$|\psi(t)\rangle = \exp(-i\hat{H}t/\hbar)\,|\psi(0)\rangle. \qquad (2.97)$$
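Equations (2.96) and (2.97) give the same expectation values; this can be checked numerically. The following sketch is my own illustration (with ℏ = 1, and an arbitrarily chosen 4-dimensional Hamiltonian and observable): it evolves the observable in one calculation and the state in the other, and confirms that the two pictures agree.

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve(H, t):
    """U = exp(-i H t) for a Hermitian matrix H (hbar = 1), via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T

dim = 4
A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
H = (A + A.conj().T) / 2                      # arbitrary Hermitian Hamiltonian
Q = np.diag(np.arange(dim, dtype=float))      # arbitrary observable
psi0 = np.ones(dim) / np.sqrt(dim)            # initial state

t = 1.3
U = evolve(H, t)

# Schrodinger picture: state evolves, observable fixed
psi_t = U @ psi0
ev_schr = np.real(psi_t.conj() @ Q @ psi_t)

# Heisenberg picture: observable evolves, state fixed; Q(t) = U† Q U
Q_t = U.conj().T @ Q @ U
ev_heis = np.real(psi0.conj() @ Q_t @ psi0)

print(abs(ev_schr - ev_heis))                 # ~0: the two pictures agree
```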

This makes the covariance of the theory rather more manifest; it leads, however, to an unfortunate temptation: to regard the operator-valued fields as analogous to the classical fields (so that quantum field theory is a field theory because it deals with operator-valued fields, just as classical field theory is a field theory because it deals with real-valued fields). This is a serious error: treating the operator-valued fields as part of the ontology of the theory is no more justified than treating the operators $\hat{Q}$ and $\hat{P}$ as part of the ontology of quantum particle mechanics.30

This being noted, what is the ontology of quantum field theory? This is of course a heavily interpretation-dependent question, but for the moment let us consider it in the context of state-vector realism (as would perhaps be appropriate for an Everett interpretation of QFT, or for some hypothetical dynamical-collapse or hidden-variable variant on QFT). In §2.4.3 we saw that the most common approach to state-vector realism is wavefunction realism, according to which the N-particle wavefunction is a complex-valued function on 3N-dimensional space. The analogous strategy in bosonic QFT interprets the state vector as a wavefunction over field configuration space: it assigns a complex number to every field configuration at a given time. Mathematically at least this requires a preferred foliation of spacetime and seems to break the covariance of QFT; it is also unclear how to understand it when dealing with fermionic quantum fields. An alternative strategy (briefly mentioned in §2.4.3) assigns a density operator to each spacetime region (defining the expectation values of all observables definable in terms of the field operators within that spacetime region). The issue has so far received very little foundational attention: what little attention there has so far been to state-vector realism has been largely restricted to the non-relativistic domain.

There is another complication in QFT, however: perhaps it should not be understood ontologically as a field theory at all. At least in the case of a free field theory, there is a natural interpretation of the theory in terms of particles: the Hilbert space possesses a so-called “Fock-space decomposition”

$$\mathcal{H} = \bigoplus_{n=0}^{\infty} \mathcal{H}_n \qquad (2.98)$$

where $\mathcal{H}_n$ has a natural isomorphism to a space of n identical free particles. (That is, $\mathcal{H}_n$ is isomorphic to either the symmetrised or the antisymmetrised n-fold tensor product of some 1-free-particle system, and the isomorphism preserves energy and momentum, and so in particular the Hamiltonian.) Given this, it might be tempting to interpret QFTs ontologically as particle theories, and regard the ‘field’ aspect as merely a heuristic—useful in constructing the theory, but ultimately to be discarded (a proposal developed in detail by Weinberg (1995), and with links to the so-called ‘Segal quantisation’ approach to QFT pioneered by Segal (1964, 1967) and explored philosophically by Saunders (1991, 1992)). However, it is at best highly unclear whether it can be sustained.

Part of the reason for this is purely conceptual: as we have seen, the spatiotemporally localised field observables at least provide some kind of basis for understanding the ontology of a quantum field theory. But a particle ontology seems to require a different set of observables: in particular, it seems to require position observables for each particle. And unfortunately it appears that no such observables actually exist (see Saunders (1992, 1998a) and references therein).

In any case, there is a more serious reason to be very skeptical about a particle ontology: namely that it does not seem to account for the status of particles in interacting quantum field theories, to which we now turn.

30 There is a concrete ontological proposal for QM which does treat the operators as part of the ontology (in QM and QFT both): the reading of Heisenberg offered by Deutsch and Hayden (2000). For critical discussion of their proposals see Wallace and Timpson (2007).
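The decomposition (2.98) can be made concrete for a single bosonic mode. The sketch below is a toy model of mine (the truncation dimension and mode frequency are arbitrary choices): it builds truncated ladder operators, checks the canonical commutator, and confirms that the free Hamiltonian's spectrum splits into evenly spaced n-particle levels.

```python
import numpy as np

N = 8                                          # truncate Fock space at occupation N-1
a = np.diag(np.sqrt(np.arange(1, N)), k=1)     # annihilation: a|n> = sqrt(n)|n-1>
adag = a.conj().T                              # creation operator

# Canonical commutator [a, a†] = 1, exact away from the truncation boundary
comm = a @ adag - adag @ a
print(np.allclose(comm[:N-1, :N-1], np.eye(N - 1)))   # True

# Free-mode Hamiltonian H = omega * a† a (zero-point energy dropped):
# its eigenvalues 0, omega, 2*omega, ... are the n-particle sectors of (2.98)
omega = 2.0
energies = np.sort(np.linalg.eigvalsh(omega * (adag @ a)))
print(energies)   # evenly spaced levels n * omega
```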

2.7.2 Particles and Quasiparticles

In order to understand the status of particles in interacting QFT, it is helpful to digress via solid-state physics, where the conceptual and mathematical issues are less controversial. At the fundamental level, the ontology of a solid-state system—a crystal, say—is uncontroversial: it consists of a large collection of atoms in a regular lattice, and if we describe the (no doubt highly entangled) joint state of all of those atoms then we have said all there is to say about the crystal’s quantum state. Nevertheless, we can describe the crystal in quantum-field-theoretic terms—the “field operators” associated to a point x are the x-, y- and z-components of the displacement from equilibrium of the atom whose equilibrium location is centred on point x (or the closest atom if no atom is exactly at x), together with the associated conjugate momenta. If the crystal is “harmonic”—that is, if its Hamiltonian is quadratic in positions and momenta—the “quantum field” produced in this fashion is free, and has an exact (formal) interpretation in terms of particles, which can be understood as quantised, localisable, propagating disturbances in the crystal. These ‘particles’ are known as phonons.

What if it is not harmonic? The Hamiltonian of the crystal can often be separated into two terms,

$$\hat{H} = \hat{H}_{\mathrm{free}} + \hat{H}_{\mathrm{int}} \qquad (2.99)$$

such that $\hat{H}_{\mathrm{free}}$ is quadratic and $\hat{H}_{\mathrm{int}}$ is small enough to be treated as a perturbation to $\hat{H}_{\mathrm{free}}$. It is then possible to understand the crystal in terms of interacting phonons—their free propagation determined by the first term in the Hamiltonian, their interactions, spontaneous creations, and spontaneous decays determined by the second term. The exact division into “free” and “interacting” terms is not unique, but is chosen to minimise the scale of the interaction term in the actual system under study (so that the choice of division is state-dependent). If there exists a division of this sort, then phonons will provide a very useful analytic tool to study the crystal—so that various of its properties, such as the heat capacity, can be calculated by treating the crystal as a gas of fairly-weakly-interacting phonons. The usefulness of the particle concept in the practical analysis of the crystal decreases as the interaction term becomes larger; in circumstances where it becomes so large as to render the particle concept useless, we must either seek a different division of the Hamiltonian into free and interacting terms, or give up on particle methods altogether.
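The harmonic case can be worked through explicitly. In the sketch below (a toy illustration; the chain length, spring constant and mass are arbitrary choices of mine), diagonalising the dynamical matrix of a periodic one-dimensional harmonic chain reproduces the standard phonon dispersion relation, making concrete the sense in which the phonons are just the chain's normal modes.

```python
import numpy as np

# 1-D harmonic chain of N atoms with periodic boundaries:
# V = (K/2) * sum_j (u_{j+1} - u_j)^2, all masses m, lattice spacing 1
N, K, m = 64, 1.0, 1.0
D = np.zeros((N, N))        # dynamical matrix: its eigenvalues are omega^2
for j in range(N):
    D[j, j] = 2 * K / m
    D[j, (j + 1) % N] = -K / m
    D[j, (j - 1) % N] = -K / m

omega_numeric = np.sort(np.sqrt(np.abs(np.linalg.eigvalsh(D))))

# Exact phonon dispersion for this model: omega(k) = 2*sqrt(K/m)*|sin(k/2)|
k = 2 * np.pi * np.arange(N) / N
omega_exact = np.sort(2 * np.sqrt(K / m) * np.abs(np.sin(k / 2)))

print(np.max(np.abs(omega_numeric - omega_exact)))   # ~0: the normal modes ARE the phonons
```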


This method is completely ubiquitous in solid-state physics. Vibrations are described in terms of “phonons”; magnetic fields in terms of “magnons”; propagating charges in plasmas in terms of “plasmons”; and so forth. The general term for the “particles” used in such analyses is quasi-particles. The particular quasi-particles to be used will vary from situation to situation: in one circumstance the system is best understood in terms of plasmons of a particular mass and charge; in another the temperature or density or whatever is such that a different division of the Hamiltonian into free and interacting terms is more useful and so we assign a different mass and charge to the plasmons (hence we talk about ‘temperature-dependence’ of quasi-particle mass); in still another the plasmons are not useful at all and we look for a different quasi-particle analysis. Quasi-particles are emergent entities in solid-state physics: not “fundamental”, not precisely defined, but no less real for all that.

At least formally, “particles” in interacting quantum field theories turn out to be closely analogous to quasi-particles. They are described by a division of the Hamiltonian of an interacting QFT into “free” and “interaction” terms: their ‘intrinsic’ properties—notably mass and charge—are set by the parameters in the free part of the Hamiltonian, and the interacting part determines the parameters that govern scattering of particle off particle. As with solid-state physics, this leads to a situation-dependence of the properties, which can be seen in a variety of contexts in particle physics, such as:

• The masses and charges of fundamental particles are often described as ‘scale-dependent’. What this really means is that the most useful division of the Hamiltonian into free and interacting parts depends on the characteristic scale at which the particles under consideration are actually interacting.
Actually (as can be seen from the Feynman-diagram system used to analyse interacting QFTs), essentially any choice of mass or charge will be admissible, provided one is prepared to pay the price in increasing complexity of interactions.

• A sufficiently substantial shift of situation does not just change the parameters of particles; it changes the particles themselves. In nucleon physics at very short ranges, the approximately-free particles are quarks (this is referred to as asymptotic freedom); at longer ranges, the interactions between quarks become far stronger and it becomes more useful to treat nucleons—neutrons and protons—as the approximately-free particles. (There is of course a sense in which a neutron is ‘made from’ three quarks, but the matter is a good deal more subtle than popular-science treatments might suggest!)

This strongly suggests that, in particle physics as in solid-state physics, the particle ontology is an emergent, higher-level phenomenon, derivative on a lower-level field-theoretic ontology.

2.7.3 QFT and the Measurement Problem

Whether the Lagrangian or algebraic approach to QFT is ultimately found correct, the result will be the same: a Lorentz-covariant31 unitary quantum theory, in which the primary dynamical variables are spacetime-local operators like field strengths and in which particles are approximate and emergent. This provides a Bare Quantum Formalism in the sense of §2.1.2, as well as the resources to define macroscopically definite states in the sense of the Quantum Algorithm: they will be states which approximate states in non-relativistic QM (that is, states of definite particle number for ‘ordinary’ particles like electrons and atomic nuclei, with energies low compared to the masses of those particles) and are in addition macroscopically definite according to the definitions used by non-relativistic QM. And although decoherence has been relatively little studied in the relativistic domain (see Anglin and Zurek (1996) for an interesting exception), it seems reasonably clear that again decoherence picks out these macroscopically definite states as pointer-basis states (so that in particular pointer-basis states are definite-particle-number states, at least as regards the number of nuclei and electrons).

This means that applying the Quantum Algorithm to QFT seems to be fairly straightforward, and should suffice both to reproduce the results of non-relativistic QM in the appropriate limit and to recover the actual methods used by field theorists to calculate particle-production rates in particle-accelerator experiments. This means that so long as we are concerned with pure interpretations of QM (that is: the Everett interpretation; operationalism; the consistent-histories framework; the New Pragmatism; the original Copenhagen interpretation; and at least some variants of quantum logic) there are no essentially new issues introduced by QFT. If we can find a satisfactory pure interpretation of non-relativistic QM, it should go through to QFT mutatis mutandis.
Things are otherwise when we try to solve the measurement problem by modifying the formalism. The plain truth is that there are currently no hidden-variable or dynamical-collapse theories which are generally accepted to reproduce the empirical predictions of any interacting quantum field theory. This is a separate matter to the conceptual problems with such strategies, discussed in §2.5 and 2.6. We do not even have QFT versions of these theories to have conceptual problems with.

Suppose that we tried to construct one; how should we go about it? Observe that in dynamical-collapse and hidden-variable theories alike, some “preferred observables” must be selected: either to determine the hidden variables, or to determine which sorts of superposition are to be dynamically suppressed. And in nonrelativistic physics there is a natural choice in both cases: position. GRW collapses suppress superpositions of positionally delocalised states; the de Broglie-Bohm hidden variables have definite positions.

31 Or possibly effectively Lorentz covariant; cf. Wallace (2006b, §3.4).


It is less clear what the ‘natural’ choice would be in QFT. One possibility is field configuration—so that, for instance, the QFT analogues of the de Broglie-Bohm corpuscles would be classical field configurations (see, e.g., Valentini (1996), Kaloyerou (1996)). There are some technical difficulties with these proposals: in particular, it is unclear what “classical field configurations” are in the case of fermionic fields. But more seriously, it is debatable whether field-based modificatory strategies will actually succeed in reproducing the predictions of QM. For recall: as I argued in §2.5.1 and 2.6.2, it is crucial for these strategies that they are compatible with decoherence: that is, that the preferred observable is also decoherence-preferred. A dynamical-collapse theory which regards pointer-basis states as “macroscopic superpositions” will fail to suppress the right superpositions; a hidden-variable theory whose hidden variables are not decoherence-preferred will fail the Second and Fourth Constraints on hidden-variable strategies, and so will fail to recover effective quasiclassical dynamics. And in QFT (at least where fermions are concerned) the pointer-basis states are states of definite particle number, which in general are not diagonal in the field observables. (See Saunders (1999) for further reasons to doubt that field configurations are effective hidden variables for DBB.)

This suggests an alternative choice: the preferred observables should include particle number. In a QFT version of DBB, for instance, there could be a certain number of corpuscles (with that number possibly time-dependent), one for each particle present in a given term in the wavefunction; in a dynamical-collapse theory, the collapse mechanism could suppress superpositions of different-particle-number states (see, e.g., Bell (1984), Bohm and Hiley (1993), Dürr et al. (2004, 2005)).
This strategy faces a different problem, however: as was demonstrated in §2.7.2, particles in QFT appear to be effective, emergent, and approximately defined concepts—making them poor candidates for direct representation in the microphysics.

These remarks are not meant to imply that no modificatory strategy can successfully reproduce the experimental predictions of QFT—they are meant only to show that no such strategy has yet succeeded in reproducing them, and that there are some general reasons to expect that it will be extremely difficult. QFT, therefore, is significantly more hostile to solutions to the measurement problem that are not pure interpretations. Michael Dickson, comparing pure interpretations with modificatory strategies, observes that

    [I]t is not clear that ‘no new physics’ is a virtue at all. After all, we trust QM as it happens to be formulated primarily because it is empirically very successful. Suppose, however, that some other theory were equally successful, and were equally explanatory. To reject it because it is not the same as QM is, it seems, to be too much attached to the particular historical circumstances that gave rise to the formulation of QM. (Dickson 1998, p. 60)

QFT shows the true virtue of ‘no new physics’. It is not that we should prefer our existing physics to some equally-successful rival theory; it is rather that, in the relativistic domain at any rate, no such theory has been found.

2.7.4 Further Reading

Of the great number of textbook discussions of QFT, Peskin and Schroeder (1995) is particularly lucid; Cao (1997) and Teller (1995) discuss QFT from a philosophical perspective.

An issue glossed over in this section is the ‘renormalisation problem’ or ‘problem of infinities’: mathematically, QFT seems ill-defined, and calculations seem to have to be fixed up in an ad hoc way to avoid getting pathological results. One response to the problem is to try to reformulate QFT on mathematically rigorous foundations (the so-called “algebraic QFT program”); Haag (1996) is a good introduction. This strategy seems to be popular with philosophers despite its failure thus far to reproduce any of the concrete predictions of ‘ordinary’ QFT. Another response is to try to make sense of the apparent pathology; this is the mainstream position among physicists today (see Wilson and Kogut (1974), Binney et al. (1992) and Wallace (2006b)).

See Bassi and Ghirardi (2003, part IV) for a review of progress in constructing relativistic dynamical-collapse theories; see also Myrvold (2002) for arguments that such theories could be Lorentz-covariant even if nonlocal.

2.8 Conclusion

The predictions of QM may be deduced from a rather well-defined mathematical formalism via a rather badly defined algorithm. Solving the measurement problem may be done in one of two ways: either we must provide an interpretation of the mathematical formalism which makes it a satisfactory physical theory and which entails the correctness of the algorithm; or we must invent a different formalism and a different algorithm which gives the same results, and then give a satisfactory interpretation of that algorithm.

Interpreting formalisms is a distinctively philosophical project. Perhaps the most important theme of this review is that questions of interpretation depend on very broad philosophical positions, and so there are far fewer interpretations of a given formalism than meets the eye. In particular, if we are prepared to be both realist and functionalist about a given physical theory, and if we are prepared to accept classical logic as correct, there is exactly one interpretation of any given formalism, although we may be wrong about what it is!32 (And if we are realists but not functionalists, in effect we have further technical work to do, in picking out the non-functional properties of our theory which are supposed to act as a supervenience base for consciousness or other higher-level properties.)

32 Depending on one’s general attitude to metaphysics, there may be some questions not settled by this prescription: questions of the fundamental nature of the wavefunction, for instance, or whether particulars are bundles of properties. I side with van Fraassen (2002) and Ross and Ladyman (2007) in regarding these as largely non-questions, but in any case they do not seem to affect the validity or otherwise of a given solution of the measurement problem.


One sees this most clearly with the pure interpretations: your general philosophical predilections lead you to one interpretation or another. Unapologetic instrumentalists and positivists are led naturally to Operationalism. Those who are more apologetic, but who wish to hold on to the insight that “no phenomenon is a phenomenon until it is an observed phenomenon”, will adopt a position in the vicinity of Bohr’s. Those willing to reject classical logic will (depending on the details of their proposed alternative) adopt some form of quantum-logic or consistent-histories interpretation. Functionalist realists will become Everettians, or else abandon pure interpretation altogether. (As noted above, non-functionalist realists are already in effect committed to a modificatory strategy.) In criticising a given pure interpretation, one can object to specific features (one may argue against Everett on grounds of probability, or against quantum logic on grounds of intertemporal probability specifications) but one is as likely to reject it because of disagreements with the general philosophical position (so those who regard positivism as dead are unlikely to be impressed by the purely technical merits of Operationalism).

In the case of modificatory strategies, there ought to be rather less purely philosophical dispute, since these strategies are generally pursued in the name of realism. But we have seen that general philosophical problems are still highly relevant here: the nature of higher-level ontology, for instance, and the validity and implications of a general functionalism.

If the interpretation of a formalism is a distinctively philosophical task, designing new formalisms is physics, and at its hardest. The empirical success of the Quantum Algorithm—never mind its foundational problems—is tremendous, underpinning tens of thousands of breakthroughs in twentieth- and twenty-first-century physics.
No wonder, then, that the new formalisms that have been constructed are very closely based on the bare quantum formalism, supplementing it only by dynamical modifications which serve to eliminate all but one quasiclassical history, or by hidden variables which pick out one such history. We have seen that in practice this is achieved because one of the dynamically fundamental variables in NRQM—position—also suffices to distinguish different quasiclassical histories. We have also seen that in QFT this strategy fails, making it an as-yet-unsolved problem to construct alternatives to the bare formalism of QFT.

Sometimes it is easy to forget how grave a problem the ‘measurement problem’ actually is. One can too easily slip into a mindset where there is one theory—quantum mechanics—and a myriad of empirically-equivalent interpretations of that theory. Sometimes, indeed, it can seem that the discussion is carried out on largely aesthetic grounds: do I find this theory’s stochasticity more distressing than that interpretation’s ontological excesses or the other theory’s violation of action-reaction?

The truth is very different. Most philosophers of physics are realists, or at least sympathetic to realism. At present we know of at most one realist (and classical-logic) solution to the measurement problem: the Everett interpretation.


If the Everett interpretation is incoherent for one reason or another (as is probably the mainstream view among philosophers of physics, if not among physicists), then currently we have no realist solutions to the measurement problem. There are interesting research programmes, which (disregarding their potential conceptual problems) have successfully reproduced the predictions of non-relativistic physics, but a research programme is not a theory.

Penrose (2004) regards “measurement problem” as too anodyne a term for this conceptual crisis in physics. He proposes “measurement paradox”. Perhaps philosophers would do well to follow his lead.

REFERENCES

Aharonov, Y. and L. Vaidman (1996). About position measurements which do not show the Bohmian particle position. In Cushing, Fine, and Goldstein (1996), pp. 141–54.
Albert, D. Z. (1992). Quantum Mechanics and Experience. Cambridge, MA: Harvard University Press.
————– (1996). Elementary quantum metaphysics. In Cushing, Fine, and Goldstein (1996), pp. 277–84.
————– and B. Loewer (1988). Interpreting the many worlds interpretation. Synthese 77, 195–213.
————– and B. Loewer (1990). Wanted dead or alive: Two attempts to solve Schrödinger’s paradox. In A. Fine, M. Forbes, and L. Wessels eds, Proceedings of the 1990 Biennial Meeting of the Philosophy of Science Association, Vol. 1, pp. 277–85. East Lansing, Michigan: Philosophy of Science Association.
————– and B. Loewer (1996). Tails of Schrödinger’s Cat. In R. Clifton ed, Perspectives on Quantum Reality, pp. 81–92. Dordrecht: Kluwer Academic Publishers.
Allori, V., S. Goldstein, R. Tumulka, and N. Zanghi (2007). On the common structure of Bohmian mechanics and the Ghirardi-Rimini-Weber theory. Forthcoming in British Journal for the Philosophy of Science; available online at http://arxiv.org/abs/quant-ph/0603027.
Anandan, J. and H. R. Brown (1995). On the reality of space-time geometry and the wavefunction. Foundations of Physics 25, 349–60.
Anglin, J. R. and W. H. Zurek (1996). Decoherence of quantum fields: Pointer states and predictability. Physical Review D 53, 7327–35.
Armstrong, D. (1968). A Materialist Theory of the Mind. London: Routledge and Kegan Paul.
Arntzenius, F. (1990). Kochen’s interpretation of quantum mechanics. Proceedings of the Philosophy of Science Association 1, 241–9.
Arntzenius, F. (1998). Curioser and curioser: A personal evaluation of modal interpretations. In Dieks and Vermaas (1998), pp. 337–77.
Bacciagaluppi, G. (2005). The role of decoherence in quantum mechanics. Stanford Encyclopedia of Philosophy (Summer 2005 edn), E. N. Zalta ed, available online at http://plato.stanford.edu/archives/sum2005/entries/qm-decoherence.
————– (1995). Kochen-Specker theorem in the modal interpretation of quantum mechanics. International Journal of Theoretical Physics 34, 1206–15.
————– (1998). Bohm-Bell dynamics in the modal interpretation. In Dieks and Vermaas (1998), pp. 177–212.


————– (1999). Nelsonian mechanics revisited. Foundations of Physics Letters 12, 1–16.
————– (2000). Delocalized properties in the modal interpretation of quantum mechanics. Foundations of Physics 30, 1431–44.
————– and M. Dickson (1999). Dynamics for modal interpretations. Foundations of Physics 29, 1165–201.
————– M. J. Donald, and P. E. Vermaas (1995). Continuity and discontinuity of definite properties in the modal interpretation. Helvetica Physica Acta 68, 679–704.
————– and M. Hemmo (1996). Modal interpretations, decoherence and measurements. Studies in the History and Philosophy of Modern Physics 27, 239–77.
Baker, D. (2007). Measurement outcomes and probability in Everettian quantum mechanics. Studies in the History and Philosophy of Modern Physics 38, 153–69.
Ballentine, L. E. (1970). The statistical interpretation of quantum mechanics. Reviews of Modern Physics 42, 358–81.
————– (1990). Quantum Mechanics. Englewood Cliffs: Prentice Hall.
Barbour, J. B. (1994). The timelessness of quantum gravity: II. The appearance of dynamics in static configurations. Classical and Quantum Gravity 11, 2875–97.
————– (1999). The End of Time. London: Weidenfeld and Nicholson.
Barnum, H., C. M. Caves, J. Finkelstein, C. A. Fuchs, and R. Schack (2000). Quantum probability from decision theory? Proceedings of the Royal Society of London A456, 1175–82. Available online at http://arXiv.org/abs/quant-ph/9907024.
Barrett, J. A. (1998). The bare theory and how to fix it. In Dieks and Vermaas (1998), pp. 319–26.
————– (1999). The Quantum Mechanics of Minds and Worlds. Oxford: Oxford University Press.
Bassi, A. and G. Ghirardi (1999). More about dynamical reduction and the enumeration principle. British Journal for the Philosophy of Science 50, 719.
————– and G. Ghirardi (2003). Dynamical reduction models. Physics Reports 379, 257.
————– and G. C. Ghirardi (2000). Decoherent histories and realism. Journal of Statistical Physics 98, 457–94. Available online at http://arxiv.org/abs/quant-ph/9912031.
Bell, J. (1981a). Bertlmann’s socks and the nature of reality. Journal de Physique 42, C2 41–61. Reprinted in Bell (1987), pp. 139–58.
————– (1966). On the problem of hidden variables in quantum mechanics. Reviews of Modern Physics 38, 447–52. Reprinted in Bell (1987), pp. 1–13.
————– (1981b). Quantum Mechanics for Cosmologists. In C. J. Isham, R. Penrose, and D. Sciama eds, Quantum Gravity 2: A Second Oxford Symposium, Oxford: Clarendon Press. Reprinted in Bell (1987), pp. 117–38.


————– (1982). On the impossible pilot wave. Foundations of Physics 12, 989–99. Reprinted in Bell (1987), pp. 159–68. ————– (1984). Beables for quantum field theory. CERN preprint CERN-TH 4035/84. Reprinted in Bell (1987), pp. 173–80. ————– (1987). Speakable and Unspeakable in Quantum Mechanics. Cambridge: Cambridge University Press. Binney, J. J., N. J. Dowrick, A. J. Fisher, and M. E. J. Newman (1992). The Theory of Critical Phenomena: An Introduction to the Renormalisation Group. Oxford: Oxford University Press. Bohm, D. (1952). A suggested interpretation of quantum theory in terms of “hidden” variables. Physical Review 85, 166–93. ————– and B. J. Hiley (1993). The Undivided Universe: An Ontological Interpretation of Quantum Theory. London: Routledge and Kegan Paul. Bricmont, J. ed (2001). Chance in Physics: Foundations and Perspectives, London. Springer. Brown, H. R. (1996). Comment on Lockwood. British Journal for the Philosophy of Science 47, 189–248. ————– C. Dewdney, and G. Horton (1995). Bohm particles and their detection in the light of neutron interferometry. Foundations of Physics 25, 329–47. ————– E. Sjöqvist, and G. Bacciagaluppi (1999). Remarks on identical particles in de Broglie-Bohm theory. Physics Letters A 251, 229–35. ————– and D. Wallace (2005). Solving the measurement problem: de Broglie-Bohm loses out to Everett. Foundations of Physics 35, 517–40. Bub, J. (1997). Interpreting the Quantum World. Cambridge: Cambridge University Press. ————– and R. Clifton (1996). A uniqueness theorem for “no collapse” interpretations of quantum mechanics. Studies in the History and Philosophy of Modern Physics 27, 181–219. ————– R. Clifton, and S. Goldstein (2000). Revised proof of the uniqueness theorem for ‘no collapse’ interpretations of quantum mechanics. Studies in the History and Philosophy of Modern Physics 31, 95. ————– R. Clifton, and B. Monton (1998). The bare theory has no clothes. In G. Hellman and R.
Healey eds, Quantum Measurement: Beyond Paradox, pp. 32–51. Minneapolis: University of Minnesota Press. Busch, P. (1998). Remarks on unsharp observables, objectification, and modal interpretations. In Dieks and Vermaas (1998), pp. 279–88. ————– P. J. Lahti, and P. Mittelstaedt (1996). The Quantum Theory of Measurement (2nd edn). Berlin: Springer-Verlag. Butterfield, J. N. (1992). Bell’s theorem: What it takes. British Journal for the Philosophy of Science 43, 41–83. ————– (1996). Whither the minds? British Journal for the Philosophy of Science 47, 200–21.


Cao, T. Y. (1997). Conceptual Developments of 20th Century Field Theories. Cambridge: Cambridge University Press. Caves, C. M., C. A. Fuchs, K. Mann, and J. M. Renes (2004). Gleason-type derivations of the quantum probability rule for generalized measurements. Foundations of Physics 34, 193. ————– C. A. Fuchs, and R. Schack (2002). Quantum probabilities as Bayesian probabilities. Physical Review A 65, 022305. ————– and R. Schack (2005). Properties of the frequency operator do not imply the quantum probability postulate. Annals of Physics 315, 123–46. Clifton, R. (1996). The properties of modal interpretations of quantum mechanics. British Journal for the Philosophy of Science 47, 371–98. ————– and B. Monton (1999). Losing your marbles in wavefunction collapse theories. British Journal for the Philosophy of Science 50, 697–717. ————– and B. Monton (2000). Counting marbles with ‘accessible’ mass density: A reply to Bassi and Ghirardi. British Journal for the Philosophy of Science 51, 155–64. Cordero, A. (1999). Are GRW tails as bad as they say? Philosophy of Science 66, S59–S71. Cramer, J. G. (1986). The transactional interpretation of quantum mechanics. Reviews of Modern Physics 58, 647–87. ————– (1988). An overview of the transactional interpretation of quantum mechanics. International Journal of Theoretical Physics 27, 227–36. Cushing, J. T. (1994). Quantum Mechanics: Historical Contingency and the Copenhagen Hegemony. Chicago: University of Chicago Press. ————– A. Fine, and S. Goldstein eds (1996). Bohmian Mechanics and Quantum Theory: An Appraisal, Dordrecht. Kluwer Academic Publishers. Davies, P. and J. Brown eds (1986). The Ghost in the Atom, Cambridge. Cambridge University Press. Dennett, D. C. (1991). Real patterns. Journal of Philosophy 88, 27–51. Reprinted in Brainchildren, D. Dennett, (London: Penguin 1998) pp. 95–120. ————– (2005). Sweet Dreams: Philosophical Objections to a Science of Consciousness. Cambridge, MA: MIT Press. Deotto, E. and G.
Ghirardi (1998). Bohmian mechanics revisited. Foundations of Physics 28, 1–30. Available online at http://arxiv.org/abs/quant-ph/9704021. Deutsch, D. (1985). Quantum theory as a universal physical theory. International Journal of Theoretical Physics 24 (1), 1–41. ————– (1986). Interview. In Davies and Brown (1986), pp. 83–105. ————– (1996). Comment on Lockwood. British Journal for the Philosophy of Science 47, 222–8. ————– (1999). Quantum theory of probability and decisions. Proceedings of the Royal Society of London A455, 3129–37.


————– and P. Hayden (2000). Information flow in entangled quantum systems. Proceedings of the Royal Society of London A456, 1759–74. Dewdney, C., L. Hardy, and E. J. Squires (1993). How late measurements of quantum trajectories can fool a detector. Physics Letters 184A, 6–11. DeWitt, B. and N. Graham eds (1973). The Many-Worlds Interpretation of Quantum Mechanics. Princeton: Princeton University Press. Dickson, M. (1998). Quantum Chance and Non-Locality: Probability and Non-Locality in the Interpretations of Quantum Mechanics. Cambridge: Cambridge University Press. ————– (2001). Quantum logic is alive ∧ (it is true ∨ it is false). Philosophy of Science 68, S274–S287. Dieks, D. and P. E. Vermaas eds, (1998). The Modal Interpretation of Quantum Mechanics, Dordrecht. Kluwer Academic Publishers. Donald, M. J. (1990). Quantum theory and the brain. Proceedings of the Royal Society of London A 427, 43–93. ————– (1992). A priori probability and localized observers. Foundations of Physics 22, 1111–72. ————– (1998). Discontinuity and continuity of definite properties in the modal interpretation. In Dieks and Vermaas (1998), pp. 213–22. ————– (2002). Neural unpredictability, the interpretation of quantum theory, and the mind-body problem. Available online at http://arxiv.org/abs/quant-ph/0208033. Dowker, F. and I. Herbauts (2005). The status of the wave function in dynamical collapse models. Foundations of Physics Letters 18, 499–518. ————– and A. Kent (1996). On the consistent histories approach to quantum mechanics. Journal of Statistical Physics 82, 1575–646. Dürr, D., S. Goldstein, R. Tumulka, and N. Zanghi (2004). Bohmian mechanics and quantum field theory. Physical Review Letters 93, 090402. ————– S. Goldstein, R. Tumulka, and N. Zanghi (2005). Bell-type quantum field theories. Journal of Physics A38, R1. ————– S. Goldstein, and N. Zanghi (1996). Bohmian mechanics as the foundation of quantum mechanics. In Cushing, Fine, and Goldstein (1996), pp. 21–44. ————– S.
Goldstein, and N. Zanghi (1997). Bohmian mechanics and the meaning of the wave function. In R. S. Cohen, M. Horne, and J. Stachel eds, Potentiality, Entanglement and Passion-at-a-Distance—Quantum Mechanical Studies in Honor of Abner Shimony. Dordrecht: Kluwer. Available online at http://arxiv.org/abs/quant-ph/9512031. ————– S. Goldstein, and N. Zanghi (1992). Quantum equilibrium and the origin of absolute uncertainty. Journal of Statistical Physics 67, 843–907. Englert, B. G., M. O. Scully, G. Sussmann, and H. Walther (1992). Surrealistic Bohm trajectories. Zeitschrift für Naturforschung 47A, 1175–86. Everett, H. (1957). Relative state formulation of quantum mechanics. Reviews of Modern Physics 29, 454–62. Reprinted in DeWitt and Graham (1973).


Farhi, E., J. Goldstone, and S. Gutmann (1989). How probability arises in quantum mechanics. Annals of Physics 192, 368–82. Foster, S. and H. R. Brown (1988). On a recent attempt to define the interpretation basis in the many worlds interpretation of quantum mechanics. International Journal of Theoretical Physics 27, 1507–31. Fuchs, C. and A. Peres (2000a). Quantum theory needs no “interpretation”. Physics Today 53 (3), 70–71. ————– and A. Peres (2000b). Fuchs and Peres reply. Physics Today 53, 14. Gell-Mann, M. and J. B. Hartle (1990). Quantum mechanics in the light of quantum cosmology. In W. H. Zurek ed, Complexity, Entropy and the Physics of Information, pp. 425–59. Redwood City, California: Addison-Wesley. ————– and J. B. Hartle (1993). Classical equations for quantum systems. Physical Review D 47, 3345–82. Ghirardi, G. C. (2002). Collapse theories. Stanford Encyclopedia of Philosophy (Summer 2002 edn), E. N. Zalta ed, available online at http://plato.stanford.edu/archives/spr2002/entries/qm-collapse. ————– R. Grassi, and F. Benatti (1995). Describing the macroscopic world: Closing the circle within the dynamical reduction program. Foundations of Physics 25, 5–38. ————– A. Rimini, and T. Weber (1986). Unified dynamics for microscopic and macroscopic systems. Physical Review D 34, 470–91. Goldstein, S. (2002). Boltzmann’s approach to statistical mechanics. In Bricmont (2001). Available online at http://arxiv.org/abs/cond-mat/0105242. ————– J. Taylor, R. Tumulka, and N. Zanghi (2005). Are all particles real? Studies in the History and Philosophy of Modern Physics 36, 103–12. Greaves, H. (2004). Understanding Deutsch’s probability in a deterministic multiverse. Studies in the History and Philosophy of Modern Physics 35, 423–56. ————– (2007). On the Everettian epistemic problem. Studies in the History and Philosophy of Modern Physics 38, 120–52. Griffiths, R. B. (1984). Consistent histories and the interpretation of quantum mechanics.
Journal of Statistical Physics 36, 219–72. ————– (2002). Consistent Quantum Theory. Cambridge: Cambridge University Press. Haag, R. (1996). Local Quantum Physics: Fields, Particles, Algebras. Berlin: Springer-Verlag. Hemmo, M. and I. Pitowsky (2007). Quantum probability and many worlds. Studies in the History and Philosophy of Modern Physics 38, 333–50. Hiley, B. J., R. E. Callaghan, and O. J. Maroney (2000). Quantum trajectories, real, surreal, or an approximation to a deeper process? Available online at http://arxiv.org/abs/quant-ph/0010020. Hofstadter, D. R. and D. C. Dennett eds, (1981). The Mind’s I: Fantasies and Reflections on Self and Soul. London: Penguin. Holland, P. (1993). The Quantum Theory of Motion. Cambridge: Cambridge


University Press. Joos, E., H. D. Zeh, C. Kiefer, D. Giulini, J. Kupsch, and I.-O. Stamatescu (2003). Decoherence and the Appearance of a Classical World in Quantum Theory (2nd edn). Berlin: Springer. Kaloyerou, P. N. (1996). An ontological interpretation of boson fields. In Cushing, Fine, and Goldstein (1996), pp. 155–68. Kent, A. (1990). Against many-worlds interpretations. International Journal of Modern Physics A5, 1745. Revised version available at http://www.arxiv.org/abs/gr-qc/9703089. ————– (1996a). Quasiclassical dynamics in a closed quantum system. Physical Review A 54, 4670–75. ————– (1996b). Remarks on consistent histories and Bohmian mechanics. In Cushing, Fine, and Goldstein (1996), pp. 343–52. Kim, J. (1998). Mind in a Physical World. Cambridge, Massachusetts: MIT Press/Bradford. Kochen, S. and E. Specker (1967). The problem of hidden variables in quantum mechanics. Journal of Mathematics and Mechanics 17, 59–87. Leggett, A. J. (2002). Testing the limits of quantum mechanics: Motivation, state of play, prospects. Journal of Physics: Condensed Matter 14, R415–R451. Levin, J. (2004). Functionalism. Stanford Encyclopedia of Philosophy (Fall 2004 edn), E. N. Zalta ed, available online at http://plato.stanford.edu/archives/fall2004/entries/functionalism. Lewis, D. (1974). Radical interpretation. Synthese 23, 331–44. Reprinted in David Lewis, Philosophical Papers, Vol. I (Oxford University Press, Oxford, 1983). ————– (1980). A subjectivist’s guide to objective chance. In R. C. Jeffrey ed, Studies in Inductive Logic and Probability, Vol. II. Berkeley: University of California Press. Reprinted in David Lewis, Philosophical Papers, Vol. II (Oxford University Press, Oxford, 1986). ————– (1986). Philosophical Papers, Vol. II. Oxford: Oxford University Press. Lewis, P. J. (1997). Quantum mechanics, orthogonality, and counting. British Journal for the Philosophy of Science 48, 313–28. ————– (2003). Counting marbles: Reply to critics.
British Journal for the Philosophy of Science 54, 165–70. ————– (2004a). Life in configuration space. British Journal for the Philosophy of Science 55, 713–29. ————– (2004b). Quantum mechanics and ordinary language: The fuzzy link. Philosophy of Science 70 (55), 713–29. ————– (2005). Interpreting spontaneous collapse theories. Studies in the History and Philosophy of Modern Physics 36, 165–80. ————– (2007). Uncertainty and probability for branching selves. Studies in the History and Philosophy of Modern Physics 38, 1–14.


Lockwood, M. (1989). Mind, Brain and the Quantum: The Compound ‘I’. Oxford: Blackwell Publishers. ————– (1996). ‘Many minds’ interpretations of quantum mechanics. British Journal for the Philosophy of Science 47, 159–88. Maudlin, T. (2002). Quantum Non-Locality and Relativity: Metaphysical Intimations of Modern Physics (2nd edn). Oxford: Blackwell. Monton, B. (2004a). The problem of ontology for spontaneous collapse theories. Studies in the History and Philosophy of Modern Physics 35, 407–21. ————– (2004b). Quantum mechanics and 3N-dimensional space. Forthcoming; available online from http://philsci-archive.pitt.edu. Myrvold, W. (2002). On peaceful coexistence: Is the collapse postulate incompatible with relativity? Studies in the History and Philosophy of Modern Physics 33, 435–66. Nelson, E. (1966). Derivation of the Schrödinger equation from Newtonian mechanics. Physical Review 150, 1079–85. ————– (1985). Quantum Fluctuations. Princeton: Princeton University Press. Newton-Smith, W. H. (2000). Underdetermination of theory by data. In W. H. Newton-Smith ed, A Companion to the Philosophy of Science, pp. 532–36. Oxford: Blackwell. Nielsen, M. A. and I. L. Chuang (2000). Quantum Computation and Quantum Information. Cambridge: Cambridge University Press. Omnes, R. (1988). Logical reformulation of quantum mechanics. I. Foundations. Journal of Statistical Physics 53, 893–932. ————– (1994). The Interpretation of Quantum Mechanics. Princeton: Princeton University Press. Page, D. N. (1996). Sensible quantum mechanics: Are probabilities only in the mind? International Journal of Modern Physics D5, 583–96. Papineau, D. (1996). Many minds are no worse than one. British Journal for the Philosophy of Science 47, 233–41. Parfit, D. (1984). Reasons and Persons. Oxford: Oxford University Press. Pearle, P. (1989). Combining stochastic dynamical state-vector reduction with spontaneous localization. Physical Review A 39 (5), 2277–89. Penrose, R. (2004).
The Road to Reality: A Complete Guide to the Laws of the Universe. London: Jonathan Cape. Peres, A. (1993). Quantum Theory: Concepts and Methods. Dordrecht: Kluwer Academic Publishers. Peskin, M. E. and D. V. Schroeder (1995). An Introduction to Quantum Field Theory. Reading, Massachusetts: Addison-Wesley. Price, H. (1996). Time’s Arrow and Archimedes’ Point: New Directions for the Physics of Time. Oxford: Oxford University Press. Psillos, S. (1999). Scientific Realism: How Science Tracks Truth. London: Routledge. Redhead, M. (1987). Incompleteness, Nonlocality and Realism: A Prolegomenon


to the Philosophy of Quantum Mechanics. Oxford: Oxford University Press. Ruetsche, L. (1998). How close is ‘close enough’? In Dieks and Vermaas (1998), pp. 223–40. Ross, D. (2000). Rainforest realism: A Dennettian theory of existence. In D. Ross, A. Brook, and D. Thompson eds, Dennett’s Philosophy: A comprehensive assessment, pp. 147–68. Cambridge, Massachusetts: MIT Press/Bradford. Ross, D. and J. Ladyman (2007). Every Thing Must Go: Metaphysics Naturalized. Oxford: Oxford University Press. Ross, D. and D. Spurrett (2004). What to say to a sceptical metaphysician: A defense manual for cognitive and behavioral scientists. Behavioral and Brain Sciences 27, 603–27. Saunders, S. (1991). The negative-energy sea. In S. Saunders and H. R. Brown eds, The Philosophy of Vacuum, pp. 65–108. Oxford: Clarendon Press. ————– (1992). Locality, complex numbers and relativistic quantum theory. Philosophy of Science Association 1992 1, 365–80. ————– (1993). Decoherence, relative states, and evolutionary adaptation. Foundations of Physics 23, 1553–85. ————– (1995). Time, decoherence and quantum mechanics. Synthese 102, 235–66. ————– (1997). Naturalizing metaphysics. The Monist 80 (1), 44–69. ————– (1998a). A dissolution of the problem of locality. Proceedings of the Philosophy of Science Association 2, 88–98. ————– (1998b). Time, quantum mechanics, and probability. Synthese 114, 373–404. ————– (1999). The ‘beables’ of relativistic pilot-wave theory. In J. Butterfield and C. Pagonis eds, From Physics to Philosophy, pp. 71–89. Cambridge: Cambridge University Press. ————– (2005). Complementarity and scientific rationality. Foundations of Physics 35, 347–72. ————– and D. Wallace (2007). Branching and uncertainty. Available online from http://philsci-archive.pitt.edu. Schlosshauer, M. (2006). Experimental motivation and empirical consistency in minimal no-collapse quantum mechanics. Annals of Physics 321, 112–49. Available online at http://arxiv.org/abs/quant-ph/0506199. Segal, I. (1964).
Quantum field and analysis in the solution manifolds of differential equations. In W. Martin and I. Segal eds, Analysis in Function Space, pp. 129–53. Cambridge, MA: MIT Press. ————– (1967). Representations of the canonical commutation relations. In F. Lurcat ed, Cargese Lectures on Theoretical Physics. New York: Gordon and Breach. Spekkens, R. W. (2007). In defense of the epistemic view of quantum states: a toy theory. Physical Review A 75, 032110. Struyve, W. and H. Westman (2006). A new pilot-wave model for quantum


field theory. AIP Conference Proceedings 844, 321. Styer, D., S. Sobottka, W. Holladay, T. A. Brun, R. B. Griffiths, and P. Harris (2000). Quantum theory–interpretation, formulation, inspiration. Physics Today 53, 11. Taylor, J. (1986). Interview. In Davies and Brown (1986), pp. 106–17. Teller, P. (1995). An Interpretative Introduction to Quantum Field Theory. Princeton: Princeton University Press. Vaidman, L. (2002). The many-worlds interpretation of quantum mechanics. Stanford Encyclopedia of Philosophy (Summer 2002 edn), E. N. Zalta ed, available online at http://plato.stanford.edu/archives/sum2002/entries/qm-manyworlds. Valentini, A. (1996). Pilot-wave theory of fields, gravitation and cosmology. In Cushing, Fine, and Goldstein (1996), pp. 45–67. ————– (2001). Hidden variables, statistical mechanics and the early universe. In Bricmont (2001), pp. 165–81. Available online at http://arxiv.org/abs/quant-ph/0104067. ————– (2004). Extreme test of quantum theory with black holes. Available online at http://arxiv.org/abs/astro-ph/0412503. ————– and H. Westman (2005). Dynamical origin of quantum probabilities. Proceedings of the Royal Society of London A 461, 187–93. Van Fraassen, B. C. (1980). The Scientific Image. Oxford: Oxford University Press. ————– (1991). Quantum Mechanics. Oxford: Oxford University Press. ————– (2002). The Empirical Stance. New Haven: Yale University Press. Vermaas, P. E. (1998). The pros and cons of the Kochen-Dieks and the atomic modal interpretation. In Dieks and Vermaas (1998), pp. 103–48. Wallace, D. (2002). Worlds in the Everett Interpretation. Studies in the History and Philosophy of Modern Physics 33, 637–61. ————– (2003a). Everett and structure. Studies in the History and Philosophy of Modern Physics 34, 87–105. ————– (2003b). Everettian rationality: Defending Deutsch’s approach to probability in the Everett interpretation. Studies in the History and Philosophy of Modern Physics 34, 415–39. ————– (2004).
Protecting cognitive science from quantum theory. Behavioral and Brain Sciences 27, 636–7. ————– (2005). Language use in a branching universe. Forthcoming. Available online from http://philsci-archive.pitt.edu. ————– (2006a). Epistemology quantized: Circumstances in which we should come to believe in the Everett interpretation. British Journal for the Philosophy of Science 57, 655–89. ————– (2006b). In defence of naiveté: The conceptual status of Lagrangian quantum field theory. Synthese 151, 33–80. ————– (2006c). Probability in three kinds of branching universe. Forthcoming.


————– (2007). Quantum probability from subjective likelihood: Improving on Deutsch’s proof of the probability rule. Studies in the History and Philosophy of Modern Physics 38, 311–32. ————– and C. Timpson (2007). Non-locality and gauge freedom in Deutsch and Hayden’s formulation of quantum mechanics. Foundations of Physics 37, 951–5. Weinberg, S. (1995). The Quantum Theory of Fields, Vol. 1. Cambridge: Cambridge University Press. Wilson, K. G. and J. Kogut (1974). The renormalization group and the ε expansion. Physics Reports 12C, 75–200. Zeh, H. D. (1993). There are no quantum jumps, nor are there particles! Physics Letters A172, 189. ————– (1999). Why Bohm’s quantum theory? Foundations of Physics Letters 12, 197–200. Zurek, W. H. (1991). Decoherence and the transition from quantum to classical. Physics Today 44, 36–44. Revised version available online at http://arxiv.org/abs/quant-ph/0306072. ————– (1998). Decoherence, einselection, and the quantum origins of the classical: the rough guide. Philosophical Transactions of the Royal Society of London A356, 1793–820. ————– (2003a). Decoherence, einselection, and the quantum origins of the classical. Reviews of Modern Physics 75, 715. ————– (2003b). Environment-assisted invariance, causality, and probabilities in quantum physics. Physical Review Letters 90, 120403. ————– (2005). Probabilities from entanglement, Born’s rule from envariance. Physical Review A 71, 052105.

3 A FIELD GUIDE TO RECENT WORK ON THE FOUNDATIONS OF STATISTICAL MECHANICS

ROMAN FRIGG

3.1 Introduction

3.1.1 Statistical Mechanics—A Trailer

Statistical mechanics (SM) is the study of the connection between micro-physics and macro-physics.1 Thermodynamics (TD) correctly describes a large class of phenomena we observe in macroscopic systems. The aim of statistical mechanics is to account for this behaviour in terms of the dynamical laws governing the microscopic constituents of macroscopic systems and probabilistic assumptions.2 This project can be divided into two sub-projects, equilibrium SM and non-equilibrium SM. This distinction is best illustrated with an example. Consider a gas initially confined to the left half of a box (see fig. 3.1):

Fig. 3.1. Initial state of a gas, wholly confined to the left compartment of a box separated by a barrier

This gas is in equilibrium, as all natural processes of change have come to an end and the observable state of the system is constant in time, meaning that all macroscopic parameters such as local temperature and local pressure assume constant values. Now we remove the barrier separating the two halves of the box. As a result, the gas is no longer in equilibrium and it quickly disperses (see fig. 3.2). This process of dispersion continues until the gas homogeneously fills the entire box, at which point the system will have reached a new equilibrium state (see fig. 3.3).

1 Throughout this chapter I use ‘micro’ and ‘macro’ as shorthands for ‘microscopic’ and ‘macroscopic’ respectively.
2 There is widespread agreement on the broad aim of SM; see for instance Ehrenfest and Ehrenfest (1912, p. 1), Khinchin (1949, p. 7), Dougherty (1993, p. 843), Sklar (1993, p. 3), Lebowitz (1999, p. 346), Goldstein (2001, p. 40), Ridderbos (2002, p. 66) and Uffink (2007, p. 923).


Fig. 3.2. When the barrier is removed, the gas is no longer in equilibrium and disperses
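The passage from fig. 3.1 to fig. 3.3 can be mimicked with a toy simulation. The sketch below is an illustration added here, not part of the chapter; it replaces the deterministic molecular dynamics the chapter is concerned with by a simple random walk (the particle number, step size, and number of steps are arbitrary assumptions):

```python
import random

random.seed(1)
n = 2000
# all particles start in the left half of a unit box, as in fig. 3.1
xs = [0.5 * random.random() for _ in range(n)]

def step(xs, dx=0.05):
    """Move every particle by a small random amount, with reflecting walls."""
    out = []
    for x in xs:
        x += random.uniform(-dx, dx)
        if x < 0.0:
            x = -x          # reflect at the left wall
        elif x > 1.0:
            x = 2.0 - x     # reflect at the right wall
        out.append(x)
    return out

# track the fraction of particles in the left half of the box
frac_left = [sum(x < 0.5 for x in xs) / n]
for _ in range(1500):
    xs = step(xs)
    frac_left.append(sum(x < 0.5 for x in xs) / n)
```

The fraction in the left half decays from 1 towards the equilibrium value 1/2 (up to fluctuations of order 1/√n), mirroring the dispersion of the gas; nothing in this stochastic stand-in, of course, speaks to the question, central below, of how such behaviour arises from deterministic dynamics.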

Fig. 3.3. Gas occupies new equilibrium state

From an SM point of view, equilibrium needs to be characterised in microphysical terms. What conditions does the motion of the molecules have to satisfy to ensure that the macroscopic parameters remain constant as long as the system is not subjected to perturbations from the outside (such as the removal of barriers)? And how can the values of macroscopic parameters like pressure and temperature be calculated on the basis of such a microphysical description? Equilibrium SM provides answers to these and related questions.

Non-equilibrium SM deals with systems out of equilibrium. How does a system approach equilibrium when left to itself in a non-equilibrium state, and why does it do so to begin with? What is it about molecules and their motions that leads them to spread out and assume a new equilibrium state when the shutter is removed? And, crucially, what accounts for the fact that the reverse process won’t happen? The gas diffuses and spreads evenly over the entire box; but it won’t, at some later point, spontaneously move back to where it started. And in this the gas is no exception. We see ice cubes melting, coffee getting cold when left alone, and milk mixing with tea; but we never observe the opposite happening. Ice cubes don’t suddenly emerge from lukewarm water, cold coffee doesn’t spontaneously heat up, and white tea doesn’t un-mix, leaving a spoonful of milk at the top of a cup otherwise filled with black tea. Change in the world is unidirectional: systems, when left alone, move towards equilibrium but not away from it. Let us introduce a term of art and refer to processes of this kind as ‘irreversible’. The fact that many processes in the world are irreversible is enshrined in the so-called Second Law of thermodynamics, which, roughly, states that transitions from equilibrium to non-equilibrium states cannot occur in isolated systems. What explains this regularity? It is the aim of non-equilibrium SM



to give a precise characterisation of irreversibility and to provide a microphysical explanation of why processes in the world are in fact irreversible.3 The issue of irreversibility is particularly perplexing because (as we will see) the laws of micro-physics have no asymmetry of this kind built into them. If a system can evolve from state A into state B, the inverse evolution, from state B to state A, is not ruled out by any law governing the microscopic constituents of matter. For instance, there is nothing in the laws governing the motion of molecules that prevents them from gathering again in the left half of the box after having uniformly filled the box for some time. But how is it possible that irreversible behaviour emerges in systems whose components are governed by laws which are not irreversible? One of the central problems of non-equilibrium SM is to reconcile the asymmetric behaviour of irreversible thermodynamic processes with the underlying symmetric dynamics.
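This tension between reversible micro-dynamics and relaxation towards equilibrium can be made vivid with the Kac ring, a standard toy model (the following sketch is an illustration added here, not part of the chapter; the site number and marker density are arbitrary assumptions). n sites on a ring each carry a ball of one of two colours; at every step all balls move one site clockwise, and a ball crossing one of a fixed set of ‘marked’ edges flips colour. The dynamics is deterministic and exactly periodic — after 2n steps each ball has crossed every marked edge twice, so the initial state recurs — yet for typical marker placements the colour imbalance relaxes towards its equilibrium value 0:

```python
import random

def kac_ring_step(balls, markers):
    """One time step: every ball moves one site clockwise;
    crossing a marked edge flips its colour (0 <-> 1)."""
    n = len(balls)
    # markers[i-1] marks the edge between site i-1 and site i
    return [balls[i - 1] ^ markers[i - 1] for i in range(n)]

random.seed(0)
n = 1000
markers = [random.random() < 0.1 for _ in range(n)]  # ~10% of edges marked
state = [1] * n  # far from equilibrium: all balls the same colour

# imbalance = |#colour-1 balls - #colour-0 balls| / n decays towards 0
imbalance = []
for _ in range(201):
    imbalance.append(abs(2 * sum(state) - n) / n)
    state = kac_ring_step(state, markers)
```

The imbalance decays (roughly like (1 − 2p)^t for marker density p) even though nothing probabilistic enters the dynamics itself, and running the map for 2n steps restores the initial state exactly — a miniature of the problem of reconciling relaxation with reversible and recurrent micro-dynamics.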

3.1.2 Aspirations and Limitations

This chapter presents a survey of recent work on the foundations of SM from a systematic perspective. To borrow a metaphor of Gilbert Ryle’s, it tries to map out the logical geography of the field, place the different positions and contributions on this map, and indicate where the lines are blurred and blank spots occur. Classical positions, approaches, and questions are discussed only if they have a bearing on current foundational debates; the presentation of the material does not follow the actual course of the history of SM, nor does it aim at historical accuracy when stating arguments and positions.4 Such a project faces an immediate difficulty. Foundational debates in many other fields can take as their point of departure a generally accepted formalism and a clear understanding of what the theory is. Not so in SM. Unlike quantum mechanics and relativity theory, say, SM has not yet found a generally accepted theoretical framework, let alone a canonical formulation. What we find in SM is a plethora of different approaches and schools, each with its own programme and mathematical apparatus, none of which has a legitimate claim to be more fundamental than its competitors. For this reason a review of foundational work in SM cannot simply begin with a concise statement of the theory’s formalism and its basic principles, and then move on to the different interpretational problems that arise. What, then, is the appropriate way to proceed? An encyclopaedic list of the different schools and their programme would do little to enhance our understanding of the workings of SM. Now it might seem that an answer to this question can be found in the observation that across the different approaches equilibrium theory is better understood than non-equilibrium theory, which might suggest that a review should 3 Different meanings are attached to the term ‘irreversible’ in different contexts, and even within thermodynamics itself (see Denbigh 1989a and Uffink 2001, §3). 
I am not concerned with these in what follows and always use the term in the sense introduced here. 4 Those interested in the long and intricate history of SM are referred to Brush (1976), Sklar (1993, Chapter 2), von Plato (1994) and Uffink (2007).


begin with a presentation and discussion of equilibrium, and then move on to examining non-equilibrium. Although not uncongenial, this approach has two serious drawbacks. First, it has the disadvantage that the discussion of specific positions (for instance the ergodic approach) will be spread out over different sections, and as a result it becomes difficult to assess these positions as consistent bodies of theory. Second, it creates the wrong and potentially misleading impression that equilibrium theory can (or even should) be thought of as an autonomous discipline. By disconnecting the treatment of equilibrium from a discussion of non-equilibrium we lose sight of the question of how and in what way the equilibrium state constitutes the final point towards which the dynamical evolution of a system converges. In what follows I take my lead from the fact that all these different schools (or at any rate those that I discuss) use slight variants of either of two theoretical frameworks, one of which can be associated with Boltzmann (1877) and the other with Gibbs (1902), and can thereby classify different approaches as either ‘Boltzmannian’ or ‘Gibbsian’. The reliance on a shared formalism (even if the understanding of the formalism varies radically) provides the necessary point of reference to compare these accounts and assess their respective merits and drawbacks. This is so because the problems that I mentioned in §3.1.1 can be given a precise formulation only within a particular mathematical framework. Moreover it turns out that these frameworks give rise to markedly different characterisations both of equilibrium and of non-equilibrium, and accordingly the problems that beset accounts formulated within either framework are peculiar to one framework and often do not have a counterpart in the other.
And last but not least, the scope of an approach essentially depends on the framework in which it is formulated, and, as we shall see, there are significant differences between the two (I return to this issue in the conclusion). Needless to say, omissions are unavoidable in a chapter-size review. I hope that any adversity caused by these omissions is somewhat alleviated by the fact that I clearly indicate at what point they occur and how the omitted positions or issues fit into the overall picture; I also provide ample references for those who wish to pursue the avenues I bypass. The most notable omission concerns the macro theory at stake, thermodynamics. The precise formulation of the theory, and in particular the Second Law, raises important questions. These are beyond the scope of this review; Appendix B provides a brief statement of the theory and flags the most important problems that attach to it. What is the relevant microphysical theory? A natural response would be to turn to quantum theory, which is generally regarded as the currently best description of micro entities. The actual debate has followed a different path. With some rare exceptions, foundational debates in SM have been, and still are, couched in terms of classical mechanics (which I briefly review in Appendix A). I adopt this point of view and confine the discussion to classical statistical mechanics. This, however, is not meant to suggest that the decision to discuss


foundational issues in a classical rather than a quantum setting is unproblematic. On the contrary, many problems that occupy centre stage in the debate over the foundations of SM are intimately linked to aspects of classical mechanics, and it seems legitimate to ask whether, and, if so, how, these problems surface in quantum statistical mechanics. (For a review of foundational issues in quantum SM see Emch (2007).)

3.2 The Boltzmann Approach

Over the years Boltzmann developed a multitude of different approaches to SM. However, contemporary Boltzmannians (references will be given below) take the account introduced in Boltzmann (1877) and streamlined by Ehrenfest and Ehrenfest (1912) as their starting point. For this reason I concentrate on this approach, and refer the reader to Klein (1973), Brush (1976), Sklar (1993), von Plato (1994), Cercignani (1998) and Uffink (2004, 2007) for a discussion of Boltzmann's other approaches and their tangled history, and to de Regt (1996), Blackmore (1999) and Visser (1999) for discussions of Boltzmann's philosophical and methodological presuppositions at different times.

3.2.1 The Framework

Consider a system consisting of n particles with three degrees of freedom,5 which is confined to a container of finite volume V and has total energy E.6 The system's fine-grained micro-state is given by a point in its 6n dimensional phase space Γγ.7 In what follows we assume that the system's dynamics is governed by Hamilton's equations of motion,8 and that the system is isolated from its environment.9 Hence, the system's fine-grained micro-state x lies within a finite sub-region Γγ,a of Γγ, the so-called 'accessible region' of Γγ. This region is determined by the constraints that the motion of the particles is confined to volume V and that the system has constant energy E—in fact, the latter implies that Γγ,a lies entirely within a 6n − 1 dimensional hypersurface ΓE, the so-called 'energy hypersurface', which is defined by the condition H(x) = E, where H is the Hamiltonian of the system and E the system's total energy. The phase space is endowed with a Lebesgue measure μL, which induces a measure μL,E on the energy hypersurface via equation (3.46) in the Appendix. Intuitively, these measures associate a 'volume' with subsets of Γγ and ΓE; to indicate that this 'volume' is not the familiar volume in three dimensional physical space it is often referred to as 'hypervolume'. Hamilton's equations define a measure preserving flow φt on Γγ, meaning that φt : Γγ → Γγ is a one-to-one mapping and μL(R) = μL(φt(R)) for all times t and all regions R ⊆ Γγ, from which it follows that μL,E(RE) = μL,E(φt(RE)) for all regions RE ⊆ ΓE.

Let Mi, i = 1, ..., m, be the system's macro-states. These are characterised by the values of macroscopic variables such as local pressure, local temperature, and volume.10 It is one of the basic posits of the Boltzmann approach that a system's macro-state supervenes on its fine-grained micro-state, meaning that a change in the macro-state must be accompanied by a change in the fine-grained micro-state (i.e. it is not possible, say, that the pressure of a system changes while its fine-grained micro-state remains the same). Hence, to every given fine-grained micro-state x ∈ ΓE there corresponds exactly one macro-state. Let us refer to this macro-state as M(x).11 This determination relation is not one-to-one; in fact many different x ∈ ΓE can correspond to the same macro-state (this will be illustrated in detail in the next subsection). It is therefore natural to define

ΓMi := {x ∈ ΓE | Mi = M(x)}, i = 1, ..., m,   (3.1)

the subset of ΓE consisting of all fine-grained micro-states that correspond to macro-state Mi. The proposition that a system with energy E is in macro-state Mi and the proposition that the system's fine-grained micro-state lies within ΓMi always have the same truth value; for this reason, Mi and ΓMi alike are sometimes referred to as 'macro-states'. However, at some points in what follows it is important to keep the two separate and so I do not follow this convention; I reserve the term 'macro-state' for the Mi's and refer to the ΓMi's as 'macro-regions'. The ΓMi do not overlap because macro-states supervene on micro-states: ΓMi ∩ ΓMj = ∅ for all i ≠ j and i, j = 1, ..., m. For a complete set of macro-states the ΓMi also jointly cover the accessible region of the energy hypersurface: ΓM1 ∪

5 The generalisation of what follows to systems consisting of objects with any finite number of degrees of freedom is straightforward.
6 The version of the Boltzmann framework introduced in this subsection is the one favoured by Lebowitz (1993a, 1993b, 1999), Goldstein (2001), and Goldstein and Lebowitz (2004). As we shall see in the next subsection, some authors give different definitions of some of the central concepts, most notably the Boltzmann entropy.
7 The choice of the somewhat gawky notation 'Γγ' will be justified in the next subsection.
8 For a brief review of classical mechanics see Appendix A. From a technical point of view the requirement that the system be Hamiltonian is restrictive because the Boltzmannian machinery, in particular the combinatorial argument introduced in the next subsection, can also be used in some cases of non-Hamiltonian systems (for instance the Baker's gas and the Kac ring). However, as long as one believes that classical mechanics is the true theory of particle motion (which is what we do in classical SM), these other systems are not relevant from a foundational point of view.
9 This assumption is not uncontroversial; in particular, it is rejected by those who advocate an interventionist approach to SM; for a discussion see §3.3.5.2.
10 Whether index i ranges over a set of finite, countably infinite, or uncountably infinite cardinality depends both on the system and on how macro-states are defined. In what follows I assume, for the sake of simplicity, that there is a finite number m of macro-states.
11 This is not to claim that all macroscopic properties of a gas supervene on its mechanical configuration; some (e.g. colour and smell) do not. Rather, it is an exclusion principle: if a property does not supervene on the system's mechanical configuration then it does not fall within the scope of SM.


... ∪ ΓMm = Γγ,a (where '∪', '∩' and '∅' denote set theoretic union, intersection and the empty set respectively). In this case the ΓMi form a partition of Γγ,a.12

The Boltzmann entropy of a macro-state Mi is defined as13

SB(Mi) = kB log[μL,E(ΓMi)],   (3.2)

where kB is the so-called Boltzmann constant. For later discussions, in particular for what Boltzmannians have to say about non-equilibrium and reductionism, a small 'cosmetic' amendment is needed. The Boltzmann entropy as introduced in equation (3.2) is a property of a macro-state. Since a system is in exactly one macro-state at a time, the Boltzmann entropy can equally be regarded as a property of the system itself. Let M(x(t)) be the system's macro-state at time t (i.e. M(x(t)) is the Mi in which the system's state x happens to be at time t); the system's Boltzmann entropy at time t is then defined as

SB(t) := SB[M(x(t))].   (3.3)
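The claim above that Hamilton's equations define a measure preserving flow φt (Liouville's theorem) can be checked concretely in the simplest case. The following sketch is my own illustration, not part of the text's formalism: it extracts the linear map effected by one leapfrog step for a harmonic oscillator with Hamiltonian H = p²/2 + q²/2 and confirms that the determinant of its Jacobian is 1, i.e. that phase-space area is preserved.

```python
import numpy as np

def leapfrog_step(q, p, h=0.1):
    """One leapfrog step for H = p^2/2 + q^2/2 (unit mass and frequency)."""
    p_half = p - 0.5 * h * q          # half kick: dp/dt = -dH/dq = -q
    q_new = q + h * p_half            # drift:     dq/dt =  dH/dp = p
    p_new = p_half - 0.5 * h * q_new  # half kick
    return q_new, p_new

# The step is linear in (q, p); recover its matrix by acting on basis vectors.
h = 0.1
col1 = leapfrog_step(1.0, 0.0, h)
col2 = leapfrog_step(0.0, 1.0, h)
M = np.array([[col1[0], col2[0]],
              [col1[1], col2[1]]])

# Liouville: the flow preserves phase-space volume, so det M = 1.
print(np.linalg.det(M))
```

The determinant is exactly 1 here because the step is a composition of three shears, each of determinant 1; the same volume preservation holds for the exact Hamiltonian flow.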

By definition, the equilibrium state is the macro-state for which the Boltzmann entropy is maximal. Let us denote that state by Meq (and, without loss of generality, choose the labelling of macro-states such that Meq = Mm). Justifying this definition is one of the main challenges for the Boltzmannian approach, and I return to this issue below in §3.2.7.

We now need to explain the approach to equilibrium. As phrased in the introduction, this would amount to providing a mechanical explanation of the Second Law of thermodynamics. It is generally accepted that this would be aiming too high; the best we can hope for within SM is a justification of a 'probabilistic version' of the Second Law, which I call 'Boltzmann's Law' (BL) (Callender 1999; Earman 2006, pp. 401–03):

Boltzmann's Law: Consider an arbitrary instant of time t = t1 and assume that the Boltzmann entropy of the system at that time, SB(t1), is far below its maximum value. It is then highly probable that at any later time t2 > t1 we have SB(t2) ≥ SB(t1).14

Unlike the Second Law, which is a universal law (in that it does not allow for exceptions), BL only makes claims about what is very likely to happen. Whether it is legitimate to replace the Second Law by BL will be discussed in §3.2.8. Even if this question is answered in the affirmative, what we expect from SM is an argument for the conclusion that BL, which so far is just a conjecture, holds true

12 Formally, {α1, ..., αk}, where αi ⊆ A for all i, is a partition of A iff α1 ∪ ... ∪ αk = A and αi ∩ αj = ∅ for all i ≠ j and i, j = 1, ..., k.
13 Goldstein (2001, p. 43), Goldstein and Lebowitz (2004, p. 57), Lebowitz (1993a, p. 34; 1993b, p. 5; 1999, p. 348).
14 BL is sometimes referred to as the 'statistical H-Theorem' or the 'statistical interpretation of the H-theorem' because in earlier approaches to SM Boltzmann introduced a quantity H, which is basically equal to −SB, and aimed to prove that under suitable assumptions it decreased monotonically. For discussions of this approach see the references cited at the beginning of this subsection.


in the relevant systems, and, if it does, an explanation of why this is so. In order to address this question, we need to introduce probabilities into the theory to elucidate the locution 'highly probable'.

There are two different ways of introducing probabilities into the Boltzmannian framework. The first assigns probabilities directly to the system's macro-states; the second assigns probabilities to the system's micro-state being in particular subsets of the macro-region corresponding to the system's current macro-state.15 For want of better terms I refer to these as 'macro-probabilities' and 'micro-probabilities' respectively. Although implicit in the literature, the distinction between macro-probabilities and micro-probabilities has never been articulated, and it rarely, if ever, receives any attention. This distinction plays a central rôle in the discussion both of BL and of the interpretation of SM probabilities, and it is therefore important to give precise definitions.

Macro-Probabilities: A way of introducing probabilities into the theory, invented by Boltzmann (1877) and advocated since then by (among others) those working within the ergodic programme (see §3.2.4), is to assign probabilities to the macro-states Mi of the system. This is done by introducing the postulate that the probability of a macro-state Mi is proportional to the measure of its corresponding macro-region:

p(Mi) := c μL,E(ΓMi),   (3.4)

where c is a normalisation constant. I refer to this as the 'Proportionality Postulate' (PP). From this postulate and equation (3.2) it follows immediately that the most likely macro-state is the macro-state with the highest Boltzmann entropy and the one that occupies the largest part of the (accessible) phase space. From this point of view it seems natural to understand the approach to equilibrium as the evolution from an unlikely macro-state to a more likely macro-state and finally to the most likely macro-state. If the system evolves from less to more likely macro-states most of the time then we have justified BL. Whether we have any reasons to believe that this is indeed the case will be discussed in §3.2.3.2.

Micro-Probabilities: A different approach assigns probabilities to sets of micro-states (rather than to macro-states) on the basis of the so-called statistical postulate (SP).16

Statistical Postulate: Let M be the macro-state of a system at time t. Then the probability at t that the fine-grained micro-state of the system lies in a subset A of ΓM is

μL,E(A)/μL,E(ΓM).   (3.5)

With this assumption the truth of BL depends on the dynamics of the system, because now BL states that the overwhelming majority of fine-grained micro-states in any ΓMi (except the equilibrium macro-region) are such that they evolve under the dynamics of the system towards some other region ΓMj of higher entropy. Hence, the truth of BL depends on the features of the dynamics. The question is whether the systems we are interested in have this property. I come back to this issue in §3.2.6.

15 This is not to say that these two kinds of probabilities are incompatible; in fact they could be used in conjunction. However, this is not what happens in the literature.
16 It is not clear where this postulate originates. It has recently—with some qualifications, as we shall see—been advocated by Albert (2000), and also Bricmont (1996) uses arguments based on probabilities introduced in this way; see also Earman (2006, p. 405), where this postulate is discussed, but not endorsed. Principles very similar to this one have also been suggested by various writers within the Gibbsian tradition; see §3.3.

3.2.2 The Combinatorial Argument

An important element in most presentations of the Boltzmann approach is what is now known as the 'combinatorial argument'. However, depending on how one understands the approach, this argument is put to different uses—a fact that is unfortunately hardly ever made explicit in the literature on the topic. I will first present the argument and then explain what these different uses are.

Consider the same system of n identical particles as above, but now focus on the 6 dimensional phase space of one of these particles, the so-called μ-space Γμ, rather than the (6n dimensional) phase space Γγ of the entire system.17 A point in Γμ denotes the particle's fine-grained micro-state. It necessarily lies within a finite sub-region Γμ,a of Γμ, the accessible region of Γμ. This region is determined by the constraints that the motion of the particles is confined to volume V and that the system as a whole has constant energy E. Now we choose a partition ω of Γμ,a; that is, we divide Γμ,a into a finite number l of disjoint cells ωj, which jointly cover the accessible region of the phase space. The introduction of a partition on a phase space is also referred to as 'coarse-graining'. The cells are taken to be rectangular with respect to the position and momentum coordinates and of equal volume δω (this is illustrated in fig. 3.4).
The so-called coarse-grained micro-state of a particle is given by specifying in which cell ωj its fine-grained micro-state lies.18

[Fig. 3.4. Partitioning (or coarse-graining) of the phase space]

The micro-state of the entire system is a specification of the micro-state of every particle in the system, and hence the fine-grained micro-state of the system is determined by n labelled points in Γμ.19 The so-called coarse-grained micro-state is a specification of which particle's state lies in which cell of the partition ω of Γμ,a; for this reason the coarse-grained micro-state of a system is also referred to as an 'arrangement'.

The crucial observation now is that a number of arrangements correspond to the same macro-state, because a system's macro-properties are determined solely by the number of particles in each cell, while it is irrelevant exactly which particle is in which cell. For instance, whether particle number 5 and particle number 7 are in cells ω1 and ω2 respectively, or vice versa, makes no difference to the macro-properties of the system as a whole because these do not depend on which particle is in which cell. Hence, all we need in order to determine a system's macro-properties is a specification of how many particles there are in each cell of the coarse-grained μ-space. Such a specification is called a 'distribution'. Symbolically we can write it as a tuple D = (n1, ..., nl), meaning the distribution comprising n1 particles in cell ω1, etc. The nj are referred to as 'occupation numbers' and they satisfy the condition n1 + ... + nl = n.

For what follows it is convenient to label the different distributions with a discrete index i (which is not a problem since for any given partition ω and particle number n there is only a finite number of distributions) and denote the ith tuple by Di. The beginning of such a labelling could be, for instance, D1 = (n, 0, ..., 0), D2 = (n − 1, 1, 0, ..., 0), D3 = (n − 2, 1, 1, 0, ..., 0), etc.

How many arrangements are compatible with a given distribution D? Some elementary combinatorial considerations show that

G(D) := n!/(n1! ... nl!)   (3.6)

arrangements are compatible with a given distribution D (where '!' denotes factorials, i.e. k! := k(k − 1) ... 1 for any natural number k, and 0! := 1). For this reason a distribution conveys much less information than an arrangement.

Each distribution corresponds to a well-defined region of Γγ, which can be seen as follows. A partition of Γγ is introduced in exactly the same way as above. In fact, the choice of a partition of Γμ induces a partition of Γγ because Γγ is just the Cartesian product of n copies of Γμ. The coarse-grained state of the system is then given by specifying in which cell of the partition its fine-grained state lies. This is illustrated in fig. 3.5 for the fictitious case of a two particle system, where each particle's μ-space is one dimensional and endowed with a partition consisting of four cells ω1, ..., ω4. (This case is fictitious because in classical mechanics there is no Γμ with fewer than two dimensions. I consider this example for ease of illustration; the main idea carries over to higher dimensional spaces without difficulties.)

17 The use of the symbol μ both in 'μ-space' and to refer to the measure on the phase space is somewhat unfortunate as they have nothing to do with each other. However, as this terminology is widely used I stick to it. The distinction between μ-space and γ-space goes back to Ehrenfest and Ehrenfest (1912); it is a pragmatic and not a mathematical distinction in that it indicates how we use these spaces (namely to describe a single particle's or an entire system's state). From a mathematical point of view both μ-space and γ-space are classical phase spaces (usually denoted by Γ). This explains the choice of the seemingly unwieldy symbols Γμ and Γγ.
18 There is a question about what cell a fine-grained micro-state belongs to if it lies exactly on the boundary between two cells. One could resolve this problem by adopting suitable conventions. However, it turns out later on that sets of measure zero (such as boundaries) can be disregarded, and so there is no need to settle this issue.
19 The points are labelled in the sense that it is specified which point represents the state of which particle.
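Equation (3.6) can be checked by brute force for small numbers. In the following sketch (my own illustration; the particle and cell numbers are arbitrary), every assignment of labelled particles to cells is enumerated and the arrangements compatible with a given distribution are counted directly.

```python
from itertools import product
from math import factorial

def G(D):
    """Number of arrangements compatible with distribution D, eq. (3.6)."""
    g = factorial(sum(D))
    for nj in D:
        g //= factorial(nj)
    return g

def count_arrangements(D):
    """Brute force: assign each of n labelled particles to one of l cells and
    count the assignments whose occupation numbers equal D."""
    n, l = sum(D), len(D)
    return sum(
        1
        for cells in product(range(l), repeat=n)  # cell index for each particle
        if all(cells.count(j) == D[j] for j in range(l))
    )

for D in [(2, 1, 1), (4, 0, 0), (1, 0, 0, 1)]:
    print(D, G(D), count_arrangements(D))  # the two counts agree
```

For instance, D = (2, 1, 1) gives 4!/(2!1!1!) = 12 arrangements, which the enumeration confirms.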

[Fig. 3.5. Specification of the coarse-grained state of a system]

This illustration shows that each distribution D corresponds to a particular part of Γγ,a; and it also shows the important fact that parts corresponding to different distributions do not overlap. In fig. 3.5, the hatched areas (which differ by which particle is in which cell) correspond to the distribution (1, 0, 0, 1) and the dotted area (where both particles are in the same cell) corresponds to (0, 2, 0, 0). Furthermore, we see that the hatched area is twice as large as the dotted area, which illustrates an important fact about distributions to which we now turn.

From the above it becomes clear that each point x in Γγ,a corresponds to exactly one distribution; call this distribution D(x). The converse of this, of course, fails, since in general many points in Γγ,a correspond to the same distribution Di. These states together form the set ΓDi:

ΓDi := {x ∈ Γγ | D(x) = Di}.   (3.7)

From equations (3.6) and (3.7), together with the assumption that all cells have the same size δω (in the 6 dimensional μ-space), it follows that

μL(ΓDi) = G(Di) (δω)^n.   (3.8)
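The two-particle, four-cell example of fig. 3.5 can be reproduced by enumeration. This sketch (my own; the cell volume δω = 0.5 is an arbitrary choice) treats each cell of the induced partition of Γγ as a pair of μ-space cells, sums the volumes of the cells realising each distribution, and recovers both equation (3.8) and the observation that the region for (1, 0, 0, 1) is twice as large as that for (0, 2, 0, 0).

```python
from itertools import product
from math import factorial, isclose

l, n = 4, 2        # four mu-space cells, two particles, as in fig. 3.5
delta_omega = 0.5  # volume of a mu-space cell (arbitrary illustrative value)

def G(D):
    """Multinomial coefficient of eq. (3.6)."""
    g = factorial(sum(D))
    for nj in D:
        g //= factorial(nj)
    return g

# Each cell of the partition of Gamma_gamma is an n-tuple of mu-space cells
# and has volume (delta_omega)^n; group these cells by their distribution D(x).
volumes = {}
for cells in product(range(l), repeat=n):
    D = tuple(cells.count(j) for j in range(l))
    volumes[D] = volumes.get(D, 0.0) + delta_omega ** n

print(volumes[(1, 0, 0, 1)] / volumes[(0, 2, 0, 0)])  # 2.0 (hatched vs dotted)
for D, vol in volumes.items():
    assert isclose(vol, G(D) * delta_omega ** n)  # eq. (3.8)
```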


Next we want to know the distribution D for which G(D), and with it μL(ΓDi), assumes its maximum. To solve this problem we make two crucial sets of assumptions, one concerning the energy of the system and one concerning the system's size (their implications will be discussed in §3.2.7).

First, we assume that the energy of a particle depends only on which cell ωj it is in, but not on the states of the other particles; that is, we neglect the contribution to the energy of the system that stems from interactions between the particles. We then also assume that the energy Ej of a particle whose fine-grained state lies in cell ωj depends only on the index j, i.e. on the cell in which the state is, and not on its exact location within the cell. This can be achieved, for instance, by taking Ej to be the average energy in ωj. Under these assumptions, the total energy of the system is given by Σj nj Ej (with j running from 1 to l).

Second, we assume that the system as a whole is large and that there are many particles in each individual cell (nj ≫ 1 for all j). These assumptions allow us to use Stirling's formula to approximate factorials:

n! ≈ √(2πn) (n/e)^n.   (3.9)

Now we have to maximise G(D) under the 'boundary conditions' that the number n of particles is constant (n = Σj nj) and that the total energy E of the system is constant (E = Σj nj Ej). Under these assumptions one can then prove (using Stirling's approximation and the Lagrange multiplier method) that G(D) reaches its maximum for

nj = α exp(−βEj),   (3.10)

which is the (discrete) Maxwell-Boltzmann distribution, where α and β are constants depending on the nature of the system (Ehrenfest and Ehrenfest 1912, p. 30; Tolman 1938, Chapter 4).

Before we turn to a discussion of the significance of these calculations, something needs to be said about observable quantities. It is obvious from what has been said so far that observable quantities are averages of the form

⟨f⟩ := Σj nj fωj,   (3.11)

where f is a function of the position and momentum of a particle, and fωj is the value of the function in cell ωj (where, as in the case of the energy, it is assumed that the value of f depends only on the cell ωj and not on the particle's location within the cell; i.e. it is assumed that f does not fluctuate on the scale of δω). In particular, one can calculate the pressure of a gas in equilibrium in this way.20

20 In practice this is not straightforward. To derive the desired results, one first has to express the Maxwell-Boltzmann distribution in differential form, transform it to position and momentum coordinates and take a suitable continuum limit. For details see, for instance, Tolman (1938, Chapter 4).
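That the maximising distribution has the exponential form (3.10) can also be seen without Lagrange multipliers, by direct search. The following sketch is my own illustration with an arbitrary toy setup: three cells with one-particle energies Ej = 0, 1, 2, n = 300 particles and total energy E = 150. It enumerates every distribution satisfying both constraints, maximises log G(D) (computed via the log-Gamma function to avoid huge factorials), and checks that the successive ratios of the winning occupation numbers are roughly constant, as nj = α exp(−βEj) requires.

```python
from math import lgamma

n_total, E_total = 300, 150
energies = (0, 1, 2)  # toy one-particle energy levels (assumed for illustration)

def log_G(D):
    """Logarithm of eq. (3.6): log n! - sum_j log n_j!."""
    return lgamma(sum(D) + 1) - sum(lgamma(nj + 1) for nj in D)

# All distributions with sum n_j = n_total and sum n_j E_j = E_total.
candidates = []
for n3 in range(n_total + 1):
    n2 = E_total - 2 * n3      # from the energy constraint
    n1 = n_total - n2 - n3     # from the particle-number constraint
    if n1 >= 0 and n2 >= 0:
        candidates.append((n1, n2, n3))

best = max(candidates, key=log_G)
print(best)                    # (185, 80, 35)
n1, n2, n3 = best
print(n2 / n1, n3 / n2)        # roughly equal ratios: geometric (exponential) decay
```

The ratios n2/n1 ≈ 0.43 and n3/n2 ≈ 0.44 agree to within the granularity imposed by integer occupation numbers, as the exponential form predicts.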


What is the relevance of these considerations for the Boltzmann approach? The answer to this question is not immediately obvious. Intuitively one would like to associate the ΓDi with the system's macro-regions ΓMi. However, such an association is undercut by the fact that the ΓDi are 6n dimensional objects, while the ΓMi, as defined by equation (3.1), are subsets of the 6n − 1 dimensional energy hypersurface ΓE. Two responses to this problem are possible.

The first is to replace the definition of a macro-region given in the previous subsection by one that associates macro-states with 6n dimensional rather than 6n − 1 dimensional parts of Γγ, which amounts to replacing equation (3.1) by ΓMi := {x ∈ Γγ | Mi = M(x)} for all i = 1, ..., m. Macro-states thus defined can then be identified with the regions of Γγ corresponding to a given distribution: ΓMi = ΓDi for all i = 1, ..., m, where m now is the number of different distributions. This requires various adjustments in the apparatus developed in §3.2.1, most notably in the definition of the Boltzmann entropy. Taking the lead from the idea that the Boltzmann entropy is the logarithm of the hypervolume of the part of the phase space associated with a macro-state, we have

SB(Mi) := kB log[μL(ΓMi)],   (3.12)

and with equation (3.8) we get

SB(Mi) = kB log[G(Di)] + kB n log(δω).   (3.13)

Since the last term is just an additive constant it can be dropped (provided we keep the partition fixed), because ultimately we are interested in entropy differences rather than in absolute values. We then obtain SB(Mi) = kB log[G(Di)], which is the definition of the Boltzmann entropy we find in Albert (2000, p. 50).

In passing it is worth mentioning that SB can be expressed in alternative ways. If we plug equation (3.6) into equation (3.13) and take into account the above assumption that all nj are large (which allows us to use Stirling's approximation), we obtain (Tolman 1938, Chapter 4):

SB(Mi) ≈ −kB Σj nj log nj + c(n, δω),   (3.14)

where the nj are the occupation numbers of distribution Di and c(n, δω) is a constant depending on n and δω. Introducing the quotients pj := nj/n and plugging them into the above formula, we find

SB(Mi) ≈ −n kB Σj pj log pj + c̃(n, δω),   (3.15)

where, again, c̃(n, δω) is a constant depending on n and δω.21 The quotients pj are often said to be the probability of finding a randomly chosen particle in cell ωj. This is correct, but it is important not to confuse this probability, which is simply a finite relative frequency, with the probabilities that occur in BL. In fact, the two have nothing to do with each other.

What are the pros and the cons of this first response? The obvious advantage is that it provides an explicit construction of the macro-regions ΓMi, and that this construction gives rise to a definition of the Boltzmann entropy which allows for easy calculation of its values. The downside of this '6n dimensional approach' is that the macro-regions ΓMi almost entirely consist of micro-states which the system never visits (remember that the motion of the system's micro-state is confined to the 6n − 1 dimensional energy hypersurface). This is a problem because it is not clear what relevance considerations based on the hypervolume of certain parts of the phase space have if we know that the system's actual micro-state only ever visits a subset of these parts which is of measure zero. Most notably, of what relevance is the observation that the equilibrium macro-region has the largest (6n dimensional) hypervolume if the system can only ever access a subset of measure zero of this macro-region? Unless there is a relation between the 6n − 1 dimensional hypervolume of relevant parts of the energy hypersurface and the 6n dimensional hypervolume of the parts of Γγ in which they lie, considerations based on the 6n dimensional hypervolume are inconsequential.22

The second response to the above problem leaves the definition of macro-regions as subsets of the 6n − 1 dimensional energy hypersurface unaltered and endeavours to 'translate' the results of the combinatorial argument back into the original framework (as presented in §3.2.1). This, as we shall see, is possible, but only at the cost of introducing a further hypothesis postulating a relation between the values of the 6n and the 6n − 1 dimensional hypervolumes of relevant parts of Γγ.

21 This expression for the Boltzmann entropy is particularly useful because, as we shall see in §3.3.6.1, Σj pj log pj is a good measure for the 'flatness' of the distribution pj.
The most important achievement of the combinatorial argument is the construction of the ΓDi, the regions in phase space occupied by micro-states with the same macro-properties. Given that the original framework does not provide a recipe for how to construct the macro-regions, we want to make use of the ΓDi to define the ΓMi. A straightforward way to obtain the ΓMi from the ΓDi is to intersect the ΓDi with ΓE:23

ΓMi := ΓDi ∩ ΓE.   (3.16)

22 Moreover, the '6n dimensional approach' renders the 'orthodox' account of SM probability, the time average interpretation (see §3.2.4), impossible. This interpretation is based on the assumption that the system is ergodic on the union of the macro-regions, which is impossible if macro-regions are 6n dimensional.
23 This construction implicitly assumes that there is a one-to-one correspondence between distributions and macro-states. This assumption is too simplistic in at least two ways. First, ΓDi ∩ ΓE may be empty for some i. Second, characteristically several distributions correspond to the same macro-state in that the macroscopic parameters defining the macro-state assume the same values for all of them. These problems can easily be overcome. The first can be solved by simply deleting empty Mi from the list of macro-regions; the second can be overcome by intersecting ΓE not with each individual ΓDi, but instead with the union of all ΓDi that correspond to the same macro-state. Since this would not alter any of the considerations to follow, I disregard this issue henceforth.

How can we calculate the Boltzmann entropy of the macro-states corresponding to macro-regions thus defined? The problem is that in order to calculate the Boltzmann entropy of these states we need the 6n − 1 dimensional hypervolume of the ΓMi, but what we are given (via equation (3.8)) is the 6n dimensional hypervolume of the ΓDi, and there is no way to compute the former on the basis of the latter. The way out of this impasse is to introduce a new postulate, namely that the 6n − 1 dimensional hypervolume of the ΓMi is proportional to the 6n dimensional hypervolume of the ΓDi: μL,E(ΓMi) = kv μL(ΓDi), where kv is a proportionality constant. It is plausible to assume that this postulate is at least approximately correct because the energy hypersurface of characteristic SM systems is smooth and does not oscillate on the scale of δω. Given this, we have

SB(Mi) = S′B(Mi) + kB log(kv);   (3.17)

that is, SB and S′B (the entropy calculated with the 6n dimensional hypervolume, as in equation (3.13)) differ only by an additive constant, and so equation (3.13) as well as equations (3.14) and (3.15) can be used to determine the values of SB. In what follows I assume that this 'proportionality assumption' holds water and that the Boltzmann entropy of a macro-state can be calculated using equation (3.17).

3.2.3 Problems and Tasks

In this subsection I point out the issues that the Boltzmannian needs to address in order to develop the approach introduced so far into a full-fledged account of SM. Needless to say, these issues are not independent of each other, and the response to one bears on the responses to others.

3.2.3.1 Issue 1: The Connection with Dynamics

The Boltzmannian account as developed so far makes no reference to dynamical properties of the system other than the conservation of energy, which is a consequence of Hamilton's equations of motion. But not every dynamical system—not even if it consists of a large number of particles—behaves thermodynamically in that the Boltzmann entropy increases most of the time.24 For such behaviour to take place it must be the case that a system, which is initially prepared in any low entropy state, eventually moves towards the region of Γγ associated with equilibrium. This is illustrated in fig. 3.6 (which is adapted from Penrose 1989, p. 401 and p. 407). But this need not be so. If, for instance, the initial low entropy macro-region is separated from the equilibrium region by an invariant surface, then no approach to equilibrium takes place. Hence, the question is what kind of dynamics a system must have in order to behave thermodynamically.

24 Lavis (2005, pp. 254–61) criticises the standard preoccupation with 'local' entropy increase as misplaced and suggests that what SM should aim to explain is so-called thermodynamic-like behaviour, namely that the Boltzmann entropy be close to its maximum most of the time.
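The probabilistic character of BL (entropy increase most of the time, with occasional decreases) is easy to exhibit in a toy model that is, of course, not one of the Hamiltonian systems at issue here. The following sketch (my own illustration) simulates the Ehrenfest urn model: n balls distributed over two urns, with a randomly chosen ball moved to the other urn at each step. The macro-state is the occupation pair (k, n − k), and the Boltzmann entropy is the logarithm of the number of micro-states realising it, log C(n, k).

```python
import random
from math import comb, log

random.seed(0)          # fixed seed for reproducibility
n, steps = 100, 5000
k = 0                   # all balls in urn 1: the lowest-entropy macro-state

def S_B(k):
    """Boltzmann entropy (k_B = 1) of macro-state (k, n - k)."""
    return log(comb(n, k))

entropies = [S_B(k)]
for _ in range(steps):
    # A uniformly chosen ball sits in urn 2 with probability k/n.
    if random.random() < k / n:
        k -= 1          # ball moves from urn 2 to urn 1
    else:
        k += 1          # ball moves from urn 1 to urn 2
    entropies.append(S_B(k))

# Entropy rises from its minimum and then stays near its maximum log C(n, n/2)
# most of the time, in line with 'thermodynamic-like behaviour' (cf. footnote 24).
near_max = sum(s > 0.9 * S_B(n // 2) for s in entropies) / len(entropies)
print(entropies[0], entropies[-1], near_max)
```

The run starts at entropy 0, climbs quickly, and thereafter fluctuates near the maximum, with occasional small decreases: exactly the qualified, probabilistic behaviour that BL describes.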

114 RECENT WORK ON THE FOUNDATIONS OF STATISTICAL MECHANICS

[Fig. 3.6. Trajectory from a low entropy state to a region associated with equilibrium; the figure shows the energy hypersurface Γγ,a with the equilibrium macro-region ΓDeq.]

A common response begins by pointing out that equilibrium is not only associated with the largest part of Γγ; in fact, the equilibrium macro-region is enormously larger than any other macro-region (Ehrenfest and Ehrenfest 1912, p. 30). Numerical considerations show that the ratio of the volumes of ΓMeq and ΓM, where M is a typical non-equilibrium distribution, is of the magnitude of 10^n (Goldstein 2001, p. 43; Penrose 1989, p. 403). If we now assume that the system's state drifts around more or less 'randomly' on Γγ,a then, because ΓMeq is vastly larger than any other macro-region, sooner or later the system will reach equilibrium and stay there for at least a very long time. The qualification 'more or less randomly' is essential. If the motion is too regular, it is possible that the system successfully avoids equilibrium positions. But if the state wanders around on the energy hypersurface randomly, then, the idea is, it simply cannot avoid moving into the region associated with equilibrium sooner or later. Plausible as it may seem, this argument has at best heuristic value. What does it mean for a system to drift around randomly? In particular in the context of Hamiltonian mechanics, a deterministic theory, the notion of drifting around randomly is in need of explanation: what conditions does a classical system have to satisfy in order to possess 'random properties' sufficient to warrant the approach to equilibrium?

3.2.3.2 Issue 2: Introducing and Interpreting Probabilities

There are several different (albeit interrelated) issues that must be addressed in order to understand the origin and meaning of probabilities in SM, and all of them are intimately connected to Issue 1. The first of these is the problem of interpretation.

The interpretation of SM probabilities. How are SM probabilities to be understood?
Approaches to probability can be divided into two broad groups.25 First, epistemic approaches take probabilities to be measures for degrees of belief. Those who subscribe to an objective epistemic theory take probabilities to 25 What follows is only the briefest of sketches. Those options that have been seriously pursued within SM will be discussed in more detail below. For an in-depth discussion of all these approaches see, for instance, Howson (1995), Gillies (2000), Galavotti (2005) and Mellor (2005).

THE BOLTZMANN APPROACH

115

be degrees of rational belief, whereby ‘rational’ is understood to imply that given the same evidence all rational agents have the same degree of belief in any proposition. This is denied by those who hold a subjective epistemic theory, regarding probabilities as subjective degrees of belief that can differ between persons even if they are presented with the same body of evidence.26 Second, ontic approaches take probabilities to be part of the ‘furniture of the world’. On the frequency approach, probabilities are long run frequencies of certain events. On the propensity theory, probabilities are tendencies or dispositions inherent in objects or situations. The Humean best systems approach—introduced in Lewis (1986)—views probability as defined by the probabilistic laws that are part of that set of laws which strike the best balance between simplicity, strength and fit.27 To which of these groups do the probabilities introduced in the Boltzmannian scheme belong? We have introduced two different kinds of probabilities (micro and macro), which, prima facie, need not be interpreted in the same way. But before delving into the issue of interpretation, we need to discuss whether these probabilities can, as they should, explain Boltzmann’s law. In fact, serious problems arise for both kinds of probabilities. Macro-Probabilities. Boltzmann suggested that macro-probabilities explain the approach to equilibrium: if the system is initially prepared in an improbable macro-state (i.e. one far away from equilibrium), it will from then on evolve towards more likely states until it reaches, at equilibrium, the most likely state (1877, p. 165). This happens because Boltzmann takes it as a given that the system ‘always evolves from an improbable to a more probable state’ (ibid., p. 166). This assumption is unwarranted. 
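Why it is unwarranted can be checked with a quick simulation of the biased die discussed in the next paragraph (a sketch with illustrative values: the six has probability 0.25, every other face 0.15):

```python
import random

random.seed(1)
faces = [1, 2, 3, 4, 5, 6]
weights = [0.15] * 5 + [0.25]      # the six is the single most likely outcome

# Roll many times and look at what follows a 'three'.
rolls = random.choices(faces, weights=weights, k=200_000)
after_three = [b for a, b in zip(rolls, rolls[1:]) if a == 3]
frac_six = sum(1 for r in after_three if r == 6) / len(after_three)
print(frac_six)   # ~0.25: a non-six follows a three with probability ~0.75
```

The most probable single outcome almost never follows: an unconditional distribution simply says nothing about the succession of states.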
Equation (3.4) assigns unconditional probabilities to macro-states, and as such they do not imply anything about the succession of states, let alone that states of low probability are followed by states of higher probability. As an example consider a biased die: the probability of getting a 'six' is 0.25 and each other number of spots has probability 0.15. Can we then infer that after, say, a 'three' we have to get a 'six' because the six is the most likely event? Of course not; in fact, we are much more likely not to get a 'six' (the probability for non-six is 0.75, while the one for six is 0.25). A further (yet related) problem is that BL makes a statement about a conditional probability, namely the probability of the system's macro-state at t2 being such that SB(t2) > SB(t1), given that the system's macro-state at the earlier time t1 was such that its Boltzmann entropy was SB(t1). The probabilities of PP (see equation (3.4)) are not of this kind, and they cannot be turned into probabilities of this kind by using the elementary definition of conditional probabilities, p(B|A) = p(B&A)/p(A), for reasons pointed out by Frigg (2007a). For this reason non-equilibrium SM cannot be based on the PP, no matter how the probabilities in it are interpreted.

However, PP does play a rôle in equilibrium SM. It posits that the equilibrium state is the most likely of all states and hence that the system is most likely to be in equilibrium. This squares well with an intuitive understanding of equilibrium as the state that the system reaches after a (usually short) transient phase, and in which it then stays (remember the spreading gas in the introduction). Granting that, what notion of probability is at work in PP? And why, if at all, is this postulate true? That is, what facts about the system make it the case that the equilibrium state is indeed the most likely state? These are the questions that Boltzmannian equilibrium SM has to answer, and I will turn to them in §3.2.4.

Micro-Probabilities. The conditional probabilities needed to explain BL can be calculated on the basis of SP (see equation (3.5)).28 Let M be the macro-state of a system at time t. For every point x ∈ ΓM there is a matter of fact (determined by the Hamiltonian of the system) about whether x evolves into a region of higher or lower entropy or stays at the same level of entropy. Call ΓM+, ΓM−, and ΓM0 the sets of those points of ΓM that evolve towards a region of higher, lower, or same entropy respectively (hence ΓM = ΓM+ ∪ ΓM− ∪ ΓM0). The probability for the system's entropy to either stay the same or increase as time evolves is μ(ΓM+ ∪ ΓM0)/μ(ΓM). Hence, it is a necessary and sufficient condition for BL to be true that μ(ΓM+ ∪ ΓM0) ≫ μ(ΓM−) for all macro-states M except the equilibrium state itself (for which, trivially, μ(ΓM+) = 0). BL then translates into the statement that the overwhelming majority of micro-states in every macro-region ΓM except ΓMeq evolve under the dynamics of the system towards regions of higher entropy. This proposal is seriously flawed.

26 'Subjective probability' is often used as a synonym for 'epistemic probability'. This is misleading because not all epistemic probabilities are also subjective. Jaynes's approach to probabilities, to which I turn below, is a case in point.
27 Sometimes ontic probabilities are referred to as 'objective probabilities'. This is misleading because epistemic probabilities can be objective as well.
It turns out that if the system, in macro-state M, is very likely to evolve towards a macro-state of higher entropy in the future (which we want to be the case), then, because of the time reversal invariance of the underlying dynamics, the system is also very likely to have evolved into the current macro-state M from another macro-state M′ of higher entropy than M (see Appendix A for a discussion of time reversal invariance). So whenever the system is very likely to have a high entropy future it is also very likely to have a high entropy past; see Albert (2000, Chapter 4) for a discussion of this point. This stands in stark contradiction with both common sense experience and BL itself. If we have a lukewarm cup of coffee on the desk, SP makes the radically wrong retrodiction that it is overwhelmingly likely that five minutes ago the coffee was cold (and the air in the room warmer), but then fluctuated away from equilibrium to become lukewarm, and that five minutes from now it will be cold again. In fact, however, the coffee was hot five minutes ago, has cooled down a bit, and will have cooled down further five minutes from now.

This point is usually attributed to the Ehrenfests. It is indeed true that the Ehrenfests (1912, pp. 32–34) discuss transitions between different entropy levels and state that higher-lower-higher transitions of the kind just mentioned are overwhelmingly likely. However, they base their statement on calculations about a probabilistic model, their famous urn-model, and hence it is not clear what bearing, if any, their considerations have on deterministic dynamical systems; in fact, some of the claims they make are not in general true in conservative deterministic systems. Nor is it true that the objection to the proposal follows directly from the time reversal invariance of the underlying dynamics on the simple grounds that everything that can happen in one direction of time can also happen in the other direction. However, one can indeed prove that the statement made in the last paragraph about entropic behaviour is true in conservative deterministic dynamical systems, if SP is assumed (Frigg 2007b). Hence there is a serious problem, because the micro-dynamics and SP lead us to expect the system to behave in a way that is entirely different from how the system actually behaves and from what the laws of thermodynamics lead us to expect. The upshot is that the dynamics at the micro level and SP by themselves do not underwrite the asymmetrical behaviour that we find at the macro level, and which is captured by BL. Hence the question is: where does the irreversibility at the macro level come from, if not from the dynamical laws governing the micro constituents of a system? I turn to a discussion of this question in §3.2.5.

3.2.3.3 Issue 3: Loschmidt's Reversibility Objection

As we observed in the introduction, the world is rife with irreversible processes; that is, processes that happen in one temporal direction but not in the other. This asymmetry is built into the Second Law of thermodynamics. As Loschmidt pointed out in his controversy with Boltzmann in the 1870s, this does not sit well with the fact that classical mechanics is time-reversal invariant.

28 To keep things simple I assume that there corresponds only one macro-state to a given entropy value. If this is not the case, exactly the same calculations can be made using the union of the macro-regions of all macro-states with the same entropy.
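The operative notion of time reversal can be illustrated numerically before stating the argument (a sketch with illustrative parameters): evolve a Hamiltonian system forward, reverse the momenta, and evolve forward again by the same time span, and the initial state is recovered. The velocity Verlet integrator is used here because it shares the exact reversibility of the underlying dynamics.

```python
def verlet(x, p, steps, dt=0.001):
    """Velocity Verlet for a harmonic oscillator, H = p**2/2 + x**2/2."""
    force = lambda q: -q
    for _ in range(steps):
        p_half = p + 0.5 * dt * force(x)
        x = x + dt * p_half
        p = p_half + 0.5 * dt * force(x)
    return x, p

x0, p0 = 1.0, 0.0
xf, pf = verlet(x0, p0, 5000)      # evolve forward over a time span Delta
xr, pr = verlet(xf, -pf, 5000)     # reverse the momentum, evolve forward again
print(abs(xr - x0), abs(pr + p0))  # both at round-off level: the motion retraces
```

Up to floating-point round-off, the momentum-reversed trajectory exactly retraces the original one, which is the content of Premise 1 below.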
The argument goes as follows:29

Premise 1: It follows from the time reversal invariance of classical mechanics that if a transition from state xi to state xf ('i' for 'initial' and 'f' for 'final') in time span Δ is possible (in the sense that there is a Hamiltonian that generates it), then the transition from state Rxf to state Rxi in time span Δ is possible as well, where R reverses the momentum of the instantaneous state of the system (see Appendix A for details).

Premise 2: Consider a system in macro-state M with Boltzmann entropy SB(M). Let RM be the reversed macro-state, i.e. the one with macro-region ΓRM := {x ∈ Γ | Rx ∈ ΓM} (basically we obtain RM by reversing the momenta of all particles at all points in ΓM). Then we have SB(M) = SB(RM); that is, the Boltzmann entropy is invariant under R.

Now consider a system that assumes macro-states Mi and Mf at ti and tf respectively, where Si := SB(Mi) < SB(Mf) =: Sf and ti < tf. Furthermore assume that the system's fine-grained state is xi ∈ ΓMi at ti and xf ∈ ΓMf at tf, and that the transition from xi to xf during the interval Δ := tf − ti

29 The following is a more detailed version of the presentation of the argument in Ehrenfest and Ehrenfest (1907, p. 311).


is allowed by the underlying dynamics. Now, by Premise 1 the system is time reversal invariant, and hence the transition from Rxf to Rxi during Δ is possible as well. Because, by Premise 2, SB is invariant under R, we have to conclude that a transition from Sf to Si is possible as well. This contradicts the Second Law of thermodynamics, which says that high to low entropy transitions cannot occur. So we are in the awkward position that a transition that is ruled out by the macro theory is allowed by the micro theory which is supposed to account for why the macro laws are the way they are. What are the consequences of this for the Boltzmannian? The answer depends on what one sees as the aim of SM. If a justification of the (exact) Second Law is the aim, the objection is devastating. However, we have observed before that this would be asking for too much, and what we should reasonably expect is an argument for the validity of BL rather than the Second Law. But BL is not obviously contradicted by the reversibility objection. So the question is whether the reversibility objection undermines BL, and if so in what way.

3.2.3.4 Issue 4: Zermelo's Recurrence Objection

Poincaré's recurrence theorem says, roughly speaking, that for the systems at stake in SM, almost every initial condition will, after some finite time (the Poincaré recurrence time), return to a state that is arbitrarily close to its initial state (see Appendix A for details). As Zermelo pointed out in 1896, this has the unwelcome consequence that entropy cannot keep increasing all the time; sooner or later there will be a period of time during which the entropy of the system decreases. For instance, if we consider again the initial example of the gas (fig. 3.1), it follows from Poincaré's recurrence theorem that there is a future instant of time at which the gas returns by itself to the left half of the container. This stands in contradiction to the Second Law.
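A finite-state analogue of this recurrence behaviour is easy to exhibit (a sketch, not Poincaré's theorem itself): any invertible map on a finite set decomposes the set into cycles, so every state recurs exactly. Arnold's cat map restricted to an integer grid is a standard example of an invertible, 'area preserving' map of this kind.

```python
def cat_map(x, y, n):
    """Arnold's cat map on the n-by-n integer torus: invertible and area preserving."""
    return (x + y) % n, (x + 2 * y) % n

n = 101
start = (1, 0)
point, steps = cat_map(*start, n), 1
while point != start:
    point = cat_map(*point, n)
    steps += 1

# Recurrence is guaranteed: an invertible map on a finite set partitions the
# set into cycles, so every state returns to itself after finitely many steps.
print(steps <= n * n)   # True: the state recurred within n*n = 10201 steps
```

For continuous Hamiltonian systems the recurrence time is finite but, as discussed next, astronomically large.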
A first attempt to dismiss this objection points to the fact that the time needed for this to happen in a realistic system is several times the age of the universe. In fact Boltzmann himself estimated that the time needed for a recurrence to occur in a system consisting of a cubic centimetre of air is about 10^(10^19) seconds (Uffink 2007, p. 984). Hence, we never observe such a recurrence, which renders the objection irrelevant. This response misses the point. The objection is not concerned with whether we ever experience a decrease of entropy; it points to an in-principle incompatibility between the Second Law and the behaviour of classical mechanical systems. This is, of course, compatible with saying that there need not be any conflict with actual experience. Another response is to let the number of particles in the system tend towards infinity (which is the basic idea of the so-called thermodynamic limit; see §3.3.3.2). In this case the Poincaré recurrence time becomes infinite as well. However, actual systems are not infinite, and whether such limiting behaviour explains the behaviour of actual systems is at least an open question. So there


is no easy way around the objection. But as with the reversibility objection, the real issue is whether there is a contradiction between recurrence and BL rather than the (exact) Second Law.

3.2.3.5 Issue 5: The Justification of Coarse-Graining

The introduction of a finite partition on the system's μ-space is crucial to the combinatorial argument. Only with respect to such a partition can the notion of a distribution be introduced, and thereby the Maxwell-Boltzmann equilibrium distribution be derived. Hence the use of a partition is essential. However, there is nothing in classical mechanics itself that either suggests or even justifies the introduction of a partition. How, then, can coarse-graining be justified? This question is further aggravated by the fact that the success of the combinatorial argument crucially depends on the choice of the right partition. The Maxwell-Boltzmann distribution is derived under the assumption that n is large and ni ≫ 1, i = 1, ..., l. This assumption is false if we choose too fine a partition (for instance one for which l ≥ n), in which case most ni are small. There are also restrictions on what kinds of partitions one can choose. It turns out that the combinatorial argument only works if one partitions phase space. Boltzmann first develops the argument by partitioning the particles' energy levels and shows that in this case the argument fails to reproduce the Maxwell-Boltzmann distribution (1877, pp. 168–90). The argument yields the correct result only if the phase space is partitioned along the position and momentum axes into cells of equal size.30 But why is the method sensitive to such choices, and what distinguishes 'good' from 'bad' partitions (other than the ability to reproduce the correct distribution law)?

3.2.3.6 Issue 6: Limitations of the Formalism

When deriving the Maxwell-Boltzmann distribution in §3.2.2, we made the assumption that the energy of a particle depends only on its coarse-grained micro-state, i.e. on the cell in which its fine-grained micro-state comes to lie, which (trivially) implies that a particle's energy does not depend on the other particles' states. This assumption occupies centre stage in the combinatorial argument because the derivation of the Maxwell-Boltzmann distribution depends on it. However, it is true only if there is no interaction between the particles; wherever there is an interaction potential between the particles of a system the argument is inapplicable. Hence, the only system satisfying the assumptions of the argument is the ideal gas (which, by definition, consists of non-interacting particles). This restriction is severe. Although some real gases approximately behave like ideal gases under certain circumstances (basically: if the density is low), most systems of interest in statistical mechanics cannot be regarded as ideal gases. The behaviour both of solids and liquids (and even of dense gases) essentially depends on the interaction between the micro-constituents of the system, and a

30 Strictly speaking this requirement is a bit too stringent. One can choose a different (but constant) cell size along each axis and still get the right results (Boltzmann 1877, p. 190).


theory that is forced to idealise these interactions away (should this be possible at all) is bound to miss out on what is essential to how real systems behave.31 A further limitation is that the argument assumes that all the particles of the system have the same phase space, which essentially amounts to assuming that we are dealing with a system of identical objects. A paradigm example of such a system is a monoatomic gas (e.g. helium). But many systems are not of that kind; most solids, for instance, contain constituents of different types. Finally, the formalism remains silent about what happens when systems interact with their environments. In practice many systems are not completely isolated, and one would like the formalism to cover at least selected kinds of interactions. Hence, the question is whether the approach can be generalised so that it applies to the cases that, as it stands, are not within its scope.

3.2.3.7 Issue 7: Reductionism

There is a consensus that the principal aim of SM is to account for the thermodynamic behaviour of a macroscopic system in terms of the dynamical laws governing its micro constituents, and it is a measure of the success of SM how much of TD it is able to reproduce (see §3.1.1). In philosophical parlance, the aim of SM is to reduce TD to mechanics plus probabilistic assumptions. What does such a reduction involve? How do the micro and the macro level have to relate to one another in order for it to be the case that the latter reduces to the former? The term 'reduction' has been used in many different senses, and there is no consensus over what exactly is involved in reducing one domain to another. So we need to specify what exactly is asserted when SM is claimed to reduce TD, and to discuss to what extent this assertion is true. A particular problem for reductionism is that idealisations play a constitutive rôle in SM (Sklar 2000, p. 740). Depending on which approach we favour, we work with, for example, non-interacting particles or hard spheres in a box instead of 'realistic' interactions; or systems of infinite volume and an infinite number of particles; or vanishing densities; or processes that are infinitely long. These idealisations are more than the 'usual' inexactness that is unavoidable when applying a general theory to the messy world; they are essential to SM, since the desired results usually cannot be derived without them. What is the status of results that only hold in highly idealised systems (and often are known to fail in more realistic systems), and what rôle can they play in a reduction of TD to SM?

3.2.3.8 Plan

§3.2.4 presents and discusses the 'orthodox' response to Issues 1 and 2, which is based on ergodic theory and the use of macro-probabilities. In §§3.2.5 and 3.2.6 I discuss the currently most influential alternative answer to these issues, which invokes the so-called Past Hypothesis and uses micro-probabilities. Issues 3 and 4 are addressed in §3.2.6.3. In §3.2.7 I deal with Issues 5 and 6, and Issue 7 is discussed in §3.2.8.

31 A discussion of this point can be found, for instance, in Schrödinger (1952, Chapter 1).

3.2.4 The Ergodicity Programme

The best-known response to Issues 1 and 2, if macro-probabilities are considered, is based on the notion of ergodicity. For this reason this subsection begins with an introduction to ergodic theory, then details how ergodicity is supposed to address the problems at stake, and finally explains what difficulties this approach faces.

3.2.4.1 Ergodic Theory

Modern ergodic theory is developed within the setting of dynamical systems theory.32 A dynamical system is a triplet (X, λ, φt), where X is a state space endowed with a normalised measure λ (i.e. λ(X) = 1) and φt : X → X, where t is a real number, is a one-parameter family of measure preserving automorphisms (i.e. λ(φt(A)) = λ(A) for all measurable A ⊆ X and for all t); the parameter t is interpreted as time. The Hamiltonian systems considered so far are dynamical systems in this sense if the following associations are made: X is the accessible part of the energy hypersurface; λ is the standard measure on the energy hypersurface, renormalised so that the measure of the accessible part is one; φt is the Hamiltonian flow. Now let f(x) be any complex-valued and Lebesgue-integrable function defined on X. Its space mean f̄ (sometimes also 'phase space average' or simply 'phase average') is defined as

f̄ := ∫_X f(x) dλ,   (3.18)

and its time mean f* (sometimes also 'time average') at x0 ∈ X is defined as

f*(x0) := lim_{τ→∞} (1/τ) ∫_{t0}^{t0+τ} f[φt(x0)] dt.   (3.19)

The question is whether the time mean exists; the Birkhoff theorem asserts that it does:

Birkhoff Theorem. Let (X, λ, φt) be a dynamical system and f a complex-valued, λ-integrable function on X. Then the time average f*(x0):
(i) exists almost everywhere (i.e. everywhere except, perhaps, on a set of measure zero);
(ii) is invariant (i.e. does not depend on the initial time t0): f*(x0) = f*(φt(x0)) for all t;
(iii) is integrable: ∫_X f*(x0) dλ = ∫_X f(x) dλ.

We can now state the central definition: Ergodicity. A dynamical system is ergodic iff for every complex-valued, λ-integrable function f on X we have f ∗ (x0 ) = f¯ almost everywhere; that is, everywhere except, perhaps, on a set of measure zero.

Two consequences of ergodicity are worth emphasising. First, if a system is ergodic, then for almost all trajectories, the fraction of time a trajectory spends in a region R equals the fraction of the area of X that is occupied by R. This can easily be seen by considering f(x) = χR(x), where χR(x) is the characteristic function of the region R: χR(x) = 1 if x ∈ R and χR(x) = 0 otherwise. We then have f̄ = λ(R) = f*(x), meaning that the fraction of time the system spends in R equals λ(R), which is the fraction of the area of X that is occupied by R. Second, almost all trajectories (i.e. trajectories through almost all initial conditions) come arbitrarily close to any point on the energy hypersurface infinitely many times; or, to put it another way, almost all trajectories pass through every subset of X that has positive measure infinitely many times. This follows from the fact that the time mean equals the space mean, which implies that the time mean cannot depend on the initial condition x0. Hence a system can be ergodic only if its trajectory may access all parts of the energy hypersurface.

The latter point is also closely related to the decomposition theorem. We first define:

Decomposability. A system is decomposable (sometimes also 'metrically decomposable' or 'metrically intransitive') iff there exist two regions X1 and X2 of non-zero measure such that X1 ∩ X2 = ∅ and X1 ∪ X2 = X, which are invariant under the dynamics of the system: φt(X1) ⊆ X1 and φt(X2) ⊆ X2 for all t. A system that is not decomposable is indecomposable ('metrically indecomposable' or 'metrically transitive').

32 The presentation of ergodic theory in this subsection follows by and large Arnold and Avez (1968) and Cornfeld et al. (1982). For accounts of the long and intertwined history of ergodic theory see Sklar (1993, Chapters 2 and 5) and von Plato (1991; 1994, Chapter 3).

Then we have: Decomposition Theorem. A dynamical system is ergodic if and only if it is indecomposable; i.e. if every invariant measurable set has either measure zero or one.
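Ergodicity and its failure under decomposition can both be illustrated numerically (a sketch with illustrative parameters). An irrational rotation of the circle is ergodic, so the fraction of time an orbit spends in an interval approaches the interval's measure; running the same rotation separately inside two invariant half-circles yields a decomposable map whose time averages depend on the starting point.

```python
ALPHA = 0.5 * (5 ** 0.5 - 1)   # irrational rotation angle (golden ratio conjugate)

def time_fraction(step, x0, region, n=100_000):
    """Fraction of time the orbit of x0 under `step` spends in `region`."""
    x, hits = x0, 0
    for _ in range(n):
        x = step(x)
        hits += region(x)
    return hits / n

# Ergodic (indecomposable) case: rotation x -> x + alpha (mod 1).
rot = lambda x: (x + ALPHA) % 1.0
f_rot = time_fraction(rot, 0.3, lambda x: x < 0.25)

# Decomposable case: the same kind of rotation run separately inside the two
# invariant halves [0, 0.5) and [0.5, 1); orbits never leave their half.
def split(x):
    if x < 0.5:
        return (x + 0.5 * ALPHA) % 0.5
    return 0.5 + (x + 0.5 * ALPHA) % 0.5

f_low = time_fraction(split, 0.1, lambda x: x < 0.25)
f_high = time_fraction(split, 0.9, lambda x: x < 0.25)
print(f_rot, f_low, f_high)   # ~0.25, ~0.5, 0.0
```

In the ergodic case the time fraction matches the measure of [0, 0.25); in the decomposable case it depends on the initial condition, as the decomposition theorem leads one to expect.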

The ergodic measure is unique, up to a continuity requirement, in the sense that there is only one measure invariant under the dynamics. We first define:

Absolute Continuity. A measure λ′ is absolutely continuous with respect to λ iff for any measurable region A ⊆ X: if λ(A) = 0 then λ′(A) = 0.

We then have:

Uniqueness Theorem. Assume that (X, λ, φt) is ergodic and λ is normalised. Let λ′ be another measure on X which is normalised, invariant under φt, and absolutely continuous with respect to λ. Then λ′ = λ.

For what follows it will also be important to introduce the notion of mixing.

Mixing. A system is mixing33 if and only if for all measurable subsets A and B of X: lim_{t→∞} μ(φt(B) ∩ A) = μ(A) μ(B).

The meaning of this concept can be visualised as follows. Think of the phase space as a glass of water to which a shot of scotch has been added. The volume of the cocktail X (scotch + water) is μ(X) and the volume of scotch is μ(B); hence the concentration of scotch in X is μ(B)/μ(X). Now stir. Mathematically, the time evolution operator φt represents the stirring, meaning that φt(B) is the region occupied by the scotch after time t. The cocktail is mixed if the concentration of scotch equals μ(B)/μ(X) not only with respect to the entire 'glass' X, but with respect to any arbitrary (but non-zero measure) region A in that volume; that is, it is mixed if μ(φt(B) ∩ A)/μ(A) = μ(B)/μ(X) for any finite volume A. This condition reduces to μ(φt(B) ∩ A)/μ(A) = μ(B) for any region B because, by assumption, μ(X) = 1. If we now assume that mixing is achieved only for t → ∞, we obtain the above condition. One can then prove the following two theorems.

Implication Theorem. Every dynamical system that is mixing is also ergodic, but not vice versa.

Convergence Theorem. Let (X, λ, φt) be a dynamical system and let ρ be a measure on X that is absolutely continuous with respect to λ (but otherwise arbitrary). Define ρt(A) := ρ(φt(A)) for all measurable A ⊆ X. Let f(x) be a bounded measurable function on X. If the system is mixing, then ρt → λ as t → ∞ in the sense that for all such f:

lim_{t→∞} ∫ f(x) dρt = ∫ f(x) dλ.   (3.20)

33 Strictly speaking this property is called 'strong mixing' since there is a similar condition called 'weak mixing'. The differences between these need not occupy us here. For details see Arnold and Avez (1968, Chapter 2).
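The mixing condition can be probed by Monte Carlo estimation of μ(φt(B) ∩ A) (a sketch with illustrative sets and parameters; the cat map on the unit torus, a standard example of a mixing system, plays the rôle of the stirring):

```python
import random

def cat(x, y):
    """Arnold's cat map on the unit torus, a standard mixing transformation."""
    return (x + y) % 1.0, (x + 2.0 * y) % 1.0

random.seed(0)
in_A = lambda x, y: x < 0.5     # region A, measure 1/2
in_B = lambda x, y: y < 0.5     # region B, measure 1/2

n, t = 100_000, 15
count = 0
for _ in range(n):
    x, y = random.random(), random.random()
    started_in_B = in_B(x, y)
    for _ in range(t):
        x, y = cat(x, y)
    if started_in_B and in_A(x, y):
        count += 1

est = count / n   # Monte Carlo estimate of mu(phi_t(B) ∩ A)
print(est)        # ~0.25 = mu(A) * mu(B), as mixing requires
```

After a modest number of iterations the 'scotch' B is spread evenly through every region A, just as in the cocktail picture above.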

3.2.4.2 Promises

Assuming that the system in question is ergodic seems to provide us with neat responses to Issues 1 and 2, if macro-probabilities are considered. Thus, let us ask how we are to understand statements about the probability of a macro-state. That is, how are we to interpret the probabilities introduced in equation (3.4)? A natural suggestion is that probabilities should be understood as time averages. More specifically, the suggestion is that the probability of a macro-state M is the fraction of time that the system's state spends in ΓM (the so-called sojourn time):

p(M) = (1/τ) ∫_{t0}^{t0+τ} χΓM[φt(x)] dt,   (3.21)

where χΓM is the characteristic function (as defined above) and [t0, t0 + τ] is some suitable interval of time. This definition faces some prima facie problems. First, what is the suitable interval of time? Second, does this time average exist? Third, as defined in equation (3.21), p(M) exhibits an awkward dependence on the initial condition x. These difficulties can be overcome by assuming that the system is ergodic. In this case the relevant time interval is infinity; the existence question is resolved by Birkhoff's theorem, which states that the infinite time limit exists almost everywhere; and the awkward dependence on the initial condition vanishes because in an ergodic system the infinite time mean equals the space mean for almost all initial conditions, and the space mean, and hence a fortiori the time mean, does not depend on the initial condition x (for almost all x).


This puts the time average interpretation of SM probabilities on a solid foundation and at the same time also offers a response to the problem of the mechanical foundation of the PP. The combinatorial considerations in the last subsection have shown that the equilibrium state occupies by far the largest part of Γγ. Combining this with the fact that the time a system spends in a given region of Γγ is proportional to its measure provides a faultless mechanical justification of PP.34 In sum, if the system is ergodic, we seem to have a neat mechanical explanation of the system's behaviour as well as a clear interpretation of the probabilities that occur in the PP.

3.2.4.3 Problems

The ergodicity programme faces serious difficulties. To begin with, it turns out to be extremely difficult to prove that the systems of interest really are ergodic. Contrary to what is sometimes asserted, not even a system of n elastic hard balls moving in a cubic box with hard reflecting walls has been proven to be ergodic for arbitrary n; it has been proven to be ergodic only for n ≤ 4. Moreover, hard ball systems are highly idealised (molecules do not behave like hard balls) and it is still an open question whether systems with more realistic interaction potentials (e.g. Lennard-Jones potentials) are ergodic.35 What is worse than the absence of proof that the systems of interest are ergodic is that there are systems that show the appropriate behaviour and yet are known not to be ergodic. For instance, in a solid the molecules oscillate around fixed positions in a lattice, and as a result the phase point of the system can only access a small part of the energy hypersurface (Uffink 2007, p. 1017). Bricmont (2001) investigates the Kac Ring Model (Kac 1959) and a system of n uncoupled anharmonic oscillators of identical mass, and points out that both systems exhibit thermodynamic behaviour and yet fail to be ergodic.
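The Kac Ring Model just mentioned can be simulated in a few lines (a sketch with illustrative parameters): the colour imbalance relaxes towards zero, yet the dynamics is strictly periodic, since after two full revolutions every ball has passed every marker exactly twice, so the model recurs exactly and cannot be ergodic.

```python
import random

random.seed(2)
N = 1000                                              # sites on the ring
markers = [random.random() < 0.1 for _ in range(N)]   # marked edges, density 0.1
colours = [1] * N                                     # far from equilibrium: all white

def step(colours):
    """Every ball moves one site clockwise, flipping colour at a marked edge."""
    return [-colours[i - 1] if markers[i - 1] else colours[i - 1] for i in range(N)]

m = [sum(colours) / N]            # colour imbalance ('magnetisation')
for _ in range(2 * N):
    colours = step(colours)
    m.append(sum(colours) / N)

# Relaxation without ergodicity: the imbalance decays quickly, yet after 2N
# steps every ball has crossed every marker twice, so the initial
# configuration recurs exactly.
print(m[0], m[50], m[2 * N])   # 1.0, close to 0, 1.0
```

The orbit visits only a vanishing fraction of the model's configurations before returning to its starting point, so time averages cannot equal 'phase' averages, and yet the relaxation looks thermodynamic.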
And most notably, a system of non-interacting point particles is known not to be ergodic; yet ironically it is exactly this system on which the combinatorial argument is based (Uffink 1996b, p. 381). Hence, ergodicity is not necessary for thermodynamic behaviour.36 But, as Earman and Redei (1996, p. 70) and van Lith (2001a, p. 585) point out, if ergodicity is not necessary for thermodynamic behaviour, then ergodicity cannot provide a satisfactory explanation of this behaviour. Either there must be properties other than ergodicity that explain thermodynamic behaviour in cases in which the system is not ergodic, or there

34 For a further discussion of ergodicity and issues in the interpretation of probability in the Boltzmann approach see von Plato (1981, 1982, 1988, 1989), Gutmann (1999), van Lith (2003), and Emch (2005).

35 For further discussions of this issue see Sklar (1993, Chapter 5), Earman and Redei (1996, §4), Uffink (2007, §6), Emch and Liu (2002, Chapters 7-9), and Berkovitz et al. (2006, §4).

36 It has been argued that ergodicity is not sufficient either, because there are systems that are ergodic but don't show an approach to equilibrium, for instance two hard spheres in a box (Sklar 1973, p. 209). This is, of course, correct. But this problem is easily fixed by adding the qualifying clause that if we consider a system of interest in the context of SM—i.e. one consisting of something like 10^23 particles—then if the system is ergodic it shows SM behaviour.

THE BOLTZMANN APPROACH

125

must be an altogether different explanation for the approach to equilibrium even for systems which are ergodic.37

But even if a system turns out to be ergodic, further problems arise. All results and definitions of ergodic theory come with the qualification 'almost everywhere': the Birkhoff theorem ensures that f* exists almost everywhere, and a dynamical system is said to be ergodic iff for every complex-valued, Lebesgue-integrable function f the time mean equals the space mean almost everywhere. This qualification is usually understood as suggesting that sets of measure zero can be neglected or ignored. This, however, is neither trivial nor evidently true. What justifies the neglect of these sets? This has become known as the 'measure zero problem'. The idea seems to be that points falling in a set of measure zero are 'sparse' and this is why they can be neglected. This view receives a further boost from an application of the Statistical Postulate, which assigns probability zero to events associated with such sets. Hence, so goes the conclusion, what has measure zero simply doesn't happen.38

This is problematic for various reasons. First, sets of measure zero can be rather 'big'; for instance, the rational numbers have measure zero within the real numbers. Moreover, a set of measure zero need not be (or even appear) negligible if sets are compared with respect to properties other than their measures. For instance, we can judge the 'size' of a set by its cardinality or Baire category rather than by its measure, which leads us to different conclusions about the set's size (Sklar 1993, pp. 182-88). Furthermore, it is a mistake to assume that an event with measure zero cannot occur. In fact, having measure zero and being impossible are distinct notions.
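That a 'big' set can nonetheless have measure zero is worth spelling out for the example just given: the rationals are countable and dense, yet the standard covering argument shows their Lebesgue measure vanishes.

```latex
\text{Enumerate } \mathbb{Q}\cap[0,1] = \{q_1, q_2, \dots\} \text{ and, given } \varepsilon > 0,
\text{ cover each } q_n \text{ by } I_n := \bigl(q_n - \tfrac{\varepsilon}{2^{n+1}},\; q_n + \tfrac{\varepsilon}{2^{n+1}}\bigr),
\text{ so that } \mu_L(I_n) = \tfrac{\varepsilon}{2^n}. \text{ Then}
\mu_L\bigl(\mathbb{Q}\cap[0,1]\bigr) \;\le\; \sum_{n=1}^{\infty} \mu_L(I_n) \;=\; \sum_{n=1}^{\infty} \frac{\varepsilon}{2^n} \;=\; \varepsilon,
\text{ and since } \varepsilon \text{ was arbitrary, } \mu_L\bigl(\mathbb{Q}\cap[0,1]\bigr) = 0.
```

Judged by density or cardinality the set is anything but negligible; judged by measure it does not count at all, which is precisely the tension the 'measure zero problem' trades on.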
Whether or not the system at some point was in one of the special initial conditions for which the space and time mean fail to be equal is a factual question that cannot be settled by appeal to measures; pointing out that such points are scarce in the sense of measure theory does not do much, because it does not imply that they are scarce in the world as well.39 All we can do is find out what was the case, and if the system indeed was in one of these initial conditions then considerations based on this equality break down. The fact that SM works in so many cases suggests that they indeed are scarce, but this is a matter of fact about the world and not a corollary of measure theory.40 Hence, an explanation of SM behaviour would have to consist of the observation that the system is ergodic and that it additionally started in an initial condition which is such that space and time means are equal.

37 The term ‘explanation’ here is used in a non-technical sense; for a discussion of how the use of ergodicity ties in with certain influential philosophical views about explanation see Sklar (1973) and Quay (1978). 38 This piece of ‘received wisdom’ is clearly explained but not endorsed in Sklar (2000a, pp. 265-6). 39 Sklar (1973, pp. 210-11) makes a very similar point when discussing the Gibbs approach. 40 For a further discussion of this issue see Friedman (1976).


For these reasons a time average interpretation of macro-probabilities is problematic. However, alternative interpretations do not fare better. Frequentism is ruled out by the fact that the relevant events in SM do not satisfy the requirement of von Mises' theory (van Lith 2001, p. 587), and a propensity interpretation (Popper 1959) fails because the existence of propensities is ultimately incompatible with a deterministic underlying micro theory (Clark 2001).41 A peculiar way around the problem of interpreting probabilities is to avoid probabilities altogether. This is the strategy pursued, among others, by Goldstein (2001), Lebowitz (1993b), Goldstein and Lebowitz (2004) and Zanghì (2005) in their presentation of the Boltzmannian account. The leading idea of this approach is that equilibrium states are 'typical' while non-equilibrium states are 'atypical', and that the approach to equilibrium can be understood as a transition from atypical to typical states. For a discussion of this approach to SM see Frigg (2007b).

3.2.5 The Past Hypothesis

3.2.5.1 The Past Hypothesis Introduced

Let us now turn to Issues 2 and 3, and base our discussion on micro-probabilities. The two problems we have to solve are (a) that high to low entropy transitions are allowed by the dynamics (the reversibility objection) and (b) that most trajectories compatible with a given non-equilibrium state are ones that have evolved into that state from a state of higher entropy (which is a consequence of SP and the time reversal invariance of the micro dynamics). There is a common and now widely accepted solution to these problems, which relies on the fact that a system's actual behaviour is determined by its dynamical laws and its initial condition. Hence there need not be a contradiction between time reversal invariant laws and the fact that high to low entropy transitions do not (or only very rarely) occur in our world.
All we have to do is to assume that the relevant systems in our world have initial conditions which are such that the system's history is indeed one that is characterised by low to high entropy transitions. That initial conditions of this kind are scarce is irrelevant; all that matters is that the system de facto started off in one of them. If this is the case, we find the irreversible behaviour that we expect. However, this behaviour is now a consequence not only of the laws governing the system, but also of its special initial condition.

The question is at what point in time the relevant low entropy initial condition is assumed to hold. A natural answer would be that the beginning of an experiment is the relevant instant; we prepare the gas such that it sits in the left half of the container before we open the shutter, and this is the low entropy initial condition that we need. The problem with this answer is that the original problem recurs: if we draw an entropy curve for the system, we find that the low entropy state at the beginning of the experiment has evolved from another high entropy state. The problem is obvious by now: whichever point in time we choose to be

41 For a further discussion of this issue see Butterfield (1987) and Clark (1987; 1989; 1995).


the point for the low entropy initial condition to hold, it follows that the overwhelming majority of trajectories compatible with this state are such that their entropy was higher in the past. An infinite regress looms large. This regress can be undercut by assuming that there is an instant that simply has no past, in which case it simply does not make sense to say that the system has evolved into that state from another state. In other words, we have to assume that the low entropy condition holds at the beginning of the universe. At this point modern cosmology enters the scene: proponents of Boltzmannian SM take cosmology to inform us that the universe was created in the Big Bang a long but finite time ago and that it then was in a low entropy state. Hence, modern cosmology seems to provide us with exactly what we were looking for. This is a remarkable coincidence, so remarkable that Price sees in it ‘the most important achievement of late-twentieth-century physics’ (2004, p. 228). The posit that the universe has come into existence in a low entropy state is now (following Albert 2000) commonly referred to as the ‘Past Hypothesis’ (PH); let us call the state that it posits the ‘Past State’. In Albert’s formulation PH is the claim [...] that the world first came into being in whatever particular low-entropy highly condensed big-bang sort of macrocondition it is that the normal inferential procedures of cosmology will eventually present to us (2000, p. 96).

This idea can be traced back to Boltzmann (see Uffink 2007, p. 990) and has since been advocated, among others, by Feynman (1965, Chapter 5), Penrose (1989, Chapter 7; 2006, Chapter 27), Price (1996, 2004, 2006), Lebowitz (1993a, 1993b, 1999), Albert (2000), Goldstein (2001), Callender (2004a, 2004b), and Wald (2006). There is a remarkable consensus on the formulation and content of PH; different authors diverge in the status they attribute to it. For Feynman, Goldstein, and Penrose, PH seems to have the status of a law, which we simply add to the laws we already have. Whether such a position is plausible depends on one's philosophical commitments as regards laws of nature. A discussion of this issue would take us too far afield; surveys of the philosophical controversies surrounding the concept of a law of nature can be found in, among others, Armstrong (1983), Earman (1984), and Cartwright and Alexandrova (2006). Albert regards PH as something like a Kantian regulative principle, in that its truth has to be assumed in order to make knowledge of the past possible at all. On the other hand, Callender, Price, and Wald agree that PH is not a law but just a contingent matter of fact; they have conflicting opinions, however, about whether this fact is in need of explanation.42 Thus for Price (1996, 2004) the crucial question in the foundation of SM is not so much why entropy increases, but rather why it ever got to be so low in the first place. Hence, what really needs to be explained

42 Notice that this view has the consequence that the Second Law of thermodynamics, or rather its 'statistical cousin', Boltzmann's Law, becomes a de facto regularity and is thus deprived of its status as a law properly speaking.


is why the universe shortly after the Big Bang was in the low entropy state that PH posits. Callender (1998, 2004a, 2004b) argues that this quest is misguided: PH simply specifies the initial conditions of a process, and initial conditions, irrespective of whether they are special or not, are not the kind of thing that is in need of explanation. Similar concerns have also been raised by Sklar (1993, pp. 309-18).

3.2.5.2 Problems and Criticisms

PH has recently come under attack. Earman (2006) argues that what at first glance looks like a great discovery—that modern cosmology posits exactly the kind of Past State that the Boltzmannian account requires—turns out to be 'not even false' (p. 400). Earman first investigates a particular Friedmann-Robertson-Walker model of cosmology suggested by Hawking and Page and shows that in this model probabilities are typically ill-defined or meaningless, and he then argues that this result is not an artefact of the idealisations of the models and would crop up equally in more realistic models (pp. 417-18). Hence, for the cosmologies described in general relativity there is no well-defined sense in which the Boltzmann entropy has a low value. And worse, even if quantum gravity or some other yet to be discovered theory came to the rescue and made it possible to give a well-defined expression for the Boltzmann entropy at the beginning of the universe, this would be of little help, because the dynamics of the cosmological models does not warrant the claim that there will be a monotonic increase in entropy (pp. 418-20). For these two reasons, Earman concludes, the past hypothesis is untenable.

Whatever the eventual verdict on Earman's critique of PH, there is a further problem in that the Boltzmann entropy is a global quantity characterising the macro-state of an entire system, in this case the entire universe. The fact that this quantity is low does not imply that the entropy of a particular small subsystem of interest is also low.
And what is worse, just because the overall entropy of the universe increases it need not be the case that the entropy in a small subsystem also increases. A decrease in the entropy in one part of the universe may be balanced by an increase in entropy in some other part of the universe, and hence is compatible with an increase in the overall entropy. Hence, SM cannot explain the behaviour of small systems like gases in laboratories. Winsberg (2004a, pp. 499-504) addresses this problem and argues that the only way to avoid it is to make a further conjecture about the theory (he calls it 'Principle 3'), which in effect rules out local 'entropic misbehaviour'. However, as he points out, this principle is clearly false, and hence there is no way for the Boltzmannian to rule out behaviour of this kind.

It is now time to notice that a radical shift has occurred at the beginning of this subsection. We started with a pledge to explain the behaviour of homely systems like a vessel full of gas and ended up talking about the Big Bang and the universe as a whole. At least to some, this looks like using a sledgehammer to crack nuts, and not a very wise move, because most of the problems the account faces are caused by the move to the cosmological scale. The natural reaction


to this is to downsize again and talk about laboratory scale systems. This is what happens in the so-called 'branch systems approach', which is inspired by Reichenbach's (1956) discussion of the direction of time, and is fully articulated in Davies (1974) and discussed in Sklar (1993, pp. 318-32). The leading idea is that the isolated systems relevant to SM have neither been in existence forever, nor do they continue to exist forever after the thermodynamic processes have taken place. Rather, they separate off from the environment at some point (they 'branch'), then exist as energetically isolated systems for a while, and then usually merge again with the environment. Such systems are referred to as 'branch systems'. For instance, the system consisting of a glass of water and an ice cube comes into existence when someone puts the ice cube into the water, and it ceases to exist when someone pours the water into the sink. So the question becomes why a branch system like the water with the ice cube behaves in the way it does. An explanation can be given along the lines of the past hypothesis, with the essential difference that the initial low entropy state has to be postulated not for the beginning of the universe but only for the state of the system immediately after the branching. Since the system, by stipulation, did not exist before that moment, there is also no question of whether the system has evolved into the current state from a higher entropy state. This way of looking at things is in line with how working physicists think about these matters, for the simple reason that low entropy states are routinely prepared in laboratories—hence Lebowitz's (1993b, p. 11) remark that the origin of low entropy initial states is no problem in laboratory situations.

Albert dismisses this idea as 'sheer madness' (2000, p. 89) for three reasons. First, it is impossible to specify the precise moment at which a particular system comes into being; that is, we cannot specify the precise branching point.
Second, there is no unambiguous way to individuate the system. Why does the system in question consist of the glass with ice, rather than the glass with ice and the table on which the glass stands, or the glass and ice and the table and the person watching it, or ... And this matters, because what we regard as a relevant low entropy state depends on what we take the system to be. Third, it is questionable whether we have any reason to assume, or whether it is even consistent to claim, that SP holds for the initial state of the branch system.43

The first and the second criticism do not seem to be successful. Why should the system's behaviour have anything to do with our inability to decide at what instant the system becomes energetically isolated? So Albert's complaint must be that there is no fact of the matter about when a system becomes isolated. If this were true, it would indeed be a problem. But there does not seem to be a reason why this should be so. If we grant that there is such a thing as being isolated from one's environment (an assumption not challenged in the first criticism), then there does not seem to be a reason to claim that becoming isolated at

43 As we shall see in the next subsection, it is necessary to assume that SP holds for the initial state. Proponents of the past hypothesis and of the branch systems approach differ in what they regard as the beginning.


some point in time should be more problematic than the lights going off at some point in time, or the game beginning at some point in time, or any other event happening at some instant. The second criticism does not cut any ice either (see Winsberg 2004b, p. 715). Being energetically isolated from the rest of the universe is an objective feature of certain things and not others. The glass and its contents are isolated from the rest of the universe and this is what makes them a branch system; the table, the observer, the room, the house, etc. are not, and this is why they are not branch systems. There is nothing subjective or arbitrary about this division. One can, of course, question whether systems ever really are isolated (we come to this in §3.3.5.2). But this is a different point. If one goes down that road, then there simply are no branch systems; but then there is no individuation problem either.

The third criticism leads us into deep waters. Why would we want to deny that SP applies to the branch system at the instant of its creation? Although Albert does not dwell on this point, his reasoning seems to be something like the following (see Winsberg 2004b, pp. 715-17). Take the universe at some particular time. Now things happen: someone opens the freezer, takes an ice cube, and puts it into a glass of lukewarm water. These are physical processes governed by the laws of mechanics; after all, at the micro level all that happens is that swarms of particles move around in some specific way. But then the micro-state of the glass with ice is determined by the laws of mechanics and the micro-condition at the earlier point of time, and we can't simply 'reset' the glass' state and postulate that it is now such that SP, or any other condition for that matter, holds. In brief, the glass' state at some point is dictated by the laws of the theory and is not subject to stipulations of any kind.
Whether or not one finds this criticism convincing depends on one's philosophical commitments as regards the nature of laws. The above argument assumes that laws are universal and valid all the time; it assumes that not only the behaviour of the water and the ice, but also that of the table, the room, the fridge and, last but not least, the person putting the ice into the water and everything else in the universe is governed by the laws of mechanics. If one shares this view, then Albert's third criticism is valid. However, this view of laws is not uncontroversial. It has been argued that the domain of applicability of laws is restricted: we are making a mistake if we assume them to be universal. To someone of the latter persuasion the above argument has no force at all against branch systems. This conflict surfaces again when we discuss the interventionist approach to SM in §3.3.5.2, and for this reason I postpone a more detailed discussion of the issue of the scope of laws until then.

3.2.6 Micro-Probabilities Revisited

As we have seen above, SP gives us wrong retrodictions and this needs to be fixed. PH, as introduced in the last subsection, seems to provide us with the means to reformulate SP so that this problem no longer arises (§3.2.6.1). Once we have a rule that assigns correct probabilities to past states, we come back to


the question of how to interpret these probabilities (§3.2.6.2) and then address the reversibility and recurrence objections (§3.2.6.3).

3.2.6.1 Conditionalising on PH

PH, if true, ensures that the system indeed starts in the desired low entropy state. But, as we have seen in §3.2.3.2, our probabilistic machinery tells us that this is overwhelmingly unlikely. Albert (2000, Chapter 4) argues that this is unacceptable, since it just cannot be that the actual past is overwhelmingly unlikely, for this would lead us to believe wrong things about the past.44 The source of this problem is that we have (tacitly) assumed that SP is valid at all times. Hence this assumption must be renounced and a postulate other than SP must be true at some times. Albert (2000, pp. 94-6) suggests the following remedy: SP is valid only for the Past State (the state of the universe just after the Big Bang); for all later states the correct probability distribution is the one that is uniform (with respect to the Lebesgue measure) over the set of those conditions that are compatible with the current macro-state and the fact that the original macro-state of the system (at the very beginning) was the Past State. In brief, the suggestion is that we conditionalise on the Past Hypothesis and the current macro-state. More precisely, let MP be the macro-state of the system just after the Big Bang (the Past State) and assume (without loss of generality) that this state obtains at time t = 0; let Mt be the system's macro-state at time t, and let Γt := ΓMt be the part of Γγ,a that corresponds to Mt. Then we have:

Past Hypothesis Statistical Postulate (PHSP): SP is valid for the Past State. For all times t > 0, the probability at t that the fine-grained micro-state of the system lies in a subset A of Γt is

μL,t(A) := μL(A ∩ Rt) / μL(Rt)   (3.22)

whenever μL(Rt) ≠ 0, where Rt := Γt ∩ φt(ΓP) and φt(ΓP), as above, is the image of ΓP under the dynamics of the system after time t has elapsed.
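The content of eq.3.22 is just conditionalisation of the uniform measure, which can be made concrete on a toy discrete phase space (my own illustration, with the counting measure standing in for the Lebesgue measure and a permutation standing in for the Hamiltonian flow):

```python
def phsp_probability(A, macro_t, past_image):
    """Discrete analogue of eq. 3.22 with the counting measure:
    probability of A given the current macro-state AND evolution
    from the Past State, i.e. |A ∩ R_t| / |R_t| with R_t = Γ_t ∩ φ_t(Γ_P)."""
    R_t = macro_t & past_image
    assert R_t, "conditionalisation is undefined when R_t is empty"
    return len(A & R_t) / len(R_t)

# Toy phase space {0,...,9}; the dynamics φ(x) = x + 3 (mod 10) is a
# bijection, hence 'measure'-preserving for the counting measure.
phi = lambda x: (x + 3) % 10
gamma_P = {0, 1, 2, 3, 4}               # Past State Γ_P
past_image = {phi(x) for x in gamma_P}  # φ_t(Γ_P) = {3, 4, 5, 6, 7}
gamma_t = {2, 3, 4, 5, 6}               # current macro-state Γ_t
A = {5, 6, 7}

p_phsp = phsp_probability(A, gamma_t, past_image)  # |{5,6}| / |{3,4,5,6}|
p_sp = len(A & gamma_t) / len(gamma_t)             # plain SP: |{5,6}| / |Γ_t|
print(p_phsp, p_sp)  # 0.5 0.4
```

The two postulates disagree (0.5 versus 0.4) exactly because conditionalising on the Past State excludes the micro-states of Γt that did not come from ΓP.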

This is illustrated in fig. 3.7. Now, by construction, those fine-grained micro-states in Γt which have a high entropy past have probability zero, which is what we needed. However, PHSP needs to be further qualified. There might be a 'conspiracy' in the system to the effect that states with a low entropy past and ones with a low entropy future are clumped together. Let Γt,f be the subregion of Γt occupied by states with a low entropy future. If it now happens that these lie close to those states compatible with PH, then PHSP—wrongly—predicts that a low entropy future is very likely, despite the fact that the fraction of Γt occupied by Γt,f is tiny and that SP—correctly—predicts that a low entropy future is very unlikely (see fig. 3.8).

Fig. 3.7. Illustration of the past hypothesis statistical postulate

Fig. 3.8. Illustration of a conspiracy involving clumping together of low entropy past and low entropy future states

This problem can be avoided by requiring that Γt,f is scattered in tiny clusters all over Γt (see Albert 2000, p. 67 and pp. 81-5), so that the fraction of Γt,f that comes to lie in Rt is exactly the same as the fraction of Γt taken up by Γt,f, i.e. μL(Γt,f)/μL(Γt) = μL(Γt,f ∩ Rt)/μL(Rt) (see fig. 3.9). Let us call this the 'scattering condition'. If this condition falls into place, then the predictions of PHSP and SP coincide and the problem is solved. In sum, replacing SP by PHSP and requiring that the scattering condition holds for all times t is sufficient to get both predictions and retrodictions right. The remaining question is, of course, whether the scattering condition holds.

44 In fact, Albert (2000, Chapters 4 and 6) even sees this as a fundamental problem threatening the very notion of having knowledge of the past. Leeds (2003) takes the opposite stance and points out that this conclusion is not inevitable, since it depends on the view that we explain an event by its having a high probability of occurring. Explaining the past, then, involves showing that the actual past has high probability. However, if we deny that we are in the business of explaining the past on the basis of the present and the future, then this problem looks far less dramatic. For a further discussion of Albert's view on past knowledge and intervention see Frisch (2005) and Parker (2005).

Fig. 3.9. Scattering condition as solution to the conspiracy problem

Albert simply claims that it is plausible to assume that the scattering condition holds, but he does so without presenting, or even mentioning, a proof. Since this condition concerns mathematical facts about the system, we need a proof, or at least a plausibility argument, that it holds. Such a proof is not easy to get, because the truth of this condition depends on the dynamics of the system.

3.2.6.2 Interpreting Micro-Probabilities

How are we to interpret the probabilities defined by PHSP? Frequentism, time averages, and the propensity interpretation are unworkable for the same reasons as in the context of macro-probabilities. Loewer (2001, 2004) suggested that the way out of the impasse is to interpret PHSP probabilities as Humean chances in Lewis' (1994) sense. Consider all deductive systems that make true assertions about what happens in the world and also specify probabilities of certain events. The best system is the one that strikes the best balance between simplicity, strength and fit, where the fit of a system is measured by how likely the system regards it that things go the way they actually do. Lewis then proposes as an analysis of the concept of a law of nature that laws are the regularities of the best system and that chances are whatever the system asserts them to be. Loewer suggests that the package of classical mechanics, PH and PHSP is a putative best system of the world, and that therefore the chances that occur in this system can be understood as chances in Lewis' sense. Frigg (2006, 2007a) argues that this suggestion faces serious difficulties. First, Lewis' notion of fit is modelled on the frequentist notion of a sequence and cannot be carried over to a theory with continuous time.
Second, even when discretising time in order to be able to calculate fit, it turns out that Loewer’s putative best system is not the best system because there are distributions over the initial conditions that lead to a better fit of the system than the distribution posited in PHSP. The details of these arguments suggest that PHSP probabilities are best understood as epistemic probabilities of some sort.


3.2.6.3 Loschmidt’s and Zermelo’s Objections We now return to Loschmidt’s and Zermelo’s objections and discuss in what way the micro probability approach can address them. Reversal Objection: Consider the same scenario as in §3.2.3.3. Denote by Γif the subset of ΓMi consisting of all points that evolve into ΓMf during the interval Δ, and likewise let Γf i be set of all points in ΓMf that evolve into ΓMi during Δ. We then have Γf i = R(φΔ (Γif )), where φΔ is the time evolution of the system during time span Δ. Therefore μ(Γf i )/μ(ΓMf ) = μ(R(φΔ (Γif )))/μ(ΓMf ) = μ(Γif )/μ(ΓMf ), because μ(RA) = μ(A) for all sets A. By assumption μ(ΓMf ) > μ(ΓMi ) (because Mf has higher entropy than Mi ), hence μ(Γif )/μ(ΓMf ) < μ(Γif )/μ(ΓMi ). Assuming that conditionalising on PH would not upset these proportions, it follows that the system is more likely to evolve from low to high entropy than it is to evolve from high to low entropy. Now take Mi and Mf to be, respectively, the state of a gas confined to the left half of the container and the state of the gas spread out evenly over the entire available space. In this case μ(ΓMf )/μ(ΓMi ) ≈ 10n (n being the number of particles in the system), and hence the system is 10n times more likely to evolve from low to high entropy than vice versa. This is what BL asserts.45 Recurrence Objection: Roughly speaking, the recurrence objection (see §3.2.3.4) states that entropy cannot always increase because every mechanical system returns arbitrarily close to its initial state after some finite time (Poincar´e’s Recurrence Theorem). The common response (Callender 1999, p. 370; Bricmont 1996, §4) to the recurrence objection has a somewhat empiricist flavour and points out that, according to the Past Hypothesis, the universe is still today in a low entropy state far away from equilibrium and recurrence will therefore presumably not occur within all relevant observation times. 
This, of course, is compatible with there being periods of decreasing entropy at some later point in the history of the universe. Hence, we should not view BL as valid at all times. 3.2.7 Limitations There are serious questions about the use of coarse graining, i.e. partitions, in the combinatorial argument (issue 5) and the scope of the theory (issue 6). I now discuss these problems one at a time. How can coarse-graining be justified? The standard answer is an appeal to knowledge: we can never observe the precise value of a physical quantity because measuring instruments invariably have a finite resolution (just as do human observation capabilities); all we can assert is that the result lies within a certain range. This, so the argument goes, should be accounted for in what we assert about the system’s state and the most natural way to do this is to choose a partition whose cell size reflects what we can reasonably hope to know about the system. This argument is problematic because the appeal to observation introduces a kind of subjectivity into the theory that does not belong there. Systems approach 45 See

Bricmont (1996, §3) for a more detailed discussion.

THE BOLTZMANN APPROACH

135

equilibrium irrespective of what we happen to know about them. Hence, so the objection concludes, any reference to incomplete knowledge is out of place.46

Another line of argument is that there exists an objective separation of relevant scales—in that context referred to as 'micro' and 'macro'47—and that this justifies coarse-graining.48 The distinction between the two scales is considered objective in much the same way as, say, the distinction between dark and bright: it may not be clear where exactly to draw the line, but there is no question that there is a distinction between dark and bright. From a technical point of view, the separation of scales means that a macro description is bound to use a finite partition (whose cell size depends on where exactly one draws the line between the micro and macro scales). This justifies Boltzmannian coarse-graining.

The question is whether there really is an objective micro-macro distinction of this kind. At least within the context of classical mechanics this is not evidently the case. In quantum mechanics Planck's constant gives a natural limit to how confined a state can be in both position and momentum, but classical mechanics by itself does not provide any such limit. So the burden of proof seems to be on the side of those who wish to uphold that there is an objective separation between micro and macro scales.

And this is not yet the end of the difficulties. Even if the above arguments were successful, they would remain silent about the questions surrounding the choice of the 'right' partition. Nothing in either the appeal to the limits of observation or the existence of an objective separation of scales explains why coarse-graining energy is 'bad' while coarse-graining position and momentum is 'good'.

These problems are not easily overcome. In fact, they seem so serious that they lead Penrose to think that 'entropy has the status of a "convenience", in present day theory, rather than being "fundamental"' (2006, p. 692) and that it would only acquire a 'more fundamental status' in the light of advances in quantum theory, in particular quantum gravity, as only quantum mechanics provides the means to compartmentalise phase space (ibid.). In the light of these difficulties the safe strategy seems to be to renounce commitment to coarse-graining by downgrading it to the status of a mere expedient, which, though instrumentally useful, is ultimately superfluous. For this strategy to be successful the results of the theory would have to be robust in the limit δω → 0.

46 Many authors have criticised approaches to SM that invoke limited knowledge as deficient. Since these criticisms have mainly been put forward against Gibbsian approaches to SM, I will come back to this point in more detail below.
47 Notice that this use of the terms 'micro' and 'macro' does not line up with how these terms have been used above, where both fine-grained and coarse-grained states were situated at the 'micro' level (see §3.2.1).
48 This point of view is often alluded to by physicists but rarely explained, let alone defended. It also seems to be what Goldstein has in mind when he advises us to 'partition the 1-particle phase space (the q, p-space) into macroscopically small but microscopically large cells Δα' (2001, p. 42).
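The dependence of a partition-based entropy on the cell size can be made vivid with a small numerical sketch (my own illustration, not from the text): discretise a fixed continuous distribution (here a standard normal, standing in for a density over a continuous phase variable) into ever finer cells and compute the discrete entropy −Σᵢ pᵢ ln pᵢ. The entropy grows with the number of cells rather than converging, which is exactly the failure of robustness in the limit δω → 0 discussed below.

```python
import math

def discrete_entropy(num_cells):
    """Entropy -sum p_i ln p_i of a standard normal density
    discretised into num_cells equal cells on [-5, 5]."""
    width = 10.0 / num_cells
    def pdf(x):
        return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    # probability mass of each cell (midpoint approximation)
    ps = [pdf(-5.0 + (i + 0.5) * width) * width for i in range(num_cells)]
    total = sum(ps)
    ps = [p / total for p in ps]  # renormalise the truncated tails away
    return -sum(p * math.log(p) for p in ps if p > 0.0)

for m in (10, 100, 1000, 10000):
    print(m, round(discrete_entropy(m), 3))
# The entropy grows roughly as ln(m): each tenfold refinement of the
# partition adds about ln(10) ~ 2.30, with no finite continuum limit.
```

The choice of the normal density and of the interval is of course arbitrary; any smooth density gives the same logarithmic divergence as the cells shrink.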

136 RECENT WORK ON THE FOUNDATIONS OF STATISTICAL MECHANICS

But this is not the case. The terms on the right-hand side of eq. 3.13 diverge in the limit δω → 0. And this is not simply a 'technical accident' that one can get straight given enough mathematical ingenuity. On the contrary, the divergence of the Boltzmann entropy is indicative of the fact that the whole argument is intimately tied to there being finitely many cells which serve as the starting point for a combinatorial argument. Using combinatorics simply does not make sense when dealing with a continuum; so it is only natural that the argument breaks down in the continuum limit.

Let us now turn to the limitations of the formalism, which are intimately connected to the Boltzmannian conception of equilibrium. The equilibrium macro-state, by definition, is the one for which SB is maximal. Per se this is just a definition and its physical relevance needs to be shown. This is done in two steps. First, we use the combinatorial argument to explicitly construct the macro-regions as those parts of the energy hypersurface that correspond to a certain distribution, and then show that the largest macro-region is the one that corresponds to the Maxwell-Boltzmann distribution. But why is this the equilibrium distribution of a physical system? This is so, and this is the second step, because (a) predictions made on the basis of this distribution are borne out in experiments, and (b) Maxwell showed in 1860 that this distribution can be derived from symmetry considerations that are entirely independent of the use of a partition (see Uffink (2007, pp. 943-8) for a discussion of Maxwell's argument). This provides the sought-after justification of the proposed definition of equilibrium.

The problem is that this justification is based on the assumption that there is no interaction between the particles in the system and that therefore the total energy of the system is the sum of the 'individual' particle energies. 
While not a bad characterisation of the situation in dilute gases, this assumption is radically false when we consider systems with non-negligible interactions such as liquids, solids, or gravitating systems. Hence, the above justification for regarding the macro-state for which SB is maximal as the equilibrium state is restricted to dilute gases, and it is not clear whether the equilibrium macro-state can be defined in the same way in systems that are not of this kind. There is a heuristic argument for the conclusion that this is problematic. Consider a system of gravitating particles. These particles attract each other and hence have the tendency to clump together. So if a large number of them happen to be distributed evenly over a bounded space, they will move together and eventually form a lump. However, the phase volume corresponding to a lump is much smaller than the one corresponding to the original spread-out state, and hence it has lower Boltzmann entropy.49 So we have here a system that evolves from a high to a low entropy state. This problem is usually 'solved' by declaring that things are different in a gravitating system and that we should, in such cases, regard the spread-out state as one of low entropy and the lump as one of high entropy. Whether or not this ad hoc move is convincing may well be a matter of contention. But even if it is, it is of no avail to the Boltzmannian. Even if one redefines entropy such that the lump has high and the spread-out state low entropy, it is still a fact that the phase volume corresponding to the spread-out state is substantially larger than the one corresponding to the lump, and Boltzmannian explanations of thermodynamic behaviour typically make essential use of the fact that the equilibrium macro-region is the largest of all macro-regions. Hence macro-states need to be defined differently in the context of interacting systems.

49 A possible reply to this is that the loss in volume in configuration space is compensated by an increase in volume in momentum space. Whether this argument is in general correct is an open question; there at least seem to be scenarios in which it is not, namely ones in which all particles end up moving around with almost the same velocity and hence only occupy a small volume of momentum space.

Goldstein and Lebowitz (2004, pp. 60-3) discuss the problem of defining macro-states for particles interacting with a two-body potential φ(qi − qj), where qi and qj are the position coordinates of two particles, and they develop a formalism for calculating the Boltzmann entropy for systems consisting of a large number of such particles. However, the formalism yields analytical results only for the special case of a system of hard balls. Numerical considerations also provide results for (two-dimensional) particles interacting with a cutoff Lennard-Jones potential, i.e. a potential that has the Lennard-Jones form for |qi − qj| ≤ rc and is zero for all |qi − qj| > rc, where rc is a cutoff distance (Garrido, Goldstein and Lebowitz 2004, p. 2). These results are interesting, but they do not yet provide the sought-after generalisation of the Boltzmann approach to more realistic systems. Hard ball systems are like ideal gases in that the interactions of the particles do not contribute to the energy of the system; the only difference between the two is that hard balls are extended while the 'atoms' of an ideal gas are point particles. 
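The heuristic argument above about gravitational clumping can be roughly quantified (a toy calculation of my own; all the numbers are hypothetical): if each of n particles is free to sit anywhere in its region, the configurational volume ratio between the lump and the spread-out state is (V_lump/V_box)^n, which is astronomically small even for modest n.

```python
import math

def log_phase_volume_ratio(n, box_side, lump_radius):
    """Natural log of (configurational volume of the lump) /
    (configurational volume of the spread-out state), treating each
    of the n particles as free to sit anywhere in its region, so the
    ratio is (V_lump / V_box)**n."""
    v_box = box_side ** 3
    v_lump = (4.0 / 3.0) * math.pi * lump_radius ** 3
    return n * (math.log(v_lump) - math.log(v_box))

# Hypothetical numbers: a unit box collapsing to a lump of radius 0.01.
for n in (10, 1000, 10 ** 6):
    print(n, log_phase_volume_ratio(n, 1.0, 0.01))
# Already for n = 10 the log-ratio is about -124, i.e. the configurational
# contribution to S_B = k_B ln(volume) drops enormously; footnote 49's
# question is whether the momentum-space spread can compensate for a
# factor of this size.
```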
Similarly, the cutoff Lennard-Jones potential also represents only a small departure from the ideal gas, as the cutoff distance ensures that no long range interactions contribute to the energy of the system. However, typical realistic interactions such as gravity and electrostatic attraction/repulsion are long range interactions. Hence, it is still an open question whether the Boltzmann formalism can be extended to systems with realistic interactions.

3.2.8 Reductionism

Over the past decades, the issue of reductionism has attracted the attention of many philosophers and a vast body of literature on the topic has grown; Kim (1998) presents a brief survey; for a detailed discussion of the different positions see Hooker (1981) and Batterman (2002, 2003); Dupré (1993) expounds a radically sceptical perspective on reduction. This enthusiasm did not resonate with those writing on the foundations of SM, and the philosophical debates over the nature (and even desirability) of reduction had rather little impact on work done on the foundations of SM (this is true for both the Boltzmannian and Gibbsian traditions). This is not the place to make up for this lack of interaction between two communities, but it should be pointed out that it might be beneficial to both
those interested in reduction as well as those working on the foundations of SM to investigate whether, and if so how, philosophical accounts of reduction relate to SM and what consequences certain philosophical perspectives on reduction would have on how we think about the aims and problems of SM. One can only speculate about what the reasons for this mutual disinterest are. A plausible explanation seems to be that reductionism has not been perceived as problematic by those working on SM and hence there did not seem to be a need to turn to the philosophical literature. A look at how reductionism is dealt with in the literature on SM confirms this suspicion: by and large there is agreement that the aim of SM is to derive, fully and rigorously, the laws of TD from the underlying micro theory. This has a familiar ring to it for those who know the philosophical debates over reductionism. In fact, it is precisely what Nagel (1961, Chapter 11) declared to be the aim of reduction. So one can say that the Nagelian model of reduction is the (usually unquestioned and unacknowledged) ‘background philosophy’ of SM. This sets the agenda. I will first introduce Nagel’s account of reduction, discuss some of its problems, mention a possible ramification, and then examine how well the achievements of SM square with this conception of reduction. At the end I will mention some further issues in connection with reduction. The core idea of Nagel’s theory of reduction is that a theory T1 reduces a theory T2 (or T2 is reduced to T1 ) only if the laws of T2 are derivable from those of T1 ; T1 is then referred to as the ‘reducing theory’ and T2 as the ‘reduced theory’. In the case of a so-called homogeneous reduction both theories contain the same descriptive terms and use them with (at least approximately) the same meaning. The derivation of Kepler’s laws of planetary motion and Galileo’s law of free fall from Newton’s mechanics are proposed as paradigm cases of reductions of this kind. 
Things get more involved in the case of so-called 'heterogeneous' reductions, when the two theories do not share the same descriptive vocabulary. The reduction of TD belongs to this category because both TD and SM contain concepts that do not form part of the other theory (e.g. temperature is a TD concept that does not appear in the core of SM, while trajectories and phase functions are foreign to TD), and others are used with very different meanings (entropy is defined in totally dissimilar ways in TD and in SM). In this case so-called 'bridge laws' need to be introduced, which connect the vocabulary of both theories. More specifically, Nagel requires that for every concept C of T2 that does not appear in T1 there be a bridge law connecting C to concepts of T1 (this is the so-called 'requirement of connectability'). The standard example of a bridge law is the equipartition relation E = (3/2) kB T, connecting temperature T with the mean kinetic energy E. Bridge laws carry with them a host of interpretative problems. What status do they have? Are they linguistic conventions? Or are they factual statements? If so, of what sort? Are they statements of constant conjunction (correlation) or do they express nomic necessities or even identities? And depending on which option one chooses the question arises of how a bridge law is established. Is
it a factual discovery? By which methods is it established? Moreover, in what sense has T1 reduced T2 if the reduction can only be carried out with the aid of bridge laws which, by definition, do not belong to T1? Much of the philosophical discussion of Nagelian reduction has centred around these issues. Another problem is that strict derivability often is too stringent a requirement because only approximate versions of the T2-laws can be obtained. For instance, it is not possible to derive strict universal laws from a statistical theory. To make room for a certain mismatch between the two theories, Schaffner (1976) introduced the idea that concepts of T2 often need to be modified before they can be reduced to T1. More specifically, Schaffner holds that T1 reduces T2 only if there is a corrected version T2* of T2 such that T2* is derivable from T1 given that (1) the primitive terms of T2* are associated via bridge laws with various terms of T1, (2) T2* corrects T2 in the sense that T2* makes more accurate predictions than T2, and (3) T2* and T2 are strongly analogous.

With this notion of reduction in place we can now ask whether Boltzmannian SM reduces TD in this sense. This problem is usually narrowed down to the question of whether the Second Law of TD can be deduced from SM. This is of course an important question, but it is by no means the only one; I come back to other issues below. From what has been said so far it is obvious that the Second Law cannot be derived from SM. The time reversal invariance of the dynamics and Poincaré recurrence imply that the Boltzmann entropy does not increase monotonically at all times. In fact, when an SM system has reached equilibrium it fluctuates away from equilibrium every now and then. Hence, a strict Nagelian reduction of TD to SM is not possible. 
However, following Schaffner, this is anyway too much to ask for; what we should look for is a corrected version TD* of TD, which satisfies the above-mentioned conditions and which can be reduced to SM. Callender (2001, pp. 542-5) argues that this is precisely what we should do because trying to derive the exact Second Law would amount to 'taking thermodynamics too seriously'; in fact, what we need to derive from SM is an 'analogue' of the Second Law.50 One such analogue is BL, although there may be other candidates. The same move helps us to reduce thermodynamic irreversibility. Callender (1999, p. 359 and pp. 364-7) argues that it is a mistake to try to deduce strict irreversibility from SM. All we need is an explanation of how phenomena that are irreversible on an appropriate time scale emerge from SM, where what is appropriate is dictated by the conditions of observation. In other words, what we need to recover from SM is the phenomena supporting TD, not a strict reading of the TD laws.

50 The same problem crops up when reducing the notions of equilibrium (Callender 2001, pp. 545-7) and the distinction between intensive and extensive TD variables (Yi 2003, pp. 1031-2) to SM: a reduction can only take place if we first present a suitably revised version of TD.

Given this, the suggestion is that SB can plausibly be regarded as the SM counterpart of the entropy of TD*. This is a plausible suggestion, but it seems that more needs to be said by way of justification. Associating SB with the
entropy of TD* effectively amounts to introducing a bridge law that defines the TD* entropy in terms of the logarithm of the phase volume of macro-regions. This brings back all the above questions about the nature of bridge laws. What justifies the association of TD* entropy with its SM counterpart? Of what kind is this association? The discussion of the relation between the two entropies is usually limited to pointing out that the values of the two coincide in relevant situations. This certainly is an important point, but it does not answer the deeper questions about the relationship between the two concepts.

Although the Second Law occupies centre stage in TD, it is not the only law that needs to be reduced; in particular, we need to account for how the First Law of TD reduces to SM. And in this context a further problem crops up (Sklar 1999, p. 194). To explain how systems of very different kinds can transfer energy to one another, we need to assume that these systems have temperatures. This, in turn, implies that temperature can be realised in radically different ways; in other words, temperature is multiply realisable. How can that be? How do the various 'realisers' of temperature relate to one another? What exactly makes them realisers of this concept and why can we give them a uniform treatment in the theory?51

Similar problems also appear when we reduce more 'local' laws and properties to SM. For instance, the relation between pressure, volume and temperature of an ideal gas is given by the equation pV = nkBT, the so-called 'ideal gas law'. In order to derive this law we need to make associations, for instance between pressure and mechanical properties like mass and momentum transfer, that have the character of bridge laws. How are these justified? Sklar (1993, pp. 349-50) points out how complex even this seemingly straightforward case is. And then there are those TD concepts that SM apparently remains silent about. 
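The bridge-law character of the associations mentioned above can be made concrete with a small numerical check (an illustrative sketch under standard kinetic-theory assumptions; the physical parameters are hypothetical, not from the text): sampling particle velocities from the Maxwell-Boltzmann distribution recovers both the equipartition relation and the ideal gas law from purely mechanical quantities.

```python
import math
import random

random.seed(0)

k_B = 1.380649e-23   # Boltzmann constant (J/K)
T = 300.0            # temperature (K)
m = 6.6e-27          # particle mass, roughly a helium atom (kg) -- hypothetical
n_samples = 200_000  # number of velocity samples
V = 1.0e-3           # container volume (m^3) -- hypothetical
N = 1.0e22           # number of particles -- hypothetical

sigma = math.sqrt(k_B * T / m)  # Maxwell-Boltzmann: each component is Gaussian
vx = [random.gauss(0.0, sigma) for _ in range(n_samples)]
vy = [random.gauss(0.0, sigma) for _ in range(n_samples)]
vz = [random.gauss(0.0, sigma) for _ in range(n_samples)]

# Equipartition bridge law: mean kinetic energy should be (3/2) k_B T.
mean_ke = sum(0.5 * m * (x * x + y * y + z * z)
              for x, y, z in zip(vx, vy, vz)) / n_samples
print(mean_ke / (1.5 * k_B * T))   # ratio close to 1

# Pressure as momentum transfer to the walls: p = N m <v_x^2> / V,
# and since m <v_x^2> = k_B T this mechanical quantity satisfies pV = N k_B T.
pressure = N * m * (sum(x * x for x in vx) / n_samples) / V
print(pressure * V / (N * k_B * T))  # ratio close to 1
```

The check of course presupposes, rather than justifies, the association of pressure with momentum transfer; it only shows that the association yields the right numbers, which is precisely the point made in the text.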
Most important among these is the concept of a quasi-static transformation (or process), which lies at the heart of TD. The laws of TD only apply to equilibrium situations and therefore changes in a system have to be effected in a way that never pushes the system out of equilibrium, i.e. by so-called quasi-static transformations (see Uffink (2001) for a discussion of this concept). But what does it mean in SM to perform a quasi-static transformation on a system?52 Furthermore, one of the alleged payoffs of a successful reduction is explanation, i.e. the reduction is supposed to explain the reduced theory. Does SM explain TD and if so in what sense? This question is clearly stated by Sklar (1993, pp. 148-54; 2000, p. 740), Callender (1999, pp. 372-3) and Hellman (1999, p. 210), but it still awaits an in-depth discussion.

51 For a further discussion of temperature see Sklar (1993, pp. 351-4), Uffink (1996, pp. 383-6) and Yi (2003, pp. 1032-6).
52 Thanks to Wolfgang Pietsch for drawing my attention to this point.

3.3 The Gibbs Approach

At the beginning of the Gibbs approach stands a radical rupture with the Boltzmann programme. The object of study for the Boltzmannians is an individual system, consisting of a large but finite number of micro constituents. By contrast, within the Gibbs framework the object of study is a so-called ensemble, an uncountably infinite collection of independent systems that are all governed by the same Hamiltonian but distributed over different states. Gibbs introduces the concept as follows:

We may imagine a great number of systems of the same nature, but differing in the configurations and velocities which they have at a given instant, and differing not only infinitesimally, but it may be so as to embrace every conceivable combination of configuration and velocities. And here we may set the problem, not to follow a particular system through its succession of configurations, but to determine how the whole number of systems will be distributed among the various conceivable configurations and velocities at any required time, when the distribution has been given for some one time. (Gibbs 1902, p. v)

Ensembles are fictions, or 'mental copies of the one system under consideration' (Schrödinger 1952, p. 3); they do not interact with each other, each system has its own dynamics, and they are not located in space and time. Ensembles should not be confused with collections of micro-objects such as the molecules of a gas. The ensemble corresponding to a gas made up of n molecules, say, consists of an infinite number of copies of the entire gas; the phase space of each system in the ensemble is the 6n-dimensional γ-space of the gas as a whole.

3.3.1 The Gibbs Formalism

Consider an ensemble of systems. The instantaneous state of one system of the ensemble is specified by one point in its γ-space, also referred to as the system's micro-state.53 The state of the ensemble is therefore specified by an everywhere positive density function ρ(q, p, t) on the system's γ-space.54 The time evolution of the ensemble is then associated with changes in the density function in time. Within the Gibbs formalism ρ(q, p, t) is regarded as a probability density, reflecting the probability of finding the state of a system chosen at random from the entire ensemble in region R ⊆ Γ55 at time t:

p_t(R) = ∫_R ρ(q, p, t) dΓ    (3.23)

For this reason the distribution has to be normalised:

∫_Γ ρ(q, p, t) dΓ = 1.    (3.24)

53 To be more precise, the system's fine-grained micro-state. However, within the Gibbs approach coarse-graining enters the stage only much later (in §3.3.5) and so the difference between coarse-grained and fine-grained micro-states need not be emphasised at this point.
54 That is, ρ(q, p, t) ≥ 0 for all (q, p) ∈ Γγ and all instants of time t.
55 The μ-space of a system does not play any rôle in the Gibbs formalism. For this reason I from now on drop the subscript 'γ' and only write 'Γ' instead of 'Γγ' when referring to a system's γ-space.

Now consider a real-valued function f : Γ × ℝ → ℝ, the second argument being time. The phase average (sometimes also 'ensemble average') of this function is given by:

f̄(t) = ∫_Γ f(q, p, t) ρ(q, p, t) dΓ.    (3.25)
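Eq. 3.25 can be made concrete with a Monte Carlo estimate (an illustrative sketch of my own, not part of the text): for a one-dimensional harmonic oscillator with the canonical density ρ ∝ exp(−H/kBT) (introduced below), the phase average of the Hamiltonian itself can be estimated by sampling phase points from ρ and compared with the exact value kBT.

```python
import math
import random

random.seed(1)

# One-dimensional harmonic oscillator in units with m = omega = k_B = 1,
# so H(q, p) = (q**2 + p**2) / 2 and the exact canonical phase average
# of H is T (one half k_B T per quadratic degree of freedom).
T = 2.0
n = 500_000

def H(q, p):
    return 0.5 * (q * q + p * p)

# Under the canonical density rho ~ exp(-H/T), q and p are independent
# Gaussians with variance T; sample from rho and average H over the samples.
estimate = sum(H(random.gauss(0.0, math.sqrt(T)), random.gauss(0.0, math.sqrt(T)))
               for _ in range(n)) / n
print(estimate)  # close to the exact phase average T = 2.0
```

The units and the choice of observable are hypothetical conveniences; the point is only that eq. 3.25 is an ordinary expectation value over the ensemble density.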

Phase averages occupy centre stage in the Gibbs formalism because it is these that, according to the formalism, we observe in experiments. More specifically, the Gibbs formalism postulates that to every experimentally observable quantity F(t) (with the exception of absolute temperature and entropy) there corresponds a phase function f(q, p, t) such that eq. 3.25 yields the value that we should expect to find in an experiment: F(t) = f̄(t). Using the principles of Hamiltonian mechanics one can then prove that the total derivative of the density function equals zero,

dρ/dt = 0,    (3.26)

which is commonly referred to as 'Liouville's theorem' in this context. Intuitively, this theorem says that ρ moves in phase space like an incompressible fluid. With eq. 3.44 in the Appendix it follows that the time evolution of ρ is given by Liouville's equation:

∂ρ/∂t = −{ρ, H},    (3.27)

where {·, ·} is the Poisson bracket and H the Hamiltonian governing the system's dynamics. By definition, a distribution is stationary iff ∂ρ/∂t = 0 for all t.

Given that observable quantities are associated with phase averages and that equilibrium is defined in terms of the constancy of the macroscopic parameters characterising the system, it is natural to regard the stationarity of the distribution as defining equilibrium because a stationary distribution yields constant averages.56 For this reason Gibbs refers to stationarity as the 'condition of statistical equilibrium'. Among all stationary distributions57 those satisfying a further requirement, the Gibbsian maximum entropy principle, play a special rôle.

56 Provided that the observable f itself is not explicitly time dependent, in which case one would not require equilibrium expectation values to be constant.
57 As Gibbs notes, every distribution that can be written as a function of the Hamiltonian is stationary.

The fine-grained Gibbs entropy (sometimes also 'ensemble entropy') is defined as


SG(ρ) := −kB ∫_Γ ρ log(ρ) dΓ.    (3.28)

The Gibbsian maximum entropy principle then requires that SG(ρ) be maximal, given the constraints that are imposed on the system. The last clause is essential because different constraints single out different distributions. A common choice is to keep both the energy and the particle number in the system fixed: E = const and n = const (while also assuming that the spatial extension of the system is finite). One can prove that under these circumstances SG(ρ) is maximal for what is called the 'microcanonical distribution' (or 'microcanonical ensemble'), the distribution which is uniform on the energy hypersurface H(q, p) = E and zero elsewhere:

ρ(q, p) = C δ[E − H(q, p)],    (3.29)

where C is some suitable normalisation constant and δ is Dirac's delta function.58 If we choose to hold the number of particles constant while allowing for energy fluctuations around a given mean value we obtain the so-called canonical distribution; if we also allow the particle number to fluctuate around a given mean value we find the so-called grand-canonical distribution (for details see, for instance, Tolman 1938, Chapters 3 and 4).

3.3.2 Problems and Tasks

In this subsection I list the issues that need to be addressed in the Gibbs programme and make some remarks about how they differ from the problems that arise in the Boltzmann framework. Again, these issues are not independent of each other and the response to one bears on the responses to the others.

3.3.2.1 Issue 1: Ensembles and Systems

The most obvious problem concerns the use of ensembles. The probability distribution in the Gibbs approach is defined over an ensemble, the formalism provides ensemble averages, and equilibrium is regarded as a property of an ensemble. But what we are really interested in is the behaviour of a single system. What can the properties of an ensemble, a fictional entity consisting of infinitely many copies of a system, tell us about the one real system that we investigate? And how are we to reconcile the fact that the Gibbs formalism treats equilibrium as a property of an ensemble with physical common sense and thermodynamics, both of which regard an individual system as the bearer of this property? These difficulties raise the question of whether the commitment to ensembles could be renounced. Are ensembles really an irreducible part of the Gibbsian

58 This distribution is sometimes referred to as the 'super microcanonical distribution' while the term 'microcanonical distribution' is used to refer to a slightly different distribution, namely one that is constant on a thin but finite 'sheet' around the accessible parts of the energy hypersurface and zero elsewhere. It turns out that the latter distribution is mathematically more manageable.
scheme or are they just an expedient, or even a pedagogical ploy, of no fundamental significance? If so, how can the theory be reformulated without appeal to ensembles? These questions are of fundamental significance, not least because it is the use of ensembles that frees the Gibbs approach from some of the most pressing problems of the Boltzmann approach, namely the reversal and the recurrence objections. These arise exactly because we are focussing on what happens in an individual system; in an ensemble recurrence and reverse behaviour are no problem because it can be accepted that some systems in the ensemble will behave non-thermodynamically, provided that their contribution to the properties of the ensemble as a whole is taken into account when calculating ensemble averages. So some systems behaving strangely is no objection, as this does not imply that the ensemble as a whole behaves in a strange way too.

3.3.2.2 Issue 2: The Connection with Dynamics and the Interpretation of Probability

The microcanonical distribution has been derived from the Gibbsian maximum entropy principle and the requirement that the equilibrium distribution be stationary. Neither of these requirements makes reference to the dynamics of the system. However, as in the case of the combinatorial argument, it seems odd that equilibrium conditions can be specified without any appeal to the dynamics of the systems involved. That equilibrium can be characterised by a microcanonical distribution must, or so it seems, have something to do with facts about the system in question. Understanding the connection between the properties of a system and the Gibbsian probability distribution is complicated by the fact that the distribution is one pertaining to an ensemble rather than an individual system. What, if anything, in the dynamics gives rise to, or justifies, the use of the microcanonical distribution? And if there is no such justification, what is the reason for this?
Closely related to the question of how the probability distribution relates to the system's dynamics is the problem of interpreting these probabilities. The options are the same as in §3.2.3.2 and need not be repeated here. What is worth emphasising is that, as we shall see, different interpretations of probability lead to very different justifications of the maximum entropy requirement and its connection to the dynamics of the system; in fact, in non-equilibrium theory they lead to very different formalisms. Thus, this is a case where philosophical commitments shape scientific research programmes.

3.3.2.3 Issue 3: Why Does Gibbs Phase Averaging Work?

The Gibbs formalism posits that what we observe in actual experiments are phase averages. Practically speaking this method works just fine. But why does it work? Why do averages over an ensemble coincide with the values found in measurements performed on an actual physical system in equilibrium? There is no obvious connection between the two, and if Gibbsian phase averaging is to be more than a black-box technique then we have to explain what the connection between phase averages and measurement values is.
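The coincidence at issue can be previewed in the simplest possible case (an illustrative sketch of my own, anticipating the ergodicity discussion of §3.3.3): for a one-dimensional harmonic oscillator every trajectory sweeps out its entire energy hypersurface at uniform angular speed, so the time average of a phase function along a single trajectory coincides with its average over the hypersurface.

```python
import math

# One-dimensional harmonic oscillator (m = omega = 1) with energy E = 0.5:
# the trajectory q(t) = A cos(t), p(t) = -A sin(t), A = sqrt(2E), traverses
# the whole energy hypersurface H = E at uniform angular speed.
E = 0.5
A = math.sqrt(2.0 * E)

# Time average of the phase function f(q, p) = q**2 along one trajectory.
n = 100_000
total_time = 1000.0
dt = total_time / n
time_avg = sum((A * math.cos(i * dt)) ** 2 for i in range(n)) / n

# Average of q**2 over the energy hypersurface, parametrised by the angle
# phi with q = A cos(phi), p = -A sin(phi), uniform in phi.
m = 100_000
phase_avg = sum((A * math.cos(2.0 * math.pi * j / m)) ** 2 for j in range(m)) / m

print(time_avg, phase_avg)  # both close to E = 0.5
```

The oscillator is of course a special case in which the identity of the two averages is trivial; whether and why anything similar holds for realistic systems is exactly what the ergodicity literature discussed in §3.3.3 is about.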


3.3.2.4 Issue 4: The Approach to Equilibrium

Phase averaging only applies to equilibrium systems and even if we have a satisfactory explanation of why this procedure works, we are still left with the question of why and how the system reaches equilibrium at all if, as often happens, it starts off far from equilibrium. Gibbsian non-equilibrium theory faces two serious problems.

The first is that the Gibbs entropy is constant. Consider now a system out of equilibrium, characterised by a density ρ(q, p, t). This density is not stationary and its entropy not maximal. Given the laws of thermodynamics we would expect this density to approach the equilibrium density as time evolves (e.g. in the case of a system with constant energy and constant particle number we would expect ρ(q, p, t) to approach the microcanonical distribution), which would also be reflected in an increase in entropy. This expectation is frustrated. Using Liouville's equation one can prove that ρ(q, p, t) does not approach the microcanonical distribution and, what seems worse, that the entropy does not increase at all. In fact, it is straightforward to see that SG is a constant of the motion (Zeh 2001, pp. 48-9); that is, dSG(ρ(q, p, t))/dt = 0, and hence SG(ρ(q, p, t)) = SG(ρ(q, p, 0)) for all times t. This precludes a characterisation of the approach to equilibrium in terms of increasing Gibbs entropy. Hence, either such a characterisation has to be given up (at the cost of being fundamentally at odds with thermodynamics), or the formalism has to be modified in a way that makes room for entropy increase.

The second problem is the characterisation of equilibrium in terms of a stationary distribution. 
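Both problems can be seen in a toy case (my own sketch; all specifics are hypothetical): for a harmonic oscillator (m = ω = 1, kB = 1) the Hamiltonian flow is a rigid rotation of the (q, p) plane. A 'squeezed' Gaussian density then rotates forever without ever becoming stationary, while its Gibbs entropy, which depends only on the determinant of its covariance matrix, remains exactly constant.

```python
import math

def gibbs_entropy_gaussian(cov):
    """S_G = -integral of rho*ln(rho) for a two-dimensional Gaussian
    density (k_B = 1): equal to 0.5 * ln((2*pi*e)**2 * det(cov))."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    return 0.5 * math.log((2.0 * math.pi * math.e) ** 2 * det)

def evolve(cov, t):
    """Covariance of the density after time t under the oscillator flow,
    which rotates the (q, p) plane: cov -> R cov R^T with R a rotation."""
    c, s = math.cos(t), math.sin(t)
    R = [[c, s], [-s, c]]
    RC = [[sum(R[i][k] * cov[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    return [[sum(RC[i][k] * R[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

cov0 = [[2.0, 0.3], [0.3, 0.5]]  # a squeezed, non-equilibrium initial density
for t in (0.0, 0.7, 3.0):
    ct = evolve(cov0, t)
    print(t, round(ct[0][0], 4), round(gibbs_entropy_gaussian(ct), 6))
# The position variance keeps oscillating (the density never becomes
# stationary), while S_G is the same at every time: dS_G/dt = 0.
```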
146 RECENT WORK ON THE FOUNDATIONS OF STATISTICAL MECHANICS

The Hamiltonian equations of motion, which govern the system, preclude an evolution from a non-stationary to a stationary distribution: if, at some point in time, the distribution is non-stationary, then it will remain non-stationary for all times and, conversely, if it is stationary at some time, then it must have been stationary all along (van Lith 2001a, pp. 591-2). Hence, if a system is governed by Hamilton's equations, then a characterisation of equilibrium in terms of stationary distributions contradicts the fact that an approach to equilibrium takes place in systems that are not initially in equilibrium. Clearly, this is a reductio of a characterisation of equilibrium in terms of stationary distributions. The reasoning that led to this characterisation was that an equilibrium state is one that remains unchanged through time, which, at the mechanical level, amounts to postulating an unchanging, i.e. stationary, distribution. This was too quick. Thermodynamic equilibrium is defined as a state in which all macro-parameters describing the system are constant. So all that is needed for equilibrium is that the distribution be such that the mean values of the functions associated with thermodynamic quantities are constant in time (Sklar 1978, p. 191). This is a much weaker requirement because it can be met by distributions that are not stationary. Hence we have to come to a more 'liberal' characterisation of equilibrium; the question is what this characterisation is.59

3.3.2.5 Issue 5: Reductionism

Both the Boltzmannian and the Gibbsian approach to SM eventually aim to account for the TD behaviour of the systems under investigation. Hence the questions for the Gibbs approach are exactly the same as the ones mentioned in §3.2.3.7, and the starting point will also be Nagel's model of reduction (introduced in §3.2.8).

3.3.2.6 Plan

As mentioned above, the methods devised to justify the use of the microcanonical distribution and the legitimacy of phase averaging, as well as attempts to formulate a coherent non-equilibrium theory, are radically different depending on whether probabilities are understood ontically or epistemically. For this reason it is best to discuss these two families of approaches separately. §3.3.3 presents arguments justifying Gibbs phase averaging on the basis of an ontic understanding of the probabilities involved. What this understanding might be is discussed in §3.3.4; I turn to this question only after the discussion of phase averaging because, although an ontic understanding of probabilities is clearly assumed, most writers in this tradition do not discuss this assumption explicitly and one can only speculate about what interpretation of probability they might endorse. §3.3.5 is concerned with different approaches to non-equilibrium that are based on this interpretation of probabilities. In §3.3.6 I discuss the epistemic approach to the Gibbs formalism. I close this section with a discussion of reductionism in the Gibbs approach (§3.3.7).

3.3.3 Why Does Gibbs Phase Averaging Work?

Why do phase averages coincide with values measured in actual physical systems? There are two families of answers to this question, one based on ergodic theory (using ideas we have already encountered in the Boltzmann section), the other building on the notion of a thermodynamic limit. For reasons of space we will treat the second approach much more briefly.

3.3.3.1 Time Averages and Ergodicity

Common wisdom justifies the use of phase averages as follows.60 The Gibbs formalism associates physical quantities with functions on the system's phase space. Performing a measurement of one of these quantities takes some time. So what measurement devices register is not the instantaneous value of the function in question, but rather its time average over the duration of the measurement; hence it is time averages that

59 Leeds (1989, pp. 328-30) also challenges as too strong the assumption that a physical system in an equilibrium state has a precise probability distribution associated with it. Although this may well be true, this seems to be just another instance of the time-honoured problem of how a precise mathematical description is matched up with a piece of physical reality that is not intrinsically mathematical. This issue is beyond the scope of this review.

60 This view is discussed but not endorsed, for instance, in Malament and Zabell (1980, p. 342), Bricmont (1996, pp. 145-6), Earman and Rédei (1996, pp. 67-9), and van Lith (2001a, pp. 581-3).

THE GIBBS APPROACH

147

are empirically accessible. Then, so the argument continues, although measurements take an amount of time that is short by human standards, it is long compared to the microscopic time scales on which typical molecular processes take place (sometimes also referred to as the 'microscopic relaxation time'). For this reason the actually measured value is approximately equal to the infinite time average of the measured function. This by itself is not yet a solution to the initial problem, because the Gibbs formalism does not provide us with time averages, and calculating these would require an integration of the equations of motion, which is unfeasible. This difficulty can be circumvented by assuming that the system is ergodic. In this case time averages equal phase averages, and the latter can easily be obtained from the formalism. Hence we have found the sought-after connection: the Gibbs formalism provides phase averages which, by ergodicity, are equal to infinite time averages, and these are, to a good approximation, equal to the finite time averages obtained from measurements.

This argument is problematic for at least two reasons (Malament and Zabell 1980, pp. 342-3; Sklar 1973, p. 211). First, from the fact that measurements take some time it does not follow that what is actually measured are time averages. Why do measurements produce time averages, and in what way does this depend on how much time measurements take? Second, even if we take it for granted that measurements do produce finite time averages, equating these with infinite time averages is problematic. Even if the duration of the measurement is very long (which is often not the case, as actual measurements may take very little time), finite and infinite averages may assume very different values. And the infinity is crucial: if we replace infinite time averages by finite ones (no matter how long the relevant period is taken to be), then the ergodic theorem no longer holds and the explanation fails.
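The ergodic step in this 'common wisdom' argument, namely that infinite time averages equal phase averages, can at least be illustrated numerically. The following sketch is my own toy example, not one from the literature discussed here: it uses the irrational rotation of the unit interval, a standard ergodic system, and checks that a long time average of a phase function approaches its phase average.

```python
import math

# Toy illustration (not the book's own example): the irrational rotation
# x -> (x + alpha) mod 1 is ergodic on [0,1) with respect to Lebesgue
# measure, so infinite time averages of integrable functions equal their
# phase averages. We check this for f(x) = sin^2(2*pi*x), whose phase
# average is exactly 1/2.

def time_average(f, x0, alpha, steps):
    """Average f along the orbit of x0 under rotation by alpha."""
    x, total = x0, 0.0
    for _ in range(steps):
        total += f(x)
        x = (x + alpha) % 1.0
    return total / steps

f = lambda x: math.sin(2 * math.pi * x) ** 2
alpha = math.sqrt(2) - 1            # an irrational rotation number
t_avg = time_average(f, x0=0.123, alpha=alpha, steps=100_000)
phase_avg = 0.5                     # integral of sin^2(2*pi*x) over [0,1]
print(abs(t_avg - phase_avg))       # small, and shrinking as steps grows
```

Note that the demonstration trades on taking very many steps: truncating the orbit after a short time leaves a finite-time average that can differ appreciably from the phase average, which is exactly the worry raised in the text.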
Besides, there is another problem once we try to apply the Gibbs formalism to non-equilibrium situations. It is a simple fact that we do observe how systems approach equilibrium, i.e. how macroscopic parameter values change, and this would be impossible if the values we observed were infinite time averages. These criticisms seem decisive and call for a different strategy in addressing Issue 3. Malament and Zabell (1980) respond to this challenge by suggesting a new way of explaining the success of equilibrium theory, at least for the microcanonical ensemble. Their method still invokes ergodicity but avoids appeal to time averages altogether, invoking only the uniqueness of the measure (see §3.2.4). Their explanation is based on two assumptions (ibid., p. 343).

Assumption 1. The phase function f associated with a macroscopic parameter of the system exhibits small dispersion with respect to the microcanonical measure; that is, the set of points on the energy hypersurface ΓE at which f assumes values that differ significantly from its phase average has vanishingly small microcanonical measure. Formally, for any 'reasonably small' ε > 0 we have

λ({x ∈ ΓE : |f(x) − ∫_{ΓE} f dλ| ≥ ε}) ≈ 0,    (3.30)

where λ is the microcanonical measure (i.e. the measure that is constant on the accessible part of the energy hypersurface and normalised).

Assumption 2. At any given time, the microcanonical measure represents the probability of finding the system in a particular subset of phase space: p(A) = λ(A), where A is a measurable but otherwise arbitrary subset of ΓE.

These two assumptions jointly imply that, at any given time, it is overwhelmingly likely that the system's micro-state is one for which the value of f coincides with, or is very close to, the phase average. The question is how these assumptions can be justified. In the case of Assumption 1 Malament and Zabell refer to a research programme that originated in the work of Khinchin. The central insight of this programme is that phase functions associated with macroscopic parameters satisfy strong symmetry requirements and, as a consequence, turn out to have small dispersion on the energy surface for systems with a large number of constituents. This is just what is needed to justify Assumption 1. This programme will be discussed in the next subsection; let us assume for now that it provides a satisfactory justification of Assumption 1. To justify Assumption 2 Malament and Zabell introduce a new postulate: the equilibrium probability measure p( · ) of finding a system's state in a particular subset of ΓE must be absolutely continuous with respect to the microcanonical measure λ (see §3.2.4). Let us refer to this as the 'Absolute Continuity Postulate' (ACP).61 Now consider the dynamical system (X, φ, λ), where X is ΓE, φ is the flow on ΓE induced by the equations of motion governing the system, and λ is the microcanonical measure on ΓE. Given this, one can present the following argument in support of Assumption 2 (ibid., p. 345):

(P1) (X, φ, λ) is ergodic.
(P2) p( · ) is invariant in time, because this is the defining feature of equilibrium probabilities.
(P3) By ACP, p( · ) is absolutely continuous with respect to λ.
(P4) According to the uniqueness theorem (see §3.2.4.1), λ is the only time-invariant measure that is absolutely continuous with respect to λ.

Conclusion: p( · ) = λ.

Hence the microcanonical measure is singled out as the one and only correct measure for the probability of finding a system's micro-state in a certain part of phase space.

61 I formulate ACP in terms of λ because this simplifies the argument to follow. Malament and Zabell require that p( · ) be absolutely continuous with respect to μE, the Lebesgue measure μ on Γ restricted to ΓE. However, on ΓE the restricted Lebesgue measure and the microcanonical measure differ only by a constant: λ = c μE, where c := 1/μE(ΓE); hence whenever a measure is absolutely continuous with respect to μE it is also absolutely continuous with respect to λ, and vice versa.


The new and crucial assumption is ACP, and the question is how this principle can be justified. What reason is there to restrict the class of measures that we take into consideration as acceptable equilibrium measures to those that are absolutely continuous with respect to the microcanonical measure? Malament and Zabell respond to this problem by introducing yet another principle, the 'displacement principle' (DP). This principle posits that if, of two measurable sets in ΓE, one is but a small displacement of the other, then it is plausible to believe that the probability of finding the system's micro-state in one set should be close to that of finding it in the other (ibid., p. 346). This principle is interesting because one can show that it is equivalent to the claim that probability distributions are absolutely continuous with respect to the Lebesgue measure, and hence the microcanonical measure, on ΓE (ibid., pp. 348-9).62 To sum up, the advantages of this method over the 'standard account' are that it does not appeal to measurement, that it takes into account that SM systems are 'large' (via Assumption 1), and that it makes no reference to time averages at all. In fact, ergodicity is used only to establish the uniqueness of the microcanonical measure. The remaining question is what reasons there are to believe in DP. Malament and Zabell offer little by way of justification: they just make an elusive appeal to the 'method by which the system is prepared or brought to equilibrium' (ibid., p. 347). So it is not clear how one gets from some notion of state preparation to DP. But even if it were clear, why should the success of equilibrium SM depend on the system being prepared in a particular way? This seems to add an anthropocentric element to SM which, at least if one is not a proponent of the ontic approach (referred to in §3.2.3.2), seems foreign to it. The argument in support of Assumption 2 makes two further problematic assumptions.
First, it assumes that equilibrium is defined in terms of a stationary distribution, which, as we have seen above, is problematic because it undercuts a dynamical explanation of the approach to equilibrium (variants of this criticism can be found in Sklar (1978) and Leeds (1989)). Second, it is based on the premise that the system in question is ergodic. As we have seen above, many systems that are successfully dealt with by the formalism of SM are not ergodic, and hence the uniqueness theorem, on which the argument in support of Assumption 2 is based, does not apply. To circumvent this difficulty Vranas (1998) has suggested replacing ergodicity with what he calls ε-ergodicity. The leading idea behind this move is to challenge the commonly held belief that if a system is even just a 'little bit' non-ergodic, then the uniqueness theorem fails completely (Earman and Rédei 1996, p. 71). Vranas points out that there is a middle ground between holding exactly and failing completely, and then argues that this middle ground actually provides us with everything we need.

62 As Leeds (1989, p. 327) points out, Malament and Zabell's proof is for Rn, and they do not indicate how it could be modified to apply to the energy hypersurface, where translations can take one off the surface.
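The notion of closeness between measures that this middle ground trades on (and that is defined precisely in what follows) can be made concrete in a toy setting. The numbers below are my own illustration: on a finite space, the worst-case discrepancy |λ1(A) − λ2(A)| over all sets A equals the total variation distance, i.e. half the L1 distance between the densities.

```python
from itertools import chain, combinations

# Hypothetical four-point sample space with two probability measures
# that differ slightly; epsilon-closeness requires |l1(A) - l2(A)| <= eps
# for EVERY measurable set A, not just the singletons.
l1 = [0.25, 0.25, 0.25, 0.25]
l2 = [0.27, 0.23, 0.26, 0.24]

# Brute force over all 16 subsets A of the four-point space.
points = range(4)
subsets = chain.from_iterable(combinations(points, r) for r in range(5))
worst = max(abs(sum(l1[i] for i in A) - sum(l2[i] for i in A)) for A in subsets)

# The worst case equals the total variation distance, half the L1 norm.
tv = 0.5 * sum(abs(a - b) for a, b in zip(l1, l2))
print(worst, tv)  # both 0.03: the measures are eps-close for any eps >= 0.03
```

So two measures can be ε-close for a tiny ε while being identical on no set at all, which is why replacing exact equality with ε-closeness is a genuine weakening.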


Two measures λ1 and λ2 are ε-close iff for every measurable set A: |λ1(A) − λ2(A)| ≤ ε, where ε is a small but finite number. The starting point of Vranas' argument is the observation that we do not need p( · ) = λ: to justify Gibbsian phase averaging along the lines suggested by Malament and Zabell, all we need is that p( · ) and λ are ε-close, since we cannot tell the difference between a probability measure that is exactly equal to λ and one that is just a little bit different from it. Hence we should replace Assumption 2 by Assumption 2', the statement that p( · ) and λ are ε-close. The question now is: how do we justify Assumption 2'? If a system is non-ergodic then its phase space X is decomposable; that is, there exist two sets, A ⊆ X and B := X \ A, both of measure greater than zero, which are invariant under the flow. Intuitively, if the system is 'just a little bit non-ergodic', then the system is ergodic on B and λ(A) ≪ λ(B) (where, again, λ is the microcanonical measure). This motivates the following definition: a dynamical system (X, φ, λ) is ε-ergodic iff the system's dynamics is ergodic on a subset Y of X with λ(Y) = 1 − ε.63 Strict ergodicity then is the limiting case ε = 0. Furthermore, given two small but finite numbers ε1 and ε2, Vranas defines λ2 to be 'ε1/ε2-continuous' with λ1 iff for every measurable set A: λ2(A) ≤ ε2 if λ1(A) ≤ ε1 (Vranas 1998, p. 695). Vranas then proves an 'ε-version' of the uniqueness theorem, the ε-equivalence theorem (ibid., pp. 703-5): if λ1 is ε1-ergodic and λ2 is ε1/ε2-continuous with respect to λ1 and invariant, then λ1 and λ2 are ε3-close with ε3 = 2ε2 + ε1(1 − ε1)^{-1}. Given this, the Malament and Zabell argument can be rephrased as follows:

(P1') (X, φ, λ) is ε-ergodic.
(P2) p( · ) is invariant in time, because this is the defining feature of equilibrium probabilities.
(P3') ε-ACP: p( · ) is ε/ε2-continuous with respect to λ.
(P4') The ε-equivalence theorem.
Conclusion': p( · ) and λ are ε3-close with ε3 = 2ε2 + ε(1 − ε)^{-1}.

The assessment of this argument depends on what can be said in favour of (P1') and (P3'), since (P4') is a mathematical theorem and (P2) has not been altered. In support of (P1'), Vranas (ibid., pp. 695-8) reviews computational evidence showing that systems of interest are indeed ε-ergodic. In particular, he mentions the following cases. A one-dimensional system of n self-gravitating plane parallel sheets of uniform density was found to become strictly ergodic as n increases (it reaches strict ergodicity for n = 11). The Fermi-Pasta-Ulam system (a one-dimensional chain of n particles with weakly nonlinear nearest-neighbour interaction) is ε-ergodic for large n. There is good evidence that a Lennard-Jones gas is ε-ergodic for large n in the relevant energy range, i.e. for energies large enough that quantum effects do not matter. From these Vranas draws

63 Vranas (1998, p. 695) distinguishes between 'ε-ergodic' and 'epsilon-ergodic', where a system is epsilon-ergodic if it is ε-ergodic with ε tiny or zero. In what follows I always assume ε to be tiny and hence do not distinguish between the two.


the tentative conclusion that the dynamical systems of interest in SM are indeed ε-ergodic. But he is clear about the fact that this is only a tentative conclusion and that it would be desirable to have theoretical results. The justification of (P3') is more difficult. This does not come as a surprise, because Malament and Zabell did not present a justification for ACP either. Vranas (ibid., pp. 700-2) presents some arguments based on the limited precision of measurement, but admits that these arguments invoke premises that he cannot justify. To sum up, this argument enjoys the advantage over previous arguments that it does not have to invoke strict ergodicity. However, it is still based on the assumption that equilibrium is characterised by a stationary distribution, which, as we have seen, is an obstacle when it comes to formulating a workable Gibbsian non-equilibrium theory. In sum, it is still an open question whether the ergodic programme can eventually explain in a satisfactory way why Gibbsian SM works.

3.3.3.2 Khinchin's Programme and the Thermodynamic Limit

Ergodic theory works at a general level in that it makes no assumptions about the number of degrees of freedom of the system under study and does not restrict the allowable phase functions beyond the requirement that they be integrable. Khinchin (1949) points out that this generality is not only unnecessary; it is actually the source of the problems that the programme encounters. Rather than studying dynamical systems at a general level, we should focus on those cases that are relevant in statistical mechanics. This involves two restrictions. First, we only have to consider systems with a large number of degrees of freedom; second, we only need to take into account a special class of phase functions, so-called sum functions. A function is a sum function if it can be written as a sum of one-particle functions:

f(x) = ∑_{i=1}^{n} f_i(x_i),    (3.31)

where x_i is the vector containing the position and momentum coordinates of particle i (that is, x_i ∈ R^6 while x ∈ R^{6n}). Under the assumption that the Hamiltonian of the system is a sum function as well, Khinchin can prove the following theorem:

Khinchin's Theorem. For all sum functions f there are positive constants k1 and k2 such that

λ({x ∈ ΓE : |f*(x) − f̄| / f̄ ≥ k1 n^{-1/4}}) ≤ k2 n^{-1/4},    (3.32)

where f* is the infinite time average of f, f̄ is its phase average, and λ is the microcanonical measure.
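The large-n behaviour that powers this result can be illustrated with a quick Monte Carlo estimate. The sketch below is mine, and it shows only the generic concentration of sum functions for large n (here at the ~n^{-1/2} rate of the law of large numbers, a stronger rate than Khinchin's n^{-1/4} bound), not Khinchin's theorem itself, which concerns time averages.

```python
import random

# For x drawn uniformly from [0,1]^n, the sum function f(x) = (1/n)*sum(x_i)
# has mean 1/2. We estimate the measure of the set where f deviates from
# 1/2 by at least eps, and watch it shrink as n grows.

random.seed(0)

def deviation_measure(n, eps=0.05, trials=2000):
    """Estimate the measure of {x : |f(x) - 1/2| >= eps} by sampling."""
    hits = 0
    for _ in range(trials):
        f = sum(random.random() for _ in range(n)) / n
        if abs(f - 0.5) >= eps:
            hits += 1
    return hits / trials

small, large = deviation_measure(10), deviation_measure(1000)
print(small, large)   # the deviation set shrinks markedly as n grows
```

For n = 10 the deviation set still has substantial measure; for n = 1000 it is already negligible, which is the sense in which sum functions become effectively constant on the energy surface for macroscopic n.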

This theorem is sometimes also referred to as 'Khinchin's ergodic theorem'; let us say that a system satisfying the condition specified in Khinchin's theorem is 'K-ergodic'.64 For a summary and a discussion of the proof see Batterman (1998, pp. 190-8), van Lith (2001b, pp. 83-90) and Badino (2006). Basically the theorem says that as n becomes larger, the measure of those regions on the energy hypersurface where the time and the phase means differ by more than a small amount tends towards zero. For any finite n, K-ergodicity is weaker than ergodicity in the sense that the region where time and phase average do not coincide can have a finite measure, while it is of measure zero if the system is ergodic; this discrepancy vanishes for n → ∞. However, even in the limit n → ∞ there is an important difference between ergodicity and K-ergodicity: if K-ergodicity holds, it holds only for a very special class of phase functions, namely sum functions; ergodicity, by contrast, holds for any λ-integrable function. A number of problems facing an explanation of equilibrium SM based on K-ergodicity need to be mentioned. First, like the afore-mentioned approaches based on ergodicity, Khinchin's programme associates the outcomes of measurements with infinite time averages and is therefore vulnerable to the same objections. Second, ergodicity's measure zero problem turns into a 'measure k2 n^{-1/4} problem', which is worse because now we have to justify that a part of the energy hypersurface of finite measure (rather than measure zero) can be disregarded. Third, the main motivation for focussing attention on sum functions is the claim that all relevant functions, i.e. the ones that correspond to thermodynamic quantities, are of that kind. Batterman (1998, p. 191) points out that this is too narrow, as there are functions of interest that do not have this form. A further serious difficulty is what Khinchin himself called the 'methodological paradox' (Khinchin 1949, pp. 41-3). The proof of the above theorem assumes the Hamiltonian to be a sum function (and this assumption plays a crucial rôle in the derivation of the theorem).
However, for an equilibrium state to arise in the first place, the particles have to interact (collide), which cannot happen if the Hamiltonian is a sum function. Khinchin's response is to assume that there are only short-range interactions between the molecules (which is the case, for instance, in a hard ball gas). If this is so, Khinchin argues, the interactions are effective only on a tiny part of the phase space and hence have no significant effect on averages. This response has struck many as unsatisfactory and ad hoc, and so the methodological paradox became the starting point for a research programme now known as the 'thermodynamic limit', which investigates whether one can still prove Khinchin-like results for Hamiltonians with interaction terms. Results of this kind can be proven in the limit n → ∞, provided that the volume V of the system also tends towards infinity in such a way that the number density n/V remains constant. This programme, championed among others by Lanford, Mazur, Ruelle, and van der Linden, has reached a tremendous degree of mathematical sophistication and defies summary in simple terms. Classic statements are Ruelle (1969, 2004); surveys and further references can be found in Compagner (1989), van Lith (2001b, pp. 93-101) and Uffink (2007, pp. 1020-8).

64 K-ergodicity should not be conflated with the property of being a K-system, i.e. a system having the Kolmogorov property.

A further problem is that for finite n, K-ergodic systems need not be metrically transitive. This calls into question the ability of an approach based on K-ergodicity to answer the question of why measured values coincide with microcanonical averages. Suppose there is some global constant of motion other than H, so that the motion of the system remains confined to some part of the energy hypersurface. In this case there is no reason to assume that microcanonical averages with respect to the entire energy hypersurface coincide with measured values. Faced with this problem, one could argue that each system of this kind has a decomposition of its energy hypersurface into different regions of non-zero measure, some ergodic and others not, and that, as n and V get large, the average values of the relevant phase functions become insensitive to the non-ergodic parts. Earman and Rédei (1996, p. 72) argue against this strategy on the grounds that it is straightforward to construct an infinity of normalised invariant measures that assign different weights to these regions than does the microcanonical measure. Phase averages with respect to these other measures can deviate substantially from microcanonical averages, and yet it is to be expected that the predictions based on them turn out wrong. But why? In a non-ergodic system there is no reason to grant the microcanonical measure a special status, and Khinchin's approach does not provide a reason to expect microcanonical averages, rather than any other average values, to correspond to measurable quantities.
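The point is easy to make concrete in a toy discrete setting (my construction, not Earman and Rédei's own example): a 'dynamics' with an invariant decomposition admits many invariant measures, and phase averages depend on which one is chosen.

```python
from fractions import Fraction

# A toy decomposable 'dynamics': a permutation of {0,...,9} that cycles
# {0,...,4} and {5,...,9} separately, so each half is an invariant set.
T = {i: (i + 1) % 5 if i < 5 else 5 + (i - 4) % 5 for i in range(10)}

def is_invariant(mu):
    """For a permutation, mu is invariant iff mu(T(i)) = mu(i) for all i."""
    return all(mu[T[i]] == mu[i] for i in range(10))

uniform    = [Fraction(1, 10)] * 10                 # 'microcanonical' analogue
left_only  = [Fraction(1, 5)] * 5 + [Fraction(0)] * 5
right_only = [Fraction(0)] * 5 + [Fraction(1, 5)] * 5

assert all(map(is_invariant, (uniform, left_only, right_only)))

# Phase averages of the function f(i) = i with respect to the three
# (equally legitimate) invariant measures differ substantially:
avg = lambda mu: sum(mu[i] * i for i in range(10))
print([float(avg(m)) for m in (uniform, left_only, right_only)])  # [4.5, 2.0, 7.0]
```

Nothing internal to the dynamics privileges the uniform measure here; some further argument is needed to single it out, which is just the gap the text identifies in Khinchin's approach.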
Batterman (1998) grants this point but argues that there is another reason to expect correspondence with observed values; this reason, however, comes from a careful analysis of renormalisation group techniques and their application to the case at hand, rather than from any feature of either Khinchin's approach or the thermodynamic limit. A discussion of these techniques is beyond the scope of this review; the details of the case at hand are considered in Batterman (1998), and a general discussion of renormalisation and its relation to issues of reductionism and explanation can be found in Batterman (2002).

3.3.4 Ontic Probabilities in Gibbs' Theory

Two ontic interpretations of Gibbsian probabilities have been suggested in the literature: frequentism and time averages. Let us discuss them in turn.

3.3.4.1 Frequentism

A common way of looking at ensembles is to think about them in analogy with urns, but rather than containing balls of different colours they contain systems in different micro-states. This way of thinking about ρ was first suggested in a notorious remark by Gibbs (1902, p. 163), in which he observes that '[w]hat we know about a body can generally be described most accurately and most simply by saying that it is one taken at random from a great number (ensemble) of bodies which are completely described'. Although Gibbs himself remained non-committal as regards an interpretation of probability, this point of view naturally lends itself to a frequentist analysis of probabilities. In


this vein Malament and Zabell (1980, p. 345) observe that one can regard Gibbsian probabilities as representing limiting relative frequencies within an infinite ensemble of identical systems. First appearances notwithstanding, this is problematic. The strength of frequentism is that it grounds probabilities in facts about the world. There are some legitimate questions for those who associate probabilities with infinite limiting frequencies, as these are not experimental facts. However, frequentists of any stripe agree that one single outcome is not enough to ground a probability claim. But one outcome is the best we can ever get in Gibbsian SM. The ensemble is a fictitious entity; what is real is only the one system in the laboratory, and so we can make at most one draw from this ensemble. All the other draws would be hypothetical. But on what grounds do we decide what the results of these draws would be? It is obvious that such hypothetical draws do not provide a basis for a frequentist interpretation of probabilities. Another way of trying to ground a frequency interpretation is to understand frequencies as given by consecutive measurements made on the actual system. This move successfully avoids the appeal to hypothetical draws, but it comes at the price of another serious problem. Von Mises' theory requires that the successive trials whose outcomes make up the sequence on which the relative frequencies are defined (the collective) be independent. This, as von Mises himself pointed out, is generally not the case if the sequence is generated by one and the same system.65 So making successive measurements on the same system does not give us the kind of sequences needed to define frequentist probabilities.

3.3.4.2 Time Averages

Another interpretation regards Gibbsian probabilities as time averages of the same kind as the ones we discussed in §3.2.4. On this view, pt(R) in eq.3.23 is the average time that the system spends in region R.
As in the case of Boltzmannian probabilities, this is in need of qualification: a relevant interval over which the time average is taken has to be specified, and the dependence on initial conditions has to vanish. If, again, we assume that the system is ergodic on the energy hypersurface, we obtain neat answers to these questions (just as in the Boltzmann case). Assuming the system to be ergodic solves two problems at once. For one, it puts the time average interpretation on solid ground (for the reasons discussed in §3.2.4.2 in the context of the Boltzmannian approach). For another, it offers an explanation of why the microcanonical distribution is indeed the right distribution; i.e. it solves the uniqueness problem. This is important because even if all interpretative issues were settled, we would still be left with the question of which among the infinitely many possible distributions is the correct one to work with. The uniqueness theorem of ergodic theory answers this question

65 Von Mises discussed this problem in connection with diffusion processes and suggested getting around the difficulty by reconstructing the sequence in question, which is not a collective, as a combination of two sequences that are collectives (von Mises 1939, Chapter 6). Whether this is a viable solution in the context at hand is an open question.


in an elegant way by stating that the microcanonical distribution is the only distribution absolutely continuous with respect to the Lebesgue measure (although some argument would still have to be provided to establish that every acceptable distribution has to be absolutely continuous with respect to the Lebesgue measure). However, this proposal suffers from all the difficulties mentioned in §3.2.4.3, which, as we saw, are not easily overcome. A further problem is that it undercuts an extension of the approach to non-equilibrium situations. Interpreting probabilities as infinite time averages yields stationary probabilities. As a result, phase averages are constant. This is what we expect in equilibrium, but it is at odds with the fact that we witness change and observe systems that start in a non-equilibrium state approach equilibrium. This evolution has to be reflected in a change of the probability distribution, which is impossible if the distribution is stationary by definition. Hence the time average interpretation of probability, together with the assumption that the system is ergodic, makes it impossible to account for non-equilibrium behaviour (Sklar 1973, p. 211; Jaynes 1983, p. 106; Dougherty 1993, p. 846; van Lith 2001a, p. 586). One could try to circumvent this problem by giving up the assumption that the system is ergodic and defining pt(R) as a finite time average. The problem with this suggestion, however, is that it is not clear what the relevant time interval should be, and the dependence of the time average on the initial condition would persist. These problems make the suggestion rather unattractive. Another suggestion is to be a pluralist about the interpretation of probability and hold that probabilities in equilibrium have to be interpreted differently from probabilities in non-equilibrium.
Whatever support one might muster for pluralism about the interpretation of probability in other contexts, it seems out of place when the equilibrium versus non-equilibrium distinction is at stake. At least in this case one needs an interpretation that applies to both cases alike (van Lith 2001a, p. 588).

3.3.5 The Approach to Equilibrium

The main challenge for Gibbsian non-equilibrium theory is to find a way to get the Gibbs entropy moving. Before discussing different solutions to this problem, let me again illustrate what the problem is. Consider the by now familiar gas that is confined to the left half of a container (Vleft). Now remove the separating wall. As a result the gas will spread and soon evenly fill the entire volume (Vtotal). From a Gibbsian point of view, what seems to happen is that the equilibrium distribution with respect to the left half evolves into the equilibrium distribution with respect to the entire container; more specifically, the microcanonical distribution over all micro-states compatible with the gas being in Vleft, Γleft, seems to evolve into the microcanonical distribution over all states compatible with the gas being in Vtotal, Γtotal. The problem is that this evolution is ruled out by the laws of mechanics for an isolated system. The time evolution of an ensemble density is subject to Liouville's eq.3.27, according


to which the density moves in phase space like an incompressible liquid; therefore it is not possible for a density that was uniform over Γleft at some time to be uniform over Γtotal at some later time. Hence, as it stands, the Gibbs approach cannot explain the approach to equilibrium.

3.3.5.1 Coarse-Graining

The 'official' Gibbsian proposal is that this problem is best addressed by coarse-graining the phase space; the idea is introduced in Chapter XII of Gibbs (1902) and has since been endorsed, among others, by Penrose (1970), Farquhar (1964), and all supporters of the programme of stochastic dynamics discussed below. The procedure is exactly the same as in the Boltzmann case (§3.2.2), with the exception that we now coarse-grain the system's γ-space rather than its μ-space. The so-called coarse-grained density ρ̄ω is defined as the density that is uniform within each cell, taking as its value the average value of the original continuous density ρ in that cell:

ρ̄ω(q, p, t) := (1/δω) ∫_{ω(q,p)} ρ(q′, p′, t) dΓ′,    (3.33)

where ω(q, p) is the cell in which the point (q, p) lies and δω is the Lebesgue measure of a cell.

Whether we work with ρ̄ω or ρ is of little importance to the practitioner, because for any phase function that does not fluctuate on the scale of δω (which is true of most physically relevant phase functions) the phase averages with respect to ρ̄ω and ρ are approximately the same. We can now define the coarse-grained entropy Sω:

S_\omega(\rho) := S_G(\bar{\rho}_\omega) = -k_B \int_\Gamma \bar{\rho}_\omega \log(\bar{\rho}_\omega)\, d\Gamma   (3.34)
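To make eqs. 3.33 and 3.34 concrete, here is a small numerical sketch (not from the text: the one-dimensional 'phase space', the discretisation, the cell number, and the test density are all invented for illustration) that coarse-grains a fluctuating density and compares the fine- and coarse-grained entropies (kB = 1):

```python
import numpy as np

def coarse_grain(rho, n_cells):
    """Discrete analogue of eq. 3.33: replace rho by its average within each cell."""
    rho = np.asarray(rho, dtype=float)
    cells = rho.reshape(n_cells, -1)           # each row is one cell omega
    means = cells.mean(axis=1, keepdims=True)  # average value of rho in the cell
    return np.broadcast_to(means, cells.shape).reshape(-1).copy()

def gibbs_entropy(rho, dx):
    """Discrete analogue of S_G = -k_B * integral of rho log rho (with k_B = 1)."""
    rho = np.asarray(rho, dtype=float)
    mask = rho > 0
    return -np.sum(rho[mask] * np.log(rho[mask])) * dx

n_fine, n_cells = 1000, 10
dx = 1.0 / n_fine
x = np.linspace(0, 1, n_fine, endpoint=False)
rho = 1.0 + 0.9 * np.sin(40 * np.pi * x)   # strongly fluctuating within each cell
rho /= np.sum(rho) * dx                    # normalise the density

rho_bar = coarse_grain(rho, n_cells)
S_fine = gibbs_entropy(rho, dx)
S_coarse = gibbs_entropy(rho_bar, dx)
print(S_coarse >= S_fine)  # True: coarse-graining never lowers the Gibbs entropy
```

Because the test density fluctuates within the cells, the inequality discussed next comes out strict here; a density that is already uniform over each cell would give equality.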

One can prove that the coarse-grained entropy is always greater than or equal to the fine-grained entropy: Sω(ρ) ≥ SG(ρ); the equality holds only if the fine-grained distribution is uniform over the cells of the coarse-graining (see Uffink 1995b, p. 155; Wehrl 1978, p. 229; Lavis 2004, p. 672).

What do we gain by working with ρ̄ω rather than with ρ? The main point is that the coarse-grained density ρ̄ω is not governed by Liouville's equation and hence is not subject to the restrictions mentioned above. So it is, at least in principle, possible for ρ̄ω to evolve in such a way that it will be uniform over the portion of the phase space available to the system in equilibrium. This state is referred to as 'coarse-grained equilibrium' (Ridderbos 2002, p. 69). The approach to coarse-grained equilibrium happens if, under the dynamics of the system, ρ becomes so scrambled that an equal portion of it is located in every cell of the partition. Because the averaged density is 'blind' to differences within each cell, the spread-out states of the initial equilibrium condition will, on the averaged level, look like a homogeneous distribution. This is illustrated in fig. 3.10 for the example mentioned at the beginning of this subsection, where the initial density is constant over Γleft while the final density is expected to be constant over Γtotal (this figure is adapted from Uffink 1995b, p. 154).


Fig. 3.10. Evolution into a quasi-equilibrium distribution

A fine-grained distribution which has evolved in this way, i.e. which appears to be uniform at the coarse-grained level, is said to be in a quasi-equilibrium (Blatt 1959, p. 749; Ridderbos 2002, p. 73). On the coarse-graining view, then, all that is required to explain the approach to equilibrium in the Gibbs approach is a demonstration that an arbitrary initial distribution indeed evolves into a quasi-equilibrium distribution (Ridderbos 2002, p. 73). The question then is under what circumstances this happens. The standard answer is that the system has to be mixing (see §3.2.4.1 for a discussion of mixing). This suggestion has some intuitive plausibility given the geometrical interpretation of mixing, and it receives further support from the convergence theorem (eq. 3.20). In sum, the proposal is that we coarse-grain the system's phase space and then consider the coarse-grained entropy, which indeed increases if the system is mixing.

What can be said in support of this point of view? The main thrust of arguments in favour of coarse-graining is that even if there are differences between the fine-grained and the coarse-grained density, we cannot empirically distinguish between them, and hence there is no reason to prefer one to the other. There are various facets to this claim; these are discussed, but not endorsed, in Ridderbos (2002, p. 73). First, measurements have finite precision, and if δω is chosen so that it is below that precision, no measurement we can perform on the system will ever be able to tell us whether the true distribution is ρ or ρ̄ω. Second, as already observed above, the values of macroscopic variables calculated using the coarse-grained density coincide with those calculated using the fine-grained density (provided the relevant phase function does not fluctuate on the scale of δω).
This is all we need, because thermodynamic equilibrium is defined in terms of the values of macroscopic parameters, and as long as these coincide there is no reason to prefer the fine-grained to the coarse-grained density.
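The mixing story can be given a quick numerical illustration. The sketch below is a toy model of my own, not anything in the text: the baker's map stands in for a mixing Hamiltonian flow, an ensemble of points starts uniformly in the 'left half' of the unit square (the analogue of Γleft), and after a number of iterations each cell of a 4×4 partition holds roughly the same fraction of the ensemble, i.e. apparent uniformity at the coarse-grained level even though the fine-grained distribution is highly filamented:

```python
import numpy as np

rng = np.random.default_rng(1)
# Ensemble of 100,000 points filling the left half of the unit square.
pts = rng.random((100_000, 2)) * np.array([0.5, 1.0])

def baker(p):
    """One step of the baker's map, a standard measure-preserving mixing map."""
    x, y = p[:, 0], p[:, 1]
    left = x < 0.5
    return np.column_stack([np.where(left, 2 * x, 2 * x - 1),
                            np.where(left, y / 2, (y + 1) / 2)])

def coarse_occupations(p, n=4):
    """Fraction of the ensemble in each cell of an n x n partition."""
    idx = np.minimum((p * n).astype(int), n - 1)
    counts = np.zeros((n, n))
    np.add.at(counts, (idx[:, 0], idx[:, 1]), 1)
    return counts / len(p)

for _ in range(12):           # let the dynamics scramble the ensemble
    pts = baker(pts)

occ = coarse_occupations(pts)
print(np.allclose(occ, 1 / 16, atol=0.01))  # True: each cell holds roughly 1/16
```

The fine-grained structure is still there (the map is invertible and measure preserving); only the cell-averaged description has become uniform, which is exactly the sense in which quasi-equilibrium is reached.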


This programme faces several serious difficulties. To begin with, there is the problem that mixing is only defined, and so only achieved, for t → ∞, but thermodynamic systems seem to reach equilibrium in finite time. One might try to mitigate the force of this objection by saying that it is enough for a system to reach an 'almost mixed' state in the relevant finite time. The problem with this suggestion is that from the fact that a system is mixing nothing follows about how fast it reaches a mixed state, and hence it is not clear whether it becomes 'almost mixed' over the relevant observation times (see Berkovitz et al. 2006, p. 687). Moreover, mixing is too stringent a requirement for many realistic systems. Mixing implies ergodicity and, a fortiori, if a system is not ergodic it cannot be mixing (see §3.2.4.1). But there are relevant systems that fail to be ergodic, and hence also fail to be mixing (as we have seen in §3.2.4.3). This is a serious difficulty, and unless it can be argued, as Vranas did with regard to ergodicity, that systems which fail to be mixing are 'almost mixing' in some relevant sense and reach some 'almost mixed' state in some finite time, an explanation of the approach to equilibrium based on mixing is not viable.

Second, there is a consistency problem, because we now seem to have two different definitions of equilibrium (Ridderbos 2002, p. 73). One is based on the requirement that the equilibrium distribution be stationary; the other on apparent uniformity. These two concepts of equilibrium are not co-extensive, and so we face the question of which one we regard as the constitutive one. Similarly, we have two notions of entropy for the same system. Which one really is the system's entropy? However, it seems that this objection need not really trouble the proponent of coarse-graining.
There is nothing sacrosanct about the formalism as first introduced above and, in keeping with the revisionary spirit of the coarse-graining approach, one can simply declare that equilibrium is defined by uniformity relative to a partition and that Sω is the 'real' entropy of the system.

Third, as in the case of Boltzmannian coarse-graining, there is a question about the justification of the introduction of a partition. The main justification is based on the finite accuracy of observations, which can never reveal the precise location of a system's micro-state in its γ-space. As the approach to equilibrium only takes place on the coarse-grained level, we have to conclude that the emergence of thermodynamic behaviour depends on there being limits to the observer's measurement resolution. This, so the objection continues, is misguided because thermodynamics does not appeal to observers of any sort, and thermodynamic systems approach equilibrium irrespective of what those witnessing this process can know about the system's micro-state. This objection can be challenged on two grounds. First, one can mitigate the force of the argument by pointing out that micro-states have no counterpart in thermodynamics at all, and hence grouping some of them together on the basis of experimental indistinguishability cannot possibly lead to a contradiction with thermodynamics. All that matters from a thermodynamic point of view is that the macroscopic quantities come out right, and this is the case in the coarse-graining approach (Ridderbos 2002, p. 71). Second, the above suggestion does not rely on there being actual observers, or actual observations taking place. The claim simply is that the fine-grained distribution has to reach quasi-equilibrium. That concept is defined relative to a partition, but there is nothing subjective about it: whether or not a system reaches quasi-equilibrium is an objective matter of fact that depends on the dynamics of the system and has nothing to do with the existence of observers. Those opposed to coarse-graining reply that this is beside the point because the very justification for introducing a partition in the first place is an appeal to limited observational capacities, so that whether or not quasi-equilibrium is an objective property given a particular partition is simply a non-issue.

So, at bottom, the disagreement seems to be over the question of whether the notion of equilibrium is essentially a macroscopic one. That is, does the notion of equilibrium make sense to creatures with unlimited observational powers? Or, less radically: do they need this notion? It is at least conceivable that for them the gas indeed does not approach equilibrium but moves around in some very complicated but ever-changing patterns, which only look stable and unchanging to those who cannot (or simply do not) look too closely. Whether or not one finds convincing a justification of coarse-graining by appeal to limited observational powers depends on how one regards this possibility.

Fourth, one can question the central premise of the argument for regarding ρ̄ω as the relevant equilibrium distribution, namely that ρ̄ω and ρ are empirically indistinguishable. Blatt (1959) and Ridderbos and Redhead (1998) argue that this is wrong because the spin-echo experiment (Hahn 1950) makes it possible to empirically discriminate between ρ and ρ̄, even if the size of the cells is chosen to be so small that no direct measurement could distinguish between states within a cell.
For this reason, they conclude, replacing ρ with ρ̄ω is illegitimate and an appeal to coarse-graining to explain the approach to equilibrium has to be renounced.

In the spin-echo experiment, a collection of spins is placed in a magnetic field B pointing along the z-axis, and the spins are initially aligned with this field (fig. 3.11).

Fig. 3.11. Spins aligned with a magnetic field

Then the spins are subjected to a radio frequency pulse, as a result of which they are tilted by 90 degrees so that they now point in the x-direction (fig. 3.12).


Fig. 3.12. Spins shifted 90° by a pulse

Due to the presence of the magnetic field B, the spins start precessing around the z-axis and in doing so emit an oscillating electromagnetic pulse, the 'free induction decay signal' (fig. 3.13; the curved dotted arrows indicate the direction of rotation).

Fig. 3.13. Precession of spins

This signal is the macroscopic evidence for the fact that all spins are aligned and precess around the same axis. After some time this signal decays, indicating that the spins are now no longer aligned and point in 'random' directions (fig. 3.14).

Fig. 3.14. Signal decays and spins point in random directions

The reason for this is that the precession speed is a function of the field strength of B, and it is not possible to create an exactly homogeneous magnetic field. Therefore the precession frequencies of the spins differ slightly, resulting in the spins pointing in different directions after some time t = τ has elapsed. At that point a second pulse is applied to the system, tilting the spins in the x–z plane by 180 degrees (fig. 3.15; the straight dotted arrows indicate the direction of the spins before the pulse).

Fig. 3.15. Spins after a second pulse

The result of this is a reversal of the order of the spins in the sense that the faster spins that were ahead of the slower ones are now behind the slower ones (fig. 3.16; s1 and s2 are two spins, s′1 and s′2 their 'tilted versions'). However, those that were precessing faster before the second pulse keep doing so after the pulse and hence 'catch up' with the slower ones. After time t = 2τ all spins are aligned again and the free induction decay signal reappears (the 'echo pulse'). This is the macroscopic evidence that the original order has been restored.66
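The sequence just described is easy to imitate numerically. In the toy sketch below (mine, not the Ridderbos and Redhead model; the number of spins, the frequency spread, and τ are arbitrary illustrative choices), each spin accumulates phase at its own rate, the 180-degree pulse is modelled as a reflection of the phases, and the magnitude of the mean transverse spin stands in for the free induction decay signal:

```python
import numpy as np

rng = np.random.default_rng(0)
n_spins, tau = 10_000, 10.0
# Slightly inhomogeneous field: each spin precesses at its own frequency.
omega = 1.0 + 0.5 * rng.standard_normal(n_spins)

def signal(phases):
    """|mean transverse spin|: 1 when all spins are aligned, near 0 when random."""
    return abs(np.exp(1j * phases).mean())

phases_tau = omega * tau      # free precession up to t = tau: the spins dephase
phases_flipped = -phases_tau  # the pulse reflects the phases, reversing their ordering
phases_2tau = phases_flipped + omega * tau  # same rates as before: all phases return to 0

print(signal(phases_tau) < 0.05)  # True: the free induction signal has decayed
print(signal(phases_2tau))        # 1.0: the echo pulse reappears at t = 2*tau
```

Note that the pulse is modelled as a reflection of the phases, not a reversal of the frequencies; this mirrors the point of footnote 66 that the ordering of the spins is reversed while each spin keeps its own precession rate.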


Fig. 3.16. Order of spins (in terms of speed) is reversed

At time t = τ, when all spins point in random directions, ρ̄ is uniform and the system has reached its coarse-grained equilibrium. From a coarse-grainer's point of view this is sufficient to assert that the system is in equilibrium, as we cannot distinguish between true and coarse-grained equilibrium. So according to Blatt, and to Ridderbos and Redhead, the spin-echo experiment shows that this rationale is wrong because we actually can distinguish between true and coarse-grained equilibrium. If the system were in true equilibrium at t = τ, the second radio pulse flipping the spins by 180 degrees would not have the effect of aligning the spins again at t = 2τ; this only happens because the system is merely in a coarse-grained equilibrium. Hence equilibrium and quasi-equilibrium distributions can be shown to display radically different behaviour. Moreover, this difference is such that we can experimentally detect it without measuring microdynamical variables: we simply check whether there is an echo pulse at t = 2τ. This pulls the rug from under the feet of the coarse-grainer, and we have to conclude that it is therefore not permissible to base fundamental arguments in statistical mechanics on coarse-graining (Blatt 1959, p. 746).

66 It is often said that this experiment is the empirical realisation of a Loschmidt velocity reversal (in which a 'Loschmidt demon' instantaneously transforms the velocities vi of all particles in the system into −vi). This is incorrect. The directions of precession (and hence the particles' velocities) are not reversed in the experiment. The reflection of the spins in the x–z plane results in a reversal of their ordering while leaving their velocities unaltered. The grain of truth in the standard story is that a reversal of the ordering with unaltered velocities is in a sense 'isomorphic' to a velocity reversal with unaltered ordering.

What is the weight of this argument? Ridderbos (2002, p. 75) thinks that the fact that we can, after all, experimentally distinguish between ρ̄ and ρ, and hence between 'real' equilibrium and quasi-equilibrium, is by itself a sufficient reason to dismiss the coarse-graining approach. Others are more hesitant. Ainsworth (2005, pp. 626-7) points out that, although valid, this argument fails to establish its conclusion because it assumes that for the coarse-graining approach to be acceptable ρ̄ and ρ must be empirically indistinguishable. Instead, he suggests appealing to the fact, proffered by some in support of Boltzmannian coarse-graining, that there is an objective separation of the micro and macro scales (see §3.2.7). He accepts this point of view as essentially correct and submits that the same response is available to the Gibbsian: coarse-graining can be justified by an appeal to the separation of scales rather than by pointing to limitations of what we can observe. As the notion of equilibrium is one that inherently belongs to the realm of the macroscopic, coarse-grained equilibrium is the correct notion of equilibrium, irrespective of what happens at the micro scale. However, as I have indicated above, the premise of this argument is controversial, since it is not clear whether there is indeed an objective separation of micro and macro scales.

Ridderbos and Redhead make their case against coarse-graining by putting forward two essentially independent arguments. Their first argument is based on theoretical results.
They introduce a mathematical model of the experiment and then show that the coarse-grained distribution behaves in a way that leads to false predictions. They show that the system reaches a uniform coarse-grained distribution over the entire phase space at t = τ (as one would expect), but then fails to evolve back into a non-equilibrium distribution under reversal, so that, in coarse-grained terms, the system is still described by a uniform distribution at t = 2τ (1998, p. 1250). Accordingly, the coarse-grained entropy reaches its maximum at t = τ and does not decrease as the spins evolve back to their initial positions. Hence the coarse-grained entropy is still maximal when the echo pulse occurs, and the occurrence of the echo is, from a coarse-grained perspective, completely miraculous (1998, p. 1251).

Their second argument is based on the assumption that we can, somehow, experimentally observe the coarse-grained entropy (as opposed to calculating it in the model). Then we face the problem that observational results seem to tell us that the system has reached equilibrium at time t = τ and that, after the application of the second pulse, it evolves away from equilibrium; that is, we are led to believe that the system behaves anti-thermodynamically. This, Ridderbos and Redhead (1998, p. 1251) conclude, is wrong because the experiments do not actually contradict the Second Law.


So the experimental results would stand in contradiction both with the theoretical results predicting that the coarse-grained entropy assumes its maximum value at t = 2τ and with the second law of thermodynamics, which forbids high-to-low entropy transitions in isolated systems (and the spin-echo system is isolated after the second pulse). This, according to Ridderbos and Redhead, is a reductio of the coarse-graining approach.67

These arguments have not gone unchallenged. The first argument has been criticised by Lavis (2004) on the grounds that the behaviour of ρ̄ and the coarse-grained entropy predicted by Ridderbos and Redhead is an artifact of the way in which they calculated these quantities. There are two methods for calculating ρ̄. The first involves a coarse-graining of the fine-grained distribution at each instant of time; i.e. the coarse-grained distribution at time t is determined by first calculating the fine-grained distribution at time t (on the basis of the time evolution of the system and the initial distribution) and then coarse-graining it. The second method is based on re-coarse-graining as time progresses; i.e. the coarse-grained distribution at time t is calculated by evolving the coarse-grained distribution at an earlier time and then re-coarse-graining. Lavis points out that Ridderbos and Redhead use the second method: they calculate ρ̄ at time t = 2τ by evolving ρ̄ at time t = τ forward in time. For this reason, the fact that they fail to find ρ̄ returning to its initial distribution is just a manifestation of the impossibility of 'un-coarse-graining' a coarse-grained distribution. Lavis then suggests that we should determine the coarse-grained distribution at time t by using the first method, which, as he shows, yields the correct behaviour: the distribution returns to its initial form and the entropy decreases in the second half of the experiment, assuming its initial value at t = 2τ.
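The difference between the two methods can be seen in a deliberately minimal discrete model (invented for illustration: eight micro-states, coarse cells of two, and a permutation standing in for the measure-preserving micro-dynamics; this is not the Ridderbos and Redhead model). Coarse-graining and time evolution do not commute, which is why evolving an already coarse-grained distribution, as in the second method, cannot recover what coarse-graining has discarded:

```python
import numpy as np

perm = np.array([3, 6, 1, 4, 7, 2, 5, 0])  # one step of the micro-dynamics

def evolve(rho):
    """Carry probability along trajectories: state i moves to state perm[i]."""
    out = np.empty_like(rho)
    out[perm] = rho
    return out

def coarse(rho):
    """Replace rho by its average over each cell of two adjacent micro-states."""
    cells = rho.reshape(-1, 2).mean(axis=1, keepdims=True)
    return np.broadcast_to(cells, (4, 2)).reshape(-1).copy()

rho0 = np.array([1., 0, 0, 0, 0, 0, 0, 0])  # sharply concentrated initial density

method1 = coarse(evolve(rho0))   # evolve the fine density, then coarse-grain
method2 = evolve(coarse(rho0))   # coarse-grain first, then evolve the blurred density
print(np.allclose(method1, method2))  # False: the two prescriptions disagree
```

Both prescriptions conserve total probability, but they put it in different places; once the within-cell structure has been averaged away, no amount of further evolution restores it.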
Hence the echo pulse does not come as a surprise after all. The question now is which of the two coarse-graining methods one should use. Although he does not put it quite this way, Lavis' conclusion seems to be that, given that there are no physical laws that favour one method over the other, the principle of charity should lead us to choose the one that yields the correct results. Hence Ridderbos and Redhead's result has no force against coarse-graining.

As regards the second argument, both Lavis (2004) and Ainsworth (2005) point out that the decrease in entropy during the second half of the experiment need not trouble us too much. Ever since the work of Maxwell and Boltzmann, 'entropy increase in an isolated system is taken to be highly probable but not certain, and the spin-echo model, along with simulations of other simple models, is a nice example of the working of the law' (Lavis 2004, p. 686). On this view, the spin-echo experiment simply affords us one of those rare examples in which, due to skilful engineering, we can prepare a system in one of the exceptional states which evolve from high to low entropy.

67 Their own view is that the fine-grained entropy is the correct entropy and that we were wrong to believe that the entropy ever increased. Despite appearances, the thermodynamic entropy does not increase between t = 0 and t = τ, and hence there is no need for it to decrease after t = τ in order to resume its initial value at t = 2τ; it is simply constant throughout the experiment. However, this view is not uncontroversial (Sklar 1993, pp. 253-4).

3.3.5.2 Interventionism

One of the crucial assumptions, made more or less tacitly so far, is that the systems under consideration are isolated. This, needless to say, is an idealising assumption that can never be realised in practice. Real systems cannot be perfectly isolated from their environment and are always subject to interactions; for instance, it is impossible to shield a system from gravitation. Blatt (1959) suggested that taking systems to be isolated not only fails to be the harmless idealisation that it is generally believed to be; it actually is the source of the problem. This recognition is the starting point for the interventionist programme, at the heart of which lies the idea that real systems are open in that they are constantly subject to outside perturbations, and that it is exactly these perturbations that drive the system into equilibrium.

In more detail, the leading idea is that every system interacts with its environment (the gas, for instance, collides with the wall of the container and the walls interact in many different ways with their surroundings), and that these interactions are 'in principle not amenable to a causal description, and must of necessity be described in statistical terms' (Blatt 1959, p. 751, original emphasis). The perturbations from outside serve as a kind of 'stirring mechanism' or 'source of randomness' that drives the system around randomly in the phase space, in much the same way as would be the case if the system were mixing. As a consequence, the observable macroscopic quantities are soon driven towards their equilibrium values.
This includes the Gibbs entropy: in an open system Liouville's theorem no longer holds and there is nothing to prevent the Gibbs entropy from increasing.68 Of course, from the fact that the Gibbs entropy can increase it does not follow that it actually does increase; whether or not this is the case depends on the system as well as on the properties of the outside perturbations. Blatt (1959) and Ridderbos and Redhead (1998) assure us that in realistic model systems one can prove this to be the case. Granting this, we have an elegant explanation of why and how systems approach equilibrium, which also enjoys the advantage that no revision of the classical laws is needed.69

68 It is a curious fact about the literature on the subject that interventionism is always discussed within the Gibbs framework. However, it is obvious that interventionism, if true, would also explain the approach to equilibrium in the Boltzmannian framework, as it would explain why the state of the system wanders around randomly on the energy surface, which is needed for it to ultimately end up in the equilibrium region (see §3.2.3.1).

69 For a discussion of interventionism and time-reversal see Ridderbos and Redhead (1998, pp. 1259-62) and references therein.

A common objection against this suggestion points out that we are always free to consider a larger system, consisting of our 'original' system and its environment. For instance, we can consider the 'gas cum box' system, which, provided that classical mechanics is a universal theory, is also governed by classical mechanics. So we are back to where we started. Interventionism, then, seems wrong because it treats the environment as a kind of deus ex machina that is somehow 'outside physics'; but the environment is governed by the fundamental laws of physics just as the system itself is, and so it cannot do the job that the interventionist has singled out for it to do. The interventionist might now reply that the 'gas cum box' system has an environment as well, and it is this environment that effects the desired perturbations. This answer does not resolve the problem, of course. We can now consider an even larger system that also encompasses the environment of the 'gas cum box' system. And we can keep expanding our system until the relevant system is the entire universe, which, by assumption, has no environment that might serve as a source of random perturbations.

Whether this constitutes a reductio of the interventionist programme depends on one's philosophical commitments. The above argument relies on the premise that classical mechanics (or quantum mechanics, if we are working within quantum SM) is a universal theory, i.e. one that applies to everything that there is without restrictions. This assumption, although widely held among scientists and philosophers alike, is not uncontroversial. Some have argued that we cannot legitimately claim that laws apply universally. In fact, laws are always tested in highly artificial laboratory situations, and claiming that they equally apply outside the laboratory setting involves an inductive leap that is problematic. Hence we have no reason to believe that classical mechanics applies to the universe as a whole; see for instance Reichenbach (1956) and Cartwright (1999) for a discussion of this view. This, if true, successfully undercuts the above argument against interventionism.70

There is a way around the above objection even for those who do believe in the generality of laws, namely to deny Blatt's assumption that the environment needs to be genuinely stochastic.
Pace Blatt, that the environment be genuinely stochastic (i.e. governed by indeterministic laws rather than classical mechanics) is not an indispensable part of the interventionist programme. As Ridderbos and Redhead (1998, p. 1257) point out, all that is required is that the system loses coherence, which can be achieved by dissipating correlations into the environment. For observations restricted to the actual system, this means that correlational information is not available. But the information is not lost; it has just been 'dislocated' into the degrees of freedom pertaining to the environment. The question then becomes whether the universe as a whole is expected to approach equilibrium, or whether thermodynamic behaviour is only required to hold for a subsystem of the universe. Those who hold that the 'dissipation' of correlational information into environmental degrees of freedom is enough to explain the approach to equilibrium are committed to this view. Ridderbos and Redhead are explicit about this (1998, pp. 1261-2). They hold that the fine-grained Gibbs entropy of the universe is indeed constant, since the universe as a whole has no outside, and that there is no approach to equilibrium at the level of the universe. Moreover, this does not stand in conflict with the fact that cosmology informs us that the entropy of the universe is increasing; cosmological entropies are coarse-grained entropies and, as we have seen above, there is no conflict between an increase in coarse-grained entropy and the constancy of the fine-grained Gibbs entropy. Ridderbos and Redhead acknowledge that the question now is whether the claim that the Gibbs entropy of the universe is constant is true, which is an issue that has to be settled empirically.

70 Interventionists are sometimes charged with being committed to an instrumentalist take on laws, which, the critics continue, is an unacceptable point of view. This is mistaken. Whatever one's assessment of the pros and cons of instrumentalism, all the interventionist needs is the denial that the laws (or more specifically, the laws of mechanics) are universal laws. This is compatible with realism about laws understood as providing 'local' descriptions of 'parts' of the universe (a position sometimes referred to as 'local realism').

3.3.5.3 Changing the Notion of Equilibrium

One of the main problems facing Gibbsian non-equilibrium theory is that under a Hamiltonian time evolution a non-stationary distribution cannot evolve into a stationary one (see §3.3.2.4). Hence strict stationarity is too stringent a requirement for equilibrium. Nevertheless, it seems plausible to assume that an equilibrium distribution has to approximate a stationary distribution in some relevant sense. What is this relevant sense? Van Lith suggested turning the desired result into a definition and replacing strict stationarity with the requirement that the distribution be such that the phase average of every function in a physically relevant set of functions only fluctuates mildly around its average value (van Lith 1999, p. 114). More precisely, let Ω be a class of phase functions f(x) corresponding to macroscopically relevant quantities. Then the system is in equilibrium from time τ onwards iff for every function f(x) ∈ Ω there is a constant c_f such that:

\left| \int f(x)\, \rho_t(x)\, d\Gamma - c_f \right| \leq \varepsilon_f ,   (3.35)

where ε_f is a small number (which can be different for every f). This definition of equilibrium seems to have the advantage of preserving all the desirable features of equilibrium while no longer running into the problem that equilibrium can never be reached. However, from the fact that an arbitrary non-equilibrium distribution can reach equilibrium thus defined it does not follow that it actually does. What conditions does the dynamics of a system have to meet in order for the approach to equilibrium to take place? Van Lith points out that being mixing is a sufficient condition (van Lith 1999, p. 114), because the Convergence Theorem (see §3.2.4.1) states that in the limit all time averages converge to the microcanonical averages, and hence they satisfy the above definition.

But this proposal suffers from various problems. First, as van Lith herself points out (1999, p. 115), the proposal does not contain a recipe to get the (fine-grained) Gibbs entropy moving; hence the approach to equilibrium need not be accompanied by a corresponding increase in the Gibbs entropy.
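Operationally, eq. 3.35 just asks whether a time series of ensemble averages stays inside an ε-band around a constant from τ onwards. The following sketch exercises the definition on an invented relaxation profile (the function, the times, and all tolerances are illustrative, not anything in van Lith's paper):

```python
import numpy as np

def in_equilibrium_from(averages, times, tau, c_f, eps_f):
    """Check eq. 3.35 for one phase function along a sampled trajectory:
    from time tau onwards, the average must stay within eps_f of c_f."""
    later = averages[times >= tau]
    return bool(np.all(np.abs(later - c_f) <= eps_f))

times = np.linspace(0, 10, 1001)
# A relaxing observable: a decaying transient plus a small residual fluctuation.
avg_f = 1.0 + 2.0 * np.exp(-times) + 0.01 * np.sin(20 * times)

print(in_equilibrium_from(avg_f, times, tau=0.0, c_f=1.0, eps_f=0.05))  # False: transient too large
print(in_equilibrium_from(avg_f, times, tau=5.0, c_f=1.0, eps_f=0.05))  # True: fluctuations stay in the band
```

The same trajectory thus counts as non-equilibrium or equilibrium depending on the chosen τ, which is exactly the intended behaviour: equilibrium holds from some time onwards, not eternally.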


Second, as we have seen above, mixing is too stringent a condition: it is not met by many systems of interest. Remedy for this might be found in the realisation that less than full-fledged mixing is needed to make the above suggestion work. In fact, all we need is a condition that guarantees that the Convergence Theorem holds (Earman and Redei 1996, p. 74; van Lith 1999, p. 115). One condition of that sort is that the system has to be mixing for all f ∈ Ω. The question then is, what this involves. This question is difficult, if not impossible to answer, before Ω is precisely specified. And even then there is the question of whether the convergence is sufficiently rapid to account for the fact that thermodynamic systems reach equilibrium rather quickly.71 3.3.5.4 Alternative Approaches Before turning to the epistemic approach, I would like to briefly mention three other approaches to non-equilibrium SM; lack of space prevents me from discussing them in more detail. Stochastic Dynamics. The leading idea of this approach is to replace the Hamiltonian dynamics of the system with an explicity probabilistic law of evolution. Characteristically this is done by coarse-graining the phase space and then postulating a probabilistic law describing the transition from one cell of the partition to another one. Liouville’s theorem is in general not true for such a dynamics and hence the problem of the constancy of the Gibbs entropy does not arise. Brief introductions can be found in Kreuzer (1981, Chapter 10), Reif (1985, Chapter 15) and Honerkamp (1998, Chapter 5); detailed expositions of the approach include Penrose (1970; 1979), Mackey (1989; 1992), and Streater (1995). 
The main problem with this approach is that its probabilistic laws are put in ‘by hand’ and are not derived from the underlying dynamics of the system; that is, it is usually not possible to derive the probabilistic laws from the underlying deterministic evolution, and hence they have to be introduced as independent postulates. Unless one can show how the transition probabilities postulated in this approach can be derived from the Hamiltonian equations of motion governing the system, this approach does not shed light on how thermodynamic behaviour emerges from the fundamental laws governing a system’s constituents. For critical discussions of the stochastic dynamics programme see Sklar (1993, Chapters 6 and 7), Callender (1999, pp. 358–64) and Uffink (2007, pp. 1038–63).

The Brussels School (sometimes also ‘Brussels-Austin School’). An approach closely related to the stochastic dynamics programme has been put forward by the so-called Brussels School, led by Ilya Prigogine. The central contention of this programme is that if the system exhibits sensitive dependence on initial conditions (and most systems do), the very idea of a precise micro-state given by a point in phase space ceases to be meaningful and should be replaced by an

71 Another alternative definition of equilibrium, which also applies to open systems, has been suggested by Pitowsky (2001, 2006), but for lack of space I cannot further discuss this suggestion here.


explicitly probabilistic description of the system in terms of open regions of the phase space, i.e. by a Gibbs distribution function. This programme, if successful, can be seen as providing the sought-after justification for the above-mentioned shift from a Hamiltonian micro-dynamics to an explicitly probabilistic scheme. These claims have been challenged on different grounds; for presentations and critical discussions of the ideas of the Brussels School see Batterman (1991), Bricmont (1996), Karakostas (1996), Lombardi (1999, 2000), Edens (2001) and Bishop (2004). An approach that is similar to the programme of the Brussels School, in that it denies that the conceptual framework of classical mechanics, in particular the classical notion of a state, is adequate to understand SM, has been suggested by Krylov. Unfortunately he died before he could bring his programme to completion, and so it is not clear what form his ideas would have taken in the end. For philosophical discussions of Krylov’s programme see Batterman (1990), Rédei (1992) and Sklar (1993, pp. 262–9).

The BBGKY Hierarchy. The main idea of the BBGKY (after Bogolyubov, Born, Green, Kirkwood, and Yvon) approach is to describe the evolution of an ensemble by dint of a reduced probability density and then to derive (something like) a Boltzmann equation for this density, which yields the approach to equilibrium. The problem with the approach is that, just as in the case of Boltzmann’s (early) theory, the irreversibility is a result of (something like) the Stosszahlansatz, and hence all its difficulties surface again at this point. For a discussion of this approach see Uffink (2007, pp. 1034–8).

3.3.6 The Epistemic Approach

The approaches discussed so far are based on the assumption that SM probabilities are ontic (see §3.2.3.2). It is this assumption that those who argue for an epistemic interpretation deny. They argue that SM probabilities are an expression of what we know about a system, rather than a feature of the system itself. This view can be traced back to Tolman (1938) and has been developed into an all-encompassing approach to SM by Jaynes in a series of papers published (roughly) between 1955 and 1980, some of which are gathered in Jaynes (1983).72 At the heart of Jaynes’ approach to SM lies a radical reconceptualisation of what SM is. On his view, SM is about our knowledge of the world, not about the world itself. The probability distribution represents our state of knowledge about the system at hand and not some matter of fact about the system itself. More specifically, the distribution represents our lack of knowledge about a system’s micro-state given its macro-condition; and, in particular, entropy becomes a measure of how much knowledge we lack. As a consequence, Jaynes regards SM as a part of general statistics, or ‘statistical inference’, as he puts it:

72 In this subsection I focus on Jaynes’ approach. Tolman’s view is introduced in his (1938, pp. 59–70); for a discussion of Tolman’s interpretation of probability see Uffink (1995b, pp. 166–7).


Indeed, I do not see Predictive Statistical Mechanics and Statistical Inference as different subjects at all; the former is only a particular realization of the latter [...] Today, not only do Statistical Mechanics and Statistical Inference not appear to be two different fields, even the term ‘statistical’ is not entirely appropriate. Both are special cases of a simple and general procedure that ought to be called, simply, “inference”. (Jaynes 1983, pp. 2–3)

The questions then are: in what way a probability distribution encodes a lack of knowledge; according to what principles the correct distribution is determined; and how this way of thinking about probabilities sheds any light on the foundations of SM. The first and the second of these questions are addressed in §3.3.6.1; the third is discussed in §3.3.6.2.

3.3.6.1 The Shannon Entropy

Consider a random variable x which can take any of the m discrete values in X = {x_1, ..., x_m} with probabilities p(x_i); for instance, x can be the number of spots showing on the next roll of a die, in which case X = {1, 2, 3, 4, 5, 6} and the probability for each event is 1/6. The Shannon entropy of the probability distribution p(x_i) is defined (Shannon 1949) as

S_S(p) := -\sum_{i=1}^{m} p(x_i) \log(p(x_i)),    (3.36)

which is a quantitative measure of the uncertainty of the outcome. If the probability of one particular outcome is one while the probabilities of all other outcomes are zero, then there is no uncertainty and S_S equals zero; S_S reaches its maximum for the uniform probability distribution, i.e. p(x_i) = 1/m for all i, in which case we are maximally uncertain about the outcome. An accessible discussion of the relation between the Shannon entropy and uncertainty can be found in Jaynes (1994, Chapter 11); see Cover and Thomas (1991) for a detailed treatment.

Sometimes we are given X but fail to know the p(x_i). In this case Jaynes’s maximum entropy principle (MEP) instructs us to choose that distribution p(x_i) for which the Shannon entropy is maximal (under the constraint \sum_{i=1}^{m} p(x_i) = 1). For instance, from this principle it follows immediately that we should assign p = 1/6 to each number of spots when rolling the die. If there are constraints that need to be taken into account, then MEP instructs us to choose that distribution for which S_S is maximal under the given constraints. The most common type of constraint is that the expectation value of a particular function f has a given value c:

\langle f \rangle := \sum_{i=1}^{m} f(x_i) p(x_i) = c.    (3.37)
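MEP at work can be illustrated with a small numerical sketch (hypothetical, and not part of the original discussion; the function names are invented). With no constraint beyond normalisation, MEP returns the uniform distribution over the die’s faces; constraining the expected number of spots to, say, 4.5 instead of the unconstrained 3.5 yields a maximiser of the exponential form p_i ∝ exp(−λx_i), with the multiplier λ located here by simple bisection:

```python
import math

def shannon_entropy(p):
    """S_S(p) = -sum_i p_i log p_i, cf. eq. (3.36); natural logarithm."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def maxent_given_mean(xs, c):
    """Maximise S_S subject to normalisation and <f> = c, cf. eq. (3.37).
    The maximiser has the exponential form p_i ~ exp(-lam * x_i); the
    multiplier lam is found by bisection on the resulting mean."""
    def mean(lam):
        w = [math.exp(-lam * x) for x in xs]
        return sum(x * v for x, v in zip(xs, w)) / sum(w)
    lo, hi = -50.0, 50.0          # mean(lam) decreases as lam grows
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mean(mid) > c else (lo, mid)
    lam = 0.5 * (lo + hi)
    w = [math.exp(-lam * x) for x in xs]
    z = sum(w)
    return [v / z for v in w]

# Unconstrained case: the uniform distribution, with maximal entropy log 6.
uniform = [1.0 / 6] * 6
assert abs(shannon_entropy(uniform) - math.log(6)) < 1e-12

# Constrained case: a die whose mean number of spots is 4.5; MEP shifts
# probability weight monotonically towards the high faces.
p = maxent_given_mean([1, 2, 3, 4, 5, 6], 4.5)
```

Any distribution over the six outcomes other than the uniform one has strictly smaller entropy, and the constrained maximiser is strictly increasing in the number of spots, as one would expect.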

This can be generalised to the case of a continuous variable, i.e. X = (a, b), where (a, b) is an interval of real numbers (the boundaries of this interval can be finite or infinite). The continuous Shannon entropy is

S_S(p) := -\int_a^b p(x) \log[p(x)]\, dx,    (3.38)

where p(x) is a probability density over (a, b).73 So for a continuous variable the most common type of constraint is

\langle f \rangle := \int_a^b f(x) p(x)\, dx = c,    (3.39)

and MEP tells us to choose p(x) such that it maximises S_S(p) under the given constraints.

Why is MEP compelling? The intuitive idea is that we should always choose the distribution that corresponds to a maximal amount of uncertainty, i.e. that is maximally non-committal with respect to the missing information. But why is this a sound rule? In fact MEP is fraught with controversy, and, to date, no consensus on its significance, or even its cogency, has been reached. However, debates over the validity of MEP belong to the foundations of statistical inference in general and as such are beyond the scope of this review; for discussions see, for instance, Lavis (1977), Denbigh and Denbigh (1985), Lavis and Milligan (1985, §5), Shimony (1985), Seidenfeld (1986), Uffink (1995a; 1996a) and Howson and Urbach (2006, pp. 276–88). In what follows let us, for the sake of argument, assume that MEP can be justified satisfactorily and discuss what it has to offer for the foundations of SM.

But before moving on, a remark about the epistemic probabilities here employed is in order. On the current view, epistemic probabilities are not subjective; that is, they do not reduce to the personal opinion of individual observers, as would be the case in a personalist Bayesian theory (such as de Finetti’s). On the contrary, Jaynes advocates an ‘impersonalism’ that bases probability assignments solely on the available data and MEP; anybody’s personal opinions do not enter the scene at any point. Hence, referring to Jaynes’ position as ‘subjectivism’ is a frequently used misnomer.

3.3.6.2 MEP and SM

The appeal of MEP for equilibrium SM lies in the fact that the continuous Shannon entropy is equivalent to the Gibbs entropy (3.28), up to the multiplicative constant k_B, if in eq. 3.38 we take X to be the phase space and ρ the probability distribution. Gibbsian equilibrium distributions are required to maximise S_G under certain constraints and hence, trivially, they also satisfy MEP.
For an isolated system, for instance, the maximum entropy distribution is the microcanonical distribution. In fact, even more has been achieved: MEP not only coincides with the Gibbsian maximum entropy principle introduced in §3.3.1; on the current view, this principle, which above has been postulated

73 This generalisation is problematic in many respects; for the continuum limit to be taken properly, a background measure and the relative entropy need to be introduced. In the simplest case, where the background measure is the Lebesgue measure, we retrieve eq. 3.38. For a discussion of this issue see Uffink (1995a, pp. 235–9).


without further explanation, is justified because it can be understood as a version of MEP.

As we have seen at the beginning of this subsection, Jaynes sees the aim of SM as making predictions, as drawing inferences. This opens a new perspective on non-equilibrium SM, which, according to Jaynes, should refrain from trying to explain the approach to equilibrium by appeal to dynamical or other features of the system and should only aim to make predictions about the system’s future behaviour (1983, p. 2). Once this is understood, the puzzle of the approach to equilibrium has a straightforward two-step answer. In Sklar’s (1993, pp. 255–257) reconstruction, the argument runs as follows.

The first step consists in choosing the initial distribution. Characteristic non-equilibrium situations usually arise from the removal of a constraint (e.g. the opening of a shutter) in a particular equilibrium situation. Hence the initial distribution is chosen in the same way as an equilibrium distribution, namely by maximising the Shannon entropy S_S relative to the known macroscopic constraints. Let ρ_0(q, p, t_0) be that distribution, where t_0 is the instant of time at which the constraint in question is removed. Assume now that the experimental set-up is such that a set of macroscopic parameters corresponding to the phase functions f_i, i = 1, ..., k, are measured. At time t_0 these have the expected values

\bar{f}_i(t_0) = \int f_i(q, p)\, \rho_0(q, p, t_0)\, d\Gamma, \quad i = 1, ..., k.    (3.40)

Furthermore, we assume that the entropy which we determine in an actual experiment, the experimental entropy S_e, at time t_0 is equal to the Gibbs entropy: S_e(t_0) = S_S(ρ_0(t_0)).

The second step consists in determining the distribution and the entropy of the system at some time t_1 > t_0.
To this end we first use Liouville’s equation to determine the image of the initial distribution under the dynamics of the system, ρ_0(t_1), and then calculate the expectation values of the observable parameters at time t_1:

\bar{f}_i(t_1) = \int f_i(q, p)\, \rho_0(q, p, t_1)\, d\Gamma, \quad i = 1, ..., k.    (3.41)

Now we calculate a new density ρ_1(q, p, t_1) which maximises the Shannon entropy under the constraints

\int f_i(q, p)\, \rho_1(q, p, t_1)\, d\Gamma = \bar{f}_i(t_1), \quad i = 1, ..., k.    (3.42)

The experimental entropy of the system at t_1 then is S_e(t_1) = S_S(ρ_1(t_1)). This entropy is greater than or equal to S_e(t_0), for the following reason. By Liouville’s theorem we have S_S(ρ_0(t_0)) = S_S(ρ_0(t_1)). Both ρ_0(t_1) and ρ_1(t_1) satisfy the constraints in Equation (3.42). By construction, S_S(ρ_1(t_1)) is maximal relative to these constraints; this need not be the case for S_S(ρ_0(t_1)). Therefore S_e(t_0) ≤ S_e(t_1). This is Jaynes’ justification of the Second Law.
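The two-step structure of the argument can be mimicked in a discrete toy model (a hypothetical sketch: the six-state space, the region A standing in for the macroscopic constraint, and the permutation standing in for the Liouville evolution are all invented for illustration). The permutation conserves the Shannon entropy exactly, playing the role of Liouville’s theorem, while the re-maximisation can only raise it:

```python
import math

def shannon(p):
    """Shannon entropy -sum p_i log p_i (natural logarithm)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def maxent_given_prob_A(A, n, c):
    """Maximise S_S over n micro-states subject to Prob(A) = c: the
    maximiser is uniform inside A and uniform outside it."""
    nA = len(A)
    return [c / nA if i in A else (1 - c) / (n - nA) for i in range(n)]

n = 6
A = {0, 1}                       # macro-region; the observable is 1_A
perm = [2, 0, 4, 1, 5, 3]        # toy measure-preserving dynamics

# Step 1: the initial distribution from MEP, given Prob(A) = 0.9.
rho0 = maxent_given_prob_A(A, n, 0.9)
S0 = shannon(rho0)

# Evolve by the permutation; entropy is exactly conserved, the discrete
# analogue of Liouville's theorem.
rho_t = [0.0] * n
for i, j in enumerate(perm):
    rho_t[j] = rho0[i]
assert abs(shannon(rho_t) - S0) < 1e-12

# Step 2: read off the new value of Prob(A), the analogue of (3.41),
# and re-maximise the entropy under that constraint, as in (3.42).
c1 = sum(rho_t[i] for i in A)
rho1 = maxent_given_prob_A(A, n, c1)
S1 = shannon(rho1)               # S_e(t1) >= S_e(t0), by construction
```

Since rho_t satisfies the constraint Prob(A) = c1 but is in general not of the maximum-entropy form, the inequality is typically strict, exactly as in Sklar’s reconstruction.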


Jaynes’ epistemic approach to SM has several interesting features. Unlike the approaches that we have discussed so far, it offers a clear and cogent interpretation of SM probabilities, which it views as rational degrees of belief. This interpretation enjoys the further advantage over its ontic competitors that it can dispense with ensembles. On Jaynes’ approach there is only one system, the one on which we are performing our experiments, and viewing probabilities as reflecting our lack of knowledge about this system, rather than some sort of frequency, renders ensembles superfluous. Most importantly, the problems that have so far beset non-equilibrium theory no longer arise: the constancy of the Gibbs entropy becomes irrelevant because of the ‘re-maximising’ at time t_1, and the stationarity of the equilibrium distribution is no longer an issue because the dynamics of the probability distribution is now a function of both our epistemic situation and the dynamics of the system, rather than of Liouville’s equation alone. And last but not least, as Jaynes himself often emphasised, all this is achieved without appealing to complex mathematical properties like ergodicity or even mixing.

3.3.6.3 Problems

Let us discuss Jaynes’ approach to non-equilibrium SM first. Consider a sequence t_0 < t_1 < t_2 < ... of increasing instants of time and consider the entropies S_e(t_j), j = 0, 1, ..., at these instants; all the S_e(t_j), j ≥ 2, are calculated with eq. 3.42 after substituting ρ_j for ρ_1. Conformity with the Second Law would require that S_e(t_0) ≤ S_e(t_1) ≤ S_e(t_2) ≤ .... However, this is generally not the case (Lavis and Milligan 1985, p. 204; Sklar 1993, pp. 257–8), because the experimental entropy S_e is not necessarily a monotonically increasing function.
Jaynes’ algorithm for calculating the S_e(t_j) can only establish that S_e(t_0) ≤ S_e(t_j) for all j > 0; it fails to show that S_e(t_i) ≤ S_e(t_j) for all 0 < i < j, and it is indeed possible that S_e(t_i) > S_e(t_j) for some i < j. A way around this difficulty would be to use ρ_{j-1}(t_{j-1}), rather than ρ_0(t_0), to calculate ρ_j(t_j). This would make the sequence monotonic, but it would have the disadvantage that the entropy curve becomes dependent on the sequence of instants of time chosen (Lavis and Milligan ibid.). This seems odd even from a radically subjectivist point of view: why should the value of S_e at a particular instant of time depend on the earlier instants at which we chose to make predictions, or, worse, why should it depend on our having made any predictions at all?

In equilibrium theory, a problem arises that is similar to the one we discussed in connection with the ergodic approach (§3.3.3). As we have seen in Equation (3.40), Jaynes also assumes that experimental outcomes correspond to phase averages as given in Equation (3.25). But why should this be the case? It is correct that we should rationally expect the mean value of a sequence of measurements to coincide with the phase average, but prima facie this does not imply anything about individual measurements. For instance, when throwing a die we expect the mean of a sequence of throws to be 3.5; but we surely don’t expect the die to show 3.5 spots after any single throw! So why should we expect the outcome of a


measurement of a thermodynamic parameter to coincide with the phase average? For this to be the case a further assumption seems to be needed, for instance (something like) Khinchin’s assumption that the relevant phase functions assume almost the same value for almost all points of phase space (see §3.3.3.2).

A further problem is that the dynamics of the system does not play any rôle in Jaynes’ derivation of the microcanonical distribution (or any other equilibrium distribution that can be derived using MEP). This seems odd because even if probability distributions are eventually about our (lack of) knowledge, what we can and cannot know must have something to do with how the system behaves. This point becomes particularly clear from the following considerations (Sklar 1993, pp. 193–4). Jaynes repeatedly emphasised that ergodicity, or the failure thereof, does not play any rôle in his account. This cannot be quite true. If a system is not ergodic then the phase space decomposes into two (or more) invariant sets (see §3.2.4.1). Depending on what the initial conditions are, the system’s state may be confined to some particular invariant set, where the relevant phase functions have values that differ from the phase average; as a consequence MEP leads to wrong predictions. This problem can be solved by searching for the ‘overlooked’ constants of motion and then controlling for them, which yields the correct results.74 However, the fact remains that our original probability assignment was wrong, and this was because we ignored certain important dynamical features of the system. Hence the correct application of MEP depends, after all, on dynamical features of the system. More specifically, the microcanonical distribution seems to be correct only if there are no such invariant subsets, i.e. if the system is ergodic.
A final family of objections has to do with the epistemic interpretation of probabilities itself (rather than with ‘technical’ problems in connection with the application of the MEP formalism). First, the Gibbs entropy is defined in terms of the distribution ρ, and if ρ pertains to our epistemic situation rather than to (aspects of) the system, it, strictly speaking, does not make any sense to say that entropy is a property of the system; rather, entropy is a property of our knowledge of the system. Second, in the Gibbs approach equilibrium is defined in terms of specific properties that the distribution ρ must possess at equilibrium (see §3.3.1). Now the same problem arises: if ρ reflects our epistemic situation rather than facts about the system, then it does not make sense to say that the system is in equilibrium; if anything, it is our knowledge that is in equilibrium. This carries over to the non-equilibrium case. If ρ is interpreted epistemically, then the approach to equilibrium also pertains to our knowledge and not to the system. This has struck many commentators as outright wrong, if not nonsensical. Surely, the boiling of kettles or the spreading of gases has something to do with how the molecules constituting these systems behave and not with what we happen (or fail) to know about them (Redhead 1995, pp. 27–8; Albert 2000, p. 64; Goldstein 2001, p. 48; Loewer 2001, p. 611). Of course, nothing is sacred, but further explanation is needed if such a radical conceptual shift is to appear plausible.

74 Quay (1978, pp. 53–4) discusses this point in the context of ergodic theory.

Against the first point Jaynes argues that entropy is indeed epistemic even in TD (1983, pp. 85–6) because there is no such thing as the entropy of a physical system. In fact, the entropy is relative to what variables one chooses to describe the system; depending on how we describe the system, we obtain different entropy values. From this Ben-Menahem (2001, §3) draws the conclusion that, Jaynes’ insistence on knowledge notwithstanding, one should say that entropy is relative to descriptions rather than to knowledge, which would considerably mitigate the force of the objection. This ties in with the fact (mentioned in Sklar 1999, p. 195) that entropy is only defined by its function in the theory (both in TD and in SM); we neither have phenomenal access to it, nor are there instruments that directly measure entropy. These points do, to some extent, render an epistemic (or descriptive) understanding of entropy more plausible; whether they in any way mitigate the implausibility that attaches to an epistemic understanding of equilibrium and of the approach to equilibrium remains an open question.

3.3.7 Reductionism

How does the Gibbsian approach fare with reducing TD to SM? The aim of a reduction is the same as in the Boltzmannian case: deduce a revised version of the laws of TD from SM (see §3.2.8). The differences lie in the kind of revisions that are made. I first discuss those approaches that proffer an ontic understanding of probabilities and then briefly discuss how reduction could be construed.

Boltzmann took over from TD the notion that entropy and equilibrium are properties of an individual system and sacrificed the idea that equilibrium (and the associated entropy values) are stationary. Gibbs, on the contrary, retains the stationarity of equilibrium, but at the price of making entropy and equilibrium properties of an ensemble rather than of an individual system. This is because both equilibrium and entropy are defined in terms of the probability distribution ρ, which is a distribution over an ensemble and not over an individual system. Since a particular system can be a member of many different ensembles, one can no longer assert that an individual system is in equilibrium. This ‘ensemble character’ carries over to other physical quantities, most notably temperature, which are also properties of an ensemble and not of an individual system. This is problematic because the state of an individual system can change considerably as time evolves while the ensemble average does not change at all; so we cannot infer the behaviour of an individual system from the behaviour of an ensemble. However, what we are dealing with in experimental contexts are individual systems, and so the shift to ensembles has been deemed inadequate by some. Maudlin (1995, p. 147) calls it a ‘Pyrrhic victory’, and Callender (1999) argues that this and related problems disqualify the Gibbs approach as a serious contender for a reduction of TD.

It is worth observing that Gibbs himself never claimed to have reduced TD


to SM and only spoke about ‘thermodynamic analogies’ when discussing the relation between TD and SM; see Uffink (2007, pp. 994–6) for a discussion. The notion of analogy is weaker than that of reduction, but it is at least an open question whether this is an advantage. If the analogy is based on purely algebraic properties of certain variables, then it is not clear what, if anything, SM contributes to our understanding of thermal phenomena; if the analogy is more than a merely formal one, then at least some of the problems that we have been discussing in connection with reduction are bound to surface again.

3.4 Conclusion

Before drawing some general conclusions from the discussion in Sections 3.2 and 3.3, I would like to briefly mention some of the issues that, for lack of space, I could not discuss.

3.4.1 Sins of Omission

SM and the Direction of Time. The discussion of irreversibility so far has focused on the problem of the directionality of change in time. One can take this one step further and claim that this directionality in fact constitutes the direction of time itself (the ‘arrow of time’). Attempts to underwrite the arrow of time by an appeal to the asymmetries of thermodynamics and SM can be traced back to Boltzmann, and have been taken up by many since. The literature on the problem of the direction of time is immense and it is impossible to give a comprehensive bibliography here; instead I mention just some approaches that are closely related to SM. The modern locus classicus for a view that seeks to ground the arrow of time on the flow of entropy is Reichenbach (1956). Earman (1974) offers a sceptical take on this approach and provides a categorisation of the different issues at stake. These are further discussed in Sklar (1981; 1993, Chapter 10), Price (1996; 2002a; 2002b), Horwich (1987), Callender (1998), Albert (2000, Chapter 6), Brown (2000), North (2002), Castagnino and Lombardi (2005), Hagar (2005) and Frisch (2006).

The Gibbs paradox. Consider a container that is split into two halves by a barrier in the middle. The left half is filled with gas G1, the right half with a different gas G2; both gases have the same temperature. Now remove the barrier. As a result both gases start to spread and get mixed. We then calculate the entropy of the initial and the final state and find that the entropy of the mixture is greater than the entropy of the gases in their initial compartments. This is the result that we would expect. The paradox arises from the fact that the calculations do not depend on the fact that the gases are different; that is, if we assume that we have air of the same temperature on both sides of the barrier, the calculations still yield an increase in entropy when the barrier is removed.
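The calculation behind the paradox can be sketched numerically (a schematic, hypothetical example: temperature-dependent terms and physical constants are dropped, k_B is set to 1, and the function names are invented). Without the 1/N! factor, the entropy increase on removing the barrier is 2Nk ln 2 whether the gases are different or identical; dividing the number of states by N!, the ‘textbook solution’ discussed in what follows, removes the spurious increase for identical gases while leaving it intact for different ones:

```python
import math

def S_naive(N, V):
    # Configurational part of the classical ideal-gas entropy, counting
    # permuted micro-states as distinct (k_B = 1, temperature terms dropped).
    return N * math.log(V)

def S_corrected(N, V):
    # Same, but with 'correct Boltzmann counting': divide by N!, using
    # Stirling's approximation ln N! ~ N ln N - N.
    return N * math.log(V) - (N * math.log(N) - N)

N, V = 1000, 1.0        # N particles in each half, each half of volume V

# Two different gases: after mixing, each species spreads over 2V.
dS_different = 2 * S_naive(N, 2 * V) - 2 * S_naive(N, V)   # = 2N ln 2

# The same gas on both sides, naive counting: the very same 'increase'
# appears, although nothing thermodynamically relevant has happened.
dS_same_naive = 2 * S_naive(N, 2 * V) - 2 * S_naive(N, V)

# The same gas with the 1/N! correction: afterwards we have 2N particles
# in 2V, and the entropy of mixing vanishes exactly.
dS_same = S_corrected(2 * N, 2 * V) - 2 * S_corrected(N, V)
```

With the correction in place, different gases still gain 2Nk ln 2 on mixing (each species keeps its own N! factor), so only the increase for identical gases disappears.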
This seems wrong since it would imply that the entropy of a gas depends on its history and cannot be a function of its thermodynamic state alone (as thermodynamics requires). What has gone wrong? The standard ‘textbook solution’ of this problem is that classical SM gets the entropy wrong because it makes a mistake when

176 RECENT WORK ON THE FOUNDATIONS OF STATISTICAL MECHANICS

counting states (see for instance Huang 1963, pp. 153–4; Greiner et al. 1993, pp. 206–8). The alleged mistake is that we count states that differ only by a permutation of two indistinguishable particles as distinct, while we should not do this. Hence the culprit is a flawed notion of individuality, which is seen as inherent to classical mechanics. The solution, so the argument goes, is provided by quantum mechanics, which treats indistinguishable particles in the right way. This argument raises a plethora of questions concerning the nature of individuality in classical and quantum mechanics, the way of counting states in both the Boltzmann and the Gibbs approach, and the relation of SM to thermodynamics. These issues are discussed in Rosen (1964), Lande (1965), van Kampen (1984), Denbigh and Denbigh (1985, Chapter 4), Denbigh and Redhead (1989), Jaynes (1992), Redhead and Teller (1992), Mosini (1995), Costantini and Garibaldi (1997, 1998), Huggett (1999), Gordon (2002) and Saunders (2006).

Maxwell’s Demon. Imagine the following scenario, originating in a letter of Maxwell’s written in 1867. Take two gases of different temperature that are separated from one another only by a wall. This wall contains a shutter, which is operated by a demon who carefully observes all molecules. Whenever a particle moves towards the shutter from the colder side and the particle’s velocity is greater than the mean velocity of the particles in the hotter gas, the demon opens the shutter and lets the particle pass through. Similarly, when a particle heads for the shutter from within the hotter gas and the particle’s velocity is lower than the mean velocity of the particles of the colder gas, the demon lets the particle pass through the shutter. The net effect of this is that the hotter gas becomes even hotter and the colder one even colder.
So we have a heat transfer from the cooler to the hotter gas without any work being done; it is only the skill and intelligence of the demon, who is able to sort molecules, that brings about the heat transfer. But such a heat transfer is not allowed by the Second Law of thermodynamics; the conclusion, then, is that the demon has produced a violation of the Second Law. In Maxwell’s own interpretation, this thought experiment shows that the Second Law is not an exceptionless law; rather, it describes a general tendency for systems to behave in a certain way or, as he also puts it, it shows that the Second Law has only ‘statistical certainty’. Since Maxwell, the demon has had a colourful history. In particular, in the wake of Szilard’s work, much attention has been paid to the entropy costs of processing and storing information. These issues are discussed in Daub (1970), Klein (1970), Leff and Rex (1990; 2003), Shenker (1997; 1999), Earman and Norton (1998; 1999), Albert (2000, Chapter 5), Bub (2001), Bennett (2003), Norton (2005), Maroney (2005) and Ladyman et al. (2007).

Entropy. There are a number of related but not equivalent concepts denoted by the umbrella term ‘entropy’: thermodynamic entropy, Shannon entropy, Boltzmann entropy (fine-grained and coarse-grained), Gibbs entropy (fine-grained and coarse-grained), Kolmogorov-Sinai entropy, von Neumann entropy and fractal entropy, to mention just the most important ones. It is not always clear how


these relate to one another, or to other important concepts such as algorithmic complexity and informational content. Depending on how these relations are construed and on how the probabilities occurring in most definitions of entropy are interpreted, different pictures emerge. Discussions of these issues can be found in Grad (1961), Jaynes (1983), Wehrl (1978), Denbigh and Denbigh (1985), Denbigh (1989b), Barrett and Sober (1992; 1994; 1995), Smith et al. (1992), Denbigh (1994), Frigg (2004), Balian (2005), and, with a particular focus on entropy in quantum mechanics, in Shenker (1999), Henderson (2003), Timpson (2003), Campisi (2005, 2008), Sorkin (2005) and Hemmo and Shenker (2006). The relation between entropy and counterfactuals is discussed in Elga (2001) and Kutach (2002).

Quantum Mechanics and Irreversibility. This review has been concerned with the problem of somehow ‘extracting’ time-asymmetric macro laws from time-symmetric classical micro laws. How does this project change if we focus on quantum rather than classical mechanics? Prima facie we are faced with the same problems, because the Schrödinger equation is time-reversal invariant (if we allow replacing the wave function by its complex conjugate when evolving it backwards in time). However, in response to the many conceptual problems of quantum mechanics, new interpretations of quantum mechanics, or even alternative quantum theories, have been suggested, some of which are not time-reversal invariant. Dynamical reduction theories (such as GRW theory) build state collapses into the fundamental equation, which thereby ceases to be time-reversal invariant. Albert (1994a; 1994b; 2000, Chapter 7) has suggested that this time asymmetry can be exploited to underwrite thermodynamic irreversibility; this approach is discussed in Uffink (2002).
Another approach has been suggested by Hemmo and Shenker who, in a series of papers, develop the idea that we can explain the approach to equilibrium by environmental decoherence (2001; 2003; 2005).

Phase Transitions. Most substances, for instance water, can exist in different phases (liquid, solid, gas). Under suitable conditions, so-called phase transitions can occur, meaning that the substance changes from, say, the liquid to the solid phase. How can the phenomenon of phase transitions be understood from a microscopic point of view? This question is discussed in Sewell (1986, Chapters 5-7), Lebowitz (1999), Liu (2001) and Emch and Liu (2002, Chapters 11-14).

SM methods outside physics. Can the methods of SM be used to deal with problems outside physics? In some cases this seems to be so. Costantini and Garibaldi (2004) present a generalised version of the Ehrenfest flea model and show that it can be used to describe a wide class of stochastic processes, including problems in population genetics and macroeconomics. The methods of SM have also been applied to financial markets, in a discipline now known as ‘econophysics’; see Voit (2005) and Rickles (2008).
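The Ehrenfest flea (or urn) model just mentioned is easy to simulate; the following is a minimal sketch, with arbitrary parameter values and seed. N fleas sit on two dogs, and at each step a randomly chosen flea jumps to the other dog; however the fleas are distributed initially, the occupation number drifts towards the equal split N/2 and thereafter fluctuates around it:

```python
import random

random.seed(0)

N = 100                      # total number of fleas
on_dog_A = N                 # start far from equilibrium: all on dog A
history = []

for step in range(10_000):
    # Pick a flea uniformly at random; it jumps to the other dog. The
    # jump lowers on_dog_A with probability on_dog_A / N, so the drift
    # always points towards the equal split.
    if random.randrange(N) < on_dog_A:
        on_dog_A -= 1
    else:
        on_dog_A += 1
    history.append(on_dog_A)

# After a relaxation period of order N steps, the occupation number
# fluctuates around N/2 with a spread of order sqrt(N)/2.
late = history[5_000:]
average = sum(late) / len(late)
```

The stationary distribution of this chain is binomial, which is why the fluctuations around N/2 remain small relative to N; this is the stochastic analogue of the approach to, and persistence of, equilibrium.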

178 RECENT WORK ON THE FOUNDATIONS OF STATISTICAL MECHANICS

3.4.2 Summing Up

The foremost problem of the foundations of SM is the lack of a generally accepted and universally used formalism, which leads to a kind of schizophrenia in the field. The Gibbs formalism has a wider range of application and is mathematically superior to the Boltzmannian approach, and it is therefore the practitioner's workhorse: virtually all practical applications of SM are based on the Gibbsian machinery. The weight of successful applications notwithstanding, a consensus has emerged over the last decade and a half that the Gibbs formalism cannot explain why SM works and that, when it comes to foundational issues, the Boltzmannian approach is the only viable option (see Lavis (2005) and references therein). Hence, whenever the question arises of why SM is so successful, an explanation is given in Boltzmannian terms. This is problematic for at least two reasons. First, at least in its current form, the Boltzmann formalism has a very limited range of applicability: it applies only to non-interacting (or very weakly interacting) particles, while at the same time it is generally accepted that the Past Hypothesis, an assumption about the universe as a whole, is needed to make it work. But the universe as a whole is not a collection of weakly interacting systems, not even approximately. Second, even if the internal problems of the Boltzmann approach can be solved, we are left with the fact that what delivers the goods in 'normal science' is the Gibbs rather than the Boltzmann approach. This would not be particularly worrisome if the two formalisms were intertranslatable or equivalent in some other sense (like, for instance, the Schrödinger and Heisenberg pictures in quantum mechanics). However, as we have seen above, this is not the case. The two frameworks disagree fundamentally over what the object of study is, the definition of equilibrium, and the nature of entropy.
So even if all the internal difficulties of either of these approaches were to find a satisfactory solution, we would still be left with the question of how the two relate. A suggestion of how these two frameworks could be reconciled has recently been presented by Lavis (2005). His approach involves the radical suggestion to give up the notion of equilibrium, which is binary in that systems either are or are not in equilibrium, and to replace it by the continuous property of 'commonness'. Whether this move is justified and whether it solves the problem is a question that needs to be discussed in the future.

Appendix

A. Classical Mechanics

CM can be presented in various formulations that are more or less, but not entirely, equivalent: Newtonian mechanics, Lagrangian mechanics, Hamiltonian mechanics and Hamilton-Jacobi theory; for comprehensive presentations of these see Arnold (1978), Goldstein (1980), Abraham and Marsden (1980) and José and Saletan (1998). Hamiltonian Mechanics (HM) is best suited to the purposes of SM; hence this appendix focuses entirely on HM.


CM describes the world as consisting of point-particles, which are located at a particular point in space and have a particular momentum. A system's state is fully determined by a specification of each particle's position and momentum. Conjoining the position and momentum dimensions of all particles of a system in one vector space yields the so-called phase space Γ of the system. The phase space of a system with m degrees of freedom is 2m-dimensional; for instance, the phase space of a system consisting of n particles in three-dimensional space has 6n dimensions. Hence, the state of a mechanical system is given by the 2m-tuple x := (q, p) := (q_1, ..., q_m, p_1, ..., p_m) ∈ Γ. The phase space Γ is endowed with a Lebesgue measure μ_L, which, in the context of SM, is also referred to as the 'standard measure' or the 'natural measure'. The time evolution of the system is governed by Hamilton's equations of motion (where the dot stands for the total time derivative, ḟ := df/dt):

q̇_i = ∂H/∂p_i   and   ṗ_i = −∂H/∂q_i,   i = 1, ..., m,   (3.43)

where H(q, p, t) is the so-called 'Hamiltonian' of the system. Under most circumstances the Hamiltonian is the energy of the system (this is not true in systems with time-dependent boundary conditions, but these play no rôle in the present discussion). If the Hamiltonian satisfies certain conditions (see Arnold (2006) for a discussion of these), CM is deterministic in the sense that the state x_0 of the system at some particular instant of time t_0 (the so-called 'initial condition') uniquely determines the state of the system at any other time t. Hence, each point in Γ lies on exactly one trajectory (i.e. no two trajectories in phase space can ever cross), and H(q, p, t) defines a one-parameter group of transformations φ_t, usually referred to as the 'phase flow', mapping the phase space onto itself: x → φ_t(x) for all x ∈ Γ and all t. A quantity f of the system is a function of the coordinates and (possibly) time, f(q, p, t). The time evolution of f is given by

ḟ = {f, H} + ∂f/∂t,   (3.44)

where { , } is the so-called Poisson bracket:

{g, h} := Σ_i ( (∂g/∂q_i)(∂h/∂p_i) − (∂h/∂q_i)(∂g/∂p_i) ),   (3.45)

for any two differentiable functions g and h on Γ. From this it follows that H is a conserved quantity iff it does not explicitly depend on time. In this case we say that the motion is stationary, meaning that the phase flow depends only on the time interval between the beginning of the


motion and 'now' but not on the choice of the initial time. If H is a conserved quantity, the motion is confined to a (2m − 1)-dimensional hypersurface Γ_E, the so-called 'energy hypersurface', defined by the condition H(q, p) = E, where E is the value of the total energy of the system. Hamiltonian dynamics has three distinctive features, which we will now discuss.

Liouville's theorem asserts that the Lebesgue measure (in this context also referred to as 'phase volume') is invariant under the Hamiltonian flow: for any Lebesgue measurable region R ⊆ Γ and for any time t, R and the image of R under the Hamiltonian flow, φ_t(R), have the same Lebesgue measure; i.e. μ_L(R) = μ_L(φ_t(R)). In geometrical terms, a region R can (and usually will) change its shape but not its volume under the Hamiltonian time evolution. This also holds true if we restrict the motion of the system to the energy hypersurface Γ_E, provided we choose the 'right' measure on Γ_E. We obtain this measure, μ_{L,E}, by restricting μ_L to Γ_E so that the (6n − 1)-dimensional hypervolume of regions in Γ_E is conserved under the dynamics. This can be achieved by dividing the surface element dσ_E on Γ_E by the gradient of H (Kac 1959, 63):

μ_{L,E}(R_E) := ∫_{R_E} dσ_E / grad H   (3.46)

for any R_E ⊆ Γ_E, where

grad H := [ Σ_{k=1}^{n} ( (∂H/∂p_k)² + (∂H/∂q_k)² ) ]^{1/2}.   (3.47)

We then have μ_{L,E}(R_E) = μ_{L,E}(φ_t(R_E)) for all R_E ⊆ Γ_E and for all t.

Poincaré's recurrence theorem: Roughly speaking, Poincaré's recurrence theorem says that any system that has finite energy and is confined to a finite region of space must, at some point, return arbitrarily close to its initial state, and does so infinitely many times. The time that it takes the system to return close to its initial condition is called the 'Poincaré recurrence time'. Using the abstract definition of a dynamical system introduced in §3.2.4.1, the theorem can be stated as follows: consider a measure-preserving mapping of the phase space X of a system onto itself, φ_t(X) = X, and suppose that its measure is finite, μ(X) < ∞. Then, for any measurable subset A of X with μ(A) > 0, almost every point x in A returns to A infinitely often; that is, for all finite times τ the set B := {x | x ∈ A and for all times t ≥ τ: φ_t x ∉ A} has measure zero.

The Hamiltonian systems that are of interest in SM satisfy the requirements of the theorem if we associate X with the accessible region of the energy hypersurface.
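Both Liouville's theorem and Poincaré recurrence can be illustrated numerically in the simplest case, the harmonic oscillator H = (q² + p²)/2, whose phase flow is an exact rotation of the phase plane. The following sketch is my own illustration (all function names are hypothetical); for this periodic system the recurrence is exact rather than merely 'arbitrarily close':

```python
import math

def flow(q, p, t):
    # Exact phase flow of H = (q**2 + p**2)/2: Hamilton's equations
    # q_dot = p, p_dot = -q are solved by a rotation of the phase plane.
    c, s = math.cos(t), math.sin(t)
    return (q * c + p * s, -q * s + p * c)

def area(pts):
    # Oriented area of a polygon via the shoelace formula.
    n = len(pts)
    return 0.5 * sum(pts[k][0] * pts[(k + 1) % n][1]
                     - pts[(k + 1) % n][0] * pts[k][1] for k in range(n))

# a small triangular region R of the phase plane
R = [(1.0, 0.0), (1.2, 0.1), (0.9, 0.3)]

# Liouville: the evolved region phi_t(R) has the same phase volume (area) as R
Rt = [flow(q, p, 7.3) for q, p in R]
print(abs(area(Rt) - area(R)) < 1e-12)  # True

# Recurrence: after one period t = 2*pi every point returns to its initial state
q1, p1 = flow(1.0, 0.0, 2 * math.pi)
print(abs(q1 - 1.0) < 1e-12 and abs(p1) < 1e-12)  # True
```

For this linear flow the region rotates rigidly; for a generic Hamiltonian the region's shape distorts (often dramatically) while its volume stays fixed, which is exactly what Liouville's theorem guarantees.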


Time reversal invariance. Consider, say, a ball moving from left to right and record this process on videotape. Intuitively, time reversal amounts to playing the tape backwards, which makes us see a ball moving from right to left. Two ingredients are needed to render this idea precise: a transformation reversing the direction of time, t → −t, and the reversal of a system's instantaneous state. In some contexts it is not obvious what the instantaneous state of a system is and what should be regarded as its reverse.76 In the case of HM, however, the ball example provides a lead. The instantaneous state of a system is given by (q, p), and in the instant in which the time is reversed the ball suddenly 'turns around' and moves from right to left. This suggests that the sought-after reversal of the instantaneous state amounts to changing the sign of the momentum: R(q, p) := (q, −p), where R is the reversal operator acting on instantaneous states. Now consider a system in the initial state (q_i, p_i) at time t_i that evolves, under the system's dynamics, into the final state (q_f, p_f) at some later time t_f. The entire process ('history') is a parametrised curve containing all intermediate states: h := {(q(t), p(t)) | t ∈ [t_i, t_f]}, where (q(t_i), p(t_i)) = (q_i, p_i) and (q(t_f), p(t_f)) = (q_f, p_f). We can now define the time-reversed process of h as follows: Th := {R(q(−t), p(−t)) | t ∈ [−t_f, −t_i]}, where T is the time-reversal operator acting on histories. Introducing the variable τ := −t and applying R we have Th = {(q(τ), −p(τ)) | τ ∈ [t_i, t_f]}. Hence, Th is a process in which the system evolves from state R(q_f, p_f) to state R(q_i, p_i) as τ ranges over [t_i, t_f]. Call A the class of processes h that are allowed by a theory; in the case of HM, A contains all trajectories that are solutions of Hamilton's equations of motion. A theory is time reversal invariant (TRI) iff for every h: if h ∈ A then Th ∈ A (that is, iff A is closed under time reversal).
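The TRI condition can be checked concretely for a discretised Hamiltonian system. The sketch below is my own illustration (the integrator choice and all names are assumptions, not part of the text): it uses the velocity-Verlet scheme for a pendulum, a discrete map that shares HM's time-reversal symmetry, so evolving forward, applying R, evolving forward again for the same duration and applying R once more returns the initial state up to rounding error.

```python
import math

def step(q, p, dt=0.01):
    # One velocity-Verlet step for the pendulum H = p**2/2 - cos(q),
    # i.e. Hamilton's equations q_dot = p, p_dot = -sin(q).
    p = p + 0.5 * dt * (-math.sin(q))
    q = q + dt * p
    p = p + 0.5 * dt * (-math.sin(q))
    return q, p

q0, p0 = 0.3, -1.1           # arbitrary initial state (q_i, p_i)
q, p = q0, p0
for _ in range(500):         # the history h: forward evolution
    q, p = step(q, p)

q, p = q, -p                 # apply R: reverse the momentum
for _ in range(500):         # evolve forward again under the same dynamics
    q, p = step(q, p)
q, p = q, -p                 # a final R recovers the initial state

print(abs(q - q0), abs(p - p0))  # both tiny (floating-point rounding only)
```

In exact arithmetic the recovery is exact: the Verlet map Φ satisfies R ∘ Φ ∘ R ∘ Φ = identity, which is precisely the discrete analogue of the closure of A under time reversal.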
Coming back to the analogy with videotapes, a theory is TRI iff a censor who has to ban films containing scenes which violate the laws of the theory issues the same verdict for either direction of playing the film (Uffink 2001, p. 314). This, however, does not imply that the processes allowed by a TRI theory are all palindromic in the sense that the processes themselves look the same when played backwards; this can, but need not, be the case. HM is TRI in this sense. This can be seen by time-reversing Hamilton's equations: carry out the transformations t → τ and (q, p) → R(q, p), and after some elementary algebraic manipulations you find dq_i/dτ = ∂H/∂p_i and dp_i/dτ = −∂H/∂q_i, i = 1, ..., m. Hence the equations have the same form in either direction of time, and therefore what is allowed in one direction of time is also allowed in the other.77

76 A recent controversy revolves around this issue. Albert (2000, Chapter 1) claims that, common physics textbook wisdom notwithstanding, neither electrodynamics, nor quantum mechanics, nor general relativity, nor any other fundamental theory turns out to be time reversal invariant once the instantaneous states and their reversals are defined correctly. This point of view has been challenged by Earman (2002), Uffink (2002) and Malament (2004), who defend common wisdom; for a further discussion see Leeds (2006).
77 There was some controversy over the question of whether classical mechanics really is TRI; see Hutchison (1993, 1995a, 1995b), Savitt (1994) and Callender (1995). However, the moot point in this debate was the status of frictional forces, which, unlike in Newtonian mechanics, are not allowed in HM. So this debate has no implications for the question of whether HM is TRI.

The upshot of this is that if a theory is TRI then the following holds: if a transition from state (q_i, p_i) to state (q_f, p_f) in time span Δ := t_f − t_i is allowed by the lights of the theory, then the transition from state R(q_f, p_f) to state R(q_i, p_i) in time span Δ is allowed as well, and vice versa. This is the crucial ingredient of Loschmidt's reversibility objection (see §3.2.3.3).

B. Thermodynamics

Thermodynamics is a theory about macroscopic quantities such as pressure, volume and temperature, and it is formulated solely in terms of these; no reference to unobservable microscopic entities is made. At its heart lie two laws, the First Law and the Second Law of TD. Classical presentations of TD include Fermi (1936), Callen (1960), Giles (1964) and Pippard (1966).

The first law of thermodynamics. The first law says that there are two ways of exchanging energy with a system, putting heat into it and doing work on it, and that energy is a conserved quantity:

ΔU = ΔQ + ΔW,   (3.48)

where ΔU is the energy put into the system, and ΔQ and ΔW are, respectively, the heat and work that went into the system. Hence, put simply, the first law says that one cannot create energy, and it thereby rules out the possibility of a perpetual motion machine.

The second law of thermodynamics. The First Law does not constrain the ways in which one form of energy can be transformed into another, or how energy can be exchanged between systems or parts of a system. For instance, according to the first law it is in principle possible to transform heat into work or work into heat according to one's will, provided the total amount of heat is equivalent to the total amount of work. However, it turns out that although one can always transform work into heat, there are severe limitations on the ways in which heat can be transformed into work. These limitations are specified by the Second Law. Following the presentation in Fermi (1936, pp. 48–55), the main tenets of the Second Law can be summarised as follows. Let A and B be two equilibrium states of the system, and consider a quasi-static transformation (i.e. one that is infinitely gentle in the sense that it proceeds only through equilibrium states) which takes the system from A to B. Now consider the integral

∫_A^B dQ/T,   (3.49)

where T is the temperature of the system and dQ is the amount of heat quasi-statically absorbed by the system. One can then prove that the value of this


integral does not depend on the sequence of states by which one gets from A to B; it only depends on A and B themselves. Now choose an arbitrary equilibrium state E of the system and call it the standard state. Then we can define the entropy of the state A as

S(A) = ∫_E^A dQ/T,   (3.50)

where the integral is taken over a quasi-static transformation. With this at hand we can formulate the Second Law of thermodynamics:

∫_A^B dQ/T ≤ S(B) − S(A).   (3.51)

For a totally isolated system we have dQ = 0. In this case the Second Law takes the particularly intuitive form

S(A) ≤ S(B).   (3.52)

That is, for any transformation in an isolated system, the entropy of the final state can never be less than that of the initial state. The equality sign holds if, and only if, the transformation is quasi-static.

Thermodynamics is not free of foundational problems. The status of the Second Law is discussed in Popper (1957), Lieb and Yngvason (1999) and Uffink (2001); Cooper (1967), Boyling (1972), Moulines (1975; 2000), Day (1977) and Garrido (1986) examine the formalism of TD and possible axiomatisations. The nature of time in TD is considered in Denbigh (1953) and Brown and Uffink (2001); Rosen (1959), Roberts and Luce (1968) and Liu (1994) discuss the compatibility of TD and relativity theory. Wickens (1981) addresses the issue of causation in TD.

Acknowledgements

I would like to thank Jeremy Butterfield, Craig Callender, Adam Caulton, José Diez, Foad Dizadji-Bahmani, Olimpia Lombardi, Stephan Hartmann, Carl Hoefer, David Lavis, Wolfgang Pietsch, Jos Uffink, and Charlotte Werndl for invaluable discussions and comments on earlier drafts. Thanks to Dean Rickles for all the hard editorial work, and for his angelic patience with my ever-changing drafts. I would also like to acknowledge financial support from two project grants of the Spanish Ministry of Science and Education (SB2005-0167 and HUM2005-04369).

REFERENCES

Abraham, R. and J. E. Marsden (1980). Foundations of Mechanics (2nd edn). London: Benjamin-Cummings.
Ainsworth, P. (2005). The spin-echo experiment and statistical mechanics, Foundations of Physics Letters 18, 621–35.
Albert, D. (1994a). The foundations of quantum mechanics and the approach to thermodynamic equilibrium, British Journal for the Philosophy of Science 45, 669–77.
————– (1994b). The foundations of quantum mechanics and the approach to thermodynamic equilibrium, Erkenntnis 41, 191–206.
————– (2000). Time and Chance. Cambridge/MA and London: Harvard University Press.
Armstrong, D. (1983). What is a Law of Nature? Cambridge: Cambridge University Press.
Arnold, V. I. (1978). Mathematical Methods of Classical Mechanics. New York and Berlin: Springer.
————– (2006). Ordinary Differential Equations. Berlin: Springer.
————– and Avez, A. (1968). Ergodic Problems of Classical Mechanics. New York: Wiley.
Badino, M. (2006). The foundational role of ergodic theory, Foundations of Science 11, 323–47.
Balian, R. (2005). Information in statistical physics, Studies in History and Philosophy of Modern Physics 36, 323–53.
Barrett, M. and E. Sober (1992). Is entropy relevant to the asymmetry between retrodiction and prediction?, British Journal for the Philosophy of Science 43, 141–60.
————– (1994). The second law of probability dynamics, British Journal for the Philosophy of Science 45, 941–53.
————– (1995). When and why does entropy increase?, in Steven Savitt ed, Time's Arrows Today: Recent Physical and Philosophical Work on the Direction of Time. New York: Cambridge University Press, pp. 230–55.
Batterman, R. W. (1990). Irreversibility and statistical mechanics: A new approach?, Philosophy of Science 57, 395–419.
————– (1991). Randomness and probability in dynamical theories: On the proposals of the Prigogine School, Philosophy of Science 58, 241–63.
————– (1998).
Why equilibrium statistical mechanics works: Universality and the renormalization group, Philosophy of Science 65, 183–208.
————– (2002). The Devil in the Details: Asymptotic Reasoning in Explanation, Reduction, and Emergence. Oxford: Oxford University Press.


————– (2003). Intertheory relations in physics, Stanford Encyclopaedia of Philosophy, http://plato.stanford.edu, Spring 2003 edn.
Ben-Menahem, Y. (2001). Direction and description, Studies in History and Philosophy of Modern Physics 32, 621–35.
Bennett, C. H. (2003). Notes on Landauer's Principle, reversible computation, and Maxwell's Demon, Studies in History and Philosophy of Modern Physics 34, 501–10.
Berkovitz, J., R. Frigg and F. Kronz (2006). The ergodic hierarchy, randomness, and chaos, Studies in History and Philosophy of Modern Physics 37, 661–91.
Bishop, R. (2004). Nonequilibrium statistical mechanics Brussels-Austin style, Studies in History and Philosophy of Modern Physics 35, 1–30.
Blackmore, J. (1999). Boltzmann and epistemology, Synthese 119, 157–89.
Blatt, J. (1959). An alternative approach to the ergodic problem, Progress in Theoretical Physics 22, 745–55.
Boltzmann, L. (1877). Über die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung resp. den Sätzen über das Wärmegleichgewicht, Wiener Berichte 76, 373–435. Reprinted in F. Hasenöhrl ed, Wissenschaftliche Abhandlungen. Leipzig: J. A. Barth 1909, Vol. 2, pp. 164–223.
Boyling, J. B. (1972). An axiomatic approach to classical thermodynamics, Proceedings of the Royal Society of London. Series A: Mathematical and Physical Sciences 329, 35–70.
Bricmont, J. (1996). Science of chaos or chaos in science?, in P. R. Gross, N. Levitt, and M. W. Lewis eds, The Flight from Science and Reason. Annals of the New York Academy of Sciences, Vol. 775, New York: The New York Academy of Sciences, pp. 131–75.
————– (2001). Bayes, Boltzmann, and Bohm: Probabilities in physics, in Bricmont et al. (2001), pp. 4–21.
————–, D. Dürr, M. C. Galavotti, G. Ghirardi, F. Petruccione and N. Zanghì eds (2001). Chance in Physics: Foundations and Perspectives. Berlin and New York: Springer.
Brown, H. (2000). Essay review of Time's Arrow and Archimedes' Point by H.
Price (1996), Contemporary Physics 41, 335–6.
————– and J. Uffink (2001). The origin of time-asymmetry in thermodynamics: The minus first law, Studies in History and Philosophy of Modern Physics 32, 525–38.
Brush, S. G. (1976). The Kind of Motion We Call Heat. Amsterdam: North Holland Publishing.
Bub, J. (2001). Maxwell's Demon and the thermodynamics of computation, Studies in History and Philosophy of Modern Physics 32, 569–79.
Butterfield, J. (1987). Probability and disturbing measurement, Proceedings of the Aristotelian Society, Suppl. Vol. LXI, 211–43.
Callen, H. (1960). Thermodynamics. New York: Wiley.


Callender, C. (1995). The metaphysics of time reversal: Hutchison on classical mechanics, British Journal for the Philosophy of Science 46, 331–40.
————– (1998). The view from no-when, British Journal for the Philosophy of Science 49, 135–59.
————– (1999). Reducing thermodynamics to statistical mechanics: The case of entropy, Journal of Philosophy 96, 348–73.
————– (2001). Taking thermodynamics too seriously, Studies in History and Philosophy of Modern Physics 32, 539–53.
————– (2004a). There is no puzzle about the low-entropy past, in Christopher Hitchcock ed, Contemporary Debates in Philosophy of Science. Oxford, Malden/MA and Victoria: Blackwell, pp. 240–55.
————– (2004b). Measures, explanations and the past: Should special initial conditions be explained?, British Journal for the Philosophy of Science 55, 195–217.
Campisi, M. (2005). On the mechanical foundations of thermodynamics: The generalized Helmholtz theorem, Studies in History and Philosophy of Modern Physics 36, 275–90.
————– (2008). Statistical mechanical proof of the second law of thermodynamics based on volume entropy, Studies in History and Philosophy of Modern Physics 39, 181–94.
Cartwright, N. (1999). The Dappled World: A Study of the Boundaries of Science. Cambridge: Cambridge University Press.
————– and A. Alexandrova (2006). Philosophy of science: Laws, forthcoming in M. Smith and F. Jackson eds, The Oxford Handbook of Contemporary Philosophy. Oxford: Oxford University Press.
Castagnino, M. and O. Lombardi (2005). A global and non-entropic approach to the problem of the arrow of time, in Albert Reimer ed, Spacetime Physics Research Trends. Horizons in World Physics. New York: Nova Science, pp. 74–108.
Cercignani, C. (1998). Ludwig Boltzmann: The Man who Trusted Atoms. Oxford: Oxford University Press.
Clark, P. (1987). Determinism and probability in physics, Proceedings of the Aristotelian Society, Suppl. Vol. LXI, 185–210.
————– (1989).
Determinism, probability and randomness in classical statistical mechanics, in K. Gavroglu, Y. Goudaroulis and P. Nicolacopoulos eds, Imre Lakatos and Theories of Scientific Change. Dordrecht: Kluwer, pp. 95–110.
————– (1995). Popper on determinism, in Anthony O'Hear ed, Karl Popper: Philosophy and Problems. Cambridge: Cambridge University Press, pp. 149–62.
————– (2001). Statistical mechanics and the propensity interpretation of probability, in Bricmont et al. (2001), pp. 271–81.
Compagner, A. (1989). Thermodynamics as the continuum limit of statistical mechanics, American Journal of Physics 57, 106–17.


Cooper, J. L. B. (1967). The foundations of thermodynamics, Journal of Mathematical Analysis and Applications 17, 172–92.
Cornfeld, I. P., Fomin, S. V., and Sinai, Y. G. (1982). Ergodic Theory. Berlin and New York: Springer.
Costantini, D. and U. Garibaldi (1997). A probabilistic foundation of elementary particle statistics: Part I, Studies in History and Philosophy of Modern Physics 28, 483–506.
————– (1998). A probabilistic foundation of elementary particle statistics: Part II, Studies in History and Philosophy of Modern Physics 29, 37–59.
————– (2004). The Ehrenfest fleas: From model to theory, Synthese 139, 107–42.
Cover, T. M. and J. A. Thomas (1991). Elements of Information Theory. New York and Chichester: Wiley.
Daub, E. (1970). Maxwell's Demon, Studies in History and Philosophy of Science 1, 213–27.
Davies, P. (1974). The Physics of Time Asymmetry. Berkeley: UC Press.
Day, M. A. (1977). An axiomatic approach to the first law of thermodynamics, Journal of Philosophical Logic 6, 119–34.
Denbigh, K. G. (1953). Thermodynamics and the subjective sense of time, British Journal for the Philosophy of Science 4, 183–91.
————– (1989a). The many faces of irreversibility, British Journal for the Philosophy of Science 40, 501–18.
————– (1989b). Note on entropy, disorder and disorganisation, British Journal for the Philosophy of Science 40, 323–32.
————– (1994). Comment on Barrett and Sober's paper on the relevance of entropy to retrodiction and prediction, British Journal for the Philosophy of Science 45, 709–11.
————– and Denbigh, J. S. (1985). Entropy in Relation to Incomplete Knowledge. Cambridge: Cambridge University Press.
————– and M. L. G. Redhead (1989). Gibbs' paradox and non-uniform convergence, Synthese 81, 283–312.
de Regt, H. W. (1996). Philosophy and the kinetic theory of gases, British Journal for the Philosophy of Science 47, 31–62.
Dougherty, J. P. (1993). Explaining statistical mechanics, Studies in History and Philosophy of Science 24, 843–66.
Dupré, J. (1993).
The Disorder of Things: Metaphysical Foundations of the Disunity of Science. Cambridge/MA and London: Harvard University Press.
Earman, J. (1974). An attempt to add a little direction to the problem of the direction of time, Philosophy of Science 41, 15–47.
————– (1984). Laws of nature: The empiricist challenge, in R. Bogdan ed, D. M. Armstrong. Dordrecht: Reidel, pp. 191–223.
————– (2002). What time reversal invariance is and why it matters, International Studies in the Philosophy of Science 16, 245–64.


————– (2006). The past hypothesis: Not even false, Studies in History and Philosophy of Modern Physics 37, 399–430.
————– and J. Norton (1998). Exorcist XIV: The wrath of Maxwell's Demon. Part I. From Maxwell to Szilard, Studies in History and Philosophy of Modern Physics 29, 435–71.
————– and ————– (1999). Exorcist XIV: The wrath of Maxwell's Demon. Part II. From Szilard to Landauer and beyond, Studies in History and Philosophy of Modern Physics 30, 1–40.
————– and M. Rédei (1996). Why ergodic theory does not explain the success of equilibrium statistical mechanics, British Journal for the Philosophy of Science 47, 63–78.
Edens, B. (2001). Semigroups and Symmetry: An Investigation of Prigogine's Theories. Available at http://philsci-archive.pitt.edu/archive/00000436.
Ehrenfest, P. and T. Ehrenfest (1907). Über zwei bekannte Einwände gegen das Boltzmannsche H-Theorem, Physikalische Zeitschrift 8, 311–14.
————– (1912). The Conceptual Foundations of the Statistical Approach in Mechanics. Mineola/NY: Dover Publications, 2002.
Elga, A. (2001). Statistical mechanics and the asymmetry of counterfactual dependence, Philosophy of Science (Supplement) 68, 313–24.
Emch, G. (2005). Probabilistic issues in statistical mechanics, Studies in History and Philosophy of Modern Physics 36, 303–22.
————– (2007). Quantum statistical physics, in J. Butterfield and J. Earman eds, Philosophy of Physics. Amsterdam: North Holland, pp. 1075–182.
————– and C. Liu (2002). The Logic of Thermo-statistical Physics. Berlin and New York: Springer.
Farquhar, I. E. (1964). Ergodic Theory in Statistical Mechanics. London, New York, and Sydney: John Wiley and Sons.
Fermi, E. (1936). Thermodynamics. Mineola/NY: Dover Publications, 2000.
Feynman, R. P. (1965). The Character of Physical Law. Cambridge/MA: MIT Press, 1967.
Friedman, K. S. (1976). A partial vindication of ergodic theory, Philosophy of Science 43, 151–62.
Frigg, R. (2004).
In what sense is the Kolmogorov-Sinai entropy a measure for chaotic behaviour? Bridging the gap between dynamical systems theory and communication theory, British Journal for the Philosophy of Science 55, 411–34.
————– (2006). Chance in statistical mechanics, forthcoming in Philosophy of Science (Proceedings).
————– (2007a). Probability in Boltzmannian statistical mechanics, to be published in Gerhard Ernst and Andreas Hüttemann eds, Time, Chance and Reduction: Philosophical Aspects of Statistical Mechanics. Cambridge: Cambridge University Press.
————– (2007b). Typicality and the approach to equilibrium in Boltzmannian statistical mechanics, to be published in M. Suarez ed, Probabilities, Causes


and Propensities in Physics. Dordrecht: Synthese Library.
Frisch, M. (2005). Counterfactuals and the past hypothesis, Philosophy of Science 72, 739–50.
————– (2006). A tale of two arrows, Studies in History and Philosophy of Science 37, 542–58.
Galavotti, M. C. (2005). Philosophical Introduction to Probability Theory. Stanford: CSLI Publications.
Garrido, J. (1986). Axiomatic basis of equilibrium classical thermodynamics, Erkenntnis 25, 239–64.
Garrido, P. L., S. Goldstein and J. L. Lebowitz (2004). Boltzmann entropy for dense fluids not in local equilibrium, Physical Review Letters 92(5), 1–4.
Gibbs, J. W. (1902). Elementary Principles in Statistical Mechanics. Woodbridge: Ox Bow Press, 1981.
Giles, R. (1964). Mathematical Foundations of Thermodynamics. Oxford: Pergamon Press.
Gillies, D. (2000). Philosophical Theories of Probability. London: Routledge.
Goldstein, H. (1980). Classical Mechanics. Reading, MA: Addison Wesley.
Goldstein, S. (2001). Boltzmann's approach to statistical mechanics, in Bricmont et al. (2001), pp. 39–54.
————– and J. L. Lebowitz (2004). On the (Boltzmann) entropy of nonequilibrium systems, Physica D, 53–66.
Gordon, B. L. (2002). Maxwell-Boltzmann statistics and the metaphysics of modality, Synthese 133, 393–417.
Grad, H. (1961). The many faces of entropy, Communications on Pure and Applied Mathematics 14, 323–54.
Greiner, W., L. Neise and H. Stöcker (1993). Thermodynamik und Statistische Mechanik. Leipzig: Harri Deutsch.
Guttmann, Y. M. (1999). The Concept of Probability in Statistical Physics. Cambridge: Cambridge University Press.
Hagar, A. (2005). Discussion: The foundation of statistical mechanics—Questions and answers, Philosophy of Science 72, 468–78.
Hahn, E. L. (1950). Spin echoes, Physical Review 80, 580–94.
Hellman, G. (1999). Reduction(?) to what?: Comments on L. Sklar's 'The reduction(?) of thermodynamics to statistical mechanics', Philosophical Studies 95, 203–14.
Hemmo, M. and O. Shenker (2001).
Can we explain thermodynamics by quantum decoherence?, Studies in History and Philosophy of Modern Physics 32, 555–68.
————– (2003). Quantum decoherence and the approach to equilibrium, Philosophy of Science 70, 330–58.
————– (2005). Quantum decoherence and the approach to equilibrium II, Studies in History and Philosophy of Modern Physics 36, 626–48.
————– (2006). Von Neumann's entropy does not correspond to thermodynamic entropy, Philosophy of Science 73, 153–74.


Henderson, L. (2003). The Von Neumann entropy: A reply to Shenker, British Journal for the Philosophy of Science 54, 291–96.
Honerkamp, J. (1998). Statistical Physics: An Advanced Approach with Applications. Berlin and New York: Springer.
Hooker, C. (1981). Towards a general theory of reduction, Dialogue 20, 38–60, 201–235, 496–529.
Horwich, P. (1987). Asymmetries in Time. Cambridge: Cambridge University Press.
Howson, C. (1995). Theories of probability, British Journal for the Philosophy of Science 46, 1–32.
————– and P. Urbach (2006). Scientific Reasoning: The Bayesian Approach. Third edn, Chicago and La Salle/IL: Open Court.
Huang, K. (1963). Statistical Mechanics. New York: Wiley.
Huggett, N. (1999). Atomic metaphysics, Journal of Philosophy 96, 5–24.
Hutchison, K. (1993). Is classical mechanics really time-reversible and deterministic?, British Journal for the Philosophy of Science 44, 307–23.
————– (1995a). Differing criteria for temporal symmetry, British Journal for the Philosophy of Science 46, 341–7.
————– (1995b). Temporal asymmetry in classical mechanics, British Journal for the Philosophy of Science 46, 219–34.
Jaynes, E. T. (1983). Papers on Probability, Statistics, and Statistical Physics. Ed. by R. D. Rosenkrantz. Dordrecht: Reidel.
————– (1992). The Gibbs Paradox, in Smith et al. (1992), pp. 1–21.
————– (1994). Probability Theory: The Logic of Science.
José, J. V. and E. J. Saletan (1998). Classical Dynamics: A Contemporary Approach. Cambridge: Cambridge University Press.
Kac, M. (1959). Probability and Related Topics in Physical Science. New York: Interscience Publishing.
Karakostas, V. (1996). On the Brussels School's arrow of time in quantum theory, Philosophy of Science 63, 374–400.
Khinchin, A. I. (1949). Mathematical Foundations of Statistical Mechanics. Mineola/NY: Dover Publications, 1960.
Kim, J. (1998). Reduction, problems of, in Edward Craig ed, Routledge Encyclopaedia of Philosophy. London: Routledge.
Klein, M. J. (1970).
Maxwell, his Demon, and the second law of thermodynamics. American Scientist 58, 84–97. ————– (1973). The development of Boltzmann’s statistical ideas, in D. Cohen and W. Thirring eds, The Boltzmann Equation: Theory and Applications. Vienna: Springer, pp. 53–106. Kreuzer, H. (1981). Nonequilibrium Thermodynamics and its Statistical Foundations. Oxford: Oxford University Press. Kutach, D. N. (2002). The entropy theory of counterfactuals, Philosophy of Science 69, 82–104. Ladyman, M., S. Presnell, A. J. Short and B. Groisman (2007). The connection

REFERENCES

191

between logical and thermodynamic irreversibility.’ Studies In History and Philosophy of Modern Physics 38, 58–79. Land´e, A. (1965). Solution of the Gibbs paradox, Philosophy of Science 32, 192–3. Lavis, D. (1977). The role of statistical mechanics in classical physics, British Journal for the Philosophy of Science 28, 255–79. ————– (2004). The spin-echo system reconsidered, Foundations of Physics 34, 669–88. ————– (2005). Boltzmann and Gibbs: An attempted reconciliation, forthcoming in Studies in History and Philosophy of Modern Physics. 36, 245–73. ————– and P. Milligan (1985). Essay Review of Jaynes’ Collected Papers, British Journal for the Philosophy of Science 36, 193–210. Lebowitz, J. L. (1993a). Boltzmann’s entropy and time’s arrow, Physics Today, September Issue, 32–38. ————– (1993b). Macroscopic laws, microscopic dynamics, time’s arrow and Boltzmann’s entropy, Physica A 194, 1–27. ————– (1999). Statistical mechanics: A selective review of two central issues, Reviews of Modern Physics 71, 346–57. Leeds, S. (1989). Malament and Zabell on Gibbs phase averaging, Philosophy of Science 56, 325–40. ————– (2003). Foundations of statistical mechanics–Two approaches, Philosophy of Science 70, 126–44. ————– (2006). Discussion: Malament on time reversal, Philosophy of Science 73, 448–58. Leff, H. S. and A. F. Rex (1990). Maxwell’s Demon. Bristol: Adam Hilger. ————– (2003). Maxwell’s Demon 2. Bristol: Institute of Physics. Lewis, D. (1986). A Subjectivist’s guide to objective chance and Postscripts to a subjectivist’s guide to objective chance, in Philosophical papers, Vol. 2, Oxford: Oxford University Press, pp. 83–132. ————– (1994), Humean supervenience debugged, Mind 103, 473–90. Lieb, E. H. and Jakob Y. (1999). The physics and mathematics of the second law of thermodynamics, Physics Reports 310, 1–96. Liu, C. (1994). Is there a relativistic thermodynamics? 
A case study of the meaning of special relativity, Studies in History and Philosophy of Modern Physics 25, 983–1004. ————– (2001). Infinite systems in SM explanations: Thermodynamic limit, renormalization (semi-) groups, and irreversibility, Philosophy of Science (Proceedings) 68, 325–44. Loewer, B. (2001). Determinism and chance, Studies in History and Philosophy of Modern Physics 32, 609–29. ————– (2004). David Lewis’ Humean theory of objective chance, Philosophy of Science 71, 1115–25. Lombardi, O. (1999). Prigogine y la transformacion del panadero, Revista Latinoamericana de Filosofia 25, 69–86.

192

REFERENCES

————– (2000). La interpretacion de la irreversibilidad: Prigogine versus Gibbs, Dialogos 35, 37–56. Mackey, M. C. (1989). The dynamic origin of increasing entropy, Review of Modern Physics 61, 981–1015. ————– (1992). Time’s Arrow: The Origins of Thermodynamic Behaviour. Berlin Springer. Malament, D. (2004). On the time reversal invariance of classical electromagnetic theory, Studies in History and Philosophy of Modern Physics 35, 295– 315. ————– and S. L. Zabell (1980). Why Gibbs phase averages work, Philosophy of Science 47, 339–49. Maroney, O. J. E. (2005). The (absence of a) relationship between thermodynamic and logical reversibility, Studies in the History and Philosophy of Modern Physics 36, 355–74. Maudlin, T. (1995). Review of Lawrence Sklar’s Physics and Chance and Philosophy of Physics, British Journal for the Philosophy of Science 46, 145–49. Mellor, H. (2005). Probability: A Philosophical Introduction. London: Routledge. Mosini, V. (1995) Fundamentalism, antifundamentalism, and Gibbs’ paradox, Studies in History and Philosophy of Science 26, 151–62. Moulines, C. U. (1975). A logical reconstruction of simple equilibrium thermodynamics, Erkenntnis 9, 101–30. ————– (2000) The basic core of simple equilibrium thermodynamics, in W. Balzer, J. D. Sneed and C. U. Moulines eds, Structuralist Knowledge Representation: Paradigmatic Examples. Poznan Studies in the Philosophy of the Sciences and the Humanities 75, Amsterdam and Atlanta: Rodopi. Nagel, E. (1961). The Structure of Science. London: Routledge and Keagan Paul. North, J. (2002). What is the problem about the time-asymmetry of thermodynamics? — A reply to Price, British Journal for the Philosophy of Science 53, 121–36. Norton, J. (2005). Eaters of the lotus: Landauer’s Principle and the return of Maxwell’s Demon, Studies in the History and Philosophy of Modern Physics 36, 375–411. Parker, D. (2005). 
Thermodynamic irreversibility: Does the big bang explain what it purports to explain?, Philosophy of Science (Proceedings) 72, 751–63. Penrose, O. (1970). Foundations of Statistical Mechanics. Oxford: Oxford University Press. ————– (1979). Foundation of statistical mechanics, Reports on the Progress of Physics 42, 1937–2006. Penrose, R. (1989). The Emperor’s New Mind. Oxford: Oxford University Press. ————– (2006). The Road to Reality. A Complete Guide to the Laws of the Universe. London: Vintage. Pippard, A. B. (1966). The Elements of Classical Thermodynamics. Cambridge: Cambridge University Press.

REFERENCES

193

Pitowsky, I. (2001). Local fluctuations and local observers in equilibrium statistical mechanics, Studies in History and Philosophy of Modern Physics 32, 595–607. ————– (2006). On the definition of equilibrium, Studies in History and Philosophy of Modern Physics 37, 431–8. Popper, K. (1957). Irreversibility; Or, entropy since 1905, British Journal for the Philosophy of Science 8, 151–5. ————– (1959). The propensity interpretation of probability, British Journal for the Philosophy of Science 10, 25–42. Price, H. (1996). Time’s Arrow and Archimedes’ Point. New Directions for the Physics of Time. New York and Oxford: Oxford University Press. ————– (2002a). Boltzmann’s time bomb, British Journal for the Philosophy of Science 53, 83–119. ————– (2002b). Burbury’s last case: The mystery of the entropic arrow, Philosophy 50 (Supplement), 19–56. ————– (2004). On the origins of the arrow of time: Why there is still a puzzle about the low-entropy past, in Christopher Hitchcock ed, Contemporary Debates in Philosophy of Science. Oxford and Malden/MA: Blackwell, pp. 219–39. ————– (2006). Recent work on the arrow of radiation, Studies in History and Philosophy of Science 37, 498–527. Quay, P. M. (1978). A philosophical explanation of the explanatory functions of ergodic theory, Philosophy of Science 45, 47–59. R´edei, M. (1992). Krylov’s proof that statistical mechanics cannot be founded on classical mechanics and interpretation of classical statistical mechanical probabilities, Philosophia Naturalis 29, 268–84. Redhead, M.L.G. (1995). From Physics to Metaphysics. Cambridge: Cambridge University Press. ————– and P. Teller (1992). Particle labels and the theory of indistinguishable particles in quantum mechanics, British Journal for the Philosophy of Science 43, 201–18. Reichenbach, H. (1956). The Direction of Time. Berkeley: University of California Press. Reif, F. (1985). Fundamentals of Statistical and Thermal Physics. Columbus/OH: McGraw Hill. Rickles, D. (2008). 
Econophysics and the complexity of financial markets, forthcoming in J. Collier and C. Hooker eds, Handbook of the Philosophy of Science, Vol.10: Philosophy of Complex Systems. North Holland: Elsevier. Ridderbos, K. (2002). The coarse-graining approach to statistical mechanics: How blissful is our ignorance?’ Studies in History and Philosophy of Modern Physics 33, 65–77. Ridderbos, T. M. and Redhead, M. L. G. (1998). The Spin-echo experiments and the second law of thermodynamics, Foundations of Physics 28, 1237–70.

194

REFERENCES

Roberts, F.S. and R.D. Luce (1968). Axiomatic thermodynamics and extensive measurement, Synthese 18, 311–26. Rosen, P. (1959). The clock paradox and thermodynamics, Philosophy of Science 26, 145–7. Rosen, R. (1964). The Gibbs paradox and the distinguishability of physical systems, Philosophy of Science 31, 232–6. Ruelle, D. (1969). Statistical Mechanics: Rigorous Results. New York: Benjamin. ————– (2004). Thermodynamic Formalism: The Mathematical Structure of Equilibrium Statistical Mechanics, Cambride: Cambridge University Press. Saunders, S. (2006). On the explanation for quantum statistics, Studies in History and Philosophy of Modern Physics 37, 192–211. Savitt, S. (1994). Is classical mechanics time reversal invariant?, British Journal for the Philosophy of Science 45, 907–13. Schr¨ odinger, E. (1952). Statistical Thermodynamics. Mineola/New York: Dover 1989. Schaffner, K. (1976). Reduction in biology: Prospects and problems, in Proceeding of the Biennial Philosophy of Science Association Meeting 1974, pp. 613–32. Seidenfeld, T. (1986). Entropy and uncertainty, Philosophy of Science 53, 467– 91. Sewell, G. (1986). Quantum Theory of Collective Phenomena. Oxford: Clarendon Press. Shannon, C. E. (1949). The mathematical theory of communication, in Shannon, Claude E. and Warren Weaver: The Mathematical Theory of Communication. Urbana, Chicago and London: University of Illinois Press. Shenker, O. (1997). Maxwell’s Demon. PhD Thesis. The Hebrew University of Jerusalem. ————– (1999). Maxwell’s Demon and Baron Munchhausen: Free will as a perpetuum mobile, Studies in the History and Philosophy of Modern Physics 30, 347–72. ————– (1999). Is κTr(ρlnρ) the entropy in quantum mechanics?, British Journal for the Philosophy of Science 50, 33–48. Shimony, A. (1985). The status of the principle of maximum entropy, Synthese 63, 35–53. Smith, C. R., G. J. Erickson and P. O. Neudorfer eds (1992). Maximum Entropy and Bayesian Methods. Dordrecht: Kluwer. Sklar, L. (1973). 
Statistical explanation and ergodic theory, Philosophy of Science 40, 194–212. ————– (1978). Comments on papers by Earman and Malament, Proceedings of the Biennial Meeting of the Philosophy of Science Association, Vol. II, 186– 93. ————– (1981). Up and down, left and right, past and future, Nous 15, 111– 29.

REFERENCES

195

————– (1993). Physics and Chance. Philosophical Issues in the Foundations of Statistical Mechanics. Cambridge: Cambridge University Press. ————– (1999). The reduction(?) of thermodynamics to statistical mechanics, Philosophical-Studies 95, 187–202. ————– (2000a). Topology versus measure in statistical mechanics, Monist 83, 258–73 ————– (2000b) Interpreting theories: The case of statistical mechanics, British Journal for the Philosophy of Science 51 (Supplement), 729–42. Sorkin, R. (2005). Ten theses on black hole entropy, Studies in History and Philosophy of Modern Physics 36, 291–301. Streater, R. F. (1995). Stochastic Dynamics: A Stochastic Approach to Nonequilibrium Thermodynamics. London: Imperial College Press. Timpson, C. G. (2003). On a supposed conceptual inadequacy of the Shannon information in quantum mechanics, Studies in History and Philosophy of Modern Physics 34, 441–68. Tolman, R. C. (1938). The Principles of Statistical Mechanics, Mineola/New York: Dover 1979. Uffink, J. (1995a). Can the maximum entropy principle be explained as a consistency requirement?, Studies in History and Philosophy of Modern Physics 26, 223–61. ————– (1995b). Grondslagen van de Thermische en Statistische Fysica. Unpublished lecture notes. ————– (1996a). The constraint rule of the maximum entropy principle, Studies in History and Philosophy of Modern Physics 27, 47–79. ————– (1996b). Nought but molecules in motion (Review essay of Lawrence Sklar: Physics and Chance). Studies in History and Philosophy of Modern Physics 27, 373–87. ————– (2001). Bluff your way in the second law of thermodynamics, Studies in the History and Philosophy of Modern Physics 32, 305–94. ————– (2002). Review of Time and Chance by David Albert, Studies in History and Philosophy of Modern Physics 33, 555–63. ————– (2004). Boltzmann’s work in statistical physics, Standford Encyclopedia of Philosophy, http://plato.stanford.edu, Winter 2004 edn. ————– (2007). 
Compendium of the foundations of classical statistical physics, in J. Butterfield and J. Earman eds, Philosophy of Physics. Amsterdam: North Holland, 923–1047. van Kampen, N. G. (1984). The Gibbs paradox, in W. E. Parry ed, Essays in Theoretical Physics: In Honor of Dirk Ter Haar. Oxford: Pergamon Press, 303–12. van Lith, J. (1999). Reconsidering the concept of equilibrium in classical statistical mechanics, Philosophy of Science 66 (Supplement), 107–18. ————– (2001a). Ergodic theory, interpretations of probability and the foundations of statistical mechanics, Studies in History and Philosophy of Modern Physics 32, 581–94.

196

REFERENCES

————– (2001b). Stir in Sillness: A Study in the Foundations of Equilibrium Statistical Mechanics. PhD Thesis, University of Utrecht. Available at http://www.library.uu.nl/digiarchief/dip/diss/1957294/inhoud.htm. ————– (2003). Probability in classical statistical mechanics (Review of The Concept of Probability in Statistical Physics by Y. M. Guttmann). Studies in History and Philosophy of Modern Physics 34, 143–50. Voit, J. (2005). The Statistical Mechanics of Financial Markets. Berlin Springer. von Mises, R. (1939). Probability, Statistics and Truth. London: George Allen and Unwin. von Plato, J. (1981). Reductive relations in interpretations of probability, Synthese 48, 61–75. ————– (1982). The significance of the ergodic decomposition of stationary measures of the interpretation of probability, Synthese 53, 419–32. ————– (1988). Ergodic theory and the foundations of probability, in B. Skyrms and W. L. Harper eds (1988). Causation, Chance and Credence. Vol. 1, Dordrecht: Kluwer, 257–77. ————– (1989). Probability in dynamical systems, in J. E. Fenstad, I. T. Frolov and R. Hilpinen eds, Logic, Methodology and Philosophy of Science VIII, 427–43. ————– (1991). Boltzmann’s ergodic hypothesis, Archive for History of Exact Sciences 42, 71–89. ————– (1994). Creating Modern Probability. Cambridge: Cambridge University Press. Visser, H. (1999). Boltzmann and Wittgenstein or how pictures became linguistic, Synthese 119, 135–56. Vranas, P. B. M. (1998). Epsilon-ergodicity and the success of equilibrium statistical mechanics, Philosophy of Science 65, 688–708. Wald, R. M. (2006). The arrow of time and the initial conditions of the universe, Studies in History and Philosophy of Science 37, 394–98. Wehrl, A. (1978). General properties of entropy, Reviews of Modern Physics 50, 221–60. Wickens, J. S. (1981). Causal explanation in classical and statistical thermodynamics, Philosophy of Science 48, 65–77. Winsberg, E. (2004a). 
Can conditioning on the past hypothesis militate against the reversibility objections?’ Philosophy of Science 71, 489–504. ————– (2004b). Laws and statistical mechanics, Philosophy of Science 71, 707–18. Yi, S. W. (2003). Reduction of thermodynamics: A few problems, Philosophy of Science (Proceedings) 70, 1028–38. Zangh`ı, N. (2005). I fondamenti concettuali dell’approccio statistico in fisica, in V. Allori, M. Dorato, F. Laudisa and N. Zangh`ı eds, La Natura Delle Cose. Introduzione ai Fundamenti e alla Filosofia della Fisica. Roma: Carocci. Zeh, H. D. (2001). The Physical Basis of the Direction of Time. Berlin and New York: Springer.

4 PHILOSOPHICAL ASPECTS OF QUANTUM INFORMATION THEORY

CHRIS TIMPSON

Introduction

While quantum information theory is one of the most lively, up-and-coming new areas of research in physics, its central concerns have long been familiar. They are simply those that have lain close to the heart of anyone interested in the foundations of quantum mechanics since its inception: How does the quantum world differ from the classical one? What is distinctive about the field, however, is that this question is approached from a particular viewpoint: a task-oriented one. It has turned out to be most productive to ask: what can one do with quantum systems that one could not with classical ones? What use can one make of non-commutativity, entanglement, and the rest of our familiar friends? The answers have involved identifying a rich range of communication and computational tasks that are distinctively quantum mechanical in nature: notions, for example, of quantum computation, quantum cryptography and entanglement-assisted communication. Providing these answers has deepened our understanding of quantum theory considerably, while spurring impressive experimental efforts to manipulate and control individual quantum systems.

What is surprising, and, prima facie, need not have been the case, is that the peculiar behaviour of quantum systems does provide such interesting opportunities for new forms of communication and computation, when one might have feared that these peculiarities would only present annoying obstacles to the increasing miniaturisation of information processing devices.

For philosophers, and for those interested in the foundations of quantum mechanics, quantum information theory therefore makes a natural and illuminating object of study. There is a great deal to be learnt therein about the behaviour of quantum systems that one did not know before. We shall survey a few of these points here.
But there are further reasons why quantum information theory is particularly intriguing. Running along with the development of the field have been a number of more-or-less explicitly philosophical propositions. Many have felt, for example, that the development of quantum information theory heralds the dawn of a new phase of physical theorising, in which the concept of information will come to play a much more fundamental rôle than it has traditionally been assigned. Some have gone so far as to re-vivify immaterialist ideals by arguing that information should
be seen as the basic category from which all else flows, and that the new task of physics will be to describe how this information evolves and manifests itself. (Wheeler, 1990) is the cheerleader for this sort of view. Or again, the rallying cry of the quantum information scientist is that 'Information is Physical!', a doctrine of surprising-sounding ontological import. On the less extreme side is the widespread view that developments in quantum information will finally help us sort out the conceptual problems in quantum mechanics that have so vexed the theory from the beginning.

In order to get clearer on what import quantum information theory does have, it would be beneficial to gain a better understanding of what the theory is about. This will be one of our main aims here. In §4.1 we will survey some elementary aspects of quantum information theory, with a focus on some of the principles and heuristics involved. In §4.2 we will examine in detail what exactly quantum information (and therefore quantum information theory) is; and deploy our findings in resolving puzzles surrounding the notion of quantum teleportation. This will provide us with a better grasp of the relation between information theory and the world. In §4.3 we turn to examine what one might learn from the development of quantum computation, both about quantum systems and about the theory of computation, asking where the speed-up in quantum computers might come from and what one should make of the Church-Turing hypothesis in this new setting. Finally, in §4.4, we broach the compelling question of what, if anything, quantum information theory might have to teach us about the traditional foundational problems in quantum mechanics. Some pit-falls are noted before we discuss a number of attempts to provide information-theoretic axiomatisations of quantum mechanics: Zeilinger's Foundational Principle, the CBH theorem and quantum Bayesianism.
On all of these matters there is more to be said than I essay here. In general there are two kinds of strategies that have been manifest in attempts to obtain philosophical or foundational dividends from quantum information theory, the direct and the indirect. We will canvass a number of each. The direct strategies include such thoughts as these: the quantum state is to be understood as information; quantum information theory supports some form of immaterialism; quantum computation is evidence for the Everett interpretation. None of these survives close examination, and it seems unlikely that any such direct attempt to read a philosophical lesson from quantum information theory will. Much more interesting and substantial are the indirect approaches which seek, for example, to learn something useful about the structure or axiomatics of quantum theory by reflecting on quantum information-theoretic phenomena; that might look to quantum information theory to provide new analytic tools for investigating that structure; or that look to suggested constraints on the power of computers as potential constraints on new physical laws. The deepest lessons are perhaps still waiting to be learnt.

4.1 First Steps with Quantum Information

As I have said, quantum information theory is animated by the thought that the difference in character of quantum and classical systems makes possible interesting new forms of communication and computation. And one may reasonably hope that reflecting on the nature and possibility of these new tasks will in turn shed light back on the differences between quantum and classical. Quantum information theory may be seen as an extension of classical information theory that introduces new primitive information-theoretic resources, particularly quantum bits and shared entanglement, and develops quantum generalisations of the associated notions of sources, channels and codes. Within this general setting, one may then devise cryptographic, communication or computational tasks that go beyond the classical, and investigate their properties.

4.1.1 Bits and Qubits

It is useful to begin by focusing on the differences between the familiar classical primitive—the bit—and the corresponding quantum primitive—the qubit (quantum bit).¹ A classical bit is some physical object which can occupy one of two distinct, stable classical states, conventionally labelled by the binary values 0 or 1. The term 'bit' is also used to signify an amount of classical information: the number of bits that would be required to encode the output of a source is called the quantity of information the source produces (Shannon, 1948). We shall see more of this below (§4.2).

¹ The term 'qubit' was introduced in print in (Schumacher, 1995), the concept having been first aired by Schumacher, following conversations with Wootters, at the IEEE meeting on the Physics of Computation in Dallas, October 1992.

A qubit is the precise quantum analogue of a bit: it is a two-state quantum system. Examples might be the spin degree of freedom of an electron or of a nucleus, an atom with an excited and an unexcited energy state, or the polarization of a photon. The two basic orthogonal states of a qubit are represented by vectors labelled |0⟩ and |1⟩. These states are called the computational basis states and provide analogues of the classical 0 and 1 states. But of course, analogy is not identity. While a classical bit may only exist in either the 0 or the 1 state, the same is not true of a qubit. It may exist in an arbitrary superposition of the computational basis states: |ψ⟩ = α|0⟩ + β|1⟩, where α and β are complex numbers whose moduli squared sum to one.

There are, therefore, continuously many different states that a qubit may occupy, one for each of the different values the pair α and β may take on; and this leads to the natural thought that qubits contain vastly more information than classical bits, with their measly two-element state space. Intuitively, this enormous difference in the amounts of information associated with bit and qubit might seem to be their primary information-theoretic distinction. However, a little care is required here. While it is certainly true that the existence of superpositions represents a fundamental difference between qubits and bits, it is not straightforward to maintain that qubits therefore contain vastly
more information. For a start, it is only under certain conditions that systems may usefully be said to contain information at all—typically only when they are playing a suitable rôle in a communication protocol of some sort. But more importantly, we need to make a distinction between two different notions of information that coincide in the classical case, but diverge in the quantum; that is, a distinction between specification information and accessible information.

Consider a sequence of N systems, each of which has been prepared in some particular state from a given finite set of states (the very simplest case would be a sequence of bits which has been prepared in some sequence of 0s and 1s). Assume, furthermore, that each particular state occurs in the sequence with a given probability. We may think of this sequence as being our message. We may now ask how much information (in bits) is required to specify what this sequence of states is. This is called the specification information associated with the message. We might also ask how much information can be acquired or read from the sequence: this is the accessible information.

Clearly, in the classical case, the two quantities will coincide, as classical states are perfectly distinguishable. When presented with the message, we may determine the sequence of states perfectly by observation or measurement; and what we have determined—the identity of the sequence of states the message comprises—evidently gives us enough information to specify what that sequence is. However, in the quantum case, these two quantities will differ, in general. If we prepare our N systems in a sequence of states drawn from a set of non-orthogonal quantum states, it will not be possible to identify the sequence of states by measurement. This means that in general much more information will be required to specify the sequence than may be obtained from it. Take the case of a sequence of qubits.
As we have said, there are continuously many states that each qubit could be prepared in, so the specification information associated with the sequence could be unboundedly large. But it would only be if each of the qubits were prepared in one or other of two fixed orthogonal states that we could reliably identify what the sequence of states prepared actually was; and then we would only be getting one bit of information per qubit in the sequence. It turns out that this would in fact be the best that we could do.

A striking result due to (Holevo, 1973), called the Holevo bound, establishes that the maximum amount of information that can be obtained from measurements on a quantum system is given by the logarithm (to base 2) of the number of orthogonal states the system possesses, no matter how clever our measuring procedure. Thus, in the case of qubits, the maximum amount of information per qubit that can be decoded from measurements on the sequence is just one bit. Given that 'encoded' is a success word (one can't be said to have encoded something if one cannot, in principle, decode it), this tells us that the maximum amount of information that can be encoded into a qubit is just one bit; the same amount, of course, as a classical bit. So while we may prepare some sequence of qubits having an unboundedly large specification information, we could not thereby have managed to encode more than a single bit of information into each qubit.

Looked at from a certain perspective, this presents an intriguing puzzle. As Caves and Fuchs have put it: just why is the state-space of quantum mechanics so gratuitously large, from the point of view of storing information? (Caves and Fuchs, 1996).

There is a final important reason why we should not, on reflection, have been tempted to conclude that qubits can contain vastly more information than classical bits, on the strength of the possibility of preparing them in superpositions of computational basis states. It is that the intuition driving this thought derives from an overly classical way of thinking about and quantifying information. If we could prepare a classical system in any one of an arbitrarily large number of different states, then it might indeed be appropriate to associate an arbitrarily large amount of information with that system. Classical information. But quantum systems are not classical systems and quantum states are not classical states. It was Schumacher's insight (Schumacher, 1995) that this allowed us to introduce a new notion of information peculiar to quantum systems—quantum information. And we need a new theory to tell us how much of this information there may be about in a given situation (we will see how Schumacher developed this in §4.2).

Thus when talking about the amount of information that is associated with a given system, or has been encoded into it, we need to clarify whether we are talking about transmitting classical information using quantum systems, or whether we are talking about encoding and transmitting quantum information properly so-called. In the former context, the notions of specification and accessible information apply: how much classical information is required to specify a sequence, or how much classical information one can gain from it, respectively; and we know that at most one classical bit can be encoded into a qubit.
In the latter context, we apply the appropriate measure of the amount of quantum information; and it may come as no surprise to learn that the maximum amount of quantum information that may be encoded into a qubit is one qubit's worth! (See below.)
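The gap between the continuum of preparable qubit states and the single accessible bit can be illustrated numerically. The sketch below (Python with numpy is assumed; the function names are illustrative, not from the text) computes the Holevo quantity χ = S(ρ̄) − Σᵢ pᵢS(ρᵢ), which upper-bounds the information obtainable by any measurement on systems drawn from an ensemble of states ρᵢ with probabilities pᵢ:

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log2 rho), in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # convention: 0 log 0 = 0
    return float(-np.sum(evals * np.log2(evals)))

def holevo_chi(states, probs):
    """Holevo quantity chi = S(rho_bar) - sum_i p_i S(rho_i), for pure states.

    By the Holevo bound, chi limits the classical information any
    measurement can extract from systems drawn from this ensemble."""
    rhos = [np.outer(psi, psi.conj()) for psi in states]
    rho_bar = sum(p * r for p, r in zip(probs, rhos))
    return von_neumann_entropy(rho_bar) - sum(
        p * von_neumann_entropy(r) for p, r in zip(probs, rhos))

# Two equiprobable non-orthogonal states |0> and (|0> + |1>)/sqrt(2):
ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)
chi = holevo_chi([ket0, ket_plus], [0.5, 0.5])
print(round(chi, 3))   # 0.601 -- strictly less than one bit per qubit

# Only with orthogonal states is the full log2(2) = 1 bit attained:
ket1 = np.array([0.0, 1.0])
print(holevo_chi([ket0, ket1], [0.5, 0.5]))   # 1.0
```

For the non-orthogonal pair the bound is roughly 0.6 bits: the two preparations are distinct, yet no measurement can reliably tell them apart, in keeping with the discussion of specification versus accessible information above.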

4.1.2 The No-Cloning Theorem

The difference in the nature of the state spaces of bit and qubit—the fact that qubits can support superpositions and hence enjoy a large number of distinct, but non-distinguishable states—does not, therefore, manifest itself in a simple-minded difference in the amount of information the two types of objects can contain, but in more subtle and interesting ways. We have already seen one, in the ensuing difference between accessible and specification information. A closely related idea is that of no-cloning. We have already used the idea that it is not possible to distinguish perfectly between non-orthogonal quantum states; equivalently, that it is not possible to determine an unknown state of a single quantum system. If we don't at least know an orthogonal set the state in question belongs to (e.g., the basis the system was
prepared in) then no measurement will allow us to find out its state reliably2 . This result is logically equivalent to an important constraint on information processing using quantum systems. Whether we are primarily concerned with encoding classical information or quantum information into quantum systems, we will be involved in preparing those systems in various quantum states. The no-cloning theorem due to (Dieks, 1982) and (Wootters and Zurek, 1982) states that it is impossible to make copies of an unknown quantum state. Presented with a system in an unknown state |ψ , there is no way of ending up with more than one system in the same state |ψ . One can swap |ψ from one system to another3 , but one can’t copy it. This marks a considerable difference from classical information processing protocols, as in the classical case, the value of a bit may be freely copied into numerous other systems, perhaps by measuring the original bit to see its value, and then preparing many other bits with this value. The same is not possible with quantum systems, obviously, given that we can’t determine the state of a single quantum system by measurement: the measuring approach would clearly be a non-starter. To see that no more general scheme would be possible either, consider a device that makes a copy of an unknown state |α . This would be implemented by a unitary evolution4 U that takes the product |α |ψ0  , where |ψ0  is a standard state, to the product |α |α . Now consider another possible state |β . Suppose the device can copy this state too: U |β |ψ0  = |β |β . If it is to clone a general unknown state, however, it must be able to copy a superposition such as √ |ξ = 1/√ 2(|α + |β ) also, but the effect of U on |ξ is to produce an entangled state 1/ 2(|α |α + |β |β ) rather than the required |ξ |ξ . It follows that no general cloning device is possible. This argument makes use of a central feature of quantum dynamics: its linearity 5 . 
In fact it may be seen in the following way that if a device can clone more than one state, then these states must belong to an orthogonal set. We are supposing that U|α⟩|ψ₀⟩ = |α⟩|α⟩ and U|β⟩|ψ₀⟩ = |β⟩|β⟩. Taking the inner product

² Imagine trying to determine the state by measuring in some basis. One will get some outcome corresponding to one of the basis vectors. But was the system actually in that state before the measurement? Only if the orthogonal basis we chose to measure in was one containing the unknown state. And even if we happened on the right basis by accident, we couldn't know that from the result of the measurement, so we could not infer the identity of the unknown state. For a fully general discussion, see (Busch, 1997).
³ Take two Hilbert spaces of the same dimension, H₁ and H₂. The 'swap' operation U_S on H₁ ⊗ H₂ is a unitary operation that swaps the state of system 1 for the state of system 2 and vice versa: U_S|ψ⟩₁|ψ′⟩₂ = |ψ′⟩₁|ψ⟩₂. If we take {|φᵢ⟩} as basis sets for H₁ and H₂ respectively, then U_S = Σᵢⱼ |φⱼ⟩₁⟨φᵢ| ⊗ |φᵢ⟩₂⟨φⱼ|, for example.
⁴ Is it too restrictive to consider only unitary evolutions? One can always consider a non-unitary evolution, e.g. measurement, as a unitary evolution on a larger space. Introducing auxiliary systems, perhaps including the state of the apparatus, doesn't affect the argument.
⁵ An operator O on H is linear if its effect on a linear combination of vectors is equal to the same linear combination of the effects of the operator on each vector taken individually: O(α|u₁⟩ + β|u₂⟩) = αO|u₁⟩ + βO|u₂⟩ = α|v₁⟩ + β|v₂⟩; |uᵢ⟩, |vᵢ⟩ ∈ H. Unitary operators are, of course, linear.
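The linearity argument can be checked numerically. In the sketch below (our own illustration, not the author's; all names are ours), the CNOT gate stands in for a would-be cloner: it does copy the basis states |0⟩ and |1⟩ perfectly, but linearity forces it to map the superposition 1/√2(|0⟩ + |1⟩) to an entangled state rather than to two copies:

```python
import numpy as np

# CNOT acts as a "cloner" on the basis states: |0>|0> -> |0>|0>, |1>|0> -> |1>|1>.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
plus = (ket0 + ket1) / np.sqrt(2)          # the superposition |xi>

# The basis states are copied perfectly...
assert np.allclose(CNOT @ np.kron(ket0, ket0), np.kron(ket0, ket0))
assert np.allclose(CNOT @ np.kron(ket1, ket0), np.kron(ket1, ket1))

# ...but linearity sends the superposition to (|00> + |11>)/sqrt(2),
# not to the product |xi>|xi>.
out_plus = CNOT @ np.kron(plus, ket0)
target = np.kron(plus, plus)
fidelity = abs(np.vdot(target, out_plus))**2
print(fidelity)   # 0.5, not 1
```

The fidelity of 0.5 between the actual output and the would-be clone pair is exactly the failure the text describes.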

FIRST STEPS WITH QUANTUM INFORMATION


of the first equation with the second implies that ⟨α|β⟩ = ⟨α|β⟩², which is only satisfied if ⟨α|β⟩ = 0 or 1, i.e., only if |α⟩ and |β⟩ are identical or orthogonal. I said above that no-cloning was logically equivalent to the impossibility of determining an unknown state of a single system. We have already seen this in one direction: if one could determine an unknown state, then one could simply do so for the system in question and then construct a suitable preparation device to make as many copies as one wished, as in the classical measuring strategy. What about the converse? If one could clone, could one determine an unknown state? The answer is yes. If we are given sufficiently many systems all prepared in the same state, then the results of a suitable variety of measurements on this group of systems will furnish one with knowledge of the identity of the state (such a process is sometimes called quantum state tomography). For example, if we have a large number of qubits all in the state |ψ⟩ = α|0⟩ + β|1⟩, then measuring them one by one in the computational basis will allow us to estimate the Born rule probabilities |⟨0|ψ⟩|² = |α|² and |⟨1|ψ⟩|² = |β|², with increasing accuracy as the number of systems is increased. This only gives us some information about the identity of |ψ⟩, of course. To determine this state fully, we also need to know the relative phase of α and β. One could find this by also making a sufficient number of measurements on further identically prepared individual systems in the rotated bases {1/√2(|0⟩ ± |1⟩)} and {1/√2(|0⟩ ± i|1⟩)}, for example (Fano, 1957; Band and Park, 1970). (One would need to make more types of measurement if the system were higher dimensional. For an n-dimensional system, one needs to establish the expectation values of a minimum of n² − 1 operators.)
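The tomography procedure just described can be illustrated with a toy simulation (our own construction, assuming ideal measurements on independent copies): estimate the expectation values of σ_x, σ_y and σ_z from simulated measurement statistics, then compare the reconstructed Bloch vector with the true one. For a qubit these are exactly the n² − 1 = 3 expectation values the text mentions.

```python
import numpy as np

rng = np.random.default_rng(0)

# An "unknown" state alpha|0> + beta|1> with a nontrivial relative phase.
alpha, beta = 1/np.sqrt(3), np.sqrt(2/3) * np.exp(1j * np.pi/5)
psi = np.array([alpha, beta])

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def estimate_expectation(op, state, shots):
    """Simulate measuring `op` on `shots` identically prepared copies."""
    vals, vecs = np.linalg.eigh(op)
    probs = np.abs(vecs.conj().T @ state)**2   # Born rule probabilities
    outcomes = rng.choice(vals, size=shots, p=probs)
    return outcomes.mean()

shots = 200_000
bloch_est = np.array([estimate_expectation(op, psi, shots) for op in (sx, sy, sz)])

# True Bloch vector, for comparison.
rho = np.outer(psi, psi.conj())
bloch_true = np.real([np.trace(rho @ op) for op in (sx, sy, sz)])
print(bloch_est, bloch_true)
```

With enough copies the estimated Bloch vector converges on the true one, recovering both the amplitudes and the relative phase; a single copy, by contrast, yields only one basis outcome.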
Thus access to many copies of identically prepared systems allows one to find out their state; and with a cloner, one could multiply up an individual system into a whole ensemble all in the same state; so cloning would allow identification of unknown states. (It would also imply, therefore, the collapse of the distinction between accessible and specification information.) In fact it was in the context of state determination that the question of cloning first arose (Herbert, 1982). Cloning would allow state determination, but then this would give rise to the possibility of superluminal signalling using entanglement in an EPR-type setting: one would be able to distinguish between different preparations of the same density matrix, hence determine superluminally which measurement was performed on a distant half of an EPR pair. The no-cloning theorem was derived to show that this possibility is ruled out. So the no-cloning theorem is not only interesting from the point of view of showing differences between classical and quantum information processing, important as that is. It also illustrates in an intriguing way how tightly linked together various different aspects of the quantum formalism are. The standard proof of no-cloning is based on the fundamental linearity property of the dynamics: suggestive if one were searching for information-theoretic principles that might help illuminate aspects of the quantum formalism. Furthermore, cloning is logically equivalent to the possibility of individual state determination and hence implies superluminal signalling; thus no-cloning seems to be a crucial part


of the apparent peaceful co-existence between quantum mechanics and relativity. All this might seem to suggest some link between no-signalling and linearity of the dynamics: see (Svetlichny, 1998) and (Simon et al., 2001) for some work in this connection (but cf. (Svetlichny, 2002) also); (Horodecki et al., 2005) discuss no-cloning and the related idea of no-deleting in a general setting.

4.1.3 Quantum Cryptography

Quantum cryptography is the study of the possibilities of secret communication using quantum properties. It holds out the promise of security of communication guaranteed by the laws of physics, in contrast to the mere computational difficulty that underwrites the best classical schemes. In doing so it makes essential use of the fact that non-orthogonal quantum states cannot be perfectly distinguished; essential use, that is, of the great size of the qubit state space to which, in a sense, we have seen we lack access. The existence of non-orthogonal states is linked, of course, to the non-commutativity of observables and the existence of incompatible physical quantities. One of the reasons, therefore, that quantum cryptography has been of interest is that it provides a very direct 'cash-value' practical application of—and new theoretical playground for—some of the most puzzling and non-classical aspects of the quantum formalism⁶. How might one go about using qubits for secret communication? One thought might be to try to hide the secret message directly in a sequence of qubits (this was the form that one of the very earliest protocols in fact took (Bennett et al., 1982; Brassard, 2005)). So, for example, one party, Alice, might encode a classical message (a sequence of 0s and 1s, say) into a sequence of quantum systems by preparing them in various non-orthogonal states. Thus spin-up and spin-down might represent 0 and 1 respectively; and for each qubit in her sequence, she could choose which basis to prepare it in. Picking from the σ_z and σ_x bases, for example, her encoded message will be an alternating sequence of σ_z and σ_x eigenstates, with the eigenvalue of each indicating the classical bit value encoded. So a sequence like |↑_z⟩|↓_z⟩|↓_x⟩|↑_x⟩|↑_z⟩ would represent the message 01100.
Now if the other party, Bob, for whom the message is intended, knows what sequence of bases Alice chose—that is, if they have met previously and agreed upon the basis sequence clandestinely—then he is able to measure in the appropriate basis for each system and read out correctly what the classical bit value

⁶ For example, the study of quantum cryptography has provided very useful conceptual and formal tools for clarifying and quantifying what had been the unsatisfactorily messy matter of what, if anything, measurement and disturbance have to do with one another. The folklore, since Heisenberg, has not been edifying. See (Fuchs, 1998) and (Fuchs and Jacobs, 2001). The lesson is to focus on states; and non-orthogonality is the crucial thing. Measurements disturb non-orthogonal sets of states, but if a state is known to be from some orthogonal set, it is, perhaps surprisingly, possible to measure any observable on it you wish and return it to its initial state, i.e., to leave it undisturbed.


encoded is. However, any eavesdropper, Eve, who wishes to learn the message cannot do so, as she doesn't know which basis each system was prepared in. All she can have access to is a sequence of non-orthogonal states; and we know that she will be unable to identify what that sequence of states is; therefore she will be unable to learn the secret message. Furthermore, if she does try to learn something about the identity of the sequence of states, she will end up disturbing them in such a way that Alice and Bob will be able to detect her eavesdropping. They will then know that if they wish to preserve the security of future transmissions they will need to meet once more and agree upon a new secret sequence of encoding bases. If there is no eavesdropping, though, they may keep on using the same encoding basis sequence over and over again. However, it turns out that this sort of protocol isn't the best one to use. Although Eve cannot fully identify the sequence of non-orthogonal states—and hence the secret message—by measurement, she will be able to gain some information about it⁷; and her actions in trying to gather information will end up scrambling some of the message that Alice is trying to send Bob—he will not receive everything that Alice is trying to send. One can avoid these kinds of problems and generate a perfectly secure protocol by making use of the ideas of key distribution instead (Bennett and Brassard, 1984).

4.1.3.1 Key Distribution

There are two central techniques here, both developed before the advent of quantum cryptography. The first is called symmetrical or private-key cryptography; the second, asymmetrical or public-key cryptography. In both techniques the message being sent is encrypted and rendered unreadable using a key—and a key is required to unlock the message and allow reading once more. In private-key cryptography, both parties share the same, secret, key, which is used both for encryption and decryption.
The best known (and the only known provably secure) technique is the one-time pad. Here the key consists of a random string of bit values, of the same length as the message to be encrypted. The message string is encrypted simply by adding (modulo 2) the value of each bit in the message to the value of the corresponding bit in the key string. This generates a cryptogram which is just as random as the bit values in the private key and will thus provide Eve with no information about the message. The cryptogram is decrypted by subtracting (again modulo 2) the key from the cryptogram, returning the starting message string. Thus if Alice and Bob share a random secret key, they can communicate securely. The down-side to this protocol is that each key may only be used once. If more than one message were encoded using the same key then Eve could begin to identify the key by comparing the cryptograms. Also, whenever Alice and Bob wish to share a new key, they must

⁷ It is for this reason that Alice and Bob would have to change their agreed basis sequence after detecting the presence of Eve. If they didn't, then Eve would eventually be able to gain enough information about the encoding basis sequence to learn a good deal about the messages being sent.


meet in secret, or use a trusted courier; and a key has to be as long as any message sent. Hence the preference for public-key cryptography in the majority of cases. Public-key cryptography is based on one-way functions. These are functions whose values are easy to calculate given an argument, but whose inverses are hard to compute. Some such functions enjoy a so-called 'trapdoor': supplying an extra piece of information makes the inverse calculation easy. In a public-key system, Bob will create a suitably related pair of a public key and a secret private key. The public key will be used for encryption, which will be easy to perform, but hard to reverse. The private key is the trapdoor that makes the decryption easy. Bob keeps the private key to himself and broadcasts the public key, so that anyone who wants to send him a message may do so, safe in the knowledge that it will be very hard to decrypt by anyone apart from Bob. The best known of such systems is the RSA (Rivest, Shamir and Adleman) protocol, whose security is based on the apparent computational difficulty of factoring large numbers. The great advantage of public-key systems is that Alice and Bob do not need to meet in secret to share a key—the key used for encryption may simply be broadcast over a public channel. The disadvantage is that the security of the protocol relies only on the computational intractability of the decryption operation in the absence of the private key; and it is not even known whether any truly adequate one-way functions with trapdoors exist. Quantum cryptography, or more properly, quantum key distribution, allows one to combine the benefits of both systems. Using quantum systems, Alice and Bob may generate a useable key without having to meet in secret or share any secret beforehand, while at the same time they can be assured of complete security for their communication (at least if the laws of quantum mechanics are correct).
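The one-time pad itself is easy to state in code. A minimal sketch (ours; modulo-2 addition of bit strings is just XOR on the raw bytes):

```python
import secrets

def otp_encrypt(message: bytes, key: bytes) -> bytes:
    # Bitwise modulo-2 addition of message and key is XOR on bytes.
    assert len(key) >= len(message), "the key must be at least as long as the message"
    return bytes(m ^ k for m, k in zip(message, key))

# Decryption is the very same operation: XORing with the key a second
# time undoes the first, since x ^ k ^ k == x.
otp_decrypt = otp_encrypt

message = b"attack at dawn"
key = secrets.token_bytes(len(message))   # the shared random secret key
cryptogram = otp_encrypt(message, key)
print(otp_decrypt(cryptogram, key))       # b'attack at dawn'
```

Because the key bits are uniformly random and used once, the cryptogram is itself uniformly random and carries no information about the message; this is the provable security the text refers to.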
The central idea was first presented by (Bennett and Brassard, 1984). They realised that one could use the fact that any eavesdropper interacting with quantum systems prepared in non-orthogonal states would disturb those states—and thereby betray their presence—as a basis for sifting out a secret shared random key. The protocol (dubbed 'BB84' after its creators) proceeds as follows:

1. Alice will send Bob a large number of qubits via a quantum channel, choosing at random whether to prepare them in the σ_z basis or the σ_x basis (making a note of which she chooses); and choosing at random whether to prepare each system in the up or down spin state (corresponding to a 0 and a 1 value, respectively; again she notes which she chooses).

2. Bob, on receiving each qubit from Alice, chooses at random whether to measure σ_z or σ_x and notes whether he gets a 0 or a 1 (spin-up or spin-down) outcome for each measurement.

Half of the time Bob will have measured in the same basis as Alice prepared the system in; and half of the time he will have measured in a different basis. But neither knows which cases are which. At this stage, both Bob and Alice will possess a random sequence of 0s and 1s, but they will not


possess the same sequence. If Bob measured in the same basis as Alice chose, then the outcome of his measurement will be the same as the value 0 or 1 that Alice prepared; but if he measured in the other basis, he will get a 0 or 1 outcome at random, the value being uncorrelated with the value Alice chose.

3. The next stage of the protocol is that Alice and Bob jointly announce which basis they chose for each system, discarding from their records the bit values for all those systems where they differed in the basis chosen (they do not, however, announce their classical bit values). The resulting string of classical bits that Alice and Bob now each possess is called the sifted key and, in the absence of noise or any eavesdropping on the transmitted quantum systems, they will now share a secret random key. Notice that neither Alice nor Bob determines which of Alice's initial random sequence of 0 or 1 choices is retained at the sifted key stage; it is a matter of chance depending on the coincidences in their independent random choices of basis.

4. Now is the time to check for Eve. Given that the qubits sent from Alice to Bob are prepared in a random sequence of states drawn from a non-orthogonal set, any attempt by Eve to determine what the states are will give rise to a disturbance of the sequence. For instance, she might try to gain some information about the key by measuring either σ_z or σ_x on each system en route between Alice and Bob: this would provide her with some information about the sequence being sent; but half the time it would project the state of a qubit into the other basis than the one Alice initially prepared. Alice and Bob can check for such disturbance by Alice randomly selecting a subset of bits from her sifted key and announcing which bits she has chosen and their values. If the qubits were undisturbed in transmission between Alice and Bob, then Bob should have exactly the same bit values as Alice has announced.

5.
Finally, Bob announces whether his bit values for the checked subset agree with Alice's or differ. If they agree for the subset of bits publicly announced and checked, then Alice and Bob can be sure that there was no eavesdropping; and the remaining bits in their sifted key, after they have discarded the checked bits, constitute a secret shared random key. If the checked values differ too much, however, then Alice and Bob discard all the remaining bits and recommence the protocol.

Once Alice and Bob have completed the protocol successfully, they know they share a secret random key that can be used for one-time pad encryption. The cryptogram can be broadcast over public channels and Bob (and nobody else) will be able to decrypt it.

4.1.3.2 Remarks

a) In this protocol, Alice and Bob make use of two channels: a quantum channel transmitting the qubits, which they assume Eve may have access to; and a public (broadcast) channel which anyone can hear, but, we assume,


Eve cannot influence. Notice that Eve can always prevent Alice and Bob from successfully completing their protocol and obtaining their key simply by blocking the quantum channel. But this would be self-defeating from her point of view. Her end is to acquire some information about Alice and Bob's random key, so that she may gain some information about any future message they may encrypt using it. If she prevents them from coming to share a key, then they will never try to send such a message, so she would automatically be unable to find out any secrets.

b) The crucial component of quantum key distribution is the fact that Eve cannot gain any information about the identity of the states being sent from Alice to Bob without betraying her presence by disturbing them. We saw this in the simple case in which Eve essays an 'intercept and resend' strategy: intercepting individual qubits en route, measuring them, and then hoping to send on to Bob a new qubit in the same state as the original one sent from Alice, so that her measurement is not detected. In the case where Eve intercepts and measures in either the σ_z or σ_x basis, she will introduce 25% errors into the sifted key, which will be easy to detect at the data checking stage (50% of the systems get projected into the other basis by her measurement; measuring these, half the time Bob will, at random, get a result correlating with Alice's; half the time, however, he will get the opposite result: an error).⁸ Notice the links with our previous ideas of no-cloning and of the impossibility of determining an unknown state by measurement (the impossibility of distinguishing perfectly between non-orthogonal states).
If Eve were able to clone the qubits sent from Alice to Bob, then she could keep a copy of each for herself and produce her own copy of Alice and Bob's key as they make the crucial announcements; if she could determine unknown states by measurement, she could intercept the qubits, find out what states Alice was sending to Bob, and prepare a fresh sequence in the same states afterwards to resend. Whilst it can also be proved directly (see (Bennett et al., 1992) for a simple case), the fact that Eve must introduce some disturbance when she tries to gain information about the identity of the states being sent can actually be seen as a requirement of consistency, given the impossibility of distinguishing perfectly between non-orthogonal states (cf. (Busch, 1997; Fuchs, 1998)). To see why, consider the simple case of a pair of non-orthogonal states |φ₁⟩ and |φ₂⟩ (the reasoning generalises). A necessary, but not sufficient, condition to be able to distinguish between these states by making some

⁸ In general, Eve could attempt more subtle attacks, for example, not measuring individual systems but blocks of them, or entangling ancilla systems with each qubit and not performing any measurement on these ancillas until after Alice and Bob have started making their announcements. Accordingly, full security proofs need to be equally subtle. See, e.g., (Nielsen and Chuang, 2000), §12.6.5 and refs.
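The 25% error figure for the intercept-and-resend attack can be checked with a toy simulation (our own; it tracks only the measurement statistics, in which measuring a resent qubit in the same basis reproduces the encoded bit while a mismatched basis yields a random outcome, rather than full state vectors):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

alice_bits = rng.integers(2, size=n)
alice_bases = rng.integers(2, size=n)     # 0 = sigma_z, 1 = sigma_x

# Eve intercepts every qubit, measures in a random basis, and resends
# a fresh qubit prepared in the state she found.
eve_bases = rng.integers(2, size=n)
eve_bits = np.where(eve_bases == alice_bases,
                    alice_bits,                     # same basis: correct bit
                    rng.integers(2, size=n))        # wrong basis: random bit

bob_bases = rng.integers(2, size=n)
bob_bits = np.where(bob_bases == eve_bases,
                    eve_bits,                       # matches Eve's resent state
                    rng.integers(2, size=n))        # mismatch: random outcome

# Sifting: keep only positions where Alice's and Bob's bases agree.
sift = alice_bases == bob_bases
error_rate = np.mean(alice_bits[sift] != bob_bits[sift])
print(error_rate)   # close to 0.25
```

In the sifted positions Eve guessed the wrong basis half the time, and in those cases Bob's outcome disagrees with Alice's half the time, giving the 25% error rate the text derives.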


measurement M is that the two states generate different probability distributions over the outcomes of the measurement. We have a system prepared in one or other of these states. Suppose that measuring M did not disturb either |φ₁⟩ or |φ₂⟩. This would mean that by repeating the measurement over and over again on our individual system, we could eventually arrive at a good estimate of the probability distribution that the state of the system generates, as the state remains the same pre- and post-measurement. But knowing the probability distribution generated for the outcomes of M would allow us to see whether the state of the system was |φ₁⟩ or |φ₂⟩, given, by hypothesis, that these two distributions are distinct. Thus it cannot be the case that neither of these non-orthogonal states is left undisturbed by M. Any measurement that would provide information about the identity of the state of the system must therefore lead to a disturbance of at least one of the states in the non-orthogonal set; hence Eve will always betray her presence by introducing errors with some non-zero probability.⁹

c) Realistic quantum cryptographic protocols have to allow for the possibility of noise. In BB84, errors that are detected at the data checking stage could be due either to Eve, or to noise, or to both. To account for this, information reconciliation and privacy amplification protocols were developed (see (Nielsen and Chuang, 2000), §12.6.2 and references therein). Information reconciliation is a process of error correction designed to increase the correlation between Alice's and Bob's strings by making use of the public channel, while giving away as little as possible to Eve. For example, Alice might choose pairs of bits and announce their parity (bit value sum modulo 2), and Bob will announce whether or not he has the same parity for each of his corresponding pairs.
If not, they both discard that pair; if they are the same, Alice and Bob both keep the first bit and discard the second. Knowing the parity of the pair won't tell Eve anything about the value of the retained bit. (This example is from (Gisin et al., 2002).) After a suitable process of reconciliation, Alice and Bob will share the same key to within acceptable errors; but if some of the original errors were due to Eve, it's possible that she possesses a string which has some correlation with theirs. If the original error rate was low enough, however, Alice and Bob are able to implement privacy amplification, which is a process that systematically reduces the correlation between their strings and Eve's (see (Nielsen and Chuang, 2000), §12.6.2).

⁹ To see how the argument generalises, consider a larger non-orthogonal set {|φᵢ⟩}. Suppose each |φᵢ⟩ generated a different probability distribution for the outcomes of M. Then M must disturb at least one element of {|φᵢ⟩} and indeed, one element of every pair-wise orthogonal subset of {|φᵢ⟩}. Consider also another measurement M′ for which at least some of the states of {|φᵢ⟩}, but perhaps not all, generate distinct probability distributions. (This is a minimal condition for a measurement to count as information-gathering for the set.) It's simple to show that the states that generate distinct probability distributions for M′ cannot all be orthogonal, so there is at least some non-orthogonal pair from {|φᵢ⟩} that generates distinct distributions for M′. Applying our previous reasoning, it follows that at least one of these will be disturbed by measurement of M′.

We have focused on one form of quantum key distribution, which proceeds by transmitting qubits prepared in non-orthogonal states. It is also possible to use entanglement to generate a key ((Ekert, 1991); see also (Bennett et al., 1992)). Suppose one had a reliable source of entangled systems, for instance a source that could be relied on to generate the spin singlet state |ψ⁻⟩ = 1/√2(|↑⟩|↓⟩ − |↓⟩|↑⟩). If a large number of such entangled pairs were produced and one of each pair given to Alice and one to Bob, then Alice and Bob can proceed along the same lines as in the BB84 protocol. Each chooses to measure σ_z or σ_x at random on each system, obtaining a random sequence of 0 or 1 outcomes. Then, just as before, they announce which basis they measured in for each system and discard those outcomes where they did not measure in the same basis, once more obtaining a sifted random key. Again, they may then check for Eve's presence. (In this case, when measuring in the same basis, Bob will get the opposite outcome to Alice's. He can simply perform a bit-flip on every bit to obtain the correlated values.) If they wished to, they could even select a subset of the qubits produced by the source to check that the states being produced by the source violate a Bell inequality—that way they can be sure that sneaky Eve has not replaced the putative singlet source with some other source that might provide her with greater information. Quantum key distribution is the aspect of quantum information that has achieved the greatest practical development so far, making use of photon qubits. From the first table-top demonstration models in 1989, key distribution systems have now been demonstrated over distances of tens of kilometers¹⁰.
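The entanglement-based variant can likewise be sketched numerically (our toy simulation; outcome pairs are sampled from the joint Born probabilities of the singlet state). When the bases happen to match, the outcomes are perfectly anti-correlated, so Bob's bit-flip yields a shared key:

```python
import numpy as np

rng = np.random.default_rng(2)

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
singlet = (np.kron(ket0, ket1) - np.kron(ket1, ket0)) / np.sqrt(2)

z_basis = [ket0, ket1]
x_basis = [(ket0 + ket1) / np.sqrt(2), (ket0 - ket1) / np.sqrt(2)]
bases = [z_basis, x_basis]

n = 4000
alice_bases = rng.integers(2, size=n)
bob_bases = rng.integers(2, size=n)
alice_bits, bob_bits = np.empty(n, int), np.empty(n, int)

for k in range(n):
    # Joint Born probabilities for the four possible outcome pairs.
    probs = np.array([[abs(np.vdot(np.kron(a, b), singlet))**2
                       for b in bases[bob_bases[k]]]
                      for a in bases[alice_bases[k]]])
    a_out, b_out = np.unravel_index(rng.choice(4, p=probs.ravel()), (2, 2))
    alice_bits[k], bob_bits[k] = a_out, b_out

# Sift, then let Bob flip his bits: the singlet anti-correlates matching bases.
sift = alice_bases == bob_bases
key_alice = alice_bits[sift]
key_bob = 1 - bob_bits[sift]
print(np.array_equal(key_alice, key_bob))   # True
```

When the bases differ, the outcomes are uncorrelated and those rounds are discarded in the sifting step, exactly as in BB84.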
The DARPA Quantum Network, a quantum key distribution network involving half-a-dozen nodes, has been running continuously since 2004 under the streets of Cambridge, Massachusetts, linking Harvard and Boston Universities. Anton Zeilinger's group in Vienna is leading a collaboration (Space-QUEST) involving the European Space Agency that will see an entangled photon source on the International Space Station by 2012, for the distribution of entanglement to widely separated ground stations from space; a quite remarkable prospect that would allow testing of the properties of entanglement over longer distances than is possible on Earth, as well as key distribution between very widely separated sites¹¹. While quantum cryptography is not exclusively concerned with quantum key distribution, also including discussion of other kinds of protocols such as bit-commitment (of which we will hear a little more later), it is true to say that key distribution has been the dominant interest. It is therefore important to note

¹⁰ The current record for free-space quantum key distribution (as of June 2007) is 144 km, from La Palma in the Canaries to Tenerife (Ursin et al., 2007).
¹¹ See www.quantum.at/quest.


that in the context of key distribution, quantum cryptography is not concerned with the actual transmission of secret messages, or with hiding messages in quantum systems. Rather, it deals with the problem of establishing certain necessary conditions for the classical transmission of secret messages, in a way that could not be achieved classically. The keys that Alice and Bob arrive at after such pains, using their transmitted quantum systems, are not themselves messages, but a means of encoding real messages secretly.

4.1.4 Entanglement-Assisted Communication

In his lectures Wittgenstein used to say: Don't look for the meaning, look for the use.¹² Misappropriating gently, we might describe quantum information theorists as adopting just such an attitude vis-à-vis entanglement. The strategy has paid off handsomely. Focusing on what one can do with entanglement, considered as a communication and computational resource, the theory of entanglement has blossomed enormously, with the development of a range of quantitative measures of entanglement, intensive study of different kinds of bi-partite and multi-partite entanglement, and detailed criteria for the detection and characterisation of entanglement (see (Bruss, 2002) for a succinct review; (Eisert and Gross, 2006) for more on multi-particle entanglement). The conceptual framework provided by questions of communication and computation was essential to presenting the right kinds of questions and the right kinds of tools to drive these developments.

¹² Note: this doesn't appear in print. John Wisdom told Roger White that it was said in lectures (private communication).

A state is called entangled if it is not separable, that is, if it cannot be written in the form:

|Ψ⟩_AB = |φ⟩_A |ψ⟩_B, for pure states, or ρ_AB = Σᵢ αᵢ ρᵢ^A ⊗ ρᵢ^B, for mixed states,

where αᵢ > 0, Σᵢ αᵢ = 1, and A, B label the two distinct subsystems. The case of pure states of bipartite systems is made particularly simple by the existence of the Schmidt decomposition—such states can always be written in the form:

|Ψ⟩_AB = Σᵢ √pᵢ |φ̄ᵢ⟩_A |ψ̄ᵢ⟩_B,   (4.1)

where {|φ̄ᵢ⟩}, {|ψ̄ᵢ⟩} are orthonormal bases for systems A and B respectively, and the pᵢ are the (non-zero) eigenvalues of the reduced density matrix of A. The number of coefficients in any decomposition of the form (4.1) is fixed for a given state |Ψ⟩_AB; hence if a state is separable (unentangled), there is only one term in the Schmidt decomposition, and conversely. For the mixed-state case this simple test does not exist, but progress has been made in providing operational criteria for entanglement: necessary and sufficient conditions for 2 ⊗ 2 and 2 ⊗ 3 dimensional systems, and necessary conditions for separability (sufficient conditions for entanglement) otherwise (Horodecki et al., 1996;


Peres, 1996). (See (Seevinck and Uffink, 2001; Seevinck and Svetlichny, 2002) for discussion of N-party criteria.) It is natural to think that shared entanglement could be a useful communication-theoretic resource; that sharing a pair of systems in an entangled state would allow you to do things that you could not otherwise do. (A familiar one: violate a Bell inequality.) The essence of entangled systems, after all, is that they possess global properties that are not reducible to local ones; and we may well be able to utilise these distinctive global properties in trying to achieve some communication task or distributed computational task. The central idea that entanglement—genuinely quantum correlation—differs from any form of classical correlation (and therefore may allow us to do things a shared classical resource would not) is enshrined in the central law (or postulate) of entanglement theory: that the amount of entanglement that two parties share cannot be increased by local operations that each party performs on their own system and classical communication between them. This is a very natural constraint when one reflects that one shouldn't be able to create shared entanglement ex nihilo. If Alice and Bob are spatially separated but share a separable state, then no sequence of actions they might perform locally on their own systems, even chains of conditional measurements (where Bob waits to see what result Alice gets before he chooses what he will do; and so on), will turn the separable state into an entangled one. Classical correlations may increase, but the state will remain separable¹³. Possessing such a non-classical shared resource, then, we can proceed to ask what one might be able to do with it. The two paradigmatic cases of the use of entanglement to assist communication are superdense coding (Bennett and Wiesner, 1992) and teleportation (Bennett et al., 1993).
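The Schmidt-decomposition test for pure-state entanglement described above is, computationally, just a singular value decomposition of the coefficient matrix. A brief sketch (our own, in NumPy):

```python
import numpy as np

def schmidt_coefficients(state, dA, dB):
    """Schmidt coefficients sqrt(p_i) of a pure bipartite state.

    Writing |Psi> = sum_{jk} c_{jk} |j>_A |k>_B, the Schmidt coefficients
    are the singular values of the coefficient matrix c.
    """
    c = state.reshape(dA, dB)
    return np.linalg.svd(c, compute_uv=False)

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

product = np.kron(ket0, (ket0 + ket1) / np.sqrt(2))                  # separable
singlet = (np.kron(ket0, ket1) - np.kron(ket1, ket0)) / np.sqrt(2)   # entangled

# A separable pure state has exactly one non-zero Schmidt coefficient;
# more than one signals entanglement.
print(schmidt_coefficients(product, 2, 2))
print(schmidt_coefficients(singlet, 2, 2))
```

The product state yields singular values (1, 0), a single Schmidt term; the singlet yields (1/√2, 1/√2), two equal terms, the signature of maximal entanglement for two qubits.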
4.1.4.1 Superdense Coding

Superdense coding is a protocol that allows you to send classical information in a surprising way using shared entanglement. If Alice and Bob share a maximally entangled state of two qubits, such as the singlet state, then Alice will be able to transmit to Bob two classical bits while sending him only one qubit—twice as much as the maximum we usually expect to be able to send with a single qubit, and apparently in violation of the Holevo bound! The trick is that Alice may use a local unitary operation to change the global state of the entangled pair. Applying one of the Pauli operators {1, σx, σy, σz} to her half of the entangled pair, she can flip the joint state into any one of the four maximally entangled Bell states (see Table 4.1), a choice of one from four, corresponding to two bit values (00, 01, 10 or 11). If Alice now sends Bob

13 If Alice and Bob were in the same location, though, it would be easy for them to turn a separable state into an entangled state, as they can perform operations on the whole of the tensor product Hilbert space (e.g. perform a unitary on the joint space mapping |↑⟩A|↑⟩B to 1/√2(|↑⟩A|↓⟩B − |↓⟩A|↑⟩B)). When spatially separated, they may only perform operations on the individual systems' Hilbert spaces.

FIRST STEPS WITH QUANTUM INFORMATION

|φ+⟩ = 1/√2(|↑⟩|↑⟩ + |↓⟩|↓⟩) = −iσy ⊗ 1 |ψ−⟩
|φ−⟩ = 1/√2(|↑⟩|↑⟩ − |↓⟩|↓⟩) = −σx ⊗ 1 |ψ−⟩
|ψ+⟩ = 1/√2(|↑⟩|↓⟩ + |↓⟩|↑⟩) = σz ⊗ 1 |ψ−⟩
|ψ−⟩ = 1/√2(|↑⟩|↓⟩ − |↓⟩|↑⟩) = 1 ⊗ 1 |ψ−⟩

Table 4.1 The four Bell states, a maximally entangled basis for 2 ⊗ 2 dim. systems

her half of the entangled pair, he can simply perform a measurement in the Bell basis to see which of the four states Alice has produced, thereby gaining two bits of information (fig. 4.1).
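The whole protocol fits in a short numpy simulation (my rendering, not from the text; the dictionary keys are the two-bit messages). It verifies the relations of Table 4.1: each of Alice's strictly local Pauli operations flips the shared singlet into a distinct Bell state, which Bob's Bell-basis measurement then identifies with certainty.

```python
import numpy as np

I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)

# The shared singlet and the rest of the Bell basis (Table 4.1)
psi_m = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2)
phi_p = (np.kron(up, up) + np.kron(down, down)) / np.sqrt(2)
phi_m = (np.kron(up, up) - np.kron(down, down)) / np.sqrt(2)
psi_p = (np.kron(up, down) + np.kron(down, up)) / np.sqrt(2)
bell_basis = {"00": phi_p, "01": phi_m, "10": psi_p, "11": psi_m}

# Alice's encodings: a Pauli on HER qubit alone (identity on Bob's side),
# matching Table 4.1's relations to the singlet
encodings = {"00": -1j * np.kron(sy, I2), "01": -np.kron(sx, I2),
             "10": np.kron(sz, I2), "11": np.kron(I2, I2)}

for bits, local_op in encodings.items():
    state = local_op @ psi_m          # Alice acts locally ...
    # ... yet the GLOBAL state flips to an orthogonal Bell state, which a
    # Bell-basis measurement by Bob identifies with probability 1:
    probs = {b: abs(np.vdot(v, state)) ** 2 for b, v in bell_basis.items()}
    assert np.isclose(probs[bits], 1.0)
print("two classical bits recovered; only one qubit was transmitted")
```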

Fig. 4.1. Superdense coding: time runs along the horizontal axis. A maximally entangled state of systems 1 and 2 is prepared by Bob (B), here by the action of a Hadamard gate, H, which performs a rotation of π around an axis at an angle of π/4 in the z–x plane, followed by a controlled-NOT operation—the circle indicates the control qubit; the point of the arrow, the target, to which σx is applied if the control is in the 0 computational state. System 1 is sent to Alice (A), who may do nothing, or perform one of the Pauli operations. On return of system 1, Bob performs a measurement in the Bell basis, here by applying a controlled-NOT operation followed by the Hadamard gate. This allows him to infer which operation was performed by Alice.

But what about the Holevo bound? How can it be that a single qubit is carrying two classical bits in this protocol? The simple answer is that it is not. The presence of both qubits is essential for the protocol to work; and it is the pair, as a whole, that carries the two bits of information; therefore there is no genuine conflict with the Holevo bound. What is surprising, perhaps, is the time ordering in the protocol. There would be no puzzle at all if Alice simply encoded two classical bit values into the state of a pair of qubits and sent the pair to Bob (and she could choose any orthogonal basis for the pair, whether separable or entangled, to do this, so long as Bob knows which she opts for). But although there are two qubits involved in the protocol, Alice doesn't make her choice of classical bit value until one half of the entangled pair is with her and one half


with Bob. It then looks puzzling how, when she has access only to one system, she could encode information into both.14 And one might think that it must be the qubit she sends to Bob that really contains the information, from considerations of locality and continuity. It turns out that this latter thought rests on a mistake, however, one which also proves significant in understanding teleportation; we will discuss it in §4.2.4. In truth, superdense coding is to be understood in terms of a simple physical mechanism, albeit a non-classical one. The protocol relies on the fact that in the presence of entanglement, local operations can have a non-trivial effect on the global state of the system, that is, can change the irreducibly global properties of the joint system. In particular, it is possible to span bases of maximally entangled states simply by performing local operations (Bennett and Wiesner, 1992). Alice, performing her unitary on her system, is able to make a change in the global properties of the joint system; a change, note, that is in fact as great as it could be, flipping the original joint state into one orthogonal to it. It is because of this physical property of maximally entangled states that Alice is able to encode two bit values into the global state of the joint system whenever she wishes, even though she only has access to one half of the pair. (See (Timpson and Brown, 2002) and (Timpson, 2005) for discussion of whether this sort of phenomenon amounts to a new form of non-locality or not.)

4.1.4.2 Teleportation

The notion of teleportation is familiar from science fiction: objects are made to disappear (dematerialise) from one location and reappear (re-materialise) exactly as they were before at another, distant, location. Anyone with a cursory knowledge of quantum mechanics might think that there were fundamental physical reasons why such a process would be impossible.
To make something, or someone, re-appear exactly as before, it would seem that we would need to be able to determine their prior physical state exactly. But this would require knowing the quantum states of each individual component of the person or thing, down to the last atom, presumably; and we know that it is just not possible to determine unknown quantum states; and we may well disturb things in trying to do so. So teleportation must be physically impossible. But is it? Surprisingly, teleportation does turn out to be possible if we make use of some entanglement. In quantum teleportation Alice and Bob again share a pair of particles in a maximally entangled state. If Alice is presented with some system in an unknown quantum state, then she is able to make this very state re-appear at Bob's location, while it is destroyed at hers (fig. 4.2). Moreover—and this is the remarkable bit—nothing depending on the identity of the unknown state crosses the region between. Superdense coding uses entanglement to assist classical communication,

14 The communication in this protocol goes in two steps: first the sharing of the entanglement, then the sending by Alice of her qubit to Bob. One way to think of things is that sharing entanglement is a way of saving up some communication in advance, whose content you can determine later, at any time you wish. Compare the discussion in (Mermin, 2001a) of a similar point regarding teleportation.


but in quantum teleportation, entanglement is being used to transmit something purely quantum mechanical—an unknown quantum state, intact, from Alice to Bob. It therefore deserves to be known as the first protocol genuinely concerned with quantum information transmission proper; although we should note that the protocol was devised a little before the full-blown concept of quantum information had been developed by Schumacher.

Fig. 4.2. Teleportation

Let's consider the standard example using qubits in more detail (Bennett et al., 1993). We begin with Alice and Bob sharing one of the four Bell states, let's say the singlet state |ψ−⟩. Alice is presented with a qubit in some unknown state |χ⟩ = α|↑⟩ + β|↓⟩ and her aim is to transmit this state to Bob. By performing a suitable joint measurement on her half of the entangled pair and the system whose state she is trying to transmit (in this example, a measurement in the Bell basis), Alice will change the state of Bob's half of the entangled pair into a state that differs from |χ⟩ by one of four unitary transformations, depending on what the outcome of her measurement was. If a record of the outcome of Alice's measurement is then sent to Bob, he may perform the required operation to obtain a system in the state Alice was trying to send (fig. 4.3).


Fig. 4.3. Teleportation. A pair of systems is first prepared in an entangled state and shared between Alice and Bob, who are widely spatially separated. Alice also possesses a system in an unknown state |χ⟩. Once Alice performs her Bell-basis measurement, two classical bits recording the outcome are sent to Bob, who may then perform the required conditional operation to obtain a system in the unknown state |χ⟩. (Continuous black lines represent qubits, dotted lines represent classical bits.)


The end result of the protocol is that Bob obtains a system in the state |χ⟩, with nothing that bears any relation to the identity of this state having traversed the space between him and Alice. Only two classical bits recording the outcome of Alice's measurement were sent between them; and the values of these bits are completely random, with no dependence on the parameters α and β. Meanwhile, no trace of the identity of the unknown state remains in Alice's region, as required, of course, to accord with the no-cloning theorem (the state of her original system will usually now be maximally mixed). The state has indeed disappeared from Alice's region and reappeared in Bob's, so 'teleportation' really does seem an appropriate name for this phenomenon.

The formal description of the process is straightforward. We begin with system 1 in the unknown state |χ⟩ and Alice and Bob sharing a pair of systems (2 and 3) in the singlet state |ψ−⟩. The total state of the three systems at the beginning of the protocol is therefore simply

|χ⟩1 |ψ−⟩23 = (1/√2)(α|↑⟩1 + β|↓⟩1)(|↑⟩2|↓⟩3 − |↓⟩2|↑⟩3).   (4.2)

Notice that at this stage, the state of system 1 factorises from that of systems 2 and 3; and so in particular, the state of Bob's system is independent of α and β. We may re-write this initial state in a suggestive manner, though:

|χ⟩1 |ψ−⟩23 = (1/√2)(α|↑⟩1|↑⟩2|↓⟩3 + β|↓⟩1|↑⟩2|↓⟩3 − α|↑⟩1|↓⟩2|↑⟩3 − β|↓⟩1|↓⟩2|↑⟩3)   (4.3)

= (1/2)[ |φ+⟩12 (α|↓⟩3 − β|↑⟩3) + |φ−⟩12 (α|↓⟩3 + β|↑⟩3) + |ψ+⟩12 (−α|↑⟩3 + β|↓⟩3) + |ψ−⟩12 (−α|↑⟩3 − β|↓⟩3) ].   (4.4)

The basis used is the set

{|φ±⟩12 |↑⟩3, |φ±⟩12 |↓⟩3, |ψ±⟩12 |↑⟩3, |ψ±⟩12 |↓⟩3},

that is, we have chosen (as we may) to express the total state of systems 1, 2 and 3 using an entangled basis for systems 1 and 2, even though these systems are quite independent. But so far, of course, all we have done is re-written the state in a particular way; nothing has changed physically and it is still the case that it is really systems 2 and 3 that are entangled and wholly independent of system 1, in its unknown state. Looking closely at (4.4) we notice that the relative states of system 3 with respect to particular Bell basis states for 1 and 2 have a very simple relation to


the initial unknown state |χ⟩; they differ from |χ⟩ by one of four local unitary operations:

|χ⟩1 |ψ−⟩23 = (1/2)[ |φ+⟩12 (−iσy3 |χ⟩3) + |φ−⟩12 (σx3 |χ⟩3) + |ψ+⟩12 (−σz3 |χ⟩3) + |ψ−⟩12 (−13 |χ⟩3) ],   (4.5)

where the σi3 are the Pauli operators acting on system 3 and 1 is the identity. To re-iterate, though, only system 1 actually depends on α and β; the state of system 3 at this stage of the protocol (its reduced state, as it is a member of an entangled pair) is simply the maximally mixed (1/2)1. Alice is now going to perform a measurement. If she were simply to measure system 1 then nothing of interest would happen—she would obtain some result and affect the state of system 1, but systems 2 and 3 would remain in the same old state |ψ−⟩. However, as she has access to both systems 1 and 2, she may instead perform a joint measurement, and now things get interesting. In particular, if she measures 1 and 2 in the Bell basis, then after the measurement we will be left with only one of the terms on the right-hand side of eqn. (4.5), at random; and this means that Bob's system will have jumped instantaneously into one of the states −iσy3|χ⟩3, σx3|χ⟩3, −σz3|χ⟩3 or −|χ⟩3, with equal probability. But how do things look to Bob? As he neither knows whether Alice has performed her measurement, nor, if she has, what the outcome turned out to be, he will still ascribe the same, original, density operator to his system—the maximally mixed state.15 No measurement on his system could yet reveal any dependence on α and β. To complete the protocol, therefore, Alice needs to send Bob a message instructing him which of four unitary operators to apply (iσy, σx, −σz, −1) in order to make his system acquire the state |χ⟩ with certainty; for this she will need to send two bits.16 With these bits in hand, Bob applies the needed transformation and obtains a system in the state |χ⟩.17 We should note that this quantum mechanical process differs from science fiction versions of teleportation in at least two ways, though.
First, it is not matter that is transported, but simply the quantum state |χ⟩; and second, the protocol is not instantaneous, but must wait for its completion on the arrival of the classical bits sent from Alice to Bob. Whether or not the quantum protocol

15 Notice that an equal mixture of the four possible post-measurement states of his system results in the density operator (1/2)1.
16 Two bits are clearly sufficient; for the argument that they are strictly necessary, see (Bennett et al., 1993), fig. 2.
17 In this description, as in the original (Bennett et al., 1993) treatment, we have assumed that a process of collapse occurs after Alice's measurement, in order to pick out, probabilistically, a definite state of Bob's system. It is straightforward, however, to give no-collapse versions of the teleportation protocol. (Vaidman, 1994) provides an Everettian description and (Braunstein, 1996) a detailed general discussion of teleportation in a no-collapse setting. See (Timpson, 2006) for further discussion.


approximates to the science fiction ideal, however, it remains a very remarkable phenomenon from the information-theoretic point of view.18 For consider what has been achieved. An unknown quantum state has been sent to Bob; and how else could this have been done? Only by Alice sending a quantum system in the state |χ⟩ to Bob,19 for she cannot determine the state of the system and send a description of it instead. If, however, Alice did per impossibile somehow learn the state and send a description to Bob, then systems encoding that description would have to be sent between them. In this case something that does bear a relation to the identity of the state is transmitted from Alice to Bob, unlike in teleportation. Moreover, sending such a description would require a very great deal of classical information, as in order to specify a general state of a two-dimensional quantum system, two continuous parameters need to be specified. The picture we are left with, then, is that in teleportation there has been a transmission of something that is inaccessible at the classical level; in the transmission this information has been in some sense disembodied; and finally, the transmission has been very efficient—requiring, apart from prior shared entanglement, the transfer of only two classical bits. The initial entanglement that Alice and Bob shared, however, will have been used up at the end of the protocol. If Alice wanted to teleport any more unknown states to Bob, they would need to be in possession of more entangled pairs. While the formal description of teleportation is, as we have seen, simple, the question of how one ought to understand what is going on has been extremely vexed. We will return to this question in §4.2.4.
It is worth noting, however, that teleportation, just like superdense coding, is driven by the fact that local operations can induce substantive differences in the global properties of entangled systems (Braunstein et al., 2000); again, specifically, by the fact that maximally entangled bases can be spanned by local unitary operations. Finally, we should note that since teleportation is a linear process, it may be used for the process of entanglement swapping. Let's suppose that Alice shares one maximally entangled state with Bob and another with Charles. If she performs the teleportation protocol on her half of the Alice–Charles entangled pair, then the result will be that the initial entanglement between Alice and Bob will be destroyed, and the initial entanglement between Alice and Charles will be destroyed, but Charles and Bob will now share a maximally entangled pair when they did not before. Thus entanglement can be swapped from Alice–Charles to Charles–Bob, at the cost of using up an entangled pair that Alice and Bob shared.

18 Interestingly, it can be argued that quantum teleportation is perhaps not so far from the sci-fi ideal as one might initially think. (Vaidman, 1994) suggests that if all physical objects are made from elementary particles, then what is distinctive about them is their form (i.e. their particular state) rather than the matter from which they are made. Thus it seems one could argue that objects really are teleported in the protocol.
19 Or by her sending Bob a system in a state explicitly related to |χ⟩ (cf. (Park, 1970)).
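The derivation of eqns (4.2)–(4.5) can be checked step by step in a few lines. The following is a numpy sketch of mine (not from the text), using the collapse picture of footnote 17: each Bell outcome for systems 1 and 2 occurs with probability 1/4, and Bob's listed correction returns his qubit to |χ⟩ in every case.

```python
import numpy as np
from functools import reduce

kron = lambda *xs: reduce(np.kron, xs)
up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)
I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# Unknown state |χ⟩ = α|↑⟩ + β|↓⟩ on system 1 (any α, β with |α|²+|β|² = 1)
alpha, beta = 0.6, 0.8j
chi = alpha * up + beta * down

# Systems 2 and 3 shared in the singlet; total state |χ⟩1|ψ−⟩23 of eqn (4.2)
psi_m = (kron(up, down) - kron(down, up)) / np.sqrt(2)
total = kron(chi, psi_m)

# Alice's Bell outcomes on systems 1,2, each paired with Bob's correction
phi_p = (kron(up, up) + kron(down, down)) / np.sqrt(2)
phi_m = (kron(up, up) - kron(down, down)) / np.sqrt(2)
psi_p = (kron(up, down) + kron(down, up)) / np.sqrt(2)
outcomes = [(phi_p, 1j * sy), (phi_m, sx), (psi_p, -sz), (psi_m, -I2)]

for bell, correction in outcomes:
    # Collapse picture: project systems 1,2 onto this Bell outcome
    P = kron(np.outer(bell, bell.conj()), I2)
    collapsed = P @ total
    prob = np.linalg.norm(collapsed) ** 2
    assert np.isclose(prob, 0.25)     # the four outcomes are equiprobable
    post = kron(np.eye(4), correction) @ (collapsed / np.sqrt(prob))
    # Strip off the (now known) Bell factor to read out Bob's qubit
    bob = bell.conj() @ post.reshape(4, 2)
    assert np.isclose(abs(np.vdot(chi, bob)), 1.0)   # Bob now holds |χ⟩
print("all four outcomes teleport the state after Bob's correction")
```

Note that the outcome probabilities (0.25 each) carry no dependence on α and β, matching the observation that the two classical bits sent to Bob are completely random.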


4.1.4.3 Quantifying Entanglement

The basic examples we have seen of superdense coding and teleportation both make use of maximally entangled pairs of qubits. If the qubits were less than maximally entangled then the protocols would not work properly, perhaps not at all. Given that entanglement is a communication resource that will be used up in a process like teleportation, it is natural to want to quantify it. The amount of entanglement in a Bell state, the amount required to perform teleportation of a qubit, is defined as one ebit. The general theory of quantifying entanglement takes as its central axiom the condition that we have already met: no increase of entanglement under local operations and classical communication. In the case of pure bipartite entanglement, the measure of the degree of entanglement turns out to be effectively unique, given by the von Neumann entropy of the reduced states of the entangled pair (Popescu and Rohrlich, 1997; Donald et al., 2002). In the case of mixed-state entanglement, there exists a range of distinct measures. (Vedral et al., 1997; Vedral and Plenio, 1998) propose criteria that any adequate measure must satisfy and discuss relations between a number of measures.
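The pure-state measure can be illustrated numerically (a sketch of mine, assuming numpy): the von Neumann entropy S(ρ) = −Tr ρ log₂ ρ of either reduced state of a Bell pair comes out as exactly one ebit, while any product state scores zero.

```python
import numpy as np

def entropy_of_reduced(state):
    """Von Neumann entropy (in bits) of qubit A's reduced state,
    for a two-qubit pure state given as a length-4 amplitude vector."""
    psi = state.reshape(2, 2)
    rho_A = psi @ psi.conj().T                 # partial trace over qubit B
    evals = np.linalg.eigvalsh(rho_A)
    evals = evals[evals > 1e-12]               # drop zero eigenvalues
    return float(-np.sum(evals * np.log2(evals)) + 0.0)

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)     # |φ+⟩: maximally entangled
product = np.kron([1.0, 0.0], [0.6, 0.8])      # |↑⟩ ⊗ (0.6|↑⟩ + 0.8|↓⟩)

assert np.isclose(entropy_of_reduced(bell), 1.0)      # one ebit
assert np.isclose(entropy_of_reduced(product), 0.0)   # no entanglement
print(entropy_of_reduced(bell))   # ≈ 1.0
```

States of intermediate entanglement (e.g. α|↑↑⟩ + β|↓↓⟩ with |α| ≠ |β|) score strictly between 0 and 1 on this measure.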

4.1.5 Quantum Computers

Richard Feynman was the prophet of quantum computation. He pointed out that it seems one cannot simulate the evolution of a quantum mechanical system efficiently on a classical computer. He took this to imply that there might be computational benefits to be gained if computations are carried out using quantum systems themselves rather than classical systems; and he went on to describe a universal quantum simulator (Feynman, 1982). However, it is with Deutsch's introduction of the concept of the universal quantum computer that the field really begins (Deutsch, 1985). In a quantum computer, we want to use quantum systems and their evolution to perform computational tasks. We can think of the basic components of a quantum computer as a register of qubits and a system of computational gates that can be applied to these qubits to perform various evolutions and evaluate various functions. States of the whole register of qubits in the computational basis would be |0⟩|0⟩|0⟩...|0⟩, for example, or |0⟩|1⟩|0⟩...|1⟩, which can also be written |000...0⟩ and |010...1⟩ respectively; these states are analogous to the states of a classical register of bits in a normal computer. At the end of a computation, one will want the register to be left in one of the computational basis states so that the result may be read out. The immediately exciting thing about basing one's computer on qubits is that it looks as if they might be able to provide one with massive parallel processing. Suppose we prepared each of the N qubits in our register in an equal superposition of 0 and 1; then the state of the whole register will end up being an equal superposition of all the 2^N possible sequences of 0s and 1s:

(1/√(2^N))(|0000...00⟩ + |0000...01⟩ + ... + |1111...11⟩).

A classical N-bit register can store one of 2^N numbers; an N-qubit register looks like it might store 2^N numbers simultaneously, an enormous advantage. Now if we have an operation that evaluates a function of an input string, the linearity of quantum mechanics ensures that if we perform this operation on our superposed register, we will evaluate the function simultaneously for all possible inputs, ending up with a register in which all the 2^N outputs are superposed! This might look promising, but the trouble is, of course, that it is not possible to read out all the values that are superposed in this state. Measuring in the computational basis to read out an outcome, we will get a "collapse" to some one of the answers, at random. Thus despite all the quantum parallel processing that went on, it proves very difficult to read much of it out. In this naive example, we have done no better than if we had evaluated the function on a single input, as classically. It is for this reason that the design of good quantum algorithms is a very difficult task: one needs to make subtle use of other quantum effects, such as the constructive and destructive interference between different computational paths, in order to make sure that we can read out useful information at the end of the computation, i.e., that we can improve on the efforts of classical computers. The possible evolutions of states of quantum mechanical systems are given by unitary operators. A universal quantum computer will thus be a system that can (using finite means) apply any unitary operation to its register of qubits. It turns out that a relatively small set of one- and two-qubit quantum gates is sufficient for a universal quantum computer.20 A quantum gate is a device that implements a unitary operation that acts on one or more qubits (we have already seen some schematic examples in figs. 4.1 and 4.3).
By combining different sequences of gates (analogously to logic gates in a circuit diagram) we can implement different unitary operations on the qubits they act on. A set of gates is universal if by combining elements of the set, we can build up any unitary operation on N qubits to arbitrary accuracy. So what can quantum computers do? First of all, they can compute anything that a classical Turing machine can compute; such computations correspond to permutations of computational basis states and can be achieved by a suitable subset of unitary operations. Second, they can’t compute anything that a classical Turing machine can’t. This is most easily seen in the following way (Ekert and Jozsa, 1996). We can picture a probabilistic Turing machine as following one branch of a tree-like structure of computational paths, with the nodes of the tree corresponding to computational states. The edges leading from the nodes correspond to the different computational steps that could be made from that state. Each path is labelled with its probability and the probability of a final, halting, state is given by summing the probabilities of each of the paths leading to that state. We may 20 See for example (Nielsen and Chuang, 2000, §4.5). We are considering the quantum network model of quantum computation which is more intuitive and more closely linked to experimental applications than the alternative quantum Turing machine model that Deutsch began with. The two models were shown to be equivalent in (Yao, 1993).
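For illustration, here is the simplest case of gate composition, rendered as matrix products (a sketch of mine, assuming numpy and the standard control-on-1 CNOT convention): a Hadamard followed by a CNOT, the two-gate circuit that prepares a Bell state from |00⟩.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)     # one-qubit Hadamard gate
CNOT = np.array([[1, 0, 0, 0],                   # two-qubit controlled-NOT
                 [0, 1, 0, 0],                   # (flips target when the
                 [0, 0, 0, 1],                   #  control qubit is |1⟩)
                 [0, 0, 1, 0]])

# Gates compose by matrix multiplication (rightmost gate acts first):
circuit = CNOT @ np.kron(H, np.eye(2))

zero_zero = np.array([1, 0, 0, 0])               # the register state |00⟩
bell_state = circuit @ zero_zero

# Two gates from a small universal set already suffice to entangle:
assert np.allclose(bell_state, np.array([1, 0, 0, 1]) / np.sqrt(2))
print("circuit output is (|00> + |11>)/sqrt(2)")
```

Larger circuits work the same way: each time slice is a tensor product of one- and two-qubit gates (padded with identities), and the whole circuit is the product of the slices.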

THE CONCEPT(S) OF INFORMATION


see a quantum computer in a similar fashion, but this time with the edges connecting the nodes labelled with the appropriate probability amplitude for the transition. The quantum computer follows all of the different computational paths at once, in a superposition; and because we have probability amplitudes, the possibility of interference between the different computational paths exists. However, if we wished, we could program a classical computer to calculate the list of configurations of the quantum computer and the complex probability amplitudes attached to them. This would allow us to calculate the correct probabilities for the final states, which we could then simulate by tossing coins. Thus a quantum computer could be simulated by a probabilistic Turing machine; but such a simulation is very inefficient. The advantage of quantum computers lies not, then, with what can be computed, but with their efficiency. In computational complexity, the crudest measure of whether a computational task is tractable or not, or an algorithm efficient, is given by seeing how the resources required for the computation scale with increased input size. If the resources scale polynomially with the size of the input in bits, the task is deemed tractable. If they do not, in which case the resources are said to depend exponentially on the input size, the task is called hard or intractable. A breakthrough in quantum computation was achieved when (Shor, 1994) presented an efficient algorithm for factoring on a quantum computer, a task for which it is believed no efficient classical algorithm exists.21 Hence quantum computers provide an exponential speed-up over the best known classical algorithms for factoring; and this is strong evidence that quantum computers are more powerful than classical computers. Another very important quantum algorithm is due to (Grover, 1996).
This algorithm also provides a speed-up, although not an exponential one, over classical methods for searching an unstructured database. For a database of size n, the algorithm allows the desired object to be found in on the order of √n steps, rather than the order of n steps one would expect classically. (A good review of quantum computation up to and including the development of Shor's algorithm is provided by (Ekert and Jozsa, 1996).)
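The square-root speed-up can be seen in a small simulation (a sketch of mine, not from the text; it assumes numpy, implements the oracle as a sign flip on the marked item, and the Grover diffusion step as inversion about the mean):

```python
import numpy as np

n, marked = 64, 23                           # database size; item to find
state = np.full(n, 1 / np.sqrt(n))           # uniform superposition over items

def oracle(v):
    w = v.copy()
    w[marked] *= -1                          # flip the sign of the target item
    return w

def diffusion(v):
    return 2 * v.mean() - v                  # inversion about the mean

# About (pi/4) * sqrt(n) iterations suffice -- here 6, versus ~n classically
iterations = int(round(np.pi / 4 * np.sqrt(n)))
for _ in range(iterations):
    state = diffusion(oracle(state))

probs = state ** 2
assert probs.argmax() == marked
print(iterations, probs[marked])             # 6 iterations; success prob ≈ 0.997
```

Running the loop further would overshoot: the success probability oscillates, which is why the iteration count must be chosen, not merely maximised.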

4.2 The Concept(s) of Information

Having reviewed some of the basic features of quantum information theory, it's time we were a little more precise about some conceptual matters; specifically, more precise about just what information in this theory is supposed to be. 'Information' is a notoriously promiscuous term with a marked capacity for dulling critical capacities: it is used in different ways in a large variety of different contexts across the sciences and in everyday life, in various technical and non-technical uses; and typically little more than lip service is paid to the ensuing

21 Thus quantum computers would destroy the security of the widely-used RSA public-key protocol mentioned earlier. It's therefore perhaps comforting that what quantum mechanics takes with one hand (ease of factoring, therefore violating state-of-the-art security) it gives back with the other (quantum key distribution).


conceptual distinctness of these various uses. Often the introduction of a neologism would be preferable to taxing further the sadly over-worked 'information'. (See (Timpson, 2004b) for discussion of distinctions between the everyday semantic/epistemic notion of information and the concepts of information that arise in discussions of information theory.) Here we will concern ourselves with the question: What is quantum information? It is commonly supposed that this question has not yet received, perhaps cannot be expected to receive, a definite or illuminating answer. Vide the Horodeckis:

Quantum information, though not precisely defined, is a fundamental concept of quantum information theory. (Horodecki et al., 2006)

And Jozsa:

|ψ⟩ may be viewed as a carrier of "quantum information" which ... we leave ... undefined in more fundamental terms ... Quantum information is a new concept with no classical analogue ... In more formal terms, we would aim to formulate and interpret quantum physics in a way that has a concept of information as a primary fundamental ingredient. Primary fundamental concepts are ipso facto undefined (as a definition amounts to a characterization in yet more fundamental terms) and they acquire meaning only afterward, from the structure of the theory they support. (Jozsa, 2004)

However, I shall demur from this. Given a proper understanding of the meaning and significance of the coding theorems, it becomes clear that quantum information already admits of a perfectly precise and adequate definition; and moreover, that there exist very strong analogies (pace Jozsa) between classical and quantum information. Both may be seen as species of a single genus. In addition, the ontological status of quantum information can be settled: I shall argue that quantum information is not part of the material contents of the world. In both classical and quantum information theory, we will see, the term 'information' functions as an abstract, not a concrete, noun.22

4.2.1 Coding Theorems: Both What and How Much

Discussions of information theory, quantum and classical, generally begin with an important caveat concerning the scope of their subject matter. The warnings typically take something like the following form:

Note well, reader: Information theory doesn't deal with the content or usefulness of information, rather it deals only with the quantity of information.

Now while there is obviously an important element of truth in statements such as these, they can also be seriously misleading, in two interrelated ways. First, the distinction between the technical notions of information deriving from information theory and the everyday semantic/epistemic concept is not sufficiently noted; for it may easily sound as if information theory does at least describe the amount of information in a semantic/epistemic sense that may be around. But this is not so. In truth we have two quite distinct concepts (or families

22 This lesson already features in related ways in (Timpson, 2004b; Timpson, 2005; Timpson, 2006).


of concepts)—call them 'informatione' and 'informationt' for the everyday and technical concepts respectively—and quantifying the amount of the latter does not tell us about the quantity, if any, of the former, as Shannon himself noted (Shannon, 1948, p. 31). For elaboration on the distinctness of informatione and informationt, including discussion of the opposing view of (Dretske, 1981), see (Timpson, 2004b, Chapter 1). The second point of concern is that the coding theorems that introduced the classical (Shannon, 1948) and quantum (Schumacher, 1995) concepts of informationt do not merely define measures of these quantities. They also introduce the concept of what it is that is transmitted, what it is that is measured. Thus we may as happily describe what informationt is, as how much of it there may be. Let us proceed to do so. We may take our lead from Shannon:

The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another. (Shannon, 1948, p. 31)

The technical notion of information then enters when we note that informationt may be identified as what it is the aim of such a communication protocol to transmit. Thus the following definition suggests itself (Timpson, 2004b, §1.2.3):

Informationt is what is produced by an informationt source that is required to be reproducible at the destination if the transmission is to be counted a success.

This definition is evidently a very general one, but that is as it should be. If we follow Shannon in his specification of what the problem of communication is, then the associated notion of informationt introduced should be sensitive to what one’s aims and interests in setting up a communication system are. Different aims and interests may give rise to more or less subtly differentiated concepts of informationt as what one is interested in transmitting and reproducing varies: indeed we will see the most vivid example of this when comparing classical and quantum informationt. Yet these all remain concepts of informationt, as they all arise in the general setting adumbrated by Shannon that the broad definition seeks to capture.

There are several components to the generality of this definition. One might ask what informationt sources are; what they produce; and what counts as success. The answers given to these questions, though, will in general be interdependent (we will see some examples below). What counts as a successful transmission will, of course, depend once more upon what one’s aims and interests in devising the communication protocol are. Specifying what counts as success will play a large part in determining what it is we are trying to transmit; and this, in turn, will determine what it is that informationt sources produce that is the object of our interest. Finally, informationt sources will need to be the sorts of things that produce what it is that we are concerned to transmit.

4.2.1.1 Two Types of Informationt Source

Some examples will help put flesh on the remarks so far. The prototypical informationt source was introduced by Shannon in his noiseless coding theorem. Such a source is some object which
may be characterised as producing elements drawn from a fixed alphabet, say a discrete alphabet {a1, a2, . . . , an}, with given probabilities p(ai). (The extension to the continuous case takes the obvious form.) Messages are then long sequences of elements drawn from the alphabet. The aim of the communication protocol is to be able to reproduce at some distant point whatever sequence the source produces.

If classical informationt is what is produced by a classical informationt source—the Shannon prototype—then quantum informationt is what is produced by a quantum informationt source. Schumacher’s notion of a quantum informationt source is the immediate generalisation to the quantum domain of the Shannon prototype: a quantum informationt source is some object which may be characterised as producing systems in quantum states drawn from a fixed set of states, e.g., {ρa1, ρa2, . . . , ρan}, with probabilities p(ai). Again, we will be interested in long sequences drawn from the source.

We are now in a position to give a general answer to the question of what informationt sources produce: they produce sequences of states. Or, more precisely, they produce tokens of particular types.

4.2.1.2 Classical Informationt

Let us look more closely at the example of classical informationt. As we know, a distinguishing characteristic of classical informationt when compared with quantum informationt is that the varying outputs of a classical informationt source are distinguishable one from another, i.e., one can tell which of the possible elements ai was produced in a given instance. After the source has run for a while, a given sequence of states will have been produced, for example a sequence like:

    a7 a3 a4 a9 a9 a7 a1 . . . a2 a1 a3 a7 . . . a1 a9 a1 .

This particular sequence could be identified by description (e.g., “It’s the sequence ‘a7 a3 a4 a9 . . .’,” etc.), by name (call it ‘sequence 723’), or, given the distinguishability of the ai, identified demonstratively. (Handed a concrete token of the sequence, one could in principle determine—generally, infer—what particular sequence it was.) This sequence (type) will have been realised by a given system, or systems, taking on the properties that correspond to being in the various states ai, in order. What will be required at the end of the communication protocol is either that another token of this type actually be reproduced at a distant point; or at least, that it be possible to reproduce it there, by a standard procedure.

But what is the informationt produced by the source that we desire to transmit? Is it the sequence type, or the token? The answer is quick: it is the type; and we may see why when we reflect on what it would be to specify what is produced and what is transmitted. We would specify what is produced (transmitted) by naming or otherwise identifying the sequence itself—it was sequence 723, the sequence ‘a7 a3 a4 a9 . . .’, in the example—and this is to identify the type, not to
identify or name a particular concrete instance of it.23

23 Even when we identify what was produced by gesturing to the concrete token and saying ‘That was what was produced’, we are identifying the sequence type, here by means of what Quine would call ‘deferred ostension’. The ‘what’ in these contexts is functioning as an interrogative, not a relative, pronoun (cf. (Glock, 2003, p. 76) for an analogous case).

4.2.1.3 Quantum Informationt

The quantum example is similar, but here we must distinguish two cases. The basic type of quantum informationt source (Schumacher, 1995) is one which produces pure states: we may take as our example a device which outputs systems in one of the states {|a1⟩, |a2⟩, . . . , |an⟩} with probabilities p(ai); these states need not be orthogonal. Then the output of this source after it has been running for a while might be a sequence of systems in particular quantum states, e.g.,

    |a7⟩ |a3⟩ |a4⟩ |a9⟩ |a9⟩ |a7⟩ |a1⟩ . . . |a2⟩ |a1⟩ |a3⟩ |a7⟩ . . . |a1⟩ |a9⟩ |a1⟩ .

Again we have a sequence type, instantiated by particular systems taking on various states. And again such a sequence may be named or described, but notice that this time it will not, in general, be possible to identify what sequence a given number of systems instantiate merely by being presented with them, as the |ai⟩ need not be orthogonal, so typically will not be distinguishable. However, this does not stop the lesson learnt above applying once more: the informationt produced by the source—quantum informationt, now—will be specified by specifying what sequence (type) was produced. These sequences will clearly be of a different, and more interesting, sort than those produced by a classical source. (One might say that with classical and quantum informationt, one was concerned with different types of type!) Just as before, though, what will be required for a successful transmission to be effected is that another token of this type be reproduced, or be reproducible (following a standard procedure), at the desired destination. That is, we need to be able to end up with a sequence of systems taking on the appropriate quantum states in the right order. What is transmitted is a particular sequence of quantum states.

This was the most basic form of quantum informationt source. We gain a richer notion when we take into account the possibility of entanglement. So consider a different type of quantum informationt source (Schumacher, 1995), one that always outputs systems in a particular mixed state ρ. Such a source might seem dull until we reflect that these might be systems in improperly mixed states (d’Espagnat, 1976), that is, components of larger entangled systems, the other parts of which may be inaccessible to us. In particular, there could be a variety of different states of these larger systems that give rise to the same reduced state for the smaller components that the informationt source presents us with. How should we conceive of what this informationt source produces?
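The indistinguishability of nonorthogonal outputs can be made quantitative. The following sketch (not from the text; the particular states |0⟩ and |+⟩, and the use of the standard Helstrom bound, are illustrative assumptions) computes the best single-shot probability of correctly identifying which of two equiprobable nonorthogonal states a source produced:

```python
import numpy as np

# Two nonorthogonal pure states, |0> and |+> (an arbitrary illustrative pair).
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)

rho0 = np.outer(ket0, ket0.conj())
rho1 = np.outer(ketp, ketp.conj())

# Helstrom bound: the optimal probability of correctly identifying which of
# the two (equiprobable) states was produced, over all possible measurements.
eigs = np.linalg.eigvalsh(0.5 * rho0 - 0.5 * rho1)
p_success = 0.5 + 0.5 * np.abs(eigs).sum()

print(round(p_success, 4))  # 0.8536 — strictly less than 1
```

That the optimum falls short of 1 for any nonorthogonal pair is exactly what blocks demonstrative identification of a quantum sequence.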


We have a choice. We might be unimaginative and simply require that the ‘visible’ output of the source be reproducible at the destination. The source produces a sequence ρ ⊗ ρ ⊗ ρ ⊗ . . . and we should be able to reproduce this sequence at the destination. What is transmitted will then be specified by specifying this sequence. But we might be more interesting and require that not only should the ‘visible’ output sequence be reproducible at the destination, but so also should any entanglement that the original output systems might possess. Given the importance of being able to transfer entanglement in much of quantum informationt theory, this latter choice turns out to be the better one to make.24

We may model the situation as follows. Take three sets of systems, labelled A, B and C. Systems in set B are the systems that our source outputs; we suppose them all to be in the mixed state ρ. Systems in set A are the hidden partners of systems in set B. The ith member of B (Bi) can be thought to be part of a larger system whose other part consists of the ith member of A (Ai); in addition, we assume that the joint system composed of Ai and Bi together is in some pure state |ψ⟩AiBi which will give a reduced state of ρ when we trace over Ai (such a state is called a purification of ρ). If ρ is mixed then |ψ⟩AiBi, by assumption pure, will necessarily be entangled. The systems in set C are the ‘target’ systems at the destination point.

Now consider the ith output of our informationt source. This will be the system Bi, having the reduced state ρ. But this is only half the story: along with Bi is the hidden system Ai; and together these are in the state |ψ⟩AiBi. As the end result of the transmission process, we would like Ci to be in the state ρ, but if we are to preserve entanglement, then our truly desired end result would be Ci becoming entangled to Ai, in just the way Bi had been previously. So we actually desire that the pure state |ψ⟩ previously instantiated by Ai and Bi should end up being instantiated by Ai and Ci. This would be transfer of the entanglement, or transfer of the ‘quantum correlation’, that Bi—the visible output of the source—had previously possessed.

This may all now be expressed in terms of sequences of states once more. The quantum source outputs sequences of systems in entangled states, half of which (systems B) we see; and half of which (systems A) we do not. A particular segment of such a sequence might look like:

    . . . |ψ⟩AiBi |ψ′⟩AjBj |ψ″⟩AkBk . . . ,

where |ψ′⟩ and |ψ″⟩, like |ψ⟩, are purifications of ρ. Such a sequence is the piece of quantum informationt produced, and it will be successfully reproduced by a protocol if the end result is another token of the type, but this time involving the systems C:

    . . . |ψ⟩AiCi |ψ′⟩AjCj |ψ″⟩AkCk . . . .

24 As (Duwell, 2005) has emphasised, this corresponds to the choice of the entanglement fidelity (cf. (Nielsen and Chuang, 2000, §9.3)) as the criterion of successful message reproduction for quantum informationt.
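The purification just described is easy to exhibit concretely. A minimal numpy sketch (the particular ρ is an arbitrary assumption) builds |ψ⟩AB from the eigendecomposition of ρ and checks that tracing out A returns ρ:

```python
import numpy as np

# A mixed state rho for system B (the specific matrix is an arbitrary choice).
rho = np.array([[0.7, 0.2],
                [0.2, 0.3]])

# Purification: |psi>_AB = sum_k sqrt(lam_k) |k>_A (x) |v_k>_B,
# where (lam_k, |v_k>) is the eigendecomposition of rho.
lam, v = np.linalg.eigh(rho)
d = rho.shape[0]
psi = np.zeros(d * d)
for k in range(d):
    basis_k = np.zeros(d)
    basis_k[k] = 1.0                                  # |k>_A
    psi += np.sqrt(lam[k]) * np.kron(basis_k, v[:, k])

# Tracing out A from |psi><psi| must return rho exactly.
full = np.outer(psi, psi.conj())
rho_B = sum(full[i*d:(i+1)*d, i*d:(i+1)*d] for i in range(d))
print(np.allclose(rho_B, rho))  # True
```

Any two purifications of the same ρ are related by a unitary on A alone, which is why a variety of different larger states can present the source's visible face unchanged.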


The general conclusion we may draw is that pieces of quantum informationt, far from being mysterious—perhaps unspeakable—are quite easily and perspicuously described. A given item of quantum informationt will simply be some particular sequence of Hilbert space states, whether the source produces systems in individual pure states, or as parts of larger entangled systems. What is more, we have seen that quantum informationt is closely analogous to classical informationt: in both cases, informationt is what is produced by the respective informationt sources (both fall under the general definition); and in both cases, what is produced can be analysed in terms of sequences of states (types).

4.2.2 Bits and Pieces

So far we have been emphasising the largely neglected point that the coding theorems characteristic of informationt theory provide us with a perfectly good and straightforward account of what informationt is; but we should not, in our enthusiasm, forget the more commonly emphasised aspect of these theorems. It is also of the utmost importance that the coding theorems provide us with a notion of how much informationt a given source outputs. How much informationt a source produces is measured, following Shannon, in terms of the minimal amount of channel resources required to encode the output of the source in such a way that any message produced may be accurately reproduced at the destination. That is, to ask how much informationt a source produces is to ask: to what degree is the output of the source compressible? Shannon showed that the compressibility of a classical informationt source is given by the familiar expression

    H(A) = −Σ_i p(ai) log p(ai),

known as the Shannon informationt (logarithms to base 2). This specifies the number of bits required per letter to encode the output of the source. (Schumacher, 1995) extended this proof to the quantum domain, showing that the minimum number of qubits required per step to encode the output of quantum informationt sources of the sorts mentioned above is given by the von Neumann entropy of the source:

    S(ρ) = −Tr ρ log ρ,

where ρ is the density matrix associated with the output of the source.

So this aspect of the coding theorems provides us with the notion of bits of informationt, quantum or classical: the amount of informationt that a source produces. This is to be contrasted with pieces of informationt: what the output of a source (quantum or classical) is, as described above.
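Both measures can be computed directly. A small sketch (the source emitting |0⟩ or |+⟩ with equal probability is a hypothetical example, not from the text) illustrates why a quantum source can be compressed below the Shannon figure when its output states are nonorthogonal:

```python
import numpy as np

def shannon(p):
    """H(A) = -sum_i p_i log2 p_i, in bits per letter."""
    p = np.asarray(p)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def von_neumann(rho):
    """S(rho) = -Tr rho log2 rho, in qubits per letter."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-(lam * np.log2(lam)).sum())

# A source emitting |0> or |+> with equal probability (an illustrative choice).
p = [0.5, 0.5]
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
rho = 0.5 * np.outer(ket0, ket0) + 0.5 * np.outer(ketp, ketp)

print(shannon(p))        # 1.0 bit per letter
print(von_neumann(rho))  # ≈ 0.60 qubits per letter: fewer, since the states overlap
```

For orthogonal output states the two figures coincide; the gap opens only when the |ai⟩ are not perfectly distinguishable.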

4.2.3 The Worldliness of Quantum Information

Let us now consider an important corollary of the discussion so far. It concerns the worldliness or otherwise of informationt. Is informationt part of the material contents of the world? In particular, is quantum informationt part of the material
contents of the world? Is it a new type of physical substance or stuff, admittedly, perhaps, a rather unusual one, that has a spatio-temporal location and whose ebb and flow it is the aim of quantum informationt theory to describe? The writings of some physicists (Jozsa, 1998; Penrose, 1998; Deutsch and Hayden, 2000, for example) might lead one to suppose so. However, it follows from the analysis in §4.2.1 that this thought would be mistaken. Informationt, what is produced by a source, or what is transmitted, is not a concrete thing or a stuff. It is not so because, as we have seen, what is produced/transmitted is a sequence type, and types are abstracta. They are not themselves part of the contents of the material world, nor do they have a spatio-temporal location. Particular tokens of the type will have a location, of course, but the type itself, a given piece of informationt, will not. Putting the point in the formal mode, ‘informationt’ in both the quantum and classical settings is an abstract noun (in fact an abstract mass noun), not a concrete one.

This result may or may not come as a surprise. What is undoubted is that there has been confusion over it, particularly when the nature of quantum teleportation has been up for discussion (see §4.2.4).

The realisation that quantum informationt is not a substance and is not part of the spatio-temporal contents of the world might lead on naturally to the conclusion that it therefore does not exist at all; that there is no such thing as quantum informationt. This indeed was the conclusion of (Duwell, 2003), although he has since retreated from this position to one closer to that advocated here (Duwell, 2005). The negative conclusion might be termed nihilism about quantum informationt. Adopting a nihilist position, however, would appear to be an over-reaction to the fact that informationt is not a material thing. As we have seen, quantum informationt is what is produced by a quantum informationt source. This will be an abstractum (type), but there is no need to conclude thereby that it does not exist. Many abstracta are very often usefully said to exist.

To appreciate the point it is perhaps helpful to compare with a famous example of a non-existing substance. So take ‘caloric’. This term was thought to refer to a material substance, one responsible for the thermal behaviour of various systems, amongst other things. But we found out that there was no such substance. So we say ‘Caloric does not exist’. But we also know now that there is no such substance as quantum informationt: why should we not therefore say ‘Quantum information does not exist’? The reason is that the two cases are entirely disanalogous, as the oddity of the phrasing in the previous sentence should immediately alert one to. The rôle of ‘caloric’ was as a putative substance referring term; semantically it was a concrete noun, just one that failed to pick out any natural kind in this world. By contrast, ‘informationt’ was always an abstract noun. Its rôle was never that of referring to a substance. So it’s not that we’ve discovered that there’s no such substance as quantum informationt (a badly formed phrase), but rather that
attention has been drawn to the type of rôle that the term ‘informationt’ plays. And this is not one of referring to a substance, whether putatively or actually. So, unlike the case of caloric, where we needed to go out into the world and discover by experiment whether or not there is a substance called ‘caloric’, we know from the beginning that the thought that there might be a substance called ‘informationt’ is misbegotten, based on a misconception of the rôle of the term.

At this stage a further point must be addressed. One might be discomfited by my earlier comment that many abstracta are often usefully said to exist. Isn’t this an area of some dispute? Indeed, wouldn’t nominalists precisely be concerned to deny it? As it happens, though, the purposes of my argument may happily be served without taking a stand on such a contentious metaphysical issue. The point can be made that ‘informationt’ is an abstract noun, and that it therefore plays a fundamentally different rôle from a substance referring term; that it would be wrong to assert that quantum informationt does not exist on the basis of recognising that quantum informationt is not a substance; without having to take a stand on the status of abstracta. In fact all that is required for our discussion throughout is a very minimal condition concerning types that comes in both nominalist and non-nominalist friendly versions. The non-nominalist version says the following: a piece of informationt, quantum or classical, will be a particular sequence of states, an abstract type. What is involved in the type existing? Minimally, a sufficient condition for type existence will be that there be facts about whether particular concrete objects would or would not be tokens of that type. (Notice that this minimal condition needn’t commit one to conceiving of types as Platonic objects.) The nominalist version takes a similar form, but simply asserts that talk of type existence is to be paraphrased away as talk of the obtaining of facts about whether or not concrete objects would or wouldn’t be instances of the type.

4.2.3.1 A Special Case

Having argued against the nihilist view and addressed possible nominalist concerns, we should close this section of the discussion by noting that there remains one special case in which it would seem to be correct to assert that quantum informationt does not exist, the discussion so far notwithstanding. Suppose one denied that there were facts about what quantum states systems possessed, or about what quantum operations devices implement. Then there will be no fact about what the output of a quantum source is, so there will be no fact about whether the systems produced are or are not an instance of any relevant type. In this event, it would be appropriate to maintain that quantum informationt does not exist, as even the minimal criterion just given will not be satisfied. But does anyone hold this view of quantum mechanics? Yes: it is ‘quantum Bayesianism’ as advocated by Caves, Fuchs and Schack (see, e.g., (Fuchs, 2002a)), which we will be discussing in due course. For the quantum Bayesian, therefore, and perhaps only for them, it would be correct to say that quantum informationt does not exist.

4.2.4 Application: Understanding Teleportation

Why is it helpful to highlight the logico-grammatical status of informationt as an abstract noun? In short, because the matter has given rise to confusion; and nowhere more so than in discussion of entanglement-assisted communication. One of the claims of (Timpson, 2006) is that failure to recognise that informationt is an abstract noun is a necessary condition for finding anything conceptually problematic in teleportation, as so many have.

Here’s how the story goes. The puzzles that teleportation presents cluster around two central questions. First, how is so much informationt transported in the protocol? And second, most pressingly, just how does the informationt get from Alice to Bob? We will concentrate on the second here (see (Timpson, 2006) for further discussion of the first).

A very common view is expressed by (Jozsa, 1998; Jozsa, 2004) and (Penrose, 1998). In their view, the classical bits used in the protocol evidently can’t be carrying the informationt: two classical bits are quite insufficient to specify the state teleported, and in any case the bit values are entirely independent of the identity of the state. Therefore the entanglement shared between Alice and Bob must be providing the channel down which the informationt travels. They conclude that in teleportation, an indefinitely large, or even infinite, amount of informationt travels backwards in time from Alice’s measurement to the time at which the entangled pair was created, before propagating forward in time from that event to Bob’s performance of his unitary operation and the attaining by his system of the correct state. Teleportation seems to reveal that entanglement has a remarkable capacity to provide a hitherto unsuspected type of information channel, one which allows informationt to travel backwards in time; and a very great deal of it at that. It seems that we have made the discovery that quantum informationt is a type of informationt with the striking, and non-classical, property that it may flow backwards in time. The position is summarized succinctly by Penrose:

    How is it that the continuous “information” of the spin direction of the state that she [Alice] wishes to transmit ... can be transmitted to Bob when she actually sends him only two bits of discrete information? The only other link between Alice and Bob is the quantum link that the entangled pair provides. In spacetime terms this link extends back into the past from Alice to the event at which the entangled pair was produced, and then it extends forward into the future to the event where Bob performs his [operation]. Only discrete classical information passes from Alice to Bob, so the complex number ratio which determines the specific state being “teleported” must be transmitted by the quantum link. This link has a channel which “proceeds into the past” from Alice to the source of the EPR pair, in addition to the remaining channel which we regard as “proceeding into the future” in the normal way from the EPR source to Bob. There is no other physical connection. (Penrose, 1998, p. 1928)
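Whatever metaphysical gloss one prefers, the protocol itself is compactly checkable. A minimal simulation of standard pure-state teleportation (the input amplitudes below are an arbitrary assumption) confirms that Bob recovers the state in each of the four measurement branches, with only two classical bits passing from Alice to Bob:

```python
import numpy as np

# Teleport an arbitrary qubit state using one shared Bell pair and two
# classical bits. Qubit ordering: [input, Alice's half, Bob's half].
alpha, beta = 0.6, 0.8j                       # arbitrary normalised amplitudes
phi = np.array([alpha, beta])                 # state to teleport

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)    # |Phi+> shared by Alice and Bob
psi = np.kron(phi, bell)                      # full three-qubit state

I = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

# Alice's Bell measurement on her two qubits: each outcome is two classical
# bits, and each dictates one of four corrections by Bob.
s = 1 / np.sqrt(2)
bell_basis = {(0, 0): (s * np.array([1, 0, 0, 1]), I),
              (0, 1): (s * np.array([1, 0, 0, -1]), Z),
              (1, 0): (s * np.array([0, 1, 1, 0]), X),
              (1, 1): (s * np.array([0, 1, -1, 0]), Z @ X)}

psi_mat = psi.reshape(4, 2)                   # (Alice's two qubits) x (Bob)
for bits, (b, correction) in bell_basis.items():
    bob = b.conj() @ psi_mat                  # Bob's state given this outcome
    bob = correction @ (bob / np.linalg.norm(bob))
    assert np.allclose(bob, phi)              # state recovered in every branch
print("state recovered in all four branches")
```

Nothing in the simulation adjudicates between the rival stories about where the informationt "goes"; it only exhibits the protocol whose interpretation is in dispute.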

But this is a very outlandish picture. Is it really justified? (Deutsch and Hayden, 2000) think not. They provide an analysis (based on a novel unitary, no-collapse picture of quantum mechanics) according to which the bits sent from Alice to Bob do, after all, carry the informationt characterizing the teleported state. The informationt flows from Alice to Bob, hidden away, unexpectedly, in Alice’s seemingly classical bits.25

25 The details of Deutsch and Hayden’s approach, and the question of what light it might shed on the notion of quantum informationt, are studied in detail in (Timpson, 2005) and (Wallace and Timpson, 2006).

Trying to decide how the informationt is transmitted in teleportation thus presents us with some hard questions. It looks like we have a competition between two different ontological pictures: one in which informationt flows backwards, then forwards, in time; the other in which the informationt flows more normally, but hidden away inaccessibly in what we thought were classical bits. Perhaps we ought also to entertain the view that the informationt just jumped non-locally somehow, instead. But what might that even mean?

The correct way out of these conundrums is to reject a starting assumption that they all share, by noting that there is something bogus about the question ‘How does the informationt get from Alice to Bob?’ in the first place. Focus on the appearance of the phrase ‘the informationt’ in this question. Our troubles arise when we take this phrase to be referring to a particular, to some sort of substance (stuff), perhaps, or to an entity, whose behaviour in teleportation it is our task to describe. This is the presumption behind the requirements of locality and continuity of informationt flow that all of Jozsa, Penrose, Deutsch and Hayden apply in their various ways; and why it looks odd to think alternatively of the informationt just jumping non-locally from Alice to Bob: things don’t behave like that, we are inclined to think. All these approaches share the idea that informationt is a kind of thing and that we need to tell a story about how this thing, denoted by ‘the informationt’, moves about. But when we recognise that ‘informationt’ is an abstract noun, this pressure disappears. ‘The informationt’ precisely does not refer to a substance or entity, or any kind of material thing at all; a fortiori it is not something about which we can intelligibly ask whether it takes a spatio-temporally continuous path or not. (By contrast, it remains perfectly intelligible to ask the quite different question whether, in a given protocol, informationt is transmitted by processes that are spatio-temporally continuous.) Since ‘the informationt’ does not introduce a particular, the question ‘How does the informationt get from Alice to Bob?’ cannot be a request for a description of how some thing travels. If it has a meaning, it is quite another one. It follows that the locus of our confusion is dissolved.

The legitimate meaning of ‘How does the informationt get from Alice to Bob?’, then, is just this: it is a roundabout way of asking what physical processes are involved in achieving the protocol. The end of the protocol is achieved when Bob’s system is left in the same state as the one initially presented to Alice. That is what it is for the quantum informationt to have been transmitted. We may then ask what physical processes were responsible for this; and the question will have a straightforward answer, although not one independent of your preferred interpretation of quantum mechanics. You pay your money and you take your choice of the alternative, clear-cut, answers. See (Timpson, 2006, §5) for a description in each of a variety of popular interpretations.

So while there can remain a source of disagreement about the physical processes involved in teleportation, co-extensive with disagreement over the favoured interpretation of quantum mechanics, there is no longer any distinctive conceptual puzzle left about the protocol. Once it is recognised that ‘informationt’ is an abstract noun, it is clear that there is no further question to be answered regarding how informationt is transmitted that goes beyond providing a description of the processes involved in achieving the end of the protocol. One doesn’t face a double task consisting of (a) describing the physical processes by which informationt is transmitted, followed by (b) tracing the path of a ghostly particular, informationt. There is only task (a). The point should not be misunderstood: the claim is not that there is no such thing as the transmission of informationt, but simply that one should not understand the transmission of informationt on the model of transporting potatoes, or butter, say, or piping water.

Notice, finally, that the lesson developed here regarding teleportation applies equally in the case of superdense coding. There the source of puzzlement was how Alice could encode two classical bits into the single qubit she sends to Bob, given that the qubit she sends surely has to contain the informationt. But we should simply reject this latter premise, as it relies on the incorrect ‘thing’ model of informationt.

4.2.5 Summing Up

In this section we have seen how a straightforward explanation of what quantum informationt is may be given; and seen moreover that there are very close links to the classical concept, despite Jozsa’s misgivings we noted earlier. It is certainly true that quantum and classical informationt differ in the types of sequence type that are involved—the quantum case requiring the richer structure of sequences of quantum states—but this does not preclude the two notions of informationt from falling under a single general heading, from being, as advertised, species of a single genus. The crucial steps in the argument were, first, formulating the general definition of what informationt is: that which is produced by a source that is required to be reproducible at the destination; and second, noting that the pertinent sense of ‘what is produced’ is that which points us to the sequence types and not to the tokens. As a corollary we found that ‘informationt ’ is an abstract noun and therefore that neither classical nor quantum informationt are parts of the material contents of the world. Does this conclusion deprive quantum informationt theory of its subject matter? Indeed not. It’s subject matter in the abstract may be conceived of as the study of the structural properties of pieces of quantum informationt (various sequences of quantum states and their possible transformations); and it’s subject

THE PHYSICAL SIDE OF THE THEORY OF COMPUTATION

233

matter in the concrete may be conceived of as the study of the various new types of physical resources that the theory highlights (qubits and shared entanglement) and what may be done with them. But finally, what bearing does all this have on the sorts of philosophical issues we noted in the introduction? We have seen the importance of being straight on the status of informationt in understanding what is going on in teleportation. Two other things also follow quite directly, it seems. It is often claimed to be an important ontological insight deriving from, or perhaps driving, the success of quantum informationt theory that ‘Information is Physical’ (Landauer, 1996). Exactly what the rˆ ole of this slogan might be deserves more detailed discussion (Timpson, 2004b), but things are quite clear on one reading, at least: it is simply a category mistake (we return to another reading later on). Pieces of informationt , quantum or classical, are abstract types. They are not physical, it is rather their tokens that are. To suppose otherwise is to make the category mistake. Thus the slogan certainly does not present us with an ontological lesson. It might perhaps be thought that the purport of the lesson was actually supposed to be that we have made a discovery of a certain kind: that there really are physical instantiations of various pieces of quantum information (sequence types) possible in our world; and this need not have been so. Perhaps. But the force of this lesson is surely limited: it should come as no surprise given that we already knew the world could be well described quantum mechanically. The second point is this. As noted in the introduction, some have taken the development of quantum informationt theory to support a certain kind of immaterialism (what might be called informational immaterialism). Wheeler, for example, in his ‘It from Bit’ proposal suggests that the basis of the physical world is really an immaterial one: ‘... 
that all things physical are information-theoretic in origin and this is a participatory universe’ (Wheeler, 1990). This is an old metaphysical idea in the impressive modern dress of the most up-to-date of theories. But is such a view really supported by the successes of quantum informationₜ theory? It would seem not. We have seen that pieces of informationₜ are abstracta. To be realised they will need to be instantiated by some particular token or other; and what will such tokens be? Unless one is already committed to immaterialism for some reason, these tokens will be material physical things. So even if one’s fundamental (quantum) theory makes great play of informationₜ, it will not thereby dispense with the material world. One needs the tokens along with the types. Thus we may safely conclude that immaterialism gains not one whit of support from the direction of quantum informationₜ theory.

4.3 The Physical Side of the Theory of Computation

Quantum computation has presented a number of conceptual issues (see, e.g., (Deutsch, 1985; Deutsch, 1997), (Deutsch et al., 1999), (Timpson, 2004a)). Here we shall highlight two. First, where does the computational speed-up come from in quantum computers? Second, what happens to the Church-Turing hypothesis in this context?

4.3.1 Speed-Up

We have good reason to believe that quantum computers can be more efficient than classical ones: there is no known efficient classical algorithm for factoring, but there is a quantum one. It is interesting to ask where this speed-up comes from, for at least two reasons. The first is practical: if we had a better understanding of what is distinctively quantum about quantum computation—the feature that allows the speed-up—then we would be better placed to develop further interesting quantum algorithms. The second, related, idea is more philosophical: understanding where the speed-up comes from would give us another handle on what the fundamental differences between classical and quantum systems are. Classical systems won’t allow us to compute certain functions efficiently: what are the crucial differences that allow quantum systems to do so?

It is natural, although not wholly uncontroversial, to view the property of entanglement as the main source of the exponential speed-up given by quantum algorithms such as Shor’s (Jozsa, 1998; Ekert and Jozsa, 1998; Jozsa, 2000; Jozsa and Linden, 2003). Ekert and Jozsa make the point that it cannot just be superposition on its own that does the job, as classical systems that allow superpositions and thereby have vector spaces as their state space26 would not allow speed-up. The crucial point seems to be how the state spaces for individual systems compose: classical vector-space systems compose by the direct sum27 of the individual systems’ state spaces (so N two-dimensional systems composed would have a dimensionality of 2N), whereas quantum state spaces compose by the tensor product (so the dimension of N qubits is 2^N), giving rise to entanglement. However, even if we grant that entanglement plays a, or perhaps the, crucial rôle, it is still possible to ask quite what the mechanism is.
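The contrast between the two composition rules can be made concrete in a few lines of code. The sketch below is purely illustrative (the function names are my own, not from the text): direct-sum dimensions add, tensor-product dimensions multiply, and the tensor product houses entangled states with no product decomposition.

```python
import numpy as np

# Classical vector-space systems (e.g. string modes) compose by direct
# sum: dimensions ADD. Quantum systems compose by tensor product:
# dimensions MULTIPLY, which is what makes entanglement possible.

def direct_sum_dim(n, d=2):
    """dim(V1 + ... + Vn) for n d-dimensional spaces under direct sum: d * n."""
    return d * n

def tensor_dim(n, d=2):
    """dim(V1 x ... x Vn) for n d-dimensional spaces under tensor product: d ** n."""
    return d ** n

for n in (2, 10, 250):
    print(n, direct_sum_dim(n), tensor_dim(n))
# For 250 two-level systems the direct sum gives dimension 500, while
# the tensor product gives 2**250, far exceeding the ~10**80 atoms of
# the visible universe mentioned in the Deutsch quotation below.

# The tensor product also contains states with no product decomposition.
# The Bell state (|00> + |11>)/sqrt(2) has Schmidt rank 2 (two nonzero
# singular values of its coefficient matrix), so it is entangled.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
schmidt = np.linalg.svd(bell.reshape(2, 2), compute_uv=False)
print(np.count_nonzero(schmidt > 1e-12))  # Schmidt rank 2 -> entangled
```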
A popular answer has been in terms of parallel processing: we ought to think of the evolution of a quantum computer as a large number of distinct simultaneous computations. Indeed it has sometimes been suggested that the possibility of quantum computation provides resounding support for a Many Worlds view of quantum mechanics, as a way of understanding this parallel processing. Deutsch puts the point in characteristically forthright terms:

When a quantum factorization engine is factorizing a 250-digit number, the number of interfering universes will be of the order of 10^500 ... To those who still cling to a single universe world-view, I issue this challenge: explain how Shor’s algorithm works. I do not merely mean predict that it will work ... I mean provide an explanation. When Shor’s algorithm has factorized a number using 10^500 or so times the computational resources that can be seen to be present, where was the number factorized? There are only about 10^80 atoms in the entire visible universe, an utterly miniscule number compared with 10^500. So if the visible universe were the extent of physical reality, physical reality would not even remotely contain the resources required to factorize such a large number. Who did factorize it, then? How, and where, was the computation performed? (Deutsch, 1997, pp. 216–7)

26 Waves on strings would be an example—to get a finite-dimensional state space, imagine confining yourself to the two lowest energy modes for each string. (Such a system is not a bit, of course, as there is a continuum of distinct states given by superpositions of these two modes.)
27 The direct sum V₁ ⊕ V₂ of two vector spaces V₁, V₂ is a vector space composed of elements f = ⟨f₁, f₂⟩, f₁ ∈ V₁, f₂ ∈ V₂; an ordered pair of elements of V₁ and V₂. If {gᵢ} and {gⱼ} are bases for V₁ and V₂ respectively, then a basis for V₁ ⊕ V₂ will be given by {⟨gᵢ, 0⟩, ⟨0, gⱼ⟩}, hence dim(V₁ ⊕ V₂) = dim V₁ + dim V₂.

But this rhetorical challenge is a plea on behalf of a fallacy; what can be called the simulation fallacy (Timpson, 2006): the fallacy of reading off features of a simulation as real features of the thing simulated, with no more ado. In this case, reading features of what would be required to provide a classical simulation of a computation as features of the computation itself. Deutsch assumes that a computation that would require a very large amount of resources if it were to be performed classically should be explained as a process that consists of a very large number of computations, in Everettian parallel universes. But the fact that a very large amount of classical computation might be required to produce the same result as the quantum computation does not entail that the same amount of resources is required by the quantum computer, or that the quantum computation consists of a large number of parallel classical computations. One can insist: why, after all, should the resources be counted in classical terms to begin with? See (Steane, 2003) for further criticism of Deutsch’s notion of parallel processing. ((Hewitt-Horsman, 2002) defends the intelligibility, if not the ineluctability, of the Many-Worlds analysis.) The question of what classical resources would be required to simulate various quantum goings-on is a crucial idea in quantum information theory, but only for its pragmatic significance: it’s a guide to possible new better-than-classical protocols. It is by no means a guide to ontology. Some recent theoretical developments cast further doubt on the parallel-processing idea.

4.3.1.1 One-Way Computation

One-way quantum computation, also known as measurement-based or cluster-state computation (Raussendorf and Briegel, 2001; Raussendorf et al., 2003), is a very significant development for the practical implementation of quantum computation (see (Browne and Briegel, 2006) for an introduction).
In the standard quantum circuit model, a register of qubits is prepared in an initial, separable, computational-basis state, which is then unitarily evolved by the action of the required sequence of gates on the qubits, typically into a complicated superposed entangled state, before perhaps ending with a measurement in the computational basis to read out the result. Different computations will take the register through different sequences of superposed entangled states, with different unitary evolutions. By contrast, in one-way computing, a computation will begin with a network of qubits ready prepared in a particular kind of richly entangled state (a cluster or graph state); and different computations can start with the same state. The computation then proceeds by a sequence of measurements on single qubits and classical communication alone. There is no unitary evolution. Different algorithms will correspond to different sequences of one-qubit measurements, where the basis in which a given measurement will be performed typically depends on the results of preceding measurements. It turns out that this system is easier to implement than the circuit model (no one- or two-qubit gates are needed and no two-qubit measurements: two-qubit operations are the really tricky ones to achieve controllably) and it is considerably closer to current experimental capabilities. While standard quantum computation is reversible (up to any final measurement, at least), the one-way model is not (hence the name). The measurements at each step are irreversible and degrade the initial entanglement of the starting cluster state. The point to take from this (as a number of people have emphasised, e.g. (Steane, 2003)) is that there is nothing in the one-way model of computation that looks like the parallel-processing story; there are no linearly evolving parallel paths, as there is no unitary evolution. There is just a sequence of measurements banging on a large entangled state; the same state for different computations. Given that the one-way model and the circuit model are provably equivalent in terms of computational power, it follows that parallel processing cannot be the essence of quantum computational speed-up.28

4.3.1.2 Bub’s Geometrical Formulation

A more tentative, but nonetheless suggestive, thought is this. Recently (Bub, 2006a) has provided a geometrical way of thinking about certain quantum algorithms that shows how apparently rather different-looking algorithms, in particular Deutsch’s original XOR algorithm (Deutsch, 1985) and Shor’s algorithm, can be seen to exploit the same quantum mechanical fact in their operation: the fact that it is possible in quantum mechanics to compute the value of a disjunction without computing the values of the individual disjuncts.
On this way of looking at things, rather than a quantum algorithm computing all the values at once—the parallelism idea—the point is that the algorithm is seen explicitly to avoid computing any of the actual values of the function, these proving to be redundant for what the algorithm is aiming to achieve. What is particularly pertinent about Bub’s analysis, though, is that it suggests that we may be asking the wrong question. The important point is that Shor’s algorithm gives an exponential speed-up, whereas Deutsch’s algorithm doesn’t. So really what we thought we would have wanted was an analysis of these algorithms that makes them look different, yet here they are illuminatingly cast as the same. So perhaps our question should not be ‘Why are quantum computers faster for some processes than classical ones?’ but rather ‘Why is it that classical computers are so slow for some computations?’
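Deutsch's XOR algorithm is simple enough to simulate directly. The sketch below is my own illustration, using the one-qubit phase-oracle form of the algorithm; it exhibits the point just made: the output is f(0) ⊕ f(1), obtained from a single application of the oracle, without either value of f ever being computed and read out individually.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate

def deutsch(f):
    """One-qubit (phase-oracle) form of Deutsch's XOR algorithm.

    Returns f(0) XOR f(1) after a single application of the oracle,
    without ever producing f(0) or f(1) individually.
    """
    # Phase oracle: U_f |x> = (-1)^f(x) |x>. Applied once to the
    # superposition H|0>, only the relative phase between the two
    # branches survives into the final interference step.
    oracle = np.diag([(-1.0) ** f(0), (-1.0) ** f(1)])
    state = H @ (oracle @ (H @ np.array([1.0, 0.0])))  # |0> -> H -> U_f -> H
    # The final state is (up to a global sign) |0> if f is constant and
    # |1> if f is balanced, so the measurement outcome is deterministic.
    return int(np.argmax(np.abs(state) ** 2))

# Check all four functions f: {0,1} -> {0,1}
for f in (lambda x: 0, lambda x: 1, lambda x: x, lambda x: 1 - x):
    assert deutsch(f) == f(0) ^ f(1)
print("one oracle application yields f(0) XOR f(1) for all four f")
```

The value returned answers only the disjunctive question "is f balanced?"; the individual values f(0) and f(1) are never available in the output.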

28 A caveat. As far as I know no-one has yet attempted a description of one-way computing in fully unitary no-collapse quantum mechanics, i.e., where the measurements would be analysed quantum mechanically too. It’s conceivable that such an analysis would reveal closer links to the circuit model than is currently apparent, although this is perhaps unlikely. Either way, the result would be of interest.

4.3.2 Whither the Church-Turing Hypothesis?

The study of quantum computation can, in some ways, be seen as a liberation for computer science. The classical Turing machine, abstractly characterised, had dominated theorising since its conception (Turing, 1936). What the development of quantum computers showed was that just focusing on abstract computational models, in isolation from consideration of the physical laws governing the objects that might eventually have to implement them, can be to miss a lot. The progenitors of quantum computation realised that the question of what computational processes fundamental physics might allow was a very important one; and one which had typically been neglected in the purely mathematical development of computer science. One can argue that Turing’s model of computing involved implicit classical assumptions about the kinds of physical computational processes there could be; hence his model was not the most general; hence Feynman’s tongue-in-cheek remark apropos Turing: ‘He thought he understood paper’.29 This is the line that (Deutsch, 1985; Deutsch, 1997) explores. Thus quantum computers remind us that the theory of computing has two sides, the mathematical and the physical, and that the interplay between them is important. We may miss things if our most general computational model does not in fact take into account all the possible kinds of physical process there are that might accommodate a computational reading; while a model that relies on processes that could not be physically implemented would not be an interesting one for practical purposes, and perhaps would not even count as a computational model. It turned out, of course, that quantum computers do not go wildly beyond Turing machines: they do not, for example, compute the non-Turing computable; but they do instead raise important new questions in the rich theory of computational complexity.30 And the general point is well taken.
For some, this is how the slogan ‘Information is Physical’ is best read: as a needed corrective to computer science. Less ringing, perhaps, but more accurate, would be ‘Computers are Physical!’. In a more strident application of the same point, it is significant to note that sensible proposals do exist for physically possible computations that would compute non-Turing-computable functions, e.g., (Hogarth, 1994), (Shagrir and Pitowsky, 2003) (although note the discussion in (Earman and Norton, 1993)). Deutsch takes the lesson so far as to say that a new principle ought to replace the familiar Church-Turing hypothesis at the heart of the theory of computation, a physical principle which he calls the Turing Principle:

Every finitely realizable physical system can be perfectly simulated by a universal model computing machine operating by finite means. (Deutsch, 1985)

29 Cited by (Deutsch, 1997, p. 252).
30 For an elementary discussion, see (Williams and Clearwater, 2000); in more detail, (Nielsen and Chuang, 2000, Chapters 3, 4).


Elsewhere I have argued that this is mistaken (Timpson, 2004a). Here let us simply reflect on some crucial differences between two theses that are often confused. ((Copeland, 2000; Copeland, 2002) is exemplary in making such distinctions; see also (Pitowsky, 2002).)

4.3.2.1 The Church-Turing Hypothesis

This is the claim, deriving from the seminal papers of (Church, 1936) and (Turing, 1936), that the class of effectively calculable functions is the class of Turing-machine computable functions. This is a definition, or a stipulation (in the material mode), of how the rough intuitive notion of effective calculability was to be formally understood. Given its definitional character, ‘hypothesis’ is not really an apt name. It was important to provide such a definition of effective calculability in the 1930s because of the epistemological troubles in mathematics that drove Hilbert’s formalist programme. The emphasis here is squarely on what can be computed by humans (essential if the epistemological demands are to be met; see (Timpson, 2004a, §3) and refs. therein), not anything to do with characterising the limits of machine computation.

4.3.2.2 The Physical Church-Turing Thesis

This is a quite different thesis that goes under a variety of names and is often conflated with the Church-Turing hypothesis. It is the claim that the class of functions that can be computed by any physical system is co-extensive with the Turing-computable functions. Sometimes it comes in a stronger version that imposes some efficiency requirement: e.g., that the efficiency of computation for any physical system is the same as that for a Turing machine (or perhaps, for a probabilistic Turing machine). This is about the ultimate limits of machine computation. (Deutsch’s Turing Principle is a thesis, directed towards the limits of physical computation, something along these lines; but where the concrete details of the Turing machine have been abstracted away in the aim of generality.)
Notice that the kind of evidence that might be cited in support of these theses is quite different. In fact, since the first is a stipulation, it wouldn’t make sense to offer evidence in support of its truth. All one can do is offer reasons for or against it as a good definition. The facts that are typically cited to explain its entrenchment are precisely of this form: one points to all the different attempts at capturing the notion of algorithm or of the effectively calculable: they all return the same class of functions (e.g. (Cutland, 1980, p. 67)). This tells us that Church and Turing did succeed in capturing the intuitive notion exceedingly well: we have no conflict with our pre-theoretic notions. By contrast, the physical thesis is an empirical claim and consequently requires inductive support. Its truth depends on what you can get physical systems to do for you. The physical possibility of Malament-Hogarth spacetimes (and of the other elements required in Hogarth’s protocol), for example, would prove it wrong. It’s not clear how much direct or (more likely) indirect inductive support it actually possesses; certainly it should not be thought to be as deservedly entrenched as the Church-Turing hypothesis, although many are inclined to believe it. (Some admit: it’s just a hunch.) What we do know is that quantum computation shows that the strong version, at least, is wrong (so long as no efficient classical factoring algorithm exists; and we believe none does).

Which of these two theses, if either, really lies at the heart of the theory of computation? In a sense, both: it depends on what you want the theory of computation to be. If you are concerned with automated computing by machines, and specifically with the ultimate limits of what you can get real machines to do for you, you will be interested in something like the physical version of the thesis, although one could clearly get along fine if it were false. If you are concerned with the notion of effective calculability and recursive functions, you will stick with the former thesis, the latter being largely irrelevant.

4.3.2.3 Computational Constraints on Physical Laws

Some have been tempted to suggest that physical constraints on what can be computed should be seen as important principles governing physical theory. (Nielsen, 1997), for example, argues that the physical Church-Turing hypothesis is incompatible with the standard assumption in quantum mechanics that a measurement can be performed for every observable one can construct (neglecting for present purposes dynamical constraints such as the Wigner-Araki-Yanase theorem (Peres, 1995, pp. 421–2)), and the thesis is also incompatible with the possibility of unrestricted unitary operations. He conjectures that it is the physical Church-Turing thesis which should be retained and the required restrictions imported into quantum theory. Whether this is the correct conclusion to draw would depend on whether the inductive support for the physical thesis was greater than that accruing to quantum mechanics in its usual, unrestricted form. This seems questionable, although teasing out the evidence on either side would be an interesting task.
A plausible default position might be that if one has in hand a well-confirmed and detailed physical theory that says that some process is possible, then that theory holds the trump card over a less specific generalisation covering the same domain. Consider the case of thermodynamics: this theory suggests that fluctuation phenomena should be impossible; kinetic theory suggests that they will happen—which one are you going to believe?31

Jozsa has presented another very interesting argument in a similar vein. In his view, there is reason to think that computational complexity is a fundamental constraint on physical law. It is noteworthy that several different models of computation, very distinct physically—digital classical computing, analogue classical computing and quantum computing—share similar restrictions on their computing power: one can’t solve certain problems in polynomial time. But this is for different reasons in the various cases. In the analogue case, for example, exponential effort would be needed to build sufficiently precise devices to perform the required computations, because it is very difficult to encode larger and larger numbers stably in the state of an analogue system. In the quantum case, one can see a restriction with measurement: if we could but read out all the results contained in a superposition then we would have enormous computational power; but we can’t. Thus both analogue and quantum computation might appear to hold out the hope of great computing power, but both theories limit the ability to harness that power, while slight variations in the theories would allow one access to it.32 This looks like a conspiracy on behalf of nature, or to put it another way, a case of homing in on a robust aspect of reality. Perhaps, then (the thought is), some general principle of the form ‘No physical theory should allow efficient solution of computational tasks of the class x’ obtains. We might then use this as a guide to future theorising. However, it is unlikely that such a principle could sustain much commitment unless it were shown to mesh suitably with bona fide physical principles. If one constructed a theory that was well-formed according to all the physical desiderata one could think of, yet violated the computational complexity principle, it is implausible that one would reject it on those grounds alone.

31 This leads us to an interesting general methodological issue: the default position just outlined looks plausible in some cases, but less so in others: consider the advent of Special Relativity in Einstein’s hands. Perhaps in that case, though, one can point to specific defeating conditions that undermined the authority of the detailed theory in the domain in question.

4.4 Foundations of QM

Whether advances in quantum information theory will finally help us to resolve our conceptual troubles with quantum mechanics is undoubtedly the most intriguing question that this new field holds out. Such diametrically opposed interpretational viewpoints as Copenhagen and Everett have drawn strength since its development.
Copenhagen, because appeal to the notion of information has often loomed large in approaches of that ilk, and a quantum theory of information would seem to make such appeals more serious and precise (more scientifically respectable, less hand-wavey); Everett, because the focus on the ability to manipulate and control individual systems in quantum information science encourages us to take the quantum picture of the world seriously; because of the intuitive appeal of Deutsch’s many-worlds parallel-processing view of algorithms; and, most importantly, because of the theoretical utility of always allowing oneself the possibility of extending a process being studied to a unitary process on a larger Hilbert space. (This is known in the trade as belonging to the Church of the Larger Hilbert Space.) In addition to providing meat for interpretational heuristics, quantum information theory, with its study of quantum cryptography, error correction in quantum computers, the transmission of quantum information down noisy channels and so on, has given rise to a range of powerful analytical tools that may be used in describing the behaviour of quantum systems and therefore in testing our interpretational ideas.32

32 For an example of this in the quantum case, consider (Valentini, 2002b) on sub-quantum information processing in non-equilibrium Bohm theory.


4.4.1 Instrumentalism Once More?

As just mentioned, one strand in Copenhagen thought has always suggested that the correct way to understand the quantum state is in terms of information. One can see the (in)famous statement attributed to Bohr in just this light:

There is no quantum world. There is only an abstract physical description. It is wrong to think that the task of physics is to find out how nature is. Physics concerns what we can say about nature. (Petersen, 1963)

Physics concerns what we can say about nature, not how things are; what we can say about nature—what information we have—is encoded in the quantum state. The state doesn’t represent objective features of the world; it’s just a means for describing our information. (Mermin, 2001b), (Peierls, 1991), (Wheeler, 1990) and (Zeilinger, 1999) have all been drawn to views of this nature. A canonical statement of the view is given by Hartle:

The state is not an objective property of an individual system but is that information obtained from knowledge of how a system was prepared, which can be used for making predictions about future measurements. (Hartle, 1968, p. 709)

With the flourishing of quantum information theory, which can indeed be seen, in a certain sense, as taking quantum states to be information (cf. §4.2.1), this view seems to acquire scientific legitimacy, even, perhaps, an attractive timeliness.33 There are some common objections to construing the quantum state as information from those of a more realist bent. Why, one might ask, if the quantum state is just information, should it evolve in accord with the Schrödinger equation? Why should my state of mind, if you like, evolve in that way? Yet we know the quantum state does (at least most of the time). Does it even make sense for cognitive states to be governed by dynamical laws? Or, one might be worried about where measurement outcomes are supposed to come from in this interpretation—measurement outcomes can’t simply be information too, surely? Mustn’t they be part of the world?

33 The reader should draw their own conclusions about the validity of the train of thought involved, though. Notice that, when partaking in a quantum communication protocol, quantum states can be thought of as quantum information; but wouldn’t one want something more like classical information when talking about Copenhagen-style measurement outcomes? Wouldn’t one actually want informationₑ rather than informationₜ too? Reflect, also, on the discussion in §4.2.5 for rebuttal of the idealist trend in the Bohr quotation.

Neither of these is a strong objection, though, both having simple answers. For the first, the reason that one’s state of mind—the information one has that the quantum state represents—evolves in accord with the Schrödinger equation (when it ought to) is that one subscribes to the laws of quantum mechanics. If a system is prepared in a certain way, then according to the theory, certain probabilities are to be expected for future measurement outcomes—this is what one comes to believe. If the system is then subject to some evolution, the theory tells you something specific: that what can be expected for future measurements will change, in a certain systematic way. It is because one is committed to quantum theory as descriptively accurate at the empirical level that one will update one’s cognitive state appropriately. You know the rules for how states at t₁ are supposed to be related to states at t₂, so you assign them at those times accordingly. As for the second, there is no requirement, on the view being adumbrated, that measurement outcomes be constituted by information (whatever that might mean), as there is no requirement that they be represented by a quantum state (e.g., we don’t have to think of measurement pointer degrees of freedom taking on definite states as being constitutive of measurement outcomes). One can simply treat measurement outcomes as brute facts, happenings that will lead the experimenter to adopt certain quantum states in ways dictated by the theory, experimental context and their background beliefs.

The real problem for the approach, indeed an insurmountable one, is presented rather by the following dilemma. The quantum state represents information? John Bell asked wisely: Information about what? (Bell, 1990) It seems that only two kinds of answer could be given:

1. Information about what the outcomes of experiments will be;
2. Information about how things are with a system prior to measurement, i.e., about hidden variables.

Neither of these is satisfactory. The essential interpretive aim of construing the quantum state as information is to mollify worries about its odd behaviour (collapse, nonlocality). Such behaviour isn’t troublesome if the state isn’t describing physical goings-on. One argues: there’s not really any physical collapse, just a change in our knowledge; there’s not really any nonlocality, it’s only Alice’s knowledge of (information about) Bob’s system that changes when she makes a measurement on her half of an EPR pair. But now suppose one opted for answer (2) to our question ‘Information about what?’, arguing that the information was about hidden variables.
This would defeat the purpose of adopting this approach in the first place, as we all know that hidden variables are going to be very badly behaved indeed in quantum mechanics (nonlocality, contextuality). So our would-be informationist surely can’t want this answer. Turning then to the first answer, the trouble here is to avoid simply sliding into instrumentalism. An instrumentalist would assert that the quantum state is merely a device for calculating the statistics for measurement outcomes. How is the current view any different, apart from having co-opted the vogue term ‘information’? The point is, instrumentalism is not a particularly attractive or interesting interpretive option in quantum mechanics, amounting more to a refusal to ask questions than to an attempt to take quantum mechanics seriously. It is scarcely the epistemologically enlightened position that older generations of physicists, suffering from positivistic hang-overs, would have us believe. If instrumentalism is all that appealing to information really amounts to, then there is little to be said for it. This shop-worn position is not made any more attractive simply by being re-packaged with modern frills.

A further fundamental problem for this approach is that ‘information’, as it is required to feature in the approach, is a factive term. (I can’t have the information that p unless it is true that p.) This turns out to undermine the move away from the objectivity of state ascriptions that it was the express aim of the approach to achieve. This matter is discussed in (Timpson, 2004b, Chapter 8). We may safely conclude that simply reading the quantum state in terms of information is not a successful move.

4.4.2 Axiomatics

If we are to find interesting work for the notion of information in approaching foundational questions in quantum mechanics, we must avoid an unedifying descent into instrumentalism. A quite different approach is to investigate whether ideas from quantum information theory might help provide a perspicuous conceptual basis for quantum mechanics by leading us to an enlightening axiomatisation of the theory. We have seen that strikingly different possibilities for information transfer and computation are to be found in quantum mechanics when compared with the classical case: might these facts not help us characterise how and why quantum theory has to differ from classical physics? The most powerful expression of this viewpoint has been presented by Fuchs and co-workers (cf. (Fuchs, 2003)). We shall briefly survey three approaches in this vein.

4.4.2.1 Zeilinger’s Foundational Principle

(Zeilinger, 1999) adopts an instrumentalist view of the quantum state along with a phenomenalist metaphysics: physical objects are assumed not to exist in and of themselves but to be mere constructs relating sense impressions. Of more interest, and logically separable, is Zeilinger’s concern in this paper to provide an information-theoretic foundational principle for quantum mechanics. The hope is to present an intuitively straightforward principle that plays a key rôle in deriving the structure of the theory. Zeilinger suggests he has found it in the principle that:

Foundational Principle: An elementary system represents the truth value of one proposition.

This is also expressed as the claim that elementary systems carry only one bit of information. Elementary systems are those minimal components that are arrived at as the end result of a process of analysis of larger composite systems into smaller component parts. In fact the Foundational Principle comes out as a tautology in this setting, since elementary systems are defined as those which can be described by a single (presumably, elementary) proposition only. (Shades of Wittgenstein's Tractatus here.) The claim is that the Foundational Principle is the central principle for understanding quantum mechanics and that it explains both irreducible randomness and the existence of entanglement: key quantum features. It turns out, however, that the principle won't do the job (Timpson, 2003).

To see why, let us first cast the principle in more perspicuous form. As Zeilinger intends by 'proposition' something that represents an experimental question, the principle is the claim: the state of an elementary system specifies the answer to a single yes/no experimental question. The explanation then offered for randomness in quantum mechanics is that elementary quantum systems cannot, given the Foundational Principle, carry enough information to specify definite answers to all experimental questions that could be asked. Therefore, questions lacking definite answers must receive a random outcome; and this randomness must be irreducible, because if it could be reduced to hidden properties, the system would carry more than one bit of information. Entanglement is explained as arising when all of the N bits of information associated with N elementary systems are used up in specifying joint rather than individual properties, or, more generally, when more of the information is in joint properties than would be allowed classically (Brukner et al., 2001).

What goes wrong with both of these purported explanations, however, is that no attention has been paid to the structure of the set of experimental questions on individual and joint systems. But without saying something about this, the Foundational Principle has no power at all. Consider: irreducible randomness would only arise when there are more experimental questions that can be asked of an elementary system than its most detailed (pure) state description could provide an answer for. But what determines how many experimental questions there are and how they relate to one another? Certainly not the Foundational Principle. The Foundational Principle doesn't explain why, having given the finest-grained state description we can manage, experimental questions still exist that haven't already been answered by our specification of that state. Put bluntly, why isn't one bit enough? (Compare a classical Ising model spin: here the one bit we are allowed per system is quite sufficient to answer all experimental questions that could be asked.)
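The contrast with the classical bit can be made concrete in a few lines of numpy (an illustration of mine, not Zeilinger's or the text's): the finest-grained quantum description of a two-level system, the pure state |0⟩, answers the Z question with certainty but leaves the X question without a definite answer.

```python
import numpy as np

# Finest-grained description of a two-level quantum system: the pure state |0>.
psi = np.array([1.0, 0.0])

# Two yes/no experimental questions, modelled as measurement directions:
# "is the spin up along Z?" and "is the spin up along X?"
up_z = np.array([1.0, 0.0])
up_x = np.array([1.0, 1.0]) / np.sqrt(2)

# Born-rule probabilities for the answer 'yes'.
p_z = abs(up_z @ psi) ** 2  # certain: the state answers the Z question
p_x = abs(up_x @ psi) ** 2  # approximately 1/2: no definite answer to X

print(f"P(yes to Z question) = {p_z:.3f}")
print(f"P(yes to X question) = {p_x:.3f}")
```

A classical Ising spin faces only the one question its single bit already answers; the qubit's state space supplies further questions (X, Y, and every intermediate direction) that the one bit leaves open, and this is just the gap the Foundational Principle, by itself, does not explain.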
If we assume the structure of the set of questions is quantum mechanical, then of course such questions exist. But we cannot assume this structure: it is what we are trying to derive; and in the absence of any argument why space for randomness exists, we cannot be said to have explained its presence. The story with entanglement is similar. We would only have an explanation of entanglement if it were explained why there exist experimental questions concerning joint systems to which the assignment of truth values is not equivalent to an assignment of truth values to questions concerning individual systems. Only if this is the case can more information be exhausted in specifying joint properties than individual ones; otherwise the joint properties would be reducible to individual ones. What we want to know is why this is the case; but the Foundational Principle cannot tell us. As it stands, the Foundational Principle is wholly unsuccessful.

Might we be able to salvage something from the approach, however? Perhaps if we were to add further axioms that entailed something about the structure of the set of experimental questions, progress could be made. A possible addition might be a postulate that (Rovelli, 1996) adopts: it is always possible to acquire new information about a system. One wouldn't be terribly impressed by an explanation of irreducible randomness invoking the Foundational Principle and this postulate, however, as it would look rather too much like putting the answer in by hand. But there might be other virtues of the system to be explored. (Grinbaum, 2005) discusses another axiom of similar pattern to Zeilinger's Foundational Principle, from a quantum-logical perspective. (Spekkens, 2007), in a very suggestive paper, presents a toy theory whose states are states of less than maximal knowledge: the finest-grained state description the theory allows leaves as many questions about the physical properties of a system unanswered as answered. What is remarkable is that these states display much of the rich behaviour that quantum states display and which we have become accustomed to thinking is characteristic of quantum phenomena. The thought is that if such phenomena arise naturally for states of less than complete information, perhaps quantum states too ought to be thought of in that manner. Adopting this approach wholeheartedly, though, we would have to run once more the gauntlet outlined above of answering what the information was supposed to be about.

4.4.2.2 The CBH Theorem

A remarkable theorem due to Clifton, Bub and Halvorson (the CBH theorem) (Clifton et al., 2003) fares considerably better than Zeilinger's Foundational Principle. In this theorem, a characterisation of quantum mechanics is achieved in terms of three information-theoretic constraints (although it can be questioned whether all three are strictly necessary). The constraints are:

1. no superluminal information transmission between two systems by measurement on one of them;
2. no broadcasting of the information contained in an unknown state; and
3. no unconditionally secure bit-commitment.

No-broadcasting is a generalisation to mixed states of the no-cloning theorem (Barnum et al., 1996).
A state ρ would be broadcast if one could produce from it a pair of systems A and B in a joint state ρ̃AB whose reduced states are both equal to ρ. This can obtain even when ρ̃AB ≠ ρ ⊗ ρ, so long as ρ is not pure. A set of states can be broadcast iff they commute. Arguably, no-broadcasting is a more intrinsically quantum phenomenon than no-cloning, because overlapping classical probability distributions cannot be cloned either, but they can be broadcast (Fuchs, 1996).

Bit-commitment is a cryptographic protocol in which one party, Alice, provides another party, Bob, with an encoded bit value (0 or 1) in such a way that Bob may not determine the value of the bit unless Alice provides him with further information at a later stage (the 'revelation' stage), yet in which the information that Alice gives Bob is nonetheless sufficient for him to be sure that the bit value he obtains following revelation is indeed the one Alice committed to originally. It turns out that this is a useful cryptographic primitive. A protocol is insecure if either party can cheat: Alice by being free to choose which value is revealed at revelation, or Bob by learning something about the value before revelation. Classically, there is no such protocol which is unconditionally secure. It was thought for a time that quantum mechanics might allow such a protocol, using different preparations of a given density matrix as a means of encoding the bit value in such a way that Bob couldn't determine it; but it was realised that Alice could always invoke a so-called EPR cheating strategy in order to prepare whichever type of density matrix she wished at the revelation stage (Lo and Chau, 1997; Mayers, 1997). Instead of preparing a single system in a mixed state to give to Bob, she could present him with half of an entangled pair, leaving herself free to prepare whichever mixture she wished later. (See (Bub, 2001) for a detailed discussion.) We shan't dwell on bit-commitment, however, as, arguably, it is a redundant condition in the CBH theorem (see (Timpson, 2004b, §9.2.2)).

Finally, we should note that the theorem is cast in the context of C*-algebras, which CBH argue is a sufficiently general starting point, as C*-algebras can accommodate both quantum and classical theories.34 The theorem states that any C*-algebraic theory satisfying the information-theoretic constraints will be a quantum theory, that is, will have a non-commuting algebra of observables for individual systems, commuting algebras of observables for spacelike separated systems, and will allow entanglement between spacelike separated systems. The converse holds too ((Halvorson, 2004) filled in a final detail), so the conditions are necessary and sufficient for a theory to be quantum mechanical.

It is interesting and indeed remarkable that such a characterisation of quantum mechanics can be achieved, and it undoubtedly enriches our understanding of quantum mechanics and its links to other concepts, as one would hope from a worthwhile novel axiomatisation of a theory. But with that said, questions have been raised both about the scope of the theorem and about what direct light it sheds on the nature and origin of quantum mechanics. On the question of scope, a number of people have enquired whether the C*-algebraic starting point is quite so neutral as CBH assumed.
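Before turning to that question, note that the EPR cheating strategy rests on a simple linear-algebra fact worth seeing concretely: distinct preparations can yield one and the same density matrix. A minimal numpy sketch (my illustration, not from the text):

```python
import numpy as np

def density(states, probs):
    """Density matrix of a probabilistic mixture of pure states."""
    return sum(p * np.outer(s, s.conj()) for s, p in zip(states, probs))

z0, z1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])            # Z basis
plus, minus = (z0 + z1) / np.sqrt(2), (z0 - z1) / np.sqrt(2)   # X basis

rho_z = density([z0, z1], [0.5, 0.5])        # equal mixture of Z eigenstates
rho_x = density([plus, minus], [0.5, 0.5])   # equal mixture of X eigenstates

# Both preparations give the maximally mixed state I/2.
print(np.allclose(rho_z, rho_x), np.allclose(rho_z, np.eye(2) / 2))  # True True
```

By measuring her half of an entangled pair in the Z or the X basis, Alice leaves Bob's system in one or the other of these ensembles at will; since their density matrices coincide, nothing Bob can measure reveals which, and this is what lets her defer her 'commitment' to the revelation stage.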
Both (Smolin, 2005) and (Spekkens, 2007) provided examples of theories satisfying the information-theoretic constraints, yet palpably failing to add up to quantum mechanics. What their constructions lacked were aspects of the C*-algebraic starting point the theorem assumes. But for this very reason, their constructions raise the question: just how much work is that initial assumption doing?

Concrete examples of the restrictiveness of the C*-algebraic starting point may also be given (Timpson, 2004b, §9.2.2). The C*-algebraic notion of state implies that expectation values for observables must be additive. However, ever since Bell's critique of the additivity assumption in von Neumann's no-hidden-variables proof, it has been recognised that this is an extremely restrictive assumption (Bell, 1966). Insisting on beginning with C*-algebras automatically rules out a large class of possible theories: hidden variables theories having quantum-mechanical structures of observables.

Footnote 34: A C*-algebra is a particular kind of complex algebra (a complex algebra being a complex vector space of elements, having an identity element and an associative and distributive product defined on it). A familiar example of a C*-algebra is given by the set of bounded linear operators on a Hilbert space; and in fact any abstract C*-algebra finds a representation in terms of such operators on some Hilbert space. One defines a state on the algebra, which is a positive, normalised, linear functional that can be thought of as ascribing expectation values to those elements of the algebra that represent observable quantities.

This sort of criticism also relates to work by Valentini on the behaviour of general hidden variables theories which allow the possibility of non-equilibrium (i.e., non-Born-rule) probability distributions (Valentini, 2002b; Valentini, 2002a). In such theories, empirical agreement with ordinary quantum mechanics is merely a contingent matter of the hidden variables having reached an equilibrium distribution. Out of equilibrium, markedly non-quantum behaviour follows: specifically, the possibility of instantaneous signalling and the possibility of distinguishing non-orthogonal states. Two of the three information-theoretic conditions will thus be violated. From this perspective, the principles are not at all fundamental, but are accidental features of an equilibrium condition.

4.4.2.3 Interpretive Issues

It is, however, over what conclusions can be drawn from the CBH theorem about the nature of quantum mechanics that the greatest doubts lie. In the original paper, some pregnant suggestions are made:

    The fact that one can characterize quantum theory ... in terms of just a few information-theoretic principles ... lends credence to the idea that an information-theoretic point of view is the right perspective to adopt in relation to quantum theory ... We ... suggest substituting for the conceptually problematic mechanical perspective on quantum theory an information-theoretic perspective ... we are suggesting that quantum theory be viewed, not as first and foremost a mechanical theory of waves and particles ... but as a theory about the possibilities and impossibilities of information transfer. (Clifton et al., 2003, p. 4)

The difficulty lies in specifying what this amounts to. Given that the information-theoretic axioms have provided us with the familiar quantum mechanical structure once more, it is difficult to see that any of the debate over how this structure is to be interpreted (whether instrumentally or realistically; whether Copenhagen, collapse, Bohm, Everett, or what-not) is at all affected. Thus it is unclear how the information-theoretic perspective (however that is to be cashed out) could impinge on the standard ontological and epistemological questions; arguably it does not (Timpson, 2004b, pp. 214–22).

(Clifton et al., 2003) suggest that their theorem may be seen as presenting quantum mechanics as a principle theory, as opposed to a constructive theory, and that this is where its interpretive novelty is to lie. The principle/constructive distinction is due to Einstein. Thermodynamics is the paradigm principle theory, to be contrasted with a constructive theory like the kinetic theory of gases. Principle theories begin from some general, well-grounded phenomenological principles in order to derive constraints that any processes in a given domain have to satisfy. Constructive theories build from the bottom up, from what are considered to be suitably basic (and simple) elements and the laws governing their behaviour, to more complex phenomena. Einstein self-consciously adopted the principle-theory approach as a route to Special Relativity.

There are two problems here. The first: even if one were to agree that quantum mechanics might usefully be viewed as a principle theory, where the principles are information-theoretic, this would not take us very far. It would tell us that systems have to have certain C*-algebraic states and algebras of observables associated with them, on pain of violation of the principles. But adopting this approach does not constrain at all how these states and observables are to be understood. Yet the usual interpretive issues in quantum mechanics lie at just this level: how are we to understand how the formalism maps onto features of reality (if at all)? Remaining silent on this is simply to fail to engage with the central conceptual questions, rather than to present a less problematic alternative.

The second point is that drawing an analogy with Einstein's (wonderfully successful!) principle-theory approach to Special Relativity backfires (Brown and Timpson, 2006). Einstein was quite clear that constructive theories were to be preferred to principle theories and that constructive theories were more explanatory. He reached for a principle-theory methodology to obtain the Special Theory of Relativity only as a move of desperation, given the confused state of physics at the turn of the twentieth century, and was always unhappy with central elements of his original formulation of the theory thereafter (see (Brown, 2005) for more detail on this and on constructive alternatives to Einstein's original formulation). Einstein's 1905 methodology was a case of pragmatism winning out over explanatory depth. It is hard to see that an analogous manoeuvre would serve any purpose now, given that we already possess quantum theory; and this theory, in its quotidian form and application, is clearly the constructive theory for physics.

4.4.2.4 Quantum Bayesianism

The final approach we shall consider is the most radical, and for that reason the most interesting, one so far. This is the quantum Bayesianism of Caves, Fuchs and Schack (Fuchs, 2003; Caves et al., 2002b; Fuchs, 2002a; Caves et al., 2002a; Fuchs and Schack, 2004). (Here we concentrate on the position as advocated by Fuchs.)
The quantum Bayesian approach is characterised by its non-realist view of the quantum state: the quantum state ascribed to an individual system is understood to represent a compact summary of an agent's degrees of belief about what the results of measurement interventions on the system will be. The probability ascriptions arising from a particular state are understood in a purely subjective, Bayesian manner. Then, just as on a subjective Bayesian view of probability there is no right or wrong about what the probability of an event is, on the quantum Bayesian view of the state there is no right or wrong about what the quantum state assigned to a system is.35 The approach thus figures as the terminus of the tradition which has sought to tie the quantum state to cognitive states; but now, importantly, the cognitive state invoked is that of belief, not knowledge. The quantum state does not represent information, on this view (despite the occasional misleading claim to this effect); it represents an individual agent's subjective degrees of belief about what will happen in a measurement.

Footnote 35: The fact that scientists in the lab tend to agree about what states should be assigned to systems is then explained by providing a subjective 'surrogate' for objectivity, along the lines that de Finetti provided for subjective probability: an explanation of why different agents' degrees of belief may be expected to come into alignment, given enough data, in suitable circumstances (Caves et al., 2002b).

Importantly, however, this non-realist view of the quantum state is not the end point of the proposal, but merely its starting point. The aim is for more than a new formulation of instrumentalism, and for this reason it would be misguided to attack the approach as an instrumentalist one. Rather, the hope expressed is that when the correct view is taken of certain elements of the quantum formalism (viz. quantum states and operations), it will be possible to 'see through' the quantum formalism to the real ontological lessons it is trying to teach us. Fuchs and Schack put it in the following way:

    [O]ne ... might say of quantum theory, that in those cases where it is not just Bayesian probability theory full stop, it is a theory of stimulation and response (Fuchs, 2002b; Fuchs, 2003). The agent, through the process of quantum measurement stimulates the world external to himself. The world, in return, stimulates a response in the agent that is quantified by a change in his beliefs—i.e., by a change from a prior to a posterior quantum state. Somewhere in the structure of those belief changes lies quantum theory's most direct statement about what we believe of the world as it is without agents. (Fuchs and Schack, 2004)

Given the point of departure of a Bayesian view of the state, and using techniques from quantum information, the aim is to winnow the objective elements of quantum theory (reflecting physical facts about the world) from the subjective (to do with our reasoning). Ultimately, the hope is to show that the mathematical structure of quantum mechanics is largely forced on us, by demonstrating that it represents the only, or perhaps simply the most natural, framework in which intersubjective agreement and empirical success can be achieved, given the basic fact (much emphasised in the Copenhagen tradition) that in the quantum domain the ideal of a detached observer seems unattainable. One of the main attractions of this approach, therefore, is that it aims to fill in an important lacuna associated with many views in the Copenhagen tradition. It is all very well, perhaps, adopting some non-realist view of the quantum formalism; but, one may ask, why is it that our best theory of the very small takes such a form that it needs to be interpreted in this manner? Why are we forced to a theory that does not have a straightforward realist interpretation? Why is this the best we can do? The programme of Caves, Fuchs and Schack sets out its stall to make progress with these questions, hoping to arrive at some simple physical statements which capture what it is about the world that forces us to a theory with the structure of quantum mechanics. Note, however, that although the aim is to seek a transparent conceptual basis for quantum mechanics, there is no claim that the theory should be understood as a principle theory. In further contrast to the CBH approach, rather than seeking to provide an axiomatisation of the quantum formalism which might then be interpreted in various ways, the idea is instead to take one particular interpretive stance and see whether this leads us to a perspicuous axiomatisation.
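The de Finetti-style 'surrogate for objectivity' mentioned in footnote 35 has a simple classical analogue, sketched here as a toy example of my own (not the Caves–Fuchs–Schack quantum construction): two agents with quite different Beta priors over a binary outcome are driven into near-agreement by a shared run of data, without any appeal to a 'true' prior.

```python
import random

# Two agents with different Beta(a, b) priors over the bias of a coin.
priors = {"optimist": (8.0, 2.0), "pessimist": (2.0, 8.0)}

random.seed(0)
flips = [1 if random.random() < 0.7 else 0 for _ in range(1000)]  # shared data
heads = sum(flips)

means = {}
for name, (a, b) in priors.items():
    # Conjugate Beta-Bernoulli update: add observed counts to the prior.
    means[name] = (a + heads) / (a + b + len(flips))
    print(name, round(means[name], 3))
```

Whatever their starting points, the two posterior means differ by exactly 6/1010 after a thousand flips. Agreement is explained without ever saying there was a right prior, which is just the shape of explanation the quantum Bayesian offers for agreement on quantum state assignments.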
This approach is self-consciously a research programme: if we adopt this view of the quantum formalism, where does it lead us? The proof of the pudding will be in the eating. The immediately pressing questions the approach raises are whether adopting the Bayesian approach would force us to give up too much of what one requires as objective in quantum mechanics, and what ontological picture goes along with the approach. How ought we to conceive of a world in which the quantum Bayesian approach is the right one to take towards our best fundamental theory? These are matters for further investigation.

4.5 Outlook

We have traversed a wide theoretical landscape and dwelt on quite a number of issues. Some conclusions have been clear and others left open. Some have been positive: we have seen how one ought to understand the notion of quantum informationₜ, for example, and how this helps us understand informationₜ flow in entanglement-assisted communication. Others have been negative: we have seen the dangers of crude attempts to argue that the quantum state is information, or that quantum parallelism is a good argument for the Everett interpretation. Some important issues have been barely touched on, others not at all. Let us close by indicating a few of these.

Perhaps the most important philosophical issue that we have not discussed directly here is the general question of what kind of rôle the concept of informationₜ has to play in physics. We have established some matters relevant to this: informationₜ is not a kind of stuff, so introduction of the concept of quantum informationₜ is not a case of adding to the furniture of the world; but we have not attacked the issue directly. What we would like to be able to do is answer the question of what kind of physical concept informationₜ is. Is it a fundamental one? (Might there be more than one way in which concepts can be physically fundamental? Probably.) Or is it an adventitious one: of the nature of an addition from without, an addition from the parochial perspective of an agent wishing to treat some system information-theoretically, for whatever reason? In addressing this issue it would be extremely helpful to have detailed comparisons with physical concepts that usually are taken as fundamental (relatively unproblematically), such as energy, charge and mass. (Notice that 'energy', 'charge' and 'mass' are all abstract nouns too; property names, in fact. How does 'informationₜ' differ from these? It is not always a property name, for one thing.)
A related theme is that of principle versus constructive theories, one we have touched on briefly. With its focus on task-oriented principles (you can’t clone, you can’t broadcast, you can’t teleport without an ebit of entanglement), quantum information theory perhaps provides some scope for re-assessing preference for constructive theories over principle theories. If ‘informationt ’ features heavily in those latter theories, perhaps this would be an argument that informationt is indeed a fundamental concept. Against that, one faces the perennial concern whether principle theories can ever really be truly fundamental. Related again is the entire sphere of entanglement thermodynamics (see (Clifton, 2002) for an invitation). The principle of no-increase of entanglement under local operations and classical communication appears to be in some ways

FURTHER READING

251

akin to the second law of thermodynamics. Various analogies have been developed between entanglement and thermodynamic quantities (Plenio and Vedral, 1998; Rohrlich, 2001; Horodecki et al., 2001). It is a pressing concern to establish what these analogies have to teach us about the nature of entanglement and whether they are more than merely formal. Another noteworthy omission is any discussion of the thermodynamics of informationt processing. This is an important issue that bears on the question of what one is to make of the notion of informationt physically, particularly in discussion of Maxwell’s demon, and in modern treatments of the exorcism of the demon that appeal to Landauer’s Principle. Finally, one might wish to consider more explicitly the methodological lessons that quantum informationt theory presents. One such lesson, perhaps, is that it provides an example of a theory of rich vigour and complexity in fundamental physics which does not proceed by introducing new kinds of material things into the world: it does not postulate new fundamental fields, particles, aether or ectoplasm. What it does do is ask new kinds of questions, illustrating the fact that fundamental physics need not always progress by the successful postulation of new things, or new kinds of things, but can also progress by introducing new general frameworks of enquiry in which new questions can be asked and in which devices are developed to answer them. Thus quantum informationt theory might be another example to set alongside anaytical mechanics in Butterfield’s call for more attention on the part of philosophers of science to the importance of such general problem setting and solving schemes in physics (Butterfield, 2004). 4.6

Further Reading

Systematic presentations of quantum information theory are given by (Nielsen and Chuang, 2000; Bouwmeester et al., 2000; Preskill, 1998; Bennett and Shor, 1998). (Schumacher, 1995) is a very instructive read, as are Shannon's original papers (Shannon, 1948) (although there are one or two aspects of Shannon's presentation that have promulgated confusion down the years; cf. (Uffink, 1990) and the discussion in (Timpson, 2003, §2)). (Bub, 2006b) provides in many ways a fine complement to the discussion presented here. For more detail on many of the arguments and contentions I have presented, see (Timpson, 2004b) and (Timpson, 2007).

(Fuchs, 2003) is a pithy and instructive collection of meditations on what significance quantum information theory might hold for the foundations of quantum mechanics, including the inside story on the mischievously polemical (Fuchs and Peres, 2000). (Fuchs, 2002b) gives important background on the development of the quantum Bayesian position, while (Caves et al., 2006) provides the clearest statement to date of some important points.

A very promising approach to coming to understand quantum mechanics better, by getting a grasp on where it is located in the space of possible theories which allow information processing of various kinds, is given by (Barrett, 2007), building on the work of Popescu, Rohrlich and Hardy. See also (Barnum et al., 2006).

(Leff and Rex, 2003) is a most useful collection of essays on Maxwell's demon. See (Earman and Norton, 1998; Earman and Norton, 1999; Maroney, 2002; Bennett, 2003; Maroney, 2005; Norton, 2005; Ladyman et al., 2007) for further discussion.

Vol. 34, No. 3 of Studies in History and Philosophy of Modern Physics (2003) is a special issue on quantum information and computation, containing a number of the papers already referred to, along with others of interest. (Zurek, 1990) is a proceedings volume of intriguing and relatively early discussions of the physics of information. (Teuscher, 2004) is a stimulating volume of essays on Turing, his work on computers and some modern developments.

Acknowledgements

I would like to thank Jeremy Butterfield, Harvey Brown, Jeff Bub, Ari Duwell, Chris Fuchs, Hans Halvorson, Pieter Kok, Joseph Melia, Michael Nielsen and Rob Spekkens for useful discussion. Thanks also to the editor for his invitation to contribute to this volume and for his patience.

REFERENCES

Band, W. and Park, J. L. (1970). The empirical determination of quantum states. Foundations of Physics, 1(2): 133–44.
Barnum, H., Barrett, J., Leifer, M., and Wilce, A. (2006). Cloning and broadcasting in generic probabilistic theories. arXiv:quant-ph/0611295.
Barnum, H., Caves, C. M., Fuchs, C. A., Jozsa, R., and Schumacher, B. (1996). Noncommuting mixed states cannot be broadcast. Phys. Rev. Lett., 76: 2818.
Barrett, J. (2007). Information processing in generalized probabilistic theories. Physical Review A, 75: 032304. arXiv:quant-ph/0508211.
Bell, J. S. (1966). On the problem of hidden variables in quantum mechanics. Rev. Mod. Phys., 38: 447–52. Repr. in his Speakable and Unspeakable in Quantum Mechanics, 2nd edn. Cambridge: Cambridge University Press (2004).
Bell, J. S. (1990). Against 'measurement'. Physics World, August, pp. 33–40. Repr. in his Speakable and Unspeakable in Quantum Mechanics, 2nd edn. Cambridge: Cambridge University Press (2004).
Bennett, C. H. (2003). Notes on Landauer's principle, reversible computation, and Maxwell's demon. Studies in History and Philosophy of Modern Physics, 34(3): 501–10.
Bennett, C. H. and Brassard, G. (1984). Quantum cryptography: Public key distribution and coin tossing. In Proc. IEEE Int. Conf. Computers, Systems and Signal Processing, pp. 175–9.
Bennett, C. H., Brassard, G., and Breidbart, S. (1982). Quantum cryptography II: How to reuse a one-time pad safely even if P=NP. Unpublished manuscript.
Bennett, C. H., Brassard, G., Crépeau, C., Jozsa, R., Peres, A., and Wootters, W. (1993). Teleporting an unknown state via dual classical and EPR channels. Phys. Rev. Lett., 70: 1895–9.
Bennett, C. H., Brassard, G., and Mermin, N. D. (1992). Quantum cryptography without Bell's inequality. Physical Review Letters, 68(5): 557–9.
Bennett, C. H. and Shor, P. W. (1998). Quantum information theory. IEEE Trans. on Inf. Theor., 44(6): 2724–42.
Bennett, C. H. and Wiesner, S. J. (1992). Communication via one- and two-particle operators on Einstein-Podolsky-Rosen states. Phys. Rev. Lett., 69(20): 2881–4.
Bouwmeester, D., Ekert, A., and Zeilinger, A. (2000). The Physics of Quantum Information. Springer-Verlag, Berlin, Heidelberg, New York.
Brassard, G. (2005). Brief history of quantum cryptography: A personal perspective. arXiv:quant-ph/0604072.
Braunstein, S. L. (1996). Quantum teleportation without irreversible detection. Physical Review A, 53(3): 1900–2.
Braunstein, S. L., D'Ariano, G. M., Milburn, G. J., and Sacchi, M. F. (2000). Universal teleportation with a twist. Phys. Rev. Lett., 84(15): 3486–9.
Brown, H. R. (2005). Physical Relativity: Space-Time Structure from a Dynamical Perspective. Oxford University Press.
Brown, H. R. and Timpson, C. G. (2006). Why special relativity should not be a template for a fundamental reformulation of quantum mechanics. In Demopoulos, W. and Pitowsky, I., eds, Physical Theory and Its Interpretation: Essays in Honor of Jeffrey Bub, Vol. 72 of The Western Ontario Series in Philosophy of Science. Springer. arXiv:quant-ph/0601182.
Browne, D. E. and Briegel, H. J. (2006). One-way quantum computation: A tutorial introduction. arXiv:quant-ph/0603226.
Brukner, C., Zukowski, M., and Zeilinger, A. (2001). The essence of entanglement. arXiv:quant-ph/0106119.
Bruss, D. (2002). Characterizing entanglement. J. Math. Phys., 43(9): 4237–51.
Bub, J. (2001). The quantum bit commitment theorem. Found. Phys., 31: 735–56.
Bub, J. (2006a). A geometrical formulation of quantum algorithms. arXiv:quant-ph/0605243.
Bub, J. (2006b). Quantum information and computation. In Butterfield, J. and Earman, J., eds, Philosophy of Physics, Handbook of the Philosophy of Science. Elsevier. arXiv:quant-ph/0512125.
Busch, P. (1997). Is the quantum state (an) observable? In Cohen, R. S., Horne, M., and Stachel, J., eds, Potentiality, Entanglement and Passion-at-a-Distance, pp. 61–70. Kluwer Academic Publishers, Dordrecht, Boston, London. arXiv:quant-ph/9604014.
Butterfield, J. (2004). Between laws and models: Some philosophical morals of Lagrangian mechanics. arXiv:physics/0409030.
Caves, C. M. and Fuchs, C. A. (1996). Quantum information: How much information in a state vector? In Mann, A. and Revzen, R., eds, The Dilemma of Einstein, Podolsky and Rosen—60 Years Later. Israel Physical Society. arXiv:quant-ph/9601025.
Caves, C. M., Fuchs, C. A., and Schack, R. (2002a). Conditions for compatibility of quantum state assignments. Phys. Rev. A, 66(6): 062111/1–11. arXiv:quant-ph/0206110.
Caves, C. M., Fuchs, C. A., and Schack, R. (2002b). Unknown quantum states: The quantum de Finetti representation. J. Math. Phys., 43(9): 4537.
Caves, C. M., Fuchs, C. A., and Schack, R. (2006). Subjective probability and quantum certainty. arXiv:quant-ph/0608190. Forthcoming in Studies in History and Philosophy of Modern Physics.
Church, A. (1936). An unsolvable problem of elementary number theory. American Journal of Mathematics, 58: 345–65. Repr. in (?), pp. 89–107.
Clifton, R. (2002). The subtleties of entanglement and its role in quantum information theory. Philosophy of Science, 69(3, Part II): S150–S167.
Clifton, R., Bub, J., and Halvorson, H. (2003). Characterizing quantum theory in terms of information theoretic constraints. Found. Phys., 33(11): 1561. Page refs. to arXiv:quant-ph/0211089.
Copeland, B. J. (2000). Narrow versus wide mechanism: Including a re-examination of Turing's views on the mind-machine issue. The Journal of Philosophy, XCVI(1).
Copeland, B. J. (2002). The Church-Turing thesis. Stanford Encyclopedia of Philosophy. http://plato.stanford.edu/archives/fall2002/entries/church-turing.
Cutland, N. (1980). Computability: An Introduction to Recursive Function Theory. Cambridge: Cambridge University Press.
d'Espagnat, B. (1976). Conceptual Foundations of Quantum Mechanics. Addison-Wesley, 2nd edn.
Deutsch, D. (1985). Quantum theory, the Church-Turing Principle and the universal quantum computer. Proceedings of the Royal Society of London A, 400: 97–117.
Deutsch, D. (1997). The Fabric of Reality. Penguin Books, London.
Deutsch, D., Ekert, A., and Lupacchini, R. (1999). Machines, logic and quantum physics. arXiv:math.HO/9911150.
Deutsch, D. and Hayden, P. (2000). Information flow in entangled quantum systems. Proceedings of the Royal Society of London A, 456: 1759–74. arXiv:quant-ph/9906007.
Dieks, D. (1982). Communication by EPR devices. Phys. Lett. A, 92(6): 271–2.
Donald, M. J., Horodecki, M., and Rudolph, O. (2002). The uniqueness theorem for entanglement measures. Journal of Mathematical Physics, 43(9): 4252–72.
Dretske, F. I. (1981). Knowledge and the Flow of Information. Basil Blackwell, Oxford.
Duwell, A. (2003). Quantum information does not exist. Studies in History and Philosophy of Modern Physics, 34(3): 479–99.
Duwell, A. (2005). Quantum information does exist. Unpublished manuscript.
Earman, J. and Norton, J. (1993). Forever is a day: Supertasks in Pitowsky and Malament-Hogarth spacetimes. Philosophy of Science, 60: 22–42.
Earman, J. and Norton, J. D. (1998). Exorcist XIV: The wrath of Maxwell's Demon. Part I. From Maxwell to Szilard.
Studies in History and Philosophy of Modern Physics, 29(4): 435–71. Earman, J. and Norton, J. D. (1999). Exorcist XIV: The wrath of Maxwell’s Demon. Part II. From Szilard to Landauer and beyond. Studies in History and Philosophy of Modern Physics, 30(1): 1–40. Eisert, J. and Gross, D. (2006). Multi-particle entanglement. In Bruss, D. and Leuch, G. eds, Lectures on Quantum Information. Wiley. arXiv:quantph/0505149. Ekert, A. (1991). Quantum cryptography based on Bell’s theorem. Phys. Rev. Lett., 67: 661–3. Ekert, A. and Jozsa, R. (1996). Quantum computation and Shor’s factoring

256

REFERENCES

algorithm. Rev. Mod. Phys., 68(3): 733–53. Ekert, A. and Jozsa, R. (1998). Quantum algorithms: Entanglement enhanced information processing. Phil. Trans. R. Soc. Lond. A, 356(1743): 1769–82. arXiv:quant-ph/9803072. Fano, U. (1957). Description of states in quantum mechanics by density operator techniques. Rev. Mod. Phys., 29(1): 74–93. Feynman, R. (1982). Simulating physics with computers. Int. J. of Theor. Phys., 21(6/7). Fuchs, C. A. (1996). Distinguishability and Accessible Information in Quantum Theory. PhD thesis, University of New Mexico. arXiv:quant-ph/9601020. Fuchs, C. A. (1998). Information gain vs. state disturbance in quantum theory. Fortschritte der Physik, 46(4,5): 535–65. arXiv:quant-ph/9611010. Fuchs, C. A. (2002a). Quantum mechanics as quantum information (and only a little more). In Khrenikov, A., ed, Quantum Theory: Reconsideration of Foundations. V¨ axj¨ o University Press. arXiv:quant-ph/0205039. Fuchs, C. A. (2002b). Quantum states: What the hell are they? (The Post-V¨ axj¨ o Phase Transition). http://netlib.bell-labs.com/who/cafuchs. Fuchs, C. A. (2003). Notes on a Paulian Idea: Foundational, Historical, Anecdotal and Forward Looking Thoughts on the Quantum (Selected Correspondence). V¨ axj¨ o University Press. arXiv:quant-ph/0105039. Fuchs, C. A. and Jacob, K. (2001). Information tradeoff relations for finitestrength quantum measurements. Phys. Rev. A, 63(6): 062305. arXiv:quantph/0009101. Fuchs, C. A. and Peres, A. (2000). Quantum theory needs no ‘interpretation’. Physics Today, 53(3): 70–1. Fuchs, C. A. and Schack, R. (2004). Unknown quantum states and operations, a ˇ aˇcek, J. eds, Quantum State Estimation, Bayesian view. In Paris, M. and Reh´ Lecture Notes in Physics, pp. 147–87. Springer. arXiv:quant-ph/0404156. Gisin, N., Ribordy, G., Tittell, W., and Zbinden, H. (2002). Quantum cryptography. Reviews of Modern Physics, 74: 145–95. Glock, H.-J. (2003). Quine and Davidson on Language, Thought and Reality. 
Cambridge: Cambridge University Press. Grinbaum, A. (2005). Information-theoretic principle entails orthomodularity of a lattice. Foundations of Physics Letters, 18(6): 563–72. Grover, L. (1996). A fast quantum-mechanical algorithm for database search. In Proceedings of the 28th Annual ACM Symposium on the Theory of Computing. Halvorson, H. (2004). Remote preparation of arbitrary ensembles and quantum bit commitment. Journal of Mathematical Physics, 45: 4920–31. arXiv:quantph/0310001. Hartle, J. B. (1968). Quantum mechanics of individual systems. Am. J. Phys., 36(8): 704–12. Herbert, N. (1982). Flash—a superluminal communicator based upon a new kind of quantum measurement. Found. Phys., 12(12): 1171–9.

REFERENCES

257

Hewitt-Horsman, C. (2002). Quantum computation and many worlds. arXiv:quant-ph/0210204. Hogarth, M. (1994). Non-Turing computers and non-Turing computability. Philosophy of Science Supplementary, I: 126–38. Holevo, A. S. (1973). Information theoretical aspects of quantum measurement. Probl. Inf. Transm. (USSR), 9: 177–83. Horodecki, M., Horodecki, P., and Horodecki, R. (1996). Separability of mixed states: Necessary and sufficient conditions. Phys. Lett. A, 223. Horodecki, M., Horodecki, P., Horodecki, R., and Piani, M. (2006). Quantumness of ensemble from no-broadcasting principle. International Journal of Quantum Information, 4(1): 105–18. arXiv:quant-ph/0506174. Horodecki, M., Horodecki, R., Sen, A., and Sen, U. (2005). Common origin of no-cloning and no-deleting principles—conservation of information. Found. Phys., 35(12): 2041–49. Horodecki, R., Horodecki, M., and Horodecki, P. (2001). Balance of information in bipartite quantum-communication systems: Entanglement-energy analogy. Physical Review A, 63: 022310. Jozsa, R. (1998). Entanglement and quantum computation. In Huggett, S., Mason, L., Tod, K. P., Tsou, S. T., and Woodhouse, N. M. J. eds, The Geometric Universe, pp. 369–79. Oxford University Press, Oxford. arXiv:quantph/9707034. Jozsa, R. (2000). Quantum algorithms. In Bouwmeester, D., Ekert, A., and Zeilinger, A. eds, The Physics of Quantum Information, pp. 104–26. SpringerVerlag, Berlin Heidelberg. Jozsa, R. (2004). Illustrating the concept of quantum information. IBM Journal of Research and Development, 4(1): 79–85. arXiv:quant-ph/0305114. Jozsa, R. and Linden, N. (2003). On the role of entanglement in quantumcomputational speed-up. Proc. R. Soc. Lond. A, 459(2036): 2011–32. arXiv:quant-ph/0201143. Ladyman, J., Presnell, S., Short, A. J., and Groisman, B. (2007). The connection between logical and thermodynamic irreversibility. Studies in History and Philosophy of Modern Physics, 38: 58–79. PITT PHIL SCI 00002689. Landauer, R. (1996). 
The physical nature of information. Phys. Lett. A, 217: 188–93. Leff, H. S. and Rex, A. F. (2003). Maxwell’s Demon 2: Entropy, Classical and Quantum Information, Computing. Institute of Physics Publishing. Lo, H. K. and Chau, H. F. (1997). Is quantum bit commitment really possible? Phys. Rev. Lett., 78(17): 3410–3. Maroney, O. (2002). Information and Entropy in Quantum Theory. PhD thesis, Birkbeck College, University of London. http://www.bbk.ac.uk/tpru/OwenMaroney/thesis/thesis.html. Maroney, O. (2005). The (absence of a) relationship between thermodynamic and logical reversibility. Studies in History and Philosophy of Modern Physics, 36: 355–74.

258

REFERENCES

Mayers, D. (1997). Unconditionally secure bit commitment is impossible. Phys. Rev. Lett., 78(17): 33414–7. Mermin, N. D. (2001a). From classical state-swapping to teleportation. Physical Review A, 65(1): 012320. arXiv:quant-ph/0105117. Mermin, N. D. (2001b). Whose knowledge? In Bertlmann, R. and Zeilinger, A. eds, Quantum (Un)speakables: Essays in Commemoration of John S. Bell. Springer-Verlag, Berlin, Heidleberg. arXiv:quant-ph/0107151. Nielsen, M. A. (1997). Computable functions, quantum measurements, and quantum dynamics. Phys. Rev. Lett., 79(15): 2915–8. Nielsen, M. A. and Chuang, I. (2000). Quantum Computation and Quantum Information. Cambridge: Cambridge University Press. Norton, J. D. (2005). Eaters of the lotus: Landauer’s principle and the return of Maxwell’s demon. Studies in History and Philosophy of Modern Physics, 36: 375–411. Park, J. L. (1970). The concept of transition in quantum mechanics. Foundations of Physics, 1(1): 23. Peierls, R. (1991). In defence of “Measurement”. Physics World. January pp. 19–20. Penrose, R. (1998). Quantum computation, entanglement and state reduction. Philosophical Transactions of the Royal Society of London A, 356: 1927–39. Peres, A. (1995). Quantum Theory: Concepts and Methods. Kluwer Academic Publishers, Dordrecht. Peres, A. (1996). Separability criterion for density matrices. Phys. Rev. Lett., 77(8): 1413–5. Petersen, A. (1963). The philosophy of Niels Bohr. Bulletin of the Atomic Scientists, 19(7): 8–14. Pitowsky, I. (2002). Quantum speed-up of computations. Philosophy of Science, supp. 69(3): S168–S177. Proceedings of PSA 2000, Symposia papers. Plenio, M. B. and Vedral, V. (1998). Teleportation, engtanglement and thermodynamics in the quantum world. Contemporary Physics, 39(6): 431–46. arXiv:quant-ph/9804075. Popescu, S. and Rohrlich, D. (1997). Thermodynamics and the measure of entanglement. Physical Review A, 56: R3319–R3321. Preskill, J. (1998). Preskill’s lectures on quantum computing. 
http://www.theory.caltech.edu/∼preskill/ph229. Raussendorf, R. and Briegel, H. J. (2001). A one-way computer. Physical Review Letters, 86(22): 5188–91. Raussendorf, R., Browne, D. E., and Briegel, H. J. (2003). Measurement-based quantum computation on cluster states. Physical Review A, 68: 022312. Rohrlich, D. (2001). Thermodynamical analogues in quantum information theory. Optics and Spectroscopy, 91(3): 363–7. Rovelli, C. (1996). Relational quantum mechanics. International Journal of Theoretical Physics, 35(8): 1637–78. Schumacher, B. (1995). Quantum coding. Phys. Rev. A, 51(4): 2738.

REFERENCES

259

Seevick, M. and Uffink, J. (2001). Sufficient conditions for three-particle entanglement and their tests in recent experiments. Physical Review A, 65: 012107. Seevinck, M. and Svetlichny, G. (2002). Bell-type inequalities for partial separability in N-particle systems and quantum mechanical violations. Physical Review Letters, 89(6): 060401. Shagrir, O. and Pitowsky, I. (2003). Physical hypercomputation and the Church-Turing thesis. Minds and Machines, 18: 87–101. Shannon, C. E. (1948). The mathematical theory of communication. Bell Syst. Tech. J., 27: 379–423, 623–56. repr. in (?) pp. 30–125; page refs. to this reprint. Shor, P. W. (1994). Algorithms for quantum computation: Discrete logarithms and factoring. In Proceedings of the 35th Annual IEEE Symposium on Foundations of Computer Science. See also arXiv:quant-ph/9508027. Simon, C., Buˇzek, V., and Gisin, N. (2001). No-signalling condition and quantum dynamics. Phys. Rev. Lett., 87: 170405. Smolin, J. (2005). Can quantum cryptography imply quantum mechanics? Quantum Information and Computation, 5(2): 161–9. arXiv:quantph/0310067. Spekkens, R. (2007). Evidence for the epistemic view of quantum states: A toy theory. Physical Review A, 75: 032110. arXiv:quant-ph/0401052. Steane, A. M. (2003). A quantum computer only needs one universe. Studies in History and Philosophy of Modern Physics, 34(3):469–78. Svetlichny, G. (1998). Quantum formalism with state-collapse and superluminal communication. Found. Phys., 28(2): 131–55. Svetlichny, G. (2002). Comment on ‘No-signalling condition and quantum dynamics’. arXiv:quant-ph/0208049. Teuscher, C., ed (2004). Alan Turing: Life and Legacy of a Great Thinker. Springer. Timpson, C. G. (2003). On a supposed conceptual inadequacy of the Shannon information in quantum mechanics. Studies in History and Philosophy of Modern Physics, 34(3): 441–68. Timpson, C. G. (2004a). Quantum computers: The Church-Turing hypothesis versus the Turing Principle. 
In Teuscher, C., ed, Alan Turing: Life and Legacy of a Great Thinker, pp. 213–40. Springer-Verlag, Berlin Heidelberg. Timpson, C. G. (2004b). Quantum Information Theory and the Foundations of Quantum Mechanics. PhD thesis, University of Oxford. arXiv:quantph/0412063. Timpson, C. G. (2005). Nonlocality and information flow: The approach of Deutsch and Hayden. Foundations of Physics, 35(2): 313–43. arXiv:quantph/0312155. Timpson, C. G. (2006). The grammar of teleportation. The British Journal for the Philosophy of Science, 57: 587–621. arXiv:quant-ph/0509048. Timpson, C. G. (2007). Quantum Information Theory and the Foundations of

260

REFERENCES

Quantum Mechanics. Oxford University Press. Forthcoming. Timpson, C. G. and Brown, H. R. (2002). Entanglement and relativity. In Lupacchini, R. and Fano, V. eds, Understanding Physical Knowledge. University of Bologna, CLUEB, Bologna. arXiv:quant-ph/0212140. Turing, A. (1936). On Computable Numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 42: 230–65. repr. in (?) pp. 116–51. Uffink, J. (1990). Measures of Uncertainty and the Uncertainty Principle. PhD thesis, University of Amsterdam. http://www.phys.uu.nl/igg/jos/publications/proefschrift.pdf. Ursin, R., Tiefenbacher, F., Schmitt-Manderbach, T., Weier, H., Scheidl, T., Lindenthal, M., Blauensteiner, B., Jennewein, T., Perdigues, J., Trojek, P., Omer, B., Furst, M., Meyenburg, M., Rarity, J., Sodnik, Z., Barbieri, C., Weinfurter, H., and Zeilinger, A. (2007). Entanglement-based quantum communication over 144km. Nature Physics. http://dx.doi.org/10.1038/nphys629. Vaidman, L. (1994). On the paradoxical aspects of new quantum experiments. In Hull, D., Forbes, M., and Burian, R. eds, PSA 1994, Vol. 1, pp. 211–7. Philosophy of Science Association. Valentini, A. (2002a). Signal-locality and subquantum information in deterministic hidden-variable theories. In Butterfield, J. and Placek, T. eds, NonLocality and Modality, Vol. 64 of NATO Science Series: II. Kluwer Academic. arXiv:quant-ph/0112151. Valentini, A. (2002b). Subquantum information and computation. Pramana J. Phys., 59: 31. arXiv:quant-ph/0203049. Vedral, V. and Plenio, M. B. (1998). Entanglement measures and purification procedures. Physical Review A, 57(3): 1619–33. Vedral, V., Plenio, M. B., Rippin, M. A., and Knight, P. L. (1997). Quantifying entanglement. Physical Review Letters, 78(12): 2275–9. Wallace, D. and Timpson, C. G. (2006). Non-locality and gauge freedom in Deutsch and Hayden’s formulation of quantum mechanics. arXiv:quantph/0503149. Forthcoming in Foundations of Physics. Wheeler, J. A. 
(1990). Information, physics, quantum: The search for links. In Zurek, W., ed, Complexity, Entropy and the Physics of Information, pp. 3–28. Addison-Wesley, Redwood City, CA. Williams, C. P. and Clearwater, S. H. (2000). Ultimate Zero and One: Computing at the Quantum Frontier. Copernicus: Springer-Verlag. Wootters, W. K. and Zurek, W. H. (1982). A single quantum cannot be cloned. Nature, 299: 802–3. Yao, A. C. (1993). Quantum circuit complexity. In Proceedings of the 34th Annual IEEE Symposium on Foundations of Computer Science. Zeilinger, A. (1999). A foundational principle for quantum mechanics. Found. Phys., 29(4): 631–43. Zurek, W., ed (1990). Complexity, Entropy and the Physics of Information. SFI

REFERENCES

261

Studies in the Sciences of Complexity, vol. VIII. Addison-Wesley, Redwood City, CA.

5 QUANTUM GRAVITY: A PRIMER FOR PHILOSOPHERS
DEAN RICKLES

Introduction

‘Quantum Gravity’ does not denote any existing theory: the field of quantum gravity is very much a ‘work in progress’. As you will see in this chapter, there are multiple lines of attack, each with the same core goal: to find a theory that unifies, in some sense, general relativity (Einstein’s classical field theory of gravitation) and quantum field theory (the theoretical framework through which we understand the behaviour of particles in non-gravitational fields). Quantum field theory and general relativity seem to be like oil and water: they don’t like to mix. It is fair to say that combining them to produce a theory of quantum gravity constitutes the greatest unresolved puzzle in physics.

Our goal in this chapter is to give the reader an impression of what the problem of quantum gravity is; why it is an important problem; the ways that have been suggested to resolve it; and what philosophical issues these approaches, and the problem itself, generate. This review is extremely selective, as it has to be to remain a manageable size: generally, rather than going into great detail in some area, we highlight the key features and the options, in the hope that readers may take up the problem for themselves. However, some of the basic formalism will be introduced so that the reader is able to enter the physics and (what little there is of) the philosophy of physics literature prepared.[1] I have also supplied references for those cases where I have omitted some important facts. Hence, this chapter is intended primarily as a catalyst for future research projects by philosophers of physics, both budding and well-matured.

[1] Unlike the chapters on quantum theory and statistical mechanics—and, less so, quantum information theory—this chapter is, therefore, mainly devoted to spelling out the various research programmes, without quite so much emphasis on the nitty-gritty philosophical problems.

5.1 The Strange Case of Quantum Gravity

Quantum gravity involves the unification of the principles of quantum theory and general relativity. Constructing such a theory constitutes one of the greatest challenges in theoretical physics. It is a particularly hard challenge for many reasons, both formal and conceptual, some of which I aim to elucidate in what follows. Even up until the 1970s, quantum gravity was, as Michael Duff remarks, “a subject ... pursued only by mad dogs and Englishmen” ((Duff, 1999), p. 185). That is something of an exaggeration, of course: for starters, when Duff speaks of
quantum gravity, he has in mind the particle physicist’s approach to the problem, according to which the gravitational interaction involves an exchange of gravitons (the quanta of the gravitational field). However, quantum gravity, understood as the general unificatory problem sketched above, has been pursued for around eighty years in some form or another, by many of the greatest physicists, many of whom were not English! The remark has more than a grain of truth to it though; even now quantum gravity research is looked upon with some trepidation and, often, bemusement. This attitude stems primarily from the extreme detachment of quantum gravity research from experimental physics—a feature that leads Nambu (1985) to refer to quantum gravity research as “postmodern physics”! As a result, much of the research conducted in quantum gravity looks like an exercise in pure mathematics or, sometimes, metaphysics. This aspect makes quantum gravity especially interesting from a philosophical point of view, as I shall attempt to demonstrate throughout this chapter.

Quantum gravity is, in fact, rather curious from the point of view of the philosopher of physics because it is one of the few areas of contemporary physics where (some of) the physicists who are central figures in the field actively engage with philosophers, collaborating with them, participating in philosophy conferences, and contributing chapters to philosophical books and journals (two recent examples are (Callender and Huggett, 2001) and (Rickles et al., 2006)).
Carlo Rovelli, co-founder of one of the main lines of attack, known as ‘loop quantum gravity’, explicitly invites philosophers’ cooperation, writing (in an essay from another philosophy collection: (Earman and Norton, 1997)):

    As a physicist involved in this effort [quantum gravity—DR], I wish the philosophers who are interested in the scientific description of the world would not confine themselves to commenting and polishing the present fragmentary physical theories, but would take the risk of trying to look ahead. ((Rovelli, 1997), p. 182)

Similarly, John Baez—a mathematical physicist who has done important work in making loop gravity rigorous—writes (again in a philosophical collection: (Callender and Huggett, 2001)):

    Can philosophers really contribute to the project of reconciling general relativity and quantum field theory? Or is this a technical business best left to the experts? [...] General relativity and quantum field theory are based on some profound insights about the nature of reality. These insights are crystallized in the form of mathematics, but there is a limit to how much progress we can make by just playing around with this mathematics. We need to go back to the insights behind general relativity and quantum field theory, learn to hold them together in our minds, and dare to imagine a world more strange, more beautiful, but ultimately more reasonable than our current theories of it. For this daunting task, philosophical reflection is bound to be of help. ((Baez, 2001), p. 177)

This ‘intellectually open’ attitude, coupled with what we might call the ‘fluid’ state of play in quantum gravity, makes it possible that philosophers of physics might get involved with the constructive business of physics, as the philosopher Tian Yu Cao points out, “with a good chance to make some positive contributions, rather than just analysing philosophically what physicists have already established” ((Cao, 2001), p. 183).[2] However, the exact nature of these ‘positive contributions’ is still unclear at the present stage, though this primer will point out some possible directions.

Perhaps the most important reason for introducing philosophical reflection into quantum gravity research, however, is, as suggested above, the fact that it does not (thus far) have the character of an empirical problem: current physics, as exemplified by the standard model of particle physics plus general relativity, does not contradict any piece of experimental evidence.[3] What matters most, in quantum gravity, is internal consistency, in the sense of some particular approach to the problem being logically coherent, and external compatibility, in the sense of the particular approach being compatible with our most well-established background knowledge (i.e. with what we know from both theory and experiment)—cf. (’t Hooft, 2001). These constraints do not appear to be sufficiently stringent to uniquely determine the desired theory of quantum gravity; instead there are multiple research avenues that each seem to satisfy the constraints (or at least approximate them). The extent to which the various approaches are really quantum theories of gravity is somewhat controversial: string theory is only known at a perturbative level (as is quantum electrodynamics), which is not sufficient to provide the full theory of quantum gravity we are seeking; loop quantum gravity faces (amongst other problems) a ‘reconstruction problem’: it has not been demonstrated that classical general relativity is its classical limit.

This problem aside, satisfying these demands can often direct the specific research programmes into conceptual problems that philosophers are well acquainted with—problems to do with the nature of space, time, matter, causality, change, identity, substance, and so on (i.e. the warhorses of traditional philosophy). We discuss this aspect in the next section, and also consider what the possible motivations of quantum gravity research might be if not empirical ones.

Let us now turn to the general characterization of quantum gravity—this primarily involves explaining the nature of the problem that such a theory is intended to resolve. We will then give a brief history of quantum gravity. After this, we shall present the central ideas required from the ingredient theories of quantum gravity, namely quantum theory and general relativity. Then we can turn our attention to the various approaches aimed at reconciling these two theories. We then consider several issues of ‘special interest’: the nature and rôle of background independence, the experimental status of quantum gravity, the relevance of quantum gravity research to the interpretation of quantum theory, and ‘cross-fertilization’ amongst the various approaches. We finish off by speculating about the future of the field and the future for philosophical research within this area. A brief annotated resource list is appended to aid readers new to quantum gravity.

[2] We should not get too carried away though, and ought to heed the warning of Dirac—self-professed enemy of philosophy though he was—that “[u]nless such [philosophical] ideas have a mathematical basis they will be ineffective” ((Dirac, 1978), p. 1). What the above quotations are intended to convey is the fact that ‘mathematical’ and ‘philosophical’ are woven together more tightly than usual in a great deal of quantum gravity research, not that we can make scientific progress by idle philosophical speculation.

[3] Very roughly, the standard model of particle physics is our best description of the strong nuclear, weak nuclear, and electromagnetic forces (or interactions)—of course, the standard model unifies the weak and electromagnetic into a single ‘electroweak’ force. It tells us that matter is composed of particles called ‘fermions’ bound together by the exchange of (strong, weak, and electromagnetic) force-carrying particles called ‘bosons’. Gravity is not included in the interactions treated in the standard model. Some quantum gravity researchers—especially those from the particle physics community—view the search for a quantum theory of gravity as tantamount to the search for a unified description of all interactions. That is, they suppose that the problem of quantum gravity can only be satisfactorily resolved by unifying the gravitational force with the other forces of nature and matter. This would supply the fabled ‘theory of everything’ or ‘TOE’.

5.2 What is the Problem of Quantum Gravity?

This is perhaps the most speculative of the chapters in this textbook, for the simple reason that, as mentioned above, an established theory of quantum gravity does not yet exist. However, the problem that a theory of quantum gravity is supposed to resolve does exist, and it is this problem and its possible solutions (of which there are many) that we wish to focus upon.[4] We can isolate three strictly separate research programmes in the literature that go by the name ‘quantum gravity’.

5.2.1 Quantum Gravity as a Theory of Everything

On the one hand, as briefly described in footnote 3, there is a problem concerning two theories—general relativity and the standard model of particle physics (formulated in the framework of quantum field theory)—that aim to describe the way particles interact in fields, and these must be unified in some way. The latter theory deals with non-gravitational interactions (the strong, weak, and electromagnetic forces), which involve physics at the order of magnitude of atoms and smaller, and the former deals with gravitational interactions, at orders of magnitude much greater than that of atoms.[5] General relativity is a fully classical field theory, meaning it considers only the interaction of classical matter and fields with the (classical) gravitational field: so, no particles or fields with indefinite properties and trajectories—i.e. no uncertainty principle operating.[6] This should impose, as a minimal constraint on quantum gravity, the (correspondence) principle that when quantum effects can be neglected (i.e. when ℏ → 0) we recover classical general relativity (with no fluctuations as a result of the uncertainty relations), and when general relativistic effects can be neglected (i.e. when G → 0) we recover standard quantum field theory (with no deviations from flat, or at least fixed, spacetime geometry)—just as, when c → ∞ (i.e. when velocities are much lower than c), we can neglect special relativistic effects.[7]

According to this view of the problem of quantum gravity, the issue is one of unification of a rather grand sort. In order to achieve a unification of quantum theory with general relativity one must devise a single framework for treating all interactions, and so describe gravitational and non-gravitational interactions in the same scheme—hence, this goes beyond the ‘mere’ quantization of gravity. This is the approach of string theory. Problems with straightforward quantizations of gravity lead most string theorists to believe that quantum gravity and grand unification in a theory of everything are one and the same problem. Needless to say, this view is not shared by many outside of string theory.

[4] Contrary to what some other philosophers might say (e.g. (Weinstein, 2005), p. 2), there is a sense in which these proposed solutions (i.e. particular approaches to resolving the problem of quantum gravity) are amenable to the sort of ontological and epistemological analysis that philosophers of physics typically engage in (i.e. for well-established theories such as special relativity and quantum mechanics). One can ask what the ontological consequences of a particular approach are, what it says about space, time, matter, causality, and so on. This will tell us what kind of a universe the approach describes. And one can ask epistemological questions, such as whether it is possible to distinguish between certain approaches, or between certain interpretations of a particular approach, and whether and what we could come to know about the universe as given by some approach. Quantum gravity is a perfectly suitable subject for interpretive questions of this kind—cf. (Callender and Huggett, 2001), p. 3, for a similar viewpoint.
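One way to make the remoteness of this regime concrete is to estimate the scale at which ℏ, G and c all matter at once. The Planck units below are the unique combinations of the three constants with the dimensions of length, time, mass and energy; this is a standard dimensional-analysis exercise, not anything from the text, using approximate CODATA values:

```python
import math

# Fundamental constants (SI units, approximate CODATA values)
hbar = 1.054571817e-34  # reduced Planck constant, J s
G = 6.67430e-11         # Newton's gravitational constant, m^3 kg^-1 s^-2
c = 2.99792458e8        # speed of light, m s^-1

# Planck units: the only combinations of hbar, G, c with these dimensions
l_planck = math.sqrt(hbar * G / c**3)   # length, ~1.6e-35 m
t_planck = math.sqrt(hbar * G / c**5)   # time,   ~5.4e-44 s
m_planck = math.sqrt(hbar * c / G)      # mass,   ~2.2e-8 kg
E_planck = m_planck * c**2              # energy, ~2.0e9 J

GeV = 1.602176634e-10  # one gigaelectronvolt in joules

print(f"Planck length: {l_planck:.3e} m")
print(f"Planck time:   {t_planck:.3e} s")
print(f"Planck mass:   {m_planck:.3e} kg")
print(f"Planck energy: {E_planck / GeV:.3e} GeV")
```

The Planck length (~10⁻³⁵ m) and Planck energy (~10¹⁹ GeV) sit some fifteen orders of magnitude beyond the scales probed by current accelerators, which is one concrete way of cashing out the chapter's earlier remark about the extreme detachment of quantum gravity research from experimental physics.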

5.2.2 Quantum Gravity as Quantum Cosmology

Quantum gravity is sometimes erroneously used to refer to quantum cosmology, the programme aimed at understanding ‘the universe as a whole’ quantum mechanically (and, therefore, in the absence of ‘external’ observers)—a prime example of this conflation is (Robson, 1996) (see especially §1.1); see (DeWitt, 1967c; Wheeler, 1968; Misner, 1969) for the first real steps in the field of quantum cosmology. Hence, one speaks in this context of the ‘wavefunction of the universe’, governed by the famous ‘Wheeler-DeWitt’ equation ĤΨ = 0 (in the canonical formalism). Here the quantum state Ψ is a functional of the metric and matter fields. This is a constraint equation, the vanishing of which informs us that the states have to be diffeomorphism invariant. That this equation is the central dynamical equation of the theory and is time independent forms one of the thorniest conceptual problems of quantum gravity: the problem of time—we return to this in §5.6.4.6. Certainly, quantum cosmology is philosophically very interesting, and intimately connected to the problem of quantum gravity, but it is strictly distinct. Quantum cosmology is presumably something a quantum theory of gravity would have to tell us about,8 but one can also discuss quantum cosmology independently of specific quantum theories of gravity, and vice versa. The obvious difference is that the fact that our universe contains gravity is a contingent matter: there are possible universes in which gravity does not exist. Structurally, of course, such a universe would be of a very different kind, with different interactions; but given that it is a single object, one can talk about quantum cosmology in that universe too, despite the fact that gravity does not exist within it at all. Hence, the two topics are logically quite distinct despite being, in fact, physically related. Even given this physical relatedness, in our own universe, differences emerge in the kinds of problems quantum cosmology and quantum gravity each deal with. Most obviously, quantum gravity will ultimately be concerned with systems that are minuscule, whereas quantum cosmology will be concerned with systems that are enormous—although, as I mentioned in the previous footnote, the two have things to say to each other.

5 Hence, one often sees the claim that quantum theory is our ‘theory of the very small’, and general relativity is our ‘theory of the very large’—see (Penrose, 2000) for an example. This demarcation of the domains of fundamental physical theory has allowed the ‘schizophrenic’ worldview to persist for so long: general relativity and particle physics can safely ignore each other at currently accessible scales.
6 It is also non-linear, which causes many a formal headache, especially when attempting quantization by standard methods. The structure of quantum theory is, of course, linear. One of the drastic approximation methods used to aid quantization of the gravitational field is precisely to ‘linearize’ the theory. However, this is too approximate to deliver much insight about full quantum gravity: the interesting features of gravity are, to a large extent, determined by the non-linearity. Note that Roger Penrose, and some others, view the non-linearity as being implicated in the (non-linear) collapse of the wavefunctions of quantum systems—see §5.3.7 below for more on this. Note also that non-linearity is by no means a special feature of general relativity: any interacting theory is non-linear, and such theories face the same kinds of technical problems.
7 One would, of course, like this constraint to uniquely specify a theory. Unfortunately, however, this minimal constraint appears to be satisfiable by multiple approaches. Indeed, there are ‘folk theorems’ indicating that it will be met by any Lorentz-invariant quantum theory of a spin-2 particle (coupled to a suitable energy-momentum tensor)—see (Wald, 1986) for a thorough examination.
In particular, since gravity is the dominating force at large scales, a theory of quantum cosmology will involve a quantum theory of gravity. Moreover, since we have evidence that there are epochs in the universe’s history—including likely future epochs—in which the universe is incredibly dense (and small), it seems evident that quantum mechanical effects will be in operation there, and so quantum gravity is required.

5.2.3 Quantum Gravity as Synthesis of ‘Quantum’ and ‘Gravitation’

More generally, when we speak of ‘the problem of quantum gravity’ we will not mean either of these two things. Instead, we will mean the integration of general relativity with quantum theory simpliciter, rather than with the standard model (our most empirically successful quantum theory). The latter is essential only for grand unification models or theories of everything (such as string theory and many particle physics-inspired approaches), and is not strictly part of quantum gravity per se. Most approaches to the quantum gravity problem do not consider the standard model (string theory is an exception), or at best aim to demonstrate compatibility with it. Quantum gravity in the sense of this chapter is a more minimal notion; one can consider a ‘pure’ quantum theory of gravity, independent of any other fields whatsoever. Thus, Ashtekar and Geroch, in their superb early review of quantum gravity, characterize it as “some physical theory which encompasses the principles of both quantum mechanics and general relativity” ((Ashtekar and Geroch, 1974), p. 1213).9 This way of understanding the problem is at once more specific and more general than the other ways. It is more general in that the two approaches mentioned above, inasmuch as they qualify as possible solutions of the problem so conceived, are included in this characterization. On the other hand, it is more specific in that it focuses attention on one particular kind of interaction, gravitation, in much the same way that quantum electrodynamics [QED] focuses on electromagnetism, ignoring the rest—although, strictly speaking, the validity of QED is hedged on the imposition of a cutoff, which depends on the presence of other fields. That is, this viewpoint allows for the possibility of quantum gravity theories that are not at the same time theories of everything.

8 Indeed, there are thriving research sub-programmes being pursued in the different approaches to quantum gravity: string cosmology (Gasperini, 2007), loop quantum cosmology (Bojowald, 2005), causal-set cosmology (Sorkin, 2000), and more. These applications of the approaches to quantum gravity are yielding predictions that have the potential to be tested with current, or near-future, technologies. Hence, it may ultimately prove to be the case that quantum cosmology is the testing-ground of quantum gravity: i.e. the very large is used to test claims about the very small. We shall have more to say about this in §5.7.3.

5.3 Why Bother? Arguments for Quantum Gravity

Both of these theoretical frameworks, general relativity and quantum (field) theory, are experimentally very well confirmed: no experimental test has conflicted with either—and many of these tests have been highly novel and incredibly accurate. If there is no empirical anomaly, then why bother with quantum gravity: why do we need such a theory?
There are two main lines of response to this question: one is formal, the other conceptual—there are also some experimental considerations, but these are far from decisive.10 There are other reasons that fall somewhere between formal and conceptual. We outline some of these below.11

Firstly, let us spell out the peculiar nature of the quantum gravity problem some more. It was commonly believed that the quantum nature of gravity would never make itself felt in any physically realizable experiment. This is still a common notion, and it is not entirely clear whether the voices to the contrary are right or not: no definitive experiment revealing these quantum properties has yet been performed. As Unruh ((1984), p. 238) explains, there is an enormous difference between the coupling of an electron’s charge to the electromagnetic field and the coupling of its mass to the gravitational field: a coupling of 10⁻² as compared with 10⁻²²!12 Unruh’s response to the question of whether we can ignore quantum gravity is nonetheless a resounding No: the coupling to the gravitational field of small masses is indeed minuscule, but large aggregates of matter, consisting of enormous numbers of particles, can have huge coupling constants (i.e. huge masses). That is, mass is an aggregative property of objects: an object composed of two objects with masses m₁ and m₂ will simply have the mass M = m₁ + m₂; if you want to increase the coupling constant, just add more masses. On this basis Unruh advances an experiment based on neutron stars, acting as a source of gravitational waves, rather than electrons. The idea is to set up an analogue of the slit experiments so as to exhibit macroscopic quantum interference effects, using another (cold) neutron star as the ‘detector’ and a black hole as the ‘slit’. One then ‘counts quanta’ (gravitons) hitting the detector star and watches for discontinuous variations in the number. From this (nomologically possible) setup and these transitions one can read off the required interference effects. Hence, we have an ‘in principle’ experiment that would exhibit, at a macroscopic level, the quantum nature of the gravitational field. As Unruh points out, however, this does not tell us much about how to actually formulate a theory to explain these results and advance predictions about new phenomena. Attempts to apply standard quantization methods have thus far failed to yield a consistent, complete, and/or empirically satisfactory theory. The question remains, then, given the lack of an experimental basis: why bother pursuing quantum gravity?

9 However, exactly what is meant by ‘integration’ or ‘encompasses’ here is not completely clear, and the various approaches differ in how they understand these notions (and their close cousins: ‘synthesize’, ‘unify’, and so on). We return to this point below, in §5.3.5.
10 For example, the Higgs boson—supposedly responsible for the spontaneous violation of gauge symmetry, giving masses to gauge bosons and fermions—has yet to be experimentally detected. The Large Hadron Collider [LHC] should be able to generate sufficient beam energy to produce the Higgs. Similarly, Hawking radiation has yet to be discovered.
11 One of these—the ‘formal necessity’ response—will occupy us when we reach the ‘semi-classical gravity’ section in the context of the various ‘Methods’—see §5.6.1.

5.3.1 Dimensional Analysis

There are three fundamental constants that are expected to play a rôle in quantum gravity: c, G, and ℏ.13 The values of these constants inform us about the scales at which relativistic, gravitational, and quantum effects become important:

• ℏ (Planck’s constant, with dimension L²MT⁻¹) sets the scale at which quantum effects become important.
• G (Newton’s universal constant of gravitation, with dimension L³M⁻¹T⁻²) sets the scale at which gravitational effects become important.
• c (the speed of light, with dimension LT⁻¹) sets the scale at which relativistic effects become important.

12 Feynman ((1963), p. 697) gives an example that perhaps highlights this difference more dramatically, by showing that the gravitational coupling between a proton and an electron in a hydrogen atom would shift the wave-function by just 43 arcseconds over a time period of 100 times the age of the Universe!
13 We should expect to see Boltzmann’s constant at some level too; however, we shall ignore thermal considerations here. We mention some ‘semi-classical’ phenomena in §5.6.1.2 (and, more briefly, in the next subsection), where we discuss the results from the physics of quantum black holes, in which black holes radiate (approximately) thermal radiation.
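The arithmetic by which these constants combine into the Planck-scale quantities discussed in this section can be checked directly. The following is a minimal sketch (approximate CODATA-style SI values are assumed; order-unity factors, such as the 2 in the exact Schwarzschild radius, are dropped, as in the text):

```python
import math

# Fundamental constants (SI units; approximate CODATA values -- an assumption)
hbar = 1.054571817e-34   # J s   (reduced Planck constant)
G    = 6.67430e-11       # m^3 kg^-1 s^-2
c    = 2.99792458e8      # m s^-1

l_P = math.sqrt(hbar * G / c**3)      # Planck length (m)
m_P = math.sqrt(hbar * c / G)         # Planck mass (kg)
E_P = m_P * c**2 / 1.602176634e-10    # Planck energy in GeV (1 GeV = 1.602...e-10 J)

print(f"Planck length: {l_P * 100:.3e} cm")   # ~1.62e-33 cm
print(f"Planck mass:   {m_P * 1000:.3e} g")   # ~2.18e-5 g
print(f"Planck energy: {E_P:.3e} GeV")        # ~1.22e19 GeV

# At m = m_P the Compton wavelength l_C = hbar/(m c) and the
# Schwarzschild-type length l_S = G m / c^2 coincide (both equal l_P):
l_C = hbar / (m_P * c)
l_S = G * m_P / c**2
print(abs(l_C - l_S) / l_C < 1e-12)   # True: the two lengths agree
```

The printed values reproduce the figures quoted in this section (1.62 × 10⁻³³ cm, 2.17665 × 10⁻⁵ g, 1.22 × 10¹⁹ GeV), and the final check verifies numerically the analytic fact that the Compton and Schwarzschild lengths meet at the Planck mass.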


As Planck noticed (Planck, 1899), these constants can be combined in a unique way so as to determine a fundamental length: lP = √(ℏG/c³) ≈ 1.62 × 10⁻³³ cm (known as the Planck length). The Planck energy, running inversely to the Planck length, is 1.22 × 10¹⁹ GeV. At lengths much bigger than this we can safely adopt a schizophrenic attitude with respect to gravity and the quantum (i.e. we can ignore possible interactions between them). However, close to this length there will be a non-negligible interplay between them: this is the scale at which quantum gravity is expected to play a rôle.14 Since it is a length composed of the fundamental constants of our well-established theories, we should expect there to be physics operating at this scale in such a way as to combine the physics associated with the various constants. Many believe that this combination will result in fluctuations of the metric of spacetime at such scales.

In more detail, general relativity cannot be ignored when the mass of an object is of the order of its Schwarzschild radius. If one has a particle of mass m with a Compton wavelength near to its Schwarzschild radius, then one would require both general relativity and quantum field theory (and so, presumably, quantum gravity). The Compton wavelength lC of a particle tells us that if we wish to localize a particle of mass m to within length lC then we must input sufficient energy to create another particle of mass m. This length is associated with quantum field theory; it is formed from a combination of ℏ and c: lC = ℏ/mc. The Schwarzschild radius lS is similar: it tells us that if we condense an object of mass m to a size s < lS then a black hole will form as a result. This length is associated with general relativity, being formed from a combination of G and c: lS = Gm/c². Now, when m = MP = √(ℏc/G) (the ‘Planck mass’) the two lengths become identical. (Note that short distance measurements δx demand very high energy waves compressed into a very small volume (length and energy being inversely related via ℏ)—this process forces the creation of microscopic black holes, as mentioned. However, these will Hawking evaporate in a decay time δt ∼ lP²/δx. This simple fact crops up in a variety of contexts in quantum gravity research.)

Now, the more pragmatic reader may still be wondering why we should bother with quantum gravity, since the length is so minuscule as to be utterly out of reach, empirically speaking. However, this would ignore crucial physical implications of our well-established theories: namely, that they predict singularities of various kinds. It ignores certain phenomena that should exist in universes with both quantum fields and gravity (such as black hole evaporation). It also ignores fundamental conflicts between these background theories, such as the cosmological constant problem, as well as recent advances made on the experimental side of quantum gravity—on which, see §5.7.3. We deal with these implications, and others, in the remainder of this section.

14 Hence, as Schweber points out, “it is constants of nature ... that demarcate domains” ((Schweber, 1992), pp. 138–9). By this is meant that the scales at which the physics circumscribed by some theory becomes important are set by the fundamental constants that the theory depends on—Heisenberg unpacked this idea in a systematic way in (Heisenberg, 1934). However, Diego Meschini has recently argued that we should express the relevance of Planck units to quantum gravity research as “a humble belief, and not as an established fact” ((Meschini, 2007), p. 15). The main argument is that the final theory of quantum gravity might either contain additional constants, or else be missing c, G, or ℏ: only empirical research can tell us which alternative is the case. This is perfectly true, and Meschini is right to draw attention to the laxness in the dimensional argument. However, while the dimensional argument cannot demonstrate the necessity of the Planck scale for quantum gravity, it does appear to point to its sufficiency.

5.3.2 Black Holes and Spacetime Singularities

A great deal of the motivational underpinning of quantum gravity, especially in more recent times, has had to do with various aspects of black hole physics and cosmology. Of paramount importance in this regard was Stephen Hawking’s discovery that black holes radiate (Hawking, 1975).15 Hawking’s Black Hole Information Paradox is one important way in which the conflict between quantum mechanics and gravity—and the need for a quantum theory of gravity—becomes readily apparent.16 Suppose we have some matter in a pure quantum state |ψ⟩, with density matrix ρ = |ψ⟩⟨ψ|. The entropy of a system in such a state is given by S = −Tr ρ ln ρ = 0.17 Hawking radiation causes the black hole to radiate, and thus to decay.18 The emitted radiation is (approximately) thermal, so that the final state is represented by a (mixed) density matrix ρ′ = Σₙ pₙ|ψₙ⟩⟨ψₙ|, with S = −Tr ρ′ ln ρ′ ≠ 0 (the entropy is of the order MP: the Planck mass, √(ℏc/G) = 2.17665 × 10⁻⁵ g). Hence, the process of black hole evaporation results in an apparent loss of information: as entropy goes up, information goes down, according to ΔInformation = −ΔEntropy. What has happened? To get this result Hawking used a semi-classical analysis involving a quantum field on a classical, fixed black hole background geometry. We seem to have a conflict: black hole evaporation leads to the evolution of a pure state into a mixed state, in violation of basic quantum mechanical principles. Bryce DeWitt endorses the view that the quantum aspects of black holes motivate quantum gravity research in the very strongest of terms, writing that “Hawking’s discovery ... shows in a most striking way that general relativity must be wedded to the quantum theory if consistency with statistical mechanics is to be assured” ((DeWitt, 1980), p. 697). There are several proposals for accounting for the missing information:

• Information Loss: the black hole, together with the information stored in it, disappears entirely, in blatant violation of unitarity (since pure states can evolve into mixed states). This is a bullet-biting strategy and, as such, is the least popular.19
• Information Recovery: the information is ‘contained’ in the outgoing Hawking radiation and is recoverable in principle (if not in practice), thus preserving unitarity. This is the standard response to the problem.
• Remnants: the information release is a quantum gravitational effect that kicks in once the black hole is of the order MP. Computations of the time needed to release all of the stored information, given the small amount of available energy, imply that a persisting, slowly decaying ‘remnant’ must remain.20

It is widely believed that in order to resolve the problem satisfactorily one needs a quantum theory of gravity to tell us about the nature of ‘the final state’ of evaporation. Indeed, the thermal spectrum of a black hole is now considered to be a ‘target’ that any respectable theory of quantum gravity must hit: both of the main contenders (string theory and loop gravity) do so, in very different ways and at different levels of generality. In more detail, what one requires is a microscopic description of black holes, or a way of counting the quantum states of black holes.

15 Erik Curiel bemoans the ubiquitous practice of referring to these black hole results as “physical fact”. On the equally widespread practice of using the results to defend approaches to quantum gravity he writes that “[t]he derivation of the Bekenstein-Hawking entropy formula by the counting of microstates, intriguing and impressive as it may be in many ways, cannot serve as a demonstration of the scientific merit of a theory of quantum gravity, for the Bekenstein-Hawking entropy formula itself has no empirical standing” ((Curiel, 2001), p. S435). However, this does not mean that it has no standing at all, nor that it lacks sufficient standing to raise the credence one is willing to give to some approach to quantum gravity. The fact remains that if we believe (to some degree) in general relativity and quantum field theory, given the direct empirical support of those theories, then we can reasonably believe in those results that are a necessary consequence of their merger. However, we can perhaps agree with Curiel that the practice of treating the black hole results as data is mistaken.
16 The quantum and statistical mechanical background required by this example can be found in the chapters by Wallace, Frigg, and Timpson.
17 Note that it was Jacob Bekenstein (1973) who made the connection between black holes (or, rather, their horizon areas) and entropy, on the basis of Hawking’s theorem (1971) that a black hole’s surface area can never decrease.
18 Not surprisingly, given the equivalence principle, an analogous effect occurs for accelerated observers in the Minkowski vacuum: the observer will detect thermal radiation known as ‘Rindler quanta’ (this effect is known as the ‘Unruh effect’ or, sometimes, the ‘Davies-Unruh effect’).
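The pure-versus-mixed contrast at the heart of the paradox can be made concrete numerically. Here is a minimal sketch for an illustrative two-level system (not, of course, Hawking’s actual calculation): the von Neumann entropy S = −Tr ρ ln ρ vanishes for a pure state and is strictly positive for a mixed one.

```python
import math

def von_neumann_entropy(rho):
    """S = -Tr(rho ln rho) for a real 2x2 density matrix, via its eigenvalues."""
    tr = rho[0][0] + rho[1][1]
    det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    eigs = [(tr + disc) / 2, (tr - disc) / 2]
    # 0 ln 0 -> 0 by convention; the + 0.0 normalizes a possible -0.0
    return -sum(p * math.log(p) for p in eigs if p > 1e-12) + 0.0

# Pure state |psi> = (|0> + |1>)/sqrt(2), so rho = |psi><psi|
rho_pure = [[0.5, 0.5], [0.5, 0.5]]

# Maximally mixed state rho' = sum_n p_n |n><n| with p_0 = p_1 = 1/2
rho_mixed = [[0.5, 0.0], [0.0, 0.5]]

print(von_neumann_entropy(rho_pure))    # -> 0.0
print(von_neumann_entropy(rho_mixed))   # -> 0.6931... (= ln 2)
```

A pure state has a single unit eigenvalue, so its entropy is exactly zero; the equal-weight mixture has entropy ln 2. Unitary evolution preserves the eigenvalue spectrum of ρ, which is why evolution from the first state to the second would violate basic quantum mechanical principles.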
With the quantum geometry of string theory and loop quantum gravity this is possible, albeit given certain provisos: the former uses D-brane technology (where D-branes provide the microstates of black holes) and the latter uses the intersections of spin networks (the quantum states of the gravitational field) with the surface of the black hole (i.e. the horizon). The computation of the entropy of a black hole is then just a combinatorial procedure.

An interesting aspect of the black hole information problem in relation to quantum gravity is the fact that, in many approaches to quantum gravity, time is no longer fundamental. Given that the information problem has to do with whether black hole evaporation does or does not violate unitarity, one has to say something about time (unitarity being the conservation of probabilities over time). If time is simply absent at the quantum gravitational level then it seems that the problem evaporates too, thus offering more (admittedly somewhat ad hoc) support for the idea that we need a quantum theory of gravity—cf. (Kiefer, 2006) for more on this issue.

19 Note that this was in fact Hawking’s own response to the information problem. In controversial circumstances, at the GR17 conference in Dublin in 2004, he rejected his earlier view in favour of the more orthodox ‘correlation’ response, mentioned above—see http://math.ucr.edu/home/baez/week207.html for a transcript of his talk, with commentary by John Baez.
20 A philosophical ‘mini-debate’ on the cogency of the remnant resolution can be found in (Belot et al., 1999; Bokulich, 2001; 2005). I refer the reader to these papers for more details.

5.3.3 Cosmological Considerations

Given the observed expansion of the universe, if we were to trace it back we would find that it leads to an initial (‘Big Bang’) singularity.21 Naturally, given the magnitude of the curvature and tidal forces at such a singularity, the equations of classical general relativity will not be applicable. At this scale, as we have seen, quantum effects and gravity will be operating together, and so we will need a quantum theory of gravity to describe what is going on.22 As mentioned above, the independence of quantum gravity from quantum cosmology can be seen clearly by comparing the class of questions that each deals with. Quantum cosmology is focused on the problem of constructing a quantum theory of a single object with no environment: no external, classical observers. However, in quantum gravity a perfectly reasonable problem is to consider measurements of the gravitational field in a region of space made by just such an external, classical observer—see (Rovelli, 1991), §7 for more on this. Hence, cosmology and quantum gravity are strictly independent domains of inquiry. However, given the contingent facts as revealed by the cosmic background radiation, it is clear that there is at least one epoch in our own universe’s history in which quantum gravitational effects (as delineated by the Planck units) become important. Hence, cosmological considerations point to the demand for a quantum theory of gravity—note, also, that much of the work carried out in quantum cosmology (especially in the early days) was conducted in the framework of quantum geometrodynamics.

21 Our best evidence for the existence of a Big Bang in the universe’s past is, of course, encoded in the quasi-uniformity of the cosmic background (blackbody, T0 = 2.728 K) radiation predicted by Gamow and found by Penzias and Wilson—we can safely set aside the various problems with this view (e.g. the horizon problem and the ‘clumpiness’ problem) since they lie outside our domain of interest in this chapter. For more details, see the excellent book (Maoz, 2007) for an elementary guide or the equally excellent book (Dodelson, 2003) for a more in-depth guide—(Mukhanov, 2005; Liddle and Lyth, 2000) offer more advanced presentations, both well worth a read.
22 Recall from §5.2.2 that this situation might be viewed as signalling the need for a quantum cosmological theory instead, so that a wave-function of the universe replaces a classical spacetime geometry. Again, this does not thereby imply that we will have a quantum theory of gravity on the table, since the two can proceed independently of one another.

5.3.3.1 The Cosmological Constant Problem

A more pressing problem is known as the ‘cosmological constant problem’. This is too big a problem to cover fully in this primer; however, it is an important part of quantum gravity research. The cosmological constant Λ corresponds to vacuum energy present throughout space. As is well known, it was originally introduced by Einstein as a way (the only way in four dimensions) to extend his field equations for gravity so as to achieve a (quasi-)static Universe.23 The problem relevant to quantum gravity research has to do with the fact that quantum field theory and general relativity have very different ways of computing the value of the cosmological constant (yielding very different values). The energy spectrum of a harmonic oscillator, EN = (N + 1/2)ℏω, has a nonzero ground state in quantum mechanics. This is the zero-point energy, standardly explained by reference to the uncertainty principle (i.e. there is no way to ‘freeze’ a particle). In the context of (free) quantum field theory the field is understood to be an infinite family of such harmonic oscillators, and therefore the energy density of the quantum vacuum is going to be infinite, on account of the nonzero contribution from each vibrational mode of the fields being considered. One can bring this value down in various ways: by imposing a cutoff at the Planck length, ignoring those modes that have wavelengths smaller than this, or by turning on the interactions between the vibrational modes. This still leaves an extremely large value. Very loosely speaking, in quantum field theory this infinite (or very large) value can be swept under the rug (the energy can be rescaled to zero): only energy differences between the vacuum and excited states make sense. That is to say, the absolute value of the energy density of the vacuum in quantum field theory is unobservable; the value one gives is largely a matter of convention. This is not true in cosmology. Here the cosmological constant, or the curvature of spacetime, is taken to measure the energy density of empty space. This provides a way to perform experiments to determine the correct value empirically. The energy density, as actually observed in curvature measurements, comes out as ρ ≈ 10⁻³⁰ g cm⁻³—very close to zero.
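The size of this mismatch can be made vivid with a rough calculation. The following sketch assumes (as is standard, but still an assumption) that the Planck-length cutoff puts the quantum field theory estimate at the order of the Planck density, and takes the observed value quoted above:

```python
import math

# Constants (SI, approximate CODATA-style values -- an assumption)
hbar = 1.054571817e-34   # J s
G    = 6.67430e-11       # m^3 kg^-1 s^-2
c    = 2.99792458e8      # m s^-1

# With a Planck-length cutoff the vacuum energy density is of the order
# of the Planck density rho_P = c^5 / (hbar G^2), in kg m^-3
rho_planck = c**5 / (hbar * G**2)
rho_planck_cgs = rho_planck * 1e-3       # kg/m^3 -> g/cm^3

# Observed energy density from curvature measurements (value quoted in text)
rho_observed = 1e-30                     # g/cm^3

discrepancy = math.log10(rho_planck_cgs / rho_observed)
print(f"Planck-cutoff estimate: {rho_planck_cgs:.1e} g/cm^3")
print(f"Discrepancy: ~{discrepancy:.0f} orders of magnitude")
```

This is the source of the famous claim that the naive quantum field theory estimate overshoots the observed value by roughly 120 orders of magnitude; the exact figure depends on the cutoff and on order-unity factors dropped here.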
The problem, then, is: why is the measured value of the cosmological constant so much smaller than that predicted by quantum field theory? Here is a theoretical conflict between our two great fundamental theories. It is partly empirical too, since we have a constraint imposed by the measured value of the curvature of the Universe. Resolving this problem is one of the challenges for a quantum theory of gravity.

5.3.4 Non-Renormalizability of General Relativity

Quantum field theory amounts to a synthesis of quantum mechanics and special relativity: fields are pretty much forced by this unification, since one must account for situations with variable particle number (i.e. the creation and annihilation of particles), as indicated in high-speed collision experiments. The framework it provides has found a home for quantum theories of the electromagnetic, strong, and weak interactions. In this approach—specifically, focusing on perturbation theory—the excitations of the various fields (photons, gluons, etc.) mediate the various interactions: for example, photons mediate the electromagnetic interactions between charged particles in the context of quantum electrodynamics [QED]. An old and venerable approach tried to accommodate gravity in the same formal framework: here the gravitational interaction would be mediated by the graviton (the massless, spin-2 quantum of the gravitational field). As in the other examples of quantum field theories, quantum gravity along these lines runs into formal difficulties: when one tries to compute probability amplitudes for processes involving graviton exchange (and, indeed, involving the exchange of other particles in quantum field theory) we get infinities out. Since we cannot readily make sense of measuring infinite quantities, something has clearly gone awry. In standard quantum field theories, QED for example, these infinities can be ‘absorbed’ in a procedure called renormalization—note that many of the founding fathers of quantum field theory were not fond of the renormalization methods (see (Brown, 1992) for historical perspectives on renormalization). The problems occur when the region of integration involves very short distances (or, equivalently, very high momenta for the virtual particles). In quantum gravity, as the momenta increase (or the distances decrease) the strength of the interactions grows without limit, so that the divergences get progressively worse.24 Renormalization then refers to singularities of another kind, essentially to do with spacetime issues. Quantum gravity, given the dimensional argument above, should provide us with a description of the behaviour of fields—the gravitational field—at very short distances (so-called ‘ultraviolet’ distances).

23 The history of the cosmological constant is an extremely interesting episode in physics. I refer the reader to Norbert Straumann’s (2008) excellent account. See also (Straumann, 2008) for a more advanced account, including the connection to the problem of Dark Energy. Sean Carroll has some excellent free online material on the cosmological constant problem at his website: http://preposterousuniverse.com.
There is no doubt that general relativity performs remarkably well at the scales at which it has been tested thus far. However, if we view it through the lens of quantum field theory then, in four dimensions at least, it is perturbatively non-renormalizable.25 In other words, when one attempts to predict the values of observables, following the method of Feynman diagrams, one finds that there are infinite amplitudes that cannot be absorbed through the machinery of renormalization theory.26

24 For a pedagogically outstanding introduction to renormalization (restricted to QED), see (Mills, 1993). The other contributions to the book featuring this article (namely (Brown, 1992)), dealing with historical and philosophical issues of renormalization, are also well worth reading.
25 Power-counting arguments are sufficient to show that, like the coupling of the abandoned Fermi theory of the weak interaction, the gravitational coupling constant has a negative mass dimension. This is a telltale symptom of a non-renormalizable theory.
26 The standard definition of renormalization is an ‘absorption of the infinities via a redefinition of a finite number of physical parameters’. What does that mean? The physical parameters are such things as mass m and charge e; they have perfectly finite values when observed in real experiments. The way out is to deny that the ‘bare’ mass and charge in the Lagrangian are the ‘true’ representational devices; instead one uses the more realistic case of the mass and charge of an electron swarming with virtual particles. In the case of gravity we find that the perturbative theory is not renormalizable in this manner: to renormalize the theory one would have to insert infinitely many ‘absorption parameters’, each of which would have to be determined through experiment, as above.


This is usually taken to mean the scales at which these divergences make their entrance mark the cut-off point at which general relativity breaks down, and a new theory with new physics must take over. Or, in other words, that general relativity is not a fundamental theory, only an effective theory. One might legitimately wonder whether there is such a thing as a Fundamental Theory, at the root of all other phenomena. We might be able to do no better than finding a tower of effective theories for the simple reason that there is no true basis theory. Certainly, recent work in condensed matter physics and complexity science makes such a view more appealing—see (Morrison, 2006) for example; (Laughlin, 2006) gives an interesting (popular) account of the non-reductionist alternative. It can’t be denied that the search for fundamental theories has proven itself to be heuristically very useful in the development of physics. It is interesting to speculate about what physics would be like without this belief in a ‘base’ that underlies and (reductively) explains all other phenomena. If the work in, say, condensed matter physics is anything to go by then we could expect a more ‘practical’ physics. This distancing of science from practical matters might be what underlies Nancy Cartwright’s distaste (as expressed in, e.g., (Cartwright, 1999)) with what she labels ‘fundamentalism’ (namely, a belief in the ‘unity of Nature’—a matter we turn to in the next subsection). One can certainly appreciate the sentiment, but robbed of the excitement that the belief in fundamentalism offers, the development of physics would surely be a much slower, more tedious affair! However, this is no argument for fundamentalism. The effective field theory approach seems to be a legitimate alternative.27 There are several directions this state of affairs (viz. 
non-renormalizability) might point: string theorists believe that because their theory is finite and reproduces general relativity at the appropriate scales, it is the best game in town (some erroneously say the only game in town). Alternatively, one might try to modify the action by adding terms that would serve to cancel out the divergences—much effort was put into such approaches in the seventies, leading to so-called ‘higher-derivative theories’ and ‘supergravity’. Another option, one that has led to a serious competitor to string theory, is to put the non-renormalizability of general relativity down to the questionable assumptions on which the perturbative methods rely, such as a fixed, classical, continuum spacetime background. Sidestepping this methodology and proceeding with a non-perturbative quantization of general relativity can be shown to lead to finite results, as in the case of loop quantum gravity. As Abhay Ashtekar, a leading light of this approach, writes: Why should a smooth continuum be then a good approximation to represent the space-time geometry? We should not presuppose what the microstructure should be. Rather, we should let the theory itself tell us what this structure is. This, in turn, means that when one comes to gravity, the basic assumptions of perturbation theory are flawed. ((Ashtekar, 1995), p. 193) 27 For philosophical investigations of this approach, see: (Hartmann, 2001) and (Castellani, 2002). This work can be directly applied to the problem of quantum gravity by arguing that the relativity principle is an emergent phenomenon. Detailed initial work has been carried out in (Laughlin, 2003). Jon Bain (2008) provides a philosophical analysis of this condensed matter physics-based modelling of spacetime.

Without these presumptions of perturbation theory, the divergence problems evaporate. The eradication of infinities by quantum gravity has been a common aspiration for a long time. The idea is that it might ‘cure’ the ultraviolet divergences of field theories by providing some kind of physically realistic cutoff: It is possible that this new situation so different from quantized theories invariant with respect to the Lorentz group only, may help to overcome the divergence difficulties which are so intimately connected with a c-number for the light-cone in the latter theories. (Pauli in (Klein, 1956), p. 69)28

There are two things to say here. Firstly, just because a theory is nonrenormalizable does not mean that we should discard it: it can still be predictive and, hence, useful (see (Kubo and Nunami, 2003) for a well-argued defense of this point). Secondly, perturbative nonrenormalizability does not mean that there is no consistent nonperturbative quantization available. As canonical quantum gravity researchers (such as Ashtekar above) argue, the machinery of perturbative quantization methods, with its presuppositions of smooth, fixed spacetime backgrounds, is bound to be inadequate when it comes to quantizing gravity, since gravity is inextricably entangled with spacetime geometry.

5.3.5 Dreams of Unification

The idea of unification has the status of a regulative principle, or an ideal which should be striven towards. Maxwell’s theory beautifully unified electricity and magnetism into a single structure (the electromagnetic field), and got a constant of nature into the bargain! Einstein—in an equally beautiful way—unified Maxwell’s theory with the principle of Galilean relativity, thereby resolving a serious empirical anomaly (namely, the null effect of the æther wind experiments): here, thanks to Minkowski, space and time were unified into a single four-dimensional structure, spacetime. The theory of gravitation was unified with special relativity in Einstein’s general theory of relativity: here gravity is unified with spacetime. Quantum theory was unified with special relativity in quantum field theory (the fields being necessitated by the Lorentz-invariant, spacetime description). Unification has seemingly gone into overdrive in the latter part of the last century, with unifications of the weak and electromagnetic interactions (in the Weinberg-Salam electroweak theory): Just as Einstein comprehended the nature of gravitational charge in terms of spacetime curvature, can we comprehend the nature of other charges, of the entire unified set, as a set, in terms of something equally profound? This briefly is the dream, much reinforced by the verification of gauge theory predictions. ((Salam, 1980), p. 723) 28 See also (Pauli, 1967) for more on this idea.


Carlip calls the “appeal to unification” motivation of quantum gravity “[t]he stock reply” ((Carlip, 2001), p. 888). One can certainly find this belief in many writings on quantum gravity. For example, Ashtekar writes: Everything in our past experience tells us that the two descriptions of Nature we currently use [quantum mechanics and general relativity—DR] must be approximations, special cases which arise as suitable limits of a single, universal theory. That theory must be based on a synthesis of the basic principles of general relativity and quantum mechanics. This would be the quantum theory of gravity that we are seeking. ((Ashtekar, 1995), p. 186)

Einstein too, of course, hoped for and sought a theory that would (geometrically) treat both gravitation and electromagnetism (ignoring the other less well-known interactions then in evidence) as “one unified conformation” (Einstein’s Leyden Lecture, quoted in (DeWitt, 1980), p. 681). The ‘standard model’ is often portrayed as the example of unification par excellence. It is a Yang-Mills theory with gauge group SU(3) × SU(2) × U(1). However, many are unhappy with the way the standard model is stitched together (specifically, the way SU(3) is tacked on): what would be preferable is a framework that encompasses all interactions within a single gauge group, a single interaction. This naturally involves gravity. The appeal to unification is, however, based on a questionable assumption: there is no a priori reason why nature should be unified, and many approaches to quantum gravity only seek to unify (if that is the right word for it) general relativity and quantum theory, rather than unifying all interactions. There is no reason why the world shouldn’t be a Frankenstein’s monster, with one force serving one purpose, and another distinct (irreducible) force serving some other purpose—a “dappled world”, to borrow Cartwright’s phraseology. The belief that there is just one underlying force or law responsible for everything else borders on religious belief, a belief that the world has to be a certain way. One might make an inductive case, based on past unifications, that unification has led to progress and is converging on a single interaction or law of nature, but the present situation, the experimental situation at least, paints a non-unified picture of the world—the formal situation too is problematic: QCD is not unified with the electroweak theory, as mentioned above. Simply bunching together some gauge groups does not give us unification.
Moreover, ‘unification’ is a rather vague notion, and there are many distinct ways in which one might be said to have unified something or some group of things.29 This vagueness filters through into quantum gravity, for although there is a sense in which quantum gravity is about unification—i.e. the sense in which quantum mechanics and general relativity have to be accounted for in some common framework—the different methods go about this in different ways. Loop quantum gravity is rather ‘minimalist’ in that it seeks only to produce a quantum theory of the gravitational field, such that the possibility of unifying this theory with the theories of the other interactions is secondary. String theory, by contrast, amounts to a ‘theory of everything’, namely a theory in which one kind of interaction determines every other fact and facet of reality.30 While this latter view—that Nature is unified—may have held sway, certainly amongst physicists, in the past, it is no longer believed to be enforced by the physics (if that ever were the case). It seems very possible, and I think plausible, that loop quantum gravity is on the right track, and if this is the case, then we could wind up with a disunified picture of the world in which we have quantum gauge field theories of all interactions, but only two of the four (the electromagnetic and the weak interactions, giving the electroweak interaction) are truly unified (in the sense of having been merged into a single interaction). That said, it is a fairly incontestable claim that unification has proven to be a major player in contemporary physics and its development. But in each of these cases there has been some empirical anomaly that drove the unification, and some observational data to guide the development of theory. In the case of quantum gravity the anomalies are purely formal and there is no data to guide the construction.31 Note, however, that quantum gravity should not be confused with so-called ‘unified field theories’. Unified field theories—as developed by Einstein, Eddington, Kaluza, Weyl, Schrödinger, and company—aimed to modify spacetime in various ways to accommodate non-gravitational fields (mainly, it has to be said, the electromagnetic field). However, elements of these unsuccessful attempts at unification still play a crucial rôle in many quantum gravity approaches. 29 See (Maudlin, 1996) for a careful philosophical analysis of unification in physics. See also (Rüger, 1989) for a philosophical discussion dealing explicitly with quantum gravity.
Most notable here is the introduction, by Kaluza, of extra dimensions of spacetime and, by Klein, their compactification into a tightly curled circle (whose winding-number properties are automatically responsible for quantizing the electric charge). Also central, of course, is Weyl’s notion of gauge invariance.

5.3.6 Incompatibilities

An intuitive argument for the need for a quantum theory of gravity is as follows. Our best theory of gravity is at the same time a theory of spacetime, and since our theories of the other interactions and matter (all of which are quantum gauge field theories) are local in spacetime, we can naturally expect some kind of linkage between the two, gravity and quantum.32 In particular, one might expect that the coupling between spacetime geometry (gravitation) and quantized matter sources will lead to a quantum geometry, quantum spacetime. We might call this ‘the infection argument’. We find this basic idea appearing in other arguments too; some examples of a ‘formal’ and ‘conceptual’ nature are given below—we consider the validity of the infection argument in §5.6.1. 30 Note too that even the ‘hybrid’ view of semi-classical gravity, where gravity is kept classical and interacts with quantized matter (or the expectation value of quantized matter), can be said to be unified in this very weak sense. 31 We might make a case for the cosmological constant problem constituting genuine data to guide the construction of the theory. In this case, the different approaches to the energy density of the vacuum taken by general relativity and quantum field theory lead to very different numbers. Quantum field theory gives either infinity (on the assumption of continuous spacetime) or an extremely large but finite number (if we count only the larger-than-Planck-length contributions). Experiment puts the energy density near to zero, so we have a serious clash here. Unification of quantum field theory with general relativity might be one way to get a sensible number from quantum field theory. For philosophical discussions of this problem see (Rugh and Zinkernagel, 2002) and (Saunders, 2002).

5.3.6.1 Conceptual Compatibility Issues. As we will see in §5.5, there exist basic incompatibilities between general relativity and quantum theory—at least insofar as the examples of quantum theories we currently have at our disposal go. The great advance of general relativity is its background independence, related to (but not equivalent to) its lack of a preferred reference frame or preferred family of frames. More accurately, general relativity does not involve a fixed spacetime geometry with values given a priori: the spacetime geometry is what one gets out of the theory by solving the field equations. By contrast, quantum theory appears to demand that there be a fixed spacetime geometry (i.e. it appears to be necessarily background dependent), related to (but not equivalent to) the existence of a preferred reference frame, or a family of such—see (Weinstein, 2001) for a clear discussion of this apparent dependence of quantum theory on absolute spacetime structure. The basic dynamical variable in general relativity is the metric. The metric serves a dual purpose in this theory: it both determines the geometry of spacetime (and so the kinematic structure against which physical processes are defined) and acts as a (pre-)potential for the gravitational field.
Since it is a dynamical variable that is also responsible for spacetime geometry, it follows that geometry itself is dynamical: one has to solve the dynamics in order to get at the kinematics. This feature is a central part of ‘background independence’, the notion that there are no absolute objects (or ‘unmoved movers’) in the theory (or, at least, that the metric is not such a fixed structure). This feature has tended to separate the various approaches into two camps: those who feel that background independence is a necessary component of quantum gravity, to be preserved at all costs, and those who don’t mind using background dependence to get an approach off the ground (generally, though not always, in the hope of one day demonstrating that their approach is, at bottom, background independent too). Those who adopt the former position tend to have academic histories in general relativity, while those who adopt the latter position tend to be part of the particle physics community. 32 However, there is a large amount of disagreement over how best to conceptualize and formulate this linkage in a theory of quantum gravity, leading to a proliferation of competing approaches.

General relativity is in conflict with quantum theory as standardly understood, then, because the former is a theory in which the geometry of spacetime is a dynamical variable, a physical degree of freedom. We are solving for the structure of spacetime when we solve general relativity’s field equations. However, it is very much a classical dynamics: spacetime geometry does not fluctuate in general relativity, and its evolution and properties are quite definite (up to diffeomorphism symmetry). Quantum theory apparently demands a classical spacetime geometry too, but it also prima facie demands a fixed geometry, one that does not vary from solution to solution depending on how the matter-energy content of the universe looks. Hence, we appear to have very divergent treatments of spacetime in these two frameworks: in the former case we have to solve equations of motion to get the geometry of spacetime out; in the latter case we do not. How one deals with this problem determines the path one follows in resolving the problem of quantum gravity. One can reject the background dependence of quantum theory, modifying it in some way, and try to quantize the geometry: the idea here is that the metric representing geometry and gravity is a dynamical variable, and since one would like a quantum dynamics of gravity one thereby also gets a quantum dynamics of geometry. Alternatively, one can overrule the dynamical nature of geometry, and follow the path of standard quantum theory in retaining a fixed geometrical structure. There are various ‘intermediate’ paths and paths leading ‘off the beaten track’ too, as we will see in §5.6. 5.3.6.2 The Problem of Time. A particularly troublesome incompatibility, certainly the one that has received most attention from philosophers, is ‘the problem of time’. This is a consequence of background independence and will be an issue for any theory of quantum gravity (or, indeed, any classical theory) that seeks to retain background independence. Time is a fixed ‘external’ parameter in standard quantum theory, a structure against which dynamics unfolds but that is not itself determined dynamically.
In quantum mechanics time appears in the fundamental dynamical equation (the Schrödinger equation) as Newtonian absolute time:

    H |ψ, t⟩ = iℏ (∂/∂t) |ψ, t⟩        (5.1)

This t is insensitive to the nature of ψ and is not converted into an operator in the quantum theory: it remains a classical parameter and is not the eigenvalue of some operator.33 This property of time in quantum theory is considered to be vital in setting up the fundamental formal sectors of the theory: inner product, basic variables, conservation of probabilities (in time!), and so on. Similar remarks can be made about the special relativistic quantum theory, in which there exist preferred foliations of successive events in the form of Lorentz frames. Not so in general relativity, where the spacetime geometry will be determined by the state of matter.34 33 An argument due to Pauli (1926) (known as “Pauli’s theorem”) demonstrates that time cannot be an operator because its conjugate, energy (the Hamiltonian operator), is bounded below, whereas time is not. What we would need in order to have a time operator is for the Hamiltonian operator to have a continuous spectrum spanning the whole of the real number line.

5.3.6.3 Formal Compatibility Issues. There are a number of apparent formal incompatibilities between general relativity and quantum mechanics. Here I simply present several of these without going into details; many will be developed further in subsequent sections.
• It seems we would lose unitarity in quantum gravity: in quantum mechanics probabilities have to add up to unity at a fixed time (or at a fixed time relative to a preferred slicing). This is meaningless in general relativity since there are no preferred foliations.
• Time evolution is determined by a Hamiltonian operator in quantum theories. In general relativity (for compact universes) the Hamiltonian is a sum of constraints. In the quantized version of the theory the Hamiltonian operator is identically zero on physical states (i.e. it annihilates them).
• Causality (or rather, microcausality) is axiomatic in quantum field theory: spacelike separated bosonic fields must commute, and spacelike separated fermionic fields must anti-commute. However, in general relativity causal structure is determined by the matter distribution. In quantum gravity it is reasonable to expect that the light cones will fluctuate—but then so will the causal structure!
• QFT is a local theory: the observables are spacetime local (with support in spacetime regions). General relativity is diffeomorphism invariant, so the observables of a general relativistic theory will have to be non-local.
• General relativity requires nonlinear equations, whereas quantum theory is a linear theory. This nonlinearity is at the root of the divergences faced when perturbative quantization methods are applied to general relativity. It is not unique to general relativity, however; any theory with interactions will be nonlinear.
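Pauli's argument against a time operator (footnote 33) can be sketched in a few lines; this reconstruction is mine, not the text's:

```latex
% Suppose there were a self-adjoint time operator T conjugate to H:
[T, H] \;=\; i\hbar
% Then, for any real \epsilon, conjugating H with U_\epsilon = e^{-i\epsilon T/\hbar}
% shifts the whole energy spectrum rigidly:
U_\epsilon^{\dagger}\, H\, U_\epsilon \;=\; H - \epsilon
% So if H|E\rangle = E|E\rangle, then U_\epsilon|E\rangle has energy E - \epsilon.
% Since \epsilon is an arbitrary real number, the spectrum of H would have to be
% the entire real line -- contradicting the fact that realistic Hamiltonians
% are bounded below. Hence no such T exists.
```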
There is a further intuitive formal compatibility issue concerning the ‘quantumness’ of quantum theory and the ‘classicalness’ of general relativity. There are arguments suggesting that a classical field coupled to a quantized source will violate the uncertainty principle, since one would be able to use the classical field to determine the simultaneous position and momentum of a particle with a precision greater than that allowed by the uncertainty relations. Furthermore, if we adopt a collapse interpretation of quantum theory, so that the classical field’s measurement sends the particle’s state from a superposition into a definite state, then the principle of conservation of momentum is violated. If we adopt a no-collapse interpretation, then it becomes possible to exploit the coupling to transmit superluminal signals. Empirical inadequacies result if we adopt the Everett-DeWitt Many Worlds interpretation: given a superposition of position states of some massive object, the coupling ought (formally) to lead, after a measurement, to a state in which the field is sensitive to the average of the uncollapsed superposition of position states of the matter; but what we would really observe is that the field is sensitive to the apparently collapsed state (i.e. one or other ‘branch’). We return to these issues in §5.6.1. 34 Rather oddly, Curiel maintains that “there is no manifest contradiction between the two theories themselves” ((Curiel, 2001), p. S431). I submit that this is as clear an example of a contradiction as one could ever wish for!

5.3.7 Gravity and the Measurement Problem

Roger Penrose believes that the measurement problem of quantum mechanics provides the deepest and clearest reason to seek a quantum theory of gravity. In his most recent book this problem trumps the compatibility issues and, indeed, all of the issues just mentioned (see (Penrose, 1999), p. 581). To see Penrose’s perspective, we simply note that linear, unitary evolution does not provide a good description of the macroworld: we do not, it seems, observe macroscopic superpositions—although one might legitimately question the coherence of this statement. What about ‘no-collapse’ interpretations, such as the Many-Worlds interpretation? Even in this case, as Penrose points out, ‘rules of experience’ are needed to make sense of the relationship between unitary reality and the (illusory) definiteness of our observations. If we adopt a collapse interpretation, so that unitarity is supplemented (‘in the large’), then we need to know what this supplementary factor is. Penrose’s answer is that the unitarity of quantum theory has to be modified precisely when gravitational effects become important (see (Penrose, 1996))—that is, quantum theory turns into a non-linear theory at the Planck length.35 Gravity is the supplementary factor. As Penrose puts it himself: “the phenomenon of quantum state reduction is a gravitational phenomenon” ((Penrose and Marcer, 1998), p. 1932). Specifically, when a body is in a quantum superposition of locations the state becomes very unstable, becoming more unstable as the mass increases. The system then collapses into one or other of the states in the superposition, with a collapse time inversely proportional to the gravitational self-energy of the difference between the superposed field configurations (i.e. the different mass distributions). Hence, there is an objective reduction that is at odds with unitary quantum evolution—Penrose has proposed an experiment to verify the predictions of his approach (see (Penrose and Marcer, 1998), §6).
We have yet to see the results of this experiment. If it confirms Penrose’s prediction then this would constitute another feature that any worthy approach to quantum gravity ought to be able to account for. Note that since decoherence effects will be present in the kind of experimental situation Penrose describes, these will have to be separated out from the collapse process.
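Penrose's proposal, as described above, can be compressed into a single relation (a reconstruction on my part; E_Δ denotes the gravitational self-energy of the difference between the superposed mass distributions):

```latex
% Expected lifetime of a superposition under gravitationally induced collapse:
\tau \;\approx\; \frac{\hbar}{E_{\Delta}}
% Larger mass => larger self-energy difference E_{\Delta} => shorter-lived
% superposition, matching the claim that instability grows with the mass.
```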

35 Penrose is not alone in thinking that quantum theory is modified by gravitational effects. For similar proposals, see: (Károlyházy, 1966; Károlyházy et al., 1986; Diósi, 1989; Ghirardi et al., 1990). A good review of the general idea is (Anandan, 1999).

5.3.8 For Knowledge’s Sake

Ashtekar and Geroch write that “a quantum theory of gravitation would represent an extension of our conceptual framework for the description of nature” and that “any such extension would be of interest in itself” ((Ashtekar and Geroch, 1974), p. 1213). Natural curiosity certainly plays a large rôle in the quest for quantum gravity. The problem of quantum gravity has something of the character of a jigsaw puzzle (of extreme complexity). Such a puzzle taxes the imagination, keying in to those aspects of physics that are absolutely fundamental to our worldview. We strongly suspect that quantum gravity will profoundly alter the way we think about the world, about space, time, matter, and causality. This is taken as a given amongst virtually all approaches to the problem. It is a worthwhile quest precisely because of the dramatic advance in understanding that it will precipitate.

Conclusion

There are, then, several reasons (not entirely watertight) that together conspire to suggest that a quantum theory of gravity is a theoretical compulsion, if not an experimentally provoked necessity. What is curious about them is that they cropped up at different periods of research. As we see in the next section, what drove the search initially was an intuition that quantum matter would ‘no doubt’ have to modify the gravitational field in some way, just as the gravitational field would have to modify the behaviour of quantum matter in some way. Heisenberg’s dimensional considerations quickly tell one that this modification will not be observable at any energies within our capabilities; hence, matters were largely shelved for many decades. When quantum gravity was finally studied in a systematic way, it was undertaken to a considerable degree on the basis of analogies with other fields. Inferences were made (not always soundly) on the basis of these analogies to physics of the Planck scale. Later work revealed the inadequacies in this analogical reasoning.
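Dimensional considerations of the kind just mentioned can be made concrete: combining G, ℏ and c yields the Planck scale, far beyond any attainable experimental energies. A minimal numerical sketch (standard constant values; the calculation is illustrative and not from the text):

```python
# Dimensional analysis: the unique length and energy built from G, hbar, c.
import math

G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
hbar = 1.055e-34    # reduced Planck constant, J s
c = 2.998e8         # speed of light, m s^-1

l_planck = math.sqrt(hbar * G / c**3)        # Planck length, ~1.6e-35 m
E_planck_J = math.sqrt(hbar * c**5 / G)      # Planck energy, in joules
E_planck_GeV = E_planck_J / 1.602e-10        # ~1.2e19 GeV (1 GeV = 1.602e-10 J)

print(f"Planck length: {l_planck:.2e} m")
print(f"Planck energy: {E_planck_GeV:.2e} GeV")
```

For comparison, present-day colliders probe energies of order 10^4 GeV, some fifteen orders of magnitude short of the Planck energy.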
We trace this development, albeit very briefly, in the next section.36

5.4 A Potted History of Quantum Gravity

In order to get a grip on quantum gravity as a research field as a whole it is important to appreciate its historical trajectory, from its genesis to the present day. There is still very little that has been written on the historical aspects of quantum gravity;37 this is somewhat surprising given that it has been with us in some form or another just as long as quantum field theory: such a study would make a fine research project—I would urge historians of physics to take up this challenge sooner rather than later, since many of the physicists who were involved in the early episodes are unfortunately fast departing! To further this end we present a potted history of the highlights of quantum gravity. This brief excursus, interesting in itself, will give us a better handle on the methods to be discussed in §5.6, for we can see how these methods fit into the overall scheme and how they are interrelated. (I should remark that many of the early papers on quantum gravity, especially the early reviews, are the easiest way to get to grips with the field: as a field progresses, certain results come to be taken for granted and the steps leading to them are leaped over. Unfortunately, such steps are often those that give one physical insight into the nature of the problem.38) 36 I should point out, before closing this section, that James Mattingly has recently argued—convincingly I think—that the case for quantizing the gravitational field “has yet to be articulated” ((Mattingly, 2006b), p. 328). This is a distinct question from that posed here; namely, why bother with quantum gravity (whether that involves an actual quantizing of the gravitational field or not—that is, there might be some alternative method that does not get to the quantum theory by quantizing a classical theory). Again, we discuss these matters in §5.6.1. 37 Notable exceptions are: (Stachel, 1999; Gorelik, 1992; Rovelli, 2002). I lean heavily on the latter in what follows. (DeWitt, 1967c) also offers an excellent historical insight into the early days of quantum gravity.

5.4.1 The Early History of Quantum Gravity

The foundations for general relativity were finally in place by 1915; those of quantum mechanics by 1926; and those of quantum field theory by 1927 (naturally, these dates are fairly arbitrary and refer to the publication of key ideas rather than their creation—see (Cao, 2004; Mehra, 2000; Schweber, 1994; Janssen et al., 2006) for good historical studies of this work). The earliest attempts to bring these theoretical frameworks together involved the same methods as had been, and would be, used for the other fundamental interactions.39 As Abhay Ashtekar puts it, the methodology was “to do unto gravity as one would do unto any other physical field” ((Ashtekar, 1991), p. 2). As is becoming clear after decades of intense effort, gravity is not like any other force, at least not in terms of its formal representation, nor, many believe, in terms of how it is (or ought to be) conceptualized. Einstein’s early remarks seem to have inspired no further research efforts at all, neither by himself nor by others. However, similar claims were made intermittently over the next decade or so, but nothing amounting to a serious attempt to construct a full-blown quantum theory of gravity was undertaken. It was generally assumed that there would be no special difficulty in quantizing the gravitational field as opposed to the electromagnetic field. Interest was, curiously, restricted to the latter field, general relativity being largely ignored by theoretical physicists at the time—perhaps because of electromagnetism’s greater simplicity or because the subatomic structure of matter was deemed more interesting and fruitful (and, presumably, more accessible). For example, we find Heisenberg and Pauli explicitly voicing this position40: One should mention that the quantization of the gravitational field which appears necessary for physical reasons [presumably Einstein’s—DR], may be carried out without any new difficulties by using a formalism wholly analogous to that applied here. ((Heisenberg and Pauli, 1929), p. 3) 38 Unfortunately for English-only readers like me, many of the early papers are written in German, French, and Russian: it would make another excellent project to translate these papers and provide a commentary and contextual framework for them (to enable the poor ‘mono-linguists’ like myself access too!). 39 Almost as soon as general relativity was completed, Einstein was aware of a possible conflict between it and the principles of quantum theory, and of the need for a quantum theory of gravity. He writes: “because of the intra-atomic movement of electrons, the atom must radiate not only electromagnetic but also gravitational energy, if only in minute amounts. Since, in reality, this cannot be the case in nature, then it appears that the quantum theory must modify not only Maxwell’s electrodynamics but also the new theory of gravitation” ((Einstein, 1916a), p. 696). Gorelik ((1992), p. 365) argues that an “analogy with electrodynamics” lay behind Einstein’s comment, since a calculation of the collapse time under the (minuscule amount of) gravitational radiation gives 10^37, which is too long to be inconsistent with the empirical evidence. This analogy was a persistent feature of early research on quantum gravity—see below.

This attitude—which involves using weak gravitational fields, so that geometrical aspects can be ignored—persisted for many decades, and was responsible for the first serious 'split' between the approaches in the fifties, dividing particle physicists from general relativists. The suggested approach downplays the geometrical features of gravity, and the fact that it is the metric itself that is dynamical, rather than a field defined with respect to a flat background metric. When general relativity became more widespread and the general relativists began to turn their attention to the quantum problem, they were horrified at this degradation of Einstein's theory. This same pair of attitudes persists to the present day in the battle between string theory and loop quantum gravity. The split underwrites a substantial difference in the importance the researchers from the two camps place on conceptual issues: general relativists are often preoccupied with issues concerning the nature of space, time, matter, causality, and so on, whereas particle physicists tend to give such issues very little credence (if any)—see (Rickles, 2005a; 2005b; 2006; 2007; Rickles and French, 2006) for a discussion of these matters.41 The challenge finally began to be taken up seriously in the 1930s, though the 'electromagnetism–gravity' analogy persisted for some time longer.42 Here we find the term 'gravitational quanta' being used for the first time, by Léon Rosenfeld (Rosenfeld, 1930). We find a large-scale, trail-blazing approach by a relatively

40 In this paper we also find, for the first time, the idea that gravity corresponds to a massless, spin-2 field, so that the particle carrying the force would be massless and of spin 2 (note that the presence of spin-2 particles implies that a theory containing them would, ceteris paribus, be generally covariant)—see also (Fierz, 1939; Fierz and Pauli, 1939) (especially §6 of the latter). The name 'graviton' for this particle was coined by Blokhintsev and Gal'perin (Blokhintsev and Gal'perin, 1934). This fact underlies string theory's claim to be a quantum theory of gravity: the spectrum of the (closed) string contains an oscillatory mode that, when quantized, corresponds to just such a particle. 41 Those working on so-called 'canonical quantization' methods have contributed to many philosophers' workshops and conferences, and can be found contributing chapters to books (for example, essays by two prominent figures from the canonical camp can be found in (Rickles et al., 2006)). 42 Formal analogies between general relativity and Maxwell's theory misled researchers for many years. Bryce DeWitt notes that at the time of the earliest attempts to quantize gravity quantum field theory's "umbilical cord to electrodynamics had not yet been cut" ((DeWitt, 1970), p. 182). However, the formulation of Maxwell's theory in terms of path-dependent variables (holonomies), championed by Mandelstam, was very productive, leading the way to loop gravity—see (Mandelstam, 1962), p. 353 and also (Gambini and Pullin, 1996).

A POTTED HISTORY OF QUANTUM GRAVITY


unknown Russian, Matvey Petrovich Bronstein—see (Gorelik, 1992; 2005). From then on, there is a progressive expansion of work on the problem of quantum gravity. The major lines of attack are developed, and an enormous amount of work is devoted to pushing the approaches to completion. This continues until the beginning of the 70s, when it becomes clear that there are severe consistency problems in all the main approaches. In particular, the main dynamical equation of the 'canonical' approach is eventually seen to be ill-defined and, almost fatally for the 'covariant' approach, results begin to accumulate concerning the perturbative non-renormalizability of general relativity. In the late 80s these issues are finally resolved. A dramatic shift involving the use of spatially extended fundamental objects ('strings') cures the divergence problem in the covariant approach, and a shift to new variables cures the ill-definedness (and absence of solutions) in the canonical approach—there are, of course, still various formal problems facing both approaches, but they are not nearly so severe.

5.4.2 Developing the Research Avenues

From the early efforts of Rosenfeld, Dirac, Bergmann, Wheeler, Schwinger, DeWitt, Gupta, Mandelstam, Feynman, Misner, and others came three distinct methodologies that are, more or less, still in operation in the varied approaches around today—there are points of intersection between these methodologies, as we will see.43 These are (with a historical breakdown of key events—here I borrow heavily from (Rovelli, 2002)):44

• Covariant: construct quantum gravity as a theory of the fluctuations of the metric field over a flat (non-dynamical) spacetime. In other words, split the metric into a background part and small perturbations and quantize only the latter, leaving the former classical and non-dynamical—this, roughly, is known as the background field method. In this way, one splits apart the dual rôle the metric plays in general relativity.

∗ Initiated: first attempt to bring gravitation into the realm of particle physics and quantum field theory using linearized gravitational field equations (with non-linear terms 'patched in' at a later stage)—(Rosenfeld, 1930; Fierz, 1939; Fierz and Pauli, 1939; Gupta, 1952b; 1952a; Kraichnan, 1955; Thirring, 1961; Mandelstam, 1962)

∗ Feynman rules: covariant quantum gravity formulated using Feynman diagrams giving the contributions of each diagram to the overall

43 Naturally, one can find other ways to divide the approaches (particle physics versus general relativity; background dependent versus background independent, etc.), but the taxonomy given here fits the standard practice of physicists. One finds it repeated in most general review papers on the subject. 44 Naturally, this is heavily truncated, and many important players and events have been omitted in the interests of brevity. Important omissions are Misner's instigation of 'Quantum Cosmology' (Misner, 1969) (using mini-superspace models) and Hawking's discovery that black holes radiate at a thermal spectrum (Hawking, 1975). We have treated these topics, albeit briefly, elsewhere in this primer.


QUANTUM GRAVITY: A PRIMER FOR PHILOSOPHERS

amplitude (which is simply the sum of the amplitudes contributed by the topologically distinct diagrams); given this, one can then compute the probability for the process concerned by taking the squared absolute value of the total amplitude—(Feynman, 1963; DeWitt, 1965; 1967a; 1967b; Faddeev and Popov, 1967; Mandelstam, 1968)

∗ Non-renormalizability: on the basis of work on the Feynman rules, covariant perturbative quantum gravity was found to have ineradicable UV divergences—(Deser, 1957; 't Hooft, 1971; 't Hooft and Veltman, 1972; Deser and van Nieuwenhuizen, 1974)

∗ Higher-derivative/supergravity/strings45: these approaches all emerged in a bid to evade the divergence problems while remaining firmly wedded to the covariant quantization methodology—(van Nieuwenhuizen, 1981; Green and Schwarz, 1984; Witten, 1995)

• Canonical ('Hamiltonian'): construct quantum gravity as a theory of the fluctuations of the metric as a whole, so no fixed metric is involved—however, spacetime is split into space and time, with a fixed topological structure. The metric information is represented, in full, by operators on a Hilbert space. One aims to find the eigenvalues of the operators along with their transition probabilities.

∗ Initiated: the basic ideas of a canonical approach to general relativity are formulated and potential problems vis-à-vis quantization discussed (leading to incredibly detailed investigation of the phase space structure of general relativity)—(Bergmann, 1949a; 1949b; Dirac, 1950; 1951; 1958b; Peres, 1962; 1968; Bergmann and Komar, 1972)46

∗ New variables I: Arnowitt, Deser, and Misner put general relativity into a form fit for running through the basic quantization steps47—(Arnowitt et al., 1962)

∗ Dynamical equations: explicit formulations of the constraints (the basic dynamical equations of Hamiltonian general relativity) are found—(DeWitt, 1967c; Wheeler, 1968)

45 The shift to string theory has a curious history. The crucial first step is generally agreed to have been taken by Gabriele Veneziano (1968). His contribution concerned the connection of the data coming from experimental investigations into hadrons with Euler's Beta function. This was purely descriptive or 'phenomenological'; no mechanism was postulated for why the function might be applicable in this context. The explanation came from three independent sources—Nambu (1970), Nielsen (1970), and Susskind (1970)—who each realized that the function could be seen to make sense if the strong forces were understood in terms of oscillating strings. We return to this history, and some finer details of the other approaches, in §5.6. 46 Donald Salisbury (2007) argues that Rosenfeld was really the creator of the constrained Hamiltonian formalism (the formal core of canonical quantization approaches), and missed out on writing down the Hamiltonian for general relativity by dint of not choosing one of his tetrad fields to be normal to the foliation of spacetime he gave. 47 Note that this approach makes use of asymptotically flat spacetime, and hence introduces background structure at infinity.

A POTTED HISTORY OF QUANTUM GRAVITY

289

∗ New variables II ('spin-connection'): geometrical information is now contained in connections (or, more accurately, fluxes)—(Sen, 1982; Ashtekar, 1986)

∗ Loop representation: Jacobson and Smolin find a class of exact solutions to the Wheeler-DeWitt equation based on Wilson loops. Rovelli joins the partnership, eventually producing, with the help of many others, a viable contender to string theory using the Wilson loops as the fundamental variables48—(Jacobson and Smolin, 1988; Rovelli and Smolin, 1988)

• Feynman Quantization: a close cousin of the covariant approach involving the construction of quantum gravity as a theory involving Feynman's functional-integral (path-integral; sum-over-histories) quantization techniques applied to the metrics of general relativity—hence, the focus is on entire spacetime histories. In this case, the amplitude to go from one metric state h at t_0 to another h' at t_1 is given by the integral over all possible field configurations that coincide with the specified metric states at t_0 and t_1 (the boundaries):

\langle h', t_1 | h, t_0 \rangle = \int \mathcal{D}[g]\, \exp(iS[g])

(where \mathcal{D}[g] is a measure on the space of all 4-geometries and S[g] is the action for general relativity).

∗ Initiated: Misner formulates the basic structure of the approach, involving a specification of the weights of the paths (= spacetime geometries) in the Feynman integral, an operator form of the field equations, and a "partial evaluation of the Feynman propagator", recovering from this the canonical result that the Hamiltonian operator is zero—(Misner, 1957)

∗ Euclidean 'Sum-Over-Histories' approach: Misner's early approach was picked up by Hawking, who 'Wick rotated' to imaginary time (τ = it), thus summing over Riemannian/Euclidean metrics only, to simplify the path integral and overcome some consistency problems in the original approach, such as the lack of a rigorous measure on the space of geometries.
By dealing with single boundary and triple boundary situations Hawking (and Hartle) were able to describe universe creation ex nihilo and the birth of ‘baby universes’—(Hawking, 1978; 1979) ∗ Discretization: represent quantum gravity as a theory of dynamical triangulations so that the path integral can be computed in a combinatorial fashion, by counting the geometries; continuum

48 Following this work, there was a considerable amount of technical work still to be done to make the theory rigorous: e.g. regularization, defining the inner-product structure, and so on. Many of the formal troubles with the loop representation were resolved by shifting to a basis of spin-networks—(Rovelli and Smolin, 1995). See Rovelli’s entry on “Loop Quantum Gravity” in the Library of Living Reviews for a more detailed breakdown of key events: http://relativity.livingreviews.org/Articles/lrr-1998-1.


aspects emerge in an appropriate limit—(Ambjørn et al., 1997; Ambjørn, 1996)49

∗ Spin-Foam Models: covariant, discrete, path-integral representation of the constraints of loop quantum gravity (in many ways this approach can be viewed as a synthesis of the three methodologies)—(Barrett and Crane, 1998; Baez, 1998; Oriti, 2005)

In addition to these main lines of attack, there are several 'external' approaches that do not readily fit into the categories laid out (this is just a very small sample):

• External: Twistors (Penrose, 1967), Non-Commutative Geometry (Snyder, 1947; Connes, 1994), Regge Calculus (Regge, 1961); Causal Sets (Sorkin, 1983; Henson, forthcoming); Topological Quantum Field Theory (Witten, 1988); Generalized Quantum Mechanics (Hartle, 1995); Group Field Theory (Oriti, 2006); Causaloids (Hardy, 2007).50

However, these are generally not intended to function as wholesale quantum gravity theories, but as 'ingredients' of some genuine theory of quantum gravity—for the most part, they do not involve the quantization of general relativity. Their importance should not be downplayed, though; it is through inspiration from these alternatives that many of the ideas of the main approaches were derived.51 They also frequently forge new connections between mathematics and physics (such as the link to knot theory, which is now a central component of several approaches—see (Baez and Munian, 1994)).

Conclusion

What is interesting to note about this brief depiction of the approaches is that during the first forty or so years there are researchers who straddle the various research methodologies.52 Twenty years later this is virtually unheard of, with

49 A very good general review of discrete approaches to quantum gravity, including many not mentioned here, is (Loll, 1998). 50 An interesting collection of 'iconoclastic' approaches can be found in the International Journal of Theoretical Physics, Vol. 45, No. 8, 2006.
51 For example, the concept of a 'spin network' used to label the (kinematical) states in loop quantum gravity is a direct descendant of Roger Penrose's early attempt to construct a combinatorial notion of quantum space in his 'Theory of Quantized Directions' (Penrose, 1971), which morphed into his 'twistor theory' programme (Penrose, 1979). The concept was then 'covariantized' in the 'spin foam model' approach, which seeks to describe the evolution of spin network states. Twistor theory itself has recently merged with string theory, giving birth to 'twistor string theory' (Witten, 2004) (see also: http://gallery.math.ucdavis.edu/albums/mathphyscon/1604EdwardWitten.mp4)—hence, this version of string theory and loop quantum gravity share a close common ancestor! 52 For example, Dirac did work on the field theory of extended objects (what would nowadays be called '2-branes') and early work on the Hamiltonian formulation of general relativity that forms the bedrock of loop quantum gravity. He also laid the formal foundations for the path-integral approach, ready for Feynman to develop it into a self-contained approach—see §32 of (Dirac, 1958a).

THE INGREDIENTS OF QUANTUM GRAVITY


just one or two exceptions. This is one of those aspects that is of sociological as well as historical interest. However, it should be clear that, though current researchers tend to divide themselves fairly rigidly according to the categories mentioned here, the methodological divides are not nearly so rigid: there are many points of contact between the various approaches, and, as has happened many times in the past, and as Lee Smolin argues in his Three Roads to Quantum Gravity (2002), there is a very strong possibility of convergence between them.

5.5 The Ingredients of Quantum Gravity

As we have seen, to make a quantum theory of gravity we need to take into account both quantum theory and general relativity. These are the ingredients of quantum gravity.53 At present they ignore each other: general relativity deals only with the interactions of classical things, and quantum field theory deals only with non-gravitational interactions. The different approaches can be viewed as different 'recipes', some using more of one of these ingredients than the other, some adding different ingredients to the mixture: extra dimensions, supersymmetry, complexification, discreteness, etc. Some, as we have already seen with Penrose's ideas, think that entirely different ingredients may be required to get a truly fundamental theory with general relativity and quantum field theory as limits. Since David Wallace has already given us an excellent account of quantum theory and quantum field theory (see, e.g., §2.7.1), we do not need to spend too much time going over the details again. Instead, we focus on extracting just those details that are relevant. We look at general relativity, quantum theory (in general), and quantum field theory. We will also briefly need to say something about the process of quantizing a classical system to get a corresponding quantum system. Again, our task here is made easier thanks to Roman Frigg's excellent account of classical mechanics in the Appendix to his chapter.

5.5.1 General Relativity

General relativity is primarily a theory describing the classical gravitational field. Let's begin by giving a nutshell definition of general relativity:

• The gravitational field is the metric field, and affects all other fields and objects. It is itself affected by mass-energy. General relativity is the theory of this field.

In other words, the gravitational field supplies structures generally associated with spacetime: spacelike and timelike separation, causal structure, etc. The reason for this dual function is that the metric field is taken to represent gravitational field structures and spacetime structures. We are led here by the equivalence principle, which tells us that gravitational mass and inertial mass are the same. This thought prompts the idea that objects that fall freely in a uniform

53 I borrow the term 'ingredients' from (Butterfield and Isham, 2001).


gravitational field are following geodesics in curved spacetime. By adding the local equivalence to Minkowski spacetime (i.e. in small neighbourhoods of the points of the manifold) and adding the condition that the spacetime should satisfy a set of field equations relating curvature to matter-energy sources, we get Einstein's theory—naturally, this is drastically simplified! A simple inspection of Einstein's field equations of general relativity already points towards the need for, or at least the consideration of, a quantum theory of gravity in some form or other. The simplest way to present general relativity in a short space is using models.54 General relativity has models of the form M = ⟨M, g, T⟩. To be models of general relativity, the two tensor fields g and T must satisfy the Einstein field equation at every point of the manifold M:

G_{\mu\nu} \equiv R_{\mu\nu}[g] - \frac{g_{\mu\nu}}{2} R[g] + \Lambda g_{\mu\nu} = -8\pi G_N T_{\mu\nu}[\Phi]   (5.2)

Gμν is the Einstein tensor describing the curvature of spacetime, and Tμν is the stress-energy tensor describing (the stress-energy-momenta of) the various possible sources Φ of the gravitational field, such as particles, fields, dusts, strings, branes, etc.55 The term −8πGN (sometimes written κ) is the coupling constant, proportional to Newton's universal constant of gravitation, determining the strength of gravitational interactions (in the sense that when κ = 0 there are no interactions). The problematic term that involves quantum theory is, of course, the stress-energy tensor of matter. Our best theories of matter are quantum theories, which, prima facie, implies that Tμν should best be represented by a quantum operator, a q-number rather than a c-number. If this is the case—and the success of the standard model of particle physics suggests that it is, or is at least a good approximation (certainly better than classical physics!)—then we have to face the issue of the coupling between the geometry of spacetime and this quantized matter source—however, we are getting ahead of ourselves here. In vacuum general relativity (i.e. when Tμν = 0, and setting Λ = 0) the equation is simply Rμν = 0. Those distributions of the tensor fields over the manifold that satisfy this equation of motion

54 For more extended coverage, investigate the following books (organized according to difficulty, hardest first): (Wald, 1984), (Misner et al., 1973), (Carroll, 2003), (Schutz, 2003). Another very interesting and highly original presentation of general relativity (though very hard to come by) is (Kopczyński and Trautman, 1992)—this book contains the most elementary and condensed treatment of advanced topics from general relativity that I have seen; it also has numerous useful 'conceptual asides'. The general mathematical background required for studying general relativity (differential geometry, topology, etc.) can be found in the Les Houches Relativity, Groups, and Topology lectures (DeWitt, 1964; Stora and DeWitt, 1986). Alternatively, a simpler general relativity 'toolkit' is (Poisson, 2004). 55 Note that Einstein was dissatisfied with the 'phenomenological' nature of his field equation. He compared it to "a building, one wing of which is made of fine marble (left part of the equation), but the other wing of which is built of low grade wood (right side of equation). The phenomenological representation of matter is, in fact, only a crude substitute for a representation which would correspond to all known properties of matter" ((Einstein, 1956), p. 83).


correspond to the dynamically possible histories; those distributions that are possible in a merely 'formal' sense are kinematically possible trajectories. Only the former represent physical possibilities.56 The standard assumption (made to fit what we observe) is that spacetime S = ⟨M, g⟩ is represented by a real, four-dimensional, connected, C∞ Hausdorff manifold M on which is defined a tensor field g of Lorentzian signature. The metric gμν is a symmetric tensor with sixteen components, of which only ten are independent owing to the symmetry; correspondingly, Einstein's equation has ten independent components.57 In a manifestly covariant formulation, these are unified. However, if we follow the canonical procedure of splitting spacetime into space and time, then only six of these ten components drive the metric, telling us how it evolves in time. The other four components (or equations) are constraints that the metric on a spatial slice, and its first derivative, have to satisfy at any and all times. An alternative formulation involves looking at the Einstein-Hilbert action functional for general relativity (this is the action that would be used to weight the paths in the Feynman quantization approach mentioned in the previous section):

S[g] = \frac{1}{16\pi G_N} \int d^4x \, \sqrt{-\det g} \; R \ (+ \text{surface terms})   (5.3)

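The component counting just described (sixteen entries, ten independent components by symmetry, splitting into six evolution equations and four constraints under a space/time split) can be made explicit with a trivial calculation. The following sketch is purely illustrative and not from the text:

```python
# Counting the independent components of a symmetric metric tensor.
# A 4x4 tensor has 16 entries; the symmetry g_{mu nu} = g_{nu mu}
# leaves n(n+1)/2 = 10 independent ones, matching the ten independent
# Einstein equations (which split 6 + 4 under a space/time decomposition,
# as the surrounding text notes).

def independent_components(n: int) -> int:
    """Number of independent entries of an n x n symmetric tensor."""
    return n * (n + 1) // 2

# Enumerate them explicitly as index pairs with mu <= nu:
pairs = [(mu, nu) for mu in range(4) for nu in range(4) if mu <= nu]

print(len(pairs))                  # 10
print(independent_components(4))   # 10
print(independent_components(3))   # 6: components of a spatial 3-metric
```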
This action is invariant under the symmetry group of general relativity, namely the group of four-dimensional diffeomorphisms of the spacetime manifold. This gives us an alternative way of understanding the constraints: the basic idea is that the dynamical equations we get from this action contain ‘surplus’, unphysical structure—i.e. not all of the degrees of freedom are physical. Instead, there are variables that are deemed physically equivalent if they are connected by an element of the invariance group (i.e. by a 4-diffeomorphism). In this case the transformation is called a gauge transformation, and in the present case we would have two (gauge-equivalent) metrics that describe one and the same spacetime geometry. There are then, as with most theories, multiple ways to formulate general relativity. What these have in common is, in some sense (and with very varying degrees of explicitness), the idea that the gravitational field dynamics are given by the dynamics of the geometry of space or the spacetime geometry. A full spacetime, equipped with a gravitational field, is a bunch of spatial geometries stacked up in a certain way. Note the emphasis on the 3-geometry rather than the 3-metric here: the 3-metric represents a 3-geometry, but the relation between 56 As John Wheeler helpfully puts it, “[k]inematics describes conceivable motions without asking whether they are allowed or forbidden. Dynamics analyses the difference between a physically reasonable and a disallowed history” ((Wheeler, 1964b), p. 65). 57 The tensorial nature of the metric means that the values of these components depend on a particular coordinate system. However, regardless of the coordinate system used, the lengths of worldlines in spacetime are assigned the same value—that is, the length between a pair of spacetime points is an invariant quantity.


metrics and geometries is many-to-one: a geometry is an equivalence class of metrics whose elements are related by a diffeomorphism. Now, if Riem(Σ) is the space of Riemannian 3-metrics on a spatial hypersurface, then we can get the space of 3-geometries ('superspace' S(Σ)) by 'quotienting out' the group of diffeomorphisms of Σ, i.e. Diff(Σ):

S(\Sigma) = \frac{\mathrm{Riem}(\Sigma)}{\mathrm{Diff}(\Sigma)}   (5.4)

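Equation (5.4) can be illustrated with a deliberately crude finite analogue (entirely my illustration, not a construction from the literature): take a 'metric' to be an assignment of edge lengths on a labelled three-point space, and a 'diffeomorphism' to be a relabelling (permutation) of the points; a 'geometry' is then an orbit, i.e. an equivalence class of labelled metrics under relabelling:

```python
# Toy analogue of S(Sigma) = Riem(Sigma)/Diff(Sigma): "geometries" as
# orbits of labelled "metrics" under relabellings of a 3-point space.

from itertools import permutations

points = (0, 1, 2)

def act(perm, metric):
    """Pull a metric (dict: edge -> length) back along a relabelling."""
    return frozenset(
        (tuple(sorted((perm[a], perm[b]))), length)
        for (a, b), length in metric.items()
    )

def orbit(metric):
    """The equivalence class ('geometry') of a labelled metric."""
    return frozenset(act(dict(enumerate(p)), metric) for p in permutations(points))

m1 = {(0, 1): 1, (0, 2): 2, (1, 2): 2}
m2 = {(0, 1): 2, (0, 2): 2, (1, 2): 1}   # same shape, different labels

print(orbit(m1) == orbit(m2))  # True: one geometry, two metric representatives
```

The many-to-one relation in the text shows up directly: m1 and m2 are distinct 'metrics' but determine one and the same 'geometry'.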
There are many interesting and intricate mathematical features of superspace; for example, although the space of Riemannian metrics on a manifold is a manifold, superspace is not (it possesses 'kinks'), as was proved by Fischer (1970). However, one can set up dynamics in superspace. What one needs is a way of 'drawing' a curve connecting distinct points of superspace (each representing an instantaneous space) so as to generate a spacetime. This is performed by a Hamilton-Jacobi equation. This is the starting point for canonical quantization along geometrodynamical lines: the quantum dynamical equation is the notorious Wheeler-DeWitt equation. There are also many interesting technical differences between general relativity and pre-general relativistic theories. Notably, the invariance group of general relativity, being the diffeomorphism group of the manifold, is not a finite-dimensional Lie group. This feature has been used to underwrite the 'specialness' of general relativity, namely its background independence.58 Much is made in the context of quantum gravity of the background independence of general relativity, and of the imposition of background independence as a requirement on a theory of quantum gravity. As Unruh points out:59

We return to this subject in §5.7.2. For now it suffices to say that the geometry of spacetime is dynamical in general relativity; its value depends on the solution to the field equations. This is quite unlike anything encountered in any quantum theory to date where the geometry of space and time have their values fixed ab initio. 5.5.2

Quantum Field Theory

In this section we outline only the skeletal framework of quantum theory: we shall not be concerned with such things as entanglement and non-locality. The

58 Note, however, that some physicists do not see this as a good thing. Vladimir Fock (1976), for example, proposed a 'bi-metric' theory in which a secondary metric is included in the theory as a background structure. 59 See (Weinstein, 2001) for a similar perspective on the nature of the background independence of general relativity and the problems it poses for quantum theory.


main aspect we wish to highlight here is the relationship between the structure of quantum theory and that of general relativity.60 Let us begin with the quantization of a generic classical system. The first step is to define a system with a certain number of degrees of freedom n. The dynamical variables q_k (k = 1, ..., n) associated to these degrees of freedom then span a configuration space Q. The dynamics of this system is given by the Euler-Lagrange equation, \delta S = 0 (where S = \int_{t_1}^{t_2} dt\, L), which is derived from the Lagrangian L(q, \dot{q}). A Legendre transformation gives one the coordinates \langle q_k, p_k \rangle (where p_k \equiv \partial L/\partial \dot{q}_k) of a phase space \Gamma. The Lagrangian may then be written in terms of canonical variables as L = \sum_k p_k \dot{q}_k - H(q, p). In this case, the dynamics is given by the Hamiltonian H(q, p), where the time-development of q and p is given by:

\dot{q}_k = \{q_k, H\} \equiv \partial H/\partial p_k
\dot{p}_k = \{p_k, H\} \equiv -\partial H/\partial q_k

For functions (i.e. observables) on the phase space, one computes their Poisson bracket as:

\{F, G\} := \sum_k \left( \frac{\partial F}{\partial q_k} \frac{\partial G}{\partial p_k} - \frac{\partial F}{\partial p_k} \frac{\partial G}{\partial q_k} \right)

The brackets for the 'fundamental' variables are \{q_k, p_l\} = \delta_{kl} (where \delta_{kl} is the Kronecker delta: 1 when k = l and zero otherwise). Now, in order to quantize such a system, one applies the following set of rules, derived for the most part from the symplectic geometry of the phase space. Firstly, Poisson brackets become commutators: [\hat{q}_k, \hat{p}_l] = i\hbar\,\delta_{kl}—with \hat{q}_k and \hat{p}_l now understood as operators. Secondly, the state of the system is given by a complex function \Psi(q) on the configuration space, the space of such functions forming a Hilbert space (i.e. a normed vector space with an inner product, standardly taken to be positive-definite to forbid negative probability values). The inner product structure makes the dynamical variables Hermitian operators on the state space. The dynamics (the time-translation operator) is given by the quantum Hamiltonian, via the Schrödinger equation \hat{H}\psi = i\hbar\, \partial\psi/\partial t, or via Heisenberg's equation of motion (for some dynamical variable O): [\hat{O}, \hat{H}] = i\hbar\, d\hat{O}/dt. However, for general relativity (and many other systems) this simple series of steps is not possible. General relativity is a constrained theory, a gauge theory—as exemplified by its general covariance—and so the step from the Lagrangian to the phase space and the Hamiltonian description does not go smoothly: the Hessian of the Lagrangian (namely the matrix \partial^2 L/\partial \dot{q}_j \partial \dot{q}_k) is singular, and not all of the variables are physical—some are gauge degrees of freedom or 'surplus'.61

60 A more general treatment, with an eye to conceptual issues, is (Isham, 1995). 61 This is a general problem for gauge theories. See (Henneaux and Teitelboim, 1992) for a thorough treatment of this aspect of gauge theories.

QUANTUM GRAVITY: A PRIMER FOR PHILOSOPHERS

The problem, then, is the presence of constraints on the variables. These come in two flavours in general relativity: there is the diffeomorphism constraint (three of them at each point of an initial spatial slice), which maps an x0 = const hypersurface to itself, and there is the Hamiltonian constraint, which maps an x0 = const hypersurface onto other hypersurfaces—we do not need to view these hypersurfaces as existing in a pre-given block of spacetime; indeed, in the canonical (geometrodynamical) approach there is just a single 3-manifold to work with. These constraints taken together make up the full Hamiltonian of general relativity. Now, a natural choice of configuration space, given general covariance, is the space of 3-geometries (or 3-metrics modulo the diffeomorphism symmetry). Given this, any wavefunction defined over this space will automatically satisfy the diffeomorphism constraint: Ψ[³g] = Ψ[φ(³g)] (where φ ∈ Diff(Σ)). The Hamiltonian constraint has proven much harder to satisfy, and indeed to define in a consistent manner.62 Let us briefly turn to the particle physicists' understanding of interactions. The central idea of quantum field theory is that forces between particles are mediated by the exchange of field quanta. The type of force (what it acts on, what its sources are, and how it acts) is determined by the spins and masses of the particles. For example, Maxwell's (classical) theory of electrodynamics is the classical limit of the (relativistic) quantum theory of a massless spin-1 particle, the photon. In the case of the gravitational field we must select mass and spin on the basis of observed properties of gravitational interactions: long-range, static, attractive, macroscopic, etc. This leads one to postulate a massless spin-2 (self-interacting) particle: the graviton.63 The Lorentz invariant theory of gravitons propagating on a flat background has general relativity as a low energy limit.
In this sense the classical theory of general relativity is implied by a specially relativistic quantum particle theory. However, serious problems emerge when this scheme is extended into high frequency domains (where quantum field theory proper is required), as we sketched in §5.3.4. (But recall that sense can still be made of the approach in terms of effective field theory: see below.) General relativity is, of course, usually considered to be a geometric theory in the sense that gravitation is deemed a manifestation of curved spacetime as determined by its energy content—(test) particles will then follow geodesics within this curved spacetime. The key point to take from the particle physics approach is that spacetime is decidedly flat (or at least fixed in place independently of the

62 This is, of course, just one way to understand quantization (namely, canonical quantization): as Wallace outlined in §2.7.1, there are many formulations of quantum field theory. We introduce the details of these other approaches when we discuss the various approaches to quantum gravity. (There is another approach to quantum field theory known as 'axiomatic quantum field theory'—also sometimes called 'general quantum field theory'—since this approach is concerned more with conceptual clarity and formal precision than non-axiomatic approaches. There are many excellent texts on this approach to quantum field theory. My personal favourite is (Baez et al., 1992)—the book is out of print as of writing, but it can be obtained perfectly legally from John Baez's website: http://math.ucr.edu/home/baez/bsz.html.) 63 For a detailed account, see (Deser, 1975b) (especially §2) or (Deser, 1975a). Or, for the 'horse's mouth' arguments, see (Fierz and Pauli, 1939), (Feynman, 1995) and (Weinberg, 1964).


processes occurring within it). Geometrical features of spacetime are not dynamical. In the case of gravity, understood as a quantum field theory, particles will always really be moving around in spacetime with a fixed metric; however, it will appear that they are following geodesics in curved spacetime because of the graviton exchange going on between them. This approach views gravity much like any other interaction, according to which interactions between two (charged) objects are achieved through the exchange of field quanta. There are severe mathematical problems facing quantum field theories (and indeed field theories in general) with interactions turned on (including self-interactions): given that the interactions are taken to happen at the same point of spacetime, one has to consider products of operators at one and the same point. There is a way around it: perturbation theory. One pretends that the complicated interactions are perturbations of a well-defined free (= without interactions) field theory. More technically, one expands the physical quantities of the theory as a Taylor series in the coupling constant—the latter determining the strength of the interaction concerned. This is where the famous divergences come from: when the various terms in the series are evaluated, they spew out infinities.64 We don't measure infinite values! Hence, we have a problem. This was resolved, after a great deal of hard labour and not to everyone's satisfaction,65 by the mathematical magic of renormalization theory. Let us say some more about renormalization and its problems.66 Quantum field theory provides our present theoretical framework for describing the interaction of charged point particles with non-gravitational fields: the electromagnetic field, the weak field, and the strong field. The quantum field theories that describe these interactions are all local in the sense that the field interactions occur at individual points of spacetime.
Making the interactions local allows for the peaceful coexistence of special relativity and quantum theory, thus preserving causality. However, as we saw in §5.3.4, this locality (involving the piling up of field interactions at the same spacetime point) leads to infinities in the theory: the locality leads to singularities which in turn lead to divergences. The renormalization programme led to a finite version of quantum field theory for electromagnetic interactions, known as renormalized QED. The path to this theory involves the introduction of a cutoff so that wavelengths (resp. energies) shorter (resp. higher) than the cutoff are ignored (thus rendering the divergent 64 Note that we are talking about the individual terms of the power series diverging here, not the whole series when summed. Proving convergence or divergence of the full perturbation series is a seriously hard task: string theory, and many quantum field theories defined in this way, have question marks over the nature of the full series. 65 Dirac is a case in point here: (Dirac, 1951; 1987). Feynman too was not convinced that renormalization was the last word on the divergence problems: http://nobelprize.org/nobel_prizes/physics/laureates/1965/feynman-lecture.html. 66 Renormalization has received a fair amount of interest from philosophers: (Huggett and Weingard, 1995; 1996; Cao and Schweber, 1993; Teller, 1989). For an exceptionally clear, elementary guide to renormalization, see John Baez's entry at: http://math.ucr.edu/home/baez/renormalization.html.


terms finite).67 Depending on the cutoff chosen, we will get different values predicted for the various physical quantities. In order to get rid of this unwanted dependence, and get the theory predictively back on track, one performs a readjustment manoeuvre called renormalization, as follows. The parameter-values of this modified theory are inserted from their experimentally observed 'dressed' values—dressed, that is, by the swarm of virtual particles that are produced in line with the time-energy uncertainty relations. However, the theory and the parameters at this stage are dependent on the cutoff scale, which is arbitrary, so we must take the continuum limit, letting the cutoff distance go to zero (or the momentum cutoff go to infinity). When we do this we get renormalized QED, with quantities that are independent of the cutoff. To eradicate the divergences of QED one needs to perform renormalization of both charge and mass. Broadly speaking, if one needs to redefine only a finite number of parameters to absorb the infinities then the theory is renormalizable. Otherwise it is non-renormalizable. In the early days of quantum field theory non-renormalizability was considered to be the death-knell of a theory, implying at best its being non-fundamental and at worst its formal inconsistency. Later, renormalization group techniques led to a revision of this 'selection principle' interpretation of the significance of non-renormalizability. Instead, a more pragmatic interpretation was adopted according to which a theory is 'sensitive' to scale, so that as the energy is varied the theory may or may not cease to be useful. This domain-dependent approach is known as the effective field theory approach: a theory will be effective only in a certain range. The reason that theories can be effective in a certain range is that they are 'insensitive' to what is going on at other scales (especially at higher energies) and so are rendered relatively free from 'interference'.
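The cutoff-and-readjust manoeuvre can be made vivid with a deliberately simple numerical sketch (my own toy model, not QED: the 'loop correction' is just a logarithm, and the names `G_OBS`, `MU_REF` and the functions are invented for illustration). For each choice of cutoff, the bare coupling is tuned so that the theory reproduces the measured 'dressed' value at a reference scale; predictions at other scales then come out cutoff-independent:

```python
import math

MU_REF, G_OBS = 1.0, 0.30          # measured ('dressed') coupling at scale MU_REF

def loop(g, hi, lo):
    """Toy log-divergent 'loop correction' accumulated between scales lo and hi."""
    return g * g * math.log(hi / lo)

def g_bare(cutoff):
    """Tune the bare coupling (to this order in the toy expansion) so that the
    theory reproduces the observed value G_OBS at the reference scale MU_REF."""
    return G_OBS - loop(G_OBS, cutoff, MU_REF)

def amplitude(scale, cutoff):
    """Prediction at `scale`, computed with an explicit cutoff in place."""
    return g_bare(cutoff) + loop(G_OBS, cutoff, scale)

# The arbitrary cutoff drops out once G_OBS is held fixed:
a_low = amplitude(5.0, cutoff=1e3)
a_high = amplitude(5.0, cutoff=1e9)
```

Here the log-divergences cancel exactly between `g_bare` and the correction, leaving the finite, physical dependence on the ratio of scales; this mirrors (very loosely) how a finite number of redefined parameters absorbs the infinities in a renormalizable theory.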
It is perfectly possible that quantum gravity is an effective field theory in just this sense—i.e. not a theory of everything in the sense of §5.2.1. This is just how string theorists view perturbative string theory—see §5.6.3. 5.5.3

Conclusions

Given the various similarities between Maxwell’s theory of the electromagnetic field and Einstein’s theory of the gravitational field, one might expect to gain some limited insight into what quantum gravity has in store for us, by comparing it with the quantum version of Maxwell’s theory. In the classical theory one deals with electric and magnetic fields, which are defined at each point of spacetime. On quantizing these fields we find that the fields must obey the uncertainty relations (thus forbidding simultaneous specification of their values). However, this expectation, though apparently borne out by many approaches, is a little premature: there are features of general relativity that are simply not 67 This cutoff can be understood as slicing off frequencies of waves in the field that are higher than some chosen value, in the Lorentzian formalism. Alternatively, if we use the map t → it (= Wick rotation), which transforms Lorentzian spacetime into Euclidean spacetime, it can be understood as a distance cutoff which allows one to ignore any behaviour of the fields happening at scales shorter than the chosen cutoff, the distance D.


shared by Maxwell's theory, or indeed any other classical field theory. General relativity, as we have seen, does not involve a background metric manifold. This alone muddies the waters immensely, as far as simple extrapolations of the above kind are concerned. What this background independence implies, amongst other things, is that one cannot simply localize the fields in the theory; one cannot define a simple expression for the energy; the propagation of the fields is more complicated; and, in the realm of the quantum, one cannot postulate the microcausality axiom, nor can one formulate a Schrödinger equation, for there is no external notion of space and time to ground these features (one has instead a constraint equation). These differences suggest that the whole conceptual structure of quantum gravity will be radically different from anything we have had experience of before. One might even legitimately question whether the theory need be a quantum theory at all.

5.6 The Manifold Methods of Quantum Gravity

Both strong contenders from the catalogue of methods presented in §5.4 (i.e. covariant and canonical) do violence to general relativity in some way: the covariant methods break the background-independence of the theory; canonical methods break the manifest spacetime covariance of the theory. Note also that string theory—belonging to the covariant methods, historically at least—is an all-out modification of general relativity, in both the quantum and classical domains (cf. (Witten, 1989)). The idea is that it is a successor to general relativity, rather than a quantization of it. A central idea of string theory is that general relativity's being perturbatively non-renormalizable simply rules it out as a consistent theory at the quantum level: one has to look elsewhere. Loop quantum gravity theorists argue that the perturbative analysis contains fundamental flaws, involving the assumption of a fixed continuum spacetime to get things going. To an outsider, the battles between these two sides (covariant and canonical) may look rather strange. The current animosity—mainly, it has to be said, between string theory and loop gravity—is a fairly new development. In the past there have been suggestions to examine closely the relations (on both a formal and conceptual level) between the canonical and covariant approaches in order to bring them closer together—see e.g. (Hartle and Kuchař, 1984). Hopefully this attitude will re-emerge at some point. We saw in our brief browse through the historical development of quantum gravity that there are three main methodological approaches, and a family of 'external' approaches that do not readily fit into any of these three categories. In this section we will examine these approaches in more depth by focusing on specific implementations: we look at covariant perturbative quantization in §5.6.2 and consider string theory independently of this in §5.6.3. In §5.6.4 we discuss canonical approaches, particularizing to loop quantum gravity in §5.6.4.3.
We then consider the path-integral (or Feynman quantization) approach in §5.6.5, followed by a series of external approaches in §5.6.6. Philosophical ramifications will be integrated en route. The natural, logical place to begin, however, is with an approach that isn't really a 'full' theory of quantum gravity at all, but is instead a half-classical-half-quantum 'hybrid' theory. This will lead us into the philosophically interesting terrain of whether we even need to quantize the gravitational field.

5.6.1 Semiclassical Quantum Gravity

At first sight, it isn't clear that one needs to quantize gravity in order to get what one requires from a quantum theory of gravity, namely a framework that can deal with both general relativity and quantized matter fields—after all, we have good empirical evidence about these two aspects of the world taken in separation, but no empirical evidence whatsoever about quantized gravity itself. Given the experimental data, then, all one would seem to require is some way of accounting for gravitational interactions between quantized matter-energy: so prima facie, the way is open for a theory involving quantized matter-energy interacting through a classical gravitational field.68 This would give us a hybrid 'half and half' or 'semiclassical' theory of gravity. Whether or not this is possible—that is, whether or not there is a necessity to quantize the gravitational field given the quantized nature of matter fields and the universal coupling of gravity—has led to a fairly sizeable philosophical literature (sizeable, that is, given the general paucity of philosophical material on quantum gravity). The reason for this philosophical interest is that given the lack of experiment to decide the issue, "we can only attempt to shed light on such a question [whether gravity should be quantized—DR] from the epistemological side" as Rosenfeld puts it ((Rosenfeld, 1966), p. 599). Here we outline the main ideas of semi-classical gravity, and then discuss some of the arguments contained in this literature. Firstly, let us again write down the (abbreviated) central equation of (classical) general relativity:

Gμν = −8πGN Tμν[Φ]   (5.5)

As we said above (in §5.5.1), the left hand side of this equation is a classical term representing the geometry of spacetime while the right hand side represents any matter-energy within the universe and thus, given what information we have about the nature of matter, ought to be considered as dependent on quantum operators rather than classical functions. We have, then, a formal mismatch: a c-number field on one side and a q-number on the other, an apples and oranges equation. Either we turn the left hand side into a q-number field, which would involve a quantization of the gravitational field, or else we can try to turn the right hand side into a c-number. One early approach along the latter lines—of 68 This could be accomplished in two broad ways: (1) by formulating a quantum field theory on a curved spacetime in which the spacetime is dynamically decoupled from the matter (i.e. unresponsive to its evolution); and (2) by formulating a quantum field theory in which the state of the matter influences the spacetime geometry. In both cases the geometry remains classical. We discuss approach (2) in this section since it is closer to quantum gravity.


Møller (1962) and Rosenfeld (1963)—involves coupling a classical gravitational field to a quantized source, satisfying the semiclassical Einstein equations:

Gμν = −8πGN ⟨ψ| T̂μν[Φ] |ψ⟩   (5.6)

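The structure of eq. 5.6 can be illustrated with a minimal finite-dimensional sketch (my own illustration; the 2×2 'stress-energy operator' and the state labels are invented stand-ins). Whatever the quantum state, the classical field only ever couples to a single c-number, the expectation value—so a superposition sources the average configuration rather than either definite one:

```python
import numpy as np

# Toy stand-in for the stress-energy operator: masses 'apart' (+1) or 'together' (-1)
T_hat = np.diag([1.0, -1.0])

def source(psi):
    """The c-number the classical field couples to: <psi| T |psi>."""
    psi = psi / np.linalg.norm(psi)
    return float(np.real(psi.conj() @ T_hat @ psi))

apart = np.array([1.0, 0.0])
together = np.array([0.0, 1.0])
superposed = (apart + together) / np.sqrt(2)

# Definite states source definite values; the superposition sources the average.
vals = (source(apart), source(together), source(superposed))
```

This is the formal mismatch resolved by fiat: the right hand side is flattened to a number before gravity ever sees it, which is precisely what the Page–Geilker-style worries below latch onto.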
Hence, only the stress-energy tensor becomes a quantum operator and the gravitational field interacts with the expectation value (a c-number) of this tensor (henceforth abbreviated to ⟨Tμν⟩), rather than directly with the quantum fields. The solution would involve a classical spacetime ⟨M, gμν⟩, a matter field Φ with stress-energy tensor Tμν, and a quantum state ψ of the matter field. Hence, roughly, we will be dealing in models of the form ⟨M, gμν, ⟨Tμν[Φ]⟩ψ⟩ satisfying eq. 5.6. If this framework makes sense, then there would appear to be no need for quantization routes to quantum gravity. The industry of approaches that do not quantize general relativity points to the fact that many researchers do not think that quantization is necessary—though this does not mean that they think quantum gravity is not necessary.69 However, one often finds arguments that claim to demonstrate the opposite. 5.6.1.1 Arguments for the Necessity of Quantization. Bergmann and Komar refer to the idea that the gravitational field must be quantized on pain of violating the uncertainty relations as "the principal argument in favour of quantization" ((Bergmann and Komar, 1980), p. 228). If gravity were kept classical then that means that it isn't subject to the uncertainty relations as quantum quantities are: coupling the two types would appear to open the door to using the classical gravitational field to circumvent the uncertainty relations. Eppley and Hannah extend this to include momentum conservation and relativistic causality (Eppley and Hannah, 1975). The basic idea is that since all mass generates a gravitational field, particles generate a gravitational field. These particles obey the uncertainty relations. However, if we retain the classicality of the gravitational field that these particles generate then we can, by simultaneously measuring the components of the field, determine the precise values for positions and momenta of the particles, thus violating the uncertainty principle.
Thus, we have a reductio: the gravitational field must be quantized to avoid this violation. The details of Eppley and Hannah's thought experiment are rather involved. A simpler approach is described by Unruh (1984).70 Here the idea is to connect a Cavendish balance to a Schrödinger's Cat type setup. The prediction of the semi-classical theory is supposed to be at odds with our expectations for what should happen. The details are as follows. We begin with a sealed box in which there are the following elements: a pair of masses connected by a stretched spring and being held apart by a rod; on the rod there is a bomb, itself connected to a detonator which is linked up to a geiger counter, which is beside a radioactive source (which leads the counter to click, on average, once a day). When the geiger counter clicks, the explosive device will detonate the bomb, resulting in the breaking of the rod and the two masses being pulled together. Outside the box is a Cavendish balance whose equilibrium position depends on whether the masses in the box are together or apart—hence, the position will depend on whether the radioactive source decays. There is also an observer outside of the box. This setup is represented in fig. 5.1.

69 However, as we saw in §5.3, there were problems in making that case completely watertight, there being many and varied arguments of varying strength.

70 This gedanken-experiment appears to be a variation on the theme of one initially offered up by Kibble (1981); though Kibble used it to argue that semiclassical gravity is viable. Kibble was followed by Page and Geilker (1981), who attempted to actually perform a version of Kibble's experiment. See Mattingly (2006b) for more discussion. Leslie Ballentine writes of the Page-Geilker experiment: "A less surprising experimental result has seldom, if ever, been published!" ((Ballentine, 1982), p. 522). Of course, we know that if we perform a macroscopic measurement of the kind carried out—gravitational measurements of two lead balls with two possible position states—then individual measurements will track the definite, individual configuration, not the average of all possible configurations. Not surprising, perhaps, but it makes its point well: in the Everett picture, for example, if the gravitational field were not quantized, then the measurements to determine its values in the case of a mass in a quantum superposition of position states would track the average centre of mass. Naturally, this is a variation on the measurement problem, and gives some of the motivation for Penrose's collapse approach discussed in §5.3.7.

Fig. 5.1. Unruh's Schrödinger-Cavendish experiment: initial state

Now, according to the semiclassical quantum gravity approach, the gravitational field outside of the box is described by the (here abbreviated) semi-classical
field equations: Gμν = 8π⟨Tμν⟩. Given the dependence of the balance on the positions of the masses, its state is determined by that of the system inside the box. The states of the box are split up into two types: those for which the masses are held apart, |⇐⇒⟩; and those for which they are together, |⇒⇐⟩. The initial expectation value will be:

⟨Tμν⟩₀ = ⟨⇐⇒| Tμν |⇐⇒⟩   (5.7)

Time evolution will take this state into a linear superposition of both kinds of states (masses apart and masses together):

|φ⟩ = α(t) |⇐⇒⟩ + β(t) |⇒⇐⟩   (5.8)

The squared moduli of the coefficients are:

|α(t)|² ≈ e^{−λt}   (5.9)

|β(t)|² ≈ 1 − e^{−λt}   (5.10)

The expectation value after the system has been evolved is:

⟨φ| Tμν |φ⟩ = |α|² ⟨⇐⇒| Tμν |⇐⇒⟩ + |β|² ⟨⇒⇐| Tμν |⇒⇐⟩ + 2Re[α*β ⟨⇐⇒| Tμν |⇒⇐⟩]

Substituting the (empirically observed) values for the coefficients, and taking the final 'interference' term to be negligible, we have:

⟨Tμν⟩_t = e^{−λt} ⟨⇐⇒| Tμν |⇐⇒⟩ + (1 − e^{−λt}) ⟨⇒⇐| Tμν |⇒⇐⟩   (5.11)

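Equations (5.9)–(5.11) can be tabulated directly (a sketch with invented scalar stand-ins `T_APART` and `T_TOGETHER` for the two expectation values, and the decay rate set to one click per unit time): the semiclassical source drifts smoothly and continuously between the two configurations.

```python
import math

T_APART, T_TOGETHER = 1.0, -1.0   # scalar stand-ins for the two expectation values
LAM = 1.0                         # decay rate: one click per unit time, on average

def semiclassical_T(t):
    """Eq. (5.11): the smoothly varying source seen by the Cavendish balance."""
    p_apart = math.exp(-LAM * t)          # |alpha(t)|^2, eq. (5.9)
    return p_apart * T_APART + (1 - p_apart) * T_TOGETHER

# The predicted source interpolates continuously from 'apart' to 'together':
trajectory = [semiclassical_T(t) for t in (0.0, 0.5, 1.0, 5.0)]
```

It is exactly this continuous interpolation, rather than a sudden jump when the counter clicks, that Unruh finds objectionable in the semiclassical prediction.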
The formalism leads us to predict that the Cavendish balance will slowly (and continuously) shift from one equilibrium position, where the masses are apart (as in fig. 5.1), to another, where the masses are together (as in fig. 5.2). However, as Unruh points out, this is at odds with what we expect to happen: rather than a smooth swinging around of the balance, we expect a sudden swing when the counter clicks. If it does move like this, then given the 'slow swing' prediction, the semi-classical equation cannot be correct (not even as a good approximation). This 'discontinuity problem' is a further difficulty, connected to the quantum measurement problem. According to collapse interpretations, when we make an observation of the sealed system (by peeking into the box) the state changes discontinuously from a superposition α|⇒⇐⟩ + β|⇐⇒⟩ to a state with definite positions for the masses, |⇒⇐⟩ or |⇐⇒⟩, with probabilities |α|² and |β|² respectively. But this discontinuous collapse translates into a discontinuous shift


in ⟨Tμν⟩, so that the conservation law Tμν;ν = 0 is violated (where ';ν' denotes the covariant derivative with respect to ν). This violation can be exploited for superluminal signalling—this latter problem forms the basis of the much-cited thought experiment by Eppley and Hannah.


Fig. 5.2. Unruh's Schrödinger-Cavendish experiment: final state

Eppley and Hannah's argument is intended to show that the gravitational field has to be quantized. It is set up as a reductio ad absurdum argument: assume that semiclassical gravity is a consistent and physically reasonable theory. One then uses a gravity wave to simultaneously measure both the position x and momentum p of some putatively quantum body at a level of precision that violates the uncertainty relations (i.e. that beats Δx · Δp ≥ ℏ/2). In other words, although the quantized matter source is subject to the uncertainty relations, the classical wave is not, and hence it can be exploited (so the argument assumes) to get around the uncertainty in the quantized system. Therefore, the original assumption, that semi-classical gravity is consistent and physically reasonable, must be false. James Mattingly (2006) has objected on a variety of grounds to the drawing of the "necessity of quantization" conclusion from such thought experiments. Firstly, when tested in the way suggested—per impossibile, as Mattingly argues in a different paper (Mattingly, 2006) (see below)—the uncertainty relations might be found to be violated. In other words, checking for such violations is something to be determined by experiment, not from the comfort of one's armchair. Secondly, he argues that the power of the uncertainty relations is interpretation-dependent: in some interpretations, the uncertainty relations reflect our ignorance of the 'real stuff' that is going on in the world. Mattingly takes issue, then, with the tenability of the thought-experiment invoked by Eppley and Hannah and those like it. However, Unruh's experiment is nearer to actual performability than is Eppley and Hannah's. Indeed, Unruh writes that the Page and Geilker experiment illustrates "the ease with which such a gedanken experiment could be made real" ((Unruh, 1984), p. 237). Mattingly's complaint against these thought experiments has to do with their physical realizability: far from being performable, Mattingly argues that they are in fact nomologically and, indeed, metaphysically impossible. The basic idea at the core of his argument is an old and well known one: measuring position (i.e. localizing a particle) requires a concentration of energy; as this energy is forced to occupy smaller and smaller volumes, a black hole will form and swallow up any measuring equipment! This seems to be a strong argument against the Eppley-Hannah experiment; however, Unruh's experiment appears to be unaffected. Rosenfeld, who is often aligned with the 'necessity of quantization of the gravitational field' view, in fact dismissed such arguments (see (Rosenfeld, 1963), for example). The idea, in more intuitive terms, is that quantumness spreads like a virus through classical systems, so that a classical system interacting with a quantum system and remaining classical is impossible—since gravity couples to all mass-energy sources, it has to be quantized. Rosenfeld was surely right in thinking that the decision to quantize some object has to come from empirical evidence, not a priori considerations. The issue of whether the gravitational field has to be quantized remains an open one.
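The bound that the Eppley-Hannah gravity wave is supposed to beat can itself be checked numerically for any quantum state (a sketch of my own in ℏ = 1 units, with an invented grid and a Gaussian test state, which should saturate Δx · Δp = 1/2 up to discretization error):

```python
import numpy as np

hbar = 1.0
x = np.linspace(-40, 40, 2**14)
dx = x[1] - x[0]

def uncertainties(psi):
    """Return (delta_x, delta_p) for a 1-d wavefunction sampled on the grid x."""
    psi = psi / np.sqrt(np.sum(np.abs(psi)**2) * dx)   # normalize
    prob = np.abs(psi)**2
    mean_x = np.sum(x * prob) * dx
    var_x = np.sum((x - mean_x)**2 * prob) * dx
    dpsi = np.gradient(psi, dx)                        # d(psi)/dx
    mean_p = np.real(np.sum(np.conj(psi) * (-1j * hbar) * dpsi) * dx)
    mean_p2 = np.real(np.sum(np.abs(hbar * dpsi)**2) * dx)  # <p^2> via |p psi|^2
    return np.sqrt(var_x), np.sqrt(mean_p2 - mean_p**2)

# A Gaussian with sigma_x = 1 is a minimum-uncertainty state:
sx, sp = uncertainties(np.exp(-x**2 / 4))
```

No quantum state evades the bound; the thought experiment's whole force rests on the *classical* wave not being so constrained.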
I should, however, point to a recent proposal by Adrian Kent that might serve to close the issue up.71 Kent (2005) proposes a simple (and 'doable') experiment to test the local causality of spacetime. That this experiment is feasible points, at least, to the fact that the question of the quantum properties of the gravitational field is indeed an empirical one. The experiment is a variation on Page and Geilker's experiment, only now involving a pair of Cavendish bars coupled to a pair of spacelike separated detectors used to perform a standard Bell-type experiment. General relativity is locally causal (or stochastic Einstein local, in Butterfield's terminology). This is understood to mean that the probability distribution of metric and matter fields in a region of spacetime is determined only by events in the past of the region (i.e. not at spacelike separated points or regions). The straightforward implication is, then, that any theory formulated with respect to a spacetime background described by general relativity (i.e. classical and geometric) would also have to be locally causal: this would include semiclassical theories

71 I thank Jeremy Butterfield for bringing this paper to my attention. Butterfield (2007) discusses the experiment under the banner of 'stochastic Einstein locality'.


of gravity. However, one of the common expectations of quantum gravity is that it should allow superpositions of spacetime geometries: given spacelike separated entangled quantum systems (pairs of photons in a polarization singlet state, of the kind found in the wings of Bell experiments), we should be able to render these superpositions macroscopic if we couple the systems at the spacelike wings to some macroscopic device, such as a Cavendish balance, that displays different mass configurations depending on what is measured on the quantum system. Given the entanglement, this should then lead to a violation of local causality by spacetime: the quantum correlations of the photons will be mimicked by the states of the Cavendish balance (four permutations of the states of the masses on the torsion balance). This is the raw outline of Kent's experiment. If we measure a violation (the intuitively expected result), then general relativity cannot provide a universal description of gravity and spacetime: we need a quantum theory of gravity. However, it is clear that this result is not the kind of thing we can know a priori: it might be the case that we don't detect any violation at all. In this case quantum theory would fail to describe all possible Bell-type experiments. Alternatively, we might find "the coexistence of a quantum theory of matter with some classical theory of gravity which respects local causality, but which has the surprising property that classical gravitational fields do not couple to classical matter in the way suggested by general relativity" (p. 3). This is simply more grist for Rosenfeld's mill.72 5.6.1.2 Important Results in the Semi-Classical Theory. There is a natural inverse relation between semiclassical gravity and the theory of quantum fields on curved spacetime: the former describes the way in which quantum fields act as sources for the gravitational field, and the latter describes how gravity impacts on quantum fields.
This framework, though relevant to the construction of a full-blown quantum theory of gravity, leads to certain results that function as benchmarks for these theories. The most important such result, as mentioned earlier, is the Hawking effect, which exposes connections between quantum field theory, general relativity, and thermodynamics. By considering the behaviour of quantum fields near black holes, Hawking discovered that they (that is, black holes) radiate. The radiation is in the form of outgoing particles that are created in the exterior of the black hole, and have the effect of carrying mass-energy away from the hole, resulting in black hole 'evaporation'. The spectrum of black hole radiation must be a prediction of any viable quantum theory of gravity, for, given the ingredient theories, there will be a dynamically possible scenario in which such effects are manifested. Both string theory and loop gravity can—albeit with some difficulty—get the correct value. Let us take a brief detour into black holes and their quantum behaviour. 72 See Wüthrich (2007) for an excellent examination of the metaphysical issues surrounding semi-classical theories of gravity, including the impact of some of the most recent debates in this area (connected to viewing general relativity as an effective field theory—Wüthrich ultimately argues that the alternative accounts "at least prove the tenability of an opposition to quantization" (p. 777)).


Black holes provide a central testing ground for up-and-coming quantum gravity theories. The quantum theory of black holes is especially interesting because it includes aspects of gravitation, quantum theory, and thermal physics. There is an immediate and intuitive connection that is easy to see if one considers what happens when something falls into a black hole (now understood in the old-fashioned way for the purposes of the example): whatever goes in—or, given what we said above, most of what goes in (past the event-horizon)—never comes out again. What goes in is 'lost' because black holes have no internal structure. Throw whatever you like in, the black hole just increases its mass. In particular, throw an object that has a high entropy into a black hole and it appears that you can thereby eliminate the entropy, effectively reducing the entropy of the rest of the Universe. Prima facie, this is, of course, a clear violation of the second law of thermodynamics. However, the connections go much deeper than this: Hawking (1971) proved that the total area of a family of event horizons associated to a family of black holes will always increase. This makes the area sound like entropy: that too always increases, or at least it seemed to until we started thinking about throwing things into black holes. This gives an analogy with the second law of thermodynamics, only this time involving the area of horizons. Of course, if entropy just is area then entropy doesn't decrease when we throw something into a black hole: area increases and, therefore, so does entropy! There are more connections one can make between thermal physics and black hole physics: the mass M of a black hole is analogous to energy E; the surface gravity of a black hole κ is analogous to the temperature T; likewise, there are analogies for the other laws of thermodynamics. Jacob Bekenstein was the first to take this analogy seriously and argue that entropy really is just proportional to area.
That is, the area of black holes is another source of entropy that has to be added when we consider the total store of entropy in the world. Hence, we have a simple analogy: entropy is fond of increasing and so is the surface area of a black hole. However, this is, of course, not an identity: black hole entropy is a (monotonically increasing) function of the area. Hence, in order to get the entropy of a black hole from its area we need to know what this function is. Let us briefly review some of the details of the connections. Quantum black holes emit (approximately) thermal radiation at temperature:

T ∼ ℏc³/(GN M)   (5.12)

where M is the mass of the black hole. Notice that setting ℏ to zero (i.e. in the classical limit) the temperature vanishes, as we would expect—that is, this is a quantum effect. The thermal nature of black hole radiation leads to the idea that black holes have an entropy (called the 'Bekenstein-Hawking entropy'):

S = Ac³/(4GN ℏ)   (5.13)

QUANTUM GRAVITY: A PRIMER FOR PHILOSOPHERS

where A is the area of the black hole’s horizon (i.e. the surface area).73 The general expression for the entropy of a system is written in terms of the number N of quantum states (microstates) of the system:

S = ln N   (5.14)

Given eq. 5.13 this leads us to the following (rough) value for this number of microstates:

N ∼ e^(Ac³/(4GN ℏ))   (5.15)
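To make these magnitudes concrete, here is a rough numerical sketch for a solar-mass black hole, restoring kB and the standard factor of 8π in the Hawking temperature (which the order-of-magnitude eq. 5.12 suppresses); the constant values are standard SI figures, rounded:

```python
import math

# Rough numbers for a solar-mass black hole (SI units; rounded constants).
G    = 6.674e-11      # Newton's constant, m^3 kg^-1 s^-2
c    = 2.998e8        # speed of light, m/s
hbar = 1.055e-34      # reduced Planck constant, J s
k_B  = 1.381e-23      # Boltzmann constant, J/K
M    = 1.989e30       # one solar mass, kg

# Schwarzschild radius and horizon area
r_s = 2 * G * M / c**2
A   = 4 * math.pi * r_s**2

# Hawking temperature: T = hbar c^3 / (8 pi G M k_B)
T = hbar * c**3 / (8 * math.pi * G * M * k_B)

# Bekenstein-Hawking entropy (eq. 5.13, with k_B restored):
# S / k_B = A c^3 / (4 G hbar)
S_over_kB = A * c**3 / (4 * G * hbar)

print(f"r_s ≈ {r_s:.3e} m")        # a few kilometres
print(f"T ≈ {T:.3e} K")            # tens of nanokelvin
print(f"S/k_B ≈ {S_over_kB:.3e}")  # ~10^77, so N ~ exp(10^77) microstates
```

The entropy that comes out, of order 10⁷⁷ in units of kB, is the "huge number" referred to in the text.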

This is a huge number, implying that black holes have enormously high entropies.74 As I mentioned earlier, one of the ‘viability tests’ of a quantum theory of gravity is to compute the Bekenstein-Hawking entropy using materials internal to the approach, so as to give a microscopic description of black holes. That is, one should be able to use the theory to compute the quantum states of a black hole and, hopefully, find that the count matches the Bekenstein-Hawking entropy. This has been done in a number of approaches, albeit with varying degrees of generality and success.

5.6.2 Covariant Quantization

Much of the early work in quantum gravity followed the route of ‘covariant quantization’, largely as a result of the astonishing successes involving the quantization of the other forces using such methods. The viewpoint of covariant approaches is that gravitation is simply a gauge field theory which describes the behaviour of massless, self-interacting, spin-2 bosons. The theory is in principle of a kind with the field theories for the other interactions. The fundamental particles of the theory, namely gravitons, are treated as no different in principle from other particles such as photons and electrons. Crucially, these particles are taken to move in a flat, fixed Minkowski spacetime. Hence, gravity is no longer considered to be a geometric theory, in which gravitation is a result of curved spacetime; rather, the particles behave as though they were moving in such a curved spacetime when ‘in reality’ they are moving as they do in virtue of graviton exchange. It is this feature that is responsible for the animosity often shown between the relativistic and particle physics communities: the difference is between quantizing fields in spacetime and quantizing spacetime itself.75

73 Note that we can set c = GN = ℏ = 1 to get the simple expression whereby entropy is just a quarter of the surface area.

74 Note, however, that this large entropy is a property of black holes that holds independently of whether entropy really encodes a count of the number of microstates. It is also noteworthy that the entropy scales with area as opposed to volume, as one might more naturally expect. For a discussion of the significance of this ‘holographic’ property see (Susskind and Lindesay, 2005).

75 Note that wave-particle duality provides a correspondence between the massless spin-2 particles propagating on flat spacetime and gravitational waves moving in flat spacetime. Or, in other words, quantized versions of these waves are the gravitons.

THE MANIFOLD METHODS OF QUANTUM GRAVITY

Often, when some computations are too complex for a theory’s equations to be solved exactly, one resorts to approximation methods. One such method, at the heart of quantum field theory, is perturbation theory. Here one calculates physical quantities of interest as power series expansions in the relevant coupling constant of the theory. Consider again the case of QED. Here the coupling constant is the electric charge on electrons, e. This determines the strength of interactions involving electrons. In fact, one works with the (dimensionless) number α = 2πe²/hc ≈ 1/137, known as the fine structure constant. The small size of this number, much less than unity, allows one to approximate the probability amplitude for some process in QED (such as electron-electron scattering, S(α)) using the first few terms of a power series in α; these terms one represents using (families of) Feynman diagrams:

S(α) = S0 + αS1 + α²S2 + · · ·   (5.16)
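To see concretely why the first few terms suffice in QED, here is a small numerical sketch; the Sn coefficients below are illustrative placeholders (not genuine QED amplitudes), since only the rapidly shrinking powers of α matter for the point being made:

```python
import math

# Illustrative only: the S_n values are made-up O(1) coefficients,
# not genuine QED scattering amplitudes.
e    = 1.602176634e-19   # elementary charge, C
eps0 = 8.8541878128e-12  # vacuum permittivity, F/m
hbar = 1.054571817e-34   # reduced Planck constant, J s
c    = 2.99792458e8      # speed of light, m/s

# Fine structure constant: alpha = e^2/(4*pi*eps0*hbar*c) in SI units,
# equal to 2*pi*e^2/(h*c) in the Gaussian units used in the text.
alpha = e**2 / (4 * math.pi * eps0 * hbar * c)
print(f"alpha ≈ {alpha:.7f}, 1/alpha ≈ {1/alpha:.3f}")  # 1/alpha ≈ 137.036

# Successive terms alpha^n * S_n shrink by roughly two orders
# of magnitude per order, so truncation is a good approximation.
for n, S_n in enumerate([1.0, 0.5, 0.3, 0.1]):
    print(f"order {n}: term ≈ {alpha**n * S_n:.2e}")
```

For a coupling near unity (as in the strong interaction), the same loop would show no such suppression, which is the failure of the scheme described next.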

In the case of QED, as we add more terms to the series the contributions reduce substantially, so one can work with the first few terms and ignore the rest. However, if we crank up the strength of the coupling, the perturbation scheme begins to fail. Already in the physics of strong interactions (where the coupling approaches unity) this scheme falters (new physics can appear at high coupling).76 This failure necessitates the introduction of alternative approximation schemes, or else exact methods. Since we turned to perturbation theory in the first place because of the computational complexity, it goes without saying that these alternative methods are very hard indeed.

This didn’t stop particle physicists from applying the arsenal of their approximation methods to the gravitational field. Even in this case, where the coupling constant is relatively small, we still face the fact that such expressions as eq.5.7.3 are divergent. The programme of conventional quantum field theory is in many respects an exercise in perturbation theory, formulated in a four-dimensional covariant manner. The covariant quantization method involves the application of the particle physicists’ machinery of perturbative quantum field theory to gravity. The key step in this approach is to split the spacetime metric gab into two pieces:

• a kinematic part that functions as the background against which quantization is defined (giving us machinery such as microcausality, inner product, and so on), generally chosen to be the flat Minkowski metric ηab;

• a dynamical part, hab, that will represent the dynamical field to be quantized; this is understood as measuring the ‘deviation’ of the physical metric from the chosen background.

76 I should point out that this is not ‘globally’ true: there are certain ‘asymptotically free’ sectors of the theory of strong interactions (i.e. quantum chromodynamics, or QCD) where perturbation theory is applicable. The reason is that asymptotic freedom implies the existence of a fixed point as the momentum goes to infinity; at this fixed point the coupling constant vanishes.

The original spacetime metric gab is then written as:

gab = ηab + GN hab   (5.17)

where GN, Newton’s constant, functions as the coupling constant with respect to which one expands in the perturbation series. The idea, then, is to view only the part hab as the physical gravitational field, and one disconnects this from the dual rôle it plays in determining the spacetime geometry in the context of Einstein’s theory of general relativity. This gravitational field lives on the background geometry defined by ηab. Hence, quantization occurs with respect to a fixed spacetime, just as in standard quantum field theories. What we get, as a result of this quantization, is a theory of massless, spin-2 particles: the quanta of the gravitational field, namely gravitons. The interactions of these gravitons are determined by the Einstein-Hilbert action. Given this definition of the fields on a background and the action, one can engage in perturbative methods, i.e. compute graviton-graviton scattering, and so on. Being a perturbative approach it is an approximation rather than an exact theory; as Mandelstam points out, it can only be regarded as a “provisional solution of the problem [of quantum gravity]” ((Mandelstam, 1962), p. 346).77 Reading the literature on perturbative quantization methods would, however, often lead one to believe otherwise. It isn’t hard to see why many general relativists dislike this approach. It does away with what is usually seen to be the defining feature of general relativity: background independence. One has separated out the dual function played by the single object in general relativity. In general relativity the metric determines both the gravitational field structures and the geometry of spacetime. Not so in the covariant approach; here a separate object performs each function.
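As a schematic illustration of what this expansion looks like (standard weak-field power counting rather than anything derived in this chapter; the normalization of the coupling κ varies across conventions, a common choice being κ = √(32πGN)):

```latex
% Weak-field expansion of the Einstein--Hilbert Lagrangian, with
% indices and numerical factors suppressed; \kappa \sim \sqrt{G_N}.
\mathcal{L}_{EH} = \frac{1}{16\pi G_N}\sqrt{-g}\,R
  \;\sim\; (\partial h)^2
  \;+\; \kappa\, h\,(\partial h)^2
  \;+\; \kappa^2\, h^2\,(\partial h)^2
  \;+\; \cdots
```

Since κ ∼ √GN carries negative mass dimension (in natural units), each further order in the expansion brings worse ultraviolet behaviour; this power counting is one way of anticipating the non-renormalizability discussed in the surrounding text.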
However, this approach certainly has a strong pedigree (as we saw in §5.4): Rosenfeld, Gupta, Feynman, DeWitt, and other luminaries of physics all did serious work on it. As with the canonical quantization method (discussed in §5.6.4), pursuing such a path was perfectly natural at the time: it is rational to use the methods one already has at one’s disposal before breaking with tradition and following an entirely new path. As we mentioned earlier, the theory is not renormalizable, i.e. the quantum theory possesses infinitely many undetermined parameters.78 The standard take on such theories is that they are not able to make predictions.

77 The canonical quantization approach, if successful, would provide us with an exact quantization, though one has to trade in the manifest covariance of the perturbative approach; see §5.6.4. Note that string theory is, thus far, only known through its perturbation expansion, though more recent work tackles its non-perturbative features; see §5.6.3.

78 The technical explanation for why things go wrong with gravitation when we try to quantize it in this way is that gravity is described by a non-Abelian gauge field theory. The quanta of the field, gravitons, self-interact; they exchange gravitons with themselves! This is quite unlike the electromagnetic field and its quanta, photons: there we have a field described by an Abelian gauge field theory. However, some non-Abelian gauge field theories are renormalizable: the work on the covariant quantization of gravity, of Feynman and DeWitt, was instrumental in ’t Hooft and Veltman’s later (Nobel prize winning) work on the quantization of gauge field theories.

What it signifies

is that the perturbative expansion is not proving to be a good approximation to the physics at short distances/high energies. Note, however, that many string theorists view the perturbative non-renormalizability of general relativity as evidence that the quantum theory is inconsistent.79 This motivates their shift to an alternative theory, and their claim that string theory is the only viable quantum theory of gravity. Such claims are simply not true: loop quantum gravity shows quite clearly that the nonperturbative theory can make sense, and be finite, despite the perturbative problems. What this shows is that there must have been assumptions going into the perturbative methods that caused the failures. One such assumption is that of smoothness of spacetime structure at arbitrarily small distances. Being a background independent approach, loop quantum gravity seeks to uncover this structure rather than imposing it at the outset.

The diffeomorphism invariance of general relativity is in large part at the root of the problems with the covariant perturbative approach to quantum gravity. Recall that diffeomorphism invariance is a gauge freedom in the theory that derives from the multiplicity of representations by localized metrics of one and the same geometry. This results in a breakdown of unitarity in diagrams containing closed graviton loops. The unitarity of the theory can only be restored by adding correction terms (corresponding to ‘ghost particles’). However, adding the ghost particles still leaves another problem: there are divergences in the Feynman diagrams. The absence of a dimensionless coupling constant in general relativity means that the standard renormalization procedures can’t work in this context: the divergences are worse than logarithmic. This problem leads the covariant approach down several distinct avenues: supergravity and superstrings are the two main lines (with M-theory containing both as limiting cases).
Another option was to treat general relativity as an ‘effective field theory’. That still leaves one with the problem of specifying the theory that describes the physics at scales beyond the effective range of this approach to quantum gravity. Hence, though it has proven useful, leaving behind many tools that form a vital part of many other approaches to quantum gravity, the covariant perturbative approach is no longer pursued, and is generally considered to be a dead-end.

5.6.3 String Theory

String theory is, as Ed Witten concisely puts it, “a quantum theory that looks like Einstein’s General Relativity at long distances” ((Witten, 2001), p. 1577). Despite its present status as the leading candidate for a quantum theory of gravity (and indeed a unified theory of all interactions), string theory did not begin life as a quantum theory of gravity. Rather, it began as a framework for modelling the strong interactions in the 1960s (the ‘dual resonance model’ suggested by Gabriele Veneziano, where ‘resonance’ refers to the fleeting nature of certain hadrons), in response to the profusion of hadrons that were being thrown up by the new high-powered accelerators; these particles had spins greater than 1 and the corresponding theory was, therefore, nonrenormalizable (in four spacetime dimensions).80 It was discovered that the various oscillation modes of strings corresponded to the great variety of hadrons; hence, the dual models were reinterpreted as string theories. The initial argument behind string theory was, then, of the ‘inference to the best explanation’ kind (there wasn’t any other viable explanation available at the time!). However, it became clear (thanks to ‘asymptotic freedom’) that standard quantum field theory could account for the strong interaction without recourse to the higher dimensional entities postulated by string theory, the result being quantum chromodynamics.

Curiously, the feature that originally led to string theory’s rejection was the same feature that led to its resurgence: the existence of a massless spin-2 particle in the string’s spectrum, i.e. a quantum state of the string that corresponds to the graviton. Even once it was realized that it had the potential to incorporate gravity (on account of the correspondence between the spin-2 mode and the graviton),81 as well as the other interactions (demonstrated by Green and Schwarz (Green and Schwarz, 1984)), acceptance was relatively slow, largely as a result of initial inconsistencies in the bosonic theory. But, after various tweaks to string theory resulting in improved behaviour82 (primarily the introduction of spacetime supersymmetry into the theory) it quickly gained momentum. While certainly not regarded as fact, string theory is becoming ‘mainstream physics’, with numerous textbooks and university courses (even at the undergraduate level!).83

String theory is considered to qualify as a quantum theory of gravity for

79 This is not just a string theoretic response; there were many pre-string efforts to find an alternative to general relativity (or modifications thereof) that recovered the physics of general relativity ‘in the large’, but that involved new physics at small distances, such that these modifications cured the ultraviolet divergences.

80 Recall

that the strong force holds together protons and neutrons inside the nuclei of atoms, and also holds together the quarks inside the protons and neutrons. The exchange particles that mediate the force are gluons. When we work out the amplitudes for processes involving some hadrons we find that prima facie distinct families of Feynman diagrams are equivalent or ‘dual’ descriptions of the same thing. The strangeness of this, from the perspective of quantum field theory, is what prompted Nambu (1970) to suggest a string model (on the basis of Veneziano’s discovery). Apparent distinctness at the particle level becomes obvious (topological) equivalence at the string level, where the Feynman diagrams are Riemann surfaces.

81 The particle responsible for mediating the gravitational interaction has to be massless to reproduce the (gravitational) inverse-square law. It cannot have half-integral spin, because the Pauli exclusion principle renders a large-scale coherent field impossible. Spin greater than 2 rules out a static force, spin 1 gives a repulsive force (for like particles), and spin 0 gives a scalar field. This narrows things down to spin 2, associated with a symmetric Lorentz tensor field. As Isham ((Isham, 1991), p. 145) points out, the spin-0 case corresponds to Newtonian gravity and the spin-2 case to general relativity.

82 I should point out, however, that there is as yet no proof that the theory is free of the kinds of UV divergences that plagued the covariant perturbative approach.

83 Just how this state of affairs was achieved would make an excellent ‘sociology of science’ case-study, or just an interesting case-study for historians and philosophers of science. One such small-scale study has already been undertaken: (Galison, 1995). An excellent collection of historical essays on the origins of string theory can be found in (Gasperini, 2008). More information can also be found in John Schwarz’s entry at the Caltech Oral Histories Archive: http://oralhistories.library.caltech.edu/116.

two reasons: firstly, as mentioned above, the string spectrum has a vibrational mode corresponding to a massless spin-2 particle, which can be taken to represent a graviton.84 Secondly, the ambient spacetime containing the string has to satisfy an equation that has Einstein’s field equations as a large-distance limit; this equation has to be satisfied in order for the theory to be well-defined. The physics of general relativity appears as a low energy/long distance limit of string theory, but the theory gets modified at higher energies, as one would expect. On this basis, a claim often made by string theorists is that string theory predicts general relativity! This is an absurd claim: one may make a case for saying that string theory explains gravity and general relativity, by reducing them to strings and their vibrations and interactions, but prediction is essentially a future-directed matter. Alternatively, one finds string theory being motivated through its necessary inclusion of gravity. As it is put in a recent textbook on string theory: “Ordinary quantum field theory does not allow gravity to exist; string theory requires it” ((Becker et al., 2007), p. 4). However, as Rovelli points out, “[t]he fact that string theory includes GR is a necessary condition for taking it seriously, not an argument in support of its physical correctness” ((Rovelli, 1998), p. 5); that is, it does not amount to a sufficient condition. A prominent string theorist takes matters even further, interpreting the necessary inclusion of gravity as providing strong grounds (confirmation!) for belief in the theory:

A priori, we shouldn’t have expected that a randomly chosen framework would correctly predict the existence of spin-0, spin-1/2, spin-1 particles with spin-2 gravity and the right interactions and the right incorporation of the quantum effects. However, string theory is able to do that (and no other theory can), besides dozens of other successes.
I think that this result itself is already such a non-trivial confirmation of the theory that it is unreasonable to believe that string theory could be wrong. ((Motl, 2007))

This seems to betray a lack of knowledge of how string theory was discovered; why it was pursued; and how confirmation works. String theory was not quite discovered by accident, or no more so than most theories. As we said, it started as a theory of the strong interaction only to be shelved because of the spin-2 modes, and other ‘unsavoury’ features. Of course, it was then just this feature that motivated the development of string theory after its losing out to QCD as a theory of hadrons. It wasn’t a “randomly chosen framework”. If it didn’t have this feature then it would currently join countless other theories on the scrapheap. On the issue of confirmation, one might as well say that general relativity predicts Newtonian gravity because it emerges in the appropriate limit.85

84 For a thorough investigation of the relationship between spin-2 fields and general relativity (general covariance), see (Wald, 1986).

85 Note that Joseph Polchinski has suggested that one might qualify the claim that string theory predicts gravity as follows: “once one starts from the principle that the fundamental degrees of freedom are one dimensional, one derives as consequences gravity, a definite dimensionality to spacetime, and completely determined dynamical laws (e.g. no free parameters). This is what one hopes to see in a theory, in the way that the equivalence principle determines Einstein’s equations, but more complete in that Einstein’s equations govern only the gravitational part of the theory, and only approximately” (email communication). Given this, if we adopt a Hempelian approach, then we can agree that string theory explains gravity (since it is deducible as a logical consequence), but it doesn’t make sense to say that this constitutes prediction. However, this account of explanation (and prediction) is, in any case, flawed for many reasons; those unfamiliar with these reasons should consult (Salmon, 1984). Moreover, as Polchinski also notes, the one-dimensionality is itself a derived principle, governing the perturbative description.

5.6.3.1 String Perturbation Theory

String theory, in its original perturbative formulation, was really just a two dimensional field theory on the worldsheet W traced out by the strings (the quanta of the theory).86 W is mapped into the target spacetime manifold M via a coordinatization function X : W → M. This theory is then quantized and one ends up with excitations of the string corresponding to particles propagating through M.87 In other words, the perturbative description of string theory amounts to a 2-dimensional field theory on the worldsheet of a string, where the coordinates of the string are the field variables. String dynamics is two-dimensional because a one-dimensional object sweeps out a two-dimensional worldsheet. The dynamics must describe how the worldsheet of the string unfolds in spacetime. In order to get gravity out of this framework a parameter α′ related to the string tension, T = 1/2πα′ (the tension being energy per unit length), needs to be set so that the string length √α′ is of the order of the Planck length, namely 10⁻³³ cm. This sets the length of strings in the theory; naturally, they are far too small to be observed in present day experiments. Important for the mechanism of interaction between strings is a new field known as the ‘dilaton’. The dilaton plays a rôle in determining the strength of the interactions through the string coupling constant gs. This sits at the regions of splitting and joining of strings, much like the coupling constant in QED sits at the vertices.88 However, the fact that the splitting and joining is ‘spread out’,

86 The action for this theory follows the action for a simple mass point: S = m∫ds (for mass m and path length s). This gives us the motion of the particle. Similarly, the defining action for string theory is S_NG = T∫dA (where T = 1/2πα′ is the string tension, α′ has dimensions of length squared, in fact √α′ is the string length scale, and A is the area of the worldsheet). Introducing a metric h_ab (a, b = 1, 2) on the worldsheet we can rewrite this as S_NG = T∫d²σ √−h. This is the ‘Nambu-Goto action’. It is generally replaced with the Polyakov action: S_P = T∫d²σ √−h h^ab ∂_a X^μ ∂_b X_μ (where X^μ(σ¹, σ²) is a map describing the embedding of the worldsheet into the target spacetime, with μ = 1, ..., D).

87 String field theory modifies this framework by considering string theory as a theory in spacetime, rather than on the worldsheet. A string field Ψ[X(W)] is a map from the configuration space of strings to the complex numbers.

88 There are many ways to understand the dynamics of strings: first and second quantized. The first quantized formulation focuses on the worldsheet of the string and involves summing over worldsheets. The second quantized picture, string field theory, involves field operators that create and annihilate strings.
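For reference, the two worldsheet actions mentioned in footnote 86 can be written out in a standard modern normalization (sign conventions and the factor of 1/2 in the Polyakov form vary across textbooks):

```latex
% Nambu--Goto action: proportional to the area of the worldsheet,
% with induced metric \partial_a X^\mu \partial_b X_\mu.
S_{NG} = -T \int d^2\sigma\,
  \sqrt{-\det\left(\partial_a X^\mu\, \partial_b X_\mu\right)}

% Polyakov action: classically equivalent, with an independent
% worldsheet metric h_{ab} as an auxiliary field.
S_{P} = -\frac{T}{2} \int d^2\sigma\, \sqrt{-h}\,
  h^{ab}\, \partial_a X^\mu\, \partial_b X^\nu\, \eta_{\mu\nu}
```

Eliminating h_ab from the Polyakov action via its equation of motion recovers the Nambu-Goto form, which is why the two are used interchangeably at the classical level.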

Fig. 5.3. Interactions between strings take place by (local) joining and splitting, i.e. when two points of the strings touch. (xy-pic code used with the kind permission of Aaron Lauda.)

because of the extendedness of the strings, means that the interactions don’t involve singularities: the notorious problems of renormalization are avoided. In string theory Newton’s constant GN is proportional to the square of the string coupling constant times the square of the string length: GN ∼ gs²ls² (for superstring theory). This series of facts results in the ability to reproduce the low-energy physics of GR by means of a perturbative expansion corresponding to splitting and joining of strings in flat spacetime. In this sense, general relativity looks like an effective field theory, with string theory the more fundamental theory.

When the string is quantized (assuming flat spacetime and no interactions) we find an infinite hierarchy of ever increasing mass modes. It matters whether the string is open or closed. In the case of the closed string, in the massless part of the spectrum, one finds a mode corresponding to a graviton (a massless spin-two particle). It is for this reason that string theory is taken to constitute a quantum theory of gravity. However, string theory is much more than that, since it also contains modes corresponding to all the other force-carrying particles. It is in this sense a theory of everything, although the particles and interactions it describes do not quite correspond to those we observe.

To get some physical intuition for string theory, let us give a graphical description. Consider the motion of a string from one embedding in spacetime, A, to another, B. Unlike 0D particles, which trace out 1D worldlines as they shift between spacetime configurations, 1D strings sweep out 2D surfaces called worldsheets. An amplitude is assigned to the propagation of the string from A to B; this amounts to a sum over all possible worldsheets connecting the A-configuration and the B-configuration: one follows the Feynman path-integral approach of assigning a weight of exp(iS/ℏ) to each worldsheet with the A- and B-configurations as boundaries. The coupling constant determines the probabilities governing string propagation events, such as splitting and joining of strings (that is, it determines the rate at which strings interact): for large gs

the interactions are strong, which amounts to high probabilities for splitting and joining. Consistency (namely, ruling out tachyons, which imply an unstable vacuum) demands that fermionic degrees of freedom be added to the theory. This still does not deliver a unique theory. Rather, there are five distinct string theories living in 10 dimensional spacetime. These are called type I, type IIA, type IIB, E8 × E8 heterotic, and SO(32) heterotic.

5.6.3.2 Residual Dimensions and their Compactification

The consistency of the quantized version of string theory depends on what is called ‘the critical dimension’ of spacetime. In the case of string theory proper, i.e. bosonic string theory, spacetime must be 26 dimensional for the theory to make mathematical sense. In the case of superstring theory (i.e. with fermions too) the critical dimension is 10. A large part of string theory is devoted to working out how to compactify the residual dimensions, and showing how various properties of our four dimensional world emerge from various kinds of compactification, from the way the strings wind around the compact dimensions, and so on.89 However, compactification is not the only way to account for the four dimensionality of the perceived world. An alternative is what we might call ‘branification’ (more commonly called ‘brane-world scenarios’); see (Randall and Sundrum, 1999) for the original presentation of such models; a lengthy discussion can be found in (Maartens, 2004). The idea here is to view the world we have access to as a ‘slice’ (an embedded brane) in a higher dimensional world (an infinite ambient brane), much like Edwin Abbott’s Flatlanders!

Fig. 5.4. The basic idea of compactification: we have two pictures of a cylinder, but at low energies/small distances the cylinder becomes ‘invisible’, effectively removing a dimension. Now think of the horizontal dimension as our ordinary 4 dimensional spacetime, and think of the vertical dimension as a 22, 6, or 7 dimensional manifold (manifest at high energies, compactified at low energies). The same reasoning leads to the successful approximation of strings by point particles (in the context of quantum field theory).

89 The idea of compactification has much in common with the old Kaluza-Klein attempt to unify gravity and electromagnetism by postulating a fifth dimension, the curvature of which determines the electromagnetic force, just as curvature in the four spacetime dimensions determines the gravitational force. In Kaluza-Klein theory, the size of this extra dimension has to be around the Planck length in order to account for the observed properties.
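To connect the length scales invoked here, a quick numerical check (using standard SI constant values, rounded) confirms that the Planck length, to which both the string length and the Kaluza-Klein radius are tied, is indeed of order 10⁻³³ cm:

```python
import math

# Planck scales from standard SI constants (rounded values).
G    = 6.674e-11    # Newton's constant, m^3 kg^-1 s^-2
c    = 2.998e8      # speed of light, m/s
hbar = 1.055e-34    # reduced Planck constant, J s

l_P = math.sqrt(hbar * G / c**3)    # Planck length, m
E_P = math.sqrt(hbar * c**5 / G)    # Planck energy, J
E_P_GeV = E_P / 1.602e-19 / 1e9     # convert J -> GeV

print(f"l_P ≈ {l_P:.2e} m = {l_P * 100:.2e} cm")  # ~1.6e-33 cm
print(f"E_P ≈ {E_P_GeV:.2e} GeV")                 # ~1.2e19 GeV
```

The corresponding energy, around 10¹⁹ GeV, is some fifteen orders of magnitude above accelerator energies, which is why the extra dimensions are said not to manifest themselves directly.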

Purely for historical interest I note that Einstein (in his ‘middle period’) would not have been happy with the higher dimensions demanded by string theory. On the Kaluza-Klein unified model of gravitation and electromagnetism, involving an additional dimension of space, Einstein writes:

It is anomalous to replace the four-dimensional continuum by a five-dimensional one and then subsequently to tie up artificially one of these five dimensions in order to account for the fact that it does not manifest itself. ((Einstein, 1931), p. 438)

It is not a massive leap to infer that the same would apply to the 26, 10, or 11 dimensions of string theory, superstring theory, and M-theory.90 But is this good reasoning? In most other situations no doubt it would be; it is a simple application of a principle of parsimony or simplicity. But quantum gravity is all about consistency, and if the only way to get a consistent theory is to postulate extra dimensions then should we not accept them? Oskar Klein himself wrote that his unified theory, with its extra coordinate banished to the Planck scale, “has such strange features that it should hardly be taken literally” ((Klein, 1956), p. 59). Indeed, he goes on to remark on the “somewhat repellent appearance of the small length just mentioned” (ibid.). Hence, Klein adopted a decidedly instrumentalist approach to the fifth dimension. But, again, what if the extra dimension did manifest itself, albeit very indirectly? Of course, this is just what string theorists claim: the extra dimensions, despite being ‘curled up’, have ramifications for macrophysics; they explain the particles and laws that we observe. They can even be called upon to explain the weakness of gravity in branification scenarios. Here is an area that might be profitably studied by philosophers: is this taking the notion of observation one step too far? Some hardened scientific realists might even have trouble taking these extra dimensions as real given the suggested way of making them empirically manifest. But just how different is this case from, e.g., Newton’s arguments for the existence of absolute space and time (whereby the effects of absolute space and time were taken to point to their existence)? Not really so different, in my opinion; however, the question deserves closer scrutiny from philosophers.
90 As Joseph Polchinski pointed out to me, Einstein returned to Kaluza-Klein theory in 1938 (working with Peter Bergmann: (Einstein and Bergmann, 1938)) and 1943 (working with Pauli: (Einstein and Pauli, 1943)). In the former paper he writes that “one can assign some meaning to the fifth coordinate without contradicting the four dimensional character of our world” (p. 683), and in the latter that “When one tries to find a unified theory of the gravitational and electromagnetic fields, he cannot help feeling that there is some truth in Kaluza’s five-dimensional theory” (ibid., p. 131). He even goes so far as to suggest that the problem of the inaccessibility of the fifth dimension might be evaded by taking the fields corresponding to the non-singular solutions to be “linearly extended”. They argued in the end that there were no such solutions; however, their paper was incorrect, conflating coordinate and physical singularities. See (Gross, 2005) for more on this aspect of Einstein’s work, including details on the identification of the error in Einstein and Pauli’s paper, and the solutions (known as ‘Kaluza-Klein monopoles’) that result once the error has been corrected; these now play a central role in M-theory.


QUANTUM GRAVITY: A PRIMER FOR PHILOSOPHERS

Indeed, philosophically interesting issues are never far from the surface in string theory. For example, the freedom in the way the dimensions are dealt with leads to an enormous number of four dimensional theories: quantum string theory doesn't describe a unique theory (or vacuum state), but instead appears to describe a 'landscape' of theories, some of which are good at describing our world, and some of which aren't. A serious problem is to find a selection principle to weed out the wheat from the chaff. The situation is analogous to the relationship between the 'merely' kinematically possible motions of some system and the dynamically possible motions: string theory needs a dynamical principle, a criterion for choosing some particular location or region of the landscape. Choosing so as to fit what we see is open to the charge of ad hocness. What one wants, ideally, is some principle internal to the theory, or well-motivated on non-ad hoc grounds. One selection principle that has been utilized to reduce the size of the landscape is the 'Anthropic Principle':91 choose only those worlds that possess laws of physics capable of supporting us humans! A (better) alternative, I think, is to view string theory as an overarching framework in which to construct specific string theories, much as quantum field theory does not name any particular theory but a set of principles that one uses to build such a theory.92

5.6.3.3 The Second Revolution

The 'first revolution' picture was strictly perturbative. It was known that such a framework was an inadequate model. Strings are treated in much the same way as gravitons are treated in the covariant perturbative approach, namely as propagating in an independent background spacetime. That was known to be inadequate too, but the aim was to push for clues about the nature of the right theory, and the kinds of physical quantities it would involve.
Hence, the same idealization is present in string theory: though we know (or think we know) that gravitons and strings are really bound up with spacetime—being a wave in a geometry in the case of gravitons, and maybe in the case of strings too—given the computational complexities, we can ignore this for the time being. There were two key problems with this first-revolution picture of string theory: the residual dimensions of spacetime and the existence of five string theories. These two problems interact in various ways. The residual dimensions—and the fact that we cannot see them—are resolved by compactification (although this leaves the problem of explaining the mechanism behind the compactification).

91 See (Susskind, 2005) for a book-length discussion written at a popular level. (Weinstein, 2006) offers a philosophical discussion of the anthropic principle in string theory and multiverse cosmology.

92 A potential disanalogy between the string theory and quantum field theory cases is, however, that different quantum field theories are genuinely different theories, in the sense of having different dynamical equations, while in string theory there is just one set of equations and many different solutions. This is viewed as a virtue of string theory since it points to the generation of massive complexity (of solutions) from a handful of central, defining equations (my thanks to Joseph Polchinski for this objection; private communication). However, I would argue that since the solutions are set against different backgrounds, we have a multiplicity of theories in the string theory case too. In any case, I leave this as a problem for the reader.

THE MANIFOLD METHODS OF QUANTUM GRAVITY


The multiplicity problem is resolved by dualities connecting the various string theories. One kind of duality amounts to an equivalence of string theories using different compactifications. This only partially resolves the multiplicity problem, for associated with the different string theories (in 9+1 dimensions) is a massive number of compactified string theories, which amount to yet more distinct theories (in 3+1 dimensions). A key part of the second revolution, leading into nonperturbative sectors of string theory, was the discovery by Joseph Polchinski (1995) of D-branes (nonperturbative excitations—the 'D' stands for 'Dirichlet boundary conditions'), found by imposing yet another consistency condition (to avoid yet another anomaly). D-branes are curiously entwined with black hole physics—the directness of this connection is often papered over in advanced string theory texts. The idea is that D-branes are in fact a configuration (microstate) of a black hole, on which strings can have their endpoints—this is the meaning of the Dirichlet boundary conditions mentioned above: the brane is the boundary for the string's endpoints. Now, in a Type II string theory there can be only closed strings. However, the possibility opened up by D-branes is that one can effectively have open strings after all. To get a feel for this, imagine a closed string near a black hole—here I borrow heavily from (Witten, 2001). Now imagine that a portion of the string slips behind the event horizon: we have (from the perspective of an observer outside of the hole) what looks like a string with endpoints on the surface. Now further imagine that the black hole evaporates. It can't evaporate to nothing, because the string is attached to its horizon, and complete evaporation would mean that we end up with an open string, which is not possible in Type II string theory.
What happens, instead, is that the black hole radiates until it reaches a stable ground-state consisting of a D-brane—in this sense, the D-branes arise from a demand for consistency. A similar situation can arise where there are two black holes and a string plugged into both (i.e. with an endpoint in each). The black holes will again evaporate, and each will end up in a stable state: a D-brane. This prevents the string from having its endpoints in the vacuum.93

5.6.3.4 Duality Symmetries and M-Theory

This new work led to the postulation, by Edward Witten (1995), of a new theory labelled M-theory. The central idea of this new theory is that the profusion of different string theories really exists as a collection of limiting cases of some more fundamental theory living in eleven dimensional spacetime. This hypothesis is given support by the discovery that the different string theories are connected to one another by certain symmetries called dualities. These various string theories are defined on a variety of spacetime backgrounds, so M-theory is conjectured to be background independent. This seems like a reasonable conjecture since M-theory is supposed to 'incorporate' the various string models that are defined with respect to different backgrounds.

93 The end-points can move around on the D-brane, but they cannot leave it unless they join to form a closed string.


However, there remains the pressing problem of actually defining M-theory!94 Edward Witten and others exploited these duality symmetries, and various results from earlier work on supergravity, to argue that there was some more fundamental theory that had the five dual string theories as limiting cases. The limits involve the postulation of an additional dimension, resulting in eleven dimensions of spacetime. This extra dimension has the topology of a circle. But this isn't all: a new kind of higher-dimensional object, a membrane (a 2-dimensional surface), is introduced in order for the theory to make sense in eleven dimensions (superstring theories demand ten). The membrane wraps one of its dimensions around this circle, leaving a one dimensional object to propagate in the remaining nine spatial dimensions. This reproduces string theory: a one dimensional object moving in ten dimensions of spacetime. Witten noticed that the five consistent string theories are uniquely determined by the various ways of wrapping membranes around the extra dimension. The dualities mentioned above hold for the membrane constrained to the nine spatial dimensions. This new eleven dimensional theory, with its new ontology, was christened M-theory by Witten, where 'M' is a placeholder to be filled in once the theory has actually been discovered. Hence, string theory is no longer a theory of strings; rather, it is a theory of p-branes (strings are a special case where p = 1: 1-branes).95 Though these M-theoretic ideas are extremely interesting, they are still very much up in the air (even for quantum gravity research!). This makes them very hard to probe philosophically. However, there are more robust elements: the duality symmetries are a case in point, so we briefly say some more about these. T-Duality: Equivalent Geometries and Topologies: T-duality (where 'T' stands for 'target space') is a gauge symmetry of classical string theory concerning the propagation of strings in a background metric spacetime.
One finds that T-duality connects various prima facie distinct backgrounds so

94 There is much excitement at the moment over the status of the so-called 'AdS/CFT' correspondence (aka 'the Maldacena conjecture'), which is believed to be a core part of M-theory, and indeed of quantum gravity simpliciter. This is essentially another duality mapping that would allow one to deal with a background independent theory on the boundary of a spacetime. One can construct a non-perturbative string theory using this duality since the CFT (conformal field theory) side can be algorithmically defined—I owe this point to Joseph Polchinski. This basic idea is known, more generally, as the holographic principle, and is believed to be a necessary principle obeyed by any and all viable approaches to quantum gravity. The holographic principle will no doubt be one of the standard hobbyhorses of future generations of philosophers of physics; however, the subject is too advanced for this primer: I refer the interested reader to (Schwarz, 1999), (Di Vecchia, 1999), and (Petersen, 1999). Some more very brief and technical remarks are contained in the next two footnotes.

95 Hence, there is a general class of extended objects: the p-branes. In the branification scenario mentioned previously, the spacetime we inhabit is viewed as a 3-brane. The study of 3-brane physics led to the Maldacena conjecture. The idea is that the 3-brane, specifically our spacetime described by a four dimensional gauge theory ($\mathcal{N} = 4$ super Yang-Mills theory) without gravity, is a holographic part of a higher-dimensional theory with gravity (type-IIB string theory compactified on $AdS_5 \times S^5$, with the Yang-Mills theory living on the boundary of $AdS_5$). They are, in a sense, dual theories.


that there is no way to physically distinguish between them—that is, they lead to the same physics. This scenario is similar in many ways to the hole problem in general relativity, where we have distinct metrics that have the same observable consequences. Both are generic problems of gauge symmetries. What it shows is that the background metric is not a measurable thing: T-dual metrics will span the same gauge orbit. However, though they share this feature, the symmetries have very different sources. In the case of classical general relativity the problem stems from the diffeomorphism invariance, having to do with the dynamical nature of the metric and the unobservability of manifold points. In the case of T-duality the source is the fact that strings have a length, $l_s = \sqrt{\alpha'}$, and can be wound around compactified dimensions—of course, strings cannot probe distances below $l_s$ (see §5.6.3.5). Taking a simple case in which we have an extra dimension compactified on a circle of radius R, T-duality says that this scenario is equivalent to one with radius $\alpha'/R$, where $\alpha'$ is just the square of the fundamental string length, $l_s^2$. T-duality renders the two type II theories, and likewise the two heterotic theories, (exactly) physically equivalent. This equivalence extends over all orders of the string perturbation expansion. T-duality applies to more complex spaces too, such as Calabi-Yau spaces, which are mirror symmetric—these are especially relevant for superstring theory since they combine with four dimensional manifolds to form the string backgrounds. S-Duality: Equivalent Physics at Strong and Weak Coupling: Like T-duality, S-duality also relates what were previously thought to be distinct string theories. The idea is that the physics of one theory at strong coupling (large $g_s$) is physically equivalent to that of another theory at weak coupling (small $g_s$). This duality holds between the type-I and SO(32) heterotic string theories.
Let $O_I(g_s)$ be some observable in the type-I string theory and $O_{SO(32)}(1/g_s)$ be that observable in the SO(32) theory. Then S-duality says:

$$O_I(g_s) = O_{SO(32)}\!\left(\frac{1}{g_s}\right) \qquad (5.18)$$

The type-IIB theory is self-dual, meaning that:

$$O_{IIB}(g_s) = O_{IIB}\!\left(\frac{1}{g_s}\right) \qquad (5.19)$$
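Returning to T-duality for a moment, the radius equivalence can be made concrete with the closed-string mass spectrum on a circle (a standard textbook formula, not given in the text; bosonic string, units with $\hbar = c = 1$, momentum number $n$, winding number $w$, oscillator levels $N$ and $\tilde{N}$):

```latex
% Mass spectrum of a closed string on a circle of radius R:
M^2 \;=\; \frac{n^2}{R^2} \;+\; \frac{w^2 R^2}{\alpha'^2}
\;+\; \frac{2}{\alpha'}\bigl(N + \tilde{N} - 2\bigr)
% Under the T-duality map R \mapsto \alpha'/R together with n \leftrightarrow w,
% the first two terms are exchanged and M^2 is unchanged.
```

Momentum modes at radius $R$ behave exactly like winding modes at radius $\alpha'/R$, so the two backgrounds give the same physics. S-duality, by contrast, acts on the coupling $g_s$ rather than on the background geometry.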

S-duality enables one to look at the strong coupling behaviour of these string theories using the dual weakly coupled theory.96 In other words, one is led into distinctly non-perturbative territory. Of particular interest is the fate of the two remaining consistent superstring theories, namely type-IIA and $E_8 \times E_8$ heterotic. If we try to run these theories through the S-duality machine, the theories develop an additional spacetime dimension (with a size given by $g_s\sqrt{\alpha'}$). This new eleven dimensional theory has eleven dimensional supergravity as a low-energy classical limit. As we already mentioned, the quantum theory, given the place-holder name 'M-theory', is still to be worked out. One possibility for the 'M' is 'Matrix': here the idea is to replace string coordinates with matrices that commute at large distances but fail to do so at short distances. The fact that those working on this approach do not yet know what the fundamental degrees of freedom are renders it unfit for most philosophical analysis: strictly speaking, there is not enough of the theory to analyze! However, one does sense a definite 'circling in' on this unknown theory.

96 Again, this is related to the so-called Maldacena conjecture, now relating the weak-coupling limit of a gravitational theory to the strongly-coupled limit of a certain kind of gauge theory without gravity. See the previous footnote for more details and references.

5.6.3.5 Many Roads to Quantum Geometry

A result of potential philosophical significance is the derivation of a 'minimum length' in string theory—see (Amati et al., 1989). This minimal length actually refers to the smallest distance that one can experimentally probe using collisions of strings. For a particle with energy E (and setting c = 1), one can probe distances:

$$\Delta X \sim \frac{\hbar}{E} \qquad (5.20)$$

The extra dimension that strings possess (their spatial extension) modifies this somewhat to:

$$\Delta X \sim \frac{\hbar}{E} + G_N E \qquad (5.21)$$

The additional term $G_N E$ corresponds to the strings' being thrown together at very high energies. This gives as a minimal distance (for stringy scattering experiments):

$$\Delta X_{min} \sim \sqrt{G_N \hbar} = l_P \qquad (5.22)$$
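As a quick consistency check (standard arithmetic, not spelled out in the text), the minimal distance in (5.22) follows by minimizing the right-hand side of (5.21) over the probe energy E:

```latex
\frac{d}{dE}\!\left(\frac{\hbar}{E} + G_N E\right)
  = -\frac{\hbar}{E^2} + G_N = 0
\;\;\Rightarrow\;\; E_* = \sqrt{\hbar / G_N}
% Substituting back:
\Delta X_{\min} \;=\; \frac{\hbar}{E_*} + G_N E_*
  \;=\; 2\sqrt{G_N \hbar} \;\sim\; l_P
```

Below $E_*$ the probe's wavelength is too long to resolve finer detail; above it, gravitational smearing grows—so the Planck length is the best attainable resolution either way.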

That is, the minimal length in string theory is the Planck length, as one might expect. This provides an intuitive explanation for why there are no ultraviolet divergences in string theory: the minimal length provides a physical cutoff—it also grounds a notion of quantum geometry in string theory. Likewise, one of the philosophically interesting aspects of T-duality is that it too appears to imply a fundamental length scale $\sqrt{\alpha'}$ for spacetime. To see how this arises, simply note that the duality links physics at radius R to that at radius $\alpha'/R$. Witten spells out the implications of this:

What one might imagine would be a world in which at distances above $\sqrt{\alpha'}$, normality prevails, but at distances below $\sqrt{\alpha'}$, not just physics as we know it but local physics altogether has disappeared. There will be no distance, no times, no energies, no particles, no local signals—only


differential topology, or its string theoretic successor. ((Witten, 1989), pp. 350–1)

It is a little puzzling why Witten nonetheless insists that there would be 'something' beneath $\sqrt{\alpha'}$ corresponding to physical reality, rather than interpreting the topological structure as mere surplus. This might constitute a fallacy of misplaced concreteness. However, it seems that sub-$\sqrt{\alpha'}$ scales can be probed by D-branes, though they too lead to a form of quantum geometry (see below). There are many conceptually (and mathematically) interesting features of D-branes. Most interesting, from a philosophical point of view, is that the coordinates of D-branes (when combined into a system of such) are non-commutative. Hence, we have a direct connection to noncommutative geometry and quantum theory—a quantum geometry of a rather different variety to that found in loop quantum gravity, for example. How does this come about? D-brane positions are represented by matrices rather than numbers or vectors. When we have a system of N (indistinguishable) D-branes, the positions are then given by $N \times N$ matrices. For $N > 1$ these will generally not commute. There is a connection to group theory and gauge theory too: for a system of N D-branes, the physics is given by the gauge group $SU(N)$. Now, since D-branes are micro-black holes, if we stack a system of them together then we can create a macro-black hole. This black hole will then be described by the $SU(N)$ gauge theory, where N depends on how many D-branes we stack. In the limit of large numbers of D-branes (to build up size/mass), this method delivers the desired Bekenstein-Hawking entropy. At least it does for so-called 'extremal black holes', namely those whose charges take the maximum possible value relative to the values of mass and angular momentum—see (Strominger and Vafa, 1996) for details. Any results that string theorists have in this regard, however, are contingent on their extrapolation away from extremal cases.
5.6.3.6 Conclusions

The second revolution picture amounts to the emancipation from the constraints imposed by string perturbation theory. How completely it achieves this is a matter of much controversy. Indeed, many outside of string theory complain that what we end up with is not really a theory at all, but a very long, very complex promissory note. Time and experiment will tell. However, it is true that philosophical work, especially of the interpretive sort, is much harder as applied to the non-perturbative approach.

5.6.4 Canonical Quantization

Canonical approaches to general relativity formulate the theory within the Hamiltonian formalism as a dynamical theory of the basic configuration variable representing space (e.g. spatial geometry, spatial connection, or Wilson loop of a spatial connection). One then attempts to adapt standard quantization techniques to general relativity so formulated. Hence, the fundamental object is space, and general relativity is about its evolution. This is quite unlike general relativity in its standard spacetime formulation. In that case although


spacetime is said to be 'dynamical', it itself does not change: spacetime is given once and for all. Of course, in order to evolve, a time is needed against which things do their evolving; but in general relativity time is one of the things to be solved for. 'Dynamical' here refers to the nature of spacetime's coupling to matter, to background independence. Canonical approaches do allow us to speak about evolution by setting apart time from space, and allowing space to evolve with respect to it. Hence, one splits spacetime apart so that $\mathcal{M} \cong \mathbb{R} \times \sigma$ (where σ is a compact three dimensional hypersurface). This is achieved by means of a foliation of the spacetime by a family of spacelike hypersurfaces, or leaves:

$$F_t : \sigma \rightarrow (\Sigma_t \subset \mathcal{M}) \qquad (5.23)$$

Each leaf $\Sigma_t$ is then taken to correspond to an instantaneous spatial slice (an 'instant of time').97 Spacetime is recovered as a stack (a one-parameter family) of these slices. However, there are many ways in which this spacetime recovery can be achieved from stacks of slices. This freedom in the way spaces are stacked corresponds to the diffeomorphism (gauge) symmetry of the spacetime version of general relativity. The symmetry is canonically rendered using four constraint functions on the chosen spatial manifold, a scalar field (known as the Hamiltonian constraint) and a vector field (known as the diffeomorphism constraint). These have the effect, respectively, of pushing data on the slice onto another nearby (infinitesimally close) slice and of shifting data tangentially to the slice. In a canonical approach (to field theories) one writes theories in terms of fields and their momenta. Spacetime covariant tensors are split apart into spatial (tangential) and temporal (normal) components. This naturally obscures general covariance, but the theory is generally covariant despite surface appearances. The general covariance of the Einstein equations, reflecting the spacetime diffeomorphism invariance of the theory, is encoded in constraints.98 Taken together, when satisfied, these constraints are taken to reflect spacetime diffeomorphism invariance;

97 Each of these surfaces corresponds to the same initially chosen surface in terms of its intrinsic properties. They are distinguished by their extrinsic properties, by the way they are embedded in spacetime.

98 In the Lagrangian formulation of general relativity the spacetime symmetries are manifest. The canonical formulation buries the symmetries in constraints, due to the splitting of spacetime into space and time. The basic idea of a constraint can best be understood with a simple visual example. First consider a particle that is able to move on a plane, with configuration variable (position, specified by a pair of numbers $(x, y) \in \mathbb{R}^2$) and associated momentum variables $p_x$ and $p_y$: four degrees of freedom in total. Now suppose that the particle is restricted to a circle, $x^2 + y^2 = r^2$. In this case the particle no longer has the 'freedom to move' that it previously had, so that there are states, previously accessible, that are now inaccessible—hence the terminology, constraint. The constraint in this case is $x p_x + y p_y = 0$. The variables are, therefore, not all independent of one another: we can solve for one in terms of the others. This means that the number of degrees of freedom is reduced by two, so there are just two remaining. The inaccessible states are usually understood as 'surplus structure', and one can move to the 'reduced' formulation with fewer degrees of freedom, taking these as providing the basis of a description without constraints, with the surplus jettisoned.
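The constraint quoted in footnote 98 can be obtained in one line (a standard consistency argument; assuming free motion with mass m, so that $p = m\dot{x}$): demanding that the circle condition be preserved in time gives

```latex
\phi := x^2 + y^2 - r^2 = 0
\;\;\Rightarrow\;\;
\dot{\phi} = 2\left(x\dot{x} + y\dot{y}\right)
           = \frac{2}{m}\left(x p_x + y p_y\right) = 0
```

i.e. the momentum must be tangent to the circle, which is precisely the constraint $x p_x + y p_y = 0$.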


together they tell us that the geometry of spacetime is not affected by the action of the diffeomorphisms they generate. This job is done by two constraints, of course, since the diffeomorphism constraint deals with aspects of the spatial geometry and the Hamiltonian constraint deals with aspects of time. Imposing both delivers the desired full spacetime diffeomorphism invariance.

5.6.4.1 The ADM Formulation

The canonical formulation of general relativity depicts the gravitational field and spacetime geometry in terms of the evolution of fields defined on spatial slices (or hypersurfaces) $\Sigma_i$, given some foliation of spacetime. In the ADM (geometrodynamical) case, the geometry of the hypersurfaces is described by the 3-metric $q_{ab}$ (induced by the foliation); this is the configuration variable.99 The configuration space is then the space of Riemannian metrics on Σ, Riem(Σ). Recall that in geometrodynamics (cf. (Arnowitt et al., 1962)) the points in the phase space of GR are given by pairs (q, p)—where q is a Riemannian metric on a 3-manifold Σ and p is related to the extrinsic curvature K of Σ, describing the way it is embedded in a four dimensional Lorentzian spacetime. In GR, the pair must satisfy the constraint equations, and this condition picks out a surface in the phase space called the constraint surface. The observables of the theory are those quantities that have vanishing Poisson bracket with all of the constraints. According to the geometrodynamical program, each point on the constraint surface represents a physically possible (i.e., by the lights of general relativity) spacelike hypersurface of a general relativistic spacetime. Points lying on the complement of this surface are also 3-manifolds, but they do not represent physically possible spacetimes; they have metric and extrinsic curvature tensors that are incompatible with those needed to qualify as a 3-space embedded in a general relativistic spacetime. The constraint surface comes equipped with a set of transformations $C \rightarrow C$ that partition the surface into subspaces known as 'gauge orbits' (these transformations are the gauge transformations). To develop a spacetime from an initial data slice (satisfying the constraints), one selects a one-parameter family of gauge transformations by choosing a specific lapse function $N(x, t)$ and shift vector $N^a(x, t)$ and evolving the data with the constraints. Varying the action with respect to the lapse function gives the Hamiltonian constraint:

$$H[N] := \frac{1}{\sqrt{\det q}}\left(q_{ac} q_{bd} - \frac{1}{2} q_{ab} q_{cd}\right) p^{ab} p^{cd} - \sqrt{\det q}\, R \approx 0 \qquad (5.24)$$

99 We can distinguish various approaches through the choice of variables. The earliest attempts used the most obvious choice of a 3-metric q and its conjugate p, giving a theory of geometrodynamics. More recent work has been couched in terms of a connection and its conjugate and the holonomy of a connection (i.e. the path-dependent integration of the connection around a loop in Σ) and its conjugate. The latter choice leads to loop quantum gravity, in which the fundamental classical variables are Wilson loops. See §5.6.4.2, below, for more details on these alternative polarizations.


where R is the Ricci scalar curvature associated with the 3-metric q. Varying with respect to the shift gives the diffeomorphism constraint:

$$D[N^a] := -2 q_{ac} D_b\, p^{bc} \approx 0 \qquad (5.25)$$

where D is the covariant derivative compatible with the 3-metric q. Conceptual (and technical) problems follow quickly from this formulation. Most obvious is that the imposition of the Hamiltonian constraint implies that we do not have a time coordinate to work with—but see below for a method of extracting a time variable from the field components. Given this, one does not have a Hamiltonian generating the evolution; the Hamiltonian constraint itself delivers the dynamics. The problem is that the deformations it generates are diffeomorphisms, and imposing the constraint amounts to an invariance with respect to such deformations. This is, of course, just what we would expect in a theory without an external, absolute time parameter. But how are we to make sense of a dynamical theory for which this is the case—i.e. how are we to make physical sense of a background independent theory? There is some consensus forming that imposing the Hamiltonian constraint amounts to viewing the dynamics as involving correlations between the physical degrees of freedom of the theory, rather than involving a relationship between some fixed structure and the physical processes. Hence, though evolution with respect to an independent time parameter is ruled out by the Hamiltonian constraint, one can adopt a kind of relational view of evolution according to which a field, say, evolves with respect to another (physical) field rather than an external time parameter. Whether this amounts to relationalism or not is unclear. There are two problems here: firstly, the manifold still appears as a background structure in canonical approaches; secondly, one still has the option of interpreting the gravitational field as a substantival space. We return to these issues below. Now let us turn to the quantization of this theory.
There are four quantum constraints in quantum (Hamiltonian) general relativity—or, rather, $4 \times \infty$ (that is, four infinite families' worth), since these are local: they sit at every point of the hypersurface. There are three diffeomorphism (or momentum) constraints that render quantum states independent of the choice of coordinates on Σ:

$$\hat{H}_a(q, p)\,\Psi(x) = 0\,, \quad \forall x \in \Sigma \quad (a = 1, 2, 3) \qquad (5.26)$$

and the Hamiltonian constraint:

$$\hat{H}_\perp(q, p)\,\Psi(x) = 0\,, \quad \forall x \in \Sigma \qquad (5.27)$$

The full Hamiltonian for general relativity—the dynamical equation—is then a sum of these constraints:

$$\hat{H}(q, p) = \int_\Sigma d^3x \left(N \hat{H}_\perp + N^a \hat{H}_a\right) \approx 0 \qquad (5.28)$$
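A standard degrees-of-freedom count (context added here, not part of the text) shows what these constraints leave behind at each point of Σ:

```latex
% Phase-space count per point of \Sigma:
\underbrace{6}_{q_{ab}} + \underbrace{6}_{p^{ab}}
\;-\; \underbrace{2 \times 4}_{\text{first-class constraints}}
\;=\; 4 \ \text{phase-space d.o.f.}
\;=\; 2 \ \text{configuration d.o.f.}
```

Each first-class constraint removes two phase-space dimensions (restriction to the constraint surface plus the gauge flow along it), leaving the two polarizations of the graviton.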


Given the constraints, we get the result that quantum states are annihilated by the full Hamiltonian:

$$\hat{H}\Psi = 0 \qquad (5.29)$$

This equation (known as the 'Wheeler-DeWitt equation') contains all of the dynamics in quantum geometrodynamics.100 The quantum states (the wavefunctionals Ψ) depend only on the 3-metric, not on time—hence, the solutions represent stationary wavefunctions. In fact, the diffeomorphism constraint implies that the quantum states depend only on the 3-geometry rather than the metric—that is to say, the states are invariant under diffeomorphisms of Σ thanks to the diffeomorphism constraint. As mentioned above, the fact that the fundamental dynamical equation does not depend on time has led to much conceptual discussion. However, the geometrodynamical approach ran out of steam due to irresolvable technical difficulties. New variables, based on a canonical transformation of the phase space of general relativity, led to a more tractable formulation.

5.6.4.2 New Variables

To understand this formulation properly, we must first introduce the concept of a dreibein or triad $e^a_i$ (where i indexes an internal space). These are used to give an alternative representation of the geometry of a spatial slice. Each triad corresponds to a triplet of mutually orthogonal vector fields. These fields are sufficient to reconstruct the spatial geometry since we have the relation $q^{ab} = e^a_i e^b_i$ (where $q^{ab}$ is the inverse metric). This change of variables introduces an additional constraint into the theory—the Gauss law constraint generating SO(3) transformations—on account of the freedom to rotate the vectors without altering the metric. The new variables $(A^i_a, E^a_i)$ introduced by Ashtekar101 are related to the triad formulation as $E^a_i = |\det e^b_j|^{-1} e^a_i$ and $A^i_a = \Gamma^i_a + \gamma K^i_a$ (where $K^i_a = K_{ab} e^{bi}$, γ is

100 Note that there is an alternative way to quantize constrained Hamiltonian systems. One can solve the constraints first, rather than solving them at the quantum mechanical level as was done here (the latter approach is known as 'constrained quantization' or 'Dirac quantization'). Maxwell's theory, for example, proceeds by solving the constraints classically, and then quantizing with respect to the reduced phase space (reduced quantization). The virtue of this approach is that one can apply the steps of standard Hamiltonian quantization without constraints. However, in the case of general relativity the reduced phase space is much more difficult to work with. Also, since the constraints are associated with dynamics, in eradicating them it looks as if one eradicates dynamics. See (Ashtekar and Tate, 1991) for a superb introduction to these methods and their differences.

101 The Ashtekar connection $A^i_a$ is responsible for representing the curvature of space (it provides a notion of parallel transport of spinors); the triad $E^a_i$ (the 'electric fields') contains the metric information (i.e. the triad determines the geometry of space), which it describes via a family of local orthonormal frames (geometric observables can be reconstructed as functionals of these fields). The picture of quantum spatial geometry is very well understood. However, the geometry of spacetime is much more complex because it requires that one solve the dynamical equation, the Hamiltonian constraint.


QUANTUM GRAVITY: A PRIMER FOR PHILOSOPHERS

the Barbero-Immirzi parameter,102 and $\Gamma^i_a$ is the spin-connection). One recovers the standard geometry of the spatial slice via the relation: $E^a_i E^b_i = q^{ab} \det q$. One can then rewrite the diffeomorphism and Hamiltonian constraints, $D[N^a]$ and $H[N]$, as follows:103

$$D[N^a] = \frac{1}{8\pi G\gamma} \int_\Sigma d^3x\, N^a F^i_{ab} E^b_i \approx 0 \qquad (5.30)$$

As before, the diffeomorphism constraint will, when satisfied, ensure that the theory is independent of a background spatial geometry, or, in other words, that it is spatially diffeomorphism invariant. Solving this constraint (and the Gauss law constraint below) provides the arena for loop quantum gravity: quantum spatial geometry. There are objects that solve these constraints in a fairly natural way. The Hamiltonian constraint, responsible for the dynamics (leading to spacetime geometry), is written:

$$H[N] = \frac{1}{16\pi G\gamma} \int_\Sigma d^3x\, N\, |\det E|^{-\frac{1}{2}} \left[ \epsilon_{ijk} F^i_{ab} E^a_j E^b_k - 2(1+\gamma^2) K^i_{[a} K^j_{b]} E^a_i E^b_j \right] \approx 0 \qquad (5.31)$$

The additional Gauss law constraint is written:

$$G[\Lambda] = \frac{1}{8\pi G\gamma} \int_\Sigma d^3x\, \Lambda^i \left( \partial_a E^a_i + \epsilon_{ijk} A^j_a E^k_a \right) \approx 0 \qquad (5.32)$$

This constraint, when satisfied, ensures that the theory is independent of arbitrary rotations of the dreibeins (which we desire, since the spatial geometry, as encoded in the 3-metric, is invariant with respect to such rotations).

5.6.4.3 Loop Variables The modern incarnation of canonical quantum gravity is ‘loop quantum gravity’. The loop variables are based on connection variables: one takes the path-ordered integral of the connection around a closed curve (i.e. a loop α) in Σ. This is the holonomy $U[A, \alpha]$ of the connection:

$$U[A, \alpha] = \mathcal{P} \exp \left( G \oint_\alpha A \right) \qquad (5.33)$$

102 The exact value of this parameter (a dimensionless constant) has to be put in by hand (i.e. it is a free parameter). It was set positive by Ashtekar. There is a method to set the value by calibrating loop quantum gravity’s expression for the entropy of a black hole with Hawking’s formula. However, recently Olaf Dreyer showed how its value can be computed using the quasi-normal modes (i.e. the vibrational modes) of classical black holes (Dreyer, 2003)—he obtains $\gamma = \ln 3 / (2\pi\sqrt{2})$. The details are rather complicated: for an elementary discussion see (Baez, 2003).
103 In the equations that follow, $F^i_{ab}$ is the curvature of the Ashtekar connection. These equalities vanish on the constraint surface, hence the use of ‘≈’ (weak vanishing) as opposed to ‘=’. The presentation here follows Bojowald (2005).
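The constraint structure just presented supports a quick, heuristic degree-of-freedom count (my bookkeeping, not spelled out in the text), recovering the two graviton polarizations mentioned elsewhere in the chapter:

```latex
% Per spatial point, in the connection formulation:
\underbrace{9}_{\text{components of } A^i_a}
\;-\;
\underbrace{3}_{G[\Lambda]}
\;-\;
\underbrace{3}_{D[N^a]}
\;-\;
\underbrace{1}_{H[N]}
\;=\; 2
% Each first-class constraint both restricts the data and generates gauge,
% so each removes one configuration degree of freedom in this counting.
```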

THE MANIFOLD METHODS OF QUANTUM GRAVITY


The conjugate variable is then the flux of the ‘electric field’ $E^a_i$ through a strip $S \subset \Sigma$ (where $S : [0,1]^2 \to \Sigma$).104 Much of the work achieved so far in loop quantum gravity has been carried out in the context of ‘kinematics’—i.e. without solving the quantum constraints, or at least not all of them: the Hamiltonian constraint is the problem. However, interesting results have been found in the kinematical sector which are argued to be robust enough to carry through into the full theory, where all of the quantum constraints are implemented. Perhaps the most interesting result is the discovery of discreteness of geometric operators on the space of states. We have to be rather careful with this result though; the operators do not commute with the constraints, and therefore do not count as genuine observables. Only when the surfaces and volumes in question are physical (the surface of this piece of paper, for example) do we get the possibility of a genuine observable, that is, a gauge-invariant quantity (or a quantity represented by an operator on the physical Hilbert space). Attempts have been made to solve all of the quantum constraints using a modified (discrete) covariant methodology: this is the way of spin-foam modelling, on which see §5.6.6.8. To quantize this theory one first isolates the Poisson algebra between the fundamental canonical variables—in this case we have, as the basic configuration variable of the gravitational field, the holonomy $H_A(\gamma)$ of an SU(2)-connection $A^i_a$ on 3-space and, as the conjugate momentum variable, the flux of the electric fields $E^a_i$ over a 2-surface S. From their Poisson algebra, one then defines an abstract algebra A of quantum operators—i.e. at this stage, there is no concrete representation of the operators on a Hilbert space; that is the next task.
Once armed with a representation of A on a Hilbert space H, one then has the kinematic backbone of a quantum theory of gravity based on loops.105 Elements of H are wave-functions of the connections (square integrable with respect to a diffeomorphism-invariant measure). These states form an overcomplete basis (the space is too big), something which is cured by shifting to the ‘spin-network’ basis (see (Rovelli and Smolin, 1995)). The subspaces of this space are labelled by graphs whose links are then labelled by spins (half-integers). This framework gives us a notion of quantum geometry: the geometric operators constructed in this approach serve to ‘deposit’ a quantum of area on a 2-surface that it intersects. One finds that the spectra of eigenvalues of geometrical operators (such as area and volume) constructed from the fluxes are discrete. Whether this result has

104 The use of path-dependent variables to aid in the quantization of the gravitational field was in fact suggested many decades ago by Mandelstam (1962)—Mandelstam points out that Bryce DeWitt and David Pandres made similar suggestions. These ideas were developed in great depth in (Gambini and Pullin, 1996).
105 Astute readers might be concerned about this stage of the quantization procedure since in systems with infinitely many degrees of freedom one generally faces the problem of inequivalent representations. However, here the novelty of general relativity, its background independence, comes to the rescue, serving to uniquely determine a representation (known as the ‘Ashtekar-Lewandowski representation’)—see (Lewandowski et al., 2006) and (Fleischhack, 2007) for details.


physical significance is another matter—see §5.6.4.5.

5.6.4.4 Holes and Determinism in the Canonical Approach. We have already touched upon the steps involved in canonical quantization. Here we develop this some more. The first step is to cast a theory in Hamiltonian form. This is immediately problematic in general relativity because it does not have the structure of a standard Hamiltonian system—its phase space is not a regular symplectic manifold. What this means is that, given a specification of the values of the canonical variables at an initial time, the equations of motion of the theory are not able to propagate all variables: the future values of the variables cannot be determined uniquely. A simple reductio can show why this has to be the case: assume that the components of the metric can be uniquely determined given their initial specification (along with their first time derivatives). In this case there is a single, unique way that the metric will develop in the future. But this violates the general covariance of general relativity. Hence, the original assumption was wrong. Instead, the theory permits arbitrary coordinate transformations to the future of some initial time (transformations that are the identity at all times before this time). Hence, there are multiple solutions (ways of developing the data) compatible with some initial specification. Determinism appears to be threatened, but the threat is eradicated by imposing a gauge principle which takes these distinct solutions to represent one and the same physical situation (this is the 3-geometry that was mentioned earlier).
This is analogous to the problem that arises in the case of Maxwell’s theory of electromagnetism (written in terms of potentials A and φ) for which the future values of these quantities are determined only up to an arbitrary function of spacetime.106 Again, the indeterminism can be washed away by treating the multiple futures as gauge-equivalent, representing one and the same physical configuration of fields. This non-uniqueness, then, does not imply that general relativity does not have a well-posed initial value problem: it does. The Einstein field equations comprise a second order system of hyperbolic equations, so that in this sense, in general, a specification of the metric and its first time derivatives on some spacelike hypersurface will deliver a unique solution. The problem is that the diffeomorphism invariance of the spacetime formulation of the theory leads to the presence of constraints on the initial data. 106 There is a disanalogy between these two cases: Maxwell’s theory faces the Aharonov-Bohm effect. Here we modify the double-slit experiment to include a solenoid (long—ideally, infinitely so—and thin) sitting beyond and in between the slits. This produces a magnetic field confined within the solenoid when it is turned on. Outside of the solenoid the value of the magnetic field is zero. Electrons are fired through the slits at the screen, and when the solenoid is turned on, they undergo a phase shift—this is manifested by a shift in the interference pattern on the detection screen. This phase shifting is known as the “Aharonov-Bohm effect”. The magnetic field is zero in the path taken by the electrons, but the vector potential is non-zero. This appears to show that the gauge potentials are real, so that gauge equivalence does not imply physical equivalence. Either that, or there is some non-local action being performed at a distance by the magnetic field on the electrons. 
See (Aharonov and Bohm, 1959) for the original presentation, or (Feynman, 1962), Chapter 15, for an exceptionally clear presentation.
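The apparent indeterminism of the potentials can be exhibited in one line (a standard construction, sketched here in my own notation): pick a gauge function that vanishes to the past of the initial slice.

```latex
% Let \chi be smooth with \chi(t,\mathbf{x}) = 0 for t \le 0. Then
A'_\mu = A_\mu + \partial_\mu \chi
% agrees with A_\mu on all data at t \le 0 yet differs from it later, while
F'_{\mu\nu}
  = \partial_\mu A'_\nu - \partial_\nu A'_\mu
  = F_{\mu\nu} + (\partial_\mu \partial_\nu - \partial_\nu \partial_\mu)\chi
  = F_{\mu\nu}.
% The field strengths evolve deterministically; only the gauge-variant
% potentials fail to, which is why A and A' are treated as one physical state.
```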


We find that the solution is uniquely determined up to a diffeomorphism. This is basically the issue underlying the hole problem, along with its standard solution. This problem plays a key role in the quantum theory of canonical gravity since one needs to make a decision about whether to remove the gauge freedom before or after quantization (that is, whether the constraints are solved before or after quantizing). Loop quantum gravity, and most other modern canonical approaches, adopt the method of Dirac quantization, according to which the constraints are imposed as operators at the quantum level. Since symmetries (such as the gauge symmetries associated with the constraints) come with quite a lot of metaphysical baggage attached, such a move involves philosophically weighty assumptions. For example, the presence of symmetries in a theory would appear to allow for more possibilities than one without, so eradicating the symmetries means eradicating a chunk of possibility space: in particular, one is eradicating states that are deemed to be physically equivalent, despite having some differences somewhere or other. Hence, imposing the constraints involves some serious modal assumptions. Belot and Earman (2001) have argued that since the traditional positions on the ontology of spacetime (relationalism and substantivalism) involve a commitment to a certain way of counting possibilities,107 the decision to eliminate symmetries can have serious implications for the ontology one can then adopt.

5.6.4.5 Is Quantum Geometry Discrete? In loop quantum gravity, as we saw, the geometric operators are not physical observables: they are operators on a kinematical state space that is far too big! The diffeomorphism and Hamiltonian constraints have not been factored out of the space, though the Gauss law constraint is satisfied. One forms a state space with the configuration variable as an SU(2)-valued connection.
This gives $L^2(\mathcal{A}/\mathcal{G})$ (where $\mathcal{A}$ is the space of SU(2)-connections on Σ and $\mathcal{G}$ is the group of gauge transformations generated by the Gauss constraint). Quantum states are given by spin networks, which form a basis for $L^2(\mathcal{A}/\mathcal{G})$. Recall that in order to get solutions to the diffeomorphism constraint one has to factor out the diffeomorphism freedom. This involves taking equivalence classes of spin-networks under diffeomorphisms of Σ, giving us the s-knots (i.e. the orbits of spin-networks under the action of the diffeomorphism constraint). Finding solutions to the constraints would give us states of quantum gravity that are invariant under diffeomorphisms of spacetime. These would be the physical states that the theory would be about and that the experimentalists would try to find out about. One of the main claims of loop quantum gravity is that it makes a genuine physical prediction about the nature of space; one that is, in principle, testable. The theory says that surfaces of space are discrete; or rather, the operators corresponding to the measurements one would perform to determine the properties of space (area, volume, etc.) have a discrete spectrum which can be computed

107 Roughly, substantivalists countenance more possibilities than relationalists because they count states that differ only in how the physical stuff is distributed over a set of spacetime points. Relationalists will count these as the same physical possibility.


and which forms the basis of the prediction. Hence, this property of discreteness of geometry at the Planck scale forms the core of the potentially predictive basis of loop quantum gravity. Loop theorists also use it to compute the entropy of quantum black holes. To do this one views the surface area (i.e. the horizon) of a black hole as being determined by the number of spin network links intersecting the surface, each one depositing an area quantum of $8\pi l_P^2 \gamma \sqrt{j(j+1)}$. However, given the problem presented above, these cannot be viewed yet as genuine physical predictions of the theory.108 As potentially revolutionary and philosophically exciting as the spectral discreteness of geometric operators is, it has to be conceded that there is, thus far, no firm proof of Planck scale discreteness. The discreteness result has been shown to hold only for operators on the kinematical Hilbert space: that is, for gauge variant quantities. It is still an open question whether this result transfers to genuine observables (i.e. operators that satisfy all of the constraints and are defined on the physical Hilbert space). This means that the discreteness result does not imply that this is what would be measured in a real physical experiment (such measurements will presumably be described by gauge invariant, Dirac observables). Hence, the loop gravity programme still needs to solve the dynamics, because Dirac observables must commute with all of the constraints. If the theory is then found to retain this discreteness, then the result is clearly one of tremendous physical significance. Unpacking the meaning and implications of this kind of discreteness is a task philosophers could be engaging in now.

5.6.4.6 The Problem of Time. Conceptual problems are much more transparent in canonical, non-perturbative approaches (1) because the full metric is quantized and (2) because $g_{\mu\nu}$ is understood to perform its dual function of both describing the gravitational field and determining the spacetime geometry in virtue of its status as a metric field. Naturally, if we adopt this viewpoint, then a quantization of $g_{\mu\nu}$, so that the components become quantum field variables, will, in some sense, involve a quantization of spacetime geometry. Unpacking this notion will be a job for philosophers as well as physicists.
One potential consequence—hard to prove for the physical case, as we saw above—is that spacetime (or, rather, space) is discrete. Another is that spacetime seems not to be involved as a background, rather it is what one gets out of the theory (at least, that’s the hope).

108 It is generally assumed that discreteness transfers from the gauge variant (kinematical, unphysical) to the gauge invariant (dynamical, physical) cases, so that the genuine physical geometrical operators possess discrete spectra too. However, Dittrich and Thiemann (2007) demonstrate that discreteness at the kinematical level does not always transfer into the physical level for the case of partial and complete observables. Their examples are based on models with few degrees of freedom. It is possible that their result is an artifact of this restriction; in the case of general relativity, you will recall, there are infinitely many constraints. The burden of proof is, however, clearly on the loop programme to demonstrate the transference of spectral properties.
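As a concrete point of reference for this transfer question (the standard kinematical result, quoted here for illustration and consistent with the area quantum cited above): a surface S punctured by spin-network links carrying spins $j_1, \dots, j_N$ is assigned the area eigenvalue

```latex
% Kinematical area spectrum (discrete):
A(S) \;=\; 8\pi\, l_P^2\, \gamma \sum_{n=1}^{N} \sqrt{j_n\,(j_n + 1)} .
% Smallest nonzero quantum, from a single j = 1/2 puncture:
A_{\min} \;=\; 8\pi\, l_P^2\, \gamma\, \sqrt{\tfrac{1}{2}\cdot\tfrac{3}{2}}
          \;=\; 4\sqrt{3}\,\pi\, \gamma\, l_P^2 .
% Whether this spectrum survives on the physical Hilbert space is exactly
% the open question of transference raised in the main text.
```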


Arguably, however, the main philosophical problem of all canonical approaches is the problem of time, which also manifests itself as a problem of change, and is buried in the problem of observables. All of these relate back to the diffeomorphism invariance of Einstein’s theory, and the way in which this invariance principle is encoded in the constraint equations, especially the Wheeler-DeWitt equation. In order to keep this chapter at a reasonable length we focus on just a few aspects of this very large, much discussed problem.109 As Ashtekar and Geroch point out, “[s]ince time is essentially a geometrical concept, its definition must be in terms of the metric. But the metric is also the dynamical variable, so the flow of time becomes intertwined with the flow of the dynamics of the system” ((1974), p. 1215). The problem of time (and change) is that the dynamics of the system is contained in the constraints, and these are taken to generate gauge motions. But gauge motions correspond to no change in physical state! Further problems naturally arise when we take the metric to be a quantum variable, in which case there are fluctuations in the geometry (including temporal geometry). This gives us a variety of interrelated problems, both classical and quantum, to do with time and change:
• Prima facie, the genuine observables of the theory don’t change from one hypersurface to the next since they must commute with all of the constraints (which take the data from one hypersurface to another)—the constraints generate gauge, so this motion is pure gauge.
• Displacement in time is just a diffeomorphism and, therefore, also a gauge transformation. The physics needs to be independent of such transformations, and so must be time-independent.
• In the quantum theory, when we impose the constraints we get a Hamiltonian that necessarily annihilates physical states.
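The first of these worries can be compressed into a schematic two-line argument (my paraphrase of the standard reasoning, not a passage from the chapter):

```latex
% A Dirac observable O must weakly commute with every constraint,
% in particular with the Hamiltonian constraint for any lapse N:
\{O, H[N]\} \;\approx\; 0 .
% But evolution from hypersurface to hypersurface is generated by H[N]:
\frac{dO}{dt} \;=\; \{O, H[N]\} \;\approx\; 0 ,
% so every genuine observable is a constant of the motion: 'frozen' dynamics.
```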
One response to these problems, adopted by Karel Kuchař (1992), is to distinguish physically between the Hamiltonian and diffeomorphism constraints, so that only the latter is viewed as a generator of gauge transformations. One then has to find a ‘hidden’ or ‘internal’ time among the phase space variables.110 Now, the configuration variable in canonical quantum gravity, in the geometrodynamical formulation, is the 3-metric, which requires 6 numbers per space point.111 By gauge fixing and selecting a particular coordinate system three more degrees

109 More details, from a philosophical point of view, can be found in the review articles (Belot and Earman, 2001; Rickles, 2006). An excellent pair of classic reviews is (Kuchař, 1992; Isham, 1993). For a philosophically astute review of the problem of the emergence of time in quantum gravity, covering a range of approaches to quantum gravity rather than just canonical approaches, see (Butterfield and Isham, 1999).
110 Several authors have suggested that the non-local hidden variables theory of Bohm may offer a way out of the trouble by dispensing with the Schrödinger equation entirely in favour of another equation that has a nonzero quantum dynamics—see (Callender and Weingard, 1996) and (Goldstein and Teufel, 2001). However, I have yet to see this proposal worked out beyond the basic idea; certainly not into something that could deliver a quantum theory of gravity.
111 Three degrees of freedom are removed by the symmetry of the metric: $g_{ab}(x) = g_{ba}(x)$.


of freedom can be disposed of, leaving three—this is just to impose invariance under the spatial diffeomorphism group Diff(Σ). As we saw earlier, general relativity has two degrees of freedom corresponding to the two helicity states of a massless spinning graviton. One way to resolve the discrepancy is to impose invariance under normal deformations of the hypersurface (from the perspective of the ambient spacetime in which the hypersurface is embedded). This seems reasonable on account of the fact that we need to reflect the full spacetime diffeomorphism group in the formalism, and Diff(Σ) isn’t by itself sufficient. This is encoded in the Hamiltonian constraint (or Wheeler-DeWitt equation). Of course, the Wheeler-DeWitt equation does not contain a time variable. The residual degree of freedom has been identified as a ‘hidden’ time variable known as ‘internal time’. The time is, then, buried in the gravitational field, and appears as a function of the field’s variables. When this time is isolated from the variables, one can consider the evolution of the other variables (the two physical degrees of freedom) against this internal time. This does not violate the condition of no preferred reference frames, since the choice of internal time is not unique: time is ‘many fingered’, as one would expect. The invariance principle encoded in the Wheeler-DeWitt equation is a reflection of the fact that there is no privileged way of slicing spacetime into instants of time. Alternatively, one can view the invariance as pointing to the eradication of a fundamental notion of time. A problem (or feature) that emerges from the internal time approach is that, since time is bound up with the gravitational field, it is a semi-classical or emergent feature. One begins with a notion of space and then develops a notion of spacetime from this. 
However, because spacetime trajectories are subject to the uncertainty principle, just like particle trajectories, the notion of spacetime breaks down.112 Hence, there are various proposed resolutions of this problem: some retain time as an intrinsic feature of the world; others deny any fundamental reality to time. Following Karel Kuchař’s terminology, these broad responses are called ‘Heraclitean’ and ‘Parmenidean’ respectively.113 The latter category of responses, the time deniers, follow Einstein’s dictum that “if physics wants to use time, it first has to define it” (Einstein, 1916b). This leaves us a certain amount of

112 Recall that Kant argued that space and time were necessary conditions for experience. Presumably this implies that any adequate physics must include them amongst its fundamental categories. But this is denied by many quantum gravity researchers, not just canonical theorists. But there need not be a clash: the claim is generally that time and space do not exist at the microscopic level, not that they do not exist simpliciter. It is possible, for example, that spacetime is an emergent or collective property that is manifest only at certain scales. This viewpoint can be seen by remembering that the features of space and time are determined by the (classical) gravitational field.
113 It is important to note that the debate here is not directly connected to the debate in the philosophy of time between ‘A-theorists’ and ‘B-theorists’ (or ‘tensers’ and ‘detensers’, if you prefer). Both of these latter camps agree that time exists, but disagree as to its nature. By contrast, the division between Heraclitean and Parmenidean interpretations concerns whether or not time exists simpliciter!


freedom in ‘recovering’ a notion of time. We briefly look at several proposals that have a more philosophical flavour. Julian Barbour (2001) bites the bullet and defends the view that time, as a dimension against which things evolve, does not exist. This is based on the fact that canonical quantum gravity uses just the 3-space. Jeremy Butterfield (2001) describes Barbour’s position as “a curious, but coherent, position which combines aspects of modal realism à la Lewis and presentism à la Prior” (p. 291). While there is a whiff of these positions in Barbour’s position, his brand of timelessness is more directly connected to a denial of persistence—and, as such, is not timeless at all. Rather, it is changeless. Far from denying time, then, Barbour has in fact reduced it (or, rather, the instants of time) to the points of a relative configuration space. This is, however, very different from time as standardly conceived, and how it is modelled in quantum mechanics, for example. The central structure in Barbour’s vision is the space of Riemannian metrics mod the spatial diffeomorphism group (known as “superspace”): Riem(Σ)/Diff(Σ). Choosing this space as the configuration space of the theory amounts to solving the diffeomorphism constraint; this is Barbour’s relative configuration space that he labels “Platonia” (ibid., p. 44). The Hamiltonian constraint (i.e. the Wheeler-DeWitt equation) is then understood as giving (once solved, and “once and for all” ((Barbour, 1994), p. 2875)) a static probability distribution over Platonia that assigns amplitudes to 3-geometries (Σ, q) in accordance with $|\Psi[q]|^2$. Each 3-geometry is taken to correspond to a “possible instant of experienced time” (ibid.). The very real appearance of change is explained by introducing the notion of a ‘time capsule’, or a ‘special Now’, by which he means “any fixed pattern that creates or encodes the appearance of motion, change or history” ((Barbour, 2001), p. 30).
Barbour conjectures that the relative probability distribution determined by the Wheeler-DeWitt equation is peaked on time capsules; as he puts it, “the timeless wavefunction of the universe concentrates the quantum mechanical probability on static configurations that are time capsules, so that the situations which have the highest probability of being experienced carry within them the appearance of time and history” (ibid.). Barbour’s approach is certainly timeless in this sense: it contains no reference to a background temporal metric in either the classical or quantum theory. The metric is defined by the dynamics, in true Machian style. Butterfield mentions that Barbour’s denial of time might sound (to a philosopher) like a simple denial of temporal becoming—i.e. a denial of the A-series conception of time. He rightly distances Barbour’s view from this B-series conception. Strictly speaking, there is neither an A-series nor a B-series on Barbour’s scheme. Barbour believes that space is fundamental, rather than spacetime. This emerges from his Machian analysis of general relativity. However, according to Barbour, there are many Nows that exist ‘timelessly’, even though we happen to be confined to one. The following passage brings out the ‘Lewis/Prior’ hybrid flavour of Barbour’s view that Butterfield mentions: All around NOW ... are other Nows with slightly different versions of


yourself. All such nows are ‘other worlds’ in which there exist somewhat different but still recognizable versions of yourself. (ibid., p. 56)

Clearly, given the multiplicity of Nows, this cannot be presentism conceived of along Priorian lines, though we can certainly see the connection to modal realism; talk of other nows being “simultaneously present” (ibid.) surely separates this view from the Priorian presentist’s thesis. That Barbour’s approach is not a presentist approach is best brought out by the lack of temporal flow; there is no A-series change. Such a notion of change is generally tied to presentism. Indeed, the notion of many nows existing simultaneously sounds closer to eternalism than presentism; i.e. the view that past and future times exist with as much ontological robustness as the present time. These points also bring out analogies with ‘many-worlds’ interpretations of quantum mechanics; so much so that a more appropriate characterization might be a ‘many-Nows’ theory.114 There is a view, which has become commonplace since the advent of special relativity, that objects are four-dimensional; objects are said to ‘perdure’, rather than ‘endure’. The latter view is aligned with a three-dimensionalist account according to which objects are wholly present at each time they exist; the former view is known as ‘temporal part theory’. The four-dimensionalist view is underwritten by a wide variety of concerns: for metaphysicians these concerns are to do with puzzles about change; for physics-minded philosophers they are to do with what physical theory has to say. Change over time is characterized by differences between successive temporal parts of individuals. Whichever view one chooses, the idea of persisting individuals plays a role; without this, the notion of change is simply incoherent, for change requires there to be a subject of change. Although Barbour’s view is usually taken to imply a three-dimensionalist interpretation (by Butterfield for one), it is perfectly compatible with a kind of temporal parts type theory.
We see that the parts of Platonia, the Nows, do not change or endure and they cannot perdure since they are three-dimensional items and the parts occupying distinct 3-spaces (and, indeed, the 3-spaces themselves) are not genidentical; rather, the quantum state varies from Now to Now in accordance with the Hamiltonian constraint in such a way that the parts (specifically, the time capsules) contain records that ‘appear’ to tell a story of linear evolution and persistence. Properly understood, then, Barbour’s views arise from a simple thesis about identity over time, i.e., a denial of persistence: We think things persist in time because structures persist, and we mistake the structure for substance. But looking for enduring substance is like looking for time. It slips through your fingers. (ibid., p. 49)

In denying persisting individuals, Barbour has given a philosophical grounding for his alleged timelessness. However, as I mentioned earlier, the view that results might be seen as not at all timeless: the relative configuration space, consisting 114 Indeed, Barbour himself claims that his approach suggests what he calls a “many-instants ... interpretation of quantum mechanics” (ibid.). However, it seems clear that the multiplicity of Nows is as much a classical as a quantum feature, so we have a definite disanalogy.


of Nows, can be seen as providing a reduction of time, in much the same way that Lewis’ plurality of worlds provides a reduction of modal notions.115 The space of Nows is given once and for all and does not alter, nor does the quantum state function defined over this space, and therefore the probability distribution is fixed too. But just as modality lives on in the structure of Lewis’ plurality, so time lives on in the overall structure of Barbour’s Platonia. For a good philosophical examination of this view, see (Butterfield, 2001); for a recent, more technical account, placed in historical context, see (Barbour, forthcoming). Richard Healey (Healey, 2004) is not convinced by the ‘no-change’ argument in classical general relativity. He questions the claim that Dirac observables exhaust what there is to general relativistic worlds, arguing that this is overly restrictive. Observing change involves the introduction of ‘frames’: genuine change is frame-dependent. Once we have frames on top of the unchanging structure given by Dirac observables, then a coherent notion of change emerges—indeed supervenes on the structure formed from the Dirac observables. Of course, if we are to have an account of these frames that is internal to general relativity, then it will surely be the case that they are built up from physical degrees of freedom of the theory. If this is so then we have a position that begins to resemble the relational-type responses, according to which change at a fundamental level (in the sense of a change of the fundamental variables with respect to a time parameter) does not exist, but evolution does take place with respect to other physical degrees of freedom of the theory. That this is in fact the case can, I think, be inferred from Healey’s example of the detection of gravitational waves using LIGO. Healey wants to align the notion of frame with the notion of a foliation.
It is difficult to see how calling this a frame takes one beyond ordinary Hamiltonian general relativity which, of course, is based on just such foliations. Perhaps Healey's basing foliations on physical observers is supposed to add some extra ingredient (see pp. 410–1). However, the invocation of the conscious states of observers looks like a restatement of the problem: we appear to observe change, and yet general relativity appears to rule it out, since the basic observables of the theory cannot take on different values on different time slices relative to any foliation (given some initially chosen topology for the slices). The basic idea that one can have 'change without change' (i.e. change at one level but not at a more fundamental level) does deserve the close scrutiny of philosophers. Indeed, many of the Parmenidean views involve it in some form or another.

Carlo Rovelli has attempted to resolve the problem of time in several different ways, similar to Healey's in the basic approach. His most recent attempt involves the utilization of his own 'partial observables' framework (Rovelli, 2002)—his textbook on loop quantum gravity (Rovelli, 2004) contains a very readable account of the framework. Partial observables are contrasted with complete observables. This distinction keys into the division between gauge-variant and gauge-invariant quantities. In the constrained Hamiltonian formalism of Dirac only the latter are genuine, physical observables. The former are unphysical surplus; useful perhaps, but ultimately doing no direct representational work. Rovelli modifies this framework so that complete observables are viewed as correlations between partial observables. Let's spell this out in more detail. A partial observable is a physical quantity to which we can associate a measurement leading to a number; a complete observable is a quantity whose value (or probability distribution) can be predicted by the relevant theory. Partial observables are taken to coordinatize an extended configuration space $\mathcal{Q}$ and complete observables coordinatize an associated reduced phase space $\Gamma_{\mathrm{red}}$. The "predictive content" of some dynamical theory is then given by the kernel of the map $f : \mathcal{Q} \times \Gamma_{\mathrm{red}} \to \mathbb{R}^n$. This space gives the kinematics of a theory, and the dynamics is given by the constraints, $\phi(q^a, p_a) = 0$, on the associated extended phase space $T^*\mathcal{Q}$. There are quantities that can be measured whose values are not predicted by the theory; yet the theory is deterministic because it does predict correlations between partial observables (which form complete observables). The dynamics is then spelt out in terms of relations between partial observables. Hence, the theory formulated in this way describes relative evolution of (gauge variant) variables as functions of each other. No variable is privileged as the independent one (cf. (Montesinos et al., 1999), p. 5). The dynamics concerns the relations between elements of the space of partial observables, and though the individual elements do not have a well-defined evolution, relations between them (i.e. correlations) do: they are independent of coordinate space and coordinate time.

115 Roughly, Lewis' idea is that the notions of necessity and possibility are to be cashed out in terms of holding at all or some of a class of 'flesh and blood' worlds.
Rovelli suggests that we can use this to resolve the problem of time as follows: let φ = T be a partial observable parametrizing the ticks of a clock (evolving out across a gauge orbit), and let f = a be another partial observable (also stretching out over a gauge orbit). Both are gauge variant quantities: their change is purely gauge dependent. A gauge invariant quantity, a complete observable, can be constructed from these partial observables as:

$$O_{[f;T]}(\tau, x) \;=\; f(x')\,\Big|_{T(x')=\tau} \tag{5.34}$$

where $x'$ is the point on the gauge orbit through $x$ at which $T$ takes the value $\tau$.

These quantities encode correlations. They tell us what the value of a gauge variant function f is when, under the gauge flow generated by the constraint, the gauge variant function T takes on the value τ. This correlation is gauge invariant. These are the kinds of quantity that a background independent gauge theory like general relativity is all about. We don't talk about the value of the gravitational field at a point of the manifold, but about its value where some other physical quantity (say, the electromagnetic field) takes on a certain value. Hence, evolution is relational.116

116 This view is broadly in line with Einstein's own view of the physical content of general relativity. Indeed, he is very modern-sounding on this point, writing that "the gravitational field at a certain location represents nothing 'physically real', but the gravitational field together with other data does" ((Einstein, 1918), p. 71).
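To make the construction concrete, here is a worked example that is standard in this literature, though it does not appear in the surrounding text: the parametrized free Newtonian particle, whose partial observables are the clock reading t and the position q.

```latex
% Worked example: parametrized free Newtonian particle.
% Extended phase space coordinates: (t, p_t; q, p).
\begin{align*}
  &\text{Constraint on } T^*\mathcal{Q}\text{:} &
    \phi &= p_t + \frac{p^2}{2m} \approx 0,\\
  &\text{Complete observable:} &
    O_{[q;t]}(\tau) &= q + \frac{p}{m}(\tau - t),\\
  &\text{Gauge invariance:} &
    \{\,O_{[q;t]}(\tau),\, \phi\,\} &= \frac{p}{m} - \frac{p}{m} = 0.
\end{align*}
```

Neither t nor q alone is predictable—both drift along the gauge orbit generated by φ—but the complete observable answers the relational question of what q is when the clock reads τ, and each fixed τ yields a distinct constant of the motion. This is how evolution is recovered from a formally 'frozen' theory.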


There are issues remaining over the interpretation of these correlations. Rovelli claims that "the extended configuration space has a direct physical interpretation, as the space of the partial observables" ((Rovelli, 2002), p. 124013-1, my emphasis). Both spaces—the space of genuine (complete) observables and the space of partial observables—are invested with fundamental physicality (or ontological primacy) by Rovelli; the partial observables, in particular, are taken to be physical variables. Einstein, it seems, argued that only the correlation (between the gravitational field and other data) is physically real; the relata are secondary, unable to exist independently. This fits the Dirac picture better, but to adopt it one must fill in an account of the relationship between the partial and complete observables in such a way that the complete observables have ontological primacy. In (Rickles, 2006; 2007; 2008) it is argued that the proper interpretation of Rovelli's approach is structural. That is, the partial observables have no intrinsic properties, but gain their properties from the correlation they find themselves in (the complete observable). One must also adapt the approach to mesh with quantum theory—see (Weinstein, 2000) for a philosophical discussion of this problem; (Hájiček, 1996) gives a more technical discussion. Belot and Earman (2001) argue that these problems of time and change may serve to reinvigorate the listless debate between substantivalism and relationalism—that is, the debate concerning whether spacetime is a 'substance' (ontologically independent of matter) or not—by furnishing physical reasons for deciding one way or the other. They believe that substantivalism and relationalism are aligned with particular ways of canonically quantizing gravity (specifically with how one deals with the constraints), so that if one approach were successful, and the other not, then that would give a principled reason for accepting the aligned spacetime ontology.
I have argued elsewhere that this is not the case (see (Rickles, 2005b; 2006; 2007)). The problems of time and change, and the responses to them, are analogous to the hole argument and the responses to it, and the solution to the latter, vis-à-vis spacetime ontology, is far from decided.

5.6.4.7 Conclusions Philosophers have debated geometrodynamics since the early 1970s—e.g. (Graves, 1971). Moreover, the physicists involved with geometrodynamics have contributed to the philosophical debate—e.g. (Misner, 1972). This is a testament to the fact that canonical approaches have been more fruitful for philosophical research. Whether this is due to deep reasons, or simply because the canonical approach is easier to understand, is debatable. Whatever the reasons, it seems that canonical quantum gravity is a very fit vehicle for philosophical research.

5.6.5 Feynman Quantization In the previous sections we have looked at covariant and canonical quantization techniques applied to the gravitational field. Of course, Richard Feynman developed another form of quantization using functional integral (or path-integral) techniques. Here we will follow Charles Misner (Misner, 1957) in speaking of the


functional integral or path integral approach as 'Feynman Quantization'.117 This too has been applied to gravity in the hope of constructing a quantum theory of gravity. If successful, the approach would be able to generate solutions of the Wheeler-DeWitt equation.

117 Misner credits John Wheeler with suggesting the idea of attempting a Feynman quantization of general relativity via the formula $\int \exp\{(i/\hbar)(\text{Einstein action})\}\, d(\text{field histories})$.

According to Misner, the Feynman quantization of general relativity involves the following series of steps:

• A 4-manifold M of points—where the x ∈ M have "no physical significance in themselves" (p. 501) and serve a purely convenient purpose ("as handles for stating the significant mathematical relationships"). No definite metric for M is chosen, nor will one be.
• A family of hypersurfaces: 3-dimensional submanifolds σ ⊂ M.
• Define the metric at a point to be the inner product of a pair of tangent vectors at the point. However, the metric at a point is not to be viewed as a distance interval, only as useful in the computation of such intervals.
• Introduce a 'field history' f(x) over all points of M that gives a definite value for the field (including the metric field) at each point. When x ranges over the points of a hypersurface σ1 ⊂ M, the field history is a 'field configuration' f1 at σ1.
• Define a 'state functional' ψσ for the hypersurface σ: the state functional ψ1 at σ1 is given by the complex number ψ1(f1) (the value of ψ1 when the field is in configuration f1); one does this for each field configuration on σ1. The state functional ψ1, then, completely characterizes the physical state of the field system—i.e. given ψ1 (and the theoretical apparatus appropriate to the field concerned) one can compute expectation values for all observables.
• The 'Heisenberg formulation' versus 'Schrödinger formulation' distinction amounts to choosing a representation of states ψ either as a state functional on some particular hypersurface or as a family of state functionals on different hypersurfaces (satisfying the Schrödinger equation).
• However, Misner uses what he calls the "Feynman principle" in place of the Schrödinger equation: given a state functional on a particular hypersurface, one can find the other members of the family of states on other hypersurfaces via functional integration over the field configurations f0 on the initial hypersurface σ0:

  $$\psi_\sigma(f_\sigma) \;=\; \int K(f_\sigma, \sigma;\, f_0, \sigma_0)\,\psi_0(f_0)\,\delta f_0$$

  (where K is the Feynman propagator).
• Misner notes that gauge invariance implies that there is an overdetermination of physical states by the state functionals on hypersurfaces. In the context of general relativity this gauge invariance is diffeomorphism invariance. It has the further implication that pairs of manifolds diffeomorphic


(i.e. topologically equivalent) to one another should have equivalent Feynman propagators: "the Feynman propagator connecting equivalent hypersurfaces is trivial" (p. 507).118

According to this approach, in the context of the quantum field theory of particles, one computes the probability for a particle to go between two states by summing over all possible trajectories (histories) that could connect the states—each possible path being assigned a complex amplitude (a magnitude and a phase)—and then taking the squared modulus of the total amplitude. Hence, Misner proposed that one apply this method to the gravitational field. Here the histories would be the 'trajectories' followed by spatial geometry, which of course amount to a spacetime (or a 'segment' of such). Let us spell this out. Recall that in the classical theory of general relativity we are dealing with the metric as a classical degree of freedom. Once we have supplied the initial conditions, the metric is propagated uniquely119 in accordance with the Einstein equations (relating the metric to the stress-energy tensor representing the matter source of the gravitational field). This uniqueness (up to gauge) in the evolution of the metric is in line with the classical nature of the theory, in much the same way as a classical particle follows a unique trajectory in spacetime (of course, in the case of the metric the evolution is not in spacetime, but in superspace). Quantization forces us to replace these unique trajectories with amplitudes. In other words, if in the classical case we are interested in the motion of some object from x(0) to y(t), then the quantum scenario will be given by the sum over all possible paths connecting these points:

$$\langle y, t \,|\, x, 0 \rangle \;=\; \sum_{\text{paths}} \exp\big(i S_{\text{path}}/\hbar\big) \tag{5.35}$$

Here, 'paths' ranges over the (infinite) set of possible trajectories with identical extremities, initial point x(0) and end point y(t). The full amplitude is computed by adding the various paths together, where each path contributes an amplitude $\exp(i S_{\text{path}}/\hbar)$. In the gravitational case, the action S will be the Einstein-Hilbert action and the space of paths will contain 4-metrics that have coincident 3-metrics on the initial and final time-slices. Recall that the Einstein-Hilbert action is:

$$S[g] \;=\; \int_M R \,\mathrm{vol} \tag{5.36}$$

118 Of course, this is simply a recasting of the hole problem and the problem of time, only now in the Feynman description—however, this does show that those problems are not artifacts of the canonical formulation, as is sometimes supposed (see, for example, (Maudlin, 2004)). This train of thought can also be seen in the context of topological quantum field theories, where the dynamics is trivial.

119 Of course, there is the hole problem to contend with, generating infinitely many diffeomorphic evolutions, but that is easily resolved by viewing them, quite naturally since they are generated by constraints, as gauge equivalent.


'R' is the Ricci scalar of the metric and 'vol' is the volume form, $\sqrt{-g}\,d^4x$ (where g is the determinant of the metric)—'R vol' is the Lagrangian of general relativity. Hence, one can consider the path integral formulation of general relativity by substituting its action in the above expression, giving us:

$$\langle h_2, t_2 \,|\, h_1, t_1 \rangle \;=\; \int e^{\,i S[h]/\hbar}\, d\mu[g] \;=\; \int e^{\,\frac{i}{\hbar}\left(\int_M R\,\mathrm{vol}\right)}\, d\mu[g] \tag{5.37}$$

Here we consider the amplitude for the induced 3-metric h1 on an initial hypersurface to go to another 3-metric h2 on a later hypersurface. But we quickly run into trouble: although the times t1 and t2 denoted something physical in the case of standard quantum field theories (since we had a fixed spacetime to work with), in this case the time labels are gauge artifacts, and the amplitude gets tarnished with their gauge character.
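The 'sum over paths' of (5.35) can be made concrete in a toy discretized setting. The sketch below is an editorial illustration, not drawn from Misner: the lattice, the step amplitudes, and the sizes are artificial choices. It sums amplitudes over all N-step paths on a small one-dimensional lattice and checks that the brute-force path sum agrees with composing single-step propagators (the discrete analogue of the composition property of the propagator K):

```python
import cmath
from itertools import product

SITES = 5   # lattice positions 0..4 (toy configuration space)
STEPS = 3   # number of time steps

def step_amplitude(a, b):
    """Single-step amplitude <b|U|a>: a toy local 'action' phase,
    purely illustrative (not a physical discretization)."""
    return cmath.exp(1j * (a - b) ** 2) / SITES

def path_sum(x0, xN):
    """Brute-force Feynman sum: one amplitude per interpolating path."""
    total = 0
    for mid in product(range(SITES), repeat=STEPS - 1):
        path = (x0, *mid, xN)
        amp = 1
        for a, b in zip(path, path[1:]):
            amp *= step_amplitude(a, b)
        total += amp
    return total

def composed(x0, xN):
    """Same amplitude via repeated composition of the single-step
    propagator (matrix multiplication, STEPS times)."""
    K = [[step_amplitude(a, b) for b in range(SITES)] for a in range(SITES)]
    amp = {x0: 1}
    for _ in range(STEPS):
        new = {b: 0 for b in range(SITES)}
        for a, v in amp.items():
            for b in range(SITES):
                new[b] += v * K[a][b]
        amp = new
    return amp[xN]

assert abs(path_sum(0, 4) - composed(0, 4)) < 1e-12
```

The equality of the two computations is just the semigroup property of the propagator; in the gravitational case the analogue of the lattice position is a 3-geometry, and it is precisely the absence of a physical external step label t that generates the problems discussed above.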

[Fig. 5.5. Representation of the gravitational path integral as a gravitational propagator from an initial 3-geometry h1 to a final 3-geometry h2, interpolated by the 4-geometry gμν]

Even supposing these problems can be resolved, there are other complications. For example, as before we include paths that are not considered to be dynamically possible by the lights of the classical theory. In the case of general relativity this will include interpolating metrics that correspond to different underlying topologies (i.e. that are compatible only with different topologies). However, the topological structure is fixed in the classical theory of general relativity, and it is likewise fixed in the covariant and canonical approaches. Note, however, that many view the dynamical topologies as a virtue of the approach (and a vice in those approaches, such as the canonical approach, that fix topology): the idea of fluctuating topologies keys in to Wheeler's notion of spacetime foam. The Euclidean approach took the topology change on board and firmed up the Feynman quantization approach. The basic idea is the same: the probability


amplitude to go from an initial configuration (of metric h_ab and, now, also of matter Φ) on a hypersurface Σ to another configuration (h′_ab, Φ′, Σ′) is computed by the functional (path) integral of exp(−S) over all possible interpolating configurations:

$$\langle h'_{ab}, \Phi', \Sigma' \,|\, h_{ab}, \Phi, \Sigma \rangle \;=\; \int_{\mathcal{M}} \mathcal{D}(g, \phi)\, e^{-S(g_{\mu\nu},\, \phi)} \tag{5.38}$$

The Euclidean, path-integral approach was taken up primarily for its cosmological applications: (1) in the context of black hole physics (see (Hawking, 1979; 1978)); (2) in the context of providing solutions to the Wheeler-DeWitt equation (wave-functions of the Universe—see (Hartle and Hawking, 1983)). Solutions to HΨ = 0 are given by a path integral (over spacetime histories). The 'no-boundary proposal' of Hartle and Hawking for initial conditions of the universe is given by taking the path integral to range over (closed) Riemannian 4-metrics, C, with just one boundary (i.e. no initial boundary):

$$\Psi[h_{ij}] \;=\; \int_C \mathcal{D}[g_{\mu\nu}]\, \exp\big(-S[g_{\mu\nu}]\big) \tag{5.39}$$

If the metric manifold has only one boundary, as in the Hartle-Hawking no-boundary proposal, then the path integral is taken to represent a quantum tunneling effect: 'tunneling from nothing'. The solution is taken to provide the wavefunction for the Universe, the so-called 'Hartle-Hawking wavefunction'. As mentioned previously, this approach still faces a version of the problem of time found in the canonical approach: there is no reference to time here; the wavefunction depends only on 3-geometries. Although this approach has filtered into many other approaches, it is no longer pursued in its raw state as a contender theory of quantum gravity. The primary problems have to do with the definition of the integral itself and the measure D[gμν]. There are also interpretive issues to do with the very basis of the Euclidean approach: for example, the formalism involves Wick rotation, which turns t into −it ('imaginary time'). There are two basic problems with this procedure: (1) the physical interpretation is not forthcoming—Deltete and Guy (2004) argue that the idea of "emerging" from imaginary time is simply incoherent;120 (2) spacetime is Lorentzian, not Euclidean—Gibbons and Hartle (1990) have argued that the metric signature changes dynamically, so that in the early Universe the metric is Riemannian and only later becomes Lorentzian. Hence, there are both technical problems and interpretive problems facing Feynman quantization approaches. However, as we will see in the external methods section, §5.6.6 below, the basic idea of summing over topologies and geometries is fairly ubiquitous, though the integral has to be tamed in various ways. Feynman quantization has, mutatis mutandis, become a central part of several approaches to quantum gravity.

120 See also (Butterfield and Isham, 1999), pp. 161–5, for a discussion of this problem.
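The technical point behind Wick rotation can be illustrated in miniature (an editorial toy example, independent of the gravitational case): rotating t → −iτ turns the oscillatory weight exp(iS) of a quadratic action into the damped weight exp(−S), so that a delicately conditionally convergent integral becomes an absolutely convergent Gaussian one. For the simplest case, ∫ dx exp(−ax²) = √(π/a), and the damped integrand is trivial to evaluate numerically:

```python
import math

def euclidean_integral(a, n=100_000, cutoff=30.0):
    """Trapezoid-rule integral of exp(-a x^2) over [-cutoff, cutoff];
    the damped 'Euclidean' weight makes this manifestly convergent,
    unlike its oscillatory exp(i a x^2) counterpart."""
    h = 2 * cutoff / n
    total = 0.5 * (2 * math.exp(-a * cutoff ** 2))  # the two endpoints
    for i in range(1, n):
        x = -cutoff + i * h
        total += math.exp(-a * x * x)
    return total * h

a = 2.0
assert abs(euclidean_integral(a) - math.sqrt(math.pi / a)) < 1e-6
```

The interpretive worry in the text is of course untouched by this: the numerics only show why the Euclidean weight is technically convenient, not what 'emerging from imaginary time' could mean physically.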


5.6.6 External Methods Here we look at several interesting and/or promising research directions that do not fit into the methodological triplet: covariant, canonical, path-integral. One feature of the triplet is that its members involve quantizations of general relativity (or some other classical theory). Naturally, this imports a substantial amount of conceptual and formal baggage from the classical theory. The external methods often depart from quantization, and thereby detach themselves from (possibly misleading) assumptions of classical theories. Quite often, however, these external methods will overlap significantly with multiple methodologies.

5.6.6.1 Regge Calculus and Discrete Methods. We have seen how the idea of quantum geometry seems to be a fairly generic feature of the main approaches to quantum gravity. Regge calculus allows one to deal with such possibilities—as utilized, for example, in Wheeler's notion of 'spacetime foam'.121 In the Regge calculus, continuous space is decomposed into discrete chunks called simplices. These are stuck together along their faces. The edges (or "bones" in the jargon) are responsible for the gravitational field: the curvature resides there. One uses these chunks to see what the geometry is like at an approximate level (with the approximation being dependent on the size of the chunks). As with many discrete methods, one then takes a continuum limit. Hence, the idea is to produce a discrete approximation of (continuum) general relativity. A spacetime is taken to be a simplicial 4-complex (a network of edges, faces, and vertices: basically, the geometry of solids in four dimensions), where the lengths of the edges are the dynamical (gravitational) variables. Quantization follows the Feynman quantization method: one sums over edge lengths. An alternative discrete approach to quantum gravity, known as the method of Dynamical Triangulations, involves keeping the sizes of the building blocks fixed, but varying the way they are combined.
In this case one sums over the ways that the blocks could be combined—see (Ambjørn et al., 1997) for a book-length introduction; for the state of the art, see (Ambjørn et al., forthcoming) or (Loll, 1998). As seems to be the norm in quantum gravity research, the concepts and methods crop up in various guises. We find that the covariant extension of loop gravity leads to something like these approaches in the context of discrete spin foams. The methods here take discreteness as an input, rather than an output, as was the case with loop quantum gravity. One is trying to match what one expects quantum spacetime to be like. However, one would like to solve for as much structure as possible.

5.6.6.2 Quantum Topology and Topological Quantum Field Theory. Related to Feynman quantization methods is the notion that topology becomes a dynamical

121 The original motivations in (Regge, 1961) were also practical: Regge wanted to find ways of making general relativity more computationally manageable. Making the scheme discrete allows one to do numerical relativity. One can solve the Einstein equations for a larger class of models (without symmetry assumptions, and so on).


variable too. Given this, one is led to consider quantizing it as well, so that one quantizes below the metric. A firm proposal for making sense of this is topological quantum field theory.122 The basic insight of topological quantum field theory is that the minimal constraints of quantum gravity can be met by producing a quantum field theory on a non-metric, differentiable manifold. These will be diffeomorphism invariant quantum theories. Naturally, such theories do not possess local degrees of freedom (i.e. no properties here or there); but there are global degrees of freedom that do not refer to specific points or regions of the manifold. Most work on this approach has been carried out by mathematicians, for whom a firm axiomatic framework is known. This involves (1) the assignment of Hilbert spaces to spatial manifolds, whose rays will represent states of the universe that are possible relative to the given manifold; and (2) the assignment of linear operators to cobordisms (corresponding to spacetime segments joining spatial boundaries), representing how states change given the interpolating manifold. The problem is, though topological quantum field theories share some of the properties we expect a quantum theory of gravity to have, they do not constitute quantum theories of gravity, since gravity does not make an appearance. See (Baez, 2005) for more details.

5.6.6.3 Non-Commutative Geometry. Given a real manifold M one can form an algebra consisting of the scalar functions on it. Given this algebra alone, one can reconstruct the manifold from whence it came: the (commutative) algebra of functions A determines M up to isomorphism. This suggests the following possibility: if we can use a commutative algebra to reconstruct a space, then can we do the same thing with a non-commutative algebra? Of course, such non-commutative algebras are a feature of quantum mechanics; for example, the algebra of observables satisfying the canonical commutation relations.
The answer is Yes, and the structure so reconstructed is called a non-commutative geometry. Though the idea has features we might expect in a quantum theory of gravity, it does not have the resources to function as a self-standing approach.

5.6.6.4 Group Field Theory. The group field theory method aims to instill as much generality as possible in quantum gravity, dispensing with both background geometry and topology. Hence, all spacetime concepts are dispensed with in the initial formulation. It is intended to be a quantum field theory of spacetime. It blends elements of the path-integral approach, the loop quantum gravity approach, and discrete methods (simplicial methods, dynamical triangulations). Indeed, it aspires to be a unifying, overarching framework for all of these methods. Group field theory is perhaps the newest approach intended to provide a full, fundamental formulation of a quantum theory of gravity. The details are still in the early stages of development and we will have to wait for more

122 Of course, the Euclidean programme allowed for topology change to occur, and there is a sense, in that approach, of what Wheeler called spacetime foam, since one can sum over different topologies.


concrete results into which philosophers can dig their teeth (see (Oriti, 2006; Freidel, 2005) for good reviews).

5.6.6.5 Twistor Theory. Twistor theory is the brainchild of Roger Penrose. One of its primary motivations is, as he puts it, "the thought that there should be a more intimate union between space-time structure and quantum mechanics than exists in conventional theory" ((Penrose and Ward, 1980), p. 283). Hence, the twistor programme involves the forging of mathematical links between the formal structures of quantum theory and general relativity. Usually the structures of general relativity are privileged somewhat over those of quantum theory; that is, it is manifolds rather than vector spaces that take center stage. Twistor theory reverses this, connecting the mathematical structures of quantum theory with spacetime. For example, the complex number field, which does the work in quantum theory, is taken up in the construction of spacetime in twistor theory. In this way one achieves some kind of unification between quantum theory and spacetime: one and the same mathematical framework applies to both. The approach is based on two philosophical principles: (1) that the use of a spacetime continuum is an unjustified assumption in physics; (2) that the use of real-number structures in physics is an unjustified assumption.123 It is based on the space of light rays, rather than point-events; the latter are then formed from the coincidences between the former. Connecting this up to the problem of quantum gravity, we can note that the usual methodology is to quantize on at least a fixed manifold structure—i.e. keeping the points well-defined. Then the metric (responsible for the light cone structure: the null cones) is quantized, and thus becomes 'fuzzy'. In twistor theory the situation is reversed: the null cones are well-defined, and the point structure becomes fuzzy.
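This reversal can be stated compactly using the standard twistor formalism (added here for illustration; it is not given in the text). A twistor $Z^\alpha = (\omega^A, \pi_{A'})$ is related to the points $x^{AA'}$ of (complexified) Minkowski space by Penrose's incidence relation:

```latex
\omega^{A} \;=\; i\, x^{AA'}\, \pi_{A'}
```

A fixed point $x$ corresponds to the whole family of twistors incident with it (a line in projective twistor space), while a fixed null twistor corresponds to a null geodesic in Minkowski space. Making twistor space primary and recovering points only via such incidences is what allows the null-cone structure to stay sharp while the point structure goes 'fuzzy' upon quantization.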
The central mathematical idea of the 'twistor' programme is a complexification of spacetime. The idea is to be taken seriously, rather than simply as a heuristic device; naturally, this increases the dimensionality of spacetime (4D real Minkowski spacetime is a subspace of an eight-real-dimensional manifold, namely complexified Minkowski space). In twistor quantization, twistor space forms the arena in which quantum processes occur, and at whose points quantum fields are defined. We leave the subject of twistor theory without going into the formal details—the theory is, mathematically, extremely complicated. Indeed, as Penrose admits ((1999), p. 607), despite its ultimate aim of providing a general framework in which quantum gravity will be formulated, twistor theory has had most success in

123 Chris Isham has been developing an approach based on similar philosophical principles, though one that implements them in a very different way to Penrose. For example, Butterfield and Isham speak of the "danger of certain a priori, classical ideas about space and time being used unthinkingly in the very formulation of quantum theory" ((Butterfield and Isham, 2000), p. 1711). Hence, though both Isham and Penrose propose a revision of spacetime, Isham's approach is more radical than Penrose's: he argues that it might even be necessary to dispense with set theory in order to get what we want: a quantum theory of gravity.


the area of pure mathematics rather than physics. It remains to be seen whether Witten's (2004) integration with string theory bears fruit in physics.

5.6.6.6 Supergravity. Why stop at complex numbers, as in twistor theory? Hypercomplex numbers—numbers built from more than just pairs of real numbers—might be even more useful from the point of view of quantum gravity, since quantum theory involves non-commutativity and hypercomplex numbers have non-commutative products. This connects up to elementary particle physics through the notion of a Grassmann algebra. Supersymmetry increases the number of fields in general relativity. Gravitons get a 'superpartner', the spin-3/2 gravitino. These fermionic fields serve to cancel divergences due to the bosonic fields, thus taming (some of) the infinities that inevitably trouble most field theories. Indeed, the primary aim is to cure these divergences, thus making modified general relativity perturbatively renormalizable. Unfortunately, pure supergravity fails at the 3-loop level—general relativity itself fails at the 2-loop level.

5.6.6.7 Causal Sets. Rafael Sorkin has been developing an increasingly popular approach to quantum gravity known as 'causal set theory'. A causal set is a set C on which there is defined a relation of precedence ≺ that is transitive ((x ≺ y) ∧ (y ≺ z) ⊃ x ≺ z), non-circular ((x ≺ y) ∧ (y ≺ x) ⊃ x = y), and locally finite (|{y | x ≺ y ≺ z}| < ∞ for any x and z). In other words, a causal set is a locally finite partially ordered set (or 'poset'). His approach involves utilizing causal sets within a 'sum over histories' type approach, so that the histories are built from (discrete) causal sets. One can find hints of a programme of this kind in the mid-1950s. D. van Dantzig, for example, suggests, on the basis of the empirical inaccessibility and non-definability of worldpoints, that one deal instead with a "finite set of discrete events with spacetime relations between them" ((van Dantzig, 1956), p.
52). The basic idea of a causal set was known before Sorkin's work on the subject. The application to gravity was briefly considered by 't Hooft (('t Hooft, 1979), p. 340), who was concerned with retaining causality despite losing the metric tensor—he also notes that Lorentz and diffeomorphism invariance can be preserved on a lattice given an appropriately defined causality relation.124 Butterfield (2007) has recently argued, however, that Sorkin's causal set theory fails to satisfy stochastic Einstein locality. The causal set approach, unlike loop quantum gravity, includes spacetime discreteness as an input. The key is to eradicate all traces of continua from the formalism; any such notions are to emerge in an appropriate classical, macroscopic limit. Hence, such things as distance, length, time, and so on are approximate concepts: at a microscopic level, spacetime ceases to mean the same thing as at the macroscopic level; one has only a primitive notion of before and after.

124 There are precursors too in the philosophical and logical literature: see (Reichenbach, 1969) and (Zeeman, 1964) respectively. One can also find the basic idea of treating the causal structure of spacetime as fundamental in Robb's axiomatization of special relativity: (Robb, 1936). For a comparison of these approaches with modern causal set theory, see (Sorkin, 2002).

5.6.6.8 Spin-Foams. There is a covariant (path-integral) version of loop quantum gravity that involves the time-development of spin-networks, a procedure that generates 'spin foams', the cross-sections of which are spin-networks. These are then integrated over, to produce a 'sum over spin-foams' formulation of quantum gravity. The aim of this approach is to resolve the difficulties facing the dynamics of loop quantum gravity: the spin foam formulation deals directly with the dynamics—it is a spacetime formulation of loop quantum gravity, only now the spacetime is the object that is solved for (hence, this is still a background independent approach). Before leaving this section I should mention some very recent work that pertains to the unification of matter and gravity in spin foam models. What this work shows is that some causal spin-foam models possess 'emergent' local degrees of freedom representing particles. Hence, just as string theory has modes of the string corresponding to elementary particles, so does the covariant extension of loop quantum gravity. How such models bear on traditional debates in metaphysics and the philosophy of space and time remains to be seen.

5.6.6.9 Generalized Quantum Mechanics. It seems pretty clear that quantum mechanics is incompatible with the picture of spacetime that general relativity provides us with. Quantum mechanics demands a fixed spacetime geometry. This is essential for the definition of the states, which are given on spacelike hypersurfaces and evolved unitarily onto other hypersurfaces. General relativity does away with this notion, so that any such evolution will be a gauge motion: the spacetime geometry is dynamical.
In a quantum theory of gravity we expect that matters will be even worse, since the geometry, as a dynamical variable, will undergo fluctuations and so will, in general, be without a definite value. Just as particles don't have definite trajectories in quantum mechanics, so spacetime geometry doesn't have a definite trajectory. Hence the incompatibility. This is simply to recap what we have said already, of course. Generalized quantum theory is intended to provide a framework that can cope with such situations. One has to generalize standard quantum mechanics in such a way that it applies to closed systems, so that measurement and the notion of observers don't play a fundamental rôle. This is so that the theory can be applied to the universe as a whole, enabling quantum cosmological considerations, as should be possible in a theory of quantum gravity. One must also generalize the spacetime dependence so that dynamical geometry can be incorporated. The histories in this case, however, do not involve the evolution of quantum systems within spacetime but involve the evolution of spacetime itself: a history, then, could be a spacetime geometry. Essentially one ends up with something similar to the Feynman quantization approach, though one that involves decoherent histories (see §2.2.3 and §2.3.1 of Wallace's chapter for the details of this approach). In any case, we saw earlier, in §5.2.2, that quantum cosmology, though bound up in various ways with quantum gravity, is a very different enterprise.

5.6.6.10 Causaloid Quantum Gravity. Lucien Hardy (2007) has recently begun to work on a new approach to quantum gravity that is intended to provide a 'framework' for quantum gravity theories. The idea is to develop a general formalism that respects the key features of both general relativity, which he takes to be its dynamical (nonprobabilistic) causal structure, and quantum theory, which he takes to be its probabilistic (nondynamical) dynamics. The causaloid (of some theory) is an entity that encodes all that can be calculated in the theory. The causaloid formalism does not depend on background causal structure, and it can handle quantum theory. Given his characterization of quantum gravity as a probabilistic theory with a dynamical causal structure, Hardy argues that this might help in the search for quantum gravity. As he admits, there are many technical hurdles to leap: not least of these is the problem of incorporating general relativity in the formalism (beyond the lack of fixed causal structure, which is necessary but clearly not sufficient).

5.6.6.11 Asymptotically Safe Quantum Gravity. The standard model contains perturbatively renormalizable quantum gauge field theories of the electroweak and strong interactions. General relativity, when set up as a quantum field theory, is not perturbatively renormalizable (in the coupling constant G_N). The approach known as 'asymptotic safety', associated with Martin Reuter (see (Niedermaier and Reuter, 2006)), attempts to avoid these difficulties by modifying the computational (renormalization) strategies, making them nonperturbative: hence, the idea is to produce a nonperturbative renormalization of quantum gravity. 'Asymptotic safety', a term coined in (Weinberg, 1979), refers to the fact that the physical quantities in such a theory will not be affected by divergences (as a cutoff limit is taken).
This approach makes use of Wilsonian renormalization group flow ideas, which tell us how the dynamical behaviour of a system changes as the scale (or energy) is varied. The details are too complicated to go into here: see (Gross, 1999) for an excellent, concise introduction. However, the results here are philosophically interesting for a variety of reasons. For one thing, they appear to imply that quantum general relativity can be held up as a fundamental theory of sorts (i.e. applicable at all energies/distances). This approach also predicts a fractal-like microstructure of spacetime. Both of these aspects, resulting from the intermingling of statistical physics ideas with quantum gravity research, are ripe for philosophical pickings.125

Conclusions

The approaches to quantum gravity that we presented above do not constitute a case of complete underdetermination: they make a number of very different predictions about the world. It is becoming clear that initial pessimistic claims concerning the non-testability of these approaches were not accurate. As we will see in §5.7.3, recent work has shown that there are tests that might be made in the foreseeable future that have the potential to falsify certain approaches—or at least cast them into a shadow of serious doubt. What makes quantum gravity such (philosophical) fun is that if we can make the connections between these approaches and philosophical positions with respect to time, space, and change, then some test will also support or undermine these positions too! It seems, moreover, from what we have seen, that these connections can be forged, leading to the promise of new areas of experimental metaphysics (or, at least, new constraints on old metaphysics).

125 For related work along these lines see: (Volovik, 2003), (Zhang, 2004), and (Laughlin and Pines, 2000).

5.7 Special Topics

In this section we focus briefly on several topics of especial philosophical interest. We look at interactions and cross-fertilization between the methodologies; background independence; the experimental status of quantum gravity research; and the interpretation of quantum theory. The treatment here barely skims the surface and is intended purely as a preliminary pointer to the kind of philosophical work that has been, and could be, carried out.

5.7.1 Interactions and Cross-Fertilization

The battlefield of quantum gravity research is littered with the corpses of once-promising approaches. Some, badly damaged, soldier on. In certain cases one finds approaches resurrected; in other cases their parts are transplanted onto other approaches. Given this apparent patchwork quality to the approaches, one wonders why there isn't more interaction between the leading approaches, string theory and loop quantum gravity, and with the other approaches. Part of this, I expect, can be put down to the fact that they have different aims: string theory aims to be a theory of all interactions, while loop gravity is concerned primarily with the quantization of gravity. There are less charitable reasons one might give: funding demands, arrogance, etc.! Indeed, the field of quantum gravity research has become a rather ugly one in recent years, with string theorists and researchers from other approaches engaged in various 'my theory's better than yours' antics.126 This doesn't seem a healthy way to do physics. As the brief history of quantum gravity highlights, significant advances appeared to occur when researchers were dealing with multiple approaches with an eye to their connections. An important example here, mentioned earlier, is Roger Penrose's work on combinatorial spacetime, in which he attempts to construct both spacetime and quantum mechanics from discrete elements (Penrose, 1971). Now, the approach is certainly not an intrinsic approach: it is based on a quantity, angular momentum, that plays a rôle in both spacetime and quantum mechanics, and that has a discrete spectrum. Hence, some feature of both spacetime and quantum theory is utilized in a central way to develop a notion of quantum space. This basic idea has infected all of the major lines of research and many 'external' approaches. The notion of a spin-network thus functions as a kind of unifying instrument. The ubiquity of such a concept throughout quantum gravity research ought to signal collaboration—however, as philosophers know well, ought does not imply is. In "How to Be a Good Empiricist—A Plea for Tolerance in Matters Epistemological", Feyerabend writes:127

You can be a good empiricist only if you are prepared to work with many alternative theories rather than with a single point of view and 'experience'. This plurality of theories must not be regarded as a preliminary stage of knowledge which will at some time in the future be replaced by the One True Theory. Theoretical pluralism is assumed to be an essential feature of all knowledge that claims to be objective. ... The function of such concrete alternatives is, however, this: They provide a means of criticizing the accepted theory in a manner which goes beyond the criticism provided by comparison of that theory 'with the facts'. ... This, then, is the methodological justification of a plurality of theories: Such a plurality allows for a much sharper criticism of accepted ideas than does the comparison with a domain of 'facts' which are supposed to sit there independently of theoretical considerations. (Feyerabend 1968, pp. 14–5)

126 One can find this in various books, blogs, and debates: see, for example, http://motls.blogspot.com and http://www.math.columbia.edu/~woit/wordpress.

This bears many similarities with Chamberlin's 'Method of Multiple Working Hypotheses' (1965). There the idea is that the use of several methods produces a situation in which "the re-action of one hypothesis upon another tends to amplify the recognized scope of each, and their mutual conflicts whet the discriminative edge of each" (Chamberlin 1965 (1890), p. 756). Feynman too expressed a similar sentiment in his Nobel address:

For different views suggest different kinds of modifications which might be made and hence are not equivalent in the hypotheses one generates from them in one's attempt to understand what is not yet understood. I, therefore, think that a good theoretical physicist today might find it useful to have a wide range of physical viewpoints and mathematical expressions of the same theory ... available to him. This may be asking too much of one man. Then new students should as a class have this. If every individual student follows the same current fashion in expressing and thinking about electrodynamics or field theory, then the variety of hypotheses being generated to understand strong interactions, say, is limited. Perhaps rightly so, for possibly the chance is high that the truth lies in the fashionable direction. But, on the off-chance that it is in another direction—a direction obvious from an unfashionable view of field theory—who will find it? (http://nobelprize.org/nobel_prizes/physics/laureates/1965/feynman-lecture.html)

127 However, Feyerabend probably isn't the best spokesperson to pick for the point I am making here, since he famously argued that even Galileo resorted to chicanery and propaganda such as one finds amongst quantum gravity researchers. However, the following quote serves my purposes well. Indeed, since there are no such 'facts' (by which he means empirical facts), it would seem that the pluralism of which he speaks is demanded all the more.

In my view, part of what lies behind the present split in quantum gravity research is a detachment from the historical origins and early trajectory of such research. Never before has there been such a gap between the origins of a research topic and its initial formulation. Many of the early pioneers are no longer with us. As I mentioned above, these early pioneers worked on multiple lines of attack, because they saw that one had to try almost anything at that stage. The problem is, these lines have solidified into well-defined paths that no longer seem to intersect—or if they do, it is through dense thickets. In the early period there was no sense that they were on the right path,128 but this is no longer true: a large proportion of string theorists believe they are absolutely on the right track, and so (it has to be said) do many loop theorists. There is no evidence to decide between them or to tell us which, if any, really is right. Given this state of affairs, it makes no (rational) sense to exclude other approaches; certainly not as a matter of ideology. However, this kind of prescriptive preaching has a tendency to backfire: perhaps the ideological (even dogmatic) way in which researchers are glued to their pet programmes, despite the lack of evidence, might very well lead to fruitful results—Thomas Kuhn (1964) argued persuasively for a certain amount of dogma in the practice of normal science. The difference is: quantum gravity is not yet normal science; that is, there is no single theory that is pursued in virtue of a consensus amongst physicists that they have the right theory. Time will tell whether the current way of doing physics works: I doubt it, though. At the very least, I'm sure it would be far more fruitful if there were more cooperation.

5.7.2 Background Structure

It is often claimed that the novelty of general relativity lies in its (manifest) 'background independence'.
However, background independence is a slippery concept, apparently meaning different things to different people. The 'debate' between strings and loops on the issue of background independence is severely hampered by the fact that there is no firm definition of background independence on the table. The two camps are almost certainly talking past each other when discussing this issue. It certainly does seem reasonable to assume that in order to reproduce a manifestly background independent theory like general relativity, a quantum theory of gravity should be background independent too. However, so far as I know, there is no proof of this. The problem might be that background independence simply isn't a formal property of theories—Gordon Belot (2007) has recently argued that background independence is an interpretive matter.

The debate over background structure is focused around the status of the metric. The metric is a variable that describes the geometry of spacetime. In non-generally relativistic theories it is fixed to a single value assignment for all of the theory's models, constraining the motion of light, particles, and fields, but not itself being in any way affected by these motions. In general relativity, of course, this all changes: the metric is a dynamical variable, which implies that the geometry of spacetime is dynamical. Einstein's insight was to see that this was able to account for gravitational interactions. As Carlo Rovelli expresses it: "What Einstein has discovered is that Newton had mistaken a physical field for a background entity. The two entities hypostatized by Newton, space and time, are just a particular local configuration of a physical entity—the gravitational field—very similar to the electric and the magnetic field" ((Rovelli, 2006), p. 27). In other words: "Newtonian space and time and the gravitational field are the same entity" (ibid.). However, most of physics ignores this lesson and freezes the metric to a single value once again, in order to simplify calculations. Such a manoeuvre is called a background field method. We saw that some approaches to quantum gravity adopted such a methodology; but it can only ever be a stopgap. In some cases it is appropriate (when G is small), but certainly not in the cases that we would expect a theory of quantum gravity to deal with (extreme gravitational situations, such as the end states of quantum black holes and the initial phase of the universe, close to the Big Bang).

There are certain immediate obstacles that we face as soon as we contemplate a full quantization of a background independent field theory like general relativity. The dynamical nature of the metric field, coupled with its dual rôle, points to a notion of quantum geometry. This quantum geometry differs greatly from other quantum field theories. For example, one of the axioms of quantum field theory (in Minkowski spacetime) is that spacelike separated fields commute:

\[
[\hat{\phi}(x), \hat{\phi}(y)] = 0 \quad \forall x, y \in M,\ I(x, y) < 0
\tag{5.40}
\]

128 This is not true of the very earliest attempts based on the electromagnetic analogy, where it was believed that the quantization of gravity would be a relatively trivial matter.

That is, whatever field measurements we make at the spacetime point x cannot influence the field values at y whenever these points are at spacelike separation—this still applies if we substitute 'regions' for 'points'. Now consider what happens when we turn the metric field into a quantum gravitational field:

\[
[\hat{g}(x), \hat{g}(y)] = 0 \quad \forall x, y \in M,\ I(x, y) < 0
\tag{5.41}
\]

In this case ĝ is a quantized metric field, and the metric field determines spacetime geometry as well as the gravitational field. Without a definite metric in hand we cannot say that a pair of points or regions are spacelike separated: such a notion is determined by the metric, which is itself determined by field equations. Hence, we get a fuzzy notion of causal structure in this case, and the axiom can't hold: it assumes that the points of spacetime have some independent meaning, when according to general relativity they do not—see (Weinstein, 2000) for a discussion of this problem. This is why background dependent methods are clung to, despite their failings: they provide useful machinery to aid theory construction.


We face a series of questions when considering background independence: What exactly is it? Why is it considered to be an important principle? What theories incorporate it? To what extent do they incorporate it? In particular, is string theory background independent? If 'clues' from the duality symmetries of M-theory are anything to go by, it looks like string theory might be even more background independent than loop quantum gravity, for the dimensionality of spacetime becomes a dynamical variable too (cf. (Stelle, 2000), p. 7). Indeed, various string theorists claim that their theory is background independent. In many cases it seems that they have a different understanding of what this entails than loop quantum gravity researchers—this takes us back to the first, definitional, question. In particular, they seem to think that the ability to place a general metric in the Lagrangian amounts to background independence. This falls short of the mark for how we are understanding it here, namely as a reactive, dynamical coupling between spacetime and everything else. Though one can indeed place a variety of metrics in the stringy Lagrangian, one does not then vary the metric in the action. There is no interaction between the strings and spacetime. Indeed, this is not really distinct from the quantum field theory of point particles in curved spacetimes: the same freedom to insert a general metric appears there too.

There is an alternative argument for the background independence of string theory that comes from the field-theoretic formulation of the theory: string field theory. The idea is that classical spacetime emerges from the two-dimensional conformal field theory on the string's worldsheet. However, one surely has to say something about the target space, for the worldsheet metric takes on a metric induced from the ambient target spacetime.
Yet another argument for the background independence of string theory might point to the fact that the dimensionality of spacetime in string theory has to satisfy an equation of motion (a consistency condition): this is how the dimensionality comes out (as 26 or 10, depending on whether one imposes supersymmetry). One contender for a definition would then count a structure as non-background just in case it is dynamical, in the sense that one has to solve equations of motion to get at its values. The problem with this, however, is that the dimensionality is the same in all models of the theory, yet we expect background independent theories to be about structures that vary across models. Whether background independence will continue to be the divisive principle that many take it to be is not clear. It seems that all of the main approaches are converging on background independence. This includes string theory, as mentioned. For example, Michael Green, a string theorist, writes:

One of the most obvious problems is that all the suggested microscopic models of M theory are background dependent. A truly background-independent formulation would not make a distinction between the target space and the embedded objects—both concepts should emerge from a novel kind of quantum geometry. ((Green, 1999), p. A99)


Lee Smolin (Smolin, 1999) acknowledges that recent results on the nonperturbative (strong coupling) aspects of string theory point to the existence of an underlying background independent version of string theory known as M-theory. One reason for its alleged background independence is that it includes multiple perturbative string theories defined with respect to a multitude of different backgrounds. There are, as we saw earlier, (duality) symmetries linking these various spacetimes, much as diffeomorphism invariance links the various spatial slices related by the constraints in the canonical approach. The issues here are subtle and complex, and philosophers have begun to consider them. The central problem faced, as a philosopher, when trying to make sense of claims such as these is that there is no solid, unproblematic definition of background structure (and therefore of background independence and dependence) on the table. Without this, one simply cannot decide who is right; one cannot decide which theories are background independent and which aren't. Hence, an urgent issue in both physics and the philosophy of physics is to work out exactly what is meant by 'background independence' in a way that satisfies all parties. Until this is achieved, background freedom cannot be helpfully used to distinguish the approaches, nor can we profitably discuss its importance. A serious attempt to define background independence in such a way as to make these tasks possible has been made in (Giulini, 2007)—I refer the reader to this excellent article for a review of the various ways of making sense of background independence.

5.7.3 The Experimental Situation

As Carlip writes in his recent review of quantum gravity: "The ultimate measure of any theory is its agreement with Nature; if we do not have any such tests, how will we know whether we are right?" ((Carlip, 2001), p. 885). Quantum gravity, of course, reverses the conventional way of doing physics—Green, Schwarz, and Witten write that "[q]uantum gravity has always been a theorist's puzzle par excellence" ((Green et al., 1987), p. 14). Generally, when attempting to construct a new theory to deal with some novel phenomena, physicists have some experimental data at their disposal to which they will attempt to fit phenomenological models, hopefully leading to a predictive general theory. What is interesting about this methodology, looking at various crucial historical episodes, is that, often, conceptual and formal consistency are bypassed in a bid to gain a good fit to reality. By contrast, quantum gravity is almost entirely based on conceptual and formal consistency, along with constraints imposed by background knowledge and theories. It has, from the beginning, seemed to be out of bounds as far as experimental probing is concerned. This, more than anything else, gave quantum gravity research a bad name (and still does, though to a lesser extent): the litmus test of any scientific theory is an experimental test. Without this one is dabbling in pure mathematics, or worse, metaphysics! Experiment is what makes us willing to brand a theory a physical theory. Yet the very possibility of experiments to observe quantum gravity effects seems to be ruled out on simple dimensional grounds: the natural unit of energy in quantum gravity is \(\sqrt{\hbar c^5 / G} \approx 10^{22}\) MeV, but even the Large Hadron Collider [LHC] will only reach energies of the order 14 TeV (for combined energies of collided particles)—see ('t Hooft, 1979), p. 328 for a stereotypical example of this pessimistic argument. However, there have been a number of recent developments—beginning in the late 1990s with the work of Giovanni Amelino-Camelia—that indicate that the initial pessimism concerning the testability of the various quantum gravity theories might be unfounded.129 Indeed, these developments have spawned a whole new research programme called 'quantum gravity phenomenology'. This work, if successful, will transform quantum gravity research into a genuine experimental discipline—it might well wreck the chances of all of the leading theories too.

Firstly, let us recall the reasons behind the pessimism. The scale at which quantum gravitational effects are supposed to appear is set by the various physical constants of fundamental physics: ħ, c, and G. These characterize quantum, relativistic, and gravitational phenomena respectively. Hence, quantum gravitational phenomena will involve all three. By combining these constants we get the units of length, time, and mass/energy at which the effects of quantum gravity should make themselves manifest. We gave some of these earlier, but we shall repeat them for convenience:

• Planck length: \(l_p = \sqrt{\hbar G / c^3} \approx 1.62 \times 10^{-33}\) cm
• Planck time: \(t_p = \sqrt{\hbar G / c^5} = l_p / c \approx 5.40 \times 10^{-44}\) s
• Planck mass: \(M_p = \sqrt{\hbar c / G} = \hbar / (l_p c) \approx 2.17 \times 10^{-5}\) g (Planck energy ≈ \(1.22 \times 10^{19}\) GeV)

These are very many orders of magnitude beyond current experimental capabilities.130 They would appear to lie beyond our capabilities in the distant future too. Hence, the pessimism. However, the pessimism seems to have been unwarranted. The problem with the scale argument, as we noted earlier, is that it applies only to individual quantum gravitational events.
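The Planck-scale figures just listed follow from simple dimensional combination of the three constants, and are easy to verify numerically. A quick sketch (the constant values are standard rounded figures; the comparison with the LHC is illustrative):

```python
import math

hbar = 1.054_571_8e-34    # reduced Planck constant, J s
c = 2.997_924_58e8        # speed of light, m / s
G = 6.674_30e-11          # Newton's constant, m^3 kg^-1 s^-2

l_p = math.sqrt(hbar * G / c**3)   # Planck length: ~1.62e-35 m, i.e. 1.62e-33 cm
t_p = l_p / c                      # Planck time:   ~5.39e-44 s
m_p = math.sqrt(hbar * c / G)      # Planck mass:   ~2.18e-8 kg, i.e. 2.18e-5 g

# Planck energy in GeV (1 eV = 1.602176634e-19 J): ~1.22e19 GeV
E_p_GeV = m_p * c**2 / 1.602_176_634e-19 / 1e9

print(f"l_p = {l_p:.2e} m, t_p = {t_p:.2e} s, "
      f"m_p = {m_p:.2e} kg, E_p = {E_p_GeV:.2e} GeV")
```

Dividing the Planck energy by the LHC's roughly 1.4 × 10⁴ GeV makes the 'hence, the pessimism' point vivid: the shortfall is about fifteen orders of magnitude.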
The key insight of quantum gravity phenomenology is to combine such events, and so generate amplified effects that can be detected with present-day equipment, or equipment that can conceivably be constructed now or in the near future. Hence, the irony of quantum gravity phenomenology is that, not only does one not attempt to (directly) probe the very small distances normally associated with quantum gravity, one goes to the farthest reaches of the opposite end of the scale spectrum, using astronomical features as probes. This can be achieved by observing various systems: cosmic rays, gamma-ray bursts, kaon bursts, particles, light, and the cosmic background radiation. One looks for ways that quantum gravitational effects might manifest themselves in these systems, given that they are such that Planck scale effects will have been able to become amplified in them (because of the very large energies or the vast time/distance-scales they involve—e.g. photons that have travelled vast distances, and so might have been cumulatively modified by minuscule Planck-scale effects).131 That does not mean we can produce the necessary effects ourselves, in experimental devices on Earth. Instead, one utilizes various 'natural experiments' that the universe itself provides us with. For example, there are particles that have been travelling across vast distances at tremendous velocities, far greater than we could manage. Hence, by trading in control, we can help ourselves to phenomena that can be utilized to test the various proposals.

That quantum gravitational effects will not be measurable on individual elementary particles is intuitively quite clear. Bryce DeWitt devised rigorous arguments to show this to be the case: the gravitational field itself does not make sense at such scales. He showed that the static field from such a particle (with a mass of the order 10⁻²⁰ in dimensionless units) would not exceed the quantum fluctuations; the static field dominates only for systems with masses greater than 3.07 × 10⁻⁶. The gravitational field is from this viewpoint an 'emergent' "statistical phenomenon of bulk matter" ((DeWitt, 1962), p. 372). DeWitt points out that the continuum picture "must persist [even at 10⁻³² cm] if the general coordinate transformation group is really fundamental" ((DeWitt, 1962), p. 373). That is, the diffeomorphism group depends on the manifold.

129 The tests devised by Amelino-Camelia and others, however, are doable with present technology. The methodology is the same as Unruh's: don't even think about trying to probe individual events at the Planck scale; instead, look for events that amount to an amplification of Planck scale effects. For a selection of recent articles on quantum gravity phenomenology, see (Amelino-Camelia and Kowalski-Glikman, 2005)—a more recent, highly readable review is (Lämmerzahl, 2007).

130 If the Planck mass looks like rather a reasonable figure, note that one needs to consider it concentrated in a volume (roughly) with sides equal to the Planck length in order to produce quantum gravitational effects.
One might wonder how this squares with the canonical approaches, loop quantum gravity in particular. However, as DeWitt goes on to remark, there need be no conflict since there will be a natural cutoff that allows us to ‘ignore’ distances below this. I quote him at length: Such a “cutoff” would, of course, eliminate the ultraviolet divergences of field theory and establish a fundamental role for gravitation in elementary particle physics. Moreover, the existence of a “cutoff” at this wavelength is not obviously incompatible with the success of modern field theory in correlating experimental data. ((DeWitt, 1962), pp. 373–4). 131 Very recent results from the MAGIC (Major Atmospheric Gamma-ray Imaging Cherenkov) telescope might constitute a genuine example of quantum gravity phenomenology: (MAGIC Collaboration, 2007) (see also http://magic.mppmu.mpg.de). The data reveals an energydependent time delay in the photons from the active galaxy Markarian 501—that is, it appears that the photons have different arrival times despite having the same departure times. The best fit of the parameter M under the hypothesis Δc/c = −E/M (where E is the energy of photons) is M = 0.4 × 1018 Gev (i.e. the Planck mass). Whether or not this is due to some quantum gravity based corrections to the propagation of particles in a discrete, fluctuating spacetime is not yet clear. However, it does seem to indicate that quantum gravity effects are indeed measurable using currently available technology. (I thank Carlo Rovelli for this observation.)


QUANTUM GRAVITY: A PRIMER FOR PHILOSOPHERS

[I]t must constantly be borne in mind that the “bad” divergences of quantum gravidynamics are of an essentially different kind from those of other field theories. They are direct consequences of the fact that the light cone itself gets shifted by the non-linearities of the theory. But the light-cone shift is precisely what gives the theory its unique interest, and a special effort should be made to separate the divergences which it generates from other divergences. ((DeWitt, 1962), p. 374).

It may ... be worth while to make strong attempts to link the apparent “internal” spaces more directly to the ordinary four-dimensional spacetime of everyday experience, even at the risk of resurrecting some long-abandoned so-called “unified field theories” in modified or generalized form. The present unattractiveness of theories of this type is due at least in part to the lack of a quantum formalism for them. If the quantization programme for gravitation can be successfully pushed through then these theories may become more attractive. ((DeWitt, 1962), p. 373).

The proposed effects are expected to flow from the apparently generic modification of spacetime that quantum gravity will herald. As mentioned, there have been proposals that essentially involve using the universe as an experimental device, by exploiting its age and the ages of certain processes that have been occurring within it. One idea is that, if spacetime is discrete, light will discernibly change its properties over vast distances of travel. The graininess of spacetime causes birefringent effects in the observed light (which has travelled over large distances)—see (Gambini and Pullin, 1999) for an elementary review. According to Maxwell’s equations, defined on continuous spacetime, the speed of light in vacuo is independent of its wavelength λ. The expectation is that this will be false if spacetime has a discrete, polymer-like structure. The basic idea stems from the fact that a wave propagating across a discrete lattice will violate Lorentz invariance: this symmetry breaking is then viewed as a ‘probe’ with which to test quantum gravity proposals. Since many approaches involve spacetime (or just spatial) discreteness, this might well afford an experimental test. However, spacetime discreteness is not itself a sufficient condition for Lorentz non-invariance: causal sets are discrete structures that do not seem to violate it. Hence, such evidence would still leave us ignorant on many crucial ‘theory-selection’ matters. This should not inhibit experiment, of course. While these experiments would, if successful, generate some qualitatively novel phenomena, the results would be generic—amounting to what Smolin (forthcoming) calls ‘soft predictions’. That is, they would not allow us to choose between a group of theories that all postulate spacetime discreteness. It is possible, though, that one such approach will come up with a particularly accurate specific value for the effect, and this would then be a step towards theory adoption.
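The scale of the effect that such time-of-flight observations look for can be estimated with a few lines of arithmetic. The sketch below assumes the linear modified dispersion Δc/c = −E/M quoted in footnote 131, together with an illustrative round-number distance for a Markarian 501-like source; the numbers are order-of-magnitude only, not the MAGIC collaboration’s actual analysis:

```python
# Back-of-envelope estimate of the energy-dependent arrival delay
# produced by a linear Planck-scale dispersion modification,
# Delta-c/c = -E/M. All numbers are illustrative assumptions.

c = 2.998e8        # speed of light, m/s
Mpc = 3.086e22     # metres per megaparsec

D = 140 * Mpc      # rough distance to a Mkn 501-like source (z ~ 0.034)
M = 0.4e18         # fitted mass scale in GeV, as quoted in footnote 131

def delay(E_gev):
    """Extra travel time (seconds) of a photon of energy E_gev (GeV)
    relative to the E -> 0 limit, to first order in E/M."""
    return (E_gev / M) * (D / c)

# Relative delay between a 1 TeV photon and a 0.25 TeV photon
# emitted simultaneously:
dt = delay(1000.0) - delay(250.0)
print(f"relative delay ~ {dt:.0f} s")
```

Delays of tens of seconds accumulated over a ~140 Mpc baseline are what make these ‘natural experiments’ competitive despite the minuteness of E/M for any individual photon.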
Finally, let us consider the issue of the importance of experiment over mathematical rigour. Werner Heisenberg was skeptical of ‘overly rigorous’ mathematical methods in physics. Some of his remarks on this issue are highly applicable to quantum gravity research:

When you try too much for rigorous mathematical methods, you fix your attention on those points which are not important from the physics point and thereby you get away from the experimental situation. If you try to solve a problem by rather dirty mathematics, as I have mostly done, then you are forced always to think of the experimental situation; and whatever formulae you write down, you try to compare the formulae with reality and thereby, somehow, you get closer to reality than by looking for the rigorous methods. ((Heisenberg, 2005), p. 106)

On the issue of phenomenological methods, however, he was equally cold:

When you get into such a new field, the trouble is, that with phenomenological methods you are bound always to use the old concepts; because you have no other concepts, and making theoretical connections means then applying the old methods to this new situation. Therefore the decisive step is always a rather discontinuous step. You can never hope to go by small steps nearer and nearer to the real theory; at one point you are bound to jump, you must really leave the old concepts and try something new and then whether you can swim or stand or something else, but in any case you can’t keep the old concepts. ((Heisenberg, 2005), pp. 106–7)

Heisenberg’s view of theory-development is more than a little Kuhnian,132 then. It is highly philosophical too: it blends interpretive issues with constructive issues. For example, he goes on to say: [I]n quantum mechanics ... first we had the mathematical scheme, and then, of course, we had to try to use a reasonable language in connection with it. Finally we could ask what concepts does this mathematical scheme imply and how do we have to describe nature? ((Heisenberg, 2005), p. 107)

Perhaps part of the problem is that many of the old guard who helped found quantum mechanics are now dead. Constructing quantum mechanics was a conceptually taxing venture: it made philosophers of many of those involved—though many were already of a philosophical bent, of course. Quantum gravity research is similar in this way, only the time scales involved in its development are much longer.

5.7.4 Interpretation of Quantum Theory

I have largely steered clear of issues to do with the interpretation of quantum theory here. However, since in most cases we are dealing with quantum theories—in some cases quantum theory will be emergent—the old troubles will reappear in this context. The question is: are there additional worries or complications that quantum gravity might bring to the interpretation of quantum theory? The clear answer seems to be Yes. Recall, for example, that Roger Penrose believes that gravitational effects have an important rôle to play in quantum theory. Indeed, he thinks that the rules of quantum theory will have to be modified in the light of such effects. Note that this is not a general solution of the measurement problem: it applies only to superpositions of those observables that are relevant to gravitational distortions—mass, location, shape, etc. That there will be implications for the interpretation of quantum theory has been known for some time. This is the case in quantum cosmology too. As John Wheeler explains:

The quantity ψ [the universe’s wave-function–DR] ... is a probability amplitude for something—but for what? and “observed” by whom? ... There is no platform outside the universe on which to stand to observe it. There is no possibility by external observations to produce uncontrollable disturbances and thereby transitions from one quantum state to another. It is not even clear that there is any but a unique quantum state. The tools most useful in other dynamical systems to distinguish one quantum state from another are completely missing here. “Total energy” is a term without meaning for a closed universe. ((Wheeler, 1964a), p. 517)

132 Indeed, there are many points of overlap between the views of these two men on the subject of scientific development.

In the first article from his remarkable triptych on quantum gravity, Bryce DeWitt argues that quantum gravity is best interpreted à la Everett:

Everett’s view of the world is a very natural one to adopt in the quantum theory of gravity, where one is accustomed to speak without embarrassment of the ‘wave function of the universe’. It is possible that Everett’s view is not only natural but essential. ((DeWitt, 1967c), p. 1141)

I have yet to see an argument to the effect that Everett’s approach is necessary in quantum gravity. One can think of modal interpretations doing the job, or perhaps Bohmian mechanics—though the close relationships between these interpretations are well known. One thing does seem certain: the Copenhagen orthodoxy is ineffective in such a context, since there is no (external) classical observer available. But how true is this? Firstly, let us see what motivates the belief. The idea is that the universe, taken as an individual system in its own right, is a closed system—the only genuinely closed system there is, since quantum entanglements spread. This spread terminates at the level of the universe, making it a valid object for quantum theory: the universe is a quantum system too. Now, this is the domain of quantum cosmology, and this is clearly what DeWitt has in mind.133 The connection to quantum gravity results from the fact that the gravitational interaction dominates at cosmological scales. If the universe is indeed a quantum object, and gravity is in operation here, then quantum cosmology will involve, in some sense, a quantum theory of gravity. However, collapse interpretations in general do not seem to be at odds with the notion of a wave-function of the universe: a spontaneous collapse interpretation—such as Penrose’s, which utilizes the instabilities of a superposition of geometries—requires only ‘internal’ processes. It also seems likely that there will be decoherence effects between the geometry and matter, resulting in an ‘effective’ collapse. Moreover, as mentioned earlier, quantum gravity is a separate enterprise from quantum cosmology: one can talk of the measuring of a spatial region (e.g. (Rovelli, 2004)), and bring in all of the machinery and baggage of the Copenhagen interpretation if one is so inclined. We should, then, be careful with claims that quantum gravity has a bearing on the interpretation of quantum mechanics (functioning, as some have suggested, as a kind of ‘selection principle’ on the class of interpretations). However, this issue—working out the relationships between quantum gravity proposals and interpretations of quantum theory—would seem to be an excellent one for philosophers of physics to be tackling now.

133 Indeed, the context of the above quotation is the quantization of a closed Friedmann universe containing matter—the first example of quantum cosmology.

5.8 Final Remarks: What Will the Future Bring?

For what it’s worth, I’ll finish by giving my own views on the status of quantum gravity, and on what I see happening. We have seen that the experimental situation is not quite as dim as is often made out. However, the kinds of experiments that are on the table so far tend to be of the ‘generic’ sort: they are not very useful for ‘weeding out’ approaches, but they may enable the demonstration of some of the quantum properties of the gravitational field (or their non-existence!) and so put an end to the steady flow of papers asking whether we really need a quantum theory of gravity. Experiment will also provide values that the various approaches can aim for. But once we have secured this initial empirical basis, and in the absence of further precision tests to separate out the various approaches, non-experimental factors may come into play as regards theory-selection—notice, for example, how string theory is already becoming entrenched. However, it is also possible that the near future will bring more ‘approach-specific’ tests. There is some hope that the energies generated by the LHC will be sufficient to test more specific features of the approaches to quantum gravity: extra dimensions, supersymmetry, microscopic black hole behaviour, and so on. But one can all too readily envisage situations in which approaches are tweaked to fit. If supersymmetry is discovered to operate, then string theory passes what ought to be a ‘crucial test’—or as crucial as tests can be, given the ‘holistic’ lessons of Duhem and Lakatos. But this would not vindicate string theory, for it does not uniquely mesh with supersymmetry or, indeed, extra dimensions. The most likely situation is that one approach will prove to be more heuristically useful—at suggesting promising research and experimental directions, and so on. Hence, we may end up with a family of approaches that match each other experimentally, each claiming compatibility with the results—if not quite able to predict them.
This does not imply that they are all ‘equally good’: selection often happens on the basis of more than experimental evidence alone. I don’t think this necessarily implies that the extra-experimental factors are not perfectly rational ones. Lee Smolin, and many other members of the canonical camps, believe that the future will bring a background independent theory of quantum gravity—though perhaps not loop gravity. One of the arguments Smolin presents is that it would be a step backwards to move to background dependent theories after we have already been presented with Einstein’s background independent theory of gravitation. Since gravitation is universal, it infects all other interactions (it can’t be screened off), so background independence should apply across the board. But this isn’t a good argument. The key problem is that it clashes with certain events from the history of physics, as bundled in the pessimistic meta-induction argument. Recall the twists and turns in the fortunes of æther theory, or the wave theory, or the particle theory. The wave versus particle debate is particularly interesting here, since its resolution involved a tying together of the concepts of quantum theory and special relativity. Can we expect a similar duality between background dependence and independence? I, at least, don’t see how any sense can be made of such an idea. Background independence appears to be being taken on board by all parties, string theorists included. We need to know how to interpret such theories: without any background, how does one understand the ontology? That is of more interest to philosophers perhaps, but there are related technical issues as well: how does one compute physical quantities, scattering matrices, and so on? More serious, however, is the problem of working out what background independence means! What will happen in terms of the research environment? If there is to be progress, I think it is inevitable that cross-fertilization across the camps will have to begin.
Smolin’s book The Trouble with Physics (2006), though construed by most string theorists as a venomous attack, at least triggered some dialogue, if only negative at this stage. That book was important in that it provided a critical view of string theory, something that has been lacking because of the insular and difficult nature of contemporary quantum gravity research: string theorists don’t want to challenge their theory (given various academic pressures and so on), and non-string theorists usually don’t understand string theory well enough to give a confident, reasoned critique.

What of ‘traditional’ philosophy of science issues, such as the realism/anti-realism debate? I see no reason to suppose that quantum gravity will have an impact on this debate beyond that of quantum theory, relativity theory, and gauge theory taken separately. There are, for example, issues with quarks to do with their observability (or not) and the fact that they cannot be isolated, and so on. Likewise, our evidence for the existence of extra dimensions, dark energy, and so on will be indirect too.134 More indirect? This question leads to an unhelpful continuum of ‘directness’ that I think we should not pursue.

Although I have tried to remain reasonably impartial in this review, I should nail my colours to the mast as regards my favoured approach. For several reasons I favour loop quantum gravity and its covariant spin foam extension. But the more I learn about string theory, the more I can appreciate the claims of its beauty and breadth. Moreover, one cannot help but notice the many similarities between strings, loops, and the other approaches. Such personal attachments are bound to colour one’s judgement and allegiances. But solidified allegiance really ought to wait for experiment to catch up with theory, at least a little. String theory is already being pursued by its adherents as if it were ‘normal science’, a paradigm set in stone.135 The revolution in physics remains incomplete. It is, I think, a puzzle that will remain with us for much of the 21st century and will fundamentally alter the way we think about the world. Philosophers would do well to catch up while the pieces of the puzzle are still manageable!

134 In this primer the focus has been mainly on space, time, and spacetime, as opposed to matter. Of course, quantum gravity will have things to say about matter too—more or less, depending on the approach taken. However, the choice to limit discussion of matter was guided both by word count and by the fact that most approaches to quantum gravity shelve discussions of matter too.

5.9 Resources and Further Reading

Here I supply information that will aid the reader in getting to grips with quantum gravity.

On-Line Talks and Lectures

It is a wonderfully exciting time in terms of learning physics: one can learn mathematics and physics from some of the present masters. There is a wealth of on-line lectures, courses, and seminars.136 An annotated list of some of the best of these follows:

• A rapidly evolving database of talks on quantum gravity and surrounding research areas from the Perimeter Institute can be accessed at http://www.perimeterinstitute.ca/en/Scientific/Seminars/PIRSA/.
• A vast repository of talks, many pertaining to quantum gravity, can be found at the Kavli Institute for Theoretical Physics’ website: http://dougpc.itp.ucsb.edu/online.
• Talks from 2001: A Spacetime Odyssey can be found at http://pauli.physics.lsa.umich.edu/w/arch/som/sto2001/Lectures.html.
• African Summer Theory Institute, featuring a variety of excellent introductory lectures leading up to quantum gravity—http://www.asti.ac.za/lectures.php.

135 There are two reasons for this, so far as I can see: (1) according to string theorists, quantum gravity demands unification of gravity with matter, and since string theory is the only consistent framework that achieves this, it has to be right; (2) according to string theorists, quantum gravity must be a theory of everything, and if this is the case, it must be unique. Neither claim is correct, of course, as the alternative approaches reveal.

136 Of course, though working at the time of writing, these links are not guaranteed to be permanent.


• Video of talks from a summer institute on Gravity in the Quantum World and the Cosmos can be found at http://www.conf.slac.stanford.edu/ssi/2005/program.htm.
• A vast array of talks, including many on quantum gravity and related areas—http://webcast.cern.ch/Projects/WebLectureArchive/index.html.
• Some high quality quantum gravity talks from seminars and conferences can be found at the Pacific Institute of Theoretical Physics—http://pitp.physics.ubc.ca/upcoming/index.html.
• Video of talks from a summer school on string theory, gravity and cosmology—http://www.pims.math.ca/science/2003/fmp/.
• String theorists hold an annual Strings conference; talks from several years have been uploaded:
∗ http://www.damtp.cam.ac.uk/strings02/speak.html.
∗ http://www.yukawa.kyoto-u.ac.jp/contents/seminar/archive/2003/str2003/speakerspro.html.
∗ http://www.fields.utoronto.ca/programs/scientific/04-05/string-theory/strings2005/public_talks.html.
∗ http://gesalerico.ft.uam.es/strings07/060_marco.htm.
• An excellent series of talks from the two recent loop quantum gravity conferences (modelled on the Strings conferences), Loops ’05 and Loops ’07, can be found here:
∗ http://www.matmor.unam.mx/eventos/loops07.
∗ http://loops05.aei.mpg.de/index_files/Programme.html.

Reading

My advice to those who are coming to quantum gravity afresh, or who are fairly new to the area, is to read the early papers on the subject, those mentioned in §5.4. Launching into most contemporary research articles in this area is liable to scare one off. The early papers, especially those which introduce some idea for the very first time, are far easier to make sense of, and will inevitably give one a better physical grasp of an idea. There are also several review papers that provide relatively easy access.
There have recently appeared some general textbooks and collections that contain material at a fairly elementary level: (Ehlers and Friedrich, 1994; Kiefer, 2007; Stamatescu, 2007; Fauser et al., 2007; Giulini et al., 2003; Kowalski-Glikman, 2000). Specific textbooks on the more heavily researched approaches are also available, mainly on string theory and canonical quantum gravity. An excellent introduction to loop quantum gravity, and to canonical approaches more generally, is (Rovelli, 2007)—this book also includes some good discussions of philosophical issues. The best introduction to string theory, for newcomers, is (Becker et al., 2007). Compared to the wealth of material available on the interpretation of quantum mechanics, there is very little material on the philosophical aspects of quantum gravity. However, if one looks hard enough, there is some to be found. Three books on the subject are: (Ashtekar and Stachel, 1991; Callender and Huggett, 2001; Rickles et al., 2006).

Stephen Hawking once remarked that each mathematical formula in a popular science book halves the number of sales. I think that this pessimism, though it might have been right at the time, is no longer justified. There is a new brand of popular science book, generally written, like Hawking’s A Brief History of Time, by ‘insiders’. These books are not scared to include equations, though they are usually quick to explain their meanings. The first book of this kind was probably Roger Penrose’s The Emperor’s New Mind (2002). Penrose has since written a new ‘popular’ book, The Road to Reality: A Complete Guide to the Laws of the Universe (2007), that contains more mathematics (and more complex mathematics) than many physics research monographs. This book would make an excellent companion whilst trying to get to grips with quantum gravity, and indeed it contains introductory material on many of the approaches, delivered with the touch of a true master expositor. There are a few more excellent popular accounts of quantum gravity, written by quantum gravity researchers. In order to get the ‘big picture’ I highly recommend Lee Smolin’s two books: (Smolin, 2002; 2006). Also, Julian Barbour’s book The End of Time (2001) is an excellent account of the conceptual problems involved in quantum gravity. For a superb account of string theory, see Brian Greene’s two books: (Greene, 2000; 2005).

Acknowledgements

I would like to thank Julian Barbour, Jeremy Butterfield, Joe Polchinski, and Carlo Rovelli for their helpful comments and suggestions. Steven Weinstein deserves special thanks for his thorough reading of an earlier draft.

REFERENCES

Aharonov, Y. and D. Bohm. (1959). Significance of electromagnetic potentials in the quantum theory. Physical Review, 115(3): 485–91.
Amati, D., M. Ciafaloni, and G. Veneziano. (1989). Can spacetime be probed below the string scale? Physics Letters B, 216(1,2): 41–7.
Ambjørn, J., M. Carfora, and A. Marzuoli. (1997). The Geometry of Dynamical Triangulations. Lecture Notes in Physics. Springer-Verlag, Berlin.
————– (1996). Quantization of geometry. In F. David, P. Ginsparg and J. Zinn-Justin eds, Fluctuating Geometries in Statistical Mechanics and Field Theory: Proceedings of the Les Houches Summer School, pp. 77–195. North-Holland.
————– J. Jurkiewicz, and R. Loll. (forthcoming). Quantum gravity, or the art of building spacetime. To appear in D. Oriti, ed, Approaches to Quantum Gravity. Cambridge: Cambridge University Press.
Amelino-Camelia, G. and J. Kowalski-Glikman eds, (2005). Planck Scale Effects in Astrophysics and Cosmology. Springer-Verlag.
Anandan, J. (1999). The quantum measurement problem and the possible role of the gravitational field. Foundations of Physics, 29(3): 333–48.
Arnowitt, R., S. Deser, and C. W. Misner. (1962). The dynamics of general relativity. In L. Witten, ed, Gravitation: An Introduction to Current Research, pp. 227–65. Wiley.
Ashtekar, A. and R. Geroch. (1974). Quantum theory of gravitation. Reports on Progress in Physics, 37: 1211–56.
————– (1986). New variables for classical and quantum gravity. Physical Review Letters, 57: 2244–7.
————– (1991). Introduction: The long and winding road to quantum gravity. In A. Ashtekar and J. Stachel eds, Conceptual Problems of Quantum Gravity, pp. 1–9. Birkhäuser.
————– and R. S. Tate. (1991). Lectures on Non-Perturbative Canonical Gravity. World Scientific.
————– and J. Stachel. (1991). Conceptual Problems of Quantum Gravity. Birkhäuser.
————– (1995). Mathematical problems of non-perturbative quantum general relativity. In B. Julia and J. Zinn-Justin eds, Les Houches, Session LVII, 1992: Gravitation and Quantizations, pp. 181–283. Elsevier Science Publishing Co.
Baez, J. C. and J. P. Muniain. (1994). Gauge Fields, Knots, and Gravity. World Scientific.
————– (2005). Quantum quandaries: A category-theoretic perspective. In D. Rickles, S. French, and J. Saatsi eds, Structural Foundations of Quantum Gravity, pp. 240–65. Oxford University Press.
————– (2001). Higher-dimensional algebra and Planck scale physics. In C. Callender and N. Huggett eds, Physics Meets Philosophy at the Planck Scale, pp. 177–95. Cambridge: Cambridge University Press.
————– (2003). Quantization of area: The plot thickens. Matters of Gravity, 21.
————– (1998). Spin foam models. Classical and Quantum Gravity, 15(7): 1827–58.
————– I. E. Segal, and Z. Zhou. (1992). Introduction to Algebraic and Constructive Quantum Field Theory. Princeton University Press.
Bain, J. (2008). Condensed matter physics and the nature of spacetime. In D. Dieks, ed, The Ontology of Spacetime II, pp. 301–29. Elsevier.
Ballentine, L. E. (1982). Comment on “Indirect evidence for quantum gravity.” Physical Review Letters, 48(7): 522.
Barbour, J. (2001). The End of Time: The Next Revolution in Physics. Oxford University Press.
————– (2001). The Discovery of Dynamics: A Study from a Machian Point of View of the Discovery and the Structure of Dynamical Theories. Oxford University Press.
————– (forthcoming). Absolute or Relative Motion: The Deep Structure of General Relativity. Oxford University Press.
————– (1994). The timelessness of quantum gravity: II. The appearance of dynamics in static configurations. Classical and Quantum Gravity, 11(12): 2875–97.
Barrett, J. and L. Crane (1998). Relativistic spin networks and quantum gravity. Journal of Mathematical Physics, 39: 3296–302.
Becker, K., M. Becker, and J. H. Schwarz. (2007). String Theory and M-Theory: A Modern Introduction. Cambridge: Cambridge University Press.
Bekenstein, J. (1973). Black holes and entropy. Physical Review D, 7: 2333–46.
Belot, G., J. Earman, and L. Ruetsche. (1999). The Hawking information loss paradox: The anatomy of a controversy. British Journal for the Philosophy of Science, 50: 189–229.
————– and J. Earman (2001). Pre-Socratic quantum gravity. In C. Callender and N. Huggett eds, Physics Meets Philosophy at the Planck Scale, pp. 213–55. Cambridge: Cambridge University Press.
————– (2007). Background independence. http://www.pitt.edu/~gbelot/Papers/Background1.pdf.
Bergmann, P. G., and A. Komar. (1972). The coordinate group symmetries of general relativity. International Journal of Theoretical Physics, 5: 15–28.
————– and A. Komar. (1980). The phase space formulation of general relativity and approaches toward its canonical quantization. In A. Held, ed, General Relativity and Gravitation: One Hundred Years After the Birth of Albert Einstein, Vol. 1, pp. 227–54. Plenum Press.
————– (1949). Non-linear field theories. Physical Review, 75(4): 680–5.


————– (1949). Non-linear field theories ii: Canonical equations and quantization. Reviews of Modern Physics, 21: 480–7. Blokhintsev, D. I. and F. M. Gal’perin. (1934). Gipoteza neitrino i zakon sokhraneniya energii. Pod znamenem marksizma, 6: 147–57. Bojowald, M. (2005). Loop quantum cosmology. Living Reviews in Relativity, 8(11). http://www.livingreviews.org/lrr-2005-11 Bokulic, P. (2001). Black hole remnants and classical vs. quantum gravity. Philosophy of Science. Supplement: Proceedings of the 2000 Biennial Meeting of the Philosophy of Science, 68(3): S407–S423. ————– (2005). Does black hole complementarity answer Hawking’s information loss paradox? Philosophy of Science, 72: 1336–49. Brown, L. M. ed (1992). Renormalization: From Lorentz to Landau (and beyond). Springer-Verlag. Butterfield, J. (2001). The end of time? The British Journal for the Philosophy of Science, 53:289–330. ————– and J. Earman eds, (2007). Philosophy of Physics, Vol. 2 of Handbook of the Philosophy of Science. Elsevier Science Publishing Co. ————– and C. J. Isham. (2001). Spacetime and the philosophical challenge of quantum gravity. In C. Callender and N. Huggett eds, Physics Meets Philosophy at the Planck Scale, pp. 33–89. Cambridge: Cambridge University Press. ————– and C. J. Isham. (1999). On the emergence of time in quantum gravity. In J. Butterfield, ed, Arguments of Time, pp. 111–68. Oxford University Press. ————–and C. J. Isham. (2001). Some possible roles for topos theory in quantum theory and quantum gravity. Foundations of Physics 30(1): 1707– 35. ————– (2007). Stochastic Einstein locality revisited. The British Journal for the Philosophy of Science 58(4): 805-67. Callender, C. and N. Huggett eds, (2001). Physics Meets Philosophy at the Planck Scale. Cambridge: Cambridge University Press. ————– and R. Weingard. (2001). Time, Bohm’s theory, and quantum cosmology. Philosophy of Science, 63(3): 470–4, 1996. Cao, T-Y. (2004). Conceptual Foundations of Quantum Field Theory. 
Cambridge: Cambridge University Press. ————–(2001). Prerequisites for a consistent framework of quantum gravity. Studies in the History and Philosophy of Modern Physics, 32(2): 181–2004. ————– and S. Schweber. (1993). The conceptual foundations and philosophical aspects of renormalization theory. Synthese, 97(1): 33–108. Carlip, S. (2001). Quantum gravity: A progress report. Reports on Progress in Physics, 64: 885. Carroll, S. (2003). Spacetime and Geometry: An Introduction to General Relativity. Benjamin Cummings. Cartwright, N. (1999). The Dappled World: A Study of the Boundaries of

REFERENCES

369

Science. Cambridge: Cambridge University Press. Castellani, E. (2002). Reductionism, emergence, and effective field theories. Studies in History and Philosophy of Modern Physics 33(2): 251–67. Connes, A. (1994). Non-Commutative Geometry. Academic Press, New York. Curiel, E. (2001). Against the excesses of quantum gravity: A plea for modesty. Philosophy of Science, 68(3): S424–S441. Deltete, R. J. and R. A. Guy. (2004). Emerging from imaginary time. Synthese, 108(2): 185–203. Deser, S. and P. van Nieuwenhuizen. (1974). One-loop divergences of quantized Einstein-Maxwell fields. Physical Review D, 10(2): 401–10. ————– (1957). General relativity and the divergence problem in quantum field theory. Reviews of Modern Physics, 29: 417–23. ————– (1975). Quantum gravitation: Trees, loops and renormalization. In C. J. Isham, R. Penrose, and D. W. Sciama eds, Quantum Gravity: An Oxford Symposium, pp. 136–73. Oxford University Press. ————– (1975). Uniqueness and nonrenormalizability of quantum gravitation. In G. Shaviv and J. Rosen eds, General Relativity and Gravitation 1974, proceedings of the 7th International Conference on General Relativity and Gravitation (GR7), held June 23 - 28 1974, in Tel-Aviv, Israel, pp. 1–8. John Wiley & Sons. DeWitt, B. S. (1962). The quantization of geometry. In L. Witten, ed, Gravitation: An Introduction to Current Research, pp. 266–381. Wiley. ————– (1965). Dynamical Theory of Groups and fields. John Wiley & Sons. ————– (1967). Quantum theory of gravity. 1. The canonical theory. Physical Review, 160: 1113–48. ————– (1967). Quantum theory of gravity. ii. The manifestly covariant theory. Physical Review, 162(5): 1195–239. ————–(1967). Quantum theory of gravity. iii. Applications of the covariant theory. Physical Review, 162(5): 1239–56. ————– (1970). Quantum theories of gravity. General Relativity and Gravitation, 1(2): 181–9. ————– (1980). Quantum gravity: The new synthesis. In S. Hawking and W. 
Israel eds, General Relativity: An Einstein Centenary Survey, pp. 680–745. Cambridge: Cambridge University Press.
DeWitt, C. M. ed (1964). Relativity, Groups and Topology, 1963. Routledge.
Diósi, L. (1989). Models for universal reduction of macroscopic quantum fluctuations. Physical Review A, 40: 1165–74.
Dirac, P. A. M. (1950). Generalized Hamiltonian dynamics. Canadian Journal of Mathematics, 2: 129–48.
————– (1951). The Hamiltonian form of field dynamics. Canadian Journal of Mathematics, 3: 1–23.
————– (1951). A new classical theory of electrons. Proceedings of the Royal Society of London A, 209: 291–6.
————– (1958). The Principles of Quantum Mechanics. Oxford University


Press.
————– (1958). The theory of gravitation in Hamiltonian form. Proceedings of the Royal Society of London A, 246: 333–43.
————– (1978). The mathematical foundations of quantum theory. In A. R. Marlow, ed, Mathematical Foundations of Quantum Theory, pp. 1–8. Academic Press.
————– (1987). The inadequacies of quantum field theory. In B. Kursunoglu and E. Wigner eds, Mathematical Foundations of Quantum Theory, pp. 194–8. Cambridge: Cambridge University Press.
Dittrich, B. and T. Thiemann. (2007). Are the spectra of geometrical operators in loop quantum gravity really discrete? arXiv:0708.1721v2.
Di Vecchia, P. (1999). An introduction to AdS/CFT correspondence. Fortschritte der Physik, 48(1-3): 87–92.
Dodelson, S. (2003). Modern Cosmology. Academic Press.
Dreyer, O. (2003). Quasinormal modes, the area spectrum, and black hole entropy. Physical Review Letters, 90(8): 081301.
Duff, M. J. (1999). A layman's guide to M-theory. In J. Ellis, F. Hussain, T. Kibble, G. Thompson, and M. Virasoro, eds, The Abdus Salam Memorial Meeting, pp. 184–213. World Scientific.
Earman, J. and J. Norton eds, (1997). The Cosmos of Science. University of Pittsburgh Press.
Ehlers, J. and H. Friedrich eds, (1994). Canonical Gravity: From Classical to Quantum: Proceedings of the 117th WE-Heraeus Seminar Held at Bad Honnef, Germany, 13–17 September 1993. Springer-Verlag.
Einstein, A. (1918). Dialogue about objections to the theory of relativity. In A. Engel, ed, The Collected Papers of Albert Einstein, Vol. 7, The Berlin Years: Writings: 1918-1921, English Translation of Selected Texts. Princeton University Press, 2002.
————– (1916). Approximate integration of the field equations of gravitation. In A. Engel, ed, The Collected Papers of Albert Einstein, Vol. 6, The Berlin Years: Writings: 1914-1917, English Translation of Selected Texts. Princeton University Press, 1997.
————– (1916). The principal ideas of the theory of relativity. In A.
Engel, ed, The Collected Papers of Albert Einstein, Vol. 7, The Berlin Years: Writings: 1918-1921, English Translation of Selected Texts. Princeton University Press, 2002.
————– (1931). Gravitational and electrical fields. Science, 74: 438–39.
————– and P. Bergmann (1938). On a generalization of Kaluza's theory of electricity. The Annals of Mathematics, 39(3): 683–701.
————– and W. Pauli (1943). On the non-existence of regular stationary solutions of relativistic field equations. The Annals of Mathematics, 44(2): 131–7.
————– (1950). Out of My Later Years. Castle Books.


Eppley, K. and E. Hannah. (1977). The necessity of quantizing the gravitational field. Foundations of Physics, 7(1/2): 51–68.
Faddeev, L. D. and V. N. Popov. (1967). Feynman diagrams for the Yang-Mills field. Physics Letters B, 25(1): 29–30.
Fauser, B., J. Tolksdorf and E. Zeidler eds, (2007). Quantum Gravity: Mathematical Models and Experimental Bounds. Birkhäuser Basel.
Feynman, R. P. (1962). The Feynman Lectures on Physics, Vol. 2. Addison-Wesley Publishing Co.
————– (1963). Quantum theory of gravitation. Acta Physica Polonica, 24: 697–722.
————– (1995). Feynman Lectures on Gravitation (B. Hatfield, ed). Addison-Wesley Publishing Co.
Fierz, M. and W. Pauli. (1939). On relativistic wave-equations for particles of arbitrary spin in an electromagnetic field. Proceedings of the Royal Society of London A, 173: 211–32.
————– (1939). Über die relativistische Theorie kräftefreier Teilchen mit beliebigem Spin. Helvetica Physica Acta, 12: 3–37.
Fischer, A. (1970). The theory of superspace. In M. Carmeli, S. I. Fickler, and L. Witten eds, Relativity—Proceedings of the Relativity Conference in the Midwest: Cincinnati, Ohio, June 2–6, 1969. Plenum Press, New York.
Fleischhack, C. (2007). Kinematical uniqueness of loop quantum gravity. In B. Fauser, J. Tolksdorf, and E. Zeidler eds, Quantum Gravity: Mathematical Models and Experimental Bounds, pp. 203–19. Birkhäuser Basel.
Fock, V. (1976). The Theory of Space, Time and Gravitation. Pergamon Press.
Freidel, L. (2005). Group field theory: An overview. International Journal of Theoretical Physics, 44(10): 1769–83.
Galison, P. (1995). Theory bound and unbound: Superstrings and experiments. In F. Weinert, ed, Laws of Nature: Essays on the Philosophical, Scientific and Historical Dimensions, pp. 369–407. Berlin: Walter de Gruyter.
Gambini, R. and J. Pullin. (1996). Loops, Knots, Gauge Theories, and Quantum Gravity. Cambridge: Cambridge University Press.
————– and J. Pullin. (1999). Quantum gravity experimental physics?
General Relativity and Gravitation, 31(11): 1631–7.
Gasperini, M. (2007). Elements of String Cosmology. Cambridge: Cambridge University Press.
————– and J. Maharana. (2008). String Theory and Fundamental Interactions: Gabriele Veneziano and Theoretical Physics: Historical and Contemporary Perspectives. Springer.
Ghirardi, G. C., R. Grassi, and A. Rimini. (1990). Continuous-spontaneous-reduction model involving gravity. Physical Review A, 42: 1057–64.
Gibbons, G. W. and S. W. Hawking eds, (1993). Euclidean Quantum Gravity. World Scientific.
————– and J. B. Hartle. (1990). Real tunneling geometries and the large-scale topology of the universe. Physical Review D, 42(8): 2458–68.


Giulini, D. J. W., C. Kiefer and C. Lämmerzahl eds, (2003). Quantum Gravity: From Theory to Experimental Search. Springer.
————– (2007). Some remarks on the notions of general covariance and background independence. In I.O. Stamatescu, ed, Approaches to Fundamental Physics: An Assessment of Current Theoretical Ideas, pp. 105–21. Springer-Verlag.
Goldstein, S. and S. Teufel. (2001). Quantum spacetime without observers: Ontological clarity and the conceptual foundations of quantum gravity. In C. Callender and N. Huggett eds, Physics Meets Philosophy at the Planck Scale, pp. 275–89. Cambridge: Cambridge University Press.
Gorelik, G. (1992). First steps of quantum gravity and the Planck values. In J. Eisenstaedt, ed, Studies in the History of General Relativity, Vol. 3 of Einstein Studies, pp. 364–79. Birkhäuser.
————– (2005). Matvei Bronstein and quantum gravity: 70th anniversary of the unsolved problem. Physics-Uspekhi, 48(10): 1039–53.
Graves, J. C. (1971). The Conceptual Foundations of Contemporary Relativity Theory. M.I.T. Press.
Green, M. B. and J. H. Schwarz. (1984). Anomaly cancellation in supersymmetric D=10 gauge theory and superstring theory. Physics Letters B, 149: 117–22.
————– ————– and E. Witten. (1987). Superstring Theory. Vol. 1: Introduction. Cambridge: Cambridge University Press.
————– (1999). Superstrings, M theory and quantum gravity. Classical and Quantum Gravity, 16: A77–A100.
Greene, B. (2000). The Elegant Universe. Vintage.
————– (2005). The Fabric of the Cosmos: Space, Time, and the Texture of Reality. Vintage.
Gross, D. (2005). Einstein and the search for unification. Current Science, 89(12): 2035–40.
————– (1999). Renormalization groups. In P. Deligne, P. Etingof, D. S. Freed, L. C. Jeffrey, D. Kazhdan, J. W. Morgan, D. R. Morrison, and E. Witten eds, Quantum Fields and Strings: A Course for Mathematicians, pp. 551–98. American Mathematical Society.
Gupta, S. (1952). Quantization of Einstein's gravitational field: General treatment.
Proceedings of the Physical Society A, 65: 608–19.
————– (1952). Quantization of Einstein's gravitational field: Linear approximation. Proceedings of the Physical Society A, 65: 161–9.
Hájíček, P. (1996). Time evolution and observables in constrained systems. Classical and Quantum Gravity, 13: 1353–75.
Hardy, L. (2007). Towards quantum gravity: A framework for probabilistic theories with non-fixed causal structure. Journal of Physics A, 40: 3081–99.
Hartle, J. B. (1995). Spacetime quantum mechanics and the quantum mechanics of spacetime. In B. Julia and J. Zinn-Justin eds, Gravitation and Quantizations, Session LVII of Les Houches, 5 July–1 August 1992, pp. 285–480.


North Holland.
————– and K. Kuchař. (1984). The role of time in path integral formulations of parametrized theories. In S. M. Christensen, ed, Quantum Theory of Gravity: Essays in Honor of the 60th Birthday of Bryce S DeWitt, pp. 315–26. Adam Hilger Ltd.
————– and S. Hawking. (1983). Wave function of the universe. Physical Review D, 28(12): 2960–75.
Hartmann, S. (2001). Effective field theories, reductionism and scientific explanation. Studies in History and Philosophy of Modern Physics, 32(2): 267–304.
Hawking, S. and G. Ellis. (1973). The Large Scale Structure of Space-Time. Cambridge: Cambridge University Press.
————– (1971). Gravitational radiation from colliding black holes. Physical Review Letters, 26: 1344–6.
————– (1975). Particle creation by black holes. Communications in Mathematical Physics, 43: 199–220.
————– (1978). Quantum gravity and path integrals. Physical Review D, 18: 1747–53.
————– (1979). The path-integral approach to quantum gravity. In S. Hawking and W. Israel eds, General Relativity: An Einstein Centenary Survey, pp. 746–89. Cambridge: Cambridge University Press.
Healey, R. (2004). Change without change, and how to observe it in general relativity. Synthese, 141: 381–415.
Henneaux, M. and C. Teitelboim. (1992). Quantization of Gauge Systems. Princeton University Press.
Henson, J. (forthcoming). The causal set approach to quantum gravity. To appear in D. Oriti, ed, Approaches to Quantum Gravity: Towards a New Understanding of Space and Time. Cambridge University Press.
Heisenberg, W. and W. Pauli. (1929). Zur Quantendynamik der Wellenfelder. Zeitschrift für Physik, 56: 1–61.
————– (2005). Theory, criticism, and a philosophy. In A. Salam, ed, Unification of Fundamental Forces: The First of the 1988 Dirac Memorial Lectures, pp. 85–124. Cambridge University Press.
————– (1995). The universal length appearing in the theory of elementary particles. In A. I. Miller, ed, Early Quantum Electrodynamics: A Source Book, pp. 244–53.
Cambridge University Press.
Huggett, N. and R. Weingard. (1995). The renormalisation group and effective field theories. Synthese, 102: S159–S167.
————– and R. Weingard. (1996). Exposing the machinery of infinite renormalization. Philosophy of Science, Vol. 63, Supplement. Proceedings of the 1996 Biennial Meetings of the Philosophy of Science Association. Part I. Contributed Papers: 171–94.
Isham, C. J. (1991). Conceptual and geometrical problems in quantum gravity. In H. Mitter and H. Gausterer eds, Recent Aspects of Quantum Fields, pp. 123–229. Springer.


————– (1993). Canonical quantum gravity and the problem of time. In L.A. Ibort and M.A. Rodriguez eds, Integrable Systems, Quantum Groups, and Quantum Field Theories, pp. 157–288. Springer.
————– (1995). Lectures on Quantum Theory: Mathematical and Structural Foundations. Imperial College Press.
Jacobson, T. and L. Smolin. (1988). Nonperturbative quantum geometries. Nuclear Physics B, 299(2): 295–345.
Janssen, M., J. Renn, J. Norton, T. Sauer, and J. Stachel eds (2006). General Relativity in the Making: Einstein's Zurich Notebook, Commentary and Essays. Springer.
Károlyházy, F., A. Frenkel, and B. Lukács. (1986). On the possible role of gravity on the reduction of the wavefunction. In R. Penrose and C. J. Isham eds, Quantum Concepts in Space and Time, pp. 109–28. Oxford University Press.
————– (1966). Gravitation and quantum mechanics of macroscopic bodies. Nuovo Cimento, A42: 390–402.
Kent, A. (2005). A proposed test of the local causality of spacetime. http://arxiv.org/abs/gr-qc/0507045.
Kibble, T. W. B. (1981). Is a semiclassical theory of gravity viable? In C. J. Isham, R. Penrose, and D. W. Sciama eds, Quantum Gravity II: A Second Oxford Symposium, pp. 63–80. Oxford University Press.
Kiefer, C. (2006). Quantum gravity: General introduction and recent developments. Annalen der Physik, 15(1-2): 129–48.
————– (2007). Quantum Gravity. Oxford University Press.
Klein, O. (1956). Generalisations of Einstein's theory of gravitation considered from the point of view of quantum field theory. In A. Mercier and M. Kervaire eds, Fünfzig Jahre Relativitätstheorie, pp. 58–71. Birkhäuser Verlag, Basel.
Kopczyński, W. and A. Trautman. (1992). Spacetime and Gravitation. John Wiley & Sons.
Kowalski-Glikman, J. (2000). Towards Quantum Gravity: Proceedings of the XXXV International Winter School on Theoretical Physics, Held in Polanica, Poland, 2–11 February 1999. Springer.
Kraichnan, R. H. (1955).
Special-relativistic derivation of generally covariant gravitation theory. Physical Review, 98(4): 1118–22.
Kubo, J. and M. Nunami. (2003). Unrenormalizable theories can be predictive. The European Physical Journal, C26: 461–72.
Kuchař, K. (1992). Time and interpretations of quantum gravity. In G. Kunstatter, D. E. Vincent, and J. G. Williams eds, 4th Canadian Conference on General Relativity and Relativistic Astrophysics, pp. 211–314. World Scientific.
Kuhn, T. S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.
Lämmerzahl, C. (2007). The search for quantum gravity effects. In B. Fauser, J. Tolksdorf, and E. Zeidler eds, Quantum Gravity: Mathematical Models and Experimental Bounds, pp. 15–39. Birkhäuser, Basel.


Laughlin, R. B. (2003). Emergent relativity. International Journal of Modern Physics, A18: 831–54.
————– (2006). A Different Universe: Reinventing Physics from the Bottom Down. Basic Books.
————– and D. Pines. (2000). The theory of everything. Proceedings of the National Academy of Sciences, 97: 28–31.
Lewandowski, J., A. Okołów, H. Sahlmann, and T. Thiemann. (2006). Uniqueness of diffeomorphism invariant states on holonomy–flux algebras. Communications in Mathematical Physics, 267(3): 703–33.
Liddle, A. R. and D. H. Lyth. (2000). Cosmological Inflation and Large-Scale Structure. Cambridge: Cambridge University Press.
Lockwood, M. (1991). Mind, Brain and the Quantum: The Compound 'I'. Blackwell.
Loll, R. (1998). Discrete approaches to quantum gravity in four dimensions. Living Reviews in Relativity, 1, http://www.livingreviews.org/lrr-1998-13.
Maartens, R. (2004). Brane-world gravity. Living Reviews in Relativity, 7: http://www.livingreviews.org/lrr-2004-7.
MAGIC Collaboration. (2007). Probing quantum gravity using photons from a Mkn 501 flare observed by MAGIC. http://arxiv.org/abs/0708.2889.
Mandelstam, S. (1962). Quantization of the gravitational field. Proceedings of the Royal Society of London A, 270(1342): 346–53.
————– (1968). Feynman rules for the gravitational field from the coordinate-independent field theoretic formalism. Physical Review, 175: 1604–23.
Maoz, D. (2007). Astrophysics in a Nutshell. Princeton University Press.
Marolf, D. (2004). Resource letter NSST-1: The nature and status of string theory. American Journal of Physics, 72(6): 730–41.
Mattingly, J. (2006). Why Eppley and Hannah's thought experiment fails. Physical Review D, 73: 062025.
————– (2006). Is quantum gravity necessary? In J. Eisenstaedt and A. Kox eds, The Universe of General Relativity, Vol. 11 of Einstein Studies, pp. 327–38. Birkhäuser, Boston.
Maudlin, T. (1996). On the unification of physics. The Journal of Philosophy, 93(3): 129–44.
————– (2002).
Thoroughly muddled McTaggart: Or, how to abuse gauge freedom to create metaphysical monstrosities. Philosophers' Imprint, 2(4), http://www.philosophersimprint.org/002004.
Mehra, J. (2000). The Historical Development of Quantum Theory. Springer-Verlag.
Meschini, D. (2007). Planck-scale physics: Facts and beliefs. Foundations of Science, 12(4): 277–94.
Mills, R. (1993). Tutorial on infinities in QED. In L. M. Brown, ed, Renormalization: From Lorentz to Landau (and Beyond), pp. 59–85. Springer-Verlag.
Misner, C. W., K. S. Thorne, and J. A. Wheeler. (1973). Gravitation. W. H. Freeman and Co.


————– (1957). Feynman quantization of general relativity. Reviews of Modern Physics, 29: 497–509.
————– (1969). Quantum cosmology. Physical Review, 186: 1319–27.
————– (1972). Some topics for philosophical inquiry concerning the theories of mathematical geometrodynamics and of physical geometrodynamics. PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association: 7–29.
Møller, C. (1962). The energy-momentum complex in general relativity and related problems. In A. Lichnerowicz and M. A. Tonnelat, eds, Les Théories Relativistes de la Gravitation, pp. 2–29. Paris: Centre National de la Recherche Scientifique.
Montesinos, M., C. Rovelli, and T. Thiemann. (1999). An SL(2, R) model of constrained systems with two Hamiltonian constraints. Physical Review D, 60: 044009.
Morrison, M. (2006). Emergence, reduction, and theoretical principles: Rethinking fundamentalism. Philosophy of Science, 73: 876–87.
Motl, L. (2007). Special theories: Good and bad. Available online: http://motls.blogspot.com/2007/05/special-theories-good-and-bad.html.
Mukhanov, V. (2005). Physical Foundations of Cosmology. Cambridge: Cambridge University Press.
Nambu, Y. (1970). Quark model and the factorization of Veneziano amplitude. In R. Chand, ed, Symmetries and Quark Models: Proceedings of the International Conference held at Wayne State University, Detroit, Michigan, June 18–20, 1969, p. 269. Gordon and Breach Science Publishers, New York.
————– (1985). Directions of particle physics. Progress of Theoretical Physics (Supplement), 85: 104–10.
Niedermaier, M. and M. Reuter. (2006). The asymptotic safety scenario in quantum gravity. Living Reviews in Relativity, 9(5): http://www.livingreviews.org/lrr-2006-5.
Nielsen, H. B. (1970). An almost physical interpretation of the Veneziano amplitude. Delivered at the 15th International Conference on High Energy Physics, Kiev.
Oriti, D. (2005).
The Feynman propagator for quantum gravity: Spin foams, proper time, orientation, causality and timeless-ordering. Brazilian Journal of Physics, 35(2B): 481–8.
————– (2006). Quantum gravity as a quantum field theory of simplicial geometry. In B. Fauser, J. Tolksdorf and E. Zeidler eds, Quantum Gravity: Mathematical Models and Experimental Bounds, pp. 101–26. Birkhäuser Verlag Basel.
Page, D. N. and C. D. Geilker. (1981). Indirect evidence for quantum gravity. Physical Review Letters, 47: 979–82.
————– (1982). Page responds. Physical Review Letters, 48(7): 523.
Pauli, W. (1926). Quantentheorie. In H. Geiger and K. Scheel eds, Handbuch der Physik IV, pp. 1–278. Berlin: Springer.
————– (1967). Theory of Relativity. Pergamon Press.


Penrose, R. and P. Marcer. (1998). Quantum computation, entanglement and state reduction. Philosophical Transactions: Mathematical, Physical and Engineering Sciences, 356(1743): 1927–39.
————– and R. S. Ward. (1980). Twistors for flat and curved space-time. In A. Held, ed, General Relativity and Gravitation: One Hundred Years After the Birth of Albert Einstein, Vol. 2, pp. 283–328. Plenum Press.
————– (1967). Twistor algebra. Journal of Mathematical Physics, 8: 345–66.
————– (1969). Gravitational collapse: The role of general relativity. Rivista del Nuovo Cimento, 1: 252–76.
————– (1971). Angular momentum: An approach to combinatorial spacetime. In T. Bastin, ed, Quantum Theory and Beyond, pp. 151–80. Cambridge: Cambridge University Press.
————– (1979). Combinatorial quantum theory and quantized directions. In L. P. Hughston and R. S. Ward eds, Advances in Twistor Theory, number 37 in Research Notes in Mathematics, pp. 301–7. Pitman.
————– (1996). On gravity's role in quantum state reduction. General Relativity and Gravitation, 28: 581–600.
————– (1999). The central programme of twistor theory. Chaos, Solitons & Fractals, 10(2-3): 581–611.
————– (2000). The Large, the Small and the Human Mind. Cambridge: Cambridge University Press.
————– (2007). The Road to Reality: A Complete Guide to the Laws of the Universe. Vintage.
————– (2002). The Emperor's New Mind: Concerning Computers, Minds, and the Laws of Physics. Oxford University Press.
Peres, A. (1962). On Cauchy's problem in general relativity. Nuovo Cimento, 26: 53–62.
————– (1968). Canonical quantization of gravitational field. Physical Review, 171(5): 1335–44.
Petersen, J. L. (1999). Introduction to the Maldacena conjecture on AdS/CFT. International Journal of Modern Physics A, 14(23): 3597–672.
Planck, M. (1899). Über irreversible Strahlungsvorgänge. Fünfte Mitteilung (Schluss). Berl. Ber.: 440–80.
Poisson, E. (2004). A Relativist's Toolkit: The Mathematics of Black-Hole Mechanics. Cambridge: Cambridge University Press.
Polchinski, J. (1995). Dirichlet branes and Ramond-Ramond charges. Physical Review Letters, 75(26): 4724–7.
Randall, L. and R. Sundrum. (1999). An alternative to compactification. Physical Review Letters, 83(23): 4690–3.
Regge, T. (1961). General relativity without coordinates. Nuovo Cimento, 19: 558–71.
Reichenbach, H. (1969). Axiomatization of the Theory of Relativity. University of California Press.
Rickles, D. and S. French. (2006). Quantum gravity meets structuralism: In-


terweaving relations in the foundations of physics. In D. Rickles, S. French, and J. Saatsi eds, The Structural Foundations of Quantum Gravity, pp. 1–39. Oxford University Press.
————–, S. French, and J. Saatsi eds, (2006). The Structural Foundations of Quantum Gravity. Oxford University Press.
————– (2005). Interpreting quantum gravity. Studies in History and Philosophy of Modern Physics, 36: 691–715.
————– (2005). A new spin on the hole argument. Studies in History and Philosophy of Modern Physics, 36: 415–34.
————– (2006). Time and structure in canonical gravity. In D. Rickles, S. French, and J. Saatsi eds, The Structural Foundations of Quantum Gravity, pp. 152–95. Oxford University Press.
————– (2007). Symmetry, Structure and Spacetime. Philosophy and Foundations of Physics. Elsevier.
————– (2008). Who's afraid of background independence? In D. Dieks, ed, The Ontology of Spacetime II, pp. 133–52. Elsevier.
Robb, A. A. (1936). Geometry of Time and Space. Cambridge: Cambridge University Press.
Robson, B. (1996). An introduction to quantum cosmology. In N. Visvanathan and W. S. Woolcock eds, Cosmology: The Physics of the Universe, pp. 473–531. World Scientific.
Rosenfeld, L. (1930). Über die Gravitationswirkungen des Lichtes. Zeitschrift für Physik, 65: 589–99.
————– (1963). On quantization of fields. In R. S. Cohen and J. Stachel eds, Selected Papers of Léon Rosenfeld (1979), pp. 598–608. Reidel, Dordrecht.
————– (1966). Quantum theory and gravitation. In R. S. Cohen and J. Stachel eds, Selected Papers of Léon Rosenfeld (1979), pp. 599–608. Reidel, Dordrecht.
Rovelli, C. (2004). Quantum Gravity. Cambridge: Cambridge University Press.
Rovelli, C. and L. Smolin. (1988). A new approach to quantum gravity based on loop variables. In B. R. Iyer, ed, Highlights in Gravitation and Cosmology. Cambridge: Cambridge University Press.
Rovelli, C. and L. Smolin. (1995). Spin networks and quantum gravity. Physical Review D, 52(10): 5743–59.
Rovelli, C. (1991).
Is there incompatibility between the ways time is treated in general relativity and standard quantum mechanics? In A. Ashtekar and J. Stachel eds, Conceptual Problems of Quantum Gravity, pp. 126–39. Birkhäuser, Basel.
Rovelli, C. (1997). Halfway through the woods: Contemporary research on space and time. In J. Earman and J. Norton eds, The Cosmos of Science, pp. 180–223. University of Pittsburgh Press, Pittsburgh.
Rovelli, C. (1998). Strings, loops and others: A critical survey of the present approaches to quantum gravity. arXiv:gr-qc/9803024v3.
Rovelli, C. (2002). Partial observables. Physical Review D, 65: 124013.


Rovelli, C. (2002). Notes for a brief history of quantum gravity. In V. G. Gurzadyan, R. T. Jantzen, and R. Ruffini eds, The Ninth Marcel Grossmann Meeting: Proceedings of the MGIXMM Meeting held at the University of Rome "La Sapienza", 2–8 July 2000, pp. 742–68. World Scientific.
Rovelli, C. (2006). The disappearance of space and time. In D. Dieks, ed, The Ontology of Spacetime, Vol. 1 of Philosophy and Foundations of Physics, pp. 25–36. Elsevier, Amsterdam.
Rovelli, C. (2007). Quantum gravity. In J. Butterfield and J. Earman eds, Philosophy of Physics, Part B, pp. 1287–329. Elsevier Science Publishing Co.
Rüger, A. (1989). Complementarity meets general relativity: A study in ontological commitments and theory unification. Synthese, 79: 559–80.
Salam, A. (1980). Gauge unification of fundamental forces. Science, 210(4471): 723–32.
Salisbury, D. C. (2007). Rosenfeld, Bergmann, Dirac and the invention of constrained Hamiltonian dynamics. To appear in H. Kleinert, R. T. Jantzen, and R. Ruffini eds, Proceedings of the Eleventh Marcel Grossmann Meeting on General Relativity. World Scientific, Singapore.
Salmon, W. C. (1984). Scientific explanation. In M. Salmon, J. Earman, C. Glymour, J. Lennox, P. Machamer, J. McGuire, J. Norton, W. Salmon, and K. Schaffner, Introduction to the Philosophy of Science, pp. 7–41. Englewood Cliffs, NJ: Prentice Hall.
Saunders, S. (2002). Is the zero-point energy real? In M. Kuhlmann, H. Lyre, and A. Wayne eds, Ontological Aspects of Quantum Field Theory, pp. 137–66. World Scientific.
Schutz, B. (2003). Gravity from the Ground Up: An Introductory Guide to Gravity and General Relativity. Cambridge: Cambridge University Press.
Schwarz, J. H. (1999). Introduction to M theory and AdS/CFT duality. In A. Ceresole, ed, Quantum Aspects of Gauge Theories, Supersymmetry and Unification, pp. 1–21. Springer-Verlag.
Schweber, S. S. (1992). Changing conceptualization of renormalization theory. In L. M.
Brown, ed, Renormalization: From Lorentz to Landau (and Beyond), pp. 137–66. Springer-Verlag.
————– (1994). QED and the Men Who Made It: Dyson, Feynman, Schwinger, and Tomonaga. Princeton University Press.
Seiler, E. and I.-O. Stamatescu eds (2007). Approaches to Fundamental Physics: An Assessment of Current Theoretical Ideas. Springer.
Sen, A. (1982). Gravity as a spin system. Physics Letters B, 119: 89–91.
Smolin, L. (1999). Towards a background independent approach to M theory. Chaos, Solitons & Fractals, 10(2-3): 555–65.
————– (forthcoming). Generic predictions of quantum theories of gravity. In D. Oriti, ed, Approaches to Quantum Gravity—Toward a New Understanding of Space, Time, and Matter. Cambridge: Cambridge University Press.
————– (2006). The Trouble With Physics: The Rise of String Theory, the Fall of a Science, and What Comes Next. Houghton Mifflin.


————– (2002). Three Roads to Quantum Gravity. Perseus Books Group.
Snyder, H. S. (1947). Quantized space-time. Physical Review, 71: 38–41.
Sorkin, R. (1983). Posets as lattice topologies. In B. Bertotti, F. de Felice, and A. Pascolini eds, General Relativity and Gravitation: Proceedings of the GR10 Conference, Vol. 1, pp. 635–7. Consiglio Nazionale Delle Ricerche, Rome.
————– (2000). Indications of causal set cosmology. International Journal of Theoretical Physics, 39(7): 1731–6.
————– (2002). Causal sets: Discrete gravity. arXiv:gr-qc/0309009v1.
Stachel, J. (1999). The early history of quantum gravity. In B. R. Iyer and B. Bhawal eds, Black Holes, Gravitational Radiation and the Universe, pp. 528–32. Kluwer Academic.
Stelle, K. S. (2000). The unification of quantum gravity. Nuclear Physics B, Proceedings Supplements, 88: 3–9.
Stora, R. and B. S. DeWitt eds, (1986). Relativity, Groups and Topology, II: Proceedings of the Les Houches Summer School, Session XL, 27 June–4 August, 1983. Les Houches Summer School Proceedings. Elsevier Science Publishing Co.
Strominger, A. and C. Vafa. (1996). On the microscopic origin of the Bekenstein-Hawking entropy. Physics Letters B, 379(1): 99–104.
Straumann, N. (2002). The history of the cosmological constant problem. arXiv:gr-qc/0208027v1.
————– (2007). Dark energy. In I.O. Stamatescu, ed, Approaches to Fundamental Physics: An Assessment of Current Theoretical Ideas, pp. 327–98. Springer-Verlag.
Susskind, L. (1970). Dual symmetric theory of hadrons I. Nuovo Cimento, A69: 457–96.
————– (2005). The Cosmic Landscape: String Theory and the Illusion of Intelligent Design. Little, Brown and Co.
————– and J. Lindesay. (2005). An Introduction to Black Holes, Information and the String Theory Revolution: The Holographic Universe. World Scientific Publishing Co. Pte. Ltd.
Teller, P. (1989). Infinite renormalization. Philosophy of Science, 56(2): 238–57.
Thirring, W. E. (1961). An alternative approach to the theory of gravitation.
Annals of Physics, 16: 96–117.
't Hooft, G. and M. Veltman. (1972). Regularization and renormalization of gauge fields. Nuclear Physics B, 44: 189–213.
————– (1971). Renormalizable Lagrangians for massive Yang-Mills fields. Nuclear Physics B, 35: 167–88.
————– (1979). Quantum gravity: A fundamental problem and some radical ideas. In M. Lévy and S. Deser eds, Recent Developments in Gravitation, Proceedings from Cargèse 1978, Vol. B44 of NATO Advanced Study Institutes Series, pp. 323–45. D. Reidel, Dordrecht.
————– (2001). Can there be physics without experiments? Challenges and pitfalls. In L. Bergström and U. Lindström eds, The Oskar Klein Memorial


Lectures, Vol. 3. World Scientific.
Unruh, W. (1984). Steps towards a quantum theory of gravity. In S. M. Christensen, ed, Quantum Theory of Gravity: Essays in Honor of the 60th Birthday of Bryce S DeWitt, pp. 234–42. Adam Hilger Ltd, Bristol.
van Dantzig, D. (1956). On the relation between geometry and physics and the concept of space-time. In A. Mercier and M. Kervaire eds, Fünfzig Jahre Relativitätstheorie, pp. 48–53. Birkhäuser Verlag, Basel.
van Nieuwenhuizen, P. (1981). Supergravity. Physics Reports, 68(4): 189–398.
Veneziano, G. (1968). Construction of a crossing-symmetric, Regge-behaved amplitude for linearly rising trajectories. Nuovo Cimento, A57: 190–7.
Volovik, G. (2003). The Universe in a Helium Droplet. Oxford University Press.
Wald, R. (1984). General Relativity. University of Chicago Press.
————– (1986). Spin-two fields and general covariance. Physical Review D, 33(12): 3613–25.
Weinberg, S. (1964). Photons and gravitons in S-matrix theory: Derivation of charge conservation and equality of gravitational and inertial mass. Physical Review, 135(4B): B1049–B1056.
————– (1979). Ultraviolet divergences in quantum theories of gravitation. In S. W. Hawking and W. Israel eds, General Relativity: An Einstein Centenary Survey, pp. 790–831. Cambridge: Cambridge University Press.
Weinstein, S. (2006). Anthropic reasoning and typicality in multiverse cosmology and string theory. Classical and Quantum Gravity, 23: 4231–6.
————– (2005). Quantum gravity. Stanford Encyclopedia of Philosophy, http://plato.stanford.edu/entries/quantum-gravity/.
————– (2001). Absolute quantum mechanics. The British Journal for the Philosophy of Science, 52: 67–73.
————– (2000). Naive quantum gravity. In C. Callender and N. Huggett eds, Physics Meets Philosophy at the Planck Scale, pp. 90–100. Cambridge: Cambridge University Press.
Wheeler, J. A. and E. F. Taylor. (1992). Spacetime Physics. W. H. Freeman and Co.
————– (1964). Geometrodynamics and the issue of the final state. In C.
DeWitt and B. DeWitt eds, Relativity, Groups and Topology, pp. 317–520. Gordon and Breach Science Publishers.
————– (1964). Gravitation as geometry—II. In H.-Y. Chiu and W. F. Hoffmann eds, Gravitation and Relativity, pp. 65–89. W. A. Benjamin, Inc.
————– (1968). Superspace and the nature of quantum geometrodynamics. In C. M. DeWitt and J. A. Wheeler eds, Battelle Rencontres: 1967 Lectures in Mathematics and Physics, pp. 242–307. W. A. Benjamin, Inc.
Witten, E. (1988). Topological quantum field theory. Communications in Mathematical Physics, 117(3): 353–86.
————– (1989). The search for higher symmetry in string theory. Philosophical Transactions of the Royal Society of London, A329: 349–57.
————– (1995). String theory dynamics in various dimensions. Nuclear

382

REFERENCES

Physics B, 443(1): 85–126. ————– (2001). Black holes and quark confinement. Current Science, 81(12): 1576–81. ————– (2004). Perturbative gauge theory as a string theory In twistor space. Communications in Mathematical Physics, 252: 189–258. W¨ uthrich, C. (2005). To quantize or not to quantize: Fact and folklore in quantum gravity. Philosophy of Science, 72: 777–88. Zhang, S. C. (2004). To see a world in a grain of sand. In J. D. Barrow, P. C. W. Davies, C. L. Harper eds, Science and Ultimate Reality: Quantum Theory, Cosmology and Complexity, pp. 667–90. Cambridge: Cambridge University Press. Zeeman, E. C. (1964). Causality implies the Lorentz group. Journal of Mathematical Physics 5: 490–3. Rugh, S. E. and H. Zinkernagel. (1964). The quantum vacuum and the cosmological constant problem. Studies In History and Philosophy of Modern Physics 33(4): 663-705.

INDEX

Albert, D. Z., 129
Ashtekar, A., 276
Background independence, 280, 352
Baez, J., 263
BBGKY hierarchy, 168
Bekenstein-Hawking entropy, 271, 307
Bell's theorem, 78
Birkhoff theorem, 121
Black hole information paradox, 271
  suggested resolutions, 272
Boltzmann entropy, 105
Boltzmann's Law
  contrast with second law of thermodynamics, 105
  the H-theorem, 105
Boltzmann, L., 103
Branes, 319
Brussels-Austin School, 167
Bub, J., 236
Callender, C., 128
Canonical quantum gravity, 323
Cartwright, N., 276
Causal sets, 347
Church-Turing hypothesis, 237
  physical version, 238
Coarse-graining, 107
Combinatorial argument, 107
Compton wavelength, 270
Consistent histories, 26, 33
  ontological status of wave function, 35
  principle of unicity, 34
Constraints in general relativity
  diffeomorphism constraint, 328
  Gauss law constraint, 327, 328
  Hamiltonian constraint, 328
Cosmic background radiation, 273
Covariant quantization, 308
Decoherence theory, 22–24
  decoherence timescale, 23
  environment-induced, 23
  pointer basis, 24
  system/environment division, 28
Decoherent histories, 28, 32
Dennett, D., 46
Descriptive versus revisionary metaphysics, 10
Determination, 104
DeWitt, B. S., 357
Dilaton, 314
Direction of time, 175
Duff, M., 262
Dynamical triangulations, 344
Earman, J., 128
Effective field theory, 298
Ehrenfests, 116
  urn model, 117
Eigenvalue-eigenvector link, 21, 43
Eigenvector-eigenvalue link, 57
  counting anomaly, 58
  fuzzy link, 57
  mass density link, 59
Einstein, A.
  on quantum gravity, 285
Emergence of classical spacetime, 334, 354
Ensemble, 141
Entanglement
  as resource, 211
  quantification, 219
  thermodynamics, 250
Entanglement-assisted communication, 211
Entropy, 176
EPR, 35, 203
Equilibrium statistical mechanics, 99
Ergodic programme, 106
Ergodic theorem
  definition of ergodicity, 121
Ergodic theory, 121
Everett interpretation, 39
  personal identity, 48
  preferred basis problem, 40–42, 45
  probability problem, 40, 47
    incoherence problem, 48
    quantitative problem, 48, 49
  subjective uncertainty program, 49
Experimental metaphysics, 9
Feynman quantization, 340
Feynman, R. P., 219, 351
Fine, A., 7
Functionalism, 41, 46
Fundamentalism, 276
Gauge transformation, 293
General relativity, 291
  new variables, 327
Generalized quantum mechanics, 348
Geometrodynamics, 325
Gibbs phase averaging, 146
Gibbs' paradox, 175
Gibbs, J. W.
  approach to statistical mechanics, 140
Graviton, 275, 315
Group field theory, 345
Grover's search algorithm, 221
GRW theory, 52
  problem of tails, 56
Hawking radiation, 271, 306
Healey, R., 337
Heisenberg, W., 358
Hidden variable theories, 63, 66
  de Broglie-Bohm theory, 72
  modal interpretations, 68
  ontology of state vector, 76
  probability rule, 74
  underdetermination, 73
Hole problem, 331
Holevo bound, 200, 212
Holonomy, 328
IGUS, 45
Information
  specification versus accessible, 200
Internal time, 334
Interpretation, 7
Irreversible processes, 100, 177
It from Bit, 233
Jaynes, E. T.
  epistemic approach to statistical mechanics, 168
Kant, I., 6
Khinchin, A. Y., 151
Kinematics versus dynamics, 293
Kuchař, K., 333
Kuhn, T. S., 13
Landscape problem, 318
Lewis, D., 133
Liouville's theorem, 142, 180
Loop quantum gravity, 328
Loschmidt demon, 161
Loschmidt's Objection, 117, 134
Macrostates, 104
  and macro-regions, 104
  as supervenient on microstates, 104
Malament-Hogarth spacetime, 238
Maldacena conjecture, 321
Maximum entropy principle, 169
Maxwell's demon, 176
Maxwell-Boltzmann distribution, 110, 119
Measure zero problem, 125
Measurement problem, 16, 20, 21
  and gravitation, 283
Microcanonical measure, 148
Microcausality condition, 353
M-theory, 319
Nagel, E., 138
No-cloning theorem, 201
Non-contextuality, 36
Non-equilibrium statistical mechanics, 100
Occupation numbers, 108
One-way quantum computation, 235
Operationalism, 35–38
Past hypothesis
  statistical postulate, 131
Penrose, R., 266, 283
Perturbation theory, 309
Phase transitions, 177
Philosophy of physics, 4
Planck units, 270
Poincaré's Recurrence Theorem, 118
Price, H., 127
Principle versus constructive theories, 250
Probability
  conditional, 115
  in statistical mechanics, 106
    of the macro-state, 106
    of the micro-state, 106
  interpretations, 114, 133
  ontic, 153
  principal principle, 52
  unconditional, 115
Problem of time, 333
Quantum Bayesianism, 248
Quantum black holes, 269, 272
Quantum chaos, 25
Quantum computers, 219
  and many-worlds interpretation, 234
  quantum gates, 220
  speed-up, 234
  universal quantum computer, 219
Quantum cosmology, 267
  as distinct from quantum gravity, 273
  Wheeler-DeWitt equation, 267
Quantum cryptography, 204
  BB84 protocol, 206
  BBB protocol, 204
  eavesdropping, 205
  experiments, 210
  key distribution, 205
  one-time pad, 205
  sifted key, 207
Quantum electrodynamics, 275
Quantum field theory
  axiomatic approach, 296
Quantum geometry, 331
Quantum gravity, 12
  experimental status, 355
  necessity of quantization, 301
  perturbative non-renormalizability, 310
Quantum gravity phenomenology, 356
Quantum information
  ontology of, 227
Quantum information theory, 12
  and interpretation of quantum mechanics, 240
Quantum mechanics, 11
  bare theory, 42
  Dirac-von Neumann formulation, 31
  dynamical-collapse theories, 52
  new pragmatism, 32
  orthodox interpretation, 30
  traditional formulation, 21
Quantum system
  specification of, 17
Quantum theory
  bare formalism, 18
  measurement, 18, 19
  positive-operator-valued measurement, 18
  projection-valued measurement, 18
Quantum topology, 345
Quasi-equilibrium, 157
Qubit, 199
Randomness, 114
Redhead, M. L. G., 4
Reduction
  and idealisation, 120
  bridge laws, 138
  of thermodynamics to statistical mechanics, 120, 139, 174
Reductionism, 137
Regge calculus, 344
Relative configuration space, 335
Relativistic quantum physics, 78
  quantum field theory, 79
    measurement problem, 83
    ontology, 80
    status of particles, 81
Renormalization, 275, 297
  cutoff, 277
Rosenfeld, L., 286
Rovelli, C., 263
RSA, 206, 221
Schmidt decomposition, 211
Schwarzschild radius, 270
Second law of thermodynamics, 100, 117, 139
  and quantum black holes, 307
Semiclassical quantum gravity, 300
Shannon entropy, 169
Shannon information, 227
Shannon, C., 223
Shimony, A., 9
Shor's algorithm, 221
Spin foams, 348
Spin-echo experiment, 159
Standard model, 278
Statistical mechanics, 11
  arrangement, 108
  distribution, 108
Statistical Postulate, 106
Stochastic dynamics, 167
Strawson, P. F., 10
String theory, 311
  black holes, 323
  minimum length, 322
Superdense coding, 212
Supergravity, 347
Superspace, 294
Supervenience, 104
The CBH theorem, 245
  interpretation, 247
The past hypothesis, 127
Theory of everything, 265
Thermodynamics
  laws of, 182
Time
  Heraclitean view, 334
  Parmenidean view, 334
Time reversal invariance, 139, 181
Turing machine, 220
Twistor theory, 346
Underdetermination, 8
  of solutions by equations, 330
Unification, 266, 279
Universal law, 105
Unruh effect, 271
Unruh, W., 268
van Fraassen, B., 7
Wheeler-DeWitt equation, 327, 335
Wick rotation, 298
Zeilinger's Foundational Principle, 243
Zermelo's recurrence objection, 118, 134