
Unsettled Thoughts


Unsettled Thoughts
A Theory of Degrees of Rationality

JULIA STAFFEL


Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries.

© Julia Staffel 2019

The moral rights of the author have been asserted

First Edition published in 2019
Impression: 1

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.

You must not circulate this work in any other form and you must impose this same condition on any acquirer.

Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data
Data available

Library of Congress Control Number: 2019946143

ISBN 978–0–19–883371–0
DOI: 10.1093/oso/9780198833710.001.0001

Printed and bound in Great Britain by Clays Ltd, Elcograf S.p.A.

Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.


Preface This book, and the research project in formal epistemology of which it is a part, emerged from an unlikely place. Entering graduate school, I was convinced I would specialize in philosophy of language, and semantics in particular. Everything changed when I took graduate seminars with Jake Ross, Kenny Easwaran, and Jim Van Cleve, in which I learned about the existence of formal epistemology, and its attempts to rigorously explicate what it means to be rational. I was intrigued, but also slightly baffled. Having studied the semantics of gradable adjectives with Manfred Krifka in Berlin a couple of years earlier, it seemed clear to me that “rational” belonged to the class of gradable adjectives, and I wondered how one might conceptualize degrees of rationality. To my surprise, very little had been written about this. Suddenly I found myself with a research project in epistemology. I was fortunate to have Jake Ross as my advisor, whose ingenious and tireless support I am extremely grateful for. I also greatly benefited from working with Kenny Easwaran, Mark Schroeder, and Ralph Wedgwood at USC. I couldn’t have asked for a better team of mentors. Branden Fitelson helped kick-start the project by supervising an independent study project at Berkeley in the summer of 2009, and by teaching me how to use Mathematica. He has remained a friendly supporter of my research ever since. A couple of times when the project took a wrong turn, Teddy Seidenfeld helped get me back on track. Another enthusiastic supporter of my work from the beginning was Alan Hájek, who generously invited me to spend the 2013 fall semester at the ANU as part of his research project “The Objects of Probabilities.” After finishing my dissertation and publishing a couple of pieces from it, I wasn’t sure whether to continue this line of inquiry, or focus on something else. Around this time, I received an email out of the blue from Glauber De Bona, who was at the time a PhD student in Computer Science at the University of São Paolo, studying ways of measuring incoherence in probabilistic data sets. We hit it off, and we have so far written two papers together, with hopefully more to come. Some of the results of our joint work are incorporated in the discussion of accuracy and Dutch books in Chapters 4 and 5. I couldn’t ask for a better collaborator, and I hope to eventually meet him in person.


I seriously started working on gathering my views on formal models of non-ideal rationality into a book manuscript in the 2016–17 academic year, during which I was funded by an ACLS Research Fellowship and a First Book Fellowship from the Humanities Center at Washington University in St. Louis. I would especially like to thank some of my fellow St. Louis philosophers from that time for their support and advice: Lizzie Schechter, Kathryn Lindeman, John Doris, and Casey O’Callaghan. In April 2017, the Humanities Center hosted a workshop on the first part of the manuscript, where I received enormously helpful comments from Mike Titelbaum, Carl Craver, Jon Kvanvig, and Casey O’Callaghan. I finished a first draft of the book in the late fall of 2017. I received invaluable feedback on it at a two-day workshop hosted by Washington University in St. Louis in December 2017, during which each chapter was subjected to intense scrutiny and discussion. Contributors to this event included Richard Pettigrew, Ralph Wedgwood, Jonathan Weisberg, Casey O’Callaghan, Roy Sorensen, Matthew Babb, Brian Talbot, John Doris, Lizzie Schechter, Julia Haas, Anya Plutynski, Alan Hazlett, and Eric Brown. Parts of the manuscript were also discussed by my online formal epistemology reading group, and I would like to thank its members Catrin Campbell-Moore, Richard Pettigrew, Kenny Easwaran, Jason Konek, Ben Levinstein, Bernard Salow, and Pavel Janda. Other philosophers whose advice and insight I have greatly benefited from include Hanti Lin, Lyle Zynda, Brad Armendt, Matt Kotzen, Sara Moss, Eric Wiland, Eric Hochstein, Joe Salerno, John Greco, Sylvia Wenmackers, Jessica Collins, Liam Kofi Bright, Teddy Seidenfeld, Paul Silva, Adam Morton, Leszek Wronski, Daniel Greco, Sinan Dogramaci, Jim Joyce, Justin Snedegar, Blake Roeber, Miriam Schoenfield, David Wiens, and Graham Oddie. I worry that I have forgotten people who really belong on this list—if you are one of them, thank you, and I apologize for not including you here. I have presented material from this book to various audiences over the years, and I am grateful for the time and effort people have invested in helping me develop my ideas. I would also like to thank my editor, Peter Momtchiloff, for his continued support, and for ensuring a smooth publishing process. Kenny Easwaran deserves special thanks for the very useful feedback he gave me as a reader for Oxford University Press. The person who deserves the most gratitude is my incomparable husband Brian Talbot. He has supported me all these years, and he has read and listened to more versions of this material than I could ever have reasonably asked for. Whenever I encountered obstacles, whether they were philosophical or motivational, I could count on him for helping me see the next step.


Contents

List of Figures

1. Introduction

2. De-Idealizing Bayesianism
   Introduction
   1. Bayesian Basics
   2. Current Bayesian Research
   3. How to Characterize Bayesian Methods
   4. Ideal Bayesian Norms and Non-ideal Thinkers
   Conclusion

3. Approximation Measures
   Introduction
   1. Basic Desiderata for Approximation Measures
   2. Qualitative Measures
   3. Quantitative Measures
   Conclusion

4. Why Approximate Coherence? The Dutch Book Justification
   Introduction
   1. Dutch Book Arguments
   2. Degrees of Incoherence and Dutch Books
   Conclusion
   Appendix

5. Why Approximate Coherence? The Accuracy Justification
   Introduction
   1. The Accuracy Dominance Argument for Probabilism
   2. Distance Measures and Accuracy
   3. Can We Increase Accuracy and Reduce Dutch Book Vulnerability at Once?
   4. Do We Still Have Too Many Measures?
   Conclusion

6. Approximating Ideal Rationality with Multiple Norms
   Introduction
   1. Beyond Coherence
   2. Formal Possibilities for Using Measures: Bundle and Piecemeal Strategies
   3. Application to Cases
   4. Does this Ruin Our Nice Results About Approximating Coherence?
   Conclusion
   Appendix A: Extending the Argument from Chapter 5 to the PP and PoI
   Appendix B: The Case of Conflicting Ideals

7. Evaluating Credence Change
   Introduction
   1. Ways of Evaluating Credence Change
   2. Revising Credences
   3. Augmenting Credences
   4. Updating Credences
   Conclusion

8. A Small Piece of the Puzzle
   Introduction
   1. Rationality: Propositional and Doxastic
   2. Rationality: Ideal and Ecological
   3. Rationality: Evaluative and Ameliorative
   4. Rationality: Epistemology and Semantics
   5. Rationality: Permissions and Obligations
   Conclusion

9. How Do Beliefs Simplify Reasoning?
   Introduction
   1. The Puzzle
   2. Solving the Puzzle
   3. The Descriptive and Normative Consequences of Rejecting PC
   Conclusion

10. Matters To Be Settled

References
Index


List of Figures

Fig. 3.1 Absolute distance
Fig. 3.2 Euclidean distance
Fig. 3.3 Chebyshev distance
Fig. 3.4 Kullback–Leibler divergence
Fig. 5.1 Accuracy dominance
Fig. 5.2 Approximating coherence


1 Introduction

We are uncertain about many things. I am uncertain about whether it will rain this weekend, I am fairly confident that I locked my front door this morning, and I think it is almost certain that my lottery ticket will lose. When we reason and make decisions, we must work with our degrees of confidence about what the world is like. This raises a descriptive and a normative question. The descriptive question is: How do people reason with uncertain information? The normative question is: How does a rational person reason with uncertain information? Philosophers have focused on the normative question, and the many more specific normative questions about epistemic rationality that this overarching question branches out into, such as: What degrees of belief, or credences, are rational for a thinker to have, and why? How should a rational thinker’s credences cohere with one another? How does a rational thinker change their credences in response to learning new information? How should a rational thinker’s credences be informed by their knowledge of the objective chances? What credences are rational for a thinker to have in the absence of any evidence? Can two thinkers rationally adopt different credences based on the same body of evidence?

Epistemologists have made a lot of headway towards answering these questions in recent years by using formal models to represent credences, and by using mathematical and logical methods to investigate the normative properties of people’s degrees of belief. In order to be able to use these formal models, one must first decide how to represent credences. Our credences can range from certainty that something is false to certainty that it is true. Ordinary human thinkers’ degrees of confidence are a vague and messy affair. In many cases, the most appropriate characterization of our mental states specifies a vague range our credence falls in, such as “very high, fairly high, more confident than not,” etc. We are also often able to specify our confidence comparatively, for example as being (slightly/much) more confident in A than in B. In some cases, it makes sense to attribute precise credences to people, for example in simple games of chance. It doesn’t seem unreasonable to ascribe to someone a 1/6 credence that a die will come up 5 when rolled. The formal
epistemologist’s task is then to propose a systematic way of representing this messy assortment of more and less precise degrees of confidence. The most popular approach is to represent credences as falling somewhere in the interval of the real numbers between 0 and 1, where having a degree of belief of 0 marks the low end of the confidence scale, and 1 the high end. Intermediate degrees of belief are represented with precise real numbers in this interval. A function that assigns precise degrees of belief to different propositions can be viewed as giving a simplified model of a thinker’s actual degrees of confidence. This way of modeling credences is not entirely unfamiliar from everyday contexts: we sometimes say “I am 90 percent sure of this”, or “It’s 50/50 that this will happen.” In relying on models of credences that represent them as point-valued, formal epistemologists employ a technique for measuring real-life quantities that is familiar from other contexts. We often need to rely on somewhat arbitrary ways of making quantities precise that aren’t as precise in real life. For example, suppose I need to make a plan for which trees to cut back based on their height and width. I need to both record how high and wide each tree currently is, and also give guidelines for how much to cut them. But trees are unwieldy objects, and there is no precise number such that it is the exact height of that tree. Branches bend and move, leaves fall off, and each measuring position we choose will yield a slightly different value. There’s some arbitrariness involved in the measurements, but it is still useful to pretend that the trees’ measurements can be precisely specified in singling out which ones to cut and how much to cut them. A similar argument can be made in favor of representing degrees of belief numerically in ways that are artificially precise. I will adopt this way of representing credences throughout the book. Once a format for representing credences has been chosen, we can use formal models to state and defend normative constraints on people’s credences. The most influential and powerful framework for engaging in this project is Bayesianism, which develops norms of ideal rationality for credences by using formal, probabilistic models. A central tenet of the Bayesian view is that rational people’s credences should be representable as probabilities. If a person’s credences can be modeled as agreeing with a probability function, then they satisfy a necessary condition for being rational. We say in those cases that a thinker has (probabilistically) coherent credences. While this is a constraint that affects a thinker’s credences at a time, Bayesians also propose rules for how rational thinkers should update their credences in response to incoming evidence, such as the conditionalization rule. These are not the only constraints on rational credences that Bayesians adopt, but they are the most
widely accepted ones. Current Bayesian research is largely concerned with developing and criticizing arguments that are meant to justify these norms of rationality, but also with exploring which further norms of rationality and reasoning there are that pertain to credences. While philosophers have focused on normative questions about what makes our credences rational, the descriptive question of how we in fact reason and make decisions based on our credences has been taken up by psychologists and cognitive scientists. They have produced a large body of research showing that human reasoners don’t live up to the requirements of rationality that Bayesians propose. The norm violations psychologists find range from mild divergences from the proposed norms to fairly egregious violations, such as the well-known base-rate fallacy. This failure of human reasoners to live up to Bayesian standards of rationality is unsurprising once we pay attention to how demanding Bayesian requirements of rationality are. They are explicitly formulated as requirements of ideal rationality, and fully complying with them has been shown to be extremely computationally demanding. Since human thinkers have limited computational power, attention, and time available to them to manage their credences, being ideally rational by Bayesian standards is simply out of reach for them. This raises the question of how the norms of ideal rationality are even relevant to human thinkers. Presumably, we are interested in answering the question of what it takes for us to be rational. If all we can learn from formal epistemologists is that we fail to be rational according to standards that can at best be met by angels or other limitless supercreatures, then the Bayesian research program does not really answer the questions we care about. Bayesians tend to respond that the norms they propose are ideals that imperfect thinkers should approximate. The more a thinker’s imperfectly rational credences approximate compliance with norms of ideal rationality, the better. On this view, rational ideals are useful even if we can never fully comply with them. Yet, current Bayesian models don’t have the resources to substantiate this argument. They only characterize norms of ideal rationality, but they aren’t able to distinguish between better and worse ways of failing to comply with the ideal norms. They can neither judge which of two credence distributions approximates ideal rationality more closely, nor can they demonstrate that approximating ideal rationality more closely is better for non-ideal thinkers. I am not the first to notice that this is a serious problem for the Bayesian approach to theorizing about epistemic rationality, but little has been done so far to address it (see e.g. Earman 1992, Zynda 1996).

In order to vindicate the idea that Bayesian norms of rationality are relevant to human thinkers as ideals that should be approximated, we need to make precise and justify its two core assumptions. The first assumption is that non-ideal thinkers can be more or less irrational, and that being less irrational is better. This is intuitively plausible, but we need to find a precise sense in which this is true about non-ideally rational thinkers. Second, it is assumed that non-ideal thinkers should “approximate” ideal rationality, which suggests that we can understand the way in which being less irrational is better (if there is such a way) in terms of how far away the thinker’s credences are from the ideal. But can this notion of distance be made formally precise, or is it just an evocative metaphor?

The reason why it’s not obvious that the first assumption is justifiable is that there are some ideals that are “winner takes all.” To see that there aren’t always degrees of goodness when there is an ideal worth striving for, compare the ideal of being fluent in Portuguese to the ideal of being the first choice for your dream job. Being the first choice for your dream job is clearly an ideal situation. You would benefit from it by having the opportunity to pursue a stimulating career, to earn money by doing something you love, etc. However, there are natural ways of conceiving of approximations to this ideal that don’t get you an increasing share of these benefits. If you are the second choice for the position, you get no more of a portion of the benefits associated with the ideal state than if you are the fifth choice for the position. The benefits don’t obtain in any of the non-ideal situations, even if some of them are intuitively closer to the ideal state than others. Hence, on this way of understanding approximations to the ideal, being closer does not equate to being better off. By contrast, suppose that your ideal is to be fluent in Portuguese. Benefits of fluency in Portuguese include being able to talk to other Portuguese speakers, understanding Portuguese literature and films, etc. Approximating the ideal of being fluent clearly gives you an increasing portion of the benefits associated with being completely fluent. The closer you are to being fluent, the better you can converse with other Portuguese speakers, and understand films and literature. Bayesians clearly think of approximations to norms of rationality as being similar to the language example rather than the job example. Yet, this view needs to be justified. It needs to be shown what benefits are associated with being ideally rational, and how approximating ideal rationality gives the thinker an increasing portion of these benefits.

Assuming that Bayesians are right that approximating ideal rationality is not a “winner takes all” case, they are still left with the question of how we should understand the notion of approximating an ideal. Can it be captured by
a formal measure of distance? There is an abundance of possible measures of closeness to choose from, but it’s not guaranteed that any particular measure has the right features such that getting closer to ideal rationality according to it tracks the benefits we’re trying to capture.

The aim of this book is to show how Bayesians can precisify and justify these two assumptions, which helps them develop a comprehensive account of the epistemic rationality of degrees of belief. I explain how we can measure to what extent a thinker’s credences approximate compliance with norms of ideal rationality, and how these measures track various benefits of being less irrational. Bayesianism provides an extremely attractive, powerful framework for developing norms of rationality for credences, and once we gain the ability to model the credences of non-ideal thinkers, we can use it to answer a wide variety of questions about epistemic rationality.

My discussion will proceed as follows. Chapter 2 is concerned with the question of how we should develop a comprehensive normative theory of the epistemic rationality of credences. Its aim is twofold: (i) to familiarize readers with the basics of the Bayesian framework that are essential for understanding the arguments in subsequent chapters, and (ii) to offer an interpretation of the goals and methods of the Bayesian framework that reveals its shortcomings when applied to non-ideal thinkers. I begin by giving an overview of current avenues of Bayesian research, and how they contribute to the project of developing a comprehensive account of what makes our credences rational. I show that Bayesianism in its current form is best understood as a framework for building a normative theory of what makes our credences ideally rational. Most of the recent work in this area takes a particular direction, namely towards formulating and justifying norms that constitute requirements of ideal epistemic rationality. While this is an important avenue of research, it doesn’t help us make progress towards answering the question of how Bayesianism can provide us with interesting normative standards for non-ideal thinkers. The framework of ideal norms that is currently being developed lacks the capacity to distinguish between better and worse ways of being imperfectly epistemically rational. Moreover, it lacks the resources to substantiate a central Bayesian claim, namely that ideal epistemic norms apply to the beliefs of non-ideal thinkers as aims that should be approximated. Bayesianism can only offer a comprehensive theory of rational credence if it can offer an account of how thinkers who aren’t fully rational can approximate the ideal Bayesian norms. In Chapter 3, I begin to show how this can be accomplished. My focus is the central Bayesian tenet that a
thinker’s unconditional credences should be probabilistically coherent. It seems intuitively compelling that thinkers who don’t have coherent credences can be more or less incoherent, i.e. their credences can diverge from complying with the probability axioms only a little, or quite substantially. We can capture this intuitive idea by representing a thinker’s credence function as a vector, and measuring its distance to the closest probabilistically coherent credence function that is defined over the same set of propositions. The problem that arises in this context is that there are many possible measures we can use to determine the distance from coherence, and those measures can deliver incompatible rankings. I present a representative range of such measures and illustrate the ways in which they differ. The question we are left with at the end of this chapter is: Which, if any, of the many incompatible measures of distance from coherence should we adopt? In Chapters 4 and 5, I show that we cannot select an appropriate measure of distance from coherence until we answer another important question: Why is it good to be less, rather than more incoherent? An intuitively compelling answer is: because credences can perform their function better when they approximate coherence more closely. To develop this thought, I present two popular arguments that defend the norm of ideal coherence. The Dutch book argument takes the function of credences to be to guide our actions. The argument uses bets as stand-ins for actions more generally, and demonstrates that incoherent credences justify betting behavior that leads to guaranteed losses, whereas coherent credences don’t. The accuracy dominance argument takes the function of credences to be to represent the world as accurately as possible. This argument shows that if a thinker’s credences are incoherent, there is another coherent credence function they could adopt instead that is more accurate, or closer to the truth, regardless of what the world is like. Coherent credence functions, by contrast, are not accuracy-dominated in this way. I argue that we can generalize these arguments in order to show why it is good for non-ideal thinkers to have credences that are less, rather than more incoherent. If we use a particular measure of the distance to coherence, reducing incoherence leads to decreased Dutch-book-vulnerability. If we use a different measure of distance to coherence, reducing incoherence leads to improved accuracy in every possible world. We can show, moreover, that for any incoherent credence function, it is always possible to measure distance from coherence in such a way that there is a series of less incoherent credence functions that are both more accurate in every possible world and less Dutch book-vulnerable. Hence, using this argumentative strategy, we can vindicate
the assumption that it is beneficial to be closer to the ideal of coherence, and we can show which distance measures capture closeness to the ideal in ways that track increases in the relevant benefits. The argumentative strategy from these chapters is meant as a template that can be adapted to alternative arguments for the coherence norm, and also to other Bayesian norms that are supposed to function as ideals to be approximated. In Chapter 6, I consider views on which being ideally rational requires more than just being coherent. While extreme subjective Bayesians think that the coherence norm is the only requirement of epistemic rationality on our credences, more moderate proponents defend further requirements, such as versions of the Indifference Principle, the Principal Principle, the Uniqueness Principle, and others. This raises a question: How do we measure approximations to rationality, when being ideally rational requires thinkers to comply with multiple different epistemic norms? I distinguish different approaches to justifying norms of rationality by whether they assume that there is a single epistemic value or good that explains the various requirements of rationality, or whether there are multiple epistemic values or goods that have to be aggregated somehow in evaluating the rationality of epistemic states. I then list different possible measuring strategies for determining degrees of overall rationality, and I show which measuring strategies pair most naturally with the different normative theories of how rational requirements are justified. I also consider cases of constrained approximation, in which multiple principles of rationality apply to a thinker, but they are for some reason incapable of complying with one of them. I show that the next best credence assignment for the thinker need not be one in which the remaining principles are obeyed. In Chapter 7, I show how the rationality measures we established in previous chapters can be used to evaluate changes in a thinker’s credences. I am especially interested in how we can evaluate credence changes in thinkers who reason from irrational starting points. I consider three types of changes: cases in which the thinker revises their credences (i) without learning new information or adding new attitudes; (ii) without learning new information, but adding new attitudes; (iii) as a response to learning new evidence. Examining those cases leads to a variety of interesting discoveries: First, an incoherent thinker can always form new credences based on their existing incoherent credences without becoming more incoherent. Second, the identification of good patterns of reasoning is of very limited benefit in dealing with irrational thinkers. This is in large part because reasoning rules or patterns that are good to use for fully rational thinkers can lead to suboptimal results when employed by irrational thinkers. Third, evidence from cases in which thinkers
augment their credence functions by adding credences can help us argue against one of the strategies for measuring degrees of irrationality presented in Chapter 6. In Chapter 8, I consider how the theory of degrees of rationality I propose fits in with other related approaches to theorizing about rationality. I see the view developed in this book as a small piece of a bigger puzzle, and it is therefore important to ensure that there is no tension between the view I develop and nearby puzzle pieces. I focus on five topics in particular: the relationship between propositional and doxastic rationality, the relationship between ideal and ecological rationality, the relationship between evaluative and ameliorative approaches to theorizing about rationality, the relationship between epistemological and semantic perspectives on rationality, and the relationship between rational evaluations, permissions, and obligations. I argue that my view, which is best understood as an account of the propositional epistemic rationality of credences, not only harmonizes with the ways in which rationality is theorized in these domains, but is often a necessary ingredient for developing theories of these aspects of rationality. In Chapter 9, I expand my focus to examine how my theory of epistemic rationality can accommodate outright beliefs. In previous chapters, I am concerned exclusively with rational constraints on credences. Yet, according to a compelling and widely accepted view, people also have beliefs that don’t encode uncertainty, such as the belief that Paris is the capital of France, or the belief that today is Wednesday. In fact, this was (and still is) the dominant conception of belief in much of epistemological theorizing. This chapter is thus devoted to the question of what role such outright beliefs play in our epistemic conduct. On the standard conception of outright belief, people can rationally have outright beliefs in claims that they are not completely certain of. I endorse the view that people need outright beliefs in addition to credences to simplify their reasoning. Outright beliefs do this by allowing thinkers to ignore small error probabilities. What is outright believed can change between contexts. When our beliefs change, we have to ask how related other beliefs, including beliefs representing uncertainties, change in light of this. It has been claimed that our beliefs change via an updating procedure resembling conditionalization. However, conditionalization is notoriously complicated. This claim is thus in tension with the explanation that the function of beliefs is to simplify our reasoning. I propose to resolve this puzzle by endorsing a different hypothesis about how beliefs change across contexts that better accounts for the simplifying role of beliefs. I show that outright beliefs can only play their simplifying role by introducing slight incoherence in thinkers’ belief systems.

Finally, Chapter 10 summarizes my findings and points to avenues of future research.

In my discussion, I rely on a number of basic assumptions that I won’t explicitly defend, but that are worth mentioning at the outset. First, as I explained earlier, I will rely on the common Bayesian way of representing credences with precise real numbers between zero and one. This is not the only possible way of representing credences. Some authors argue that it’s best to represent a person’s credences with qualitative confidence orderings, because this is closer to how actual thinkers’ credences are structured (Stefánsson 2017). However, it’s not clear that this modeling choice is ultimately better than using numerical representations. If the axioms imposed on the qualitative ordering are too weak, we can’t represent information about how much more confident a thinker is in some claim A than in B, and hence the models miss important information about a thinker’s credences (Meacham & Weisberg 2011). But if we use stronger axioms that allow us to represent these differences in confidence in qualitative orderings, then the qualitative ordering can be represented by a probability function. In that case, I’m not persuaded that there is a distinctive advantage to avoiding numerical representations. Another approach is to model credences as interval-valued, or as sets of credence functions. Yet, this approach encounters a similar issue as point-valued representations, because determining the end points of each interval also requires determining arbitrary, precise cutoffs. The main philosophical points in this book don’t depend on how we represent credences. However, some of the formal results would have to be reworked if an alternative formal representation were chosen. For example, there is some controversy in the literature about how to measure the accuracy of imprecise representations of credences. The results in this book that rely on accuracy measures for precise credences would have to be shown to hold when we substitute the chosen measures for imprecise credences.

The standard representation of credences also assumes that credences are mental states that come in degrees, and that their contents are propositions, or sentences, or suitably similar truth-evaluable contents. This way of representing credences locates subjective uncertainty in the thinker’s attitudes, rather than in the contents of their attitudes. Some authors have argued for alternative representations that locate uncertainty in the contents of attitudes. For example, Moss (2018) argues that credences are best represented as simple belief-attitudes towards probabilistic contents, where these contents are sets of probability spaces. Moss argues that this representation is preferable to the standard one due to its greater theoretical elegance. However, Carr (2019)
suggests that representations of credences that locate subjective uncertainty in the attitude and representations that locate subjective uncertainty in the attitude content are generally intertranslatable. It seems safe to me to assume that the arguments and results I present do not depend on choosing a particular one of these representations, but I won’t investigate this question further here.

Some formal epistemologists have advertised frameworks for representing credences and justifying epistemic norms that are intended as substantively different alternatives to the Bayesian framework. Examples of such frameworks include ranking theory and Dempster–Shafer theory, among others.¹ These frameworks share with Bayesianism that they put forth highly demanding norms of ideal rationality. Hence, they also face the problem of explaining in what sense these norms can be regulative ideals for human thinkers. I hope that proponents of these frameworks will see the need to develop a solution to this problem, and that they will find the proposals I make for Bayesians to be helpful guides for how they might do so in their preferred theories.

I will furthermore presuppose a particular understanding of epistemic normativity. The question “why should we be epistemically rational?” can be answered in many different ways. One particularly attractive and plausible answer says that being rational is a particularly good way (perhaps even the only, or the best way) of ensuring that one’s attitudes are epistemically valuable. On this view, thinkers whose attitudes are less than fully rational thereby miss out on having attitudes that are as valuable as they could be. If we think of rationality in this way, we can see how being closer to fully rational could be better than being farther away from being fully rational. Being closer to fully rational (hopefully) delivers a greater amount of epistemic value than being farther away. This understanding of rationality as promoting epistemic value is of course very broad, and compatible with many different views about what epistemic value is, and how exactly it gives rise to norms of rationality. I won’t consider alternative understandings of epistemic rationality explicitly, but I encourage readers to ask themselves whether any alternative views they have in mind can explain why it is better to be less rather than more irrational.

Lastly, towards the end of the book, I will turn my focus away from credences and ask how outright beliefs fit into our normative picture. On my view of belief, some of our belief-like attitudes, our credences, encode

¹ A very accessible overview of D–S theory can be found in Liu & Yager (2008). For an introduction to ranking theory, see Spohn (2012). A good overview of different frameworks for representing uncertainty is provided by Halpern (2003).


uncertainty, but not all of them do. Human thinkers also have outright beliefs, sometimes also called “full beliefs” or just “beliefs,” which do not encode uncertainty. We ascribe outright belief to people when we say things like “James believes that he is out of milk” or “Jane believes that she parked her car in the back lot.” Having an outright belief in a claim doesn’t require being certain of the claim. This view of the dual nature of belief is widely shared, and I won’t explicitly defend it here. The ninth chapter is devoted to explaining how my view about the rationality of credences extends to outright beliefs. While I do assume a broadly Bayesian approach towards developing a comprehensive theory of epistemic rationality, my view leaves a lot of room for adopting different views within this general approach. There is widespread disagreement regarding what exactly the norms of epistemic rationality are, and how they should be formulated. My book will demonstrate how to answer the questions that drive my investigation for Bayesians of all stripes.


2 De-Idealizing Bayesianism

Introduction

This chapter will be concerned with the question of how we should develop a comprehensive normative theory of the epistemic rationality of credences.¹ My starting point will be the popular Bayesian approach to this problem. My aim is twofold. I want to (i) give an overview of current avenues of Bayesian research, and how they contribute to the project of developing a comprehensive account of what makes our credences rational, and (ii) argue that we need to expand the standard Bayesian accounts in a different direction in order to allow for meaningful evaluations of non-ideal reasoners. I will show that Bayesianism in its current form is best understood as a framework, or set of tools, for developing a normative theory of what makes our credences ideally epistemically rational. Most of the recent work towards developing the Bayesian framework goes in a particular direction, namely towards creating more comprehensive theories of what makes credences ideally rational. While this is an important direction of research, it doesn’t help us make progress towards answering the question of how Bayesianism can provide us with interesting normative standards for non-ideal thinkers. The framework of ideal norms that is currently being developed lacks the capacity to distinguish between better and worse ways of being imperfect. This also means that the current version of Bayesianism fails to give an account of what it means to approximate ideal rationality, and in exactly what sense such approximations constitute improvements of a thinker’s epistemic situation.² In later chapters, I will show how we can begin to remedy these omissions.

¹ In what follows, when I talk about the rationality of credences, I always mean the epistemic rationality of credences unless explicitly stated otherwise.
² There are a few exceptions to this claim. A handful of authors, such as Zynda (1996) and Schervish, Seidenfeld, and Kadane (2000, 2002, 2003) have worked on developing de-idealized versions of the Bayesian framework. Zynda’s paper especially stands out by giving detailed and insightful arguments for why we need an account of degrees of probabilistic incoherence. In decision theory, Weirich (2004) and Buchak (2014b) have put forth proposals for de-idealizing decision theory. Further relevant work includes Hacking (1967), Field & Milne (2009) and MacFarlane (2004). There is also interesting related work in computer science.

I will begin by describing the basic tenets of Bayesianism in section 1. In section 2, I characterize current avenues of Bayesian research. I argue that most of the literature focuses on expanding the inventory of ideal norms of rationality and providing justifications for them, or on generalizing existing ideal norms to a wider range of cases. In section 3, I compare three different ways of viewing the methodology Bayesians employ in arguing for norms of rationality, and I explain which view provides the best methodological underpinning of the Bayesian project. In section 4, I argue that in order to provide a comprehensive theory of the epistemic rationality of credences, Bayesians need to expand their framework so as to be able to capture the idea that non-ideal thinkers can fall short of the Bayesian ideals to a greater or lesser extent. Doing so will help develop the popular claim that Bayesian ideals should be approximated by human thinkers.

1. Bayesian Basics

Bayesianism is a framework for developing answers to the question of what makes our credences epistemically rational. The basic idea that underlies Bayesian theories of rational credence is that there is an important connection between rational credences and the laws of probability. Loosely speaking, a person’s credences are rational when they behave like probabilities. As we discussed in Chapter 1, credences are mental states that come in greater and lesser degrees of strength, and we can use numerical measures to try to quantify these degrees of strength. It is an interesting question in cognitive psychology what the best methods for eliciting a person’s credences are, but for now, we will just assume that we have such a procedure available to us. We can represent a person’s credences by assigning real numbers between 0 and 1 to each of their degrees of confidence in a sentence or claim. The interval [0,1] is chosen for purely conventional reasons. The number zero is usually interpreted as representing the lowest possible credence that someone could have in a claim, and the number one is interpreted as representing the highest possible credence. We could choose different numbers without changing the basic structure of the framework. Yet, once an interval is chosen, it no longer makes sense to ask whether someone’s credences could lie outside of the interval—by stipulation, this is impossible.


Once we have an assignment of real numbers from the [0,1] interval to sentences, we can ask whether this assignment is a probability function. To set up a probability function, we begin with a set of atomic statements {Ai}, and we combine it with the standard sentential logical operators to define a language L. We assume that {Ai} has finitely many members, and that the relation of logical entailment is defined in the classical way. A probability function P on L assigns a real number from the interval [0,1] to every sentence in L, in such a way that the following axioms are satisfied:

Normalization: For any statement A, if A is a tautology, P(A) = 1.

Non-Negativity: For any statement A, P(A) ≥ 0.

Finite Additivity: For any two mutually exclusive statements Ai, Aj, P(Ai ∨ Aj) = P(Ai) + P(Aj).

Instead of defining probabilities over sentences or statements of a language, we can also define probabilities over sets of possible worlds. For many of the standard debates in the literature on Bayesianism, it doesn’t make a difference which option we choose. Yet, later when we start talking about irrational thinkers, the sentence-based framework will be advantageous, so I am choosing it instead of the world-based framework.³

In what sense do a thinker’s credences, when represented as real numbers in the [0,1] interval, need to obey the probability axioms in order to be rational? There are at least two ways in which an assignment of credences to sentences can fail to be a probability function: it can be incomplete, or it can be incoherent. Probability functions as we defined them are always complete, i.e. they assign a value to every sentence that is expressible in the language over which the function is defined. But a person’s credences need not be complete in the same way. For example, someone might have well-defined credences in a number of atomic sentences, but not in their conjunction, since the conjunction is not a claim that this person has ever considered. A probability function also assigns probability 1 to every tautology, but the average person has not considered most tautologies, and thus cannot be said to assign any credence at all to them.⁴

³ Irrational thinkers sometimes assign different credences to statements that are logically equivalent. This is easier to model in the sentence-based framework than in the world-based framework, because the world-based framework doesn’t give us good resources to allow for different ways of identifying the same set of worlds, whereas the sentence-based framework does.
⁴ On one view, credences are a type of disposition. Proponents of this view might question whether people’s credences have gaps, since all that is needed for having a credence is having the right kind of disposition to reason, act, or speak once the claim under consideration becomes relevant. However, it seems plausible to me that even on this view, our credences have gaps. The fact that I would be able to assent to or act upon a claim after considering it (which might take me a while to think about if the claim is sufficiently complicated) doesn’t show that I am currently opinionated about this claim. I might be able to form a disposition to respond to it in the right circumstances, but that doesn’t mean I already have such a disposition. Moreover, for some sufficiently complex claims, I might not even be able to form an opinion about them upon reflection. For more on dispositional accounts of belief, see e.g. Schwitzgebel (2002).
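To make the three axioms concrete, here is a minimal sketch of how a small credence assignment can be checked against them. It is my illustration rather than anything from the book: the toy sentences, the numbers, and the helper name axiom_violations are invented for the example, and A and B are stipulated to be mutually exclusive while T is a tautology.

# A toy credence assignment over a tiny language. "A" and "B" are assumed
# to be mutually exclusive claims, and "T" is a tautology.
credences = {
    "A": 0.4,
    "B": 0.3,
    "A or B": 0.6,   # falls short of 0.4 + 0.3, so Finite Additivity fails
    "T": 1.0,
}

def axiom_violations(c):
    """Collect violations of Normalization, Non-Negativity, and Finite Additivity."""
    violations = []
    for sentence, x in c.items():
        # Non-Negativity, plus the upper bound that follows from the axioms
        if not 0.0 <= x <= 1.0:
            violations.append(f"credence in {sentence!r} lies outside [0, 1]")
    # Normalization: tautologies get credence 1
    if c["T"] != 1.0:
        violations.append("the tautology T does not get credence 1")
    # Finite Additivity for the mutually exclusive pair A, B
    if abs(c["A or B"] - (c["A"] + c["B"])) > 1e-9:
        violations.append("credence in 'A or B' is not the sum of the credences in A and B")
    return violations

print(axiom_violations(credences))
# ["credence in 'A or B' is not the sum of the credences in A and B"]

An assignment for which the check returns no violations satisfies the axioms on the sentences it covers; the one above is incoherent in exactly the sense discussed below, since its credence in the disjunction falls short of the sum of the credences in the disjuncts.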

The second way in which a person’s credences can fail to be a probability function is by being incoherent. If a thinker’s credences violate the probability axioms, for example by assigning a value of less than 1 to a tautology, then their credences are probabilistically incoherent, or incoherent, for short. We can derive all of the standard rules of probability that are usually listed in mathematics textbooks from the axioms above, and any deviation from these rules makes a thinker’s credences incoherent. I said above that a loose definition of Bayesianism claims that a rational thinker’s credences must behave like probabilities. The distinction between an incomplete and an incoherent credence assignment can help us see why this definition is indeed loose at best. In both cases, the credences fail to behave like a probability function, but only one of these failures is commonly taken to constitute a rational defect. A thinker’s credences are usually not taken to exhibit a rational defect when they are merely incomplete. Not having considered a claim and formed an opinion about it is not considered a failure of rationality. This might seem surprising, since Bayesianism is usually taken to require thinkers to be logically omniscient. Yet, on closer inspection, this requirement, properly understood, is compatible with the claim that incomplete credences are not irrational.

Let’s focus on the Bayesian requirement that logical truths should get credence 1, and falsehoods credence 0. The point I am about to make naturally extends to other logical relations that constrain rational credences, such as entailments and equivalences between claims, so I won’t discuss them explicitly. One possible interpretation is that a thinker is logically omniscient just in case she has credence 1 in every logical truth and credence 0 in every logical falsehood. Another possible interpretation is that a thinker must assign the correct credence of 1 or 0 to any logical truth or falsehood, provided they have an attitude towards it. A better term for this second interpretation is that the thinker must be logically infallible. On either understanding, the ideally rational thinker is able to correctly identify all logical truths and falsehoods (and other logical relationships), and never assigns them incorrect credences. The latter understanding, which requires thinkers to be logically infallible, is perfectly compatible with having
incomplete credences.⁵ Based on these considerations, we might say that a thinker’s credences are rational as long as they can be coherently filled in or completed so as to agree with a probability function on the underlying algebra of sentences.

By contrast, a thinker whose credences are incoherent is taken to thereby violate a necessary condition for having rational credences. It doesn’t matter for this whether the incoherent thinkers’ credences are complete or incomplete. An incomplete assignment of credences to the sentences of the underlying language L is incoherent when it can’t possibly be extended to an assignment of credences to all sentences of L that agrees with any probability function. This claim—that incoherence amounts to irrationality—has been subject to the most attention and argumentative defense among philosophers interested in Bayesianism. I will treat it as a central commitment of a Bayesian theory of rational credence (early proponents of this view are, e.g. Ramsey 1926, de Finetti 1937).

A further central commitment of Bayesianism concerns people’s conditional credences. The standard definition of a conditional probability is given by the ratio formula of conditional probability:

Conditional Probability: The conditional probability of B given A, written P(B|A), is defined as follows when P(A) > 0: P(B|A) = P(A & B) / P(A).

Conditional probabilities play an important role in the standard rule for updating on new evidence:

Conditionalization: When new evidence A is acquired with certainty (and no other evidence is acquired), the resulting probability of every claim B is equal to its previous probability conditional on A, so Pnew(B) = Pold(B|A).

⁵ Smithies (2015) argues that logical omniscience is a requirement of ideal rationality. He appears to endorse the first view of logical omniscience. However, I think this is mostly because he frames his discussion by appealing to ideal thinkers, and he conceives of ideal thinkers as having considered every logical claim. He never attends to the possibility of an ideal agent not having considered a particular logical claim. He also paraphrases his view by saying that ideal rationality is “incompatible with ignorance and error in the logical domain” (p. 2778). By contrast, understanding the requirement in terms of logical infallibility is suggested by Titelbaum (2015) and Zynda (1996).
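To see how the two principles interlock, here is a minimal sketch in Python. It is my illustration, not the author’s: it uses the world-based representation mentioned earlier because it keeps the bookkeeping short, the numbers are arbitrary, and the helper names prob, conditional, and conditionalize are invented for the example.

# Prior credences over the four ways two claims A and B could turn out.
# The values are chosen to be exactly representable floats and to sum to 1.
prior = {
    ("A", "B"): 0.25,
    ("A", "not-B"): 0.25,
    ("not-A", "B"): 0.125,
    ("not-A", "not-B"): 0.375,
}

def prob(p, claim):
    # Unconditional credence in a claim: sum over the worlds where it holds.
    return sum(x for world, x in p.items() if claim in world)

def conditional(p, b, a):
    # Ratio formula: P(B|A) = P(A & B) / P(A), defined only when P(A) > 0.
    p_a_and_b = sum(x for world, x in p.items() if a in world and b in world)
    return p_a_and_b / prob(p, a)

def conditionalize(p, a):
    # Conditionalization: after learning A with certainty, zero out the
    # worlds where A fails and renormalize, so that Pnew(.) = Pold(. | A).
    p_a = prob(p, a)
    return {world: (x / p_a if a in world else 0.0) for world, x in p.items()}

print(prob(prior, "B"))              # 0.375
print(conditional(prior, "B", "A"))  # 0.5
posterior = conditionalize(prior, "A")
print(prob(posterior, "B"))          # 0.5, matching Pnew(B) = Pold(B|A)

The last two lines display the content of the updating rule: after A is learned with certainty, the new unconditional credence in B coincides with the old conditional credence in B given A, computed by the ratio formula.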

As before, the principles are formulated as concerning probability functions, so we need to spell out how they relate to rationality constraints on people’s credences. As before, people’s assignments of conditional credences can either be incomplete or incoherent. The latter is standardly assumed to be a rational defect, whereas we view incomplete assignments of conditional credences as rationally permissible for the reasons given above. The conditionalization rule is taken to govern the revision of credences based on evidence by probabilistically coherent thinkers. If the thinker learns some claim A (and nothing else) with the highest possible confidence, then they should update their credences as suggested by the rule. The rule is silent about cases that don’t meet these conditions. Also, it doesn’t handle cases involving self-locating information well, so it should best be understood as not applying to these cases (Arntzenius 2003). These two claims—that conditional credences must obey the ratio formula to be rational, and that rational thinkers should conditionalize when they learn new evidence with credence 1—are two further central commitments of Bayesianism.⁶

In laying out these central tenets of Bayesianism, we have already encountered a characteristic tool that Bayesians employ in setting out and arguing for norms on credences: mathematical models. The Bayesian method of arguing for normative claims via mathematical models proceeds roughly as follows: We first identify a phenomenon that we want to make normative claims about. We then represent this phenomenon in a formal structure, and derive results about the properties of this structure. From there, the results have to be interpreted with respect to their normative significance for the phenomenon that is represented by the formal model (see Titelbaum forthcoming[a] for an insightful characterization of normative modeling). More specifically, the aim of Bayesianism is to make normative claims about what makes credences rational. A person’s credences are formalized as assignments of real numbers to sentences. We can then prove results about these assignments, for example that assignments of real numbers to sentences that obey the probability axioms have particular properties. These properties are then argued to have normative significance, which in turn yields normative claims about the domain of the theory, in this case, people’s credences. The structure of the Bayesian theory, unlike informal normative theories, thus requires that we have so-called bridge principles (e.g. MacFarlane 2004, Titelbaum 2013) that connect the domain of the normative theory to the formalism, and that help us interpret the formal results with respect to their normative significance. These bridge principles are in many cases not obvious, and they require detailed examination and argumentative support.

⁶ But see Hájek (2011) and Easwaran (2011b) for a detailed discussion of whether the standard definition of conditional probability should be modified.

18 -  consider the advantages of having a powerful formal framework at our disposal. It helps us represent and systematize large interconnected structures, such as a person’s credences, and discover normatively relevant properties of different credence assignments that would be difficult to come by without a formal framework. Moreover, formal frameworks allow us to investigate a phenomenon in a precise and controlled way, because we can specify and define all the parameters the framework employs. Thereby, we can discover novel features of the framework and enhance our understanding of the underlying issues. It is difficult to see how, for example, a theory of rational credences could generate insights at the same level of rigor and generality if it didn’t have formal tools and models at its disposal. Theories that are offered as alternatives to Bayesianism tend to resemble it with respect to the way formal models are used to state and derive principles of rationality. Examples of such views are Dempster–Schafer theory and ranking theory, which propose alternative, weaker constraints on rational credences (see e.g Halpern 2003 for a helpful overview). While I won’t discuss those theories here in detail, many of the points I will be making subsequently about Bayesianism equally apply to those alternative theories, because the way these theories develop norms of ideal rationality closely resembles the Bayesian methodology. Hence, the following discussion should be interesting even to readers who endorse normative theories about rational credences other than Bayesianism. I have now laid out some central Bayesian commitments: that we should model credences as assignments of real numbers from the [0,1] interval to sentences (or sets of worlds), that probabilistic coherence is a requirement of rationality, that thinkers who learn appropriate types of evidence with certainty should conditionalize, and that we use formal models to argue for normative claims. Proponents of different versions of Bayesian theories of rationality often endorse additional claims about what makes a thinker’s credences rational, and I will discuss some of them below. But the commitments presented in this section are shared by virtually any Bayesian, and I will treat them as core tenets of Bayesianism throughout the book. Bayesianism as just described gives us some necessary conditions on what it takes to have rational credences, plus a general methodology for further investigation. I am interested in exploring how Bayesianism should be expanded to give us a comprehensive theory of epistemic rationality for credences. In the remainder of this chapter, I will first describe projects that currently dominate Bayesian research, and I will argue that these projects further the goal of giving a complete theory of what it takes to have ideally rational credences. I will then argue that we also need to ensure that we can use the Bayesian framework

to make meaningful normative judgments about thinkers who can only aspire to, but never reach the goal of having ideally rational credences. I will explain how the Bayesian framework needs to be expanded in order to meet this demand.
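Before turning to current research, it may help to see the two update principles stated above in action. The following is a minimal illustrative sketch; the toy setup (a fair six-sided die and evidence that the roll is even) and the Python rendering are assumptions chosen purely for concreteness, not anything more than a worked instance of the ratio formula and conditionalization.

# Toy illustration of the ratio formula and conditionalization.
# Assumed setup: a fair six-sided die, prior credence 1/6 in each outcome.
outcomes = range(1, 7)
prior = {n: 1 / 6 for n in outcomes}            # P_old, coherent by construction

evidence = {2, 4, 6}                            # A = "the roll is even", learned with certainty
p_evidence = sum(prior[n] for n in evidence)    # P_old(A) = 1/2

# Ratio formula: P_old(B|A) = P_old(A & B) / P_old(A), defined because P_old(A) > 0.
# Conditionalization: P_new(B) = P_old(B|A).
posterior = {n: (prior[n] / p_evidence if n in evidence else 0.0) for n in outcomes}

print(posterior[6])                             # 1/3: the updated credence that the roll is a six
print(round(sum(posterior.values()), 10))       # 1.0: the updated credences are again coherent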

2. Current Bayesian Research

In stating norms of rationality pertaining to credences, Bayesianism abstracts away from many factors that we usually consider relevant to whether or not a thinker's attitudes or reasoning processes are rational, such as the amount of time a thinker has available to reason, the cognitive capacities or limitations the thinker is equipped with, or the limited introspective access the thinker has to their own mental states and processes.⁷ The norms of rational credence are developed, roughly speaking, by considering the characteristic roles credences (or beliefs more generally) play, such as representing the world and guiding action, and by thinking about how they ought to be configured to optimally play these roles.⁸ Here's how Lewis (1974, p. 338) describes this methodology in relation to decision theory:

Decision theory (at least, if we omit the frills) is not esoteric science, however unfamiliar it may seem to an outsider. Rather, it is a systematic exposition of the consequences of certain well-chosen platitudes about beliefs, desire, preference, and choice. It is the very core of our common-sense theory of persons, dissected out and elegantly systematized.

Lewis conceives of decision theory as a kind of systematic descriptive theory of how people make decisions, rather than as a primarily normative theory. Still, we can think of Bayesian principles of rational credence as being generated by the same reflective equilibrium method. Even if the principles seem very technical and removed from common sense, they are in fact carefully systematized and tested on the basis of core descriptive and normative assumptions about the role of beliefs more generally and credences in particular. These core assumptions tend to be either platitudes about the roles credences play in our ⁷ This is partly due to the fact that Bayesian norms of rationality are best understood as capturing the notion of propositional rationality, i.e. which attitudes would be rational for the thinker to adopt given her evidential situation and her priors. More on this later. ⁸ For a critical assessment of whether this methodology correctly captures the target phenomena, see Woods (2013).

20 -  cognitive lives, or intuitions that are taken to reveal central features of our credences. A common strategy is to start with a strong intuition about what makes credences rational or irrational, and then to find arguments that appeal to the function of credences to underpin the intuition. For example, most Bayesians find the idea that ideally rational credences must be coherent extremely compelling, so they search for arguments to support this judgment. The two most popular arguments for the claim that credences ought to be coherent are based on two fundamental, common-sense assumptions about the function of credences: that they should represent the world as accurately as possible, and that they should be action guiding. Accuracy-dominance arguments show that if a thinker’s credences are incoherent, there is another coherent credence assignment they could adopt instead that is more accurate regardless of what world is actual. Coherent credences, by contrast, are not accuracy-dominated in this way (see de Finetti 1974 for the proof of a version of the underlying mathematical theorem; for developments of the argument, see Joyce 1998, 2009, Leitgeb & Pettigrew 2010a, b, Pettigrew 2016). Hence, accuracy-dominance arguments demonstrate why coherent credences are better than incoherent ones at accurately representing the world. Dutch book arguments use bets as stand-ins for actions more generally, and demonstrate that incoherent credences justify betting behavior that leads to guaranteed losses, whereas coherent credences don’t. Hence, coherent credences play their action guiding-role better than incoherent credences (see e.g. Ramsey 1926, de Finetti 1937, Christensen 2004, Hájek 2008b). Current research projects in Bayesian epistemology use the methodology of formulating norms on the basis of common-sense assumptions about credences in order to further develop the view. Many researchers think that there are additional requirements of rationality pertaining to credences besides the ones I introduced, so they attempt to formulate and justify those further requirements.⁹ An example of such a requirement is the Principal Principle, first proposed by David Lewis (1980). The common-sense idea behind the principle is that if a thinker has knowledge of objective chances, then this knowledge should constrain their credence assignments. For instance, if I know that the objective chance of rain for tonight is 80 percent, then the rational credence for me to assign to the claim that it will rain tonight is 0.8. ⁹ Of course, there’s also work criticizing or adjusting the central tenets of Bayesianism that I introduced in the previous section, but discussing this work would lead us too far afield (see e.g. Easwaran 2011a). The following overview of different projects is not intended to provide full bibliography. Rather, it’s supposed to point the interested reader to a couple of papers for each topic that would be a good starting point to explore the issue.

There is a lively debate about how exactly this principle should be stated, and how it can be justified (for a comprehensive overview of different versions of the principle, and a proposal for how to justify it, see Pettigrew 2016). Another principle that is a candidate for a necessary condition on having rational credences is the Principle of Indifference. It is based on the common-sense idea that if you have a set of options that are in some sense symmetrical, and you have no evidence at all about which option will be realized, you should distribute your credence equally among the options. The most intuitive illustrations for this idea have to do with gambling. Usually, we have no evidence regarding which number will come up when a die is rolled, or how the cards are distributed in a deck, or where the ball will land on the roulette wheel. The reasonable response seems to be to give each possible outcome equal credence. Unfortunately, once we leave the realm of gambling, we don’t always have a principled way of identifying a symmetrical set of options. For this reason, it is controversial whether the Principle of Indifference really has the status of a general principle of rationality for credences (see e.g. LaPlace 1814; for recent discussion see e.g. Van Fraassen 1989, White 2010, Pettigrew 2016). Further principles people have proposed include principles governing the impact of higher-order evidence (Christensen 2010, Lasonen-Aarnio 2014, Titelbaum 2015, Schoenfield 2018a, Dorst 2019), the Regularity Principle (see e.g. Hájek 2012), the Reflection Principle (Van Fraassen 1984, 1989, Briggs 2009), and others. There is also a lively debate about how permissive the rational constraints on our credences are. Proponents of permissive views argue that there is more than one rational assignment of credences that is permitted by a given body of evidence (Meacham 2014). Proponents of uniqueness principles deny this. (Early defenses of versions of Uniqueness can be found in White 2005, Christensen 2007, and Feldman 2007. Recent ones include Dogramaci & Horowitz 2016, and Greco & Hedden 2016.)¹⁰ Another, slightly different but related project Bayesian epistemologists are engaged in is to generalize principles of rationality that are plausible, but in their current form restricted to a subclass of situations. Restricted versions of principles are often defended initially to simplify the discussion, and it is assumed that they will later be generalized. Removing those restrictions (ideally) shouldn’t affect the results that have been established for the more limited domain. The following research projects can be seen as attempts to

¹⁰ Carnap (1950) attempts to characterize a unique logical probability function, and can thus also be seen as an early defender of Uniqueness. Proponents of objective Bayesianism also fall into this camp (e.g. Williamson 2010).

22 -  remove restrictions and generalize existing Bayesian principles: There’s work to ensure that normative results that have been established for thinkers with finitely many credences hold up if we allow credences with infinite domains, and also related critical work arguing that expansions of this kind fail for particular norms or norm justifications (Williamson 1999, Easwaran 2013a and 2013b). The standard conditionalization rule is restricted to cases in which thinkers learn particular types of evidence with certainty. There’s work on expanding rules of credence revision to cover cases in which we don’t learn evidence with certainty (Jeffrey 1965, 1992), cases in which we learn about the presence of undercutting defeaters (Gallow 2014, Weisberg 2015, Greco 2017), cases in which we learn conditional information (van Fraasen 1981), cases in which we forget evidence (Titelbaum 2013), and cases in which our selflocating beliefs change (Elga 2000, Titelbaum 2013). I classify these cases as removals of restrictions, since the research usually doesn’t present them as additional norms for credences that are entirely distinct from the existing norms. Rather, the more general norms are usually taken to be expansions of existing norms, and it is taken to be a success condition of establishing a new more general norm that it still validates the existing norm in the appropriate range of cases. Further work that can possibly be seen as removing restrictions is concerned with alternative ways of modeling credences. We’ve already discussed earlier that the assumption that credences are precise that is made in standard Bayesian models does not reflect a corresponding level of precision in actual agents’ credences. Not having precise credences is usually not seen as a rational defect, although this is a point of controversy.¹¹ Some philosophers think that in cases where one doesn’t have very specific evidence, it can even be rationally required not to have precise credences. Hence, there’s work on how to formulate requirements of rationality that extend to cases in which thinkers have imprecise credences (see e.g. Sturgeon 2008). Moreover, there is work on alternative ways of modeling credences as being merely qualitatively ordered, although it is debatable whether these approaches should be considered versions of Bayesianism (see e.g. Stefánsson 2017). Yet another way in which Bayesianism is being expanded is by arguing that in order to correctly capture people’s credences, it is not sufficient to distinguish between conditional and unconditional credences. As Lennertz (2015) has argued, the structure of

¹¹ Elga (2010) argues that rationality requires thinkers to have precise credences. Carr (2019) suggests that precise or imprecise credences are just different ways of measuring thinkers’ credences, and that neither one of them can be rationally required.

certain kinds of quantified statements cannot be subsumed under either of these two categories, and we need to introduce a new class of credences he calls “quantificational credences.” There’s moreover debate about whether people have outright beliefs in addition to credences. Many philosophers think that people have both of these mental states, and that we need a theory of rational norms that gives us norms for beliefs and credences. Efforts to come up with such a theory that delivers rational norms for both of these types of belief can also be seen as removing restrictions from the Bayesian theory. I take on the question of how to formulate norms on outright beliefs in the penultimate chapter of the book. Some of these classifications are definitely controversial. But for my purposes it is not very important whether we should count a particular research project as establishing a new norm, or generalizing an existing norm. What is more important is that all the research projects just mentioned focus on the project of establishing a complete set of norms that tells us what it means to have perfectly rational credences.

3. How to Characterize Bayesian Methods The methodology currently employed in Bayesian research abstracts away from any obstacles thinkers might face towards having rational credences, and it thus doesn’t pay much attention to divergences from the ideal standards of rationality, other than saying that they aren’t ideal. This focus on establishing ideal norms of rationality has been criticized as a weakness of the Bayesian project. There are two more specific versions of this criticism. One worry is that we shouldn’t bother investigating ideal rationality at all, and focus on human rationality instead (given a suitable understanding of human rationality that is distinct from ideal rationality). The second worry is that it is a mistake to focus exclusively on ideal rationality in one’s theories. Christensen (2007) offers a convincing defense of the value of studying ideal rationality, even if it is unattainable for human thinkers. He points out that rationality is a quality that comes in degrees, and we think of humans as falling somewhere on a spectrum of being more or less rational. Once we acknowledge the graded nature of rationality, it is not surprising that, if we consider the full scale of degrees of rationality, the extremes of this scale may be unreachable. Yet, it is important to include the extremes in order to fully understand what it means to be more or less rational. Yet, this defense immediately shows why the second version of the worry is more serious: if

24 -  rationality is a graded quality, we cannot understand it if we focus exclusively on what it takes to be ideally rational, but not on what is involved in being rational to greater or lesser degrees. Focusing only on ideal rationality is just as misguided as focusing exclusively on any other specific point on the scale of degrees of rationality, such as the degree of rationality reachable by the average human thinker. A proper investigation of what it takes to be epistemically rational aims at characterizing the scale of degrees of rationality, with full or ideal rationality being the extreme that constitutes the upper limit of the scale. Bayesians of course readily admit that the standards of ideal rationality are so demanding that human reasoners can never be expected to fully comply with them. Their standard response is that rational ideals should be approximated as closely as possible by non-ideal reasoners, even if they can never become fully rational.¹² Given how commonly this response is given, one might expect that Bayesians are already well on their way towards developing a theory of degrees of rationality. Yet, in fact, such a theory has found little elaboration in the literature. As already mentioned, my goal in later chapters is to explain in detail how we can give a theory of degrees of rationality within the Bayesian framework, which both captures the idea that one can approximate ideal rationality more or less closely, and the idea that closer approximations to the ideal are in some sense better. Before turning to this task, I will examine different ways of characterizing the Bayesian methodology, to see which characterization best describes Bayesian practice, and is compatible with the idea that rational ideals should be approximated. I already introduced one understanding of Bayesian methodology in the previous section, which I will henceforth call the systematization view. On this view, the norms are derived by thinking about the characteristic role our credences are supposed to play in our thinking and decision-making. Once we have a systematic understanding of the role of our credences, we can develop norms that our credences should obey in order to perform their roles in the best possible ways. In developing the norms, we abstract away from limiting factors that interfere with credences playing this role perfectly, such as ¹² The way in which human reasoners approximate ideal norms can, but need not be intentional. A reasoner might intentionally check her arguments for flaws, or examine whether any of her attitudes are irrational in order to better comply with epistemic norms. At other times, her reasoning might be executed in a certain way out of habit. We can also think of our cognitive systems, which execute many reasoning tasks automatically and below the level of conscious awareness, as working to generate close to optimal results given the constraints they operate under, and given the environment the agent finds herself in. These latter ways of approximating ideal epistemic norms need not be intentional, the agent might not even be aware that it is happening. Bayesians who claim that human thinkers’ credences should approximate ideal norms of rationality usually have both of these senses in mind, the intentional and the non-intentional.

processing or time limitations or possibilities of error. This view of the Bayesian method makes clear why the ideal norms are unreachable for nonideal thinkers, yet still apply to them: the limiting factors that are abstracted away from in formulating the norms for our attitudes do in fact constrain thinkers like you and me, and so we can’t be ideally rational. On this view, we can also make sense of the idea that non-ideal thinkers are better off the more closely they approximate rational ideals. If fully rational credences are best at performing their functional roles, it is at least prima facie plausible to assume that the less rational one’s credences are, the worse they are at performing these roles, and the more rational they are, the better they are at performing them. Of course, these claims need further argumentative support, but they are sensible working hypotheses on the systematization view. There are two additional ways in which the Bayesian project has been characterized in the literature, which I will call the ideal agent view and the scientific idealization view. I will examine now how these characterizations compare to the systematization view in making sense of the Bayesian methodology and the applicability of Bayesian norms to non-ideal thinkers. According to the ideal agent view, Bayesianism is not best seen as a theory that is based on common-sense assumptions about our credences. Instead, it is best viewed as a theory that characterizes what the credences of ideal thinkers would look like, where these ideal thinkers have no limitations regarding their memory, computational abilities, and the like.¹³ Rational ideals on this view are the principles that ideal thinkers’ credences comply with. If we characterize Bayesian norms of rational credence in this way, this invites two questions: First, how do we know what ideal thinkers and their credences are like? Second, what do the resulting norms have to do with us, who are very different from ideal thinkers? The answer to the first question is: We don’t really know. Ideal thinkers aren’t creatures that we have an independent grip on, so that we could find out things about their credences, which would help us develop norms pertaining to them.¹⁴ But even if we had an independent grip on ideal thinkers and their credences, this wouldn’t help us with the uncomfortable

¹³ It’s not clear that any philosophers have explicitly endorsed this view, but appeals to ideal thinkers are common in the literature. Sometimes these appeals appear to presuppose a strong reliance on ideal agents, which is captured in the ideal agent view. Other times, ideal agents are merely used for presentational purposes and to illustrate what compliance with ideal norms would amount to. I don’t object to the latter use of ideal agents at all. ¹⁴ There is of course a sense in which we can know what ideal agents are like. We can make assumptions about epistemic norms and goods, and depending on these assumptions, we can describe agents that perfectly exemplify all of them. However, if this is how we arrive at our conception of ideal agents, then we can’t appeal to their features to justify epistemic norms, on pain of circularity.

26 -  problem to which the second question points. If we are ultimately interested in developing norms of rationality for credences had by people like you and me, then it doesn’t seem immediately clear why we should do so by investigating how thinkers who are very different from us manage their credences. Given that I differ significantly in my descriptive qualities from ideal thinkers, why would it be ideal for me to have credences that behave like those of ideal thinkers? And given that I cannot have credences that behave just like theirs, why would it be better for me to have credences that at least approximate the way ideal thinkers’ credences behave? In order to answer these questions, a proponent of this understanding of Bayesianism needs to first provide a response to the epistemological challenge, i.e. the challenge of how we can learn about the structure of ideal thinkers’ credences when we don’t have an independent grip on the nature of ideal thinkers. In a second step, the normative relevance of the resulting principles must be justified, presumably by arguing that thinkers like you and me resemble ideal thinkers in just the right ways, which makes it the case that thinkers like you and me would be better off by having credences that obey the same principles as ideal thinkers’ credences, or that at least approximately obey these principles. While I don’t want to argue here that it is impossible to answer these questions, this way of characterizing the Bayesian project strikes me as clearly inferior to the systematization view.¹⁵ The systematization view avoids both of the uncomfortable questions we encountered above: First, the systematization view doesn’t require us to have an independent grasp of what ideal thinkers are like. Second, it doesn’t require justifying that norms pertaining to these ideal thinkers also apply to us. Rather, the norms are developed as norms pertaining to our credences, they simply abstract away from limiting factors that can prevent us from fully complying with those norms. Thus, the question of whether human thinkers and ideal thinkers are alike in the right ways to ensure that the norms for ideal thinkers apply to humans does not arise on the systematization picture. We can then think of ideal thinkers as performing a merely illustrative function—we conceive of them as creatures that perfectly comply with all the norms, but we don’t need to appeal to ideal thinkers in justifying or deriving the norms. Another way of characterizing the Bayesian project is the scientific idealization view. This approach tries to make the Bayesian focus on rational ideals more palatable by claiming that it shares important features with idealized ¹⁵ An insightful criticism of relying too heavily on ideal thinkers in epistemic theorizing can also be found in Titelbaum (2013, 72–5), and in Christensen (2007).

theories in science, which are taken to have a more robust standing.¹⁶ In scientific theories or models, idealizing assumptions are introduced for a variety of different purposes. Weisberg (2007) argues that all idealizations amount to distortions of models or theories. He distinguishes three subtypes of idealizations according to the purpose they serve: Galilean idealizations (McMullin 1985), which are supposed to make complicated models more computationally tractable; minimalist idealizations, which are supposed to isolate the most salient causal factors driving a particular phenomenon; and multiple-models idealizations, which involve constructing different, incompatible models of a particular phenomenon to shed light on it from different angles. Each type of idealization involves a distorted representation of reality, because we’re either pretending that certain factors that play a role in the phenomenon in reality are absent or negligible, or we’re assuming that factors that can in fact vary are held constant. A natural question in this context is in what sense an idealized theory can be thought to be true or correct. The answer standardly given is that idealized theories are precisely true descriptions of counterfactual scenarios in which the relevant idealizing assumptions hold. Insofar as the idealizing assumptions are approximately true about the actual world, the theories can also be seen as being approximately true descriptions of how things actually are (although a model that starts from approximately true assumptions doesn’t always generate approximately true outcomes or predictions, see Colyvan 2013). But even models that represent reality incorrectly in central respects can be useful in understanding the phenomena being modeled (see e.g. Wimsatt 1987 for an overview of how false models can be helpful, and Bokulich 2011 for related discussion). It is important to notice that none of the scientific idealizations are normative in the sense that the idealized representations are characterizing states that are in some way better than the real-world phenomena they are models of. Nor is it claimed that the world would be better if it more closely resembled the idealized scientific models.¹⁷ We can see some commonalities between this way of thinking about ideal scientific theories and ideal Bayesian theories. In both cases, we identify a phenomenon in the actual world that we are interested in, and in modeling the

¹⁶ Authors who draw analogies between theories of ideal rationality (including but not limited to Bayesian theories) and scientific idealizations are for example Weirich 2004, Colyvan 2013, Woods 2013, Yap 2014. ¹⁷ Sometimes, scientific models are used for normative purposes. For example, we might use models that predict climate change to help us figure out how we can slow down global warming. Yet, this is not a case in which the scientific model itself is normative. The normative claim that it is desirable to slow down global warming is external to the scientific model. (Thanks to E. Hochstein for pointing this out.)

28 -  phenomenon, we try to concentrate on its essential features, and abstract away from complicating factors. However, we can also see some important disanalogies between scientific, descriptive models and normative Bayesian models. The most obvious one, which I just mentioned, is that Bayesian models are normative, which means that they make claims about how the world ought to be, which can’t be said of scientific models (for a general discussion of normative modeling, see also Titelbaum forthcoming[a]). Relatedly, we find important differences once we ask in what sense the theories or models are true. If we transferred the answer we just gave regarding scientific models directly onto normative Bayesian theories of rational credence, we’d have to say that these theories are precisely true of counterfactual scenarios in which all of the idealizing assumptions involved are correct, and approximately true of us just in case the idealizing assumptions are approximately correct of us. There are a variety of problems with this picture. First, one might worry that the cognitive limitations that are being disregarded in developing Bayesian norms are significant enough that it would be inappropriate to say that Bayesian norms are based on assumptions that are approximately true of us. It doesn’t seem even approximately true of humans that they don’t have any cognitive limitations. Moreover, Bayesians certainly don’t want to claim that Bayesian norms merely approximate the true norms that apply to human reasoners, which is what they would presumably have to say if the analogy to the scientific case applied perfectly. Rather, they want to claim that ideal norms of rationality apply to credences of human reasoners, and that those credences are better the more closely they approximate compliance with the rational ideals. But this is again a disanalogy to the scientific case, since scientific theories definitely don’t assume that greater proximity to an ideal theory constitutes any kind of improvement. Hence, while the Bayesian and scientific idealizations both construct simplified models of real phenomena, we have seen that there are significant differences between models that are intended to be normative, and descriptive models. Comparing Bayesian models, which are normative, to scientific models, which are descriptive is not particularly useful for understanding the Bayesian methodology for justifying ideal norms, or for legitimizing the Bayesian claim that non-ideal reasoners are better off if they approximate rational ideals more closely. I thus conclude that the systematization view of the Bayesian project is preferable to the ideal agent view and the scientific idealization view, since it has better resources to explain how Bayesian ideals apply to human thinkers, and why it is at least prima facie plausible that human reasoners are better off the more closely they approximate being ideally rational.

4. Ideal Bayesian Norms and Non-ideal Thinkers 4.1 Bayesian Norms as Ideals While it is very attractive to understand Bayesian norms as rational ideals or goals for thinkers like us, the current form of the Bayesian theory is insufficiently developed to play this role. If the Bayesian norms are supposed to function as ideals to be approximated for thinkers like you and me, several conditions must be met. First, it has to be true that it is beneficial to approximate the ideal norms, even if we can’t fully comply with them. In other words, there must be a sense in which non-ideal thinkers can be better or worse off that depends on how rational they are. It’s not always the case that if something can be considered an ideal, it is better to approximate it more closely, given a natural interpretation of what it means to be closer. In the first chapter, I already introduced the example of getting your dream job. This scenario can plausibly be considered as an ideal. The reason you consider it an ideal is because there are particular benefits associated with getting the job, such as doing something you find interesting, making a good living, etc. The job will be given to the candidate who is at the top of the hiring committee’s ranking. There is a natural sense in which you are closer to getting the job if you are ranked second by the search committee than if you are ranked tenth by the committee. However, it’s not at all clear that not getting the job in virtue of being in second place is better than not getting the job in virtue of being in tenth place. A job search is a winner-takes-all situation, and it’s not true that people who were closer to the top in the ranking get a greater portion of the benefit associated with being in first place than people who were ranked lower. Rather, they all completely miss out on the benefits of being in first place. What this shows is that, given certain natural understandings of closeness, being closer to some ideal state isn’t always better, and that being very close to some ideal state doesn’t need to confer almost the same benefit as being in the ideal state.¹⁸ For Bayesian norms to function as ideals to be approximated for our credences, approximations to them shouldn’t work like the job example. Rather,

¹⁸ We could of course think of the case in a different way, where you get your second-favorite job rather than your dream job. In this case, being close to the ideal does give you a large portion of the benefits that are associated with the ideal of getting your dream job. Or perhaps we might think being in second place is better than being in tenth place because it gives you a better chance at getting the job. But this doesn’t affect my point that there are some ways of conceiving of approximating an ideal regarding which it isn’t true that being closer is better.

30 -  being in some sense closer to the ideal should confer more of the benefits that come from perfectly complying with the norms. For example, having coherent rather than incoherent credences is supposed to be valuable because it helps make our credences more accurate, and better at guiding our actions. We could explain why it’s better to have less, rather than more incoherent credences if approximating coherence conferred more of these benefits. If approximating coherence means better accuracy and action-guidingness, then there is a robust sense in which we can benefit from approximating the ideal Bayesian norm of probabilistic coherence. If there were no benefit to being closer to ideal, however, it would be hard to explain in which sense ideal norms apply to our credences. Given that we never comply with them perfectly, there would be no sense in which it could be beneficial for beings like us to try to approximate ideal rationality. Suppose there is a sense in which non-ideal thinkers can be more and less irrational, and that being less irrational is in some sense better. It is a further substantive question how we should understand the notion of “approximation” or “closeness” to the ideal that is exhibited by non-ideal thinkers’ credences. Can we make this notion formally precise in terms of a mathematically defined distance measure, or is this distance-talk merely a convenient metaphor? There are many different distance measures that could in principle be used to capture the distance from an ideal, but it is not guaranteed that any of them will track the benefits of being less irrational, if there are such benefits.¹⁹ Current Bayesian research, with very few exceptions that I will discuss in detail in later chapters (Zynda 1996, Schervish, Seidenfeld & Kadane 2000, 2002, 2003, 2012), has not investigated how Bayesian ideals can be approximated, and in what sense being closer to the ideals might be better than being further away. This is somewhat surprising, since the importance of extending Bayesianism in this way has not gone unrecognized. For example, in his influential book Bayes or Bust, John Earman remarks in discussing whether Bayesianism should be regarded as a normative theory that applies to human reasoners: “The response that Bayesian norms should be regarded as norms towards which we should strive even if we always fall short is idle puffery unless it is specified how we can take steps that bring us closer to these goals” (Earman 1992, p. 56). Earman himself doesn’t take up this problem in further detail. Yet, he’s entirely correct in pointing out that in order to be a theory that

¹⁹ Thanks to Kenny Easwaran for helping me see how to best frame these issues here and in other places in the book.

can give meaningful evaluations of non-ideal thinkers' credences, Bayesianism needs to explore these dimensions. As I mentioned in the previous section, probabilistic coherence is not the only requirement of rationality that Bayesians endorse. Many proponents of Bayesianism endorse some combination of additional requirements of rationality, such as the Principle of Indifference, the Principal Principle, and others. Hence, according to many versions of Bayesianism, there are multiple dimensions across which thinkers can diverge from, or approximate, the Bayesian ideal. Thinkers can comply with some requirements but fall short of others. For example, a thinker might have coherent credences, but fail to comply with the Principal Principle, which requires making their credences match the objective chances they know about. Alternatively, they might succeed in making their credences match the chances they know about, but fail to make their non-chance-based credences cohere with the credences that are based on the chances. Once there are multiple criteria that the ideal state needs to meet, and that can be violated or complied with independently of each other, the question of how a non-ideal state can be improved to approximate the ideal more closely becomes increasingly difficult to answer. We should expect that the tradeoffs and interaction effects between violations of different rational norms will not be straightforward. An example that demonstrates how this might happen goes as follows. Suppose in order to be fully rational, a thinker's credences should be both coherent and agree with the known chances. Assume further that for some reason, the agent is unable to make a small number of her credences agree with the chances that she knows about. How should her remaining credences behave? She might be able to still have coherent credences, although that would lead to further mismatches between her credences and the known chances. Alternatively, she might sacrifice coherence in order to ensure that at least her remaining credences agree with the known chances, and are coherent with each other, though not with the credences in the small non-chancy subset.²⁰ It seems intuitively plausible that the second arrangement is epistemically better. While overall coherence is sacrificed, this prevents the non-chancy credences from "spreading" their defectiveness further. I will discuss in Chapter 6 how Bayesians can judge in cases like this which non-ideal credence assignments approximate rational ideals most closely.

²⁰ This kind of problem is raised by Easwaran & Fitelson (2012) for accuracy-based approaches to justifying coherence.
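A toy numerical version of the trade-off just described may be useful. The numbers are assumed purely for illustration: suppose the thinker knows that a fair coin toss has chance 0.5 of landing Heads and 0.5 of landing Tails, but is stuck with credence 0.8 in Heads. The simple "mismatch" and "gap" quantities below are ad hoc stand-ins, not the measures developed later in the book.

# Assumed toy case: known chances 0.5/0.5 for Heads/Tails; credence in Heads stuck at 0.8.
chance = {"H": 0.5, "T": 0.5}

stay_coherent = {"H": 0.8, "T": 0.2}    # sums to 1, but both credences now miss the chances
match_chances = {"H": 0.8, "T": 0.5}    # Tails matches its chance, but the credences sum to 1.3

def chance_mismatch(cred):
    # Total absolute divergence from the known chances.
    return sum(abs(cred[x] - chance[x]) for x in cred)

def coherence_gap(cred):
    # How far the credences are from summing to 1.
    return abs(sum(cred.values()) - 1.0)

for label, cred in (("stay coherent", stay_coherent), ("match chances", match_chances)):
    print(label, round(chance_mismatch(cred), 2), round(coherence_gap(cred), 2))
# stay coherent:  total chance mismatch 0.6, coherence gap 0.0
# match chances:  total chance mismatch 0.3, coherence gap 0.3

Which package of shortcomings is epistemically better is exactly the kind of question that the discussion in Chapter 6 is meant to adjudicate.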

4.2 Unfinished Business? Assume that we have succeeded in accomplishing the following: (i) for every Bayesian ideal norm, we have specified a way of approximating that norm; (ii) for each such case, we have justified that the norm is an ideal in the sense that it is better to be closer to it; and (iii) we have come up with a theory that lets us judge the goodness and badness of credal states that incorporates all the relevant dimensions of evaluation. In other words, assume that we have accomplished all the goals I mentioned in the previous section. Is that all we need to do in order to ensure that Bayesianism is a complete normative theory of the epistemic rationality of credences? Are there normative judgments, specifically rationality judgments about non-ideal thinkers that a Bayesian framework that incorporates features (i), (ii), and (iii) can’t account for? The theory I just characterized classifies all thinkers whose credences don’t conform to the ideal norms as irrational to some degree. This seems to be in conflict with views about rationality that are widespread in the philosophical literature. On these views, epistemic states and reasoning patterns are deemed rational that, at least at first glance, violate the requirements of ideal Bayesianism. For example, suppose a person needs to figure out the answer to a question. The reasoning problem is complex enough, and the time to figure out the answer limited enough, that the agent won’t be able to carefully reason to the correct answer. This is a situation that is extremely common. According to theories of bounded rationality, introduced by Simon (1955) and recently defended by Gigerenzer (2008, see especially the first essay “Bounded and Rational”), it is rational in these kinds of cases for thinkers to use reasoning shortcuts, or heuristics. These heuristics are much simpler to execute than reasoning strategies that guarantee ideal results, but, if deployed in the right context, can still be effective strategies for solving complex reasoning problems. Yet, by reducing complexity, and by being tailored to specific contexts, they can lead to suboptimal results depending on the task they are applied to. Hence, the theory of bounded rationality seems to recommend the opposite of the Bayesian view, namely that rational thinkers ought to avoid pursuing compliance with ideal norms in many cases. Similarly, cases of logical learning are often discussed in the literature as examples of rational thinking that seemingly violate ideal norms (see e.g. Christensen 2007, Smithies 2015). Suppose a mathematician (correctly) suspects a particular formula to be a theorem, but has not yet attempted or discovered the proof. It is commonly claimed that it is rational for the mathematician to have less than full confidence in the formula before discovering the proof, even

though this seems to contradict the Bayesian requirement that the ideally rational credence to assign is 1. After laying out the details of the Bayesian view, I will proceed in Chapter 8 to discussing these and other types of rationality judgments that are commonly endorsed in the literature, and that initially seem to conflict with the verdicts of the Bayesian account. I will argue that these conflicts are in fact only apparent, and that a deeper analysis of the relevant types of cases shows that the Bayesian view has an important role to play in underwriting the rationality judgments about these cases. Drawing together resources from the philosophical literature on the nature of justification, and on the semantics of gradable adjectives and deontic modals, and from the psychology literature on ecological rationality will help us appreciate the complex landscape of epistemic rationality, and the central role that the Bayesian view plays in enlightening it.

Conclusion I began this chapter by laying out some central tenets of the Bayesian framework, and by characterizing current avenues of Bayesian research. Most of this research is focused on identifying, justifying, and expanding requirements of ideal epistemic rationality for credences. As a result, current Bayesian research fails to pay sufficient attention to demonstrating how and why ideal norms of epistemic rationality apply to non-ideal thinkers. Answering these questions requires accomplishing a variety of different tasks. A foundational task that I undertook in this chapter was to give an interpretation of the Bayesian method that makes it at least prima facie plausible that Bayesian ideal norms can serve as regulative ideals or aims for non-ideal thinkers. I argued that the systematization view of the Bayesian method suits this purpose better than the ideal agent view and the scientific idealization view. I then argued that in order to generate evaluative judgments about non-ideal thinkers that help us distinguish between better and worse ways of being irrational, we need to spell out what it means to approximate complying with an ideal norm, and why it is better to approximate an ideal norm more closely. Moreover, we additionally need to explore the relationship between the Bayesian view of rationality and other, seemingly conflicting views of what constitutes epistemically rational attitudes and ways of thinking. I will take on these tasks in the following chapters.

3 Approximation Measures Introduction In the previous chapter, I argued that Bayesianism can only be extended to a comprehensive theory of rational credence if it can answer two important questions: Why is it beneficial to approximate ideal rationality, if it is beneficial at all? And if it is beneficial, how should we understand the relevant notion of approximation? In this chapter, I will begin to look for answers to these questions, starting with the latter one. My strategy will be somewhat venturesome: I’ll assume that it is possible to come up with a useful, precise notion of approximating rational ideals, and I will develop constraints that will help us zero in on such a notion. The guiding idea is that talk of ‘approximating ideal rationality’ is not just metaphorical, but that there is a literal way of thinking about distance from ideal rationality that has the potential to track benefits of being less rather than more irrational. To make the discussion more concrete, I will focus on the central Bayesian tenet that a thinker’s unconditional credences should be probabilistically coherent. I develop some desiderata that a measure of distance from a rational ideal (coherence in our case) should satisfy in order to be recognizable as a distance measure, and in order to potentially be able to track benefits of being less irrational. I then spell out and examine different approaches for determining how closely a thinker’s credences approximate the ideal of being probabilistically coherent, and I identify a class of acceptable measures based on these desiderata. Once we’ve homed in on a class of suitable measures, we can go on in subsequent chapters to examine whether any of them in fact track ways in which approximating coherence (and ideal rationality more generally) delivers increasing benefits. In section 1, I lay out the methodology I employ in finding an appropriate measure of closeness to probabilistic coherence, and I propose a number of desiderata that an adequate measure should satisfy. In section 2, I will discuss an attempt to give a qualitative measure of closeness to probabilistic coherence, and explain why it doesn’t satisfy several of the desiderata laid out in section one. In section 3, I show how we can use distance measures to capture the idea that incoherent credences can be closer to or farther away from being

coherent, and I discuss which distance measures are particularly well suited for our purposes.¹

1. Basic Desiderata for Approximation Measures The aim of this chapter is to identify suitable ways of measuring how closely a given credence assignment approximates the ideal of being probabilistically coherent. Coherence is the most central ideal norm in the Bayesian framework, and once we have a strategy for measuring approximations to coherence, we can expand our approach to include approximations to other ideal norms, or combinations of them, as well. In order to capture approximations to coherence, we can either use qualitative or quantitative measures. A qualitative measure delivers a (total or partial) ranking of credence assignments according to their relative closeness to coherence. A quantitative measure produces such a ranking as well, but it additionally delivers numerical information about a credence assignment’s closeness to coherence. A quantitative measure, but not a qualitative one, can thus deliver information about how much closer one assignment of credences to sentences is to coherence than another. Regardless of whether a measure is qualitative or quantitative, there are a number of desiderata it should meet in order to properly capture closeness to coherence. The most basic desideratum is that it must be recognizable as a measure of closeness to coherence. We assumed, at least for now, that the notion of approximation is to be understood literally rather than metaphorically, and this desideratum captures this idea. That means that it must agree with our clear judgments about comparative closeness to coherence. I call this the judgment preservation desideratum. While we usually can’t easily recognize which of two credence assignments is closer to being coherent in complicated cases, we often have a very clear sense in simple cases. Judgments about these simple cases can help us rule out measures that are not recognizable as capturing approximations to coherence. A couple of examples will help illustrate what I have in mind. Suppose two thinkers, Ann and Ben, have credences about the same set of sentences. They assign the exact same coherent credences to all the sentences, with one exception. There is one tautology T in their credence assignment, to which Ann assigns a credence of 0.99, and Ben assigns a credence of 0.5. To have fully coherent credences, both thinkers ¹ Parts of this chapter are based on Staffel (2015). I am grateful to Kenny Easwaran for helpful suggestions regarding the framing of the discussion in this chapter.

36   would have to assign T a credence of 1. It seems intuitively obvious that Ann’s credence approximates ideal coherence more than Ben’s does. For another example, consider now a case in which Ann and Ben have credences in only two claims, A and ~A. They both assign a credence of 0.5 to A, but Ann assigns ~A a credence of 0.51, whereas Ben assigns ~A a credence of 0.7. Both credence assignments are incoherent, because they violate the requirement that a thinker’s credences in a set of mutually exclusive and jointly exhaustive statements must sum to 1. Again, it seems obvious that Ann’s credences approximate ideal coherence more closely than Ben’s. At least prima facie, we should be suspicious of any measure that doesn’t validate clear judgments about approximations to coherence in simple cases like the ones above, because such a measure can hardly be taken to precisify the idea that Bayesian ideals should be approximated by non-ideal thinkers. Besides being recognizable as a measure that captures closeness to coherence, there are three further basic desiderata a suitable measure should satisfy. They are the incompleteness desideratum, the comparability desideratum, and the no-inundation desideratum. According to the incompleteness desideratum, a suitable measure of the closeness to coherence of a thinker’s credence assignment should be applicable to credence assignments that are incomplete. Recall that in Chapter 2, I discussed in what sense a thinker’s credence assignment must resemble a probability function in order to be rational. Probability functions are always defined over full algebras, i.e. there is no proposition expressible in the language that the probability function does not assign a value to. However, we saw that this is neither a realistic assumption to make about people’s credences, nor does it seem to indicate a rational defect if a person’s credence assignment is not complete in this way. Formally, we represent this as simply not including those claims in a representation of a thinker’s credences. Any plausible measure of closeness to coherence should be applicable to such incomplete credence assignments. It would severely limit the usefulness of a measure if it were only applicable to credence assignments that are defined over complete Boolean algebras, as probability functions usually are. The comparability desideratum requires that suitable measures of closeness to coherence avoid excessive incomparability. Like the incompleteness desideratum, it ensures that we home in on a distance measure that is applicable to the kinds of cases we’re interested in. If two credence assignments can intuitively be compared with respect to how closely they approximate coherence, then a suitable measure of distance to coherence should allow us to compare them. In general, we should expect credence assignments of the same

size to be comparable, especially when they are defined over the same set of sentences. For example, when an incoherent thinker revises some of her credences, we should be able to compare the credence assignments before and after the revision with regard to how closely they approximate coherence. Credence assignments that are very different in size are more difficult to accommodate. If one credence assignment contains ten times as many sentences as another, comparing their distance from coherence does not obviously make sense. However, there might be ways around this problem, such as relativizing a credence assignment’s closeness to coherence to its size. I will leave it open for now whether a measure can meet the comparability desideratum if it doesn’t allow comparisons between credence assignments that greatly differ in size. The last desideratum I propose is that a suitable measure of distance from coherence should avoid what I call the inundation problem. In a nutshell, a measure has an inundation problem when its output is generated by an unrepresentative sample of what is to be measured, such that the properties of this sample crowd out or overwhelm relevant features of the whole space. The reason why inundation should be avoided by distance measures that suit our purposes is that measures that suffer from inundation are unlikely to be good at tracking the benefits of being less irrational, if there are such benefits (even though they might technically be distance measures from a mathematical point of view). This no-inundation desideratum requires more explanation and defense than the previous ones. To see what it amounts to, it is useful to first consider as an analogy how we measure the overall wealth of a country. One common way of doing so is by determining the size of its gross domestic product, which is essentially a measure of the size of a country’s economy. Another common strategy is to measure the gross domestic product based on purchasing power per capita, which relativizes the gross domestic product of a country to the size of its population, and takes into account the local cost of living. In principle, there are also other, nonstandard measures we could adopt. For example, we could measure a country’s wealth according to the average income of the richest 1 percent of the population, or the poorest 1 percent. It is easy to see why these nonstandard measures aren’t as good as the commonly used ones if we are trying to measure the overall wealth of a country. They might be good measures of how rich the richest, or how poor the poorest inhabitants are, but they tell us little about the overall wealth of a country. That means a good measure of overall wealth must either take into account, roughly speaking, the wealth of every inhabitant, or it must somehow extrapolate from the wealth of a group that is deemed representative. But the

38   nonstandard measures I mentioned rely on extrapolating from groups that aren’t representative of the entire country. For example, a very poor country with a small rich elite could mistakenly be classified as wealthy. Hence, even if our common-sense notion of a country’s wealth isn’t precise enough to pick out a specific measure, it is still useful for ruling out measures that clearly don’t capture how wealthy a country is overall. How does this shed light on measuring distance from coherence? Just like the notion of the wealth of a country, the notion of the coherence of a credence assignment is a global notion. A credence assignment is probabilistically coherent just in case the whole assignment is representable as a (complete or incomplete) probability function, and incoherence obtains when there are divergences from this requirement anywhere in the thinker’s credence assignment. A suitable measure of divergences from coherence should reflect this global character. Such a measure will track differences in coherence between credence assignments in a way that is sensitive to changes in coherence across all of the thinker’s credences. And if there is indeed a sense in which differences in coherence amount to differences in how good the thinker’s credences are (with goodness being suitably precisified), then a measure of distance from coherence that has such global sensitivity will be a better candidate for tracking this goodness than one that doesn’t. Hence, just like in the case of measuring wealth, we should be wary of measures that determine distance from coherence based on a non-representative sample of the thinker’s credences. If a measure exhibits this behavior, I will say that it has an inundation problem. The inundation problem obtains when two credence assignments appear to differ in how closely they approximate coherence, but the measure in question assigns them the same distance from coherence, because it relies on an unrepresentative sample of the thinker’s credences, and relevant differences among the two assignments are thereby inundated. Similarly, a measure that has an inundation problem might assign different degrees of closeness to coherence to credence assignments that are intuitively similar in this respect, because the measure relies on an unrepresentative sample of the thinker’s credences. This is of course not to say that a measure of distance from coherence that exhibits inundation could never be useful. Just like in the wealth case, we might sometimes be interested in the properties of a non-representative sample of the thinker’s credences. A measure that is sensitive only to the properties of this sample just wouldn’t serve the purpose of measuring the overall closeness of the agent’s credences to coherence, and it wouldn’t hold much promise for tracking benefits of coherence.

The desiderata I have specified will not be specific enough to identify a single measure as the best measure of closeness to coherence. Moreover, whether a measure fulfills a desideratum can sometimes be subject to interpretation. Since we don’t have a hard and fast way of identifying exactly which credence functions are intuitively comparable, and which cases fall into the category of clear and simple judgments of comparative closeness that should be preserved, there might sometimes be disagreement about whether a particular measure is ruled out or not. But that’s not a problem, given the purpose that the desiderata are intended to serve. Even if they only approximately characterize a set of desirable measures, they will still be very helpful in ruling out measures that clearly don’t capture our target, and hence narrowing down the options. Recall that we’re conjecturing for the purposes of our investigation that there is a literal way of thinking about distance from ideal rationality, and more specifically, coherence, that has the potential to track benefits of being less rather than more irrational. In what follows, I will examine two different types of measures of distance from coherence for unconditional credences, qualitative measures, and quantitative measures. A qualitative measure can deliver a ranking of credence assignments according to their comparative closeness to coherence, but it cannot give us quantitative information regarding how much closer one credence assignment is to coherence than another. A quantitative measure can deliver numerical information about distance from coherence, so it can deliver verdicts about how big a difference there is between credence assignments regarding their closeness to coherence. I will begin by discussing a qualitative measure, and I will argue that it doesn’t satisfy the desiderata I set out in this section. I will then move on to discussing various quantitative measures.

2. Qualitative Measures The earliest published proposal for an incoherence measure is due to Lyle Zynda (1996). Zynda argues that we need a measure of closeness to coherence in order to explain how ideal Bayesian norms can apply to real thinkers. The measure he develops is a qualitative measure of incoherence, and to my knowledge, it is also the only qualitative measure in the literature. I will argue that it is not well suited to meet the four desiderata, and that similar issues are likely to befall other qualitative measures. The basic idea behind Zynda’s measure is that we can compare incoherent credence assignments by comparing their maximal coherent restrictions. That

means we compare incoherent credence assignments by comparing the largest subsets of those assignments that aren't incoherent. He defines a credence function as a set of ordered pairs of propositions and their assigned credences, where the propositions must form a Boolean algebra.² A maximal coherent restriction of an incoherent credence assignment can be generated by removing the smallest possible number of proposition/credence pairs from the thinker's credence assignment, such that the remaining credences can be extended to a coherent credence assignment over a Boolean algebra. Often, there will be more than one way of doing so, in which case all possible ways of creating a maximal coherent restriction must be considered. In comparing two different credence assignments c and c', one can arrive at one of four different outcomes:
1) c and c' have the same maximal coherent restrictions. (Zynda requires that the maximal coherent restrictions of c and c' contain all the same pairs of propositions and assigned credences; it is not sufficient that they are defined over the same set of propositions. The same holds, mutatis mutandis, for the subsequent relationships.)
2) For all maximal coherent restrictions of c and c', each of the maximal coherent restrictions of c is a proper subset of one of the maximal coherent restrictions of c'.
3) For all maximal coherent restrictions of c and c', each of the maximal coherent restrictions of c' is a proper subset of one of the maximal coherent restrictions of c.
4) None of the above.
In the first case, c and c' are equally coherent; in the second case, c' is more coherent than c; in the third case, c is more coherent than c'; and in the fourth case, c and c' are incommensurable. We can use this method to order credence assignments that are commensurable with respect to how incoherent they are. Zynda's measure is a natural proposal for setting up a qualitative measure of approximation to coherence, but unfortunately, it has several serious problems. The first problem is that it delivers orderings of incoherent credence assignments that are intuitively incorrect. The second problem is that the measure generates excessive incomparability between credence assignments. The third problem is that the measure does not permit incomplete credence assignments, and it can't be modified to change this. I will first show that the measure violates the judgment preservation desideratum. For example, consider two credence assignments c₁ and c₂, which are both defined over the same set of propositions {A, ~A, ⊥, T}. They assign credences as follows:

² A set of propositions has the structure of a Boolean algebra just in case it contains every logically distinct proposition that can be expressed by combining the atomic propositions in the set with the standard logical connectives.

c₁(A) = 0.5, c₁(~A) = 0.5, c₁(⊥) = 0, c₁(T) = 0.5
c₂(A) = 0.5, c₂(~A) = 0.5, c₂(⊥) = 0, c₂(T) = 0.99

The two functions have the same maximally coherent restrictions, namely the following set of proposition/credence pairs {⟨A, 0.5⟩, ⟨~A, 0.5⟩, ⟨⊥, 0⟩}. This means that according to Zynda's measure, they are equally ranked with respect to how much they approximate coherence. However, this is not an intuitive result at all. Given that c₁ assigns a degree of belief of 0.5 to the tautology, whereas c₂ assigns it a degree of belief of 0.99, and that is their only difference, it seems much more natural to think that c₁ is more incoherent than c₂. Hence, Zynda's measure fails to capture the intuitive difference between these two credence assignments. And since capturing intuitive differences in closeness to coherence between credence assignments is one important desideratum for a measure of distance from coherence, this is a significant problem for this measure.³

The second problem with Zynda's measure concerns cases in which two credence assignments can intuitively be compared, but are incommensurable according to Zynda's measure. Suppose again that we are comparing credence assignments defined over the set of propositions {A, ~A, ⊥, T}. We want to compare two credence assignments c₃ and c₄:

c₃(A) = 0.5, c₃(~A) = 0.51, c₃(⊥) = 0, c₃(T) = 1
c₄(A) = 0.9, c₄(~A) = 0.9, c₄(⊥) = 0, c₄(T) = 1

³ Zynda is in fact aware of this kind of result of his measure, and he comments on it in a footnote of his paper: "Consider, for example, a person whose degree of belief function f is thoroughly incoherent but is everywhere numerically close to a probability function. [...] Intuitively, there is a sense in which such a person's state of opinion is very 'close' to being coherent, but it would come out very badly on my account, since very little of it is actually coherent. [...] This is a distinct sense of comparative coherence from the one offered above; in my view, both senses are interesting and worth developing in greater detail." (Zynda, 1996, p. 215) Interestingly, Zynda acknowledges that his measure does not capture the very intuitive idea that the degree of incoherence of a credence assignment depends on numerical closeness to a probability function. Yet, he claims that there is a different graded notion of incoherence, which only depends on which parts of a credence assignment are actually coherent. I don't think that this notion is what we are aiming for when we try to find a measure of probabilistic incoherence. If we want to know how much a thinker diverges from being perfectly coherent, it seems natural and important to take numerical differences between thinkers' credences into account. Zynda's measure may still be of technical interest, but I think it fails to capture our most natural and interesting judgments about approximations to coherence.

Notice that the sum of the credences in A and ~A is 1.01 for c₃, whereas it is 1.8 in the case of c₄. Since rationality requires that any thinker’s credences in two propositions A and ~A sum to 1, it seems intuitively obvious that c₄ displays a greater departure from coherence than c₃. However, Zynda’s measure doesn’t let us compare the two credence assignments. For each credence assignment, we can create two maximally coherent restrictions, either by removing the credence in A, or by removing the credence in ~A. Doing so reveals that they neither have the same maximally coherent restrictions, nor do their maximally coherent restrictions stand in a subset relation, which means that they are incommensurable. This is an undesirable result, since it seems intuitively unproblematic to compare the two functions. It is easy to see that once we consider larger credence assignments, there will be even more ways to create maximally coherent restrictions of any incoherent credence assignment. We will thus frequently encounter the problem that the subset relations between the maximally coherent restrictions of credence assignments needed to produce incoherence orderings don’t hold. The measure thus violates the comparability desideratum. The last problem concerns the incompleteness desideratum. Zynda must assume that every credence assignment is defined on a full Boolean algebra of propositions, i.e. contains every logically distinct proposition that can be formed from some set of atomic propositions and the standard logical operators. However, this is an undesirable idealizing assumption, because real thinkers most likely have “gaps” in their credence assignments, which are propositions they have never entertained, and thus don’t have a credence in. Gappy credence assignments cannot be evaluated in Zynda’s measure. Filling in gaps in one’s credence assignment is actually one important form of reasoning thinkers can engage in: a thinker may consider some proposition A that they had not previously entertained, and wonder what credence to assign to it based on the credences they already have. Yet, the Boolean algebra requirement prevents us from evaluating whether a thinker has successfully executed this kind of augmentative reasoning. Since it involves adding a credence to one’s existing credence assignment, it follows that either the thinker’s initial credences, or their resulting credences, or both, cannot be defined over a Boolean algebra. Of course, the Boolean algebra requirement only presents a problem for Zynda’s measure if it cannot be relaxed. It unfortunately turns out that Zynda

needs this assumption, because otherwise his measure gives clearly incorrect results, as we can see from the following example: Suppose a thinker reasons in the following way, augmenting their existing credences by adding a credence in ~(A ∨ ~A).

c₅(A) = 0.5            c₅(A) = 0.5
c₅(~A) = 0.5     ⇒     c₅(~A) = 0.5
c₅(A ∨ ~A) = 1         c₅(A ∨ ~A) = 1
                       c₅(~(A ∨ ~A)) = 0.1

In evaluating this instance of augmentative reasoning, it is immediately obvious that the thinker's new credence in ~(A ∨ ~A) makes their credences incoherent, even though their credences were coherent initially. Hence, an adequate measure of distance from coherence should tell us that the thinker increased their distance from coherence by reasoning in a way that made them incoherent. With Zynda's measure, however, we cannot determine how far from coherent the initial credence assignment c₅ is, because its domain is not a full Boolean algebra. It only becomes a Boolean algebra when ~(A ∨ ~A) is added. If Zynda allowed his measure to apply to c₅ before and after ~(A ∨ ~A) is added, both credence assignments would turn out to be ranked equally, because they have the same maximal coherent restriction. However, c₅ is initially coherent, but with c₅(~(A ∨ ~A)) = 0.1 added, it is incoherent, so this cannot be true. It is precisely to avoid this sort of result that Zynda's measure requires probability functions to be defined over complete Boolean algebras of propositions. In sum, we saw that Zynda's measure has three major problems. The first problem is that his measure does not take into account numerical differences between incoherent thinkers' credences, which leads to counterintuitive ways of ordering various credence assignments according to their closeness to coherence. The second problem is that the measure renders credence assignments incommensurable that can intuitively be easily compared. The third problem stems from the indispensable requirement that a credence assignment must be defined over a Boolean algebra of propositions, which makes the measure unsuitable for evaluating gappy credence assignments. The measure thus violates three of our desiderata. These problems are important to keep in mind, because they give us guidance for finding a better measure. Before moving on, I will briefly consider whether the problems we have identified with Zynda's measure have consequences for its ability to track potential benefits of being less incoherent. The desiderata specified earlier were

supposed to help us identify measures that are especially well suited to explain why there is something beneficial about approximating coherence. If Zynda's measure tracked some benefit of being coherent despite violating several of our desiderata, this would suggest that the desiderata were not well chosen. Yet, it turns out that this worry can't be substantiated. As I mentioned in Chapter 2, there are different arguments for why it is valuable to have coherent credences. Two central reasons that have been given are that coherent credences avoid being accuracy-dominated, and that they are not vulnerable to a Dutch book. Hence, we can ask whether Zynda's measure identifies a sense of closeness to coherence that delivers an increasing portion of those benefits. Consider the two credence assignments from above:

c₁(A) = 0.5, c₁(~A) = 0.5, c₁(⊥) = 0, c₁(T) = 0.5
c₂(A) = 0.5, c₂(~A) = 0.5, c₂(⊥) = 0, c₂(T) = 0.99

As mentioned earlier, c₁ is worse than c₂ in terms of how well the two functions intuitively approximate coherence. This result is reflected both in the accuracy and in the Dutch book-vulnerability of c₁ and c₂. Regardless of which possible world is actual, c₂ is more accurate than c₁ according to any continuous strictly proper inaccuracy measure. This is because the only difference between the credence assignment is in the credence towards the tautology, and c₂ obviously assigns a credence closer to 1 than c₁. The difference in credences in the tautology also leads to differences in Dutch book vulnerability. A clever bookie who wants to exploit the credence assignments’ incoherence would be able to extract a greater guaranteed loss from c₁ than from c₂, assuming we are disallowing repeated betting and holding the stakes of the bets fixed between c₁ and c₂. Yet, as we remarked earlier, Zynda’s measure ranks c₁ and c₂ as being equally incoherent. This means that some important benefits that we hope to get from being more, rather than less coherent, line up with our intuitive judgments about c₁ and c₂, rather than with Zynda’s ranking. Of course, we’ve just examined a single example, but reflection on the general setup of the measure suggests that this failure of the measure to track improvements in Dutch book loss or accuracy is not an anomaly. This suggests that our desiderata for homing in on suitable measures of approximations to coherence are steering us in the right direction. Of course, there might be other ways to devise a qualitative measure of closeness to coherence that looks different from Zynda’s. However, the

examples that we used to show that his measure is problematic provide compelling evidence that it is desirable to have a measure that is sensitive to numerical differences between credence assignments. I can’t think of a good strategy for coming up with a qualitative measure that exhibits this kind of sensitivity. It seems more sensible to move right on to quantitative measures.
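Before doing so, the two claims just made about c₁ and c₂—that c₂ is more accurate than c₁ whichever world is actual, and that a fixed-stakes bet on the tautology extracts a larger guaranteed loss from c₁—can be checked concretely. The following sketch is my own illustration, not the book's; it uses the Brier score as one familiar continuous strictly proper inaccuracy measure, and a single $1 bet on the tautology as the fixed-stakes bet.

# A minimal check (illustrative numbers and setup only): Brier inaccuracy and
# fixed-stakes betting loss for the credence assignments c1 and c2 above.
c1 = {"A": 0.5, "not A": 0.5, "contradiction": 0.0, "tautology": 0.5}
c2 = {"A": 0.5, "not A": 0.5, "contradiction": 0.0, "tautology": 0.99}
worlds = {
    "A true":  {"A": 1, "not A": 0, "contradiction": 0, "tautology": 1},
    "A false": {"A": 0, "not A": 1, "contradiction": 0, "tautology": 1},
}
def brier(credence, world):
    # Sum of squared distances from the truth values; lower means more accurate.
    return sum((credence[s] - world[s]) ** 2 for s in credence)
for name, world in worlds.items():
    print(name, brier(c1, world), brier(c2, world))   # 0.75 vs 0.5001 in both worlds
# Selling one $1 bet on the tautology at the price sanctioned by one's credence:
# the seller collects the price but is certain to have to pay out $1.
for c in (c1, c2):
    print(round(1.0 - c["tautology"], 2))              # loss of 0.5 for c1, 0.01 for c2

In both possible worlds c₂ is less inaccurate than c₁, and its guaranteed betting loss is smaller—exactly the differences that Zynda's ranking fails to register.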

3. Quantitative Measures 3.1 Distances and Divergences We are trying to capture the idea that incoherent credence assignments can approximate the ideal of probabilistic coherence to varying degrees. This idea of approximation does not need to be seen as a metaphor—we can make literal sense of the notion of being close to or far away from coherence. There is a rich literature on how to measure the distance between two entities, and many of those measures are suitable for measuring the distance between two credence assignments to the same set of claims. If we want to capture how closely someone’s credences approximate the ideal of probabilistic coherence, a natural implementation works as follows: measure the distance between the thinker’s actual credences and the closest coherent credence assignment according to some suitable distance measure, and interpret the result as the degree to which the credence assignment approximates the ideal of probabilistic coherence. I will now explain a bit more formally how to implement this idea, and introduce some sample distance measures to show the types of verdicts regarding approximations to coherence they generate. We can express a credence assignment as a vector X = (x₁, . . . , xn), where each component of the vector represents one of the thinker’s credences in a claim. The distance from coherence of this credence assignment is then determined by finding a vector Y = (y₁, . . . , yn), which represents a credence assignment that is coherent, and minimizes the distance between X and Y. In measuring approximations to perfect coherence, we’re both interested in which credence assignments approximate coherence more than others, and in how closely a given credence assignment approximates coherence. A quantitative measure can give us both of these kinds of information. Our everyday concept of distance suggests that there aren’t that many different ways to measure distances; perhaps one might even think there is only one way. But from a mathematical point of view, this is not true. There are a great variety of different distance measures with very different properties that are all

candidates for being used to measure approximations to coherence.⁴ A distance in the technical sense of the term is generally defined as follows: Let S be a set. A function d: S × S → ℝ is called a distance on S, if for all x, y in S, the following conditions are met:

Non-negativity: d(x, y) ≥ 0
Symmetry: d(x, y) = d(y, x)
Reflexivity: d(x, x) = 0

The non-negativity condition requires distances to be either zero or positive. The symmetry condition ensures that the distance between two elements of S is the same no matter from which element we start. The reflexivity condition ensures that no element of S is at a positive distance from itself. If a distance additionally fulfills the following conditions, it is called a metric:

Identity of Indiscernibles: d(x, y) = 0 if and only if x = y
Triangle Inequality: d(x, y) + d(x, z) ≥ d(z, y)

Metrics capture the intuitive features we associate with distances, but there are still many different measures that fall into this category. Familiar distances include the absolute distance, which is also called Manhattan or taxicab distance, and Euclidean distance.

Absolute distance: ‖X − Y‖₁ = Σⁿᵢ₌₁ |xᵢ − yᵢ|

Euclidean distance: ‖X − Y‖₂ = (Σⁿᵢ₌₁ (xᵢ − yᵢ)²)^(1/2)

Both the absolute and the Euclidean distance are metrics that belong to a larger family, called the p-norm distance measures. We can generate the absolute distance by replacing p with 1 in the formula for generating p-norm distance measures below. We get the Euclidean distance if we replace p with 2.

⁴ The subsequent definitions follow Deza & Deza (2009).

‖X − Y‖ₚ = (Σⁿᵢ₌₁ |xᵢ − yᵢ|ᵖ)^(1/p)
In fact, replacing p with any real number greater than or equal to 1 generates a metric, and if we set p = ∞, we get the Chebyshev distance, which measures the distance between two vectors as the maximum difference between two elements xᵢ and yᵢ of X and Y. Interestingly, not every way of measuring distances (understood in the pretheoretical sense) that has been found to have useful characteristics and applications even satisfies the technical definition of a distance given above. There are measures called divergences, which are often used to determine the "distance" between probability distributions, and which need not be symmetric or satisfy the triangle inequality. An example of such a measure is the Kullback–Leibler divergence, which is defined as follows:

D_KL(P‖Q) = Σⁿᵢ₌₁ P(i) ln(P(i)/Q(i))

For the Kullback–Leibler divergence to be defined, P must be a probability function, since the measure has the format of an expectation. Moreover, it must be the case that P(i) = 0 whenever Q(i) = 0. The KL-divergence is popular in statistics because it can be used to measure the relative entropy, or difference in information, between two probability functions.⁵ These are just a small sample of the distance measures we can use to compute the difference between two credence functions (from now on, unless indicated otherwise, I use “distance” in the inclusive sense that doesn’t rule out divergences). I have picked those measures out as examples because, as we will see in Chapters 4 and 5, they each have different properties that help us answer the question of why it is good to approximate coherence. Readers who want to get a more complete sense of what distance measures are out there are advised to consult Deza and Deza’s Encyclopedia of Distances (2009). It wouldn’t matter which of the many possible distance measures we use if they all agreed on how to order incoherent credence assignments, and perhaps also on when there are big and small differences in incoherence. But unfortunately, the measures just introduced don’t agree on these points. One way in which we can illustrate these differences is by looking at some very simple

⁵ For a good discussion of how the KL-divergence can be used to measure incoherence, see De Bona (2016, section 3.3). Also helpful are Cha (2007) and Deza & Deza (2009).

[Fig. 3.1 Absolute distance]

graphs. In each of the following diagrams (Figures 3.1, 3.2, 3.3, 3.4), the diagonal line indicates all the possible coherent credence assignments to a pair of claims A and ~A. The point on the line represents a credence of 0.5 in both A and ~A. The shape around this point indicates, for some distance measure, all the points that are as far away from it as (0.4, 0.4). This shows that each of these distance measures has a different take on which points are the same distance away from (0.5, 0.5) as (0.4, 0.4). The measure in Figure 3.1 is absolute distance, in Figure 3.2 Euclidean distance, in Figure 3.3 Chebyshev distance, and in Figure 3.4 KL-divergence. For another simple illustration of how the measures’ judgments come apart, we will consider a credence assignment that consists of credences in three distinct tautological claims. The credence assignment of a coherent thinker can be represented as the vector V=(1,1,1). This is of course a rather artificial example of a credence assignment, but it is well suited to demonstrate the differences between the measures, because the closest coherent credence assignment in this case is the same regardless of which measure we use. This

[Fig. 3.2 Euclidean distance]
[Fig. 3.3 Chebyshev distance]
[Fig. 3.4 Kullback–Leibler divergence]

is not usually the case: different measures can disagree on what the closest coherent credence function to some incoherent credence function is. But it's especially enlightening to compare the different measures' verdicts when we hold the closest coherent credence function fixed, which this example allows us to do. Consider now the following vectors, which represent incoherent assignments of credences to the three tautologies:

V₁ = (0.9, 0.9, 0.9)
V₂ = (0.7, 1, 1)
V₃ = (0.71, 0.99, 1)
V₄ = (0.8, 0.8, 0.8)
V₅ = (0.9, 0.9, 0.89)

The following table shows the distance of V₁ – V₅ to the vector representing the closest coherent credence assignment (1,1,1) according to some of the measures just mentioned:


Closeness of V₁–V₅ to V = (1,1,1)

                    V₁         V₂         V₃         V₄         V₅         Ordering (best to worst)
absolute            0.3        0.3        0.3        0.6        0.31       V₁ = V₂ = V₃ < V₅ < V₄
Euclidean           0.173205   0.3        0.290172   0.34641    0.179165   V₁ < V₅ < V₃ < V₂ < V₄
Chebyshev           0.1        0.3        0.29       0.2        0.11       V₁ < V₅ < V₄ < V₃ < V₂
Kullback–Leibler    0.316082   0.356675   0.352541   0.669431   0.327255   V₁ < V₅ < V₃ < V₂ < V₄

We can see significant differences here both in how the measures order the credence assignments from least to most incoherent, and also in terms of how they judge the distance from coherence. For example, only the absolute distance measure judges the credence assignments represented by V₁, V₂, and V₃ to be equally far from coherent, whereas none of the others do. All the other measures agree that V₁ represents the least incoherent credences in the mix, but they also disagree about how much less incoherent they are than the next closest one. There is also striking disagreement about V₄. All measures except Chebyshev agree that it represents the most far off credence assignment, but Chebyshev ranks it second. The differences between the first three measures can be explained by pointing to the effect of increasing the value of p from 1 to 2 to infinity in the schema for p-norm distances above. The absolute distance measure is completely insensitive to how incoherence is distributed in a credence assignment. It rates credences where the incoherence is spread in small portions throughout the credence assignment, like in V₁, to be just as far from coherent as credence assignments where incoherence is more dramatic, but concentrated in a small portion of the credence assignment, like in V₂ or V₃. Yet, as we increase the value of p, the measures become more sensitive to how the incoherence is distributed, with evenly spread small portions of incoherence being rated as much less significant than more dramatic but localized incoherence. The most extreme is the Chebyshev distance in this respect, because it ignores everything except the largest distance between two elements in the two vectors. This is why we get the result that the Chebyshev distance ranks V₄ differently than all of the other measures. The largest difference between two credences in V₄ is 0.2, whereas it is at least 0.29 in V₂ and V₃. Since the Chebyshev distance is insensitive to other characteristics of the credence assignment, it does not capture that the remaining credences in V₂ and V₃ are perfectly or almost perfectly coherent, whereas this is not the case in V₄. Being a divergence rather than a metric, the Kullback–Leibler divergence is not an instance of the same p-norm schema as the other

distances, but we can see that the way it evaluates our credence assignments for distance from coherence is similar to the absolute and the Euclidean distance measure. It orders the credence assignments in the same way as the Euclidean distance measure, but the way it judges the differences between the incoherence of the credence assignments is close to the results we get from the absolute distance measure.
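The figures in the table can be recomputed directly from the definitions given earlier. The following sketch is my own illustration of that computation (it assumes the standard numpy library); it simply measures each vector's distance from (1,1,1) with the four measures under discussion.

import numpy as np
target = np.array([1.0, 1.0, 1.0])   # the closest coherent assignment to each of V1-V5
vectors = {
    "V1": np.array([0.9, 0.9, 0.9]),
    "V2": np.array([0.7, 1.0, 1.0]),
    "V3": np.array([0.71, 0.99, 1.0]),
    "V4": np.array([0.8, 0.8, 0.8]),
    "V5": np.array([0.9, 0.9, 0.89]),
}
def kl_divergence(p, q):
    # Kullback-Leibler divergence, following the formula above, with P the coherent assignment.
    return float(np.sum(p * np.log(p / q)))
for name, v in vectors.items():
    diff = np.abs(target - v)
    print(name,
          round(float(diff.sum()), 6),                  # absolute (p = 1)
          round(float(np.sqrt((diff ** 2).sum())), 6),  # Euclidean (p = 2)
          round(float(diff.max()), 6),                  # Chebyshev (p = infinity)
          round(kl_divergence(target, v), 6))           # Kullback-Leibler
# The printed values reproduce the table, e.g. V1: 0.3, 0.173205, 0.1, 0.316082.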

3.2 Distance Measures and the Four Desiderata Having gained a feel for at least a few of the different distance measures we might use to determine how much a credence assignment approximates the closest coherent credence assignment, we can now take a look at how well this way of measuring incoherence satisfies our four desiderata. Recall that they were (i) judgment preservation, (ii) incompleteness, (iii) comparability, and (iv) no inundation. According to the judgment preservation desideratum, a suitable measure must agree with our clear judgments about differences in closeness to coherence. We only tend to have those clear judgments in very simple cases. Requiring that a measure should capture these judgments ensures that it is recognizable as a measure of how much a credence assignment approximates coherence, rather than a measure of some other property of credence assignments. The four measures I introduced in the previous section all deliver the correct verdicts about the examples in section two involving credence assignments c₁—c₅, which caused trouble for Zynda’s qualitative measure. However, we also saw that they all delivered different verdicts about how to order the credence assignments represented by the vectors V₁—V₅ in section 3.1. I don’t take the verdicts about these cases to indicate violations of the judgment preservation desideratum, because these cases don’t strike me as ones about which we have clear judgments regarding their comparative incoherence. Hence, the judgment preservation desideratum does not rule out any of the distances just considered as suitable bases for measures of closeness to coherence. The distance-based measures we just considered also satisfy the incompleteness desideratum. Since we can transcribe any credence assignment, whether complete or incomplete, into a vector, and calculate its distance to the (or a) closest coherent credence assignment to the same set of sentences, this way of measuring approximations to coherence can easily handle incomplete credence assignments.
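To illustrate how such a calculation might go for a gappy credence assignment, here is a small sketch. The particular claims, the numbers, and the reliance on numpy and scipy are all my own illustrative choices, not the book's: coherent assignments are exactly those generated by some probability distribution over the possible worlds, so the distance from coherence can be found by minimizing over such distributions.

import numpy as np
from scipy.optimize import minimize

# Possible worlds over two atomic claims A and B (a truth value for each atom).
worlds = [(a, b) for a in (0, 1) for b in (0, 1)]

# A gappy, incoherent credence assignment (hypothetical): credences only in A and in A-and-B.
claims = [lambda a, b: a, lambda a, b: a and b]
credences = np.array([0.6, 0.7])   # c(A) = 0.6, c(A and B) = 0.7

# Truth-value matrix: one row per claim, one column per world.
M = np.array([[float(claim(*w)) for w in worlds] for claim in claims])

def distance_from_coherence(c, M):
    # Euclidean distance from c to the nearest coherent assignment, i.e. to the
    # nearest vector M @ p where p is a probability distribution over worlds.
    n = M.shape[1]
    objective = lambda p: np.linalg.norm(c - M @ p)
    constraints = [{"type": "eq", "fun": lambda p: p.sum() - 1.0}]
    bounds = [(0.0, 1.0)] * n
    result = minimize(objective, np.full(n, 1.0 / n), bounds=bounds, constraints=constraints)
    return result.fun, M @ result.x

dist, closest = distance_from_coherence(credences, M)
print(dist, closest)   # distance of roughly 0.07; nearest coherent assignment about (0.65, 0.65)

Nothing in this setup requires the claims to exhaust a Boolean algebra, which is just the point of the incompleteness desideratum.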

Distance-based measures also avoid excessive incomparability. A distancebased measure delivers a non-negative real number as the distance from coherence of any credence assignment. In principle, this means that every credence assignment can be compared with every other credence assignment with respect to their distance from coherence. These comparisons are especially straightforward when we’re considering two different credence assignments to the same set of sentences, and we’re asking how far away each of them is from the (or a) coherent credence assignment that is closest to it. We may also wish to compare the distance from coherence of credence functions that are defined over different sets of sentences, but this can of course lead to problems when one credence assignment contains many more sentences than another one. Simply in virtue of its size, the distance to coherence might be greater for the larger credence function than for the smaller credence function. This problem can be mitigated, however, if we relativize the total distance from coherence to the size of the credence function. By computing the average distance from coherence per sentence, we gain a way of comparing the incoherence of credence assignments that differ in size, if we wish to do so.⁶ Hence, distance-based measures of closeness to coherence meet the comparability desideratum. The fourth desideratum requires that measures of closeness to coherence avoid the inundation problem in order to ensure that the measure is sensitive to incoherence obtaining anywhere in a thinker’s credence assignment. Recall that the inundation problem obtains when the distance from coherence of an entire credence assignment is determined based on an unrepresentative sample of the thinker’s credences. One of the measures considered above, the Chebyshev measure, clearly has this problem, because it determines the distance from coherence of an entire credence assignment purely based on its worst element, i.e. the credence that is farthest away from its counterpart in the closest coherent credence assignment. If two credence assignments are the same in this respect, for example, if there are two credence assignments that mistakenly assign a credence of 0 to a tautology, then the Chebyshev measure assigns them the same distance from coherence regardless of what any of the other credences look like. This is clearly a problematic feature of a measure that is intended to keep track of the overall incoherence of a credence assignment. Hence, we should caution against using the Chebyshev measure ⁶ This problem is familiar from discussions about comparing the welfare of very differently sized populations, and comparing the accuracy of very differently sized credence assignments (Parfit 1984, Carr 2015, Talbot 2017, Pettigrew 2018). The averaging approach is problematic when measuring accuracy, but the same problems don’t transfer to measuring degrees of incoherence.

as an acceptable measure for determining the overall closeness to coherence of a credence assignment. The other measures we considered do not have the same problem, although, as we saw earlier, they differ in how sensitive they are to how incoherence is distributed throughout a credence assignment. The higher the value of p in the p-norm distance measures, the more the measure is sensitive to whether incoherence is spread out or concentrated, with concentrated incoherence being rated as worse by the measure. Being sensitive to the distribution of incoherence throughout a credence assignment is not the same thing as having the inundation problem. The measures under consideration, apart from the Chebyshev measure, all pick up on incoherence that appears anywhere in the credence assignment. They differ in how they weigh the badness of the incoherence based on how it is distributed relative to the closest coherent credence assignment.
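To make the contrast concrete, here is a small illustration with numbers of my own choosing: two assignments to three tautologies that share the same single worst credence, so that the Chebyshev measure cannot tell them apart, while the absolute and Euclidean measures can.

import numpy as np
coherent = np.array([1.0, 1.0, 1.0])
x = np.array([0.0, 1.0, 1.0])   # one badly incoherent credence, the rest perfect
y = np.array([0.0, 0.0, 0.0])   # every credence badly incoherent
for v in (x, y):
    diff = np.abs(coherent - v)
    print(float(diff.max()), float(diff.sum()), float(np.sqrt((diff ** 2).sum())))
# Chebyshev: 1.0 for both; absolute: 1.0 vs 3.0; Euclidean: 1.0 vs about 1.73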

Conclusion In this chapter, I introduced four desiderata on appropriate measures of how closely a given credence function approximates the ideal of probabilistic coherence. I first considered a qualitative measure of incoherence, and I showed that it fails to meet several of the desiderata, and that it is poorly suited to capture why it might be better to approximate coherence more closely. The problems were mostly stemming from the fact that the measure is insensitive to numerical differences between credence assignments, which led me to move straight to investigating quantitative measures instead of trying to find an alternative qualitative measure. My examination of quantitative measures of closeness to coherence was more fruitful. There is a wide range of distances and divergences that meet the desiderata, any of which can be used as a basis for incoherence measures that determine the distance of an incoherent credence function to the (or a) closest coherent credence function. I specifically examined a few candidate measures that will prove useful in what follows, in order to illustrate subtle differences in the rankings they produce. The resulting wealth of options for measuring approximations to coherence is both a blessing and a curse. It is a blessing in the sense that we have succeeded in identifying precise ways in which we can spell out the idea that a thinker’s credences can be more or less close to coherent. But it is also a curse, in the sense that we now have a wealth of options, which produce different judgments about how to rank credence functions regarding how close to coherent they are. Most of the remaining measures are able to capture

our judgments about clear, simple cases of comparative incoherence, so we can’t use those judgments to narrow down the pool of measures any further. Fortunately, we will be able to make progress on singling out appropriate measures once we return to the question about the value of coherence: In what sense (if any) is it better to approximate coherence more closely? In the following two chapters, I will focus on two senses in which being coherent is valuable. Coherent credences are not accuracy dominated, and they avoid being vulnerable to Dutch books. This gives us two ways in which approximating coherence more closely can be beneficial, namely in terms of decreased Dutch book vulnerability, and in terms of improved accuracy. We will narrow down which quantitative incoherence measures we should use by identifying which measures track these senses in which being less incoherent is better. In a later chapter, I will also generalize the use of distance measures to show how we can measure the extent to which a thinker’s credences approximate not only coherence, but ideal rationality more generally. Since many Bayesians think that there are additional requirements besides coherence, we need to measure how closely a thinker approximates the credences that are rationally required by the combination of all applicable requirements of rationality.

4 Why Approximate Coherence? The Dutch Book Justification

Introduction In this chapter and the next, my focus is on the question of how Bayesians can justify the claim that approximating probabilistic coherence is beneficial for non-ideal thinkers. Ideal coherence norms are standardly argued for by showing that having coherent credences is beneficial in ways that having incoherent credences is not. Thinkers who have probabilistic credences are not accuracydominated, and they are not vulnerable to Dutch books. In order to justify the idea that it is better to be less rather than more incoherent, a promising strategy is to argue that the more closely a credence assignment approximates coherence, the greater the portion of the benefits of perfect coherence it gets. We want to find natural ways of characterizing increasing shares of the benefits that coherence confers, which will then, if all goes well, align with measures of approximation to coherence that are recommended by the desiderata in Chapter 3. In this chapter, my focus is on Dutch book arguments. I will argue that we can justify that it is better to be closer to coherent by showing that decreased incoherence is associated with decreased losses from Dutch books. While incoherent thinkers can never be immune from Dutch book losses, the amount they stand to lose, given that we standardize bet sizes, is greater the more incoherent their credences are. This is only true, however, if we select appropriate distance measures to determine degrees of approximation to coherence. In section 1, I will give a brief explanation of how Dutch book arguments work. In section 2, I will introduce different notions of normalized Dutch book losses, which can help us capture what it means to be vulnerable to smaller or larger Dutch book losses. I then explain how these notions of normalized Dutch book losses link up with distance measures of incoherence, so that we can explain in what sense approximating coherence delivers increasing benefits regarding the thinker’s Dutch book vulnerability.¹ ¹ Some of the results in this chapter are based on Staffel (2015).

1. Dutch Book Arguments Dutch book arguments start from the idea that one central function of our credences is guiding our actions. Using bets as stand-ins for actions more generally, they are supposed to demonstrate that probabilistic credences are suitable for guiding action, whereas incoherent credences are not. Dutch book arguments rest on the basic assumption that there is a connection between a thinker’s degrees of belief in a statement and the cost the thinker is willing to incur for a bet on that statement. As Christensen (2004) argues, this connection is normative, in the sense that one’s credence in a claim justifies, or sanctions as fair, paying a specific cost for a bet on that claim. If one’s degree of belief in some statement A is x, then one should consider it fair to pay a cost whose utility is xY in order to get a reward whose utility is Y if A is true, and nothing if A is false. In this scenario, we will say that a thinker who takes part in this sort of transaction is buying a bet on A. Likewise, one should consider it fair to be on the other end of a gamble of this kind, so that one receives a payment whose utility is xY, and one must pay out a reward whose utility is Y just in case A is true. In this case, we will say that a thinker who takes part in this sort of transaction is selling a bet on A. Hence, the thinker’s credence marks a point that determines a fair price for both buying and selling a bet. Of course, a thinker should also consider it fair to buy the same bet at a lower price, or sell it for a higher price, but the indifference point marked by the thinker’s credence is special, because it is the highest buying price, and the lowest selling price justified by their credence.² It is common practice to represent these gambles as actual monetary gambles, even though it is obviously an idealizing assumption that utility can be represented linearly in terms of dollar amounts. Dutch book arguments show that a thinker whose credences violate the probability axioms is vulnerable to a guaranteed betting loss from a set of bets that are individually sanctioned as fair by their credences. By contrast, a coherent thinker faces no such guaranteed loss. Having degrees of belief that sanction as fair each bet in a set of bets that, by the laws of logic, jointly guarantee a monetary loss is rationally defective. Therefore, since only probabilistic credences avoid sanctioning as fair individual bets that lead to a sure loss when combined, only probabilistic credences avoid being rationally ² As Kenny Easwaran has pointed out to me, in order to make a Dutch book against someone, it is not strictly speaking necessary that a thinker is indifferent between selling and buying a bet at the price determined by their credence. As long as they are willing to sell at any higher price and buy at any lower price, a Dutch book can be made if the thinker is at all incoherent.

58   ? defective. As Christensen (2004) has emphasized, the reason why Dutch books indicate rational defectiveness is not that the thinker is actually in danger of being impoverished by a clever bookie. Being cheated out of one’s money is a practical problem, not an epistemic one. Vulnerability to Dutch book loss indicates an epistemic defect, because the evaluation of the bets in an unfair betting situation as fair derives its justification directly from the thinker’s credences. Each of the bets in the Dutch book is fair in light of the thinker’s credences, yet the logically guaranteed outcome of the combination of these bets ensures an unfair advantage for the bookie. Yet, the credences of a rational thinker should not justify regarding as fair each bet in a set of bets that logically guarantees an unfair advantage for the bookie. Hence, since having incoherent credences puts the thinker in such a situation, incoherent credences are rationally defective.³ Since Dutch books establish a direct connection between credences and betting prices, it is a natural idea that we can relate them to incoherence measures. Dutch book arguments show us that coherent thinkers avoid guaranteed betting losses, hence, given that we understand bets as stand-ins for actions more generally, coherent credences are suitable for guiding our actions. If it is in fact desirable for thinkers to approximate the ideal of having coherent credences as closely as possible, then one reason for this could be that the less incoherent a thinker’s credences are, the better suited they are for guiding action. In other words, if we can show that increased incoherence leads to heightened Dutch book vulnerability, and heightened Dutch book vulnerability indicates an inferior ability to guide action, we have an argument for why probabilistic coherence is an epistemic ideal that is worth approximating.
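As a concrete illustration of the argument just rehearsed (the numbers are my own, not drawn from the text): suppose a thinker's credences in A and ~A sum to more than 1, and a bookie offers them one $1 bet on each claim at the prices those credences sanction as fair.

# Hypothetical incoherent credences: c(A) = 0.6 and c(~A) = 0.6.
credences = {"A": 0.6, "not A": 0.6}
payout = 1.0   # each bet pays $1 if its statement is true, nothing otherwise
prices = {s: c * payout for s, c in credences.items()}   # fair buying prices: $0.60 each
# Exactly one of A and ~A is true in any world, so exactly one bet pays out.
for world in ("A is true", "A is false"):
    net = payout - sum(prices.values())
    print(world, round(net, 2))   # -0.2 in both worlds: a guaranteed loss of $0.20

A coherent thinker, whose credences in A and ~A sum to exactly 1, would pay exactly $1 for the pair of bets and break even come what may.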

2. Degrees of Incoherence and Dutch Books 2.1 Dutch Books and the Normalization Problem Unfortunately, the approach of explaining the value of approximating coherence by appealing to decreased Dutch book losses immediately runs into a problem. The standard way in which Dutch book arguments are formulated makes no prescriptions about the sizes of the bets involved in a Dutch book. We are told which buying or selling price would be justified for a given bet by ³ For a good overview of the origins of and ongoing debate about Dutch book arguments, see Hájek, 2008b.

the thinker’s credence, but nothing constrains the amount of the payout. For example, if Sally and Polly both have a credence of 0.6 in some tautology T, I could make Sally sell a $1 bet on T for $0.60, thereby making her lose $0.40, and I could make Polly sell a $10 bet on T for $6, thereby making her lose $4. In this scenario, Polly would lose ten times as much as Sally. But of course, this difference in monetary loss does not reflect any difference in the thinkers’ credences or in their distance from coherence. The only difference between Sally and Polly is that I chose to make them sell bets of different sizes. Similarly, if the number of times thinkers can be made to bet on the same statement isn’t restricted, we can achieve differences in guaranteed loss without a difference in the thinkers’ credences, and thus without a difference in how incoherent the thinkers are. Hence, if we want to find a way of comparing the Dutch book losses to which different credence assignments give rise, we need to standardize or normalize the ways in which Dutch books are set up. Without any way of normalizing Dutch book losses, there won’t be any meaningful sense in which greater or lesser distance from coherence could be correlated with increased or decreased vulnerability to Dutch books. In a series of papers, Schervish, Seidenfeld and Kadane (henceforth SSK, 2000, 2002, 2003) have offered a variety of different ways of solving this problem, which they label the normalization problem. They propose various ways of standardizing Dutch books, so as to make the losses different credence assignments give rise to comparable. I will survey the normalizations they propose and explain which of them have particularly desirable properties for our purposes. In particular, we are interested in finding a way of measuring normalized Dutch book loss that correlates with a suitable way of measuring approximations to coherence from Chapter 3. Moreover, an acceptable way of normalizing Dutch book losses should not be ad hoc, i.e. the manner in which it identifies greater and lesser Dutch book vulnerability should be independently plausible, and not merely tailored to fit our favored approximation measures.
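The effect a normalization is meant to achieve can already be seen in the Sally and Polly example: once each guaranteed loss is divided by the total stakes of the bet (a quantity that section 2.2 below calls the neutral normalization), the artificial difference created by my choice of bet sizes disappears. A minimal sketch, using only the numbers from the example:

# Sally sells a $1 bet on the tautology T for $0.60; Polly sells a $10 bet for $6.
bets = {"Sally": {"stakes": 1.0, "price": 0.60}, "Polly": {"stakes": 10.0, "price": 6.0}}
for name, bet in bets.items():
    loss = bet["stakes"] - bet["price"]        # T is true, so the seller must pay out the stakes
    print(name, loss, loss / bet["stakes"])    # raw losses 0.4 vs 4.0; normalized loss 0.4 for both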

2.2 SSK’s Normalizations I will now introduce SSK’s proposals for how to normalize the Dutch book loss from a collection of bets based on a thinker’s credences. My goal here is slightly different from theirs. SSK’s idea is that we can measure degrees of incoherence directly in terms of the normalized Dutch book loss that a credence function gives rise to. Hence, different normalizations of Dutch book losses give rise to different ways of measuring incoherence. By contrast,

60   ? I argued in the previous chapter that there are many different distance measures that we can use in principle to measure how closely a thinker’s credences approximate coherence (and the ideally rational credences more generally). I am interested in finding out which distance measures are such that approximating coherence according to them delivers an increasing portion of the benefits associated with being ideally coherent. One of these benefits is immunity from Dutch books, but it is not the only benefit. While one distance measure might track changes in Dutch book vulnerability, another distance measure might track increases and decreases in another benefit associated with coherence, such as accuracy. Hence, unlike SSK, I don’t seek to measure degrees of incoherence directly in terms of standardized Dutch book vulnerability. Instead, I propose that we measure incoherence as distance from coherence according to some distance measure, and that we identify appropriate distance measures as those that meet the desiderata laid out in the previous chapter, and that correlate with increases and decreases in some relevant value that is associated with coherence. This approach leaves room for the possibility that there are measures of degrees of incoherence that are appropriate by our desiderata and track a relevant benefit of coherence, yet don’t track the benefit of decreased Dutch book vulnerability. The different Dutch book normalizations that SSK propose all have the desirable feature of being well-motivated in terms of features of betting scenarios, but still, some of them will turn out to be more useful for our purposes than others. I will now discuss their proposals in more detail. SSK distinguish two different aspects of solving the normalization problem: measuring the size of a single bet, and measuring the size of a collection of bets. I’ll begin by discussing their proposals for measuring the size of a single bet, and then I’ll explain how these options can be combined with ways of measuring the size of a collection of bets. The thinker is the person who has incoherent credences, and the bookie is the person who is setting up the Dutch book.⁴ SSK make the following three proposals for measuring the size of a single bet. (i) The Thinker’s Escrow The size of the bet is measured in terms of the amount of money that the thinker needs in order to cover their part of the bet. For example, if the thinker ⁴ SSK use different terminology, i.e. the bookie is the incoherent person for them. In my terminology, I call the possibly incoherent person the thinker, and the person who devises the betting scenario the bookie. My use of the term “bookie” appears to be consistent with common usage in the philosophy literature, but SSK’s use seems to be normal outside of philosophy.

is willing to pay $0.40 for a bet that pays out $1, then the thinker’s escrow is $0.40, since this is the most the thinker can lose from this bet. In other words, the thinker’s escrow is the thinker’s highest potential net loss from the bet. (ii) The Bookie’s Escrow The size of the bet is measured in terms of the amount of money that the bookie needs in order to cover their part of the bet. For example, if the thinker is willing to pay $0.40 to the bookie for a bet that pays out $1, then the bookie’s escrow is $0.60, since this is the most the bookie can lose from this bet. In other words, the bookie’s escrow is the bookie’s highest potential net loss from the bet. (iii) The Neutral Normalization The neutral normalization is the sum of the bookie’s and the thinker’s escrow, which is also sometimes called the stakes of the bet. In the betting scenario mentioned in i) and ii), the neutral normalization would be $1. Each of these single-bet-size measures can then be used to normalize a collection of multiple bets. SSK focus specifically on two ways of normalizing a collection of bets, which they call the “sum” and the “max” normalization. To employ the sum-normalization for a collection of bets, the guaranteed loss from this collection of bets is divided by the sum of the single-betnormalizations. For example, if we use the bookie’s escrow as our singlebet normalization, we have to divide the guaranteed loss from a collection of bets by the sum of the bookie’s escrows for each of the bets in the collection. If instead the “max” normalization is used in combination with the bookie’s escrow for a collection of bets, the guaranteed loss from the collection is divided by the largest bookie’s escrow of any of the bets involved in the collection.⁵ Hence, we end up with six ways of measuring Dutch book loss, by combining each single-bet-normalization with each of the two normalizations for collections of bets. The idea behind all of the proposals is similar: while we don’t initially prescribe how large the bets in the Dutch book are supposed to be, we normalize the resulting guaranteed loss by measuring the sizes of the individual bets involved, and dividing the overall guaranteed loss from a set of

⁵ As SSK point out, there can also be intermediate normalizations that lie in between the “sum” and the “max” proposals. I won’t discuss them separately here, since the two extreme proposals are most interesting for our purposes.


62   ? bets by a quantity that is determined by the sizes of the individual bets involved in the gamble. In order to measure the standardized DB loss resulting from a thinker’s (finite) credence function, we choose one of the six combinations of normalizations, and, using each of the thinker’s credences for at most one bet, we choose a combination of bets based on their credences that maximizes their standardized loss according to our chosen normalization.⁶ I will first explain why the neutral normalization is better than the other two single-bet normalizations, and then go on to discuss the relative merits of the neutral/sum and neutral/max normalizations. The problem with both the thinker’s escrow and the bookie’s escrow is that they don’t work well for normalizing betting losses from bets on tautologies and contradictions. Suppose Sally has a single incoherent credence in a tautology, c(T) = 0.5, and suppose we choose either the bookie’s escrow/max, or the bookie’s escrow/sum normalization (if there’s only one bet, they are the same). To determine Sally’s standardized DB loss, we’d have to maximize the following quantity: Sally’s guaranteed loss from selling a bet on T, divided by the bookie’s escrow for the bet on T. But the problem is that the bookie can’t lose this bet, hence her escrow is 0, and we can’t get a well-defined value for Sally’s standardized DB loss. This makes the bookie’s escrow an unattractive normalization of Dutch book losses for our purposes, since there are some Dutch book losses that it cannot be applied to. The thinker’s escrow/max, or the thinker’s escrow/sum normalization also has an unattractive feature. Suppose again we’re considering Sally, whose only credence is c(T) = 0.5. To compute her normalized Dutch book loss, we have to maximize the following quantity: Sally’s guaranteed loss from selling a bet on T, divided by Sally’s escrow for the bet on T. But this quantity is obviously going to be 1, no matter what incoherent credence Sally has in T. This is an odd result. Our motivation for standardizing Dutch book losses was to ensure that differences in losses are erased that stem from differences in how many times people bet, or from differences in how large the bets are that people are offered. If we offer Sally the opportunity to sell one $1 bet on T at the lowest selling price justified by her credence, the price, and hence her guaranteed loss, will vary along with her credence. We’re holding both the size and number of bets fixed. An appropriate normalization for Sally’s Dutch book loss should capture this variation in loss that’s induced by variations in her credence, but

⁶ I choose an informal presentation of SSK’s measures here to make my discussion more readerfriendly. Readers who are interested in the formal details of SSK’s proposals are encouraged to consult their presentations of the material. SSK’s proposals can be applied to probability intervals as well.


instead, the thinker’s escrow normalization reports Sally’s normalized Dutch book loss as either 1 (if she has any incoherent credence in T) or 0 (if her credence in T is 1). There is no reason why the difference in losses should be erased; by contrast, it should be preserved, as it stems only from variations in her credence. Hence, the thinker’s escrow, at least in some cases, doesn’t appropriately perform its intended function as a normalization. Again, it would be better to have a way of normalizing Dutch book losses that avoids this kind of odd result.⁷

Of course, the problems I have identified with the thinker’s escrow and the bookie’s escrow specifically concern bets on non-contingent statements. Perhaps in some circumstances in which those types of statements are not being considered, the thinker’s escrow and the bookie’s escrow could still be useful. I will leave an exploration of this option for another time, and turn instead to the neutral normalization, which fortunately avoids the problems just discussed. It can straightforwardly be applied to normalizing Dutch book losses from bets on both contingent and non-contingent statements. SSK recognize this, and as a result, they express some preference for measures based on the neutral normalization, even though they keep working with the other measures as well (SSK 2003).

The Neutral/Sum Normalization
I will thus focus on the remaining two normalizations, neutral/sum and neutral/max, and investigate whether either of them is suitable for the purpose of explaining why it is beneficial to approximate coherence. The neutral/sum normalization can be described informally in the following way: the standardized DB loss for a given credence assignment is determined by looking at the worst Dutch book that can be made against someone with that credence function (for a more formal version of the arguments in this section, please see the Appendix). The worst Dutch book can be found by determining which set of bets makes the thinker lose the most money relative to the sum of the stakes

⁷ I have ruled out the bookie’s and thinker’s escrow normalizations on the grounds that they don’t perform the intended job of a normalization particularly well: the former is undefined for certain types of bets, and the latter erases differences in losses in some types of cases when those differences should be preserved, since they reflect only variations in credence. Hence, we won’t use them here to represent the degree to which a thinker is vulnerable to being Dutch booked. However, we might still wonder whether there could be distance measures that obey our desiderata, such that getting closer to coherence would track decreases in Dutch book loss as measured by either one of these normalizations. The answer is negative when we allow for bets on non-contingent statements. None of the distance measures we were considering gives us either constant or undefined results in the cases I discuss. However, there might be distance measures that align with the thinker’s or bookie’s escrow normalizations if we limit our attention to bets on contingent statements. I won’t investigate this here.


of all the bets involved in this Dutch book. In order to find this set of bets, the bookie may include bets on or against as many statements in the thinker’s credence function as necessary, as long as each statement is used for no more than one bet. In this sort of arrangement, bets of any size are permissible, but since the guaranteed loss from these bets has to be divided by the sum of the total stakes of the bets involved in order to determine the standardized DB loss, the resulting loss is normalized. We can illustrate this with a simple example. Suppose that I have the following credences:

c₁(A) = 0.6
c₁(¬A) = 0.5
c₁(T) = 0.9

We’ll assume for simplicity that all bets we can use for Dutch books have $1 stakes. This is not required by SSK’s measure, but in this case, we can safely make this simplifying assumption, because it does not change the result. Now, if these are my credences, then my standardized loss is determined by the Dutch book that creates the highest guaranteed loss relative to the sum of the stakes of the involved bets. There are three prima facie plausible candidates for being the worst Dutch book:

(I) Buy one $1 bet on each of A and ¬A, costing $0.60 and $0.50, respectively. Result: guaranteed loss of $0.10, hence the loss ratio is 0.1/2 = 1/20
(II) Sell one $1 bet on T for $0.90. Result: guaranteed loss of $0.10, hence the loss ratio is 0.1/1 = 1/10
(III) Make all of the bets in (I) and (II) at the same time. Result: guaranteed loss of $0.20, hence the loss ratio is 0.2/3 = 1/15

As we can easily see, the Dutch book that results in the highest loss ratio is (II), and so, according to the neutral/sum measure, this is the Dutch book that actually determines c₁’s standardized DB loss, which is 1/10. Notice that in order to find the worst Dutch book that can be made against a thinker, it is not necessarily optimal to include bets on or against all of the statements in which the thinker has incoherent credences. The Dutch book that does this, which is (III), in fact leads to a lower loss ratio than Dutch book (II), which includes only one credence. This feature might seem familiar from a distance measure we encountered earlier, the Chebyshev distance. We saw that if we measure the distance to the


closest coherent credence function by using Chebyshev distance, this distance reflects only the difference between the two corresponding elements of the two vectors whose difference is greatest. This similarity between the Chebyshev distance, used as an incoherence measure, and the neutral/sum normalization is no accident. As proven by De Bona & Finger (2015), the neutral/sum normalization and the incoherence measure that relies on the Chebyshev distance to the closest coherent credence function deliver the same results for finite, unconditional credence functions. Hence, we have found a distance measure, Chebyshev distance, we can use for measuring approximations to coherence that tracks standardized Dutch book loss according to the neutral/sum normalization. Unfortunately, we already saw that the Chebyshev measure is not well suited to track the overall incoherence of a credence function, since it suffers from the inundation problem. Hence, it doesn’t meet one of our desiderata from Chapter 3. Fortunately, we can turn to the neutral/max normalization and its associated distance measure to measure incoherence in a way that avoids this problem.

The Neutral/Max Normalization
The incoherence measure based on the Chebyshev distance and its corresponding neutral/sum normalization are bad measures of the total incoherence of a credence function for the same reason that the average income of the poorest 1 percent is a bad measure of the wealth of a country: in both cases, we try to extrapolate from the worst case, but the worst case is not guaranteed to be representative of the overall wealth of a country, or the overall incoherence of a credence function. An incoherence measure that meets the no-inundation desideratum must avoid using a non-representative sample of the thinker’s credences to determine her overall degree of incoherence. One incoherence measure that meets this condition, as well as the other desiderata from Chapter 3, is the measure that is based on absolute (also called Manhattan or Taxicab) distance. This incoherence measure is especially interesting in this context because it tracks standardized DB loss if we use SSK’s remaining normalization, the neutral/max normalization. As De Bona & Finger (2015) have proven, for finite, unconditional credence assignments, measuring incoherence by determining the absolute distance to some closest coherent credence function delivers the same results as the neutral/max normalization. (See the Appendix to this chapter for an explanation of how this can be extended to conditional credences.)
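To make the two equivalences concrete, here is a minimal sketch (in Python, using a simple grid search rather than a proper optimization) that estimates the Chebyshev and the absolute distance from the example credence function c₁ above to the nearest coherent credence function. It relies on the fact that any coherent credence function over {A, ¬A, T} can be written as (q, 1 − q, 1) for some q; the function names are my own, not SSK’s or De Bona & Finger’s.

# Credences in A, not-A, and the tautology T, as in the running example.
c1 = (0.6, 0.5, 0.9)

def coherent(q):
    # Any coherent credence function over {A, not-A, T} has this form.
    return (q, 1.0 - q, 1.0)

def chebyshev(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

def absolute(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

grid = [i / 10000 for i in range(10001)]
print(round(min(chebyshev(c1, coherent(q)) for q in grid), 3))  # 0.1, the neutral/sum loss of 1/10
print(round(min(absolute(c1, coherent(q)) for q in grid), 3))   # 0.2, the neutral/max loss computed below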


Let’s take a closer look at how this normalization works. The neutral/max normalization requires that we measure the standardized DB loss of a thinker’s credence assignment by maximizing the following quantity: the guaranteed loss that can be generated from a collection of bets that involve some or all of the thinker’s credences, divided by the largest stakes of any bet included in the collection. So, for example, if, given that we’re maximizing this quantity, the bet with the largest stakes in the collection is a $2 bet, and all the other bets have stakes that are $2 or smaller, then the denominator of the equation would be set to 2, and the standardized loss of the thinker is the loss generated by the bets included in the Dutch book, divided by 2. Equivalently, we can simply require that the stakes of the bets involved must be in the interval [−1, 1]. This means that we can normalize a credence function’s Dutch book loss by determining the largest guaranteed loss we can achieve by using each of the thinker’s credences for at most one bet, where each bet’s stakes are in the interval [−1, 1]. The neutral/max normalization differs from the neutral/sum normalization in that it doesn’t single out the worst Dutch book that can be made against a thinker in order to determine the standardized loss she is vulnerable to. Rather, it seeks the optimal way of exploiting as many of the thinker’s credences as possible to create Dutch book losses, and sums the losses from all of them to determine the standardized loss to which a credence assignment gives rise. In what follows, I will again simplify the calculations to make the arguments reader-friendly; please consult the Appendix and SSK’s work for the full formalism. Let’s revisit the example from the previous section, to see how the neutral/max normalization determines its standardized Dutch book loss.

c₁(A) = 0.6
c₁(¬A) = 0.5
c₁(T) = 0.9

The neutral/max normalization determines the standardized DB loss in a way that takes advantage of all of the ways in which the thinker is incoherent. Recall that we must normalize the Dutch book loss by the stakes of the largest bet. Suppose that the stakes of the bet on T are $1, so the thinker will sell it for $0.90, hence losing $0.10. We can easily confirm that the thinker’s degree of incoherence is maximized when the stakes of the bets on A and ¬A are also $1. We make the thinker buy the bets on A and ¬A for a total of $1.10, guaranteeing a loss of $0.10. Hence, the thinker’s standardized loss is $0.10+


$0.10, divided by the stakes of the largest bet (=$1), which is 0.2. This is equal to the absolute distance of c₁ from some closest coherent credence function. We have thus arrived at the following desirable result: we have identified an incoherence measure for finite, precise, unconditional credence assignments that measures the degree to which a credence function approximates coherence as the absolute distance to some closest coherent credence function. This incoherence measure has the two features we were looking for: First, it satisfies the desiderata laid out in Chapter 3 (judgment preservation, incompleteness, comparability, no inundation). It thus identifies a way in which a credence function can approximate the ideal of coherence to greater or lesser degrees. Second, the measure tracks standardized DB loss according to the neutral/max normalization. The neutral/max normalization is an independently plausible way of making Dutch book losses comparable. We have thus identified a clear sense in which approximating coherence is good. Recall that I argued earlier that if Bayesians want to claim that coherence is an ideal that imperfect thinkers should approximate, then they must supply reasons for why it is better to be closer to being coherent rather than farther away. We can now give one such reason: Being coherent has the benefit of avoiding vulnerability to Dutch books. Approximating coherence, at least if measured with the absolute distance measure, leads to a systematic decrease in how much a thinker stands to lose from a Dutch book that is normalized according to the neutral/max normalization. This normalization is independently plausible, since it eliminates variations in Dutch book loss due to bet frequency or bet size, but tracks variations in Dutch book losses that stem from variations in the thinker’s credences. Hence, approximating coherence lets thinkers approximate the benefits that come with being fully coherent.
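The same number can be recovered by searching directly over betting arrangements. The sketch below is my own illustration rather than SSK’s formalism: it brute-forces coefficients in [−1, 1] for bets on A, ¬A, and T, computes the thinker’s guaranteed loss across the two relevant worlds, and reports the largest such loss, which is the neutral/max standardized loss.

from itertools import product

# The thinker's credences in A, not-A, and the tautology T.
credences = (0.6, 0.5, 0.9)
# Truth values of (A, not-A, T) in the two possible worlds.
worlds = [(1, 0, 1), (0, 1, 1)]

def guaranteed_loss(alphas):
    # The thinker's payoff from bet i in world w is alpha_i * (truth_i - credence_i);
    # a positive alpha is a bet on the statement, a negative alpha a bet against it.
    payoffs = [sum(a * (t - c) for a, t, c in zip(alphas, w, credences)) for w in worlds]
    best_case = max(payoffs)
    return -best_case if best_case < 0 else 0.0

grid = [i / 20 - 1 for i in range(41)]  # stakes from -1 to 1 in steps of 0.05
print(round(max(guaranteed_loss(a) for a in product(grid, repeat=3)), 3))
# 0.2: buy the $1 bets on A and not-A and sell the $1 bet on T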

Conclusion

In this chapter I began to approach the question of how Bayesians should justify the claim that approximating probabilistic coherence is beneficial for non-ideal thinkers. I focused on the Dutch book argument, which is intended to show that coherent credences are suitable for guiding actions, whereas incoherent credences are not. The Dutch book argument demonstrates this by using bets as stand-ins for actions more generally, and showing that incoherent, but not coherent credences are vulnerable to guaranteed Dutch book losses. A natural idea for explaining why being less incoherent is better than being more incoherent is that decreased incoherence is somehow


68   ? beneficial in terms of the Dutch book losses a thinker is vulnerable to. The problem is that bet sizes in Dutch books can be arbitrarily large or small, hence, even two thinkers with the exact same credences can be made to lose different amounts from Dutch books, provided that they are offered bets of different sizes. Hence, if there is any hope for the idea that degrees of incoherence somehow track the thinker’s vulnerability to Dutch book losses, we need a way of standardizing Dutch books so as to make them comparable. SSK have already offered different approaches to solving this problem by offering a variety of different normalizations for Dutch book losses. We considered six different possible normalizations that SSK propose in their work. We ruled out four of those (bookie’s and thinker’s escrow combined with max and sum) because they were either undefined for certain types of bets, or because they erase differences in losses in some cases when those differences should be preserved, since they reflect only variations in credence, but not in bet size or bet frequency. These four proposals are thus not entirely apt to fulfill their function as normalizations of Dutch book loss. The remaining neutral normalization for individual bets avoids the problems of the bookie’s and thinker’s escrow. When combined with the sum normalization for packages of multiple bets, it is equivalent to the incoherence measure based on the Chebyshev distance. In other words, the closer a thinker’s credences are to the nearest coherent credence function according to Chebyshev distance, the lower her normalized Dutch book loss is according to the neutral/sum normalization. When combined with the max normalization for packages of multiple bets instead, we arrive at a way of measuring standardized Dutch book losses that is tracked by the absolute incoherence measure. We found the Chebyshev measure of incoherence to be problematic in Chapter 3, because it violates the no-inundation desideratum. But fortunately, the absolute distance measure of incoherence satisfies all the desiderata laid out in Chapter 3. We have thus found at least one acceptable measure of closeness to coherence that tracks Dutch book losses which are standardized according to a plausible normalization. That means we have found a way of showing that it is better to be less incoherent (on this measure), because the more closely a credence function approximates coherence, the greater the portion of the benefits of perfect coherence it gets (where this benefit is minimizing Dutch book-vulnerability according to the neutral/max normalization). Yet, not all advocates of the norm of probabilistic coherence endorse Dutch book arguments. There are a number of different ways of arguing for why coherence is beneficial, and each of these ways opens up a different avenue for arguing that approximating coherence is beneficial. In each case,


we want to identify a way of measuring degrees of incoherence, such that approximating coherence on this measure delivers an increasing portion of the benefits associated with perfect coherence. My task in the next chapter is to apply this argument schema to another popular argument for coherence, the accuracy dominance argument.

Appendix

SSK’s Neutral/Sum and Neutral/Max Measures
In this Appendix, I focus on the neutral/sum normalization, to give a more detailed formal exposition of my arguments in the main text. To see how this Dutch book normalization works, suppose there is a thinker who has a credence function c that assigns credences to a set of statements {A₁,. . ., An}.⁸ We can represent a bet on or against one of these statements according to the thinker’s credences in the following way:

Bet: α(Ind(Ai) − c(Ai))

In this case, Ind(Ai) is the indicator function of Ai, which assigns a value of 1 if Ai is true and a value of 0 if Ai is false. c(Ai) is the credence the thinker assigns to the statement Ai. The coefficient α determines both the size of the bet, as well as whether the thinker is betting on or against Ai. If α > 0, then the thinker bets on the truth of Ai, whereas if α < 0, the thinker bets against the truth of Ai. In the following, it will be assumed that a thinker who assigns a precise credence to a statement thereby evaluates as fair the bet on and the bet against that statement at the price that is fixed by her credence. A thinker is incoherent if there is a collection of gambles she evaluates as fair that together guarantee a loss. Formally, we can represent this as follows: Let A₁,. . ., An be the statements that some thinker assigns credences to, let c be the thinker’s credence function, which may or may not be probabilistically coherent, and let S be the set of possible world states. If there is some choice of coefficients α₁,. . ., αn, such that the sum of the payoffs of the bets on or against A₁,. . ., An is negative for every world state s ∈ S, then the thinker is vulnerable to a Dutch book. Thus, there is a Dutch book iff⁹

sup_{s∈S} Σⁿᵢ₌₁ αᵢ(Ind_{Ai}(s) − c(Ai)) < 0

⁸ SSK’s incoherence measure is defined in terms of upper and lower previsions and it uses random variables instead of statements. The version I present here is somewhat simplified, because I use indicator functions instead of random variables, and I take credences to determine both the buying and selling price of a bet. That means that the measure I discuss is strictly speaking a special case of their more general measure. ⁹ The function “sup” picks out the least upper bound of a set. In this context, it selects the highest value from the combined payoffs in all worlds in S. Thus, if the highest possible payoff is still negative, the thinker can be Dutch-booked.


This formula tells us how to determine whether a Dutch book can be made against a thinker who has a given credence function in a given set of statements. We can capture the guaranteed loss a thinker faces from a collection of gambles of the form Yᵢ = αᵢ(Ind(Ai) − c(Ai)) as follows:¹⁰

G(Y) = −min{0, sup_{s∈S} Σⁿᵢ₌₁ αᵢ(Ind_{Ai}(s) − c(Ai))}

In order to normalize the guaranteed loss, we can divide it by the sum of the coefficients of the individual bets. This normalization is called the “neutral/sum” normalization by SSK. We can thus compute the rate of loss H(Y):

H(Y) = G(Y) / Σⁿᵢ₌₁ |αᵢ|
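As a quick check on these two definitions, here is a small sketch that implements G(Y) and H(Y) directly and searches a grid of coefficients for the two-statement assignment c(A) = 0.5, c(¬A) = 0.6 that is worked through just below; the function names and the grid search are my own.

def guaranteed_loss(alphas, credences, worlds):
    # G(Y): the loss the thinker is guaranteed to suffer, or 0 if no loss is guaranteed.
    sup = max(sum(a * (t - c) for a, t, c in zip(alphas, w, credences)) for w in worlds)
    return -min(0.0, sup)

def rate_of_loss(alphas, credences, worlds):
    # H(Y): guaranteed loss divided by the sum of the absolute coefficients (neutral/sum).
    total = sum(abs(a) for a in alphas)
    return guaranteed_loss(alphas, credences, worlds) / total if total else 0.0

credences = (0.5, 0.6)            # c(A), c(not-A)
worlds = [(1, 0), (0, 1)]         # A true, A false
grid = [i / 100 for i in range(-100, 101)]
print(round(max(rate_of_loss((a1, a2), credences, worlds) for a1 in grid for a2 in grid), 3))
# 0.05, reached when the two coefficients are equal and positive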

The normalized Dutch book loss can be determined for a set of statements and a credence function over these statements by choosing the coefficients α₁,. . ., αn in such a way that H(Y) is maximized. To maximize H(Y), it may be necessary not to include certain statements in the Dutch book, which can be achieved by setting the relevant coefficients αi to 0. We can illustrate how the normalization works with an example. Suppose a thinker has credences in two statements, A and ¬A. Her credence assignment c₂ is incoherent, since she assigns c₂(A) = 0.5 and c₂(¬A) = 0.6. In order to measure her rate of incoherence, we will first set up two bets with her, one for each statement, and sum them in order to determine their combined payoff:

Y = α₁(Ind_A(s) − 0.5) + α₂(Ind_¬A(s) − 0.6)

Since we can either be in a world where A is true or in a world where A is false, we can get two values for Y:

If A, then Y = 0.5α₁ − 0.6α₂
If ¬A, then Y = 0.4α₂ − 0.5α₁

Thus, we can calculate G(Y) as follows, where α₁, α₂ > 0:¹¹ If α₂ ≥ 1.25α₁ or α₁ ≥ 1.2α₂, then the second term in the brackets in the G(Y) equation is non-negative, which means that G(Y) = 0; otherwise

G(Y) = −sup_{s∈S} {α₁(Ind_A(s) − 0.5) + α₂(Ind_¬A(s) − 0.6)}

Thus, when a Dutch book can be made (i.e. when G(Y) > 0), we can measure the normalized Dutch book loss by choosing the coefficients in such a way that H(Y) is maximized:

H(Y) = −sup_{s∈S} {α₁(Ind_A(s) − 0.5) + α₂(Ind_¬A(s) − 0.6)} / (|α₁| + |α₂|)

¹⁰ The “min” function is used here to select the smallest number of the numbers in a set. It ensures that if no Dutch book can be made against a thinker, the guaranteed loss she faces is 0. If a Dutch book can be made, the “min” function selects it, and the negative sign in front guarantees that we end up with a positive number that indicates the thinker’s guaranteed loss. ¹¹ If we set α₁, α₂ < 0, the combined payoff would be guaranteed to be positive.


This can be achieved in this example if we choose α₁ = α₂, which results in a neutral/sum normalized loss of 0.05. I will now move on to the example discussed in the main text. When employing the neutral/sum normalization, in order to maximize the rate of loss, it is often necessary to leave some incoherent credences out of the Dutch book. Remember that we are allowed to choose αi = 0 if necessary to maximize the rate of loss. In a case in which there are two Dutch books that can be made against a thinker, but one of them leads to a greater rate of loss on its own, the total rate of loss can be maximized by setting the relevant coefficients to 0. For example, suppose a thinker has the credence function c₁, which is defined as follows: c₁(A) = 0.6, c₁(¬A) = 0.5, c₁(T) = 0.9. A thinker who adopts c₁ can be Dutch booked in two ways: on their incoherent credences in the partition {A, ¬A}, and on their less than full credence in the tautology. In this case, the rate of loss comes down to:

H(Y) = (0.1α₁ + 0.1α₃) / (2α₁ + α₃)

The rate of loss in this case reaches its maximum value of 0.1 if we set α₁ = 0. This amounts to only Dutch booking the thinker on her credence in the tautology, but refraining from Dutch booking her on her incoherent credences in the partition {A, ¬A}. This is the feature of the normalization that makes it line up with the incoherence measure based on the Chebyshev distance. As we saw in Chapter 3, the Chebyshev incoherence measure doesn’t meet the no-inundation desideratum. Correspondingly, since only the worst Dutch book determines the thinker’s normalized Dutch book loss, incoherence in other parts of the credence function gets inundated and is not reflected in the thinker’s loss. In order to determine the degree of incoherence of a credence function according to the neutral/max normalization instead, we must make the following adjustment: instead of normalizing the Dutch book loss by dividing the betting loss by the sum of the stakes of the individual bets, we must normalize it by dividing the Dutch book loss by the stakes of the largest bet included in the gamble. Hence, the rate of loss that must be maximized becomes:

H(Y) = G(Y) / max{|α₁|, . . ., |αn|}

Another way to achieve the same result is to limit the stakes of each bet in the Dutch book to the interval [−1, 1], and find the Dutch book that leads to the greatest guaranteed loss, with each credence being used for at most one bet. As explained in the main text, the neutral/max normalization is equivalent to the absolute distance to some closest coherent credence function. This measure does not exhibit the inundation problem.

Conditional Credences
The neutral/max normalization as I have presented it here and the absolute distance-based incoherence measure are both defined over finite, unconditional credence functions. They can be expanded to cover conditional credence functions as follows (De Bona & Finger 2015, Potyka 2014). Suppose a thinker has coherent conditional and unconditional credences, such that her conditional credences obey the ratio formula:

c(Ai | Aj) = c(Ai & Aj) / c(Aj) = x


This is equivalent to:

c(Ai & Aj) − x·c(Aj) = 0

Her unconditional credences can also be expressed in this way, by assuming that Aj is tautological. Hence, all of her credences can be expressed as conditional credence assignments. If a thinker has credences c′ that disagree with the assignments in c, then there will be values ei such that

c′(Ai & Aj) − x·c′(Aj) = ei

For any two credence functions c′ and c, defined over the same sets of conditional credences, we can thus create a vector that measures the violations ei of c′ compared to c. To measure the degree of incoherence of a credence function c′, we find some coherent credence function c, such that Σⁿᵢ₌₁ |ei| is minimized. This way of measuring the absolute distance to the closest coherent credence function for conditional credences is equivalent to using the neutral/max normalization for Dutch book loss, when it is extended to conditional credences as well.
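A minimal sketch of the violation vector just described, with hypothetical numbers of my own: each conditional constraint c(Ai | Aj) = x is recorded as a triple (x, Ai & Aj, Aj), unconditional credences are treated as conditional on a tautology T, and the violation of a candidate credence function c′ on a constraint is c′(Ai & Aj) − x·c′(Aj). A full incoherence measure would additionally minimize the summed absolute violations over all coherent choices of c; the snippet only evaluates one fixed c.

# Constraints read off a coherent c: c(A|B) = 0.7, c(B|T) = 0.5, c(A&B|T) = 0.35.
constraints = [(0.7, "A&B", "B"), (0.5, "B", "T"), (0.35, "A&B", "T")]

# A hypothetical credence function c' that disagrees with some of those assignments.
c_prime = {"A&B": 0.3, "B": 0.5, "T": 1.0}

def violations(cred, constraints):
    # e_i = c'(Ai & Aj) - x * c'(Aj) for each recorded constraint.
    return [cred[conj] - x * cred[cond] for x, conj, cond in constraints]

e = violations(c_prime, constraints)
print([round(v, 3) for v in e])              # [-0.05, 0.0, -0.05]
print(round(sum(abs(v) for v in e), 3))      # 0.1, the summed absolute violation relative to this c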


5 Why Approximate Coherence? The Accuracy Justification

Introduction

In the previous chapter, I presented one strategy for defending the view that it is beneficial to approximate coherence. If we measure degrees of incoherence as the absolute distance of a credence assignment from some closest coherent credence assignment, then degrees of incoherence track standardized Dutch book loss according to the neutral/max normalization. Hence, approximating coherence, on this view, delivers a portion of the benefit of being fully coherent, namely minimizing (standardized) Dutch book loss. The present chapter will use the same argument schema to demonstrate a way in which approximating coherence is good, but this time, we will focus on a different benefit of having fully coherent credences. The accuracy argument for probabilism demonstrates that, given a suitable measure of the accuracy of a credence assignment, incoherent credences are always accuracy-dominated, i.e. there is an alternative, coherent credence assignment that is more accurate in every possible world. Coherent credence assignments, by contrast, are not accuracy-dominated. To argue that it is better to be less, rather than more incoherent, we need to identify a way of measuring approximations to coherence that somehow tracks improvements in accuracy. I will show how this can be done. In section 1, I will explain how the accuracy argument for probabilism works. In section 2, I will explain how we can measure degrees of incoherence in a way that tracks accuracy. In section 3, I will show that there is a way of measuring approximations to coherence that tracks Dutch book vulnerability and accuracy at the same time. In section 4, I will point out some strategies for choosing between the remaining measures of incoherence that satisfy all of our constraints so far.¹

¹ The results presented in this chapter are based on two papers that were co-authored with Glauber De Bona (De Bona & Staffel 2017, 2018).


1. The Accuracy Dominance Argument for Probabilism

The basic idea behind the accuracy dominance argument for probabilism is that it is fundamentally valuable to have credences that are as accurate, i.e. as close to the truth, as possible, and that having coherent credences promotes accuracy.² To set up the argument, we begin by defining an optimal, or maximally accurate credence assignment at a given world. We claim that the best credence to have in some statement A at world w is 1 if A is true at w, and 0 if A is false at w. Next, we need a measure of inaccuracy: if a thinker’s credences are not maximally accurate, we need to measure their distance from the truth in a given world. For the argument to work, we can choose any additive inaccuracy measure that is based on a continuous, strictly proper scoring rule (more on this below). There are numerous scoring rules that can be used to make the argument work. An example of a proper scoring rule that is popular in the literature is the Brier score. Let c be a credence assignment defined on a language L, and let Indw be the indicator function that returns 1 if some statement A is true at a world w, and 0 if A is false at w. Then the Brier score of c at a world w is defined as follows:

B(c, w) = Σ_{A∈L} (c(A) − Indw(A))²
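For concreteness, here is a small sketch of the Brier score as just defined, applied to a two-statement credence assignment (the rain example used later in this section); the dictionary-based representation is my own choice.

def brier_score(credences, truth_values):
    # Sum of squared distances between each credence and the truth value (1 or 0) at the world.
    return sum((credences[a] - truth_values[a]) ** 2 for a in credences)

c = {"rain": 0.4, "no rain": 0.3}
world_where_it_rains = {"rain": 1, "no rain": 0}
print(round(brier_score(c, world_where_it_rains), 3))  # 0.45 = (0.4 - 1)^2 + (0.3 - 0)^2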

Alternative, well-known continuous, strictly proper scoring rules are for example the logarithmic scoring rule and the spherical scoring rule. For some interesting comparisons between scoring rules, see Bickel (2007). To establish the claim that only probabilistic credences are rational, the argument relies on a principle of decision theory. Suppose that we have a set O of options {o₁,. . ., on}, and a set of worlds W. Let U be a utility function, such that U takes an option oj from O and a world wi from W and returns a real number that gives the utility of oj at wi. An option oj strongly dominates an option ok just in case the utility of oj is greater than the utility of ok in every possible world. An option oj weakly dominates an option ok just in case the utility of oj is at least as great as the utility of ok in every possible world, and strictly greater in at least one world. We can now state a sufficient condition for an option to be irrational for a thinker whose utility function is U.

² Versions of this argument can be found in De Finetti (1974), Joyce (1998, 2009), Predd et al. (2009) and Pettigrew (2016).


Undominated Dominance: If some option oj is strongly dominated by another option ok relative to U, and ok is not even weakly dominated by any other option relative to U, then option oj is irrational for any thinker whose utility function is U.

The assumptions about how to measure accuracy together with Undominated Dominance entail that there is a significant difference between probabilistically coherent and probabilistically incoherent credence assignments. If a credence assignment is incoherent, then there is an alternative coherent credence assignment whose inaccuracy is lower at every possible world, and which is not itself weakly dominated by another credence assignment. By contrast, if a credence assignment is coherent, then there is no alternative credence assignment that even weakly accuracy-dominates it. Hence, the dominance principle rules out incoherent credence assignments as irrational, but it doesn’t rule out coherent credence assignments. We can illustrate how the argument works with a simple example, which we can represent graphically. Suppose a thinker has the following incoherent credences:

c₁(It’s raining) = 0.4, c₁(It’s not raining) = 0.3

Then there is at least one credence function (in fact, more than one in this case) that is more accurate in every possible world, if we measure inaccuracy with a suitable inaccuracy measure, such as the Brier score. For example, the following alternative credence assignment strongly accuracy-dominates the thinker’s existing credences if we measure accuracy with the Brier score:

c₂(It’s raining) = 0.55, c₂(It’s not raining) = 0.45

Figure 5.1 illustrates why this is. The two dots mark the two alternative credence assignments we are considering, c₁ and c₂. The solid line connecting the points (0,1) and (1,0) represents all the possible coherent credence assignments. (Ignore the dashed line for now; its significance will become apparent later.) The two arrows mark the distances from the thinker’s incoherent credences c₁ to the perfectly accurate credences in each of the two possible worlds. If the thinker adopted the dominating, coherent credence function c₂ instead of c₁, which is symbolized by the dot on the solid diagonal line, the distance to the maximally accurate credences would decrease for each of the two possible worlds. (Credence functions near c₂ will also have this property,



Fig. 5.1 Accuracy dominance

not just c₂.) Hence, c₂ is closer to the truth than c₁ regardless of whether it’s raining or not. As just illustrated, the accuracy dominance argument shows that if the function of one’s credences is to accurately represent the world, then they should be coherent, since coherent credence assignments are better at accurately representing the world than incoherent credence assignments. This line of reasoning gives us a potential strategy for justifying why it is better to approximate coherence. If we can show that approximating coherence has the benefit of promoting accuracy in our credences, then we have another argument for why probabilistic coherence is an epistemic ideal that is worth striving towards.
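A quick numerical check of the dominance claim (a self-contained sketch, with tuple-based credence assignments of my own choosing): c₂ has a strictly lower Brier score than c₁ at both possible worlds.

def brier(credences, truth):
    return sum((c - t) ** 2 for c, t in zip(credences, truth))

c1, c2 = (0.4, 0.3), (0.55, 0.45)   # credences in ("It's raining", "It's not raining")
worlds = {"rain": (1, 0), "no rain": (0, 1)}

for name, w in worlds.items():
    print(name, round(brier(c1, w), 3), round(brier(c2, w), 3))
# rain:    c1 scores 0.45, c2 scores 0.405
# no rain: c1 scores 0.65, c2 scores 0.605 -> c2 is closer to the truth either way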

2. Distance Measures and Accuracy

The accuracy dominance argument shows that coherence conveys the benefit of putting thinkers in a position where there is no a priori determinable way


for them to improve the accuracy of their credences, whereas incoherent thinkers could change their credences in an a priori determinable way that would make them more accurate no matter what. In joint research with Glauber De Bona, I investigated the question of whether reducing the degree of incoherence of one’s credences guarantees improvements in accuracy (De Bona & Staffel 2017). I will introduce some definitions that will help precisify the idea that reductions in incoherence might lead to improvements in accuracy, and help state the relevant results. As before, a credence assignment c is said to be probabilistically coherent just in case there is a probability function P that agrees with c on all of its assignments. I already introduced the idea in Chapter 3 that a credence assignment c can be represented as a vector X = (x₁,. . ., xn), where each component of the vector represents one of the thinker’s credences in a claim. We can then use divergences or distance measures to determine how far two vectors are apart from each other, which helps us define measures of incoherence: The degree of incoherence of any credence assignment is determined by finding a vector Y = (y₁,. . ., yn), which represents a credence assignment that is coherent, and minimizes the distance between X and Y. We will use d to stand for a divergence, which is a non-negative function that can be used to measure the divergence between two vectors, where d(X,Y) = 0 if two vectors X and Y are identical. To use a divergence d in an incoherence measure, an incoherence measure Id(c) can be defined as follows:

Id(c) = min{d(c, c′) | c′ is coherent}

The incoherence measure Id is generated by the divergence d, and it determines incoherence by measuring how far away c is from some closest coherent credence assignment according to d. Every such measure is non-negative, and it assigns a degree of incoherence of 0 to every coherent credence assignment. For instance, if d is the absolute distance measure, Id is the incoherence measure that tracks neutral/max normalized DB loss. But here, we need to cast a wider net to identify divergences that can be used in an incoherence measure that tracks accuracy. We could substitute any divergence or metric for d, including those whose properties I explored in Chapter 3. The question of interest here is: Does reducing one’s incoherence have the benefit of improving the accuracy of one’s credences? Given that there are many different measures Id(c), depending on how we fill in the value of d, it is important that the question is formulated in a way that respects different ways of measuring incoherence.
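To illustrate the schema, here is a sketch that approximates Id(c) for the two-statement rain example by a grid search over the coherent assignments (q, 1 − q), with Euclidean distance standing in for d; the result is only approximate because of the grid, and the helper names are my own.

import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def incoherence(credences, d, steps=10000):
    # I_d(c): minimal divergence from c to a coherent assignment over a two-cell partition.
    return min(d(credences, (i / steps, 1 - i / steps)) for i in range(steps + 1))

c1 = (0.4, 0.3)   # credences in "It's raining" and "It's not raining"
print(round(incoherence(c1, euclidean), 3))  # about 0.212; the nearest coherent point is (0.55, 0.45)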


Moreover, as I mentioned earlier, there are also many ways of measuring (in)accuracy. We will focus here on inaccuracy measures that are additive and based on continuous, strictly proper scoring rules, because all inaccuracy measures of this kind validate the accuracy-dominance argument.³ Suppose we have a set of statements {A₁,. . ., An}, and let V ⊂ {0,1}ⁿ be the set of vectors that correspond to the possible worlds in which A₁,. . ., An are true or false. Hence, the vectors in V also represent the most accurate credences at those worlds. An inaccuracy measure ℑ takes a vector X representing a credence assignment and a vector v from the set V representing a possible world and returns a real number that indicates the inaccuracy of this credence assignment at that world. An inaccuracy measure ℑ is said to be additive if there is a scoring rule sr: [0,1] × {0,1} → ℝ for individual credences, such that for all vectors X representing credence assignments and vectors v from the set V representing possible worlds, the following formula represents the measure of the inaccuracy of a credence assignment at a world:

ℑ(X, v) = Σⁿᵢ₌₁ sr(xᵢ, vᵢ)

Thus, what makes the measure additive is that the total inaccuracy score of the credence assignment is the sum of the inaccuracy of the individual credences. It is easy to see that the Brier score, which was introduced as an example of a suitable scoring rule in section one, fits this description. As mentioned earlier, the accuracy-dominance argument goes through if inaccuracy measures are used that are based on continuous, strictly proper scoring rules. A scoring rule is strictly proper if it is such that any individual credence expects itself to get a lower inaccuracy score than any alternative credence. In other words, sr is strictly proper just in case for any credence c(Ai) = x and any y ∈ [0,1], the expected score y·sr(x, 1) + (1 − y)·sr(x, 0) is minimized when y = x. A scoring rule sr is continuous just in case there are no “jumps”, i.e. small changes in credence never give rise to large changes in inaccuracy (see Pettigrew 2016, section 4.2). Besides the Brier score, other

³ A clarification is important here: it is crucial for the success of the accuracy-dominance argument for probabilism that one measure of inaccuracy is chosen as the correct inaccuracy measure. If more than one measure is considered legitimate, it creates a problem for the argument, as was first pointed out by Aaron Bronfman. The problem arises because the coherent credence assignment(s) that accuracy-dominate an incoherent credence assignment according to one inaccuracy measure can be disjoint from the coherent credence assignments that accuracy-dominate that same credence assignment according to a different inaccuracy measure. For discussion, see Pettigrew (2016). Pettigrew addresses this problem by arguing that the Brier score is the only appropriate inaccuracy measure. I will say a bit about it at the end of this chapter.


examples of scoring rules that fulfill these conditions are the logarithmic scoring rule and the spherical scoring rule (see e.g. Joyce 2009 for details). We thus have an array of choices for both measuring degrees of incoherence and for measuring the inaccuracy of a credence assignment at a world. The question we’re interested in, namely whether approximating coherence gives thinkers a benefit in terms of the accuracy of their credences, thus needs to be precisified. We need to ask whether there are any combinations of incoherence and inaccuracy measures, such that reducing incoherence according to the relevant incoherence measure provides an accuracy-improvement according to the relevant inaccuracy measure. What exactly is an accuracyimprovement? We can rely on the notions of weak and strong dominance to spell out this notion. Applied to credence assignments, the notions are defined as follows: For two credence assignments c and c’, and some inaccuracy measure ℑ, c weakly ℑ-accuracy-dominates c’ just in case c is at least as accurate as c’ in every world, and more accurate in at least one world. Moreover, c strongly ℑ-accuracy-dominates c’ just in case c is more accurate in every world. If a thinker moves from a credence assignment c’ to a credence assignment c that either weakly or strongly accuracy-dominates her current credences c’, then this move constitutes an ℑ-accuracy-improvement according to the accuracy measure ℑ. The first, most general question we may ask now is whether there is any combination of incoherence and inaccuracy measures such that every reduction in incoherence amounts to an accuracy-improvement. In other words, we want to know whether we can find true instances of the following two conditionals by substituting appropriate pairs of Id and ℑ: (Ia) Reducing Incoherence Promotes Accuracy (weak dominance version) For all credence assignments c, c’, defined on the same set of statements: if c is less Id-incoherent than c’, then c weakly ℑ-accuracy-dominates c’. (Ib)

Reducing Incoherence Promotes Accuracy (strong dominance version)

For all credence assignments c, c’, defined on the same set of statements: if c is less Id-incoherent than c’, then c strongly ℑ-accuracy-dominates c’. Both conditionals are false regardless of which inaccuracy measures and incoherence measures we substitute. In fact, some reflection reveals that this is unsurprising. Every coherent credence assignment is less incoherent than any incoherent credence assignment according to every incoherence measure. For (Ia) to be true, there would have to be some inaccuracy measure


80   ? according to which every coherent credence assignment weakly dominates every incoherent credence assignment. But there is no such inaccuracy measure. Consequently, there can’t be a strong dominance relation of the relevant kind either, and (Ib) is also false (De Bona & Staffel 2017). To see why this is, it is helpful to consult Figure 5.1 in section 1. There is simply no plausible way of measuring inaccuracy according to which every credence function on the solid black line is at least as close to the truth as c₁ regardless of whether it rains or doesn’t rain. While this shows that not every way of becoming more coherent provides thinkers with an accuracy benefit, it might still be true that there are systematic connections between becoming more coherent and becoming more accurate. As I will now explain, there are two interesting connections of this kind. Roughly speaking, (i) moving to a more accurate credence function also makes one’s credences more coherent, and (ii) there is a specific way of becoming more coherent that guarantees improvements in accuracy. I will now characterize these results in more detail, and specify for which combinations of incoherence and accuracy measures they hold. The following two conditionals, which invert the dependence relation between incoherence and inaccuracy compared to (Ia) and (Ib), are true for certain combinations of Id and ℑ. (IIa)

Reducing Inaccuracy Promotes Coherence (weak dominance version)

For all credence assignments c, c’, defined on the same set of statements: if c weakly ℑ-accuracy-dominates c’, then c is not more Id-incoherent than c’. (IIb) Reducing Inaccuracy Promotes Coherence (strong dominance version) For all credence assignments c, c’, defined on the same set of statements: if c strongly ℑ-accuracy-dominates c’, then c is less Id-incoherent than c’. The conditionals (IIa) and (IIb) hold whenever an inaccuracy measure ℑ is paired with an incoherence measure Id that is suitably related to ℑ. The measures ℑ and Id stand in the right relationship to each other to validate (IIa) and (IIb) whenever we use a divergence d for our incoherence measure Id that is derived in a particular way from an inaccuracy measure. In slogan form, the idea is: measure the distance between two credence functions in terms of expected increase in inaccuracy. More specifically, a divergence d can be defined based on an accuracy measure as follows: for every coherent credence assignment c, we can calculate the degree of inaccuracy it expects itself to have. We can furthermore calculate the degree of inaccuracy c expects any other


(possibly incoherent) credence assignment c’ to have, according to some inaccuracy measure ℑ. The divergence d based on ℑ measures the difference between the inaccuracy score that c expects itself to have, and the (higher) inaccuracy score that c expects c’ to have. Now that we have defined a way of measuring the distance between two credence functions in terms of expected inaccuracy increases, we can plug it into our standard schema for defining an incoherence measure Id. Suppose d is a divergence that is derived from an incoherence measure ℑ in the way just described. Then the incoherence measure Id works as follows: the distance from some incoherent credence assignment c’ to the closest coherent credence assignment c is the difference between the inaccuracy score that c expects itself to have, and the inaccuracy score that c expects for c’, where c is selected such that this difference is minimized. For example, if we use the Brier score as our inaccuracy measure, then the degree of incoherence of a credence assignment c’ is the minimal difference between the expected Brier score of some coherent credence assignment c and the expected Brier score of c’ relative to c. Defining a divergence based on the Brier score in this way, and using it to measure distance from coherence is equivalent to using squared Euclidean distance to measure distance from coherence. The relevant results of this chapter are also preserved if we use Euclidean distance in our incoherence measure in combination with the Brier score as our accuracy measure. If we instead use the additive logarithmic scoring rule to measure inaccuracy, then we can generate a matching incoherence measure by using as our d a version of the Kullback–Leibler divergence (De Bona & Staffel 2017, De Bona 2016, Capotorti, Regoli, & Vattari 2009, Gneiting & Raftery 2007).⁴ Hence, we know which pairs of incoherence measures and inaccuracy measures lead to true instances of (IIa) and (IIb). There might perhaps also be other pairs of measures for which those conditionals hold, but we should not generally expect mixing and matching incoherence and inaccuracy measures to uphold these relationships. In fact, it turns out that if we choose the Brier score as our inaccuracy measure, the only p-norm distance measure that we can use to measure incoherence to validate

⁴ The results about measuring degrees of incoherence with a version of the Kullback–Leibler divergence and measuring inaccuracy with the additive log-measure are restricted to cases in which there are no maximally inaccurate credences, i.e. no cases of assigning credence 1 to a false proposition or assigning credence 0 to a true one.


(IIa) and (IIb) is (squared) Euclidean distance (see proposition 4 in De Bona & Staffel 2017). We can illustrate this result by returning to our example from section one. Recall that we were considering the following two credence functions:

c₁(It’s raining) = 0.4, c₁(It’s not raining) = 0.3
c₂(It’s raining) = 0.55, c₂(It’s not raining) = 0.45

Suppose we measure accuracy with the Brier score, as before, and we measure the degree of incoherence of c₁ as the minimal difference between the expected Brier score of some coherent credence assignment c and the expected Brier score of c₁ relative to c. It turns out that c₂ is the coherent credence function that minimizes this difference, so the degree of incoherence of c₁ is the difference between the degree of inaccuracy that c₂ expects itself to have, and the (higher) degree of inaccuracy that c₂ expects c₁ to have. The degree of incoherence of c₁ comes out as 0.212 in this example. By contrast, the degree of incoherence of c₂ is of course 0. As mentioned before, c₂ strongly accuracy-dominates c₁ if we measure accuracy with the Brier score. Hence, this is an instance of principle (IIb) above. While it is interesting to know that improvements in accuracy have benefits in terms of coherence, this is not the relationship we were initially trying to establish. Our aim was to show that approximating coherence is good because it has accuracy benefits. But what (IIa) and (IIb) show is that improving accuracy has benefits (or is at least not detrimental) with regard to coherence. However, there is a further result about the relationship between accuracy and coherence that speaks more directly to our initial question. Recall that we found that not every way of reducing incoherence generates an accuracy benefit. However, it turns out that if a thinker reduces her incoherence in a specific way, her accuracy will improve. Suppose that ℑ is a convex⁵, additive inaccuracy measure based on a strictly proper scoring rule, and d is the

⁵ An inaccuracy measure ℑ is convex if, for every (X, Y, v, λ) ∈ [0,1]ⁿ × [0,1]ⁿ × V × [0,1], the following holds: ℑ(λX + (1 − λ)Y, v) ≤ λℑ(X, v) + (1 − λ)ℑ(Y, v). An argument for why convex inaccuracy measures are attractive can be found in Joyce (2009). According to Joyce, convex inaccuracy measures encourage thinkers to be conservative in their belief revision strategies. Suppose a thinker could choose to have her credences either slightly increased or decreased by a random process that makes an increase or decrease equally likely. A non-convex inaccuracy measure might make this seem like a belief revision strategy that is on average beneficial for the accuracy of our credences. By contrast, a convex inaccuracy measure discourages such a belief revision method. Examples of convex inaccuracy measures are the Brier inaccuracy measure and the logarithmic inaccuracy measure.


divergence generated by this inaccuracy measure in the way just introduced. Then the following claim holds:

(III) Approximating Some Closest Coherent Credence Assignment Promotes Accuracy
If c′ is an incoherent credence assignment on {A₁,. . ., An}, and c is some d-closest coherent credence assignment to c′ on {A₁,. . ., An}, then c strongly ℑ-accuracy-dominates c′. Moreover, for any λ in the interval (0,1], let c* be a credence assignment on {A₁,. . ., An} such that c*(Ai) = λc(Ai) + (1 − λ)c′(Ai). For any λ ∈ (0, 1], c* strongly ℑ-accuracy-dominates c′.

What this means is that if a thinker reduces their incoherence by moving towards a credence assignment that is on the direct path to the closest coherent credence assignment according to Id, then it is guaranteed that their resulting credences will be less inaccurate according to the corresponding inaccuracy measure ℑ in every possible world. Hence, while not every way of becoming less incoherent gives thinkers an accuracy benefit, becoming less incoherent by moving towards the closest coherent credence assignment does. Both the Brier score and the logarithmic inaccuracy measures have the convexity property that is needed to generate this result, though the (less popular) spherical scoring rule doesn’t. Let’s return to our earlier example to illustrate this point: We already know that c₂ is the closest coherent credence function to c₁, provided we use (squared) Euclidean distance, which is illustrated in Figure 5.2 by the dashed line between the two dots representing c₁ and c₂. Now, any credence function that lies on this segment of the dashed line is both less incoherent than c₁ according to the incoherence measure based on (squared) Euclidean distance, and it also accuracy-dominates c₁ if accuracy is measured with the Brier score. As one’s credences move closer to c₂, there is a steady increase in accuracy in every possible world and a steady decrease in incoherence. An example of a credence function that has these properties is c₃:

c₃(It’s raining) = 0.41 and c₃(It’s not raining) = 0.31

c₃ lies on the direct path between c₁ and c₂, and it is both less incoherent than c₁ (0.198 < 0.212)

The ETP says that if ch is one of the possible chance functions, then a thinker’s current credence in some claim Ai, conditional on ch being the current chance function, must agree with ch’s probability for Ai, where ch is being updated on the thinker’s current total evidence. And here’s a version of the Reflection Principle:

Reflection Principle: For any claim A, real number x ∈ [0,1], and times t₁ and t₂, where t₁ comes before t₂, rationality requires that c₁(A | c₂(A) = x) = x.

The Reflection Principle requires that, conditional on the thinker’s future credence in A being x, their current credence in A is x. Combined with the conditionalization rule, this means that if a thinker learns with certainty that they will have a particular credence in the future, then they should ensure that their current credence agrees with this future credence (for further discussion, see van Fraassen 1989, 1999, Christensen 1991, Talbott 1991, Arntzenius 2003, Briggs 2009, Easwaran 2013b, and Mahtani 2015. Some authors, e.g. Easwaran, interpret the formal structure of the Reflection Principle slightly differently, namely as constraining the agent’s plans for what future credences to adopt.). The last principle I will mention here is what I call the Basing Principle. The principle requires that rational thinkers’ attitudes must stand in appropriate relationships.


Basing Principle: A thinker's credences must be properly based on their reasons or evidence in order to be (doxastically) rational.

This principle is familiar from debates about justification in traditional epistemology, because it helps distinguish thinkers who have reasons for their beliefs, but whose beliefs are not based on those reasons, from thinkers who base their beliefs on their reasons. In the former case, the thinker's beliefs are merely propositionally justified, and in the latter case, the thinker's beliefs are also doxastically justified. We can make the same distinction when we evaluate the rationality of a thinker's beliefs (Wedgwood 2017). The principles mentioned earlier, i.e. Uniqueness, Indifference, and the Deference Principles, all concern propositional rationality, whereas the Basing Principle concerns doxastic rationality, i.e. whether the credences that an agent actually has are properly based, whatever exactly that might mean. The method we have used so far—measuring the distance of an agent's credences from some ideal credence function—is well suited for determining the degree of propositional rationality of a credence function, but not the degree of doxastic rationality of a credence function. It is easy to see why—distance measures don't pick up on the reasons for which a thinker adopts or holds a particular credence assignment. We will thus set the Basing Principle aside for now and focus on measuring degrees of propositional rationality when there are multiple principles of propositional rationality in play. The question of what makes an agent's credences doxastically rational is interesting and difficult in its own right, and I hope to address it in more detail elsewhere (for some relevant discussion, see Smithies 2015, Gibbs 2017, Dogramaci 2018).

1.2 Justification, Values, and Conflicts

In discussing the requirement that thinkers should have probabilistic credences, I assumed a broadly value-based approach to justifying requirements of rationality. I will presuppose this approach more generally in what follows. According to this approach, any requirement of epistemic rationality can be explained in terms of a relevant value. For example, the value of accuracy lets us rank a thinker's credences according to how closely they approximate the truth. The closer the thinker's credences are to the truth, the more valuable


they are according to this dimension of evaluation. Principles of rationality, such as the requirement to have probabilistically coherent credences, can then be explained in terms of the value they help promote. This is exactly how the accuracy argument for probabilism proceeds. On this view of how requirements of rationality are justified, it makes sense to rank thinkers’ beliefs as being more or less (ir)rational, because being less rational corresponds in some way to having beliefs that are less epistemically valuable. Once we think of rational requirements as being justified in virtue of promoting epistemic value, we are faced with the question of whether there is just one epistemic value, or multiple epistemic values. For example, accuracy-firsters claim that there is only one fundamental epistemic value, namely accuracy. On their view, any and all requirements of epistemic rationality are derivable from the value of accuracy. Another somewhat popular suggestion is that the only fundamental epistemic value is to have credences that are correctly proportioned to one’s evidence, and one might then proceed to justifying rational requirements on credences by arguing that complying with those requirements promotes this value. Or perhaps one might argue that having knowledge is what’s ultimately and uniquely epistemically valuable. These kinds of single-value views can be contrasted with views according to which there is more than one fundamental epistemic value. Multi-value views explain principles of rationality by arguing that for each proposed principle of rationality, there is at least one epistemic value it promotes. For example, someone might think that there are two epistemic values: accuracy, and matching the evidence. It would be natural on this view to suggest that probabilism promotes the value of accuracy, and that e.g. the Principal Principle promotes the value of matching one’s credences to the evidence.³ The question that arises for multi-value views is how the values interact with each other to give rise to all things considered requirements of rationality. We can’t simply say that each principle that promotes some value is a requirement of rationality on this view, since the demands of different principles can come into conflict. A credence function that is optimal according to one of the values might not be optimal in light of all of the relevant values (see Easwaran & Fitelson 2012 for a case in which the evidentialist demands conflict with the demands of accuracy/coherence). From here on out, I will be using the notion ³ Another option is that there are multiple epistemic values that all support the same requirements. I won’t consider this case separately, to avoid making things unnecessarily complicated. We can treat this case in the same way as the case in which there is a single value that underwrites all principles of rationality, as long as the different values don’t require us to use incompatible divergences for measuring approximations to ideal rationality.


102       of a principle of rationality in contrast with the notion of a requirement. The notion of a principle is meant to be weaker than the notion of a requirement, in the sense that it might be overridden once we determine what is all things considered rationally required of a thinker. It’s supposed to be similar to the idea of a prima facie duty in ethics. This is to account for the idea that a particular value could give rise to a principle of rationality, but once all of the relevant factors are taken into account, it might not be all things considered required of a thinker to comply with this principle. Views that claim that there are multiple values that give rise to different principles of rationality thus face the question of what the relationship between these values is. We need to know the relationship between the values to know how they combine to give rise to requirements of ideal rationality. There are three possibilities of how the different values might interact when we try to order credence assignments according to how epistemically rational they are: (i) The values can be lexically ordered, which means that higher-ranked values always trump lower-ranked values, and lower-ranked values only play a role in the ordering when a higher-ranked value sees two options as tied. (ii) The values are incomparable, which means that credence assignments can only be ranked when all the values agree on the ordering. Otherwise no ranking is possible. (iii) The values can be compared, and weighing them against each other is possible in order to produce a ranking of how rational different credence assignments are.⁴ A further interesting question concerns the possibility of irresolvable conflicts between different requirements of rationality. Some authors have argued that requirements of ideal rationality can be in tension with one another, so that it is impossible to fulfill all of them at once. On this kind of view, there are cases in which every credence assignment a thinker might adopt violates at least one requirement of rationality (Christensen 2007, Worsnip 2018, Leonard manuscript). In what follows, I will bracket the possibility of such irresolvable conflicts, and explain how to measure approximations to ideal rationality depending on whether one holds a single-value or a multi-value view. I show how to extend my proposal to views on which rational requirements can conflict in Appendix B. In the next section, I will survey the mathematical options for measuring approximations to ideal rationality. Once we have these options laid out, we can examine which measuring

⁴ As Chad Lee-Stronach has suggested to me, there might also be the option of partial comparability (see Sen 1970). I will ignore this option here to keep the discussion manageable, but I think it should be kept in mind once we look more closely at specific conceptions of epistemic value.


strategies pair best with which views of how epistemic values underwrite requirements of rationality.

2. Formal Possibilities for Using Measures: Bundle and Piecemeal Strategies

We saw previously that we can fruitfully employ divergences to capture the intuitive idea that our credences can be closer to or farther away from complying with the coherence requirement. I proposed to measure how coherent a thinker's credence assignment is by determining its divergence from the closest coherent credence assignment. We were able to single out specific divergences to use in our measure by consulting arguments for probabilism. Choosing an appropriate divergence in this way let us show that approximating coherence delivered epistemic benefits, such as improved accuracy, and decreased normalized Dutch-book losses. We can expand this strategy to measure approximations to being ideally epistemically rational when this involves complying with multiple principles of rationality.⁵ As I mentioned above, I am focusing on principles of propositional rationality, so we can build on the general idea of measuring the divergence between a thinker's credences and the closest credence assignment that meets the relevant principle(s). Of course, in order to choose suitable measures that track increases in the relevant underlying values, we need to apply the argumentative strategy we used in Chapters 4 and 5. However, since this chapter is supposed to give an overview of different measuring strategies that work for different conceptions of value more generally, it won't be feasible here to go through this step for every possible view one might have. In Appendix A, I show how we can extend the accuracy-based approach from Chapter 5 to cases in which the Principal Principle is a requirement of rationality in addition to coherence, and to cases in which the Indifference Principle applies. Based on Pettigrew's accuracy-based arguments for these principles, we can show that moving on a direct path towards the closest credence function that satisfies probabilism and the Principal Principle is guaranteed to improve the expected accuracy of one's credences from the perspective of the objective chances. And in cases in which the Indifference Principle applies, moving towards the indifferent credences improves the

⁵ My strategy for distinguishing between different measurement options was inspired in part by a related discussion in Pettigrew (2013).


104       worst-case inaccuracy of one’s credences (see Appendix A for details). I will leave it to the reader to determine which distance measures best track the values that give rise to requirements or principles of rationality on their favored view. Instead, I will move on and show which options there are for measuring approximations to ideal rationality, and how well they fit with different underlying conceptions of value. These general points won’t depend on which specific values and associated distance measures are chosen. There are two general strategies we might employ (ignoring the possibility of conflicts between requirements for now). On the first strategy, which I will call the bundle strategy, we initially determine which credence assignments are rationally permitted for a thinker at a particular time according to all principles of rationality taken together. For example, if someone thinks that Probabilism and the Principal Principle are the only principles of rationality, then there must be some specific coherent credence assignments that are rational to hold for a thinker at a particular time. We then measure, using a suitable divergence, to what degree a thinker’s actual credences approximate some closest credence assignment that is rational for them to hold. Hence, on this strategy, we use all the relevant principles of rationality bundled together to determine a set of permissible credence assignments, and measure the degree to which a thinker’s credences are epistemically rational as the divergence between their actual credences and some closest credence assignment in the set of rationally permissible credence assignments. The second strategy, which I will call the piecemeal strategy, proceeds instead by measuring approximations to each principle of rationality separately, and aggregating the results. Suppose again that Probabilism and the Principal Principle are the only principles of rationality. On this view, we first measure separately to what extent a thinker’s credences fall short of complying with Probabilism, and to what extent their credences fall short of complying with the Principal Principle. In each case, we use a suitable divergence to measure the degree to which the thinker’s credences approximate some closest credence assignment that complies with the individual principle of rationality. The resulting two measurements then need to be aggregated into a single verdict. There are different available aggregation procedures: (a) piecemeal+lexical ordering: An aggregation procedure based on a lexical ordering of the principles ranks credence assignments according to their degree of approximation to the highest-ranked principle, degrees of approximation to lower-ranked principles function as tiebreakers. (b) piecemeal+no comparisons: Each principle of rationality and associated divergence produces


its own ranking of credence assignments, and an overall ranking of credence assignments in terms of their all-things-considered epistemic rationality can only be achieved in cases in which all the individual rankings agree. (c) piecemeal+weighted averaging: An overall ranking of credence assignments in terms of their all-things considered epistemic rationality is achieved by considering weighted averages of the credence assignment’s approximations to each principle. A simple version of the piecemeal strategy involves simply adding the closeness to compliance with each principle to generate a total score. The options I outlined above are purely formal specifications of different strategies for measuring to what degree a thinker’s credences are rational, where being fully rational requires compliance with multiple principles of rationality. Which of these formal strategies we should actually use depends on how we conceive of the way epistemic values give rise to requirements of rationality. Earlier, I distinguished between value-based accounts of rationality that aim to explain all requirements of rationality in terms of a single epistemic value, and accounts that aim to explain different requirements of rationality in terms of multiple underlying epistemic values. Multiple-value accounts can be further subdivided into (i) accounts assuming that epistemic values are lexically ordered, (ii) accounts assuming that epistemic values are incomparable, and (iii) accounts assuming that epistemic values can be weighed against each other. It is easy to see that piecemeal+lexical ordering is a fitting measuring strategy if there are multiple epistemic values that are lexically ordered. If one holds instead that there are multiple epistemic values that can’t be compared, the most natural measuring strategy to adopt is piecemeal + no comparisons. This leaves us with two remaining views: the view on which there is a single epistemic value that gives rise to requirements of rationality, and the view on which there are multiple values that gives rise to principles of rationality that can be compared with and weighed against each other. In principle, we can combine each of these views about value with each of the remaining two formal measuring strategies. On the bundle strategy, we determine which credence assignment or set of credence assignments is ideally rational according to all the principles of rationality, and then we measure the divergence between a thinker’s actual credence assignment and the closest credence assignment that would be ideally rational for them to have. On version (c) of the piecemeal strategy, piecemeal+weighted averaging, we first determine separately how closely a credence assignment approximates each principle, and then we produce a weighted average of


106       the individual scores to determine the overall degree of rationality of the credence assignment. These two strategies are worth examining and comparing in more detail. I will do that by looking at some examples in the next two sections.

3. Application to Cases

To get a better sense of how the bundle strategy and the piecemeal strategy for determining orderings of overall rationality work, I will look at two different kinds of cases. The first type of case is one in which we are just comparing any credence assignments according to how rational they are. The second type of case involves constraints on the thinker's credences. We consider how the thinker can get as close to being rational as possible if her credences are somehow barred from being fully rational. I will show that, given a single-value view, the bundle strategy is more suitable than the piecemeal strategy, while both the bundle strategy and the piecemeal strategy seem fitting given a multiple-value view.

3.1 Unconstrained Approximating

In order to illustrate how the bundle and the piecemeal strategy work, we will consider a simple example. Consider a thinker who has credences in two propositions, A and ∼A. We will assume that, in order to be rational, the thinker's credences should obey the following principles: the credences have to be probabilistically coherent, and they have to obey the Principal Principle. We will assume that the thinker's knowledge about the chances mandates that both A and ∼A should get credence 0.5. The optimal credence assignment to have in this case is thus:

c(A) = 0.5
c(∼A) = 0.5

If we apply the bundle strategy for determining the overall rationality of the thinker's credences, all we need to do is select an appropriate divergence, and determine how closely a given credence assignment approximates c. For example, suppose we select squared Euclidean distance (SED) as our


divergence, and we want to compare the following two credence assignments according to how irrational they are:

c₁(A) = 0.4
c₁(∼A) = 0.6

c₂(A) = 0.4
c₂(∼A) = 0.4

It turns out that both c₁ and c₂ are equally far away from c, in terms of SED.⁶ Hence, if this measure was used to determine degrees of overall rationality, the two credence assignments would be judged to approximate ideal rationality to the same degree. What happens if we instead use the piecemeal method? Let's say we use SED to measure the distance from compliance with each principle separately, and we aggregate the scores by adding them. This procedure ranks c₁ as more rational than c₂. This is because both credence assignments are equally far away from complying with the credences mandated by the Principal Principle, but c₁'s distance to coherence is 0, whereas c₂'s distance to coherence is positive. Hence, any way of aggregating the scores that gives positive weight to how distant a credence assignment is from the closest coherent credence assignment will rank c₁ to be more rational than c₂. This shows that the bundle strategy and the piecemeal strategy are genuinely distinct: They can produce conflicting verdicts about which of two credence assignments is closer to being rational, even if we hold fixed which divergence is used in both cases.⁷ This is an important finding, because it means that if we want to select a measuring strategy by using the kind of value-based justification introduced in Chapters 4 and 5, the two measuring strategies won't necessarily agree on whether two credence functions have the same epistemic value.

⁶ SED between c and c₁: (0.5 − 0.4)² + (0.5 − 0.6)² = 0.02
SED between c and c₂: (0.5 − 0.4)² + (0.5 − 0.4)² = 0.02
⁷ This example can be used to demonstrate why it matters how exactly we formulate principles of rationality. Suppose that we set up this case as one in which Probabilism and the Principle of Indifference are the relevant principles, instead of Probabilism and the Principal Principle. If we formulate the Principle of Indifference in a way that never counts non-probabilistic credences as satisfying it, then the example works out exactly as shown above. But if we go with a weaker version of the Principle, then c₂ would count as satisfying it, while not satisfying probabilism. This affects how c₁ and c₂ compare on the piecemeal+averaging strategy. If we use a weak version of the Principle of Indifference, and we give equal weight to compliance with both principles, c₁ and c₂ are judged to be equally irrational.
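To make the contrast concrete, here is a minimal Python sketch (my own illustration, not part of the original text; the function names and the equal-weight additive aggregation are assumptions made for the example) that recomputes both scores for c₁ and c₂, using SED as the divergence throughout. It relies on the fact that, for credences in A and ∼A that aren't too extreme, the SED-closest coherent assignment is obtained by adding the same correction to both credences.

# Minimal sketch: bundle strategy vs. an equal-weight additive piecemeal
# strategy, with squared Euclidean distance (SED) as the divergence.

def sed(p, q):
    """Squared Euclidean distance between two credence assignments."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def closest_coherent(c):
    """SED-closest coherent assignment over {A, not-A}: project onto x + y = 1."""
    delta = (1 - sum(c)) / 2
    return tuple(x + delta for x in c)

ideal = (0.5, 0.5)     # credences mandated by the Principal Principle (and coherent)
c1 = (0.4, 0.6)        # coherent, but violates the PP
c2 = (0.4, 0.4)        # incoherent, and violates the PP

for name, cr in [("c1", c1), ("c2", c2)]:
    bundle = sed(cr, ideal)                           # distance to the ideal assignment
    to_coherence = sed(cr, closest_coherent(cr))      # distance to the closest coherent assignment
    to_pp = sed(cr, ideal)                            # distance to the PP-mandated assignment
    piecemeal = to_coherence + to_pp                  # equal-weight additive aggregation
    print(name, "bundle:", round(bundle, 3), "piecemeal:", round(piecemeal, 3))

# Output:
# c1 bundle: 0.02 piecemeal: 0.02
# c2 bundle: 0.02 piecemeal: 0.04
# The bundle strategy ties c1 and c2; the piecemeal strategy penalizes c2's
# incoherence separately and so ranks c1 as less irrational, as described above.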


108       3.1.1 Single-value Views Which of these measuring strategies should be adopted by an advocate of a single-value approach? In order to answer this question, it will be useful to take a closer look at the currently most popular single-value approach—the accuracy-first approach to justifying norms of epistemic rationality. According to this view, the only thing that is epistemically valuable is to have credences that are as accurate as possible. Rational norms are justified insofar as complying with them furthers the goal of having accurate credences. A prominent advocate of this view is Richard Pettigrew (2016), who has shown that accuracy arguments can be given to support Probabilism, the Indifference Principle, Conditionalization, and the Principal Principle. Easwaran (2013b) shows that there are also accuracy arguments in favor of the Reflection Principle and a principle called Conglomerability. The accuracy arguments for the different rational norms rely on different principles of decision theory. As I explained in Chapter 5, the accuracy argument for probabilism relies on the dominance principle: it is shown that if a thinker has incoherent credences, there is always a coherent credence assignment that is more accurate in every possible world, and that is not itself accuracy dominated. Pettigrew argues that, depending on the circumstances, we can rely on stronger decision theoretic principles to derive more stringent constraints on thinkers’ credences than coherence. For example, in cases in which the thinker has not yet gathered any evidence, Pettigrew argues that it is rational to select a credence assignment via the Maximin rule. According to Maximin, a thinker should choose the option that has the best worst-case outcome. Pettigrew argues that, in the absence of any evidence, thinkers should adopt a flat credence distribution, because doing so will minimize the maximum inaccuracy of their credences in any possible world. The credence distribution required by Maximin is thus exactly the credence distribution that complies with the Indifference Principle.⁸ In cases in which thinkers have knowledge of what the objective chances are, by contrast, Pettigrew argues that a different decision theoretic principle—a chance-based dominance principle—is appropriate to apply, which supports a version of the Principal Principle as a norm of ideal rationality. These decision theoretic principles are all stronger than the dominance principle, and thus lead to stricter constraints on credence assignments. Hence, on this picture, we can determine which norms of rationality a thinker should obey in a situation by selecting the ⁸ Pettigrew also discusses alternative arguments for the Principle of Indifference, see section 3 of his (2016).


appropriate principle of decision theory and combining it with a suitable inaccuracy measure that helps determine the epistemic value of any credence assignment. Because the only thing that matters from an epistemic point of view is accuracy, according to this line of reasoning, the only thing that matters for determining a credence assignment’s degree of irrationality is how much accuracy is lost by having this credence assignment instead of the ideally rational credence assignment. However, because the decision theoretic principles used to justify different principles of rationality in terms of accuracy vary, the way we must track improvements in accuracy that are due to approximating the ideally rational credences also varies depending on which principles are in force. In Chapter 5, we showed that when we use a dominance-based argument to justify probabilism, we can then also show that approximating some closest coherent credence assignment on the most direct path produces guaranteed improvements in accuracy in every possible world. By contrast, the Principal Principle, which we assumed to be in force in our example introduced earlier, is justified by Pettigrew as follows: if one’s credences violate the Principal Principle, there is an alternative credence function that is probabilistic and satisfies it, which has a better expected Brier-accuracy according to all the possible current chance functions. If one satisfies the principle, there is no such alternative credence function. In a similar fashion to the argument in Chapter 5, we can then show why credence functions are better the more closely they approximate the credences that are recommended by the Principal Principle and Probabilism taken together: moving on a direct path towards the closest credence function that satisfies probabilism and the Principal Principle is guaranteed to improve the expected accuracy of one’s credences from the perspective of the objective chances. In other words, what is bad about violating the Principal Principle and Probabilism is that it leads to a decrease in expected accuracy from the point of view of the objective chances. (And in cases in which the Indifference Principle applies, moving towards the indifferent credences improves the worst-case inaccuracy of one’s credences. See Appendix A for details.) Applied to our example, we know that c(A) = 0.5 and c(∼A) = 0.5 is the optimal credence assignment from an accuracy perspective, because it satisfies Probabilism and the Principal Principle. We can thus determine how much less rational an alternative credence assignment is by calculating how much more inaccurate c expects it to be compared to how inaccurate c expects itself to be. This way of determining degrees of overall rationality in the accuracy-first framework clearly corresponds to the bundle strategy discussed


110       above, where we use a divergence to measure how far a credence assignment is from some closest ideally rational credence assignment. We saw that in our example, the bundle strategy combined with SED as our sample divergence delivered the result that c₁ and c₂ are equally far from the ideal credence assignment. Hence, the result delivered by the bundle strategy is correct from the perspective of the accuracy-first framework that only cares about how much accuracy is lost or expected to be lost by adopting a less than fully rational credence assignment. We can now see that the piecemeal strategy for measuring degrees of overall rationality is a poor fit for a single-value approach like the accuracy-first approach to justifying requirements of rationality. The piecemeal approach ranks c₁ to be closer to ideally rational than c₂, because c₁ is coherent, whereas c₂ is not. This is true regardless of which divergence we use in our measurements. Yet, on the accuracy approach there is no reason to value coherence independently of its contribution to accuracy. c₁ and c₂ are both problematic in the same way and to the same degree, because they increase the expected inaccuracy from the perspective of c by the same amount. Hence, from a value perspective, there is no reason to prefer c₁ to c₂ because it is closer to some coherent credence assignment or other. The accuracy-first approach is of course not the only possible single-value approach. For example, on some versions of evidentialism, what fundamentally matters is that a thinker’s credences match their evidence (or perhaps their reasons).⁹ On this approach, the optimal credences are those that are supported by the thinker’s evidence. Presumably, proponents of this approach would want to derive common requirements of rationality, such as Probabilism, by pointing out that your evidence can never allow you to have incoherent credences (although see Worsnip 2018, for an argument that this doesn’t work). Other principles, such as the Indifference Principle, have also been defended on evidentialist grounds (White 2010). Evidentialists can use very similar tools for measuring divergences from ideal rationality as accuracyfirsters. While this view has not been developed, to my knowledge, evidentialists might find it attractive to use similar measures as accuracy-firsters to determine how much a credence assignment diverges from the ideal. Using strictly proper scoring rules would ensure that the credences best supported by the evidence consider themselves to be better than any alternative credences.

⁹ Of course, one might also think that matching one’s credences to one’s evidence is only valuable because doing so promotes the formation of true beliefs. On this view, the single fundamental value is not matching one’s credences to the evidence, but forming true beliefs.


These measures could then also be used to measure how much a credence assignment diverges from the one(s) best supported by the evidence. If the evidentialist took this approach, it’s easy to see that the relevant epistemic utility rankings would resemble the rankings of the accuracy-firster. Just like the accuracy-firsters, evidentialists don’t care about coherence unless it helps the thinker better comply with their evidence. We can thus make a similar argument as above, showing that the bundle strategy fits better with an evidentialist single-value account than the piecemeal strategy. 3.1.2 Multiple-value Views It is less clear which measuring strategy a proponent of a multi-value account should prefer. A proponent of a multi-value account might hold that the piecemeal account is in fact getting things right. Even though the proponent of a multi-value account would agree that c is the ideal credence assignment because it is both coherent and obeys the Principal Principle, it seems less clear that all that matters for measuring degrees of overall rationality is how far a given credence assignment is from the ideal credence assignment. For example, if Probabilism is a rational principle because it promotes the value of coherence, and the Principal Principle is a rational principle because it promotes the value of respecting one’s evidence, then complying with each principle matters independently. And if this is the case, we can see that this might justify judging c₁ to be less irrational overall than c₂, because at least c₁ does not fail to be coherent, whereas c₂ does. On the other hand, even on the multi-value view, one might argue that, once we have determined which credence assignment optimally promotes all the relevant values, a credence assignment is better the more closely it approximates the one that optimally promotes all the epistemic value. This line of reasoning would speak in favor of using the bundle strategy to measure how closely a given credence assignment approximates ideal rationality. However, this can only work if the distance from the optimal credence assignment somehow tracks improvements in all the relevant values simultaneously.¹⁰ In the next section, I will investigate cases of what I call “constrained approximation.” These are cases in which a thinker’s credences are somehow constrained from being ideal, and the question is what the optimal credence assignment is for this thinker, given the constraints on their credences. These cases are interesting to investigate because they give us further examples that can help us illuminate the differences between the bundle strategy and the ¹⁰ Thanks to Kenny Easwaran for helping me clarify this point.


piecemeal strategy. This will help us decide which strategy combines better with a multi-value view.¹¹

3.2 Constrained Approximation

We will now consider scenarios in which the optimal state (being perfectly rational) requires that a variety of conditions are met, and it is possible to satisfy some of them without the others. An interesting question to ask about these kinds of cases is: if one principle is violated, is the second best possibility one in which the remaining principles are still obeyed? I will show that the answer to this question is negative.¹² We will again focus on the simple example in which a thinker has credences in two propositions, A and ∼A, which are supposed to be coherent and obey the Principal Principle. The optimal credence assignment to have in this case is as before:

c(A) = 0.5
c(∼A) = 0.5

To introduce the "second best" problem, we will now suppose that the thinker is prevented from complying with the Principal Principle. We will stipulate for this purpose that the thinker is stuck with a credence in A of 0.4. However, they could still be coherent, if they assigned c₁(A) = 0.4 and c₁(∼A) = 0.6. The question we thus face is: given that the thinker is forced to violate one principle of rationality (the PP), would they be epistemically better off if they adopted a credence assignment that fulfills the other principle of rationality, coherence, or if they adopted some other credence assignment? The answer to this question of course depends on which measuring strategy we adopt. I will first examine the different versions of the piecemeal strategy, followed by the bundle strategy. On the first version of the piecemeal strategy, there is a lexical ordering of rational principles, such that the proximity to the highest ranked value

¹¹ I present an additional argument for why the multi-value view should adopt the bundle strategy in Chapter 7.
¹² The problem of how to measure approximations to an ideal state, where the ideal state requires that multiple constraints are met, has been discussed in a formal manner in economics under the label of the general theory of second best (Lipsey & Lancaster 1956–7, Wiens 2016). The constrained approximation cases I discuss here are superficially similar to the problem in economics, but have importantly different structural features, as David Wiens has helped me see.


determines the ordering of credence assignments, and proximity to lower ranked values functions as a tiebreaker. Which credence assignment that assigns a credence of 0.4 to A is second best depends on the ranking of principles in this case. If Probabilism is ranked above the PP, then c₁(A) = 0.4 and c₁(∼A) = 0.6 is the second best credence assignment, because it is coherent. However, if the PP is ranked above Probabilism, then what matters most is how closely the thinker’s credences approximate the credences prescribed by the Principal Principle. In this case, the credence assignment that is closest to the ideal credences is c₃(A) = 0.4 and c₃(∼A) = 0.5, regardless of which additive divergence we use to measure closeness. On the version of the piecemeal strategy that treats approximations to different rational principles as incomparable, we only get orderings of credence assignments when there are no conflicts between the orderings produced by approximations to the individual principles. As a result, this strategy cannot deliver a verdict about whether c₁ or c₃ is the second best credence assignment. Probabilism favors c₁ over c₃, whereas the PP favors c₃ over c₁. Since this is a conflict case, neither of these two credence assignments gets selected as second best (nor does any other credence assignment). What happens if we use the weighted-averaging version of the piecemeal strategy? In our example, we know that our thinker is stuck with a credence of 0.4 in A. We can use a distance measure, say squared Euclidean distance (SED), in order to determine the degree to which any given credence in ∼A lets the thinker approximate any of the norms. We know that assigning c₃(∼A) = 0.5 minimizes the thinker’s distance from complying with the PP, and assigning c₁(∼A) = 0.6 minimizes the thinker’s distance from complying with probabilism. If we want to find the credence that has the best added score (i.e. that minimizes the sum of the SED to the closest coherent credence assignment and the SED to the credence recommended by the PP), we must take the average between the two credences (Moss 2011, see also Pettigrew 2019). This means that the recommended second best credence assignment according to this way of determining epistemic utility is c₄(A) = 0.4 and c₄(∼A) = 0.55.¹³ Moss (2011) shows that we would get the same result if

¹³ The way in which this result was arrived at was the following, using the Brier score or SED as our distance measure: First, we compute the expected Brier score of the credence assignment that is best according to the PP, which is c(A) = 0.5 and c(∼A) = 0.5. Its expected Brier score is 0.5. To score credence assignments according to how closely they approximate the PP, we calculate how much they would increase the expected Brier score relative to c. To compute compliance with the coherence norm, we calculate the expected Brier score of the best available coherent credence assignment, which is c₁(A) = 0.4 and c₁(∼A) = 0.6. Its expected Brier score is 0.48. To score credence assignments according to how closely they approximate coherence, we calculate how much they would increase the expected Brier


we used another strictly proper scoring rule as the basis of our distance measure from each of the credence assignments recommended by the norms. If we decided to give different weight to the different norms, for example if we decided to give twice as much weight to approximating coherence as to complying with the PP, then we could find the recommended second best credence by looking for the credence that is generated by a weighted average between credences that gives twice as much weight to the credence that is optimal according to the coherence norm. Of course, we could also use different divergences to score approximation to each norm, in which case there isn't an easy way to figure out which credence assignment is second best; we'll just have to calculate which credence assignment minimizes the weighted approximation score. Regardless of which of these versions of the proposal we accept, we will find that the credence assignment that is recommended as second best by this version of the piecemeal strategy will be some kind of compromise that lies between the credence assignments that are second best according to any epistemic norm considered individually. Hence, we can end up with a credence assignment that none of the norms consider second best, and that is also not the credence assignment that most closely approximates the credence assignment that is ideal according to the joint verdict of the epistemic norms. This is exactly what happens in our example: Holding fixed a particular distance measure, say SED, c₄, which assigns c₄(A) = 0.4 and c₄(∼A) = 0.55, is not the credence assignment that is second best according to either the coherence norm or the PP, and it is also not the credence assignment that is second best in the sense that it best approximates the credence assignment that is ideal according to the combined verdict of the epistemic norms, which gives A and ∼A 0.5 credence each. This result is somewhat surprising, since it delivers a rather counterintuitive answer to the question of which credence assignment is second best. c₄ doesn't seem like a very natural candidate for being the second best credence assignment when c is the ideal credence assignment. This kind of result might push proponents of the multi-value

score relative to c₁. To determine which credence assignments are best according to both norms, we determine which assignments minimize the sum of the expected Brier increases according to both norms. This amounts to giving both norms equal weight in determining the ranking of credence assignments. We can check the result in Mathematica by using the following equation:

Minimize[(0.5(1 − 0.4)² + 0.5(0 − 0.4)² + 0.5(1 − x)² + 0.5(0 − x)² − 0.5) + (0.4(1 − 0.4)² + 0.6(0 − 0.4)² + 0.6(1 − x)² + 0.4(0 − x)² − 0.48), 0 <= x <= 1, x]

Result: {0.015, {x = 0.55}}


approach towards using the bundle strategy for measuring approximations to overall rationality. The bundle strategy delivers a more intuitive result, as I will now show. According to the bundle strategy, what matters is how closely a given credence assignment approximates the ideally rational credence assignment. Given any additive divergence, and given the constraint that A is assigned a credence of 0.4 by the thinker, the credence assignment that approximates the ideal credence assignment c most closely is c₃(A) = 0.4 and c₃(∼A) = 0.5. Since c₃ agrees with c on the credence in ∼A, there is no other credence assignment that could be closer. This result of the bundle strategy for measuring proximity to overall rationality is correct from the perspective of a single-value view like the accuracy-first view, for reasons explained in the previous section. All that matters are increases in (expected) inaccuracy, and the bundle strategy selects the credence assignment as second best that has the smallest increase in expected inaccuracy from the perspective of the ideal credence assignment. We thus have another instance in which the bundle strategy and the weighted averaging version of the piecemeal strategy deliver different results. We already settled that the bundle strategy is the right choice of measuring strategy for proponents of single value views about requirements of rationality. We were uncertain about which strategy should be used by proponents of multiple-value views who think that different epistemic values can be weighed against each other. Both measuring strategies initially seem fitting, but the fact that the piecemeal strategy delivers a rather counterintuitive result in our example might help decide the matter in favor of the bundle strategy. There is of course more to be said about this, but the matter is probably most fruitfully decided in a context in which a particular multi-value account is defended. Once we have a particular multi-value view on the table, it is of course possible that the piecemeal strategy tracks improvements in epistemic value better than the bundle strategy, despite its counterintuitive results. Since I am not arguing for a particular view of epistemic value here, I will leave the matter for another time. More generally, we find that it is not the case that the second best credence assignment is one that still meets the principles of rationality that are not barred from being complied with by the constraint. The only exceptions to this are cases in which principles of rationality are lexically ordered, and the thinker is not prevented by the constraint from complying with the highest ranked principle.
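For readers who want to check these verdicts, the following Python sketch (my own illustration, not from the book; the grid search merely stands in for the Mathematica Minimize call reproduced in footnote 13) recovers both recommendations under the constraint that the credence in A is stuck at 0.4: the equal-weight piecemeal+weighted averaging strategy selects c₄(∼A) = 0.55, while the bundle strategy with SED selects c₃(∼A) = 0.5.

# Illustrative sketch of the "second best" calculations, with the credence in A fixed at 0.4.

def piecemeal_objective(x):
    # Sum of the increases in expected Brier score, relative to the PP-ideal
    # c = (0.5, 0.5) and the closest coherent assignment c1 = (0.4, 0.6),
    # exactly as in the Minimize call in footnote 13.
    pp_term = 0.5 * ((1 - 0.4) ** 2 + (0 - 0.4) ** 2) \
            + 0.5 * ((1 - x) ** 2 + (0 - x) ** 2) - 0.5
    coherence_term = 0.4 * (1 - 0.4) ** 2 + 0.6 * (0 - 0.4) ** 2 \
                   + 0.6 * (1 - x) ** 2 + 0.4 * (0 - x) ** 2 - 0.48
    return pp_term + coherence_term

def bundle_objective(x):
    # SED from the ideal assignment c = (0.5, 0.5).
    return (0.4 - 0.5) ** 2 + (x - 0.5) ** 2

grid = [i / 1000 for i in range(1001)]   # simple grid search over [0, 1]
best_piecemeal = min(grid, key=piecemeal_objective)
best_bundle = min(grid, key=bundle_objective)

print(best_piecemeal, round(piecemeal_objective(best_piecemeal), 3))   # 0.55 0.015
print(best_bundle, round(bundle_objective(best_bundle), 3))            # 0.5 0.01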


4. Does this Ruin Our Nice Results About Approximating Coherence?

One might worry at this point how the results from this chapter square with the results from previous chapters concerning why it is beneficial to approximate coherence. As we learned in Chapter 5, if an incoherent thinker were to change their credences by moving on the direct path towards the closest coherent credence assignment (where a suitable divergence is used to determine what the closest coherent credence assignment is), they would be guaranteed to keep increasing the accuracy of their credences in all possible worlds as they approach coherence. I argued that this result supports and precisifies the initial idea that it is epistemically valuable to be less rather than more incoherent. Yet, as we saw in this chapter, many epistemologists think that there are other rational requirements besides Probabilism that thinkers' credences must obey in order to be fully propositionally rational. Once we add in these additional requirements, the following happens: If we have a thinker who has credences that are not fully propositionally rational, and we determine via the bundle strategy or via a version of the piecemeal strategies which alternative credence assignments would be a rational improvement over the thinker's current credences, it is no longer guaranteed that the credences that constitute such an improvement in overall propositional rationality also lie on the direct path towards the closest coherent credence assignment. Consider again our earlier example, in which the ideal credences to have are c(A) = 0.5 and c(∼A) = 0.5. Now imagine a thinker in fact has the following credences:

c₅(A) = 0.9
c₅(∼A) = 0.2

Compare this to the credence assignment c₆(A) = 0.7 and c₆(∼A) = 0.35. This credence assignment lies halfway on the direct path between c₅ and c. c₆ has two interesting features. For the first feature: suppose that we use one of the divergences that we singled out at the end of Chapter 4, such as squared Euclidean distance or Generalized Kullback–Leibler to measure distance from compliance with a principle of rationality. c₆ is closer to being rational than c₅ regardless of which measuring strategy we use: If we use the bundle strategy, we find that c₆ is closer to c than c₅ regardless of which divergence we use. If we use the piecemeal strategy, on each version of it (lexical ordering, incomparability, and the weighted aggregation strategy), c₆ is closer to being


rational than c₅. The second interesting feature is that c₆ does not accuracy-dominate c₅ according to any of the inaccuracy measures associated with the relevant divergences.¹⁴ Hence, we cannot justify why the closeness judgments we get from applying any of the measuring strategies really represent an improvement in rationality of c₆ over c₅ by appealing to a guaranteed increase in accuracy, since no such guaranteed increase obtains. One might thus wonder: What is the justification for claiming that these rankings reflect genuine improvements in overall rationality, when they don't preserve the justifications for individual requirements? Let's first examine how an advocate of a single-value view, such as the accuracy-first view, can respond to this. According to this view, what really matters is which credence assignment is best from an accuracy-perspective. If a thinker has incoherent credences, and this is the only constraint that is relevant to determining which credences are best from an accuracy-perspective, it makes sense that moving towards the closest coherent credence assignment would be beneficial, because it would lead to a guaranteed improvement in accuracy across all possible worlds. However, compare this case to the one we focused on earlier, in which the thinker also has information about the chances, which they can rely upon in adjusting their credences. According to Pettigrew, the reason why thinkers should comply with the Principal Principle is that, if they don't, there is an alternative, PP-compliant credence function that has better expected accuracy according to every possible chance function. Moreover, as I show in Appendix A, if an agent moves on the direct path towards the closest coherent PP-compliant credence function, the expected accuracy of their credences relative to all possible chance functions will improve. In this sense, c₆ is an improvement over c₅ in terms of accuracy. (Notice that there might also be additional ways of improving the expected accuracy of one's credences relative to the chances. Moving directly towards the closest PP-permissible credence function is just a way of guaranteeing this.) The fact that c₆ doesn't accuracy-dominate c₅ isn't a problem, because there are additional factors that affect which credence function is rational for the agent to adopt that trump dominance reasoning. Hence, the results we discovered earlier about approximating coherence are not at all irrelevant. However, compliance with requirements of propositional rationality is not an end in itself. It is beneficial when it helps the thinker acquire more epistemic value. Yet, in situations where the thinker has information about the chances (or the truth, or other accuracy-relevant information) that

¹⁴ The math is easy to do and left to the reader.


118       they can rely upon, what is most rational for them is no longer determined just by relying on dominance reasoning, because they have stronger accuracy-based reasons to move to a non-dominating credence assignment (see also Pettigrew 2013). What if we adopt a version of the piecemeal strategy instead? On this view, Probabilism is just one principle of rationality that promotes a particular value, such as accuracy. But there can be other principles of rationality that promote different values. Depending on the version of the piecemeal strategy we adopt, the value promoted by Probabilism may be trumped by another value, or it must at least be weighed against competing values. In each case, it is clear that the original justification for approximating probabilistic credences still matters, because it matters to comply with the individual principles that enter into the piecemeal strategy’s measurement. However, that doesn’t mean that the overall ranking of credence assignments will still reflect all the individual justifications. If there is another value that trumps the value supporting probabilism, then the question of which credence assignment more closely approximates coherence can only serve as a tie-breaker. If the best credence assignment is determined by a weighted aggregation strategy, it is unsurprising that the resulting credence is not necessarily best according to any of the requirements entering into the aggregation. This is just how aggregation works, and it doesn’t mean that the justifications for the individual principles no longer matter.¹⁵ A slightly different worry one might have concerns recommendations about how a thinker should change her credences. So far, we have developed ways of ranking credence assignments according to how closely they approximate individual principles of propositional rationality, and how closely they approximate overall propositional rationality. We have also seen that, for a given thinker, particular ways of approximating compliance with principles of rationality can be beneficial in terms of the accuracy of their credences, and the extent to which they are vulnerable to Dutch book losses. This makes it tempting to think that if one credence assignment is ranked to be more rational than some other credence assignment, a thinker who has the less rational credence assignment should move to the more rational credence assignment if they can. This suggestion seems especially natural in the case in which the thinker can improve their accuracy in every possible world by approximating coherence more closely. But once we see that credence ¹⁵ On some theories, the result of aggregating multiple values might be a single meta-value, which would then fit the model of a single-value view.


assignments can be ranked with regard to how rational they are in ways that are not immediately justifiable via principles like accuracy dominance, one might worry that we end up with implausible recommendations about how thinkers can improve their rationality. For example, suppose again that Probabilism and the Principal Principle are the relevant principles of rationality, and that the ideally rational credences are given by

c(A) = 0.5
c(∼A) = 0.5

Compare then the following non-ideal credence assignments: c₆(A) = 0.7, c₆(∼A) = 0.35 and c₇(A) = 0.45 and c₇(∼A) = 0.38. Suppose we measure their degree of approximation to the ideal credences, given by c, with the bundle strategy and squared Euclidean distance as our divergence. It is easy to see that c₆ is further away from c than c₇, hence the degree of irrationality of c₇ is lower than that of c₆. But even though c₇ is less irrational than c₆, it seems strange to think that a thinker who has the credences represented by c₆ should change their credences to c₇. There are in fact infinitely many less irrational credence functions than c₆, and we certainly don't want to say that a thinker should adopt an arbitrary one of them (and c₇ would be such an arbitrary one) just because it is less irrational. What this shows is that it would be a mistake to think that rankings of credence assignments according to their degree of propositional rationality immediately translate into recommendations for how thinkers should improve their credences. Just because an alternative credence assignment is closer to being rational than mine, it doesn't automatically mean that it's a good idea for me to change my credences to that credence assignment. In making recommendations regarding how thinkers ought to reason with or revise their credences, there are factors in addition to degrees of propositional rationality that are relevant, for example whether there is a way for the agent to adopt those credences in a way that is doxastically rational, or based on good reasoning. I will say more about the relationship between evaluations of degrees of propositional rationality and claims about how a thinker should change their credences in the following chapters.
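Footnote 14 leaves the arithmetic behind the c₅/c₆ comparison to the reader; the short Python sketch below (my own check, not the author's) verifies the numerical claims made in this section: c₆ is SED-closer to the ideal c than c₅ (and c₇ is closer still), c₆ does not Brier-dominate c₅, but c₆ does have lower expected Brier inaccuracy from the perspective of the chance function that, as in the running example, assigns A a chance of 0.5.

# A small numerical check of the claims about c5, c6, and c7.

def sed(p, q):
    """Squared Euclidean distance between two credence assignments."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def brier(cr, world):
    """Brier inaccuracy of cr = (cr(A), cr(not-A)) at a world:
    world = (1, 0) if A is true, (0, 1) if A is false."""
    return sum((t - x) ** 2 for t, x in zip(world, cr))

def expected_brier(cr):
    """Expected Brier inaccuracy relative to a chance of 0.5 for A."""
    return 0.5 * brier(cr, A_true) + 0.5 * brier(cr, A_false)

c  = (0.5, 0.5)     # ideal: coherent and mandated by the Principal Principle
c5 = (0.9, 0.2)
c6 = (0.7, 0.35)    # halfway on the direct path from c5 to c
c7 = (0.45, 0.38)
A_true, A_false = (1, 0), (0, 1)

# c6 is closer to the ideal than c5, and c7 is closer still (SED):
print(round(sed(c5, c), 4), round(sed(c6, c), 4), round(sed(c7, c), 4))   # 0.25 0.0625 0.0169

# c6 does not Brier-dominate c5: c5 is the more accurate of the two if A is true.
print(round(brier(c5, A_true), 4), round(brier(c6, A_true), 4))           # 0.05 0.2125
print(round(brier(c5, A_false), 4), round(brier(c6, A_false), 4))         # 1.45 0.9125

# But c6 has lower expected inaccuracy than c5 relative to the chances:
print(round(expected_brier(c5), 4), round(expected_brier(c6), 4))         # 0.75 0.5625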

Conclusion In this chapter, I explored how we should measure approximations to ideal rationality when there is more than one synchronic principle of ideal


epistemic rationality. The principles I focused on are best understood as principles of propositional rationality—the measurement strategies under consideration here are not well-suited for determining how doxastically rational a thinker's credences are. I laid out different mathematical strategies for measuring such approximations, and examined which strategies are best suited for different views about how epistemic values give rise to principles of rationality. The table below summarizes my findings. A "-" signifies a poor fit between the view of value and measuring strategy, a "+" signifies a good fit, a "?" signifies a questionable fit.

Measuring strategies and views about values

Views about value ↓ / Measuring strategies → | Bundle | Piecemeal+lexical ordering | Piecemeal+no comparisons | Piecemeal+weighted averaging
Single value | + | - | - | -
Multiple lexically ordered values supporting different principles | - | + | - | -
Multiple incomparable values supporting different principles | - | - | + | -
Multiple comparable values supporting different principles | + | - | - | ?
Multiple values supporting all the same principles | +¹⁶ | - | - | -
Conflicting requirements (Appendix B) | - | + | + | -

For each view of value, we found a well-fitting measuring strategy, except the view on which there are multiple comparable epistemic values that give rise to different principles. Both the bundle strategy and the piecemeal+weighted averaging strategy seemed like promising candidates initially. The former strategy delivered more intuitive results in constrained approximation cases and thus emerged as having an advantage. In the next chapter, I will offer an additional argument for why the bundle strategy should be paired with this type of multiple-value view. By examining cases of constrained approximation, we also found that, once a thinker is unable to fulfill one of several principles of rationality, the second best credence assignment is not necessarily one in which the remaining

¹⁶ Assuming the same divergence can be used to track each value.

OUP CORRECTED PROOFS – FINAL, 25/11/2019, SPi

 

121

principles are still fulfilled. In the next chapter, we will apply the measures we just developed to cases in which thinkers change their credences.

Appendix A: Extending the Argument from Chapter 5 to the PP and PoI

Pettigrew (2016) offers accuracy-based arguments for versions of the Principle of Indifference (PoI) and the Principal Principle (PP). We can build on these arguments using the same type of strategy we employed in Chapter 5 in order to show why it can be beneficial for thinkers to approximate the credence recommended by the PoI or by the PP.

Accuracy-based Justification of the PP

Pettigrew discusses various versions of the PP. Which one we should adopt depends on various factors. I will discuss here his argument for what he calls the Evidential Temporal Principle (ETP). Pettigrew points out that it has to be modified if we allow for self-undermining chances, but for simplicity’s sake, we won’t worry about that case here. Here is Pettigrew’s statement of the principle:

Evidential Temporal Principle (ETP): If an agent has a credence function c and total evidence E, and Tch is the proposition that ch is the current chance function, then rationality requires that c(Ai | Tch) = ch(Ai | E) for all propositions Ai in {A₁, . . . , An}, and all possible chance functions ch such that Tch is in {A₁, . . . , An} and c(Tch) > 0.

Pettigrew’s accuracy-based justification for ETP is stated below.

(I*ETP) Veritism: The ultimate source of epistemic value is accuracy.

(II*ETP) Brier Alethic Accuracy: The inaccuracy of a credence function at a world is the squared Euclidean distance from the omniscient credence function at that world to the credence function.

(III*ETP) Current Chance Evidential Immodest Dominance: Suppose ℑ is a legitimate measure of inaccuracy and E is a proposition. Then if (i) c is strongly current chance ℑ-dominated¹⁷ by probabilistic c’ conditional on E, (ii) there is no credence function that weakly current chance ℑ-dominates c’ conditional on E, and

¹⁷ In explication of clauses (i) and (ii), Pettigrew offers the following definitions: Let C be the set of possible current chance functions, let E be a proposition, and U be a utility function. Then: we say that o* strongly current chance U-dominates o conditional on E if ExpU(o | ch(· | E)) . . .

. . . x, we know that c’’ has a better worst-case inaccuracy than c. Hence, moving towards the credences recommended by the PoI reduces the thinker’s worst-case inaccuracy, and thus can be justified because it delivers an accuracy benefit.

My general suggestion has been that we should measure the degree to which a thinker’s credences are irrational by measuring the distance between their credences and the closest rationally permissible credence assignment. So in this case, we would choose a suitable distance measure or divergence, and measure how far away a thinker’s credences are from the indifferent credence assignment. However, if we accept Pettigrew’s justification for the PoI, which asks thinkers to minimize worst-case inaccuracy, then we might also consider a different strategy. We might measure the irrationality of credence functions according to their worst-case inaccuracy. To see that this makes a difference, consider the following credence functions, familiar from the main text of the chapter:

c(A) = 0.5, c(∼A) = 0.5
c₁(A) = 0.4, c₁(∼A) = 0.6
c₂(A) = 0.4, c₂(∼A) = 0.4

Using squared Euclidean distance, c₁ and c₂ are equally far away from c. However, they don’t have the same worst-case inaccuracy. For example, if we measure inaccuracy with the Brier score, c₁ has a worse worst-case inaccuracy (0.72) than c₂ (0.52). Hence, adopting this


alternative way of measuring degrees of rationality in contexts in which the PoI applies leads to different results than using the standard measuring strategy I advocate in the text. I won’t pursue this suggestion here further, but I think this is an interesting alternative to my standard approach if we adopt Pettigrew’s argument for the PoI.
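The worst-case figures cited for c₁ and c₂ above can be verified with a short computation. The sketch below (Python, illustrative only) scores each credence function with the Brier score at the two possible worlds and takes the maximum.

def brier(creds, world):
    # Brier inaccuracy: squared Euclidean distance from the omniscient credences
    # at a world. creds = (c(A), c(~A)); world = (v(A), v(~A)) with v in {0, 1}.
    return sum((v - x) ** 2 for x, v in zip(creds, world))

worlds = [(1, 0), (0, 1)]   # A is true; A is false
c1 = (0.4, 0.6)
c2 = (0.4, 0.4)

print(max(brier(c1, w) for w in worlds))  # approx. 0.72
print(max(brier(c2, w) for w in worlds))  # approx. 0.52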

Appendix B: The Case of Conflicting Ideals

In this chapter, we have assumed that all-things-considered requirements of rationality can always agree upon what the ideally rational credences for a particular thinker at a particular time are. But some philosophers worry that requirements of rationality can come into conflict. A common example of such a conflict concerns the requirement, which is an aspect of Probabilism, that we must give credence 1 to all tautologies, and the requirement that we must proportion our credences to our evidence. The worry is that we all have ample evidence about our own fallibility. We have all made mistakes occasionally in trying to identify logical truths, and even if we haven’t made mistakes before, it hardly seems like we have good evidence that we are immune from error in the future (cases of this sort are common in the literature on higher-order evidence, see for example Christensen 2007 and Leonard, manuscript). Hence, it seems like complying with Probabilism requires that we give logical truths credence 1, but complying with the requirement to respect our evidence requires that we give logical truths less than full credence. Hence, there is no credence we can assign to a given logical truth that doesn’t violate one requirement or the other. How can we determine the degree to which someone’s credences are rational when the requirements of rationality are in conflict with each other? In order to answer this question, it is helpful to survey the different responses to cases of rational conflict in the literature. Leonard (manuscript) lays out three proposals, which he labels Priority, Conflict and Indeterminacy. His paper argues for endorsing Indeterminacy, but since our purpose here is not to adjudicate between these proposals, I will simply explain how to determine approximations to overall rationality on each proposal. According to the Priority solution, conflicts between requirements of rationality are only apparent, since in each alleged conflict case, one of the requirements of rationality trumps the conflicting requirement(s). It is easy to see that this solution is one we have already encountered above. It is the view according to which there is a lexical ordering of epistemic values and the requirements they give rise to. We assumed in our earlier discussion that it is fixed which rule or value is the trumping one, but one might of course also adopt a view where this changes in different situations. We already discussed how to measure approximations to ideal rationality on a lexical ordering view, so there’s no need to further attend to it here. By contrast, the Conflict account claims that there are genuine rational dilemmas. Hence, the conflict view claims that in the examples we were considering, it is both true that it is rationally required to assign credence 1 to each tautology, and that it is rationally required to assign a credence less than 1 to each tautology. If there are dilemmas at the level of ideal rationality, it is natural to assume that these dilemmas extend to the question of how to order credence assignments according to how rational they are. This idea can be formalized by using the no-comparisons version of the piecemeal strategy. On this strategy, each requirement of rationality and associated divergence produces its own ranking of credence assignments, and an overall ranking of credence assignments in terms of their all-things-considered epistemic rationality can only be achieved in cases in which all the individual rankings agree. The cases in which the rankings disagree are then treated on the Conflict account in the same way as the conflicts at the ideal level. If according to the ranking associated with requirement A, credence assignment c is closer to the ideal than credence assignment b, but according to requirement B, b is ranked higher than c, then both of these claims are true, and they are treated as being in tension. Leonard argues that there are problems with both Conflict and Priority, and instead adopts Indeterminacy. According to this view, when multiple requirements of rationality come into conflict in a particular case, then the thinker is immune from one of the requirements in that case. However, unlike on Priority, which claims that there is a determinate answer to the question of which requirement the thinker is immune from in a given case, the Indeterminacy view claims that when there are two equally good resolutions to a putative rational dilemma, and it is unsettled which one of them is correct, then it is indeterminate which requirement the thinker is immune to. It is easy to see how this account of conflicting requirements extends to measuring degrees of rationality. The most natural approach is to appeal to one of the two versions of the piecemeal strategy just mentioned. The idea is that either the lexical ordering or the no-comparisons piecemeal strategy is used to produce orderings according to each of the conflicting norms. In the conflict cases, one of the orderings is correct, but it is indeterminate which one it is.


7 Evaluating Credence Change

Introduction

In the previous chapter, we saw how we can use distance measures to evaluate and compare credence assignments according to their overall propositional rationality. Depending on the view of epistemic value accepted, different strategies for evaluating the overall rationality of a credence assignment are appropriate. In this chapter, I will show how the measures we developed can be used to evaluate changes in a thinker’s credence assignment. I will consider three types of changes: cases in which the thinker revises their credences (i) without learning new information or adding new attitudes; (ii) without learning new information, but adding new attitudes; (iii) as a response to learning new evidence. Two important qualifications need to be made right away, to ensure that it is clear what the discussion in this chapter is about. First, I am talking about evaluating changes in a thinker’s credences, rather than reasoning. Those two can come apart. For example, suppose a thinker learns that A is true, and their credences change as a result. One way in which this could happen is through reasoning. It is controversial how exactly we should define reasoning, but a reasoned change in credences is interestingly different from a credence change that happens because of a bump on the head, or sophisticated magnetic stimulation, or some such process. The way our distance-based measures of rationality are set up, they cannot distinguish between a change in credences that results from reasoning, and a change in credences that comes about in a different way. Whether a thinker has reasoned or not must be decided by other criteria. What our measures can provide is an evaluation of how a credence change affects the propositional epistemic rationality of a thinker’s credences. The other key qualification about the aims of this chapter concerns the distinction between evaluations, and prescriptions or advice. This chapter is primarily about evaluating a change in the thinker’s credences, and figuring out how the change affects the propositional epistemic rationality of the thinker’s credences. The fact that a credence change would reduce the propositional epistemic irrationality of the thinker’s credences does not immediately entail that it is advisable for the thinker to change their credences in this way. Sometimes this is true, but sometimes it is not. So readers should be cautious about jumping from evaluative facts to giving advice to reasoners. As we investigate how to evaluate credence changes with our measures of irrationality, we will learn some interesting further things along the way. I will argue that the identification of good patterns of reasoning is of very limited benefit in dealing with irrational thinkers. This is in large part because reasoning rules or patterns that are good to use for fully rational thinkers can lead to suboptimal results when employed by irrational thinkers. Furthermore, by examining cases in which thinkers augment their credence assignments by adding credences, we will gather more evidence against using the piecemeal+weighted averaging strategy for measuring degrees of irrationality. In section 1, I will distinguish different ways in which attitude changes can be evaluated, and argue that distance measures of rationality can play a helpful role in doing so. In section 2, I will show how we can use them to evaluate credence revisions in the absence of new information. In section 3, I will apply them to examples of augmentative reasoning, in which thinkers form credences in claims they hadn’t previously considered on the basis of their existing attitudes. In section 4, I will examine cases in which thinkers update their credences upon learning new evidence.

1. Ways of Evaluating Credence Change

Thinkers change their attitudes constantly, and we are interested in how attitude changes affect the propositional epistemic rationality of the thinker’s credences. I will contrast two different ways in which we might evaluate credence changes. One is a natural way of applying our distance measures of irrationality to credence changes. We evaluate the thinker’s credences before and after the change by measuring the degree of propositional (ir)rationality of the thinker’s credences. This way of evaluating credence changes can detect and compare the goodness of the thinker’s epistemic states before and after the change. The other way of evaluating credence changes I will consider has recently gained a fair amount of attention in the literature. It evaluates attitude changes by identifying good patterns that attitude changes can instantiate. Good reasoning patterns are usually understood as being such that, if we start with premises that are good in some way, i.e. by being rational, or true, or justified,


then they (normally) lead to conclusions that share these properties. McHugh and Way (2018b) generalize this to saying that good reasoning patterns normally lead from fitting attitudes to fitting attitudes. Notice that this approach to evaluating attitude changes also employs ways of evaluating the goodness of the attitudes before and after the attitude change. Yet, instead of evaluating individual instances of change, this approach identifies patterns that reliably produce good outputs if the inputs are also good in the relevant way. Notice that neither one of these approaches can determine by itself whether a thinker has reasoned well. Attitude changes can be brought about by (automatic or deliberative) reasoning, but they don’t need to be. My credences can also change as a result of a head injury, some futuristic medical intervention, or some other event along these lines. My credence change resulting from a bump on the head might look like it follows the pattern of a good rule of reasoning, but that doesn’t mean that my credence change instantiates this pattern in a non-accidental way. A thinker might come to think that B is true after consciously entertaining her beliefs that A, and that A entails B, but this sequence doesn’t guarantee that the thinker’s adoption of the belief in B is related to her other beliefs in a way that would make this sequence of events an instance of reasoning. To distinguish between attitude changes that constitute instances of reasoning and attitude changes that don’t, the relationship between the thinker’s attitudes matters. Roughly speaking, for the thinker’s adoption of a belief that B to be the product of reasoning from her beliefs that A and that A entails B, her conclusion-attitude must stand in both a causal and a rational relation to her premise-attitudes. How to explicate this relationship is a notoriously difficult question, particularly because we also want to be able to capture instances of bad reasoning, where, intuitively, a thinker inferred a conclusion from a premise, although the premise doesn’t (completely) support adopting the conclusion (see Wedgwood 2011, McHugh & Way 2018a). I won’t explore here how we should characterize the difference between reasoning and other attitude changes. Rather, I’ll just note that an assessment of whether someone has reasoned well or badly, rather than just changed their attitudes in a way that was beneficial or detrimental from the perspective of propositional epistemic rationality, must take into consideration the relationship between premise- and conclusion-attitudes that is the mark of reasoning.¹ ¹ Why care so much about whether someone’s credence revisions were accomplished as a result of good reasoning? One reason might be that thinkers who use good reasoning to change their attitudes


My focus in this chapter is particularly on identifying methods for evaluating credence changes that work regardless of whether we apply them to rational or irrational thinkers. There are a couple of reasons to think that the first method, which measures the degree of irrationality of a thinker’s credences before and after a change, has important advantages over the method that evaluates credence changes by checking whether the change instantiates a good pattern. First, the pattern method encounters a version of the generality problem. Even if we can tell which attitudes entered the thinker’s reasoning as premise attitudes, and which attitudes the thinker ended up with, there can still be an open question regarding what pattern the reasoning instantiated. Perhaps the thinker’s reasoning followed a good pattern, or it could be a mistaken (and hence accidentally good) application of a faulty reasoning pattern, or perhaps it wasn’t produced as a result of following any pattern of reasoning at all. As we know from discussions of the generality problem in the context of reliabilism, this is a difficult problem to solve (Conee and Feldman 1998). But even if we set the generality problem aside, there is a problem if we want to use the reasoning pattern approach to evaluate how well irrational thinkers reason.² Suppose we adopt the following characterization of good reasoning patterns, which was proposed by McHugh & Way (2018b):

The move from [premise attitudes] P₁ . . . Pn to [a conclusion attitude] C is a good pattern of reasoning iff, and because, other things equal, if P₁ . . . Pn are fitting, C is fitting too.

Here’s why this account is ill-suited for our purposes: When thinkers reason from irrational starting points, then their reasoning is not captured by such an account of good reasoning, because their attitudes are necessarily unfitting. Interestingly, McHugh & Way are worried about the problem of bad input for

can be relied upon to have attitudes that are well supported by their reasons. Being able to reliably form attitudes in an epistemically good manner is a desirable quality that the person whose attitudes only ever change as a result of a bump on the head doesn’t have. Also, it might be necessary for the attitude change to be attributable to the thinker that it came about as a result of reasoning. It seems that this matters for whether the thinker can be criticized or praised for a change in her attitudes. ² McHugh & Way don’t think that the relevant way in which the input attitudes must be good or fitting is by being rational. They worry that being rational is not a worthwhile aim, and that we thus lack an explanation of what the point of reasoning is (2018a). But we have seen that on the conceptions of epistemic value we examined, this need not be a worry. For example, we saw that being rational can promote values such as having true beliefs, so there’s no mystery as to why thinkers should aim to have rational attitudes: because they promote epistemic values.


reasoning. They want to allow for good reasoning to be able to proceed from “mistaken or unjustified” attitudes. Their example is one in which a person decides to take a sip from the glass in front of them, mistakenly believing it contains gin, although it actually contains petrol. It’s easy to see that, if this belief were true, then it would be a good decision to take a sip from the glass; hence, this case is covered by their account of good reasoning. But the problem with attitudes that are irrational in particular ways, for example by being incoherent, is that they are always irrational; they aren’t rational in alternative contexts. Unlike in the case of the belief that there is gin in the glass, which might be true in another context, there is no situation in which it is fitting for a thinker to have credences in A and ~A that don’t sum to 1. Suppose we have a case of good reasoning in which P₁ . . . Pn are irrational credences, and the output of the reasoning is a rational credence assignment to the same statements that P₁ . . . Pn are defined over. Then, it is obvious that the input attitudes must be unfitting, whereas the conclusion attitudes are fitting. This is an instance of good reasoning, but the antecedent of the conditional “if P₁ . . . Pn are fitting” must be false, which means that the definition covers this case only vacuously, rather than classifying it non-vacuously as an instance of good reasoning. By contrast, if we simply evaluate the rationality of the thinker’s attitudes before and after the revision, we can easily capture that an improvement has occurred. By using the method of evaluating the rationality of the thinker’s attitudes before and after a credence change we can also show that if an irrational thinker tries to use a rule of good reasoning, but uses irrational attitudes as input, this can unnecessarily increase irrationality in their attitudes. We will see an example of this in section 3. The other problem with focusing on patterns of good reasoning is that there are a lot of ways of being irrational, so it’s not obvious that trying to fit all attitude revisions that lead to a desirable reduction in irrationality into specific patterns of reasoning is all that helpful. It seems much more straightforward to simply use measures of rationality like the ones developed previously to detect whether a change in a thinker’s irrational credences has led to an improvement in the propositional epistemic rationality of their credences. While detecting patterns is certainly interesting and might help us eventually in the quest of formulating advice for irrational thinkers, I think the task of evaluating belief changes of irrational thinkers can be accomplished without paying too much attention to them. I will thus use the method of measuring the degree of irrationality of a thinker’s beliefs before and after a belief change in the rest of the chapter. When there are cases that shed additional light on the applicability of the pattern method, I will point these out.


2. Revising Credences

In this section, I will show how we can evaluate credence changes in which a thinker revises their existing credences, without adding any new attitudes, and without learning new information. Such credence revisions provide us with specific instances of good attitude changes that are not captured by the definition of a good reasoning pattern we encountered in section one. Suppose a thinker has a credence assignment c that is defined over some set of statements {A₁, . . . , An}, and, without learning anything new, the thinker’s credences change to a new credence assignment c’ that is also defined over the same set, but distinct from c. There are two possibilities regarding c: either c is in the range of credence assignments that is permitted by the principles of ideal propositional rationality for the thinker, or it is not (I am bracketing the indeterminacy case discussed in Appendix B of Chapter 6 to make things simpler). Depending on the starting point, there are different options for the outcomes of the revision:

Input: c is an ideally rational credence assignment
Possible Outputs:
- c’ is an ideally rational credence assignment
- c’ is not an ideally rational credence assignment

Input: c is not an ideally rational credence assignment
Possible Outputs:
- c’ is an ideally rational credence assignment
- c’ is not an ideally rational credence assignment, but more rational than c
- c’ is not an ideally rational credence assignment, but less rational than c
- c’ is not an ideally rational credence assignment, but just as irrational as c
- c’ is not an ideally rational credence assignment, but not comparable to c.

As explained previously, we’re measuring the degree of irrationality of a credence assignment by using an appropriate distance measure in combination with either the bundle strategy or a version of the piecemeal strategy. Most of these cases are quite straightforward. From the point of view of propositional epistemic rationality, it constitutes an improvement when a thinker’s credences change from being less rational to being more rational. A change from a more rational to a less rational credence assignment constitutes a worsening. As we saw in the previous chapter, there are views of rationality on which different epistemic values are incomparable. This type of view opens up the possibility that c and c’ might not be comparable, i.e. c’ is not ranked with


regard to its degree of rationality compared to c. This can happen when c’ is better than c at approximating some rational principles, and worse at approximating others. This view can’t really offer an evaluation of the change in credence from c to c’. It can’t even say that the change is of neutral epistemic value, because that would suggest that c and c’ are somehow equally rational. Rather, this view must be silent about credence changes of this type. While this is not a very satisfying result, it is an unsurprising consequence of any view that denies the comparability of different epistemic values. What about revisions that take thinkers from a rationally permissible credence assignment to another rationally permissible credence assignment, where those credence assignments are distinct from each other? In principle, this seems possible, at least according to permissivist views of rationality. But in fact, once we combine a permissivist view with a rule about updating on new evidence that restricts the kinds of permissible credence changes, such as conditionalization, this possibility is ruled out. On a typical permissivist position, there is more than one prior credence function that is rationally permissible. But if we also adopt the view that, if a thinker has rational credences, they should not change their credences unless they learn new evidence (and the permissible changes are uniquely determined by an update rule), then once a prior has been chosen, the rationally permissible credences at any given time for this agent are uniquely determined (see White 2005 and Meacham 2014 for discussion). Another way to think about this case is as one in which a thinker updates their credences on tautologous evidence. Conditionalization says that if you update on a tautology, you should stick to your existing credences. Hence, a change in credences without learning new information essentially constitutes a violation of the conditionalization rule. Of course, while conditionalization is part of the canonical rules of Bayesian epistemology, some might wish to reject it in favor of more permissive updating rules. If an updating rule is adopted that allows more than one way of responding to evidence, and that permits changing one’s credences without new evidence, it’s possible that there can be a change from one to another permissible credence function. The framework I propose is compatible with updating rules that effectively enforce uniqueness once a prior credence assignment has been chosen, as well as with more permissive approaches to updating and credence changes.³ ³ Kenny Easwaran has suggested to me that a permissivist who endorses conditionalization as the uniquely rational updating strategy might not consider a change in credences to another initially permissible credence function rationally problematic if it results from an event that’s definitely not reasoning, such as a bump on the head. Yet, the same change would be considered irrational if it was the result of (bad) reasoning. I think this might be correct, I won’t try to settle this question here.
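The point that conditionalizing on tautologous evidence leaves credences unchanged can be spelled out in one line, assuming only that the thinker’s credence function c is probabilistic (so that c(T) = 1 when T is an arbitrary tautology): the updated credence is c_new(A) = c(A | T) = c(A & T)/c(T) = c(A)/1 = c(A). Any change in credences without new evidence therefore departs from what conditionalization permits.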


There can also be cases in which a thinker ends up with distinct, but equally irrational credence assignments. From the point of view of propositional epistemic rationality, nothing has changed for the thinker. We can think of different ways in which this might happen. Perhaps the thinker attempted to revise their credences to make them less irrational, but ended up making changes that left them with different, but overall equally irrational credences. But there could also be cases of unmotivated flip-flopping between equally irrational credence assignments. While the latter case presumably lacks the additional features required to count as reasoning, the former case might at least be the result of somewhat decent reasoning. Perhaps the thinker corrected some obvious irrationality in one portion of their credence assignment, without noticing that this would make those credences more at odds with another portion of their credence assignment. In any case, the measures of degrees of irrationality make the correct contribution to evaluating the thinker’s credences in these cases. As I already indicated in section 1, the identification of good reasoning patterns by means of a definition like the one provided by McHugh and Way is of limited value when applied to irrational thinkers. Cases of belief revision of the type discussed in this section can give us examples of good reasoning that their definition does not cover. Suppose a thinker begins with a set of credences that are irrational, perhaps due to being incoherent. The thinker then revises their credences in such a way that significantly reduces or even eliminates their degree of irrationality. If the thinker’s starting credences are such that there is no situation in which they could be rationally permissible, then the antecedent of the conditional “if P₁ . . . Pn are fitting, . . . ” must be false, rendering the definition of a good reasoning pattern vacuously true (although it is difficult to evaluate how the “because” clause in the definition works in this kind of case). Perhaps a better way of identifying good reasoning patterns would avoid this problem. For now, we will leave open the question whether good instances of credence revisions as defined in this section can be fruitfully classified as following particular patterns.

3. Augmenting Credences

This section is concerned with a type of reasoning that doesn’t involve learning new evidence, but still leads the thinker to adopt new attitudes by drawing out consequences of their existing attitudes. I call it augmentative reasoning. (From here on out in this chapter, I will use “reasoning” in a broad sense that also includes mere credence changes, to improve the text’s readability.) I first examine augmentative reasoning when done by fully rational thinkers, and subsequently when done by irrational thinkers. My discussion will have three main upshots: First, I consider a principle called Preservation, which says that, in any instance of augmentative reasoning, an irrational thinker is able to augment their credences in a way that preserves their initial degree of irrationality. I show that Preservation is true if we measure degrees of irrationality with the bundle strategy, but is false if we measure degrees of irrationality with any version of the piecemeal strategy. The relevant proofs also demonstrate that no two strategies always agree on how the thinker should augment their credences so as to minimize increases in irrationality. Second, I use these results to show that it is not a good idea for irrational thinkers to use the same rules of reasoning for augmenting their credences as perfectly rational thinkers. Third, my findings raise problems for the piecemeal+weighted averaging strategy for measuring degrees of irrationality.

3.1 Augmenting Credences under Ideal Conditions

Augmentative reasoning is the focus of most of the literature on reasoning, and people usually talk about it as concerning outright beliefs (see for example Broome 2013, Boghossian 2014, McHugh & Way 2018a, b). Modus ponens reasoning is a standard example of this kind of reasoning, where a thinker comes to believe some claim A on the basis of their existing beliefs that B and that (if B then A). A very similar kind of reasoning can be done with credences. I will call this type of reasoning augmentative reasoning. When reasoning in this way, the thinker holds their existing credences fixed, and assigns credences to new propositions on their basis. The thinker’s new credences are constrained by the probability axioms in combination with their old credences. We can see how this works in rational thinkers by considering the following example:

c₁(Smith will be the next president) = 0.1
c₁(Jones will be the next president) = 0.5
c₁(Murphy will be the next president) = 0.4

Say our thinker doesn’t currently have a credence assigned to the disjunction (Smith or Jones will be the next president). But they can engage in augmentative reasoning to form an opinion about it. If they keep their existing credences unchanged, the only credence assignment to (Smith or Jones will be the next president) that is probabilistically coherent with their current credences is 0.6. Of course, as we discussed in the previous section, thinkers are not stuck with their existing attitudes; they are open to revision. Sometimes, a thinker might come to revise their existing attitudes once they notice that they have implausible implications when taken together. But we will set these cases aside here and just focus on instances of augmentative reasoning in which the thinker is, for one reason or another, stuck with their existing attitudes. Again, there are two cases we need to consider: the case in which a thinker starts with credences that are rationally permissible, and the case in which a thinker starts with credences that violate the constraints of ideal rationality in some way. First, consider thinkers whose credences are permissible according to the constraints of ideal rationality. Conceive of a thinker as someone who still has gaps in their credence assignment, i.e. they aren’t opinionated about every statement they can entertain. For such a thinker to augment their credences in a rational way, their augmented credence assignment should still be rationally permissible. This is all pretty straightforward. But we might ask whether there is actually a specific strategy the thinker can employ in order to ensure that they are augmenting their credences in a way that leads to a rationally permissible outcome. They can achieve this by ensuring that their new credence is coherent with their existing credences. We are specifically interested here in cases in which their existing credences actually make a precise recommendation for what their new credence should be. (Though of course there could also be cases where their existing credences merely put constraints on their new credence without prescribing a precise value. I’ll set this case aside for now.) These are cases in which the thinker forms a new credence by drawing out the consequences of their existing credences. A rule of reasoning, or reasoning pattern, that the thinker can employ for this purpose is the following:

Augmentative Inference Rule (AIR): Suppose the thinker is holding their existing credences fixed, and they are considering what credence to assign to some gap proposition A. For any subset of their credences S, if, given their credences in S, x is the only credence they can have in A that is consistent with the probability axioms, then they should assign c(A) = x.


Of course, this is not the only possible rule by which a rational thinker could augment their existing credences and stay rational. But this rule seems quite natural, for several reasons: It allows the thinker to use any subset of their credences as their starting point for reasoning that determines a precise credence for the gap proposition. This seems more natural as a rule of reasoning than, for example, a rule that requires a thinker to consult all of their existing credences as premise-attitudes when selecting an appropriate credence for the gap proposition. Standard examples of rules of deductive reasoning work in very similar ways, where a thinker infers a conclusion from a subset of their beliefs that entail this conclusion. Also, this rule is a good pattern of reasoning by McHugh & Way’s definition, since it leads to rational conclusions when its inputs are rational credences, according to our measures of rationality. We will examine in the next subsection whether AIR is also a good rule of reasoning to use for thinkers who start from irrational credence assignments.
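Two simple instances of AIR can be written down directly; the sketch below (Python, with illustrative helper names that are not from the text) recovers the credence of 0.6 in the disjunction from the election example, and the coherent credence in a negation, which reappears in the example of section 3.3.

# Two special cases of AIR: for a subset containing only c(X), the unique coherent
# credence in ~X is 1 - c(X); for credences in mutually exclusive X and Y, the
# unique coherent credence in (X or Y) is c(X) + c(Y).
def air_negation(c_x):
    return 1 - c_x

def air_disjunction_exclusive(c_x, c_y):
    return c_x + c_y

# Election example: c1(Smith) = 0.1, c1(Jones) = 0.5, c1(Murphy) = 0.4
print(air_disjunction_exclusive(0.1, 0.5))  # 0.6: the only coherent credence in (Smith or Jones)
print(air_negation(0.1))                    # 0.9: the only coherent credence in ~Smith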

3.2 Augmentative Reasoning for Irrational Thinkers

What about thinkers who begin from irrational starting points? How should they augment their credences? One response to this question is that irrational thinkers should become rational before they engage in any other kind of reasoning, or, they should at least somehow quarantine the irrational attitudes, so as not to infect the thinker’s reasoning that employs their non-defective attitudes (see e.g. Harman 1986, p. 15). We can grant that from the perspective of ideal rationality, it is very desirable that thinkers only reason from rational starting points, and that it would therefore be good if irrational thinkers could become rational before engaging in any reasoning other than revising their irrational credences. But at the same time, this is unrealistic for human reasoners. As mentioned before, for human thinkers, being ideally rational by Bayesian standards is not an attainable goal, so they can hardly abstain from reasoning until they’ve made their attitudes fully rational. With credences, the idea that thinkers should quarantine their defective attitudes is also problematic. Requirements like coherence are global requirements, so it’s not always obvious which individual attitudes are to blame for a failure of global coherence. Perhaps this is possible in the case of tautologies and contradictions to which the thinker assigns middling credences, but otherwise, pinpointing which attitudes are the incoherent ones is difficult. Moreover, if it were possible for the thinker to locate and isolate irrational attitudes, doing so would often be just as difficult as actually fixing the irrationality, so this proposal isn’t really more feasible than the one that thinkers should become rational before engaging in reasoning (other than revising). We thus need to ask how irrational thinkers can augment their credences in optimal ways, given that they’re irrational. The question is: given that a thinker is augmenting their credences from an irrational starting point, what is the best outcome they can achieve from the perspective of ideal rationality? Suppose an irrational thinker engages in augmentative reasoning. What are the possible outcomes regarding the degree of rationality of their credences? In principle, there are four possibilities, like before: their augmented credences are more rational, less rational, or equally rational compared to their starting credences, or they are incomparable to their starting credences. Given how we are measuring degrees of irrationality, we can rule out the first possibility: it is not possible for a thinker to become more rational as a result of augmentative reasoning. We’ve been using additive distance measures to generate various strategies for determining how closely a thinker’s credences approximate rationally permissible credence assignments. When a thinker adds a new credence, their existing credences remain the same. Hence, their new credence can add new irrationality into the mix, but it cannot diminish the irrationality of their existing credences. Hence, the thinker’s augmented credences, if they are comparable to their starting credences, can only be as irrational or more irrational than their starting credences. To demonstrate what the best possible outcome is, we need to look at the different measuring strategies separately. We will begin with the bundle strategy. According to the bundle strategy, we measure the degree of irrationality of a credence assignment c by choosing an appropriate distance measure or divergence d, and determining how closely c approximates some d-closest rationally permissible credence assignment. If the bundle strategy is used, we can prove the following claim:

Preservation: In any instance of augmentative reasoning that a thinker engages in, it is always possible for them to augment their credence assignment in such a way that their new credence assignment is as close to some closest rational credence assignment as their initial credence assignment c.

Proof of Preservation Using the Bundle Strategy

In measuring the irrationality of a credence assignment, we treat the credence assignment as a vector X = (x₁, . . . , xn), where each component of the vector


represents one of the thinker’s credences in a proposition. The degree of irrationality of this credence assignment is then determined by finding a vector Y = (y₁, . . . , yn), which represents a credence assignment that is rationally permissible, and minimizes the distance between X and Y. The divergences we have used to determine the distance between the vectors are additive. This means essentially that the contribution that each component of the vector makes to the distance between the vectors is computed separately and summed up. Now, suppose X = (x₁, . . . , xn) represents the initial credence assignment c of an irrational thinker, and Y = (y₁, . . . , yn) represents some closest rationally permissible credence assignment c* to c. When a thinker engages in augmentative reasoning, they expand their existing credence assignment by adding a credence for some proposition A. We can represent the expanded irrational credence assignment c+ with a vector X+ = (x₁, . . . , xn, xn+1) that has an additional component. Suppose now that Y+ = (y₁, . . . , yn, yn+1) represents the expanded version of c*, which now includes an assignment c*+(A) = a. The value c*+(A) = a, represented as yn+1 in the vector, is a credence that can be coherently added to the rational credence assignment c*.⁴ How does the degree of irrationality of c relate to the degree of irrationality of c+? First, since c is contained in c+, there can’t be any decrease in the distance to a closest rational credence assignment. So, c+ must either be as irrational as c or more irrational. Which option it turns out to be depends on whether the added credence increases the overall distance to some closest rational credence assignment. We said that c* is expanded to c*+ by adding a coherent credence for the new proposition A, c*+(A) = a. For c+, the new credence in A can either be c+(A) = a, or c+(A) ≠ a. If c+(A) = a, then |xn+1 − yn+1| = 0, so the new components of the vectors X+ and Y+ don’t increase the distance between them compared to the distance between X and Y. If c+(A) ≠ a, then |xn+1 − yn+1| > 0, and so the distance between X+ and Y+ is greater than the distance between X and Y. This means that if the thinker wants to augment their credence assignment without becoming more irrational, they must ensure that the value of their new credence coheres with the credences in c*, which is the (or a) closest rational credence assignment to their existing credences. If they choose a credence for A which does not cohere with the credences in the (or one of the) closest rational credence assignment to their initial credences, their irrationality increases as a result of adding the new credence. ⃞

⁴ It is assumed here that if c* is a rational credence assignment, then its coherent expansion c*+ (where the newly added credence is the unique one coherent with c*) is also rational, and if c*+ is rational, so is its contraction c*. This assumption strikes me as extremely plausible, but I am not providing a separate argument for it here.
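A toy case may help to see the proof at work. The sketch below (Python, illustrative only) reuses the Principal Principle example from earlier, where the uniquely rational credences over A and ~A are 0.5 and 0.5, and augments an irrational assignment with a credence in the tautology (A or ~A), whose only coherent value relative to that rational assignment is 1. Matching that value preserves the thinker’s squared Euclidean distance from the closest rational assignment; any other value increases it.

def sq_euclidean(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

c_star = [0.5, 0.5]     # closest (here: unique) rational assignment over A, ~A
c      = [0.7, 0.35]    # the thinker's irrational starting credences

print(sq_euclidean(c, c_star))                     # approx. 0.0625
print(sq_euclidean(c + [1.0], c_star + [1.0]))     # approx. 0.0625: degree of irrationality preserved
print(sq_euclidean(c + [0.8], c_star + [1.0]))     # approx. 0.1025: an incoherent addition increases it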


Hence, if we use the bundle strategy for measuring degrees of irrationality, then for any instance of augmentative reasoning they might engage in, the irrational thinker can assign a new credence that keeps their degree of irrationality constant. Doing so requires assigning the credence to the gap proposition that coheres with the rational credence assignment that serves as the point from which the degree of irrationality of the thinker’s credences is measured. What if we use a piecemeal strategy instead for measuring degrees of irrationality? On the piecemeal strategy, we measure the distance from the closest permissible credence assignment for each requirement of rationality separately. Once these separate measures have been obtained, there are three options:

(i) The scores are aggregated into a single score, for example by computing the average (piecemeal+weighted averaging).
(ii) The requirements of rationality are lexically ordered. Credence assignments are ranked with regard to their distance from some closest permissible credence assignment according to the highest ranked requirements, and distance to compliance with lower ranked requirements can serve as a tiebreaker (piecemeal+lexical ordering).
(iii) On the incomparability view, we cannot compare the degree to which a credence assignment approximates compliance with one requirement with the degree to which it approximates another requirement. We can only generate orderings of credence assignments when there is agreement between the different constraints about how credence assignments should be ranked (piecemeal+no comparisons).

To see what happens with regard to each version of the piecemeal strategy, let’s assume there is an irrational credence assignment c that is supposed to be augmented to some new credence assignment c+, and let A be the proposition to which the new credence is assigned. Further, let’s assume that there are two requirements of rationality, R1 and R2, and let’s assume that cR1 is some closest coherent credence assignment to c that satisfies R1 and cR2 is some closest coherent credence assignment to c that satisfies R2. Moreover, assume we are using some additive divergence d to measure the distance to some closest credence assignment that satisfies a requirement of rationality, and also assume that cR1 ≠ cR2. First, we will show that under these assumptions, Preservation is false if we measure degrees of irrationality with the piecemeal+weighted averaging strategy.


Proof of the Falsity of Preservation Given Piecemeal+Weighted Averaging

In order to maintain c’s degree of irrationality, c+ would have to assign a credence to A that does not increase the average distance to some closest rational credence assignment. In order to accomplish that, c+ would have to assign a credence to A that is equal to cR1+(A) and cR2+(A). But since cR1 ≠ cR2, it can easily be the case that cR1+(A) ≠ cR2+(A). When cR1+(A) ≠ cR2+(A), then the new credence that the thinker assigns to A must inevitably increase the distance to some closest coherent credence assignment that complies with R1 or the distance to some closest coherent credence assignment that complies with R2. This means that the average distance to some closest credence assignment that complies with a requirement of rationality will also increase. As a result, it is not guaranteed, when we use the piecemeal+weighted averaging strategy to measure a thinker’s degree of irrationality, that they can augment their credences in such a way that they maintain their previous degree of irrationality. Augmenting their credences will make them more irrational. The only exceptions are cases in which there is agreement between the different closest credence assignments cR1 and cR2 regarding what credence should ideally be assigned to A. In all other cases, the best the thinker can do is assign c+(A) in such a way that the increase in irrationality is minimized. ⃞

This shows that Preservation is false on the piecemeal+weighted averaging measuring strategy. If we add some more specifications, we can be more precise about which assignment c+(A) is going to increase the thinker’s degree of irrationality the least. Moss (2011) shows that, if we have two coherent credence assignments, and we use the same strictly proper scoring rule to measure the epistemic disutility of each credence assignment, then the compromise credence assignment between the two that minimizes their joint (or average) epistemic disutility is the one that is the equally weighted average between them. This result is interesting for our purposes. In previous chapters, we have used strictly proper scoring rules to measure the divergence from some closest credence assignment that complies with a rational requirement, by computing the increase in the expected score that results from adopting the irrational credence assignment instead of some closest credence assignment that complies with the requirement. If we use such a divergence d to measure how closely c approximates cR1 and cR2, then the problem of finding the credence in A that increases the thinker’s irrationality the least is the same problem as the problem of finding the compromise credence between cR1+(A) and cR2+(A) that minimizes their average utility score. Moss’ result tells us that this is the linear average between the two credences, which is 0.5 · cR1+(A) + 0.5 · cR2+(A). Hence, under these measuring constraints, the piecemeal+weighted averaging strategy has the result that the assignment c+(A) that increases the thinker’s degree of irrationality the least is equal to the credence that lies halfway between cR1+(A) and cR2+(A). What if we instead adopt the piecemeal+lexical ordering strategy? We can show that Preservation is false in this scenario as well:

Proof of the Falsity of Preservation Given Piecemeal+Lexical Ordering

Assume everything is defined as before, except that now R1 and R2 are ordered in such a way that compliance with R1 trumps compliance with R2. In this case, the thinker can ensure that they do not increase the distance to some closest credence assignment that complies with R1 by assigning c+(A) = cR1+(A). As a result, c and c+ are on a par regarding approximating compliance with R1. However, as we learned before, assigning a credence to A in this way increases the thinker’s distance from some closest credence assignment cR2, unless cR1+(A) = cR2+(A). If the thinker’s new credence assignment c+ is less close to complying with R2 than their old credence assignment c, then this acts as a tiebreaker, and c+ is ranked as being overall less rational than c. Hence, this type of case acts as a counterexample to Preservation. ⃞

Interestingly, this strategy makes a different recommendation than piecemeal+weighted averaging for what constitutes the best possible augmentation of the thinker’s credences. On the piecemeal+lexical ordering strategy, the best possible augmentation of the thinker’s credences is one that assigns c+(A) = cR1+(A), where R1 is the highest ordered requirement of rationality. This assignment will leave them less rational, however, unless this credence assignment to A is also optimal according to all lower ranked requirements of rationality. By contrast, we saw that on the piecemeal+weighted averaging strategy, the best new credence to adopt in A lies halfway between cR1+(A) and cR2+(A). Lastly, we will consider the piecemeal+incomparability strategy. On this measuring strategy, the degrees to which a credence assignment approximates compliance with different principles of rationality are incomparable. In the scenario we have been considering where cR1+(A) ≠ cR2+(A), there is no credence assignment to A that leaves the thinker’s distance from complying with both R1 and R2 the same, hence Preservation is false here as well. Which credence assignment to A is best on the piecemeal+incomparability strategy, in that it increases the thinker’s irrationality the least? The thinker


must either choose c+(A) = cR1+(A) or c+(A) = cR2+(A) or an entirely different value for c+(A). Depending on which the thinker chooses, her credences will either stay at the same distance from complying with one requirement of rationality and become worse with regard to the other, or their distance from complying with both requirements will increase. In terms of the orderings that we are allowed to generate on the incomparability strategy, a credence assignment c+ that is worse than c at approximating compliance with both R1 and R2 is overall less rational than an alternative credence assignment that is only worse than c at approximating compliance with one of the requirements, and equally good at approximating the other. Hence, the piecemeal+incomparability strategy recommends choosing either c+(A) = cR1+(A) or c+(A) = cR2+(A), although either assignment would leave the thinker with a credence assignment c+ that is less rational than c. To sum up: The principle Preservation, which says that a thinker is always able to augment her credences in a way that preserves her initial degree of irrationality in any instance of augmentative reasoning they might engage in, is true if we measure degrees of irrationality with the bundle strategy, but it is false if we measure degrees of irrationality with any version of the piecemeal strategy. Moreover, no two strategies always agree on how the thinker should augment their credences so as to minimize increases in irrationality.
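The piecemeal+weighted averaging recommendation discussed above can also be checked numerically. The sketch below (Python, illustrative only; the target values cR1+(A) = 0.3 and cR2+(A) = 0.7 are hypothetical, and squared distance stands in for a Brier-style divergence) searches over candidate credences and finds that the average divergence is minimized at the halfway point, as Moss’ result predicts.

def avg_sq_distance(x, target1, target2):
    # Equally weighted average of the squared distances from the two credences
    # recommended by the closest R1- and R2-satisfying assignments.
    return 0.5 * (x - target1) ** 2 + 0.5 * (x - target2) ** 2

target_r1, target_r2 = 0.3, 0.7        # hypothetical values for cR1+(A), cR2+(A)
candidates = [i / 100 for i in range(101)]

best = min(candidates, key=lambda x: avg_sq_distance(x, target_r1, target_r2))
print(best)   # 0.5: the equally weighted average of the two recommendations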

3.3 Upshots

We can use these insights about optimal augmentation strategies according to the different measures in two ways: First, we can use them to investigate whether it is a good idea for irrational thinkers to use the same rules of reasoning for augmenting their credences as perfectly rational thinkers. Second, we can use them to further assess the plausibility of the different strategies for measuring degrees of irrationality. We now know the optimal outcomes for an irrational thinker’s augmentations of their credence assignment, so we can see what happens if they try to use the same rule of reasoning for augmenting their credences that ideal thinkers can use. Recall that a good rule of reasoning for ideal thinkers to use in augmenting their credences is the following:

Augmentative Inference Rule (AIR): Suppose the thinker is holding their existing credences fixed, and they are considering what credence to assign to some gap proposition A. For any subset of their credences S, if, given their credences in S, x is the only credence they can have in A that is consistent with the probability axioms, then they should assign c(A) = x.

This rule can also be used by irrational thinkers to augment their credence assignments. However, there is an important difference between cases in which irrational thinkers use AIR and cases in which rational thinkers use AIR. For ideally rational thinkers whose credences are coherent, every subset of their credence assignment that recommends a precise credence for the new proposition A will make the same recommendation. By contrast, if a thinker has irrational credences that are incoherent, different subsets of the thinker’s credence assignment can make different recommendations. Here’s an example of how this can happen. Assume a thinker has the following credences:

c₂(A) = 0.49
c₂(A & B) = 0.2
c₂(A & ~B) = 0.2
c₂(~A) = ?

The thinker seeks to augment their credence assignment by assigning a credence to ~A. If they look towards the subset of their credence assignment that contains their credence in A, and they apply AIR, they will assign c₂(~A) = 0.51. By contrast, if they look towards the subset of their credence assignment that contains their credences in the two conjunctions, and they apply AIR, then they will assign c₂(~A) = 0.6. In order to determine whether it is a good idea for the thinker to find their new credence in ~A by using AIR, we need to evaluate whether the assignments proposed by AIR are optimal. To do so, we need to choose a more specific scenario. We must settle on a measure of the degree of irrationality of the thinker’s credences, and specify which requirements of rationality are in force. Let’s assume that we are in a scenario in which the relevant requirements of rationality are the Indifference Principle (applying to both A/~A and B/~B), and Probabilism. Furthermore, assume that we use the bundle strategy in combination with some additive divergence for measuring degrees of irrationality. Given those parameters, the closest credence assignment that meets all of the requirements is c₃(A) = 0.5, c₃(A&B) = 0.25, c₃(A&~B) = 0.25. We know, given our results above, that finding the assignment c₃(~A) that coheres with c₃ is the optimal credence to assign to ~A for someone whose credence assignment is c₂, because it does not increase their degree of irrationality. But c₃(~A) = 0.5 diverges from both


possible applications of AIR to the thinker’s existing credences. Hence, the irrational thinker cannot use the same rule of reasoning as the ideal thinker to find optimal augmentations of their credences. Our example assumes that the bundle strategy for measuring irrationality is used, and that there are two requirements of rationality. We can generate similar examples in which AIR does not yield an optimal augmentation of the thinker’s credences if we use one of the piecemeal measures, or if we assume that there are more or fewer requirements of ideal rationality (for some examples in which AIR fails to deliver optimal results when only probabilistic coherence is taken to be rationally required, see Staffel 2017b). Coming up with the relevant examples is left to the reader. This sheds additional light on McHugh & Way’s suggestion for identifying rules of good reasoning. AIR passes their test for a rule of good reasoning, because it delivers rational attitudes as output when rational attitudes are used as input. Their account leaves open what happens when a rule is used with attitudes as inputs that are necessarily unfitting. It turns out that suboptimal outputs are generated. Hence, irrational thinkers can’t assume that it’s a good idea to use rules of reasoning that have proven to generate optimal outputs when used by rational thinkers.⁵
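To make the arithmetic behind this verdict easy to check, here is a minimal illustrative sketch in Python (the book itself contains no code; the variable names and layout are my own). It derives the two conflicting AIR recommendations from the example credences c₂ and contrasts them with the value that coheres with the closest rational assignment c₃.

# Minimal illustrative sketch, not from the text; names are my own.
# The thinker's incoherent credences c2 over A, A&B, and A&~B:
c2 = {"A": 0.49, "A&B": 0.2, "A&~B": 0.2}

# AIR applied to the subset {A}: the probability axioms force c(~A) = 1 - c(A).
rec_from_A = 1 - c2["A"]                              # 0.51

# AIR applied to the subset {A&B, A&~B}: the two conjunctions partition A,
# so c(A) = c(A&B) + c(A&~B), and hence c(~A) = 1 - that sum.
rec_from_conjunctions = 1 - (c2["A&B"] + c2["A&~B"])  # 0.6

# The credence that coheres with the closest rational assignment c3
# (c3(A) = 0.5), and so preserves the thinker's degree of irrationality:
optimal_augmentation = 0.5

print(rec_from_A, rec_from_conjunctions, optimal_augmentation)
# Neither application of AIR yields the optimal value of 0.5.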

⁵ One might object here that, if an irrational thinker applies AIR to a coherent subset of their credences and thereby generates a new credence, this should be counted as good reasoning, because if their starting credences were part of a larger rational credence assignment, their augmented credences would be optimal. Yet, there is reason to question this line of argument, because it overgenerates instances of good reasoning. To see why, consider a slightly different kind of case. Say the set E contains the thinker’s total evidence, or all of their reasons that bear on what attitude they should have towards some claim A. Suppose further that, considering all of E, the thinker should arrive at a middling credence in A. However, assume further that if we only look at subsets of E, these subsets taken by themselves might support very different attitudes towards A, such as very high or very low credence, depending on which subset is selected. This can happen, for example, if we think that there are evidential requirements beyond coherence, such that different subsets of E taken on their own can provide different evidence and thus recommend different credences. If a thinker then went on to ignore some of their evidence, and formed, say, a high credence in A based on a subset of their evidence, we would think that this thinker had made a mistake, or reasoned poorly. In technical jargon, we might say that the thinker violated the requirement of total evidence (see e.g. Carnap 1962, p. 211). An incoherent thinker who just picks a subset of their credences and reasons on its basis is doing something very similar. They just pick some of their attitudes to form a new credence in, say, A, even though taking additional attitudes they have into account would lead them to form a different credence in A. This scenario is very similar to the one in which a thinker violates the requirement of total evidence. (Notice that this cannot happen to a fully rational thinker who uses the AIR rule.) Since it’s plausible to criticize the thinker who ignores relevant evidence for having reasoned poorly, it’s also plausible that the thinker who relies on a subset of their overall irrational credences is criticizable. Of course, one could bite the bullet here and claim that the thinker who violates the total evidence requirement still reasoned well on the basis of the evidence they are attending to. But the proponent of this position still needs to give an alternative account of what kind of mistake this thinker has made, since their reasoning is clearly defective. Regardless of how we resolve this, the case of augmentative reasoning provides further support for the claim I advanced in section 1: identifying good reasoning patterns is of limited use when we are considering thinkers who reason on the basis of attitudes that are necessarily rationally defective.


We will now take another look at our measuring strategies, and their respective recommendations about how to augment one’s credences. We can use these results to shed more light on how to best measure degrees of irrationality. In the previous chapter, I argued that the view on which there is only a single source of epistemic value pairs naturally with the bundle strategy for measuring irrationality. But we were less sure about which measure to use for the view on which there are several comparable epistemic values giving rise to requirements of rationality. This view can be paired with both the bundle strategy and the piecemeal+weighted averaging strategy for measuring degrees of irrationality. Once we take a closer look at the assessments given by the two strategies for how irrational thinkers should best augment their credences, we will see that the bundle strategy’s assessments are superior to the assessments of the piecemeal+weighted averaging strategy. This is because, if we evaluate the newly acquired credences in isolation, they will always be rationally permissible if they were recommended by the bundle strategy, but not when they were recommended by the piecemeal+weighted averaging strategy. This will give us further reasons to choose the bundle strategy instead of the piecemeal+weighted averaging strategy to measure degrees of irrationality on views on which there are multiple comparable epistemic values. We will assume that there are no rational conflicts, i.e. the requirements of rationality that apply to a thinker’s credences in a given situation identify a set of credence assignments that comply with all of the rational requirements simultaneously. The bundle strategy recommends that an optimal augmentation of an irrational thinker’s credences should assign a credence to the new proposition that coheres with some closest rationally permissible credence assignment to the thinker’s credences. By contrast, this is not the case for the piecemeal+weighted averaging strategy. Since approximations to compliance with different principles of rationality are computed separately, and the results are then aggregated, the optimal augmentations of the thinker’s credences are not necessarily ones that cohere with one of the credence assignments in the set of credence assignments that are permissible in light of all of the relevant requirements of rationality. Hence, if we consider just the augmented credences by themselves, i.e. c+\c, the bundle strategy guarantees that these credences are rationally permissible according to all the requirements of ideal rationality, whereas the piecemeal+weighted averaging strategy does not. Viewed by themselves, the credences in c+\c can be rationally defective by the lights of the piecemeal+weighted averaging strategy. I submit that this favors the bundle strategy. It ensures that its recommendations for how to assign new credences hold up to the standards of ideal rationality when the


new credences are viewed in isolation, whereas this is not the case for the piecemeal+weighted averaging strategy. This result favors the bundle strategy over the piecemeal+weighted averaging strategy for measuring degrees of irrationality on the view that there are multiple, comparable epistemic values. Of course, what ultimately matters is which measuring strategy lines up with one’s views on epistemic value, and tracks how approximating ideal rationality delivers increasing portions of this value. Yet, this argument suggests that a view of epistemic value that pairs well with the piecemeal+averaging strategy has fairly unpalatable consequences.

4. Updating Credences

Another important kind of credence change occurs when a thinker learns new evidence. The standard rule of ideal rationality for changing one’s credences when new evidence is learned with certainty is the conditionalization rule. As before, we will consider scenarios in which a thinker is not able to revise their irrational credences before updating them on their new evidence. The goal is to figure out how to evaluate the thinker’s credences once they have learned new evidence. The general strategy we have used to measure credence changes evaluates the thinker’s credences before and after the change, by measuring how distant the credences are from the closest rationally permissible credence assignment. We can use the same strategy to measure the rationality of a credence change after the thinker acquires some new evidence. In the (synchronic and diachronic) cases we have considered so far, we only had to consider the credences that are rational for a thinker at a particular time, given their evidence and the relevant rational requirements that apply to them. This was because in cases of credence revisions and augmentative reasoning, there is no change in which credences are rationally permissible for the agent. Once we consider cases of responding to new evidence, which are governed by rules for rational updating, the credences that are rationally permissible for a thinker can change. Let’s first examine cases in which a thinker starts with rational credences. Take the example we considered in the previous section, in which the rational credences are given by

c₃(A) = 0.5
c₃(A & B) = 0.25
c₃(A & ~B) = 0.25
c₃(~A) = 0.5
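Purely as an illustration (the book contains no code, and the helper below is my own), the following sketch conditionalizes these credences on the evidence A; its output matches the post-update assignment c₄ given in the next paragraph.

# Illustrative sketch only; names are my own. Conditionalization on evidence A:
# c4(X) = c3(X & A) / c3(A) for each proposition X, once A is learned.
c3 = {"A": 0.5, "A&B": 0.25, "A&~B": 0.25, "~A": 0.5}

def conditionalize_on_A(c):
    """Update the listed credences by conditionalizing on the evidence A."""
    p_a = c["A"]
    return {
        "A":    c["A"]    / p_a,  # c(A & A)    = c(A)    -> 1.0
        "A&B":  c["A&B"]  / p_a,  # c(A&B & A)  = c(A&B)  -> 0.5
        "A&~B": c["A&~B"] / p_a,  # c(A&~B & A) = c(A&~B) -> 0.5
        "~A":   0.0       / p_a,  # c(~A & A)   = 0       -> 0.0
    }

c4 = conditionalize_on_A(c3)
# {'A': 1.0, 'A&B': 0.5, 'A&~B': 0.5, '~A': 0.0}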


If we assume that a rational thinker must conditionalize on their evidence, then, if the thinker learns that A is true, the rationally permissible credences change to

c₄(A) = 1
c₄(A & B) = 0.5
c₄(A & ~B) = 0.5
c₄(~A) = 0

Hence, if we measure the degree of rationality of the credences the thinker assigns after the update, we need to measure the divergence between the thinker’s new credences and the closest rationally permissible credences after the update. As mentioned in section 1, even if there were multiple rationally permissible priors available to a thinker, once they have settled on a prior, the conditionalization rule ensures that there will always be a unique rationally permissible credence assignment available after each update (defined over the same set of statements). Hence, if a rational thinker updates by conditionalization, their degree of irrationality will be zero before and after the update. Of course, if we adopted an alternative update rule to conditionalization that permitted a range of responses to learning new information, then there wouldn’t have to be a unique rationally permissible credence assignment at any given time; instead, there might be a set of them. In that case, we use the same strategy as before and measure the distance from some closest rationally permissible credence assignment in the set. In this case, too, the thinker’s degree of irrationality will be zero before and after the update. I am not aware of any particular permissive updating rule in the literature, but our measuring strategy should not exclude the adoption of such a rule. This strategy for measuring the degree of irrationality of a response to incoming evidence is in harmony with the strategies we employed in the previous sections. As mentioned before, revising one’s credences without any new evidence can actually be thought of as an example of updating on tautological evidence. On this way of thinking, the cases we discussed in section 1 are simply a particular instance of the cases discussed in the current section. So far, we have considered the case in which a thinker starts with a rationally permissible credence assignment. However, the case in which a thinker has irrational credences raises an important question that does not come up in the former case. I pointed out that once a rational thinker has


picked a prior, an updating rule like conditionalization effectively enforces uniqueness. But if a thinker doesn’t have (and never had) fully rational credences, and thus there is no rational credence function that they have adopted as their prior, is there a uniquely rational way to respond to incoming evidence for them? We can think about this case in a couple of different ways. On one way of thinking, each thinker (whether ideal or not) has, in addition to their credence assignment and their evidence, a set of standards that reflect their preferred or endorsed belief-forming strategies. Kopec & Titelbaum (2016) call these someone’s epistemic or evidential standards (though these terms probably have originated earlier in the literature), and Schoenfield (2018b) calls them the cognitive properties an agent endorses. If, for each thinker, their evidential standards determine which prior credence assignment is rational for them, then there is no difference between thinkers whose credences are fully rational at some point and thinkers who aren’t. Either way, there is a unique credence function that is rational for them to adopt at each point in time, assuming we endorse an impermissive updating rule like conditionalization. On another way of thinking, human reasoners’ evidential standards or endorsed cognitive properties might not be specific enough to determine a unique rational prior credence assignment. For example, Schoenfield writes:

Consider, for example, the proposition that it will rain in Honolulu next New Year’s Day (H). There is no single doxastic attitude towards H, I claim, that is uniquely picked out by the set of cognitive properties I endorse, C. (Schoenfield 2018b, p. 4)

So, on this view, unless someone has actually committed to a prior credence assignment, there is no prior that’s “theirs,” as determined by their evidential standards, epistemic worldview, or by anything else. If that’s right, then for any irrational thinker, there is more than one credence assignment that is permissible for them. We can still measure their degree of irrationality as the distance to the closest rationally permissible credence assignment, but that’s not to say that this closest rationally permissible credence assignment is somehow privileged as their rightful unique prior. On this picture, there is a set of permissible credence assignments at any given time, given their evidential standards, their evidence, and the requirements of rationality that apply to them, including an updating rule. The agent’s degree of irrationality at a given time is determined by measuring the distance between their credences and the closest


rationally permissible credence assignment at that time, but which credence assignment this is can change from update to update.⁶ And, of course, if we accept a permissive updating rule, this further adds to the set of rationally permissible credences for a thinker at a time. I won’t attempt to settle the question here of whether there is a uniquely rational prior for each irrational agent or not, since our measuring strategy can be applied either way. My inclination is to think that there isn’t, but a discussion of this point would lead us too far afield (but see Simpson 2017 for an interesting discussion of the relation between cognitive abilities and epistemic standards). Regardless of whether we evaluate the credence changes of rational or irrational thinkers in response to incoming evidence, and regardless of which updating rule we adopt, the rationality of an update is measured as the degree of (ir)rationality of the resulting credences. This means that it is straightforward to compare various different ways in which a thinker could change their credences—we just compare the degree to which the resulting updates approximate some closest ideally rational post-update credence assignment. This is again similar to what we were doing in cases in which no new evidence is learned. Yet, it’s worth noting that the cases differ in one respect: when no new evidence is learned, the rationally permissible credences remain the same before and after the revision.
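As a rough illustration of this comparative evaluation (again a sketch of my own rather than anything in the text; the imperfect thinker’s post-update credences are invented, and Euclidean distance merely stands in for whichever divergence measure one has adopted), we can score a thinker’s post-update credences by their distance from a closest rationally permissible post-update assignment, here taken to be c₄ from the earlier example.

# Illustrative sketch only; the post-update credences are invented, and
# Euclidean distance stands in for whichever divergence measure one adopts.
from math import sqrt

def distance(c, c_star):
    """Euclidean distance between two credence assignments over the same propositions."""
    return sqrt(sum((c[p] - c_star[p]) ** 2 for p in c_star))

# One closest rationally permissible post-update assignment (c4 from the example).
c4 = {"A": 1.0, "A&B": 0.5, "A&~B": 0.5, "~A": 0.0}

# A hypothetical imperfect thinker's credences after trying to update on A.
post_update = {"A": 0.9, "A&B": 0.45, "A&~B": 0.5, "~A": 0.15}

# The update is evaluated by the resulting credences' distance from c4.
print(distance(post_update, c4))  # roughly 0.19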

Conclusion

In this chapter, I showed how we can use distance measures of degrees of rationality to evaluate changes in thinkers’ credence assignments. I considered three types of changes: cases in which the thinker revises their credences (i) without learning new information or adding new attitudes; (ii) without learning new information, but adding new attitudes; (iii) as a response to learning new evidence. I proposed specific methods for evaluating each type of credence change, and I argued for several additional claims along the way: First, I argued that my method for evaluating credence change is better suited for evaluating the credence changes of irrational thinkers than the method that looks for

⁶ Alternatively, a permissivist could endorse a stricter standard here, and claim that an irrational thinker should update in such a way that the rational post-update credences are determined by updating on the closest rationally permissible pre-update credences. But this would of course have to be argued for.


instantiations of good reasoning patterns. Second, I showed that, even if we can identify patterns of good reasoning for ideally rational thinkers, following these rules doesn’t produce optimal outcomes for thinkers who reason from incoherent starting points. Third, we found that the piecemeal+weighted averaging strategy produced implausible results when used to measure the degree of rationality of the products of augmentative reasoning.


8 A Small Piece of the Puzzle

Introduction

The view I have developed in the previous chapters shows how we can give a theory of degrees of propositional epistemic rationality, how we can justify that it is better to be less irrational, and how we can use this theory to evaluate the rationality of credence assignments and credence changes. The theory and argumentative strategies I have proposed go a long way towards answering the two challenges I set out at the beginning: the challenge that Bayesian theories of ideal rationality have little to say about non-ideal, human thinkers, and the challenge that they can’t substantiate their claim that ideal rationality is worth approximating for imperfect thinkers. My theory concerns the propositional epistemic rationality of credences, which is a very specific type of rationality that applies to a specific type of doxastic attitude. Our ordinary concept of rationality, which is the starting point for most or all philosophical theorizing about rationality, is of course much broader. It can be applied to many things—attitudes, ways of thinking, actions, decisions, people, etc.—and the conditions under which something has the property of being rational depend in important ways on the type of thing it is. For the purposes of philosophical investigation of rationality, it has been useful to investigate different subtypes and aspects of rationality separately, without losing sight of their commonalities. For example, epistemic and practical rationality are usually investigated separately and on their own terms; epistemologists further distinguish between doxastic and propositional rationality; rationality as it applies to reasoning methods is treated separately from rationality as it applies to attitudes, and so on. While this “divide and conquer” approach has been highly fruitful, it is important to keep the big picture in view, and to check every so often whether and how different pieces of the puzzle fit together. This is especially important when it looks like different ways of theorizing about rationality yield incompatible results. The account of propositional epistemic rationality I have developed is just one piece of this big puzzle, and my aim in this chapter is to take a step back and explain how I see it fitting into the bigger picture. I won’t attempt to relate my account to every


possible way of thinking about aspects of rationality—that would of course be far too ambitious. Rather, I will focus on what I take to be nearby puzzle pieces, i.e. ways of thinking about rationality that are interestingly and closely related to my approach, and I will explain how I see their connection. This will both locate my view within a larger context, and also give me an opportunity to explore some apparent tensions between my view and related views. My discussion will be centered around the following five topics: the relationship between propositional and doxastic rationality, the relationship between ideal and ecological rationality, the relationship between evaluative and ameliorative approaches to theorizing about rationality, the relationship between epistemological and semantic perspectives on rationality, and the relationship between rational evaluations, permissions, and obligations. The upshot of my discussion will be that my account of the propositional epistemic rationality of credences not only harmonizes with the ways in which rationality is theorized in these domains, but is often a necessary ingredient for developing theories of these aspects of rationality.

1. Rationality: Propositional and Doxastic

I have already extensively discussed the objection that Bayesianism is too demanding for ordinary thinkers to comply with. The remedy, I argued, is to show in which ways it is beneficial for ordinary thinkers to approximate the Bayesian ideals, even if they are not fully reachable for thinkers like you and me. But there is a related worry which is not addressed by this response. The worry is based on rationality judgments about ordinary thinkers, which seem to clash with the verdicts of the Bayesian theory. To illustrate it, compare the following two cases:

Uncertain Logic Student
Una is a logic student, and she is working on a homework assignment that asks her to figure out whether some set of premises P entails some conclusion C. In fact, “If P then C” is tautological, but Una has not found the proof yet. She currently assigns the conditional a middling credence of 0.5.

Certain Logic Student
Cera is a logic student, and she is working on a homework assignment that asks her to figure out whether some set of premises P entails some conclusion C. In fact, “If P then C” is tautological, but Cera has not found the proof yet. Still, she currently assigns the conditional a credence of 1.


From the perspective of propositional epistemic rationality, Cera’s credence is rational and Una’s credence is irrational. Yet, upon being presented with these cases, the more intuitive response for many people is that Una is rational, and Cera is irrational. It is easy to see why—Cera’s certainty is entirely baseless, whereas Una’s uncertainty is appropriately cautious. This type of intuition sometimes motivates people to question or reject the Bayesian view of rationality altogether—which is understandable given that not many proponents of Bayesianism explicitly distinguish doxastic from propositional rationality.¹ But once we see the standard Bayesian requirements as requirements of propositional rationality, they are a lot less counterintuitive. What is propositionally rational for a thinker depends on what their evidence supports. If a claim is tautological, and its truth can thus be determined without any empirical evidence (or, alternatively, any thinker is in some sense already in possession of the needed evidence to determine its truth), then, for any thinker, assigning this claim full credence is propositionally rational. This is the case for both Una and Cera. By contrast, an attitude is doxastically rational for a thinker only if it is based in the right way on their reasons or evidence. Our intuitions about whether a thinker has rational credences or beliefs often track this notion of rationality. We deem Cera’s certainty in “If P then C” irrational, because it is not based on anything. It is irrational in the doxastic sense. The doxastic/propositional distinction was initially introduced in traditional epistemology in order to differentiate between types of justification or warrant (Firth 1978). But we can apply it just as well to rationality (if there is even a difference between rationality and justification). On the standard view of the matter, doxastic rationality is propositional rationality plus basing. Some have questioned what the right order of explanation is—should we explain doxastic rationality in terms of propositional rationality, as the standard picture suggests, or is doxastic rationality somehow explanatorily prior? For example, Dogramaci, in a recent paper on the doxastic rationality of credences, suggests, roughly, that credences are doxastically rational when they were generated by correct reasoning (Dogramaci 2018). The credences that are propositionally rational for the thinker are the ones that they could arrive at by following the rules of correct reasoning with credences “flawlessly and relentlessly.” A somewhat similar proposal regarding justification for

¹ Notable exceptions include Smithies (2015), Wedgwood (2017), Dogramaci (2018), and Titelbaum (forthcoming[b]).


beliefs is put forth in an earlier influential paper by Turri (Turri 2010, for discussion see e.g. Silva 2015). While the introduction of this distinction helps us get a plausible account of Cera’s case, it is less clear whether it can account for Una’s. Una’s case belongs to a more general class of cases in which a thinker has not yet had a chance to examine her evidence, or to reason through a problem carefully, and thus adopts a non-committal attitude in light of this. A similar case is brought up by Dogramaci (2018) to motivate his discussion: before you get a chance to do the math, a credence of 1/10 that the trillionth digit of π is a 2 seems entirely rational. Moreover, we usually think that a specific non-committal attitude is appropriate in a given case. It would be odd to assign a credence of 1/20 to the trillionth digit of π being a 2. Since there are ten options, the indifferent credence 1/10 seems uniquely rational (before one does any calculations). However, given the standard gloss of a doxastically rational attitude as one that is properly based on one’s evidence, these non-committal attitudes can’t be called doxastically rational, since their main feature is that they are appropriate precisely because one has not yet evaluated one’s evidence. Also, as Smithies (2015) points out, it is standardly assumed that doxastic rationality entails propositional rationality—one cannot have an attitude that is doxastically rational if that attitude is not also propositionally rational for that thinker. This condition is violated by our examples as well. For instance, we want to maintain that for Una, the propositionally rational credence in “If P then C” is a credence of 1, not a credence of 0.5. Hence, insofar as noncommittal attitudes seem rational in cases like Una’s, the standard view of doxastic rationality does not help us explain why this is so. This problem is not just one that applies to credences, it also applies to cases that are formulated using the traditional notion of belief, in which a thinker suspends judgment until she has properly assessed her evidence. I submit that solving it will require a general refinement of our theory of epistemic and propositional rationality or justification. I will not attempt to do this here, and I am not aware of any extant solutions to the problem. Dogramaci (2018), for instance, never comes back to his example about π. He says later that nonprobabilistic judgments that are generated by Type-1 heuristic reasoning processes can be defeasibly doxastically rational, and that this justification is defeated once we become interested in the relevant propositions. Once our attention is drawn to them, we must apply correct, Type 2 reasoning strategies to assign the correct credence. Unfortunately, this proposal does not satisfyingly account for the π example. First, assigning an initial credence of 1/10 to the trillionth digit being a 2 need not be the result of a Type 1 heuristic. It can


be very deliberate. Moreover, the view doesn’t really explain why it would seem less rational if the thinker assigned a 0.5 credence to the digit being a 2 instead of 1/10 (perhaps as a result of applying some shortcut rule like “it’s 50/50 because the number is either a 2 or it’s not”). Hence, more work is needed to account for these types of cases. To sum up: Older treatments of Bayesian accounts of rationality were not always careful to make the distinction between doxastic and propositional rationality, which left the view open to the objection that it makes incorrect verdicts about cases like Una’s and Cera’s. Once we specify that norms like probabilism are best understood as norms of propositional rationality, this objection can be (at least partially) answered. Yet, this is not to say that it is well understood what it takes to have doxastically rational credences, and what the relationship between propositional and doxastic rationality consists in. There are many open questions to be answered, and formal epistemologists have only recently begun to attend to them. Some of these questions are equally unsettled in the parallel debates about justification in traditional epistemology. There’s just a lot more work to do.

2. Rationality: Ideal and Ecological

Being dissatisfied with the unrealistic demands of theories of ideal rationality, some philosophers and psychologists have developed a theory of rationality that is designed to be more applicable to limited human thinkers. This theory concerns both practical and epistemic rationality, and is known under the label ecological rationality. We will again consider a couple of toy examples for illustration.

Doctor
Amy is a doctor, and she frequently has to reason with uncertain information in her job. For example, she is confronted with questions such as “How likely is it that this positive test is a false positive?” “How sure are you that the patient has disease X rather than Y?” and so on. While Amy often doesn’t get exactly the right answer, the credences she forms in response to these questions on the basis of her evidence are always within a very narrow margin of the ideally rational credences she should assign.

Hurried Traveler
James is running late for a flight, and he needs to quickly figure out which mode of transportation is most likely to allow him to catch the plane. He could


take a taxi, or a bus, or the subway, or a train. He has lots of credences that are relevant to answering this question: his credences regarding how long each mode of transportation takes in optimal conditions, his credences about how likely it is that each mode of transportation is delayed, his credences about what the average delay is given that it occurs for each mode of transportation, etc. For example, he is confident that if there is an accident on the road, the taxi would be severely delayed, but he also thinks that such an accident is unlikely. James can’t waste time on complicated deliberations, so instead of figuring out exactly which mode of transportation is favored given his evidence, he just decides to take the taxi, because he is confident that it is fastest under optimal conditions. Both Amy and James are not fully rational by the standards of ideal rationality. Amy’s credences closely approximate, but don’t match, the credences that are ideally rational for her, and James violates norms of ideally rational decision making by ignoring various relevant factors. Yet, being ordinary humans with limited cognitive capacities and time, complying with requirements of ideal rationality is out of reach for them. The best they can do is make good use of their available resources, to produce outcomes that are sufficiently good for their purposes. The theory of ecological rationality is concerned with exactly this problem: what are good reasoning strategies for thinkers with limited resources? Which heuristics are rational for them to use? Psychological research has focused on identifying heuristics that people actually use in reasoning, and on evaluating whether these heuristics are rational for people to use (see e.g. Gigerenzer 2008, and other works by him). In order to evaluate whether a heuristic is rational, so it is argued, it is not sufficient to know how it works. In addition, we need to know in which environments or for what kinds of tasks it is likely to be used. If a heuristic, when used in a particular environment by a particular type of thinker, produces close to normatively ideal results much of the time, and does so more efficiently than the “correct” strategy that always generates the normatively ideal result, it is judged to be rational for that type of thinker in that type of environment. An example of such a heuristic is the “take the best” heuristic, which is used in choosing between options. Instead of doing a full evaluation and weighting of all the relevant criteria, the thinker only considers which option is favored by whatever criterion is judged to best discriminate between the options. Graefe & Armstrong (2012) apply this method to election forecasting, testing whether election forecasts that only rely on information about how candidates approach a major political issue can be as reliable as forecasts


that incorporate substantially more information. This type of heuristic for choosing between options is an effective time-saver in making choices whenever there is a criterion that the thinker can identify, such that performance according to this criterion is a good proxy for overall choiceworthiness (of a hypothesis, an action, etc.). The “take the best” heuristic’s rationality depends on whether thinkers are likely to rely on it in environments in which it identifies the best option (or an option that is very highly ranked) with high frequency. Applied to our examples, the theory of ecological rationality would plausibly identify both Amy’s and James’ reasoning as rational. We don’t know what reasoning strategy Amy uses, but it reliably produces very close to correct results in Amy’s environment, and it thus qualifies as being ecologically rational. James uses a version of the “take the best” heuristic to pick a means of transportation, using as his choice criterion which one would be best under optimal conditions. If we assume that, most of the time, traffic conditions are good, then using this heuristic is ecologically rational. (If traffic conditions are usually bad, another choice criterion might be rational to use, namely which option is best when traffic conditions are bad.) Both the theory of degrees of rationality I propose and the theory of ecological rationality start from the common observation that imperfect thinkers can at best approximate rational ideals. But they go on from there to answer different questions: The theory I develop is motivated by the question: To what degree do this thinker’s attitudes fall short of the ideal? The theory of ecological rationality asks: Given that this thinker falls short of ideal rationality, do they use reasoning strategies that are rational in their situation? Thus, the two views focus on different targets of evaluation—attitudes vs. reasoning strategies, and they also differ in their evaluation conditions. The theory of degrees of rationality simply measures divergence from the ideal, without accounting for the thinker’s limitations or the context. The theory of ecological rationality asks whether a reasoning strategy is good when applied by a type of thinker in a type of environment. Once we see these differences, apparent conflicts between the views disappear. We can say that Amy the doctor is both slightly irrational from the point of view of ideal rationality, and that she uses rational ways of reasoning given her limitations and her environment. But the two perspectives are more than just compatible. One is in fact necessary for making sense of the other, as I will now argue. In the literature that evaluates heuristics for whether they produce close to ideal outcomes in particular environments, the notion of a “close to ideal” or “optimal” outcome is usually defined in a way that is specific to the task. In the study about


election forecasting cited above, an optimal result is just one that matches the correct answer, i.e. the actual election outcome. But it is easy to see that if we want to evaluate heuristics that are used in theoretical reasoning more generally, it is desirable to have a general way of evaluating how closely the outcome of a reasoning process approximates the ideally epistemically rational outcome. We can’t just always check whether a reasoning method led to the truth, because we don’t always know what the truth is! The theory of degrees of epistemic rationality we have developed so far fits this task well. For any heuristic thinkers use to simplify tasks in theoretical reasoning, the outputs of the heuristic can be evaluated with our theory according to how closely they approximate the ideally rational result.² These evaluations can be connected to different types of environments. By definition, heuristics are not equivalent to the reasoning procedures that always deliver ideal results, and it can depend on the specific type of task or environment they are applied to whether they deliver close to ideal results. If a thinker only uses a heuristic in environments or for tasks for which it is particularly well suited, then it doesn’t matter whether there are other contexts in which the heuristic delivers highly irrational outputs. Hence, in order to determine whether a heuristic is rational, we need a standard that tells us how close to ideal the results are that the heuristic delivers in the environments of interest. My theory of degrees of rationality can supply this standard for theoretical reasoning tasks. We can illustrate the use of this method with an example of a heuristic for theoretical reasoning. Tentori et al. (2016) have hypothesized that reasoners, in determining how confident they should be in a claim given some evidence they have acquired, use the degree of confirmation provided by the evidence as a proxy for determining the posterior probability of the claim conditional on the evidence. If a piece of evidence is judged to have strong confirmatory force, then the claim under consideration is judged to be very probable conditional on the evidence. This heuristic—using the degree of confirmation provided by a piece of evidence to judge the posterior probability of a claim given the evidence—is pretty reliable in many situations. However, there are exceptions, most notably, reasoning tasks that tend to generate the base rate fallacy. In these kinds of reasoning tasks, a piece of evidence, say a positive test result, has strong confirmatory force without yielding a high posterior probability. Thus, if people use the heuristic to estimate the posterior probability in these cases,

² And, as we saw earlier, on some measures, getting closer to ideal rationality entails getting closer to the truth, regardless of what the truth happens to be.


their estimates don’t closely approximate the ideal credence assignments warranted by the evidence. Our measures of degrees of rationality are perfectly suited to capture the extent to which the results delivered by this heuristic approximate the ideally rational credence assignments. To sum up: contrary to what is sometimes suggested, the Bayesian view of epistemic rationality and the ecological view of rationality are not in conflict. They seek to answer different, but related questions about the rationality of non-ideal thinkers. I have argued that my theory of degrees of ideal rationality provides a crucial component for theories of ecological rationality: it provides a scale on which we can measure to what extent the outputs of heuristics for theoretical reasoning approximate the ideally rational credence assignments in different environments.
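To illustrate the kind of evaluation just summarized, here is a sketch of my own (the book contains no code, and the test statistics below are invented for a stylized base rate case rather than taken from Tentori et al.): it compares a heuristic’s output credence with the ideally rational Bayesian posterior and treats the gap as the heuristic’s degree of irrationality in that environment.

# Illustrative sketch only; the numbers are invented for a stylized base rate case.
def posterior(prior, sensitivity, false_positive_rate):
    """P(hypothesis | positive test result) by Bayes' theorem."""
    numerator = sensitivity * prior
    return numerator / (numerator + false_positive_rate * (1 - prior))

# A rare condition with a fairly accurate test: the ideal posterior stays low.
ideal = posterior(prior=0.01, sensitivity=0.9, false_positive_rate=0.09)  # ~0.09

# A thinker using confirmatory force as a proxy for the posterior might end up
# highly confident after a positive test; 0.9 is a made-up stand-in for that output.
heuristic_output = 0.9

# On a simple absolute-difference measure, the heuristic does badly in this
# environment, even though it may do well where base rates are not extreme.
print(abs(heuristic_output - ideal))  # ~0.81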

3. Rationality: Evaluative and Ameliorative

The theory of degrees of rationality I propose is an evaluative theory. This means that it can be used to evaluate how rational or irrational a thinker’s credences or credence changes are. However, we might hope for more—not just a measure of our irrationality, but guidance for how to be more rational. Readers who were hoping for practical advice on how to become more rational are probably sorely disappointed at this point, if they even made it this far. We would ultimately like to answer questions about how particular irrational thinkers should reason or change their credences, and also about how we can teach people to be more rational. Taken by itself, the evaluative theory I have developed cannot (and is not intended to) answer these questions. Once we think more about why this is, we can get a better sense of what is needed in addition. Most importantly, applying an evaluative theory like the theory of degrees of rationality requires taking a third-personal point of view. It requires knowing, for example, what someone’s credences are, how they can be represented in a numerical model, and how to apply the mathematical measuring strategies introduced in previous chapters. This type of information is not available to us via first-personal introspection. If it were, we would hardly be left with irrational thinkers—a thinker who could represent their own credences in this way, and apply the measures to figure out how irrational they are, would have to already know what the ideally rational credences are in order to measure the distance between them and their own credences. But if they know what the ideally rational credences are, why wouldn’t they have already


adopted them? We have good reason to think that such a thinker would not be irrational to begin with. But focus now on an irrational thinker who only has ordinary mental capacities. There are infinitely many credence functions that are more rational than their own. How should this thinker reason, or change their credences? If we ask this question with the thinker’s particular capacities and situation in mind, the answer can hardly be “Pick one of the infinitely many more rational credence functions and adopt it.” Even if there are in principle infinitely many credence functions that are more rational than the thinker’s credences, many of them are unavailable to them, in the sense that there is no conscious or unconscious reasoning process that the thinker could realistically engage in that would lead them to have those credences. It is a difficult question which reasoning processes are available to a particular thinker at a given time, and I will not try to find an answer here. But it seems obvious to me that this question is relevant if we want to know how a thinker should improve their credences. Moreover, depending on how broadly we understand the question of how a particular thinker should change their credences, practical factors come into play in addition to epistemic factors. In the long run, we’re all dead. In the meantime, we’re not well advised to spend our time improving the rationality of our credences in completely trivial or unimportant matters. Hence, there might be a sense in which it is not advisable, all things considered, for a thinker to improve the rationality of their credences, because doing so would be a waste of time, and prevent them from spending their mental energy on more important matters instead. Thus, the right question to ask is: Which of the credence changes that are available to this thinker should they carry out, given their circumstances? The answer will partly depend on which credence change will leave them most rational, but it can also factor in practical considerations. There is thus no direct path from an evaluative theory of epistemic rationality to a theory that tells us how a particular thinker should change their credences in particular situations.³

³ Notice that this problem is not just one we encounter in epistemology. We can find similar issues in ethics. Suppose the correct normative ethical theory says that our actions should maximize expected welfare. We can use this theory to evaluate people’s actions, according to whether they did maximize expected welfare, or to what extent they fell short of doing so. And, to no one’s surprise, ordinary humans are quite terrible at acting in ways that maximize expected welfare. If we want people to act more ethically, teaching them about expected welfare maximization is hardly going to be the way to success. Just like in the case of rationality, many more factors must be considered if we want to answer the question of what a person should actually do, in their situation, and how we can help them be better (see e.g. Sidgwick 1981, p. 485).


We might also be interested in teaching people to be more rational, or better at reasoning. Again, while measures of degrees of rationality can be used to evaluate how rational the outcomes of particular reasoning strategies are, they don’t provide us with strategies we can teach people that help them improve their reasoning. Finding effective methods for improving people’s reasoning skills can’t be done from the armchair, since it requires empirical investigation. Of course, we can try to teach people enough probability theory and mathematics to be able to solve probabilistic reasoning tasks. The obvious problem with this approach is that this requires a lot of education and practice, and even if people learn probability theory in high school or college, they will forget what they have learned without repeated practice. Moreover, in the types of reasoning tasks that generate the most dramatic mistakes, such as tasks that induce the base rate fallacy or the conjunction fallacy, even mathematical training doesn’t seem to prevent people from finding the incorrect answer to be intuitive. In response, psychologists have been searching for alternative ways of improving people’s reasoning performance on these tasks. The most promising approaches have involved manipulating the format in which reasoning tasks are presented. For example, people are better at reasoning with probabilistic information when it is presented pictorially, or in frequency formats (see e.g. Brase & Hill 2015 for a comprehensive survey of the relevant research). And of course, we’d also like to know how to teach people to be better reasoners in real life situations in which we can’t control the format in which a problem is presented. In sum, an evaluative theory of degrees of rationality like the one defended here cannot, by itself, answer questions about how irrational thinkers should reason or change their credences, or questions about how we can teach people to be more rational. Yet, in order to answer those types of questions, we must be able to compare reasoning strategies and methods according to how rational their outputs are. For this purpose, measures of degrees of propositional epistemic rationality are an essential tool.

4. Rationality: Epistemology and Semantics

All along, we have treated propositional epistemic rationality as a property that comes in degrees. An assignment of credences can be more or less rational. Hence, we should expect the term “rational,” which we use to refer to this property, to be a gradable adjective. My aim in this section is to sketch how my view of degrees of rationality relates to semantic treatments of


gradable adjectives more generally. Establishing this connection has two benefits: first, it demonstrates that the view developed here guarantees a smooth fit between epistemological and semantic theorizing, and second, we can use established insights about the semantic properties of gradable adjectives to explain some of our rationality judgments about particular cases. Gradable adjectives such as “tall,” “clean,” and “full” are each associated with a type of scale according to which items can be ranked. A gradable adjective G can standardly be used in constructions such as “This thing is G-er than that thing,” “This thing is G,” “This thing is the G-est,” “This thing is very G,” and so on. In linguistics and philosophy of language, a lot of research has been done on the semantics of these adjectives, and it applies to our inquiry insofar as “rational” is a gradable adjective. We can distinguish subtypes of gradable adjectives according to the kind of scale they are associated with. Some adjectives have scales with fixed endpoints or maxima, whereas others have open scales with no fixed endpoints. The former are standardly called absolute gradable adjectives, and the latter relative gradable adjectives (for some relevant discussion, see e.g. Kennedy & McNally 2005, Toledo & Sassoon 2011, and references therein). An example of a relative gradable adjective is “tall,” since there is no maximal height something can be—for any height, something could always be taller. An example of an absolute gradable adjective is “full.” There is always a point at which something can’t be any fuller, hence, the scale associated with “full” has a maximum or endpoint. For any gradable adjective G, a sentence of the form “Entity O is G” is true just in case the degree to which O is G is within a certain range on the scale associated with G. Yet, absolute and relative gradable adjectives differ in how flexibly this range can be determined. For absolute gradable adjectives, this range must typically be close to the endpoint of the scale. For example, we can truly say “This bucket is full” when the bucket is almost or completely full, but not when it is half full. (Although as Toledo & Sassoon point out, this can differ according to conventions that apply to specific types of items. For example, wine glasses are typically considered full when they are about 1/3 filled, so a “full” wine glass must be filled up to or almost up to this conventional mark. These conventions don’t appear to apply when modifiers such as “completely” are added.) Relative gradable adjectives behave differently in this respect. Since the associated scale has no endpoint, there is greater flexibility in how the range of values can be determined that something must fall into in order to count as G. For example, the threshold for a person to count as tall is not the same for basketball players as it is for toddlers.


The scale associated with “rational” in ordinary English clearly has a maximum. This is confirmed by the observation that it makes sense to say “James is completely rational,” or “James’ credences are completely rational” (and that, by contrast, it doesn’t make sense to say “James is completely tall”). Hence, given the categories introduced above, “rational” is an absolute gradable adjective. Relatedly, the scale for “irrational” also seems to be closed, as it makes sense to call something “entirely irrational” or “completely irrational.”⁴ This means that “irrational” behaves similarly to other closed-scale adjectives like “empty,” rather than open-scale ones like “bad.” This also means that if “rational” is an absolute gradable adjective, we should expect positive rationality judgments, such as “Her beliefs are rational” or “She is rational,” or “This is a rational decision” to be correct only if the entity to which rationality is attributed possesses a degree of rationality that is very close to or at the end of the scale.⁵ The notion of rationality we have been investigating—propositional epistemic rationality for credences—is of course not identical with the more general everyday notion of rationality. It is a precisification of the everyday concept, which carves out a particular subtype of rationality. Still, we should expect that it shares its semantic structure with the everyday concept. The way in which we have developed the notion of degrees of propositional epistemic rationality bears this out. The maximum of the scale is reached when a credence assignment fully complies with all of the requirements of rationality that apply to it. Also, given that we measure irrationality as the distance from some closest ideally rational credence assignment, there is a lower bound to how irrational a credence assignment to a given set of claims can be. We have not developed a similar scale here for degrees of doxastic rationality, but once we do so in future research, it will plausibly turn out to have the same type of scale structure. Thinking about degrees of propositional epistemic rationality from a semantic perspective also vindicates the way in which we make rationality

⁴ Whether it makes sense to call something “completely irrational” or “entirely irrational” seems to depend in part on the target of the evaluation. It seems odd to call a person “entirely irrational” (what would that even consist in?), but it seems natural to call a belief “entirely irrational,” for example in cases where the person’s evidence rationalizes the opposite opinion and gives no support at all for the agent’s belief.

⁵ With beliefs or decisions, there doesn’t seem to be a conventional constraint that lets us attribute rationality even when the end of the scale is not reached. While full bath tubs and full wine glasses are considered full when they are filled to the level that is typical given their use (try taking a bath in a completely full bath tub!), there doesn’t seem to be an analogous phenomenon for beliefs and decisions. I am less sure about attributing rationality to people. Perhaps there is a typical amount of rationality we’re looking for in people, which lets us call people rational even when their attitudes and decisions are not all completely or very close to completely rational.


judgments about particular cases. As explained before, judgments involving the positive form of an absolute gradable adjective, such as “Jane’s credences are rational,” require that the relevant entity, in this case Jane’s credences, possess the property of being rational to the highest, or almost to the highest degree. Once we add a modifier like “completely” or “entirely,” the judgment is correct only if Jane’s credences are rational to the highest degree. This is just how we have been using the term “rational” in our discussion of propositional epistemic rationality. Also, it allows us to call credences rational in some contexts in which a thinker’s credences are very close to ideal. Recall the example of Amy, the doctor, whose credences are always within a very narrow margin of the ideally rational credences she should assign. I think it is very tempting to call her credences rational in the Bayesian sense, even if they are not exactly perfect. Or at least, it might seem somewhat misleading to call her credences irrational. A semantic explanation that can underwrite this intuition is that, in a context in which we’re not requiring exact precision, Amy’s credences are close enough to the top end of the rationality scale that we can correctly call them “rational.”⁶ In sum, my theory of degrees of propositional epistemic rationality establishes a scale structure for degrees of rationality that also makes sense from a semantic point of view. “Rational” in its ordinary meaning is an absolute gradable adjective, and our precisification of this notion preserves the semantic features of the ordinary notion. Hence, our epistemological theorizing fits smoothly with standard semantic treatments of gradable adjectives.

5. Rationality: Permissions and Obligations

In this chapter, I have so far focused on how my theory of degrees of propositional epistemic rationality fits in with theorizing about the nature of rationality more generally. But when we make rationality judgments, we often don’t use the language of rationality in order to judge a person, an attitude, or an action. Instead, we often use deontic modals, such as “ought,” “may,” “should,” or “must” in order to make judgments about rationality. For example, in the earlier examples, we might say “Una should assign this tautology a middling credence until she finds a proof for it,” or “Cera shouldn’t be certain in this tautology if her certainty is entirely baseless.” We can also

⁶ In principle, this tolerance for close-enough cases could be either a pragmatic or a semantic phenomenon, but this difference is not important for our purposes.


express requirements of propositional epistemic rationality with deontic modals, as in “Tautologies ought to be assigned credence 1,” or “All credence assignments must be probabilistically coherent.” My goal in this section is to sketch how deontic modals are generally thought to work, and to explain how they can be used to express the different types of rationality judgments I have characterized in the previous sections. I will show that the way in which we have conceived of degrees of rationality fits well with standard accounts of deontic modals. We find a lot of debate in the literature about what the best theory of the semantics (and pragmatics) of deontic modals is, but for our purposes, it will suffice to point out crucial features of deontic modals that are generally undisputed, and need to be accounted for by any theory of modals.⁷ Consider a judgment of the form “S ought to ϕ.” Roughly, for “S ought to ϕ” to be true, there must be a set of alternatives that are in some sense available or possible for S, and these alternatives must be ranked in such a way that ϕ-ing is ranked highest (or perhaps no lower than any of the other sufficiently good available alternatives). In the language of possible worlds, if S ϕs in all the optimal worlds among the worlds under consideration, then it is true that S ought to ϕ. For example, consider the claim “Oona ought to pick up a birthday cake from the supermarket on the way to the party.” Whether or not this is true depends on what the alternative possibilities are, and how they are ranked. It may be that the alternatives are to not pick up anything for the party, or to pick up a rotisserie chicken. If picking up a cake is ranked above picking up nothing and picking up a chicken, and those are all the options under consideration, then it is true that Oona ought to pick up a cake. However, if a better available option were for Oona to bring a homemade cake to the party, rather than a store-bought one, then it would be false that Oona ought to pick up a cake at the supermarket. Which options or worlds count as available is not fixed by the meaning of “ought.” Rather, depending on the context, these can be individuated more broadly or more narrowly. Also, context determines the criteria by which options or worlds are ordered. If a statement says what someone morally ought to do, options are ranked according to their moral goodness. If it says what someone ought to do according to the rules in the faculty handbook, options are ranked according to their compliance with those rules. Which norms are relevant for the ordering can be stated explicitly, or it can be

⁷ The recent edited collection by Chrisman & Charlow on deontic modals is a good starting point to get an overview of the debate (2016). The classic treatment of deontic modals can be found in Kratzer (1981). See also Smithies (2015) for some insightful comments about how “ought” works in the context of rationality judgments.


implicit in the context. Moreover, ought-statements are information-sensitive. This feature accounts for the difference between what are often called objective and subjective oughts: I might know more than Oona about the person for whom the birthday party is held. I happen to know that this person loves chicken but hates cake, so I can truly say “Oona ought to bring the chicken.” Yet, since Oona doesn’t know this, and has no reason to violate usual birthday party norms, the subjective ought-claim “Oona ought to bring the cake” is also true, when we are talking about what Oona ought to do given her perspective. These and other features of statements involving modals allow for great flexibility in what can be expressed by a statement of the form “S ought to ϕ” (and similar statements with other deontic modals, such as “must” or “may”), and they also give rise to many challenges for developing a semantic account of modals that captures all of these features perfectly. But for our purposes, we don’t need to focus on all these details. We’re interested in explaining the relationship between judgments about what’s rational and related judgments involving deontic modals, and in order to sketch how this can be accomplished, we need not consider the more intricate details of how different semantic accounts model information-sensitivity, options, and rankings. Given the inherent flexibility of deontic modals, they can be used to express a variety of different types of normative epistemic judgments. Let’s start with judgments that express the norms of propositional epistemic rationality, such as “Tautologies ought to be assigned credence 1,” or “All credence assignments should be probabilistically coherent.” As explained above, deontic modals require an underlying set of options, and an ordering of those options. For these types of judgments, the underlying set of options is the set of all possible credence assignments (there are no restrictions to a thinker’s abilities or otherwise). Those credence assignments can be ordered according to the degree of their propositional epistemic rationality, using the type of account of degrees of rationality I have defended. Since all and only the highest ranked credence functions are probabilistically coherent, the judgment “All credence assignments should be probabilistically coherent” is correct, given this way of filling in the deontic modal’s context-sensitive parameters. But of course, in a different context, both the set of options and the way of ordering them can change. In our discussion of the difference between propositional and doxastic rationality in section 1, we saw that the credences that are most rational from the perspective of propositional rationality need not be the credences that are most rational from the perspective of doxastic rationality. Take the case of Cera, who is certain of a tautology, but is only guessing; she


has no proof or argument that the relevant claim is a tautology. While her credence is propositionally rational, it is not optimal from the perspective of doxastic rationality. Since doxastic rationality takes into consideration the quality and stage of the agent’s reasoning, the correct judgment is either that Cera ought not to assign full credence to the tautology (if we hold fixed the background assumption that she hasn’t found a derivation for it), or that Cera ought to assign full credence to the tautology on the basis of proper reasoning. Hence, it is both correct that Cera should and that she shouldn’t assign full credence to the tautology, depending on whether the judgment is made with respect to doxastic or propositional rationality, and depending on which background assumptions we hold fixed. We also considered notions of rationality that take into consideration the cognitive and situational limitations of thinkers. For example, if we ask how a thinker should change their credences, we can understand this to mean “given the credence changes that are available to the thinker, and the situation they are in, how should they change their credences?” The “should” in this question is thus plausibly to be understood relative to a limited set of options, which is constrained by the thinker’s capacities and her situation. The ranking of these options can follow a variety of norms, as we just saw, for example the norms of doxastic or propositional rationality. Such a limitation of options is plausible in contexts in which we are interested in judging different reasoning strategies and heuristics, as proposed by the theory of ecological rationality. Recall the case of James, who is in a hurry to get to the airport. We might say “He ought to figure out which mode of transportation is best under worst-case traffic conditions, and take that one.” Here, we recommend a particular heuristic as the optimal reasoning strategy in his circumstances. Again, the options for this type of judgment are restricted to the ones that are feasible given the time constraints in the situation, and the ordering of those options is carried out according to which option delivers the most rational decision. Of course, there is much more to be said both about the semantics and pragmatics of deontic modals, and about how we can use them to express our judgments about rational obligations and permissions. But this quick sketch should be sufficient to demonstrate the flexibility of deontic modals with respect to what options are considered available, and how the options can be ranked. This makes them suitable for expressing different kinds of epistemic normative assessments. Our theory of degrees of propositional epistemic rationality can underwrite deontic modal judgments by filling in the ordering parameter in contexts in which this notion of rationality is relevant. Once we appreciate the basic structure of deontic modal judgments, we can see that


there is at most an apparent conflict between ought-judgments that are correct in different contexts, such as the judgments that Cera should assign full credence to the tautology and that she shouldn’t assign full credence to it.
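The basic structure just described can be pictured with a minimal sketch: a deontic “ought” is evaluated relative to a set of available options and a contextually supplied ranking, and “S ought to ϕ” comes out true just in case ϕ holds in all the best-ranked options. The Python below is purely illustrative; the option labels and scores are invented for the example, and the function is a toy rendering of the general idea rather than a full semantics of English “ought.”

```python
# Toy sketch of a deontic "ought": true iff the prejacent holds in all options
# ranked best by the contextually supplied ordering. Labels and scores are toy values.

def ought(prejacent, options, ranking):
    best_score = max(ranking(o) for o in options)
    best_options = [o for o in options if ranking(o) == best_score]
    return all(prejacent(o) for o in best_options)

# Cera's available credal options (holding fixed that she has found no proof).
options = ["full credence, no proof", "middling credence, no proof"]

# One ranking tracks propositional rationality (tautologies get credence 1);
# the other tracks doxastic rationality, which penalizes baseless certainty.
propositional_rank = {"full credence, no proof": 2, "middling credence, no proof": 1}
doxastic_rank = {"full credence, no proof": 1, "middling credence, no proof": 2}

assigns_full_credence = lambda o: o.startswith("full credence")

print(ought(assigns_full_credence, options, propositional_rank.get))  # True
print(ought(assigns_full_credence, options, doxastic_rank.get))       # False
```

The same prejacent receives different verdicts depending on which ordering the context supplies, which is exactly the sense in which the apparently conflicting ought-judgments about Cera are both correct.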

Conclusion

My goal in this chapter was to locate my view of degrees of propositional epistemic rationality within a bigger picture of research on different aspects of rationality. I aimed to show that my theory does not just provide a piece of the puzzle, but that it actually integrates smoothly with neighboring puzzle pieces within the broader project of understanding rationality. I focused on the relationship between my view and theories of doxastic rationality, theories of ecological rationality, ameliorative approaches to rationality, semantic theories of gradable adjectives, and theories of deontic modals. The main upshot of my discussion was that most of these approaches rely implicitly or explicitly on an ordering of options according to their degree of rationality. The theory of degrees of rationality I have developed is well suited to provide this kind of ordering. More work is needed to develop a nuanced theory of doxastic rationality, but I predict that my account of degrees of propositional rationality will be one of its essential ingredients. In sum, my theory integrates smoothly with a variety of approaches to thinking about the nature of rationality. In the next chapter, I want to pick up a particular strand of inquiry from this chapter to further illuminate the predicament of non-ideal, human thinkers. The examples of Amy the doctor and James the hurried traveler brought to our attention the fact that human reasoners are often faced with reasoning tasks that are far too complex for them to execute properly in a reasonable amount of time. Hence, they need to find shortcuts that help them simplify reasoning tasks in ways that don’t leave them with highly irrational conclusions. I will argue in the next chapter that human thinkers partly solve this problem by reasoning with outright beliefs, in addition to credences. So far in this book, I have only talked about the rationality of credences. Yet, many epistemologists, myself included, don’t think that our belief-like attitudes are exhausted by our credences. Human reasoners also have outright beliefs. Outright beliefs are the type of attitude we talk about when we say “Jane believes that she is out of milk” or “James believes that Australia is in the Southern hemisphere.” When a thinker outright believes some claim A, they take A for granted, or treat A as true, in any reasoning and decision-making


processes in which this belief is involved. In this regard, an outright belief that A resembles a credence of 1 in A. Yet, it is widely accepted that in order for a thinker to rationally hold an outright belief in A, it is not necessary that they are certain that A. For example, Jane’s belief that she is out of milk can be rationally permissible even if she is merely very confident, but not certain, that she is out of milk. But once we adopt the position that human thinkers have both credences and outright beliefs, the question arises what the point is of having beliefs in addition to credences. Why have this extra set of attitudes, when it seems like credences alone should be sufficient for the job of representing the world? This question is known in the literature as the “Bayesian Challenge.” It is the challenge of answering the question “Why do we have (and need) outright beliefs in addition to credences?” In my view, the most plausible answer that has been given to this question is that reasoning with outright beliefs is a kind of heuristic that makes reasoning tasks more tractable for limited human reasoners. Outright beliefs let thinkers ignore small error probabilities, thereby limiting the number of options that need to be considered in a given reasoning task. Yet, just like with any other heuristic, relying on a simplified method is not cost free. To gain simplicity, we have to give up ideal rationality. But this is not a bad trade for beings like us. Explaining and defending this view of outright belief is the topic of the next chapter.


9 How Do Beliefs Simplify Reasoning?

Introduction

In previous chapters, the discussion focused primarily on the rationality of credences. But according to an increasingly popular view in epistemology, human thinkers have two kinds of belief-like attitudes: credences, and outright beliefs. Outright beliefs (which are often also just called “beliefs”) are coarse-grained attitudes that don’t make reference to how probable or improbable things are.¹ It is commonly assumed that people can outright believe claims that they are not completely certain of, and that this is rationally permissible. In this chapter, I present a puzzle for the role of outright beliefs in reasoning, and propose a solution.² Why do we need outright beliefs in addition to credences? A popular explanation is that outright beliefs simplify our reasoning processes by allowing us to disregard small probabilities of error. For most ordinary empirical claims, even highly probable ones, we’re not completely certain that they are true. Forming an outright belief in such a claim lets us reason as if we had full confidence in it. This is useful in many contexts, because we can thereby dramatically narrow the number of possibilities we need to consider in reasoning and decision-making. I argue that this claim about the simplifying function of outright beliefs gives rise to a puzzle when combined with other plausible and commonly endorsed claims about outright beliefs. Defenders of such a dual view of belief usually assume that whether we rely on a high credence in some claim A or an outright belief in A in our reasoning can change from context to context. Moreover, in any given context, we can rely on mixtures of credences and

¹ The terminology varies a bit in the literature: some people call any ungraded belief an outright belief, regardless of whether its content is probabilistic. On this view, both a belief that it’s hot outside and a belief that it’s probably hot outside would count as outright beliefs. I use the term “outright belief” in a more limited sense here, which excludes ungraded beliefs towards probabilistic contents. Only the former example is an outright belief in my sense. When I use the term “belief” in this chapter, it is interchangeable with this understanding of “outright belief.” ² The material in this chapter was initially published in Staffel (2018).



outright beliefs, treating some claims as true and others as uncertain. This gives rise to the question of how thinkers manage their credences and outright beliefs across contexts, i.e. how they arrive at a given set of credences and outright beliefs to rely upon in a given context of reasoning. It has been proposed that thinkers do so by employing a strategy I call pseudo-conditionalization (PC). PC appears to be an attractive strategy for managing outright beliefs and credences across contexts, because it ensures that, in any given context of reasoning, the set of attitudes the thinker reasons with is coherent. The problem with PC is that it is psychologically unrealistic that human reasoners use it, since it is very computationally demanding. Hence we arrive at the following puzzle: How could it be true that the purpose of having outright beliefs is to simplify reasoning, when managing one’s outright beliefs requires such a complex strategy? I propose to solve the puzzle by rejecting the view that thinkers manage their beliefs and credences by employing PC. Based on this solution, I will furthermore argue for a descriptive and a normative claim. The descriptive claim is that the available strategies for managing beliefs and credences across contexts that are compatible with the simplifying function of outright beliefs can give rise to synchronic and diachronic incoherence in a thinker’s attitudes. By revealing possible tradeoffs between simplicity and coherence in reasoning, we gain a better understanding of why limited human reasoners fail to have ideally coherent doxastic states. Moreover, I argue that the view of outright belief as a simplifying heuristic is incompatible with the view that there are ideal norms of coherence or consistency for outright belief that outstrip human thinkers’ reasoning capacities. If the main purpose of outright beliefs is to simplify reasoning, then beliefs can’t be governed by norms that are so difficult to comply with that doing so prevents the beliefs from serving this function. In section 1, I present the ingredients of the puzzle, and explain in what sense they are puzzling when taken together. In section 2, I examine which of the claims comprising the puzzle ought to be rejected. In section 3, I explain how we can think of the simplifying role of outright beliefs in a way that avoids the puzzle, and I draw lessons from the proposed solution to the puzzle.

1. The Puzzle

In the current literature on belief, it is increasingly popular to assume that human thinkers have two kinds of belief-like attitudes: credences, and outright


beliefs (which are often just called “beliefs”).³ Outright beliefs are coarse-grained attitudes that don’t make reference to how probable or improbable things are. We ascribe outright beliefs to people by saying things like “Jane believes the store is still open,” or “James thinks that Paris is the capital of France.” One of the most important differences between outright beliefs and credences is how they behave in reasoning. If someone relies on an outright belief in A in reasoning, the person takes A for granted, or treats A as true. The possibility that ~A is ruled out. By contrast, if someone reasons with a high credence in A, they don’t take A for granted. The possibility that A might be false is not ruled out. To see the difference, consider someone who is selecting what to bring to a vegan brunch. Their reasoning and decision-making will be substantially different depending on whether they reason with an outright belief that the cookies contain no animal products, or whether they reason with a high credence that the cookies don’t contain animal products.⁴ If we accept that people in fact possess those two types of belief-like attitudes, we may then wonder: How are beliefs and credences in fact related in human thinkers? Which combinations of credences and beliefs are rational for people to have? The answers to these questions can of course come apart, since the first question is a descriptive question, and the second one a normative question. Most normative theories of belief deem an outright belief in some claim A to be permissible when a thinker is highly confident in A, but not necessarily certain of it (with some exceptions such as lottery propositions). The combinations of beliefs and credences we actually find in people seem to be roughly in line with these normative constraints. People seem to readily form beliefs in claims that they are not completely certain of, and rely on them in reasoning and communication, but refrain from forming beliefs when their confidence is too low (Weisberg forthcoming). This way of thinking about outright beliefs and credences gives rise to a natural way of incorporating outright beliefs into formal models of doxastic attitudes. In slogan format, the idea is that “belief is credence 1 in context.”

³ E.g. Buchak 2014a, Clarke 2013, Easwaran & Fitelson 2015, Greco 2015, Leitgeb 2016, Lin 2013, Lin & Kelly 2012, Ross & Schroeder 2014, Staffel 2016, Sturgeon 2015, Tang 2015, Weatherson 2016, Wedgwood 2012, Weisberg 2013, Weisberg forthcoming. For earlier discussions of this type of view, see e.g. Harsanyi 1985, Levi 1964. ⁴ This way of distinguishing between credences and beliefs according to their roles in reasoning might not be attractive to adherents of some types of interpretivist views of propositional attitudes. I leave it as a question for future research whether a version of the puzzle I present here arises for such interpretivist views.


Clarke (2013) and Greco (2015, 2017) argue that we should identify an outright belief with a credence of 1 in a claim, but allow that whether a claim is assigned credence 1 can vary between contexts (see also Harsanyi 1985). Similarly, Wedgwood (2012) argues that reasoners like us have theoretical credences (the credences that we adopt purely in light of our evidence), and practical credences. The latter are simplified versions of our theoretical credences. An outright belief is identified with a practical credence of 1. Tang (2015) also defends the view that outright beliefs can be identified with high probability estimates that get rounded up to 1. Hence, on this view, a credence of 1 in a claim A no longer exclusively represents a thinker’s (stable) certainty that A is true; it can also represent the attitude of treating A as true in a context of reasoning. I will adopt this modeling convention throughout the chapter, but the substance of my argument does not depend on it. While the view that human thinkers have both outright beliefs and credences has struck many philosophers as extremely plausible, it also raises an important question, which has sometimes been called the “Bayesian challenge.” Having both outright beliefs and credences creates a kind of redundancy in our attitudes that might seem unnecessary. Why have outright beliefs at all, when it seems like credences by themselves can do the job? This explanatory challenge is thought to have been first expressed by Richard Jeffrey (1970), who said:

There is a small literature on answers to the Bayesian challenge. A highly popular answer to the challenge is that human thinkers need outright beliefs in addition to credences in their inventory of doxastic attitudes, because outright beliefs help simplify reasoning (see e.g. Harsanyi 1985, Lance 1995, Leitgeb 2016, Lin 2013, Lin & Kelly 2012, Ross & Schroeder 2014, Tang 2015). Human reasoners’ cognitive limitations make it infeasible for them to use only credences in their reasoning, because it requires keeping track of many different possibilities, even if some of those possibilities are very improbable and could safely be ignored. This is where outright beliefs help. By letting reasoners treat highly probable claims as true, the number of possibilities that need to be considered is greatly reduced, thereby making reasoning problems more


tractable.⁵ We can illustrate this with a simple example. Suppose you are wondering how likely it is to rain during an upcoming tennis match. The problem is that you don’t remember where the tennis match will take place. You think it might be in New York or Boston or LA. Your credences are as follows:

c₁(NY) = 0.48
c₁(Boston) = 0.48
c₁(LA) = 0.04

Of course, how likely it is to rain during the match depends on where it will take place. You have the following conditional credences reflecting this:

c₁(rain | NY) = 0.7
c₁(rain | Boston) = 0.9
c₁(rain | LA) = 0.1

In order to correctly compute c₁(rain), you need to plug your conditional credences of the form c₁(rain | place of match) and your unconditional credences about where the match happens into the total probability theorem:

c₁(rain) = c₁(rain | NY) c₁(NY) + c₁(rain | Boston) c₁(Boston) + c₁(rain | LA) c₁(LA)
c₁(rain) = 0.772

This computation could be simplified if you disregarded the possibility that the match might be in LA, which you consider to be very improbable.

⁵ One might wonder why I have classified the “treating as true” attitude as being an outright belief, rather than the attitude of acceptance. Acceptances also allow reasoners to treat claims as true, and can thus help simplify reasoning processes. In the philosophical literature, acceptances and outright beliefs are commonly distinguished because they differ in how they are formed. Acceptance is usually taken to be under a thinker’s voluntary control. I can decide to treat any claim as true in reasoning, no matter how low my confidence in the claim is, while being fully aware that the claim might be false. By contrast, outright beliefs cannot be voluntarily adopted in the same way (although some philosophers think we have some amount of control over what we believe). Rather, whether or not we outright believe a claim is usually regulated by cognitive processes that are automatic. We seem to have some deliberative control over switching from relying on an outright belief to relying on a credence, for example by directing our attention to ways in which we might be mistaken. But we usually can’t employ deliberative control over which claims we take for granted in framing a reasoning problem; this is done automatically and without our conscious awareness. The Bayesian challenge, as I understand it, is the question of why our minds are equipped to employ outright beliefs, so construed, and this chapter discusses one possible answer and its implications.
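For readers who want to check the arithmetic, here is a small illustrative Python sketch that computes c₁(rain) by the total probability theorem from the credences just listed, and compares it with the simplified calculation, spelled out just below, that disregards the LA possibility. The variable names are invented for this example.

```python
# Illustrative check of the tennis example: exact computation via the total
# probability theorem vs. the simplified computation that ignores the LA option.

credences = {"NY": 0.48, "Boston": 0.48, "LA": 0.04}
rain_given = {"NY": 0.7, "Boston": 0.9, "LA": 0.1}

# Exact: c1(rain) = sum over places of c1(rain | place) * c1(place)
exact = sum(rain_given[place] * credences[place] for place in credences)
print(exact)  # 0.772

# Simplified: treat "NY or Boston" as true, split the credence equally, drop LA
simplified = rain_given["NY"] * 0.5 + rain_given["Boston"] * 0.5
print(simplified)  # 0.8, close to the exact answer
```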


If you reasoned with an outright belief that the match is in Boston or New York, which you consider to be equally likely, you could simply take the average between the rain probabilities in Boston and New York, which comes out to c₁(rain) = 0.8. Of course, this is not quite correct, but it’s very close to the correct answer. Formally, we might represent your reasoning as follows then, representing outright belief as credence 1 in context:

c₁(NY) = 0.5
c₁(Boston) = 0.5
c₁(LA) = 0.04

c₁(rain) = c₁(rain | NY) c₁(NY) + c₁(rain | Boston) c₁(Boston)
c₁(rain) = (0.7 + 0.9) · 0.5 = 0.8

This example is a very simple illustration of the general idea that reducing the number of possibilities under consideration simplifies reasoning problems, both of the theoretical and the practical kind. This observation helps explain why it makes sense for limited human reasoners to have outright beliefs in addition to credences: outright beliefs let us eliminate improbable options from consideration in framing reasoning problems, thus making them easier to solve. We will capture this response to the Bayesian challenge in claim (1):

(1) Human thinkers have outright beliefs in addition to credences because outright beliefs help simplify reasoning processes.

In order to show that (1) leads to a puzzle, I need to introduce three further claims. I will state them first, then explain why each of them is plausible, and then argue that they give rise to a puzzle when combined with (1).

(2) Reasoning processes can involve mixtures of credences and outright beliefs, and it is flexible from context to context which outright beliefs are relied upon.
(3) Outright beliefs and credences in a context are determined by pseudo-conditionalizing on a set of background credences.
(4) Pseudo-conditionalizing is difficult to execute for human reasoners, because it is computationally expensive.

Claim (2) consists of two sub-claims, which we might call Mixing and Switching. According to Mixing, any given reasoning task can involve both outright beliefs and credences. The tennis match example illustrates this claim: you rely on an outright belief in the disjunction that the match is in New York


or Boston, but you also rely on various conditional and unconditional credences when determining how confident you should be that it will rain during the match. Many, perhaps even most, practical and theoretical reasoning problems have this structure, in which some contingent claims are treated as true, and some claims are treated as uncertain in generating an answer to the problem. According to Switching, whether or not we rely on a credence or on an outright belief in a relevant claim can change from context to context. (In a given context, we obviously have to rely on one or the other, not both.) Such a change can be triggered, for example, by a change in the stakes of a situation. Suppose you are talking to your friend about when you will be able to meet her. You believe that you can take the bus to arrive at her house at 6 pm, and you tell this to your friend. Your friend might then make you aware of the fact that it would be very bad if you came later than 6 pm. Realizing that an error would be costly, you might no longer want to rely on your outright belief that the bus arrives at 6 pm, and instead take into consideration the possibility that the bus might be late and not get there by 6 pm. One plausible way to think about what happened here is that you switched from relying on an outright belief to relying on a high credence in the claim that the bus arrives at your friend’s house at 6 pm. Again, it seems common to switch between treating a claim as true and treating it as merely highly likely when the context changes, for example when a new aspect of the situation becomes salient, an error possibility comes into focus, or the stakes change. This type of phenomenon is well documented in the philosophical literature (see, e.g. Hawthorne 2004, Greco 2015, 2017 for discussion, and especially Nagel 2011 for an empirically informed account of switching). The phenomena of Mixing and Switching lead to a question, to which claim (3) of the puzzle offers an answer. The question is: How do thinkers (or better, their cognitive systems that execute this task) manage the shifts between different contexts of reasoning, and determine which credences and outright beliefs to rely upon in a given context? Managing these attitude shifts requires solving two problems. The first problem that needs solving is to figure out which claims to treat as true and which claims to treat as uncertain in a given context. In other words, the thinker (or their cognitive system) must figure out which probabilities of error can safely be ignored in a given context of reasoning. Obviously, a thinker can’t run an expected utility calculation to decide which error possibilities can safely be ignored, since this would defy the purpose of simplifying the reasoning problem. Hence, a different mechanism


must be at work. This problem has received some attention in the philosophy literature; see for example Lin (2014).⁶ But even if the problem of choosing which outright beliefs to rely upon in a given context has been solved, there’s a second problem, which will be my focus in what follows. The problem is whether and how to readjust one’s remaining doxastic attitudes once it is settled which possibilities are considered live in a given context. This question has not received much attention in the literature. To my knowledge, there is only one proposal for how thinkers might handle these adjustments, which was made by Harsanyi (1985), and more recently (and independently of Harsanyi) by Clarke (2013).⁷ They propose that thinkers engage in a strategy I will call “pseudo-conditionalization” for settling on a particular set of attitudes in a given context. The idea is that thinkers have a global credence function that consists of conditional credences over a set of possibilities. In different contexts, thinkers treat different possibilities as live. By pseudo-conditionalizing on the possibilities they consider live in a given context, it is determined what a thinker’s attitudes are in a context. If something is not considered a live possibility in a context, we treat it as if the thinker assigns it credence 0, and its negation is assigned credence 1, and thus believed in that context. The thinker’s remaining credences are adjusted as if the thinker were conditionalizing on the live options. In shifting between contexts, the thinker can expand or contract the set of possibilities they rule out, and hence change the set of beliefs and credences they rely on. When I introduced the tennis match example, I already introduced the pseudo-conditionalization strategy, without calling it that. We assumed that the thinker’s full possibility space included three options for where the tennis match might take place, and that they disregarded the LA option because it was so improbable. I modeled this as the thinker’s updating their credence in (Boston or NY) from 0.96 to 1, and readjusting their credences accordingly, from 0.48 to 0.5 in each disjunct. Assuming that the thinker’s conditional probabilities remain fixed across contexts, it is now easier for them to answer the question of whether it will rain during the match. It is easy to see why I choose to call this strategy “pseudo-conditionalization.” Ordinary conditionalization is the standard Bayesian strategy for rationally

⁶ As an anonymous referee helpfully points out, this problem is closely related to the so-called frame problem in AI. A good survey is provided by Shanahan (2016). ⁷ I also talk about this problem in my discussion of whether Pettigrew’s accuracy-first program is incompatible with a view that takes thinkers to have both credences and outright beliefs. In his reply, Pettigrew considers some alternatives to pseudo-conditionalization, but finds them problematic (Staffel 2017a, Pettigrew 2017b).


updating one’s credences on newly learned evidence. The rule says that if you learn that some claim A is true with certainty (i.e. c(A) becomes 1), then your new credence in any claim B should be your old credence in B conditional on A, i.e. c_new(B) = c_old(B|A). Pseudo-conditionalizing works in the same way, except that the thinker doesn’t set their credence in a claim A to 1 because they have learned that it is true, but instead because they are ruling out the possibility of ~A for the purposes of reasoning in that context.⁸ The pseudo-conditionalization strategy seems like an attractive solution to the problem of how thinkers manage their credences and outright beliefs across contexts. It inherits its attractiveness from the appeal of regular conditionalization. Regular conditionalization ensures that a thinker’s updated credences are appropriately related to their prior credences, and that the resulting credences are always probabilistic (assuming the starting credences were probabilistic, too). Both of these points are important in explaining the appeal of pseudo-conditionalization. A thinker who reduces the complexity of a reasoning task by ruling some improbable options out from consideration is presumably still interested in otherwise maintaining the relative proportions of their credences, which reflect both what evidence they have already gathered, and what their take on the impact of this evidence is. Moreover, there are many arguments for the claim that synchronically incoherent attitudes are

⁸ The way I have characterized PC might create the impression that proponents of PC are committed to a particular view about how credences and beliefs are stored and/or generated in the mind. The view that seems to naturally accompany PC is one on which some set of background credences are the most fundamental attitudes that are stored by the thinker, on the basis of which outright beliefs are generated when they are needed in a given context. Yet, this view does not seem particularly psychologically realistic. Fortunately, on closer inspection, proponents of PC are not actually committed to it. Proponents of PC are merely committed to the idea that thinkers’ doxastic attitudes are representable as being generated by PC. This is compatible with different theories of how credences and outright beliefs are stored and generated in our minds. One view that seems relatively uncontroversial in the empirical and philosophical literature is that some of our doxastic attitudes are stored explicitly, whereas others are stored implicitly. Explicitly stored attitudes are immediately and easily available for cognitive processes, whereas implicitly stored attitudes have to be inferred from other attitudes in order to make them available for cognitive processes (see e.g. Harman 1986, Kirsh 2003). Yet, this still leaves open a variety of views about which attitudes are explicitly stored. Do we explicitly store just some of our credences, or some of our beliefs, or some of both? Weisberg (forthcoming) has recently argued for yet another option, namely that we store information, and depending on the situation at hand, this information can be called up as an outright belief or as a credence. Weisberg argues that research on memory beliefs supports this view as the most psychologically realistic theory of how we store and generate outright beliefs and credences. I discuss this view further in section 3. What matters for our purposes is that any of these positions is in principle compatible with the claim that the evolution of a thinker’s credences and beliefs across contexts is representable by pseudo-conditionalization. Different views will differ in their explanations of what the underlying cognitive processes are that give rise to the thinker’s attitudes being so representable, but differences in these explanations don’t affect the argument against PC I make in what follows.


rationally defective. Hence, a simplification strategy that renders the resulting set of attitudes probabilistically coherent (at least when the starting credences are probabilistic) is desirable. PC perfectly accomplishes both of these tasks, so it seems like the obviously best strategy for managing beliefs and credences across contexts. This brings us to the last claim in the puzzle. According to (4), pseudo-conditionalizing is difficult to execute for human reasoners, because it is computationally expensive. The justification for (4) is based on results from research in psychology, computer science, and complexity theory, all of which have investigated how demanding it is to update probability functions by conditionalizing. While psychologists have not studied PC in particular, they have studied whether reasoners’ updates on newly learned evidence are representable as updates by conditionalization. Since PC is the same rule as conditionalization, except that the shifts to credence 1 are not evidence-based, psychological research about conditionalization can help us evaluate the feasibility of PC for human thinkers. Relevant psychological evidence includes the following findings: In reasoning tasks that ask subjects to estimate the posterior probability of a hypothesis on the basis of relevant evidence, subjects’ estimates tend to be too conservative, i.e. they underestimate how much the evidence impacts the probability of the hypothesis (Edwards 1982, Slovic & Liechtenstein 1971). There have also been studies comparing people’s ability to judge the degree to which a piece of evidence confirms a hypothesis and people’s ability to judge the posterior probability of a hypothesis given a piece of evidence. (These can come apart: a piece of evidence can have high confirmatory force, yet not lead to a very high posterior probability.) People turn out to be better at judging confirmation than posterior probabilities (e.g. Mastropasqua et al. 2010, Tentori et al. 2016). In investigating people’s conditional probability judgments, it has been found that subjects don’t strictly conform to the rigidity requirement, which prescribes that conditional credences should be stable (Zhao & Osherson 2010). Moreover, conditional probability judgments differ (contrary to what would be rational according to Bayesian models) depending on whether subjects learn that some claim p is true, or are asked to suppose that p is true (Zhao et al. 2012). People’s conditional probability judgments also don’t align perfectly with their unconditional probability judgments as required by the standard ratio formula (Zhao et al. 2009; see also Evans et al. 2015 for further helpful discussion and references). Taken together, the findings in the psychology literature suggest that the way in which people update on newly learned evidence often approximates conditionalization, but mostly doesn’t match


it perfectly. It is an ongoing research program to identify the reasoning strategies that give rise to the observed updating patterns, but most researchers share the working hypothesis that the underlying reasoning strategies are not equivalent to the conditionalization rule; rather, it seems plausible that reasoners use simplified strategies that approximate conditionalization closely enough. The hypothesis that reasoners use simpler updating strategies than standard conditionalization gains support from results in computer science and complexity theory. If the conditionalization rule were relatively simple to implement from a computational point of view, then it would be puzzling why human reasoners don’t conform to it. Yet, it turns out that conditionalization can be very complex to execute, which explains why limited reasoners may rely on simpler rules instead. The most general and abstract result that is relevant for our purposes is that probabilistic inference, including conditionalization, is NP-hard (Cooper 1990). For a problem to be in the complexity class NP, it must be the case that it can be solved by a non-deterministic Turing machine in polynomial time. In other words, the maximum amount of time it takes a non-deterministic Turing machine to solve the problem is determined by a polynomial function of the size of the input. A task is called NP-hard if it is at least as hard as any NP problem, and possibly harder. Hence, Cooper’s result tells us that the general task of determining the probability of some claim A based on a body of information is at least as hard as any problem in the complexity class NP. This is to say that a thinker’s credences (or a Bayesian network, which is Cooper’s framework for representing probabilistic information) may contain all the necessary information to determine c(A), but many computational steps might be required to actually arrive at the result.⁹ This is of course not to say that every single probabilistic reasoning task is this complex. This result about NP-hardness is a result concerning the complexity of probabilistic reasoning (including conditionalization) in general, not a result about particular instances of reasoning. Whether a particular reasoning problem is difficult depends on the information that is explicitly available to the thinker, and the kinds of computations required for arriving at the relevant

⁹ According to Cooper, the hardest probabilistic inference problems are ones that are representable by multiply connected belief networks that contain many uninstantiated variables. This means that the probability of any given claim in the network can depend on the probabilities of more than one other claim, and that many claims in the network have non-extreme probability values. Probabilistic reasoning tasks are easier when they are representable by singly connected networks, or when the truth-values of many of the claims in the network are known. The latter finding is in agreement with the thesis that outright beliefs simplify reasoning, since they let the reasoner assume that their truth values are known.


answer. Consider the tennis example again, in which the thinker wants to answer the question of how likely it is to rain during the match. If the thinker already knew, for example, how likely it was not to rain during the match, the inference to the probability of rain would be trivial. Similarly for pseudo-conditionalization: If the thinker already had all of the conditional probabilities needed for pseudo-conditionalizing in any context readily available, then pseudo-conditionalizing would not be very difficult, because PC would basically require a trivial one-step inference.¹⁰ But this assumption is not psychologically realistic. Humans have some conditional and unconditional credences readily available to them, whereas others can only be generated by inferential processes (Kirsh 2003). If thinkers had the conditional credences needed for pseudo-conditionalizing readily available to them, then they should also have them readily available for use in regular conditionalization. Yet, as we know from the psychological research I briefly summarized above, human thinkers at best approximate correct Bayesian inferences, including conditionalization, in their reasoning. It is a matter of active debate how we can characterize the algorithm human thinkers use when reasoning with and updating their credences, but there seems to be little doubt that it is some kind of heuristic and not a full Bayesian algorithm. Hence, if being a perfect conditionalizer is infeasible for a human thinker, being a perfect pseudo-conditionalizer is, too. We now have the resources to see why claims (1)–(4) together generate a puzzle. While they are not inconsistent, it is hard to see how they could be true at the same time. Claim (1) asserts that the reason why human thinkers are equipped with outright beliefs in addition to credences is that outright beliefs simplify reasoning tasks, and thereby make them more tractable for limited beings like us. But if human thinkers have to use pseudo-conditionalization for managing their credences and outright beliefs across contexts, then the computational cost of using this strategy is likely to significantly diminish the computational benefits from having outright beliefs in the first place. Hence, it

¹⁰ A similar picture is suggested for consideration by an anonymous referee. If humans had all of their conditional credences explicitly stored, as well as all the claims to which they assign credence 1 or 0, then any unconditional credence could be arrived at via a trivial one-step inference. On this view, computational effort is reduced by drastically increasing the amount of information that needs to be stored by the thinker. The thinker needs to have a conditional credence “ready to go” for every possible evidential situation she might find herself in. Human thinkers don’t have this kind of extensive store of conditional credences, so this way of making PC feasible doesn’t work for us. This problem was already recognized by Harman, who argues that humans can’t update their credences via conditionalization (1986, p. 25). However, Harman mistakenly assumes that this is the only way in which reasoning with degrees of belief could be implemented by humans, and on this basis, rejects the possibility that humans can reason with credences altogether.


is very implausible that the claims in the puzzle are jointly true. In the next section, I will examine which of the four claims should be rejected.
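Before moving on, it may help to have the formal operation in front of us. The following is a minimal illustrative sketch, in Python, of pseudo-conditionalization as described above: the credence in the live possibilities is renormalized exactly as in ordinary conditionalization, and excluded possibilities are treated as having credence 0. The function name and data layout are invented for the example, and, in keeping with the point made earlier, nothing here is meant as a model of an actual cognitive process, only of how the resulting attitudes can be represented.

```python
# Sketch of pseudo-conditionalization (PC): formally the same operation as
# conditionalization, but the "evidence" is just the set of possibilities the
# thinker treats as live in the current context. Purely illustrative.

def pseudo_conditionalize(credences, live):
    """credences: dict mapping mutually exclusive possibilities to probabilities.
    live: the possibilities treated as live in this context (the rest get 0)."""
    total_live = sum(p for w, p in credences.items() if w in live)
    return {w: (p / total_live if w in live else 0.0) for w, p in credences.items()}

# Tennis example: ruling out LA for the purposes of the current context.
background = {"NY": 0.48, "Boston": 0.48, "LA": 0.04}
in_context = pseudo_conditionalize(background, live={"NY", "Boston"})
print(in_context)  # {'NY': 0.5, 'Boston': 0.5, 'LA': 0.0}
```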

2. Solving the Puzzle

The first claim in the puzzle states that people have outright beliefs in addition to credences in their inventory of doxastic attitudes, because they help rein in the complexity of our reasoning. There are different strategies for denying this claim. The first option is of course to deny its presupposition, i.e. that people have outright beliefs at all. On such a view, which has been defended by Jeffrey (1970) and Pettigrew (2016) among others, there is no puzzle to begin with, because the question of what beliefs are for doesn’t arise. This solution is of course unattractive for the many defenders of the view that we have outright beliefs. Having been convinced by arguments for this claim, such as Weisberg’s (forthcoming), I will proceed on the assumption that human thinkers do in fact have outright beliefs that can play a role in their reasoning and decision-making, and note that this view is not universally endorsed. One might also consider challenging the idea that cutting down the space of possibilities simplifies reasoning tasks. Yet, since this result has a solid footing in complexity theory (Cooper 1990), this response is implausible. This leaves us with the option of denying claim (1) by arguing that human thinkers have outright beliefs for a different reason. On such a view, their main function is something other than simplifying reasoning. Some alternative explanations of the function of outright beliefs have recently been proposed in the literature. One view is that outright beliefs are necessary as a basis for moral judgment. Buchak (2014a) argues that merely being very confident that, say, Hans didn’t pay for his concert ticket, is not sufficient for judging that Hans did something wrong. This is because high confidence can sometimes be justified based on purely statistical evidence (for example evidence that most concert attendees used forged tickets), but purely statistical evidence is an intuitively (and legally) insufficient basis for a judgment of wrongdoing. Buchak argues that outright beliefs are sensitive to different evidence types than credences, and that one can only rationally form an outright belief in a claim if one possesses non-statistical evidence in its support. Hence, a rational outright belief in a claim A can form the basis of a moral judgment, since it ensures that the thinker possesses non-statistical evidence in support of A. By contrast, a high credence in A may be based on purely statistical evidence, and thus cannot always be an appropriate basis for


a moral judgment. While I am sympathetic to Buchak’s argument that purely statistical evidence is insufficient as a basis for moral judgment, her position is less convincing as a response to the Bayesian challenge. The worry, in brief, is this: Suppose high confidence based on non-statistical evidence can rationalize outright belief, and outright belief is the appropriate basis of moral judgment. Why, on this view, is outright belief needed as a middleman? Why can’t high confidence based on non-statistical evidence directly support moral judgments? This problem with Buchak’s view is especially salient when we consider her analogy with legal cases. As Moss (2018) points out, in civil cases, the “preponderance of evidence” standard applies, which means that in order to win a case, the relevant party in the trial must show that the evidence makes it more than 50 percent likely that they are right. The evidence in question that gives rise to this probability must not be purely statistical. In a case that is decided according to this standard of proof, the resulting probability might not be high enough to warrant belief that the winning party is right. Hence, what matters for whether a legal judgment can be reached in this kind of case is whether the relevant claim can be established to be more than 50 percent likely based on the right kind of evidence, not whether one can form a justified outright belief in the relevant claim. While Buchak argues compellingly that both moral judgments and legal judgments should not be based on purely statistical evidence, it does not follow that the main function of outright beliefs is to ground moral (or legal) judgments. Another strategy for explaining the function of outright beliefs appeals to the importance of knowledge. Outright belief is usually seen as a necessary condition for knowledge; hence, the importance of having outright beliefs could be justified by appealing to the value of having knowledge. A closer look at how I’ve defined credences and outright beliefs for the purposes of this chapter shows that this response is unpersuasive. I’ve defined credences as any doxastic attitudes that encode uncertainty, and outright beliefs as doxastic attitudes that don’t encode uncertainty. So defined, credences can constitute knowledge on any view of knowledge. It is controversial whether credences, understood as graded attitudes towards simple contents, can constitute knowledge (see Moss 2018 for an argument that they can). Yet, this is not my definition of a credence. On my view, a belief that A is likely falls into the category of credences, and no one disagrees that this type of attitude is a candidate for knowledge (Moss 2018, Hawthorne & Stanley 2008, Hawthorne 2004, Williamson 2000). On this view, the Bayesian challenge can simply be reformulated by asking why a thinker whose credences constitute knowledge needs outright beliefs (i.e. beliefs not encoding uncertainty).


While there may of course be other responses to the Bayesian challenge that I haven’t considered here, I will for now conclude that we shouldn’t resolve our puzzle by rejecting claim (1). The remainder of the chapter thus assumes that the reason why human thinkers have outright beliefs is that they simplify reasoning. The second claim in the puzzle states that people can reason with mixes of outright beliefs and credences, and that which outright beliefs we rely on can shift between contexts. There are different ways in which we could deny this claim. One way to deny it is to argue that it is fixed or largely fixed across contexts which outright beliefs we reason with. Another way to deny it is to argue that people always reason with either outright beliefs or credences, but never with mixes of them. The “no mixing” claim seems extremely implausible from the point of view of a defender of a dual account of the nature of belief. There are plenty of examples of reasoning processes that are most naturally described as involving such mixed attitudes. When we reason with claims we deem uncertain, we usually do so in light of background assumptions we simply treat as true. For example, take the following line of reasoning: “Janet will cook Indian food or Chinese food for dinner. But she probably won’t make Indian food today, because we had that last time. Thus, she’ll probably make Chinese.” This argument is naturally described as having an outright belief in a disjunction as its first premise, and credences as its second premise and conclusion. It is a perfectly ordinary instance of how people reason. Hence, the “no mixing” claim is implausible. Interestingly, there are some formal frameworks that seem to be designed to validate something like “no mixing.” For example, Leitgeb’s stability theory of belief is built in such a way that a coherent thinker who uses standard conditionalization to reason with their credences, and AGM belief revision theory to reason with their outright beliefs, will always be guaranteed to have rationally permissible combinations of outright beliefs and credences. But the framework is not designed to allow mixed reasoning with beliefs and credences (Leitgeb 2016). Similarly for Lin and Kelly’s odds threshold rule for reasoning with beliefs and credences (2013). However, I don’t think the proponents of these frameworks are taking themselves to be committed to making empirical claims about whether people can use mixtures of beliefs and credences in their reasoning. In sum, I can’t see a good reason why any proponent of the view that people have credences and outright beliefs would want to argue for “no mixing.” A more plausible way of denying the second claim of the puzzle is to argue that while we can technically change what we take to be live possibilities in


different contexts, those changes rarely ever happen. And if we hardly ever need to pseudo-conditionalize to switch contexts, having to do so does not make excessive cognitive demands on thinkers like us. We could get the benefit of limiting the space of possibilities we need to engage with within a context, while hardly ever paying the cost of pseudo-conditionalizing. Whether this response succeeds is largely an empirical question. It depends on how often we shift between taking a claim for granted and not taking it for granted. I am not sure how to assess whether this happens infrequently enough to assuage the worry that context-shifting, and thus belief-shifting, via PC is too cognitively demanding. Yet, it is worth noting that endorsing this response is at odds with the motivations that underlie the "belief is credence 1 in context" view in the first place. Moreover, there is a large literature on how to accommodate context shifts, and the proposals range from views that propose shifty standards for which beliefs count as rational (contextualism and subject-sensitive invariantism fall in this category), to views that endorse shifty norms for what attitudes can be relied upon in a given context (see e.g. Levin 2008, Brown 2008). All of these views are supposed to be motivated by the observation that shifts between what we treat as true and what we treat as merely highly likely happen frequently, and not just on very rare occasions. Hence, it would seem odd if proponents of these "shifty" views were attracted to a solution to the puzzle that emphasizes how rarely our beliefs undergo contextual shifts. I don't want to rule out that this might be the correct solution to the puzzle, but it doesn't seem to be an attractive route given the motivations that led to the view of belief that gives rise to the puzzle in the first place. Also, the empirical explanation of switching offered by Nagel (2011), which says that people tend to switch to reasoning with credences when they shift from a more automatic to a more deliberate way of thinking, does not suggest that it is particularly rare.¹¹

This leaves us with the third and fourth claims of the puzzle—that we pseudo-conditionalize on a set of background credences to generate our beliefs

¹¹ An anonymous referee points out that some views of belief, such as the belief-as-plan view (see e.g. Dallmann 2017) seem hostile to the idea that we often switch between reasoning with beliefs and credences. The idea is that beliefs are stable in the sense that they are resistant to being constantly reconsidered, which helps us manage our cognitive load. Dallmann shows that it is in fact beneficial for limited thinkers to ignore evidence after a certain point, when it doesn't seem like considering additional evidence will have much impact on one's beliefs (or credences). However, the claim that we should sometimes switch from reasoning with outright beliefs to reasoning with credences is consistent with the idea that we should ignore additional evidence in some conditions. While Dallmann doesn't talk about switching in particular, he explicitly mentions that the belief-as-plan view is compatible with the view that human thinkers have both credences and beliefs. Hence, I don't take there to be an immediate conflict between views like Dallmann's and the view presented here.


and credences in a context, and that doing so is psychologically difficult because it is computationally demanding. The claim that conditionalization is a computationally and cognitively demanding process is well supported by evidence from cognitive science and complexity theory, as I explained in the previous section. This evidence readily applies to pseudo-conditionalization, which is the same formal procedure as conditionalization. Hence, we don’t have good grounds to question the fourth claim in the puzzle.¹² Instead, I propose that we challenge the claim that we determine our outright beliefs and credences in a context by pseudo-conditionalizing. One might think at this point that we already have conclusive grounds to reject (3), since we have good reason to think that PC fails to be an accurate description of how thinkers manage their credences and beliefs across contexts. But (3) need not be read as making a descriptive claim about PC being an accurate formal representation of a psychological process. (3) can also be understood as claiming that PC is a normative principle that prescribes how ideally rational thinkers should manage their credences and outright beliefs. If PC is a normative principle, it is one that is too demanding to be complied with by ordinary human thinkers, as we just saw. But perhaps this is unproblematic: other Bayesian principles of rationality, such as the requirement to have

¹² According to recent psychological research, human thinkers can engage in different types of reasoning, sometimes called “System 1 reasoning” and “System 2 reasoning.” While the exact details of these views are controversial, the idea is that System 1 reasoning is fast, automatic, and can handle large quantities of information. System 2 reasoning, by contrast, is under the thinker’s deliberative control, takes up working memory, and can handle only small quantities of information (for some helpful recent discussion of the distinction, see Kahnemann 2011, Evans & Stanovich 2013, Mugg 2016). It has been suggested to me that perhaps the puzzle can be dissolved as follows: Since System 2 reasoning is constrained by the limitations of working memory, we need outright beliefs in order to simplify this type of reasoning. However, the processes that determine the credences and outright beliefs that a given act of System 2 reasoning employs are executed by System 1, which can handle more complex processing tasks. Hence, it is not a problem if the strategies for managing our outright beliefs and credences are complicated, because they are handled by System 1 processes. If this view were correct, it would ease the tension between the claim that outright beliefs have the function of simplifying reasoning, and the claim that PC is a very complicated strategy for managing outright beliefs and credences. Unfortunately, this response to the puzzle is not empirically plausible. While it is plausible that the processes by which outright beliefs and credences are selected for use in a given reasoning task are automatic and don’t use working memory, it is not plausible to assume that our System 1 computational resources can handle PC. The psychological evidence we have about whether human reasoners are conditionalizers allows for people to rely on System 1 reasoning in solving the relevant inference tasks. As explained earlier, the research shows that people at best approximate Bayesian reasoning in a lot of cases, but they are not perfect conditionalizers. If our automatic reasoning capacities can at best approximate conditionalization, they can at best approximate pseudo-conditionalization, as those are formally equivalent rules of inference. Moreover, it is also not empirically plausible to assume that outright beliefs only play a role in simplifying reasoning tasks that are executed deliberately and employ working memory. Most of our everyday reasoning is executed by automatic processes, and these processes need to treat many highly likely claims as true in order to make the necessary computations feasible.


probabilistically coherent credences, are also unattainable by ordinary thinkers, but this does not disqualify them. I've argued that they can be characterized as rational ideals, which can function as regulatory aims for non-ideal thinkers. Unfortunately, the move toward interpreting PC normatively is unsuccessful, as I will now argue.

There is a crucial difference between requirements such as coherence on the one hand, and PC, viewed as a normative requirement, on the other hand. On the view expressed by claim (1) in the puzzle, the ability to reason with outright beliefs is a kind of heuristic that limited thinkers are equipped with to better cope with their limitations. Credences, by contrast, are not viewed as having a heuristic function in the same sense. Their role is independent of the cognitive limitations of the thinker who has them. But if outright beliefs, unlike credences, are essentially a heuristic tool, then whatever norms apply to them can't plausibly undermine their heuristic function. In other words: If there is a norm that is supposed to apply to some state or activity, the norm can't possibly be a true norm if it prevents the state or activity from serving its purpose or functioning correctly.

To see this point more clearly, we can consider some analogies to other devices or activities that are intended to function as heuristics. First, imagine that Maximus University has recently made bikes available in various places, which are supposed to help students travel more quickly across the large campus. Now, suppose further that there is a rule for riding bikes on this campus, which states that to prevent accidents, riding faster than normal walking speed is not permitted. It's easy to see that if this were the rule for how to properly ride bikes on campus, there would be no point in having the bikes in the first place, because if people followed the rule, their travel times across campus would be just the same as if they walked. Hence, the bikes can't serve their purpose of making it faster to get across campus if bike riding is governed by this norm.

Another good example is a recent advice video that was posted on the Food Network's Instagram page.¹³ The aim of the video was to give the audience a "life hack" for making sandwiches in a more practical way. Here's the "improved" method of making peanut butter sandwiches the video recommends: Take a piece of parchment paper and add a big spoonful of peanut

¹³ https://www.youtube.com/watch?v=T7DsHMga92g Readers who are looking for additional illustrations of this point may try googling “pointless life hacks.” The “Quick Tips” section in the magazine Cook’s Illustrated is also a rich source of examples.


butter. Fold the paper over, and roll over it with a rolling pin to flatten the peanut butter between the parchment. Freeze overnight, then cut into squares with kitchen shears. Then peel the paper off the peanut butter squares and put them on top of your bread. The "life hack" proposed in the video quickly became the subject of ridicule online. People pointed out that instead of making it easier to prepare a peanut butter sandwich, which is generally the point of a "life hack," this method actually made it more difficult and time consuming. Hence, the frozen parchment method can't possibly be an improvement over the standard method for making peanut butter sandwiches, since it does not simplify the task.

The analogy to managing our beliefs via PC is easy to see: the point of having beliefs is supposed to be that they help simplify reasoning. Any plausible norm that governs managing our beliefs and credences across contexts must be compatible with beliefs serving their intended function. But since PC is too complicated for human reasoners to use as a managing strategy for their beliefs and credences, it is not a plausible norm governing this activity, since it hinders outright beliefs' function of simplifying reasoning. Just like the speed limit prevents bikes from serving the purpose of decreasing travel times across campus, and the parchment paper method prevents people from efficiently making peanut butter sandwiches, requiring thinkers to manage their credences and beliefs via PC prevents outright beliefs from effectively simplifying reasoning processes. Hence, interpreting claim (3) in the puzzle normatively instead of descriptively does not enhance its plausibility.

In sum, I've argued that we should resolve the puzzle by rejecting claim (3). It is implausible regardless of whether we interpret it as a descriptive claim or as a normative claim. But this means that the defender of the remaining claims in the puzzle, and in particular claim (1), lacks an explanation of how our credences and outright beliefs are determined in a given context of reasoning. Of course, it is ultimately an empirical question how human thinkers manage their beliefs and credences across contexts. My aim is not to resolve this empirical question here. Instead, I will consider some possible strategies humans might use to manage their beliefs and credences, which suggest that simpler reasoning strategies often carry the cost of leading to synchronically and diachronically incoherent attitudes. I will moreover argue for the normative claim that my solution to the puzzle has implications for the debate about normative constraints on belief—endorsing the heuristic view of outright belief requires that putative norms on belief must be vetted for their complexity and feasibility for human reasoners.


3. The Descriptive and Normative Consequences of Rejecting PC

Using PC is infeasible for limited human reasoners because it requires that we update the remaining credences whenever the number of possibilities that are considered live is adjusted. This is an excessively taxing task for thinkers like us, who don't store all of our beliefs and credences explicitly. Hence, human reasoners must be relying on a simpler procedure for managing their beliefs and credences. We know from the study of approximative Bayesian reasoning that any rule that is simpler than PC will fall short of producing the same outcome as PC in at least some contexts, even if it is some kind of approximating algorithm (Kwisthout et al. 2011, see also Predd et al. 2008). It is of course ultimately an empirical question how people in fact manage their beliefs and credences across contexts. Which procedure works well for human thinkers depends, among other things, on facts about their cognitive architecture into which we currently have very little insight. Still, thinking about different possible strategies for managing beliefs and credences across contexts can give us some clues about their potential costs and benefits for human thinkers.

We observed earlier that PC has some desirable features. Thinkers who are representable as pseudo-conditionalizers have stable conditional credences, which means that their assessment of the impact of the evidence they may receive doesn't change. Moreover, employing PC guarantees that a thinker has coherent credences within each context of reasoning. Still, using PC inevitably introduces some diachronic incoherence, in the sense that the thinker changes her credence without receiving new evidence. This of course happens by design, since the whole point of PC is to treat options as ruled out when the evidence deems them to be merely unlikely. Hence, no alternative procedure would avoid introducing this kind of diachronic incoherence. By contrast, alternative strategies for managing beliefs and credences might introduce instability (and thus incoherence) in the thinker's conditional credences across contexts, as well as incoherence within a context of reasoning.

To see more concretely how this might play out, I will consider two slightly different proposals that have recently been made in the literature about how credences and beliefs are stored by human thinkers, and see what kinds of procedures for managing beliefs and credences across contexts might naturally pair with them. One such proposal has been made by Norby (2015), who argues for the following view: Thinkers represent a space of possibilities, of which a subset is selected by some automatic filtering process to be considered


in a specific context of reasoning. Once such a subset is selected, credences are then assigned to the possibility space under consideration. For example, if the possibility space that is called up includes the possibility that I have milk in my fridge, but not the possibility that I don’t, then it is treated as true (and thus assigned credence 1 in context) that I have milk in my fridge. It is a bit less clear how Norby thinks intermediate credences are determined in a given context. He suggests that the underlying full possibility space does not include an assignment of credences. Rather, different possibilities in the space have different probabilities of being selected for consideration in a given context. However, this is of course not the same thing as the thinker’s credence in a possibility: The fact that a possibility is likely to be included for consideration in a variety of contexts is compatible with it being given low credence in those contexts. Norby emphasizes that the same possibility can receive different credence assignments in different contexts, but it remains a bit unclear what determines a possibility’s non-extreme credence assignment in a given context. Weisberg (forthcoming) offers a more concrete answer to this question. He shares the view that thinkers don’t explicitly encode beliefs or credences, but instead represent possibilities or pieces of information at a fundamental level. Citing empirical evidence about the recall of memories, he suggests the view that outright beliefs and credences are generated partly on the basis of features of the recall process itself. He explains: Confidence in memory-based beliefs appears to be constructed at the time of recall, rather than stored. If you’re asked what the capital of Iceland is, the more easily the answer (Reykjavik) comes to mind, and the more related information comes to mind, the more certain you will be that your answer is correct. So your confidence that Reykjavik is the capital of Iceland doesn’t appear to be stored in memory, at least not directly. [ . . . ] If nothing about Iceland is stored in memory, nothing will come to mind at the time of recall, and you will have virtually no confidence that the capital of Iceland is a place called “Reykjavik”. (p. 26)

Of course, Weisberg’s view refers specifically to memory-based beliefs, so we need to be careful not to overgeneralize it. Combining Norby’s and Weisberg’s views, we arrive at the following rough and ready picture of our mental architecture: We represent a space of possibilities, and in a given context of reasoning, a subset of this space is selected for consideration. Some possibilities are recalled without their complements—they constitute what is believed


or disbelieved in that context. Factors like salience and easy recall presumably influence which possibilities are awarded this status. Other possibilities are given less than full confidence, and how much confidence they get is at least sometimes and at least partly determined by how smoothly we can call up these possibilities and what evidence is recalled in favor and against the relevant possibility.

Accepting this picture as our working hypothesis, we can see several ways in which deviations from PC might crop up. First, it is not clear how conditional credences enter the picture. If they are not stored or generated in a stable manner, but also made up "on the fly", then we might encounter diachronic incoherence in the thinker's conditional credences, i.e. they might change between contexts. This would constitute a deviation from PC. Another type of diachronic incoherence has been observed to occur when possibilities are represented with a different fineness of grain in different contexts. Norby illustrates this with data on the well-known unpacking effect: Suppose I can represent an event either as the coarse partition {E, ~E}, or in a finer-grained manner as {E₁, . . . , Eₙ, ~E}. For example, this could be {I buy a car, I don't buy a car} vs. {I buy a VW, I buy a Porsche, . . . , I don't buy a car}. Thinkers tend to assign E a lower probability when it is presented as one option, compared to when it is unpacked, i.e. the sum of the credences assigned to E₁ through Eₙ is larger than the credence assigned to E. Norby takes this data to provide evidence for his view of how credences are represented, according to which thinkers lack stable credences across contexts. We can thus observe that thinkers who lack stable background credences are vulnerable to a kind of diachronic incoherence which does not arise for adherents of PC.

We may also consider whether the Norby/Weisberg picture should lead us to expect incoherence in a thinker's credences within a given context of reasoning. Again, my remarks here will be somewhat speculative, since we don't have detailed knowledge of how the process of assigning credences in a given context of reasoning works. Suppose, as Weisberg claims, that each possibility is assigned a credence based on features of the thinker's memory and the recall episode. If this happens for each possibility separately, we could end up with slightly incoherent credences even within a context of reasoning. Here's how this might happen. Suppose a thinker's full possibility space regarding some issue E (say, where the tennis match will take place) contains five possibilities, E₁ – E₅. If all possibilities were considered, E₁ and E₂ would be considered very unlikely, say 5 percent likely each. E₃ would be considered the most likely, say 50 percent, and the remaining two possibilities are each considered 20 percent likely. Now, in some contexts, the thinker might not


even consider E₁ and E₂ at all. But if the mechanism for assigning credences to the remaining three options remains the same as in a context in which all five possibilities are considered, then the thinker might end up assigning credences to E₃, E₄ and E₅ that sum to slightly less than 100 percent. Unless some procedure is in place that ensures that the credences in the options under consideration are normalized, so that they sum to 100 percent, the "recall and assign each credence on the fly" procedure does not guarantee coherence within a context. Such a normalization procedure requires computational effort, so it is a place in which our mind might cut corners. Of course, it is important to emphasize again that whether we employ such a normalization procedure or not is ultimately an empirical question. But as described, the Norby/Weisberg picture does not rule out the possibility of incoherence within a context. If there is no normalization procedure that coherentizes the credences in a given context, we again arrive at deviations from PC, this time within contexts of reasoning.

In the literature on the relationship between credences and beliefs, some philosophers have proposed threshold rules that determine what is believed in a context. For example, Foley (2009) proposes a descriptive version of the Lockean thesis, which says that each context is associated with a credence threshold, such that "one believes that P just in case one is sufficiently confident of the truth of P." Call this rule General Rounding (GR). Alternatively, there could be a rule according to which a thinker treats only some claims as true (false) that are assigned a credence above (below) some threshold. Call this rule Selective Rounding (SR).¹⁴ Questions concerning the complexity of the implementation of such rules, and how this affects the coherence of a thinker's attitude, don't usually get explicitly discussed. But it is worth noting that if those rules were implemented without employing an additional renormalization procedure, they would also lead to incoherence within contexts.

We can see the effects that failures to renormalize have on the coherence of a thinker's credences by considering an example. Here are some credences that constitute a subset of some thinker's full credence function. Within this subset, no possibilities have currently been ruled out for the purpose of simplifying reasoning:

c₂(A&B) = 0.86
c₂(A&~B) = 0.1
c₂(~A&B) = 0.01

¹⁴ I am grateful to Sylvia Wenmackers for helping me see that GR and SR should be distinguished.


c₂(~A&~B) = 0.03
c₂(A) = 0.96
c₂(~A) = 0.04
c₂(B) = 0.87
c₂(~B) = 0.13

Suppose that the thinker eliminates some of the possibilities by using either GR or SR. If the remaining, unrounded credences were determined in the same way, regardless of whether some possibilities are eliminated from consideration, the thinker is left with an incoherent credence function within a context. Although this kind of procedure introduces slight incoherence, it eliminates the computational cost of renormalizing the remaining credences depending on which possibilities are considered live, and is thus less difficult to execute than PC. Below, the table shows how the different rules would determine the thinker's credences in a given context. The columns PC(A) and PC(B) show the thinker's attitudes if they pseudo-conditionalized on A and B, respectively. The columns GR(>0.85) and GR(>0.9) show how the thinker's attitudes would turn out as a result of applying the General Rounding rule with different thresholds. The columns SR(A) and SR(B) show the thinker's attitudes resulting from selective rounding, treating either only A or only B as true.

Pseudo-conditionalizing versus rounding

Credences only       PC(A)    PC(B)    GR(>0.85)   GR(>0.9)   SR(A)    SR(B)
c₂(A&B) = 0.86       0.895    0.989    1           0.86       0.86     0.86
c₂(A&~B) = 0.1       0.105    0        0           0.1        0.1      0
c₂(~A&B) = 0.01      0        0.011    0           0          0        0.001
c₂(~A&~B) = 0.03     0        0        0           0          0        0
c₂(A) = 0.96         1        0.989    1           1          1        0.96
c₂(~A) = 0.04        0        0.011    0           0          0        0.04
c₂(B) = 0.87         0.895    1        1           0.87       0.87     1
c₂(~B) = 0.13        0.105    0        0           0.13       0.13     0

Looking at these results, we can easily see that since they don't renormalize the remaining non-extreme credences, they tend to generate some incoherence in the thinker's credence function within a context of reasoning.

There is much more to be learned about how humans store beliefs and credences, and how they reason with them. Examining some sample theories has been instructive in shedding light on open questions in this area. Studying the Norby/Weisberg proposal is instructive, because we saw how their sketch


of the mental architecture of human belief suggests processes for managing beliefs and credences across contexts that can give rise to violations of PC. First, incoherence between contexts can take hold if there is no stable basis of conditional credences that remains fixed over time. The human mind might lack such a stable basis in order to avoid excessive storage demands. Second, the view on which each credence is generated in a somewhat individualized manner based on what is stored in memory and features of the recall episode suggests that incoherence might also arise within a context of reasoning. If the assignment of individual credences is somewhat insensitive to what other possibilities have been ruled out, this might generate failures to renormalize credences within a context, which in turn generates incoherence. We also considered threshold rules for determining what is believed within a context, and noted that if those rules are not implemented alongside a renormalization procedure, they, too, generate incoherence within a context. The main point to take away from this discussion is that there are a variety of different ways in which human reasoners might manage their beliefs and credences across contexts, and depending on exactly how their mental architecture is set up, there will be different heuristics that make this task manageable. Different heuristics will lead to different types of deviations from PC, and thus different forms of incoherence within a context or between contexts of reasoning. (Remember that any strategy that avoids incoherence is equivalent to PC, and thus just as difficult to execute.) Using a heuristic for managing beliefs and credences across contexts that introduces some synchronic or diachronic incoherence might seem initially like a bad deal—after all, this means that thinkers have credences that are accuracy-dominated, or vulnerable to Dutch books. But, as I’ve argued earlier, small divergences from coherence lead to small deviations from optimal accuracy, and to relatively little Dutch book vulnerability. If thinkers generally only treat claims as true that they take to be highly probable, the incoherence generated by not renormalizing the remaining credences within a context of reasoning is generally not going to be very dramatic. This is because, if the thinker’s credence in some claim shifts by only a little, say by 0.05, then the adjustments to the thinker’s remaining credences this would normally require according to PC are also quite minimal, and so if those adjustments aren’t made, the divergence from coherence created this way is not very large. Similar claims hold for slight incoherence between different contexts of reasoning. As a result, the loss in accuracy, and the increased Dutch book vulnerability to which this amount of incoherence gives rise is also not very dramatic. These considerations suggest that trading simplicity for coherence can actually be


beneficial for thinkers, provided these tradeoffs don't generate incoherence that is too significant. Whatever method thinkers actually use for managing their credences and outright beliefs, we know, based on empirical research of the kind I cited in section 2, that (with some exceptions such as base-rate cases) human reasoners tend to be decently good at approximating coherent Bayesian reasoning, even if they are usually not perfectly coherent.

One might still worry, however, that in some cases, ignoring a small probability of error, or being slightly incoherent, could have catastrophic consequences. For example, the probability that I might have a grave accident while on vacation might be very small, but if it happened, the cost of treatment could bankrupt me. Hence, when I decide whether to buy travel insurance, I should not simply treat it as true that I won't have an accident, even if my credence in this claim is extremely high. Even if ignoring small error probabilities and being slightly incoherent are unproblematic in many contexts, in high stakes situations doing so is extremely inadvisable. Fortunately, human thinkers seem to be sensitive to this issue. It has been extensively documented in the literature on pragmatic encroachment that reasoners tend to switch to relying on credences in relevant claims in high stakes contexts (Roeber 2018). The famous bank case illustrates this (DeRose 1992): When nothing much rides on whether the bank is open on Saturday, people seem to be inclined to treat it as true for the purposes of reasoning if they are fairly confident in it. But when it's extremely important that they get to the bank, people start paying attention to the small chance that they might be mistaken. Hence, in cases where small error probabilities matter, people often seem to attend to these small error risks more carefully. But of course, more empirical research is needed to pin down the reasoning strategies that give rise to this data. We learn from these observations that for human thinkers, using a heuristic for managing beliefs and credences that differs from PC need not be disadvantageous. Given a smart approximation strategy, the cost of the resulting incoherence need not be significant.

A further lesson from rejecting (3) and endorsing a heuristic view of outright belief concerns constraints on what consistency or coherence norms on belief can plausibly look like. Standardly, justifications of synchronic and diachronic norms of belief appeal to purely epistemic grounds. Such grounds include, for example, whether the norms promote particular aims such as believing the truth and not believing falsehoods, having properly based or well-justified beliefs, and so on. These defenses thus tend to consider the epistemic credentials of these norms, but they usually don't attend to how easy or difficult it would be for human thinkers to abide by them. But, as I have


argued above, if the main function of outright beliefs is to be a heuristic tool that simplifies the reasoning processes of limited thinkers, then norms that govern beliefs must not prevent outright beliefs from serving this function. This means that, if one endorses the heuristic view of the function of outright belief, one is thereby committed to thinking that suitable norms of belief must meet both epistemic constraints and feasibility constraints. It is insufficient to argue for norms of belief based on epistemic constraints alone. An example of a view on which my argument puts pressure is Leitgeb’s stability theory of belief (Leitgeb 2016). In the first chapter of his book, he endorses the view that outright beliefs simplify reasoning (although it is not entirely clear that he is committed to the view that this is their primary function). He then goes on to endorse norms on belief, as well as norms on how credences and beliefs may be connected in rational thinkers, that share many features with standard ideal consistency and coherence norms familiar from Bayesian epistemology. Leitgeb readily admits that these ideal norms can’t realistically be obeyed by human thinkers, yet does not recognize that endorsing such norms is in tension with his claim that the function of beliefs is to simplify reasoning. I will leave it open here how Leitgeb and philosophers who hold similar views should resolve this tension. It is an important and unexplored question for future research which norms on belief are compatible with their simplifying role.
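Before moving on, the cost of skipping renormalization can be made concrete. The following is only an illustrative sketch, not part of the book's formal apparatus: the Python setting and the function names are my own. It applies pseudo-conditionalization on A, General Rounding with a 0.9 threshold, and Selective Rounding on A to the example credences from the table above, and then checks whether the resulting credences over the four-cell partition still sum to 1.

```python
# Toy credences over the partition {A&B, A&~B, ~A&B, ~A&~B}, taken from the example above.
c = {"A&B": 0.86, "A&~B": 0.1, "~A&B": 0.01, "~A&~B": 0.03}

def pc_on_a(cred):
    """Pseudo-conditionalize on A: rule out the ~A-cells and renormalize the rest."""
    kept = {w: (0.0 if w.startswith("~A") else p) for w, p in cred.items()}
    total = sum(kept.values())
    return {w: p / total for w, p in kept.items()}

def general_rounding(cred, threshold=0.9):
    """Round credences above the threshold to 1 and below 1 - threshold to 0; leave the rest alone."""
    def r(p):
        if p > threshold:
            return 1.0
        if p < 1 - threshold:
            return 0.0
        return p
    return {w: r(p) for w, p in cred.items()}

def selective_rounding_on_a(cred):
    """Treat only A as true: rule out the ~A-cells but leave the other credences unrenormalized."""
    return {w: (0.0 if w.startswith("~A") else p) for w, p in cred.items()}

for name, result in [("PC(A)", pc_on_a(c)),
                     ("GR(>0.9)", general_rounding(c)),
                     ("SR(A)", selective_rounding_on_a(c))]:
    print(name, {w: round(p, 3) for w, p in result.items()}, "sum:", round(sum(result.values()), 3))

# PC(A) sums to 1.0; GR(>0.9) and SR(A) both leave credences summing to 0.96,
# i.e. a slightly incoherent credence function within the context.
```

On this simple implementation, the rounding rules are computationally trivial, but they buy that simplicity at the price of a small additivity violation, which is exactly the tradeoff described above.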

Conclusion

In this chapter, I presented a new puzzle about outright belief. The puzzle arises from combining a popular answer to the Bayesian challenge—that outright beliefs have the function of simplifying reasoning—with some additional plausible claims about outright belief that have been recently defended in the literature. Those claims are that we can mix outright beliefs and credences in reasoning, that we can switch between relying on a high credence and relying on an outright belief in different contexts, and that thinkers manage their credences and outright beliefs across contexts via the pseudo-conditionalization strategy. The tension between these claims arises from the fact that PC, while having some features that appear desirable from a normative point of view, is computationally demanding. PC can be interpreted descriptively or normatively, but it is implausible on either interpretation. Its descriptive version is incompatible with known results about human reasoning, specifically the result that human reasoners aren't perfect conditionalizers. The


normative version of PC is implausible, because it would require outright beliefs to be governed by a norm that impedes their function as a simplifying heuristic.

I then considered which lessons we can draw from this solution to the puzzle. A natural question to ask is what alternative strategies are available to human thinkers for managing their beliefs and credences, given that PC is an inadequate description of their thought processes. While it is ultimately an empirical question which strategies thinkers use, I argued that we should expect tradeoffs between simplicity and coherence. PC already induces some diachronic incoherence in the thinker's attitudes, and any strategy that is not equivalent to PC will incur additional incoherence either within or between contexts of reasoning. As we saw in earlier chapters, incoherence leads to suboptimal accuracy and Dutch book vulnerability. Yet, since the kinds of feasible strategies thinkers might use to manage their outright beliefs and credences are likely to introduce only a fairly minimal degree of incoherence, the tradeoff can be beneficial for the thinker.

I further argued that we can draw an important normative lesson from this solution to the puzzle. Endorsing the heuristic view of outright belief captured in claim (1) of the puzzle entails that putative norms on belief can't be defended on purely epistemic grounds. They must also be vetted for their complexity and feasibility for human reasoners. If outright beliefs' primary function is to simplify reasoning, they can't possibly be governed by norms that prevent them from accomplishing this.


10 Matters to Be Settled

My aim in this book was to defend the view that Bayesian norms of rationality are relevant to human thinkers as ideals worthy of approximation. I proceeded by developing answers to two questions: Why is it better to be closer to being ideally rational, rather than farther away from it? What constitutes being closer to or farther away from being ideally rational? My answer to the latter question is that we should measure degrees of approximation to ideal rationality by using distances between vectors. We can represent credence assignments as vectors, and then measure the distance between a vector representing an irrational credence function and a vector representing some closest ideally rational credence function. Selecting appropriate distance measures and divergences for this task is not entirely straightforward, but we can make progress by looking to the answer to the former question. It is better to be closer to being rational, rather than farther away from it, because approximating ideal rationality (in the right ways) delivers increasing portions of epistemic value. Depending on how we conceive of the type of epistemic value that is best achieved by being ideally rational, we must choose appropriate distance measures for measuring degrees of rationality from the set of divergences that is generally appropriate for this task. I explicitly considered accuracy and action-guidingness as examples of such values, but the general style of argument can be transferred to any type of value-based conception of rationality. I then showed how the theory applies to cases in which a thinker's credences are governed by multiple principles of rationality, and how we can use it to evaluate the rationality of credence changes.

The remaining chapters were devoted to looking at the bigger picture: in Chapter 8, I explored how my theory integrates with related, but distinct ways of theorizing about rationality, and in Chapter 9, I explained how I see the relationship between credences and outright beliefs, and specifically their interaction in human reasoning. Inevitably, pursuing a large project like this leaves many questions unanswered, and opens up a multitude of avenues for future research. In these last few pages, I will briefly summarize the main results from each chapter again, and, more importantly, point out ways in which the themes from the chapter could fruitfully be investigated further.
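To fix ideas, here is a minimal sketch of the vector picture for the simplest possible case. It is my own toy example in Python, not anything from the text: the choice of the Euclidean measure and the two-cell algebra are assumptions made purely for illustration, and the book considers richer algebras and other divergences. A credence assignment over A and ~A is written as the vector (c(A), c(~A)); it is coherent just in case the coordinates lie in [0, 1] and sum to 1, and its degree of incoherence can be gauged by its distance to the closest such vector.

```python
from math import sqrt

def closest_coherent(c_a, c_not_a):
    """Euclidean projection of (c(A), c(~A)) onto the coherent assignments {(x, 1 - x) : 0 <= x <= 1}."""
    x = (1 + c_a - c_not_a) / 2    # unconstrained minimizer of (x - c(A))^2 + ((1 - x) - c(~A))^2
    x = min(max(x, 0.0), 1.0)      # clip into [0, 1]; the objective is convex, so clipping is safe
    return (x, 1 - x)

def distance_to_coherence(c_a, c_not_a):
    """Euclidean distance from the credence vector to its closest coherent counterpart."""
    p_a, p_not_a = closest_coherent(c_a, c_not_a)
    return sqrt((c_a - p_a) ** 2 + (c_not_a - p_not_a) ** 2)

print(closest_coherent(0.7, 0.5))                  # (0.6, 0.4): the nearest coherent credence function
print(round(distance_to_coherence(0.7, 0.5), 3))   # 0.141
print(round(distance_to_coherence(0.7, 0.4), 3))   # 0.071: closer to coherence, so less irrational on this measure
```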


Chapter 1

In the first chapter, I motivated the project of developing a theory of degrees of epistemic rationality, and I gave an overview of the discussion in the book. I also introduced some crucial assumptions that form the basis of my discussion, but that readers might wish to alter or abandon. I chose to represent uncertainty in the standard Bayesian format as real numbers between 0 and 1. However, alternative representations of uncertainty are preferred by some proponents of theories of ideal rationality, such as imprecise or comparative credences. The questions that motivate my inquiry are just as applicable to these frameworks, and so are the general argumentative schemata I propose to answer them. However, the exact nature of the arguments we use to justify specific requirements of rationality has to be adapted to those representations of credences, as well as the ways in which we measure approximations to ideal rationality, and the formal justifications for why approximating ideal rationality is beneficial. For example, it is currently hotly debated how to measure the accuracy of imprecise credences, and the answer to this question will influence how to measure degrees of rationality for imprecise credences. Hence, one project for future research is to adapt the framework I propose to alternative representations of credences.

I furthermore adopt the popular Bayesian framework for articulating and defending requirements of ideal epistemic rationality. This commits me to accepting probabilism as a fundamental epistemic norm on credences. I stay neutral about which further norms we should accept, but the menu of options is somewhat constrained by the acceptance of probabilism and the choice of Bayesianism as the underlying general theory. There are alternative formal frameworks we could adopt that are similar to Bayesianism in that they let us articulate norms of ideal epistemic rationality pertaining to credences, for example ranking theory or Dempster–Shafer theory. Insofar as they are theories of ideal rationality, they face the same questions about their applicability to non-ideal thinkers as Bayesianism. Answering them is an outstanding research project for proponents of these views.

Lastly, I assume a broadly value-based view of rationality. On this view, the requirements of rationality are explained in terms of some type of (epistemic) value that is best attained by being ideally rational. On this type of view, we can understand the benefit of approximating ideal rationality in terms of value as well. The closer you are to being rational, the larger the portion of value you get. But this is not the only way in which we might justify requirements of


rationality. Some people have suggested reasons-based justifications for being rational, and some might argue that there is no further justification—being rational is valuable in itself. Insofar as these justifications are promoting requirements of ideal rationality, they have to face the same questions as value-based approaches. Hence, proponents of these approaches owe an answer to the question of how thinkers benefit from approximating ideal rationality, and how we should measure such approximations.

Chapter 2

In Chapter 2, I introduced the standard Bayesian formalism, and I argued that current avenues of research don't contribute to explaining how the theory applies to non-ideal thinkers. Rather, current research is mostly concerned with pinning down the correct theory of ideal rationality. Moreover, I argued that the systematization view is the best way of understanding the Bayesian methodology, since it can straightforwardly explain why Bayesian principles of ideal rationality apply to human non-ideal thinkers.

I employed a standard Bayesian setup, which defines a probability function over a finite algebra of sentences. Limiting representations of credences in this way is a common strategy to keep mathematical models simple, and once the desired results have been established, the models are expanded to capture thinkers who have infinitely many beliefs. It is a question for future research how the measures of degrees of rationality can be modified to apply to infinitely large credence assignments. Furthermore, the more fundamental question of whether it is important to expand our models to infinitely large credence assignments also merits further discussion. I have also bracketed the question of how to treat cases in which thinkers change the underlying algebra of statements to which they assign credences. This problem is sometimes referred to in the literature as the problem of new theories. My understanding is that there isn't currently any sort of consensus about how ideally rational thinkers should distribute their credences in those cases. But proponents of particular views on this matter might want to think about how their suggestions extend to non-ideal thinkers who modify their algebra.

I argued for a particular way of understanding the Bayesian theory of rationality—as a systematization of common sense normative and descriptive platitudes about degrees of belief. The view uses normative mathematical models in order to formulate and investigate those systematizations. My


discussion of these points is relatively brief, and I expect that there is much more to be said about the merits of a theory of rationality that takes ideal rationality as its starting point, and about the best way of using and understanding normative models.

Chapter 3

In Chapter 3, I introduced four desiderata that measures of degrees of incoherence (and measures of degrees of rationality more generally) have to satisfy. I showed that a straightforward way of setting up a qualitative measure fails to meet those desiderata. Quantitative measures, by contrast, turned out to fare better. I introduced the general idea of measuring degrees of incoherence by representing credence assignments as vectors and determining the distance from a given credence assignment to some closest coherent credence assignment. Depending on which distance measure or divergence is chosen, we can get different verdicts about how close a given credence assignment is to coherence. I gave examples of common distance measures and divergences to illustrate their different verdicts. I chose the measures I used as examples because they line up with particular values that underwrite requirements of rationality, such as action-guidingness (or avoidance of Dutch books), and accuracy, which I discuss in later chapters. But of course, those are not the only values one might appeal to in justifying requirements of rationality. There are many suitable distance measures and divergences that I didn't explicitly consider, and that might track other types of value. There is more work to be done in studying the properties of these measures and determining whether they would be useful for measuring degrees of rationality. Moreover, proponents of comparative, non-numerical representations of degrees of belief might want to take another look at qualitative measures of degrees of coherence or rationality, since quantitative measures might be inapplicable to their formal models of credences.

Chapters 4 and 5

In Chapters 4 and 5, I developed a general argument structure to justify why it is good to approximate ideal rationality: First, identify some value that justifies some requirement of rationality, in the sense that complying with the


requirement best promotes this value, and violating the requirement precludes optimal promotion of the value. Then identify a way in which this value can be had to greater or lesser degrees. Lastly, select a distance measure or divergence such that approximating ideal compliance with the requirement of rationality delivers increasing amounts of the value. Both the distance measure and the method of quantifying amounts of value need to be independently plausible, so as to avoid ad hocism. In Chapter 4, I applied this general structure to the Dutch book justification for probabilism, and in Chapter 5 to the accuracy justification for probabilism. In each case, we can identify incoherence measures that track improvements in Dutch book vulnerability, and improvements in accuracy. Moreover, there are ways of reducing incoherence that lead to decreased Dutch book vulnerability and higher accuracy at the same time. I chose the Dutch book and accuracy arguments to fill in the general argument schema, because they are very popular, and well formulated and studied. But of course, there are many other ways in which one might try to justify particular requirements of rationality. It is a task for future research to identify these justifications and to explore whether the argument schema I propose can be applied to them. Furthermore, we didn’t manage to settle on one particular incoherence measure as the only correct one for measuring the overall incoherence of credence functions. At the end of Chapter 5, I listed a variety of ways in which we can further narrow down which measure we want to use, but one’s choice will largely depend on one’s wider epistemological commitments. I encourage readers to make up their own minds about this.
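A toy calculation can show, in miniature, how an incoherence measure and Dutch book vulnerability move together. The numbers and the Python sketch below are my own illustration, not the book's general results, and they assume that the thinker will pay credence times stake for a bet. If a thinker's credences in A and ~A sum to more than 1, a bookie who sells the thinker a unit-stake bet on each proposition collects more than the bets can possibly pay out, and the guaranteed loss shrinks as the additivity violation shrinks.

```python
def guaranteed_loss(c_a, c_not_a, stake=1.0):
    """Assume the thinker pays credence * stake for a bet paying `stake` if won,
    and that c(A) + c(~A) > 1. The bookie sells a bet on A and a bet on ~A;
    exactly one of the two bets wins, so the thinker's net loss is fixed in advance."""
    price_paid = (c_a + c_not_a) * stake
    payout = stake
    return price_paid - payout

print(round(guaranteed_loss(0.7, 0.5), 3))  # 0.2: credences sum to 1.2
print(round(guaranteed_loss(0.7, 0.4), 3))  # 0.1: a smaller violation, a smaller guaranteed loss
```

Reducing the incoherence thus directly reduces the size of the sure loss the thinker is exposed to, which is the kind of connection the incoherence measures discussed in these chapters are meant to track.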

Chapter 6

In Chapter 6, I introduced several popular principles of rationality that many Bayesians endorse in addition to coherence, and I asked how we should measure approximations to ideal rationality when multiple principles of rationality apply to a credence assignment at the same time. I introduced different formal strategies for measuring degrees of rationality, which naturally line up with different ways of thinking about how epistemic values underwrite requirements of rationality. I showed that, if a thinker is unable to comply with one principle, the way to achieve the highest degree of rationality possible does not usually involve complying with the remaining principles, even if this is still possible.
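The structural point can be illustrated with a deliberately artificial example. The scoring rule below is entirely made up for illustration and is not one of the book's measures; the Python setting, names, and numbers are my own. Suppose the thinker's credence in ~A is stuck at 0.3, the evidence ideally calls for credences of 0.9 in A and 0.1 in ~A, and coherence demands that the credences in A and ~A sum to 1. If overall irrationality is scored as the summed squared violations of the two principles, the best available credence in A turns out to be 0.8, which complies with neither principle, even though fully complying with coherence (by setting the credence in A to 0.7) remains possible.

```python
C_NOT_A = 0.3                    # the credence the thinker is stuck with
EVIDENTIAL_IDEAL = (0.9, 0.1)    # what the evidence would ideally require for (A, ~A)

def irrationality(c_a):
    """Made-up overall score: squared distance to the evidential ideal plus squared coherence violation."""
    evid = (c_a - EVIDENTIAL_IDEAL[0]) ** 2 + (C_NOT_A - EVIDENTIAL_IDEAL[1]) ** 2
    coherence = (c_a + C_NOT_A - 1) ** 2
    return evid + coherence

candidates = [i / 100 for i in range(101)]
best = min(candidates, key=irrationality)
print(best)                            # 0.8: the overall best option
print(round(irrationality(best), 3))   # 0.06
print(round(irrationality(0.7), 3))    # 0.08: fully coherent, but worse overall
print(round(irrationality(0.9), 3))    # 0.08: matches the evidential ideal for A, but worse overall
```

The point is not the particular numbers, but the shape of the result: once one principle cannot be fully satisfied, the overall best response typically spreads the violation across principles rather than perfectly satisfying the ones that remain satisfiable.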


I stay neutral in this chapter about what the correct principles of rationality are, and how they are justified in terms of epistemic value (if that's how they are justified). Ultimately, these questions need to be settled, but my view is designed to cover the different possible views we might arrive at. Of course, there might also be further principles of rationality that I haven't considered. For example, there is currently a lively debate about how rational thinkers should respond to receiving various kinds of higher-order evidence. It is an interesting project for future research to integrate rational norms pertaining to higher-order evidence into an account of degrees of rationality.

Chapter 7

In Chapter 7, I showed how we can use our measures of degrees of rationality to evaluate credence changes. I considered three types of changes: cases in which the thinker revises their credences (i) without learning new information or adding new attitudes; (ii) without learning new information, but adding new attitudes; (iii) as a response to learning new evidence. I argued that evaluating credence changes by checking whether they instantiate good reasoning patterns is not very useful, because reasoning patterns that are suitable for use by ideally rational thinkers often produce suboptimal outcomes when used by irrational thinkers. It is better to measure how a credence change affects the degree of rationality of the thinker's attitudes. I also suggested that the recommendations of the piecemeal+averaging strategy for how to optimally engage in augmentative reasoning were less plausible than the recommendations of the bundle strategy, which provides evidence that even proponents of multiple value views should use the bundle strategy for measuring degrees of irrationality.

The discussion in this chapter raises a number of interesting questions that are worthy of further investigation: Can we identify good patterns of reasoning for thinkers who reason from irrational starting points? Or is it impossible to adapt the pattern method in a systematic way to make it applicable to irrational thinkers? Regarding the debate about whether there is more than one rationally permissible credence assignment for each thinker, we found that it wasn't clear in virtue of what an irrational thinker could be thought of as having adopted a particular prior. This complicates the debate about whether intrapersonal uniqueness is true, and merits further investigation. Moreover, I also left open the more general question of what distinguishes mere changes in a thinker's attitudes from genuine reasoning.
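A small, made-up example (mine, in Python) may help to illustrate the contrast: instead of asking whether a revision instantiates an ideal reasoning pattern, we can score it by comparing the thinker's degree of incoherence before and after the change. Purely for simplicity, the sketch uses the size of the additivity violation over {A, ~A} as a stand-in for a full incoherence measure.

```python
def incoherence(c_a, c_not_a):
    """Crude stand-in for an incoherence measure: how far c(A) + c(~A) is from 1."""
    return abs(c_a + c_not_a - 1)

before = (0.7, 0.5)    # the thinker's incoherent starting point
after = (0.75, 0.35)   # the thinker's revised credences (no new evidence)

print(round(incoherence(*before), 3))  # 0.2
print(round(incoherence(*after), 3))   # 0.1: the revision improved the degree of rationality of the attitudes
```

On this way of scoring, the revision counts as an improvement even though it does not correspond to any ideal update rule, which matches the chapter's suggestion that we evaluate changes by their effect on the thinker's degree of rationality.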


Chapter 8

Chapter 8 is devoted to exploring how the view of degrees of rationality I defend fits into the bigger picture of theorizing about rationality. I specifically discussed the relationship between propositional and doxastic rationality, the relationship between ideal and ecological rationality, the relationship between evaluative and ameliorative approaches to theorizing about rationality, the relationship between epistemological and semantic perspectives on rationality, and the relationship between rational evaluations, permissions, and obligations. The upshot of my discussion was that my account of the propositional epistemic rationality of credences not only harmonizes with the ways in which rationality is theorized in these domains, but is often a necessary ingredient for developing theories of these aspects of rationality.

Since Chapter 8 was designed to zoom out and explore relationships between different ways of understanding rationality, my discussion could do hardly more than scratch the surface. All of the connections I investigated merit a much deeper exploration. In my view, the most important philosophical issues brought up by my discussion are getting clearer about the relationship between propositional and doxastic rationality, and developing an account of epistemic norms that is not merely evaluative, but followable or guiding. Moreover, I suggest that there is far less conflict than is commonly assumed between Bayesian accounts of ideal rationality, and ways of thinking about rationality that emphasize feasibility constraints on what is rational. There's more research to be done in exploring how these perspectives can be reconciled with each other.

Chapter 9

Chapter 9 expands the focus of the discussion in a different way. In earlier chapters, the only doxastic attitudes I took into consideration were credences, i.e. doxastic attitudes that encode uncertainty. But according to an extremely plausible picture of the human mind, we also have outright beliefs, which don't take into account uncertainty, and let us treat claims as either true or false. I endorsed the view that the function of outright beliefs is to simplify reasoning tasks for limited thinkers like us. This view raises a host of questions about how we manage beliefs and credences in such a way that beliefs can serve this function. I argued that the available strategies for managing beliefs and credences across contexts that are compatible with the simplifying function


of outright beliefs can give rise to synchronic and diachronic incoherence in a thinker's attitudes. Trading simplicity for coherence need not be a bad deal, however, as long as the degree of incoherence stays low enough. Moreover, I argued that the view of outright belief as a simplifying heuristic is incompatible with the view that there are ideal norms of coherence or consistency for outright belief that outstrip human thinkers' reasoning capacities. If the main purpose of outright beliefs is to simplify reasoning, then beliefs can't be governed by norms that are so difficult to comply with that doing so prevents the beliefs from serving this function.

My discussion had to leave many descriptive and normative questions unanswered: How do human reasoners actually use beliefs and credences in reasoning, and what are their strategies for managing shifts between contexts? These are empirical questions that philosophers can't answer with their methods alone. The most important normative question raised by my discussion is: What are the correct norms of reasoning with outright beliefs? Extant accounts in the literature tend to ignore whether complying with their proposed norms is feasible for human thinkers, but if I am right about the function of outright beliefs, this needs to be taken into account.


References

Arntzenius, Frank. 2003. Some Problems for Conditionalization and Reflection. Journal of Philosophy 100 (7), 356–70.
Bickel, Eric J. 2007. Some Comparisons among Quadratic, Spherical, and Logarithmic Scoring Rules. Decision Analysis 4 (2), 49–65.
Boghossian, Paul. 2014. What is Inference? Philosophical Studies 169 (1), 1–18.
Bokulich, Alisa. 2011. How Scientific Models Can Explain. Synthese 180 (1), 33–45.
Brase, Gary L. & Hill, W. Trey. 2015. Good Fences Make for Good Neighbors but Bad Science: A Review of What Improves Bayesian Reasoning and Why. Frontiers in Psychology 6 (340).
Briggs, R. A. 2009. Distorted Reflection. Philosophical Review 118 (1), 59–85.
Broome, John. 2013. Rationality through Reasoning. Chichester: Wiley-Blackwell.
Brown, Jessica. 2008. Subject-Sensitive Invariantism and the Knowledge Norm for Practical Reasoning. Noûs 42 (2), 167–89.
Buchak, Lara. 2014a. Belief, Credence, and Norms. Philosophical Studies 169 (2), 285–311.
Buchak, Lara. 2014b. Risk and Rationality. Oxford: Oxford University Press.
Capotorti, A., Regoli, G., & Vattari, F. 2009. On the Use of a New Discrepancy Measure to Correct Incoherent Assessments and to Aggregate Conflicting Opinions Based on Imprecise Conditional Probabilities. Proceedings of the Sixth International Symposium on Imprecise Probability: Theories and Applications, ISIPTA, ed. Thomas Augustin. Society for Imprecise Probability.
Carnap, Rudolf. 1962. Logical Foundations of Probability. 2nd ed. Chicago: University of Chicago Press.
Carr, Jennifer. 2015. Epistemic Expansions. Res Philosophica 92 (2), 217–36.
Carr, Jennifer. 2019. Subjective Probability and the Content/Attitude Distinction. In: Oxford Studies in Epistemology 6. T. S. Gendler & John Hawthorne (eds.). Oxford: Oxford University Press, 35–57.
Cha, Sung-Hyuk. 2007. Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences 4 (1), 300–7.
Charlow, Nate & Chrisman, Matthew (eds.). 2016. Deontic Modality. Oxford: Oxford University Press.
Christensen, David. 1991. Clever Bookies and Coherent Beliefs. The Philosophical Review 100 (2), 229–47.
Christensen, David. 2004. Putting Logic in its Place. Oxford: Oxford University Press.
Christensen, David. 2007. Does Murphy’s Law Apply in Epistemology? In: Oxford Studies in Epistemology 2. T. S. Gendler & John Hawthorne (eds.). Oxford: Oxford University Press, 3–31.
Christensen, David. 2007. Epistemology of Disagreement: The Good News. Philosophical Review 116, 187–217.
Christensen, David. 2010. Higher-Order Evidence. Philosophy and Phenomenological Research 81 (1), 185–215.


Clarke, Roger. 2013. Belief Is Credence One (In Context). Philosophers’ Imprint 13, 1–18.
Colyvan, Mark. 2013. Idealization in Normative Models. Synthese 190 (8), 1337–50.
Conee, Earl & Feldman, Richard. 1998. The Generality Problem for Reliabilism. Philosophical Studies 89 (1), 1–29.
Cooper, Gregory F. 1990. The Computational Complexity of Probabilistic Inference Using Bayesian Belief Networks. Artificial Intelligence 42 (2–3), 393–405.
Dallmann, Justin. 2017. When Obstinacy is a Better (Cognitive) Policy. Philosophers’ Imprint 17 (24), 1–17.
De Bona, Glauber. 2016. Measuring Inconsistency in Probabilistic Knowledge Bases. Ph.D. Thesis, Institute of Mathematics and Statistics of the University of São Paulo.
De Bona, Glauber & Finger, Marcelo. 2015. Measuring Inconsistency in Probabilistic Logic: Rationality Postulates and Dutch Book Interpretation. Artificial Intelligence 227, 140–64.
De Bona, Glauber & Staffel, Julia. 2017. Graded Incoherence for Accuracy-Firsters. Philosophy of Science 84 (2), 189–213.
De Bona, Glauber & Staffel, Julia. 2018. Why Be (Approximately) Coherent? Analysis, online first.
de Finetti, Bruno. 1937. La Prévision: ses lois logiques, ses sources subjectives. Annales de l’Institut Henri Poincaré 7, 1–68. Translated into English and reprinted in: Kyburg and Smokler (eds.). Studies in Subjective Probability. Huntington, NY: Krieger, 1980.
de Finetti, Bruno. 1974. Theory of Probability. Chichester: Wiley.
DeRose, Keith. 1992. Contextualism and Knowledge Attributions. Philosophy and Phenomenological Research 52 (4), 913–29.
Deza, Michel Marie & Deza, Elena. 2009. Encyclopedia of Distances. Berlin/Heidelberg: Springer.
Dogramaci, Sinan. 2018. Rational Credence Through Reasoning. Philosophers’ Imprint 18 (11), 1–25.
Dogramaci, Sinan & Horowitz, Sophie. 2016. An Argument for Uniqueness about Evidential Support. Philosophical Issues 26 (1), 130–47.
Dorst, Kevin. 2019. Evidence: A Guide for the Uncertain. Philosophy and Phenomenological Research, online first.
Douven, Igor. 2009. Uniqueness Revisited. American Philosophical Quarterly 46, 347–61.
Earman, John. 1992. Bayes or Bust. Cambridge, MA: MIT Press.
Easwaran, Kenny. 2011a. Bayesianism II: Applications and Criticisms. Philosophy Compass 6 (5), 321–32.
Easwaran, Kenny. 2011b. Varieties of Conditional Probability. In: Handbook for Philosophy of Statistics. P. Bandyopadhyay & Malcolm Forster (eds.). Elsevier, 137–48.
Easwaran, Kenny. 2013a. Why Countable Additivity? Thought 2 (1), 53–61.
Easwaran, Kenny. 2013b. Expected Accuracy Supports Conditionalization—and Conglomerability and Reflection. Philosophy of Science 80 (1), 119–42.
Easwaran, Kenny & Fitelson, Branden. 2012. An “Evidentialist” Worry about Joyce’s Argument for Probabilism. Dialectica 66 (3), 425–33.
Easwaran, Kenny & Fitelson, Branden. 2015. Accuracy, Coherence, and Evidence. Oxford Studies in Epistemology 5, 61–96.
Edwards, Ward. 1982. Conservatism in Human Information Processing. In: Daniel Kahneman, Paul Slovic, & Amos Tversky (eds.). Judgment under Uncertainty: Heuristics and Biases. Cambridge: Cambridge University Press, 359–69.
Elga, Adam. 2000. Self-Locating Belief and the Sleeping Beauty Problem. Analysis 60 (2), 143–7.


Elga, Adam. 2010. Subjective Probabilities Should Be Sharp. Philosophers’ Imprint 10 (5), 1–11.
Evans, Jonathan St. B. T. & Stanovich, Keith E. 2013. Dual-Process Theories of Higher Cognition: Advancing the Debate. Perspectives on Psychological Science 8 (3), 223–41.
Evans, Jonathan St. B. T., Thompson, Valerie A., & Over, David E. 2015. Uncertain Deduction and Conditional Reasoning. Frontiers in Psychology 6, 398.
Feldman, Richard. 2007. Reasonable Religious Disagreements. In: L. Antony (ed.). Philosophers Without Gods. Oxford: Oxford University Press, 194–214.
Field, Hartry & Milne, Peter. 2009. The Normative Role of Logic. Proceedings of the Aristotelian Society Supplementary Volume LXXXIII, 251–68.
Firth, Roderick. 1978. Are Epistemic Concepts Reducible to Ethical Concepts? In: A. I. Goldman & J. Kim (eds.). Values and Morals. Dordrecht: D. Reidel, 215–29.
Foley, Richard. 2009. Beliefs, Degrees of Belief, and the Lockean Thesis. In: F. Huber & C. Schmidt-Petri (eds.). Degrees of Belief. Synthese Library 342. Springer, 37–47.
Gallow, J. Dmitri. 2014. How to Learn from Theory-Dependent Evidence; or Commutativity and Holism: A Solution for Conditionalizers. British Journal for the Philosophy of Science 65 (3), 493–519.
Gibbs, Cameron. 2017. Bayesing for the Bayesian. Synthese, online first.
Gigerenzer, Gerd. 2008. Rationality for Mortals: How People Cope with Uncertainty. Oxford: Oxford University Press.
Gneiting, T. & Raftery, A. E. 2007. Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association 102 (477), 359–78.
Graefe, A. & Armstrong, J. S. 2012. Predicting Elections From the Most Important Issue: A Test of the Take-the-Best Heuristic. Journal of Behavioral Decision Making 25 (1), 41–8.
Greco, Daniel. 2015. How I Learned to Stop Worrying and Love Probability 1. Philosophical Perspectives 29, 179–201.
Greco, Daniel. 2017. Cognitive Mobile Homes. Mind 126 (501), 93–121.
Greco, Daniel & Hedden, Brian. 2016. Uniqueness and Metaepistemology. Journal of Philosophy 113 (8), 365–95.
Hacking, Ian. 1967. Slightly More Realistic Personal Probability. Philosophy of Science 34 (4), 311–25.
Hájek, Alan. 2008b. Dutch Book Arguments. In: P. Anand, P. Pattanaik, & C. Puppe (eds.). The Oxford Handbook of Rational and Social Choice. Oxford: Oxford University Press, 173–95.
Hájek, Alan. 2011. Conditional Probability. In: P. Bandyopadhyay & Malcolm Forster (eds.). Handbook for Philosophy of Statistics. Elsevier, 99–136.
Hájek, Alan. 2012. Is Strict Coherence Coherent? Dialectica 66 (3), 411–24.
Halpern, Joseph. 2003. Reasoning about Uncertainty. Cambridge, MA: MIT Press.
Harman, Gilbert. 1986. Change in View. Cambridge, MA: MIT Press.
Harsanyi, John C. 1985. Acceptance of Empirical Statements: A Bayesian Theory Without Cognitive Utilities. Theory and Decision 18 (1), 1–30.
Hawthorne, John. 2004. Knowledge and Lotteries. Oxford: Oxford University Press.
Hawthorne, John & Stanley, Jason. 2008. Knowledge and Action. Journal of Philosophy 105 (10), 571–90.
Jaynes, E. T. 2003. Probability Theory: The Logic of Science. Cambridge: Cambridge University Press.
Jeffrey, Richard. 1965. The Logic of Decision. New York: McGraw-Hill.


Jeffrey, Richard. 1970. Dracula meets Wolfman: Acceptance vs. Partial Belief. In: M. Swain (ed.). Induction, Acceptance, and Rational Belief. Dordrecht: D. Reidel, 157–85.
Jeffrey, Richard. 1992. Probability and the Art of Judgment. Cambridge: Cambridge University Press.
Joyce, J. M. 1998. A Nonpragmatic Vindication of Probabilism. Philosophy of Science 65 (4), 575–603.
Joyce, J. M. 2009. Accuracy and Coherence: Prospects for an Alethic Epistemology of Partial Belief. In: F. Huber & C. Schmidt-Petri (eds.). Degrees of Belief. Vol. 342 of Synthese Library. Berlin: Springer, 263–97.
Kahneman, Daniel. 2011. Thinking, Fast and Slow. New York: Farrar, Straus and Giroux.
Kelly, Thomas. 2013. Evidence Can Be Permissive. In: M. Steup, J. Turri, & E. Sosa (eds.). Contemporary Debates in Epistemology. 2nd ed. Hoboken: John Wiley & Sons, 298–311.
Kennedy, Christopher & McNally, Louise. 2005. Scale Structure, Degree Modification, and the Semantics of Gradable Predicates. Language 81 (2), 345–81.
Kirsh, David. 2003. Implicit and Explicit Representation. In: Lynn Nadel et al. (eds.). Encyclopedia of Cognitive Science. Wiley, 478–81.
Kopec, Matthew & Titelbaum, Michael. 2016. The Uniqueness Thesis. Philosophy Compass 11 (4), 189–200.
Kratzer, Angelika. 1981. The Notional Category of Modality. In: H. Eikmeyer & H. Rieser (eds.). Words, Worlds, and Contexts. Berlin: De Gruyter, 38–74.
Kwisthout, Johan, Wareham, Todd, & van Rooij, Iris. 2011. Bayesian Intractability is not an Ailment that Approximation Can Cure. Cognitive Science 35 (5), 779–84.
Lance, Mark Norris. 1995. Subjective Probability and Acceptance. Philosophical Studies 77 (1), 147–79.
Laplace, P. S. 1814, English edition 1951. A Philosophical Essay on Probabilities. New York: Dover Publications Inc.
Lasonen-Aarnio, Maria. 2014. Higher-Order Evidence and the Limits of Defeat. Philosophy and Phenomenological Research 88 (2), 314–45.
Leitgeb, Hannes. 2016. The Stability of Belief: How Rational Belief Coheres with Probability. Oxford: Oxford University Press.
Leitgeb, Hannes & Pettigrew, Richard. 2010a. An Objective Justification of Bayesianism I: Measuring Inaccuracy. Philosophy of Science 77 (2), 201–35.
Lennertz, Benjamin. 2015. Quantificational Credences. Philosophers’ Imprint 15 (9), 1–24.
Leonard, Nick. A Puzzle About Probabilism. Manuscript.
Levi, Isaac. 1964. Belief and Action. The Monist 48 (2), 306–16.
Levin, Janet. 2008. Assertion, Practical Reason, and Pragmatic Theories of Knowledge. Philosophy and Phenomenological Research 76 (2), 359–84.
Levinstein, Benjamin A. 2017. A Pragmatist’s Guide to Epistemic Utility. Philosophy of Science 84 (2), 613–38.
Lewis, David. 1974. Radical Interpretation. Synthese 27 (3), 331–44.
Lewis, David. 1980. A Subjectivist’s Guide to Objective Chance. In: Richard C. Jeffrey (ed.). Studies in Inductive Logic and Probability, Volume II. Berkeley, CA: University of California Press, 83–132.
Lewis, David. 1994. Humean Supervenience Debugged. Mind 103 (412), 473–90.
Lin, Hanti. 2013. Foundations of Everyday Practical Reasoning. Journal of Philosophical Logic 42 (6), 831–62.
Lin, Hanti. 2014. On the Regress Problem of Deciding How to Decide. Synthese 191, 661–70.


Lin, Hanti & Kelly, Kevin. 2012. Propositional Reasoning that Tracks Probabilistic Reasoning. Journal of Philosophical Logic 41 (6), 957–81.
Lipsey, R. G. & Lancaster, Kelvin. 1956–7. The General Theory of Second Best. The Review of Economic Studies 24 (1), 11–32.
Liu, L. & Yager, R. 2008. Classic Works of the Dempster–Shafer Theory of Belief Functions: An Introduction. In: L. Liu & R. Yager (eds.). Classic Works of the Dempster–Shafer Theory of Belief Functions. Studies in Fuzziness and Soft Computing 219. Berlin: Springer, 1–34.
MacFarlane, John. 2004. In What Sense (If Any) Is Logic Normative For Thought? Manuscript, presented at the Central APA 2004.
Mahtani, Anna. 2015. Dutch Books, Coherence, and Logical Consistency. Noûs 49 (3), 522–37.
Mastropasqua, Tommaso, Crupi, Vincenzo, & Tentori, Katya. 2010. Broadening the Study of Inductive Reasoning: Confirmation Judgments with Uncertain Evidence. Memory & Cognition 38 (7), 941–50.
McHugh, Conor & Way, Jonathan. 2018a. What is Reasoning? Mind 127 (505), 167–96.
McHugh, Conor & Way, Jonathan. 2018b. What is Good Reasoning? Philosophy and Phenomenological Research 96 (1), 153–74.
McMullin, E. 1985. Galilean Idealization. Studies in History and Philosophy of Science Part A 16 (3), 247–73.
Meacham, Christopher J. G. 2014. Impermissive Bayesianism. Erkenntnis 79 (S6), 1185–217.
Meacham, Christopher J. G. & Weisberg, Jonathan. 2011. Representation Theorems and the Foundations of Decision Theory. Australasian Journal of Philosophy 89 (4), 641–63.
Moss, Sarah. 2011. Scoring Rules and Epistemic Compromise. Mind 120 (480), 1053–69.
Moss, Sarah. 2018. Probabilistic Knowledge. Oxford: Oxford University Press.
Mugg, Joshua. 2016. The Dual-Process Turn: How Recent Defenses of Dual-Process Theories of Reasoning Fail. Philosophical Psychology 29 (2), 300–9.
Nagel, Jennifer. 2011. The Psychological Basis of the Harman–Vogel Paradox. Philosophers’ Imprint 11 (5), 1–28.
Norby, Aaron. 2015. Uncertainty Without All the Doubt. Mind and Language 30 (1), 70–94.
Norton, John. 2008. Ignorance and Indifference. Philosophy of Science 75 (1), 45–68.
Novack, Greg. 2010. A Defense of the Principle of Indifference. Journal of Philosophical Logic 39 (6), 655–78.
Osherson, Daniel & Vardi, Moshe. 2006. Aggregating Disparate Estimates of Chance. Games and Economic Behavior 56 (1), 148–73.
Parfit, Derek. 1984. Reasons and Persons. Oxford: Oxford University Press.
Pettigrew, Richard. 2013. Accuracy and Evidence. Dialectica 67 (4), 579–96.
Pettigrew, Richard. 2016. Accuracy and the Laws of Credence. Oxford: Oxford University Press.
Pettigrew, Richard. 2017a. Aggregating Incoherent Agents Who Disagree. Synthese, online first.
Pettigrew, Richard. 2017b. Précis and Replies to Contributors for Book Symposium on Accuracy and the Laws of Credence. Episteme 14 (1), 1–30.
Pettigrew, Richard. 2018. The Population Ethics of Belief: In Search of an Epistemic Theory X. Noûs 52 (2), 336–72.
Pettigrew, Richard. 2019. On the Accuracy of Group Credences. Oxford Studies in Epistemology 6.


Potyka, Nico. 2014. Linear Programs for Measuring Inconsistency in Probabilistic Logics. Proceedings of the Fourteenth International Conference on Principles of Knowledge Representation and Reasoning, 568–77.
Predd, J. B., Osherson, D., Kulkarni, S., & Poor, H. V. 2008. Aggregating Probabilistic Forecasts from Incoherent and Abstaining Experts. Decision Analysis 5 (4), 177–89.
Predd, J. B., Seiringer, R., Lieb, E. H., Osherson, D. N., Poor, H. V., & Kulkarni, S. R. 2009. Probabilistic Coherence and Proper Scoring Rules. IEEE Transactions on Information Theory 55 (10), 4786–92.
Ramsey, Frank P. 1926. Truth and Probability. In: Richard B. Braithwaite (ed.). The Foundations of Mathematics and Other Logical Essays. London: Routledge & Kegan Paul, 1931, 156–98.
Roeber, Blake. 2018. The Pragmatic Encroachment Debate. Noûs 52 (1), 171–95.
Ross, Jacob & Schroeder, Mark. 2014. Belief, Credence, and Pragmatic Encroachment. Philosophy and Phenomenological Research 88 (2), 259–88.
Russell, Jeffrey S., Hawthorne, John, & Buchak, Lara. 2015. Groupthink. Philosophical Studies 172 (5), 1287–309.
Schervish, Mark, Seidenfeld, Teddy, & Kadane, Joseph. 2000. How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 8, 347–55.
Schervish, Mark, Seidenfeld, Teddy, & Kadane, Joseph. 2002. Measuring Incoherence. Sankhya: The Indian Journal of Statistics 64, 561–87.
Schervish, Mark, Seidenfeld, Teddy, & Kadane, Joseph. 2003. Measures of Incoherence: How not to Gamble if You Must. In: José Bernardo et al. (eds.). Bayesian Statistics 7. Oxford: Oxford University Press, 385–401.
Schervish, Mark, Seidenfeld, Teddy, & Kadane, Joseph. 2012. What Kind of Uncertainty is That? The Journal of Philosophy 109 (8–9), 516–33.
Schoenfield, Miriam. 2014. Permission to Believe: Why Permissivism Is True and What It Tells Us About Irrelevant Influences on Belief. Noûs 48 (1), 193–218.
Schoenfield, Miriam. 2018a. An Accuracy Based Approach to Higher Order Evidence. Philosophy and Phenomenological Research 96 (3), 690–715.
Schoenfield, Miriam. 2018b. Permissivism and the Value of Rationality: A Challenge to the Uniqueness Thesis. Philosophy and Phenomenological Research, online first.
Schwitzgebel, Eric. 2002. A Phenomenal, Dispositional Account of Belief. Noûs 36 (2), 249–75.
Sen, Amartya. 1970. Interpersonal Aggregation and Partial Comparability. Econometrica 38 (3), 393–409.
Shanahan, Murray. 2016. The Frame Problem. The Stanford Encyclopedia of Philosophy, Spring 2016 Edition, ed. Edward N. Zalta. https://plato.stanford.edu/archives/spr2016/entries/frame-problem/
Sidgwick, Henry. 1981. The Methods of Ethics. 7th ed. Indianapolis: Hackett.
Silva, Paul. 2015. On Doxastic Justification and Properly Basing One’s Beliefs. Erkenntnis 80 (5), 945–55.
Simon, Herbert A. 1955. A Behavioral Model of Rational Choice. The Quarterly Journal of Economics 69 (1), 99–118.
Simpson, Robert Mark. 2017. Permissivism and the Arbitrariness Objection. Episteme 14 (4), 519–38.
Slovic, Paul & Lichtenstein, Sarah. 1971. Comparison of Bayesian and Regression Approaches to the Study of Information Processing in Judgment. Organizational Behavior and Human Performance 6, 649–744.


Smithies, Declan. 2015. Ideal Rationality and Logical Omniscience. Synthese 192 (9), 2769–93.
Spohn, Wolfgang. 2012. The Laws of Belief: Ranking Theory and its Philosophical Applications. Oxford: Oxford University Press.
Staffel, Julia. 2015. Measuring the Overall Incoherence of Credence Functions. Synthese 192 (5), 1467–93.
Staffel, Julia. 2016. Beliefs, Buses and Lotteries: Why Rational Belief Can’t be Stably High Credence. Philosophical Studies 173 (7), 1721–34.
Staffel, Julia. 2017a. Accuracy for Believers. Episteme 14 (1), 39–48.
Staffel, Julia. 2017b. Should I Pretend I’m Perfect? Res Philosophica 94 (2), 301–24.
Staffel, Julia. 2018. How Do Beliefs Simplify Reasoning? Noûs, online first.
Stefánsson, Orri. 2017. What is “Real” in Probabilism? Australasian Journal of Philosophy 95 (3), 573–87.
Sturgeon, Scott. 2008. Reason and the Grain of Belief. Noûs 42 (1), 139–65.
Sturgeon, Scott. 2015. The Tale of Bella and Creda. Philosophers’ Imprint 15 (31).
Talbot, Brian. 2017. Repugnant Accuracy. Noûs, online first.
Talbott, William. 1991. Two Principles of Bayesian Epistemology. Philosophical Studies 62 (2), 135–50.
Tang, Weng Hong. 2015. Belief and Cognitive Limitations. Philosophical Studies 172 (1), 249–60.
Tentori, Katya, Chater, Nick, & Crupi, Vincenzo. 2016. Judging the Probability of Hypotheses Versus the Impact of Evidence: Which Form of Inductive Inference is More Accurate and Time-Consistent? Cognitive Science 40, 758–78.
Titelbaum, Michael. 2013. Quitting Certainties. Oxford: Oxford University Press.
Titelbaum, Michael. 2015. Rationality’s Fixed Point (Or: In Defense of Right Reason). Oxford Studies in Epistemology 5, 253–94.
Titelbaum, Michael. forthcoming a. Normative Modeling. In: J. Horvath (ed.). Methods in Analytic Philosophy: A Contemporary Reader. Bloomsbury Academic Press.
Titelbaum, Michael. forthcoming b. Return to Reason. To appear in an OUP volume on higher-order evidence, ed. Matthias Skipper & Asbjørn Steglich-Petersen.
Toledo, Assaf & Sassoon, Galit W. 2011. Absolute vs. Relative Adjectives: Variance Within vs. Between Individuals. Proceedings of SALT 21, 135–54.
Turri, John. 2010. On the Relationship between Propositional and Doxastic Justification. Philosophy and Phenomenological Research 80 (2), 312–26.
van Fraassen, Bas. 1981. A Problem for Relative Information Minimizers in Probability Kinematics. British Journal for the Philosophy of Science 32 (4), 375–9.
van Fraassen, Bas. 1984. Belief and the Will. Journal of Philosophy 81 (5), 235–56.
van Fraassen, Bas. 1989. Laws and Symmetry. Oxford: Clarendon Press.
van Fraassen, Bas. 1999. Conditionalization, a New Argument For. Topoi 18, 93–6.
Weatherson, Brian. 2016. Games, Beliefs and Credences. Philosophy and Phenomenological Research 92 (2), 209–36.
Wedgwood, Ralph. 2011. Primitively Rational Belief-Forming Processes. In: A. Reisner & A. Steglich-Petersen (eds.). Reasons for Belief. Cambridge: Cambridge University Press, 180–200.
Wedgwood, Ralph. 2012. Outright Belief. Dialectica 66 (3), 309–29.
Wedgwood, Ralph. 2017. The Normativity of Rationality. Oxford: Oxford University Press.
Weirich, Paul. 2004. Realistic Decision Theory. Oxford: Oxford University Press.
Weisberg, Michael. 2007. Three Kinds of Idealization. Journal of Philosophy 104 (2), 639–59.


Weisberg, Jonathan. 2013. Knowledge in Action. Philosophers’ Imprint 13 (22).
Weisberg, Jonathan. 2015. Updating, Undermining, and Independence. British Journal for the Philosophy of Science 66 (1), 121–59.
Weisberg, Jonathan. forthcoming. Belief in Psyontology. Philosophers’ Imprint.
White, Roger. 2005. Epistemic Permissiveness. Philosophical Perspectives 19, 445–59.
White, Roger. 2010. Evidential Symmetry and Mushy Credence. Oxford Studies in Epistemology 3, 161–86.
Wiens, David. 2016. Assessing Ideal Theories: Lessons from the Theory of Second Best. Politics, Philosophy, and Economics 15 (2), 132–49.
Williamson, Jon. 1999. Countable Additivity and Subjective Probability. British Journal for the Philosophy of Science 50 (3), 401–16.
Williamson, Jon. 2010. In Defense of Objective Bayesianism. Oxford: Oxford University Press.
Williamson, Timothy. 2000. Knowledge and Its Limits. Oxford: Oxford University Press.
Wimsatt, W. C. 1987. False Models as Means to Truer Theories. In: M. Nitecki & A. Hoffman (eds.). Neutral Models in Biology. Oxford: Oxford University Press, 23–55.
Woods, John. 2013. Epistemology Mathematicized. Informal Logic 33 (2), 292–331.
Worsnip, Alex. 2018. The Conflict of Evidence and Coherence. Philosophy and Phenomenological Research 96 (1), 3–44.
Yap, Audrey. 2014. Idealization, Epistemic Logic, and Epistemology. Synthese 191 (14), 3351–66.
Zhao, Jiaying, Crupi, Vincenzo, Tentori, Katya, Fitelson, Branden, & Osherson, Daniel. 2012. Updating: Learning Versus Supposing. Cognition 124, 373–8.
Zhao, Jiaying & Osherson, Daniel. 2010. Updating Beliefs in Light of Uncertain Evidence: Descriptive Assessment of Jeffrey’s Rule. Thinking & Reasoning 16 (4), 288–307.
Zhao, Jiaying, Shah, Anuj, & Osherson, Daniel. 2009. On the Provenance of Judgments of Conditional Probability. Cognition 113, 26–36.
Zynda, Lyle. 1996. Coherence as an Ideal of Rationality. Synthese 109 (2), 175–216.


Index

Note: Tables and figures are indicated by an italic “t” and “f”, and notes are indicated by “n” following the page number.

accuracy dominance argument 19–20, 74–6, 88–9, 108–10, 117–18, 121
accuracy measures 74, 87–93
  Brier 74–5, 78, 81–3, 90–1, 109–10, 113n.13, 121–5
  logarithmic 74, 78–9, 81–3, 92–3
  spherical 74, 78–9, 83, 92–3
  strictly proper 44, 78–9, 82–3, 87–8, 110–11, 141–2

ameliorative epistemology 116–19, 160–2
approximation
  constrained 112–16
  unconstrained 106–12

Basing Principle 100
Bayesian Challenge 174–5, 183–5
belief
  norms of 189, 196–7
  outright 172–83

Bronfman, Aaron 88–9
Brown, Jessica 185–6
Buchak, Lara 12n.2, 173n.3, 183–4

Carr, Jennifer 9–10, 22n.11, 53n.6
Christensen, David 23–4, 26n.15, 32–3, 57–8, 102–3, 125
Clarke, Roger 173–4, 178
complexity of Bayesian inference 175–6, 181–2
conditionalization 16, 21–2, 133, 147–50, 180–1
  pseudo 178–82
  human approximation of 180–1
conditional probability 16, 17n.6
context sensitivity 158, 166–9, 177–8, 185–6, 190–2
contextualism 185–6
Cook’s Illustrated 188n.13
Cooper, Gregory F. 181–2
credences
  conditional 16–17, 71–2, 190
  imprecise 9, 22–3, 62n.6, 198
  prior 97–9, 123, 133, 148–50, 179–80
  quantificational 22–3
  representation of 9–10, 173–4
Dallmann, Justin 186n.11
De Bona, Glauber 47n.5, 64–5, 71, 76–7, 79–82, 85–7, 122–3
Dempster–Shafer theory 10, 17–18, 198
deontic modals 165–9
desiderata for approximation measures
  comparability 36–7, 41–2, 53, 67
  incompleteness 36, 42–3, 52, 67
  judgment preservation 35–6, 40–1, 52, 67
  no inundation 37–8, 53–4, 65, 68–9, 71, 87n.6
distance 45–52
  absolute 46–8, 51, 65–7, 72, 86–7
  Euclidean 46–7, 49, 51, 81–3, 90–2, 106–7, 113–14, 121–5
  Chebyshev 46–7, 49, 51, 53–4, 64–5, 71, 86–7
  Kullback–Leibler divergence 47, 50–1, 81–2, 91–2, 116–17
  p-norm 46–7, 53–4
Dogramaci, Sinan 20–1, 100, 154–6
dominance 75, 117–18
  current chance 121–2
  strong 74, 79–80, 83, 121
  undominated 75
  weak 74, 79–80, 121

Dutch book argument 20, 44, 57–9, 69–72, 85–7, 123, 195–6
Dutch book normalization 59–67
  bookie’s escrow 61–3
  max 61–3, 65–7
  neutral 61–7
  sum 61–5
  thinker’s escrow 60–3

Earman, John 30–1
Easwaran, Kenny 17n.6, 21–2, 31n.20, 57n.2, 97n.1, 99, 101–2, 108, 133n.3, 173n.3
epistemicism 89
epistemic rationality
  value based conception 10, 90, 94, 100–3, 120, 198–9
epistemic standards 148–9
epistemic value(s)
  comparable 102, 111, 113–14, 141–2
  conflicting 102–3, 125
  incomparable 102, 104–6, 116–17, 120, 125–6, 140, 142
  lexically ordered 102, 104–6, 112–13, 116–17, 120, 125, 140, 142–3
  multiple value views 100–3
  objectivist conception of 90–3
  single value views 100–3
  subjectivist conception of 90
evaluative norms 32–3, 127–8, 160–2
evidentialism 101–2, 110–11
Finger, Marcelo 64–5, 71
Fitelson, Branden 31n.20, 101–2, 173n.3
Gigerenzer, Gerd 32, 157–8
gradable adjectives 162–5
Greco, Daniel 20–2, 173–4, 177
guiding norms see ameliorative epistemology
Hájek, Alan 17n.6, 20–1, 58n.3
Harman, Gilbert 137, 179n.8, 182n.10
Harsanyi, John 173–4, 173n.3, 178
heuristic 32, 155–60, 168, 172, 180–1, 188–9, 195–7

ideal agent view 25–6
inaccuracy measures see accuracy measures
incoherence measure 87–93
  qualitative 39–45
  quantitative 45–54

Indifference Principle 20–1, 98, 107n.7, 109–11, 123–5, 144–5
inference rule, augmentative 136, 143
infinite domain 21–2
Jeffrey, Richard 21–2, 174, 183
Joyce, James 19–20, 74n.2, 78–9, 82n.5
justification see rationality
Kadane, Joseph 59–71
knowledge 101–2, 184
Kopec, Matthew 97n.1, 148–9
Kullback–Leibler divergence see distance
Leitgeb, Hannes 19–20, 174–5, 185, 196–7
Levin, Janet 185–6
Levinstein, Ben 92–3
Lewis, David 19–21, 98–9
Lin, Hanti 174–5, 177–8, 185
Lockean thesis see thresholds for belief
logical learning 153–6
logical omniscience 15–16
maximin principle 108–9, 123
McHugh, Conor 128–31, 137, 145
measuring strategies for degrees of rationality
  bundle 103–12, 115–17, 119–20, 132, 138–41, 144–7
  piecemeal 103–18, 120, 125–6, 132, 134–5, 140–3, 146–7

memory 190–2
mental states, storage of 179n.8, 181–2, 190–3
model 17, 26–8, 173–4, 199
Moss, Sarah 9–10, 91–2, 113–14, 141–2, 183–4
Nagel, Jennifer 177, 185–6
Norby, Aaron 190–5
normalization 14, 87–93, 192–3
NP-hardness 181–2
ought 162–5


peanut butter sandwich 188–9
permissivism 20–1, 97, 133, 148
Pettigrew, Richard 19–21, 53n.6, 74–6, 78–9, 85, 89–91, 98–9, 103–4, 108–10, 113–14, 121, 178n.7, 183
pragmatic encroachment 196
preservation principle 138–42
Principal Principle 20–1, 31, 98–9, 103–4, 109–10, 121–3
probability axioms 14
probability function
  incomplete 14–16, 36, 52
  prior 97–9, 123, 133, 148–50, 179–80
ranking theory 10, 17–18, 198
rationality
  bounded 32, 156–60, 168, 180–1, 195
  ecological 156–60, 168, 195
  doxastic 100, 153–6, 167–8
  ordinary notion of 152–3, 160
  propositional 19, 100, 116, 118–19, 128, 153–6, 162–8
reasoning
  rules of 128–37, 143–7
  without new information or new attitudes 132–4
  without new information, but forming new attitudes 134–47
  with new information 147–50
  with credences and outright beliefs 174–80, 190–7
Reflection Principle 20–1, 99, 108
Regularity Principle 20–1
representation of mental states 179n.8, 181–2, 190–3
Ross, Jacob 174–5
Schervish, Mark 59–71
Schroeder, Mark 174–5
scientific idealization view 25–7
scoring rules see accuracy measures
second best, problem of 112
Seidenfeld, Teddy 59–71
semantics of “rational” 162–5
Smithies, Declan 16n.5, 32–3, 100, 155, 166n.7
stability theory 185, 196–7
statistical evidence 183–4
subject-sensitive invariantism 185–6
supervaluationism 89
System 1 reasoning 155–6, 187n.12
System 2 reasoning 155–6, 187n.12
systematization view 24–8
Talbot, Brian 53n.6
Tentori, Katya 159–60, 180–1
thresholds for belief 193–4, 194t
Titelbaum, Michael 16n.5, 17–18, 20–2, 27–8, 148–9, 154n.1
tradeoffs between simplicity and coherence 195–6
uniqueness 20–1, 97, 133, 148–50
unpacking effect 192–3
updating see conditionalization, reasoning
vegan brunch 172–3
Way, Jonathan 128–31, 137, 145
Wedgwood, Ralph 100, 129, 154n.1, 173–4
Weisberg, Jonathan 9, 21–2, 173, 179n.8, 191–3
Zynda, Lyle 3, 12n.2, 16n.5, 39–45